Context & Problem
Municipal land-use documents and zoning laws are notoriously dense and challenging to parse. When citizens or paralegals interact with standard text-retrieval systems or generic LLMs, the models frequently hallucinate or fail to pull the exact municipal codes required for compliance.
As part of my Master's capstone project, I am building an intelligent retrieval system tailored specifically for San Diego land-use regulations that enforces high precision and factual grounding.
Architecture & Approach
To ensure the system retrieves accurate legal phrasing, I did not default to a general-purpose embedding model. Instead, I benchmarked a domain-adapted Legal-BERT model against a MiniLM baseline.
Because real-world user queries for zoning laws are complex and multi-faceted, I engineered a synthetic 50-query "Ground Truth" dataset using Gemini 2.0 Flash, built directly from legal document chunks. This dataset anchors the evaluation pipeline.
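As a rough illustration of how a ground-truth set like this can anchor retrieval evaluation, the sketch below computes Mean Reciprocal Rank and Hit Rate@k in plain Python. It assumes each query maps to exactly one relevant chunk ID. The function names, the dictionary layout, the cutoff `k`, and the toy IDs are hypothetical placeholders, not the project's actual code or data.

```python
from typing import Dict, List


def reciprocal_rank(ranked_ids: List[str], relevant_id: str) -> float:
    """Return 1/rank of the relevant chunk, or 0.0 if it was not retrieved."""
    for rank, chunk_id in enumerate(ranked_ids, start=1):
        if chunk_id == relevant_id:
            return 1.0 / rank
    return 0.0


def evaluate_retriever(
    results: Dict[str, List[str]],
    ground_truth: Dict[str, str],
    k: int = 5,
) -> Dict[str, float]:
    """Average reciprocal rank and hit rate over all ground-truth queries.

    Assumption: one relevant chunk per query; only the top-k results count.
    """
    if not ground_truth:
        return {"mrr": 0.0, "hit_rate": 0.0}

    rr_total = 0.0
    hits = 0
    for query, relevant_id in ground_truth.items():
        top_k = results.get(query, [])[:k]
        rr_total += reciprocal_rank(top_k, relevant_id)
        hits += int(relevant_id in top_k)

    n = len(ground_truth)
    return {"mrr": rr_total / n, "hit_rate": hits / n}


# Toy data for illustration only (hypothetical queries and chunk IDs).
toy_truth = {"q1": "chunk_a", "q2": "chunk_b"}
toy_results = {
    "q1": ["chunk_a", "chunk_c"],  # relevant chunk at rank 1
    "q2": ["chunk_c", "chunk_b"],  # relevant chunk at rank 2
}
toy_scores = evaluate_retriever(toy_results, toy_truth, k=5)
print(toy_scores)  # {'mrr': 0.75, 'hit_rate': 1.0}
```

Running the same function over the ranked outputs of each embedding model gives directly comparable scores, since both models are judged against the same queries and the same cutoff.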
To validate the generation phase, the pipeline scores outputs with three metrics: ROUGE-L for lexical overlap (based on the longest common subsequence), BERTScore for semantic similarity, and a strict LLM-as-a-judge faithfulness score that penalizes hallucinated content.
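To make the ROUGE-L component concrete, here is a minimal pure-Python sketch of the metric: it finds the longest common subsequence (LCS) of tokens between a generated answer and a reference, then combines LCS-based precision and recall into an F1 score. It assumes lowercased whitespace tokenization with no stemming, and the example sentences are invented. An established library would likely be used in practice and may tokenize differently.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between two strings.

    Assumption: simple lowercase + whitespace tokenization, no stemming.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0

    lcs = lcs_length(cand_tokens, ref_tokens)
    if lcs == 0:
        return 0.0

    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Invented example sentences, not taken from the project's dataset.
generated = "the setback is ten feet"
reference = "the minimum setback is ten feet"
score = rouge_l_f1(generated, reference)
print(round(score, 3))  # 0.909
```

Because ROUGE-L only rewards shared word order, a fluent but unsupported answer can still score well, which is why it is paired with semantic and faithfulness checks rather than used alone.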
Key Results
- The domain-adapted retriever outperformed the MiniLM baseline on both Mean Reciprocal Rank (MRR) and Hit Rate.
- Measurably reduced hallucination rates by implementing a strict faithfulness evaluation pipeline.
- Final capstone deployment and end-to-end user evaluation in progress.