San Diego Land Use RAG | Aresh Tajvar

Context & Problem

Municipal land-use documents and zoning laws are notoriously dense and difficult to parse. When citizens or paralegals rely on standard text-retrieval systems or generic LLMs, these tools frequently hallucinate or fail to surface the exact municipal code sections required for compliance.

As part of my Master's capstone project, I am building an intelligent retrieval system tailored specifically for San Diego land-use regulations that enforces high precision and factual grounding.

Architecture & Approach

To ensure the system retrieves accurate legal phrasing, I did not default to a general-purpose embedding model. Instead, I benchmarked a domain-adapted Legal-BERT model against a MiniLM baseline.
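As a minimal sketch of what such a comparison rests on, assuming each model is used to map text to a vector, retrieval reduces to ranking chunks by cosine similarity to the query. The chunk IDs and 3-dimensional vectors below are hypothetical stand-ins, not outputs of Legal-BERT or MiniLM.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_chunks(query_vec, chunk_vecs, top_k=3):
    """Return chunk IDs ordered by descending similarity to the query."""
    scored = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in scored[:top_k]]

# Hypothetical toy embeddings; a real encoder produces hundreds of dimensions.
chunk_vecs = {
    "setbacks": [0.9, 0.1, 0.0],
    "parking": [0.1, 0.9, 0.1],
    "signage": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # stand-in for an embedded question about setbacks

ranked = rank_chunks(query_vec, chunk_vecs)
print(ranked)
```

Running the same ranking once per encoder, with identical chunks and queries, keeps the comparison between models like-for-like.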

Because real-world user queries for zoning laws are complex and multi-faceted, I engineered a synthetic 50-query "Ground Truth" dataset using Gemini 2.0 Flash, built directly from legal document chunks. This dataset anchors the evaluation pipeline.
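One way to structure such a dataset, sketched under assumptions: each record pairs a generated question with the ID of the chunk it was written from, so retrieval can later be scored against a known answer. The `GroundTruthItem` type, the `build_ground_truth` helper, the `template_question` placeholder, and the sample chunk texts are all hypothetical. In the real pipeline the generator would be a call to Gemini 2.0 Flash, which is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class GroundTruthItem:
    query: str            # synthetic question
    source_chunk_id: str  # chunk the question was generated from

def build_ground_truth(
    chunks: Dict[str, str],
    generate_question: Callable[[str], str],
    limit: int = 50,
) -> List[GroundTruthItem]:
    """Create one (question, source chunk) record per chunk, up to `limit`."""
    items = []
    for chunk_id, text in list(chunks.items())[:limit]:
        items.append(GroundTruthItem(generate_question(text), chunk_id))
    return items

def template_question(text: str) -> str:
    """Placeholder generator; a real run would prompt an LLM with the chunk text."""
    return f"What does the code require regarding: {text[:40]}?"

# Hypothetical chunk texts, for illustration only.
chunks = {
    "c1": "Minimum front setback in residential zones",
    "c2": "Off-street parking for multifamily housing",
}
dataset = build_ground_truth(chunks, template_question)
print(dataset[0])
```

Keeping the source chunk ID on every record is what lets retrieval metrics be computed without manual labeling.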

To validate the generation phase, the pipeline scores outputs on three metrics: ROUGE-L for lexical overlap based on the longest common subsequence, BERTScore for semantic similarity, and a strict LLM-as-a-judge faithfulness score that penalizes any claim not supported by the retrieved text.
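Of these metrics, ROUGE-L is the only one that needs no model, so it is the easiest to illustrate. The sketch below assumes simple lowercase whitespace tokenization; library implementations typically add stemming and more careful tokenization, so their scores may differ. The two sentences are made up for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(reference, candidate):
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# Made-up sentences: six of seven tokens match in order.
score = rouge_l_f1(
    "a permit is required for new construction",
    "a permit is needed for new construction",
)
print(round(score, 4))
```

A high ROUGE-L score only shows word-order overlap, which is why it is paired with semantic and faithfulness checks.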

Key Results

Outperformed baseline architectures on both Mean Reciprocal Rank (MRR) and Hit Rate.
Measurably reduced hallucination rates through a strict faithfulness evaluation pipeline.
Final capstone deployment and end-to-end user evaluation in progress.
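For reference, the two retrieval metrics named above can be computed as in the following sketch. It assumes one relevant chunk per query, which matches a ground-truth set where each question is generated from a single chunk. The cutoff `k=3` and the toy rankings are hypothetical, not results from this project.

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """1 / rank of the relevant chunk, or 0.0 if it was not retrieved."""
    for rank, chunk_id in enumerate(ranked_ids, start=1):
        if chunk_id == relevant_id:
            return 1.0 / rank
    return 0.0

def evaluate(results, k=3):
    """Compute (MRR@k, Hit Rate@k) over (ranked_ids, relevant_id) pairs."""
    n = len(results)
    if n == 0:
        return 0.0, 0.0
    mrr = sum(reciprocal_rank(ranked[:k], rel) for ranked, rel in results) / n
    hit_rate = sum(rel in ranked[:k] for ranked, rel in results) / n
    return mrr, hit_rate

# Toy rankings: relevant chunk at rank 1, at rank 2, and missing.
results = [
    (["a", "b", "c"], "a"),
    (["b", "a", "c"], "a"),
    (["b", "c", "d"], "a"),
]
mrr, hit_rate = evaluate(results)
print(mrr, hit_rate)
```

Hit Rate only records whether the right chunk appears in the top k, while MRR also rewards placing it higher, so the two are usually reported together.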
