Authority Names Integration


Task Description

# TASK: Integrate Authority Names into Audit Engine

**Priority:** HIGH (authority names are saved but never consumed — feature is half-built)
**Agent:** engineer
**Filed:** 2026-02-16
**Filed by:** Doug (via Claude Code session)

---

## The Problem

The web UI at `farmiq.ai/field-audit/` lets customers type their authoritative field names (one per line, with optional acreage). These are saved to:

```
farmiq.ai/field-audit/data/customers/{customer_id}/authority_names.json
```

Format:
```json
{
  "names": [
    {"name": "Christman North", "acres": 470, "farm": ""},
    {"name": "Turner 09", "acres": 155, "farm": ""},
    {"name": "Bole East", "acres": null, "farm": ""}
  ],
  "updated_at": "2026-02-16T...",
  "count": 3
}
```

**But the audit engine (`FieldNamingAudit/audit_engine.py`) doesn't read this file.** The canonical name suggestion is based solely on frequency + recency + source priority from the boundary files themselves.

Authority names should be the **highest-priority** source — if a customer says their field is called "Christman North", that should win over whatever the shapefile says.

---

## What Needs to Happen

### 1. Load authority names in audit_engine.py

At the start of the audit run, check for `authority_names.json` in the customer's data directory. Parse it into a lookup structure.
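A minimal sketch of the loading step, assuming the JSON format shown above. The function name `load_authority_names` and the case-folded lookup key are illustrative choices, not existing engine code. A missing or unreadable file yields an empty lookup, so the engine can continue without authority names.

```python
import json
from pathlib import Path


def load_authority_names(customer_dir: Path) -> dict:
    """Return {case-folded name: entry}; empty dict if the file is absent or invalid.

    Hypothetical helper: the name and return shape are assumptions.
    """
    path = customer_dir / "authority_names.json"
    if not path.is_file():
        return {}  # authority matching is optional
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return {}
    if not isinstance(payload, dict):
        return {}

    lookup = {}
    for entry in payload.get("names", []):
        name = (entry.get("name") or "").strip()
        if not name:
            continue  # skip blank lines saved by the web UI
        lookup[name.casefold()] = {
            "name": name,                 # exact customer spelling
            "acres": entry.get("acres"),  # may be None
            "farm": entry.get("farm", ""),
        }
    return lookup
```

The original spelling is kept in the entry because step 3 needs the customer's exact spelling for `canonical_name`.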

### 2. Match authority names to clusters

After spatial clustering is complete (IoU matching), try to match each authority name to a cluster:

**By name similarity:**
- Compare each authority name against all field names in each cluster using fuzzy matching (rapidfuzz or similar)
- Threshold: ratio >= 70% OR partial_ratio >= 85%
- Example: authority "Turner 09" should match cluster containing "T09", "TURNER 09", "Turner East 09"
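
The threshold rule could be sketched as follows. To stay dependency-free, this uses `difflib.SequenceMatcher` from the standard library as a stand-in; its scores will differ somewhat from rapidfuzz's `fuzz.ratio` and `fuzz.partial_ratio`, so the 70/85 thresholds would need re-checking against whichever library is actually used. The sliding-window `partial_ratio` here is a simplified approximation, and all function names are hypothetical.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (stdlib stand-in)."""
    return 100.0 * SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def partial_ratio(a: str, b: str) -> float:
    """Best score of the shorter string against any same-length window of the longer."""
    short, long_ = sorted((a.casefold(), b.casefold()), key=len)
    if not short:
        return 0.0
    windows = (
        long_[i:i + len(short)]
        for i in range(len(long_) - len(short) + 1)
    )
    return max(100.0 * SequenceMatcher(None, short, w).ratio() for w in windows)


def name_matches_cluster(authority_name: str, cluster_names: list) -> bool:
    """True if any name in the cluster passes either threshold."""
    return any(
        ratio(authority_name, n) >= 70 or partial_ratio(authority_name, n) >= 85
        for n in cluster_names
    )
```

A cluster matches as soon as one of its names passes, so a short abbreviation such as "T09" does not need to pass on its own when "TURNER 09" is in the same cluster.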

**By acreage (secondary signal):**
- If the authority name has an acreage, use it as a tiebreaker: prefer clusters whose `avg_area_ac` is within 20% of the authority acreage
- This helps when name matching is ambiguous
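
One way to express the acreage tiebreaker, as a sketch. It assumes each cluster is a dict carrying `avg_area_ac`, and that the candidates have already passed the name threshold. Returning `None` when the acreage does not single out one cluster is an assumption made for this example; the task does not say how to resolve a remaining tie.

```python
from typing import Optional


def within_acreage_tolerance(authority_acres: Optional[float],
                             avg_area_ac: float,
                             tolerance: float = 0.20) -> bool:
    """True if the cluster area is within `tolerance` of the authority acreage."""
    if authority_acres is None or authority_acres <= 0:
        return False  # no usable acreage, so no signal
    return abs(avg_area_ac - authority_acres) <= tolerance * authority_acres


def pick_cluster(authority: dict, candidates: list) -> Optional[dict]:
    """Choose one cluster among name-matched candidates (hypothetical helper)."""
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]  # name match was unambiguous
    by_acres = [
        c for c in candidates
        if within_acreage_tolerance(authority.get("acres"), c["avg_area_ac"])
    ]
    # Assumption: if acreage does not narrow it to exactly one, stay undecided.
    return by_acres[0] if len(by_acres) == 1 else None
```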

### 3. Boost canonical name confidence

When an authority name matches a cluster:
- Set `canonical_name` to the authority name (exact spelling from the customer)
- Set `canonical_name_confidence` to 1.0 (or close to it)
- Add flag: `"authority_matched": true`
- Add the authority name to the cluster's name variants list with source "authority"
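
The four updates above might look like this on a cluster dict. The `name_variants` key and the `{"name", "source"}` variant shape are assumptions for illustration; the real structure in `audit_engine.py` may differ. Existing variants are left untouched, since the authority name is additive.

```python
def apply_authority_match(cluster: dict, authority_name: str) -> dict:
    """Promote the customer's spelling to canonical name (hypothetical helper)."""
    cluster["canonical_name"] = authority_name  # exact customer spelling
    cluster["canonical_name_confidence"] = 1.0
    cluster["authority_matched"] = True

    # Assumed variant structure: a list of {"name": ..., "source": ...} dicts.
    variants = cluster.setdefault("name_variants", [])
    already_listed = any(
        v.get("name") == authority_name and v.get("source") == "authority"
        for v in variants
    )
    if not already_listed:
        variants.append({"name": authority_name, "source": "authority"})
    return cluster
```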

### 4. Handle unmatched authority names

If an authority name doesn't match any cluster:
- Log it as a warning
- Include in the audit output: `"unmatched_authority_names": [...]`
- This helps the customer see which of their names didn't have boundary data
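
A short sketch of the unmatched-name handling, assuming the matching step produces a set of authority names that found a cluster. The logger name and the helper name are illustrative.

```python
import logging

logger = logging.getLogger("audit_engine")  # assumed logger name


def collect_unmatched(authority_names: list, matched: set) -> list:
    """Return authority names with no cluster, logging a warning for each."""
    unmatched = [name for name in authority_names if name not in matched]
    for name in unmatched:
        logger.warning("Authority name %r did not match any cluster", name)
    return unmatched
```

The result would then be stored in the audit output, for example `audit["unmatched_authority_names"] = collect_unmatched(names, matched)`, where `audit` stands for whatever dict is serialized to `audit.json`.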

---

## File Locations

- **Audit engine:** `FieldNamingAudit/audit_engine.py`
- **Name analyzer:** `FieldNamingAudit/name_analyzer.py`
- **Authority names:** `farmiq.ai/field-audit/data/customers/{customer_id}/authority_names.json`
- **Audit results output:** `farmiq.ai/field-audit/data/customers/{customer_id}/audit_results/audit.json`

The audit engine's `--scan` argument points to the `sources/` directory. The authority names file lives one level up, in the customer directory:
```
data/customers/brett/
├── authority_names.json    <-- load this
├── sources/                <-- --scan points here
├── audit_results/          <-- output goes here
└── approved/
```
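
Given that layout, the file path can be derived from the `--scan` value. A minimal sketch, with a hypothetical function name:

```python
from pathlib import Path


def authority_path_from_scan(scan_dir: str) -> Path:
    """Resolve authority_names.json in the parent of the --scan directory."""
    # resolve() normalizes trailing slashes and relative segments such as "..".
    return Path(scan_dir).resolve().parent / "authority_names.json"
```

For example, `authority_path_from_scan("data/customers/brett/sources")` points at `authority_names.json` inside `data/customers/brett/`.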

---

## Testing

1. Create a test `authority_names.json` for Doug's data with known field names (T09, Christman North, etc.)
2. Run the audit engine against Doug's data
3. Verify: authority names should appear as canonical names in the output
4. Verify: unmatched names should appear in the unmatched list

---

## DO NOT

- Do NOT modify the web frontend (app.py, templates) — that's already done
- Do NOT change the authority_names.json format — the web UI writes it as-is
- Do NOT make authority matching mandatory — engine must work fine without authority_names.json
- Do NOT delete or rename any existing field name variants — authority name is additive
