# Skip List — Stop Retrying Known Failures

**Agent:** engineer
**Priority:** high

## Problem
Auto-claude keeps retrying files that will never succeed:
- 278 "no_bin" ISOXML files for Dan Weist (task files with no TLG data binary)
- 4,849 Raven job files that need a different parser we don't have yet
- Various "bad_file" entries that error every time

Each retry costs money and wastes a job slot.

## Solution
Add a skip list to the event bridge and data pipeline agent so they stop queuing work for file types/errors that are known dead ends.

### 1. Create skip rules file
File: `/data/agentpi/state/skip_rules.json`

```json
{
    "gdrive_skip": {
        "file_types": ["raven_job"],
        "statuses": ["no_bin", "bad_file", "timeout"],
        "note": "Raven jobs need a dedicated parser (not built yet). no_bin = missing TLG data. bad_file = corrupt or unsupported format."
    },
    "tap_skip": {
        "statuses": ["no_bin", "bad_file", "skip_large"],
        "note": "These have been tried and confirmed unfixable without new code."
    }
}
```
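
The rules file could be read with a small helper along these lines. This is a minimal sketch, not existing code: `load_skip_rules` and `should_skip` are hypothetical names, and returning an empty rule set when the file is missing is an assumption about the desired behavior.

```python
import json
from pathlib import Path

# Path taken from the task description; everything else here is illustrative.
SKIP_RULES_PATH = Path("/data/agentpi/state/skip_rules.json")


def load_skip_rules(path=SKIP_RULES_PATH):
    """Return the parsed rules, or an empty dict if the file does not exist."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        # Assumption: a missing rules file means "skip nothing".
        return {}


def should_skip(rules, section, file_type=None, status=None):
    """True if file_type or status is listed under the given section."""
    rule = rules.get(section, {})
    return (
        file_type in rule.get("file_types", [])
        or status in rule.get("statuses", [])
    )
```

A caller would check, for example, `should_skip(rules, "gdrive_skip", file_type="raven_job")` before queuing a job. Treating a missing file as "no rules" keeps the pipeline running if the file is not deployed yet, at the cost of silently re-enabling retries.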

### 2. Update event bridge pending counts
In `check_gdrive_pending()`, exclude skipped file types from the pending count:
```sql
SELECT customer, COUNT(*) as pending
FROM gdrive_files
WHERE status = 'pending'
AND file_type NOT IN ('raven_job')
AND customer IN (...)
GROUP BY customer HAVING COUNT(*) >= 5
```
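
Rather than hard-coding `'raven_job'`, the excluded types could come from `skip_rules.json` and be bound as query parameters. The sketch below assumes a SQLite database and the `gdrive_files` columns used in the query above. `count_pending` and its arguments are hypothetical, and the real `check_gdrive_pending()` may be structured differently.

```python
import sqlite3


def count_pending(conn, customers, skip_file_types, min_pending=5):
    """Count pending files per customer, leaving out skipped file types."""
    if not customers:
        return {}
    # One "?" placeholder per value, so nothing is interpolated into the SQL.
    cust_marks = ",".join("?" * len(customers))
    sql = (
        "SELECT customer, COUNT(*) AS pending FROM gdrive_files "
        "WHERE status = 'pending' "
        f"AND customer IN ({cust_marks}) "
    )
    params = list(customers)
    if skip_file_types:
        type_marks = ",".join("?" * len(skip_file_types))
        sql += f"AND file_type NOT IN ({type_marks}) "
        params += list(skip_file_types)
    sql += "GROUP BY customer HAVING COUNT(*) >= ?"
    params.append(min_pending)
    return dict(conn.execute(sql, params).fetchall())
```

Note that `NOT IN` also drops rows whose `file_type` is NULL, which may or may not be wanted here.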

### 3. Update scan_client_drives.py
When processing pending files, skip file_types listed in skip_rules.json.
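
One way the filter might look, assuming each pending file is represented as a dict with a `file_type` key (the actual record shape in `scan_client_drives.py` is not shown here, and `split_pending` is a hypothetical helper):

```python
def split_pending(files, skip_file_types):
    """Partition file records into (to_process, skipped) by file_type."""
    skip = set(skip_file_types)
    to_process, skipped = [], []
    for record in files:
        if record.get("file_type") in skip:
            skipped.append(record)
        else:
            to_process.append(record)
    return to_process, skipped
```

Returning the skipped records as well, instead of discarding them, makes it easy to log how many files each run passed over.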

### 4. Update tap_batch_process.py
This script already skips no_bin/bad_file in the processing manifest, but the event bridge still counts them as "pending", which inflates the number. Fix the pending count to subtract known errors.
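
The corrected count could be computed roughly as follows. This sketch assumes the entries passed in are exactly those the event bridge currently counts as pending, each a dict with a `status` field. `actionable_pending` is a hypothetical name, and the skip statuses would come from the `tap_skip` section of the rules file.

```python
from collections import Counter


def actionable_pending(entries, skip_statuses):
    """Return (actionable_count, skipped_by_status) for manifest entries."""
    skip = set(skip_statuses)
    by_status = Counter(entry.get("status") for entry in entries)
    # Entries in a known dead-end status are reported separately, not as pending.
    skipped = {s: n for s, n in by_status.items() if s in skip}
    actionable = sum(n for s, n in by_status.items() if s not in skip)
    return actionable, skipped
```

Reporting the skipped breakdown next to the actionable count would let the dashboard show both numbers instead of silently hiding the known errors.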

## Impact
- Dan Weist: 5,319 "pending" drops to ~460 real pending (391 ISOXML + 70 yield shapefiles + 9 shapefile zips = 470, minus known errors)
- Stops wasting jobs on 4,849 Raven files
- More accurate pending counts on the dashboard