# Skip List — Stop Retrying Known Failures
**Agent:** engineer
**Priority:** high
## Problem
Auto-claude keeps retrying files that will never succeed:
- 278 "no_bin" ISOXML files for Dan Weist (task files with no TLG data binary)
- 4,849 Raven job files that need a different parser we don't have yet
- Various "bad_file" entries that error every time
Each retry costs money and wastes a job slot.
## Solution
Add a skip list to the event bridge and data pipeline agent so they stop queuing work for file types/errors that are known dead ends.
### 1. Create skip rules file
File: `/data/agentpi/state/skip_rules.json`
```json
{
  "gdrive_skip": {
    "file_types": ["raven_job"],
    "statuses": ["no_bin", "bad_file", "timeout"],
    "note": "Raven jobs need a dedicated parser (not built yet). no_bin = missing TLG data. bad_file = corrupt or unsupported format."
  },
  "tap_skip": {
    "statuses": ["no_bin", "bad_file", "skip_large"],
    "note": "These have been tried and confirmed unfixable without new code."
  }
}
```
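A minimal loader sketch for this file (the helper names `load_skip_rules` and `should_skip_gdrive` are hypothetical, not existing code). It falls back to empty rules when the file is missing, so the pipeline degrades to its old skip-nothing behavior instead of crashing:

```python
import json
from pathlib import Path

SKIP_RULES_PATH = Path("/data/agentpi/state/skip_rules.json")

def load_skip_rules(path: Path = SKIP_RULES_PATH) -> dict:
    """Load skip rules; return empty rules if the file is absent."""
    try:
        with path.open() as f:
            return json.load(f)
    except FileNotFoundError:
        return {"gdrive_skip": {"file_types": [], "statuses": []},
                "tap_skip": {"statuses": []}}

def should_skip_gdrive(file_type: str, status: str, rules: dict) -> bool:
    """True when a gdrive file matches a skipped type or status."""
    g = rules.get("gdrive_skip", {})
    return file_type in g.get("file_types", []) or status in g.get("statuses", [])
```

Centralizing the check in one helper means the event bridge and scanner can't drift apart on what "skipped" means.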
### 2. Update event bridge pending counts
In `check_gdrive_pending()`, exclude skipped file types from the pending count (skipped statuses are already excluded by the `status = 'pending'` filter):
```sql
SELECT customer, COUNT(*) as pending
FROM gdrive_files
WHERE status = 'pending'
  AND file_type NOT IN ('raven_job')  -- gdrive_skip.file_types from skip_rules.json
  AND customer IN (...)
GROUP BY customer
HAVING COUNT(*) >= 5
```
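Rather than hardcoding `'raven_job'` in the SQL, the exclusion list could be built from skip_rules.json at query time. A sketch, assuming a DB-API connection and a `build_pending_query` helper that does not exist yet:

```python
def build_pending_query(skip_file_types: list[str],
                        customers: list[str]) -> tuple[str, list[str]]:
    """Build a parameterized pending-count query that excludes skipped
    file types, so the threshold check never fires on known dead ends."""
    # Placeholder lists; NOT IN ('') matches nothing when there are no skips.
    ft_ph = ",".join("?" for _ in skip_file_types) or "''"
    cust_ph = ",".join("?" for _ in customers)
    sql = (
        "SELECT customer, COUNT(*) AS pending "
        "FROM gdrive_files "
        "WHERE status = 'pending' "
        f"AND file_type NOT IN ({ft_ph}) "
        f"AND customer IN ({cust_ph}) "
        "GROUP BY customer HAVING COUNT(*) >= 5"
    )
    return sql, [*skip_file_types, *customers]
```

This keeps the skip list in one place: adding a new dead-end file type to skip_rules.json updates the pending counts without touching the SQL.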
### 3. Update scan_client_drives.py
When processing pending files, skip file_types listed in skip_rules.json.
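A sketch of that filter (assuming pending files arrive as dicts with `file_type` and `status` keys; `filter_pending` is a hypothetical helper, not existing code):

```python
def filter_pending(files, rules):
    """Split pending files into (kept, skipped) using the gdrive skip rules,
    so skipped files never reach the job queue."""
    g = rules.get("gdrive_skip", {})
    skip_types = set(g.get("file_types", []))
    skip_statuses = set(g.get("statuses", []))
    kept, skipped = [], []
    for f in files:
        if f["file_type"] in skip_types or f["status"] in skip_statuses:
            skipped.append(f)  # known dead end: do not queue
        else:
            kept.append(f)
    return kept, skipped
```

Returning the skipped list (instead of silently dropping it) lets the scanner log how many files each rule suppressed, which makes it obvious when a rule is too broad.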
### 4. Update tap_batch_process.py
Already skips no_bin/bad_file in the processing manifest, but the event bridge still counts them as "pending," which inflates the number. Fix the pending count to subtract known errors.
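The subtraction can be done on a per-status tally instead of re-querying. A sketch, assuming the bridge can get a `{status: count}` map for a customer's TAP files (`real_pending_count` is a hypothetical name):

```python
def real_pending_count(status_counts: dict, rules: dict) -> int:
    """Sum only the statuses that are not on the tap skip list, so the
    event bridge threshold reflects work that can actually succeed."""
    skip = set(rules.get("tap_skip", {}).get("statuses", []))
    return sum(n for status, n in status_counts.items() if status not in skip)
```

With the rules above, 278 no_bin files stop counting toward the trigger threshold, so the bridge only fires when there is genuinely processable work.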
## Impact
- Dan Weist: "pending" drops from 5,319 to ~460 real pending (391 ISOXML + 70 yield shapefiles + 9 shapefile zips, minus known errors)
- Stops wasting jobs on 4,849 Raven files
- More accurate pending counts on the dashboard