# TASK: CNH Yield Cleaner Hardening (Moderate & Minor Fixes)

**Priority:** HIGH
**Agent:** engineer
**Filed:** 2026-02-16
**Filed by:** Claude Code session (processor audit)

---

## Goal

Harden the CNH yield cleaner (`CNH/yield_cleaner.py`) for reliable batch processing across all crop types and combine configurations. The critical fixes (column validation, per-crop conversion, timestamp safety) are already done. This task covers the moderate and minor improvements.

## ALREADY FIXED (do not redo)

- CSV pre-flight column validation (`load_data()` checks for required columns)
- Per-crop conversion factor support (`Config.CROP_CONVERSIONS` dict + `--crop` CLI flag)
- Timestamp parsing uses `errors='coerce'` + dropna
- Warning for all-zero speed data

## Moderate Issues to Fix

### 1. Flow delay NaN tracking

When `_apply_flow_delay()` shifts points, some edge points end up with NaN coordinates. Track these in the `removal_reason` column as `'flow_delay_edge'` rather than dropping them silently. Check whether rows with NaN lat/lon after the flow delay are counted in the removal stats.

### 2. Multi-combine normalization robustness

The `_normalize_combines()` function assumes that a `task` column exists and has non-null values. Some CSVs may not have this column (single combine). Add a guard: if there is no `task` column, or every row has the same value, skip normalization.

### 3. Pass detection with GPS gaps

`pass3_combine_artifacts()` detects passes from time gaps. If GPS drops out briefly (common near tree lines and buildings), this creates false pass boundaries. Consider also checking the distance between consecutive points: a 10s gap with <5m of movement isn't a real pass break.

### 4. Boundary polygon validation

`load_boundary()` doesn't validate the polygon. If the GeoJSON has an invalid geometry (self-intersecting, empty), the boundary clip in pass1 will crash. Add a `polygon.is_valid` check and attempt a `polygon.buffer(0)` fix for invalid geometries.

### 5. Report PDF generation resilience

If matplotlib fails during PDF generation (missing fonts, display issues), the whole run crashes. Wrap PDF generation in try-except and still save the cleaned CSV/TIF even if the PDF fails.

## Minor Issues

### 6. Column name normalization

Different CNH firmware versions may use different column names (e.g., 'Dry Yield' vs 'dry_yield', 'Latitude' vs 'lat'). Add a column name normalization step in `load_data()`.

### 7. Memory usage on large fields

For fields with 500K+ points, the IQR + local spatial outlier detection builds a full cKDTree. This can use significant memory on the Pi. Consider chunk-based processing for very large datasets.

### 8. Grain cart calibration factor output

When cleaning is done, output the implied calibration factor: `total_cleaned_volume / grain_cart_volume`. This helps Doug calibrate future runs. Just print it if grain cart data is available nearby.

### 9. CSV output encoding

Ensure CSV output uses UTF-8 encoding consistently. Some field names may have special characters.

## DO NOT

- Do NOT change the 5-pass cleaning algorithm structure
- Do NOT modify the already-fixed critical issues
- Do NOT change the Config class defaults without calibration data
- Do NOT add new output formats; this task is hardening only
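
As an illustration of item 3 (pass detection with GPS gaps), the combined time-and-distance check could look roughly like the sketch below. It is not taken from `CNH/yield_cleaner.py`: the function names `haversine_m` and `find_pass_breaks`, the plain-tuple input format, and the constant names are all hypothetical, and the 10 s / 5 m values are simply the example figures from item 3, not calibrated defaults.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical thresholds taken from the example in item 3. Real values
# belong in Config and should only change with calibration data.
GAP_SECONDS = 10.0      # time gap that makes a point a *candidate* pass break
MIN_MOVE_METERS = 5.0   # movement required for the gap to count as a real break

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

# One GPS fix: (timestamp in seconds, latitude in degrees, longitude in degrees)
Point = Tuple[float, float, float]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_pass_breaks(points: Sequence[Point],
                     gap_seconds: float = GAP_SECONDS,
                     min_move_m: float = MIN_MOVE_METERS) -> List[int]:
    """Return each index i where a new pass starts at points[i].

    A break requires BOTH a time gap of at least gap_seconds and a movement
    of at least min_move_m since the previous point. A long gap with almost
    no movement is treated as a GPS dropout, not a pass boundary.
    Assumes points are already sorted by timestamp.
    """
    breaks: List[int] = []
    for i in range(1, len(points)):
        t0, lat0, lon0 = points[i - 1]
        t1, lat1, lon1 = points[i]
        if t1 - t0 < gap_seconds:
            continue  # no time gap, same pass
        if haversine_m(lat0, lon0, lat1, lon1) >= min_move_m:
            breaks.append(i)  # gap plus real movement: new pass
    return breaks


if __name__ == "__main__":
    track = [
        (0.0, 41.00000, -93.0),
        (1.0, 41.00001, -93.0),
        (15.0, 41.00002, -93.0),  # 14 s gap, ~1 m moved: GPS dropout
        (16.0, 41.00003, -93.0),
        (40.0, 41.00100, -93.0),  # 24 s gap, ~108 m moved: real break
    ]
    print(find_pass_breaks(track))  # prints [4]
```

Inside the cleaner itself the same rule would presumably be applied to the DataFrame columns rather than to tuples. The point of the sketch is only the logic: a time gap alone no longer starts a new pass.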