Case Study · SanctSound OC01 · Ocean Sentinel
Ocean Sentinel's acoustic CNN flagged 10 chunks from a NOAA SanctSound recording as vessel passage. The dataset said ambient. Six independent methods confirmed: the AI was right.
HOW IT UNFOLDED
Running the full v6 validation set (1,145 samples) through the production CNN revealed 16 misses out of 1,112 clean samples — 98.6% accuracy. But 10 of those 16 came from a single source: SanctSound OC01. All in the same direction: model said ship, label said ambient. A random failure mode scatters. This clustered.
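The clustering check described above can be sketched as a simple group-by over the misclassified samples. This is a minimal illustration, not the production audit code: the field names (`source`, `predicted`, `label`) and the toy records are assumptions, and only the totals (1,112 clean samples, 16 misses, 10 of them from OC01 in the ship-versus-ambient direction) come from the text. How the other 6 misses split is invented for the example.

```python
from collections import Counter

def cluster_misses(records):
    """Count misclassifications by (source, predicted, labeled)."""
    misses = [r for r in records if r["predicted"] != r["label"]]
    return Counter((r["source"], r["predicted"], r["label"]) for r in misses)

# Toy records with hypothetical field names; not the real validation set.
records = (
    [{"source": "OC01", "predicted": "ship", "label": "ambient"}] * 10
    + [{"source": "other", "predicted": "ambient", "label": "ship"}] * 3
    + [{"source": "other", "predicted": "ship", "label": "ambient"}] * 3
    + [{"source": "other", "predicted": "ship", "label": "ship"}] * 1096
)

counts = cluster_misses(records)
total_misses = sum(counts.values())
accuracy = 1 - total_misses / len(records)
top_group, top_count = counts.most_common(1)[0]

print(f"accuracy: {accuracy:.1%}, misses: {total_misses}")
print(f"largest cluster: {top_group} with {top_count} misses")
```

A single (source, direction) pair holding most of the misses is the signal worth investigating; evenly spread counts would point to ordinary model error instead.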
All 10 flagged chunks came from a single 6-hour FLAC file recorded at Olympic Coast on 2019-03-09, in two tight windows at 12:14 UTC and 12:33–12:44 UTC. Acoustic features on the mislabeled chunks showed peak frequency at 39 Hz (engine band), spectral flatness of 0.17 (tonal, not broadband), and engine-band energy elevated by +2.8 dB. These are vessel acoustics, not background noise.
Global Fishing Watch (GFW) vessel tracking was queried for OC01 (48.40°N, 124.70°W) on that day. 38 vessels were broadcasting AIS within 50 km. Within 10 km during the suspect 12:00–13:00 UTC window was JOSCO HUIZHOU, a cargo vessel flying the Hong Kong flag, present at 12:00 UTC, shortly before our CNN first flagged the signature at 12:14 UTC. OC01 sits at the western mouth of the Strait of Juan de Fuca, one of the world's busiest shipping corridors.
ACOUSTIC FEATURES
Three acoustic features were extracted from each chunk. The mislabeled chunks consistently match known vessel acoustics rather than the ambient profile of the rest of the OC01 recording.
| Feature | Mislabeled chunks | Correct ambient | Interpretation |
|---|---|---|---|
| Peak frequency | 108.7 Hz | 29.1 Hz | Blade-rate harmonics vs low-freq ambient |
| Spectral flatness | 0.462 | 0.225 | Broadband (vessel) vs tonal (ambient) |
| Engine-band energy | −48.2 dB | −43.5 dB | −4.7 dB difference (mislabeled minus ambient) in the engine band |
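The three features in the table can be computed from a single windowed FFT per chunk. The sketch below is one plausible way to do it, not the pipeline's actual extractor: the engine-band limits (20–60 Hz), the Hann window, and the choice to express band energy relative to total power are all assumptions, since the source does not state its band edges or dB reference.

```python
import numpy as np

def acoustic_features(x, fs, engine_band=(20.0, 60.0)):
    """Return (peak frequency in Hz, spectral flatness, engine-band energy in dB).

    engine_band is a hypothetical band; band energy is relative to total power.
    """
    window = np.hanning(len(x))
    power = np.abs(np.fft.rfft(x * window)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power, freqs = power[1:], freqs[1:]  # drop the DC bin
    eps = 1e-20  # guards against log(0)

    peak_hz = float(freqs[np.argmax(power)])
    # Flatness = geometric mean / arithmetic mean of the power spectrum:
    # near 0 for a pure tone, higher for noise-like spectra.
    flatness = float(np.exp(np.mean(np.log(power + eps))) / (np.mean(power) + eps))
    in_band = (freqs >= engine_band[0]) & (freqs <= engine_band[1])
    band_db = float(10.0 * np.log10(power[in_band].sum() / power.sum() + eps))
    return peak_hz, flatness, band_db

# Synthetic check: a 39 Hz tone in light noise versus plain white noise.
rng = np.random.default_rng(0)
fs = 1000.0
t = np.arange(4000) / fs
tone = np.sin(2 * np.pi * 39.0 * t) + 0.1 * rng.standard_normal(t.size)
noise = rng.standard_normal(t.size)

tone_peak, tone_flat, tone_band = acoustic_features(tone, fs)
noise_peak, noise_flat, noise_band = acoustic_features(noise, fs)
print(tone_peak, tone_flat, tone_band)
print(noise_peak, noise_flat, noise_band)
```

On the synthetic signals, the tonal input has the lower flatness and the higher share of energy in the assumed engine band.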
AIS CROSS-REFERENCE
38 vessels were broadcasting AIS within 50 km of OC01 on 2019-03-09. JOSCO HUIZHOU was confirmed within 10 km at 12:00 UTC, a closest point of approach that is physically consistent with the acoustic cluster at 12:33 UTC.
| Vessel | Class | Flag | Window (UTC) | Distance |
|---|---|---|---|---|
| JOSCO HUIZHOU (confirmed ≤ 10 km) | Cargo | HKG | 10:00 – 15:00 | ≤ 10 km at 12:00 UTC |
| ULTRA JAGUAR | Cargo | MHL | 12:00 – 16:00 | ≤ 50 km |
| HYUNDAI GRACE | Cargo | MHL | 10:00 – 14:00 | ≤ 50 km |
| WIND SONG | Fishing | USA | 05:00 – 17:00 | ≤ 50 km |
| VISHVA ANAND | Cargo | IND | 08:00 – 12:00 | ≤ 50 km |
| MONING | Cargo | PAN | 08:00 – 12:00 | ≤ 50 km |
| CAPE MCKAY / DENISE FOSS | Other | USA/CAN | All day | ≤ 50 km |
Verification
560 corrections; tight cluster of ~10 contiguous chunks in 12:33–12:44 UTC window
Peak freq, flatness, engine-band energy all match vessel profile
38 vessels broadcasting, 7+ cargo active in suspect window
JOSCO HUIZHOU confirmed within 10 km at 12:00 UTC
+13 dB excess at 28–37 Hz, cargo propeller blade-rate harmonics
Vessel-like acoustic rumble audible in suspect window
NARROW-BAND PSD
Welch's PSD was computed on the 12-minute suspect window and compared with a 5-minute quiet ambient control. The excess energy clusters tightly in 28–37 Hz, the cargo-ship propeller blade-rate harmonic band (80–120 RPM × 4–6 blades).
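The link between the quoted shaft speeds and the 28–37 Hz band is a short piece of arithmetic: blade rate is shaft revolutions per second times the number of blades, and harmonics are integer multiples of that rate. The sketch below works it through using only the ranges given above; the function names are illustrative.

```python
def blade_rate_hz(rpm, blades):
    """Fundamental blade-passing frequency: revolutions per second times blade count."""
    return rpm / 60.0 * blades

def harmonics_overlapping(band, rpm_range, blade_range, max_harmonic=10):
    """Harmonic numbers whose frequency range overlaps the given band (Hz)."""
    lo = blade_rate_hz(rpm_range[0], blade_range[0])
    hi = blade_rate_hz(rpm_range[1], blade_range[1])
    return [n for n in range(1, max_harmonic + 1)
            if n * lo <= band[1] and n * hi >= band[0]]

fundamental = (blade_rate_hz(80, 4), blade_rate_hz(120, 6))
harmonics = harmonics_overlapping((28.0, 37.0), (80, 120), (4, 6))

print(f"fundamental blade rate: {fundamental[0]:.2f} to {fundamental[1]:.2f} Hz")
print(f"harmonics that can fall in 28-37 Hz: {harmonics}")
```

With these inputs the fundamental blade rate spans roughly 5.3 to 12 Hz, so energy at 28–37 Hz corresponds to the third through sixth harmonics rather than the fundamental itself.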
Random noise is broadband, and biological and geological events are typically broadband. This signal is narrow-band and tonal at specific frequencies, as expected from a propeller.
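The suspect-versus-control comparison can be sketched as a ratio of two Welch estimates. The version below is a minimal NumPy implementation of Welch's method (Hann window, 50% overlap, per-segment mean removal) run on synthetic data; segment length, sample rate, and the 10–100 Hz search band are assumptions, and in practice a library routine such as `scipy.signal.welch` would usually be used instead.

```python
import numpy as np

def welch_psd(x, fs, nperseg):
    """Average Hann-windowed periodograms over 50%-overlapping segments."""
    step = nperseg // 2
    window = np.hanning(nperseg)
    scale = fs * np.sum(window ** 2)
    segments = [x[i:i + nperseg] for i in range(0, len(x) - nperseg + 1, step)]
    spectra = [np.abs(np.fft.rfft((s - np.mean(s)) * window)) ** 2 / scale
               for s in segments]
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    # The one-sided factor of 2 is omitted because it cancels in the ratio below.
    return freqs, np.mean(spectra, axis=0)

def excess_db(suspect, control, fs, nperseg=4096):
    """Per-frequency excess of the suspect window over the control, in dB."""
    freqs, p_suspect = welch_psd(suspect, fs, nperseg)
    _, p_control = welch_psd(control, fs, nperseg)
    return freqs, 10.0 * np.log10(p_suspect / p_control)

# Synthetic stand-ins: noise plus a 32 Hz tone versus shorter noise-only control.
rng = np.random.default_rng(1)
fs = 500.0
t = np.arange(int(60 * fs)) / fs
suspect = rng.standard_normal(t.size) + 0.5 * np.sin(2 * np.pi * 32.0 * t)
control = rng.standard_normal(int(25 * fs))

freqs, excess = excess_db(suspect, control, fs, nperseg=1024)
search = (freqs >= 10.0) & (freqs <= 100.0)  # hypothetical search band
peak_hz = float(freqs[search][np.argmax(excess[search])])
print(f"largest excess near {peak_hz:.1f} Hz")
```

Because both windows go through the same estimator, the two recordings may differ in length; only the sample rate and segment length need to match.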
Why it matters
SanctSound, NOAA NRS, and similar archives span thousands of hours of passive recordings. Each can be cross-referenced against GFW AIS automatically — surfacing vessel events that human labeling missed, without manual review.
CNN flags acoustic anomaly → AIS confirms vessel presence → label is corrected. This loop runs entirely without human intervention, creating a self-improving dataset pipeline that gets more accurate over time.
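The loop above can be sketched as a pure function over labeled chunks. This is an illustration under stated assumptions rather than the production pipeline: `Chunk`, `correct_labels`, and the `ais_confirms` callback are hypothetical names, and the AIS check is injected so that any lookup can be plugged in.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    label: str       # label shipped with the dataset
    predicted: str   # label predicted by the CNN

def correct_labels(chunks, ais_confirms):
    """Relabel chunks where the model says 'ship', the label says 'ambient',
    and the AIS check agrees. Other chunks pass through unchanged."""
    corrected = []
    for chunk in chunks:
        if (chunk.predicted == "ship" and chunk.label == "ambient"
                and ais_confirms(chunk)):
            chunk = replace(chunk, label="ship")
        corrected.append(chunk)
    return corrected

# Toy data: only chunk "a" is both flagged by the model and confirmed by AIS.
chunks = [
    Chunk("a", label="ambient", predicted="ship"),
    Chunk("b", label="ambient", predicted="ship"),
    Chunk("c", label="ambient", predicted="ambient"),
]
confirmed_ids = {"a"}
result = correct_labels(chunks, lambda c: c.chunk_id in confirmed_ids)
print([(c.chunk_id, c.label) for c in result])
```

Requiring both the model flag and the AIS confirmation keeps the model from rewriting labels on its own prediction alone.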
In production, these events route to Gemma for evaluation. A broadcasting vessel with no MPA proximity is graded LOW and logged. No false alarms reach operators; no real vessels are missed.
A tight AIS proximity heuristic (≤ 10 km, ± 30 min) combined with narrow-band PSD verification constitutes a reproducible, defensible method for AIS-corrected acoustic label generation.
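The proximity heuristic itself reduces to a distance test and a time test per AIS fix. The sketch below uses the haversine great-circle distance with the thresholds quoted above (≤ 10 km, ± 30 min) and the OC01 coordinates from the text; the fix records, their field names, and the vessel names are hypothetical and do not reflect the GFW response format.

```python
import math
from datetime import datetime, timedelta, timezone

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def ais_matches(site, event_time, fixes, max_km=10.0, max_dt=timedelta(minutes=30)):
    """Return AIS fixes within max_km of the site and within max_dt of the event."""
    lat0, lon0 = site
    return [f for f in fixes
            if abs(f["time"] - event_time) <= max_dt
            and haversine_km(lat0, lon0, f["lat"], f["lon"]) <= max_km]

OC01 = (48.40, -124.70)
event = datetime(2019, 3, 9, 12, 33, tzinfo=timezone.utc)

# Hypothetical fixes for illustration only.
fixes = [
    {"name": "vessel_a", "lat": 48.45, "lon": -124.70,
     "time": datetime(2019, 3, 9, 12, 10, tzinfo=timezone.utc)},  # near, in window
    {"name": "vessel_b", "lat": 48.80, "lon": -124.70,
     "time": datetime(2019, 3, 9, 12, 30, tzinfo=timezone.utc)},  # too far
    {"name": "vessel_c", "lat": 48.41, "lon": -124.70,
     "time": datetime(2019, 3, 9, 9, 0, tzinfo=timezone.utc)},    # too early
]
matched = [f["name"] for f in ais_matches(OC01, event, fixes)]
print(matched)
```

Keeping the thresholds as parameters makes it straightforward to report how many matches survive when the distance or time window is tightened.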
VERDICT · AUDIT-GRADE
The dataset was wrong. Ocean Sentinel caught it.