Case Study · SanctSound OC01 · Ocean Sentinel

The model was right.
The label was wrong.

Ocean Sentinel's acoustic CNN flagged 10 chunks from a NOAA SanctSound recording as vessel passage. The dataset said ambient. Six independent methods confirmed: the AI was right.

Part II: 560 mislabels at scale

Real accuracy after label correction: 6 misses remain out of 1,112 clean samples

Mislabeled chunks surfaced by CNN: 10

AIS vessels in range that day: 38 within 50 km

Independent verification methods: 6

HOW IT UNFOLDED

Three steps from anomaly to proof.

01

Miss analysis surfaced a cluster

Running the full v6 validation set (1,145 samples, 1,112 of them clean) through the production CNN revealed 16 misses on the clean samples, for 98.6% accuracy. But 10 of those 16 came from a single source, SanctSound OC01, and all went in the same direction: the model said ship, the label said ambient. A random failure mode scatters across sources and directions. This one clustered.
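The accuracy figures and the clustering signal reduce to simple arithmetic. The sketch below assumes each miss is recorded as a (source, predicted, label) triple; the `Miss` type is a hypothetical record layout, and the six non-OC01 misses in the example are placeholders because the text does not describe them.

```python
from collections import Counter
from typing import NamedTuple


class Miss(NamedTuple):
    """One validation miss (hypothetical record layout)."""
    source: str     # recording site, e.g. "OC01"
    predicted: str  # CNN output class
    label: str      # dataset label


def accuracy(n_samples: int, n_misses: int) -> float:
    """Fraction of samples classified correctly."""
    return (n_samples - n_misses) / n_samples


def dominant_cluster(misses: list[Miss]) -> tuple[tuple[str, str, str], float]:
    """Most common (source, predicted, label) triple and its share of all misses."""
    counts = Counter((m.source, m.predicted, m.label) for m in misses)
    key, n = counts.most_common(1)[0]
    return key, n / len(misses)


# 1,112 clean samples and 16 misses, as reported in the text.
print(f"{accuracy(1112, 16):.1%}")       # 98.6%
# If only the 10 OC01 labels are flipped to "vessel", 6 misses remain.
print(f"{accuracy(1112, 16 - 10):.1%}")  # 99.5%

# 10 OC01 misses in one direction, plus 6 placeholder misses from other sources.
misses = [Miss("OC01", "vessel", "ambient")] * 10
misses += [Miss(f"other_{i}", "unknown", "unknown") for i in range(6)]
print(dominant_cluster(misses))  # (('OC01', 'vessel', 'ambient'), 0.625)
```

A share of 10/16 from one site, all in one direction, is the kind of concentration that random errors rarely produce.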

02

Every miss traced to one recording

All 10 flagged chunks came from a single 6-hour FLAC file recorded at Olympic Coast on 2019-03-09, in two tight windows at 12:14 UTC and 12:33–12:44 UTC. Acoustic features on the mislabeled chunks showed a peak frequency of 39 Hz (inside the 5–50 Hz blade-rate band), spectral flatness of 0.17 (tonal, not broadband), and engine-band energy elevated by +2.8 dB. These are vessel acoustics, not background noise.
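Grouping flagged chunks by time is what exposes the two tight windows. A minimal sketch, assuming each flagged chunk is represented by its start time; the 5-minute gap threshold and the timestamps in the example are illustrative assumptions, not the actual chunk list.

```python
from datetime import datetime, timedelta


def group_into_windows(
    times: list[datetime], max_gap: timedelta
) -> list[tuple[datetime, datetime]]:
    """Merge detection times into (start, end) windows.

    A gap larger than max_gap between consecutive detections starts a new window.
    """
    if not times:
        return []
    ts = sorted(times)
    windows = []
    start = prev = ts[0]
    for t in ts[1:]:
        if t - prev > max_gap:
            windows.append((start, prev))
            start = t
        prev = t
    windows.append((start, prev))
    return windows


day = datetime(2019, 3, 9)
# Illustrative chunk start times only.
flagged = [day.replace(hour=12, minute=m) for m in (14, 33, 35, 38, 41, 44)]
for start, end in group_into_windows(flagged, max_gap=timedelta(minutes=5)):
    print(start.strftime("%H:%M"), "-", end.strftime("%H:%M"))
# 12:14 - 12:14
# 12:33 - 12:44
```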

03

AIS named the ship

Global Fishing Watch (GFW) vessel tracking was queried for OC01 (48.40°N, 124.70°W) for that day. 38 vessels were broadcasting AIS within 50 km. The vessel found within 10 km during the suspect 12:00–13:00 UTC window was JOSCO HUIZHOU, a Hong Kong-flagged cargo vessel, present at 12:00 UTC, in the same hour as the chunks our CNN flagged at 12:14 and 12:33–12:44 UTC. OC01 sits at the western mouth of the Strait of Juan de Fuca, one of the world's busiest shipping corridors.
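The two AIS tiers (≤ 50 km for the day, ≤ 10 km for the suspect window) are distance thresholds around the hydrophone. A sketch using the haversine great-circle formula on a spherical Earth; this is an approximation, since the text does not say which distance model the pipeline uses, and the vessel positions in the example are hypothetical.

```python
import math

OC01 = (48.40, -124.70)   # hydrophone site from the text (lat, lon; west is negative)
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def proximity_tier(pos: tuple[float, float], site: tuple[float, float] = OC01) -> str:
    """Classify a vessel position as tight (<= 10 km), day-scale (<= 50 km), or out of range."""
    d = haversine_km(site, pos)
    if d <= 10.0:
        return "tight"
    if d <= 50.0:
        return "day-scale"
    return "out of range"


# Hypothetical positions 0.05°, 0.3° and 1.0° of latitude north of OC01.
for dlat in (0.05, 0.3, 1.0):
    pos = (OC01[0] + dlat, OC01[1])
    print(f"{haversine_km(OC01, pos):6.1f} km  {proximity_tier(pos)}")
#    5.6 km  tight
#   33.4 km  day-scale
#  111.2 km  out of range
```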

ACOUSTIC FEATURES

The signature that doesn't lie.

Three acoustic features were extracted from each chunk. The mislabeled chunks consistently match known vessel acoustics rather than the ambient profile of the rest of the OC01 recording.

Feature	Mislabeled chunks	Correctly labeled ambient	Interpretation
Peak frequency	108.7 Hz	29.1 Hz	Blade-rate harmonics vs low-freq ambient
Spectral flatness	0.462	0.225	Broadband (vessel) vs tonal (ambient)
Engine-band energy	−48.2 dB	−43.5 dB	Engine-band energy 4.7 dB lower in mislabeled chunks than in ambient
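The page does not give the exact feature definitions. A plausible sketch, assuming the common ones: peak frequency as the argmax of a Welch PSD, spectral flatness as the geometric mean over the arithmetic mean of that PSD (near 0 for tonal spectra, near 1 for noise-like spectra), and band energy as the integrated PSD in dB. The default 50–500 Hz band follows the engine band defined in the PSD section; because the dB reference here is arbitrary, absolute values will not reproduce the table. The test signals are synthetic.

```python
import numpy as np
from scipy.signal import welch


def chunk_features(x: np.ndarray, fs: float, band=(50.0, 500.0), nperseg: int = 4096) -> dict:
    """Peak frequency, spectral flatness and band energy for one audio chunk."""
    f, pxx = welch(x, fs=fs, nperseg=nperseg)
    pxx = pxx + 1e-20  # guard against log(0) in empty bins
    peak_hz = float(f[np.argmax(pxx)])
    # Spectral flatness: geometric mean / arithmetic mean of the power spectrum.
    flatness = float(np.exp(np.mean(np.log(pxx))) / np.mean(pxx))
    in_band = (f >= band[0]) & (f <= band[1])
    df = f[1] - f[0]
    # Integrated band power in dB (relative, arbitrary reference).
    band_db = float(10 * np.log10(np.sum(pxx[in_band]) * df))
    return {"peak_hz": peak_hz, "flatness": flatness, "band_db": band_db}


fs = 2000.0
t = np.arange(int(60 * fs)) / fs
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 40.0 * t) + 0.01 * rng.standard_normal(t.size)  # tonal test signal
noise = rng.standard_normal(t.size)                                       # broadband test signal
print(chunk_features(tone, fs))
print(chunk_features(noise, fs))
```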

AIS CROSS-REFERENCE

The western mouth of the Strait of Juan de Fuca.

38 vessels were broadcasting AIS within 50 km of OC01 on 2019-03-09. JOSCO HUIZHOU was confirmed within 10 km at 12:00 UTC, a position physically consistent with the acoustic cluster that followed at 12:14 and 12:33–12:44 UTC.

Vessel	Class	Flag	Window (UTC)	Distance
JOSCO HUIZHOU (confirmed ≤ 10 km)	Cargo	HKG	10:00 – 15:00	≤ 10 km at 12:00 UTC
ULTRA JAGUAR	Cargo	MHL	12:00 – 16:00	≤ 50 km
HYUNDAI GRACE	Cargo	MHL	10:00 – 14:00	≤ 50 km
WIND SONG	Fishing	USA	05:00 – 17:00	≤ 50 km
VISHVA ANAND	Cargo	IND	08:00 – 12:00	≤ 50 km
MONING	Cargo	PAN	08:00 – 12:00	≤ 50 km
CAPE MCKAY / DENISE FOSS	Other	USA/CAN	All day	≤ 50 km

Verification

Six methods. One conclusion.

01

CNN classification

A tight cluster of 10 flagged chunks at 12:14 and 12:33–12:44 UTC; 560 corrections at scale (Part II)

02

Acoustic feature analysis

Peak freq, flatness, engine-band energy all match vessel profile

03

AIS day-scale (≤ 50 km)

38 vessels broadcasting, 7+ cargo active in suspect window

04

AIS tight (≤ 10 km)

JOSCO HUIZHOU confirmed within 10 km at 12:00 UTC

05

Narrow-band PSD

Peak excess of +13 dB at 28–37 Hz, consistent with cargo-propeller blade-rate harmonics

06

Listening verification

Vessel-like acoustic rumble audible in suspect window

NARROW-BAND PSD

Propeller harmonics, measured.

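A Welch power spectral density (PSD) estimate of the 12-minute suspect window was compared against a 5-minute quiet ambient control. The excess energy clusters tightly in 28–37 Hz. For cargo-ship propellers turning at 80–120 RPM with 4–6 blades, the blade-rate fundamental is (RPM ÷ 60) × blades, roughly 5.3–12 Hz, so 28–37 Hz falls on its low-order harmonics.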

Random noise is broadband, and so are most geological events. This excess is narrow-band and tonal at specific frequencies: the signature of a propeller.
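The blade-rate arithmetic can be checked directly: the blade-pass fundamental is shaft rate times blade count, f0 = (RPM / 60) × blades, and tonal lines appear at integer multiples k·f0. A small sketch; the 100 RPM, 5-blade case in the example is an illustrative midpoint of the quoted envelope, not a measured JOSCO HUIZHOU parameter.

```python
def blade_rate_hz(rpm: float, blades: int) -> float:
    """Blade-pass fundamental in Hz: shaft revolutions per second times blade count."""
    return rpm / 60.0 * blades


def harmonics_in_band(rpm: float, blades: int, lo: float, hi: float, max_order: int = 20):
    """(order, frequency) pairs of blade-rate harmonics that fall inside [lo, hi] Hz."""
    f0 = blade_rate_hz(rpm, blades)
    return [(k, k * f0) for k in range(1, max_order + 1) if lo <= k * f0 <= hi]


# Fundamental range for the 80-120 RPM x 4-6 blade envelope quoted in the text.
print(round(blade_rate_hz(80, 4), 2), blade_rate_hz(120, 6))  # 5.33 12.0
# Which harmonics of a 100 RPM, 5-blade propeller land in 28-37 Hz?
print([(k, round(f, 2)) for k, f in harmonics_in_band(100, 5, 28.0, 37.0)])  # [(4, 33.33)]
```

Because the fundamentals sit at 5.3–12 Hz, only a handful of low orders (roughly the 3rd to 7th) can land in 28–37 Hz for this envelope.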

Excess over control, blade-rate band (5–50 Hz): +4.52 dB

Excess over control, engine band (50–500 Hz): −2.02 dB

All 10 peak bins: 28–37 Hz

Highlighted bin: 31.25 Hz, +13.17 dB excess
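Per-bin and band-level excess figures like these come from comparing two PSDs on the same frequency grid. A sketch using `scipy.signal.welch`, assuming excess is the dB ratio of suspect to control power (per bin, and of summed power per band); the segment length and the synthetic 31.25 Hz test tone are assumptions chosen for illustration, not the pipeline's settings.

```python
import numpy as np
from scipy.signal import welch


def psd_excess(suspect: np.ndarray, control: np.ndarray, fs: float, nperseg: int = 8192):
    """Welch PSDs of both recordings and the per-bin excess (dB) of suspect over control."""
    f, ps = welch(suspect, fs=fs, nperseg=nperseg)
    _, pc = welch(control, fs=fs, nperseg=nperseg)  # same nperseg, so same frequency grid
    return f, ps, pc, 10 * np.log10(ps / pc)


def band_excess_db(f, ps, pc, lo: float, hi: float) -> float:
    """Excess of summed suspect power over summed control power within [lo, hi] Hz."""
    m = (f >= lo) & (f <= hi)
    return float(10 * np.log10(ps[m].sum() / pc[m].sum()))


def top_bins(f, excess_db, n: int = 10, lo: float = 5.0, hi: float = 500.0):
    """Frequencies of the n bins with the largest excess, restricted to [lo, hi] Hz."""
    m = (f >= lo) & (f <= hi)
    order = np.argsort(excess_db[m])[::-1][:n]
    return f[m][order]


fs = 2000.0
rng = np.random.default_rng(1)
control = rng.standard_normal(int(60 * fs))  # synthetic stand-in for the quiet control
t = np.arange(int(120 * fs)) / fs
suspect = rng.standard_normal(t.size) + 0.5 * np.sin(2 * np.pi * 31.25 * t)  # noise plus tone
f, ps, pc, ex = psd_excess(suspect, control, fs)
print(band_excess_db(f, ps, pc, 5, 50), band_excess_db(f, ps, pc, 50, 500))
print(top_bins(f, ex, n=3))
```

Restricting `top_bins` to a band keeps the near-DC bins, whose ratios are noisy, out of the ranking.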

Why it matters

Every passive acoustic archive can be improved.

01

Data quality at scale

SanctSound, NOAA NRS, and similar archives span thousands of hours of passive recordings. Each can be cross-referenced automatically against GFW AIS data, surfacing vessel events that human labeling missed without manual review.

02

Cross-modal verification loop

CNN flags acoustic anomaly → AIS confirms vessel presence → label is corrected. This loop runs entirely without human intervention, creating a self-improving dataset pipeline that gets more accurate over time.
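The loop can be expressed as a filter over CNN/label disagreements. A minimal sketch; `Chunk`, `correction_loop` and the `ais_confirms` callback are hypothetical names, and in practice the callback would wrap an AIS proximity check such as the ≤ 10 km, ± 30 min heuristic used in this case study.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    label: str      # current dataset label
    predicted: str  # CNN output


def correction_loop(
    chunks: Iterable[Chunk], ais_confirms: Callable[[Chunk], bool]
) -> tuple[list[Chunk], list[str]]:
    """Relabel ambient chunks as vessel when the CNN says vessel and AIS agrees."""
    corrected: list[Chunk] = []
    changed: list[str] = []
    for c in chunks:
        if c.predicted == "vessel" and c.label == "ambient" and ais_confirms(c):
            c = replace(c, label="vessel")
            changed.append(c.chunk_id)
        corrected.append(c)
    return corrected, changed


# Toy run: "b" and "c" are disagreements, but only "b" is confirmed by AIS.
chunks = [
    Chunk("a", "ambient", "ambient"),
    Chunk("b", "ambient", "vessel"),
    Chunk("c", "ambient", "vessel"),
]
corrected, changed = correction_loop(chunks, ais_confirms=lambda c: c.chunk_id == "b")
print(changed)  # ['b']
```

Unconfirmed disagreements such as "c" keep their original label, so the loop only adds vessel labels it can back with a broadcasting ship.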

03

Zero missed alerts

In production, these events are routed to Gemma for evaluation. A vessel that is broadcasting AIS and is not near a marine protected area (MPA) is graded LOW and logged: the event is recorded, so the vessel is not missed, but no false alarm reaches operators.

04

Publishable methodology

A tight AIS proximity heuristic (≤ 10 km, ± 30 min) combined with narrow-band PSD verification constitutes a reproducible, defensible method for AIS-corrected acoustic label generation.
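The heuristic itself is a two-condition filter over AIS fixes. A sketch under stated assumptions: AIS data arrives as timestamped (lat, lon) fixes, distance is haversine on a spherical Earth, and the vessel names in the example are placeholders rather than real GFW records.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class AisFix:
    vessel: str
    time: datetime
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def confirming_vessels(detection: datetime, site: tuple[float, float], fixes: list[AisFix],
                       max_km: float = 10.0,
                       max_dt: timedelta = timedelta(minutes=30)) -> list[str]:
    """Vessels with an AIS fix within max_km of the site and max_dt of the detection."""
    return sorted({
        fx.vessel for fx in fixes
        if abs(fx.time - detection) <= max_dt
        and haversine_km(site[0], site[1], fx.lat, fx.lon) <= max_km
    })


site = (48.40, -124.70)  # OC01
noon = datetime(2019, 3, 9, 12, 0)
fixes = [
    AisFix("VESSEL_A", noon, 48.45, -124.70),  # about 5.6 km from the site
    AisFix("VESSEL_B", noon, 48.70, -124.70),  # about 33 km from the site
]
print(confirming_vessels(noon + timedelta(minutes=14), site, fixes))  # ['VESSEL_A']
print(confirming_vessels(noon + timedelta(minutes=40), site, fixes))  # []
```

With a single fix at 12:00, a ± 30 min window covers a detection at 12:14 but not one at 12:40; a denser AIS track, or interpolation between fixes, determines how much of a later cluster the heuristic can confirm.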

VERDICT · AUDIT-GRADE

Six independent methods, one answer.

The dataset was wrong. Ocean Sentinel caught it.

