Case Study · Part II · OC01 audit, with receipts
Part I found ten mislabeled chunks at OC01 and named the ship. Three months later we rebuilt the verification stack three times and ended at the only source no one could argue with: NOAA's MarineCadastre AIS archive, second-level vessel positions. The answer is sharper than we expected — and far more interesting.
THREE AUDITS, ONE TRUTH
We ran v7.6 (per-site calibrated, 96.4% honest test accuracy) over the OC01 corpus to verify the relabel. It agreed with every single correction. Then we noticed: v7.6 was trained on those exact corrected labels. The agreement is memorisation, not verification. We threw the result out.
The second audit used no neural network. We computed spectral flatness, blade-band energy (5–50 Hz), and engine-band energy (50–500 Hz) on every chunk. The 560 "ship" chunks looked QUIETER than the 240 "ambient" chunks on every feature, which is the wrong direction. At first we thought the corrections were wrong. They weren't. The four recordings have different absolute-dB baselines, and the "ambient control" recording itself contains vessels. Pure spectral comparison across recordings is unreliable.
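As a rough illustration of the three features, here is a minimal sketch in plain Python. The function names, the naive DFT, and the synthetic 20 Hz tone are assumptions made for illustration, not the audit's actual code; a real pipeline would more likely run an FFT library over the recorded audio.

```python
import cmath
import math

def power_spectrum(x, fs):
    """Return (freqs, power) for the one-sided DFT of x, computed naively."""
    n = len(x)
    # Precompute the n complex roots of unity once; index them by (k * i) % n.
    tw = [cmath.exp(-2j * math.pi * j / n) for j in range(n)]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(x[i] * tw[(k * i) % n] for i in range(n))
        freqs.append(k * fs / n)
        power.append(abs(acc) ** 2 / n)
    return freqs, power

def band_energy(freqs, power, lo, hi):
    """Sum power over bins whose frequency f satisfies lo <= f < hi."""
    return sum(p for f, p in zip(freqs, power) if lo <= f < hi)

def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    logs = [math.log(p + eps) for p in power]
    geo = math.exp(sum(logs) / len(logs))
    arith = sum(power) / len(power) + eps
    return geo / arith

def chunk_features(x, fs):
    """Compute the three per-chunk features named in the text."""
    freqs, power = power_spectrum(x, fs)
    return {
        "blade_band": band_energy(freqs, power, 5, 50),    # 5-50 Hz
        "engine_band": band_energy(freqs, power, 50, 500),  # 50-500 Hz
        "flatness": spectral_flatness(power),
    }

# Synthetic input (hypothetical): a pure 20 Hz tone sampled at 1024 Hz.
fs = 1024
n = 512
tone = [math.sin(2 * math.pi * 20 * i / fs) for i in range(n)]
feats = chunk_features(tone, fs)
```

Band energies scale with recording gain while flatness is a ratio, so absolute band energies in particular are only comparable between recordings that share a baseline.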
The third audit went to AIS. NOAA Office for Coastal Management publishes second-level AIS positions for US waters. For the 48-hour window within 100 km of OC01, that is 464,413 pings. We recomputed every chunk's nearest vessel within ±2 minutes of the chunk timestamp. No chunk's nearest vessel is farther than 10 km. OC01 sits at the western mouth of the Strait of Juan de Fuca, the busiest shipping corridor on the US Pacific coast. "Ambient" at this site is a labeling convention, not an acoustic ground truth.
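The per-chunk lookup described above can be sketched roughly as follows. This is a minimal illustration, assuming pings are already filtered to the area of interest and loaded as dictionaries; the field names, the hydrophone coordinates, the MMSI values, and the date are placeholders, not the real OC01 position or archive schema.

```python
import math
from datetime import datetime, timedelta, timezone

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_vessel(chunk_time, hydrophone, pings, window=timedelta(minutes=2)):
    """Return (distance_km, mmsi) of the closest ping inside the time window, or None."""
    best = None
    for p in pings:
        if abs(p["time"] - chunk_time) > window:
            continue  # ping is outside the +/- 2 minute window
        d = haversine_km(hydrophone[0], hydrophone[1], p["lat"], p["lon"])
        if best is None or d < best[0]:
            best = (d, p["mmsi"])
    return best

# Placeholder data (hypothetical, for illustration only).
hydrophone = (48.0, -125.0)
chunk_time = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
pings = [
    # Inside the window, about 5.56 km north of the hydrophone.
    {"mmsi": "111111111", "time": chunk_time + timedelta(seconds=90),
     "lat": 48.05, "lon": -125.0},
    # Closer in space, but 10 minutes away in time, so it is ignored.
    {"mmsi": "222222222", "time": chunk_time + timedelta(minutes=10),
     "lat": 48.01, "lon": -125.0},
]
result = nearest_vessel(chunk_time, hydrophone, pings)
```

The time filter runs before the distance comparison, so a vessel that passes closer but outside the window never wins.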
PART I, CONFIRMED TO THE SECOND
Part I claimed JOSCO HUIZHOU was within 10 km at 12:00 UTC, based on a coarse GFW daily query. NOAA's MarineCadastre archive has 698 AIS pings of the same vessel that day, accurate to the second. The closest pass was 6.81 km, at 12:39:55 UTC — inside the CNN flag window (12:14–12:44 UTC). The original case study was right.

Source: MarineCadastre.gov AIS archive · MMSI 477133400 · HKG-flagged cargo · 698 position pings
THE FINDING
Every one of the 800 chunks, under both labels, has a vessel within 10 km, and so does every minute of the 48-hour audit window. The "ambient control" recording was selected to be quiet relative to known vessel passages, not relative to absolute silence. Absolute silence doesn't exist here.

CPA DISTRIBUTION
Binary ship/ambient labels at this site lose information. The right encoding is a CPA (closest point of approach) distance band per chunk, attached to the named vessel that produced it.
Most chunks land between 6 and 10 km. The current "ship" labels overlap with "ambient" in this distance range — both contain audible passages.
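One way to encode a distance band per chunk is sketched below. Reusing the sweep radii from the methodology as band edges is an assumption, as are the function name, the record layout, and the chunk id; the vessel name, MMSI, and 6.81 km closest pass come from the case study itself.

```python
from bisect import bisect_left

# Assumed band edges in km, borrowed from the hourly sweep radii.
BAND_EDGES_KM = [1, 2, 3, 5, 7, 10, 15, 25, 50]

def cpa_band(distance_km):
    """Return the (lower, upper] band containing distance_km.

    upper is None when the distance is beyond the last edge.
    """
    i = bisect_left(BAND_EDGES_KM, distance_km)
    lower = 0 if i == 0 else BAND_EDGES_KM[i - 1]
    upper = BAND_EDGES_KM[i] if i < len(BAND_EDGES_KM) else None
    return lower, upper

# Illustrative label record; "chunk_id" is a hypothetical identifier.
label = {
    "chunk_id": "example-chunk",
    "vessel": "JOSCO HUIZHOU",
    "mmsi": 477133400,
    "cpa_km": 6.81,
    "cpa_band_km": cpa_band(6.81),
}
```

A band keeps the distance information that a binary label discards while staying coarse enough to tolerate positioning error.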

WHAT THE MODEL ATTENDS TO — AND WHAT IT DOESN'T
We ran Grad-CAM on v7.6's last convolutional block. The CNN attends primarily to cavitation broadband (500–1000 Hz) and the lower engine band (80–250 Hz). It has no access to the blade-rate band (5–50 Hz): preprocessing masks it before the model sees the spectrogram, by design, to prevent the model from learning hydrophone-specific low-frequency noise.
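The masking step might look something like the sketch below. The actual preprocessing is not shown in this write-up, so the function name, the list-of-rows spectrogram layout, the zero fill value, and the 50 Hz cutoff are all assumptions for illustration.

```python
def mask_below(spec, bin_freqs_hz, cutoff_hz=50.0, fill=0.0):
    """Return a copy of spec with every frequency row below cutoff_hz replaced by fill.

    spec is a list of rows, one row per frequency bin, one value per time frame.
    bin_freqs_hz gives the centre frequency of each row.
    """
    return [
        [fill] * len(row) if f < cutoff_hz else list(row)
        for row, f in zip(spec, bin_freqs_hz)
    ]

# Tiny hypothetical spectrogram: 3 frequency bins by 2 time frames.
spec = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
bin_freqs_hz = [10.0, 40.0, 120.0]
masked = mask_below(spec, bin_freqs_hz)
```

Because the rows are overwritten before the model sees them, no gradient-based attribution can land on that band.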
Part I's case study found the JOSCO HUIZHOU signature in the blade-rate band via Welch's PSD on raw audio. The CNN cannot see that band. Yet both methods, plus MarineCadastre AIS positioning, converge on the same vessel. Three independent signal pathways (acoustic broadband, acoustic narrowband, and radio position) agree on ground truth. That's how robust detection systems are built.

Methodology · open source
Hourly bbox sweep at 1/2/3/5/7/10/15/25/50 km. This brackets each vessel's CPA upper bound per hour, enough to identify which named vessels were near the hydrophone and when.
NOAA Office for Coastal Management public archive. Filter to the bbox, compute the distance per AIS message, then find the per-chunk nearest vessel within ±2 minutes.
128-bin mel spectrograms, blade-band, engine-band, and spectral flatness. No neural network, which is why the cross-recording calibration drift is visible.
The reproducible artifacts are per-chunk CPA, named-vessel attribution, and acoustic-feature JSON. Together they make each label correction traceable back to a timestamped vessel position and a measured signal profile.
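The bracketing logic of the hourly bbox sweep can be sketched as below. This assumes the query results for one hour are already available as a mapping from sweep radius to the set of vessels seen in that box; the function name and the vessel names are hypothetical. If each box is a square of half-width r, a vessel in a corner can be up to about 1.41 times r away, so the bound is approximate.

```python
SWEEP_RADII_KM = [1, 2, 3, 5, 7, 10, 15, 25, 50]

def cpa_upper_bounds(hits_by_radius):
    """Map each vessel to the smallest sweep radius at which it was seen.

    hits_by_radius: dict of radius_km -> set of vessel names for one hour.
    """
    bounds = {}
    for r in SWEEP_RADII_KM:  # ascending, so the first hit is the tightest bound
        for vessel in hits_by_radius.get(r, ()):
            bounds.setdefault(vessel, r)
    return bounds

# Hypothetical results for one hour.
hits = {
    5: {"VESSEL_A"},
    10: {"VESSEL_A", "VESSEL_B"},
    25: {"VESSEL_A", "VESSEL_B", "VESSEL_C"},
}
bounds = cpa_upper_bounds(hits)
```

Walking the radii in ascending order means each vessel keeps the first, and therefore smallest, radius at which it appears.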
METHODOLOGY CONTRIBUTION
Not "we found 560 corrections." We started with that claim and walked it back as the data demanded. What we have instead is sharper — and probably more useful for the next archive revision.
What we're offering
MarineCadastre.gov