Introduce OCR Handler for secret detection in images and videos #4863
Cursor Bugbot has reviewed your changes and found 2 potential issues.
		Err: fmt.Errorf("%w: OCR processing error: %v", ErrProcessingWarning, err),
	}
	h.measureLatencyAndHandleErrors(ctx, start, err, dataOrErrChan)
	return
OCR error handler sends duplicate errors to channel
Medium Severity
When OCR processing fails, the error is sent to dataOrErrChan twice: once explicitly on lines 80–82, and again on line 83 inside measureLatencyAndHandleErrors, which writes the error to the same channel. Every other handler (defaultHandler, arHandler, archiveHandler, apkHandler) relies solely on measureLatencyAndHandleErrors for error reporting. Consumers of the channel therefore receive duplicate error events. Worse, if the error is context.DeadlineExceeded, the second write wraps it differently and isFatal returns true, potentially causing unexpected early termination.
const (
	maxOCRImageSize      = 50 * 1024 * 1024  // 50 MB
	maxOCRVideoSize      = 500 * 1024 * 1024 // 500 MB
	frameIntervalSeconds = 1                 // Extract 1 frame per second.
Interval constant incorrectly used as frame rate
Low Severity
The constant frameIntervalSeconds (named as a time interval) is passed directly to ffmpeg's fps filter, which expects a frame rate (frames per second). This works by coincidence because the value is 1 (1 fps = 1 second interval), but the semantics are inverted. If someone changes the value to 2, intending one frame every 2 seconds, ffmpeg would instead extract 2 frames per second, the exact opposite of the intent.


Problem Statement
Secret Leakage Through Visual Media is a Blind Spot in Secret Scanning
Secret scanning tools today operate exclusively on text-based content — source code, config files, logs, and documents. But credentials and secrets increasingly appear in visual media: screenshots of terminal sessions, screen recordings of deployments, documentation images showing API keys, and video tutorials where dashboards with tokens are briefly visible.
These secrets are completely invisible to current scanning pipelines because image and video files are treated as opaque binaries and skipped entirely. An AWS key pasted in a screenshot committed to a repo is just as dangerous as one in a .env file, but no scanner will catch it.
Our Solution
We extend TruffleHog's scanning pipeline with an OCR-powered handler that extracts text from images (PNG, JPG, JPEG) and video frames (MP4, MKV, WEBM), then feeds it through the existing secret detection engine. Same decoders, same detectors, same verification.
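The routing step can be pictured as a lookup from MIME type to media kind, guarded by the size caps from the diff. The sketch below uses the MIME types listed in the PR overview (image/png, image/jpeg, video/mp4, video/x-matroska, video/webm) and the 50 MB and 500 MB limits; the names mediaKind, ocrMIMEKinds, and classify are hypothetical, and treating an oversized file as unsupported is an assumption about how the limits are applied.

```go
package main

import "fmt"

// mediaKind is a hypothetical classification used only in this sketch.
type mediaKind int

const (
	kindUnsupported mediaKind = iota
	kindImage
	kindVideo
)

// Size caps as shown in the reviewed diff.
const (
	maxOCRImageSize = 50 * 1024 * 1024  // 50 MB
	maxOCRVideoSize = 500 * 1024 * 1024 // 500 MB
)

// ocrMIMEKinds maps the MIME types named in the PR overview to a media kind.
var ocrMIMEKinds = map[string]mediaKind{
	"image/png":        kindImage,
	"image/jpeg":       kindImage,
	"video/mp4":        kindVideo,
	"video/x-matroska": kindVideo,
	"video/webm":       kindVideo,
}

// classify decides whether a file should go to the OCR path.
// Assumption: files above the size cap are skipped rather than truncated.
func classify(mime string, size int64) mediaKind {
	kind, ok := ocrMIMEKinds[mime]
	if !ok {
		return kindUnsupported
	}
	if kind == kindImage && size > maxOCRImageSize {
		return kindUnsupported
	}
	if kind == kindVideo && size > maxOCRVideoSize {
		return kindUnsupported
	}
	return kind
}

func main() {
	fmt.Println("png is image:", classify("image/png", 2048) == kindImage)
	fmt.Println("webm is video:", classify("video/webm", 2048) == kindVideo)
	fmt.Println("text is skipped:", classify("text/plain", 2048) == kindUnsupported)
}
```

Anything classified as unsupported would fall through to whatever handler the pipeline already uses for that file.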
Team:
@mustansir14 @MuneebUllahKhan222 @amanfcp
Key design decisions:
Accuracy Improvements
Out-of-the-box tesseract struggles with monospaced IDE/terminal fonts. We've tuned the pipeline in several ways:
Usage
Scan a directory for secrets in images and videos
Requirements:
tesseract and ffmpeg must be installed and available in PATH when --enable-ocr is set. Images work with tesseract alone; video requires both.
Challenges / Constraints
Future Improvements
Making It Production-Ready
This closes a real gap in the secret scanning landscape: secrets don't stop being secrets just because they're in a screenshot.
Checklist:
Tests passing (make test-community)?
Lint passing (make lint; this requires golangci-lint)?
Note
Medium Risk
Adds an opt-in file handler that shells out to tesseract/ffmpeg and processes large binary inputs, which introduces new runtime dependencies and potential performance/resource risks when enabled.
Overview
Adds an opt-in OCR pipeline (--enable-ocr) that extracts text from supported image/video files and feeds it through the existing secret-scanning flow.
Introduces a new ocr file handler routed by MIME type (image/png, image/jpeg, video/mp4, video/x-matroska, video/webm), implements frame extraction (1fps) and image preprocessing before calling external tesseract/ffmpeg, and adds handler tests plus README documentation for installation and usage.
Written by Cursor Bugbot for commit e54f4b4. This will update automatically on new commits.