Add in-memory snapshot cache for /api/frame.jpeg endpoint #1967
Open
adamjacobmuller wants to merge 5 commits into AlexxIT:master from adamjacobmuller:cached-snapshots
+346
−8
Conversation
Implements a high-performance snapshot caching system that dramatically
reduces latency for repeated snapshot requests from 150-2700ms to <10ms.
Problem Statement:
- Every /api/frame.jpeg request required waiting for RTSP connection,
keyframe arrival, and FFmpeg transcoding (150-2700ms total)
- Home Assistant dashboards, motion detection systems, and preview grids
generate dozens of requests per minute, causing high latency and
resource usage
Solution:
- Background consumer continuously transcodes keyframes to JPEG
- Snapshots cached in memory (~100-500KB per stream)
- Configurable idle timeout stops producer after inactivity
- Zero waste when producers already running (piggybacks existing streams)
- Cache persists in memory even after timeout for instant resumption
Architecture:
1. Stream-level cache storage (internal/streams/stream.go)
   - Thread-safe JPEG data and timestamp storage
   - RWMutex for concurrent read access
2. Background SnapshotCacher (internal/streams/snapshot_cache.go)
   - Persistent keyframe consumer with idle timeout (600s default)
   - Continuous JPEG transcoding via an injected function
   - Graceful shutdown via consumer.Stop() to unblock WriteTo
   - Automatic cleanup on idle timeout or stream termination
3. Enhanced snapshot handler (internal/mjpeg/init.go)
   - Check the cache first and serve instantly if available
   - Fall back to a fresh snapshot on cache miss or explicit client request
   - HTTP headers expose cache age/timestamp for client decisions
   - Query parameter override: ?cached=true/false
Configuration:
  mjpeg:
    snapshot_cache: true                      # Enable/disable
    snapshot_cache_timeout: 600               # Idle timeout (seconds)
    snapshot_serve_cached_by_default: false   # Serve policy
HTTP Response Headers:
  X-Snapshot-Age-Ms: 234           # Milliseconds since capture
  X-Snapshot-Timestamp: 2025-...   # ISO 8601 capture time
  X-Snapshot-Cached: true          # true if from cache
Performance:
- First request: 150-2700ms (unchanged - cold start)
- Subsequent requests: <10ms (~99% improvement)
- Memory usage: ~300KB per stream (30MB for 100 cameras)
- Works with WebRTC/HLS: zero additional overhead when consumers active
Key Implementation Details:
- Dependency injection for transcode function avoids import cycles
- WriteBuffer.WriteTo blocks until Stop() called on consumer
- Idempotent stop() via atomic.Bool prevents double cleanup
- Cache never evicted (negligible memory for typical deployments)
- Per-request policy override via query parameter
Tested:
- Multi-architecture Docker image (linux/amd64, linux/arm64)
- Verified cache updates continuously in background
- Confirmed graceful shutdown on idle timeout
- Validated 11x performance improvement in production test
Addresses panic: runtime error: slice bounds out of range [32382:2108]

The issue occurred when nuStart wasn't reset after buffer clearing, causing it to point beyond the buffer length on subsequent fragmented units.

This adds:
1. Bounds checking before writing the NAL unit size, to prevent invalid slice operations
2. A nuStart reset when the buffer is cleared, to prevent stale state

The panic typically occurred during H265 RTP stream processing when fragmentation unit (FU) state became inconsistent.
Modified the TouchSnapshotCache call to include a stream name parameter, enabling per-stream logging and diagnostics in snapshot cache operations. This improves troubleshooting and monitoring of cache behavior across multiple streams.
Key improvements to the snapshot cache implementation:

1. Enhanced logging with stream names:
   - Added a stream name field to SnapshotCacher for per-stream logging
   - All log messages now include the stream name for better diagnostics
   - Added trace-level logging for detailed troubleshooting
2. Stale cache detection and recovery:
   - Detect when cache age exceeds 2x the timeout threshold
   - Automatically restart the cacher when the cache becomes stale
   - Prevents serving outdated snapshots from stuck cachers
3. Improved lifecycle management:
   - Clear the cacher reference in the run() loop on exit for auto-restart
   - Better error handling when the consumer fails to start
   - Retain the old cached snapshot when a new cacher fails to start
4. Fixed a potential deadlock:
   - Check cache age before acquiring snapshotCacherMu
   - Prevents lock-ordering issues between cachedJPEGMu and snapshotCacherMu
5. Better observability:
   - Log bytes written on consumer errors
   - Log when WriteTo completes normally vs. with an error
   - Track timestamps with nanosecond precision in logs
   - Added a stopAndClear helper for clarity (future use)

These changes make the snapshot cache more resilient to transient producer failures and easier to debug in production.
Change most operational snapshot-cache logs from DEBUG to TRACE:
- Startup sequence (creating cacher, adding consumer, etc.)
- Run loop messages
- Cache update messages (fires on every keyframe)
- Stop/cleanup sequence

Keeps important messages at DEBUG or higher:
- Successfully started cacher
- Idle timeout events
- Warning/error conditions
Contributor

Interesting. I wonder how much this overlaps with GOP cache. I mean, if GOP cache was implemented, I suppose it could be reused for the snapshot too.
Summary
This PR adds an intelligent in-memory snapshot caching system for the /api/frame.jpeg endpoint to eliminate slow response times caused by large keyframe intervals.

Problem

The current /api/frame.jpeg implementation blocks until the next keyframe arrives. When cameras use large keyframe intervals (5-10+ seconds), snapshot requests can take that long to respond. This creates a poor user experience in Home Assistant dashboards, motion detection systems, and preview grids.

Additional benefit: reduces camera load, since multiple clients can share the same cached snapshot instead of each triggering separate connections.
Solution

Implements background snapshot caching, with cached serving opted into per request via ?cached=true (or configured globally).

Configuration
Usage
Performance Impact

Before (fresh snapshot): 150-2700ms
After (cached snapshot): <10ms

Memory overhead: ~300KB JPEG per cached stream (only while accessed)
Benefits
Implementation Details

- internal/streams/snapshot_cache.go (223 lines)
- internal/mjpeg/init.go - add caching logic to the frame handler
- internal/streams/stream.go - add cache storage fields
- pkg/h265/rtp.go - prevent panic from stale buffer pointers

Testing
Tested with:
Related Issues
Directly addresses:

May help with:
- … -ss) to /api/frame.jpeg to avoid frames being dark #1657 - dark frame issues (cache can skip initial frames)

This change is backward compatible - default behavior is unchanged unless clients opt in with ?cached=true.