Retail Edge AI System: 8-Camera Jetson Orin NX Architecture
Last updated: April 2026
A practical edge-first deployment pattern for retail stores using 8 PoE IP cameras, Jetson Orin NX 16GB, local NVMe storage, and cloud-light metadata sync.
Verdict
For a standard 8-camera indoor retail deployment, Jetson Orin NX 16GB is the best-fit platform. It provides enough decode and inference headroom without jumping to the cost and power envelope of AGX Orin.
Model this configuration in System Designer and compare alternatives before committing to a hardware purchase.
Architecture Overview
Video is processed locally at the edge. Only events, metadata, clips, and dashboard summaries should be synced upstream. This keeps cloud bandwidth and storage costs under control.
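As a rough illustration of what "metadata, not video" can look like on the wire, the sketch below serializes one detection event as compact JSON. The `DetectionEvent` class, its field names, and the sample values are all hypothetical and chosen for illustration; they are not a schema defined by this architecture.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DetectionEvent:
    """Hypothetical metadata record synced upstream instead of video."""
    store_id: str
    camera_id: str
    timestamp_utc: str               # ISO 8601 string
    event_type: str                  # e.g. "queue_length_exceeded"
    object_count: int
    clip_path: Optional[str] = None  # local clip reference, uploaded only on demand


def to_upstream_payload(event: DetectionEvent) -> bytes:
    """Serialize one event as compact UTF-8 JSON for the uplink."""
    return json.dumps(asdict(event), separators=(",", ":")).encode("utf-8")


# Illustrative values only.
event = DetectionEvent(
    store_id="store-001",
    camera_id="cam-03",
    timestamp_utc="2026-04-01T10:15:00Z",
    event_type="queue_length_exceeded",
    object_count=7,
)
payload = to_upstream_payload(event)
print(len(payload), "bytes")
```

A record like this stays well under 1 KB, whereas a single 4-8 Mbps camera stream produces roughly 0.5-1 MB every second, which is why syncing events rather than video keeps uplink usage low.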
Deployment Summary
| Parameter | Value |
|---|---|
| Use case | Retail analytics, people/object detection, queue monitoring |
| Cameras | 8 PoE IP cameras |
| Resolution | 1080p |
| Frame rate | 20-30 FPS depending on model complexity |
| Latency target | 100-200 ms for local alerts |
| Retention | 7-14 days via local NVMe ring buffer (event clips or reduced-bitrate recording) |
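The ring-buffer retention policy can be sketched as a pure function that decides which recorded segments to delete. This is a minimal sketch, assuming recordings are stored as discrete segment files; the `Segment` type and `select_evictions` function are hypothetical names, and a real deployment would also need to handle the actual file deletion and any clips pinned for investigation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    name: str
    size_bytes: int
    created_ts: float  # Unix timestamp when the segment was written


def select_evictions(segments: List[Segment], capacity_bytes: int,
                     max_age_s: float, now_ts: float) -> List[str]:
    """Return names of segments to delete, oldest first.

    A segment is evicted if it is older than max_age_s, or if the
    segments still on disk would exceed capacity_bytes.
    """
    ordered = sorted(segments, key=lambda s: s.created_ts)
    total = sum(s.size_bytes for s in ordered)
    evict = []
    for seg in ordered:
        too_old = now_ts - seg.created_ts > max_age_s
        over_capacity = total > capacity_bytes
        if not (too_old or over_capacity):
            break  # every later segment is newer, and we are within capacity
        evict.append(seg.name)
        total -= seg.size_bytes
    return evict
```

Evicting strictly oldest-first keeps the retained window contiguous, so the buffer always holds the most recent footage that fits within both the age and capacity limits.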
Recommended Stack
| Layer | Recommendation |
|---|---|
| Compute | NVIDIA Jetson Orin NX 16GB |
| Network | PoE+ switch with 120-150W power budget |
| Storage | 1-2TB NVMe SSD, high-endurance preferred |
| Camera codec | H.265 preferred; H.264 acceptable |
| Cloud pattern | Send metadata/events, not continuous video |
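To sanity-check the NVMe capacity against a retention target, the sketch below estimates daily write volume and the resulting retention window. It assumes decimal units (1 TB = 1000 GB) and ignores filesystem overhead. The `duty_cycle` parameter is a hypothetical knob for the fraction of time that is actually recorded, for example when only event-triggered clips are kept.

```python
def daily_storage_gb(cameras: int, mbps_per_camera: float,
                     duty_cycle: float = 1.0) -> float:
    """Approximate GB written per day across all cameras."""
    bytes_per_second = cameras * mbps_per_camera * 1_000_000 / 8 * duty_cycle
    return bytes_per_second * 86_400 / 1e9


def retention_days(capacity_tb: float, cameras: int, mbps_per_camera: float,
                   duty_cycle: float = 1.0) -> float:
    """Days of footage that fit before the ring buffer wraps."""
    return capacity_tb * 1000 / daily_storage_gb(cameras, mbps_per_camera, duty_cycle)


# 8 cameras at 4 Mbps, recorded continuously on a 2 TB drive.
print(round(retention_days(2, 8, 4), 1))
# Same setup, recording about a quarter of the time (assumed duty cycle).
print(round(retention_days(2, 8, 4, duty_cycle=0.25), 1))
```

Under these assumptions, continuous recording of all 8 streams at 4-8 Mbps fills a 2 TB drive in under a week, so a 7-14 day window likely depends on event-triggered clips or a lower-bitrate recording stream.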
Camera Layer
Use ONVIF-compliant PoE cameras at 1080p. Typical bitrate is 4-8 Mbps per stream, placing total camera ingress around 32-64 Mbps before overhead.
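The ingress figure above is simple multiplication, and can be reproduced with a small helper. The 10% allowance for protocol overhead is an assumption for illustration, not a measured value.

```python
def aggregate_ingress_mbps(cameras: int, mbps_low: float, mbps_high: float,
                           overhead: float = 0.10):
    """Return (raw, with_overhead) ranges in Mbps as (low, high) tuples."""
    raw = (cameras * mbps_low, cameras * mbps_high)
    with_overhead = tuple(round(v * (1 + overhead), 1) for v in raw)
    return raw, with_overhead


raw, padded = aggregate_ingress_mbps(cameras=8, mbps_low=4, mbps_high=8)
print(raw)     # raw camera ingress range in Mbps
print(padded)  # same range with the assumed 10% overhead
```

Even with overhead, the total remains a small fraction of a gigabit link, so the camera network is unlikely to be the first bottleneck at this scale.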
Network Layer
Use a PoE+ switch with VLAN separation. Keep cameras isolated from management traffic and expose only the edge device to dashboard or admin networks.
Compute Layer
Jetson Orin NX is the sweet spot for 8 streams with detection and tracking. AGX Orin is safer for 12-16 cameras or multi-model workloads.
Power and Performance
| Component | Estimate |
|---|---|
| 8 cameras x ~10W | ~80W |
| PoE switch overhead | ~20W |
| Jetson Orin NX | ~15-25W |
| Total | ~115-125W |
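The component estimates in the table can be summed and checked against a switch's PoE budget as follows. This sketch assumes the Jetson runs from its own power supply rather than from PoE, so only the cameras draw on the PoE budget; the function names are hypothetical.

```python
CAMERAS = 8
CAMERA_W = 10            # per-camera estimate from the table
SWITCH_OVERHEAD_W = 20   # switch overhead estimate from the table
JETSON_W = (15, 25)      # low/high Jetson Orin NX estimate from the table


def site_power_w():
    """Return (poe_load_w, (total_low_w, total_high_w))."""
    poe_load = CAMERAS * CAMERA_W
    base = poe_load + SWITCH_OVERHEAD_W
    return poe_load, (base + JETSON_W[0], base + JETSON_W[1])


def poe_headroom_w(poe_budget_w: float, poe_load_w: float) -> float:
    """Watts left on the switch's PoE budget after the camera load."""
    return poe_budget_w - poe_load_w


poe_load, total_range = site_power_w()
print(poe_load, total_range)
print(poe_headroom_w(120, poe_load))  # headroom on a 120W PoE budget
```

Keeping some PoE headroom is useful because camera draw can rise with IR illumination or heaters, and actual per-camera power should be taken from the camera datasheet.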
Expected Performance
| Metric | Expected range |
|---|---|
| Stable stream capacity | 8 streams |
| GPU utilization | ~65-80% |
| Local alert latency | 100-200 ms |
| Thermal load | Moderate; active cooling recommended |
Bottlenecks and Failure Modes
Primary risk: model complexity. Moving from a small detection model to a heavier model can reduce stable camera capacity before bandwidth becomes the problem.
| Failure mode | What causes it | Symptom | Mitigation |
|---|---|---|---|
| Decode saturation | More streams or higher resolution | Dropped frames | Lower FPS, use H.265, upgrade to AGX Orin |
| Inference saturation | Larger model, tracking, re-ID | Latency spikes | Use smaller model, batching, ROI inference |
| Storage pressure | Continuous video retention | Write stalls, dropped clips | High-endurance NVMe, reduce retention |
| Thermal throttling | Closed enclosure, high sustained load | FPS reduction over time | Active cooling, thermal headroom, ventilation |
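The inference saturation row can be made concrete with a back-of-the-envelope frame budget. The model below is deliberately simplified: it assumes every frame of every stream is inferred one at a time, and it ignores batching, decode, tracking, and pre/post-processing cost. The per-frame timings in the usage lines are hypothetical, not benchmarks of any specific model.

```python
def per_frame_budget_ms(streams: int, fps: float) -> float:
    """Average GPU time available per frame if every frame is inferred."""
    return 1000.0 / (streams * fps)


def max_streams(model_ms_per_frame: float, fps: float,
                target_utilization: float = 0.8) -> int:
    """Streams sustainable at a given per-frame inference cost.

    target_utilization caps how much of each second is spent on inference.
    """
    return int(target_utilization * 1000.0 / (model_ms_per_frame * fps))


print(per_frame_budget_ms(8, 20))   # budget per frame at 8 streams, 20 FPS
print(max_streams(6.0, 20))         # hypothetical lighter model
print(max_streams(12.0, 20))        # hypothetical heavier model
```

Doubling the per-frame cost roughly halves the stream count in this model, which is why a heavier model tends to reduce camera capacity well before network bandwidth becomes a constraint.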
Scaling Decisions
- 4-6 cameras: consider Orin Nano Super for lower cost.
- 8 cameras: Orin NX 16GB is the default recommendation.
- 12 cameras: Orin NX can be tight; validate with target model and codec.
- 16 cameras: move to AGX Orin or split across nodes.
- Re-identification / multi-model pipelines: prefer AGX Orin class capacity.
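The scaling guidance above can be expressed as a small lookup for use in planning scripts. The list only names specific camera counts, so the boundaries used for counts in between (for example, treating 7-10 cameras as Orin NX territory and 13 or more as AGX Orin) are assumptions; `recommend_platform` is a hypothetical helper, not part of any tool.

```python
def recommend_platform(cameras: int, heavy_pipeline: bool = False) -> str:
    """Map camera count to a starting-point platform recommendation.

    heavy_pipeline covers re-identification or multi-model workloads.
    """
    if cameras < 1:
        raise ValueError("cameras must be at least 1")
    if heavy_pipeline or cameras >= 13:   # assumed boundary
        return "AGX Orin (or split across nodes)"
    if cameras >= 11:                     # assumed boundary
        return "Jetson Orin NX 16GB (tight; validate with target model and codec)"
    if cameras >= 7:                      # assumed boundary
        return "Jetson Orin NX 16GB"
    return "Jetson Orin Nano Super"


for n in (4, 8, 12, 16):
    print(n, "->", recommend_platform(n))
```

The result is only a first pass; the final choice should still be validated against the actual model, codec, and frame rate.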
Validate This Architecture With EdgeAIStack
- System Designer — platform recommendation, headroom, and alternatives.
- Network Bandwidth — camera ingress and uplink sizing.
- Storage Endurance — NVMe capacity and write-life estimation.
- Power Budget — PoE and compute power envelope checks.
FAQ
How many cameras can Jetson Orin NX handle?
For retail video analytics, Orin NX is typically a strong fit for roughly 6-10 1080p streams depending on frame rate, model size, tracking, and thermal conditions.
Is AGX Orin required for an 8-camera retail deployment?
Usually no. AGX Orin becomes more attractive beyond 10 cameras, with larger models, or with heavier multi-model pipelines.
Should retail video analytics run in the cloud?
For most stores, inference should run locally. Send metadata, alerts, and selected clips upstream instead of continuous full video.