YOLOv8 RAM Requirements on Jetson: How Much Memory Do You Need?

Last updated: March 8, 2026

Decide between 8GB, 16GB, and 32GB Jetson modules by understanding runtime memory consumption. Learn how model size, stream count, OS overhead, and unified memory constraints drive your RAM requirements.

OS baseline: 2.5–3.5 GB
Per stream: ~40–60 MB
8–16 GB: 1–8 cameras
32 GB: multi-model pipelines

Quick Answer

A YOLOv8s TensorRT INT8 engine (~50 MB file) consumes ~200–500 MB at runtime due to activation memory, input/output buffers, and pipeline state. Do not size from the ONNX file size. For 1–2 cameras with small models, 8 GB Orin Nano is sufficient. For 4–8 cameras with medium-to-large models or tracking, 16 GB Orin NX is the reliable minimum. 32 GB+ is needed for complex multi-task pipelines or high-resolution inputs.

Use tegrastats to measure actual unified memory consumption under full load. Account for OS baseline (2.5–3.5 GB), per-stream decoder overhead (~40–60 MB), activation memory, and thermal headroom.
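
To capture that number in a repeatable way, you can sample tegrastats from a script while the pipeline runs. A minimal sketch in Python, assuming the typical JetPack output format with a "RAM used/totalMB" field (verify the exact format, and whether sudo is required, on your JetPack release):

    import re
    import subprocess

    # Sample unified memory usage from tegrastats while the pipeline is under load.
    # The "RAM used/totalMB" field is an assumption based on typical JetPack output,
    # e.g. "RAM 5123/15823MB (lfb 4x2MB) ..."; verify on your JetPack version.
    RAM_RE = re.compile(r"RAM (\d+)/(\d+)MB")

    proc = subprocess.Popen(
        ["tegrastats", "--interval", "1000"],  # one sample per second
        stdout=subprocess.PIPE, text=True,
    )
    peak_mb = 0
    try:
        for line in proc.stdout:
            m = RAM_RE.search(line)
            if not m:
                continue
            used_mb, total_mb = int(m.group(1)), int(m.group(2))
            peak_mb = max(peak_mb, used_mb)
            print(f"used {used_mb} MB / {total_mb} MB (peak {peak_mb} MB)")
    except KeyboardInterrupt:
        proc.terminate()
        print(f"Peak unified memory under load: {peak_mb} MB")

Run it during a representative load test at the full deployment stream count, and size from the peak rather than the average.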

Scope of This Page

This article provides YOLOv8-specific RAM planning for Jetson devices. It covers YOLOv8 memory behavior, model variants, stream buffers, Jetson deployment scenarios, and practical sizing recommendations.

This page assumes you understand general RAM sizing concepts. For framework-agnostic guidance on pipeline memory planning, activation memory, buffering theory, and system-level considerations, see RAM Sizing for Edge AI Inference.

Use this page when deploying YOLOv8 on Jetson. Use the general guide for understanding memory planning fundamentals.

Planning Takeaway

8 GB Orin Nano: Small pipelines (1–2 streams, YOLOv8n/s, no secondary models). Easy to cool passively. Tight on headroom; not recommended for sustained multi-stream workloads or for production deployments that require thermal margin.

16 GB Orin NX: Default production choice. Handles 4–8 cameras with YOLOv8s/m, optional tracking, and secondary classifiers. Ample headroom for model updates and thermal throttling scenarios. Recommended tier for most camera deployments.

32 GB AGX Orin: Large multi-task pipelines, YOLOv8l/x, detection + re-ID + segmentation, or research workloads. Also necessary for high-resolution inputs (1280×1280+) where activation memory scales quadratically.

Who This Page Is For

  • Engineers sizing Jetson RAM for camera-based inference pipelines
  • Teams deploying multi-model pipelines (detection + tracking + classification)
  • System architects choosing between Orin Nano, NX, and AGX Orin for production
  • Developers troubleshooting memory constraints on existing deployments

RAM Usage Drivers

Total RAM consumption for a YOLOv8 inference pipeline on Jetson is the sum of six components (a budget sketch that adds them up follows the list):

  1. TensorRT engine memory: The compiled engine loaded into unified memory. This is the model weight size after INT8/FP16 optimization — smaller than the original ONNX or PyTorch file.
  2. Activation memory: Intermediate tensors produced by each layer during a forward pass. This scales with input resolution and is the largest contributor for high-resolution inputs.
  3. Input/output tensor buffers: Memory allocated for the input image tensor and output detection tensor. At 1080p with batch size 1, a 1920×1080×3 input tensor is approximately 6 MB at one byte per channel, or roughly 12 MB in FP16.
  4. Video decode buffers: Each RTSP stream requires decoded frame buffers in YUV and/or BGR format. One 1080p decoded frame is ~3–6 MB; a queue of 4–8 frames per stream multiplies this.
  5. OS and runtime baseline: JetPack Ubuntu, Docker, CUDA runtime, cuDNN, application processes — approximately 2.5–3.5 GB at idle on a typical production configuration.
  6. Pipeline state: Tracking state (ByteTrack, DeepSORT), alert history, ring buffer metadata, logging buffers — typically 100–500 MB for moderate pipelines.
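
These components can be summed into a first-pass budget before any hardware is ordered. A back-of-the-envelope sketch; every default below is an illustrative assumption taken from the mid-range figures on this page, to be replaced with measured values:

    def pipeline_ram_mb(engine_mb, activation_mb, streams,
                        per_stream_mb=50, os_baseline_mb=3000,
                        state_mb=300, secondary_models_mb=0):
        """First-pass RAM budget in MB; all defaults are assumptions, not measurements."""
        return (os_baseline_mb             # JetPack, Docker, CUDA/cuDNN/TensorRT libs
                + engine_mb                # compiled TensorRT engine weights
                + activation_mb            # intermediate tensors per forward pass
                + streams * per_stream_mb  # decoder refs + NV12 queue + preproc buffers
                + state_mb                 # tracking state, ring buffers, logging
                + secondary_models_mb)     # re-ID / classifier engines, if any

    # Example: YOLOv8s INT8 (~50 MB engine, ~225 MB activation) on 4 cameras.
    print(pipeline_ram_mb(engine_mb=50, activation_mb=225, streams=4))  # ~3775 MB

The result is a floor, not a ceiling: fragmentation, container overhead, and update headroom sit on top of it.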

YOLOv8 Model Variants

YOLOv8 comes in five sizes (n, s, m, l, x). Each has different parameter counts, ONNX file sizes, and runtime memory footprints after TensorRT INT8 compilation:

  • YOLOv8n (nano): ~3.2M params, TensorRT INT8 engine ~20–30 MB, runtime activation ~80–150 MB at 1080p
  • YOLOv8s (small): ~11M params, TensorRT INT8 engine ~40–60 MB, runtime activation ~150–300 MB at 1080p
  • YOLOv8m (medium): ~26M params, TensorRT INT8 engine ~80–120 MB, runtime activation ~300–500 MB at 1080p
  • YOLOv8l (large): ~44M params, TensorRT INT8 engine ~150–200 MB, runtime activation ~500–800 MB at 1080p
  • YOLOv8x (extra large): ~68M params, TensorRT INT8 engine ~200–300 MB, runtime activation ~700 MB – 1.2 GB at 1080p

These are per-engine figures at batch size 1. Total pipeline memory is significantly higher when OS, decode buffers, and tracking state are included.

Batch Size Impact

TensorRT engines compiled with a fixed batch size allocate activation memory proportional to the batch size. A batch size 1 engine at 300 MB activation becomes ~600 MB at batch size 2, and ~1.2 GB at batch size 4.

For real-time single-stream inference, batch size 1 is standard. For multi-stream inference where frames from multiple cameras are batched together before a single inference call, batch size N (where N = camera count) reduces per-call overhead but multiplies peak activation memory by N. On Jetson's unified memory, this tradeoff is worth profiling explicitly: batching cuts kernel-launch and scheduling overhead and can raise GPU utilization, but it significantly increases memory pressure.
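
One way to do that profiling is to compile candidate engines at each batch size and ask TensorRT how much scratch memory each one actually requests. A sketch using the TensorRT Python API's device_memory_size property (the engine filenames are assumptions):

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)

    # Compare engines compiled at different fixed batch sizes (filenames assumed).
    for path in ["yolov8s_b1.engine", "yolov8s_b4.engine"]:
        with open(path, "rb") as f:
            engine = runtime.deserialize_cuda_engine(f.read())
        # device_memory_size reports the activation/scratch bytes one execution
        # context needs, on top of the weights in the serialized engine itself.
        print(f"{path}: {engine.device_memory_size / 1e6:.0f} MB activation/scratch")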

DeepStream pipelines on Jetson handle batching internally. Configure batch-size in the DeepStream config to 1 initially, profile memory, then increase only if GPU utilization shows significant room for batching improvement.

Stream Buffers and Decoder Overhead

Each RTSP stream decoded by the Jetson NVDEC hardware decoder requires:

  • Decoder reference frame buffers: approximately 20–30 MB per stream for 1080p H.264 (more for H.265 and 4K)
  • Output surface buffers in NV12/YUV format: ~3 MB per frame × 4–8 frame queue = 12–24 MB per stream
  • Pre-processing output (resized, normalized BGR tensor): ~6 MB per frame for an 8-bit 1080p tensor; FP16 roughly doubles this

Per-stream buffer overhead: approximately 40–60 MB per 1080p stream in a typical pipeline. For 8 streams: 320–480 MB for stream buffers alone, before inference engine memory.
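
The 40–60 MB figure can be re-derived from frame geometry for your own resolutions and queue depths. A sketch of the arithmetic, with assumed defaults:

    def stream_buffers_mb(width=1920, height=1080, queue_depth=6,
                          decoder_ref_mb=25):
        """Rough per-stream buffer estimate in MB; all defaults are assumptions."""
        nv12_frame_mb = width * height * 1.5 / 1e6  # NV12: 12 bits per pixel
        preproc_mb = width * height * 3 / 1e6       # 8-bit BGR tensor, one frame
        return decoder_ref_mb + queue_depth * nv12_frame_mb + preproc_mb

    print(f"{stream_buffers_mb():.0f} MB per 1080p stream")  # ~50 MB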

OS and Runtime Overhead

A production JetPack deployment consumes approximately 2.5–3.5 GB of unified memory at idle before any YOLOv8 workload. This includes OS kernel, Docker, CUDA runtime, cuDNN, TensorRT libraries, and application overhead.

For a detailed breakdown of OS overhead components and system-level memory planning, see RAM Sizing for Edge AI Inference.

For YOLOv8 sizing: Always assume 3 GB baseline on Orin Nano. This leaves 5 GB for YOLOv8 engine, buffers, and state—a tight constraint that drives the recommendation toward 16 GB Orin NX for production.

Unified Memory Constraints on Jetson

Jetson uses a unified memory pool shared by CPU and GPU—unlike x86 + discrete GPU systems. This means GPU inference memory pressure directly reduces available RAM for the OS and application. If YOLOv8 allocates 2 GB, the OS has 2 GB less for buffers, logging, and other processes.

Key implications for YOLOv8 sizing:

  • GPU and CPU compete for the same pool: Monitor total memory with tegrastats, which reports unified memory as one pool.
  • Zero-copy preprocessing is beneficial: Use NvBufSurface APIs for efficient frame passing between decoder and YOLOv8.
  • Shared memory bandwidth: High application-side memory activity (logging, network I/O) can increase inference latency slightly.

For comprehensive explanation of unified memory architecture, see RAM Sizing for Edge AI Inference.

Scenario Sizing Table

| Scenario | Model | Cameras | Estimated RAM Usage | Recommended Jetson | Notes |
|---|---|---|---|---|---|
| Entry: presence detection | YOLOv8n INT8 | 1 | ~2.5 GB total | Orin Nano 8GB | Comfortable headroom |
| Retail: foot traffic counting | YOLOv8s INT8 | 2 | ~3.5 GB total | Orin Nano 8GB | 4 GB+ available for other tasks |
| Retail: multi-zone detection | YOLOv8s INT8 | 4 | ~4.5–5 GB total | Orin Nano 8GB (tight); validate with thermal monitoring | Marginal headroom; no room for secondary models. Sustained 4-camera load may trigger thermal throttling without active cooling; see thermal guide. |
| Warehouse: PPE detection | YOLOv8m INT8 | 4 | ~5.5 GB total | Orin NX 16GB | 8GB Nano too tight for this model size at 4 streams |
| Warehouse: 8-cam detection + tracking | YOLOv8m INT8 + ByteTrack | 8 | ~8–10 GB total | Orin NX 16GB | 16GB provides comfortable headroom |
| Smart city: detection + re-ID | YOLOv8l INT8 + re-ID model | 8 | ~12–14 GB total | AGX Orin 32GB | Multiple large models; 16GB insufficient |
| Research node: detection + segmentation | YOLOv8x + SAM FP16 | 4 | ~18–22 GB total | AGX Orin 32GB | SAM alone consumes 3–4 GB |

Secondary Models: Tracking and Classification

Most production pipelines run YOLOv8 as the primary detector followed by secondary models for tracking, classification, or re-identification. Each secondary model adds to the memory total:

  • ByteTrack / SORT (CPU-based): State-only, no neural model. Memory is tracking state: ~50 MB for 8 streams with moderate object counts.
  • DeepSORT with re-ID: Includes a neural re-ID model (ResNet18 variant: ~40–60 MB runtime, ResNet50: ~100–150 MB runtime) plus tracking state.
  • Secondary classifier (MobileNetV3): ~20–30 MB TensorRT runtime.
  • Pose estimation (YOLOv8-pose): Similar footprint to the equivalent detection model plus keypoint output tensors.

A detection + tracking + classification pipeline on 8 cameras with YOLOv8m adds approximately 300–500 MB to the base detection-only estimate. This is manageable on 16 GB Orin NX. Adding a large re-ID model pushes the requirement toward 32 GB.
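
As a rough tally of the engine and state footprints listed above (assumed mid-range values; secondary-model activations and per-detection crop queues are what push the pipeline-level addition into the ~300–500 MB range):

    # Assumed mid-range footprints from the list above, in MB.
    secondary_mb = {
        "bytetrack_state_8_streams": 50,  # CPU tracking state, no neural model
        "reid_resnet18_engine": 50,       # TensorRT re-ID engine
        "classifier_mobilenetv3": 25,     # TensorRT secondary classifier
    }
    engines_and_state = sum(secondary_mb.values())  # ~125 MB
    # Add secondary-model activation memory and per-detection crop buffers
    # (workload dependent) to reach the ~300-500 MB pipeline-level estimate.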

Common Pitfalls

  • Sizing RAM from the ONNX file size: A YOLOv8s.onnx file at 22 MB does not reflect runtime memory consumption. TensorRT compilation, activation memory, and pipeline buffers multiply this significantly. Always measure with tegrastats under load.
  • Not accounting for TensorRT workspace memory: TensorRT allocates a workspace buffer during engine optimization and a smaller runtime workspace during inference. The workspace size is configurable; larger workspaces can improve optimization but consume more RAM during the build phase (see the build sketch after this list).
  • Testing at 1 camera and deploying at 8: Memory consumption at 8 cameras is not 8× the 1-camera consumption — decoder overhead, pipeline state, and batch buffers scale differently. Profile at the actual deployment stream count before finalizing hardware.
  • Running inference in FP32 when INT8 is viable: FP32 weights and activations use 4× the memory of their INT8 equivalents. If INT8 meets your accuracy requirements, running FP32 wastes RAM that could otherwise host additional streams or models.
  • Ignoring Docker container memory overhead: Each Docker container adds 100–300 MB of container runtime overhead. Running multiple containers for separate camera groups or models multiplies this. On 8 GB Orin Nano, container overhead is a meaningful budget item.
  • Not leaving headroom for model updates: When deploying a new TensorRT engine, the old engine and new engine are both resident in memory briefly during the swap. If memory is already near-full, this causes an OOM condition during what should be a routine update.
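
To make the build-phase workspace explicit rather than implicit, cap it when compiling the engine. A minimal build sketch against the TensorRT Python API (set_memory_pool_limit exists in TensorRT 8.4+; the ONNX filename, the 2 GiB cap, and the FP16 flag are assumptions, and a production INT8 build would additionally need a calibrator):

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open("yolov8s.onnx", "rb") as f:  # filename is an assumption
        if not parser.parse(f.read()):
            raise RuntimeError(f"ONNX parse failed: {parser.get_error(0)}")

    config = builder.create_builder_config()
    # Cap build-time workspace at 2 GiB so engine compilation cannot starve
    # the rest of the unified memory pool on a small module.
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)
    config.set_flag(trt.BuilderFlag.FP16)  # INT8 would also need a calibrator

    engine_bytes = builder.build_serialized_network(network, config)
    with open("yolov8s_fp16.engine", "wb") as f:
        f.write(engine_bytes)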

FAQ

How do I measure actual YOLOv8 RAM usage on Jetson?

Run tegrastats while the pipeline is under full load. Check the "RAM" field (total unified memory usage) and process memory in top or htop. For deeper profiling, use Nsight Systems or jtop.

Is YOLOv8n INT8 accurate enough for production use?

YOLOv8n INT8 achieves mAP50-95 of approximately 34–36 on COCO — viable for presence detection and coarse object counting but not for fine-grained classification or small object detection. Validate accuracy on your specific scene and object classes before committing to a model size.

Can I run YOLOv8 on Jetson Orin Nano without TensorRT?

Yes — PyTorch and ONNX Runtime are also supported. However, TensorRT delivers 2–5× better inference throughput and lower latency than PyTorch on Jetson. For production deployments, TensorRT is strongly preferred. The conversion adds build time but pays off at runtime.

Does input resolution affect RAM usage significantly?

Yes, substantially. Activation memory scales quadratically with input resolution. A model running at 1280×1280 input processes 4× the pixels of the same model at 640×640 and uses approximately 4× the activation memory. Use the minimum input resolution that meets your detection accuracy requirements.

What happens if my pipeline exceeds available RAM?

The Linux OOM killer terminates the highest-memory process — typically the inference application. This causes a silent pipeline crash. On Jetson, the zRAM swap may absorb brief overruns but sustained swap usage causes pipeline latency to degrade severely. Monitor RSS memory and set up a watchdog for pipeline restarts.
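
A minimal watchdog along those lines can poll the pipeline's RSS and the system's MemAvailable and restart the service before the OOM killer fires. A sketch, assuming a process named yolov8_pipeline managed by a systemd unit of the same name (both names and both thresholds are placeholders):

    import subprocess
    import time

    RSS_LIMIT_KB = 6 * 1024 * 1024  # assumed 6 GB RSS ceiling for the pipeline
    MIN_AVAILABLE_KB = 512 * 1024   # assumed 512 MB system headroom floor

    def meminfo_kb(field):
        """Read a field like 'MemAvailable' from /proc/meminfo, in kB."""
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith(field + ":"):
                    return int(line.split()[1])
        raise KeyError(field)

    def pipeline_rss_kb(pid):
        """Resident set size of the inference process, in kB."""
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
        return 0

    while True:
        pid = int(subprocess.check_output(
            ["pidof", "-s", "yolov8_pipeline"]).strip())  # placeholder name
        if (pipeline_rss_kb(pid) > RSS_LIMIT_KB
                or meminfo_kb("MemAvailable") < MIN_AVAILABLE_KB):
            subprocess.run(["systemctl", "restart", "yolov8_pipeline"])
        time.sleep(10)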

Can I run multiple YOLOv8 model variants on the same Jetson simultaneously?

Yes, as long as total memory consumption fits within the unified memory pool. Running YOLOv8n on one camera group and YOLOv8m on another is a valid architecture — both engines load simultaneously into unified memory. Profile total memory usage with both engines loaded before finalizing hardware.