Estimate inference memory for your edge AI workload

Memory planning for edge AI deployments determines whether a model fits on-device without swapping or OOM errors. This tool calculates VRAM and system RAM requirements across quantization levels (FP32, FP16, INT8) and hardware platforms: NVIDIA Jetson (unified memory constraints), Google Coral (on-chip SRAM limits), and Hailo-8 / Hailo-8L (on-chip buffers).


Memory Planning for Edge AI Deployments

Unified memory vs. discrete VRAM

Jetson modules use a unified memory pool shared between the CPU, GPU, and OS; there is no dedicated VRAM. On an 8 GB Orin Nano, the OS and runtime consume roughly 1.5–2 GB before inference begins, and memory planning for edge AI deployments must account for this overhead. Discrete GPU cards (e.g. a desktop RTX) keep VRAM separate from system RAM, but Jetson has no such separation: model weights, activations, and the OS all compete for the same pool.
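The budget arithmetic above can be sketched as a simple check. The 2 GB OS reserve and the container overhead figure below are assumptions for illustration, not measured values; measure your own baseline with jtop or tegrastats.

```python
# Unified-memory budget sketch for a Jetson-class SoC.
# os_reserve_gb (~1.5-2 GB) and container_overhead_gb are assumptions,
# not published specs -- validate on-device before relying on them.

def usable_inference_memory_gb(total_gb: float,
                               os_reserve_gb: float = 2.0,
                               container_overhead_gb: float = 0.0) -> float:
    """Memory left for weights + activations on a unified-memory SoC."""
    return total_gb - os_reserve_gb - container_overhead_gb

# Hypothetical 8 GB Orin Nano running inside a container (~0.1 GB assumed):
budget = usable_inference_memory_gb(8.0, os_reserve_gb=2.0,
                                    container_overhead_gb=0.1)
print(f"{budget:.1f} GB available for inference")  # prints "5.9 GB available for inference"
```

The point of the sketch: the number you size the model against is the residual budget, never the module's headline capacity.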

Quantization and activation memory

INT8 quantization reduces weight storage 4× versus FP32, but activation memory — which scales with input resolution and batch size — is computed in FP16 even in INT8 networks. This means VRAM and RAM sizing for INT8 models at 640×640 or higher resolution still requires careful accounting of activation buffers, not just weight size.
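A rough sizing sketch of the weight-versus-activation split follows. The 25M-parameter detector and the activation multiplier are hypothetical placeholders, and real activation footprints depend on the network architecture and the runtime's buffer reuse; treat this as an order-of-magnitude estimate only.

```python
# Coarse memory split: weights vs. activations.
# Assumptions (illustrative, not vendor figures):
# - weights = parameter count * bytes per weight
# - activations approximated as the FP16 input tensor (batch x 3 x H x W)
#   scaled by a multiplier standing in for intermediate feature maps

def estimate_memory_mb(params_m: float, bytes_per_weight: int,
                       height: int, width: int, batch: int,
                       activation_multiplier: float = 20.0) -> dict:
    weights_mb = params_m * 1e6 * bytes_per_weight / 1e6
    # 2 bytes per element: activations held in FP16 even for INT8 weights
    activations_mb = (batch * 3 * height * width * 2
                      * activation_multiplier) / 1e6
    return {"weights_mb": weights_mb,
            "activations_mb": activations_mb,
            "total_mb": weights_mb + activations_mb}

# Hypothetical 25M-parameter detector, INT8 weights, 640x640, batch 1:
est = estimate_memory_mb(params_m=25, bytes_per_weight=1,
                         height=640, width=640, batch=1)
# weights ~25 MB, activations ~49 MB: activations dominate despite INT8
```

Note how, under these assumptions, activation buffers exceed the quantized weights; this is exactly why INT8 at high resolution still needs careful accounting.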

Related tools

Module Power Calculator — size PSU and thermal budget alongside memory.
Inference Throughput Estimator — estimate FPS and latency once memory fit is confirmed.
Full Deployment Planner — combine memory, power, and throughput into an end-to-end edge AI BOM.

FAQ
What is unified memory on Jetson?

Jetson modules use a unified memory architecture — there is no separate VRAM. CPU processes, the OS, and GPU inference all share the same physical memory pool. This means your 8 GB Orin Nano isn't 8 GB dedicated to inference; the OS alone uses ~1.5–2 GB.

Why does INT8 not reduce memory 4×?

INT8 reduces weight storage 4× vs FP32. But activation memory — the largest component at high resolutions — is computed in FP16 even in INT8 networks. Runtime activation memory reduction is ~2×, not 4×.
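The arithmetic can be made concrete with hypothetical numbers (the 400 MB / 600 MB split is illustrative only):

```python
# Why INT8 gives ~2x total savings at runtime, not 4x.
# Hypothetical model: 400 MB FP32 weights, 600 MB FP32-equivalent
# activations at the chosen resolution (both figures illustrative).
weights_fp32, acts_fp32 = 400.0, 600.0

weights_int8 = weights_fp32 / 4   # weight storage shrinks 4x
acts_int8 = acts_fp32 / 2         # activations run in FP16: only 2x

total_fp32 = weights_fp32 + acts_fp32   # 1000 MB
total_int8 = weights_int8 + acts_int8   # 400 MB
print(total_fp32 / total_int8)          # prints 2.5
```

The more activation-dominated the workload (high resolution, large batch), the closer the overall saving sits to 2× rather than 4×.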

What is TensorRT build workspace?

During engine building with trtexec, TensorRT allocates 1–4 GB of temporary workspace for kernel selection and layer fusion. This is a one-time cost at build time; it does not consume memory during inference.

How accurate are these estimates?

±30% for activations and runtime overhead. Weights are exact (calculated from verified parameter counts). Always validate with jtop or tegrastats on device before finalising memory specifications.
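One way to apply that ±30% band in practice is to require the worst-case estimate to fit, as in this minimal sketch (the margin and the example figures are assumptions, not measurements):

```python
# Fit check under the stated +/-30% uncertainty on activations/overhead.
def fits_with_margin(estimated_gb: float, available_gb: float,
                     margin: float = 0.30) -> bool:
    """True only if the worst-case (+margin) estimate still fits."""
    return estimated_gb * (1 + margin) <= available_gb

print(fits_with_margin(4.0, 5.9))  # True:  4.0 * 1.3 = 5.2 <= 5.9
print(fits_with_margin(5.0, 5.9))  # False: 5.0 * 1.3 = 6.5 >  5.9
```

A configuration that only fits at the nominal estimate, not at +30%, should be treated as a fail until verified on hardware with jtop or tegrastats.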