// Edge LLM Hardware

Jetson Thor vs Jetson AGX Orin: 2026 Edge LLM Decision Framework

Q: Is Jetson AGX Thor worth the extra cost over AGX Orin 64GB for edge AI in 2026?

Only if you're running 30B+ parameter LLMs, deploying Vision-Language-Action models, or running multiple concurrent models on one device. For multi-stream vision, classical CV pipelines, and LLMs up to ~13B parameters, AGX Orin 64GB at $1,999 remains the better choice. Thor's $3,499 price, 130W sustained power, and active-cooling requirements eliminate it from battery and passive-cooling deployments.

Q: Can Jetson AGX Orin 64GB run Llama 3.3 70B locally?

Not comfortably. A 70B model at INT8 weights alone needs ~70GB before activations and KV cache. AGX Orin tops out at 64GB unified memory and lacks native FP4 hardware. Practical ceiling on AGX Orin is 13B–30B class models using GGUF Q4 quantization (e.g. Qwen2.5-32B at Q4 fits with margin). For 70B-class models, Jetson AGX Thor with its 128GB memory and NVFP4 hardware is the on-device option.

Q: What is NVFP4 and why does it matter for edge LLM deployment?

NVFP4 is NVIDIA's block-scaled 4-bit floating-point format introduced with Blackwell. It pairs FP4 weights with FP8 activations, dynamically switched at runtime by the Transformer Engine. NVFP4 is what makes a 70B-parameter model fit in 128GB of unified memory with headroom for KV cache and activations. Jetson AGX Orin (Ampere) cannot do FP4 in hardware — it tops out at INT8 and FP16.

Q: Does Jetson Thor replace AGX Orin in the product line?

No. NVIDIA positions Thor as a new tier above AGX Orin, targeting physical AI, humanoid robotics, and generative AI at the edge. AGX Orin remains in production for multi-stream perception and CV workloads where 275 TOPS at 60W is more than enough. Most existing AGX Orin deployments do not need to upgrade.

Updated May 2026

The Jetson AGX Thor's 2,070 FP4 TFLOPS and 128GB unified memory change what's possible at the edge — including running 70B-parameter LLMs locally. But the $3,499 dev kit isn't always the right answer. Here's how to decide, with quantization tradeoffs and workload-by-workload guidance.

AGX Thor $3,499

AGX Orin 64GB $1,999

7.5x AI compute

40–130W vs 15–60W

Quick Answer

Pick AGX Orin 64GB ($1,999) for multi-stream vision, classical CV pipelines, multimodal models ≤13B parameters, and any deployment already meeting latency on Orin. It's still the production workhorse.

Pick AGX Thor ($3,499) if you need 30B+ LLMs on-device, Vision-Language-Action (VLA) robotics models, multiple concurrent models with hard real-time isolation, or humanoid-robotics workloads.

The headline feature is FP4 (NVFP4). It's what lets a 70B model fit in 128GB with room for activations and KV cache. Orin cannot do FP4 in hardware. If 70B-class LLMs aren't on your roadmap, you probably don't need Thor.

Quantization assumptions: INT8 / GGUF Q4 on AGX Orin; NVFP4 / W4A16 on AGX Thor. Power figures are sustained, not peak.

Who This Page Is For

Choosing between Jetson AGX Thor and AGX Orin 64GB for a new edge AI deployment
Evaluating whether to upgrade an existing AGX Orin fleet to Thor
Sizing edge LLM workloads where quantization format and memory ceiling determine viability
Planning robotics, VLA, or humanoid-robot deployments with mixed-criticality workloads

// Architecture Changes

What Actually Changed (Aug 2025 → 2026)

In August 2025, the Jetson AGX Thor Developer Kit went generally available at $3,499. By early 2026, software updates pushed its generative-AI throughput approximately 7× above launch, with Llama 3.3 70B and DeepSeek R1 70B both running on-device. Three architectural changes matter beyond the raw spec sheet.

1. Native FP4 via the NVFP4 format

Thor is built on the Blackwell GPU architecture and ships with a next-generation Transformer Engine that dynamically switches between FP4 and FP8 at runtime. This is the change that lets a 70B model fit in 128GB of unified memory with headroom for activations and KV cache. W4A16 (4-bit weights, 16-bit activations) takes this further: NVIDIA documents W4A16 deployments of 175B-parameter models on a single Thor. Orin tops out at INT8/FP16 in hardware — FP4 cannot be replicated on Orin without significant accuracy loss.

2. MIG (Multi-Instance GPU) on an edge device

Thor introduces Multi-Instance GPU partitioning, previously a datacenter-only feature. A single GPU can be carved into isolated instances with dedicated compute and memory resources. For mixed-criticality workloads — perception, planning, control, and a foundation model on one device — MIG is the difference between "the LLM inference call starves the control loop" and guaranteed real-time response. This is the killer feature for robotics.

3. TensorRT Edge-LLM SDK (JetPack 7.1)

JetPack 7.1 introduced the TensorRT Edge-LLM SDK, an open-source C++ runtime built for Jetson-class devices without the loose memory and Python-everywhere assumptions of cloud serving stacks. EAGLE-3 speculative decoding lands here too, with measured uplift up to ~2.5× on certain models. If you're deploying LLMs to a robot or industrial control system, this is the runtime to evaluate.

// Spec Comparison

Spec Comparison: Thor vs Orin 64GB

Specs are necessary but not sufficient. The right comparison is at your power envelope and your memory ceiling, not peak TOPS.

Spec	Jetson AGX Thor	Jetson AGX Orin 64GB
GPU architecture	Blackwell	Ampere
AI compute (peak)	2,070 FP4 TFLOPS	275 INT8 TOPS
Unified memory	128 GB LPDDR5X	64 GB LPDDR5
Power envelope (sustained)	40–130 W	15–60 W
Native quantization	FP4 (NVFP4), FP8, INT8	FP16, INT8
MIG (Multi-Instance GPU)	Yes	No
Networking	4× 25 GbE	1× 10 GbE
Largest viable LLM (quantized)	~175B (W4A16)	~30B (GGUF Q4)
Dev kit price (USD)	$3,499	$1,999
Best for	Physical AI, VLA, 30B+ LLMs	Vision, classical CV, ≤13B LLMs

The power column is the line most teams underweight. Thor's 40W minimum is higher than Orin's maximum in some Orin power modes. For battery-powered robots, drone payloads, or sealed industrial enclosures with passive cooling, that single row eliminates Thor from consideration regardless of compute.

// Quantization

The Quantization Ladder

Whether a model fits on a given Jetson is almost entirely a quantization question. The ladder used for sizing:

FP16 — the baseline you almost never deploy

A 70B model at FP16 needs ~140 GB just for weights, before activations and KV cache. Doesn't fit anywhere in the Jetson line. Use FP16 only for accuracy reference and calibration.

FP8 — the safe first step

Halves memory footprint vs FP16. Properly calibrated, accuracy drop is typically <1%. Available on Thor in hardware via the Transformer Engine; not available on Orin. Sensitive workloads (math, code generation, structured reasoning) may need extra tuning — don't assume FP8 is free.

W4A16 — the unlock for large models

4-bit weights with 16-bit activations. This is what makes 175B-parameter models viable on a single Thor. Accuracy hit is workload-dependent, but for chat and general reasoning it's usually acceptable. Use TensorRT Model Optimizer or AutoAWQ to produce the quantized weights.

NVFP4 — Thor's native format

Block-scaled FP4 paired with FP8 activations, dynamically switched by the Transformer Engine. This is the headline number behind the 2,070 TFLOPS figure. The catch: model coverage is still expanding. Llama 3.3 70B, DeepSeek R1 70B, and the Nemotron family are well-supported; smaller community models may not yet have NVFP4 conversion paths.

GGUF Q4_K_M / Q5_K_M — the lingua franca for community models

If you're running llama.cpp or Ollama, GGUF quantization is what you'll actually deploy. Q4_K_M is the most common balance point. Works on Thor, Orin, and CPU-only edge boxes — but you'll leave Blackwell-specific throughput on the table by not using NVFP4 on Thor.

// Workload Mapping

Real Workloads: Which Platform to Pick

Decision rules organized by what the system actually does:

Workload	Recommended platform	Why
Single-stream 1080p object detection (YOLO11, RT-DETR)	Orin Nano 8GB	Thor and AGX Orin are both wildly overpowered; Orin Nano is cheaper and lower-power.
4–8 stream multi-camera analytics	AGX Orin 64GB	Mature, well-benchmarked, in production at scale. Don't pay 75% more for Thor unless you also need LLM workloads.
Industrial visual AI agent (VLM + perception, 7B class)	AGX Orin 64GB	Qwen2.5-VL-7B or Gemma 3 4B fits comfortably in 64GB at INT8 alongside a perception stack.
Local AI assistant at the edge (≤30B params)	Either — Orin if budget-constrained, Thor for headroom	30B models at Q4 fit in Orin 64GB; Thor enables 70B class with margin.
70B+ LLM on-device (Llama 3.3 70B, DeepSeek R1 70B)	AGX Thor	Requires NVFP4 / W4A16 + 128GB memory. Not viable on Orin.
Humanoid robot / VLA models (GR00T, OpenVLA)	AGX Thor	Foundation-model robotics is what Thor was designed for.
Multi-model concurrent pipeline (perception + LLM + planner on one device)	AGX Thor	MIG isolates workloads so the LLM call doesn't starve the control loop.
Ultra-low-power AI sensor (≤5W, battery)	Coral TPU or Hailo-8 + SBC	Neither Jetson is in scope at this power envelope. Don't force a Jetson into a battery design.
Cost-constrained smart camera (single stream, 1080p)	RK3588 + RKNN	~$150 BOM with 6 TOPS NPU is enough for many production workloads. Reserve Jetson for what actually needs it.

// Hidden Costs

Hidden Costs: Power, Thermals, Serviceability

A spec comparison that stops at TOPS-per-dollar misses the three things that kill edge deployments in the field:

Sustained power. Thor's 130W mode is sustained, not peak. Active cooling and a power supply that can deliver clean 19V/7A continuously are non-negotiable. Many industrial enclosures spec'd for Orin will not pass thermal validation for Thor.
Inrush and brown-out behavior. Battery-powered systems with Thor see significant voltage sag on inference spikes. Add a supercapacitor or step down to Orin NX / Orin Nano if you're battery-constrained.
Spare-part economics. AGX Orin modules are $1,999 to replace in the field; Thor T5000 modules are more. With 1,000 deployed units and a 2% annual failure rate, the platform-cost delta compounds quickly across a 3–5 year deployment.

// Decision Tree

A Simple Decision Tree

Is the largest model you'll run ≤13B parameters? → Orin (Nano, NX, or AGX based on stream count).
Do you need a 30B-class model and have a 60W+ power budget? → AGX Orin 64GB.
Do you need a 70B+ model, or multiple concurrent models with hard real-time guarantees? → AGX Thor.
Is your deployment power-constrained below 15W? → Not Jetson — look at Coral, Hailo-8, or RK3588.
Are you building a humanoid robot or surgical assistant? → AGX Thor. This is the workload it was designed for.

// FAQ

Frequently Asked Questions

Is Jetson AGX Thor worth the extra cost over AGX Orin 64GB?

Only if you're running 30B+ parameter LLMs, deploying Vision-Language-Action models, or running multiple concurrent models on one device. For multi-stream vision, classical CV pipelines, and LLMs up to ~13B parameters, AGX Orin 64GB at $1,999 remains the better choice. Thor's $3,499 price, 130W sustained power, and active-cooling requirements rule it out for battery and passive-cooling deployments.

Can AGX Orin 64GB run Llama 3.3 70B locally?

Not comfortably. A 70B model at INT8 weights alone needs ~70GB before activations and KV cache. AGX Orin tops out at 64GB unified memory and lacks native FP4 hardware. Practical ceiling on AGX Orin is 13B–30B class models using GGUF Q4 quantization (e.g. Qwen2.5-32B at Q4 fits with margin). For 70B-class models, Jetson AGX Thor with its 128GB memory and NVFP4 hardware is the on-device option.

What is NVFP4 and why does it matter for edge LLM deployment?

NVFP4 is NVIDIA's block-scaled 4-bit floating-point format introduced with Blackwell. It pairs FP4 weights with FP8 activations, dynamically switched at runtime by the Transformer Engine. NVFP4 is what makes a 70B-parameter model fit in 128GB of unified memory with headroom for KV cache and activations. AGX Orin (Ampere) cannot do FP4 in hardware — it tops out at INT8 and FP16.

Does Jetson Thor replace AGX Orin in the product line?

No. NVIDIA positions Thor as a new tier above AGX Orin, targeting physical AI, humanoid robotics, and generative AI at the edge. AGX Orin remains in production for multi-stream perception and CV workloads where 275 TOPS at 60W is more than enough. Most existing AGX Orin deployments do not need to upgrade.

What's the cheapest viable platform for a local 7B-parameter LLM?

Jetson Orin NX 16GB at INT8 or GGUF Q4 runs 7B-class models (Llama 3.1 8B, Qwen2.5-7B, Gemma 2 9B) at usable throughput. AGX Orin 64GB is faster but only needed if you're running the LLM alongside other heavy workloads. For a dedicated single-model deployment, Orin NX is the price/performance sweet spot.

Bottom line

If 70B-class LLMs, VLA models, or humanoid robotics aren't on your roadmap, AGX Orin 64GB remains the right answer in 2026 at less than half the cost. Thor is the upgrade path for foundation-model edge workloads — not a replacement for the existing Orin line.

Size your edge AI deployment

Use the System Designer to validate platform choice, capacity, and headroom for your workload. For pure platform comparison, run the Hardware Selector.

Open System Designer →

Jetson Thor vs Jetson AGX Orin: 2026 Edge LLM Decision Framework

Quick Answer

Who This Page Is For

What Actually Changed (Aug 2025 → 2026)

1. Native FP4 via the NVFP4 format

2. MIG (Multi-Instance GPU) on an edge device

3. TensorRT Edge-LLM SDK (JetPack 7.1)

Spec Comparison: Thor vs Orin 64GB

The Quantization Ladder

FP16 — the baseline you almost never deploy

FP8 — the safe first step

W4A16 — the unlock for large models

NVFP4 — Thor's native format

GGUF Q4_K_M / Q5_K_M — the lingua franca for community models

Real Workloads: Which Platform to Pick

Hidden Costs: Power, Thermals, Serviceability

A Simple Decision Tree

Frequently Asked Questions

Bottom line

Related Guides

Size your edge AI deployment