
Jetson Deployment Checklist: Production Setup Steps That Prevent Failure

Last updated: March 2026

A production Jetson node needs more than a working demo. This checklist covers the setup steps that prevent storage overflow, security gaps, reboot failures, thermal issues, and silent pipeline outages in the field.

  • 5 deployment phases
  • 4–8 hr for the first node
  • 1–2 hr per node once the rollout is scripted
  • Monitoring live before the node ships

Quick Answer

Production readiness depends on completing all five phases in sequence: platform setup (JetPack, packages, GPU verification), storage and power (NVMe, Docker root, nvpmodel), security hardening (SSH keys, firewall, camera passwords), pipeline deployment (Docker, TensorRT, systemd service), and monitoring (SMART, thermal, pipeline metrics). Each phase builds on the previous; skipping earlier phases creates downstream failures that are harder to diagnose remotely.

Most deployment failures come not from the inference model itself but from skipped hardening, missing monitoring, or non-persistent runtime setup. A node that works perfectly during setup often fails within weeks of field deployment because configuration is lost on reboot, security gaps go undetected, or storage fills silently. This checklist ensures the node survives reboot, persists configuration, detects problems early, and is hardened before network exposure.

Planning Takeaway

The most common deployment mistake is treating first boot as production-ready. Real production readiness requires persistence across reboot (systemd services, /etc/fstab), observability (SMART monitoring, thermal alerts, pipeline metrics), storage durability (NVMe with write endurance), and hardening (SSH keys, firewall, no default credentials) before the node ever touches a customer network. If the node cannot survive an unexpected reboot and recover automatically with full pipeline functionality, it is not production-ready.


Production Readiness at a Glance

  • JetPack: Pinned version flashed, CUDA/TensorRT verified, packages updated
  • Storage: NVMe mounted with noatime, Docker root on NVMe, ring buffer configured
  • Security: SSH key-only auth, UFW enabled, default credentials changed on node and cameras
  • Pipeline: systemd service enabled, auto-start tested after reboot, TensorRT engine built on target hardware
  • Monitoring: SMART alerts at 80% TBW, thermal alerts at 90°C, pipeline FPS/latency metrics active

Rule: All five areas must be complete before handing a node off to operations. Partial deployments create time-bomb failure modes.

Why this matters: Edge nodes deployed under time pressure almost always have gaps in security hardening and monitoring. Security gaps surface 6–18 months later as network intrusions or data exfiltration. Missing monitoring means drive failures, thermal throttling, and pipeline stalls go undetected until a site visit confirms what logs would have shown weeks earlier.

Engineering Summary

Checklist assumptions: This checklist is sequential, and skipping earlier phases usually creates downstream failures that are harder to diagnose later. For example, skipping Phase 1 package updates can break Phase 4 TensorRT builds with CUDA incompatibility errors, while skipping Phase 2 Docker root migration often surfaces later as disk pressure and monitoring alerts.

Note: The checklist assumes a production inference node with NVMe storage, persistent network connectivity, and remote management requirements. Headless lab nodes or rapid prototyping deployments may skip some hardening steps temporarily, but production nodes need all five phases complete before handoff to operations.

Estimated Setup Time by Phase

Strategic summary: The first node takes longest because it establishes the baseline. Later nodes become faster only if setup is scripted and documented; a manual repeat of the first node's steps takes just as long.

Phase                  | First Node | With Provisioning Script | Primary Time Driver
-----------------------|------------|--------------------------|--------------------
1. Initial Setup       | 60–90 min  | 30–45 min                | JetPack flash + package updates
2. Storage & Power     | 30–60 min  | 10–15 min                | Partition layout + fstab verification
3. Security Hardening  | 45–60 min  | 10–15 min                | SSH config + firewall rules + VLAN verification
4. Pipeline Deployment | 90–180 min | 30–60 min                | TensorRT engine build (5–30 min) + end-to-end validation
5. Monitoring          | 30–60 min  | 10–20 min                | Alert configuration + documentation

Note: Estimates assume single-node setup with hardware on hand. First-time TensorRT engine builds for large models (YOLOv8l INT8) add 15–30 minutes to Phase 4.

Recommendation: If you expect to deploy more than one node, convert every manual step in this page into provisioning code after the first successful build. The time investment pays for itself on the second node and becomes a repeatable baseline for all future nodes.

Phase 1: Initial Setup

This phase establishes a known-good software baseline and confirms the node is usable before configuration work begins.

[ ] Flash JetPack via SDK Manager or sdkmanager CLI
Use the current JetPack release from the NVIDIA developer portal. Flash from a Linux host using SDK Manager, or use the pre-flashed SD card image for developer kits. Verify the JetPack version matches your target software stack.

[ ] Complete initial OS setup (first boot)
Set hostname, timezone, and locale during first-boot setup. Use a hostname that identifies the node's physical location (e.g., edgenode-warehouse-01). Avoid generic hostnames like "jetson" that create conflicts when multiple nodes appear on the same network.

[ ] Update all packages
Run sudo apt update && sudo apt upgrade -y immediately after first boot. JetPack images are rarely current with upstream security patches.

[ ] Verify CUDA, cuDNN, and TensorRT versions
Run dpkg -l | grep -E "cuda|cudnn|tensorrt" to confirm versions installed. Document these versions — they determine which model export and conversion paths are supported. Mismatched runtime versions are a common cause of inference failures.

[ ] Test GPU availability
Run python3 -c "import torch; print(torch.cuda.is_available())" (if PyTorch is installed) or run a TensorRT sample to confirm the GPU is accessible and functional.
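The Phase 1 verification steps above can be collected into one baseline script run right after first boot. This is a sketch, not an official NVIDIA tool; the package-name patterns assume a recent JetPack, and the PyTorch check is skipped gracefully if PyTorch is not installed:

```shell
#!/usr/bin/env bash
# Hypothetical Phase 1 baseline check — record runtime versions, confirm GPU access.

echo "== CUDA / cuDNN / TensorRT packages =="
dpkg -l | grep -E "cuda-toolkit|libcudnn|tensorrt" | awk '{print $2, $3}' \
  || echo "no matching packages found — check the JetPack install"

echo "== GPU reachable from PyTorch (optional) =="
python3 - <<'PY'
try:
    import torch
    print("cuda available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed; run a TensorRT sample instead")
PY
```

Save the output alongside the node's documentation (Phase 5); it becomes the reference when diagnosing runtime-version mismatches later.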

Phase 2: Storage and Power Configuration

This phase moves the node from demo storage assumptions to production durability and power behavior.

[ ] Install and format NVMe SSD
Insert the NVMe drive, partition it, and format according to your storage layout plan. For the recommended 4-partition layout, see storage layout and ring buffer design. Mount partitions persistently via /etc/fstab with noatime on the video partition.
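As a sketch, the persistent mounts might look like this in /etc/fstab. The UUIDs and mount points below are placeholders; substitute the values reported by blkid and your own layout plan:

```text
# /etc/fstab additions (sketch — UUIDs and mount points are placeholders)
UUID=1111-aaaa  /opt/docker  ext4  defaults,noatime,nofail  0  2
UUID=2222-bbbb  /var/video   ext4  defaults,noatime,nofail  0  2
```

The nofail option keeps the node bootable for remote diagnosis if the NVMe drive ever fails to enumerate; verify the mounts with sudo mount -a before rebooting.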

[ ] Move Docker root to NVMe
By default, Docker stores layers in /var/lib/docker on the OS drive. On systems with a small eMMC or SD card boot drive, this fills quickly. Update /etc/docker/daemon.json to set "data-root": "/opt/docker" pointing to the NVMe.
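One way to perform the migration, assuming the NVMe is already mounted at /opt/docker and your existing daemon.json contains only the stock NVIDIA runtime entry (merge by hand if yours has more settings):

```shell
sudo systemctl stop docker
sudo rsync -aP /var/lib/docker/ /opt/docker/        # copy existing images and layers
sudo tee /etc/docker/daemon.json >/dev/null <<'EOF'
{
  "data-root": "/opt/docker",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
sudo systemctl start docker
docker info --format '{{.DockerRootDir}}'   # should print /opt/docker
```

Keep the old /var/lib/docker until the new root is verified, then remove it to reclaim space on the boot drive.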

[ ] Configure nvpmodel power mode
Select the power mode matching your thermal design: sudo nvpmodel -m 1 for 7W mode (Orin Nano), -m 0 for max power. The nvpmodel service restores the last selected mode at boot; confirm persistence with sudo nvpmodel -q after a reboot. For thermal context, see fanless enclosure thermal constraints.

[ ] Set jetson_clocks for sustained inference performance
Run sudo jetson_clocks to lock CPU, GPU, and EMC clocks to maximum for benchmarking. The setting does not persist across reboots, so if production relies on it, run it from a systemd unit at boot. Evaluate whether jetson_clocks or the dynamic clock governor better matches your latency vs. power requirements.

[ ] Verify swap configuration
Check swap with free -h. Jetson enables zRAM swap by default. For production inference nodes where memory pressure should be managed by right-sizing hardware rather than swapping, consider disabling swap to prevent unpredictable latency spikes. See RAM sizing for edge inference for guidance on sizing correctly.
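If you decide to disable swap, the zRAM devices on Jetson are created at boot by the nvzramconfig service. A minimal sketch (verify the service name on your JetPack release before relying on it):

```shell
sudo systemctl disable nvzramconfig.service   # stop zRAM recreation at boot
sudo swapoff -a                               # release currently active swap devices
free -h                                       # the Swap: line should now read 0B
```

Reboot and re-check free -h to confirm no swap device comes back.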

Phase 3: Security Hardening

This phase closes the most common attack paths before the node is exposed to customer or enterprise networks.

[ ] Change default OS user password
The default JetPack account uses a well-known password. Change it immediately: passwd. Disable or remove the default account if a dedicated service account is used for deployment automation.

[ ] Configure SSH key authentication; disable password SSH
Add a public key to ~/.ssh/authorized_keys. Then set PasswordAuthentication no in /etc/ssh/sshd_config and restart sshd. This prevents brute-force password attacks on SSH.
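A minimal hardening sketch as a drop-in file (the filename is arbitrary; on older sshd builds without sshd_config.d support, put the directives in /etc/ssh/sshd_config instead):

```text
# /etc/ssh/sshd_config.d/99-hardening.conf (sketch)
PasswordAuthentication no
ChallengeResponseAuthentication no
PermitRootLogin no
PubkeyAuthentication yes
```

Apply with sudo sshd -t && sudo systemctl restart ssh, and keep an existing SSH session open while testing so a configuration mistake does not lock you out.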

[ ] Configure UFW firewall
Enable UFW with a default-deny policy. Allow only necessary ports: SSH (22), RTSP ingestion if cameras connect to the node on a routed path, and any application-specific management ports. Block all inbound traffic by default.
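A rule-set sketch follows; the 10.0.10.0/24 management subnet, 10.0.20.0/24 camera subnet, and port 8554 for pushed RTSP are all placeholder assumptions to adapt to your network plan:

```shell
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow from 10.0.10.0/24 to any port 22 proto tcp    # SSH from management VLAN only
sudo ufw allow from 10.0.20.0/24 to any port 8554 proto tcp  # only if cameras push RTSP to the node
sudo ufw enable
sudo ufw status verbose                                      # confirm the default-deny policy
```

If the node pulls RTSP from the cameras (the usual case), no inbound camera rule is needed at all; outbound connections are already permitted.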

[ ] Disable unnecessary services
Disable services not needed in production: sudo systemctl disable bluetooth, sudo systemctl disable avahi-daemon, sudo systemctl disable cups. Reduce attack surface by running only required services.

[ ] Configure unattended-upgrades for security patches
Install unattended-upgrades and configure it to automatically apply security updates. Review the configuration to exclude kernel updates (which may require JetPack re-flash) while allowing userspace security patches.
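A configuration excerpt as a sketch; the nvidia-l4t- blacklist entry is an assumption intended to prevent partial L4T upgrades that can require a JetPack re-flash to recover from:

```text
// /etc/apt/apt.conf.d/50unattended-upgrades (excerpt, sketch)
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
};
Unattended-Upgrade::Package-Blacklist {
    "linux-image-";
    "nvidia-l4t-";
};
```

Dry-run with sudo unattended-upgrade --dry-run --debug to confirm the blacklist is honored before leaving the node unattended.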

[ ] Change default camera credentials
Log in to each IP camera's web interface and change the default username/password. Document credentials securely. Cameras with default credentials are one of the most common entry points for network intrusions.

[ ] Verify VLAN isolation
From a device on the camera VLAN, verify that it cannot reach the management VLAN or the internet (unless explicitly permitted). From the compute node's management interface, verify that it can reach the camera VLAN for RTSP but camera devices cannot initiate connections to the management VLAN.
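The checks above can be scripted with nc and ping; every address below is a placeholder for your own VLAN plan:

```shell
# From a host on the camera VLAN — both of these SHOULD time out or be refused:
nc -vz -w 3 10.0.10.5 22        # management-VLAN SSH
ping -c 3 -W 2 1.1.1.1          # internet egress (unless explicitly permitted)

# From the node's management interface — RTSP to a camera SHOULD connect:
nc -vz -w 3 10.0.20.11 554
```

Record the pass/fail results in the node documentation; a later switch misconfiguration is much easier to spot against a known-good baseline.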

Phase 4: Inference Pipeline Deployment

This phase makes the inference application persistent, reproducible, and verifiably correct on the target hardware.

[ ] Deploy inference application via Docker
Use the NVIDIA container toolkit with --runtime nvidia for GPU access. Pin container image versions explicitly — avoid :latest tags in production. Test container startup time; long startup delays affect recovery time after node reboots.

[ ] Configure all RTSP stream URLs
Enumerate each camera's RTSP URL in the application configuration file. Use static IP assignments or DNS entries — avoid relying on mDNS or dynamic hostnames. Test each stream with ffprobe before starting the inference pipeline.
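A probing loop might look like the following sketch; the URLs and credentials are placeholders to populate from your camera inventory:

```shell
for url in \
  "rtsp://user:pass@10.0.20.11:554/stream1" \
  "rtsp://user:pass@10.0.20.12:554/stream1"; do
  echo "== $url"
  # A healthy stream prints codec, resolution, and frame rate on one CSV line
  ffprobe -v error -rtsp_transport tcp -select_streams v:0 \
    -show_entries stream=codec_name,width,height,avg_frame_rate \
    -of csv=p=0 "$url" || echo "FAILED: $url"
done
```

Forcing TCP transport (-rtsp_transport tcp) avoids UDP packet-loss artifacts that can make a working camera look broken during validation.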

[ ] Run TensorRT engine build and verify
TensorRT engines are device-specific — build engines on the target Jetson, not on a development machine. Engine build can take 5–30 minutes depending on model size. Verify engine output with a test image before integrating into the full pipeline.
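With an ONNX export copied to the node, a build might look like this trtexec sketch. The model filenames are placeholders, and note that a meaningful INT8 build additionally requires a calibration cache or a quantization-aware-trained model:

```shell
# FP16 build — the usual starting point on Jetson
/usr/src/tensorrt/bin/trtexec \
  --onnx=model.onnx \
  --fp16 \
  --saveEngine=model_fp16.engine

# Smoke-test the cached engine before wiring it into the pipeline
/usr/src/tensorrt/bin/trtexec --loadEngine=model_fp16.engine
```

Cache the resulting .engine file on the NVMe; it only needs rebuilding when the model or JetPack version changes.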

[ ] Configure ring buffer and log rotation
Set up the ring buffer cleanup process and configure logrotate. Test ring buffer behavior by filling the video partition to capacity during QA. Verify that the cleanup process runs correctly and that no recording gaps result from deletion timing.

[ ] Configure systemd service for auto-start
Create a systemd service unit for the inference application. Set Restart=on-failure and RestartSec=10s for automatic recovery. Enable the service: sudo systemctl enable inference-pipeline.service. Test by rebooting the node and verifying the pipeline starts automatically.
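A unit-file sketch follows; the container image name and tag are placeholders, and the ordering dependencies assume a Docker-based deployment:

```text
# /etc/systemd/system/inference-pipeline.service (sketch)
[Unit]
Description=Inference pipeline
After=network-online.target docker.service
Requires=docker.service
Wants=network-online.target

[Service]
ExecStart=/usr/bin/docker run --rm --name inference --runtime nvidia my-inference-app:1.4.2
ExecStop=/usr/bin/docker stop inference
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.target
```

After editing, run sudo systemctl daemon-reload && sudo systemctl enable inference-pipeline.service, then reboot to prove auto-start actually works.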

[ ] Validate inference accuracy end-to-end
Run a known test scenario through the pipeline — a person walking in frame, a vehicle of known type, or an object in a known position — and verify that detections are correct and alerts fire as expected. Do not assume the pipeline works correctly in production without end-to-end validation.

Phase 5: Monitoring and Maintenance

This phase ensures the first sign of a problem is an alert, not a site visit.

[ ] Configure SMART monitoring for NVMe
Install smartmontools and configure smartd to run weekly health checks. Log SMART attributes to a remote monitoring system. Alert when Percentage Used exceeds 80% or Available Spare drops below 20%.
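smartd's native threshold alerting for NVMe attributes is limited, so one approach is a small wear-check script run weekly from cron alongside smartd's scheduled self-tests. This is a hypothetical sketch; the commented smartctl JSON keys assume smartmontools 7.x:

```shell
#!/usr/bin/env bash
# Hypothetical weekly NVMe wear check using the checklist thresholds
# (alert at >= 80% Percentage Used or <= 20% Available Spare).
check_wear() {  # usage: check_wear <percentage_used> <available_spare>
  local used=$1 spare=$2
  if [ "$used" -ge 80 ] || [ "$spare" -le 20 ]; then
    echo "ALERT used=${used}% spare=${spare}%"
  else
    echo "OK used=${used}% spare=${spare}%"
  fi
}

# On the node, feed it values from smartctl's JSON health log:
# h=$(sudo smartctl -j -a /dev/nvme0)
# check_wear "$(jq '.nvme_smart_health_information_log.percentage_used' <<<"$h")" \
#            "$(jq '.nvme_smart_health_information_log.available_spare' <<<"$h")"
```

Route the ALERT line to your remote monitoring system (logger, webhook, or node exporter textfile) rather than leaving it in a local log.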

[ ] Set up thermal monitoring
Log junction temperatures via tegrastats or a custom script reading /sys/devices/virtual/thermal/thermal_zone*/temp. Alert when any thermal zone exceeds 90°C. Sustained throttling indicates an enclosure thermal design issue that must be resolved before leaving the node in production.
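A polling sketch for the sysfs path above; the function is hypothetical and takes the thermal directory as a parameter so it can be exercised against test data:

```shell
#!/usr/bin/env bash
# Hypothetical thermal poll; 90 °C alert threshold per the checklist.
check_thermal() {  # usage: check_thermal <thermal-sysfs-dir> [threshold_millidegC]
  local base=$1 threshold=${2:-90000} zone temp name
  for zone in "$base"/thermal_zone*; do
    [ -f "$zone/temp" ] || continue
    temp=$(cat "$zone/temp")                          # millidegrees Celsius
    name=$(cat "$zone/type" 2>/dev/null || basename "$zone")
    if [ "$temp" -ge "$threshold" ]; then
      echo "ALERT $name $((temp / 1000))C"
    fi
  done
}

# On the node:
# check_thermal /sys/devices/virtual/thermal
```

Run it from cron or a systemd timer every few minutes and forward ALERT lines to the same channel as the SMART alerts.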

[ ] Configure NUT for UPS monitoring
If a UPS is installed (recommended — see power and UPS for edge deployments), configure NUT to monitor battery state and trigger graceful shutdown when battery falls below 30% charge or 3 minutes estimated runtime.
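A NUT configuration sketch follows; the UPS name, driver, and credentials are placeholders, the override values mirror the 30% / 3-minute thresholds above, and your UPS must actually expose those battery variables for the overrides to take effect:

```text
# /etc/nut/ups.conf (sketch)
[edge-ups]
  driver = usbhid-ups
  port = auto
  override.battery.charge.low = 30
  override.battery.runtime.low = 180   # seconds

# /etc/nut/upsmon.conf (sketch)
MONITOR edge-ups@localhost 1 monuser secretpass master
SHUTDOWNCMD "/sbin/shutdown -h +0"
```

Test the full path by pulling UPS mains power during QA and confirming the node shuts down cleanly and the pipeline auto-starts when power returns.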

[ ] Establish pipeline health metrics
Export metrics from the inference pipeline: frames processed per second, inference latency (p50, p95, p99), dropped frames, missed detections on test scenes. A healthy pipeline should show stable FPS and latency over 24-hour periods with no degradation trend.

[ ] Document node configuration
Record: JetPack version, TensorRT version, model name and version, RTSP stream URLs, power mode, partition layout, NVMe model and serial number, UPS model, and network configuration (IPs, VLANs). Store this in a version-controlled configuration repository. Edge nodes deployed without documentation become unmaintainable.

Checklist Summary Table

Phase                  | Key Tasks                                                     | Risk If Skipped
-----------------------|---------------------------------------------------------------|----------------
1. Initial Setup       | Flash JetPack, update packages, verify GPU                    | Unpatched CVEs, runtime version mismatch
2. Storage & Power     | NVMe layout, Docker root, nvpmodel, swap                      | OS storage overflow, thermal throttling, latency spikes
3. Security Hardening  | SSH keys, firewall, service disable, camera passwords, VLAN verify | Network intrusion, camera takeover, data exfiltration
4. Pipeline Deployment | Docker, RTSP config, TensorRT build, ring buffer, systemd, validation | Pipeline failure on reboot, recording gaps, incorrect inference
5. Monitoring          | SMART, thermal, UPS, pipeline metrics, documentation          | Silent failures, undetected drive wear, unrecoverable outages

Common Pitfalls

  • Skipping security hardening because the node is "on a private network": Private networks get breached. A Jetson on the corporate LAN with default SSH passwords and no firewall is one lateral move away from a significant incident.
  • Not testing auto-start after reboot: A pipeline configured to start manually works fine during setup and fails silently after the first unexpected reboot. Test reboot recovery during QA, not after deployment.
  • Building TensorRT engines on a different Jetson model: TensorRT engine files are not portable between different Jetson modules. An engine built on an Orin NX will not run on an Orin Nano and vice versa. Build engines on the exact target hardware.
  • Deploying without end-to-end pipeline validation: Testing the inference model in isolation does not validate the full pipeline — RTSP ingestion, decode, pre-processing, inference, post-processing, and alert logic all need end-to-end testing with real camera input.
  • Not pinning Docker image versions: docker pull my-inference-app:latest in a production deployment script will silently pull a new image version on the next update run, potentially breaking the pipeline with incompatible changes.
  • Documenting nothing: Edge nodes deployed without configuration documentation are effectively black boxes. When the node fails 18 months later, the team has to reconstruct every configuration decision from scratch.

Final Sign-Off Checklist

  • ☐ Node rebooted and inference pipeline confirmed auto-started via systemd?
  • ☐ End-to-end pipeline validation completed with real camera input and a known test scenario?
  • ☐ SMART monitoring confirmed active with alert threshold set at 80% Percentage Used?
  • ☐ Thermal monitoring confirmed active; no thermal zones exceeding 85°C under sustained load?
  • ☐ Node configuration documented and stored in version control (JetPack version, model, NVMe serial, VLAN layout)?

Frequently Asked Questions

How long does TensorRT engine optimization take?

Engine build time depends on model size and optimization level. YOLOv8n INT8: 3–8 minutes on Orin Nano. YOLOv8l INT8: 15–30 minutes on Orin NX. Build once, cache the engine file. Rebuild only when the model changes or JetPack is updated.

Should I use Docker or run the inference application natively?

Docker is strongly recommended for production deployments. It provides dependency isolation, reproducible environments, and simplified updates via image version pinning. The NVIDIA Container Toolkit provides full GPU access from within Docker containers on Jetson.

How do I remotely access a deployed Jetson?

SSH over the management VLAN is standard. For nodes without a VPN or direct network path, a reverse SSH tunnel or a lightweight VPN (WireGuard) allows secure remote access without exposing SSH to the internet. Never expose SSH directly to a public IP.

What is the recommended JetPack upgrade procedure?

JetPack upgrades (e.g., from 6.0 to 6.1) typically require a full re-flash. Document the current configuration, snapshot any custom configuration files, re-flash the new JetPack, then re-apply configuration from your documentation. A configuration-as-code approach (Ansible, custom provisioning scripts) makes this substantially faster.

How do I handle failed inference pipeline containers?

Configure the systemd service or Docker restart policy (--restart=on-failure:5) for automatic restart. Add a watchdog process that monitors pipeline liveness (e.g., checks that frames are being processed at expected rate) and restarts the container if the pipeline stalls without crashing.

Is there a way to A/B test model versions on a deployed node?

One approach: maintain two model directories (model-a and model-b) and an environment variable in the container configuration that selects the active model. Switching models becomes a container restart with the environment variable changed — no re-deployment of the container image required.

The Bottom Line

A Jetson deployment is production-ready only when it survives reboot, writes to durable storage, runs with monitoring, and is hardened before network exposure. A model working once in the lab is not a production sign-off.
