
Jetson Deployment Checklist: Production Setup Steps That Prevent Failure

Last updated: March 2026

A production Jetson node needs more than a working demo. This checklist covers the setup steps that prevent storage overflow, security gaps, reboot failures, thermal issues, and silent pipeline outages in the field.

  • 5 deployment phases
  • 4–8 hr for the first node
  • 1–2 hr per node once the rollout is scripted
  • Monitoring live before the node ships

Quick Answer

Production readiness depends on completing all five phases in sequence: platform setup (JetPack, packages, GPU verification), storage and power (NVMe, Docker root, nvpmodel), security hardening (SSH keys, firewall, camera passwords), pipeline deployment (Docker, TensorRT, systemd service), and monitoring (SMART, thermal, pipeline metrics). Each phase builds on the previous; skipping earlier phases creates downstream failures that are harder to diagnose remotely.

Most deployment failures come not from the inference model itself but from skipped hardening, missing monitoring, or non-persistent runtime setup. A node that works perfectly during setup often fails within weeks of field deployment because configuration is lost on reboot, security gaps go undetected, or storage fills silently. This checklist ensures the node survives reboot, persists configuration, detects problems early, and is hardened before network exposure.

Planning Takeaway

The most common deployment mistake is treating first boot as production-ready. Real production readiness requires persistence across reboot (systemd services, /etc/fstab), observability (SMART monitoring, thermal alerts, pipeline metrics), storage durability (NVMe with write endurance), and hardening (SSH keys, firewall, no default credentials) before the node ever touches a customer network. If the node cannot survive an unexpected reboot and recover automatically with full pipeline functionality, it is not production-ready.


Production Readiness at a Glance

  • JetPack: Pinned version flashed, CUDA/TensorRT verified, packages updated
  • Storage: NVMe mounted with noatime, Docker root on NVMe, ring buffer configured
  • Security: SSH key-only auth, UFW enabled, default credentials changed on node and cameras
  • Pipeline: systemd service enabled, auto-start tested after reboot, TensorRT engine built on target hardware
  • Monitoring: SMART alerts at 80% TBW, thermal alerts at 90°C, pipeline FPS/latency metrics active

Rule: All five areas must be complete before handing a node off to operations. Partial deployments create time-bomb failure modes.

Why this matters: Edge nodes deployed under time pressure almost always have gaps in security hardening and monitoring. Security gaps surface 6–18 months later as network intrusions or data exfiltration. Missing monitoring means drive failures, thermal throttling, and pipeline stalls go undetected until a site visit confirms what logs would have shown weeks earlier.

Engineering Summary

Checklist assumptions: This checklist is sequential, and skipping earlier phases usually creates downstream failures that are harder to diagnose later. For example, skipping Phase 1 package updates can break Phase 4 TensorRT builds with CUDA incompatibility errors, while skipping Phase 2 Docker root migration often surfaces later as disk pressure and monitoring alerts.

Note: The checklist assumes a production inference node with NVMe storage, persistent network connectivity, and remote management requirements. Headless lab nodes or rapid prototyping deployments may skip some hardening steps temporarily, but production nodes need all five phases complete before handoff to operations.

Estimated Setup Time by Phase

Strategic summary: The first node takes longest because it establishes the baseline. Later nodes become faster only if setup is scripted and documented; a manual repeat of the first node's steps takes just as long.

Phase                  | First Node | With Provisioning Script | Primary Time Driver
-----------------------|------------|--------------------------|--------------------
1. Initial Setup       | 60–90 min  | 30–45 min                | JetPack flash + package updates
2. Storage & Power     | 30–60 min  | 10–15 min                | Partition layout + fstab verification
3. Security Hardening  | 45–60 min  | 10–15 min                | SSH config + firewall rules + VLAN verification
4. Pipeline Deployment | 90–180 min | 30–60 min                | TensorRT engine build (5–30 min) + end-to-end validation
5. Monitoring          | 30–60 min  | 10–20 min                | Alert configuration + documentation

Note: Estimates assume single-node setup with hardware on hand. First-time TensorRT engine builds for large models (YOLOv8l INT8) add 15–30 minutes to Phase 4.

Recommendation: If you expect to deploy more than one node, convert every manual step in this page into provisioning code after the first successful build. The time investment pays for itself on the second node and becomes a repeatable baseline for all future nodes.

Phase 1: Initial Setup

This phase establishes a known-good software baseline and confirms the node is usable before configuration work begins.

[ ] Flash JetPack via SDK Manager or sdkmanager CLI
Use the current JetPack release from the NVIDIA developer portal. Flash from a Linux host using SDK Manager, or use the pre-flashed SD card image for developer kits. Verify the JetPack version matches your target software stack.

[ ] Complete initial OS setup (first boot)
Set hostname, timezone, and locale during first-boot setup. Use a hostname that identifies the node's physical location (e.g., edgenode-warehouse-01). Avoid generic hostnames like "jetson" that create conflicts when multiple nodes appear on the same network.

[ ] Update all packages
Run sudo apt update && sudo apt upgrade -y immediately after first boot. JetPack images are rarely current with upstream security patches.

[ ] Verify CUDA, cuDNN, and TensorRT versions
Run dpkg -l | grep -E "cuda|cudnn|tensorrt" to confirm versions installed. Document these versions — they determine which model export and conversion paths are supported. Mismatched runtime versions are a common cause of inference failures.

[ ] Test GPU availability
Run python3 -c "import torch; print(torch.cuda.is_available())" (if PyTorch is installed) or run a TensorRT sample to confirm the GPU is accessible and functional.
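The Phase 1 verification steps above can be collected into one baseline script run right after first boot. This is a sketch, not an official NVIDIA tool; the package-name patterns assume a recent JetPack, and the PyTorch check is skipped gracefully if PyTorch is not installed:

```shell
#!/usr/bin/env bash
# Hypothetical Phase 1 baseline check — record runtime versions, confirm GPU access.

echo "== CUDA / cuDNN / TensorRT packages =="
dpkg -l | grep -E "cuda-toolkit|libcudnn|tensorrt" | awk '{print $2, $3}' \
  || echo "no matching packages found — check the JetPack install"

echo "== GPU reachable from PyTorch (optional) =="
python3 - <<'PY'
try:
    import torch
    print("cuda available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed; run a TensorRT sample instead")
PY
```

Save the output alongside the node's documentation (Phase 5); it becomes the reference when diagnosing runtime-version mismatches later.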

Phase 2: Storage and Power Configuration

This phase moves the node from demo storage assumptions to production durability and power behavior.

[ ] Install and format NVMe SSD
Insert the NVMe drive, partition it, and format according to your storage layout plan. For the recommended 4-partition layout, see storage layout and ring buffer design. Mount partitions persistently via /etc/fstab with noatime on the video partition.
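As a sketch, the persistent mounts might look like this in /etc/fstab. The UUIDs and mount points below are placeholders; substitute the values reported by blkid and your own layout plan:

```text
# /etc/fstab additions (sketch — UUIDs and mount points are placeholders)
UUID=1111-aaaa  /opt/docker  ext4  defaults,noatime,nofail  0  2
UUID=2222-bbbb  /var/video   ext4  defaults,noatime,nofail  0  2
```

The nofail option keeps the node bootable for remote diagnosis if the NVMe drive ever fails to enumerate; verify the mounts with sudo mount -a before rebooting.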

[ ] Move Docker root to NVMe
By default, Docker stores layers in /var/lib/docker on the OS drive. On systems with a small eMMC or SD card boot drive, this fills quickly. Update /etc/docker/daemon.json to set "data-root": "/opt/docker" pointing to the NVMe.
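One way to perform the migration, assuming the NVMe is already mounted at /opt/docker and your existing daemon.json contains only the stock NVIDIA runtime entry (merge by hand if yours has more settings):

```shell
sudo systemctl stop docker
sudo rsync -aP /var/lib/docker/ /opt/docker/        # copy existing images and layers
sudo tee /etc/docker/daemon.json >/dev/null <<'EOF'
{
  "data-root": "/opt/docker",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
sudo systemctl start docker
docker info --format '{{.DockerRootDir}}'   # should print /opt/docker
```

Keep the old /var/lib/docker until the new root is verified, then remove it to reclaim space on the boot drive.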

[ ] Configure nvpmodel power mode
Select the power mode matching your thermal design: sudo nvpmodel -m 1 for 7W mode (Orin Nano), -m 0 for max power. The nvpmodel service restores the last selected mode at boot; confirm persistence with sudo nvpmodel -q after a reboot. For thermal context, see fanless enclosure thermal constraints.

[ ] Set jetson_clocks for sustained inference performance
Run sudo jetson_clocks to lock CPU, GPU, and EMC clocks to maximum for benchmarking. The setting does not persist across reboots, so if production relies on it, run it from a systemd unit at boot. Evaluate whether jetson_clocks or the dynamic clock governor better matches your latency vs. power requirements.

[ ] Verify swap configuration
Check swap with free -h. Jetson enables zRAM swap by default. For production inference nodes where memory pressure should be managed by right-sizing hardware rather than swapping, consider disabling swap to prevent unpredictable latency spikes. See RAM sizing for edge inference for guidance on sizing correctly.
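If you decide to disable swap, the zRAM devices on Jetson are created at boot by the nvzramconfig service. A minimal sketch (verify the service name on your JetPack release before relying on it):

```shell
sudo systemctl disable nvzramconfig.service   # stop zRAM recreation at boot
sudo swapoff -a                               # release currently active swap devices
free -h                                       # the Swap: line should now read 0B
```

Reboot and re-check free -h to confirm no swap device comes back.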

Phase 3: Security Hardening

This phase closes the most common attack paths before the node is exposed to customer or enterprise networks.

[ ] Change default OS user password
The default JetPack account uses a well-known password. Change it immediately: passwd. Disable or remove the default account if a dedicated service account is used for deployment automation.

[ ] Configure SSH key authentication; disable password SSH
Add a public key to ~/.ssh/authorized_keys. Then set PasswordAuthentication no in /etc/ssh/sshd_config and restart sshd. This prevents brute-force password attacks on SSH.
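A minimal hardening sketch as a drop-in file (the filename is arbitrary; on older sshd builds without sshd_config.d support, put the directives in /etc/ssh/sshd_config instead):

```text
# /etc/ssh/sshd_config.d/99-hardening.conf (sketch)
PasswordAuthentication no
ChallengeResponseAuthentication no
PermitRootLogin no
PubkeyAuthentication yes
```

Apply with sudo sshd -t && sudo systemctl restart ssh, and keep an existing SSH session open while testing so a configuration mistake does not lock you out.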

[ ] Configure UFW firewall
Enable UFW with a default-deny policy. Allow only necessary ports: SSH (22), RTSP ingestion if cameras connect to the node on a routed path, and any application-specific management ports. Block all inbound traffic by default.
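A rule-set sketch follows; the 10.0.10.0/24 management subnet, 10.0.20.0/24 camera subnet, and port 8554 for pushed RTSP are all placeholder assumptions to adapt to your network plan:

```shell
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow from 10.0.10.0/24 to any port 22 proto tcp    # SSH from management VLAN only
sudo ufw allow from 10.0.20.0/24 to any port 8554 proto tcp  # only if cameras push RTSP to the node
sudo ufw enable
sudo ufw status verbose                                      # confirm the default-deny policy
```

If the node pulls RTSP from the cameras (the usual case), no inbound camera rule is needed at all; outbound connections are already permitted.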

[ ] Disable unnecessary services
Disable services not needed in production: sudo systemctl disable bluetooth, sudo systemctl disable avahi-daemon, sudo systemctl disable cups. Reduce attack surface by running only required services.

[ ] Configure unattended-upgrades for security patches
Install unattended-upgrades and configure it to automatically apply security updates. Review the configuration to exclude kernel updates (which may require JetPack re-flash) while allowing userspace security patches.
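A configuration excerpt as a sketch; the nvidia-l4t- blacklist entry is an assumption intended to prevent partial L4T upgrades that can require a JetPack re-flash to recover from:

```text
// /etc/apt/apt.conf.d/50unattended-upgrades (excerpt, sketch)
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
};
Unattended-Upgrade::Package-Blacklist {
    "linux-image-";
    "nvidia-l4t-";
};
```

Dry-run with sudo unattended-upgrade --dry-run --debug to confirm the blacklist is honored before leaving the node unattended.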

[ ] Change default camera credentials
Log in to each IP camera's web interface and change the default username/password. Document credentials securely. Cameras with default credentials are one of the most common entry points for network intrusions.

[ ] Verify VLAN isolation
From a device on the camera VLAN, verify that it cannot reach the management VLAN or the internet (unless explicitly permitted). From the compute node's management interface, verify that it can reach the camera VLAN for RTSP but camera devices cannot initiate connections to the management VLAN.
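The checks above can be scripted with nc and ping; every address below is a placeholder for your own VLAN plan:

```shell
# From a host on the camera VLAN — both of these SHOULD time out or be refused:
nc -vz -w 3 10.0.10.5 22        # management-VLAN SSH
ping -c 3 -W 2 1.1.1.1          # internet egress (unless explicitly permitted)

# From the node's management interface — RTSP to a camera SHOULD connect:
nc -vz -w 3 10.0.20.11 554
```

Record the pass/fail results in the node documentation; a later switch misconfiguration is much easier to spot against a known-good baseline.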

Phase 4: Inference Pipeline Deployment

This phase makes the inference application persistent, reproducible, and verifiably correct on the target hardware.

[ ] Deploy inference application via Docker
Use the NVIDIA container toolkit with --runtime nvidia for GPU access. Pin container image versions explicitly — avoid :latest tags in production. Test container startup time; long startup delays affect recovery time after node reboots.

[ ] Configure all RTSP stream URLs
Enumerate each camera's RTSP URL in the application configuration file. Use static IP assignments or DNS entries — avoid relying on mDNS or dynamic hostnames. Test each stream with ffprobe before starting the inference pipeline.
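A probing loop might look like the following sketch; the URLs and credentials are placeholders to populate from your camera inventory:

```shell
for url in \
  "rtsp://user:pass@10.0.20.11:554/stream1" \
  "rtsp://user:pass@10.0.20.12:554/stream1"; do
  echo "== $url"
  # A healthy stream prints codec, resolution, and frame rate on one CSV line
  ffprobe -v error -rtsp_transport tcp -select_streams v:0 \
    -show_entries stream=codec_name,width,height,avg_frame_rate \
    -of csv=p=0 "$url" || echo "FAILED: $url"
done
```

Forcing TCP transport (-rtsp_transport tcp) avoids UDP packet-loss artifacts that can make a working camera look broken during validation.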

[ ] Run TensorRT engine build and verify
TensorRT engines are device-specific — build engines on the target Jetson, not on a development machine. Engine build can take 5–30 minutes depending on model size. Verify engine output with a test image before integrating into the full pipeline.
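With an ONNX export copied to the node, a build might look like this trtexec sketch. The model filenames are placeholders, and note that a meaningful INT8 build additionally requires a calibration cache or a quantization-aware-trained model:

```shell
# FP16 build — the usual starting point on Jetson
/usr/src/tensorrt/bin/trtexec \
  --onnx=model.onnx \
  --fp16 \
  --saveEngine=model_fp16.engine

# Smoke-test the cached engine before wiring it into the pipeline
/usr/src/tensorrt/bin/trtexec --loadEngine=model_fp16.engine
```

Cache the resulting .engine file on the NVMe; it only needs rebuilding when the model or JetPack version changes.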

[ ] Configure ring buffer and log rotation
Set up the ring buffer cleanup process and configure logrotate. Test ring buffer behavior by filling the video partition to capacity during QA. Verify that the cleanup process runs correctly and that no recording gaps result from deletion timing.

[ ] Configure systemd service for auto-start
Create a systemd service unit for the inference application. Set Restart=on-failure and RestartSec=10s for automatic recovery. Enable the service: sudo systemctl enable inference-pipeline.service. Test by rebooting the node and verifying the pipeline starts automatically.
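A unit-file sketch follows; the container image name and tag are placeholders, and the ordering dependencies assume a Docker-based deployment:

```text
# /etc/systemd/system/inference-pipeline.service (sketch)
[Unit]
Description=Inference pipeline
After=network-online.target docker.service
Requires=docker.service
Wants=network-online.target

[Service]
ExecStart=/usr/bin/docker run --rm --name inference --runtime nvidia my-inference-app:1.4.2
ExecStop=/usr/bin/docker stop inference
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.target
```

After editing, run sudo systemctl daemon-reload && sudo systemctl enable inference-pipeline.service, then reboot to prove auto-start actually works.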

[ ] Validate inference accuracy end-to-end
Run a known test scenario through the pipeline — a person walking in frame, a vehicle of known type, or an object in a known position — and verify that detections are correct and alerts fire as expected. Do not assume the pipeline works correctly in production without end-to-end validation.

Phase 5: Monitoring and Maintenance

This phase ensures the first sign of a problem is an alert, not a site visit.

[ ] Configure SMART monitoring for NVMe
Install smartmontools and configure smartd to run weekly health checks. Log SMART attributes to a remote monitoring system. Alert when Percentage Used exceeds 80% or Available Spare drops below 20%.
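smartd's native threshold alerting for NVMe attributes is limited, so one approach is a small wear-check script run weekly from cron alongside smartd's scheduled self-tests. This is a hypothetical sketch; the commented smartctl JSON keys assume smartmontools 7.x:

```shell
#!/usr/bin/env bash
# Hypothetical weekly NVMe wear check using the checklist thresholds
# (alert at >= 80% Percentage Used or <= 20% Available Spare).
check_wear() {  # usage: check_wear <percentage_used> <available_spare>
  local used=$1 spare=$2
  if [ "$used" -ge 80 ] || [ "$spare" -le 20 ]; then
    echo "ALERT used=${used}% spare=${spare}%"
  else
    echo "OK used=${used}% spare=${spare}%"
  fi
}

# On the node, feed it values from smartctl's JSON health log:
# h=$(sudo smartctl -j -a /dev/nvme0)
# check_wear "$(jq '.nvme_smart_health_information_log.percentage_used' <<<"$h")" \
#            "$(jq '.nvme_smart_health_information_log.available_spare' <<<"$h")"
```

Route the ALERT line to your remote monitoring system (logger, webhook, or node exporter textfile) rather than leaving it in a local log.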

[ ] Set up thermal monitoring
Log junction temperatures via tegrastats or a custom script reading /sys/devices/virtual/thermal/thermal_zone*/temp. Alert when any thermal zone exceeds 90°C. Sustained throttling indicates an enclosure thermal design issue that must be resolved before leaving the node in production.
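A polling sketch for the sysfs path above; the function is hypothetical and takes the thermal directory as a parameter so it can be exercised against test data:

```shell
#!/usr/bin/env bash
# Hypothetical thermal poll; 90 °C alert threshold per the checklist.
check_thermal() {  # usage: check_thermal <thermal-sysfs-dir> [threshold_millidegC]
  local base=$1 threshold=${2:-90000} zone temp name
  for zone in "$base"/thermal_zone*; do
    [ -f "$zone/temp" ] || continue
    temp=$(cat "$zone/temp")                          # millidegrees Celsius
    name=$(cat "$zone/type" 2>/dev/null || basename "$zone")
    if [ "$temp" -ge "$threshold" ]; then
      echo "ALERT $name $((temp / 1000))C"
    fi
  done
}

# On the node:
# check_thermal /sys/devices/virtual/thermal
```

Run it from cron or a systemd timer every few minutes and forward ALERT lines to the same channel as the SMART alerts.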

[ ] Configure NUT for UPS monitoring
If a UPS is installed (recommended — see power and UPS for edge deployments), configure NUT to monitor battery state and trigger graceful shutdown when battery falls below 30% charge or 3 minutes estimated runtime.
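A NUT configuration sketch follows; the UPS name, driver, and credentials are placeholders, the override values mirror the 30% / 3-minute thresholds above, and your UPS must actually expose those battery variables for the overrides to take effect:

```text
# /etc/nut/ups.conf (sketch)
[edge-ups]
  driver = usbhid-ups
  port = auto
  override.battery.charge.low = 30
  override.battery.runtime.low = 180   # seconds

# /etc/nut/upsmon.conf (sketch)
MONITOR edge-ups@localhost 1 monuser secretpass master
SHUTDOWNCMD "/sbin/shutdown -h +0"
```

Test the full path by pulling UPS mains power during QA and confirming the node shuts down cleanly and the pipeline auto-starts when power returns.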

[ ] Establish pipeline health metrics
Export metrics from the inference pipeline: frames processed per second, inference latency (p50, p95, p99), dropped frames, missed detections on test scenes. A healthy pipeline should show stable FPS and latency over 24-hour periods with no degradation trend.

[ ] Document node configuration
Record: JetPack version, TensorRT version, model name and version, RTSP stream URLs, power mode, partition layout, NVMe model and serial number, UPS model, and network configuration (IPs, VLANs). Store this in a version-controlled configuration repository. Edge nodes deployed without documentation become unmaintainable.

Checklist Summary Table

Phase                  | Key Tasks                                                     | Risk If Skipped
-----------------------|---------------------------------------------------------------|----------------
1. Initial Setup       | Flash JetPack, update packages, verify GPU                    | Unpatched CVEs, runtime version mismatch
2. Storage & Power     | NVMe layout, Docker root, nvpmodel, swap                      | OS storage overflow, thermal throttling, latency spikes
3. Security Hardening  | SSH keys, firewall, service disable, camera passwords, VLAN verify | Network intrusion, camera takeover, data exfiltration
4. Pipeline Deployment | Docker, RTSP config, TensorRT build, ring buffer, systemd, validation | Pipeline failure on reboot, recording gaps, incorrect inference
5. Monitoring          | SMART, thermal, UPS, pipeline metrics, documentation          | Silent failures, undetected drive wear, unrecoverable outages

Common Pitfalls

  • Skipping security hardening because the node is "on a private network": Private networks get breached. A Jetson on the corporate LAN with default SSH passwords and no firewall is one lateral move away from a significant incident.
  • Not testing auto-start after reboot: A pipeline configured to start manually works fine during setup and fails silently after the first unexpected reboot. Test reboot recovery during QA, not after deployment.
  • Building TensorRT engines on a different Jetson model: TensorRT engine files are not portable between different Jetson modules. An engine built on an Orin NX will not run on an Orin Nano and vice versa. Build engines on the exact target hardware.
  • Deploying without end-to-end pipeline validation: Testing the inference model in isolation does not validate the full pipeline — RTSP ingestion, decode, pre-processing, inference, post-processing, and alert logic all need end-to-end testing with real camera input.
  • Not pinning Docker image versions: docker pull my-inference-app:latest in a production deployment script will silently pull a new image version on the next update run, potentially breaking the pipeline with incompatible changes.
  • Documenting nothing: Edge nodes deployed without configuration documentation are effectively black boxes. When the node fails 18 months later, the team has to reconstruct every configuration decision from scratch.

Final Sign-Off Checklist

  • ☐ Node rebooted and inference pipeline confirmed auto-started via systemd?
  • ☐ End-to-end pipeline validation completed with real camera input and a known test scenario?
  • ☐ SMART monitoring confirmed active with alert threshold set at 80% Percentage Used?
  • ☐ Thermal monitoring confirmed active; no thermal zones exceeding 85°C under sustained load?
  • ☐ Node configuration documented and stored in version control (JetPack version, model, NVMe serial, VLAN layout)?

Frequently Asked Questions

How long does TensorRT engine optimization take?

Engine build time depends on model size and optimization level. YOLOv8n INT8: 3–8 minutes on Orin Nano. YOLOv8l INT8: 15–30 minutes on Orin NX. Build once, cache the engine file. Rebuild only when the model changes or JetPack is updated.

Should I use Docker or run the inference application natively?

Docker is strongly recommended for production deployments. It provides dependency isolation, reproducible environments, and simplified updates via image version pinning. The NVIDIA Container Toolkit provides full GPU access from within Docker containers on Jetson.

How do I remotely access a deployed Jetson?

SSH over the management VLAN is standard. For nodes without a VPN or direct network path, a reverse SSH tunnel or a lightweight VPN (WireGuard) allows secure remote access without exposing SSH to the internet. Never expose SSH directly to a public IP.

What is the recommended JetPack upgrade procedure?

JetPack upgrades (e.g., from 6.0 to 6.1) typically require a full re-flash. Document the current configuration, snapshot any custom configuration files, re-flash the new JetPack, then re-apply configuration from your documentation. A configuration-as-code approach (Ansible, custom provisioning scripts) makes this substantially faster.

How do I handle failed inference pipeline containers?

Configure the systemd service or Docker restart policy (--restart=on-failure:5) for automatic restart. Add a watchdog process that monitors pipeline liveness (e.g., checks that frames are being processed at expected rate) and restarts the container if the pipeline stalls without crashing.

Is there a way to A/B test model versions on a deployed node?

One approach: maintain two model directories (model-a and model-b) and an environment variable in the container configuration that selects the active model. Switching models becomes a container restart with the environment variable changed — no re-deployment of the container image required.

The Bottom Line

A Jetson deployment is production-ready only when it survives reboot, writes to durable storage, runs with monitoring, and is hardened before network exposure. A model working once in the lab is not a production sign-off.
