Why the split is forced
Hobby-grade drone autopilots close their rate loops at 400 Hz on a Cortex-M4. Most factory automation closes at 1 kHz on a real-time PLC. Field robots (anything that pulls tools through soil, holds a tool against a workpiece, or drives hydraulic actuators with humans nearby) live in the harder regime: 1 kHz or faster, sub-millisecond jitter, plus a hard guarantee that no scheduler hiccup, page fault, or garbage-collection pause will cause a missed deadline.
That regime forces a split. On one side, the perception stack needs tens of TOPS for depth networks, behavior models, and vision-language planners. On the other side, the actuation stack needs cycle-accurate timing for current loops, encoder interfaces, EtherCAT, and ISO 13849-1 interlocks. No single chip does both well. So the architecture splits into a real-time domain and a compute domain, and the engineering question collapses to four sub-questions:
- Which FPGA owns the real-time domain?
- Which Jetson module owns the compute domain?
- How do the two halves talk to each other?
- Where does the safety boundary sit?
This guide walks through each one with the parts, the standards, and the trade-offs that matter when you actually order silicon.
The two-domain architecture
The architecture is conceptually simple. An FPGA — typically a Lattice ECP5, Lattice CrossLink-NX, AMD/Xilinx Artix-7, or AMD/Xilinx Spartan-7 — handles every signal that has a hard deadline. That includes three-phase PWM for BLDC servos, quadrature encoder decoding, current-shunt ADC sampling, EtherCAT slave logic, watchdog timers, and the safety-related interlocks that hold up ISO 13849-1 PL-d or PL-e claims.
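As one illustration of the kind of logic that lives in the fabric, quadrature decoding reduces to a 16-entry transition table. The sketch below is a behavioral model in Python, not HDL. The function name, the error counter, and the choice of which rotation direction counts as positive are all assumptions made for illustration.

```python
# Transition table indexed by (previous_state << 2) | current_state,
# where each state is the 2-bit value (A << 1) | B.
# +1 = one step forward, -1 = one step backward, 0 = no change or invalid jump.
# Which direction is "forward" is an arbitrary convention chosen here.
_QUAD_DELTA = [
     0, +1, -1,  0,
    -1,  0,  0, +1,
    +1,  0,  0, -1,
     0, -1, +1,  0,
]


def decode_quadrature(samples, count=0):
    """Accumulate a position count from a sequence of (A, B) samples.

    Returns (count, errors), where errors counts transitions in which both
    channels changed at once, meaning at least one edge was missed.
    """
    prev = None
    errors = 0
    for a, b in samples:
        cur = (a << 1) | b
        if prev is not None:
            delta = _QUAD_DELTA[(prev << 2) | cur]
            if delta == 0 and prev != cur:
                errors += 1
            count += delta
        prev = cur
    return count, errors


forward = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
count, errors = decode_quadrature(forward)  # four valid steps, no errors
```

In the fabric the same table would be evaluated on every sample clock, which is why a missed edge shows up as an error flag rather than as a silent position drift.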
A Jetson Orin module — Nano, NX, or AGX — handles everything else. ROS 2 nodes for perception and planning, CUDA kernels for inference, the Model Context Protocol stack if the robot exposes an LLM-facing tool surface, OTA firmware delivery, telemetry, and the operator HMI.
Between them sits an interconnect — PCIe, SPI, UART, or memory-mapped DMA over a soft bridge — that does only two jobs. It carries setpoints down (target velocity, target position, behavior-tree state) and state up (measured current, encoder counts, fault flags). Everything else lives in one domain or the other, and the discipline of keeping it that way is what makes the architecture trustworthy.
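A minimal sketch of what those two jobs might look like on the wire follows. The frame layouts, field widths, and the CRC-32 trailer are hypothetical choices made for illustration; the article does not specify a frame format.

```python
import struct
import zlib

# Hypothetical little-endian layouts. Field choice and widths are assumptions.
SETPOINT_FMT = "<HiiB"  # sequence, target_velocity, target_position, behavior_tree_state
STATE_FMT = "<HhiH"     # sequence, measured_current_mA, encoder_counts, fault_flags


def pack_frame(fmt, *fields):
    """Serialize the fields and append a CRC-32 of the body."""
    body = struct.pack(fmt, *fields)
    return body + struct.pack("<I", zlib.crc32(body))


def unpack_frame(fmt, frame):
    """Verify the CRC-32 trailer and return the decoded fields as a tuple."""
    body, trailer = frame[:-4], frame[-4:]
    (crc,) = struct.unpack("<I", trailer)
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch")
    return struct.unpack(fmt, body)


# Downstream: one setpoint frame from the compute domain.
setpoint = pack_frame(SETPOINT_FMT, 42, 1500, -20000, 3)
# Upstream: one state frame from the real-time domain.
state = pack_frame(STATE_FMT, 42, -850, 123456, 0x0001)
```

Keeping both frames fixed-size means the real-time side never has to parse a variable-length message inside its cycle.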
This is the FPGA real-time control and Jetson edge AI split that defines every credible field-robotics carrier board on the market in 2026.
FPGA family selection
The four candidate families in current field-robotics designs differ enough that the choice is a real one.
Lattice ECP5 / ECP5-5G. Logic density from 12K to 84K LUTs, up to 3.744 Mb of sysMEM block RAM, up to 156 18×18 DSP multipliers, four SerDes lanes at 3.2 Gbps (5.0 Gbps on ECP5-5G), and native hard-IP for PCIe, 1GbE/SGMII, and XAUI. The decisive advantage for early-stage robotics teams is the open-source toolchain — Yosys plus nextpnr is actively maintained against ECP5, so a startup can stand up its synthesis flow without buying Vivado seats and without binding its build pipeline to a vendor IDE. ECP5 also reaches BGA pitches down to 0.5 mm, which is the realistic floor for hand-assembled prototype boards.
Lattice CrossLink-NX. LIFCL-17 and LIFCL-40 variants, with dual MIPI D-PHY hard-IP at 2.5 Gbps per lane and 6×6 mm packaging. CrossLink-NX is not where you put motor-control logic — it is where you put sensor-aggregation logic. A common pattern uses CrossLink-NX as a front-end MIPI multiplexer for two or four 2 MP cameras, then passes aggregated video into the Jetson over CSI-2, leaving the ECP5 alone to close the servo loops.
AMD/Xilinx Artix-7. XC7A35T at 33,280 logic cells through XC7A200T at 215,360 logic cells, with GTP transceivers up to 6.6 Gbps and 13 Mb of BRAM. Artix-7 is the right call when the design needs integrated transceivers for SFP+ links, multi-gigabit sensor backhaul, or PCIe Gen2 hard-IP without an external PHY. The toolchain (Vivado) is heavier and the part cost runs higher than ECP5 at equivalent logic density.
AMD/Xilinx Spartan-7. Sub-watt operation in the smaller variants, 6K–102K logic cells, 176 GMACs at 551 MHz. No transceivers — that is the cost-down. Spartan-7 is the safety-co-processor option: a second FPGA on the board that runs only the diverse-channel safety logic, drawing less than a watt, while the primary ECP5 or Artix-7 handles motion. This pattern earns a straightforward ISO 13849-1 PL-d argument because the two FPGAs are architecturally independent.
The right answer for most field-robotics builds in 2026 is ECP5 as the motion-control workhorse, optionally paired with CrossLink-NX for vision aggregation. Artix-7 enters when SFP+ or PCIe Gen2 hard-IP is essential. Spartan-7 enters when the safety case wants a separate, independent device.
Jetson Orin module selection
The Orin family in 2026 covers roughly an 8× range in AI performance, from 34 to 275 TOPS.
| Module | AI performance (TOPS) | LPDDR5 capacity | Memory bandwidth | Power envelope |
|---|---|---|---|---|
| Orin Nano 4GB | 34 | 4 GB | 51 GB/s | 7–25 W |
| Orin Nano 8GB | 67 | 8 GB | 102 GB/s | 7–25 W |
| Orin NX 8GB | 117 | 8 GB | 102.4 GB/s | 10–40 W |
| Orin NX 16GB | 157 | 16 GB | 102.4 GB/s | 10–40 W |
| AGX Orin 32GB | 200 | 32 GB | 204.8 GB/s | 15–60 W |
| AGX Orin 64GB | 275 | 64 GB | 204.8 GB/s | 15–60 W |
Two non-obvious points decide the module choice in practice.
First, memory bandwidth dominates inference latency for the depth, segmentation, and VLM models real field robots run. A 67-TOPS Orin Nano 8GB with 102 GB/s often outpaces the 34-TOPS Nano 4GB by more than the TOPS ratio suggests, because the model is bandwidth-bound, not compute-bound. NVIDIA’s late-2024 software update raised the Nano 8GB from 40 to 67 TOPS and bandwidth from 68 to 102 GB/s without a hardware change.
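A rough way to see the bandwidth-bound argument is a roofline-style estimate: per-frame latency is bounded below by whichever is larger, the time to execute the operations or the time to move the bytes. The sketch below uses the module figures from the table; the workload numbers (20 GOP and 0.5 GB of DRAM traffic per frame) are hypothetical, and peak TOPS is treated as fully achievable, which real models rarely manage.

```python
def inference_latency_ms(ops_per_frame, bytes_per_frame, peak_tops, bandwidth_gb_s):
    """Return (lower_bound_latency_ms, limiting_resource) from a roofline bound."""
    compute_ms = ops_per_frame / (peak_tops * 1e12) * 1e3
    memory_ms = bytes_per_frame / (bandwidth_gb_s * 1e9) * 1e3
    if memory_ms >= compute_ms:
        return memory_ms, "memory"
    return compute_ms, "compute"


# Hypothetical workload, chosen only to illustrate the bound.
OPS_PER_FRAME = 20e9
BYTES_PER_FRAME = 0.5e9

nano_8gb = inference_latency_ms(OPS_PER_FRAME, BYTES_PER_FRAME, peak_tops=67, bandwidth_gb_s=102)
nano_4gb = inference_latency_ms(OPS_PER_FRAME, BYTES_PER_FRAME, peak_tops=34, bandwidth_gb_s=51)
```

For this assumed workload both modules come out memory-bound, so the latency estimate tracks the bandwidth column of the table rather than the TOPS column.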
Second, the power envelope sets the thermal design, not the average inference load. A Jetson AGX at 60 W in a sealed IP-rated enclosure can outpace the heat path through the housing on a 40 °C day in a vineyard, so the right module for agricultural robotics is often the Orin NX 16GB at 157 TOPS, not the AGX, even when the AGX would fit the workload on a bench.
Interconnect: PCIe, SPI, or UART
Three interconnect options dominate.
PCIe. The Jetson Orin Nano exposes PCIe Gen3 — four controllers, seven lanes, 56 GT/s aggregate, with controller 0 configurable as x1, x2, or x4. The Orin NX doubles per-lane throughput to PCIe Gen4 (16 GT/s per lane, 144 GT/s aggregate). PCIe Gen3 ×4 delivers roughly 4 GB/s unidirectional bandwidth, and small MMIO transactions complete in under 2 µs one-way. Use PCIe when the design needs DMA throughput for high-rate sensor data or when the FPGA acts as a bus master into shared memory. The cost is design complexity — PCIe layout discipline, AC-coupling caps, reference-clock distribution, and a longer bring-up.
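The "roughly 4 GB/s" figure follows from the line rate and the 128b/130b encoding that PCIe Gen3 and Gen4 use. The sketch below reproduces that arithmetic. It deliberately ignores packet headers and flow-control overhead, so achievable DMA throughput will be lower; the function name is an illustrative assumption.

```python
def pcie_link_gbytes_per_s(gt_per_s, lanes, encoded_bits=130, payload_bits=128):
    """Unidirectional link bandwidth in GB/s after line encoding.

    Defaults model the 128b/130b encoding of PCIe Gen3 and Gen4.
    Transaction-layer overhead is not included.
    """
    return gt_per_s * lanes * payload_bits / encoded_bits / 8


gen3_x4 = pcie_link_gbytes_per_s(8, 4)    # about 3.94 GB/s
gen4_x4 = pcie_link_gbytes_per_s(16, 4)   # about 7.88 GB/s
```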
SPI. The bread-and-butter interconnect for setpoint exchange. SPI at 50 MHz over a four-wire bus carries roughly 6 MB/s of payload (50 MHz ÷ 8 bits/byte ≈ 6.25 MB/s raw, ~6 MB/s after framing overhead), which is enough for hundreds of setpoint/state words per millisecond cycle. SPI latency is deterministic, the FPGA hardware is trivial, and isolation drops in with parts like the ADI ADuM4154: 5 kV isolation, 17 MHz clock support, 14 ns propagation delay, four-peripheral CS multiplex with 2.5 µs address switching. Note that an isolated link runs at the isolator’s 17 MHz ceiling rather than 50 MHz, which cuts raw throughput to roughly 2 MB/s. Use SPI when the payload is small and the schedule is predictable, which describes most servo and hydraulic control architectures.
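The per-cycle budget can be checked with integer arithmetic. The sketch below assumes 32-bit words and a 4% framing overhead, which is the overhead implied by the 6.25 MB/s raw and ~6 MB/s payload figures; the word size and the function name are assumptions for illustration.

```python
def spi_words_per_cycle(clock_hz, cycle_hz, word_bits=32, overhead_percent=4):
    """Whole payload words that fit in one control cycle on a saturated SPI bus."""
    bits_per_cycle = clock_hz // cycle_hz
    payload_bits = bits_per_cycle * (100 - overhead_percent) // 100
    return payload_bits // word_bits


direct_link = spi_words_per_cycle(50_000_000, 1_000)    # 50 MHz bus, 1 kHz loop
isolated_link = spi_words_per_cycle(17_000_000, 1_000)  # at the ADuM4154 clock limit
```

These are upper bounds for a bus that is busy for the whole cycle. A real schedule leaves headroom, which is consistent with planning for hundreds of words per cycle rather than the theoretical maximum.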
UART. Slowest, simplest, and still useful. A 3 Mbaud UART carries 300 kB/s (3 Mbps ÷ 10 bits/byte including start/stop framing) and survives almost any board layout. Use it for the safety-event channel, the debug shell, or the secondary path between the FPGA’s safety state machine and a watchdog peer on the Jetson side. UART is the path that keeps working when PCIe gets wedged during a kernel hang.
Most field-robotics carriers run all three interconnects in parallel: SPI for the per-millisecond setpoint stream, PCIe for bulk telemetry and configuration loads, and UART as the always-on side channel.
Safety partitioning: the FPGA owns the boundary
ISO 13849-1:2023 defines five performance levels, PL a through PL e. PL d requires a probability of dangerous failure per hour (PFHd) between 10⁻⁷ and 10⁻⁶ — one failure per million to ten million operating hours. PL e tightens that to below 10⁻⁷. EU law mandates the 2023 edition by 2027-05-15, replacing EN ISO 13849-1:2015.
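The PFHd bands can be expressed as a simple lookup. The PL d and PL e limits below are the ones quoted above; the bands for PL a through PL c are filled in from the standard's PFHd table as best recalled and should be checked against the published text. A PFHd figure alone does not earn a performance level, since the standard also weighs category, diagnostic coverage, and common-cause failure, so treat this as a sketch of the numeric bands only.

```python
# (level, lower bound inclusive, upper bound exclusive), PFHd in failures per hour.
# Bands a to c are an assumption to verify against the standard.
PL_BANDS = [
    ("e", 1e-8, 1e-7),
    ("d", 1e-7, 1e-6),
    ("c", 1e-6, 3e-6),
    ("b", 3e-6, 1e-5),
    ("a", 1e-5, 1e-4),
]


def performance_level(pfhd):
    """Return the PL letter whose PFHd band contains the value, or None."""
    for level, lower, upper in PL_BANDS:
        if lower <= pfhd < upper:
            return level
    return None


level = performance_level(5e-7)  # falls in the PL d band
```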
The architectural rule that follows is simple. The Jetson cannot sit in the safety chain. Linux scheduling latency under PREEMPT_RT measured 15–23 µs worst case in published cyclictest results on a Cortex-A53 — well-behaved, but with no mathematically sound bound below the single-digit-millisecond range. Standard, non-RT Linux has no bounded worst case at all and can stall for tens of milliseconds under load. A safety interlock that depends on Linux completing a system call inside a deadline cannot satisfy PL d, let alone PL e.
So the FPGA owns the safety boundary. The pattern is:
- E-stop input lands directly on FPGA pins, not on Jetson GPIO.
- Light-curtain and door-switch inputs land on the FPGA, dual-channel, with a cross-check timer.
- The FPGA holds the gate-driver enable signal. Releasing that enable requires both safety inputs healthy AND a 100 Hz heartbeat from the Jetson AND an explicit operator command latched at the FPGA.
- Loss of any one input drops the enable in microseconds and shorts the motor windings via the gate driver’s hardware brake input.
The Jetson can ask the FPGA to consider a stop. The Jetson cannot ask the FPGA to defer a stop. That asymmetry is what makes the architecture safe — and the same asymmetry is what makes the AI side trustable in the first place.
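The enable logic described in the list above can be modeled behaviorally as follows. This is a Python sketch of the decision logic, not the HDL. The class and method names, the 20 ms heartbeat timeout (twice the 100 Hz period), and the 500 µs cross-check window are all assumptions chosen for illustration.

```python
HEARTBEAT_TIMEOUT_US = 20_000  # assumed: twice the 10 ms heartbeat period
DISCREPANCY_LIMIT_US = 500     # assumed cross-check window for the dual channels


class SafetyGate:
    """Behavioral model of the gate-driver enable held by the FPGA."""

    def __init__(self):
        self.enable = False
        self.operator_latch = False
        self.channel_fault = False
        self._last_heartbeat_us = None
        self._mismatch_since_us = None

    def heartbeat(self, now_us):
        self._last_heartbeat_us = now_us

    def operator_arm(self):
        # Arming is refused while a channel discrepancy fault is latched.
        if not self.channel_fault:
            self.operator_latch = True

    def request_stop(self):
        # The compute side may always ask for a stop. There is no "defer" call.
        self.operator_latch = False
        self.enable = False

    def step(self, now_us, estop_ok, chan_a_ok, chan_b_ok):
        # Cross-check timer: channels that disagree for too long latch a fault.
        if chan_a_ok != chan_b_ok:
            if self._mismatch_since_us is None:
                self._mismatch_since_us = now_us
            elif now_us - self._mismatch_since_us > DISCREPANCY_LIMIT_US:
                self.channel_fault = True
        else:
            self._mismatch_since_us = None

        heartbeat_ok = (
            self._last_heartbeat_us is not None
            and now_us - self._last_heartbeat_us <= HEARTBEAT_TIMEOUT_US
        )
        healthy = (estop_ok and chan_a_ok and chan_b_ok
                   and heartbeat_ok and not self.channel_fault)
        if not healthy:
            self.operator_latch = False  # any loss forces an explicit re-arm
        self.enable = healthy and self.operator_latch
        return self.enable


gate = SafetyGate()
gate.heartbeat(0)
gate.operator_arm()
running = gate.step(1_000, estop_ok=True, chan_a_ok=True, chan_b_ok=True)
stalled = gate.step(25_000, estop_ok=True, chan_a_ok=True, chan_b_ok=True)
```

In this model `running` is true and `stalled` is false, because the heartbeat went stale. Recovering requires a fresh heartbeat and a new operator arm, and the interface offers no method that could hold the enable up against an unhealthy input.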
Motor control and isolation
The gate-driver layer sits between the FPGA’s PWM outputs and the actual MOSFET half-bridges. TI’s DRV8353 is the workhorse for 9–100 V supply rails — 1 A source / 2 A sink peak gate drive, three integrated bidirectional current-shunt amplifiers with selectable 5/10/20/40 V/V gain, SPI configuration, and UVLO/OCP/OTW protection in a 6×6 mm WQFN-40. For 24 V and 48 V battery systems, the DRV8323 is the lower-voltage drop-in: 6–60 V, same package, same SPI map.
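Converting a sampled amplifier output back to phase current is a one-line calculation once the gain is fixed. The sketch below assumes the amplifier is in bidirectional mode with its output centered at half the reference voltage and falling as current rises. That offset, the sign convention, the 3.3 V reference, and the 1 mΩ shunt are all assumptions to confirm against the data sheet and the actual board.

```python
VALID_GAINS = (5, 10, 20, 40)  # selectable V/V settings


def phase_current_amps(v_out, v_ref=3.3, gain=20, r_shunt_ohm=0.001):
    """Estimate phase current from a bidirectional shunt-amplifier output.

    Assumes the output idles at v_ref / 2 and drops as current increases.
    """
    if gain not in VALID_GAINS:
        raise ValueError("gain must be one of %r" % (VALID_GAINS,))
    return (v_ref / 2 - v_out) / (gain * r_shunt_ohm)


idle = phase_current_amps(1.65)    # mid-rail output reads as zero current
loaded = phase_current_amps(1.45)  # 0.2 V below mid-rail at 20 V/V across 1 mOhm
```

With these assumed values, each 20 mV of swing corresponds to 1 A, so a higher gain setting buys resolution at the cost of measurable current range.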
Galvanic isolation between the FPGA’s SPI bus and the gate-driver SPI bus is non-negotiable on high-voltage hydraulic and industrial supplies. The ADuM4154 SPIsolator handles that — 5 kV isolation, four-peripheral CS mux, 14 ns propagation delay — and the iCoupler chip-scale transformer outlives opto-isolators in field conditions.
EtherCAT and ROS 2 on the same board
A working field-robotics carrier handles two real-time networking tasks at once. The FPGA hosts an EtherCAT slave stack — ESC IP is available for Lattice and Xilinx devices — so the board sits on an EtherCAT segment alongside Beckhoff or Synapticon drives. EtherCAT cycle times under 100 µs are routine, processing 1,000 distributed digital I/O takes about 30 µs on 100 Mbit Ethernet, and distributed-clock jitter stays under 1 µs across 300 nodes on 120 m of cable.
On the Jetson side, ROS 2 talks to the FPGA via ethercat_driver_ros2, the ICube-Robotics hardware interface for ros2_control built on the IgH EtherCAT Master for Linux. A PREEMPT_RT kernel gives the master node a predictable enough scheduling environment to drive 1 kHz cycles, and PTP synchronizes the EtherCAT distributed-clock timestamps with the Jetson’s ROS 2 clock. A perception event at frame N+1 then lines up against motor state at the same wall-clock instant.
This is the construction and mining story in particular: a 1 kHz hydraulic control cycle on the FPGA, a 30 Hz perception loop on the Jetson, and a sub-millisecond shared time base so the planner can issue a setpoint that the FPGA actually executes on the next motor cycle.
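With a shared time base, lining a perception frame up against motor state becomes a timestamp lookup. The sketch below assumes the 1 kHz state samples are buffered with sorted nanosecond timestamps and picks the sample nearest to the frame time. The function name and the buffer layout are hypothetical.

```python
import bisect


def state_at(timestamps_ns, states, t_ns):
    """Return the buffered state whose timestamp is nearest to t_ns.

    timestamps_ns must be sorted ascending and the same length as states.
    """
    if not timestamps_ns:
        raise ValueError("no state samples buffered")
    i = bisect.bisect_left(timestamps_ns, t_ns)
    if i == 0:
        return states[0]
    if i == len(timestamps_ns):
        return states[-1]
    before, after = timestamps_ns[i - 1], timestamps_ns[i]
    # Ties resolve to the earlier sample.
    return states[i] if after - t_ns < t_ns - before else states[i - 1]


# Four state samples at a 1 ms spacing, then a camera frame stamped at 1.4 ms.
stamps = [0, 1_000_000, 2_000_000, 3_000_000]
samples = ["s0", "s1", "s2", "s3"]
matched = state_at(stamps, samples, 1_400_000)
```

The lookup is only meaningful if both clocks are disciplined to the same source, which is the role the time synchronization plays here.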
The pitfalls
Four failure modes recur across hybrid-controller designs.
Linux jitter contaminates the control loop when the partition leaks. Teams sometimes put “just one quick” setpoint computation on the Jetson side and discover, weeks into bring-up, that p99.9 latency on that computation reaches the single-digit milliseconds — consistent with Red Hat’s formal PREEMPT_RT analysis. The fix is architectural, not tuning: any computation in the per-cycle critical path lives in the FPGA fabric or in a soft-core RISC-V on the FPGA, not on the Jetson.
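Tail latency is easy to miss if only averages are logged. The sketch below computes a nearest-rank quantile with integer arithmetic, so p99.9 is expressed as 999 out of 1000. The sample data is synthetic and purely illustrative: 997 wake-ups at 20 µs and three stalls at 3,000 µs.

```python
def latency_quantile(samples_us, numerator, denominator):
    """Nearest-rank quantile, e.g. numerator=999, denominator=1000 for p99.9."""
    if not samples_us:
        raise ValueError("no samples")
    ordered = sorted(samples_us)
    # Ceiling division keeps the rank exact without floating point.
    rank = -(-len(ordered) * numerator // denominator)
    return ordered[max(rank, 1) - 1]


def deadline_misses(samples_us, budget_us):
    """Count samples that exceeded the per-cycle budget."""
    return sum(1 for s in samples_us if s > budget_us)


synthetic = [20] * 997 + [3000] * 3   # illustrative data, not a measurement
p99 = latency_quantile(synthetic, 99, 100)
p999 = latency_quantile(synthetic, 999, 1000)
missed = deadline_misses(synthetic, budget_us=1000)
```

On this synthetic set p99 still reads 20 µs while p99.9 reads 3,000 µs, which is the pattern that stays hidden until someone looks at the tail.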
Shared DRAM contention bites bulk transfers. PCIe DMA from the FPGA to Jetson LPDDR5 competes with the GPU’s bandwidth budget. 102 GB/s on Orin NX is not infinite, and a 4 GB/s DMA stream during a perception inference can measurably extend frame times. Schedule DMA bursts to align with inter-frame idle windows.
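A back-of-the-envelope budget shows how much data fits in an inter-frame idle window. The sketch below uses the 30 Hz perception rate and the 4 GB/s DMA figure from this guide; the 20 ms inference time is a hypothetical value, and the function names are illustrative.

```python
def idle_window_ms(frame_hz, inference_ms):
    """Time left in each frame period after inference completes."""
    return max(1000.0 / frame_hz - inference_ms, 0.0)


def dma_bytes_per_window(frame_hz, inference_ms, dma_gbytes_per_s):
    """Bytes a DMA stream can move if confined to the idle window."""
    return idle_window_ms(frame_hz, inference_ms) * 1e-3 * dma_gbytes_per_s * 1e9


def bandwidth_share(dma_gbytes_per_s, dram_gbytes_per_s):
    """Fraction of DRAM bandwidth the DMA stream takes while active."""
    return dma_gbytes_per_s / dram_gbytes_per_s


window = idle_window_ms(30, inference_ms=20.0)        # about 13.3 ms
budget = dma_bytes_per_window(30, 20.0, 4.0)          # about 53 MB per frame
share = bandwidth_share(4.0, 102.4)                   # about 3.9 percent
```

Under these assumptions the idle window carries tens of megabytes per frame, so confining bulk transfers to it costs little unless telemetry volume is unusually high.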
Ground loops and EMI between digital and analog domains wreck current-sense accuracy. The board needs a clear partition: a quiet analog ground for the current-shunt amplifiers, a digital ground for the FPGA and Jetson, and a single star point. The ADuM4154 buys most of the isolation budget; the layout buys the rest.
Thermal coupling between Jetson and FPGA hot zones throttles the wrong device. An AGX Orin at 60 W and an Artix-7 200T drawing in the low-single-digit watt range, sharing a thermal pad, will throttle the Jetson first. Place the FPGA upstream of the airflow path, dedicate the heatsink mass to the Jetson, and have the FPGA’s safety FSM drop the gate-driver enable before silicon junction temperature reaches its data-sheet maximum.
Wide-input power deserves the same partition discipline. Vicor DCM2322 isolated DC-DC modules cover 9–50 V, 14–72 V, and 43–154 V input ranges at 35–120 W per module — enough to power FPGA, Jetson, and gate-driver stages from a single unregulated battery bus, with high-side switches handling inrush limiting and inductive-discharge clamping for field enclosures.
Build vs. buy: when custom wins
The COTS alternatives are NI CompactRIO (cRIO), Speedgoat, and Beckhoff CX. None publish list pricing. New NI CompactRIO systems with the necessary I/O modules routinely exceed $10K per configuration; Speedgoat new configurations run significantly higher; and Beckhoff CX licensing ties the platform to TwinCAT and the Beckhoff I/O ecosystem. All three are credible for R&D and HIL — none are credible inside a $30K agricultural robot built at production volume.
The break-even is structural. COTS modules charge a per-unit licensing and integration tax that custom carriers pay once, as NRE. Once production volume amortizes the PCB design across the unit count, the custom carrier is materially cheaper per shipped robot, and the OEM owns the design rather than renting it.
Standing up a custom carrier in-house carries its own cost. Building comparable FPGA + Jetson capability internally typically runs $900K–$2M+ over 18–24 months — the cost of two FPGA engineers, an embedded Linux engineer, a hardware engineer, an EMC iteration, and the second board spin.
What TACTUN does
TACTUN designs the control spine for field robotics — the FPGA + Jetson carrier board, the motor-control stages, the safety logic, and the application-builder software that runs on top. The spine pairs a custom FPGA architecture for deterministic real-time control with NVIDIA Jetson edge AI compute, configured against the customer’s specific sensors, actuators, and motor mix (servo, stepper, hydraulic, pneumatic). The customer keeps full ownership of the AI stack; TACTUN owns the spine underneath it.
The engagement is built to remove the in-house tax. Board architecture is designed in 5 business days. Prototypes ship in 3–5 months on standard contract manufacturing. The customer pays $0 NRE and pays only for production hardware. A 14-year systems-integration record on the founding team, plus current NVIDIA Inception Program membership, is what makes that schedule realistic. The TACTUN platform handles the FPGA, the Jetson integration, the motor-control silicon, and the safety partitioning, so the customer team stays focused on the autonomy stack.
Talk to us
If you are standing up a field-robotics product and the FPGA + Jetson architecture above looks like the right shape — but you would rather not absorb 18–24 months of carrier-board development to get there — contact us. We will look at the actuator list, the sensor mix, and the safety envelope, and tell you whether the spine we already build maps to your machine.