FAQ
June 10, 2026
Common questions from the issue tracker. See also docs/Build.md and the README.
Build & TensorRT versions
ICudaEngine has no member getNbBindings / C++ doesn't work on TensorRT 10/11.
Fixed — the C++ core has one trt_compat layer that supports TensorRT 8 / 10 / 11, and the build auto-detects the version. Rebuild from main.
Windows build fails (unistd.h, cudaMallocAsync not found, CMake paths).
The unistd.h include came from the old vendored filesystem header, which has been removed. The core now uses cudaMalloc instead of cudaMallocAsync, which fixes the "cannot locate cudaMallocAsync" loader error. For CMake path errors, pass -DTensorRT_ROOT=... explicitly. Linux and Jetson are the primary tested platforms; Windows is best-effort.
deserializeCudaEngine returns null.
An .engine only loads on the same TensorRT version that built it. Rebuild the engine with the TensorRT you link/run against, and keep LD_LIBRARY_PATH pointing at that TensorRT's lib.
Which TensorRT under /data (or elsewhere) is used?
-DTensorRT_ROOT=... wins; otherwise the build searches /data/TensorRT-* then /usr. See docs/Build.md.
Export & engine
Wrong predictions / export fails / EfficientNMS_TRT shape-inference warning on recent PyTorch.
Fixed — export-det.py / export-seg.py pass dynamo=False to torch.onnx.export (PyTorch 2.x's dynamo exporter mishandled the graph). Re-export from main.
_pickle.UnpicklingError: Weights only load failed (PyTorch ≥ 2.6).
PyTorch 2.6 changed the default of torch.load to weights_only=True. Use a matching ultralytics version, or, for checkpoints you trust, allow the required classes with torch.serialization.add_safe_globals([...]) or load with weights_only=False.
Change conf / iou / topk.
For End2End engines these are baked into the graph at export — re-run export-det.py --conf-thres ... --iou-thres ... --topk ... and rebuild. For raw engines, pass --conf-thres/--iou-thres to infer.py at runtime.
INT8.
Not built in. Build an INT8 engine with trtexec --int8 plus a calibration cache, or extend EngineBuilder with an IInt8Calibrator. FP16 (--fp16) is the supported fast path.
Inference with ONNX Runtime fails (missing EfficientNMS_TRT).
EfficientNMS_TRT is a TensorRT plugin, not an ONNX Runtime op. For ORT, export a raw model (native ultralytics export) and do NMS yourself.
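"Do NMS yourself" can be sketched as a greedy, class-agnostic non-maximum suppression over boxes that have already been decoded to corner format. This is a minimal pure-Python illustration, not the repository's postprocess: the function names, the (x1, y1, x2, y2) box layout, and the default thresholds are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        conf_thres: float = 0.25, iou_thres: float = 0.65) -> List[int]:
    """Return indices of the kept boxes, highest score first."""
    # Drop low-confidence candidates, then visit the rest by descending score.
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thres),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep
```

For per-class suppression, run nms separately on the boxes of each label. A vectorized implementation is preferable when there are many candidates.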
Inference
Batch size > 1.
Export with a dynamic batch axis and build the engine with an optimization profile that covers your batch; the input is [N, 3, H, W]. (The default examples use batch 1.)
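When there are more images than the optimization profile's maximum batch, the caller has to split the work. A minimal sketch of that bookkeeping follows; the helper names are hypothetical, and it assumes the profile's minimum batch is 1 so that a short final chunk is still valid.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def input_shape(n: int, h: int, w: int) -> Tuple[int, int, int, int]:
    """Shape of one batched input tensor: [N, 3, H, W]."""
    return (n, 3, h, w)


def chunk_for_profile(items: Sequence[T], max_batch: int) -> Iterator[List[T]]:
    """Yield consecutive chunks no larger than the profile's max batch."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for start in range(0, len(items), max_batch):
        yield list(items[start:start + max_batch])
```

Each chunk is then preprocessed into a tensor of shape input_shape(len(chunk), H, W) before being handed to the engine.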
Use a USB camera / video in C++.
Pass a video file path to the binary (the runner opens it with cv::VideoCapture); for a camera, adapt csrc/core/src/runner.cpp to open a device index.
Export box coordinates to a text file.
The boxes are available after postprocess; add a small writer in the runner loop (each Object carries rect, label, prob).
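A small writer could look like the sketch below, shown in Python for brevity; the same loop applies in the C++ runner. The Detection class only mirrors the three fields mentioned above, and the (x, y, width, height) rect layout and the one-line-per-box text format are assumptions chosen for illustration.

```python
import io
from dataclasses import dataclass
from typing import Iterable, TextIO, Tuple


@dataclass
class Detection:
    """Hypothetical stand-in for the runner's Object (rect, label, prob)."""
    rect: Tuple[float, float, float, float]  # (x, y, width, height), assumed
    label: int
    prob: float


def write_detections(out: TextIO, frame_idx: int,
                     dets: Iterable[Detection]) -> int:
    """Write one 'frame label prob x y w h' line per detection; return count."""
    count = 0
    for d in dets:
        x, y, w, h = d.rect
        out.write(f"{frame_idx} {d.label} {d.prob:.4f} "
                  f"{x:.1f} {y:.1f} {w:.1f} {h:.1f}\n")
        count += 1
    return count


def detections_to_text(frame_idx: int, dets: Iterable[Detection]) -> str:
    """Convenience wrapper that returns the lines as a string."""
    buf = io.StringIO()
    write_detections(buf, frame_idx, dets)
    return buf.getvalue()
```

Open the output file once before the frame loop and call write_detections after each postprocess, so the file is not reopened per frame.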
Tracking (ByteTrack, etc.).
Out of scope: the engine outputs detections; feed them to your tracker of choice.
Tasks
Multi-class pose.
Stock YOLOv8-pose is single-class (person). A custom multi-class pose model outputs 4 + nc + 17*3 channels per anchor; the postprocess takes the top score over the nc class channels as the score and its index as the label (nc = 1 is the usual single-class case).
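The channel layout described above can be decoded per anchor as in the following sketch. It is an illustration rather than the repository's postprocess: the function name is hypothetical, and the ordering (4 box values, then nc class scores, then 17 keypoint triplets) follows the channel count given here, with each triplet assumed to be (x, y, visibility).

```python
from typing import List, Sequence, Tuple

NUM_KPTS = 17  # keypoints per instance, three values each

Keypoint = Tuple[float, float, float]


def decode_pose_row(row: Sequence[float], nc: int
                    ) -> Tuple[List[float], int, float, List[Keypoint]]:
    """Split one anchor row into (box, label, score, keypoints)."""
    expected = 4 + nc + NUM_KPTS * 3
    if len(row) != expected:
        raise ValueError(f"expected {expected} channels, got {len(row)}")
    box = list(row[:4])
    class_scores = row[4:4 + nc]
    # The label is the index of the best class; the score is its value.
    label = max(range(nc), key=lambda c: class_scores[c])
    score = class_scores[label]
    flat = row[4 + nc:]
    kpts = [(flat[i], flat[i + 1], flat[i + 2])
            for i in range(0, NUM_KPTS * 3, 3)]
    return box, label, score, kpts
```

With nc = 1 the class slice has a single entry, so the label is always 0 and the score is that single channel.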
Segmentation output is only an axis-aligned box + mask.
That is the current output; rotated rectangles / polygons would need an extra step (fit a minAreaRect / contour on the mask).
YOLOv11 / YOLOv10.
YOLOv11 detect/pose use the same head family as v8 and generally work after a normal export. YOLOv10 has an NMS-free head and is not supported as-is.
Performance
Preprocess / mask resize is slow.
letterbox and the per-object mask resize are CPU-bound and scale with resolution and object count. Lower the input size, reuse buffers, or move resize to the GPU. Use --profile (C++) or benchmark.py to find the real bottleneck.
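The letterbox geometry itself is cheap and depends only on the source and network sizes, so it can be computed once per input resolution and reused across frames; the pixel resize is the expensive part. A minimal sketch of that geometry, with hypothetical helper names and the common centered-padding convention assumed:

```python
from typing import Tuple


def letterbox_params(src_w: int, src_h: int, dst_w: int, dst_h: int
                     ) -> Tuple[float, int, int, float, float]:
    """Return (scale, new_w, new_h, pad_x, pad_y) for an aspect-preserving fit."""
    # Use the smaller ratio so the whole image fits inside the target.
    r = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * r), round(src_h * r)
    # Split the leftover space evenly on both sides (centered padding).
    pad_x = (dst_w - new_w) / 2
    pad_y = (dst_h - new_h) / 2
    return r, new_w, new_h, pad_x, pad_y


def unletterbox_point(x: float, y: float, r: float,
                      pad_x: float, pad_y: float) -> Tuple[float, float]:
    """Map a point from network-input coordinates back to the source image."""
    return (x - pad_x) / r, (y - pad_y) / r
```

For a 1280x720 frame and a 640x640 input, the scale is 0.5, the resized image is 640x360, and 140 rows of padding are added above and below.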
No detections on a custom model
Check, in order: the class count (nc) matches your model; the engine was built from the matching ONNX file with the TensorRT version you run against; --conf-thres isn't too high; and the engine layout matches its consumer (End2End vs raw; see the README). If it still fails, open an issue with the export command, the model's nc, and a sample image.