SAM3-TENSORRT-PYTHON — SAM3 inference pipeline with TensorRT (FP16)
SAM3 TensorRT Pipeline
Current: 2026-02-20 — Repository status: actively maintained. Keywords: Current, SAM3, TensorRT, FP16, TensorRT-Python, Gradio, ONNX, NVIDIA
This project provides a complete pipeline to run SAM3 (Segment Anything Model 3) with TensorRT:
- System audit for CUDA / TensorRT readiness (Check.py)
- ONNX export of the SAM3 submodules (SAM3_PyTorch_To_Onnx.py)
- TensorRT engine building from ONNX (Build_Engines.py)
- High-performance inference with text prompts (SAM3_TensorRT_Inference.py)
- Interactive web UI for easy testing (ui_gradio.py)
Colab demo: https://colab.research.google.com/gist/Kishan200308/14ea81a5a8c0a6c5a2e729dd781d82a2/sam3-tensorrt-colab-demo.ipynb
The workflow is designed around FP16 TensorRT engines with dynamic shapes and explicit batch, supporting both bounding box detection and mask segmentation modes.

🚀 Performance Benchmarks
By migrating from native PyTorch to TensorRT (FP16), this pipeline cuts VRAM usage by roughly 65% and inference time by roughly 2.5x on an NVIDIA T4.
| Metric | Original PyTorch | TensorRT (FP16) | Improvement |
|---|---|---|---|
| VRAM Usage | ~6-7 GB | ~2.4 GB | ~65% Reduction |
| Inference Time (T4 GPU) | ~1.6 sec | ~0.6 sec | ~2.5x Speedup |
Note: Benchmarks were measured on an NVIDIA T4 GPU. Performance may vary with hardware.
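The "Improvement" column can be reproduced from the rounded figures in the table. The snippet below is a minimal sketch of that arithmetic; the function names are illustrative, and the inputs are the approximate values quoted above rather than new measurements.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is than `before`."""
    return before / after


# Rounded values from the table above (VRAM in GB, latency in seconds).
vram_low = percent_reduction(6.0, 2.4)   # PyTorch at the low end of ~6-7 GB
vram_high = percent_reduction(7.0, 2.4)  # PyTorch at the high end of ~6-7 GB
latency_gain = speedup(1.6, 0.6)

print(f"VRAM reduction: {vram_low:.0f}% to {vram_high:.0f}%")
print(f"Speedup: {latency_gain:.1f}x")
```

Because both timings are approximate, the computed speedup will not match the table's rounded value exactly.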
Quick Start
Check System Readiness
Before starting, verify your system is properly configured:
python3 Check.py
This will check CUDA, TensorRT, and all required dependencies.
1. Environment Setup
Python Packages
Install required Python packages:
pip install torch torchvision --upgrade
pip install "git+https://github.com/huggingface/transformers"
pip install onnx onnxscript onnxslim onnxruntime-gpu onnx_graphsurgeon opencv-python matplotlib tokenizers tabulate --upgrade
pip install nvidia-modelopt "numpy>=2.2.6" "protobuf>=4.25.1" nvidia-ml-py gradio --upgrade
pip uninstall opencv-python opencv-contrib-python opencv-python-headless -y
pip install opencv-python
Note: The transformers installation from git is only required if you want to export ONNX from the original SAM3 HuggingFace repo.
TensorRT Installation
Important: make sure TensorRT is installed and that its binaries (such as trtexec) are on your PATH.
For Linux, install TensorRT 10.14.1.48 with CUDA 12.9:
sudo apt-get install -y --allow-downgrades \
libnvinfer10=10.14.1.48-1+cuda12.9 \
libnvinfer-dev=10.14.1.48-1+cuda12.9 \
libnvinfer-headers-dev=10.14.1.48-1+cuda12.9 \
libnvinfer-headers-plugin-dev=10.14.1.48-1+cuda12.9 \
libnvinfer-bin=10.14.1.48-1+cuda12.9 \
libnvinfer-dispatch10=10.14.1.48-1+cuda12.9 \
libnvinfer-dispatch-dev=10.14.1.48-1+cuda12.9 \
libnvinfer-lean10=10.14.1.48-1+cuda12.9 \
libnvinfer-lean-dev=10.14.1.48-1+cuda12.9 \
libnvinfer-plugin10=10.14.1.48-1+cuda12.9 \
libnvinfer-plugin-dev=10.14.1.48-1+cuda12.9 \
libnvinfer-vc-plugin10=10.14.1.48-1+cuda12.9 \
libnvinfer-vc-plugin-dev=10.14.1.48-1+cuda12.9 \
libnvonnxparsers10=10.14.1.48-1+cuda12.9 \
libnvonnxparsers-dev=10.14.1.48-1+cuda12.9 \
python3-libnvinfer=10.14.1.48-1+cuda12.9 \
python3-libnvinfer-dev=10.14.1.48-1+cuda12.9 \
python3-libnvinfer-lean=10.14.1.48-1+cuda12.9 \
python3-libnvinfer-dispatch=10.14.1.48-1+cuda12.9
pip install tensorrt-cu12==10.14.1.48.post1 \
tensorrt-dispatch-cu12==10.14.1.48.post1 \
tensorrt-lean-cu12==10.14.1.48.post1
Add TensorRT to your PATH:
echo 'export PATH="/usr/src/tensorrt/bin:$PATH"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:/usr/lib/tensorrt${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"' >> ~/.bashrc
source ~/.bashrc
Verify installation:
python3 Check.py
2. Download or Export ONNX Models
You have two options: download pre‑exported ONNX or export from PyTorch yourself.
2.1. Download pre‑exported ONNX (recommended)
Download prebuilt ONNX models exported at 1008 × 1008 resolution:
hf download --local-dir "Onnx-Models" kishanstar2003/SAM3_ONNX_FP16
This will create an Onnx-Models directory containing:
- vision-encoder.onnx
- text-encoder.onnx
- geometry-encoder.onnx
- decoder.onnx
- tokenizer.json (automatically copied to the engines directory)
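Before building engines, it can help to confirm that the download is complete. The following is a small sketch, not part of the repository: `missing_model_files` is a hypothetical helper, and it only checks for the five file names listed above.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = (
    "vision-encoder.onnx",
    "text-encoder.onnx",
    "geometry-encoder.onnx",
    "decoder.onnx",
    "tokenizer.json",
)


def missing_model_files(model_dir: str = "Onnx-Models") -> list[str]:
    """Return the expected file names that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected ONNX files and tokenizer.json are present.")
```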
2.2. Export ONNX from the SAM3 PyTorch model (Manual)
If you want to manually export ONNX models:
- Download the original SAM3 PyTorch model:
hf download facebook/sam3 --local-dir sam3
- Patch the transformers code:
python3 patch_sam3_interp_rope.py
- Export to ONNX:
python3 SAM3_PyTorch_To_Onnx.py --all --model-path "sam3" --output-dir "Onnx-Models" --device cuda --size 1008
Key points:
- The script exports four modules via wrappers:
  - VisionEncoderWrapper → vision-encoder.onnx
  - TextEncoderWrapper → text-encoder.onnx
  - GeometryEncoderWrapper → geometry-encoder.onnx
  - DecoderWrapper → decoder.onnx
- All exports use opset 20 and dynamic batch / prompt dimensions, compatible with TensorRT.
- The --size 1008 parameter sets the resolution for the exported models.
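To illustrate what "opset 20 with dynamic batch dimensions" means in practice, here is a rough sketch of the keyword arguments that `torch.onnx.export` accepts for this purpose. It is not the code in SAM3_PyTorch_To_Onnx.py: `export_kwargs` is a hypothetical helper, and the tensor names `images` and `features` are placeholders, since the real input and output names are defined by the export script.

```python
# Mapping from wrapper class name to output file, as listed above.
EXPORT_TARGETS = {
    "VisionEncoderWrapper": "vision-encoder.onnx",
    "TextEncoderWrapper": "text-encoder.onnx",
    "GeometryEncoderWrapper": "geometry-encoder.onnx",
    "DecoderWrapper": "decoder.onnx",
}


def export_kwargs(input_names, output_names, dynamic_dims):
    """Build keyword arguments for torch.onnx.export.

    `dynamic_dims` maps a tensor name to {axis_index: axis_label} for the
    axes that should stay dynamic (for example the batch axis).
    """
    return {
        "input_names": list(input_names),
        "output_names": list(output_names),
        "dynamic_axes": {name: dict(axes) for name, axes in dynamic_dims.items()},
        "opset_version": 20,  # opset stated in the key points above
    }


# Hypothetical tensor names for the vision encoder; the real names may differ.
vision_kwargs = export_kwargs(
    input_names=["images"],
    output_names=["features"],
    dynamic_dims={"images": {0: "batch"}, "features": {0: "batch"}},
)

# With PyTorch installed, the kwargs would be passed along these lines:
#   torch.onnx.export(wrapper, (example_images,), "vision-encoder.onnx", **vision_kwargs)
```

Marking only axis 0 as dynamic keeps the spatial size fixed at export time, which matches the description of --size as setting the resolution of the exported models.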
Changing Resolution:
If you want to use a different resolution (e.g., 644), simply change the --size parameter when exporting ONNX:
python3 SAM3_PyTorch_To_Onnx.py --all --model-path "sam3" --output-dir "Onnx-Models" --device cuda --size 644
Then rebuild the engines using the same command as before:
python3 Build_Engines.py --onnx "Onnx-Models" --engine "Engines"
The engine building command remains the same regardless of resolution.
3. Build TensorRT Engines
Once you have the ONNX models in Onnx-Models, build TensorRT engines using Build_Engines.py.
python3 Build_Engines.py --onnx "Onnx-Models" --engine "Engines"
Arguments:
- --base (optional): base directory (default: current working directory).
- --onnx: directory containing .onnx models (default: BASE/Onnx-Models).
- --engine: output directory for .engine files (default: BASE/Engines).
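The defaults above can be read as a simple fallback rule. The sketch below shows one way to express it; `resolve_dirs` is a hypothetical helper, and treating an explicit --onnx or --engine value as independent of --base is an assumption, not something confirmed from Build_Engines.py.

```python
from pathlib import Path


def resolve_dirs(base=None, onnx=None, engine=None):
    """Return (onnx_dir, engine_dir) using the defaults described above."""
    # --base falls back to the current working directory.
    base_dir = Path(base) if base is not None else Path.cwd()
    # --onnx and --engine fall back to sub-directories of the base.
    onnx_dir = Path(onnx) if onnx is not None else base_dir / "Onnx-Models"
    engine_dir = Path(engine) if engine is not None else base_dir / "Engines"
    return onnx_dir, engine_dir


if __name__ == "__main__":
    onnx_dir, engine_dir = resolve_dirs()
    print(f"ONNX models:  {onnx_dir}")
    print(f"Engine files: {engine_dir}")
```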
The script:
- Runs trtexec with FP16 and appropriate min/opt/max shapes for each module:
  - vision-encoder
  - text-encoder
  - geometry-encoder
  - decoder
- Skips engines that already exist.
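The two behaviours described above (one trtexec call per module, skipping engines that already exist) can be sketched as follows. This is an illustration rather than the actual Build_Engines.py logic: the helper names, the `<module>.engine` file naming, the tensor name `images`, and the batch range are all assumptions. The trtexec flags themselves (`--onnx`, `--saveEngine`, `--fp16`, `--minShapes`, `--optShapes`, `--maxShapes`) are standard.

```python
from pathlib import Path


def trtexec_command(onnx_path, engine_path, min_shapes, opt_shapes, max_shapes):
    """Build a trtexec argument list for one FP16 engine with dynamic shapes."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--fp16",
        f"--minShapes={min_shapes}",
        f"--optShapes={opt_shapes}",
        f"--maxShapes={max_shapes}",
    ]


def plan_builds(onnx_dir, engine_dir, shape_profiles):
    """Return commands for modules whose engine file does not exist yet."""
    commands = []
    for module, (min_s, opt_s, max_s) in shape_profiles.items():
        engine_path = Path(engine_dir) / f"{module}.engine"
        if engine_path.exists():
            continue  # skip engines that were already built
        onnx_path = Path(onnx_dir) / f"{module}.onnx"
        commands.append(trtexec_command(onnx_path, engine_path, min_s, opt_s, max_s))
    return commands


# Hypothetical profile: the tensor name "images" and the batch range are
# placeholders, not the values used by Build_Engines.py.
profiles = {
    "vision-encoder": (
        "images:1x3x1008x1008",
        "images:1x3x1008x1008",
        "images:4x3x1008x1008",
    ),
}

for cmd in plan_builds("Onnx-Models", "Engines", profiles):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True) would launch the actual build.
```

Returning the commands instead of running them makes it easy to inspect what would be built before committing to a long engine build.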
4. Verify System & TensorRT Installation
Use Check.py to audit your environment:
python3 Check.py
It reports:
- GPU hardware and driver via nvidia-smi
- NVCC presence and version
- PyTorch, CUDA version, and ONNX Runtime
- TensorRT Python bindings and builder creation
- trtexec availability
- Available ONNX Runtime providers (CUDA / TensorRT, etc.)
Run this once after setup to confirm everything is wired correctly.
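For a sense of what such an audit involves, the sketch below checks a subset of the items above using only the standard library. It is not the implementation of Check.py: `audit_environment` is a hypothetical function that only tests whether the tools are on PATH and whether the modules are importable, without querying versions or creating a TensorRT builder.

```python
import importlib.util
import shutil

# Command-line tools and Python modules named in the list above.
TOOLS = ("nvidia-smi", "nvcc", "trtexec")
MODULES = ("torch", "onnxruntime", "tensorrt")


def audit_environment():
    """Return {name: bool} for each tool on PATH and each importable module."""
    report = {}
    for tool in TOOLS:
        report[tool] = shutil.which(tool) is not None
    for module in MODULES:
        # find_spec locates the module without importing it.
        report[module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, ok in audit_environment().items():
        print(f"{name:<12} {'found' if ok else 'MISSING'}")
```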
5. Run TensorRT Inference
With engines and tokenizer in place, you can run inference in two ways: command line or interactive web UI.
5.1. Command Line Inference
Run the end‑to‑end inference script:
Bounding Box Detection Mode:
python3 SAM3_TensorRT_Inference.py --input "Assets/Test.jpg" --prompt "person" --conf 0.8 --output result.jpg --models "Engines"
Mask Segmentation Mode:
python3 SAM3_TensorRT_Inference.py --input "Assets/Test.jpg" --prompt "person" --conf 0.8 --output result.jpg --models "Engines" --segment
Arguments:
- --input: path to input image file.
- --prompt: text prompt (e.g., "person", "car", "dog").
- --conf: confidence threshold (0.0–1.0) applied on box scores.
- --output: path to save the annotated image.
- --models: directory containing .engine files and tokenizer.json (typically Engines).
- --segment: (optional) enable mask segmentation mode. If omitted, uses bounding box detection.
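The flags above map naturally onto an `argparse` parser. The following is a sketch under stated assumptions, not the parser from SAM3_TensorRT_Inference.py: the default values (0.5 for --conf, result.jpg for --output, Engines for --models) and the range check are illustrative guesses.

```python
import argparse


def build_parser():
    """Create a parser with the flags documented above."""
    parser = argparse.ArgumentParser(description="SAM3 TensorRT inference (sketch)")
    parser.add_argument("--input", required=True, help="path to the input image")
    parser.add_argument("--prompt", required=True, help="text prompt, e.g. 'person'")
    # Default threshold is an assumption made for this sketch.
    parser.add_argument("--conf", type=float, default=0.5, help="box score threshold in [0, 1]")
    parser.add_argument("--output", default="result.jpg", help="path for the annotated image")
    parser.add_argument("--models", default="Engines", help="directory with .engine files and tokenizer.json")
    parser.add_argument("--segment", action="store_true", help="enable mask segmentation mode")
    return parser


def parse_args(argv=None):
    """Parse arguments and reject thresholds outside [0, 1]."""
    args = build_parser().parse_args(argv)
    if not 0.0 <= args.conf <= 1.0:
        raise ValueError("--conf must be between 0.0 and 1.0")
    return args


if __name__ == "__main__":
    print(parse_args())
```

Using `action="store_true"` for --segment matches the documented behaviour: bounding box detection when the flag is omitted, mask segmentation when it is present.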
5.2. Interactive Web UI
For easier testing and experimentation, use the Gradio web interface:
python3 ui_gradio.py
This launches a web interface where you can:
- Upload images directly
- Enter text prompts interactively
- Adjust confidence thresholds with sliders
- Toggle between bounding box and segmentation modes
- View results instantly with performance metrics
The UI automatically loads the TensorRT engines from the Engines directory and provides real-time inference.
What the script does:
- Wraps each engine with TRTModule for efficient execution using PyTorch CUDA tensors.
- Preprocesses the input image:
  - Resize to 1008 × 1008
  - Normalize to [-1, 1]
- Runs:
  - Vision encoder → FPN features + positional encodings
  - Text encoder → token embeddings + masks (via tokenizers and tokenizer.json)
  - Decoder → predicted boxes, logits, presence logits, and masks
- Computes combined scores from logits and presence logits, filters by --conf, denormalizes boxes, and draws them onto the original image.
- Bounding Box Mode: Draws rectangular boxes around detected objects
- Segmentation Mode: Generates and overlays pixel-accurate masks for detected objects
Output:
- An image with bounding boxes/masks and scores, saved to --output.
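The normalization and score-filtering steps can be sketched in plain Python. This is an approximation for illustration only and may differ from SAM3_TensorRT_Inference.py: it assumes pixels are mapped linearly from [0, 255] to [-1, 1], that the combined score is the product of the sigmoid of each box logit and the sigmoid of a single presence logit, and that boxes arrive as normalized (x1, y1, x2, y2) corners. All function names are hypothetical.

```python
import math


def normalize_pixel(value: int) -> float:
    """Map an 8-bit pixel value in [0, 255] linearly onto [-1, 1]."""
    return value / 127.5 - 1.0


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def combined_score(logit: float, presence_logit: float) -> float:
    """Assumed combination: product of the two sigmoid probabilities."""
    return sigmoid(logit) * sigmoid(presence_logit)


def denormalize_box(box, width, height):
    """Scale a normalized (x1, y1, x2, y2) box to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (x1 * width, y1 * height, x2 * width, y2 * height)


def filter_detections(boxes, logits, presence_logit, conf, width, height):
    """Keep boxes whose combined score reaches `conf`, in pixel coordinates."""
    kept = []
    for box, logit in zip(boxes, logits):
        score = combined_score(logit, presence_logit)
        if score >= conf:
            kept.append((denormalize_box(box, width, height), score))
    return kept


# Made-up decoder outputs for a 1920 x 1080 image.
detections = filter_detections(
    boxes=[(0.1, 0.2, 0.5, 0.9), (0.6, 0.6, 0.7, 0.7)],
    logits=[4.0, -2.0],
    presence_logit=5.0,
    conf=0.8,
    width=1920,
    height=1080,
)
for box, score in detections:
    print([round(v, 1) for v in box], round(score, 3))
```

In this example only the first box survives the 0.8 threshold, because the second box has a negative logit and therefore a low combined score.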
🐳 Docker Image Usage (Always Pull Latest Code)
You can run the SAM3 TensorRT pipeline using the prebuilt Docker image while always pulling the latest code from GitHub.
📦 Important
The container mounts your current directory as /workspace.
Before starting, make sure you have downloaded the original SAM3 model from Hugging Face into your current directory:
Download SAM3
hf download facebook/sam3 --local-dir sam3
🚀 Run Docker Container (Auto-Update Repo)
Set the port (default: 7860):
export PORT=7860
Then run:
docker run --gpus all \
--ipc=host \
-p $PORT:$PORT \
-e GRADIO_SERVER_PORT=$PORT \
-v $(pwd):/workspace \
-it \
kishanstark2003/sam3_demo_gradio:latest \
/bin/bash -c "\
export PATH=\
⚠️ Disclaimer
This project provides high-performance optimizations for SAM3. Note that TensorRT engine performance and stability are highly dependent on specific hardware (GPU architecture) and software (CUDA/TensorRT versions). Use these optimization scripts at your own risk.
⚖️ Licensing & Acknowledgments
This repository contains both original code and derivative works of Meta's Segment Anything Model 3 (SAM 3).
- Source Code: All Python scripts (.py), conversion logic, and TensorRT wrappers provided in this repository are licensed under the MIT License.
- SAM 3 Materials & Derivatives: The underlying model weights, architectures, and all exported ONNX/TensorRT engines generated by these scripts are subject to the Meta SAM License.
Research Acknowledgment
Per the SAM License (Section 1.b.ii), this project acknowledges the use of SAM Materials distributed by Meta Platforms, Inc. for the development and optimization of this TensorRT inference pipeline.
Built With
- gradio
- onnx
- python
- tensorrt

