English | 简体中文
Ultralytics YOLO: You Only Look Once
Summary
This article introduces the YOLO models based on the `ultralytics/ultralytics` repository, covering model export, toolchain quantization and compilation, and Python and C++ deployment on the RDK platform. It includes 6 YOLO versions and 4 algorithm tasks, providing over 1,000 performance and accuracy benchmarks, along with usage instructions, for platforms such as RDK X5, RDK S100, and RDK S100P.
The BPU models are open-sourced in the HuggingFace repository: HuggingFace: Cauchy_Ultralytics_YOLO
The BPU models can also be downloaded from the D-Robotics archive server: [downloads/rdk_model_zoo](https://archive.d-robotics.cc/downloads/rdk_model_zoo/)
RDK S Model Zoo: GitHub: RDK_Model_Zoo_S
RDK X5 Model Zoo: GitHub: RDK_Model_Zoo
Suggestions
Before reading this article, please ensure you have a basic understanding of Linux systems, some foundational knowledge of machine learning or deep learning, and basic development skills in Python or C/C++. Carefully check for errors such as
`No such file or directory`, `No module named "xxx"`, `command not found`, `permission denied`, `SyntaxError: invalid syntax`, etc. Do not copy and run commands line by line without understanding them. Please make sure you have thoroughly read the first three chapters of the RDK manual, have worked through the introductory sections of the OpenExplore package and the BPU algorithm toolchain manual, and have successfully converted 1–2 of your preferred preset ONNX models using the OpenExplore package.
Please note that the community code is collaboratively developed with developers over the long term and has not undergone the same rigorous testing as commercial releases. Due to limited author capacity and resources, we cannot currently guarantee long-term stable operation. If you have better ideas, we welcome your issues and pull requests (PRs).
Please note that Ultralytics YOLO is licensed under the AGPL-3.0 license. Use it in compliance with the relevant license terms. For more information, please refer to: https://www.ultralytics.com/license
Introduction to YOLO
YOLO (You Only Look Once) is a popular object detection and image segmentation model developed by Joseph Redmon and Ali Farhadi of the University of Washington. YOLO was introduced in 2015 and quickly gained popularity due to its high speed and accuracy.
- YOLOv2, released in 2016, improved upon the original model by incorporating batch normalization, anchor boxes, and dimension clustering.
- YOLOv3: The third iteration of the YOLO model family, originally by Joseph Redmon, known for its efficient real-time object detection capabilities.
- YOLOv4: A darknet-native update to YOLOv3, released by Alexey Bochkovskiy in 2020.
- YOLOv5: An improved version of the YOLO architecture by Ultralytics, offering better performance and speed trade-offs compared to previous versions.
- YOLOv6: Released by Meituan in 2022, and in use in many of the company's autonomous delivery robots.
- YOLOv7: Updated YOLO models released in 2022 by the authors of YOLOv4.
- YOLOv8: The latest version of the YOLO family, featuring enhanced capabilities such as instance segmentation, pose/keypoints estimation, and classification.
- YOLOv9: An experimental model trained on the Ultralytics YOLOv5 codebase implementing Programmable Gradient Information (PGI).
- YOLOv10: By Tsinghua University, featuring NMS-free training and efficiency-accuracy driven architecture, delivering state-of-the-art performance and latency.
- YOLO11 🚀: Ultralytics' latest YOLO models delivering state-of-the-art (SOTA) performance across multiple tasks.
- YOLO12 builds a YOLO framework centered around attention mechanisms, employing innovative methods and architectural improvements to break the dominance of CNN models within the YOLO series. This enables real-time object detection with faster inference speeds and higher detection accuracy.
- Ultralytics YOLO26 (upcoming release) is the latest evolution of the YOLO series of real-time object detectors, designed from the ground up specifically for edge and low-power devices. It introduces a simplified design, eliminates unnecessary complexity, and integrates targeted innovations to achieve faster, lighter, and more accessible deployments.
Quick Experience
For a quick experience, flash the latest community RDK OS image to your RDK board, make sure the network connection works, then download the RDK Model Zoo and enter the samples/Vision/Ultralytics_YOLO folder. The examples run with the system's global Python interpreter; if you want to use a virtual environment such as conda, refer to the "Model Deployment" section at the end of this article.
# Download RDK Model Zoo:
# https://github.com/D-Robotics/rdk_model_zoo_s
# Clone this repo (Optional)
git clone https://github.com/D-Robotics/rdk_model_zoo_s.git
# Make sure you are in this folder
cd samples/Vision/Ultralytics_YOLO
# Example: Detect
python3 -m py.rdk_yolo_app --yolo-type yolo11 --model-type detect --workspace result_detect_workspace
# Example: Seg
python3 -m py.rdk_yolo_app --yolo-type yolo11 --model-type segmentation --workspace result_segmentation_workspace
# Example: Pose
python3 -m py.rdk_yolo_app --yolo-type yolo11 --model-type pose --workspace result_pose_workspace
# Example: cls
python3 -m py.rdk_yolo_app --yolo-type yolo11 --model-type classification --workspace result_classification_workspace
Result Analysis
The program automatically downloads the corresponding YOLO11 model, runs inference on all images in the folder, and saves the inference and visualization results to the directory specified by `--workspace`.
Reference run log:
# python3 -m py.rdk_yolo_app --yolo-type yolo11 --model-type classification --workspace result_classification_workspace
[RDK_YOLO] [15:26:26.435] [INFO] Namespace(model_path='BPU/Nash-e/yolo11n_seg_nashe_640x640_nv12.hbm', source='../../../resource/datasets/COCO2017/assets/', workspace='result_classification_workspace', mode='default', mode_save_name='result_bpu.txt', yolo_type='yolo11', model_type='classification', classes_num=80, nms_thres=0.7, score_thres=0.25, reg=16, strides=[8, 16, 32], mc=32, is_open=True, is_point=False, pose_classes_num=1, nkpt=17, kpt_conf_thres=0.5)
--2025-12-01 15:26:26-- https://archive.d-robotics.cc/downloads/rdk_model_zoo/rdk_s100/Ultralytics_YOLO_OE_3.5.0/Nash-e/yolo11s_cls_nashe_640x640_nv12.hbm
Resolving archive.d-robotics.cc (archive.d-robotics.cc)... 58.218.215.103
Connecting to archive.d-robotics.cc (archive.d-robotics.cc)|58.218.215.103|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7609264 (7.3M) [application/octet-stream]
Saving to: 'yolo11s_cls_nashe_640x640_nv12.hbm'
yolo11s_cls_nashe_640x640_nv12.hb 100%[===========================================================>] 7.26M 800KB/s in 10s
2025-12-01 15:26:38 (715 KB/s) - 'yolo11s_cls_nashe_640x640_nv12.hbm' saved [7609264/7609264]
[UCP]: log level = 3
[UCP]: UCP version = 3.7.6
[VP]: log level = 3
[DNN]: log level = 3
[HPL]: log level = 3
[UCPT]: log level = 6
[RDK_YOLO] [15:26:38.552] [INFO] Auto Select Device: RDK S100 / RDK S100P
[BPU][[BPU_MONITOR]][281473793001152][INFO]BPULib verison(2, 1, 2)[0d3f195]!
[DNN] HBTL_EXT_DNN log level:6
[DNN]: 3.7.6_(4.3.3 HBRT)
[RDK_YOLO] [15:26:38.557] [INFO] YOLO Type: yolo11, Model Type: classification, init success.
inference: 100%|█████████████████████| 7/7 [00:00<00:00, 48.23 item/s]
The demo program also supports the following parameters; feel free to explore them. If you find a problem, you are welcome to submit a PR to fix it.
$ python3 -m py.rdk_yolo_app -h
usage: rdk_yolo_app.py [-h] [--model-path MODEL_PATH] [--source SOURCE] [--workspace WORKSPACE] [--mode MODE]
[--mode-save-name MODE_SAVE_NAME] [--yolo-type YOLO_TYPE] [--model-type MODEL_TYPE]
[--classes-num CLASSES_NUM] [--nms-thres NMS_THRES] [--score-thres SCORE_THRES] [--reg REG]
[--strides STRIDES] [--mc MC] [--is-open IS_OPEN] [--is-point IS_POINT] [--pose-classes-num POSE_CLASSES_NUM]
[--nkpt NKPT] [--kpt-conf-thres KPT_CONF_THRES]
options:
-h, --help show this help message and exit
--model-path MODEL_PATH
Path to BPU Model.
--source SOURCE image file or images path.
--workspace WORKSPACE
workspace path.
--mode MODE default / coco2017 / imagenet1k
--mode-save-name MODE_SAVE_NAME
--yolo-type YOLO_TYPE
yolov5u, yolov8, yolov9, yolov10, yolo11, yolo12
--model-type MODEL_TYPE
detect, segmentation, pose, classification
--classes-num CLASSES_NUM
Classes Num to Detect.
--nms-thres NMS_THRES
IoU threshold, default 0.7.
--score-thres SCORE_THRES
confidence threshold.
--reg REG DFL reg layer, default 16, sometimes 26.
  --strides STRIDES     strides, e.g. --strides 8, 16, 32
--mc MC Mask Coefficients, default 32.
  --is-open IS_OPEN     True: apply morphologyEx, default True for better visualization.
  --is-point IS_POINT   True: draw edge points, default False.
--pose-classes-num POSE_CLASSES_NUM
Classes Num to Detect, default 1.
  --nkpt NKPT           num of keypoints, default 17.
--kpt-conf-thres KPT_CONF_THRES
confidence threshold.
Supported Models
Object Detection
Supported Platform:
RDK S100P / RDK S100 / RDK X5 (Module)
- YOLO12 - Detect, Size: n, s, m, l, x
- YOLO11 - Detect, Size: n, s, m, l, x
- YOLOv10 - Detect, Size: n, s, m, b, l, x
- YOLOv9 - Detect, Size: t, s, m, c, e
- YOLOv8 - Detect, Size: n, s, m, l, x
- YOLOv5u - Detect, Size: n, s, m, l, x
Instance Segmentation
Supported Platform:
RDK S100P / RDK S100 / RDK X5 (Module)
- YOLO11 - Seg, Size: n, s, m, l, x
- YOLOv9 - Seg, Size: c, e
- YOLOv8 - Seg, Size: n, s, m, l, x
Pose Estimation
Supported Platform:
RDK S100P / RDK S100 / RDK X5 (Module)
- YOLO11 - Pose, Size: n, s, m, l, x
- YOLOv8 - Pose, Size: n, s, m, l, x
Image Classification
Supported Platform:
RDK S100P / RDK S100 / RDK X5 (Module)
- YOLO11 - CLS, Size: n, s, m, l, x
- YOLOv8 - CLS, Size: n, s, m, l, x
Platform Details
RDK S100P
- Target Device:
CPU Arch: 6 x Cortex-A78AE @ 2.0 GHz (137K DMIPS)
BPU Arch: 1 x Nash-m @ 1.5 GHz (128 TOPs, int8)
Memory: LPDDR5 @ 24GB
Operating System: >= RDK OS 4.0.4-beta based on Ubuntu 22.04
- Host Development Environment:
CPU Arch: x86
D-Robotics OpenExplore Toolchain Version: >= 3.5.0
Ultralytics YOLO Version: >= 8.3.0
Operating System: Ubuntu 22.04.5 LTS
Python Version: 3.10.18
RDK S100
- Target Device:
CPU Arch: 6 x Cortex-A78AE @ 1.5 GHz (100K DMIPS)
BPU Arch: 1 x Nash-e @ 1.0 GHz (80 TOPs, int8)
Memory: LPDDR5 @ 12GB
Operating System: >= RDK OS 4.0.4-beta based on Ubuntu 22.04
- Host Development Environment:
CPU Arch: x86
D-Robotics OpenExplore Toolchain Version: >= 3.5.0
Ultralytics YOLO Version: >= 8.3.0
Operating System: Ubuntu 22.04.5 LTS
Python Version: 3.10.18
RDK X5 / RDK X5 Module
- Target Device:
CPU Arch: 8 x Cortex-A55 @ 1.5 GHz (38K DMIPS)
BPU Arch: 1 x Bayes-e @ 1.0 GHz (10 TOPs, int8)
Memory: LPDDR4 @ 4GB / 8GB
Operating System: >= RDK OS 3.3.3 based on Ubuntu 22.04
- Host Development Environment:
CPU Arch: x86
D-Robotics OpenExplore Toolchain Version: >= 3.5.0
Ultralytics YOLO Version: >= 8.3.0
Operating System: Ubuntu 22.04.5 LTS
Python Version: 3.10.18
Benchmark Instructions
Performance Test Instructions
The Device column indicates the test platform: S100P refers to RDK S100P, S100 refers to RDK S100, X5 refers to RDK X5 (Module).
The Model column specifies the tested model, which corresponds directly to the models listed in the "Supported Models" section of this document.
The Size (Pixels) column denotes the algorithmic input resolution of the model—i.e., the input resolution of the exported ONNX model. Input images with other resolutions are typically preprocessed (resized) to this resolution before being fed into the network for inference.
The Classes column indicates the number of detection categories supported by the model. All models listed here use weights trained by Ultralytics YOLO on either the COCO2017 dataset or the ImageNet-1k dataset; thus, the number of classes matches that of the respective training dataset.
The BPU Task Latency / BPU Throughput (Threads) column reports BPU latency and throughput under various threading conditions:
- Single-thread latency is measured per frame, using a single thread and a single BPU core, and represents the ideal-case latency for a single BPU inference task.
- Multi-thread throughput is measured with multiple threads concurrently submitting tasks to the BPU; each BPU core can handle tasks from multiple threads. In practical engineering scenarios, using 2 threads typically achieves minimal per-frame latency while fully utilizing all BPU cores (100% utilization), striking a good balance between throughput (FPS) and frame latency. The table generally records results up to the point where throughput no longer increases significantly with additional threads.

BPU latency and throughput were measured on-device using the following command. The hrt_model_exec tool is provided by the OE package, with source code located in package/board/hrt_model_exec/src within the OE package:

hrt_model_exec perf --thread_num 2 --model_file <model.bin / model.hbm>

Due to varying experimental conditions, reproduced results may differ. All measurements reported here were conducted under the optimal device state specified in the "Platform Details" section. The hrt_model_exec performance test accounts for cache warm-up and proper multi-threaded program design; the measured time spans from when the user application submits a BPU task until the task completes. For streaming inference, input and output memory buffers should be allocated once and reused across frames. Do not include memory allocation/deallocation time in inference timing, and do not repeatedly allocate/free memory during streaming inference; both are poor software design.
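The multi-thread throughput measurement described above can be sketched in plain Python. This is a toy stand-in for hrt_model_exec, not its implementation: the `infer` callable and `measure` helper are hypothetical placeholders for submitting one BPU task and waiting for its completion.

```python
import threading
import time

def measure(infer, frames: int, n_threads: int = 2):
    """Return (throughput_fps, mean_latency_s) when `n_threads` workers
    each submit frames // n_threads tasks concurrently.

    `infer` is a hypothetical stand-in for one BPU inference task.
    """
    latencies = []
    lock = threading.Lock()

    def worker(n: int) -> None:
        for _ in range(n):
            t0 = time.perf_counter()
            infer()  # one task: submit and wait for completion
            dt = time.perf_counter() - t0
            with lock:
                latencies.append(dt)

    per_thread = frames // n_threads
    threads = [threading.Thread(target=worker, args=(per_thread,))
               for _ in range(n_threads)]
    t_start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.perf_counter() - t_start
    done = per_thread * n_threads
    # Throughput counts all completed frames over wall time; latency is
    # the mean per-task time seen by each submitting thread.
    return done / wall, sum(latencies) / len(latencies)
```

With 2 threads, per-task latency roughly doubles while throughput rises, mirroring the 1-thread vs 2-thread columns in the tables below.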
CPU Latency (Single Core) refers to post-processing time. Current implementations include performance optimizations for post-processing, whose duration scales linearly with the number of valid detected objects. The reported values assume fewer than 100 valid objects per image. While Python and C/C++ implementations may show slight differences in post-processing time, the gap is small because the Python version heavily relies on highly optimized NumPy operations.
params(M) and FLOPs(B) represent the parameter count and computational complexity (in floating-point operations) of the original floating-point model. These values are obtained from logs printed by the Ultralytics YOLO package when calling YOLO.export() after loading a .pt model. Note that the final fixed-point BPU model’s parameters and FLOPs depend on model structure optimization, graph optimization, and compiler optimizations. Although correlated with the floating-point model’s metrics, they are not strictly proportional. Therefore, floating-point FLOPs are uniformly recorded here as a reference.
Accuracy Test Instructions
The meanings of the Device and Model columns are identical to those described in the "Performance Test Instructions" section.
Accuracy metrics were computed using Microsoft’s official, unmodified pycocotools library: For Object Detection, evaluation uses iouType="bbox". For Instance Segmentation, evaluation uses both iouType="bbox" and iouType="segm". For Human Pose Estimation (Keypoint Detection), evaluation uses iouType="keypoints".
Specific metrics are derived as follows:
- Accuracy bbox-all mAP@.5:.95: Average Precision (AP) @[ IoU=0.50:0.95 area=all maxDets=100 ]
- Accuracy bbox-small mAP@.5:.95: Average Precision (AP) @[ IoU=0.50:0.95 area=small maxDets=100 ]
- Accuracy bbox-medium mAP@.5:.95: Average Precision (AP) @[ IoU=0.50:0.95 area=medium maxDets=100 ]
- Accuracy bbox-large mAP@.5:.95: Average Precision (AP) @[ IoU=0.50:0.95 area=large maxDets=100 ]
- Accuracy mask-all mAP@.5:.95: Average Precision (AP) @[ IoU=0.50:0.95 area=all maxDets=100 ]
- Accuracy mask-small mAP@.5:.95: Average Precision (AP) @[ IoU=0.50:0.95 area=small maxDets=100 ]
- Accuracy mask-medium mAP@.5:.95: Average Precision (AP) @[ IoU=0.50:0.95 area=medium maxDets=100 ]
- Accuracy mask-large mAP@.5:.95: Average Precision (AP) @[ IoU=0.50:0.95 area=large maxDets=100 ]
- Accuracy pose-all mAP@.5:.95: Average Precision (AP) @[ IoU=0.50:0.95 area=all maxDets=20 ]
- Accuracy pose-medium mAP@.5:.95: Average Precision (AP) @[ IoU=0.50:0.95 area=medium maxDets=20 ]
- Accuracy pose-large mAP@.5:.95: Average Precision (AP) @[ IoU=0.50:0.95 area=large maxDets=20 ]
AP (Average Precision) emphasizes quality: it requires both high recall (finding targets) and high precision (accurate bounding boxes and correct classification). In contrast, AR (Average Recall) emphasizes quantity: it counts any detection that overlaps a ground truth, without penalizing false positives. Thus, a model can have high AR but low AP (e.g., by generating many low-quality detections) or high AP but low AR (e.g., by only outputting high-confidence predictions and missing many targets). This document uses AP as the primary accuracy metric.
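The AP-versus-AR trade-off can be made concrete with toy numbers (the counts and the `precision_recall` helper below are illustrative, not from any benchmark):

```python
def precision_recall(n_matched: int, n_pred: int, n_gt: int):
    """Precision = matched / predictions; recall = matched / ground truths."""
    return n_matched / n_pred, n_matched / n_gt

# "Spray" model: 10 detections cover all 3 ground-truth objects.
# Recall is perfect, but 7 false positives drag precision (and AP) down.
p_a, r_a = precision_recall(n_matched=3, n_pred=10, n_gt=3)  # (0.3, 1.0)

# Conservative model: only 2 confident detections, both correct.
# Precision is perfect, but one object is missed, so recall suffers.
p_b, r_b = precision_recall(n_matched=2, n_pred=2, n_gt=3)   # (1.0, 0.667)
```

AP integrates precision over recall levels, so it rewards models that manage both at once, which is why this document reports AP.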
All tests used the 5,000 images from the COCO2017 validation set. Inference was performed directly on the device, and results were dumped to JSON files for evaluation using the third-party pycocotools library. A confidence threshold of 0.25 and an NMS IoU threshold of 0.7 were applied.
It is normal for pycocotools to report slightly lower accuracy than Ultralytics’ own evaluation tools. This discrepancy arises because pycocotools computes the AP integral using rectangular approximation, whereas Ultralytics uses trapezoidal approximation. Our focus is on using a consistent evaluation method to compare fixed-point (quantized) and floating-point models, thereby assessing quantization-induced accuracy loss.
For classification tasks, the ImageNet-1k dataset was used, with Top-1 and Top-5 accuracy reported to evaluate quantization-induced accuracy degradation.
Converting BPU model input from NCHW-RGB888 to YUV420SP (NV12) introduces minor accuracy loss due to color-space transformation. This loss can be mitigated by incorporating such color-space conversion during model training.
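For reference, the color-space conversion in question can be sketched in NumPy. This is a minimal sketch using full-range BT.601 coefficients; the exact matrix used on-device may differ, which is part of the minor loss discussed above.

```python
import numpy as np

def rgb_to_nv12(rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image (H and W even) to NV12 (YUV420SP).

    Assumes full-range BT.601 coefficients; rounding and chroma
    subsampling are both lossy, so a round trip is not exact.
    """
    h, w, _ = rgb.shape
    r = rgb[..., 0].astype(np.float32)
    g = rgb[..., 1].astype(np.float32)
    b = rgb[..., 2].astype(np.float32)
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.500 * b + 128.0
    v = 0.500 * r - 0.419 * g - 0.081 * b + 128.0
    # YUV420: chroma is subsampled 2x2 by averaging each block
    u_sub = u.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    v_sub = v.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    nv12 = np.empty(h * w * 3 // 2, dtype=np.uint8)
    nv12[: h * w] = np.clip(np.round(y), 0, 255).astype(np.uint8).ravel()
    # Semi-planar (SP) layout: full Y plane first, then interleaved U/V pairs
    uv = np.empty((h // 2, w), dtype=np.uint8)
    uv[:, 0::2] = np.clip(np.round(u_sub), 0, 255).astype(np.uint8)
    uv[:, 1::2] = np.clip(np.round(v_sub), 0, 255).astype(np.uint8)
    nv12[h * w:] = uv.ravel()
    return nv12
```

The rounding and 2x2 chroma averaging are exactly the irreversible steps that introduce the small accuracy gap; training with the same conversion in the data pipeline lets the model absorb it.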
Minor numerical discrepancies may exist between Python and C/C++ API results, primarily due to subtle differences in how floating-point data is handled during memory copying and type conversions between the two implementations.
The results in this table were obtained using Post-Training Quantization (PTQ) with calibration on 50 images, simulating the typical experience of a developer performing their first direct compilation without further accuracy tuning or Quantization-Aware Training (QAT). These results satisfy general validation requirements but do not represent the upper bound of achievable accuracy.
Performance
RDK S100P
Object Detection
| Device | Model | Size(Pixels) | Classes | BPU Task Latency / BPU Throughput (Threads) | CPU Latency (Single Core) | params(M) | FLOPs(B) |
|---|---|---|---|---|---|---|---|
| S100P | YOLO12n Detect | 640×640 | 80 | 1.88 ms / 513.70 FPS (1 thread)<br/>3.07 ms / 634.97 FPS (2 threads) | 2.0 ms | 2.6 | 7.7 |
| S100P | YOLO12s Detect | 640×640 | 80 | 3.10 ms / 315.83 FPS (1 thread)<br/>5.50 ms / 357.85 FPS (2 threads) | 2.0 ms | 9.3 | 21.4 |
| S100P | YOLO12m Detect | 640×640 | 80 | 6.47 ms / 152.80 FPS (1 thread)<br/>12.18 ms / 162.62 FPS (2 threads) | 2.0 ms | 20.2 | 67.5 |
| S100P | YOLO12l Detect | 640×640 | 80 | 10.23 ms / 97.01 FPS (1 thread)<br/>19.67 ms / 101.04 FPS (2 threads) | 2.0 ms | 26.4 | 88.9 |
| S100P | YOLO12x Detect | 640×640 | 80 | 17.05 ms / 58.34 FPS (1 thread)<br/>33.21 ms / 59.92 FPS (2 threads) | 2.0 ms | 59.1 | 199.0 |
| S100P | YOLO11n Detect | 640×640 | 80 | 1.16 ms / 816.50 FPS (1 thread)<br/>1.66 ms / 1155.65 FPS (2 threads) | 2.0 ms | 2.6 | 6.5 |
| S100P | YOLO11s Detect | 640×640 | 80 | 1.81 ms / 533.50 FPS (1 thread)<br/>2.98 ms / 656.31 FPS (2 threads) | 2.0 ms | 9.4 | 21.5 |
| S100P | YOLO11m Detect | 640×640 | 80 | 3.90 ms / 252.02 FPS (1 thread)<br/>7.10 ms / 278.36 FPS (2 threads) | 2.0 ms | 20.1 | 68.0 |
| S100P | YOLO11l Detect | 640×640 | 80 | 4.73 ms / 208.61 FPS (1 thread)<br/>8.75 ms / 225.99 FPS (2 threads) | 2.0 ms | 25.3 | 86.9 |
| S100P | YOLO11x Detect | 640×640 | 80 | 8.84 ms / 112.05 FPS (1 thread)<br/>16.92 ms / 117.39 FPS (2 threads) | 2.0 ms | 56.9 | 194.9 |
| S100P | YOLOv10n Detect | 640×640 | 80 | 1.12 ms / 837.97 FPS (1 thread)<br/>1.58 ms / 1211.72 FPS (2 threads) | 2.0 ms | 2.3 | 6.7 |
| S100P | YOLOv10s Detect | 640×640 | 80 | 1.75 ms / 548.80 FPS (1 thread)<br/>2.81 ms / 692.74 FPS (2 threads) | 2.0 ms | 7.2 | 21.6 |
| S100P | YOLOv10m Detect | 640×640 | 80 | 3.06 ms / 319.65 FPS (1 thread)<br/>5.45 ms / 361.32 FPS (2 threads) | 2.0 ms | 15.4 | 59.1 |
| S100P | YOLOv10b Detect | 640×640 | 80 | 4.30 ms / 228.16 FPS (1 thread)<br/>7.85 ms / 250.93 FPS (2 threads) | 2.0 ms | 19.1 | 92.0 |
| S100P | YOLOv10l Detect | 640×640 | 80 | 5.42 ms / 181.96 FPS (1 thread)<br/>10.10 ms / 196.04 FPS (2 threads) | 2.0 ms | 24.4 | 120.3 |
| S100P | YOLOv10x Detect | 640×640 | 80 | 7.33 ms / 135.18 FPS (1 thread)<br/>13.90 ms / 142.81 FPS (2 threads) | 2.0 ms | 29.5 | 160.4 |
| S100P | YOLOv9t Detect | 640×640 | 80 | 1.29 ms / 736.75 FPS (1 thread)<br/>1.90 ms / 1013.70 FPS (2 threads) | 2.0 ms | 2.1 | 8.2 |
| S100P | YOLOv9s Detect | 640×640 | 80 | 1.93 ms / 497.53 FPS (1 thread)<br/>3.19 ms / 611.75 FPS (2 threads) | 2.0 ms | 7.2 | 26.9 |
| S100P | YOLOv9m Detect | 640×640 | 80 | 3.77 ms / 260.19 FPS (1 thread)<br/>6.83 ms / 288.82 FPS (2 threads) | 2.0 ms | 20.1 | 76.8 |
| S100P | YOLOv9c Detect | 640×640 | 80 | 4.76 ms / 206.90 FPS (1 thread)<br/>8.77 ms / 225.46 FPS (2 threads) | 2.0 ms | 25.3 | 102.7 |
| S100P | YOLOv9e Detect | 640×640 | 80 | 12.27 ms / 81.00 FPS (1 thread)<br/>23.73 ms / 83.75 FPS (2 threads) | 2.0 ms | 57.4 | 189.5 |
| S100P | YOLOv8n Detect | 640×640 | 80 | 1.10 ms / 851.31 FPS (1 thread)<br/>1.52 ms / 1258.50 FPS (2 threads) | 2.0 ms | 3.2 | 8.7 |
| S100P | YOLOv8s Detect | 640×640 | 80 | 1.83 ms / 524.95 FPS (1 thread)<br/>2.95 ms / 660.43 FPS (2 threads) | 2.0 ms | 11.2 | 28.6 |
| S100P | YOLOv8m Detect | 640×640 | 80 | 3.43 ms / 285.34 FPS (1 thread)<br/>6.14 ms / 320.93 FPS (2 threads) | 2.0 ms | 25.9 | 78.9 |
| S100P | YOLOv8l Detect | 640×640 | 80 | 6.72 ms / 147.19 FPS (1 thread)<br/>12.67 ms / 156.40 FPS (2 threads) | 2.0 ms | 43.7 | 165.2 |
| S100P | YOLOv8x Detect | 640×640 | 80 | 10.44 ms / 95.08 FPS (1 thread)<br/>20.11 ms / 98.81 FPS (2 threads) | 2.0 ms | 68.2 | 257.8 |
| S100P | YOLOv5nu Detect | 640×640 | 80 | 0.99 ms / 954.28 FPS (1 thread)<br/>1.34 ms / 1418.24 FPS (2 threads) | 2.0 ms | 2.6 | 7.7 |
| S100P | YOLOv5su Detect | 640×640 | 80 | 1.60 ms / 602.38 FPS (1 thread)<br/>2.56 ms / 763.66 FPS (2 threads) | 2.0 ms | 9.1 | 24.0 |
| S100P | YOLOv5mu Detect | 640×640 | 80 | 3.06 ms / 319.05 FPS (1 thread)<br/>5.43 ms / 363.38 FPS (2 threads) | 2.0 ms | 25.1 | 64.2 |
| S100P | YOLOv5lu Detect | 640×640 | 80 | 6.04 ms / 163.65 FPS (1 thread)<br/>11.36 ms / 174.46 FPS (2 threads) | 2.0 ms | 53.2 | 135.0 |
| S100P | YOLOv5xu Detect | 640×640 | 80 | 10.74 ms / 92.40 FPS (1 thread)<br/>20.67 ms / 96.10 FPS (2 threads) | 2.0 ms | 97.2 | 246.4 |
Instance Segmentation
| Device | Model | Size(Pixels) | Classes | BPU Task Latency / BPU Throughput (Threads) | CPU Latency (Single Core) | params(M) | FLOPs(B) |
|---|---|---|---|---|---|---|---|
| S100P | YOLO11n Seg | 640×640 | 80 | 1.45 ms / 647.24 FPS (1 thread)<br/>2.14 ms / 883.83 FPS (2 threads) | 5.0 ms | 2.9 | 10.4 |
| S100P | YOLO11s Seg | 640×640 | 80 | 2.31 ms / 413.73 FPS (1 thread)<br/>3.88 ms / 501.74 FPS (2 threads) | 5.0 ms | 10.1 | 35.5 |
| S100P | YOLO11m Seg | 640×640 | 80 | 5.36 ms / 182.99 FPS (1 thread)<br/>9.89 ms / 199.15 FPS (2 threads) | 5.0 ms | 22.4 | 123.3 |
| S100P | YOLO11l Seg | 640×640 | 80 | 6.20 ms / 158.60 FPS (1 thread)<br/>11.57 ms / 170.73 FPS (2 threads) | 5.0 ms | 27.6 | 142.2 |
| S100P | YOLO11x Seg | 640×640 | 80 | 11.89 ms / 83.30 FPS (1 thread)<br/>22.83 ms / 86.92 FPS (2 threads) | 5.0 ms | 62.1 | 319.0 |
| S100P | YOLOv9c Seg | 640×640 | 80 | 6.29 ms / 156.58 FPS (1 thread)<br/>11.74 ms / 168.32 FPS (2 threads) | 5.0 ms | 27.7 | 158.0 |
| S100P | YOLOv9e Seg | 640×640 | 80 | 14.20 ms / 69.78 FPS (1 thread)<br/>27.42 ms / 72.35 FPS (2 threads) | 5.0 ms | 59.7 | 244.8 |
| S100P | YOLOv8n Seg | 640×640 | 80 | 1.39 ms / 666.05 FPS (1 thread)<br/>2.01 ms / 946.83 FPS (2 threads) | 5.0 ms | 3.4 | 12.6 |
| S100P | YOLOv8s Seg | 640×640 | 80 | 2.32 ms / 411.56 FPS (1 thread)<br/>3.86 ms / 502.69 FPS (2 threads) | 5.0 ms | 11.8 | 42.6 |
| S100P | YOLOv8m Seg | 640×640 | 80 | 4.37 ms / 223.73 FPS (1 thread)<br/>7.95 ms / 247.14 FPS (2 threads) | 5.0 ms | 27.3 | 100.2 |
| S100P | YOLOv8l Seg | 640×640 | 80 | 8.20 ms / 120.19 FPS (1 thread)<br/>15.46 ms / 127.84 FPS (2 threads) | 5.0 ms | 46.0 | 220.5 |
| S100P | YOLOv8x Seg | 640×640 | 80 | 13.01 ms / 76.23 FPS (1 thread)<br/>25.11 ms / 79.06 FPS (2 threads) | 5.0 ms | 71.8 | 344.1 |
Pose Estimation
| Device | Model | Size(Pixels) | Classes | BPU Task Latency / BPU Throughput (Threads) | CPU Latency (Single Core) | params(M) | FLOPs(B) |
|---|---|---|---|---|---|---|---|
| S100P | YOLO11n Pose | 640×640 | 80 | 1.23 ms / 770.10 FPS (1 thread)<br/>1.74 ms / 1097.27 FPS (2 threads) | 1.0 ms | 2.9 | 7.6 |
| S100P | YOLO11s Pose | 640×640 | 80 | 1.92 ms / 501.66 FPS (1 thread)<br/>3.11 ms / 627.29 FPS (2 threads) | 1.0 ms | 9.9 | 23.2 |
| S100P | YOLO11m Pose | 640×640 | 80 | 4.04 ms / 241.82 FPS (1 thread)<br/>7.32 ms / 269.96 FPS (2 threads) | 1.0 ms | 20.9 | 71.7 |
| S100P | YOLO11l Pose | 640×640 | 80 | 4.87 ms / 202.29 FPS (1 thread)<br/>8.99 ms / 220.09 FPS (2 threads) | 1.0 ms | 26.2 | 90.7 |
| S100P | YOLO11x Pose | 640×640 | 80 | 9.15 ms / 108.13 FPS (1 thread)<br/>17.45 ms / 113.64 FPS (2 threads) | 1.0 ms | 58.8 | 203.3 |
| S100P | YOLOv8n Pose | 640×640 | 80 | 1.14 ms / 822.46 FPS (1 thread)<br/>1.58 ms / 1206.58 FPS (2 threads) | 1.0 ms | 3.3 | 9.2 |
| S100P | YOLOv8s Pose | 640×640 | 80 | 1.97 ms / 486.85 FPS (1 thread)<br/>3.23 ms / 606.41 FPS (2 threads) | 1.0 ms | 11.6 | 30.2 |
| S100P | YOLOv8m Pose | 640×640 | 80 | 3.65 ms / 267.74 FPS (1 thread)<br/>6.54 ms / 301.30 FPS (2 threads) | 1.0 ms | 26.4 | 81.0 |
| S100P | YOLOv8l Pose | 640×640 | 80 | 6.92 ms / 142.52 FPS (1 thread)<br/>12.99 ms / 152.18 FPS (2 threads) | 1.0 ms | 44.4 | 168.6 |
| S100P | YOLOv8x Pose | 640×640 | 80 | 10.67 ms / 92.89 FPS (1 thread)<br/>20.48 ms / 96.97 FPS (2 threads) | 1.0 ms | 69.4 | 263.2 |
Image Classification
| Device | Model | Size(Pixels) | Classes | BPU Task Latency / BPU Throughput (Threads) | CPU Latency (Single Core) | params(M) | FLOPs(B) |
|---|---|---|---|---|---|---|---|
| S100P | YOLO11n CLS | 640×640 | 80 | 0.40 ms / 2368.83 FPS (1 thread)<br/>0.46 ms / 4151.62 FPS (2 threads)<br/>0.56 ms / 5164.09 FPS (3 threads) | 0.5 ms | 2.8 | 4.2 |
| S100P | YOLO11s CLS | 640×640 | 80 | 0.52 ms / 1843.47 FPS (1 thread)<br/>0.61 ms / 3128.03 FPS (2 threads)<br/>0.81 ms / 3593.24 FPS (3 threads) | 0.5 ms | 6.7 | 13.0 |
| S100P | YOLO11m CLS | 640×640 | 80 | 0.78 ms / 1248.81 FPS (1 thread)<br/>1.01 ms / 1935.47 FPS (2 threads) | 0.5 ms | 11.6 | 40.3 |
| S100P | YOLO11l CLS | 640×640 | 80 | 0.90 ms / 1088.57 FPS (1 thread)<br/>1.27 ms / 1544.42 FPS (2 threads) | 0.5 ms | 14.1 | 50.4 |
| S100P | YOLO11x CLS | 640×640 | 80 | 1.45 ms / 676.30 FPS (1 thread)<br/>2.34 ms / 844.07 FPS (2 threads) | 0.5 ms | 29.6 | 111.3 |
| S100P | YOLOv8n CLS | 640×640 | 80 | 0.38 ms / 2470.91 FPS (1 thread)<br/>0.45 ms / 4304.69 FPS (2 threads)<br/>0.55 ms / 5272.31 FPS (3 threads) | 0.5 ms | 2.7 | 4.3 |
| S100P | YOLOv8s CLS | 640×640 | 80 | 0.49 ms / 1953.98 FPS (1 thread)<br/>0.57 ms / 3364.17 FPS (2 threads)<br/>0.70 ms / 4109.05 FPS (3 threads) | 0.5 ms | 6.4 | 13.5 |
| S100P | YOLOv8m CLS | 640×640 | 80 | 0.80 ms / 1218.07 FPS (1 thread)<br/>1.05 ms / 1876.12 FPS (2 threads) | 0.5 ms | 17.0 | 42.7 |
| S100P | YOLOv8l CLS | 640×640 | 80 | 1.44 ms / 683.65 FPS (1 thread)<br/>2.34 ms / 842.80 FPS (2 threads) | 0.5 ms | 37.5 | 99.7 |
| S100P | YOLOv8x CLS | 640×640 | 80 | 2.09 ms / 470.34 FPS (1 thread)<br/>3.55 ms / 559.63 FPS (2 threads) | 0.5 ms | 57.4 | 154.8 |
Accuracy
Object Detection
| Device | Model | Accuracy bbox-all mAP@.5:.95 FP32 / BPU Python | Accuracy bbox-small mAP@.5:.95 FP32 / BPU Python | Accuracy bbox-medium mAP@.5:.95 FP32 / BPU Python | Accuracy bbox-large mAP@.5:.95 FP32 / BPU Python |
|---|---|---|---|---|---|
| S100P | YOLO12n Detect | 0.338 / 0.313 (92.4 %) | 0.128 / 0.095 (74.0 %) | 0.374 / 0.342 (91.4 %) | 0.524 / 0.515 (98.3 %) |
| S100P | YOLO12s Detect | 0.403 / 0.380 (94.2 %) | 0.201 / 0.152 (75.5 %) | 0.450 / 0.432 (95.9 %) | 0.602 / 0.581 (96.5 %) |
| S100P | YOLO12m Detect | 0.452 / 0.423 (93.7 %) | 0.251 / 0.204 (81.3 %) | 0.509 / 0.489 (96.0 %) | 0.638 / 0.616 (96.5 %) |
| S100P | YOLO12l Detect | 0.463 / 0.429 (92.8 %) | 0.268 / 0.211 (78.6 %) | 0.522 / 0.492 (94.3 %) | 0.646 / 0.630 (97.7 %) |
| S100P | YOLO12x Detect | 0.475 / 0.440 (92.7 %) | 0.276 / 0.222 (80.3 %) | 0.536 / 0.509 (94.9 %) | 0.659 / 0.627 (95.1 %) |
| S100P | YOLO11n Detect | 0.327 / 0.306 (93.9 %) | 0.130 / 0.104 (80.0 %) | 0.357 / 0.340 (95.2 %) | 0.511 / 0.500 (97.8 %) |
| S100P | YOLO11s Detect | 0.400 / 0.380 (95.0 %) | 0.198 / 0.166 (83.9 %) | 0.445 / 0.427 (96.1 %) | 0.587 / 0.579 (98.6 %) |
| S100P | YOLO11m Detect | 0.444 / 0.417 (94.0 %) | 0.247 / 0.214 (87.0 %) | 0.497 / 0.478 (96.1 %) | 0.627 / 0.599 (95.6 %) |
| S100P | YOLO11l Detect | 0.460 / 0.434 (94.5 %) | 0.267 / 0.227 (85.2 %) | 0.520 / 0.498 (95.9 %) | 0.638 / 0.611 (95.8 %) |
| S100P | YOLO11x Detect | 0.474 / 0.446 (94.0 %) | 0.283 / 0.240 (84.7 %) | 0.529 / 0.506 (95.6 %) | 0.652 / 0.627 (96.1 %) |
| S100P | YOLOv10n Detect | 0.303 / 0.276 (91.3 %) | 0.099 / 0.064 (64.7 %) | 0.330 / 0.302 (91.5 %) | 0.478 / 0.460 (96.2 %) |
| S100P | YOLOv10s Detect | 0.386 / 0.354 (91.6 %) | 0.175 / 0.126 (72.2 %) | 0.434 / 0.402 (92.5 %) | 0.574 / 0.527 (91.7 %) |
| S100P | YOLOv10m Detect | 0.425 / 0.368 (86.7 %) | 0.221 / 0.179 (80.9 %) | 0.481 / 0.431 (89.6 %) | 0.603 / 0.472 (78.2 %) |
| S100P | YOLOv10b Detect | 0.443 / 0.382 (86.3 %) | 0.242 / 0.194 (80.2 %) | 0.498 / 0.437 (87.7 %) | 0.618 / 0.480 (77.7 %) |
| S100P | YOLOv10l Detect | 0.445 / 0.372 (83.6 %) | 0.258 / 0.202 (78.5 %) | 0.498 / 0.435 (87.4 %) | 0.626 / 0.463 (74.0 %) |
| S100P | YOLOv10x Detect | 0.459 / 0.409 (89.2 %) | 0.258 / 0.212 (82.1 %) | 0.518 / 0.475 (91.8 %) | 0.639 / 0.535 (83.6 %) |
| S100P | YOLOv9t Detect | 0.313 / 0.301 (96.2 %) | 0.113 / 0.105 (93.6 %) | 0.338 / 0.325 (96.3 %) | 0.483 / 0.461 (95.5 %) |
| S100P | YOLOv9s Detect | 0.400 / 0.383 (95.8 %) | 0.191 / 0.165 (86.3 %) | 0.444 / 0.431 (97.0 %) | 0.583 / 0.560 (96.1 %) |
| S100P | YOLOv9m Detect | 0.449 / 0.432 (96.1 %) | 0.253 / 0.231 (91.2 %) | 0.504 / 0.487 (96.5 %) | 0.617 / 0.602 (97.5 %) |
| S100P | YOLOv9c Detect | 0.461 / 0.446 (96.8 %) | 0.269 / 0.250 (93.2 %) | 0.512 / 0.499 (97.4 %) | 0.640 / 0.618 (96.6 %) |
| S100P | YOLOv9e Detect | 0.481 / 0.465 (96.6 %) | 0.298 / 0.270 (90.9 %) | 0.538 / 0.520 (96.7 %) | 0.662 / 0.647 (97.7 %) |
| S100P | YOLOv8n Detect | 0.309 / 0.292 (94.4 %) | 0.113 / 0.098 (87.2 %) | 0.338 / 0.321 (94.9 %) | 0.473 / 0.457 (96.7 %) |
| S100P | YOLOv8s Detect | 0.391 / 0.373 (95.4 %) | 0.195 / 0.166 (85.1 %) | 0.437 / 0.425 (97.3 %) | 0.566 / 0.558 (98.6 %) |
| S100P | YOLOv8m Detect | 0.441 / 0.420 (95.4 %) | 0.249 / 0.213 (85.6 %) | 0.494 / 0.478 (96.7 %) | 0.618 / 0.612 (99.1 %) |
| S100P | YOLOv8l Detect | 0.461 / 0.442 (95.8 %) | 0.271 / 0.241 (88.9 %) | 0.516 / 0.499 (96.6 %) | 0.651 / 0.628 (96.4 %) |
| S100P | YOLOv8x Detect | 0.474 / 0.448 (94.6 %) | 0.280 / 0.245 (87.6 %) | 0.527 / 0.504 (95.7 %) | 0.658 / 0.640 (97.2 %) |
| S100P | YOLOv5nu Detect | 0.278 / 0.264 (94.7 %) | 0.093 / 0.080 (85.5 %) | 0.309 / 0.293 (94.8 %) | 0.417 / 0.406 (97.5 %) |
| S100P | YOLOv5su Detect | 0.367 / 0.349 (95.2 %) | 0.169 / 0.141 (83.3 %) | 0.416 / 0.398 (95.8 %) | 0.530 / 0.524 (98.9 %) |
| S100P | YOLOv5mu Detect | 0.425 / 0.406 (95.6 %) | 0.226 / 0.194 (86.0 %) | 0.477 / 0.467 (98.0 %) | 0.603 / 0.592 (98.2 %) |
| S100P | YOLOv5lu Detect | 0.458 / 0.436 (95.1 %) | 0.260 / 0.215 (82.9 %) | 0.516 / 0.500 (96.8 %) | 0.641 / 0.631 (98.4 %) |
| S100P | YOLOv5xu Detect | 0.466 / 0.445 (95.5 %) | 0.281 / 0.239 (85.0 %) | 0.523 / 0.506 (96.7 %) | 0.645 / 0.638 (99.0 %) |
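Each accuracy cell above reads FP32 mAP / BPU mAP, and the parenthesized value is the share of FP32 accuracy retained after quantization. A minimal sketch of that calculation (the numbers below are illustrative, not taken from any specific row):

```python
def retention(fp32_map: float, bpu_map: float) -> float:
    """Quantization accuracy retention, as a percentage of the FP32 score."""
    if fp32_map <= 0:
        raise ValueError("FP32 mAP must be positive")
    return 100.0 * bpu_map / fp32_map

# Illustrative values: FP32 mAP 0.400, quantized BPU mAP 0.380
print(f"{retention(0.400, 0.380):.1f} %")  # -> 95.0 %
```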
Instance Segmentation
bbox
| Device | Model | Accuracy bbox-all mAP@.5:.95 FP32 / BPU Python | Accuracy bbox-small mAP@.5:.95 FP32 / BPU Python | Accuracy bbox-medium mAP@.5:.95 FP32 / BPU Python | Accuracy bbox-large mAP@.5:.95 FP32 / BPU Python |
|---|---|---|---|---|---|
| S100P | YOLO11n Seg | 0.322 / 0.294 (91.4 %) | 0.113 / 0.081 (71.8 %) | 0.352 / 0.324 (92.1 %) | 0.502 / 0.490 (97.6 %) |
| S100P | YOLO11s Seg | 0.394 / 0.372 (94.4 %) | 0.184 / 0.149 (81.2 %) | 0.442 / 0.424 (96.0 %) | 0.582 / 0.577 (99.1 %) |
| S100P | YOLO11m Seg | 0.443 / 0.414 (93.3 %) | 0.246 / 0.208 (84.3 %) | 0.497 / 0.473 (95.2 %) | 0.627 / 0.599 (95.6 %) |
| S100P | YOLO11l Seg | 0.460 / 0.430 (93.5 %) | 0.267 / 0.220 (82.5 %) | 0.520 / 0.493 (94.9 %) | 0.638 / 0.610 (95.6 %) |
| S100P | YOLO11x Seg | 0.474 / 0.441 (93.0 %) | 0.283 / 0.231 (81.7 %) | 0.529 / 0.501 (94.6 %) | 0.652 / 0.625 (95.8 %) |
| S100P | YOLOv9c Seg | 0.453 / 0.422 (93.0 %) | 0.254 / 0.206 (81.2 %) | 0.508 / 0.483 (94.9 %) | 0.621 / 0.601 (96.8 %) |
| S100P | YOLOv9e Seg | 0.481 / 0.450 (93.6 %) | 0.292 / 0.245 (83.9 %) | 0.537 / 0.507 (94.3 %) | 0.650 / 0.628 (96.6 %) |
| S100P | YOLOv8n Seg | 0.304 / 0.282 (92.9 %) | 0.109 / 0.087 (79.7 %) | 0.334 / 0.310 (92.8 %) | 0.461 / 0.440 (95.4 %) |
| S100P | YOLOv8s Seg | 0.386 / 0.363 (94.0 %) | 0.180 / 0.149 (82.8 %) | 0.432 / 0.410 (94.8 %) | 0.564 / 0.547 (97.0 %) |
| S100P | YOLOv8m Seg | 0.431 / 0.407 (94.3 %) | 0.228 / 0.191 (83.9 %) | 0.486 / 0.467 (96.0 %) | 0.608 / 0.596 (98.0 %) |
| S100P | YOLOv8l Seg | 0.453 / 0.426 (94.1 %) | 0.258 / 0.220 (85.0 %) | 0.502 / 0.483 (96.3 %) | 0.626 / 0.607 (97.0 %) |
| S100P | YOLOv8x Seg | 0.465 / 0.435 (93.5 %) | 0.268 / 0.214 (79.7 %) | 0.520 / 0.496 (95.2 %) | 0.641 / 0.622 (97.1 %) |
mask
| Device | Model | Accuracy mask-all mAP@.5:.95 FP32 / BPU Python | Accuracy mask-small mAP@.5:.95 FP32 / BPU Python | Accuracy mask-medium mAP@.5:.95 FP32 / BPU Python | Accuracy mask-large mAP@.5:.95 FP32 / BPU Python |
|---|---|---|---|---|---|
| S100P | YOLO11n Seg | 0.262 / 0.226 (86.3 %) | 0.062 / 0.044 (72.0 %) | 0.283 / 0.250 (88.2 %) | 0.444 / 0.394 (88.8 %) |
| S100P | YOLO11s Seg | 0.311 / 0.287 (92.2 %) | 0.099 / 0.088 (88.9 %) | 0.350 / 0.326 (93.3 %) | 0.509 / 0.474 (93.2 %) |
| S100P | YOLO11m Seg | 0.347 / 0.315 (90.7 %) | 0.136 / 0.122 (90.3 %) | 0.396 / 0.362 (91.4 %) | 0.549 / 0.493 (89.8 %) |
| S100P | YOLO11l Seg | 0.357 / 0.325 (91.1 %) | 0.143 / 0.126 (88.1 %) | 0.409 / 0.374 (91.4 %) | 0.560 / 0.504 (90.1 %) |
| S100P | YOLO11x Seg | 0.366 / 0.331 (90.4 %) | 0.149 / 0.129 (86.9 %) | 0.420 / 0.379 (90.2 %) | 0.572 / 0.520 (90.9 %) |
| S100P | YOLOv9c Seg | 0.352 / 0.319 (90.7 %) | 0.132 / 0.116 (88.1 %) | 0.404 / 0.367 (90.8 %) | 0.547 / 0.497 (91.0 %) |
| S100P | YOLOv9e Seg | 0.371 / 0.340 (91.7 %) | 0.155 / 0.136 (87.8 %) | 0.425 / 0.386 (90.7 %) | 0.571 / 0.525 (92.0 %) |
| S100P | YOLOv8n Seg | 0.246 / 0.221 (89.9 %) | 0.059 / 0.048 (81.8 %) | 0.265 / 0.243 (91.8 %) | 0.409 / 0.364 (89.0 %) |
| S100P | YOLOv8s Seg | 0.305 / 0.282 (92.6 %) | 0.096 / 0.086 (90.2 %) | 0.343 / 0.316 (92.1 %) | 0.496 / 0.457 (92.1 %) |
| S100P | YOLOv8m Seg | 0.337 / 0.312 (92.7 %) | 0.121 / 0.110 (90.8 %) | 0.386 / 0.358 (92.8 %) | 0.533 / 0.494 (92.5 %) |
| S100P | YOLOv8l Seg | 0.351 / 0.326 (92.9 %) | 0.137 / 0.126 (92.1 %) | 0.398 / 0.371 (93.4 %) | 0.550 / 0.509 (92.5 %) |
| S100P | YOLOv8x Seg | 0.358 / 0.331 (92.3 %) | 0.139 / 0.119 (85.5 %) | 0.409 / 0.379 (92.5 %) | 0.562 / 0.514 (91.4 %) |
Pose Estimation
| Device | Model | Accuracy pose-all mAP@.5:.95 FP32 / BPU Python | Accuracy pose-medium mAP@.5:.95 FP32 / BPU Python | Accuracy pose-large mAP@.5:.95 FP32 / BPU Python |
|---|---|---|---|---|
| S100P | YOLO11n Pose | 0.465 / 0.445 (95.7 %) | 0.386 / 0.373 (96.5 %) | 0.597 / 0.568 (95.1 %) |
| S100P | YOLO11s Pose | 0.559 / 0.533 (95.4 %) | 0.495 / 0.467 (94.5 %) | 0.672 / 0.649 (96.6 %) |
| S100P | YOLO11m Pose | 0.627 / 0.607 (96.8 %) | 0.586 / 0.563 (96.2 %) | 0.711 / 0.692 (97.3 %) |
| S100P | YOLO11l Pose | 0.636 / 0.617 (97.0 %) | 0.592 / 0.570 (96.3 %) | 0.726 / 0.704 (97.0 %) |
| S100P | YOLO11x Pose | 0.672 / 0.648 (96.5 %) | 0.634 / 0.605 (95.5 %) | 0.750 / 0.733 (97.8 %) |
| S100P | YOLOv8n Pose | 0.476 / 0.460 (96.7 %) | 0.391 / 0.372 (95.0 %) | 0.610 / 0.593 (97.2 %) |
| S100P | YOLOv8s Pose | 0.578 / 0.550 (95.2 %) | 0.510 / 0.476 (93.4 %) | 0.692 / 0.667 (96.4 %) |
| S100P | YOLOv8m Pose | 0.630 / 0.605 (96.0 %) | 0.578 / 0.553 (95.7 %) | 0.724 / 0.697 (96.3 %) |
| S100P | YOLOv8l Pose | 0.657 / 0.631 (96.1 %) | 0.607 / 0.579 (95.3 %) | 0.747 / 0.726 (97.2 %) |
| S100P | YOLOv8x Pose | 0.671 / 0.649 (96.7 %) | 0.624 / 0.602 (96.4 %) | 0.757 / 0.733 (96.8 %) |
Image Classification
| Device | Model | Accuracy TOP1 FP32 / BPU Python | Accuracy TOP5 FP32 / BPU Python |
|---|---|---|---|
| S100P | YOLO11n CLS | 0.700 / 0.566 (80.8 %) | 0.894 / 0.803 (89.8 %) |
| S100P | YOLO11s CLS | 0.754 / 0.661 (87.7 %) | 0.927 / 0.872 (94.1 %) |
| S100P | YOLO11m CLS | 0.773 / 0.706 (91.3 %) | 0.939 / 0.903 (96.1 %) |
| S100P | YOLO11l CLS | 0.783 / 0.712 (90.8 %) | 0.942 / 0.905 (96.1 %) |
| S100P | YOLO11x CLS | 0.795 / 0.734 (92.4 %) | 0.949 / 0.919 (96.8 %) |
| S100P | YOLOv8n CLS | 0.689 / 0.577 (83.7 %) | 0.883 / 0.808 (91.5 %) |
| S100P | YOLOv8s CLS | 0.737 / 0.631 (85.6 %) | 0.917 / 0.850 (92.8 %) |
| S100P | YOLOv8m CLS | 0.768 / 0.703 (91.6 %) | 0.935 / 0.899 (96.2 %) |
| S100P | YOLOv8l CLS | 0.783 / 0.723 (92.3 %) | 0.942 / 0.910 (96.6 %) |
| S100P | YOLOv8x CLS | 0.790 / 0.742 (93.9 %) | 0.945 / 0.923 (97.6 %) |
Performance
RDK S100
Object Detection
| Device | Model | Size(Pixels) | Classes | BPU Task Latency / BPU Throughput (Threads) | CPU Latency (Single Core) | params(M) | FLOPs(B) |
|---|---|---|---|---|---|---|---|
| S100 | YOLO12n Detect | 640×640 | 80 | 2.65 ms / 368.54 FPS (1 thread)<br/>4.43 ms / 443.33 FPS (2 threads) | 2.0 ms | 2.6 M | 7.7 B |
| S100 | YOLO12s Detect | 640×640 | 80 | 4.48 ms / 220.08 FPS (1 thread)<br/>8.10 ms / 244.66 FPS (2 threads) | 2.0 ms | 9.3 M | 21.4 B |
| S100 | YOLO12m Detect | 640×640 | 80 | 9.27 ms / 107.09 FPS (1 thread)<br/>17.56 ms / 113.12 FPS (2 threads) | 2.0 ms | 20.2 M | 67.5 B |
| S100 | YOLO12l Detect | 640×640 | 80 | 14.66 ms / 67.85 FPS (1 thread)<br/>28.30 ms / 70.28 FPS (2 threads) | 2.0 ms | 26.4 M | 88.9 B |
| S100 | YOLO12x Detect | 640×640 | 80 | 24.72 ms / 40.33 FPS (1 thread)<br/>48.27 ms / 41.26 FPS (2 threads) | 2.0 ms | 59.1 M | 199.0 B |
| S100 | YOLO11n Detect | 640×640 | 80 | 1.62 ms / 596.53 FPS (1 thread)<br/>2.39 ms / 813.87 FPS (2 threads) | 2.0 ms | 2.6 M | 6.5 B |
| S100 | YOLO11s Detect | 640×640 | 80 | 2.63 ms / 371.42 FPS (1 thread)<br/>4.39 ms / 448.18 FPS (2 threads) | 2.0 ms | 9.4 M | 21.5 B |
| S100 | YOLO11m Detect | 640×640 | 80 | 5.63 ms / 175.69 FPS (1 thread)<br/>10.35 ms / 191.62 FPS (2 threads) | 2.0 ms | 20.1 M | 68.0 B |
| S100 | YOLO11l Detect | 640×640 | 80 | 6.96 ms / 142.36 FPS (1 thread)<br/>13.02 ms / 152.41 FPS (2 threads) | 2.0 ms | 25.3 M | 86.9 B |
| S100 | YOLO11x Detect | 640×640 | 80 | 13.13 ms / 75.78 FPS (1 thread)<br/>25.24 ms / 78.82 FPS (2 threads) | 2.0 ms | 56.9 M | 194.9 B |
| S100 | YOLOv10n Detect | 640×640 | 80 | 1.58 ms / 608.94 FPS (1 thread)<br/>2.32 ms / 837.04 FPS (2 threads) | 2.0 ms | 2.3 M | 6.7 B |
| S100 | YOLOv10s Detect | 640×640 | 80 | 2.53 ms / 385.50 FPS (1 thread)<br/>4.18 ms / 471.09 FPS (2 threads) | 2.0 ms | 7.2 M | 21.6 B |
| S100 | YOLOv10m Detect | 640×640 | 80 | 4.49 ms / 219.98 FPS (1 thread)<br/>8.11 ms / 244.17 FPS (2 threads) | 2.0 ms | 15.4 M | 59.1 B |
| S100 | YOLOv10b Detect | 640×640 | 80 | 6.28 ms / 157.57 FPS (1 thread)<br/>11.65 ms / 170.32 FPS (2 threads) | 2.0 ms | 19.1 M | 92.0 B |
| S100 | YOLOv10l Detect | 640×640 | 80 | 7.95 ms / 124.70 FPS (1 thread)<br/>14.98 ms / 132.53 FPS (2 threads) | 2.0 ms | 24.4 M | 120.3 B |
| S100 | YOLOv10x Detect | 640×640 | 80 | 10.83 ms / 91.79 FPS (1 thread)<br/>20.66 ms / 96.17 FPS (2 threads) | 2.0 ms | 29.5 M | 160.4 B |
| S100 | YOLOv9t Detect | 640×640 | 80 | 1.77 ms / 546.03 FPS (1 thread)<br/>2.67 ms / 730.68 FPS (2 threads) | 2.0 ms | 2.1 M | 8.2 B |
| S100 | YOLOv9s Detect | 640×640 | 80 | 2.74 ms / 357.91 FPS (1 thread)<br/>4.62 ms / 425.97 FPS (2 threads) | 2.0 ms | 7.2 M | 26.9 B |
| S100 | YOLOv9m Detect | 640×640 | 80 | 5.52 ms / 179.23 FPS (1 thread)<br/>10.13 ms / 195.30 FPS (2 threads) | 2.0 ms | 20.1 M | 76.8 B |
| S100 | YOLOv9c Detect | 640×640 | 80 | 6.98 ms / 142.00 FPS (1 thread)<br/>13.05 ms / 151.95 FPS (2 threads) | 2.0 ms | 25.3 M | 102.7 B |
| S100 | YOLOv9e Detect | 640×640 | 80 | 17.75 ms / 56.15 FPS (1 thread)<br/>34.41 ms / 57.85 FPS (2 threads) | 2.0 ms | 57.4 M | 189.5 B |
| S100 | YOLOv8n Detect | 640×640 | 80 | 1.53 ms / 632.06 FPS (1 thread)<br/>2.24 ms / 868.87 FPS (2 threads) | 2.0 ms | 3.2 M | 8.7 B |
| S100 | YOLOv8s Detect | 640×640 | 80 | 2.63 ms / 371.16 FPS (1 thread)<br/>4.41 ms / 446.48 FPS (2 threads) | 2.0 ms | 11.2 M | 28.6 B |
| S100 | YOLOv8m Detect | 640×640 | 80 | 5.18 ms / 190.64 FPS (1 thread)<br/>9.45 ms / 209.80 FPS (2 threads) | 2.0 ms | 25.9 M | 78.9 B |
| S100 | YOLOv8l Detect | 640×640 | 80 | 9.97 ms / 99.68 FPS (1 thread)<br/>19.00 ms / 104.65 FPS (2 threads) | 2.0 ms | 43.7 M | 165.2 B |
| S100 | YOLOv8x Detect | 640×640 | 80 | 15.77 ms / 63.15 FPS (1 thread)<br/>30.53 ms / 65.20 FPS (2 threads) | 2.0 ms | 68.2 M | 257.8 B |
| S100 | YOLOv5nu Detect | 640×640 | 80 | 1.42 ms / 674.92 FPS (1 thread)<br/>2.02 ms / 959.05 FPS (2 threads) | 2.0 ms | 2.6 M | 7.7 B |
| S100 | YOLOv5su Detect | 640×640 | 80 | 2.31 ms / 420.83 FPS (1 thread)<br/>3.79 ms / 519.22 FPS (2 threads) | 2.0 ms | 9.1 M | 24.0 B |
| S100 | YOLOv5mu Detect | 640×640 | 80 | 4.50 ms / 218.77 FPS (1 thread)<br/>8.11 ms / 244.06 FPS (2 threads) | 2.0 ms | 25.1 M | 64.2 B |
| S100 | YOLOv5lu Detect | 640×640 | 80 | 8.96 ms / 110.78 FPS (1 thread)<br/>16.97 ms / 117.15 FPS (2 threads) | 2.0 ms | 53.2 M | 135.0 B |
| S100 | YOLOv5xu Detect | 640×640 | 80 | 15.97 ms / 62.32 FPS (1 thread)<br/>30.90 ms / 64.41 FPS (2 threads) | 2.0 ms | 97.2 M | 246.4 B |
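As a rough sanity check on the throughput figures above: with the BPU kept busy, throughput is approximately the number of in-flight inference threads divided by the per-frame task latency. The measured FPS values deviate somewhat from this estimate because of scheduling and data-transfer overhead. A hedged sketch (the helper name and numbers are illustrative, not part of any toolchain API):

```python
def estimated_fps(task_latency_ms: float, threads: int = 1) -> float:
    """Rough BPU throughput estimate: frames in flight / per-frame latency."""
    if task_latency_ms <= 0 or threads < 1:
        raise ValueError("latency must be positive and threads >= 1")
    return threads * 1000.0 / task_latency_ms

# Illustrative: a ~2.65 ms single-thread task latency suggests throughput on
# the order of a few hundred FPS; measured table values will differ.
print(f"{estimated_fps(2.65):.1f} FPS")
```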
Instance Segmentation
| Device | Model | Size(Pixels) | Classes | BPU Task Latency / BPU Throughput (Threads) | CPU Latency (Single Core) | params(M) | FLOPs(B) |
|---|---|---|---|---|---|---|---|
| S100 | YOLO11n Seg | 640×640 | 80 | 2.06 ms / 463.93 FPS (1 thread)<br/>3.17 ms / 613.76 FPS (2 threads) | 5.0 ms | 2.9 M | 10.4 B |
| S100 | YOLO11s Seg | 640×640 | 80 | 3.34 ms / 291.16 FPS (1 thread)<br/>5.69 ms / 344.91 FPS (2 threads) | 5.0 ms | 10.1 M | 35.5 B |
| S100 | YOLO11m Seg | 640×640 | 80 | 7.86 ms / 125.63 FPS (1 thread)<br/>14.67 ms / 135.16 FPS (2 threads) | 5.0 ms | 22.4 M | 123.3 B |
| S100 | YOLO11l Seg | 640×640 | 80 | 9.17 ms / 108.00 FPS (1 thread)<br/>17.29 ms / 114.67 FPS (2 threads) | 5.0 ms | 27.6 M | 142.2 B |
| S100 | YOLO11x Seg | 640×640 | 80 | 17.74 ms / 56.07 FPS (1 thread)<br/>34.33 ms / 57.96 FPS (2 threads) | 5.0 ms | 62.1 M | 319.0 B |
| S100 | YOLOv9c Seg | 640×640 | 80 | 9.07 ms / 109.16 FPS (1 thread)<br/>17.12 ms / 115.80 FPS (2 threads) | 5.0 ms | 27.7 M | 158.0 B |
| S100 | YOLOv9e Seg | 640×640 | 80 | 20.15 ms / 49.38 FPS (1 thread)<br/>39.07 ms / 50.91 FPS (2 threads) | 5.0 ms | 59.7 M | 244.8 B |
| S100 | YOLOv8n Seg | 640×640 | 80 | 1.93 ms / 495.35 FPS (1 thread)<br/>2.98 ms / 652.21 FPS (2 threads) | 5.0 ms | 3.4 M | 12.6 B |
| S100 | YOLOv8s Seg | 640×640 | 80 | 3.37 ms / 288.70 FPS (1 thread)<br/>5.76 ms / 341.12 FPS (2 threads) | 5.0 ms | 11.8 M | 42.6 B |
| S100 | YOLOv8m Seg | 640×640 | 80 | 6.65 ms / 148.28 FPS (1 thread)<br/>12.29 ms / 161.07 FPS (2 threads) | 5.0 ms | 27.3 M | 100.2 B |
| S100 | YOLOv8l Seg | 640×640 | 80 | 12.21 ms / 81.34 FPS (1 thread)<br/>23.32 ms / 85.17 FPS (2 threads) | 5.0 ms | 46.0 M | 220.5 B |
| S100 | YOLOv8x Seg | 640×640 | 80 | 19.51 ms / 51.00 FPS (1 thread)<br/>37.80 ms / 52.62 FPS (2 threads) | 5.0 ms | 71.8 M | 344.1 B |
Pose Estimation
| Device | Model | Size(Pixels) | Classes | BPU Task Latency / BPU Throughput (Threads) | CPU Latency (Single Core) | params(M) | FLOPs(B) |
|---|---|---|---|---|---|---|---|
| S100 | YOLO11n Pose | 640×640 | 80 | 1.69 ms / 568.00 FPS (1 thread)<br/>2.48 ms / 780.47 FPS (2 threads) | 1.0 ms | 2.9 M | 7.6 B |
| S100 | YOLO11s Pose | 640×640 | 80 | 2.76 ms / 354.06 FPS (1 thread)<br/>4.62 ms / 424.92 FPS (2 threads) | 1.0 ms | 9.9 M | 23.2 B |
| S100 | YOLO11m Pose | 640×640 | 80 | 5.89 ms / 167.50 FPS (1 thread)<br/>10.79 ms / 183.34 FPS (2 threads) | 1.0 ms | 20.9 M | 71.7 B |
| S100 | YOLO11l Pose | 640×640 | 80 | 7.23 ms / 136.91 FPS (1 thread)<br/>13.48 ms / 147.21 FPS (2 threads) | 1.0 ms | 26.2 M | 90.7 B |
| S100 | YOLO11x Pose | 640×640 | 80 | 13.61 ms / 73.02 FPS (1 thread)<br/>26.16 ms / 76.04 FPS (2 threads) | 1.0 ms | 58.8 M | 203.3 B |
| S100 | YOLOv8n Pose | 640×640 | 80 | 1.62 ms / 587.59 FPS (1 thread)<br/>2.31 ms / 837.95 FPS (2 threads) | 1.0 ms | 3.3 M | 9.2 B |
| S100 | YOLOv8s Pose | 640×640 | 80 | 2.83 ms / 344.35 FPS (1 thread)<br/>4.71 ms / 417.72 FPS (2 threads) | 1.0 ms | 11.6 M | 30.2 B |
| S100 | YOLOv8m Pose | 640×640 | 80 | 5.47 ms / 180.46 FPS (1 thread)<br/>9.92 ms / 199.50 FPS (2 threads) | 1.0 ms | 26.4 M | 81.0 B |
| S100 | YOLOv8l Pose | 640×640 | 80 | 10.31 ms / 96.20 FPS (1 thread)<br/>19.55 ms / 101.60 FPS (2 threads) | 1.0 ms | 44.4 M | 168.6 B |
| S100 | YOLOv8x Pose | 640×640 | 80 | 16.07 ms / 61.88 FPS (1 thread)<br/>31.01 ms / 64.14 FPS (2 threads) | 1.0 ms | 69.4 M | 263.2 B |
Image Classification
| Device | Model | Size(Pixels) | Classes | BPU Task Latency / BPU Throughput (Threads) | CPU Latency (Single Core) | params(M) | FLOPs(B) |
|---|---|---|---|---|---|---|---|
| S100 | YOLO11n CLS | 640×640 | 80 | 0.53 ms / 1827.92 FPS (1 thread)<br/>0.62 ms / 3115.65 FPS (2 threads)<br/>0.70 ms / 4141.22 FPS (3 threads) | 0.5 ms | 2.8 M | 4.2 B |
| S100 | YOLO11s CLS | 640×640 | 80 | 0.68 ms / 1415.98 FPS (1 thread)<br/>0.76 ms / 2553.99 FPS (2 threads)<br/>1.05 ms / 2767.63 FPS (3 threads) | 0.5 ms | 6.7 M | 13.0 B |
| S100 | YOLO11m CLS | 640×640 | 80 | 1.02 ms / 955.28 FPS (1 thread)<br/>1.35 ms / 1445.18 FPS (2 threads) | 0.5 ms | 11.6 M | 40.3 B |
| S100 | YOLO11l CLS | 640×640 | 80 | 1.21 ms / 805.52 FPS (1 thread)<br/>1.73 ms / 1139.48 FPS (2 threads) | 0.5 ms | 14.1 M | 50.4 B |
| S100 | YOLO11x CLS | 640×640 | 80 | 1.97 ms / 501.49 FPS (1 thread)<br/>3.23 ms / 612.29 FPS (2 threads) | 0.5 ms | 29.6 M | 111.3 B |
| S100 | YOLOv8n CLS | 640×640 | 80 | 0.49 ms / 1928.23 FPS (1 thread)<br/>0.57 ms / 3399.86 FPS (2 threads)<br/>0.66 ms / 4410.92 FPS (3 threads) | 0.5 ms | 2.7 M | 4.3 B |
| S100 | YOLOv8s CLS | 640×640 | 80 | 0.62 ms / 1562.83 FPS (1 thread)<br/>0.71 ms / 2712.53 FPS (2 threads)<br/>0.89 ms / 3279.66 FPS (3 threads) | 0.5 ms | 6.4 M | 13.5 B |
| S100 | YOLOv8m CLS | 640×640 | 80 | 1.00 ms / 970.04 FPS (1 thread)<br/>1.31 ms / 1500.86 FPS (2 threads) | 0.5 ms | 17.0 M | 42.7 B |
| S100 | YOLOv8l CLS | 640×640 | 80 | 1.98 ms / 497.58 FPS (1 thread)<br/>3.22 ms / 614.92 FPS (2 threads) | 0.5 ms | 37.5 M | 99.7 B |
| S100 | YOLOv8x CLS | 640×640 | 80 | 2.77 ms / 357.03 FPS (1 thread)<br/>4.81 ms / 412.60 FPS (2 threads) | 0.5 ms | 57.4 M | 154.8 B |
Accuracy
Object Detection
| Device | Model | Accuracy bbox-all mAP@.5:.95 FP32 / BPU Python | Accuracy bbox-small mAP@.5:.95 FP32 / BPU Python | Accuracy bbox-medium mAP@.5:.95 FP32 / BPU Python | Accuracy bbox-large mAP@.5:.95 FP32 / BPU Python |
|---|---|---|---|---|---|
| S100 | YOLO12n Detect | 0.338 / 0.311 (92.0 %) | 0.128 / 0.096 (74.9 %) | 0.374 / 0.344 (91.8 %) | 0.524 / 0.507 (96.6 %) |
| S100 | YOLO12s Detect | 0.403 / 0.380 (94.3 %) | 0.201 / 0.156 (77.4 %) | 0.450 / 0.431 (95.9 %) | 0.602 / 0.573 (95.1 %) |
| S100 | YOLO12m Detect | 0.452 / 0.424 (93.8 %) | 0.251 / 0.211 (84.2 %) | 0.509 / 0.488 (95.9 %) | 0.638 / 0.609 (95.4 %) |
| S100 | YOLO12l Detect | 0.463 / 0.431 (93.1 %) | 0.268 / 0.220 (82.0 %) | 0.522 / 0.494 (94.7 %) | 0.646 / 0.629 (97.5 %) |
| S100 | YOLO12x Detect | 0.475 / 0.441 (92.8 %) | 0.276 / 0.215 (78.0 %) | 0.536 / 0.512 (95.5 %) | 0.659 / 0.619 (94.0 %) |
| S100 | YOLO11n Detect | 0.327 / 0.309 (94.5 %) | 0.130 / 0.108 (83.2 %) | 0.357 / 0.338 (94.7 %) | 0.511 / 0.497 (97.4 %) |
| S100 | YOLO11s Detect | 0.400 / 0.380 (95.2 %) | 0.198 / 0.167 (84.5 %) | 0.445 / 0.426 (95.9 %) | 0.587 / 0.575 (97.9 %) |
| S100 | YOLO11m Detect | 0.444 / 0.417 (94.1 %) | 0.247 / 0.211 (85.7 %) | 0.497 / 0.479 (96.3 %) | 0.627 / 0.590 (94.1 %) |
| S100 | YOLO11l Detect | 0.460 / 0.433 (94.1 %) | 0.267 / 0.226 (84.9 %) | 0.520 / 0.495 (95.3 %) | 0.638 / 0.605 (94.9 %) |
| S100 | YOLO11x Detect | 0.474 / 0.445 (93.7 %) | 0.283 / 0.231 (81.4 %) | 0.529 / 0.506 (95.5 %) | 0.652 / 0.623 (95.5 %) |
| S100 | YOLOv10n Detect | 0.303 / 0.278 (91.7 %) | 0.099 / 0.068 (68.6 %) | 0.330 / 0.304 (92.1 %) | 0.478 / 0.455 (95.3 %) |
| S100 | YOLOv10s Detect | 0.386 / 0.354 (91.7 %) | 0.175 / 0.122 (69.7 %) | 0.434 / 0.405 (93.3 %) | 0.574 / 0.529 (92.2 %) |
| S100 | YOLOv10m Detect | 0.425 / 0.374 (88.0 %) | 0.221 / 0.179 (81.0 %) | 0.481 / 0.439 (91.3 %) | 0.603 / 0.490 (81.3 %) |
| S100 | YOLOv10b Detect | 0.443 / 0.380 (85.7 %) | 0.242 / 0.193 (79.7 %) | 0.498 / 0.434 (87.1 %) | 0.618 / 0.469 (75.9 %) |
| S100 | YOLOv10l Detect | 0.445 / 0.380 (85.4 %) | 0.258 / 0.209 (81.3 %) | 0.498 / 0.444 (89.1 %) | 0.626 / 0.476 (76.0 %) |
| S100 | YOLOv10x Detect | 0.459 / 0.413 (90.0 %) | 0.258 / 0.214 (82.9 %) | 0.518 / 0.480 (92.7 %) | 0.639 / 0.539 (84.3 %) |
| S100 | YOLOv9t Detect | 0.313 / 0.300 (95.8 %) | 0.113 / 0.105 (93.7 %) | 0.338 / 0.325 (96.1 %) | 0.483 / 0.458 (94.9 %) |
| S100 | YOLOv9s Detect | 0.400 / 0.383 (95.9 %) | 0.191 / 0.160 (84.0 %) | 0.444 / 0.435 (97.9 %) | 0.583 / 0.556 (95.4 %) |
| S100 | YOLOv9m Detect | 0.449 / 0.434 (96.6 %) | 0.253 / 0.228 (90.3 %) | 0.504 / 0.492 (97.6 %) | 0.617 / 0.593 (96.1 %) |
| S100 | YOLOv9c Detect | 0.461 / 0.445 (96.5 %) | 0.269 / 0.246 (91.6 %) | 0.512 / 0.500 (97.6 %) | 0.640 / 0.610 (95.2 %) |
| S100 | YOLOv9e Detect | 0.481 / 0.460 (95.7 %) | 0.298 / 0.266 (89.3 %) | 0.538 / 0.516 (95.9 %) | 0.662 / 0.626 (94.5 %) |
| S100 | YOLOv8n Detect | 0.309 / 0.291 (94.3 %) | 0.113 / 0.101 (89.3 %) | 0.338 / 0.320 (94.8 %) | 0.473 / 0.448 (94.7 %) |
| S100 | YOLOv8s Detect | 0.391 / 0.373 (95.5 %) | 0.195 / 0.168 (86.2 %) | 0.437 / 0.421 (96.4 %) | 0.566 / 0.556 (98.3 %) |
| S100 | YOLOv8m Detect | 0.441 / 0.419 (95.2 %) | 0.249 / 0.213 (85.7 %) | 0.494 / 0.477 (96.5 %) | 0.618 / 0.602 (97.4 %) |
| S100 | YOLOv8l Detect | 0.461 / 0.441 (95.6 %) | 0.271 / 0.241 (88.9 %) | 0.516 / 0.499 (96.6 %) | 0.651 / 0.625 (96.0 %) |
| S100 | YOLOv8x Detect | 0.474 / 0.449 (94.7 %) | 0.280 / 0.250 (89.2 %) | 0.527 / 0.505 (95.8 %) | 0.658 / 0.628 (95.5 %) |
| S100 | YOLOv5nu Detect | 0.278 / 0.261 (93.6 %) | 0.093 / 0.081 (86.6 %) | 0.309 / 0.287 (93.0 %) | 0.417 / 0.400 (96.0 %) |
| S100 | YOLOv5su Detect | 0.367 / 0.352 (95.9 %) | 0.169 / 0.144 (85.3 %) | 0.416 / 0.402 (96.7 %) | 0.530 / 0.521 (98.3 %) |
| S100 | YOLOv5mu Detect | 0.425 / 0.406 (95.6 %) | 0.226 / 0.195 (86.4 %) | 0.477 / 0.465 (97.6 %) | 0.603 / 0.594 (98.4 %) |
| S100 | YOLOv5lu Detect | 0.458 / 0.437 (95.4 %) | 0.260 / 0.226 (86.9 %) | 0.516 / 0.499 (96.7 %) | 0.641 / 0.628 (97.9 %) |
| S100 | YOLOv5xu Detect | 0.466 / 0.445 (95.6 %) | 0.281 / 0.238 (84.8 %) | 0.523 / 0.506 (96.9 %) | 0.645 / 0.634 (98.3 %) |
Instance Segmentation
bbox
| Device | Model | Accuracy bbox-all mAP@.5:.95 FP32 / BPU Python | Accuracy bbox-small mAP@.5:.95 FP32 / BPU Python | Accuracy bbox-medium mAP@.5:.95 FP32 / BPU Python | Accuracy bbox-large mAP@.5:.95 FP32 / BPU Python |
|---|---|---|---|---|---|
| S100 | YOLO11n Seg | 0.322 / 0.295 (91.6 %) | 0.113 / 0.084 (74.2 %) | 0.352 / 0.322 (91.3 %) | 0.502 / 0.487 (97.0 %) |
| S100 | YOLO11s Seg | 0.394 / 0.369 (93.7 %) | 0.184 / 0.149 (81.0 %) | 0.442 / 0.419 (94.9 %) | 0.582 / 0.571 (98.1 %) |
| S100 | YOLO11m Seg | 0.443 / 0.414 (93.2 %) | 0.246 / 0.206 (83.5 %) | 0.497 / 0.473 (95.3 %) | 0.627 / 0.590 (94.2 %) |
| S100 | YOLO11l Seg | 0.460 / 0.428 (93.1 %) | 0.267 / 0.217 (81.5 %) | 0.520 / 0.490 (94.3 %) | 0.638 / 0.604 (94.8 %) |
| S100 | YOLO11x Seg | 0.474 / 0.440 (92.8 %) | 0.283 / 0.223 (78.7 %) | 0.529 / 0.501 (94.6 %) | 0.652 / 0.621 (95.2 %) |
| S100 | YOLOv9c Seg | 0.453 / 0.420 (92.6 %) | 0.254 / 0.206 (81.2 %) | 0.508 / 0.479 (94.2 %) | 0.621 / 0.584 (93.9 %) |
| S100 | YOLOv9e Seg | 0.481 / 0.449 (93.5 %) | 0.292 / 0.246 (84.1 %) | 0.537 / 0.506 (94.3 %) | 0.650 / 0.620 (95.4 %) |
| S100 | YOLOv8n Seg | 0.304 / 0.283 (93.1 %) | 0.109 / 0.088 (80.3 %) | 0.334 / 0.310 (92.8 %) | 0.461 / 0.441 (95.5 %) |
| S100 | YOLOv8s Seg | 0.386 / 0.363 (94.1 %) | 0.180 / 0.153 (85.2 %) | 0.432 / 0.405 (93.7 %) | 0.564 / 0.550 (97.5 %) |
| S100 | YOLOv8m Seg | 0.431 / 0.407 (94.4 %) | 0.228 / 0.193 (84.7 %) | 0.486 / 0.468 (96.2 %) | 0.608 / 0.591 (97.2 %) |
| S100 | YOLOv8l Seg | 0.453 / 0.425 (93.9 %) | 0.258 / 0.214 (83.0 %) | 0.502 / 0.484 (96.4 %) | 0.626 / 0.592 (94.6 %) |
| S100 | YOLOv8x Seg | 0.465 / 0.434 (93.4 %) | 0.268 / 0.216 (80.6 %) | 0.520 / 0.494 (95.0 %) | 0.641 / 0.613 (95.6 %) |
mask
| Device | Model | Accuracy mask-all mAP@.5:.95 FP32 / BPU Python | Accuracy mask-small mAP@.5:.95 FP32 / BPU Python | Accuracy mask-medium mAP@.5:.95 FP32 / BPU Python | Accuracy mask-large mAP@.5:.95 FP32 / BPU Python |
|---|---|---|---|---|---|
| S100 | YOLO11n Seg | 0.262 / 0.227 (86.7 %) | 0.062 / 0.046 (75.3 %) | 0.283 / 0.249 (88.0 %) | 0.444 / 0.392 (88.3 %) |
| S100 | YOLO11s Seg | 0.311 / 0.285 (91.7 %) | 0.099 / 0.088 (89.3 %) | 0.350 / 0.322 (91.9 %) | 0.509 / 0.470 (92.3 %) |
| S100 | YOLO11m Seg | 0.347 / 0.313 (90.3 %) | 0.136 / 0.121 (89.0 %) | 0.396 / 0.361 (91.2 %) | 0.549 / 0.482 (87.8 %) |
| S100 | YOLO11l Seg | 0.357 / 0.324 (90.7 %) | 0.143 / 0.124 (86.9 %) | 0.409 / 0.372 (90.8 %) | 0.560 / 0.499 (89.1 %) |
| S100 | YOLO11x Seg | 0.366 / 0.332 (90.6 %) | 0.149 / 0.124 (83.2 %) | 0.420 / 0.381 (90.8 %) | 0.572 / 0.516 (90.3 %) |
| S100 | YOLOv9c Seg | 0.352 / 0.317 (90.1 %) | 0.132 / 0.116 (87.9 %) | 0.404 / 0.366 (90.5 %) | 0.547 / 0.485 (88.6 %) |
| S100 | YOLOv9e Seg | 0.371 / 0.340 (91.6 %) | 0.155 / 0.137 (88.2 %) | 0.425 / 0.386 (90.8 %) | 0.571 / 0.521 (91.3 %) |
| S100 | YOLOv8n Seg | 0.246 / 0.220 (89.8 %) | 0.059 / 0.049 (83.0 %) | 0.265 / 0.242 (91.5 %) | 0.409 / 0.365 (89.3 %) |
| S100 | YOLOv8s Seg | 0.305 / 0.281 (92.3 %) | 0.096 / 0.088 (92.3 %) | 0.343 / 0.313 (91.2 %) | 0.496 / 0.459 (92.5 %) |
| S100 | YOLOv8m Seg | 0.337 / 0.311 (92.2 %) | 0.121 / 0.112 (92.1 %) | 0.386 / 0.358 (92.8 %) | 0.533 / 0.484 (90.7 %) |
| S100 | YOLOv8l Seg | 0.351 / 0.326 (92.8 %) | 0.137 / 0.124 (90.8 %) | 0.398 / 0.372 (93.4 %) | 0.550 / 0.495 (90.1 %) |
| S100 | YOLOv8x Seg | 0.358 / 0.330 (92.0 %) | 0.139 / 0.120 (86.6 %) | 0.409 / 0.377 (92.0 %) | 0.562 / 0.508 (90.4 %) |
Pose Estimation
| Device | Model | Accuracy pose-all mAP@.5:.95 FP32 / BPU Python | Accuracy pose-medium mAP@.5:.95 FP32 / BPU Python | Accuracy pose-large mAP@.5:.95 FP32 / BPU Python |
|---|---|---|---|---|
| S100 | YOLO11n Pose | 0.465 / 0.451 (97.0 %) | 0.386 / 0.375 (97.2 %) | 0.597 / 0.576 (96.5 %) |
| S100 | YOLO11s Pose | 0.559 / 0.531 (95.0 %) | 0.495 / 0.465 (94.0 %) | 0.672 / 0.647 (96.3 %) |
| S100 | YOLO11m Pose | 0.627 / 0.601 (95.9 %) | 0.586 / 0.559 (95.4 %) | 0.711 / 0.690 (97.0 %) |
| S100 | YOLO11l Pose | 0.636 / 0.615 (96.6 %) | 0.592 / 0.571 (96.6 %) | 0.726 / 0.698 (96.1 %) |
| S100 | YOLO11x Pose | 0.672 / 0.651 (96.8 %) | 0.634 / 0.607 (95.8 %) | 0.750 / 0.734 (97.9 %) |
| S100 | YOLOv8n Pose | 0.476 / 0.461 (96.9 %) | 0.391 / 0.372 (95.1 %) | 0.610 / 0.595 (97.6 %) |
| S100 | YOLOv8s Pose | 0.578 / 0.548 (94.9 %) | 0.510 / 0.475 (93.3 %) | 0.692 / 0.667 (96.4 %) |
| S100 | YOLOv8m Pose | 0.630 / 0.604 (95.9 %) | 0.578 / 0.551 (95.4 %) | 0.724 / 0.699 (96.5 %) |
| S100 | YOLOv8l Pose | 0.657 / 0.632 (96.3 %) | 0.607 / 0.578 (95.2 %) | 0.747 / 0.728 (97.5 %) |
| S100 | YOLOv8x Pose | 0.671 / 0.649 (96.7 %) | 0.624 / 0.596 (95.5 %) | 0.757 / 0.739 (97.6 %) |
Image Classification
| Device | Model | Accuracy TOP1 FP32 / BPU Python | Accuracy TOP5 FP32 / BPU Python |
|---|---|---|---|
| S100 | YOLO11n CLS | 0.700 / 0.590 (84.3 %) | 0.894 / 0.820 (91.7 %) |
| S100 | YOLO11s CLS | 0.754 / 0.667 (88.6 %) | 0.927 / 0.875 (94.5 %) |
| S100 | YOLO11m CLS | 0.773 / 0.706 (91.3 %) | 0.939 / 0.902 (96.1 %) |
| S100 | YOLO11l CLS | 0.783 / 0.712 (90.9 %) | 0.942 / 0.906 (96.1 %) |
| S100 | YOLO11x CLS | 0.795 / 0.733 (92.2 %) | 0.949 / 0.918 (96.7 %) |
| S100 | YOLOv8n CLS | 0.689 / 0.570 (82.7 %) | 0.883 / 0.802 (90.8 %) |
| S100 | YOLOv8s CLS | 0.737 / 0.636 (86.3 %) | 0.917 / 0.852 (92.9 %) |
| S100 | YOLOv8m CLS | 0.768 / 0.702 (91.4 %) | 0.935 / 0.899 (96.2 %) |
| S100 | YOLOv8l CLS | 0.783 / 0.723 (92.3 %) | 0.942 / 0.909 (96.5 %) |
| S100 | YOLOv8x CLS | 0.790 / 0.742 (93.9 %) | 0.945 / 0.921 (97.5 %) |
Performance
RDK X5
Object Detection
| Device | Model | Size(Pixels) | Classes | BPU Task Latency / BPU Throughput (Threads) | CPU Latency (Single Core) | params(M) | FLOPs(B) |
|---|---|---|---|---|---|---|---|
| X5 | YOLO12n Detect | 640×640 | 80 | 39.70 ms / 25.17 FPS (1 thread)<br/>73.19 ms / 27.24 FPS (2 threads) | 5.0 ms | 2.6 M | 7.7 B |
| X5 | YOLO12s Detect | 640×640 | 80 | 63.74 ms / 15.68 FPS (1 thread)<br/>121.24 ms / 16.45 FPS (2 threads) | 5.0 ms | 9.3 M | 21.4 B |
| X5 | YOLO12m Detect | 640×640 | 80 | 103.02 ms / 9.70 FPS (1 thread)<br/>199.58 ms / 9.99 FPS (2 threads) | 5.0 ms | 20.2 M | 67.5 B |
| X5 | YOLO12l Detect | 640×640 | 80 | 183.00 ms / 5.46 FPS (1 thread)<br/>359.03 ms / 5.56 FPS (2 threads) | 5.0 ms | 26.4 M | 88.9 B |
| X5 | YOLO12x Detect | 640×640 | 80 | 315.16 ms / 3.17 FPS (1 thread) | 5.0 ms | 59.1 M | 199.0 B |
| X5 | YOLO11n Detect | 640×640 | 80 | 8.25 ms / 121.05 FPS (1 thread)<br/>10.56 ms / 188.57 FPS (2 threads) | 5.0 ms | 2.6 M | 6.5 B |
| X5 | YOLO11s Detect | 640×640 | 80 | 15.81 ms / 63.16 FPS (1 thread)<br/>25.74 ms / 77.43 FPS (2 threads) | 5.0 ms | 9.4 M | 21.5 B |
| X5 | YOLO11m Detect | 640×640 | 80 | 34.68 ms / 28.82 FPS (1 thread)<br/>63.30 ms / 31.51 FPS (2 threads) | 5.0 ms | 20.1 M | 68.0 B |
| X5 | YOLO11l Detect | 640×640 | 80 | 45.23 ms / 22.10 FPS (1 thread)<br/>84.30 ms / 23.66 FPS (2 threads) | 5.0 ms | 25.3 M | 86.9 B |
| X5 | YOLO11x Detect | 640×640 | 80 | 96.70 ms / 10.34 FPS (1 thread)<br/>186.76 ms / 10.68 FPS (2 threads) | 5.0 ms | 56.9 M | 194.9 B |
| X5 | YOLOv10n Detect | 640×640 | 80 | 8.75 ms / 114.19 FPS (1 thread)<br/>11.60 ms / 171.72 FPS (2 threads) | 5.0 ms | 2.3 M | 6.7 B |
| X5 | YOLOv10s Detect | 640×640 | 80 | 14.84 ms / 67.32 FPS (1 thread)<br/>23.85 ms / 83.58 FPS (2 threads) | 5.0 ms | 7.2 M | 21.6 B |
| X5 | YOLOv10m Detect | 640×640 | 80 | 29.40 ms / 33.99 FPS (1 thread)<br/>52.83 ms / 37.75 FPS (2 threads) | 5.0 ms | 15.4 M | 59.1 B |
| X5 | YOLOv10b Detect | 640×640 | 80 | 40.14 ms / 24.90 FPS (1 thread)<br/>74.20 ms / 26.88 FPS (2 threads) | 5.0 ms | 19.1 M | 92.0 B |
| X5 | YOLOv10l Detect | 640×640 | 80 | 49.89 ms / 20.04 FPS (1 thread)<br/>93.66 ms / 21.30 FPS (2 threads) | 5.0 ms | 24.4 M | 120.3 B |
| X5 | YOLOv10x Detect | 640×640 | 80 | 68.92 ms / 14.51 FPS (1 thread)<br/>131.54 ms / 15.16 FPS (2 threads) | 5.0 ms | 29.5 M | 160.4 B |
| X5 | YOLOv9t Detect | 640×640 | 80 | 6.97 ms / 143.14 FPS (1 thread)<br/>7.96 ms / 250.11 FPS (2 threads) | 5.0 ms | 2.1 M | 8.2 B |
| X5 | YOLOv9s Detect | 640×640 | 80 | 13.00 ms / 76.81 FPS (1 thread)<br/>20.16 ms / 98.81 FPS (2 threads) | 5.0 ms | 7.2 M | 26.9 B |
| X5 | YOLOv9m Detect | 640×640 | 80 | 32.63 ms / 30.63 FPS (1 thread)<br/>59.31 ms / 33.62 FPS (2 threads) | 5.0 ms | 20.1 M | 76.8 B |
| X5 | YOLOv9c Detect | 640×640 | 80 | 40.46 ms / 24.71 FPS (1 thread)<br/>74.77 ms / 26.67 FPS (2 threads) | 5.0 ms | 25.3 M | 102.7 B |
| X5 | YOLOv9e Detect | 640×640 | 80 | 119.80 ms / 8.35 FPS (1 thread)<br/>233.08 ms / 8.56 FPS (2 threads) | 5.0 ms | 57.4 M | 189.5 B |
| X5 | YOLOv8n Detect | 640×640 | 80 | 7.00 ms / 142.60 FPS (1 thread)<br/>8.06 ms / 246.82 FPS (2 threads) | 5.0 ms | 3.2 M | 8.7 B |
| X5 | YOLOv8s Detect | 640×640 | 80 | 13.63 ms / 73.30 FPS (1 thread)<br/>21.38 ms / 93.20 FPS (2 threads) | 5.0 ms | 11.2 M | 28.6 B |
| X5 | YOLOv8m Detect | 640×640 | 80 | 30.74 ms / 32.51 FPS (1 thread)<br/>55.51 ms / 35.93 FPS (2 threads) | 5.0 ms | 25.9 M | 78.9 B |
| X5 | YOLOv8l Detect | 640×640 | 80 | 59.51 ms / 16.80 FPS (1 thread)<br/>112.80 ms / 17.68 FPS (2 threads) | 5.0 ms | 43.7 M | 165.2 B |
| X5 | YOLOv8x Detect | 640×640 | 80 | 92.72 ms / 10.78 FPS (1 thread)<br/>178.95 ms / 11.15 FPS (2 threads) | 5.0 ms | 68.2 M | 257.8 B |
| X5 | YOLOv5nu Detect | 640×640 | 80 | 6.33 ms / 157.59 FPS (1 thread)<br/>6.80 ms / 291.89 FPS (2 threads) | 5.0 ms | 2.6 M | 7.7 B |
| X5 | YOLOv5su Detect | 640×640 | 80 | 12.33 ms / 81.04 FPS (1 thread)<br/>18.88 ms / 105.56 FPS (2 threads) | 5.0 ms | 9.1 M | 24.0 B |
| X5 | YOLOv5mu Detect | 640×640 | 80 | 26.57 ms / 37.62 FPS (1 thread)<br/>47.20 ms / 42.24 FPS (2 threads) | 5.0 ms | 25.1 M | 64.2 B |
| X5 | YOLOv5lu Detect | 640×640 | 80 | 52.83 ms / 18.92 FPS (1 thread)<br/>99.42 ms / 20.06 FPS (2 threads) | 5.0 ms | 53.2 M | 135.0 B |
| X5 | YOLOv5xu Detect | 640×640 | 80 | 91.55 ms / 10.92 FPS (1 thread)<br/>176.49 ms / 11.30 FPS (2 threads) | 5.0 ms | 97.2 M | 246.4 B |
Instance Segmentation
| Device | Model | Size(Pixels) | Classes | BPU Task Latency / BPU Throughput (Threads) | CPU Latency (Single Core) | params(M) | FLOPs(B) |
|---|---|---|---|---|---|---|---|
| X5 | YOLO11n Seg | 640×640 | 80 | 11.55 ms / 86.39 FPS (1 thread)<br/>12.83 ms / 155.10 FPS (2 threads) | 20.0 ms | 2.9 M | 10.4 B |
| X5 | YOLO11s Seg | 640×640 | 80 | 21.62 ms / 46.22 FPS (1 thread)<br/>33.12 ms / 60.20 FPS (2 threads) | 20.0 ms | 10.1 M | 35.5 B |
| X5 | YOLO11m Seg | 640×640 | 80 | 50.43 ms / 19.82 FPS (1 thread)<br/>90.49 ms / 22.04 FPS (2 threads) | 20.0 ms | 22.4 M | 123.3 B |
| X5 | YOLO11l Seg | 640×640 | 80 | 60.60 ms / 16.50 FPS (1 thread)<br/>110.99 ms / 17.97 FPS (2 threads) | 20.0 ms | 27.6 M | 142.2 B |
| X5 | YOLO11x Seg | 640×640 | 80 | 130.40 ms / 7.67 FPS (1 thread)<br/>249.71 ms / 7.99 FPS (2 threads) | 20.0 ms | 62.1 M | 319.0 B |
| X5 | YOLOv9c Seg | 640×640 | 80 | 55.85 ms / 17.90 FPS (1 thread)<br/>101.47 ms / 19.65 FPS (2 threads) | 20.0 ms | 27.7 M | 158.0 B |
| X5 | YOLOv9e Seg | 640×640 | 80 | 135.34 ms / 7.39 FPS (1 thread)<br/>260.08 ms / 7.67 FPS (2 threads) | 20.0 ms | 59.7 M | 244.8 B |
| X5 | YOLOv8n Seg | 640×640 | 80 | 10.40 ms / 96.02 FPS (1 thread)<br/>10.75 ms / 185.21 FPS (2 threads) | 20.0 ms | 3.4 M | 12.6 B |
| X5 | YOLOv8s Seg | 640×640 | 80 | 19.56 ms / 51.08 FPS (1 thread)<br/>28.99 ms / 68.76 FPS (2 threads) | 20.0 ms | 11.8 M | 42.6 B |
| X5 | YOLOv8m Seg | 640×640 | 80 | 40.52 ms / 24.67 FPS (1 thread)<br/>70.70 ms / 28.21 FPS (2 threads) | 20.0 ms | 27.3 M | 100.2 B |
| X5 | YOLOv8l Seg | 640×640 | 80 | 75.00 ms / 13.33 FPS (1 thread)<br/>139.61 ms / 14.29 FPS (2 threads) | 20.0 ms | 46.0 M | 220.5 B |
| X5 | YOLOv8x Seg | 640×640 | 80 | 115.94 ms / 8.62 FPS (1 thread)<br/>221.06 ms / 9.02 FPS (2 threads) | 20.0 ms | 71.8 M | 344.1 B |
Pose Estimation
| Device | Model | Size(Pixels) | Classes | BPU Task Latency / BPU Throughput (Threads) | CPU Latency (Single Core) | params(M) | FLOPs(B) |
|---|---|---|---|---|---|---|---|
| X5 | YOLO11n Pose | 640×640 | 80 | 8.36 ms / 119.43 FPS (1 thread)<br/>10.97 ms / 181.61 FPS (2 threads) | 10.0 ms | 2.9 M | 7.6 B |
| X5 | YOLO11s Pose | 640×640 | 80 | 16.35 ms / 61.11 FPS (1 thread)<br/>26.99 ms / 73.85 FPS (2 threads) | 10.0 ms | 9.9 M | 23.2 B |
| X5 | YOLO11m Pose | 640×640 | 80 | 35.74 ms / 27.97 FPS (1 thread)<br/>65.60 ms / 30.40 FPS (2 threads) | 10.0 ms | 20.9 M | 71.7 B |
| X5 | YOLO11l Pose | 640×640 | 80 | 46.38 ms / 21.55 FPS (1 thread)<br/>86.82 ms / 22.97 FPS (2 threads) | 10.0 ms | 26.2 M | 90.7 B |
| X5 | YOLO11x Pose | 640×640 | 80 | 98.88 ms / 10.11 FPS (1 thread)<br/>191.38 ms / 10.42 FPS (2 threads) | 10.0 ms | 58.8 M | 203.3 B |
| X5 | YOLOv8n Pose | 640×640 | 80 | 6.95 ms / 143.64 FPS (1 thread)<br/>8.23 ms / 241.76 FPS (2 threads) | 10.0 ms | 3.3 M | 9.2 B |
| X5 | YOLOv8s Pose | 640×640 | 80 | 14.16 ms / 70.54 FPS (1 thread)<br/>22.62 ms / 88.09 FPS (2 threads) | 10.0 ms | 11.6 M | 30.2 B |
| X5 | YOLOv8m Pose | 640×640 | 80 | 31.60 ms / 31.62 FPS (1 thread)<br/>57.34 ms / 34.78 FPS (2 threads) | 10.0 ms | 26.4 M | 81.0 B |
| X5 | YOLOv8l Pose | 640×640 | 80 | 60.37 ms / 16.56 FPS (1 thread)<br/>114.73 ms / 17.38 FPS (2 threads) | 10.0 ms | 44.4 M | 168.6 B |
| X5 | YOLOv8x Pose | 640×640 | 80 | 94.15 ms / 10.62 FPS (1 thread)<br/>182.08 ms / 10.96 FPS (2 threads) | 10.0 ms | 69.4 M | 263.2 B |
Image Classification
| Device | Model | Size(Pixels) | Classes | BPU Task Latency / BPU Throughput (Threads) | CPU Latency (Single Core) | params(M) | FLOPs(B) |
|---|---|---|---|---|---|---|---|
| X5 | YOLO11n CLS | 640×640 | 80 | 1.06 ms / 939.95 FPS (1 thread)<br/>1.61 ms / 1236.07 FPS (2 threads) | 0.5 ms | 2.8 M | 4.2 M |
| X5 | YOLO11s CLS | 640×640 | 80 | 2.01 ms / 495.14 FPS (1 thread)<br/>3.49 ms / 569.44 FPS (2 threads) | 0.5 ms | 6.7 M | 13.0 M |
| X5 | YOLO11m CLS | 640×640 | 80 | 3.82 ms / 261.13 FPS (1 thread)<br/>7.09 ms / 280.82 FPS (2 threads) | 0.5 ms | 11.6 M | 40.3 M |
| X5 | YOLO11l CLS | 640×640 | 80 | 5.02 ms / 199.15 FPS (1 thread)<br/>9.49 ms / 210.12 FPS (2 threads) | 0.5 ms | 14.1 M | 50.4 M |
| X5 | YOLO11x CLS | 640×640 | 80 | 10.04 ms / 99.49 FPS (1 thread)<br/>19.48 ms / 102.39 FPS (2 threads) | 0.5 ms | 29.6 M | 111.3 M |
| X5 | YOLOv8n CLS | 640×640 | 80 | 0.74 ms / 1348.98 FPS (1 thread)<br/>0.98 ms / 2018.94 FPS (2 threads) | 0.5 ms | 2.7 M | 4.3 M |
| X5 | YOLOv8s CLS | 640×640 | 80 | 1.44 ms / 690.86 FPS (1 thread)<br/>2.36 ms / 842.52 FPS (2 threads) | 0.5 ms | 6.4 M | 13.5 M |
| X5 | YOLOv8m CLS | 640×640 | 80 | 3.66 ms / 272.72 FPS (1 thread)<br/>6.78 ms / 294.01 FPS (2 threads) | 0.5 ms | 17.0 M | 42.7 M |
| X5 | YOLOv8l CLS | 640×640 | 80 | 7.98 ms / 125.23 FPS (1 thread)<br/>15.38 ms / 129.63 FPS (2 threads) | 0.5 ms | 37.5 M | 99.7 M |
| X5 | YOLOv8x CLS | 640×640 | 80 | 13.12 ms / 76.18 FPS (1 thread)<br/>25.64 ms / 77.78 FPS (2 threads) | 0.5 ms | 57.4 M | 154.8 M |
Accuracy
Object Detection
| Device | Model | Accuracy bbox-all mAP@.50:.95 FP32 / BPU Python | Accuracy bbox-small mAP@.50:.95 FP32 / BPU Python | Accuracy bbox-medium mAP@.50:.95 FP32 / BPU Python | Accuracy bbox-large mAP@.50:.95 FP32 / BPU Python |
|---|---|---|---|---|---|
| X5 | YOLO12n Detect | 0.338 / 0.313 (92.5 %) | 0.128 / 0.095 (74.3 %) | 0.374 / 0.343 (91.7 %) | 0.524 / 0.511 (97.4 %) |
| X5 | YOLO12s Detect | 0.403 / 0.379 (94.0 %) | 0.201 / 0.157 (78.1 %) | 0.450 / 0.427 (95.0 %) | 0.602 / 0.575 (95.5 %) |
| X5 | YOLO12m Detect | 0.452 / 0.424 (93.8 %) | 0.251 / 0.208 (82.7 %) | 0.509 / 0.489 (96.1 %) | 0.638 / 0.617 (96.7 %) |
| X5 | YOLO12l Detect | 0.463 / 0.434 (93.8 %) | 0.268 / 0.212 (78.9 %) | 0.522 / 0.499 (95.6 %) | 0.646 / 0.630 (97.6 %) |
| X5 | YOLO12x Detect | 0.475 / 0.443 (93.3 %) | 0.276 / 0.227 (82.3 %) | 0.536 / 0.513 (95.7 %) | 0.659 / 0.632 (95.9 %) |
| X5 | YOLO11n Detect | 0.327 / 0.310 (95.1 %) | 0.130 / 0.110 (84.8 %) | 0.357 / 0.341 (95.4 %) | 0.511 / 0.498 (97.5 %) |
| X5 | YOLO11s Detect | 0.400 / 0.381 (95.2 %) | 0.198 / 0.165 (83.1 %) | 0.445 / 0.426 (95.8 %) | 0.587 / 0.577 (98.3 %) |
| X5 | YOLO11m Detect | 0.444 / 0.278 (62.7 %) | 0.247 / 0.048 (19.3 %) | 0.497 / 0.299 (60.2 %) | 0.627 / 0.490 (78.2 %) |
| X5 | YOLO11l Detect | 0.460 / 0.435 (94.7 %) | 0.267 / 0.224 (84.1 %) | 0.520 / 0.499 (96.0 %) | 0.638 / 0.611 (95.8 %) |
| X5 | YOLO11x Detect | 0.474 / 0.445 (93.9 %) | 0.283 / 0.233 (82.3 %) | 0.529 / 0.505 (95.3 %) | 0.652 / 0.627 (96.2 %) |
| X5 | YOLOv10n Detect | 0.303 / 0.280 (92.5 %) | 0.099 / 0.079 (79.2 %) | 0.330 / 0.302 (91.3 %) | 0.478 / 0.457 (95.7 %) |
| X5 | YOLOv10s Detect | 0.386 / 0.357 (92.4 %) | 0.175 / 0.131 (74.7 %) | 0.434 / 0.406 (93.6 %) | 0.574 / 0.520 (90.6 %) |
| X5 | YOLOv10m Detect | 0.425 / 0.379 (89.1 %) | 0.221 / 0.181 (82.0 %) | 0.481 / 0.439 (91.3 %) | 0.603 / 0.502 (83.2 %) |
| X5 | YOLOv10b Detect | 0.443 / 0.390 (88.1 %) | 0.242 / 0.207 (85.6 %) | 0.498 / 0.435 (87.2 %) | 0.618 / 0.502 (81.2 %) |
| X5 | YOLOv10l Detect | 0.445 / 0.379 (85.1 %) | 0.258 / 0.211 (81.9 %) | 0.498 / 0.440 (88.2 %) | 0.626 / 0.476 (76.1 %) |
| X5 | YOLOv10x Detect | 0.459 / 0.418 (91.3 %) | 0.258 / 0.216 (83.8 %) | 0.518 / 0.480 (92.8 %) | 0.639 / 0.562 (88.0 %) |
| X5 | YOLOv9t Detect | 0.313 / 0.299 (95.6 %) | 0.113 / 0.105 (93.1 %) | 0.338 / 0.322 (95.5 %) | 0.483 / 0.456 (94.5 %) |
| X5 | YOLOv9s Detect | 0.400 / 0.384 (96.2 %) | 0.191 / 0.174 (90.9 %) | 0.444 / 0.430 (96.8 %) | 0.583 / 0.557 (95.6 %) |
| X5 | YOLOv9m Detect | 0.449 / 0.432 (96.3 %) | 0.253 / 0.227 (89.6 %) | 0.504 / 0.488 (96.8 %) | 0.617 / 0.604 (97.9 %) |
| X5 | YOLOv9c Detect | 0.461 / 0.440 (95.5 %) | 0.269 / 0.242 (90.1 %) | 0.512 / 0.497 (96.9 %) | 0.640 / 0.611 (95.4 %) |
| X5 | YOLOv9e Detect | 0.481 / 0.462 (96.1 %) | 0.298 / 0.268 (90.1 %) | 0.538 / 0.514 (95.5 %) | 0.662 / 0.642 (97.0 %) |
| X5 | YOLOv8n Detect | 0.309 / 0.293 (94.7 %) | 0.113 / 0.103 (90.9 %) | 0.338 / 0.323 (95.4 %) | 0.473 / 0.448 (94.7 %) |
| X5 | YOLOv8s Detect | 0.391 / 0.378 (96.7 %) | 0.195 / 0.174 (89.3 %) | 0.437 / 0.426 (97.5 %) | 0.566 / 0.558 (98.6 %) |
| X5 | YOLOv8m Detect | 0.441 / 0.425 (96.4 %) | 0.249 / 0.220 (88.6 %) | 0.494 / 0.480 (97.0 %) | 0.618 / 0.612 (99.0 %) |
| X5 | YOLOv8l Detect | 0.461 / 0.444 (96.2 %) | 0.271 / 0.243 (89.6 %) | 0.516 / 0.501 (97.1 %) | 0.651 / 0.628 (96.4 %) |
| X5 | YOLOv8x Detect | 0.474 / 0.451 (95.1 %) | 0.280 / 0.251 (89.7 %) | 0.527 / 0.504 (95.6 %) | 0.658 / 0.638 (97.0 %) |
| X5 | YOLOv5nu Detect | 0.278 / 0.212 (76.0 %) | 0.093 / 0.043 (46.2 %) | 0.309 / 0.219 (71.0 %) | 0.417 / 0.356 (85.5 %) |
| X5 | YOLOv5su Detect | 0.367 / 0.354 (96.5 %) | 0.169 / 0.148 (88.0 %) | 0.416 / 0.402 (96.7 %) | 0.530 / 0.523 (98.6 %) |
| X5 | YOLOv5mu Detect | 0.425 / 0.406 (95.6 %) | 0.226 / 0.195 (86.1 %) | 0.477 / 0.461 (96.7 %) | 0.603 / 0.594 (98.5 %) |
| X5 | YOLOv5lu Detect | 0.458 / 0.440 (96.0 %) | 0.260 / 0.226 (87.0 %) | 0.516 / 0.503 (97.3 %) | 0.641 / 0.627 (97.7 %) |
| X5 | YOLOv5xu Detect | 0.466 / 0.448 (96.2 %) | 0.281 / 0.241 (85.8 %) | 0.523 / 0.512 (98.0 %) | 0.645 / 0.639 (99.2 %) |
Instance Segmentation
bbox
| Device | Model | Accuracy bbox-all mAP@.50:.95 FP32 / BPU Python | Accuracy bbox-small mAP@.50:.95 FP32 / BPU Python | Accuracy bbox-medium mAP@.50:.95 FP32 / BPU Python | Accuracy bbox-large mAP@.50:.95 FP32 / BPU Python |
|---|---|---|---|---|---|
| X5 | YOLO11n Seg | 0.322 / 0.294 (91.4 %) | 0.113 / 0.089 (78.8 %) | 0.352 / 0.320 (90.9 %) | 0.502 / 0.479 (95.3 %) |
| X5 | YOLO11s Seg | 0.394 / 0.373 (94.7 %) | 0.184 / 0.155 (84.2 %) | 0.442 / 0.422 (95.5 %) | 0.582 / 0.570 (97.9 %) |
| X5 | YOLO11m Seg | 0.443 / 0.413 (93.2 %) | 0.246 / 0.199 (80.8 %) | 0.497 / 0.472 (95.0 %) | 0.627 / 0.601 (95.9 %) |
| X5 | YOLO11l Seg | 0.460 / 0.430 (93.6 %) | 0.267 / 0.216 (80.9 %) | 0.520 / 0.494 (95.2 %) | 0.638 / 0.609 (95.5 %) |
| X5 | YOLO11x Seg | 0.474 / 0.441 (92.9 %) | 0.283 / 0.225 (79.3 %) | 0.529 / 0.500 (94.4 %) | 0.652 / 0.625 (95.9 %) |
| X5 | YOLOv9c Seg | 0.453 / 0.422 (93.0 %) | 0.254 / 0.205 (80.7 %) | 0.508 / 0.482 (94.7 %) | 0.621 / 0.605 (97.3 %) |
| X5 | YOLOv9e Seg | 0.481 / 0.452 (94.2 %) | 0.292 / 0.254 (86.7 %) | 0.537 / 0.507 (94.4 %) | 0.650 / 0.632 (97.1 %) |
| X5 | YOLOv8n Seg | 0.304 / 0.286 (94.1 %) | 0.109 / 0.091 (83.6 %) | 0.334 / 0.314 (94.0 %) | 0.461 / 0.446 (96.7 %) |
| X5 | YOLOv8s Seg | 0.386 / 0.368 (95.2 %) | 0.180 / 0.155 (86.4 %) | 0.432 / 0.413 (95.4 %) | 0.564 / 0.554 (98.3 %) |
| X5 | YOLOv8m Seg | 0.431 / 0.410 (95.1 %) | 0.228 / 0.197 (86.4 %) | 0.486 / 0.469 (96.5 %) | 0.608 / 0.596 (98.0 %) |
| X5 | YOLOv8l Seg | 0.453 / 0.427 (94.2 %) | 0.258 / 0.216 (83.5 %) | 0.502 / 0.487 (96.9 %) | 0.626 / 0.605 (96.6 %) |
| X5 | YOLOv8x Seg | 0.465 / 0.438 (94.3 %) | 0.268 / 0.219 (81.6 %) | 0.520 / 0.499 (96.0 %) | 0.641 / 0.626 (97.8 %) |
mask
| Device | Model | Accuracy mask-all mAP@.50:.95 FP32 / BPU Python | Accuracy mask-small mAP@.50:.95 FP32 / BPU Python | Accuracy mask-medium mAP@.50:.95 FP32 / BPU Python | Accuracy mask-large mAP@.50:.95 FP32 / BPU Python |
|---|---|---|---|---|---|
| X5 | YOLO11n Seg | 0.262 / 0.224 (85.6 %) | 0.062 / 0.049 (79.2 %) | 0.283 / 0.245 (86.6 %) | 0.444 / 0.384 (86.5 %) |
| X5 | YOLO11s Seg | 0.311 / 0.288 (92.6 %) | 0.099 / 0.092 (93.0 %) | 0.350 / 0.324 (92.7 %) | 0.509 / 0.470 (92.3 %) |
| X5 | YOLO11m Seg | 0.347 / 0.314 (90.5 %) | 0.136 / 0.115 (84.6 %) | 0.396 / 0.361 (91.2 %) | 0.549 / 0.492 (89.5 %) |
| X5 | YOLO11l Seg | 0.357 / 0.325 (90.9 %) | 0.143 / 0.125 (87.1 %) | 0.409 / 0.373 (91.1 %) | 0.560 / 0.504 (90.0 %) |
| X5 | YOLO11x Seg | 0.366 / 0.332 (90.6 %) | 0.149 / 0.125 (84.5 %) | 0.420 / 0.379 (90.2 %) | 0.572 / 0.520 (91.0 %) |
| X5 | YOLOv9c Seg | 0.352 / 0.319 (90.5 %) | 0.132 / 0.115 (87.0 %) | 0.404 / 0.366 (90.6 %) | 0.547 / 0.500 (91.6 %) |
| X5 | YOLOv9e Seg | 0.371 / 0.342 (92.2 %) | 0.155 / 0.142 (91.6 %) | 0.425 / 0.386 (90.9 %) | 0.571 / 0.527 (92.3 %) |
| X5 | YOLOv8n Seg | 0.246 / 0.222 (90.4 %) | 0.059 / 0.052 (87.6 %) | 0.265 / 0.246 (92.8 %) | 0.409 / 0.368 (89.9 %) |
| X5 | YOLOv8s Seg | 0.305 / 0.284 (93.0 %) | 0.096 / 0.089 (93.3 %) | 0.343 / 0.318 (92.9 %) | 0.496 / 0.462 (93.1 %) |
| X5 | YOLOv8m Seg | 0.337 / 0.314 (93.2 %) | 0.121 / 0.113 (93.6 %) | 0.386 / 0.360 (93.4 %) | 0.533 / 0.493 (92.5 %) |
| X5 | YOLOv8l Seg | 0.351 / 0.327 (93.2 %) | 0.137 / 0.124 (90.5 %) | 0.398 / 0.374 (94.0 %) | 0.550 / 0.506 (92.1 %) |
| X5 | YOLOv8x Seg | 0.358 / 0.332 (92.6 %) | 0.139 / 0.121 (87.1 %) | 0.409 / 0.380 (92.7 %) | 0.562 / 0.517 (91.9 %) |
Pose Estimation
| Device | Model | Accuracy pose-all mAP@.50:.95 FP32 / BPU Python | Accuracy pose-medium mAP@.50:.95 FP32 / BPU Python | Accuracy pose-large mAP@.50:.95 FP32 / BPU Python |
|---|---|---|---|---|
| X5 | YOLO11n Pose | 0.465 / 0.453 (97.3 %) | 0.386 / 0.379 (98.2 %) | 0.597 / 0.577 (96.6 %) |
| X5 | YOLO11s Pose | 0.559 / 0.532 (95.1 %) | 0.495 / 0.468 (94.7 %) | 0.672 / 0.644 (95.8 %) |
| X5 | YOLO11m Pose | 0.627 / 0.609 (97.1 %) | 0.586 / 0.565 (96.4 %) | 0.711 / 0.693 (97.4 %) |
| X5 | YOLO11l Pose | 0.636 / 0.619 (97.3 %) | 0.592 / 0.569 (96.3 %) | 0.726 / 0.710 (97.8 %) |
| X5 | YOLO11x Pose | 0.672 / 0.650 (96.8 %) | 0.634 / 0.609 (96.1 %) | 0.750 / 0.733 (97.8 %) |
| X5 | YOLOv8n Pose | 0.476 / 0.459 (96.4 %) | 0.391 / 0.373 (95.3 %) | 0.610 / 0.594 (97.4 %) |
| X5 | YOLOv8s Pose | 0.578 / 0.551 (95.3 %) | 0.510 / 0.478 (93.7 %) | 0.692 / 0.667 (96.5 %) |
| X5 | YOLOv8m Pose | 0.630 / 0.606 (96.2 %) | 0.578 / 0.552 (95.6 %) | 0.724 / 0.692 (95.5 %) |
| X5 | YOLOv8l Pose | 0.657 / 0.632 (96.3 %) | 0.607 / 0.582 (95.8 %) | 0.747 / 0.725 (97.0 %) |
| X5 | YOLOv8x Pose | 0.671 / 0.648 (96.6 %) | 0.624 / 0.599 (96.0 %) | 0.757 / 0.736 (97.2 %) |
Image Classification
| Device | Model | Accuracy TOP1 FP32 / BPU Python | Accuracy TOP5 FP32 / BPU Python |
|---|---|---|---|
| X5 | YOLO11n CLS | 0.700 / 0.585 (83.6 %) | 0.894 / 0.815 (91.2 %) |
| X5 | YOLO11s CLS | 0.754 / 0.663 (88.0 %) | 0.927 / 0.873 (94.2 %) |
| X5 | YOLO11m CLS | 0.773 / 0.708 (91.5 %) | 0.939 / 0.903 (96.1 %) |
| X5 | YOLO11l CLS | 0.783 / 0.714 (91.1 %) | 0.942 / 0.906 (96.1 %) |
| X5 | YOLO11x CLS | 0.795 / 0.733 (92.2 %) | 0.949 / 0.917 (96.6 %) |
| X5 | YOLOv8n CLS | 0.689 / 0.574 (83.2 %) | 0.883 / 0.806 (91.2 %) |
| X5 | YOLOv8s CLS | 0.737 / 0.635 (86.1 %) | 0.917 / 0.850 (92.8 %) |
| X5 | YOLOv8m CLS | 0.768 / 0.702 (91.5 %) | 0.935 / 0.899 (96.2 %) |
| X5 | YOLOv8l CLS | 0.783 / 0.727 (92.9 %) | 0.942 / 0.912 (96.9 %) |
| X5 | YOLOv8x CLS | 0.790 / 0.741 (93.8 %) | 0.945 / 0.921 (97.5 %) |
Advanced Development
High-Performance Computing Process Introduction
Object Detection
In the standard processing flow, scores, categories, and xyxy coordinates are fully computed for all 8400 bounding boxes (bbox) to calculate the loss function based on ground truth (GT). However, during deployment, we only need the qualified bboxes, so it's unnecessary to compute all 8400 bboxes completely.
The optimization primarily leverages the monotonicity of the Sigmoid function to perform filtering before calculation. This approach also applies to the DFL and feature decoding stages—filtering first, then computing—which saves substantial computational effort. As a result, the inference time is significantly reduced.
- Classify part, ReduceMax operation: The ReduceMax operation finds the maximum value along a specific dimension of a Tensor. Here it is used to find the maximum of the 80 class scores for each of the 8,400 Grid Cells, operating on the C dimension. Note that this operation returns the maximum value, not its index. Because the Sigmoid activation function is monotonic, the relative ordering of the 80 scores is unchanged before and after Sigmoid. Consequently, the position of the maximum value output directly by the bin model (after dequantization) is the same as the position of the maximum final score, and the bin model's maximum value, after Sigmoid, equals the maximum score from the original ONNX model.
- Classify part, Threshold (TopK) operation: This operation finds the Grid Cells among the 8,400 that meet the score requirement, operating on the H and W dimensions. (In the program, the H and W dimensions are later flattened; that is only for convenience in program design and written expression, with no essential difference.) Assume the raw score of some category for a Grid Cell is $x$, its value after the activation function is $y$, and the threshold is $C$. The necessary and sufficient condition for this score to be qualified is:

$$ y = \mathrm{Sigmoid}(x) = \frac{1}{1+e^{-x}} > C $$

From this, since Sigmoid is monotonic, we can derive the equivalent condition on the raw value:

$$ x > -\ln\left(\frac{1}{C}-1\right) $$

This operation yields the indices of the qualified Grid Cells and their corresponding maximum values; after Sigmoid, each maximum value becomes the final class score of that Grid Cell.
- Classify part, GatherElements and ArgMax operations: Using the indices of the qualified Grid Cells obtained from the Threshold (TopK) operation, the GatherElements operation retrieves the qualified Grid Cells, and the ArgMax operation determines which of the 80 categories is the largest, giving the category of each qualified Grid Cell.
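As an illustration, the classification-branch flow (ReduceMax, thresholding in raw-output space, GatherElements, ArgMax) can be sketched in NumPy. The function name, shapes, and the 0.25 threshold are assumptions for this sketch, not part of the deployed implementation:

```python
import numpy as np

def filter_classify_branch(raw_scores: np.ndarray, conf_thres: float = 0.25):
    """raw_scores: (N, num_classes) dequantized outputs BEFORE Sigmoid.

    Returns (indices, class_ids, scores) for grid cells above conf_thres.
    """
    # Inverse Sigmoid of the threshold: y > C  <=>  x > -ln(1/C - 1)
    raw_thres = -np.log(1.0 / conf_thres - 1.0)

    # ReduceMax along the class (C) dimension: the maximum value, not its index
    max_raw = raw_scores.max(axis=1)

    # Threshold in raw space -- Sigmoid is monotonic, so this is equivalent to
    # thresholding the final scores, but Sigmoid is skipped for rejected cells
    keep = np.flatnonzero(max_raw > raw_thres)

    # GatherElements + ArgMax run only on the qualified cells
    kept = raw_scores[keep]
    class_ids = kept.argmax(axis=1)
    scores = 1.0 / (1.0 + np.exp(-kept[np.arange(len(keep)), class_ids]))
    return keep, class_ids, scores

# Usage sketch: random logits standing in for the 8,400 grid cells' 80 classes
rng = np.random.default_rng(0)
idx, cls, conf = filter_classify_branch(rng.normal(-4.0, 2.0, (8400, 80)))
assert np.all(conf > 0.25)
```

Because most of the 8,400 cells fail the raw-space threshold, Sigmoid and ArgMax run on only a handful of rows, which is the source of the speedup described above.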
- Bounding Box part, GatherElements operation: Using the indices of qualified Grid Cells obtained from the Threshold (TopK) operation, the GatherElements operation retrieves the corresponding bbox information, of shape 1×64×k×1.
- Bounding Box part, DFL: SoftMax + Conv operation: Each Grid Cell uses 4 numbers to determine the position of its box. The DFL structure gives 16 estimates of the offset of each edge of the box relative to the anchor position; SoftMax is applied to the 16 estimates, and a convolution is then used to compute the expectation. This is the core of the Anchor-Free design: each Grid Cell predicts only 1 Bounding Box. Assuming the 16 numbers for one edge are $l_p$ (and likewise $t_p, r_p, b_p$), where $p = 0,1,\dots,15$, the offset is computed as:

$$ \hat{l} = \sum_{p=0}^{15}{p \cdot \frac{e^{l_p}}{S}}, \qquad S = \sum_{q=0}^{15}{e^{l_q}} $$
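The DFL expectation is simply a softmax-weighted sum over the 16 bins. A minimal NumPy sketch (names and shapes are illustrative assumptions):

```python
import numpy as np

def dfl_decode(edge_logits: np.ndarray) -> np.ndarray:
    """edge_logits: (..., 4, 16) -- 16 distribution logits per edge (l, t, r, b).

    Returns the expected offset of each edge, shape (..., 4).
    """
    # SoftMax over the 16 bins (numerically stabilized)
    e = np.exp(edge_logits - edge_logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    # The "Conv" has fixed weights 0..15: it computes the expectation
    return (probs * np.arange(16)).sum(axis=-1)

# A distribution sharply peaked at bin 7 decodes to an offset of ~7 grid units
logits = np.full((1, 4, 16), -10.0)
logits[..., 7] = 10.0
print(dfl_decode(logits))
```

Since only the k qualified Grid Cells reach this step, the SoftMax and expectation run on a (k, 4, 16) tensor instead of all 8,400 cells.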
- Bounding Box part, Decode: dist2bbox(ltrb2xyxy) operation
This operation decodes the ltrb description of each Bounding Box into an xyxy description. ltrb represents the distance of the left, top, right, and bottom edges relative to the center of the Grid Cell. After restoring the relative position to absolute position and multiplying by the sampling factor of the corresponding feature layer, the xyxy coordinates can be restored. xyxy represents the predicted coordinates of the top-left and bottom-right corners of the Bounding Box.

The input image size is $Size=640$. For the $i$th feature map $(i=1, 2, 3)$ of the Bounding Box prediction branch, the corresponding downsampling factor is denoted $Stride(i)$. In YOLOv8 - Detect, $Stride(1)=8, Stride(2)=16, Stride(3)=32$, corresponding to feature map sizes of $n_i = {Size}/{Stride(i)}$, i.e., $n_1 = 80, n_2 = 40, n_3 = 20$ for the three feature maps, totaling $n_1^2+n_2^2+n_3^2=8400$ Grid Cells, responsible for predicting 8,400 Bounding Boxes. For feature map $i$, the cell in the $x$th row and $y$th column predicts the Bounding Box of the corresponding scale, where $x,y \in [0, n_i)\bigcap{Z}$ and $Z$ is the set of integers. The DFL structure describes the detection box in ltrb format, while we need the $xyxy$ format. Taking the Grid Cell's anchor point as $(x+0.5,\ y+0.5)$, the specific transformation relationship is as follows:

$$ x_{1} = (x + 0.5 - l)\times Stride(i), \qquad y_{1} = (y + 0.5 - t)\times Stride(i) $$

$$ x_{2} = (x + 0.5 + r)\times Stride(i), \qquad y_{2} = (y + 0.5 + b)\times Stride(i) $$
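A minimal NumPy sketch of the dist2bbox (ltrb2xyxy) decode, assuming the anchor-center convention described above (function name and shapes are illustrative):

```python
import numpy as np

def ltrb_to_xyxy(ltrb: np.ndarray, anchor_xy: np.ndarray, stride: int) -> np.ndarray:
    """Decode ltrb offsets (in grid units) to absolute xyxy pixel coordinates.

    ltrb:      (k, 4) distances of the left/top/right/bottom edges from the anchor
    anchor_xy: (k, 2) grid-cell anchor points, i.e. (col + 0.5, row + 0.5)
    stride:    downsampling factor of this feature map (8, 16 or 32)
    """
    x1y1 = (anchor_xy - ltrb[:, 0:2]) * stride   # top-left corner
    x2y2 = (anchor_xy + ltrb[:, 2:4]) * stride   # bottom-right corner
    return np.concatenate([x1y1, x2y2], axis=1)

# One box on the stride-8 map: anchor at grid cell (10, 10), edges 2 cells away
box = ltrb_to_xyxy(np.array([[2.0, 2.0, 2.0, 2.0]]),
                   np.array([[10.5, 10.5]]), stride=8)
print(box)   # [[ 68.  68. 100. 100.]]
```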
The final detection results include category (id), score, and position (xyxy).
Instance Segmentation
- Mask Coefficients part, two GatherElements operations: used to obtain the Mask Coefficients of each final qualified Grid Cell, i.e., its 32 coefficients. These 32 coefficients are linearly combined with the Mask Protos part (equivalently, a weighted sum) to obtain the Mask of the target corresponding to that Grid Cell.
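The linear combination above can be sketched as follows; the function name, Sigmoid + 0.5 binarization, and prototype resolution are illustrative assumptions:

```python
import numpy as np

def assemble_mask(coeffs: np.ndarray, protos: np.ndarray, thresh: float = 0.5):
    """coeffs: (32,) Mask Coefficients of one qualified Grid Cell.
    protos: (32, H, W) prototypes from the Mask Protos branch.

    Returns a binary (H, W) mask.
    """
    mask = np.tensordot(coeffs, protos, axes=1)   # weighted sum of prototypes
    mask = 1.0 / (1.0 + np.exp(-mask))            # Sigmoid
    return mask > thresh

# Usage sketch with all-ones prototypes
mask = assemble_mask(np.full(32, 0.1), np.ones((32, 160, 160)))
print(mask.shape, mask.all())   # (160, 160) True
```

Because the coefficients are gathered only for the k qualified Grid Cells, this weighted sum runs k times rather than 8,400 times.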
Pose Estimation
The keypoints of Ultralytics YOLO Pose are based on object detection. The definition of kpt is as follows:
COCO_keypoint_indexes = {
0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
}
The object detection part of the Ultralytics YOLO Pose model is consistent with Ultralytics YOLO Detect, with an additional feature map of Channel = 51 corresponding to the 17 Key Points: for each point, the coordinates x and y in feature-map coordinates (i.e., before multiplying by the downsampling factor) and the score of that point.
Once the object detection part determines that the Key Points at a certain location meet the requirements, multiplying them by the downsampling factor of the corresponding feature layer yields the Key Point coordinates at the input-image scale.
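Following the description above, keypoint decoding can be sketched in NumPy. This is a simplified illustration (the function name and the 0.5 visibility threshold are assumptions, and the exact Ultralytics decode also accounts for anchor offsets):

```python
import numpy as np

def decode_keypoints(kpt_raw: np.ndarray, stride: int, kpt_thres: float = 0.5):
    """kpt_raw: (17, 3) raw (x, y, score) per keypoint for one qualified cell,
    with x and y expressed in feature-map coordinates.

    Returns (17, 2) input-image coordinates and a (17,) visibility mask.
    """
    xy = kpt_raw[:, :2] * stride                  # back to input-image scale
    vis = 1.0 / (1.0 + np.exp(-kpt_raw[:, 2]))    # Sigmoid on the point score
    return xy, vis > kpt_thres

# Usage sketch: a single confident "nose" point on the stride-16 feature map
kpt = np.zeros((17, 3))
kpt[0] = [2.0, 3.0, 5.0]
xy, vis = decode_keypoints(kpt, stride=16)
print(xy[0], vis[0])   # [32. 48.] True
```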
Environment Preparation and Model Training
Note: This operation is performed on an x86 machine. It is recommended to use a machine with hardware acceleration, such as a GPU supporting CUDA, where torch.cuda.is_available() is True. It is recommended to use Ubuntu 22.04 with a Python 3.10 environment.
Download the ultralytics/ultralytics repository and refer to the ultralytics official documentation to configure the environment.
git clone https://github.com/ultralytics/ultralytics.git
For model training, refer to the ultralytics official documentation, which is maintained by ultralytics and of very high quality. There are also numerous reference materials available online, making it not difficult to obtain a pre-trained model similar to the official one. Note that no program modifications are needed during training, and the forward method should not be modified.
Ultralytics YOLO Official Documentation:
- Quick Start: https://docs.ultralytics.com/quickstart/
- Model Training: https://docs.ultralytics.com/modes/train/
Model Export
Note: This operation is performed on an x86 machine. It is recommended to use Ubuntu 22.04 with a Python 3.10 environment.
Enter the local repository and download the pre-trained weights from the ultralytics official site. Here, we take the YOLO11n-Detect model as an example.
cd ultralytics
wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt
In the Ultralytics YOLO training environment, run the one-click YOLO export script provided by RDK Model Zoo https://github.com/D-Robotics/rdk_model_zoo/blob/main/demos/Vision/ultralytics_YOLO/x86/export_monkey_patch.py to export the model. This script uses the ultralytics.YOLO class to load the YOLO pt model, applies a monkey patch to replace the model at the PyTorch level, and then calls the ultralytics.YOLO.export method to export the model. The exported ONNX model will be saved in the same directory as the pt model.
python3 export_monkey_patch.py --pt yolo11n.pt
Model Compilation
Install the RDK X5 OpenExplore toolchain environment. Two installation methods are provided here:
- Docker Installation (Recommended)
RDK X5 OpenExplore version 1.2.8
docker pull openexplorer/ai_toolchain_ubuntu_20_x5_cpu:v1.2.8
Or obtain the offline version of the Docker image from the Digua Developer Community: https://forum.d-robotics.cc/t/topic/28035
- pip Installation of the Trimmed Toolchain (Alternative)
Note: This operation is performed on an x86 machine. It is recommended to use Ubuntu 22.04 with a Python 3.10 environment. Note that model conversion and compilation involve various optimization strategies and programs; do not install and run them on the device.
pip install rdkx5-yolo-mapper
If you encounter download failures from PyPI, you can use the Alibaba source to install:
pip install rdkx5-yolo-mapper -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
Common Issue: The following issue generally occurs when installing larger packages, such as torch, due to unstable network connections. In this case, simply re-run the installation command. Already installed packages will be automatically skipped and not reinstalled.
error: incomplete-download
× Download failed because not enough bytes were received (552.3 MB/594.3 MB)
╰─> URL: https://...
note: This is an issue with network connectivity, not pip.
hint: Consider using --resume-retries to enable download resumption
Run the hb_mapper command to verify successful installation:
$ hb_mapper --version
hb_mapper, version 1.24.3
Run the one-click YOLO conversion script provided by RDK Model Zoo https://github.com/D-Robotics/rdk_model_zoo/blob/main/demos/Vision/ultralytics_YOLO/x86/mapper.py in the OpenExplore toolchain environment.
For this script, you need to prepare calibration images and the ONNX model. The script then prepares the calibration data and the compilation yaml configuration file for you, and the converted bin model is saved in the same directory as the onnx model.
python3 mapper.py --onnx [*.onnx] --cal-images [cal images path]
This script exposes some common parameters, with default values already satisfying most requirements.
$ python3 mapper.py -h
usage: mapper.py [-h] [--cal-images CAL_IMAGES] [--onnx ONNX] [--quantized QUANTIZED] [--jobs JOBS] [--optimize-level OPTIMIZE_LEVEL]
[--cal-sample CAL_SAMPLE] [--cal-sample-num CAL_SAMPLE_NUM] [--save-cache SAVE_CACHE] [--cal CAL] [--ws WS]
options:
-h, --help show this help message and exit
--cal-images CAL_IMAGES *.jpg, *.png calibration images path, 20 ~ 50 pictures is OK.
--onnx ONNX origin float onnx model path.
--quantized QUANTIZED int8 first / int16 first
--jobs JOBS model combine jobs.
--optimize-level OPTIMIZE_LEVEL O0, O1, O2, O3
--cal-sample CAL_SAMPLE sample calibration data or not.
--cal-sample-num CAL_SAMPLE_NUM num of sample calibration data.
--save-cache SAVE_CACHE remove bpu output files or not.
--cal CAL calibration_data_temporary_folder
--ws WS temporary workspace
Model Deployment
Python Program Deployment
Note: This operation is performed on the board, using the board's global Python interpreter. Ensure you are using the latest RDK X5 system image and miniboot provided by Digua Developer Community.
Use the scripts in https://github.com/D-Robotics/rdk_model_zoo/tree/main/demos/Vision/ultralytics_YOLO/py. For the expected results, refer to the quick experience section of this document.
If you want to install this environment completely, you can refer to the following steps.
# Clone the RDK Model Zoo repository
git clone https://github.com/D-Robotics/rdk_model_zoo.git
# Make sure you are in this directory
cd rdk_model_zoo/demos/Vision/ultralytics_YOLO
# Create conda env (optional)
conda create -n rdkx5_yolo python=3.10
conda activate rdkx5_yolo
# Install requirements
pip install hobot_dnn_rdkx5 numpy==1.26.4 opencv-python scipy
# Using Alibaba PyPI Source. (optional)
pip install hobot_dnn_rdkx5 numpy==1.26.4 opencv-python scipy -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
Then the hobot_dnn_rdkx5 library can be used in this environment. Note that the library built into the system is named hobot_dnn, while the library installed from the PyPI source is named hobot_dnn_rdkx5; the two are used in exactly the same way. You can also install hobot_dnn_rdkx5 in the system's global Python interpreter to keep your usage habits consistent.
Contributors
- Cauchy @吴超
- SkyXZ @熊旗
- Marcelo @马超
- Suai @苏明超
References
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
YOLOv10: Real-Time End-to-End Object Detection
YOLOv12: Attention-Centric Real-Time Object Detectors
YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception







