Show HN: Dual YOLOv8n UAV Detection on RK3588S at 42 FPS Using NPU
35 points - today at 2:37 PM
The main trick is not the YOLO model itself, but the pipeline structure: MIPI capture through the ISP, resize/color conversion through RGA, and YOLOv8n inference through all 3 NPU cores with one RKNN context per core. With a 3-thread inference pool the pipeline goes from ~31 FPS to the OS08A10 camera's 46 FPS ceiling.
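A minimal Python sketch of the per-core pool idea, not the repo's actual code: each worker thread owns one context, frames carry a sequence number, and results are re-sorted afterwards. `FakeContext`, `worker`, and `run_pool` are hypothetical names, and `FakeContext.infer` only stands in for a real RKNN context pinned to one NPU core.

```python
import queue
import threading

NUM_CORES = 3  # the post uses all 3 NPU cores of the RK3588S


class FakeContext:
    """Hypothetical stand-in for one RKNN context bound to one NPU core."""

    def __init__(self, core_id):
        self.core_id = core_id

    def infer(self, frame):
        # A real context would run YOLOv8n here; this only tags the frame.
        return {"frame": frame, "core": self.core_id}


def worker(ctx, jobs, results):
    # Each thread owns its context, so inference itself needs no locking.
    while True:
        item = jobs.get()
        if item is None:  # sentinel: shut this worker down
            break
        seq, frame = item
        results.put((seq, ctx.infer(frame)))


def run_pool(frames):
    jobs = queue.Queue(maxsize=2 * NUM_CORES)  # bounded, so capture gets backpressure
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(FakeContext(i), jobs, results))
        for i in range(NUM_CORES)
    ]
    for t in threads:
        t.start()
    for seq, frame in enumerate(frames):
        jobs.put((seq, frame))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Workers finish out of order; sort by sequence number to restore frame order.
    collected = []
    while not results.empty():
        collected.append(results.get())
    return [result for _, result in sorted(collected, key=lambda pair: pair[0])]


if __name__ == "__main__":
    for result in run_pool(["frame-a", "frame-b", "frame-c", "frame-d"]):
        print(result["frame"], "ran on core", result["core"])
```

The reason this helps is plain arithmetic: at ~31 FPS a single context spends about 32 ms per frame, so three contexts running in parallel have enough headroom that the sensor's frame rate becomes the limit instead of the NPU.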
The memory footprint is also small: roughly 137–152 MB RSS for one 1080p stream, using a fixed preallocated buffer pool rather than per-frame allocations. Two streams are roughly 276–304 MB RSS.
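To illustrate the fixed-pool idea, here is a small Python sketch under stated assumptions: `BufferPool` is a hypothetical class, and the buffer count and the RGB888 frame size are illustrative choices rather than values from the repo. All buffers are allocated once at startup, and a free list hands out indices so nothing is allocated per frame.

```python
import queue

# Assumed for illustration: one 1080p RGB888 frame (1920 * 1080 * 3 bytes).
FRAME_BYTES = 1920 * 1080 * 3


class BufferPool:
    """Fixed set of preallocated buffers handed out by index."""

    def __init__(self, count, size=FRAME_BYTES):
        # Allocate every buffer up front; nothing is allocated per frame later.
        self._buffers = [bytearray(size) for _ in range(count)]
        self._free = queue.Queue()
        for index in range(count):
            self._free.put(index)

    def acquire(self, timeout=None):
        # Blocks while all buffers are in flight; raises queue.Empty on timeout.
        index = self._free.get(timeout=timeout)
        return index, memoryview(self._buffers[index])

    def release(self, index):
        # Return the buffer to the free list once the consumer is done with it.
        self._free.put(index)


if __name__ == "__main__":
    pool = BufferPool(count=4)
    index, view = pool.acquire()
    view[:3] = b"\x01\x02\x03"  # a capture stage would fill the frame here
    pool.release(index)
```

Because the pool size is fixed, memory use stays flat no matter how long the pipeline runs, and a stalled consumer shows up as a blocked `acquire` rather than as growing RSS.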
The repo also has a multi-process side of the pipeline: detections are published over Unix-domain sockets to tracking, temporal features, a presence FSM, and an optional Qwen2.5-0.5B summary step. For the LLM step, the camera pipeline can temporarily black out and later resume, so RKLLM gets the whole NPU.
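As a rough illustration of publishing detections between processes over a Unix-domain socket, the Python sketch below sends length-prefixed JSON messages. The wire format, the field names, and the helpers `send_msg` and `recv_msg` are assumptions made for the example; the post does not say how the repo encodes its messages.

```python
import json
import socket
import struct


def send_msg(sock, payload):
    # 4-byte big-endian length prefix, then the JSON body (assumed format).
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack("!I", len(body)) + body)


def _recv_exact(sock, count):
    # Stream sockets may return short reads, so loop until count bytes arrive.
    data = bytearray()
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        data.extend(chunk)
    return bytes(data)


def recv_msg(sock):
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length))


if __name__ == "__main__":
    # socketpair() gives two connected AF_UNIX endpoints without a filesystem path.
    publisher, subscriber = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    send_msg(publisher, {"frame": 1, "boxes": [[10, 20, 50, 60]], "cls": "uav"})
    print(recv_msg(subscriber))
    publisher.close()
    subscriber.close()
```

In a real multi-process setup the detector would bind a named socket path and each downstream stage (tracker, presence FSM, and so on) would connect to it; the framing logic stays the same.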
I split the work into three repos:
- runtime dual-stream YOLOv8n RK3588S pipeline: https://github.com/alebal123bal/khadas_yolov8n_multithread
- train/export/INT8 RKNN conversion for YOLOv8/YOLOv5: https://github.com/alebal123bal/RKNN_TRAIN_YOLO
- Qwen on RK3588S, via RKLLM/NPU or llama.cpp/CPU: https://github.com/alebal123bal/RKLLM_LLAMA_QWEN
The demo class is UAV/drone, but this is meant as a general edge-inference pipeline example, not an operational/surveillance/defense system.