SkiaSharp and ImageSharp Initial Comparison

This is the first in a series of posts from my session at the Agent Camp – Christchurch about using Open Neural Network Exchange(ONNX) for processing Moving Picture Experts Group (MPEG) video and Pulse Code Modulation(PCM) audio streams.

For processing video streams one of the first steps is extracting individual Joint Photographic Experts Group(JPEG) images from MPEG Real-Time Streaming Protocol(RTSP) stream. The jpeg images then have to transformed into an ONNX DenseTensor<float> in the correct format for the Ultralytics Yolo26 model. These image processing posts will use Ultralytics Yolo26 standard Small object detection model which has an input image size of 640*640pixels.

I have used both the YoloSharp and YoloDotNet libraries (Thank you Niklas Swärd and dme-compunet I appreciate the amount of effort you have put in). Both these libraries have support for object detection, instance segmentation, oriented bounding boxes detection(OBB), classification and pose estimation. They both have support for different versions, video stream processing, plotting minimum bounding boxes, Non-Maximum Suppression(NMS) for earlier models like YOLOv8 or YOLO11. I just need object detection (none of the other model types, plotting minimum boxes etc.) to work as fast as possible on my Seeedstudio EdgeBox RPi 200.

First step, was to use Benchmark.Net compare the performance of Six Labors ImageSharp (used by YoloSharp) and SkiaSharp (used by YoloDotNet). Six Labors ImageSharp  is a high-performance, fully managed, 2D graphics API whereas SkiaSharp is a wrapper for Google’s Skia 2D Graphics Library.

ImageSharp Benchmark
SkiaSharp Benchmark

The initial comparison running on my development box (will benchmark on my Seeedstudio EdgeBox RPi 200.) was roughly what I was expecting though the SkaiSharp 2560×1440 mean duration was a bit odd. I think that the difference in the amount of memory allocated is because SkaiSharp’s memory is allocated by the native code. Both benchmarks need some refactoring to improve repeatability on my different platforms.

These benchmarks should be treated as indicative not authoritative 

NickSwardh NuGet Nvidia Jetson Orin Nano™ GPU CUDA Inferencing

My Seeedstudio reComputer J3011 has two processors an ARM64 CPU and an Nividia Jetson Orin 8G coprocessor. YoloDotNet by NickSwardh V2 (uses SkiaSharp) was significantly faster when run on the ARM64 CPU so I wanted to try inferencing with the Nividia Jetson Orin 8G coprocessor.

Performance of YoloDotNet by NickSwardh V2 running on the ARM64 CPU

Performance of YoloDotNet by NickSwardh V2 running on the Nividia Jetson Orin 8G with Compute Unified Device Architecture (CUDA) enabled.

Enabling CUDA reduced the total image scaling, pre-processing, inferencing, and post processing time from 115mSec to 36mSec which is a significant improvement.

YoloV8-NuGet Performance ARM64 CPU

To see how the dme-compunet, updated YoloDotNet and sstainba NuGets performed on an ARM64 CPU I built a test rig for the different NuGets using standard images and ONNX Models.

I started with the dme-compunet YoloV8 NuGet which found all the tennis balls and the results were consistent with earlier tests.

The YoloDotNet by NickSwardh NuGet update had some “breaking changes” so I built “old” and “updated” test harnesses.

The YoloDotNet by NickSwardh V1 and V2 results were slightly different. The V2 NuGet uses SkiaSharp which appears to significantly improve the performance.

Even though the YoloV8 by sstainba NuGet hadn’t been updated I ran the test harness just in case

The dme-compunet YoloV8 and NickSwardh YoloDotNet V1 versions produced the same results, but the NickSwardh YoloDotNet V2 results were slightly different.

  • dme-Compunet 291 mSec
  • NickSwardV1 480 mSec
  • NickSwardV2 115 mSecs
  • SStainba 422 mSec

Like in the YoloV8-NuGet Performance X64 CPU post the NickSwardV2 implementation which uses SkiaSharp was significantly faster so it looks like Sixlabors.ImageSharp is the issue.

To support Compute Unified Device Architecture (CUDA) or TensorRT inferencing with NickSwardV2(for SkiaSharp) will need some major modifications to the code so it might be better to build my own YoloV8 Nuget.