YoloV8 NuGet on a diet – Part 1

Recently most of my YoloV8 projects inference on edge devices or in Azure and do not use Graphics Processing Unit(GPU) hardware. Most of my projects use the dme-compunet YoloV8 NuGet and for compatibility (Removal of extended CUDA & TensorRT configuration functionality) reasons this post uses version 4.2 of the source.

The initial version of the YoloV8.dll in the version 4.2 of the NuGet was 96.5KB. Most of my applications deployed to edge devices and Azure do not require plotting functionality so I started by commenting out (not terribly subtle).

The intermediate version of the YoloV8.dll in the version 4.2 of the NuGet on a “diet” with the plotting functionality “commented out” also meant the SixLabors.ImageSharp.Drawing NuGet could be removed.

The final release version of the YoloV8.dll in the version 4.2 of the NuGet was 64.0 KB. The main purpose of this process was to remove any unnecessary functionality to see how hard it would be to replace the SixLabors.ImageSharp and SixLabors.ImageSharp.Drawing with libraries that have better performance.

YoloV8-NuGet Performance ARM64 CPU

To see how the dme-compunet, updated YoloDotNet and sstainba NuGets performed on an ARM64 CPU I built a test rig for the different NuGets using standard images and ONNX Models.

I started with the dme-compunet YoloV8 NuGet which found all the tennis balls and the results were consistent with earlier tests.

The YoloDotNet by NickSwardh NuGet update had some “breaking changes” so I built “old” and “updated” test harnesses.

The YoloDotNet by NickSwardh V1 and V2 results were slightly different. The V2 NuGet uses SkiaSharp which appears to significantly improve the performance.

Even though the YoloV8 by sstainba NuGet hadn’t been updated I ran the test harness just in case

The dme-compunet YoloV8 and NickSwardh YoloDotNet V1 versions produced the same results, but the NickSwardh YoloDotNet V2 results were slightly different.

  • dme-Compunet 291 mSec
  • NickSwardV1 480 mSec
  • NickSwardV2 115 mSecs
  • SStainba 422 mSec

Like in the YoloV8-NuGet Performance X64 CPU post the NickSwardV2 implementation which uses SkiaSharp was significantly faster so it looks like Sixlabors.ImageSharp is the issue.

To support Compute Unified Device Architecture (CUDA) or TensorRT inferencing with NickSwardV2(for SkiaSharp) will need some major modifications to the code so it might be better to build my own YoloV8 Nuget.

YoloV8-NuGet Performance X64 CPU

When checking the dme-compunet, YoloDotNet, and sstainba and NuGets I noticed YoloDotNet readme.md detailed some performance enhancements…

What’s new in YoloDotNet v2.0?

YoloDotNet 2.0 is a Speed Demon release where the main focus has been on supercharging performance to bring you the fastest and most efficient version yet. With major code optimizations, a switch to SkiaSharp for lightning-fast image processing, and added support for Yolov10 as a little extra 😉 this release is set to redefine your YoloDotNet experience:

Changing the implementation to use SkiaSharp caught my attention because in previous testing manipulating images with the Sixlabors.ImageSharp library took longer than expected.

I built a test rig for comparing the performance of the different NuGets using standard images and ONNX Models.

I started with the dme-compunet YoloV8 NuGet which found all the tennis balls and the results were consistent with earlier tests.

dme-compunet test harness image bounding boxes

The YoloDotNet by NickSwardh NuGet update had some “breaking changes” so I built “old” and “updated” test harnesses. The V1 version found all the tennis balls and the results were consistent with earlier tests.

NickSwardh V1 test harness image bounding boxes

The YoloDotNet by NickSwardh NuGet update had some “breaking changes” so there were some code changes but the V1 and V2 results were slightly different.

NickSwardh V2 test harness image bounding boxes

Even though the YoloV8 by sstainba NuGet hadn’t been updated I ran the test harness just in case and the results were consistent with previous tests.

sstainba test harness image bounding boxes

The dme-compunet YoloV8 and NickSwardh YoloDotNet V1 versions produce the same results, but the NickSwardh YoloDotNet V2 results were slightly different. The YoloV8 by sstainba results were unchanged.

  • dme-Compunet 71 mSec
  • NickSwardV1 76 mSec
  • NickSwardV2 33 mSecs
  • SStainba 82mSec

The NickSwardV2 implementation was significantly faster, but I need to investigate the slight difference in the bounding boxes. It looks like Sixlabors.ImageSharp might be the issue.

Azure Event Grid YoloV8- Basic MQTT Client Pose Estimation

The Azure.EventGrid.Image.YoloV8.Pose application downloads images from a security camera, processes them with the default YoloV8(by Ultralytics) Pose Estimation model then publishes the results to an Azure Event Grid MQTT broker topic.

private async void ImageUpdateTimerCallback(object? state)
{
   DateTime requestAtUtc = DateTime.UtcNow;

   // Just incase - stop code being called while photo or prediction already in progress
   if (_ImageProcessing)
   {
      return;
   }
   _ImageProcessing = true;

   try
   {
      _logger.LogDebug("Camera request start");

      PoseResult result;

      using (Stream cameraStream = await _httpClient.GetStreamAsync(_applicationSettings.CameraUrl))
      {
         result = await _predictor.PoseAsync(cameraStream);
      }

      _logger.LogInformation("Speed Preprocess:{Preprocess} Postprocess:{Postprocess}", result.Speed.Preprocess, result.Speed.Postprocess);


      if (_logger.IsEnabled(LogLevel.Debug))
      {
         _logger.LogDebug("Pose results");

         foreach (var box in result.Boxes)
         {
            _logger.LogDebug(" Class:{box.Class} Confidence:{Confidence:f1}% X:{X} Y:{Y} Width:{Width} Height:{Height}", box.Class.Name, box.Confidence * 100.0, box.Bounds.X, box.Bounds.Y, box.Bounds.Width, box.Bounds.Height);

            foreach (var keypoint in box.Keypoints)
            {
               Model.PoseMarker poseMarker = (Model.PoseMarker)keypoint.Index;

               _logger.LogDebug("  Class:{Class} Confidence:{Confidence:f1}% X:{X} Y:{Y}", Enum.GetName(poseMarker), keypoint.Confidence * 100.0, keypoint.Point.X, keypoint.Point.Y);
            }
         }
      }

      var message = new MQTT5PublishMessage
      {
         Topic = string.Format(_applicationSettings.PublishTopic, _applicationSettings.UserName),
         Payload = Encoding.ASCII.GetBytes(JsonSerializer.Serialize(new
         {
            result.Boxes
         })),
         QoS = _applicationSettings.PublishQualityOfService,
      };

      _logger.LogDebug("HiveMQ.Publish start");

      var resultPublish = await _mqttclient.PublishAsync(message);

      _logger.LogDebug("HiveMQ.Publish done");
   }
   catch (Exception ex)
   {
      _logger.LogError(ex, "Camera image download, processing, or telemetry failed");
   }
   finally
   {
      _ImageProcessing = false;
   }

   TimeSpan duration = DateTime.UtcNow - requestAtUtc;

   _logger.LogDebug("Camera Image download, processing and telemetry done {TotalSeconds:f2} sec", duration.TotalSeconds);
}

The application uses a Timer(with configurable Due and Period times) to poll the security camera, detect objects in the image then publish a JavaScript Object Notation(JSON) representation of the results to Azure Event Grid MQTT broker topic using a HiveMQ client.

Utralytics Pose Model input image

The Unv ADZK-10 camera used in this sample has a Hypertext Transfer Protocol (HTTP) Uniform Resource Locator(URL) for downloading the current image. Like the YoloV8.Detect.SecurityCamera.Stream sample the image “streamed” using the HttpClient.GetStreamAsync to the YoloV8 PoseAsync method.

Azure.EventGrid.Image.YoloV8.Pose application console output

The same approach as the YoloV8.Detect.SecurityCamera.Stream sample is used because the image doesn’t have to be saved on the local filesystem.

Utralytics Pose Model marked-up image

To check the results, I put a breakpoint in the timer just after PoseAsync method is called and then used the Visual Studio 2022 Debugger QuickWatch functionality to inspect the contents of the PoseResult object.

Visual Studio 2022 Debugger PoseResult Quickwatch

For testing I configured a single Azure Event Grid custom topic subscription an Azure Storage Queue.

Azure Event Grid Topic Metrics

An Azure Storage Queue is an easy way to store messages while debugging/testing an application.

Azure Storage Explorer messages list

Azure Storage Explorer is a good tool for listing recent messages, then inspecting their payloads.

Azure Storage Explorer Message Details

The Azure Event Grid custom topic message text(in data_base64) contains the JavaScript Object Notation(JSON) of the pose detection result.

{"Boxes":[{"Keypoints":[{"Index":0,"Point":{"X":744,"Y":58,"IsEmpty":false},"Confidence":0.6334442},{"Index":1,"Point":{"X":746,"Y":33,"IsEmpty":false},"Confidence":0.759928},{"Index":2,"Point":{"X":739,"Y":46,"IsEmpty":false},"Confidence":0.19036674},{"Index":3,"Point":{"X":784,"Y":8,"IsEmpty":false},"Confidence":0.8745915},{"Index":4,"Point":{"X":766,"Y":45,"IsEmpty":false},"Confidence":0.086735755},{"Index":5,"Point":{"X":852,"Y":50,"IsEmpty":false},"Confidence":0.9166329},{"Index":6,"Point":{"X":837,"Y":121,"IsEmpty":false},"Confidence":0.85815763},{"Index":7,"Point":{"X":888,"Y":31,"IsEmpty":false},"Confidence":0.6234426},{"Index":8,"Point":{"X":871,"Y":205,"IsEmpty":false},"Confidence":0.37670398},{"Index":9,"Point":{"X":799,"Y":21,"IsEmpty":false},"Confidence":0.3686208},{"Index":10,"Point":{"X":768,"Y":205,"IsEmpty":false},"Confidence":0.21734264},{"Index":11,"Point":{"X":912,"Y":364,"IsEmpty":false},"Confidence":0.98523325},{"Index":12,"Point":{"X":896,"Y":382,"IsEmpty":false},"Confidence":0.98377174},{"Index":13,"Point":{"X":888,"Y":637,"IsEmpty":false},"Confidence":0.985927},{"Index":14,"Point":{"X":849,"Y":645,"IsEmpty":false},"Confidence":0.9834709},{"Index":15,"Point":{"X":951,"Y":909,"IsEmpty":false},"Confidence":0.96191007},{"Index":16,"Point":{"X":921,"Y":894,"IsEmpty":false},"Confidence":0.9618156}],"Class":{"Id":0,"Name":"person"},"Bounds":{"X":690,"Y":3,"Width":315,"Height":1001,"Location":{"X":690,"Y":3,"IsEmpty":false},"Size":{"Width":315,"Height":1001,"IsEmpty":false},"IsEmpty":false,"Top":3,"Right":1005,"Bottom":1004,"Left":690},"Confidence":0.8341071}]}

Azure Event Grid YoloV8- Basic MQTT Client Object Detection

The Azure.EventGrid.Image.Detect application downloads images from a security camera, processes them with the default YoloV8(by Ultralytics) object detection model, then publishes the results to an Azure Event Grid MQTT broker topic.

The Unv ADZK-10 camera used in this sample has a Hypertext Transfer Protocol (HTTP) Uniform Resource Locator(URL) for downloading the current image. Like the YoloV8.Detect.SecurityCamera.Stream sample the image “streamed” using the HttpClient.GetStreamAsync to the YoloV8 DetectAsync method.

private async void ImageUpdateTimerCallback(object? state)
{
   DateTime requestAtUtc = DateTime.UtcNow;

   // Just incase - stop code being called while photo or prediction already in progress
   if (_ImageProcessing)
   {
      return;
   }
   _ImageProcessing = true;

   try
   {
      _logger.LogDebug("Camera request start");

      DetectionResult result;

      using (Stream cameraStream = await _httpClient.GetStreamAsync(_applicationSettings.CameraUrl))
      {
         result = await _predictor.DetectAsync(cameraStream);
      }

      _logger.LogInformation("Speed Preprocess:{Preprocess} Postprocess:{Postprocess}", result.Speed.Preprocess, result.Speed.Postprocess);

      if (_logger.IsEnabled(LogLevel.Debug))
      {
         _logger.LogDebug("Detection results");

         foreach (var box in result.Boxes)
         {
            _logger.LogDebug(" Class {box.Class} {Confidence:f1}% X:{box.Bounds.X} Y:{box.Bounds.Y} Width:{box.Bounds.Width} Height:{box.Bounds.Height}", box.Class, box.Confidence * 100.0, box.Bounds.X, box.Bounds.Y, box.Bounds.Width, box.Bounds.Height);
         }
      }

      var message = new MQTT5PublishMessage
      {
         Topic = string.Format(_applicationSettings.PublishTopic, _applicationSettings.UserName),
         Payload = Encoding.ASCII.GetBytes(JsonSerializer.Serialize(new
         {
            result.Boxes,
         })),
         QoS = _applicationSettings.PublishQualityOfService,
      };

      _logger.LogDebug("HiveMQ.Publish start");

      var resultPublish = await _mqttclient.PublishAsync(message);

      _logger.LogDebug("HiveMQ.Publish done");
   }
   catch (Exception ex)
   {
      _logger.LogError(ex, "Camera image download, processing, or telemetry failed");
   }
   finally
   {
      _ImageProcessing = false;
   }

   TimeSpan duration = DateTime.UtcNow - requestAtUtc;

   _logger.LogDebug("Camera Image download, processing and telemetry done {TotalSeconds:f2} sec", duration.TotalSeconds);
}

The application uses a Timer(with configurable Due and Period times) to poll the security camera, detect objects in the image then publish a JavaScript Object Notation(JSON) representation of the results to Azure Event Grid MQTT broker topic using a HiveMQ client.

Console application displaying object detection results

The uses the Microsoft.Extensions.Logging library to publish diagnostic information to the console while debugging the application.

Visual Studio 2022 QuickWatch displaying object detection results.

To check the results I put a breakpoint in the timer just after DetectAsync method is called and then used the Visual Studio 2022 Debugger QuickWatch functionality to inspect the contents of the DetectionResult object.

Visual Studio 2022 JSON Visualiser displaying object detection results.

To check the JSON payload of the MQTT message I put a breakpoint just before the HiveMQ PublishAsync method. I then inspected the payload using the Visual Studio 2022 JSON Visualizer.

Security Camera image for object detection photo bombed by Yarnold our Standard Apricot Poodle.

This application can also be deployed as a Linux systemd Service so it will start then run in the background. The same approach as the YoloV8.Detect.SecurityCamera.Stream sample is used because the image doesn’t have to be saved on the local filesystem.

YoloV8-File, Stream, & Byte array Camera Images

After building some proof-of-concept applications I have decided to use the YoloV8 by dme-compunet NuGet because it supports async await and code with async await is always better (yeah right).

The YoloV8.Detect.SecurityCamera.File sample downloads images from the security camera to the local file system, then calls DetectAsync with the local file path.

private static async void ImageUpdateTimerCallback(object state)
{
   //...
   try
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV8 Security Camera Image File processing start");

      using (Stream cameraStream = await _httpClient.GetStreamAsync(_applicationSettings.CameraUrl))
      using (Stream fileStream = System.IO.File.Create(_applicationSettings.ImageFilepath))
      {
         await cameraStream.CopyToAsync(fileStream);
      }

      DetectionResult result = await _predictor.DetectAsync(_applicationSettings.ImageFilepath);

      Console.WriteLine($"Speed: {result.Speed}");

      foreach (var prediction in result.Boxes)
      {
         Console.WriteLine($" Class {prediction.Class} {(prediction.Confidence * 100.0):f1}% X:{prediction.Bounds.X} Y:{prediction.Bounds.Y} Width:{prediction.Bounds.Width} Height:{prediction.Bounds.Height}");
      }

      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV8 Security Camera Image processing done");
   }
   catch (Exception ex)
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} YoloV8 Security camera image download or YoloV8 prediction failed {ex.Message}");
   }
//...
}
Console application using camera image saved on filesystem

The YoloV8.Detect.SecurityCamera.Bytes sample downloads images from the security camera as an array of bytes then calls DetectAsync.

private static async void ImageUpdateTimerCallback(object state)
{
   //...
   try
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV8 Security Camera Image Bytes processing start");

      byte[] bytes = await _httpClient.GetByteArrayAsync(_applicationSettings.CameraUrl);

      DetectionResult result = await _predictor.DetectAsync(bytes);

      Console.WriteLine($"Speed: {result.Speed}");

      foreach (var prediction in result.Boxes)
      {
         Console.WriteLine($" Class {prediction.Class} {(prediction.Confidence * 100.0):f1}% X:{prediction.Bounds.X} Y:{prediction.Bounds.Y} Width:{prediction.Bounds.Width} Height:{prediction.Bounds.Height}");
      }

      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV8 Security Camera Image processing done");
   }
   catch (Exception ex)
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} YoloV8 Security camera image download or YoloV8 prediction failed {ex.Message}");
   }
//...
}
Console application downloading camera image as an array bytes.

The YoloV8.Detect.SecurityCamera.Stream sample “streams” the image from the security camera to DetectAsync.

private static async void ImageUpdateTimerCallback(object state)
{
   // ...
   try
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV8 Security Camera Image Stream processing start");

      DetectionResult result;

      using (System.IO.Stream cameraStream = await _httpClient.GetStreamAsync(_applicationSettings.CameraUrl))
      {
         result = await _predictor.DetectAsync(cameraStream);
      }

      Console.WriteLine($"Speed: {result.Speed}");

      foreach (var prediction in result.Boxes)
      {
         Console.WriteLine($" Class {prediction.Class} {(prediction.Confidence * 100.0):f1}% X:{prediction.Bounds.X} Y:{prediction.Bounds.Y} Width:{prediction.Bounds.Width} Height:{prediction.Bounds.Height}");
      }

      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV8 Security Camera Image processing done");
   }
   catch (Exception ex)
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} YoloV8 Security camera image download or YoloV8 prediction failed {ex.Message}");
   }
//...
}
Console application streaming camera image.

The ImageSelector parameter of DetectAsync caught my attention as I hadn’t seen this approach used before. The developers who wrote the NuGet package are definitely smarter than me so I figured I might learn something useful digging deeper.

My sample object detection applications all call

public static async Task<DetectionResult> DetectAsync(this YoloV8 predictor, ImageSelector selector)
{
    return await Task.Run(() => predictor.Detect(selector));
}

Which then invokes

public static DetectionResult Detect(this YoloV8 predictor, ImageSelector selector)
{
    predictor.ValidateTask(YoloV8Task.Detect);

    return predictor.Run(selector, (outputs, image, timer) =>
    {
        var output = outputs[0].AsTensor<float>();

        var parser = new DetectionOutputParser(predictor.Metadata, predictor.Parameters);

        var boxes = parser.Parse(output, image);
        var speed = timer.Stop();

        return new DetectionResult
        {
            Boxes = boxes,
            Image = image,
            Speed = speed,
        };
    });

    public TResult Run<TResult>(ImageSelector selector, PostprocessContext<TResult> postprocess) where TResult : YoloV8Result
    {
        using var image = selector.Load(true);

        var originSize = image.Size;

        var timer = new SpeedTimer();

        timer.StartPreprocess();

        var input = Preprocess(image);

        var inputs = MapNamedOnnxValues([input]);

        timer.StartInference();

        using var outputs = Infer(inputs);

        var list = new List<NamedOnnxValue>(outputs);

        timer.StartPostprocess();

        return postprocess(list, originSize, timer);
    }
}

It looks like most of the image loading magic of ImageSelector class is implemented using the SixLabors library…

public class ImageSelector<TPixel> where TPixel : unmanaged, IPixel<TPixel>
{
    private readonly Func<Image<TPixel>> _factory;

    public ImageSelector(Image image)
    {
        _factory = image.CloneAs<TPixel>;
    }

    public ImageSelector(string path)
    {
        _factory = () => Image.Load<TPixel>(path);
    }

    public ImageSelector(byte[] data)
    {
        _factory = () => Image.Load<TPixel>(data);
    }

    public ImageSelector(Stream stream)
    {
        _factory = () => Image.Load<TPixel>(stream);
    }

    internal Image<TPixel> Load(bool autoOrient)
    {
        var image = _factory();

        if (autoOrient)
            image.Mutate(x => x.AutoOrient());

        return image;
    }

    public static implicit operator ImageSelector<TPixel>(Image image) => new(image);
    public static implicit operator ImageSelector<TPixel>(string path) => new(path);
    public static implicit operator ImageSelector<TPixel>(byte[] data) => new(data);
    public static implicit operator ImageSelector<TPixel>(Stream stream) => new(stream);
}

Learnt something new must be careful to apply it only where it adds value.

YoloV8-One of these NuGets is not like the others

A couple of days after my initial testing the YoloV8 by dme-compunet NuGet was updated so I reran my test harnesses.

Then the YoloDotNet by NichSwardh NuGet was also updated so I reran my all my test harnesses again.

Even though the YoloV8 by sstainba NuGet hadn’t been updated I ran the test harness just incase.

The dme-compunet YoloV8 and NickSwardh YoloDotNet NuGets results are now the same (bar the 30% cutoff) and YoloV8 by sstainba results have not changed.

YoloV8-All of these NuGets are not like the others

As part investigating which YoloV8 NuGet to use, I built three trial applications using dme-compunet YoloV8, sstainba Yolov8.Net, and NickSwardh YoloDotNet NuGets.

All of the implementations load the model, load the sample image, detect objects in the image, then markup the image with the classification, minimum bounding boxes, and confidences of each object.

Input Image

The first implementation uses YoloV8 by dme-compunet which supports asynchronous operation. The image is loaded asynchronously, the prediction is asynchronous, then marked up and saved asynchronously.

using (var predictor = new Compunet.YoloV8.YoloV8(_applicationSettings.ModelPath))
{
   Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model load done");
   Console.WriteLine();

   using (var image = await SixLabors.ImageSharp.Image.LoadAsync<Rgba32>(_applicationSettings.ImageInputPath))
   {
      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect start");

      var predictions = await predictor.DetectAsync(image);

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect done");
      Console.WriteLine();

      Console.WriteLine($" Speed: {predictions.Speed}");

      foreach (var prediction in predictions.Boxes)
      {
         Console.WriteLine($"  Class {prediction.Class} {(prediction.Confidence * 100.0):f1}% X:{prediction.Bounds.X} Y:{prediction.Bounds.Y} Width:{prediction.Bounds.Width} Height:{prediction.Bounds.Height}");
      }
      Console.WriteLine();

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Plot and save : {_applicationSettings.ImageOutputPath}");

      SixLabors.ImageSharp.Image imageOutput = await predictions.PlotImageAsync(image);

      await imageOutput.SaveAsJpegAsync(_applicationSettings.ImageOutputPath);
   }
}
dme-compunet YoloV8 test application output

The second implementation uses YoloDotNet by NichSwardh which partially supports asynchronous operation. The image is loaded asynchronously, the prediction is synchronous, the markup is synchronous, and then saved asynchronously.

using (var predictor = new Yolo(_applicationSettings.ModelPath, false))
{
   Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model load done");
   Console.WriteLine();

   using (var image = await SixLabors.ImageSharp.Image.LoadAsync<Rgba32>(_applicationSettings.ImageInputPath))
   {
      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect start");

      var predictions = predictor.RunObjectDetection(image);

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect done");
      Console.WriteLine();

      foreach (var predicition in predictions)
      {
         Console.WriteLine($"  Class {predicition.Label.Name} {(predicition.Confidence * 100.0):f1}% X:{predicition.BoundingBox.Left} Y:{predicition.BoundingBox.Y} Width:{predicition.BoundingBox.Width} Height:{predicition.BoundingBox.Height}");
      }
      Console.WriteLine();

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Plot and save : {_applicationSettings.ImageOutputPath}");

      image.Draw(predictions);

      await image.SaveAsJpegAsync(_applicationSettings.ImageOutputPath);
   }
}
nickswardth YoloDotNet test application output

The third implementation uses YoloV8 by sstainba which partially supports asynchronous operation. The image is loaded asynchronously, the prediction is synchronous, the markup is synchronous, and then saved asynchronously.

using (var predictor = YoloV8Predictor.Create(_applicationSettings.ModelPath))
{
   Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model load done");
   Console.WriteLine();

   using (var image = await SixLabors.ImageSharp.Image.LoadAsync<Rgba32>(_applicationSettings.ImageInputPath))
   {
      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect start");

      var predictions = predictor.Predict(image);

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect done");
      Console.WriteLine();

      foreach (var prediction in predictions)
      {
         Console.WriteLine($"  Class {prediction.Label.Name} {(prediction.Score * 100.0):f1}% X:{prediction.Rectangle.X} Y:{prediction.Rectangle.Y} Width:{prediction.Rectangle.Width} Height:{prediction.Rectangle.Height}");
      }

      Console.WriteLine();

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Plot and save : {_applicationSettings.ImageOutputPath}");

      // This is a bit hacky should be fixed up in future release
      Font font = new Font(SystemFonts.Get(_applicationSettings.FontName), _applicationSettings.FontSize);
      foreach (var prediction in predictions)
      {
         var x = (int)Math.Max(prediction.Rectangle.X, 0);
         var y = (int)Math.Max(prediction.Rectangle.Y, 0);
         var width = (int)Math.Min(image.Width - x, prediction.Rectangle.Width);
         var height = (int)Math.Min(image.Height - y, prediction.Rectangle.Height);

         //Note that the output is already scaled to the original image height and width.

         // Bounding Box Text
         string text = $"{prediction.Label.Name} [{prediction.Score}]";
         var size = TextMeasurer.MeasureSize(text, new TextOptions(font));

         image.Mutate(d => d.Draw(Pens.Solid(Color.Yellow, 2), new Rectangle(x, y, width, height)));

         image.Mutate(d => d.DrawText(text, font, Color.Yellow, new Point(x, (int)(y - size.Height - 1))));
      }

      await image.SaveAsJpegAsync(_applicationSettings.ImageOutputPath);
   }
}
sstainba YoloV8 test application output

I don’t understand why the three NuGets produced different results which is worrying.

ML.Net YoloV5 + Security Camera Revisited

This post is about “revisiting” my ML.Net YoloV5 + Camera on ARM64 Raspberry PI application, updating it to .NET 6, the latest version of the TechWings yolov5-net (library formerly from mentalstack) and the latest version of the ML.Net Open Neural Network Exchange(ONNX) libraries.

Visual Studio 2022 with updated NuGet packages

The updated TechWings yolov5-net library now uses Six Labors ImageSharp for markup rather than System.Drawing.Common. (I found System.Drawing.Common a massive Pain in the Arse (PiTA))

private static async void ImageUpdateTimerCallback(object state)
{
   DateTime requestAtUtc = DateTime.UtcNow;

   // Just incase - stop code being called while photo already in progress
   if (_cameraBusy)
   {
      return;
   }
   _cameraBusy = true;

   Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} Image processing start");

   try
   {
#if SECURITY_CAMERA
      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} Security Camera Image download start");

      using (Stream cameraStream = await _httpClient.GetStreamAsync(_applicationSettings.CameraUrl))
      using (Stream fileStream = File.Create(_applicationSettings.ImageInputFilenameLocal))
      {
         await cameraStream.CopyToAsync(fileStream);
      }

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} Security Camera Image download done");
#endif

      List<YoloPrediction> predictions;

      // Process the image on local file system
      using (Image<Rgba32> image = await Image.LoadAsync<Rgba32>(_applicationSettings.ImageInputFilenameLocal))
      {
         Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV5 inferencing start");
         predictions = _scorer.Predict(image);
         Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV5 inferencing done");

#if OUTPUT_IMAGE_MARKUP
         Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} Image markup start");

         var font = new Font(new FontCollection().Add(_applicationSettings.ImageOutputMarkupFontPath), _applicationSettings.ImageOutputMarkupFontSize);

         foreach (var prediction in predictions) // iterate predictions to draw results
         {
            double score = Math.Round(prediction.Score, 2);

            var (x, y) = (prediction.Rectangle.Left - 3, prediction.Rectangle.Top - 23);

            image.Mutate(a => a.DrawPolygon(Pens.Solid(prediction.Label.Color, 1),
                  new PointF(prediction.Rectangle.Left, prediction.Rectangle.Top),
                  new PointF(prediction.Rectangle.Right, prediction.Rectangle.Top),
                  new PointF(prediction.Rectangle.Right, prediction.Rectangle.Bottom),
                  new PointF(prediction.Rectangle.Left, prediction.Rectangle.Bottom)
            ));

            image.Mutate(a => a.DrawText($"{prediction.Label.Name} ({score})",
                  font, prediction.Label.Color, new PointF(x, y)));
         }

         await image.SaveAsJpegAsync(_applicationSettings.ImageOutputFilenameLocal);

         Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} Image markup done");
#endif
      }


#if PREDICTION_CLASSES
      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} Image classes start");
      foreach (var prediction in predictions)
      {
         Console.WriteLine($"  Name:{prediction.Label.Name} Score:{prediction.Score:f2} Valid:{prediction.Score > _applicationSettings.PredictionScoreThreshold}");
      }
      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} Image classes done");
#endif

#if PREDICTION_CLASSES_OF_INTEREST
      IEnumerable<string> predictionsOfInterest = predictions.Where(p => p.Score > _applicationSettings.PredictionScoreThreshold).Select(c => c.Label.Name).Intersect(_applicationSettings.PredictionLabelsOfInterest, StringComparer.OrdinalIgnoreCase);

      if (predictionsOfInterest.Any())
      {
         Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss} Camera image comtains {String.Join(",", predictionsOfInterest)}");
      }

#endif
   }
   catch (Exception ex)
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} Camera image download, upload or post procesing failed {ex.Message}");
   }
   finally
   {
      _cameraBusy = false;
   }

   TimeSpan duration = DateTime.UtcNow - requestAtUtc;

   Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} Image processing done {duration.TotalSeconds:f2} sec");
   Console.WriteLine();
}

The names of the input image, output image and yoloV5 model file are configured in the appsettings.json (on device) or secrets.json (Visual Studio 2022 desktop) file. The location (ImageOutputMarkupFontPath) and size (ImageOutputMarkupFontSize) of the font used are configurable to make it easier run the application on different devices and operating systems.

{
   "ApplicationSettings": {
      "ImageTimerDue": "0.00:00:15",
      "ImageTimerPeriod": "0.00:00:30",

      "CameraUrl": "HTTP://10.0.0.56:85/images/snapshot.jpg",
      "CameraUserName": "",
      "CameraUserPassword": "",

      "ImageInputFilenameLocal": "InputLatest.jpg",
      "ImageOutputFilenameLocal": "OutputLatest.jpg",

      "ImageOutputMarkupFontPath": "C:/Windows/Fonts/consola.ttf",
      "ImageOutputMarkupFontSize": 16,

      "YoloV5ModelPath": "YoloV5/yolov5s.onnx",

      "PredictionScoreThreshold": 0.5,

      "PredictionLabelsOfInterest": [
         "bicycle",
         "person",
         "bench"
      ]
   }
}

The test-rig consisted of a Unv ADZK-10 Security Camera, Power over Ethernet(PoE) module and my development desktop PC.

My bicycle and “mother in laws” car in backyard
YoloV5ObjectDetectionCamera running on my desktop PC

Once the YoloV5s model was loaded, inferencing was taking roughly 0.47 seconds.

Marked up image of my bicycle and “mother in laws” car in backyard

Summary

Again, I was “standing on the shoulders of giants” the TechWings code just worked. With a pretrained yoloV5 model, the ML.Net Open Neural Network Exchange(ONNX) plumbing it took a couple of hours to update the application. Most of this time was learning about the Six Labors ImageSharp library to mark up the images.