YoloV8 ONNX – Nvidia Jetson Orin Nano™ DenseTensor Performance

When running the YoloV8 Coprocessor demonstration on the Nvidia Jetson Orin, inferencing looked a bit odd; the dotted line wasn't moving as fast as expected. To investigate further, I split the inferencing duration into pre-processing, inferencing and post-processing times. Inferencing and post-processing were "quick", but pre-processing was taking longer than expected.

YoloV8 Coprocessor application running on Nvidia Jetson Orin

When I ran the demonstration Ultralytics YoloV8 object detection console application on my development desktop (13th Gen Intel(R) Core(TM) i7-13700 2.10 GHz with 32.0 GB) the pre-processing was much faster.

The much shorter pre-processing and longer inferencing durations were not a surprise as my development desktop does not have a Graphics Processing Unit (GPU).

Test image used for testing on Jetson device and development PC

The test image, taken with my mobile, was 3606×2715 pixels, which was representative of the security camera images to be processed by the solution.

Redgate ANTS Performance Profiler instrumentation of application execution

On my development box, running the application with the Redgate ANTS Performance Profiler highlighted that the Compunet YoloV8 code converting the image to a DenseTensor could be an issue.

 public static void ProcessToTensor(Image<Rgb24> image, Size modelSize, bool originalAspectRatio, DenseTensor<float> target, int batch)
 {
    var options = new ResizeOptions()
    {
       Size = modelSize,
       Mode = originalAspectRatio ? ResizeMode.Max : ResizeMode.Stretch,
    };

    // Resize the image to the model input size before converting it to a tensor
    image.Mutate(x => x.Resize(options));

    var xPadding = (modelSize.Width - image.Width) / 2;
    var yPadding = (modelSize.Height - image.Height) / 2;

    var width = image.Width;
    var height = image.Height;

    // Pre-calculate strides for performance
    var strideBatchR = target.Strides[0] * batch + target.Strides[1] * 0;
    var strideBatchG = target.Strides[0] * batch + target.Strides[1] * 1;
    var strideBatchB = target.Strides[0] * batch + target.Strides[1] * 2;
    var strideY = target.Strides[2];
    var strideX = target.Strides[3];

    // Get a span of the whole tensor for fast access
    var tensorSpan = target.Buffer;

    // Try get continuous memory block of the entire image data
    if (image.DangerousTryGetSinglePixelMemory(out var memory))
    {
       Parallel.For(0, width * height, index =>
       {
             int x = index % width;
             int y = index / width;
             int tensorIndex = strideBatchR + strideY * (y + yPadding) + strideX * (x + xPadding);

             var pixel = memory.Span[index];
             WritePixel(tensorSpan.Span, tensorIndex, pixel, strideBatchR, strideBatchG, strideBatchB);
       });
    }
    else
    {
       Parallel.For(0, height, y =>
       {
             var rowSpan = image.DangerousGetPixelRowMemory(y).Span;
             int tensorYIndex = strideBatchR + strideY * (y + yPadding);

             for (int x = 0; x < width; x++)
             {
                int tensorIndex = tensorYIndex + strideX * (x + xPadding);
                var pixel = rowSpan[x];
                WritePixel(tensorSpan.Span, tensorIndex, pixel, strideBatchR, strideBatchG, strideBatchB);
             }
       });
    }
 }

 private static void WritePixel(Span<float> tensorSpan, int tensorIndex, Rgb24 pixel, int strideBatchR, int strideBatchG, int strideBatchB)
 {
    tensorSpan[tensorIndex] = pixel.R / 255f;
    tensorSpan[tensorIndex + strideBatchG - strideBatchR] = pixel.G / 255f;
    tensorSpan[tensorIndex + strideBatchB - strideBatchR] = pixel.B / 255f;
 }

For a 3606×2715 image there are nearly 10 million pixels, each requiring three tensor writes, so the implementation of WritePixel and the overall approach used in ProcessToTensor have a significant impact on performance.

YoloV8 Coprocessor application running on Nvidia Jetson Orin with a resized image

Resizing the images had a significant impact on performance on both the development box and the Nvidia Jetson Orin. This will need some investigation to see how much resizing the images impacts the performance and accuracy of the model.
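A minimal sketch of the test-harness change (my code, not the library's; the 1920 pixel target width is an illustrative assumption) which downscales the image with SixLabors.ImageSharp before calling DetectAsync, using ResizeMode.Max so the aspect ratio is preserved.

using (var image = await SixLabors.ImageSharp.Image.LoadAsync<Rgb24>(_applicationSettings.ImageInputPath))
{
   // Downscale before inferencing so ProcessToTensor touches far fewer pixels
   image.Mutate(context => context.Resize(new ResizeOptions()
   {
      Size = new SixLabors.ImageSharp.Size(1920, 0), // 0 = calculate height from the aspect ratio
      Mode = ResizeMode.Max,
   }));

   var result = await predictor.DetectAsync(image);
}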

The ProcessToTensor method has already had some performance optimisations applied, which improved performance by roughly 20%. There have been discussions about optimising similar code, e.g. Efficient Bitmap to OnnxRuntime Tensor in C# and Efficient RGB Image to Tensor in dotnet, which look applicable, so these will be evaluated.
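One recurring suggestion in those discussions is to fill each channel plane with sequential writes rather than three strided writes per pixel. A minimal sketch of that approach (my interpretation, not the library's code; it assumes batch 0 and a [1, 3, height, width] tensor, with the image already resized to the model input size):

public static void ProcessToTensorPlanar(Image<Rgb24> image, DenseTensor<float> target)
{
   int width = image.Width;
   int height = image.Height;
   int plane = width * height; // floats per channel plane
   Memory<float> buffer = target.Buffer;

   image.ProcessPixelRows(accessor =>
   {
      Span<float> tensor = buffer.Span;

      for (int y = 0; y < height; y++)
      {
         Span<Rgb24> row = accessor.GetRowSpan(y);
         int rowOffset = y * width;

         for (int x = 0; x < width; x++)
         {
            Rgb24 pixel = row[x];
            int index = rowOffset + x;

            tensor[index] = pixel.R / 255f;             // R plane
            tensor[plane + index] = pixel.G / 255f;     // G plane
            tensor[2 * plane + index] = pixel.B / 255f; // B plane
         }
      }
   });
}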

YoloV8 ONNX – Nvidia Jetson Orin Nano™ GPU TensorRT Inferencing

The Seeedstudio reComputer J3011 has two processors: an ARM64 CPU and an Nvidia Jetson Orin 8G. To speed up inferencing on the Nvidia Jetson Orin 8G with TensorRT, I built the Open Neural Network Exchange (ONNX) Runtime with the TensorRT Execution Provider.

Roboflow Universe Tennis Ball by Ugur ozdemir dataset

The Open Neural Network Exchange (ONNX) model used was trained on the Roboflow Universe Tennis Ball dataset by Ugur ozdemir, which has 23,696 images. The initial version of the TensorRT integration used the builder.UseTensorrt method of the IYoloV8Builder interface.

...
YoloV8Builder builder = new YoloV8Builder();

builder.UseOnnxModel(_applicationSettings.ModelPath);

if (_applicationSettings.UseTensorrt)
{
   Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Using TensorRT");

   builder.UseTensorrt(_applicationSettings.DeviceId);
}
...

When the YoloV8.Coprocessor.Detect.Image application was configured to use the NVIDIA TensorRT Execution Provider the average inference time was 58 mSec, but it took roughly 7 minutes to build and optimise the engine each time the application was run.

Generating the TensorRT engine every time the application is started

The TensorRT Execution Provider has a number of configuration options, but the IYoloV8Builder interface had to be modified, with UseCuda, UseRocm, UseTensorrt and UseTvm overloads implemented, to allow additional configuration settings.

...
public class YoloV8Builder : IYoloV8Builder
{
...
    public IYoloV8Builder UseOnnxModel(BinarySelector model)
    {
        _model = model;

        return this;
    }

#if GPURELEASE
    public IYoloV8Builder UseCuda(int deviceId) => WithSessionOptions(SessionOptions.MakeSessionOptionWithCudaProvider(deviceId));

    public IYoloV8Builder UseCuda(OrtCUDAProviderOptions options) => WithSessionOptions(SessionOptions.MakeSessionOptionWithCudaProvider(options));

    public IYoloV8Builder UseRocm(int deviceId) => WithSessionOptions(SessionOptions.MakeSessionOptionWithRocmProvider(deviceId));
    
    // Couldn't test this, I don't have suitable hardware
    public IYoloV8Builder UseRocm(OrtROCMProviderOptions options) => WithSessionOptions(SessionOptions.MakeSessionOptionWithRocmProvider(options));

    public IYoloV8Builder UseTensorrt(int deviceId) => WithSessionOptions(SessionOptions.MakeSessionOptionWithTensorrtProvider(deviceId));

    public IYoloV8Builder UseTensorrt(OrtTensorRTProviderOptions options) => WithSessionOptions(SessionOptions.MakeSessionOptionWithTensorrtProvider(options));

    // Couldn't test this, I don't have suitable hardware
    public IYoloV8Builder UseTvm(string settings = "") => WithSessionOptions(SessionOptions.MakeSessionOptionWithTvmProvider(settings));
#endif
...
}

The trt_engine_cache_enable and trt_engine_cache_path TensorRT Execution Provider session options configure the engine to be cached when it is built for the first time, so when a new inference session is created the engine can be loaded directly from disk.

...
YoloV8Builder builder = new YoloV8Builder();

builder.UseOnnxModel(_applicationSettings.ModelPath);

if (_applicationSettings.UseTensorrt)
{
   Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Using TensorRT");

   OrtTensorRTProviderOptions tensorRToptions = new OrtTensorRTProviderOptions();

   Dictionary<string, string> optionKeyValuePairs = new Dictionary<string, string>();

   optionKeyValuePairs.Add("trt_engine_cache_enable", "1");
   optionKeyValuePairs.Add("trt_engine_cache_path", "enginecache/");

   tensorRToptions.UpdateOptions(optionKeyValuePairs);

   builder.UseTensorrt(tensorRToptions);
}
...

In order to validate that the engine loaded from the trt_engine_cache_path is usable for the current inference, an engine profile is also cached and loaded along with the engine.

If current input shapes are in the range of the engine profile, the loaded engine can be safely used. If input shapes are out of range, the profile will be updated and the engine will be recreated based on the new profile.
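If the input shapes are known in advance the profile can be pinned so the cached engine is always reused. A sketch of the relevant session options (the option keys are from the ONNX Runtime TensorRT Execution Provider documentation; the "images" input name and 1x3x640x640 shape are assumptions for a 640×640 YoloV8 model):

optionKeyValuePairs.Add("trt_profile_min_shapes", "images:1x3x640x640");
optionKeyValuePairs.Add("trt_profile_opt_shapes", "images:1x3x640x640");
optionKeyValuePairs.Add("trt_profile_max_shapes", "images:1x3x640x640");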

Reusing the TensorRT engine built the first time the application is started

When the YoloV8.Coprocessor.Detect.Image application was configured to use NVIDIA TensorRT and the engine was cached, the average inference time was 58 mSec and the Build method took roughly 10 sec to execute after the application had been run once.

trtexec console application output

The trtexec utility can "pre-generate" engines, but there doesn't appear to be a way to use them with the TensorRT Execution Provider.

YoloV8 ONNX – Nvidia Jetson Orin Nano™ GPU CUDA Inferencing

The Seeedstudio reComputer J3011 has two processors: an ARM64 CPU and an Nvidia Jetson Orin 8G. To speed up inferencing on the Nvidia Jetson Orin 8G with Compute Unified Device Architecture (CUDA), I built the Open Neural Network Exchange (ONNX) Runtime with the CUDA Execution Provider.

The Open Neural Network Exchange (ONNX) model used was trained on the Roboflow Universe Tennis Ball dataset by Ugur ozdemir, which has 23,696 images.

// load the app settings into configuration
var configuration = new ConfigurationBuilder()
      .AddJsonFile("appsettings.json", false, true)
.Build();

_applicationSettings = configuration.GetSection("ApplicationSettings").Get<Model.ApplicationSettings>();

Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model load: {_applicationSettings.ModelPath}");

YoloV8Builder builder = new YoloV8Builder();

builder.UseOnnxModel(_applicationSettings.ModelPath);

if (_applicationSettings.UseCuda)
{
   builder.UseCuda(_applicationSettings.DeviceId);
}

if (_applicationSettings.UseTensorrt)
{
   builder.UseTensorrt(_applicationSettings.DeviceId);
}

/*
builder.WithConfiguration(c =>
{
});
*/

/*
builder.WithSessionOptions(new Microsoft.ML.OnnxRuntime.SessionOptions()
{

});
*/

using (var image = await SixLabors.ImageSharp.Image.LoadAsync<Rgba32>(_applicationSettings.ImageInputPath))
using (var predictor = builder.Build())
{
   var result = await predictor.DetectAsync(image);

   Console.WriteLine();
   Console.WriteLine($"Speed: {result.Speed}");
   Console.WriteLine();

   foreach (var prediction in result.Boxes)
   {
      Console.WriteLine($" Class {prediction.Class} {(prediction.Confidence * 100.0):f1}% X:{prediction.Bounds.X} Y:{prediction.Bounds.Y} Width:{prediction.Bounds.Width} Height:{prediction.Bounds.Height}");
   }

   Console.WriteLine();

   Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Plot and save : {_applicationSettings.ImageOutputPath}");

   using (var imageOutput = await result.PlotImageAsync(image))
   {
      await imageOutput.SaveAsJpegAsync(_applicationSettings.ImageOutputPath);
   }
}

When the YoloV8.Coprocessor.Detect.Image application was configured to run on the ARM64 CPU the average inference time was 729 mSec.

The first time I ran the YoloV8.Coprocessor.Detect.Image application configured to use CUDA for inferencing it failed badly.

The YoloV8.Coprocessor.Detect.Image application was then configured to use CUDA and the average inferencing time was 85 mSec, roughly 8.5 times faster than the ARM64 CPU but still slower than the 58 mSec achieved with TensorRT.

It took a couple of weeks to get the YoloV8.Coprocessor.Detect.Image application inferencing on the Nvidia Jetson Orin 8G coprocessor and this will be covered in detail in another post.

Azure Event Grid YoloV8 – Basic MQTT Client Pose Estimation

The Azure.EventGrid.Image.YoloV8.Pose application downloads images from a security camera, processes them with the default YoloV8 (by Ultralytics) Pose Estimation model, then publishes the results to an Azure Event Grid MQTT broker topic.

private async void ImageUpdateTimerCallback(object? state)
{
   DateTime requestAtUtc = DateTime.UtcNow;

   // Just in case - stop code being called while photo or prediction already in progress
   if (_ImageProcessing)
   {
      return;
   }
   _ImageProcessing = true;

   try
   {
      _logger.LogDebug("Camera request start");

      PoseResult result;

      using (Stream cameraStream = await _httpClient.GetStreamAsync(_applicationSettings.CameraUrl))
      {
         result = await _predictor.PoseAsync(cameraStream);
      }

      _logger.LogInformation("Speed Preprocess:{Preprocess} Postprocess:{Postprocess}", result.Speed.Preprocess, result.Speed.Postprocess);

      if (_logger.IsEnabled(LogLevel.Debug))
      {
         _logger.LogDebug("Pose results");

         foreach (var box in result.Boxes)
         {
            _logger.LogDebug(" Class:{box.Class} Confidence:{Confidence:f1}% X:{X} Y:{Y} Width:{Width} Height:{Height}", box.Class.Name, box.Confidence * 100.0, box.Bounds.X, box.Bounds.Y, box.Bounds.Width, box.Bounds.Height);

            foreach (var keypoint in box.Keypoints)
            {
               Model.PoseMarker poseMarker = (Model.PoseMarker)keypoint.Index;

               _logger.LogDebug("  Class:{Class} Confidence:{Confidence:f1}% X:{X} Y:{Y}", Enum.GetName(poseMarker), keypoint.Confidence * 100.0, keypoint.Point.X, keypoint.Point.Y);
            }
         }
      }

      var message = new MQTT5PublishMessage
      {
         Topic = string.Format(_applicationSettings.PublishTopic, _applicationSettings.UserName),
         Payload = Encoding.ASCII.GetBytes(JsonSerializer.Serialize(new
         {
            result.Boxes
         })),
         QoS = _applicationSettings.PublishQualityOfService,
      };

      _logger.LogDebug("HiveMQ.Publish start");

      var resultPublish = await _mqttclient.PublishAsync(message);

      _logger.LogDebug("HiveMQ.Publish done");
   }
   catch (Exception ex)
   {
      _logger.LogError(ex, "Camera image download, processing, or telemetry failed");
   }
   finally
   {
      _ImageProcessing = false;
   }

   TimeSpan duration = DateTime.UtcNow - requestAtUtc;

   _logger.LogDebug("Camera Image download, processing and telemetry done {TotalSeconds:f2} sec", duration.TotalSeconds);
}

The application uses a Timer (with configurable Due and Period times) to poll the security camera, estimate the poses in the image, then publish a JavaScript Object Notation (JSON) representation of the results to the Azure Event Grid MQTT broker topic using a HiveMQ client.
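A minimal sketch of the timer setup (the ImageTimerDue and ImageTimerPeriod setting names are illustrative assumptions, not necessarily those used in the application):

private Timer _imageUpdateTimer;

public Task StartAsync(CancellationToken cancellationToken)
{
   // First callback after the Due time, then every Period
   _imageUpdateTimer = new Timer(ImageUpdateTimerCallback, null, _applicationSettings.ImageTimerDue, _applicationSettings.ImageTimerPeriod);

   return Task.CompletedTask;
}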

Ultralytics Pose Model input image

The Unv ADZK-10 camera used in this sample has a Hypertext Transfer Protocol (HTTP) Uniform Resource Locator (URL) for downloading the current image. Like the YoloV8.Detect.SecurityCamera.Stream sample, the image is "streamed" using HttpClient.GetStreamAsync to the YoloV8 PoseAsync method.

Azure.EventGrid.Image.YoloV8.Pose application console output

The same approach as the YoloV8.Detect.SecurityCamera.Stream sample is used because the image doesn’t have to be saved on the local filesystem.

Ultralytics Pose Model marked-up image

To check the results, I put a breakpoint in the timer callback just after the PoseAsync method is called and then used the Visual Studio 2022 Debugger QuickWatch functionality to inspect the contents of the PoseResult object.

Visual Studio 2022 Debugger PoseResult Quickwatch

For testing I configured a single Azure Event Grid custom topic subscription with an Azure Storage Queue endpoint.

Azure Event Grid Topic Metrics

An Azure Storage Queue is an easy way to store messages while debugging/testing an application.

Azure Storage Explorer messages list

Azure Storage Explorer is a good tool for listing recent messages, then inspecting their payloads.

Azure Storage Explorer Message Details

The Azure Event Grid custom topic message text (in data_base64) contains the JavaScript Object Notation (JSON) of the pose estimation result.

{"Boxes":[{"Keypoints":[{"Index":0,"Point":{"X":744,"Y":58,"IsEmpty":false},"Confidence":0.6334442},{"Index":1,"Point":{"X":746,"Y":33,"IsEmpty":false},"Confidence":0.759928},{"Index":2,"Point":{"X":739,"Y":46,"IsEmpty":false},"Confidence":0.19036674},{"Index":3,"Point":{"X":784,"Y":8,"IsEmpty":false},"Confidence":0.8745915},{"Index":4,"Point":{"X":766,"Y":45,"IsEmpty":false},"Confidence":0.086735755},{"Index":5,"Point":{"X":852,"Y":50,"IsEmpty":false},"Confidence":0.9166329},{"Index":6,"Point":{"X":837,"Y":121,"IsEmpty":false},"Confidence":0.85815763},{"Index":7,"Point":{"X":888,"Y":31,"IsEmpty":false},"Confidence":0.6234426},{"Index":8,"Point":{"X":871,"Y":205,"IsEmpty":false},"Confidence":0.37670398},{"Index":9,"Point":{"X":799,"Y":21,"IsEmpty":false},"Confidence":0.3686208},{"Index":10,"Point":{"X":768,"Y":205,"IsEmpty":false},"Confidence":0.21734264},{"Index":11,"Point":{"X":912,"Y":364,"IsEmpty":false},"Confidence":0.98523325},{"Index":12,"Point":{"X":896,"Y":382,"IsEmpty":false},"Confidence":0.98377174},{"Index":13,"Point":{"X":888,"Y":637,"IsEmpty":false},"Confidence":0.985927},{"Index":14,"Point":{"X":849,"Y":645,"IsEmpty":false},"Confidence":0.9834709},{"Index":15,"Point":{"X":951,"Y":909,"IsEmpty":false},"Confidence":0.96191007},{"Index":16,"Point":{"X":921,"Y":894,"IsEmpty":false},"Confidence":0.9618156}],"Class":{"Id":0,"Name":"person"},"Bounds":{"X":690,"Y":3,"Width":315,"Height":1001,"Location":{"X":690,"Y":3,"IsEmpty":false},"Size":{"Width":315,"Height":1001,"IsEmpty":false},"IsEmpty":false,"Top":3,"Right":1005,"Bottom":1004,"Left":690},"Confidence":0.8341071}]}

YoloV8 ONNX – Nvidia Jetson Orin Nano™ ARM64 CPU Inferencing

I configured the demonstration Ultralytics YoloV8 object detection (yolov8s.onnx) console application to process a 1920×1080 image from a security camera on my desktop development box (13th Gen Intel(R) Core(TM) i7-13700 2.10 GHz with 32.0 GB).

Object Detection sample application running on my development box

A Seeedstudio reComputer J3011 uses an Nvidia Jetson Orin 8G and looked like a cost-effective platform to explore how a dedicated Artificial Intelligence (AI) co-processor could reduce inferencing times.

To establish a "baseline" I "published" the demonstration application on my development box, which created a folder with all the files required to run the application on the Seeedstudio reComputer J3011 ARM64 CPU. I had to manually merge the "User Secrets" and appsettings.json files so the camera connection configuration was correct.

The runtimes folder contained a number of folders with the native runtime files for the supported Open Neural Network Exchange (ONNX) platforms.

Object Detection application publish runtimes folder

The Nvidia Jetson Orin ARM64 CPU requires the linux-arm64 ONNX runtime, which was "automagically" detected. (In previous versions of ML.Net the native runtime had to be copied to the execution directory.)

Linux ONNX ARM64 runtime

The final step was to use the demonstration Ultralytics YoloV8 object detection (yolov8s.onnx) console application to process a 1920×1080 image from a security camera on the reComputer J3011 (6-core Arm® Cortex®-A78AE 64-bit CPU @ 1.5 GHz).

Object Detection sample application running on my Seeedstudio reComputer J3011

When I averaged the pre-processing, inferencing and post-processing times for both devices over 20 executions, my development box was much faster, which was not a surprise, though the reComputer J3011 post-processing times were a bit faster than I was expecting.

ARM64 CPU Preprocess 0.05s Inference 0.31s Postprocess 0.05s
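A sketch of the kind of loop used to calculate the averages (my harness approach; the details are assumptions, the samples just print result.Speed):

var preprocess = new List<TimeSpan>();
var inference = new List<TimeSpan>();
var postprocess = new List<TimeSpan>();

for (int i = 0; i < 20; i++)
{
   var result = await predictor.DetectAsync(image);

   preprocess.Add(result.Speed.Preprocess);
   inference.Add(result.Speed.Inference);
   postprocess.Add(result.Speed.Postprocess);
}

Console.WriteLine($"Preprocess {preprocess.Average(t => t.TotalSeconds):f2}s Inference {inference.Average(t => t.TotalSeconds):f2}s Postprocess {postprocess.Average(t => t.TotalSeconds):f2}s");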

Azure Event Grid YoloV8 – Basic MQTT Client Object Detection

The Azure.EventGrid.Image.Detect application downloads images from a security camera, processes them with the default YoloV8 (by Ultralytics) object detection model, then publishes the results to an Azure Event Grid MQTT broker topic.

The Unv ADZK-10 camera used in this sample has a Hypertext Transfer Protocol (HTTP) Uniform Resource Locator (URL) for downloading the current image. Like the YoloV8.Detect.SecurityCamera.Stream sample, the image is "streamed" using HttpClient.GetStreamAsync to the YoloV8 DetectAsync method.

private async void ImageUpdateTimerCallback(object? state)
{
   DateTime requestAtUtc = DateTime.UtcNow;

   // Just in case - stop code being called while photo or prediction already in progress
   if (_ImageProcessing)
   {
      return;
   }
   _ImageProcessing = true;

   try
   {
      _logger.LogDebug("Camera request start");

      DetectionResult result;

      using (Stream cameraStream = await _httpClient.GetStreamAsync(_applicationSettings.CameraUrl))
      {
         result = await _predictor.DetectAsync(cameraStream);
      }

      _logger.LogInformation("Speed Preprocess:{Preprocess} Postprocess:{Postprocess}", result.Speed.Preprocess, result.Speed.Postprocess);

      if (_logger.IsEnabled(LogLevel.Debug))
      {
         _logger.LogDebug("Detection results");

         foreach (var box in result.Boxes)
         {
            _logger.LogDebug(" Class {box.Class} {Confidence:f1}% X:{box.Bounds.X} Y:{box.Bounds.Y} Width:{box.Bounds.Width} Height:{box.Bounds.Height}", box.Class, box.Confidence * 100.0, box.Bounds.X, box.Bounds.Y, box.Bounds.Width, box.Bounds.Height);
         }
      }

      var message = new MQTT5PublishMessage
      {
         Topic = string.Format(_applicationSettings.PublishTopic, _applicationSettings.UserName),
         Payload = Encoding.ASCII.GetBytes(JsonSerializer.Serialize(new
         {
            result.Boxes,
         })),
         QoS = _applicationSettings.PublishQualityOfService,
      };

      _logger.LogDebug("HiveMQ.Publish start");

      var resultPublish = await _mqttclient.PublishAsync(message);

      _logger.LogDebug("HiveMQ.Publish done");
   }
   catch (Exception ex)
   {
      _logger.LogError(ex, "Camera image download, processing, or telemetry failed");
   }
   finally
   {
      _ImageProcessing = false;
   }

   TimeSpan duration = DateTime.UtcNow - requestAtUtc;

   _logger.LogDebug("Camera Image download, processing and telemetry done {TotalSeconds:f2} sec", duration.TotalSeconds);
}

The application uses a Timer (with configurable Due and Period times) to poll the security camera, detect objects in the image, then publish a JavaScript Object Notation (JSON) representation of the results to the Azure Event Grid MQTT broker topic using a HiveMQ client.

Console application displaying object detection results

The application uses the Microsoft.Extensions.Logging library to publish diagnostic information to the console while debugging the application.
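A minimal sketch of the console logging wiring (assuming the LoggerFactory approach; the application's actual host configuration may differ):

using Microsoft.Extensions.Logging;

using ILoggerFactory loggerFactory = LoggerFactory.Create(builder =>
{
   // Timestamped console output, Debug level so the detection results are visible
   builder.AddSimpleConsole(options => options.TimestampFormat = "HH:mm:ss ");
   builder.SetMinimumLevel(LogLevel.Debug);
});

ILogger logger = loggerFactory.CreateLogger("Azure.EventGrid.Image.Detect");

logger.LogDebug("Camera request start");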

Visual Studio 2022 QuickWatch displaying object detection results.

To check the results I put a breakpoint in the timer callback just after the DetectAsync method is called and then used the Visual Studio 2022 Debugger QuickWatch functionality to inspect the contents of the DetectionResult object.

Visual Studio 2022 JSON Visualiser displaying object detection results.

To check the JSON payload of the MQTT message I put a breakpoint just before the HiveMQ PublishAsync method. I then inspected the payload using the Visual Studio 2022 JSON Visualizer.

Security Camera image for object detection photo-bombed by Yarnold, our Standard Apricot Poodle.

This application can also be deployed as a Linux systemd Service so that it starts automatically and runs in the background. The same approach as the YoloV8.Detect.SecurityCamera.Stream sample is used because the image doesn't have to be saved on the local filesystem.

YoloV8 – File, Stream, & Byte Array Camera Images

After building some proof-of-concept applications, I have decided to use the YoloV8 by dme-compunet NuGet because it supports async await, and code with async await is always better (yeah right).

The YoloV8.Detect.SecurityCamera.File sample downloads images from the security camera to the local file system, then calls DetectAsync with the local file path.

private static async void ImageUpdateTimerCallback(object state)
{
   //...
   try
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV8 Security Camera Image File processing start");

      using (Stream cameraStream = await _httpClient.GetStreamAsync(_applicationSettings.CameraUrl))
      using (Stream fileStream = System.IO.File.Create(_applicationSettings.ImageFilepath))
      {
         await cameraStream.CopyToAsync(fileStream);
      }

      DetectionResult result = await _predictor.DetectAsync(_applicationSettings.ImageFilepath);

      Console.WriteLine($"Speed: {result.Speed}");

      foreach (var prediction in result.Boxes)
      {
         Console.WriteLine($" Class {prediction.Class} {(prediction.Confidence * 100.0):f1}% X:{prediction.Bounds.X} Y:{prediction.Bounds.Y} Width:{prediction.Bounds.Width} Height:{prediction.Bounds.Height}");
      }

      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV8 Security Camera Image processing done");
   }
   catch (Exception ex)
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} YoloV8 Security camera image download or YoloV8 prediction failed {ex.Message}");
   }
//...
}
Console application using camera image saved on filesystem

The YoloV8.Detect.SecurityCamera.Bytes sample downloads images from the security camera as an array of bytes then calls DetectAsync.

private static async void ImageUpdateTimerCallback(object state)
{
   //...
   try
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV8 Security Camera Image Bytes processing start");

      byte[] bytes = await _httpClient.GetByteArrayAsync(_applicationSettings.CameraUrl);

      DetectionResult result = await _predictor.DetectAsync(bytes);

      Console.WriteLine($"Speed: {result.Speed}");

      foreach (var prediction in result.Boxes)
      {
         Console.WriteLine($" Class {prediction.Class} {(prediction.Confidence * 100.0):f1}% X:{prediction.Bounds.X} Y:{prediction.Bounds.Y} Width:{prediction.Bounds.Width} Height:{prediction.Bounds.Height}");
      }

      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV8 Security Camera Image processing done");
   }
   catch (Exception ex)
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} YoloV8 Security camera image download or YoloV8 prediction failed {ex.Message}");
   }
//...
}
Console application downloading camera image as an array of bytes.

The YoloV8.Detect.SecurityCamera.Stream sample “streams” the image from the security camera to DetectAsync.

private static async void ImageUpdateTimerCallback(object state)
{
   // ...
   try
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV8 Security Camera Image Stream processing start");

      DetectionResult result;

      using (System.IO.Stream cameraStream = await _httpClient.GetStreamAsync(_applicationSettings.CameraUrl))
      {
         result = await _predictor.DetectAsync(cameraStream);
      }

      Console.WriteLine($"Speed: {result.Speed}");

      foreach (var prediction in result.Boxes)
      {
         Console.WriteLine($" Class {prediction.Class} {(prediction.Confidence * 100.0):f1}% X:{prediction.Bounds.X} Y:{prediction.Bounds.Y} Width:{prediction.Bounds.Width} Height:{prediction.Bounds.Height}");
      }

      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV8 Security Camera Image processing done");
   }
   catch (Exception ex)
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} YoloV8 Security camera image download or YoloV8 prediction failed {ex.Message}");
   }
//...
}
Console application streaming camera image.

The ImageSelector parameter of DetectAsync caught my attention as I hadn’t seen this approach used before. The developers who wrote the NuGet package are definitely smarter than me so I figured I might learn something useful digging deeper.

My sample object detection applications all call

public static async Task<DetectionResult> DetectAsync(this YoloV8 predictor, ImageSelector selector)
{
    return await Task.Run(() => predictor.Detect(selector));
}

Which then invokes

public static DetectionResult Detect(this YoloV8 predictor, ImageSelector selector)
{
    predictor.ValidateTask(YoloV8Task.Detect);

    return predictor.Run(selector, (outputs, image, timer) =>
    {
        var output = outputs[0].AsTensor<float>();

        var parser = new DetectionOutputParser(predictor.Metadata, predictor.Parameters);

        var boxes = parser.Parse(output, image);
        var speed = timer.Stop();

        return new DetectionResult
        {
            Boxes = boxes,
            Image = image,
            Speed = speed,
        };
    });
}

public TResult Run<TResult>(ImageSelector selector, PostprocessContext<TResult> postprocess) where TResult : YoloV8Result
{
    using var image = selector.Load(true);

    var originSize = image.Size;

    var timer = new SpeedTimer();

    timer.StartPreprocess();

    var input = Preprocess(image);

    var inputs = MapNamedOnnxValues([input]);

    timer.StartInference();

    using var outputs = Infer(inputs);

    var list = new List<NamedOnnxValue>(outputs);

    timer.StartPostprocess();

    return postprocess(list, originSize, timer);
}

It looks like most of the image loading magic of the ImageSelector class is implemented using the SixLabors ImageSharp library…

public class ImageSelector<TPixel> where TPixel : unmanaged, IPixel<TPixel>
{
    private readonly Func<Image<TPixel>> _factory;

    public ImageSelector(Image image)
    {
        _factory = image.CloneAs<TPixel>;
    }

    public ImageSelector(string path)
    {
        _factory = () => Image.Load<TPixel>(path);
    }

    public ImageSelector(byte[] data)
    {
        _factory = () => Image.Load<TPixel>(data);
    }

    public ImageSelector(Stream stream)
    {
        _factory = () => Image.Load<TPixel>(stream);
    }

    internal Image<TPixel> Load(bool autoOrient)
    {
        var image = _factory();

        if (autoOrient)
            image.Mutate(x => x.AutoOrient());

        return image;
    }

    public static implicit operator ImageSelector<TPixel>(Image image) => new(image);
    public static implicit operator ImageSelector<TPixel>(string path) => new(path);
    public static implicit operator ImageSelector<TPixel>(byte[] data) => new(data);
    public static implicit operator ImageSelector<TPixel>(Stream stream) => new(stream);
}
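The implicit operators are what make the DetectAsync overloads in the earlier samples work; any of these calls compile without explicitly constructing an ImageSelector (a sketch, assuming a predictor built as in the earlier samples):

DetectionResult fromPath = await predictor.DetectAsync("image.jpg"); // string path

DetectionResult fromBytes = await predictor.DetectAsync(await File.ReadAllBytesAsync("image.jpg")); // byte array

using (FileStream stream = File.OpenRead("image.jpg"))
{
   DetectionResult fromStream = await predictor.DetectAsync(stream); // Stream
}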

I learnt something new, but must be careful to apply it only where it adds value.

YoloV8 – One of these NuGets is not like the others

A couple of days after my initial testing, the YoloV8 by dme-compunet NuGet was updated, so I reran my test harnesses.

Then the YoloDotNet by NickSwardh NuGet was also updated so I reran all my test harnesses again.

Even though the YoloV8 by sstainba NuGet hadn't been updated I ran the test harness just in case.

The dme-compunet YoloV8 and NickSwardh YoloDotNet NuGet results are now the same (bar the 30% cutoff) and the YoloV8 by sstainba results have not changed.
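The 30% figure is the dme-compunet YoloV8 default confidence cutoff. A sketch of overriding it so the cutoffs match across the harnesses (the Confidence property name is an assumption about the builder's configuration type):

builder.WithConfiguration(configuration =>
{
   configuration.Confidence = 0.25f; // include detections below the 30% default cutoff
});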

YoloV8 – All of these NuGets are not like the others

As part of investigating which YoloV8 NuGet to use, I built three trial applications using the dme-compunet YoloV8, sstainba Yolov8.Net, and NickSwardh YoloDotNet NuGets.

All of the implementations load the model, load the sample image, detect objects in the image, then markup the image with the classification, minimum bounding boxes, and confidences of each object.

Input Image

The first implementation uses YoloV8 by dme-compunet which supports asynchronous operation. The image is loaded asynchronously, the prediction is asynchronous, and the image is then marked up and saved asynchronously.

using (var predictor = new Compunet.YoloV8.YoloV8(_applicationSettings.ModelPath))
{
   Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model load done");
   Console.WriteLine();

   using (var image = await SixLabors.ImageSharp.Image.LoadAsync<Rgba32>(_applicationSettings.ImageInputPath))
   {
      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect start");

      var predictions = await predictor.DetectAsync(image);

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect done");
      Console.WriteLine();

      Console.WriteLine($" Speed: {predictions.Speed}");

      foreach (var prediction in predictions.Boxes)
      {
         Console.WriteLine($"  Class {prediction.Class} {(prediction.Confidence * 100.0):f1}% X:{prediction.Bounds.X} Y:{prediction.Bounds.Y} Width:{prediction.Bounds.Width} Height:{prediction.Bounds.Height}");
      }
      Console.WriteLine();

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Plot and save : {_applicationSettings.ImageOutputPath}");

      SixLabors.ImageSharp.Image imageOutput = await predictions.PlotImageAsync(image);

      await imageOutput.SaveAsJpegAsync(_applicationSettings.ImageOutputPath);
   }
}
dme-compunet YoloV8 test application output

The second implementation uses YoloDotNet by NickSwardh which partially supports asynchronous operation. The image is loaded asynchronously, the prediction is synchronous, the markup is synchronous, and then the image is saved asynchronously.

using (var predictor = new Yolo(_applicationSettings.ModelPath, false))
{
   Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model load done");
   Console.WriteLine();

   using (var image = await SixLabors.ImageSharp.Image.LoadAsync<Rgba32>(_applicationSettings.ImageInputPath))
   {
      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect start");

      var predictions = predictor.RunObjectDetection(image);

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect done");
      Console.WriteLine();

      foreach (var prediction in predictions)
      {
         Console.WriteLine($"  Class {prediction.Label.Name} {(prediction.Confidence * 100.0):f1}% X:{prediction.BoundingBox.X} Y:{prediction.BoundingBox.Y} Width:{prediction.BoundingBox.Width} Height:{prediction.BoundingBox.Height}");
      }
      Console.WriteLine();

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Plot and save : {_applicationSettings.ImageOutputPath}");

      image.Draw(predictions);

      await image.SaveAsJpegAsync(_applicationSettings.ImageOutputPath);
   }
}
NickSwardh YoloDotNet test application output

The third implementation uses YoloV8 by sstainba which partially supports asynchronous operation. The image is loaded asynchronously, the prediction is synchronous, the markup is synchronous, and then the image is saved asynchronously.

using (var predictor = YoloV8Predictor.Create(_applicationSettings.ModelPath))
{
   Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model load done");
   Console.WriteLine();

   using (var image = await SixLabors.ImageSharp.Image.LoadAsync<Rgba32>(_applicationSettings.ImageInputPath))
   {
      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect start");

      var predictions = predictor.Predict(image);

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect done");
      Console.WriteLine();

      foreach (var prediction in predictions)
      {
         Console.WriteLine($"  Class {prediction.Label.Name} {(prediction.Score * 100.0):f1}% X:{prediction.Rectangle.X} Y:{prediction.Rectangle.Y} Width:{prediction.Rectangle.Width} Height:{prediction.Rectangle.Height}");
      }

      Console.WriteLine();

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Plot and save : {_applicationSettings.ImageOutputPath}");

      // This is a bit hacky, should be fixed up in a future release
      Font font = new Font(SystemFonts.Get(_applicationSettings.FontName), _applicationSettings.FontSize);
      foreach (var prediction in predictions)
      {
         var x = (int)Math.Max(prediction.Rectangle.X, 0);
         var y = (int)Math.Max(prediction.Rectangle.Y, 0);
         var width = (int)Math.Min(image.Width - x, prediction.Rectangle.Width);
         var height = (int)Math.Min(image.Height - y, prediction.Rectangle.Height);

         //Note that the output is already scaled to the original image height and width.

         // Bounding Box Text
         string text = $"{prediction.Label.Name} [{prediction.Score}]";
         var size = TextMeasurer.MeasureSize(text, new TextOptions(font));

         image.Mutate(d => d.Draw(Pens.Solid(Color.Yellow, 2), new Rectangle(x, y, width, height)));

         image.Mutate(d => d.DrawText(text, font, Color.Yellow, new Point(x, (int)(y - size.Height - 1))));
      }

      await image.SaveAsJpegAsync(_applicationSettings.ImageOutputPath);
   }
}
sstainba YoloV8 test application output

I don’t understand why the three NuGets produced different results which is worrying.

Azure Event Grid MQTT – With HiveMQ & MQTTnet Clients

Most of the examples of connecting to Azure Event Grid's MQTT broker use MQTTnet, so for a bit of variety I started with a hivemq-mqtt-client-dotnet based client. (A customer had been evaluating HiveMQ for a project which was later cancelled.)

BEWARE – ClientID parameter is case sensitive.

The HiveMQ client was “inspired” by the How to Guides > Custom Client Certificates documentation.

class Program
{
   private static Model.ApplicationSettings _applicationSettings;
   private static HiveMQClient _client;
   private static bool _publisherBusy = false;

   static async Task Main()
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} Hive MQ client starting");

      try
      {
         // load the app settings into configuration
         var configuration = new ConfigurationBuilder()
               .AddJsonFile("appsettings.json", false, true)
               .AddUserSecrets<Program>()
         .Build();

         _applicationSettings = configuration.GetSection("ApplicationSettings").Get<Model.ApplicationSettings>();

         var optionsBuilder = new HiveMQClientOptionsBuilder();

         optionsBuilder
            .WithClientId(_applicationSettings.ClientId)
            .WithBroker(_applicationSettings.Host)
            .WithPort(_applicationSettings.Port)
            .WithUserName(_applicationSettings.UserName)
            .WithCleanStart(_applicationSettings.CleanStart)
            .WithClientCertificate(_applicationSettings.ClientCertificateFileName, _applicationSettings.ClientCertificatePassword)
            .WithUseTls(true);

         using (_client = new HiveMQClient(optionsBuilder.Build()))
         {
            _client.OnMessageReceived += OnMessageReceived;

            var connectResult = await _client.ConnectAsync();
            if (connectResult.ReasonCode != ConnAckReasonCode.Success)
            {
               throw new Exception($"Failed to connect: {connectResult.ReasonString}");
            }

            Console.WriteLine($"Subscribed to Topic");
            foreach (string topic in _applicationSettings.SubscribeTopics.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
            {
               var subscribeResult = await _client.SubscribeAsync(topic, _applicationSettings.SubscribeQualityOfService);

               Console.WriteLine($" Topic:{topic} Result:{subscribeResult.Subscriptions[0].SubscribeReasonCode}");
            }
   }
//...
}
HiveMQ Client console application output

The MQTTnet client was "inspired" by the Azure MQTT .NET Application sample.

class Program
{
   private static Model.ApplicationSettings _applicationSettings;
   private static IMqttClient _client;
   private static bool _publisherBusy = false;

   static async Task Main()
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} MQTTNet client starting");

      try
      {
         // load the app settings into configuration
         var configuration = new ConfigurationBuilder()
               .AddJsonFile("appsettings.json", false, true)
               .AddUserSecrets<Program>()
         .Build();

         _applicationSettings = configuration.GetSection("ApplicationSettings").Get<Model.ApplicationSettings>();

         var mqttFactory = new MqttFactory();

         using (_client = mqttFactory.CreateMqttClient())
         {
            // Certificate based authentication
            List<X509Certificate2> certificates = new List<X509Certificate2>
            {
               new X509Certificate2(_applicationSettings.ClientCertificateFileName, _applicationSettings.ClientCertificatePassword)
            };

            var tlsOptions = new MqttClientTlsOptionsBuilder()
                  .WithClientCertificates(certificates)
                  .WithSslProtocols(System.Security.Authentication.SslProtocols.Tls12)
                  .UseTls(true)
                  .Build();

            MqttClientOptions mqttClientOptions = new MqttClientOptionsBuilder()
                     .WithClientId(_applicationSettings.ClientId)
                     .WithTcpServer(_applicationSettings.Host, _applicationSettings.Port)
                     .WithCredentials(_applicationSettings.UserName, _applicationSettings.Password)
                     .WithCleanStart(_applicationSettings.CleanStart)
                     .WithTlsOptions(tlsOptions)
                     .Build();

            var connectResult = await _client.ConnectAsync(mqttClientOptions);
            if (connectResult.ResultCode != MqttClientConnectResultCode.Success)
            {
               throw new Exception($"Failed to connect: {connectResult.ReasonString}");
            }

            _client.ApplicationMessageReceivedAsync += OnApplicationMessageReceivedAsync;

            Console.WriteLine($"Subscribed to Topic");
            foreach (string topic in _applicationSettings.SubscribeTopics.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
            {
               var subscribeResult = await _client.SubscribeAsync(topic, _applicationSettings.SubscribeQualityOfService);

               Console.WriteLine($" {topic} Result:{subscribeResult.Items.First().ResultCode}");
            }
      }
//...
}
MQTTnet client console application output

The design of the MQTT protocol means that the hivemq-mqtt-client-dotnet and MQTTnet implementations are similar. Having used both, I personally prefer the HiveMQ client library.