Building Cloud AI with AI-YoloSharp Object detection Azure HTTP Trigger function

Introduction

Using Copilot to generate the code to retrieve an image from a security camera, and small applications for object detection and object classification with YoloSharp, also went surprisingly well. An Azure HTTP Trigger function is a bit more complex, so I decided to do one next. I started with the code generated by Visual Studio 2022:

using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;

namespace YoloObjectDetectionFunction
{
    public class Function1
    {
        private readonly ILogger<Function1> _logger;

        public Function1(ILogger<Function1> logger)
        {
            _logger = logger;
        }

        [Function("Function1")]
        public IActionResult Run([HttpTrigger(AuthorizationLevel.Function, "get", "post")] HttpRequest req)
        {
            _logger.LogInformation("C# HTTP trigger function processed a request.");
            return new OkObjectResult("Welcome to Azure Functions!");
        }
    }
}

I then generated the first version of the code with Copilot desktop

The code looked “okay” but had lots of extra “usings”, used the FunctionName attribute rather than the Function attribute, etc., so it didn’t compile.

I think the “var yolo = new Yolo(modelPath)” comes from YoloDotNet by Niklas Swärd and “var detections = yolo.Predict(image)” is from YoloV8.Net by Samuel Stainback.

using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;
using YoloSharp;
using YoloSharp.DataStructures;

public static class YoloObjectDetectionFunction
{
    private static readonly string modelPath = "path/to/your/yolo-model.onnx";

    [FunctionName("YoloObjectDetection")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequestMessage req,
        ILogger log)
    {
        log.LogInformation("Yolo Object Detection function processed a request.");

        // Read the image from the request
        byte[] imageBytes = await req.Content.ReadAsByteArrayAsync();
        using (Image<Rgba32> image = Image.Load<Rgba32>(imageBytes))
        {
            // Initialize YoloSharp
            using (var yolo = new Yolo(modelPath))
            {
                // Detect objects in the image
                var detections = yolo.Predict(image);

                // Process the detections
                foreach (var detection in detections)
                {
                    // Draw bounding boxes on the image
                    image.Mutate(ctx => ctx.DrawPolygon(Rgba32.Red, 2, detection.BoundingBox.ToPolygon()));
                }

                // Save the modified image to a memory stream
                using (var ms = new MemoryStream())
                {
                    image.SaveAsPng(ms);
                    ms.Seek(0, SeekOrigin.Begin);

                    // Return the modified image as a response
                    var result = new HttpResponseMessage(HttpStatusCode.OK)
                    {
                        Content = new StreamContent(ms)
                    };
                    result.Content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("image/png");
                    return new FileStreamResult(ms, "image/png");
                }
            }
        }
    }
}

I manually fixed all the issues in the code with a lot of assistance from Visual Studio 2022 IntelliSense.

using System.Net;

using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;

using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;
using SixLabors.ImageSharp.Drawing.Processing;

//using YoloSharp;
//using YoloSharp.DataStructures;
using Compunet.YoloSharp;


public static class YoloObjectDetectionFunction
{
   //private static readonly string modelPath = "path/to/your/yolo-model.onnx";
   private static readonly string modelPath = "yolov8s.onnx";

   //[FunctionName("YoloObjectDetection")]
   [Function("YoloObjectDetection")]
   public static async Task<IActionResult> Run(
       [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequestMessage req,
       ILogger log)
   {
      log.LogInformation("Yolo Object Detection function processed a request.");

      // Read the image from the request
      byte[] imageBytes = await req.Content.ReadAsByteArrayAsync();
      using (Image<Rgba32> image = Image.Load<Rgba32>(imageBytes))
      {
         // Initialize YoloSharp
         //using (var yolo = new Yolo(modelPath))
         using (var yolo = new YoloPredictor(modelPath))
         {
            // Detect objects in the image
            //var detections = yolo.Predict(image);
            var detections = yolo.Detect(image);

            // Process the detections
            foreach (var detection in detections)
            {
               // Draw bounding boxes on the image
               //image.Mutate(ctx => ctx.DrawPolygon(Rgba32.Red, 2, detection.BoundingBox.ToPolygon()));

               var rectangle = new PointF[] {new Point(detection.Bounds.Bottom, detection.Bounds.Left), new Point(detection.Bounds.Bottom, detection.Bounds.Right), new Point(detection.Bounds.Right, detection.Bounds.Top), new Point(detection.Bounds.Left, detection.Bounds.Top)};

               image.Mutate(ctx => ctx.DrawPolygon(Rgba32.ParseHex("FF0000"), 2, rectangle));
            }

            // Save the modified image to a memory stream
            using (var ms = new MemoryStream())
            {
               image.SaveAsPng(ms);
               ms.Seek(0, SeekOrigin.Begin);

               // Return the modified image as a response
               var result = new HttpResponseMessage(HttpStatusCode.OK)
               {
                  Content = new StreamContent(ms)
               };
               result.Content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("image/png");
               return new FileStreamResult(ms, "image/png");
            }
         }
      }
   }
}

When I posted an image to the function with Telerik Fiddler Classic it threw an exception, because the ILogger injected into the Run method was null.

I then asked GitHub Copilot to “Modify code so the ILogger is parameter of constructor”

using System.Net;

using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;

using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;
using SixLabors.ImageSharp.Drawing.Processing;

//using YoloSharp;
//using YoloSharp.DataStructures;
using Compunet.YoloSharp;


//public static class YoloObjectDetectionFunction
public class YoloObjectDetectionFunction
{
   private static readonly ILogger _log;
   //private static readonly string modelPath = "path/to/your/yolo-model.onnx";
   private static readonly string modelPath = "yolov8s.onnx";

   public YoloObjectDetectionFunction(ILogger<YoloObjectDetectionFunction> log)
   {
      _log = log;
   }

   //[FunctionName("YoloObjectDetection")]
   [Function("YoloObjectDetection")]
   //public static async Task<IActionResult> Run( [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequestMessage req, ILogger log)
   public static async Task<IActionResult> Run([HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequestMessage req)
   {
      _log.LogInformation("Yolo Object Detection function processed a request.");

      // Read the image from the request
      byte[] imageBytes = await req.Content.ReadAsByteArrayAsync();
      using (Image<Rgba32> image = Image.Load<Rgba32>(imageBytes))
      {
         // Initialize YoloSharp
         //using (var yolo = new Yolo(modelPath))
         using (var yolo = new YoloPredictor(modelPath))
         {
            // Detect objects in the image
            //var detections = yolo.Predict(image);
            var detections = yolo.Detect(image);

            // Process the detections
            foreach (var detection in detections)
            {
               // Draw bounding boxes on the image
               //image.Mutate(ctx => ctx.DrawPolygon(Rgba32.Red, 2, detection.BoundingBox.ToPolygon()));

               var rectangle = new PointF[] {new Point(detection.Bounds.Bottom, detection.Bounds.Left), new Point(detection.Bounds.Bottom, detection.Bounds.Right), new Point(detection.Bounds.Right, detection.Bounds.Top), new Point(detection.Bounds.Left, detection.Bounds.Top)};

               image.Mutate(ctx => ctx.DrawPolygon(Rgba32.ParseHex("FF0000"), 2, rectangle));
            }

            // Save the modified image to a memory stream
            using (var ms = new MemoryStream())
            {
               image.SaveAsPng(ms);
               ms.Seek(0, SeekOrigin.Begin);

               // Return the modified image as a response
               var result = new HttpResponseMessage(HttpStatusCode.OK)
               {
                  Content = new StreamContent(ms)
               };
               result.Content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("image/png");
               return new FileStreamResult(ms, "image/png");
            }
         }
      }
   }
}

When I posted an image to the function it threw an exception, because the content of the HttpRequestMessage was null.

I then asked GitHub Copilot to “Modify the code so that the image is read from the form”

// Read the image from the form
var form = await req.ReadFormAsync();
var file = form.Files["image"];
if (file == null || file.Length == 0)
{
   return new BadRequestObjectResult("Image file is missing or empty.");
}

When I posted an image to the function it returned a 400 Bad Request Error.

After inspecting the request I realized that the name field of the form data was wrong, as the generated code was looking for “image”:

Content-Disposition: form-data; name="image"; filename="sports.jpg"
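As a rough illustration of why the field name matters, the sketch below builds the same kind of multipart request from C# instead of Fiddler. The class name, method names and the functionUrl parameter are hypothetical; only the "image" part name comes from the function code above.

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class ImageUploadSketch
{
    // Builds multipart/form-data content whose file part is named "image",
    // the name the function looks up with form.Files["image"].
    public static MultipartFormDataContent BuildContent(byte[] imageBytes, string fileName)
    {
        var filePart = new ByteArrayContent(imageBytes);
        filePart.Headers.ContentType = new MediaTypeHeaderValue("image/jpeg");

        var content = new MultipartFormDataContent();
        content.Add(filePart, "image", fileName); // part name first, then file name
        return content;
    }

    // Posts the image to a function URL supplied by the caller (hypothetical endpoint).
    public static async Task<HttpResponseMessage> PostAsync(
        HttpClient client, string functionUrl, byte[] imageBytes, string fileName)
    {
        using var content = BuildContent(imageBytes, fileName);
        return await client.PostAsync(functionUrl, content);
    }
}
```

If the part were added under any other name, form.Files["image"] would return null and the function would answer with the 400 Bad Request described above.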

Then, when I posted an image to the function it returned a 500 error.

The FileStreamResult was failing, so I modified the code to return a FileContentResult.

using (var ms = new MemoryStream())
{
   image.SaveAsJpeg(ms);

   return new FileContentResult(ms.ToArray(), "image/jpeg");
}
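I did not confirm the exact cause, but one plausible explanation is that the MemoryStream was disposed by its using block before the response body was written. The sketch below reproduces that pattern with nothing but System.IO; the class and method names are hypothetical and it is not the function code itself.

```csharp
using System;
using System.IO;

public static class DisposedStreamSketch
{
    // Mirrors the failing pattern: the stream is handed back from inside a
    // using block, so it is already disposed when the caller reads it.
    public static MemoryStream ReturnStreamFromUsing(byte[] payload)
    {
        using (var ms = new MemoryStream(payload))
        {
            return ms;
        }
    }

    // Mirrors the fix: copy the bytes out before the stream is disposed.
    public static byte[] ReturnBytesFromUsing(byte[] payload)
    {
        using (var ms = new MemoryStream(payload))
        {
            return ms.ToArray();
        }
    }

    // Reports whether reading from the stream throws ObjectDisposedException.
    public static bool ReadingThrows(MemoryStream stream)
    {
        try
        {
            stream.ReadByte();
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```

Returning a byte array, as FileContentResult does, avoids the lifetime question altogether at the cost of one extra copy of the image.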

Then, when I posted an image to the function it succeeded, but the bounding boxes around the detected objects were wrong.

I then manually fixed up the polygon code so the lines for each bounding box were drawn in the correct order.

// Process the detections
foreach (var detection in detections)
{
   // Corners in perimeter order: bottom-left, bottom-right, top-right, top-left
   var rectangle = new PointF[] {
      new Point(detection.Bounds.Left, detection.Bounds.Bottom),
      new Point(detection.Bounds.Right, detection.Bounds.Bottom),
      new Point(detection.Bounds.Right, detection.Bounds.Top),
      new Point(detection.Bounds.Left, detection.Bounds.Top)
   };

   image.Mutate(ctx => ctx.DrawPolygon(Rgba32.ParseHex("FF0000"), 2, rectangle));
}

Then, when I posted an image to the function it succeeded

The bounding boxes around the detected objects were correct.
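The underlying rule is that each point is (X, Y), so Left and Right belong in the first position and Top and Bottom in the second, and consecutive corners must share an edge. A minimal library-free sketch of that check, using hypothetical names and plain integer tuples rather than the ImageSharp types:

```csharp
using System;

public static class BoundingBoxSketch
{
    // Returns the corners as (X, Y) pairs in perimeter order:
    // bottom-left, bottom-right, top-right, top-left.
    public static (int X, int Y)[] Corners(int left, int top, int right, int bottom)
    {
        return new[]
        {
            (left, bottom),
            (right, bottom),
            (right, top),
            (left, top)
        };
    }

    // True when every consecutive pair of points (including last back to first)
    // differs in exactly one coordinate, i.e. every edge is horizontal or vertical.
    public static bool IsAxisAlignedLoop((int X, int Y)[] points)
    {
        for (int i = 0; i < points.Length; i++)
        {
            var a = points[i];
            var b = points[(i + 1) % points.Length];
            bool sameX = a.X == b.X;
            bool sameY = a.Y == b.Y;
            if (sameX == sameY) return false; // diagonal edge or duplicate point
        }
        return true;
    }
}
```

Feeding the earlier point list (with Bottom in the X position) through IsAxisAlignedLoop returns false, because one of its edges runs diagonally.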

I then “refactored” the code, removing all the unused “using”s and any commented out code, and changing ILogger to be initialised using a Primary Constructor, etc.

using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;

using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;
using SixLabors.ImageSharp.Drawing.Processing;

using Compunet.YoloSharp;

public class YoloObjectDetectionFunction(ILogger<YoloObjectDetectionFunction> log)
{
   private readonly ILogger<YoloObjectDetectionFunction> _log = log;
   private readonly string modelPath = "yolov8s.onnx";

   [Function("YoloObjectDetection")]
   public async Task<IActionResult> Run([HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequest req)
   {
      _log.LogInformation("Yolo Object Detection function processed a request.");

      // Read the image from the form
      var form = await req.ReadFormAsync();
      var file = form.Files["image"];
      if (file == null || file.Length == 0)
      {
         return new BadRequestObjectResult("Image file is missing or empty.");
      }

      using (var stream = file.OpenReadStream())
      using (Image<Rgba32> image = Image.Load<Rgba32>(stream))
      {
         // Initialize YoloSharp
         using (var yolo = new YoloPredictor(modelPath))
         {
            // Detect objects in the image
            var detections = yolo.Detect(image);

            // Process the detections
            foreach (var detection in detections)
            {
               var rectangle = new PointF[] {
                  new Point(detection.Bounds.Left, detection.Bounds.Bottom),
                  new Point(detection.Bounds.Right, detection.Bounds.Bottom),
                  new Point(detection.Bounds.Right, detection.Bounds.Top),
                  new Point(detection.Bounds.Left, detection.Bounds.Top)
               };

               image.Mutate(ctx => ctx.DrawPolygon(Rgba32.ParseHex("FF0000"), 2, rectangle));
            }

            // Save the modified image to a memory stream
            using (var ms = new MemoryStream())
            {
               image.SaveAsJpeg(ms);

               return new FileContentResult(ms.ToArray(), "image/jpeg");
            }
         }
      }
   }
}

Summary

The initial code generated by Copilot was badly broken, but with the assistance of Visual Studio 2022 IntelliSense it was fixed fairly quickly. The ILogger not being initialised and me using the “wrong” form field name for the upload were easy to debug, but the FileStreamResult exception was a bit more difficult.

It took me quite a bit longer to write the function with Copilot desktop/GitHub Copilot than it would normally have taken me. But, I think a lot of this was due to having to take screenshots, writing this blog post as I went, and having already written several Azure HTTP Trigger functions for processing uploaded images.

The Copilot generated code in this post is not suitable for production.

Building Edge AI with AI-YoloSharp Client

Introduction

Using Copilot to generate the code to retrieve an image from a security camera went surprisingly well, so the next step was something a bit more complex. For this post I have used Copilot prompts to generate code which uses Ultralytics YoloV8 and Compunet YoloSharp for object detection, object classification, pose estimation, and instance segmentation.

using System;
using System.Drawing;
using YOLOSharp;

class Program
{
    static void Main(string[] args)
    {
        string modelPath = "path/to/yolov8.onnx"; // Replace with the path to your YOLOv8 model
        string imagePath = "path/to/image.jpg"; // Replace with the path to your image

        // Load the YOLOv8 model
        var yolo = new YOLO(modelPath);

        // Load the image from disk
        using (var image = Image.FromFile(imagePath))
        {
            // Run the YOLOv8 model on the image
            var results = yolo.Predict(image);

            // Display the results
            foreach (var result in results)
            {
                Console.WriteLine($"Label: {result.Label}, Confidence: {result.Confidence}, BoundingBox: {result.BoundingBox}");
            }
        }
    }
}
In this example:

Replace path/to/yolov8.onnx with the actual path to your YOLOv8 model file.

Replace path/to/image.jpg with the actual path to your image file.

This application loads the YOLOv8 model using YOLOSharp, loads an image from disk, runs the model on the image, and displays the results, including the label, confidence, and bounding box for each detected object.

Happy coding! 🚀

Copilot had “assumed” I meant object detection (reasonable) with YoloSharp, and the code didn’t compile.

Object Detection

After some minor fixes and shifting the configuration to an appsettings file the application compiled.

using Compunet.YoloSharp;
using Microsoft.Extensions.Configuration;
using SixLabors.ImageSharp;

class Program
{
   static void Main(string[] args)
   {
      Model.ApplicationSettings applicationSettings;

      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} 0.3.CameraHTTPClient starting");
#if RELEASE
      Console.WriteLine("RELEASE");
#else
      Console.WriteLine("DEBUG");
#endif
      // load the app settings into configuration
      var configuration = new ConfigurationBuilder()
         .AddJsonFile("appsettings.json", false, true)
         .Build();

      applicationSettings = configuration.GetSection("ApplicationSettings").Get<Model.ApplicationSettings>();

      // Load the YOLOv8 model
      var yolo = new YoloPredictor(applicationSettings.ModelPath);

      // Load the image from disk
      using (var image = Image.Load(applicationSettings.ImagePath))
      {
         // Run the YOLOv8 model on the image
         var results = yolo.Detect(image);

         // Display the results
         foreach (var result in results)
         {
            Console.WriteLine($"Label: {result.Name}, Confidence: {result.Confidence}, BoundingBox: {result.Bounds}");
         }
      }
   }
}
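The Model.ApplicationSettings class is not shown in this post. A minimal sketch of what it could look like follows, assuming only the two properties the code above reads (ModelPath and ImagePath); the default values are my own assumption.

```csharp
namespace Model
{
    // Hypothetical settings class bound from the "ApplicationSettings" section
    // of appsettings.json by configuration.GetSection(...).Get<T>().
    public class ApplicationSettings
    {
        // Path to the YOLOv8 ONNX model file
        public string ModelPath { get; set; } = string.Empty;

        // Path to the image to run the model on
        public string ImagePath { get; set; } = string.Empty;
    }
}
```

With this shape, appsettings.json would need an "ApplicationSettings" object containing "ModelPath" and "ImagePath" string values for the binding to populate both properties.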

I tested the implementation with the sample “sports” image from the YoloSharp GitHub repository.

The console application output looked reasonable.

I manually compared the output of the console application with the Ultralytics Hub model preview tab and the results looked close enough.

Pose

My Copilot prompt:

please write a C# console application that uses Compunet YoloSharp to run a yolov8 pose model on an image loaded from disk

using Compunet.YoloSharp;
using Microsoft.Extensions.Configuration;
using SixLabors.ImageSharp;

class Program
{
   static void Main(string[] args)
   {
      Model.ApplicationSettings applicationSettings;

      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} 0.3.CameraHTTPClient starting");
#if RELEASE
      Console.WriteLine("RELEASE");
#else
      Console.WriteLine("DEBUG");
#endif
      // load the app settings into configuration
      var configuration = new ConfigurationBuilder()
         .AddJsonFile("appsettings.json", false, true)
         .Build();

      applicationSettings = configuration.GetSection("ApplicationSettings").Get<Model.ApplicationSettings>();

      // Load the YOLOv8 pose model
      var yolo = new YoloPredictor(applicationSettings.ModelPath);

      // Load the image from disk
      using (var image = Image.Load(applicationSettings.ImagePath))
      {
         // Run the YOLOv8 pose model on the image
         var results = yolo.Pose(image);

         // Display the results
         foreach (var result in results)
         {
            Console.WriteLine($"Label: {result.Name.Name}, Confidence: {result.Confidence}, BoundingBox: {result.Bounds}");
            Console.WriteLine("Keypoints:");
            foreach (var keypoint in result)
            {
               Console.WriteLine($"  - {keypoint.Point}");
            }
         }
      }
   }
}

After some minor fixes and shifting the configuration to an appsettings file the application compiled. I tested the implementation with the sample “sports” image from the YoloSharp GitHub repository.

The console application output looked reasonable.

I manually compared the output of the console application with the Ultralytics Hub model preview tab and the results were reasonable.

Classification

My Copilot prompt:

please write a C# console application that uses Compunet YoloSharp to run a yolov8 classification model on an image loaded from disk

using Compunet.YoloSharp;
using Microsoft.Extensions.Configuration;
using SixLabors.ImageSharp;

class Program
{
   static void Main(string[] args)
   {
      Model.ApplicationSettings applicationSettings;

      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} 0.3.CameraHTTPClient starting");
#if RELEASE
      Console.WriteLine("RELEASE");
#else
      Console.WriteLine("DEBUG");
#endif

      // load the app settings into configuration
      var configuration = new ConfigurationBuilder()
         .AddJsonFile("appsettings.json", false, true)
         .Build();

      applicationSettings = configuration.GetSection("ApplicationSettings").Get<Model.ApplicationSettings>();

      // Load the YOLOv8 classification model
      var yolo = new YoloPredictor(applicationSettings.ModelPath);

      // Load the image from disk
      using (var image = Image.Load(applicationSettings.ImagePath))
      {
         // Run the YOLOv8 classification model on the image
         var results = yolo.Classify(image);

         // Display the results
         foreach (var result in results)
         {
            Console.WriteLine($"Label: {result.Name.Name}, Confidence: {result.Confidence}");
         }
      }
   }
}

After some minor fixes and shifting the configuration to an appsettings file the application compiled. I tested the implementation with the sample “toaster” image from the YoloSharp GitHub repository.

The console application output looked reasonable.

I’m pretty confident the input image was a toaster.

Summary

The Copilot prompts used to generate code which uses Ultralytics YoloV8 and Compunet YoloSharp may have produced better code with some “prompt engineering”. Using Visual Studio IntelliSense the generated code was easy to fix.

The Copilot generated code in this post is not suitable for production.

YoloV8 ONNX – Nvidia Jetson Orin Nano™ Execution Providers

The Seeedstudio reComputer J3011 has two processors, an ARM64 CPU and an Nvidia Jetson Orin 8G, which can be used for inferencing with the Open Neural Network Exchange (ONNX) Runtime.

Story of Fail

Inferencing worked first time on the ARM64 CPU because the required runtime is included in the Microsoft.ML.OnnxRuntime NuGet.

Microsoft.ML.OnnxRuntime NuGet ARM64 Linux runtime

Inferencing failed on the Nvidia Jetson Orin 8G because the CUDA Execution Provider and TensorRT Execution Provider for the ONNXRuntime were not included in the Microsoft.ML.OnnxRuntime.Gpu.Linux NuGet.

Missing ARM64 Linux GPU runtime

There were Linux x64 and Windows x64 versions of the ONNXRuntime library included in the Microsoft.ML.OnnxRuntime.Gpu NuGet.

Microsoft.ML.OnnxRuntime.Gpu NuGet x64 Linux runtime

Desperately Seeking libonnxruntime.so

The Nvidia ONNX runtime site had pip wheel files for the different versions of Python and the Open Neural Network Exchange (ONNX) Runtime.

The onnxruntime_gpu-1.18.0-cp312-cp312-linux_aarch64.whl matched the version of the ONNXRuntime I needed and the version of Python on the device.

When the pip wheel file was renamed onnxruntime_gpu-1.18.0-cp312-cp312-linux_aarch64.zip it could be opened, but there wasn’t a libonnxruntime.so.

Onnxruntime_gpu-1.18.0-cp312-cp312-linux_aarch64 file listing

Building the TensorRT & CUDA Execution Providers

The ONNXRuntime build has to be done on the Nvidia Jetson Orin itself. After installing all the necessary prerequisites, the first attempt failed.

bryn@ubuntu:~/onnxruntime/onnxruntime$ ./build.sh --config Release --update --build --build_wheel \
--use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu \
--tensorrt_home /usr/lib/aarch64-linux-gnu

When in high power mode more cores are used, but this consumes more resources when building the ONNXRuntime. To limit resource utilisation, --parallel 2 was added to the command line because the compile process was having “out of memory” failures.

bryn@ubuntu:~/onnxruntime/onnxruntime$ ./build.sh --config Release --update --build --parallel 2 --build_wheel \
--use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu \
--tensorrt_home /usr/lib/aarch64-linux-gnu

There were some compiler warnings but they appear to be benign.

The first attempt at running the application failed because libonnxruntime.so was missing, so --build_shared_lib was added to the command line.

2024-06-10 18:21:58,480 build [INFO] - Build complete
bryn@ubuntu:~/onnxruntime/onnxruntime$ ./build.sh --config Release --update --build --parallel 2 --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu --build_shared_lib

When the build completed the files were copied to the runtime folder of the program.

The application could then be configured to use the TensorRT Execution Provider.

Getting CUDA and TensorRT working on the Nvidia Jetson Orin 8G took much longer than I expected, with many dead ends and device factory resets before the process was repeatable.

YoloV8 ONNX – Nvidia Jetson Orin Nano™ CPU & GPU TensorRT Inferencing

The Seeedstudio reComputer J3011 has two processors, an ARM64 CPU and an Nvidia Jetson Orin 8G. To speed up TensorRT inferencing I built an Open Neural Network Exchange (ONNX) TensorRT Execution Provider. After updating the code to add a “warm-up” and tracking of average pre-processing, inferencing and post-processing durations, I did a series of CPU and GPU performance tests.

The testing consisted of permutations of three models, TennisBallsYoloV8s20240618640x640.onnx, TennisBallsYoloV8s202406241024x1024.onnx and TennisBallsYoloV8x20240614640x640 (limited testing as it was slow), and three images, TennisBallsLandscape640x640.jpg, TennisBallsLandscape1024x1024.jpg and TennisBallsLandscape3072x4080.jpg.

Executive Summary

As expected, inferencing with a TensorRT 640×640 model and a 640×640 image was fastest, 9mSec pre-processing, 21mSec inferencing, then 4mSec post-processing.

If the image had to be scaled with SixLabors.ImageSharp this significantly increased the preprocessing (and overall) time.
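The timing code is not shown in this post, so the following is only a sketch of one way to do a “warm-up” followed by averaged stage timings with System.Diagnostics.Stopwatch. StageTimer and BenchmarkSketch are hypothetical names, and the stage delegate stands in for pre-processing, inferencing or post-processing.

```csharp
using System;
using System.Diagnostics;

// Accumulates the elapsed time of repeated calls to one processing stage.
public sealed class StageTimer
{
    private TimeSpan _total = TimeSpan.Zero;

    public int Count { get; private set; }

    // Runs the stage once, adds its duration to the total, and returns its result.
    public T Measure<T>(Func<T> stage)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = stage();
        stopwatch.Stop();

        _total += stopwatch.Elapsed;
        Count++;
        return result;
    }

    public double AverageMilliseconds => Count == 0 ? 0.0 : _total.TotalMilliseconds / Count;
}

public static class BenchmarkSketch
{
    // Runs the stage untimed for the warm-up iterations, then times the rest.
    public static StageTimer Run(Func<int> stage, int warmUpIterations, int measuredIterations)
    {
        for (int i = 0; i < warmUpIterations; i++)
        {
            stage(); // warm-up: results and timings are discarded
        }

        var timer = new StageTimer();
        for (int i = 0; i < measuredIterations; i++)
        {
            timer.Measure(stage);
        }
        return timer;
    }
}
```

Keeping one StageTimer per stage makes it easy to see which of the three stages dominates, which is how the image scaling cost above shows up in the pre-processing figure.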

CPU Inferencing

GPU TensorRT Small model Inferencing

GPU TensorRT Large model Inferencing

Nvidia Jetson Orin Nano™ JetPack 6

The Seeedstudio reComputer J3011 has two processors, an ARM64 CPU and an Nvidia Jetson Orin 8G Coprocessor. Speeding up ML.NET running on the Nvidia Jetson Orin 8G required compatible versions of ML.NET Open Neural Network Exchange (ONNX) and NVIDIA JetPack.

Before installing NVIDIA JetPack 6 the Seeedstudio reComputer J3011 Edge AI Device has to be put into recovery mode.

Seeedstudio reComputer J3011 Edge AI Device with jumper for recovery mode

When started in recovery mode the Seeedstudio J3011 appeared in the list of Universal Serial Bus (USB) devices returned by lsusb.

An attempt to upgrade to JetPack 5.1.1, so the device could then be upgraded using the Windows Subsystem for Linux (WSL) terminal, failed. The NVIDIA SDK Manager downloads and installs all the required components and dependencies.

Installing NVIDIA JetPack 6 from the Windows Subsystem for Linux failed because the version of Ubuntu installed (Ubuntu 24.04 LTS) was not supported by NVIDIA SDK Manager.

Installing NVIDIA JetPack 6 from a desktop PC running Ubuntu 24.04 LTS failed for the same reason: the NVIDIA SDK Manager did not support that version of Ubuntu. The desktop PC was then “re-paved” with Ubuntu 22.04 LTS and NVIDIA SDK Manager worked.

An NVIDIA Developer program login is required to launch the NVIDIA SDK Manager.

Selecting the right target hardware is important if it is not “auto detected”.

The Open Neural Network Exchange (ONNX) Runtime supports the Compute Unified Device Architecture (CUDA), which has to be included in the installation package.

Downloading NVIDIA JetPack 6 and all the selected components of the install can be quite slow.

Installation of NVIDIA Jetpack 6 and selected components can take a while.

Even though Jetpack 6 is now available for Seeed’s Jetson Orin Devices this process is still applicable for an upgrade or “factory reset”.

YoloV8 ONNX – Nvidia Jetson Orin Nano™ DenseTensor Performance

When running the YoloV8 Coprocessor demonstration on the Nvidia Jetson Orin, inferencing looked a bit odd: the dotted line wasn’t moving as fast as expected. To investigate this further I split the inferencing duration into pre-processing, inferencing and post-processing times. Inferencing and post-processing were “quick”, but pre-processing was taking longer than expected.

YoloV8 Coprocessor application running on Nvidia Jetson Orin

When I ran the demonstration Ultralytics YoloV8 object detection console application on my development desktop (13th Gen Intel(R) Core(TM) i7-13700 2.10 GHz with 32.0 GB) the pre-processing was much faster.

The much shorter pre-processing and longer inferencing durations were not a surprise, as my development desktop does not have a Graphics Processing Unit (GPU).

Test image used for testing on Jetson device and development PC

The test image taken with my mobile was 3606×2715 pixels, which was representative of the security camera images to be processed by the solution.

Redgate ANTS Performance Profiler instrumentation of application execution

On my development box, running the application with Redgate ANTS Performance Profiler highlighted that the Compunet YoloV8 code converting the image to a DenseTensor could be an issue.

 public static void ProcessToTensor(Image<Rgb24> image, Size modelSize, bool originalAspectRatio, DenseTensor<float> target, int batch)
 {
    var options = new ResizeOptions()
    {
       Size = modelSize,
       Mode = originalAspectRatio ? ResizeMode.Max : ResizeMode.Stretch,
    };

    var xPadding = (modelSize.Width - image.Width) / 2;
    var yPadding = (modelSize.Height - image.Height) / 2;

    var width = image.Width;
    var height = image.Height;

    // Pre-calculate strides for performance
    var strideBatchR = target.Strides[0] * batch + target.Strides[1] * 0;
    var strideBatchG = target.Strides[0] * batch + target.Strides[1] * 1;
    var strideBatchB = target.Strides[0] * batch + target.Strides[1] * 2;
    var strideY = target.Strides[2];
    var strideX = target.Strides[3];

    // Get a span of the whole tensor for fast access
    var tensorSpan = target.Buffer;

    // Try get continuous memory block of the entire image data
    if (image.DangerousTryGetSinglePixelMemory(out var memory))
    {
       Parallel.For(0, width * height, index =>
       {
             int x = index % width;
             int y = index / width;
             int tensorIndex = strideBatchR + strideY * (y + yPadding) + strideX * (x + xPadding);

             var pixel = memory.Span[index];
             WritePixel(tensorSpan.Span, tensorIndex, pixel, strideBatchR, strideBatchG, strideBatchB);
       });
    }
    else
    {
       Parallel.For(0, height, y =>
       {
             var rowSpan = image.DangerousGetPixelRowMemory(y).Span;
             int tensorYIndex = strideBatchR + strideY * (y + yPadding);

             for (int x = 0; x < width; x++)
             {
                int tensorIndex = tensorYIndex + strideX * (x + xPadding);
                var pixel = rowSpan[x];
                WritePixel(tensorSpan.Span, tensorIndex, pixel, strideBatchR, strideBatchG, strideBatchB);
             }
       });
    }
 }

 private static void WritePixel(Span<float> tensorSpan, int tensorIndex, Rgb24 pixel, int strideBatchR, int strideBatchG, int strideBatchB)
 {
    tensorSpan[tensorIndex] = pixel.R / 255f;
    tensorSpan[tensorIndex + strideBatchG - strideBatchR] = pixel.G / 255f;
    tensorSpan[tensorIndex + strideBatchB - strideBatchR] = pixel.B / 255f;
 }

For a 3606×2715 image the WritePixel method would be called once per pixel, nearly ten million times (3606 × 2715 = 9,790,290), so its implementation and the overall approach used for ProcessToTensor have a significant impact on performance.
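To make the per-pixel work concrete, here is a simplified, library-free sketch of the same idea: copying interleaved RGB bytes into a planar, normalised float buffer. It is not the Compunet implementation; it assumes a single image, no padding and a plain float array in place of DenseTensor, and the names are hypothetical.

```csharp
using System;

public static class TensorLayoutSketch
{
    // Converts interleaved RGB bytes (R,G,B,R,G,B,...) into a planar
    // [channel, y, x] float array with each value scaled to 0..1.
    public static float[] ToPlanar(byte[] rgb, int width, int height)
    {
        if (rgb.Length != width * height * 3)
        {
            throw new ArgumentException("Expected width * height * 3 bytes.", nameof(rgb));
        }

        int planeSize = width * height; // offset between the R, G and B planes
        var tensor = new float[3 * planeSize];

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int pixel = y * width + x; // index within one colour plane
                int source = pixel * 3;    // index of this pixel's R byte

                tensor[pixel] = rgb[source] / 255f;                     // R plane
                tensor[planeSize + pixel] = rgb[source + 1] / 255f;     // G plane
                tensor[2 * planeSize + pixel] = rgb[source + 2] / 255f; // B plane
            }
        }
        return tensor;
    }
}
```

Each pixel costs three divisions and three scattered writes, which is why the number of pixels reaching this stage matters so much.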

YoloV8 Coprocessor application running on Nvidia Jetson Orin with a resized image

Resizing the images had a significant impact on performance on the development box and Nividia Jetson Orin. This will need some investigation to see how much reducing the resizing the images impacts on the performance and accuracy of the model.

The ProcessToTensor method has already had some performance optimisations which improved performance by roughly 20%. There have been discussions about optimising similar code, e.g. Efficient Bitmap to OnnxRuntime Tensor in C# and Efficient RGB Image to Tensor in dotnet, which look applicable and will be evaluated.

YoloV8 ONNX – Nvidia Jetson Orin Nano™ GPU TensorRT Inferencing

The Seeedstudio reComputer J3011 has two processors, an ARM64 CPU and an Nvidia Jetson Orin 8G. To speed up inferencing on the Nvidia Jetson Orin 8G with TensorRT I built an Open Neural Network Exchange (ONNX) TensorRT Execution Provider.

Roboflow Universe Tennis Ball by Ugur ozdemir dataset

The Open Neural Network Exchange (ONNX) model used was trained on the Roboflow Universe Tennis Ball dataset by Ugur Ozdemir, which has 23696 images. The initial version of the TensorRT integration used the builder.UseTensorrt method of the IYoloV8Builder interface.

...
YoloV8Builder builder = new YoloV8Builder();

builder.UseOnnxModel(_applicationSettings.ModelPath);

if (_applicationSettings.UseTensorrt)
{
   Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Using TensorRT");

   builder.UseTensorrt(_applicationSettings.DeviceId);
}
...

When the YoloV8.Coprocessor.Detect.Image application was configured to use the NVIDIA TensorRT Execution provider the average inference time was 58mSec but it took roughly 7 minutes to build and optimise the engine each time the application was run.
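When comparing numbers like these it helps to time the one-off start-up step separately from the repeated inference step. The following is a minimal sketch of that measurement, assuming the start-up and inference work can each be wrapped in a delegate; StartupTimingSketch and Measure are hypothetical names and are not part of the application.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: separates one-off start-up cost from steady-state cost.
public static class StartupTimingSketch
{
    // Times a one-off setup step (e.g. building the predictor) and then
    // averages the time of repeated calls to the work step (e.g. one inference).
    public static (TimeSpan Setup, TimeSpan AveragePerCall) Measure(Action setup, Action work, int iterations)
    {
        if (iterations <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(iterations));
        }

        var stopwatch = Stopwatch.StartNew();
        setup();
        TimeSpan setupTime = stopwatch.Elapsed;

        stopwatch.Restart();
        for (int i = 0; i < iterations; i++)
        {
            work();
        }
        TimeSpan average = TimeSpan.FromTicks(stopwatch.Elapsed.Ticks / iterations);

        return (setupTime, average);
    }
}
```

In an application like the one above, the setup delegate would wrap the Build call and the work delegate would wrap a single detection, so a slow engine build shows up in Setup rather than skewing the inference average.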

Generating the TensorRT engine every time the application is started

The TensorRT Execution provider has a number of configuration options, but the IYoloV8Builder interface had to be modified, with UseCuda, UseRocm, UseTensorrt and UseTvm overloads implemented, to allow additional configuration settings.

...
public class YoloV8Builder : IYoloV8Builder
{
...
    public IYoloV8Builder UseOnnxModel(BinarySelector model)
    {
        _model = model;

        return this;
    }

#if GPURELEASE
    public IYoloV8Builder UseCuda(int deviceId) => WithSessionOptions(SessionOptions.MakeSessionOptionWithCudaProvider(deviceId));

    public IYoloV8Builder UseCuda(OrtCUDAProviderOptions options) => WithSessionOptions(SessionOptions.MakeSessionOptionWithCudaProvider(options));

    public IYoloV8Builder UseRocm(int deviceId) => WithSessionOptions(SessionOptions.MakeSessionOptionWithRocmProvider(deviceId));
    
    // Couldn't test this, I don't have suitable hardware
    public IYoloV8Builder UseRocm(OrtROCMProviderOptions options) => WithSessionOptions(SessionOptions.MakeSessionOptionWithRocmProvider(options));

    public IYoloV8Builder UseTensorrt(int deviceId) => WithSessionOptions(SessionOptions.MakeSessionOptionWithTensorrtProvider(deviceId));

    public IYoloV8Builder UseTensorrt(OrtTensorRTProviderOptions options) => WithSessionOptions(SessionOptions.MakeSessionOptionWithTensorrtProvider(options));

    // Couldn't test this, I don't have suitable hardware
    public IYoloV8Builder UseTvm(string settings = "") => WithSessionOptions(SessionOptions.MakeSessionOptionWithTvmProvider(settings));
#endif
...
}

The trt_engine_cache_enable and trt_engine_cache_path TensorRT Execution provider session options configure the engine to be cached the first time it is built, so when a new inference session is created the engine can be loaded directly from disk.

...
YoloV8Builder builder = new YoloV8Builder();

builder.UseOnnxModel(_applicationSettings.ModelPath);

if (_applicationSettings.UseTensorrt)
{
   Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Using TensorRT");

   OrtTensorRTProviderOptions tensorRToptions = new OrtTensorRTProviderOptions();

   Dictionary<string, string> optionKeyValuePairs = new Dictionary<string, string>();

   optionKeyValuePairs.Add("trt_engine_cache_enable", "1");
   optionKeyValuePairs.Add("trt_engine_cache_path", "enginecache/");

   tensorRToptions.UpdateOptions(optionKeyValuePairs);

   builder.UseTensorrt(tensorRToptions);
}
...

To validate that the engine loaded from the trt_engine_cache_path is usable for the current inference, an engine profile is also cached and loaded along with the engine.

If current input shapes are in the range of the engine profile, the loaded engine can be safely used. If input shapes are out of range, the profile will be updated and the engine will be recreated based on the new profile.
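The range check described above can be pictured with a small sketch. This is only an illustration of the idea, not how ONNX Runtime or TensorRT implement it; ShapeProfileSketch and its members are hypothetical names.

```csharp
using System;

// Hypothetical illustration of an "engine profile": the minimum and maximum
// size allowed for each input dimension when the engine was built.
public sealed class ShapeProfileSketch
{
    private readonly long[] _min;
    private readonly long[] _max;

    public ShapeProfileSketch(long[] min, long[] max)
    {
        if (min.Length != max.Length)
        {
            throw new ArgumentException("min and max must have the same rank.");
        }

        _min = (long[])min.Clone();
        _max = (long[])max.Clone();
    }

    // True when every dimension of the shape lies inside the cached range,
    // i.e. the cached engine could be reused without a rebuild.
    public bool Covers(long[] shape)
    {
        if (shape.Length != _min.Length)
        {
            return false;
        }

        for (int i = 0; i < shape.Length; i++)
        {
            if (shape[i] < _min[i] || shape[i] > _max[i])
            {
                return false;
            }
        }

        return true;
    }
}
```

For example, a profile whose minimum and maximum are both 1×3×640×640 covers a 1×3×640×640 input but not a 1×3×1280×1280 one, which in the scheme described above would trigger an engine rebuild.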

Reusing the TensorRT engine built the first time the application is started

When the YoloV8.Coprocessor.Detect.Image application was configured to use NVIDIA TensorRT and the engine was cached the average inference time was 58mSec and the Build method took roughly 10sec to execute after the application had been run once.

trtexec console application output

The trtexec utility can “pre-generate” engines but there doesn’t appear to be a way to use them with the TensorRT Execution provider.

YoloV8 ONNX – Nvidia Jetson Orin Nano™ GPU CUDA Inferencing

The Seeedstudio reComputer J3011 has two processors, an ARM64 CPU and an Nvidia Jetson Orin 8G. To speed up inferencing on the Nvidia Jetson Orin 8G with Compute Unified Device Architecture (CUDA) I built an Open Neural Network Exchange (ONNX) CUDA Execution Provider.

The Open Neural Network Exchange (ONNX) model used was trained on the Roboflow Universe Tennis Ball dataset by Ugur Ozdemir, which has 23696 images.

// load the app settings into configuration
var configuration = new ConfigurationBuilder()
      .AddJsonFile("appsettings.json", false, true)
.Build();

_applicationSettings = configuration.GetSection("ApplicationSettings").Get<Model.ApplicationSettings>();

Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model load: {_applicationSettings.ModelPath}");

YoloV8Builder builder = new YoloV8Builder();

builder.UseOnnxModel(_applicationSettings.ModelPath);

if (_applicationSettings.UseCuda)
{
   builder.UseCuda(_applicationSettings.DeviceId);
}

if (_applicationSettings.UseTensorrt)
{
   builder.UseTensorrt(_applicationSettings.DeviceId);
}

/*
builder.WithConfiguration(c =>
{
});
*/

/*
builder.WithSessionOptions(new Microsoft.ML.OnnxRuntime.SessionOptions()
{

});
*/

using (var image = await SixLabors.ImageSharp.Image.LoadAsync<Rgba32>(_applicationSettings.ImageInputPath))
using (var predictor = builder.Build())
{
   var result = await predictor.DetectAsync(image);

   Console.WriteLine();
   Console.WriteLine($"Speed: {result.Speed}");
   Console.WriteLine();

   foreach (var prediction in result.Boxes)
   {
      Console.WriteLine($" Class {prediction.Class} {(prediction.Confidence * 100.0):f1}% X:{prediction.Bounds.X} Y:{prediction.Bounds.Y} Width:{prediction.Bounds.Width} Height:{prediction.Bounds.Height}");
   }

   Console.WriteLine();

   Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Plot and save : {_applicationSettings.ImageOutputPath}");

   using (var imageOutput = await result.PlotImageAsync(image))
   {
      await imageOutput.SaveAsJpegAsync(_applicationSettings.ImageOutputPath);
   }
}
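The code above binds its configuration to a Model.ApplicationSettings class that is not shown. A minimal sketch of what that class might look like follows; the property names are taken from the code above, but the types and default values are assumptions.

```csharp
namespace Model
{
    // Sketch of the settings class bound from the "ApplicationSettings" section
    // of appsettings.json. Property names come from the code above; the types
    // and default values are assumptions.
    public class ApplicationSettings
    {
        public string ModelPath { get; set; } = string.Empty;
        public bool UseCuda { get; set; }
        public bool UseTensorrt { get; set; }
        public int DeviceId { get; set; }
        public string ImageInputPath { get; set; } = string.Empty;
        public string ImageOutputPath { get; set; } = string.Empty;
    }
}
```

With this shape, setting both UseCuda and UseTensorrt to true would call both builder methods, because the code above does not guard against that combination.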

When the YoloV8.Coprocessor.Detect.Image application was configured to run on the ARM64 CPU the average inference time was 729 mSec.

The first time I ran the YoloV8.Coprocessor.Detect.Image application configured to use CUDA for inferencing it failed badly.

The YoloV8.Coprocessor.Detect.Image application was then configured to use CUDA and the average inferencing time was 85mSec.
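As a quick worked comparison of the averages quoted in these posts, the speed-up is simply the ratio of the two times. SpeedupSketch is a hypothetical helper used only for this illustration.

```csharp
using System;

// Hypothetical helper for comparing two average inference times.
public static class SpeedupSketch
{
    // Ratio of a baseline time to an accelerated time, e.g. CPU vs GPU.
    public static double Speedup(double baselineMilliseconds, double acceleratedMilliseconds)
    {
        if (acceleratedMilliseconds <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(acceleratedMilliseconds));
        }

        return baselineMilliseconds / acceleratedMilliseconds;
    }
}
```

With the figures above, Speedup(729, 85) is roughly 8.6, so CUDA inferencing was about 8.6 times faster than the ARM64 CPU; with the 58mSec TensorRT average, Speedup(729, 58) is roughly 12.6.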

It took a couple of weeks to get the YoloV8.Coprocessor.Detect.Image application inferencing on the Nvidia Jetson Orin 8G coprocessor and this will be covered in detail in another post.

YoloV8 ONNX – Nvidia Jetson Orin Nano™ ARM64 CPU Inferencing

I configured the demonstration Ultralytics YoloV8 object detection(yolov8s.onnx) console application to process a 1920×1080 image from a security camera on my desktop development box (13th Gen Intel(R) Core(TM) i7-13700 2.10 GHz with 32.0 GB)

Object Detection sample application running on my development box

A Seeedstudio reComputer J3011 uses an Nvidia Jetson Orin 8G and looked like a cost-effective platform to explore how a dedicated Artificial Intelligence (AI) co-processor could reduce inferencing times.

To establish a “baseline” I “published” the demonstration application on my development box which created a folder with all the files required to run the application on the Seeedstudio reComputer J3011 ARM64 CPU. I had to manually merge the “User Secrets” and appsettings.json files so the camera connection configuration was correct.

The runtimes folder contained a number of folders with the native runtime files for the supported Open Neural Network Exchange(ONNX) platforms

Object Detection application publish runtimes folder

The Nvidia Jetson Orin ARM64 CPU requires the linux-arm64 ONNX runtime, which was “automagically” detected (in previous versions of ML.Net the native runtime had to be copied to the execution directory).
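To see which platform .NET thinks it is running on, the standard RuntimeInformation class can be queried. This is a minimal sketch; RuntimeProbeSketch is a hypothetical name, and on some .NET versions the runtime identifier may be distribution-specific rather than the portable linux-arm64 form.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper that reports the platform of the current process.
public static class RuntimeProbeSketch
{
    // Describes the platform the process is running on. The runtime identifier
    // is the kind of name used for the folders under "runtimes".
    public static string Describe()
    {
        return $"RID: {RuntimeInformation.RuntimeIdentifier}, " +
               $"process architecture: {RuntimeInformation.ProcessArchitecture}, " +
               $"OS: {RuntimeInformation.OSDescription}";
    }
}
```

Calling Console.WriteLine(RuntimeProbeSketch.Describe()) on the device is a quick way to confirm which of the published runtimes folders applies.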

Linux ONNX ARM64 runtime

The final step was to use the demonstration Ultralytics YoloV8 object detection(yolov8s.onnx) console application to process a 1920×1080 image from a security camera on the reComputer J3011 (6-core Arm® Cortex®64-bit CPU 1.5Ghz processor)

Object Detection sample application running on my Seeedstudio reComputer J3011

When I averaged the pre-processing, inferencing and post-processing times for both devices over 20 executions my development box was much faster, which was not a surprise, though the reComputer J3011 post-processing times were a bit faster than I was expecting.

ARM64 CPU: Preprocess 0.05s, Inference 0.31s, Postprocess 0.05s
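The averaging described above can be sketched as follows, assuming the three timings of each run have been collected as TimeSpan values. RunTiming and TimingAverageSketch are hypothetical names for this illustration and are not types from the YoloV8 library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container for the three timings of one run.
public sealed class RunTiming
{
    public RunTiming(TimeSpan preprocess, TimeSpan inference, TimeSpan postprocess)
    {
        Preprocess = preprocess;
        Inference = inference;
        Postprocess = postprocess;
    }

    public TimeSpan Preprocess { get; }
    public TimeSpan Inference { get; }
    public TimeSpan Postprocess { get; }
}

public static class TimingAverageSketch
{
    // Averages each stage independently across all runs.
    public static RunTiming Average(IReadOnlyCollection<RunTiming> runs)
    {
        if (runs.Count == 0)
        {
            throw new ArgumentException("At least one run is required.", nameof(runs));
        }

        return new RunTiming(
            TimeSpan.FromTicks((long)runs.Average(r => r.Preprocess.Ticks)),
            TimeSpan.FromTicks((long)runs.Average(r => r.Inference.Ticks)),
            TimeSpan.FromTicks((long)runs.Average(r => r.Postprocess.Ticks)));
    }
}
```

Averaging each stage separately keeps a slow first run (for example model warm-up) visible in the inference figure without hiding differences in the pre-processing and post-processing stages.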

Training a model with Azure AI Machine Learning

I needed the Tennis Ball dataset by Ugur Ozdemir in a format I could use to train a model with the Visual Studio 2022 ML.Net support. The first step was to export the Tennis Ball dataset in COCO (Common Objects in Context) format.

Exporting Tennis ball dataset in COCO format
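A COCO export is a single JSON document with "images", "annotations" and "categories" arrays, where each annotation refers to its image by "image_id" and stores its box as [x, y, width, height]. The following is a minimal sketch for sanity-checking an export by counting boxes per image; CocoSketch and CountBoxesPerImage are hypothetical names, and the JSON text is assumed to have been read from the exported annotations file.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for inspecting a COCO-format annotations document.
public static class CocoSketch
{
    // Counts bounding-box annotations per image file in a COCO-format JSON document.
    public static Dictionary<string, int> CountBoxesPerImage(string cocoJson)
    {
        using JsonDocument document = JsonDocument.Parse(cocoJson);
        JsonElement root = document.RootElement;

        // Map image id -> file name from the "images" array.
        var fileNames = new Dictionary<long, string>();
        var counts = new Dictionary<string, int>();

        foreach (JsonElement image in root.GetProperty("images").EnumerateArray())
        {
            string fileName = image.GetProperty("file_name").GetString() ?? string.Empty;
            fileNames[image.GetProperty("id").GetInt64()] = fileName;
            counts[fileName] = 0;
        }

        // Each annotation points back to its image through "image_id".
        foreach (JsonElement annotation in root.GetProperty("annotations").EnumerateArray())
        {
            long imageId = annotation.GetProperty("image_id").GetInt64();

            if (fileNames.TryGetValue(imageId, out var fileName))
            {
                counts[fileName]++;
            }
        }

        return counts;
    }
}
```

Images that end up with a count of zero have no labelled tennis balls, which is worth knowing before spending hours on a training run.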

My development box doesn’t have a suitable GPU for the Local(GPU) option, and Local(CPU) training failed.

Local CPU selected for model training

After a couple of hours of training in Visual Studio 2022 the output “Loss” value was NaN and the training didn’t end successfully.

Local CPU model training failure

Training with Local(CPU) failed so I then tried again with the ML.Net Azure Machine Learning option.

Azure Machine Learning selected for model training

The configuration of my Azure Machine Learning experiment, which represents the collection of trials used, took much longer than expected.

Insufficient SKUs available in Australia East

Initially my subscription had insufficient Standard NC4as_T4_v3 SKUs in Australia East so I had to request a quota increase, which took a couple of support tickets.

Training Environment Provisioned
Uploading the model training dataset

I do wonder why they include Microsoft’s Visual Object Tagging Tool(VOTT) format as an option because there has been no work done on the project since late 2021.

Uploading the model validation dataset

I need to check how the Roboflow dataset was loaded (I think possibly only the training dataset was loaded, so that was split into training and test datasets) and trial different configurations.

I like the machine generated job names “frank machine”, “tough fowl” and “epic chicken”.

Azure Machine Learning Job list

I found my Ultralytics YoloV8 model coped better with different backgrounds and tennis ball colours.

Evaluating model with tennis balls on my living room floor
Evaluating model with tennis balls on the office floor

I used the “generated” code to consume the model with a simple console application.

Visual Studio 2022 ML.Net Integration client code generation
static async Task Main()
{
   Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} FasterrCNNResnet50 client starting");

   try
   {
      // load the app settings into configuration
      var configuration = new ConfigurationBuilder()
            .AddJsonFile("appsettings.json", false, true)
      .Build();

      Model.ApplicationSettings _applicationSettings = configuration.GetSection("ApplicationSettings").Get<Model.ApplicationSettings>();

      // Create single instance of sample data from first line of dataset for model input
      var image = MLImage.CreateFromFile(_applicationSettings.ImageInputPath);

      AzureObjectDetection.ModelInput sampleData = new AzureObjectDetection.ModelInput()
      {
         ImageSource = image,
      };

      // Make a single prediction on the sample data and print results.
      var predictionResult = AzureObjectDetection.Predict(sampleData);

      Console.WriteLine("Predicted Boxes:");
      Console.WriteLine(predictionResult);
   }
   catch (Exception ex)
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} Prediction failed {ex.Message}");
   }

   Console.WriteLine("Press ENTER to exit");
   Console.ReadLine();
}

The initial model was detecting only 28 (with much lower confidences) of the 30 tennis balls in the sample images.

Output of console application with object detection information

I used the “default configuration” settings and ran the model training for 17.5 hours overnight which cost roughly USD24.

Azure Pricing Calculator estimate for my training setup

This post is not about how to train a “good” model; it is about the approach I took to create a “proof of concept” model for a demonstration.