Timer, using, Garbage Collection & Await

Initially, my Ultralytics YoloV8-based unicorn/not-unicorn classification model test application would run overnight, processing images retrieved from a security camera using an HTTP GET.

A unicorn with 86% confidence
Test Application DEBUG build

When I changed to a release build, the System.Threading.Timer TimerCallback would only be called once.

Test Application RELEASE build failure

After some debugging I found that if I added a using statement the TimerCallback was called reliably.

static async Task Main()
{
   Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} SecurityCameraImage starting");
#if RELEASE
   Console.WriteLine("RELEASE");
#else
   Console.WriteLine("DEBUG");
#endif
   try
   {
      // load the app settings into configuration
      var configuration = new ConfigurationBuilder()
            .AddJsonFile("appsettings.json", false, true)
      .AddUserSecrets<Program>()
      .Build();

      _applicationSettings = configuration.GetSection("ApplicationSettings").Get<Model.ApplicationSettings>();

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss} press <ctrl^c> to exit Due:{_applicationSettings.ImageTimerDue} Period:{_applicationSettings.ImageTimerPeriod}");

      NetworkCredential networkCredential = new(_applicationSettings.CameraUserName, _applicationSettings.CameraUserPassword);

      using (_httpClient = new HttpClient(new HttpClientHandler { PreAuthenticate = true, Credentials = networkCredential }))
      {
#if true
         Console.WriteLine("Using - NO");
         Timer imageUpdatetimer = new(ImageUpdateTimerCallback, null, _applicationSettings.ImageTimerDue, _applicationSettings.ImageTimerPeriod); // Debug only
#else
         Console.WriteLine("Using - YES");
         using Timer imageUpdatetimer = new(ImageUpdateTimerCallback, null, _applicationSettings.ImageTimerDue, _applicationSettings.ImageTimerPeriod); // Release works
#endif
         {
            try
            {
               await Task.Delay(Timeout.Infinite);
            }
            catch (TaskCanceledException)
            {
               Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} Application shutown requested");
            }
         }
      }
   }
   catch (Exception ex)
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} Application shutown failure {ex.Message}", ex);
   }
}
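
For context, a sketch of the appsettings.json that the code above binds to. The property names come from the code; the values are illustrative only (the camera credentials could equally come from user secrets).

{
  "ApplicationSettings": {
    "CameraUrl": "http://10.0.0.50/snapshot.jpg",
    "CameraUserName": "camerauser",
    "CameraUserPassword": "notmyrealpassword",
    "ImageInputPath": "ImageInput.jpg",
    "ImageTimerDue": "0.00:00:15",
    "ImageTimerPeriod": "0.00:00:30"
  }
}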

private static async void ImageUpdateTimerCallback(object? state)
{
   Console.WriteLine("Timer start");

   // Just in case - stop this code being called while a photo is already in progress
   if (_cameraBusy)
   {
      return;
   }
   _cameraBusy = true;

   try
   {
      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Security Camera Image download start");

      using (Stream cameraStream = await _httpClient.GetStreamAsync(_applicationSettings.CameraUrl))
      using (FileStream fileStream = File.Open(_applicationSettings.ImageInputPath, FileMode.Create))
      {
         await cameraStream.CopyToAsync(fileStream);
      }

      Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} Security Camera Image download done");
   }
   catch (Exception ex)
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} Security camera image download failed {ex.Message}");
   }
   finally
   {
      _cameraBusy = false;
   }
   Console.WriteLine("Timer done");
}
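
Since System.Threading.Timer callbacks can overlap if one overruns the period, the bool check-then-set above is not atomic. A sketch of a more robust guard using Interlocked, replacing the bool _cameraBusy field with an int:

private static int _cameraBusy; // 0 = idle, 1 = download in progress

private static async void ImageUpdateTimerCallback(object? state)
{
   // Atomically claim the busy flag; bail out if another callback already holds it
   if (Interlocked.CompareExchange(ref _cameraBusy, 1, 0) != 0)
   {
      return;
   }

   try
   {
      // ... download and save the camera image as above ...
   }
   finally
   {
      Interlocked.Exchange(ref _cameraBusy, 0); // release the flag
   }
}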

I assume that in the release build the code was “optimised” and the Garbage Collector (GC) was more aggressively freeing resources. Without the using statement there is no live reference to the Timer after its last use, so the GC is free to collect it and the callbacks stop.
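
An alternative to the using statement is to keep the reference alive explicitly; a minimal sketch:

Timer imageUpdatetimer = new(ImageUpdateTimerCallback, null, _applicationSettings.ImageTimerDue, _applicationSettings.ImageTimerPeriod);

try
{
   await Task.Delay(Timeout.Infinite);
}
catch (TaskCanceledException)
{
}

// Stops the JIT/GC treating the timer as unreachable before this point
GC.KeepAlive(imageUpdatetimer);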

Test Application RELEASE Build running

YoloV8 NuGet on a diet – Part 1

Recently most of my YoloV8 projects run inference on edge devices or in Azure and do not use Graphics Processing Unit (GPU) hardware. Most of my projects use the dme-compunet YoloV8 NuGet, and for compatibility reasons (the extended CUDA & TensorRT configuration functionality was later removed) this post uses version 4.2 of the source.

The initial version of the YoloV8.dll in version 4.2 of the NuGet was 96.5 KB. Most of my applications deployed to edge devices and Azure do not require plotting functionality, so I started by commenting out the plotting code (not a terribly subtle approach).

For the intermediate version of the YoloV8.dll, putting the version 4.2 NuGet on a “diet” by “commenting out” the plotting functionality also meant the SixLabors.ImageSharp.Drawing NuGet could be removed.

The final release version of the YoloV8.dll from version 4.2 of the NuGet was 64.0 KB. The main purpose of this process was to remove unnecessary functionality and see how hard it would be to replace SixLabors.ImageSharp and SixLabors.ImageSharp.Drawing with libraries that have better performance.

Azure Function SendGrid Binding Fail

This post is for Azure Function developers having issues with the SendGrid binding throwing exceptions like the one below.

System.Private.CoreLib: Exception while executing function: Functions.AzureBlobFileUploadEmailer. Microsoft.Azure.WebJobs.Extensions.SendGrid: A 'To' address must be specified for the message

My Azure BlobTrigger Function sends an email (with SendGrid) when a file is uploaded to an Azure Blob Storage Container (a couple of times a day).

public class FileUploadEmailer(ILogger<FileUploadEmailer> logger, IOptions<EmailSettings> emailSettings)
{
   private readonly ILogger<FileUploadEmailer> _logger = logger;
   private readonly EmailSettings _emailSettings = emailSettings.Value;

   [Function("AzureBlobFileUploadEmailer")]
   [SendGridOutput(ApiKey = "SendGridAPIKey")]
   public string Run([BlobTrigger("filestobeprocesed/{name}", Connection = "upload-file-storage")] Stream stream, string name)
   {
      _logger.LogInformation("FileUploadEmailer Blob trigger function Processed blob Name:{0} start", name);

      try
      {
         var message = new SendGridMessage();

         message.SetFrom(_emailSettings.From);
         message.AddTo(_emailSettings.To);
         message.Subject = _emailSettings.Subject;

         message.AddContent(MimeType.Html, string.Format(_emailSettings.BodyFormat, name, DateTime.UtcNow));

         // WARNING - Use Newtonsoft JSON serializer to produce JSON string. System.Text.Json won't work because property annotations are different
         var messageJson = Newtonsoft.Json.JsonConvert.SerializeObject(message);

         _logger.LogInformation("FileUploadEmailer Blob trigger function Processed blob Name:{0} finish", name);

         return messageJson;
      }
      catch (Exception ex)
      {
         _logger.LogError(ex, "FileUploadEmailer Blob trigger function Processed blob Name: {0}", name);

         throw;
      }
   }
}

I missed the first clue when I looked at the JSON: the Tos, Ccs, and Bccs property names.

{
   "From": { "Name": "Foo", "Email": "bryn.lewis@devmobile.co.nz" },
   "Subject": "Hi 30/09/2024 1:27:49 pm",
   "Personalizations": [
      {
         "Tos": [ { "Name": "Bar", "Email": "bryn.lewis@devmobile.co.nz" } ],
         "Ccs": null,
         "Bccs": null,
         "From": null,
         "Subject": null,
         "Headers": null,
         "Substitutions": null,
         "CustomArgs": null,
         "SendAt": null,
         "TemplateData": null
      }
   ],
   "Contents": [ { "Type": "text/html", "Value": "\u003Ch2\u003EHello AssemblyInfo.cs\u003C/h2\u003E" } ],
   "PlainTextContent": null,
   "HtmlContent": null,
   "Attachments": null,
   "TemplateId": null,
   "Headers": null,
   "Sections": null,
   "Categories": null,
   "CustomArgs": null,
   "SendAt": null,
   "Asm": null,
   "BatchId": null,
   "IpPoolName": null,
   "MailSettings": null,
   "TrackingSettings": null,
   "ReplyTo": null,
   "ReplyTos": null
}

I wasn’t paying close enough attention to the sample code and used System.Text.Json rather than Newtonsoft.Json to serialize the SendGridMessage object. They use different attributes for property names etc., so the JSON generated was wrong.
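
The difference is easy to demonstrate; a minimal sketch:

// Wrong: System.Text.Json ignores the Newtonsoft [JsonProperty] attributes on the
// SendGrid helper classes, so the C# property names ("Tos", "Ccs", "Bccs") leak
// into the JSON.
string wrongJson = System.Text.Json.JsonSerializer.Serialize(message);

// Right: Newtonsoft.Json honours the [JsonProperty] attributes, producing the
// "to", "cc" and "bcc" property names the SendGrid API expects.
string messageJson = Newtonsoft.Json.JsonConvert.SerializeObject(message);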

Initially, I tried adding System.Text.Json attributes to the SendGridMessage class.

namespace SendGrid.Helpers.Mail
{
   /// <summary>
   /// Class SendGridMessage builds an object that sends an email through Twilio SendGrid.
   /// </summary>
   [JsonObject(IsReference = false)]
   public class SendGridMessage
   {
      /// <summary>
      /// Gets or sets an email object containing the email address and name of the sender. Unicode encoding is not supported for the from field.
      /// </summary>
      //[JsonProperty(PropertyName = "from")]
      [JsonPropertyName("from")]
      public EmailAddress From { get; set; }

      /// <summary>
      /// Gets or sets the subject of your email. This may be overridden by personalizations[x].subject.
      /// </summary>
      //[JsonProperty(PropertyName = "subject")]
      [JsonPropertyName("subject")]
      public string Subject { get; set; }

      /// <summary>
      /// Gets or sets a list of messages and their metadata. Each object within personalizations can be thought of as an envelope - it defines who should receive an individual message and how that message should be handled. For more information, please see our documentation on Personalizations. Parameters in personalizations will override the parameters of the same name from the message level.
      /// https://sendgrid.com/docs/Classroom/Send/v3_Mail_Send/personalizations.html.
      /// </summary>
      //[JsonProperty(PropertyName = "personalizations", IsReference = false)]
      [JsonPropertyName("personalizations")]
      public List<Personalization> Personalizations { get; set; }
...
}

SendGridMessage uses other classes like EmailAddress, which worked because the property names matched the JSON.

namespace SendGrid.Helpers.Mail
{
    /// <summary>
    /// An email object containing the email address and name of the sender or recipient.
    /// </summary>
    [JsonObject(IsReference = false)]
    public class EmailAddress : IEquatable<EmailAddress>
    {
        /// <summary>
        /// Initializes a new instance of the <see cref="EmailAddress"/> class.
        /// </summary>
        public EmailAddress()
        {
        }

        /// <summary>
        /// Initializes a new instance of the <see cref="EmailAddress"/> class.
        /// </summary>
        /// <param name="email">The email address of the sender or recipient.</param>
        /// <param name="name">The name of the sender or recipient.</param>
        public EmailAddress(string email, string name = null)
        {
            this.Email = email;
            this.Name = name;
        }

        /// <summary>
        /// Gets or sets the name of the sender or recipient.
        /// </summary>
        [JsonProperty(PropertyName = "name")]
        public string Name { get; set; }

        /// <summary>
        /// Gets or sets the email address of the sender or recipient.
        /// </summary>
        [JsonProperty(PropertyName = "email")]
        public string Email { get; set; }
...
}

Many of the property name “mismatch” issues were in the Personalization class with the Tos, Ccs, Bccs etc. properties.

namespace SendGrid.Helpers.Mail
{
    /// <summary>
    /// An array of messages and their metadata. Each object within personalizations can be thought of as an envelope - it defines who should receive an individual message and how that message should be handled. For more information, please see our documentation on Personalizations. Parameters in personalizations will override the parameters of the same name from the message level.
    /// https://sendgrid.com/docs/Classroom/Send/v3_Mail_Send/personalizations.html.
    /// </summary>
    [JsonObject(IsReference = false)]
    public class Personalization
    {
        /// <summary>
        /// Gets or sets an array of recipients. Each email object within this array may contain the recipient’s name, but must always contain the recipient’s email.
        /// </summary>
        [JsonProperty(PropertyName = "to", IsReference = false)]
        [JsonConverter(typeof(RemoveDuplicatesConverter<EmailAddress>))]
        public List<EmailAddress> Tos { get; set; }

        /// <summary>
        /// Gets or sets an array of recipients who will receive a copy of your email. Each email object within this array may contain the recipient’s name, but must always contain the recipient’s email.
        /// </summary>
        [JsonProperty(PropertyName = "cc", IsReference = false)]
        [JsonConverter(typeof(RemoveDuplicatesConverter<EmailAddress>))]
        public List<EmailAddress> Ccs { get; set; }

        /// <summary>
        /// Gets or sets an array of recipients who will receive a blind carbon copy of your email. Each email object within this array may contain the recipient’s name, but must always contain the recipient’s email.
        /// </summary>
        [JsonProperty(PropertyName = "bcc", IsReference = false)]
        [JsonConverter(typeof(RemoveDuplicatesConverter<EmailAddress>))]
        public List<EmailAddress> Bccs { get; set; }

        /// <summary>
        /// Gets or sets the from email address. The domain must match the domain of the from email property specified at root level of the request body.
        /// </summary>
        [JsonProperty(PropertyName = "from")]
        public EmailAddress From { get; set; }

        /// <summary>
        /// Gets or sets the subject line of your email.
        /// </summary>
        [JsonProperty(PropertyName = "subject")]
        public string Subject { get; set; }
...
}

After a couple of failed attempts at decorating the SendGrid SendGridMessage, EmailAddress, Personalization etc. classes I gave up and reverted to the Newtonsoft.Json serialiser.

Note to self – pay closer attention to the samples.

YoloV8-NuGet Performance ARM64 CPU

To see how the dme-compunet, updated YoloDotNet, and sstainba NuGets performed on an ARM64 CPU, I built a test rig for the different NuGets using standard images and ONNX models.
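
A minimal sketch of the timing approach, assuming the dme-compunet YoloV8Predictor API (each NuGet's test harness substitutes its own load and detect calls, and the file names are illustrative):

using System.Diagnostics;

using Compunet.YoloV8;

// Assumed API - YoloV8Predictor.Create/Detect; adjust for each NuGet under test
using var predictor = YoloV8Predictor.Create("TennisBalls.onnx");

// Warm-up run so one-off model initialisation isn't counted
predictor.Detect("TennisBalls.jpg");

const int iterations = 10;
var stopwatch = Stopwatch.StartNew();

for (int i = 0; i < iterations; i++)
{
   predictor.Detect("TennisBalls.jpg");
}

stopwatch.Stop();

Console.WriteLine($"Average inferencing {stopwatch.ElapsedMilliseconds / iterations} mSec");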

I started with the dme-compunet YoloV8 NuGet which found all the tennis balls and the results were consistent with earlier tests.

The YoloDotNet by NickSwardh NuGet update had some “breaking changes” so I built “old” and “updated” test harnesses.

The YoloDotNet by NickSwardh V1 and V2 results were slightly different. The V2 NuGet uses SkiaSharp which appears to significantly improve the performance.

Even though the YoloV8 by sstainba NuGet hadn’t been updated, I ran the test harness just in case.

The dme-compunet YoloV8 and NickSwardh YoloDotNet V1 versions produced the same results, but the NickSwardh YoloDotNet V2 results were slightly different.

  • dme-compunet 291 mSec
  • NickSwardh V1 480 mSec
  • NickSwardh V2 115 mSec
  • sstainba 422 mSec

As in the YoloV8-NuGet Performance X64 CPU post, the NickSwardh V2 implementation, which uses SkiaSharp, was significantly faster, so it looks like SixLabors.ImageSharp is the issue.

Supporting Compute Unified Device Architecture (CUDA) or TensorRT inferencing with NickSwardh V2 (because of SkiaSharp) will need some major modifications to the code, so it might be better to build my own YoloV8 NuGet.

YoloV8-NuGet Performance X64 CPU

When checking the dme-compunet, YoloDotNet, and sstainba NuGets I noticed the YoloDotNet readme.md detailed some performance enhancements…

What’s new in YoloDotNet v2.0?

YoloDotNet 2.0 is a Speed Demon release where the main focus has been on supercharging performance to bring you the fastest and most efficient version yet. With major code optimizations, a switch to SkiaSharp for lightning-fast image processing, and added support for Yolov10 as a little extra 😉 this release is set to redefine your YoloDotNet experience:

Changing the implementation to use SkiaSharp caught my attention because, in previous testing, manipulating images with the SixLabors.ImageSharp library took longer than expected.

I built a test rig for comparing the performance of the different NuGets using standard images and ONNX Models.

I started with the dme-compunet YoloV8 NuGet which found all the tennis balls and the results were consistent with earlier tests.

dme-compunet test harness image bounding boxes

The YoloDotNet by NickSwardh NuGet update had some “breaking changes” so I built “old” and “updated” test harnesses. The V1 version found all the tennis balls and the results were consistent with earlier tests.

NickSwardh V1 test harness image bounding boxes

The YoloDotNet by NickSwardh NuGet update had some “breaking changes” that required code changes, and the V1 and V2 results were slightly different.

NickSwardh V2 test harness image bounding boxes

Even though the YoloV8 by sstainba NuGet hadn’t been updated I ran the test harness just in case and the results were consistent with previous tests.

sstainba test harness image bounding boxes

The dme-compunet YoloV8 and NickSwardh YoloDotNet V1 versions produced the same results, but the NickSwardh YoloDotNet V2 results were slightly different. The YoloV8 by sstainba results were unchanged.

  • dme-compunet 71 mSec
  • NickSwardh V1 76 mSec
  • NickSwardh V2 33 mSec
  • sstainba 82 mSec

The NickSwardh V2 implementation was significantly faster, but I need to investigate the slight difference in the bounding boxes. It looks like SixLabors.ImageSharp might be the issue.

YoloV8 ONNX – Nvidia Jetson Orin Nano™ Execution Providers

The Seeedstudio reComputer J3011 has two processors, an ARM64 CPU and an Nvidia Jetson Orin 8G, which can be used for inferencing with the Open Neural Network Exchange (ONNX) Runtime.

Story of Fail

Inferencing worked first time on the ARM64 CPU because the required runtime is included in the Microsoft.ML.OnnxRuntime NuGet.

ARM64 Linux ONNX runtime
Microsoft.ML.OnnxRuntime NuGet ARM64 Linux runtime

Inferencing failed on the Nvidia Jetson Orin 8G because the CUDA Execution Provider and TensorRT Execution Provider for the ONNXRuntime were not included in the Microsoft.ML.OnnxRuntime.Gpu.Linux NuGet.

Missing ARM64 Linux GPU runtime

There were Linux x64 and Windows x64 versions of the ONNXRuntime library included in the Microsoft.ML.OnnxRuntime.Gpu NuGet.

Microsoft.ML.OnnxRuntime.Gpu NuGet x64 Linux runtime

Desperately Seeking libonnxruntime.so

The Nvidia ONNX runtime site had pip wheel files for the different versions of Python and the Open Neural Network Exchange (ONNX) Runtime.

The onnxruntime_gpu-1.18.0-cp312-cp312-linux_aarch64.whl matched the version of the ONNXRuntime I needed and the version of Python on the device.

When the pip wheel file was renamed onnxruntime_gpu-1.18.0-cp312-cp312-linux_aarch64.zip it could be opened, but there wasn’t a libonnxruntime.so.

Onnxruntime_gpu-1.18.0-cp312-cp312-linux_aarch64 file listing

Building the TensorRT & CUDA Execution Providers

The ONNXRuntime build has to be done on the Nvidia Jetson Orin itself; even after installing all the necessary prerequisites, the first attempt failed.

bryn@ubuntu:~/onnxruntime/onnxruntime$ ./build.sh --config Release --update --build --build_wheel \
--use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu \
--tensorrt_home /usr/lib/aarch64-linux-gnu

When in high power mode more cores are used, but this consumes more resources when building the ONNXRuntime. Because the compile process was having “out of memory” failures, --parallel 2 was added to the command line to limit resource utilisation.

bryn@ubuntu:~/onnxruntime/onnxruntime$ ./build.sh --config Release --update --build --parallel 2 --build_wheel \
--use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu \
--tensorrt_home /usr/lib/aarch64-linux-gnu

There were some compiler warnings but they appear to be benign.

The first attempt at running the application failed because libonnxruntime.so was missing, so --build_shared_lib was added to the command line.

2024-06-10 18:21:58,480 build [INFO] - Build complete
bryn@ubuntu:~/onnxruntime/onnxruntime$ ./build.sh --config Release --update --build --parallel 2 --build_wheel \
--use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu \
--tensorrt_home /usr/lib/aarch64-linux-gnu --build_shared_lib

When the build completed, the files were copied to the runtime folder of the program.
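
Something like the following; the build output path is the standard ONNXRuntime build tree, but the destination folder is illustrative only:

bryn@ubuntu:~/onnxruntime/onnxruntime$ cp build/Linux/Release/libonnxruntime.so* ~/ObjectDetection/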

The application could then be configured to use the TensorRT Execution Provider.
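
With the shared library in place, the provider can be enabled via SessionOptions; a minimal sketch (the model path and device ID 0 are assumptions):

using Microsoft.ML.OnnxRuntime;

var sessionOptions = new SessionOptions();

// Execution providers are tried in the order appended - TensorRT first,
// then CUDA, with the built-in CPU provider as the final fallback
sessionOptions.AppendExecutionProvider_Tensorrt(0);
sessionOptions.AppendExecutionProvider_CUDA(0);

using var session = new InferenceSession("TennisBalls.onnx", sessionOptions);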

Getting CUDA and TensorRT working on the Nvidia Jetson Orin 8G took much longer than I expected, with many dead ends and device factory resets before the process was repeatable.

YoloV8 ONNX – Nvidia Jetson Orin Nano™ CPU & GPU TensorRT Inferencing

The Seeedstudio reComputer J3011 has two processors, an ARM64 CPU and an Nvidia Jetson Orin 8G. To speed up inferencing I built an Open Neural Network Exchange (ONNX) Runtime TensorRT Execution Provider. After updating the code to add a “warm-up” and tracking of average pre-processing, inferencing & post-processing durations, I did a series of CPU & GPU performance tests.
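
A sketch of how the per-phase durations can be tracked; the phase boundaries shown are illustrative:

using System.Diagnostics;

var stopwatch = new Stopwatch();

stopwatch.Restart();
// Pre-processing: load and resize the image, then convert it to a DenseTensor
stopwatch.Stop();
long preProcessingMsec = stopwatch.ElapsedMilliseconds;

stopwatch.Restart();
// Inferencing: run the ONNX session
stopwatch.Stop();
long inferencingMsec = stopwatch.ElapsedMilliseconds;

stopwatch.Restart();
// Post-processing: decode the output and apply non-maximum suppression
stopwatch.Stop();
long postProcessingMsec = stopwatch.ElapsedMilliseconds;

Console.WriteLine($"Pre:{preProcessingMsec} Inferencing:{inferencingMsec} Post:{postProcessingMsec} mSec");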

The testing consisted of permutations of three models (TennisBallsYoloV8s20240618640×640.onnx, TennisBallsYoloV8s2024062410241024.onnx & TennisBallsYoloV8x20240614640×640, with limited testing of the last as it was slow) and three images (TennisBallsLandscape640x640.jpg, TennisBallsLandscape1024x1024.jpg & TennisBallsLandscape3072x4080.jpg).

Executive Summary

As expected, inferencing with a TensorRT 640×640 model and a 640×640 image was fastest: 9 mSec pre-processing, 21 mSec inferencing, then 4 mSec post-processing.

If the image had to be scaled with SixLabors.ImageSharp this significantly increased the pre-processing (and overall) time.

CPU Inferencing

GPU TensorRT Small model Inferencing

GPU TensorRT Large model Inferencing

Nvidia Jetson Orin Nano™ JetPack 6

The Seeedstudio reComputer J3011 has two processors, an ARM64 CPU and an Nvidia Jetson Orin 8G coprocessor. Speeding up ML.NET running on the Nvidia Jetson Orin 8G required compatible versions of ML.NET, the Open Neural Network Exchange (ONNX) Runtime, and NVIDIA JetPack.

Before installing NVIDIA JetPack 6, the Seeedstudio reComputer J3011 Edge AI Device has to be put into recovery mode.

Seeedstudio reComputer J3011 Edge AI Device with jumper for recovery mode

When started in recovery mode, the Seeedstudio J3011 appeared in the list of Universal Serial Bus (USB) devices returned by lsusb.

Upgrading the device to JetPack 5.1.1 using the Windows Subsystem for Linux terminal failed. The NVIDIA SDK Manager downloads and installs all the required components and dependencies.

Installing NVIDIA JetPack 6 from the Windows Subsystem for Linux failed because the version of Ubuntu installed (Ubuntu 24.04 LTS) was not supported by the NVIDIA SDK Manager.

Installing NVIDIA JetPack 6 from a desktop PC running Ubuntu 24.04 LTS failed with the same issue: the NVIDIA SDK Manager did not support that version of Ubuntu. The desktop PC was then “re-paved” with Ubuntu 22.04 LTS and the NVIDIA SDK Manager worked.

An NVIDIA Developer program login is required to launch the NVIDIA SDK Manager.

Selecting the right target hardware is important if it is not “auto detected”.

The Open Neural Network Exchange (ONNX) Runtime supports the Compute Unified Device Architecture (CUDA), which has to be included in the installation package.

Downloading NVIDIA JetPack 6 and all the selected components of the install can be quite slow.

Installation of NVIDIA JetPack 6 and the selected components can take a while.

Even though JetPack 6 is now available for Seeed’s Jetson Orin devices, this process is still applicable for an upgrade or “factory reset”.

YoloV8 ONNX – Nvidia Jetson Orin Nano™ DenseTensor Performance

When running the YoloV8 Coprocessor demonstration on the Nvidia Jetson Orin, inferencing looked a bit odd: the dotted line wasn’t moving as fast as expected. To investigate further, I split the inferencing duration into pre-processing, inferencing, and post-processing times. Inferencing and post-processing were “quick”, but pre-processing was taking longer than expected.

YoloV8 Coprocessor application running on Nvidia Jetson Orin

When I ran the demonstration Ultralytics YoloV8 object detection console application on my development desktop (13th Gen Intel(R) Core(TM) i7-13700 2.10 GHz with 32.0 GB) the pre-processing was much faster.

The much shorter pre-processing and longer inferencing durations were not a surprise, as my development desktop does not have a Graphics Processing Unit (GPU).

Test image used for testing on Jetson device and development PC

The test image, taken with my mobile, was 3606×2715 pixels, which was representative of the security camera images to be processed by the solution.

Redgate ANTS Performance Profiler instrumentation of application execution

On my development box, running the application with the Redgate ANTS Performance Profiler highlighted that the Compunet YoloV8 code converting the image to a DenseTensor could be an issue.

 public static void ProcessToTensor(Image<Rgb24> image, Size modelSize, bool originalAspectRatio, DenseTensor<float> target, int batch)
 {
    var options = new ResizeOptions()
    {
       Size = modelSize,
       Mode = originalAspectRatio ? ResizeMode.Max : ResizeMode.Stretch,
    };

    // Resize the image to the model input size (assumed step - options was
    // otherwise unused in this excerpt)
    image.Mutate(x => x.Resize(options));

    // Letterbox padding offsets when the aspect ratio is preserved
    var xPadding = (modelSize.Width - image.Width) / 2;
    var yPadding = (modelSize.Height - image.Height) / 2;

    var width = image.Width;
    var height = image.Height;

    // Pre-calculate strides for performance
    var strideBatchR = target.Strides[0] * batch + target.Strides[1] * 0;
    var strideBatchG = target.Strides[0] * batch + target.Strides[1] * 1;
    var strideBatchB = target.Strides[0] * batch + target.Strides[1] * 2;
    var strideY = target.Strides[2];
    var strideX = target.Strides[3];

    // Get a span of the whole tensor for fast access
    var tensorSpan = target.Buffer;

    // Try get continuous memory block of the entire image data
    if (image.DangerousTryGetSinglePixelMemory(out var memory))
    {
       Parallel.For(0, width * height, index =>
       {
             int x = index % width;
             int y = index / width;
             int tensorIndex = strideBatchR + strideY * (y + yPadding) + strideX * (x + xPadding);

             var pixel = memory.Span[index];
             WritePixel(tensorSpan.Span, tensorIndex, pixel, strideBatchR, strideBatchG, strideBatchB);
       });
    }
    else
    {
       Parallel.For(0, height, y =>
       {
             var rowSpan = image.DangerousGetPixelRowMemory(y).Span;
             int tensorYIndex = strideBatchR + strideY * (y + yPadding);

             for (int x = 0; x < width; x++)
             {
                int tensorIndex = tensorYIndex + strideX * (x + xPadding);
                var pixel = rowSpan[x];
                WritePixel(tensorSpan.Span, tensorIndex, pixel, strideBatchR, strideBatchG, strideBatchB);
             }
       });
    }
 }

 private static void WritePixel(Span<float> tensorSpan, int tensorIndex, Rgb24 pixel, int strideBatchR, int strideBatchG, int strideBatchB)
 {
    tensorSpan[tensorIndex] = pixel.R / 255f;
    tensorSpan[tensorIndex + strideBatchG - strideBatchR] = pixel.G / 255f;
    tensorSpan[tensorIndex + strideBatchB - strideBatchR] = pixel.B / 255f;
 }

For a 3606×2715 image the WritePixel method would be called millions of times, so its implementation and the overall approach used for ProcessToTensor have a significant impact on performance.

YoloV8 Coprocessor application running on Nvidia Jetson Orin with a resized image

Resizing the images had a significant impact on performance on both the development box and the Nvidia Jetson Orin. Some investigation will be needed to see how much resizing the images impacts the performance and accuracy of the model.
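
A minimal sketch of the resize step, assuming SixLabors.ImageSharp and a 640×640 model input (the file names are illustrative):

using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;

// Shrink the large camera image to the model input size before pre-processing;
// ResizeMode.Max preserves the aspect ratio
using var image = await Image.LoadAsync<Rgb24>("TennisBallsLandscape3072x4080.jpg");

image.Mutate(context => context.Resize(new ResizeOptions
{
   Size = new Size(640, 640),
   Mode = ResizeMode.Max,
}));

await image.SaveAsJpegAsync("TennisBallsLandscape640x640.jpg");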

The ProcessToTensor method has already had some performance optimisations, which improved performance by roughly 20%. There have been discussions about optimising similar code, e.g. Efficient Bitmap to OnnxRuntime Tensor in C# and Efficient RGB Image to Tensor in dotnet, which look applicable and will be evaluated.