ONNX Tensor Loading Initial Comparison

This is the second in a series of posts from my session at the Agent Camp – Christchurch about using Open Neural Network Exchange (ONNX) for processing Moving Picture Experts Group (MPEG) video and Pulse Code Modulation (PCM) audio streams.

These benchmarks use the Ultralytics Yolo26 standard object detection model, which has an input image size of 640×640 pixels.

var _tensor = new DenseTensor<float>(new[] { 1, 3, modelH, modelW });

The original nested loop uses the multi-dimensional [0, c, y, x] indexer and divides each channel value by 255f. This is the baseline that all other implementations are measured against.

[Benchmark(Baseline = true, Description = "Baseline: indexer + / 255f")]
public void Baseline()
{
   for (int y = 0; y < modelH; y++)
      for (int x = 0; x < modelW; x++)
      {
          var px = _letterboxed.GetPixel(x, y);

         _tensor[0, 0, y, x] = px.Red / 255f;
         _tensor[0, 1, y, x] = px.Green / 255f;
         _tensor[0, 2, y, x] = px.Blue / 255f;
      }
}

This implementation bypasses the multi-dimensional [0, c, y, x] indexer entirely by taking a Span<float> over the tensor’s backing buffer. The channel planes sit at offsets 0, planeSize, and 2 * planeSize. A single loop then reads each pixel once and writes to all three planes, interleaved.

[Benchmark(Description = "Buffer span: flat index, interleaved")]
public void BufferSpan()
{
   SKColor[] pixels = _letterboxed.Pixels;
   const float scaler = 1 / 255f;
   int planeSize = _modelW * _modelH;
   Span<float> buf = _tensor.Buffer.Span;

   for (int i = 0; i < planeSize; i++)
   {
      SKColor px = pixels[i];
      buf[i] = px.Red * scaler;
      buf[planeSize + i] = px.Green * scaler;
      buf[2 * planeSize + i] = px.Blue * scaler;
   }
}
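The plane-offset arithmetic is easy to check outside the benchmark. The following NumPy sketch (a hypothetical 4×4 plane, not the real 640×640 model input) confirms that writing channel c at offset c * planeSize + i produces the same [0, c, y, x] layout as a contiguous NCHW tensor:

```python
import numpy as np

# Hypothetical tiny model size, small enough to inspect by hand
h, w = 4, 4
plane_size = h * w

# Fake interleaved RGB pixel data, one value per channel per pixel
rng = np.random.default_rng(42)
pixels = rng.integers(0, 256, size=(h * w, 3))

# Flat buffer written the way BufferSpan writes it: channel plane c starts
# at offset c * plane_size, pixel i lands at c * plane_size + i
buf = np.empty(3 * plane_size, dtype=np.float32)
for i, (r, g, b) in enumerate(pixels):
    buf[i] = r / 255.0
    buf[plane_size + i] = g / 255.0
    buf[2 * plane_size + i] = b / 255.0

# Viewing the same memory as [batch, channel, y, x] (NCHW) lines up exactly
tensor = buf.reshape(1, 3, h, w)
assert np.isclose(tensor[0, 1, 2, 3], pixels[2 * w + 3, 1] / 255.0)
```

The reshape is a zero-copy view, which is exactly why the flat writes and the four-dimensional indexer agree.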

This implementation slices the flat buffer into three non-overlapping channel spans, then runs three separate sequential loops, one per colour channel. It combines the benefits of spans (no indexer overhead, and the JIT can auto-vectorise) with split loops, where the JIT can eliminate the per-element bounds checks after the slice.

[Benchmark(Description = "Buffer span split: 3× sequential flat loops")]
public void BufferSpanSplit()
{
   SKColor[] pixels = _letterboxed.Pixels;
   const float scaler = 1 / 255f;
   int planeSize = _modelW * _modelH;
   Span<float> buf = _tensor.Buffer.Span;

   Span<float> rPlane = buf.Slice(0, planeSize);
   Span<float> gPlane = buf.Slice(planeSize, planeSize);
   Span<float> bPlane = buf.Slice(2 * planeSize, planeSize);

   for (int i = 0; i < planeSize; i++) rPlane[i] = pixels[i].Red * scaler;
   for (int i = 0; i < planeSize; i++) gPlane[i] = pixels[i].Green * scaler;
   for (int i = 0; i < planeSize; i++) bPlane[i] = pixels[i].Blue * scaler;
}
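For comparison, the whole pixels-to-planes transform collapses to a transpose and a scale in NumPy. This is only an illustrative sketch with small hypothetical dimensions, not part of the benchmark suite:

```python
import numpy as np

h, w = 8, 8  # hypothetical, standing in for modelH x modelW
hwc = np.random.default_rng(0).integers(0, 256, size=(h, w, 3), dtype=np.uint8)

# HWC uint8 -> CHW float32 in [0, 1]: one transpose plus one scale does
# the same work as the three split plane loops in BufferSpanSplit
chw = hwc.transpose(2, 0, 1).astype(np.float32) / 255.0

assert chw.shape == (3, h, w)
```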

The minimal difference in performance between the two fastest implementations of the benchmark suite running on my development box was a surprise. It will be interesting to see how the performance of the different implementations changes on my Seeedstudio EdgeBox RPi 200, which has a different instruction set (especially the ARM NEON Single Instruction, Multiple Data (SIMD) extensions) and memory caching model.

These benchmarks should be treated as indicative, not authoritative.

SkiaSharp and ImageSharp Initial Comparison

This is the first in a series of posts from my session at the Agent Camp – Christchurch about using Open Neural Network Exchange (ONNX) for processing Moving Picture Experts Group (MPEG) video and Pulse Code Modulation (PCM) audio streams.

For processing video streams one of the first steps is extracting individual Joint Photographic Experts Group (JPEG) images from the MPEG Real-Time Streaming Protocol (RTSP) stream. The JPEG images then have to be transformed into an ONNX DenseTensor<float> in the correct format for the Ultralytics Yolo26 model. These image processing posts will use the Ultralytics Yolo26 standard Small object detection model, which has an input image size of 640×640 pixels.

I have used both the YoloSharp and YoloDotNet libraries (thank you Niklas Swärd and dme-compunet, I appreciate the amount of effort you have put in). Both libraries support object detection, instance segmentation, oriented bounding box detection (OBB), classification and pose estimation. They also support different model versions, video stream processing, plotting minimum bounding boxes, and Non-Maximum Suppression (NMS) for earlier models like YOLOv8 or YOLO11. I just need object detection (none of the other model types, plotting of minimum bounding boxes etc.) to work as fast as possible on my Seeedstudio EdgeBox RPi 200.

The first step was to use BenchmarkDotNet to compare the performance of Six Labors ImageSharp (used by YoloSharp) and SkiaSharp (used by YoloDotNet). Six Labors ImageSharp is a high-performance, fully managed 2D graphics API, whereas SkiaSharp is a wrapper for Google’s Skia 2D Graphics Library.

ImageSharp Benchmark
SkiaSharp Benchmark

The initial comparison running on my development box (I will benchmark on my Seeedstudio EdgeBox RPi 200 later) was roughly what I was expecting, though the SkiaSharp 2560×1440 mean duration was a bit odd. I think the difference in the amount of memory allocated is because SkiaSharp’s memory is allocated by the native code. Both benchmarks need some refactoring to improve repeatability on my different platforms.

These benchmarks should be treated as indicative, not authoritative.

Azure Event Grid nanoFramework Client – Publisher

Building a .NET nanoFramework application for testing Azure Event Grid MQTT Broker connectivity that would run on my Seeedstudio EdgeBox ESP100 and Seeedstudio Xiao ESP32S3 devices took a couple of hours. Most of that time was spent figuring out how to generate the certificate and elliptic curve private key.

Create an elliptic curve private key

openssl ecparam -name prime256v1 -genkey -noout -out device.key

Generate a certificate signing request

openssl req -new -key device.key -out device.csr -subj "/CN=device.example.com/O=YourOrg/OU=IoT"

Then use the intermediate certificate and key file from earlier to generate a device certificate and key.

openssl x509 -req -in device.csr -CA IntermediateCA.crt -CAkey IntermediateCA.key -CAcreateserial -out device.crt -days 365 -sha256

In this post I have assumed that the reader is familiar with configuring Azure Event Grid clients, client groups, topic spaces, permission bindings and routing.

The PEM encoded root CA certificate chain that is used to validate the server
public const string CA_ROOT_PEM = @"-----BEGIN CERTIFICATE-----
CN: CN = Microsoft Azure ECC TLS Issuing CA 03
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
CN: CN = DigiCert Global Root G3
-----END CERTIFICATE-----";

The PEM encoded certificate chain that is used to authenticate the device
public const string CLIENT_CERT_PEM_A = @"-----BEGIN CERTIFICATE-----
 CN=Self signed device certificate
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
 CN=Self signed Intermediate certificate
-----END CERTIFICATE-----";

The PEM encoded private key of the device
public const string CLIENT_KEY_PEM_A = @"-----BEGIN EC PRIVATE KEY-----
-----END EC PRIVATE KEY-----";

My application was “inspired” by the .NET nanoFramework m2mqtt example.

public static void Main()
{
   int sequenceNumber = 0;
   MqttClient mqttClient = null;
   Thread.Sleep(1000); // Found this works around some issues with running immediately after a reset

   bool wifiConnected = false;
   Console.WriteLine("WiFi connecting...");
   do
   {
      // Attempt to connect using DHCP
      wifiConnected = WifiNetworkHelper.ConnectDhcp(Secrets.WIFI_SSID, Secrets.WIFI_PASSWORD, requiresDateTime: true);

      if (!wifiConnected)
      {
         Console.WriteLine($"Failed to connect. Error: {WifiNetworkHelper.Status}");
         if (WifiNetworkHelper.HelperException != null)
         {
            Console.WriteLine($"Exception: {WifiNetworkHelper.HelperException}");
         }

         Thread.Sleep(1000);
      }
   }
   while (!wifiConnected);
   Console.WriteLine("WiFi connected");

   var caCert = new X509Certificate(Constants.CA_ROOT_PEM);

   X509Certificate2 clientCert = null;
   try
   {
      clientCert = new X509Certificate2(Secrets.CLIENT_CERT_PEM_A, Secrets.CLIENT_KEY_PEM_A, string.Empty);
   }
   catch (Exception ex)
   {
      Console.WriteLine($"Client Certificate Exception: {ex.Message}");
   }

   mqttClient = new MqttClient(Secrets.MQTT_SERVER, Constants.MQTT_PORT, true, caCert, clientCert, MqttSslProtocols.TLSv1_2);

   mqttClient.ProtocolVersion = MqttProtocolVersion.Version_5;

   bool mqttConnected = false;
   Console.WriteLine("MQTT connecting...");
   do
   {
      try
      {
         // Regular connect
         var resultConnect = mqttClient.Connect(Secrets.MQTT_CLIENTID, Secrets.MQTT_USERNAME, Secrets.MQTT_PASSWORD);
         if (resultConnect != MqttReasonCode.Success)
         {
            Console.WriteLine($"MQTT ERROR connecting: {resultConnect}");
            Thread.Sleep(1000);
         }
         else
         {
            mqttConnected = true;
         }
      }
      catch (Exception ex)
      {
         Console.WriteLine($"MQTT ERROR Exception '{ex.Message}'");
         Thread.Sleep(1000);
      }
   }
   while (!mqttConnected);
   Console.WriteLine("MQTT connected...");

   mqttClient.MqttMsgPublishReceived += MqttMsgPublishReceived;
   mqttClient.MqttMsgSubscribed += MqttMsgSubscribed;
   mqttClient.MqttMsgUnsubscribed += MqttMsgUnsubscribed;
   mqttClient.ConnectionOpened += ConnectionOpened;
   mqttClient.ConnectionClosed += ConnectionClosed;
   mqttClient.ConnectionClosedRequest += ConnectionClosedRequest;

   string topicPublish = string.Format(MQTT_TOPIC_PUBLISH_FORMAT, Secrets.MQTT_CLIENTID);
   while (true)
   {
      Console.WriteLine("MQTT publish message start...");

      var payload = new MessagePayload() { ClientID = Secrets.MQTT_CLIENTID, Sequence = sequenceNumber++ };

      string jsonPayload = JsonSerializer.SerializeObject(payload);

      var result = mqttClient.Publish(topicPublish, Encoding.UTF8.GetBytes(jsonPayload), "application/json; charset=utf-8", null);

      Debug.WriteLine($"MQTT published ({result}): {jsonPayload}");

      Thread.Sleep(100);
   }
}

I then configured my client (Edgebox100Z) and updated the “secrets.cs” file.

Azure Event Grid MQTT Broker Clients

The application connected to the Azure Event Grid MQTT broker and started publishing the JSON payload with the incrementing sequence number.

Visual Studio debugger output of JSON payload publishing

The published messages were “routed” to an Azure Storage Queue where they could be inspected with a tool like Azure Storage Explorer.

Azure Event Grid MQTT Broker metrics with messages published selected

I could see the application was working in the Azure Event Grid MQTT broker metrics because the number of messages published was increasing.

Azure Event Grid Arduino Client – Publisher

The Arduino application for testing Azure Event Grid MQTT Broker connectivity worked on my Seeedstudio EdgeBox ESP100 and Seeedstudio Xiao ESP32S3 devices, so the next step was to modify it to publish some messages.

The first version generated the JSON payload using snprintf, which was a bit “nasty”.

static uint32_t sequenceNumber = 0;

void loop() {
  mqttClient.loop();

  Serial.println("MQTT Publish start");

  char payloadBuffer[64];

  snprintf(payloadBuffer, sizeof(payloadBuffer), "{\"ClientID\":\"%s\", \"Sequence\": %u}", MQTT_CLIENTID, sequenceNumber++);

  Serial.println(payloadBuffer);

  if (!mqttClient.publish(MQTT_TOPIC_PUBLISH, payloadBuffer, strlen(payloadBuffer))) {
    Serial.print("\nMQTT publish failed:");        
    Serial.println(mqttClient.state());    
  }
  Serial.println("MQTT Publish finish");

  delay(60000);
}

I then configured my client (Edgebox100A) and updated the “secrets.h” file.

Azure Event Grid MQTT Broker Clients

The application connected to the Azure Event Grid MQTT broker and started publishing the JSON payload with the incrementing sequence number.

Arduino IDE serial monitor output of JSON payload publishing

The second version generated the JSON payload using the ArduinoJson library.

static uint32_t sequenceNumber = 0;

void loop() {
  mqttClient.loop();

  Serial.println("MQTT Publish start");

  // Create a static JSON document with fixed size
  StaticJsonDocument<64> doc;

  doc["Sequence"] = sequenceNumber++;
  doc["ClientID"] = MQTT_CLIENTID;

  // Serialize JSON to a buffer
  char jsonBuffer[64];
  size_t n = serializeJson(doc, jsonBuffer);

  Serial.println(jsonBuffer);

  if(!mqttClient.publish(MQTT_TOPIC_PUBLISH, jsonBuffer, n))
  {
    Serial.println(mqttClient.state());    
  }

  Serial.println("MQTT Publish finish");

  delay(2000);
}
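The 64-byte buffers in both versions are fairly tight. A quick worst-case check (the maximum uint32_t sequence number, and assuming the "Edgebox100A" client ID used in this post) suggests the payload still fits, but it is worth re-checking whenever the field names or client IDs change:

```python
import json

# Worst-case payload: uint32_t maximum for Sequence, client ID assumed
# to be "Edgebox100A" as used elsewhere in this post
payload = {"Sequence": 4294967295, "ClientID": "Edgebox100A"}
encoded = json.dumps(payload, separators=(",", ":"))

# Must fit in char payloadBuffer[64] with room for the NUL terminator
assert len(encoded) < 64
```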

I could see the application was working in the Azure Event Grid MQTT broker metrics because the number of messages published was increasing.

Azure Event Grid MQTT Broker metrics with messages published selected

The published messages were “routed” to an Azure Storage Queue where they can be inspected with a tool like Azure Storage Explorer.

Azure Storage Explorer displaying a message’s payload

The message payload is Base64 encoded, so I used Copilot to convert it to text.

Microsoft copilot decoding the Base64 payload
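Decoding the Base64 payload doesn’t strictly need an AI assistant; any Base64 decoder will do. A minimal Python sketch with a made-up payload in the same shape as the published messages:

```python
import base64
import json

# Hypothetical payload, shaped like the messages this post publishes
original = '{"ClientID":"Edgebox100A", "Sequence": 1}'
encoded = base64.b64encode(original.encode("utf-8")).decode("ascii")

# What Azure Storage Explorer displays is the encoded form; decoding it back:
decoded = base64.b64decode(encoded).decode("utf-8")
message = json.loads(decoded)
assert message["ClientID"] == "Edgebox100A"
assert message["Sequence"] == 1
```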

In this post I have assumed that the reader is familiar with configuring Azure Event Grid clients, client groups, topic spaces, permission bindings and routing.

Bonus: I also managed to slip in a reference to Copilot.

Azure Event Grid Arduino Client – The joy of certs

“Lets start at the very beginning, A very good place to start”

The Azure Event Grid MQTT Broker server X509 certificate chain can be copy ‘n’ pasted from the output of the openssl command

openssl s_client -connect YourNamespace.newzealandnorth-1.ts.eventgrid.azure.net:8883 -showcerts

A self-signed X509 root certificate (which can sign intermediate X509 certificates) and its key file can be generated with a single openssl command.

openssl req -x509 -newkey rsa:4096 -keyout rootCA.key -out rootCA.crt -days 3650 -nodes -subj "/CN=devMobile  /O=devMobile.co.nz /C=NZ" -addext "basicConstraints=critical,CA:TRUE" -addext "keyUsage=critical,keyCertSign"

For a non-trivial system there should be a number of intermediate certificates. I have tried creating intermediate certificates for a device type, geography, application, customer and combinations of these. The first couple of times I got it wrong, so start with a field trial so that it isn’t so painful to go back and fix (beware the sunk cost fallacy).

openssl genrsa -out intermediate.key 4096

openssl req -new -key intermediate.key -out intermediate.csr -subj "/CN=intermediate  /O=devMobile.co.nz /C=NZ"

I found creating an intermediate certificate that could sign device certificates required a conf file for the basicConstraints and keyUsage configuration.

[ v3_intermediate_ca ]
basicConstraints = critical, CA:TRUE, pathlen:0
keyUsage = critical, keyCertSign
  • critical – The extension must be understood and processed by any application validating the certificate. If the application does not understand it, the certificate must be rejected.
  • CA:TRUE – This certificate is allowed to act as a Certificate Authority (CA), meaning it can sign other certificates.
  • pathlen:0 – This CA can only issue end-entity (leaf) certificates and cannot issue further intermediate CA certificates.
  • keyCertSign – The certificate can be used to sign other certificates (i.e., it’s a CA certificate).

openssl x509 -req -in intermediate.csr  -CA rootCA.crt -CAkey rootCA.key -CAcreateserial -out intermediate.crt -days 1825 -extfile intermediate_ext.cnf -extensions v3_intermediate_ca

Creating a device certificate is similar to the process for the intermediate certificate but doesn’t need to be able to sign certificates.

openssl genrsa -out EdgeBox100A.key 4096

openssl req -new -key EdgeBox100A.key -out EdgeBox100A.csr -subj "/CN=EdgeBox100A"

openssl x509 -req -in EdgeBox100A.csr -CA intermediate.crt -CAkey intermediate.key -CAcreateserial -out EdgeBox100A.crt -days 365 

For production systems it is worth putting some thought into the Common Name (CN), Organizational Unit Name (OU), Organization Name (O), Locality Name (L), State or Province Name (S) and Country Name (C) values.

// Minimalist ESP32 + Event Grid MQTT (mTLS) with PubSubClient
// Copyright (c) November 2025, devMobile Software
#include <PubSubClient.h>
#include <WiFi.h>
#include <WiFiClientSecure.h>

#include "constants.h"
#include "secrets.h"

// --- Wi-Fi ---
//const char* WIFI_SSID     = "";
//const char* WIFI_PASSWORD = "";

// --- Event Grid MQTT ---
//const char* MQTT_SERVER = "";
const uint16_t MQTT_PORT = 8883;

//const char* MQTT_CLIENTID = "";
//const char* MQTT_USERNAME = "";
//const char* MQTT_PASSWORD = "";
//const char* MQTT_TOPIC_PUBLISH = "devices/";
//const char* MQTT_TOPIC_SUBSCRIBE = "devices/";

/*
// The certificate that is used to authenticate the MQTT Broker
const char CA_ROOT_PEM[] PROGMEM = R"PEM(
-----BEGIN CERTIFICATE-----
      Thumbprint: 56D955C849887874AA1767810366D90ADF6C8536
      CN: CN=Microsoft Azure ECC TLS Issuing CA 03
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
      Thumbprint: 7E04DE896A3E666D00E687D33FFAD93BE83D349E
      CN: CN=DigiCert Global Root G3
-----END CERTIFICATE-----
)PEM";

The certificate that is used to authenticate the device
static const char CLIENT_CERT_A_PEM[] PROGMEM = R"PEM(
-----BEGIN CERTIFICATE-----
 CN=Self signed device certificate
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
 CN=Self signed Intermediate certificate
-----END CERTIFICATE-----
)PEM";

 The PEM encoded private key of device
static const char CLIENT_KEY_A_PEM[] PROGMEM = R"PEM(
-----BEGIN PRIVATE KEY-----
-----END PRIVATE KEY-----
)PEM";
*/

WiFiClientSecure secureClient;
PubSubClient mqttClient(secureClient);

void setup() {
  Serial.begin(9600);
  delay(5000);
  Serial.println();

  // Connect to WiFi
  Serial.println("WiFi connecting");
  WiFi.begin(WIFI_SSID, WIFI_PASSWORD);
  Serial.print("*");
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print("*");
  }
  Serial.println("\nWiFi connected");

  // Sync time for TLS
  Serial.println("\nTime synchronising");
  configTime(0, 0, "pool.ntp.org", "time.nist.gov");
  Serial.print("*");
  while (time(nullptr) < 100000) {
    delay(500);
    Serial.print("*");
  }
  Serial.println("\nTime synchronised");

  Serial.println("\nValidating ServerFQDN-Certificate combination");
  secureClient.setCACert(CA_ROOT_PEM);

  Serial.println("TCP connecting");
  if (secureClient.connect(MQTT_SERVER, MQTT_PORT)) {
    Serial.println("\nTCP connected");
  } else {
    Serial.println("\nTCP connection failed");
    return;
  }

  secureClient.setCertificate(CLIENT_CERT_A_PEM);
  secureClient.setPrivateKey(CLIENT_KEY_A_PEM);

  mqttClient.setServer(MQTT_SERVER, MQTT_PORT);

  Serial.println("\nMQTT connecting");
  Serial.print("*");
  while (!mqttClient.connect(MQTT_CLIENTID, MQTT_USERNAME, MQTT_PASSWORD)) {
    Serial.println(mqttClient.state());
    delay(5000);
    Serial.print("*");
  }
  Serial.println("\nMQTT connected");
}

static uint32_t sequenceNumber = 0;

void loop() {
  mqttClient.loop();

  Serial.print(".");
  delay(10000);
}

My Arduino Xiao ESP32S3 and EdgeBox-ESP-100 Industrial Edge Controller devices could connect to the local Wi-Fi, get the time and date using the Network Time Protocol (NTP), and validate the Azure Event Grid MQTT broker certificate. They could then connect to the Azure Event Grid MQTT broker with the client name specified in the subject name of their X509 certificate.

Establishing a connection to the Azure Event Grid MQTT broker often failed, which surprised me. Initially I didn’t have any retry logic, which meant I wasted quite a bit of time trying to debug failed connections.

Azure Event Grid Server Certificate Validation

Over the last couple of weekends I had been trying to get a repeatable process for extracting the X509 certificate information in the correct structure so my Arduino application could connect to Azure Event Grid. The first step was to get the certificate chain for my Azure Event Grid MQTT Broker with openssl

openssl s_client -connect YourNameSpaceHere.newzealandnorth-1.ts.eventgrid.azure.net:8883 -showcerts

The CN=DigiCert Global Root G3 and the wildcard CN=*.eventgrid.azure.net certificates were “concatenated” in the constants header file, which is included in the main program file. The format of the certificate chain is described in the comments. Avoid blank lines, “rogue” spaces and other stray formatting as these may cause the WiFiClientSecure Mbed TLS implementation to fail.

/*
Minimalist ESP32 + Azure Event Grid MQTT Event Grid broker namespace certificate validation
copyright (c) November 2025, devMobile Software
*/
#include <WiFi.h>
#include <WiFiClientSecure.h>
#include "secrets.h"
#include "constants.h"

// --- Wi-Fi ---
//const char* WIFI_SSID     = "";
//const char* WIFI_PASSWORD = "";

//const char* MQTT_SERVER = "YourNamespace.newzealandnorth-1.ts.eventgrid.azure.net";
const uint16_t MQTT_PORT = 8883;

/*
// The certificate that is used to authenticate the MQTT Broker
const char CA_ROOT_PEM[] PROGMEM = R"PEM(
-----BEGIN CERTIFICATE-----
MIIGdTCCBfugAwIBAgITMwAC8tqK8+gk3Ll5FwAAAALy2jAKBggqhkjOPQQDAzBd
....
      Thumbprint: 56D955C849887874AA1767810366D90ADF6C8536
      CN: CN=Microsoft Azure ECC TLS Issuing CA 03
      CN=*.eventgrid.azure.net      
....
4ZWZhnNydNZmt4H/7KAd5/UaIP/IUI/xBg==
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDXTCCAuOgAwIBAgIQAVKe6DaPC11yukM+LY6mLTAKBggqhkjOPQQDAzBhMQsw
....
      Thumbprint: 7E04DE896A3E666D00E687D33FFAD93BE83D349E
      CN: CN=DigiCert Global Root G3
....
MGHYkSqHik6yPbKi1OaJkVl9grldr+Y+z+jgUwWIaJ6ljXXj8cPXpyFgz3UEDnip
Eg==
-----END CERTIFICATE-----
)PEM";
*/

WiFiClientSecure secureClient;

void setup() {
  Serial.begin(9600);
  delay(2000);
  Serial.println("\nServerCertificateValidationClient starting");

  struct tm timeinfo;
  if (getLocalTime(&timeinfo)) {
    Serial.printf("Startup DateTime: %04d-%02d-%02d %02d:%02d:%02d\n", timeinfo.tm_year + 1900, timeinfo.tm_mon + 1, timeinfo.tm_mday, timeinfo.tm_hour, timeinfo.tm_min, timeinfo.tm_sec);
  }

  // Connect to WiFi
  Serial.println("WiFi connecting");
  WiFi.begin(WIFI_SSID, WIFI_PASSWORD);
  Serial.print("*");
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print("*");
  }
  Serial.println("\nWiFi connected");

  if (getLocalTime(&timeinfo)) {
    Serial.printf("Wifi DateTime: %04d-%02d-%02d %02d:%02d:%02d\n", timeinfo.tm_year + 1900, timeinfo.tm_mon + 1, timeinfo.tm_mday, timeinfo.tm_hour, timeinfo.tm_min, timeinfo.tm_sec);
  }

  // Sync time for TLS
  Serial.println("\nTime synchronising");
  configTime(0, 0, "pool.ntp.org", "time.nist.gov");
  Serial.print("*");
  while (time(nullptr) < 100000) {
    delay(500);
    Serial.print("*");
  }
  Serial.println("\nTime synchronised");

  if (getLocalTime(&timeinfo)) {
    Serial.printf("NTP DateTime: %04d-%02d-%02d %02d:%02d:%02d\n", timeinfo.tm_year + 1900, timeinfo.tm_mon + 1, timeinfo.tm_mday, timeinfo.tm_hour, timeinfo.tm_min, timeinfo.tm_sec);
  }

  Serial.println("\nValidating ServerFQDN-Certificate combination");
  secureClient.setCACert(CA_ROOT_PEM);
  Serial.print("*");
  while (!secureClient.connect(MQTT_SERVER, MQTT_PORT)) {
    delay(500);
    Serial.print("*");
  }
  Serial.println("\nTLS Connected");
}

void loop() {
  Serial.print("x");
  delay(5000);
}

After a hard reset the WiFiClientSecure connect failed because the device time had not been initialised, so the device/server time offset was too large (see RFC 9325).

After a “hard” reset the Network Time Protocol (NTP) client was used to set the device time.

After a “soft” reset the Network Time Protocol (NTP) client did not have to be called.

Cloud AI with Copilot – Faster R-CNN Azure HTTP Function Performance Setup

Introduction

The Faster R-CNN Azure HTTP Trigger function performed (not unexpectedly) differently when invoked with Fiddler Classic in the Azure Functions emulator vs. when deployed in an Azure App Service plan.

The code used is a “tidied up” version of the code from the Building Cloud AI with Copilot – Faster R-CNN Azure HTTP Function “Dog Food” post.

public class Function1
{
   private readonly ILogger<Function1> _logger;
   private readonly List<string> _labels;
   private readonly InferenceSession _session;

   public Function1(ILogger<Function1> logger)
   {
      _logger = logger;
      _labels = File.ReadAllLines(Path.Combine(AppContext.BaseDirectory, "labels.txt")).ToList();
      _session = new InferenceSession(Path.Combine(AppContext.BaseDirectory, "FasterRCNN-10.onnx"));
   }

   [Function("ObjectDetectionFunction")]
   public async Task<IActionResult> Run([HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequest req, ExecutionContext context)
   {
      if (req.ContentType == null || !req.ContentType.StartsWith("image/"))
         return new BadRequestObjectResult("Content-Type must be an image.");

      using var ms = new MemoryStream();
      await req.Body.CopyToAsync(ms);
      ms.Position = 0;

      using var image = Image.Load<Rgb24>(ms);
      var inputTensor = PreprocessImage(image);

      var inputs = new List<NamedOnnxValue>
                  {
                      NamedOnnxValue.CreateFromTensor("image", inputTensor)
                  };

      using IDisposableReadOnlyCollection<DisposableNamedOnnxValue> results = _session.Run(inputs);
      var output = results.ToDictionary(x => x.Name, x => x.Value);

      var boxes = (DenseTensor<float>)output["6379"];
      var labels = (DenseTensor<long>)output["6381"];
      var scores = (DenseTensor<float>)output["6383"];

      var detections = new List<object>();
      for (int i = 0; i < scores.Length; i++)
      {
         if (scores[i] > 0.5)
         {
            detections.Add(new
            {
               label = _labels[(int)labels[i]],
               score = scores[i],
               box = new
               {
                  x1 = boxes[i, 0],
                  y1 = boxes[i, 1],
                  x2 = boxes[i, 2],
                  y2 = boxes[i, 3]
               }
            });
         }
      }
      return new OkObjectResult(detections);
   }

   private static DenseTensor<float> PreprocessImage(Image<Rgb24> image)
   {
      // Step 1: Resize so that min(H, W) = 800, max(H, W) <= 1333, keeping aspect ratio
      int origWidth = image.Width;
      int origHeight = image.Height;
      int minSize = 800;
      int maxSize = 1333;

      float scale = Math.Min((float)minSize / Math.Min(origWidth, origHeight),
                             (float)maxSize / Math.Max(origWidth, origHeight));

      int resizedWidth = (int)Math.Round(origWidth * scale);
      int resizedHeight = (int)Math.Round(origHeight * scale);

      image.Mutate(x => x.Resize(resizedWidth, resizedHeight));

      // Step 2: Pad so that both dimensions are divisible by 32
      int padWidth = ((resizedWidth + 31) / 32) * 32;
      int padHeight = ((resizedHeight + 31) / 32) * 32;

      var paddedImage = new Image<Rgb24>(padWidth, padHeight);
      paddedImage.Mutate(ctx => ctx.DrawImage(image, new Point(0, 0), 1f));

      // Step 3: Convert to BGR and normalize
      float[] mean = { 102.9801f, 115.9465f, 122.7717f };
      var tensor = new DenseTensor<float>(new[] { 3, padHeight, padWidth });

      for (int y = 0; y < padHeight; y++)
      {
         for (int x = 0; x < padWidth; x++)
         {
            Rgb24 pixel = default;
            if (x < resizedWidth && y < resizedHeight)
               pixel = paddedImage[x, y];

            tensor[0, y, x] = pixel.B - mean[0];
            tensor[1, y, x] = pixel.G - mean[1];
            tensor[2, y, x] = pixel.R - mean[2];
         }
      }

      paddedImage.Dispose();

      return tensor;
   }
}
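The resize and padding arithmetic in PreprocessImage can be checked independently of the function. The Python sketch below mirrors the same calculation for a hypothetical 1920×1080 frame (Python’s round and C#’s Math.Round agree for these values, which are not at a .5 boundary):

```python
def preprocess_dims(orig_w: int, orig_h: int,
                    min_size: int = 800, max_size: int = 1333):
    """Mirror PreprocessImage: scale so the short side reaches 800 unless
    the long side would exceed 1333, then pad each dimension up to the
    next multiple of 32."""
    scale = min(min_size / min(orig_w, orig_h), max_size / max(orig_w, orig_h))
    resized_w = round(orig_w * scale)
    resized_h = round(orig_h * scale)
    pad_w = ((resized_w + 31) // 32) * 32
    pad_h = ((resized_h + 31) // 32) * 32
    return resized_w, resized_h, pad_w, pad_h

# For 1920x1080 the long-side cap (1333/1920) wins over the short-side
# target (800/1080), so the frame is scaled down and then padded
resized_w, resized_h, pad_w, pad_h = preprocess_dims(1920, 1080)
assert (pad_w % 32, pad_h % 32) == (0, 0)
```

The padded dimensions, not the original frame size, determine the shape of the tensor passed to the model.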

For my initial testing in the Azure Functions emulator using Fiddler Classic I manually generated 10 requests, then replayed them sequentially, and then finally concurrently.

The results for the manual, then sequential, runs were fairly consistent, but the 10 concurrent requests each took more than 10x longer. In addition, the CPU was at 100% usage while the concurrently executed functions were running.

Cloud Deployment

To see how the Faster R-CNN Azure HTTP Trigger function performed I created four resource groups.

The first contained resources used by the three different deployment models being tested.

The second resource group was for testing a Dedicated hosting plan deployment.

The third resource group was for testing an Azure Functions Consumption plan hosting.

The fourth resource group was for testing Azure Functions Flex Consumption plan hosting.

Summary

The next couple of posts will compare and look at options for improving the “performance” (scalability, execution duration, latency, jitter, billing etc.) of the GitHub Copilot generated code.

Building Cloud AI with Copilot – Faster R-CNN Azure HTTP Function SKU Results

Introduction

While testing the FasterRCNNObjectDetectionHttpTrigger function with Telerik Fiddler Classic and my “standard” test image I noticed the response bodies were different sizes.

Initially the application plan was an S1 SKU (1 vCPU 1.75G RAM)

The output JSON was 641 bytes

[
  {
    "label": "person",
    "score": 0.9998331,
    "box": {
      "x1": 445.9223, "y1": 124.11987, "x2": 891.18915, "y2": 696.37164
    }
  },
  {
    "label": "person",
    "score": 0.9994991,
    "box": {
      "x1": 0, "y1": 330.16595, "x2": 471.0475, "y2": 761.35846
    }
  },
  {
    "label": "baseball bat",
    "score": 0.9952342,
    "box": {
      "x1": 869.8053, "y1": 336.96188, "x2": 1063.2261, "y2": 467.74136
    }
  },
  {
    "label": "sports ball",
    "score": 0.9945949,
    "box": {
      "x1": 1040.916, "y1": 372.41507, "x2": 1071.8958, "y2": 402.50424
    }
  },
  {
    "label": "baseball glove",
    "score": 0.9943546,
    "box": {
      "x1": 377.8922, "y1": 431.95053, "x2": 458.4937, "y2": 536.52124
    }
  },
  {
    "label": "person",
    "score": 0.51779467,
    "box": {
      "x1": 0, "y1": 239.91418, "x2": 60.342667, "y2": 397.17004
    }
  }
]

The application plan was scaled to a Premium v3 P0V3 (1 vCPU 4G RAM)

The output JSON was 637 bytes

[
  {
    "label": "person",
    "score": 0.9998332,
    "box": {
      "x1": 445.9223, "y1": 124.1199, "x2": 891.18915, "y2": 696.3716
    }
  },
  {
    "label": "person",
    "score": 0.9994991,
    "box": {
      "x1": 0, "y1": 330.16595, "x2": 471.0475, "y2": 761.35846
    }
  },
  {
    "label": "baseball bat",
    "score": 0.9952342,
    "box": {
      "x1": 869.8053, "y1": 336.9619, "x2": 1063.2261, "y2": 467.74133
    }
  },
  {
    "label": "sports ball",
    "score": 0.994595,
    "box": {
      "x1": 1040.916, "y1": 372.41507, "x2": 1071.8958, "y2": 402.50424
    }
  },
  {
    "label": "baseball glove",
    "score": 0.9943546,
    "box": {
      "x1": 377.8922, "y1": 431.95053, "x2": 458.4937, "y2": 536.52124
    }
  },
  {
    "label": "person",
    "score": 0.51779467,
    "box": {
      "x1": 0, "y1": 239.91418, "x2": 60.342667, "y2": 397.17004
    }
  }
]

The application plan was scaled to Premium v3 P1V3 (2 vCPU 8G RAM)

The output JSON was 641 bytes

[
  {
    "label": "person",
    "score": 0.9998331,
    "box": {
      "x1": 445.9223, "y1": 124.11987, "x2": 891.18915, "y2": 696.37164
    }
  },
  {
    "label": "person",
    "score": 0.9994991,
    "box": {
      "x1": 0, "y1": 330.16595, "x2": 471.0475, "y2": 761.35846
    }
  },
  {
    "label": "baseball bat",
    "score": 0.9952342,
    "box": {
      "x1": 869.8053, "y1": 336.96188, "x2": 1063.2261, "y2": 467.74136
    }
  },
  {
    "label": "sports ball",
    "score": 0.9945949,
    "box": {
      "x1": 1040.916, "y1": 372.41507, "x2": 1071.8958, "y2": 402.50424
    }
  },
  {
    "label": "baseball glove",
    "score": 0.9943546,
    "box": {
      "x1": 377.8922, "y1": 431.95053, "x2": 458.4937, "y2": 536.52124
    }
  },
  {
    "label": "person",
    "score": 0.51779467,
    "box": {
      "x1": 0, "y1": 239.91418, "x2": 60.342667, "y2": 397.17004
    }
  }
]

The application plan was scaled to a Premium v3 P2V3 (4 vCPU 16G RAM)

The output JSON was 641 bytes

[
  {
    "label": "person",
    "score": 0.9998331,
    "box": {
      "x1": 445.9223, "y1": 124.11987, "x2": 891.18915, "y2": 696.37164
    }
  },
  {
    "label": "person",
    "score": 0.9994991,
    "box": {
      "x1": 0, "y1": 330.16595, "x2": 471.0475, "y2": 761.35846
    }
  },
  {
    "label": "baseball bat",
    "score": 0.9952342,
    "box": {
      "x1": 869.8053, "y1": 336.96188, "x2": 1063.2261, "y2": 467.74136
    }
  },
  {
    "label": "sports ball",
    "score": 0.9945949,
    "box": {
      "x1": 1040.916, "y1": 372.41507, "x2": 1071.8958, "y2": 402.50424
    }
  },
  {
    "label": "baseball glove",
    "score": 0.9943546,
    "box": {
      "x1": 377.8922, "y1": 431.95053, "x2": 458.4937, "y2": 536.52124
    }
  },
  {
    "label": "person",
    "score": 0.51779467,
    "box": {
      "x1": 0, "y1": 239.91418, "x2": 60.342667, "y2": 397.17004
    }
  }
]

The application plan was scaled to a Premium v2 P1V2 (1 vCPU 3.5G RAM)

The output JSON was 637 bytes

[
  {
    "label": "person",
    "score": 0.9998332,
    "box": {
      "x1": 445.9223, "y1": 124.1199, "x2": 891.18915, "y2": 696.3716
    }
  },
  {
    "label": "person",
    "score": 0.9994991,
    "box": {
      "x1": 0, "y1": 330.16595, "x2": 471.0475, "y2": 761.35846
    }
  },
  {
    "label": "baseball bat",
    "score": 0.9952342,
    "box": {
      "x1": 869.8053, "y1": 336.9619, "x2": 1063.2261, "y2": 467.74133
    }
  },
  {
    "label": "sports ball",
    "score": 0.994595,
    "box": {
      "x1": 1040.916, "y1": 372.41507, "x2": 1071.8958, "y2": 402.50424
    }
  },
  {
    "label": "baseball glove",
    "score": 0.9943546,
    "box": {
      "x1": 377.8922, "y1": 431.95053, "x2": 458.4937, "y2": 536.52124
    }
  },
  {
    "label": "person",
    "score": 0.51779467,
    "box": {
      "x1": 0, "y1": 239.91418, "x2": 60.342667, "y2": 397.17004
    }
  }
]

Summary

The differences between the 637 byte and 641 byte outputs were small.

I'm not certain why this happens; my current best guess is memory pressure.

Building Cloud AI with Copilot – Faster R-CNN Azure HTTP Function “Dog Food”

Introduction

A couple of months ago a web crawler visited every page on my website (it would be interesting to know if my GitHub repositories were crawled as well) and I wondered whether this might impact my Copilot or GitHub Copilot experiments. My blogging about Azure HTTP Trigger functions with Ultralytics Yolo, YoloSharp, Resnet, Faster R-CNN, Open Neural Network Exchange (ONNX) etc. is fairly "niche", so any improvement in the understanding of the problems and the generated code might be visible.

please write an httpTrigger azure function that uses Faster RCNN and ONNX to detect the object in an image uploaded in the body of an HTTP Post

GitHub Copilot had used SixLabors ImageSharp, the ILogger was injected into the constructor, the code checked that the image was in the body of the HTTP POST, and the object classes were loaded from a text file. I had to manually add some NuGet packages and using directives before the code compiled and ran in the emulator, but this was a definite improvement.

To test the implementation, I used Telerik Fiddler Classic to HTTP POST my "standard" test image to the function.

GitHub Copilot had generated code that checked that the image was in the body of the HTTP POST, so I had to modify the Telerik Fiddler Classic request.

I also had to fix up the Content-Type header.

The path to the ONNX file was wrong, and I had to create a labels.txt file with some Python code.

The Azure HTTP Trigger function ran but failed because the preprocessing of the image didn’t implement the specified preprocess steps.

Change DenseTensor to BGR (based on https://github.com/onnx/models/tree/main/validated/vision/object_detection_segmentation/faster-rcnn#preprocessing-steps)

Normalise colour values with mean = [102.9801, 115.9465, 122.7717]

The Azure HTTP Trigger function ran but failed because the output tensor names were incorrect

I used Netron to inspect the model properties to get the correct names for the output tensors
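As an alternative to Netron, the input and output tensor names can also be listed programmatically from the session metadata, which avoids hard-coding names like "6379". This is a minimal sketch, assuming the FasterRCNN-10.onnx file is in the working directory:

```csharp
using System;
using Microsoft.ML.OnnxRuntime;

// Enumerate the model's input and output tensor names via ONNX Runtime
// metadata rather than inspecting the model file in Netron.
using var session = new InferenceSession("FasterRCNN-10.onnx");

foreach (var input in session.InputMetadata)
   Console.WriteLine($"Input:  {input.Key} {string.Join("x", input.Value.Dimensions)}");

foreach (var output in session.OutputMetadata)
   Console.WriteLine($"Output: {output.Key}");
```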

I had a couple of attempts at resizing the image to see what impact this had on the accuracy of the confidence values and minimum bounding rectangles.

resize the image such that both height and width are within the range of [800, 1333], and then pad the image with zeros such that both height and width are divisible by 32.

modify the code to resize the image such that both height and width are within the range of [800, 1333], and then pad the image with zeros such that both height and width are divisible by 32 and the aspect ratio is not changed.

The final version of the image processing code scaled then right padded the image to keep the aspect ratio and MBR coordinates correct.
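The scale-then-pad arithmetic can be sketched in isolation. This is a minimal reconstruction of the steps described above (`ComputeScaledPaddedSize` is a hypothetical helper name, not from the generated function code), run against the 640×480 test image dimensions:

```csharp
using System;

// Minimal reconstruction of the scale-then-right/bottom-pad arithmetic.
// ComputeScaledPaddedSize is a hypothetical helper name for illustration.
static (int resizedW, int resizedH, int paddedW, int paddedH) ComputeScaledPaddedSize(
   int origW, int origH, int minSize = 800, int maxSize = 1333)
{
   // Scale so min(H, W) reaches 800 without max(H, W) exceeding 1333, keeping aspect ratio
   float scale = Math.Min((float)minSize / Math.Min(origW, origH),
                          (float)maxSize / Math.Max(origW, origH));

   int resizedW = (int)Math.Round(origW * scale);
   int resizedH = (int)Math.Round(origH * scale);

   // Pad right/bottom up to the next multiple of 32; the image stays at (0, 0),
   // so detected box coordinates map back to the original image by dividing by scale
   int paddedW = ((resizedW + 31) / 32) * 32;
   int paddedH = ((resizedH + 31) / 32) * 32;

   return (resizedW, resizedH, paddedW, paddedH);
}

Console.WriteLine(ComputeScaledPaddedSize(640, 480)); // (1067, 800, 1088, 800)
```

Because the padding is only on the right and bottom edges, no offset needs to be subtracted when mapping detections back to the original image, only a divide by the scale factor.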

As a final test I deployed the code to Azure and the first time I ran the function it failed because the labels file couldn’t be found because Unix file paths are case sensitive (labels.txt vs. Labels.txt).

The inferencing time was a bit longer than I expected.

// please write an httpTrigger azure function that uses Faster RCNN and ONNX to detect the object in an image uploaded in the body of an HTTP Post
//    manually added the ML.Net ONNX NuGet + using directives
//    manually added the ImageSharp NuGet + using directives
//    Used Copilot to add Microsoft.ML.OnnxRuntime.Tensors using directive
//    Manually added ONNX file + labels file, sorted out paths
//    Used Netron to fixup output tensor names
// Change DenseTensor to BGR (based on https://github.com/onnx/models/tree/main/validated/vision/object_detection_segmentation/faster-rcnn#preprocessing-steps)
// Normalise colour values with mean = [102.9801, 115.9465, 122.7717]
// resize the image such that both height and width are within the range of [800, 1333], and then pad the image with zeros such that both height and width are divisible by 32.
// modify the code to resize the image such that both height and width are within the range of [800, 1333], and then pad the image with zeros such that both height and width are divisible by 32 and the aspect ratio is not changed.
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using SixLabors.ImageSharp; // Couldn't get IntelliSense after adding NuGet package
using SixLabors.ImageSharp.PixelFormats; // Couldn't get IntelliSense after adding NuGet package
using SixLabors.ImageSharp.Processing; // Couldn't get IntelliSense after adding NuGet package


namespace FasterRCNNObjectDetectionHttpTriggerGithubCopilot
{
   public class Function1
   {
      private readonly ILogger<Function1> _logger;
      private readonly InferenceSession _session;
      private readonly List<string> _labels;

      public Function1(ILogger<Function1> logger)
      {
         _logger = logger;
         _session = new InferenceSession("FasterRCNN-10.onnx");
         _labels = File.ReadAllLines("labels.txt").ToList();
      }

      [Function("ObjectDetectionFunction")]
      public async Task<IActionResult> Run([HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequest req)
      {
         if (!req.ContentType.StartsWith("image/"))
            return new BadRequestObjectResult("Content-Type must be an image.");

         using var ms = new MemoryStream();
         await req.Body.CopyToAsync(ms);
         ms.Position = 0;

         using var image = Image.Load<Rgb24>(ms);
         var inputTensor = PreprocessImage(image);

         var inputs = new List<NamedOnnxValue>
                  {
                      NamedOnnxValue.CreateFromTensor("image", inputTensor)
                  };

         using IDisposableReadOnlyCollection<DisposableNamedOnnxValue> results = _session.Run(inputs);
         var output = results.ToDictionary(x => x.Name, x => x.Value);

         var boxes = (DenseTensor<float>)output["6379"];
         var labels = (DenseTensor<long>)output["6381"];
         var scores = (DenseTensor<float>)output["6383"];

         var detections = new List<object>();
         for (int i = 0; i < scores.Length; i++)
         {
            if (scores[i] > 0.5)
            {
               detections.Add(new
               {
                  label = _labels[(int)labels[i]],
                  score = scores[i],
                  box = new
                  {
                     x1 = boxes[i, 0],
                     y1 = boxes[i, 1],
                     x2 = boxes[i, 2],
                     y2 = boxes[i, 3]
                  }
               });
            }
         }

         return new OkObjectResult(detections);
      }

      private static DenseTensor<float> PreprocessImage( Image<Rgb24> image)
      {
         // Step 1: Resize so that min(H, W) = 800, max(H, W) <= 1333, keeping aspect ratio
         int origWidth = image.Width;
         int origHeight = image.Height;
         int minSize = 800;
         int maxSize = 1333;

         float scale = Math.Min((float)minSize / Math.Min(origWidth, origHeight),
                                (float)maxSize / Math.Max(origWidth, origHeight));
         /*
         float scale = 1.0f;

         // If either dimension is less than 800, scale up so the smaller is 800
         if (origWidth < minSize || origHeight < minSize)
         {
            scale = Math.Max((float)minSize / origWidth, (float)minSize / origHeight);
         }
         // If either dimension is greater than 1333, scale down so the larger is 1333
         if (origWidth * scale > maxSize || origHeight * scale > maxSize)
         {
            scale = Math.Min((float)maxSize / origWidth, (float)maxSize / origHeight);
         }
         */

         int resizedWidth = (int)Math.Round(origWidth * scale);
         int resizedHeight = (int)Math.Round(origHeight * scale);

         image.Mutate(x => x.Resize(resizedWidth, resizedHeight));

         // Step 2: Pad so that both dimensions are divisible by 32
         int padWidth = ((resizedWidth + 31) / 32) * 32;
         int padHeight = ((resizedHeight + 31) / 32) * 32;

         var paddedImage = new Image<Rgb24>(padWidth, padHeight);
         paddedImage.Mutate(ctx => ctx.DrawImage(image, new Point(0, 0), 1f));

         // Step 3: Convert to BGR and normalize
         float[] mean = { 102.9801f, 115.9465f, 122.7717f };
         var tensor = new DenseTensor<float>(new[] { 3, padHeight, padWidth });

         for (int y = 0; y < padHeight; y++)
         {
            for (int x = 0; x < padWidth; x++)
            {
               Rgb24 pixel = default;
               if (x < resizedWidth && y < resizedHeight)
                  pixel = paddedImage[x, y];

               tensor[0, y, x] = pixel.B - mean[0];
               tensor[1, y, x] = pixel.G - mean[1];
               tensor[2, y, x] = pixel.R - mean[2];
            }
         }

         paddedImage.Dispose();
         return tensor;
      }
   }
}

It took roughly an hour to “vibe code” the function, but it would have taken much longer for someone not familiar with the problem domain.

Summary

The GitHub Copilot generated code was okay, but it would be fragile, performance would be poor, and it would not scale well.

The Copilot generated code in this post is not suitable for production.

ONNXRuntime.AI-Faster R-CNN C# Sample differences

After building Faster R-CNN object detection applications with Copilot and GitHub Copilot, the results were slightly different when compared with the onnxruntime.ai "Object detection with Faster RCNN Deep Learning in C#" sample (which hasn't been updated for years).

The sample image was 640×480 pixels

The FasterRCNNObjectDetectionApplicationGitHubCopilot application scaled image was initially 1056×800 then 1088×800 pixels.

In the initial version, the dimensions were "rounded down" to a multiple of 32.

// Calculate scale factor to fit within the range while maintaining aspect ratio
float scale = Math.Min((float)maxSize / Math.Max(originalWidth, originalHeight),
                                (float)minSize / Math.Min(originalWidth, originalHeight));

// Calculate new dimensions
int newWidth = (int)(originalWidth * scale);
int newHeight = (int)(originalHeight * scale);

// Ensure dimensions are divisible by 32
newWidth = (newWidth / divisor) * divisor;
newHeight = (newHeight / divisor) * divisor;
Scaled 1056×800

Then for the second version the dimensions were “rounded up” to the next multiple of 32

// Calculate scale factor to fit within the range while maintaining aspect ratio
float scale = Math.Min((float)maxSize / Math.Max(originalWidth, originalHeight),
                                (float)minSize / Math.Min(originalWidth, originalHeight));

// Calculate new dimensions
int newWidth = (int)(originalWidth * scale);
int newHeight = (int)(originalHeight * scale);

// Ensure dimensions are divisible by 32
newWidth = (int)(Math.Ceiling(newWidth / 32f) * 32f);
newHeight = (int)(Math.Ceiling(newHeight / 32f) * 32f);
Scaled 1088×800
Marked up 1088×800
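The effect of the two rounding strategies can be compared with a small standalone sketch (variable names are illustrative; Math.Round is used here where the generated code truncated with an int cast, which gives 1066 rather than 1067 but the same 1056 and 1088 results):

```csharp
using System;

// Round-down vs round-up to a multiple of 32 for the 640x480 sample image.
int originalWidth = 640, originalHeight = 480;
const int minSize = 800, maxSize = 1333, divisor = 32;

// Scale factor to fit within [800, 1333] while maintaining aspect ratio
double scale = Math.Min((double)maxSize / Math.Max(originalWidth, originalHeight),
                        (double)minSize / Math.Min(originalWidth, originalHeight));

int newWidth = (int)Math.Round(originalWidth * scale);   // 1067
int newHeight = (int)Math.Round(originalHeight * scale); // 800

// First version: round down, clipping up to 31 pixels from the right/bottom edges
int downWidth = (newWidth / divisor) * divisor;
int downHeight = (newHeight / divisor) * divisor;

// Second version: round up, zero-padding instead of clipping
int upWidth = (int)(Math.Ceiling(newWidth / 32.0) * 32);
int upHeight = (int)(Math.Ceiling(newHeight / 32.0) * 32);

Console.WriteLine($"down: {downWidth}x{downHeight}, up: {upWidth}x{upHeight}");
// down: 1056x800, up: 1088x800
```

Rounding down discards scaled pixels along two edges, while rounding up keeps every scaled pixel and fills the remainder with zeros, which is closer to the model's documented preprocessing.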

The FasterRCNNObjectDetectionApplicationOriginal application scaled the input image to 1066×800

Scaled image 1066×800

The FasterRCNNObjectDetectionApplicationOriginal application pillar boxed/padded the image to 1088×800 as the DenseTensor was loaded.

using Image<Rgb24> image = Image.Load<Rgb24>(imageFilePath);

Console.WriteLine($"Before x:{image.Width} y:{image.Height}");

// Resize image
float ratio = 800f / Math.Min(image.Width, image.Height);
image.Mutate(x => x.Resize((int)(ratio * image.Width), (int)(ratio * image.Height)));

Console.WriteLine($"After x:{image.Width} y:{image.Height}");

// Preprocess image
var paddedHeight = (int)(Math.Ceiling(image.Height / 32f) * 32f);
var paddedWidth = (int)(Math.Ceiling(image.Width / 32f) * 32f);

Console.WriteLine($"Padded x:{paddedWidth} y:{paddedHeight}");

Tensor<float> input = new DenseTensor<float>(new[] { 3, paddedHeight, paddedWidth });
var mean = new[] { 102.9801f, 115.9465f, 122.7717f };
image.ProcessPixelRows(accessor =>
{
   for (int y = paddedHeight - accessor.Height; y < accessor.Height; y++)
   {
      Span<Rgb24> pixelSpan = accessor.GetRowSpan(y);
      for (int x = paddedWidth - accessor.Width; x < accessor.Width; x++)
      {
         input[0, y, x] = pixelSpan[x].B - mean[0];
         input[1, y, x] = pixelSpan[x].G - mean[1];
         input[2, y, x] = pixelSpan[x].R - mean[2];
      }
   }
});
Marked up image 1066×800

I think the three different implementations of the preprocessing steps, and the graphics libraries used, probably caused the differences in the results. The way an image is "resized" by System.Drawing.Common vs. ImageSharp (resampled, cropped and centred, or padded and pillar boxed) could make a significant difference to the results.