ONNXRuntime.AI-Faster R-CNN C# Sample differences

After building Faster R-CNN object detection applications with Copilot and GitHub Copilot, the results were slightly different from those of the onnxruntime.ai Object detection with Faster RCNN Deep Learning in C# sample (which hasn’t been updated for years).

The sample image was 640×480 pixels

The FasterRCNNObjectDetectionApplicationGitHubCopilot application’s scaled image was initially 1056×800, then 1088×800 pixels.

In the initial version the dimensions were “rounded down” to the nearest multiple of 32

// Calculate scale factor to fit within the range while maintaining aspect ratio
float scale = Math.Min((float)maxSize / Math.Max(originalWidth, originalHeight),
                                (float)minSize / Math.Min(originalWidth, originalHeight));

// Calculate new dimensions
int newWidth = (int)(originalWidth * scale);
int newHeight = (int)(originalHeight * scale);

// Ensure dimensions are divisible by 32
const int divisor = 32;
newWidth = (newWidth / divisor) * divisor;   // integer division rounds down
newHeight = (newHeight / divisor) * divisor;
Scaled 1056×800

Then for the second version the dimensions were “rounded up” to the next multiple of 32

// Calculate scale factor to fit within the range while maintaining aspect ratio
float scale = Math.Min((float)maxSize / Math.Max(originalWidth, originalHeight),
                                (float)minSize / Math.Min(originalWidth, originalHeight));

// Calculate new dimensions
int newWidth = (int)(originalWidth * scale);
int newHeight = (int)(originalHeight * scale);

// Ensure dimensions are divisible by 32
newWidth = (int)(Math.Ceiling(newWidth / 32f) * 32f);
newHeight = (int)(Math.Ceiling(newHeight / 32f) * 32f);
Scaled 1088×800
Marked up 1088×800

The FasterRCNNObjectDetectionApplicationOriginal application scaled the input image to 1066×800

Scaled image 1066×800
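The three scaled sizes can be reproduced with a few lines of arithmetic. This is a sketch, not code from any of the applications; the 800/1333 minimum/maximum side lengths are my assumption, based on the usual Faster R-CNN preprocessing.

```csharp
using System;

class ResizeComparison
{
    static void Main()
    {
        // 640×480 sample image; 800/1333 are assumed min/max side lengths
        const int originalWidth = 640, originalHeight = 480;
        const int minSize = 800, maxSize = 1333;

        float scale = Math.Min((float)maxSize / Math.Max(originalWidth, originalHeight),
                               (float)minSize / Math.Min(originalWidth, originalHeight));

        int width = (int)(originalWidth * scale);   // 1066
        int height = (int)(originalHeight * scale); // 800

        // Version 1: round down to a multiple of 32 → 1056×800
        Console.WriteLine($"Down: {(width / 32) * 32}x{(height / 32) * 32}");

        // Version 2: round up to a multiple of 32 → 1088×800
        Console.WriteLine($"Up: {(int)(Math.Ceiling(width / 32f) * 32)}x{(int)(Math.Ceiling(height / 32f) * 32)}");

        // Original sample: resize to 1066×800, then pad the tensor to 1088×800
        Console.WriteLine($"Original: {width}x{height} padded to {(int)(Math.Ceiling(width / 32f) * 32)}x{(int)(Math.Ceiling(height / 32f) * 32)}");
    }
}
```

All three applications agree on the 800-pixel short side; the differences are entirely in how the long side is snapped to a multiple of 32.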

The FasterRCNNObjectDetectionApplicationOriginal application pillar boxed/padded the image to 1088×800 as the DenseTensor was loaded.

using Image<Rgb24> image = Image.Load<Rgb24>(imageFilePath);

Console.WriteLine($"Before x:{image.Width} y:{image.Height}");

// Resize image
float ratio = 800f / Math.Min(image.Width, image.Height);
image.Mutate(x => x.Resize((int)(ratio * image.Width), (int)(ratio * image.Height)));

Console.WriteLine($"After x:{image.Width} y:{image.Height}");

// Preprocess image
var paddedHeight = (int)(Math.Ceiling(image.Height / 32f) * 32f);
var paddedWidth = (int)(Math.Ceiling(image.Width / 32f) * 32f);

Console.WriteLine($"Padded x:{paddedWidth} y:{paddedHeight}");

Tensor<float> input = new DenseTensor<float>(new[] { 3, paddedHeight, paddedWidth });
var mean = new[] { 102.9801f, 115.9465f, 122.7717f };
image.ProcessPixelRows(accessor =>
{
   for (int y = paddedHeight - accessor.Height; y < accessor.Height; y++)
   {
      Span<Rgb24> pixelSpan = accessor.GetRowSpan(y);
      for (int x = paddedWidth - accessor.Width; x < accessor.Width; x++)
      {
         input[0, y, x] = pixelSpan[x].B - mean[0];
         input[1, y, x] = pixelSpan[x].G - mean[1];
         input[2, y, x] = pixelSpan[x].R - mean[2];
      }
   }
});
Marked up image 1066×800

I think the three different implementations of the preprocessing steps, and the graphics libraries used, probably caused the differences in the results. The way an image is “resized” by System.Drawing.Common vs. ImageSharp (resampled, cropped and centered, or padded and pillar boxed) could make a significant difference to the results.
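One reason the rounding strategy matters: resizing to a rounded-to-32 size stretches the image slightly, while padding leaves the geometry untouched. A back-of-the-envelope sketch (illustrative only) of the horizontal scale factor each of the scaled sizes implies for the 4:3 source:

```csharp
using System;

class AspectDistortion
{
    static void Main()
    {
        // 640×480 source has an aspect ratio of 4:3 ≈ 1.3333
        float source = 640f / 480f;

        // The three scaled sizes seen above
        (string Name, int W, int H)[] sizes =
        {
            ("Round down 1056x800", 1056, 800),
            ("Round up   1088x800", 1088, 800),
            ("Original   1066x800", 1066, 800),
        };

        foreach (var (name, w, h) in sizes)
        {
            // >1 means horizontally stretched, <1 squeezed
            // ≈ 0.99, 1.02 and 0.9994 respectively
            float factor = ((float)w / h) / source;
            Console.WriteLine($"{name}: horizontal scale factor {factor:0.0000}");
        }
    }
}
```

So resizing to 1088×800 stretches everything about 2% horizontally, rounding down squeezes it about 1%, and the original sample’s resize-then-pad approach keeps distortion under 0.1%; a couple of percent is plausibly enough to move low-confidence detections across the threshold.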

ONNXRuntime.AI-Faster R-CNN C# Sample oddness

After building Faster R-CNN object detection applications with Copilot and GitHub Copilot, the results when compared with Ultralytics Yolo (with YoloSharp) didn’t look too bad.

The input image sports.jpg 1200×798 pixels

The FasterRCNNObjectDetectionApplicationCopilot application only generated labels, confidences and minimum bounding box coordinates.

For the FasterRCNNObjectDetectionApplicationGitHubCopilot application the marked-up image was 1200×798 pixels

The YoloSharpObjectDetectionApplication application marked-up image was 1200×798 pixels

I went back to the onnxruntime.ai Object detection with Faster RCNN Deep Learning in C# sample source code to check my implementations and the highlighted area on the left caught my attention.

The FasterRCNNObjectDetectionApplicationOriginal application marked-up image was 1023×800

I downloaded the sample code which hadn’t been updated for years.

public static void Main(string[] args)
{
   Console.WriteLine("FasterRCNNObjectDetectionApplicationOriginal");

   // Read paths
   string modelFilePath = args[0];
   string imageFilePath = args[1];
   string outImageFilePath = args[2];

   // Read image
   using Image<Rgb24> image = Image.Load<Rgb24>(imageFilePath);

   // Resize image
   float ratio = 800f / Math.Min(image.Width, image.Height);
   image.Mutate(x => x.Resize((int)(ratio * image.Width), (int)(ratio * image.Height)));

   // Preprocess image
   var paddedHeight = (int)(Math.Ceiling(image.Height / 32f) * 32f);
   var paddedWidth = (int)(Math.Ceiling(image.Width / 32f) * 32f);
   Tensor<float> input = new DenseTensor<float>(new[] { 3, paddedHeight, paddedWidth });
   var mean = new[] { 102.9801f, 115.9465f, 122.7717f };
   image.ProcessPixelRows(accessor =>
   {
      for (int y = paddedHeight - accessor.Height; y < accessor.Height; y++)
      {
         Span<Rgb24> pixelSpan = accessor.GetRowSpan(y);
         for (int x = paddedWidth - accessor.Width; x < accessor.Width; x++)
         {
            input[0, y, x] = pixelSpan[x].B - mean[0];
            input[1, y, x] = pixelSpan[x].G - mean[1];
            input[2, y, x] = pixelSpan[x].R - mean[2];
         }
      }
   });

   // Setup inputs and outputs
   var inputs = new List<NamedOnnxValue>
      {
            NamedOnnxValue.CreateFromTensor("image", input)
      };

   // Run inference
   using var session = new InferenceSession(modelFilePath);
   using IDisposableReadOnlyCollection<DisposableNamedOnnxValue> results = session.Run(inputs);

   // Postprocess to get predictions
   var resultsArray = results.ToArray();
   float[] boxes = resultsArray[0].AsEnumerable<float>().ToArray();
   long[] labels = resultsArray[1].AsEnumerable<long>().ToArray();
   float[] confidences = resultsArray[2].AsEnumerable<float>().ToArray();
   var predictions = new List<Prediction>();
   var minConfidence = 0.7f;
   for (int i = 0; i < boxes.Length - 4; i += 4)
   {
      var index = i / 4;
      if (confidences[index] >= minConfidence)
      {
         predictions.Add(new Prediction
         {
            Box = new Box(boxes[i], boxes[i + 1], boxes[i + 2], boxes[i + 3]),
            Label = LabelMap.Labels[labels[index]],
            Confidence = confidences[index]
         });
      }
   }

   // Put boxes, labels and confidence on image and save for viewing
   using var outputImage = File.OpenWrite(outImageFilePath);
   Font font = SystemFonts.CreateFont("Arial", 16);
   foreach (var p in predictions)
   {
      Console.WriteLine($"Label: {p.Label}, Confidence: {p.Confidence}, Bounding Box:[{p.Box.Xmin}, {p.Box.Ymin}, {p.Box.Xmax}, {p.Box.Ymax}]");
      image.Mutate(x =>
      {
         x.DrawLine(Color.Red, 2f, new PointF[] {

                  new PointF(p.Box.Xmin, p.Box.Ymin),
                  new PointF(p.Box.Xmax, p.Box.Ymin),

                  new PointF(p.Box.Xmax, p.Box.Ymin),
                  new PointF(p.Box.Xmax, p.Box.Ymax),

                  new PointF(p.Box.Xmax, p.Box.Ymax),
                  new PointF(p.Box.Xmin, p.Box.Ymax),

                  new PointF(p.Box.Xmin, p.Box.Ymax),
                  new PointF(p.Box.Xmin, p.Box.Ymin)
               });
         x.DrawText($"{p.Label}, {p.Confidence:0.00}", font, Color.White, new PointF(p.Box.Xmin, p.Box.Ymin));
      });
   }
   image.SaveAsJpeg(outputImage);

   Console.WriteLine("Press Enter to exit");
   Console.ReadLine();
}

I then compared the output of the object detection applications, and the output of the onnxruntime.ai Object detection with Faster RCNN Deep Learning in C# sample was different.

After some investigation, I think the scaling of the image used for inferencing (based on the requirements of the model), and then the scaling of the minimum bounding rectangles, isn’t quite right.
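A minimal sketch of the step that appears to be missing (my assumption, not code from the sample): the detections are reported against the resized image, so dividing each box coordinate by the resize ratio maps them back onto the original image. The helper and the box values below are only illustrative.

```csharp
using System;

class BoxRescale
{
    // Map a bounding box from resized-image coordinates back to the
    // original image by dividing each coordinate by the resize ratio.
    // Hypothetical helper, mirroring the sample's Box shape.
    static (float Xmin, float Ymin, float Xmax, float Ymax) ToOriginal(
        float ratio, float xmin, float ymin, float xmax, float ymax) =>
        (xmin / ratio, ymin / ratio, xmax / ratio, ymax / ratio);

    static void Main()
    {
        // Same ratio the sample computes for a 640×480 image: 800/480
        float ratio = 800f / 480f;

        // A detection reported against the 1066×800 resized image...
        var box = ToOriginal(ratio, 100f, 200f, 500f, 600f);

        // ...lands at (60, 120)-(300, 360) on the 640×480 original
        Console.WriteLine($"({box.Xmin}, {box.Ymin}) - ({box.Xmax}, {box.Ymax})");
    }
}
```

If the tensor was padded after resizing, any padding offset would also have to be subtracted before dividing; the sample pads bottom/right of the resized image, so in that case the ratio alone should be enough.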

Building Edge AI with AI - Introduction

Introduction

Recently, there have been mainstream media articles (Forbes, Wired-Video games, Wired-Copilot, BBC, The Guardian etc.) about how Artificial Intelligence (AI) could soon replace developers.

I have been using GitHub Copilot, mainly for “enhanced” Visual Studio IntelliSense, code snippets, and refactoring. I have also found GitHub Copilot generated code samples and more context-aware searching of documentation really useful. Overall, the experience has been positive with limited hallucinations, and my productivity has improved as my “prompt engineering” skills developed.

Initially, I started building “hobbyist” AI applications for the edge with an Azure Percept, and for customer projects have used Ultralytics Yolo, Azure Machine Learning, Azure Cognitive Services, ML.Net, Open Neural Network Exchange (ONNX), Nvidia Jetson modules, Seeedstudio reComputer Industrial J3011 fanless edge AI devices etc.

After some discussion with my co-workers I decided to explore how much GitHub Copilot helped with building a “non-trivial” computer vision Proof of Concept (PoC). In previous blog posts I have written about capturing still images from a security camera, so I figured that was a good place to start.

Even though I have been using Copilot for a while I still say please.

using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        string cameraUrl = "http://your-security-camera-url/image.jpg"; // Replace with your security camera URL
        string savePath = "C:\\path\\to\\save\\image.jpg"; // Replace with the path where you want to save the image

        using (HttpClient client = new HttpClient())
        {
            try
            {
                HttpResponseMessage response = await client.GetAsync(cameraUrl);
                response.EnsureSuccessStatusCode();

                byte[] imageBytes = await response.Content.ReadAsByteArrayAsync();
                await File.WriteAllBytesAsync(savePath, imageBytes);

                Console.WriteLine("Image retrieved and saved successfully.");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"An error occurred: {ex.Message}");
            }
        }
    }
}

The generated code compiled first time, looked “ok”, and worked with some of my security cameras once the cameraUrl and savePath were modified.

I then modified the code so “sensitive” information was not stored in code or configuration files.

//using System;
//using System.IO;
//using System.Net.Http;
//using System.Reflection;
//using System.Threading.Tasks;

using Microsoft.Extensions.Configuration;

class Program
{
   static async Task Main(string[] args)
   {
      //string cameraUrl = "http://your-security-camera-url/image.jpg"; // Replace with your security camera URL
      //string savePath = "C:\\path\\to\\save\\image.jpg"; // Replace with the path where you want to save the image
      Model.ApplicationSettings applicationSettings;

   Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} 0.2.CameraHTTPClient starting");
#if RELEASE
         Console.WriteLine("RELEASE");
#else
      Console.WriteLine("DEBUG");
#endif
      // load the app settings into configuration
      var configuration = new ConfigurationBuilder()
           .AddJsonFile("appsettings.json", false, true)
      .AddUserSecrets<Program>()
      .Build();

      applicationSettings = configuration.GetSection("ApplicationSettings").Get<Model.ApplicationSettings>();

      using (HttpClient client = new HttpClient())
      {
         try
         {
            HttpResponseMessage response = await client.GetAsync(applicationSettings.CameraUrl);
            response.EnsureSuccessStatusCode();

            byte[] imageBytes = await response.Content.ReadAsByteArrayAsync();
            await File.WriteAllBytesAsync(applicationSettings.SavePath, imageBytes);

            Console.WriteLine("Image retrieved and saved successfully.");
         }
         catch (Exception ex)
         {
            Console.WriteLine($"An error occurred: {ex.Message}");
         }
      }
   }
}

namespace Model
{
   public class ApplicationSettings
   {
      public string CameraUrl { get; set; } = "";

      public string SavePath { get; set; } = "Images";
   }
}

Several of my security cameras (e.g. HILook T250H) required client authentication, so I added NetworkCredential to my Copilot prompt.

// please write c# console application that uses httpclient and an httpget to retrieve image from security camera with NetworkCredential security
//
// Sure! Here's a C# console application that uses HttpClient and HttpGet to retrieve an image from a security camera with NetworkCredential security: 
// In this example:
//
// Replace http://your-security-camera-url/image.jpg with the actual URL of your security camera's image.
//
// Replace C:\\path\\to\\save\\image.jpg with the path where you want to save the retrieved image.
//
// Replace your-username and your-password with your actual credentials.
//
// This application uses HttpClient with NetworkCredential to authenticate the request, retrieves the image as a byte array, and saves it to the specified path. If there's an error during the process, it will be caught and displayed in the console.
//
// Happy coding! 🚀
//using System;
//using System.IO;
using System.Net;
//using System.Net.Http;
//using System.Threading.Tasks;

using Microsoft.Extensions.Configuration;

class Program
{
   static async Task Main(string[] args)
   {
      //string cameraUrl = "http://your-security-camera-url/image.jpg"; // Replace with your security camera URL
      //string savePath = "C:\\path\\to\\save\\image.jpg"; // Replace with the path where you want to save the image
      //string username = "your-username"; // Replace with your username
      //string password = "your-password"; // Replace with your password
      Model.ApplicationSettings applicationSettings;

      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} 0.3.CameraHTTPClient starting");
#if RELEASE
         Console.WriteLine("RELEASE");
#else
      Console.WriteLine("DEBUG");
#endif
      // load the app settings into configuration
      var configuration = new ConfigurationBuilder()
           .AddJsonFile("appsettings.json", false, true)
      .AddUserSecrets<Program>()
      .Build();

      applicationSettings = configuration.GetSection("ApplicationSettings").Get<Model.ApplicationSettings>();

      using (HttpClientHandler handler = new HttpClientHandler { Credentials = new NetworkCredential(applicationSettings.Username, applicationSettings.Password) })
      using (HttpClient client = new HttpClient(handler))
      {
         try
         {
            HttpResponseMessage response = await client.GetAsync(applicationSettings.CameraUrl);
            response.EnsureSuccessStatusCode();

            byte[] imageBytes = await response.Content.ReadAsByteArrayAsync();
            await File.WriteAllBytesAsync(applicationSettings.SavePath, imageBytes);

            Console.WriteLine("Image retrieved and saved successfully.");
         }
         catch (Exception ex)
         {
            Console.WriteLine($"An error occurred: {ex.Message}");
         }
      }
   }
}

namespace Model
{
   public class ApplicationSettings
   {
      public string CameraUrl { get; set; } = "";

      public string SavePath { get; set; } = "Images";

      public string Username { get; set; } = "";

      public string Password { get; set; } = "";
   }
}

My Visual Studio 2022 solution with a project for each Copilot-generated sample.

Summary

The Copilot-generated code for my three “trivial” PoC applications compiled and worked with minimal modifications.

The Copilot-generated code in this post is not suitable for production.