ONNXRuntime.AI-Faster R-CNN C# Sample differences

After building Faster R-CNN object detection applications with Copilot and GitHub Copilot, the results were slightly different from those of the onnxruntime.ai Object detection with Faster RCNN Deep Learning in C# sample (which hasn't been updated for years).

The sample image was 640×480 pixels

The FasterRCNNObjectDetectionApplicationGitHubCopilot application's scaled image was initially 1056×800, then 1088×800 pixels.

In the initial version the dimensions were “rounded down” to the next lower multiple of 32

// Calculate scale factor to fit within the range while maintaining aspect ratio
float scale = Math.Min((float)maxSize / Math.Max(originalWidth, originalHeight),
                                (float)minSize / Math.Min(originalWidth, originalHeight));

// Calculate new dimensions
int newWidth = (int)(originalWidth * scale);
int newHeight = (int)(originalHeight * scale);

// Ensure dimensions are divisible by 32
newWidth = (newWidth / divisor) * divisor;
newHeight = (newHeight / divisor) * divisor;
Scaled 1056×800

Then for the second version the dimensions were “rounded up” to the next multiple of 32

// Calculate scale factor to fit within the range while maintaining aspect ratio
float scale = Math.Min((float)maxSize / Math.Max(originalWidth, originalHeight),
                                (float)minSize / Math.Min(originalWidth, originalHeight));

// Calculate new dimensions
int newWidth = (int)(originalWidth * scale);
int newHeight = (int)(originalHeight * scale);

// Ensure dimensions are divisible by 32
newWidth = (int)(Math.Ceiling(newWidth / 32f) * 32f);
newHeight = (int)(Math.Ceiling(newHeight / 32f) * 32f);
Scaled 1088×800
Marked up 1088×800
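Working through the numbers for the 640×480 sample image: scale = min(1333/640, 800/480) ≈ 1.667, so the resized dimensions before adjustment are 1066×800. Rounding down to multiples of 32 gives 1056×800, while rounding up gives 1088×800 (800 is already a multiple of 32), which matches the two scaled images above.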

The FasterRCNNObjectDetectionApplicationOriginal application scaled the input image to 1066×800

Scaled image 1066×800

The FasterRCNNObjectDetectionApplicationOriginal application pillar boxed/padded the image to 1088×800 as the DenseTensor was loaded.
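(For the 1066×800 resized image, ceil(1066/32) × 32 = 1088 while 800 is already a multiple of 32, hence the 1088×800 tensor and 22 pixels of pillar boxing.)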

using Image<Rgb24> image = Image.Load<Rgb24>(imageFilePath);

Console.WriteLine($"Before x:{image.Width} y:{image.Height}");

// Resize image
float ratio = 800f / Math.Min(image.Width, image.Height);
image.Mutate(x => x.Resize((int)(ratio * image.Width), (int)(ratio * image.Height)));

Console.WriteLine($"After x:{image.Width} y:{image.Height}");

// Preprocess image
var paddedHeight = (int)(Math.Ceiling(image.Height / 32f) * 32f);
var paddedWidth = (int)(Math.Ceiling(image.Width / 32f) * 32f);

Console.WriteLine($"Padded x:{paddedWidth} y:{paddedHeight}");

Tensor<float> input = new DenseTensor<float>(new[] { 3, paddedHeight, paddedWidth });
var mean = new[] { 102.9801f, 115.9465f, 122.7717f };
image.ProcessPixelRows(accessor =>
{
   for (int y = paddedHeight - accessor.Height; y < accessor.Height; y++)
   {
      Span<Rgb24> pixelSpan = accessor.GetRowSpan(y);
      for (int x = paddedWidth - accessor.Width; x < accessor.Width; x++)
      {
         input[0, y, x] = pixelSpan[x].B - mean[0];
         input[1, y, x] = pixelSpan[x].G - mean[1];
         input[2, y, x] = pixelSpan[x].R - mean[2];
      }
   }
});
Marked up image 1066×800

I think the three different implementations of the preprocessing steps, and the graphics libraries used, probably caused the differences in the results. The way an image is “resized” by System.Drawing.Common vs. ImageSharp (resampled, cropped and centered, or padded and pillar boxed) could make a significant difference to the results.
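For example, ImageSharp can do the aspect-preserving resize and the pillar-box padding in a single step. A minimal sketch, assuming a recent SixLabors.ImageSharp with the ResizeOptions API (the 1088×800 target is from the example above):

// Sketch: resize with pillar-box padding in one operation, instead of
// resizing then manually offsetting rows/columns while loading the tensor.
image.Mutate(x => x.Resize(new ResizeOptions
{
   Size = new Size(1088, 800),           // padded target, both multiples of 32
   Mode = ResizeMode.Pad,                // preserve aspect ratio, pad the remainder
   Position = AnchorPositionMode.Center, // centre the image on the padded canvas
   PadColor = Color.Black                // fill for the pillar-box bars
}));

Whether the padding is centred or anchored top-left changes where the detections land, so the bounding box post-processing has to match.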

ONNXRuntime.AI-Faster R-CNN C# Sample oddness

After building Faster R-CNN object detection applications with Copilot and GitHub Copilot, the results compared with Ultralytics YOLO (with YoloSharp) didn’t look too bad.

The input image sports.jpg was 1200×798 pixels

The FasterRCNNObjectDetectionApplicationCopilot application only generated labels, confidences and minimum bounding box coordinates.

The FasterRCNNObjectDetectionApplicationGitHubCopilot application's marked-up image was 1200×798 pixels

The YoloSharpObjectDetectionApplication application's marked-up image was 1200×798 pixels

I went back to the onnxruntime.ai Object detection with Faster RCNN Deep Learning in C# sample source code to check my implementations and the highlighted area on the left caught my attention.

The FasterRCNNObjectDetectionApplicationOriginal application's marked-up image was 1023×800

I downloaded the sample code, which hadn’t been updated for years.

public static void Main(string[] args)
{
   Console.WriteLine("FasterRCNNObjectDetectionApplicationOriginal");

   // Read paths
   string modelFilePath = args[0];
   string imageFilePath = args[1];
   string outImageFilePath = args[2];

   // Read image
   using Image<Rgb24> image = Image.Load<Rgb24>(imageFilePath);

   // Resize image
   float ratio = 800f / Math.Min(image.Width, image.Height);
   image.Mutate(x => x.Resize((int)(ratio * image.Width), (int)(ratio * image.Height)));

   // Preprocess image
   var paddedHeight = (int)(Math.Ceiling(image.Height / 32f) * 32f);
   var paddedWidth = (int)(Math.Ceiling(image.Width / 32f) * 32f);
   Tensor<float> input = new DenseTensor<float>(new[] { 3, paddedHeight, paddedWidth });
   var mean = new[] { 102.9801f, 115.9465f, 122.7717f };
   image.ProcessPixelRows(accessor =>
   {
      for (int y = paddedHeight - accessor.Height; y < accessor.Height; y++)
      {
         Span<Rgb24> pixelSpan = accessor.GetRowSpan(y);
         for (int x = paddedWidth - accessor.Width; x < accessor.Width; x++)
         {
            input[0, y, x] = pixelSpan[x].B - mean[0];
            input[1, y, x] = pixelSpan[x].G - mean[1];
            input[2, y, x] = pixelSpan[x].R - mean[2];
         }
      }
   });

   // Setup inputs and outputs
   var inputs = new List<NamedOnnxValue>
      {
            NamedOnnxValue.CreateFromTensor("image", input)
      };

   // Run inference
   using var session = new InferenceSession(modelFilePath);
   using IDisposableReadOnlyCollection<DisposableNamedOnnxValue> results = session.Run(inputs);

   // Postprocess to get predictions
   var resultsArray = results.ToArray();
   float[] boxes = resultsArray[0].AsEnumerable<float>().ToArray();
   long[] labels = resultsArray[1].AsEnumerable<long>().ToArray();
   float[] confidences = resultsArray[2].AsEnumerable<float>().ToArray();
   var predictions = new List<Prediction>();
   var minConfidence = 0.7f;
   for (int i = 0; i < boxes.Length - 4; i += 4)
   {
      var index = i / 4;
      if (confidences[index] >= minConfidence)
      {
         predictions.Add(new Prediction
         {
            Box = new Box(boxes[i], boxes[i + 1], boxes[i + 2], boxes[i + 3]),
            Label = LabelMap.Labels[labels[index]],
            Confidence = confidences[index]
         });
      }
   }

   // Put boxes, labels and confidence on image and save for viewing
   using var outputImage = File.OpenWrite(outImageFilePath);
   Font font = SystemFonts.CreateFont("Arial", 16);
   foreach (var p in predictions)
   {
      Console.WriteLine($"Label: {p.Label}, Confidence: {p.Confidence}, Bounding Box:[{p.Box.Xmin}, {p.Box.Ymin}, {p.Box.Xmax}, {p.Box.Ymax}]");
      image.Mutate(x =>
      {
         x.DrawLine(Color.Red, 2f, new PointF[] {

                  new PointF(p.Box.Xmin, p.Box.Ymin),
                  new PointF(p.Box.Xmax, p.Box.Ymin),

                  new PointF(p.Box.Xmax, p.Box.Ymin),
                  new PointF(p.Box.Xmax, p.Box.Ymax),

                  new PointF(p.Box.Xmax, p.Box.Ymax),
                  new PointF(p.Box.Xmin, p.Box.Ymax),

                  new PointF(p.Box.Xmin, p.Box.Ymax),
                  new PointF(p.Box.Xmin, p.Box.Ymin)
               });
         x.DrawText($"{p.Label}, {p.Confidence:0.00}", font, Color.White, new PointF(p.Box.Xmin, p.Box.Ymin));
      });
   }
   image.SaveAsJpeg(outputImage);

   Console.WriteLine("Press Enter to exit");
   Console.ReadLine();
}

I then compared the output of the object detection applications, and the onnxruntime.ai Object detection with Faster RCNN Deep Learning in C# sample's output was different.

After some investigation I think the scaling of the image used for inferencing (based on the requirements of the model), and then the scaling of the minimum bounding rectangles back to the original image, isn’t quite right.
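For reference, a minimal sketch of the mapping I would expect — an assumption on my part, since it presumes the image was only scaled by ratio with no crop or pad offset — is to divide each box coordinate by the same ratio used for the resize before drawing on the original image.

// Hypothetical helper (not in the sample): map a box from the resized
// inference image back to the original image, assuming a pure scale by
// `ratio` with no cropping or padding offset.
static Box ScaleToOriginal(Box box, float ratio)
{
   return new Box(box.Xmin / ratio, box.Ymin / ratio, box.Xmax / ratio, box.Ymax / ratio);
}

The sample instead draws the boxes directly on the resized image, which is why its marked-up output matches the resized dimensions rather than the original image's.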

Building Edge AI with Copilot – Faster R-CNN Client

Introduction

To better understand how the code Copilot and GitHub Copilot generated was impacted by the wording and order of prompts, I re-played one of my previous sessions.

The generated code looked “reasonable” and vaguely familiar.

static void Main(string[] args)
{
    // Paths to the ONNX model and the image file (adjust these paths as needed)
    string modelPath = @"..\..\..\..\Models\FasterRCNN-10.onnx";
    string imagePath = "sports.jpg";

    // Create the OnnxInference session
    using var session = new InferenceSession(modelPath);

    // Load the image from disk using ImageSharp
    using var image = Image.Load<Rgb24>(imagePath);

    // Resize the image to fit within the range and adjust dimensions to be divisible by 32
    ResizeImage(image);

    // Extract tensor data from the image (with shape [3, height, width])
    var inputTensor = ExtractTensorFromImage(image);

    // Create NamedOnnxValue input (ensure that the input name "image" matches your model's input)
    var inputs = new List<NamedOnnxValue>
    {
        NamedOnnxValue.CreateFromTensor("image", inputTensor)
    };

    // Run the model inference  
    using IDisposableReadOnlyCollection<DisposableNamedOnnxValue> results = session.Run(inputs);

    // Process and display the outputs (bounding boxes, labels, and confidences)
    ProcessOutput(results);

    Console.WriteLine("Press Enter to exit");
    Console.ReadLine();
}

/// <summary>
/// Resizes the input image such that both width and height are within the range [800, 1333]
/// and ensures the dimensions are divisible by 32.
/// </summary>
private static void ResizeImage(Image<Rgb24> image)
{
   const int minSize = 800;
   const int maxSize = 1333;
   int originalWidth = image.Width;
   int originalHeight = image.Height;

   // Determine the scaling factor so that the smallest side is at least minSize and the largest does not exceed maxSize.
   float scale = Math.Min((float)maxSize / Math.Max(originalWidth, originalHeight),
                                   (float)minSize / Math.Min(originalWidth, originalHeight));

   // Compute the new dimensions based on the scale
   int newWidth = (int)(originalWidth * scale);
   int newHeight = (int)(originalHeight * scale);

   // Adjust dimensions to be divisible by 32
   newWidth = (newWidth / 32) * 32;
   newHeight = (newHeight / 32) * 32;

   image.Mutate(x => x.Resize(newWidth, newHeight));
}

/// <summary>
/// Converts the resized image into a DenseTensor<float> with shape [3, height, width].
/// The image is processed to subtract the Faster‑RCNN channel means (B, G, R order).
/// </summary>
private static DenseTensor<float> ExtractTensorFromImage(Image<Rgb24> image)
{
   int width = image.Width;
   int height = image.Height;

   // Create a tensor with shape [channels, height, width]
   var tensor = new DenseTensor<float>(new[] { 3, height, width });

   // Faster‑RCNN channel means (order: blue, green, red)
   float[] mean = { 102.9801f, 115.9465f, 122.7717f };

   // Process each pixel row; ImageSharp provides efficient pixel row access.
   image.ProcessPixelRows(accessor =>
   {
      for (int y = 0; y < height; y++)
      {
         var pixelRow = accessor.GetRowSpan(y);
         for (int x = 0; x < width; x++)
         {
             // Subtract the channel mean value (ensuring B, G, R order)
             tensor[0, y, x] = pixelRow[x].B - mean[0];
             tensor[1, y, x] = pixelRow[x].G - mean[1];
             tensor[2, y, x] = pixelRow[x].R - mean[2];
         }
      }
   });

   return tensor;
}

/// <summary>
/// Processes the model output, extracting bounding boxes, labels, and confidences.
/// Only detections with confidence scores above a defined threshold are printed.
/// </summary>
private static void ProcessOutput(IDisposableReadOnlyCollection<DisposableNamedOnnxValue> output)
{
    // Note: The output node names ("6379", "6381", "6383") are model-specific.
    // They correspond, respectively, to boxes, labels, and confidence scores.
    var boxesTensor = output.First(x => x.Name == "6379").AsTensor<float>();
    var labelsTensor = output.First(x => x.Name == "6381").AsTensor<long>();
    var confidencesTensor = output.First(x => x.Name == "6383").AsTensor<float>();

    float[] boxes = boxesTensor.ToArray();
    long[] labels = labelsTensor.ToArray();
    float[] confidences = confidencesTensor.ToArray();

    const float minConfidence = 0.7f;

    // Each bounding box is represented by 4 values: x1, y1, x2, y2.
    for (int i = 0; i < boxes.Length; i += 4)
    {
        int detectionIndex = i / 4;
        if (confidences[detectionIndex] >= minConfidence)
        {
           long label = labels[detectionIndex];
           float confidence = confidences[detectionIndex];
           float x1 = boxes[i];
           float y1 = boxes[i + 1];
           float x2 = boxes[i + 2];
           float y2 = boxes[i + 3];
           Console.WriteLine($"Label: {label}, Confidence: {confidence}, Bounding Box: [{x1}, {y1}, {x2}, {y2}]");
        }
    }
}

The Copilot generated code had the names of the output tensors (6379, 6381, 6383), the mean calculation, and the order of the colours (B,G,R) correct. The name of the image file and the path to the model file in the “Explanation” and “Additional information” sections looked a lot like mine.

All I had to do was add the Microsoft.ML.OnnxRuntime and SixLabors.ImageSharp NuGets, then the code compiled and ran first time. I then checked the results, and they looked reasonable.

The similarities between the generated code for the different blog posts were suspicious, so I asked…

Summary

The Copilot generated code in this post was “inspired” by the Copilot code generated for my Building Edge AI with GitHub Copilot – Faster R-CNN Client and Building Edge AI with GitHub Copilot – Faster R-CNN Client Revisited posts, or my AIIoTForTheEdgeAndAzureBuiltWithCopilot repository.

The GitHub Copilot generated code in my AIIoTForTheEdgeAndAzureBuiltWithCopilot repository was in turn “inspired” by the Object detection with Faster RCNN Deep Learning in C# and Image recognition with ResNet50v2 in C# samples.

The Copilot generated code in this post is not suitable for production

Building Edge AI with GitHub Copilot – Faster R-CNN Client Revisited

Introduction

While building the first GitHub Copilot version I noticed that changing System.Drawing.Common to SixLabors.ImageSharp made a significant difference to the generated code. For this post I’ll be using GitHub Copilot to generate code for a console application that uses a Faster R-CNN ONNX model to process an image loaded from disk.

please write a C# console application that uses Onnx to run a ONNX Faster-RCNN object detection model on an image loaded from disk

I manually added the System.Drawing.Common and Microsoft.ML.OnnxRuntime NuGet packages and the required using directives, and then the code compiled.

static void Main(string[] args)
{
   // Path to the ONNX model and input image
   string modelPath = "FasterRCNN-10.onnx";
   string imagePath = "sports.jpg";

   // Load the image
   Bitmap image = new Bitmap(imagePath);
   var inputTensor = PreprocessImage(image);

   // Load the ONNX model
   using var session = new InferenceSession(modelPath);

   // Create input data
   var inputName = session.InputMetadata.Keys.First();
   var inputs = new List<NamedOnnxValue>
   {
         NamedOnnxValue.CreateFromTensor(inputName, inputTensor)
   };

   //// Run inference
   using IDisposableReadOnlyCollection<DisposableNamedOnnxValue> results = session.Run(inputs);

   // Process output
   foreach (var result in results)
   {
      Console.WriteLine($"{result.Name}: {string.Join(", ", result.AsEnumerable<float>())}");
   }
}

static DenseTensor<float> PreprocessImage(Bitmap image)
{
   // Resize image to model's expected input size (e.g., 224x224)
   int targetWidth = 224;
   int targetHeight = 224;
   Bitmap resizedImage = new Bitmap(image, new Size(targetWidth, targetHeight));

   // Convert image to float array and normalize
   var input = new DenseTensor<float>(new[] { 1, 3, targetHeight, targetWidth });
   for (int y = 0; y < targetHeight; y++)
   {
      for (int x = 0; x < targetWidth; x++)
      {
         Color pixel = resizedImage.GetPixel(x, y);
         input[0, 0, y, x] = pixel.R / 255f;
         input[0, 1, y, x] = pixel.G / 255f;
         input[0, 2, y, x] = pixel.B / 255f;
      }
   }

   return input;
}

I fixed the paths to the ONNX model and image file, so the application ran but there was an error.

Based on the Faster R-CNN documentation, the dimensions of the input tensor were wrong because the declaration had an extra dimension.
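One way to confirm the expected shape without trial and error is to dump the session's input metadata — a minimal sketch (the input name "image" and the 3×height×width shape are from the FasterRCNN-10 documentation):

// Sketch: inspect the model's declared input shape at runtime.
using var session = new InferenceSession(modelPath);
var inputMeta = session.InputMetadata["image"];
Console.WriteLine(string.Join(",", inputMeta.Dimensions)); // e.g. 3,-1,-1 — no batch dimension, dynamic height/width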

Change dimensions of tensor by removing batch from declaration

static DenseTensor<float> PreprocessImage(Bitmap image)
{
   // Resize image to model's expected input size (e.g., 224x224)
   int targetWidth = 224;
   int targetHeight = 224;
   Bitmap resizedImage = new Bitmap(image, new Size(targetWidth, targetHeight));

   // Convert image to float array and normalize
   var input = new DenseTensor<float>(new[] { 3, targetHeight, targetWidth });
   for (int y = 0; y < targetHeight; y++)
   {
      for (int x = 0; x < targetWidth; x++)
      {
         Color pixel = resizedImage.GetPixel(x, y);
         input[0, y, x] = pixel.R / 255f;
         input[1, y, x] = pixel.G / 255f;
         input[2, y, x] = pixel.B / 255f;
      }
   }

   return input;
}

GitHub Copilot fixed the tensor declaration by removing the batch id, then the application ran but there was an error.

The dimensions of the input image were wrong so, based on the Faster R-CNN documentation, I used GitHub Copilot to resize the image

resize the image such that both height and width are within the range of [800, 1333], such that both height and width are divisible by 32

static Bitmap ResizeImageForModel(Bitmap image)
{
   // Define the target range and divisibility
   const int minSize = 800;
   const int maxSize = 1333;
   const int divisor = 32;

   // Get original dimensions
   int originalWidth = image.Width;
   int originalHeight = image.Height;

   // Calculate scale factor to fit within the range while maintaining aspect ratio
   float scale = Math.Min((float)maxSize / Math.Max(originalWidth, originalHeight),
                          (float)minSize / Math.Min(originalWidth, originalHeight));

   // Calculate new dimensions
   int newWidth = (int)(originalWidth * scale);
   int newHeight = (int)(originalHeight * scale);

   // Ensure dimensions are divisible by 32
   newWidth = (newWidth / divisor) * divisor;
   newHeight = (newHeight / divisor) * divisor;

   // Resize the image
   return new Bitmap(image, new Size(newWidth, newHeight));
}

GitHub Copilot fixed the image resizing, so the application ran but there was still an error.

The processing of the output tensor was wrong so, based on the Faster R-CNN documentation, I used GitHub Copilot to add the code required to “correctly” display the results.

Display label, confidence and bounding box

I also manually added the using directive for System.Drawing.Drawing2D

static void ProcessOutput(IDisposableReadOnlyCollection<DisposableNamedOnnxValue> results, Bitmap image)
{
   // Extract output tensors
   var boxes = results.First(r => r.Name == "boxes").AsEnumerable<float>().ToArray();
   var labels = results.First(r => r.Name == "labels").AsEnumerable<long>().ToArray();
   var scores = results.First(r => r.Name == "scores").AsEnumerable<float>().ToArray();

   using Graphics graphics = Graphics.FromImage(image);
   graphics.SmoothingMode = SmoothingMode.AntiAlias;

   for (int i = 0; i < labels.Length; i++)
   {
      if (scores[i] < 0.5) continue; // Filter low-confidence detections

      // Extract bounding box coordinates
      float x1 = boxes[i * 4];
      float y1 = boxes[i * 4 + 1];
      float x2 = boxes[i * 4 + 2];
      float y2 = boxes[i * 4 + 3];

      // Draw bounding box
      RectangleF rect = new RectangleF(x1, y1, x2 - x1, y2 - y1);
      graphics.DrawRectangle(Pens.Red, rect.X, rect.Y, rect.Width, rect.Height);

      // Display label and confidence
      string label = $"Label: {labels[i]}, Confidence: {scores[i]:0.00}";
      graphics.DrawString(label, new Font("Arial", 12), Brushes.Yellow, new PointF(x1, y1 - 20));
   }

   // Save the image with annotations
   image.Save("output.jpg");
   Console.WriteLine("Output image saved as 'output.jpg'.");
}

The application ran but there was an error because the output tensor names were wrong.

I used Netron to determine the correct output tensor names.
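The output names can also be enumerated from the session metadata rather than Netron — a minimal sketch:

// Sketch: list the model's output tensor names at runtime.
foreach (var output in session.OutputMetadata)
{
   Console.WriteLine($"Output: {output.Key}"); // FasterRCNN-10 reports "6379", "6381" and "6383"
}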

It was quicker to manually fix the output tensor names

static void ProcessOutput(IDisposableReadOnlyCollection<DisposableNamedOnnxValue> results, Bitmap image)
 {
    // Extract output tensors
    var boxes = results.First(r => r.Name == "6379").AsEnumerable<float>().ToArray();
    var labels = results.First(r => r.Name == "6381").AsEnumerable<long>().ToArray();
    var scores = results.First(r => r.Name == "6383").AsEnumerable<float>().ToArray();

    using Graphics graphics = Graphics.FromImage(image);
    graphics.SmoothingMode = SmoothingMode.AntiAlias;

    for (int i = 0; i < labels.Length; i++)
    {
       if (scores[i] < 0.5) continue; // Filter low-confidence detections

       // Extract bounding box coordinates
       float x1 = boxes[i * 4];
       float y1 = boxes[i * 4 + 1];
       float x2 = boxes[i * 4 + 2];
       float y2 = boxes[i * 4 + 3];

       // Draw bounding box
       RectangleF rect = new RectangleF(x1, y1, x2 - x1, y2 - y1);
       graphics.DrawRectangle(Pens.Red, rect.X, rect.Y, rect.Width, rect.Height);

       // Display label and confidence
       string label = $"Label: {labels[i]}, Confidence: {scores[i]:0.00}";
       graphics.DrawString(label, new Font("Arial", 12), Brushes.Yellow, new PointF(x1, y1 - 20));
    }

    // Save the image with annotations
    image.Save("output.jpg");
    Console.WriteLine("Output image saved as 'output.jpg'.");
 }

The application ran but the results were bad, so I checked the format of the input tensor and figured out the mean adjustment was missing.

Apply mean to each channel

I used GitHub Copilot to add code for the mean adjustment for each pixel

static DenseTensor<float> PreprocessImage(Bitmap image)
{
   // Resize image to model's expected input size  
   Bitmap resizedImage = ResizeImageForModel(image);

   // Apply FasterRCNN mean values to each channel  
   float[] mean = { 102.9801f, 115.9465f, 122.7717f };

   // Convert image to float array and normalize  
   var input = new DenseTensor<float>(new[] { 3, resizedImage.Height, resizedImage.Width });
   for (int y = 0; y < resizedImage.Height; y++)
   {
      for (int x = 0; x < resizedImage.Width; x++)
      {
         Color pixel = resizedImage.GetPixel(x, y);
         input[0, y, x] = (pixel.R - mean[0]) / 255f;
         input[1, y, x] = (pixel.G - mean[1]) / 255f;
         input[2, y, x] = (pixel.B - mean[2]) / 255f;
      }
   }

   return input;
}

The application ran but the results were still bad, so I checked the format of the input tensor and figured out the mean adjustment was wrong. It was quicker to manually fix up the mean calculation.

static DenseTensor<float> PreprocessImage(Bitmap image)
{
   // Resize image to model's expected input size  
   Bitmap resizedImage = ResizeImageForModel(image);

   // Apply FasterRCNN mean values to each channel  
   float[] mean = { 102.9801f, 115.9465f, 122.7717f };

   // Convert image to float array and normalize  
   var input = new DenseTensor<float>(new[] { 3, resizedImage.Height, resizedImage.Width });
   for (int y = 0; y < resizedImage.Height; y++)
   {
      for (int x = 0; x < resizedImage.Width; x++)
      {
         Color pixel = resizedImage.GetPixel(x, y);

         input[0, y, x] = pixel.R - mean[0];
         input[1, y, x] = pixel.G - mean[1];
         input[2, y, x] = pixel.B - mean[2];
      }
   }

   return input;
}

The application ran but the results were still bad, so I checked the format of the input tensor and figured out the input tensor was BGR rather than RGB.

Change to B,G,R

static DenseTensor<float> PreprocessImage(Bitmap image)
{
   // Resize image to model's expected input size  
   Bitmap resizedImage = ResizeImageForModel(image);

   // Apply FasterRCNN mean values to each channel  
   float[] mean = { 102.9801f, 115.9465f, 122.7717f };

   // Convert image to float array and normalize  
   var input = new DenseTensor<float>(new[] { 3, resizedImage.Height, resizedImage.Width });
   for (int y = 0; y < resizedImage.Height; y++)
   {
      for (int x = 0; x < resizedImage.Width; x++)
      {
         Color pixel = resizedImage.GetPixel(x, y);
         input[0, y, x] = pixel.B - mean[0];
         input[1, y, x] = pixel.G - mean[1];
         input[2, y, x] = pixel.R - mean[2];
      }
   }

   return input;
}

Finally, the application's minimum bounding rectangles (MBRs), labels and confidences looked correct.

Summary

The GitHub Copilot generated code looked like it was “inspired” by the onnxruntime.ai Object detection with Faster RCNN Deep Learning in C# sample.

The additional code for marking up the image in ProcessOutput was unexpected, and I wonder if there wasn’t a SixLabors.ImageSharp example for “inspiration”.

The Copilot generated code in this post is not suitable for production

Building Edge AI with GitHub Copilot – Faster R-CNN Client

Introduction

For this post I’ll be using GitHub Copilot to generate code for a console application that uses a Faster R-CNN ONNX model to process an image loaded from disk.

please write a C# console application that uses Onnx to run a ONNX Faster-RCNN object detection model on an image loaded from disk

I added the System.Drawing.Common and Microsoft.ML.OnnxRuntime NuGet packages and the required using directives, then the code compiled. The ProcessOutput method was stubbed out, so I would need to figure out the format of the output tensor.
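(From the FasterRCNN-10 documentation, the model has three outputs: boxes, a flat N×4 float array of x1, y1, x2, y2 coordinates; labels, N int64 class ids; and scores, N float confidences — the format the later ProcessOutput implementations rely on.)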

static void Main(string[] args)
{
   string modelPath = "FasterRCNN-10.onnx";
   string imagePath = "sports.jpg";

   using var session = new InferenceSession(modelPath);
   using var image = new Bitmap(imagePath);

   var inputTensor = ExtractTensorFromImage(image);
   var inputs = new List<NamedOnnxValue>
   {
       NamedOnnxValue.CreateFromTensor("image", inputTensor)
   };

   using IDisposableReadOnlyCollection<DisposableNamedOnnxValue> results = session.Run(inputs);
   var output = results.First().AsEnumerable<float>().ToArray();

   // Process the output (e.g., draw bounding boxes on the image)
   ProcessOutput(output, image);

   image.Save("output.jpg", ImageFormat.Jpeg);
   Console.WriteLine("Object detection completed. Output saved as output.jpg");
}

private static DenseTensor<float> ExtractTensorFromImage(Bitmap image)
{
   int width = image.Width;
   int height = image.Height;
   var tensor = new DenseTensor<float>(new[] { 1, 3, height, width });

   for (int y = 0; y < height; y++)
   {
      for (int x = 0; x < width; x++)
      {
         Color pixel = image.GetPixel(x, y);
         tensor[0, 0, y, x] = pixel.R / 255.0f;
         tensor[0, 1, y, x] = pixel.G / 255.0f;
         tensor[0, 2, y, x] = pixel.B / 255.0f;
      }
   }

   return tensor;
}

private static void ProcessOutput(float[] output, Bitmap image)
{
   // Implement the logic to process the output and draw bounding boxes on the image
   // This is a placeholder for the actual implementation
}

I fixed the paths to the ONNX model and image files, so the application ran but there was an error.

The dimensions of the input tensor were wrong because the declaration included a batch id.

Change dimensions of tensor by removing batch from declaration

private static DenseTensor<float> ExtractTensorFromImage(Bitmap image)
{
   int width = image.Width;
   int height = image.Height;
   var tensor = new DenseTensor<float>(new[] { 3, height, width });

   for (int y = 0; y < height; y++)
   {
      for (int x = 0; x < width; x++)
      {
         Color pixel = image.GetPixel(x, y);
         tensor[0, y, x] = pixel.R / 255.0f;
         tensor[1, y, x] = pixel.G / 255.0f;
         tensor[2, y, x] = pixel.B / 255.0f;
      }
   }

   return tensor;
}

GitHub Copilot fixed the tensor declaration by removing the batch id. The application also used System.Drawing, which is not supported on my target embedded platforms.

change from System.Drawing to ImageSharp

I added the SixLabors.ImageSharp NuGet and removed the System.Drawing.Common NuGet, then updated the using directives as required.

using var image = Image.Load<Rgb24>(imagePath);
...
private static DenseTensor<float> ExtractTensorFromImage(Image<Rgb24> image)
{
   int width = image.Width;
   int height = image.Height;
   var tensor = new DenseTensor<float>(new[] { 3, height, width });

   image.ProcessPixelRows(accessor =>
   {
      for (int y = 0; y < height; y++)
      {
         var pixelRow = accessor.GetRowSpan(y);
         for (int x = 0; x < width; x++)
         {
            tensor[0, y, x] = pixelRow[x].R / 255.0f;
            tensor[1, y, x] = pixelRow[x].G / 255.0f;
            tensor[2, y, x] = pixelRow[x].B / 255.0f;
         }
      }
   });

   return tensor;
}

The application ran but there was an error as the dimensions of the input image were wrong.

Based on the Faster R-CNN documentation I used ImageSharp to resize the image

Use ImageSharp to resize the image such that both height and width are within the range of [800, 1333], such that both height and width are divisible by 32.

 private static void ResizeImage(Image<Rgb24> image)
 {
    const int minSize = 800;
    const int maxSize = 1333;

    int originalWidth = image.Width;
    int originalHeight = image.Height;

    float scale = Math.Min((float)maxSize / Math.Max(originalWidth, originalHeight), (float)minSize / Math.Min(originalWidth, originalHeight));
    int newWidth = (int)(originalWidth * scale);
    int newHeight = (int)(originalHeight * scale);

    // Ensure dimensions are divisible by 32
    newWidth = (newWidth / 32) * 32;
    newHeight = (newHeight / 32) * 32;

    image.Mutate(x => x.Resize(newWidth, newHeight));
 }

The application ran but the results were bad, so I checked the format of the input tensor and figured out the R,G,B values needed to be adjusted.

Apply mean to each channel

private static DenseTensor<float> ExtractTensorFromImage(Image<Rgb24> image)
{
   int width = image.Width;
   int height = image.Height;
   var tensor = new DenseTensor<float>(new[] { 3, height, width });

   // Mean values for each channel
   float[] mean = { 0.485f, 0.456f, 0.406f };

   image.ProcessPixelRows(accessor =>
   {
      for (int y = 0; y < height; y++)
      {
         var pixelRow = accessor.GetRowSpan(y);
         for (int x = 0; x < width; x++)
         {
            tensor[0, y, x] = (pixelRow[x].R / 255.0f) - mean[0];
            tensor[1, y, x] = (pixelRow[x].G / 255.0f) - mean[1];
            tensor[2, y, x] = (pixelRow[x].B / 255.0f) - mean[2];
         }
      }
   });

   return tensor;
}

The application ran but the results were still bad, so I checked the format of the input tensor and figured out the ResNet50 means had been used and the input tensor was BGR rather than RGB.

Use FasterRCNN means not resnet ones
Change to B,G,R

private static DenseTensor<float> ExtractTensorFromImage(Image<Rgb24> image)
{
   int width = image.Width;
   int height = image.Height;
   var tensor = new DenseTensor<float>(new[] { 3, height, width });

   // Mean values for each channel for FasterRCNN
   float[] mean = { 102.9801f, 115.9465f, 122.7717f };

   image.ProcessPixelRows(accessor =>
   {
      for (int y = 0; y < height; y++)
      {
         var pixelRow = accessor.GetRowSpan(y);
         for (int x = 0; x < width; x++)
         {
            tensor[0, y, x] = pixelRow[x].B - mean[0];
            tensor[1, y, x] = pixelRow[x].G - mean[1];
            tensor[2, y, x] = pixelRow[x].R - mean[2];
         }
      }
   });

   return tensor;
}

When I inspected the values in the output tensor in the debugger they looked “reasonable”, so I got GitHub Copilot to add the code required to display the results.

Display label, confidence and bounding box

The application ran but there was an exception because the names of the output tensor “dimensions” were wrong.

I used Netron to get the correct output tensor “dimension” names.

I then manually fixed the output tensor “dimension” names

private static void ProcessOutput(IDisposableReadOnlyCollection<DisposableNamedOnnxValue> output)
{
   var boxes = output.First(x => x.Name == "6379").AsTensor<float>().ToArray();
   var labels = output.First(x => x.Name == "6381").AsTensor<long>().ToArray();
   var confidences = output.First(x => x.Name == "6383").AsTensor<float>().ToArray();

   const float minConfidence = 0.7f;

   for (int i = 0; i < boxes.Length; i += 4)
   {
      var index = i / 4;
      if (confidences[index] >= minConfidence)
      {
         long label = labels[index];
         float confidence = confidences[index];
         float x1 = boxes[i];
         float y1 = boxes[i + 1];
         float x2 = boxes[i + 2];
         float y2 = boxes[i + 3];

         Console.WriteLine($"Label: {label}, Confidence: {confidence}, Bounding Box: [{x1}, {y1}, {x2}, {y2}]");
      }
   }
}

I manually compared the output of the console application with equivalent YoloSharp application output and the results looked close enough.

Summary

The Copilot prompts required to generate code were significantly more complex than in previous examples, and I had to regularly refer to the documentation to figure out what was wrong. The code wasn’t great and Copilot didn’t add much value.

The Copilot generated code in this post is not suitable for production

Building Edge AI with GitHub Copilot – Security Camera HTTP YoloSharp

When I started with the Security Camera HTTP code and added code to process the images with an Ultralytics object detection model, I found the order of the prompts could make a difference. My first attempt at adding YoloSharp to the SecurityCameraHttpClient application with GitHub Copilot didn’t go well and needed some “human intervention”. When I thought more about the order of the prompts, adding the same functionality went a lot better.

// Use a stream rather than loading image from a file
// Use YoloSharp to run an onnx Object Detection model on the image
// Make the YoloPredictor a class variable
// Save image if object with specified image class name detected
// Modify so objectDetected supports multiple image class names
// Modify code to make use of GPU configurable
// Make display of detections configurable in app settings
// Make saving of image configurable in app settings

internal class Program
{
   private static HttpClient _client;
   private static bool _isRetrievingImage = false;
   private static ApplicationSettings _applicationSettings;
   private static YoloPredictor _yoloPredictor;

   static void Main(string[] args)
   {
      Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} SecurityCameraClient starting");
#if RELEASE
         Console.WriteLine("RELEASE");
#else
         Console.WriteLine("DEBUG");
#endif

      var configuration = new ConfigurationBuilder()
            .AddJsonFile("appsettings.json", false, true)
            .AddUserSecrets<Program>()
            .Build();

      _applicationSettings = configuration.GetSection("ApplicationSettings").Get<ApplicationSettings>();

      // Initialize YoloPredictor with GPU configuration
      _yoloPredictor = new YoloPredictor(_applicationSettings.OnnxModelPath, new YoloPredictorOptions()
      {
         UseCuda = _applicationSettings.UseCuda, // Configurable GPU usage
      });

      using (HttpClientHandler handler = new HttpClientHandler { Credentials = new NetworkCredential(_applicationSettings.Username, _applicationSettings.Password) })
      using (_client = new HttpClient(handler))
      using (var timer = new Timer(async _ => await RetrieveImageAsync(), null, _applicationSettings.TimerDue, _applicationSettings.TimerPeriod))
      {
         Console.WriteLine("Press any key to exit...");
         Console.ReadKey();
      }
   }

      private static async Task RetrieveImageAsync()
      {
         if (_isRetrievingImage) return;

         _isRetrievingImage = true;
         try
         {
            Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} SecurityCameraClient download starting");

            HttpResponseMessage response = await _client.GetAsync(_applicationSettings.CameraUrl);
            response.EnsureSuccessStatusCode();

            using (Stream imageStream = await response.Content.ReadAsStreamAsync())
            {
               var detections = _yoloPredictor.Detect(imageStream);
               bool objectDetected = false;

               foreach (var detection in detections)
               {
                  if (_applicationSettings.LogDetections) // Check if logging detections is enabled
                  {
                     Console.WriteLine($"Detected {detection.Name.Name} with confidence {detection.Confidence}");
                  }

                  if (_applicationSettings.ClassNames.Contains(detection.Name.Name))
                  {
                     objectDetected = true;
                  }
               }

               if (objectDetected && _applicationSettings.SaveImage) // Check if saving images is enabled
               {
                  string savePath = string.Format(_applicationSettings.SavePath, DateTime.UtcNow);
                  using (FileStream fileStream = new FileStream(savePath, FileMode.Create, FileAccess.Write, FileShare.None))
                  {
                     imageStream.Position = 0;
                     await imageStream.CopyToAsync(fileStream);
                  }
               }
            }

            Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} SecurityCameraClient download done");
         }
         catch (Exception ex)
         {
            Console.WriteLine($"An error occurred: {ex.Message}");
         }
         finally
         {
            _isRetrievingImage = false;
         }
      }
}

public class ApplicationSettings
{
   public string CameraUrl { get; set; } = "";
   public string SavePath { get; set; } = "";
   public string Username { get; set; } = "";
   public string Password { get; set; } = "";
   public TimeSpan TimerDue { get; set; } = TimeSpan.Zero;
   public TimeSpan TimerPeriod { get; set; } = TimeSpan.Zero;
   public string OnnxModelPath { get; set; } = "";
   public bool UseCuda { get; set; } = false; // Configurable GPU usage
   public List<string> ClassNames { get; set; } //= new List<string>();
   public bool LogDetections { get; set; } = false; // Configurable logging of detections 
   public bool SaveImage { get; set; } = false; // Configurable saving of images
}

The interaction of Visual Studio IntelliSense with the GitHub Copilot prompts was interesting.

I wonder if this is because Visual Studio IntelliSense has local context, whereas GitHub Copilot has “cloud” context.

It took a couple of failed attempts to find the best order, which I think would reduce over time.

The Copilot generated code in this post is not suitable for production

Building Edge AI with GitHub Copilot – Security Camera HTTP

I use GitHub Copilot with Visual Studio 2022 and I wanted to see if I could write a simple application only using “Ask Copilot”. Retrieving a static image from a security camera (like my HiLook IPC-T250H) felt like a reasonable challenge.

please write a console application that uses an HttpClient to retrieve an image from a security camera

static async Task Main(string[] args)
{
   string cameraUrl = "http://your-security-camera-url/image.jpg";
   string savePath = "downloaded_image.jpg";

   using (HttpClient client = new HttpClient())
   {
      try
      {
         HttpResponseMessage response = await client.GetAsync(cameraUrl);
         response.EnsureSuccessStatusCode();

         byte[] imageBytes = await response.Content.ReadAsByteArrayAsync();
         await File.WriteAllBytesAsync(savePath, imageBytes);

         Console.WriteLine("Image downloaded successfully.");
      }
      catch (Exception ex)
      {
         Console.WriteLine($"An error occurred: {ex.Message}");
      }
   }
}

The GitHub Copilot generated code compiled first time and looked reasonable for a Proof of Concept (PoC). After changing the “cameraUrl”, the program could retrieve an image from one of my cameras when I disabled authentication.

please write a console application that uses an httpclient to retrieve an image from a security camera. The security camera needs a username and password

static async Task Main(string[] args)
{
   string cameraUrl = "http://your-security-camera-url/image";
   string username = "your-username";
   string password = "your-password";

   using (HttpClient client = new HttpClient())
   {
      var byteArray = new System.Text.ASCIIEncoding().GetBytes($"{username}:{password}");
      client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", Convert.ToBase64String(byteArray));

      HttpResponseMessage response = await client.GetAsync(cameraUrl);
      response.EnsureSuccessStatusCode();

      byte[] imageBytes = await response.Content.ReadAsByteArrayAsync();
      File.WriteAllBytes("camera_image.jpg", imageBytes);

      Console.WriteLine("Image retrieved and saved as camera_image.jpg");
   }
}

GitHub Copilot had taken my request literally and implemented Basic authentication per Hypertext Transfer Protocol (HTTP/1.1): Authentication (RFC7235). My HiLook IPC-T250H only supports Digest Authentication (RFC2617), so I tried

please modify the security code to support Digest authentication

static async Task Main(string[] args)
{
   string cameraUrl = "http://your-security-camera-url/image";
   string username = "your-username";
   string password = "your-password";

   using (HttpClient client = new HttpClient())
   {
      HttpResponseMessage response = await client.GetAsync(cameraUrl);

      if (response.StatusCode == System.Net.HttpStatusCode.Unauthorized)
      {
         var wwwAuthenticateHeader = response.Headers.WwwAuthenticate.FirstOrDefault();
         if (wwwAuthenticateHeader != null && wwwAuthenticateHeader.Scheme == "Digest")
         {
            var digestHeader = wwwAuthenticateHeader.Parameter;
            var digestResponse = CreateDigestHeader(digestHeader, cameraUrl, username, password, "GET");

            client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Digest", digestResponse);

            response = await client.GetAsync(cameraUrl);
            response.EnsureSuccessStatusCode();

            byte[] imageBytes = await response.Content.ReadAsByteArrayAsync();
            File.WriteAllBytes("camera_image.jpg", imageBytes);

            Console.WriteLine("Image retrieved and saved as camera_image.jpg");
         }
      }
   }
}

private static string CreateDigestHeader(string digestHeader, string url, string username, string password, string method)
{
   var headerValues = digestHeader.Split(',').Select(v => v.Trim().Split('=')).ToDictionary(v => v[0], v => v[1].Trim('"'));

   string ha1 = CalculateMD5Hash($"{username}:{headerValues["realm"]}:{password}");
   string ha2 = CalculateMD5Hash($"{method}:{url}");
   string response = CalculateMD5Hash($"{ha1}:{headerValues["nonce"]}:{headerValues["nc"]}:{headerValues["cnonce"]}:{headerValues["qop"]}:{ha2}");

   return $"username=\"{username}\", realm=\"{headerValues["realm"]}\", nonce=\"{headerValues["nonce"]}\", uri=\"{url}\", qop={headerValues["qop"]}, nc={headerValues["nc"]}, cnonce=\"{headerValues["cnonce"]}\", response=\"{response}\", opaque=\"{headerValues["opaque"]}\"";
}

private static string CalculateMD5Hash(string input)
{
   using (MD5 md5 = MD5.Create())
   {
      byte[] inputBytes = Encoding.ASCII.GetBytes(input);
      byte[] hashBytes = md5.ComputeHash(inputBytes);
      return BitConverter.ToString(hashBytes).Replace("-", "").ToLower();
   }
}

GitHub Copilot had taken my request literally and implemented Digest Authentication (RFC2617), so I did some “research”. The Copilot desktop application suggested using NetworkCredential, which supports Basic and Digest (also NTLM and Kerberos) authentication.

modify the code to use a NetworkCredential to authenticate with the security camera

using System.Net;

namespace SecurityCameraClient
{
   internal class Program
   {
      static async Task Main(string[] args)
      {
         string cameraUrl = "http://your-security-camera-url/image.jpg";
         string savePath = "downloaded_image.jpg";
         string username = "your-username";
         string password = "your-password";

         using (HttpClientHandler handler = new HttpClientHandler { Credentials = new NetworkCredential(username, password) })
         using (HttpClient client = new HttpClient(handler))
         {
            try
            {
               HttpResponseMessage response = await client.GetAsync(cameraUrl);
               response.EnsureSuccessStatusCode();

               byte[] imageBytes = await response.Content.ReadAsByteArrayAsync();
               await File.WriteAllBytesAsync(savePath, imageBytes);

               Console.WriteLine("Image downloaded successfully.");
            }
            catch (Exception ex)
            {
               Console.WriteLine($"An error occurred: {ex.Message}");
            }
         }
      }
   }
}

The GitHub Copilot generated code compiled, and after changing the “cameraUrl”, “username” and “password” the program could retrieve an image from each of my cameras.

I wanted the application to retrieve images regularly, but the first GitHub Copilot generated code was pretty ugly (it used a System.Timers.Timer). After “consulting” the Copilot desktop application

modify the code to use a System.Threading Timer so an image is retrieved every 1000mSec

using System.Net;

namespace SecurityCameraClient
{
   internal class Program
   {
      private static Timer _timer;
      private static HttpClient _client;
      private static string _cameraUrl = "http://your-security-camera-url/image.jpg";
      private static string _savePath = "downloaded_image.jpg";

      static void Main(string[] args)
      {
         string username = "your-username";
         string password = "your-password";

         HttpClientHandler handler = new HttpClientHandler { Credentials = new NetworkCredential(username, password) };
         _client = new HttpClient(handler, disposeHandler: true);

         _timer = new Timer(async _ => await RetrieveImageAsync(), null, 0, 1000);

         Console.WriteLine("Press any key to exit...");
         Console.ReadKey();

         _timer.Dispose();
         _client.Dispose();
      }

      private static async Task RetrieveImageAsync()
      {
         try
         {
            HttpResponseMessage response = await _client.GetAsync(_cameraUrl);
            response.EnsureSuccessStatusCode();

            byte[] imageBytes = await response.Content.ReadAsByteArrayAsync();
            await File.WriteAllBytesAsync(_savePath, imageBytes);

            Console.WriteLine("Image downloaded successfully.");
         }
         catch (Exception ex)
         {
            Console.WriteLine($"An error occurred: {ex.Message}");
         }
      }
   }
}

By this stage I was an elite “prompt engineer”, and after a couple of revisions of my GitHub Copilot prompt the unmanaged resources were released with “using” statements (the initial version used .Dispose and was a bit ugly).

modify the code to dispose of the HttpClientHandler, HttpClient and Timer with “using” statements

The application had a reentrancy issue when retrieving an image from a camera took too long.

modify the code to stop RetrieveImageAsync getting called while an image is already being retrieved
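The fix that ended up in the final version below is a simple re-entrancy flag — a minimal sketch:

private static bool _isRetrievingImage = false;

private static async Task RetrieveImageAsync()
{
   if (_isRetrievingImage) return; // Skip this tick if the previous retrieval is still running

   _isRetrievingImage = true;
   try
   {
      // ... retrieve and save the image ...
   }
   finally
   {
      _isRetrievingImage = false;
   }
}

A plain bool is fine for a proof of concept, though Interlocked.CompareExchange would be more robust if the timer callbacks could genuinely overlap.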

I then decided to try a “cosmetic” change

modify the code _timer does not have to be class level variable

The savePath was a constant and I wanted to store a series of images.

modify the code to use String.Format to generate the savepath

For the final version I modified the program, adding a Console.WriteLine to display the build type and retrieving the SavePath, dueTime, and period from appsettings.json. The Microsoft.Extensions.Configuration.UserSecrets configuration source was used for the CameraUrl, Username, and Password.

using System.Net;

using Microsoft.Extensions.Configuration;

namespace SecurityCameraClient
{
   internal class Program
   {
      private static HttpClient _client;
      private static bool _isRetrievingImage = false;
      private static ApplicationSettings _applicationSettings;

      static void Main(string[] args)
      {
         Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} SecurityCameraClient starting");
#if RELEASE
         Console.WriteLine("RELEASE");
#else
         Console.WriteLine("DEBUG");
#endif

         var configuration = new ConfigurationBuilder()
            .AddJsonFile("appsettings.json", false, true)
            .AddUserSecrets<Program>()
            .Build();

         _applicationSettings = configuration.GetSection("ApplicationSettings").Get<ApplicationSettings>();

         using (HttpClientHandler handler = new HttpClientHandler { Credentials = new NetworkCredential(_applicationSettings.Username, _applicationSettings.Password) })
         using (_client = new HttpClient(handler))
         using (var timer = new Timer(async _ => await RetrieveImageAsync(), null, _applicationSettings.TimerDue, _applicationSettings.TimerPeriod))
         {
            Console.WriteLine("Press any key to exit...");
            Console.ReadKey();
         }
      }

      private static async Task RetrieveImageAsync()
      {
         if (_isRetrievingImage) return;

         _isRetrievingImage = true;
         try
         {
            HttpResponseMessage response = await _client.GetAsync(_applicationSettings.CameraUrl);
            response.EnsureSuccessStatusCode();

            byte[] imageBytes = await response.Content.ReadAsByteArrayAsync();
            string savePath = string.Format(_applicationSettings.SavePath, DateTime.UtcNow);
            await File.WriteAllBytesAsync(savePath, imageBytes);

            Console.WriteLine("Image downloaded successfully.");
         }
         catch (Exception ex)
         {
            Console.WriteLine($"An error occurred: {ex.Message}");
         }
         finally
         {
            _isRetrievingImage = false;
         }
      }
   }

   public class ApplicationSettings
   {
      public string CameraUrl { get; set; } = "";

      public string SavePath { get; set; } = "";

      public string Username { get; set; } = "";

      public string Password { get; set; } = "";

      public TimeSpan TimerDue { get; set; } = TimeSpan.Zero;

      public TimeSpan TimerPeriod { get; set; } = TimeSpan.Zero;
   }
}
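A matching appsettings.json might look like this — the values are placeholders of my own (the CameraUrl, Username and Password come from user secrets), and the SavePath is a String.Format pattern that stamps each image with the UTC time:

{
  "ApplicationSettings": {
    "SavePath": "camera_{0:yyyyMMddHHmmss}.jpg",
    "TimerDue": "00:00:05",
    "TimerPeriod": "00:00:30"
  }
}

The configuration binder parses the TimerDue and TimerPeriod strings as TimeSpan values.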

Overall, my GitHub Copilot experience was pretty good, and got better as my “prompt engineering” improved.

The GitHub Copilot “decision” to implement Hypertext Transfer Protocol (HTTP/1.1): Authentication (RFC7235) and Digest Authentication (RFC2617) was “sub optimal”.