In this post I have not covered YoloV8 model selection or tuning of the training configuration to optimise the “performance” of the model. I used the default settings and ran the model training overnight, which cost USD 6.77.
This post is not about how to create a “good” model; it describes the approach I took to create a “proof of concept” model for a demonstration.
To comply with the Ultralytics AGPL-3.0 License, and to use an Ultralytics Pro plan, the source code and models of an application have to be open source. Rather than publishing my YoloV8 model (which is quite large), this is the first in a series of posts detailing the process I used to create it, which I think is more useful.
The single test image (not a good idea) is a photograph of 30 tennis balls on my living room floor.
The object detection results using the “default” model were pretty bad, but this wasn’t a surprise as the model is not optimised for this sort of problem.
I have used datasets from Roboflow Universe, which is a great resource for building “proof of concept” applications.
The first step was to identify some datasets which could improve my tennis ball object detection results. After some searching (with tennis, tennis-ball etc. classes) and filtering (object detection, has a model for faster evaluation, more than 5000 images) to reduce the search results to a manageable number, I identified 5 datasets worth further evaluation.
In my scenario the performance of the Acebot by Mrunal model was worse than the “default” yolov8s model.
In my scenario the performance of the tennis racket by test model was similar to the “default” yolov8s model.
In my scenario the performance of the Tennis Ball by Hust model was a bit better than the “default” yolov8s model.
In my scenario the performance of the roboflow_oball by ahmedelshalkany model was pretty good; it detected 28 of the 30 tennis balls.
In my scenario the performance of the Tennis Ball by Ugur Ozdemir model was good; it detected all 30 tennis balls.
The application uses the Microsoft.Extensions.Logging library to publish diagnostic information to the console while debugging.
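For reference, a minimal sketch of wiring up console logging with Microsoft.Extensions.Logging (the Microsoft.Extensions.Logging.Console package provides AddConsole; the category name is illustrative, not from the sample):

using Microsoft.Extensions.Logging;

// Build a LoggerFactory which writes log entries to the console
using ILoggerFactory loggerFactory = LoggerFactory.Create(builder =>
{
    builder.AddConsole();
    builder.SetMinimumLevel(LogLevel.Debug);
});

// The category name is illustrative
ILogger logger = loggerFactory.CreateLogger("YoloV8.Detect.SecurityCamera.File");

logger.LogInformation("YoloV8 Security Camera application starting");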
To check the results I put a breakpoint in the timer callback just after the DetectAsync method is called, then used the Visual Studio 2022 debugger QuickWatch functionality to inspect the contents of the DetectionResult object.
This application can also be deployed as a Linux systemd service so it will start and then run in the background. The same approach as the YoloV8.Detect.SecurityCamera.Stream sample is used because the image doesn’t have to be saved on the local filesystem.
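For illustration, a minimal systemd unit file for running the application in the background; the service description, paths, and user are assumptions, not taken from the sample:

[Unit]
Description=YoloV8 Security Camera object detection (illustrative)
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
# Paths and user below are illustrative placeholders
WorkingDirectory=/home/pi/YoloV8.Detect.SecurityCamera.File
ExecStart=/usr/bin/dotnet YoloV8.Detect.SecurityCamera.File.dll
Restart=on-failure
User=pi

[Install]
WantedBy=multi-user.target

The unit file would be copied to /etc/systemd/system, then registered and started with sudo systemctl enable --now.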
The YoloV8.Detect.SecurityCamera.File sample downloads images from the security camera to the local file system, then calls DetectAsync with the local file path.
private static async void ImageUpdateTimerCallback(object state)
{
    //...
    try
    {
        Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV8 Security Camera Image File processing start");

        // Download the current image from the security camera and save it to the local filesystem
        using (Stream cameraStream = await _httpClient.GetStreamAsync(_applicationSettings.CameraUrl))
        using (Stream fileStream = System.IO.File.Create(_applicationSettings.ImageFilepath))
        {
            await cameraStream.CopyToAsync(fileStream);
        }

        // Run object detection on the downloaded image file
        DetectionResult result = await _predictor.DetectAsync(_applicationSettings.ImageFilepath);

        Console.WriteLine($"Speed: {result.Speed}");

        foreach (var prediction in result.Boxes)
        {
            Console.WriteLine($" Class {prediction.Class} {(prediction.Confidence * 100.0):f1}% X:{prediction.Bounds.X} Y:{prediction.Bounds.Y} Width:{prediction.Bounds.Width} Height:{prediction.Bounds.Height}");
        }

        Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss:fff} YoloV8 Security Camera Image processing done");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"{DateTime.UtcNow:yy-MM-dd HH:mm:ss} YoloV8 Security camera image download or YoloV8 prediction failed {ex.Message}");
    }
    //...
}
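For context, a rough sketch of how the callback could be wired up with a System.Threading.Timer; the due time and period settings are my assumptions, not necessarily the sample’s configuration:

using System;
using System.Threading;

// Sketch only: periodically invoke the image download + object detection callback
private static Timer _imageUpdateTimer;

private static void StartImageUpdateTimer()
{
    _imageUpdateTimer = new Timer(
        ImageUpdateTimerCallback,                // the async void callback above
        null,
        _applicationSettings.ImageTimerDue,      // assumed setting, e.g. TimeSpan.Zero
        _applicationSettings.ImageTimerPeriod);  // assumed setting, e.g. TimeSpan.FromSeconds(30)
}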
The ImageSelector parameter of DetectAsync caught my attention as I hadn’t seen this approach used before. The developers who wrote the NuGet package are definitely smarter than me, so I figured I might learn something useful digging deeper.
public static DetectionResult Detect(this YoloV8 predictor, ImageSelector selector)
{
    predictor.ValidateTask(YoloV8Task.Detect);

    return predictor.Run(selector, (outputs, image, timer) =>
    {
        var output = outputs[0].AsTensor<float>();

        var parser = new DetectionOutputParser(predictor.Metadata, predictor.Parameters);

        var boxes = parser.Parse(output, image);
        var speed = timer.Stop();

        return new DetectionResult
        {
            Boxes = boxes,
            Image = image,
            Speed = speed,
        };
    });
}

public TResult Run<TResult>(ImageSelector selector, PostprocessContext<TResult> postprocess) where TResult : YoloV8Result
{
    using var image = selector.Load(true);

    var originSize = image.Size;

    var timer = new SpeedTimer();

    timer.StartPreprocess();

    var input = Preprocess(image);

    var inputs = MapNamedOnnxValues([input]);

    timer.StartInference();

    using var outputs = Infer(inputs);

    var list = new List<NamedOnnxValue>(outputs);

    timer.StartPostprocess();

    return postprocess(list, originSize, timer);
}
It looks like most of the image loading magic of the ImageSelector class is implemented using the SixLabors ImageSharp library…
public class ImageSelector<TPixel> where TPixel : unmanaged, IPixel<TPixel>
{
    private readonly Func<Image<TPixel>> _factory;

    public ImageSelector(Image image)
    {
        _factory = image.CloneAs<TPixel>;
    }

    public ImageSelector(string path)
    {
        _factory = () => Image.Load<TPixel>(path);
    }

    public ImageSelector(byte[] data)
    {
        _factory = () => Image.Load<TPixel>(data);
    }

    public ImageSelector(Stream stream)
    {
        _factory = () => Image.Load<TPixel>(stream);
    }

    internal Image<TPixel> Load(bool autoOrient)
    {
        var image = _factory();

        if (autoOrient)
            image.Mutate(x => x.AutoOrient());

        return image;
    }

    public static implicit operator ImageSelector<TPixel>(Image image) => new(image);
    public static implicit operator ImageSelector<TPixel>(string path) => new(path);
    public static implicit operator ImageSelector<TPixel>(byte[] data) => new(data);
    public static implicit operator ImageSelector<TPixel>(Stream stream) => new(stream);
}
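The implicit operators are what make the API so flexible; the same DetectAsync call can be passed a file path, a byte array, a Stream, or an already loaded Image. A quick sketch (the file name is illustrative):

// Each call resolves to the same DetectAsync(ImageSelector) overload,
// the implicit operators construct the ImageSelector behind the scenes
DetectionResult fromPath = await predictor.DetectAsync("TennisBalls.jpg");

byte[] buffer = await System.IO.File.ReadAllBytesAsync("TennisBalls.jpg");
DetectionResult fromBytes = await predictor.DetectAsync(buffer);

using (Stream stream = System.IO.File.OpenRead("TennisBalls.jpg"))
{
    DetectionResult fromStream = await predictor.DetectAsync(stream);
}

using (var image = await SixLabors.ImageSharp.Image.LoadAsync<Rgba32>("TennisBalls.jpg"))
{
    DetectionResult fromImage = await predictor.DetectAsync(image);
}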
I learnt something new, but I must be careful to apply it only where it adds value.
All of the implementations load the model, load the sample image, detect objects in the image, then mark up the image with the classification, minimum bounding box, and confidence of each object.
The first implementation uses YoloV8 by dme-compunet, which supports asynchronous operation. The image is loaded asynchronously, the prediction runs asynchronously, then the marked-up image is saved asynchronously.
using (var predictor = new Compunet.YoloV8.YoloV8(_applicationSettings.ModelPath))
{
    Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model load done");
    Console.WriteLine();

    using (var image = await SixLabors.ImageSharp.Image.LoadAsync<Rgba32>(_applicationSettings.ImageInputPath))
    {
        Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect start");

        var predictions = await predictor.DetectAsync(image);

        Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect done");
        Console.WriteLine();

        Console.WriteLine($" Speed: {predictions.Speed}");

        foreach (var prediction in predictions.Boxes)
        {
            Console.WriteLine($"  Class {prediction.Class} {(prediction.Confidence * 100.0):f1}% X:{prediction.Bounds.X} Y:{prediction.Bounds.Y} Width:{prediction.Bounds.Width} Height:{prediction.Bounds.Height}");
        }
        Console.WriteLine();

        Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Plot and save : {_applicationSettings.ImageOutputPath}");

        SixLabors.ImageSharp.Image imageOutput = await predictions.PlotImageAsync(image);

        await imageOutput.SaveAsJpegAsync(_applicationSettings.ImageOutputPath);
    }
}
The second implementation uses YoloDotNet by NickSwardh, which partially supports asynchronous operation. The image is loaded asynchronously, the prediction is synchronous, the markup is synchronous, and the image is saved asynchronously.
using (var predictor = new Yolo(_applicationSettings.ModelPath, false))
{
    Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model load done");
    Console.WriteLine();

    using (var image = await SixLabors.ImageSharp.Image.LoadAsync<Rgba32>(_applicationSettings.ImageInputPath))
    {
        Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect start");

        var predictions = predictor.RunObjectDetection(image);

        Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect done");
        Console.WriteLine();

        foreach (var prediction in predictions)
        {
            Console.WriteLine($"  Class {prediction.Label.Name} {(prediction.Confidence * 100.0):f1}% X:{prediction.BoundingBox.Left} Y:{prediction.BoundingBox.Y} Width:{prediction.BoundingBox.Width} Height:{prediction.BoundingBox.Height}");
        }
        Console.WriteLine();

        Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Plot and save : {_applicationSettings.ImageOutputPath}");

        image.Draw(predictions);

        await image.SaveAsJpegAsync(_applicationSettings.ImageOutputPath);
    }
}
The third implementation uses YoloV8 by sstainba, which partially supports asynchronous operation. The image is loaded asynchronously, the prediction is synchronous, the markup is synchronous, and the image is saved asynchronously.
using (var predictor = YoloV8Predictor.Create(_applicationSettings.ModelPath))
{
    Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model load done");
    Console.WriteLine();

    using (var image = await SixLabors.ImageSharp.Image.LoadAsync<Rgba32>(_applicationSettings.ImageInputPath))
    {
        Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect start");

        var predictions = predictor.Predict(image);

        Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} YoloV8 Model detect done");
        Console.WriteLine();

        foreach (var prediction in predictions)
        {
            Console.WriteLine($"  Class {prediction.Label.Name} {(prediction.Score * 100.0):f1}% X:{prediction.Rectangle.X} Y:{prediction.Rectangle.Y} Width:{prediction.Rectangle.Width} Height:{prediction.Rectangle.Height}");
        }
        Console.WriteLine();

        Console.WriteLine($" {DateTime.UtcNow:yy-MM-dd HH:mm:ss.fff} Plot and save : {_applicationSettings.ImageOutputPath}");

        // This is a bit hacky and should be fixed up in a future release
        Font font = new Font(SystemFonts.Get(_applicationSettings.FontName), _applicationSettings.FontSize);

        foreach (var prediction in predictions)
        {
            // Clamp the bounding box to the image bounds.
            // Note that the output is already scaled to the original image height and width.
            var x = (int)Math.Max(prediction.Rectangle.X, 0);
            var y = (int)Math.Max(prediction.Rectangle.Y, 0);
            var width = (int)Math.Min(image.Width - x, prediction.Rectangle.Width);
            var height = (int)Math.Min(image.Height - y, prediction.Rectangle.Height);

            // Bounding box text drawn just above the rectangle
            string text = $"{prediction.Label.Name} [{prediction.Score}]";
            var size = TextMeasurer.MeasureSize(text, new TextOptions(font));

            image.Mutate(d => d.Draw(Pens.Solid(Color.Yellow, 2), new Rectangle(x, y, width, height)));
            image.Mutate(d => d.DrawText(text, font, Color.Yellow, new Point(x, (int)(y - size.Height - 1))));
        }

        await image.SaveAsJpegAsync(_applicationSettings.ImageOutputPath);
    }
}
I don’t understand why the three NuGet packages produced different results, which is worrying. My best guess is differences in their default confidence thresholds, input preprocessing, or non-maximum suppression settings, but that needs further investigation.