Swarm Space – Uplink with Azure Functions or WebAPI

This post could have been much longer with more screen grabs and code snippets, so this is the “highlights package”. It took a lot longer than I expected because building, testing locally, then deploying the different implementations was time consuming.

Swarm Space Connector Functions Projects

I built the projects to investigate the different options, taking into account reliability, robustness, amount of code and performance (I think slow startup could be a problem). The code is very “plain”: I used the default options, no copyright notices, default formatting, and the context sensitive error messages to add any required “using” statements, libraries etc.

The desktop emulator hosting the six functions

I also deployed the Azure Functions and ASP .NET Core WebAPI application to check there were no differences (beyond performance) in the way they worked. I included a “default” function (generated by the new project wizard) for reference while I was building the others.

The function application with six functions deployed to Azure

The “dynamic” type function worked but broke when the JavaScript Object Notation(JSON) was invalid or fields were missing, and it didn’t enforce that the payload was correct.

// Namespaces required by these HTTP Trigger functions (omitted from the later snippets for brevity)
using System.Threading.Tasks;

using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

namespace WebhookHttpTrigger
{
    public static class Dynamic
    {
        [FunctionName("Dynamic")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] dynamic input,
            ILogger log)
        {
            log.LogInformation($"C# HTTP Dynamic trigger function processed a request PacketId:{input.packetId}.");

            return new OkObjectResult("Hello, This HTTP triggered Dynamic function executed successfully.");
        }
    }
}
Dynamic Trigger function failing because PacketId format was invalid (x in numeric field)

The “TypedAutomagic” function worked: it ensured the JavaScript Object Notation(JSON) was valid and the payload format was correct, but it didn’t enforce the System.ComponentModel.DataAnnotations attributes.

namespace WebhookHttpTrigger
{
    public static class TypedAutomagic
    {
        [FunctionName("TypedAutomagic")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] UplinkPayload payload,
            ILogger log)
        {
            log.LogInformation($"C# HTTP trigger function typed TypedAutomagic UplinkPayload processed a request PacketId:{payload.PacketId}");

            return new OkObjectResult("Hello, This HTTP triggered automagic function executed successfully.");
        }
    }
}
Successful execution of TypedAutomagic function

The “TypedAutomagic” implementation also detected when the JSON property values in the payload couldn’t be deserialised successfully, but if the hiveRxTime was invalid the value was set to 1/1/0001 12:00:00 am.

TypedAutomagic hiveRxTime deserialisation failing
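
For context, a minimal sketch of what the UplinkPayload Data Transfer Object could look like. The property names follow the Swarm webhook JSON fields used later in this post; the datatypes and System.ComponentModel.DataAnnotations attributes shown are illustrative assumptions, not the exact project code:

using System;
using System.ComponentModel.DataAnnotations;

public class UplinkPayload
{
    [Range(1, long.MaxValue, ErrorMessage = "PacketId must be greater than or equal to {1}")]
    public long PacketId { get; set; }

    public byte DeviceType { get; set; }

    public uint DeviceId { get; set; } // unsigned, so a negative deviceId fails deserialisation

    public uint OrganizationId { get; set; }

    public ushort UserApplicationId { get; set; }

    public DateTime HiveRxTime { get; set; } // an invalid value deserialises to DateTime.MinValue (1/1/0001 12:00:00 am)

    public byte Len { get; set; }

    [Required]
    public string Data { get; set; }

    public int Status { get; set; }
}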

The “TypedDeserializeObject” function worked: it ensured the JavaScript Object Notation(JSON) was valid and the payload format was correct, but it also didn’t enforce the System.ComponentModel.DataAnnotations attributes.

namespace WebhookHttpTrigger
{
    public static class TypedDeserializeObject
    {
        [FunctionName("TypedDeserializeObject")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] string httpPayload,
            ILogger log)
        {
            UplinkPayload uplinkPayload;

            try
            {
                uplinkPayload = JsonConvert.DeserializeObject<UplinkPayload>(httpPayload);
            }
            catch(Exception ex) 
            {
                log.LogWarning(ex, "JsonConvert.DeserializeObject failed");

                return new BadRequestObjectResult(ex.Message);
            }

            log.LogInformation($"C# HTTP trigger function typed DeserializeObject UplinkPayload processed a request PacketId:{uplinkPayload.PacketId}");

            return new OkObjectResult("Hello, This HTTP triggered DeserializeObject function executed successfully.");
        }
    }
}
TypedDeserializeObject function failing because PacketId format was invalid (x in numeric field)
TypedDeserializeObject function failing because deviceId value is negative but datatype is unsigned
Successful execution of TypedDeserializeObject function

The “TypedDeserializeObjectAnnotations” function worked: it ensured the JavaScript Object Notation(JSON) was valid, the payload format was correct, and the System.ComponentModel.DataAnnotations attributes were enforced.

namespace WebhookHttpTrigger
{
    public static class TypedDeserializeObjectAnnotations
    {
        [FunctionName("TypedDeserializeObjectAnnotations")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] string httpPayload,
            ILogger log)
        {
            UplinkPayload uplinkPayload;

            try
            {
                uplinkPayload = JsonConvert.DeserializeObject<UplinkPayload>(httpPayload);
            }
            catch (Exception ex)
            {
                log.LogWarning(ex, "JsonConvert.DeserializeObject failed");

                return new BadRequestObjectResult(ex.Message);
            }

            var context = new ValidationContext(uplinkPayload, serviceProvider: null, items: null);

            var results = new List<ValidationResult>();

            var isValid = Validator.TryValidateObject(uplinkPayload, context, results, true);

            if (!isValid)
            {
                log.LogWarning("Validator.TryValidateObject failed results:{results}", results);

                return new BadRequestObjectResult(results);
            }

            log.LogInformation($"C# HTTP trigger function typed DeserializeObject UplinkPayload processed a request PacketId:{uplinkPayload.PacketId}");

            return new OkObjectResult("Hello, This HTTP triggered DeserializeObject function executed successfully.");
        }
    }
}

I built an ASP .NET Core WebAPI version with two uplink method implementations, one which used dependency injection (DI) and the other that didn’t. I also added code to validate the deserialisation of HiveRxTimeUtc.

...
[HttpPost]
public async Task<IActionResult> Post([FromBody] UplinkPayload payload)
{
    if ( payload.HiveRxTimeUtc == DateTime.MinValue)
    {
        _logger.LogWarning("HiveRxTimeUtc validation failed");

        return this.BadRequest();
    }

    QueueClient queueClient = _queueServiceClient.GetQueueClient("uplink");

    await queueClient.SendMessageAsync(Convert.ToBase64String(JsonSerializer.SerializeToUtf8Bytes(payload)));

    return this.Ok();
}
...
[HttpPost]
public async Task<IActionResult> Post([FromBody] UplinkPayload payload)
{
    // Check that the post data is good
    if (!this.ModelState.IsValid)
    {
        _logger.LogWarning("QueuedController validation failed {0}", this.ModelState.ToString());

        return this.BadRequest(this.ModelState);
    }

    if ( payload.HiveRxTimeUtc == DateTime.MinValue)
    {
        _logger.LogWarning("HiveRxTimeUtc validation failed");

        return this.BadRequest();
    }

    try
    {
        QueueClient queueClient = new QueueClient(_configuration.GetConnectionString("AzureWebApi"), "uplink");

        //await queueClient.CreateIfNotExistsAsync();

        await queueClient.SendMessageAsync(Convert.ToBase64String(JsonSerializer.SerializeToUtf8Bytes(payload)));
    }
    catch (Exception ex)
    {
        _logger.LogError(ex,"Unable to open/create queue or send message", ex);

        return this.Problem("Unable to open queue (creating if it doesn't exist) or send message", statusCode: 500, title: "Uplink payload not sent");
    }

    return this.Ok();
}
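
For the DI version the QueueServiceClient has to be registered at application startup. A minimal sketch, assuming minimal hosting and the same “AzureWebApi” connection string as the non-DI version (the registration style is an assumption, not the exact project code):

using Azure.Storage.Queues;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();

// Register a single shared QueueServiceClient for injection into controllers
builder.Services.AddSingleton(new QueueServiceClient(builder.Configuration.GetConnectionString("AzureWebApi")));

var app = builder.Build();

app.MapControllers();

app.Run();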

In Telerik Fiddler I could see calls to the Azure Functions and the ASP .NET Core WebAPI taking similar times to execute (though I did see 5+ seconds), and the ASP .NET Core WebAPI appeared to take much longer to start up (I did see 100+ seconds when I made four requests while the ASP .NET Core WebAPI was starting).

I’m going to use the ASP .NET Core WebAPI with dependency injection (DI) approach just because “it’s always better with DI”.

I noticed some other “oddness” while implementing then testing the Azure Http Trigger functions and ASP .NET Core WebAPI which I will cover in some future posts.

Swarm Space – Underlying Architecture Revisited

After figuring out that calling a CS-Script uplink payload formatter inside an Azure Http Trigger function wasn’t going to work I needed a new architecture.

Swarm Space Azure IoT Connector Identity Translation Gateway Architecture

The new approach uses most of the existing building blocks but adds an Azure HTTP Trigger function which receives the Swarm Space Bumblebee hive Webhook Delivery Method calls and writes them to an Azure Storage Queue.

Swarm Space Bumblebee hive Webhook Delivery Method
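
A minimal sketch of the new HTTP Trigger, assuming an Azure Storage Queue output binding and the “uplink” queue name used by the WebAPI code earlier in this post (the function name and binding details are illustrative):

public static class UplinkWebhook
{
    [FunctionName("UplinkWebhook")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] string payload,
        [Queue("uplink")] IAsyncCollector<string> uplinkQueue,
        ILogger log)
    {
        // Queue the raw webhook payload so validation/formatting happens asynchronously downstream
        await uplinkQueue.AddAsync(payload);

        return new OkResult();
    }
}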

The uplink and downlink formatters are now called asynchronously so they have limited impact on the overall performance of the application.

Swarm Space – Uplink Payload Startup Problem

I initially noticed a couple of duplicate Swarm Space message PacketIds in Azure IoT Central.

Azure IoT Central with consecutive duplicate PacketIds

Then I started to pay more attention and noticed that duplicate PacketIds could be interleaved.

Azure IoT Central with interleaved duplicate PacketIds

Shortly after noticing the interleaved PacketIds I checked the Delivery Method and found there were message delivery timeouts.

Swarm Space Delivery Method with timeouts

In Azure Application Insights I could see that the UplinkController was taking up to 15 seconds to execute which was longer than the bumblebee hive delivery timeout.

Azure Application Insights displaying UplinkController metrics.

In Telerik Fiddler I could see calls to the UplinkController taking 16 seconds to execute (I did see 30+ seconds).

Telerik Fiddler showing duration of Uplink controller calls

To see if the problem was loading CS-Script I added code to load a simple function as the application started. After averaging over many executions there was little difference in the duration.

public interface IApplication
{
    public DateTime Startup(DateTime utcNow);
}
...
protected override async Task ExecuteAsync(CancellationToken cancellationToken)
{
    await Task.Yield();

    _logger.LogInformation("StartUpService.ExecuteAsync start");
            
    // Force the loading and startup of CS Script evaluator
    dynamic application = CSScript.Evaluator
        .LoadCode(
                @"using System;
                public class Application : IApplication
                {
                    public DateTime Startup(DateTime utcNow)
                    {
                        return utcNow;
                    }
                }");

    DateTime result = application.Startup(DateTime.UtcNow);
            
    try
    {
        await _swarmSpaceBumblebeeHive.Login(cancellationToken);

        await _azureIoTDeviceClientCache.Load(cancellationToken);
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, "StartUpService.ExecuteAsync error");

        throw;
    }

    _logger.LogInformation("StartUpService.ExecuteAsync finish");
}

The Swarm Eval Kit uplink formatter (UserApplicationId 65535.cs) “unpacks” the uplink JavaScript Object Notation(JSON) message and adds an Azure IoT Central compatible location, which requires a number of libraries to be loaded.

using System;
using System.Globalization;
using System.Text;

using Microsoft.Azure.Devices.Client;

using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public class FormatterUplink : PayloadFormatter.IFormatterUplink
{
    public Message Evaluate(int organisationId, int deviceId, int deviceType, int userApplicationId, JObject telemetryEvent, JObject payloadJson, string payloadText, byte[] payloadBytes)
    {
        if ((payloadText != "") && (payloadJson != null))
        {
            JObject location = new JObject();

            location.Add("lat", payloadJson.GetValue("lt"));
            location.Add("lon", payloadJson.GetValue("ln"));
            location.Add("alt", payloadJson.GetValue("a"));

            telemetryEvent.Add("DeviceLocation", location);
        }

        Message ioTHubmessage = new Message(Encoding.ASCII.GetBytes(JsonConvert.SerializeObject(telemetryEvent)));

        ioTHubmessage.Properties.Add("iothub-creation-time-utc", DateTimeOffset.FromUnixTimeSeconds((long)payloadJson.GetValue("d")).ToString("s", CultureInfo.InvariantCulture));

        return ioTHubmessage;
    }
}

I then added code to load the most complex uplink and downlink formatters as the application started. There was a significant reduction in the UplinkController execution durations, but it could still take more than 30 seconds.

try
{
    await _swarmSpaceBumblebeeHive.Login(cancellationToken);

    await _azureIoTDeviceClientCache.Load(cancellationToken);

    await _formatterCache.UplinkGetAsync(65535);

    await _formatterCache.DownlinkGetAsync(20);
}
catch (Exception ex)
{
    _logger.LogError(ex, "StartUpService.ExecuteAsync error");

    throw;
}

I then added detailed telemetry to the code and found that the duration (and its variability) was a combination of the Azure IoT Device Provisioning Service(DPS) registration, Azure IoT Hub connection establishment, CS-Script payload formatter loading/compilation/execution, application startup tasks and message uploading durations.
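
The instrumentation was nothing sophisticated, just durations logged around each step; a fragment of the sort of thing I mean, using the SendEventAsync call from the earlier snippet as an illustrative example:

Stopwatch stopwatch = Stopwatch.StartNew();

await deviceClient.SendEventAsync(ioTHubmessage); // one of the steps being timed

_logger.LogInformation("Uplink- SendEventAsync duration:{Duration}mSec", stopwatch.ElapsedMilliseconds);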

After much experimentation it looks like “synchronously” calling the payload processing code from the Uplink controller is not a viable approach, as the Swarm Space Bumblebee hive calls regularly time out, resulting in duplicate messages.

Swarm Space – Uplink Payload Formatters revisited

The approach used in Swarm Space–Uplink Payload Message Creation Time had significant limitations, e.g. setting the iothub-creation-time-utc message property.

public interface IFormatterUplink
{
    public Message Evaluate(int organisationId, int deviceId, int deviceType, int userApplicationId, JObject telemetryEvent, JObject payloadJson, string payloadText, byte[] payloadBytes);
}

Uplink payload formatters now return a Microsoft.Azure.Devices.Client Message object to the UplinkController.

...
JObject telemetryEvent = new JObject
{
    { "packetId", payload.PacketId},
    { "deviceType" , payload.DeviceType},
    { "DeviceID", payload.DeviceId },
    { "organizationId", payload.OrganizationId },
    { "UserApplicationId", payload.UserApplicationId},
    { "ReceivedAtUtc", payload.HiveRxTime.ToString("s", CultureInfo.InvariantCulture) },
    { "DataLength", payload.Len },
    { "Data", payload.Data },
    { "Status", payload.Status },
};

    _logger.LogDebug("Uplink-DeviceId:{0} PacketId:{1} TelemetryEvent before:{0}", payload.DeviceId, payload.PacketId, JsonConvert.SerializeObject(telemetryEvent, Formatting.Indented));

    using (Message ioTHubmessage = swarmSpaceFormatterUplink.Evaluate(payload.OrganizationId, payload.DeviceId, context.DeviceType, payload.UserApplicationId, telemetryEvent, payloadJson, payloadText, payloadBytes))
{
    _logger.LogDebug("Uplink-DeviceId:{0} PacketId:{1} TelemetryEvent after:{0}", payload.DeviceId, payload.PacketId, JsonConvert.SerializeObject(telemetryEvent, Formatting.Indented));

    ioTHubmessage.Properties.Add("PacketId", payload.PacketId.ToString());
    ioTHubmessage.Properties.Add("OrganizationId", payload.OrganizationId.ToString());
    ioTHubmessage.Properties.Add("UserApplicationId", payload.UserApplicationId.ToString());
    ioTHubmessage.Properties.Add("DeviceId", payload.DeviceId.ToString());
    ioTHubmessage.Properties.Add("deviceType", payload.DeviceType.ToString());

    await deviceClient.SendEventAsync(ioTHubmessage);

    _logger.LogInformation("Uplink-DeviceID:{deviceId} PacketId:{1} SendEventAsync success", payload.DeviceId, payload.PacketId);
}
...

The default uplink payload formatter (UserApplicationId 0.cs) returns a Microsoft.Azure.Devices.Client Message object with a serialised TelemetryEvent payload.

public class FormatterUplink : PayloadFormatter.IFormatterUplink
{
    public Message Evaluate(int organisationId, int deviceId, int deviceType, int userApplicationId, JObject telemetryEvent, JObject payloadJson, string payloadText, byte[] payloadBytes)
    {
        Message ioTHubmessage = new Message(Encoding.ASCII.GetBytes(JsonConvert.SerializeObject(telemetryEvent)));

        return ioTHubmessage;
    }
}

The Swarm Eval Kit uplink sample formatter (UserApplicationId 65535.cs) “unpacks” the uplink JavaScript Object Notation(JSON) message, adds an Azure IoT Central compatible location to the TelemetryEvent and an “iothub-creation-time-utc” message property.

public class FormatterUplink : PayloadFormatter.IFormatterUplink
{
    public Message Evaluate(int organisationId, int deviceId, int deviceType, int userApplicationId, JObject telemetryEvent, JObject payloadJson, string payloadText, byte[] payloadBytes)
    {
        if ((payloadText != "") && (payloadJson != null))
        {
            JObject location = new JObject();

            location.Add("lat", payloadJson.GetValue("lt"));
            location.Add("lon", payloadJson.GetValue("ln"));
            location.Add("alt", payloadJson.GetValue("a"));

            telemetryEvent.Add("DeviceLocation", location);
        }

        Message ioTHubmessage = new Message(Encoding.ASCII.GetBytes(JsonConvert.SerializeObject(telemetryEvent)));

        ioTHubmessage.Properties.Add("iothub-creation-time-utc", DateTimeOffset.FromUnixTimeSeconds((long)payloadJson.GetValue("d")).ToString("s", CultureInfo.InvariantCulture));

        return ioTHubmessage;
    }
}

If the uplink formatter compilation or execution fails, a detailed exception message is logged to Azure Application Insights.

Detailed compilation error message in Application Insights
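
A sketch of how the CS-Script formatter load could be wrapped so compilation failures are logged with the compiler detail; CompilerException is CS-Script’s compile-failure exception, and the logging shape here is an assumption rather than the exact project code:

try
{
    return CSScript.Evaluator.LoadFile<PayloadFormatter.IFormatterUplink>(payloadformatterFilePath);
}
catch (CompilerException cex)
{
    // The exception message contains the compiler error(s) and the offending file path
    _logger.LogError(cex, "PayloadFormatterUplink- UserApplicationId:{0} compilation failed", userApplicationId);

    throw;
}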

I need to add some tools to make the creation, modification, deletion and debugging of downlink/uplink formatters easier.

Swarm Space – Uplink Payload Message Creation Time

The Swarm Space satellite constellation doesn’t have continuous coverage (Jan 2023) so messages sent when there is no coverage are queued (default 48hrs) by the Swarm M138 Modem for transmission when a satellite passes overhead.

Satellite Passes with gap in coverage from 16:18 to 18:42 highlighted

In the Swarm Hive Delivery Method I could see messages from the Swarm Eval Kit and Swarm Tracker in my backyard arriving in “clusters”.

Swarm Hive Delivery Methods webhook calls.

The messages in each “cluster” were processed by a payload formatter then forwarded to Azure IoT Central for processing. All the messages in a cluster had similar event creation times which was “breaking” graphs and device tracking maps. After running the application locally and using Telerik Fiddler to try different payloads I realised that the Microsoft.Azure.Devices.Client Message iothub-creation-time-utc property was set to when the message was received by the Swarm Space infrastructure.

_logger.LogDebug("Uplink-DeviceId:{0} PacketId:{1} TelemetryEvent before:{0}", payload.DeviceId, payload.PacketId, JsonConvert.SerializeObject(telemetryEvent, Formatting.Indented));

telemetryEvent = swarmSpaceFormatterUplink.Evaluate(telemetryEvent, payload.Data, payloadBytes, payloadText, payloadJson);

_logger.LogDebug("Uplink-DeviceId:{0} PacketId:{1} TelemetryEvent after:{0}", payload.DeviceId, payload.PacketId, JsonConvert.SerializeObject(telemetryEvent, Formatting.Indented));

// Send the message to Azure IoT Hub
using (Message ioTHubmessage = new Message(Encoding.ASCII.GetBytes(JsonConvert.SerializeObject(telemetryEvent))))
{
   // Ensure the displayed time is the acquired time rather than the uploaded time. 
   ioTHubmessage.Properties.Add("iothub-creation-time-utc", payload.HiveRxTime.ToString("s", CultureInfo.InvariantCulture));
   ioTHubmessage.Properties.Add("PacketId", payload.PacketId.ToString());
   ioTHubmessage.Properties.Add("OrganizationId", payload.OrganizationId.ToString());
   ioTHubmessage.Properties.Add("ApplicationId", payload.UserApplicationId.ToString());
   ioTHubmessage.Properties.Add("DeviceId", payload.DeviceId.ToString());
   ioTHubmessage.Properties.Add("deviceType", payload.DeviceType.ToString());

   await deviceClient.SendEventAsync(ioTHubmessage);

   _logger.LogInformation("Uplink-DeviceID:{deviceId} SendEventAsync success", payload.DeviceId);
}

In the Swarm Eval Kit uplink (JSON) message generated by the sample firmware, the “d” field is the number of seconds since the Unix Epoch at which the message payload was constructed.

Swarm Hive Messages with “d” field in the JSON payload highlighted
Online Unix Epoch Convertor displaying Unix Epoch 1672561286 in NZDT and UTC time
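
A quick worked example of the conversion using the value highlighted above:

// "d" field from the sample payload, seconds since the Unix Epoch
long d = 1672561286;

DateTime utc = DateTimeOffset.FromUnixTimeSeconds(d).UtcDateTime;

Console.WriteLine(utc.ToString("s", CultureInfo.InvariantCulture)); // 2023-01-01T08:21:26 UTC, 9:21:26 PM NZDT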

The revised 65535.cs payload formatter adds an “iothub-creation-time-utc” field to the TelemetryEvent.

using System;
using System.Globalization;

using Newtonsoft.Json.Linq;

public class FormatterUplink : PayloadFormatter.IFormatterUplink
{
    public JObject Evaluate(JObject telemetryEvent, string payloadBase64, byte[] payloadBytes, string payloadText, JObject payloadJson)
    {
        if ((payloadText != "" ) && ( payloadJson != null))
        {
            JObject location = new JObject();

            location.Add("lat", payloadJson.GetValue("lt"));
            location.Add("lon", payloadJson.GetValue("ln"));
            location.Add("alt", payloadJson.GetValue("a"));

            telemetryEvent.Add("DeviceLocation", location);
        };

        telemetryEvent.Add("iothub-creation-time-utc", DateTimeOffset.FromUnixTimeSeconds((long)payloadJson.GetValue("d")).ToString("s", CultureInfo.InvariantCulture));

        return telemetryEvent;
    }
}

This, if present, is used to populate the Microsoft.Azure.Devices.Client Message iothub-creation-time-utc property.

_logger.LogDebug("Uplink-DeviceId:{0} PacketId:{1} TelemetryEvent before:{0}", payload.DeviceId, payload.PacketId, JsonConvert.SerializeObject(telemetryEvent, Formatting.Indented));

telemetryEvent = swarmSpaceFormatterUplink.Evaluate(telemetryEvent, payload.Data, payloadBytes, payloadText, payloadJson);

.LogDebug("Uplink-DeviceId:{0} PacketId:{1} TelemetryEvent after:{0}", payload.DeviceId, payload.PacketId, JsonConvert.SerializeObject(telemetryEvent, Formatting.Indented));

Send the message to Azure IoT Hub
using (Message ioTHubmessage = new Message(Encoding.ASCII.GetBytes(JsonConvert.SerializeObject(telemetryEvent))))
{
   // Ensure the displayed time is the acquired time rather than the uploaded time. 
   ioTHubmessage.Properties.Add("PacketId", payload.PacketId.ToString());
   ioTHubmessage.Properties.Add("OrganizationId", payload.OrganizationId.ToString());
   ioTHubmessage.Properties.Add("UserApplicationId", payload.UserApplicationId.ToString());
   ioTHubmessage.Properties.Add("DeviceId", payload.DeviceId.ToString());
   ioTHubmessage.Properties.Add("deviceType", payload.DeviceType.ToString());

   if (telemetryEvent.ContainsKey("iothub-creation-time-utc"))
   {
      ioTHubmessage.Properties.Add("iothub-creation-time-utc",telemetryEvent.Value<string>("iothub-creation-time-utc"));
   }

   await deviceClient.SendEventAsync(ioTHubmessage);

   _logger.LogInformation("Uplink-DeviceID:{deviceId} SendEventAsync success", payload.DeviceId);
}

The Azure IoT Central message now had the correct timestamp and “event creation time” values.

Azure IoT Central “Raw Data” with valid timestamp and event creation times

I don’t think this is a good solution.

The design of the payload formatters will have to be revisited.

Swarm Space – Uplink Payload formatter caching and files

The payload formatters of my Azure IoT Hub Cloud Identity Translation Gateway use CS-Script, and even a simple one was taking more than half a second to compile each time it was called.

using System;
using System.Globalization;

using Newtonsoft.Json.Linq;

public class FormatterUplink : PayloadFormatter.IFormatterUplink
{
    public JObject Evaluate(JObject telemetryEvent, string payloadBase64, byte[] payloadBytes, string payloadText, JObject payloadJson)
    {
        if ((payloadText != "" ) && ( payloadJson != null))
        {
            JObject location = new JObject();

            location.Add("lat", payloadJson.GetValue("lt"));
            location.Add("lon", payloadJson.GetValue("ln"));
            location.Add("alt", payloadJson.GetValue("a"));

            telemetryEvent.Add("Location", location);
        };

        return telemetryEvent;
    }
}

The Swarm Eval Kit default message has a userApplicationId of 65535.

{"ln":123.456,"si":0.0,"bi":0.2,"sv":0.152,"lt":-12.345,"bv":4.032,"d":1671704370,"n":2,"a":9.0,"s":1.0,"c":208.0,"r":-94,"ti":0.032}

The 65535.cs payload formatter adds an Azure IoT Central compatible location to the telemetry payload.

Azure IoT Central uplink telemetry message payload

The formatter files are currently part of the SwarmSpaceAzureIoTConnector project (moving to Azure Blob Storage) so they are configured as “Content” (bonus: syntax highlighting works) with “Copy if newer” so they are included in the deployment package.

Visual Studio 2022 Sample payload formatter
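
In the .csproj this corresponds to something like the following (the folder names are assumptions based on the file paths used by the formatter cache below):

<ItemGroup>
  <Content Include="PayloadFormatters\Uplink\*.cs">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </Content>
  <Content Include="PayloadFormatters\Downlink\*.cs">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </Content>
</ItemGroup>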

I used Alastair Crabtree’s LazyCache to store compiled payload formatters with Uplink/Downlink + UserApplicationId as the cache key.

public class FormatterCache : IFormatterCache
{
    private readonly ILogger<FormatterCache> _logger;
    private readonly Models.ApplicationSettings _applicationSettings;
    private readonly static IAppCache _payloadFormatters = new CachingService();
    // Cache entry options referenced in UplinkGetAsync below; the policy value is an assumption for this snippet
    private readonly static MemoryCacheEntryOptions memoryCacheEntryOptions = new MemoryCacheEntryOptions()
    {
        Priority = CacheItemPriority.NeverRemove
    };

    public FormatterCache(ILogger<FormatterCache> logger, IOptions<Models.ApplicationSettings> applicationSettings)
    {
        _logger = logger;
        _applicationSettings = applicationSettings.Value;
    }

    public async Task<IFormatterUplink> UplinkGetAsync(int userApplicationId)
    {
        IFormatterUplink payloadFormatterUplink = await _payloadFormatters.GetOrAddAsync<PayloadFormatter.IFormatterUplink>($"U{userApplicationId}", (ICacheEntry x) => UplinkLoadAsync(userApplicationId), memoryCacheEntryOptions);

        return payloadFormatterUplink;
    }

    private async Task<IFormatterUplink> UplinkLoadAsync(int userApplicationId)
    {
        string payloadformatterFilePath = $"{_applicationSettings.PayloadFormattersUplinkFilePath}\\{userApplicationId}.cs";

        if (!File.Exists(payloadformatterFilePath))
        {
            _logger.LogInformation("PayloadFormatterUplink- UserApplicationId:{0} PayloadFormatterPath:{1} not found using default:{2}", userApplicationId, payloadformatterFilePath, _applicationSettings.PayloadFormatterUplinkDefault);

            return CSScript.Evaluator.LoadFile<PayloadFormatter.IFormatterUplink>(_applicationSettings.PayloadFormatterUplinkDefault);
        }

        _logger.LogInformation("PayloadFormatterUplink- UserApplicationId:{0} loading PayloadFormatterPath:{1}", userApplicationId, payloadformatterFilePath);

        return CSScript.Evaluator.LoadFile<PayloadFormatter.IFormatterUplink>(payloadformatterFilePath);
    }
...
}

The default uplink and downlink formatters are configured in application settings and are used when a UserApplicationId specific formatter is not configured.
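
The relevant application settings look something like this (the setting names come from the cache code above; the paths and default formatter file names are illustrative):

"ApplicationSettings": {
    "PayloadFormattersUplinkFilePath": "PayloadFormatters\\Uplink",
    "PayloadFormatterUplinkDefault": "PayloadFormatters\\Uplink\\0.cs",
    "PayloadFormattersDownlinkFilePath": "PayloadFormatters\\Downlink",
    "PayloadFormatterDownlinkDefault": "PayloadFormatters\\Downlink\\0.cs"
}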

Fiddler Composer illustrating compiled formatter timings before and after caching

.NET Core web API + Dapper – Parameters

Different Approaches…

While working on customer ASP.NET Core web API(WebAPI) + Microsoft SQL Server(MSSQL) applications I have encountered several different ways of passing parameters to stored procedures and embedded Structured Query Language(SQL) statements. I have created five examples which query the [Warehouse].[StockItems] table in the World Wide Importers database to illustrate the different approaches.

A customer with a large application which had a lot of ADO.Net code was comfortable with Dapper DynamicParameters. Hundreds of stored procedures with input (some output) parameters were used to manage access to data. The main advantage of this approach was “familiarity”, and the use of DynamicParameters made the mapping of C# variables to stored procedure parameters (with different naming conventions) obvious.

[HttpGet("Dynamic")]
public async Task<ActionResult<IEnumerable<Model.StockItemListDtoV1>>> GetDynamic(
            [Required][MinLength(3, ErrorMessage = "The name search text must be at least {1} characters long"), MaxLength(20, ErrorMessage = "The name search text must be no more than {1} characters long")] string searchText,
            [Required][Range(1, int.MaxValue, ErrorMessage = "MaximumRowsToReturn must be greater than or equal to {1}")] int maximumRowsToReturn)
{
    IEnumerable<Model.StockItemListDtoV1> response = null;

    using (SqlConnection db = new SqlConnection(this.connectionString))
    {
        DynamicParameters parameters = new DynamicParameters();

        parameters.Add("MaximumRowsToReturn", maximumRowsToReturn);
        parameters.Add("SearchText", searchText);

        response = await db.QueryWithRetryAsync<Model.StockItemListDtoV1>(sql: "[Warehouse].[StockItemsNameSearchV1]", param: parameters, commandType: CommandType.StoredProcedure);
    }

    return this.Ok(response);
}
Error message displayed when SearchText field missing
Error message displayed when SearchText is too short
Error message displayed when SearchText too long
Successful query of StockItems table

The developers at another company used anonymously typed variables everywhere. They also had similar C# and stored procedure parameter naming conventions so there was minimal (in the example code only maximumRowsToReturn vs. stockItemsMaximum) mapping required. They found mapping stored procedure output parameters was problematic. For longer parameter lists they struggled with formatting the code in a way which was readable.

 [HttpGet("Anonymous")]
public async Task<ActionResult<IEnumerable<Model.StockItemListDtoV1>>> GetAnonymous(
            [Required][MinLength(3, ErrorMessage = "The name search text must be at least {1} characters long"), MaxLength(20, ErrorMessage = "The name search text must be no more that {1} characters long")] string searchText,
            [Required][Range(1, 100, ErrorMessage = "The maximum number of stock items to return must be greater than or equal to {1} and less then or equal {2}")] int stockItemsMaximum)
{
   IEnumerable<Model.StockItemListDtoV1> response = null;

   using (SqlConnection db = new SqlConnection(this.connectionString))
   {
      response = await db.QueryWithRetryAsync<Model.StockItemListDtoV1>(sql: "[Warehouse].[StockItemsNameSearchV1]", new { searchText, maximumRowsToReturn = stockItemsMaximum }, commandType: CommandType.StoredProcedure);
   }

   return this.Ok(response);
}

At another customer the developers used Data Transfer Objects(DTOs)/Plain Old CLR Objects(POCOs) and they had some control over the naming of the stored procedure/embedded SQL parameters.

public class StockItemNameSearchDtoV1
{
   [Required]
   [MinLength(3, ErrorMessage = "The name search text must be at least {1} characters long"), MaxLength(20, ErrorMessage = "The name search text must be no more than {1} characters long")]
   public string SearchText { get; set; }

   [Required]
   [Range(1, 100, ErrorMessage = "The maximum number of rows to return must be greater than or equal to {1} and less than or equal to {2}")]
   public int MaximumRowsToReturn { get; set; }
}
[HttpGet("AutomagicDefault")]
public async Task<ActionResult<IEnumerable<Model.StockItemListDtoV1>>> GetDefault([FromQuery] Model.StockItemNameSearchDtoV1 request)
{
   IEnumerable<Model.StockItemListDtoV1> response = null;

   using (SqlConnection db = new SqlConnection(this.connectionString))
   {
      response = await db.QueryWithRetryAsync<Model.StockItemListDtoV1>(sql: "[Warehouse].[StockItemsNameSearchV1]", param: request, commandType: CommandType.StoredProcedure);
   }

   return this.Ok(response);
}

At another customer the developers used Data Transfer Objects(DTOs)/Plain Old CLR Objects(POCOs) to access the database which had several hundred stored procedures. They had no control over the stored procedure parameter names so they mapped query string parameters to the properties of their POCOs.

This took some experimentation as System.Text.Json/Newtonsoft.Json decorations didn’t work (a query string is not JavaScript Object Notation(JSON)). They decorated the properties of their DTOs with the [FromQuery] attribute.

public class StockItemNameSearchDtoV2
{
   [Required]
   [FromQuery(Name = "SearchText")]
   [MinLength(3, ErrorMessage = "The name search text must be at least {1} characters long"), MaxLength(20, ErrorMessage = "The name search text must be no more than {1} characters long")]
   public string SearchText { get; set; }

   [Required]
   [FromQuery(Name = "StockItemsMaximum")]
   [Range(1, 100, ErrorMessage = "The maximum number of stock items to return must be greater than or equal to {1} and less than or equal to {2}")]
   public int MaximumRowsToReturn { get; set; }
}
[HttpGet("AutomagicMapped")]
public async Task<ActionResult<IEnumerable<Model.StockItemListDtoV1>>> GetMapperDecorated([FromQuery] Model.StockItemNameSearchDtoV2 request)
{
   IEnumerable<Model.StockItemListDtoV1> response = null;

   using (SqlConnection db = new SqlConnection(this.connectionString))
   {
      response = await db.QueryWithRetryAsync<Model.StockItemListDtoV1>(sql: "[Warehouse].[StockItemsNameSearchV1]", param: request, commandType: CommandType.StoredProcedure);
   }

   return this.Ok(response);
}

I don’t think that [FromQuery] decorations on POCOs are a good idea. If the classes are only used for one method I would consider moving them into the controller file.

//
// https://localhost:5001/api/StockItemsParameter/Array?StockItemId=1&StockItemId=5&StockItemId=10
//
[HttpGet("Array")]
public async Task<ActionResult<IEnumerable<Model.StockItemListDtoV1>>> GetArray(
   [FromQuery(Name = "stockItemID")][Required(), MinLength(1, ErrorMessage = "Minimum of {1} StockItem id(s)"), MaxLength(100, ErrorMessage = "Maximum {1} StockItem ids")] int[] stockItemIDs)
{
    IEnumerable<Model.StockItemListDtoV1> response = null;

    using (SqlConnection db = new SqlConnection(this.connectionString))
    {
        response = await db.QueryWithRetryAsync<Model.StockItemListDtoV1>(sql: @"SELECT [StockItemID] as ""ID"", [StockItemName] as ""Name"", [RecommendedRetailPrice], [TaxRate] FROM [Warehouse].[StockItems] WHERE  StockItemID IN @StockItemIds ", new { StockItemIDs = stockItemIDs }, commandType: CommandType.Text);
    }

    return this.Ok(response);
}

A customer wanted users to be able to search for items selected in a multiple selection list, so a Dapper WHERE IN value array was used (the “Array” method above).

Dapper WHERE IN with no StockItemIds on the query string
Dapper WHERE IN with several StockItemIds on query string

To explore how this worked I downloaded the Dapper source code and referenced the project in my solution.

After single stepping through the Dapper source code I found where the array of StockItems was getting mapped into a “generated” parameterised SQL statement (Dapper expands the list into individual parameters, something like WHERE StockItemID IN (@StockItemIds1, @StockItemIds2, @StockItemIds3)).

Dapper generated parameterised SQL Statement

Based on my customers’ experiences a “mix ’n’ match” approach to parameterising Dapper queries looks reasonable.

.NET Core web API + Dapper – Asynchronicity Revisited

Asynchronous is always better, maybe…

For a trivial ASP.NET Core web API controller like the one below the difference between using synchronous and asynchronous calls is most probably negligible. Especially as the sample World Wide Importers database [Warehouse].[StockItems] table only has 227 records.

[HttpGet("IEnumerableSmall")]
public async Task<ActionResult<IEnumerable<Model.StockItemListDtoV1>>> GetIEnumerableSmall([FromQuery] bool buffered = false)
{
	IEnumerable<Model.StockItemListDtoV1> response = null;

	using (SqlConnection db = new SqlConnection(this.connectionString))
	{
		logger.LogInformation("IEnumerableSmall start Buffered:{buffered}", buffered);

		response = await db.QueryWithRetryAsync<Model.StockItemListDtoV1>(
			sql: @"SELECT [SI1].[StockItemID] as ""ID"", [SI1].[StockItemName] as ""Name"", [SI1].[RecommendedRetailPrice], [SI1].[TaxRate]" +
				   "FROM [Warehouse].[StockItems] as SI1",
			buffered,
			commandType: CommandType.Text);

		logger.LogInformation("IEnumerableSmall done");
	}

	return this.Ok(response);
}

The easiest way to increase the size of the returned result set was with CROSS JOIN(s). This is the first (and most probably the last) time I have used a cross join in a “real” application.

[HttpGet("IEnumerableMedium")]
public async Task<ActionResult<IEnumerable<Model.StockItemListDtoV1>>> GetIEnumerableMedium([FromQuery] bool buffered = false)
{
	IEnumerable<Model.StockItemListDtoV1> response = null;

	using (SqlConnection db = new SqlConnection(this.connectionString))
	{
		logger.LogInformation("IEnumerableMedium start Buffered:{buffered}", buffered);

		response = await db.QueryWithRetryAsync<Model.StockItemListDtoV1>(
					sql: @" SELECT [SI2].[StockItemID] as ""ID"", [SI2].[StockItemName] as ""Name"", [SI2].[RecommendedRetailPrice], [SI2].[TaxRate]" +
									"FROM [Warehouse].[StockItems] as SI1" +
									"CROSS JOIN[Warehouse].[StockItems] as SI2",
					buffered,
					commandType: CommandType.Text);

		logger.LogInformation("IEnumerableMedium done");
	}

	return this.Ok(response);
}

The medium controller returns 51,529 (227 x 227) rows and the large controller up to 11,697,083 (227 x 227 x 227) rows.

[HttpGet("IEnumerableLarge")]
public async Task<ActionResult<IEnumerable<Model.StockItemListDtoV1>>> GetIEnumerableLarge()
{
	IEnumerable<Model.StockItemListDtoV1> response = null;

	using (SqlConnection db = new SqlConnection(this.connectionString))
	{
		logger.LogInformation("IEnumerableLarge start");

		response = await db.QueryWithRetryAsync<Model.StockItemListDtoV1>(
				sql: $@"SELECT [SI3].[StockItemID] as ""ID"", [SI3].[StockItemName] as ""Name"", [SI3].[RecommendedRetailPrice], [SI3].[TaxRate]" +
						"FROM [Warehouse].[StockItems] as SI1" +
						"   CROSS JOIN[Warehouse].[StockItems] as SI2" +
						"	CROSS JOIN[Warehouse].[StockItems] as SI3",
				commandType: CommandType.Text);

		logger.LogInformation("IEnumerableLarge done");
	}

	return this.Ok(response);
}

The first version of “IEnumerableLarge” returned some odd Hyper Text Transfer Protocol(HTTP) error codes and Opera kept running out of memory.

After a roughly 3 minute delay Opera Browser displayed a 500 error

I think this error was due to the Azure App Service Load Balancer 230 second timeout.

Opera displaying out of memory error

I added some query string parameters to the IEnumerable and IAsyncEnumerable methods to limit the number of records returned by QueryWithRetryAsync (using a TOP clause). The snippet below, from the Dapper source, shows how QueryAsync handles the buffered vs. unbuffered cases.

if (command.Buffered)
{
   var buffer = new List<T>();
   var convertToType = Nullable.GetUnderlyingType(effectiveType) ?? effectiveType;
   while (await reader.ReadAsync(cancel).ConfigureAwait(false))
   {
      object val = func(reader);
      buffer.Add(GetValue<T>(reader, effectiveType, val));
   }
   while (await reader.NextResultAsync(cancel).ConfigureAwait(false)) 
   { /* ignore subsequent result sets */ }
   command.OnCompleted();
   return buffer;
}
else
{
   // can't use ReadAsync / cancellation; but this will have to do
   wasClosed = false; // don't close if handing back an open reader; rely on the command-behavior
   var deferred = ExecuteReaderSync<T>(reader, func, command.Parameters);
   reader = null; // to prevent it being disposed before the caller gets to see it
   return deferred;
 }

The QueryWithRetryAsync method (my wrapper around Dapper’s QueryAsync) also has a “buffered” vs. “unbuffered” reader parameter (defaults to true) and I wanted to see if that had any impact.

[HttpGet("IEnumerableLarge")]
public async Task<ActionResult<IEnumerable<Model.StockItemListDtoV1>>> GetIEnumerableLarge([FromQuery] bool buffered = false, [FromQuery] int recordCount = 10)
{
	IEnumerable<Model.StockItemListDtoV1> response = null;

	using (SqlConnection db = new SqlConnection(this.connectionString))
	{
		logger.LogInformation("IEnumerableLarge start RecordCount:{recordCount} Buffered:{buffered}", recordCount, buffered);

		response = await db.QueryWithRetryAsync<Model.StockItemListDtoV1>(
			sql: $@"SELECT TOP({recordCount}) [SI3].[StockItemID] as ""ID"", [SI3].[StockItemName] as ""Name"", [SI3].[RecommendedRetailPrice], [SI3].[TaxRate]" +
					"FROM [Warehouse].[StockItems] as SI1" +
					"   CROSS JOIN[Warehouse].[StockItems] as SI2" +
					"	CROSS JOIN[Warehouse].[StockItems] as SI3",
		buffered,
		commandType: CommandType.Text);

		logger.LogInformation("IEnumerableLarge done");
	}

	return this.Ok(response);
}

I used Telerik Fiddler to call the StockItemsIAsyncEnumerable controller IEnumerable and IAsyncEnumerable methods. The Azure App Service was hosted in an Azure App Service Plan (S1, 100 total ACU, 1.75 GB). I found Telerik Fiddler had problems with larger responses, and would crash if the body of a larger response was viewed.

IEnumerableLarge method (buffered=false) response sizes and timings
IEnumerableLarge method (buffered=true) response sizes and timings

The unbuffered version had a slower Time To Last Byte(TTLB) and failed earlier, which I was expecting.

[HttpGet("IAsyncEnumerableLarge")]
public async Task<ActionResult<IAsyncEnumerable<Model.StockItemListDtoV1>>> GetAsyncEnumerableLarge([FromQuery] bool buffered = false, [FromQuery]int recordCount = 10)
{
    IEnumerable<Model.StockItemListDtoV1> response = null;

    using (SqlConnection db = new SqlConnection(this.connectionString))
    {
        logger.LogInformation("IAsyncEnumerableLarge start RecordCount:{recordCount} Buffered:{buffered}", recordCount, buffered);

        response = await db.QueryWithRetryAsync<Model.StockItemListDtoV1>(
            sql: $@"SELECT TOP({recordCount}) [SI3].[StockItemID] as ""ID"", [SI3].[StockItemName] as ""Name"", [SI3].[RecommendedRetailPrice], [SI3].[TaxRate]" +
                    "FROM [Warehouse].[StockItems] as SI1" +
                    "   CROSS JOIN[Warehouse].[StockItems] as SI2" +
                    "   CROSS JOIN[Warehouse].[StockItems] as SI3",
        buffered,
        commandType: CommandType.Text);

        logger.LogInformation("IAsyncEnumerableLarge done");
    }

    return this.Ok(response);
}
IAsyncEnumerableLarge method response sizes and timings
[HttpGet("IAsyncEnumerableLargeYield")]
public async IAsyncEnumerable<Model.StockItemListDtoV1> GetAsyncEnumerableLargeYield([FromQuery] int recordCount = 10)
{
	int rowCount = 0;

	using (SqlConnection db = new SqlConnection(this.connectionString))
	{
		logger.LogInformation("IAsyncEnumerableLargeYield start RecordCount:{recordCount}", recordCount);

		CommandDefinition commandDefinition = new CommandDefinition(
			$@"SELECT TOP({recordCount}) [SI3].[StockItemID] as ""ID"", [SI3].[StockItemName] as ""Name"", [SI3].[RecommendedRetailPrice], [SI3].[TaxRate]" +
						"FROM [Warehouse].[StockItems] as SI1" +
						"   CROSS JOIN[Warehouse].[StockItems] as SI2" +
						"	CROSS JOIN[Warehouse].[StockItems] as SI3",
			//commandTimeout:
			CommandType.Text,
			//flags: CommandFlags.Pipelined
		);

		using var reader = await db.ExecuteReaderWithRetryAsync(commandDefinition);

		var rowParser = reader.GetRowParser<Model.StockItemListDtoV1>();

		while (await reader.ReadAsync())
		{
			rowCount++;

			if ((rowCount % 10000) == 0)
			{
				logger.LogInformation("Row count:{0}", rowCount);
			}

			yield return rowParser(reader);
		}
		logger.LogInformation("IAsyncEnumerableLargeYield done");
	}
}

When this post was written (August 2022) my understanding of Dapper and IAsyncEnumerable was limited, so I trialled the approach suggested in a StackOverflow post.

IAsyncEnumerableLargeYield method response sizes and timings

The IAsyncEnumerableLargeYield method was faster to start responding, the overall duration was less, and it returned significantly more records (7,000,000 vs. 13,000,000). I assume this was because the response was streamed so there wasn’t a timeout.

Azure Application Insights displaying the IAsyncEnumerable with yield method executing

The results of my tests should be treated as “indicative” rather than “definitive”. In a future post I will compare the scalability of the different approaches. The number of records returned by IAsyncEnumerableLargeYield is not realistic, and in a “real-world” scenario paging or an alternate approach should be used.
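
A sketch of the sort of paged method I mean, reusing the QueryWithRetryAsync wrapper (the route, the page-size default and the OFFSET/FETCH paging are illustrative):

[HttpGet("Paged")]
public async Task<ActionResult<IEnumerable<Model.StockItemListDtoV1>>> GetPaged([FromQuery] int pageNumber = 1, [FromQuery] int pageSize = 100)
{
	IEnumerable<Model.StockItemListDtoV1> response;

	using (SqlConnection db = new SqlConnection(this.connectionString))
	{
		response = await db.QueryWithRetryAsync<Model.StockItemListDtoV1>(
			sql: @"SELECT [StockItemID] as ""ID"", [StockItemName] as ""Name"", [RecommendedRetailPrice], [TaxRate] " +
				"FROM [Warehouse].[StockItems] " +
				"ORDER BY [StockItemID] OFFSET @Offset ROWS FETCH NEXT @PageSize ROWS ONLY",
			new { Offset = (pageNumber - 1) * pageSize, PageSize = pageSize },
			commandType: CommandType.Text);
	}

	return this.Ok(response);
}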

.NET Core web API + Dapper – Readonly query workloads with Data Sync

Azure SQL Data Synchronisation Process

Read-only replicas of an Azure SQL Database with Active geo-replication are easy to set up but there are some disadvantages, e.g. bi-directional synchronisation is not supported, not all tables (or all the columns of some tables) may be needed or should be accessible for reporting, the overhead of replicating tables used for transaction processing might impact the performance of the solution etc. Azure SQL Data Sync is a service built on Azure SQL Database that can synchronise selected data bi-directionally across multiple databases, both on-premises and in the cloud.

The first step was to remove all the Microsoft SQL Server features used in the World Wide Importers database (e.g. Sequence Numbers, Column Store indexes etc.) which are not supported (see general limitations) by Azure SQL Data Sync. I then used the “Deploy Database Wizard” to copy my modified World Wide Importers database to an Azure SQL Database.

Deploy Database to Microsoft Azure SQL Database running
Microsoft Azure SQL Database Servers with WorldWideImporters and ReadonlyReplicaHub database
Data Sync Group creation on ReadOnlyReplicaHub database

For my “read-only replicas” scenario, if there are any update conflicts the source database “wins”.

Data Sync Group created
Data Sync Group configuration database management
Data Sync Group configuration adding “source” database synchronisation to Hub
Data Sync Group configuration adding “destination” database(s) synchronisation from Hub
Data Sync Group selecting source database StockItems table and columns to synchronise
Data Sync Group “destination” databases configured
Initial synchronisation of only StockItems table and seed data
Regular synchronisation of a subset of StockItems columns to destination databases
Source Azure SQL Database regular synchronisation compute utilisation
StockItemsReadOnlyReplicas Controller JSON after first replication completed
Azure application Insights Dependencies showing usage of different synchronised databases
StockItems table in source database with updated RRP
StockItems table in destination database with updated RRP after next scheduled synchronisation
StockItems table in destination database after next scheduled synchronisation

The Azure SQL Database Data Sync was pretty easy to set up (the configuration in the hub database tripped me up initially). For a production scenario where only a portion of the database needs to be synchronised (e.g. shaped by Customer, Geography, security considerations, or a bi-directional requirement) it would be an effective solution, though for some applications the delay between synchronisations might be an issue.
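
The StockItemsReadOnlyReplicas controller (named in the screen grab above) queries the synchronised destination databases; a sketch where the configuration shape, the random replica selection and everything apart from the controller name are assumptions:

public class StockItemsReadOnlyReplicasController : ControllerBase
{
    private static readonly Random random = new Random();
    private readonly List<string> replicaConnectionStrings;

    public StockItemsReadOnlyReplicasController(IConfiguration configuration)
    {
        // Assumed configuration shape: a list of read-only replica connection strings
        replicaConnectionStrings = configuration.GetSection("ReplicaConnectionStrings").Get<List<string>>();
    }

    [HttpGet]
    public async Task<ActionResult<IEnumerable<Model.StockItemListDtoV1>>> Get()
    {
        // Spread queries across the synchronised destination databases
        using (SqlConnection db = new SqlConnection(replicaConnectionStrings[random.Next(replicaConnectionStrings.Count)]))
        {
            return this.Ok(await db.QueryWithRetryAsync<Model.StockItemListDtoV1>(
                sql: @"SELECT [StockItemID] as ""ID"", [StockItemName] as ""Name"", [RecommendedRetailPrice], [TaxRate] FROM [Warehouse].[StockItems]",
                commandType: CommandType.Text));
        }
    }
}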

.NET Core web API + Dapper – ADO.Net Retries

Recovering from transient failures with ADO.Net RetryLogicProvider

This post is all about learning from failure; hopefully it will help someone else…

A while ago I wrote DapperTransient which uses Polly to retry SqlConnection and SqlCommand operations if the failure might be “transient”. My DapperTransient code wraps nearly all of the Dapper methods with a Polly RetryPolicy.ExecuteAsync.

public static Task<int> ExecuteWithRetryAsync(
			  this IDbConnection connection,
			  string sql,
			  object param = null,
			  IDbTransaction transaction = null,
			  int? commandTimeout = null,
			  CommandType? commandType = null) => RetryPolicy.ExecuteAsync(() => connection.ExecuteAsync(sql, param, transaction, commandTimeout, commandType));
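
For reference, the RetryPolicy in the wrapper above is a Polly policy; a minimal sketch, where the retry count, back-off and the choice of exceptions to handle are assumptions rather than the DapperTransient defaults:

using Polly;
using Polly.Retry;

// Retry up to 3 times with exponential back-off when a SqlException might be transient
private static readonly AsyncRetryPolicy RetryPolicy = Policy
	.Handle<SqlException>()
	.WaitAndRetryAsync(3, retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)));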


One company I work for has a 10+ year old VB.Net codebase that makes extensive use of ADO.Net calls, which we moved to Azure Infrastructure as a Service(IaaS) a few years ago. Every so often they would get a cluster of ADO.Net exceptions when executing stored procedures in their Azure SQL database. While I was investigating how to retry transient failures without a major refactoring of the codebase I stumbled across SqlRetryLogicOption + TransientErrors, SqlRetryLogicBaseProvider and RetryLogicProvider which looked like a viable solution. At the time I also wondered if it would be possible to use the same approach with Dapper.

namespace devMobile.WebAPIDapper.Lists.Controllers
{
    [ApiController]
    [Route("api/[controller]")]
    public class StockItemsRetryADONetController : ControllerBase
    {
        private readonly string connectionString;
        private readonly ILogger<StockItemsRetryADONetController> logger;

        // This is a bit nasty but sufficient for PoC
        private readonly int NumberOfRetries = 3;
        private readonly TimeSpan TimeBeforeNextExecution = TimeSpan.Parse("00:00:01");
        private readonly TimeSpan MaximumInterval = TimeSpan.Parse("00:00:30");
        private readonly List<int> TransientErrors = new List<int>()
        {
            49920, // Cannot process request. Too many operations in progress for subscription
			49919, // Cannot process create or update request.Too many create or update operations in progress for subscription
			49918, // Cannot process request. Not enough resources to process request.
			41839, // Transaction exceeded the maximum number of commit dependencies.
			41325, // The current transaction failed to commit due to a serializable validation failure.
			41305, // The current transaction failed to commit due to a repeatable read validation failure.
			41302, // The current transaction attempted to update a record that has been updated since the transaction started.
			41301, // Dependency failure: a dependency was taken on another transaction that later failed to commit.
			40613, // Database XXXX on server YYYY is not currently available. Please retry the connection later.
			40501, // The service is currently busy. Retry the request after 10 seconds
			40197, // The service has encountered an error processing your request. Please try again
			20041, // Transaction rolled back. Could not execute trigger. Retry your transaction.
			17197, // Login failed due to timeout; the connection has been closed. This error may indicate heavy server load.
			14355, // The MSSQLServerADHelper service is busy. Retry this operation later.
			11001, // Connection attempt failed
			10936, // The request limit for the elastic pool has been reached. 
			10929, // The server is currently too busy to support requests.
			10928, // The limit for the database has been reached
			10922, // Operation failed. Rerun the statement.
			10060, // A network-related or instance-specific error occurred while establishing a connection to SQL Server.
			10054, // A transport-level error has occurred when sending the request to the server.
			10053, // A transport-level error has occurred when receiving results from the server.
			9515, // An XML schema has been altered or dropped, and the query plan is no longer valid. Please rerun the query batch.
			8651, // Could not perform the operation because the requested memory grant was not available in resource pool
			8645, // A timeout occurred while waiting for memory resources to execute the query in resource pool, Rerun the query
			8628, // A timeout occurred while waiting to optimize the query. Rerun the query. 
			4221, // Login to read-secondary failed due to long wait on 'HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING'. The replica is not available for login because row versions are missing for transactions that were in-flight when the replica was recycled
			4060, // Cannot open database requested by the login. The login failed.
			3966, // Transaction is rolled back when accessing version store. It was earlier marked as victim when the version store was shrunk due to insufficient space in tempdb. Retry the transaction.
			3960, // Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table directly or indirectly in database
			3935, // A FILESTREAM transaction context could not be initialized. This might be caused by a resource shortage. Retry the operation.
			1807, // Could not obtain exclusive lock on database 'model'. Retry the operation later.
			1221, // The Database Engine is attempting to release a group of locks that are not currently held by the transaction. Retry the transaction.
			1205, // Deadlock
			1204, // The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement.
			1203, // A process attempted to unlock a resource it does not own. Retry the transaction.
			997, // A connection was successfully established with the server, but then an error occurred during the login process.
			921, // Database has not been recovered yet. Wait and try again.
			669, // The row object is inconsistent. Please rerun the query.
			617, // Descriptor for object in database not found in the hash table during attempt to un-hash it. Rerun the query. If a cursor is involved, close and reopen the cursor.
			601, // Could not continue scan with NOLOCK due to data movement.
			233, // The client was unable to establish a connection because of an error during connection initialization process before login.
			121, // The semaphore timeout period has expired.
			64, // A connection was successfully established with the server, but then an error occurred during the login process.
			20, // The instance of SQL Server you attempted to connect to does not support encryption.
		};
...
}

After some experimentation the most reliable way I could reproduce a transient failure (usually SQL Error 11001 – “An error has occurred while establishing a connection to the server”) was by modifying the database connection string or unplugging the network cable after a connection had been explicitly opened or a command executed.

namespace devMobile.WebAPIDapper.Lists.Controllers
{
    [ApiController]
    [Route("api/[controller]")]
    public class StockItemsRetryADONetController : ControllerBase
    {
...
		[HttpGet("Dapper")]
		public async Task<ActionResult<IEnumerable<Model.StockItemListDtoV1>>> GetDapper()
        {
            IEnumerable<Model.StockItemListDtoV1> response = null;

            SqlRetryLogicOption sqlRetryLogicOption = new SqlRetryLogicOption()
            {
                NumberOfTries = NumberOfRetries,
                DeltaTime = TimeBeforeNextExecution,
                MaxTimeInterval = MaximumInterval,
                TransientErrors = TransientErrors,
                //AuthorizedSqlCondition = x => string.IsNullOrEmpty(x) || Regex.IsMatch(x, @"^SELECT", RegexOptions.IgnoreCase),
            };

            SqlRetryLogicBaseProvider sqlRetryLogicProvider = SqlConfigurableRetryFactory.CreateFixedRetryProvider(sqlRetryLogicOption);

            using (SqlConnection db = new SqlConnection(this.connectionString))
            {
                db.RetryLogicProvider = sqlRetryLogicProvider;

                db.RetryLogicProvider.Retrying += new EventHandler<SqlRetryingEventArgs>(OnDapperRetrying);

                await db.OpenAsync(); // Opened explicitly so I could yank out the LAN cable.

                response = await db.QueryAsync<Model.StockItemListDtoV1>(sql: @"SELECT [StockItemID] as ""ID"", [StockItemName] as ""Name"", [RecommendedRetailPrice], [TaxRate] FROM [Warehouse].[StockItems]", commandType: CommandType.Text);
            }

            return this.Ok(response);
        }

        protected void OnDapperRetrying(object sender, SqlRetryingEventArgs args)
        {
            logger.LogInformation("Dapper retrying for {RetryCount} times for {args.Delay.TotalMilliseconds:0.} mSec - Error code: {Number}", args.RetryCount, args.Delay.TotalMilliseconds, (args.Exceptions[0] as SqlException).Number);
        }
...
    }
}

For my initial testing I used an invalid Azure SQL Database connection string and in the Visual Studio 2022 Debug output I could see retries.

ADO.Net RetryLogicProvider retrying request 3 times

I then added an OpenAsync just before the Dapper query so I could open the database connection, pause the program with a breakpoint, unplug the LAN cable and then continue execution. The QueryAsync failed without any retries, and modifying the AuthorizedSqlCondition didn’t seem to change the way different SQL statement failures were handled.

There was limited documentation about how to use the ADO.Net retry functionality, so I hacked up another method to try to figure out what I had done wrong. This method uses the same SqlRetryLogicOption configuration for retrying both connection and command failures.

namespace devMobile.WebAPIDapper.Lists.Controllers
{
    [ApiController]
    [Route("api/[controller]")]
    public class StockItemsRetryADONetController : ControllerBase
    {
...
        [HttpGet("AdoNet")]
        public async Task<ActionResult<IEnumerable<Model.StockItemListDtoV1>>> GetAdoNet()
        {
            List<Model.StockItemListDtoV1> response = new List<Model.StockItemListDtoV1>();

            // Both connection and command share the same retry logic - not really an issue for this nasty demo
            SqlRetryLogicOption sqlRetryLogicOption = new SqlRetryLogicOption()
            {
                NumberOfTries = NumberOfRetries,
                DeltaTime = TimeBeforeNextExecution,
                MaxTimeInterval = MaximumInterval,
                TransientErrors = TransientErrors,
                //AuthorizedSqlCondition = x => string.IsNullOrEmpty(x) || Regex.IsMatch(x, @"^SELECT", RegexOptions.IgnoreCase),
            };

            SqlRetryLogicBaseProvider sqlRetryLogicProvider = SqlConfigurableRetryFactory.CreateFixedRetryProvider(sqlRetryLogicOption);


            // This ADO.Net code is a bit of overkill, but I wanted to highlight ADO.Net vs. Dapper
            using (SqlConnection sqlConnection = new SqlConnection(this.connectionString))
            {
                sqlConnection.RetryLogicProvider = sqlRetryLogicProvider;
                sqlConnection.RetryLogicProvider.Retrying += new EventHandler<SqlRetryingEventArgs>(OnConnectionRetrying);

                await sqlConnection.OpenAsync(); // Opened explicitly so I could yank out the LAN cable.

                using (SqlCommand sqlCommand = new SqlCommand())
                {
                    sqlCommand.Connection = sqlConnection;
                    sqlCommand.CommandText = @"SELECT [StockItemID] as ""ID"", [StockItemName] as ""Name"", [RecommendedRetailPrice], [TaxRate] FROM [Warehouse].[StockItems]";
                    sqlCommand.CommandType = CommandType.Text;

                    sqlCommand.RetryLogicProvider = sqlRetryLogicProvider;
                    sqlCommand.RetryLogicProvider.Retrying += new EventHandler<SqlRetryingEventArgs>(OnCommandRetrying);

                    // Overkill, but it makes the retry behaviour really obvious
                    using (SqlDataReader sqlDataReader = await sqlCommand.ExecuteReaderAsync(CommandBehavior.CloseConnection))
                    {
                        while (await sqlDataReader.ReadAsync())
                        {
                            response.Add(new Model.StockItemListDtoV1()
                            {
                                Id = sqlDataReader.GetInt32("Id"),
                                Name = sqlDataReader.GetString("Name"),
                                RecommendedRetailPrice = sqlDataReader.GetDecimal("RecommendedRetailPrice"),
                                TaxRate = sqlDataReader.GetDecimal("TaxRate"),
                            });
                        }
                    }
            }
            }

            return this.Ok(response);
        }

        protected void OnConnectionRetrying(object sender, SqlRetryingEventArgs args)
        {
            logger.LogInformation("Connection retrying for {RetryCount} times for {args.Delay.TotalMilliseconds:0.} mSec - Error code: {Number}", args.RetryCount, args.Delay.TotalMilliseconds, (args.Exceptions[0] as SqlException).Number);
        }

        protected void OnCommandRetrying(object sender, SqlRetryingEventArgs args)
        {
            logger.LogInformation("Command retrying for {RetryCount} times for {args.Delay.TotalMilliseconds:0.} mSec - Error code: {Number}", args.RetryCount, args.Delay.TotalMilliseconds, (args.Exceptions[0] as SqlException).Number);
        }
    }
}

I also added two RetryLogicProvider.Retrying handlers, one for the SqlConnection and the other for the SqlCommand, so I could see which operations were being retried.

sqlConnection.RetryLogicProvider with a broken connection string
sqlCommand.RetryLogicProvider with the LAN cable unplugged just before executing query

The number of retries when I unplugged the LAN cable wasn’t quite what I was expecting….

I didn’t fully understand the differences between System.Data.SqlClient and Microsoft.Data.SqlClient, so I downloaded the source code for Dapper and started hacking. My approach was to modify the Dapper CommandDefinition struct so a caller could pass in a SqlRetryLogicBaseProvider instance.

namespace Dapper
{
    /// <summary>
    /// Represents the key aspects of a sql operation
    /// </summary>
    public struct CommandDefinition
    {
        internal static CommandDefinition ForCallback(object parameters)
        {
            if (parameters is DynamicParameters)
            {
                return new CommandDefinition(parameters);
            }
            else
            {
                return default;
            }
        }

        internal void OnCompleted()
        {
            (Parameters as SqlMapper.IParameterCallbacks)?.OnCompleted();
        }

        /// <summary>
        /// The command (sql or a stored-procedure name) to execute
        /// </summary>
        public string CommandText { get; }

        /// <summary>
        /// The parameters associated with the command
        /// </summary>
        public object Parameters { get; }

        /// <summary>
        /// The active transaction for the command
        /// </summary>
        public IDbTransaction Transaction { get; }

        /// <summary>
        /// The effective timeout for the command
        /// </summary>
        public int? CommandTimeout { get; }

        /// <summary>
        /// The type of command that the command-text represents
        /// </summary>
        public CommandType? CommandType { get; }

        /// <summary>
        /// Should data be buffered before returning?
        /// </summary>
        public bool Buffered => (Flags & CommandFlags.Buffered) != 0;

        /// <summary>
        /// The retry logic provider for this command
        /// </summary>
        public SqlRetryLogicBaseProvider SqlRetryLogicProvider { get; }

        /// <summary>
        /// Should the plan for this query be cached?
        /// </summary>
        internal bool AddToCache => (Flags & CommandFlags.NoCache) == 0;

        /// <summary>
        /// Additional state flags against this command
        /// </summary>
        public CommandFlags Flags { get; }

        /// <summary>
        /// Can async queries be pipelined?
        /// </summary>
        public bool Pipelined => (Flags & CommandFlags.Pipelined) != 0;

        /// <summary>
        /// Initialize the command definition
        /// </summary>
        /// <param name="commandText">The text for this command.</param>
        /// <param name="parameters">The parameters for this command.</param>
        /// <param name="transaction">The transaction for this command to participate in.</param>
        /// <param name="commandTimeout">The timeout (in seconds) for this command.</param>
        /// <param name="commandType">The <see cref="CommandType"/> for this command.</param>
        /// <param name="flags">The behavior flags for this command.</param>
        /// <param name="sqlRetryLogicProvider">Retry strategy for this command.</param>
        /// <param name="cancellationToken">The cancellation token for this command.</param>
        public CommandDefinition(string commandText, object parameters = null, IDbTransaction transaction = null, int? commandTimeout = null,
                                 CommandType? commandType = null, CommandFlags flags = CommandFlags.Buffered,
                                 SqlRetryLogicBaseProvider sqlRetryLogicProvider = null,
                                 CancellationToken cancellationToken = default)
        {
            CommandText = commandText;
            Parameters = parameters;
            Transaction = transaction;
            CommandTimeout = commandTimeout;
            CommandType = commandType;
            Flags = flags;
            SqlRetryLogicProvider = sqlRetryLogicProvider;
            CancellationToken = cancellationToken;
        }
...
}
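
The intended usage was that a caller could pass the retry provider in with each query, something like this sketch (it never got this far):

// Hypothetical usage of the modified CommandDefinition - this never worked
CommandDefinition commandDefinition = new CommandDefinition(
    @"SELECT [StockItemID] as ""ID"", [StockItemName] as ""Name"", [RecommendedRetailPrice], [TaxRate] FROM [Warehouse].[StockItems]",
    sqlRetryLogicProvider: sqlRetryLogicProvider);

response = await db.QueryAsync<Model.StockItemListDtoV1>(commandDefinition);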

This didn’t end well, as the Dapper library extends System.Data.IDbConnection, which doesn’t “natively” support retry logic. Several hours lost from my life later, I now understand a bit more about the differences between System.Data.SqlClient and Microsoft.Data.SqlClient.
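
As a minimal sketch of the crux of the problem: Dapper’s extension methods target System.Data.IDbConnection, but the RetryLogicProvider property only exists on the Microsoft.Data.SqlClient types, so code written against the interface can’t reach it without a provider-specific downcast.

using System.Data;
using Microsoft.Data.SqlClient;

IDbConnection db = new SqlConnection(connectionString);

// db.RetryLogicProvider = sqlRetryLogicProvider; // Won't compile - IDbConnection has no such property

if (db is SqlConnection sqlConnection) // A Microsoft.Data.SqlClient-specific downcast is required
{
    sqlConnection.RetryLogicProvider = sqlRetryLogicProvider;
}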