Mistral | devMobile's blog

Building a lightweight Codestral Chat Completion Command-Line Interface(CLI) is one of the fastest ways to understand how modern Large Language Model(LLM) Application Programming Interfaces (API) work in the real world. While everyone is talking about “agentic” programming (I now spend a lot of time reviewing code, as more agents = more reviews) and “token maxing” (difficult conversations with finance) this series of posts is about the plumbing.

I’m starting from the bottom and working my way up the stack. A raw Hypertext Transfer Protocol(HTTP) contract: an HTTP POST, a model name, a messages array, and a response object you have to parse yourself. With an HTTP proxy (Telerik Fiddler) I confirmed the Mistral Chat API endpoint Uniform Resource Locator(URL) and my API Key worked.

Before writing or generating typed Data Transfer Objects(DTO), or any kind of strongly‑typed client, streaming the response JSON into a jsonDocument was a good way to visualise the shape of the responses. I could enumerate properties, check for missing or inconsistent fields, validate casing, and confirm whether optional objects appear only in certain scenarios. Any undocumented polymorphic shapes, and response constructs can be impossible to model cleanly, and “hand-rolled” serialisation can be fragile.

//...
Console.Write("Enter chat message: ");
var prompt = Console.ReadLine();

// Anonymous type for the request body which feels bit "hinky" but, it works and is concise. Alternatively, could define a class for request body for better type safety and maintainability.
var requestObject = new
{
   model = settings.ModelName,
   messages = new[]
   {
      new { role = "user", content = prompt }
   }
};

// Alternatively, a JsonObject and JsonArray for more control over the JSON structure
var requestJson = new JsonObject()
{
   ["model"] = settings.ModelName,
   ["messages"] = new JsonArray
   {
      new JsonObject
      {
         ["role"] = "user",
         ["content"] = prompt
      }
   }
};

// Create HttpClient with required headers. Note that HttpClient should ideally be reused, but for simplicity we're creating a new instance here.
HttpClient httpClient = new()
{
   DefaultRequestHeaders =
   {
      Accept = { new MediaTypeWithQualityHeaderValue("application/json") },
      Authorization = new AuthenticationHeaderValue("Bearer", settings.ApiKey)
   },
   BaseAddress = new Uri(settings.BaseUrl)
};

using var httpResponse = await httpClient.PostAsync("chat/completions", new StringContent(JsonSerializer.Serialize(requestObject), Encoding.UTF8, "application/json"));

httpResponse.EnsureSuccessStatusCode();

using var stream = await httpResponse.Content.ReadAsStreamAsync();
using var responseDocument = await JsonDocument.ParseAsync(stream);

var content = responseDocument.RootElement.GetProperty("choices")[0].GetProperty("message").GetProperty("content").GetString();

Console.WriteLine(content);
Console.WriteLine();

var usage = responseDocument.RootElement.GetProperty("usage");
Console.WriteLine($"Prompt tokens: {usage.GetProperty("prompt_tokens").GetInt32()}");
Console.WriteLine($"Completion tokens: {usage.GetProperty("completion_tokens").GetInt32()}");
Console.WriteLine($"Total tokens: {usage.GetProperty("total_tokens").GetInt32()}");
Console.WriteLine();

Console.WriteLine("Press <Enter> to exit...");
Console.ReadLine();

Reading the response string and parsing it with a jsonDocument is straightforward but can be inefficient for large responses because it loads the entire response into memory.

I built the typed interface by inspecting real JSON responses especially structures like choices[*].message and cross‑checking them against the Mistral API docs. This highlighted which fields are genuinely optional, which only appear for tool calls, and which vary by finish reason. It also highlighted that I would have to introduce polymorphic message classes so the interface can cleanly represent text messages, tool‑call messages, and whatever new variants the API adds later.

// Create HttpClient with required headers. Note that HttpClient should ideally be reused, but for simplicity we're creating a new instance here..
using HttpClient httpClient = new()
{
   DefaultRequestHeaders =
   {
      Accept = { new MediaTypeWithQualityHeaderValue("application/json") },
      Authorization = new AuthenticationHeaderValue("Bearer", settings.ApiKey)
   },
   BaseAddress = new Uri(settings.BaseUrl)
};

var jsonSerializerOptions = new JsonSerializerOptions()
{
   // PropertyNamingPolicy removed - [JsonPropertyName] attributes on model handle wire names
   WriteIndented           = false,
   DefaultIgnoreCondition  = JsonIgnoreCondition.WhenWritingNull,
   AllowTrailingCommas     = false,
   ReadCommentHandling     = JsonCommentHandling.Disallow,
   UnmappedMemberHandling  = JsonUnmappedMemberHandling.Skip,
};

Console.Write("Enter chat message: ");
var content = Console.ReadLine();

while (!string.IsNullOrWhiteSpace(content))
{
   var request = new ChatCompletionRequest
   {
      Model = settings.ModelName,
      Messages =
      [
         new ChatMessage { Role = "user", Content = content }
      ],
   };

   try
   {
      using var httpResponse = await httpClient.PostAsJsonAsync("chat/completions", request, jsonSerializerOptions);
      httpResponse.EnsureSuccessStatusCode();

      ChatCompletionResponse? chatCompletionResponse = await httpResponse.Content.ReadFromJsonAsync<ChatCompletionResponse>(jsonSerializerOptions);

      if (chatCompletionResponse != null)
      {
         foreach (var choice in chatCompletionResponse.Choices)
         {
            Console.WriteLine(choice.Message.Content);
         }

         Console.WriteLine();
         if (chatCompletionResponse.Usage != null)
         {
            Console.WriteLine($"Prompt tokens: {chatCompletionResponse.Usage.PromptTokens}");
            Console.WriteLine($"Completion tokens: {chatCompletionResponse.Usage.CompletionTokens}");
            Console.WriteLine($"Total tokens: {chatCompletionResponse.Usage.TotalTokens}");
         }
         Console.WriteLine();
      }
   }
   catch (HttpRequestException ex)
   {
      Console.WriteLine($"Request failed: {(int?)ex.StatusCode} {ex.Message}");
   }
   catch (TaskCanceledException)
   {
      Console.WriteLine("Request timed out.");
   }
   catch (JsonException ex)
   {
      Console.WriteLine($"Failed to parse response: {ex.Message}");
   }

   Console.Write("Enter chat message: ");
   content = Console.ReadLine();
}

public sealed class ChatCompletionRequest
{
   [JsonPropertyName("model")] public required string Model { get; init; }
   [JsonPropertyName("messages")] public required List<ChatMessage> Messages { get; init; }
   [JsonPropertyName("temperature")] public double? Temperature { get; init; }
   [JsonPropertyName("max_tokens")] public int? MaxTokens { get; init; }
   [JsonPropertyName("stream")] public bool? Stream { get; init; }
   [JsonPropertyName("response_format")] public ResponseFormat? ResponseFormat { get; init; }
}

public sealed class ChatMessage
{
   [JsonPropertyName("role")] public required string Role { get; init; }
   [JsonPropertyName("content")] public required string Content { get; init; }
}

public sealed class ResponseFormat
{
   [JsonPropertyName("type")] public required string Type { get; init; }
}

public sealed class ChatCompletionResponse
{
   [JsonPropertyName("id")] public required string Id { get; init; }
   [JsonPropertyName("choices")] public required List<ChatCompletionChoice> Choices { get; init; }
   [JsonPropertyName("usage")] public TokenUsage? Usage { get; init; }
}

public sealed class TokenUsage
{
   [JsonPropertyName("prompt_tokens")] public int? PromptTokens { get; init; }
   [JsonPropertyName("completion_tokens")] public int? CompletionTokens { get; init; }
   [JsonPropertyName("total_tokens")] public int? TotalTokens { get; init; }
}

public sealed class ChatCompletionChoice
{
   [JsonPropertyName("index")] public int Index { get; init; }
   [JsonPropertyName("message")] public required ChatMessage Message { get; init; }
   [JsonPropertyName("finish_reason")] public string? FinishReason { get; init; }
}

I had to capture the application output in two screenshots as the response text was longer.

The non-deterministic nature of LLMs resulted in different response messages, with the longest one consuming significantly more tokens, 530 vs. 799 (future posts will cover the use of Random_seed)

devMobile's blog

Random wanderings through Microsoft Azure esp. PaaS plumbing, the IoT bits, AI on Micro controllers, AI on Edge Devices, .NET nanoFramework, .NET Core on *nix and ML.NET+ONNX

Tag Archives: Mistral

Codestral CLI Synchronous Chat Completions