RFM69 hat library receive lockup debugging

Every so often my Enums & Masks test harness locked up and stopped receiving messages from my test rig. This seemed to happen more often when the send functionality of my library was not being used.

easysensors RFM69HCW test rig

After 5 to 30 minutes (a couple of times it was 5 to 8 hours overnight) the application stopped receiving messages and wouldn’t resume until the application (device reset) was restarted or the RegOpmode-Mode was quickly changed to sleep then back to receive.

private void InterruptGpioPin2_ValueChanged(GpioPin sender, GpioPinValueChangedEventArgs args)
{
    Debug.WriteLine("InterruptGpioPin2_ValueChanged");

    rfm69Device.SetMode(Rfm69HcwDevice.RegOpModeMode.Sleep);
    rfm69Device.SetMode(Rfm69HcwDevice.RegOpModeMode.Receive);
}

After re-reading the Semtech SX1231 datasheet one of the other possible solutions involved writing to the RegPacketConfig2-RestartRX bit

RegPacketConfig2 configuration options

Of the different approaches I found this code was the most reliable way of restarting reception of packets.

private void InterruptGpioPin3_ValueChanged(GpioPin sender, GpioPinValueChangedEventArgs args)
{
	Debug.WriteLine("InterruptGpioPin3_ValueChanged");

	byte regpacketConfig2 = rfm69Device.RegisterManager.ReadByte(0x3d);
	regpacketConfig2 |= (byte)0x04;
	rfm69Device.RegisterManager.WriteByte(0x3d, regpacketConfig2);
}

I had noticed this code in the Low Power Lab and wondered what it was for. The HopeRF library didn’t appear to have code like this to restart reception which was interesting.

void RFM69::send(uint16_t toAddress, const void* buffer, uint8_t bufferSize, bool requestACK)
{
  writeReg(REG_PACKETCONFIG2, (readReg(REG_PACKETCONFIG2) & 0xFB) | RF_PACKET2_RXRESTART); // avoid RX deadlocks
  uint32_t now = millis();
  while (!canSend() && millis() - now < RF69_CSMA_LIMIT_MS) receiveDone();
  sendFrame(toAddress, buffer, bufferSize, requestACK, false);
}

// should be called immediately after reception in case sender wants ACK
void RFM69::sendACK(const void* buffer, uint8_t bufferSize) {
  ACK_REQUESTED = 0;   // TWS added to make sure we don't end up in a timing race and infinite loop sending Acks
  uint16_t sender = SENDERID;
  int16_t _RSSI = RSSI; // save payload received RSSI value
  writeReg(REG_PACKETCONFIG2, (readReg(REG_PACKETCONFIG2) & 0xFB) | RF_PACKET2_RXRESTART); // avoid RX deadlocks
  uint32_t now = millis();
  while (!canSend() && millis() - now < RF69_CSMA_LIMIT_MS) receiveDone();
  SENDERID = sender;    // TWS: Restore SenderID after it gets wiped out by receiveDone()
  sendFrame(sender, buffer, bufferSize, false, true);
  RSSI = _RSSI; // restore payload RSSI
}

void RFM69::receiveBegin() {
  DATALEN = 0;
  SENDERID = 0;
  TARGETID = 0;
  PAYLOADLEN = 0;
  ACK_REQUESTED = 0;
  ACK_RECEIVED = 0;
#if defined(RF69_LISTENMODE_ENABLE)
  RF69_LISTEN_BURST_REMAINING_MS = 0;
#endif
  RSSI = 0;
  if (readReg(REG_IRQFLAGS2) & RF_IRQFLAGS2_PAYLOADREADY)
    writeReg(REG_PACKETCONFIG2, (readReg(REG_PACKETCONFIG2) & 0xFB) | RF_PACKET2_RXRESTART); // avoid RX deadlocks
  writeReg(REG_DIOMAPPING1, RF_DIOMAPPING1_DIO0_01); // set DIO0 to "PAYLOADREADY" in receive mode
  setMode(RF69_MODE_RX);
}

In the debug output you can see that clock frequencies of the two test devices are slightly different. Every so often they transmit close enough to corrupt one of the message payloads which causes the deadlock.

22:20:26.379 Address 0X99 10011001
22:20:26 Received 14 byte message Hello World:10
22:20:26.561 RegIrqFlags2 01100110
22:20:26.576 Address 0X66 01100110
22:20:26 Received 14 byte message Hello World:26
.22:20:27.501 RegIrqFlags2 01100110
22:20:27.517 Address 0X99 10011001
22:20:27 Received 14 byte message Hello World:11
22:20:27.699 RegIrqFlags2 01100110
22:20:27.714 Address 0X66 01100110
22:20:27 Received 14 byte message Hello World:27
...............................

Now I need to back integrate the fix into the send & receive message methods of my code, then stress test the library with even more client devices.

RFM69 hat library receive lockups issue

Sometimes while testing code you notice something odd. Every so often the Enums & Masks application locks up and stops receiving messages from my test rig.

easysensors RFM69HCW test rig

The symptom is that after 5 to 30 minutes the application stops receiving messages

21:37:37.568 RegIrqFlags1 11011001
21:37:37.583 Address 0X99 10011001
21:37:37 Received 14 byte message Hello World:61
..21:37:38.693 RegIrqFlags2 01100110
21:37:38.706 RegIrqFlags1 11011001
21:37:38.724 Address 0X99 10011001
21:37:38 Received 14 byte message Hello World:62
............The thread 0xba8 has exited with code 0 (0x0).
.................................................................................................................................................The thread 0xf90 has exited with code 0 (0x0).
.....................The thread 0xe30 has exited with code 0 (0x0).
.......................The thread 0xa04 has exited with code 0 (0x0).
................................The thread 0xc8c has exited with code 0 (0x0).
..........................................................................................The thread 0xc38 has exited with code 0 (0x0).
......................The thread 0xf68 has exited with code 0 (0x0).
......................................................................................The thread 0x1c8 has exited with code 0 (0x0).
..........The thread 0xeb8 has exited with code 0 (0x0).
..............................................................The thread 0xbb8 has exited with code 0 (0x0).
..........The thread 0xdc0 has exited with code 0 (0x0).
...............................The thread 0x820 has exited with code 0 (0x0).
....................................The thread 0xaac has exited with code 0 (0x0).
......The thread 0xbf0 has exited with code 0 (0x0).
............................................The thread 0x4e8 has exited with code 0 (0x0).
...............................The thread 0x1b4 has exited with code 0 (0x0).
...............................................................The thread 0xbdc has exited with code 0 (0x0).
....................The thread 0xb60 has exited with code 0 (0x0).
.........................................................................................................The thread 0x510 has exited with code 0 (0x0).
........The thread 0xf60 has exited with code 0 (0x0).
........................................................The thread 0x3c0 has exited with code 0 (0x0).
......................................The thread 0xa4c has exited with code 0 (0x0).
..................................................................The thread 0x9e0 has exited with code 0 (0x0).
....................The thread 0xd74 has exited with code 0 (0x0).
............................The thread 0xfa0 has exited with code 0 (0x0).
..................................................................................................The thread 0xfe0 has exited with code 0 (0x0).
....................................................................................The thread 0xdd4 has exited with code 0 (0x0).
........................The thread 0xc00 has exited with code 0 (0x0).
..................................The thread 0x478 has exited with code 0 (0x0).
.........................The thread 0x88c has exited with code 0 (0x0).
...........................................The thread 0x280 has exited with code 0 (0x0).
..........................................The thread 0x8e4 has exited with code 0 (0x0).
............The thread 0x410 has exited with code 0 (0x0).
..............................................The thread 0xa70 has exited with code 0 (0x0).
................The thread 0x994 has exited with code 0 (0x0).
....................The thread 0x298 has exited with code 0 (0x0).
..............The thread 0x3a4 has exited with code 0 (0x0).
............................................................The thread 0xa2c has exited with code 0 (0x0).
..........The thread 0x208 has exited with code 0 (0x0).
..........................................................................The thread 0xbd4 has exited with code 0 (0x0).
............The thread 0xfdc has exited with code 0 (0x0).
........................................................................The thread 0x36c has exited with code 0 (0x0).
...........22:08:57.638 RegIrqFlags2 01100110
22:08:57.658 RegIrqFlags1 11011001
22:08:57.676 Address 0X66 01100110
22:08:57 Received 15 byte message Hello World:157
22:08:57.807 RegIrqFlags2 01100110

But, every so often it would after many minutes start up again without me doing anything (I noticed this after leaving application running overnight). I could get the application to restart by putting a break point on the Debug.Write(“.”) and toggling the OperationMode from Sleep to Receive

Using Visual Studio Immediate Windows to execute SetMode

I have found if the device is transmitting every so often the lockups are also much less likely. To help with debugging the issue I have wired up the three buttons on the Adafruit Radio Bonnet to call different diagnostic code

public void Run(IBackgroundTaskInstance taskInstance)
{
	byte[] syncValues = { 0xAA, 0x2D, 0xD4 };
	byte[] aesKeyValues = { 0x0, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0X0E, 0X0F };

	GpioController gpioController = GpioController.GetDefault();

	InterruptGpioPin1 = gpioController.OpenPin(5);
	InterruptGpioPin1.SetDriveMode(GpioPinDriveMode.InputPullUp);
	InterruptGpioPin1.ValueChanged += InterruptGpioPin1_ValueChanged; ;

	InterruptGpioPin1 = gpioController.OpenPin(6);
	InterruptGpioPin1.SetDriveMode(GpioPinDriveMode.InputPullUp);
	InterruptGpioPin1.ValueChanged += InterruptGpioPin2_ValueChanged; ;

	InterruptGpioPin1 = gpioController.OpenPin(12);
	InterruptGpioPin1.SetDriveMode(GpioPinDriveMode.InputPullUp);
	InterruptGpioPin1.ValueChanged += InterruptGpioPin3_ValueChanged; ;

…

private void InterruptGpioPin1_ValueChanged(GpioPin sender, GpioPinValueChangedEventArgs args)
{
   Debug.WriteLine("InterruptGpioPin1_ValueChanged");
   rfm69Device.SetMode(Rfm69HcwDevice.RegOpModeMode.Sleep);
   rfm69Device.SetMode(Rfm69HcwDevice.RegOpModeMode.Receive);
}

private void InterruptGpioPin2_ValueChanged(GpioPin sender, GpioPinValueChangedEventArgs args)
{
   Debug.WriteLine("Receive-Wait");
   byte IrqFlags = rfm69Device.RegisterManager.ReadByte(0x28); // RegIrqFlags2
   while ((IrqFlags & 0b00000100) == 0)  // wait until PayLoadReady set
   {
      Task.Delay(20).Wait();
      IrqFlags = rfm69Device.RegisterManager.ReadByte(0x28); // RegIrqFlags2
      Debug.WriteLine(string.Format("RegIrqFlags {0}", Convert.ToString((byte)IrqFlags, 2).PadLeft(8, '0')));
       Debug.Write(".");
   }
   Debug.WriteLine("");

   // Read the length
   byte numberOfBytes = rfm69Device.RegisterManager.ReadByte(0x0);

   // Allocate buffer for message
   byte[] messageBytes = new byte[numberOfBytes];

  for (int i = 0; i < numberOfBytes; i++)
  {
     messageBytes[i] = rfm69Device.RegisterManager.ReadByte(0x00); // RegFifo
   }

  string messageText = UTF8Encoding.UTF8.GetString(messageBytes);
  Debug.WriteLine("Received {0} byte message {1}", messageBytes.Length, messageText);

  Debug.WriteLine("Receive-Done");
}

private void InterruptGpioPin3_ValueChanged(GpioPin sender, GpioPinValueChangedEventArgs args)
{
   Debug.WriteLine("Receive-No wait");

  // Read the length
  byte numberOfBytes = rfm69Device.RegisterManager.ReadByte(0x0);

  // Allocate buffer for message
  byte[] messageBytes = new byte[numberOfBytes];

  for (int i = 0; i < numberOfBytes; i++)
  {
      messageBytes[i] = rfm69Device.RegisterManager.ReadByte(0x00); // RegFifo
   }

  string messageText = UTF8Encoding.UTF8.GetString(messageBytes);
  Debug.WriteLine("Received {0} byte message {1}", messageBytes.Length, messageText);

  Debug.WriteLine("Receive-Done");
}

Looks like this maybe a bit of a heisenbug as it takes a longish time to appear and poking around in the debugger and adding more diagnostics changes the frequency the error.

Received 16 byte message 
Receive-Done
.............................Receive-No wait
Received 16 byte message 
Receive-Done
Receive-No wait
Received 16 byte message 
Receive-Done
....Receive-No wait
Received 16 byte message 
Receive-Done
Receive-No wait
Received 16 byte message 
Receive-Done
.............

Pressing button one restarts inbound messages for a while, button two sits in an endless loop, button three reads in a 16 byte message of 0x10 characters, which I think is buffer length. I have added code to catch exceptions and stop re-entrancy but it never seems to get triggered.

private void InterruptGpioPin_ValueChanged(GpioPin sender, GpioPinValueChangedEventArgs args)
{
	if (args.Edge != GpioPinEdge.RisingEdge)
	{
		return;
	}

	if (InIrqHandler)
	{
		Debug.WriteLine("{0:HH:mm:ss.fff} InIrqHandler+++++++++++++++++++++++++++++++++++++++++++++++++++++++++", DateTime.Now);
		return;
	}
	InIrqHandler = true;

	try
	{
		RegIrqFlags2 irqFlags2 = (RegIrqFlags2)RegisterManager.ReadByte((byte)Registers.RegIrqFlags2);
		Debug.WriteLine("{0:HH:mm:ss.fff} RegIrqFlags2 {1}", DateTime.Now, Convert.ToString((byte)irqFlags2, 2).PadLeft(8, '0'));

		if ((irqFlags2 & RegIrqFlags2.PayloadReady) == RegIrqFlags2.PayloadReady)
		{
			if ((irqFlags2 & RegIrqFlags2.CrcOk) == RegIrqFlags2.CrcOk)
			{
				RegIrqFlags1 irqFlags1 = (RegIrqFlags1)RegisterManager.ReadByte((byte)Registers.RegIrqFlags1); // RegIrqFlags1

				//	SetMode(RegOpModeMode.Sleep);

				// Read the length of the buffer
				byte numberOfBytes = RegisterManager.ReadByte(0x0);
					Debug.WriteLine("{0:HH:mm:ss.fff} RegIrqFlags1 {1}", DateTime.Now, Convert.ToString((byte)irqFlags1, 2).PadLeft(8, '0'));
				if (((irqFlags1 & RegIrqFlags1.SynAddressMatch) == RegIrqFlags1.SynAddressMatch) && DeviceAddressingEnabled)
				{
					byte address = RegisterManager.ReadByte(0x0);
					Debug.WriteLine("{0:HH:mm:ss.fff} Address 0X{1:X2} {2}", DateTime.Now, address, Convert.ToString((byte)address, 2).PadLeft(8, '0'));
					numberOfBytes--;
				}

				// Allocate buffer for message
				byte[] messageBytes = new byte[numberOfBytes];

				for (int i = 0; i < numberOfBytes; i++)
				{
					messageBytes[i] = RegisterManager.ReadByte(0x00); // RegFifo
				}
				//SetMode(RegOpModeMode.Receive);

				string messageText = UTF8Encoding.UTF8.GetString(messageBytes);
					Debug.WriteLine("{0:HH:mm:ss} Received {1} byte message {2}", DateTime.Now, messageBytes.Length, messageText);
			}
			else
			{
				Debug.WriteLine("{0:HH:mm:ss} Received message CRC NOK++++++++++++", DateTime.Now);
			}
		}

		if ((irqFlags2 & RegIrqFlags2.PacketSent) == RegIrqFlags2.PacketSent)  // PacketSent set
		{
			RegisterManager.WriteByte(0x01, 0b00010000); // RegOpMode set ReceiveMode
			Debug.WriteLine("{0:HH:mm:ss.fff} Transmit-Done", DateTime.Now);
		}
	}
	catch (Exception ex)
	{
		Debug.WriteLine($"################### {ex.Source}");
		Debug.WriteLine($"################### {ex.Message}");
		Debug.WriteLine($"################### {ex.StackTrace}");
	}
	finally
	{
		InIrqHandler = false;
	}
}


I need to have a look at the Low Power Lab and HopeRF libraries to see how they handle message received interrupts.