Watchdog Instrumentation

Prerequisite

This guide assumes you have already completed the minimal integration of the Memfault SDK. If you have not, check out the appropriate guide in the table below.

MCU Architecture | Getting Started Guide
ARM Cortex-M | ARM Cortex-M Integration Guide
nRF Connect SDK | nRF Connect SDK Integration Guide
ESP32 ESP-IDF (Xtensa and RISC-V) | ESP32 ESP-IDF Integration Guide
ESP8266 | ESP8266 RTOS Integration Guide
Dialog DA1469x | DA1469x Integration Guide
NXP MCUXpresso RT1060 | NXP MCUXpresso SDK for i.MX RT Guide
Zephyr RTOS | Zephyr Integration Guide

This document describes the integration details needed to capture coredumps when a watchdog timeout occurs.

[Figure: example of a captured Watchdog crash]

Enabling a Watchdog Timer

This guide assumes a Watchdog Timer is already present and enabled on the system. For some general background on Watchdog Timers and recommendations on how to use them, see this article:

https://interrupt.memfault.com/blog/firmware-watchdog-best-practices

Instrumenting Watchdogs with Memfault

When using the Memfault SDK, we recommend instrumenting the watchdog timer so a coredump will be captured when the watchdog trips.

The fundamental technique is to use Memfault's Watchdog interrupt handler to trap the interrupt when the watchdog fires. This provides the best backtrace result, since it uses the Memfault interrupt shim.

To use it, set the MEMFAULT_EXC_HANDLER_WATCHDOG Memfault SDK config to the name of the Watchdog interrupt handler registered in your interrupt vector table, and make sure there is no competing implementation of that handler. This config is usually set in the memfault_platform_config.h header file.
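As a minimal sketch of that setting, the config header might contain a single define like the one below. The handler name WDT_IRQHandler is an assumption made for illustration; the real name depends on your chip vendor's startup file and vector table.

```c
// memfault_platform_config.h
//
// WDT_IRQHandler is a hypothetical name used for illustration. Use the exact
// symbol your interrupt vector table lists for the watchdog interrupt.
#define MEMFAULT_EXC_HANDLER_WATCHDOG WDT_IRQHandler
```

If your project already defines a function with that name, the linker will likely report a duplicate symbol, which is one way a competing implementation shows up.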

In order for the coredump save to complete, the chip will need to be configured to fire the interrupt for the Watchdog Timer, but not immediately reset.

Configuring Watchdog Timer Warning Threshold

Depending on the particular chip, there's often a "warning" threshold that can be configured into the watchdog, which will trigger an interrupt some period before the watchdog reset occurs.

For example, on the NXP RT1021 chip (from the RT1020 Reference Manual):

If available, be sure to set up the warning threshold at a sufficiently long timeout for the coredump saving to have enough time to finish (for example, if there are long flash page erase times that need to be accounted for).

We typically recommend a relatively long timeout, such as 10 seconds, but be sure to consult the flash chip documentation to ensure enough time is reserved.
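One rough way to sanity-check the threshold is to estimate the worst-case erase and program time for the coredump region. The helper below is only a sketch: the function name is hypothetical, and every timing value must come from your own flash datasheet.

```c
#include <stdint.h>

// Hypothetical helper: worst-case time (in ms) to erase and program a
// coredump region. All inputs are assumptions taken from a flash datasheet.
uint32_t coredump_save_budget_ms(uint32_t coredump_bytes,
                                 uint32_t sector_bytes,
                                 uint32_t sector_erase_ms,
                                 uint32_t page_bytes,
                                 uint32_t page_program_ms) {
  // Round up: a partially used sector or page still costs a full operation.
  const uint32_t sectors = (coredump_bytes + sector_bytes - 1) / sector_bytes;
  const uint32_t pages = (coredump_bytes + page_bytes - 1) / page_bytes;
  return sectors * sector_erase_ms + pages * page_program_ms;
}
```

With illustrative numbers (a 64 KiB coredump, 4 KiB sectors at 400 ms worst-case erase, 256-byte pages at 3 ms each), this gives 16 * 400 + 256 * 3 = 7168 ms, which fits inside a 10 second warning window with some margin.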

If the hardware watchdog does not permit a "warning" threshold of sufficient time, it may instead make sense to set the hardware watchdog to a very long period, and use a timer peripheral interrupt to trigger the Memfault software watchdog.
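The timing relationship in that fallback can be sketched as follows. This is a host-runnable model, not SDK code: the function names, the 60 second and 50 second timeouts, and the tick source are all assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical values: the hardware watchdog resets at 60 s and the software
// watchdog fires at 50 s, leaving about 10 s to save a coredump.
#define HW_WATCHDOG_TIMEOUT_MS 60000u
#define SW_WATCHDOG_TIMEOUT_MS 50000u

static uint32_t s_ms_since_feed;
static bool s_sw_watchdog_fired;

// Call wherever the hardware watchdog is fed so both timers restart together.
void sw_watchdog_feed(void) {
  s_ms_since_feed = 0;
}

// Call from a periodic timer peripheral interrupt; elapsed_ms is the timer
// period. Returns true once the software watchdog has expired.
bool sw_watchdog_tick(uint32_t elapsed_ms) {
  s_ms_since_feed += elapsed_ms;
  if (s_ms_since_feed >= SW_WATCHDOG_TIMEOUT_MS) {
    // On target, this is the point where the timer ISR would hand off to the
    // Memfault watchdog handler so that a coredump is captured.
    s_sw_watchdog_fired = true;
  }
  return s_sw_watchdog_fired;
}
```

The key design point is that the software timeout is shorter than the hardware one by at least the time the coredump save needs. On a real target it may be simpler to use a one-shot timer whose interrupt handler is the one named by MEMFAULT_EXC_HANDLER_WATCHDOG, rather than counting ticks.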

Petting the Watchdog Prior to Coredump Save

The actual writing of the coredump to the platform storage medium is the time-consuming part of the coredump capture.

One technique that may be useful is to apply one last reset of the watchdog timer (often called "petting" the watchdog) prior to the coredump save operation. The place to insert the watchdog timer reset is in this weakly-defined function:

//! Called prior to invoking any platform_storage_[read/write/erase] calls upon crash
//!
//! @note a weak no-op version of this API is defined in memfault_coredump.c because many platforms will
//! not need to implement this at all. Some potential use cases:
//!   - re-configured platform storage driver to be poll instead of interrupt based
//!   - give watchdog one last feed prior to storage code getting called it
//! @return true if to continue saving the coredump, false to abort
extern bool memfault_platform_coredump_save_begin(void);
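A minimal sketch of overriding that function might look like the following. The board_watchdog_feed() function and its counter are hypothetical stand-ins for your watchdog driver; on target, the feed would typically write the peripheral's refresh register.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-in for the board's watchdog driver. The counter exists
// only so this sketch can be exercised on a host.
static uint32_t s_watchdog_feed_count;

void board_watchdog_feed(void) {
  s_watchdog_feed_count++;
}

// Overrides the weak no-op version provided by the SDK.
bool memfault_platform_coredump_save_begin(void) {
  // Give the watchdog one last feed so the full timeout is available for the
  // storage erase and write operations that follow.
  board_watchdog_feed();
  return true;  // continue saving the coredump
}
```

Returning true lets the save proceed. This only helps if one full watchdog period is long enough for the entire save, since nothing feeds the watchdog again afterward in this sketch.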

Sample Integrations

Memfault provides several example implementations of software watchdogs, which can be used as-is or as a reference for your own implementation: