Watchdog Instrumentation
This guide assumes you have already completed the minimal integration of the Memfault SDK. If you have not, please complete the appropriate Getting Started guide.
This document describes the integration details needed to capture a coredump when a Watchdog timeout occurs.
An example of a captured Watchdog crash is shown below:
Enabling a Watchdog Timer
This guide assumes a Watchdog Timer is already present and enabled on the system. For some general background on Watchdog Timers and recommendations on how to use them, see this article:
https://interrupt.memfault.com/blog/firmware-watchdog-best-practices
Instrumenting Watchdogs with Memfault
When using the Memfault SDK, we recommend instrumenting the watchdog timer so a coredump will be captured when the watchdog trips.
The fundamental technique is to use Memfault's Watchdog interrupt handler to trap the interrupt when the watchdog fires. This provides the best backtrace, since it uses the Memfault interrupt shim.
To use it, make sure that the MEMFAULT_EXC_HANDLER_WATCHDOG Memfault SDK config is set to the name of the Watchdog interrupt handler registered in your interrupt vector table (this config is usually set in the memfault_platform_config.h header file), and that there is no competing implementation of that handler.
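As a minimal sketch of that setting, the fragment below maps the Memfault handler onto a vector table entry. The name WDT_IRQHandler is a hypothetical placeholder; substitute whatever symbol your startup file uses for the watchdog interrupt, and remove any existing definition of that handler from your own code so the two do not collide.

```c
// memfault_platform_config.h (excerpt)
//
// WDT_IRQHandler is an assumed, illustrative name. Use the watchdog
// interrupt symbol that appears in your chip's interrupt vector table.
#define MEMFAULT_EXC_HANDLER_WATCHDOG WDT_IRQHandler
```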
In order for the coredump save to complete, the chip will need to be configured to fire the interrupt for the Watchdog Timer, but not immediately reset.
Configuring Watchdog Timer Warning Threshold
Depending on the particular chip, there's often a "warning" threshold that can be configured into the watchdog, which will trigger an interrupt some period before the watchdog reset occurs.
For example, on the NXP RT1021 chip (from the RT1020 Reference Manual):
If available, set the warning threshold far enough ahead of the reset that the coredump save has time to finish (for example, long flash page erase times need to be accounted for).
We typically recommend a relatively long timeout, such as 10 seconds, but be sure to consult the flash chip documentation to ensure enough time is reserved.
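One way to sanity-check the reserved time is to estimate the worst-case erase and program time for the coredump region. The sketch below is only illustrative: the FlashTiming struct, the function name, and every number in the usage note are assumptions, and it ignores bus transfer time and driver overhead. Use worst-case (not typical) figures from your flash datasheet.

```c
#include <stdint.h>

// Hypothetical description of a flash part. Fill in the maximum values
// from the datasheet for the device that stores the coredump.
typedef struct {
  uint32_t sector_size_bytes;    // smallest erasable unit
  uint32_t sector_erase_ms_max;  // worst-case erase time per sector
  uint32_t page_size_bytes;      // smallest programmable unit
  uint32_t page_program_us_max;  // worst-case program time per page
} FlashTiming;

static uint32_t prv_div_round_up(uint32_t n, uint32_t d) {
  return (n + d - 1) / d;
}

// Returns an estimate, in milliseconds, of the worst-case time needed to
// erase and then program a coredump of the given size.
uint32_t coredump_save_budget_ms(const FlashTiming *t, uint32_t coredump_bytes) {
  const uint32_t sectors = prv_div_round_up(coredump_bytes, t->sector_size_bytes);
  const uint32_t pages = prv_div_round_up(coredump_bytes, t->page_size_bytes);
  const uint32_t erase_ms = sectors * t->sector_erase_ms_max;
  const uint32_t program_ms = prv_div_round_up(pages * t->page_program_us_max, 1000);
  return erase_ms + program_ms;
}
```

With made-up figures of 4096-byte sectors at 400 ms each and 256-byte pages at 3000 us each, a 64 KiB coredump comes to 16 × 400 ms + 256 × 3 ms = 7168 ms, which fits inside a 10 second window with some margin.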
If the hardware watchdog does not permit a "warning" threshold of sufficient time, it may instead make sense to set the hardware watchdog to a very long period, and use a timer peripheral interrupt to trigger the Memfault software watchdog.
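A rough sketch of that fallback follows, assuming a periodic timer interrupt is available. The function names, the tick period, and the timeout are all hypothetical. The #ifndef block is a host-build stand-in so the snippet compiles on its own; on target, delete it and include the Memfault SDK header that provides MEMFAULT_SOFTWARE_WATCHDOG(), which does not return.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef MEMFAULT_SOFTWARE_WATCHDOG
// Stand-in for host builds only. The real macro comes from the Memfault
// SDK, captures a coredump, and never returns.
static bool s_watchdog_tripped;
#define MEMFAULT_SOFTWARE_WATCHDOG() (s_watchdog_tripped = true)
#endif

// Assumed values: timer fires once per second, trip after 10 missed feeds.
#define SW_WATCHDOG_TIMEOUT_TICKS 10u

static volatile uint32_t s_ticks_since_feed;

// Call this from the same place that feeds the hardware watchdog.
void sw_watchdog_feed(void) {
  s_ticks_since_feed = 0;
}

// Hypothetical periodic timer interrupt handler. The hardware watchdog
// period should be set comfortably longer than this software timeout
// plus the coredump save time.
void sw_watchdog_timer_isr(void) {
  s_ticks_since_feed = s_ticks_since_feed + 1;
  if (s_ticks_since_feed >= SW_WATCHDOG_TIMEOUT_TICKS) {
    MEMFAULT_SOFTWARE_WATCHDOG();
  }
}
```

The hardware watchdog stays enabled as a last line of defense in case the timer interrupt itself can no longer run.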
Example Sequence Diagram
Normally the coredump saving will complete before the full watchdog timeout, and the Memfault fault handler will reset the chip. The following sequence diagram illustrates this:
Example Sequence Diagram with Hardware Watchdog Triggering Reset
If the coredump saving takes too long, the hardware watchdog will trigger a reset instead, causing a lost coredump. In this case, Memfault saves a Trace Event to indicate that the watchdog reset occurred and failed to save the coredump, which will generate an Issue:
Petting the Watchdog Prior to Coredump Save
Writing the coredump to the platform storage medium is the time-consuming part of coredump capture.
One technique that may be useful is to apply one last reset of the watchdog timer (often called "petting" the watchdog) prior to the coredump save operation. The place to insert the watchdog timer reset is in this weakly-defined function:
//! Called prior to invoking any platform_storage_[read/write/erase] calls upon crash
//!
//! @note a weak no-op version of this API is defined in memfault_coredump.c because many platforms will
//! not need to implement this at all. Some potential use cases:
//! - re-configured platform storage driver to be poll instead of interrupt based
//! - give watchdog one last feed prior to storage code getting called it
//! @return true if to continue saving the coredump, false to abort
extern bool memfault_platform_coredump_save_begin(void);
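A minimal sketch of overriding that weak function to feed the watchdog is shown below. The reload register and key are simulated with a plain variable and a made-up constant so the snippet stands alone; on real hardware, replace prv_watchdog_feed() with your chip's actual feed sequence.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-ins for a memory-mapped watchdog reload register and
// its reload key. Real register addresses and values are chip specific.
static volatile uint32_t s_wdt_reload_reg;
#define WDT_RELOAD_KEY 0xAAAAu

static void prv_watchdog_feed(void) {
  s_wdt_reload_reg = WDT_RELOAD_KEY;
}

// Overrides the weak no-op default provided by the SDK. Runs once, right
// before the coredump storage erase/write calls begin.
bool memfault_platform_coredump_save_begin(void) {
  prv_watchdog_feed();  // restart the countdown so the save gets a full period
  return true;          // true: continue saving the coredump
}
```

Because this runs in the fault path, keep it short and avoid anything that depends on interrupts or the scheduler.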
Software Watchdog
For other watchdog types that don't trigger a hardware interrupt, the Memfault assert macro MEMFAULT_SOFTWARE_WATCHDOG() can be used to trigger a coredump tagged as Software Watchdog.
If MEMFAULT_SOFTWARE_WATCHDOG() is called from a watchdog interrupt handler, the Memfault interrupt shim will not be used, and the backtrace will not be as accurate. Instead, it's preferable to wire up the MEMFAULT_EXC_HANDLER_WATCHDOG interrupt handler directly.
One common use case for this feature is a "task watchdog" in a multi-threaded system, which checks for tasks that aren't busy-looping, but are stuck waiting on some OS primitive (mutex or queue) when they should be proceeding.
Call the MEMFAULT_SOFTWARE_WATCHDOG() macro when the task watchdog trips. Note that in order to diagnose such an issue, Memfault needs to support thread awareness for the RTOS in use, and the necessary threads and kernel data structures must be captured as part of the coredump (see the RTOS Analysis documentation for details).
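To illustrate the idea, here is a small hand-rolled task watchdog sketch. It is not the Memfault task watchdog subsystem; every name, the task count, and the timeout are hypothetical. The #ifndef block is a host-build stand-in so the snippet compiles on its own; on target, delete it and use the real macro from the Memfault SDK, which does not return.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifndef MEMFAULT_SOFTWARE_WATCHDOG
// Stand-in for host builds only; the real macro captures a coredump.
static bool s_watchdog_tripped;
#define MEMFAULT_SOFTWARE_WATCHDOG() (s_watchdog_tripped = true)
#endif

#define NUM_WATCHED_TASKS 3u   // assumed number of monitored tasks
#define TASK_TIMEOUT_MS 5000u  // assumed maximum time between check-ins

typedef struct {
  bool expecting_progress;   // true while the task should be making progress
  uint32_t last_checkin_ms;  // timestamp of the most recent check-in
} WatchedTask;

static WatchedTask s_tasks[NUM_WATCHED_TASKS];

// A task calls this when it starts work that must finish in bounded time.
void task_watchdog_start(size_t task_id, uint32_t now_ms) {
  s_tasks[task_id].last_checkin_ms = now_ms;
  s_tasks[task_id].expecting_progress = true;
}

// A task calls this when it legitimately goes idle.
void task_watchdog_stop(size_t task_id) {
  s_tasks[task_id].expecting_progress = false;
}

// Run periodically (for example from a timer callback). On a real RTOS,
// access to s_tasks needs to be made safe against concurrent updates.
void task_watchdog_check(uint32_t now_ms) {
  for (size_t i = 0; i < NUM_WATCHED_TASKS; i++) {
    const WatchedTask *t = &s_tasks[i];
    // Unsigned subtraction keeps the comparison correct across tick wraparound.
    if (t->expecting_progress && (uint32_t)(now_ms - t->last_checkin_ms) > TASK_TIMEOUT_MS) {
      MEMFAULT_SOFTWARE_WATCHDOG();
    }
  }
}
```

A task would call task_watchdog_start() again each time it completes a unit of work, which refreshes its timestamp; a task blocked forever on a mutex or queue stops refreshing and trips the check.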
Memfault provides a lightweight task watchdog subsystem which can be used if the system doesn't have one available. See here for more information.
Sample Integrations
Memfault provides several example implementations of software watchdogs, which can be used as-is or as a reference for your own implementation: