Error Tracking with Trace Events
We recommend capturing a full coredump trace in case the system encounters a fatal problem, like a hard fault or a failed assertion. However, in some cases it may not be desirable or possible to do so. For example, if stopping & rebooting the system is not an option, or if the error is recoverable but you would still like to understand how often it happens.
The trace event module within the SDK makes it easy to track errors in a way that requires less storage than full coredump traces and also allows the system to keep running after capturing the event. Only the program counter, return address and a custom "reason" are saved. Once uploaded to Memfault, each trace event will be associated with an Issue just like a coredump.
Here's an example where Trace Events are captured for Bluetooth protocol CRC errors and invalid message IDs:
Prerequisite
This guide assumes you have already completed the minimal integration of the Memfault SDK. If you have not, check out the appropriate guide in the table below.
MCU Architecture | Getting Started Guide |
---|---|
ARM Cortex-M | ARM Cortex-M Integration Guide |
nRF Connect SDK | nRF Connect SDK Integration Guide |
ESP32 ESP-IDF | ESP32 ESP-IDF Integration Guide |
ESP8266 | ESP8266 RTOS Integration Guide |
Dialog DA1469x | DA1469x Integration Guide |
NXP MCUXpresso RT1060 | NXP MCUXpresso SDK for i.MX RT Guide |
Defining Custom Trace Reasons
Aside from the program counter and return address, a trace event also contains a
user-defined "error reason". The list of custom reasons is defined in a separate
configuration file named memfault_trace_reason_user_config.def
which you need
to create.
To start, we recommend adding a "test" trace error reason you can easily trigger (i.e via a CLI command) and a couple for error paths in your codebase (such as peripheral bus read/write failures, transport errors and unexpected timeouts).
Here is what the memfault_trace_reason_user_config.def
file should look like:
// memfault_trace_reason_user_config.def
MEMFAULT_TRACE_REASON_DEFINE(test)
MEMFAULT_TRACE_REASON_DEFINE(your_custom_error_reason_1)
MEMFAULT_TRACE_REASON_DEFINE(your_custom_error_reason_2)
// ...
Adding Trace Events
Next, we'll need to use the MEMFAULT_TRACE_EVENT_*
macros to capture trace
events when errors occur.
Note that it is perfectly fine to use the same reason in different places if that makes sense in the context of your code. Because the program counter and return address are captured in the trace event, you will be able to see the 2 topmost frames (function name, source file and line) in Memfault's Issue UI and distinguish between the two.
For test purposes, you can add a CLI command that logs trace events using the different methods:
#include "memfault/components.h"
// [ ...]
void test_trace_event_cli_cmd(void) {
MEMFAULT_TRACE_EVENT(test);
// trace event with status code (i.e error code returned from stack)
MEMFAULT_TRACE_EVENT_WITH_STATUS(test, 0xdeadbeef);
// trace event with a log of arbitrary data
MEMFAULT_TRACE_EVENT_WITH_LOG(test, "A trace event with log %d", 1);
}
You can also start to add trace events for error paths:
#include "memfault/core/trace_event.h"
// [ ...]
void ble_le_process_ll_pkt(...) {
// ...
if (invalid_msg_id) {
MEMFAULT_TRACE_EVENT(bt_invalid_msg_id);
// ...
}
// ..
}
Recording Trace Events from Interrupts
It is also safe to use the MEMFAULT_TRACE_EVENT()
macro from an interrupt.
When called from an interrupt, the trace event info will just copy the collected
info to a temporary storage region in RAM to minimize interrupt latency. The
collected data can then be serialized to event storage by explicitly calling
memfault_trace_event_try_flush_isr_event()
. The data will also be flushed
automatically anytime MEMFAULT_TRACE_EVENT()
is called and the processor is
not in an interrupt.