Skip to main content

Error Tracking with Trace Events

We recommend capturing a full coredump trace in case the system encounters a fatal problem, like a hard fault or a failed assertion. However, in some cases it may not be desirable or possible to do so. For example, if stopping & rebooting the system is not an option, or if the error is recoverable but you would still like to understand how often it happens.

The trace event module within the SDK makes it easy to track errors in a way that requires less storage than full coredump traces and also allows the system to keep running after capturing the event. Only the program counter, return address and a custom "reason" are saved. Once uploaded to Memfault, each trace event will be associated with an Issue just like a coredump.

Here's an example where Trace Events are captured for Bluetooth protocol CRC errors and invalid message IDs:

Prerequisite

This guide assumes you have already completed the minimal integration of the Memfault SDK. If you have not, check out the appropriate guide in the table below.

MCU ArchitectureGetting Started Guide
ARM Cortex-MARM Cortex-M Integration Guide
nRF Connect SDKnRF Connect SDK Integration Guide
ESP32 ESP-IDFESP32 ESP-IDF Integration Guide
ESP8266ESP8266 RTOS Integration Guide

Defining Custom Trace Reasons#

Aside from the program counter and return address, a trace event also contains a user-defined "error reason". The list of custom reasons is defined in a separate configuration file named memfault_trace_reason_user_config.def which you need to create.

To start, we recommend adding a "test" trace error reason you can easily trigger (i.e via a CLI command) and a couple for error paths in your codebase (such as peripheral bus read/write failures, transport errors and unexpected timeouts).

Here is what the memfault_trace_reason_user_config.def file should look like:

// memfault_trace_reason_user_config.def
MEMFAULT_TRACE_REASON_DEFINE(test)
MEMFAULT_TRACE_REASON_DEFINE(your_custom_error_reason_1)
MEMFAULT_TRACE_REASON_DEFINE(your_custom_error_reason_2)
// ...

Adding Trace Events#

Next, we'll need to use the MEMFAULT_TRACE_EVENT_* macros to capture trace events when errors occur.

Note that it is perfectly fine to use the same reason in different places if that makes sense in the context of your code. Because the program counter and return address are captured in the trace event, you will be able to see the 2 topmost frames (function name, source file and line) in Memfault's Issue UI and distinguish between the two.

For test purposes, you can add a CLI command that logs trace events using the different methods:

#include "memfault/components.h"
// [ ...]
void test_trace_event_cli_cmd(void) {
MEMFAULT_TRACE_EVENT(test);
// trace event with status code (i.e error code returned from stack)
MEMFAULT_TRACE_EVENT_WITH_STATUS(test, 0xdeadbeef);
// trace event with a log of arbitrary data
MEMFAULT_TRACE_EVENT_WITH_LOG(test, "A trace event with log %d", 1);
}

You can also start to add trace events for error paths:

#include "memfault/core/trace_event.h"
// [ ...]
void ble_le_process_ll_pkt(...) {
// ...
if (invalid_msg_id) {
MEMFAULT_TRACE_EVENT(bt_invalid_msg_id);
// ...
}
// ..
}

Recording Trace Events from Interrupts#

It is also safe to use the MEMFAULT_TRACE_EVENT() macro from an interrupt. When called from an interrupt, the trace event info will just copy the collected info to a temporary storage region in RAM to minimize interrupt latency. The collected data can then be serialized to event storage by explicitly calling memfault_trace_event_try_flush_isr_event(). The data will also be flushed automatically anytime MEMFAULT_TRACE_EVENT() is called and the processor is not in an interrupt.