Coredump Collection
This tutorial covers integrating the coredump collection functionality of the Memfault Firmware SDK into your system.
With coredumps integrated, variable, register, and task state can all be collected at the time of a fault, assert, or unexpected error.
This guide assumes you have already completed the minimal integration of the Memfault SDK. If you have not, please complete the appropriate Getting Started guide.
Platform specific storage region for crash data
When a crash takes place, a snapshot of the state needs to be stored in a storage area that persists across a device reboot.
We typically recommend starting with the RAM-backed coredump port in memfault_platform_ram_backed_coredump.c.
By default this saves only the top of the stack at the time of the crash, but it lets you get coredumps up and running quickly and get a feel for how things work.
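The idea behind a RAM-backed port can be sketched as follows. This is a minimal illustration, not the SDK's port: the `demo_*` function names and the storage size are hypothetical. On a real target the buffer would be placed in a RAM section that the startup code neither zeroes nor initializes (for example a `.noinit` section) so that it survives a warm reset, which a host build cannot demonstrate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical storage size for this sketch.
#define DEMO_COREDUMP_STORAGE_SIZE 1024

// On a target this buffer would live in a no-init RAM section so that its
// contents survive a warm reset. Here it is an ordinary static array.
static uint8_t s_ram_storage[DEMO_COREDUMP_STORAGE_SIZE];

// Checks that [offset, offset + len) fits in the buffer without overflow.
static bool prv_in_bounds(uint32_t offset, size_t len) {
  return offset <= DEMO_COREDUMP_STORAGE_SIZE &&
         len <= DEMO_COREDUMP_STORAGE_SIZE - offset;
}

bool demo_ram_storage_write(uint32_t offset, const void *data, size_t len) {
  if (!prv_in_bounds(offset, len)) {
    return false;  // refuse writes that would run past the region
  }
  memcpy(&s_ram_storage[offset], data, len);
  return true;
}

bool demo_ram_storage_read(uint32_t offset, void *data, size_t len) {
  if (!prv_in_bounds(offset, len)) {
    return false;
  }
  memcpy(data, &s_ram_storage[offset], len);
  return true;
}

// "Erasing" RAM is just clearing it; there are no erase-block constraints.
bool demo_ram_storage_erase(uint32_t offset, size_t len) {
  if (!prv_in_bounds(offset, len)) {
    return false;
  }
  memset(&s_ram_storage[offset], 0, len);
  return true;
}
```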
Coredump data can also be stored to any other backing storage (eMMC, external NOR flash, internal flash, etc.). A number of ports for different MCUs are available in the ports directory of the memfault-firmware-sdk, or you can add your own port by implementing the required dependencies.
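When you write your own port for NOR flash, the main difference from RAM is that programming can only clear bits, while an erase resets whole sectors to 0xFF. The host-side simulation below shows the behavior such a port has to respect. The sector size, the `demo_*` names, and the in-memory "flash" are assumptions for illustration, not an SDK API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical flash geometry used only for this simulation.
#define DEMO_SECTOR_SIZE 256u
#define DEMO_NUM_SECTORS 4u
#define DEMO_FLASH_SIZE (DEMO_SECTOR_SIZE * DEMO_NUM_SECTORS)

// Starts zeroed here; real flash contents are unknown until erased.
static uint8_t s_flash[DEMO_FLASH_SIZE];

static bool prv_in_range(uint32_t offset, size_t len) {
  return offset <= DEMO_FLASH_SIZE && len <= DEMO_FLASH_SIZE - offset;
}

// Erase works only on whole sectors and sets every byte to 0xFF.
bool demo_flash_erase(uint32_t offset, size_t len) {
  if ((offset % DEMO_SECTOR_SIZE) != 0 || (len % DEMO_SECTOR_SIZE) != 0 ||
      !prv_in_range(offset, len)) {
    return false;
  }
  memset(&s_flash[offset], 0xFF, len);
  return true;
}

// Programming can only change bits from 1 to 0, modeled with a bitwise AND.
// Writing to a location that was not erased first corrupts the data, which
// is why a coredump port must erase before it writes.
bool demo_flash_write(uint32_t offset, const void *data, size_t len) {
  if (!prv_in_range(offset, len)) {
    return false;
  }
  const uint8_t *src = data;
  for (size_t i = 0; i < len; i++) {
    s_flash[offset + i] &= src[i];
  }
  return true;
}

bool demo_flash_read(uint32_t offset, void *data, size_t len) {
  if (!prv_in_range(offset, len)) {
    return false;
  }
  memcpy(data, &s_flash[offset], len);
  return true;
}
```

Writing 0x0F and then 0xF0 to the same erased byte leaves 0x00, not 0xF0, which is exactly the failure a missing erase produces.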
Rebooting after a coredump
At the very end of saving a coredump, the memfault_platform_reboot() function implemented as part of your initial port is called. From there, any final system cleanup can be performed before the system restarts.
Choosing what to store in a coredump
For the best debugging experience, store all of RAM to backing storage. However, this is not always possible or desirable. There may simply not be enough storage to preserve all of RAM, or there may be sensitive data on the device that should not be sent off the device. In this section you will learn how to choose what to save and, when you cannot or should not preserve all of RAM, how to group the objects you care about so they are easy to capture.
This tutorial assumes a gcc toolchain, but the concepts are similar with other toolchains.
Memfault coredumps are designed to be flexible, allowing you to capture only what you want or what fits into the coredump data storage. Memfault uses coredump regions to describe which regions of memory are gathered up and preserved when a fault exception or user trace occurs.
For supported RTOSs like FreeRTOS or Zephyr, Memfault automatically collects task information if you use the Memfault port. But what if you have a special block of data or set of structures in your application that you would like to capture as part of a coredump? What if you want to store all of RAM except a particular set of structures?
To achieve this control, we will use the linker to locate named sections within the memory map of your application. Using those section names and special compiler attributes, we can assign any object to a given section.
Once the objects are located in named sections, we will add those sections as regions to the coredump saving logic. This last part requires modifying an existing version of memfault_platform_coredump_get_regions() or, if you prefer, writing your own implementation to override the default.
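Because the coredump has to fit in your storage area, it can help to order the regions you want by priority and keep only the leading ones that fit. The helper below is a hypothetical sketch, not part of the SDK: `DemoRegion` only mirrors the start/size idea of a coredump region, and the overhead constants come from the header, footer, and block-header sizes in the Memfault Coredump Format table. Other blocks (registers, device information) are represented only by the caller-supplied `reserved` value, so treat the result as an estimate.

```c
#include <stddef.h>
#include <stdint.h>

// Sizes from the coredump format table: a 12-byte header, a 16-byte footer,
// and a 12-byte header (type, reserved, address, length) before every block.
#define DEMO_COREDUMP_HEADER_SIZE 12u
#define DEMO_COREDUMP_FOOTER_SIZE 16u
#define DEMO_BLOCK_HEADER_SIZE 12u

// Hypothetical region descriptor, loosely mirroring a coredump region.
typedef struct {
  const void *start;
  size_t size;
} DemoRegion;

// Given regions sorted from most to least important, returns how many of the
// leading regions fit into `storage_size` bytes alongside the fixed overhead.
// `reserved` is space already claimed by other blocks (e.g. registers).
size_t demo_regions_that_fit(const DemoRegion *regions, size_t num_regions,
                             size_t storage_size, size_t reserved) {
  size_t used = DEMO_COREDUMP_HEADER_SIZE + DEMO_COREDUMP_FOOTER_SIZE + reserved;
  if (used > storage_size) {
    return 0;
  }
  size_t count = 0;
  for (; count < num_regions; count++) {
    size_t cost = DEMO_BLOCK_HEADER_SIZE + regions[count].size;
    if (cost > storage_size - used) {
      break;  // stop at the first region that does not fit
    }
    used += cost;
  }
  return count;
}
```

Stopping at the first region that does not fit, rather than skipping it and trying smaller ones, keeps the priority order intact; a packing strategy is possible but makes it harder to predict what a given coredump contains.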
The Linker Script File
Locate the linker script file used to lay out the memory map for your application. Sometimes this file is generated from a master file, in which case you need to modify the master file so your changes are not lost after a clean rebuild.
The linker script file conventionally has an .ld extension, but that is not required. You may need to look for the script file in the link line of your application, e.g. gcc <object files> -Wl,-T<script-file>. The -T flag may be written out in long form as --script=<script-file>.
Within the linker script file there is a SECTIONS command. This is where the linker is told where to place zero-initialized data (.bss), initialized data (.data), code (.text), and read-only constants (.rodata). There are other regions, like the stack and heap, but we don't allocate objects to those at build time.
For simplicity, allocate your important objects in one file, for example an important_allocations.c source file. These objects must either be declared static or be defined at file scope (global). This example assumes you are not saving all of RAM as part of a coredump.
// important_allocations.c
[...]
// Example objects
struct SomeStruct g_global_object; // A .bss object
static uint32_t s_initial_value = 42; // A .data object
In the linker script, explicitly call out the resulting object file important_allocations.o, as shown below (the memfault_* lines).
Be sure to add these lines between the vendor-supplied __xxx_start and __xxx_end labels, and place them before the generic *(.data*) and *(.bss*) patterns: GNU ld assigns each input section to the first pattern in the script that matches it, so a generic wildcard listed first would claim the sections of important_allocations.o and leave the memfault region empty. If your compiler emits object files with an .obj suffix instead of .o, change the extension in the linker script or use a wildcard. The leading wildcard matches any path prefix the toolchain may add to the object file name.
/* linker-script.ld */
SECTIONS {
[...]
.data :
{
__data_start__ = .;
memfault_data_start = .;
*important_allocations.o(.data .data*)
memfault_data_end = .;
*(.data*)
__data_end__ = .;
} >RAM AT>FLASH
[...]
.bss (NOLOAD) :
{
__bss_start__ = .;
memfault_bss_start = .;
*important_allocations.o(.bss COMMON .bss*)
memfault_bss_end = .;
*(.bss*)
*(COMMON)
__bss_end__ = .;
} >RAM
[...]
}
This modification tells the linker to place every static and file-scope variable allocated in important_allocations.c between the respective symbols memfault_xxx_start and memfault_xxx_end.
Next, add these two regions to your memfault_platform_coredump_get_regions()
implementation with entries like this.
[...]
// The addresses of the labels are the values of the start and end addresses.
extern uint32_t memfault_data_start[];
extern uint32_t memfault_data_end[];
const size_t memfault_data_region_size = (uintptr_t)memfault_data_end -
(uintptr_t)memfault_data_start;
extern uint32_t memfault_bss_start[];
extern uint32_t memfault_bss_end[];
const size_t memfault_bss_region_size = (uintptr_t)memfault_bss_end -
(uintptr_t)memfault_bss_start;
s_coredump_regions[region_idx++] = MEMFAULT_COREDUMP_MEMORY_REGION_INIT(
memfault_data_start, memfault_data_region_size);
s_coredump_regions[region_idx++] = MEMFAULT_COREDUMP_MEMORY_REGION_INIT(
memfault_bss_start, memfault_bss_region_size);
[...]
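If the linker patterns are wrong, for example because a generic wildcard earlier in the output section claims the file's input sections, the start and end labels end up at the same address and the region is silently empty. A small guard like the hypothetical helper below, which skips empty or inverted ranges, makes that easier to notice. `DemoRegion` and `demo_add_region` are illustrative names, not SDK API, and the test uses an ordinary array in place of the linker labels because a host build has no linker script.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical region descriptor, loosely mirroring a coredump region.
typedef struct {
  void *start;
  size_t size;
} DemoRegion;

// Appends [start, end) to `regions` if it is non-empty and there is room.
// Returns the new number of entries. An empty range usually means the linker
// placed nothing between the start and end labels.
size_t demo_add_region(DemoRegion *regions, size_t count, size_t capacity,
                       void *start, void *end) {
  uintptr_t s = (uintptr_t)start;
  uintptr_t e = (uintptr_t)end;
  if (e <= s || count >= capacity) {
    return count;  // skip empty or inverted ranges, or a full table
  }
  regions[count] = (DemoRegion){.start = start, .size = (size_t)(e - s)};
  return count + 1;
}
```

In firmware you would pass memfault_data_start and memfault_data_end (and the bss pair) from the snippet above, and could log or assert when the returned count does not grow.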
With these changes in place you should now see these two objects as part of the Globals & Statics display of the Memfault Issues → Coredump page.
Saving Specific Variables in a Coredump
This scenario differs slightly from the previous example in that we wish to capture particular variables, possibly from many different source files, in the coredump. For this we need to place the variables in a named section so the linker can collect them into a single region of memory. Use compiler directives to place these important variables in the named section. Memfault provides a convenience macro, MEMFAULT_PUT_IN_SECTION(), to help with this.
In the default .data and .bss output sections of the linker script, collect the named sections as shown below (the memfault_* lines). The start and end labels are still needed to add the regions to the coredump region list.
/* linker-script.ld
*/
SECTIONS {
[...]
.data :
{
__data_start__ = .;
*(.data*)
memfault_data_start = .;
*(.memfault_data*)
memfault_data_end = .;
__data_end__ = .;
} >RAM AT>FLASH
[...]
.bss (NOLOAD) :
{
__bss_start__ = .;
*(.bss*)
*(COMMON)
memfault_bss_start = .;
*(.memfault_bss*)
memfault_bss_end = .;
__bss_end__ = .;
} >RAM
[...]
}
Like before, add these two regions to your memfault_platform_coredump_get_regions(). Because we have used the same label names, memfault_xxx_yyy, as in the previous example, the previous code snippet is valid for this example as well.
Now allocate some variables to these sections. For this example, place variables from two source files into the new sections.
// --- file1.c ---
MEMFAULT_PUT_IN_SECTION(".memfault_data")
int g_initial_limit = 100;
int foo(void) {
MEMFAULT_PUT_IN_SECTION(".memfault_bss")
static char s_large_buffer[1024];
[...]
}
// --- file2.c ---
MEMFAULT_PUT_IN_SECTION(".memfault_bss")
char g_name[128];
int bar(void) {
MEMFAULT_PUT_IN_SECTION(".memfault_data")
static char s_num_resources = 12;
[...]
}
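On GNU toolchains there is a related shortcut worth knowing about, sketched below as an alternative technique rather than the one used above. When an orphan section (one the linker script does not place explicitly) has a name that is a valid C identifier, with no leading dot, GNU ld and lld define `__start_<name>` and `__stop_<name>` symbols around it, so its bounds can be found without editing the linker script. `DEMO_PUT_IN_SECTION` is a hypothetical stand-in for `MEMFAULT_PUT_IN_SECTION()`, and the sketch assumes an ELF target built with GCC or Clang. On a bare-metal target, check the map file to see where the linker put the orphan section, since the startup code will not copy or zero it unless it lands inside the regions the startup code handles.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for MEMFAULT_PUT_IN_SECTION() on GCC/Clang.
#define DEMO_PUT_IN_SECTION(name) __attribute__((section(name)))

// The section name has no leading dot so that it is a valid C identifier,
// which is what lets the linker generate the __start_/__stop_ symbols.
DEMO_PUT_IN_SECTION("demo_capture") uint32_t g_demo_error_count = 3;
DEMO_PUT_IN_SECTION("demo_capture") char g_demo_last_error[16] = "none";

// Provided by the linker for the orphan section, not defined in any C file.
extern uint8_t __start_demo_capture[];
extern uint8_t __stop_demo_capture[];

void *demo_capture_region_start(void) {
  return __start_demo_capture;
}

size_t demo_capture_region_size(void) {
  return (size_t)(__stop_demo_capture - __start_demo_capture);
}
```

The start pointer and size returned here are the same two values a coredump region entry needs.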
Capturing Peripheral Registers
While debugging a crash, it may be useful to know the state of your peripherals. Peripheral registers can be captured in the same way as other memory regions: add the address and length of the register space you wish to capture to your memfault_platform_coredump_get_regions() implementation.
Below is an example of capturing the UART peripheral registers for the nRF52840.
The memory regions mapped to your MCU's peripherals may require word-aligned reads. The nRF52840 has such a limitation, which is why kMfltCoredumpRegionType_MemoryWordAccessOnly is used instead of kMfltCoredumpRegionType_Memory.
[...]
// UARTE is the nRF52840 UART peripheral
#define UARTE_PERIPHERAL_REGISTER_ADDRESS_START 0x40002000
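// The register block spans offsets 0x000-0x56F (the last register, CONFIG,
// is at 0x56C), so the size is 0x570: a whole number of 32-bit words, as
// word-only access requires.
#define UARTE_PERIPHERAL_REGISTER_ADDRESS_SIZE 0x570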
s_coredump_regions[region_idx++] =
(sMfltCoredumpRegion){.type = kMfltCoredumpRegionType_MemoryWordAccessOnly,
.region_start = (void *)UARTE_PERIPHERAL_REGISTER_ADDRESS_START,
.region_size = UARTE_PERIPHERAL_REGISTER_ADDRESS_SIZE};
[...]
These registers will now be present in your coredump, and you can inspect their value locally with GDB and a CMSIS SVD plugin to see decoded values.
Some peripheral registers change values when read. If changing the value of these registers would negatively affect the system during coredump collection, capturing their values should be avoided.
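The word-access restriction above comes down to how the bytes are read. The sketch below shows a word-only copy: it reads through a `volatile uint32_t` pointer one aligned word at a time and rounds the length up to a whole word. The function name is hypothetical and this is not the SDK's implementation; it also does nothing about registers with read side effects, which still have to be left out of the region list.

```c
#include <stddef.h>
#include <stdint.h>

// Copies `size` bytes from a peripheral block that only tolerates aligned
// 32-bit reads. `size` is rounded up to a multiple of 4, so `dst` must have
// room for the rounded length. Returns the number of bytes copied, or 0 if
// the source address is not word aligned.
size_t demo_copy_word_access_only(uint32_t *dst, uintptr_t src_addr,
                                  size_t size) {
  if ((src_addr % sizeof(uint32_t)) != 0) {
    return 0;
  }
  const volatile uint32_t *src = (const volatile uint32_t *)src_addr;
  size_t num_words = (size + sizeof(uint32_t) - 1) / sizeof(uint32_t);
  for (size_t i = 0; i < num_words; i++) {
    dst[i] = src[i];  // one 32-bit load per word, never byte accesses
  }
  return num_words * sizeof(uint32_t);
}
```

With this rounding, a requested size of 0x56F would be read as 0x570 bytes.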
Locating Sections at a Specific Location
Another benefit of working with linker scripts is the ability to place objects or files' data at specific addresses. Microcontrollers (MCUs) often have special memory regions with unique characteristics you may wish to take advantage of. It is also common to share regions of memory between separate applications on one MCU, like a bootloader and the production application. Lastly, you may need to persist information across reboots.
In these instances the linker script allows control over the placement and allocation of objects in the MCU's address space. The following example demonstrates how to locate DMA buffers in on-chip "fast" RAM and allocate a shared memory area between a bootloader and an application. The linker script below shows the section layout with the addition of memory regions.
This has implications for your startup code. For zero initialized variables you need to explicitly zero them before use, if desired, and for value initialized variables (not used here) you will need to copy the initial values into the variables before use. This is because the default C run-time will not be aware of your special sections.
/* linker-script.ld
*/
MEMORY
{
FLASH (x) : org = 0x00000000 , len = 128K
DMARAM (wx) : org = 0x10000000 , len = 4K /* fast RAM */
SHAREDRAM (wx) : org = 0x08000000 , len = 1K /* carve 1st 1K from RAM */
RAM (wx) : org = 0x08000000+1K, len = 16K-1K
}
SECTIONS {
[...]
.dma_buffers (NOLOAD) :
{ /* Starts at 0x10000000 */
dma_ram_start = .;
*(.dma_buffers*) /* also matches the exact name .dma_buffers */
dma_ram_end = .;
} >DMARAM
.shared (NOLOAD) :
{
shared_start = .;
*shared_allocations.o(.shared*)
shared_end = .;
} >SHAREDRAM
.data :
{
__data_start__ = .;
*(.data*)
__data_end__ = .;
} >RAM AT>FLASH
[...]
.bss (NOLOAD) :
{
__bss_start__ = .;
*(.bss*)
*(COMMON)
__bss_end__ = .;
} >RAM
[...]
}
By using MEMFAULT_PUT_IN_SECTION(), you can allocate DMA buffers and ensure they are in the correct memory range.
[...]
MEMFAULT_PUT_IN_SECTION(".dma_buffers")
static uint8_t s_uart_dma_buffer[NUM_UARTS][256];
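Because the C runtime does not know about `.dma_buffers`, nothing zeroes that section at startup. The sketch below shows the missing step, plus a check that a buffer really landed inside the DMA RAM window. In firmware the bounds would come from the `dma_ram_start` and `dma_ram_end` labels in the linker script above; the helper names are hypothetical, and the test uses an ordinary array in place of those labels.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Zeroes [start, end). Call this early in startup, before any code touches
// objects in the section, e.g. demo_zero_section(dma_ram_start, dma_ram_end)
// with the labels declared as: extern uint8_t dma_ram_start[], dma_ram_end[];
void demo_zero_section(uint8_t *start, uint8_t *end) {
  for (uint8_t *p = start; p < end; p++) {
    *p = 0;
  }
}

// Returns true if [buf, buf + len) lies entirely inside [start, end).
bool demo_buffer_in_section(const void *buf, size_t len,
                            const uint8_t *start, const uint8_t *end) {
  uintptr_t b = (uintptr_t)buf;
  uintptr_t s = (uintptr_t)start;
  uintptr_t e = (uintptr_t)end;
  return b >= s && b <= e && len <= e - b;
}
```

A check such as demo_buffer_in_section(s_uart_dma_buffer, sizeof(s_uart_dma_buffer), dma_ram_start, dma_ram_end) at boot catches a linker script change that silently moves the buffers out of the fast RAM.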
To share a structure between your bootloader and production application, the bootloader's linker script needs a matching entry for the SHAREDRAM memory region and the .shared section. After that, the two separate applications can exchange information through common data structures by linking a shared allocation file into both applications.
// --- shared_allocations.h ---
struct Version {
int major;
int minor;
int build;
char description[128];
};
// --- shared_allocations.c ---
#include "shared_allocations.h"
#include "memfault/core/compiler.h" // provides MEMFAULT_PUT_IN_SECTION()
MEMFAULT_PUT_IN_SECTION(".shared")
static struct Version s_bootloader_version;
MEMFAULT_PUT_IN_SECTION(".shared")
static struct Version s_application_version;
[...]
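This last example assumes that you initialize the values appropriately. Because the C runtime never initializes the .shared section, the values persist across warm MCU resets (for example, a software or watchdog reset), though not across a loss of power.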
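Because the `.shared` region is never initialized, its contents on first boot or after a power loss are arbitrary. One common way to handle this, sketched below under the assumption that both images are built from the same header, is to wrap the data in a small envelope with a magic value and a checksum, and to treat anything that fails validation as uninitialized. The magic value, the `SharedVersion` envelope, and the rotate-and-add checksum are all hypothetical; a CRC would catch more error patterns.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct Version {
  int major;
  int minor;
  int build;
  char description[128];
};

// Hypothetical envelope around the shared data.
#define DEMO_SHARED_MAGIC 0x53484152u  // arbitrary marker value

struct SharedVersion {
  uint32_t magic;
  uint32_t checksum;
  struct Version version;
};

// Rotate-and-add checksum over the payload bytes.
static uint32_t prv_checksum(const struct Version *v) {
  const uint8_t *bytes = (const uint8_t *)v;
  uint32_t sum = 0;
  for (size_t i = 0; i < sizeof(*v); i++) {
    sum = (sum << 1 | sum >> 31) + bytes[i];
  }
  return sum;
}

// Writes the payload and marks the slot as valid.
void demo_shared_store(struct SharedVersion *slot, const struct Version *v) {
  slot->version = *v;
  slot->checksum = prv_checksum(&slot->version);
  slot->magic = DEMO_SHARED_MAGIC;
}

// Copies the payload to `out` only if the slot passes validation.
bool demo_shared_load(const struct SharedVersion *slot, struct Version *out) {
  if (slot->magic != DEMO_SHARED_MAGIC ||
      slot->checksum != prv_checksum(&slot->version)) {
    return false;
  }
  *out = slot->version;
  return true;
}
```

In firmware, the `SharedVersion` objects would be the ones placed with MEMFAULT_PUT_IN_SECTION(".shared"), and the reader falls back to default values whenever demo_shared_load() returns false.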
Testing Coredump Platform Storage
The Memfault SDK provides functions for verifying that the coredump platform storage layer is correctly implemented, by performing various reads, writes, and erases using the platform storage API.
After including and configuring one of the provided coredump platform storage implementations from the Memfault SDK ports, or when building a custom implementation, the coredump storage test functions are a great way to validate the storage is working correctly.
To run the test, call these functions from your device's CLI, via a button press, or simply somewhere in your application (e.g. in main()):
#include "memfault/components.h"
// Runs a sanity test to confirm coredump port is working as expected
int test_coredump_storage(int argc, char *argv[]) {
// Note: Coredump saving runs from an ISR prior to reboot so should
// be safe to call with interrupts disabled.
your_platform_disable_interrupts();
memfault_coredump_storage_debug_test_begin();
your_platform_enable_interrupts();
memfault_coredump_storage_debug_test_finish();
return 0;
}
Memfault Coredump Format
Memfault's coredump format is detailed below. The format is not necessary to understand, but it's documented here to help debug issues with invalid coredump files.
| Section | Field | Size (bytes) | Description | Example (little endian) |
|---|---|---|---|---|
| Header | Magic | 4 | Hard-coded magic bytes in header | 0x45524F43 (CORE) |
| | Version | 4 | Version of the Memfault coredump | 0x00000002 (2) |
| | Coredump Size | 4 | Size of the coredump, including header and footer | 0x000008FC (2300) |
| Block 1 | Type | 1 | Type of the block | 0x01 (Memory Region) |
| | Reserved | 3 | Reserved for future use | All zeroes |
| | Address | 4 | For a memory region block, the starting address in RAM; all zeroes for every other block type | 0x80000000 |
| | Length | 4 | Length of the block data | 1280 |
| | Data | variable | Block data | 1280 bytes of memory region data |
| Block 2..N | ... | ... | ... | ... |
| Footer | Magic | 4 | Hard-coded magic bytes in footer | 0x504D5544 (DUMP) |
| | Flags | 4 | Miscellaneous flags of the coredump | All zeroes |
| | Reserved | 8 | Reserved for future use | All zeroes |
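For a coredump pulled off a device by hand, the header and footer fields in the table are enough for a quick sanity check. The sketch below is a hypothetical helper, not an SDK function or Memfault tool. It reads the little-endian fields byte by byte so it works regardless of the host's endianness or alignment, and it relies only on the sizes in the table (12-byte header, 16-byte footer).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HEADER_MAGIC 0x45524F43u  // "CORE" read as little endian
#define DEMO_FOOTER_MAGIC 0x504D5544u  // "DUMP" read as little endian
#define DEMO_HEADER_SIZE 12u
#define DEMO_FOOTER_SIZE 16u

static uint32_t prv_read_le32(const uint8_t *p) {
  return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 |
         (uint32_t)p[3] << 24;
}

// Returns true if `buf` starts with a coredump header whose size field fits
// in `len` and whose footer magic sits at the end of that size. On success,
// the header's version field is written to *version.
bool demo_check_coredump(const uint8_t *buf, size_t len, uint32_t *version) {
  if (len < DEMO_HEADER_SIZE + DEMO_FOOTER_SIZE ||
      prv_read_le32(buf) != DEMO_HEADER_MAGIC) {
    return false;
  }
  uint32_t total = prv_read_le32(buf + 8);  // Coredump Size field
  if (total < DEMO_HEADER_SIZE + DEMO_FOOTER_SIZE || total > len) {
    return false;
  }
  const uint8_t *footer = buf + total - DEMO_FOOTER_SIZE;
  if (prv_read_le32(footer) != DEMO_FOOTER_MAGIC) {
    return false;
  }
  *version = prv_read_le32(buf + 4);
  return true;
}
```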
Blocks can have the following types:
- Current registers
- Memory region
- Device serial
- Deprecated
- Hardware version
- Coredump reason
- Padding
- Machine type (e_machine from the ELF header)
- Deprecated
- Arm MPU configuration
- Software version
- Software type
- Build ID
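The blocks between the header and footer can be walked using only the Type and Length fields. The hypothetical sketch below counts memory-region blocks (type 0x01, as in the table's example). It assumes, as the table's example suggests, that Length gives the number of data bytes following the 12-byte block header and that blocks are packed back to back; it is not an SDK parser.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_HEADER_SIZE 12u
#define DEMO_FOOTER_SIZE 16u
#define DEMO_BLOCK_HEADER_SIZE 12u
#define DEMO_BLOCK_TYPE_MEMORY_REGION 0x01u

static uint32_t prv_read_le32(const uint8_t *p) {
  return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 |
         (uint32_t)p[3] << 24;
}

// Walks the blocks of a coredump whose total size (from the header) is
// `total`. Returns the number of memory-region blocks, or -1 if a block
// header or its data would run into the footer.
int demo_count_memory_regions(const uint8_t *buf, size_t total) {
  if (total < DEMO_HEADER_SIZE + DEMO_FOOTER_SIZE) {
    return -1;
  }
  size_t end = total - DEMO_FOOTER_SIZE;
  size_t offset = DEMO_HEADER_SIZE;
  int count = 0;
  while (offset < end) {
    if (end - offset < DEMO_BLOCK_HEADER_SIZE) {
      return -1;
    }
    uint8_t type = buf[offset];  // bytes 1..3 are reserved, 4..7 the address
    uint32_t length = prv_read_le32(buf + offset + 8);
    offset += DEMO_BLOCK_HEADER_SIZE;
    if (length > end - offset) {
      return -1;
    }
    if (type == DEMO_BLOCK_TYPE_MEMORY_REGION) {
      count++;
    }
    offset += length;
  }
  return count;
}
```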
Trace Event when Coredump Saving Fails
The Memfault Firmware SDK will fall back on recording a Trace Event when a coredump fails to save. Coredump saving can fail for a number of reasons, including:
- coredump storage is already occupied by a coredump (hasn't been uploaded to Memfault yet)
- failure during saving to storage (e.g. flash write failure)
Trace Events show only the current and previous frame of the stack (PC + LR on ARM architectures) at the time of the event.
You can see an example of the fallback Trace Event staging here:
Note that Trace Events are marked in the UI, so you can distinguish them from Coredump Traces: