Coredump Collection

This tutorial will cover integrating the coredump collection functionality of the Memfault Firmware SDK into your system.

With coredumps integrated, the state of variables, registers, and tasks can be collected at the moment a fault, assert, or unexpected error takes place.

Prerequisite

This guide assumes you have already completed the minimal integration of the Memfault SDK. If you have not, please complete the appropriate Getting Started guide.

Platform specific storage region for crash data

When a crash takes place, a snapshot of the state needs to be stored in a storage area that persists across a device reboot.

We typically recommend starting with the RAM-backed coredump port in memfault_platform_ram_backed_coredump.c.

By default this only saves the top of the stack at the time of the crash, but it lets you quickly get coredumps up and running and get a feel for how things work.
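
To illustrate why the storage area has to survive a reboot, here is a minimal sketch of a RAM-backed store that startup code leaves untouched. It is not the code from memfault_platform_ram_backed_coredump.c: the .noinit section name, the magic value, and every example_ identifier are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical marker written once a complete snapshot has been stored.
#define EXAMPLE_STORAGE_MAGIC 0x434F5245u
#define EXAMPLE_STORAGE_SIZE 512u

// On an Arm GCC build, place the buffer in a section that startup code does
// not zero or initialize (assumes the linker script provides a ".noinit"
// output section). On a host build this is an ordinary static buffer.
#if defined(__GNUC__) && defined(__arm__)
#define EXAMPLE_NOINIT __attribute__((section(".noinit")))
#else
#define EXAMPLE_NOINIT
#endif

typedef struct {
  uint32_t magic;   // EXAMPLE_STORAGE_MAGIC when data[] holds a snapshot
  uint32_t length;  // number of valid bytes in data[]
  uint8_t data[EXAMPLE_STORAGE_SIZE];
} sExampleCrashStorage;

EXAMPLE_NOINIT static sExampleCrashStorage s_example_storage;

// Called at crash time: copy the snapshot, then mark it valid.
bool example_storage_save(const void *snapshot, size_t length) {
  assert(snapshot != NULL);
  if (length > EXAMPLE_STORAGE_SIZE) {
    return false;  // does not fit, the caller has to capture less
  }
  memcpy(s_example_storage.data, snapshot, length);
  s_example_storage.length = (uint32_t)length;
  s_example_storage.magic = EXAMPLE_STORAGE_MAGIC;  // written last
  return true;
}

// Called after reboot: true only if an earlier save left a snapshot behind.
bool example_storage_has_snapshot(void) {
  return s_example_storage.magic == EXAMPLE_STORAGE_MAGIC &&
         s_example_storage.length <= EXAMPLE_STORAGE_SIZE;
}

// Called once the snapshot has been read out, freeing the slot.
void example_storage_clear(void) {
  s_example_storage.magic = 0;
}
```

Because uninitialized RAM holds arbitrary values after a power cycle, the magic value is what lets the next boot tell a real snapshot from noise.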

Coredump data can also be stored to any other backing storage (eMMC, external NOR flash, internal flash, etc.). A number of ports for different MCUs are available in the ports directory of the memfault-firmware-sdk, or you can add your own port by implementing the required dependencies.

Rebooting after a coredump

At the very end of saving a coredump, the memfault_platform_reboot() function implemented as part of your initial port will be called. From here any final system cleanup can be performed before restarting the system.

Choosing what to store in a coredump

For the best debugging experience, store all of RAM to backing storage. However, this is not always possible or desirable: there may not be enough storage to preserve all of RAM, or the device may hold sensitive data that should not leave it. This section explains how to choose what to save when you cannot or should not preserve all of RAM, and how to group otherwise unrelated variables so they are easy to capture.

note

This tutorial assumes a gcc toolchain but the concepts are similar with other toolchains.

Memfault coredumps are designed to be flexible, allowing you to capture only what you want or what fits into the coredump data storage. Memfault uses coredump regions to describe which regions of memory are gathered up and preserved when a fault exception or user trace occurs.

For supported RTOSs like FreeRTOS or Zephyr, Memfault automatically collects task information if you use the Memfault port. But what if you have a special block of data or set of structures in your application that you would like to capture as part of a coredump? What if you want to store all of RAM except a particular set of structures?

To achieve this control we will use the linker to locate named sections within the memory map of your application. Using those section names and special compiler attributes we can assign any object to a given section.

Once the objects are located in named sections we will add those sections as regions to the coredump saving logic. This last part will require modifying an existing version of memfault_platform_coredump_get_regions() or writing your own implementation to override the default if you prefer.

The Linker Script File

Locate the linker script file used to lay out the memory map for your application. Sometimes this file is generated from a master file, in which case you will need to modify the master file to avoid losing your changes after a clean rebuild.

The linker script file conventionally has an extension of .ld but that is not required. You may need to look for the script file callout in the link line of your application, e.g. gcc <object files> -Wl,-T<script-file>. The -T flag may be written out in long form as --script=<script-file>.

Within the linker script file there is a section called SECTIONS. This is where the linker is told where to place zero-initialized data (.bss), initialized data (.data), and code (.text) and read-only constants (.rodata). There are other sections like stack and heap but we don't allocate objects to those sections at build time.

For simplicity, allocate your important objects in one file, for example, in an important_allocations.c source file. These objects must either be declared static or be defined at file-scope (global). This example assumes you are not saving all of RAM as part of a coredump.

// important_allocations.c
[...]

// Example objects
struct SomeStruct g_global_object;    // A .bss object
static uint32_t s_initial_value = 42; // A .data object

In the linker script specifically call out the resultant object file important_allocations.o as shown below (highlighted).

note

Be sure to add the highlighted lines between the vendor supplied __xxx_start and __xxx_end labels. If your compiler emits .obj instead of .o suffixed object files then be sure to change the extension, or wildcard it, in the linker script. The leading wildcard symbol removes any path information that may be prepended by the toolchain.

/* linker-script.ld
*/

SECTIONS {
    [...]
    .data :
    {
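        __data_start__ = .;
        /* Listed before the generic *(.data*) pattern: GNU ld assigns a
           section to the first pattern in the script that matches it */
        memfault_data_start = .;
        *important_allocations.o(.data .data*)
        memfault_data_end = .;
        *(.data*)
        __data_end__ = .;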
    } >RAM AT>FLASH

    [...]

    .bss (NOLOAD) :
    {
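        __bss_start__ = .;
        /* Listed before the generic *(.bss*) and *(COMMON) patterns: GNU ld
           assigns a section to the first pattern that matches it */
        memfault_bss_start = .;
        *important_allocations.o(.bss COMMON .bss*)
        memfault_bss_end = .;
        *(.bss*)
        *(COMMON)
        __bss_end__ = .;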
    } >RAM

    [...]
}

This modification tells the linker to ensure that every static and file-scoped variable allocated in important_allocations.c is located between the respective symbols memfault_xxx_start and memfault_xxx_end.

Next, add these two regions to your memfault_platform_coredump_get_regions() implementation with entries like this.

  [...]

  // The addresses of the labels are the values of the start and end addresses.
  extern uint32_t memfault_data_start[];
  extern uint32_t memfault_data_end[];
  const size_t memfault_data_region_size = (uintptr_t)memfault_data_end -
      (uintptr_t)memfault_data_start;

  extern uint32_t memfault_bss_start[];
  extern uint32_t memfault_bss_end[];
  const size_t memfault_bss_region_size = (uintptr_t)memfault_bss_end -
      (uintptr_t)memfault_bss_start;

  s_coredump_regions[region_idx++] = MEMFAULT_COREDUMP_MEMORY_REGION_INIT(
     memfault_data_start, memfault_data_region_size);

  s_coredump_regions[region_idx++] = MEMFAULT_COREDUMP_MEMORY_REGION_INIT(
     memfault_bss_start, memfault_bss_region_size);

  [...]

With these changes in place you should now see these two objects as part of the Globals & Statics display of the Memfault Issues → Coredump page.
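
For context, the sketch below shows one way the surrounding function could be shaped: a static region array, a running index, and a count handed back to the caller. It is a host-buildable illustration under stated assumptions, not the SDK's implementation. The sExampleRegion type, the EXAMPLE_REGION_INIT macro, and the example_ labels are stand-ins for the SDK types and the linker-defined symbols, which are only available in a real firmware build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Stand-ins for the SDK region type and init macro so the sketch builds on
// its own. In a real port these come from the Memfault SDK headers.
typedef struct {
  const void *region_start;
  size_t region_size;
} sExampleRegion;

#define EXAMPLE_REGION_INIT(start, size) \
  (sExampleRegion){ .region_start = (start), .region_size = (size) }

// Stand-ins for the linker labels: a fake RAM array split into a "data"
// range and a "bss" range. On target these would be extern symbols.
static uint32_t s_example_ram[64];
static uint32_t *const example_data_start = &s_example_ram[0];
static uint32_t *const example_data_end = &s_example_ram[16];
static uint32_t *const example_bss_start = &s_example_ram[16];
static uint32_t *const example_bss_end = &s_example_ram[48];

#define EXAMPLE_MAX_REGIONS 4

const sExampleRegion *example_get_regions(size_t *num_regions) {
  static sExampleRegion s_regions[EXAMPLE_MAX_REGIONS];
  size_t region_idx = 0;
  assert(num_regions != NULL);

  // The region size is the distance between the two labels, in bytes.
  const size_t data_size =
      (uintptr_t)example_data_end - (uintptr_t)example_data_start;
  const size_t bss_size =
      (uintptr_t)example_bss_end - (uintptr_t)example_bss_start;

  // Skip empty ranges so that a misplaced label is easy to notice.
  if (data_size > 0) {
    s_regions[region_idx++] = EXAMPLE_REGION_INIT(example_data_start, data_size);
  }
  if (bss_size > 0) {
    s_regions[region_idx++] = EXAMPLE_REGION_INIT(example_bss_start, bss_size);
  }

  *num_regions = region_idx;
  return s_regions;
}
```

If a region reports a size of zero on target, the linker map file is a good place to check where the start and end labels actually landed.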

Saving Specific Variables in a Coredump

This scenario is slightly different from the previous example in that we wish to capture particular variables, possibly from many different source files, in the coredump. For this we need to place variables in a named section so they can be collected by the linker into a single region of memory. Use compiler directives to place these important variables in the named section. Memfault provides a convenience macro, MEMFAULT_PUT_IN_SECTION(), to help with this.

In the default .bss and .data sections in the linker script, collect the named sections as shown below (highlighted). We still need the start and end labels to add them to the coredump region list.

/* linker-script.ld
*/
SECTIONS {
    [...]
    .data :
    {
        __data_start__ = .;
        *(.data*)
        memfault_data_start = .;
        *(.memfault_data*)
        memfault_data_end = .;
        __data_end__ = .;
    } >RAM AT>FLASH

    [...]

    .bss (NOLOAD) :
    {
        __bss_start__ = .;
        *(.bss*)
        *(COMMON)
        memfault_bss_start = .;
        *(.memfault_bss*)
        memfault_bss_end = .;
        __bss_end__ = .;
    } >RAM

    [...]
}

As before, add these two regions to your memfault_platform_coredump_get_regions() implementation. Because the label names (memfault_data_start, memfault_data_end, memfault_bss_start, memfault_bss_end) are the same as in the previous example, the earlier code snippet applies unchanged.

Now allocate some variables to these sections. For this example, place variables from two source files into the new sections.

// --- file1.c ---
MEMFAULT_PUT_IN_SECTION(".memfault_data")
int g_initial_limit = 100;

int foo(void) {
  MEMFAULT_PUT_IN_SECTION(".memfault_bss")
  static char s_large_buffer[1024];
  [...]
}

// --- file2.c ---
MEMFAULT_PUT_IN_SECTION(".memfault_bss")
char g_name[128];

int bar(void) {
  MEMFAULT_PUT_IN_SECTION(".memfault_data")
  static char s_num_resources = 12;
  [...]
}
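
For readers curious how a placement macro of this kind can work, here is a minimal sketch built on the GCC/Clang section attribute. EXAMPLE_PUT_IN_SECTION and the .example_ section names are hypothetical and chosen so the snippet builds on its own; in a real project, use the SDK's MEMFAULT_PUT_IN_SECTION() as shown above.

```c
#include <assert.h>

// Hypothetical placement macro. The attribute is only applied on GCC/Clang
// ELF targets; elsewhere it expands to nothing so the code still builds.
#if defined(__GNUC__) && defined(__ELF__)
#define EXAMPLE_PUT_IN_SECTION(name) __attribute__((section(name)))
#else
#define EXAMPLE_PUT_IN_SECTION(name)
#endif

// Initialized global, emitted into the ".example_data" input section.
EXAMPLE_PUT_IN_SECTION(".example_data")
int g_example_limit = 100;

// Zero-initialized global, emitted into the ".example_bss" input section.
EXAMPLE_PUT_IN_SECTION(".example_bss")
char g_example_name[128];

int example_next_id(void) {
  // Function-scope statics can be placed in a named section as well.
  EXAMPLE_PUT_IN_SECTION(".example_data")
  static int s_next_id = 1;

  assert(s_next_id < g_example_limit);
  return s_next_id++;
}
```

The attribute only names the input section. It is still the linker script that decides where that section ends up, which is why the start and end labels are needed.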

Capturing Peripheral Registers

While debugging a crash it may be useful to know the state of your peripherals. Peripheral registers can be captured in a similar manner to other captured memory regions. Simply add the address and length of the register space you wish to capture to your memfault_platform_coredump_get_regions() implementation. Below is an example of capturing the UART peripheral registers for the nRF52840:

note

The memory regions mapped to your MCU's peripheral may require word aligned reads. The nRF52840 has such a limitation which is why kMfltCoredumpRegionType_MemoryWordAccessOnly is used over kMfltCoredumpRegionType_Memory.

[...]
// UARTE is the nRF52840 UART peripheral
#define UARTE_PERIPHERAL_REGISTER_ADDRESS_START 0x40002000
// Size in bytes, kept a multiple of 4 because the region is read one word at a time
#define UARTE_PERIPHERAL_REGISTER_ADDRESS_SIZE 0x570

s_coredump_regions[region_idx++] =
    (sMfltCoredumpRegion){.type = kMfltCoredumpRegionType_MemoryWordAccessOnly,
                        .region_start = (void *)UARTE_PERIPHERAL_REGISTER_ADDRESS_START,
                        .region_size = UARTE_PERIPHERAL_REGISTER_ADDRESS_SIZE};

[...]

These registers will now be present in your coredump, and you can inspect their value locally with GDB and a CMSIS SVD plugin to see decoded values.
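
To show what a word-access-only region implies, the following sketch copies a register block using nothing but aligned 32-bit reads. It is an illustration rather than the SDK's copy routine. The function name is hypothetical, and the sketch assumes the region starts on a word boundary and spans a whole number of words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Copies size_bytes from src to dst, touching src only with aligned 32-bit
// reads. A plain byte-wise memcpy() could fault or return garbage on
// peripherals that only support word accesses.
void example_copy_word_access_only(void *dst, const volatile uint32_t *src,
                                   size_t size_bytes) {
  assert(dst != NULL && src != NULL);
  assert(((uintptr_t)src % sizeof(uint32_t)) == 0);
  assert((size_bytes % sizeof(uint32_t)) == 0);

  uint8_t *out = (uint8_t *)dst;
  for (size_t i = 0; i < size_bytes / sizeof(uint32_t); i++) {
    const uint32_t word = src[i];  // one 32-bit read per register
    memcpy(&out[i * sizeof(word)], &word, sizeof(word));
  }
}
```

The destination is written with memcpy() because the backing storage has no alignment requirement of its own in this sketch; only the source side is restricted.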

note

Some peripheral registers change values when read. If changing the value of these registers would negatively affect the system during coredump collection, capturing their values should be avoided.

Locating Sections at a Specific Location

Another benefit of working with linker scripts is the ability to place objects or files' data at specific locations or addresses. Often microcontrollers (MCUs) have special memory regions with some unique characteristic you may wish to take advantage of. It is also common to share regions of memory between separate applications on one MCU, such as a bootloader and the production application. Lastly, you often need to persist information across reboots.

In these instances the linker script allows control over the placement and allocation of objects in the MCU's address space. The following example demonstrates how to locate DMA buffers in on-chip "fast" RAM and allocate a shared memory area between a bootloader and an application. The linker script below shows the section layout with the addition of memory regions.

note

This has implications for your startup code. For zero initialized variables you need to explicitly zero them before use, if desired, and for value initialized variables (not used here) you will need to copy the initial values into the variables before use. This is because the default C run-time will not be aware of your special sections.

/* linker-script.ld
*/

MEMORY
{
    FLASH     (x)  : org = 0x00000000   , len = 128K
    DMARAM    (wx) : org = 0x10000000   , len = 4K  /* fast RAM */
    SHAREDRAM (wx) : org = 0x08000000   , len = 1K  /* carve 1st 1K from RAM */
    RAM       (wx) : org = 0x08000000+1K, len = 16K-1K
}

SECTIONS {
    [...]

    .dma_buffers (NOLOAD) :
    {   /* Starts at 0x10000000  */
        dma_ram_start = .;
        *(.dma_buffers*)
        dma_ram_end = .;
    } >DMARAM

    .shared (NOLOAD) :
    {
        shared_start = .;
        *shared_allocations.o(.shared*)
        shared_end = .;
    } >SHAREDRAM

    .data :
    {
        __data_start__ = .;
        *(.data*)
        __data_end__ = .;
    } >RAM AT>FLASH

    [...]

    .bss (NOLOAD) :
    {
        __bss_start__ = .;
        *(.bss*)
        *(COMMON)
        __bss_end__ = .;
    } >RAM

    [...]
}

By using the MEMFAULT_PUT_IN_SECTION() macro you can allocate DMA buffers and ensure they are placed in the correct memory range.

[...]
MEMFAULT_PUT_IN_SECTION(".dma_buffers")
static uint8_t s_uart_dma_buffer[NUM_UARTS][256];

To share a structure between your bootloader and production application, the bootloader's linker script needs a matching entry for the SHAREDRAM memory region and the .shared section. The two applications can then exchange information through common data structures by linking the same shared allocation file into both.

// --- shared_allocations.h ---
struct Version {
   int major;
   int minor;
   int build;
   char description[128];
};

// --- shared_allocations.c ---
#include "shared_allocations.h"

MEMFAULT_PUT_IN_SECTION(".shared")
static struct Version s_bootloader_version;

MEMFAULT_PUT_IN_SECTION(".shared")
static struct Version s_application_version;

[...]

note

This last example assumes that you will initialize the values appropriately. The values will persist across MCU resets.

Testing Coredump Platform Storage

The Memfault SDK provides functions for verifying the coredump platform storage layer is correctly implemented, by performing various reads, writes, and erases using the platform storage API.

After including and configuring one of the provided coredump platform storage implementations from the Memfault SDK ports, or when building a custom implementation, the coredump storage test functions are a great way to validate the storage is working correctly.

To run the test, call these functions from your device's CLI, from a button press handler, or simply somewhere in your application (e.g. the main function):

#include "memfault/components.h"

// Runs a sanity test to confirm coredump port is working as expected
int test_coredump_storage(int argc, char *argv[]) {

  // Note: Coredump saving runs from an ISR prior to reboot so should
  // be safe to call with interrupts disabled.
  your_platform_disable_interrupts();
  memfault_coredump_storage_debug_test_begin();
  your_platform_enable_interrupts();

  memfault_coredump_storage_debug_test_finish();
  return 0;
}
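
To give a feel for what a storage check of this kind exercises, here is a self-contained sketch of a write, read-back, and erase test against a RAM-backed stand-in. It does not reproduce the SDK's test. The sExampleStorage interface and all example_ and prv_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical storage interface, standing in for a platform storage port.
typedef struct {
  size_t size;
  bool (*read)(size_t offset, void *data, size_t len);
  bool (*write)(size_t offset, const void *data, size_t len);
  bool (*erase)(size_t offset, size_t len);
} sExampleStorage;

static uint8_t s_example_backing[256];

static bool prv_in_bounds(size_t offset, size_t len) {
  return offset <= sizeof(s_example_backing) &&
         len <= sizeof(s_example_backing) - offset;
}

static bool prv_read(size_t offset, void *data, size_t len) {
  if (!prv_in_bounds(offset, len)) { return false; }
  memcpy(data, &s_example_backing[offset], len);
  return true;
}

static bool prv_write(size_t offset, const void *data, size_t len) {
  if (!prv_in_bounds(offset, len)) { return false; }
  memcpy(&s_example_backing[offset], data, len);
  return true;
}

static bool prv_erase(size_t offset, size_t len) {
  if (!prv_in_bounds(offset, len)) { return false; }
  memset(&s_example_backing[offset], 0xFF, len);  // mimics erased flash
  return true;
}

const sExampleStorage g_example_ram_storage = {
  .size = sizeof(s_example_backing),
  .read = prv_read,
  .write = prv_write,
  .erase = prv_erase,
};

// Writes a pattern at the first and last usable offsets, reads it back,
// erases everything, and checks that an out-of-range write is rejected.
bool example_storage_self_test(const sExampleStorage *storage) {
  assert(storage != NULL && storage->size >= 8);
  const uint8_t pattern[8] = {0xA5, 0x5A, 0x01, 0x02, 0x03, 0x04, 0xC3, 0x3C};
  uint8_t readback[sizeof(pattern)];
  const size_t offsets[2] = {0, storage->size - sizeof(pattern)};

  for (size_t i = 0; i < 2; i++) {
    if (!storage->write(offsets[i], pattern, sizeof(pattern))) { return false; }
    if (!storage->read(offsets[i], readback, sizeof(readback))) { return false; }
    if (memcmp(readback, pattern, sizeof(pattern)) != 0) { return false; }
  }

  if (!storage->erase(0, storage->size)) { return false; }
  if (!storage->read(0, readback, sizeof(readback))) { return false; }
  if (memcmp(readback, pattern, sizeof(pattern)) == 0) { return false; }

  // Writing past the end of storage must fail rather than wrap or corrupt.
  return !storage->write(storage->size, pattern, sizeof(pattern));
}
```

Testing the last usable offset as well as the first tends to catch off-by-one errors in sector or size calculations.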

Memfault Coredump Format

Memfault's coredump format is detailed below. The format is not necessary to understand, but it's documented here to help debug issues with invalid coredump files.

Section	Field	Size (bytes)	Description	Example (Little endian)
Header	Magic	4	Hard-coded magic bytes in header	`0x45524F43` (CORE)
	Version	4	Version of Memfault coredump	`0x00000002` (2)
	Coredump Size	4	The size of the coredump including header and footer	`0x000008FC` (2300)
Block 1	Type	1	Signifies the type of the block	`0x01` (Memory Region)
	Reserved	3	Reserved for future use	All zeroes
	Address	4	Starting address in RAM for a memory region block; all zeroes for every other block type	`0x80000000`
	Length	4	Length of block	1280
	Data	variable	Block data	1280 bytes of memory region data
Block 2..N	...	...	...	...
Footer	Magic	4	Hard-coded magic bytes in footer	`0x504D5544` (DUMP)
	Flags	4	Miscellaneous flags of the coredump	All zeroes
	Reserved	8	Reserved for future use	All zeroes
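
When debugging an invalid coredump file, a quick first check is whether the header is intact. The sketch below decodes the three header fields from the table above and estimates the total size from a list of block lengths. The names are hypothetical, and the size estimate assumes blocks are stored back to back with no extra padding, which the table does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_COREDUMP_MAGIC 0x45524F43u  // bytes 'C','O','R','E' as LE u32
#define EXAMPLE_HEADER_SIZE 12u             // magic + version + coredump size
#define EXAMPLE_BLOCK_HEADER_SIZE 12u       // type + reserved + address + length
#define EXAMPLE_FOOTER_SIZE 16u             // magic + flags + reserved

typedef struct {
  uint32_t magic;
  uint32_t version;
  uint32_t total_size;
} sExampleCoredumpHeader;

// Reads a 32-bit little-endian value regardless of host endianness.
static uint32_t prv_read_le32(const uint8_t *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Decodes the header and returns true if it looks plausible.
bool example_parse_header(const uint8_t *buf, size_t len,
                          sExampleCoredumpHeader *out) {
  assert(buf != NULL && out != NULL);
  if (len < EXAMPLE_HEADER_SIZE) {
    return false;
  }
  out->magic = prv_read_le32(&buf[0]);
  out->version = prv_read_le32(&buf[4]);
  out->total_size = prv_read_le32(&buf[8]);
  return out->magic == EXAMPLE_COREDUMP_MAGIC &&
         out->total_size >= EXAMPLE_HEADER_SIZE + EXAMPLE_FOOTER_SIZE;
}

// Estimates the coredump size for blocks with the given data lengths.
size_t example_total_size(const size_t *block_lengths, size_t num_blocks) {
  assert(block_lengths != NULL || num_blocks == 0);
  size_t total = EXAMPLE_HEADER_SIZE + EXAMPLE_FOOTER_SIZE;
  for (size_t i = 0; i < num_blocks; i++) {
    total += EXAMPLE_BLOCK_HEADER_SIZE + block_lengths[i];
  }
  return total;
}
```

Under these assumptions a coredump with a single 1280-byte memory region would occupy 12 + 12 + 1280 + 16 = 1320 bytes, which is a handy way to judge whether a set of regions fits in your storage.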

Blocks can have the following types (the number is the value of the Type field):

0. Current registers
1. Memory region
2. Device serial
3. Deprecated
4. Hardware version
5. Coredump reason
6. Padding
7. Machine type (e_machine from the ELF header)
8. Deprecated
9. Arm MPU configuration
10. Software version
11. Software type
12. Build ID

Trace Event when Coredump Saving Fails

The Memfault Firmware SDK will fall back on recording a Trace Event when a coredump fails to save. Coredump saving can fail for a number of reasons, including:

coredump storage is already occupied by a coredump that has not been uploaded to Memfault yet
failure while saving to storage (e.g. a flash write failure)

Trace Events show only the current and previous frame of the stack (PC + LR on ARM architectures) at the time of the event.

You can see an example of the fallback Trace Event staging in memfault_fault_handling_arm.c in the Memfault Firmware SDK on GitHub.

Note that Trace Events are marked in the UI, so you can distinguish them from Coredump Traces.
