MCU Reboot Reasons
There are many reasons a device may reboot in the field — whether it be due to a crash, a brownout, or a firmware update.
Within the Memfault UI, reboot events are displayed for each device and summarized in the main "Overview" dashboard.
In this guide we will walk through how to use the reboot tracking module from the memfault-firmware-sdk to collect this data.
This guide assumes you have already completed the minimal integration of the Memfault SDK. If you have not, please complete the appropriate Getting Started guide.
Overview
The reboot tracking module utilizes a noinit region of RAM to track information about reboots across resets.
- If an MCU fault takes place, the fault reason is tracked automatically.
- If the device reboots and the reason is not known, the device reset reason will be derived from the MCU reset reason register that was part of your initial port.
Ingestion of reboot events may be rate-limited.
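To make the mechanism concrete, here is a minimal, host-runnable sketch of the general technique: a record kept in RAM that startup code does not zero, guarded by a magic value so that power-on garbage can be told apart from data written before a reset. The struct layout, the magic value, and every `example_` name are hypothetical and do not reflect the SDK's internal format. On a real target the record would also need to be placed in a noinit linker section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical marker that distinguishes valid data from power-on garbage.
#define EXAMPLE_NOINIT_MAGIC 0x21544f42u

// Hypothetical record layout; this is NOT the SDK's internal format.
typedef struct {
  uint32_t magic;
  uint32_t last_reason;  // reason stored before the previous reset
  uint32_t reset_count;  // resets observed since the region was initialized
} sExampleNoinitRecord;

// Call early at boot. Returns true if the region still holds data written
// before the reset, or false if it had to be (re)initialized.
bool example_noinit_boot(sExampleNoinitRecord *rec) {
  assert(rec != NULL);
  if (rec->magic == EXAMPLE_NOINIT_MAGIC) {
    rec->reset_count++;
    return true;
  }
  rec->magic = EXAMPLE_NOINIT_MAGIC;
  rec->last_reason = 0;
  rec->reset_count = 0;
  return false;
}

// Record a reason so that it can be read back after the next reset.
void example_noinit_store_reason(sExampleNoinitRecord *rec, uint32_t reason) {
  assert(rec != NULL);
  rec->last_reason = reason;
}
```

Passing the record by pointer keeps the sketch testable on a host; in firmware the same functions would operate on a single statically allocated record.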
Platform-specific Reboots
Each platform has its own capabilities in terms of collecting information on a
reboot. To capture the reboot reason, your platform should implement
memfault_reboot_reason_get(). On boot, the MCU SDK uses this function to
determine a reboot reason from your platform's hardware. Then, it calls
memfault_reboot_tracking_boot() to write the reason to the noinit region of
RAM.
The MCU SDK implements memfault_reboot_reason_get() for several platforms,
including:
- Atmel SAML1x
- ESP32 w/ ESP-IDF
- ESP8266 w/ ESP8266 SDK
- nRF Connect SDK supported devices
- Infineon PSoC 6
- Dialog DA145xx/DA1468x/DA1469x
- Mynewt supported devices
- nRF5 SDK supported devices
- STM32CubeSDK supported devices including F4/F7/H7/L4/WB
- NXP S32 SDK supported devices
- Particle Device OS supported devices
- NXP RT 1021
- Silicon Labs Gecko SDK supported devices
If your platform is not included in the above list, you will need to implement
memfault_reboot_reason_get(). There are a few points to keep in mind when
implementing this function for your platform:
- The function should clear the reboot reason from the hardware before completing. This guarantees that the next boot will receive the next reboot reason.
- The function should try to classify reboot reasons as specifically as possible. Any reason marked as unknown (kMfltRebootReason_Unknown) or one of the error reasons will cause the reboot to be classified as an "unexpected reboot".
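The two points above can be sketched as follows. This is a standalone illustration, not a port for any real MCU: the register bit positions, the `eExampleReason` enum, the `sExampleBootupInfo` struct, and the `example_reboot_reason_get()` function are all hypothetical stand-ins so that the snippet compiles without the SDK. In a real port, the same logic would live in memfault_reboot_reason_get() and fill in the SDK's sResetBootupInfo using values from reboot_reason_types.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical reset-cause bits; real positions come from the MCU's
// reference manual.
#define EXAMPLE_RESET_CAUSE_POWER_ON (1u << 0)
#define EXAMPLE_RESET_CAUSE_BROWNOUT (1u << 1)
#define EXAMPLE_RESET_CAUSE_WATCHDOG (1u << 2)
#define EXAMPLE_RESET_CAUSE_SOFTWARE (1u << 3)
#define EXAMPLE_RESET_CAUSE_PIN      (1u << 4)

// Stand-in for the SDK's reboot reason enum (values are illustrative only).
typedef enum {
  kExampleReason_Unknown = 0,
  kExampleReason_PowerOnReset,
  kExampleReason_BrownOutReset,
  kExampleReason_HardwareWatchdog,
  kExampleReason_SoftwareReset,
  kExampleReason_PinReset,
} eExampleReason;

// Stand-in for the SDK's bootup info struct.
typedef struct {
  uint32_t reset_reason_reg;    // raw register value, kept for debugging
  eExampleReason reset_reason;  // most specific classification found
} sExampleBootupInfo;

void example_reboot_reason_get(volatile uint32_t *reset_cause_reg,
                               sExampleBootupInfo *info) {
  assert(reset_cause_reg != NULL && info != NULL);
  const uint32_t cause = *reset_cause_reg;

  // Clear the hardware state so the next boot only sees its own cause.
  // Assumption: writing zero clears the register. Many MCUs instead use
  // write-1-to-clear bits or a dedicated clear register.
  *reset_cause_reg = 0;

  // Several bits can be set at once (a brownout often also sets the
  // power-on bit), so check the most specific causes first.
  eExampleReason reason = kExampleReason_Unknown;
  if (cause & EXAMPLE_RESET_CAUSE_WATCHDOG) {
    reason = kExampleReason_HardwareWatchdog;
  } else if (cause & EXAMPLE_RESET_CAUSE_BROWNOUT) {
    reason = kExampleReason_BrownOutReset;
  } else if (cause & EXAMPLE_RESET_CAUSE_SOFTWARE) {
    reason = kExampleReason_SoftwareReset;
  } else if (cause & EXAMPLE_RESET_CAUSE_PIN) {
    reason = kExampleReason_PinReset;
  } else if (cause & EXAMPLE_RESET_CAUSE_POWER_ON) {
    reason = kExampleReason_PowerOnReset;
  }

  info->reset_reason_reg = cause;
  info->reset_reason = reason;
}
```

Keeping the raw register value alongside the classified reason makes it easier to diagnose reboots that still end up as unknown.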
Then, implement memfault_platform_reboot_tracking_boot() to call
memfault_reboot_reason_get():
#include "memfault/ports/reboot_reason.h"
// [...]
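// Backing storage for reboot tracking, placed in a noinit section so its
// contents survive a reset
MEMFAULT_PUT_IN_SECTION(".noinit.mflt_reboot_info")
static uint8_t s_reboot_tracking[MEMFAULT_REBOOT_TRACKING_REGION_SIZE];

void memfault_platform_reboot_tracking_boot(void) {
  sResetBootupInfo reset_info = { 0 };
  // Read (and clear) the reset reason from the platform's hardware
  memfault_reboot_reason_get(&reset_info);
  // Record the reason in the noinit region
  memfault_reboot_tracking_boot(s_reboot_tracking, &reset_info);
}

void memfault_platform_boot(void) {
  // ...
  memfault_platform_reboot_tracking_boot();
  // ...
}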
Unexpected Reboots
Unexpected reboots occur when the device restarts unintentionally. This is usually associated with a crash or other malfunction. Memfault categorizes these reboots as "unexpected", and that property is used to track fleet stability.
In the Memfault Firmware SDK, any reboot reason code greater than or equal to
kMfltRebootReason_UnknownError, or equal to kMfltRebootReason_Unknown, counts
as an unexpected reboot.
The other reboot reason codes are considered "expected" reboots, and do not count as a crash when Memfault generates the metrics for fleet stability.
Expected Reboots
Sometimes resets take place due to expected, software-initiated behavior (e.g., firmware updates or button resets).
In the Memfault Firmware SDK, any reboot reason code greater than or equal to
kMfltRebootReason_UserShutdown and less than kMfltRebootReason_UnknownError
counts as an expected reboot.
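The two classification rules can be written as a pair of predicates. The sketch below is standalone and uses stand-in constants rather than the SDK enum. The numeric boundary values are an assumption made for illustration; the authoritative values are in reboot_reason_types.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Stand-in boundary values (assumed for illustration; see
// reboot_reason_types.h for the real enum).
#define EXAMPLE_REASON_UNKNOWN       0x0000u
#define EXAMPLE_REASON_USER_SHUTDOWN 0x0001u
#define EXAMPLE_REASON_UNKNOWN_ERROR 0x8000u

// The range checks below rely on this ordering.
static_assert(EXAMPLE_REASON_UNKNOWN < EXAMPLE_REASON_USER_SHUTDOWN &&
                EXAMPLE_REASON_USER_SHUTDOWN < EXAMPLE_REASON_UNKNOWN_ERROR,
              "reason ranges must be ordered");

// Unexpected: the unknown reason, or anything in the error range.
bool example_reboot_is_unexpected(uint16_t reason) {
  return (reason == EXAMPLE_REASON_UNKNOWN) ||
         (reason >= EXAMPLE_REASON_UNKNOWN_ERROR);
}

// Expected: everything from the first expected code up to the error range.
bool example_reboot_is_expected(uint16_t reason) {
  return (reason >= EXAMPLE_REASON_USER_SHUTDOWN) &&
         (reason < EXAMPLE_REASON_UNKNOWN_ERROR);
}
```

Under these assumptions every code falls into exactly one of the two buckets.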
Marking Imminent Reboots
For both expected and unexpected reboots that are initiated or detectable by
software before the reboot, you can record when and where the reset takes
place with the memfault_reboot_tracking_mark_reset_imminent() API. For
example, suppose you want to track every expected reset caused by an
over-the-air update:
#include "memfault/components.h"
// [...]
void ota_finalize_and_reboot(void) {
  MEMFAULT_REBOOT_MARK_RESET_IMMINENT(kMfltRebootReason_FirmwareUpdate);
  memfault_platform_reboot(); // resets device
}
Or, if you detect a dynamic allocation failure, you can mark an imminent, unexpected reset due to an out-of-memory error:
#include "memfault/components.h"
// [...]
void alloc_failed_callback(void) {
  MEMFAULT_REBOOT_MARK_RESET_IMMINENT(kMfltRebootReason_OutOfMemory);
}
Another useful utility is the MEMFAULT_ASSERT_WITH_REASON() API, which marks
an imminent reboot reason and then triggers a reset via an assert. This API is
useful when you want to reset to get the system out of an unknown state and
note a specific reason for doing so. In the out-of-memory example above, you
will usually want to trigger a reset immediately after an allocation failure
is detected rather than wait for the code to fail later in an unrelated place.
To do that, mark the out-of-memory error and assert in one step:
#include "memfault/components.h"
// [...]
void alloc_failed_callback(void) {
  MEMFAULT_ASSERT_WITH_REASON(0, kMfltRebootReason_OutOfMemory);
}
On boot, memfault_reboot_tracking_boot() is always called. However, if a
reboot reason was marked with memfault_reboot_tracking_mark_reset_imminent()
before the reset, that reason takes precedence over the one that
memfault_reboot_reason_get() derives from hardware and passes to
memfault_reboot_tracking_boot().
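The precedence rule can be illustrated with a small standalone model. This is not the SDK's implementation: the `sExampleTracking` struct and the `example_` functions are hypothetical, and the choice to keep only the most recent mark is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical state that would live in the noinit region.
typedef struct {
  bool marked;             // true if software recorded a reason before reset
  uint32_t marked_reason;  // the reason software recorded
} sExampleTracking;

// Models marking an imminent reset. Assumption: the latest mark wins.
void example_mark_reset_imminent(sExampleTracking *t, uint32_t reason) {
  assert(t != NULL);
  t->marked = true;
  t->marked_reason = reason;
}

// Models the boot step: a software-marked reason takes precedence over
// the reason derived from hardware. The mark is consumed so that it
// applies to one reboot only.
uint32_t example_tracking_boot(sExampleTracking *t, uint32_t hw_reason) {
  assert(t != NULL);
  const uint32_t resolved = t->marked ? t->marked_reason : hw_reason;
  t->marked = false;
  return resolved;
}
```

In this model, a hardware-reported software reset that follows a marked firmware update resolves to the firmware update reason, which is the more specific of the two.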
Reboot Reason IDs
The full list of reboot reason values can be found in the source file
reboot_reason_types.h or in this reference.
User Defined Reboot Reasons
Requires SDK version >= 1.7.0
Beyond the fixed set of reboot reasons, you can also define custom reboot reasons tailored to your project.
To enable this feature, add the following define to your memfault_platform_config.h:
#define MEMFAULT_REBOOT_REASON_CUSTOM_ENABLE 1
Defining Custom Reboot Reasons
You can define at most 255 custom reboot reasons.
To define custom reboot reasons, you will need to provide a config file. The
name of this file defaults to memfault_reboot_reason_user_config.def, but you
can override it via MEMFAULT_REBOOT_REASON_USER_DEFS_FILE in your project's
config header. Below is an example of some custom reboot reasons:
// Defines an "expected" reboot with string `ExpectedReboot`
MEMFAULT_EXPECTED_REBOOT_REASON_DEFINE(ExpectedReboot)
// Defines an "unexpected" reboot with string `UnexpectedReboot`
MEMFAULT_UNEXPECTED_REBOOT_REASON_DEFINE(UnexpectedReboot)
Note the difference when defining expected vs. unexpected reboots outlined earlier. By properly sorting your custom reboot reasons into the unexpected and expected reboot buckets, you will enable accurate calculation of fleet stability metrics.
The above definitions create entries in the eMemfaultRebootReason enum. To
mark an imminent reboot reason without needing to call
MEMFAULT_REBOOT_REASON_KEY() to get the full enum name, the following
convenience macro can be used:
MEMFAULT_REBOOT_MARK_RESET_IMMINENT_CUSTOM(ExpectedReboot);
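For readers unfamiliar with .def files, the X-macro pattern is one common way a list of definitions like the one above can be expanded into enum entries and lookup tables. The sketch below is a generic illustration of that pattern and does not reproduce the SDK's actual macros. The list is written inline instead of being included from a file, and all `EXAMPLE_` and `kExampleCustomReason_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

// Stand-in for the contents of a .def file: one entry per custom reason,
// tagged as expected or unexpected.
#define EXAMPLE_CUSTOM_REASONS(EXPECTED, UNEXPECTED) \
  EXPECTED(ExpectedReboot)                           \
  UNEXPECTED(UnexpectedReboot)

// Expansion 1: one enum entry per definition.
#define EXAMPLE_ENUM_ENTRY(name) kExampleCustomReason_##name,
typedef enum {
  EXAMPLE_CUSTOM_REASONS(EXAMPLE_ENUM_ENTRY, EXAMPLE_ENUM_ENTRY)
  kExampleCustomReason_Count
} eExampleCustomReason;

// Expansion 2: the string name of each definition.
#define EXAMPLE_NAME_ENTRY(name) #name,
static const char *const s_example_names[] = {
  EXAMPLE_CUSTOM_REASONS(EXAMPLE_NAME_ENTRY, EXAMPLE_NAME_ENTRY)
};

// Expansion 3: whether each definition is in the expected bucket.
#define EXAMPLE_EXPECTED_ENTRY(name) true,
#define EXAMPLE_UNEXPECTED_ENTRY(name) false,
static const bool s_example_is_expected[] = {
  EXAMPLE_CUSTOM_REASONS(EXAMPLE_EXPECTED_ENTRY, EXAMPLE_UNEXPECTED_ENTRY)
};

const char *example_custom_reason_name(eExampleCustomReason reason) {
  assert((unsigned)reason < (unsigned)kExampleCustomReason_Count);
  return s_example_names[reason];
}

bool example_custom_reason_is_expected(eExampleCustomReason reason) {
  assert((unsigned)reason < (unsigned)kExampleCustomReason_Count);
  return s_example_is_expected[reason];
}
```

Because every table is generated from the same list, adding a reason in one place keeps the enum, its name, and its bucket in sync.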
Deferring Reboot Reason Serialization
Most systems will record the Reboot event on system startup.
The exception is devices that can only provide valid values for
memfault_get_device_info() after system initialization. On those devices,
delay the call to memfault_reboot_tracking_collect_reset_info() to prevent
invalid values from being inserted into the serialized reboot event.
For non-Zephyr applications, omit calling
memfault_reboot_tracking_collect_reset_info() on boot, and instead call it
once memfault_get_device_info() is valid.
On Zephyr, set the Kconfig option
CONFIG_MEMFAULT_RECORD_REBOOT_ON_SYSTEM_INIT=n to disable the built-in reboot
reason serialization. Later, once memfault_get_device_info() is valid, call
memfault_zephyr_collect_reset_info().
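One way to structure the deferred call is a small "run once when ready" gate, sketched below in standalone form. The `sExampleDeferredCollect` struct, the `example_` functions, and the counting stub are hypothetical. In a real integration the callback would invoke memfault_reboot_tracking_collect_reset_info() (or memfault_zephyr_collect_reset_info() on Zephyr) with the arguments defined in the SDK headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*ExampleCollectFn)(void);

// Hypothetical gate that ensures the reboot event is serialized exactly
// once, and only after device info is known to be valid.
typedef struct {
  bool collected;
  ExampleCollectFn collect;
} sExampleDeferredCollect;

// Call from the point where the serial number, versions, etc. become
// available. Repeated calls are harmless.
void example_on_device_info_ready(sExampleDeferredCollect *ctx) {
  assert(ctx != NULL && ctx->collect != NULL);
  if (ctx->collected) {
    return;
  }
  ctx->collect();
  ctx->collected = true;
}

// Counting stub standing in for the real SDK call, so the sketch can run
// on a host.
int g_example_collect_calls = 0;
void example_collect_stub(void) {
  g_example_collect_calls++;
}
```

The guard matters when device info can be refreshed more than once, for example after a late provisioning step.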
Built-in Metrics
The Reboot Tracking component provides a built-in metric,
MemfaultSdkMetric_UnexpectedRebootCount. At boot this metric is set to the
number of unexpected reboots since the last power-on reset (POR) or since the
count was last reset. Use this metric to find devices with bootlooping
behavior.
This counter is not cleared at boot. To clear it, call
memfault_reboot_tracking_reset_crash_count() at a point after boot when you
have determined that the device is not in a boot loop.
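A simple policy for deciding that a device is not in a boot loop is to wait for a period of stable uptime. The standalone sketch below models that policy. The `sExampleCrashCounter` struct, the `example_` functions, and the 300 second threshold are all hypothetical; in a real integration the clearing step would be a call to memfault_reboot_tracking_reset_crash_count().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical threshold: uptime after which the boot is treated as stable.
#define EXAMPLE_STABLE_UPTIME_SECS 300u

// Hypothetical model of a counter that persists across resets.
typedef struct {
  uint32_t unexpected_reboot_count;
} sExampleCrashCounter;

// Models boot: the count grows with each unexpected reboot and is NOT
// cleared here.
void example_crash_counter_boot(sExampleCrashCounter *c,
                                bool last_reboot_unexpected) {
  assert(c != NULL);
  if (last_reboot_unexpected) {
    c->unexpected_reboot_count++;
  }
}

// Call periodically. Clears the count once the device has stayed up long
// enough, and returns true if the count was cleared by this call.
bool example_crash_counter_poll(sExampleCrashCounter *c, uint32_t uptime_secs) {
  assert(c != NULL);
  if (uptime_secs < EXAMPLE_STABLE_UPTIME_SECS) {
    return false;
  }
  // Real integration: memfault_reboot_tracking_reset_crash_count()
  c->unexpected_reboot_count = 0;
  return true;
}
```

The right threshold depends on the product; it should be longer than the time a boot loop typically takes to crash again.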