Linux Reboot Reason Tracking
Introduction
There are many reasons a device may reboot in the field, such as a kernel panic, a user reset, or a firmware update.
Within the Memfault UI, reboot events are displayed for each device as well as summarized in the main "Overview" dashboard:

Reboots chart in the "Overview" dashboard of the Memfault Web App.
In this guide we will walk through how to use the reboot reason tracking feature from the Memfault Linux SDK to collect this data.
Keep meta-memfault-example open as a reference implementation. Your integration should look similar to it once you're done following the steps in this tutorial.
Prerequisites
The memfaultd daemon
Follow the integration guide to learn how to set this up for your device. A key function of memfaultd is to detect, classify and upload reboot events to the Memfault platform. This is enabled through the reboot feature.
Linux kernel pstore / ramoops configuration
The detection of kernel panics as a reboot reason depends on the so-called ramoops subsystem and the pstore filesystem. From the Linux kernel admin guide:
Ramoops is an oops/panic logger that writes its logs to RAM before the system crashes. It works by logging oopses and panics in a circular buffer. Ramoops needs a system with persistent RAM so that the content of that area can survive after a restart.
The pstore is a RAM-backed filesystem that persists across reboots.
The easiest way to enable the pstore in your kernel when using Yocto is to add the pstore kernel feature via the KERNEL_FEATURES bitbake variable. To get finer-grained control over how the pstore is configured, you can instead use its Kconfig options directly.
We also recommend adding the debug-panic-oops kernel feature, which makes the kernel panic when an "oops" is encountered:
KERNEL_FEATURES:append = " cgl/features/pstore/pstore.scc cfg/debug/misc/debug-panic-oops.scc"
In the meta-memfault-example QEMU integration, the KERNEL_FEATURES approach is taken, see these lines of code.
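Whichever approach you take, it can be useful to confirm that the resulting kernel configuration actually has pstore and ramoops enabled. The following is a minimal sketch, not part of the Memfault SDK: the helper name `check_pstore_kconfig` is hypothetical, and it assumes you have a plain-text kernel config file available (for example, extracted from `/proc/config.gz` on kernels built with `CONFIG_IKCONFIG_PROC`).

```shell
#!/bin/sh
# Sketch: report whether the pstore/ramoops Kconfig options are enabled.
# check_pstore_kconfig is a hypothetical helper, not an SDK command.
# Usage: check_pstore_kconfig <path-to-plain-text-kernel-config>
check_pstore_kconfig() {
    config_file="$1"
    missing=0
    for opt in CONFIG_PSTORE CONFIG_PSTORE_RAM; do
        # Accept both built-in (=y) and module (=m) settings.
        if grep -Eq "^${opt}=(y|m)$" "$config_file"; then
            echo "${opt}: enabled"
        else
            echo "${opt}: MISSING"
            missing=1
        fi
    done
    # Exit status 0 only if every option was found.
    return "$missing"
}
```

On a device that exposes its config, something like `zcat /proc/config.gz > /tmp/kconfig && check_pstore_kconfig /tmp/kconfig` would print one line per option and return non-zero if either is missing.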
Next, you will need to specify what region of RAM to use. There are several ways of doing this. The recommended way is to use a Device Tree binding. Please consult this section on ramoops parameters in the Linux kernel admin guide for more details.
Lastly, it's possible to configure the kernel to always dump the kmsg logs using printk.always_kmsg_dump. This is expected to be disabled (the default).
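To verify this setting on a running device, you can read the parameter back from sysfs. The sketch below assumes the kernel exposes it at `/sys/module/printk/parameters/always_kmsg_dump` and reports boolean parameters as `Y` or `N`; the function name `always_kmsg_dump_disabled` is hypothetical.

```shell
#!/bin/sh
# Assumed sysfs location of the printk.always_kmsg_dump parameter.
ALWAYS_KMSG_DUMP_PARAM="${ALWAYS_KMSG_DUMP_PARAM:-/sys/module/printk/parameters/always_kmsg_dump}"

# Returns 0 if always_kmsg_dump is disabled, 1 if enabled,
# 2 if the parameter file cannot be read.
always_kmsg_dump_disabled() {
    param_file="${1:-$ALWAYS_KMSG_DUMP_PARAM}"
    if [ ! -r "$param_file" ]; then
        echo "cannot read $param_file" >&2
        return 2
    fi
    value=$(cat "$param_file")
    case "$value" in
        N|0) return 0 ;;
        *)
            echo "always_kmsg_dump is enabled (value: $value)" >&2
            return 1
            ;;
    esac
}
```

Calling `always_kmsg_dump_disabled` without arguments checks the default path; passing a path as the first argument is mainly useful for testing the helper off-device.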
Note that in the meta-memfault-example QEMU integration, we are deviating from the recommendation of using a Device Tree binding. The ramoops.* kernel command line arguments are used instead. The reason for this is that QEMU typically auto-generates a Device Tree on-the-fly and extending it is more complicated.
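As an illustration of the command-line approach, the sketch below assembles a set of `ramoops.*` arguments. The parameter names `mem_address`, `mem_size` and `record_size` come from the ramoops module; the helper name `ramoops_cmdline` and the example values are hypothetical and are not the ones used in meta-memfault-example. The address and sizes must match a region of RAM that is actually reserved and preserved across reboots on your platform.

```shell
#!/bin/sh
# Sketch: build ramoops.* kernel command line arguments.
# Usage: ramoops_cmdline <mem_address> <mem_size> <record_size>
ramoops_cmdline() {
    mem_address="$1"   # physical start address of the reserved region
    mem_size="$2"      # total size of the region
    record_size="$3"   # size of each oops/panic record
    printf 'ramoops.mem_address=%s ramoops.mem_size=%s ramoops.record_size=%s\n' \
        "$mem_address" "$mem_size" "$record_size"
}
```

For example, `ramoops_cmdline 0x8000000 0x100000 0x20000` prints a string that could be appended to the kernel command line through your bootloader configuration or QEMU's `-append` option.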
Systemd configuration
The memfaultd daemon will take care of cleaning up /sys/fs/pstore after a reboot of the system.
Often, systemd-pstore.service is configured to carry out this task. Because this would conflict with memfaultd doing the same, systemd-pstore.service has to be disabled. This service is automatically excluded when including the meta-memfault layer.
Note that memfaultd does not provide functionality (yet) to archive pstore files (like systemd-pstore.service can). If this is necessary for you, the workaround is to create a service that performs the archiving and runs before memfaultd.service starts up.
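A minimal sketch of such an archiving step is shown below. It is not part of the Memfault SDK: the function name `archive_pstore` and the destination `/var/lib/pstore-archive` are hypothetical. The script copies the files rather than moving them, on the assumption that memfaultd still needs to find the originals in /sys/fs/pstore to detect a kernel panic and will clean them up itself afterwards.

```shell
#!/bin/sh
# Sketch: copy pstore files into a timestamped archive directory.
# Usage: archive_pstore [source-dir] [archive-root]
archive_pstore() {
    src_dir="${1:-/sys/fs/pstore}"
    dest_root="${2:-/var/lib/pstore-archive}"   # hypothetical location

    # Nothing to do if the source directory is absent or empty.
    [ -d "$src_dir" ] || return 0
    set -- "$src_dir"/*
    [ -e "$1" ] || return 0

    dest_dir="$dest_root/$(date -u +%Y%m%dT%H%M%SZ)"
    mkdir -p "$dest_dir" || return 1

    # Copy, do not move: the originals are left in place for memfaultd.
    cp -- "$@" "$dest_dir"/
}
```

To run this before memfaultd, one option is a oneshot systemd unit that calls the script and declares `Before=memfaultd.service`; the exact unit layout depends on your image.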
Detecting reboot due to Over The Air updates
The OTA agent you are using (for example swupdate) should inform the Memfault SDK before rebooting after installing an update.
In our meta-memfault-example this is accomplished by configuring swupdate to run the command memfaultctl reboot --reason 3 after installing the update.
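If your OTA agent runs a script rather than a single command after an update, the same call can be wrapped as sketched below. The wrapper name `reboot_after_update` and the `MEMFAULTCTL` override variable are hypothetical; only the `memfaultctl reboot --reason 3` invocation comes from the example integration. As far as we understand, the command both records the reason and initiates the reboot, so nothing placed after it should be relied upon to run; verify this against your SDK version.

```shell
#!/bin/sh
# Allow the binary to be overridden, e.g. for testing off-device.
MEMFAULTCTL="${MEMFAULTCTL:-memfaultctl}"

# Reason code passed by the example integration after an update.
REBOOT_REASON_FIRMWARE_UPDATE=3

# Hypothetical post-update hook: tell Memfault why we are rebooting.
reboot_after_update() {
    "$MEMFAULTCTL" reboot --reason "$REBOOT_REASON_FIRMWARE_UPDATE"
}
```

How the hook is registered depends on the OTA agent; consult its documentation for the post-update mechanism it offers.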
Reboot Reason Classification
Automatically detected reboot reasons
There are a few reboot reasons that memfaultd can determine with confidence for all embedded Linux devices out-of-the-box:
Reason | Description | Information Source |
---|---|---|
Kernel Panic | The Linux kernel crashed | pstore / ramoops |
User Reset | "Graceful" shutdown of the system | systemd state |
Providing a reboot reason to Memfault
There are many more types of reboots for which the detection can only be implemented in device-specific ways.
For example, an SoC may have special hardware to detect power brownouts. There is no "standard" way to read brownout detection state in Linux.
Another example: a device's software may contain logic to decide that the device needs to shut down or reset, e.g. because the battery has dropped below a certain point, or because the user pressed a button.
To track reboot reasons which the Memfault SDK cannot possibly know how to detect, the SDK provides a way to extend the reboot reason classification, via the last_reboot_reason_file and the memfaultctl reboot command.
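As a rough sketch of the file-based route, device-specific code (for example, a brownout detection handler) could record a reason code before the system goes down. The helper below is hypothetical and makes two assumptions you should verify against the SDK documentation: that the file named by last_reboot_reason_file holds the reason code as a decimal number, and that the path you pass matches the one in your memfaultd configuration. The write goes through a temporary file and a rename so that a sudden power loss is less likely to leave a partially written file.

```shell
#!/bin/sh
# Sketch: persist a reboot reason code for the next boot.
# Usage: write_reboot_reason <last_reboot_reason_file> <decimal-code>
write_reboot_reason() {
    reason_file="$1"
    reason_code="$2"

    # Accept only non-empty strings of decimal digits.
    case "$reason_code" in
        ''|*[!0-9]*)
            echo "invalid reason code: $reason_code" >&2
            return 1
            ;;
    esac

    # Write to a temporary file first, then rename it into place.
    tmp_file="${reason_file}.tmp"
    printf '%s\n' "$reason_code" > "$tmp_file" && mv -f "$tmp_file" "$reason_file"
}
```

For instance, `write_reboot_reason "$REASON_FILE" 3` would store the same code that the OTA example passes to memfaultctl reboot, where `REASON_FILE` stands for whatever path your configuration uses.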