Core Metrics & Device Vitals
Memfault supports a set of core metrics that apply to a wide range of devices. These metrics are either collected automatically by the on-device SDKs or can be enabled through built-in SDK facilities. The core metrics are also exempt from Custom Metric and Custom Attribute quotas.
Once data is being collected, Memfault's servers will automatically process these metrics to provide a set of key insights ("Device Vitals") across your fleet. Each of the Device Vitals insights can be visualized in special pre-configured dashboard cards, which can be added to dashboards.
Required SDK Version
The appropriate SDK version needs to be integrated before these metrics can be used. See the guides linked from each platform's section for getting Memfault up and running on your devices.
| Platform | Required SDK Version |
|---|---|
| Android | >= 4.13 |
| Linux | >= 1.9 |
| MCU | >= 1.5.0 |
A summary of the Memfault Core Metrics provided by the device SDKs:
| Category | Metric Keys | Automatically Collected on these SDKs |
|---|---|---|
| Battery Life | battery_soc_pct_drop, battery_discharge_duration_ms, battery_soc_pct | Android |
| Stable Hours | operational_hours, operational_crashfree_hours | Android, Linux, MCU |
| Periodic Connectivity (custom) | sync_successful, sync_failure | None |
| Periodic Connectivity (Memfault) | sync_memfault_successful, sync_memfault_failure | Android, Linux |
| Always-On Connectivity | connectivity_connected_time, connectivity_expected_time | Android |
Battery Life
For background and principles around these metrics, see the following docs:
- Documentation of best practices for battery life tracking
- Debugging Android Battery Life with Memfault
The Battery Life core metrics are:
battery_soc_pct
: The battery's State of Charge (SoC) at the end of the heartbeat interval, reported as a percentage from 0 to 100.
battery_soc_pct_drop
: The drop in the battery's State of Charge (SoC) during the heartbeat interval, reported as a percentage drop from 0 to 100.
battery_discharge_duration_ms
: The time spent discharging the battery during the heartbeat interval, in milliseconds.
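As a rough illustration of how these three metrics relate, the C sketch below sums battery_soc_pct_drop and battery_discharge_duration_ms over a series of heartbeats to get an average drop per discharging hour, then projects a naive battery life as 100% divided by that rate. The HeartbeatBattery type and both functions are hypothetical, and this simple projection is an assumption, not Memfault's server-side Expected Battery Life computation.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical per-heartbeat record holding the two Battery Life metrics
// needed for a discharge-rate estimate.
typedef struct {
  double soc_pct_drop;             // battery_soc_pct_drop
  uint64_t discharge_duration_ms;  // battery_discharge_duration_ms
} HeartbeatBattery;

// Average SoC drop per hour of discharging, summed over all heartbeats.
// Returns 0.0 when no discharging time was recorded.
double discharge_rate_pct_per_hour(const HeartbeatBattery *hb, size_t count) {
  double total_drop = 0.0;
  uint64_t total_ms = 0;
  for (size_t i = 0; i < count; i++) {
    total_drop += hb[i].soc_pct_drop;
    total_ms += hb[i].discharge_duration_ms;
  }
  if (total_ms == 0) {
    return 0.0;
  }
  // 3,600,000 ms per hour
  return total_drop / ((double)total_ms / 3600000.0);
}

// Naive expected battery life: hours to drain a full (100%) battery at the
// measured average rate. Returns -1.0 if the rate is unknown.
double expected_battery_life_hours(const HeartbeatBattery *hb, size_t count) {
  const double rate = discharge_rate_pct_per_hour(hb, count);
  if (rate <= 0.0) {
    return -1.0;
  }
  return 100.0 / rate;
}
```

For example, two one-hour discharging heartbeats that drop 2% and 3%, plus a charging heartbeat with no discharge time, give a rate of 2.5 %/h and a projected life of 40 hours. Summing before dividing weights each heartbeat by its discharge time, so charging intervals do not skew the result.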
SDK Collection
These metrics are standardized and supported across all Memfault SDKs, but collection varies by platform.
Android
Battery metrics are collected automatically on devices running the Memfault Android SDK.
Linux
To enable battery monitoring on Linux, either call the SDK whenever the battery status changes, or configure the SDK to poll the battery status periodically.
Event-based battery monitoring
You can notify the Memfault SDK of changes in the battery status by calling memfaultctl add-battery-reading.
Periodic battery monitoring
The memfaultd daemon can also poll the battery status periodically. See battery_monitor in memfaultd.conf for more details.
MCU
To enable collection in the Memfault Firmware SDK, set the following flag in memfault_platform_config.h:
```c
#define MEMFAULT_METRICS_BATTERY_ENABLE 1
```
Then call memfault_metrics_battery_boot() once the battery state-of-charge reading is available after boot:
```c
#include "memfault/metrics/battery.h"

// Called on system boot, once the battery state-of-charge reading is available
void memfault_platform_battery_boot(void) {
  memfault_metrics_battery_boot();
}
```
Once MEMFAULT_METRICS_BATTERY_ENABLE is enabled, the SDK automatically records the battery metrics at the end of each heartbeat interval.
You must also implement the platform function int memfault_platform_get_stateofcharge(sMfltPlatformBatterySoc *soc), which the SDK calls during metrics collection at the end of each heartbeat interval.
For rechargeable batteries, call memfault_metrics_battery_stopped_discharging() when the battery stops discharging (for example, when the charger is connected or the battery is removed).
A typical implementation of these platform dependencies looks like this, where battery_get_soc_percent() and battery_is_discharging() stand in for your own battery driver:
```c
#include <stdbool.h>
#include <stdint.h>

#include "memfault/core/debug_log.h"
#include "memfault/metrics/battery.h"

// Battery driver functions provided by your platform (placeholders)
int battery_get_soc_percent(uint8_t *percentage);
bool battery_is_discharging(void);

static bool s_battery_is_discharging = false;

// Called on system initialization to set the initial discharging state
void battery_set_initial_discharging_state(bool discharging) {
  s_battery_is_discharging = discharging;
}

// Called when the battery discharging state changes, i.e.:
// CHARGING → DISCHARGING or DISCHARGING → CHARGING
void battery_charge_state_changed_callback(bool discharging) {
  if (discharging != s_battery_is_discharging) {
    // Update the cached state of the battery
    s_battery_is_discharging = discharging;
    if (!discharging) {
      // Signal the Memfault SDK that the battery has stopped discharging
      memfault_metrics_battery_stopped_discharging();
    }
  }
}

// Called by the Memfault SDK at the end of each heartbeat interval to get
// the current battery state-of-charge and discharging state.
int memfault_platform_get_stateofcharge(sMfltPlatformBatterySoc *soc) {
  uint8_t percentage;

  // Get the battery level
  const int err = battery_get_soc_percent(&percentage);
  if (err) {
    MEMFAULT_LOG_ERROR("Failed to get battery level: %d", err);
    return -1;
  }

  // For rechargeable batteries, check whether the battery is discharging
  // (i.e. not charging)
  const bool discharging = battery_is_discharging();

  // Set the output data
  *soc = (sMfltPlatformBatterySoc){
    .soc = percentage,
    .discharging = discharging,
  };

  return 0;
}
```
Insights
Add an Expected Battery Life card to your dashboard to see these insights.
Stable Hours
For background and principles around these metrics, see the following docs:
- Measuring Fleet Reliability With Stable Hours
- Interrupt Article: Counting Crashes to Improve Device Reliability
The Stable Hours core metrics are:
operational_hours
: The number of hours the device has been operational since the last collection of this metric.
operational_crashfree_hours
: The number of hours the device has been operational without a crash since the last collection of this metric.
The goal of these metrics is to provide a straightforward aggregate metric, "% of operational hours without crashes", that can be used to track the reliability of a fleet of devices over time and compare reliability across firmware releases.
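The "% of operational hours without crashes" aggregate can be sketched in C as below, assuming per-device totals of the two Stable Hours metrics are already available. The StableHours type and stable_hours_pct() are hypothetical, and summing hours across devices (rather than averaging per-device percentages) is one reasonable choice, not necessarily how Memfault computes it.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical per-device totals of the two Stable Hours metrics.
typedef struct {
  uint32_t operational_hours;
  uint32_t operational_crashfree_hours;
} StableHours;

// "% of operational hours without crashes" across a group of devices:
// total crash-free hours divided by total operational hours.
// Returns -1.0 when the group has no operational hours yet.
double stable_hours_pct(const StableHours *devices, size_t count) {
  uint64_t operational = 0;
  uint64_t crashfree = 0;
  for (size_t i = 0; i < count; i++) {
    operational += devices[i].operational_hours;
    crashfree += devices[i].operational_crashfree_hours;
  }
  if (operational == 0) {
    return -1.0;
  }
  return 100.0 * (double)crashfree / (double)operational;
}
```

For two devices with 10 operational hours each, one fully crash-free and one with 5 crash-free hours, the group result is 75%. Weighting by hours means devices that run longer contribute proportionally more to the fleet figure.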
These metrics are collected automatically by all Memfault SDKs.
On MCU, a "crash" is counted when the device restarts unexpectedly. On Android and Linux, a "crash" is counted when a process crashes and a trace is captured by the Memfault SDK.
These metrics increment only for continuous hours of operation, so the Stable Hours measurement works best when devices regularly exceed one hour of continuous operation.
If your device operates on sub-hour intervals (i.e. no uptime counter is maintained between sessions, or the device restarts often), Memfault recommends using "crash-free sessions" instead of "stable hours" to measure reliability. Please contact us if you'd like to implement those metrics.
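To see why short sessions hurt this measurement, the sketch below models a device whose partial hour of uptime is lost on every restart. It is a simplified model inferred from the behavior described above, not the SDK's actual implementation; counted_operational_hours() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define MS_PER_HOUR 3600000ull

// Simplified model (an assumption, not the SDK source): each session is the
// uptime between two restarts, and only complete hours of continuous uptime
// are counted. The partial hour left over when the device restarts is lost.
uint64_t counted_operational_hours(const uint64_t *session_uptime_ms, size_t count) {
  uint64_t hours = 0;
  for (size_t i = 0; i < count; i++) {
    hours += session_uptime_ms[i] / MS_PER_HOUR;
  }
  return hours;
}
```

With sessions of 90, 45, and 45 minutes (3 hours of total uptime), this model counts only 1 operational hour, which is why sub-hour devices are better served by a session-based reliability metric.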
Low Power Modes and Operational Hours
In most cases, the time source the Memfault SDK relies on keeps advancing while a device is in a low-power mode, so operational hours continue to accumulate through it. This behavior is device- and platform-specific.
On Android, the operational_hours and operational_crashfree_hours metrics advance through device low-power modes and reset if the device restarts.
On Linux, the operational_hours and operational_crashfree_hours metrics are paused while the device is in a suspend mode and reset if the device restarts.
On MCU, the operational_hours and operational_crashfree_hours metrics reset on a device restart.
The Memfault SDK uses the platform-specific memfault_platform_get_time_since_boot_ms() implementation to compute the values for those metrics (this is implemented for you on FreeRTOS, Zephyr, and ESP-IDF). If the time returned by that function does not advance during low-power modes, the metrics will not advance either.
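On a bare-metal port, one way to keep these metrics advancing through sleep is to back memfault_platform_get_time_since_boot_ms() with a counter that keeps running in low-power modes. The sketch assumes a hypothetical 32-bit low-power timer clocked at 32768 Hz and read by lptim_read_ticks(), simulated here with a variable so it runs on a host. It extends the counter to 64 bits by detecting wraparound, so it must be called at least once per wrap period (about 36 hours at 32768 Hz), and it needs a critical section if it can be called concurrently from interrupts.

```c
#include <stdint.h>

// Assumption: low-power timer frequency of the hypothetical counter
#define LPTIM_FREQ_HZ 32768u

// Hypothetical hardware read, simulated so the sketch runs on a host.
// Replace with a read of a counter that keeps running in low-power modes.
static uint32_t s_sim_ticks;
static uint32_t lptim_read_ticks(void) { return s_sim_ticks; }

static uint64_t s_tick_base;   // ticks accumulated from completed wraps
static uint32_t s_last_ticks;  // counter value at the previous call

uint64_t memfault_platform_get_time_since_boot_ms(void) {
  const uint32_t now = lptim_read_ticks();
  if (now < s_last_ticks) {
    // The 32-bit counter wrapped since the previous call
    s_tick_base += (uint64_t)1 << 32;
  }
  s_last_ticks = now;
  const uint64_t total_ticks = s_tick_base + now;
  return (total_ticks * 1000u) / LPTIM_FREQ_HZ;
}
```

Because the counter keeps ticking while the CPU sleeps, time spent in low-power modes is included in the returned uptime, and the Stable Hours metrics advance through it.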
Insights
Add a Stable Hours card to your dashboard to see these insights.
Connectivity
Connectivity metrics are split into two categories:
- Periodic Connectivity: Metrics that track the success or failure of individual data sync sessions
- Always-On Connectivity: Metrics that track the uptime of the device's connectivity
Both variants of the connectivity metrics provide an aggregate analysis similar to Stable Hours, where Memfault can show "% of successful data syncs" or "% of connectivity uptime" for groups of devices.
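Both percentages reduce to simple ratios of the core metrics. The helpers below are a hypothetical sketch: they assume connectivity_connected_time and connectivity_expected_time use the same time unit, and they do not claim to reproduce Memfault's exact server-side aggregation.

```c
#include <stdint.h>

// Percentage of successful sync sessions from sync_successful and
// sync_failure counts (or sync_memfault_successful / sync_memfault_failure).
// Returns -1.0 when no sync attempts were recorded.
double sync_success_pct(uint64_t successful, uint64_t failed) {
  const uint64_t attempts = successful + failed;
  if (attempts == 0) {
    return -1.0;
  }
  return 100.0 * (double)successful / (double)attempts;
}

// Connectivity uptime percentage from connectivity_connected_time and
// connectivity_expected_time, both expressed in the same unit.
// Returns -1.0 when no connectivity was expected.
double connectivity_uptime_pct(uint64_t connected_time, uint64_t expected_time) {
  if (expected_time == 0) {
    return -1.0;
  }
  return 100.0 * (double)connected_time / (double)expected_time;
}
```

For example, 9 successful and 1 failed sync give 90%, and 1,800 s connected out of 3,600 s expected give 50% uptime.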