Collecting Device Metrics

There are many system health vitals that are useful to track aside from crashes and reboots. The options are numerous but you can expand the toggle to get a few examples.
  • RTOS related statistics
    • Amount of time spent in each RTOS task per unit time. This can help you understand if one task is starving the system
    • Heap high water marks
    • Stack high water marks
  • Time MCU was in different states
    • Stop, Sleep, Run Mode
    • Time each peripherals were active
  • Battery life drop per unit time.
  • Transport specific metrics (LTE, WiFI, BLE, LoRa, etc)
    • Amount of time transport was connected
    • Amount of connection attempts
  • Number of bytes being over transport per unit time.

In the Memfault UI, you can configure Alerts based on these metrics as well as explore metrics collected for any device.

Here is an example where the time bluetooth was connected, the amount of bytes sent and the battery life were tracked. In Memfault's UI, the data that gets collected from each device over time, is visualized in customizable graphs:

Metrics

The Memfault SDK includes a "metrics" component that makes it easy to collect this type on information on an embedded device. In the sections below we will walk through how get started with the component.

Prerequisite

This guide assumes you have already completed the minimal integration of the Memfault SDK. If you have not, check out the appropriate guide in the table below.

MCU ArchitectureGetting Started Guide
ARM Cortex-MARM Cortex-M Integration Guide
nRF Connect SDKnRF Connect SDK Integration Guide
ESP32 ESP-IDFESP32 ESP-IDF Integration Guide
ESP8266ESP8266 RTOS Integration Guide

Defining Custom Metrics#

All custom metrics can be defined with the MEMFAULT_METRICS_KEY_DEFINE macro in the memfault_metrics_heartbeat_config.def created as part of your port. In this guide we will walk through a simple example of tracking the high water mark of the stack for a "Main Task" in our application and the number of bytes sent out over a bluetooth connection.

// File $PROJECT_ROOT/third_party/memfault/memfault_metrics_heartbeat_config.def
MEMFAULT_METRICS_KEY_DEFINE(MainTaskStackHwm, kMemfaultMetricType_Unsigned)
MEMFAULT_METRICS_KEY_DEFINE(BtBytesSent, kMemfaultMetricType_Unsigned)

Dependency Function Overview#

The metrics subsystem uses the "timer" implemented as part of your initial port to control when data is aggregated into a "heartbeat". When the heartbeat subsystem is booted, a dependency function memfault_platform_metrics_timer_boot is called to set up this timer. Most RTOSs have a software timer implementation that can be directly mapped to the API or a hardware timer can be used as well. The expectation is that callback will be invoked every period_sec (which by default is once / hour).

The metrics subsystem supports a timer type (kMemfaultMetricType_Timer) which can be used to easily track durations (i.e. time spent in MCU stop mode) as well as overall system uptime. To support this, the memfault_platform_get_time_since_boot_ms() function implemented as part of the initial port is used. Typically this information is derived from either a system's Real Time Clock (RTC) or the SysTick counter used by an RTOS.

Setting Metric Values#

There's a set of APIs in components/include/memfault/metrics/metrics.h which can be used to easily update heartbeats as events take place. The updates occur in RAM so there is negligible overhead introduced. Here's an example:

#include "memfault/metrics/metrics.h"
// [ ... ]
void bluetooth_driver_send_bytes(const void *data, size_t data_len) {
memfault_metrics_heartbeat_add(MEMFAULT_METRICS_KEY(BtBytesSent), data_len);
// [ ... code to send bluetooth data ... ]
}

Including Sampled Values in a Heartbeat#

memfault_metrics_heartbeat_collect_data() is called at the very end of each heartbeat interval.

By default this is a weak empty function but you will want to implement it if there's data you want to sample and include in a heartbeat (i.e recorded RSSI, battery level, stack high water marks, etc).

The MainTaskStackHwm we are tracking in this guide is a good example for how to make use of this function.

#include "memfault/metrics/platform/overrides.h"
// [...]
void memfault_metrics_heartbeat_collect_data(void) {
// NOTE: When using FreeRTOS we can just call
// "uxTaskGetStackHighWaterMark(s_main_task_tcb)"
const uint32_t stack_high_water_mark = // TODO: code to get high water mark
memfault_metrics_heartbeat_set_unsigned(
MEMFAULT_METRICS_KEY(MainTaskStackHwm), stack_high_water_mark);
}

Initial Setup & Debug APIs#

While integrating the heartbeat metrics subsystem or adding new metrics, there are a few easy ways you can debug and test the new code. Notably:

  • memfault_metrics_heartbeat_debug_trigger() can be called at any time to trigger a heartbeat serialization (so you don't have to wait for the entire interval to get data to flush)
  • memfault_metrics_heartbeat_debug_print() can be called to dump the current value of all the metrics being tracked
  • The heartbeat interval can be reduced from the default 3600 seconds for debug purposes by setting MEMFAULT_METRICS_HEARTBEAT_INTERVAL_SECS in your memfault_platform_config.h interval to a shorter period such as 30 seconds.