Linux Metrics

Introduction

A key feature of the Memfault SDK is the ability to capture metrics on device, aggregate them in regular Heartbeat messages, and upload them to the Memfault cloud.

Using metrics

The aggregated metrics:

Are shown on the device timeline.
Are available for use as Timeseries or Attributes metrics, aggregated across the entire fleet.
Can be used to create Metric Charts and Alerts.

See Metrics for an introduction to Memfault terminology related to metrics and to learn how the features you'll set up here can be accessed in the Memfault Web App.

Prerequisites

Enable metrics collection

By default, enable_data_collection is false (see the default configuration). This lets you ask end users for consent before any data is collected or transmitted to Memfault services.

Once the end user has given their consent, you can enable data collection like so:

$ memfaultctl enable-data-collection

To disable it:

$ memfaultctl disable-data-collection

If either command changes the current configuration value, the memfaultd service restarts automatically.

Take a look at the /etc/memfaultd.conf reference for more information. You can set enable_data_collection to true to bypass this step if user consent is not required to collect metric data.
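If your application already has a consent screen, it can call the commands above directly. The following is a minimal Python sketch. It assumes memfaultctl is on the PATH and that the calling process has sufficient privileges. The names consent_command and apply_consent, and the injectable run parameter, are hypothetical and are used only for illustration.

```python
import subprocess
from typing import Callable, List


def consent_command(consent: bool) -> List[str]:
    """Map the user's consent decision to the memfaultctl subcommand shown above."""
    action = "enable-data-collection" if consent else "disable-data-collection"
    return ["memfaultctl", action]


def apply_consent(
    consent: bool,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[str]:
    """Enable or disable data collection and return the command that was run.

    `run` defaults to subprocess.run; pass a stub in tests. With check=True,
    a non-zero exit from memfaultctl raises CalledProcessError.
    """
    cmd = consent_command(consent)
    run(cmd, check=True)
    return cmd
```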

Configuring Metric Collection

Enabling memfaultd's system metric collection in the metrics config turns on memfaultd's internal system for capturing metrics about the resource usage and state of your device.

For the list of system metrics that memfaultd captures when metrics.system_metric_collection.enable is set to true, see the memfaultd Built-in Metrics reference.

Process Metric Collection

memfaultd can collect metrics that track the resource utilization of individual processes on the system. When the metrics.system_metric_collection.processes configuration is set to null, memfaultd collects resource utilization data only for itself.

We recommend selecting at least a few processes on your system to monitor and updating the default /etc/memfaultd.conf like so:

  "metrics": {
    ...
    "system_metric_collection": {
      ...
      "processes": ["memfaultd", "memfault-test-app", "systemd"]
    }
  }

Process-level metric collection can be disabled, even with metrics.system_metric_collection.enable set to true, by setting metrics.system_metric_collection.processes to []. This will disable the module that collects process-specific metrics entirely.
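You can also set the processes value programmatically. It accepts three forms: null, an empty list, or a list of names. This minimal sketch assumes /etc/memfaultd.conf has already been loaded as plain JSON into a dict. set_collected_processes is a hypothetical helper and is not part of the Memfault SDK.

```python
import copy
from typing import List, Optional


def set_collected_processes(conf: dict, processes: Optional[List[str]]) -> dict:
    """Return a copy of `conf` with metrics.system_metric_collection.processes set.

    None  -> serialized as JSON null: only memfaultd itself is tracked.
    []    -> process metric collection is disabled entirely.
    [...] -> only the named processes are tracked.
    """
    new_conf = copy.deepcopy(conf)  # leave the caller's dict untouched
    smc = new_conf.setdefault("metrics", {}).setdefault("system_metric_collection", {})
    smc["processes"] = None if processes is None else list(processes)
    return new_conf
```

After updating the dict, you would write it back with json.dump and restart memfaultd so the change takes effect.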

Network Interface Metric Collection

memfaultd can collect metrics that track traffic on network interfaces on the system. When the metrics.system_metric_collection.network_interfaces configuration is set to null, memfaultd will report network traffic metrics for all non-virtual interfaces.

You can specify the list of interfaces that should be tracked like so:

  "metrics": {
    ...
    "system_metric_collection": {
      ...
      "network_interfaces": ["eth0", "wlan0"]
    }
  }

Network metric collection can be disabled, even with metrics.system_metric_collection.enable set to true, by setting metrics.system_metric_collection.network_interfaces to []. This will disable the module that collects network interface metrics entirely.
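The sketch below shows which interfaces these settings select. It reads per-interface byte counters from the standard Linux /proc/net/dev file, then applies the null, empty-list, and named-list semantics described above. It is an illustration, not memfaultd's implementation. This page does not say how memfaultd decides that an interface is virtual, so the virtual set is an input you supply. On many systems, virtual interfaces are the names listed under /sys/devices/virtual/net. The helper names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Counters = Dict[str, Tuple[int, int]]  # interface -> (rx_bytes, tx_bytes)


def parse_proc_net_dev(text: str) -> Counters:
    """Parse the contents of /proc/net/dev into byte counters per interface."""
    stats: Counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        stats[name.strip()] = (int(fields[0]), int(fields[8]))
    return stats


def select_interfaces(stats: Counters, configured: Optional[List[str]],
                      virtual: Set[str]) -> Counters:
    """None -> every non-virtual interface; [] -> none; list -> only those names."""
    if configured is None:
        return {k: v for k, v in stats.items() if k not in virtual}
    return {k: v for k, v in stats.items() if k in configured}
```

On a device, you would call parse_proc_net_dev(open("/proc/net/dev").read()).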

Disk Space Metric Collection

memfaultd can monitor the available and used disk space of individual disks on the system. When the metrics.system_metric_collection.disk_space configuration is set to null, memfaultd will report disk space metrics for all mounts listed in /proc/mounts whose device starts with /dev.

You can specify the list of disks that should be tracked like so:

  "metrics": {
    ...
    "system_metric_collection": {
      ...
      "disk_space": ["/dev/sda1", "/dev/sda2", "/dev/sdb1"]
    }
  }

Disk space metric collection can be disabled, even with metrics.system_metric_collection.enable set to true, by setting metrics.system_metric_collection.disk_space to []. This will disable the module that collects disk space metrics entirely.
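You can reproduce the default disk selection rule in a few lines of Python. This sketch assumes the standard /proc/mounts layout: device, mount point, filesystem type, options, dump, pass. It uses os.statvfs for available and used space. memfaultd's own calculation may differ, and the helper names are hypothetical.

```python
import os
from typing import List, Optional, Tuple

Mount = Tuple[str, str]  # (device, mount point)


def default_disk_mounts(proc_mounts: str) -> List[Mount]:
    """Mounts whose device field starts with /dev (the behavior when disk_space is null)."""
    mounts: List[Mount] = []
    for line in proc_mounts.splitlines():
        fields = line.split()
        # Field 0 is the device, field 1 the mount point. Mount points with
        # spaces appear octal-escaped in /proc/mounts and are not decoded here.
        if len(fields) >= 2 and fields[0].startswith("/dev"):
            mounts.append((fields[0], fields[1]))
    return mounts


def select_disks(mounts: List[Mount], configured: Optional[List[str]]) -> List[Mount]:
    """None -> all default mounts; [] -> none; list -> only the listed devices."""
    if configured is None:
        return mounts
    return [m for m in mounts if m[0] in configured]


def usage_bytes(mount_point: str) -> Tuple[int, int]:
    """Return (available, used) bytes for a mount point."""
    st = os.statvfs(mount_point)
    available = st.f_bavail * st.f_frsize  # space usable by unprivileged processes
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    return available, used
```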

Configure the Heartbeat interval

All metrics collected by memfaultd are aggregated in memory. A new Heartbeat will be generated and written to the filesystem at a regular interval. The default value is 1 hour. You can change the interval by setting the heartbeat_interval_seconds configuration option in /etc/memfaultd.conf.

A Heartbeat is also generated when you shut down or restart memfaultd and every time the memfaultctl sync command is used.

Note

We strongly recommend using the default value of 1 hour for Heartbeat aggregation. Setting heartbeat_interval_seconds to a low value will result in more frequent Heartbeats that will eventually be rate-limited by Memfault's backend.
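A quick calculation shows how fast the Heartbeat count grows as the interval shrinks. This sketch only counts Heartbeats. It does not state the actual rate-limit thresholds of Memfault's backend. The extra_syncs parameter is a hypothetical way to account for Heartbeats produced by memfaultctl sync or by memfaultd restarts.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def heartbeats_per_day(interval_seconds: int, extra_syncs: int = 0) -> int:
    """Periodic Heartbeats in one day, plus extra ones from syncs or restarts."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return SECONDS_PER_DAY // interval_seconds + extra_syncs


print(heartbeats_per_day(3600))  # default 1-hour interval: 24
print(heartbeats_per_day(60))    # 60-second interval: 1440
```

With the default interval, a device produces 24 Heartbeats per day. At 60 seconds it produces 1440, sixty times as many.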

Custom Metrics with StatsD

An easy way to capture custom application metrics is to use the StatsD protocol. memfaultd can expose a StatsD server at localhost:8125, which is the default destination for most StatsD client libraries. Any application running on the system can then submit metric readings to it over UDP.

`memfaultd` built-in StatsD server

memfaultd's built-in StatsD server can be enabled by providing a bind_address value for the statsd_server configuration field under the top-level metrics config.

{
  // ...
  "metrics": {
    "statsd_server": {
      "bind_address": "127.0.0.1:8125"
    },
    // ...
  },
  // ...
}

With this config, StatsD datagrams can be sent to port 8125 on localhost to be aggregated in Metric Reports.
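StatsD is a plain-text protocol over UDP, so a client library is convenient but not required. The sketch below uses only the Python standard library and assumes the bind_address shown above. send_statsd is a hypothetical helper. Each datagram uses the standard StatsD line format name:value|type, for example mygauge:42|g.

```python
import socket


def send_statsd(name: str, value: float, metric_type: str = "g",
                host: str = "127.0.0.1", port: int = 8125) -> bytes:
    """Send one StatsD reading ("name:value|type") over UDP and return the payload."""
    payload = f"{name}:{value}|{metric_type}".encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Connectionless send: there is no handshake with the server.
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    send_statsd("mygauge", 42)         # gauge
    send_statsd("mycounter", 1, "c")   # counter
```

Because UDP is connectionless, sending normally succeeds even when no server is listening. As a result, a missing or misconfigured statsd_server will not surface as an error in your application.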

StatsD Client Libraries

You can find a list of StatsD client libraries for a variety of languages in the StatsD repository.

In our meta-memfault-example distro, we've added StatsD clients as dependencies:

- statsd-c-client in our C sample app:
  - See the corresponding DEPENDS addition
  - See a sample recipe for statsd-c-client
- python3-statsd in our Python sample app:
  - See the corresponding DEPENDS addition

Example: using C

See the full module in our example layer.

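#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <statsd-client.h>

#define MAX_LINE_LEN 200
#define PKT_LEN 1400

int main(void)
{
  /* Connect to memfaultd's StatsD server; metric names are prefixed
   * with the "mycapp" namespace. */
  statsd_link *link = statsd_init_with_namespace("localhost", 8125, "mycapp");
  if (link == NULL) {
    fprintf(stderr, "failed to initialize StatsD client\n");
    return EXIT_FAILURE;
  }

  char pkt[PKT_LEN] = {'\0'};
  char tmp[MAX_LINE_LEN] = {'\0'};

  /* Format a gauge ("g") reading of 42 at sample rate 1.0 into tmp; the
   * trailing 1 requests a line feed so readings can be batched. */
  statsd_prepare(link, "mygauge", 42, "g", 1.0, tmp, MAX_LINE_LEN, 1);
  /* Append to the packet buffer without overflowing it. */
  strncat(pkt, tmp, PKT_LEN - strlen(pkt) - 1);
  statsd_send(link, pkt);

  statsd_finalize(link);
  return EXIT_SUCCESS;
}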

Example: using Python

See the full module in our example layer.

from statsd import StatsClient

statsd = StatsClient(
    host="localhost",
    port=8125,
    prefix="mypythonapp",
)

statsd.gauge("mygauge", 42)

Custom Metrics with `memfaultctl`

The memfaultctl write-metrics command writes custom metrics to memfaultd. It is primarily intended for testing and development.

# memfaultctl write-metrics mygauge=42 'mycounter:1|c'

Supported metric argument types:

KEY=VALUE : memfaultctl will attempt to convert the value to a floating-point number as a Gauge metric, falling back to a string value.
statsd-style KEY:VALUE|TYPE : memfaultctl will parse the value as a floating-point number. Accepted types are c, g, ms, h, or s.

See the memfaultctl write-metrics documentation.
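The following Python sketch implements the two argument forms as described above. It is not memfaultctl's source code. parse_metric_arg and the "string" type label it returns are hypothetical.

```python
from typing import Tuple, Union

STATSD_TYPES = {"c", "g", "ms", "h", "s"}


def parse_metric_arg(arg: str) -> Tuple[str, Union[float, str], str]:
    """Return (key, value, type) for a KEY=VALUE or KEY:VALUE|TYPE argument."""
    eq, colon = arg.find("="), arg.find(":")
    if eq != -1 and (colon == -1 or eq < colon):
        # KEY=VALUE: try a floating-point Gauge, fall back to a string value.
        key, value = arg.split("=", 1)
        try:
            return key, float(value), "g"
        except ValueError:
            return key, value, "string"
    # statsd-style KEY:VALUE|TYPE: the value must parse as a float.
    if colon == -1 or "|" not in arg:
        raise ValueError(f"unrecognized metric argument {arg!r}")
    key, rest = arg.split(":", 1)
    value, metric_type = rest.rsplit("|", 1)
    if metric_type not in STATSD_TYPES:
        raise ValueError(f"unsupported metric type {metric_type!r}")
    return key, float(value), metric_type
```

When you pass statsd-style arguments to memfaultctl from a shell, quote them, because | is the shell's pipe operator.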

Custom Device Attributes

The memfaultctl command provides an easy way to set a device-specific attribute with the write-attributes command.

# memfaultctl write-attributes APP_VERSION=1.4.2 ACTIVATED=true

Refer to the memfaultctl write-attributes documentation.

Log to metrics

You can use the Memfault SDK for Linux to generate metrics directly from logs on your device. This is useful to capture metrics from applications that do not support StatsD or other metrics collection protocols.

For more details, refer to the logging guide.

Session Metric Reports

Note

Full support for Sessions in the Linux SDK was shipped with version 1.11.0. If your device is running an earlier version, you will need to upgrade to get access to session-based Metric Reporting!

Sessions offer an alternative to the periodic Heartbeat report approach. Like Heartbeat reports, a session Metric Report contains a set of aggregated metrics calculated from readings captured over a period of time. The difference is that Heartbeat reports have a consistent duration (configured via heartbeat_interval_seconds), while sessions are started and stopped dynamically based on events on your device. This lets you capture metrics that track your device's health and behavior while it performs a specific action or is in a specific state.

Defining Session Types

Before a session can be captured by memfaultd, it first must be defined in the /etc/memfaultd.conf configuration file.

See the memfaultd.conf reference for details.

The following is an example of what this config might look like:

{
  "sessions": [
    {
      "captured_metrics": [
        "cpu/sum/percent/system",
        "cpu/sum/percent/user",
        "cpu/sum/percent/idle",
        "memory/memory/free",
        "memory/memory/used"
      ],
      "name": "camera-recording"
    },
    {
      "captured_metrics": [
        "cpu/sum/percent/system",
        "cpu/sum/percent/user",
        "cpu/sum/percent/idle",
        "memory/memory/free",
        "memory/memory/used"
      ],
      "name": "video-cloud-sync"
    }
  ]
}

This configuration would make sense for a smart camera device. With the above config we define two sessions, one for when the camera is recording video ("camera-recording") and one for when the camera is syncing its recorded video files to the cloud ("video-cloud-sync"). In both cases we are capturing CPU and memory metrics to measure the load these operations put on our device.

memfaultd is able to handle overlapping sessions. So if our device is uploading files to the cloud and recording video at the same time, memfaultd can capture metrics for both session Metric Reports at the same time and will create and upload two separate reports. This is important to consider when defining your device's session types!
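Both session types above capture the same metrics, so you may prefer to generate the sessions block from a shared list instead of maintaining it by hand. This Python sketch assumes the config shape shown above. session_definition is a hypothetical helper. It drops duplicate metric names while preserving order, because dict.fromkeys keeps only the first occurrence of each name.

```python
import json
from typing import Dict, List


def session_definition(name: str, captured_metrics: List[str]) -> Dict:
    """Build one entry for the top-level "sessions" list, without duplicate metrics."""
    return {"name": name, "captured_metrics": list(dict.fromkeys(captured_metrics))}


CAMERA_METRICS = [
    "cpu/sum/percent/system",
    "cpu/sum/percent/user",
    "cpu/sum/percent/idle",
    "memory/memory/free",
    "memory/memory/used",
]

config = {
    "sessions": [
        session_definition("camera-recording", CAMERA_METRICS),
        session_definition("video-cloud-sync", CAMERA_METRICS),
    ]
}

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```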

Controlling Sessions

The memfaultctl CLI provides commands for starting and ending sessions. Using the above config, we can start a session as our device begins recording video with the following command:

# memfaultctl start-session camera-recording

This command makes a request to memfaultd to start a camera-recording session. Starting a session of a type where there is already a session in progress is a no-op.

Once a session is over, memfaultctl end-session builds a Metric Report from the metric readings aggregated during the session. Optionally, you can pass gauge readings as additional arguments to memfaultctl start-session and memfaultctl end-session.

In this example, we might pass a recording_errored reading to memfaultctl end-session to indicate whether the camera recording session experienced an error while recording video. A value of 1 indicates that an error occurred, and 0 indicates no errors. The command to end the recording session and report no errors would look like:

# memfaultctl end-session camera-recording recording_errored=0

Once a session is ended, its report is written to disk by memfaultd and it will be uploaded the next time memfaultd uploads data (as configured by upload_interval_seconds).
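In application code, it helps to tie the session's lifetime to the work it measures so that end-session always runs. The sketch below wraps the two memfaultctl commands in a Python context manager and reuses the recording_errored gauge from the example above. memfault_session and its run parameter are hypothetical. The sketch assumes memfaultctl is on the PATH.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def memfault_session(
    name: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Iterator[None]:
    """Start a session on entry and end it on exit.

    If the body raises, the session still ends, with recording_errored=1.
    """
    run(["memfaultctl", "start-session", name], check=True)
    errored = 0
    try:
        yield
    except Exception:
        errored = 1
        raise
    finally:
        # check=False so a failed end-session does not mask an exception
        # raised by the body.
        run(["memfaultctl", "end-session", name, f"recording_errored={errored}"],
            check=False)
```

A caller would wrap its recording logic in `with memfault_session("camera-recording"):`.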

High Resolution Telemetry

Memfault's standard Metric Reports aggregate the metric readings collected over the duration of the report into a single value for each metric. The aggregation strategy depends on the Metric Type: Counters are summed, Histograms are averaged, and so on. High Resolution Telemetry provides an alternative for deeper debugging. It stores each metric reading's value individually, alongside the timestamp at which the reading was received. This increases the amount of data uploaded to Memfault but can provide second-by-second insights that aid in debugging.

High Resolution Telemetry is enabled by default and can be disabled by setting metrics.hrt.enable to false. The maximum number of readings per minute is 750 by default and can be lowered via the metrics.hrt.max_samples_per_minute configuration field.

Unlike Heartbeat and Session Metric Reports, High Resolution Telemetry data is only uploaded if the device's Fleet Sampling configuration for Debugging is set to "On".

Testing your integration

During the development phase, we recommend setting a low value (e.g. 60 seconds) for the heartbeat_interval_seconds and upload_interval_seconds settings in /etc/memfaultd.conf. Take a look at the /etc/memfaultd.conf reference for more information.

For changes in /etc/memfaultd.conf to take effect, you'll need to restart the memfaultd daemon:

$ systemctl restart memfaultd
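If you switch between development and production settings often, you can script the edit and the restart. This minimal sketch makes three assumptions: /etc/memfaultd.conf is plain JSON, heartbeat_interval_seconds and upload_interval_seconds sit at the top level of the file, and the script can write the file and restart the service. apply_dev_intervals is a hypothetical helper.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable

# Development-only values suggested above.
DEV_OVERRIDES = {"heartbeat_interval_seconds": 60, "upload_interval_seconds": 60}


def apply_dev_intervals(
    conf_path: Path = Path("/etc/memfaultd.conf"),
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> dict:
    """Write short development intervals into the config, then restart memfaultd."""
    conf = json.loads(conf_path.read_text())
    conf.update(DEV_OVERRIDES)  # all other settings are preserved
    conf_path.write_text(json.dumps(conf, indent=2) + "\n")
    run(["systemctl", "restart", "memfaultd"], check=True)
    return conf
```

Restore the production values before shipping. The default Heartbeat interval is 1 hour.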

Finally, we recommend enabling developer mode for this device in the Memfault backend to lift any rate limiting and allow you to see your data in the Memfault Web App as soon as possible.

The following section describes where to find your data in the Memfault Web App.

Viewing Metrics in the Web Application

To see detailed reports from a specific device, find the device in Fleet → Devices, and then open its Timeline tab.

Open Dashboards → Metrics to create Metric Charts that monitor metrics at the fleet level by aggregating the data from each device.

To receive notifications when your metrics exceed a certain threshold or meet any complex set of criteria, you can set up Alerts. Navigate to Alerts using the main menu on the Memfault Web App.
