
Linux Metrics

Introduction

A key feature of the Memfault SDK is the ability to capture metrics on device, aggregate them in regular Heartbeat messages, and upload them to the Memfault cloud.

Using metrics

The aggregated metrics:

See Metrics for an introduction to Memfault terminology related to metrics and to learn how the features you'll set up here can be accessed in the Memfault Web App.

Prerequisites

Enable metrics collection

By default, enable_data_collection is false (see the default configuration). This lets you ask end users for consent before any data is collected or transmitted to Memfault services.

Once the end user has given their consent, you can enable data collection like so:

$ memfaultctl enable-data-collection

To disable it:

$ memfaultctl disable-data-collection

If either command changes the current value of enable_data_collection, the memfaultd service restarts automatically.

Take a look at the /etc/memfaultd.conf reference for more information. You can set enable_data_collection to true to bypass this step if user consent is not required to collect metric data.
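If your application collects consent in its own UI, it can forward the user's choice to memfaultd by calling the two memfaultctl commands above. The following is a minimal sketch that assumes the memfaultctl binary is on PATH and that the caller has the privileges it needs. The function name apply_consent and its run parameter are hypothetical. run exists only so the sketch can be exercised without memfaultctl installed.

```python
import subprocess
from typing import Callable, Sequence


def _run_memfaultctl(argv: Sequence[str]) -> None:
    # Raise CalledProcessError if memfaultctl exits with a non-zero status.
    subprocess.run(argv, check=True)


def apply_consent(consent_given: bool,
                  run: Callable[[Sequence[str]], object] = _run_memfaultctl) -> list[str]:
    """Enable or disable memfaultd data collection to match the user's choice.

    Hypothetical helper: it only chooses between the two documented
    memfaultctl subcommands and runs the selected one.
    """
    subcommand = "enable-data-collection" if consent_given else "disable-data-collection"
    argv = ["memfaultctl", subcommand]
    run(argv)
    return argv
```

Calling it repeatedly with the same choice should be harmless, because memfaultd only restarts when the value actually changes.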

Configuring Metric Collection

Enabling memfaultd's system metric collection in the metrics config turns on memfaultd's internal system for capturing metrics about the resource usage and state of your device.

For the list of system metrics that memfaultd captures when metrics.system_metric_collection.enable is set to true, see the memfaultd Built-in Metrics reference.

Process Metric Collection

memfaultd can collect metrics that track the resource utilization of individual processes on the system. When metrics.system_metric_collection.processes is set to null, only data on memfaultd's own resource utilization is collected.

We recommend selecting at least a few processes on your system to monitor and updating the default /etc/memfaultd.conf like so:

  "metrics": {
    ...
    "system_metric_collection": {
      ...
      "processes": ["memfaultd", "memfault-test-app", "systemd"]
    }
  }

Process-level metric collection can be disabled, even with metrics.system_metric_collection.enable set to true, by setting metrics.system_metric_collection.processes to []. This will disable the module that collects process-specific metrics entirely.
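This page refers to settings by dotted paths such as metrics.system_metric_collection.processes. If you generate /etc/memfaultd.conf from a script, a small helper can translate such a path into the nested JSON objects while leaving sibling keys alone. The following is a minimal sketch. It assumes the file parses as plain JSON, and set_nested is a hypothetical name.

```python
import json


def set_nested(conf: dict, dotted_key: str, value) -> dict:
    """Set a value such as "metrics.system_metric_collection.processes" in a parsed config.

    Intermediate objects are created when missing; existing sibling keys
    (for example "enable") are left untouched.
    """
    *parents, leaf = dotted_key.split(".")
    node = conf
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return conf


conf = {"metrics": {"system_metric_collection": {"enable": True}}}
set_nested(conf, "metrics.system_metric_collection.processes",
           ["memfaultd", "memfault-test-app", "systemd"])
print(json.dumps(conf, indent=2))
```

The same helper works for network_interfaces and disk_space. Passing [] instead of a list of names turns the corresponding module off, as described above.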

Network Interface Metric Collection

memfaultd can collect metrics that track traffic on network interfaces on the system. When the metrics.system_metric_collection.network_interfaces configuration is set to null, memfaultd will report network traffic metrics for all non-virtual interfaces.

You can specify the list of interfaces that should be tracked like so:

  "metrics": {
    ...
    "system_metric_collection": {
      ...
      "network_interfaces": ["eth0", "wlan0"]
    }
  }

Network metric collection can be disabled, even with metrics.system_metric_collection.enable set to true, by setting metrics.system_metric_collection.network_interfaces to []. This will disable the module that collects network interface metrics entirely.
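To decide which names to put in network_interfaces, it can help to list the interfaces that are backed by real hardware. On most Linux systems, each entry in /sys/class/net is a symlink, and virtual interfaces (loopback, bridges, veth pairs, tun/tap) resolve to a path under /sys/devices/virtual/net. The following sketch relies on that sysfs convention. Whether memfaultd uses exactly this criterion for "non-virtual" is an assumption, and physical_interfaces is a hypothetical helper.

```python
import os


def physical_interfaces(sys_class_net: str = "/sys/class/net") -> list[str]:
    """Return interface names whose sysfs entry does not resolve under devices/virtual.

    Assumes the usual Linux sysfs layout, where virtual interfaces live in
    /sys/devices/virtual/net and hardware-backed ones sit under their bus device.
    """
    names = []
    for name in sorted(os.listdir(sys_class_net)):
        target = os.path.realpath(os.path.join(sys_class_net, name))
        if "/devices/virtual/" not in target:
            names.append(name)
    return names
```

On a typical board, calling physical_interfaces() returns something like eth0 and wlan0, while lo and docker0 are filtered out.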

Disk Space Metric Collection

memfaultd can monitor the available and used disk space of individual disks on the system. When metrics.system_metric_collection.disk_space is set to null, memfaultd will report disk space metrics for all mounts listed in /proc/mounts whose device (the first field of each entry) starts with /dev.

You can specify the list of disks that should be tracked like so:

  "metrics": {
    ...
    "system_metric_collection": {
      ...
      "disk_space": ["/dev/sda1", "/dev/sda2", "/dev/sdb1"]
    }
  }

Disk space metric collection can be disabled, even with metrics.system_metric_collection.enable set to true, by setting metrics.system_metric_collection.disk_space to []. This will disable the module that collects disk space metrics entirely.
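To preview which devices the null default would cover, you can apply the same filter yourself: take the entries in /proc/mounts whose first field starts with /dev. The following sketch mirrors the rule as described in this section and is not memfaultd's own code. dev_mounts is a hypothetical helper.

```python
def dev_mounts(proc_mounts_text: str) -> list[tuple[str, str]]:
    """Return (device, mount point) pairs from /proc/mounts whose device starts with /dev.

    /proc/mounts escapes spaces in paths as \\040, so splitting on
    whitespace separates the fields safely.
    """
    result = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mount_point = fields[0], fields[1]
        if device.startswith("/dev"):
            result.append((device, mount_point))
    return result
```

On a device, open("/proc/mounts").read() provides the input. The device names it returns are the kind of values that go in disk_space.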

Configure the Heartbeat interval

All metrics collected by memfaultd are aggregated in memory. A new Heartbeat will be generated and written to the filesystem at a regular interval. The default value is 1 hour. You can change the interval by setting the heartbeat_interval_seconds configuration option in /etc/memfaultd.conf.

A Heartbeat is also generated when you shut down or restart memfaultd, and every time the memfaultctl sync command is used.

Note

We strongly recommend using the default value of 1 hour for Heartbeat aggregation. Setting heartbeat_interval_seconds to a low value will result in more frequent Heartbeats that will eventually be rate-limited by Memfault's backend.

Contact us if you need to adjust this value for reasons that are specific to your device or use case.
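The cost of a shorter interval grows quickly. The following arithmetic counts only the periodic Heartbeats one device produces per day and ignores the extra ones from restarts and memfaultctl sync. heartbeats_per_day is an illustrative helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def heartbeats_per_day(heartbeat_interval_seconds: int) -> float:
    """Number of periodic Heartbeats a single device generates per day."""
    if heartbeat_interval_seconds <= 0:
        raise ValueError("heartbeat_interval_seconds must be positive")
    return SECONDS_PER_DAY / heartbeat_interval_seconds


for interval in (3600, 600, 60):
    print(f"{interval:>5} s -> {heartbeats_per_day(interval):.0f} heartbeats/day")
```

The default of 3600 seconds yields 24 Heartbeats per day, whereas 60 seconds yields 1440, a 60x increase per device.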

Application metrics with StatsD

An easy way to capture application metrics is to use the StatsD protocol. memfaultd can expose a StatsD server at localhost:8125 (the default destination for most StatsD client libraries), and any application running on the system can then submit metric readings to it over UDP.

memfaultd built-in StatsD server

memfaultd's built-in StatsD server can be enabled by providing a bind_address value for the statsd_server configuration field under the top-level metrics config.

{
  // ...
  "metrics": {
    "statsd_server": {
      "bind_address": "127.0.0.1:8125"
    },
    // ...
  },
  // ...
}

With this config, StatsD datagrams can be sent to port 8125 on localhost to be aggregated in Metric Reports.
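A client library is optional, because a StatsD reading is just a short text line sent in a UDP datagram, such as mygauge:42|g. The following sketch uses only Python's standard library. It deliberately supports only counters (c) and gauges (g). Check which other StatsD types memfaultd accepts before adding more. format_statsd and send_statsd are hypothetical names.

```python
import socket


def format_statsd(name: str, value: float, metric_type: str) -> bytes:
    """Encode one StatsD line in the form <name>:<value>|<type>."""
    if metric_type not in ("c", "g"):
        raise ValueError("this sketch only handles counters (c) and gauges (g)")
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_statsd(line: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one datagram; StatsD over UDP is fire-and-forget, with no acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line, (host, port))
```

For example, send_statsd(format_statsd("myapp.mygauge", 42, "g")) reports a gauge reading to the bind_address configured above. The sender gets no error if nothing is listening, so verify the readings in the Web App.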

StatsD Client Libraries

You can find a list of StatsD client libraries for many languages in the StatsD repository.

In our meta-memfault-example distro, we've added StatsD clients as dependencies:

Example: using C

See the full module in our example layer.

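#include <stdio.h>
#include <string.h>
#include <statsd-client.h>

#define MAX_LINE_LEN 200
#define PKT_LEN 1400

int main(void)
{
    /* Connect to memfaultd's StatsD server; metric names get the "mycapp." prefix. */
    statsd_link *link = statsd_init_with_namespace("localhost", 8125, "mycapp");
    if (link == NULL) {
        fprintf(stderr, "failed to initialize the StatsD client\n");
        return 1;
    }

    char pkt[PKT_LEN] = {'\0'};
    char tmp[MAX_LINE_LEN] = {'\0'};

    /* Format a gauge ("g") reading of 42 at sample rate 1.0 into tmp,
     * newline-terminated so several readings can share one packet. */
    statsd_prepare(link, "mygauge", 42, "g", 1.0, tmp, MAX_LINE_LEN, 1);
    /* Append to the packet, bounded by the space that is actually left. */
    strncat(pkt, tmp, PKT_LEN - strlen(pkt) - 1);

    /* Send the packet as a single UDP datagram, then release the link. */
    statsd_send(link, pkt);
    statsd_finalize(link);
    return 0;
}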

Example: using Python

See the full module in our example layer.

from statsd import StatsClient

statsd = StatsClient(
    host="localhost",
    port=8125,
    prefix="mypythonapp",
)

statsd.gauge("mygauge", 42)

Custom Device Attributes

The memfaultctl command provides an easy way to set a device-specific attribute with the write-attributes command.

# memfaultctl write-attributes APP_VERSION=1.4.2 ACTIVATED=true

Refer to the memfaultctl write-attributes documentation.
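An application that tracks its own state can build the write-attributes command line from a dictionary. Passing the arguments as a list avoids shell quoting problems. The following is a minimal sketch. It renders Python booleans as true/false to match the example above. How memfaultctl interprets each value is covered in its documentation and is not assumed here. Both function names are hypothetical.

```python
import subprocess


def write_attributes_argv(attributes: dict) -> list[str]:
    """Build a `memfaultctl write-attributes KEY=VALUE ...` command line."""
    pairs = []
    for key, value in attributes.items():
        # Render Python booleans the way the example above writes them.
        if isinstance(value, bool):
            value = "true" if value else "false"
        pairs.append(f"{key}={value}")
    return ["memfaultctl", "write-attributes", *pairs]


def write_attributes(attributes: dict) -> None:
    """Run the command; raises CalledProcessError if memfaultctl fails."""
    subprocess.run(write_attributes_argv(attributes), check=True)
```

For example, write_attributes({"APP_VERSION": "1.4.2", "ACTIVATED": True}) runs the same command as the example above.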

Log to metrics

You can use the Memfault SDK for Linux to generate metrics directly from logs on your device. This is useful to capture metrics from applications that do not support StatsD or other metrics collection protocols.

For more details, refer to the logging guide.

Session Metric Reports

Note

Full support for Sessions in the Linux SDK was shipped with version 1.11.0. If your device is running an earlier version, you will need to upgrade to get access to session-based Metric Reporting!

Sessions offer an alternative to the periodic Heartbeat report approach. Similar to Heartbeat reports, a session Metric Report contains a set of aggregated metrics calculated from readings captured over a period of time. The difference is that while Heartbeat reports have a consistent duration (configured via heartbeat_interval_seconds), sessions are started and stopped dynamically based on events on your device. This allows you to capture metrics that track your device's health and behavior while it is taking a specific action or is in a specific state.

Note

Session Reports are available to all users, and can be visualized in the individual Device Timeline views.

Advanced Analytics, which enables Metrics Charts using session reports, Segments, 1-year data retention, and other advanced features, requires an add-on.

Advanced Analytics
This feature is limited to customers with Memfault's Advanced Analytics bundle. Please reach out to our sales or customer success team to access it.

Defining Session Types

Before a session can be captured by memfaultd, it first must be defined in the /etc/memfaultd.conf configuration file.

See the memfaultd.conf reference for details.

The following is an example of what this config might look like:

{
  "sessions": [
    {
      "captured_metrics": [
        "cpu/sum/percent/system",
        "cpu/sum/percent/user",
        "cpu/sum/percent/idle",
        "memory/memory/free",
        "memory/memory/used"
      ],
      "name": "camera-recording"
    },
    {
      "captured_metrics": [
        "cpu/sum/percent/system",
        "cpu/sum/percent/user",
        "cpu/sum/percent/idle",
        "memory/memory/free",
        "memory/memory/used"
      ],
      "name": "video-cloud-sync"
    }
  ]
}

This configuration would make sense for a smart camera device. With the above config we define two sessions, one for when the camera is recording video ("camera-recording") and one for when the camera is syncing its recorded video files to the cloud ("video-cloud-sync"). In both cases we are capturing CPU and memory metrics to measure the load these operations put on our device.

memfaultd is able to handle overlapping sessions. So if our device is uploading files to the cloud and recording video at the same time, memfaultd can capture metrics for both session Metric Reports at the same time and will create and upload two separate reports. This is important to consider when defining your device's session types!
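When several session types capture the same metrics, you can generate the "sessions" array from one shared list. This keeps the session types in sync and drops accidental duplicate entries. The following sketch is hypothetical. It also treats duplicate session names as an error, which is our assumption of sensible hygiene rather than a documented memfaultd rule. Merging the result into the rest of /etc/memfaultd.conf is left out.

```python
import json

# Metrics both example sessions capture, listed once so the two cannot drift apart.
SHARED_METRICS = [
    "cpu/sum/percent/system",
    "cpu/sum/percent/user",
    "cpu/sum/percent/idle",
    "memory/memory/free",
    "memory/memory/used",
]


def session(name: str, metrics: list[str]) -> dict:
    """One entry of the "sessions" array, with duplicate metric names removed."""
    unique = list(dict.fromkeys(metrics))  # keeps first-seen order
    return {"captured_metrics": unique, "name": name}


def sessions_config(names: list[str], metrics: list[str]) -> dict:
    if len(set(names)) != len(names):
        raise ValueError("session names must be unique")
    return {"sessions": [session(n, metrics) for n in names]}


print(json.dumps(sessions_config(["camera-recording", "video-cloud-sync"], SHARED_METRICS), indent=2))
```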

Controlling Sessions

The memfaultctl CLI provides commands for starting and ending sessions. Using the above config, we can start a session as our device begins recording video with the following command:

# memfaultctl start-session camera-recording

This command makes a request to memfaultd to start a camera-recording session. Starting a session of a type where there is already a session in progress is a no-op.

Once a session is over, memfaultctl end-session can be used to build a Metric Report from the metric readings aggregated during the session. Optionally, gauge readings can be passed to the session as additional arguments to memfaultctl start-session and memfaultctl end-session. In this example, we might pass a recording_errored reading to memfaultctl end-session to indicate whether the camera experienced an error while recording video (1 if an error occurred, 0 if not). The command to end the recording session and report no errors would look like:

# memfaultctl end-session camera-recording recording_errored=0

Once a session is ended, its report is written to disk by memfaultd and it will be uploaded the next time memfaultd uploads data (as configured by upload_interval_seconds).
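In application code, a context manager can ensure that every started session is also ended, even if the recording code raises. The following is a minimal sketch that shells out to the two memfaultctl commands shown above. It reports the recording_errored reading from this example as 1 when the wrapped block raised and 0 otherwise. memfault_session and its run parameter are hypothetical names, and run lets the sketch be tested without memfaultctl.

```python
import subprocess
from contextlib import contextmanager


def _memfaultctl(*args: str) -> None:
    subprocess.run(["memfaultctl", *args], check=True)


@contextmanager
def memfault_session(name: str, run=_memfaultctl):
    """Start a memfaultd session on entry and end it on exit."""
    run("start-session", name)
    errored = 0
    try:
        yield
    except Exception:
        errored = 1
        raise  # let the caller still see the original error
    finally:
        run("end-session", name, f"recording_errored={errored}")
```

Usage would look like `with memfault_session("camera-recording"): record_video()`, where record_video stands in for your own code.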

High Resolution Telemetry

Memfault's standard Metric Reports aggregate the metric readings collected over the duration of the report into a single value for each Metric (the aggregation strategy depends on the Metric Type: Counters are summed, Histograms are averaged, etc.). High Resolution Telemetry provides an alternative for deeper debugging by storing each metric reading's value individually, alongside the timestamp at which the reading was received. This increases the amount of data uploaded to Memfault, but it can provide second-by-second insights that aid debugging.
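The following toy comparison shows what aggregation gives up. A summary of four gauge readings reveals that a spike happened, but only the timestamped readings retained by High Resolution Telemetry show when it happened. The summary fields here are illustrative and are not memfaultd's exact aggregation or wire format.

```python
from statistics import mean

# Timestamped readings of one gauge, as (unix_seconds, value) pairs.
readings = [(1700000000, 41.0), (1700000001, 44.0), (1700000002, 90.0), (1700000003, 43.0)]


def aggregate(readings):
    """Collapse readings into one summary per metric, as an aggregated report would."""
    values = [v for _, v in readings]
    return {"min": min(values), "max": max(values), "mean": mean(values), "count": len(values)}


summary = aggregate(readings)
# Only the raw, timestamped readings can answer "when did the spike happen?"
peak_time, peak_value = max(readings, key=lambda r: r[1])
```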

High Resolution Telemetry is enabled by default and can be disabled by setting metrics.hrt.enable to false. The maximum number of readings per minute is 750 by default and can be lowered via the metrics.hrt.max_samples_per_minute configuration field.

Unlike Heartbeat and Session Metric Reports, High Resolution Telemetry Data will only be uploaded if a Device's Fleet Sampling configuration for Debugging is set to "On".

Testing your integration

During the development phase, we recommend setting a low value (e.g. 60 seconds) for the heartbeat_interval_seconds and upload_interval_seconds settings in /etc/memfaultd.conf. Take a look at the /etc/memfaultd.conf reference for more information.

For changes in /etc/memfaultd.conf to take effect, you'll need to restart the memfaultd daemon:

$ systemctl restart memfaultd
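If you switch devices between development and production settings often, you can script the change. The following is a minimal sketch. It assumes /etc/memfaultd.conf is plain JSON and that heartbeat_interval_seconds and upload_interval_seconds are top-level keys, as the configuration reference describes. apply_dev_overrides is a hypothetical helper. Run it with write access to the file, restart memfaultd afterward, and revert the values before shipping.

```python
import json

# Short intervals for development only; production should keep the defaults.
DEV_OVERRIDES = {"heartbeat_interval_seconds": 60, "upload_interval_seconds": 60}


def apply_dev_overrides(conf_path: str = "/etc/memfaultd.conf") -> dict:
    """Set short heartbeat/upload intervals, keeping every other key as-is."""
    with open(conf_path) as f:
        conf = json.load(f)
    conf.update(DEV_OVERRIDES)
    with open(conf_path, "w") as f:
        json.dump(conf, f, indent=2)
        f.write("\n")
    return conf
```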

Finally, we recommend enabling developer mode for this device in the Memfault backend to lift any rate limiting and allow you to see your data in the Memfault Web App as soon as possible.

The following section should help you figure out where you may expect data to be accessible in the Memfault Web Application.

Viewing Metrics in the Web Application

To see detailed reports from a specific device, find the device in Fleet → Devices, and then open its Timeline tab.

Open Dashboards → Metrics to create Metric Charts that monitor metrics at the fleet level by aggregating the data from each device.

To receive notifications when your metrics exceed a certain threshold or meet any complex set of criteria, you can set up Alerts. Navigate to Alerts using the main menu on the Memfault Web App.