Linux Metrics

Introduction

A key feature of the Memfault SDK is the ability to capture metrics on device, aggregate them in regular Heartbeat messages, and upload them to the Memfault cloud.

Using metrics

The aggregated metrics:

See Metrics for an introduction to Memfault terminology related to metrics and to learn how the features you'll set up here can be accessed in the Memfault Web App.

Architecture

memfaultd can capture some metrics as a standalone program. These built-in metrics can be augmented by collectd to capture further metrics from the system. All metrics from collectd are sent to the local HTTP server that memfaultd runs on the device.

This architecture has the following advantages:

  • Metrics are captured by a well-known and widely used tool. A wide set of plugins is available.
  • You can easily push your own metrics using either the statsd plugin for collectd or memfaultd's built-in StatsD server.
  • Metrics are aggregated in memory and sent to the Memfault backend in a single Heartbeat message. This reduces the number of messages sent to the backend and reduces the load on the device.
  • Aggregated metrics are written to disk (or to volatile storage if tmp_storage is configured to point to a tmpfs), and you have precise control over how much space your diagnostic data uses.
  • Memfault Fleet Sampling can be used to enable or disable metrics capture remotely.
  • Until the user opts in to data collection, all data is dropped and never written to disk or sent over the network.

Prerequisites

The memfaultd daemon

Follow the integration guide to learn how to set this up for your device. You will need the collectd feature enabled to use collectd with Memfault.

collectd

collectd is a lightweight daemon that collects system and application metrics. It is widely used in the Linux ecosystem and has a large number of plugins available to collect metrics from a variety of sources.

The collectd feature of memfaultd enables a collectd-compatible HTTP endpoint. This endpoint will receive metrics pushed by the collectd daemon at any frequency. memfaultd aggregates the metrics (in memory) into Heartbeats and sends them to the Memfault cloud.

In addition to the standard metrics captured by collectd, you can capture custom metrics from your application by sending them to collectd using the StatsD protocol.

To use collectd with Memfault, we will configure collectd to send metrics to memfaultd (over HTTP on localhost). This is our recommended configuration and is demonstrated in meta-memfault-example.

collectd with Yocto

For Yocto, the meta-oe layer includes a recipe for collectd, so you may be able to just add collectd to your dependencies, e.g. by appending it to IMAGE_INSTALL. In our example project, we've opted for adding it to MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS in layer.conf.

Configuring memfaultd

Configure Built-in System Metric Collection

Enabling memfaultd's system metric collection in the metrics config turns on memfaultd's internal system for capturing metrics about the resource usage and state of your device. When memfaultd is capturing metrics for a given top-level namespace or "family" of metrics (for a metric like cpu/sum/percent/idle that would be cpu), collectd metric readings that share that namespace will be ignored. This prevents double counting and avoids aggregating readings that were collected with slightly different methodologies.
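
The namespace rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not memfaultd's implementation. The BUILTIN_FAMILIES set and the statsd/gauge/mygauge key are hypothetical; the real list of built-in families is in the Built-in Metrics reference.

```python
def metric_family(key: str) -> str:
    """Return the top-level namespace of a key such as 'cpu/sum/percent/idle'."""
    return key.split("/", 1)[0]


def keep_collectd_reading(key: str, builtin_families: set) -> bool:
    """True if a collectd reading would be kept under the rule described above."""
    return metric_family(key) not in builtin_families


# Hypothetical set of families captured by the built-in collector.
BUILTIN_FAMILIES = {"cpu", "memory"}

readings = ["cpu/sum/percent/idle", "memory/memory/free", "statsd/gauge/mygauge"]
kept = [key for key in readings if keep_collectd_reading(key, BUILTIN_FAMILIES)]
print(kept)
```

In this sketch only the reading outside the cpu and memory families survives, which mirrors how collectd readings in a built-in family are dropped.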

For the list of system metrics that will be captured by memfaultd when metrics.enable_system_metric_collection is set to true, see the memfaultd Built-in Metrics reference.

Configure the HTTP server

memfaultd exposes an HTTP server on port 8787 by default. The port must match the port that you will define in the collectd configuration. You can change the port by setting the http_port configuration option in /etc/memfaultd.conf.

Metrics are expected to be sent to the /v1/collectd path (this is not configurable).
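
While bringing up a device, it can help to check that the endpoint is reachable without involving collectd. The sketch below builds one value list in the JSON layout that collectd's write_http plugin emits and posts it to the default port. The helper names, the "demo" plugin, and the "temperature" instance are hypothetical, and whether memfaultd accepts a hand-built payload exactly like this one is an assumption.

```python
import json
import time
import urllib.request


def build_collectd_payload(plugin, type_, type_instance, value, interval=10, host="localhost"):
    """Build one value list in the JSON layout used by collectd's write_http plugin."""
    return [{
        "values": [value],
        "dstypes": ["gauge"],
        "dsnames": ["value"],
        "time": time.time(),
        "interval": interval,
        "host": host,
        "plugin": plugin,
        "plugin_instance": "",
        "type": type_,
        "type_instance": type_instance,
    }]


def post_to_memfaultd(payload, port=8787):
    """POST the payload to the local collectd endpoint and return the HTTP status."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/collectd",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


payload = build_collectd_payload("demo", "gauge", "temperature", 21.5)
# post_to_memfaultd(payload)  # only works where memfaultd is running locally
```

If you changed http_port, pass the same value as the port argument.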

Enable metrics collection

By default, enable_data_collection is false (see the default configuration). This lets you ask end users for consent before collecting or transmitting any data to Memfault services.

Once the end user has given their consent, you can enable data collection like so:

$ memfaultctl enable-data-collection

To disable it:

$ memfaultctl disable-data-collection

The memfaultd service restarts automatically when either of these commands changes the current setting.

Take a look at the /etc/memfaultd.conf reference for more information.

Configure the Heartbeat interval

All metrics pushed by collectd are aggregated in memory. A new Heartbeat will be generated at a regular interval. The default value is 1 hour. You can change the interval by setting the heartbeat_interval_seconds configuration option in /etc/memfaultd.conf.

A Heartbeat is also generated when you shut down or restart memfaultd and every time the memfaultctl sync command is used.

Note

We recommend using the default value of 1 hour for Heartbeat aggregation. Setting heartbeat_interval_seconds to a low value will result in more frequent Heartbeats that will eventually be rate-limited by Memfault's backend.

Contact us if you need to adjust this value.

Configuring collectd

We strongly recommend familiarizing yourself with collectd and how it's configured in order to make the best use of the Memfault platform.

A minimal configuration file to configure collectd to post to Memfault looks like this:

<LoadPlugin write_http>
  FlushInterval 10
</LoadPlugin>
<Plugin write_http>
  <Node "memfaultd">
    URL "http://127.0.0.1:8787/v1/collectd"
    Format "JSON"
    Metrics true
    Notifications false
    StoreRates true
    BufferSize 65536
    Timeout 10000
  </Node>
</Plugin>

This file configures the write_http plugin to send data to the memfaultd daemon on localhost. By default the port 8787 is used. We also set a FlushInterval of 10 seconds.

Note

The FlushInterval is the interval at which collectd sends data to memfaultd. Because memfaultd re-aggregates the data in memory, a shorter interval will not result in more points visible in the Memfault backend. We recommend a value between 10 and 60 seconds.

Setting this interval too high increases the timing error of the measurements: Heartbeats are time-stamped when they are generated, but the data they contain may be much older. With a FlushInterval greater than the Heartbeat interval, some Heartbeats would contain no data points.
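
The relationship between the two intervals is simple arithmetic, sketched below. The function name is hypothetical and the sketch assumes both intervals are expressed in whole seconds.

```python
def flushes_per_heartbeat(heartbeat_interval_seconds: int, flush_interval_seconds: int) -> int:
    """Number of complete collectd flushes that fit into one Heartbeat."""
    return heartbeat_interval_seconds // flush_interval_seconds


print(flushes_per_heartbeat(3600, 10))    # 360
print(flushes_per_heartbeat(3600, 60))    # 60
print(flushes_per_heartbeat(3600, 7200))  # 0
```

A result of 0 corresponds to the case described above, where the FlushInterval exceeds the Heartbeat interval and a Heartbeat can be generated without any data.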

Note that this minimal example does not capture any data. You will need to activate some collectd plugins to capture useful data. This will be covered in the next section.

See our recommended collectd configuration in meta-memfault-example. This configuration makes use of standard plugins that enjoy special support on the Memfault platform. Copying this configuration over to your project will guarantee a good first experience.

Application metrics with StatsD

An easy way to capture application metrics is to use the StatsD protocol. Metrics are captured inside your application and sent over UDP to a StatsD server. The Linux SDK supports two StatsD servers: memfaultd's built-in StatsD server and the statsd plugin for collectd. We recommend running only one of them at a time. They serve the same purpose, and both default to port 8125, so if both run with their default settings one of them will fail to start because the port is already in use.

memfaultd built-in StatsD server

memfaultd's built-in StatsD server can be enabled by providing a bind_address value for the statsd_server configuration field under the top-level metrics config.

{
  // ...
  "metrics": {
    "statsd_server": {
      "bind_address": "127.0.0.1:8125"
    },
    // ...
  },
  // ...
}

With this config, StatsD datagrams can be sent to port 8125 on localhost to be aggregated in Metric Reports. Note that memfaultd does not currently support Timer StatsD readings (which have the ms type).
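
For reference, a StatsD datagram is a short text line of the form name:value|type sent over UDP. The following sketch sends such datagrams using only the Python standard library. The helper names and the mypythonapp.requests metric are hypothetical, and limiting the sketch to counters ("c") and gauges ("g") is a choice made here, not a statement about everything the server accepts.

```python
import socket


def format_statsd(name: str, value, kind: str) -> bytes:
    """Encode one reading as a StatsD datagram: <name>:<value>|<type>."""
    if kind not in ("c", "g"):
        # Timer readings ("ms") are rejected here because memfaultd's
        # built-in server does not currently support them.
        raise ValueError(f"unsupported StatsD type in this sketch: {kind!r}")
    return f"{name}:{value}|{kind}".encode("ascii")


def send_statsd(datagram: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send a datagram over UDP; delivery is fire-and-forget."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (host, port))


if __name__ == "__main__":
    send_statsd(format_statsd("mypythonapp.mygauge", 42, "g"))
    send_statsd(format_statsd("mypythonapp.requests", 1, "c"))
```

Because UDP is connectionless, these calls succeed even when no server is listening, so confirm delivery by checking the resulting Metric Report.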

statsd plugin for collectd

collectd will summarize received StatsD metric readings and push them to memfaultd once every 10 seconds. memfaultd will aggregate them in memory (just like other system metrics captured by collectd) and send them to the Memfault backend in a Heartbeat message.

Enable the statsd plugin in your /etc/collectd.conf (note that this is already included in the recommended configuration):

LoadPlugin statsd

We also recommend the following configuration:

<Plugin statsd>
  # Default value for Host is "::" / 0.0.0.0
  Host "127.0.0.1"
  # Adds a 'count' metric for each counter (in addition to rate/second)
  CounterSum true
  # Don't dispatch gauges or counters that have not been written to in an interval
  DeleteGauges true
  DeleteCounters true
</Plugin>

This configuration will:

  • Open the statsd port to localhost only.
  • Add a count metric for each counter, which is useful to see how many times specific events happened per Heartbeat (by default, collectd will convert them to rate/second).
  • Use the DeleteGauges and DeleteCounters options so that measurements are only sent when they have been updated in the last interval.

The StatsD plugin exposes a UDP port (by default 8125). You'll need to configure your StatsD client to talk to it. Read on to see some examples.

Install a StatsD Client

You can find a list of StatsD clients for a variety of languages in the StatsD repository.

In our example project, we've added StatsD clients as dependencies:

Example: using C

See the full module in our example layer.

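#include <stdio.h>
#include <string.h>
#include <statsd-client.h>

#define MAX_LINE_LEN 200
#define PKT_LEN 1400

int main(void)
{
    /* Connect to the StatsD server on localhost; "mycapp" is the metric namespace. */
    statsd_link *link = statsd_init_with_namespace("localhost", 8125, "mycapp");
    if (link == NULL) {
        fprintf(stderr, "Failed to initialize the StatsD client\n");
        return 1;
    }

    char pkt[PKT_LEN] = {'\0'};
    char tmp[MAX_LINE_LEN] = {'\0'};

    /* Format a gauge reading ("g") named "mygauge" with value 42 into tmp. */
    statsd_prepare(link, "mygauge", 42, "g", 1.0, tmp, MAX_LINE_LEN, 1);
    /* Append the reading to the packet, leaving room for the terminator. */
    strncat(pkt, tmp, PKT_LEN - strlen(pkt) - 1);
    statsd_send(link, pkt);

    statsd_finalize(link);
    return 0;
}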

Example: using Python

See the full module in our example layer.

from statsd import StatsClient

statsd = StatsClient(
    host="localhost",
    port=8125,
    prefix="mypythonapp",
)

statsd.gauge("mygauge", 42)

Custom Device Attributes

The memfaultctl command provides an easy way to set a device-specific attribute with the write-attributes command.

# memfaultctl write-attributes APP_VERSION=1.4.2 ACTIVATED=true

Refer to the memfaultctl write-attributes documentation for details.
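
If your application sets attributes programmatically, one option is to build the command line and invoke memfaultctl as a subprocess. The sketch below only constructs the argument list. The helper name is hypothetical, and rendering booleans as true/false is an assumption based on the ACTIVATED=true example above.

```python
import subprocess


def write_attributes_command(attributes: dict) -> list:
    """Build the argument list for `memfaultctl write-attributes KEY=VALUE ...`."""
    args = ["memfaultctl", "write-attributes"]
    for key, value in attributes.items():
        if isinstance(value, bool):
            # Assumption: booleans are written in the same style as ACTIVATED=true.
            value = "true" if value else "false"
        args.append(f"{key}={value}")
    return args


command = write_attributes_command({"APP_VERSION": "1.4.2", "ACTIVATED": True})
print(command)
# subprocess.run(command, check=True)  # run on the device with sufficient privileges
```

Passing the arguments as a list rather than a shell string avoids quoting problems when attribute values contain spaces.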

Log to metrics

You can use the Memfault SDK for Linux to generate metrics directly from logs on your device. This is useful to capture metrics from applications that do not support StatsD or other metrics collection protocols.

For more details, refer to the logging guide.

Sessions

Note

Full support for Sessions in the Linux SDK was shipped with version 1.11.0. If your device is running an earlier version, you will need to upgrade to get access to session-based Metric Reporting!

Sessions offer an alternative to the periodic Heartbeat report approach. Similar to Heartbeat reports, a session Metric Report contains a set of aggregated metrics calculated from readings captured over a period of time. The difference is that while Heartbeat reports have a consistent duration (configured via heartbeat_interval_seconds), sessions are started and stopped dynamically based on events on your device. This allows you to capture metrics that track your device's health and behavior while it performs a specific action or is in a specific state.

Note

Session Reports are available to all users, and can be visualized in the individual Device Timeline views.

Advanced Analytics, which enables Metrics Charts using session reports, Segments, data retention for 1 year, and other advanced features, requires an add-on.

Advanced Analytics
This feature is limited to customers with Memfault's Advanced Analytics bundle. Please reach out to our sales or customer success team to access it.

Defining Session Types

Before a session can be captured by memfaultd, it first must be defined in the /etc/memfaultd.conf configuration file.

See the memfaultd.conf reference for details.

The following is an example of what this config might look like:

{
  "sessions": [
    {
      "captured_metrics": [
        "cpu/sum/percent/system",
        "cpu/sum/percent/user",
        "cpu/sum/percent/idle",
        "memory/memory/free",
        "memory/memory/used"
      ],
      "name": "camera-recording"
    },
    {
      "captured_metrics": [
        "cpu/sum/percent/system",
        "cpu/sum/percent/user",
        "cpu/sum/percent/idle",
        "memory/memory/free",
        "memory/memory/used"
      ],
      "name": "video-cloud-sync"
    }
  ]
}

This configuration would make sense for a smart camera device. With the above config we define two sessions, one for when the camera is recording video ("camera-recording") and one for when the camera is syncing its recorded video files to the cloud ("video-cloud-sync"). In both cases we are capturing CPU and memory metrics to measure the load these operations put on our device.

memfaultd is able to handle overlapping sessions. So if our device is uploading files to the cloud and recording video at the same time, memfaultd can capture metrics for both session Metric Reports at the same time and will create and upload two separate reports. This is important to consider when defining your device's session types!
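
To make the overlap behavior concrete, here is a toy model in Python. It is not memfaultd's implementation: the SessionTracker class is hypothetical, and averaging the readings is an arbitrary aggregation chosen for the sketch. It only shows that a single reading can contribute to every active session whose captured_metrics include its key.

```python
from collections import defaultdict


class SessionTracker:
    """Toy model of overlapping sessions; not memfaultd's actual implementation."""

    def __init__(self, session_types: dict):
        self.session_types = session_types  # session name -> set of captured metric keys
        self.active = {}                    # session name -> {metric key: [readings]}

    def start(self, name: str) -> None:
        # Starting a session type that is already in progress changes nothing.
        self.active.setdefault(name, defaultdict(list))

    def record(self, key: str, value: float) -> None:
        # A reading is added to every active session that captures this key.
        for name, readings in self.active.items():
            if key in self.session_types[name]:
                readings[key].append(value)

    def end(self, name: str) -> dict:
        # Build the report for one session; other sessions keep running.
        readings = self.active.pop(name)
        return {key: sum(values) / len(values) for key, values in readings.items()}


tracker = SessionTracker({
    "camera-recording": {"cpu/sum/percent/user"},
    "video-cloud-sync": {"cpu/sum/percent/user", "memory/memory/used"},
})
tracker.start("camera-recording")
tracker.record("cpu/sum/percent/user", 40.0)
tracker.start("video-cloud-sync")
tracker.record("cpu/sum/percent/user", 60.0)
tracker.record("memory/memory/used", 1000.0)
recording_report = tracker.end("camera-recording")
sync_report = tracker.end("video-cloud-sync")
print(recording_report)
print(sync_report)
```

The two reports differ because each session only sees the readings taken while it was active and only for the keys it captures.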

Controlling Sessions

The memfaultctl CLI provides commands for starting and ending sessions. Using the above config, we can start a session as our device begins recording video with the following command:

# memfaultctl start-session camera-recording

This command makes a request to memfaultd to start a camera-recording session. Starting a session of a type where there is already a session in progress is a no-op.

Once a session is over, memfaultctl end-session can be used to build a Metric Report for the aggregated metric readings captured during the session. Optionally, gauge readings can be passed to the session as additional arguments to memfaultctl start-session and memfaultctl end-session. In this example, we might pass a recording_errored reading with memfaultctl end-session to indicate whether our camera recording session experienced an error while it was recording video (where 1 indicates an error was experienced and 0 indicates no errors). The command to end the recording session and report no errors would look like:

# memfaultctl end-session camera-recording recording_errored=0

Once a session is ended, its report is written to disk by memfaultd and it will be uploaded the next time memfaultd uploads data (as configured by upload_interval_seconds).
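
In application code, the start and end commands pair naturally with a context manager, so the session is always ended even if the wrapped operation fails. The sketch below is one possible wrapper and is not part of the SDK: the function names are hypothetical, and it shells out to the memfaultctl commands shown above. The run parameter exists only so the wrapper can be exercised without memfaultctl installed.

```python
import subprocess
from contextlib import contextmanager


def session_command(action: str, name: str, readings=None) -> list:
    """Build `memfaultctl start-session|end-session NAME [KEY=VALUE ...]`."""
    if action not in ("start-session", "end-session"):
        raise ValueError(f"unknown action: {action!r}")
    args = ["memfaultctl", action, name]
    args += [f"{key}={value}" for key, value in (readings or {}).items()]
    return args


@contextmanager
def memfault_session(name: str, error_metric: str, run=subprocess.run):
    """Start a session on entry; end it on exit, reporting 1 if the body raised."""
    run(session_command("start-session", name), check=True)
    errored = 0
    try:
        yield
    except Exception:
        errored = 1
        raise
    finally:
        run(session_command("end-session", name, {error_metric: errored}), check=True)


# Dry run with a stand-in for subprocess.run that records the commands.
calls = []


def fake_run(args, check):
    calls.append(args)


with memfault_session("camera-recording", "recording_errored", run=fake_run):
    pass  # record video here

print(calls)
```

On a device, omit the run argument so the commands are executed by subprocess.run.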

Testing your integration

During the development phase, we recommend setting a low value (e.g. 60 seconds) for the heartbeat_interval_seconds and upload_interval_seconds settings in /etc/memfaultd.conf. Take a look at the /etc/memfaultd.conf reference for more information.

For changes in /etc/memfaultd.conf to take effect, you'll need to restart the memfaultd daemon:

$ systemctl restart memfaultd

Finally, we recommend enabling developer mode for this device in the Memfault backend to lift any rate limiting and allow you to see your data in the Memfault Web App as soon as possible.

The following section should help you figure out where you may expect data to be accessible in the Memfault Web Application.

Viewing Metrics in the Web Application

To see detailed reports from a specific device, find the device in Fleet → Devices, and then open its Timeline tab.

Open Dashboards → Metrics to create Metric Charts that monitor metrics at the fleet level by aggregating the data from each device.

To receive notifications when your metrics exceed a certain threshold or meet any complex set of criteria, you can set up Alerts. Navigate to Alerts using the main menu on the Memfault Web App.