Linux Metrics
Introduction
A key feature of the Memfault SDK is the ability to capture metrics on device, aggregate them in regular Heartbeat messages, and upload them to the Memfault cloud.
Using metrics
The aggregated metrics:
- Are shown on the device timeline.
- Are available for use as Timeseries or Attributes metrics, aggregated across the entire fleet.
- Can be used to create Metric Charts and Alerts.
See Metrics for an introduction to Memfault terminology related to metrics and to learn how the features you'll set up here can be accessed in the Memfault Web App.
Prerequisites
Enable metrics collection
By default, enable_data_collection is false (see the default
configuration). This lets you ask end users for consent
before collecting or transmitting any data to Memfault services.
Once the end user has given their consent, you can enable data collection like so:
$ memfaultctl enable-data-collection
To disable it:
$ memfaultctl disable-data-collection
The memfaultd service restarts automatically when either of those commands
changes the value from the current configuration.
Take a look at the /etc/memfaultd.conf reference for
more information. You can set enable_data_collection to true to bypass this
step if user consent is not required to collect metric data.
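As a minimal sketch of how an application could connect its own consent flow to these commands, the helper below maps a yes/no decision to the matching memfaultctl subcommand. The function names are hypothetical, and how consent is obtained is left to the application; only the two memfaultctl subcommands come from this guide.

```python
import subprocess
from typing import List


def data_collection_command(consent_given: bool) -> List[str]:
    """Return the memfaultctl invocation matching the user's consent decision."""
    subcommand = "enable-data-collection" if consent_given else "disable-data-collection"
    return ["memfaultctl", subcommand]


def apply_consent(consent_given: bool) -> None:
    """Run memfaultctl; raises CalledProcessError if the command exits non-zero."""
    subprocess.run(data_collection_command(consent_given), check=True)


# Usage (hypothetical): call apply_consent(True) once the user has opted in,
# and apply_consent(False) if they later withdraw consent.
```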
Configuring Metric Collection
Enabling memfaultd's system metric collection in the metrics
config turns on memfaultd's internal system for
capturing metrics about the resource usage and state of your device.
For the list of system metrics that memfaultd captures when
metrics.system_metric_collection.enable is set to true, see the memfaultd
Built-in Metrics reference.
Process Metric Collection
memfaultd can collect metrics that track the resource utilization of
individual processes on the system. When the
metrics.system_metric_collection.processes configuration is set to null, only
data on the resource utilization of memfaultd will be collected.
We recommend selecting at least a few processes on your system to monitor and
updating the default /etc/memfaultd.conf like so:
"metrics": {
  ...
  "system_metric_collection": {
    ...
    "processes": ["memfaultd", "memfault-test-app", "systemd"]
  }
}
Process-level metric collection can be disabled, even with
metrics.system_metric_collection.enable set to true, by setting
metrics.system_metric_collection.processes to []. This will disable the
module that collects process-specific metrics entirely.
Network Interface Metric Collection
memfaultd can collect metrics that track traffic on network interfaces on the
system. When the metrics.system_metric_collection.network_interfaces
configuration is set to null, memfaultd will report network traffic metrics
for all non-virtual interfaces.
You can specify the list of interfaces that should be tracked like so:
"metrics": {
  ...
  "system_metric_collection": {
    ...
    "network_interfaces": ["eth0", "wlan0"]
  }
}
Network metric collection can be disabled, even with
metrics.system_metric_collection.enable set to true, by setting
metrics.system_metric_collection.network_interfaces to []. This will disable
the module that collects network interface metrics entirely.
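To help decide which names to put in network_interfaces, the sketch below lists the interfaces on a device that Linux does not classify as virtual. It relies on the sysfs convention that virtual interfaces (such as lo, bridges, or veth pairs) resolve to a path under /sys/devices/virtual. Whether memfaultd uses the same criterion for its default is an assumption, and the function name is hypothetical.

```python
import os
from typing import List


def physical_interfaces(sys_class_net: str = "/sys/class/net") -> List[str]:
    """List interfaces whose sysfs entry does not resolve under devices/virtual."""
    names = []
    for name in sorted(os.listdir(sys_class_net)):
        path = os.path.join(sys_class_net, name)
        if not os.path.isdir(path):
            continue  # skip plain files such as bonding_masters
        if "/devices/virtual/" not in os.path.realpath(path):
            names.append(name)
    return names


# Usage on a device:
#   print(physical_interfaces())
```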
Disk Space Metric Collection
memfaultd can monitor the available and used disk space of individual disks on
the system. When the metrics.system_metric_collection.disk_space configuration
is set to null, memfaultd will report disk space metrics for all mounts
listed in /proc/mounts whose ID starts with /dev.
You can specify the list of disks that should be tracked like so:
"metrics": {
  ...
  "system_metric_collection": {
    ...
    "disk_space": ["/dev/sda1", "/dev/sda2", "/dev/sdb1"]
  }
}
Disk space metric collection can be disabled, even with
metrics.system_metric_collection.enable set to true, by setting
metrics.system_metric_collection.disk_space to []. This will disable the
module that collects disk space metrics entirely.
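To preview which disks the default (null) setting would likely cover, the following sketch extracts the entries from /proc/mounts whose first field starts with /dev. It assumes the "ID" mentioned above is the device name in the first column of /proc/mounts; the function name is hypothetical.

```python
from typing import List


def dev_mounts(mounts_text: str) -> List[str]:
    """Return unique device names from /proc/mounts content that start with /dev."""
    devices = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("/dev") and fields[0] not in devices:
            devices.append(fields[0])
    return devices


# Usage on a device:
#   with open("/proc/mounts") as f:
#       print(dev_mounts(f.read()))
```

The resulting list can be trimmed and copied into the disk_space array if only some of the disks are of interest.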
Configure the Heartbeat interval
All metrics collected by memfaultd are aggregated in memory. A new Heartbeat
will be generated and written to the filesystem at a regular interval. The
default value is 1 hour. You can change the interval by setting the
heartbeat_interval_seconds configuration option in /etc/memfaultd.conf.
A Heartbeat is also generated when you shut down or restart memfaultd and every
time the memfaultctl sync command is used.
We strongly recommend using the default value of 1 hour for Heartbeat
aggregation. Setting heartbeat_interval_seconds to a low value will result in
more frequent Heartbeats that will eventually be rate-limited by Memfault's
backend.
Contact us if you need to adjust this value for reasons that are specific to your device or use case.
Custom Metrics with StatsD
An easy way to capture custom application metrics is to use the StatsD protocol.
memfaultd exposes a StatsD server at localhost:8125 (this is the default
location most StatsD client libraries will send readings to) and any application
running on the system can submit metric readings to it over UDP.
memfaultd built-in StatsD server
memfaultd's built-in StatsD server can be enabled by providing a
bind_address value for the statsd_server configuration field under the
top-level metrics config.
{
  // ...
  "metrics": {
    "statsd_server": {
      "bind_address": "127.0.0.1:8125"
    },
    // ...
  },
  // ...
}
With this config, StatsD datagrams can be sent to port 8125 on localhost to be
aggregated in Metric Reports.
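Because StatsD is a plain-text protocol over UDP, a reading can also be sent without any client library. The sketch below encodes one reading in the StatsD line format (name:value|type) and sends it to the address configured above. The function names and the metric names in the usage comment are hypothetical.

```python
import socket


def format_statsd(name: str, value: float, metric_type: str) -> bytes:
    """Encode one reading in the StatsD line format: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_reading(name, value, metric_type, host="127.0.0.1", port=8125):
    """Send a single reading over UDP. UDP is fire-and-forget: no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_statsd(name, value, metric_type), (host, port))


# Usage (hypothetical metric names):
#   send_reading("myapp.queue_depth", 42, "g")  # gauge
#   send_reading("myapp.requests", 1, "c")      # counter
```

Since delivery is not acknowledged, a reading sent while memfaultd is not listening is silently dropped; this keeps the sender from blocking on the metrics path.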
StatsD Client Libraries
You can find a list of StatsD client libraries for a variety of languages in the StatsD repository.
In our meta-memfault-example distro, we've added StatsD clients as
dependencies:
- statsd-c-client in our C sample app
- python3-statsd in our Python sample app
Example: using C
See the full module in our example layer.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <statsd-client.h>
#include <unistd.h>

#define MAX_LINE_LEN 200
#define PKT_LEN 1400

int main(int argc, char *argv[])
{
    /* Connect to memfaultd's StatsD server; metric keys are namespaced with "mycapp". */
    statsd_link *link;
    link = statsd_init_with_namespace("localhost", 8125, "mycapp");

    char pkt[PKT_LEN] = {'\0'};
    char tmp[MAX_LINE_LEN] = {'\0'};

    /* Format a gauge reading named "mygauge" with value 42 into tmp. */
    statsd_prepare(link, "mygauge", 42, "g", 1.0, tmp, MAX_LINE_LEN, 1);
    strncat(pkt, tmp, PKT_LEN - 1);

    /* Send the packet over UDP, then release the link. */
    statsd_send(link, pkt);
    statsd_finalize(link);

    return 0;
}
Example: using Python
See the full module in our example layer.
from statsd import StatsClient

# Connect to memfaultd's StatsD server; metric keys are prefixed with "mypythonapp".
statsd = StatsClient(
    host="localhost",
    port=8125,
    prefix="mypythonapp",
)

statsd.gauge("mygauge", 42)
Custom Metrics with memfaultctl
The memfaultctl command permits writing custom metrics to memfaultd using
the write-metrics command. This is primarily intended for testing and
development.
# memfaultctl write-metrics mygauge=42 'mycounter:1|c'
Supported metric argument types:
- KEY=VALUE: memfaultctl will attempt to convert the value to a floating-point number as a Gauge metric, falling back to a string value.
- statsd-style KEY:VALUE|TYPE: memfaultctl will parse the value as a floating-point number. Accepted types are c, g, ms, h, or s.
See the memfaultctl write-metrics documentation.
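When calling write-metrics from a test script rather than an interactive shell, the arguments can be built programmatically. The sketch below formats both argument styles described above and passes them to memfaultctl as an argument list, so the "|" in statsd-style arguments is never interpreted by a shell. The helper names are hypothetical.

```python
import subprocess


def metric_args(gauges=None, statsd=None):
    """Build write-metrics arguments.

    gauges: mapping of key -> value, emitted as KEY=VALUE.
    statsd: iterable of (key, value, type) tuples, emitted as KEY:VALUE|TYPE.
    """
    args = [f"{key}={value}" for key, value in (gauges or {}).items()]
    args += [f"{key}:{value}|{mtype}" for key, value, mtype in (statsd or [])]
    return args


def write_metrics(gauges=None, statsd=None):
    """Invoke memfaultctl with an argument list (no shell involved)."""
    subprocess.run(["memfaultctl", "write-metrics", *metric_args(gauges, statsd)], check=True)


# Usage:
#   write_metrics(gauges={"mygauge": 42}, statsd=[("mycounter", 1, "c")])
```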
Custom Device Attributes
The memfaultctl command provides an easy way to set a device-specific
attribute with the write-attributes command.
# memfaultctl write-attributes APP_VERSION=1.4.2 ACTIVATED=true
Refer to the memfaultctl write-attributes documentation.
Log to metrics
You can use the Memfault SDK for Linux to generate metrics directly from logs on your device. This is useful to capture metrics from applications that do not support StatsD or other metrics collection protocols.
For more details, refer to the logging guide.
Session Metric Reports
Full support for Sessions in the Linux SDK was shipped with version 1.11.0. If your device is running an earlier version, you will need to upgrade to get access to session-based Metric Reporting!
Sessions offer an alternative to the periodic Heartbeat report approach. Similar
to Heartbeat reports, a session Metric Report contains a set of aggregated
metrics calculated from readings captured over a period of time. The difference
is that while Heartbeat reports have a consistent duration (configured via
heartbeat_interval_seconds), sessions are started and stopped dynamically
based on events on your device. This allows you to capture metrics that track
your device's health and behavior while taking a specific action or while it is
in a specific state.
Defining Session Types
Before a session can be captured by memfaultd, it first must be defined in the
/etc/memfaultd.conf configuration file.
See the memfaultd.conf reference for details.
The following is an example of what this config might look like:
{
  "sessions": [
    {
      "captured_metrics": [
        "cpu/sum/percent/system",
        "cpu/sum/percent/user",
        "cpu/sum/percent/idle",
        "memory/memory/free",
        "memory/memory/used"
      ],
      "name": "camera-recording"
    },
    {
      "captured_metrics": [
        "cpu/sum/percent/system",
        "cpu/sum/percent/user",
        "cpu/sum/percent/idle",
        "memory/memory/free",
        "memory/memory/used"
      ],
      "name": "video-cloud-sync"
    }
  ]
}
This configuration would make sense for a smart camera device. With the above
config we define two sessions, one for when the camera is recording video
("camera-recording") and one for when the camera is syncing its recorded video
files to the cloud ("video-cloud-sync"). In both cases we are capturing CPU
and memory metrics to measure the load these operations put on our device.
memfaultd is able to handle overlapping sessions. So if our device is
uploading files to the cloud and recording video at the same time, memfaultd
can capture metrics for both session Metric Reports at the same time and will
create and upload two separate reports. This is important to consider when
defining your device's session types!
Controlling Sessions
The memfaultctl CLI provides commands for starting and ending sessions. Using
the above config, we can start a session as our device begins recording video
with the following command:
# memfaultctl start-session camera-recording
This command makes a request to memfaultd to start a camera-recording
session. Starting a session of a type where there is already a session in
progress is a no-op.
Once a session is over, memfaultctl end-session can be used to build a
Metric Report for the aggregated metric readings captured during the session.
Optionally, gauge readings can be passed to the session as additional arguments
to memfaultctl start-session and memfaultctl end-session. In this example,
we might pass a recording_errored reading with memfaultctl end-session to
indicate whether our camera recording session experienced an error while it was
recording video (where 1 indicates an error was experienced and 0 indicates no
errors). The command to end the recording session and report no errors would
look like:
# memfaultctl end-session camera-recording recording_errored=0
Once a session is ended, its report is written to disk by memfaultd and it
will be uploaded the next time memfaultd uploads data (as configured by
upload_interval_seconds).
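An application can wrap the start and end commands around the code that performs the work. The following sketch uses a Python context manager that starts a session, runs the body, and ends the session with a gauge reading of 1 if the body raised an exception and 0 otherwise. The helper names, the injected run parameter, and record_video() are hypothetical; the memfaultctl subcommands and the recording_errored example come from this guide.

```python
import subprocess
from contextlib import contextmanager


def _memfaultctl(*args):
    """Run a memfaultctl subcommand; raises CalledProcessError on failure."""
    subprocess.run(["memfaultctl", *args], check=True)


@contextmanager
def memfault_session(name, error_metric, run=_memfaultctl):
    """Start a session, then end it with <error_metric>=1 if the body raised, else 0."""
    run("start-session", name)
    errored = 0
    try:
        yield
    except Exception:
        errored = 1
        raise
    finally:
        run("end-session", name, f"{error_metric}={errored}")


# Usage, with record_video() standing in for the application's own code:
#   with memfault_session("camera-recording", "recording_errored"):
#       record_video()
```

Ending the session in a finally clause means the Metric Report is still built when the recording fails, which is usually the case most worth inspecting.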
High Resolution Telemetry
Memfault's standard Metric Reports aggregate the readings received over the duration of the report into a single value for each Metric. The aggregation strategy depends on the Metric Type: Counters are summed, Histograms are averaged, and so on. High Resolution Telemetry provides an alternative that enables deeper debugging by storing each metric reading's value individually, alongside the timestamp at which the reading was received. This increases the amount of data uploaded by Memfault but can provide second-by-second insights that aid in debugging.
High Resolution Telemetry is enabled by default and can be disabled by setting
metrics.hrt.enable to false. The maximum number of readings per minute is
750 by default and can be lowered via the metrics.hrt.max_samples_per_minute
configuration field.
Unlike Heartbeat and Session Metric Reports, High Resolution Telemetry Data will only be uploaded if a Device's Fleet Sampling configuration for Debugging is set to "On".
Testing your integration
During the development phase, we recommend setting a low value (e.g. 60 seconds)
for the heartbeat_interval_seconds and upload_interval_seconds settings in
/etc/memfaultd.conf. Take a look at the /etc/memfaultd.conf
reference for more information.
For changes in /etc/memfaultd.conf to take effect, you'll need to restart the
memfaultd daemon:
$ systemctl restart memfaultd
Finally, we recommend enabling developer mode for this device in the Memfault backend to lift any rate limiting and allow you to see your data in the Memfault Web App as soon as possible.
The following section describes where to find your data in the Memfault Web App.
Viewing Metrics in the Web Application
To see detailed reports from a specific device, find the device in Fleet → Devices, and then open its Timeline tab.
Open Dashboards → Metrics to create Metric Charts that monitor metrics at the fleet level by aggregating the data from each device.
To receive notifications when your metrics exceed a certain threshold or meet any complex set of criteria, you can set up Alerts. Navigate to Alerts using the main menu on the Memfault Web App.