Linux Metrics
Introduction
A key feature of the Memfault SDK is the ability to capture metrics on device, aggregate them in regular Heartbeat messages, and upload them to the Memfault cloud.
Using metrics
The aggregated metrics:
- Are shown on the device timeline.
- Are available for use as Timeseries or Attributes metrics, aggregated across the entire fleet.
- Can be used to create Metric Charts and Alerts.
See Metrics for an introduction to Memfault terminology related to metrics and to learn how the features you'll set up here can be accessed in the Memfault Web App.
Prerequisites
Enable metrics collection
By default, enable_data_collection is false (see the default configuration).
This lets you ask end users for consent before collecting or transmitting any
data to Memfault services.
Once the end user has given their consent, you can enable data collection like so:
$ memfaultctl enable-data-collection
To disable it:
$ memfaultctl disable-data-collection
The memfaultd service will restart automatically whenever you run either of
those commands and the resulting value differs from the current configuration.
Take a look at the /etc/memfaultd.conf reference for more information. You can
set enable_data_collection to true to bypass this step if user consent is not
required to collect metric data.
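If your application manages the consent prompt itself, the two commands above can be wrapped in a small helper. The following is a minimal sketch, assuming your application is written in Python and that memfaultctl is on the PATH; the function names data_collection_command and apply_consent are hypothetical and not part of the Memfault SDK.

```python
import subprocess
from typing import List


def data_collection_command(user_consented: bool) -> List[str]:
    """Return the memfaultctl invocation that matches the user's choice."""
    action = "enable-data-collection" if user_consented else "disable-data-collection"
    return ["memfaultctl", action]


def apply_consent(user_consented: bool, run=subprocess.run) -> None:
    """Run memfaultctl so data collection reflects the consent choice.

    `run` is injectable so the helper can be exercised on a machine without
    memfaultctl installed; by default it shells out via subprocess.run.
    """
    run(data_collection_command(user_consented), check=True)


# Only print the commands here; call apply_consent() on a real device.
print(data_collection_command(True))
print(data_collection_command(False))
```

Keeping the command construction separate from its execution makes the consent logic easy to unit test off-device.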
Configuring Metric Collection
Enabling memfaultd's system metric collection in the metrics config turns on
memfaultd's internal system for capturing metrics about the resource usage and
state of your device.
For the list of system metrics that will be captured by memfaultd when
metrics.system_metric_collection.enable is set to true, see the memfaultd
Built-in Metrics reference.
Process Metric Collection
memfaultd can collect metrics that track the resource utilization of individual
processes on the system. When the metrics.system_metric_collection.processes
configuration is set to null, only data on the resource utilization of
memfaultd will be collected.
We recommend selecting at least a few processes on your system to monitor and
updating the default /etc/memfaultd.conf like so:
"metrics": {
  ...
  "system_metric_collection": {
    ...
    "processes": ["memfaultd", "memfault-test-app", "systemd"]
  }
}
Process-level metric collection can be disabled, even with
metrics.system_metric_collection.enable set to true, by setting
metrics.system_metric_collection.processes to []. This will disable the module
that collects process-specific metrics entirely.
Network Interface Metric Collection
memfaultd can collect metrics that track traffic on network interfaces on the
system. When the metrics.system_metric_collection.network_interfaces
configuration is set to null, memfaultd will report network traffic metrics for
all non-virtual interfaces.
You can specify the list of interfaces that should be tracked like so:
"metrics": {
  ...
  "system_metric_collection": {
    ...
    "network_interfaces": ["eth0", "wlan0"]
  }
}
Network metric collection can be disabled, even with
metrics.system_metric_collection.enable set to true, by setting
metrics.system_metric_collection.network_interfaces to []. This will disable
the module that collects network interface metrics entirely.
Disk Space Metric Collection
memfaultd can monitor the available and used disk space of individual disks on
the system. When the metrics.system_metric_collection.disk_space configuration
is set to null, memfaultd will report disk space metrics for all mounts listed
in /proc/mounts whose ID starts with /dev.
You can specify the list of disks that should be tracked like so:
"metrics": {
  ...
  "system_metric_collection": {
    ...
    "disk_space": ["/dev/sda1", "/dev/sda2", "/dev/sdb1"]
  }
}
Disk space metric collection can be disabled, even with
metrics.system_metric_collection.enable set to true, by setting
metrics.system_metric_collection.disk_space to []. This will disable the module
that collects disk space metrics entirely.
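The three settings above follow the same pattern: null keeps the default behavior, a list selects specific items, and an empty list disables the module. The sketch below generates such a fragment from Python, where None is serialized as JSON null. The helper name system_metric_collection is hypothetical, and merging the output into /etc/memfaultd.conf is left to your build or provisioning tooling.

```python
import json
from typing import List, Optional


def system_metric_collection(
    processes: Optional[List[str]] = None,
    network_interfaces: Optional[List[str]] = None,
    disk_space: Optional[List[str]] = None,
) -> dict:
    """Build a "system_metric_collection" fragment.

    None keeps the default behavior for that module (serialized as null);
    an empty list disables the module.
    """
    return {
        "enable": True,
        "processes": processes,
        "network_interfaces": network_interfaces,
        "disk_space": disk_space,
    }


fragment = {
    "metrics": {
        "system_metric_collection": system_metric_collection(
            processes=["memfaultd", "systemd"],  # monitor two processes
            network_interfaces=None,  # default: all non-virtual interfaces
            disk_space=[],  # disable disk space metrics
        )
    }
}
print(json.dumps(fragment, indent=2))
```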
Configure the Heartbeat interval
All metrics collected by memfaultd are aggregated in memory. A new Heartbeat
will be generated and written to the filesystem at a regular interval. The
default value is 1 hour. You can change the interval by setting the
heartbeat_interval_seconds configuration option in /etc/memfaultd.conf.
A Heartbeat is also generated when you shut down or restart memfaultd and every
time the memfaultctl sync command is used.
We strongly recommend using the default value of 1 hour for Heartbeat
aggregation. Setting heartbeat_interval_seconds to a low value will result in
more frequent Heartbeats that will eventually be rate-limited by Memfault's
backend.
Contact us if you need to adjust this value for reasons that are specific to your device or use case.
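To get a feel for how quickly a low interval multiplies the number of reports, the arithmetic can be sketched as follows. This counts only the periodic Heartbeats and ignores the extra ones produced by restarts and memfaultctl sync; the helper name heartbeats_per_day is hypothetical, and no specific rate-limit threshold is implied.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def heartbeats_per_day(heartbeat_interval_seconds: int) -> float:
    """Number of periodic Heartbeats generated per day for a given interval."""
    if heartbeat_interval_seconds <= 0:
        raise ValueError("heartbeat_interval_seconds must be positive")
    return SECONDS_PER_DAY / heartbeat_interval_seconds


for interval in (3600, 600, 60):
    print(f"{interval:>5} s -> {heartbeats_per_day(interval):.0f} Heartbeats per day")
```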
Application metrics with StatsD
An easy way to capture application metrics is to use the StatsD protocol.
memfaultd exposes a StatsD server at localhost:8125 (this is the default
location most StatsD client libraries will send readings to) and any application
running on the system can submit metric readings to it over UDP.
memfaultd built-in StatsD server
memfaultd's built-in StatsD server can be enabled by providing a bind_address
value for the statsd_server configuration field under the top-level metrics
config.
{
  // ...
  "metrics": {
    "statsd_server": {
      "bind_address": "127.0.0.1:8125"
    },
    // ...
  },
  // ...
}
With this config, StatsD datagrams can be sent to port 8125 on localhost to be
aggregated in Metric Reports.
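For a quick check without a client library, a StatsD datagram can be built by hand: each reading is a text line of the form name:value|type sent in a UDP packet. The sketch below uses only the Python standard library and assumes the bind address from the config above. The helper names statsd_line and send_reading are hypothetical, and which metric types memfaultd accepts beyond the gauge used in this guide is not covered here.

```python
import socket


def statsd_line(name: str, value, metric_type: str) -> bytes:
    """Encode one reading in the StatsD line format: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_reading(name, value, metric_type="g", address=("127.0.0.1", 8125)):
    """Send a single reading as one UDP datagram (fire-and-forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_line(name, value, metric_type), address)


# Show the wire format of a gauge reading ("g") and a counter increment ("c").
print(statsd_line("mygauge", 42, "g"))
print(statsd_line("mycounter", 1, "c"))
```

On a device running memfaultd with the StatsD server enabled, calling send_reading("mygauge", 42) would submit one gauge reading. Because UDP is connectionless, the call does not report whether a server actually received it.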
StatsD Client Libraries
You can find a list of StatsD client libraries for a variety of languages in the StatsD repository.
In our meta-memfault-example distro, we've added StatsD clients as
dependencies:
- statsd-c-client in our C sample app
- python3-statsd in our Python sample app
Example: using C
See the full module in our example layer.
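#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <statsd-client.h>

#define MAX_LINE_LEN 200
#define PKT_LEN 1400

int main(void)
{
    /* Connect to memfaultd's StatsD server; "mycapp" prefixes every metric name. */
    statsd_link *link = statsd_init_with_namespace("localhost", 8125, "mycapp");
    if (link == NULL) {
        fprintf(stderr, "failed to initialize StatsD client\n");
        return EXIT_FAILURE;
    }

    char pkt[PKT_LEN] = {'\0'};
    char tmp[MAX_LINE_LEN] = {'\0'};

    /* Format a gauge reading ("g") of 42 into tmp, then append it to the packet. */
    statsd_prepare(link, "mygauge", 42, "g", 1.0, tmp, MAX_LINE_LEN, 1);
    strncat(pkt, tmp, PKT_LEN - 1);

    /* Send the packet over UDP and release the client. */
    statsd_send(link, pkt);
    statsd_finalize(link);
    return EXIT_SUCCESS;
}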
Example: using Python
See the full module in our example layer.
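from statsd import StatsClient

# Connect to memfaultd's StatsD server; "mypythonapp" prefixes every metric name.
statsd = StatsClient(
    host="localhost",
    port=8125,
    prefix="mypythonapp",
)

# Report a gauge reading of 42.
statsd.gauge("mygauge", 42)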
Custom Device Attributes
The memfaultctl command provides an easy way to set a device-specific attribute
with the write-attributes command.
# memfaultctl write-attributes APP_VERSION=1.4.2 ACTIVATED=true
Refer to the memfaultctl write-attributes documentation for details.
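When attributes come from application state, the KEY=VALUE arguments can be assembled programmatically. This is a minimal sketch that mirrors the command shown above; the helper name write_attributes_command is hypothetical, and writing booleans in lowercase is an assumption based on the ACTIVATED=true example.

```python
import subprocess
from typing import Dict, List


def write_attributes_command(attributes: Dict[str, object]) -> List[str]:
    """Build a `memfaultctl write-attributes KEY=VALUE ...` argument list."""
    if not attributes:
        raise ValueError("at least one attribute is required")
    args = ["memfaultctl", "write-attributes"]
    for key, value in attributes.items():
        # Assumption: booleans are written lowercase, as in ACTIVATED=true.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        args.append(f"{key}={text}")
    return args


def write_attributes(attributes: Dict[str, object], run=subprocess.run) -> None:
    """Invoke memfaultctl; `run` is injectable for testing off-device."""
    run(write_attributes_command(attributes), check=True)


print(" ".join(write_attributes_command({"APP_VERSION": "1.4.2", "ACTIVATED": True})))
```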
Log to metrics
You can use the Memfault SDK for Linux to generate metrics directly from logs on your device. This is useful to capture metrics from applications that do not support StatsD or other metrics collection protocols.
For more details, refer to the logging guide.
Session Metric Reports
Full support for Sessions in the Linux SDK was shipped with version 1.11.0. If your device is running an earlier version, you will need to upgrade to get access to session-based Metric Reporting!
Sessions offer an alternative to the periodic Heartbeat report approach. Similar
to Heartbeat reports, a session Metric Report contains a set of aggregated
metrics calculated from readings captured over a period of time. The difference
is that while Heartbeat reports have a consistent duration (configured via
heartbeat_interval_seconds), sessions are started and stopped dynamically based
on events on your device. This allows you to capture metrics that track your
device's health and behavior while it is taking a specific action or while it
is in a specific state.
Session Reports are available to all users, and can be visualized in the individual Device Timeline views.
Advanced Analytics, which enables Metrics Charts using session reports, Segments, data retention for 1 year, and other advanced features, requires an add-on.
Defining Session Types
Before a session can be captured by memfaultd, it must first be defined in the
/etc/memfaultd.conf configuration file.
See the memfaultd.conf reference for details.
The following is an example of what this config might look like:
{
  "sessions": [
    {
      "captured_metrics": [
        "cpu/sum/percent/system",
        "cpu/sum/percent/user",
        "cpu/sum/percent/idle",
        "memory/memory/free",
        "memory/memory/used"
      ],
      "name": "camera-recording"
    },
    {
      "captured_metrics": [
        "cpu/sum/percent/system",
        "cpu/sum/percent/user",
        "cpu/sum/percent/idle",
        "memory/memory/free",
        "memory/memory/used"
      ],
      "name": "video-cloud-sync"
    }
  ]
}
This configuration would make sense for a smart camera device. With the above
config we define two sessions, one for when the camera is recording video
("camera-recording") and one for when the camera is syncing its recorded video
files to the cloud ("video-cloud-sync"). In both cases we are capturing CPU and
memory metrics to measure the load these operations put on our device.
memfaultd is able to handle overlapping sessions. So if our device is uploading
files to the cloud and recording video at the same time, memfaultd can capture
metrics for both session Metric Reports at the same time and will create and
upload two separate reports. This is important to consider when defining your
device's session types!
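When several session types capture the same metrics, generating the sessions block from a shared list avoids copy-paste mistakes. The following is a sketch under the assumption that the config is produced by a build or provisioning script; the helper name sessions_config is hypothetical. It drops repeated metric keys while preserving their order.

```python
import json
from typing import Dict, List


def sessions_config(session_metrics: Dict[str, List[str]]) -> dict:
    """Build a "sessions" config from a {session name: metric keys} mapping."""
    sessions = []
    for name, metrics in session_metrics.items():
        unique = list(dict.fromkeys(metrics))  # drop duplicates, keep order
        sessions.append({"captured_metrics": unique, "name": name})
    return {"sessions": sessions}


LOAD_METRICS = [
    "cpu/sum/percent/system",
    "cpu/sum/percent/user",
    "cpu/sum/percent/idle",
    "memory/memory/free",
    "memory/memory/used",
]

config = sessions_config(
    {"camera-recording": LOAD_METRICS, "video-cloud-sync": LOAD_METRICS}
)
print(json.dumps(config, indent=2))
```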
Controlling Sessions
The memfaultctl CLI provides commands for starting and ending sessions. Using
the above config, we can start a session as our device begins recording video
with the following command:
# memfaultctl start-session camera-recording
This command makes a request to memfaultd to start a camera-recording session.
Starting a session of a type where there is already a session in progress is a
no-op.
Once a session is over, memfaultctl end-session can be used to build a Metric
Report for the aggregated metric readings captured during the session.
Optionally, gauge readings can be passed to the session as additional arguments
to memfaultctl start-session and memfaultctl end-session. In this example, we
might pass a recording_errored reading with memfaultctl end-session to indicate
whether our camera recording session experienced an error while it was
recording video (where 1 indicates an error was experienced and 0 indicates no
errors). The command to end the recording session and report no errors would
look like:
# memfaultctl end-session camera-recording recording_errored=0
Once a session is ended, its report is written to disk by memfaultd and it will
be uploaded the next time memfaultd uploads data (as configured by
upload_interval_seconds).
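In application code, pairing the start and end commands is easy to get wrong when the tracked operation can fail. One way to guarantee that every started session is also ended is a context manager, sketched below under the assumption that the application is written in Python and shells out to memfaultctl. The name memfault_session and its error_metric parameter are hypothetical; the recording_errored reading follows the example above.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def memfault_session(name, run=subprocess.run, error_metric="recording_errored"):
    """Start a session on entry and always end it on exit.

    The yielded dict collects gauge readings to pass to end-session.
    `run` is injectable so the logic can be tested without memfaultctl.
    """
    end_readings = {}
    run(["memfaultctl", "start-session", name], check=True)
    try:
        yield end_readings
    except Exception:
        end_readings[error_metric] = 1  # the wrapped operation failed
        raise
    else:
        end_readings.setdefault(error_metric, 0)
    finally:
        args = [f"{key}={value}" for key, value in end_readings.items()]
        run(["memfaultctl", "end-session", name, *args], check=True)


# Dry run: print the commands instead of executing them.
with memfault_session("camera-recording", run=lambda cmd, check: print(" ".join(cmd))):
    pass  # record video here
```

The dry run prints the same two commands shown earlier in this section; on a device, omit the run argument so the commands are actually executed.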
High Resolution Telemetry
Memfault's standard Metric Reports aggregate metric readings over the duration of the report into a single value for each Metric. The aggregation strategy depends on the Metric Type: Counters are summed, Histograms are averaged, etc. High Resolution Telemetry provides an alternative that can be used to enable deeper debugging by storing each metric reading's value individually alongside the timestamp at which the reading was received. This will increase the amount of data uploaded to Memfault but can provide insights at a second-by-second level that can aid in debugging.
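The difference between the two approaches can be illustrated with a few counter readings. This is a conceptual sketch only: it does not reflect memfaultd's internal implementation or upload format, and the function names are hypothetical.

```python
from typing import List, Tuple

Reading = Tuple[float, float]  # (unix timestamp in seconds, value)


def aggregate_counter(readings: List[Reading]) -> float:
    """Metric Report style: a Counter collapses to the sum of its readings."""
    return sum(value for _, value in readings)


def high_resolution(readings: List[Reading]) -> List[Reading]:
    """High Resolution Telemetry style: keep every reading with its timestamp."""
    return sorted(readings)


readings = [(1700000002.0, 3.0), (1700000000.0, 1.0), (1700000001.0, 2.0)]
print(aggregate_counter(readings))  # one value for the whole report
print(high_resolution(readings))  # one entry per reading, in time order
```

The aggregated form is a single number regardless of how many readings arrive, while the high-resolution form grows with every reading, which is why the latter increases upload volume.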
High Resolution Telemetry is enabled by default and can be disabled by setting
metrics.hrt.enable to false. The maximum number of readings per minute is 750
by default and can be lowered via the metrics.hrt.max_samples_per_minute
configuration field.
Unlike Heartbeat and Session Metric Reports, High Resolution Telemetry Data will only be uploaded if a Device's Fleet Sampling configuration for Debugging is set to "On".
Testing your integration
During the development phase, we recommend setting a low value (e.g. 60 seconds)
for the heartbeat_interval_seconds and upload_interval_seconds settings in
/etc/memfaultd.conf. Take a look at the /etc/memfaultd.conf reference for more
information.
For changes in /etc/memfaultd.conf to take effect, you'll need to restart the
memfaultd daemon:
$ systemctl restart memfaultd
Finally, we recommend enabling developer mode for this device in the Memfault backend to lift any rate limiting and allow you to see your data in the Memfault Web App as soon as possible.
The following section should help you figure out where you may expect data to be accessible in the Memfault Web Application.
Viewing Metrics in the Web Application
To see detailed reports from a specific device, find the device in Fleet → Devices, and then open its Timeline tab.
Open Dashboards → Metrics to create Metric Charts that monitor metrics at the fleet level by aggregating the data from each device.
To receive notifications when your metrics exceed a certain threshold or meet any complex set of criteria, you can set up Alerts. Navigate to Alerts using the main menu on the Memfault Web App.