Linux Metrics
Introduction
A key feature of the Memfault SDK is the ability to capture metrics on device, aggregate them in regular Heartbeat messages, and upload them to the Memfault cloud.
Using metrics
The aggregated metrics:
- Are shown on the device timeline.
- Are available for use as Timeseries or Attributes metrics, aggregated across the entire fleet.
- Can be used to create Metric Charts and Alerts.
See Metrics for an introduction to Memfault terminology related to metrics and to learn how the features you'll set up here can be accessed in the Memfault Web App.
Architecture
memfaultd can capture some metrics on its own as a standalone program.
These built-in metrics can be augmented by collectd to
capture further metrics from the system. All metrics from collectd
are sent to the local HTTP server that memfaultd runs on the device.
This architecture has the following advantages:
- Metrics are captured by a well-known and widely used tool, for which a wide range of plugins is available.
- You can easily push your own metrics using either the statsd plugin for collectd or memfaultd's built-in StatsD server.
- Metrics are aggregated in memory and sent to the Memfault backend in a single Heartbeat message. This reduces both the number of messages sent to the backend and the load on the device.
- Aggregated metrics are written to disk (or to volatile storage if tmp_storage is configured to point to a tmpfs), and you have precise control over how much space your diagnostic data uses.
- Memfault Fleet Sampling can be used to enable or disable metrics capture remotely.
- Until the user opts in to data collection, all data is dropped and never written to disk or sent over the network.
Prerequisites
The memfaultd
daemon
Follow the integration guide to learn how to set this up
for your device. You will need the collectd
feature
enabled to use collectd
with Memfault.
collectd
collectd is a lightweight daemon that collects system and application metrics. It is widely used in the Linux ecosystem and has a large number of plugins available to collect metrics from a variety of sources.
The collectd
feature of memfaultd
enables a collectd-compatible HTTP
endpoint. This endpoint will receive metrics pushed by the collectd daemon at
any frequency. memfaultd
aggregates the metrics (in memory) into Heartbeats
and sends them to the Memfault cloud.
In addition to standard metrics captured by collectd, you can capture custom metrics from your application using the StatsD protocol to collectd.
To use collectd with Memfault, we will configure collectd to send metrics to
memfaultd
(over HTTP on localhost
). This is our recommended configuration
and demonstrated in meta-memfault-example
.
collectd with Yocto
For Yocto, the meta-oe
layer includes a recipe for
collectd
, so you may be able to just add collectd
to your dependencies, e.g. by appending it to IMAGE_INSTALL
. In our example
project, we've opted for adding it to MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS
in
layer.conf
.
Configuring memfaultd
Configure Built-in System Metric Collection
Enabling memfaultd
's system metric collection in the metrics
config turns on memfaultd
's internal system for
capturing metrics about the resource usage and state of your device. When
memfaultd
is capturing metrics for a given top-level namespace or "family" of
metrics (for a metric like cpu/sum/percent/idle
that would be cpu
),
collectd
metric readings that share that namespace will be ignored. This is to
prevent double counting or the aggregation of readings that may have slightly
different collection methodologies together.
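The precedence rule above can be pictured with a short Python sketch. It is only an illustration of the described behavior, not memfaultd's actual implementation; the function names and the `disk/sda/used` key are hypothetical.

```python
def metric_family(key: str) -> str:
    """Return the top-level namespace ("family") of a metric key."""
    return key.split("/", 1)[0]


def drop_shadowed(collectd_keys, builtin_families):
    """Keep only collectd keys whose family is not captured by built-in collection."""
    return [k for k in collectd_keys if metric_family(k) not in builtin_families]


# Hypothetical readings arriving from collectd.
keys = ["cpu/sum/percent/idle", "memory/memory/free", "disk/sda/used"]

# Assume built-in collection currently covers only the "cpu" family.
print(drop_shadowed(keys, {"cpu"}))
# prints ['memory/memory/free', 'disk/sda/used']
```

In this model, enabling built-in collection for a family silently removes every collectd reading in that family, which is why the two sources are never mixed in one aggregate.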
For the list of system metrics that will be captured by memfaultd
when
metrics.enable_system_metric_collection
is set to true
, see the memfaultd
Built-in Metrics reference.
Configure the HTTP server
memfaultd
exposes an HTTP server on port 8787 by default. The port must match
the port that you will define in the collectd configuration. You can change the
port by setting the http_port
configuration option in /etc/memfaultd.conf
.
Metrics are expected to be sent to the /v1/collectd
path (this is not
configurable).
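For a quick smoke test of the endpoint without a running collectd, a payload can be built by hand. The sketch below follows the JSON layout that collectd's write_http plugin emits (a list of value lists). Whether memfaultd accepts a hand-built payload exactly like this one is an assumption, and the helper names `build_payload` and `post_payload` are hypothetical.

```python
import json
import time
import urllib.request

# Default port and fixed path as described above.
MEMFAULTD_URL = "http://127.0.0.1:8787/v1/collectd"


def build_payload(plugin, type_, type_instance, value, interval=10.0, host="localhost"):
    """Build a one-element list in the layout used by collectd's write_http JSON format."""
    return [{
        "values": [value],
        "dstypes": ["gauge"],
        "dsnames": ["value"],
        "time": time.time(),
        "interval": interval,
        "host": host,
        "plugin": plugin,
        "plugin_instance": "",
        "type": type_,
        "type_instance": type_instance,
    }]


def post_payload(payload, url=MEMFAULTD_URL, timeout=10):
    """POST the payload as JSON; call this only where memfaultd is listening."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

On a device, `post_payload(build_payload("memory", "memory", "free", 1024.0))` would send a single gauge reading. If you changed http_port, pass the matching URL explicitly.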
Enable metrics collection
By default, enable_data_collection
is false
(see the default
configuration). This is to enable asking end users for
consent before collecting or transmitting any data to Memfault services.
Once the end user has given their consent, you can enable data collection like so:
$ memfaultctl enable-data-collection
To disable it:
$ memfaultctl disable-data-collection
Running either of those commands restarts the memfaultd service automatically,
but only if the command changes the current setting.
Take a look at the /etc/memfaultd.conf
reference for
more information.
Configure the Heartbeat interval
All metrics pushed by collectd are aggregated in memory. A new Heartbeat will be
generated at a regular interval. The default value is 1 hour. You can change the
interval by setting the heartbeat_interval_seconds
configuration option in
/etc/memfaultd.conf
.
A Heartbeat is also generated when you shut down or restart memfaultd
and every
time the memfaultctl sync
command is used.
We recommend using the default value of 1 hour for Heartbeat aggregation.
Setting heartbeat_interval_seconds
to a low value will result in more frequent
Heartbeats that will eventually be rate-limited by Memfault's backend.
Contact us if you need to adjust this value.
Configuring collectd
We strongly recommend familiarizing yourself with collectd and how it's configured in order to make the best use of the Memfault platform.
A minimal configuration file to configure collectd to post to Memfault looks like this:
<LoadPlugin write_http>
  FlushInterval 10
</LoadPlugin>
<Plugin write_http>
  <Node "memfaultd">
    URL "http://127.0.0.1:8787/v1/collectd"
    Format "JSON"
    Metrics true
    Notifications false
    StoreRates true
    BufferSize 65536
    Timeout 10000
  </Node>
</Plugin>
This file configures the write_http
plugin to send data to the memfaultd
daemon on localhost. By default the port 8787 is used. We also set a
FlushInterval
of 10 seconds.
The FlushInterval
is the interval at which collectd will send data to
memfaultd
. Because memfaultd
will re-aggregate the data in memory, a shorter
interval will not result in more data points visible in the Memfault backend. We
recommend a value between 10 and 60 seconds.
Setting this interval too high increases the timing error of the measurements:
Heartbeats are time-stamped when they are generated, but the data they contain
may be much older. With a FlushInterval greater than the Heartbeat interval,
some Heartbeats would contain no data points at all.
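The trade-off between the two intervals comes down to simple arithmetic, sketched below in Python. The function names are hypothetical, and the staleness bound is a rough model that ignores collectd's own read interval and any transport delay.

```python
def flushes_per_heartbeat(heartbeat_interval_s: int, flush_interval_s: int) -> int:
    """Number of complete collectd flushes that fit into one Heartbeat."""
    return heartbeat_interval_s // flush_interval_s


def worst_case_staleness_s(flush_interval_s: int) -> int:
    """Rough upper bound on the age of a reading when it reaches memfaultd."""
    return flush_interval_s


# Default 1-hour Heartbeat with the 10-second FlushInterval shown above.
print(flushes_per_heartbeat(3600, 10))
# prints 360

# A FlushInterval longer than the Heartbeat interval leaves Heartbeats empty.
print(flushes_per_heartbeat(3600, 7200))
# prints 0
```

A result of 0 corresponds to the case where some Heartbeats are generated without any data points.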
Note that this minimal example does not capture any data. You will need to activate some collectd plugins to capture useful data, as covered in the next section.
Recommended configuration
See our recommended collectd configuration in
meta-memfault-example
. This configuration
makes use of standard plugins that enjoy special support on the Memfault
platform. Copying this configuration over to your project will guarantee a good
first experience.
Application metrics with StatsD
An easy way to capture application metrics is to use the StatsD protocol.
Metrics are captured inside your application and sent over UDP to a StatsD server running on the device.
There are two supported StatsD servers in the Linux SDK - memfaultd
's built-in
StatsD server and the statsd
plugin for collectd
. Our recommendation is to
run only one of these at a time, as they serve the same purpose and if both run
on their default port (8125 for both) one will fail to start due to the port
already being in use (by the other StatsD server).
memfaultd
built-in StatsD server
memfaultd
's built-in StatsD server can be enabled by providing a
bind_address
value for the statsd_server
configuration field under the
top-level metrics
config.
{
  // ...
  "metrics": {
    "statsd_server": {
      "bind_address": "127.0.0.1:8125"
    },
    // ...
  },
  // ...
}
With this config, StatsD datagrams can be sent to port 8125 on localhost
to be
aggregated in Metric Reports. Note that memfaultd
does not currently support
Timer
StatsD readings (which have the ms
type).
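To check the server without installing a StatsD client library, a datagram can be sent with Python's standard library alone. The sketch below uses the plain StatsD line format `name:value|type`. It restricts itself to counters and gauges as a conservative choice for this example, since the text above only states that timers are unsupported. The helper names and the `mycapp.mygauge` metric name are hypothetical.

```python
import socket

# Types this sketch allows; "ms" (Timer) is rejected because memfaultd
# does not currently support it.
ALLOWED_TYPES = {"c", "g"}


def format_datagram(name: str, value, metric_type: str) -> bytes:
    """Encode one reading in the StatsD line format: name:value|type."""
    if metric_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported StatsD type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(name, value, metric_type, host="127.0.0.1", port=8125):
    """Send one reading over UDP to the address set in bind_address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_datagram(name, value, metric_type), (host, port))


print(format_datagram("mycapp.mygauge", 42, "g"))
# prints b'mycapp.mygauge:42|g'
```

Because StatsD uses UDP, `send_metric` does not report whether anything was listening, so confirm delivery by looking for the metric in the next Metric Report.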
statsd
plugin for collectd
Collectd will summarize received StatsD metric readings and push them to
memfaultd
once every 10s. memfaultd
will aggregate them in memory (just like
other system metrics captured by collectd
) and send them to the Memfault
backend in a Heartbeat message.
Enable the statsd
plugin in your /etc/collectd.conf
(note that this is
already included in the
recommended configuration):
LoadPlugin statsd
We also recommend the following configuration:
<Plugin statsd>
  # Default value for Host is "::" / 0.0.0.0
  Host "127.0.0.1"
  # Adds a 'count' metric for each counter (in addition to rate/second)
  CounterSum true
  # Don't dispatch gauges or counters that have not been written to in an interval
  DeleteGauges true
  DeleteCounters true
</Plugin>
This configuration will:
- Open the statsd port to localhost only.
- Add a count metric for each counter, which is useful to see how many times specific events happened per Heartbeat (by default, collectd will convert them to rate/second).
- Use the DeleteGauges and DeleteCounters options so that measurements are only sent when they have been updated in the last interval.
The StatsD plugin exposes a UDP port (by default
8125
). You'll need to configure your StatsD client to talk to it. Read on to
see some examples.
Install a StatsD Client
You can find a list of StatsD clients for a diversity of languages in the StatsD repository.
In our example project, we've added StatsD clients as dependencies:
- statsd-c-client in our C sample app
- python3-statsd in our Python sample app
Example: using C
See the full module in our example layer.
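#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <statsd-client.h>
#include <unistd.h>

#define MAX_LINE_LEN 200
#define PKT_LEN 1400

int main(int argc, char *argv[])
{
    /* Connect to the local StatsD server; metrics are prefixed with "mycapp". */
    statsd_link *link = statsd_init_with_namespace("localhost", 8125, "mycapp");
    if (link == NULL) {
        fprintf(stderr, "failed to initialize the StatsD client\n");
        return 1;
    }

    char pkt[PKT_LEN] = {'\0'};
    char tmp[MAX_LINE_LEN] = {'\0'};

    /* Format a gauge reading ("g") with value 42 and append it to the packet. */
    statsd_prepare(link, "mygauge", 42, "g", 1.0, tmp, MAX_LINE_LEN, 1);
    strncat(pkt, tmp, PKT_LEN - 1);

    /* Send the packet over UDP, then release the client. */
    statsd_send(link, pkt);
    statsd_finalize(link);
    return 0;
}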
Example: using Python
See the full module in our example layer.
from statsd import StatsClient

statsd = StatsClient(
    host="localhost",
    port=8125,
    prefix="mypythonapp",
)
statsd.gauge("mygauge", 42)
Custom Device Attributes
The memfaultctl
command provides an easy way to set a device-specific
attribute with the write-attributes
command.
# memfaultctl write-attributes APP_VERSION=1.4.2 ACTIVATED=true
Refer to the memfaultctl write-attributes documentation for details.
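If your application is written in Python, attributes can be set by shelling out to memfaultctl. The sketch below is one possible wrapper, not part of the SDK. The helper names are hypothetical, and rendering booleans as lowercase `true`/`false` simply mirrors the command-line example above.

```python
import subprocess


def format_value(value) -> str:
    """Render booleans as true/false and everything else via str()."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def write_attributes_cmd(attributes: dict) -> list:
    """Build the memfaultctl write-attributes argument list."""
    return ["memfaultctl", "write-attributes"] + [
        f"{key}={format_value(val)}" for key, val in attributes.items()
    ]


def write_attributes(attributes: dict, runner=subprocess.run):
    """Run the command; requires memfaultctl on the device (usually as root)."""
    return runner(write_attributes_cmd(attributes), check=True)


print(write_attributes_cmd({"APP_VERSION": "1.4.2", "ACTIVATED": True}))
# prints ['memfaultctl', 'write-attributes', 'APP_VERSION=1.4.2', 'ACTIVATED=true']
```

Passing the arguments as a list rather than a shell string avoids quoting problems when an attribute value contains spaces.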
Log to metrics
You can use the Memfault SDK for Linux to generate metrics directly from logs on your device. This is useful to capture metrics from applications that do not support StatsD or other metrics collection protocols.
For more details, refer to the logging guide.
Sessions
Full support for Sessions in the Linux SDK was shipped with version 1.11.0. If your device is running an earlier version, you will need to upgrade to get access to session-based Metric Reporting!
Sessions offer an alternative to the periodic Heartbeat report approach. Similar
to a Heartbeat report, a session Metric Report contains a set of aggregated
metrics calculated from readings captured over a period of time. The difference
is that while Heartbeat reports have a consistent duration (configured via
heartbeat_interval_seconds
), sessions are started and stopped dynamically
based on events on your device. This allows you to capture metrics that track
your device's health and behavior while taking a specific action or while it is
in a specific state.
Session Reports are available to all users, and can be visualized in the individual Device Timeline views.
Advanced Analytics, which enables Metrics Charts using session reports, Segments, data retention for 1 year, and other advanced features, requires an add-on.
Defining Session Types
Before a session can be captured by memfaultd
, it first must be defined in the
/etc/memfaultd.conf
configuration file.
See the memfaultd.conf
reference for details.
The following is an example of what this config might look like:
{
  "sessions": [
    {
      "captured_metrics": [
        "cpu/sum/percent/system",
        "cpu/sum/percent/user",
        "cpu/sum/percent/idle",
        "memory/memory/free",
        "memory/memory/used"
      ],
      "name": "camera-recording"
    },
    {
      "captured_metrics": [
        "cpu/sum/percent/system",
        "cpu/sum/percent/user",
        "cpu/sum/percent/idle",
        "memory/memory/free",
        "memory/memory/used"
      ],
      "name": "video-cloud-sync"
    }
  ]
}
This configuration would make sense for a smart camera device. With the above
config we define two sessions, one for when the camera is recording video
("camera-recording"
) and one for when the camera is syncing its recorded video
files to the cloud ("video-cloud-sync
"). In both cases we are capturing CPU
and memory metrics to measure the load these operations put on our device.
memfaultd
is able to handle overlapping sessions. So if our device is
uploading files to the cloud and recording video at the same time, memfaultd
can capture metrics for both session Metric Reports at the same time and will
create and upload two separate reports. This is important to consider when
defining your device's session types!
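The overlap behavior can be pictured with a toy model in Python. This is only a mental model of the behavior described in this guide, not how memfaultd is implemented. The `SessionTracker` class is hypothetical, and it stores raw readings where memfaultd would aggregate them.

```python
from collections import defaultdict


class SessionTracker:
    """Toy model: each active session collects the readings it is configured for."""

    def __init__(self, session_types):
        # session_types maps a session name to the metric keys it captures.
        self.session_types = {name: set(keys) for name, keys in session_types.items()}
        self.active = {}

    def start(self, name):
        if name not in self.session_types:
            raise KeyError(f"undefined session type: {name}")
        # Starting a session type that is already in progress is a no-op.
        self.active.setdefault(name, defaultdict(list))

    def record(self, key, value):
        # A reading is delivered to every active session that captures its key.
        for name, readings in self.active.items():
            if key in self.session_types[name]:
                readings[key].append(value)

    def end(self, name):
        # Ending a session yields its own, separate report.
        return dict(self.active.pop(name))


tracker = SessionTracker({
    "camera-recording": ["cpu/sum/percent/user"],
    "video-cloud-sync": ["cpu/sum/percent/user", "memory/memory/used"],
})
tracker.start("camera-recording")
tracker.start("video-cloud-sync")
tracker.record("cpu/sum/percent/user", 12.5)
tracker.record("memory/memory/used", 100)
print(tracker.end("camera-recording"))
# prints {'cpu/sum/percent/user': [12.5]}
```

The same CPU reading lands in both reports, while the memory reading appears only in the session type that lists it under captured_metrics.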
Controlling Sessions
The memfaultctl
CLI provides commands for starting and ending sessions. Using
the above config, we can start a session as our device begins recording video
with the following command:
# memfaultctl start-session camera-recording
This command makes a request to memfaultd
to start a camera-recording
session. Starting a session of a type where there is already a session in
progress is a no-op.
Once a session is over, memfaultctl end-session
can be used to build a
Metric Report for the aggregated metric readings captured during the session.
Optionally, gauge readings can be passed to the session as additional arguments
to memfaultctl start-session
and memfaultctl end-session
. In this example,
we might pass a recording_errored
reading with memfaultctl end-session
to
indicate whether our camera recording session experienced an error while it was
recording video (where 1 indicates that an error occurred and 0 indicates no
errors). The command to end the recording session and report no errors would
look like:
# memfaultctl end-session camera-recording recording_errored=0
Once a session is ended, its report is written to disk by memfaultd
and it
will be uploaded the next time memfaultd
uploads data (as configured by
upload_interval_seconds
).
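In a Python application, the start and end calls can be paired with a context manager so that a session is always ended, even when the wrapped code raises. The sketch below is one possible wrapper around the memfaultctl commands shown above, not an SDK API. The `session` and `session_cmd` helpers are hypothetical.

```python
import subprocess
from contextlib import contextmanager


def session_cmd(action, name, readings=None):
    """Build a memfaultctl start-session or end-session argument list."""
    args = [f"{key}={value}" for key, value in (readings or {}).items()]
    return ["memfaultctl", f"{action}-session", name] + args


@contextmanager
def session(name, error_metric, runner=subprocess.run):
    """Start a session, then end it with a 0/1 gauge reporting whether the body failed."""
    runner(session_cmd("start", name), check=True)
    errored = 0
    try:
        yield
    except Exception:
        errored = 1
        raise
    finally:
        runner(session_cmd("end", name, {error_metric: errored}), check=True)


print(session_cmd("end", "camera-recording", {"recording_errored": 0}))
# prints ['memfaultctl', 'end-session', 'camera-recording', 'recording_errored=0']
```

On a device, wrapping the recording code in `with session("camera-recording", "recording_errored"):` would issue both commands. The `runner` parameter exists so the wrapper can be exercised without memfaultctl installed.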
Testing your integration
During the development phase, we recommend setting a low value (e.g. 60 seconds)
for the heartbeat_interval_seconds
and upload_interval_seconds
settings in
/etc/memfaultd.conf
. Take a look at the /etc/memfaultd.conf
reference for more information.
For changes in /etc/memfaultd.conf
to take effect, you'll need to restart the
memfaultd
daemon:
$ systemctl restart memfaultd
Finally, we recommend enabling developer mode for this device in the Memfault backend to lift any rate limiting and allow you to see your data in the Memfault Web App as soon as possible.
The following section should help you figure out where you may expect data to be accessible in the Memfault Web Application.
Viewing Metrics in the Web Application
To see detailed reports from a specific device, find the device in Fleet → Devices, and then open its Timeline tab.
Open Dashboards → Metrics to create Metric Charts that monitor metrics at the fleet level by aggregating the data from each device.
To receive notifications when your metrics exceed a certain threshold or meet any complex set of criteria, you can set up Alerts. Navigate to Alerts using the main menu on the Memfault Web App.