memfaultd Built-in Metrics

Built-in System Metric Collection

With built-in system Metric collection enabled, memfaultd captures readings on the state and health of your Device. These readings are aggregated into the Metric Reports uploaded to Memfault.

When built-in system Metric collection is enabled, readings from collectd that share a top-level namespace with one of the below Metric groups will be dropped. This is to avoid double-counting or aggregating readings that may be calculated differently.

Aggregation Types

As with StatsD Metric readings, memfaultd applies different aggregation types internally to the Metrics it collects itself. The aggregation types are listed below:

  • Histogram: An average of all readings captured within a Metric Report, weighted evenly. Some Histogram Metrics also report minimum and maximum values -- see Min/Max Metrics below.
  • Gauge: Stores the most recent value for a Metric. When a new reading is received, the old value is discarded and replaced with the new reading's value.
  • Counter: A simple monotonically increasing sum. New readings within the duration of a Metric Report are added to the sum of all previous readings captured within that Metric Report.
  • String: Stores a string value as a report tag. Used to capture identifying information such as Device names or OUI values.
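The aggregation rules above can be sketched in a few lines of Python. This is an illustrative model only, not memfaultd's implementation; the class and method names (`Histogram`, `Gauge`, `Counter`, `add`, `value`) are hypothetical and chosen for this example.

```python
class Histogram:
    """Evenly weighted average of all readings in one Metric Report."""

    def __init__(self):
        self._sum = 0.0
        self._count = 0

    def add(self, reading):
        self._sum += reading
        self._count += 1

    def value(self):
        # No readings yet means there is nothing to average.
        return self._sum / self._count if self._count else None


class Gauge:
    """Keeps only the most recent reading; older values are discarded."""

    def __init__(self):
        self._last = None

    def add(self, reading):
        self._last = reading

    def value(self):
        return self._last


class Counter:
    """Sum of all readings captured within one Metric Report."""

    def __init__(self):
        self._total = 0

    def add(self, reading):
        self._total += reading

    def value(self):
        return self._total


if __name__ == "__main__":
    histogram, gauge, counter = Histogram(), Gauge(), Counter()
    for reading in (10, 20, 60):
        histogram.add(reading)
        gauge.add(reading)
        counter.add(reading)
    print(histogram.value(), gauge.value(), counter.value())  # 30.0 60 90
```

A String Metric would behave like the `Gauge` above, except that it stores a text value rather than a number.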

Min/Max Metrics

For Histogram Metrics, memfaultd normally reports only the average of all readings captured during a Metric Report interval. When min/max reporting is enabled for a Metric, memfaultd will emit three values per interval instead:

  • <metric_name> -- the average of all readings
  • <metric_name>_min -- the minimum reading observed
  • <metric_name>_max -- the maximum reading observed

This is useful for detecting spikes or drops that would otherwise be hidden by averaging -- for example, a brief CPU spike to 100% that averages out to 40% over the heartbeat interval.
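As a small worked example of the CPU spike case, the sketch below computes the three values for one interval. The function name `summarize` and the sample readings are assumptions made for illustration; only the `_min` and `_max` suffixes come from the text above.

```python
def summarize(metric_name, readings):
    """Return the average, minimum, and maximum for one report interval."""
    if not readings:
        raise ValueError("at least one reading is required")
    return {
        metric_name: sum(readings) / len(readings),
        f"{metric_name}_min": min(readings),
        f"{metric_name}_max": max(readings),
    }


if __name__ == "__main__":
    # Four quiet samples and one brief spike to 100%.
    print(summarize("cpu_usage_pct", [25, 25, 25, 25, 100]))
    # {'cpu_usage_pct': 40.0, 'cpu_usage_pct_min': 25, 'cpu_usage_pct_max': 100}
```

The average alone (40%) gives no hint of the spike; the `_max` value makes it visible.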

Default min/max Metrics

The following built-in Histogram Metrics report min/max values automatically:

| Metric pattern | Description |
| --- | --- |
| cpu_usage_pct | Overall CPU usage |
| memory_pct | Overall memory usage |
| cpu_usage_<process>_pct | Per-process CPU usage |
| memory_<process>_pct | Per-process memory usage |
| interface/<interface>/bytes_per_second/rx | Network interface RX throughput |
| interface/<interface>/bytes_per_second/tx | Network interface TX throughput |
| thermal/* | All thermal zone metrics |

Adding custom min/max Metrics

You can configure additional Histogram Metrics to report min/max values using the metrics.min_max_metrics configuration option. For example, to track min/max for custom application Metrics:

{
  "metrics": {
    "min_max_metrics": ["my_custom_metric", "my_service_latency"]
  }
}

This will cause my_custom_metric and my_service_latency to each emit _min and _max values alongside their averages in every Metric Report. These custom entries are merged with the default min/max metrics listed above.
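One way to picture the merge is as a single combined pattern list that each Histogram Metric name is checked against. The sketch below is a guess at that behavior, not memfaultd's code: it assumes the `<process>` and `<interface>` placeholders act like `*` wildcards, and it treats custom entries as patterns too, which the documentation does not confirm. The names `DEFAULT_MIN_MAX_PATTERNS` and `reports_min_max` are hypothetical.

```python
import fnmatch

# Default patterns from the table above, with placeholders written as
# "*" wildcards (an assumption made for this sketch).
DEFAULT_MIN_MAX_PATTERNS = [
    "cpu_usage_pct",
    "memory_pct",
    "cpu_usage_*_pct",
    "memory_*_pct",
    "interface/*/bytes_per_second/rx",
    "interface/*/bytes_per_second/tx",
    "thermal/*",
]


def effective_patterns(custom):
    """Merge custom entries with the defaults, skipping duplicates."""
    extra = [p for p in custom if p not in DEFAULT_MIN_MAX_PATTERNS]
    return DEFAULT_MIN_MAX_PATTERNS + extra


def reports_min_max(metric_name, custom=()):
    """True if the metric would emit _min and _max under this model."""
    return any(
        fnmatch.fnmatchcase(metric_name, pattern)
        for pattern in effective_patterns(list(custom))
    )


if __name__ == "__main__":
    custom = ["my_custom_metric", "my_service_latency"]
    print(reports_min_max("thermal/cpu/temp", custom))
    print(reports_min_max("my_custom_metric", custom))
    print(reports_min_max("my_custom_metric"))
```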

Note: Min/max reporting only applies to Histogram Metrics. Gauge and Counter Metrics are not affected by this setting.

CPU

| Metric | Description | Aggregation Type |
| --- | --- | --- |
| cpu/cpu/percent/idle | The percent of time this core spent in the idle state | Histogram |
| cpu/cpu/percent/system | The percent of time this core spent in the system state | Histogram |
| cpu/cpu/percent/user | The percent of time this core spent in the user state | Histogram |
| cpu/cpu/percent/iowait | The percent of time this core spent in the iowait state | Histogram |
| cpu/cpu/percent/irq | The percent of time this core spent in the irq state | Histogram |
| cpu/cpu/percent/softirq | The percent of time this core spent in the softirq state | Histogram |
| cpu/cpu/percent/nice | The percent of time this core spent in the nice state | Histogram |

Memory

| Metric | Description | Aggregation Type |
| --- | --- | --- |
| memory/memory/free | The amount of free memory on the system in bytes | Histogram |
| memory/memory/used | The amount of used memory on the system in bytes | Histogram |
| memory/memory/cached | The amount of cached memory on the system in bytes | Histogram |
| memory/memory/buffered | The amount of buffered memory on the system in bytes | Histogram |
| memory/memory/slab_recl | The amount of reclaimable slab memory on the system in bytes | Histogram |
| memory/memory/slab_unrecl | The amount of unreclaimable slab memory on the system in bytes | Histogram |

Virtual Memory

| Metric | Description | Aggregation Type |
| --- | --- | --- |
| memory/vm/swaps_in_per_second | Pages swapped in per second from /proc/vmstat | Histogram |
| memory/vm/swaps_out_per_second | Pages swapped out per second from /proc/vmstat | Histogram |
| memory/vm/pages_in_per_second | Pages paged in per second from /proc/vmstat | Histogram |
| memory/vm/pages_out_per_second | Pages paged out per second from /proc/vmstat | Histogram |

Temperature

| Metric | Description | Aggregation Type |
| --- | --- | --- |
| thermal/<thermal zone type>/temp | Each thermal zone temperature reading in degrees Celsius | Histogram |

Network Interfaces

By default, memfaultd will detect interfaces whose name does not start with lo, tun, dummy, veth, or usb (to avoid automatically tracking virtual interfaces) and monitor them via the following metrics.

A specific set of interfaces can be specified via the metrics.system_metric_collection.network_interfaces configuration.
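The default selection rule and the configured override can be sketched as follows. This is an illustration of the rule as described, not memfaultd's code; the function name `monitored_interfaces` and the treatment of the configured list as a plain list of names are assumptions.

```python
# Name prefixes skipped by default, as listed in the text above.
EXCLUDED_PREFIXES = ("lo", "tun", "dummy", "veth", "usb")


def monitored_interfaces(available, configured=None):
    """Pick the interfaces to monitor.

    If an explicit list is configured it is used as-is; otherwise every
    available interface whose name does not start with an excluded
    prefix is selected.
    """
    if configured is not None:
        return list(configured)
    return [name for name in available if not name.startswith(EXCLUDED_PREFIXES)]


if __name__ == "__main__":
    names = ["lo", "eth0", "wlan0", "veth12", "usb0", "tun0", "dummy0"]
    print(monitored_interfaces(names))            # ['eth0', 'wlan0']
    print(monitored_interfaces(names, ["usb0"]))  # ['usb0']
```

Because the default rule is a prefix match, any interface whose name happens to begin with `lo` or `usb` is skipped as well; listing it explicitly in the configuration is the way to monitor it.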

| Metric | Description | Aggregation Type |
| --- | --- | --- |
| interface/<interface>/bytes_per_second/rx | Bytes per second received on this interface | Histogram |
| interface/<interface>/bytes_per_second/tx | Bytes per second sent on this interface | Histogram |
| interface/<interface>/errors_per_second/rx | Errors per second for RX traffic on this interface | Histogram |
| interface/<interface>/errors_per_second/tx | Errors per second for TX traffic on this interface | Histogram |
| interface/<interface>/dropped_per_second/rx | Packets dropped per second for RX traffic on this interface | Histogram |
| interface/<interface>/dropped_per_second/tx | Packets dropped per second for TX traffic on this interface | Histogram |
| interface/<interface>/packets_per_second/rx | Packets received per second on this interface | Histogram |
| interface/<interface>/packets_per_second/tx | Packets sent per second on this interface | Histogram |
| interface/<interface>/total_bytes/rx | Total bytes received on this interface since the last reading | Counter |
| interface/<interface>/total_bytes/tx | Total bytes sent on this interface since the last reading | Counter |
| interfaces/<interface>/rssi | RSSI of the connection on this interface to a connected AP/router (wireless only) | Histogram |

Wireless Networks

| Metric | Description | Aggregation Type |
| --- | --- | --- |
| wireless/oui/local_<device> | Local OUI (Organizationally Unique Identifier) for the specified Device | String |
| wireless/oui/ap_<device> | Access Point OUI (Organizationally Unique Identifier) associated with the specified local Device | String |

Process-level Metrics

memfaultd will capture the following metrics for processes specified in the metrics.system_metric_collection.processes configuration.

Note: <name> is the filename of the process's executable. If the filename is longer than 16 characters, it is truncated to its first 16 characters.
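To predict the Metric keys for a given executable, the naming rule can be modeled as below. This is a sketch based only on the rule stated in the note; the helper name `process_metric_prefix` and the example paths are hypothetical.

```python
import posixpath

MAX_NAME_LEN = 16  # truncation length stated in the note above


def process_metric_prefix(executable_path):
    """Build the "processes/<name>" prefix for an executable path."""
    name = posixpath.basename(executable_path)[:MAX_NAME_LEN]
    return f"processes/{name}"


if __name__ == "__main__":
    print(process_metric_prefix("/usr/bin/memfaultd"))
    # processes/memfaultd
    print(process_metric_prefix("/usr/bin/my-very-long-service-name"))
    # processes/my-very-long-ser
```

Two executables that share the same first 16 characters would map to the same `<name>` under this model, so distinct prefixes are worth keeping in mind when naming long-running services.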

| Metric | Description | Aggregation Type |
| --- | --- | --- |
| processes/<name>/rss_bytes | Resident set size of the process in bytes | Histogram |
| processes/<name>/vm_bytes | Virtual memory size of the process in bytes | Histogram |
| processes/<name>/num_threads | Number of threads in the process | Histogram |
| processes/<name>/cpu/percent/user | Percent of CPU time the process spent in user mode | Histogram |
| processes/<name>/cpu/percent/system | Percent of CPU time the process spent in kernel mode | Histogram |
| processes/<name>/pagefaults/minor | Minor page faults (not requiring disk I/O) since the last reading | Histogram |
| processes/<name>/pagefaults/major | Major page faults (requiring disk I/O) since the last reading | Histogram |

Disk Space Metrics

By default, memfaultd will detect disks whose ID starts with /dev (excluding loop and ram devices) and track their disk space with these metrics.

Other disks or partitions can be specified via the metrics.system_metric_collection.disk_space configuration.
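The default disk selection can be approximated with a simple predicate. How memfaultd identifies loop and ram devices is not spelled out here, so the sketch assumes they are recognized by the device name starting with `loop` or `ram`; the function name `is_tracked_by_default` is hypothetical.

```python
def is_tracked_by_default(disk_id):
    """Approximate the default rule: under /dev, but not loop or ram."""
    if not disk_id.startswith("/dev"):
        return False
    # Assumption: loop and ram devices are identified by their name prefix.
    device_name = disk_id.rsplit("/", 1)[-1]
    return not device_name.startswith(("loop", "ram"))


if __name__ == "__main__":
    for disk in ("/dev/sda1", "/dev/mmcblk0p2", "/dev/loop0", "/dev/ram0", "tmpfs"):
        print(disk, is_tracked_by_default(disk))
```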

| Metric | Description | Aggregation Type |
| --- | --- | --- |
| disk_space/<disk>/free_bytes | Bytes on the corresponding disk or partition that are unused | Histogram |
| disk_space/<disk>/used_bytes | Bytes on the corresponding disk or partition that are in use | Histogram |

Diskstats Metrics

By default, memfaultd will collect disk I/O statistics from /proc/diskstats for all Devices (excluding loop and ram Devices). A specific set of Devices can be specified via the metrics.system_metric_collection.diskstats configuration.

| Metric | Description | Aggregation Type |
| --- | --- | --- |
| diskstats/<device>/reads_per_second | Read operations per second on this Device | Histogram |
| diskstats/<device>/writes_per_second | Write operations per second on this Device | Histogram |
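Per-second rates like these are typically derived by sampling the cumulative counters in /proc/diskstats twice and dividing the difference by the elapsed time. The sketch below shows that calculation on sample text; it is not memfaultd's implementation. The function names, the 10-second interval, and the sample values are assumptions. The field positions (reads completed in the fourth column, writes completed in the eighth) follow the Linux kernel's documented /proc/diskstats layout.

```python
def parse_diskstats(text):
    """Map device name -> (reads completed, writes completed)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        name = fields[2]
        if name.startswith(("loop", "ram")):
            continue  # loop and ram devices are excluded by default
        stats[name] = (int(fields[3]), int(fields[7]))
    return stats


def per_second_rates(previous, current, elapsed_seconds):
    """Turn two cumulative samples into (reads/s, writes/s) per device."""
    rates = {}
    for device, (reads, writes) in current.items():
        if device not in previous:
            continue  # no baseline yet for this device
        prev_reads, prev_writes = previous[device]
        rates[device] = (
            (reads - prev_reads) / elapsed_seconds,
            (writes - prev_writes) / elapsed_seconds,
        )
    return rates


# Hypothetical samples taken 10 seconds apart.
SAMPLE_T0 = """\
   8       0 sda 1200 30 40000 500 800 20 32000 700 0 900 1200
   7       0 loop0 50 0 400 10 0 0 0 0 0 10 10
"""
SAMPLE_T1 = """\
   8       0 sda 1300 30 44000 540 850 20 34000 730 0 960 1270
   7       0 loop0 50 0 400 10 0 0 0 0 0 10 10
"""

if __name__ == "__main__":
    rates = per_second_rates(
        parse_diskstats(SAMPLE_T0), parse_diskstats(SAMPLE_T1), 10
    )
    print(rates)  # {'sda': (10.0, 5.0)}
```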

MMC Disk Metrics

For eMMC/MMC block Devices (Devices with names matching mmcblk*), memfaultd collects additional disk health and identification Metrics.

| Metric | Description | Aggregation Type |
| --- | --- | --- |
| diskstats/<disk>/lifetime_remaining_pct | Estimated lifetime remaining percentage (type A) | Gauge |
| diskstats/<disk>/lifetime_b_remaining_pct | Estimated lifetime remaining percentage (type B) | Gauge |
| diskstats/<disk>/bytes_written | Bytes written to this Device since the last reading | Counter |
| diskstats/<disk>/total_size_bytes | Total size of the Device in bytes | Gauge |
| diskstats/<disk>/name | Product name of the Device | String |
| diskstats/<disk>/manufacturer_id | Manufacturer ID of the Device | String |
| diskstats/<disk>/manufacture_date | Manufacture date of the Device | String |
| diskstats/<disk>/revision | Firmware Revision of the Device | String |
| diskstats/<disk>/serial | Serial number of the Device | String |

Log Counter Metrics

memfaultd includes a number of built-in Log Filtering rules that increment counter metrics when a matching log line is received.

| Metric | Description | Aggregation Type |
| --- | --- | --- |
| oomkill_<process_name> | Count of times process_name has been killed by the OOM killer | Counter |
| systemd_restarts_<service_name> | Count of times the service_name service has been restarted | Counter |

Note: These metrics require the logging feature to be enabled and logs to be sent to memfaultd.
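The idea of a log-driven counter can be illustrated with a short sketch. The actual filtering rules built into memfaultd are not shown on this page, so the regular expression here is an assumption: it matches the "Out of memory: Killed process <pid> (<name>)" message that recent Linux kernels log, and the function name `count_oom_kills` is hypothetical.

```python
import re
from collections import Counter

# Assumed pattern for the kernel's OOM-kill log line; the rule memfaultd
# actually uses may differ.
OOM_KILL_RE = re.compile(r"Out of memory: Killed process \d+ \((?P<name>[^)]+)\)")


def count_oom_kills(log_lines):
    """Increment oomkill_<process_name> once per matching log line."""
    counts = Counter()
    for line in log_lines:
        match = OOM_KILL_RE.search(line)
        if match:
            counts[f"oomkill_{match.group('name')}"] += 1
    return counts


if __name__ == "__main__":
    logs = [
        "kernel: Out of memory: Killed process 412 (myapp) total-vm:204800kB",
        "systemd[1]: Started My App.",
        "kernel: Out of memory: Killed process 977 (myapp) total-vm:204800kB",
    ]
    print(dict(count_oom_kills(logs)))  # {'oomkill_myapp': 2}
```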