Android Custom Metrics

info

Reporting APIs are available from Bort 4.0 onwards.

Memfault has support for various built-in AOSP metrics, like battery health, wakelock counts, foregrounded apps and so on. See Built-in Metrics for more details.

Aside from the built-in metrics, it is also possible to define custom ones. This guide describes the different kinds of metrics that can be collected.

Summary

The Reporting API enables logging individual metrics. These are uploaded in high resolution and in aggregate.

High Resolution Telemetry (HRT)

Individual metric events are uploaded and displayed in high resolution on the device timeline. They appear similar to the existing batterystats metrics.

All metrics are uploaded as part of HRT, even if there are no aggregations requested.

note

HRT is supported from Bort 4.4.0 onwards. This is part of the Debugging Fleet Sampling resolution.

Aggregate Metrics

Bort aggregates each metric at the end of the reporting period (usually 1 hour). These aggregates are:

  • Shown on the device timeline.
  • Available for use as Timeseries or Attributes metrics, aggregated across the entire fleet.

note

Aggregate metrics are supported from Bort 4.0.0 onwards. This is part of the Monitoring Fleet Sampling resolution.

See Metrics for more details on how metrics can be configured and used in the Memfault dashboard.

Including the Reporting SDK

Custom Metrics can be recorded from any Java/Kotlin application by including the reporting-lib library. This library reports metrics to the memfault_structured system service; the main Bort application does not need to be started for every value recorded. The latest version of the library is 1.1.

To add the library, follow these instructions.

A C/C++ library will be added in a future release.

Recording Metrics

The Bort SDK creates a metric report on a regular schedule (roughly once per hour). This report contains aggregated values for all metrics which were reported during the period covered by the report.

For example, if we create a counter metric and invoke its increment method 5 times during a reporting period, the metric report will contain this value (5). Its value will be reset for the next reporting period.
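The per-period behavior described above can be modeled in a few lines of plain Kotlin. This is only an illustrative sketch: `PeriodCounter` and `finishPeriod` are hypothetical names and are not part of reporting-lib, which performs the aggregation inside Bort.

```kotlin
// Hypothetical stand-in for illustration only; not part of reporting-lib.
class PeriodCounter {
    private var sum = 0.0

    // Adds to the running total for the current reporting period.
    fun incrementBy(by: Double = 1.0) {
        sum += by
    }

    // Returns the total for the period that just ended and resets it to zero
    // so the next period starts fresh.
    fun finishPeriod(): Double {
        val reported = sum
        sum = 0.0
        return reported
    }
}
```

Calling `incrementBy()` five times and then `finishPeriod()` yields 5.0; a second `finishPeriod()` with no further increments yields 0.0.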

The regular "heartbeat" metric Report is accessed using:

Reporting.report()

Custom Reporting periods may be available in a future release of the SDK.

Individual metrics are created with methods on the Report class, e.g.

Reporting.report().counter("my-counter")

We recommend creating constants to define each metric that you will use, so that configuration (naming, defining aggregations) only has to happen in one place, e.g.

val distributionMetric = Reporting.report().distribution("my-distribution", numericAggs(NumericAgg.MEAN, NumericAgg.MIN))

Then we can use this metric to record values from elsewhere in our codebase:

distributionMetric.record(6.89)
distributionMetric.record(9.86)

Metric Types

The Reporting APIs provide convenience wrappers for several common types of metrics.

Counter

Tracks the number of times that an event occurred during each reporting period. As described above, the Bort SDK will track all recorded values, and report the sum in the metric report.

Create a metric counter named disconnection-events:

val disconnectionCounter = Reporting.report().counter("disconnection-events")

Then record values:

disconnectionCounter.increment()
disconnectionCounter.incrementBy(4)
disconnectionCounter.incrementBy(7.9)

This will create a metric in Memfault named disconnection-events.sum. In the example above, this would have a value of 12.9 for the current reporting period. This metric can be disabled by setting sumInReport = false when creating the metric (individual events will still be shown on the timeline via HRT).

info

Counters are useful either as Timeseries Metrics (tracking trends over time) or simply for review on the Device Timeline.

Distribution

Tracks the distribution of a numerical value during the reporting period. Several aggregations are available to use.

Create a distribution metric named rssi, which tracks MEAN and MIN values:

val rssiMetric = Reporting.report().distribution("rssi", numericAggs(NumericAgg.MEAN, NumericAgg.MIN))

Then record values periodically:

rssiMetric.record(-38.68)
rssiMetric.record(-39.83)
rssiMetric.record(-43.67)
rssiMetric.record(-78.89)

This will create one metric in Memfault to track each requested aggregation:

  • rssi.mean: The mean RSSI value seen in the reporting period (i.e. the average signal strength).
  • rssi.min: The minimum RSSI value seen in the reporting period (i.e. the worst signal strength).

The available distribution aggregations are defined in the NumericAgg enum:

  • MIN: Minimum value recorded during the period.
  • MAX: Maximum value recorded during the period.
  • SUM: Sum of all values recorded during the period.
  • MEAN: Mean value recorded during the period.
  • COUNT: Number of values recorded during the period.

All values will be shown on the timeline via HRT. Entries will only be created in the metric report for each requested aggregation.
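To make the aggregations concrete, here is a minimal plain-Kotlin sketch that computes them for a list of recorded values. `Agg` and `aggregate` are hypothetical names that only mirror the NumericAgg entries listed above; the `name.agg` key format follows the rssi.mean and rssi.min examples and is assumed for the other aggregations. Returning no entries when nothing was recorded is also an assumption of this sketch.

```kotlin
// Hypothetical mirror of the NumericAgg names; not the library's enum.
enum class Agg { MIN, MAX, SUM, MEAN, COUNT }

// Computes only the requested aggregations, keyed as "<name>.<agg>".
fun aggregate(name: String, values: List<Double>, aggs: Set<Agg>): Map<String, Double> {
    // Assumption: a period with no recorded values produces no entries.
    if (values.isEmpty()) return emptyMap()
    return aggs.associate { agg ->
        val value = when (agg) {
            Agg.MIN -> values.minOf { it }
            Agg.MAX -> values.maxOf { it }
            Agg.SUM -> values.sum()
            Agg.MEAN -> values.average()
            Agg.COUNT -> values.size.toDouble()
        }
        "$name.${agg.name.lowercase()}" to value
    }
}
```

For the four RSSI values recorded earlier, requesting MEAN and MIN gives a mean of about -50.27 and a minimum of -78.89, and no other entries.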

info

Distributions may be well suited to tracking selected metrics over time across the fleet as Timeseries Metrics, while also being useful on the Device Timeline as part of the Metric Report.

State Tracker

Tracks state transitions - for example, the connection state of a peripheral device or enabled state of a device function.

State trackers track state using an enum:

enum class ConnectionState {
    DISCONNECTED,
    CONNECTING,
    CONNECTED,
}

Create a state tracker named connection-state, tracking the total time spent in each state:

private val connectionStateMetric = Reporting.report().stateTracker<ConnectionState>("connection-state", statsAggs(StateAgg.TIME_TOTALS))

Then record state transitions as they occur:

connectionStateMetric.state(DISCONNECTED)
connectionStateMetric.state(CONNECTING)
connectionStateMetric.state(CONNECTED)
connectionStateMetric.state(DISCONNECTED)

This creates one metric for each state:

  • state_DISCONNECTED.total_secs: total seconds spent in the DISCONNECTED state in the reporting period.
  • state_CONNECTING.total_secs: total seconds spent in the CONNECTING state in the reporting period.
  • state_CONNECTED.total_secs: total seconds spent in the CONNECTED state in the reporting period.

Note that we used TIME_TOTALS for this example. There are three available aggregation types, defined in StateAgg:

  • TIME_TOTALS: Creates a metric per state, totalling the time (in seconds) spent in that state.
  • TIME_PER_HOUR: Creates a metric per state, showing time per hour (in seconds) spent in that state. This is useful if reports are not exactly one hour long.
  • LATEST_VALUE: Creates one metric in the report, with the last recorded value at the end of the reporting period. This is similar to using a stringProperty or numberProperty.

Even if no aggregation types are requested, all state transitions will be displayed on the timeline via HRT.
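The time-based aggregations can be illustrated with a small plain-Kotlin model. This is a sketch under stated assumptions, not the Bort implementation: `Transition`, `timeTotals` and `timePerHour` are hypothetical names, transitions are assumed to be sorted by time, and TIME_PER_HOUR is interpreted here as scaling each total to a one-hour period.

```kotlin
// Hypothetical transition record: the state entered, and when it was entered
// (in seconds since the reporting period began).
data class Transition(val state: String, val atSecs: Long)

// Total seconds spent in each state, from the first transition to periodEndSecs.
// Assumes transitions are sorted by atSecs.
fun timeTotals(transitions: List<Transition>, periodEndSecs: Long): Map<String, Long> {
    val totals = mutableMapOf<String, Long>()
    transitions.forEachIndexed { i, t ->
        // A state lasts until the next transition, or until the period ends.
        val until = transitions.getOrNull(i + 1)?.atSecs ?: periodEndSecs
        totals[t.state] = (totals[t.state] ?: 0L) + (until - t.atSecs)
    }
    return totals
}

// Assumed interpretation of TIME_PER_HOUR: scale totals to a 3600-second period.
fun timePerHour(totals: Map<String, Long>, periodSecs: Long): Map<String, Long> =
    totals.mapValues { (_, secs) -> secs * 3600 / periodSecs }
```

With transitions to DISCONNECTED at 0 s, CONNECTING at 600 s, CONNECTED at 660 s and DISCONNECTED at 3000 s in a 3600-second period, the totals are 1200 s disconnected, 60 s connecting and 2340 s connected.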

info

Tracking time spent in selected states may be useful to track across the fleet using Timeseries Metrics, while LATEST_VALUE could be an interesting Device Attribute to filter devices.

String / Number Properties

Track device properties using these metrics. The last reported value will be included in the Metric Report, at the end of the reporting period.

Create string/numeric property trackers (named device-color/first-boot-time):

val deviceColor = Reporting.report().stringProperty("device-color")
val firstBootTime = Reporting.report().numberProperty("first-boot-time")

Then record values for these properties:

deviceColor.update("orange")
firstBootTime.update(1642794263)

This creates one metric for each property, with the last reported value for each reporting period:

  • device-color.latest: Latest value of this string property.
  • first-boot-time.latest: Latest value of this numeric property.

All values are displayed on timeline as part of HRT. The .latest metric can be disabled by setting addLatestToReport = false when creating the metric.
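As a rough illustration of the "last reported value" behavior and the addLatestToReport switch, consider the following plain-Kotlin model. `LatestProperty` and `reportEntries` are hypothetical names that are not part of reporting-lib; only the `.latest` suffix and the flag name come from the text above.

```kotlin
// Hypothetical stand-in for a property metric; not the reporting-lib class.
class LatestProperty<T : Any>(
    private val name: String,
    private val addLatestToReport: Boolean = true
) {
    private var latest: T? = null

    // Each update overwrites the previous value.
    fun update(value: T) {
        latest = value
    }

    // Entries this property would contribute to the report at the end of the period.
    fun reportEntries(): Map<String, T> {
        val value = latest
        return if (addLatestToReport && value != null) {
            mapOf("$name.latest" to value)
        } else {
            emptyMap()
        }
    }
}
```

Updating a `device-color` property to "blue" and then "orange" leaves a single `device-color.latest` entry with the value "orange"; with `addLatestToReport = false` the report has no entry for the property.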

info

Properties are likely to be useful as Device Attributes, so that we can search the fleet for devices with specific properties (and create Device Sets based on them).

Event

Tracks individual events. When using HRT, these will be displayed individually on the device timeline.

Create a metric event named failure:

val failureEvent = Reporting.report().event("failure")

Then record events:

failureEvent.add("timeout")
failureEvent.add("code 403")

By default, this will not include any metrics in the heartbeat report; values will be visible on timeline using HRT. To include an event count in the report, set countInReport = true when creating the metric.
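The effect of countInReport can be sketched with a small plain-Kotlin model. `EventLog`, `timeline` and `reportEntries` are hypothetical names, and the `.count` key is an assumption made for illustration; the source does not state the name of the count metric.

```kotlin
// Hypothetical stand-in for an event metric; not the reporting-lib class.
class EventLog(private val name: String, private val countInReport: Boolean = false) {
    private val events = mutableListOf<String>()

    fun add(value: String) {
        events += value
    }

    // Every event is kept individually, as it would be shown via HRT.
    fun timeline(): List<String> = events.toList()

    // Only contributes to the report when countInReport is set.
    // The ".count" suffix is assumed for this sketch.
    fun reportEntries(): Map<String, Int> =
        if (countInReport) mapOf("$name.count" to events.size) else emptyMap()
}
```

With the default setting, two added events appear in `timeline()` but `reportEntries()` is empty; with `countInReport = true` the report carries a count of 2.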

info

This API will replace Custom Events in a future release.

Recommendations

We recommend creating a wrapper class to define the metrics that you will use in your application. This could even be in a shared library, if you plan to record the same metrics from multiple applications.

package com.memfault.bort

import com.memfault.bort.reporting.NumericAgg
import com.memfault.bort.reporting.Reporting
import com.memfault.bort.reporting.StateAgg

enum class ConnectionState {
    DISCONNECTED,
    CONNECTING,
    CONNECTED,
}

// Single place where every metric name and aggregation is configured.
object RecordMetrics {
    val rssiMetric = Reporting.report().distribution("rssi", NumericAgg.MEAN, NumericAgg.MIN)
    val connectionStateMetric = Reporting.report().stateTracker<ConnectionState>("connection-state", StateAgg.TIME_TOTALS)
    val disconnectionCounter = Reporting.report().counter("disconnection-events")
    val deviceColor = Reporting.report().stringProperty("device-color")
    // Numeric property, since first-boot-time is updated with a number.
    val firstBootTime = Reporting.report().numberProperty("first-boot-time")
}

We can then record values from wherever we like in the application, without having to recreate the metric configuration:

disconnectionCounter.incrementBy(4)
rssiMetric.record(-43.67)
connectionStateMetric.state(CONNECTED)
deviceColor.update("orange")
firstBootTime.update(1642794263)

Troubleshooting

  • If you record metric values but don't see them appearing in the Memfault timeline:
    • Check that one or more aggregations are defined for distribution and stateTracker metrics - otherwise no aggregations will be created in each metric report.
    • Wait for Bort to create the next metric report (roughly once per hour) - this is the same process which populates the built-in battery metrics on the device timeline (so you can tell when this has happened).