Alerts
The Alerts feature will send a team an alert when heartbeat events satisfy a given threshold for a specific set of filters. To find the feature, click the Alerts item in the sidebar.
Fleet-threshold Alerts
Fleet-threshold Alerts operate on aggregate Metrics conditions. For example, you
could configure a Fleet-threshold Alert that triggers when your production
Cohort surpasses a mean count of 5 connection failures in a time window of one
hour.
Only Metrics marked as Timeseries can be used as a condition for Fleet-threshold Alerts.
Device-threshold Alerts
Device-threshold Alerts trigger when any Device reports a certain value or range of values for a Metric.
Let's create an alert when the battery_perc
(battery percentage) is below 5%,
for devices with serial numbers starting with MFLT
.
- Click Alerts in the sidebar.
- Click New Alert in the top right.
- Provide a Title and optional Description (but highly advised)!
- Choose Notification targets to which Alert events will be sent
- Next, the Condition value is the Metric string key within a heartbeat.
- We set the condition value to is less than 5.
- We also wanted to make sure that the Device Serial values start with
MFLT
. - To do this we click Add, and specify these values. To add multiple filters, click add "and" condition and repeat the process.
- Above, we have created our Alert! Click OK
Results
Once the Alert is set up, Memfault will watch for Events matching the criteria specified. If it finds any, it will create Incidents, which can be found by clicking on the Alert entry in the Alerts table.
On top of the entries in this view, Memfault will send an email every 5 minutes summarizing the number of devices that meet this threshold. Given this fact, it is best to set up the Alerts so that they are hit rarely to prevent abuse of the email inbox.
Notification Frequency
Expect alerts will be delivered in the UI almost immediately under Incidents. Navigate to Alerts and select the Alert you want. Incidents are listed per device. An email will also get sent once every 12 hours.
At most 1 incident is generated in a 24 hour period per device. This is to minimize the number of incidents created and avoid alert fatigue. Devices can hit a metric threshold multiple times in 24 hours without creating duplicate incidents.
When only 1 device hits a metric threshold within the last 24 hours and reports an alert incident, the captured date timestamp is updated once within 24 hours. When multiple devices begin hitting the threshold for a given metric within a 24 hour period, the captured date timestamp will be updated when the new alert is emitted.