This article walks through the serialization strategy used by the Memfault Metrics Component and event subsystem in the memfault-firmware-sdk. It also discusses the advantages of the approach over other standard embedded serialization strategies by walking through a real-world example.
- Minimize Serialization Overhead
  - Minimize bandwidth costs (for example, when a cellular connection is in use).
  - Maximize the amount of data that can be sent over slow data transports such as LoRa or BLE.
  - Minimize the amount of RAM or flash storage needed to batch up events while a device is offline.
- Flexible Schema
  - Over the lifecycle of a device, the metrics being tracked will evolve. Adding or removing a tracker should be a change of a few lines and should not require writing migration handlers or coordinating release updates between many teams within an organization (e.g. firmware, mobile, cloud).
- Small RAM and Code Footprint
  - The solution must scale down to devices with only kilobytes of flash and RAM to accommodate the most deeply embedded devices.
In the sections below we will walk through the pros and cons of various serialization strategies. A quick summary of the highlights can be found in the table below.
|Encoding||Example Payload Encode Size (bytes)||No Mobile/Cloud Changes needed when fields are added||Compact Key Representation (Not a String)||Integers Compressed based on Magnitude||No Custom Migration Handlers Required|
|packed c struct||69||❌||❌||✅||❌||❌|
For comparison, let's walk through encoding a typical heartbeat metric used for tracking the health of a BLE-based IoT device:
One approach for embedded devices is to just send JSON. This gives us a very flexible way to add and remove arbitrary key-value pairs. However, C-based JSON encoders tend to consume a lot of RAM and flash space and are often handwritten, because no standard C implementation for embedded devices exists.
Even if we minify (strip the whitespace from) the example above, the event message payload winds up being 471 bytes!
One way we can optimize the size of the JSON is to use MsgPack (or CBOR). This still lets us flexibly add and remove key value pairs but encodes the values in a compressed binary form. For example, the largest uint32_t value (4294967295) can be encoded in 5 bytes instead of the 10 it would take up with JSON.
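To make the size claim concrete, here is a minimal sketch of how CBOR encodes an unsigned integer (major type 0 in RFC 8949). The function name `cbor_encode_uint` is hypothetical and is not taken from any particular library; MsgPack uses a similar scheme with different prefix bytes.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: encodes `value` as a CBOR unsigned integer
// (major type 0). Returns the number of bytes written.
// `buf` must have room for at least 9 bytes.
size_t cbor_encode_uint(uint64_t value, uint8_t *buf) {
  size_t num_bytes;  // size of the big-endian argument after the initial byte

  if (value < 24) {
    buf[0] = (uint8_t)value;  // small values live in the initial byte itself
    return 1;
  } else if (value <= UINT8_MAX) {
    buf[0] = 0x18;
    num_bytes = 1;
  } else if (value <= UINT16_MAX) {
    buf[0] = 0x19;
    num_bytes = 2;
  } else if (value <= UINT32_MAX) {
    buf[0] = 0x1a;
    num_bytes = 4;
  } else {
    buf[0] = 0x1b;
    num_bytes = 8;
  }

  // The argument is always written most significant byte first.
  for (size_t i = 0; i < num_bytes; i++) {
    buf[1 + i] = (uint8_t)(value >> (8 * (num_bytes - 1 - i)));
  }
  return 1 + num_bytes;
}
```

With this sketch, 4294967295 encodes as the 5 bytes `1a ff ff ff ff`, while a small value such as 10 takes a single byte.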
Using MsgPack we can reduce the minified size to 399 bytes (a 15% reduction):
As the binary dump above shows, a main reason our binary payload is so large is that the key names are all encoded as strings, which take up a lot of space. One way to reduce this is to use a structured serialization format. A typical approach is to use protobuf.
With Protobuf, key names are encoded with an integer "field number" and, as with MsgPack, integers are encoded using a variable-length scheme so that small numbers can be represented in very few bytes.
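As an illustration of that wire format, the sketch below hand-encodes a single unsigned integer field: a tag built from the field number and wire type 0, followed by the value as a base-128 varint. The function names `varint_encode` and `encode_uint_field` are hypothetical and are not part of nanopb or any generated code.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: writes `value` as a protobuf base-128 varint.
// Each byte carries 7 bits, least significant group first; the top bit
// is set on every byte except the last. Needs up to 10 bytes in `buf`.
size_t varint_encode(uint64_t value, uint8_t *buf) {
  size_t n = 0;
  while (value >= 0x80) {
    buf[n++] = (uint8_t)((value & 0x7f) | 0x80);
    value >>= 7;
  }
  buf[n++] = (uint8_t)value;
  return n;
}

// Hypothetical helper: writes one unsigned integer field. The key is
// (field_number << 3) | wire_type, where wire type 0 means "varint".
size_t encode_uint_field(uint32_t field_number, uint64_t value, uint8_t *buf) {
  size_t n = varint_encode(((uint64_t)field_number << 3) | 0, buf);
  n += varint_encode(value, buf + n);
  return n;
}
```

For example, field number 1 with the value 150 encodes as the 3 bytes `08 96 01`: one byte for the key instead of a full key string, and two bytes for the value.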
A user must define a representation of the data to be serialized in a ".proto" file and then can autogenerate code to assist packing and unpacking the data. For embedded devices, nanopb is a popular library used for the actual encoding.
With protobuf, our example above encodes in 72 bytes (an 82% decrease over MsgPack!)
While this is a great improvement, there are a few things to note:
- `protoc` and auto-generation need to be integrated into all the projects involved in encoding/decoding the data.
- Anytime we add a new key we will need to update the decoders to parse them. So for example if a new field is added on the firmware a decoder will need to be added on the consumer side (e.g. within a mobile app or a web backend) to decode it. This means any update will likely involve several teams within an organization to coordinate in order for a new event to be published from the firmware.
- If decoders are out of sync, new data will be silently dropped.
- A lot of boilerplate code is needed to serialize the data which can be cumbersome to write and read.
Due to some of the challenges with integrating protobuf, many projects fall back to using packed C structures.
However, this is typically the worst approach that can be taken for numerous reasons. Notably,
- Value encoding is not standardized and is architecture dependent! Whereas JSON, MsgPack, CBOR and Protobuf all have a platform neutral way to represent numbers, with packed C structs it's usually up to the mobile or web engineer to manually deal with endianness issues and make sure multi-byte integer values are decoded correctly.
- You need to pick the right size integer encoding upfront for all the values to encode to save space when packing the structure. When doing this you also need to make sure you handle overflow. Otherwise counters will rollover and bogus values will get reported in your trackers.
- Encoding and decoding has to be written manually, often in different languages and by different engineering teams. This makes it much more likely to have subtle decode errors and integration issues.
- All migration logic from one version of an event to another has to be manually handled by developers.
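The first two points above can be sketched in a few lines. This is only an illustration of the extra work packed structs push onto developers, with hypothetical helper names: byte order has to be fixed explicitly on both ends, and fixed-width counters have to saturate rather than wrap.

```c
#include <stdint.h>

// Hypothetical helper: store a 32-bit value in an explicit little-endian
// layout so the result does not depend on the CPU's native byte order.
void put_u32_le(uint8_t *buf, uint32_t value) {
  buf[0] = (uint8_t)(value);
  buf[1] = (uint8_t)(value >> 8);
  buf[2] = (uint8_t)(value >> 16);
  buf[3] = (uint8_t)(value >> 24);
}

// Hypothetical helper: the matching decode, which every consumer
// (mobile app, backend) would need to reimplement in its own language.
uint32_t get_u32_le(const uint8_t *buf) {
  return (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
         ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
}

// Hypothetical helper: add to a 16-bit counter, clamping at UINT16_MAX
// instead of rolling over to a small bogus value.
uint16_t counter_add_saturating(uint16_t counter, uint16_t delta) {
  if (delta > UINT16_MAX - counter) {
    return UINT16_MAX;
  }
  return (uint16_t)(counter + delta);
}
```

With a self-describing format such as CBOR or protobuf, neither the byte-order helpers nor the up-front choice of counter width is needed.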
The packed C struct representation winds up being 69 bytes (only 3 bytes smaller than protobuf):
The Memfault Firmware SDK balances space utilization with ease of use based on the pros and cons explored above. Notably,
- All data values are encoded using CBOR. This means new fields can be added without breaking the decode process, and all data types (e.g. integers) are encoded using a standard platform-agnostic format (developers don't need to deal with endianness issues!).
- No key information is encoded during serialization at all! Instead an array of CBOR values is stored for `event_info`. This means as the number of keys increases, the encoding winds up being more optimal than protobuf because no bytes are needed for the key.
- To alleviate any decode burden or integration of any new tooling to generate mappings, the Memfault cloud makes use of the symbol file for the firmware release to autogenerate a mapping back to the key names. This means metrics can be updated on the firmware at any time via a firmware update and opaquely passed through a mobile app or cloud backend without requiring any updates!
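The idea of sending values without keys can be sketched as follows. This is not the SDK's actual implementation: the metric names, the X-macro list, and the functions `encode_metrics` and `metric_name` are all hypothetical, and the name table here simply stands in for the mapping that would be recovered from the symbol file on the receiving side.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical metric list. The order defines each value's array index.
#define METRIC_KEYS(X) \
  X(battery_pct)       \
  X(ble_disconnects)   \
  X(uptime_s)

enum {
#define X(name) kMetric_##name,
  METRIC_KEYS(X)
#undef X
  kMetricCount
};

// Key names are never transmitted; this table stands in for the
// index-to-name mapping rebuilt by the receiver.
static const char *const s_metric_names[] = {
#define X(name) #name,
  METRIC_KEYS(X)
#undef X
};

const char *metric_name(size_t index) { return s_metric_names[index]; }

// Writes a CBOR head: major type in the top 3 bits, then the argument.
// Handles arguments up to 32 bits; needs up to 5 bytes in `buf`.
static size_t cbor_head(uint8_t major, uint32_t value, uint8_t *buf) {
  size_t num_bytes;
  uint8_t info;
  if (value < 24) {
    buf[0] = (uint8_t)((major << 5) | value);
    return 1;
  }
  if (value <= UINT8_MAX) { info = 24; num_bytes = 1; }
  else if (value <= UINT16_MAX) { info = 25; num_bytes = 2; }
  else { info = 26; num_bytes = 4; }
  buf[0] = (uint8_t)((major << 5) | info);
  for (size_t i = 0; i < num_bytes; i++) {
    buf[1 + i] = (uint8_t)(value >> (8 * (num_bytes - 1 - i)));
  }
  return 1 + num_bytes;
}

// Encodes all metrics as one CBOR array (major type 4) of unsigned
// integers (major type 0). No key bytes are emitted at all.
size_t encode_metrics(const uint32_t values[kMetricCount], uint8_t *buf) {
  size_t n = cbor_head(4, kMetricCount, buf);
  for (size_t i = 0; i < kMetricCount; i++) {
    n += cbor_head(0, values[i], buf + n);
  }
  return n;
}
```

Under these assumptions, the values `{87, 2, 3600}` encode as the 7 bytes `83 18 57 02 19 0e 10`, and adding a metric only means adding one line to the list.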
With this approach, the example above has a serialization size of 68 bytes, which winds up being 1 byte less than the packed C structure!