[Thiago Cafe] Programming is fun!

Deserialising binary data in Rust

Created by Thiago Guedes on 2024-08-14 17:11:15

Tags: #rust   #c++  

One common way to deal with serialised data in C++ is to map it onto a struct. In Rust, memory ownership and the compiler's freedom to optimise struct layout make this a little more complicated.

For example, let's say a producer in C++ serialises this message:

#include <cstdint>

struct book_msg {
    uint8_t major_version; // byte 0: message major version
    uint8_t minor_version; // byte 1: message minor version
    uint8_t msg_type;      // byte 2: message type
    uint8_t title[20];     // bytes 3-22: title of the book
};
// ....
auto msg = create_msg();
comm.send(&msg, sizeof(book_msg));

How can we deserialise that in Rust?

Creating the struct to be filled with the data

Rust does not guarantee that struct fields are laid out in declaration order, so we use #[repr(C, packed)]: repr(C) tells the compiler to keep the fields in the declared order, and packed removes any padding between them, so the struct matches the bytes on the wire.

#[repr(C, packed)]
struct BookMsg {  
    pub major_version: u8, // byte 0: message major version  
    pub minor_version: u8, // byte 1: message minor version  
    pub msg_type: u8,      // byte 2: message type  
    pub title: [u8; 20]    // bytes 3-22: title of the book
}
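
As a sanity check, the layout can be verified at compile time. This is a minimal sketch, assuming the struct above; book_msg_layout is a hypothetical helper added only for illustration, and the const assertion needs a reasonably recent compiler (Rust 1.57 or later).

```rust
use std::mem::{align_of, size_of};

#[repr(C, packed)]
pub struct BookMsg {
    pub major_version: u8, // byte 0
    pub minor_version: u8, // byte 1
    pub msg_type: u8,      // byte 2
    pub title: [u8; 20],   // bytes 3-22
}

// Compile-time check: the build fails if the struct is not exactly
// 3 header bytes + 20 title bytes = 23 bytes.
const _: () = assert!(size_of::<BookMsg>() == 23);

/// Hypothetical helper: returns (size, alignment) of BookMsg in bytes.
pub fn book_msg_layout() -> (usize, usize) {
    (size_of::<BookMsg>(), align_of::<BookMsg>())
}
```

If the C++ side ever changes the message, this assertion is a cheap reminder to update the Rust struct as well.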

Mapping the bytes to a struct

Let's say we received:

// 013854546865205369676e206f662074686520466f7572 in hex
let msg_vec: Vec<u8> = comm.recv();

To map this vector to a struct, we need to step outside Rust's safety guarantees, because Rust cannot verify at compile time that the vector's size and contents match the struct. It is up to us to make sure the vector holds at least as many bytes as the struct.

let msg_bytes: *const u8 = msg_vec.as_ptr();
let mapped_msg: &BookMsg = unsafe { &*(msg_bytes as *const BookMsg) };

Explanation of those 2 lines:

  1. We cast the pointer from *const u8 to *const BookMsg
  2. Then we dereference it with * to access it as a value
  3. Finally, we take a reference to this value with &
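
The cast itself never checks the buffer length, so a short message would be read out of bounds. Here is one possible way to guard it, as a sketch: map_book_msg is a hypothetical helper name, not part of any library.

```rust
use std::mem::size_of;

#[repr(C, packed)]
pub struct BookMsg {
    pub major_version: u8,
    pub minor_version: u8,
    pub msg_type: u8,
    pub title: [u8; 20],
}

/// Hypothetical helper: maps a byte slice onto BookMsg, or returns None
/// when the slice is too short to hold a whole message.
pub fn map_book_msg(bytes: &[u8]) -> Option<&BookMsg> {
    if bytes.len() < size_of::<BookMsg>() {
        return None;
    }
    // SAFETY: the slice holds at least size_of::<BookMsg>() bytes, BookMsg is
    // packed (alignment 1), and every bit pattern is valid for its u8 fields.
    Some(unsafe { &*(bytes.as_ptr() as *const BookMsg) })
}
```

The returned reference borrows from the slice, so the compiler still makes sure the buffer outlives the mapped struct.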

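Beware: when accessing parts of the message, copy each value into a local variable rather than holding a reference to the field. Fields of a packed struct may be unaligned, and a reference to an unaligned field is undefined behaviour. In this message every field is a u8, so nothing is actually misaligned, but the habit matters as soon as a wider field such as u16 or u32 appears.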

let major = mapped_msg.major_version;  
let minor = mapped_msg.minor_version;  
let msg_type = mapped_msg.msg_type;
let title = mapped_msg.title;

// Result:
// msg version=1.56, type=84
// msg title="The Sign of the Four"
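
To show how the copied values turn into the result above, here is a small sketch. The describe function is a hypothetical helper, and it assumes the title is UTF-8 and fills all 20 bytes; how shorter titles are padded is not specified here, so no trimming is done.

```rust
#[repr(C, packed)]
pub struct BookMsg {
    pub major_version: u8,
    pub minor_version: u8,
    pub msg_type: u8,
    pub title: [u8; 20],
}

/// Hypothetical helper: formats the two result lines for a message.
pub fn describe(msg: &BookMsg) -> String {
    // Copy every field into a local before using it.
    let major = msg.major_version;
    let minor = msg.minor_version;
    let msg_type = msg.msg_type;
    let title = msg.title; // [u8; 20] is Copy, so this copies the bytes

    // Assumption: the title is UTF-8; invalid bytes are replaced, not rejected.
    let title = String::from_utf8_lossy(&title);
    format!(
        "msg version={}.{}, type={}\nmsg title=\"{}\"",
        major, minor, msg_type, title
    )
}
```

With the bytes from the example message (0x01, 0x38, 0x54, then the title), the version is 1.56 because 0x38 is 56 in decimal, and the type is 84 because 0x54 is 84.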

Summary

  1. Use structs with #[repr(C, packed)] to ensure the layout is preserved
  2. Cast a raw pointer to the received bytes, inside unsafe, to map them onto this struct
  3. Copy the value from the struct to an external variable to avoid memory alignment issues
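
A variation on step 3 is to copy the whole struct out of the buffer in one go with std::ptr::read_unaligned, so that nothing keeps borrowing from the received vector. This is only a sketch of an alternative, not the approach used above; read_book_msg is a hypothetical helper name.

```rust
use std::{mem::size_of, ptr};

#[repr(C, packed)]
#[derive(Clone, Copy)]
pub struct BookMsg {
    pub major_version: u8,
    pub minor_version: u8,
    pub msg_type: u8,
    pub title: [u8; 20],
}

/// Hypothetical helper: returns an owned copy of the message, or None
/// when the slice is too short to hold one.
pub fn read_book_msg(bytes: &[u8]) -> Option<BookMsg> {
    if bytes.len() < size_of::<BookMsg>() {
        return None;
    }
    // SAFETY: the slice holds at least size_of::<BookMsg>() bytes, and
    // read_unaligned places no alignment requirement on the source pointer.
    Some(unsafe { ptr::read_unaligned(bytes.as_ptr() as *const BookMsg) })
}
```

Since the result is an owned value, the receive buffer can be dropped or reused right after the call.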

Tell me your opinion!

Reach me on Twitter - @thiedri