A super hacky tool to decode unknown binary formats


Table of Contents
  1. The Issue
  2. The Solution
  3. Usage
  4. The Future
  5. Links

The Issue

I often work with new or unknown binary formats, and decoding them with a hex editor is hard work. Especially when trying to understand and reverse engineer an unknown format, a hex editor alone is often not enough.

I have converged on the same solution several times over the last few years, and the last time I thought:

Let’s make it an actual tool, not just another hacky script for the current purpose.

The Solution

The tool I created is called livedecode. It consumes two files, and dumps its output on stdout:

[user@host project]$ livedecode example/png.spec example/example.png

This will read a specification file (png.spec), and will apply the decoding instructions to example.png:

(Image: console output of the above command showing a partially decoded PNG file)

The png.spec looks something like this:

endian be
print "File Header:"

u8 magic_8bit # should be 0x89
str 5 magic # PNG\r\n
u8 magic_1a # 0x1A
u8 magic_0a # 0x0a

def tEXt 1950701684
def IHDR 1229472850
def tIME 1950960965

call chunk "IHDR"
call chunk "gAMA"
call chunk "cHRM"

As you can see, the format is line-based and uses an assembler-like syntax. Each thing you decode (e.g. u8 magic_1a) prints its result to stdout.

You can also invoke subprograms with call <pgm> <args..>, as you can see in call chunk "IHDR".
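The numeric constants in the def lines are the four ASCII bytes of each chunk type, read as a big-endian 32-bit integer. The following Python sketch is not part of livedecode, and the helper names fourcc and split_signature are made up for illustration. It shows how those numbers can be derived and how the 8-byte PNG signature splits into the four header fields of the spec:

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte PNG signature


def fourcc(tag: str) -> int:
    """Interpret a 4-character ASCII tag as a big-endian u32."""
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError("chunk types are exactly four bytes long")
    return int.from_bytes(raw, "big")


def split_signature(data: bytes) -> dict:
    """Split the signature the same way the spec does: u8, str 5, u8, u8."""
    return {
        "magic_8bit": data[0],
        "magic": data[1:6],
        "magic_1a": data[6],
        "magic_0a": data[7],
    }


if __name__ == "__main__":
    for tag in ("tEXt", "IHDR", "tIME"):
        print(tag, fourcc(tag))
    print(split_signature(PNG_SIGNATURE))
```

Running fourcc over the three tags reproduces the values used in the def lines, which is a quick way to compute constants for other chunk types.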

A smaller specification might look like this:

endian le
u32 magic
u32 type
u32 offset
u32 length

print section 1
seek *offset
dump *length

.if *type 10
  seek 0x200
  str 10 description

This format has a header made up of a magic number, a file type, an offset and a length. The program then seeks to the specified offset and prints a hex dump of the length given in the header.

Also, if the type is 10, it will seek to offset 512 (0x200) and print a 10-character string labelled description.
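For comparison, here is roughly what the same decoding steps look like when written by hand. This is a minimal Python sketch of the logic described above, not how livedecode is implemented, and the sample bytes are invented for the example:

```python
import io
import struct


def decode(stream):
    """Mirror the small spec: header, seek + dump, optional description."""
    # endian le; u32 magic, type, offset, length
    magic, type_, offset, length = struct.unpack("<IIII", stream.read(16))
    result = {"magic": magic, "type": type_, "offset": offset, "length": length}

    # seek *offset; dump *length
    stream.seek(offset)
    result["dump"] = stream.read(length).hex(" ")

    # .if *type 10: seek 0x200; str 10 description
    if type_ == 10:
        stream.seek(0x200)
        result["description"] = stream.read(10).decode("ascii", errors="replace")
    return result


if __name__ == "__main__":
    # Synthetic sample file, invented for this example.
    blob = bytearray(0x20A)
    blob[0:16] = struct.pack("<IIII", 0xCAFEBABE, 10, 16, 4)
    blob[16:20] = b"\xde\xad\xbe\xef"
    blob[0x200:0x20A] = b"hello spec"
    for name, value in decode(io.BytesIO(bytes(blob))).items():
        print(name, value)
```

Every change to the format means editing and re-running a script like this, which is exactly the overhead the spec file is meant to remove.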

Usage

I use livedecode in VSCode by running it periodically in a terminal:

[user@host project]$ while true; do
  livedecode docs/wmb6.spec data/wmb/wmb6/block.wmb > /tmp/dump.txt
  sleep 1
done

Then I view the dump.txt side by side with my spec file:

(Image: VSCode with a side-by-side editor view; the livedecode spec is on the left, the decoded data on the right)

This way, I can type, save and immediately see the new decoding result. Working this way is very efficient and supports an explorative workflow well.
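A possible variation on the polling loop is to re-run the tool only when one of the input files has actually changed. The Python sketch below is an assumption-laden alternative, not something livedecode ships: the function names changed and watch are hypothetical, and it assumes livedecode is on the PATH and is invoked with the spec and data file exactly as in the shell loop.

```python
import os
import subprocess
import time


def changed(paths, seen):
    """Return True if any path has a different mtime than recorded in `seen`."""
    dirty = False
    for path in paths:
        mtime = os.stat(path).st_mtime_ns
        if seen.get(path) != mtime:
            seen[path] = mtime
            dirty = True
    return dirty


def watch(spec, data, out, interval=1.0):
    """Re-run livedecode whenever the spec or the data file changes."""
    seen = {}
    while True:
        if changed([spec, data], seen):
            with open(out, "w") as sink:
                # Same invocation as in the shell loop: spec first, then data.
                subprocess.run(["livedecode", spec, data], stdout=sink)
        time.sleep(interval)
```

Calling watch("docs/wmb6.spec", "data/wmb/wmb6/block.wmb", "/tmp/dump.txt") would then behave like the shell loop, except that the dump file is only rewritten after a save.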

The Future

At roughly the same time that livedecode was finished, I started working on a new project that has a similar goal but takes a different approach:

BFDL uses a formal syntax to describe file formats instead of just executing a loose set of instructions. The benefit of that approach is that once you are done discovering or designing your file format, you can generate a serializer/deserializer for it straight from your specification.

Links