Creating my own File Format

  • Posted on: 24 October 2016
  • By: ariane
Track: Software Development
Day: Sunday
Author: Ariane van der Steldt
Room: Track 3 (right)

For Mon-soon (https://github.com/groupon/monsoon) I had to develop a file format so that it could store and recall historical monitoring data.

There were a number of (sometimes conflicting) requirements:

  • it needed fast recall of data
  • it needed to scale to terabytes of storage
  • it needed to be compact
  • it needed to expire old data, but no more of it than necessary

Since the file format was developed in tandem with the design of the logic that uses these files, it's actually in its third iteration.

In this talk I'll discuss:

  • what you should definitely do in your first file format, to avoid hurting yourself immensely
  • transparent gzip compression, why it is a great idea and yet horrible
  • some tricks I use to make designing a file format quick and (relatively) easy
  • why the first two iterations were good and yet really, really bad
  • column-major vs row-major access
  • some of the specific challenges that Mon-soon imposes on the file format
  • error detection, recovery and reliable writes
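
To make the error detection and recovery topic a bit more concrete, here is a minimal sketch of one common approach: a file signature and version number up front, followed by length-prefixed records that each carry a CRC32 checksum. This is not the Mon-soon format. The names MAGIC, VERSION, encode_file and decode_file, and the layout itself, are assumptions made up for illustration.

```python
import struct
import zlib

MAGIC = b"TSDF"                  # hypothetical 4-byte file signature
VERSION = 1                      # hypothetical format version
HEADER = struct.Struct(">4sH")   # signature, version (big-endian)
RECORD = struct.Struct(">II")    # payload length, CRC32 of payload


def encode_file(payloads):
    """Serialize a list of bytes payloads into one framed byte string."""
    out = [HEADER.pack(MAGIC, VERSION)]
    for payload in payloads:
        out.append(RECORD.pack(len(payload), zlib.crc32(payload)))
        out.append(payload)
    return b"".join(out)


def decode_file(data):
    """Return the payloads that verify, stopping at the first bad record."""
    if len(data) < HEADER.size:
        raise ValueError("file too short to contain a header")
    magic, version = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a recognized file")
    if version != VERSION:
        raise ValueError(f"unsupported version {version}")
    offset = HEADER.size
    payloads = []
    while offset + RECORD.size <= len(data):
        length, crc = RECORD.unpack_from(data, offset)
        start = offset + RECORD.size
        payload = data[start:start + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # truncated or corrupt: keep what verified so far
        payloads.append(payload)
        offset = start + length
    return payloads
```

With this layout, a reader can reject foreign files and unknown versions immediately, and a write that was cut off halfway only costs the last record rather than the whole file.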
Time: 12:00 - 13:00 hrs