# Kaitai Struct

Kaitai Struct is a declarative language and cross-platform framework for describing and parsing binary data structures. It allows developers to define complex binary file formats, network protocols, and data structures using `.ksy` files, which are then compiled into parser source code for multiple programming languages. This approach eliminates the need to write repetitive, error-prone parsing code manually and ensures consistency across different platforms and languages.

The framework consists of several components organized as submodules:

- a compiler (`kaitai-struct-compiler` or `ksc`) that translates `.ksy` format specifications into target language code;
- runtime libraries for 14+ languages, including Java, Python, JavaScript, C++, C#, Go, Ruby, PHP, Rust, and others;
- a collection of pre-defined format descriptions;
- visualization tools, including a Web IDE and a console visualizer.

The project enables developers to describe a binary format once and generate parsers that work across all supported languages, making binary data manipulation more accessible and maintainable.

## Core Components

### KaitaiStream Stream API

Runtime library providing binary data reading capabilities across all supported languages.
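
Each typed read consumes a fixed number of bytes at the current position, decodes them with a given width, signedness, and byte order, and then advances the position. The stdlib-only sketch below illustrates that model. `TinyStream` is a hypothetical stand-in written for this illustration, not the Kaitai runtime's implementation, and it covers only a few of the `read_*` methods.

```python
import struct


class TinyStream:
    """Hypothetical, minimal stand-in for a Kaitai-style stream (illustration only)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self._pos = 0

    def pos(self) -> int:
        return self._pos

    def _read(self, fmt: str):
        # struct format codes: '<' little-endian, '>' big-endian,
        # B = u1, h = s2, I = u4, f = f4
        size = struct.calcsize(fmt)
        if self._pos + size > len(self.data):
            raise EOFError(
                f"requested {size} bytes at {self._pos}, "
                f"only {len(self.data) - self._pos} available"
            )
        (value,) = struct.unpack_from(fmt, self.data, self._pos)
        self._pos += size  # every read advances the position
        return value

    def read_u1(self) -> int:
        return self._read("<B")

    def read_s2le(self) -> int:
        return self._read("<h")

    def read_u4be(self) -> int:
        return self._read(">I")

    def read_f4le(self) -> float:
        return self._read("<f")


s = TinyStream(bytes([0x2A, 0xFE, 0xFF, 0x00, 0x00, 0x01, 0x00]))
print(s.read_u1(), s.read_s2le(), s.read_u4be(), s.pos())  # 42 -2 256 7
```

The same bytes `FE FF` decode as -2 when read as a signed little-endian 16-bit value, but as 65279 when read as an unsigned big-endian one. This is why every read method encodes width, signedness, and byte order in its name.
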

```python
# Python runtime example - reading a binary file
import io

from kaitaistruct import KaitaiStream

# Create a stream from binary data
with open('example.bin', 'rb') as f:
    data = f.read()
stream = KaitaiStream(io.BytesIO(data))

# Read various data types
unsigned_byte = stream.read_u1()      # Read 1-byte unsigned integer
signed_short_le = stream.read_s2le()  # Read 2-byte signed integer (little-endian)
unsigned_int_be = stream.read_u4be()  # Read 4-byte unsigned integer (big-endian)
float_le = stream.read_f4le()         # Read 4-byte float (little-endian)
double_be = stream.read_f8be()        # Read 8-byte double (big-endian)

# Stream positioning
current_pos = stream.pos()   # Get current position
total_size = stream.size()   # Get total stream size
stream.seek(100)             # Seek to absolute position 100
is_eof = stream.is_eof()     # Check if at end of stream

# Read byte arrays
byte_array = stream.read_bytes(10)  # Read exactly 10 bytes
# Read until a null terminator; arguments are term, include_term, consume_term, eos_error
term_string = stream.read_bytes_term(0, False, True, True)
# Read all remaining bytes; do this last, since it leaves the stream at EOF
remaining = stream.read_bytes_full()
```

### Format Definition (.ksy files)

Declarative YAML-based format for describing binary structures.

```yaml
# gif.ksy - Example format definition for GIF images
meta:
  id: gif
  file-extension: gif
  endian: le
doc: |
  GIF (Graphics Interchange Format) is a popular image format that
  supports animation and compression.
seq:
  - id: header
    type: header
  - id: logical_screen_descriptor
    type: logical_screen_descriptor
  - id: global_color_table
    type: color_table
    if: logical_screen_descriptor.has_color_table
    repeat: expr
    repeat-expr: logical_screen_descriptor.color_table_size
types:
  header:
    seq:
      - id: magic
        contents: 'GIF'
      - id: version
        size: 3
        type: str
        encoding: ASCII
  logical_screen_descriptor:
    seq:
      - id: screen_width
        type: u2
      - id: screen_height
        type: u2
      - id: flags
        type: u1
      - id: bg_color_index
        type: u1
      - id: pixel_aspect_ratio
        type: u1
    instances:
      has_color_table:
        value: '(flags & 0x80) != 0'
      color_table_size:
        value: '2 << (flags & 7)'
  # One RGB entry; repeated color_table_size times above
  color_table:
    seq:
      - id: red
        type: u1
      - id: green
        type: u1
      - id: blue
        type: u1
```

### Compiler Usage

Generate parser code from .ksy format definitions.

```bash
# Install the Kaitai Struct compiler (Debian/Ubuntu)
wget https://packages.kaitai.io/dists/unstable/main/binary-amd64/kaitai-struct-compiler_0.10_all.deb
sudo dpkg -i kaitai-struct-compiler_0.10_all.deb

# Compile to Python
kaitai-struct-compiler -t python gif.ksy -d output/

# Compile to Java
kaitai-struct-compiler -t java gif.ksy -d output/

# Compile to JavaScript
kaitai-struct-compiler -t javascript gif.ksy -d output/

# Compile to multiple languages at once
kaitai-struct-compiler -t python -t java -t javascript gif.ksy -d output/
```

```python
# Use the generated parser in Python (run from output/ or add it to sys.path).
# Requires the runtime package: pip install kaitaistruct
from gif import Gif

# from_file() opens the file itself; the with-block closes it afterwards
with Gif.from_file('image.gif') as gif_file:
    lsd = gif_file.logical_screen_descriptor
    print(f"GIF version: {gif_file.header.version}")
    print(f"Screen size: {lsd.screen_width}x{lsd.screen_height}")
    print(f"Has color table: {lsd.has_color_table}")
```

### Stream Positioning Operations

Navigate through binary data streams with precise control.
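
Seeking to an absolute offset and reading a fixed-width value only works if the stream is long enough. For example, reading a `u4` at offset 100 needs at least 104 bytes. Kaitai-generated code for `pos`-based instances uses a similar save, seek, read, restore sequence. The stdlib-only Python sketch below shows that pattern, with `io.BytesIO` standing in for a Kaitai stream (`tell()` plays the role of `pos()`). `read_u4be_at` is a hypothetical helper, not a runtime API.

```python
import io
import struct
from typing import BinaryIO


def read_u4be_at(stream: BinaryIO, offset: int) -> int:
    """Hypothetical helper: read a big-endian u4 at an absolute offset.

    The current position is saved and restored, so sequential parsing can
    continue where it left off.
    """
    saved = stream.tell()                   # like pos()
    try:
        size = stream.seek(0, io.SEEK_END)  # like size(): seek returns the new offset
        if offset < 0 or offset + 4 > size:
            raise EOFError(f"cannot read 4 bytes at offset {offset}; stream size is {size}")
        stream.seek(offset)                 # like seek(offset)
        (value,) = struct.unpack(">I", stream.read(4))
        return value
    finally:
        stream.seek(saved)                  # restore the sequential position


buf = io.BytesIO(bytes(100) + b"\x00\x00\x01\x02")
buf.seek(10)
print(hex(read_u4be_at(buf, 100)), buf.tell())  # 0x102 10
```

Restoring the position in a `finally` block keeps the caller's position intact even when the bounds check fails.
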

```java
// Java runtime example
import io.kaitai.struct.ByteBufferKaitaiStream;
import io.kaitai.struct.KaitaiStream;

public class BinaryParser {
    public static void main(String[] args) throws Exception {
        // KaitaiStream is abstract; ByteBufferKaitaiStream opens the named file
        try (KaitaiStream stream = new ByteBufferKaitaiStream("data.bin")) {
            // Check if at end of file
            if (!stream.isEof()) {
                // Get current position
                long position = stream.pos();
                System.out.println("Current position: " + position);

                // Get total size
                long size = stream.size();
                System.out.println("Total size: " + size);

                // Seek to a specific position, but only if 4 bytes can be read there
                if (size >= 104) {
                    stream.seek(100);

                    // Read data at the new position (u4 is returned as long)
                    long value = stream.readU4be();
                    System.out.println("Value at position 100: " + value);
                }

                // Seek relative to the current position, staying within bounds
                if (stream.pos() + 20 <= size) {
                    stream.seek(stream.pos() + 20);
                }
            }
        }
    }
}
```

### Integer Reading Operations

Read signed and unsigned integers in big-endian or little-endian byte order.

```javascript
// JavaScript runtime example
const KaitaiStream = require('kaitai-struct/KaitaiStream');
const fs = require('fs');

// Read binary file
const buffer = fs.readFileSync('data.bin');
const stream = new KaitaiStream(buffer);

// Unsigned integers
const u1 = stream.readU1();      // 1-byte unsigned (0-255)
const u2le = stream.readU2le();  // 2-byte unsigned little-endian
const u2be = stream.readU2be();  // 2-byte unsigned big-endian
const u4le = stream.readU4le();  // 4-byte unsigned little-endian
const u4be = stream.readU4be();  // 4-byte unsigned big-endian
// 8-byte reads return a Number, so values above 2^53 may lose precision
const u8le = stream.readU8le();  // 8-byte unsigned little-endian
const u8be = stream.readU8be();  // 8-byte unsigned big-endian

// Signed integers
const s1 = stream.readS1();      // 1-byte signed (-128 to 127)
const s2le = stream.readS2le();  // 2-byte signed little-endian
const s2be = stream.readS2be();  // 2-byte signed big-endian
const s4le = stream.readS4le();  // 4-byte signed little-endian
const s4be = stream.readS4be();  // 4-byte signed big-endian
const s8le = stream.readS8le();  // 8-byte signed little-endian
const s8be = stream.readS8be();  // 8-byte signed big-endian

console.log(`Unsigned byte: ${u1}, Signed short (LE): ${s2le}`);
```

### Floating Point Operations

Read IEEE 754 floating point numbers in different byte orders.

```ruby
# Ruby runtime example
require 'kaitai/struct/struct'

class FloatReader < Kaitai::Struct::Struct
  def initialize(_io, _parent = nil, _root = self)
    super(_io, _parent, _root) # stores the stream in @_io
    read
  end

  def read
    # Read 4-byte floats
    @float_le = @_io.read_f4le  # Little-endian single precision
    @float_be = @_io.read_f4be  # Big-endian single precision

    # Read 8-byte doubles
    @double_le = @_io.read_f8le # Little-endian double precision
    @double_be = @_io.read_f8be # Big-endian double precision
  end

  attr_reader :float_le, :float_be, :double_le, :double_be
end

# Usage
File.open('data.bin', 'rb') do |file|
  reader = FloatReader.new(Kaitai::Struct::Stream.new(file))
  puts "Float (LE): #{reader.float_le}"
  puts "Float (BE): #{reader.float_be}"
  puts "Double (LE): #{reader.double_le}"
  puts "Double (BE): #{reader.double_be}"
end
```

### Byte Array Operations

Read fixed, variable, and terminated byte sequences.
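
The four arguments of `read_bytes_term` (terminator, include_term, consume_term, eos_error) are easy to mix up. The stdlib-only sketch below re-implements their semantics over an in-memory `bytes` object so that the effect of each flag is visible. `read_bytes_term_sketch` is hypothetical. Unlike the runtime method, it returns the new position instead of advancing a stream.

```python
from typing import Tuple


def read_bytes_term_sketch(data: bytes, pos: int, term: int, include_term: bool,
                           consume_term: bool, eos_error: bool) -> Tuple[bytes, int]:
    """Hypothetical sketch of read_bytes_term semantics; returns (result, new_pos)."""
    end = data.find(bytes([term]), pos)
    if end == -1:
        # No terminator before the end of the data
        if eos_error:
            raise EOFError(f"terminator {term} not found before end of stream")
        return data[pos:], len(data)  # return everything that is left
    # include_term decides whether the terminator is part of the result
    result = data[pos:end + 1] if include_term else data[pos:end]
    # consume_term decides whether the next read starts after or at the terminator
    new_pos = end + 1 if consume_term else end
    return result, new_pos


data = b"Hello\x00World"
print(read_bytes_term_sketch(data, 0, 0, False, True, True))   # (b'Hello', 6)
print(read_bytes_term_sketch(data, 0, 0, True, True, True))    # (b'Hello\x00', 6)
print(read_bytes_term_sketch(data, 0, 0, False, False, True))  # (b'Hello', 5)
```

With `consume_term` set to false, the terminator stays in the stream, so the next read starts on it. This is useful when the terminator is also the first byte of the following field.
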

```cpp
// C++/STL runtime example
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>

#include <kaitai/kaitaistream.h>

int main() {
    std::ifstream ifs("data.bin", std::ifstream::binary);
    kaitai::kstream ks(&ifs);

    // Verify a fixed 5-byte magic signature first (compared manually here)
    const std::string expected = "MAGIC";
    std::string magic = ks.read_bytes(5);
    if (magic != expected) {
        throw std::runtime_error("bad magic signature");
    }

    // Read fixed-size byte array
    std::string fixed_bytes = ks.read_bytes(10);
    std::cout << "Read " << fixed_bytes.length() << " bytes" << std::endl;

    // Read until terminator (e.g., null-terminated string)
    // Parameters: terminator, include_term, consume_term, eos_error
    std::string term_bytes = ks.read_bytes_term(0, false, true, true);
    std::cout << "Terminated string: " << term_bytes << std::endl;

    // Read all remaining bytes; done last because it consumes the rest of the stream
    std::string remaining = ks.read_bytes_full();
    std::cout << "Remaining bytes: " << remaining.length() << std::endl;

    return 0;
}
```

### Bit-Level Operations

Read unaligned bit values for packed binary formats.

```csharp
// C# runtime example
using System;
using System.IO;
using Kaitai;

public class BitReader
{
    public static void Main()
    {
        using (var fs = new FileStream("data.bin", FileMode.Open, FileAccess.Read))
        {
            var stream = new KaitaiStream(fs);

            // Read individual bit fields (e.g., packed flags or compressed data)
            ulong bits5 = stream.ReadBitsIntBe(5);   // Read 5 bits as an integer (big-endian bit order)
            ulong bits3 = stream.ReadBitsIntBe(3);   // Read the next 3 bits
            ulong bits12 = stream.ReadBitsIntBe(12); // Bit fields may span byte boundaries

            // Discard the 4 unread bits of the current byte and return to byte alignment
            stream.AlignToByte();

            // Continue with normal byte reading
            byte nextByte = stream.ReadU1();

            Console.WriteLine($"5-bit value: {bits5}");
            Console.WriteLine($"3-bit value: {bits3}");
            Console.WriteLine($"12-bit value: {bits12}");
            Console.WriteLine($"Next byte: {nextByte}");
        }
    }
}
```

### Data Processing Operations

Transform and decode byte arrays with built-in processing algorithms.
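
The stdlib-only sketch below shows what XOR with a repeating key and per-byte left rotation compute. It also includes a zlib round trip using Python's standard `zlib` module, whose `decompress` should produce the same output as `process_zlib` for valid zlib input. `xor_many` and `rotate_left` are hypothetical helpers written for illustration, not the runtime's code.

```python
import zlib


def xor_many(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the key, repeating the key cyclically over the data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def rotate_left(data: bytes, amount: int) -> bytes:
    """Rotate every byte left by `amount` bits (a group size of one byte)."""
    amount %= 8
    return bytes(((b << amount) | (b >> (8 - amount))) & 0xFF for b in data)


key = b"\x12\x34\x56\x78"
plain = b"secret payload"
cipher = xor_many(plain, key)
assert xor_many(cipher, key) == plain  # XOR with the same key is its own inverse

# Rotating left by 3 and then by 5 is a full 8-bit turn, restoring the input
assert rotate_left(rotate_left(plain, 3), 5) == plain
print(rotate_left(b"\x81", 1).hex())  # 03

# zlib round trip with the standard library
compressed = zlib.compress(plain * 10)
print(len(plain * 10), "->", len(compressed), "bytes")
assert zlib.decompress(compressed) == plain * 10
```

Because XOR is self-inverse, the same routine both encrypts and decrypts. Rotation is undone by rotating in the opposite direction, or equivalently by rotating left by `8 - amount` bits.
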

```python
# Python runtime example - byte processing
from kaitaistruct import KaitaiStream


def decode_sections(stream):
    """Read and decode encrypted, rotated, and compressed sections."""
    # Read encrypted data with XOR cipher
    encrypted = stream.read_bytes(100)

    # Process with a single-byte XOR key
    decrypted = KaitaiStream.process_xor_one(encrypted, 0x42)

    # Or process with a multi-byte XOR key
    key = b'\x12\x34\x56\x78'
    decrypted_multi = KaitaiStream.process_xor_many(encrypted, key)

    # Undo a bit-rotation cipher: rotate each byte left by 3 bits (group size 1)
    rotated_data = stream.read_bytes(50)
    unrotated = KaitaiStream.process_rotate_left(rotated_data, 3, 1)

    # Process zlib-compressed data
    compressed = stream.read_bytes(200)
    try:
        decompressed = KaitaiStream.process_zlib(compressed)
        print(f"Decompressed {len(compressed)} bytes to {len(decompressed)} bytes")
    except Exception as e:
        print(f"Decompression failed: {e}")

    return decrypted, decrypted_multi, unrotated


# Strip padding from byte array
padded = b'Hello\x00\x00\x00'
stripped = KaitaiStream.bytes_strip_right(padded, 0)
print(f"Stripped: {stripped}")  # b'Hello'

# Terminate at specific byte
data = b'Hello\x00World'
terminated = KaitaiStream.bytes_terminate(data, 0, False)
print(f"Terminated: {terminated}")  # b'Hello'
```

```yaml
# Example .ksy using process operations
meta:
  id: encrypted_file
seq:
  - id: encrypted_header
    size: 16
    process: xor(0x42)
  - id: compressed_data
    size: _io.size - 16
    process: zlib
```

### Web IDE Integration

Use the online visualizer to develop and debug format definitions.

```bash
# Access the Web IDE at https://ide.kaitai.io/
# Development workflow:
#   1. Write your .ksy format definition
#   2. Upload a sample binary file
#   3. See real-time parsing results and structure visualization
#   4. Debug and iterate on your format definition
#   5. Download generated code for your target language

# Example: Debugging a custom format
cat > my_format.ksy <<'EOF'
meta:
  id: my_format
  endian: le
seq:
  - id: magic
    contents: [0x4D, 0x59, 0x46, 0x4D]  # "MYFM"
  - id: version
    type: u2
  - id: record_count
    type: u4
  - id: records
    type: record
    repeat: expr
    repeat-expr: record_count
types:
  record:
    seq:
      - id: id
        type: u4
      - id: name_len
        type: u1
      - id: name
        type: str
        size: name_len
        encoding: UTF-8
EOF

# Upload my_format.ksy and your binary file to the Web IDE for visualization
```

### GitHub Actions CI/CD Integration

Automated compilation of format definitions, with the generated parsers published as build artifacts.

```yaml
# .github/workflows/kaitai.yml
name: Kaitai Struct Compilation

on:
  push:
    branches: [master]
  pull_request: {}

jobs:
  compile-formats:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
        with:
          submodules: recursive

      - name: Setup Java
        uses: actions/setup-java@v5
        with:
          distribution: temurin
          java-version: '25'

      - name: Install Kaitai Struct Compiler
        run: |
          wget https://packages.kaitai.io/dists/unstable/main/binary-amd64/kaitai-struct-compiler_0.10_all.deb
          sudo dpkg -i kaitai-struct-compiler_0.10_all.deb

      - name: Compile formats
        run: |
          mkdir -p compiled/python compiled/java compiled/javascript
          kaitai-struct-compiler -t python formats/*.ksy -d compiled/python/
          kaitai-struct-compiler -t java formats/*.ksy -d compiled/java/
          kaitai-struct-compiler -t javascript formats/*.ksy -d compiled/javascript/

      - name: Upload compiled artifacts
        uses: actions/upload-artifact@v5
        with:
          name: compiled-parsers
          path: compiled/

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.x'

      - name: Run tests
        # Generated Python parsers import the kaitaistruct runtime package
        run: |
          python3 -m pip install kaitaistruct pytest
          cd compiled/python
          python3 -m pytest tests/
```

## Summary

Kaitai Struct serves as a comprehensive solution for binary data parsing across multiple programming languages and platforms.
The framework is particularly valuable for reverse engineering file formats, implementing network protocol parsers, analyzing binary data structures, creating data forensics tools, and building cross-platform applications that need consistent binary parsing behavior. Common use cases include parsing image formats (GIF, PNG, JPEG), working with archive formats (ZIP, TAR), analyzing executable files (ELF, PE, Mach-O), decoding network protocols (TCP/IP packets, DNS), and processing embedded system data structures.

Integration typically involves defining the binary format in a `.ksy` file, compiling it to the target language with `ksc`, adding the appropriate runtime library for that language, and using the generated parser classes in application code. The runtime libraries are lightweight and designed for easy integration into existing projects via standard package managers (pip for Python, Maven for Java, npm for JavaScript, etc.).

The modular architecture allows developers to use only the components they need. The growing collection of pre-defined formats at https://github.com/kaitai-io/kaitai_struct_formats provides ready-to-use parsers for more than a hundred common file formats, reducing development time and improving reliability through community-tested implementations.