### Install PolyFile from Source (Bash)

Source: https://github.com/trailofbits/polyfile/blob/master/README.md

This command installs PolyFile directly from the source code located in the current directory. Ensure Java is installed beforehand, as it is needed to run the Kaitai Struct compiler.

```Bash
pip3 install .
```

--------------------------------

### Install PolyFile from PyPI (Bash)

Source: https://github.com/trailofbits/polyfile/blob/master/README.md

This command installs the latest stable version of the PolyFile utility from the Python Package Index (PyPI) using pip3.

```Bash
pip3 install polyfile
```

--------------------------------

### Using Binary Struct Utility (Python)

Source: https://github.com/trailofbits/polyfile/blob/master/docs/extending_polyfile.md

Provides an example of defining and reading a binary structure using PolyFile's `Struct` utility. It demonstrates field definitions with types, a field whose length depends on another field, and reading data from a byte stream.

```python
from io import BytesIO

from polyfile.structs import ByteField, Int32LE, Struct, UInt8LE


class Test(Struct):
    foo: UInt8LE            # unsigned 8-bit integer
    bar: Int32LE            # signed 32-bit little-endian integer
    data: ByteField["foo"]  # byte sequence whose length is the value of `foo`


# Parse the structure from an in-memory byte stream
test = Test.read(BytesIO(b"\x03234567890"))
print(test.foo, test.bar, test.data)
```

--------------------------------

### Implement PolyFile Parser parse Method (Python)

Source: https://github.com/trailofbits/polyfile/blob/master/docs/extending_polyfile.md

Illustrates an example implementation of a parser's `parse` method, demonstrating how to read data from the stream, create and yield `Submatch` objects, and raise `InvalidMatch` on parsing failure.

```Python
from polyfile import InvalidMatch, Submatch


def parse(self, stream, match):
    header_content = stream.read(len(b"example"))
    if header_content == b"example":
        yield Submatch(
            name="Example Header",
            match_obj=header_content,
            relative_offset=0,
            length=len(header_content),
            parent=match
        )
        remaining_content = stream.read()
        content_node = Submatch(
            name="Content",
            match_obj=remaining_content,
            relative_offset=len(header_content),
            length=len(remaining_content),
            parent=match
        )
        yield content_node
        pos = remaining_content.find(b"1234")
        if pos >= 0:
            yield Submatch(
                name="1234 Position",
                match_obj=[1, 2, 3, 4],
                relative_offset=pos,  # the offset is relative to the start of the parent node
                length=4,
                parent=content_node
            )
    else:
        raise InvalidMatch("The file does not start with b\"example\"!")
```

--------------------------------

### Example PolyFile JSON Output Structure - Javascript

Source: https://github.com/trailofbits/polyfile/blob/master/docs/json_format.md

Illustrates the JSON structure produced by PolyFile, showing top-level file metadata (MD5, SHA1, SHA256, etc.) and the `struc` array containing file object mappings with nested sub-elements. Includes extensions to the SBuD format such as `relative_offset` and `img_data`, and supports multiple entries in `struc` for polyglots.

```javascript
{
    "MD5": "MD5 hex string of the input file",
    "SHA1": "SHA1 hex string for the input file",
    "SHA256": "SHA256 hex string for the input file",
    "b64contents": "base64 encoded contents of the input file",
    "fileName": "The input filename, or 'STDIN' if the file was read from STDIN",
    "length": 1337, /* integer number of bytes in the file */
    "struc": [ /* SBuD does not use a list here; there is just one element *
                * PolyFile uses a list to enable labeling multiple filetypes *
                * in the case of a polyglot */
        {
            "name": "ADOBE_PDF", /* the filetype */
            "offset": 0, /* the offset of this file object *
                          * within the input file */
            "subEls": [ /* sub-elements of this filetype */
                {
                    "offset": 0, /* once again, the global offset */
                    "relative_offset": 0, /* the offset of this element *
                                           * relative to its parent */
                    "name": "header", /* a descriptive element name */
                    "type": "magic", /* the type of this element */
                    "size": 9, /* size of the element in bytes */
                    "value": "%PDF-1.3\n",
                    "img_data": "Optional base64 encoded image", /* not in SBuD */
                    "subEls": [ /* any child elements, in the same format */ ]
                }
            ]
        } /* additional dictionaries will be included here *
           * if the file is a polyglot */
    ]
}
```

--------------------------------

### Using Custom PolyFile MagicMatcher Definitions (Python)

Source: https://github.com/trailofbits/polyfile/blob/master/README.md

This snippet shows how to load specific or custom magic file definitions into a `MagicMatcher` instance. It takes a list of definition file paths, parses them to create a matcher, and then uses this custom matcher to process a file.

```python
from polyfile.magic import MagicMatcher

list_of_paths_to_definitions = ["def1", "def2"]
# Build a matcher from only the listed definition files
matcher = MagicMatcher.parse(*list_of_paths_to_definitions)
with open("file_to_test", "rb") as f:
    for match in matcher.match(f.read()):
        ...
```

--------------------------------

### Run PolyFile (Console)

Source: https://github.com/trailofbits/polyfile/blob/master/README.md

Executing PolyFile on a file without additional arguments mimics the behavior of the standard `file --keep-going` command, identifying multiple file types present within the input.

```Console
$ polyfile png-polyglot.png
PNG image data, 256 x 144, 8-bit/color RGB, non-interlaced
Brainfu** Program
Malformed PDF
PDF document, version 1.3, 1 pages
ZIP end of central directory record Java JAR archive
```

--------------------------------

### PolyFile Debugger Help Output (Console)

Source: https://github.com/trailofbits/polyfile/blob/master/docs/extending_polyfile.md

Displays the help message for the PolyFile interactive debugger, accessible via the `--debugger` or `-db` command-line argument, listing available commands for debugging DSL specifications and parsers.

```console
$ polyfile -db input_file
PolyFile 0.5.4
Copyright ©2023 Trail of Bits
Apache License Version 2.0 https://www.apache.org/licenses/

For help, type "help".
(polyfile) help
breakpoint ......... list the current breakpoints or add a new one
continue ........... continue execution until the next breakpoint is hit (aliases: run)
debug_and_continue . continue while debugging in PDB (aliases: debug and debug_and_cont)
debug_and_rerun .... re-run the last test and debug in PDB
debug_and_step ..... step into the next magic test and debug in PDB
delete ............. delete a breakpoint
help ............... print this message
next ............... continue execution until the next test that matches
print .............. print the computed absolute offset of the following libmagic DSL offset
profile ............ print current profiling results (to enable profiling, use `set profile True`)
quit ............... exit the debugger
set ................ modifies part of the debugger environment
show ............... prints part of the debugger environment
step ............... step through a single magic test
test ............... test the following libmagic DSL test at the current position
where .............. print the context of the current magic test (aliases: backtrace and info stack)
```

--------------------------------

### Using Default PolyFile MagicMatcher (Python)

Source: https://github.com/trailofbits/polyfile/blob/master/README.md

This snippet demonstrates how to use the default instance of the `MagicMatcher` class in PolyFile to identify the type of a file. It reads the file content, matches it against the default magic definitions, and prints the matched MIME types and match strings.

```python
from polyfile.magic import MagicMatcher

with open("file_to_test", "rb") as f:
    # the default instance automatically loads all file definitions
    for match in MagicMatcher.DEFAULT_INSTANCE.match(f.read()):
        for mimetype in match.mimetypes:
            print(f"Matched MIME: {mimetype}")
        print(f"Match string: {match!s}")
```

--------------------------------

### Define libmagic Matcher Programmatically (Python)

Source: https://github.com/trailofbits/polyfile/blob/master/docs/extending_polyfile.md

Demonstrates how to create a PolyFile `MagicMatcher` test by defining libmagic rules programmatically within a temporary file and adding that file to the default matcher instance.

```Python
from pathlib import Path

from polyfile.fileutils import ExactNamedTempfile
from polyfile.magic import MagicMatcher, TestType

with ExactNamedTempfile(b"""# The default libmagic test for NITF does not associate a MIME type,
# and does not support NITF 02.10
0 string NITF NITF
>4 string 02.10 \ version 2.10 (ISO/IEC IS 12087-5)
>25 string >\0 dated %.14s
!:mime application/vnd.nitf
!:ext ntf
""", name="NITFMatcher") as t:
    nitf_matcher = MagicMatcher.DEFAULT_INSTANCE.add(Path(t), test_type=TestType.BINARY)[0]
```

--------------------------------

### Pipe File to PolyFile for HTML (Bash)

Source: https://github.com/trailofbits/polyfile/blob/master/README.md

This command pipes a file downloaded with curl directly into PolyFile, which reads it from standard input (the trailing `-`), analyzes it, and writes an HTML output file.

```Bash
curl -v --silent https://www.sultanik.com/files/ESultanikResume.pdf | polyfile --html ESultanikResume.html -
```

--------------------------------

### Implement Pure Python Matcher (Python)

Source: https://github.com/trailofbits/polyfile/blob/master/docs/extending_polyfile.md

Shows how to create a custom file matcher entirely in Python by subclassing `MagicTest` and implementing the `test` method to check for specific byte patterns.

```Python
from typing import Optional

from polyfile.magic import AbsoluteOffset, FailedTest, MagicMatcher, MagicTest, MatchedTest, TestResult, TestType


class ExampleMatcher(MagicTest):
    def __init__(self):
        super().__init__(
            offset=AbsoluteOffset(0),  # the file offset at which this test starts matching
            mime="application-x/example-mime",  # the MIME type associated with this type
            extensions=("example",),  # file extensions associated with this type, if any
            message="A message that will be printed when this test matches an input"
        )

    def subtest_type(self) -> TestType:
        return TestType.BINARY  # use TestType.TEXT if this test only works on non-binary input

    def test(self, data: bytes, absolute_offset: int, parent_match: Optional[TestResult]) -> TestResult:
        if data.startswith(b"example"):
            return MatchedTest(self, value=data, offset=0, length=len(data))
        else:
            return FailedTest(self, offset=0, message="This is not an example file!")


# Register the matcher so it always runs:
MagicMatcher.DEFAULT_INSTANCE.add(ExampleMatcher())
```

--------------------------------

### Generate HTML Output (Console)

Source: https://github.com/trailofbits/polyfile/blob/master/README.md

Using the `--html` option with PolyFile generates an interactive hex viewer for the analyzed file and saves it to the specified HTML file.

```Console
$ polyfile --html output.html png-polyglot.png
Found a file of type application/pdf at byte offset 0
Found a file of type application/x-brainfuck at byte offset 0
Found a file of type image/png at byte offset 0
Found a file of type application/zip at byte offset 0
Found a file of type application/java-archive at byte offset 0
Saved HTML output to output.html
```

--------------------------------

### Registering Custom Parser Class (Python)

Source: https://github.com/trailofbits/polyfile/blob/master/docs/extending_polyfile.md

Demonstrates how to register a custom parser class inheriting from `polyfile.Parser` by adding an instance of the class to the `polyfile.PARSERS` dictionary under a specific MIME type.

```python
import polyfile


class ExampleParser(polyfile.Parser):
    def parse(self, stream, match):
        ...


# PARSERS maps each MIME type to a set of Parser instances, so add an instance
polyfile.PARSERS["application-x/example-mime"].add(ExampleParser())
```

--------------------------------

### Defining Kaitai Struct MIME Mapping (Python)

Source: https://github.com/trailofbits/polyfile/blob/master/docs/extending_polyfile.md

Illustrates the structure of the `KAITAI_MIME_MAPPING` dictionary used internally by PolyFile to associate MIME types with their corresponding Kaitai Struct format files (.ksy) for automatic parser generation.

```python
from typing import Dict

KAITAI_MIME_MAPPING: Dict[str, str] = {
    "image/gif": "image/gif.ksy",
    "image/png": "image/png.ksy",
    "image/jpeg": "image/jpeg.ksy",
    "image/vnd.microsoft.icon": "image/ico.ksy",
    # ⋮
}
```

--------------------------------

### Iterate and Render Matches (Jinja2)

Source: https://github.com/trailofbits/polyfile/blob/master/polyfile/templates/template.html

Loops through the list of `matches` provided to the template context and calls the `render_match` macro for each one. The `True` argument marks each of these as a top-level match.

```Jinja2
{% for match in matches %}
{{ render_match(match, True) }}
{% endfor %}
```

--------------------------------

### Registering Parser Function with Decorator (Python)

Source: https://github.com/trailofbits/polyfile/blob/master/docs/extending_polyfile.md

Shows how to register a standalone function as a parser using the `@polyfile.register_parser` decorator, specifying the target MIME type as the decorator argument.

```python
from polyfile import register_parser


@register_parser("application-x/example-mime")
def parse_example(file_stream, match):
    ...
```

--------------------------------

### Configure Git Hooks Path (Bash)

Source: https://github.com/trailofbits/polyfile/blob/master/hooks/README.md

Configures the Git repository to use the custom hooks located in the `./hooks` directory by setting the `core.hooksPath` configuration variable.

```bash
$ git config core.hooksPath ./hooks
```

--------------------------------

### PolyFile Parsers Dictionary Definition (Python)

Source: https://github.com/trailofbits/polyfile/blob/master/docs/extending_polyfile.md

Shows the definition of the global dictionary PolyFile uses to map each MIME type detected by the matchers to the set of parsers registered for it.

```Python
PARSERS: Dict[str, Set[polyfile.Parser]] = defaultdict(set)
```

--------------------------------

### Render Match Name Macro (Jinja2)

Source: https://github.com/trailofbits/polyfile/blob/master/polyfile/templates/template.html

Defines a Jinja2 macro to render the name, size, and offset of a file match, along with an optional decode link and image data. It takes a `match` object containing the match details and a `first` boolean flag; when `first` is true, the size and offset are omitted.

```Jinja2
{% macro render_match_name(match, first) -%}
[⇩](# "download") {{ match['name'] }}
{% if not first %} {{ match['size'] }} bytes @ {{ '0x%0X' % match['offset'] }} {% endif %}
{% if match['decoded'] %} [[Decode](#decoded{{ match['uid'] }})]{% endif %}
{% if match['img_data'] %} ![]({{ match['img_data'] }}) {% endif %}
{%- endmacro %}
```

--------------------------------

### PolyFile Parser parse Method Signature (Python)

Source: https://github.com/trailofbits/polyfile/blob/master/docs/extending_polyfile.md

Provides the required signature of the `parse` method that all PolyFile parser subclasses must implement to process file streams and yield submatches.

```Python
def parse(self, stream: polyfile.fileutils.FileStream, match: polyfile.Match) -> Iterator[polyfile.Submatch]:
    ...
```

--------------------------------

### Render Match Macro (Jinja2)

Source: https://github.com/trailofbits/polyfile/blob/master/polyfile/templates/template.html

Defines a recursive Jinja2 macro to render a file match and any sub-elements. It calls `render_match_name` for the current match and recursively calls itself for each child match in `subEls`.

```Jinja2
{% macro render_match(match, first) -%}*
{% if 'subEls' in match and match['subEls'] %}
{{ render_match_name(match, first) }}
{% for child in match['subEls'] %}
{{ render_match(child, False) }}
{% endfor %}
{% else %}
{{ render_match_name(match, first) }}
{% endif %}
{%- endmacro %}
```

--------------------------------

### Render Decoded Matches (Jinja2)

Source: https://github.com/trailofbits/polyfile/blob/master/polyfile/templates/template.html

Iterates through the decoded matches returned by the `decoded_matches()` function. For each match, it displays the match name as a header and includes the decoded content, marked as safe HTML.

```Jinja2
{% for match, decoded in decoded_matches() %}
Decoded {{ match['name'] }}
-----------------------------
[×](#)
{{ decoded|safe }}
{% endfor %}
```

--------------------------------

### Generate Hex Column Headers (Jinja2)

Source: https://github.com/trailofbits/polyfile/blob/master/polyfile/templates/template.html

Generates a row of hexadecimal characters (0-f) to be used as column headers in a hex dump display. It first adds spacing based on the base-16 logarithm of the input size to align with the address column.

```Jinja2
{% for i in range(math.ceil(math.log(input_bytes)/math.log(16))) %} {% endfor %}{% for col in range(16) %}{{ '%x' % col }}{% endfor %}
```
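
The padding expression in the hex column header template, `math.ceil(math.log(input_bytes)/math.log(16))`, is easier to check outside Jinja2. The following is a minimal plain-Python sketch of the same arithmetic, not PolyFile code: the function names `hex_gutter_width` and `hex_header_row` are hypothetical, and it assumes one space of padding per loop iteration, as the extracted template shows.

```python
import math


def hex_gutter_width(input_bytes: int) -> int:
    """Mirror the template's expression: ceil(log(input_bytes) / log(16))."""
    # Hypothetical helper; requires input_bytes > 0, since math.log(0) raises ValueError.
    return math.ceil(math.log(input_bytes) / math.log(16))


def hex_header_row(input_bytes: int) -> str:
    """Build the header row: gutter padding followed by the 16 column digits."""
    padding = " " * hex_gutter_width(input_bytes)  # one space per loop iteration
    columns = "".join("%x" % col for col in range(16))  # '0' through 'f'
    return padding + columns


# A 1337-byte input gets a three-character gutter before the column digits.
print(repr(hex_header_row(1337)))
```

Because the expression relies on floating-point logarithms, results for sizes that are exact powers of 16 may be sensitive to rounding; an integer-only width computation would avoid that, but it would no longer mirror the template's expression.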