### Install puremagic using pip

Source: https://github.com/cdgriffith/puremagic/blob/master/README.rst

Install the puremagic library using pip. On Linux, `python3 -m pip` is recommended so that the Python version is explicit.

```bash
$ pip install puremagic
```

```bash
# On Linux environments, you may want to be clear you are using python3
$ python3 -m pip install puremagic
```

--------------------------------

### Install puremagic

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Install the puremagic library using pip. Ensure you are using Python 3.12+ or use puremagic 1.x for older versions.

```bash
pip install puremagic
```

--------------------------------

### Identify File by Path

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Use `puremagic.from_file` with a file path to get the file extension (default) or the MIME type. Deep scan is enabled by default.

```python
import puremagic

# Extension (default)
print(puremagic.from_file("test/resources/images/test.gif"))   # '.gif'
print(puremagic.from_file("test/resources/audio/test.mp3"))    # '.mp3'
print(puremagic.from_file("test/resources/office/test.docx"))  # '.docx'

# MIME type
print(puremagic.from_file("test/resources/images/test.png", mime=True))
# 'image/png'
print(puremagic.from_file("test/resources/office/test.xlsx", mime=True))
# 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'

# Error handling
from puremagic import PureError, PureValueError

try:
    result = puremagic.from_file("empty_or_unknown.bin")
except PureError:
    print("File type unrecognized")
except PureValueError:
    print("File was empty")
```

--------------------------------

### Identify file extension and MIME type

Source: https://github.com/cdgriffith/puremagic/blob/master/README.rst

Use `from_file` to get the most likely file extension and `magic_file` to retrieve all possible matches with confidence scores.

```python
import puremagic

filename = "test/resources/images/test.gif"

ext = puremagic.from_file(filename)
# '.gif'

puremagic.magic_file(filename)
# [['.gif', 'image/gif', 'Graphics interchange format file (GIF87a)', 0.7],
#  ['.gif', '', 'GIF file', 0.5]]
```

--------------------------------

### Get All Database Entries for an Extension

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Use `puremagic.magic_extension` to retrieve a list of all `PureMagicWithConfidence` entries matching a given extension, sorted by confidence. This function does not raise an error for unknown extensions, returning an empty list instead.

```python
import puremagic

# See all database entries for .zip
matches = puremagic.magic_extension(".zip")
for m in matches:
    print(f"conf={m.confidence} bytes={m.byte_match!r} {m.name}")
# conf=0.4 bytes=b'PK\x03\x04' ZIP archive
# conf=0.3 bytes=b'PK\x05\x06' ZIP archive (empty)
# ...

# Works without leading dot
matches = puremagic.magic_extension("png")
print(matches[0].mime_type)  # 'image/png'

# Returns empty list for unknown extensions (does not raise)
matches = puremagic.magic_extension(".notaformat")
print(matches)  # []
```

--------------------------------

### Get All Matches for a File with Deep Scan

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Use `puremagic.magic_file` to retrieve all `PureMagicWithConfidence` matches for a given file, including deep scan results. This is useful for debugging ambiguous files.

```python
import puremagic

results = puremagic.magic_file("test/resources/images/test.gif")
# [PureMagicWithConfidence(byte_match=b'GIF87a', offset=0, extension='.gif',
#      mime_type='image/gif', name='Graphics interchange format file (GIF87a)', confidence=0.6),
#  PureMagicWithConfidence(byte_match=b'GIF', offset=0, extension='.gif',
#      mime_type='image/gif', name='GIF file', confidence=0.3)]

for match in results:
    print(f"{match.extension:10s} {match.confidence:.0%} {match.name}")

# Deep scan result (confidence=1.0) for an MP3:
mp3_results = puremagic.magic_file("test/resources/audio/test.mp3")
print(mp3_results[0])
# PureMagicWithConfidence(..., extension='.mp3',
#     name='MPEG-1 Audio Layer III (MP3) file [128k 44.1Khz Stereo CBR ID3v2.3]',
#     confidence=1.0)
```

--------------------------------

### Get All Matches from a Stream

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Use `puremagic.magic_stream` to get all `PureMagicWithConfidence` matches from a binary stream. The stream is automatically rewound to position 0 after reading, allowing further operations on it.

```python
import puremagic

with open("test/resources/video/test.mp4", "rb") as f:
    matches = puremagic.magic_stream(f)
    # Stream is automatically rewound; f is still usable after this call

for m in matches:
    print(f"{m.extension} {m.mime_type} conf={m.confidence}")
# .mp4 video/mp4 conf=0.8
# .mp4 video/mp4 conf=0.8 (second match from multi-part signature)

# Check that stream is rewound
with open("test/resources/images/test.png", "rb") as f:
    _ = puremagic.magic_stream(f)
    print(f.tell())  # 0 (stream rewound)
```

--------------------------------

### Run puremagic from the command line

Source: https://github.com/cdgriffith/puremagic/blob/master/README.rst

Execute puremagic as a module from the command line to scan files. Options include returning MIME types or verbose output.

```bash
$ python -m puremagic [options] filename ...
```

```bash
$ python -m puremagic test/resources/images/test.gif
'test/resources/images/test.gif' : .gif
```

```bash
$ python -m puremagic -m test/resources/images/test.gif test/resources/audio/test.mp3
'test/resources/images/test.gif' : image/gif
'test/resources/audio/test.mp3' : audio/mpeg
```

--------------------------------

### Identify files from the command line with python -m puremagic

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Identifies files from the command line, printing their extensions or MIME types. Supports single files, multiple files, and all files within a directory. Options include returning MIME types (`-m`), verbose output (`-v`), and looking up MIME types by extension (`-e`).

```bash
# Identify a single file (prints extension)
python -m puremagic test/resources/images/test.gif
# 'test/resources/images/test.gif' : .gif

# Return MIME type with -m / --mime
python -m puremagic -m test/resources/images/test.gif test/resources/audio/test.mp3
# 'test/resources/images/test.gif' : image/gif
# 'test/resources/audio/test.mp3' : audio/mpeg

# Verbose output: all matches with confidence, byte match, and offset
python -m puremagic -v test/resources/office/test.docx
# 'test/resources/office/test.docx' : .docx
# Total Possible Matches: 1
# Deepscan Match
# Name: Word Document
# Confidence: 100%
# Extension: .docx
# Mime Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
# Byte Match: b'PK\x03\x04'
# Offset: 0

# Look up MIME type for a file extension (no file needed)
python -m puremagic -e pdf
# application/pdf

python -m puremagic -v -e mp3
# Total Possible Matches: 5
# Best Match
# Name: MPEG audio stream, Layer III (MP3)
# Confidence: 80%
# Extension: .mp3
# ...

# Scan all files in a directory
python -m puremagic test/resources/images/
# 'test/resources/images/test.gif' : .gif
# 'test/resources/images/test.png' : .png
# ...

# Disable deep scan for faster (magic-number-only) results
PUREMAGIC_DEEPSCAN=0 python -m puremagic test/resources/office/test.docx
# 'test/resources/office/test.docx' : .zip (generic ZIP without deep scan)
```

--------------------------------

### Load magic signature database with puremagic.magic_data

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Loads and returns the raw magic database. Uses the bundled `magic_data.json` by default, but can accept a custom path for an alternate database. The database is returned as four structures: headers, footers, extensions, and multi-part entries.

```python
from puremagic.main import magic_data, PureMagic

headers, footers, extensions, multi_part = magic_data()

print(f"Header signatures : {len(headers)}")
print(f"Footer signatures : {len(footers)}")
print(f"Extension-only : {len(extensions)}")
print(f"Multi-part entries: {len(multi_part)}")

# Inspect a signature
sample = headers[0]
print(f"bytes={sample.byte_match!r} offset={sample.offset} ext={sample.extension}")

# Load a custom/augmented database
custom_headers, _, _, _ = magic_data("/path/to/custom_magic_data.json")
```

--------------------------------

### Identify Raw Bytes or String

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Use `puremagic.from_string` with bytes or a string to identify its type. Provide a `filename` hint to boost confidence if the extension matches.

```python
import puremagic

# PNG magic bytes
png_header = b'\x89PNG\r\n\x1a\n' + b'\x00' * 100
print(puremagic.from_string(png_header))             # '.png'
print(puremagic.from_string(png_header, mime=True))  # 'image/png'

# With filename hint (raises confidence if extension matches)
gif_bytes = open("test/resources/images/test.gif", "rb").read()
print(puremagic.from_string(gif_bytes, filename="animation.gif"))  # '.gif'

# PDF bytes
pdf_bytes = b'%PDF-1.4 ...' + b'\x00' * 50 + b'startxref\n0\n%%EOF'
print(puremagic.from_string(pdf_bytes))  # '.pdf'
```

--------------------------------

### magic_extension

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Retrieves all database entries matching a given file extension, sorted by confidence.

```APIDOC
## magic_extension(extension)

### Description
Returns a list of all `PureMagicWithConfidence` entries matching the given extension, sorted by confidence descending.

### Parameters
#### Query Parameters
- **extension** (string) - Required - The file extension to query (e.g., ".zip", "png").

### Request Example
```python
import puremagic

matches = puremagic.magic_extension(".zip")
for m in matches:
    print(f"conf={m.confidence} bytes={m.byte_match!r} {m.name}")

matches = puremagic.magic_extension("png")
print(matches[0].mime_type)

matches = puremagic.magic_extension(".notaformat")
print(matches)
```

### Response
#### Success Response
- **matches** (list) - A list of `PureMagicWithConfidence` objects matching the extension, or an empty list if none are found.

#### Response Example
```
conf=0.4 bytes=b'PK\x03\x04' ZIP archive
conf=0.3 bytes=b'PK\x05\x06' ZIP archive (empty)
...
image/png
[]
```
```

--------------------------------

### from_file(filename, mime=False)

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Identifies a file by its path. It reads the file's magic bytes (header and footer) and returns the best-guess file extension or MIME type. Deep scan is enabled by default for more accurate identification.

```APIDOC
## from_file(filename, mime=False)

### Description
Opens the file, reads its magic bytes (header + footer), and returns the best-guess extension or MIME type. Deep scan runs automatically when enabled.

### Method
`puremagic.from_file`

### Parameters
#### Path Parameters
- **filename** (string) - Required - The path to the file to identify.
- **mime** (boolean) - Optional - If True, returns the MIME type; otherwise, returns the file extension.
  Defaults to False.

### Request Example
```python
import puremagic

# Extension (default)
print(puremagic.from_file("test/resources/images/test.gif"))   # '.gif'
print(puremagic.from_file("test/resources/audio/test.mp3"))    # '.mp3'
print(puremagic.from_file("test/resources/office/test.docx"))  # '.docx'

# MIME type
print(puremagic.from_file("test/resources/images/test.png", mime=True))
# 'image/png'
print(puremagic.from_file("test/resources/office/test.xlsx", mime=True))
# 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'

# Error handling
from puremagic import PureError, PureValueError

try:
    result = puremagic.from_file("empty_or_unknown.bin")
except PureError:
    print("File type unrecognized")
except PureValueError:
    print("File was empty")
```
```

--------------------------------

### from_string(string, mime=False, filename=None)

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Identifies the file type from raw bytes or a string. It can optionally accept a filename hint to improve confidence if the file extension matches the content.

```APIDOC
## from_string(string, mime=False, filename=None)

### Description
Accepts `bytes` or `str`. Optionally accepts a `filename` hint to boost confidence when the extension matches.

### Method
`puremagic.from_string`

### Parameters
#### Path Parameters
- **string** (bytes or str) - Required - The raw bytes or string content to identify.
- **mime** (boolean) - Optional - If True, returns the MIME type; otherwise, returns the file extension. Defaults to False.
- **filename** (string) - Optional - A filename hint to potentially increase identification confidence.

### Request Example
```python
import puremagic

# PNG magic bytes
png_header = b'\x89PNG\r\n\x1a\n' + b'\x00' * 100
print(puremagic.from_string(png_header))             # '.png'
print(puremagic.from_string(png_header, mime=True))  # 'image/png'

# With filename hint (raises confidence if extension matches)
gif_bytes = open("test/resources/images/test.gif", "rb").read()
print(puremagic.from_string(gif_bytes, filename="animation.gif"))  # '.gif'

# PDF bytes
pdf_bytes = b'%PDF-1.4 ...' + b'\x00' * 50 + b'startxref\n0\n%%EOF'
print(puremagic.from_string(pdf_bytes))  # '.pdf'
```
```

--------------------------------

### Look Up MIME Type by Extension

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Use `puremagic.from_extension` to query the magic database for the MIME type or human-readable name associated with a file extension. This function does not read any files.

```python
import puremagic
from puremagic import PureError

# MIME type lookup (default)
print(puremagic.from_extension(".pdf"))   # 'application/pdf'
print(puremagic.from_extension("mp3"))    # 'audio/mpeg' (leading dot optional)
print(puremagic.from_extension(".docx"))
# 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'

# Human-readable name instead of MIME
print(puremagic.from_extension(".gif", mime=False))  # 'GIF file'
print(puremagic.from_extension(".png", mime=False))  # 'PNG image'

# Unknown extension raises PureError
try:
    puremagic.from_extension(".xyz123")
except PureError as e:
    print(e)  # Could not find extension '.xyz123' in magic database
```

--------------------------------

### Identify File Type from Raw Bytes

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Use `puremagic.magic_string` to identify file types from a byte string. Provide an optional filename hint to trigger a deep scan, which can help distinguish between similar file types like DOCX and ZIP.

```python
import puremagic

# ZIP magic bytes (will match generic .zip without deep scan)
zip_data = open("test/resources/archive/test.zip", "rb").read()
matches = puremagic.magic_string(zip_data)
for m in matches:
    print(f"{m.extension} conf={m.confidence}")

# With filename, deep scan can distinguish docx from zip
docx_data = open("test/resources/office/test.docx", "rb").read()
matches = puremagic.magic_string(docx_data, filename="test/resources/office/test.docx")
print(matches[0].extension)  # '.docx'
print(matches[0].mime_type)
# 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
```

--------------------------------

### Identify File Type from Stream

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Use `puremagic.from_stream` to identify file types from open file handles or BytesIO buffers. Optionally provide a filename hint for better accuracy in ambiguous cases.

```python
import puremagic
import io

# From an open file handle
with open("test/resources/video/test.mp4", "rb") as f:
    print(puremagic.from_stream(f))             # '.mp4'
    print(puremagic.from_stream(f, mime=True))  # 'video/mp4'

# From an in-memory BytesIO buffer
zip_magic = b'PK\x03\x04' + b'\x00' * 200
buf = io.BytesIO(zip_magic)
print(puremagic.from_stream(buf))  # '.zip'

# With filename hint for deep scan disambiguation
with open("spreadsheet.xlsx", "rb") as f:
    ext = puremagic.from_stream(f, filename="spreadsheet.xlsx")
    print(ext)  # '.xlsx'
```

--------------------------------

### Extract file extension with puremagic.ext_from_filename

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Extracts the file extension from a filename string. Handles double extensions correctly by consulting the magic database. Input filenames are case-insensitive and extensions are returned in lowercase.

```python
import puremagic

print(puremagic.ext_from_filename("archive.tar.gz"))  # '.tar.gz'
print(puremagic.ext_from_filename("photo.JPG"))       # '.jpg' (lowercased)
print(puremagic.ext_from_filename("document.docx"))   # '.docx'
print(puremagic.ext_from_filename("noextension"))     # ''
print(puremagic.ext_from_filename("backup.fb2.zip"))  # '.fb2.zip'
```

--------------------------------

### magic_string

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Identifies all file type matches from a raw byte string, with optional filename hint for deep scanning.

```APIDOC
## magic_string(string, filename=None)

### Description
Returns all matches from a byte string. Optionally accepts a filename hint to trigger deep scan.

### Parameters
#### Request Body
- **string** (bytes) - Required - The raw byte string to analyze.
- **filename** (string) - Optional - A filename hint to trigger deep scan for more accurate results.

### Request Example
```python
import puremagic

zip_data = open("test/resources/archive/test.zip", "rb").read()
matches = puremagic.magic_string(zip_data)
for m in matches:
    print(f"{m.extension} conf={m.confidence}")

docx_data = open("test/resources/office/test.docx", "rb").read()
matches = puremagic.magic_string(docx_data, filename="test/resources/office/test.docx")
print(matches[0].extension)
print(matches[0].mime_type)
```

### Response
#### Success Response
- **matches** (list) - A list of `PureMagicWithConfidence` objects representing the identified file types.

#### Response Example
```
.zip conf=0.8
.docx
application/vnd.openxmlformats-officedocument.wordprocessingml.document
```
```

--------------------------------

### magic_data

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Loads and returns the raw magic signature database, which can be the bundled JSON file or a custom path.

```APIDOC
## magic_data(filename=None)

### Description
Loads and returns the raw magic signature database as four structures.
Uses the bundled `magic_data.json` by default; accepts a custom path to load an alternate database.

### Parameters
#### Path Parameters
- **filename** (string) - Optional - Path to a custom magic database file. If not provided, the bundled `magic_data.json` is used.

### Request Example
```python
from puremagic.main import magic_data, PureMagic

headers, footers, extensions, multi_part = magic_data()

print(f"Header signatures : {len(headers)}")
print(f"Footer signatures : {len(footers)}")
print(f"Extension-only : {len(extensions)}")
print(f"Multi-part entries: {len(multi_part)}")

# Inspect a signature
sample = headers[0]
print(f"bytes={sample.byte_match!r} offset={sample.offset} ext={sample.extension}")

# Load a custom/augmented database
custom_headers, _, _, _ = magic_data("/path/to/custom_magic_data.json")
```

### Response
#### Success Response (tuple)
- Returns a tuple containing four structures: `headers`, `footers`, `extensions`, and `multi_part`.

### Response Example
```
Header signatures : 123
Footer signatures : 456
Extension-only : 789
Multi-part entries: 10
bytes=b'\x89PNG\r\n\x1a\n' offset=0 ext='.png'
```
```

--------------------------------

### python -m puremagic

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Command-line interface to identify files, printing their extensions or MIME types.

```APIDOC
## python -m puremagic

### Description
Identifies one or more files (or all files in a directory) and prints extensions or MIME types.

### Usage
```bash
# Identify a single file (prints extension)
python -m puremagic test/resources/images/test.gif

# Return MIME type with -m / --mime
python -m puremagic -m test/resources/images/test.gif test/resources/audio/test.mp3

# Verbose output: all matches with confidence, byte match, and offset
python -m puremagic -v test/resources/office/test.docx

# Look up MIME type for a file extension (no file needed)
python -m puremagic -e pdf

# Scan all files in a directory
python -m puremagic test/resources/images/

# Disable deep scan for faster (magic-number-only) results
PUREMAGIC_DEEPSCAN=0 python -m puremagic test/resources/office/test.docx
```

### Options
- `-m`, `--mime`: Output MIME type instead of extension.
- `-v`, `--verbose`: Show detailed information about all possible matches.
- `-e`, `--extension`: Look up MIME type for a given file extension.

### Environment Variables
- `PUREMAGIC_DEEPSCAN`: Set to `0` to disable deep scanning and rely only on magic numbers.
```

--------------------------------

### magic_file

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Identifies all possible file types for a given file path, including deep scan results, sorted by confidence.

```APIDOC
## magic_file(filename)

### Description
Returns the full list of `PureMagicWithConfidence` matches including deep scan results. Useful for debugging ambiguous files.

### Parameters
#### Path Parameters
- **filename** (string) - Required - The path to the file to analyze.

### Request Example
```python
import puremagic

results = puremagic.magic_file("test/resources/images/test.gif")
for match in results:
    print(f"{match.extension:10s} {match.confidence:.0%} {match.name}")

mp3_results = puremagic.magic_file("test/resources/audio/test.mp3")
print(mp3_results[0])
```

### Response
#### Success Response
- **results** (list) - A list of `PureMagicWithConfidence` objects, each containing `byte_match`, `offset`, `extension`, `mime_type`, `name`, and `confidence`.

#### Response Example
```
[PureMagicWithConfidence(byte_match=b'GIF87a', offset=0, extension='.gif', mime_type='image/gif', name='Graphics interchange format file (GIF87a)', confidence=0.6),
 PureMagicWithConfidence(byte_match=b'GIF', offset=0, extension='.gif', mime_type='image/gif', name='GIF file', confidence=0.3)]
PureMagicWithConfidence(..., extension='.mp3', name='MPEG-1 Audio Layer III (MP3) file [128k 44.1Khz Stereo CBR ID3v2.3]', confidence=1.0)
```
```

--------------------------------

### Identify from File-Like Object

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Use `puremagic.from_stream` with any seekable binary stream, such as file handles or `io.BytesIO` objects, to identify its type.

```python
import puremagic
import io
```

--------------------------------

### from_extension

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Looks up the MIME type or human-readable name associated with a file extension.

```APIDOC
## from_extension(extension, mime=True)

### Description
Queries the magic database for the MIME type (or name) associated with a file extension. Does not read any file.

### Parameters
#### Query Parameters
- **extension** (string) - Required - The file extension to look up (e.g., ".pdf", "mp3").
- **mime** (boolean) - Optional - If True (default), returns the MIME type. If False, returns the human-readable name.

### Request Example
```python
import puremagic
from puremagic import PureError

print(puremagic.from_extension(".pdf"))
print(puremagic.from_extension("mp3"))
print(puremagic.from_extension(".docx"))
print(puremagic.from_extension(".gif", mime=False))

try:
    puremagic.from_extension(".xyz123")
except PureError as e:
    print(e)
```

### Response
#### Success Response
- **result** (string) - The MIME type or human-readable name for the given extension.

#### Error Response
- **PureError** - Raised if the extension is not found in the magic database.

#### Response Example
```
application/pdf
audio/mpeg
application/vnd.openxmlformats-officedocument.wordprocessingml.document
GIF file
Could not find extension '.xyz123' in magic database
```
```

--------------------------------

### Control deep scanning with PUREMAGIC_DEEPSCAN environment variable

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Deep scan is enabled by default for more accurate file type identification. Set the `PUREMAGIC_DEEPSCAN` environment variable to `0` to disable it globally, resulting in faster, magic-number-only matching. This can be done programmatically or via the shell.

```python
import os
import puremagic

# Disable deep scan programmatically (or set the variable in the environment)
os.environ["PUREMAGIC_DEEPSCAN"] = "0"

# With deep scan disabled, ZIP-based formats all resolve to generic .zip
result = puremagic.from_file("spreadsheet.xlsx")
print(result)  # '.zip' (not '.xlsx')

# Re-enable
del os.environ["PUREMAGIC_DEEPSCAN"]
result = puremagic.from_file("spreadsheet.xlsx")
print(result)  # '.xlsx'

# Shell usage
# export PUREMAGIC_DEEPSCAN=0
# python -m puremagic myfile.docx  -> .zip
```

--------------------------------

### ext_from_filename

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Extracts the file extension from a filename string, correctly handling double extensions by consulting the magic database.

```APIDOC
## ext_from_filename(filename)

### Description
Extracts the extension from a filename string, handling double extensions (`.tar.gz`, `.fb2.zip`) correctly by checking against the magic database.

### Parameters
#### Path Parameters
- **filename** (string) - Required - The name of the file from which to extract the extension.

### Request Example
```python
import puremagic

print(puremagic.ext_from_filename("archive.tar.gz"))
print(puremagic.ext_from_filename("photo.JPG"))
print(puremagic.ext_from_filename("document.docx"))
print(puremagic.ext_from_filename("noextension"))
print(puremagic.ext_from_filename("backup.fb2.zip"))
```

### Response
#### Success Response (string)
- Returns the file extension (e.g., '.tar.gz', '.jpg', '.docx', '').

### Response Example
```
.tar.gz
.jpg
.docx

.fb2.zip
```
```

--------------------------------

### PureError Exception Handling

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Raised when the file type cannot be determined by puremagic functions like `from_file`, `from_string`, or `from_stream`. Catch this to handle identification failures gracefully.

```python
import puremagic
from puremagic import PureError

try:
    ext = puremagic.from_file("unknown_binary.dat")
except PureError as e:
    print(f"Could not identify: {e}")
    # Could not identify: Could not identify file
```

--------------------------------

### PureMagicWithConfidence Data Structure

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Extends `PureMagic` with a confidence score (0.0-1.0). Results from the `magic_*` functions are lists of these, sorted by confidence.

```python
from puremagic.main import PureMagicWithConfidence

# Confidence scale:
# 1.0: confirmed by deep scan
# 0.9: 9+ byte magic match, or extension-boosted
# 0.8: 8-byte match
# 0.7: 7-byte match
# ...
# 0.1: extension-only match (no magic bytes found)

result = PureMagicWithConfidence(
    byte_match=b'GIF87a',
    offset=0,
    extension='.gif',
    mime_type='image/gif',
    name='Graphics interchange format file (GIF87a)',
    confidence=0.6
)
print(f"{result.extension} ({result.confidence:.0%} confidence)")
# .gif (60% confidence)
```

--------------------------------

### PureMagic Data Structure

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Represents a single entry from the magic database.
Use this to access raw signature details like byte match, offset, extension, MIME type, and name.

```python
from puremagic.main import PureMagic

# Fields: byte_match, offset, extension, mime_type, name
entry = PureMagic(
    byte_match=b'\x89PNG\r\n\x1a\n',
    offset=0,
    extension='.png',
    mime_type='image/png',
    name='PNG image'
)
print(entry.extension)  # '.png'
print(entry.mime_type)  # 'image/png'
```

--------------------------------

### from_stream(stream, mime=False, filename=None)

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Identifies the file type from an open file-like object. This function works with any seekable binary stream, including file handles and `io.BytesIO` objects.

```APIDOC
## from_stream(stream, mime=False, filename=None)

### Description
Works with any seekable binary stream: file handles, `io.BytesIO`, and similar objects. A raw network stream is not seekable and would need to be buffered first.

### Method
`puremagic.from_stream`

### Parameters
#### Path Parameters
- **stream** (file-like object) - Required - The seekable binary stream to identify.
- **mime** (boolean) - Optional - If True, returns the MIME type; otherwise, returns the file extension. Defaults to False.
- **filename** (string) - Optional - A filename hint to potentially increase identification confidence.

### Request Example
```python
import puremagic
import io

# Example with io.BytesIO
buffer = io.BytesIO(b'\x89PNG\r\n\x1a\n' + b'\x00' * 100)
print(puremagic.from_stream(buffer, mime=False))  # '.png'

# Example with a file handle
with open("test/resources/images/test.jpg", "rb") as f:
    print(puremagic.from_stream(f, mime=True))  # 'image/jpeg'
```
```

--------------------------------

### magic_stream

Source: https://context7.com/cdgriffith/puremagic/llms.txt

Identifies all file type matches from a binary stream, rewinding the stream after reading.

```APIDOC
## magic_stream(stream, filename=None)

### Description
Returns all `PureMagicWithConfidence` matches from a binary stream. The stream is rewound to position 0 after reading.

### Parameters
#### Request Body
- **stream** (binary stream) - Required - The binary stream to analyze.
- **filename** (string) - Optional - A filename hint to trigger deep scan.

### Request Example
```python
import puremagic

with open("test/resources/video/test.mp4", "rb") as f:
    matches = puremagic.magic_stream(f)
    for m in matches:
        print(f"{m.extension} {m.mime_type} conf={m.confidence}")
    print(f.tell())  # Stream is rewound to 0
```

### Response
#### Success Response
- **matches** (list) - A list of `PureMagicWithConfidence` objects representing the identified file types.

#### Response Example
```
.mp4 video/mp4 conf=0.8
.mp4 video/mp4 conf=0.8
0
```
```

--------------------------------

### Identify file from a stream

Source: https://github.com/cdgriffith/puremagic/blob/master/README.rst

Use `magic_stream` to identify file properties from an open file object. This method returns detailed information including byte matches, offsets, extensions, MIME types, names, and confidence levels.

```python
import puremagic

with open(r"test\resources\video\test.mp4", "rb") as file:
    print(puremagic.magic_stream(file))

# [PureMagicWithConfidence(byte_match=b'ftypisom', offset=4, extension='.mp4', mime_type='video/mp4', name='MPEG-4 video', confidence=0.8),
#  PureMagicWithConfidence(byte_match=b'iso2avc1mp4', offset=20, extension='.mp4', mime_type='video/mp4', name='MP4 Video', confidence=0.8)]
```

--------------------------------

### Disable deep scan using environment variable

Source: https://github.com/cdgriffith/puremagic/blob/master/README.rst

To disable the deep scan feature, set the `PUREMAGIC_DEEPSCAN` environment variable to `0`.

```bash
$ export PUREMAGIC_DEEPSCAN=0
```
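
With deep scan disabled, identification relies on magic numbers alone. The sketch below illustrates that idea using only the standard library; it is not puremagic's implementation. The `Signature` and `Match` types, the four-entry `SIGNATURES` table, and the `sketch_matches` function are all hypothetical names introduced here, and the scoring rule (0.1 per matched byte, capped at 0.9) is an assumption loosely modeled on the confidence scale listed for `PureMagicWithConfidence`.

```python
from typing import NamedTuple


class Signature(NamedTuple):
    """One hypothetical magic-number entry (not puremagic's real database)."""
    byte_match: bytes
    offset: int
    extension: str


class Match(NamedTuple):
    extension: str
    confidence: float


# Tiny illustrative table; the bundled magic_data.json is far larger.
SIGNATURES = [
    Signature(b"\x89PNG\r\n\x1a\n", 0, ".png"),
    Signature(b"GIF87a", 0, ".gif"),
    Signature(b"GIF", 0, ".gif"),
    Signature(b"PK\x03\x04", 0, ".zip"),
]


def sketch_matches(data: bytes) -> list[Match]:
    """Return every signature found at its offset, best match first."""
    found = []
    for sig in SIGNATURES:
        end = sig.offset + len(sig.byte_match)
        if data[sig.offset:end] == sig.byte_match:
            # Assumed scoring: 0.1 per matched byte, capped at 0.9.
            confidence = min(len(sig.byte_match), 9) / 10
            found.append(Match(sig.extension, confidence))
    return sorted(found, key=lambda m: m.confidence, reverse=True)


if __name__ == "__main__":
    # A DOCX file is a ZIP container, so a header-only check sees just 'PK\x03\x04'.
    for match in sketch_matches(b"PK\x03\x04" + b"\x00" * 32):
        print(match.extension, match.confidence)
```

Because a header-only matcher like this sees nothing beyond the leading bytes, every ZIP-based format looks the same to it, which is consistent with `.docx` and `.xlsx` resolving to `.zip` when `PUREMAGIC_DEEPSCAN=0`.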