# PassportEye

PassportEye is a Python library for extracting and parsing machine-readable zone (MRZ) information from scanned identification documents, including passports, visas, and ID cards. The library combines image processing with Google Tesseract OCR to detect MRZ regions in arbitrarily positioned documents and extract structured data such as document number, holder's name, date of birth, nationality, and expiration date.

The core processing pipeline uses morphological operations, edge detection, and contour analysis to locate candidate MRZ regions, then applies OCR with automatic error correction tuned to the limited MRZ character set. PassportEye supports the standard ICAO document types (TD1 for ID cards, TD2 for ID-2 size documents, TD3 for standard passports, and MRVA/MRVB for visas) and validates results via check digit verification. The library provides both a simple Python API and command-line tools for integration into document processing workflows.

## read_mrz - Main MRZ Extraction Function

The primary interface for extracting MRZ data from document images. It takes an image file path or byte stream and returns a parsed MRZ object containing all extracted fields with validation status, or `None` if no MRZ is detected. Supports JPEG, PNG, and PDF files.
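The validation status rests on MRZ check digits. As a rough illustration of how one such digit is computed under the ICAO Doc 9303 scheme (repeating weights 7, 3, 1, sum taken modulo 10), here is a standalone sketch. It is not PassportEye's internal code, and `mrz_check_digit` is a hypothetical helper name used only for this example.

```python
def mrz_check_digit(field: str) -> int:
    """Compute the ICAO 9303 check digit for a single MRZ field.

    Hypothetical helper for illustration; not part of the PassportEye API.
    """
    weights = (7, 3, 1)  # weights repeat across the field
    total = 0
    for position, char in enumerate(field):
        if "0" <= char <= "9":
            value = int(char)                   # digits keep their value
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10   # A=10 ... Z=35
        elif char == "<":
            value = 0                           # filler counts as zero
        else:
            raise ValueError(f"Invalid MRZ character: {char!r}")
        total += value * weights[position % 3]
    return total % 10


# A field is consistent when the computed digit equals the digit
# printed immediately after it in the MRZ.
document_number = "10000999<"
printed_digit = 6
print(mrz_check_digit(document_number) == printed_digit)
```

A mismatch between the computed and printed digit usually points to an OCR misread in that field, which is why a failed check lowers confidence in the extracted value.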
```python
from passporteye import read_mrz

# Basic usage with an image file
mrz = read_mrz('/path/to/passport.jpg')

if mrz is not None:
    # Check if parsing was successful
    if mrz.valid:
        print(f"Document Type: {mrz.type}")
        print(f"Country: {mrz.country}")
        print(f"Document Number: {mrz.number}")
        print(f"Surname: {mrz.surname}")
        print(f"Names: {mrz.names}")
        print(f"Nationality: {mrz.nationality}")
        print(f"Date of Birth: {mrz.date_of_birth}")
        print(f"Sex: {mrz.sex}")
        print(f"Expiration Date: {mrz.expiration_date}")
        print(f"Validation Score: {mrz.valid_score}/100")
    else:
        # Partial parsing - some fields may still be usable
        print(f"Partial match (score: {mrz.valid_score})")
        print(f"Check digits valid: {mrz.valid_check_digits}")
else:
    print("No MRZ detected in image")

# With ROI (Region of Interest) extraction
mrz = read_mrz('/path/to/passport.jpg', save_roi=True)
if mrz is not None and 'roi' in mrz.aux:
    roi_image = mrz.aux['roi']  # numpy ndarray of the detected MRZ region

# Using legacy Tesseract engine (often better results)
mrz = read_mrz('/path/to/passport.jpg', extra_cmdline_params='--oem 0')

# Processing from byte stream
with open('/path/to/passport.jpg', 'rb') as f:
    mrz = read_mrz(f)
```

## MRZ Class - Text Parsing and Validation

Parses MRZ text strings into structured data with full ICAO specification compliance. Supports TD1 (3-line, 30 chars), TD2 (2-line, 36 chars), TD3 (2-line, 44 chars), MRVA and MRVB visa formats. Validates all check digits and provides confidence scoring.

```python
from passporteye.mrz.text import MRZ

# Parse TD1 ID card (3 lines, 30 characters each)
mrz_td1 = MRZ([
    'IDAUT10000999<6<<<<<<<<<<<<<<<',
    '7109094F1112315AUT<<<<<<<<<<<4',
    'MUSTERFRAU<