### Install ocrmac from Source

Source: https://github.com/straussmaximilian/ocrmac/blob/main/docs/installation.rst

Install ocrmac after obtaining the source code, using the setup.py script.

```console
python setup.py install
```

--------------------------------

### Set up local development environment

Source: https://github.com/straussmaximilian/ocrmac/blob/main/CONTRIBUTING.rst

Install the project in a virtual environment using setup.py develop. This command installs the project in editable mode.

```shell
mkvirtualenv ocrmac
cd ocrmac/
python setup.py develop
```

--------------------------------

### Install Click

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/module-overview.md

Install the Click CLI framework. Click is a required dependency of the CLI module, although the CLI itself is not yet implemented.

```bash
# Quote the requirement so the shell does not treat ">" as a redirect
pip install "Click>=7.0"
```

--------------------------------

### OCR Class Constructor Examples

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/api-reference/ocr-class.md

Demonstrates how to instantiate the OCR class using different input types and framework configurations. Includes examples for image file paths, PIL Image objects, and the LiveText framework.

```python
from ocrmac.ocrmac import OCR
from PIL import Image

# From image file path
ocr = OCR('document.png', recognition_level='accurate', language_preference=['en-US'])

# From PIL Image object
img = Image.open('photo.jpg')
ocr = OCR(img, framework='vision', detail=True)

# Using livetext (macOS Sonoma+)
ocr = OCR('scan.png', framework='livetext', unit='line')
```

--------------------------------

### Install ocrmac via pip

Source: https://github.com/straussmaximilian/ocrmac/blob/main/README.md

Install the ocrmac library using pip. This is the first step before using any of its functionalities.
```bash
pip install ocrmac
```

--------------------------------

### Install pyobjc-framework-Vision

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/module-overview.md

Install the pyobjc-framework-Vision wrapper for the Apple Vision framework. This is required for VNRecognizeTextRequest OCR functionality.

```bash
pip install pyobjc-framework-Vision
```

--------------------------------

### Install Matplotlib

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/errors.md

Use this command to install the Matplotlib library if it is missing, which is required for the annotate_matplotlib() feature.

```bash
pip install matplotlib
```

--------------------------------

### Install Pillow

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/module-overview.md

Install the Pillow library for image handling. This is a required dependency for image loading and format conversion.

```bash
pip install pillow
```

--------------------------------

### OCR Class Constructor

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/configuration.md

Example of initializing the OCR class with various configuration options. The default values are shown for reference.

```python
OCR(
    image,
    framework="vision",
    recognition_level="accurate",
    language_preference=None,
    confidence_threshold=0.0,
    detail=True,
    unit='token'
)
```

--------------------------------

### Handle Framework Fallback

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/INDEX.md

Provides an example of attempting to use the 'livetext' framework and falling back to 'vision' if an ImportError occurs.
```python
from ocrmac.ocrmac import OCR

# `image` is a file path or a PIL Image
try:
    ocr = OCR(image, framework='livetext')
except ImportError:
    ocr = OCR(image, framework='vision')
```

--------------------------------

### OCR Initialization with LanguageCode

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/types.md

Shows how to initialize the OCR class with single or multiple language preferences. It also includes an example of handling a ValueError for an invalid language code.

```python
from ocrmac.ocrmac import OCR

# Single language
ocr = OCR('english_doc.png', language_preference=['en-US'])

# Multiple languages (post-processing hints)
ocr = OCR('multilingual.png', language_preference=['en-US', 'de-DE', 'fr-FR'])

# Query available languages
from ocrmac.ocrmac import text_from_image

try:
    text_from_image('page.png', language_preference=['invalid-code'])
except ValueError as e:
    print("Available languages:", e)
```

--------------------------------

### Run tests and checks

Source: https://github.com/straussmaximilian/ocrmac/blob/main/CONTRIBUTING.rst

Ensure your changes pass linting (flake8) and all tests, including cross-version compatibility with tox. Install flake8 and tox if not already present.

```shell
flake8 ocrmac tests
python setup.py test  # or: pytest
tox
```

--------------------------------

### Download ocrmac Tarball

Source: https://github.com/straussmaximilian/ocrmac/blob/main/docs/installation.rst

Download the source tarball for ocrmac from GitHub to install from source.

```console
curl -OJL https://github.com/straussmaximilian/ocrmac/tarball/master
```

--------------------------------

### Basic OCR Recognition

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/QUICK-REFERENCE.md

Initialize the OCR class with an image path and call the recognize method to get text, confidence, and bounding box information. This is a quick way to start using ocrmac for text extraction.
```python
from ocrmac.ocrmac import OCR

ocr = OCR('image.png')
results = ocr.recognize()

for text, confidence, bbox in results:
    print(f"{text} ({confidence:.0%})")
```

--------------------------------

### Handle Matplotlib ImportError and Fallback

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/errors.md

This example demonstrates how to handle an ImportError if Matplotlib is not installed when attempting to annotate an image. It provides a fallback to using the PIL annotation.

```python
from ocrmac.ocrmac import OCR

ocr = OCR('image.png')

try:
    fig = ocr.annotate_matplotlib()
except ImportError as e:
    print("Matplotlib not available, using PIL instead")
    annotated = ocr.annotate_PIL()
```

--------------------------------

### LiveText Recognition Example (Token and Line Level)

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/types.md

Demonstrates how to use OCR.recognize() for both token-level and line-level text extraction. The 'conf' value is always 1.0 for LiveText results.

```python
from ocrmac.ocrmac import OCR

# Token-level
ocr = OCR('page.png', framework='livetext', unit='token')
results = ocr.recognize()
for text, conf, bbox in results:
    print(f"Token: {text}, Confidence: {conf}")  # conf is always 1.0

# Line-level
ocr_line = OCR('page.png', framework='livetext', unit='line')
results_line = ocr_line.recognize()
for line, conf, bbox in results_line:
    print(f"Full line: {line}")
```

--------------------------------

### OCR Example with BoundingBox

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/types.md

Demonstrates how to use the OCR class and process results, extracting text, confidence, and bounding box information. The bounding box coordinates are then printed in a formatted way.
```python
from ocrmac.ocrmac import OCR

ocr = OCR('document.png')
results = ocr.recognize()

for text, confidence, bbox in results:
    x, y, width, height = bbox
    print(f"Text: {text}")
    print(f"  Position: ({x:.2f}, {y:.2f})")
    print(f"  Size: {width:.2f} × {height:.2f}")
```

--------------------------------

### Clone ocrmac Repository

Source: https://github.com/straussmaximilian/ocrmac/blob/main/docs/installation.rst

Clone the ocrmac public repository from GitHub to install from source. GitHub no longer serves the unauthenticated `git://` protocol, so clone over HTTPS.

```console
git clone https://github.com/straussmaximilian/ocrmac.git
```

--------------------------------

### Batch Process Documents with Quality Verification

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/configuration.md

Iterate through files, process with 'accurate' recognition, and filter results by confidence. This example demonstrates a basic quality gate based on the ratio of high-confidence results.

```python
import os
from ocrmac.ocrmac import OCR

for filename in os.listdir('batch_documents'):
    if filename.endswith('.pdf'):
        continue  # Convert PDFs separately

    ocr = OCR(
        image=f'batch_documents/{filename}',
        framework='vision',
        recognition_level='accurate',
        language_preference=['en-US'],
        confidence_threshold=0.75,
        detail=True
    )
    results = ocr.recognize()

    # Filter by confidence for quality gate
    high_confidence = [r for r in results if r[1] >= 0.9]
    low_confidence = [r for r in results if r[1] < 0.9]

    if len(low_confidence) > len(high_confidence):
        print(f"⚠️ {filename}: Low confidence ratio")
    else:
        print(f"✓ {filename}: Quality acceptable")
```

--------------------------------

### Text-Only Extraction Example

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/types.md

Shows how to extract only the recognized text from an image using text_from_image with detail=False. With this setting the function returns a list of strings.
```python
from ocrmac.ocrmac import text_from_image

# Returns List[str]
texts = text_from_image('page.png', detail=False)
for text in texts:
    print(text)
```

--------------------------------

### Basic OCR Import and Usage

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/module-overview.md

Import the main OCR class and use it to recognize text from an image. This is the most common way to start using OCRMAC for basic OCR tasks.

```python
from ocrmac.ocrmac import OCR

ocr = OCR('image.png')
results = ocr.recognize()
```

--------------------------------

### Generate Annotated Image

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/QUICK-REFERENCE.md

Set `detail=True` and use `annotate_PIL()` to get an image with bounding boxes and recognized text drawn on it. Save the resulting PIL Image object.

```python
from ocrmac.ocrmac import OCR

ocr = OCR('image.png', detail=True)
annotated = ocr.annotate_PIL()
annotated.save('output.png')
```

--------------------------------

### Process Images in a File Pipeline with OCRMac

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/usage-patterns.md

Use this pattern to process all PNG images in a directory, save annotated images, and output recognition results as JSON. Ensure the 'ocrmac' library is installed and images are in PNG format.
```python
import json
from pathlib import Path

from ocrmac.ocrmac import OCR

def process_image_directory(directory):
    output_dir = Path(directory) / 'ocr_results'
    output_dir.mkdir(exist_ok=True)

    for image_file in Path(directory).glob('*.png'):
        print(f"Processing {image_file.name}...")
        ocr = OCR(str(image_file), recognition_level='accurate')
        results = ocr.recognize()

        # Save annotations
        if results:
            annotated = ocr.annotate_PIL()
            annotated.save(output_dir / f"{image_file.stem}_annotated.png")

        # Save results as JSON
        results_data = [
            {'text': text, 'confidence': conf, 'bbox': bbox}
            for text, conf, bbox in results
        ]
        with open(output_dir / f"{image_file.stem}_results.json", 'w') as f:
            json.dump(results_data, f, indent=2)

process_image_directory('input_images')
```

--------------------------------

### Clamping Coordinates to Image Bounds

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/api-reference/coordinate-conversion.md

Ensure coordinates stay within the image dimensions to prevent errors during cropping. This example shows how to clamp coordinates using max and min.

```python
# ❌ Wrong: May produce coordinates outside image
x1, y1, x2, y2 = convert_coordinates_pil(bbox, img.width, img.height)
cropped = img.crop((x1, y1, x2, y2))  # May fail if outside bounds

# ✓ Correct: Clamp to image dimensions
x1 = max(0, min(img.width, x1))
y1 = max(0, min(img.height, y1))
x2 = max(0, min(img.width, x2))
y2 = max(0, min(img.height, y2))
cropped = img.crop((x1, y1, x2, y2))
```

--------------------------------

### OCR Initialization with OutputUnit

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/types.md

Demonstrates how to initialize the OCR class for LiveText framework with different OutputUnit granularities. Use 'token' for fine-grained text units and 'line' for full lines of text.
```python
from ocrmac.ocrmac import OCR

# Token-level (fine-grained)
ocr_token = OCR('page.png', framework='livetext', unit='token')
tokens = ocr_token.recognize()

# Line-level
ocr_line = OCR('page.png', framework='livetext', unit='line')
lines = ocr_line.recognize()
```

--------------------------------

### OCR Initialization with Recognition Levels

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/types.md

Demonstrates initializing the OCR class with different recognition levels. Use 'fast' for quick, less critical tasks and 'accurate' for important documents or poor image quality.

```python
from ocrmac.ocrmac import OCR

# Fast recognition for real-time use
ocr_fast = OCR('webcam_frame.png', recognition_level='fast')

# Accurate recognition for documents
ocr_accurate = OCR('scanned_doc.png', recognition_level='accurate')
```

--------------------------------

### OCR Initialization with Framework Selection

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/types.md

Shows how to initialize the OCR class specifying either the 'vision' or 'livetext' framework. The 'vision' framework is the default and offers more configuration options, while 'livetext' is optimized for performance on macOS Sonoma and later.

```python
from ocrmac.ocrmac import OCR

# Vision framework (default)
ocr_vision = OCR('page.png', framework='vision')

# LiveText framework
ocr_livetext = OCR('page.png', framework='livetext')
```

--------------------------------

### OCR Initialization with ImageInput

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/types.md

Demonstrates initializing the OCR class with different ImageInput types: a file path, an in-memory PIL Image, and a PIL Image loaded from bytes.
```python
import io

from ocrmac.ocrmac import OCR
from PIL import Image

# From file path
ocr1 = OCR('/path/to/image.png')

# From PIL Image
img = Image.open('/path/to/image.png')
ocr2 = OCR(img)

# From PIL Image loaded from bytes
with open('image.png', 'rb') as f:  # close the file handle after reading
    img_bytes = f.read()
img = Image.open(io.BytesIO(img_bytes))
ocr3 = OCR(img)
```

--------------------------------

### Basic OCR from File

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/usage-patterns.md

Initialize OCR with a file path and recognize text with bounding box information.

```python
from ocrmac.ocrmac import OCR

ocr = OCR('document.png')
results = ocr.recognize()

for text, confidence, bbox in results:
    print(f"{text}: {confidence:.2f}")
```

--------------------------------

### Try All OCR Frameworks

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/QUICK-REFERENCE.md

Iterates through available OCR frameworks ('livetext', 'vision'), attempting to initialize OCR with each. It uses the first framework that does not raise an ImportError.

```python
from ocrmac.ocrmac import OCR

for framework in ['livetext', 'vision']:
    try:
        ocr = OCR('image.png', framework=framework)
        results = ocr.recognize()
        print(f"{framework}: {len(results)} results")
        break
    except ImportError:
        continue
```

--------------------------------

### OCR Recognize Method

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/INDEX.md

Performs text recognition on the image. Set px=True to get results in pixels.

```python
ocr.recognize(px=False)
```

--------------------------------

### CLI main Function Signature

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/module-overview.md

The entry point for the command-line tool. Currently a stub implementation.
```python
@click.command()
def main(args: list[str] | None = None) -> int
```

--------------------------------

### Basic OCR Workflow

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/INDEX.md

Demonstrates the fundamental steps for performing OCR: import, create OCR object, recognize text, and access results.

```python
from ocrmac.ocrmac import OCR

ocr = OCR('image.png')
results = ocr.recognize()
for text, conf, bbox in results:
    print(text)
```

--------------------------------

### OCR Result Format Example

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/QUICK-REFERENCE.md

OCR returns a list of tuples, each containing the recognized text, its confidence score, and the bounding box coordinates.

```python
[
    ("text string", 0.95, [x, y, width, height]),
    ("another text", 0.87, [x, y, width, height]),
    ...
]
```

--------------------------------

### Get Normalized Coordinates of Text

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/usage-patterns.md

Retrieves normalized bounding box coordinates (x, y, width, height) for recognized text. These coordinates are relative to the image dimensions.

```python
from ocrmac.ocrmac import OCR

ocr = OCR('image.png')
results = ocr.recognize()

for text, conf, bbox in results:
    x, y, width, height = bbox
    print(f"'{text}' at ({x:.2f}, {y:.2f}) size ({width:.2f}×{height:.2f})")
```

--------------------------------

### Adapting PIL Coordinates for Matplotlib

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/api-reference/coordinate-conversion.md

PIL and Matplotlib expect different coordinate formats: the PIL helper returns corner coordinates (x1, y1, x2, y2), while a Matplotlib rectangle takes an origin plus width and height. This example demonstrates the correct conversion for Matplotlib.

```python
# ❌ Wrong: PIL returns corners
x1, y1, x2, y2 = convert_coordinates_pil(bbox, w, h)
rect = patches.Rectangle((x1, y1), x2, y2)  # Wrong!

# ✓ Correct: Matplotlib needs width/height
x, y, width, height = convert_coordinates_pyplot(bbox, w, h)
rect = patches.Rectangle((x, y), width, height)
```

--------------------------------

### Import OCR as Module

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/QUICK-REFERENCE.md

Imports the ocrmac module and initializes an OCR instance.

```python
# As module
from ocrmac import ocrmac

ocr = ocrmac.OCR('image.png')
```

--------------------------------

### Use LiveText Framework with Fallback

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/QUICK-REFERENCE.md

Attempt to use the 'livetext' framework if available, falling back to 'vision' if an `ImportError` occurs. This leverages native OS capabilities when possible.

```python
from ocrmac.ocrmac import OCR

try:
    ocr = OCR('image.png', framework='livetext')
    results = ocr.recognize()
except ImportError:
    ocr = OCR('image.png', framework='vision')
    results = ocr.recognize()
```

--------------------------------

### Initialize EasyOCR Reader for Russian

Source: https://github.com/straussmaximilian/ocrmac/blob/main/BlogTest.ipynb

Loads the EasyOCR model for the Russian language into memory. This needs to run only once.

```python
import easyocr

reader = easyocr.Reader(['ru'])  # this needs to run only once to load the model into memory
result = reader.readtext('wikipedia_test.png')
result
```

--------------------------------

### OCRMAC Data Flow

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/module-overview.md

Illustrates the data flow within the OCRMAC module, starting from image input to the final processed text output, including optional annotation steps.
```text
Image Input (file path or PIL Image)
    ↓
OCR.__init__() - Store configuration
    ↓
OCR.recognize() or helper function
    ↓
Vision/LiveText framework
    ↓
Raw text + confidence + normalized bbox
    ↓
Return as list of tuples
    ↓
Optional: Convert coordinates (pil/pyplot)
Optional: Annotate image (PIL/matplotlib)
```

--------------------------------

### Correct Usage for LiveText Recognition Level

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/errors.md

Demonstrates the correct way to initialize OCR with the LiveText framework without specifying a recognition level.

```python
from ocrmac.ocrmac import OCR

# Correct: don't pass recognition_level for livetext
ocr = OCR('image.png', framework='livetext')
```

--------------------------------

### Image Annotation Workflow

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/INDEX.md

Shows how to create an OCR object with detail enabled, annotate the image using PIL, and save the annotated image.

```python
from ocrmac.ocrmac import OCR

ocr = OCR('image.png', detail=True)
annotated = ocr.annotate_PIL(color='red')
annotated.save('output.png')
```

--------------------------------

### Initialize EasyOCR Reader

Source: https://github.com/straussmaximilian/ocrmac/blob/main/BlogTest.ipynb

Loads the EasyOCR model into memory. This needs to run only once.

```python
import easyocr

reader = easyocr.Reader(['en'])  # this needs to run only once to load the model into memory
result = reader.readtext('wikipedia_test.png')
```

--------------------------------

### Get Pixel Coordinates from Normalized Bounding Boxes

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/usage-patterns.md

Converts normalized bounding box coordinates to pixel coordinates using the image dimensions. Useful for drawing or cropping based on OCR results.
```python
from ocrmac.ocrmac import OCR, convert_coordinates_pil

ocr = OCR('image.png')
results = ocr.recognize()

for text, conf, bbox_normalized in results:
    x1, y1, x2, y2 = convert_coordinates_pil(
        bbox_normalized, ocr.image.width, ocr.image.height
    )
    print(f"'{text}' at pixels ({x1:.0f}, {y1:.0f}) to ({x2:.0f}, {y2:.0f})")
```

--------------------------------

### Clone the ocrmac repository

Source: https://github.com/straussmaximilian/ocrmac/blob/main/CONTRIBUTING.rst

Clone your forked repository locally to begin development.

```shell
git clone git@github.com:your_name_here/ocrmac.git
```

--------------------------------

### Extract Text from Image using LiveText

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/api-reference/helper-functions.md

Use `livetext_from_image` to extract text from an image. Specify `unit='line'` for full-line results or `detail=False` to get text only. Language preferences can be provided for recognition hints.

```python
from ocrmac.ocrmac import livetext_from_image

# Token-level (default, fine-grained)
results = livetext_from_image('document.png')
for text, confidence, bbox in results:
    print(f"Token: '{text}'")

# Line-level output
results_line = livetext_from_image('page.png', unit='line')
for line_text, conf, bbox in results_line:
    print(f"Line: {line_text}")

# With language preference
results = livetext_from_image(
    'chinese_text.png',
    language_preference=['zh-Hans', 'en-US'],
    unit='token'
)

# Text only
texts = livetext_from_image('scan.png', detail=False, unit='line')
for text in texts:
    print(text)
```

--------------------------------

### Batch Processing Workflow

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/INDEX.md

Demonstrates processing a list of images by iterating through them, creating an OCR object for each, and recognizing text. Includes error handling.
```python
from ocrmac.ocrmac import OCR

# `image_list` is an iterable of image paths or PIL Images
for image_file in image_list:
    try:
        ocr = OCR(image_file)
        results = ocr.recognize()
        # Process results
    except Exception as e:
        print(f"Error processing {image_file}: {e}")
```

--------------------------------

### Get macOS Processor and Python Version

Source: https://github.com/straussmaximilian/ocrmac/blob/main/ExampleNotebook.ipynb

Retrieves and prints the macOS processor information using 'sysctl' and the Python version using the 'platform' module. This is useful for system diagnostics and environment checks.

```python
import platform
import subprocess

def get_m_processor_info():
    # Use subprocess to run the `sysctl` command, which retrieves system information on macOS
    try:
        processor_info = subprocess.check_output(["sysctl", "machdep.cpu.brand_string"]).strip().decode()
        return processor_info
    except subprocess.CalledProcessError:
        return "Unable to determine processor information."

# Display the processor info
processor_info = get_m_processor_info()
print(f"Processor: {processor_info}")

# Print python information
python_version = platform.python_version()
print(f"Python Version : {python_version}")
```

--------------------------------

### Basic OCR from PIL Image

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/usage-patterns.md

Initialize OCR with a PIL Image object for recognition.

```python
from ocrmac.ocrmac import OCR
from PIL import Image

img = Image.open('photo.jpg')
ocr = OCR(img)
results = ocr.recognize()
```

--------------------------------

### Annotate Image with Matplotlib

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/api-reference/ocr-class.md

This method generates an annotated matplotlib figure highlighting recognized text. It requires `detail=True` in the OCR constructor and matplotlib to be installed. The recognized text is overlaid with bounding boxes and labels.
```python
from ocrmac.ocrmac import OCR
import matplotlib.pyplot as plt

ocr = OCR('document.png', detail=True)
fig = ocr.annotate_matplotlib(figsize=(15, 20), color='green', alpha=0.7)
plt.show()

# Save figure to file
fig.savefig('annotated_document.png', dpi=150, bbox_inches='tight')
```

--------------------------------

### OCR Class Initialization Parameters

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/QUICK-REFERENCE.md

Configure the OCR class during initialization with parameters like image source, recognition framework, recognition level, language preference, confidence threshold, detail level, and unit for results.

```python
OCR(
    image,                         # Path or PIL Image
    framework="vision",            # or "livetext" (Sonoma+)
    recognition_level="accurate",  # or "fast"
    language_preference=None,      # ['en-US', 'de-DE', ...]
    confidence_threshold=0.0,      # 0.0-1.0, ignored for livetext
    detail=True,                   # Include bounding boxes
    unit='token'                   # or 'line' (livetext only)
)
```

--------------------------------

### LiveText Framework OCR with Fallback

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/usage-patterns.md

Initialize OCR using the 'livetext' framework available on macOS Sonoma (14.0+). Includes a fallback to the 'vision' framework if LiveText is not available. The 'unit' parameter can be set to 'line' for line-based recognition.

```python
from ocrmac.ocrmac import OCR

try:
    ocr = OCR('image.png', framework='livetext', unit='line')
    results = ocr.recognize()
except ImportError:
    print("LiveText not available, falling back to Vision")
    ocr = OCR('image.png', framework='vision')
    results = ocr.recognize()
```

--------------------------------

### Commit and push changes

Source: https://github.com/straussmaximilian/ocrmac/blob/main/CONTRIBUTING.rst

Stage all changes, commit them with a descriptive message, and push your branch to GitHub. This prepares your changes for a pull request.

```shell
git add .
git commit -m "Your detailed description of your changes."
git push origin name-of-your-bugfix-or-feature
```

--------------------------------

### OCR Class - Constructor

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/INDEX.md

Initializes the main class for text recognition with all available parameters.

```APIDOC
## OCR Class Constructor

### Description
Initializes the OCR class, the main interface for text recognition operations.

### Parameters
(Details on constructor parameters are not provided in the source text, but it's indicated that all parameters are available here.)

### Usage
```python
from ocrmac.ocrmac import OCR
ocr_instance = OCR(...)
```
```

--------------------------------

### Define Test Image Paths

Source: https://github.com/straussmaximilian/ocrmac/blob/main/RegenerateTestImages.ipynb

Defines the directory for tests and the paths for the source image and the different output image formats (fast, accurate, livetext). Ensures consistent file naming and location.

```python
import os

# Paths
TESTS_DIR = "tests"
SOURCE_IMAGE = os.path.join(TESTS_DIR, "test.png")
OUTPUT_FAST = os.path.join(TESTS_DIR, "test_output_fast.png")
OUTPUT_ACCURATE = os.path.join(TESTS_DIR, "test_output_accurate.png")
OUTPUT_LIVETEXT = os.path.join(TESTS_DIR, "test_output_livetext.png")

print(f"Source image: {SOURCE_IMAGE}")
print(f"Output files will be saved to: {TESTS_DIR}/")
```

--------------------------------

### Deploy new version

Source: https://github.com/straussmaximilian/ocrmac/blob/main/CONTRIBUTING.rst

Commit your changes, including an entry in HISTORY.rst, then update the version number using bump2version and push the commit and its tags to GitHub. Travis CI will handle deployment to PyPI upon successful tests.
```shell
bump2version patch # possible: major / minor / patch
git push
git push --tags
```

--------------------------------

### Import OCR with Helpers

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/QUICK-REFERENCE.md

Imports the OCR class along with utility functions for convenience.

```python
# With helpers
from ocrmac.ocrmac import (
    OCR,
    text_from_image,
    livetext_from_image,
    convert_coordinates_pil,
    convert_coordinates_pyplot
)
```

--------------------------------

### Full Pipeline: OCR to Drawing with PIL

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/api-reference/coordinate-conversion.md

This snippet demonstrates the complete workflow from performing OCR to converting normalized bounding box coordinates to pixel coordinates for drawing annotations on an image using PIL.

```python
from ocrmac.ocrmac import OCR, convert_coordinates_pil
from PIL import Image, ImageDraw, ImageFont

# 1. Perform OCR
ocr = OCR('document.png', detail=True)
results = ocr.recognize()

# 2. Load original image
img = Image.open('document.png')
draw = ImageDraw.Draw(img)

# 3. Convert and draw results
for text, confidence, bbox_normalized in results:
    # Convert from normalized to pixel coordinates
    x1, y1, x2, y2 = convert_coordinates_pil(
        bbox_normalized, img.width, img.height
    )

    # Draw annotation
    if confidence > 0.8:
        draw.rectangle((x1, y1, x2, y2), outline='green', width=2)
    else:
        draw.rectangle((x1, y1, x2, y2), outline='orange', width=2)
    draw.text((x1, y1), text, fill='black')

# 4. Save result
img.save('annotated.png')
```

--------------------------------

### Import All OCRMAC Helper Functions

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/module-overview.md

Import all available helper functions from the OCRMAC module. This is useful when you need access to multiple utility functions for image processing and OCR.
```python
from ocrmac.ocrmac import (
    OCR,
    text_from_image,
    livetext_from_image,
    convert_coordinates_pil,
    convert_coordinates_pyplot
)
```

--------------------------------

### Line-Level Output (LiveText) Configuration

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/configuration.md

For processing line-by-line text on macOS Sonoma and later. This configuration provides one result per line with a confidence of 1.0, suitable for document layout analysis and line extraction.

```python
from ocrmac.ocrmac import OCR

ocr = OCR(
    image='page.png',
    framework='livetext',
    unit='line',
    detail=True
)
results = ocr.recognize()

for line_text, conf, bbox in results:
    print(f"Line: {line_text}")
```

--------------------------------

### OCR Configuration: LiveText (Sonoma+)

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/QUICK-REFERENCE.md

Use this profile to enable OCR with the LiveText framework on macOS Sonoma or later. Results are returned line by line.

```python
OCR(image, framework='livetext', unit='line')
```

--------------------------------

### OCR with Single Language Preference

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/usage-patterns.md

Initialize OCR with a specific single language preference. Use this when the document language is known.

```python
from ocrmac.ocrmac import OCR

ocr = OCR('english_doc.png', language_preference=['en-US'])
results = ocr.recognize()
```

--------------------------------

### Basic OCR Import

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/QUICK-REFERENCE.md

Imports the core OCR class for basic text recognition.

```python
# Basic import
from ocrmac.ocrmac import OCR
```

--------------------------------

### Coordinate Conversion Workflow

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/INDEX.md

Illustrates obtaining OCR results and then converting bounding box coordinates using the `convert_coordinates_pil` helper function.
```python
from PIL import Image
from ocrmac.ocrmac import OCR, convert_coordinates_pil

img = Image.open('page.png')  # example input image
results = OCR(img).recognize()
for text, confidence, bbox in results:
    x1, y1, x2, y2 = convert_coordinates_pil(bbox, *img.size)  # img.size is (width, height)
```

--------------------------------

### Vision Framework OCR

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/usage-patterns.md

Initialize OCR using the default 'vision' framework, which is compatible with macOS 10.15 and later.

```python
from ocrmac.ocrmac import OCR

ocr = OCR('image.png', framework='vision')
results = ocr.recognize()
```

--------------------------------

### OCR with Auto Language Detection

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/usage-patterns.md

Initialize OCR with `language_preference=None` to let the framework automatically detect the language of the input image.

```python
from ocrmac.ocrmac import OCR

ocr = OCR('unknown_language.png', language_preference=None)
results = ocr.recognize()
```

--------------------------------

### Display System Information

Source: https://github.com/straussmaximilian/ocrmac/blob/main/RegenerateTestImages.ipynb

Imports the necessary libraries and displays system information, including the macOS version, processor type, and Python version. This records the environment in which the tests are run.

```python
import os
import platform
import subprocess
from ocrmac import ocrmac

# Display system info for reference
def get_processor_info():
    try:
        return subprocess.check_output(["sysctl", "machdep.cpu.brand_string"]).strip().decode()
    except subprocess.CalledProcessError:
        return "Unknown"

print(f"macOS: {platform.mac_ver()[0]}")
print(f"Processor: {get_processor_info()}")
print(f"Python: {platform.python_version()}")
```

--------------------------------

### Graceful Degradation: Fallback to Vision Framework

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/errors.md

Use a try-except block to attempt OCR with the LiveText framework and fall back to the Vision framework if LiveText is not available (ImportError).

```python
from ocrmac.ocrmac import OCR

# Try LiveText, fall back to Vision
try:
    ocr = OCR('page.png', framework='livetext')
    results = ocr.recognize()
except ImportError:
    print("LiveText not available, using Vision framework")
    ocr = OCR('page.png', framework='vision')
    results = ocr.recognize()
```

--------------------------------

### OCR Class Constructor

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/api-reference/ocr-class.md

Initializes the OCR class to prepare for text extraction. It allows configuration of the OCR framework, recognition level, language preferences, confidence threshold, detail level, and output unit.

```APIDOC
## OCR Class Constructor

### Description
Initializes the OCR class for text extraction from images using specified frameworks and configurations.

### Parameters
#### Constructor Parameters
- **image** (str or PIL.Image.Image) - Required - Path to image file or PIL Image object.
- **framework** (str) - Optional - OCR framework to use: `"vision"` (Apple Vision framework) or `"livetext"` (macOS Sonoma+). Defaults to `"vision"`.
- **recognition_level** (str) - Optional - Recognition accuracy for vision framework: `"fast"` or `"accurate"`. Ignored for livetext. Defaults to `"accurate"`.
- **language_preference** (list of str or None) - Optional - List of IANA language codes for post-processing hints. Defaults to `None`.
- **confidence_threshold** (float) - Optional - Minimum confidence score (0.0–1.0) to include results. Ignored for livetext. Defaults to `0.0`.
- **detail** (bool) - Optional - Whether to return bounding box coordinates. Set to `False` to return text only. Defaults to `True`.
- **unit** (str) - Optional - LiveText output granularity: `'token'` (finest-grained, default) or `'line'` (one result per line). Ignored for vision framework. Defaults to `'token'`.

### Returns
OCR instance. Use `.recognize()` to perform OCR.

### Raises
- **ValueError**: If `image` is not a valid path or PIL.Image.Image.
- **ValueError**: If `framework` is not `"vision"` or `"livetext"`.
- **ValueError**: If `recognition_level` is not `"accurate"` or `"fast"`.
- **ValueError**: If `unit` is not `'token'` or `'line'`.
- **ValueError**: If `livetext` framework is selected but `recognition_level` or `confidence_threshold` are explicitly set to non-default values.

### Example
```python
from ocrmac.ocrmac import OCR
from PIL import Image

# From image file path
ocr = OCR('document.png', recognition_level='accurate', language_preference=['en-US'])

# From PIL Image object
img = Image.open('photo.jpg')
ocr = OCR(img, framework='vision', detail=True)

# Using livetext (macOS Sonoma+)
ocr = OCR('scan.png', framework='livetext', unit='line')
```
```

--------------------------------

### Visualize EasyOCR Results

Source: https://github.com/straussmaximilian/ocrmac/blob/main/BlogTest.ipynb

Displays the image with bounding boxes and text extracted by EasyOCR. Saves the output to a file.

```python
import matplotlib.pyplot as plt

# `result` is assumed to hold the EasyOCR output from an earlier notebook cell;
# each entry starts with the box corner points followed by the recognized text.
im = plt.imread('wikipedia_test.png')
print(im.shape)

fig = plt.figure(figsize=(15, 15))
plt.imshow(im)

for entry in result:
    points, text = entry[0], entry[1]
    x = [p[0] for p in points]
    y = [p[1] for p in points]
    plt.fill(x, y, facecolor='none', edgecolor='red')
    plt.text(x[0], y[0], text, color='red', fontsize=15)

plt.axis('off')
plt.savefig('output_easyocr.png')
plt.show()
```

--------------------------------

### Annotate Image with PIL

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/api-reference/ocr-class.md

Use this method to create an annotated copy of an image with recognized text highlighted using PIL. Set `detail=True` in the OCR constructor; `.recognize()` runs automatically if it has not been called yet.

```python
from ocrmac.ocrmac import OCR

ocr = OCR('screenshot.png', detail=True)
annotated = ocr.annotate_PIL(color='blue', fontsize=14)
annotated.save('annotated_screenshot.png')

# Display in Jupyter
from IPython.display import display
display(annotated)
```

--------------------------------

### OCR with Multiple Language Preferences

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/usage-patterns.md

Initialize OCR with a list of preferred languages. The library will attempt to detect and process text in any of these languages.

```python
from ocrmac.ocrmac import OCR

ocr = OCR(
    'multilingual_page.png',
    language_preference=['en-US', 'de-DE', 'fr-FR']
)
results = ocr.recognize()
```

--------------------------------

### Run Test Suite Command

Source: https://github.com/straussmaximilian/ocrmac/blob/main/RegenerateTestImages.ipynb

Provides the bash command to execute the OCRMac test suite using pytest after regenerating the reference images. This confirms that the OCR process is working as expected with the new images.

```bash
pytest tests/test_ocrmac.py -v
```

--------------------------------

### recognize() Method

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/api-reference/ocr-class.md

Performs text recognition on an image, returning recognized text, confidence scores, and bounding box coordinates. Supports conversion to pixel coordinates and different recognition frameworks.

```APIDOC
## Method: recognize()

### Description
Performs text recognition on the image. Returns a list of tuples, where each tuple contains the recognized text, its confidence score, and the bounding box coordinates.

### Method Signature
```python
def recognize(px=False) -> list[tuple[str, float, list[float, float, float, float]]]
```

### Parameters
- **px** (bool) - Optional - Default: `False` - If True, convert bounding box from normalized (0–1) to pixel coordinates. If False, return normalized coordinates.

### Returns
- **list[tuple[str, float, list[float, float, float, float]]]**: A list of tuples, where each tuple contains:
  - **text** (str): Recognized text string.
  - **confidence** (float): Confidence score (0.0–1.0). Always 1.0 for livetext.
  - **bbox** (list): Bounding box coordinates `[x, y, width, height]`, normalized (0–1) by default. With `px=True`, the box is given in pixel coordinates; the example below unpacks it as `(x1, y1, x2, y2)`.

### Details
- **Coordinate System**: Uses normalized coordinates (0,0 is lower-left, 1,1 is upper-right) by default. Can be converted to pixel coordinates using `convert_coordinates_pil()` or `convert_coordinates_pyplot()`.
- **Filtering**: Results below `confidence_threshold` are automatically excluded for the vision framework. This does not apply to livetext.

### Raises
- **ImportError**: If the livetext framework is selected but not available (requires macOS Sonoma+).
- **RuntimeError**: If livetext analysis fails during processing.

### Example
```python
from ocrmac.ocrmac import OCR

# Basic OCR
ocr = OCR('page.png', recognition_level='accurate')
results = ocr.recognize()
for text, confidence, bbox in results:
    print(f"Text: {text}, Confidence: {confidence:.2f}, Box: {bbox}")

# Get pixel coordinates
results_px = ocr.recognize(px=True)
for text, conf, (x1, y1, x2, y2) in results_px:
    print(f"Crop region: ({x1:.0f}, {y1:.0f}) to ({x2:.0f}, {y2:.0f})")

# LiveText with line-level output
ocr_line = OCR('document.png', framework='livetext', unit='line')
lines = ocr_line.recognize()
for line_text, conf, bbox in lines:
    print(f"Line: {line_text}")
```
```

--------------------------------

### OCR Configuration: Balanced

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/QUICK-REFERENCE.md

Use this profile for a balance between speed and accuracy with the Vision framework, applying a moderate confidence threshold.

```python
OCR(image, framework='vision', recognition_level='accurate', confidence_threshold=0.7)
```

--------------------------------

### Framework Benchmark

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/usage-patterns.md

Benchmark the performance of 'vision' and 'livetext' frameworks for OCR recognition. Measures the time taken and the number of results obtained for different recognition levels.

```python
import time
from ocrmac.ocrmac import OCR

test_image = 'document.png'

# Vision benchmark
for level in ['fast', 'accurate']:
    ocr = OCR(test_image, framework='vision', recognition_level=level)
    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    results = ocr.recognize()
    elapsed = time.perf_counter() - start
    print(f"Vision {level}: {elapsed*1000:.1f}ms, {len(results)} results")

# LiveText benchmark (if available)
try:
    ocr = OCR(test_image, framework='livetext')
    start = time.perf_counter()
    results = ocr.recognize()
    elapsed = time.perf_counter() - start
    print(f"LiveText: {elapsed*1000:.1f}ms, {len(results)} results")
except ImportError:
    print("LiveText not available")
```

--------------------------------

### OCR Configuration for macOS Sonoma+

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/configuration.md

On macOS Sonoma and later, this configuration attempts to use the faster LiveText framework first, falling back to Vision if LiveText is unavailable. The `.recognize()` call sits inside the `try` block because that is where the `ImportError` is documented to be raised.

```python
from ocrmac.ocrmac import OCR

# Try LiveText first for speed
try:
    ocr = OCR('image.png', framework='livetext')
    results = ocr.recognize()
except ImportError:
    ocr = OCR('image.png', framework='vision')
    results = ocr.recognize()
```

--------------------------------

### Configure OCR for Screenshot Analysis

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/configuration.md

Use the 'fast' recognition level for quick analysis of screenshots, which typically have good image quality. A lower confidence threshold may be acceptable.

```python
from ocrmac.ocrmac import OCR

ocr = OCR(
    image='screenshot.png',
    framework='vision',
    recognition_level='fast',
    confidence_threshold=0.6,
    detail=True
)
```

--------------------------------

### annotate_PIL()

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/api-reference/ocr-class.md

Creates an annotated copy of the image with recognized text highlighted using PIL. This method draws bounding boxes and overlays text labels with customizable color and font size.

```APIDOC
## Method: annotate_PIL()

### Description
Create an annotated copy of the image with recognized text highlighted.

### Method Signature
```python
def annotate_PIL(color="red", fontsize=12) -> PIL.Image.Image
```

### Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| color | str | `"red"` | Color of bounding boxes and text labels. Any PIL-compatible color name or hex code. |
| fontsize | int | `12` | Font size for text labels in pixels. |

### Returns
PIL.Image.Image: Annotated image with bounding boxes drawn and text labels overlaid.

### Requirements
- `detail=True` must be set in the OCR constructor
- `.recognize()` is called automatically if it has not been run already
- Requires Arial Unicode font available on the system

### Raises
| Error | Condition |
|-------|-----------|
| `ValueError` | detail=False in OCR constructor |

### Example
```python
from ocrmac.ocrmac import OCR

ocr = OCR('screenshot.png', detail=True)
annotated = ocr.annotate_PIL(color='blue', fontsize=14)
annotated.save('annotated_screenshot.png')

# Display in Jupyter
from IPython.display import display
display(annotated)
```
```

--------------------------------

### OCRMAC Project File Structure

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/README.md

This snippet shows how the OCRMAC documentation directory is organized, so readers can find each type of documentation.

```tree
output/
├── README.md                    ← You are here
├── INDEX.md                     ← Navigation hub
├── QUICK-REFERENCE.md           ← Cheat sheet
├── module-overview.md           ← Package structure
├── types.md                     ← Type definitions
├── configuration.md             ← Options and setup
├── errors.md                    ← Error reference
├── usage-patterns.md            ← Code examples
└── api-reference/
    ├── ocr-class.md             ← OCR class API
    ├── helper-functions.md      ← Helper functions API
    └── coordinate-conversion.md ← Coordinate systems
```

--------------------------------

### Handle Invalid Framework Selection Error

Source: https://github.com/straussmaximilian/ocrmac/blob/main/_autodocs/errors.md

Catch ValueError when an unsupported framework name like 'tesseract' is provided.

```python
from ocrmac.ocrmac import OCR

try:
    ocr = OCR('image.png', framework='tesseract')
except ValueError as e:
    print(f"Error: {e}")
```
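
--------------------------------

### Validate a Framework Name Before Constructing OCR (sketch)

As a complement to catching the `ValueError` above, a caller could check a user-supplied framework name up front. This is a minimal sketch only: `SUPPORTED_FRAMEWORKS` and `check_framework` are hypothetical names that are not part of ocrmac, and the two accepted values are taken from the constructor reference (`"vision"` and `"livetext"`).

```python
# Hypothetical helper, not part of ocrmac: validate a framework name up front.
# The accepted names are assumed from the documented constructor parameters.
SUPPORTED_FRAMEWORKS = ("vision", "livetext")


def check_framework(name: str) -> str:
    """Return `name` unchanged if it is a supported framework, else raise ValueError."""
    if name not in SUPPORTED_FRAMEWORKS:
        options = ", ".join(repr(f) for f in SUPPORTED_FRAMEWORKS)
        raise ValueError(f"Unsupported framework {name!r}; expected one of: {options}")
    return name
```

The validated name could then be passed on as `OCR('image.png', framework=check_framework(user_choice))`, where `user_choice` is whatever string the caller received. The constructor still performs its own check, so this only improves the error message shown early in a script or CLI.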