### Install and Run Inscriptis Web Service

Source: https://context7.com/weblyzard/inscriptis/llms.txt

Instructions for installing the web service extras and starting the Inscriptis API using uvicorn or the provided Docker image. Covers installation, server startup, and Docker usage.

```bash
# Install with web-service extras
pip install inscriptis[web-service]

# Start the server
uvicorn inscriptis.service.web:app --host 127.0.0.1 --port 5000
# or: inscriptis-api

# Docker
docker pull ghcr.io/weblyzard/inscriptis:latest
docker run -p 5000:5000 ghcr.io/weblyzard/inscriptis:latest
```

--------------------------------

### Install Inscriptis using easy_install

Source: https://github.com/weblyzard/inscriptis/blob/master/README.rst

Alternative installation method using easy_install if pip is not available.

```bash
$ easy_install inscriptis
```

--------------------------------

### Install Inscriptis Web Service

Source: https://github.com/weblyzard/inscriptis/blob/master/README.rst

Command to install the Inscriptis library with the optional web-service feature.

```bash
$ pip install inscriptis[web-service]
```

--------------------------------

### Install Inscriptis using pip

Source: https://github.com/weblyzard/inscriptis/blob/master/README.rst

Install the Inscriptis library using pip. This is the recommended method for most users.

```bash
$ pip install inscriptis
```

--------------------------------

### Start Inscriptis Web Service

Source: https://github.com/weblyzard/inscriptis/blob/master/README.rst

Command to start the Inscriptis web service using uvicorn.

```bash
$ uvicorn inscriptis.service.web:app --port 5000 --host 127.0.0.1
```

--------------------------------

### Print Fact Example

Source: https://github.com/weblyzard/inscriptis/blob/master/RENDERING.md

A simple Python print statement. No specific setup or constraints are mentioned.

```python
print(fact)
```

--------------------------------

### Example HTML for Annotation

Source: https://github.com/weblyzard/inscriptis/blob/master/docs/README.rst

A sample HTML snippet used to demonstrate how annotation rules are applied.

```html
<h1>Chur</h1>
<b>Chur</b> is the capital and largest town of the Swiss canton of the Grisons and lies in the Grisonian Rhine Valley.
```

--------------------------------

### Inscriptis Annotation Profile JSON Example

Source: https://context7.com/weblyzard/inscriptis/llms.txt

An example JSON file defining annotation rules for the Inscriptis CLI. Shows how to map HTML elements and CSS selectors to annotation labels.

```json
{
  "h1": ["heading", "h1"],
  "h2": ["heading", "h2"],
  "b": ["emphasis"],
  "div#class=toc": ["table-of-contents"],
  "#class=FactBox": ["fact-box"],
  "#cite": ["citation"],
  "a#title": ["entity"]
}
```

--------------------------------

### JSONL Output Example with Annotations

Source: https://github.com/weblyzard/inscriptis/blob/master/README.rst

Example of Inscriptis output in JSONL format, including extracted text and annotations for headings and emphasis.

```json
{"text": "Chur\n\nChur is the capital and largest town of the Swiss canton of the Grisons and lies in the Grisonian Rhine Valley.", "label": [[0, 4, "heading"], [0, 4, "h1"], [6, 10, "emphasis"]]}
```

--------------------------------

### Curl Request to Get Version

Source: https://github.com/weblyzard/inscriptis/blob/master/README.rst

Example cURL command to check the version of the Inscriptis web service.

```bash
$ curl http://localhost:5000/version
```

--------------------------------

### Command-line Usage with Postprocessor

Source: https://github.com/weblyzard/inscriptis/blob/master/README.rst

Example of using the Inscriptis command-line tool with a postprocessor to annotate content.

```bash
$ inscript https://www.fhgr.ch \
    -r ./examples/annotation/annotation-profile.json \
    -p surface
```

--------------------------------

### Inscriptis CLI Tool Usage

Source: https://context7.com/weblyzard/inscriptis/llms.txt

Examples demonstrating the usage of the `inscript` command-line tool for various conversion and annotation tasks.

````APIDOC
## Basic Usage

### Convert URL to text
```bash
inscript https://en.wikipedia.org/wiki/Chur
```

### Convert local file to text and save
```bash
inscript page.html -o page.txt
```

## Advanced Options

### Strict indentation
```bash
inscript --indentation strict page.html -o page-strict.txt
```

### Show link targets inline
```bash
inscript -l https://example.com
```

### Show image alt captions and deduplicate
```bash
inscript -i -d page.html
```

### Annotate using JSON rules and output raw JSONL
```bash
inscript -r annotation-profile.json https://example.com
```

### Annotate and postprocess to XML
```bash
inscript -r annotation-profile.json -p xml https://example.com
```

### Annotate and postprocess to surface forms (JSON)
```bash
inscript -r annotation-profile.json -p surface https://example.com
```

### Annotate and postprocess to highlighted HTML
```bash
inscript -r annotation-profile.json -p html https://example.com -o annotated.html
```

### Convert from stdin
```bash
echo "

Hello world

" | inscript -o output.txt
```

### Custom table cell separator
```bash
inscript --table-cell-separator " | " page.html
```

## Example `annotation-profile.json`
```json
{
  "h1": ["heading", "h1"],
  "h2": ["heading", "h2"],
  "b": ["emphasis"],
  "div#class=toc": ["table-of-contents"],
  "#class=FactBox": ["fact-box"],
  "#cite": ["citation"],
  "a#title": ["entity"]
}
```
````

--------------------------------

### Python Hello World Program

Source: https://github.com/weblyzard/inscriptis/blob/master/RENDERING.md

A basic 'Hello, world!' program in Python. This example is used to show rendering differences.

```python
print('Hello, world!')
```

--------------------------------

### Python Loop and Print Example

Source: https://github.com/weblyzard/inscriptis/blob/master/tests/html/advanced-prefix-test.txt

Demonstrates a basic Python for loop with a cumulative sum and subsequent print statements.

```python
y=0
for x in range(3,10):
    print(x)
    y += x
print(y)
```

```python
print("Hallo")
print("Echo")
print("123")
```

--------------------------------

### HTML Snippet for Annotation Example

Source: https://github.com/weblyzard/inscriptis/blob/master/README.rst

An example HTML snippet used to demonstrate how Inscriptis applies annotation rules.

```html
<h1>Chur</h1>
<b>Chur</b> is the capital and largest town of the Swiss canton of the Grisons and lies in the Grisonian Rhine Valley.
```

--------------------------------

### Inscript CLI Usage Examples

Source: https://context7.com/weblyzard/inscriptis/llms.txt

Demonstrates various command-line interface commands for Inscriptis, including converting URLs, local files, applying strict indentation, showing link targets, handling images, annotating, and processing stdin. Covers common use cases and options.

```bash
# Convert a URL to text
inscript https://en.wikipedia.org/wiki/Chur

# Convert a local file, save output
inscript page.html -o page.txt

# Strict indentation (Firefox-like, no extra div/span padding)
inscript --indentation strict page.html -o page-strict.txt

# Show link targets inline: [link text](URL)
inscript -l https://example.com

# Show image alt captions, deduplicate repeated ones
inscript -i -d page.html

# Annotate using a JSON rules file, output raw JSONL
inscript -r annotation-profile.json https://example.com

# Annotate + postprocess to XML
inscript -r annotation-profile.json -p xml https://example.com

# Annotate + postprocess to surface forms (JSON)
inscript -r annotation-profile.json -p surface https://example.com

# Annotate + postprocess to highlighted HTML
inscript -r annotation-profile.json -p html https://example.com -o annotated.html

# Convert from stdin
echo "

Hello world

" | inscript -o output.txt

# Custom table cell separator
inscript --table-cell-separator " | " page.html
```

--------------------------------

### JSON Output Example for Annotations

Source: https://github.com/weblyzard/inscriptis/blob/master/docs/api.md

Illustrates the expected JSON structure for text and annotations returned by `get_annotated_text()`.

```json
{"text": "Chur\n\nChur is the capital and largest town of the Swiss canton of the Grisons and lies in the Grisonian Rhine Valley.", "label": [[0, 4, "heading"], [6, 10, "emphasis"]]}
```

--------------------------------

### Annotated Text Output (JSONL)

Source: https://github.com/weblyzard/inscriptis/blob/master/README.rst

Example of JSON Lines (JSONL) output from Inscriptis when processing annotated HTML. It includes the extracted text and a list of labels with their start and end indices.

```json
{"text": "Chur\n\nChur is the capital and largest town of the Swiss canton of the Grisons and lies in the Grisonian Rhine Valley.", "label": [[0, 4, "heading"], [0, 4, "h1"], [6, 10, "emphasis"]]}
```

--------------------------------

### Curl Request to Get Text

Source: https://github.com/weblyzard/inscriptis/blob/master/README.rst

Example cURL command to send an HTML file to the Inscriptis web service and retrieve plain text.

```bash
$ curl -X POST -H "Content-Type: text/html; encoding=UTF8" \
    --data-binary @test.html http://localhost:5000/get_text
```

--------------------------------

### XML Output with Annotations

Source: https://github.com/weblyzard/inscriptis/blob/master/README.rst

Example XML output from Inscriptis when using the 'xml' postprocessor, including annotated text.

```xml
<heading>Chur</heading>

<emphasis>Chur</emphasis> is the capital and largest town of the Swiss canton of the Grisons and lies in the Grisonian Rhine Valley.
```

--------------------------------

### Python Code Rendering (lynx)

Source: https://github.com/weblyzard/inscriptis/blob/master/RENDERING.md

Python code examples as rendered by lynx, showing differences in whitespace handling compared to inscriptis.

```python
Python programming examples[edit]

Hello world program:
print('Hello, world!')

Program to calculate the factorial of a positive integer:
n = int(input('Type a number, and its factorial will be printed: '))

if n < 0:
    raise ValueError('You must enter a positive integer')

fact = 1
i = 2
while i <= n:
    fact *= i
    i += 1
```

--------------------------------

### Annotation Rules and Metadata Extraction Example

Source: https://github.com/weblyzard/inscriptis/blob/master/docs/benchmarking.md

Example JSON structure for annotation rules and extracted metadata, used in Inscriptis 2.0 and later for specific test cases.

```json
{
  "annotation_rules": {
    "h1": ["heading"],
    "b": ["emphasis"]
  },
  "result": [
    ["heading", "The first"],
    ["heading", "The second"],
    ["heading", "Subheading"]
  ]
}
```

--------------------------------

### HTML to Text Conversion Example

Source: https://github.com/weblyzard/inscriptis/blob/master/README.rst

Demonstrates the difference in conversion quality between Inscriptis and Beautiful Soup for HTML enumerations. Inscriptis provides a more accurate, layout-aware output.

```html