### Start Local Web Server

Source: https://github.com/scribeocr/scribe.js/blob/master/scribe-ui/README.md

Use this command to serve the basic viewer locally. Ensure you are in the parent directory where `scribe-ui/` and `scribe.js/` are siblings.

```bash
npx http-server
```

--------------------------------

### Run the Textract Proxy Server

Source: https://github.com/scribeocr/scribe.js/blob/master/examples/server-textract-proxy/README.md

Run these commands in a terminal to start the proxy server. Requires Node.js 20+ and AWS credentials.

```sh
cd scribe.js/examples/server-textract-proxy
AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... \
TEXTRACT_REGIONS=us-east-1 \
node server.js
```

--------------------------------

### Install Scribe.js via npm

Source: https://github.com/scribeocr/scribe.js/blob/master/README.md

Install the Scribe.js library from npm. The package is published as `scribe.js-ocr`.

```sh
npm i scribe.js-ocr
```

--------------------------------

### Clone and Set Up Scribe.js Locally

Source: https://github.com/scribeocr/scribe.js/blob/master/README.md

Clone the Scribe.js repository, including submodules, and install dependencies. Run automated tests before making a Pull Request.

```sh
## Clone the repo, including recursively cloning submodules
git clone --recurse-submodules git@github.com:scribeocr/scribe.js.git
cd scribe.js

## Install dependencies
npm i

## Make changes
## [...]

## Run automated tests before making PR
npm run test
```

--------------------------------

### Scribe.js CLI Examples

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Command-line interface for batch OCR, PDF processing, and text extraction. Supports various formats and operations like recognition, checking confidence, and overlaying text.

```bash
# Extract text from an image (outputs .txt file)
scribe extract scan.png
```

```bash
# Extract text from a PDF, output as searchable PDF
scribe extract document.pdf --format pdf
```

```bash
# Process all supported files in a directory
scribe extract ./scans/ --dir --format txt
```

```bash
# Recognize and create a searchable PDF with invisible text layer
scribe recognize scan.pdf --output ./output/
```

```bash
# Check OCR confidence of a file
scribe check document.pdf
```

```bash
# Overlay OCR text on a PDF with a visual proof overlay
scribe overlay scan.pdf --output ./output/ --vis
```

```bash
# Detect whether a PDF is text-native or image-based
scribe detect-pdf-type document.pdf
```

--------------------------------

### Accessing OCR Data in Scribe.js

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Iterate through OCR words on a specific page to access their text and confidence scores. Also shows how to get the total page count and calculate an overall accuracy estimate.

```javascript
import scribe from 'scribe.js-ocr';

await scribe.importFiles(['scan.png']);
await scribe.recognize();

// Iterate OCR words on page 0
const page0 = scribe.data.ocr.active[0];
for (const line of page0.lines) {
  for (const word of line.words) {
    console.log(word.text, word.conf); // text and confidence score
  }
}

// Total page count
console.log(scribe.inputData.pageCount);

// Confidence summary across all pages
const { highConf, total } = scribe.utils.calcConf(scribe.data.ocr.active);
console.log(`Accuracy estimate: ${((highConf / total) * 100).toFixed(1)}%`);

await scribe.terminate();
```
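
The page structure iterated above (`lines` containing `words`, each with `text` and `conf`) can also be summarized with a small standalone helper. This is a sketch only: `summarizePage` and the mock page are hypothetical and not part of scribe.js, and the default threshold of 75 is an illustrative choice.

```javascript
// Hypothetical helper (not part of scribe.js): summarize word confidence for one page.
// Assumes the page shape used above: { lines: [{ words: [{ text, conf }] }] }.
function summarizePage(page, lowThreshold = 75) {
  const words = page.lines.flatMap((line) => line.words);
  const total = words.length;
  const meanConf = total === 0 ? 0 : words.reduce((sum, w) => sum + w.conf, 0) / total;
  // Collect the text of every word below the threshold for manual review.
  const lowConfWords = words.filter((w) => w.conf < lowThreshold).map((w) => w.text);
  return { total, meanConf, lowConfWords };
}

// Mock page standing in for scribe.data.ocr.active[0].
const mockPage = {
  lines: [
    { words: [{ text: 'Hello', conf: 96 }, { text: 'W0rld', conf: 60 }] },
    { words: [{ text: 'again', conf: 90 }] },
  ],
};
console.log(summarizePage(mockPage));
```

With real data, pass `scribe.data.ocr.active[0]` instead of the mock page.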

--------------------------------

### init

Source: https://github.com/scribeocr/scribe.js/blob/master/docs/API.md

Initialize the program and optionally pre-load resources like the PDF renderer and OCR engine.

```APIDOC
## init

Initialize the program and optionally pre-load resources.

### Parameters

* `params` **[Object]?**
    * `params.pdf` **[boolean]** Load PDF renderer. (optional, default `false`)
    * `params.ocr` **[boolean]** Load OCR engine. (optional, default `false`)
    * `params.font` **[boolean]** Load built-in fonts. (optional, default `false`)

The PDF renderer and OCR engine are automatically loaded when needed.
Therefore, the only reason to set `pdf` or `ocr` to `true` is to pre-load them.
```

--------------------------------

### Serve Demo HTML

Source: https://github.com/scribeocr/scribe.js/blob/master/examples/server-textract-proxy/README.md

Run this command from the scribe.js checkout root to serve the demo HTML file. This is used to test the proxy server from a browser.

```sh
# In a second terminal, from the scribe.js checkout root (NOT this folder):
cd scribe.js
npx http-server -p 8081 --cors
```

--------------------------------

### Run Standalone Tauri Viewer

Source: https://github.com/scribeocr/scribe.js/blob/master/scribe-ui/README.md

Launch the built Tauri application. This command requires the path to a PDF file. Additional flags can be used to specify the initial page, action, or highlights.

```bash
./basic-viewer/tauri/target/release/scribe-viewer-tauri -f /path/to/file.pdf
```

--------------------------------

### Build Standalone Tauri Viewer

Source: https://github.com/scribeocr/scribe.js/blob/master/scribe-ui/README.md

Execute this script to build the standalone desktop viewer. It auto-detects the environment and uses `cargo` if available, otherwise falling back to Docker. Ensure Rust and Tauri dependencies are met if not using the dev container.

```bash
bash basic-viewer/tauri/build.sh
```

--------------------------------

### scribe.init(params?)

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Initializes the Scribe.js library and pre-loads necessary resources like the PDF renderer, OCR engine, and fonts. This is useful for avoiding startup delays during the first user interaction.

````APIDOC
## scribe.init(params?)

### Description
Pre-loads the PDF renderer, OCR engine, and/or built-in fonts. All three are loaded on-demand automatically; calling `init` is only necessary to avoid a startup delay when the user first interacts with the page.

### Method
`init`

### Parameters
- **params** (object) - Optional - An object to specify which resources to pre-load (e.g., `{ pdf: true, ocr: true, font: true }`). Can also include `ocrParams` for custom Tesseract configurations.

### Request Example
```javascript
import scribe from 'scribe.js-ocr';

// Pre-load everything upfront
await scribe.init({ pdf: true, ocr: true, font: true });

// Pre-load OCR only with custom Tesseract parameters
await scribe.init({
  ocr: true,
  ocrParams: { corePath: '/custom/tesseract-core.wasm.js' },
});
```
````

--------------------------------

### Import PDF as ArrayBuffer with SortedInputFiles

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Import a PDF file provided as an ArrayBuffer using the `SortedInputFiles` object. This is necessary when passing raw buffer data.

```javascript
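import scribe from 'scribe.js-ocr';
import fs from 'node:fs';

// A Node.js Buffer can be a view onto a larger shared ArrayBuffer,
// so copy out exactly the bytes that belong to the file.
const buf = fs.readFileSync('scan.pdf');
const pdfBuffer = buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
await scribe.importFiles({
  pdfFiles: [pdfBuffer],   // ArrayBuffer
});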
```

--------------------------------

### recognize

Source: https://github.com/scribeocr/scribe.js/blob/master/docs/API.md

Recognize all pages in the active document. Files must be imported first using `importFiles`.

```APIDOC
## recognize

Recognize all pages in the active document.
Files for recognition should already be imported using `importFiles` before calling this function.
The results of recognition can be exported by calling `exportData` after this function.

### Parameters

* `options.langs` **[Array]<[string]>** Language(s) in the document. (optional, default `['eng']`)
```

--------------------------------

### Initialize Scribe.js with Custom Tesseract Parameters

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Initialize Scribe.js with OCR enabled and custom Tesseract parameters, such as specifying the core path for the WebAssembly file.

```javascript
import scribe from 'scribe.js-ocr';

// Pre-load OCR only with custom Tesseract parameters
await scribe.init({
  ocr: true,
  ocrParams: { corePath: '/custom/tesseract-core.wasm.js' },
});
```

--------------------------------

### Initialize Scribe.js with Default Resources

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Pre-load the PDF renderer, OCR engine, and built-in fonts for Scribe.js. This is optional and useful for avoiding startup delays on user interaction.

```javascript
import scribe from 'scribe.js-ocr';

// Pre-load everything upfront
await scribe.init({ pdf: true, ocr: true, font: true });
```

--------------------------------

### Import Supplemental OCR Data for Comparison

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Loads additional OCR data, such as ground-truth, to enable comparison with the primary OCR results. Requires importing primary files and recognizing them first.

```javascript
import scribe from 'scribe.js-ocr';

await scribe.importFiles(['scan.png']);
await scribe.recognize();

// Load ground-truth hOCR for accuracy evaluation
await scribe.importFilesSupp(['ground-truth.hocr'], 'Ground Truth');

// Compare primary OCR vs ground truth
const comparison = await scribe.compareOCR(
  scribe.data.ocr.active,
  scribe.data.ocr['Ground Truth'],
);
console.log(comparison.metrics); // per-page word-level metrics
await scribe.terminate();
```

--------------------------------

### Integrate with Your Application

Source: https://github.com/scribeocr/scribe.js/blob/master/examples/server-textract-proxy/README.md

Use this JavaScript code to integrate the Textract proxy into your own application. It initializes scribe.js, imports PDF files, and calls the recognition service.

```js
import scribe from 'scribe.js';
import { RecognitionModelServerProxy } from './RecognitionModelServerProxy.js';

await scribe.init();
await scribe.importFiles([pdfFileFromInput]);

const ac = new AbortController();
cancelButton.addEventListener('click', () => ac.abort());

try {
  await scribe.recognize({
    model: RecognitionModelServerProxy,
    modelOptions: { serverUrl: 'https://your-server.example/ocr' },
    signal: ac.signal,
  });
  // scribe.data.ocr.active is populated; scribe.exportData('pdf') returns a searchable PDF
} catch (err) {
  if (err.name !== 'AbortError') throw err;
  // Partial results in scribe.data.ocr.active for pages that arrived before abort.
}
```

--------------------------------

### download

Source: https://github.com/scribeocr/scribe.js/blob/master/docs/API.md

Runs `exportData` and saves the result as a download (browser) or local file (Node.js).

```APIDOC
## download

Runs `exportData` and saves the result as a download (browser) or local file (Node.js).

### Parameters

* `format` **(`"pdf"` | `"hocr"` | `"docx"` | `"xlsx"` | `"txt"` | `"text"`)**
* `fileName` **[string]**
* `minPage` **[number]** First page to export. (optional, default `0`)
* `maxPage` **[number]** Last page to export (inclusive). -1 exports through the last page. (optional, default `-1`)
```
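
The `minPage`/`maxPage` semantics above (0-based, inclusive, `-1` meaning "through the last page") can be illustrated with a small standalone helper. `resolvePageRange` is a hypothetical name and is not part of scribe.js; it only mirrors the documented defaults, and the validation rules are assumptions.

```javascript
// Hypothetical helper (not part of scribe.js): list the 0-based page indices
// selected by minPage/maxPage, following the documented defaults.
function resolvePageRange(pageCount, minPage = 0, maxPage = -1) {
  // -1 is documented as "export through the last page".
  const last = maxPage === -1 ? pageCount - 1 : maxPage;
  if (!Number.isInteger(minPage) || !Number.isInteger(last)) {
    throw new TypeError('minPage and maxPage must be integers');
  }
  if (minPage < 0 || last >= pageCount || minPage > last) {
    throw new RangeError(`Invalid page range ${minPage}..${last} for ${pageCount} pages`);
  }
  const pages = [];
  for (let i = minPage; i <= last; i++) pages.push(i); // maxPage is inclusive
  return pages;
}

console.log(resolvePageRange(5));       // all five pages: indices 0 through 4
console.log(resolvePageRange(5, 0, 2)); // first three pages: indices 0 through 2
```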

--------------------------------

### Recognize Text with Default Settings

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Run OCR on imported files using the default recognition mode, which combines LSTM and legacy Tesseract models for the English language.

```javascript
import scribe from 'scribe.js-ocr';

await scribe.importFiles(['scan.png']);

// Default: combined LSTM + legacy, English
await scribe.recognize({ langs: ['eng'] });
```

--------------------------------

### Using AWS Textract Cloud Adapter with Scribe.js

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Integrates AWS Textract as a recognition model. Requires AWS credentials to be set in the environment. The progress handler can be customized to log engine information during processing.

```javascript
import scribe from 'scribe.js-ocr';
import { RecognitionModelTextract } from '@scribe.js/aws-textract';

// AWS credentials via env: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION
scribe.opt.progressHandler = ({ n, type, info }) => {
  if (type === 'convert') console.log(`Page ${n}: ${info.engineName}`);
};

await scribe.importFiles(['invoice.pdf']);

await scribe.recognize({
  model: RecognitionModelTextract,
  modelOptions: {
    analyzeLayout: true,
    // Multi-region for higher throughput (optional):
    region: ['us-east-1', 'us-west-2', 'eu-west-1'],
  },
});

const text = await scribe.exportData('text');
console.log(text);
await scribe.terminate();
```

--------------------------------

### Extract Text from User-Uploaded Files (Browser)

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Extract text from files uploaded by a user in a browser environment. Pre-loading resources with `scribe.init` can improve the initial response time.

```javascript
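// Browser module specifiers must start with '/', './', or '../';
// adjust the path to wherever node_modules is served from.
import scribe from './node_modules/scribe.js-ocr/scribe.js';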

await scribe.init({ ocr: true, font: true }); // pre-load for faster response
document.getElementById('uploader').addEventListener('change', async (e) => {
  const text = await scribe.extractText(e.target.files);
  console.log(text);
});
```

--------------------------------

### scribe.opt

Source: https://context7.com/scribeocr/scribe.js/llms.txt

A static class providing global configuration options that affect recognition, export, and performance. These options must be set before the relevant operation is invoked.

```APIDOC
## scribe.opt

### Description
Global options object that affects recognition, export, and performance. Must be set before the relevant operation is invoked.

### Properties
- `confThreshHigh` (number): High confidence threshold for OCR results. Defaults to 85.
- `confThreshMed` (number): Medium confidence threshold for OCR results. Defaults to 75.
- `workerN` (number): Number of worker threads to use. Set before `init`.
- `langPath` (string): Path to Tesseract language data for offline/sandboxed environments.
- `displayMode` (string): Controls PDF output mode. Options: 'invis' (invisible text), 'ebook' (text only), 'proof' (semi-transparent overlay), 'eval' (debug color-coded overlay).
- `reflow` (boolean): Enables reflow for reconstructing reading order from layout during export.
- `progressHandler` (function): A custom handler for progress updates. Receives an object with `n` (page number) and `type`; the Textract examples also read an `info` field from it.
- `warningHandler` (function): A custom handler for warning messages. Receives the warning message string.
- `errorHandler` (function): A custom handler for error messages. Receives the error message string.
```
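
To make the two confidence thresholds concrete, the following sketch sorts a word's confidence score into three tiers using the documented defaults (85 and 75). `confidenceTier` is a hypothetical helper, not a scribe.js function, and the use of a strict `>` comparison at each boundary is an assumption.

```javascript
// Hypothetical helper (not part of scribe.js): bucket a confidence score
// using thresholds shaped like scribe.opt.confThreshHigh / confThreshMed.
function confidenceTier(conf, opt = { confThreshHigh: 85, confThreshMed: 75 }) {
  // Assumption: scores must exceed a threshold (strict >) to reach the tier.
  if (conf > opt.confThreshHigh) return 'high';
  if (conf > opt.confThreshMed) return 'medium';
  return 'low';
}

console.log(confidenceTier(90)); // 'high'
console.log(confidenceTier(80)); // 'medium'
console.log(confidenceTier(50)); // 'low'
```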

--------------------------------

### Scribe CLI Commands

Source: https://context7.com/scribeocr/scribe.js/llms.txt

The `scribe` CLI provides batch OCR, PDF overlay, and text extraction from the terminal.

````APIDOC
## CLI: `scribe` command-line interface

The `scribe` CLI (installed via `npm i -g scribe.js-ocr` or run with `npx`) provides batch OCR, PDF overlay, and text extraction from the terminal.

```bash
# Extract text from an image (outputs .txt file)
scribe extract scan.png

# Extract text from a PDF, output as searchable PDF
scribe extract document.pdf --format pdf

# Process all supported files in a directory
scribe extract ./scans/ --dir --format txt

# Recognize and create a searchable PDF with invisible text layer
scribe recognize scan.pdf --output ./output/

# Check OCR confidence of a file
scribe check document.pdf

# Overlay OCR text on a PDF with a visual proof overlay
scribe overlay scan.pdf --output ./output/ --vis

# Detect whether a PDF is text-native or image-based
scribe detect-pdf-type document.pdf
```
````

--------------------------------

### importFiles

Source: https://github.com/scribeocr/scribe.js/blob/master/docs/API.md

Import files for processing. Supports various file types and input formats, including structured objects for explicit file type definition.

```APIDOC
## importFiles

Import files for processing.
An object with `pdfFiles`, `imageFiles`, and `ocrFiles` arrays can be provided to import multiple types of files.
Alternatively, for `File` objects (browser) and file paths (Node.js), a single array can be provided, which is sorted based on extension.

### Parameters

* `files` **([Array]<[File]> | FileList | [Array]<[string]> | [SortedInputFiles])**
```
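
The extension-based sorting described above can be sketched as a standalone function that builds a `SortedInputFiles`-shaped object from a flat list of paths. This is not the library's implementation: `sortInputPaths` is a hypothetical name, and the extension lists are assumptions based on the file types named elsewhere in these docs (PDF, PNG/JPEG, hOCR/XML).

```javascript
// Hypothetical sketch (not the scribe.js implementation): group file paths
// into { pdfFiles, imageFiles, ocrFiles } based on their extension.
function sortInputPaths(paths) {
  const sorted = { pdfFiles: [], imageFiles: [], ocrFiles: [] };
  for (const p of paths) {
    const ext = p.slice(p.lastIndexOf('.') + 1).toLowerCase();
    if (ext === 'pdf') sorted.pdfFiles.push(p);
    else if (['png', 'jpg', 'jpeg'].includes(ext)) sorted.imageFiles.push(p);
    else if (['hocr', 'xml'].includes(ext)) sorted.ocrFiles.push(p);
    else throw new Error(`Unsupported file type: ${p}`);
  }
  // Order each group alphabetically so page order follows file names.
  for (const group of Object.values(sorted)) group.sort();
  return sorted;
}

console.log(sortInputPaths(['page_02.png', 'page_01.png', 'scan.hocr']));
```

Passing an explicit object like this is mainly useful when the input cannot be identified by extension, for example raw `ArrayBuffer` data.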

--------------------------------

### Import Multiple Image Files

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Import multiple image files into the Scribe.js processing pipeline. The library sorts them alphabetically internally.

```javascript
import scribe from 'scribe.js-ocr';

// Multiple images (sorted alphabetically internally)
await scribe.importFiles(['page_01.png', 'page_02.png', 'page_03.png']);
```

--------------------------------

### Import and Use Scribe.js

Source: https://github.com/scribeocr/scribe.js/blob/master/README.md

Import Scribe.js in your JavaScript code for browser or Node.js environments. Use the extractText method to process image URLs and log the results.

```javascript
// Import statement in browser (specifier must start with '/', './', or '../'):
// import scribe from './node_modules/scribe.js-ocr/scribe.js';
// Import statement for Node.js:
import scribe from 'scribe.js-ocr';

// Basic usage
scribe.extractText(['https://tesseract.projectnaptha.com/img/eng_bw.png'])
	.then((res) => console.log(res));
```

--------------------------------

### Configure Global Scribe.js Options

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Sets global configuration options that affect recognition, export, and performance. These must be set before the relevant operation is invoked.

```javascript
import scribe from 'scribe.js-ocr';

// Tune confidence thresholds
scribe.opt.confThreshHigh = 85;   // default
scribe.opt.confThreshMed  = 75;   // default

// Control number of workers (set before init)
scribe.opt.workerN = 4;

// Use a local mirror for Tesseract language data (offline/sandboxed environments)
scribe.opt.langPath = '/static/tessdata';

// Control PDF output mode: 'invis' (invisible text), 'ebook' (text only),
// 'proof' (semi-transparent overlay), 'eval' (debug color-coded overlay)
scribe.opt.displayMode = 'invis';

// Export reflow: reconstruct reading order from layout
scribe.opt.reflow = true;

// Custom progress handler
scribe.opt.progressHandler = ({ n, type }) => {
  console.log(`Progress: page=${n} type=${type}`);
};

// Custom warning / error handlers
scribe.opt.warningHandler = (msg) => console.warn('[scribe warn]', msg);
scribe.opt.errorHandler   = (msg) => console.error('[scribe error]', msg);

await scribe.init({ ocr: true });
await scribe.importFiles(['scan.png']);
await scribe.recognize({ langs: ['eng'] });
const pdf = await scribe.exportData('pdf');
await scribe.terminate();
```

--------------------------------

### scribe.recognize(options?)

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Executes the Optical Character Recognition (OCR) process on files previously imported using `importFiles`. It supports various recognition modes and language options, including custom cloud-based models.

````APIDOC
## scribe.recognize(options?)

### Description
Runs optical character recognition on all pages currently loaded via `importFiles`. Supports built-in Tesseract (combined LSTM + legacy), speed-only mode, and pluggable cloud/custom recognition models.

### Method
`recognize`

### Parameters
- **options** (object) - Optional - Configuration options for the recognition process, including `langs`, `mode`, `modeAdv`, `config`, and `model`.
  - **langs** (Array<string>) - Optional - Languages to use for OCR.
  - **mode** (string) - Optional - Recognition mode ('speed' for faster processing).
  - **modeAdv** (string) - Optional - Advanced recognition mode (e.g., 'lstm').
  - **config** (object) - Optional - Custom Tesseract configuration parameters.
  - **model** (object) - Optional - A custom recognition model (e.g., `RecognitionModelTextract`).
  - **modelOptions** (object) - Optional - Options specific to the chosen custom model.

### Request Example
```javascript
import scribe from 'scribe.js-ocr';
import { RecognitionModelTextract } from '@scribe.js/aws-textract';

await scribe.importFiles(['scan.png']);

// Default: combined LSTM + legacy, English
await scribe.recognize({ langs: ['eng'] });

// Speed mode (faster, similar to raw Tesseract.js)
await scribe.recognize({ mode: 'speed', langs: ['eng'] });

// Advanced: use only LSTM model, custom Tesseract config
await scribe.recognize({
  modeAdv: 'lstm',
  langs: ['deu'],
  config: { tessedit_char_whitelist: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' },
});

// Custom cloud model: AWS Textract
await scribe.importFiles(['document.pdf']);
await scribe.recognize({
  model: RecognitionModelTextract,
  modelOptions: { analyzeLayout: true },
});
const text = await scribe.exportData('text');
console.log(text);
await scribe.terminate();
```
````

--------------------------------

### writeDebugImages

Source: https://github.com/scribeocr/scribe.js/blob/master/docs/API.md

Writes debug images for visual inspection of processing steps.

```APIDOC
## writeDebugImages

### Parameters

* `ctx` 
* `compDebugArrArr` **[Array]<[Array]<CompDebugNode>>**
* `filePath` **[string]**
```

--------------------------------

### Parameters for Recognition

Source: https://github.com/scribeocr/scribe.js/blob/master/docs/API.md

This section details the options available for configuring the recognition process in Scribe.js.

```APIDOC
## Parameters for Recognition

### Description
Configuration options for the recognition process.

### Parameters
#### options (Object) - Optional, default `{}`

- **options.mode** (`"speed"` | `"quality"`) - Optional, default `'quality'`: Recognition mode.
- **options.langs** (Array<string>) - Optional, default `['eng']`: Language(s) in the document.
- **options.modeAdv** (`"lstm"` | `"legacy"` | `"combined"`) - Optional, default `'combined'`: Alternative method of setting recognition mode.
- **options.combineMode** (`"conf"` | `"data"` | `"none"`) - Optional, default `'data'`: Method of combining OCR results. Used if OCR data already exists.
- **options.vanillaMode** (boolean) - Optional, default `false`: Whether to use the vanilla Tesseract.js model.
- **options.config** (Object<string, string>) - Optional, default `{}`: Config parameters to pass to Tesseract.js.
```
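
The allowed values listed above lend themselves to a quick pre-flight check before calling `recognize`. The sketch below is a hypothetical validator, not part of scribe.js; it only encodes the enumerations and types documented in this section and returns a list of problems rather than throwing.

```javascript
// Allowed values copied from the parameter list above.
const ALLOWED_VALUES = {
  mode: ['speed', 'quality'],
  modeAdv: ['lstm', 'legacy', 'combined'],
  combineMode: ['conf', 'data', 'none'],
};

// Hypothetical helper (not part of scribe.js): report invalid recognize options.
function validateRecognizeOptions(options = {}) {
  const problems = [];
  for (const [key, allowed] of Object.entries(ALLOWED_VALUES)) {
    if (key in options && !allowed.includes(options[key])) {
      problems.push(`${key} must be one of: ${allowed.join(', ')}`);
    }
  }
  const langsOk = Array.isArray(options.langs)
    && options.langs.every((lang) => typeof lang === 'string');
  if ('langs' in options && !langsOk) {
    problems.push('langs must be an array of strings');
  }
  if ('vanillaMode' in options && typeof options.vanillaMode !== 'boolean') {
    problems.push('vanillaMode must be a boolean');
  }
  return problems;
}

console.log(validateRecognizeOptions({ mode: 'speed', langs: ['eng'] })); // no problems
console.log(validateRecognizeOptions({ mode: 'fast', langs: 'eng' }));    // two problems
```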

--------------------------------

### scribe.importFiles(files)

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Loads various file types (PDF, images, hOCR/XML, .scribe sessions) into the Scribe.js processing pipeline. It accepts different input formats depending on the environment (Node.js vs. Browser).

````APIDOC
## scribe.importFiles(files)

### Description
Loads PDF, image (PNG/JPEG), hOCR/XML, or `.scribe` session files into the internal pipeline. Accepts an array of paths (Node.js), URLs or `File` objects (browser), a `FileList`, or a pre-sorted `SortedInputFiles` object. Must be called before `recognize` or `exportData`.

### Method
`importFiles`

### Parameters
- **files** (Array<string> | Array<File> | FileList | SortedInputFiles) - Required - An array of file paths or URLs, an array of `File` objects, a `FileList`, or a `SortedInputFiles` object. Raw `ArrayBuffer` data must be wrapped in a `SortedInputFiles` object.

### Request Example
```javascript
import scribe from 'scribe.js-ocr';
import fs from 'node:fs';

// Single PDF
await scribe.importFiles(['invoice.pdf']);

// Multiple images (sorted alphabetically internally)
await scribe.importFiles(['page_01.png', 'page_02.png', 'page_03.png']);

// Pre-sorted SortedInputFiles object (required when passing ArrayBuffers).
// A Node.js Buffer can be a view onto a larger shared ArrayBuffer,
// so copy out exactly the bytes that belong to the file.
const buf = fs.readFileSync('scan.pdf');
const pdfBuffer = buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
await scribe.importFiles({
  pdfFiles: [pdfBuffer],   // ArrayBuffer
});

// Combined: image pages + existing hOCR data
await scribe.importFiles({
  imageFiles: ['scan.png'],
  ocrFiles:   ['scan.hocr'],
});

console.log(scribe.inputData.pageCount); // => number of pages imported
```
````

--------------------------------

### terminate

Source: https://github.com/scribeocr/scribe.js/blob/master/docs/API.md

Terminates the program and releases all allocated resources.

```APIDOC
## terminate

Terminates the program and releases resources.
```

--------------------------------

### Import Single PDF File

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Import a single PDF file into the Scribe.js processing pipeline using its file path.

```javascript
import scribe from 'scribe.js-ocr';

// Single PDF
await scribe.importFiles(['invoice.pdf']);
```

--------------------------------

### Download OCR Data to File

Source: https://context7.com/scribeocr/scribe.js/llms.txt

A convenience function that exports OCR data and saves it as a file. Supports various formats and page range options.

```javascript
import scribe from 'scribe.js-ocr';

await scribe.importFiles(['report.pdf']);
await scribe.recognize();

// Save searchable PDF to disk (Node.js) or trigger download (browser)
await scribe.download('pdf', 'report.pdf');

// Save plain text output
await scribe.download('txt', 'report.txt');

// Save only first 3 pages as DOCX
await scribe.download('docx', 'report.docx', { minPage: 0, maxPage: 2 });

await scribe.terminate();
```

--------------------------------

### Recognize Text with Advanced LSTM and Custom Config

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Utilize advanced recognition options, such as using only the LSTM model and providing a custom Tesseract configuration for character whitelisting.

```javascript
import scribe from 'scribe.js-ocr';

await scribe.importFiles(['scan.png']);

// Advanced: use only LSTM model, custom Tesseract config
await scribe.recognize({
  modeAdv: 'lstm',
  langs: ['deu'],
  config: { tessedit_char_whitelist: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' },
});
```

--------------------------------

### Import Mixed Image and OCR Files

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Import both image files and pre-existing OCR data (e.g., hOCR) into the Scribe.js pipeline. The `pageCount` can be checked after import.

```javascript
import scribe from 'scribe.js-ocr';

// Combined: image pages + existing hOCR data
await scribe.importFiles({
  imageFiles: ['scan.png'],
  ocrFiles:   ['scan.hocr'],
});

console.log(scribe.inputData.pageCount); // => number of pages imported
```

--------------------------------

### Recognize Text with AWS Textract Cloud Model

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Integrate with AWS Textract for OCR by specifying the `RecognitionModelTextract` and its options. The extracted text can then be exported.

```javascript
import scribe from 'scribe.js-ocr';
import { RecognitionModelTextract } from '@scribe.js/aws-textract';

await scribe.importFiles(['document.pdf']);
await scribe.recognize({
  model: RecognitionModelTextract,
  modelOptions: { analyzeLayout: true },
});
const text = await scribe.exportData('text');
console.log(text);
await scribe.terminate();
```

--------------------------------

### Export OCR Data in Various Formats

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Exports OCR data to formats like 'txt', 'pdf', 'hocr', 'alto', 'html', 'md', 'docx', 'xlsx', or 'scribe'. Options allow specifying page ranges or arrays of pages.

```javascript
import scribe from 'scribe.js-ocr';
import fs from 'node:fs';

await scribe.importFiles(['multi-page.pdf']);
await scribe.recognize({ langs: ['eng'] });

// Plain text — all pages
const txt = await scribe.exportData('txt');
console.log(txt);

// Searchable PDF (invisible text layer overlaid on original)
const pdfBytes = await scribe.exportData('pdf');
fs.writeFileSync('output.pdf', Buffer.from(pdfBytes));

// hOCR — pages 2–4 only (0-based indices)
const hocr = await scribe.exportData('hocr', { minPage: 1, maxPage: 3 });

// Specific pages via array
const partial = await scribe.exportData('txt', { pageArr: [0, 2, 4] });

// Markdown with table preservation
const md = await scribe.exportData('md');

// Scribe session format (compressed, for later restore)
const sessionBuf = await scribe.exportData('scribe');
fs.writeFileSync('session.scribe', Buffer.from(sessionBuf));

await scribe.terminate();
```
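
Because the page options above use 0-based indices, a user-facing page selection such as "1,3,5" or "2-4" has to be converted before it is passed as `pageArr`. The following is a minimal sketch of that conversion; `parsePageSpec` and its input syntax are hypothetical and not part of scribe.js.

```javascript
// Hypothetical helper (not part of scribe.js): convert a 1-based page
// selection like "1,3,5-7" into a sorted array of 0-based page indices.
function parsePageSpec(spec) {
  const pages = new Set();
  for (const part of spec.split(',')) {
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) throw new SyntaxError(`Invalid page spec: "${part}"`);
    const first = Number(match[1]);
    const last = match[2] === undefined ? first : Number(match[2]);
    if (first < 1 || last < first) throw new RangeError(`Invalid page range: "${part}"`);
    for (let p = first; p <= last; p++) pages.add(p - 1); // convert to 0-based
  }
  return [...pages].sort((a, b) => a - b);
}

console.log(parsePageSpec('1,3,5')); // [0, 2, 4]
console.log(parsePageSpec('2-4'));   // [1, 2, 3]
```

The resulting array could then be supplied as the `pageArr` option shown in the export example.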

--------------------------------

### scribe.importFilesSupp(files, ocrName)

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Loads supplemental OCR data, such as ground-truth or alternate engine outputs, to be used alongside the primary OCR data for comparison and evaluation.

```APIDOC
## scribe.importFilesSupp(files, ocrName)

### Description
Loads an additional OCR version (e.g., ground-truth or an alternate engine's output) alongside the primary OCR data, enabling comparison and evaluation workflows.

### Parameters
#### `files` (array of strings, required)
- An array of file paths or URLs to the supplemental OCR data files.

#### `ocrName` (string, required)
- A name to identify this supplemental OCR data, used for referencing it later (e.g., 'Ground Truth').

### Returns
- `Promise<void>`: This function performs an import operation and does not return a value.
```

--------------------------------

### Recognize Text in Speed Mode

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Perform OCR in speed mode for faster processing, similar to raw Tesseract.js. Specify the desired languages for recognition.

```javascript
import scribe from 'scribe.js-ocr';

await scribe.importFiles(['scan.png']);

// Speed mode (faster, similar to raw Tesseract.js)
await scribe.recognize({ mode: 'speed', langs: ['eng'] });
```

--------------------------------

### clear

Source: https://github.com/scribeocr/scribe.js/blob/master/docs/API.md

Clears all document-specific data from the current session.

```APIDOC
## clear

Clears all document-specific data.
```

--------------------------------

### Extract Text from Image (Node.js)

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Use this to extract text from an image file in a Node.js environment. Call `scribe.terminate()` when finished to release the workers.

```javascript
import scribe from 'scribe.js-ocr';

const text = await scribe.extractText(['scan.png']);
console.log(text);
// => "Hello World\n..."

await scribe.terminate(); // release workers
```

--------------------------------

### API Endpoint: POST /ocr

Source: https://github.com/scribeocr/scribe.js/blob/master/examples/server-textract-proxy/README.md

This endpoint accepts raw PDF bytes and returns NDJSON results streamed per page, processed by AWS Textract.

```APIDOC
## POST /ocr

### Description
Accepts raw PDF bytes and returns NDJSON results streamed per page, processed by AWS Textract. Credentials remain on the server.

### Method
POST

### Endpoint
/ocr

### Parameters
#### Request Body
- **body** (binary) - Required - Raw PDF bytes, `Content-Type: application/pdf`

### Response
#### Success Response (200)
- **application/x-ndjson** - Streamed NDJSON lines, one per page.
  - `{"pageNum": 0, "rawData": "<stringified Textract JSON>"}`
  - Lines are flushed as each page completes.

#### Failure Response
- **application/x-ndjson** - A page that fails is reported as an NDJSON line carrying an `error` object instead of `rawData`.
  - `{"pageNum": 0, "error": {"message": "..."}}`
```
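
The response shapes above could be consumed along these lines. This is a sketch under stated assumptions: the function name `parseOcrNdjson` is hypothetical, and it takes the full response body as a string rather than reading the stream incrementally. A streaming client would apply the same per-line logic as chunks arrive.

```javascript
// Split an NDJSON response body into successful pages and per-page errors.
// Assumes every non-empty line is a JSON object shaped as documented above.
function parseOcrNdjson(body) {
  const pages = [];
  const errors = [];
  for (const line of body.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines such as a trailing newline
    const record = JSON.parse(line);
    if (record.error) {
      errors.push({ pageNum: record.pageNum, message: record.error.message });
    } else {
      // rawData is itself a JSON string, so it is parsed a second time.
      pages.push({ pageNum: record.pageNum, data: JSON.parse(record.rawData) });
    }
  }
  return { pages, errors };
}
```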

--------------------------------

### scribe.download(format, fileName, options?)

Source: https://context7.com/scribeocr/scribe.js/llms.txt

A convenience function that exports OCR data and saves it as a file. It acts as a wrapper around `exportData` and handles file saving in both browser and Node.js environments.

```APIDOC
## scribe.download(format, fileName, options?)

### Description
Exports the active OCR data and saves it as a file download (browser) or writes to the local filesystem (Node.js). This is a convenience wrapper around `exportData`.

### Parameters
#### `format` (string, required)
- The desired output format (e.g., 'pdf', 'txt', 'docx').

#### `fileName` (string, required)
- The name of the file to save the output as.

#### `options` (object, optional)
- `minPage` (number): The starting page index (0-based) for export.
- `maxPage` (number): The ending page index (0-based) for export.
- `pageArr` (array of numbers): An array of specific page indices to export.

### Returns
- `Promise<void>`: This function does not return a value but performs a file save operation.
```

--------------------------------

### `scribe.utils.calcConf(ocrPages)` - Compute confidence statistics

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Returns aggregate high-confidence and total word counts across an array of OCR pages, useful for estimating recognition quality.

```APIDOC
## `scribe.utils.calcConf(ocrPages)` — Compute confidence statistics

Returns aggregate high-confidence and total word counts across an array of OCR pages, useful for estimating recognition quality.

```js
import scribe from 'scribe.js-ocr';

await scribe.importFiles(['scan.pdf']);
await scribe.recognize();

const { highConf, total } = scribe.utils.calcConf(scribe.data.ocr.active);
console.log(`High-confidence words: ${highConf} / ${total}`);
console.log(`Estimated accuracy: ${((highConf / total) * 100).toFixed(1)}%`);

await scribe.terminate();
```
```

--------------------------------

### scribe.extractText(files, langs?, outputFormat?, options?)

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Performs text extraction from provided files in a single asynchronous call. It handles both text-native PDFs and image-based files, automatically performing OCR when necessary. Supports various input types like URLs, file paths, and File/FileList objects.

```APIDOC
## scribe.extractText(files, langs?, outputFormat?, options?)

### Description
Imports files, runs OCR when needed, and returns extracted text in a single async call. For text-native PDFs the existing text is returned directly; for image-based files OCR is performed automatically. Accepts URLs (browser), file paths (Node.js), or `File`/`FileList` objects (browser).

### Method
`extractText`

### Parameters
- **files** (Array<string | File> | FileList) - Required - An array of file paths, URLs, or `File` objects to process, or a `FileList` from a file input.
- **langs** (Array<string>) - Optional - An array of language codes for OCR (e.g., ['eng', 'fra']).
- **outputFormat** (string) - Optional - The desired output format (e.g., 'txt').
- **options** (object) - Optional - Configuration options, such as `{ skipRecPDFTextNative: true }`.

### Request Example
```javascript
// Node.js — extract text from an image
import scribe from 'scribe.js-ocr';

const text = await scribe.extractText(['scan.png']);
console.log(text);
// => "Hello World\n..."

await scribe.terminate(); // release workers


// Node.js — extract text from a PDF, specify language
const pdfText = await scribe.extractText(
  ['document.pdf'],
  ['eng', 'fra'],   // languages
  'txt',            // output format
  { skipRecPDFTextNative: true }  // skip OCR for text-native PDFs (default)
);
await scribe.terminate();


// Browser — extract from user-uploaded files
import scribe from 'node_modules/scribe.js-ocr/scribe.js';

await scribe.init({ ocr: true, font: true }); // pre-load for faster response
document.getElementById('uploader').addEventListener('change', async (e) => {
  const text = await scribe.extractText(e.target.files);
  console.log(text);
});
```
```

--------------------------------

### scribe.exportData(format?, options?)

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Exports the active OCR data to the requested format. Supported formats include 'txt', 'pdf', 'hocr', 'alto', 'html', 'md', 'docx', 'xlsx', and 'scribe'. Options can specify page ranges or specific pages.

```APIDOC
## scribe.exportData(format?, options?)

### Description
Exports the active OCR data to the requested format and returns the content as a `string` or `ArrayBuffer`. Supported formats: `'txt'`, `'pdf'`, `'hocr'`, `'alto'`, `'html'`, `'md'`, `'docx'`, `'xlsx'`, `'scribe'`.

### Parameters
#### `format` (string, optional)
- The desired output format. Defaults to 'txt' if not specified.

#### `options` (object, optional)
- `minPage` (number): The starting page index (0-based) for export.
- `maxPage` (number): The ending page index (0-based) for export.
- `pageArr` (array of numbers): An array of specific page indices to export.

### Returns
- `Promise<string | ArrayBuffer>`: Resolves to the exported OCR data in the specified format.
```

--------------------------------

### Computing Confidence Statistics with Scribe.js

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Calculates aggregate high-confidence and total word counts from OCR pages to estimate recognition quality. This utility function is useful for assessing the accuracy of the OCR process.

```javascript
import scribe from 'scribe.js-ocr';

await scribe.importFiles(['scan.pdf']);
await scribe.recognize();

const { highConf, total } = scribe.utils.calcConf(scribe.data.ocr.active);
console.log(`High-confidence words: ${highConf} / ${total}`);
console.log(`Estimated accuracy: ${((highConf / total) * 100).toFixed(1)}%`);

await scribe.terminate();
```
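
The percentage in the example above divides by `total`, which would yield `NaN` if no words were recognized. A small guard is sketched below; `formatConfidence` is a hypothetical helper that only assumes the `{ highConf, total }` shape shown above.

```javascript
// Format the high-confidence ratio as a percentage string.
// Returns a placeholder instead of NaN when there are no words.
function formatConfidence({ highConf, total }) {
  if (total === 0) return 'n/a (no words recognized)';
  return `${((highConf / total) * 100).toFixed(1)}%`;
}
```

The result of `scribe.utils.calcConf(...)` can be passed to it directly.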

--------------------------------

### Accessing Internal OCR and Image Data

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Provides access to raw OCR page objects, font state, image cache, layout regions, and per-page metrics after processing.

```APIDOC
## `scribe.data` — Accessing internal OCR and image data

Provides access to raw OCR page objects, font state, image cache, layout regions, and per-page metrics after processing.

```js
import scribe from 'scribe.js-ocr';

await scribe.importFiles(['scan.png']);
await scribe.recognize();

// Iterate OCR words on page 0
const page0 = scribe.data.ocr.active[0];
for (const line of page0.lines) {
  for (const word of line.words) {
    console.log(word.text, word.conf); // text and confidence score
  }
}

// Total page count
console.log(scribe.inputData.pageCount);

// Confidence summary across all pages
const { highConf, total } = scribe.utils.calcConf(scribe.data.ocr.active);
console.log(`Accuracy estimate: ${((highConf / total) * 100).toFixed(1)}%`);

await scribe.terminate();
```
```
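
Building on the iteration above, the following sketch collects words whose confidence falls below a cutoff. `collectLowConfidenceWords` and the `threshold` parameter are hypothetical. The sketch assumes only the `lines`, `words`, `text`, and `conf` properties used in the example above, and leaves the cutoff to the caller because the scale of `conf` is not stated here.

```javascript
// Return the words on a page whose confidence is below `threshold`.
// `page` is expected to look like an entry of scribe.data.ocr.active.
function collectLowConfidenceWords(page, threshold) {
  const flagged = [];
  for (const line of page.lines) {
    for (const word of line.words) {
      if (word.conf < threshold) {
        flagged.push({ text: word.text, conf: word.conf });
      }
    }
  }
  return flagged;
}
```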

--------------------------------

### exportData

Source: https://github.com/scribeocr/scribe.js/blob/master/docs/API.md

Export active OCR data to a specified format, with options for page range.

```APIDOC
## exportData

Export active OCR data to specified format.

### Parameters

* `format` **(`"pdf"` | `"hocr"` | `"docx"` | `"xlsx"` | `"txt"` | `"text"`)**  (optional, default `'txt'`)
* `minPage` **[number]** First page to export. (optional, default `0`)
* `maxPage` **[number]** Last page to export (inclusive). -1 exports through the last page. (optional, default `-1`)

Returns **[Promise]<([string] | [ArrayBuffer])>**
```
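
To make the `minPage` / `maxPage` defaults concrete, the sketch below lists which page indices a given pair selects. It illustrates the documented semantics only (0-based, inclusive upper bound, `-1` meaning the last page) and is not the library's implementation; `resolvePageRange` is a hypothetical name.

```javascript
// List the 0-based page indices selected by minPage/maxPage.
// maxPage is inclusive; -1 means "through the last page".
function resolvePageRange(pageCount, minPage = 0, maxPage = -1) {
  const last = maxPage === -1 ? pageCount - 1 : maxPage;
  const pages = [];
  for (let i = minPage; i <= last; i++) pages.push(i);
  return pages;
}
```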

--------------------------------

### Extract Text from PDF with Language and Format (Node.js)

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Extract text from a PDF file, specifying the desired languages and output format. The `skipRecPDFTextNative` option can be used to skip OCR for text-native PDFs.

```javascript
import scribe from 'scribe.js-ocr';

const text = await scribe.extractText(
  ['document.pdf'],
  ['eng', 'fra'],   // languages
  'txt',            // output format
  { skipRecPDFTextNative: true }  // skip OCR for text-native PDFs (default)
);
await scribe.terminate();
```

--------------------------------

### scribe.terminate()

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Releases all resources used by Scribe.js, including worker threads for OCR, PDF rendering, and font engines. This should always be called when processing is complete to prevent resource leaks.

```APIDOC
## scribe.terminate()

### Description
Terminates the underlying worker threads (Tesseract, PDF renderer, font engine) and frees memory. Always call this when you are done processing to avoid resource leaks, especially in Node.js.

### Returns
- `Promise<void>`: This function performs a termination operation and does not return a value.
```

--------------------------------

### Terminate Scribe.js Resources

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Frees all resources by terminating worker threads. Essential for preventing memory leaks, especially in Node.js environments.

```javascript
import scribe from 'scribe.js-ocr';

try {
  const text = await scribe.extractText(['scan.pdf']);
  console.log(text);
} finally {
  await scribe.terminate();
}
```

--------------------------------

### extractText

Source: https://github.com/scribeocr/scribe.js/blob/master/docs/API.md

Function for extracting text from image and PDF files with a single function call. Handles PDF text extraction or OCR based on file type and options.

```APIDOC
## extractText

Function for extracting text from image and PDF files with a single function call.
By default, existing text content is extracted for text-native PDF files; otherwise text is extracted using OCR.
To control how text from PDF files is handled, set the options in the `opt.usePDFText` object.
For more control, use `init`, `importFiles`, `recognize`, and `exportData` separately.

### Parameters

* `files` 
* `langs` **[Array]<[string]>**  (optional, default `['eng']`)
* `outputFormat`   (optional, default `'txt'`)
* `options`   (optional, default `{}`)
```

--------------------------------

### Clear Document Data for New Processing

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Clears all OCR results and related data for the current document, allowing a new document to be processed within the same session without re-initializing workers.

```javascript
import scribe from 'scribe.js-ocr';

await scribe.init({ ocr: true });

// Process first document
await scribe.importFiles(['doc1.pdf']);
await scribe.recognize();
const text1 = await scribe.exportData('txt');

// Clear and process second document
await scribe.clear();
await scribe.importFiles(['doc2.pdf']);
await scribe.recognize();
const text2 = await scribe.exportData('txt');

await scribe.terminate();
```

--------------------------------

### scribe.clear()

Source: https://context7.com/scribeocr/scribe.js/llms.txt

Clears all document-specific data, including OCR results and images, without releasing the underlying worker resources. This is useful for processing multiple documents within the same session.

```APIDOC
## scribe.clear()

### Description
Clears all document-specific data (OCR results, images, page metrics) without releasing the underlying workers. Use this to process a new document in the same session without the overhead of re-initializing.

### Returns
- `Promise<void>`: This function performs a clear operation and does not return a value.
```
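
For more than two documents, the same pattern can be written as a loop. This is a sketch, assuming the `init`, `importFiles`, `recognize`, `exportData`, `clear`, and `terminate` calls behave as described in this document. `extractBatch` is a hypothetical helper, and `scribe` is passed in so the function can be tried against a stub.

```javascript
// Process several documents in one session, clearing between them.
// Workers are initialized once and released in `finally`, even on error.
async function extractBatch(scribe, paths) {
  const results = [];
  await scribe.init({ ocr: true });
  try {
    for (const path of paths) {
      await scribe.importFiles([path]);
      await scribe.recognize();
      results.push({ path, text: await scribe.exportData('txt') });
      await scribe.clear(); // drop document data, keep workers alive
    }
  } finally {
    await scribe.terminate();
  }
  return results;
}
```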
