# MuPDF

MuPDF is a lightweight, open-source software framework for viewing, converting, and manipulating PDF, XPS, and e-book documents (EPUB, MOBI, FB2, CBZ). Written in portable C, it powers command-line tools (`mutool`), desktop and mobile viewers, and a comprehensive library accessible from C, JavaScript/WebAssembly, Java (Android), and Python (via PyMuPDF). The library provides high-quality rendering at any resolution, full text extraction and search, annotation creation and editing, digital signature support, and low-level PDF object manipulation. MuPDF 1.28 is the current release, available under AGPL v3 or a commercial license from Artifex Software.

The JavaScript/WASM binding (`mupdf` on npm) exposes the full C library through an object-oriented API that works identically in Node.js, Bun, and modern web browsers. The same API is available through the `mutool run` command-line interpreter (ES5 only). Core classes include `Document` / `PDFDocument` for file I/O, `Page` / `PDFPage` for rendering and text extraction, `Pixmap` for raster images, `StructuredText` for analyzed text, `PDFAnnotation` for PDF markup, `DisplayList` for cached rendering, and low-level helpers such as `Matrix`, `Path`, `Text`, and `Font`. The C API uses an `fz_context` for thread-local state, an `fz_try` / `fz_catch` exception model, and reference-counted objects released with `fz_drop_*` calls.

---

## JavaScript API

### Install and import the mupdf module

Install the npm package and import it as an ES module; all classes are exported from the top-level namespace.

```javascript
// npm install mupdf
import * as mupdf from "mupdf"

// Verify available exports
console.log(Object.keys(mupdf))
```

---

### `Document.openDocument()` — Open any supported document

The static factory opens PDF, XPS, EPUB, MOBI, FB2, CBZ, and image files from a path, a Buffer, or an ArrayBuffer. It also accepts an optional accelerator file for faster EPUB loading and an Archive for resource lookup.
```javascript
import * as mupdf from "mupdf"
import * as fs from "fs"

// Open by file path (MIME type inferred from extension)
const doc1 = mupdf.Document.openDocument("report.pdf")

// Open from a Node Buffer with explicit MIME type
const buf = fs.readFileSync("report.pdf")
const doc2 = mupdf.Document.openDocument(buf, "application/pdf")

// Check document type
console.log(doc2.isPDF())        // true
console.log(doc2.countPages())   // e.g. 42
console.log(doc2.getMetaData("info:Title"))
console.log(doc2.getMetaData("info:Author"))

// Password-protected documents
if (doc2.needsPassword()) {
  const result = doc2.authenticatePassword("secret")
  // 0=failed, 1=no password needed, 2=user ok, 4=owner ok, 6=both ok
  if (result === 0) throw new Error("Wrong password")
}

// Reflowable documents (EPUB) can be re-laid-out
const epub = mupdf.Document.openDocument("book.epub")
if (epub.isReflowable()) {
  epub.layout(400, 600, 16) // pageWidth, pageHeight, fontSize
}
```

---

### `Document.prototype.loadPage()` — Load a page for rendering or inspection

Returns a `Page` object (or `PDFPage` for PDF files) that provides all rendering, text extraction, search, and link operations for that page. Pages are zero-indexed.
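Because pages are zero-indexed while people usually speak of "page 1", a small conversion step is often useful before calling `loadPage()`. The sketch below is one possible approach; `parsePageRange` is a hypothetical helper, not part of the mupdf module, and the `"1-3,7"` range syntax is an assumption chosen for illustration.

```javascript
// Hypothetical helper (not part of mupdf): expand a 1-based range string such
// as "1-3,7" into the zero-based indices that loadPage() expects.
function parsePageRange(spec, pageCount) {
  const indices = []
  for (const part of spec.split(",")) {
    // "7" has no dash, so the end of the range defaults to its start
    const [startText, endText = startText] = part.trim().split("-")
    const start = Number.parseInt(startText, 10)
    const end = Number.parseInt(endText, 10)
    const valid =
      Number.isInteger(start) && Number.isInteger(end) &&
      start >= 1 && end >= start && end <= pageCount
    if (!valid) {
      throw new RangeError(`Invalid page range "${part}" for a ${pageCount}-page document`)
    }
    // Shift from 1-based page numbers to 0-based indices
    for (let n = start; n <= end; n++) indices.push(n - 1)
  }
  return indices
}

console.log(parsePageRange("1-3,7", 10)) // [0, 1, 2, 6]
```

Each returned index can be passed straight to `doc.loadPage(index)`; validating against `doc.countPages()` up front keeps out-of-range requests from reaching the library.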
```javascript
import * as mupdf from "mupdf"
import * as fs from "fs"

const doc = mupdf.Document.openDocument("input.pdf")
const page = doc.loadPage(0) // first page

// Bounding box of the page
const bounds = page.getBounds()
console.log("Page size:", bounds) // [x0, y0, x1, y1]

// Render at 150 dpi (72 dpi is 1x scale)
const scale = 150 / 72
const matrix = mupdf.Matrix.scale(scale, scale)
const pixmap = page.toPixmap(matrix, mupdf.ColorSpace.DeviceRGB, false, true)
fs.writeFileSync("page1.png", pixmap.asPNG())

// Text extraction
const stext = page.toStructuredText("preserve-whitespace")
console.log(stext.asText())

// Search
const hits = page.search("important keyword")
hits.forEach(quads => console.log("Hit quads:", quads))

// Links
const links = page.getLinks()
links.forEach(link => {
  if (link.uri.startsWith("#")) {
    console.log("Internal link to page", doc.resolveLink(link))
  } else {
    console.log("External link:", link.uri)
  }
})
```

---

### `Page.prototype.toPixmap()` — Render a page to a raster image

Renders the full page (with or without annotations/widgets) into a `Pixmap` using the given transformation matrix and colorspace. The pixmap can then be exported as PNG, JPEG, PAM, or PSD.
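The transformation matrix is what determines the output resolution: page coordinates are in points (1/72 inch), so a scale of `dpi / 72` yields that many pixels per inch. The sketch below estimates the output size before rendering. `dpiToScale`, `estimateRenderSize`, and the rounding rule are illustrative assumptions rather than mupdf functions, and the dimensions MuPDF actually produces may differ by a pixel.

```javascript
// Hypothetical helpers (not part of mupdf).

// Points are 1/72 inch, so dpi / 72 is the scale factor for Matrix.scale().
function dpiToScale(dpi) {
  return dpi / 72
}

// Estimate the pixel dimensions and raw buffer size of a rendered page.
// bounds is [x0, y0, x1, y1] in points, as returned by page.getBounds().
// componentsPerPixel is 3 for RGB without alpha, 4 for RGB with alpha.
function estimateRenderSize(bounds, dpi, componentsPerPixel = 3) {
  const [x0, y0, x1, y1] = bounds
  const scale = dpiToScale(dpi)
  // Assumption: round up so the whole page fits in the pixel grid
  const width = Math.ceil((x1 - x0) * scale)
  const height = Math.ceil((y1 - y0) * scale)
  return { width, height, bytes: width * height * componentsPerPixel }
}

// A4 page at 144 dpi (exactly 2x scale)
console.log(estimateRenderSize([0, 0, 595, 842], 144))
```

An estimate like this helps pick a resolution that stays within a memory budget, since the uncompressed buffer grows with the square of the dpi.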
```javascript
import * as mupdf from "mupdf"
import * as fs from "fs"

const doc = mupdf.Document.openDocument("slides.pdf")
const pageCount = doc.countPages()

for (let i = 0; i < pageCount; i++) {
  const page = doc.loadPage(i)

  // 300 dpi render, RGB, no alpha, include annotations
  const dpi = 300
  const matrix = mupdf.Matrix.scale(dpi / 72, dpi / 72)
  const pixmap = page.toPixmap(matrix, mupdf.ColorSpace.DeviceRGB, false, true)

  // Save as PNG
  fs.writeFileSync(`page-${i + 1}.png`, pixmap.asPNG())

  // Or as JPEG at quality 85
  // fs.writeFileSync(`page-${i + 1}.jpg`, pixmap.asJPEG(85, false))

  console.log(`Page ${i + 1}: ${pixmap.getWidth()}x${pixmap.getHeight()} px`)
}
```

---

### `Page.prototype.toStructuredText()` — Extract structured text from a page

Extracts all text on a page, organized into blocks, lines, and characters. The returned `StructuredText` provides plain text, HTML, JSON, search, and walker interfaces.

```javascript
import * as mupdf from "mupdf"

const doc = mupdf.Document.openDocument("article.pdf")
const page = doc.loadPage(0)

// Basic plain text extraction
const stext = page.toStructuredText("preserve-whitespace")
const plainText = stext.asText()
console.log(plainText)

// Structured JSON output (requires "preserve-spans" option)
const json = JSON.parse(page.toStructuredText("preserve-spans").asJSON())
json.blocks.forEach(block => {
  if (block.type === "text") {
    block.lines.forEach(line => {
      console.log(`[${line.font.name} ${line.font.size}pt] ${line.text}`)
    })
  }
})

// Full-featured walker
stext.walk({
  beginTextBlock(bbox) { console.log("Block at", bbox) },
  onChar(utf, origin, font, size, quad, argb) { process.stdout.write(utf) },
  endTextBlock() { console.log() },
  onImageBlock(bbox, transform, image) { console.log("Image at", bbox) }
})

// Copy text in a selection rectangle
const selected = stext.copy([50, 100], [400, 200])
console.log("Selected text:", selected)
```

---

### `PDFDocument` constructor and `PDFDocument.prototype.save()` — Create and save PDF files

`new mupdf.PDFDocument()` creates an empty PDF from scratch. A page object is first created with `addPage()` and then placed in the page tree with `insertPage()`. Documents are saved with a rich set of options controlling compression, encryption, garbage collection, and incremental updates.

```javascript
import * as mupdf from "mupdf"

// Create a new PDF with one page
const pdf = new mupdf.PDFDocument()

// Add a font resource
const fontObj = pdf.addSimpleFont(new mupdf.Font("Helvetica"), "Latin")
const fonts = pdf.newDictionary()
fonts.put("F1", fontObj)
const resources = pdf.addObject(pdf.newDictionary())
resources.put("Font", fonts)

// Create a page (A4: 595x842 pts)
const pageObj = pdf.addPage(
  [0, 0, 595, 842],
  0, // rotation
  resources,
  "BT /F1 24 Tf 72 750 Td (Hello from MuPDF!) Tj ET"
)
pdf.insertPage(-1, pageObj) // -1 = append

// Save options: pretty-print, compress images, garbage-collect
pdf.save("output.pdf", "pretty,compress-images,garbage")

// Or save to a buffer for streaming
const buffer = pdf.saveToBuffer("compress,garbage=deduplicate")
// buffer.asUint8Array() or buffer.asString()

// Incremental save (for appending changes to existing file)
// pdf.save("output.pdf", "incremental")
```

---

### `PDFPage.prototype.createAnnotation()` — Add annotations to PDF pages

Creates a new PDF annotation of the specified type on a page. Annotation types include `Text`, `FreeText`, `Line`, `Square`, `Circle`, `Highlight`, `Underline`, `Stamp`, `Ink`, `Redaction`, and more.
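Text markup types such as `Highlight` and `Underline` are positioned with quads (four corner points) rather than a single rectangle. The sketch below converts between the two. It assumes a quad is a flat array of eight numbers ordered upper-left, upper-right, lower-left, lower-right; that ordering is an assumption to verify against the mupdf.js reference, and `rectToQuad` and `quadBounds` are hypothetical helpers, not library functions.

```javascript
// Hypothetical helpers (not part of mupdf).
// Assumed quad layout: [ulx, uly, urx, ury, llx, lly, lrx, lry].

// Build an axis-aligned quad from a rectangle [x0, y0, x1, y1].
function rectToQuad([x0, y0, x1, y1]) {
  return [
    x0, y0, // upper-left
    x1, y0, // upper-right
    x0, y1, // lower-left
    x1, y1, // lower-right
  ]
}

// Smallest rectangle enclosing a quad; works for rotated quads too.
function quadBounds(quad) {
  const xs = [quad[0], quad[2], quad[4], quad[6]]
  const ys = [quad[1], quad[3], quad[5], quad[7]]
  return [Math.min(...xs), Math.min(...ys), Math.max(...xs), Math.max(...ys)]
}

const lineRect = [72, 720, 300, 736]
const quad = rectToQuad(lineRect)
console.log(quad)
console.log(quadBounds(quad)) // back to the original rectangle
```

An array of such quads, one per line of text, is the shape that `setQuadPoints()` takes for a multi-line highlight.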
```javascript
import * as mupdf from "mupdf"

const doc = mupdf.Document.openDocument("input.pdf")
const pdf = doc.asPDF()
const page = pdf.loadPage(0)

// FreeText annotation (text drawn directly on the page, inside a box)
const freeText = page.createAnnotation("FreeText")
freeText.setRect([50, 700, 300, 750])
freeText.setContents("Review this section!")
freeText.setDefaultAppearance("Helv", 12, [1, 0, 0]) // red Helvetica 12pt
freeText.setColor([1, 1, 0]) // yellow border
freeText.update()

// Highlight annotation with quad points
const highlight = page.createAnnotation("Highlight")
highlight.setColor([1, 1, 0]) // yellow
highlight.setOpacity(0.5)
highlight.setQuadPoints([
  // corners ordered upper-left, upper-right, lower-left, lower-right
  [72, 720, 300, 720, 72, 736, 300, 736]
])
highlight.update()

// Ink (freehand) annotation
const ink = page.createAnnotation("Ink")
ink.setColor([0, 0, 1]) // blue
ink.setBorderWidth(2)
ink.setInkList([
  [[100, 500], [150, 480], [200, 500], [250, 480]], // stroke 1
  [[100, 460], [200, 460]]                          // stroke 2
])
ink.update()

// Save
pdf.save("annotated.pdf", "incremental")
```

---

### `PDFPage.prototype.applyRedactions()` — Permanently redact content

Redaction removes content from a PDF permanently. First create `Redaction` annotations to mark areas, then call `applyRedactions()` to destructively apply them.
```javascript
import * as mupdf from "mupdf"

const pdf = mupdf.Document.openDocument("sensitive.pdf").asPDF()
const page = pdf.loadPage(0)

// Mark two areas for redaction
const r1 = page.createAnnotation("Redaction")
r1.setRect([50, 700, 400, 730]) // header area
r1.update()

const r2 = page.createAnnotation("Redaction")
r2.setRect([50, 100, 300, 130]) // footer area
r2.update()

// Apply redactions (irreversible)
page.applyRedactions(
  true,                                            // black boxes at redacted areas
  mupdf.PDFPage.REDACT_IMAGE_PIXELS,               // redact covered image pixels
  mupdf.PDFPage.REDACT_LINE_ART_REMOVE_IF_COVERED, // remove fully covered line art
  mupdf.PDFPage.REDACT_TEXT_REMOVE                 // remove covered text
)

pdf.save("redacted.pdf", "garbage,compress")
```

---

### `PDFDocument.prototype.graftPage()` — Merge pages across documents

Copies a page (and all its resources) from one `PDFDocument` into another. Use a graft map (`newGraftMap`) when copying multiple objects to avoid duplicating shared resources.

```javascript
import * as mupdf from "mupdf"

// Merge multiple PDFs into one
const output = new mupdf.PDFDocument()
const inputs = ["file1.pdf", "file2.pdf", "file3.pdf"]

for (const path of inputs) {
  const src = mupdf.Document.openDocument(path).asPDF()
  const n = src.countPages()
  for (let i = 0; i < n; i++) {
    output.graftPage(-1, src, i) // append each page
  }
}

output.save("merged.pdf", "garbage,compress")

// Low-level merge preserving shared resources with a graft map
function mergeWithGraftMap(dstDoc, srcDoc) {
  const graftMap = dstDoc.newGraftMap()
  const n = srcDoc.countPages()
  for (let k = 0; k < n; k++) {
    const srcPage = srcDoc.findPage(k)
    const dstPage = dstDoc.newDictionary()
    dstPage.put("Type", dstDoc.newName("Page"))
    if (srcPage.get("MediaBox"))
      dstPage.put("MediaBox", graftMap.graftObject(srcPage.get("MediaBox")))
    if (srcPage.get("Resources"))
      dstPage.put("Resources", graftMap.graftObject(srcPage.get("Resources")))
    if (srcPage.get("Contents"))
      dstPage.put("Contents", graftMap.graftObject(srcPage.get("Contents")))
    dstDoc.insertPage(-1, dstDoc.addObject(dstPage))
  }
}
```

---

### `PDFDocument.prototype.rearrangePages()` — Reorder, subset, or duplicate pages

Rearranges the page tree to match the supplied array of page indices. Pages omitted are removed; pages listed multiple times are duplicated. Save with `garbage` to physically remove orphaned objects.

```javascript
import * as mupdf from "mupdf"

const pdf = mupdf.Document.openDocument("book.pdf").asPDF()

// Reverse all pages
const n = pdf.countPages()
const reversed = Array.from({ length: n }, (_, i) => n - 1 - i)
pdf.rearrangePages(reversed)
pdf.save("reversed.pdf", "garbage")

// Keep only indices 0, 2, 4, ... of the current (already reversed) order,
// dropping every odd-indexed page
const evens = Array.from({ length: Math.ceil(n / 2) }, (_, i) => i * 2)
pdf.rearrangePages(evens)
pdf.save("even-pages.pdf", "garbage")
```

---

### `PDFDocument` journaling — Undo/redo support

Enable journaling on a document before making changes to support undo/redo. Each named operation becomes one undo step.

```javascript
import * as mupdf from "mupdf"

const pdf = mupdf.Document.openDocument("form.pdf").asPDF()
pdf.enableJournal()

// Make a change as a named undoable operation
pdf.beginOperation("Add annotation")
try {
  const page = pdf.loadPage(0)
  const annot = page.createAnnotation("Text")
  annot.setRect([100, 100, 150, 150])
  annot.setContents("TODO")
  annot.update()
  pdf.endOperation()
} catch (e) {
  pdf.abandonOperation()
  throw e
}

console.log(pdf.canUndo())    // true
console.log(pdf.getJournal()) // { position: 1, steps: ["Add annotation"] }

pdf.undo()
console.log(pdf.canRedo())    // true
pdf.redo()

pdf.save("edited.pdf", "incremental")
```

---

### `DisplayList` — Cache page rendering for multiple uses

A `DisplayList` records all device calls for a page so they can be replayed multiple times (e.g., rendering at different scales, or searching while also rendering) without re-parsing the file.
```javascript
import * as mupdf from "mupdf"
import * as fs from "fs"

const doc = mupdf.Document.openDocument("large.pdf")
const page = doc.loadPage(5)

// Record page into a display list once
const displayList = page.toDisplayList(true) // true = include annotations

// Render at half scale (36 dpi) for a thumbnail
const small = displayList.toPixmap(
  mupdf.Matrix.scale(0.5, 0.5),
  mupdf.ColorSpace.DeviceRGB,
  false
)
fs.writeFileSync("thumb.png", small.asPNG())

// Render at 300 dpi (high quality)
const big = displayList.toPixmap(
  mupdf.Matrix.scale(300 / 72, 300 / 72),
  mupdf.ColorSpace.DeviceRGB,
  false
)
fs.writeFileSync("hires.png", big.asPNG())

// Search the cached display list
const hits = displayList.search("contract terms")
console.log(`Found ${hits.length} match(es)`)

// Extract text from cached display list
const stext = displayList.toStructuredText("preserve-whitespace")
console.log(stext.asText())
```

---

### `Pixmap` — Raster image manipulation

A `Pixmap` holds a raster image with a specific colorspace and optional alpha channel. It can be created from scratch, obtained from page rendering, or derived from image files. Pixel-level access, color conversion, gamma correction, tinting, warping, deskewing, and barcode encoding/decoding are all supported.
```javascript
import * as mupdf from "mupdf"
import * as fs from "fs"

// Create a blank 500x600 RGB pixmap (white background)
const pixmap = new mupdf.Pixmap(mupdf.ColorSpace.DeviceRGB, [0, 0, 500, 600], false)
pixmap.clear(255)

// Draw on it using a DrawDevice
const device = new mupdf.DrawDevice(mupdf.Matrix.identity, pixmap)
const path = new mupdf.Path()
path.moveTo(50, 50)
path.lineTo(450, 50)
path.lineTo(450, 550)
path.lineTo(50, 550)
path.closePath()
device.fillPath(path, false, mupdf.Matrix.identity, mupdf.ColorSpace.DeviceRGB, [0.8, 0.9, 1.0], 1)
device.close()

// Color convert to CMYK
const cmyk = pixmap.convertToColorSpace(mupdf.ColorSpace.DeviceCMYK, false)

// Image manipulation
pixmap.gamma(1.4)               // lighten
pixmap.invert()                 // invert colors
pixmap.tint(0x000000, 0xffffff) // tint

// Save in various formats
fs.writeFileSync("out.png", pixmap.asPNG())
fs.writeFileSync("out.jpg", pixmap.asJPEG(90, false))

// Deskew a scanned document pixmap
const angle = pixmap.detectSkew()
if (Math.abs(angle) > 0.5) {
  const deskewed = pixmap.deskew(angle, "increase")
  fs.writeFileSync("deskewed.png", deskewed.asPNG())
}

// Encode a QR code
const qr = mupdf.Pixmap.encodeBarcode("qrcode", "https://mupdf.com", 200, 2, true, false)
fs.writeFileSync("qr.png", qr.asPNG())
```

---

### `Matrix` — 2D transformation matrices

Matrices are plain six-element arrays `[a, b, c, d, e, f]` representing 2D affine transforms. Static helpers on `mupdf.Matrix` create common transforms; `concat` combines them.
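To make the six-element layout concrete, the sketch below reimplements scaling, translation, concatenation, and point transformation on plain arrays. These are standalone illustrations, not the library's code, and they assume the PDF row-vector convention in which a point maps to `[x*a + y*c + e, x*b + y*d + f]` and `concat(m, n)` applies `m` first, then `n`.

```javascript
// Standalone illustrations of the [a, b, c, d, e, f] layout (not mupdf code).
const scale = (sx, sy) => [sx, 0, 0, sy, 0, 0]
const translate = (tx, ty) => [1, 0, 0, 1, tx, ty]

// Combine two transforms: the result applies m first, then n.
function concat(m, n) {
  return [
    m[0] * n[0] + m[1] * n[2],
    m[0] * n[1] + m[1] * n[3],
    m[2] * n[0] + m[3] * n[2],
    m[2] * n[1] + m[3] * n[3],
    m[4] * n[0] + m[5] * n[2] + n[4],
    m[4] * n[1] + m[5] * n[3] + n[5],
  ]
}

// Map a point through a matrix (row-vector convention).
function transformPoint([x, y], [a, b, c, d, e, f]) {
  return [x * a + y * c + e, x * b + y * d + f]
}

const scaleThenMove = concat(scale(2, 2), translate(100, 50))
console.log(scaleThenMove)                          // [2, 0, 0, 2, 100, 50]
console.log(transformPoint([10, 10], scaleThenMove)) // [120, 70]

// Order matters: translating first means the offset is scaled as well
const moveThenScale = concat(translate(100, 50), scale(2, 2))
console.log(moveThenScale)                          // [2, 0, 0, 2, 200, 100]
```

The last two results show why the argument order of `concat` deserves attention when a render matrix combines a dpi scale with an offset.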
```javascript
import * as mupdf from "mupdf"

// Identity matrix
const id = mupdf.Matrix.identity // [1, 0, 0, 1, 0, 0]

// Scale: 2× in both axes (e.g., for 144 dpi from 72 dpi base)
const scale2x = mupdf.Matrix.scale(2, 2)

// Translate 100 points right, 50 points down
const translate = mupdf.Matrix.translate(100, 50)

// Rotate 45 degrees clockwise
const rotate45 = mupdf.Matrix.rotate(45)

// Combine: first scale, then translate
const combined = mupdf.Matrix.concat(scale2x, translate)

// Invert a matrix
const inv = mupdf.Matrix.invert(combined)

// Typical use: render a page at 150 dpi
const doc = mupdf.Document.openDocument("doc.pdf")
const page = doc.loadPage(0)
const dpiMatrix = mupdf.Matrix.scale(150 / 72, 150 / 72)
const pixmap = page.toPixmap(dpiMatrix, mupdf.ColorSpace.DeviceRGB, false)
```

---

### `Story` and `DocumentWriter` — Flow HTML text into PDF pages

`Story` takes an HTML string (or a programmatic DOM tree) and flows it into rectangular areas across multiple pages using `DocumentWriter`. This is the high-level API for generating PDFs from formatted content.

```javascript
import * as mupdf from "mupdf"

const mediabox = [0, 0, 595, 842] // A4
const margin = 40

const html = `

<h1>Annual Report 2024</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
<p>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris.</p>

`

const writer = new mupdf.DocumentWriter("report.pdf", "PDF", "")
const buf = new mupdf.Buffer()
buf.write(html)
const story = new mupdf.Story(buf, "", 12) // HTML, user-CSS, default font size

let placed
do {
  const where = [
    mediabox[0] + margin,
    mediabox[1] + margin,
    mediabox[2] - margin,
    mediabox[3] - margin
  ]
  const dev = writer.beginPage(mediabox)
  placed = story.place(where)
  story.draw(dev, mupdf.Matrix.identity)
  writer.endPage()
} while (placed.more)

writer.close()
```

---

### `mupdf.setLog()` — Global logging and cache management

Configure a custom logger for MuPDF warnings and errors, and manage the resource store cache size.

```javascript
import * as mupdf from "mupdf"

// Custom logger
mupdf.setLog({
  error(msg) { console.error("[MuPDF ERROR]", msg) },
  warning(msg) { console.warn("[MuPDF WARN]", msg) },
})

// Enable ICC color management
mupdf.enableICC()

// Set a CSS stylesheet for all reflowable documents (EPUB, HTML)
mupdf.setUserCSS("body { font-size: 18pt; font-family: Georgia; }", true)

// Manage resource store
mupdf.shrinkStore(75) // shrink to 75% of current size
mupdf.emptyStore()    // free all cached resources

// Install a system font loader
mupdf.installLoadFontFunction((fontName, scriptName, isBold, isItalic) => {
  // Return a Font object, or null to continue with fallbacks
  return null
})
```

---

## C API

### Basic C rendering example — `fz_new_context`, `fz_open_document`, `fz_new_pixmap_from_page_number`

The C API is the foundation of all language bindings. A context (`fz_context`) carries global state; documents and pages are opened with `fz_open_document` / `fz_load_page`; pixmaps are rendered with `fz_new_pixmap_from_page_number`. All objects are reference-counted and must be freed with matching `fz_drop_*` calls. Exceptions are handled with `fz_try` / `fz_catch` macros.
```c
#include <mupdf/fitz.h>

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    fz_context *ctx;
    fz_document *doc = NULL; /* NULL so fz_always can drop safely on early failure */
    fz_pixmap *pix = NULL;
    fz_matrix ctm;
    int page_number = 0;
    float dpi = 150.0f;      /* target resolution; 72 dpi is 1x scale */
    float rotate = 0.0f;

    /* Create context (NULL alloc = default, NULL locks = single-thread) */
    ctx = fz_new_context(NULL, NULL, FZ_STORE_UNLIMITED);
    if (!ctx)
    {
        fprintf(stderr, "cannot create context\n");
        return EXIT_FAILURE;
    }

    /* Variables assigned inside fz_try and read in fz_always must be marked */
    fz_var(doc);
    fz_var(pix);

    fz_try(ctx)
    {
        fz_register_document_handlers(ctx);
        doc = fz_open_document(ctx, "input.pdf");

        /* Scale matrix: dpi / 72 converts a target resolution to a scale factor */
        ctm = fz_scale(dpi / 72.0f, dpi / 72.0f);
        ctm = fz_pre_rotate(ctm, rotate);

        /* Render page to an RGB pixmap (0 = no alpha) */
        pix = fz_new_pixmap_from_page_number(
            ctx, doc, page_number, ctm, fz_device_rgb(ctx), 0);

        /* Save as PNG */
        fz_save_pixmap_as_png(ctx, pix, "output.png");
        printf("Rendered: %d x %d pixels\n", pix->w, pix->h);
    }
    fz_always(ctx)
    {
        fz_drop_pixmap(ctx, pix);
        fz_drop_document(ctx, doc);
    }
    fz_catch(ctx)
    {
        fz_report_error(ctx);
        fz_drop_context(ctx);
        return EXIT_FAILURE;
    }

    fz_drop_context(ctx);
    return EXIT_SUCCESS;
}
```

---

### Multi-threaded rendering — `fz_clone_context`

For concurrent rendering, clone the base context for each thread so they share the store and glyph cache but have independent exception stacks. Documents should only be accessed from one thread at a time; render display lists (which are thread-safe) from multiple threads.
```c
#include <mupdf/fitz.h>

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Shared across all threads */
static fz_context *base_ctx;
static fz_document *doc;
static pthread_mutex_t mutexes[FZ_LOCK_MAX];
/* Serializes access to the shared document. Kept separate from the locks
   MuPDF takes internally, so holding it cannot deadlock against lock_fn(). */
static pthread_mutex_t doc_mutex = PTHREAD_MUTEX_INITIALIZER;

static void lock_fn(void *user, int lock)   { pthread_mutex_lock(&mutexes[lock]); }
static void unlock_fn(void *user, int lock) { pthread_mutex_unlock(&mutexes[lock]); }

typedef struct { int page; } ThreadArgs;

static void *render_page(void *arg)
{
    ThreadArgs *a = arg;
    /* Each thread gets its own cloned context */
    fz_context *ctx = fz_clone_context(base_ctx);
    fz_page *page = NULL;
    fz_display_list *list = NULL;
    fz_pixmap *pix = NULL;

    if (!ctx)
        return NULL;

    fz_var(page);
    fz_var(list);
    fz_var(pix);

    /* Build the display list from the shared document (serialized access) */
    pthread_mutex_lock(&doc_mutex);
    fz_try(ctx)
    {
        page = fz_load_page(ctx, doc, a->page);
        list = fz_new_display_list_from_page(ctx, page);
    }
    fz_always(ctx)
    {
        fz_drop_page(ctx, page);
    }
    fz_catch(ctx)
    {
        fz_report_error(ctx);
    }
    pthread_mutex_unlock(&doc_mutex);

    /* Render the display list; no document access is needed here */
    fz_try(ctx)
    {
        if (list)
        {
            char name[64];
            fz_matrix ctm = fz_scale(2.0f, 2.0f);
            pix = fz_new_pixmap_from_display_list(
                ctx, list, ctm, fz_device_rgb(ctx), 0);
            snprintf(name, sizeof(name), "page%03d.png", a->page + 1);
            fz_save_pixmap_as_png(ctx, pix, name);
        }
    }
    fz_always(ctx)
    {
        fz_drop_pixmap(ctx, pix);
        fz_drop_display_list(ctx, list);
    }
    fz_catch(ctx)
    {
        fz_report_error(ctx);
    }

    fz_drop_context(ctx);
    return NULL;
}

int main(void)
{
    fz_locks_context locks = { NULL, lock_fn, unlock_fn };
    int n = 0;

    for (int i = 0; i < FZ_LOCK_MAX; i++)
        pthread_mutex_init(&mutexes[i], NULL);

    base_ctx = fz_new_context(NULL, &locks, FZ_STORE_UNLIMITED);
    if (!base_ctx)
        return EXIT_FAILURE;

    fz_try(base_ctx)
    {
        fz_register_document_handlers(base_ctx);
        doc = fz_open_document(base_ctx, "document.pdf");
        n = fz_count_pages(base_ctx, doc);
    }
    fz_catch(base_ctx)
    {
        fz_report_error(base_ctx);
        fz_drop_document(base_ctx, doc);
        fz_drop_context(base_ctx);
        return EXIT_FAILURE;
    }

    pthread_t *threads = calloc(n, sizeof(pthread_t));
    ThreadArgs *args = calloc(n, sizeof(ThreadArgs));

    /* One thread per page keeps the example short */
    for (int i = 0; i < n; i++)
    {
        args[i].page = i;
        pthread_create(&threads[i], NULL, render_page, &args[i]);
    }
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);

    fz_drop_document(base_ctx, doc);
    fz_drop_context(base_ctx);
    free(threads);
    free(args);
    return EXIT_SUCCESS;
}
```

---

## mutool Command-Line Tools

### `mutool draw` — Render documents to image files

Renders pages of any supported document to raster images (PNG, JPEG, PPM, etc.) or vector formats (SVG, PDF). DPI, color space, rotation, and page ranges are configurable.

```bash
# Render all pages to PNG at 150 dpi
mutool draw -r 150 -o page%03d.png document.pdf

# Render pages 1-5 to JPEG at 200 dpi
mutool draw -r 200 -F jpeg -o page%d.jpg document.pdf 1-5

# Render to PNG with specific colorspace (gray)
mutool draw -r 96 -c gray -o thumb.png document.pdf 1

# Convert to PDF (useful for XPS, EPUB → PDF)
mutool draw -F pdf -o output.pdf input.epub
```

---

### `mutool convert` — Convert document formats

Converts documents between formats with simpler syntax than `mutool draw`.

```bash
# Convert PDF to text
mutool convert -o output.txt input.pdf

# Convert PDF to HTML
mutool convert -o output.html input.pdf

# Convert EPUB to PDF
mutool convert -o output.pdf input.epub

# Extract structured text as XML
mutool convert -F stext input.pdf
```

---

### `mutool clean` — Rewrite and repair PDF files

Rewrites a PDF file, optionally decompressing, compressing, sanitizing content streams, and garbage-collecting unused objects.

```bash
# Decompress all streams (makes file human-readable)
mutool clean -d input.pdf readable.pdf

# Compress everything for smallest file size
mutool clean -gggg -z input.pdf compressed.pdf

# Sanitize content streams
mutool clean -s input.pdf sanitized.pdf

# Rewrite a damaged PDF (repaired on load) and garbage collect
mutool clean -g input.pdf fixed.pdf
```

---

### `mutool run` — Run JavaScript scripts with MuPDF bindings

Executes ES5 JavaScript scripts that have full access to the MuPDF JavaScript API.
```bash
# Run a script with arguments
mutool run script.js input.pdf output.pdf

# Inline one-liner: count pages
mutool run -c "var d = Document.openDocument('doc.pdf'); print(d.countPages())"

# Run the PDF merge example
mutool run docs/examples/pdf-merge.js merged.pdf a.pdf b.pdf c.pdf
```

---

## Summary

MuPDF is a foundation for high-performance document processing pipelines. Its primary use cases are: (1) **server-side rendering and conversion**: generating page thumbnails, full-page PNGs/JPEGs, or text extractions at scale in Node.js or C services; (2) **PDF creation and editing**: building new PDFs from HTML/CSS content via `Story`, merging documents with `graftPage`, adding or modifying annotations and form fields, applying redactions, and managing embedded files and digital signatures; (3) **content extraction**: extracting plain text, structured JSON (blocks/lines/spans with font metadata), HTML, or walking the structured text tree for custom parsers and search indexers; and (4) **viewer applications**: the Android, desktop (GL/X11), and WebAssembly viewer builds all use the same underlying C library, with `DisplayList` caching enabling smooth multi-resolution rendering.

Integration follows one of two patterns depending on the language. In **C**, applications create one base `fz_context` at startup, clone it per thread, open documents as `fz_document` objects, render to `fz_pixmap` via `fz_new_pixmap_from_page_number` or through a `fz_display_list` for thread-safe concurrent rendering, and free everything with `fz_drop_*`. In **JavaScript** (Node.js, browser, or `mutool run`), the API is fully object-oriented: `Document.openDocument()` → `loadPage()` → `toPixmap()` / `toStructuredText()` / `search()`, with `PDFDocument` providing low-level PDF object access, `PDFPage` for annotations and widgets, and `DocumentWriter` + `Story` for generating new paginated PDFs from HTML. The same API also runs inside a browser via the WebAssembly build (`npm install mupdf`).