### Install Hatch and Run Tests

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Installs the `hatch` build tool, opens a Hatch-managed shell, and runs the project's tests. Run these commands from the `packages/markitdown` directory. The inline comment links to other ways of installing Hatch.

```bash
pip install hatch  # Other ways of installing hatch: https://hatch.pypa.io/dev/install/
hatch shell
hatch test
```

--------------------------------

### Install MarkItDown from Source

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Clone the repository and install MarkItDown from local source in editable mode, with all optional dependencies.

```bash
git clone git@github.com:microsoft/markitdown.git
cd markitdown
pip install -e 'packages/markitdown[all]'
```

--------------------------------

### Install MarkItDown from Source

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown/README.md

Clone the repository and install MarkItDown from local source with all extras.

```bash
git clone git@github.com:microsoft/markitdown.git
cd markitdown
pip install -e 'packages/markitdown[all]'
```

--------------------------------

### Install MarkItDown from PyPI

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown/README.md

Install the MarkItDown package with all extras using pip.

```bash
pip install 'markitdown[all]'
```

--------------------------------

### Install Specific Optional Dependencies

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Install MarkItDown with specific optional dependencies for PDF, DOCX, and PPTX files.

```bash
pip install 'markitdown[pdf, docx, pptx]'
```

--------------------------------

### Install and Use MarkItDown Plugin (Bash)

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-sample-plugin/README.md

Install a MarkItDown plugin in editable mode from its local source directory, list the installed plugins to confirm it was registered, and convert a file with plugins enabled.

```bash
pip install -e .
markitdown --list-plugins
markitdown --use-plugins path-to-file.rtf
```

--------------------------------

### List Installed Plugins

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Use the markitdown command to list currently installed plugins.

```bash
markitdown --list-plugins
```

--------------------------------

### Install markitdown-ocr and OpenAI client

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Install the markitdown-ocr plugin and an OpenAI-compatible client to enable OCR capabilities. This is required before using the OCR plugin.

```bash
pip install markitdown-ocr
pip install openai  # or any OpenAI-compatible client
```

--------------------------------

### Install MarkItDown OCR Plugin

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-ocr/README.md

Installs the MarkItDown OCR plugin from PyPI using pip.

```bash
pip install markitdown-ocr
```

--------------------------------

### Enable MarkItDown Plugin in Python

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-sample-plugin/README.md

This Python code snippet illustrates how to enable installed plugins when creating a MarkItDown instance and perform a file conversion. It shows the instantiation of MarkItDown with `enable_plugins=True` and how to access the converted text content from the result.

```python
from markitdown import MarkItDown

md = MarkItDown(enable_plugins=True)
result = md.convert("path-to-file.rtf")
print(result.text_content)
```

--------------------------------

### Install MarkItDown-MCP Package

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-mcp/README.md

Install the markitdown-mcp package from PyPI using pip.

```bash
pip install markitdown-mcp
```

--------------------------------

### Run MarkItDown-MCP Server (STDIO)

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-mcp/README.md

Run the MCP server with the default STDIO transport, which communicates over standard input and output and suits clients that launch the server as a local subprocess.

```bash
markitdown-mcp
```

--------------------------------

### Install OpenAI Client for MarkItDown

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-ocr/README.md

Installs the OpenAI client library, which the MarkItDown OCR plugin uses to call OpenAI models for text extraction.

```bash
pip install openai
```

--------------------------------

### Run MarkItDown-MCP Docker Container

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-mcp/README.md

Run the markitdown-mcp Docker container. This command starts the server within a containerized environment.

```bash
docker run -it --rm markitdown-mcp:latest
```

--------------------------------

### Python API with Azure OpenAI Client

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-ocr/README.md

Demonstrates using an Azure OpenAI client with the MarkItDown OCR plugin. This example shows how to configure the plugin to work with Azure's OpenAI service, including API key and endpoint configuration.

```python
from markitdown import MarkItDown
from openai import AzureOpenAI

md = MarkItDown(
    enable_plugins=True,
    llm_client=AzureOpenAI(
        api_key="...",
        azure_endpoint="https://your-resource.openai.azure.com/",
        api_version="2024-02-01",
    ),
    llm_model="gpt-4o",
)
```

--------------------------------

### Create and Activate Virtual Environment (Standard Python)

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Use these commands to create and activate a Python virtual environment for MarkItDown.

```bash
python -m venv .venv
source .venv/bin/activate
```
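If you prefer to script this step, Python's standard-library `venv` module performs the same operation as `python -m venv .venv`. The sketch below is a minimal illustration; `make_venv` is a hypothetical helper name, and activation still has to happen in your shell as shown above.

```python
from __future__ import annotations

import tempfile
import venv
from pathlib import Path


def make_venv(target: str | Path, with_pip: bool = True) -> Path:
    """Create a virtual environment at `target`, like `python -m venv <target>`."""
    target = Path(target)
    # EnvBuilder exposes the same options as the `python -m venv` command line
    venv.EnvBuilder(with_pip=with_pip).create(target)
    return target


# Demo: build a throwaway environment without pip (fast) and confirm its marker file
with tempfile.TemporaryDirectory() as tmp:
    demo_env = make_venv(Path(tmp) / "demo-env", with_pip=False)
    print((demo_env / "pyvenv.cfg").exists())
```

For the project itself you would call `make_venv(".venv")` and then run `source .venv/bin/activate`.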

--------------------------------

### Configure MarkItDown Plugin Entrypoint (TOML)

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-sample-plugin/README.md

This TOML configuration snippet defines the entry point for a MarkItDown plugin within the `pyproject.toml` file. It maps a chosen plugin name (`sample_plugin`) to the fully qualified package name that implements the plugin.

```toml
[project.entry-points."markitdown.plugin"]
sample_plugin = "markitdown_sample_plugin"
```

--------------------------------

### Enable Plugins for Conversion

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Use the markitdown command with the --use-plugins flag to enable plugins during file conversion.

```bash
markitdown --use-plugins path-to-file.pdf
```

--------------------------------

### Create and Activate Virtual Environment (uv)

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Use these commands with uv to create and activate a Python virtual environment for MarkItDown.

```bash
uv venv --python=3.12 .venv
source .venv/bin/activate
# NOTE: Be sure to use 'uv pip install' rather than just 'pip install' to install packages in this virtual environment
```

--------------------------------

### Build MarkItDown-MCP Docker Image

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-mcp/README.md

Build the Docker image for markitdown-mcp using the provided Dockerfile. This prepares the container for deployment.

```bash
docker build -t markitdown-mcp:latest .
```

--------------------------------

### Configure Claude Desktop MCP Server (with Volume Mount)

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-mcp/README.md

Configure Claude Desktop to use the markitdown MCP server with a Docker volume mount. This allows the server to access local files specified in the volume mapping.

```json
{
  "mcpServers": {
    "markitdown": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "-v",
        "/home/user/data:/workdir",
        "markitdown-mcp:latest"
      ]
    }
  }
}
```

--------------------------------

### Configure Claude Desktop MCP Server (Basic)

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-mcp/README.md

Configure Claude Desktop to use the markitdown MCP server via Docker. This JSON entry specifies the command and arguments for running the server.

```json
{
  "mcpServers": {
    "markitdown": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "markitdown-mcp:latest"
      ]
    }
  }
}
```
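The basic and volume-mount Claude Desktop entries differ only in the optional `-v` argument pair. The sketch below generates either variant with the standard-library `json` module; `build_claude_mcp_config` is a hypothetical helper, and you still merge its output into the Claude Desktop configuration file yourself.

```python
from __future__ import annotations

import json


def build_claude_mcp_config(image: str = "markitdown-mcp:latest",
                            host_dir: str | None = None,
                            container_dir: str = "/workdir") -> dict:
    """Build the "mcpServers" entry that runs markitdown-mcp through Docker."""
    # -i keeps stdin open, which the STDIO transport needs; --rm removes the container on exit
    args = ["run", "--rm", "-i"]
    if host_dir is not None:
        # Optional bind mount so the server can read local files
        args += ["-v", f"{host_dir}:{container_dir}"]
    args.append(image)
    return {"mcpServers": {"markitdown": {"command": "docker", "args": args}}}


print(json.dumps(build_claude_mcp_config(host_dir="/home/user/data"), indent=2))
```

Calling it without `host_dir` reproduces the basic configuration.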

--------------------------------

### Run MarkItDown-MCP Server (HTTP/SSE)

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-mcp/README.md

Run the MCP server with the Streamable HTTP and SSE transports. `--host` and `--port` set the listening address; binding to `127.0.0.1` keeps the server reachable only from the local machine.

```bash
markitdown-mcp --http --host 127.0.0.1 --port 3001
```
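When a script launches the HTTP server and then connects to it, it helps to wait until the port is actually listening. A standard-library sketch follows; `wait_for_port` is a hypothetical helper, and a successful TCP connect only shows that something is listening on the port, not that the MCP endpoint is ready to serve requests.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 3001, timeout: float = 10.0) -> bool:
    """Poll until something accepts TCP connections on host:port, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Connecting and immediately closing is enough to detect a listener
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet
            time.sleep(0.2)
    return False
```

For example, start the server with `subprocess.Popen(["markitdown-mcp", "--http", "--host", "127.0.0.1", "--port", "3001"])` and call `wait_for_port()` before pointing a client at it.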

--------------------------------

### Initialize MarkItDown with the OCR Plugin

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Initialize the MarkItDown converter with plugins enabled, providing an LLM client and model that the OCR plugin uses to extract text from images. If no client is provided, OCR is skipped.

```python
from markitdown import MarkItDown
from openai import OpenAI

md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),
    llm_model="gpt-4o",
)
result = md.convert("document_with_images.pdf")
print(result.text_content)
```

--------------------------------

### Convert a File with Azure Content Understanding (Command-Line)

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Use the MarkItDown CLI to convert a file with Azure Content Understanding, passing your Content Understanding endpoint via `--cu-endpoint`.

```bash
markitdown path-to-file.pdf --use-cu --cu-endpoint "<content_understanding_endpoint>"
```

--------------------------------

### Run MarkItDown Docker Container for Conversion

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Runs the MarkItDown Docker container to convert a local PDF file to Markdown. The `-i` flag keeps stdin open so the file can be redirected into the container, and the Markdown written to stdout is redirected to `output.md`.

```bash
docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md
```

--------------------------------

### Create and Activate Virtual Environment (Anaconda)

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Use these commands with Anaconda to create and activate a Python virtual environment for MarkItDown.

```bash
conda create -n markitdown python=3.12
conda activate markitdown
```

--------------------------------

### Convert File to Markdown with Output File (Command-Line)

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Use the markitdown command-line tool with the -o flag to specify the output file.

```bash
markitdown path-to-file.pdf -o document.md
```

--------------------------------

### Image Description Generation with LLM in Python

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Uses a large language model to generate descriptions for images (currently supported for image files and PPTX) through the MarkItDown Python API. Requires an LLM client and model; `llm_prompt` optionally supplies a custom prompt.

```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o", llm_prompt="optional custom prompt")
result = md.convert("example.jpg")
print(result.text_content)
```

--------------------------------

### Build MarkItDown Docker Image

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Builds a Docker image for MarkItDown with the tag 'markitdown:latest'. This command should be run from the project's root directory.

```bash
docker build -t markitdown:latest .
```

--------------------------------

### Run Docker Container with Volume Mount

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-mcp/README.md

Run the markitdown-mcp Docker container and mount a local directory into the container. This allows access to local files.

```bash
docker run -it --rm -v /home/user/data:/workdir markitdown-mcp:latest
```

--------------------------------

### Initialize MarkItDown with a Custom Azure Content Understanding Analyzer

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Initialize MarkItDown with a custom Azure Content Understanding analyzer ID for domain-specific field extraction. The output begins with YAML front matter containing the extracted fields.

```python
from markitdown import MarkItDown

md = MarkItDown(
    cu_endpoint="<content_understanding_endpoint>",
    cu_analyzer_id="my-invoice-analyzer",
)
result = md.convert("invoice.pdf")
print(result.markdown)
# Output includes YAML front matter with extracted fields:
# ---
# contentType: document
# fields:
#   VendorName: CONTOSO LTD.
#   InvoiceDate: '2019-11-15'
# ---
# <!-- page 1 -->
# ...
```
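Per the sample output above, the extracted fields arrive as a YAML front matter block at the top of `result.markdown`. A minimal standard-library sketch for separating that header from the Markdown body follows; `split_front_matter` is a hypothetical helper, and parsing the YAML itself (for example with PyYAML) is left out to keep it dependency-free.

```python
from __future__ import annotations


def split_front_matter(markdown: str) -> tuple[str | None, str]:
    """Split a leading '---' ... '---' block from the rest of the text.

    Returns (front_matter, body); front_matter is None when there is no header.
    """
    lines = markdown.splitlines(keepends=True)
    # Front matter must open on the very first line
    if not lines or lines[0].strip() != "---":
        return None, markdown
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    # Opening delimiter without a closing one: treat everything as body
    return None, markdown


sample = """---
contentType: document
fields:
  VendorName: CONTOSO LTD.
  InvoiceDate: '2019-11-15'
---
<!-- page 1 -->
...
"""
front, body = split_front_matter(sample)
print(front)
```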

--------------------------------

### Initialize MarkItDown with Azure Content Understanding

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Initialize MarkItDown with your Azure Content Understanding endpoint for higher-quality conversion and structured field extraction. The analyzer is selected automatically from the file type, as the inline comments show.

```python
from markitdown import MarkItDown

# Zero-config — auto-selects analyzer per file type
md = MarkItDown(cu_endpoint="<content_understanding_endpoint>")
result = md.convert("report.pdf")   # documents → prebuilt-documentSearch
result = md.convert("meeting.mp4")  # video → prebuilt-videoSearch
result = md.convert("call.wav")     # audio → prebuilt-audioSearch
print(result.markdown)
```
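The inline comments describe an analyzer chosen per file type. The table-driven sketch below only illustrates that idea: the analyzer IDs come from the example above, but the extension lists are assumptions for illustration, not MarkItDown's actual routing table, and `pick_analyzer` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical routing table; extensions are illustrative assumptions
ANALYZER_BY_SUFFIX = {
    ".pdf": "prebuilt-documentSearch",
    ".docx": "prebuilt-documentSearch",
    ".mp4": "prebuilt-videoSearch",
    ".wav": "prebuilt-audioSearch",
    ".mp3": "prebuilt-audioSearch",
}


def pick_analyzer(path: str) -> str | None:
    """Return the analyzer ID for `path` based on its extension, or None."""
    # Case-insensitive match on the final suffix only (e.g. "call.WAV" -> ".wav")
    return ANALYZER_BY_SUFFIX.get(Path(path).suffix.lower())


for name in ("report.pdf", "meeting.mp4", "call.wav", "notes.txt"):
    print(name, "->", pick_analyzer(name))
```

Returning `None` models the case where no analyzer applies and a default converter would handle the file instead.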

--------------------------------

### Command Line Usage of MarkItDown OCR

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-ocr/README.md

Runs the MarkItDown OCR plugin from the command line: `--use-plugins` enables installed plugins, and `--llm-client` and `--llm-model` select the LLM client and model used for OCR.

```bash
markitdown document.pdf --use-plugins --llm-client openai --llm-model gpt-4o
```

--------------------------------

### Basic Markdown Conversion with Python API

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Demonstrates basic usage of the MarkItDown Python API to convert an Excel file to Markdown. Set 'enable_plugins' to True to activate plugins.

```python
from markitdown import MarkItDown

md = MarkItDown(enable_plugins=False) # Set to True to enable plugins
result = md.convert("test.xlsx")
print(result.text_content)
```

--------------------------------

### Launch MCP Inspector

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-mcp/README.md

Launch the MCP Inspector tool using npx. This tool is used for debugging the MCP server.

```bash
npx @modelcontextprotocol/inspector
```

--------------------------------

### Navigate to MarkItDown Package Directory

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Changes the current directory to the 'markitdown' package within the project structure. This is a prerequisite for running local tests.

```bash
cd packages/markitdown
```

--------------------------------

### Convert File to Markdown via Python API

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown/README.md

Instantiate the MarkItDown class and use its convert method to process a file, then print the text content.

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("test.xlsx")
print(result.text_content)
```
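`convert()` processes one file per call, so converting a folder is a loop over paths. A minimal sketch, assuming the conversion step is passed in as a callable; `convert_tree` is a hypothetical helper that writes each result next to its source with a `.md` suffix.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable, Iterable


def convert_tree(root: str | Path,
                 convert: Callable[[Path], str],
                 suffixes: Iterable[str] = (".xlsx", ".pdf", ".docx")) -> list[Path]:
    """Convert every matching file under `root`, writing <stem>.md beside it."""
    wanted = {s.lower() for s in suffixes}
    written: list[Path] = []
    # sorted() materializes the listing first, so newly written .md files are never revisited
    for src in sorted(Path(root).rglob("*")):
        if src.is_file() and src.suffix.lower() in wanted:
            out = src.with_suffix(".md")  # report.pdf and report.docx would collide
            out.write_text(convert(src), encoding="utf-8")
            written.append(out)
    return written


# Demo with a stand-in converter so this runs without MarkItDown installed
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "test.xlsx").write_bytes(b"placeholder")
    print(convert_tree(tmp, lambda p: f"# {p.name}\n"))
```

With the real library, pass something like `lambda p: md.convert(str(p)).text_content` after creating `md = MarkItDown()`.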

--------------------------------

### Pipe Content to MarkItDown (Command-Line)

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Pipe the content of a file to the markitdown command-line tool.

```bash
cat path-to-file.pdf | markitdown
```
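The same stdin pipe can be driven from Python with the standard-library `subprocess` module, which is handy when a script already holds the file bytes in memory. A sketch, assuming the `markitdown` executable is on `PATH` and writes UTF-8 to stdout; `pipe_to_cli` is a hypothetical helper, and the command is a parameter so it can be swapped.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Sequence


def pipe_to_cli(data: bytes, cmd: Sequence[str] = ("markitdown",)) -> str:
    """Feed `data` to `cmd` on stdin and return its stdout, like `cat file | cmd`."""
    completed = subprocess.run(
        list(cmd),
        input=data,           # bytes written to the child's stdin
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    # Assumes the tool writes UTF-8 text to stdout
    return completed.stdout.decode("utf-8")


# Demo with a stand-in command (uppercases stdin) so this runs without markitdown installed
upper = (sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())")
print(pipe_to_cli(b"hello", cmd=upper))
```

For a real conversion, call `pipe_to_cli(Path("path-to-file.pdf").read_bytes())`; passing `cmd=("docker", "run", "--rm", "-i", "markitdown:latest")` uses the Docker image instead.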

--------------------------------

### Convert File to Markdown via CLI

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown/README.md

Use the MarkItDown command-line utility to convert a file to Markdown and redirect output.

```bash
markitdown path-to-file.pdf > document.md
```

--------------------------------

### Register MarkItDown Plugin Entrypoint (Python)

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-sample-plugin/README.md

This Python code snippet shows how to define the necessary package exports for a MarkItDown plugin. It sets the plugin interface version and provides the `register_converters` function, which is called by MarkItDown to register custom converters.

```python
from markitdown import MarkItDown

# The version of the plugin interface that this plugin uses.
# The only supported version is 1 for now.
__plugin_interface_version__ = 1


# The main entrypoint for the plugin. This is called each time MarkItDown instances are created.
def register_converters(markitdown: MarkItDown, **kwargs):
    """
    Called during construction of MarkItDown instances to register converters provided by plugins.
    """

    # Simply create and attach an RtfConverter instance
    # (RtfConverter is the plugin's DocumentConverter subclass, defined alongside this function)
    markitdown.register_converter(RtfConverter())
```

--------------------------------

### Implement Custom DocumentConverter for MarkItDown Plugin (Python)

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-sample-plugin/README.md

This Python code defines a custom DocumentConverter for MarkItDown. It includes the `accepts` method to check file compatibility and the `convert` method for file transformation. This serves as the core logic for handling specific file formats within the plugin.

```python
from typing import BinaryIO, Any
from markitdown import MarkItDown, DocumentConverter, DocumentConverterResult, StreamInfo

class RtfConverter(DocumentConverter):

    def __init__(
        self, priority: float = DocumentConverter.PRIORITY_SPECIFIC_FILE_FORMAT
    ):
        super().__init__(priority=priority)

    def accepts(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> bool:
        # Implement logic to check if the file stream is an RTF file
        # ...
        raise NotImplementedError()

    def convert(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> DocumentConverterResult:
        # Implement logic to convert the file stream to Markdown
        # ...
        raise NotImplementedError()
```
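The `accepts` stub above is left for the plugin author to fill in. One possible approach, sketched with the standard library only: trust the extension or MIME type when present, otherwise peek at the leading bytes for the `{\rtf` signature that RTF files begin with, and restore the stream position afterwards so a later `convert` call (or another converter) reads from the same place. `looks_like_rtf` is a hypothetical helper; inside `accepts` you would pass it values from `stream_info`, whose `extension` and `mimetype` field names are assumed here.

```python
from __future__ import annotations

import io
from typing import BinaryIO

RTF_EXTENSIONS = {".rtf"}
RTF_MIME_PREFIXES = ("text/rtf", "application/rtf")
RTF_SIGNATURE = b"{\\rtf"  # RTF documents begin with "{\rtf", usually "{\rtf1"


def looks_like_rtf(file_stream: BinaryIO,
                   extension: str | None = None,
                   mimetype: str | None = None) -> bool:
    """Guess whether `file_stream` holds RTF, leaving its position unchanged."""
    # Cheap metadata checks first
    if (extension or "").lower() in RTF_EXTENSIONS:
        return True
    if (mimetype or "").lower().startswith(RTF_MIME_PREFIXES):
        return True
    # Otherwise sniff the leading bytes, then rewind to where we started
    start = file_stream.tell()
    try:
        return file_stream.read(len(RTF_SIGNATURE)) == RTF_SIGNATURE
    finally:
        file_stream.seek(start)


print(looks_like_rtf(io.BytesIO(b"{\\rtf1\\ansi Hello}")))
```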

--------------------------------

### Python API Usage with MarkItDown OCR

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-ocr/README.md

Shows how to integrate the MarkItDown OCR plugin into a Python script. It initializes MarkItDown with plugin support and specifies the LLM client and model for OCR processing.

```python
from markitdown import MarkItDown
from openai import OpenAI

md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),
    llm_model="gpt-4o",
)

result = md.convert("document_with_images.pdf")
print(result.text_content)
```

--------------------------------

### Run Pre-commit Checks

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Executes all pre-commit checks across all files in the repository. This command should be run before submitting a Pull Request to ensure code quality and consistency.

```bash
pre-commit run --all-files
```

--------------------------------

### Troubleshooting Missing OCR Text

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-ocr/README.md

If OCR text is missing from the output, check that both `llm_client` and `llm_model` are set when creating the `MarkItDown` instance; without an LLM client, the OCR plugin skips text extraction.

```python
from openai import OpenAI
from markitdown import MarkItDown

md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),   # required
    llm_model="gpt-4o",    # required
)
```

--------------------------------

### Restrict Azure Content Understanding to specific file types

Source: https://github.com/microsoft/markitdown/blob/main/README.md

Configure MarkItDown to use Azure Content Understanding only for the specified file types, such as PDFs. Other file types fall back to the default converters.

```python
from markitdown import MarkItDown
from markitdown.converters import ContentUnderstandingFileType

md = MarkItDown(
    cu_endpoint="<content_understanding_endpoint>",
    cu_file_types=[ContentUnderstandingFileType.PDF],  # only PDFs use CU
)
```

--------------------------------

### Python API with Custom OCR Prompt

Source: https://github.com/microsoft/markitdown/blob/main/packages/markitdown-ocr/README.md

Illustrates how to use a custom prompt with the MarkItDown OCR plugin via its Python API. This allows for specialized text extraction instructions tailored to specific document types.

```python
from markitdown import MarkItDown
from openai import OpenAI

md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),
    llm_model="gpt-4o",
    llm_prompt="Extract all text from this image, preserving table structure.",
)
```