### MarkItDown CLI Usage Examples

Source: https://context7.com/managedcode/markitdown/llms.txt

Examples of running the MarkItDown CLI for interactive and batch conversions: running from source, building a self-contained binary, converting every file in a directory, and converting a URL.

```bash
# Run the CLI
dotnet run --project src/MarkItDown.Cli -- path/to/input
```

```bash
# Build a self-contained binary
dotnet publish src/MarkItDown.Cli -c Release -r linux-x64 --self-contained
```

```bash
# Batch convert multiple files
./MarkItDown.Cli /path/to/documents/
```

```bash
# Convert a URL
./MarkItDown.Cli https://example.com/article
```

--------------------------------

### Install ManagedCode.MarkItDown via NuGet

Source: https://github.com/managedcode/markitdown/blob/main/README.md

Use the Package Manager Console, the .NET CLI, or a PackageReference in your .csproj file to install the library.

```bash
# Package Manager Console
Install-Package ManagedCode.MarkItDown
```

```bash
# .NET CLI
dotnet add package ManagedCode.MarkItDown
```

```xml
<!-- Version="*" floats to the latest stable release; pin a specific version as needed -->
<PackageReference Include="ManagedCode.MarkItDown" Version="*" />
```

--------------------------------

### Install MarkItDown via NuGet

Source: https://context7.com/managedcode/markitdown/llms.txt

Install the MarkItDown library using the .NET CLI, the Package Manager Console, or by adding a PackageReference to your .csproj file.

```bash
# .NET CLI
dotnet add package ManagedCode.MarkItDown

# Package Manager Console
Install-Package ManagedCode.MarkItDown

# PackageReference (add to your .csproj)
# <PackageReference Include="ManagedCode.MarkItDown" Version="*" />
```

--------------------------------

### Install .NET SDK using install-dotnet.sh

Source: https://github.com/managedcode/markitdown/blob/main/README.md

This bash script installs the .NET SDK version required for MarkItDown development. It uses the official dotnet-install script and configures environment variables for subsequent shell sessions.
```bash
./eng/install-dotnet.sh
```

--------------------------------

### Minimal Usage Example

Source: https://github.com/managedcode/markitdown/blob/main/README.md

Instantiate the MarkItDownClient and use it to convert a PDF document asynchronously. The result contains the title and Markdown content.

```csharp
using MarkItDown;

var client = new MarkItDownClient();
await using var result = await client.ConvertAsync("document.pdf");
Console.WriteLine(result.Title);
Console.WriteLine(result.Markdown);
```

--------------------------------

### Complex Markdown Report Example

Source: https://github.com/managedcode/markitdown/blob/main/docs/MetaMD-Examples.md

A comprehensive Markdown example demonstrating a report structure with metadata, headings, embedded images, descriptions, audio transcripts, and tables.

```markdown
---
title: Q4 Report
author: Jane Smith
date: 2024-01-15
source: report.pdf (30 pages)
---

# Q4 Performance Report

## Executive Summary

Revenue exceeded projections by 15%.

![Dashboard](dashboard.png)

## CEO Commentary

## Detailed Metrics

| Metric | Target | Actual | Variance |
|--------|--------|---------|----------|
| Revenue | $3.9M | $4.5M | +15.4% |
| New Customers | 2,000 | 2,150 | +7.5% |
| Churn Rate | 3.0% | 2.1% | -30% |
```

--------------------------------

### Initialize Performance Metrics

Source: https://github.com/managedcode/markitdown/blob/main/tests/MarkItDown.Tests/TestFiles/bing-search-results.html

Initializes the performance object and registers listeners for the window `load` and `beforeunload` events. It uses `performance.timing` or a custom start time if available.

```javascript
var perf = (function(n) {
  var o = function() {
    return window.performance && performance.now ?
      Math.round(performance.now()) : new Date() - (window.si_ST || 0);
  },
  a = function() {
    return window.performance && performance.timing && window.performance.timing.navigationStart || window.si_ST || new Date();
  },
  p = function(t) { l = t; },
  w = function(t, i) {
    var r = o();
    t && i && t.length > 0 && (n.marks = n.marks || {}, n.marks[t] = i - l);
  },
  b = function(t, i) {
    var r = o();
    n.measures = n.measures || {};
    n.measures[t] = i - l;
  },
  u = function() {
    var t = o(), i = a();
    if (!n.measures) return;
    // Local names must not shadow the o() and a() helpers above: with var hoisting,
    // reusing those names here would make the calls on the first line of u fail.
    var r = { duration: t - i },
        markKeys = Object.keys(n.marks || {}),
        measureKeys = Object.keys(n.measures || {});
    markKeys.forEach(function(t) { r[t] = n.marks[t]; });
    measureKeys.forEach(function(t) { r[t] = n.measures[t]; });
    var h = window.si_PP;
    h && h(t, "P", r);
    n.marks = {}, n.measures = {}, i = !0;
  };
  if (window.performance && performance.timing) {
    l = a();
    n.setStartTime(l);
  }
  sj_be(window, "load", u, !1);
  sj_be(window, "beforeunload", u, !1);
  return n;
})(perf || (perf = {}));
```

--------------------------------

### Handle MarkItDown Conversion Errors in C#

Source: https://context7.com/managedcode/markitdown/llms.txt

Shows how to handle conversion errors with specific exception types: UnsupportedFormatException, FileNotFoundException, FileConversionException, and OperationCanceledException. Includes an example of applying a timeout with a CancellationToken.
```csharp
using MarkItDown;

var client = new MarkItDownClient(httpClient: new HttpClient());

try
{
    await using var result = await client.ConvertAsync("document.pdf");
    Console.WriteLine(result.Markdown);
}
catch (UnsupportedFormatException ex)
{
    // File format not supported by any converter
    Console.WriteLine($"Unsupported format: {ex.Message}");
}
catch (FileNotFoundException ex)
{
    // File doesn't exist
    Console.WriteLine($"File not found: {ex.Message}");
}
catch (FileConversionException ex)
{
    // Authentication/authorization or conversion failure
    Console.WriteLine($"Conversion failed: {ex.Message}");
    if (ex.InnerException is AggregateException agg)
    {
        foreach (var inner in agg.InnerExceptions)
        {
            Console.WriteLine($"  - {inner.Message}");
        }
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine("Conversion was cancelled");
}

// Use timeout with cancellation token
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(5));
try
{
    await using var result = await client.ConvertAsync("large-file.pdf", cts.Token);
}
catch (OperationCanceledException)
{
    Console.WriteLine("Conversion timed out");
}
```

--------------------------------

### Basic Azure Document Intelligence Configuration

Source: https://github.com/managedcode/markitdown/blob/main/README.md

Example of setting up Azure Document Intelligence options with just the endpoint. When API keys are omitted, the system will attempt to use managed identity.

```csharp
var azureOptions = new AzureIntelligenceOptions
{
    DocumentIntelligence = new AzureDocumentIntelligenceOptions
    {
        Endpoint = "https://contoso.cognitiveservices.azure.com/"
    },
};
```

--------------------------------

### Markdown Flowchart Example (Mermaid)

Source: https://github.com/managedcode/markitdown/blob/main/docs/MetaMD-Examples.md

Create flowcharts within Markdown using Mermaid syntax, embedded in a comment and linked to an image. This is useful for illustrating processes.
````markdown
![Sales Process](sales-flow.png)
<!--
```mermaid
flowchart LR
    Contact[Initial Contact]
    Contact --> Qualify{Qualified?}
    Qualify -->|Yes| Demo[Product Demo]
    Qualify -->|No| Nurture[Add to Nurture]
    Demo --> Proposal[Send Proposal]
    Proposal --> Negotiate{Negotiate}
    Negotiate -->|Deal| Close[Close Deal]
    Negotiate -->|No Deal| Followup[Follow Up]
    Nurture --> Contact
```
-->
````

--------------------------------

### Get Scope Bar Element

Source: https://github.com/managedcode/markitdown/blob/main/tests/MarkItDown.Tests/TestFiles/bing-search-results.html

Retrieves the first child of the first scope bar element. Returns null if no scope bar is found.

```javascript
function it(){var n=_d.querySelectorAll(".b_scopebar");return(n===null||n===void 0?void 0:n.length)?n[0].firstChild:null}
```

--------------------------------

### Get Scope Hide Element

Source: https://github.com/managedcode/markitdown/blob/main/tests/MarkItDown.Tests/TestFiles/bing-search-results.html

Retrieves the first element with the class 'b_scopehide' that is a direct child of 'b_scopebar'. Returns null if no such element is found.

```javascript
function tt(){var n=_d.querySelectorAll(".b_scopebar > .b_scopehide");return n&&n.length>0?n[0]:null}
```

--------------------------------

### Shift Elements Up in Array

Source: https://github.com/managedcode/markitdown/blob/main/tests/MarkItDown.Tests/TestFiles/bing-search-results.html

Shifts elements up in an array: walking from the ending index down to the starting index, each element takes the `innerHTML` and `id` of the element before it. Used internally by 'k'.

```javascript
function g(n,t,i){for(var r=i;r>t;r--)n[r].innerHTML=n[r-1].innerHTML,n[r].id=n[r-1].id}
```

--------------------------------

### BM Module Initialization and Event Handling

Source: https://github.com/managedcode/markitdown/blob/main/tests/MarkItDown.Tests/TestFiles/bing-search-results.html

Initializes the BM module, allowing rules to be wired up, enqueued, and dequeued. It also defines functions to trigger computations and get timestamps.
```javascript
var BM = BM || {};
(function(n) {
  function u(n, u) {
    n in t || (t[n] = []);
    !u.compute || n in r || (r[n] = u.compute);
    !u.unload || n in i || (i[n] = u.unload);
    u.load && u.load()
  }
  function f(n, i) {
    t[n].push({ t: s(), i: i })
  }
  function e(n) {
    return n in i && i[n](), n in t ? t[n] : void 0
  }
  function o() {
    for (var n in r) r[n]()
  }
  function s() {
    return window.performance && performance.now ? Math.round(performance.now()) : new Date - window.si_ST
  }
  var t = {}, i = {}, r = {};
  n.wireup = u;
  n.enqueue = f;
  n.dequeue = e;
  n.trigger = o
})(BM);
```

--------------------------------

### Track Frontend Rendering Performance

Source: https://github.com/managedcode/markitdown/blob/main/tests/MarkItDown.Tests/TestFiles/bing-search-results.html

Observes 'element' entries using PerformanceObserver to track frontend rendering performance. It records the minimum render time of elements starting with 'frp' and disconnects the observer once recorded.

```javascript
var FRPMetricModule;
(function() {
  var t = !1, i, n;
  typeof PerformanceObserver != "undefined" && typeof PerformanceObserver == "function" && (
    i = PerformanceObserver.supportedEntryTypes || [],
    i.indexOf("element") >= 0 && (
      n = new PerformanceObserver(function(i) {
        i.getEntries().forEach(function(i) {
          var r, u, f;
          typeof _w.frpPreviousEntry == "undefined" && (_w.frpPreviousEntry = i);
          ((r = i === null || i === void 0 ? void 0 : i.identifier) === null || r === void 0 ?
            void 0 : r.length) > 0 && (
              u = i.identifier,
              u.startsWith("frp") && u !== "frp.SearchBox" && (
                f = Math.round(Math.min(_w.frpPreviousEntry.renderTime, i.renderTime)),
                _G.frp = f,
                _w.perf && !t && (_w.perf.record && _w.perf.record("FRP", f), t = !0),
                n && t && n.disconnect()
              )
            )
        })
      }),
      n.observe({entryTypes: ["element"]})
    )
  )
})(FRPMetricModule || (FRPMetricModule = {}));
```

--------------------------------

### Build, Test, and Package MarkItDown from Source

Source: https://github.com/managedcode/markitdown/blob/main/README.md

Standard .NET CLI commands to clone the repository, build the solution, run unit tests, and create a NuGet package for the MarkItDown project.

```bash
# Clone the repository
git clone https://github.com/managedcode/markitdown.git
cd markitdown

# Build the solution
dotnet build

# Run tests
dotnet test

# Create NuGet package
dotnet pack --configuration Release
```

--------------------------------

### Configure MarkItDown Client

Source: https://github.com/managedcode/markitdown/blob/main/README.md

Set up MarkItDown options, including enabling built-in converters and specifying the path to the exiftool binary if needed. Instantiate the client with these options.

```csharp
var options = new MarkItDownOptions
{
    EnableBuiltins = true,                   // Use built-in converters (default: true)
    EnablePlugins = false,                   // Plugin system (reserved for future use)
    ExifToolPath = "/usr/local/bin/exiftool" // Path to exiftool binary (optional)
};

var markItDown = new MarkItDownClient(options);
```

--------------------------------

### Build Project

Source: https://github.com/managedcode/markitdown/blob/main/docs/Development/CI.md

Builds (compiles) the MarkItDown solution.

```bash
dotnet build MarkItDown.slnx
```

--------------------------------

### Markdown Photo Example

Source: https://github.com/managedcode/markitdown/blob/main/docs/MetaMD-Examples.md

Embed a photo in Markdown. Use a comment block to provide a detailed description of the photo's content.
```markdown
![Office Photo](office.jpg)
```

--------------------------------

### Shift Elements Down in Array

Source: https://github.com/managedcode/markitdown/blob/main/tests/MarkItDown.Tests/TestFiles/bing-search-results.html

Shifts elements down in an array, moving elements from a starting index to an ending index. Used internally by 'k'.

```javascript
function d(n,t,i){for(var r=t;r<…
```

--------------------------------

### Configure MarkItDownClient with Dependency Injection and a Captioning Fallback

Registers logging and `HttpClient` services, then configures an image captioner that logs a warning and falls back to a placeholder when the vision call fails.

```csharp
var services = new ServiceCollection();
services.AddLogging(builder => builder.AddConsole().SetMinimumLevel(LogLevel.Information));
services.AddHttpClient();

var serviceProvider = services.BuildServiceProvider();
// The logger category type (MarkItDownClient) is an assumption
var logger = serviceProvider.GetRequiredService<ILogger<MarkItDownClient>>();
var httpClientFactory = serviceProvider.GetRequiredService<IHttpClientFactory>();

var options = new MarkItDownOptions
{
    // Graceful degradation for image processing
    ImageCaptioner = async (bytes, info, token) =>
    {
        try
        {
            // Your AI service call here
            return await CallVisionServiceAsync(bytes, token);
        }
        catch (Exception ex)
        {
            logger.LogWarning("Image captioning failed: {Error}", ex.Message);
            return $"[Image: {info.FileName ?? "unknown"}]"; // Fallback
        }
    }
};

var markItDown = new MarkItDownClient(options, logger, httpClientFactory.CreateClient());
```

--------------------------------

### Get Scope Bar Items

Source: https://github.com/managedcode/markitdown/blob/main/tests/MarkItDown.Tests/TestFiles/bing-search-results.html

Retrieves all scope bar items that are part of the menu, selecting 'b_scopebar_item' elements nested inside 'b_sp_over_menu', and returns them as an array.

```javascript
function s(){var n=_d.querySelectorAll(".b_scopebar #b-scopeListItem-menu .b_sp_over_menu .b_scopebar_item");return Array.prototype.slice.call(n)}
```

--------------------------------

### Analyze Project Command

Source: https://github.com/managedcode/markitdown/blob/main/AGENTS.md

Build the MarkItDown solution with analyzers enabled for code analysis.
```bash
dotnet build MarkItDown.slnx -p:RunAnalyzers=true
```

--------------------------------

### Flattened Table Example

Source: https://github.com/managedcode/markitdown/blob/main/docs/MetaMD.md

Flatten tables with merged cells to ensure each Markdown row and column has concrete values. This is crucial for consistent rendering and processing.

```markdown
| ITEM | CONDITION | QUANTITY | LOCATION |
| --- | --- | --- | --- |
| Laptop | Refurbished | 5 | Aisle 3 |
| Laptop | Like-New (Open Box) | 3 | Aisle 3 |
| Laptop | Used (Grade A) | 4 | Aisle 5 |
```

--------------------------------

### System / Module Map (Mermaid)

Source: https://github.com/managedcode/markitdown/blob/main/docs/templates/Architecture-Template.md

Illustrates the high-level structure of modules and their relationships within the system. Use this to understand the overall block diagram.

```mermaid
flowchart LR
    EP[Entry Points]
    A[Module A]
    B[Module B]
    EP --> A
    A --> B
```

--------------------------------

### Running the MarkItDown CLI

Source: https://github.com/managedcode/markitdown/blob/main/README.md

Command to run the bundled CLI for batch processing of files or URLs. Use 'dotnet publish' for a self-contained binary.

```bash
dotnet run --project src/MarkItDown.Cli -- path/to/input
```

--------------------------------

### Markdown Screenshot Description Example

Source: https://github.com/managedcode/markitdown/blob/main/docs/MetaMD-Examples.md

Embed a screenshot in Markdown and provide a detailed description in a comment block. This is useful for documenting UI elements and settings.

```markdown
![Settings Page](settings.png)
```

--------------------------------

### Configure MarkItDownClient with Advanced Options

Source: https://context7.com/managedcode/markitdown/llms.txt

Customize the MarkItDown client using MarkItDownOptions. Options include workspace configuration, AI integration, telemetry, custom converters, and delegates for image captioning and audio transcription.
```csharp
using MarkItDown;
using Microsoft.Extensions.Logging;

var options = new MarkItDownOptions
{
    // Workspace configuration
    RootPath = Path.Combine(Path.GetTempPath(), "markitdown", "workspaces"),

    // Built-in converter registration
    EnableBuiltins = true,

    // Optional external tool path for image metadata
    ExifToolPath = "/usr/local/bin/exiftool",

    // Telemetry settings
    EnableTelemetry = true,
    ProgressDetail = ProgressDetailLevel.Detailed,

    // AI enrichment settings
    EnableAiImageEnrichment = true,

    // Segment configuration
    Segments = SegmentOptions.Default with
    {
        IncludeSegmentMetadataInMarkdown = true
    },

    // Custom image captioning delegate
    ImageCaptioner = async (imageBytes, streamInfo, token) =>
    {
        // Call your vision AI service
        return $"[Image: {streamInfo.FileName ?? "unknown"}]";
    },

    // Custom audio transcription delegate
    AudioTranscriber = async (audioBytes, streamInfo, token) =>
    {
        // Call your speech-to-text service
        return "[Transcription placeholder]";
    }
};

// Create client with logging
using var loggerFactory = LoggerFactory.Create(builder =>
    builder.AddConsole().SetMinimumLevel(LogLevel.Information));
// The logger category type (MarkItDownClient) is an assumption
var logger = loggerFactory.CreateLogger<MarkItDownClient>();

var client = new MarkItDownClient(options, logger, new HttpClient());
```

--------------------------------

### Markdown Image Example

Source: https://github.com/managedcode/markitdown/blob/main/docs/MetaMD-Examples.md

Embed an image in Markdown using the standard `![alt text](url)` syntax. Include a comment block for image descriptions.

```markdown
![Screenshot](screen.png)
```

--------------------------------

### Migrate MarkItDown Conversion from Python to C#

Source: https://github.com/managedcode/markitdown/blob/main/README.md

Shows the Python call for converting a PDF file next to its C# equivalent, which uses an asynchronous `ConvertAsync` call and reads the output from `result.Markdown` instead of `result.text_content`.
```python
# Python version
import markitdown

md = markitdown.MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)
```

```csharp
// C# version
using MarkItDown;

var markItDown = new MarkItDownClient();
var result = await markItDown.ConvertAsync("document.pdf");
Console.WriteLine(result.Markdown);
```

--------------------------------

### Configure Google Cloud AI Options

Source: https://github.com/managedcode/markitdown/blob/main/README.md

Set up Google Cloud AI options for Document Intelligence, Vision, and Media. This includes project ID, location, processor ID, and credential paths or direct credential objects. Defaults to Application Default Credentials if not specified.

```csharp
using Google.Apis.Auth.OAuth2; // GoogleCredential

var googleOptions = new GoogleIntelligenceOptions
{
    DocumentIntelligence = new GoogleDocumentIntelligenceOptions
    {
        ProjectId = "my-project",
        Location = "us",
        ProcessorId = "processor-id",
        CredentialsPath = Environment.GetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS")
    },
    Vision = new GoogleVisionOptions
    {
        JsonCredentials = Environment.GetEnvironmentVariable("GOOGLE_VISION_JSON")
    },
    Media = new GoogleMediaIntelligenceOptions
    {
        Credential = GoogleCredential.GetApplicationDefault(),
        LanguageCode = "en-US"
    }
};
```

--------------------------------

### Markdown Pie Chart Example

Source: https://github.com/managedcode/markitdown/blob/main/docs/MetaMD-Examples.md

Embed a pie chart in Markdown, as with other charts, by linking an image and detailing the chart's data and segments in a comment.

```markdown
![Market Share](pie-chart.png)
```

--------------------------------

### Restore Dependencies

Source: https://github.com/managedcode/markitdown/blob/main/docs/Development/CI.md

Restores project dependencies. Run this before building.
```bash
dotnet restore MarkItDown.slnx
```

--------------------------------

### Markdown Chart Example (as data)

Source: https://github.com/managedcode/markitdown/blob/main/docs/MetaMD-Examples.md

Embed a chart in Markdown by referencing an image and providing chart data in a comment block. This allows for structured data representation.

```markdown
![Sales Chart](chart.png)
```
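
The chart-as-data pattern described above (an image reference followed by the chart's data in a comment block) could be generated along these lines. This is a minimal Python sketch, not the project's implementation: the function name `chart_as_data`, the `Chart data:` label, and the comma-separated row layout are all assumptions made for illustration, since this section does not show the comment format MetaMD actually uses.

```python
def chart_as_data(alt_text, image_path, headers, rows):
    """Return Markdown: an image reference followed by an HTML comment with the chart data.

    The comment layout (a label line, then comma-separated rows) is hypothetical.
    """
    table = [list(headers)] + [[str(cell) for cell in row] for row in rows]
    body = [", ".join(record) for record in table]
    if any("--" in line for line in body):
        # "-->" would end the comment early, so reject double hyphens outright.
        raise ValueError("chart data must not contain '--'")
    return "\n".join([f"![{alt_text}]({image_path})", "<!-- Chart data:", *body, "-->"])


# Made-up sample values, used only to show the output shape.
snippet = chart_as_data(
    "Sales Chart",
    "chart.png",
    ["Quarter", "Revenue"],
    [["Q1", 100], ["Q2", 120]],
)
print(snippet)
```

Keeping the data inside an HTML comment means a Markdown renderer still shows only the image, while a downstream parser can recover the underlying values.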