### Local Development Setup

Source: https://github.com/jina-ai/reader/blob/main/README.md

Commands to install dependencies, start services, and initialize the database for local development.

```bash
npm install
docker compose up -d
npm run init-db
```

--------------------------------

### Local Development Setup Commands

Source: https://context7.com/jina-ai/reader/llms.txt

Commands for setting up the local development environment, including cloning the repository, installing dependencies, starting services via Docker, initializing the database, and running the application in development or production mode.

```bash
# Prerequisites: Node.js >=22, Docker
git clone git@github.com:jina-ai/reader.git
cd reader
npm install

# Start MongoDB + MinIO (S3-compatible) via Docker Compose
docker compose up -d

# Initialise the database (creates collections/indexes)
npm run init-db

# Start in development mode with hot-reload
npm run dev
# → Server listening on http://localhost:3000 (HTTP/1.1)
# → Alternative HTTP/2 cleartext on port 3001

# Build TypeScript and start in production mode
npm run serve

# Run unit tests
npm run test:unit

# Run e2e tests
npm run test:e2e

# With coverage report
npm run test:coverage
```

--------------------------------

### Local Development with Environment Variables

Source: https://github.com/jina-ai/reader/blob/main/README.md

Commands to start services and run development mode after setting up environment variables.

```bash
docker compose up -d
npm run dev
```

--------------------------------

### Start Crawl Server (Node.js)

Source: https://context7.com/jina-ai/reader/llms.txt

Import and start the crawl server singleton for embedding in Node.js applications. The service can be configured to listen on a specific port.

```typescript
// package.json exports: ".", "./crawl", "./search", "./serp"
import crawlServer from 'reader/crawl';      // CrawlStandAloneServer singleton
import searchServer from 'reader/search';    // SearchStandAloneServer singleton

// Start the crawl server on port 3000
crawlServer.serviceReady().then((s) => {
    s.h2c().listen(3000);
});

// Or dry-run mode (initialise then shut down — useful for testing)
// Note: `finalizer` must already be in scope; its import is not shown in this excerpt.
if (process.env.NODE_ENV === 'dry-run') {
    crawlServer.serviceReady().then(() => finalizer.terminate());
}
```

--------------------------------

### Clone Repository and Install Dependencies

Source: https://github.com/jina-ai/reader/blob/main/README.md

Instructions for cloning the Jina Reader repository and installing the necessary Node.js dependencies using npm. Ensure you are using Node.js v18.

```bash
git clone git@github.com:jina-ai/reader.git
cd reader
npm install
```

--------------------------------

### Start Web Crawler

Source: https://github.com/jina-ai/reader/blob/main/fixtures/sample.html

Initiates a web crawler to start fetching pages from a given URL. Ensure the Crawler class is properly defined and imported.

```javascript
const crawler = new Crawler();
crawler.start('https://example.com');
```

--------------------------------

### GET / - URL to Markdown

Source: https://context7.com/jina-ai/reader/llms.txt

Prepend `https://r.jina.ai/` to any URL to convert it to markdown. The service auto-selects rendering engines and applies cleanup. Behavior can be controlled via request headers.

```APIDOC
## GET /

### Description
Converts a given URL to markdown text. The service automatically selects the appropriate rendering engine (Curl or Browser) and applies cleanup using Readability. Output format and behavior can be customized using request headers.

### Method
GET

### Endpoint
`https://r.jina.ai/<URL>`

### Headers
- `Accept`: `application/json` to receive JSON output with title, url, and content fields.
- `X-No-Cache`: `true` to force bypass cache.
- `X-Timeout`: Specifies timeout in seconds.
- `X-Target-Selector`: Focuses rendering on a specific CSS selector.
- `X-Remove-Selector`: Removes specified CSS elements (e.g., cookie banners).
- `X-Respond-With`: `screenshot` to receive a raw screenshot (PNG redirect).
- `X-Set-Cookie`: Forwards session cookies.
- `X-Proxy-Url`: Specifies a custom proxy URL (e.g., `socks5://user:pass@proxy.example.com:1080`).

### Request Example
```bash
# Basic usage
curl https://r.jina.ai/https://en.wikipedia.org/wiki/Artificial_intelligence

# JSON output
curl -H "Accept: application/json" https://r.jina.ai/https://en.wikipedia.org/wiki/Artificial_intelligence

# Screenshot output
curl -L -H "X-Respond-With: screenshot" https://r.jina.ai/https://jina.ai -o screenshot.png
```

### Response
#### Success Response (200)
- `content` (string): Markdown representation of the URL content.
- `title` (string): The title of the page.
- `url` (string): The original URL.

#### Response Example
```json
{
  "title": "Artificial intelligence",
  "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "content": "# Artificial intelligence\n..."
}
```
```
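
The header options above can also be assembled programmatically. The following is a minimal Python sketch, not an official client: the function names `build_reader_request` and `fetch` and their parameters are hypothetical, and only the `https://r.jina.ai/` prefix and the header names come from the description above.

```python
from typing import Dict, Optional, Tuple
from urllib.request import Request, urlopen

READER_BASE = "https://r.jina.ai/"


def build_reader_request(
    target_url: str,
    as_json: bool = False,
    no_cache: bool = False,
    timeout_s: Optional[int] = None,
    target_selector: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return the Reader URL and the request headers for a GET call."""
    headers: Dict[str, str] = {}
    if as_json:
        headers["Accept"] = "application/json"
    if no_cache:
        headers["X-No-Cache"] = "true"
    if timeout_s is not None:
        headers["X-Timeout"] = str(timeout_s)
    if target_selector:
        headers["X-Target-Selector"] = target_selector
    # The target URL is appended verbatim after the Reader prefix.
    return READER_BASE + target_url, headers


def fetch(target_url: str, **options) -> str:
    """Send the request (needs network access) and return the body as text."""
    url, headers = build_reader_request(target_url, **options)
    with urlopen(Request(url, headers=headers), timeout=60) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch("https://example.com", as_json=True)` would then return the JSON body as a string. Keeping URL and header construction separate from the network call makes the request easy to inspect before it is sent.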

--------------------------------

### Markdown output: ATX headings

Source: https://context7.com/jina-ai/reader/llms.txt

Control the markdown heading style. This example forces ATX-style headings (e.g., # Heading) instead of setext underline style.

```bash
curl -H "X-Md-Heading-Style: atx" \
     https://r.jina.ai/https://docs.example.com
```

--------------------------------

### Get Screenshot with curl

Source: https://context7.com/jina-ai/reader/llms.txt

Retrieve a raw screenshot (PNG redirect) by setting the 'X-Respond-With' header to 'screenshot'. Use '-L' to follow redirects.

```bash
# Return raw screenshot (PNG redirect)
curl -L -H "X-Respond-With: screenshot" \
     https://r.jina.ai/https://jina.ai -o screenshot.png
```

--------------------------------

### Submit URL via POST with curl

Source: https://context7.com/jina-ai/reader/llms.txt

Submit a URL for processing via a POST request, essential for hash-routed SPAs where fragments cannot be sent in GET paths.

```bash
# Hash-routed SPA (# fragment cannot be sent in a GET path)
curl -X POST https://r.jina.ai/ \
     -H "Content-Type: application/json" \
     -d '{"url": "https://example.com/#/route/to/page"}'
```
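
The same POST can be prepared from Python, which avoids shell-quoting concerns. This is a sketch using only the standard library; `build_post_request` is a hypothetical helper name, and any extra keyword arguments are passed through as additional JSON body fields without validation.

```python
import json
from urllib.request import Request


def build_post_request(target_url: str, **options) -> Request:
    """Build a POST request whose JSON body carries the full URL, fragment included."""
    body = {"url": target_url, **options}
    return Request(
        "https://r.jina.ai/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Because the URL travels in the body rather than in the request path, the `#/route/to/page` fragment reaches the service intact. The request can be sent with `urllib.request.urlopen(req)` when network access is available.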

--------------------------------

### Inject custom JavaScript

Source: https://context7.com/jina-ai/reader/llms.txt

Inject custom JavaScript into the page after it has loaded. This example removes a modal element using JavaScript.

```bash
curl -H "Accept: application/json" \
     -X POST https://r.jina.ai/ \
     -H "Content-Type: application/json" \
     -d '{"url": "https://example.com", "injectPageScript": ["document.querySelector(\".modal\").remove();"]}'
```

--------------------------------

### Convert URL to Markdown with curl

Source: https://context7.com/jina-ai/reader/llms.txt

Use this to convert a Wikipedia page to markdown. Specify 'Accept: application/json' to get JSON output with title, url, and content fields.

```bash
# Basic: convert a Wikipedia page to markdown
curl https://r.jina.ai/https://en.wikipedia.org/wiki/Artificial_intelligence

# Return JSON with title, url, content fields
curl -H "Accept: application/json" \
     https://r.jina.ai/https://en.wikipedia.org/wiki/Artificial_intelligence
```

--------------------------------

### Token budget: Summary of links

Source: https://context7.com/jina-ai/reader/llms.txt

Include a summary section listing all found links in the response. Set 'X-With-Links-Summary: true' to enable this feature.

```bash
curl -H "X-With-Links-Summary: true" \
     https://r.jina.ai/https://news.ycombinator.com
```

--------------------------------

### Enable JSON Mode with Accept Header

Source: https://github.com/jina-ai/reader/blob/main/README.md

Use the `Accept: application/json` header to retrieve content in JSON format. Currently, this mode returns a JSON object with `url`, `title`, and `content` fields.

```bash
curl -H "Accept: application/json" https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page
```

--------------------------------

### Submit Binary File as Base64 with POST

Source: https://context7.com/jina-ai/reader/llms.txt

Submit any binary file (e.g., Word, Excel, PowerPoint) encoded in Base64 in the JSON body of a POST request.

```bash
# Submit any binary file (Word, Excel, PowerPoint) as base64
# (with GNU coreutils, use `base64 -w 0 slides.pptx` to avoid line wraps)
FILE_B64=$(base64 -i slides.pptx)
curl -X POST https://r.jina.ai/ \
     -H "Content-Type: application/json" \
     -d "{\"file\": \"${FILE_B64}\"}"
```
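
Building the JSON body in a script sidesteps shell quoting and line-wrap issues in the Base64 output. A minimal Python sketch follows; `build_file_payload` is a hypothetical helper, and the only field names it accepts are `file` and `pdf`, the two Base64 body fields that appear in this document.

```python
import base64
import json


def build_file_payload(data: bytes, field: str = "file") -> bytes:
    """Encode raw bytes as Base64 and wrap them in a JSON request body."""
    if field not in ("file", "pdf"):
        raise ValueError("field must be 'file' or 'pdf'")
    # b64encode emits a single line, so the value is safe inside a JSON string.
    encoded = base64.b64encode(data).decode("ascii")
    return json.dumps({field: encoded}).encode("utf-8")
```

The returned bytes can be sent as the body of a POST to `https://r.jina.ai/` with `Content-Type: application/json`, for example after reading a file with `open("slides.pptx", "rb").read()`.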

--------------------------------

### In-Site Search with Jina Reader

Source: https://github.com/jina-ai/reader/blob/main/README.md

Perform an in-site search by specifying the `site` parameter in the query. Multiple `site` parameters can be used to search across different domains.

```bash
curl 'https://s.jina.ai/When%20was%20Jina%20AI%20founded%3F?site=jina.ai&site=github.com'
```
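
Hand-encoding the query is error-prone, so the URL can be built with the standard library instead. This is a sketch; `build_search_url` is a hypothetical helper name, and it only covers the `site` parameter described above.

```python
from typing import Iterable
from urllib.parse import quote, urlencode


def build_search_url(query: str, sites: Iterable[str] = ()) -> str:
    """Percent-encode the query into the path and add one `site` parameter per domain."""
    url = "https://s.jina.ai/" + quote(query, safe="")
    params = [("site", site) for site in sites]
    if params:
        url += "?" + urlencode(params)
    return url
```

`build_search_url("When was Jina AI founded?", ["jina.ai", "github.com"])` reproduces the URL used in the curl command above.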

--------------------------------

### SPA Fetching with Timeout Header

Source: https://github.com/jina-ai/reader/blob/main/README.md

When dealing with SPAs that dynamically load content, use the `x-timeout` header to instruct Reader to wait until the network is idle or the timeout is reached, ensuring content is fully loaded.

```bash
curl 'https://r.jina.ai/https://example.com/' -H 'x-timeout: 30'
```

--------------------------------

### Enable Image Captioning

Source: https://github.com/jina-ai/reader/blob/main/README.md

Image captioning is off by default for better latency. To enable it, set the `x-with-generated-alt: true` header in your request.

```http
x-with-generated-alt: true
```

--------------------------------

### Control Output Format with X-Respond-With Header

Source: https://context7.com/jina-ai/reader/llms.txt

Demonstrates how to control the output format of the Reader API using the X-Respond-With header. Supports various formats like markdown, html, text, pageshot, and readerlm-v2. Requires Accept: text/event-stream or application/json for compound formats.

```bash
# Return only the cleaned markdown (no Readability processing)
curl -H "X-Respond-With: markdown" \
     https://r.jina.ai/https://docs.python.org/3/library/os.html
```

```bash
# Return raw outer HTML
curl -H "X-Respond-With: html" \
     https://r.jina.ai/https://example.com
```

```bash
# Return innerText only
curl -H "X-Respond-With: text" \
     https://r.jina.ai/https://example.com
```

```bash
# Full-page screenshot (PNG)
curl -H "X-Respond-With: pageshot" \
     https://r.jina.ai/https://jina.ai -o pageshot.png
```

```bash
# Use ReaderLM-v2 (small LM converts HTML → markdown)
curl -H "X-Respond-With: readerlm-v2" \
     https://r.jina.ai/https://arxiv.org/abs/2309.10305
```

```bash
# Compound: content + screenshot (requires SSE or JSON Accept)
curl -H "Accept: application/json" \
     -H "X-Respond-With: content,screenshot" \
     https://r.jina.ai/https://example.com
```

--------------------------------

### Compare Standard vs. Streaming Mode

Source: https://github.com/jina-ai/reader/blob/main/README.md

Demonstrates the difference between standard and streaming mode for content extraction. Streaming mode is beneficial when content is loaded dynamically after the initial page load.

```bash
curl -H 'x-no-cache: true' https://r.jina.ai/https://access.redhat.com/security/cve/CVE-2023-45853
```

```bash
curl -H "Accept: text/event-stream" -H 'x-no-cache: true' https://r.jina.ai/https://access.redhat.com/security/cve/CVE-2023-45853
```

--------------------------------

### Read URL with Jina Reader

Source: https://github.com/jina-ai/reader/blob/main/README.md

Prepend `https://r.jina.ai/` to any URL to convert it into an LLM-friendly input. This is useful for agents and RAG systems.

```url
https://r.jina.ai/https://en.wikipedia.org/wiki/Artificial_intelligence
```

--------------------------------

### Cache control: Robots.txt compliance

Source: https://context7.com/jina-ai/reader/llms.txt

Ensure compliance with robots.txt rules for specific bots. 'X-Robots-Txt: Googlebot' checks against Googlebot's rules.

```bash
curl -H "X-Robots-Txt: Googlebot" \
     https://r.jina.ai/https://example.com/page
```

--------------------------------

### SPA Fetching with Timeout

Source: https://github.com/jina-ai/reader/blob/main/README.md

For SPAs or websites with dynamic content loading, use the `x-timeout` header to specify a waiting period.

```APIDOC
## SPA Fetching (Dynamic Content)

### Description
Fetches content from SPAs or websites with dynamic content loading, waiting until a specified timeout for network idle.

### Method
GET

### Endpoint
`https://r.jina.ai/{URL}`

### Parameters
#### Headers
- **x-timeout** (integer) - Optional - The timeout in seconds to wait for network idle.

### Request Example
```bash
curl 'https://r.jina.ai/https://example.com/' -H 'x-timeout: 30'
```
```

--------------------------------

### Token budget: Inspect usage (curl)

Source: https://context7.com/jina-ai/reader/llms.txt

Inspect token usage from response headers using curl. The 'X-Usage-Tokens' header provides an estimate of the GPT-compatible token count.

```bash
curl -sI "https://r.jina.ai/https://en.wikipedia.org/wiki/Python_(programming_language)" \
  | grep -i x-usage-tokens
```
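
The same header can be read from a response in a script. The sketch below assumes the header value is a plain integer, which the source does not state explicitly; `usage_tokens` is a hypothetical helper that works on any mapping of header names to values.

```python
from typing import Mapping, Optional


def usage_tokens(headers: Mapping[str, str]) -> Optional[int]:
    """Return the X-Usage-Tokens value as an int, or None if absent or not numeric."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lower case.
        if name.lower() == "x-usage-tokens":
            try:
                return int(value.strip())
            except ValueError:
                return None
    return None
```

With `urllib.request`, the mapping could come from `dict(response.headers.items())`. Returning `None` rather than raising lets callers treat a missing estimate as "unknown".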

--------------------------------

### SPA Fetching with POST Request

Source: https://github.com/jina-ai/reader/blob/main/README.md

For SPAs with hash-based routing, use the POST method with the `url` parameter in the request body to correctly handle URLs containing `#`.

```bash
curl -X POST 'https://r.jina.ai/' -d 'url=https://example.com/#/route'
```

--------------------------------

### Enable Streaming Mode with Accept Header

Source: https://github.com/jina-ai/reader/blob/main/README.md

Toggle streaming mode using the `Accept: text/event-stream` header. This mode waits longer for the page to stabilize, providing more complete results, especially for pages with dynamic content loaded by JavaScript.

```bash
curl -H "Accept: text/event-stream" https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page
```

--------------------------------

### Token budget: Reject if exceeded

Source: https://context7.com/jina-ai/reader/llms.txt

Set a token budget for requests. If the estimated token count exceeds the budget (e.g., 5000), the request will be rejected with an HTTP 400 error.

```bash
curl -v -H "X-Token-Budget: 5000" \
     https://r.jina.ai/https://very-long-document.example.com
```

--------------------------------

### Handle SPA and Dynamic Content with Browser Engine

Source: https://context7.com/jina-ai/reader/llms.txt

Manages Single Page Applications (SPAs) with hash routing, lazy-loaded content, and dynamic elements. Uses Puppeteer headless Chrome. Options include waiting for specific selectors, setting explicit timeouts, and forcing the browser engine.

```bash
# Wait for a specific element to appear before returning
curl -H "X-Wait-For-Selector: #main-content" \
     https://r.jina.ai/https://react-app.example.com/products
```

```bash
# Explicit timeout to capture fully-rendered heavy pages (max 180s)
curl -H "X-Timeout: 45" \
     https://r.jina.ai/https://dashboard.example.com/analytics
```

```bash
# Force browser engine (always use headless Chrome, never curl)
curl -H "X-Engine: browser" \
     https://r.jina.ai/https://js-heavy-app.example.com
```
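
When these options are set from code, a small validation step can catch mistakes before the request is sent. The following Python sketch is illustrative only: `spa_headers` is a hypothetical helper, the 180-second ceiling is taken from the comment in the timeout example above, and the accepted engine values are limited to the two that appear in this document.

```python
from typing import Dict, Optional

MAX_TIMEOUT_S = 180  # upper bound mentioned in the X-Timeout example
KNOWN_ENGINES = ("browser", "curl")  # the two engine values used in this document


def spa_headers(
    wait_for_selector: Optional[str] = None,
    timeout_s: Optional[int] = None,
    engine: Optional[str] = None,
) -> Dict[str, str]:
    """Build request headers for dynamic pages, rejecting out-of-range values early."""
    headers: Dict[str, str] = {}
    if wait_for_selector:
        headers["X-Wait-For-Selector"] = wait_for_selector
    if timeout_s is not None:
        if not 0 < timeout_s <= MAX_TIMEOUT_S:
            raise ValueError(f"timeout_s must be between 1 and {MAX_TIMEOUT_S}")
        headers["X-Timeout"] = str(timeout_s)
    if engine is not None:
        if engine not in KNOWN_ENGINES:
            raise ValueError(f"engine must be one of {KNOWN_ENGINES}")
        headers["X-Engine"] = engine
    return headers
```

The checks are client-side sanity checks only; the service may accept other values or enforce different limits.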

--------------------------------

### POST / - URL or file via POST body

Source: https://context7.com/jina-ai/reader/llms.txt

Submit content via a POST request body to `https://r.jina.ai/`. Supports `url`, `html`, `pdf` (base64), or `file` (base64) for processing.

```APIDOC
## POST /

### Description
Allows submitting content via a POST request body for processing. This is useful for handling hash-routed SPAs, raw HTML conversion, or direct file uploads.

### Method
POST

### Endpoint
`https://r.jina.ai/`

### Headers
- `Content-Type`: `application/json`

### Request Body
- `url` (string): The URL to process.
- `html` (string): Raw HTML content to convert to markdown.
- `pdf` (string): Base64 encoded PDF content.
- `file` (string): Base64 encoded content of any binary file (e.g., Word, Excel, PowerPoint).

### Request Example
```bash
# Process a URL with a hash fragment
curl -X POST https://r.jina.ai/ \
     -H "Content-Type: application/json" \
     -d '{"url": "https://example.com/#/route/to/page"}'

# Convert raw HTML string to markdown
curl -X POST https://r.jina.ai/ \
     -H "Content-Type: application/json" \
     -d '{"html": "<h1>Hello</h1><p>World <strong>example</strong></p>"}'

# Submit a base64 encoded PDF
# (with GNU coreutils, use `base64 -w 0 <file>` to avoid line wraps)
PDF_B64=$(base64 -i report.pdf)
curl -X POST https://r.jina.ai/ \
     -H "Content-Type: application/json" \
     -d "{\"pdf\": \"${PDF_B64}\"}"

# Submit a base64 encoded PowerPoint file
FILE_B64=$(base64 -i slides.pptx)
curl -X POST https://r.jina.ai/ \
     -H "Content-Type: application/json" \
     -d "{\"file\": \"${FILE_B64}\"}"
```

### Response
#### Success Response (200)
- `content` (string): Markdown representation of the processed content.

#### Response Example
```
# Hello

World **example**
```
```

--------------------------------

### SPA Fetching with POST

Source: https://github.com/jina-ai/reader/blob/main/README.md

For Single Page Applications (SPAs) with hash-based routing, use the POST method with the URL in the request body.

```APIDOC
## SPA Fetching (Hash-based Routing)

### Description
Fetches content from SPAs that use hash-based routing by sending a POST request.

### Method
POST

### Endpoint
`https://r.jina.ai/`

### Parameters
#### Request Body
- **url** (string) - Required - The URL of the SPA.

### Request Example
```bash
curl -X POST 'https://r.jina.ai/' -d 'url=https://example.com/#/route'
```
```

--------------------------------

### Web Search with s.jina.ai

Source: https://context7.com/jina-ai/reader/llms.txt

Performs web searches using the s.jina.ai endpoint. Supports basic search, domain restriction, JSON output, streaming results, image search, news search, geo-targeting, and controlling result count. Requires an Authorization header for some features.

```bash
# Basic web search, returns top 5 results in markdown
curl "https://s.jina.ai/What%20is%20the%20best%20vector%20database%20for%20RAG%3F"
```

```bash
# Restrict search to specific domains (in-site search)
curl "https://s.jina.ai/When%20was%20Jina%20AI%20founded%3F?site=jina.ai&site=github.com"
```

```bash
# Return structured JSON (array of {title, url, content})
curl -H "Accept: application/json" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     "https://s.jina.ai/latest%20AI%20models%202024"
```

```bash
# Stream search results progressively via SSE
curl -H "Accept: text/event-stream" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     "https://s.jina.ai/quantum%20computing%20breakthroughs"
```

```bash
# Search images
curl -H "Authorization: Bearer YOUR_API_KEY" \
     "https://s.jina.ai/golden%20retriever?type=images"
```

```bash
# Search news
curl -H "Authorization: Bearer YOUR_API_KEY" \
     "https://s.jina.ai/AI%20regulation%202024?type=news"
```

```bash
# Geo-targeted search (country + language)
curl -H "Authorization: Bearer YOUR_API_KEY" \
     "https://s.jina.ai/local%20weather?gl=de&hl=de"
```

```bash
# Request only URLs/titles without fetching full content (faster)
curl -H "X-Respond-With: no-content" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     "https://s.jina.ai/machine%20learning%20tutorials"
```

```bash
# Control result count (up to 20)
curl -H "Authorization: Bearer YOUR_API_KEY" \
     "https://s.jina.ai/climate%20change?num=3"
```

--------------------------------

### Web Search

Source: https://github.com/jina-ai/reader/blob/main/README.md

Perform a web search by prepending `https://s.jina.ai/` to your search query. The API fetches and processes the top 5 search results.

```APIDOC
## Web Search

### Description
Performs a web search and fetches content from the top 5 results.

### Method
GET

### Endpoint
`https://s.jina.ai/{search_query}`

### Parameters
#### Path Parameters
- **search_query** (string) - Required - The URL-encoded search query.

### Request Example
```bash
curl 'https://s.jina.ai/Who%20will%20win%202024%20US%20presidential%20election%3F'
```
```

--------------------------------

### In-site Search with Jina Reader

Source: https://github.com/jina-ai/reader/blob/main/README.md

To restrict search results to a specific domain, append `site=yourdomain.com` to the query parameters.

```url
https://s.jina.ai/your+query?site=jina.ai
```

--------------------------------

### Use Custom Proxy with curl

Source: https://context7.com/jina-ai/reader/llms.txt

Utilize a custom proxy for requests by specifying the proxy URL in the 'X-Proxy-Url' header.

```bash
# Use a custom proxy
curl -H "X-Proxy-Url: socks5://user:pass@proxy.example.com:1080" \
     https://r.jina.ai/https://geo-restricted-site.com
```

--------------------------------

### Generate Alt Text for Images with X-With-Generated-Alt

Source: https://github.com/jina-ai/reader/blob/main/README.md

Enable automatic captioning of images lacking alt tags by using the `X-With-Generated-Alt: true` header. The captions are formatted to assist downstream LLMs in understanding image content.

```bash
curl -H "X-With-Generated-Alt: true" https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page
```

--------------------------------

### Submit PDF as Base64 with POST

Source: https://context7.com/jina-ai/reader/llms.txt

Submit a PDF file encoded in Base64 within the JSON body of a POST request to convert it to markdown.

```bash
# Submit a PDF as base64 and get markdown back
# (with GNU coreutils, use `base64 -w 0 report.pdf` to avoid line wraps)
PDF_B64=$(base64 -i report.pdf)
curl -X POST https://r.jina.ai/ \
     -H "Content-Type: application/json" \
     -d "{\"pdf\": \"${PDF_B64}\"}"
```

--------------------------------

### Streaming Mode - Accept: text/event-stream

Source: https://context7.com/jina-ai/reader/llms.txt

Enables Server-Sent Events (SSE) streaming for content that loads dynamically. Each SSE chunk provides increasingly complete page content, allowing immediate processing.

```APIDOC
## Streaming Mode

### Description
Enables Server-Sent Events (SSE) streaming by setting the `Accept` header to `text/event-stream`. This is useful for websites that load content dynamically via JavaScript or when immediate processing of partial content is desired. Each subsequent SSE chunk contains more complete page content.

### Method
GET

### Endpoint
`https://r.jina.ai/<URL>`

### Headers
- `Accept`: `text/event-stream`
- `X-No-Cache`: `true` (optional, to ensure fresh content)

### Request Example
```bash
# Stream Wikipedia main page
curl -H "Accept: text/event-stream" \
     https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page

# Stream a site with dynamic content, bypassing cache
curl -H "Accept: text/event-stream" \
     -H "X-No-Cache: true" \
     https://r.jina.ai/https://access.redhat.com/security/cve/CVE-2023-45853
```

### Response
#### Success Response (200)
- Server-Sent Events stream where each event chunk contains progressively more complete page content.
```
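
Since each chunk is more complete than the one before it, a client that only needs the final result can keep the last `data:` payload and discard the rest. The Python sketch below parses SSE framing from an iterable of text lines; `last_sse_data` and `final_event_json` are hypothetical helpers, and treating each payload as a JSON object is an assumption, not something the description above confirms.

```python
import json
from typing import Iterable, List, Optional


def last_sse_data(lines: Iterable[str]) -> Optional[str]:
    """Return the data payload of the last event in an SSE stream, or None."""
    last: Optional[str] = None
    buf: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buf:
                last = "\n".join(buf)
                buf = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # One leading space after the colon is not part of the value.
            buf.append(value[1:] if value.startswith(" ") else value)
    if buf:
        # Lenient choice: keep a trailing event even without its closing blank line.
        last = "\n".join(buf)
    return last


def final_event_json(lines: Iterable[str]) -> Optional[dict]:
    """Decode the last payload as JSON (assumes payloads are JSON objects)."""
    payload = last_sse_data(lines)
    return json.loads(payload) if payload is not None else None
```

The parser accepts any iterable of lines, so it can be fed from a streaming HTTP response decoded line by line or from a saved capture of `curl` output.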

--------------------------------

### Programmatic Crawling with CrawlerHost (Node.js)

Source: https://context7.com/jina-ai/reader/llms.txt

Directly use the CrawlerHost for programmatic crawling. This involves resolving the host from the container, configuring crawl options, and iterating through snapshots.

```typescript
// Direct use of CrawlerHost for programmatic crawling
import { container } from 'tsyringe';
import { CrawlerHost } from './src/api/crawler';
import { CrawlerOptions } from './src/dto/crawler-options';

const host = container.resolve(CrawlerHost);
await host.serviceReady();

const url = new URL('https://en.wikipedia.org/wiki/Artificial_intelligence');
const opts = CrawlerOptions.from({ respondWith: 'markdown', withGeneratedAlt: false });
const crawlOpts = await host.configure(opts);

for await (const snapshot of host.iterSnapshots(url, crawlOpts, opts)) {
    if (!snapshot) continue;
    const formatted = await host.simpleCrawl('content', url, crawlOpts);
    console.log(formatted.content);
    break;
}
```

--------------------------------

### Convert Raw HTML to Markdown with POST

Source: https://context7.com/jina-ai/reader/llms.txt

Convert a raw HTML string to markdown by including the HTML content in the JSON body of a POST request.

```bash
# Convert raw HTML string to markdown
curl -X POST https://r.jina.ai/ \
     -H "Content-Type: application/json" \
     -d '{"html": "<h1>Hello</h1><p>World <strong>example</strong></p>"}'
# → "# Hello\n\nWorld **example**\n"
```

--------------------------------

### Cache control: Cache tolerance

Source: https://context7.com/jina-ai/reader/llms.txt

Specify the maximum age of cached content to accept. 'X-Cache-Tolerance: 86400' allows cached content up to 24 hours old.

```bash
curl -H "X-Cache-Tolerance: 86400" \
     https://r.jina.ai/https://example.com/static-page
```

--------------------------------

### Force curl engine

Source: https://context7.com/jina-ai/reader/llms.txt

Use the 'curl' engine for lightweight, no-JavaScript execution. This is useful for static content or when JavaScript execution is not desired.

```bash
curl -H "X-Engine: curl" \
     https://r.jina.ai/https://static-site.example.com
```

--------------------------------

### Cache control: Bypass cache

Source: https://context7.com/jina-ai/reader/llms.txt

Bypass the cache entirely for a request. Use 'X-No-Cache: true' to ensure the content is always re-fetched from the source.

```bash
curl -H "X-No-Cache: true" \
     https://r.jina.ai/https://example.com/live-data
```

--------------------------------

### Search Web with Jina Reader

Source: https://github.com/jina-ai/reader/blob/main/README.md

Use `https://s.jina.ai/` followed by a URL-encoded query to search the web. This allows LLMs to access current world knowledge.

```url
https://s.jina.ai/Who%20will%20win%202024%20US%20presidential%20election%3F
```

--------------------------------

### Set viewport for responsive rendering

Source: https://context7.com/jina-ai/reader/llms.txt

Define a specific viewport for rendering responsive web pages. This is useful for testing how a page appears on different devices.

```bash
curl -X POST https://r.jina.ai/ \
     -H "Content-Type: application/json" \
     -d '{"url":"https://example.com","viewport":{"width":375,"height":812,"isMobile":true}}'
```

--------------------------------

### Token budget: Max tokens

Source: https://context7.com/jina-ai/reader/llms.txt

Limit the response size by setting a maximum token count. 'X-Max-Tokens: 2000' will trim the response to approximately 2000 tokens.

```bash
curl -H "X-Max-Tokens: 2000" \
     https://r.jina.ai/https://wikipedia.org/wiki/Large_language_model
```

--------------------------------

### Control Caching and Timeout with curl

Source: https://context7.com/jina-ai/reader/llms.txt

Force bypass of cache and wait for full network idle for heavy JS apps by setting 'X-No-Cache: true' and 'X-Timeout: 30'.

```bash
# Force bypass cache and wait for full network idle (heavy JS apps)
curl -H "X-No-Cache: true" \
     -H "X-Timeout: 30" \
     https://r.jina.ai/https://app.example.com/dashboard
```

--------------------------------

### Markdown output: Strip images

Source: https://context7.com/jina-ai/reader/llms.txt

Configure the markdown output to strip all images. Use 'X-Retain-Images: none' to remove all image elements.

```bash
curl -H "X-Retain-Images: none" \
     https://r.jina.ai/https://example.com/article
```

--------------------------------

### Generate Image Alt-Text with X-With-Generated-Alt

Source: https://context7.com/jina-ai/reader/llms.txt

Automatically generates captions for images lacking alt attributes using the jina-vlm model. Captions are embedded in markdown. Can be combined with X-With-Images-Summary for a full image summary section. Requires an Authorization header for JSON output.

```bash
# Auto-caption all images on a Wikipedia page
curl -H "X-With-Generated-Alt: true" \
     https://r.jina.ai/https://en.wikipedia.org/wiki/Hubble_Space_Telescope
```

```bash
# With JSON output to inspect image captions programmatically
curl -H "Accept: application/json" \
     -H "X-With-Generated-Alt: true" \
     https://r.jina.ai/https://en.wikipedia.org/wiki/Hubble_Space_Telescope
```

```bash
# Combined: with alt text + full image summary section
curl -H "X-With-Generated-Alt: true" \
     -H "X-With-Images-Summary: true" \
     https://r.jina.ai/https://www.nasa.gov/missions/
```

--------------------------------

### In-Site Search

Source: https://github.com/jina-ai/reader/blob/main/README.md

Perform an in-site search by specifying the `site` query parameter. You can target multiple sites.

```APIDOC
## In-Site Search

### Description
Searches within specified websites.

### Method
GET

### Endpoint
`https://s.jina.ai/{search_query}`

### Parameters
#### Path Parameters
- **search_query** (string) - Required - The URL-encoded search query.

#### Query Parameters
- **site** (string) - Optional - The domain to search within. Can be specified multiple times.

### Request Example
```bash
curl 'https://s.jina.ai/When%20was%20Jina%20AI%20founded%3F?site=jina.ai&site=github.com'
```
```

--------------------------------

### Wait for CSS Selector with x-wait-for-selector

Source: https://github.com/jina-ai/reader/blob/main/README.md

Use the `x-wait-for-selector` header to make the Reader wait for a specific CSS selector to appear on the page before extracting content. This is useful when you know the exact element to target.

```bash
curl 'https://r.jina.ai/https://example.com/' -H 'x-wait-for-selector: #content'
```

--------------------------------

### Read PDF from URL with Jina Reader

Source: https://github.com/jina-ai/reader/blob/main/README.md

Jina Reader can now process PDF files directly from a URL. The output is an LLM-friendly format.

```url
https://r.jina.ai/https://www.nasa.gov/wp-content/uploads/2023/01/55583main_vision_space_exploration2.pdf
```

--------------------------------

### Markdown output: Contextual chunking

Source: https://context7.com/jina-ai/reader/llms.txt

Apply contextual chunking to the markdown output. 'X-Markdown-Chunking: s2' enables structured chunking with a depth of 2.

```bash
curl -H "X-Markdown-Chunking: s2" \
     https://r.jina.ai/https://long-documentation-page.example.com
```

--------------------------------

### Markdown output: Strip links

Source: https://context7.com/jina-ai/reader/llms.txt

Configure the markdown output to strip all links. Use 'X-Retain-Links: none' to remove all hyperlink elements.

```bash
curl -H "X-Retain-Links: none" \
     https://r.jina.ai/https://example.com/article
```

--------------------------------

### Markdown output: Chunking by heading level

Source: https://context7.com/jina-ai/reader/llms.txt

Enable markdown chunking based on heading levels. 'X-Markdown-Chunking: h2' injects a separator before each H2 heading.

```bash
curl -H "X-Markdown-Chunking: h2" \
     https://r.jina.ai/https://long-documentation-page.example.com
```

--------------------------------

### Remove cookie/GDPR overlays

Source: https://context7.com/jina-ai/reader/llms.txt

Automatically remove common overlay elements like cookie or GDPR banners. Set 'X-Remove-Overlay' to 'true' to enable this feature.

```bash
curl -H "X-Remove-Overlay: true" \
     https://r.jina.ai/https://news.site.example.com
```

--------------------------------

### Control response timing

Source: https://context7.com/jina-ai/reader/llms.txt

Explicitly control when the response is considered ready using the 'X-Respond-Timing' header. 'network-idle' waits until network activity has ceased.

```bash
curl -H "X-Respond-Timing: network-idle" \
     https://r.jina.ai/https://example.com/heavy-page
```

--------------------------------

### Forward Session Cookie with curl

Source: https://context7.com/jina-ai/reader/llms.txt

Forward a session cookie to maintain user sessions and prevent caching by using the 'X-Set-Cookie' header.

```bash
# Forward session cookie (result not cached)
curl -H "X-Set-Cookie: session=abc123; Domain=example.com; Path=/" \
     https://r.jina.ai/https://example.com/profile
```
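
The header value in the curl example follows the `name=value; Domain=...; Path=...` layout. A minimal Python sketch for assembling it is shown below; `x_set_cookie` is a hypothetical helper, not part of Reader, and it assumes a single cookie with only the `Domain` and `Path` attributes used above.

```python
def x_set_cookie(name: str, value: str, domain: str, path: str = "/") -> dict:
    """Build a header dict carrying one cookie in the layout used by the curl example."""
    for part in (name, value, domain, path):
        # Reject characters that would break the attribute list or the header line.
        if ";" in part or "\n" in part or "\r" in part:
            raise ValueError(f"illegal character in cookie component: {part!r}")
    return {"X-Set-Cookie": f"{name}={value}; Domain={domain}; Path={path}"}


print(x_set_cookie("session", "abc123", "example.com"))
```

The returned dict can be passed as `headers=` to an HTTP client such as httpx.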

--------------------------------

### Markdown output: GPT-OSS citation links

Source: https://context7.com/jina-ai/reader/llms.txt

Set the link retention policy to 'gpt-oss' for a specific citation link format: 【{id}†.*】. This is useful for academic or technical documentation.

```bash
curl -H "X-Retain-Links: gpt-oss" \
     https://r.jina.ai/https://docs.openai.com/api-reference
```

--------------------------------

### Token budget: Parse usage (Python)

Source: https://context7.com/jina-ai/reader/llms.txt

Parse token usage from a JSON response using Python's httpx library. The 'usage.tokens' field in the JSON response contains the token count.

```python
import httpx

r = httpx.get(
    "https://r.jina.ai/https://en.wikipedia.org/wiki/Python_(programming_language)",
    headers={"Accept": "application/json", "Authorization": "Bearer YOUR_API_KEY"}
)
r.raise_for_status()  # fail early on HTTP errors instead of parsing an error body
data = r.json()
print(f"Tokens used: {data.get('usage', {}).get('tokens')}")
```
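
To act on the reported count, for example to skip documents that exceed a token budget, the lookup can be wrapped in small helpers. This is a sketch only: `tokens_used` and `within_budget` are hypothetical names, the sample payload is made up, and the code assumes the `usage.tokens` location described above.

```python
from typing import Optional


def tokens_used(payload: dict) -> Optional[int]:
    """Read `usage.tokens` from a parsed JSON response, or None if it is absent."""
    usage = payload.get("usage")
    if not isinstance(usage, dict):
        return None
    tokens = usage.get("tokens")
    return tokens if isinstance(tokens, int) else None


def within_budget(payload: dict, budget: int) -> bool:
    """True when the token count is known and does not exceed `budget`."""
    tokens = tokens_used(payload)
    return tokens is not None and tokens <= budget


sample = {"usage": {"tokens": 1200}}  # made-up payload for illustration
print(within_budget(sample, 2000))
```

Treating a missing count as "not within budget" is a conservative design choice; callers that prefer to accept unknown counts can check `tokens_used` directly.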

--------------------------------

### Enable Server-Sent Events Streaming with curl

Source: https://context7.com/jina-ai/reader/llms.txt

Enable Server-Sent Events (SSE) streaming by setting the 'Accept: text/event-stream' header. This is useful for sites that load content dynamically via JavaScript or when immediate processing is needed.

```bash
# Stream Wikipedia main page — last event contains the most complete result
curl -H "Accept: text/event-stream" \
     https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page

# Real example: site that lazy-loads after full load
# Standard mode returns incomplete page; streaming waits longer:
curl -H "Accept: text/event-stream" \
     -H "X-No-Cache: true" \
     https://r.jina.ai/https://access.redhat.com/security/cve/CVE-2023-45853
```

--------------------------------

### Parse in Python - Use Final Data Event

Source: https://context7.com/jina-ai/reader/llms.txt

Read an SSE stream and keep only the last data event, which carries the most complete version of the page content. Requires the third-party httpx package; json is part of the Python standard library.

```python
import json

import httpx

# Stream the response and keep only the most recent `data:` event.
with httpx.stream("GET", "https://r.jina.ai/https://example.com",
                  headers={"Accept": "text/event-stream",
                           "Authorization": "Bearer YOUR_API_KEY"}) as r:
    last_data = None
    for line in r.iter_lines():
        if line.startswith("data: "):
            last_data = json.loads(line[6:])

# Guard against a stream that produced no data events.
if last_data is not None:
    print(last_data["content"])
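
The parsing step can also be separated from the network call so it can be exercised offline. The sketch below follows the general SSE framing rules (events end at a blank line, multiple `data:` lines join with a newline); `last_sse_data` is a hypothetical helper and the sample lines are invented, not captured Reader output.

```python
import json
from typing import Iterable, Optional


def last_sse_data(lines: Iterable[str]) -> Optional[dict]:
    """Return the JSON payload of the final `data:` event, or None if there is none."""
    last = None
    buffer = []  # data lines belonging to the event currently being read
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            # One optional space after the colon is not part of the value.
            value = line[5:]
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            # A blank line ends the event; multi-line data joins with "\n".
            last = json.loads("\n".join(buffer))
            buffer = []
    if buffer:  # stream ended without a trailing blank line
        last = json.loads("\n".join(buffer))
    return last


sample = [
    'data: {"content": "partial"}',
    "",
    "event: data",
    'data: {"content": "complete"}',
    "",
]
print(last_sse_data(sample)["content"])  # prints "complete"
```

With httpx, the same function could be fed `r.iter_lines()` from inside the `httpx.stream(...)` block.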

--------------------------------

### Target Specific Sections and Remove Elements with curl

Source: https://context7.com/jina-ai/reader/llms.txt

Restrict extraction to the elements matched by a CSS selector with 'X-Target-Selector', and drop unwanted elements such as cookie banners with 'X-Remove-Selector'.

```bash
# Focus on one CSS section, remove cookie banners
curl -H "X-Target-Selector: article.main-content" \
     -H "X-Remove-Selector: .cookie-banner, #newsletter-popup" \
     https://r.jina.ai/https://www.nytimes.com/2024/01/01/technology/ai.html
```

--------------------------------

### Request Headers for Reader API

Source: https://github.com/jina-ai/reader/blob/main/README.md

Control the behavior of the Reader API using various request headers for features like image captioning, cookie forwarding, response format, proxying, caching, and element selection.

```APIDOC
## Reader API Request Headers

### Description
Customize the Reader API's behavior using the following request headers.

### Headers
- **x-with-generated-alt**: `true` - Enable image caption feature.
- **x-set-cookie**: (string) - Forward cookie settings to the target site. Requests with cookies are not cached.
- **x-respond-with**: `markdown` | `html` | `text` | `screenshot` - Specify the response format. `markdown` returns markdown without going through readability, `html` returns `documentElement.outerHTML`, `text` returns `document.body.innerText`, and `screenshot` returns the URL of the page's screenshot.
- **x-proxy-url**: (string) - Specify a proxy server URL.
- **x-cache-tolerance**: (integer) - Customize cache tolerance in seconds.
- **x-no-cache**: `true` - Bypass the cached page (equivalent to `x-cache-tolerance: 0`).
- **x-target-selector**: (string) - CSS selector to target a specific element for content extraction.
- **x-wait-for-selector**: (string) - CSS selector to wait for until the element is rendered.
```
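
Several of the headers listed above can be combined in one request. As a rough sketch, a client might translate keyword options into a header dict as follows; `reader_headers` is a hypothetical helper, not part of Reader, and the conflict check between `no_cache` and `cache_tolerance` is an assumption based on the stated equivalence with `x-cache-tolerance: 0`.

```python
from typing import Dict, Optional

ALLOWED_FORMATS = {"markdown", "html", "text", "screenshot"}


def reader_headers(
    respond_with: Optional[str] = None,
    target_selector: Optional[str] = None,
    wait_for_selector: Optional[str] = None,
    cache_tolerance: Optional[int] = None,
    no_cache: bool = False,
    with_generated_alt: bool = False,
    proxy_url: Optional[str] = None,
) -> Dict[str, str]:
    """Translate keyword options into Reader request headers (hypothetical helper)."""
    if respond_with is not None and respond_with not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported x-respond-with value: {respond_with!r}")
    if no_cache and cache_tolerance not in (None, 0):
        raise ValueError("no_cache conflicts with a non-zero cache_tolerance")

    headers: Dict[str, str] = {}
    if respond_with is not None:
        headers["x-respond-with"] = respond_with
    if target_selector is not None:
        headers["x-target-selector"] = target_selector
    if wait_for_selector is not None:
        headers["x-wait-for-selector"] = wait_for_selector
    if cache_tolerance is not None:
        headers["x-cache-tolerance"] = str(cache_tolerance)  # seconds
    if no_cache:
        headers["x-no-cache"] = "true"
    if with_generated_alt:
        headers["x-with-generated-alt"] = "true"
    if proxy_url is not None:
        headers["x-proxy-url"] = proxy_url
    return headers


print(reader_headers(respond_with="text", target_selector="article", no_cache=True))
```

Only options that are explicitly set produce a header, so the service defaults apply to everything else.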

--------------------------------

### Cache control: Prevent caching

Source: https://context7.com/jina-ai/reader/llms.txt

Prevent the result of a specific request from being cached by sending the 'DNT: 1' header. This is useful for sensitive or frequently changing data.

```bash
curl -H "DNT: 1" \
     https://r.jina.ai/https://example.com/sensitive-page
```
