### Install code-chunk using npm or bun

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Install the code-chunk library with either bun or npm. This is the first step to using the library in your project.

```bash
bun add code-chunk
# or
npm install code-chunk
```

--------------------------------

### Basic code chunking in TypeScript

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Demonstrates basic usage of the `chunk` function in TypeScript. It takes a file path and source code as input and returns an array of code chunks, each with associated context.

```typescript
import { chunk } from 'code-chunk'

const chunks = await chunk('src/user.ts', sourceCode)

for (const c of chunks) {
  console.log(c.text)
  console.log(c.context.scope)    // [{ name: 'UserService', type: 'class' }]
  console.log(c.context.entities) // [{ name: 'getUser', type: 'method', ... }]
}
```

--------------------------------

### Creating a reusable chunker instance in TypeScript

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Demonstrates how to create a reusable `chunker` instance with custom configuration. This is efficient when processing multiple files with the same chunking settings.

```typescript
import { createChunker } from 'code-chunk'

const chunker = createChunker({
  maxChunkSize: 2048,
  contextMode: 'full',
  siblingDetail: 'signatures',
})

for (const file of files) {
  const chunks = await chunker.chunk(file.path, file.content)
}
```

--------------------------------

### Create Reusable Chunker Instance (TypeScript)

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

Creates a reusable chunker instance with preconfigured default options. This is useful when processing multiple files with the same configuration. It allows for custom defaults and overriding options per file.

```typescript
import { createChunker } from 'code-chunk'
import * as fs from 'fs'

// Create chunker with custom defaults
const chunker = createChunker({
  maxChunkSize: 2048,
  contextMode: 'full',
  siblingDetail: 'signatures',
  overlapLines: 5  // Include 5 lines from previous chunk for context
})

const sourceFiles = [
  'src/auth/login.ts',
  'src/auth/register.ts',
  'src/services/user.ts',
  'src/services/payment.ts'
]

for (const filepath of sourceFiles) {
  const code = fs.readFileSync(filepath, 'utf-8')

  // Use chunker.chunk() for array of all chunks
  const chunks = await chunker.chunk(filepath, code)
  console.log(`${filepath}: ${chunks.length} chunks`)

  // Or use chunker.stream() for incremental processing
  for await (const chunk of chunker.stream(filepath, code)) {
    await indexChunk(filepath, chunk)
  }
}

// Override options per-file if needed
const specialChunks = await chunker.chunk('src/config.ts', configCode, {
  maxChunkSize: 500,  // Smaller chunks for config files
  contextMode: 'minimal'
})
```
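The per-file override at the end of the example above can be pictured as a shallow merge of the instance defaults with the options passed to a single call. The sketch below is an assumption about that behaviour, not the library's code: `ChunkOptionsSketch` and `resolveOptions` are hypothetical names, and only the option names and values are taken from the examples in this document.

```typescript
// Hypothetical option shape; field names mirror the options used above.
interface ChunkOptionsSketch {
  maxChunkSize?: number
  contextMode?: 'none' | 'minimal' | 'full'
  siblingDetail?: string
  overlapLines?: number
}

// Shallow merge: keys present in `overrides` win, everything else
// falls back to the instance defaults.
function resolveOptions(
  defaults: ChunkOptionsSketch,
  overrides: ChunkOptionsSketch = {}
): ChunkOptionsSketch {
  return { ...defaults, ...overrides }
}

const instanceDefaults: ChunkOptionsSketch = {
  maxChunkSize: 2048,
  contextMode: 'full',
  siblingDetail: 'signatures',
  overlapLines: 5
}

const effective = resolveOptions(instanceDefaults, {
  maxChunkSize: 500,
  contextMode: 'minimal'
})
console.log(effective)
```

Under this reading, options that a call does not mention (here `siblingDetail` and `overlapLines`) keep the values given to `createChunker`.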

--------------------------------

### Custom Context Formatting with formatChunkWithContext

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

`formatChunkWithContext` prepends semantic context to a chunk's text, which gives precise control over the text that is sent for embedding. It optionally accepts overlap text from the previous chunk to improve continuity.

```typescript
import { chunk, formatChunkWithContext } from 'code-chunk'

const chunks = await chunk('src/service.ts', sourceCode, {
  contextMode: 'none'  // Get raw chunks without automatic formatting
})

for (const c of chunks) {
  // Custom formatting with optional overlap text
  const previousChunkLastLines = chunks[c.index - 1]?.text.split('\n').slice(-3).join('\n')

  const formattedText = formatChunkWithContext(
    c.text,
    c.context,
    previousChunkLastLines  // Optional overlap for continuity
  )

  console.log(formattedText)
  // Output:
  // # src/service.ts
  // # Scope: MyService > processData
  // # Defines: async transform(data: Input): Promise<Output>
  // # Uses: lodash, validator
  // # After: validate
  // # Before: save
  //
  // # ...
  // <last 3 lines from previous chunk>
  // # ---
  //   async transform(data: Input): Promise<Output> {
  //     return _.mapValues(data, validate)
  //   }

  const embedding = await embed(formattedText)
}
```
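To make the output layout above concrete, here is a minimal sketch of how such a header could be assembled by hand. It is not the library's implementation: `ContextSketch` and `buildHeader` are hypothetical, the context shape is simplified, and the layout simply follows the sample output in the comment above.

```typescript
// Hypothetical, simplified context shape for illustration only.
interface ContextSketch {
  filepath: string
  scope: { name: string }[]
  signatures: string[]
  imports: string[]
}

// Build the "# ..." header lines, then append optional overlap and the chunk text.
function buildHeader(text: string, ctx: ContextSketch, overlap?: string): string {
  const lines: string[] = [`# ${ctx.filepath}`]
  if (ctx.scope.length > 0) {
    lines.push(`# Scope: ${ctx.scope.map((s) => s.name).join(' > ')}`)
  }
  if (ctx.signatures.length > 0) lines.push(`# Defines: ${ctx.signatures.join(', ')}`)
  if (ctx.imports.length > 0) lines.push(`# Uses: ${ctx.imports.join(', ')}`)
  lines.push('')
  // Overlap from the previous chunk sits between "# ..." and "# ---" markers.
  if (overlap) lines.push('# ...', overlap, '# ---')
  lines.push(text)
  return lines.join('\n')
}

const formatted = buildHeader('return _.mapValues(data, validate)', {
  filepath: 'src/service.ts',
  scope: [{ name: 'MyService' }, { name: 'processData' }],
  signatures: ['async transform(data: Input): Promise<Output>'],
  imports: ['lodash', 'validator']
})
console.log(formatted)
```

Writing the header by hand like this is only worthwhile when the embedding model needs a different layout; otherwise `contextualizedText` already carries the formatted text.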

--------------------------------

### Effect-Based Batch Processing with chunkBatchEffect and chunkBatchStreamEffect

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

Provides effect-native batch processing capabilities using `chunkBatchEffect` and `chunkBatchStreamEffect`. `chunkBatchEffect` returns an Effect containing all results, suitable for integrating batch chunking into Effect applications. `chunkBatchStreamEffect` processes batches as a stream, allowing for more granular control and processing of results as they become available.

```typescript
import { chunkBatchEffect, chunkBatchStreamEffect } from 'code-chunk'
import { Effect, Stream, pipe } from 'effect'

const files = [
  { filepath: 'src/api/users.ts', code: usersApiCode },
  { filepath: 'src/api/products.ts', code: productsApiCode },
  { filepath: 'src/models/user.ts', code: userModelCode }
]

// Batch processing with Effect
const batchProgram = pipe(
  chunkBatchEffect(files, { concurrency: 5, maxChunkSize: 1500 }),
  Effect.flatMap((results) =>
    Effect.forEach(results, (result) => {
      if (result.error) {
        return Effect.logError(`Failed: ${result.filepath}`)
      }
      return Effect.log(`Success: ${result.filepath} (${result.chunks.length} chunks)`)
    })
  )
)

await Effect.runPromise(batchProgram)

// Streaming batch with Effect
const streamProgram = pipe(
  chunkBatchStreamEffect(files, { concurrency: 3 }),
  Stream.filter((result) => result.error === null),
  Stream.flatMap((result) => Stream.fromIterable(result.chunks)),
  Stream.mapEffect((chunk) =>
    Effect.tryPromise(() => indexChunk(chunk))
  ),
  Stream.runCollect
)

const indexed = await Effect.runPromise(streamProgram)
console.log(`Indexed ${indexed.length} chunks`)
```

--------------------------------

### Batch Process Files with Error Handling (TypeScript)

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Processes multiple files concurrently using `chunkBatch`. Options set the maximum chunk size and the concurrency, and an `onProgress` callback reports each completed file. The returned promise resolves to one result per file, so individual failures can be handled without aborting the batch.

```typescript
import { chunkBatch } from 'code-chunk'

const files = [
  { filepath: 'src/user.ts', code: userCode },
  { filepath: 'src/auth.ts', code: authCode },
  { filepath: 'lib/utils.py', code: utilsCode },
]

const results = await chunkBatch(files, {
  maxChunkSize: 1500,
  concurrency: 10,
  onProgress: (done, total, path, success) => {
    console.log(`[${done}/${total}] ${path}: ${success ? 'ok' : 'failed'}`)
  }
})

for (const result of results) {
  if (result.error) {
    console.error(`Failed: ${result.filepath}`, result.error)
  } else {
    await indexChunks(result.filepath, result.chunks)
  }
}
```

--------------------------------

### Cloudflare Workers WASM Chunker Integration

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

Provides a WASM-based chunker for environments like Cloudflare Workers. It requires importing WASM binaries for Tree-sitter and specific languages. The `createChunker` function initializes the WASM chunker with configuration for languages and chunking options. The chunker can be used for standard chunking, streaming, and batch processing.

```typescript
import { createChunker, type WasmConfig } from 'code-chunk/wasm'

// Import WASM binaries (Cloudflare Workers style)
import treeSitterWasm from 'web-tree-sitter/tree-sitter.wasm'
import typescriptWasm from 'tree-sitter-typescript/tree-sitter.tsx.wasm'
import pythonWasm from 'tree-sitter-python/tree-sitter-python.wasm'

const wasmConfig: WasmConfig = {
  treeSitter: treeSitterWasm,
  languages: {
    typescript: typescriptWasm,
    python: pythonWasm
    // Only include languages you need to minimize bundle size
  }
}

// Create WASM-based chunker (async initialization required)
const chunker = await createChunker(wasmConfig, {
  maxChunkSize: 1500,
  contextMode: 'full'
})

// Use the same API as the native chunker
export default {
  async fetch(request: Request): Promise<Response> {
    const { filepath, code } = await request.json()

    try {
      const chunks = await chunker.chunk(filepath, code)
      return Response.json({ chunks })
    } catch (error) {
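      // `error` is `unknown` under strict TypeScript settings, so narrow it before reading `.message`
      return Response.json({ error: error instanceof Error ? error.message : String(error) }, { status: 400 })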
    }
  }
}

// Streaming also works
for await (const chunk of chunker.stream('api/handler.ts', handlerCode)) {
  await processChunk(chunk)
}

// Batch processing
const results = await chunker.chunkBatch(files, { concurrency: 5 })
```
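`request.json()` returns untyped data, so a handler like the one above benefits from validating the body before handing it to the chunker. The guard below is a sketch of that step; `ChunkRequest` and `isChunkRequest` are hypothetical helpers and not part of code-chunk.

```typescript
// Hypothetical request shape matching the handler's destructuring above.
interface ChunkRequest {
  filepath: string
  code: string
}

// Type guard: narrows unknown JSON to ChunkRequest only when both fields are strings.
function isChunkRequest(body: unknown): body is ChunkRequest {
  if (typeof body !== 'object' || body === null) return false
  const candidate = body as { filepath?: unknown; code?: unknown }
  return typeof candidate.filepath === 'string' && typeof candidate.code === 'string'
}

const goodBody: unknown = JSON.parse('{"filepath":"src/a.ts","code":"export const a = 1"}')
const badBody: unknown = JSON.parse('{"filepath":42}')

console.log(isChunkRequest(goodBody)) // true
console.log(isChunkRequest(badBody))  // false
```

In the Worker, a failed guard would typically produce a 400 response before `chunker.chunk` is called.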

--------------------------------

### Streaming Batch Results (TypeScript)

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

Streams batch results as files complete processing. Results are yielded immediately when each file finishes, enabling real-time progress and early processing. This is ideal for applications that need to react to chunking results as they become available.

```typescript
import { chunkBatchStream } from 'code-chunk'

const files = [
  { filepath: 'src/auth.ts', code: authCode },
  { filepath: 'src/user.ts', code: userCode },
  { filepath: 'src/payment.ts', code: paymentCode },
  { filepath: 'lib/utils.py', code: utilsCode },
  { filepath: 'lib/helpers.go', code: helpersCode }
]

console.log('Starting batch processing...')

let processed = 0
let indexed = 0

// Results stream as files complete (not in order)
for await (const result of chunkBatchStream(files, { concurrency: 5 })) {
  processed++

  if (result.error) {
    console.error(`[${processed}/${files.length}] ${result.filepath} FAILED:`, result.error.message)
    continue
  }

  console.log(`[${processed}/${files.length}] ${result.filepath}: ${result.chunks.length} chunks`)

  // Index immediately as results arrive
  for (const chunk of result.chunks) {
    await vectorDB.upsert({
      id: `${result.filepath}:${chunk.index}`,
      embedding: await embed(chunk.contextualizedText),
      metadata: { filepath: result.filepath, ...chunk.lineRange }
    })
    indexed++
  }
}

console.log(`Completed: ${indexed} chunks indexed from ${processed} files`)
```
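The "results arrive as files complete, not in input order" behaviour can be illustrated with a small async generator. This is a generic sketch of the pattern under stated assumptions, not how `chunkBatchStream` is implemented: `inCompletionOrder` and `delayed` are hypothetical names, and tasks are assumed to capture their own errors rather than reject.

```typescript
// Yield task results in completion order, running at most `limit` tasks at once.
async function* inCompletionOrder<T>(
  tasks: (() => Promise<T>)[],
  limit: number
): AsyncGenerator<T> {
  const running = new Map<number, Promise<{ id: number; value: T }>>()
  let next = 0

  // Start the next pending task and remember it by id.
  const launch = (): void => {
    const id = next++
    running.set(id, tasks[id]().then((value) => ({ id, value })))
  }

  while (next < tasks.length && running.size < limit) launch()

  while (running.size > 0) {
    // Wait for whichever running task settles first.
    const { id, value } = await Promise.race(Array.from(running.values()))
    running.delete(id)
    if (next < tasks.length) launch()
    yield value
  }
}

// Hypothetical stand-in for per-file work: resolves with `value` after `ms` milliseconds.
const delayed = <T>(ms: number, value: T) => (): Promise<T> =>
  new Promise<T>((resolve) => setTimeout(() => resolve(value), ms))

async function demo(): Promise<string[]> {
  const seen: string[] = []
  const tasks = [delayed(80, 'slow.ts'), delayed(10, 'fast.ts'), delayed(40, 'medium.ts')]
  for await (const filepath of inCompletionOrder(tasks, 3)) seen.push(filepath)
  return seen
}

demo().then((seen) => console.log(seen))
```

Because consumers see results in completion order, any identifier written to an index should come from the result itself (such as `result.filepath`) rather than from a loop counter.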

--------------------------------

### Chunk Code with Context (TypeScript)

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

The primary `chunk` function parses source code, extracts entities, builds a scope tree, and returns semantic code chunks with rich metadata. It supports various options for chunking and context retrieval. This is suitable for batch processing or when all chunks are needed at once.

```typescript
import { chunk } from 'code-chunk'

const sourceCode = `
import { Database } from './db'

class UserService {
  private db: Database

  constructor(db: Database) {
    this.db = db
  }

  async getUser(id: string): Promise<User> {
    return this.db.query('SELECT * FROM users WHERE id = ?', [id])
  }

  async createUser(data: UserData): Promise<User> {
    return this.db.insert('users', data)
  }
}
`

const chunks = await chunk('src/services/user.ts', sourceCode, {
  maxChunkSize: 1500,
  contextMode: 'full',
  siblingDetail: 'signatures'
})

for (const c of chunks) {
  console.log(`Chunk ${c.index + 1}/${c.totalChunks}`)
  console.log('Text:', c.text)
  console.log('Lines:', c.lineRange.start, '-', c.lineRange.end)
  console.log('Scope:', c.context.scope)
  // [{ name: 'UserService', type: 'class' }]
  console.log('Entities:', c.context.entities)
  // [{ name: 'getUser', type: 'method', signature: 'async getUser(id: string): Promise<User>' }]
  console.log('Imports:', c.context.imports)
  // [{ name: 'Database', source: './db' }]

  // Use contextualizedText for embeddings
  const embedding = await embedModel.embed(c.contextualizedText)
  await vectorDB.upsert({ id: `user.ts:${c.index}`, embedding, metadata: c.lineRange })
}

// Example contextualizedText output:
// # src/services/user.ts
// # Scope: UserService
// # Defines: async getUser(id: string): Promise<User>
// # Uses: Database
// # After: constructor
//
//   async getUser(id: string): Promise<User> {
//     return this.db.query('SELECT * FROM users WHERE id = ?', [id])
//   }
```

--------------------------------

### Using contextualized text for embeddings in TypeScript

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Shows how to use the `contextualizedText` property of code chunks for generating embeddings. This pre-formatted text includes semantic context, which can improve embedding quality for RAG systems.

```typescript
for (const c of chunks) {
  const embedding = await embed(c.contextualizedText)
  await vectorDB.upsert({
    id: `${filepath}:${c.index}`,
    embedding,
    metadata: { filepath, lines: c.lineRange }
  })
}
```
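Separating record construction from the embedding and database calls makes the loop above easier to test. The sketch below builds the records as plain data; `ChunkSketch`, `UpsertRecordSketch`, and `toRecords` are hypothetical names, and only the `index`, `contextualizedText`, and `lineRange` fields come from the examples in this document.

```typescript
// Hypothetical minimal chunk shape; only the fields used in the loop above.
interface ChunkSketch {
  index: number
  contextualizedText: string
  lineRange: { start: number; end: number }
}

// Hypothetical record shape; `text` is what would be passed to the embedding model.
interface UpsertRecordSketch {
  id: string
  text: string
  metadata: { filepath: string; lines: { start: number; end: number } }
}

// Map chunks to records keyed by `${filepath}:${index}`, the id scheme used above.
function toRecords(filepath: string, chunks: ChunkSketch[]): UpsertRecordSketch[] {
  return chunks.map((c) => ({
    id: `${filepath}:${c.index}`,
    text: c.contextualizedText,
    metadata: { filepath, lines: c.lineRange }
  }))
}

const records = toRecords('src/user.ts', [
  { index: 0, contextualizedText: '# src/user.ts\nclass UserService {}', lineRange: { start: 1, end: 1 } },
  { index: 1, contextualizedText: '# src/user.ts\nfunction helper() {}', lineRange: { start: 3, end: 3 } }
])
console.log(records.map((r) => r.id))
```

Ids derived from the file path and chunk index stay stable across runs, so re-indexing a file overwrites existing entries; entries for indices that no longer exist would still need separate cleanup.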

--------------------------------

### Detect Programming Language from File Path

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

Detects a file's programming language from its extension, returning the language name or `null` when the extension is not supported. The `LANGUAGE_EXTENSIONS` constant exposes the extension-to-language mapping. To bypass detection, pass the `language` option to `chunk`.

```typescript
import { chunk, detectLanguage, LANGUAGE_EXTENSIONS } from 'code-chunk'

// Detect language from filepath
console.log(detectLanguage('src/app.ts'))       // 'typescript'
console.log(detectLanguage('lib/utils.tsx'))    // 'typescript'
console.log(detectLanguage('scripts/build.js')) // 'javascript'
console.log(detectLanguage('src/main.py'))      // 'python'
console.log(detectLanguage('cmd/server.go'))    // 'go'
console.log(detectLanguage('src/Main.java'))    // 'java'
console.log(detectLanguage('lib/core.rs'))      // 'rust'
console.log(detectLanguage('README.md'))        // null (unsupported)

// Access the full extension mapping
console.log(LANGUAGE_EXTENSIONS)
// {
//   '.ts': 'typescript', '.tsx': 'typescript', '.mts': 'typescript', '.cts': 'typescript',
//   '.js': 'javascript', '.jsx': 'javascript', '.mjs': 'javascript', '.cjs': 'javascript',
//   '.py': 'python', '.pyi': 'python',
//   '.rs': 'rust',
//   '.go': 'go',
//   '.java': 'java'
// }

// Use with chunk to override detection
const chunks = await chunk('config.txt', tsCode, {
  language: 'typescript'  // Force TypeScript parsing
})
```
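Extension-based detection amounts to a table lookup. The following sketch shows the idea with a subset of the mapping printed above; `EXTENSIONS_SKETCH` and `detectByExtension` are hypothetical, and details such as dotfile handling are assumptions rather than documented library behaviour.

```typescript
// Subset of the mapping printed above; extend as needed.
const EXTENSIONS_SKETCH: { [ext: string]: string } = {
  '.ts': 'typescript',
  '.tsx': 'typescript',
  '.js': 'javascript',
  '.py': 'python',
  '.rs': 'rust',
  '.go': 'go',
  '.java': 'java'
}

// Look up the language by the text from the last dot of the file name onwards.
function detectByExtension(filepath: string): string | null {
  const fileName = filepath.slice(filepath.lastIndexOf('/') + 1)
  const dot = fileName.lastIndexOf('.')
  if (dot <= 0) return null // no extension, or a dotfile such as ".gitignore"
  return EXTENSIONS_SKETCH[fileName.slice(dot)] ?? null
}

console.log(detectByExtension('src/app.ts')) // 'typescript'
console.log(detectByExtension('README.md'))  // null
console.log(detectByExtension('Makefile'))   // null
```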

--------------------------------

### Streaming large files with code-chunk in TypeScript

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Illustrates how to process large files incrementally using the `chunkStream` function. This is useful for managing memory when dealing with very large code files.

```typescript
import { chunkStream } from 'code-chunk'

for await (const c of chunkStream('src/large.ts', code)) {
  await processChunk(c) // your own handler; a bare `process` would collide with the Node.js global
}
```

--------------------------------

### Effect-Based Streaming with chunkStreamEffect

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

Utilizes `chunkStreamEffect` for effect-native streaming of code chunks. This function returns an Effect Stream, allowing seamless integration with the Effect ecosystem for composable pipelines. It processes code from a specified file, applies transformations, and indexes the resulting chunks.

```typescript
import { chunkStreamEffect } from 'code-chunk'
import { Effect, Stream, pipe } from 'effect'

const sourceCode = `
export function fibonacci(n: number): number {
  if (n <= 1) return n
  return fibonacci(n - 1) + fibonacci(n - 2)
}

export function factorial(n: number): number {
  if (n <= 1) return 1
  return n * factorial(n - 1)
}
`

// Create a processing pipeline using Effect
const program = pipe(
  chunkStreamEffect('src/math.ts', sourceCode, { maxChunkSize: 500 }),
  Stream.tap((chunk) =>
    Effect.log(`Processing chunk ${chunk.index}: ${chunk.context.entities.map(e => e.name).join(', ')}`)
  ),
  Stream.mapEffect((chunk) =>
    Effect.tryPromise(async () => ({
      chunk,
      embedding: await embedModel.embed(chunk.contextualizedText)
    }))
  ),
  Stream.mapEffect(({ chunk, embedding }) =>
    Effect.tryPromise(() => vectorDB.upsert({
      id: `math:${chunk.index}`,
      embedding,
      metadata: chunk.context
    }))
  ),
  Stream.runCollect
)

// Execute the pipeline
const results = await Effect.runPromise(program)
console.log(`Indexed ${results.length} chunks`)
```

--------------------------------

### Integrate Chunking into Effect-based Pipelines (TypeScript)

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Integrates code chunking into Effect-based pipelines using `chunkStreamEffect`. This function returns an Effect-native `Stream` of chunks, allowing for composable and robust asynchronous data processing within the Effect ecosystem. It demonstrates logging each chunk's text as it's processed.

```typescript
import { chunkStreamEffect } from 'code-chunk'
import { Effect, Stream } from 'effect'

const program = Stream.runForEach(
  chunkStreamEffect('src/utils.ts', code),
  (chunk) => Effect.log(chunk.text)
)

await Effect.runPromise(program)
```

--------------------------------

### Concurrent Batch Processing (TypeScript)

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

Processes multiple files concurrently with per-file error handling. Failed files don't stop the batch; errors are captured in the result. This function is useful for processing large numbers of files efficiently.

```typescript
import { chunkBatch } from 'code-chunk'
import * as fs from 'fs'
import glob from 'fast-glob'

// Load all TypeScript files
const filePaths = await glob('src/**/*.ts')
const files = filePaths.map(filepath => ({
  filepath,
  code: fs.readFileSync(filepath, 'utf-8')
}))

console.log(`Processing ${files.length} files...`)

const results = await chunkBatch(files, {
  maxChunkSize: 1500,
  concurrency: 10,  // Process 10 files in parallel
  onProgress: (completed, total, filepath, success) => {
    const status = success ? 'OK' : 'FAILED'
    console.log(`[${completed}/${total}] ${filepath}: ${status}`)
  }
})

// Process results
let totalChunks = 0
let failedFiles = 0

for (const result of results) {
  if (result.error) {
    console.error(`Error in ${result.filepath}:`, result.error.message)
    failedFiles++
  } else {
    totalChunks += result.chunks.length

    // Index successful chunks
    for (const chunk of result.chunks) {
      await vectorDB.upsert({
        id: `${result.filepath}:${chunk.index}`,
        embedding: await embed(chunk.contextualizedText),
        metadata: {
          filepath: result.filepath,
          lines: chunk.lineRange,
          entities: chunk.context.entities.map(e => e.name)
        }
      })
    }
  }
}

console.log(`Indexed ${totalChunks} chunks from ${files.length - failedFiles} files`)
console.log(`${failedFiles} files failed`)
```
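The combination of bounded concurrency and per-file error capture described above can be sketched with a small worker pool. This is an illustration of the general pattern, not the internals of `chunkBatch`: `runBatchSketch` and `ItemResult` are hypothetical, and the result shape (exactly one of `chunks` or `error` is non-null) is an assumption based on how the examples read the results.

```typescript
// Result shape modelled on the batch results above (an assumption, see lead-in).
type ItemResult<T> =
  | { filepath: string; chunks: T; error: null }
  | { filepath: string; chunks: null; error: Error }

async function runBatchSketch<T>(
  filepaths: string[],
  work: (filepath: string) => Promise<T>,
  concurrency: number
): Promise<ItemResult<T>[]> {
  const results: ItemResult<T>[] = new Array(filepaths.length)
  let cursor = 0

  // Each worker repeatedly claims the next unprocessed index until none remain.
  const worker = async (): Promise<void> => {
    while (cursor < filepaths.length) {
      const i = cursor++
      const filepath = filepaths[i]
      try {
        results[i] = { filepath, chunks: await work(filepath), error: null }
      } catch (e) {
        // A failure is recorded for this file only; other workers keep going.
        const error = e instanceof Error ? e : new Error(String(e))
        results[i] = { filepath, chunks: null, error }
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, filepaths.length))
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results // same order as the input
}

// Hypothetical per-file work that fails for one file.
const fakeChunk = async (filepath: string): Promise<string[]> => {
  if (filepath === 'broken.ts') throw new Error('parse failed')
  return [`chunk of ${filepath}`]
}

runBatchSketch(['a.ts', 'broken.ts', 'c.ts'], fakeChunk, 2).then((results) => {
  for (const r of results) console.log(r.filepath, r.error ? 'FAILED' : 'OK')
})
```

Claiming indices through a shared cursor is safe here because JavaScript runs each synchronous section to completion, so two workers never take the same file.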

--------------------------------

### Stream Batch Results as Files Complete (TypeScript)

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Streams results from batch file processing as each file completes using `chunkBatchStream`. This is useful for scenarios where immediate processing of completed files is desired without waiting for the entire batch. It iterates over results, processing chunks for files that were successfully processed.

```typescript
import { chunkBatchStream } from 'code-chunk'

for await (const result of chunkBatchStream(files, { concurrency: 5 })) {
  if (result.chunks) {
    await indexChunks(result.filepath, result.chunks)
  }
}
```

--------------------------------

### Typed Error Handling for Chunking Operations

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

The library exposes typed errors such as `ChunkingError` and `UnsupportedLanguageError`. They can be caught with a standard `try...catch` block or matched by tag in Effect pipelines. `ChunkingError` carries a `cause` property for inspecting the underlying failure.

```typescript
import { chunk, ChunkingError, UnsupportedLanguageError } from 'code-chunk'

async function processFile(filepath: string, code: string) {
  try {
    const chunks = await chunk(filepath, code)
    return { success: true, chunks }
  } catch (error) {
    if (error instanceof UnsupportedLanguageError) {
      console.warn(`Skipping unsupported file: ${error.filepath}`)
      return { success: false, reason: 'unsupported_language' }
    }

    if (error instanceof ChunkingError) {
      console.error(`Chunking failed: ${error.message}`)
      console.error('Cause:', error.cause)
      return { success: false, reason: 'chunking_error' }
    }

    throw error  // Re-throw unexpected errors
  }
}

// Effect-style error handling with _tag
import { Effect, Stream, pipe } from 'effect'
import { chunkStreamEffect } from 'code-chunk'

const program = pipe(
  chunkStreamEffect('src/app.ts', code),
  Stream.runCollect,
  Effect.catchTag('UnsupportedLanguageError', (e) =>
    Effect.succeed([])  // Return empty for unsupported files
  ),
  Effect.catchTag('ChunkingError', (e) =>
    Effect.fail(new Error(`Parse error: ${e.message}`))
  )
)
```
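Tag-based matching, as used by `Effect.catchTag` above, relies on each error carrying a string `_tag` discriminant. The sketch below shows the same idea in plain TypeScript with stand-in classes; `UnsupportedLanguageErrorSketch`, `ChunkingErrorSketch`, and `describeFailure` are hypothetical and only imitate the shape of the real exports.

```typescript
// Stand-in error classes for illustration; the real ones are exported by code-chunk.
class UnsupportedLanguageErrorSketch extends Error {
  readonly _tag = 'UnsupportedLanguageError' as const
  readonly filepath: string
  constructor(filepath: string) {
    super(`Unsupported language: ${filepath}`)
    this.filepath = filepath
  }
}

class ChunkingErrorSketch extends Error {
  readonly _tag = 'ChunkingError' as const
}

type KnownError = UnsupportedLanguageErrorSketch | ChunkingErrorSketch

// Switch on the `_tag` discriminant; TypeScript narrows `error` in each branch.
function describeFailure(error: KnownError): string {
  switch (error._tag) {
    case 'UnsupportedLanguageError':
      return `skip ${error.filepath}`
    case 'ChunkingError':
      return `report: ${error.message}`
  }
}

console.log(describeFailure(new UnsupportedLanguageErrorSketch('notes.md')))
console.log(describeFailure(new ChunkingErrorSketch('unexpected token')))
```

Unlike `instanceof`, a tag comparison does not depend on class identity.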

--------------------------------

### Stream Code Chunks for Large Files (TypeScript)

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

The `chunkStream` function is an async generator that yields semantic code chunks as they are produced, which avoids holding every chunk in memory at once. Because the total is not known upfront, `totalChunks` is reported as -1.

```typescript
import { chunkStream } from 'code-chunk'
import * as fs from 'fs'

const largeFile = fs.readFileSync('src/large-module.ts', 'utf-8')

// Process chunks incrementally as they're generated
for await (const chunk of chunkStream('src/large-module.ts', largeFile)) {
  console.log(`Processing chunk ${chunk.index}...`)
  console.log('Entities:', chunk.context.entities.map(e => e.name).join(', '))

  // Stream to vector database immediately
  const embedding = await embedModel.embed(chunk.contextualizedText)
  await vectorDB.upsert({
    id: `large-module:${chunk.index}`,
    embedding,
    metadata: {
      filepath: 'src/large-module.ts',
      lines: chunk.lineRange,
      scope: chunk.context.scope.map(s => s.name).join(' > ')
    }
  })

  // Nothing from this iteration is kept, so the chunk can be garbage-collected
  console.log(`Chunk ${chunk.index} indexed successfully`)
}

console.log('Finished processing large file')

```
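Since streamed chunks report `totalChunks` as -1, progress output that works for both `chunk` and `chunkStream` has to handle the unknown total. A small sketch, with `progressLabel` as a hypothetical helper:

```typescript
// Format a progress label that copes with an unknown total (-1 when streaming).
function progressLabel(index: number, totalChunks: number): string {
  const position = index + 1 // `index` is zero-based in the examples above
  return totalChunks < 0 ? `Chunk ${position}` : `Chunk ${position}/${totalChunks}`
}

console.log(progressLabel(0, 3))  // Chunk 1/3
console.log(progressLabel(4, -1)) // Chunk 5
```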
