### Install code-chunk using npm or bun

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Install the code-chunk library with either the npm or bun package manager. This is the first step to using the library in your project.

```bash
bun add code-chunk
# or
npm install code-chunk
```

--------------------------------

### Basic code chunking in TypeScript

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Demonstrates basic usage of the `chunk` function in TypeScript. It takes a file path and source code as input and returns an array of code chunks, each with associated context.

```typescript
import { chunk } from 'code-chunk'

const chunks = await chunk('src/user.ts', sourceCode)

for (const c of chunks) {
  console.log(c.text)
  console.log(c.context.scope)
  // [{ name: 'UserService', type: 'class' }]
  console.log(c.context.entities)
  // [{ name: 'getUser', type: 'method', ... }]
}
```

--------------------------------

### Creating a reusable chunker instance in TypeScript

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Demonstrates how to create a reusable `chunker` instance with custom configuration. This is efficient when processing multiple files with the same chunking settings.

```typescript
import { createChunker } from 'code-chunk'

const chunker = createChunker({
  maxChunkSize: 2048,
  contextMode: 'full',
  siblingDetail: 'signatures',
})

for (const file of files) {
  const chunks = await chunker.chunk(file.path, file.content)
}
```

--------------------------------

### Create Reusable Chunker Instance (TypeScript)

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

Creates a reusable chunker instance with preconfigured default options. This is useful when processing multiple files with the same configuration. Defaults can be customized at creation and overridden per file.
```typescript
import { createChunker } from 'code-chunk'
import * as fs from 'fs'
import * as path from 'path'

// Create chunker with custom defaults
const chunker = createChunker({
  maxChunkSize: 2048,
  contextMode: 'full',
  siblingDetail: 'signatures',
  overlapLines: 5 // Include 5 lines from previous chunk for context
})

const sourceFiles = [
  'src/auth/login.ts',
  'src/auth/register.ts',
  'src/services/user.ts',
  'src/services/payment.ts'
]

for (const filepath of sourceFiles) {
  const code = fs.readFileSync(filepath, 'utf-8')

  // Use chunker.chunk() for array of all chunks
  const chunks = await chunker.chunk(filepath, code)
  console.log(`${filepath}: ${chunks.length} chunks`)

  // Or use chunker.stream() for incremental processing
  for await (const chunk of chunker.stream(filepath, code)) {
    await indexChunk(filepath, chunk)
  }
}

// Override options per-file if needed
const specialChunks = await chunker.chunk('src/config.ts', configCode, {
  maxChunkSize: 500, // Smaller chunks for config files
  contextMode: 'minimal'
})
```

--------------------------------

### Custom Context Formatting with formatChunkWithContext

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

A utility function, `formatChunkWithContext`, allows for custom formatting of chunk text by prepending semantic context. This is particularly useful when precise control over the text format used for embedding is required. It can also incorporate optional overlap text from the previous chunk for better continuity.
```typescript
import { chunk, formatChunkWithContext } from 'code-chunk'

const chunks = await chunk('src/service.ts', sourceCode, {
  contextMode: 'none' // Get raw chunks without automatic formatting
})

for (const c of chunks) {
  // Custom formatting with optional overlap text
  const previousChunkLastLines = chunks[c.index - 1]?.text.split('\n').slice(-3).join('\n')

  const formattedText = formatChunkWithContext(
    c.text,
    c.context,
    previousChunkLastLines // Optional overlap for continuity
  )

  console.log(formattedText)
  // Output:
  // # src/service.ts
  // # Scope: MyService > processData
  // # Defines: async transform(data: Input): Promise
  // # Uses: lodash, validator
  // # After: validate
  // # Before: save
  //
  // # ...
  //
  // # ---
  // async transform(data: Input): Promise {
  //   return _.mapValues(data, validate)
  // }

  const embedding = await embed(formattedText)
}
```

--------------------------------

### Effect-Based Batch Processing with chunkBatchEffect and chunkBatchStreamEffect

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

Provides Effect-native batch processing capabilities using `chunkBatchEffect` and `chunkBatchStreamEffect`. `chunkBatchEffect` returns an Effect containing all results, suitable for integrating batch chunking into Effect applications. `chunkBatchStreamEffect` processes batches as a stream, allowing for more granular control and processing of results as they become available.
```typescript
import { chunkBatchEffect, chunkBatchStreamEffect } from 'code-chunk'
import { Effect, Stream, pipe } from 'effect'

const files = [
  { filepath: 'src/api/users.ts', code: usersApiCode },
  { filepath: 'src/api/products.ts', code: productsApiCode },
  { filepath: 'src/models/user.ts', code: userModelCode }
]

// Batch processing with Effect
const batchProgram = pipe(
  chunkBatchEffect(files, { concurrency: 5, maxChunkSize: 1500 }),
  Effect.flatMap((results) =>
    Effect.forEach(results, (result) => {
      if (result.error) {
        return Effect.logError(`Failed: ${result.filepath}`)
      }
      return Effect.log(`Success: ${result.filepath} (${result.chunks.length} chunks)`)
    })
  )
)

await Effect.runPromise(batchProgram)

// Streaming batch with Effect
const streamProgram = pipe(
  chunkBatchStreamEffect(files, { concurrency: 3 }),
  Stream.filter((result) => result.error === null),
  Stream.flatMap((result) => Stream.fromIterable(result.chunks)),
  Stream.mapEffect((chunk) =>
    Effect.tryPromise(() => indexChunk(chunk))
  ),
  Stream.runCollect
)

const indexed = await Effect.runPromise(streamProgram)
console.log(`Indexed ${indexed.length} chunks`)
```

--------------------------------

### Batch Process Files with Error Handling (TypeScript)

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Processes multiple files concurrently using `chunkBatch`. Options set the maximum chunk size and the concurrency, and an `onProgress` callback reports each file as it finishes. The returned promise resolves to one result per file, each carrying either its chunks or an error.

```typescript
import { chunkBatch } from 'code-chunk'

const files = [
  { filepath: 'src/user.ts', code: userCode },
  { filepath: 'src/auth.ts', code: authCode },
  { filepath: 'lib/utils.py', code: utilsCode },
]

const results = await chunkBatch(files, {
  maxChunkSize: 1500,
  concurrency: 10,
  onProgress: (done, total, path, success) => {
    console.log(`[${done}/${total}] ${path}: ${success ? 'ok' : 'failed'}`)
  }
})

for (const result of results) {
  if (result.error) {
    console.error(`Failed: ${result.filepath}`, result.error)
  } else {
    await indexChunks(result.filepath, result.chunks)
  }
}
```

--------------------------------

### Cloudflare Workers WASM Chunker Integration

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

Provides a WASM-based chunker for environments like Cloudflare Workers. It requires importing WASM binaries for Tree-sitter and for each language you want to parse. The `createChunker` function initializes the WASM chunker with configuration for languages and chunking options. The chunker can be used for standard chunking, streaming, and batch processing.

```typescript
import { createChunker, type WasmConfig } from 'code-chunk/wasm'

// Import WASM binaries (Cloudflare Workers style)
import treeSitterWasm from 'web-tree-sitter/tree-sitter.wasm'
import typescriptWasm from 'tree-sitter-typescript/tree-sitter.tsx.wasm'
import pythonWasm from 'tree-sitter-python/tree-sitter-python.wasm'

const wasmConfig: WasmConfig = {
  treeSitter: treeSitterWasm,
  languages: {
    typescript: typescriptWasm,
    python: pythonWasm
    // Only include languages you need to minimize bundle size
  }
}

// Create WASM-based chunker (async initialization required)
const chunker = await createChunker(wasmConfig, {
  maxChunkSize: 1500,
  contextMode: 'full'
})

// Use the same API as the native chunker
export default {
  async fetch(request: Request): Promise<Response> {
    const { filepath, code } = await request.json()

    try {
      const chunks = await chunker.chunk(filepath, code)
      return Response.json({ chunks })
    } catch (error) {
      // `error` is `unknown` in a catch clause, so narrow it before reading `.message`
      const message = error instanceof Error ? error.message : String(error)
      return Response.json({ error: message }, { status: 400 })
    }
  }
}

// Streaming also works
for await (const chunk of chunker.stream('api/handler.ts', handlerCode)) {
  await processChunk(chunk)
}

// Batch processing
const results = await chunker.chunkBatch(files, { concurrency: 5 })
```

--------------------------------

### Streaming Batch Results (TypeScript)

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

Streams batch results as files complete processing. Each result is yielded as soon as its file finishes, so results may arrive out of input order. This enables real-time progress reporting and lets downstream work such as indexing start before the whole batch is done.

```typescript
import { chunkBatchStream } from 'code-chunk'

const files = [
  { filepath: 'src/auth.ts', code: authCode },
  { filepath: 'src/user.ts', code: userCode },
  { filepath: 'src/payment.ts', code: paymentCode },
  { filepath: 'lib/utils.py', code: utilsCode },
  { filepath: 'lib/helpers.go', code: helpersCode }
]

console.log('Starting batch processing...')

let processed = 0
let indexed = 0

// Results stream as files complete (not in order)
for await (const result of chunkBatchStream(files, { concurrency: 5 })) {
  processed++

  if (result.error) {
    console.error(`[${processed}/${files.length}] ${result.filepath} FAILED:`, result.error.message)
    continue
  }

  console.log(`[${processed}/${files.length}] ${result.filepath}: ${result.chunks.length} chunks`)

  // Index immediately as results arrive
  for (const chunk of result.chunks) {
    await vectorDB.upsert({
      id: `${result.filepath}:${chunk.index}`,
      embedding: await embed(chunk.contextualizedText),
      metadata: { filepath: result.filepath, ...chunk.lineRange }
    })
    indexed++
  }
}

console.log(`Completed: ${indexed} chunks indexed from ${processed} files`)
```

--------------------------------

### Chunk Code with Context (TypeScript)

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

The primary `chunk` function parses source code, extracts entities, builds a scope tree, and returns semantic code chunks with rich metadata. It supports various options for chunking and context retrieval. This is suitable for batch processing or when all chunks are needed at once.
```typescript
import { chunk } from 'code-chunk'

const sourceCode = `
import { Database } from './db'

class UserService {
  private db: Database

  constructor(db: Database) {
    this.db = db
  }

  async getUser(id: string): Promise {
    return this.db.query('SELECT * FROM users WHERE id = ?', [id])
  }

  async createUser(data: UserData): Promise {
    return this.db.insert('users', data)
  }
}
`

const chunks = await chunk('src/services/user.ts', sourceCode, {
  maxChunkSize: 1500,
  contextMode: 'full',
  siblingDetail: 'signatures'
})

for (const c of chunks) {
  console.log(`Chunk ${c.index + 1}/${c.totalChunks}`)
  console.log('Text:', c.text)
  console.log('Lines:', c.lineRange.start, '-', c.lineRange.end)
  console.log('Scope:', c.context.scope)
  // [{ name: 'UserService', type: 'class' }]
  console.log('Entities:', c.context.entities)
  // [{ name: 'getUser', type: 'method', signature: 'async getUser(id: string): Promise' }]
  console.log('Imports:', c.context.imports)
  // [{ name: 'Database', source: './db' }]

  // Use contextualizedText for embeddings
  const embedding = await embedModel.embed(c.contextualizedText)
  await vectorDB.upsert({
    id: `user.ts:${c.index}`,
    embedding,
    metadata: c.lineRange
  })
}

// Example contextualizedText output:
// # src/services/user.ts
// # Scope: UserService
// # Defines: async getUser(id: string): Promise
// # Uses: Database
// # After: constructor
//
// async getUser(id: string): Promise {
//   return this.db.query('SELECT * FROM users WHERE id = ?', [id])
// }
```

--------------------------------

### Using contextualized text for embeddings in TypeScript

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Shows how to use the `contextualizedText` property of code chunks for generating embeddings. This pre-formatted text includes semantic context, which can improve embedding quality for RAG systems.
```typescript
for (const c of chunks) {
  const embedding = await embed(c.contextualizedText)
  await vectorDB.upsert({
    id: `${filepath}:${c.index}`,
    embedding,
    metadata: { filepath, lines: c.lineRange }
  })
}
```

--------------------------------

### Detect Programming Language from File Path

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

Detects the programming language of a file based on its extension. It returns the language name, or `null` if the extension is not supported. The `LANGUAGE_EXTENSIONS` constant provides a mapping of file extensions to language names. To bypass detection for a file, pass the `language` option when calling `chunk`.

```typescript
import { chunk, detectLanguage, LANGUAGE_EXTENSIONS } from 'code-chunk'

// Detect language from filepath
console.log(detectLanguage('src/app.ts'))       // 'typescript'
console.log(detectLanguage('lib/utils.tsx'))    // 'typescript'
console.log(detectLanguage('scripts/build.js')) // 'javascript'
console.log(detectLanguage('src/main.py'))      // 'python'
console.log(detectLanguage('cmd/server.go'))    // 'go'
console.log(detectLanguage('src/Main.java'))    // 'java'
console.log(detectLanguage('lib/core.rs'))      // 'rust'
console.log(detectLanguage('README.md'))        // null (unsupported)

// Access the full extension mapping
console.log(LANGUAGE_EXTENSIONS)
// {
//   '.ts': 'typescript', '.tsx': 'typescript', '.mts': 'typescript', '.cts': 'typescript',
//   '.js': 'javascript', '.jsx': 'javascript', '.mjs': 'javascript', '.cjs': 'javascript',
//   '.py': 'python', '.pyi': 'python',
//   '.rs': 'rust',
//   '.go': 'go',
//   '.java': 'java'
// }

// Use with chunk to override detection
const chunks = await chunk('config.txt', tsCode, {
  language: 'typescript' // Force TypeScript parsing
})
```

--------------------------------

### Streaming large files with code-chunk in TypeScript

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Illustrates how to process large files incrementally using the `chunkStream` function.
This is useful for managing memory when dealing with very large code files.

```typescript
import { chunkStream } from 'code-chunk'

for await (const c of chunkStream('src/large.ts', code)) {
  // processChunk is a placeholder for your own handler
  // (named to avoid shadowing Node's global `process`)
  await processChunk(c)
}
```

--------------------------------

### Effect-Based Streaming with chunkStreamEffect

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

Uses `chunkStreamEffect` for Effect-native streaming of code chunks. This function returns an Effect Stream, allowing seamless integration with the Effect ecosystem for composable pipelines. The example chunks a source string, logs each chunk, computes an embedding for it, and indexes the result.

```typescript
import { chunkStreamEffect } from 'code-chunk'
import { Effect, Stream, pipe } from 'effect'

const sourceCode = `
export function fibonacci(n: number): number {
  if (n <= 1) return n
  return fibonacci(n - 1) + fibonacci(n - 2)
}

export function factorial(n: number): number {
  if (n <= 1) return 1
  return n * factorial(n - 1)
}
`

// Create a processing pipeline using Effect
const program = pipe(
  chunkStreamEffect('src/math.ts', sourceCode, { maxChunkSize: 500 }),
  Stream.tap((chunk) =>
    Effect.log(`Processing chunk ${chunk.index}: ${chunk.context.entities.map(e => e.name).join(', ')}`)
  ),
  Stream.mapEffect((chunk) =>
    Effect.tryPromise(async () => ({
      chunk,
      embedding: await embedModel.embed(chunk.contextualizedText)
    }))
  ),
  Stream.mapEffect(({ chunk, embedding }) =>
    Effect.tryPromise(() =>
      vectorDB.upsert({
        id: `math:${chunk.index}`,
        embedding,
        metadata: chunk.context
      })
    )
  ),
  Stream.runCollect
)

// Execute the pipeline
const results = await Effect.runPromise(program)
console.log(`Indexed ${results.length} chunks`)
```

--------------------------------

### Integrate Chunking into Effect-based Pipelines (TypeScript)

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Integrates code chunking into Effect-based pipelines using `chunkStreamEffect`.
This function returns an Effect-native `Stream` of chunks, allowing for composable and robust asynchronous data processing within the Effect ecosystem. The example logs each chunk's text as it is processed.

```typescript
import { chunkStreamEffect } from 'code-chunk'
import { Effect, Stream } from 'effect'

const program = Stream.runForEach(
  chunkStreamEffect('src/utils.ts', code),
  (chunk) => Effect.log(chunk.text)
)

await Effect.runPromise(program)
```

--------------------------------

### Concurrent Batch Processing (TypeScript)

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

Processes multiple files concurrently with per-file error handling. Failed files don't stop the batch; errors are captured in the result. This function is useful for processing large numbers of files efficiently.

```typescript
import { chunkBatch } from 'code-chunk'
import * as fs from 'fs'
import fg from 'fast-glob' // default import: fast-glob's export is itself the glob function

// Load all TypeScript files
const filePaths = await fg('src/**/*.ts')
const files = filePaths.map(filepath => ({
  filepath,
  code: fs.readFileSync(filepath, 'utf-8')
}))

console.log(`Processing ${files.length} files...`)

const results = await chunkBatch(files, {
  maxChunkSize: 1500,
  concurrency: 10, // Process 10 files in parallel
  onProgress: (completed, total, filepath, success) => {
    const status = success ? 'OK' : 'FAILED'
    console.log(`[${completed}/${total}] ${filepath}: ${status}`)
  }
})

// Process results
let totalChunks = 0
let failedFiles = 0

for (const result of results) {
  if (result.error) {
    console.error(`Error in ${result.filepath}:`, result.error.message)
    failedFiles++
  } else {
    totalChunks += result.chunks.length

    // Index successful chunks
    for (const chunk of result.chunks) {
      await vectorDB.upsert({
        id: `${result.filepath}:${chunk.index}`,
        embedding: await embed(chunk.contextualizedText),
        metadata: {
          filepath: result.filepath,
          lines: chunk.lineRange,
          entities: chunk.context.entities.map(e => e.name)
        }
      })
    }
  }
}

console.log(`Indexed ${totalChunks} chunks from ${files.length - failedFiles} files`)
console.log(`${failedFiles} files failed`)
```

--------------------------------

### Stream Batch Results as Files Complete (TypeScript)

Source: https://github.com/supermemoryai/code-chunk/blob/main/README.md

Streams results from batch file processing as each file completes using `chunkBatchStream`. This is useful when completed files should be processed immediately, without waiting for the entire batch. The example iterates over the results and indexes the chunks of each successfully processed file.

```typescript
import { chunkBatchStream } from 'code-chunk'

for await (const result of chunkBatchStream(files, { concurrency: 5 })) {
  if (result.chunks) {
    await indexChunks(result.filepath, result.chunks)
  }
}
```

--------------------------------

### Typed Error Handling for Chunking Operations

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

The library provides specific error types like `ChunkingError` and `UnsupportedLanguageError` for robust error handling. These errors can be caught with standard `try...catch` blocks or matched by tag in Effect pipelines. The `ChunkingError` includes a `cause` property for deeper inspection of the underlying issue.
```typescript
import { chunk, ChunkingError, UnsupportedLanguageError } from 'code-chunk'

async function processFile(filepath: string, code: string) {
  try {
    const chunks = await chunk(filepath, code)
    return { success: true, chunks }
  } catch (error) {
    if (error instanceof UnsupportedLanguageError) {
      console.warn(`Skipping unsupported file: ${error.filepath}`)
      return { success: false, reason: 'unsupported_language' }
    }
    if (error instanceof ChunkingError) {
      console.error(`Chunking failed: ${error.message}`)
      console.error('Cause:', error.cause)
      return { success: false, reason: 'chunking_error' }
    }
    throw error // Re-throw unexpected errors
  }
}

// Effect-style error handling with _tag
import { Effect, Stream, pipe } from 'effect' // Stream is needed for Stream.runCollect below
import { chunkStreamEffect } from 'code-chunk'

const program = pipe(
  chunkStreamEffect('src/app.ts', code),
  Stream.runCollect,
  Effect.catchTag('UnsupportedLanguageError', (e) =>
    Effect.succeed([]) // Return empty for unsupported files
  ),
  Effect.catchTag('ChunkingError', (e) =>
    Effect.fail(new Error(`Parse error: ${e.message}`))
  )
)
```

--------------------------------

### Stream Code Chunks for Large Files (TypeScript)

Source: https://context7.com/supermemoryai/code-chunk/llms.txt

The `chunkStream` function processes large files by yielding semantic code chunks from an async generator as they are produced. This approach is memory-efficient because the chunks never have to be held in memory all at once. `totalChunks` is reported as -1 because the total count is not known upfront.
```typescript
import { chunkStream } from 'code-chunk'
import * as fs from 'fs'

const largeFile = fs.readFileSync('src/large-module.ts', 'utf-8')

// Process chunks incrementally as they're generated
for await (const chunk of chunkStream('src/large-module.ts', largeFile)) {
  console.log(`Processing chunk ${chunk.index}...`)
  console.log('Entities:', chunk.context.entities.map(e => e.name).join(', '))

  // Stream to vector database immediately
  const embedding = await embedModel.embed(chunk.contextualizedText)
  await vectorDB.upsert({
    id: `large-module:${chunk.index}`,
    embedding,
    metadata: {
      filepath: 'src/large-module.ts',
      lines: chunk.lineRange,
      scope: chunk.context.scope.map(s => s.name).join(' > ')
    }
  })

  // Free memory after processing
  console.log(`Chunk ${chunk.index} indexed successfully`)
}

console.log('Finished processing large file')
```
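
The async-generator pattern behind streaming can be illustrated without the library. The following is a minimal sketch under stated assumptions, not code-chunk's implementation: `SketchChunk`, `sketchStream`, and `consume` are hypothetical names, and the splitter cuts on a fixed line count rather than on syntax. It only shows why a consumer sees one chunk at a time and why `totalChunks` can only be a placeholder such as -1 while the stream is still running.

```typescript
// Hypothetical stand-in for a streamed chunk; NOT code-chunk's real chunk type.
interface SketchChunk {
  index: number
  totalChunks: number // -1 while streaming: the total is unknown until the end
  text: string
}

// Toy splitter that cuts every `linesPerChunk` lines.
// Assumption: the real library splits on syntax boundaries, not line counts.
async function* sketchStream(
  code: string,
  linesPerChunk: number
): AsyncGenerator<SketchChunk> {
  const lines = code.split('\n')
  let index = 0
  for (let start = 0; start < lines.length; start += linesPerChunk) {
    yield {
      index,
      totalChunks: -1,
      text: lines.slice(start, start + linesPerChunk).join('\n')
    }
    index++
  }
}

// Consume the stream one chunk at a time; each chunk is handled (and can be
// released) before the next one is produced. Returns how many chunks were seen.
async function consume(
  code: string,
  linesPerChunk: number,
  handle: (chunk: SketchChunk) => Promise<void> | void
): Promise<number> {
  let count = 0
  for await (const chunk of sketchStream(code, linesPerChunk)) {
    await handle(chunk)
    count++
  }
  return count
}
```

A caller would pass its own handler, for example one that embeds and upserts each chunk, and only learns the final count once `consume` resolves.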