# DuckDB-Wasm
DuckDB-Wasm brings DuckDB, an in-process SQL OLAP database management system, to every browser thanks to WebAssembly. It enables running analytical SQL queries directly in the browser or Node.js without requiring a server, supports Apache Arrow for efficient data handling, and can read Parquet, CSV, and JSON files from local filesystems or HTTP endpoints. The library has been tested with Chrome, Firefox, Safari, and Node.js, and is currently based on DuckDB v1.5.2.
The library provides both synchronous and asynchronous APIs, with the async API being the recommended approach for browser environments using Web Workers. DuckDB-Wasm speaks Arrow fluently, allowing seamless integration with the Apache Arrow ecosystem for columnar data processing. It supports loading DuckDB extensions at runtime, spatial queries, and provides React hooks through the `@duckdb/react-duckdb` package for easy integration with React applications.
## Core APIs
### Initializing DuckDB in the Browser (Async)
The primary way to use DuckDB-Wasm in a browser is through the async API with Web Workers. This approach prevents blocking the main thread during database operations and is the recommended pattern for production applications.
```typescript
import * as duckdb from '@duckdb/duckdb-wasm';
// Select the appropriate WebAssembly bundle based on browser capabilities
const DUCKDB_CONFIG = await duckdb.selectBundle({
    mvp: {
        mainModule: './duckdb-mvp.wasm',
        mainWorker: './duckdb-browser-mvp.worker.js',
    },
    eh: {
        mainModule: './duckdb-eh.wasm',
        mainWorker: './duckdb-browser-eh.worker.js',
    },
    coi: {
        mainModule: './duckdb-coi.wasm',
        mainWorker: './duckdb-browser-coi.worker.js',
        pthreadWorker: './duckdb-browser-coi.pthread.worker.js',
    },
});
// Create logger and worker
const logger = new duckdb.ConsoleLogger();
const worker = new Worker(DUCKDB_CONFIG.mainWorker!);
// Initialize AsyncDuckDB instance
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(DUCKDB_CONFIG.mainModule, DUCKDB_CONFIG.pthreadWorker);
// Connect and run queries
const conn = await db.connect();
const result = await conn.query(`SELECT 42 as answer`);
console.log(result.toArray()); // [{ answer: 42 }]
// Cleanup
await conn.close();
await db.terminate();
await worker.terminate();
```
### Initializing DuckDB in Node.js
For Node.js environments, DuckDB-Wasm provides similar async APIs with Worker support through the web-worker package.
```javascript
const duckdb = require('@duckdb/duckdb-wasm');
const path = require('path');
const Worker = require('web-worker');
const DUCKDB_DIST = path.dirname(require.resolve('@duckdb/duckdb-wasm'));
(async () => {
    const DUCKDB_CONFIG = await duckdb.selectBundle({
        mvp: {
            mainModule: path.resolve(DUCKDB_DIST, './duckdb-mvp.wasm'),
            mainWorker: path.resolve(DUCKDB_DIST, './duckdb-node-mvp.worker.cjs'),
        },
        eh: {
            mainModule: path.resolve(DUCKDB_DIST, './duckdb-eh.wasm'),
            mainWorker: path.resolve(DUCKDB_DIST, './duckdb-node-eh.worker.cjs'),
        },
    });
    const logger = new duckdb.ConsoleLogger();
    const worker = new Worker(DUCKDB_CONFIG.mainWorker);
    const db = new duckdb.AsyncDuckDB(logger, worker);
    await db.instantiate(DUCKDB_CONFIG.mainModule, DUCKDB_CONFIG.pthreadWorker);
    const conn = await db.connect();
    const result = await conn.query(`SELECT count(*)::INTEGER as v FROM generate_series(0, 100) t(v)`);
    console.log(result.toArray()); // [{ v: 101 }]
    await conn.close();
    await db.terminate();
    await worker.terminate();
})();
```
### Using CDN in Plain HTML
For quick prototyping, you can load DuckDB-Wasm directly from a CDN without any build tools.
```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>DuckDB-Wasm Demo</title>
  </head>
  <body>
    <script type="module">
      import * as duckdb from 'https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm/+esm';

      // Pick a bundle hosted on jsDelivr
      const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());

      // Workers must be same-origin, so wrap the CDN worker script in a Blob URL
      const workerUrl = URL.createObjectURL(
        new Blob([`importScripts("${bundle.mainWorker}");`], { type: 'text/javascript' })
      );
      const worker = new Worker(workerUrl);
      const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), worker);
      await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
      URL.revokeObjectURL(workerUrl);

      const conn = await db.connect();
      const result = await conn.query('SELECT 42 AS answer');
      console.log(result.toArray());
      await conn.close();
    </script>
  </body>
</html>
```
### Database Configuration
DuckDB-Wasm supports various configuration options when opening a database, including access modes, threading, and query behavior settings.
```typescript
import * as duckdb from '@duckdb/duckdb-wasm';
// Open database with configuration
await db.open({
    path: ':memory:', // Database path (":memory:" for in-memory)
    accessMode: duckdb.DuckDBAccessMode.READ_WRITE,
    maximumThreads: 4, // Requires cross-origin isolation
    useDirectIO: false,
    query: {
        queryPollingInterval: 100, // Polling interval in ms
        castBigIntToDouble: false, // Cast BigInt columns to Double when true
        castTimestampToDate: false, // Cast Timestamp to Date64 when true
        castDecimalToDouble: false, // Cast Decimal to Double when true
    },
    filesystem: {
        reliableHeadRequests: true,
        allowFullHTTPReads: true, // Allow fallback for servers without range support
    },
    allowUnsignedExtensions: false,
});
```
### Running Queries
The connection object provides methods for executing SQL queries with full Arrow Table support for efficient columnar data access.
```typescript
import * as arrow from 'apache-arrow';
const conn = await db.connect();
// Simple query returning Arrow Table
const result = await conn.query<{ name: arrow.Utf8; value: arrow.Int32 }>(`
    SELECT 'hello' as name, 42 as value
    UNION ALL
    SELECT 'world' as name, 100 as value
`);
// Access results
console.log(result.numRows); // 2
console.log(result.numCols); // 2
console.log(result.toArray()); // [{name: 'hello', value: 42}, {name: 'world', value: 100}]
// Access columns by index
const nameColumn = result.getChildAt(0);
console.log(nameColumn?.toArray()); // ['hello', 'world']
// Iterate over rows
for (const row of result) {
    console.log(row.name, row.value);
}
await conn.close();
```
### Streaming Query Results
For large result sets, use streaming to process data incrementally without loading everything into memory.
```typescript
const conn = await db.connect();
// Send query and get streaming reader
const reader = await conn.send<{ v: arrow.Int32 }>(
    `SELECT v FROM generate_series(1, 1000000) AS t(v)`,
    true // allowStreamResult
);
// Process record batches as they arrive
let totalRows = 0;
for await (const batch of reader) {
    totalRows += batch.numRows;
    console.log(`Processed batch with ${batch.numRows} rows`);
}
console.log(`Total rows: ${totalRows}`);
await conn.close();
```
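The consumption loop above can be factored into a small reusable helper. The following is a sketch, not part of the DuckDB-Wasm API: `consumeBatches` is a hypothetical name, and it assumes only that each batch exposes a `numRows` count, as Arrow record batches do.

```typescript
// Hypothetical helper (not part of DuckDB-Wasm): drain an async-iterable
// reader batch by batch, invoking a callback per batch and returning the
// total row count. Works with any batch type that exposes `numRows`.
async function consumeBatches<T extends { numRows: number }>(
    reader: AsyncIterable<T>,
    onBatch: (batch: T) => void,
): Promise<number> {
    let totalRows = 0;
    for await (const batch of reader) {
        onBatch(batch);
        totalRows += batch.numRows;
    }
    return totalRows;
}
```

With a real streaming reader this becomes `await consumeBatches(reader, batch => ...)`, keeping per-batch memory bounded regardless of the total result size.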
### Prepared Statements
Prepared statements allow you to compile a query once and execute it multiple times with different parameters, improving performance for repeated queries.
```typescript
const conn = await db.connect();
// Create prepared statement with parameter placeholder
const stmt = await conn.prepare(`SELECT v + ? as result FROM generate_series(0, 10) AS t(v)`);
// Execute with different parameters
const result1 = await stmt.query(100);
console.log(result1.toArray()); // [{result: 100}, {result: 101}, ..., {result: 110}]
const result2 = await stmt.query(1000);
console.log(result2.toArray()); // [{result: 1000}, {result: 1001}, ..., {result: 1010}]
// Close statement when done
await stmt.close();
await conn.close();
```
### Registering Files from Buffers
Register in-memory data as virtual files that can be queried with SQL.
```typescript
const conn = await db.connect();
const encoder = new TextEncoder();
// Register CSV data as a file
const csvData = `name,age,city
Alice,30,New York
Bob,25,San Francisco
Charlie,35,Boston`;
await db.registerFileBuffer('people.csv', encoder.encode(csvData));
// Query the registered file
const result = await conn.query(`SELECT * FROM 'people.csv'`);
console.log(result.toArray());
// [{name: 'Alice', age: 30, city: 'New York'}, ...]
// Clean up
await db.dropFile('people.csv');
await conn.close();
```
### Registering Remote Files via HTTP
Register remote files accessible via HTTP/HTTPS for direct querying without downloading.
```typescript
const conn = await db.connect();
// Register a remote Parquet file
await db.registerFileURL(
    'remote_data.parquet',
    'https://example.com/data.parquet',
    duckdb.DuckDBDataProtocol.HTTP,
    false // directIO
);
// Query the remote file directly
const result = await conn.query(`SELECT * FROM 'remote_data.parquet' LIMIT 10`);
console.log(result.toArray());
// Alternative: Query remote files directly with URL
const result2 = await conn.query(`
    SELECT * FROM read_parquet('https://blobs.duckdb.org/stations.parquet') LIMIT 5
`);
await conn.close();
```
### Inserting CSV Data
Import CSV data into DuckDB tables with automatic schema detection or explicit column definitions.
```typescript
import * as arrow from 'apache-arrow';
const conn = await db.connect();
const encoder = new TextEncoder();
// CSV with auto-detection
const csvAuto = `a,b,c
1,2,3
4,5,6
7,8,9`;
await db.registerFileBuffer('data.csv', encoder.encode(csvAuto));
await conn.insertCSVFromPath('data.csv', {
    schema: 'main',
    name: 'my_table',
});
const result = await conn.query('SELECT * FROM main.my_table');
console.log(result.toArray());
// [{a: 1, b: 2, c: 3}, {a: 4, b: 5, c: 6}, {a: 7, b: 8, c: 9}]
// CSV with explicit schema and options
const csvCustom = `1|2|01/02/2020
4|5|01/03/2020
7|8|01/04/2020`;
await db.registerFileBuffer('custom.csv', encoder.encode(csvCustom));
await conn.insertCSVFromPath('custom.csv', {
    schema: 'main',
    name: 'custom_table',
    detect: false,
    header: false,
    delimiter: '|',
    dateFormat: '%m/%d/%Y',
    columns: {
        col_a: new arrow.Int16(),
        col_b: new arrow.Int32(),
        col_date: new arrow.DateDay(),
    },
});
await conn.close();
```
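Generating the CSV text itself is often the fiddly part. The sketch below is a plain TypeScript convenience, not a DuckDB-Wasm API: `rowsToCSV` is a hypothetical helper that serializes row objects into CSV text suitable for `registerFileBuffer`, quoting fields per RFC 4180.

```typescript
// Hypothetical helper (not part of DuckDB-Wasm): serialize row objects to
// CSV text. Fields containing commas, quotes, or newlines are wrapped in
// double quotes, with embedded quotes doubled (RFC 4180 style).
function rowsToCSV(rows: Record<string, unknown>[]): string {
    if (rows.length === 0) return '';
    const escape = (v: unknown): string => {
        const s = String(v ?? '');
        return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
    };
    const header = Object.keys(rows[0]);
    const lines = [header.join(',')];
    for (const row of rows) {
        lines.push(header.map((k) => escape(row[k])).join(','));
    }
    return lines.join('\n');
}
```

Combined with the registration API above, usage would look like `await db.registerFileBuffer('people.csv', new TextEncoder().encode(rowsToCSV(rows)))`.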
### Inserting JSON Data
Import JSON data (row arrays or column objects) into DuckDB tables.
```typescript
import * as arrow from 'apache-arrow';
const conn = await db.connect();
const encoder = new TextEncoder();
// JSON as row array
const jsonRows = `[
    {"a": 1, "b": 2, "c": 3},
    {"a": 4, "b": 5, "c": 6},
    {"a": 7, "b": 8, "c": 9}
]`;
await db.registerFileBuffer('rows.json', encoder.encode(jsonRows));
await conn.insertJSONFromPath('rows.json', {
    schema: 'main',
    name: 'json_table',
});
const result = await conn.query('SELECT * FROM main.json_table');
console.log(result.toArray());
// [{a: 1, b: 2, c: 3}, {a: 4, b: 5, c: 6}, {a: 7, b: 8, c: 9}]
// JSON as column object
const jsonCols = `{
    "x": [1, 4, 7],
    "y": [2, 5, 8],
    "z": [3, 6, 9]
}`;
await db.registerFileBuffer('cols.json', encoder.encode(jsonCols));
await conn.insertJSONFromPath('cols.json', {
    schema: 'main',
    name: 'col_table',
    shape: duckdb.JSONTableShape.COLUMN_OBJECT,
});
await conn.close();
```
### Inserting Arrow Tables
Insert Apache Arrow tables directly into DuckDB for seamless integration with the Arrow ecosystem.
```typescript
import * as arrow from 'apache-arrow';
const conn = await db.connect();
// Create Arrow table from vectors
const table = new arrow.Table({
    name: arrow.vectorFromArray(['Alice', 'Bob', 'Charlie']),
    age: arrow.makeVector(new Int32Array([30, 25, 35])),
    score: arrow.makeVector(new Float64Array([95.5, 87.3, 92.1])),
});
// Insert Arrow table into DuckDB
await conn.insertArrowTable(table, {
    schema: 'main',
    name: 'arrow_data',
    create: true,
});
// Query the inserted data
const result = await conn.query('SELECT * FROM main.arrow_data ORDER BY age');
console.log(result.toArray());
// [{name: 'Bob', age: 25, score: 87.3}, {name: 'Alice', age: 30, score: 95.5}, ...]
// Insert from IPC stream buffer
const ipcBuffer = arrow.tableToIPC(table, 'stream');
await conn.insertArrowFromIPCStream(ipcBuffer, {
    schema: 'main',
    name: 'from_ipc',
    create: true,
});
await conn.close();
```
### User-Defined Functions (UDFs)
Create custom scalar functions in JavaScript that can be used in SQL queries.
```typescript
import { Int32, Float64, Utf8 } from 'apache-arrow';
const conn = await db.connect();
// Simple UDF returning integer
await conn.createScalarFunction('double_it', new Int32(), (x) => x * 2);
const result1 = await conn.query('SELECT double_it(21) as answer');
console.log(result1.toArray()); // [{answer: 42}]
// UDF with multiple arguments
await conn.createScalarFunction('add_three', new Int32(), (a, b, c) => a + b + c);
const result2 = await conn.query('SELECT add_three(10, 20, 30) as sum');
console.log(result2.toArray()); // [{sum: 60}]
// UDF returning string
await conn.createScalarFunction('greet', new Utf8(), (name) => `Hello, ${name}!`);
const result3 = await conn.query("SELECT greet('World') as greeting");
console.log(result3.toArray()); // [{greeting: 'Hello, World!'}]
// UDF handling nulls
await conn.createScalarFunction('safe_double', new Int32(), (x) => x == null ? 0 : x * 2);
const result4 = await conn.query(`
    SELECT safe_double(CASE WHEN v % 2 = 0 THEN v ELSE NULL END) as doubled
    FROM generate_series(1, 5) AS t(v)
`);
console.log(result4.toArray()); // [{doubled: 0}, {doubled: 4}, {doubled: 0}, {doubled: 8}, {doubled: 0}]
await conn.close();
```
### Loading Extensions
DuckDB-Wasm supports loading extensions at runtime for additional functionality like spatial queries, ICU, and more.
```typescript
const conn = await db.connect();
// Load core extensions
await conn.query(`LOAD icu`);
await conn.query(`LOAD parquet`);
// Install and load extensions from the core repository
await conn.query(`INSTALL spatial FROM 'https://extensions.duckdb.org'`);
await conn.query(`LOAD spatial`);
// Install from community repository
await conn.query(`INSTALL h3 FROM community`);
await conn.query(`LOAD h3`);
// Check loaded extensions
const extensions = await conn.query(`FROM duckdb_extensions() WHERE loaded`);
console.log(extensions.toArray());
// Use spatial functions
const spatialResult = await conn.query(`
    SELECT ST_Distance(
        ST_Point(-73.9857, 40.7484),
        ST_Point(-122.4194, 37.7749)
    ) * 111139 AS distance_meters
`);
console.log(spatialResult.toArray()); // Approximate distance NYC to SF
await conn.close();
```
### Attaching Remote Databases
Attach remote DuckDB database files for querying without downloading the entire file.
```typescript
const conn = await db.connect();
// Attach a remote database file
await conn.query(`ATTACH 'https://blobs.duckdb.org/data/test.db' AS remote_db`);
// Query tables from the attached database
const result = await conn.query(`FROM remote_db.t`);
console.log(result.toArray());
// List tables in attached database
const tables = await conn.query(`
    SELECT table_name FROM information_schema.tables
    WHERE table_catalog = 'remote_db'
`);
console.log(tables.toArray());
await conn.close();
```
### React Integration with @duckdb/react-duckdb
The React package provides hooks and providers for easy DuckDB integration in React applications.
```tsx
import React from 'react';
import * as duckdb from '@duckdb/duckdb-wasm';
import {
DuckDBProvider,
DuckDBConnectionProvider,
useDuckDB,
useDuckDBResolver,
useDuckDBConnection,
useDuckDBConnectionDialer,
} from '@duckdb/react-duckdb';
// App wrapper with providers
function App() {
    return (
        <DuckDBProvider>
            <DuckDBConnectionProvider>
                <QueryComponent />
            </DuckDBConnectionProvider>
        </DuckDBProvider>
    );
}
// Component using DuckDB hooks
function QueryComponent() {
    const db = useDuckDB();
    const resolveDB = useDuckDBResolver();
    const conn = useDuckDBConnection();
    const dial = useDuckDBConnectionDialer();
    const [result, setResult] = React.useState<any[]>([]);
    React.useEffect(() => {
        // Trigger database initialization and connection
        dial();
    }, []);
    React.useEffect(() => {
        if (!conn) return;
        const runQuery = async () => {
            const queryResult = await conn.query(`
                SELECT * FROM generate_series(1, 10) AS t(num)
            `);
            setResult(queryResult.toArray());
        };
        runQuery();
    }, [conn]);
    if (db.resolving()) return <div>Loading database...</div>;
    if (!conn) return <div>Connecting...</div>;
    return (
        <ul>
            {result.map((row, i) => (
                <li key={i}>{String(row.num)}</li>
            ))}
        </ul>
    );
}
```
### File Statistics and Monitoring
Track file access patterns and statistics for performance analysis.
```typescript
const conn = await db.connect();
// Enable statistics collection for a file
await db.registerFileURL(
    'data.parquet',
    'https://example.com/data.parquet',
    duckdb.DuckDBDataProtocol.HTTP,
    false
);
await db.collectFileStatistics('data.parquet', true);
// Run some queries
await conn.query(`SELECT COUNT(*) FROM 'data.parquet'`);
await conn.query(`SELECT * FROM 'data.parquet' WHERE id < 100`);
// Export statistics
const stats = await db.exportFileStatistics('data.parquet');
console.log('File statistics:', stats);
// Disable collection
await db.collectFileStatistics('data.parquet', false);
await conn.close();
```
### Copying Files to Buffers
Export query results or database files to memory buffers.
```typescript
const conn = await db.connect();
// Create and populate a table
await conn.query(`CREATE TABLE export_test AS SELECT * FROM generate_series(1, 100) AS t(v)`);
// Export to Parquet file (in virtual filesystem)
await conn.query(`COPY export_test TO 'output.parquet' (FORMAT PARQUET)`);
// Copy file to buffer
const buffer = await db.copyFileToBuffer('output.parquet');
console.log(`Exported ${buffer.length} bytes`);
// Use the buffer (e.g., download in browser)
const blob = new Blob([buffer], { type: 'application/octet-stream' });
const url = URL.createObjectURL(blob);
// Create a download link, e.g. <a href={url} download="output.parquet">Download</a>
// Clean up
await db.dropFile('output.parquet');
await conn.close();
```
## Summary
DuckDB-Wasm is ideal for analytical workloads in browser-based applications, data visualization tools, and serverless environments where running SQL queries locally provides better performance and privacy than server roundtrips. Common use cases include building interactive data exploration interfaces, client-side ETL pipelines, Jupyter-style notebook environments in the browser, and offline-capable analytics applications. The library excels at querying Parquet and CSV files directly from URLs, making it perfect for data journalism, scientific computing, and business intelligence dashboards.
Integration patterns typically involve initializing DuckDB with a Web Worker for non-blocking operations, registering data sources (local files, remote URLs, or in-memory buffers), and executing SQL queries that return Apache Arrow tables for efficient processing. The React hooks package simplifies state management in React applications, while the streaming API enables processing large datasets that exceed available memory. For production deployments, consider using the COI (Cross-Origin Isolated) bundle for multi-threading support, implementing proper error handling around query execution, and leveraging prepared statements for parameterized queries to prevent SQL injection and improve performance.