Kotlin DataFrame
https://github.com/kotlin/dataframe
# Kotlin DataFrame

Kotlin DataFrame is a typesafe, in-memory structured data processing library for the JVM. It reconciles Kotlin's static typing with the dynamic nature of data by providing a functional, immutable, and hierarchical data model that works seamlessly in both regular Gradle/Maven projects and Jupyter notebooks. The library supports three column kinds — `ValueColumn` (data values), `ColumnGroup` (nested columns), and `FrameColumn` (nested DataFrames) — enabling representation of arbitrarily deep JSON-like structures. Every operation returns a new `DataFrame` instance, reusing underlying storage where possible, making the API chain-friendly and side-effect-free.

The core functionality revolves around a rich DSL for creating, reading, filtering, transforming, aggregating, and writing DataFrames. Data can be loaded from CSV, TSV, JSON, Excel, Apache Arrow, Apache Parquet, and SQL databases; written back to any of those formats; and accessed either dynamically via string column names or in a fully type-safe manner through `@DataSchema`-annotated interfaces and the Kotlin DataFrame Compiler Plugin. The compiler plugin generates extension properties that provide IDE autocompletion, refactoring support, and compile-time schema verification across the entire transformation pipeline.
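As a quick, minimal sketch of the three column kinds in one frame (the column names and values here are illustrative, not from any real dataset; the builders used are the same `dataFrameOf`/`columnOf` shown in the creation section below):

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

val df = dataFrameOf(
    // ColumnGroup: nested columns under a shared "name" header
    "name" to columnOf(
        "firstName" to columnOf("Alice", "Bob"),
        "lastName" to columnOf("Cooper", "Dylan"),
    ),
    // FrameColumn: a whole DataFrame stored in each cell
    "scores" to columnOf(
        dataFrameOf("subject", "value")("math", 4),
        dataFrameOf("subject", "value")("music", 5),
    ),
    // ValueColumn: a flat column of data values
    "age" to columnOf(15, 20),
)

// Chain-friendly and side-effect-free: each call returns a new DataFrame,
// leaving df itself untouched
val adults = df.filter { "age"<Int>() >= 18 }
```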
---

## Setup

### Gradle dependency

```kotlin
// build.gradle.kts
repositories {
    mavenCentral()
}

dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta5")
}

// Optional: enable the Compiler Plugin for type-safe column access
plugins {
    kotlin("jvm") version "2.3.20"
    kotlin("plugin.dataframe") version "2.3.20"
}

// gradle.properties — required while incremental compilation is unsupported
// kotlin.incremental=false
```

### Kotlin Notebook / Jupyter

```
%useLatestDescriptors
%use dataframe
```

---

## DataFrame Creation

### `dataFrameOf` — create a DataFrame inline

Builds a `DataFrame` from column-name/value pairs, from `vararg` row values, or from existing `DataColumn` objects. The most direct way to construct a DataFrame from literals.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// From name-to-list pairs
val df = dataFrameOf(
    "name" to listOf("Alice", "Bob", "Charlie"),
    "age" to listOf(15, 20, 100),
)

// From column names + row values (vararg)
val df2 = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 20,
    "Charlie", 100,
)

// With nested ColumnGroup
val df3 = dataFrameOf(
    "name" to columnOf(
        "firstName" to columnOf("Alice", "Bob"),
        "lastName" to columnOf("Cooper", "Dylan"),
    ),
    "age" to columnOf(15, 20),
)
```

---

### `toDataFrame` — convert Kotlin objects / collections

Converts a `List<T>`, a `Map<String, List<*>>`, an `IntRange` builder, or a `List<List<T>>` into a `DataFrame`. Supports deep object-graph traversal via the `maxDepth` parameter and a configuration DSL.
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// From a Map
val map = mapOf("name" to listOf("Alice", "Bob"), "age" to listOf(15, 20))
val df = map.toDataFrame()

// From data class instances
data class Name(val firstName: String, val lastName: String)
data class Score(val subject: String, val value: Int)
data class Student(val name: Name, val age: Int, val scores: List<Score>)

val students = listOf(
    Student(Name("Alice", "Cooper"), 15, listOf(Score("math", 4), Score("biology", 3))),
    Student(Name("Bob", "Marley"), 20, listOf(Score("music", 5))),
)

// maxDepth = 1 → Name becomes ColumnGroup, scores becomes FrameColumn
val df2 = students.toDataFrame(maxDepth = 1)

// Advanced DSL with custom columns, property exclusions, and column groups
val df3 = students.toDataFrame {
    "year of birth" from { 2021 - it.age }
    properties(maxDepth = 1) {
        exclude(Score::subject)
        preserve<Name>() // keep Name as an object, don't decompose
    }
    "summary" {
        "max score" from { it.scores.maxOf { s -> s.value } }
        "min score" from { it.scores.minOf { s -> s.value } }
    }
}

// From an IntRange with a builder lambda (useful for random/generated data)
val generated = (0 until 7).toDataFrame {
    "id" from { "P${1000 + it}" }
    "price" from { kotlin.random.Random.nextDouble(10.0, 500.0) }
    "inStock" from { kotlin.random.Random.nextInt(0..100) }
}
```

---

## Reading Data

### `DataFrame.read` — auto-detect format

Reads a `DataFrame` from a file path or URL, automatically detecting the format from the file extension.

```kotlin
val df = DataFrame.read("input.csv")
val df2 = DataFrame.read("https://example.com/data.json")
```

---

### `DataFrame.readCsv` / `readCsvStr` — read CSV

Reads CSV (or TSV/delimited) files with automatic type inference for `Int`, `Long`, `Double`, and `Boolean`. Supports custom delimiters, null-string sets, locale-specific number formats, and custom date-time patterns via `ParserOptions`.
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*
import java.io.File
import java.util.Locale

// From file
val df = DataFrame.readCsv("data.csv")
DataFrame.readCsv(File("data.csv"))

// From URL
DataFrame.readCsv(
    java.net.URI("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv").toURL()
)

// From String
val csv = """
A,B,C,D
12,tuv,0.12,true
41,xyz,3.6,not assigned
89,abc,7.1,false
""".trimIndent()
DataFrame.readCsvStr(csv)
// Schema inferred: A: Int, B: String, C: Double, D: Boolean?

// Custom delimiter + null strings
val df2 = DataFrame.readCsv(
    File("data.psv"),
    delimiter = '|',
    header = listOf("A", "B", "C", "D"),
    parserOptions = ParserOptions(nullStrings = setOf("not assigned")),
)

// Locale-specific numbers (comma as decimal separator)
val df3 = DataFrame.readCsv(File("eu_data.csv"), parserOptions = ParserOptions(locale = Locale.GERMAN))

// Custom date-time pattern
val df4 = DataFrame.readCsv(
    File("log.csv"),
    parserOptions = ParserOptions(dateTimePattern = "dd/MMM/yy h:mm a"),
)

// Disable type inference for all columns (keep everything as String)
val df5 = DataFrame.readCsv(File("data.csv"), colTypes = mapOf(ColType.DEFAULT to ColType.String))
```

---

### `DataFrame.readJson` / `readJsonStr` — read JSON

Reads JSON arrays or objects into a hierarchical `DataFrame`. Nested JSON objects become `ColumnGroup`s and arrays of objects become `FrameColumn`s. Type clashes (the same key holding different JSON types) can be resolved via `typeClashTactic`. Use `keyValuePaths` to read large key-value maps as `FrameColumn`s instead of exploding them into hundreds of columns.
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*
import java.io.File

// From file or URL
val df = DataFrame.readJson(File("data.json"))
DataFrame.readJson("https://covid.ourworldindata.org/data/owid-covid-data.json")

// Type-clash tactic: fold mixed-type fields into (value, array, objectProps) groups
DataFrame.readJsonStr(json, typeClashTactic = JSON.TypeClashTactic.ARRAY_AND_VALUE_COLUMNS)

// ANY_COLUMNS: keep mixed-type column as Any
DataFrame.readJsonStr(json, typeClashTactic = JSON.TypeClashTactic.ANY_COLUMNS)

// keyValuePaths: read {"dogs": {"fido": {...}, "spot": {...}}} as FrameColumn
// instead of one column per dog name
DataFrame.readJsonStr(
    text = myJson,
    keyValuePaths = listOf(
        JsonPath().append("dogs"),
        JsonPath().append("cats"),
    ),
)
// Result schema: dogs: *[name: String, value: {age: Int, breed: String}]
```

---

### `DataFrame.readExcel` — read Excel (XLS/XLSX)

Reads Excel spreadsheets. Date cells are read as `kotlinx.datetime.LocalDateTime`, numeric cells as `Double`. Use `stringColumns` to force specific columns to be read as `String`.

```kotlin
import org.jetbrains.kotlinx.dataframe.io.*

val df = DataFrame.readExcel(File("report.xlsx"))
DataFrame.readExcel("https://example.com/data.xlsx")

// Force column "A" to be String to avoid a mixed numeric/string type
val df2 = DataFrame.readExcel("mixed_column.xlsx", stringColumns = StringColumns("A"))
```

---

### `DataFrame.readArrowFeather` — read Apache Arrow

Reads Arrow IPC streaming format or Feather (random access) format from a file, `InputStream`, `Channel`, or `ByteArray`.

```kotlin
import org.jetbrains.kotlinx.dataframe.io.*

val df = DataFrame.readArrowFeather(File("data.feather"))
val df2 = DataFrame.readArrowIPC(File("data.arrow"))
```

---

### SQL Database — `DataFrame.readSqlTable` / `readSqlQuery`

Reads data from SQL databases (PostgreSQL, MySQL, MariaDB, SQLite, MS SQL, DuckDB) via JDBC. Requires the `dataframe-jdbc` artifact and the appropriate JDBC driver.
```kotlin
// build.gradle.kts
// implementation("org.jetbrains.kotlinx:dataframe-jdbc:1.0.0-Beta5")
// implementation("org.postgresql:postgresql:$version")

import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.api.print
import java.sql.DriverManager

val dbConfig = DbConnectionConfig(
    url = "jdbc:postgresql://localhost:5432/testDatabase",
    username = "postgres",
    password = "password",
)

// Read an entire table (first 100 rows)
val df = DataFrame.readSqlTable(dbConfig, "Customer", limit = 100)
df.print()

// Execute an arbitrary SQL query
val result = DataFrame.readSqlQuery(dbConfig, "SELECT id, name, age FROM Customer WHERE age > 30")

// Read using an existing JDBC Connection
val conn = DriverManager.getConnection("jdbc:sqlite:local.db")
val df2 = conn.readDataFrame("SELECT * FROM orders")

// Inspect schema without reading data
val schema = DataFrameSchema.readSqlTable(dbConfig, "Customer")
```

---

## Column Selection DSL

### Columns Selection DSL — multi-column selectors

A powerful DSL used across `select`, `filter`, `update`, `remove`, `move`, `convert`, and many other operations. Supports property-based access (with the compiler plugin or `@DataSchema`), string-based access, index-based access, and predicate-based selection.
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Select specific columns
df.select { age and weight }
df.select("age", "weight")
df["age", "weight"]

// Type-filtered columns at any depth
df.select { colsAtAnyDepth().colsOf<String>() }

// Filter columns by predicate
df.remove { cols { it.hasNulls() } }

// Nested column access (ColumnGroup navigation)
df.select { name.firstName and name.lastName }

// Range of columns
df.select { age.."weight" }

// All columns in a group
df.select { name.allCols() }

// Combine in real operations
df.fillNaNs { colsAtAnyDepth().colsOf<Double>() }.withZero()
df.update { city }.notNull { it.lowercase() }
df.move { name.firstName and name.lastName }.after { city }
```

---

## Filtering Rows

### `filter` — row filtering

Returns a `DataFrame` containing the rows that satisfy a row condition. The lambda receiver is `DataRow`, giving access to column values as properties (with the compiler plugin) or via typed string access.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Property-based (requires @DataSchema or the compiler plugin)
df.filter { age > 18 && name.firstName.startsWith("A") }

// String-based
df.filter { "age"<Int>() > 18 && "name"["firstName"]<String>().startsWith("A") }

// Combined with other operations
df
    .filter { age in 18..65 }
    .sortBy { age }
    .select { name and age and city }
```

---

## Column Operations

### `add` — add computed columns

Appends new columns derived from a row expression. Supports multiple columns at once, column groups, and recurrent (row-to-row) calculations.
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Single column
df.add("year of birth") { 2021 - age }

// Multiple columns including a nested group
df.add {
    "year of birth" from { 2021 - age }
    expr { age > 18 } into "is adult"
    "details" {
        name.lastName.map { it.length } into "last name length"
        "full name" from { name.firstName + " " + name.lastName }
    }
}

// Recurrent computation (Fibonacci)
df.add("fibonacci") {
    if (index() < 2) 1
    else prev()!!.newValue<Int>() + prev()!!.prev()!!.newValue<Int>()
}

// Add a sequential id column (prepended as the first column)
df.addId()
df.addId("rowId")
```

---

### `update` — update cell values (same type)

Changes values in selected cells without altering the column type. Supports row conditions, index ranges, and per-column and per-row-column expressions.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

df.update { age }.with { it * 2 }
df.update { colsAtAnyDepth().colsOf<String>() }.with { it.uppercase() }
df.update { weight }.at(1..4).notNull { it / 2 }
df.update { name.lastName and age }.at(1, 3, 4).withNull()

// Conditional update
df.update { city }.where { name.firstName == "Alice" }.with { "Paris" }

// Row-dependent update
df.update { city }.with { name.firstName + " from " + it }

// Per-column update (replace with column mean)
df.update { colsOf<Number?>() }.perCol { mean(skipNaN = true) }

// Per-row-col update
df.update { colsOf<String?>() }.perRowCol { row, col -> col.name() + ": " + row.index() }

// Update a ColumnGroup as a DataFrame
df.update { name }.asFrame { select { lastName } }
```

---

### `convert` — change column types

Returns a `DataFrame` with column values converted to a different type. Supports automatic conversions between primitives, date/time types, enums, value classes, and custom converters via `ParserOptions`.
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*
import java.util.Locale

df.convert { age }.with { it.toDouble() }
df.convert { colsAtAnyDepth().colsOf<String>() }.with { it.toCharArray().toList() }

// Automatic type conversion shortcuts
df.convert { age }.to<Double>()
df.convert { weight }.toFloat()
df.convert { colsOf<Number>() }.to<String>()

// Column-level conversion
df.weight.convertTo<Float?>()
df.age.convertToDouble()

// String → enum
enum class Direction { NORTH, SOUTH, WEST, EAST }
dataFrameOf("direction")("NORTH", "WEST").convert("direction").to<Direction>()

// String → value class
@JvmInline
value class IntClass(val value: Int)
dataFrameOf("value")("1", "2").convert("value").to<IntClass>()

// String with custom locale / date format
stringDf.convert { value }.to<Double?>(
    parserOptions = ParserOptions(locale = Locale.GERMAN, nullStrings = setOf("-")),
)
stringDf.convert { date }.toLocalDate(kotlinx.datetime.LocalDate.Formats.ISO)
```

---

### `rename` / `renameToCamelCase` — rename columns

Renames one or more columns by a new name, by a name expression, or via bulk camelCase normalization.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Rename a single column
df.rename { name }.into("fullName")
df.rename("name").into("fullName")

// Rename with an expression using column statistics
df.rename { age }.into {
    val mean = it.data.mean()
    "age [mean = $mean]"
}

// Rename a subset to camelCase
df.rename { ColumnA and `COLUMN-C` }.toCamelCase()

// Rename ALL columns (including nested) to camelCase
// e.g. "first_name" → "firstName", "RESTApi" → "restApi"
df.renameToCamelCase()
```

---

### `remove` — drop columns

Returns a `DataFrame` without the selected columns.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

df.remove { name and weight }
df.remove("name", "weight")

// Remove all columns that have nulls
df.remove { cols { it.hasNulls() } }
```

---

### `select` — project columns

Creates a new `DataFrame` containing only the specified columns, preserving their order.
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

df.select { age and weight }
df.select("age", "weight")
df.select { colsOf<Int>() }
df.select { name.allCols() } // flatten a ColumnGroup into top-level columns
```

---

### `move` — reorder / restructure columns

Moves columns to a different position or restructures them into or out of `ColumnGroup`s.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

df.move { age }.toStart()
df.move { weight }.to(1)

// Group age + weight under "info"
df.move { age and weight }.under("info")

// Rename the path while moving: name.firstName → fullName.first
df.move { name.firstName and name.lastName }.into { pathOf("fullName", it.name().dropLast(4)) }

// Flatten a ColumnGroup to the top level
df.move { name.allCols() }.toTop()

// Split pipe-separated column names into a hierarchy: "a|b|c" → a.b.c
dataFrameOf("a|b|c", "a|d|e")(0, 0)
    .move { all() }.into { it.name().split("|").toPath() }
```

---

### `split` — split column values horizontally or vertically

Splits `String`, `List`, or `DataFrame` column values and stores the parts in new columns or as new rows.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Split a String column into characters → separate columns
df.split { name.lastName }.by { it.asIterable() }.into("char1", "char2")

// Split a comma-separated string into a list in-place
df.split { "tags"<String>() }.by(",").inplace()

// Expand a list column into separate rows (explode variant)
val dfWithLists = dataFrameOf(
    "a" to columnOf(listOf(1, 2), listOf(3, 4, 5)),
    "b" to columnOf(listOf(1, 2, 3), listOf(4, 5)),
)
dfWithLists.split { a }.intoRows()
```

---

### `explode` — spread list/frame values into rows

Expands list-valued cells into individual rows, duplicating the values in other columns. Reverses `implode`.
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

val df = dataFrameOf("a", "b")(
    1, listOf(1, 2),
    2, listOf(3, 4),
)
df.explode("b")
// Result: a=1/b=1, a=1/b=2, a=2/b=3, a=2/b=4

// Explode multiple columns simultaneously (values aligned)
val df2 = dataFrameOf(
    "a" to columnOf(listOf(1, 2), listOf(3, 4, 5)),
    "b" to columnOf(listOf(1, 2, 3), listOf(4, 5)),
)
df2.explode("a", "b")

// Explode a FrameColumn
val col by columnOf(
    dataFrameOf("a", "b")(1, 2, 3, 4),
    dataFrameOf("a", "b")(5, 6, 7, 8),
)
col.explode()
```

---

## Sorting

### `sortBy` / `sortByDesc` / `sortWith` — sort rows

Returns a `DataFrame` with rows sorted by one or more columns. The modifiers `.desc()` and `.nullsLast()` control the order per column.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

df.sortBy { age }
df.sortBy { age and name.firstName.desc() }
df.sortBy { weight.nullsLast() }
df.sortByDesc { age and weight }

// Custom comparator
df.sortWith { row1, row2 ->
    when {
        row1.age < row2.age -> -1
        row1.age > row2.age -> 1
        else -> row1.name.firstName.compareTo(row2.name.firstName)
    }
}
```

---

## Deduplication

### `distinct` / `distinctBy` — remove duplicate rows

Removes duplicate rows from the `DataFrame`. `distinctBy` keeps only the first row per group defined by the given columns.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

df.distinct()

// Distinct on a subset of columns
df.distinct { age and name }
df.distinct("age", "name")

// Keep the first row per group
df.distinctBy { age and name }
```

---

## Aggregation

### `groupBy` — group rows and aggregate

Splits rows into groups by one or more key columns, then aggregates each group. Returns a `GroupBy` object that can be aggregated, pivoted, sorted, or converted back to a `DataFrame`.
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Simple grouping
df.groupBy { name }
df.groupBy { city and name.lastName }

// Inline computed key
df.groupBy { expr { name.firstName.length + name.lastName.length } named "nameLength" }

// Multi-statistic aggregation
df.groupBy { city }.aggregate {
    count() into "total"
    count { age > 18 } into "adults"
    median { age } into "median age"
    min { age } into "min age"
    maxBy { age }.name into "oldest"
}

// Direct aggregation shortcuts
df.groupBy { city }.max()                          // max per comparable column
df.groupBy { city }.mean()                         // mean per numeric column
df.groupBy { city }.max { age }                    // → column "age"
df.groupBy { city }.sum("total weight") { weight } // → column "total weight"
df.groupBy { city }.count()                        // → column "count"

// Collect raw values without aggregation
df.groupBy { city }.values { name and age }
df.groupBy { city }.values { weight into "weights" }

// Concat groups back into a DataFrame (preserving group order)
df.groupBy { name }.concat()
```

---

### `pivot` — pivot columns from row values

Reshapes the `DataFrame` by turning the distinct values of one column into new column headers, optionally combined with `groupBy` for a full cross-tabulation matrix.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

df.pivot { city }
df.pivot { city and name.firstName }  // independent pivots
df.pivot { city then name.firstName } // hierarchical pivot

// pivot + groupBy (cross-tabulation)
df.pivot { city }.groupBy { name }.aggregate { mean { age } }
df.groupBy { name }.pivot { city }.aggregate { mean { age } }
// Both produce the same result: rows indexed by name, columns by city
```

---

### `describe` — summary statistics

Produces a summary `DataFrame` with per-column statistics: count, unique, nulls, top, freq, mean, std, min, p25, median, p75, max.
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

df.describe()

// Describe specific columns only
df.describe { age and name.allCols() }
```

---

## Joining DataFrames

### `joinWith` — expression-based join

Joins two `DataFrame` objects using an arbitrary Boolean expression. Supports Inner, Left, Right, Full, Filter, and Exclude join types via shortcut functions.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*
import kotlinx.datetime.LocalDate

// Inner join on a date range condition
campaigns.innerJoinWith(visits) { right.date in startDate..endDate }

// Left join
campaigns.leftJoinWith(visits) { right.date in startDate..endDate }

// Right join
campaigns.rightJoinWith(visits) { right.date in startDate..endDate }

// Full join
campaigns.fullJoinWith(visits) { right.date in startDate..endDate }

// Filter join (inner, but keeps only the left columns)
campaigns.filterJoinWith(visits) { right.date in startDate..endDate }

// Exclude join (rows from the left with no match on the right)
campaigns.excludeJoinWith(visits) { right.date in startDate..endDate }

// Cross product (cartesian join)
campaigns.joinWith(visits) { true }

// String-based inner join on equality (deduplicates matched columns)
df1.innerJoin(df2, "index", "age")
```

---

## Combining DataFrames

### `concat` — union rows from multiple DataFrames

Vertically concatenates rows from multiple `DataFrame` objects. Unifies schemas: matching column names get the lowest common supertype; missing columns are filled with `null`.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

df.concat(df1, df2)
listOf(df1, df2).concat()

// Concat selected rows
val rows = listOf(df[2], df[4], df[5])
rows.concat()

// Concat two DataColumn instances
val a by columnOf(1, 2)
val b by columnOf(3, 4)
a.concat(b)

// Explode a FrameColumn (i.e. concat its nested frames)
val frameColumn by columnOf(
    dataFrameOf("a", "b")(1, 2, 3, 4),
    dataFrameOf("b", "c")(5, 6, 7, 8),
)
frameColumn.concat()
```

---

## Type-Safe Access with `@DataSchema`

### `@DataSchema` — declare typed DataFrame schemas

Annotate an interface or data class with `@DataSchema` to generate extension properties for type-safe column access. Use it with `cast<>()` (assertion) or `convertTo<>()` (coercing conversion) to apply the schema to a raw `DataFrame`.

```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.*
import org.jetbrains.kotlinx.dataframe.api.*

@DataSchema
interface Person {
    val firstName: String

    @ColumnName("last_name") // maps to a column literally named "last_name"
    val lastName: String

    val age: Int
    val city: String?
}

// Cast: compile-time assertion that the schema matches (no data conversion)
val df = DataFrame.readCsv("people.csv").cast<Person>()

// Type-safe access via generated extension properties
df.filter { firstName.startsWith("A") && age > 18 }
df.add("greeting") { "Hello, $firstName $lastName" }
df.select { firstName and lastName }
df.convert { firstName }.with { it.uppercase() }

// Schemas for nested / hierarchical data
@DataSchema
data class Name(val firstName: String, val lastName: String)

@DataSchema
data class PersonNested(val name: Name, val age: Int, val city: String?)

@DataSchema
data class Group(val id: String, val participants: List<PersonNested>)

val url = "https://raw.githubusercontent.com/Kotlin/dataframe/refs/heads/master/data/participants.json"
val groupDf = DataFrame.readJson(url).cast<Group>()
groupDf.participants.explode() // expand FrameColumn into rows
```

---

### `convertTo<Schema>` — coerce a DataFrame to a schema

Converts all columns to match the target `@DataSchema`, applying automatic type conversion and accepting custom converters, parsers, and fillers for missing or non-standard columns.
```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.*
import org.jetbrains.kotlinx.dataframe.api.*

class MyType(val value: Int)

@DataSchema
class MySchema(val a: MyType, val b: MyType, val c: Int)

val raw: AnyFrame = dataFrameOf(
    "a" to columnOf(1, 2, 3),
    "b" to columnOf("1", "2", "3"),
)

val typed = raw.convertTo<MySchema> {
    convert<Int>().with { MyType(it) }    // Int → MyType for column "a"
    parser { MyType(it.toInt()) }         // String → MyType for column "b"
    fill { c }.with { a.value + b.value } // compute missing column "c"
}
```

---

## Writing Data

### `writeCsv` / `writeJson` / `writeExcel` / `writeArrowIPC` — persist DataFrames

Saves a `DataFrame` to CSV, JSON, Excel (XLS/XLSX), or Apache Arrow formats.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*
import java.io.File

val df = DataFrame.readCsv("input.csv")
    .filter { "age"<Int>() > 18 }
    .rename("stargazers_count").into("stars")

// CSV
df.writeCsv(File("output.csv"))
val csvString = df.toCsvStr(delimiter = ';', recordSeparator = System.lineSeparator())

// JSON
df.writeJson(File("output.json"))
val jsonString = df.toJson(prettyPrint = true)

// Excel – single sheet
df.writeExcel(File("output.xlsx"))

// Excel – multiple sheets (keepFile = true appends without overwriting)
df.filter { "isHappy"<Boolean>() }.remove("isHappy")
    .writeExcel(File("report.xlsx"), sheetName = "happyPersons", keepFile = true)

// Apache Arrow IPC streaming
df.writeArrowIPC(File("data.arrow"))
val bytes: ByteArray = df.saveArrowIPCToByteArray()

// Apache Arrow Feather (random access)
df.writeArrowFeather(File("data.feather"))
val featherBytes: ByteArray = df.saveArrowFeatherToByteArray()

// Arrow with a target schema and strict mode
val schema = org.apache.arrow.vector.types.pojo.Schema.fromJSON(schemaJson)
df.arrowWriter(
    targetSchema = schema,
    mode = ArrowWriter.Mode(restrictWidening = true, strictType = true, strictNullable = false),
).use { writer ->
    writer.writeArrowFeather(File("typed.feather"))
}
```

---

## Compiler Plugin (Type-Safe Extension Properties)

### Kotlin DataFrame Compiler Plugin — live schema tracking

When the Kotlin DataFrame Compiler Plugin is enabled in a Gradle/Maven project, every `DataFrame` operation that changes the schema generates new extension properties on the fly. This means the IDE and the Kotlin compiler always know the exact shape of the `DataFrame` at each step — no `@DataSchema` annotation required.

```kotlin
// build.gradle.kts
plugins {
    kotlin("jvm") version "2.3.20"
    kotlin("plugin.dataframe") version "2.3.20"
}

dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta5")
}

// gradle.properties: kotlin.incremental=false

// ── Full end-to-end example ──────────────────────────────────────────
import org.jetbrains.kotlinx.dataframe.api.*

// Step 1: read — the plugin infers the schema from CSV headers + content
val df = DataFrame.readCsv(
    "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
)

// Step 2: rename/clean — new schema propagated; "stars" property available below
val repos = df
    .renameToCamelCase()
    .rename { stargazersCount }.into("stars")

// Step 3: filter — typed access to "stars" (Int), no cast needed
val popular = repos.filter { stars > 50 }

// Step 4: transform — convert the "topics" String column to List<String>
val enriched = popular.convert { topics }.with { raw ->
    val inner = raw.removeSurrounding("[", "]")
    if (inner.isEmpty()) emptyList() else inner.split(',').map(String::trim)
}

// Step 5: add a derived column
val result = enriched.add("topicCount") { topics.size }

// Step 6: persist
result.writeCsv("jetbrains_repositories_enriched.csv")
```

---

Kotlin DataFrame is primarily used for two broad patterns.
The first is **ETL and data engineering pipelines** on the JVM: ingesting raw data from CSV files, REST JSON endpoints, SQL databases, or Parquet/Arrow stores; normalizing schemas (including hierarchical/nested ones via `ColumnGroup` and `FrameColumn`); applying filtering, transformation, deduplication, join, and aggregation chains; and writing cleaned data back to file or database. The library's immutable, functional style makes it straightforward to compose multi-step transformations in a readable, auditable manner without side effects.

The second major pattern is **interactive data exploration in Kotlin Notebook or Datalore**, where the Jupyter integration (`%use dataframe`) and the compiler plugin combine to give a pandas-like workflow with full Kotlin type safety. Analysts can read data with auto-detected schemas, explore structure via `describe()`, filter and pivot with DSL expressions, visualise results with Kandy, and export processed frames — all while benefiting from IDE autocompletion and null safety. The `@DataSchema` annotation enables teams to define shared typed contracts for DataFrames that cross service boundaries or are read from external sources, making the library suitable for production data services as well as exploratory notebooks.
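The ETL pattern described above can be sketched as a minimal pipeline using only operations covered in this document; the file names and columns here (`orders_raw.csv`, `customerId`, `quantity`) are illustrative assumptions, not part of any real dataset:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*
import java.io.File

// Ingest → normalize → filter → deduplicate (each step returns a new DataFrame)
val orders = DataFrame.readCsv(File("orders_raw.csv")) // hypothetical input file
    .renameToCamelCase()                               // normalize header style
    .filter { "quantity"<Int>() > 0 }                  // drop empty orders
    .distinct()                                        // remove duplicate rows

// Aggregate per customer, then persist the cleaned result
val perCustomer = orders
    .groupBy("customerId")
    .aggregate {
        count() into "orders"
        sum { "quantity"<Int>() } into "totalQuantity"
    }

perCustomer.writeCsv(File("orders_by_customer.csv"))
```

Because every step is a pure transformation, intermediate frames like `orders` can be inspected (e.g. with `describe()`) or reused without affecting the rest of the pipeline.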