### Convert DataFrame to GroupBy

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/groupBy.md

Illustrates how to convert a regular DataFrame into a GroupBy DataFrame by interpreting a specific `FrameColumn` within the DataFrame as the data groups. This is useful when preparing data for group-wise transformations.

```kotlin
val key by columnOf(1, 2) // create int column with name "key"
val data by columnOf(df[0..3], df[4..6]) // create frame column with name "data"
val df = dataFrameOf(key, data) // create dataframe with two columns

df.asGroupBy { data } // convert dataframe to GroupBy by interpreting 'data' column as groups
```

--------------------------------

### GroupBy DataFrame without MoveToTop

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/groupBy.md

Demonstrates the behavior of groupBy when `moveToTop` is set to `false`. In this case, the grouped column remains nested within its original `ColumnGroup`, preserving the hierarchical structure.

```kotlin
df.groupBy(moveToTop = false) { name.lastName }
```

--------------------------------

### Convert GroupBy to DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/groupBy.md

Shows how to convert a GroupBy DataFrame back into a regular DataFrame. The result pairs the key columns with a `FrameColumn` that holds the data groups, preserving the order established by the grouping and any subsequent transformations. To union the groups into one flat DataFrame, use `concat()` instead.

```kotlin
df.groupBy { city }.toDataFrame()
```

--------------------------------

### GroupBy DataFrame by Column Properties

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/groupBy.md

Demonstrates how to group a DataFrame using column properties. It supports grouping by a single column, by multiple columns, or by a new column created through an expression.
```kotlin
df.groupBy { name }
df.groupBy { city and name.lastName }
df.groupBy { age / 10 named "ageDecade" }
```

--------------------------------

### GroupBy DataFrame with Expression

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/groupBy.md

Shows how to group a DataFrame based on a computed expression. This allows for creating groups based on complex logic or derived values that are not directly represented by existing columns. The resulting group column can be named.

```kotlin
// Extension properties
df.groupBy { expr { name.firstName.length + name.lastName.length } named "nameLength" }

// String column names
df.groupBy { expr { "name"["firstName"]<String>().length + "name"["lastName"]<String>().length } named "nameLength" }
```

--------------------------------

### Group Data by Year and Genre using GroupBy

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/dev/movies/movies.ipynb

This code groups the DataFrame by both the 'year' and 'genres' columns. The `groupBy` operation is applied after filtering and normalizing the genre data. It produces a `GroupBy` object, which is a precursor to aggregation operations. The output includes a synthetic 'group' column containing DataFrames, illustrating DataFrame nesting capabilities.

```kotlin
movies
    .filter { year >= 1920 && genres != "(no genres listed)" }
    .split { genres }.by("|").intoRows()
    .groupBy { year and genres }
```

--------------------------------

### GroupBy DataFrame with MoveToTop Parameter

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/groupBy.md

Explains the use of the `moveToTop` parameter in the groupBy operation. When `moveToTop` is set to `true`, the grouped column is promoted to a top-level column in the resulting DataFrame. This is useful for flattening the structure.
```kotlin
df.groupBy(moveToTop = true) { name.lastName }
```

--------------------------------

### Simplified Group Aggregation in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/groupBy.md

When only one aggregation function is used in a groupBy operation, the column name can be omitted. This simplifies the syntax for single aggregation results.

```kotlin
df.groupBy { city }.aggregate { maxBy { age }.name }
```

```kotlin
df.groupBy("city").aggregate { maxBy("age")["name"] }
```

--------------------------------

### GroupBy DataFrame by Column Strings

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/groupBy.md

Illustrates grouping a DataFrame using column names as strings. It also allows for nested column access and for creating new columns with specified names.

```kotlin
df.groupBy("name")
df.groupBy { "city" and "name"["lastName"] }
df.groupBy { "age"<Int>() / 10 named "ageDecade" }
```

--------------------------------

### Group Data by Year and Genre using GroupBy

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

Shows how to group a DataFrame by multiple columns ('year' and 'genres') using the `groupBy` operation. The result is a `GroupBy` type, which contains DataFrames within a 'group' column and enables aggregation operations.

```kotlin
movies
    .filter { year >= 1920 && genres != "(no genres listed)" }
    .split { genres }.by("|").intoRows()
    .groupBy { year and genres }
```

--------------------------------

### Group DataFrame by Column in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/guides/quickstart.md

Groups rows of a DataFrame based on the values in selected columns using the `.groupBy()` operation. The result is a `GroupBy` object.
```kotlin
val groupedByIsIntellij = dfWithIsIntellij.groupBy { isIntellij }
```

--------------------------------

### Direct Aggregations on Grouped DataFrames in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/groupBy.md

Common aggregation functions can be applied directly to a GroupBy object without calling `aggregate` explicitly. This includes functions like max, mean, sum, count, medianFor, and minFor, supporting both property access and string column names.

```kotlin
df.groupBy { city }.max() // max for every comparable column
df.groupBy { city }.mean() // mean for every numeric column
df.groupBy { city }.max { age } // max age into column "age"
df.groupBy { city }.sum("total weight") { weight } // sum of weights into column "total weight"
df.groupBy { city }.count() // number of rows into column "count"
df.groupBy { city }
    .max { name.firstName.map { it.length } and name.lastName.map { it.length } } // maximum length of firstName or lastName into column "max"
df.groupBy { city }
    .medianFor { age and weight } // median age into column "age", median weight into column "weight"
df.groupBy { city }
    .minFor { (age into "min age") and (weight into "min weight") } // min age into column "min age", min weight into column "min weight"
df.groupBy { city }.meanOf("mean ratio") { weight?.div(age) } // mean of weight/age into column "mean ratio"
```

```kotlin
df.groupBy("city").max() // max for every comparable column
df.groupBy("city").mean() // mean for every numeric column
df.groupBy("city").max("age") // max age into column "age"
df.groupBy("city").sum("weight", name = "total weight") // sum of weights into column "total weight"
df.groupBy("city").count() // number of rows into column "count"
df.groupBy("city").max {
    "name"["firstName"]<String>().map { it.length } and "name"["lastName"]<String>().map { it.length }
} // maximum length of firstName or lastName into column "max"
df.groupBy("city")
    .medianFor("age", "weight") // median age into column "age", median weight into column "weight"
df.groupBy("city")
    .minFor { ("age"<Int>() into "min age") and ("weight"<Int?>() into "min weight") } // min age into column "min age", min weight into column "min weight"
df.groupBy("city").meanOf("mean ratio") { "weight"<Int?>()?.div("age"<Int>()) } // mean of weight/age into column "mean ratio"
```

--------------------------------

### Extracting Group Values Without Aggregation in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/groupBy.md

The `values` function retrieves all column values for each group without performing aggregation. For ValueColumns, it collects values into lists, and for ColumnGroups, it converts them into FrameColumns.

```kotlin
df.groupBy { city }.values()
df.groupBy { city }.values { name and age }
df.groupBy { city }.values { weight into "weights" }
```

```kotlin
df.groupBy("city").values()
df.groupBy("city").values("name", "age")
df.groupBy("city").values { "weight" into "weights" }
```

--------------------------------

### Group and Aggregate DataFrames in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/groupBy.md

Compute statistics per group using the `aggregate` function. This function processes each data group and allows adding new columns to the resulting DataFrame using the `into` function. It supports aggregations on properties or strings.
```kotlin
df.groupBy { city }.aggregate {
    count() into "total"
    count { age > 18 } into "adults"
    median { age } into "median age"
    min { age } into "min age"
    maxBy { age }.name into "oldest"
}
```

```kotlin
df.groupBy("city").aggregate {
    count() into "total"
    count { "age"<Int>() > 18 } into "adults"
    median("age") into "median age"
    min("age") into "min age"
    maxBy("age")["name"] into "oldest"
}
```

```kotlin
df.groupBy("city").aggregate {
    count() into "total"
    count { "age"<Int>() > 18 } into "adults"
    "age"<Int>().median() into "median age"
    "age"<Int>().min() into "min age"
    maxBy("age")["name"] into "oldest"
}
```

--------------------------------

### Group DataFrame by Column

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/resources/guides/quickstart.ipynb

Groups rows of a DataFrame based on the values in specified key columns using the `.groupBy { }` operation. It returns a `GroupBy` object, which is a DataFrame-like structure associating keys with row subsets.

```kotlin
val groupedByIsIntellij = dfWithIsIntellij.groupBy { isIntellij }
groupedByIsIntellij
```

--------------------------------

### Pivot and GroupBy DataFrame (Strings)

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/pivot.md

Demonstrates combining pivot and groupBy operations using string column references for vertical grouping. This creates a structured matrix table.

```kotlin
df.pivot("city").groupBy("name")
```

```kotlin
df.groupBy("name").pivot("city")
```

--------------------------------

### Analyze DataFrame - groupBy

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/_shadow_resources.md

Shows how to group rows of a DataFrame based on the values in one or more columns, preparing for aggregation operations. This is a core step in many analytical tasks.
```kotlin
df.groupBy("category")
```

--------------------------------

### Aggregate Groups in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/guides/quickstart.md

Computes summary statistics for each group in a GroupBy object using the `.aggregate()` operation. Allows specifying multiple aggregations like sum and max.

```kotlin
groupedByIsIntellij.aggregate {
    // Compute sum and max of "starsCount" within each group into "sumStars" and "maxStars" columns
    sumOf { starsCount } into "sumStars"
    maxOf { starsCount } into "maxStars"
}
```

--------------------------------

### Pivot and GroupBy DataFrame (Properties)

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/pivot.md

Combines the pivot and groupBy operations to create a matrix table expanded both horizontally and vertically. It shows how to specify columns for vertical grouping using property access. The order of pivot and groupBy can be reversed with the same outcome.

```kotlin
df.pivot { city }.groupBy { name }
```

```kotlin
df.groupBy { name }.pivot { city }
```

--------------------------------

### Group data by column in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

This snippet explains how to group data in a Kotlin DataFrame based on the values in a specific column. This is often a precursor to aggregation operations like summing or counting within each group. The `groupBy` function is central to this process.
```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.groupBy
import org.jetbrains.kotlinx.dataframe.io.readCSV

fun main() {
    val df = DataFrame.readCSV("path/to/your/file.csv")
    // Group by column name; "categoryColumn" is a placeholder for a column in the file
    val groupedDf = df.groupBy("categoryColumn")
    println(groupedDf)
}
```

--------------------------------

### Grouping and Aggregating in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

Illustrates grouping rows by a specific column and then applying aggregation functions in Kotlin. This is essential for summary statistics and analysis. It uses the `groupBy` and aggregation operators.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    val df = dataFrameOf(
        "category" to listOf("A", "B", "A", "B", "A"),
        "value" to listOf(10, 20, 15, 25, 12),
    )
    // Sum "value" within each category into a column named "total_value"
    val aggregated = df.groupBy("category").aggregate {
        sum("value") into "total_value"
    }
    println(aggregated)
}
```

--------------------------------

### Group by columns using Kotlin DataFrame API

Source: https://github.com/kotlin/dataframe/blob/master/plugins/kotlin-dataframe/testData/diagnostics/schemaInfo.fir.txt

This snippet illustrates how to perform a group-by operation on a DataFrame. It uses the `groupBy` function from the Kotlin DataFrame API, specifying the columns to group by, such as 'lastName' and potentially other group properties.
```kotlin
^ R|/it|.R|org/jetbrains/kotlinx/dataframe/api/groupBy|/Group_92|>( = groupBy@fun R|org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl</Group_92>|.(it: R|org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl</Group_92>|): R|org/jetbrains/kotlinx/dataframe/columns/ColumnsResolver<*>| {
    ^ (this@R|/test|, (this@R|/test|, this@R|special/anonymous|).R|/Scope0.participants|).R|/Scope1.lastName|
}
```

--------------------------------

### Simplify Grouping by Year using Map (Kotlin)

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/dev/netflix/netflix.ipynb

This Kotlin snippet demonstrates a more concise way to group by year by using the `map` function within the column selector directly in the `groupBy` operation. It achieves the same result as the previous snippet but with less code.

```kotlin
val df_date_count = df
    .groupBy { date_added.map { it.year } } // grouping by year added extracted from `date_added`
    .aggregate {
        count { type == "TV Show" } into "tvshows" // counting TV Shows into column `tvshows`
        count { type == "Movie" } into "movies" // counting Movies into column `movies`
    }

df_date_count
```

--------------------------------

### Group DataFrame by 'city' column in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/plugins/kotlin-dataframe/testData/box/reorder.fir.txt

This snippet demonstrates how to group a DataFrame by the 'city' column using the `groupBy` extension function with type-safe column selection. The result is a `GroupBy` object whose groups correspond to the unique city values.

```kotlin
df.groupBy { city }
```

--------------------------------

### Group and Aggregate Data in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

Explains how to group rows by one or more columns and then perform aggregate calculations on the grouped data. This is essential for summarizing and understanding data patterns. It uses the `groupBy` and aggregation functions.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Assuming 'df' has 'category' and 'value' columns
val aggregatedDf = df.groupBy("category").aggregate { mean("value") }
```

--------------------------------

### Dataframe Grouping and Aggregation in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

Explains how to group rows by one or more columns and then perform aggregate operations (like sum, count, average) on the grouped data in Kotlin. This is fundamental for summary statistics. Uses `groupBy` and `aggregate`.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

data class Sale(val product: String, val quantity: Int, val price: Double)

fun main() {
    val sales = listOf(
        Sale("Apple", 10, 0.5),
        Sale("Banana", 20, 0.3),
        Sale("Apple", 5, 0.5),
        Sale("Orange", 15, 0.4),
    )
    val df = sales.toDataFrame()

    // Group by product and calculate total quantity and average price.
    // String column names are used so the snippet does not rely on generated extension properties.
    val aggregatedSales = df.groupBy("product").aggregate {
        sum("quantity") into "total_quantity"
        mean("price") into "average_price"
    }
    println(aggregatedSales)
}
```

--------------------------------

### GroupBy Operations on Kotlin DataFrames

Source: https://context7.com/kotlin/dataframe/llms.txt

Partition DataFrame rows into groups based on key columns and perform aggregations within each group. Supports nested grouping and custom aggregation functions.
```kotlin
// Basic grouping
df.groupBy { city }

// Multiple columns
df.groupBy { city and country }

// Group by column names
df.groupBy("city", "country")

// Group and aggregate
df.groupBy { city }
    .aggregate {
        count() into "count"
        age.mean() into "avgAge"
        income.sum() into "totalIncome"
    }

// Group with filtering
df.groupBy { category }
    .filter { count() > 10 }
    .aggregate {
        values.mean() into "average"
    }
```

--------------------------------

### Group and Aggregate DataFrame in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/dev/movies/movies.ipynb

Shows how to group data by a column and perform aggregate functions (like count, sum, average) on other columns. This is a common data analysis pattern. Uses `groupBy` and `aggregate`.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Sample DataFrame with more data for grouping (assuming 'df' is created along these lines)
// val df = dataFrameOf(
//     "category" to listOf("A", "B", "A", "C", "B"),
//     "value" to listOf(10, 20, 15, 25, 30),
// )

// Group by 'category' and calculate the sum, average, and count of 'value'
val aggregatedDf = df.groupBy("category").aggregate {
    sum("value") into "sum_value"
    mean("value") into "avg_value"
    count() into "count"
}

aggregatedDf.print()
```

--------------------------------

### Group and aggregate data in DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

Illustrates grouping data by a specific column and performing aggregate functions (e.g., sum, count). This is crucial for summarizing data. The `groupBy` and `aggregate` operations are used here.
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Assuming 'df' has 'city' and 'age' columns with generated extension properties
val aggregatedDf = df.groupBy { city }.aggregate {
    count() into "city_count"
    sum { age } into "total_age"
}
```

--------------------------------

### Group and Aggregate Data (Kotlin)

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/dev/movies/movies.ipynb

This snippet shows how to group data in a Kotlin DataFrame by one or more columns and then perform aggregate operations. It uses the `groupBy` and `aggregate` functions. The input is a DataFrame and grouping/aggregation specifications, and the output is an aggregated DataFrame.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // Assuming 'df' is a pre-existing DataFrame
    // val df: DataFrame<*> = ...

    // Example: Group by 'category' and calculate the mean of 'value'
    val aggregatedDf = df.groupBy("category").aggregate {
        mean("value") into "mean_value"
    }
    println(aggregatedDf.head())
}
```

--------------------------------

### Grouped DataFrame Summary Statistics (Single Value)

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/summaryStatistics.md

This snippet shows how to compute a single summary statistic value for each group in a GroupBy DataFrame. It demonstrates computing the mean of a specific column or an expression per group, and how to name the resulting aggregated column.
```kotlin
df.groupBy { city }.mean { age } // [`city`, `mean`]
df.groupBy { city }.meanOf { age / 2 } // [`city`, `mean`]

df.groupBy { city }.mean("mean age") { age } // [`city`, `mean age`]
df.groupBy { city }.meanOf("custom") { age / 2 } // [`city`, `custom`]
```

--------------------------------

### Group and Aggregate Data in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

This code demonstrates how to group data by one or more columns and then perform aggregate calculations (e.g., count, sum, average) on the grouped data.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.readJsonStr

fun main() {
    val jsonString = """
        [
            {"city": "New York", "value": 100},
            {"city": "Los Angeles", "value": 150},
            {"city": "New York", "value": 120},
            {"city": "Chicago", "value": 80},
            {"city": "Los Angeles", "value": 200}
        ]
    """
    // readJsonStr parses JSON text; readJSON expects a file path or URL
    val df = DataFrame.readJsonStr(jsonString)

    // Group by 'city' and calculate the count, sum, and average of 'value'
    val aggregatedDf = df.groupBy("city").aggregate {
        count() into "count"
        sum("value") into "total_value"
        mean("value") into "average_value"
    }
    println(aggregatedDf)
}
```

--------------------------------

### Group and Aggregate DataFrame (Kotlin)

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

This code demonstrates how to group rows in a DataFrame by a column and then perform aggregate operations on the grouped data. It uses the `groupBy` and `aggregate` extension functions. The function takes a DataFrame, the name of the column to group by, and a set of aggregation operations as input, returning a new DataFrame with the aggregated results.
```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.aggregation.AggregateGroupedBody
import org.jetbrains.kotlinx.dataframe.api.aggregate
import org.jetbrains.kotlinx.dataframe.api.groupBy

// Groups `df` by the named column and applies the supplied aggregations.
// Assumption: the aggregation lambda type is AggregateGroupedBody from the `aggregation` package.
fun <T, R> aggregateDataFrame(
    df: DataFrame<T>,
    groupCol: String,
    aggregations: AggregateGroupedBody<T, R>,
): DataFrame<*> = df.groupBy(groupCol).aggregate(aggregations)
```

--------------------------------

### Kotlin DataFrame: Grouping Data by Columns

Source: https://github.com/kotlin/dataframe/blob/master/plugins/kotlin-dataframe/testData/box/reorder.fir.txt

Demonstrates how to group data in a Kotlin DataFrame using the `groupBy` API. This snippet shows the selection of columns for grouping and subsequent processing of the grouped data.

```kotlin
^ R|/it|.R|org/jetbrains/kotlinx/dataframe/api/groupBy|/Invoke_09|>( = groupBy@fun R|org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl</Invoke_09>|.(it: R|org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl</Invoke_09>|): R|org/jetbrains/kotlinx/dataframe/columns/ColumnsResolver<*>| {
    ^ (this@R|/box|, this@R|special/anonymous|).R|/Scope0.city|
}
).R|kotlin/let|/Key_32, /Group_32>|, R|org/jetbrains/kotlinx/dataframe/DataFrame</Key_92>|>( = fun (it: R|org/jetbrains/kotlinx/dataframe/api/GroupBy</Key_32, /Group_32>|): R|org/jetbrains/kotlinx/dataframe/DataFrame</Key_92>| {
    local abstract class Key_92I : R|kotlin/Any| {
        @R|org/jetbrains/kotlinx/dataframe/annotations/Order|(order = Int(0)) public abstract val city: R|kotlin/String?|
            public get(): R|kotlin/String?|

        @R|org/jetbrains/kotlinx/dataframe/annotations/Order|(order = Int(1)) public abstract val a: R|org/jetbrains/kotlinx/dataframe/DataFrame</A_351>|
            public get(): R|org/jetbrains/kotlinx/dataframe/DataFrame</A_351>|

        public constructor(): R|/Key_92I|

    }

    local final class Scope0 : R|kotlin/Any| {
```

--------------------------------

### Concatenate GroupBy

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/concat.md

Applies the `concat` operation to a `GroupBy`. This is useful for combining the groups produced by a grouping operation back into a single DataFrame.

```kotlin
df.groupBy { name }.concat()
```

--------------------------------

### Mean Aggregations with GroupBy and Pivot in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/mean.md

Demonstrates how to compute the mean within different DataFrame aggregation structures like `groupBy` and `pivot`. This allows for calculating means based on grouped or pivoted data.

```kotlin
df.mean()
df.age.mean()
df.groupBy { city }.mean()
df.pivot { city }.mean()
df.pivot { city }.groupBy { name.lastName }.mean()
```

--------------------------------

### Count Animal Occurrences in DataFrame in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/dev/puzzles/40 puzzles.ipynb

This snippet counts the number of occurrences for each distinct value in the 'animal' column. It uses `groupBy` on 'animal' and then `count()`.
```kotlin
df.groupBy { animal }.count()
```

--------------------------------

### Grouping and Aggregating Data in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/dev/movies/movies.ipynb

Explains how to group rows by one or more columns and then aggregate data within those groups using Kotlin DataFrame. This is essential for summary statistics. It leverages the `groupBy` and aggregation functions.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    val df = dataFrameOf(
        "Category" to listOf("A", "B", "A", "B", "A"),
        "Value" to listOf(10, 15, 12, 18, 11),
    )

    // Group by 'Category' and calculate the sum of 'Value'
    val aggregatedDf = df.groupBy("Category").aggregate { sum("Value") }
    println(aggregatedDf)
}
```

--------------------------------

### Count Rows in Aggregations (groupBy, pivot)

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/count.md

Counts rows within aggregation operations like groupBy and pivot. This is useful for determining the size of subgroups or pivoted data.

```kotlin
df.groupBy { city }.count()
df.pivot { city }.count { age > 18 }
df.pivot { name.firstName }.groupBy { name.lastName }.count()
```

--------------------------------

### Extract values in groupBy and pivot aggregations (Kotlin)

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/values.md

Used within aggregation contexts (`groupBy`, `pivot`) to yield a list of column values for each aggregated data group.

```kotlin
df.groupBy { A }.values { B }
df.pivot { A }.values { B }
df.pivot { A }.groupBy { B }.values { C and D }
```

--------------------------------

### Ungroup Data in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/_shadow_resources.md

Shows how to reverse a `groupBy` operation by concatenating the groups back into a single DataFrame. This is useful when you need to flatten a grouped structure back into a single level.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    val df = dataFrameOf(
        "City" to listOf("New York", "New York", "Chicago"),
        "Name" to listOf("Alice", "Charlie", "David"),
        "Age" to listOf(30, 30, 25),
    )

    val grouped = df.groupBy("City")
    println("Grouped DataFrame:\n${grouped.toDataFrame()}\n")

    // Union the data groups back into one flat DataFrame
    val ungrouped = grouped.concat()
    println("Ungrouped DataFrame:\n$ungrouped")
}
```

--------------------------------

### Group and Aggregate Data (Kotlin)

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

Provides an example of grouping data by a specific column and then performing an aggregation (e.g., sum, average). This is a core operation for summarizing and understanding data distributions. The code uses the `groupBy` and `aggregate` functions.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // Assuming 'df' is an existing DataFrame with 'category' and 'value' columns
    // val df: DataFrame<*> = ...

    val aggregatedDf = df.groupBy("category").aggregate {
        sum("value") into "totalValue"
        mean("value") into "averageValue"
    }
    println(aggregatedDf)
}
```

--------------------------------

### Grouped DataFrame Summary Statistics (Multiple Values)

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/summaryStatistics.md

This snippet illustrates computing summary statistics for multiple columns within each group of a GroupBy DataFrame. It shows how to compute statistics separately for specified columns or for all applicable columns in a group.

```kotlin
df.groupBy { city }.meanFor { age and weight } // [`city`, `age`, `weight`]
df.groupBy { city }.mean() // [`city`, `age`, `weight`, ...]
```

--------------------------------

### Percentile Aggregations with GroupBy and Pivot

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/percentile.md

Illustrates how to use the percentile operation in conjunction with groupBy and pivot operations for complex data aggregations in Kotlin DataFrame. This allows for calculating percentiles within specific groups or pivoted columns.

```kotlin
df.percentile(25.0)
df.age.percentile(75.0)
df.groupBy { city }.percentile(50.0)
df.pivot { city }.percentile(75.0)
df.pivot { city }.groupBy { name.lastName }.percentile(25.0)
```

--------------------------------

### Count Rows in Groups in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/guides/quickstart.md

Computes the number of rows within each group of a GroupBy object using the `.count()` aggregation.

```kotlin
groupedByIsIntellij.count()
```

--------------------------------

### Define Group Structure for GroupBy Operation

Source: https://github.com/kotlin/dataframe/blob/master/plugins/kotlin-dataframe/testData/box/reorder.fir.txt

Defines an abstract class `Group_32I` representing the structure of a group after a `groupBy` operation on a DataFrame. It includes fields like 'city', 'age', 'name', and 'weight', with ordering specified by `@Order` annotations.
```kotlin
local abstract class Group_32I : R|kotlin/Any| {
    @R|org/jetbrains/kotlinx/dataframe/annotations/Order|(order = Int(2)) public abstract val city: R|kotlin/String?|
        public get(): R|kotlin/String?|

    @R|org/jetbrains/kotlinx/dataframe/annotations/Order|(order = Int(1)) public abstract val age: R|kotlin/Int|
        public get(): R|kotlin/Int|

    @R|org/jetbrains/kotlinx/dataframe/annotations/Order|(order = Int(0)) public abstract val name: R|kotlin/String|
        public get(): R|kotlin/String|

    @R|org/jetbrains/kotlinx/dataframe/annotations/Order|(order = Int(3)) public abstract val weight: R|kotlin/Int?|
        public get(): R|kotlin/Int?|

}
```

--------------------------------

### Pivot inside Aggregate in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/pivot.md

Demonstrates how to use the pivot transformation within the `aggregate` function of groupBy. This allows combining column pivoting with other groupBy aggregations like mean and count. It supports property access and string-based column access.

```kotlin
df.groupBy { name.firstName }.aggregate {
    pivot { city }.aggregate(separate = true) {
        mean { age } into "mean age"
        count() into "count"
    }
    count() into "total"
}
```

```kotlin
df.groupBy { "name"["firstName"] }.aggregate {
    pivot("city").aggregate(separate = true) {
        mean("age") into "mean age"
        count() into "count"
    }
    count() into "total"
}
```

--------------------------------

### Group and Aggregate Data in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/dev/movies/movies.ipynb

Illustrates how to group data by one or more columns and perform aggregate operations (like sum, average, count) in a Kotlin DataFrame. This is crucial for summarizing data. It requires the `groupBy` and `aggregate` extensions.
```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// `df` is assumed to be already loaded and to contain
// "category", "sales", and "price" columns.
fun summarize(df: DataFrame<*>): DataFrame<*> =
    df.groupBy("category").aggregate {
        sum("sales") into "total_sales"     // total sales per category
        mean("price") into "average_price"  // mean price per category
    }
```

--------------------------------

### Perform DataFrame Grouping by 'city' in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/plugins/kotlin-dataframe/testData/box/reorder.fir.txt

Demonstrates how to group a DataFrame using the `groupBy` function. This snippet specifies the 'city' column as the grouping key and utilizes a lambda to define the resulting structure, including nested data access.

```kotlin
^ R|/it|.R|org/jetbrains/kotlinx/dataframe/api/groupBy|/Invoke_09|>( = groupBy@fun R|org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl</Invoke_09>|.(it: R|org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl</Invoke_09>|): R|org/jetbrains/kotlinx/dataframe/columns/ColumnsResolver<*>| {
    ^ (this@R|/box|, this@R|special/anonymous|).R|/Scope0.city|
}
).R|kotlin/let|/Key_32, /Group_32>|, R|org/jetbrains/kotlinx/dataframe/DataFrame</Key_92>|>( = fun (it: R|org/jetbrains/kotlinx/dataframe/api/GroupBy</Key_32, /Group_32>|): R|org/jetbrains/kotlinx/dataframe/DataFrame</Key_92>| {
```

--------------------------------

### Standard Deviation Aggregations with GroupBy and Pivot

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/std.md

Demonstrates how to compute standard deviation as an aggregation using `groupBy` and `pivot` operations in Kotlin DataFrame. This allows for calculating standard deviations within specific groups or pivoted structures. Null values are ignored, and results are Double. NaN values are propagated by default.
```kotlin
df.std()
df.age.std()
df.groupBy { city }.std()
df.pivot { city }.std()
df.pivot { city }.groupBy { name.lastName }.std()
```

--------------------------------

### Sum of Three Greatest Values per Group - Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/puzzles/40 puzzles.ipynb

Calculates the sum of the three greatest values within each group defined by the 'grps' column. It uses `groupBy`, `aggregate`, `sortDesc`, `take(3)`, and `sum` to achieve this. The result is a DataFrame with group labels and the calculated sums.

```kotlin
val grps by columnOf("a", "a", "a", "b", "b", "c", "a", "a", "b", "c", "c", "c", "b", "b", "c")
val vals by columnOf(12, 345, 3, 1, 45, 14, 4, 52, 54, 23, 235, 21, 57, 3, 87)
val df = dataFrameOf(grps, vals)

df.groupBy { grps }.aggregate {
    vals.sortDesc().take(3).sum() into "res"
}
```

--------------------------------

### Calculate Mean Age per Animal in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/dev/puzzles/40 puzzles.ipynb

This example calculates the mean 'age' for each unique 'animal' type in the DataFrame. It uses `groupBy` on the 'animal' column and then applies `mean` to the 'age' column.

```kotlin
df.groupBy { animal }.mean { age }
```

--------------------------------

### Group and Aggregate Data (Kotlin)

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/dev/movies/movies.ipynb

Shows how to group a DataFrame by one or more columns and then perform aggregate calculations (like sum, mean, count) on other columns. This is crucial for summarizing data. It uses the 'groupBy' and 'aggregate' operations.
```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*

fun main() {
    val df = DataFrame.read("path/to/your/data.csv")
    // Column names are placeholders; the String API resolves them at runtime
    val aggregatedDf = df.groupBy("category").aggregate {
        mean("value") into "mean_value"  // mean of "value" per category
        count() into "count"             // number of rows per category
    }
    println(aggregatedDf.head())
}
```

--------------------------------

### Group and Aggregate Data (Kotlin)

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/dev/movies/movies.ipynb

Demonstrates how to group data by one or more columns and perform aggregate calculations (e.g., sum, average, count) on the grouped data. This is a core operation for summarizing and analyzing data. It uses `groupBy` and aggregation functions.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    val df = dataFrameOf(
        "category" to listOf("Electronics", "Clothing", "Electronics", "Home", "Clothing"),
        "sales" to listOf(1000, 500, 1500, 800, 700),
        "units" to listOf(10, 20, 15, 25, 15),
    )

    // Group by 'category' and calculate total sales and average units
    val aggregatedData = df.groupBy("category").aggregate {
        sum("sales") into "total_sales"
        mean("units") into "avg_units"
    }
    println("Aggregated Sales Data:")
    println(aggregatedData)

    // Another example: group by category and count occurrences
    val countByCategory = df.groupBy("category").count()
    println("Count by Category:")
    println(countByCategory)
}
```

--------------------------------

### Group DataFrame by 'city' and 'age' columns in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/plugins/kotlin-dataframe/testData/box/reorder.fir.txt

This example shows how to perform a multi-column grouping on a DataFrame.
By specifying both 'city' and 'age' in the groupBy function, the DataFrame is partitioned into groups where each group contains rows with the same combination of city and age. This is useful for more granular data analysis.

```kotlin
df.groupBy { city and age }
```

--------------------------------

### Group and Aggregate Data in DataFrame (Kotlin)

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/dev/movies/movies.ipynb

Shows how to group data by one or more columns and perform aggregate functions (like sum, mean, count) using Kotlin. This is a core operation for summarizing data. It utilizes the `groupBy` and `aggregate` functions.

```kotlin
import org.jetbrains.kotlinx.dataframe.*
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*

fun main() {
    val df = DataFrame.readCSV("path/to/your/data.csv")
    // String column names are used because typed accessors such as `city`
    // are not available for a frame read at runtime in a plain program
    val aggregatedDf = df.groupBy("city").aggregate {
        count() into "count"
        mean("age") into "avg_age"
    }
    println(aggregatedDf)
}
```

--------------------------------

### Concatenate and Group DataFrame in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/_shadow_resources.md

Shows how to concatenate multiple DataFrames and then group the resulting DataFrame by one or more columns. This is useful for combining datasets and performing aggregate operations. It uses `concat` followed by `groupBy`.
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    val df1 = dataFrameOf("key" to listOf("A", "B"), "value" to listOf(1, 2))
    val df2 = dataFrameOf("key" to listOf("A", "C"), "value" to listOf(3, 4))

    // Concatenate df1 and df2, then group by 'key'
    val concatenatedAndGrouped = df1.concat(df2).groupBy("key")
    println("Concatenated and Grouped DataFrame:\n$concatenatedAndGrouped")
}
```

--------------------------------

### Pivot with GroupByOther DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/pivot.md

Shows how to pivot a DataFrame and group vertically by all columns except the pivoted ones using `groupByOther()`. This is useful for comprehensive vertical aggregation.

```kotlin
df.pivot { city }.groupByOther()
```

--------------------------------

### DataFrame Pivot Count by Year and Genre (Kotlin)

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/dev/movies/movies.ipynb

This snippet demonstrates how to create a pivot table from a DataFrame. It filters movies by year and genre, splits the 'genres' column into separate rows, sorts the data, and then pivots to count the occurrences of each genre per year. The operation uses `groupBy` and `pivot.count()`.

```kotlin
movies
    .filter { year >= 1920 && genres != "(no genres listed)" }
    .split { genres }.by("|").intoRows()
    .sortBy { year and genres }
    .groupBy { year }.pivot { genres }.count()
```

--------------------------------

### Group DataFrame by Column in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/_shadow_resources.md

Shows how to group rows of a DataFrame based on the unique values in one or more columns. This is often a precursor to aggregation. The `groupBy` function is used.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // Sample DataFrame
    val df = dataFrameOf(
        "category" to listOf("A", "B", "A", "B", "A"),
        "value" to listOf(1, 2, 3, 4, 5),
    )

    // Group DataFrame rows by 'category'
    val dfGrouped = df.groupBy("category")
    println("DataFrame grouped by 'category':\n$dfGrouped")
}
```

--------------------------------

### Group DataFrame by Column in Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

Groups rows of a Kotlin DataFrame based on the unique values in a specified column. This is a common step in aggregation and summary statistics.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.GroupBy
import org.jetbrains.kotlinx.dataframe.api.groupBy

// Returns the GroupBy structure; aggregate it or call toDataFrame() to inspect the groups
fun groupDataFrame(df: DataFrame<*>, columnName: String): GroupBy<*, *> {
    return df.groupBy(columnName)
}
```

--------------------------------

### Group by and Aggregate in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

Illustrates how to group rows by a column and then perform aggregate operations (like count, sum, average) on other columns within each group. This is fundamental for summary statistics.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    val df = dataFrameOf(
        "Category" to listOf("A", "B", "A", "B", "A"),
        "Value" to listOf(10, 20, 15, 25, 12),
    )
    val aggregatedDf = df.groupBy("Category").aggregate {
        count() into "Count"
        sum("Value") into "TotalValue"
    }
    println(aggregatedDf)
}
```

--------------------------------

### Group DataFrame by Column and Aggregate

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/dev/movies/movies.ipynb

Explains how to group rows of a DataFrame by the values in one or more columns and then apply aggregation functions (e.g., sum, mean) to other columns within each group.
```kotlin
// Sum "colA" within each "colB" group and store it as "sumA"
val groupedDf = df.groupBy("colB").aggregate {
    sum("colA") into "sumA"
}
```

--------------------------------

### Kotlin DataFrame: Group By and Aggregate

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

Explains how to group rows by one or more columns and then perform aggregation operations on the grouped data in a Kotlin DataFrame. This is common for summarizing data.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.groupBy
import org.jetbrains.kotlinx.dataframe.api.mean

// Mean of `aggCol` within each group of `groupCol`
fun groupByAndAggregate(df: DataFrame<*>, groupCol: String, aggCol: String): DataFrame<*> {
    return df.groupBy(groupCol).mean(aggCol)
}
```

--------------------------------

### Group and Aggregate Tags per Movie with Kotlin

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/dev/movies/movies.ipynb

Groups the joined DataFrame by 'movieId' and aggregates the 'tag' column into a set for each movie, removing duplicates. It also extracts the first 'title' for each group.

```kotlin
moviesWithTags
    .groupBy { movieId }
    .aggregate {
        title.first() into "title"
        tag.dropNulls().toSet() into "tags"
    }
```

--------------------------------

### Group By and Aggregate in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

Shows how to group rows by a column and then perform aggregation functions (e.g., sum, mean) on other columns.
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    val df = dataFrameOf(
        "category" to listOf("A", "B", "A", "B"),
        "value" to listOf(10, 20, 15, 25),
    )
    // Mean of "value" for each distinct "category"
    val aggregatedDf = df.groupBy("category").mean("value")
    println(aggregatedDf)
}
```

--------------------------------

### Grouping and Aggregating in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

Explains how to group rows by one or more columns and then apply aggregation functions (like sum, count, mean) to the grouped data in Kotlin. This is crucial for summarizing data.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    val df = dataFrameOf(
        "city" to listOf("A", "B", "A", "C", "B"),
        "value" to listOf(10, 20, 15, 25, 30),
    )

    // Group by 'city' and calculate the sum of 'value'
    val aggregatedDf = df.groupBy("city").aggregate {
        sum("value") into "total_value"
    }
    println(aggregatedDf)
}
```

--------------------------------

### Compute Cumulative Sum in Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/docs/StardustDocs/topics/cumSum.md

Calculates the cumulative sum of values in selected columns of a DataFrame. It can handle NA values based on the `skipNA` parameter. The operation is available for DataFrames, DataColumns, and GroupBy transformations.

```kotlin
df.cumSum { weight }
```

```kotlin
df.weight.cumSum()
```

```kotlin
df.groupBy { city }.cumSum { weight }.concat()
```

--------------------------------

### Group Data and Aggregate with Kotlin DataFrame

Source: https://github.com/kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb

Explains how to group data by one or more columns and then apply aggregate functions to each group.
This is essential for summarizing and understanding group-wise statistics.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*

fun main() {
    val df = DataFrame.readCSV("path/to/your/data.csv")

    // Example: group by 'Category' and calculate the mean 'Value' for each category
    val groupedDf = df.groupBy("Category").mean("Value")
    println(groupedDf)
}
```
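
To make the group-then-aggregate semantics of the snippets above concrete, the following sketch performs the same "mean per group" computation on plain Kotlin collections, with no DataFrame dependency. The `Row` class, the `meanByCategory` function, and the sample values are illustrative assumptions and are not part of the Kotlin DataFrame API.

```kotlin
// Hypothetical row type standing in for a two-column table.
data class Row(val category: String, val value: Int)

// Group rows by category, then average `value` within each group.
fun meanByCategory(rows: List<Row>): Map<String, Double> =
    rows.groupBy { it.category }
        .mapValues { (_, group) -> group.map { it.value }.average() }

// Illustrative sample data.
val sampleRows = listOf(Row("A", 10), Row("B", 20), Row("A", 15), Row("B", 25))

fun main() {
    println(meanByCategory(sampleRows)) // prints {A=12.5, B=22.5}
}
```

Each distinct key yields one entry holding the aggregated value, which corresponds to one row per group in the DataFrame result.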