### Example of pandas groupby head

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This pandas example demonstrates how to get the first N rows for each group using `groupby().head()`.

```python
df.groupby('grp').head(2)
```

--------------------------------

### Install Query.jl Package

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/querying_frameworks.md

Use this command to install the Query.jl package using the Julia package manager.

```julia
using Pkg
Pkg.add("Query")
```

--------------------------------

### Example Doctest

Source: https://github.com/juliadata/dataframes.jl/blob/main/CONTRIBUTING.md

Doctests are examples written within docstrings that can be used as test cases. They need to match an interactive REPL, including the `julia>` prompt. Add the header `# Examples` above doctests.

```jldoctest
julia> uppercase("Docstring test")
"DOCSTRING TEST"
```

--------------------------------

### Setup a DataFrame

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Initialize a DataFrame with sample integer data for demonstration purposes. This setup is required before performing manipulation or indexing operations.

```julia
df = DataFrame(x = 1:3, y = 4:6, z = 7:9)  # define data frame
```

--------------------------------

### Install DataFramesMeta.jl Package

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/querying_frameworks.md

Use `Pkg.add` to install the DataFramesMeta.jl package. This is the first step before using its features.

```julia
using Pkg
Pkg.add("DataFramesMeta")
```

--------------------------------

### Get Column Vector by Copying

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Examples of how to get a column as a vector with a copy of the data.

```julia
german[:, :Age]
```

```julia
german[:, "Age"]
```

```julia
german[:, 2]
```

--------------------------------

### Install TidierData.jl Package

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/querying_frameworks.md

Use `Pkg.add` to install the TidierData.jl package.

```julia
using Pkg
Pkg.add("TidierData")
```

--------------------------------

### Install CSV.jl Package

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/importing_and_exporting.md

Install the CSV.jl package using the Pkg manager if it's not already installed.

```julia
using Pkg
Pkg.add("CSV")
```

--------------------------------

### Install CSV.jl Package

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Use `Pkg.add` to install the CSV.jl package. This is a prerequisite for reading CSV files.

```julia
using Pkg

Pkg.add("CSV")
```

--------------------------------

### Install DataFrameMacros.jl

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/querying_frameworks.md

Use `Pkg.add` to install the DataFrameMacros.jl package.

```julia
using Pkg
Pkg.add("DataFrameMacros")
```

--------------------------------

### Create a basic string vector

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/categorical.md

This example shows a naive representation in which every group label is stored as a separate string. Use CategoricalArrays for more efficient storage of such repeated values.

```jldoctest
julia> v = ["Group A", "Group A", "Group A", "Group B", "Group B", "Group B"]
6-element Vector{String}:
 "Group A"
 "Group A"
 "Group A"
 "Group B"
 "Group B"
 "Group B"
```

--------------------------------

### Get DataFrame with Copied Columns

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Examples of how to get a DataFrame with copied columns using various selectors.

```julia
german[:, 1:2]
```

```julia
german[:, [:id, :Age]]
```

```julia
german[:, ["id", "Age"]]
```

--------------------------------

### Example of pandas aggregation returning list

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This pandas example shows how to use `.agg()` to return a list of values (min and max) for a column.

```python
df[['x']].agg(lambda x: [min(x), max(x)])
```

--------------------------------

### Example of pandas aggregate multiple columns

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This pandas example shows how to use `.agg()` to apply different functions to different columns.

```python
df.agg({'x': max, 'y': min})
```

--------------------------------

### Example of pandas join with grouped aggregation and column selection

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This pandas example demonstrates joining aggregated data and then selecting specific columns from the result.

```python
df.join(df.groupby('grp')['x'].mean(), on='grp', rsuffix='_mean')[['grp', 'x_mean']]
```

--------------------------------

### Example of pandas groupby aggregation with rename

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This pandas example shows how to group by a column, calculate the mean, and rename the resulting series.

```python
df.groupby('grp')['x'].mean().rename("my_mean")
```

--------------------------------

### Example of pandas groupby aggregation

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This is a pandas example demonstrating how to group by a column and calculate the mean of another column.

```python
df.groupby('grp')['x'].mean()
```

--------------------------------

### Get DataFrame with Reused Columns

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Examples of how to get a DataFrame with reused columns (no copy) using various selectors.

```julia
german[!, 1:2]
```

```julia
german[!, [:id, :Age]]
```

```julia
german[!, ["id", "Age"]]
```

--------------------------------

### Install DataFrames.jl Package

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/getting_started.md

Use this command to add the DataFrames package to your Julia environment.

```julia
using Pkg
Pkg.add("DataFrames")
```

--------------------------------

### Example of pandas join with grouped aggregation

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This pandas example illustrates joining the original DataFrame with the result of a grouped aggregation.

```python
df.join(df.groupby('grp')['x'].mean(), on='grp', rsuffix='_mean')
```

--------------------------------

### Combine with Multiple Operations

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/split_apply_combine.md

Shows how to use `combine` to apply multiple functions to grouped data. This example calculates the correlation between SepalLength and SepalWidth, and the number of rows in each group.

```julia
combine(iris_gdf, 1:2 => cor, nrow)
```
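As a minimal sketch of the same pattern on data small enough to check by hand, the block below passes two operations to one `combine` call. The table `toy` and its columns `g`, `x`, `y` are hypothetical stand-ins for the grouped iris data, and `sort=true` is passed only to make the group order predictable.

```julia
using DataFrames, Statistics

# Hypothetical toy data standing in for the grouped iris table.
toy = DataFrame(g = ["a", "a", "a", "b", "b", "b"],
                x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
                y = [2.0, 4.0, 6.0, 3.0, 2.0, 1.0])
toy_gdf = groupby(toy, :g, sort=true)

# Two operations in one call: a two-column correlation and a per-group row count.
stats = combine(toy_gdf, [:x, :y] => cor, nrow)
```

The result should have one row per group, with the correlation column named after both source columns and the function (`x_y_cor`) and the count column named `nrow`.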

--------------------------------

### Get Type of Basic Operation Pairs

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Demonstrates the types of basic operation pairs created using symbols, strings, or integers.

```julia
julia> typeof(:x => :a)
Pair{Symbol, Symbol}
```

```julia
julia> typeof("x" => "a")
Pair{String, String}
```

```julia
julia> typeof(1 => "a")
Pair{Int64, String}
```
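To show where such pairs are typically used, here is a small sketch that passes each of the three forms to `select` as a "source column => new name" operation. The table `df_pairs` is a hypothetical example, not part of the original text.

```julia
using DataFrames

# Hypothetical two-column table used only for illustration.
df_pairs = DataFrame(x = 1:3, y = 4:6)

# Each pair means "take this source column and call it `a` in the result".
by_symbol = select(df_pairs, :x => :a)    # Pair{Symbol, Symbol}
by_string = select(df_pairs, "x" => "a")  # Pair{String, String}
by_index  = select(df_pairs, 1 => "a")    # Pair{Int64, String}
```

Under these assumptions all three calls should produce the same one-column data frame.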

--------------------------------

### Example of pandas row-wise argmax with apply

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This pandas example uses `.apply()` with `axis=1` to find the column name corresponding to the maximum value in each row.

```python
df.assign(x_y_argmax = df.apply(lambda v: df.columns[v.argmax()], axis=1))
```

--------------------------------

### Add DataFrames.jl Package

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Use this command to install the DataFrames.jl package using Julia's Pkg manager.

```julia
using Pkg

Pkg.add("DataFrames")
```

--------------------------------

### Combine with Do Block for Grouped Statistics

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/split_apply_combine.md

Demonstrates the `do` block form of the `combine` function for applying operations to grouped data. This example calculates the mean and variance of PetalLength for each species.

```julia
combine(iris_gdf) do df
    (m = mean(df.PetalLength), s² = var(df.PetalLength))
end
```
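The following sketch runs the same `do`-block form on a hypothetical four-row table so the numbers can be verified by hand. The names `toy`, `g`, `v`, and `summary_tbl` are illustrative assumptions.

```julia
using DataFrames, Statistics

# Hypothetical data: two groups with two values each.
toy = DataFrame(g = ["a", "a", "b", "b"], v = [1.0, 3.0, 2.0, 6.0])

# The block receives each group as a data frame; the returned NamedTuple
# becomes one row, and its field names become column names.
summary_tbl = combine(groupby(toy, :g, sort=true)) do sdf
    (m = mean(sdf.v), s² = var(sdf.v))
end
```

Returning a `NamedTuple` is what lets a single function produce several named output columns at once.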

--------------------------------

### Construct DataFrame Column by Column

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/getting_started.md

Start with an empty DataFrame and add columns one at a time: assign with `df.ColumnName = data` or `df[:, :ColumnName] = data`, or broadcast a scalar into a new column with `df[!, :ColumnName] .= value`.

```jldoctest
julia> df = DataFrame()
0×0 DataFrame

julia> df.A = 1:8
1:8

julia> df[:, :B] = ["M", "F", "F", "M", "F", "M", "M", "F"]
8-element Vector{String}:
 "M"
 "F"
 "F"
 "M"
 "F"
 "M"
 "M"
 "F"

julia> df[!, :C] .= 0
8-element Vector{Int64}:
 0
 0
 0
 0
 0
 0
 0
 0

julia> df
8×3 DataFrame
 Row │ A      B       C
     │ Int64  String  Int64
─────┼──────────────────────
   1 │     1  M           0
   2 │     2  F           0
   3 │     3  F           0
   4 │     4  M           0
   5 │     5  F           0
   6 │     6  M           0
   7 │     7  M           0
   8 │     8  F           0
```

```jldoctest
julia> size(df, 1)
8
```

```jldoctest
julia> size(df, 2)
3
```

```jldoctest
julia> size(df)
(8, 3)
```

--------------------------------

### Iterating Over Query Results

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/querying_frameworks.md

This example shows how to loop through the iterator returned by a Query.jl query using a standard Julia for loop to process the results.

```jldoctest
julia> total_children = 0
0

julia> for i in q2
           global total_children += i.number_of_children
       end

julia> total_children
4
```

--------------------------------

### Example of pandas groupby output

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This is the output of a pandas `groupby().mean()` operation, showing a Series with the group keys and the aggregated values.

```python
grp
1    4.0
2    3.0
Name: x, dtype: float64
```

--------------------------------

### Manage Column Metadata

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/lib/metadata.md

Illustrates adding and retrieving metadata for specific columns. Use `colmetadata!` to add, `colmetadatakeys` to list keys, and `colmetadata` to get values. `emptycolmetadata!` removes all column metadata.

```jldoctest
julia> colmetadatakeys(df)
()
```

```jldoctest
julia> colmetadata!(df, :name, "label", "First and last name of a player", style=:note);
```

```jldoctest
julia> colmetadata!(df, :date, "label", "Rating date in yyyy-u format", style=:note);
```

```jldoctest
julia> colmetadata!(df, :rating, "label", "ELO rating in classical time control", style=:note);
```

```jldoctest
julia> "label" in colmetadatakeys(df, :rating)
true
```

```jldoctest
julia> colmetadata(df, :rating, "label")
"ELO rating in classical time control"
```

```jldoctest
julia> colmetadata(df, :rating, "label", style=true)
("ELO rating in classical time control", :note)
```

```jldoctest
julia> collect(colmetadatakeys(df))
3-element Vector{Pair{Symbol, Base.KeySet{String, Dict{String, Tuple{Any, Any}}}}}:
   :date => ["label"]
 :rating => ["label"]
   :name => ["label"]
```

```jldoctest
julia> [only(names(df, col)) =>
        [key => colmetadata(df, col, key) for key in metakeys] for
        (col, metakeys) in colmetadatakeys(df)]
3-element Vector{Pair{String, Vector{Pair{String, String}}}}:
   "date" => ["label" => "Rating date in yyyy-u format"]
 "rating" => ["label" => "ELO rating in classical time control"]
   "name" => ["label" => "First and last name of a player"]
```

```jldoctest
julia> emptycolmetadata!(df);

julia> colmetadatakeys(df)
()
```

--------------------------------

### Subset rows and select columns with DataFramesMeta.jl

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/querying_frameworks.md

This example shows how to subset rows based on a condition and select specific columns, renaming one during the process.

```julia
using DataFramesMeta

df = DataFrame(name=["John", "Sally", "Roger"],
               age=[54.0, 34.0, 79.0],
               children=[0, 2, 4])

@chain df begin
    @rsubset :age > 40
    @select(:number_of_children = :children, :name)
end
```

--------------------------------

### Create and Display a DataFrame

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/working_with_dataframes.md

Demonstrates creating a DataFrame and its default printing behavior, which shows a sample of rows and columns. Requires the DataFrames package.

```julia
using DataFrames

df = DataFrame(A=1:2:1000, B=repeat(1:10, inner=50), C=1:500)
```

--------------------------------

### Example of DataFrames.jl groupby output

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This is the output of a DataFrames.jl `combine(groupby(df, :grp), :x => mean)` operation, showing a DataFrame with the group keys and the aggregated mean values.

```julia
2×2 DataFrame
 Row │ grp    x_mean 
     │ Int64  Float64
─────┼────────────────
   1 │     1      4.0
   2 │     2      3.0
```

--------------------------------

### Manage DataFrame Metadata

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/lib/metadata.md

Demonstrates adding, checking for, and retrieving DataFrame-level metadata. Use `metadata!` to add, `metadatakeys` to list keys, and `metadata` to get values. `emptymetadata!` removes all metadata.

```jldoctest
julia> metadatakeys(df)
()
```

```jldoctest
julia> metadata!(df, "caption", "ELO ratings of chess players", style=:note);

julia> collect(metadatakeys(df))
1-element Vector{String}:
 "caption"
```

```jldoctest
julia> "caption" in metadatakeys(df)
true
```

```jldoctest
julia> metadata(df, "caption")
"ELO ratings of chess players"
```

```jldoctest
julia> metadata(df, "caption", style=true)
("ELO ratings of chess players", :note)
```

```jldoctest
julia> emptymetadata!(df);

julia> metadatakeys(df)
()
```

--------------------------------

### Stack DataFrame for aggregation

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/reshaping_and_pivoting.md

Use `stack` to prepare data for aggregation. This example stacks the `iris` DataFrame, excluding the `:Species` column, to facilitate split-apply-combine operations.

```jldoctest
julia> using Statistics

julia> d = stack(iris, Not(:Species))
750×3 DataFrame
 Row │ Species         variable     value
     │ String15        String       Float64
─────┼──────────────────────────────────────
   1 │ Iris-setosa     SepalLength      5.1
   2 │ Iris-setosa     SepalLength      4.9
   3 │ Iris-setosa     SepalLength      4.7
   4 │ Iris-setosa     SepalLength      4.6
   5 │ Iris-setosa     SepalLength      5.0
   6 │ Iris-setosa     SepalLength      5.4
   7 │ Iris-setosa     SepalLength      4.6
   8 │ Iris-setosa     SepalLength      5.0
```

--------------------------------

### Get Column Vector Without Copying

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Examples of how to get a column as a vector without copying the data, which is more memory-efficient.

```julia
german.Age
```

```julia
german."Age"
```

```julia
german[!, :Age]
```

```julia
german[!, "Age"]
```

```julia
german[!, 2]
```
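A short sketch of what "without copying" means in practice: every form above should hand back the very same vector object that the data frame stores, so a write through it is visible in the table. The table `people` is a hypothetical stand-in for `german`, with `Age` placed in column 2 to match.

```julia
using DataFrames

# Hypothetical stand-in for the `german` table; `Age` is column 2 here as well.
people = DataFrame(id = 1:3, Age = [67, 22, 49])

# Compare each non-copying access form against `people.Age` by identity.
aliases = (people."Age", people[!, :Age], people[!, "Age"], people[!, 2])
all_shared = all(col -> col === people.Age, aliases)

# Writing through the returned vector changes the stored column.
people[!, :Age][1] = 0
```

By contrast, `people[:, :Age]` allocates a new vector each time it is called.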

--------------------------------

### Combined Query with Filtering and Selection

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/querying_frameworks.md

This example shows a Query.jl query that combines filtering and selection, and collects the results into a Vector. It demonstrates selecting a single value per row.

```jldoctest
julia> q3 = @from i in df begin
            @where i.age > 40 && i.children > 0
            @select i.name
            @collect
       end
1-element Vector{String}:
 "Roger"
```

--------------------------------

### Create and Combine DataFrame

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Demonstrates creating a DataFrame and then using the `combine` function to aggregate a column.

```julia
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 4])
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      4

julia> combine(df, :a => sum)
1×1 DataFrame
 Row │ a_sum
     │ Int64
─────┼───────
   1 │     6
```

--------------------------------

### Get DataFrame Dimensions with size()

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Use the `size` function to get the dimensions (rows, columns) of a DataFrame. You can also specify a dimension to get only the number of rows or columns.

```jldoctest
julia> german = copy(german_ref);

julia> size(german)
(1000, 10)

julia> size(german, 1)
1000

julia> size(german, 2)
10
```

--------------------------------

### Display All Rows and Columns of a DataFrame

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/working_with_dataframes.md

Shows how to use the `show` function with `allrows=true` and `allcols=true` to display all rows and columns of a DataFrame, respectively. This is useful when the default sample is insufficient.

```julia
show(df, allrows=true)
show(df, allcols=true)
```

--------------------------------

### Create DataFrame and GroupedDataFrame

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/split_apply_combine.md

Initializes a sample DataFrame and groups it by 'customer_id' for subsequent operations.

```julia
julia> df = DataFrame(customer_id=["a", "b", "b", "b", "c", "c"],
                      transaction_id=[12, 15, 19, 17, 13, 11],
                      volume=[2, 3, 1, 4, 5, 9])
6×3 DataFrame
 Row │ customer_id  transaction_id  volume
     │ String       Int64           Int64
─────┼─────────────────────────────────────
   1 │ a                        12       2
   2 │ b                        15       3
   3 │ b                        19       1
   4 │ b                        17       4
   5 │ c                        13       5
   6 │ c                        11       9

julia> gdf = groupby(df, :customer_id, sort=true);

julia> show(gdf, allgroups=true)
GroupedDataFrame with 3 groups based on key: customer_id
Group 1 (1 row): customer_id = "a"
 Row │ customer_id  transaction_id  volume
     │ String       Int64           Int64
─────┼─────────────────────────────────────
   1 │ a                        12       2
Group 2 (3 rows): customer_id = "b"
 Row │ customer_id  transaction_id  volume
     │ String       Int64           Int64
─────┼─────────────────────────────────────
   1 │ b                        15       3
   2 │ b                        19       1
   3 │ b                        17       4
Group 3 (2 rows): customer_id = "c"
 Row │ customer_id  transaction_id  volume
     │ String       Int64           Int64
─────┼─────────────────────────────────────
   1 │ c                        13       5
   2 │ c                        11       9
```

--------------------------------

### Create DataFrame with Temperature Data in Julia

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Initializes a DataFrame with time and temperature readings from three different locations.

```julia
julia> df = DataFrame(Time = 1:4,
                      Temperature1 = [20, 23, 25, 28],
                      Temperature2 = [33, 37, 41, 44],
                      Temperature3 = [15, 10, 4, 0])
4×4 DataFrame
 Row │ Time   Temperature1  Temperature2  Temperature3
     │ Int64  Int64         Int64         Int64
─────┼─────────────────────────────────────────────────
   1 │     1            20            33            15
   2 │     2            23            37            10
   3 │     3            25            41             4
   4 │     4            28            44             0
```

--------------------------------

### Example of pandas mean aggregation on multiple columns

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This pandas example calculates the mean for multiple specified columns.

```python
df[['x', 'y']].mean()
```

--------------------------------

### Create and Initialize DataFrame

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/lib/metadata.md

Initializes a DataFrame with sample data for names, dates, and ratings. This serves as the base for metadata operations.

```jldoctest
julia> using DataFrames

julia> df = DataFrame(name=["Jan Krzysztof Duda", "Jan Krzysztof Duda",
                           "Radosław Wojtaszek", "Radosław Wojtaszek"],
                      date=["2022-Jun", "2021-Jun", "2022-Jun", "2021-Jun"],
                      rating=[2750, 2729, 2708, 2687])
4×3 DataFrame
 Row │ name                date      rating
     │ String              String    Int64
─────┼──────────────────────────────────────
   1 │ Jan Krzysztof Duda  2022-Jun    2750
   2 │ Jan Krzysztof Duda  2021-Jun    2729
   3 │ Radosław Wojtaszek  2022-Jun    2708
   4 │ Radosław Wojtaszek  2021-Jun    2687
```

--------------------------------

### Get Group Indices as a Vector

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/split_apply_combine.md

The `groupindices` function can be called directly on a `GroupedDataFrame` to get a vector of group indices for each row.

```jldoctest
julia> groupindices(gdf)
6-element Vector{Union{Missing, Int64}}:
 1
 2
 2
 2
 3
 3
```

--------------------------------

### Compare DataFrame column selection with Vector indexing

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/working_with_dataframes.md

Standard indexing with `df[:, :x1]` returns the column as a `Vector`, whereas selecting the same single column with `select` returns a `DataFrame`. The example shows the indexing form.

```julia
julia> df[:, :x1]
2-element Vector{Int64}:
 1
 2
```
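The contrast can be sketched side by side as follows. The table `df_cmp` and its second column `x2` are assumptions made for illustration; only the column name `x1` comes from the example above.

```julia
using DataFrames

# Hypothetical table with the same `x1` column as the example above.
df_cmp = DataFrame(x1 = [1, 2], x2 = [3, 4])

as_frame  = select(df_cmp, :x1)   # one-column DataFrame
as_vector = df_cmp[:, :x1]        # plain Vector{Int64} (a copy of the column)
```

Which form to use depends on what comes next: further table operations expect a `DataFrame`, while numeric code usually wants the vector.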

--------------------------------

### Initialize DataFrame with Columns and Scalar Broadcasting

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Create a DataFrame by providing column names and their corresponding data. Scalars are automatically broadcasted to fill all rows.

```jldoctest
julia> DataFrame(A=1:3, B=5:7, fixed=1)
3×3 DataFrame
 Row │ A      B      fixed
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      5      1
   2 │     2      6      1
   3 │     3      7      1
```

--------------------------------

### Create Sample DataFrames in Python (pandas)

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

Creates two sample DataFrames in Python using the pandas and numpy libraries. Because pandas data frames carry a row index, the example data frame uses 'a' to 'f' as row labels rather than a separate 'id' column.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'grp': [1, 2, 1, 2, 1, 2],
                   'x': range(6, 0, -1),
                   'y': range(4, 10),
                   'z': [3, 4, 5, 6, 7, None]},
                   index = list('abcdef'))
df2 = pd.DataFrame({'grp': [1, 3], 'w': [10, 11]})
```

--------------------------------

### Get Total Number of Rows (Regular Function)

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/split_apply_combine.md

Demonstrates the use of `nrow` as a regular function to get the total number of rows in a DataFrame.

```julia
julia> nrow(df)
6
```

--------------------------------

### Performance Comparison: Indexing vs. View Creation

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

In the benchmark below, creating a view takes about 67 ns and 32 bytes, compared with about 9.9 μs and 57.56 KiB for indexing, which copies the selected data. The trade-off is that a view shares memory with the parent DataFrame, so a write through the view changes the parent.

```julia
julia> using BenchmarkTools

julia> @btime $german[1:end-1, 1:end-1];
  9.900 μs (44 allocations: 57.56 KiB)
```

```julia
julia> @btime @view $german[1:end-1, 1:end-1];
  67.332 ns (2 allocations: 32 bytes)
```
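The memory-sharing caveat can be sketched on a small table. The names `parent_df`, `v`, and `c` are hypothetical; the block only assumes that `@view` on a data frame yields a `SubDataFrame`.

```julia
using DataFrames

# Hypothetical parent table.
parent_df = DataFrame(a = [1, 2, 3, 4], b = [5, 6, 7, 8])

v = @view parent_df[1:2, :]   # no data copied
c = parent_df[1:2, :]         # independent copy of the selected rows

v[1, :a] = 100                # writes through to parent_df
c[1, :b] = -1                 # affects only the copy
```

A view is therefore the cheaper choice for read-only access, while a copy is the safer choice when the subset will be modified.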

--------------------------------

### Copy Columns vs. Reuse Columns

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

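Demonstrates the difference between copying columns (`:`) and reusing columns (`!`). Both calls print the same table, but `!` avoids the copy, which saves memory and time. The cost is that the result shares its columns with the source, so mutating one also mutates the other, which can lead to subtle bugs.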

```julia
julia> german[:, [:Sex]]
1000×1 DataFrame
  Row │ Sex
      │ String7
──────┼─────────
    1 │ male
    2 │ female
    3 │ male
    4 │ male
    5 │ male
    6 │ male
    7 │ male
    8 │ male
  ⋮   │    ⋮
  994 │ male
  995 │ male
  996 │ female
  997 │ male
  998 │ male
  999 │ male
 1000 │ male
985 rows omitted
```

```julia
julia> german[!, [:Sex]]
1000×1 DataFrame
  Row │ Sex
      │ String7
──────┼─────────
    1 │ male
    2 │ female
    3 │ male
    4 │ male
    5 │ male
    6 │ male
    7 │ male
    8 │ male
  ⋮   │    ⋮
  994 │ male
  995 │ male
  996 │ female
  997 │ male
  998 │ male
  999 │ male
 1000 │ male
985 rows omitted
```
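Because the two results print identically, the difference only shows up on mutation. The sketch below uses a hypothetical three-row table `src` in place of `german` to make that visible.

```julia
using DataFrames

# Hypothetical stand-in for the `german` table.
src = DataFrame(id = 1:3, Sex = ["male", "female", "male"])

copied = src[:, [:Sex]]   # new data frame with its own column vector
reused = src[!, [:Sex]]   # new data frame wrapping the same column vector

reused.Sex[2] = "male"    # this write also changes `src`, but not `copied`
```

After the write, `src` and `reused` agree while `copied` still holds the original value.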

--------------------------------

### Get Number of Rows/Columns with nrow() and ncol()

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

The `nrow` and `ncol` functions provide a direct way to get the number of rows and columns in a DataFrame, respectively.

```jldoctest
julia> nrow(german)
1000

julia> ncol(german)
10
```

--------------------------------

### Combine with Custom Output Column Names

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/split_apply_combine.md

Illustrates using `combine` to apply a function that returns multiple values, aliasing them to new column names. This example finds the minimum and maximum PetalLength for each species.

```julia
combine(iris_gdf, :PetalLength => (x -> [extrema(x)]) => [:min, :max])
```
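The same call shape is sketched here on a hypothetical table whose extrema are easy to read off. Wrapping the tuple from `extrema` in a one-element vector, as in the example above, is what produces one output row per group, with the tuple's two elements spread across the two named columns.

```julia
using DataFrames

# Hypothetical data: two groups with two values each.
toy = DataFrame(g = ["a", "a", "b", "b"], v = [1.0, 3.0, 2.0, 6.0])

# One row per group; the (min, max) tuple is split into the two target columns.
ranges = combine(groupby(toy, :g, sort=true),
                 :v => (x -> [extrema(x)]) => [:min, :max])
```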

--------------------------------

### Example of pandas assign with correlation

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This pandas example uses `.assign()` to add a new column calculated using the correlation between two existing columns.

```python
df.assign(x_y_cor = np.corrcoef(df.x, df.y)[0, 1])
```

--------------------------------

### Example of pandas complex function aggregation

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This pandas example uses `.agg()` with a lambda function to apply a complex operation (mean of cosine) to a column.

```python
df[['z']].agg(lambda v: np.mean(np.cos(v)))
```

--------------------------------

### Create DataFrame from a Dictionary with Symbol Keys

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Initialize a DataFrame from a dictionary where keys are Symbols representing column names. Using Symbols is generally faster than strings.

```jldoctest
julia> dict = Dict(:customer_age => [15, 20, 25],
                   :first_name => ["Rohit", "Rahul", "Akshat"])
Dict{Symbol, Vector} with 2 entries:
  :customer_age => [15, 20, 25]
  :first_name   => ["Rohit", "Rahul", "Akshat"]

julia> DataFrame(dict)
3×2 DataFrame
 Row │ customer_age  first_name
     │ Int64         String
─────┼──────────────────────────
   1 │           15  Rohit
   2 │           20  Rahul
   3 │           25  Akshat
```

--------------------------------

### Example of pandas mean aggregation on columns matching regex

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This pandas example calculates the mean for columns whose names match a given regular expression.

```python
df.filter(regex=("^x")).mean()
```

--------------------------------

### Select Rows and All Columns

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Selects the first 5 rows and all columns. The colon `:` in the column position selects every column.

```julia
julia> german[1:5, :]
5×10 DataFrame
 Row │ id     Age    Sex      Job    Housing  Saving accounts  Checking accoun ⋯
     │ Int64  Int64  String7  Int64  String7  String15         String15        ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │     0     67  male         2  own      NA               little          ⋯
   2 │     1     22  female       2  own      little           moderate
   3 │     2     49  male         1  own      little           NA
   4 │     3     45  male         2  free     little           little
   5 │     4     53  male         2  free     little           little          ⋯
                                                               4 columns omitted
```

--------------------------------

### Create and Filter DataFrame with TidierData.jl

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/querying_frameworks.md

Demonstrates creating a DataFrame and then filtering and selecting columns using TidierData.jl's @chain, @filter, and @select macros.

```jldoctest tidierdata
julia> using TidierData

julia> df = DataFrame(
                name = ["John", "Sally", "Roger"],
                age = [54.0, 34.0, 79.0],
                children = [0, 2, 4]
            )
3×3 DataFrame
 Row │ name    age      children
     │ String  Float64  Int64
─────┼───────────────────────────
   1 │ John       54.0         0
   2 │ Sally      34.0         2
   3 │ Roger      79.0         4

julia> @chain df begin
           @filter(children != 2)
           @select(name, num_children = children)
       end
2×2 DataFrame
 Row │ name    num_children 
     │ String  Int64        
─────┼──────────────────────
   1 │ John               0
   2 │ Roger              4
```

--------------------------------

### Example of pandas row-wise operation with apply

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

This pandas example uses `.apply()` with `axis=1` to perform an operation (finding the minimum) row by row across specified columns.

```python
df.assign(x_y_min = df.apply(lambda v: min(v.x, v.y), axis=1))
```

--------------------------------

### Initialize data.table objects in R

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/comparisons.md

Initializes two data.table objects in R for demonstration purposes.

```R
library(data.table)
df  <- data.table(grp = rep(1:2, 3), x = 6:1, y = 4:9,
                  z = c(3:7, NA), id = letters[1:6])
df2 <- data.table(grp=c(1,3), w = c(10,11))
```

--------------------------------

### Safe Group Retrieval with get

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/lib/indexing.md

Use the `get` function to retrieve a group by its key (Tuple or NamedTuple), providing a default value if the key does not exist. This prevents errors for missing keys.

```julia
get(gd, key::Union{Tuple, NamedTuple}, default)
```
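A minimal usage sketch of this signature, assuming a hypothetical table grouped by a single `customer_id` column and using `nothing` as the default value:

```julia
using DataFrames

# Hypothetical data grouped by one key column.
orders = DataFrame(customer_id = ["a", "b", "b"], volume = [2, 3, 1])
gd = groupby(orders, :customer_id)

found  = get(gd, ("b",), nothing)                  # Tuple key, group exists
absent = get(gd, (customer_id = "z",), nothing)    # NamedTuple key, no such group
```

Checking the result against the default (here `absent === nothing`) avoids the `KeyError` that indexing with `gd[("z",)]` would raise.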

--------------------------------

### Basic Query with Filtering and Projection

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/querying_frameworks.md

This example demonstrates a basic Query.jl query that filters rows based on age and selects specific columns, renaming one, and collecting the results into a new DataFrame.

```jldoctest
julia> using DataFrames, Query

julia> df = DataFrame(name=["John", "Sally", "Roger"],
                      age=[54.0, 34.0, 79.0],
                      children=[0, 2, 4])
3×3 DataFrame
 Row │ name    age      children
     │ String  Float64  Int64
─────┼───────────────────────────
   1 │ John       54.0         0
   2 │ Sally      34.0         2
   3 │ Roger      79.0         4

julia> q1 = @from i in df begin
            @where i.age > 40
            @select {number_of_children=i.children, i.name}
            @collect DataFrame
       end
2×2 DataFrame
 Row │ number_of_children  name
     │ Int64               String
─────┼────────────────────────────
   1 │                  0  John
   2 │                  4  Roger
```

--------------------------------

### Get Row Count of Grouped DataFrames

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/split_apply_combine.md

Demonstrates how to get the number of rows in each group using `map(nrow, ...)` and the broadcast form `nrow.(...)`. Here `sdf_vec` is assumed to be a vector of the groups' sub-data frames (for example, obtained by calling `collect` on a GroupedDataFrame) rather than the GroupedDataFrame itself.

```julia
julia> map(nrow, sdf_vec)
3-element Vector{Int64}:
 50
 50
 50
```

```julia
julia> nrow.(sdf_vec)
3-element Vector{Int64}:
 50
 50
 50
```
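
A self-contained sketch of the same idea, assuming a small made-up data frame in place of the iris data. The groups are first collected into a vector and then counted; a comprehension over the GroupedDataFrame gives the same counts.

```julia
using DataFrames

# Hypothetical data: two groups of different sizes
df = DataFrame(g = ["a", "a", "b"], v = 1:3)
gdf = groupby(df, :g; sort = true)

# Collect the groups into a vector of sub-data frames
sdf_vec = collect(gdf)

counts_map = map(nrow, sdf_vec)               # function applied to each group
counts_bcast = nrow.(sdf_vec)                 # broadcast form
counts_comp = [nrow(sdf) for sdf in gdf]      # iterating the GroupedDataFrame directly
```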

--------------------------------

### Get All Indices with `eachindex`

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/split_apply_combine.md

The `eachindex` function, when applied to a vector or array, returns a sequence of all indices.

```jldoctest
julia> collect(eachindex(df.customer_id))
6-element Vector{Int64}:
 1
 2
 3
 4
 5
 6
```

--------------------------------

### Get Single Cell from DataFrame

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Use matrix-like indexing to retrieve a single cell's value from a DataFrame.

```jldoctest
julia> german[4, 4]
2
```

--------------------------------

### Get Group Indices with `combine`

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/split_apply_combine.md

Use `combine` with `groupindices` to get the group number of each group; the result has one row per group. This operation is column-independent.

```jldoctest
julia> combine(gdf, groupindices)
3×2 DataFrame
 Row │ customer_id  groupindices
     │ String       Int64
─────┼───────────────────────────
   1 │ a                       1
   2 │ b                       2
   3 │ c                       3
```

--------------------------------

### Nested Pair Creation and Access in Julia

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Demonstrates the creation of nested pairs in Julia and how to access their elements. Accessing an index beyond the pair's structure results in a BoundsError.

```julia
julia> p = :x => :y => :z
:x => (:y => :z)
```

```julia
julia> p[1]
:x
```

```julia
julia> p[2]
:y => :z
```

```julia
julia> p[2][1]
:y
```

```julia
julia> p[2][2]
:z
```

```julia
julia> p[3] # there is no index 3 for a pair
ERROR: BoundsError: attempt to access Pair{Symbol, Pair{Symbol, Symbol}} at index [3]
```

--------------------------------

### Retrieve levels from a CategoricalArray

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/categorical.md

Uses the `levels` function to get the levels (categories) of a CategoricalArray, returned in their defined order.

```jldoctest
julia> levels(cv)
2-element CategoricalArray{String,1,UInt32}:
 "Group A"
 "Group B"
```

--------------------------------

### Construct DataFrame with Keyword Arguments

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/getting_started.md

Use keyword arguments to construct a DataFrame where each argument represents a column. This is a common and straightforward method.

```julia
using DataFrames

DataFrame(a=1:4, b=["M", "F", "F", "M"]) # keyword argument constructor
```

--------------------------------

### Check DataFrames.jl Package Status

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Use the `status` command in Pkg REPL mode (entered by typing `]`) to view the installed version of DataFrames.jl.

```julia
]

(@v1.9) pkg> status DataFrames
      Status `~\v1.13\Project.toml`
  [a93c6f00] DataFrames v1.8.0
```

--------------------------------

### Split-Apply-Combine with DataFramesMeta.jl

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/querying_frameworks.md

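Demonstrates the split-apply-combine pattern using `@rsubset`, `@by`, and `@select`: rows with `value` of 3 or less are dropped, then the range (maximum minus minimum) of `value` is computed within each `key` group.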

```julia
using DataFramesMeta

df = DataFrame(key=repeat(1:3, 4), value=1:12)

@chain df begin
    @rsubset :value > 3 
    @by(:key, :min = minimum(:value), :max = maximum(:value))
    @select(:key, :range = :max - :min)
end
```

--------------------------------

### Create DataFrame from a Matrix with Auto-Generated Column Names

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Initialize a DataFrame from a matrix. Use `:auto` as the second argument to automatically generate column names like `x1`, `x2`, etc.

```jldoctest
julia> DataFrame([1 0; 2 0], :auto)
2×2 DataFrame
 Row │ x1     x2
     │ Int64  Int64
─────┼──────────────
   1 │     1      0
   2 │     2      0
```

--------------------------------

### Get Element Types of DataFrame Columns

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Iterate over columns using `eachcol` and broadcast `eltype` to find the data type of each column.

```jldoctest
julia> eltype.(eachcol(german))
10-element Vector{DataType}:
 Int64
 Int64
 String7
 Int64
 String7
 String15
 String15
 Int64
 Int64
 String31
```

--------------------------------

### View First Rows of a DataFrame

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/working_with_dataframes.md

Illustrates using the `first` function to view a specified number of the initial rows of a DataFrame. This is helpful for quickly inspecting the beginning of the data.

```julia
first(df, 6)
```
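
A small runnable sketch, assuming a made-up ten-row data frame (the column names are illustrative). It also shows that calling `first` without a count returns a single row rather than a data frame.

```julia
using DataFrames

# Hypothetical data frame with ten rows
df = DataFrame(id = 1:10, value = (1:10) .^ 2)

# First three rows, returned as a new DataFrame
top3 = first(df, 3)

# Without a count, `first` returns a single DataFrameRow
row1 = first(df)
```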

--------------------------------

### Get Group Number

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/split_apply_combine.md

Pass `groupindices` to `combine` to get the index of each group, with one row per group in the result. This can be helpful for tracking which group rows belong to after transformations.

```julia
combine(grouped_df, groupindices)
```
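
A minimal sketch under assumed data (the data frame and the `sort = true` choice are illustrative, used here to make the group order predictable). It contrasts the per-group result of `combine` with calling `groupindices` on the GroupedDataFrame directly, which gives one entry per row of the parent data frame.

```julia
using DataFrames

# Hypothetical data; sort = true fixes the group order as "a", then "b"
df = DataFrame(g = ["a", "b", "b", "a"], v = 1:4)
grouped_df = groupby(df, :g; sort = true)

# One row per group: the key column plus a `groupindices` column
per_group = combine(grouped_df, groupindices)

# One entry per row of the parent data frame
per_row = groupindices(grouped_df)
```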

--------------------------------

### Get DataFrame Column Names as Strings

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

The `names` function returns a vector of column names as `String`s. It can also filter by element type.

```jldoctest
julia> names(german)
10-element Vector{String}:
 "id"
 "Age"
 "Sex"
 "Job"
 "Housing"
 "Saving accounts"
 "Checking account"
 "Credit amount"
 "Duration"
 "Purpose"
```

```jldoctest
julia> names(german, AbstractString)
5-element Vector{String}:
 "Sex"
 "Housing"
 "Saving accounts"
 "Checking account"
 "Purpose"
```

--------------------------------

### DataFrame Construction

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/lib/functions.md

Functions for creating and initializing DataFrames.

```APIDOC
## Constructing data frames

### `allcombinations`

Creates all combinations of elements from input iterables.

### `copy`

Creates a copy of a DataFrame.

### `similar`

Creates a new DataFrame with the same structure but uninitialized data.
```
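
The three functions can be sketched together as follows. The data is made up for illustration, and the example assumes a DataFrames.jl version recent enough to provide `allcombinations`.

```julia
using DataFrames

# All combinations of the values passed as keyword arguments: 2 * 3 = 6 rows
ac = allcombinations(DataFrame, a = 1:2, b = ["x", "y", "z"])

# Hypothetical source data frame for copy and similar
df = DataFrame(a = [1, 2, 3], b = ["x", "y", "z"])

# copy duplicates the columns, so mutating the copy leaves `df` untouched
cp = copy(df)
cp.a[1] = 100

# similar keeps column names and element types; the cell contents are uninitialized
sim = similar(df, 2)
```

Because the contents of `sim` are uninitialized, they should be filled before being read.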

--------------------------------

### Extracting Data with Comprehension

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/querying_frameworks.md

This example uses a Julia comprehension to extract specific data from the iterator returned by a Query.jl query, applying a condition.

```jldoctest
julia> y = [i.name for i in q2 if i.number_of_children > 0]
1-element Vector{String}:
 "Roger"
```

--------------------------------

### Test DataFrames.jl Package

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Run this command to execute the tests for DataFrames.jl. Be aware that this process can take over 30 minutes to complete.

```julia
using Pkg

Pkg.test("DataFrames") # Warning! This will take more than 30 minutes.
```

--------------------------------

### Custom Number Formatting for DataFrames

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/customizing_output.md

Define a custom function to format numbers in a DataFrame. This example formats negative numbers by enclosing them in parentheses.

```julia
function parentheses_fmt(v, i, j)
    !(v isa Number) && return v
    v < 0 && return "($(-v))"
    return v
end
```

```julia
show(df; formatters = [parentheses_fmt])
```

--------------------------------

### Customize DataFrame Display with show()

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/basics.md

Manually call the `show` function to control how a DataFrame is displayed. Use `allrows=true` to show all rows and `allcols=true` to show all columns, regardless of screen size.

```jldoctest
julia> show(german, allcols=true)
1000×10 DataFrame
  Row │ id     Age    Sex      Job    Housing  Saving accounts  Checking account  Credit amount  Duration  Purpose
      │ Int64  Int64  String7  Int64  String7  String15         String15          Int64          Int64     String31
──────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    1 │     0     67  male         2  own      NA               little                     1169         6  radio/TV
    2 │     1     22  female       2  own      little           moderate                   5951        48  radio/TV
    3 │     2     49  male         1  own      little           NA                         2096        12  education
    4 │     3     45  male         2  free     little           little                     7882        42  furniture/equipment
    5 │     4     53  male         2  free     little           little                     4870        24  car
    6 │     5     35  male         1  free     NA               NA                         9055        36  education
    7 │     6     53  male         2  own      quite rich       NA                         2835        24  furniture/equipment
    8 │     7     35  male         3  rent     little           moderate                   6948        36  car
  ⋮   │   ⋮      ⋮       ⋮       ⋮       ⋮            ⋮                ⋮                ⋮           ⋮               ⋮
  994 │   993     30  male         3  own      little           little                     3959        36  furniture/equipment
  995 │   994     50  male         2  own      NA               NA                         2390        12  car
  996 │   995     31  female       1  own      little           NA                         1736        12  furniture/equipment
  997 │   996     40  male         3  own      little           little                     3857        30  car
```

--------------------------------

### Get Row Indices within Groups with `combine`

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/split_apply_combine.md

Use `combine` with `eachindex` to add a column with the index of each row within its group. This operation is column-independent.

```jldoctest
julia> combine(gdf, eachindex)
6×2 DataFrame
 Row │ customer_id  eachindex
     │ String       Int64
─────┼────────────────────────
   1 │ a                    1
   2 │ b                    1
   3 │ b                    2
   4 │ b                    3
   5 │ c                    1
   6 │ c                    2
```

--------------------------------

### Select with Grouped Correlation

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/split_apply_combine.md

Shows how `select` can be used with grouped data to apply a function that returns a single value per group, which is then broadcast to all rows in that group. This example calculates the correlation between SepalLength and SepalWidth.

```julia
select(iris_gdf, 1:2 => cor)
```
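
A self-contained sketch of the same pattern, assuming a small made-up data frame instead of the iris data. Columns are selected by name here, and `cor` comes from the `Statistics` standard library.

```julia
using DataFrames, Statistics

# Hypothetical data: y rises with x in group 1 and falls with x in group 2
df = DataFrame(g = [1, 1, 1, 2, 2, 2],
               x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
               y = [2.0, 4.0, 6.0, 3.0, 2.0, 1.0])
gdf = groupby(df, :g)

# cor returns one value per group; select repeats it for every row of that group
res = select(gdf, [:x, :y] => cor => :x_y_cor)
```

The result has as many rows as `df`, with the grouping column `g` kept alongside the new `x_y_cor` column.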

--------------------------------

### Combine DataFrame Columns with Sum

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/working_with_dataframes.md

Use `combine` with `All() .=> sum` to apply the sum function to all columns and get a single row DataFrame with the sums.

```julia
julia> df = DataFrame(A=1:4, B=4.0:-1.0:1.0);

julia> df
4×2 DataFrame
 Row │ A      B
     │ Int64  Float64
─────┼────────────────
   1 │     1      4.0
   2 │     2      3.0
   3 │     3      2.0
   4 │     4      1.0

julia> combine(df, All() .=> sum)
1×2 DataFrame
 Row │ A_sum  B_sum
     │ Int64  Float64
─────┼────────────────
   1 │    10     10.0
```

--------------------------------

### Create DataFrames for Joins

Source: https://github.com/juliadata/dataframes.jl/blob/main/docs/src/man/joins.md

Initializes two DataFrames, `people` and `jobs`, that share an `ID` column to be used in join operations.

```julia
using DataFrames

people = DataFrame(ID=[20, 40], Name=["John Doe", "Jane Doe"])
jobs = DataFrame(ID=[20, 40], Job=["Lawyer", "Doctor"])
```
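
As a sketch of how these two data frames might then be used, the following repeats the setup and performs an inner join on the shared `ID` column:

```julia
using DataFrames

people = DataFrame(ID = [20, 40], Name = ["John Doe", "Jane Doe"])
jobs = DataFrame(ID = [20, 40], Job = ["Lawyer", "Doctor"])

# Keep only rows whose ID appears in both data frames
joined = innerjoin(people, jobs, on = :ID)
```

The joined result contains the `ID` column once, followed by `Name` from `people` and `Job` from `jobs`.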