### Install DataFramesMeta.jl and dependencies

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Activate a temporary environment and install DataFramesMeta.jl, CSV.jl, and HTTP.jl using the Julia package manager.

```julia
julia> ] # press ] to drop into pkg-mode

pkg> activate --temp # activate a temporary environment for this tutorial

pkg> add DataFramesMeta

pkg> add CSV HTTP
```

--------------------------------

### Install DataFramesMeta.jl via REPL mode

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/README.md

Use the Pkg REPL mode by typing ']' in the console to add the package.

```julia
] add DataFramesMeta
```

--------------------------------

### Install and Load DataFramesMeta.jl

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

Installs the DataFramesMeta.jl package and loads it along with its dependencies DataFrames.jl and Chain.jl.

```julia
using Pkg
Pkg.add("DataFramesMeta")

# Load the package (also loads DataFrames.jl and Chain.jl)
using DataFramesMeta
```

--------------------------------

### Install DataFramesMeta.jl via Pkg

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/README.md

Use the Pkg module to add the package to your current environment.

```julia
julia> import Pkg; Pkg.add("DataFramesMeta")
```

--------------------------------

### Multiple Summaries with @combine

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

You can compute multiple summary statistics at once by providing several operations within a block to @combine. This example calculates the average, minimum, maximum, and number of observations.

```julia
@combine msleep begin
    :avg_sleep = mean(:sleep_total)
    :min_sleep = minimum(:sleep_total)
    :max_sleep = maximum(:sleep_total)
    :total = length(:sleep_total)
end
```

--------------------------------

### Sort String Column in Reverse with @orderby and ordinalrank

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

This example demonstrates sorting a string column in reverse order using @orderby and the ordinalrank function from StatsBase.jl. It requires importing StatsBase.

```julia
using StatsBase

df = DataFrame(group=[1, 2, 1, 2, 1], name = ["Bob", "Dexter", "Alice", "Eve", "Cedric"])

@orderby df begin
    :group
    ordinalrank(:name, rev=true)
end
```

--------------------------------

### Column reference restrictions in macros

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Examples of invalid and valid column reference mixing in @eachrow and @with. The first block is invalid; the second is valid.

```julia
df = DataFrame(A = 1:3, B = [2, 1, 2])

# Invalid: an integer reference mixed with a Symbol reference
@eachrow df begin
    :A = $2
end

# Invalid: an integer reference mixed with a string reference
@with df begin
    $1 + $"A"
end
```

```julia
# Valid: only integer references are used
@eachrow df begin
    $1 + $2
end

@with df begin
    $1 + $2
end
```

--------------------------------

### Dynamic Column Names with @rtransform

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Create new columns with names stored in variables or names containing spaces using the '$' syntax within @rtransform. This example creates :rem_proportion and "Body weight in grams".

```julia
newname = :rem_proportion

@rtransform msleep begin
    $newname = :sleep_rem / :sleep_total
    $"Body weight in grams" = :bodywt * 1000
end
```

--------------------------------

### Combine Data with @combine and mean

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Use the @combine macro to compute summary statistics for columns. This example calculates the average sleep time and names the result :avg_sleep.

```julia
@chain msleep @combine :avg_sleep = mean(:sleep_total)
```

--------------------------------

### Add Multiple Columns with @rtransform

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

You can add multiple new columns simultaneously using a block within @rtransform. This example creates :rem_proportion and :bodywt_grams.

```julia
@rtransform msleep begin
    :rem_proportion = :sleep_rem / :sleep_total
    :bodywt_grams = :bodywt * 1000
end
```

--------------------------------

### Filter Rows After Ordering with @rsubset

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

After ordering rows, you can filter them using @rsubset based on specific conditions. This example filters for mammals sleeping 16 or more hours.

```julia
@chain msleep begin
    @select :name :order :sleep_total
    @orderby :order :sleep_total
    @rsubset :sleep_total >= 16
end
```

--------------------------------

### Conditional Row Operations with @eachrow and let

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

When using @eachrow, a `let` block is necessary to create a scope for assigning variables within the macro. This example demonstrates conditional logic based on row values.

```julia
df = DataFrame(A = 1:3, B = [2, 1, 2], C = [-4,2,1])

# Accumulate :A * :C over the rows where :A < :B
let x = 0.0
    @eachrow df begin
        if :A < :B
            x += :A * :C
        end
    end
    x
end
```

--------------------------------

### Multi-argument Selectors in Transformations

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Multi-argument selectors can be used within transformations when the entire argument is wrapped in $(). However, this specific example will fail as it's not a valid transformation structure.

```julia
@select df :y = f($[:a, :b])
```

--------------------------------

### Create New Column with @transform (Column-wise)

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Use @transform for column-wise operations to create new columns. This example calculates the deviation of each animal's sleep time from the average sleep time.

```julia
@transform msleep :demeaned_sleep = :sleep_total .- mean(:sleep_total)
```

--------------------------------

### Descending Order Sort with @orderby

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

To sort in descending order for numeric columns, negate the column name within the @orderby macro. This example sorts by :order ascending and :sleep_total descending.

```julia
@chain msleep begin
    @select :name :order :sleep_total
    @orderby begin
        :order
        -:sleep_total
    end
    @rsubset :sleep_total >= 16
end
```

--------------------------------

### Chain Operations with @chain and @orderby

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Chain multiple DataFrame operations using @chain, including selecting columns, ordering by multiple criteria, and taking the first few rows. Note that non-macro functions like `first` do not start with '@'.

```julia
@chain msleep begin
    @select :name :order :sleep_total
    @orderby :order :sleep_total
    first(10)
end
```

--------------------------------

### @by

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Perform grouping and combining operations in a single step.

```APIDOC
## @by

### Description
Perform the grouping and combining operations in one step.

### Parameters
- **df** (DataFrame) - Required - The input data frame.
- **group_col** (Symbol) - Required - The column to group by.
- **block** (Expr) - Required - A block of transformations to apply to the groups.
```

--------------------------------

### Select columns using selectors with @select

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Shows how to use multi-column selectors like Not, Between, All, and Cols (with regular expressions) for column selection with the @select macro.

```julia
@select df Not(:x)
@select df Between(:x, :y)
@select df All()
@select df Cols(r"x") # Regular expressions.
```

--------------------------------

### Manage Column Metadata with @label! and @note!

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

Attach labels and notes to columns to provide documentation. Labels are short descriptions, while notes are longer, appendable explanations.

```julia
using DataFramesMeta

df = DataFrame(wage = [15, 22, 18], tenure = [12, 36, 24])

# Add short labels
@label! df begin
    :wage = "Hourly wage (2023 USD)"
    :tenure = "Job tenure (months)"
end

# View labels
printlabels(df)
# ┌────────┬────────────────────────┐
# │ Column │ Label                  │
# ├────────┼────────────────────────┤
# │ wage   │ Hourly wage (2023 USD) │
# │ tenure │ Job tenure (months)    │
# └────────┴────────────────────────┘

# Add detailed notes (notes append by default)
@note! df :wage = "Source: Bureau of Labor Statistics"
@note! df :wage = "Adjusted for inflation to 2023 dollars"

# View notes
printnotes(df)
# Column: wage
# ────────────
# Label: Hourly wage (2023 USD)
# Source: Bureau of Labor Statistics
# Adjusted for inflation to 2023 dollars

# Access metadata programmatically
label(df, :wage)  # "Hourly wage (2023 USD)"
note(df, :wage)   # "Source: Bureau of Labor Statistics\nAdjusted for..."
```

--------------------------------

### Select columns and perform transformations with @select

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Demonstrates selecting specific columns and creating new columns with transformations using the @select macro. It also shows usage with GroupedDataFrame and the in-place @select!
macro.

```julia
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);
gd = @groupby(df, :x);
@select(df, :x, :y)
@select(df, :x2 = 2 * :x, :y)
@select(gd, :x2 = 2 .* :y .* first(:y))
@select!(df, :x, :y)
@select!(df, :x = 2 * :x, :y)
@select!(gd, :y = 2 .* :y .* first(:y))
```

--------------------------------

### Dynamic Column Reference with $

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

The $ syntax allows for dynamic column references using variables, strings, or integers.

```julia
using DataFramesMeta

df = DataFrame(col_a = 1:3, col_b = 4:6, col_c = 7:9)
```

--------------------------------

### Create New Column with @rtransform (Row-wise)

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Use @rtransform to add a new column by performing row-wise calculations. This example calculates the ratio of REM sleep to total sleep.

```julia
@rtransform msleep :rem_proportion = :sleep_rem / :sleep_total
```

--------------------------------

### Load additional packages for data handling

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Load the CSV.jl and HTTP.jl packages for reading CSV files and making HTTP requests, respectively. Also loads the Statistics standard library.

```julia
using CSV, HTTP, Statistics
```

--------------------------------

### Dynamic column creation and selection

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

Create new columns or select multiple columns programmatically using variables.

```julia
newcol = "Result Column"
@transform(df, $newcol = :col_a .* 2)
```

```julia
cols = [:col_a, :col_b]
@select(df, $cols)
```

--------------------------------

### AsTable for multi-column transformations on LHS

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Demonstrates using AsTable on the left-hand side of a transformation to create multiple columns at once, with names determined programmatically.

```julia
:y = f(AsTable(cols))
```

--------------------------------

### Select First Column Using Index

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Select the first column using the '$' followed by the column index. This is a shorthand for positional selection.

```julia
@select msleep $1
```

--------------------------------

### Compare DataFrames.jl and DataFramesMeta.jl Syntax

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Illustrates the syntactic differences between base DataFrames.jl functions and DataFramesMeta.jl macros for common operations like transformation and subsetting.

```julia
df = DataFrame(a = [1, 2], b = [3, 4]);

# With DataFrames
transform(df, [:a, :b] => ((x, y) -> x + y) => :c)

# With DataFramesMeta
@transform(df, :c = :a + :b)

# With DataFrames
subset(df, :a => ByRow(==(2)))

# With DataFramesMeta
@rsubset(df, :a == 2)
```

--------------------------------

### Define DataFrame and column variables

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Initializes a DataFrame and a list of column names to be used in subsequent operations.

```julia
df = DataFrame(a = [11, 14], b = [17, 10], c = [12, 5]);
vars = ["a", "b"];
```

--------------------------------

### Load msleep dataset from URL

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Download and read the msleep dataset from a GitHub URL into a DataFrame using CSV.jl and HTTP.jl. Handles missing values represented as 'NA'.

```julia
url = "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/msleep_ggplot2.csv";
msleep = CSV.read(HTTP.get(url).body, DataFrame; missingstring="NA")
```

--------------------------------

### Automatic Summaries with describe

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

The `describe` function from DataFrames.jl provides a convenient way to automatically compute several summary statistics for all columns in a DataFrame.

```julia
describe(msleep)
```

--------------------------------

### Creating new columns with $

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Use $ to define new column names dynamically within transformation macros.

```julia
df = DataFrame(A = 1:3, B = [2, 1, 2])

newcol = "C"
@select(df, $newcol = :A + :B)

@by(df, :B, $("A complicated" * " new name") = first(:A))

nameC = "C"
df3 = @eachrow df begin
    @newcol $nameC::Vector{Int}
    $nameC = :A
end
```

--------------------------------

### AsTable with single column selection

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Illustrates that AsTable can select a single column, and its content is accessed directly without needing '$'.

```julia
:y = first(AsTable("a"))
```

--------------------------------

### Pass keyword arguments to DataFrames.jl functions

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Demonstrates passing keyword arguments to underlying functions using semicolon syntax, @kwarg blocks, or splatting pairs.

```julia-repl
julia> df = DataFrame(x = [1, 1, 2, 2], b = [5, 6, 7, 8]);

julia> @rsubset(df, :x == 1 ; view = true)
2×2 SubDataFrame
 Row │ x      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      5
   2 │     1      6
```

```julia-repl
julia> df = DataFrame(x = [1, 1, 2, 2], b = [5, 6, 7, 8]);

julia> @rsubset df begin
           :x == 1
           @kwarg view = true
       end
2×2 SubDataFrame
 Row │ x      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      5
   2 │     1      6
```

```julia-repl
julia> df = DataFrame(x = [1, 1, 2, 2], b = [5, 6, 7, 8]);

julia> my_kwargs = [:view => true, :skipmissing => false];

julia> @rsubset(df, :x == 1; my_kwargs...)
2×2 SubDataFrame
 Row │ x      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      5
   2 │     1      6

julia> @rsubset df begin
           :x == 1
           @kwarg my_kwargs...
       end
2×2 SubDataFrame
 Row │ x      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      5
   2 │     1      6
```

--------------------------------

### Column Reference with $

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

The $ syntax enables dynamic column references using variables, strings, or integers instead of literal symbols.

```APIDOC
## Column Reference with $

The `$` syntax enables dynamic column references using variables, strings, or integers instead of literal symbols.

### Usage

```julia
using DataFramesMeta

df = DataFrame(col_a = 1:3, col_b = 4:6, col_c = 7:9)

# Example using a variable
col_name = :col_b
@transform df :col_a_plus_dynamic = :col_a + $col_name

# Example using a string
string_col = "col_c"
@transform df :col_a_plus_string = :col_a + $(string_col)

# Example using integer indices (1-based).
# Integer references are not mixed with Symbol or string references.
@transform df :first_plus_second = $(1) + $(2)
```
```

--------------------------------

### Group and Combine with @by

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Perform grouping and combining operations in a single step.

```julia
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);

@by df :x begin
    :y_sum = sum(:y)
end
```

--------------------------------

### Shorthand Grouping and Combining with @by

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

The @by macro provides a concise syntax for performing a group-and-combine operation in one step.

```julia
using DataFramesMeta, Statistics

df = DataFrame(
    region = ["East", "East", "West", "West"],
    sales = [100, 150, 200, 250],
    quarter = [1, 2, 1, 2]
)

# Group by region and summarize
@by df :region begin
    :total_sales = sum(:sales)
    :avg_sales = mean(:sales)
    :num_quarters = length(:quarter)
end
# 2×4 DataFrame
#  Row │ region  total_sales  avg_sales  num_quarters
#      │ String  Int64        Float64    Int64
# ─────┼──────────────────────────────────────────────
#    1 │ East            250      125.0             2
#    2 │ West            450      225.0             2

# Equivalent to (showing only the first summary):
# @chain df begin
#     @groupby :region
#     @combine :total_sales = sum(:sales)
# end
```

--------------------------------

### Transformations with DataFrames.jl Mini-Language

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Demonstrates using the DataFrames.jl 'mini-language' (src => fun => dest) within @transform by escaping the entire transformation with $(). This allows for complex transformations and new column creation.

```julia
df = DataFrame(a = [1, 2], b = [3, 4])

my_transformation = :a => (t -> t .+ 100) => :c

@transform df begin
    $my_transformation
    :d = :b .+ 200
end
```

--------------------------------

### DataFrames.jl compatibility and restrictions

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Demonstrates how DataFramesMeta handles column type mixing compared to standard DataFrames.jl.

```julia
transform(df, [:A, 2] => (+) => :y)
```

```julia
@transform(df, :y = :A + $"B")
```

```julia
transform(df, [:A, "B"] => (+) => :y)
```

--------------------------------

### @label! and @note! - Column Metadata

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

The @label! and @note! macros attach metadata to columns for documentation. Labels are short descriptions; notes are longer explanations that can be appended.

```APIDOC
## @label! and @note! - Column Metadata

The `@label!` and `@note!` macros attach metadata to columns for documentation. Labels are short descriptions; notes are longer explanations that can be appended.

### Usage

```julia
using DataFramesMeta

df = DataFrame(wage = [15, 22, 18], tenure = [12, 36, 24])

# Add short labels
@label! df begin
    :wage = "Hourly wage (2023 USD)"
    :tenure = "Job tenure (months)"
end

# View labels
printlabels(df)

# Add detailed notes (notes append by default)
@note! df :wage = "Source: Bureau of Labor Statistics"
@note! df :wage = "Adjusted for inflation to 2023 dollars"

# View notes
printnotes(df)

# Access metadata programmatically
label(df, :wage)
note(df, :wage)
```

### Example Output

```
┌────────┬────────────────────────┐
│ Column │ Label                  │
├────────┼────────────────────────┤
│ wage   │ Hourly wage (2023 USD) │
│ tenure │ Job tenure (months)    │
└────────┴────────────────────────┘

Column: wage
────────────
Label: Hourly wage (2023 USD)
Source: Bureau of Labor Statistics
Adjusted for inflation to 2023 dollars
```
```

--------------------------------

### Dynamic column creation with AsTable

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Defines a function to create a new column with a dynamically generated name based on the sum of input columns, then applies it using AsTable.

```julia
function fun_with_new_name(x::NamedTuple)
    nms = string.(propertynames(x))
    new_name = Symbol(join(nms, "_"), "_sum")
    s = sum(x)
    (; new_name => s)
end

@rtransform df $AsTable = fun_with_new_name(AsTable([:a, :b]))
```

--------------------------------

### Print DataFrame column notes using printnotes

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Displays labels and notes for columns in a DataFrame. Requires prior assignment of labels or notes using @label! or @note! macros.

```julia-repl
julia> df = DataFrame(wage = [12], age = [23]);

julia> @label! df :age = "Age (years)";

julia> @note! df :wage = "Derived from American Community Survey";

julia> @note! df :wage = "Missing values imputed as 0 wage";

julia> @label! df :wage = "Hourly wage (2015 USD)";

julia> printnotes(df)
Column: wage
────────────
Label: Hourly wage (2015 USD)
Derived from American Community Survey
Missing values imputed as 0 wage

Column: age
───────────
Label: Age (years)
```

--------------------------------

### Create multiple columns with @astable

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Allows generating multiple new columns in a single operation while sharing intermediate calculations.

```julia-repl
julia> df = DataFrame(a = [1, 2, 3], b = [400, 500, 600]);

julia> @transform df @astable begin
           ex = extrema(:b)
           :b_first = :b .- first(ex)
           :b_last = :b .- last(ex)
       end
3×4 DataFrame
 Row │ a      b      b_first  b_last
     │ Int64  Int64  Int64    Int64
─────┼───────────────────────────────
   1 │     1    400        0    -200
   2 │     2    500      100    -100
   3 │     3    600      200       0
```

--------------------------------

### @astable - Multiple Column Output

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

The @astable flag creates multiple columns from a single transformation, sharing intermediate computations.

```APIDOC
## @astable - Multiple Column Output

The `@astable` flag creates multiple columns from a single transformation, sharing intermediate computations.

### Usage

```julia
using DataFramesMeta, Statistics

df = DataFrame(values = [10, 20, 30, 40, 50])

# Create multiple related columns efficiently
@transform df @astable begin
    m = mean(:values)
    s = std(:values)
    :centered = :values .- m
    :standardized = (:values .- m) ./ s
end

# Row-wise with shared computation
df2 = DataFrame(a = 1:4, b = 5:8)

@rtransform df2 @astable begin
    total = :a + :b
    :sum = total
    :double_sum = total * 2
    :is_large = total > 10
end
```

### Example Output

```
4×5 DataFrame
 Row │ a      b      sum    double_sum  is_large
     │ Int64  Int64  Int64  Int64       Bool
─────┼───────────────────────────────────────────
   1 │     1      5      6          12     false
   2 │     2      6      8          16     false
   3 │     3      7     10          20     false
   4 │     4      8     12          24      true
```
```

--------------------------------

### @with

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Creates a scope where symbols are aliases for columns in a DataFrame.

```APIDOC
## @with

### Description
@with creates a scope in which all symbols that appear are aliases for the columns in a DataFrame.

### Parameters
- **df** (DataFrame) - Required - The input data frame.
- **expr** (Expr) - Required - The expression to evaluate within the scope of the DataFrame columns.
```

--------------------------------

### Load DataFramesMeta.jl

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Load the DataFramesMeta.jl package, which also loads DataFrames.jl, into the current Julia session.

```julia
using DataFramesMeta
```

--------------------------------

### Reference columns by variable and string

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

Use the $ interpolation syntax to reference columns dynamically via symbols or strings.

```julia
colname = :col_a
@select(df, $colname, :col_b)
```

```julia
@transform(df, :new = $"col_a" + $"col_b")
```

--------------------------------

### Select Columns Using Vector of Names

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Select multiple columns by providing a vector of their names (as strings) prefixed with '$'.

```julia
varnames = ["name", "sleep_total"]
@select msleep $varnames
```

--------------------------------

### Select Columns Using @select

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Use @select to choose specific columns by their symbol names. Ensure the DataFrame 'msleep' is available.

```julia
@select msleep :name :sleep_total
```

--------------------------------

### Select Columns Using String Variable

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Select a column whose name is stored in a string variable by prefixing the variable with '$'. This allows dynamic column selection.

```julia
varname = "sleep_total"
@select msleep :name $varname
```

--------------------------------

### Referencing columns with $

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Use $ to refer to columns using variables containing Symbols or strings.

```julia
df = DataFrame(A = 1:3, B = [2, 1, 2])

nameA = :A
df2 = @transform(df, :C = :B - $nameA)

nameA_string = "A"
df3 = @transform(df, :C = :B - $nameA_string)

nameB = "B"
df4 = @eachrow df begin
    :A = $nameB
end
```

--------------------------------

### Add columns with transformations using @transform

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Demonstrates adding new columns to a DataFrame based on transformations using the @transform macro. It also shows usage with GroupedDataFrame and the in-place @transform! macro.

```julia
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);
gd = @groupby(df, :x);
@transform(df, :x2 = 2 * :x, :y)
@transform(gd, :x2 = 2 .* :y .* first(:y))
@transform!(df, :x, :y)
@transform!(df, :x = 2 * :x, :y)
@transform!(gd, :y = 2 .* :y .* first(:y))
```

--------------------------------

### Reference columns by position

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

Access columns by their integer index using the $ syntax.

```julia
@select(df, $1, $3) # First and third columns
```

--------------------------------

### Select Columns Matching Regex

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Use `Cols()` with a regular expression to select columns whose names match the pattern. This is useful when columns follow a naming convention.

```julia
@select msleep Cols(r"^sl")
```

--------------------------------

### @rename

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Rename columns in a data frame using keyword argument-like syntax.

```APIDOC
## @rename

### Description
Rename columns in a data frame using the keyword argument-like syntax :new = :old.

### Parameters
- **df** (DataFrame) - Required - The input data frame.
- **renames** (Expr) - Required - One or more :new = :old mappings.
```

--------------------------------

### Create Multiple Columns with @astable

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

The @astable macro allows generating multiple columns from a single transformation block, enabling shared intermediate computations.

```julia
using DataFramesMeta, Statistics

df = DataFrame(values = [10, 20, 30, 40, 50])

# Create multiple related columns efficiently
@transform df @astable begin
    m = mean(:values)
    s = std(:values)
    :centered = :values .- m
    :standardized = (:values .- m) ./ s
end

# Row-wise with shared computation
df2 = DataFrame(a = 1:4, b = 5:8)

@rtransform df2 @astable begin
    total = :a + :b
    :sum = total
    :double_sum = total * 2
    :is_large = total > 10
end
# 4×5 DataFrame
#  Row │ a      b      sum    double_sum  is_large
#      │ Int64  Int64  Int64  Int64       Bool
# ─────┼───────────────────────────────────────────
#    1 │     1      5      6          12     false
#    2 │     2      6      8          16     false
#    3 │     3      7     10          20     false
#    4 │     4      8     12          24      true
```

--------------------------------

### Sum columns using AsTable with string names

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Creates a new column 'y' by summing the values of columns specified by string names in 'vars' using AsTable.

```julia
@rtransform df :y = sum(AsTable(vars))
```

--------------------------------

### Select Columns Matching a Regex

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Wrap a regular expression in $() to select all columns whose names match the pattern. The regex must be enclosed in parentheses.

```julia
@select df $(r"^a")
```

--------------------------------

### Split-Apply-Combine with @groupby and @combine

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

Use @groupby to group data and @combine to summarize it. These macros can also be used for transformations or on ungrouped data.

```julia
using DataFramesMeta, Statistics

df = DataFrame(
    category = ["A", "A", "B", "B", "B"],
    value = [10, 20, 30, 40, 50],
    weight = [1.0, 2.0, 1.5, 2.5, 3.0]
)

# Group and summarize
gd = @groupby(df, :category)

@combine gd begin
    :mean_val = mean(:value)
    :sum_val = sum(:value)
    :count = length(:value)
end
# 2×4 DataFrame
#  Row │ category  mean_val  sum_val  count
#      │ String    Float64   Int64    Int64
# ─────┼────────────────────────────────────
#    1 │ A             15.0       30      2
#    2 │ B             40.0      120      3

# Transform within groups (keeps all rows)
@transform(gd, :demeaned = :value .- mean(:value))

# Group by multiple columns
@groupby(df, :category, :weight)

# Combine on ungrouped DataFrame (treats as single group)
@combine(df, :total = sum(:value), :avg = mean(:value))
```

--------------------------------

### @combine

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Summarize or collapse a grouped data frame by performing transformations at the group level.

```APIDOC
## @combine

### Description
Summarize, or collapse, a grouped data frame by performing transformations at the group level and collecting the result into a single data frame.

### Parameters
- **gd** (GroupedDataFrame/DataFrame) - Required - The input data frame or grouped data frame.
- **transformations** (Expr) - Required - Keyword-like syntax :new = f(:old) to define summary columns.
```

--------------------------------

### @select - Select and Transform Columns

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

The @select macro keeps only specified columns or creates new columns from transformations, returning a new DataFrame.

```APIDOC
## @select

### Description
Selects specific columns from a DataFrame or creates new columns based on transformations.

### Parameters
- **df** (DataFrame) - Required - The input DataFrame.
- **args** (Symbols/Expressions) - Required - Column names or transformation expressions.

### Request Example
@select(df, :a, :c)

### Response
- **DataFrame** - A new DataFrame containing only the selected or transformed columns.
```

--------------------------------

### @subset and @subset!

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Select row subsets from a DataFrame or GroupedDataFrame. @subset returns a new DataFrame, while @subset! modifies it in-place.

```APIDOC
## @subset and @subset!

### Description
Select row subsets. Operates on both a DataFrame and a GroupedDataFrame. @subset always returns a freshly-allocated data frame whereas @subset! modifies the data frame in-place.

### Parameters
- **df** (DataFrame/GroupedDataFrame) - Required - The input data frame.
- **expressions** (Expr) - Required - One or more boolean expressions to filter rows.
```

--------------------------------

### @chain - Pipeline Operations

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

The @chain macro from Chain.jl (re-exported by DataFramesMeta) enables piping operations together, similar to R's %>% or Julia's |>.

```APIDOC
## @chain - Pipeline Operations

The `@chain` macro from Chain.jl (re-exported by DataFramesMeta) enables piping operations together, similar to R's `%>%` or Julia's `|>`.

### Usage

```julia
using DataFramesMeta, Statistics

df = DataFrame(
    category = repeat(["A", "B"], 5),
    value = 1:10,
    weight = rand(10)
)

# Chain multiple operations
result = @chain df begin
    @rsubset :value > 3
    @transform :weighted = :value .* :weight
    @groupby :category
    @combine begin
        :mean_value = mean(:value)
        :total_weighted = sum(:weighted)
    end
    @orderby -:mean_value
end

# Use _ to reference previous result explicitly
@chain df begin
    @select :value :category
    @rsubset :value > 5
    nrow(_)
end

# @aside for side effects
@chain df begin
    @rsubset :value > 5
    @aside println("Filtered to ", nrow(_), " rows")
    @select :category :value
end
```
```

--------------------------------

### Pass DataFrames mini-language directly

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

Apply multiple transformations or aggregations by passing a vector of pairs to the macro.

```julia
@transform(df, $([:col_a, :col_b] .=> [sum, mean]))
```

--------------------------------

### Arrange Rows by Column using @orderby

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Use the @orderby macro to reorder rows based on the values in a specified column. This is useful for sorting data.

```julia
@orderby msleep :order
```

--------------------------------

### Select Range of Columns

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Use `Between()` with @select to select a contiguous range of columns defined by two column names. Ensure the columns are ordered as expected.

```julia
@select msleep Between(:name, :order)
```

--------------------------------

### Rename Columns with @rename

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Rename columns using the :new = :old syntax. Supports both single-argument and block formats.

```julia
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);

@rename df :x_new = :x

@rename(df, :x_new = :x)

@rename df $"Name with spaces" = :y

@rename df begin
    :x_new = :x
    :y_new = :y
end
```

--------------------------------

### Introducing @groupby Macro in DataFramesMeta v0.15.0

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/NEWS.md

DataFramesMeta.jl v0.15.0 introduces the `@groupby` macro, offering a more convenient syntax for grouping DataFrames.

```julia
@groupby
```

--------------------------------

### Sum columns using AsTable with symbol names

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Creates a new column 'y' by summing the values of columns specified by symbols using AsTable.

```julia
julia> @rtransform df :y = sum(AsTable([:a, :b]))
```

--------------------------------

### New @note! and @label! Macros in DataFramesMeta v0.15.0

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/NEWS.md

DataFramesMeta.jl v0.15.0 adds the `@note!` and `@label!` macros, along with `printlabels` and `printnotes`, to facilitate easier management of metadata within DataFrames.

```julia
@note!
```

```julia
@label!
```

--------------------------------

### @eachrow - Row Iteration with Control Flow

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

The @eachrow macro iterates over rows with full control flow support (if/else, loops, break, continue). Use @eachrow! for in-place modification.

```APIDOC
## @eachrow - Row Iteration with Control Flow

The `@eachrow` macro iterates over rows with full control flow support (if/else, loops, break, continue). Use `@eachrow!` for in-place modification.

### Usage
```julia
using DataFramesMeta

df = DataFrame(A = 1:5, B = [2, 1, 2, 1, 2])

# Conditional row modification
@eachrow df begin
    if :A > :B
        :A = 0
    end
end

# Allocate new columns with @newcol
@eachrow df begin
    @newcol :C::Vector{Float64}
    :C = :B == 2 ? pi * :A : Float64(:B)
end

# Use control flow
@eachrow df begin
    :A == 3 && continue  # Skip row 3
    :A > 4 && break      # Stop after row 4
    println("Processing row with A = ", :A)
end

# In-place modification
df2 = copy(df)
@eachrow! df2 begin
    if :B == 1
        :A = :A * 10
    end
end
```

### Example Output
```
5×2 DataFrame (returns new DataFrame)
```
```

--------------------------------

### AsTable for multi-column transformations on RHS

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Shows how AsTable on the right-hand side of a transformation allows operations on multiple columns grouped into a NamedTuple.

```julia
:y = sum(AsTable(cols))
```

--------------------------------

### Select and Transform Columns with @select

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

Keeps specified columns or creates new ones, returning a new DataFrame. Supports column selectors like Not, Between, and Cols.

```julia
using DataFramesMeta

df = DataFrame(a = 1:5, b = 11:15, c = 21:25)

# Select specific columns
@select(df, :a, :c)
```

```julia
# Select and create new column
@select df begin
    :a
    :sum_ab = :a + :b
end
```

```julia
# Use column selectors
@select(df, Not(:c))           # All columns except :c
@select(df, Between(:a, :b))   # Columns from :a to :b
@select(df, Cols(r"^a"))       # Columns matching regex (starts with "a")
```

```julia
# Row-wise selection with @rselect
@rselect(df, :a, :result = :a > 2 ? "high" : "low")
```

--------------------------------

### Wrapping complex expressions

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Use parentheses with $ to evaluate complex expressions for column names.
```julia
@transform df :a + $("a column name" * " in two parts")
@transform df :a + $(get_column_name(x))
```

--------------------------------

### View column notes with printnotes

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Use printnotes to display all attached notes for a specific column in a DataFrame.

```julia
df = DataFrame(wage = [-99, 16, 14, 23, 5000])
@note! df :wage = "Hourly wage from 2015 American Community Survey (ACS)"
@rtransform! df :wage = :wage == -99 ? 0 : :wage
@note! df :wage = "Individuals with no job are recorded as 0 wage"
printnotes(df)
```

--------------------------------

### Attach short column labels with @label!

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Use @label! to assign short, descriptive labels to DataFrame columns. This is useful for enhancing readability in printed output.

```julia
df = DataFrame(wage = [16, 25, 14, 23]);
@label! df :wage = "Wage (2015 USD)"
```

--------------------------------

### Support for Not, All, and Cols in @select in DataFramesMeta v0.15.0

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/NEWS.md

DataFramesMeta.jl v0.15.0 enhances the `@select` macro to support `Not`, `All`, and `Cols`, simplifying the process of selecting or excluding multiple columns at once.

```julia
Not
```

```julia
All
```

```julia
Cols
```

--------------------------------

### Iterate Rows with @eachrow

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

Use @eachrow for row-wise iteration with full control flow support. Use @eachrow! for in-place modifications.

```julia
using DataFramesMeta

df = DataFrame(A = 1:5, B = [2, 1, 2, 1, 2])

# Conditional row modification
@eachrow df begin
    if :A > :B
        :A = 0
    end
end
# 5×2 DataFrame (returns new DataFrame)

# Allocate new columns with @newcol
@eachrow df begin
    @newcol :C::Vector{Float64}
    :C = :B == 2 ? pi * :A : Float64(:B)
end

# Use control flow
@eachrow df begin
    :A == 3 && continue  # Skip row 3
    :A > 4 && break      # Stop after row 4
    println("Processing row with A = ", :A)
end

# In-place modification
df2 = copy(df)
@eachrow! df2 begin
    if :B == 1
        :A = :A * 10
    end
end
# df2 is modified
```

--------------------------------

### Select Columns Using Variable Name

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Select columns when their names are stored in variables. Use the '$' syntax to indicate a variable column reference. This works for symbols.

```julia
varname = :sleep_total
@select msleep :name $varname
```

--------------------------------

### Pipeline Operations with @chain

Source: https://context7.com/juliadata/dataframesmeta.jl/llms.txt

The @chain macro enables piping operations together, supporting explicit references with _ and side effects with @aside.

```julia
using DataFramesMeta, Statistics

df = DataFrame(
    category = repeat(["A", "B"], 5),
    value = 1:10,
    weight = rand(10)
)

# Chain multiple operations
result = @chain df begin
    @rsubset :value > 3
    @transform :weighted = :value .* :weight
    @groupby :category
    @combine begin
        :mean_value = mean(:value)
        :total_weighted = sum(:weighted)
    end
    @orderby -:mean_value
end

# Use _ to reference previous result explicitly
@chain df begin
    @select :value :category
    @rsubset :value > 5
    nrow(_)  # Returns count of filtered rows
end

# @aside for side effects
@chain df begin
    @rsubset :value > 5
    @aside println("Filtered to ", nrow(_), " rows")
    @select :category :value
end
```

--------------------------------

### Order Rows with @orderby

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Sort rows in a DataFrame based on column values or transformations. Only applicable to DataFrames.
```julia
using DataFramesMeta, Statistics  # Statistics provides mean

df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);

@orderby(df, -1 .* :x)

@orderby(df, :x, :y .- mean(:y))
```

--------------------------------

### Subset Rows with @subset and @byrow block

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Apply multiple row conditions using a block with @subset and @byrow. All conditions within the block are evaluated for each row, and rows satisfying all conditions are kept.

```julia
df = DataFrame(a = [1, 2], b = [3, 4])

@subset df @byrow begin
    :a > 1
    :b < 5
end
```

--------------------------------

### Combine Grouped Data with @combine

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Summarize grouped data frames by applying transformations at the group level. Requires a DataFrame or GroupedDataFrame as the first argument.

```julia
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);
gd = @groupby(df, :x);

@combine(gd, :x2 = sum(:y))

@combine(gd, :x2 = :y .- sum(:y))

@combine(gd, $AsTable = (n1 = sum(:y), n2 = first(:y)))
```

```julia
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);
gd = groupby(df, :x);

@combine(gd, $AsTable = (a = sum(:x), b = sum(:y)))
```

--------------------------------

### Select Columns Except One

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/dplyr.md

Use `Not()` with @select to exclude a specific column from the selection. This is useful for selecting all columns except one.

```julia
@select msleep Not(:name)
```

--------------------------------

### View column labels with printlabels

Source: https://github.com/juliadata/dataframesmeta.jl/blob/master/docs/src/index.md

Use printlabels to display column labels in a DataFrame. It can filter by specific columns and control the display of unlabelled columns.

```julia
df = DataFrame(wage = [12], age = [23]);
@label! df :wage = "Hourly wage (2015 USD)";

printlabels(df)

printlabels(df, [:wage, :age]; unlabelled = false)
```
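--------------------------------

### Scalar and vector results in @combine (illustrative sketch)

As a supplement to the "Combine Grouped Data with @combine" examples, the following minimal sketch contrasts the two result shapes those examples rely on: an expression that returns a scalar produces one row per group, while one that returns a vector produces one row per element. It reuses the same `df` as the examples. The names `totals`, `centered`, `:total`, and `:dev` are assumptions chosen for illustration and do not appear in the original documentation.

```julia
using DataFramesMeta

df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102])
gd = groupby(df, :x)

# A scalar result (sum) collapses each group to a single row.
totals = @combine(gd, :total = sum(:y))

# A vector result (broadcast subtraction) keeps one row per original element.
centered = @combine(gd, :dev = :y .- sum(:y))

println(nrow(totals))    # number of groups
println(nrow(centered))  # number of rows in df
```

In both cases the grouping column `:x` is carried into the result alongside the new column.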