### Test DataFrames.jl Package Installation

Source: https://dataframes.juliadata.org/stable/man/basics

This code snippet demonstrates how to run the bundled tests for the DataFrames.jl package to verify its installation. Be aware that this process can take over 30 minutes to complete.

```julia
using Pkg
Pkg.test("DataFrames") # Warning! This will take more than 30 minutes.
```

--------------------------------

### Install DataFrames.jl

Source: https://dataframes.juliadata.org/stable/man/sorting

Installs the DataFrames.jl package using the Julia package manager. This is the first step to using the package for data manipulation.

```julia
using Pkg
Pkg.add("DataFrames")
```

--------------------------------

### Install CSV.jl Package

Source: https://dataframes.juliadata.org/stable/man/basics

Shows how to install the CSV.jl package, which is needed to read CSV files into DataFrames. This is typically done using Julia's package manager.

```julia
using Pkg
Pkg.add("CSV")
```

--------------------------------

### Setup DataFrame for manipulation

Source: https://dataframes.juliadata.org/stable/man/basics

Initializes a DataFrame named `df` with three columns `x`, `y`, and `z`, each containing a range of integers. This setup is a prerequisite for demonstrating subsequent data manipulation and indexing operations.

```julia
julia> df = DataFrame(x = 1:3, y = 4:6, z = 7:9) # define data frame
3×3 DataFrame
 Row │ x      y      z
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      4      7
   2 │     2      5      8
   3 │     3      6      9
```

--------------------------------

### Install DataFrames.jl Package

Source: https://dataframes.juliadata.org/stable/man/basics

This code snippet shows how to add the DataFrames.jl package to your Julia environment using the Pkg manager. It requires no external dependencies beyond a Julia installation.
```julia
using Pkg
Pkg.add("DataFrames")
```

```julia
] # ']' should be pressed

(@v1.9) pkg> add DataFrames
```

--------------------------------

### Install DataFramesMeta.jl Package

Source: https://dataframes.juliadata.org/stable/man/querying_frameworks

This code snippet shows how to install the DataFramesMeta.jl package using the Julia package manager.

```julia
using Pkg
Pkg.add("DataFramesMeta")
```

--------------------------------

### Install Query.jl Package

Source: https://dataframes.juliadata.org/stable/man/querying_frameworks

This code snippet shows how to install the Query.jl package using Julia's Pkg manager. It's a prerequisite for using Query.jl's data manipulation features.

```julia
using Pkg
Pkg.add("Query")
```

--------------------------------

### Install DataFrameMacros.jl Package

Source: https://dataframes.juliadata.org/stable/man/querying_frameworks

This snippet shows how to install the DataFrameMacros.jl package using the Pkg manager in Julia. It's a prerequisite for using the package's functionalities.

```julia
using Pkg
Pkg.add("DataFrameMacros")
```

--------------------------------

### Install TidierData.jl Package

Source: https://dataframes.juliadata.org/stable/man/querying_frameworks

This code snippet demonstrates how to install the TidierData.jl package using Julia's package manager. It ensures that the necessary functionalities for data manipulation are available in the Julia environment.

```julia
using Pkg
Pkg.add("TidierData")
```

--------------------------------

### Check DataFrames.jl Package Status

Source: https://dataframes.juliadata.org/stable/man/basics

This code snippet shows how to check the installed version and status of the DataFrames.jl package using the Pkg manager in Julia. This is useful for verifying the installation and managing package versions.
```julia
]

(@v1.9) pkg> status DataFrames
```

--------------------------------

### Install CSV.jl Package

Source: https://dataframes.juliadata.org/stable/man/importing_and_exporting

Installs the CSV.jl package using the Julia package manager. This is a prerequisite for using CSV.jl functions.

```julia
using Pkg
Pkg.add("CSV")
```

--------------------------------

### Create DataFrames for Joining Example

Source: https://dataframes.juliadata.org/stable/man/joins

Demonstrates the creation of two sample DataFrames, `people` and `jobs`, which will be used for join operations. This requires the DataFrames package.

```julia
using DataFrames

people = DataFrame(ID=[20, 40], Name=["John Doe", "Jane Doe"])
jobs = DataFrame(ID=[20, 40], Job=["Lawyer", "Doctor"])
```

--------------------------------

### Get Single Column View from DataFrame - Julia

Source: https://dataframes.juliadata.org/stable/man/basics

Creates a view of a single column from a DataFrame using the `@view` macro. This allows access to the column data without copying it, improving memory efficiency. The example selects the first column for the first 5 rows.

```julia
julia> @view german[1:5, 1]
5-element view(::Vector{Int64}, 1:5) with eltype Int64:
 0
 1
 2
 3
 4
```

--------------------------------

### DataFrames.subset! Function Documentation

Source: https://dataframes.juliadata.org/stable/lib/functions

Provides detailed documentation for the `subset!` function, including its signatures, behavior, parameters, and examples for both DataFrames and GroupedDataFrames.

```APIDOC
## `subset!(df::AbstractDataFrame, args...; skipmissing::Bool=false, threads::Bool=true)`
## `subset!(gdf::GroupedDataFrame{DataFrame}, args...; skipmissing::Bool=false, ungroup::Bool=true, threads::Bool=true)`

### Description
Updates data frame `df` or the parent of `gdf` in place to contain only rows for which all values produced by transformation(s) `args` for a given row are `true`.
All transformations must produce vectors containing `true` or `false`. When the first argument is a `GroupedDataFrame`, transformations are also allowed to return a single `true` or `false` value, which results in including or excluding a whole group.

If `skipmissing=false` (the default) `args` are required to produce results containing only `Bool` values. If `skipmissing=true`, additionally `missing` is allowed and it is treated as `false` (i.e. rows for which one of the conditions returns `missing` are skipped).

Each argument passed in `args` can be any specifier following the rules described for `select` with the restriction that:
* specifying target column name is not allowed as `subset!` does not create new columns;
* every passed transformation must return a scalar or a vector (returning `AbstractDataFrame`, `NamedTuple`, `DataFrameRow` or `AbstractMatrix` is not supported).

If `ungroup=false` the passed `GroupedDataFrame` `gdf` is updated (preserving the order of its groups) and returned.

If `threads=true` (the default) transformations may be run in separate tasks which can execute in parallel (possibly being applied to multiple rows or groups at the same time). Whether or not tasks are actually spawned and their number are determined automatically. Set to `false` if some transformations require serial execution or are not thread-safe.

If `GroupedDataFrame` is subsetted then it must include all groups present in the `parent` data frame, like in `select!`. In this case the passed `GroupedDataFrame` is updated to have correct groups after its parent is updated.

### Method
`subset!`

### Parameters
#### Path Parameters
None

#### Query Parameters
- **skipmissing** (Bool) - Optional - Defaults to `false`. If `true`, `missing` values in transformation results are treated as `false`.
- **threads** (Bool) - Optional - Defaults to `true`. If `true`, transformations may run in parallel.
- **ungroup** (Bool) - Optional - Defaults to `true`.
If `false` when used with `GroupedDataFrame`, the `GroupedDataFrame` is updated and returned, preserving group order.

#### Request Body
Transformations (`args`): Each argument can be a column name or a `cols => function` transformation. Transformations must return a scalar or a vector of `Bool` (or `Union{Missing, Bool}` if `skipmissing=true`).

### Request Example
```julia
df = DataFrame(id=1:4, x=[true, false, true, false], y=[true, true, false, false])
subset!(df, :x, :y => ByRow(!)); # df is now a 1×3 DataFrame holding the row with id 3

df_grouped = DataFrame(id=1:4, y=[true, true, false, false], v=[1, 2, 11, 12])
subset!(groupby(df_grouped, :y), :v => x -> x .> minimum(x)); # df_grouped is now a 2×3 DataFrame: in each group, only rows with v above the group minimum remain

df_missing = DataFrame(id=1:4, x=[true, false, true, false], z=[true, true, missing, missing], v=1:4)
subset!(df_missing, :x, :z, skipmissing=true); # df_missing is now a 1×4 DataFrame holding the row with id 1
```

### Response
#### Success Response (200)
Returns the updated data frame (modified in place), or the updated `GroupedDataFrame` when `ungroup=false`.

#### Response Example
```julia
# Example for df after subset!(df, :x, :y => ByRow(!))
1×3 DataFrame
 Row │ id     x     y
     │ Int64  Bool  Bool
─────┼────────────────────
   1 │     3  true  false

# Example for df_grouped after subset!(groupby(df_grouped, :y), :v => x -> x .> minimum(x))
2×3 DataFrame
 Row │ id     y      v
     │ Int64  Bool   Int64
─────┼─────────────────────
   1 │     2   true      2
   2 │     4  false     12

# Example for df_missing after subset!(df_missing, :x, :z, skipmissing=true)
1×4 DataFrame
 Row │ id     x     z      v
     │ Int64  Bool  Bool?  Int64
─────┼────────────────────────
   1 │     1  true   true      1
```

### Error Handling
- `ArgumentError`: Raised if `skipmissing=false` and a transformation returns `missing` values.
- Other errors may occur based on invalid transformation specifications or incompatible types.
```

--------------------------------

### StackedVector Constructor Example in Julia

Source: https://dataframes.juliadata.org/stable/lib/types

Demonstrates the construction of a `StackedVector`, which provides a linear, concatenated view into multiple AbstractVectors. It takes a collection of AbstractVectors as input.

```julia
StackedVector(Any[[1, 2], [9, 10], [11, 12]]) # [1, 2, 9, 10, 11, 12]
```

--------------------------------

### DataFrame Construction Examples

Source: https://dataframes.juliadata.org/stable/lib/types

Demonstrates various ways to construct and manipulate DataFrames using the `AsTable` type for column selection and transformation. This includes passing columns as a NamedTuple and expanding it back into columns.

```julia
julia> df1 = DataFrame(a=1:3, b=11:13)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1     11
   2 │     2     12
   3 │     3     13

julia> df2 = select(df1, AsTable([:a, :b]) => ByRow(identity))
3×1 DataFrame
 Row │ a_b_identity
     │ NamedTuple…
─────┼─────────────────
   1 │ (a = 1, b = 11)
   2 │ (a = 2, b = 12)
   3 │ (a = 3, b = 13)

julia> select(df2, :a_b_identity => AsTable)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1     11
   2 │     2     12
   3 │     3     13

julia> select(df1, AsTable([:a, :b]) => ByRow(nt -> map(x -> x^2, nt)) => AsTable)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1    121
   2 │     4    144
   3 │     9    169
```

--------------------------------

### Prepend Rows to DataFrame (Julia)

Source: https://dataframes.juliadata.org/stable/lib/functions

Demonstrates how to prepend rows from one DataFrame to another. Shows examples with different column matching strategies like `:union`.
```julia
df1 = DataFrame(A=1:3, B=1:3)
df2 = DataFrame(A=4.0:6.0, B=4:6)
prepend!(df1, df2)

df2 = DataFrame(A=4.0:6.0, B=4:6)
prepend!(df2, DataFrame(A=1), (; C=1:2), cols=:union)
```

--------------------------------

### Select Rows and All Columns in DataFrame (Julia)

Source: https://dataframes.juliadata.org/stable/man/basics

This example shows how to select a range of rows while retaining all columns from a DataFrame. The colon `:` is used as a shorthand for selecting all columns. The result is a DataFrame containing the specified rows and all original columns.

```julia
julia> german[1:5, :]
5×10 DataFrame
 Row │ id     Age    Sex      Job    Housing  Saving accounts  Checking accoun ⋯
     │ Int64  Int64  String7  Int64  String7  String15         String15        ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │     0     67  male         2  own      NA               little          ⋯
   2 │     1     22  female       2  own      little           moderate
   3 │     2     49  male         1  own      little           NA
   4 │     3     45  male         2  free     little           little
   5 │     4     53  male         2  free     little           little          ⋯
                                                               4 columns omitted
```

--------------------------------

### Create and Initialize DataFrame in Julia

Source: https://dataframes.juliadata.org/stable/man/working_with_dataframes

Initializes a DataFrame with three columns: `A`, `B`, and `C`, using ranges and repeated values. This is a common starting point for data manipulation tasks in Julia.

```julia
df = DataFrame(A=1:2:1000, B=repeat(1:10, inner=50), C=1:500)
```

--------------------------------

### Julia: Reorder columns using `select`

Source: https://dataframes.juliadata.org/stable/man/basics

This example illustrates how to reorder columns in a DataFrame using the `select` function. It explicitly lists the desired column order, effectively changing the DataFrame's column arrangement.
```julia
df = DataFrame(a = 1:4, b = [50,50,60,60], c = ["hat","bat","cat","dog"])
select(df, :c, :b, :a)
```

--------------------------------

### Getting Proportion of Rows per Group with DataFrames.jl

Source: https://dataframes.juliadata.org/stable/man/split_apply_combine

Illustrates how to calculate the proportion of rows for each group in a `GroupedDataFrame` using the `proprow` operation. Examples include using the default column name and specifying a custom target column name.

```julia
df = DataFrame(customer_id=["a", "b", "b", "b", "c", "c"],
               transaction_id=[12, 15, 19, 17, 13, 11],
               volume=[2, 3, 1, 4, 5, 9])
gdf = groupby(df, :customer_id, sort=true)

# Using default column name :proprow
combine(gdf, proprow)

# Using a custom target column name
combine(gdf, proprow => "transaction_fraction")
```

--------------------------------

### Get Single Cell View from DataFrame - Julia

Source: https://dataframes.juliadata.org/stable/man/basics

Retrieves a view of a single cell within a DataFrame using the `@view` macro. This is highly memory-efficient as it only provides a reference to the existing data. The example accesses the data at row 2, column 2.

```julia
julia> @view german[2, 2]
0-dimensional view(::Vector{Int64}, 2) with eltype Int64:
22
```

--------------------------------

### Get Single Row View from DataFrame - Julia

Source: https://dataframes.juliadata.org/stable/man/basics

Creates a view of a single row from a DataFrame using the `@view` macro. This provides efficient access to the selected columns of a specific row without data duplication. The example selects row 3 and columns 2 through 5.
```julia
julia> @view german[3, 2:5]
DataFrameRow
 Row │ Age    Sex      Job    Housing
     │ Int64  String7  Int64  String7
─────┼────────────────────────────────
   3 │    49  male         1  own
```

--------------------------------

### Create DataFrame from Matrix and Column Names

Source: https://dataframes.juliadata.org/stable/man/basics

Demonstrates creating a DataFrame by providing a matrix of data and a vector of column names to the DataFrame constructor. This method is useful when data is already in memory as a matrix.

```julia
mat = [1 2 4 5; 15 58 69 41; 23 21 26 69]
nms = ["a", "b", "c", "d"]
DataFrame(mat, nms)
```

--------------------------------

### Benchmark Indexing vs. View Creation - Julia

Source: https://dataframes.juliadata.org/stable/man/basics

Compares the performance and memory allocation of standard DataFrame indexing versus creating a DataFrame view using the `@view` macro. Benchmarking is done using the BenchmarkTools.jl package. In the run shown, copying via indexing took about 9.9 μs and allocated 57.56 KiB, while creating the view took about 67 ns and allocated 32 bytes.

```julia
julia> using BenchmarkTools

julia> @btime $german[1:end-1, 1:end-1];
  9.900 μs (44 allocations: 57.56 KiB)

julia> @btime @view $german[1:end-1, 1:end-1];
  67.332 ns (2 allocations: 32 bytes)
```

--------------------------------

### Split-Apply-Combine with DataFramesMeta.jl - Grouping and Aggregation

Source: https://dataframes.juliadata.org/stable/man/querying_frameworks

This example illustrates the split-apply-combine pattern using DataFramesMeta.jl. It filters data, groups by a key, calculates minimum and maximum values for each group, and then selects a derived range column.
```julia
julia> using DataFrames, DataFramesMeta

julia> df = DataFrame(key=repeat(1:3, 4), value=1:12)
12×2 DataFrame
 Row │ key    value
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      2
   3 │     3      3
   4 │     1      4
   5 │     2      5
   6 │     3      6
   7 │     1      7
   8 │     2      8
   9 │     3      9
  10 │     1     10
  11 │     2     11
  12 │     3     12

julia> @chain df begin
           @rsubset :value > 3
           @by(:key, :min = minimum(:value), :max = maximum(:value))
           @select(:key, :range = :max - :min)
       end
3×2 DataFrame
 Row │ key    range
     │ Int64  Int64
─────┼──────────────
   1 │     1      6
   2 │     2      6
   3 │     3      6
```

--------------------------------

### Get DataFrame Dimensions with size()

Source: https://dataframes.juliadata.org/stable/man/basics

The `size` function returns the dimensions (number of rows and columns) of a DataFrame. It can be called with one argument to get a tuple of (rows, columns), or with a second argument (1 for rows, 2 for columns) to get a specific dimension.

```julia
julia> german = copy(german_ref);

julia> size(german)
(1000, 10)

julia> size(german, 1)
1000

julia> size(german, 2)
10
```

--------------------------------

### Get DataFrame Column Names (Julia)

Source: https://dataframes.juliadata.org/stable/man/basics

Shows how to retrieve column names from a DataFrame as a vector of strings using the `names` function. It also demonstrates filtering column names based on their element type, such as `AbstractString`.

```julia
julia> names(german)
10-element Vector{String}:
 "id"
 "Age"
 "Sex"
 "Job"
 "Housing"
 "Saving accounts"
 "Checking account"
 "Credit amount"
 "Duration"
 "Purpose"

julia> names(german, AbstractString)
5-element Vector{String}:
 "Sex"
 "Housing"
 "Saving accounts"
 "Checking account"
 "Purpose"
```

--------------------------------

### Load DataFrames.jl Package

Source: https://dataframes.juliadata.org/stable/man/basics

This code snippet shows the command to load the DataFrames.jl package into your Julia session, making its functionalities available for use. This is a prerequisite for working with DataFrames.
```julia
using DataFrames
```

--------------------------------

### Broadcasting Functions for Vector Operations in Julia

Source: https://dataframes.juliadata.org/stable/man/basics

Explains how to define functions that broadcast over vectors, allowing direct application to columns without needing `ByRow`. Examples include element-wise addition and a function operating on two columns.

```julia
df = DataFrame(a = [1, 2, 3], b = [4, 5, 4]) # sample data so the snippet runs on its own

g(x) = x .+ 1
transform(df, :a => g)

h(x, y) = x .+ y .+ 1
transform(df, [:a, :b] => h)
```

--------------------------------

### Create DataFrames with Keyword Arguments and Pairs

Source: https://context7.com/context7/dataframes_juliadata_stable/llms.txt

Demonstrates creating DataFrames.jl DataFrames using keyword arguments, named tuples, pairs, dictionaries, and matrices. These methods allow for flexible initialization of tabular data structures in Julia.

```julia
using DataFrames

# Keyword argument constructor
df = DataFrame(a=1:4, b=["M", "F", "F", "M"])
# 4×2 DataFrame
#  Row │ a      b
#      │ Int64  String
# ─────┼───────────────
#    1 │     1  M
#    2 │     2  F
#    3 │     3  F
#    4 │     4  M

# Named tuple of vectors
df = DataFrame((a=[1, 2], b=[3, 4]))

# Vector of named tuples
df = DataFrame([(a=1, b=0), (a=2, b=0)])

# Pair constructor
df = DataFrame("a" => 1:2, "b" => 0)

# Dictionary constructor
df = DataFrame(Dict(:a => 1:2, :b => 0))

# Matrix constructor with automatic column names
df = DataFrame([1 0; 2 0], :auto)
# 2×2 DataFrame
#  Row │ x1     x2
#      │ Int64  Int64
# ─────┼──────────────
#    1 │     1      0
#    2 │     2      0
```

--------------------------------

### Advanced DataFrame Column Selection with Not, Between, Cols, All (Julia)

Source: https://dataframes.juliadata.org/stable/man/working_with_dataframes

Provides examples of using advanced column selectors like `Not`, `Between`, `Cols`, and `All` for more complex DataFrame subsetting. `Not` excludes columns, `Between` selects a contiguous range of columns, `All` selects all columns, and `Cols` can select columns by a predicate applied to their names.
```julia
julia> df = DataFrame(r=1, x1=2, x2=3, y=4)
1×4 DataFrame
 Row │ r      x1     x2     y
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      2      3      4

julia> df[:, Not(:r)] # drop :r column
1×3 DataFrame
 Row │ x1     x2     y
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     2      3      4

julia> df[:, Between(:r, :x2)] # keep columns between :r and :x2
1×3 DataFrame
 Row │ r      x1     x2
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3

julia> df[:, All()] # keep all columns
1×4 DataFrame
 Row │ r      x1     x2     y
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      2      3      4

julia> df[:, Cols(x -> startswith(x, "x"))] # keep columns whose name starts with "x"
1×2 DataFrame
 Row │ x1     x2
     │ Int64  Int64
─────┼──────────────
   1 │     2      3
```

--------------------------------

### Copy DataFrame

Source: https://dataframes.juliadata.org/stable/man/basics

Demonstrates creating a copy of an existing DataFrame. This is a common practice to preserve the original data before performing modifications.

```julia
german = copy(german_ref)
```

--------------------------------

### Initialize DataFrame with Named Columns in Julia

Source: https://dataframes.juliadata.org/stable/man/basics

Shows how to initialize a DataFrame with specified column names and data. Scalar values are broadcast to fill entire columns. Column names are given as keyword arguments (here `A`, `B`, and `fixed`).

```julia
julia> DataFrame(A=1:3, B=5:7, fixed=1)
3×3 DataFrame
 Row │ A      B      fixed
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      5      1
   2 │     2      6      1
   3 │     3      7      1
```

--------------------------------

### RepeatedVector Constructor Example in Julia

Source: https://dataframes.juliadata.org/stable/lib/types

Provides examples of how to construct a `RepeatedVector`, which is a view into an AbstractVector with repeated elements. It takes a parent vector and specifies inner and outer repetition counts.
```julia
RepeatedVector([1, 2], 3, 1) # [1, 1, 1, 2, 2, 2]
RepeatedVector([1, 2], 1, 3) # [1, 2, 1, 2, 1, 2]
RepeatedVector([1, 2], 2, 2) # [1, 1, 2, 2, 1, 1, 2, 2]
```

--------------------------------

### Create and Display DataFrame in Julia

Source: https://dataframes.juliadata.org/stable/man/working_with_dataframes

Demonstrates creating a DataFrame in Julia using the DataFrames.jl package and displays its default summarized output. It also shows how to adjust printing options to display all rows or columns.

```julia
using DataFrames

df = DataFrame(A=1:2:1000, B=repeat(1:10, inner=50), C=1:500)

# Display default output (sample)
println(df)

# Display all rows
show(df, allrows=true)

# Display all columns
show(df, allcols=true)
```

--------------------------------

### Get Last N Rows of DataFrame

Source: https://dataframes.juliadata.org/stable/lib/functions

The `last` function can also be used to get a specified number of rows from the end of a DataFrame. It returns a new DataFrame or, with `view=true`, a SubDataFrame view. It preserves metadata.

```julia
last(df::AbstractDataFrame, n::Integer; view::Bool=false)
```

--------------------------------

### Create DataFrame and SubDataFrame View

Source: https://dataframes.juliadata.org/stable/lib/types

Demonstrates creating a DataFrame and then creating a SubDataFrame by selecting a range of rows and specific columns using the `view` function.

```julia
df = DataFrame(a=repeat([1, 2, 3, 4], outer=[2]),
               b=repeat([2, 1], outer=[4]),
               c=1:8)
view(df, 1:4, [:a, :c])
```

--------------------------------

### Get First N Rows of DataFrame

Source: https://dataframes.juliadata.org/stable/lib/functions

The `first` function can also be used to get a specified number of rows from the beginning of a DataFrame. It returns a new DataFrame or, with `view=true`, a SubDataFrame view. It preserves metadata.
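
As a minimal sketch of the copy-versus-view behaviour described above, the following uses a small hypothetical data frame (the column names `a` and `b` and the values are illustrative assumptions, not taken from the documentation). It suggests that `first(df, n)` returns an independent `DataFrame`, while `first(df, n; view=true)` returns a `SubDataFrame` that writes through to its parent.

```julia
using DataFrames

# Hypothetical sample data, for illustration only
df = DataFrame(a=1:5, b='a':'e')

head = first(df, 2)                  # new DataFrame with copies of rows 1:2
head_view = first(df, 2; view=true)  # SubDataFrame referencing rows 1:2 of df

# Writing through the view changes the parent; the copy is unaffected
head_view[1, :a] = 100
```

After the assignment, `df[1, :a]` is `100` while `head[1, :a]` is still `1`; the same pattern applies to `last`.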
```julia
first(df::AbstractDataFrame, n::Integer; view::Bool=false)
```

--------------------------------

### Basic DataFrame Operations in Julia

Source: https://dataframes.juliadata.org/stable/man/basics

Demonstrates basic DataFrame creation and operations like sum, maximum, and vector subtraction using `combine` and `transform`. It highlights how scalar results are broadcasted and how vector operations behave.

```julia
df = DataFrame(a = [1, 2, 3], b = [4, 5, 4])

combine(df, :a => sum)
transform(df, :b => maximum) # `transform` and `select` copy scalar result to all rows
transform(df, [:b, :a] => -) # vector subtraction is okay
```

--------------------------------

### Extract First Two Columns as DataFrame (Julia)

Source: https://dataframes.juliadata.org/stable/man/basics

This example demonstrates extracting the first two columns of a DataFrame into a new DataFrame. It shows multiple ways to achieve this, including using a range of column indices or vectors of column names/symbols. The distinction between copying and non-copying extraction is also illustrated.

```julia
julia> german[:, 1:2] # Copies the columns

julia> german[:, [:id, :Age]] # Copies the columns

julia> german[:, ["id", "Age"]] # Copies the columns

julia> german[!, 1:2] # Reuses columns without copying

julia> german[!, [:id, :Age]] # Reuses columns without copying

julia> german[!, ["id", "Age"]] # Reuses columns without copying
```

--------------------------------

### Julia: Select specific columns and all others using `select`

Source: https://dataframes.juliadata.org/stable/man/basics

This code snippet shows how to use the `select` function to pick a specific column first and then all remaining columns using the `:` selector. This is useful for moving a column to the front without listing every other column.
```julia
df = DataFrame(a = 1:4, b = [50,50,60,60], c = ["hat","bat","cat","dog"])
select(df, :b, :)
```

--------------------------------

### Get Element Types of DataFrame Columns (Julia)

Source: https://dataframes.juliadata.org/stable/man/basics

Demonstrates how to get the element types of each column in a DataFrame. It uses `eachcol` to iterate over the columns and then broadcasts the `eltype` function to determine the data type of elements within each column.

```julia
julia> eltype.(eachcol(german))
10-element Vector{DataType}:
 Int64
 Int64
 String7
 Int64
 String7
 String15
 String15
 Int64
 Int64
 String31
```

--------------------------------

### Julia: Perform multi-step transformation sequentially

Source: https://dataframes.juliadata.org/stable/man/basics

This code demonstrates the correct way to perform sequential transformations where a new column created in one step is used in a subsequent step. It first creates column `:d` by summing `:a` and `:b`, then transforms `:d` in a separate `transform!` call.

```julia
df = DataFrame(a = 1:4, b = [50,50,60,60], c = ["hat","bat","cat","dog"])
new_df = transform(df, [:a, :b] => ByRow(+) => :d)
transform!(new_df, :d => (x -> x ./ 2) => :d_2)
```

--------------------------------

### Julia: Broadcasting Multiple Functions to Different DataFrame Columns

Source: https://dataframes.juliadata.org/stable/man/basics

Demonstrates how to apply different functions to different columns within a DataFrame. This example creates two simple functions, `f1` and `f2`, and broadcasts them to columns `a` and `b` respectively, creating new columns for the results.
```julia
julia> df = DataFrame(a=1:4, b=5:8)
4×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      5
   2 │     2      6
   3 │     3      7
   4 │     4      8

julia> f1(x) = x .+ 1
f1 (generic function with 1 method)

julia> f2(x) = x ./ 10
f2 (generic function with 1 method)

julia> transform(df, [:a, :b] .=> [f1, f2])
4×4 DataFrame
 Row │ a      b      a_f1   b_f2
     │ Int64  Int64  Int64  Float64
─────┼──────────────────────────────
   1 │     1      5      2      0.5
   2 │     2      6      3      0.6
   3 │     3      7      4      0.7
   4 │     4      8      5      0.8
```

--------------------------------

### Get Group Indices with DataFrames.jl

Source: https://dataframes.juliadata.org/stable/man/split_apply_combine

Demonstrates how to retrieve the group number for each row in a grouped DataFrame using the `groupindices` operation. This can be used with `combine` or `transform` to add a group index column, or directly as a function to get a vector of indices.

```julia
julia> combine(gdf, groupindices)
3×2 DataFrame
 Row │ customer_id  groupindices
     │ String       Int64
─────┼───────────────────────────
   1 │ a                       1
   2 │ b                       2
   3 │ c                       3
```

```julia
julia> transform(gdf, groupindices)
6×4 DataFrame
 Row │ customer_id  transaction_id  volume  groupindices
     │ String       Int64           Int64   Int64
─────┼───────────────────────────────────────────────────
   1 │ a                        12       2             1
   2 │ b                        15       3             2
   3 │ b                        19       1             2
   4 │ b                        17       4             2
   5 │ c                        13       5             3
   6 │ c                        11       9             3
```

```julia
julia> combine(gdf, groupindices => "group_number")
3×2 DataFrame
 Row │ customer_id  group_number
     │ String       Int64
─────┼───────────────────────────
   1 │ a                       1
   2 │ b                       2
   3 │ c                       3
```

```julia
julia> groupindices(gdf)
6-element Vector{Union{Missing, Int64}}:
 1
 2
 2
 2
 3
 3
```

--------------------------------

### Applying Custom Functions Element-wise in Julia

Source: https://dataframes.juliadata.org/stable/man/basics

Illustrates applying custom defined functions element-wise to DataFrame columns using `ByRow`. It demonstrates a simple addition function and how it transforms a column.
```julia
df = DataFrame(a = [1, 2, 3], b = [4, 5, 4]) # sample data so the snippet runs on its own

f(x) = x + 1
transform(df, :a => ByRow(f))
```

--------------------------------

### Get DataFrame Row and Column Counts with nrow() and ncol()

Source: https://dataframes.juliadata.org/stable/man/basics

The `nrow` function returns the number of rows in a DataFrame, while the `ncol` function returns the number of columns. These provide a more direct way to get specific dimension counts compared to `size()`.

```julia
julia> nrow(german)
1000

julia> ncol(german)
10
```

--------------------------------

### Select DataFrame columns using collections and patterns in Julia

Source: https://dataframes.juliadata.org/stable/man/basics

Illustrates advanced column selection using collections (like vectors), regular expressions, and special selectors (`Not`, `Between`, `All`, `Cols`). This provides flexible ways to select subsets of columns based on various criteria.

```julia
df = DataFrame(
    id = [1, 2, 3],
    first_name = ["José", "Emma", "Nathan"],
    last_name = ["Garcia", "Marino", "Boyer"],
    age = [61, 24, 33]
)

select(df, [:last_name, :first_name])
select(df, r"name")
select(df, Not(:id))
select(df, Between(2,4))
```

--------------------------------

### Get DataFrameRow Element Count

Source: https://dataframes.juliadata.org/stable/lib/functions

Returns the number of elements in a DataFrameRow as a one-element tuple. If a dimension is specified, it must be 1, and the number of elements is returned as an integer.

```julia
# Signature: size(dfr::DataFrameRow[, dim])

dfr = DataFrame(a=1:3, b='a':'c')[1, :]
size(dfr)    # (2,)
size(dfr, 1) # 2
```

--------------------------------

### Get Number of Dimensions of DataFrameRow with Base.ndims

Source: https://dataframes.juliadata.org/stable/lib/functions

Returns the number of dimensions for a `DataFrameRow` or its type, which is always 1, reflecting its structure as a single row.
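
The description above can be illustrated with a short sketch on a hypothetical two-column data frame (the data is an assumption chosen for illustration). It checks `ndims` on a `DataFrameRow` and on its type, and contrasts that with the parent data frame, which is two-dimensional.

```julia
using DataFrames

# Hypothetical sample data, for illustration only
df = DataFrame(a=1:3, b='a':'c')
dfr = df[1, :]                   # a DataFrameRow

row_dims = ndims(dfr)            # dimensions of the row object
type_dims = ndims(typeof(dfr))   # same query on the type
table_dims = ndims(df)           # the parent data frame, for contrast
```

Here `row_dims` and `type_dims` are both `1`, while `table_dims` is `2`.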
```julia
ndims(::DataFrameRow)
ndims(::Type{<:DataFrameRow})
```

--------------------------------

### Select Rows and Specific Columns in DataFrame (Julia)

Source: https://dataframes.juliadata.org/stable/man/basics

This code snippet demonstrates how to select a range of rows and specific columns from a DataFrame. It uses standard Julia DataFrame indexing syntax, specifying row indices and a vector of column names. The output is a new DataFrame with the selected subset of data.

```julia
julia> german[1:5, [:Sex, :Age]]
5×2 DataFrame
 Row │ Sex      Age
     │ String7  Int64
─────┼────────────────
   1 │ male        67
   2 │ female      22
   3 │ male        49
   4 │ male        45
   5 │ male        53
```

--------------------------------

### Get Table Metadata

Source: https://dataframes.juliadata.org/stable/lib/functions

Retrieves the value of a table-level metadata key from a DataFrame. Optionally returns the metadata style and a default value if the key does not exist.

```APIDOC
## `metadata(df::AbstractDataFrame, key::AbstractString, [default]; style::Bool=false)`

### Description
Retrieves the value of a table-level metadata key from a DataFrame.

### Method
`metadata`

### Parameters
- **key** (AbstractString) - Required - The metadata key to retrieve.
- **default** (any) - Optional - The default value to return if the key does not exist.
- **style** (Bool) - Optional - If `true`, returns a tuple of (value, style). Defaults to `false`.

### Request Example
```julia
# assumes the key was set earlier, e.g. metadata!(df, "name", "example", style=:note)
metadata(df, "name", style=true)
```

### Response
- **value** (any) - The metadata value.
- **style** (Symbol) - The metadata style (if `style=true`).

#### Response Example
```julia
("example", :note)
```
```

--------------------------------

### Constructing DataFrame Column by Column in Julia

Source: https://dataframes.juliadata.org/stable/man/getting_started

Demonstrates creating an empty DataFrame and adding columns sequentially.
It shows different syntaxes for column assignment and modification, including broadcasting scalar values. Note the difference between `df.col` and `df[:, :col]` for replacement vs. in-place updates.

```julia
df = DataFrame()
df.A = 1:8
df[:, :B] = ["M", "F", "F", "M", "F", "M", "M", "F"]
df[!, :C] .= 0
println(df)
println("Size: ", size(df))
```

```julia
df.B = df.B .== "F"
println(df)
```

--------------------------------

### Get Single Row DataFrame

Source: https://dataframes.juliadata.org/stable/lib/functions

The `only` function returns a DataFrameRow if the input DataFrame has exactly one row, otherwise it throws an ArgumentError. It preserves metadata.

```julia
only(df::AbstractDataFrame)
```

--------------------------------

### Get Last Row of DataFrame

Source: https://dataframes.juliadata.org/stable/lib/functions

The `last` function retrieves the last row of a DataFrame and returns it as a `DataFrameRow`. This function preserves table-level and column-level metadata.

```julia
last(df::AbstractDataFrame)
```

--------------------------------

### Get First Row of DataFrame

Source: https://dataframes.juliadata.org/stable/lib/functions

The `first` function retrieves the first row of a DataFrame and returns it as a `DataFrameRow`. This function preserves table-level and column-level metadata.

```julia
first(df::AbstractDataFrame)
```

--------------------------------

### Getting Column Names from DataFrameColumns (Julia)

Source: https://dataframes.juliadata.org/stable/lib/functions

Retrieves a vector of column names as Symbols from a DataFrameColumns object, typically representing columns within a DataFrame.

```julia
keys(dfc::DataFrameColumns)
```

Get a vector of column names of `dfc` as `Symbol`s.
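
As a small illustration of the `keys` method described above, the sketch below builds a `DataFrameColumns` object with `eachcol` and reads its column names. The data frame contents and the variable names (`df`, `cols`, `col_names`, `col_types`) are made up for this example.

```julia
using DataFrames

# Hypothetical data, used only for illustration
df = DataFrame(a = 1:3, b = ["x", "y", "z"])

# eachcol wraps the data frame in a DataFrameColumns object
cols = eachcol(df)

# keys returns the column names as a Vector{Symbol}
col_names = keys(cols)

# pairs iterates name => column, which is handy for per-column summaries
col_types = Dict(name => eltype(col) for (name, col) in pairs(cols))
```

Because the names come back as `Symbol`s, they can be passed straight to indexing expressions such as `df[!, col_names[1]]`.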
--------------------------------

### Constructing DataFrame from Tables.jl Interface in Julia

Source: https://dataframes.juliadata.org/stable/man/getting_started

DataFrames.jl interoperates with other table-like data structures that adhere to the Tables.jl interface, both as a source and as a sink. This example shows the sink direction: it writes a DataFrame to a CSV file and loads it into an SQLite database table.

```julia
using DataFrames
using CSV
using SQLite

df = DataFrame(a=[1, 2, 3], b=[:a, :b, :c])

# write DataFrame out to CSV file
CSV.write("dataframe.csv", df)

# open (or create) an SQLite database file and store the DataFrame in a table
db = SQLite.DB("mydatabase.sqlite")
SQLite.load!(df, db, "dataframe_table")

# release the database handle
DBInterface.close!(db)
```

--------------------------------

### Create and Filter/Select DataFrame with TidierData.jl

Source: https://dataframes.juliadata.org/stable/man/querying_frameworks

This example shows how to create a DataFrame using DataFrames.jl and then apply filtering and selection operations using TidierData.jl's @chain, @filter, and @select macros. It demonstrates a common data wrangling pattern.

```julia
using TidierData
using DataFrames

df = DataFrame(
    name = ["John", "Sally", "Roger"],
    age = [54.0, 34.0, 79.0],
    children = [0, 2, 4]
)

@chain df begin
    @filter(children != 2)
    @select(name, num_children = children)
end
```

--------------------------------

### Get DataFrame Dimensions

Source: https://dataframes.juliadata.org/stable/lib/functions

Retrieves the dimensions (number of rows and columns) of an AbstractDataFrame. Optionally, a specific dimension (1 for rows, 2 for columns) can be requested.

```julia
size(df::AbstractDataFrame[, dim])

julia> df = DataFrame(a=1:3, b='a':'c');

julia> size(df)
(3, 2)

julia> size(df, 1)
3
```

--------------------------------

### Get Number of Rows with DataAPI.nrow

Source: https://dataframes.juliadata.org/stable/lib/functions

Returns the number of rows in an `AbstractDataFrame`. It complements `ncol` and `size`.
```julia
nrow(df::AbstractDataFrame)

# Example:
df = DataFrame(i=1:10, x=rand(10), y=rand(["a", "b", "c"], 10))
nrow(df)
```
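
To make the relationship between `nrow`, `ncol`, and `size` concrete, here is a minimal sketch using the same kind of data frame as the example above. The variable names `n_rows` and `n_cols` are chosen for this illustration only.

```julia
using DataFrames

# Same shape as the example above: 10 rows, 3 columns
df = DataFrame(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))

n_rows = nrow(df)   # number of rows
n_cols = ncol(df)   # number of columns

# size reports both counts at once, or one of them when given a dimension
dims_match = size(df) == (n_rows, n_cols)
rows_match = size(df, 1) == n_rows
cols_match = size(df, 2) == n_cols
```

`nrow` and `ncol` read more clearly when only one count is needed, while `size` is convenient when both are used together, for example when preallocating a matrix of the same shape.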