### Setup Development Environment and Run Tests

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/gettingstarted/contributing.md

Follow these steps to create a conda environment, clone the repository, install development dependencies, and run unit tests to verify the installation.

```bash
conda create -n nested_pandas_env python=3.11
conda activate nested_pandas_env

git clone https://github.com/lincc-frameworks/nested-pandas.git
cd nested-pandas/
bash ./.setup_dev.sh

pip install pytest
pytest
```

--------------------------------

### Install nested-pandas with Development Dependencies

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/CLAUDE.md

Use the provided setup script for the recommended development environment setup. Alternatively, install the package in editable mode with its development dependencies, then install the pre-commit hooks.

```bash
./.setup_dev.sh
```

```bash
pip install -e '.[dev]'
```

```bash
pre-commit install
```

--------------------------------

### Install pytest and run tests

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/gettingstarted/installation.md

Installs the pytest framework and runs the unit test suite to verify the local installation of nested-pandas.

```bash
pip install pytest
pytest
```

--------------------------------

### Generate Example Nested DataFrame

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/data_manipulation.ipynb

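Generates an example nested dataframe for demonstration purposes: `generate_data(5, 20, seed=1)` produces 5 base rows, each holding 20 nested rows. This is the starting point for most of the operations that follow.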

```python
import nested_pandas as npd
from nested_pandas.datasets import generate_data

# Begin by generating an example dataset
ndf = generate_data(5, 20, seed=1)
ndf
```
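Conceptually, a frame with 5 base rows and 20 nested rows per base row corresponds to a flat table of 100 rows whose index repeats each base label 20 times. The sketch below builds such a flat table with plain NumPy and pandas to make that shape concrete; `make_flat_toy` and its `t`/`flux` columns are hypothetical stand-ins, not how `generate_data` is implemented.

```python
import numpy as np
import pandas as pd


def make_flat_toy(n_base: int, n_nested: int, seed: int = 1) -> pd.DataFrame:
    """Hypothetical helper: a flat table with n_nested rows per base index label."""
    rng = np.random.default_rng(seed)
    n_total = n_base * n_nested
    return pd.DataFrame(
        {
            # random values; only the shape of the table matters here
            "t": rng.uniform(0, 10, n_total),
            "flux": rng.uniform(0, 100, n_total),
        },
        # repeat each base label n_nested times: 0, 0, ..., 1, 1, ...
        index=np.repeat(np.arange(n_base), n_nested),
    )


flat = make_flat_toy(5, 20)
print(flat.shape)                          # (100, 2)
print(flat.index.value_counts().tolist())  # [20, 20, 20, 20, 20]
```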

--------------------------------

### Benchmarking setup and execution

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/njit_map_rows.ipynb

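Defines helpers that run the Python and njit versions of `max_slope` through `map_rows`, compiles the njit function with one warm-up call, and then plots the benchmark. `max_slope_py`, `max_slope_njit`, and `plot_bench` are defined in other cells of the notebook.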

```python
# define helpers for benchmarking
def run_max_slope_py(nf):
    nf.map_rows(
        max_slope_py, columns=["nested.t", "nested.flux"], row_container="args", output_names="max_slope"
    )


def run_max_slope_njit(nf):
    nf.map_rows(
        max_slope_njit,
        columns=["nested.t", "nested.flux"],
        row_container="args",
        output_names="max_slope",
        njit=True,
    )


run_max_slope_njit(nf.copy())  # run njit once for compilation before benchmark

plot_bench(run_max_slope_py, run_max_slope_njit, title="njit over python execution - max_slope")
```
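`plot_bench` itself is not shown here. As a rough, hypothetical stand-in for its timing step (not the notebook's actual helper), the sketch below times two zero-argument callables with `timeit`, after one untimed warm-up call each, mirroring the "run njit once for compilation" step above.

```python
import timeit
from typing import Callable, Dict


def compare_runtimes(
    baseline: Callable[[], object], candidate: Callable[[], object], repeat: int = 5
) -> Dict[str, float]:
    """Hypothetical helper: best-of-`repeat` wall time of two zero-argument callables."""
    # One untimed call each, so one-off costs such as JIT compilation are excluded
    baseline()
    candidate()
    t_base = min(timeit.repeat(baseline, number=1, repeat=repeat))
    t_cand = min(timeit.repeat(candidate, number=1, repeat=repeat))
    return {"baseline_s": t_base, "candidate_s": t_cand, "speedup": t_base / max(t_cand, 1e-12)}


# Stand-in workloads; in the notebook these could be
# lambda: run_max_slope_py(nf) and lambda: run_max_slope_njit(nf)
timings = compare_runtimes(lambda: sum(range(100_000)), lambda: sum(range(1_000)))
print(f"speedup: {timings['speedup']:.1f}x")
```

Taking the minimum over repeats is a common choice for micro-benchmarks because it is the measurement least affected by background noise.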

--------------------------------

### Install nested-pandas from source

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/gettingstarted/installation.md

Clones the nested-pandas repository and installs the package from the local source code. Use this to get the current development version instead of the latest PyPI release.

```bash
git clone https://github.com/lincc-frameworks/nested-pandas.git
cd nested-pandas
pip install .
```

--------------------------------

### Install nested-pandas using pip

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/gettingstarted/installation.md

Installs the latest release version of nested-pandas from PyPI.

```bash
pip install nested-pandas
```
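To confirm that the install worked, you can ask Python which version of the distribution is installed. A minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "nested-pandas") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())  # a version string, or None if nested-pandas is not installed
```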

--------------------------------

### Setup Toy DataFrame for Combining Nested Structures

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/data_manipulation.ipynb

Sets up a toy nested dataframe with multiple nested columns ('c' and 'd') to demonstrate combining nested structures.

```python
# Setup a toy dataframe with two nested columns
list_nf = npd.NestedFrame(
    {
        "a": ["cat", "dog", "bird"],
        "b": [1, 2, 3],
        "c": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
        "d": [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
    }
)

list_nf = list_nf.nest_lists(["c"], "c")
list_nf = list_nf.nest_lists(["d"], "d")
list_nf
```
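Before nesting, `c` and `d` are ordinary list columns. Their flat, one-row-per-element form can be previewed in plain pandas with `DataFrame.explode`, which accepts several columns at once in pandas 1.3 and later. This is only an illustration of the flat layout; it is not what `nest_lists` does internally.

```python
import pandas as pd

list_df = pd.DataFrame(
    {
        "a": ["cat", "dog", "bird"],
        "b": [1, 2, 3],
        "c": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
        "d": [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
    }
)

# Explode c and d together; each original index label repeats once per list element,
# and the base columns a and b are repeated alongside
flat = list_df.explode(["c", "d"])
print(flat.index.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```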

--------------------------------

### Install nested-pandas with development dependencies

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/gettingstarted/installation.md

Installs nested-pandas from source along with the optional development dependencies needed to run the unit tests and build the documentation. Some shells (for example zsh) treat the square brackets as a glob pattern, so you may need to quote the argument: `pip install '.[dev]'`.

```bash
pip install .[dev]
```

--------------------------------

### Install nested-pandas

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/data_loading_notebook.ipynb

Install nested-pandas and its dependencies using pip. In a notebook, uncomment the line below to run the install from a cell.

```python
# %pip install nested-pandas
```

--------------------------------

### Inspecting All Column Labels

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/gettingstarted/quickstart.ipynb

Use the `.all_columns` property to get a dictionary of both top-level ('base') and nested column labels. This provides a comprehensive view of all available columns.

```python
# Provides a dictionary of "base" (top-level) and nested column labels
nf.all_columns
```

--------------------------------

### nested-pandas Workflow for Flux Amplitude Calculation

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/performance.ipynb

This snippet reaches the same result as the native pandas workflow, often with improved performance. It reads both parquet files, nests the sources into the objects with `join_nested`, filters on the objects, counts observations per band with `count_nested`, and computes the amplitude with `reduce`. It requires `nested_pandas` (as `npd`), `count_nested` from `nested_pandas.utils`, and `light_curve` (as `licu`) to be imported.

```python
%%timeit

# Read in parquet data
# nesting sources into objects
nf = npd.read_parquet("objects.parquet")
nf = nf.join_nested(npd.read_parquet("ztf_sources.parquet"), "ztf_sources")

# Filter on object
nf = nf.query("ra > 10.0")

# Count number of observations per photometric band and add it as a column
nf = count_nested(nf, "ztf_sources", by="band", join=True)  # use an existing utility

# Filter on our nobs
nf = nf.query("n_ztf_sources_g > 520")

# Calculate Amplitude
amplitude = licu.Amplitude()
nf.reduce(amplitude, "ztf_sources.mjd", "ztf_sources.flux")
```

--------------------------------

### Build Documentation with Sphinx

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/CLAUDE.md

Navigate to the docs directory and use the make html command to build the project's documentation.

```bash
cd docs && make html
```

--------------------------------

### Run Pre-commit Checks and Linting/Formatting with Ruff

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/CLAUDE.md

Execute all pre-commit hooks to ensure code quality and style compliance across the project. Specific commands are available for Ruff linting and formatting.

```bash
pre-commit run --all-files
```

```bash
ruff check src/ tests/
```

```bash
ruff format src/ tests/
```

--------------------------------

### Get nested column names

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/low_level.ipynb

Retrieves the names of the sub-columns (fields) stored in a nested Series, using the `.columns` attribute of the `.nest` accessor.

```python
nested_series.nest.columns
```

--------------------------------

### Set up conda environment for nested-pandas

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/gettingstarted/installation.md

Creates and activates a new conda environment for nested-pandas development. Use Python 3.11.

```bash
conda create -n nested_pandas_env python=3.11
conda activate nested_pandas_env
```

--------------------------------

### Import necessary libraries

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/data_loading_notebook.ipynb

Import pandas, os, tempfile, and specific components from nested_pandas.

```python
import os
import tempfile

import pandas as pd

from nested_pandas import NestedFrame, read_parquet
from nested_pandas.datasets import generate_parquet_file
```

--------------------------------

### Convert Nested Series to Flat DataFrame

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/low_level.ipynb

Use `.to_flat()` to get a "flat" pandas DataFrame with a repeated index, effectively concatenating nested elements. This operation is copy-free.

```python
nested_series.nest.to_flat(["flux", "t"])
```
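One reason a flat view can be produced without copying is the way list data is commonly laid out in the Arrow columnar format: a single flat buffer of values plus an offsets array marking where each row's list starts and ends. Assuming that layout (this is a generic NumPy illustration, not nested-pandas code), the repeated flat index and the flat values can be derived like this:

```python
import numpy as np

# Hypothetical list-array layout: all elements stored back to back, plus offsets
values = np.array([0.1, 0.2, 0.3, 0.4, 0.5])  # flat "flux" values for 3 rows
offsets = np.array([0, 3, 5, 5])              # row i spans values[offsets[i]:offsets[i + 1]]
base_index = np.array([10, 11, 12])           # index labels of the base rows

# Repeat each base label by its list length to get the flat index (row 12 is empty)
lengths = np.diff(offsets)
flat_index = np.repeat(base_index, lengths)
print(flat_index.tolist())  # [10, 10, 10, 11, 11]

# The flat column is a slice of the values buffer: a NumPy view, not a copy
flat_flux = values[offsets[0]:offsets[-1]]
```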

--------------------------------

### Plotting Benchmark for Asymptotic Behavior

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/njit_map_rows.ipynb

Configures and plots benchmark results to analyze the asymptotic behavior of njit versus Python execution. It restricts the base-row counts to two large values and sweeps the width of the nested column (the number of nested rows per base row) to observe performance trends.

```python
n_base_shrink = [7500, 10_000]
n_nested_list_asy = [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000]

plot_bench(
    run_max_slope_py,
    run_max_slope_njit,
    title="njit over python execution - max_slope (asymptotic behavior)",
    n_base_list=n_base_shrink,
    n_nested_list=n_nested_list_asy,
)
```

--------------------------------

### Read Parquet with Partial Column Loading

Source: https://context7.com/lincc-frameworks/nested-pandas/llms.txt

Demonstrates reading a Parquet file with nested columns, supporting partial loading of specific sub-columns using dot notation. Requires the 's3fs' library for S3 access.

```python
import nested_pandas as npd
from nested_pandas.datasets import generate_data

nf = generate_data(100, 50, seed=0)

# Write
nf.to_parquet("data.parquet")

# Read full file
nf2 = npd.read_parquet("data.parquet")
assert list(nf2.nested_columns) == ["nested"]

# Partial load: only "a" base column and the "flux" sub-column of "nested"
nf3 = npd.read_parquet("data.parquet", columns=["a", "nested.flux"])
print(nf3.columns.tolist())         # ['a', 'nested']
print(nf3["nested"].nest.columns)   # ['flux']

# Read from S3 (requires s3fs)
# nf_s3 = npd.read_parquet("s3://my-bucket/data.parquet")
```

--------------------------------

### Run All Tests with Pytest

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/CLAUDE.md

Execute all tests in the project using the pytest command. For coverage reports, include the --cov and --cov-report flags.

```bash
python -m pytest
```

```bash
python -m pytest --cov=nested_pandas --cov-report=xml
```

--------------------------------

### Applying njit-compiled function with map_rows

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/njit_map_rows.ipynb

Demonstrates how to use an `@njit`-decorated function with `map_rows` by setting `njit=True`, which makes `map_rows` take its numba pathway instead of the default Python pathway and can substantially speed up per-row computations.

```python
nf.map_rows(
    max_slope_njit,
    columns=["nested.t", "nested.flux"],
    row_container="args",
    output_names="max_slope",
    njit=True,
)
```

--------------------------------

### Running and Plotting Weighted Mean Slope Benchmarks

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/njit_map_rows.ipynb

Sets up and runs benchmark comparisons for Python and njit versions of weighted_mean_slope using map_rows. Includes initial njit compilation and plotting of results.

```python
def run_weighted_mean_slope_py(nf):
    nf.map_rows(
        weighted_mean_slope_py,
        columns=["nested.t", "nested.flux"],
        row_container="args",
        output_names="weighted_mean_slope",
    )

def run_weighted_mean_slope_njit(nf):
    nf.map_rows(
        weighted_mean_slope_njit,
        columns=["nested.t", "nested.flux"],
        row_container="args",
        output_names="weighted_mean_slope",
        njit=True,
    )


run_weighted_mean_slope_njit(nf.copy())  # run njit once for compilation before benchmark

plot_bench(
    run_weighted_mean_slope_py,
    run_weighted_mean_slope_njit,
    title="njit over python execution - weighted_mean_slope",
)
```

--------------------------------

### GroupBy Describe Aggregation

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/groupby_doc.ipynb

Demonstrates the `describe()` aggregation on a `groupby` object. This method works as expected, providing descriptive statistics by automatically flattening the nested columns.

```python
# describe works as expected with automatic flattened nested column
nf.groupby("c").describe()
```

--------------------------------

### Import necessary libraries and generate data

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/njit_map_rows.ipynb

Imports `generate_data` from `nested_pandas.datasets`, `numpy`, and `njit` from `numba`. Generates a sample nested pandas DataFrame for demonstration.

```python
from nested_pandas.datasets import generate_data
import numpy as np
from numba import njit

# example frame
nf = generate_data(10_000, 1000, seed=1)
```

--------------------------------

### Access nested column keys using .nest

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/low_level.ipynb

Demonstrates how to retrieve the names (keys) of the nested columns using the .nest accessor.

```python
list(nested_series.nest.keys())
```

--------------------------------

### Build Flat Spectrum Dataframe

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/nested_spectra.ipynb

Constructs a 'flat' spectrum table by iterating through retrieved FITS spectral data. It aggregates wavelength, flux, error, and object index into NumPy arrays. Requires numpy.

```python
import numpy as np

# Build a flat spectrum dataframe

# Initialize some empty arrays to hold the flat data
wave = np.array([])
flux = np.array([])
err = np.array([])
index = np.array([])
# Loop over each spectrum (`sp` holds the FITS files retrieved earlier), adding its data to the arrays
for i, hdu in enumerate(sp):
    wave = np.append(wave, 10 ** hdu["COADD"].data.loglam)  # * u.angstrom
    flux = np.append(flux, hdu["COADD"].data.flux * 1e-17)  # * u.erg/u.second/u.centimeter**2/u.angstrom
    # ivar is the inverse variance, so the 1-sigma error is 1 / sqrt(ivar)
    err = np.append(err, 1 / np.sqrt(hdu["COADD"].data.ivar) * 1e-17)  # * flux.unit

    # We'll need to set an index to keep track of which rows correspond
    # to which object
    index = np.append(index, i * np.ones(len(hdu["COADD"].data.loglam)))
```
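Calling `np.append` inside a loop copies the growing arrays on every iteration, so the cost grows quadratically with the number of spectra. A sketch of the same construction that collects the pieces in lists and concatenates once; the `spectra` dicts are hypothetical synthetic stand-ins for `hdu["COADD"].data`, and the error is computed as `1 / sqrt(ivar)` because `ivar` is an inverse variance.

```python
import numpy as np

# Hypothetical stand-ins for hdu["COADD"].data: one dict of arrays per spectrum
rng = np.random.default_rng(0)
spectra = [
    {
        "loglam": np.linspace(3.6, 3.9, n),
        "flux": rng.normal(10.0, 1.0, n),
        "ivar": rng.uniform(0.5, 2.0, n),
    }
    for n in (4, 6, 5)
]

# Collect per-spectrum pieces in lists, then concatenate once
waves, fluxes, errs, idxs = [], [], [], []
for i, data in enumerate(spectra):
    waves.append(10 ** data["loglam"])
    fluxes.append(data["flux"] * 1e-17)
    errs.append(1 / np.sqrt(data["ivar"]) * 1e-17)
    idxs.append(np.full(len(data["loglam"]), i, dtype=np.int64))  # object id per row

wave = np.concatenate(waves)
flux = np.concatenate(fluxes)
err = np.concatenate(errs)
index = np.concatenate(idxs)
print(len(wave), np.bincount(index).tolist())  # 15 [4, 6, 5]
```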

--------------------------------

### Asymptotic Behavior Plot for njit Looping Function

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/njit_map_rows.ipynb

Use this to visualize the performance scaling of njit-compiled functions with explicit loops. It helps understand how performance changes as the input size increases, particularly for nested data structures.

```python
n_base_shrink = [7500, 10_000]
nested_list_asy = [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]

# plot_bench, run_weighted_mean_slope_py, and run_weighted_mean_slope_njit_loop
# are defined in other cells of the notebook
plot_bench(
    run_weighted_mean_slope_py,
    run_weighted_mean_slope_njit_loop,
    title="njit over python execution - weighted_mean_slope (asymptotic behavior for loop)",
    n_base_list=n_base_shrink,
    n_nested_list=nested_list_asy,
)
```

--------------------------------

### Applying standard Python function with map_rows

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/njit_map_rows.ipynb

Shows the usage of a standard Python function with `map_rows`. This serves as the baseline for performance comparison.

```python
nf.map_rows(
    max_slope_py,
    columns=["nested.t", "nested.flux"],
    row_container="args",
    output_names="max_slope",
)
```

--------------------------------

### Load Data from Parquet Files

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/data_loading_notebook.ipynb

Ingest data from Parquet files into a NestedFrame using the `read_parquet` method. Temporary files are used for demonstration.

```python
# Note that we use the `tempfile` module to create and then clean up a temporary directory.
# You can of course remove this and use your own directory and real files on your system.
with tempfile.TemporaryDirectory() as temp_path:
    # Generates parquet files with random data within our temporary directory
    generate_parquet_file(10, {"nested1": 100, "nested2": 10}, os.path.join(temp_path, "test.parquet"))

    # Read the parquet file to a NestedFrame
    nf = read_parquet(os.path.join(temp_path, "test.parquet"))
```

--------------------------------

### Class Documentation

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/_templates/autosummary/class.rst

This section details the documentation structure for classes within the nested-pandas library, including their constructors, methods, and attributes.

```APIDOC
## Class Name

.. autoclass:: {{ objname }}

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::
      {% for item in methods %}
         ~{{ name }}.{{ item }}
      {%- endfor %}

   .. rubric:: Attributes

   .. autosummary::
      {% for item in attributes %}
         ~{{ name }}.{{ item }}
      {%- endfor %}
```

--------------------------------

### Pandas Workflow for Flux Amplitude Calculation

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/performance.ipynb

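This snippet shows a typical workflow using native pandas to read, filter, and process photometric data, including calculating flux amplitudes. The first cell imports the libraries used by both this workflow and the nested-pandas version: nested_pandas, pandas, light_curve, numpy, and `count_nested`.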

```python
import nested_pandas as npd
import pandas as pd
import light_curve as licu
import numpy as np

from nested_pandas.utils import count_nested
```

```python
%%timeit

# Read data
object_df = pd.read_parquet("objects.parquet")
source_df = pd.read_parquet("ztf_sources.parquet")

# Filter on object
filtered_object = object_df.query("ra > 10.0")
# sync object to source --removes any index values of source not found in object
filtered_source = filtered_object[[]].join(source_df, how="left")

# Count number of observations per photometric band and add it to the object table
band_counts = (
    source_df.groupby(level=0)
    .apply(lambda x: x[["band"]].value_counts().reset_index())
    .pivot_table(values="count", index="index", columns="band", aggfunc="sum")
)
filtered_object = filtered_object.join(band_counts[["g", "r"]])

# Filter on our nobs
filtered_object = filtered_object.query("g > 520")
filtered_source = filtered_object[[]].join(source_df, how="left")

# Calculate Amplitude
amplitude = licu.Amplitude()
filtered_source.groupby(level=0).apply(lambda x: amplitude(np.array(x.mjd), np.array(x.flux)))
```
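The per-band counting step above (groupby, apply, then pivot_table) can also be written with `pd.crosstab` in plain pandas. A minimal sketch on a hypothetical five-row source table, assuming the object id is stored in the index as in `source_df` above:

```python
import pandas as pd

# Hypothetical source table: index = object id, one row per observation
source_df = pd.DataFrame(
    {"band": ["g", "r", "g", "g", "r"], "flux": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=[0, 0, 0, 1, 1],
)

# Count observations per object (rows) and band (columns) in one call
band_counts = pd.crosstab(
    source_df.index.to_numpy(),
    source_df["band"].to_numpy(),
    rownames=["object"],
    colnames=["band"],
)
print(band_counts)
```

The result is indexed by object id, so it can be joined onto the object table the same way as `band_counts` above.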

--------------------------------

### Benchmarking Three Mixed-Type Arguments with map_rows

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/njit_map_rows.ipynb

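Benchmark a three-argument function that mixes a base column (`a`) with two nested sub-columns. Because `njit=True` is omitted, the `@njit`-decorated `sum_max3_njit` is called through the default Python pathway and compared against the pure-Python `sum_max3_py`. Observe how the speedup shrinks as the nested column width increases.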

```python
def run_sum_max3_py(nf):
    nf.map_rows(
        sum_max3_py, columns=["a", "nested.t", "nested.flux"], row_container="args", output_names="max_3"
    )


# leave out `njit=True` to take python pathway
def run_sum_max3_njit(nf):
    nf.map_rows(
        sum_max3_njit,
        columns=["a", "nested.t", "nested.flux"],
        row_container="args",
        output_names="max_3",
    )


run_sum_max3_njit(nf.copy())  # run once for jit compilation before benchmark

plot_bench(run_sum_max3_py, run_sum_max3_njit, title="njit custom function over python - sum_max3")
```

--------------------------------

### Plotting Asymptotic Behavior for Weighted Mean Slope

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/njit_map_rows.ipynb

Analyzes the asymptotic behavior of the weighted_mean_slope function by plotting performance against increasing nested column width. This helps identify the crossover point where Python may outperform njit.

```python
n_base_shrink = [7500, 10_000]
n_nested_list_asy = [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000]

plot_bench(
    run_weighted_mean_slope_py,
    run_weighted_mean_slope_njit,
    title="njit over python execution - weighted_mean_slope (asymptotic behavior)",
    n_base_list=n_base_shrink,
    n_nested_list=n_nested_list_asy,
)
```

--------------------------------

### NestedFrame.to_parquet / read_parquet

Source: https://context7.com/lincc-frameworks/nested-pandas/llms.txt

Write a NestedFrame to a Parquet file and read it back. Supports partial column loading and remote paths.

````APIDOC
## NestedFrame.to_parquet / read_parquet: Parquet I/O

Write a `NestedFrame` to a Parquet file (nested columns are stored as struct-of-lists) and read it back. Supports partial column loading via dot notation and remote paths (S3, HTTP).

### Request Example
```python
import nested_pandas as npd
from nested_pandas.datasets import generate_data

nf = generate_data(100, 50, seed=0)

# Write
nf.to_parquet("data.parquet")

# Read full file
nf2 = npd.read_parquet("data.parquet")

# Partial load: only "a" base column and the "flux" sub-column of "nested"
nf3 = npd.read_parquet("data.parquet", columns=["a", "nested.flux"])

# Read from S3 (requires s3fs)
# nf_s3 = npd.read_parquet("s3://my-bucket/data.parquet")
```

### Response Example
```python
# After reading full file:
# assert list(nf2.nested_columns) == ["nested"]

# After partial load:
# print(nf3.columns.tolist())         # Expected: ['a', 'nested']
# print(nf3["nested"].nest.columns)   # Expected: ['flux']
```
````

--------------------------------

### Create a flat Pandas DataFrame

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/data_loading_notebook.ipynb

Define a sample flat DataFrame with repeating and varying columns, suitable for conversion to a nested structure.

```python
flat_df = pd.DataFrame(
    data={
        "a": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
        "b": [2, 2, 2, 4, 4, 4, 6, 6, 6, 6],
        "c": [0, 2, 4, 1, 4, 3, 1, 4, 1, 1],
        "d": [5, 4, 7, 5, 3, 1, 9, 3, 4, 1],
    },
    index=[0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
)
flat_df
```
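This table has the shape that suits nesting: `a` and `b` are constant within each index label, while `c` and `d` vary. A plain-pandas sketch of that split into one row per label plus per-label lists (an illustration only, not how nested-pandas stores its data):

```python
import pandas as pd

flat_df = pd.DataFrame(
    data={
        "a": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
        "b": [2, 2, 2, 4, 4, 4, 6, 6, 6, 6],
        "c": [0, 2, 4, 1, 4, 3, 1, 4, 1, 1],
        "d": [5, 4, 7, 5, 3, 1, 9, 3, 4, 1],
    },
    index=[0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
)

# a and b are constant within each index label, so keep one value per label
base = flat_df[["a", "b"]].groupby(level=0).first()
# c and d vary, so gather them into one list per label
lists = flat_df[["c", "d"]].groupby(level=0).agg(list)
print(lists.loc[2, "c"])  # [1, 4, 1, 1]
```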

--------------------------------

### Create Base NestedFrame

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/data_loading_notebook.ipynb

Initialize a NestedFrame from a dictionary that defines the top-level objects, one row per object, holding each object's constant values as base columns.

```python
nf = NestedFrame(
    data={
        "a": [1, 2, 3],
        "b": [2, 4, 6],
    },
    index=[0, 1, 2],
)
nf
```

--------------------------------

### Display NestedFrame with Nested Columns

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/data_loading_notebook.ipynb

View a NestedFrame that contains nested columns, demonstrating the structure after data loading.

```python
nf  # nf contains nested columns
```

--------------------------------

### Build NestedFrame from Arrays

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/nested_spectra.ipynb

Constructs a NestedFrame from the flat arrays of spectral data (wavelength, flux, error). The `index` array records which spectrum each row came from, so rows that share an index value belong to the same object.

```python
flat_spec = npd.NestedFrame(dict(wave=wave, flux=flux, err=err), index=index.astype(np.int64))
```
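Choose the integer dtype of the object index with the number of objects in mind: `np.int8` can only hold values from -128 to 127, and casting larger integers to it wraps around silently instead of raising. A quick NumPy check:

```python
import numpy as np

print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127

# Object ids past 127 wrap around when cast to int8
ids = np.array([126, 127, 128, 129], dtype=np.int64)
print(ids.astype(np.int8).tolist())   # [126, 127, -128, -127]
print(ids.astype(np.int64).tolist())  # [126, 127, 128, 129]
```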

--------------------------------

### Python and njit implementations for max_slope

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/njit_map_rows.ipynb

Provides both a standard Python and an `@njit` decorated version of the `max_slope` function. Use the `@njit` version for performance-critical computations.

```python
def max_slope_py(t, flux):
    slope = np.diff(flux) / np.diff(t)
    return np.max(slope)


@njit
def max_slope_njit(t, flux):
    slope = np.diff(flux) / np.diff(t)
    return np.max(slope)
```
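Both versions divide by `np.diff(t)`, so two samples with the same timestamp produce an infinite slope (or NaN if the flux also repeats). If your data can contain repeated timestamps, one option is to skip zero-width intervals; `max_slope_guarded` below is a hypothetical sketch, not part of the notebook.

```python
import numpy as np


def max_slope_guarded(t, flux):
    """Hypothetical variant of max_slope_py that skips repeated timestamps."""
    dt = np.diff(t)
    dflux = np.diff(flux)
    valid = dt != 0  # intervals with a nonzero time step
    if not valid.any():
        return np.nan  # no usable interval
    return np.max(dflux[valid] / dt[valid])


# Slopes over the valid intervals are 2.0 and 0.5; the zero-width interval is ignored
print(max_slope_guarded(np.array([0.0, 1.0, 1.0, 3.0]), np.array([0.0, 2.0, 5.0, 6.0])))  # 2.0
```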

--------------------------------

### Generate Toy NestedFrame Dataset

Source: https://context7.com/lincc-frameworks/nested-pandas/llms.txt

Use `generate_data` to quickly create synthetic NestedFrame datasets for testing. It supports single or multiple nested columns with configurable sizes.

```python
from nested_pandas.datasets import generate_data

# Single nested column: 5 base rows, 10 nested rows each
nf = generate_data(5, 10, seed=1)
print(nf)
#           a         b                                             nested
# 0  0.417022  0.184677  [{t: 8.38389, flux: 31.551563, band: 'r'}; …] (10 rows)
# ...
```

```python
# Multiple nested columns with different sizes
nf2 = generate_data(5, {"lc": 10, "spectra": 3}, seed=42)
print(nf2.nested_columns)   # ['lc', 'spectra']
print(nf2["lc"].nest.columns)  # ['t', 'flux', 'band']
```

--------------------------------

### Import necessary libraries

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/low_level.ipynb

Imports essential libraries for data manipulation and nested data structures.

```python
import numpy as np
import pandas as pd
import pyarrow as pa

from nested_pandas import NestedDtype
from nested_pandas.datasets import generate_data
from nested_pandas.series.packer import pack
```

--------------------------------

### Mapping Rows with Custom Function and Arguments Input

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/gettingstarted/quickstart.ipynb

Apply a custom function to rows using `map_rows`, passing specified columns as arguments. The `row_container='args'` option unpacks the data into separate arguments for the function.

```python
def show_inputs(*args):
    return args

nf_inputs = nf.map_rows(show_inputs, columns=["ra", "lightcurve.time"], row_container="args")
nf_inputs
```

--------------------------------

### GroupBy Min/Max/Mean Aggregation Failure

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/groupby_doc.ipynb

Illustrates that `min()`, `max()`, and `mean()` aggregations fail when applied to nested columns. This is due to the unhashable nature of the nested data structures.

```python
# min/max/mean fail on nested columns
try:
    grouped_min = nf.groupby("c").min()
    print(grouped_min)
except TypeError as e:
    print(f"Cannot compute min on nested columns: {e}")
```
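One workaround, sketched in plain pandas with a hypothetical list-valued `flux` column standing in for a nested column: flatten the nested values first (here with `explode`), then aggregate the flat values per group.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "c": [0, 0, 1],
        "flux": [[1.0, 5.0], [3.0], [2.0, 4.0]],  # stand-in for a nested column
    }
)

# One row per flux value, keeping the group key c alongside it
flat = df.explode("flux").astype({"flux": float})
per_group = flat.groupby("c")["flux"].agg(["min", "max", "mean"])
print(per_group)
```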

--------------------------------

### Infer Nesting with Prefix

Source: https://context7.com/lincc-frameworks/nested-pandas/llms.txt

Use `infer_nesting=True` to turn output dictionary keys that share a dotted prefix (here `out.`) into a nested column named after the prefix (`out`). This is useful for returning nested results from a row-wise function.

```python
def offsets(row):
    return {"out.dt": row["nested.t"] - row["a"],
            "out.df": row["nested.flux"] - row["b"]}

result3 = nf.map_rows(offsets, columns=["a", "b", "nested.t", "nested.flux"],
                      infer_nesting=True)
print(result3.nested_columns)  # ['out']

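# append_columns: merge results back into the original frame
# (`summarize` is defined elsewhere; per the output below, it returns mean_flux, n_obs, and max_minus_a)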
nf_aug = nf.map_rows(summarize, columns=["a", "nested.flux"], append_columns=True)
print(nf_aug.base_columns)  # ['a', 'b', 'mean_flux', 'n_obs', 'max_minus_a']
```
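The prefix rule can be pictured as grouping `prefix.field` keys under their prefix. The pure-Python sketch below is a simplified, hypothetical illustration of that grouping, not the library's implementation; for example, it does not check whether a prefix clashes with an existing column.

```python
def split_dotted(result: dict) -> dict:
    """Hypothetical illustration: group 'prefix.field' keys under their prefix."""
    grouped = {}
    for key, value in result.items():
        if "." in key:
            prefix, field = key.split(".", 1)
            grouped.setdefault(prefix, {})[field] = value
        else:
            grouped[key] = value  # keys without a dot stay top-level
    return grouped


print(split_dotted({"out.dt": [0.5, 1.5], "out.df": [2.0, 3.0], "n_obs": 2}))
# {'out': {'dt': [0.5, 1.5], 'df': [2.0, 3.0]}, 'n_obs': 2}
```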

--------------------------------

### Generate Sample NestedPandas Data

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/groupby_doc.ipynb

Generates a sample NestedFrame for demonstration purposes, then adds a non-nested column `c` to group by.

```python
from nested_pandas.datasets import generate_data

nf = generate_data(5, 10, seed=1)
nf["c"] = [0, 0, 1, 1, 1]
nf
```

--------------------------------

### Benchmarking Exploded Base Column Optimization

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/njit_map_rows.ipynb

Benchmark the performance of the njit-optimized function against its Python equivalent. This helps determine the breaking point where the overhead of exploding the column outweighs njit's benefits.

```python
# explode graph
def run_scaled_max_flux_njit_explode(nf):
    nf["nested.a"] = nf["a"]
    nf.map_rows(
        scaled_max_flux_njit_explode,
        columns=["nested.flux", "nested.a"],
        row_container="args",
        output_names="scaled_max_flux",
        njit=True,
    )


run_scaled_max_flux_njit_explode(nf.copy())  # run once for jit compilation before benchmark

plot_bench(
    run_scaled_max_flux_py,
    run_scaled_max_flux_njit_explode,
    title="njit over python execution - scaled_max_flux (explode)",
)
```
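Assigning `nf["nested.a"] = nf["a"]` copies each base value once per nested row, so the extra memory and time grow with the total number of nested rows rather than the number of base rows. Conceptually, that broadcast matches `np.repeat` over the per-row lengths; a minimal NumPy sketch with made-up values:

```python
import numpy as np

a = np.array([1.5, 2.0, 4.0])  # one base value per row (hypothetical)
lengths = np.array([3, 1, 2])  # nested rows per base row (hypothetical)

# Each base value is copied once per nested row of its base row
nested_a = np.repeat(a, lengths)
print(nested_a.tolist())                                # [1.5, 1.5, 1.5, 2.0, 4.0, 4.0]
print(nested_a.size, "values for", a.size, "base rows")  # 6 values for 3 base rows
```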

--------------------------------

### Numba njit with Explicit Loop for Weighted Mean Slope

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/njit_map_rows.ipynb

Use this when optimizing functions that involve iterative calculations. Numba's njit can provide substantial speedups by compiling explicit Python loops into machine code, outperforming numpy-based approaches in certain scenarios.

```python
import numpy as np
from numba import njit

@njit
def weighted_mean_slope_njit_loop(t, flux):
    n = t.size - 1
    num = 0.0
    weight = 0.0

    # manually looping to get the difference and summing up
    for i in range(n):
        dt = t[i + 1] - t[i]
        df = flux[i + 1] - flux[i]

        slope = df / dt
        num += slope * dt
        weight += dt

    return num / weight

def run_weighted_mean_slope_njit_loop(nf):
    nf.map_rows(
        weighted_mean_slope_njit_loop,
        columns=["nested.t", "nested.flux"],
        row_container="args",
        output_names="weighted_mean_slope",
        njit=True,
    )

# Assuming nf and plot_bench are defined elsewhere
# run_weighted_mean_slope_njit_loop(nf.copy())  # run njit once for compilation before benchmark
# plot_bench(
#     run_weighted_mean_slope_py, # Assuming this is defined elsewhere
#     run_weighted_mean_slope_njit_loop,
#     title="njit over python execution - weighted_mean_slope (loop)",
# )
```
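The loop accumulates `slope * dt` and `dt`. Since `slope * dt` equals `df` for every interval, the numerator telescopes to `flux[-1] - flux[0]` and the denominator to `t[-1] - t[0]`, so the result is the end-to-end slope. A plain Python/NumPy check of that identity, with the `@njit` decorator dropped so it runs without numba:

```python
import numpy as np


def weighted_mean_slope_loop(t, flux):
    # Same arithmetic as weighted_mean_slope_njit_loop, without the @njit decorator
    num = 0.0
    weight = 0.0
    for i in range(t.size - 1):
        dt = t[i + 1] - t[i]
        df = flux[i + 1] - flux[i]
        num += (df / dt) * dt
        weight += dt
    return num / weight


rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 100, 50))
flux = rng.normal(20.0, 2.0, 50)

loop_value = weighted_mean_slope_loop(t, flux)
endpoint_slope = (flux[-1] - flux[0]) / (t[-1] - t[0])
print(np.isclose(loop_value, endpoint_slope))  # True
```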

--------------------------------

### Multi-select Sub-columns in NestedSeries

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/gettingstarted/quickstart.ipynb

Select multiple sub-columns from a NestedSeries simultaneously.

```python
# Multi-selecting sub-columns
nf["lightcurve"][["time", "brightness"]]
```

--------------------------------

### Create Nested Series using pack()

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/low_level.ipynb

Use `pack()` to create a nested Series from a collection of elements like DataFrames, dictionaries, or None. Elements must share the same columns but can have different lengths.

```python
series_from_pack = pack(
    [
        pd.DataFrame({"t": [1, 2, 3], "flux": [0.1, 0.2, 0.3]}),
        {"t": [4, 5], "flux": [0.4, 0.5]},
        None,
    ],
    name="from_pack",  # optional
    index=[3, 4, 5],  # optional
)
series_from_pack
```

--------------------------------

### Mapping Rows with Custom Function and Dictionary Input

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/gettingstarted/quickstart.ipynb

Apply a custom function to rows using `map_rows`, passing specified columns as a dictionary. The `row_container='dict'` option structures the input for the function.

```python
def show_inputs(row):
    return row

# row_container="dict" passes the data as a dictionary to the function
nf_inputs = nf.map_rows(show_inputs, columns=["ra", "lightcurve.time"], row_container="dict")
nf_inputs

# map_rows assembles the returned dicts into a frame; inside show_inputs, the two columns
# are accessed as row["ra"] and row["lightcurve.time"]
```
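To make the two calling conventions concrete, here is a pure-Python sketch of what a function receives for a single row under `row_container="args"` and `row_container="dict"`. `call_with_row` and the row values are hypothetical; the real `map_rows` also iterates over rows and assembles the outputs into a frame.

```python
def call_with_row(func, row: dict, columns: list, row_container: str):
    """Hypothetical: pass one row's selected values to func in the given style."""
    if row_container == "args":
        # positional arguments, in the order given by `columns`
        return func(*(row[col] for col in columns))
    if row_container == "dict":
        # a single dict keyed by the column labels
        return func({col: row[col] for col in columns})
    raise ValueError(f"unknown row_container: {row_container!r}")


row = {"ra": 10.0, "lightcurve.time": [1.1, 1.2], "dec": -5.0}
print(call_with_row(lambda *args: args, row, ["ra", "lightcurve.time"], "args"))
# (10.0, [1.1, 1.2])
print(call_with_row(lambda r: r["ra"], row, ["ra", "lightcurve.time"], "dict"))
# 10.0
```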

--------------------------------

### Dispatcher for packing DataFrames or sequences

Source: https://context7.com/lincc-frameworks/nested-pandas/llms.txt

The `pack` function acts as a dispatcher, automatically selecting `pack_flat` for DataFrames and `pack_seq` for sequences.

```python
import pandas as pd
import numpy as np
from nested_pandas.series.packer import pack, pack_flat, pack_lists, pack_seq

# pack: dispatcher — delegates to pack_flat (DataFrame) or pack_seq (sequence)
flat = pd.DataFrame({
    "t":    [1.0, 1.5, 2.0, 2.5],
    "flux": [10., 11., 20., 21.],
}, index=[0, 0, 1, 1])
ns4 = pack(flat, name="lc")
```
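The routing rule described above, and what grouping a flat DataFrame by its index means, can be illustrated with plain pandas. `describe_pack_route` is a hypothetical function that only mirrors the stated rule (it does not call nested-pandas, and the real `pack` may accept other input types), and `groupby(level=0).agg(list)` is a conceptual stand-in for what `pack_flat` produces, not its implementation.

```python
import pandas as pd


def describe_pack_route(obj):
    """Hypothetical illustration of the dispatch rule: DataFrame -> pack_flat, else pack_seq."""
    if isinstance(obj, pd.DataFrame):
        return "pack_flat"  # flat table: rows sharing an index label form one nested element
    return "pack_seq"  # other sequences: one nested element per item


flat = pd.DataFrame(
    {"t": [1.0, 1.5, 2.0, 2.5], "flux": [10.0, 11.0, 20.0, 21.0]},
    index=[0, 0, 1, 1],
)

# Conceptually, rows with the same index label become one nested element
lists = flat.groupby(level=0).agg(list)
print(describe_pack_route(flat))
print(lists)
```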

--------------------------------

### Plotting a Nested Spectrum

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/nested_spectra.ipynb

Plot a single spectrum's flux against wavelength with matplotlib. `spec_ndf.iloc[1].coadd_spectrum` selects the nested spectrum of the second object, which is returned as a DataFrame with `wave` and `flux` columns.

```python
import matplotlib.pyplot as plt

# Plot a spectrum
spec = spec_ndf.iloc[1].coadd_spectrum

plt.plot(spec["wave"], spec["flux"])
plt.xlabel("Wavelength (Å)")
plt.ylabel(r"Flux ($ergs/s/cm^2/Å$)")
```

--------------------------------

### Build NestedFrame Manually

Source: https://context7.com/lincc-frameworks/nested-pandas/llms.txt

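Construct a `NestedFrame` by joining a flat DataFrame to a base `NestedFrame` as a nested column. Rows of `measurements` are assigned to base rows by index label (`[0, 0, 1, 1, 2]` here), giving 2, 2, and 1 nested rows. The example also shows the core `NestedFrame` properties.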

```python
import nested_pandas as npd
import pandas as pd
import numpy as np

# Build a NestedFrame manually
base = npd.NestedFrame({"obj_id": [1, 2, 3], "ra": [10.0, 20.0, 30.0]}, index=[0, 1, 2])
measurements = pd.DataFrame({
    "time": [1.1, 1.2, 2.1, 2.2, 3.1],
    "flux": [10.0, 11.0, 20.0, 21.0, 30.0],
    "band": ["g", "r", "g", "r", "g"],
}, index=[0, 0, 1, 1, 2])

nf = base.join_nested(measurements, "lc")
print(nf)
#    obj_id    ra                                 lc
# 0       1  10.0  [{time: 1.1, flux: 10.0, band: 'g'}; …] (2 rows)
# 1       2  20.0  [{time: 2.1, flux: 20.0, band: 'g'}; …] (2 rows)
# 2       3  30.0         [{time: 3.1, flux: 30.0, band: 'g'}]

print(nf.nested_columns)   # ['lc']
print(nf.base_columns)     # ['obj_id', 'ra']
print(nf.all_columns)      # {'base': [...], 'lc': ['time', 'flux', 'band']}
```
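The index matching can be checked with plain pandas before joining: counting flat rows per label gives the nested lengths shown in the output above. The extra base label `3` is an assumption added only to show that a label with no measurements counts as zero in this sketch; how nested-pandas represents such a row is not covered here.

```python
import pandas as pd

# Base index with one extra label (3) that has no measurements (illustrative assumption)
base_index = pd.Index([0, 1, 2, 3])
measurements = pd.DataFrame(
    {"time": [1.1, 1.2, 2.1, 2.2, 3.1], "flux": [10.0, 11.0, 20.0, 21.0, 30.0]},
    index=[0, 0, 1, 1, 2],
)

# Inner rows per base row, matched by index label; unmatched base labels get 0
counts = measurements.groupby(level=0).size().reindex(base_index, fill_value=0)
print(counts.tolist())
```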

--------------------------------

### Basic GroupBy Operation

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/groupby_doc.ipynb

Grouping by a base (non-nested) column returns a standard pandas GroupBy object. Grouping by a nested column is not supported, because nested values are unhashable.

```python
nf.groupby("c")  # returns a Pandas GroupBy object
```
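The hashability constraint can be seen with plain pandas, using list values as a stand-in for nested cells (an assumption for illustration; nested-pandas cells are not Python lists). Grouping by the scalar column `c` works, while a list value cannot be hashed, so it cannot serve as a group key.

```python
import pandas as pd

df = pd.DataFrame({
    "c": ["x", "y", "x"],
    "lists": [[1, 2], [3], [4, 5, 6]],  # list values stand in for nested cells
})

# Grouping by a hashable base column works as usual
grouped = df.groupby("c")
sizes = grouped.size().to_dict()

# A list value is unhashable, which is what rules it out as a group key
try:
    hash(df.loc[0, "lists"])
    lists_hashable = True
except TypeError:
    lists_hashable = False

print(type(grouped).__name__, sizes, lists_hashable)
```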

--------------------------------

### Exploding Base Column for njit Optimization

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/njit_map_rows.ipynb

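Use this pattern when a function compiled with `njit` needs both a base-column value and nested arrays as arguments. Copying ("exploding") the base column into a nested sub-column passes it as an array like the other nested inputs, which satisfies njit's static typing requirements. Because the base value is then stored once per nested row, this can be inefficient when objects have many nested rows.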

```python
import numpy as np
from numba import njit


@njit
def scaled_max_flux_njit_explode(flux, a):
    """
    flux: 1D array (nested slice)
    a: 1D array holding the base value of `a`, repeated once per nested row
    """
    return a[0] * np.max(flux)


nf["nested.a"] = nf["a"]  # explode base column into nested column
nf.map_rows(
    scaled_max_flux_njit_explode,
    columns=["nested.flux", "nested.a"],  # input both arguments as nested column
    row_container="args",
    output_names="scaled_max_flux",
    njit=True,
)
```
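To make the cost concrete, the NumPy sketch below mimics the explode step on toy data: each base value of `a` is repeated once per nested row with `np.repeat`, then sliced back per object so the function receives `a` as an array and reads `a[0]`. The toy values, the slicing helpers, and the plain-Python `scaled_max_flux` are illustrative assumptions; numba is not used, and this is not how `map_rows` slices rows internally.

```python
import numpy as np

# Toy data: one base value of `a` per object, variable-length nested `flux` per object
a_base = np.array([2.0, 0.5])
flux_per_row = [np.array([1.0, 3.0, 2.0]), np.array([4.0, 8.0])]

# "Explode": repeat each base value once per nested row (memory grows with nested length)
lengths = np.array([len(f) for f in flux_per_row])
a_exploded_flat = np.repeat(a_base, lengths)

# Slice the flat array back into per-object chunks, as a row-wise function would see it
offsets = np.concatenate([[0], np.cumsum(lengths)])
a_per_row = [a_exploded_flat[s:e] for s, e in zip(offsets[:-1], offsets[1:])]


def scaled_max_flux(flux, a):
    # Every element of `a` holds the same base value, so a[0] recovers it
    return a[0] * np.max(flux)


results = [scaled_max_flux(f, a) for f, a in zip(flux_per_row, a_per_row)]
print(a_exploded_flat, results)
```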

--------------------------------

### Create Nested Series from Existing Nested Series

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/low_level.ipynb

Flatten an existing nested Series with `.nest.to_flat()` and pass the result to `pack()` to rebuild a nested Series; `equals()` confirms that the round trip reproduces the original. This is useful for making copies, or for re-packing after operating on the flat representation.

```python
new_series = pack(nested_series.nest.to_flat())
new_series.equals(nested_series)
```

--------------------------------

### NestSeriesAccessor (.nest)

Source: https://context7.com/lincc-frameworks/nested-pandas/llms.txt

Low-level accessor for NestedSeries, providing methods to convert between representations, mutate sub-columns, and query the flat array.

```APIDOC
## NestSeriesAccessor (`.nest`) — Low-level accessor for NestedSeries

Registered as `.nest` on any `pd.Series` with `NestedDtype`. Provides methods to convert between representations, mutate sub-columns, and query the flat array.

### Methods

- **`to_flat()`**: Convert to flat DataFrame (repeated index).
- **`to_lists()`**: Convert to list-arrays DataFrame (one array per row).
- **`columns`**: Get list of sub-column names.
- **`len()`**: Get number of nested rows per outer row.
- **`flat_length`**: Get total number of nested elements.
- **`flat_index`**: Get flat index (repeated outer index).
- **`set_flat_column(name, values)`**: Add a new sub-column with a scalar or flat array.
- **`set_list_column(name, values)`**: Add a sub-column using list-arrays.
- **`set_filled_column(name, values)`**: Repeat a base-column value into nested rows.
- **`drop(column_name)`**: Drop sub-columns.
- **`query(expression)`**: Query the flat arrays.

### Request Example
```python
from nested_pandas.datasets import generate_data

nf = generate_data(5, 5, seed=1)
ns = nf["nested"]   # NestedSeries

# Convert to flat DataFrame
flat_df = ns.nest.to_flat()

# Add a new sub-column
ns2 = ns.nest.set_flat_column("weight", 1.0)
ns3 = ns.nest.set_flat_column("norm_flux", flat_df["flux"].values / flat_df["flux"].max())

# Drop sub-columns
ns_no_band = ns.nest.drop("band")

# Query the flat arrays
ns_bright = ns.nest.query("flux > 50")
```

### Response Example
```python
# print(flat_df.head())
# print(ns.nest.columns)   # Expected: ['t', 'flux', 'band']
# print(ns.nest.len())     # Expected: [5, 5, 5, 5, 5]
# print(ns.nest.flat_length)  # Expected: 25
# print(ns.nest.flat_index)   # Expected: Index([0, 0, 0, 0, 0, 1, 1, ...])
# print(ns_bright)
```
```

--------------------------------

### Query SDSS for Spectra

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/nested_spectra.ipynb

Queries the Sloan Digital Sky Survey (SDSS) for astronomical objects within a specified region and retrieves their spectra. Requires astroquery and astropy libraries.

```python
from astroquery.sdss import SDSS
from astropy import coordinates as coords
import astropy.units as u
import nested_pandas as npd

# Query SDSS for a set of objects with spectra
pos = coords.SkyCoord("0h8m10.63s +14d50m23.3s", frame="icrs")
xid = SDSS.query_region(pos, radius=3 * u.arcmin, spectro=True)
xid_ndf = npd.NestedFrame(xid.to_pandas())
xid_ndf
```

--------------------------------

### Perform Type Checking with Mypy

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/CLAUDE.md

Run mypy over `src/` and `tests/` to catch type errors statically.

```bash
mypy src/ tests/
```

--------------------------------

### Retrieve Spectra Data from SDSS

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/nested_spectra.ipynb

Retrieves the actual spectral data for objects previously identified by an SDSS query. Clears the cache before fetching to ensure fresh data.

```python
# Query SDSS for the corresponding spectra
SDSS.clear_cache()
sp = SDSS.get_spectra(matches=xid)
sp
```

--------------------------------

### Join Nested Spectra to Existing Data

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/pre_executed/nested_spectra.ipynb

Nests the flat spectra DataFrame `flat_spec` (not defined in this snippet) into `xid_ndf` as a nested column named `coadd_spectrum`, matching rows by index, and then re-indexes the result by `objid`.

```python
spec_ndf = xid_ndf.join_nested(flat_spec, "coadd_spectrum").set_index("objid")
```

--------------------------------

### Create Nested Series from PyArrow Struct Array

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/low_level.ipynb

Construct a nested Series efficiently from a PyArrow struct array. This is the most performant method for creating nested Series when data is already in PyArrow format.

```python
import numpy as np
import pandas as pd
import pyarrow as pa
from nested_pandas import NestedDtype

pa_struct_array = pa.StructArray.from_arrays(
    [
        [
            np.arange(10),
            np.arange(5),
        ],  # "a" field
        [
            np.linspace(0, 1, 10),
            np.linspace(0, 1, 5),
        ],  # "b" field
    ],
    names=["a", "b"],
)
series_from_pa_struct = pd.Series(
    pa_struct_array,
    dtype=NestedDtype(pa_struct_array.type),
    name="from_pa_struct_array",
    index=["I", "II"],
)
```

--------------------------------

### Create a Flat Pandas DataFrame

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/gettingstarted/quickstart.ipynb

Create a standard pandas DataFrame to represent nested time-series data. This serves as the input for converting into a NestedFrame.

```python
import pandas as pd

# Represent nested time series information as a classic pandas dataframe.
my_data_frame = pd.DataFrame(
    {
        "id": [0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
        "ra": [10.0, 10.0, 10.0, 15.0, 15.0, 15.0, 12.1, 12.1, 12.1, 12.1],
        "dec": [0.0, 0.0, 0.0, -1.0, -1.0, -1.0, 0.5, 0.5, 0.5, 0.5],
        "time": [60676.0, 60677.0, 60678.0, 60675.0, 60676.5, 60677.0, 60676.6, 60676.7, 60676.8, 60676.9],
        "brightness": [100.0, 101.0, 99.8, 5.0, 5.01, 4.98, 20.1, 20.5, 20.3, 20.2],
        "band": ["g", "r", "g", "r", "g", "r", "g", "g", "r", "r"],
    }
)
my_data_frame
```
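Before converting such a table, it can help to confirm with plain pandas that the per-object (base) columns really hold one value per `id`, and to see how many nested rows each object will get. The sketch reuses the `id`, `ra`, and `dec` values above (other columns omitted for brevity); treating `ra`/`dec` as base columns is an assumption about how the table will be split.

```python
import pandas as pd

# Same id/ra/dec values as my_data_frame above; time/brightness/band omitted
df = pd.DataFrame({
    "id": [0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
    "ra": [10.0, 10.0, 10.0, 15.0, 15.0, 15.0, 12.1, 12.1, 12.1, 12.1],
    "dec": [0.0, 0.0, 0.0, -1.0, -1.0, -1.0, 0.5, 0.5, 0.5, 0.5],
})

# Base columns should take exactly one value per object before being collapsed
base_is_constant = bool((df.groupby("id")[["ra", "dec"]].nunique() == 1).all().all())

# Rows per object become the nested lengths
rows_per_object = df.groupby("id").size().tolist()
print(base_is_constant, rows_per_object)
```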

--------------------------------

### Convert Nested Series to ArrowDtype, Flat DataFrame, and List DataFrame

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/low_level.ipynb

Demonstrates conversions of a nested Series to a PyArrow dtype Series, a flat DataFrame, and a DataFrame with list-arrays. These conversions are useful for interoperability and analysis.

```python
import pandas as pd

# Convert to pd.ArrowDtype Series of struct-arrays
arrow_dtyped_series = pd.Series(nested_series, dtype=nested_series.dtype.to_pandas_arrow_dtype())
# Convert to a flat dataframe
flat_df = nested_series.nest.to_flat()
# Convert to a list-array dataframe
list_df = nested_series.nest.to_lists()
```

--------------------------------

### Generate nested data and access Series

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/low_level.ipynb

Generates sample nested data and extracts its `nested` column, a Series with `NestedDtype`. Indexing that Series with a label (here `2`) returns the nested data for that row as a DataFrame.

```python
from nested_pandas.datasets import generate_data

nested_df = generate_data(4, 3, seed=42)
nested_series = nested_df["nested"]
nested_series[2]
```

--------------------------------

### Pack list columns into a nested column in-place

Source: https://context7.com/lincc-frameworks/nested-pandas/llms.txt

Use the `nest_lists` instance method to pack existing list-valued columns into a new nested column with the given name. It returns a new `NestedFrame` in which the listed columns are replaced by the nested column (`obs` here).

```python
import nested_pandas as npd

nf = npd.NestedFrame({
    "id":   [1, 2, 3],
    "flux": [[10., 11.], [20., 21.], [30., 31.]],
    "band": [["g", "r"], ["g", "g"], ["r", "r"]],
})

result = nf.nest_lists(columns=["flux", "band"], name="obs")
print(result)
#    id                                  obs
# 0   1  [{flux: 10.0, band: 'g'}; …] (2 rows)
# 1   2  [{flux: 20.0, band: 'g'}; …] (2 rows)
# 2   3  [{flux: 30.0, band: 'r'}; …] (2 rows)
```
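For intuition, plain pandas can expand the same list columns into a flat layout with `DataFrame.explode`, which repeats the outer index once per inner element. This is only a comparison point, not what `nest_lists` does internally; exploding several columns at once needs pandas 1.3 or later, and it requires the lists in each row to have matching lengths, which the sketch checks first.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "flux": [[10.0, 11.0], [20.0, 21.0], [30.0, 31.0]],
    "band": [["g", "r"], ["g", "g"], ["r", "r"]],
})

# Lists packed together must have equal lengths within each row
lengths_match = bool((df["flux"].map(len) == df["band"].map(len)).all())

# One row per inner element; the outer index and the base column `id` repeat
flat = df.explode(["flux", "band"])
print(flat)
```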

--------------------------------

### Create a new nested Series with a subset of columns

Source: https://github.com/lincc-frameworks/nested-pandas/blob/main/docs/tutorials/low_level.ipynb

Select only the `t` and `flux` sub-columns with `.nest[[...]]`, producing a new nested Series; the snippet displays the dtype of that subset.

```python
nested_series.nest[["t", "flux"]].dtype
```

--------------------------------

### generate_data

Source: https://context7.com/lincc-frameworks/nested-pandas/llms.txt

Generates a synthetic NestedFrame dataset for testing and exploration. It can create single or multiple nested columns with specified sizes.

```APIDOC
## `generate_data` — Generate a toy NestedFrame dataset

Quickly creates a synthetic `NestedFrame` with base columns `a` and `b` and one or more nested columns (`t`, `flux`, `band`) for testing and exploration. Accepts a dictionary for `n_layer` to create multiple nested columns in one call.

```python
from nested_pandas.datasets import generate_data

# Single nested column: 5 base rows, 10 nested rows each
nf = generate_data(5, 10, seed=1)
print(nf)
#           a         b                                             nested
# 0  0.417022  0.184677  [{t: 8.38389, flux: 31.551563, band: 'r'}; …] (10 rows)
# ...

# Multiple nested columns with different sizes
nf2 = generate_data(5, {"lc": 10, "spectra": 3}, seed=42)
print(nf2.nested_columns)   # ['lc', 'spectra']
print(nf2["lc"].nest.columns)  # ['t', 'flux', 'band']
```
```

--------------------------------

### NestedSeries Accessor for Low-Level Operations

Source: https://context7.com/lincc-frameworks/nested-pandas/llms.txt

The `.nest` accessor provides low-level methods for NestedSeries, enabling conversion to flat or list-array DataFrames, manipulation of sub-columns, and querying of nested data.

```python
from nested_pandas.datasets import generate_data

nf = generate_data(5, 5, seed=1)
ns = nf["nested"]   # NestedSeries

# Convert to flat DataFrame (repeated index)
flat_df = ns.nest.to_flat()
print(flat_df.head())

# Convert to list-arrays DataFrame (one array per row)
lists_df = ns.nest.to_lists()
print(lists_df.head())

# List of sub-column names
print(ns.nest.columns)   # ['t', 'flux', 'band']

# Number of nested rows per outer row
print(ns.nest.len())     # [5, 5, 5, 5, 5]

# Total number of nested elements
print(ns.nest.flat_length)  # 25

# Flat index (repeated outer index)
print(ns.nest.flat_index)   # Index([0, 0, 0, 0, 0, 1, 1, ...])

# Add a new sub-column with a scalar (broadcast) or flat array
ns2 = ns.nest.set_flat_column("weight", 1.0)
ns3 = ns.nest.set_flat_column("norm_flux", flat_df["flux"].values / flat_df["flux"].max())

# Add a sub-column using list-arrays (one list per outer row)
ns4 = ns.nest.set_list_column("flag", [[True]*5]*5)

# Repeat a base-column value into nested rows
ns5 = ns.nest.set_filled_column("object_id", [10, 20, 30, 40, 50])

# Drop sub-columns
ns_no_band = ns.nest.drop("band")

# Query the flat arrays
ns_bright = ns.nest.query("flux > 50")
print(ns_bright)
```
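The flat-array ideas behind `flat_length`, per-row lengths, and `query` can be sketched with plain pandas on a toy flat table whose index repeats the outer label. This only illustrates filtering at the flat level and regrouping by label; the toy data are assumptions, and whether `ns.nest.query` keeps or drops outer rows left with no matches is not shown here.

```python
import pandas as pd

# Toy flat table: outer labels repeat once per nested row
flat = pd.DataFrame(
    {"t": [1.0, 2.0, 3.0, 1.5, 2.5], "flux": [40.0, 60.0, 70.0, 20.0, 30.0]},
    index=pd.Index([0, 0, 0, 1, 1]),
)

flat_length = len(flat)  # total number of nested elements
per_row = flat.groupby(level=0).size().tolist()  # nested rows per outer row

# Filter at the flat level, then count surviving inner rows per outer label
kept = flat.query("flux > 50")
kept_per_row = kept.groupby(level=0).size().reindex([0, 1], fill_value=0).tolist()
print(flat_length, per_row, kept_per_row)
```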