### Patsy: Complex Formula with Categorical Coding and Numerical Transformation (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Provides an example of a more complex formula combining categorical coding (Polynomial) with a numerical transformation (center()) within an interaction term.

```python
dmatrix("C(a, Poly):center(x1)", data)
```

--------------------------------

### Displaying Sample Data (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows the structure and content of the 'data' object generated by demo_data. Patsy can work with various data structures as long as they can be indexed like a Python dictionary.

```python
data
```

--------------------------------

### Patsy: Using Orthogonal Polynomial Categorical Coding (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Demonstrates how to apply an alternative categorical coding scheme, specifically Orthogonal Polynomial coding, using the C() notation.

```python
dmatrix("C(c, Poly)", {"c": ["c1", "c1", "c2", "c2", "c3", "c3"]})
```
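To make the coding scheme more concrete, here is a plain-Python sketch of how orthogonal polynomial contrasts can be derived for equally spaced levels. It is not Patsy's implementation: `orthogonal_poly_contrasts` is a hypothetical helper, Gram-Schmidt is used as a stand-in for whatever decomposition the library applies, and sign conventions may differ.

```python
import math


def orthogonal_poly_contrasts(n_levels):
    """Gram-Schmidt on 1, s, s^2, ... for equally spaced scores s.

    Hypothetical helper for illustration. Returns the non-constant
    columns (linear, quadratic, ...), each with unit length.
    """
    # Centered, equally spaced scores, e.g. [-1, 0, 1] for three levels
    scores = [i - (n_levels - 1) / 2 for i in range(n_levels)]
    basis = []
    for degree in range(n_levels):
        column = [s ** degree for s in scores]
        # Remove the components along all lower-degree columns
        for previous in basis:
            weight = sum(c * p for c, p in zip(column, previous))
            column = [c - weight * p for c, p in zip(column, previous)]
        norm = math.sqrt(sum(c * c for c in column))
        basis.append([c / norm for c in column])
    return basis[1:]  # drop the constant column


linear, quadratic = orthogonal_poly_contrasts(3)
print([round(v, 3) for v in linear])     # [-0.707, 0.0, 0.707]
print([round(v, 3) for v in quadratic])  # [0.408, -0.816, 0.408]
```

The two columns are orthogonal to each other and to the constant column, which is what lets the linear and quadratic trends be estimated separately.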

--------------------------------

### Patsy: ANOVA-Style Formula with Main Effects and Interaction (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows how Patsy handles formulas including main effects and interactions, building up the model incrementally to represent the additional contribution of each term. Includes the shorthand notation.

```python
dmatrix("a + b + a:b", data)
```

```python
dmatrix("a*b", data)
```

--------------------------------

### Patsy: Interaction Between Categorical and Numerical Variables (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows how to model an interaction between a categorical variable and a numerical variable, effectively fitting different slopes for the numerical variable within each category.

```python
dmatrix("a:x1", data)
```
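The "different slopes" idea can be sketched by hand: each level of the categorical variable gets its own column, holding the numerical value where the row belongs to that level and zero elsewhere. This is a minimal illustration with made-up data and a hypothetical helper name, and it leaves out the intercept column that Patsy adds by default.

```python
def per_level_slope_columns(categories, x):
    """Build one column per category level containing x for matching rows.

    Hypothetical helper for illustration; not part of Patsy.
    """
    levels = sorted(set(categories))
    rows = [
        [xi if category == level else 0.0 for level in levels]
        for category, xi in zip(categories, x)
    ]
    return levels, rows


categories = ["a1", "a1", "a2", "a2"]  # made-up example data
x1 = [1.0, 2.0, 3.0, 4.0]
levels, rows = per_level_slope_columns(categories, x1)
print(levels)  # ['a1', 'a2']
print(rows)    # [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0], [0.0, 4.0]]
```

A regression on these columns estimates one coefficient per level, which is a separate slope for `x1` within each category.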

--------------------------------

### Combining Q() with Other Transformations (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Demonstrates that the Q() function can be combined with other transformations, such as a custom function like 'double', to handle special characters while applying transformations within the formula.

```python
dmatrix("double(Q('weird column!')) + x1", weird_data)
```

--------------------------------

### Handling Special Characters in Variable Names with Q() (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Explains how to use the built-in Q() function to 'quote' variable names that contain special characters like spaces or punctuation, allowing them to be correctly interpreted within the formula string. Includes setup code for data with a 'weird column!'.

```python
weird_data = demo_data("weird column!", "x1")
dmatrix("Q('weird column!') + x1", weird_data)
```

--------------------------------

### Importing Libraries and Generating Data (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Imports necessary libraries including numpy and key functions from patsy (dmatrices, dmatrix, demo_data). It then generates sample data using demo_data with specified variable names, including a mix of categorical and numerical types.

```python
import numpy as np
from patsy import dmatrices, dmatrix, demo_data
data = demo_data("a", "b", "x1", "x2", "y", "z column")
```

--------------------------------

### Applying Built-in Transformation Functions (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Illustrates the use of Patsy's built-in transformation functions, such as 'center' and 'standardize', directly within the formula string. These functions are automatically available without explicit import.

```python
dmatrix("center(x1) + standardize(x2)", data)
```
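As a rough guide to what these two transformations compute, the following plain-Python sketch reproduces them by hand. The function names are hypothetical, and the use of the population standard deviation (the equivalent of `ddof=0`) is an assumption about the default rather than something stated in the snippet above.

```python
from statistics import pstdev


def center_by_hand(values):
    """Subtract the mean so the result averages to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def standardize_by_hand(values):
    """Center, then divide by the standard deviation.

    Assumption: population standard deviation (ddof=0).
    """
    mean = sum(values) / len(values)
    spread = pstdev(values)
    return [(v - mean) / spread for v in values]


x = [1.0, 2.0, 3.0, 4.0]  # made-up example data
print(center_by_hand(x))  # [-1.5, -0.5, 0.5, 1.5]
print(standardize_by_hand(x))
```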

--------------------------------

### Patsy: Dummy Coding Interaction Between Categorical Variables (No Intercept) (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Demonstrates how to dummy code the interaction term between two categorical variables, creating columns for each combination of values, when the intercept is excluded.

```python
dmatrix("0 + a:b", data)
```
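A hand-rolled version of this coding may help when reading the resulting matrix: every combination of levels gets an indicator column, and each row has exactly one 1. The sketch below uses made-up data and a hypothetical helper; the column order Patsy produces may differ from the order shown here.

```python
from itertools import product


def interaction_dummies(a_values, b_values):
    """Return (names, rows) with one 0/1 column per (a, b) level combination.

    Hypothetical helper for illustration; not part of Patsy.
    """
    a_levels = sorted(set(a_values))
    b_levels = sorted(set(b_values))
    combos = list(product(a_levels, b_levels))
    names = ["a[%s]:b[%s]" % combo for combo in combos]
    rows = [
        [1 if (av, bv) == combo else 0 for combo in combos]
        for av, bv in zip(a_values, b_values)
    ]
    return names, rows


a = ["a1", "a1", "a2", "a2"]  # made-up example data
b = ["b1", "b2", "b1", "b2"]
names, rows = interaction_dummies(a, b)
print(names)  # ['a[a1]:b[b1]', 'a[a1]:b[b2]', 'a[a2]:b[b1]', 'a[a2]:b[b2]']
print(rows)   # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```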

--------------------------------

### Installing Patsy from Source (Shell)

Source: https://github.com/pydata/patsy/blob/master/doc/overview.rst

Provides the command to install Patsy from a downloaded source distribution by running the standard Python setup.py script.

```Shell
python setup.py install
```

--------------------------------

### Patsy: Treatment-Coded Slopes for Categorical-Numerical Interaction (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Illustrates how Patsy's redundancy avoidance applies to categorical-numerical interactions when the numerical variable is also included as a main effect, resulting in treatment-coded slopes.

```python
# compare to the difference between "0 + a" and "1 + a"
dmatrix("x1 + a:x1", data)
```

--------------------------------

### Patsy: Handling Different Data Types in Formulas (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Demonstrates how Patsy's formula evaluation respects the underlying Python data types (e.g., NumPy arrays vs. lists) when performing operations like addition within the formula.

```python
dmatrix("I(x1 + x2)", {"x1": np.array([1, 2, 3]), "x2": np.array([4, 5, 6])})
```

```python
dmatrix("I(x1 + x2)", {"x1": [1, 2, 3], "x2": [4, 5, 6]})
```
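The difference between the two calls comes from ordinary Python semantics, which the following Patsy-free sketch isolates: `+` on lists concatenates, whereas `+` on arrays adds element by element. The list comprehension stands in for the array behaviour so that the example needs no third-party packages.

```python
x1 = [1, 2, 3]
x2 = [4, 5, 6]

# "+" on plain lists concatenates, giving six values
concatenated = x1 + x2

# Elementwise addition, which is what "+" does for NumPy arrays
elementwise = [left + right for left, right in zip(x1, x2)]

print(concatenated)  # [1, 2, 3, 4, 5, 6]
print(elementwise)   # [5, 7, 9]
```

This is why the list-based formula yields six rows while the array-based one yields three.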

--------------------------------

### Applying External Transformation Function (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Demonstrates applying a transformation function (np.log) from the current Python environment within the Patsy formula. Any function or variable accessible where dmatrix is called can be used inside the formula string.

```python
dmatrix("x1 + np.log(x2 + 10)", data)
```

--------------------------------

### Patsy: Treatment Coding Single Categorical Variable (With Intercept) (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Illustrates Patsy's default behavior for coding a single categorical variable when the intercept is present. It uses reduced-rank treatment coding to avoid redundancy.

```python
dmatrix("a", data)
```
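To see what reduced-rank treatment coding means in terms of columns, here is a minimal plain-Python sketch. `treatment_code` is a hypothetical helper, the data are made up, levels are assumed to be ordered by sorting, and the intercept column itself is left out.

```python
def treatment_code(name, values, reduced_rank=True):
    """Return (column_names, rows) of 0/1 indicators for a categorical variable.

    Hypothetical helper for illustration; not part of Patsy.
    """
    levels = sorted(set(values))
    if reduced_rank:
        # With an intercept in the model, the first level is the reference
        # and gets no column of its own.
        kept = levels[1:]
        names = ["%s[T.%s]" % (name, level) for level in kept]
    else:
        # Without an intercept, every level gets its own column.
        kept = levels
        names = ["%s[%s]" % (name, level) for level in kept]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return names, rows


values = ["a1", "a1", "a2", "a2"]  # made-up example data
names, rows = treatment_code("a", values)
print(names)  # ['a[T.a2]']
print(rows)   # [[0], [0], [1], [1]]

full_names, full_rows = treatment_code("a", values, reduced_rank=False)
print(full_names)  # ['a[a1]', 'a[a2]']
print(full_rows)   # [[1, 0], [1, 0], [0, 1], [0, 1]]
```

In the reduced-rank case the intercept absorbs the reference level, so adding a column for it would make the matrix linearly dependent.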

--------------------------------

### Performing Linear Regression with Design Matrices (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Demonstrates how to use the DesignMatrix objects returned by dmatrices directly with a regression function like numpy.linalg.lstsq. It calculates the regression coefficients (betas) and prints them along with their corresponding column names from the predictor matrix's design_info.

```python
outcome, predictors = dmatrices("y ~ x1 + x2", data)
# rcond=None selects NumPy's current default cutoff and avoids a FutureWarning
betas = np.linalg.lstsq(predictors, outcome, rcond=None)[0].ravel()
for name, beta in zip(predictors.design_info.column_names, betas):
    print("%s: %s" % (name, beta))
```

--------------------------------

### Patsy: Dummy Coding Single Categorical Variable (No Intercept) (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows how Patsy dummy codes a single categorical variable when the intercept is explicitly excluded from the formula, resulting in a column for each category.

```python
dmatrix("0 + a", data)
```

--------------------------------

### Installing Patsy via Pip (Shell)

Source: https://github.com/pydata/patsy/blob/master/doc/overview.rst

Shows the standard command for installing or upgrading the Patsy package using the pip package manager, which fetches the package from the Python Package Index (PyPI).

```Shell
pip install --upgrade patsy
```

--------------------------------

### Defining and Using Custom Transformation Function (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows how to define a custom Python function ('double') and use it as a transformation within a Patsy formula. The function is applied to the 'x1' variable during design matrix creation.

```python
def double(x):
    return 2 * x

dmatrix("x1 + double(x1)", data)
```

--------------------------------

### Performing Arithmetic Operations with I() (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows how to use the built-in I() function to 'protect' arithmetic expressions within a formula string. Inside I(), operators like '+' are evaluated as ordinary Python arithmetic rather than interpreted as Patsy formula operators that combine terms.

```python
dmatrix("I(x1 + x2)", data)
```

--------------------------------

### Installing Patsy via pip

Source: https://github.com/pydata/patsy/blob/master/README.md

This command installs the patsy library using the pip package installer for Python. It fetches the latest version from PyPI and makes it available in your Python environment.

```Shell
pip install patsy
```

--------------------------------

### Generating Design Matrices with dmatrices (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Uses the dmatrices function to create design matrices suitable for regression. It takes a formula string and the data object, returning a tuple containing two DesignMatrix objects: one for the left-hand side (outcome) and one for the right-hand side (predictors), automatically including an intercept.

```python
dmatrices("y ~ x1 + x2", data)
```

--------------------------------

### Using Environment Variable in Formula (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows that variables defined in the Python environment can be referenced directly within a Patsy formula string. A new variable 'new_x2' is created and then used in the formula passed to dmatrix.

```python
new_x2 = data["x2"] * 100
dmatrix("new_x2")
```

--------------------------------

### Generating Predictor Design Matrix with dmatrix (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows how to use the dmatrix function to generate a single design matrix, typically for predictors. By omitting the 'y ~' part of the formula, only the right-hand side matrix is created, which also includes an intercept by default.

```python
dmatrix("x1 + x2", data)
```

--------------------------------

### Define Categorical Factors - Python (Setup)

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Sets up two lists, 'a' and 'b', representing categorical data, and imports the 'dmatrix' function from Patsy. This code is used for internal setup and is suppressed in the original documentation output.

```Python
a = ["a1", "a1", "a2", "a2"]
b = ["b1", "b2", "b1", "b2"]
from patsy import dmatrix
```

--------------------------------

### Defining and Using a Custom Stateful Transform - Patsy - Python

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Provides a simple example of defining a custom stateful transform class (`MyExampleCenter`) that follows the required protocol (`__init__`, `memorize_chunk`, `memorize_finish`, `transform`). It then shows how to wrap the class using `patsy.stateful_transform` to create a callable object and demonstrates its basic usage.

```Python
import numpy as np
import patsy

class MyExampleCenter(object):
    def __init__(self):
        self._total = 0
        self._count = 0
        self._mean = None

    def memorize_chunk(self, x):
        self._total += np.sum(x)
        self._count += len(x)

    def memorize_finish(self):
        self._mean = self._total * 1. / self._count

    def transform(self, x):
        return x - self._mean

my_example_center = patsy.stateful_transform(MyExampleCenter)
print(my_example_center(np.array([1, 2, 3])))
```
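The protocol can also be exercised by hand, which makes the two-pass structure visible. The sketch below is a plain-Python stand-in (no NumPy, no Patsy) with a hypothetical class name; it only illustrates the order in which the protocol methods are meant to be called, not how the library drives them internally.

```python
class RunningCenter:
    """Plain-Python stand-in for the stateful transform protocol."""

    def __init__(self):
        self._total = 0.0
        self._count = 0
        self._mean = None

    def memorize_chunk(self, x):
        # Accumulate running statistics for one chunk of data
        self._total += sum(x)
        self._count += len(x)

    def memorize_finish(self):
        # Called once after all chunks have been seen
        self._mean = self._total / self._count

    def transform(self, x):
        # Uses the memorized mean, never the mean of x itself
        return [value - self._mean for value in x]


chunks = [[1, 2], [3, 4]]  # made-up data, split into two chunks
state = RunningCenter()
for chunk in chunks:  # first pass: memorize
    state.memorize_chunk(chunk)
state.memorize_finish()  # the memorized mean is 2.5

centered = [state.transform(chunk) for chunk in chunks]  # second pass
print(centered)                       # [[-1.5, -0.5], [0.5, 1.5]]
print(state.transform([5, 6, 7, 8]))  # [2.5, 3.5, 4.5, 5.5]
```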

--------------------------------

### Example Formula with Stateful Transforms - Patsy - Python

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Illustrates a complex Patsy formula incorporating multiple stateful transforms like `center` and `spline` within an `I()` identity function, demonstrating how Patsy handles dependencies correctly.

```Python
y ~ I(spline(center(x1)) + center(x2))
```

--------------------------------

### Generating Design Matrix Without Intercept (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Illustrates how to explicitly remove the automatic intercept term from the design matrix. This is achieved by adding '- 1' to the formula string passed to the dmatrix function.

```python
dmatrix("x1 + x2 - 1", data)
```

--------------------------------

### Using a Patsy-Enabled Model Class in Python

Source: https://github.com/pydata/patsy/blob/master/doc/library-developers.rst

Demonstrates how to use a model class (`LM`) that has been integrated with Patsy. It shows instantiating the model using both traditional matrix input and the Patsy formula interface (`formula`, `data`). Examples include making predictions on new data and calculating log-likelihood, highlighting Patsy's automatic handling of categorical variables and transformations.

```python
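import numpy as np
from patsy import demo_data
# LM is assumed to be defined already: it is the Patsy-enabled model class
# this section describes.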
data = demo_data("x", "y", "a")

# Old and boring approach (but it still works):
X = np.column_stack(([1] * len(data["y"]), data["x"]))
LM((data["y"], X))

# Fancy new way:
m = LM("y ~ x", data)
m
m.predict({"x": [10, 20, 30]})
m.loglik(data)
m.loglik({"x": [10, 20, 30], "y": [-1, -2, -3]})

# Your users get support for categorical predictors for free:
LM("y ~ a", data)

# And variable transformations too:
LM("y ~ np.log(x ** 2)", data)
```

--------------------------------

### Importing Libraries for Patsy Formulas - Python

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

Imports the numpy library as 'np' and all components from the patsy library. This setup is required to use patsy's formula parsing and model description functionalities.

```python
import numpy as np
from patsy import *
```

--------------------------------

### Define Categorical Factors - R

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Sets up two categorical factors, 'a' and 'b', each with two levels, for use in subsequent R formula examples demonstrating differences in categorical coding.

```R
a <- factor(c("a1", "a1", "a2", "a2"))
b <- factor(c("b1", "b2", "b1", "b2"))
```

--------------------------------

### Using Custom Coding Class (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/categorical-coding.rst

Demonstrates how to use the custom MyTreat coding class with patsy.dmatrix, showing examples for full-rank (without intercept), reduced-rank (with intercept), and passing arguments to the custom class constructor.

```python
# Full rank:
dmatrix("0 + C(a, MyTreat)", data)
# Reduced rank:
dmatrix("C(a, MyTreat)", data)
# With argument:
dmatrix("C(a, MyTreat(2))", data)
```

--------------------------------

### Patsy Formula Parsing Example

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

This example illustrates how Patsy's parser distinguishes its operators from Python code within factors. Operators inside parentheses within a factor are treated as part of the factor's Python code, not Patsy operators.

```Patsy Formula
f(x1 + x2) + x3
```

--------------------------------

### Accessing Term objects

Source: https://github.com/pydata/patsy/blob/master/doc/API-reference.rst

Retrieves the list of Term objects that represent the formula terms. It also shows how to get the name string for each Term object.

```python
di.terms
[term.name() for term in di.terms]
```

--------------------------------

### Loading Custom Coding Class (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/categorical-coding.rst

Loads and executes the Python code defining the custom MyTreat class from an external file, making it available for use in subsequent examples. This snippet is suppressed in the original output.

```python
with open("_examples/example_treatment.py") as f:
    exec(f.read())
```

--------------------------------

### Using the : Operator in Patsy Formula

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

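The `:` operator computes the interaction between two sets of terms: each term on the left is combined with each term on the right by taking the union of their factors. For example, `(a + b):(c + d)` expands to `a:c + a:d + b:c + b:d`. Because the result is a union, repeated factors collapse, so `a:a` is just `a` and `(a:b):(a:c)` is `a:b:c`. The blocks below show each expression followed by its expansion.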

```Patsy Formula
(a + b):(c + d)
```

```Patsy Formula
a:c + a:d + b:c + b:d
```

```Patsy Formula
a:a
```

```Patsy Formula
a
```

```Patsy Formula
(a:b):(a:c)
```

```Patsy Formula
a:b:c
```
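The union rule can be checked with a few lines of plain Python. This is a minimal sketch rather than Patsy's parser: terms are modelled as frozensets of factor names, `interact` and `show` are hypothetical helpers, and factors are printed in alphabetical order, which need not match the order Patsy reports.

```python
def interact(left_terms, right_terms):
    """Sketch of `:` applied to two sums of terms."""
    result = []
    for left in left_terms:
        for right in right_terms:
            combined = left | right  # union of factors, so a:a collapses to a
            if combined not in result:  # drop duplicates, keep first-seen order
                result.append(combined)
    return result


def show(terms):
    """Render a list of terms in formula notation."""
    return " + ".join(":".join(sorted(term)) for term in terms)


a, b, c, d = (frozenset([name]) for name in "abcd")

print(show(interact([a, b], [c, d])))    # a:c + a:d + b:c + b:d
print(show(interact([a], [a])))          # a
print(show(interact([a | b], [a | c])))  # a:b:c
```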

--------------------------------

### Defining a Custom Patsy Factor Class in Python

Source: https://github.com/pydata/patsy/blob/master/doc/expert-model-specification.rst

This Python class `MyAlternativeFactor` demonstrates a basic custom Patsy factor. It implements the required `name` method and stores internal state (`alternative_formula`, `side`) passed during initialization. This example is part of a larger pattern for integrating alternative formula systems with Patsy.

```python
class MyAlternativeFactor(object):
    # A factor object that simply returns the design
    def __init__(self, alternative_formula, side):
        self.alternative_formula = alternative_formula
        self.side = side

    def name(self):
        return self.side
```

--------------------------------

### Plotting Patsy B-spline Basis and Resulting Curve

Source: https://github.com/pydata/patsy/blob/master/doc/spline-regression.rst

Generates a B-spline basis using `patsy.dmatrix` with the `bs` transform. It then plots the individual basis functions (scaled by example coefficients) and the sum of these functions, representing the resulting spline curve. Requires `matplotlib`.

```python
import matplotlib.pyplot as plt
import numpy as np
from patsy import dmatrix

plt.title("B-spline basis example (degree=3)");
x = np.linspace(0., 1., 100)
y = dmatrix("bs(x, df=6, degree=3, include_intercept=True) - 1", {"x": x})
# Define some coefficients
b = np.array([1.3, 0.6, 0.9, 0.4, 1.6, 0.7])
# Plot B-spline basis functions (colored curves) each multiplied by its coeff
plt.plot(x, y*b);
# Plot the spline itself (sum of the basis functions, thick black curve)
plt.plot(x, np.dot(y, b), color='k', linewidth=3);
```

--------------------------------

### Define eval for Custom Factor (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/expert-model-specification.rst

Implements the `eval` method for a custom Patsy factor. This method is responsible for evaluating the factor given the current state and data. In this example, it delegates the matrix generation to an alternative formula object.

```python
def eval(self, state, data):
    return self.alternative_formula.get_matrix(self.side, data)
```

--------------------------------

### Plotting Patsy Cyclic Cubic Spline Basis and Resulting Curve

Source: https://github.com/pydata/patsy/blob/master/doc/spline-regression.rst

Generates a cyclic cubic regression spline basis using `patsy.dmatrix` with the `cc` transform. It plots the individual basis functions (scaled by example coefficients) and their sum, demonstrating the resulting cyclic spline curve. Requires `matplotlib`.

```python
import matplotlib.pyplot as plt
import numpy as np
from patsy import dmatrix

plt.title("Cyclic cubic regression spline basis example");
x = np.linspace(0., 1., 100)
# Define some coefficients (example values)
b = np.array([1.3, 0.6, 0.9, 0.4, 1.6, 0.7])
y = dmatrix("cc(x, df=6) - 1", {"x": x})
# Plot cyclic cubic regression spline basis functions (colored curves) each multiplied by its coeff
plt.plot(x, y*b);
# Plot the spline itself (sum of the basis functions, thick black curve)
plt.plot(x, np.dot(y, b), color='k', linewidth=3);
```

--------------------------------

### Function to Add Predictors to a Patsy ModelDesc

Source: https://github.com/pydata/patsy/blob/master/doc/expert-model-specification.rst

Provides an example function `add_predictors` that takes a base formula string and a list of additional variable names. It parses the formula, creates `LookupFactor` and `Term` objects for the extra variables, and appends them to the right-hand side of the parsed `ModelDesc`, allowing programmatic model expansion.

```python
from patsy import ModelDesc, Term, LookupFactor

def add_predictors(formula_string, extra_vars):
    """
    Parses a formula string and adds additional predictors as Terms.

    Args:
        formula_string: The base formula string (e.g., "y ~ a + b").
        extra_vars: A list of variable names to add as individual predictors.

    Returns:
        A ModelDesc object including terms from the formula and extra variables.
    """
    # Parse the initial formula into a ModelDesc
    desc = ModelDesc.from_formula(formula_string)

    # Create Terms for the extra variables and add them to the RHS.
    # LookupFactor looks the name up in the data as-is, without evaluating it as code.
    for var_name in extra_vars:
        lookup_factor = LookupFactor(var_name)
        new_term = Term([lookup_factor])
        desc.rhs_termlist.append(new_term)

    return desc
```

--------------------------------

### Plotting Patsy Natural Cubic Spline Basis and Resulting Curve

Source: https://github.com/pydata/patsy/blob/master/doc/spline-regression.rst

Generates a natural cubic regression spline basis using `patsy.dmatrix` with the `cr` transform. It plots the individual basis functions (scaled by example coefficients) and their sum, illustrating the resulting spline curve. Requires `matplotlib`.

```python
import matplotlib.pyplot as plt
import numpy as np
from patsy import dmatrix

plt.title("Natural cubic regression spline basis example");
x = np.linspace(0., 1., 100)
# Define some coefficients (example values)
b = np.array([1.3, 0.6, 0.9, 0.4, 1.6, 0.7])
y = dmatrix("cr(x, df=6) - 1", {"x": x})
# Plot natural cubic regression spline basis functions (colored curves) each multiplied by its coeff
plt.plot(x, y*b);
# Plot the spline itself (sum of the basis functions, thick black curve)
plt.plot(x, np.dot(y, b), color='k', linewidth=3);
```

--------------------------------

### Create Source and Wheel Distributions (Shell)

Source: https://github.com/pydata/patsy/blob/master/release-checklist.txt

Executes setup.py twice to create both a source distribution (sdist) in zip format and a binary wheel distribution (bdist_wheel). This is typically done from a clean clone to ensure correct packaging before uploading to PyPI.

```shell
python setup.py sdist --formats=zip && python setup.py bdist_wheel
```

--------------------------------

### Create Source Distribution (Shell)

Source: https://github.com/pydata/patsy/blob/master/release-checklist.txt

Runs the setup.py script to create a source distribution (sdist) of the project, specifically formatted as a zip file. This is a standard step in packaging Python libraries for distribution.

```shell
python setup.py sdist --formats=zip
```

--------------------------------

### Build Documentation (Shell)

Source: https://github.com/pydata/patsy/blob/master/release-checklist.txt

Changes the current directory to 'docs' and executes the 'make clean html' command to clean previous builds and generate the HTML documentation. This step is used to verify that the documentation builds without errors or warnings.

```shell
cd docs; make clean html
```

--------------------------------

### Upload Distributions to PyPI (Shell)

Source: https://github.com/pydata/patsy/blob/master/release-checklist.txt

Uses the 'twine' command-line utility to upload the generated source (.zip) and wheel (.whl) distribution files found in the 'dist' directory to the Python Package Index (PyPI), making the new version available.

```shell
twine upload dist/*.zip dist/*.whl
```

--------------------------------

### Loading Libraries and Sample Data (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/categorical-coding.rst

Imports key Patsy components and generates a sample dataset with a categorical variable 'a' having 3 levels for demonstration.

```python
from patsy import dmatrix, demo_data, ContrastMatrix, Poly
data = demo_data("a", nlevels=3)
data
```

--------------------------------

### Creating a DesignInfo object

Source: https://github.com/pydata/patsy/blob/master/doc/API-reference.rst

Demonstrates how to generate a design matrix using dmatrix with a formula and data, and then access the associated DesignInfo object which contains metadata about the resulting matrix.

```python
mat = dmatrix("a + x", demo_data("a", "x", nlevels=3))
di = mat.design_info
```

--------------------------------

### Cloning Patsy Git Repository (Shell)

Source: https://github.com/pydata/patsy/blob/master/doc/overview.rst

Provides the command-line instruction to clone the latest development version of the Patsy source code repository from GitHub using the Git version control system.

```Shell
git clone https://github.com/pydata/patsy.git
```

--------------------------------

### Tag and Push Release (Shell)

Source: https://github.com/pydata/patsy/blob/master/release-checklist.txt

Uses Git to create a version tag (replacing <version> with the actual version number) and then pushes all local tags to the remote Git repository. This marks the release point in the version history.

```shell
git tag v<version> && git push --tags
```

--------------------------------

### Demonstrating Patsy Term Ordering (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

This snippet creates a sample dataset and uses Patsy's `dmatrix` to build a design matrix from a complex formula. It then accesses the `term_names` attribute to show the order in which Patsy sorts the terms according to its internal rules.

```python
data = demo_data("a", "b", "x1", "x2")
mat = dmatrix("x1:x2 + a:b + b + x1:a:b + a + x2:a:x1", data)
mat.design_info.term_names
```

--------------------------------

### Using Patsy's Stateful Center Transform (Patsy/Python)

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Demonstrates the correct behavior of Patsy's built-in `center` stateful transform. It correctly applies the transformation to new data using the mean calculated from the original data used to build the design matrix.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices, incr_dbuilder
data = {"x": [1, 2, 3, 4]}
new_data = {"x": [5, 6, 7, 8]}

# Build matrix on original data using stateful center
fixed_mat = dmatrix("center(x)", data)
# center(x) column of fixed_mat: [-1.5, -0.5, 0.5, 1.5] (centered on mean 2.5)

# Correct! Apply to new data using the state from fixed_mat
build_design_matrices([fixed_mat.design_info], new_data)[0]
# center(x) column: [3.5, 4.5, 5.5, 6.5] (correctly centered on original mean 2.5)
```

--------------------------------

### Programmatically Building ModelDesc with Factors and Terms in Patsy

Source: https://github.com/pydata/patsy/blob/master/doc/expert-model-specification.rst

Demonstrates the programmatic approach to defining a model structure using Patsy's internal API. It shows how to create `LookupFactor` and `EvalFactor` objects, combine them into `Term` objects, and assemble these into a `ModelDesc` for use with `dmatrix`.

```python
import numpy as np
from patsy import (ModelDesc, EvalEnvironment, Term, EvalFactor,
                      LookupFactor, demo_data, dmatrix)
data = demo_data("a", "x")

# LookupFactor takes a dictionary key:
a_lookup = LookupFactor("a")
# EvalFactor takes arbitrary Python code:
x_transform = EvalFactor("np.log(x ** 2)")
# First argument is empty list for dmatrix; we would need to put
# something there if we were calling dmatrices.
desc = ModelDesc([],
                    [Term([a_lookup]),
                     Term([x_transform]),
                     # An interaction:
                     Term([a_lookup, x_transform])])
# Create the matrix (or pass 'desc' to any statistical library
# function that uses patsy.dmatrix internally):
dmatrix(desc, data)
```

--------------------------------

### Demonstrating Naive Centering Problem (Patsy/Python)

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Shows how a naive centering function behaves incorrectly when applied to new data after being fitted on original data, using `patsy.dmatrix` to build the initial matrix and `patsy.build_design_matrices` to transform new data. The naive function centers the new data based on its *own* mean, not the original data's mean.

```python
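import numpy as np
from patsy import dmatrix, build_design_matrices, incr_dbuilder

def naive_center(x):
    # Stateless: always subtracts the mean of whatever data it is given
    x = np.asarray(x)
    return x - np.mean(x)

data = {"x": [1, 2, 3, 4]}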

# Build matrix on original data
mat = dmatrix("naive_center(x)", data)
# mat output: [ -1.5, -0.5,  0.5,  1.5] (centered on mean 2.5)

# Apply to new data
new_data = {"x": [5, 6, 7, 8]}
# Broken! Centers on the mean of new_data (6.5) instead of original mean (2.5)
build_design_matrices([mat.design_info], new_data)[0]
# Output: [ -1.5, -0.5,  0.5,  1.5]
```

--------------------------------

### Integrating Patsy High-Level Interface in Python

Source: https://github.com/pydata/patsy/blob/master/doc/library-developers.rst

Shows how to modify existing Python function signatures to accept Patsy formula strings and data dictionaries. It uses `patsy.dmatrices` or `patsy.dmatrix` to build design matrices from the formula and data, enabling users to specify models using the formula mini-language. Requires the `patsy` library.

```python
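import patsy

def mymodel2_patsy(formula_like, data={}, **kwargs):
    # Third argument is eval_env: 1 means "look up formula variables in the
    # caller's namespace", i.e. the code that called mymodel2_patsy.
    y, X = patsy.dmatrices(formula_like, data, 1)
    ...

def mymodel1_patsy(formula_like, data={}, **kwargs):
    X = patsy.dmatrix(formula_like, data, 1)
    ...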
```

--------------------------------

### Stateful Centering with Incremental Data (Patsy/Python)

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Shows how Patsy's stateful `center` transform correctly handles incremental data processing with `patsy.incr_dbuilder`. It makes an initial pass to calculate the necessary state (like the mean) across all chunks before building the matrix. It also shows applying the resulting transform to new data.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices, incr_dbuilder
data = {"x": [1, 2, 3, 4]}
new_data = {"x": [5, 6, 7, 8]}

# Process data in chunks
data_chunked = [{"x": data["x"][:2]},
                {"x": data["x"][2:]}]

# Build incrementally using stateful center
dinfo = incr_dbuilder("center(x)", lambda: iter(data_chunked))

# Correct! Matrix is built using the overall mean (2.5)
matrix_from_chunks = np.vstack([build_design_matrices([dinfo], chunk)[0]
               for chunk in data_chunked])
# center(x) column: [-1.5, -0.5, 0.5, 1.5] (each row also has an Intercept column)

# Correct! Apply to new data using the state from the incremental build
build_design_matrices([dinfo], new_data)[0]
# Output: [ 3.5, 4.5, 5.5, 6.5]
```

--------------------------------

### Listing Python Dependencies

Source: https://github.com/pydata/patsy/blob/master/doc/sphinxext/requirements.txt

This snippet lists the required Python packages for the project, including scientific computing libraries (numpy, scipy, pandas) and development/documentation tools (mistune, jsonschema, ipython). This format is typical for a requirements file.

```Python
numpy
scipy
pandas

mistune
jsonschema
ipython
```

--------------------------------

### Importing the patsy library

Source: https://github.com/pydata/patsy/blob/master/doc/API-reference.rst

Imports all public names from the patsy library into the current namespace, making functions and classes directly accessible.

```python
from patsy import *
```

--------------------------------

### Creating Design Matrix with Custom Names using DesignMatrix and DesignInfo in Patsy

Source: https://github.com/pydata/patsy/blob/master/doc/expert-model-specification.rst

Shows how to use `patsy.DesignMatrix` and `patsy.DesignInfo` to wrap an array-like object, allowing the specification of custom column names for the resulting design matrix while still using `dmatrix`.

```python
from patsy import dmatrix, DesignMatrix, DesignInfo
X = [[1, 10], [1, 20], [1, -2]]  # any array-like with two columns
design_info = DesignInfo(["Intercept!", "Not intercept!"])
X_dm = DesignMatrix(X, design_info)
dmatrix(X_dm)
```

--------------------------------

### Creating Design Matrix from List of Lists in Patsy

Source: https://github.com/pydata/patsy/blob/master/doc/expert-model-specification.rst

Demonstrates the simplest method to create a design matrix by directly passing a list of lists (or any array-like object) to the `patsy.dmatrix` function. This bypasses the formula parser entirely.

```python
from patsy import dmatrix
X = [[1, 10], [1, 20], [1, -2]]
dmatrix(X)
```

--------------------------------

### Python: Exploring Patsy Formula Parsing

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

Provides Python code using `patsy.ModelDesc.from_formula` to parse and inspect various Patsy formulas. This allows users to explore how different syntax elements like interactions, combinations, powers, and embedded Python code are interpreted.

```Python
from patsy import ModelDesc
ModelDesc.from_formula("y ~ x")
ModelDesc.from_formula("y ~ x + x + x")
ModelDesc.from_formula("y ~ -1 + x")
ModelDesc.from_formula("~ -1")
ModelDesc.from_formula("y ~ a:b")
ModelDesc.from_formula("y ~ a*b")
ModelDesc.from_formula("y ~ (a + b + c + d) ** 2")
ModelDesc.from_formula("y ~ (a + b)/(c + d)")
ModelDesc.from_formula("np.log(x1 + x2) "
                         "+ (x + {6: x3, 8 + 1: x4}[3 * i])")
```

--------------------------------

### Naive Centering with Incremental Data (Patsy/Python)

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Illustrates the issue with using a naive centering function when building a design matrix incrementally with `patsy.incr_dbuilder`. The naive function centers each chunk of data independently, leading to incorrect results for the dataset as a whole.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices, incr_dbuilder

def naive_center(x):
    # Stateless: subtracts the mean of whatever data it is handed
    x = np.asarray(x)
    return x - np.mean(x)

data = {"x": [1, 2, 3, 4]}

# Process data in chunks
data_chunked = [{"x": data["x"][:2]},
                {"x": data["x"][2:]}]

# Build incrementally using naive center
dinfo = incr_dbuilder("naive_center(x)", lambda: iter(data_chunked))

# Broken! Each chunk is centered on its own mean (1.5, then 3.5)
np.vstack([build_design_matrices([dinfo], chunk)[0]
           for chunk in data_chunked])
# naive_center(x) column: -0.5, 0.5, -0.5, 0.5
```
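
The arithmetic behind the failure can be checked without Patsy. The following is a small sketch in plain Python; the helper `naive_center` here is a list-based stand-in written for this illustration, not the function used in the snippet above.

```python
def naive_center(values):
    # Stateless: only sees the values passed in this call
    mean = sum(values) / len(values)
    return [v - mean for v in values]


full = [1, 2, 3, 4]
chunks = [full[:2], full[2:]]

# Centering chunk by chunk uses the means 1.5 and 3.5
per_chunk = [v for chunk in chunks for v in naive_center(chunk)]

# Centering the whole dataset uses the single mean 2.5
whole = naive_center(full)

print(per_chunk)  # [-0.5, 0.5, -0.5, 0.5]
print(whole)      # [-1.5, -0.5, 0.5, 1.5]
```

The two results differ because the per-chunk version never learns the overall mean, so the output depends on how the data happened to be split.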

--------------------------------

### Build Design Matrix for New Data with Tensor Product Basis (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/spline-regression.rst

Shows how to reuse the `design_info` of an existing design matrix to build a design matrix for new data points. The example defines data for three variables (`x1`, `x2`, `x3`), creates a tensor product basis with centering constraints using `dmatrix`, and then calls `build_design_matrices` to apply the same basis transformation to the new points. It requires `numpy` and `patsy`. The inputs are dictionaries holding the original data and the new data points; the output is the design matrix for the new data.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices

# Original data used to fit the tensor product basis
data = {"x1": np.linspace(0., 1., 100),
        "x2": np.linspace(0., 1., 100),
        "x3": np.linspace(0., 1., 100)}
design_matrix = dmatrix("te(cr(x1, df=3), cr(x2, df=3), cc(x3, df=3), constraints='center')",
                        data)

# New points, transformed with the basis stored in design_info
new_data = {"x1": [0.1, 0.2],
            "x2": [0.2, 0.3],
            "x3": [0.3, 0.4]}
new_design_matrix = build_design_matrices([design_matrix.design_info], new_data)[0]
new_design_matrix
np.asarray(new_design_matrix)
```

--------------------------------

### Creating Design Matrices with Patsy Formula (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/overview.rst

Demonstrates how to use the patsy.dmatrices function with a formula string to generate design matrices for a statistical model from a dataset. The formula specifies the dependent variable (y), independent variables (x, a, b), and an interaction term (a:b).

```Python
patsy.dmatrices("y ~ x + a + b + a:b", data)
```

--------------------------------

### Calculate Patsy dmatrix columns for 1 + a:b (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Demonstrates how Patsy correctly handles collinearity issues with interaction terms (a:b) and an intercept (1) by producing the expected number of columns in the design matrix, unlike R's model.matrix in certain cases.

```python
# Python:
a = ["a1", "a1", "a2", "a2"]
b = ["b1", "b2", "b1", "b2"]
mat = dmatrix("1 + a:b")
mat.shape[1]
```

--------------------------------

### Calculate Patsy dmatrix columns for 0 + a:x + a:b (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Illustrates how Patsy correctly determines the number of columns for a design matrix involving both numeric (a:x) and categorical (a:b) interaction terms without an intercept (0 +). Patsy distinguishes between numeric and categorical factors, leading to the correct full-rank encoding for the categorical term.

```python
# Python:
x = [1, 2, 3, 4]
mat = dmatrix("0 + a:x + a:b")
mat.shape[1]
```

--------------------------------

### Using Named Custom Contrast Matrix (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/categorical-coding.rst

Shows how to create a ContrastMatrix object with custom column names to improve the readability of the resulting design matrix columns.

```python
contrast_mat = ContrastMatrix(contrast, ["[pretty0]", "[pretty1]"])
dmatrix("C(a, contrast_mat)", data)
```

--------------------------------

### Importing builtins from patsy - Python

Source: https://github.com/pydata/patsy/blob/master/doc/builtins-reference.rst

This snippet shows how to import all members from the patsy.builtins module directly into the current namespace. These members are automatically available within patsy formula code.

```python
from patsy.builtins import *
```

--------------------------------

### Manually Constructing a Patsy ModelDesc - Python

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

Demonstrates how to programmatically build a Patsy ModelDesc object using its constituent parts: Terms and EvalFactors. This manual construction represents the structure of the formula 'y ~ a + a:b + np.log(x)', showing the left-hand side term ('y') and the right-hand side terms (intercept, 'a', 'a:b', 'np.log(x)').

```python
from patsy import ModelDesc, Term, EvalFactor
ModelDesc([Term([EvalFactor("y")])],
            [Term([]),
             Term([EvalFactor("a")]),
             Term([EvalFactor("a"), EvalFactor("b")]),
             Term([EvalFactor("np.log(x)")])
             ])
```

--------------------------------

### Using Custom Contrast Matrix (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/categorical-coding.rst

Demonstrates providing a custom contrast matrix as a list of lists directly to the C() function to define arbitrary coding schemes.

```python
contrast = [[1, 2], [3, 4], [5, 6]]
dmatrix("C(a, contrast)", data)
dmatrix("C(a, [[1], [2], [-4]])", data)
```
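
Conceptually, a custom contrast matrix has one row per level, and each observation is replaced by the row belonging to its level. The sketch below shows that lookup in plain Python. The level order and the `observations` list are assumptions made for the illustration, and the intercept column that Patsy would normally add is left out.

```python
levels = ["a1", "a2", "a3"]            # assumed level order
contrast = [[1, 2], [3, 4], [5, 6]]    # one row per level
observations = ["a1", "a2", "a3", "a1", "a2", "a3"]  # hypothetical data

# Map each level to its row of the contrast matrix
row_for_level = dict(zip(levels, contrast))

# Each observation contributes the row for its level
coded = [row_for_level[value] for value in observations]
for value, row in zip(observations, coded):
    print(value, row)
```

A single-column contrast such as `[[1], [2], [-4]]` works the same way, producing one coded column instead of two.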

--------------------------------

### Use Custom Formula with dmatrix (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/expert-model-specification.rst

Demonstrates how to instantiate and use a custom Patsy formula object with the `dmatrix` function to generate a design matrix based on the custom formula's definition.

```python
my_formula = MyAlternativeFormula(...)
dmatrix(my_formula, data)
```

--------------------------------

### Accessing term codings

Source: https://github.com/pydata/patsy/blob/master/doc/API-reference.rst

Retrieves an OrderedDict mapping Term objects to lists of SubtermInfo objects. This describes how each term is encoded into the final design matrix columns.

```python
di.term_codings
```

--------------------------------

### Initial Imports for Patsy Spline Usage

Source: https://github.com/pydata/patsy/blob/master/doc/spline-regression.rst

Imports the necessary `numpy` library for numerical operations and `dmatrix`, `build_design_matrices` from `patsy` for creating and applying design matrices, particularly for spline bases.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices
```

--------------------------------

### Patsy Intercept Handling with Parentheses - Python

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Demonstrates how Patsy handles implicit intercepts and the effect of parentheses on their inclusion, contrasting with R's behavior. The first formula excludes the intercept, while the second includes it.

```Python
dmatrices("y ~ b - 1")   # equivalent to 1 + b - 1: no intercept
dmatrices("y ~ (b - 1)") # equivalent to 1 + (b - 1): has intercept
```

--------------------------------

### Accessing column name indexes

Source: https://github.com/pydata/patsy/blob/master/doc/API-reference.rst

Retrieves an OrderedDict mapping column names (strings) to their corresponding integer indexes in the design matrix, sorted by index.

```python
di.column_name_indexes
```

--------------------------------

### Compare Patsy Formula Ranks - Python

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

This snippet uses Patsy to generate design matrices for two different formulas involving categorical factors and their interaction. It then uses NumPy to calculate and compare the matrix ranks, showing that the column spans are identical, illustrating Patsy's handling of term redundancy. It requires the `patsy` and `numpy` libraries.

```python
import numpy as np
from patsy import demo_data, dmatrices

data = demo_data("a", "b", "y")
mat1 = dmatrices("y ~ 0 + a:b", data)[1]
mat2 = dmatrices("y ~ 1 + a + b + a:b", data)[1]
np.linalg.matrix_rank(mat1)
np.linalg.matrix_rank(mat2)
np.linalg.matrix_rank(np.column_stack((mat1, mat2)))
```
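
The same rank argument can be reproduced by hand for a small case. The sketch below assumes a 2x2 design with one row per cell, cell-means columns for `0 + a:b`, and treatment-coded columns for `1 + a + b + a:b`; these matrices are written out manually as an assumption about the coding, not taken from Patsy output. The `matrix_rank` helper is a hypothetical exact-arithmetic replacement for `np.linalg.matrix_rank`.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank via Gauss-Jordan elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


cells = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")]

# 0 + a:b  -> one indicator column per (a, b) cell
cell_means = [[int(cell == other) for other in cells] for cell in cells]

# 1 + a + b + a:b -> intercept, a2 indicator, b2 indicator, a2-and-b2 indicator
treatment = [[1, int(a == "a2"), int(b == "b2"), int(a == "a2" and b == "b2")]
             for a, b in cells]

stacked = [left + right for left, right in zip(cell_means, treatment)]

print(matrix_rank(cell_means), matrix_rank(treatment), matrix_rank(stacked))
```

All three ranks come out as 4: stacking the two matrices adds no new directions, which is what it means for their column spans to coincide.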

--------------------------------

### Visualize 2D Tensor Product Basis (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/spline-regression.rst

This snippet demonstrates how to create and visualize a 2D tensor product basis using Patsy's `te`, `cr`, and `cc` functions. It generates a design matrix for a smooth of two variables (`x1`, `x2`) and then plots each basis function as a 3D surface and contours using Matplotlib. It requires `numpy`, `matplotlib`, and `patsy`. The input is grid data for `x1` and `x2`, and the output is a plot showing the basis functions.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d.axes3d import Axes3D  # registers the '3d' projection on older Matplotlib
from patsy import dmatrix

# Evaluation grid for the two covariates
x1 = np.linspace(0., 1., 100)
x2 = np.linspace(0., 1., 100)
x1, x2 = np.meshgrid(x1, x2)
df = 3

# Tensor product of a natural cubic spline (cr) and a cyclic cubic spline (cc)
y = dmatrix("te(cr(x1, df), cc(x2, df)) - 1",
            {"x1": x1.ravel(), "x2": x2.ravel(), "df": df})
print(y.shape)

fig = plt.figure()
fig.suptitle("Tensor product basis example (2 covariates)")

# One 3D subplot per basis function
for i in range(df * df):
    ax = fig.add_subplot(df, df, i + 1, projection='3d')
    yi = y[:, i].reshape(x1.shape)
    ax.plot_surface(x1, x2, yi, rstride=4, cstride=4, alpha=0.15)
    ax.contour(x1, x2, yi, zdir='z', cmap=cm.coolwarm, offset=-0.5)
    ax.contour(x1, x2, yi, zdir='y', cmap=cm.coolwarm, offset=1.2)
    ax.contour(x1, x2, yi, zdir='x', cmap=cm.coolwarm, offset=-0.2)
    ax.set_xlim3d(-0.2, 1.0)
    ax.set_ylim3d(0, 1.2)
    ax.set_zlim3d(-0.5, 1)
    ax.set_xticks([0, 1])
    ax.set_yticks([0, 1])
    ax.set_zticks([-0.5, 0, 1])

fig.tight_layout()
```

--------------------------------

### Display Patsy Design Matrix 2 - Python

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

This snippet outputs the content of the `mat2` variable, containing the design matrix generated by Patsy for the formula `y ~ 1 + a + b + a:b`. It demonstrates that while the rank is the same as `mat1`, the actual matrix contents differ due to Patsy's coding strategies. This requires `mat2` to be defined previously.

```python
mat2
```

--------------------------------

### Correct Stateful Transform Usage in dmatrix - Patsy - Python

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Shows the correct way to use a stateful transform function (`center`) directly within a Patsy formula string passed to `dmatrix` using a simple variable reference.

```Python
dmatrix("center(x)", data)  # dmatrix takes a right-hand side only; use dmatrices for "y ~ center(x)"
```

--------------------------------

### Check Model Matrix Column Count with Interaction - R

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Generates the model matrix for the formula `~ 1 + a:b` in R and displays the number of columns, highlighting a potential overspecification issue in R's coding algorithm.

```R
mat <- model.matrix(~ 1 + a:b)
ncol(mat)
```

--------------------------------

### Correct Stateful Transform Usage via Variable - Patsy - Python

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Demonstrates the correct usage of a stateful transform by assigning the function to a local variable (`asdf`) and then referencing the variable directly in the Patsy formula string.

```Python
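import patsy
from patsy import dmatrix
asdf = patsy.center
dmatrix("asdf(x)", data)  # dmatrix takes a right-hand side only; use dmatrices for "y ~ asdf(x)"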
```

--------------------------------

### Python: Describing Parsed Patsy Formula

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

Illustrates how to use the `ModelDesc.describe()` method in Python. This method converts a parsed `ModelDesc` object back into its formula string representation, which can be helpful for understanding the result of parsing complex formulas.

```Python
desc = ModelDesc.from_formula("y ~ (a + b + c + d) ** 2")
desc.describe()
```
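
To see what the `** 2` operator is expected to expand to, the following plain-Python sketch multiplies a set of terms by itself and removes duplicates, treating each term as a set of factors. The `power` helper is hypothetical and written only for this illustration; it is not how Patsy implements the operator.

```python
from itertools import product


def power(factors, n):
    """Expand (f1 + f2 + ...) ** n into unique interaction terms."""
    base = [frozenset([f]) for f in factors]
    result = list(base)
    for _ in range(n - 1):
        expanded = []
        for left, right in product(result, base):
            combined = left | right  # a:a collapses to a
            if combined not in expanded:
                expanded.append(combined)
        result = expanded
    # Stable sort: lower-order terms first, first-seen order within a degree
    return sorted(result, key=len)


terms = power(["a", "b", "c", "d"], 2)
labels = [":".join(sorted(term)) for term in terms]
print(" + ".join(labels))
# a + b + c + d + a:b + a:c + a:d + b:c + b:d + c:d
```

The result contains the four main effects and all six pairwise interactions, which should correspond to the right-hand side that `describe()` reports for this formula.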

--------------------------------

### Check Model Matrix Rank with Interaction - R

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Calculates and displays the rank of the model matrix generated by the formula `~ 1 + a:b` in R, illustrating R's behavior with interactions and intercepts.

```R
qr(model.matrix(~ 1 + a:b))$rank
```

--------------------------------

### Default Treatment Coding (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/categorical-coding.rst

Demonstrates the default behavior of patsy.dmatrix when given a categorical variable, which is to apply Treatment coding.

```python
dmatrix("a", data)
```
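
As a rough picture of what Treatment coding produces, the sketch below builds the columns by hand: an intercept plus one indicator column for every level except the first, which serves as the reference. The `treatment_code` helper and the example values are hypothetical; the sorted level order and the `a[T.level]` naming are assumptions modeled on Patsy's default output.

```python
def treatment_code(values, name="a"):
    # Assumed: levels are the sorted unique values, first level is the reference
    levels = sorted(set(values))
    others = levels[1:]
    column_names = ["Intercept"] + ["%s[T.%s]" % (name, lvl) for lvl in others]
    # Each row: 1 for the intercept, then 0/1 indicators for non-reference levels
    rows = [[1] + [int(v == lvl) for lvl in others] for v in values]
    return column_names, rows


names, rows = treatment_code(["a1", "a2", "a3", "a1"])
print(names)
for row in rows:
    print(row)
```

Rows for the reference level contain only the intercept, so each remaining coefficient is read as a difference from that level.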

--------------------------------

### Parsing a Formula String into ModelDesc - Python

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

Illustrates the standard method for converting a Patsy formula string into a ModelDesc object using the ModelDesc.from_formula() class method. This is the typical way users interact with Patsy's formula parsing, achieving the same result as manual construction but more conveniently.

```python
ModelDesc.from_formula("y ~ a + a:b + np.log(x)")
```

--------------------------------

### Accessing factor information

Source: https://github.com/pydata/patsy/blob/master/doc/API-reference.rst

Retrieves a dictionary mapping factor objects to FactorInfo objects, providing detailed information about each factor used in the design.

```python
di.factor_infos
```

--------------------------------

### Accessing column names

Source: https://github.com/pydata/patsy/blob/master/doc/API-reference.rst

Retrieves the list of column names from a DesignInfo object. The names are returned as strings in the order they appear in the design matrix.

```python
di.column_names
```

--------------------------------

### Applying Patsy B-spline Basis to New Data

Source: https://github.com/pydata/patsy/blob/master/doc/spline-regression.rst

Creates a B-spline design matrix from initial data using `patsy.dmatrix`. It then uses the `design_info` from this matrix to apply the same spline transformation to a new set of data points using `patsy.build_design_matrices`, generating a new design matrix for prediction.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices

data = {"x": np.linspace(0., 1., 100)}
design_matrix = dmatrix("bs(x, df=4)", data)

new_data = {"x": [0.1, 0.25, 0.9]}
build_design_matrices([design_matrix.design_info], new_data)[0]
```

--------------------------------

### Calculate R model.matrix columns for 0 + a:x + a:b (R)

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Shows R's model.matrix behavior when dealing with a mix of numeric (a:x) and categorical (a:b) interaction terms without an intercept (0 +). It incorrectly treats the categorical factor as if it were collinear with the numeric interaction, resulting in fewer columns than expected.

```r
# R:
> x <- c(1, 2, 3, 4)
> mat <- model.matrix(~ 0 + a:x + a:b)
> ncol(mat)
[1] 4
```