### Patsy: Complex Formula with Categorical Coding and Numerical Transformation (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Provides an example of a more complex formula combining categorical coding (Polynomial) with a numerical transformation (center()) within an interaction term.

```python
dmatrix("C(a, Poly):center(x1)", data)
```

--------------------------------

### Displaying Sample Data (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows the structure and content of the 'data' object generated by demo_data. Patsy can work with various data structures as long as they can be indexed like a Python dictionary.

```python
data
```

--------------------------------

### Patsy: Using Orthogonal Polynomial Categorical Coding (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Demonstrates how to apply an alternative categorical coding scheme, specifically Orthogonal Polynomial coding, using the C() notation.

```python
dmatrix("C(c, Poly)", {"c": ["c1", "c1", "c2", "c2", "c3", "c3"]})
```

--------------------------------

### Patsy: ANOVA-Style Formula with Main Effects and Interaction (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows how Patsy handles formulas including main effects and interactions, building up the model incrementally to represent the additional contribution of each term. Includes the shorthand notation.

```python
dmatrix("a + b + a:b", data)
```

```python
dmatrix("a*b", data)
```

--------------------------------

### Patsy: Interaction Between Categorical and Numerical Variables (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows how to model an interaction between a categorical variable and a numerical variable, effectively fitting different slopes for the numerical variable within each category.
```python
dmatrix("a:x1", data)
```

--------------------------------

### Combining Q() with Other Transformations (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Demonstrates that the Q() function can be combined with other transformations, such as a custom function like 'double', to handle special characters while applying transformations within the formula.

```python
dmatrix("double(Q('weird column!')) + x1", weird_data)
```

--------------------------------

### Handling Special Characters in Variable Names with Q() (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Explains how to use the built-in Q() function to 'quote' variable names that contain special characters like spaces or punctuation, allowing them to be correctly interpreted within the formula string. Includes setup code for data with a 'weird column!'.

```python
weird_data = demo_data("weird column!", "x1")
dmatrix("Q('weird column!') + x1", weird_data)
```

--------------------------------

### Importing Libraries and Generating Data (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Imports necessary libraries including numpy and key functions from patsy (dmatrices, dmatrix, demo_data). It then generates sample data using demo_data with specified variable names, including a mix of categorical and numerical types.

```python
import numpy as np
from patsy import dmatrices, dmatrix, demo_data
data = demo_data("a", "b", "x1", "x2", "y", "z column")
```

--------------------------------

### Applying Built-in Transformation Functions (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Illustrates the use of Patsy's built-in transformation functions, such as 'center' and 'standardize', directly within the formula string. These functions are automatically available without explicit import.
```python
dmatrix("center(x1) + standardize(x2)", data)
```

--------------------------------

### Patsy: Dummy Coding Interaction Between Categorical Variables (No Intercept) (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Demonstrates how to dummy code the interaction term between two categorical variables, creating columns for each combination of values, when the intercept is excluded.

```python
dmatrix("0 + a:b", data)
```

--------------------------------

### Installing Patsy from Source (Shell)

Source: https://github.com/pydata/patsy/blob/master/doc/overview.rst

Provides the command to install Patsy from a downloaded source distribution by running the standard Python setup.py script.

```Shell
python setup.py install
```

--------------------------------

### Patsy: Treatment-Coded Slopes for Categorical-Numerical Interaction (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Illustrates how Patsy's redundancy avoidance applies to categorical-numerical interactions when the numerical variable is also included as a main effect, resulting in treatment-coded slopes.

```python
# compare to the difference between "0 + a" and "1 + a"
dmatrix("x1 + a:x1", data)
```

--------------------------------

### Patsy: Handling Different Data Types in Formulas (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Demonstrates how Patsy's formula evaluation respects the underlying Python data types (e.g., NumPy arrays vs. lists) when performing operations like addition within the formula.
```python
dmatrix("I(x1 + x2)", {"x1": np.array([1, 2, 3]), "x2": np.array([4, 5, 6])})
```

```python
dmatrix("I(x1 + x2)", {"x1": [1, 2, 3], "x2": [4, 5, 6]})
```

--------------------------------

### Applying External Transformation Function (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Demonstrates applying a transformation function (np.log) from the current Python environment within the Patsy formula. Any function or variable accessible where dmatrix is called can be used inside the formula string.

```python
dmatrix("x1 + np.log(x2 + 10)", data)
```

--------------------------------

### Patsy: Treatment Coding Single Categorical Variable (With Intercept) (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Illustrates Patsy's default behavior for coding a single categorical variable when the intercept is present. It uses reduced-rank treatment coding to avoid redundancy.

```python
dmatrix("a", data)
```

--------------------------------

### Performing Linear Regression with Design Matrices (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Demonstrates how to use the DesignMatrix objects returned by dmatrices directly with a regression function like numpy.linalg.lstsq. It calculates the regression coefficients (betas) and prints them along with their corresponding column names from the predictor matrix's design_info.
```python
outcome, predictors = dmatrices("y ~ x1 + x2", data)
betas = np.linalg.lstsq(predictors, outcome)[0].ravel()
for name, beta in zip(predictors.design_info.column_names, betas):
    print("%s: %s" % (name, beta))
```

--------------------------------

### Patsy: Dummy Coding Single Categorical Variable (No Intercept) (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows how Patsy dummy codes a single categorical variable when the intercept is explicitly excluded from the formula, resulting in a column for each category.

```python
dmatrix("0 + a", data)
```

--------------------------------

### Installing Patsy via Pip (Shell)

Source: https://github.com/pydata/patsy/blob/master/doc/overview.rst

Shows the standard command for installing or upgrading the Patsy package using the pip package manager, which fetches the package from the Python Package Index (PyPI).

```Shell
pip install --upgrade patsy
```

--------------------------------

### Defining and Using Custom Transformation Function (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows how to define a custom Python function ('double') and use it as a transformation within a Patsy formula. The function is applied to the 'x1' variable during design matrix creation.

```python
def double(x):
    return 2 * x

dmatrix("x1 + double(x1)", data)
```

--------------------------------

### Performing Arithmetic Operations with I() (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows how to use the built-in I() function to 'protect' arithmetic expressions within a formula string. This ensures that operators like '+' are interpreted as mathematical operations rather than formula separators.
```python
dmatrix("I(x1 + x2)", data)
```

--------------------------------

### Installing Patsy via pip

Source: https://github.com/pydata/patsy/blob/master/README.md

This command installs the patsy library using the pip package installer for Python. It fetches the latest version from PyPI and makes it available in your Python environment.

```Shell
pip install patsy
```

--------------------------------

### Generating Design Matrices with dmatrices (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Uses the dmatrices function to create design matrices suitable for regression. It takes a formula string and the data object, returning a tuple containing two DesignMatrix objects: one for the left-hand side (outcome) and one for the right-hand side (predictors), automatically including an intercept.

```python
dmatrices("y ~ x1 + x2", data)
```

--------------------------------

### Using Environment Variable in Formula (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows that variables defined in the Python environment can be referenced directly within a Patsy formula string. A new variable 'new_x2' is created and then used in the formula passed to dmatrix.

```python
new_x2 = data["x2"] * 100
dmatrix("new_x2")
```

--------------------------------

### Generating Predictor Design Matrix with dmatrix (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Shows how to use the dmatrix function to generate a single design matrix, typically for predictors. By omitting the 'y ~' part of the formula, only the right-hand side matrix is created, which also includes an intercept by default.
```python
dmatrix("x1 + x2", data)
```

--------------------------------

### Define Categorical Factors - Python (Setup)

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Sets up two lists, 'a' and 'b', representing categorical data, and imports the 'dmatrix' function from Patsy. This code is used for internal setup and is suppressed in the original documentation output.

```Python
a = ["a1", "a1", "a2", "a2"]
b = ["b1", "b2", "b1", "b2"]
from patsy import dmatrix
```

--------------------------------

### Defining and Using a Custom Stateful Transform - Patsy - Python

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Provides a simple example of defining a custom stateful transform class (`MyExampleCenter`) that follows the required protocol (`__init__`, `memorize_chunk`, `memorize_finish`, `transform`). It then shows how to wrap the class using `patsy.stateful_transform` to create a callable object and demonstrates its basic usage.

```Python
import numpy as np
import patsy

class MyExampleCenter(object):
    def __init__(self):
        self._total = 0
        self._count = 0
        self._mean = None

    def memorize_chunk(self, x):
        # Accumulate running totals across chunks
        self._total += np.sum(x)
        self._count += len(x)

    def memorize_finish(self):
        # Compute the mean once all chunks have been seen
        self._mean = self._total * 1. / self._count

    def transform(self, x):
        return x - self._mean

my_example_center = patsy.stateful_transform(MyExampleCenter)
print(my_example_center(np.array([1, 2, 3])))
```

--------------------------------

### Example Formula with Stateful Transforms - Patsy - Python

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Illustrates a complex Patsy formula incorporating multiple stateful transforms like `center` and `spline` within an `I()` identity function, demonstrating how Patsy handles dependencies correctly.
```Python
y ~ I(spline(center(x1)) + center(x2))
```

--------------------------------

### Generating Design Matrix Without Intercept (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/quickstart.rst

Illustrates how to explicitly remove the automatic intercept term from the design matrix. This is achieved by adding '- 1' to the formula string passed to the dmatrix function.

```python
dmatrix("x1 + x2 - 1", data)
```

--------------------------------

### Using a Patsy-Enabled Model Class in Python

Source: https://github.com/pydata/patsy/blob/master/doc/library-developers.rst

Demonstrates how to use a model class (`LM`) that has been integrated with Patsy. It shows instantiating the model using both traditional matrix input and the Patsy formula interface (`formula`, `data`). Examples include making predictions on new data and calculating log-likelihood, highlighting Patsy's automatic handling of categorical variables and transformations.

```python
import numpy as np
from patsy import demo_data

# Assumes the example model class `LM` described above is already defined.
data = demo_data("x", "y", "a")

# Old and boring approach (but it still works):
X = np.column_stack(([1] * len(data["y"]), data["x"]))
LM((data["y"], X))

# Fancy new way:
m = LM("y ~ x", data)
m
m.predict({"x": [10, 20, 30]})
m.loglik(data)
m.loglik({"x": [10, 20, 30], "y": [-1, -2, -3]})

# Your users get support for categorical predictors for free:
LM("y ~ a", data)

# And variable transformations too:
LM("y ~ np.log(x ** 2)", data)
```

--------------------------------

### Importing Libraries for Patsy Formulas - Python

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

Imports the numpy library as 'np' and all components from the patsy library. This setup is required to use patsy's formula parsing and model description functionalities.
```python
import numpy as np
from patsy import *
```

--------------------------------

### Define Categorical Factors - R

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Sets up two categorical factors, 'a' and 'b', each with two levels, for use in subsequent R formula examples demonstrating differences in categorical coding.

```R
a <- factor(c("a1", "a1", "a2", "a2"))
b <- factor(c("b1", "b2", "b1", "b2"))
```

--------------------------------

### Using Custom Coding Class (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/categorical-coding.rst

Demonstrates how to use the custom MyTreat coding class with patsy.dmatrix, showing examples for full-rank (without intercept), reduced-rank (with intercept), and passing arguments to the custom class constructor.

```python
# Full rank:
dmatrix("0 + C(a, MyTreat)", data)
# Reduced rank:
dmatrix("C(a, MyTreat)", data)
# With argument:
dmatrix("C(a, MyTreat(2))", data)
```

--------------------------------

### Patsy Formula Parsing Example

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

This example illustrates how Patsy's parser distinguishes its operators from Python code within factors. Operators inside parentheses within a factor are treated as part of the factor's Python code, not Patsy operators.

```Patsy Formula
f(x1 + x2) + x3
```

--------------------------------

### Accessing Term objects

Source: https://github.com/pydata/patsy/blob/master/doc/API-reference.rst

Retrieves the list of Term objects that represent the formula terms. It also shows how to get the name string for each Term object.

```python
di.terms
[term.name() for term in di.terms]
```

--------------------------------

### Loading Custom Coding Class (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/categorical-coding.rst

Loads and executes the Python code defining the custom MyTreat class from an external file, making it available for use in subsequent examples.
This snippet is suppressed in the original output.

```python
with open("_examples/example_treatment.py") as f:
    exec(f.read())
```

--------------------------------

### Using the : Operator in Patsy Formula

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

The `:` operator computes the interaction between terms, taking the union of factors within them. For example, `(a + b):(c + d)` expands to `a:c + a:d + b:c + b:d`. It handles identical terms such that `a:a` is `a`.

```Patsy Formula
(a + b):(c + d)
```

```Patsy Formula
a:c + a:d + b:c + b:d
```

```Patsy Formula
a:a
```

```Patsy Formula
a
```

```Patsy Formula
(a:b):(a:c)
```

```Patsy Formula
a:b:c
```

--------------------------------

### Defining a Custom Patsy Factor Class in Python

Source: https://github.com/pydata/patsy/blob/master/doc/expert-model-specification.rst

This Python class `MyAlternativeFactor` demonstrates a basic custom Patsy factor. It implements the required `name` method and stores internal state (`alternative_formula`, `side`) passed during initialization. This example is part of a larger pattern for integrating alternative formula systems with Patsy.

```python
class MyAlternativeFactor(object):
    # A factor object that simply returns the design
    def __init__(self, alternative_formula, side):
        self.alternative_formula = alternative_formula
        self.side = side

    def name(self):
        return self.side
```

--------------------------------

### Plotting Patsy B-spline Basis and Resulting Curve

Source: https://github.com/pydata/patsy/blob/master/doc/spline-regression.rst

Generates a B-spline basis using `patsy.dmatrix` with the `bs` transform. It then plots the individual basis functions (scaled by example coefficients) and the sum of these functions, representing the resulting spline curve. Requires `matplotlib`.
```python
import matplotlib.pyplot as plt
import numpy as np
from patsy import dmatrix

plt.title("B-spline basis example (degree=3)");
x = np.linspace(0., 1., 100)
y = dmatrix("bs(x, df=6, degree=3, include_intercept=True) - 1", {"x": x})
# Define some coefficients
b = np.array([1.3, 0.6, 0.9, 0.4, 1.6, 0.7])
# Plot B-spline basis functions (colored curves) each multiplied by its coeff
plt.plot(x, y*b);
# Plot the spline itself (sum of the basis functions, thick black curve)
plt.plot(x, np.dot(y, b), color='k', linewidth=3);
```

--------------------------------

### Define eval for Custom Factor (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/expert-model-specification.rst

Implements the `eval` method for a custom Patsy factor. This method is responsible for evaluating the factor given the current state and data. In this example, it delegates the matrix generation to an alternative formula object.

```python
def eval(self, state, data):
    return self.alternative_formula.get_matrix(self.side, data)
```

--------------------------------

### Plotting Patsy Cyclic Cubic Spline Basis and Resulting Curve

Source: https://github.com/pydata/patsy/blob/master/doc/spline-regression.rst

Generates a cyclic cubic regression spline basis using `patsy.dmatrix` with the `cc` transform. It plots the individual basis functions (scaled by example coefficients) and their sum, demonstrating the resulting cyclic spline curve. Requires `matplotlib`.
```python
import matplotlib.pyplot as plt
import numpy as np
from patsy import dmatrix

plt.title("Cyclic cubic regression spline basis example");
x = np.linspace(0., 1., 100)
# Define some coefficients (example values)
b = np.array([1.3, 0.6, 0.9, 0.4, 1.6, 0.7])
y = dmatrix("cc(x, df=6) - 1", {"x": x})
# Plot cyclic cubic regression spline basis functions (colored curves) each multiplied by its coeff
plt.plot(x, y*b);
# Plot the spline itself (sum of the basis functions, thick black curve)
plt.plot(x, np.dot(y, b), color='k', linewidth=3);
```

--------------------------------

### Function to Add Predictors to a Patsy ModelDesc

Source: https://github.com/pydata/patsy/blob/master/doc/expert-model-specification.rst

Provides an example function `add_predictors` that takes a base formula string and a list of additional variable names. It parses the formula into a `ModelDesc`, creates `LookupFactor` and `Term` objects for the extra variables, and appends them to the right-hand side term list of the parsed `ModelDesc`, allowing programmatic model expansion.

```python
from patsy import ModelDesc, Term, LookupFactor

def add_predictors(formula_string, extra_vars):
    """
    Parses a formula string and adds additional predictors as Terms.

    Args:
        formula_string: The base formula string (e.g., "y ~ a + b").
        extra_vars: A list of variable names to add as individual predictors.

    Returns:
        A ModelDesc object including terms from the formula and extra variables.
    """
    # Parse the initial formula into a ModelDesc
    desc = ModelDesc.from_formula(formula_string)

    # Create Terms for the extra variables and add them to the RHS
    for var_name in extra_vars:
        lookup_factor = LookupFactor(var_name)
        new_term = Term([lookup_factor])
        desc.rhs_termlist.append(new_term)

    return desc
```

--------------------------------

### Plotting Patsy Natural Cubic Spline Basis and Resulting Curve

Source: https://github.com/pydata/patsy/blob/master/doc/spline-regression.rst

Generates a natural cubic regression spline basis using `patsy.dmatrix` with the `cr` transform. It plots the individual basis functions (scaled by example coefficients) and their sum, illustrating the resulting spline curve. Requires `matplotlib`.

```python
import matplotlib.pyplot as plt
import numpy as np
from patsy import dmatrix

plt.title("Natural cubic regression spline basis example");
x = np.linspace(0., 1., 100)
# Define some coefficients (example values)
b = np.array([1.3, 0.6, 0.9, 0.4, 1.6, 0.7])
y = dmatrix("cr(x, df=6) - 1", {"x": x})
# Plot natural cubic regression spline basis functions (colored curves) each multiplied by its coeff
plt.plot(x, y*b);
# Plot the spline itself (sum of the basis functions, thick black curve)
plt.plot(x, np.dot(y, b), color='k', linewidth=3);
```

--------------------------------

### Create Source and Wheel Distributions (Shell)

Source: https://github.com/pydata/patsy/blob/master/release-checklist.txt

Executes setup.py twice to create both a source distribution (sdist) in zip format and a binary wheel distribution (bdist_wheel). This is typically done from a clean clone to ensure correct packaging before uploading to PyPI.

```shell
python setup.py sdist --formats=zip && python setup.py bdist_wheel
```

--------------------------------

### Create Source Distribution (Shell)

Source: https://github.com/pydata/patsy/blob/master/release-checklist.txt

Runs the setup.py script to create a source distribution (sdist) of the project, specifically formatted as a zip file.
This is a standard step in packaging Python libraries for distribution.

```shell
python setup.py sdist --formats=zip
```

--------------------------------

### Build Documentation (Shell)

Source: https://github.com/pydata/patsy/blob/master/release-checklist.txt

Changes the current directory to 'docs' and executes the 'make clean html' command to clean previous builds and generate the HTML documentation. This step is used to verify that the documentation builds without errors or warnings.

```shell
cd docs; make clean html
```

--------------------------------

### Upload Distributions to PyPI (Shell)

Source: https://github.com/pydata/patsy/blob/master/release-checklist.txt

Uses the 'twine' command-line utility to upload the generated source (.zip) and wheel (.whl) distribution files found in the 'dist' directory to the Python Package Index (PyPI), making the new version available.

```shell
twine upload dist/*.zip dist/*.whl
```

--------------------------------

### Loading Libraries and Sample Data (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/categorical-coding.rst

Imports key Patsy components and generates a sample dataset with a categorical variable 'a' having 3 levels for demonstration.

```python
from patsy import dmatrix, demo_data, ContrastMatrix, Poly
data = demo_data("a", nlevels=3)
data
```

--------------------------------

### Creating a DesignInfo object

Source: https://github.com/pydata/patsy/blob/master/doc/API-reference.rst

Demonstrates how to generate a design matrix using dmatrix with a formula and data, and then access the associated DesignInfo object which contains metadata about the resulting matrix.
```python
mat = dmatrix("a + x", demo_data("a", "x", nlevels=3))
di = mat.design_info
```

--------------------------------

### Cloning Patsy Git Repository (Shell)

Source: https://github.com/pydata/patsy/blob/master/doc/overview.rst

Provides the command-line instruction to clone the latest development version of the Patsy source code repository from GitHub using the Git version control system. The HTTPS URL is used because GitHub no longer serves the unauthenticated `git://` protocol.

```Shell
git clone https://github.com/pydata/patsy.git
```

--------------------------------

### Tag and Push Release (Shell)

Source: https://github.com/pydata/patsy/blob/master/release-checklist.txt

Uses Git to create a version tag (append the actual version number to `v`) and then pushes all local tags to the remote Git repository. This marks the release point in the version history.

```shell
git tag v && git push --tags
```

--------------------------------

### Demonstrating Patsy Term Ordering (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

This snippet creates a sample dataset and uses Patsy's `dmatrix` to build a design matrix from a complex formula. It then accesses the `term_names` attribute to show the order in which Patsy sorts the terms according to its internal rules.

```python
data = demo_data("a", "b", "x1", "x2")
mat = dmatrix("x1:x2 + a:b + b + x1:a:b + a + x2:a:x1", data)
mat.design_info.term_names
```

--------------------------------

### Using Patsy's Stateful Center Transform (Patsy/Python)

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Demonstrates the correct behavior of Patsy's built-in `center` stateful transform. It correctly applies the transformation to new data using the mean calculated from the original data used to build the design matrix.
```python
import numpy as np
from patsy import dmatrix, build_design_matrices, incr_dbuilder

data = {"x": [1, 2, 3, 4]}
new_data = {"x": [5, 6, 7, 8]}

# Build matrix on original data using stateful center
fixed_mat = dmatrix("center(x)", data)
# fixed_mat output: [ -1.5, -0.5, 0.5, 1.5] (centered on mean 2.5)

# Correct! Apply to new data using the state from fixed_mat
build_design_matrices([fixed_mat.design_info], new_data)[0]
# Output: [ 3.5, 4.5, 5.5, 6.5] (correctly centered on original mean 2.5)
```

--------------------------------

### Programmatically Building ModelDesc with Factors and Terms in Patsy

Source: https://github.com/pydata/patsy/blob/master/doc/expert-model-specification.rst

Demonstrates the programmatic approach to defining a model structure using Patsy's internal API. It shows how to create `LookupFactor` and `EvalFactor` objects, combine them into `Term` objects, and assemble these into a `ModelDesc` for use with `dmatrix`.

```python
import numpy as np
from patsy import (ModelDesc, EvalEnvironment, Term, EvalFactor,
                   LookupFactor, demo_data, dmatrix)

data = demo_data("a", "x")

# LookupFactor takes a dictionary key:
a_lookup = LookupFactor("a")
# EvalFactor takes arbitrary Python code:
x_transform = EvalFactor("np.log(x ** 2)")
# First argument is empty list for dmatrix; we would need to put
# something there if we were calling dmatrices.
desc = ModelDesc([],
                 [Term([a_lookup]),
                  Term([x_transform]),
                  # An interaction:
                  Term([a_lookup, x_transform])])

# Create the matrix (or pass 'desc' to any statistical library
# function that uses patsy.dmatrix internally):
dmatrix(desc, data)
```

--------------------------------

### Demonstrating Naive Centering Problem (Patsy/Python)

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Shows how a naive centering function behaves incorrectly when applied to new data after being fitted on original data, using `patsy.dmatrix` to build the initial matrix and `patsy.build_design_matrices` to transform new data. The naive function centers the new data based on its *own* mean, not the original data's mean.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices, incr_dbuilder

# A naive, stateless centering function, defined here so the snippet runs on its own
def naive_center(x):
    x = np.asarray(x)
    return x - np.mean(x)

data = {"x": [1, 2, 3, 4]}

# Build matrix on original data
mat = dmatrix("naive_center(x)", data)
# mat output: [ -1.5, -0.5, 0.5, 1.5] (centered on mean 2.5)

# Apply to new data
new_data = {"x": [5, 6, 7, 8]}
# Broken! Centers on the mean of new_data (6.5) instead of original mean (2.5)
build_design_matrices([mat.design_info], new_data)[0]
# Output: [ -1.5, -0.5, 0.5, 1.5]
```

--------------------------------

### Integrating Patsy High-Level Interface in Python

Source: https://github.com/pydata/patsy/blob/master/doc/library-developers.rst

Shows how to modify existing Python function signatures to accept Patsy formula strings and data dictionaries. It uses `patsy.dmatrices` or `patsy.dmatrix` to build design matrices from the formula and data, enabling users to specify models using the formula mini-language. Requires the `patsy` library.

```python
def mymodel2_patsy(formula_like, data={}, ...):
    y, X = patsy.dmatrices(formula_like, data, 1)
    ...

def mymodel1_patsy(formula_like, data={}, ...):
    X = patsy.dmatrix(formula_like, data, 1)
    ...
```

--------------------------------

### Stateful Centering with Incremental Data (Patsy/Python)

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Shows how Patsy's stateful `center` transform correctly handles incremental data processing with `patsy.incr_dbuilder`. It makes an initial pass to calculate the necessary state (like the mean) across all chunks before building the matrix. It also shows applying the resulting transform to new data.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices, incr_dbuilder

data = {"x": [1, 2, 3, 4]}
new_data = {"x": [5, 6, 7, 8]}

# Process data in chunks
data_chunked = [{"x": data["x"][:2]}, {"x": data["x"][2:]}]

# Build incrementally using stateful center
dinfo = incr_dbuilder("center(x)", lambda: iter(data_chunked))

# Correct! Matrix is built using the overall mean (2.5)
matrix_from_chunks = np.vstack([build_design_matrices([dinfo], chunk)[0]
                                for chunk in data_chunked])
# Output: [[-1.5, -0.5], [0.5, 1.5]]

# Correct! Apply to new data using the state from the incremental build
build_design_matrices([dinfo], new_data)[0]
# Output: [ 3.5, 4.5, 5.5, 6.5]
```

--------------------------------

### Listing Python Dependencies

Source: https://github.com/pydata/patsy/blob/master/doc/sphinxext/requirements.txt

This snippet lists the required Python packages for the project, including scientific computing libraries (numpy, scipy, pandas) and development/documentation tools (mistune, jsonschema, ipython). This format is typical for a requirements file.

```Python
numpy
scipy
pandas
mistune
jsonschema
ipython
```

--------------------------------

### Importing the patsy library

Source: https://github.com/pydata/patsy/blob/master/doc/API-reference.rst

Imports all public names from the patsy library into the current namespace, making functions and classes directly accessible.
```python
from patsy import *
```

--------------------------------

### Creating Design Matrix with Custom Names using DesignMatrix and DesignInfo in Patsy

Source: https://github.com/pydata/patsy/blob/master/doc/expert-model-specification.rst

Shows how to use `patsy.DesignMatrix` and `patsy.DesignInfo` to wrap an array-like object, allowing the specification of custom column names for the resulting design matrix while still using `dmatrix`.

```python
from patsy import DesignMatrix, DesignInfo, dmatrix

# X is the same array-like used in the list-of-lists example
X = [[1, 10], [1, 20], [1, -2]]
design_info = DesignInfo(["Intercept!", "Not intercept!"])
X_dm = DesignMatrix(X, design_info)
dmatrix(X_dm)
```

--------------------------------

### Creating Design Matrix from List of Lists in Patsy

Source: https://github.com/pydata/patsy/blob/master/doc/expert-model-specification.rst

Demonstrates the simplest method to create a design matrix by directly passing a list of lists (or any array-like object) to the `patsy.dmatrix` function. This bypasses the formula parser entirely.

```python
from patsy import dmatrix
X = [[1, 10], [1, 20], [1, -2]]
dmatrix(X)
```

--------------------------------

### Python: Exploring Patsy Formula Parsing

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

Provides Python code using `patsy.ModelDesc.from_formula` to parse and inspect various Patsy formulas. This allows users to explore how different syntax elements like interactions, combinations, powers, and embedded Python code are interpreted.
```python
from patsy import ModelDesc

ModelDesc.from_formula("y ~ x")
ModelDesc.from_formula("y ~ x + x + x")
ModelDesc.from_formula("y ~ -1 + x")
ModelDesc.from_formula("~ -1")
ModelDesc.from_formula("y ~ a:b")
ModelDesc.from_formula("y ~ a*b")
ModelDesc.from_formula("y ~ (a + b + c + d) ** 2")
ModelDesc.from_formula("y ~ (a + b)/(c + d)")
ModelDesc.from_formula("np.log(x1 + x2) "
                       "+ (x + {6: x3, 8 + 1: x4}[3 * i])")
```

--------------------------------

### Naive Centering with Incremental Data (Patsy/Python)

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Illustrates the issue with using a naive centering function when building a design matrix incrementally with `patsy.incr_dbuilder`. The naive function centers each chunk of data independently, leading to incorrect results for the dataset as a whole.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices, incr_dbuilder

# Stateless centering: subtracts the mean of whatever data it is handed
def naive_center(x):
    x = np.asarray(x)
    return x - np.mean(x)

data = {"x": [1, 2, 3, 4]}

# Process data in chunks
data_chunked = [{"x": data["x"][:2]}, {"x": data["x"][2:]}]

# Build incrementally using naive center
dinfo = incr_dbuilder("naive_center(x)", lambda: iter(data_chunked))

# Broken! Each chunk is centered independently
np.vstack([build_design_matrices([dinfo], chunk)[0] for chunk in data_chunked])
# naive_center(x) column: [-0.5, 0.5, -0.5, 0.5]
```

--------------------------------

### Build Design Matrix for New Data with Tensor Product Basis (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/spline-regression.rst

This snippet shows how to use a previously created design matrix's `design_info` to build a new design matrix for a different set of data points. It defines data for three variables (`x1`, `x2`, `x3`), creates a tensor product basis with centering constraints using `dmatrix`, and then uses `build_design_matrices` to apply the same basis transformation to new data points. It requires `numpy` and `patsy`.
The inputs are two dictionaries, one holding the data used to create the original basis and one holding the new data points; the output is a new design matrix for the new data.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices

data = {"x1": np.linspace(0., 1., 100),
        "x2": np.linspace(0., 1., 100),
        "x3": np.linspace(0., 1., 100)}
design_matrix = dmatrix("te(cr(x1, df=3), cr(x2, df=3), cc(x3, df=3), constraints='center')", data)
new_data = {"x1": [0.1, 0.2],
            "x2": [0.2, 0.3],
            "x3": [0.3, 0.4]}
new_design_matrix = build_design_matrices([design_matrix.design_info], new_data)[0]
new_design_matrix
np.asarray(new_design_matrix)
```

--------------------------------

### Creating Design Matrices with Patsy Formula (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/overview.rst

Demonstrates how to use the `patsy.dmatrices` function with a formula string to generate design matrices for a statistical model from a dataset. The formula specifies the dependent variable (y), independent variables (x, a, b), and an interaction term (a:b).

```python
import patsy

patsy.dmatrices("y ~ x + a + b + a:b", data)
```

--------------------------------

### Calculate Patsy dmatrix columns for 1 + a:b (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Demonstrates how Patsy avoids redundant columns when an interaction term (a:b) is combined with an intercept (1): the design matrix has exactly as many columns as its rank, unlike R's model.matrix in certain cases.

```python
# Python:
from patsy import dmatrix

a = ["a1", "a1", "a2", "a2"]
b = ["b1", "b2", "b1", "b2"]
# With no data argument, patsy looks up a and b in the calling scope
mat = dmatrix("1 + a:b")
mat.shape[1]
```

--------------------------------

### Calculate Patsy dmatrix columns for 0 + a:x + a:b (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Illustrates how Patsy correctly determines the number of columns for a design matrix involving both numeric (a:x) and categorical (a:b) interaction terms without an intercept (0 +).
Patsy distinguishes between numeric and categorical factors, leading to the correct full-rank encoding for the categorical term.

```python
# Python:
from patsy import dmatrix

# a and b as in the 1 + a:b example
a = ["a1", "a1", "a2", "a2"]
b = ["b1", "b2", "b1", "b2"]
x = [1, 2, 3, 4]
mat = dmatrix("0 + a:x + a:b")
mat.shape[1]
```

--------------------------------

### Using Named Custom Contrast Matrix (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/categorical-coding.rst

Shows how to create a ContrastMatrix object with custom column names to improve the readability of the resulting design matrix columns.

```python
from patsy import ContrastMatrix, dmatrix

# contrast and data are the objects from the custom contrast matrix example
contrast_mat = ContrastMatrix(contrast, ["[pretty0]", "[pretty1]"])
dmatrix("C(a, contrast_mat)", data)
```

--------------------------------

### Importing builtins from patsy - Python

Source: https://github.com/pydata/patsy/blob/master/doc/builtins-reference.rst

This snippet shows how to import all members from the patsy.builtins module directly into the current namespace. These members are automatically available within patsy formula code.

```python
from patsy.builtins import *
```

--------------------------------

### Manually Constructing a Patsy ModelDesc - Python

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

Demonstrates how to programmatically build a Patsy ModelDesc object using its constituent parts: Terms and EvalFactors. This manual construction represents the structure of the formula 'y ~ a + a:b + np.log(x)', showing the left-hand side term ('y') and the right-hand side terms (intercept, 'a', 'a:b', 'np.log(x)').

```python
from patsy import ModelDesc, Term, EvalFactor

ModelDesc([Term([EvalFactor("y")])],
          [Term([]),  # the empty term is the intercept
           Term([EvalFactor("a")]),
           Term([EvalFactor("a"), EvalFactor("b")]),
           Term([EvalFactor("np.log(x)")])
           ])
```

--------------------------------

### Using Custom Contrast Matrix (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/categorical-coding.rst

Demonstrates providing a custom contrast matrix as a list of lists directly to the C() function to define arbitrary coding schemes.
```python
contrast = [[1, 2], [3, 4], [5, 6]]
dmatrix("C(a, contrast)", data)
dmatrix("C(a, [[1], [2], [-4]])", data)
```

--------------------------------

### Use Custom Formula with dmatrix (Python)

Source: https://github.com/pydata/patsy/blob/master/doc/expert-model-specification.rst

Demonstrates how to instantiate and use a custom Patsy formula object with the `dmatrix` function to generate a design matrix based on the custom formula's definition.

```python
my_formula = MyAlternativeFormula(...)
dmatrix(my_formula, data)
```

--------------------------------

### Accessing term codings

Source: https://github.com/pydata/patsy/blob/master/doc/API-reference.rst

Retrieves an OrderedDict mapping Term objects to lists of SubtermInfo objects. This describes how each term is encoded into the final design matrix columns.

```python
di.term_codings
```

--------------------------------

### Initial Imports for Patsy Spline Usage

Source: https://github.com/pydata/patsy/blob/master/doc/spline-regression.rst

Imports the necessary `numpy` library for numerical operations and `dmatrix`, `build_design_matrices` from `patsy` for creating and applying design matrices, particularly for spline bases.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices
```

--------------------------------

### Patsy Intercept Handling with Parentheses - Python

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Demonstrates how Patsy handles implicit intercepts and the effect of parentheses on their inclusion, contrasting with R's behavior. The first formula excludes the intercept, while the second includes it.
```python
dmatrices("y ~ b - 1")    # equivalent to 1 + b - 1: no intercept
dmatrices("y ~ (b - 1)")  # equivalent to 1 + (b - 1): has intercept
```

--------------------------------

### Accessing column name indexes

Source: https://github.com/pydata/patsy/blob/master/doc/API-reference.rst

Retrieves an OrderedDict mapping column names (strings) to their corresponding integer indexes in the design matrix, sorted by index.

```python
di.column_name_indexes
```

--------------------------------

### Compare Patsy Formula Ranks - Python

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

This snippet uses Patsy to generate design matrices for two different formulas involving categorical factors and their interaction. It then uses NumPy to calculate and compare the matrix ranks: if stacking the two matrices side by side does not raise the rank, their column spans are identical, which illustrates Patsy's handling of term redundancy. It requires the `patsy` and `numpy` libraries.

```python
import numpy as np
from patsy import dmatrices, demo_data

data = demo_data("a", "b", "y")
# dmatrices returns (lhs, rhs); index 1 selects the right-hand-side matrix
mat1 = dmatrices("y ~ 0 + a:b", data)[1]
mat2 = dmatrices("y ~ 1 + a + b + a:b", data)[1]
np.linalg.matrix_rank(mat1)
np.linalg.matrix_rank(mat2)
np.linalg.matrix_rank(np.column_stack((mat1, mat2)))
```

--------------------------------

### Visualize 2D Tensor Product Basis (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/spline-regression.rst

This snippet demonstrates how to create and visualize a 2D tensor product basis using Patsy's `te`, `cr`, and `cc` functions. It generates a design matrix for a smooth of two variables (`x1`, `x2`) and then plots each basis function as a 3D surface and contours using Matplotlib. It requires `numpy`, `matplotlib`, and `patsy`. The input is grid data for `x1` and `x2`, and the output is a plot showing the basis functions.
```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d.axes3d import Axes3D  # needed for projection='3d' on older Matplotlib
from patsy import dmatrix

x1 = np.linspace(0., 1., 100)
x2 = np.linspace(0., 1., 100)
x1, x2 = np.meshgrid(x1, x2)
df = 3
y = dmatrix("te(cr(x1, df), cc(x2, df)) - 1",
            {"x1": x1.ravel(), "x2": x2.ravel(), "df": df})
print(y.shape)

fig = plt.figure()
fig.suptitle("Tensor product basis example (2 covariates)")
# One subplot per basis function (df * df columns in total)
for i in range(df * df):
    ax = fig.add_subplot(df, df, i + 1, projection='3d')
    yi = y[:, i].reshape(x1.shape)
    ax.plot_surface(x1, x2, yi, rstride=4, cstride=4, alpha=0.15)
    ax.contour(x1, x2, yi, zdir='z', cmap=cm.coolwarm, offset=-0.5)
    ax.contour(x1, x2, yi, zdir='y', cmap=cm.coolwarm, offset=1.2)
    ax.contour(x1, x2, yi, zdir='x', cmap=cm.coolwarm, offset=-0.2)
    ax.set_xlim3d(-0.2, 1.0)
    ax.set_ylim3d(0, 1.2)
    ax.set_zlim3d(-0.5, 1)
    ax.set_xticks([0, 1])
    ax.set_yticks([0, 1])
    ax.set_zticks([-0.5, 0, 1])
fig.tight_layout()
```

--------------------------------

### Display Patsy Design Matrix 2 - Python

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

This snippet outputs the content of the `mat2` variable, containing the design matrix generated by Patsy for the formula `y ~ 1 + a + b + a:b`. It demonstrates that while the rank is the same as `mat1`, the actual matrix contents differ due to Patsy's coding strategies. This requires `mat2` to be defined previously.

```python
mat2
```

--------------------------------

### Correct Stateful Transform Usage in dmatrix - Patsy - Python

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Shows the correct way to use a stateful transform function (`center`) directly within a Patsy formula string passed to `dmatrix` using a simple variable reference.
```python
# dmatrix accepts right-hand-side-only formulas; use dmatrices for "y ~ center(x)"
dmatrix("center(x)", data)
```

--------------------------------

### Check Model Matrix Column Count with Interaction - R

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Generates the model matrix for the formula `~ 1 + a:b` in R and displays the number of columns, highlighting a potential overspecification issue in R's coding algorithm.

```R
mat <- model.matrix(~ 1 + a:b)
ncol(mat)
```

--------------------------------

### Correct Stateful Transform Usage via Variable - Patsy - Python

Source: https://github.com/pydata/patsy/blob/master/doc/stateful-transforms.rst

Demonstrates the correct usage of a stateful transform by assigning the function to a local variable (`asdf`) and then referencing the variable directly in the Patsy formula string.

```python
import patsy
from patsy import dmatrix

asdf = patsy.center
# dmatrix accepts right-hand-side-only formulas; use dmatrices for "y ~ asdf(x)"
dmatrix("asdf(x)", data)
```

--------------------------------

### Python: Describing Parsed Patsy Formula

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

Illustrates how to use the `ModelDesc.describe()` method in Python. This method converts a parsed `ModelDesc` object back into its formula string representation, which can be helpful for understanding the result of parsing complex formulas.

```python
from patsy import ModelDesc

desc = ModelDesc.from_formula("y ~ (a + b + c + d) ** 2")
desc.describe()
```

--------------------------------

### Check Model Matrix Rank with Interaction - R

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Calculates and displays the rank of the model matrix generated by the formula `~ 1 + a:b` in R, illustrating R's behavior with interactions and intercepts.

```R
qr(model.matrix(~ 1 + a:b))$rank
```

--------------------------------

### Default Treatment Coding (Patsy, Python)

Source: https://github.com/pydata/patsy/blob/master/doc/categorical-coding.rst

Demonstrates the default behavior of patsy.dmatrix when given a categorical variable, which is to apply Treatment coding.
```python
dmatrix("a", data)
```

--------------------------------

### Parsing a Formula String into ModelDesc - Python

Source: https://github.com/pydata/patsy/blob/master/doc/formulas.rst

Illustrates the standard method for converting a Patsy formula string into a ModelDesc object using the ModelDesc.from_formula() class method. This is the typical way users interact with Patsy's formula parsing, achieving the same result as manual construction but more conveniently.

```python
ModelDesc.from_formula("y ~ a + a:b + np.log(x)")
```

--------------------------------

### Accessing factor information

Source: https://github.com/pydata/patsy/blob/master/doc/API-reference.rst

Retrieves a dictionary mapping factor objects to FactorInfo objects, providing detailed information about each factor used in the design.

```python
di.factor_infos
```

--------------------------------

### Accessing column names

Source: https://github.com/pydata/patsy/blob/master/doc/API-reference.rst

Retrieves the list of column names from a DesignInfo object. The names are returned as strings in the order they appear in the design matrix.

```python
di.column_names
```

--------------------------------

### Applying Patsy B-spline Basis to New Data

Source: https://github.com/pydata/patsy/blob/master/doc/spline-regression.rst

Creates a B-spline design matrix from initial data using `patsy.dmatrix`. It then uses the `design_info` from this matrix to apply the same spline transformation to a new set of data points using `patsy.build_design_matrices`, generating a new design matrix for prediction.
```python
import numpy as np
from patsy import dmatrix, build_design_matrices

data = {"x": np.linspace(0., 1., 100)}
design_matrix = dmatrix("bs(x, df=4)", data)
new_data = {"x": [0.1, 0.25, 0.9]}
build_design_matrices([design_matrix.design_info], new_data)[0]
```

--------------------------------

### Calculate R model.matrix columns for 0 + a:x + a:b (R)

Source: https://github.com/pydata/patsy/blob/master/doc/R-comparison.rst

Shows R's model.matrix behavior when dealing with a mix of numeric (a:x) and categorical (a:b) interaction terms without an intercept (0 +). It incorrectly treats the categorical factor as if it were collinear with the numeric interaction, resulting in fewer columns than expected.

```r
# R:
> x <- c(1, 2, 3, 4)
> mat <- model.matrix(~ 0 + a:x + a:b)
> ncol(mat)
[1] 4
```
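For a side-by-side check, the Patsy half of this comparison can be reproduced in one self-contained script. This is a minimal sketch, not the documentation's own listing: it assumes the same toy factors as the R session, and it passes them through an explicit `data` dictionary (an illustrative choice; the original snippets rely on variables in the calling scope). Under these assumptions Patsy should emit six columns, four for `a:b` and two for `a:x`, while the rank cannot exceed four because there are only four rows.

```python
import numpy as np
from patsy import dmatrix

# Same toy factors as the R session; lists of strings are treated as categorical.
# The explicit dict is an assumption made to keep the example self-contained.
data = {
    "a": ["a1", "a1", "a2", "a2"],
    "b": ["b1", "b2", "b1", "b2"],
    "x": [1, 2, 3, 4],
}

mat = dmatrix("0 + a:x + a:b", data)

column_names = mat.design_info.column_names  # names of the generated columns
n_columns = mat.shape[1]                     # number of design matrix columns
rank = np.linalg.matrix_rank(mat)            # bounded by the number of rows (4)

print(column_names)
print(n_columns, rank)
```

Inspecting `column_names` is usually the quickest way to see which coding Patsy chose for each term, since each name records the factor levels involved.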