### Install Copulas from Source (Unix-based)

Source: https://github.com/sdv-dev/copulas/blob/main/INSTALL.md

Clone the repository and use 'make install' for installing from source on Unix-like systems.

```bash
git clone https://github.com/sdv-dev/Copulas
cd Copulas
git checkout stable
make install
```

--------------------------------

### Install Copulas for Development

Source: https://github.com/sdv-dev/copulas/blob/main/INSTALL.md

Use 'make install-develop' for installing from source when contributing to the project. Branch from 'main' first.

```bash
git clone git@github.com:sdv-dev/Copulas
cd Copulas
git checkout main
git checkout -b <your-branch-name>
make install-develop
```

--------------------------------

### Install Copulas from Source

Source: https://github.com/sdv-dev/copulas/blob/main/RELEASE.md

Clone the Copulas repository and install development requirements. Ensure your virtual environment is activated.

```bash
git clone https://github.com/sdv-dev/Copulas.git
cd Copulas
git checkout main
make install-develop
make install-readme
```

--------------------------------

### Usage Examples for Dataset Sampling Functions

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/datasets.md

Demonstrates how to import and use various dataset sampling functions for bivariate, trivariate, and univariate data generation.
```python
from copulas.datasets import (
    sample_bivariate_age_income,
    sample_trivariate_xyz,
    sample_univariate_normal
)

# Bivariate data
bivariate_data = sample_bivariate_age_income(size=500)
print(bivariate_data.columns)  # ['age', 'income']

# Trivariate data
trivariate_data = sample_trivariate_xyz(size=1000)
print(trivariate_data.shape)  # (1000, 3)

# Univariate data
univariate_data = sample_univariate_normal(size=100)
print(univariate_data.mean())  # Approximately 1.0
```

--------------------------------

### Complete Copulas Workflow Example

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/visualization.md

Demonstrates a full workflow from loading data, fitting a copula, generating synthetic data, and performing 1D, 2D, and 3D comparisons using visualization functions.

```python
from copulas.datasets import sample_trivariate_xyz
from copulas.multivariate import GaussianMultivariate
from copulas.visualization import compare_1d, compare_2d, compare_3d

# Load real data
real_data = sample_trivariate_xyz(size=1000, seed=42)

# Fit copula
copula = GaussianMultivariate()
copula.fit(real_data)

# Generate synthetic data
synthetic_data = copula.sample(1000)

# 1D comparisons
fig1d_x = compare_1d(real_data['x'], synthetic_data['x'], title='X Distribution')
fig1d_x.show()

# 2D comparison
fig2d = compare_2d(real_data, synthetic_data, columns=['x', 'y'])
fig2d.show()

# 3D comparison
fig3d = compare_3d(real_data, synthetic_data)
fig3d.show()
```

--------------------------------

### Complete Copulas Utilities Example

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/utils.md

Demonstrates the combined usage of several Copulas utility functions, including random state validation, instance creation from FQN and class, and retrieving FQNs for serialization.
```python
import numpy as np
from copulas.utils import (
    validate_random_state,
    get_instance,
    get_qualified_name,
    store_args,
    random_state
)

# Validate and use random state
rs = validate_random_state(42)
print(type(rs))  # e.g. <class 'numpy.random.mtrand.RandomState'>

# Create instances from various sources
univariate_cls = 'copulas.univariate.GaussianUnivariate'
model1 = get_instance(univariate_cls, random_state=rs)

# Get FQN for serialization
fqn = get_qualified_name(model1)
print(fqn)  # 'copulas.univariate.gaussian.GaussianUnivariate'

# Recreate from FQN
model2 = get_instance(fqn, random_state=rs)
```

--------------------------------

### Install Copulas with pip

Source: https://github.com/sdv-dev/copulas/blob/main/INSTALL.md

Use this command to install the latest stable release of Copulas from PyPI.

```bash
pip install copulas
```

--------------------------------

### Get Copulas Instance

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/utils.md

Instantiate Copulas models from a string FQN, a class, or an existing instance. Useful for flexible model creation and copying.

```python
from copulas.utils import get_instance

# From string FQN
model1 = get_instance('copulas.univariate.GaussianUnivariate')

# From class
from copulas.univariate import GaussianUnivariate
model2 = get_instance(GaussianUnivariate, random_state=42)

# From instance (creates copy)
model3 = get_instance(model1)
```

--------------------------------

### Install Copulas using Conda

Source: https://github.com/sdv-dev/copulas/blob/main/README.md

Install the Copulas library using Conda. This is an alternative installation method, often preferred for managing complex dependencies.

```bash
conda install -c conda-forge copulas
```

--------------------------------

### Complete Copulas Type Example

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/types.md

Illustrates the instantiation of various copula types (Univariate, Bivariate, Multivariate) and the creation of sample data arrays and DataFrames for use with the library.
```python
from copulas.univariate import Univariate, ParametricType, BoundedType
from copulas.bivariate import Bivariate, CopulaTypes
from copulas.multivariate import GaussianMultivariate, VineCopula
from copulas.multivariate.tree import TreeTypes
import numpy as np
import pandas as pd

# Univariate with type filtering
univariate: Univariate = Univariate(
    parametric=ParametricType.PARAMETRIC,
    bounded=BoundedType.BOUNDED
)

# Bivariate with type selection
bivariate: Bivariate = Bivariate(copula_type=CopulaTypes.CLAYTON)

# Multivariate
gaussian_mv: GaussianMultivariate = GaussianMultivariate()
vine_mv: VineCopula = VineCopula(vine_type='center')

# Data types
univariate_data: np.ndarray = np.array([1.0, 2.0, 3.0])
bivariate_data: np.ndarray = np.array([[0.1, 0.2], [0.3, 0.4]])
multivariate_data: pd.DataFrame = pd.DataFrame({
    'x': [1.0, 2.0, 3.0],
    'y': [0.5, 1.5, 2.5]
})
```

--------------------------------

### Complete GaussianMultivariate Copula Example

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/multivariate.md

Demonstrates the full lifecycle of a GaussianMultivariate copula: fitting to data, generating synthetic samples, serializing and deserializing the model, and computing PDF and CDF values.
```python
from copulas.multivariate import GaussianMultivariate
from copulas.datasets import sample_trivariate_xyz
import pandas as pd

# Load sample data
real_data = sample_trivariate_xyz(size=500)

# Fit Gaussian copula
copula = GaussianMultivariate()
copula.fit(real_data)

# Generate synthetic data
synthetic_data = copula.sample(num_rows=500)

# Serialize/deserialize
model_dict = copula.to_dict()
new_copula = GaussianMultivariate.from_dict(model_dict)

# Compute probabilities
pdf_vals = copula.pdf(real_data)
cdf_vals = copula.cdf(real_data)
```

--------------------------------

### Automatic Copula Selection Example

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/bivariate.md

Demonstrates using `select_copula` to automatically find the best copula for generated bivariate data. Requires importing `select_copula` and numpy.

```python
from copulas.bivariate import select_copula
import numpy as np

data = np.column_stack([
    np.random.uniform(0, 1, 1000),
    np.random.uniform(0, 1, 1000)
])

best_copula = select_copula(data)
print(f"Selected: {best_copula.copula_type}")
```

--------------------------------

### Sample Trivariate Data

Source: https://github.com/sdv-dev/copulas/blob/main/README.md

Load a sample trivariate dataset for demonstration purposes. This function is useful for getting started with the library.

```python
from copulas.datasets import sample_trivariate_xyz

real_data = sample_trivariate_xyz()
real_data.head()
```

--------------------------------

### Verify Reproducibility with Random State in GaussianMultivariate

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

This example verifies that setting the same `random_state` when initializing GaussianMultivariate copulas results in identical samples after fitting. It compares two samples generated with the same seed.
```python
from copulas.multivariate import GaussianMultivariate
from copulas.univariate import (
    ParametricType,
    BoundedType,
    GaussianUnivariate,
    BetaUnivariate
)
from copulas.datasets import sample_trivariate_xyz
import pandas as pd

# Prepare data
real_data = sample_trivariate_xyz(size=2000, seed=42)

# Verify reproducibility with random_state
copula_a = GaussianMultivariate(random_state=42)
copula_a.fit(real_data)
sample_a = copula_a.sample(100)

copula_b = GaussianMultivariate(random_state=42)
copula_b.fit(real_data)
sample_b = copula_b.sample(100)

# Samples should be identical; reduce over rows, then over columns
print((sample_a == sample_b).all().all())  # True
```

--------------------------------

### Install Copulas with conda

Source: https://github.com/sdv-dev/copulas/blob/main/INSTALL.md

Use this command to install the latest stable release of Copulas from Anaconda channels.

```bash
conda install -c sdv-dev -c conda-forge copulas
```

--------------------------------

### Clayton Copula Fit and Sample Example

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/bivariate.md

Demonstrates how to fit a Clayton copula to bivariate data and generate synthetic samples. Requires importing Clayton and numpy.

```python
from copulas.bivariate import Clayton
import numpy as np

# Generate uniform data (the two columns are drawn independently here)
data = np.column_stack([
    np.random.uniform(0, 1, 500),
    np.random.uniform(0, 1, 500)
])

copula = Clayton()
copula.fit(data)
synthetic = copula.sample(100)
```

--------------------------------

### Generate Samples from Fitted GaussianMultivariate Copula

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/multivariate.md

Generate synthetic samples from a fitted GaussianMultivariate copula. This example demonstrates fitting the copula with automatic distribution selection and then sampling new data.
```python
from copulas.multivariate import GaussianMultivariate
import pandas as pd

# Fit with automatic distribution selection
data = pd.DataFrame({
    'age': [25, 30, 35, 40, 45],
    'income': [50000, 60000, 70000, 80000, 90000],
    'score': [0.5, 0.6, 0.7, 0.8, 0.9]
})

copula = GaussianMultivariate()
copula.fit(data)

# Generate synthetic data
synthetic = copula.sample(num_rows=100)
print(synthetic.head())
```

--------------------------------

### Internal Usage Example: Copula Sampling

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/optimize.md

Demonstrates how optimization functions like `bisect` and `chandrupatla` are used internally within copulas for tasks such as computing the percent point function (inverse CDF) and sampling from copulas.

```python
from copulas.bivariate import Clayton
from copulas.optimize import bisect
import numpy as np

# Fit a Clayton copula
data = np.column_stack([
    np.random.uniform(0, 1, 100),
    np.random.uniform(0, 1, 100)
])

copula = Clayton()
copula.fit(data)

# percent_point internally uses chandrupatla to solve:
# Find u such that C(u|v) = y
# where C is the conditional CDF of the copula

# Sample from the copula (uses percent_point)
samples = copula.sample(50)
```

--------------------------------

### Import Utility Functions

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/INDEX.md

Imports various utility functions for handling machine epsilon, validating random states, getting object instances and qualified names, and managing arguments and random states.

```python
from copulas.utils import (
    EPSILON,
    validate_random_state,
    get_instance,
    get_qualified_name,
    store_args,
    random_state,
    vectorize,
    set_random_state,
    check_valid_values,
)
```

--------------------------------

### Example Bivariate Fit Test Case (Frank Copula)

Source: https://github.com/sdv-dev/copulas/blob/main/tests/numerical/README.md

This JSON defines a test case for fitting the Frank copula to a highly correlated dataset.
It includes metadata, the class to test, the input file, the expected outputs (theta and tau) from R and Matlab, and comparison settings.

```json
{
  "metadata": {
    "test_type": "bivariate_fit",
    "dataset_type": "handcrafted",
    "scripts_to_generate": {
      "output": {
        "R": "bivariate_fit_output.R",
        "Matlab": "bivariate_fit_output.m"
      }
    },
    "description": "Test Frank on a highly correlated dataset."
  },
  "test": {
    "class": "copulas.bivariate.frank.Frank",
    "kwargs": {}
  },
  "test_case_inputs": {
    "points": "bivariate_fit_test_case_1_input.csv"
  },
  "expected_output": {
    "R": {
      "theta": 17.0227133037689,
      "tau": 0.787726361443376
    },
    "Matlab": {
      "theta": 17.0227133037689,
      "tau": 0.787726361443376
    }
  },
  "settings": {
    "rtol": 1e-5
  }
}
```

--------------------------------

### Probability Integral Transform Example

Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/01_Introduction_to_Copulas.ipynb

This code demonstrates the Probability Integral Transform (PIT) by sampling from a normal distribution and then transforming the samples using the CDF. The transformed samples are shown to approximate a uniform distribution.

```python
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy import stats

# Draw standard normal samples and map them through the normal CDF
X = stats.norm.rvs(size=10000)
X_pit = stats.norm.cdf(X)

fig = make_subplots(rows=1, cols=2, subplot_titles=("Samples", "Transformed Samples"))

fig.add_trace(
    go.Histogram(x=X),
    row=1, col=1
)
fig.add_trace(
    go.Histogram(x=X_pit),
    row=1, col=2
)

fig.update_layout(height=400, width=900, showlegend=False)
fig.show()
```

--------------------------------

### Generate and View Documentation

Source: https://github.com/sdv-dev/copulas/blob/main/RELEASE.md

Generate the project documentation and open it in a browser to ensure all changes are reflected. Alternatively, use 'make docs' to only generate the documentation.
```bash
make view-docs
```

```bash
make docs
```

--------------------------------

### Complete Configuration Workflow: Default GaussianMultivariate

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

Demonstrates the first option in a complete configuration workflow: initializing GaussianMultivariate with default settings. This uses auto-selected univariate distributions.

```python
from copulas.multivariate import GaussianMultivariate
from copulas.univariate import (
    ParametricType,
    BoundedType,
    GaussianUnivariate,
    BetaUnivariate
)
from copulas.datasets import sample_trivariate_xyz
import pandas as pd

# Prepare data
real_data = sample_trivariate_xyz(size=2000, seed=42)

# Option 1: Default configuration
copula1 = GaussianMultivariate()
```

--------------------------------

### Handling NotFittedError with check_fit()

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/errors.md

Example of catching NotFittedError using the check_fit() method on a univariate model, indicating the model is not ready for use.

```python
from copulas.univariate import Univariate
from copulas.errors import NotFittedError

model = Univariate()

try:
    model.check_fit()
except NotFittedError as e:
    print(f"Model not fitted: {e}")
```

--------------------------------

### Complete Configuration Workflow: Fitting and Sampling All Copulas

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

This snippet demonstrates fitting multiple configured GaussianMultivariate copulas to the same dataset and then sampling from each. It prints the number of generated samples for each configuration.
```python
from copulas.multivariate import GaussianMultivariate
from copulas.univariate import (
    ParametricType,
    BoundedType,
    GaussianUnivariate,
    BetaUnivariate
)
from copulas.datasets import sample_trivariate_xyz
import pandas as pd

# Prepare data
real_data = sample_trivariate_xyz(size=2000, seed=42)

# Option 1: Default configuration
copula1 = GaussianMultivariate()

# Option 2: Parametric distributions only
copula2 = GaussianMultivariate(
    distribution='copulas.univariate.Univariate'
)

# Option 3: Specific distributions per column
copula3 = GaussianMultivariate(
    distribution={
        'x': 'copulas.univariate.BetaUnivariate',
        'y': 'copulas.univariate.BetaUnivariate',
        'z': 'copulas.univariate.GaussianUnivariate'
    },
    random_state=42
)

# Fit all
for i, copula in enumerate([copula1, copula2, copula3], 1):
    copula.fit(real_data)
    synthetic = copula.sample(num_rows=100)
    print(f"Config {i}: Generated {len(synthetic)} samples")
```

--------------------------------

### Get Multivariate Distribution Parameters

Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/03_Multivariate_Distributions.ipynb

Retrieves all parameters defining a multivariate distribution as a Python dictionary. Useful for saving and loading distribution configurations.

```python
parameters = dist.to_dict()
parameters.keys()
```

--------------------------------

### Complete Configuration Workflow: Specific Univariate Distributions Per Column

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

Shows the third option in a complete configuration workflow: specifying different univariate distributions for each column in GaussianMultivariate, along with a random state for reproducibility.
```python
from copulas.multivariate import GaussianMultivariate
from copulas.univariate import (
    ParametricType,
    BoundedType,
    GaussianUnivariate,
    BetaUnivariate
)
from copulas.datasets import sample_trivariate_xyz
import pandas as pd

# Prepare data
real_data = sample_trivariate_xyz(size=2000, seed=42)

# Option 3: Specific distributions per column
copula3 = GaussianMultivariate(
    distribution={
        'x': 'copulas.univariate.BetaUnivariate',
        'y': 'copulas.univariate.BetaUnivariate',
        'z': 'copulas.univariate.GaussianUnivariate'
    },
    random_state=42
)
```

--------------------------------

### Get Fully Qualified Name (FQN)

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/utils.md

Retrieve the fully qualified name (FQN) of a class or instance. This is useful for serialization and recreating objects.

```python
from copulas.utils import get_qualified_name
from copulas.univariate import GaussianUnivariate

model = GaussianUnivariate()
fqn = get_qualified_name(model)
print(fqn)  # 'copulas.univariate.gaussian.GaussianUnivariate'

fqn2 = get_qualified_name(GaussianUnivariate)
print(fqn2)  # 'copulas.univariate.gaussian.GaussianUnivariate'
```

--------------------------------

### Initialize Univariate with Specific Candidates

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

Instantiate the Univariate class with a predefined list of candidate distributions.

```python
from copulas.univariate import Univariate

# Specific candidates
candidates = [
    'copulas.univariate.GaussianUnivariate',
    'copulas.univariate.BetaUnivariate'
]
model = Univariate(candidates=candidates)
```

--------------------------------

### Catching RuntimeWarning for Non-Uniform Marginals

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/errors.md

Issued by `check_marginal()` when data is not uniformly distributed. This example shows how to catch and print RuntimeWarnings during model fitting.
```python
from copulas.bivariate import Bivariate
import numpy as np
import warnings

copula = Bivariate(copula_type='clayton')

# Data with non-uniform marginals
u = np.random.beta(0.5, 0.5, 100)  # Beta distribution, not uniform
v = np.random.uniform(0, 1, 100)
data = np.column_stack([u, v])

with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter("always")
    copula.fit(data)

    for warning in w:
        if issubclass(warning.category, RuntimeWarning):
            print(f"Warning: {warning.message}")
```

--------------------------------

### Complete Configuration Workflow: Parametric Univariate Distributions

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

Illustrates the second option in a complete configuration workflow: using parametric univariate distributions for GaussianMultivariate. The 'distribution' parameter is set to a generic univariate type.

```python
from copulas.multivariate import GaussianMultivariate
from copulas.univariate import (
    ParametricType,
    BoundedType,
    GaussianUnivariate,
    BetaUnivariate
)
from copulas.datasets import sample_trivariate_xyz
import pandas as pd

# Prepare data
real_data = sample_trivariate_xyz(size=2000, seed=42)

# Option 2: Parametric distributions only
copula2 = GaussianMultivariate(
    distribution='copulas.univariate.Univariate'
)
```

--------------------------------

### Instantiate and Fit Vine Copulas

Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/03_Multivariate_Distributions.ipynb

Initializes and fits different types of Vine Copulas (center, regular, direct) to data. This prepares the model for sampling or further analysis.
```python
from copulas.multivariate import VineCopula

center = VineCopula('center')
regular = VineCopula('regular')
direct = VineCopula('direct')

center.fit(data)
regular.fit(data)
direct.fit(data)

center_samples = center.sample(1000)
regular_samples = regular.sample(1000)
direct_samples = direct.sample(1000)
```

--------------------------------

### Initialize Univariate with Default Settings

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

Instantiate the Univariate class to use default settings for auto-selection of candidate distributions.

```python
from copulas.univariate import Univariate

# Default: auto-select from all candidates
model = Univariate()
```

--------------------------------

### Serialize Univariate Distribution Parameters to Dictionary

Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/02_Univariate_Distributions.ipynb

Use the `to_dict` method to get a dictionary of parameters defining a univariate distribution. This is useful for saving and loading distribution configurations.

```python
parameters = beta.to_dict()
```

```python
parameters
```

--------------------------------

### Univariate Distribution Selection and Sampling

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/INDEX.md

Demonstrates automatic selection of a univariate parametric distribution for given data, fitting the model, and generating samples. Also shows how to compute probability density and cumulative distribution.
```python
from copulas.univariate import Univariate, ParametricType, BoundedType
import numpy as np

# Data
data = np.random.gamma(2, 2, 1000)

# Auto-select from parametric distributions
model = Univariate(parametric=ParametricType.PARAMETRIC)
model.fit(data)

# Generate samples
samples = model.sample(100)

# Compute probabilities
pdf = model.probability_density(data)
cdf = model.cumulative_distribution(data)
```

--------------------------------

### get_instance

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/utils.md

Creates an object instance from a string representing its fully qualified name (FQN), a class type, or an existing object. If a string is provided, it handles the import and instantiation. For existing objects, it creates a new instance of the same class, optionally using stored arguments if the `@store_args` decorator was used.

```APIDOC
## get_instance

### Description
Create an instance from a string FQN, class, or existing instance.

### Signature
```python
def get_instance(
    obj: str | type | object,
    **kwargs
) -> object
```

### Parameters
- **obj** (str | type | object) - Fully qualified name, class, or instance.
- **kwargs** (dict) - Arguments to pass to constructor.

### Returns
- object: Instantiated object.

### Behavior
- **String:** Imports module and class, instantiates with kwargs.
- **Type:** Instantiates class with kwargs.
- **Instance:** Creates new instance of same class with kwargs (or uses stored args/kwargs).

### Example
```python
from copulas.utils import get_instance
```

--------------------------------

### Univariate Model Fitting and Sampling

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/errors.md

Demonstrates the correct workflow for fitting a univariate Gaussian model and then sampling from it, contrasting with the NotFittedError scenario.
```python
from copulas.errors import NotFittedError
from copulas.univariate import GaussianUnivariate

model = GaussianUnivariate()

# Sampling before fitting raises NotFittedError
try:
    model.sample(10)
except NotFittedError:
    print("Must fit model first")

# Fit the model
data = [1.0, 2.0, 3.0, 4.0, 5.0]
model.fit(data)

# Now works
samples = model.sample(10)
```

--------------------------------

### Fit and Sample with Univariate Model

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

Demonstrates fitting a Univariate model to data and then sampling from the fitted model.

```python
import numpy as np
from copulas.univariate import Univariate

# Create the model with default settings
model = Univariate()

# Fit and use
data = np.random.normal(0, 1, 10000)
model.fit(data)
samples = model.sample(100)
```

--------------------------------

### Initialize VineCopula with Regular Type and Reproducibility

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

Configure a VineCopula with the 'regular' vine structure and set a random state for reproducible sampling. Compatibility with Python 3.8+ is not guaranteed.

```python
from copulas.multivariate import VineCopula
import pandas as pd
import warnings

# Warning: Vines not fully tested on Python >= 3.8
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # Regular vine with reproducibility
    vine = VineCopula(vine_type='regular', random_state=42)
```

--------------------------------

### Create a Branch on SDV for Testing

Source: https://github.com/sdv-dev/copulas/blob/main/RELEASE.md

Create a new branch in the SDV repository to test the Copulas release candidate. Update pyproject.toml to specify the minimum Copulas version.

```bash
git checkout -b test-copulas-X.Y.Z
```

--------------------------------

### Initialize Univariate with Reproducibility

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

Instantiate the Univariate class and set a random state for reproducible results.
```python
from copulas.univariate import Univariate

# With reproducibility
model = Univariate(random_state=42)
```

--------------------------------

### Initialize Univariate with Subsample for Selection

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

Instantiate the Univariate class with a specified subsample size for faster candidate selection on large datasets.

```python
from copulas.univariate import Univariate

# Subsample for faster selection on large data
model = Univariate(selection_sample_size=5000)
```

--------------------------------

### Run Tests and Linting

Source: https://github.com/sdv-dev/copulas/blob/main/RELEASE.md

Execute the test suite and linting checks. The tests should complete without errors, though warnings are acceptable.

```bash
make test && make lint
```

--------------------------------

### Initialize GaussianMultivariate with Different Distributions Per Column

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

Set up a GaussianMultivariate copula with distinct univariate distributions for different columns. This allows for fine-grained control over marginal modeling. Missing columns will use the default.

```python
from copulas.multivariate import GaussianMultivariate
from copulas.univariate import GaussianUnivariate
import pandas as pd

# Different distributions per column
copula = GaussianMultivariate(
    distribution={
        'age': 'copulas.univariate.GammaUnivariate',
        'income': 'copulas.univariate.GaussianUnivariate',
        'score': 'copulas.univariate.BetaUnivariate'
    }
)
```

--------------------------------

### Check Release Readiness

Source: https://github.com/sdv-dev/copulas/blob/main/RELEASE.md

Run the 'check-release' make command to verify if the release can be made after HISTORY.md has been updated on the main branch.
```bash
make check-release
```

--------------------------------

### Initialize VineCopula with Direct Type

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

Instantiate a VineCopula using the 'direct' vine structure. Be aware of potential compatibility issues with Python versions 3.8 and above.

```python
from copulas.multivariate import VineCopula
import pandas as pd
import warnings

# Warning: Vines not fully tested on Python >= 3.8
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # Direct vine
    vine = VineCopula(vine_type='direct')
```

--------------------------------

### Compare Real vs Synthetic Data Distributions

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/INDEX.md

Illustrates how to compare the distributions of real and synthetic data using 1D, 2D, and 3D comparison plots. Requires fitting a copula first.

```python
from copulas.multivariate import GaussianMultivariate
from copulas.visualization import compare_1d, compare_2d, compare_3d
from copulas.datasets import sample_trivariate_xyz

# Fit copula
real_data = sample_trivariate_xyz(size=500)
copula = GaussianMultivariate()
copula.fit(real_data)
synthetic_data = copula.sample(500)

# 1D comparison
fig1d = compare_1d(real_data['x'], synthetic_data['x'])
fig1d.show()

# 2D comparison
fig2d = compare_2d(real_data, synthetic_data, columns=['x', 'y'])
fig2d.show()

# 3D comparison
fig3d = compare_3d(real_data, synthetic_data)
fig3d.show()
```

--------------------------------

### Fit Univariate Model with Subsampling

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/INDEX.md

Demonstrates how to use subsampling for candidate selection when fitting a univariate model to large datasets. This is useful when dealing with millions of samples.
```python
from copulas.univariate import Univariate

# Use subsampling for candidate selection on large data
# (`large_data` is assumed to be an existing array of about 1M samples)
model = Univariate(selection_sample_size=10000)  # Sample 10k instead of all 1M
model.fit(large_data)
```

--------------------------------

### Fit Multivariate Model with Column Subsets

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/INDEX.md

Shows how to select a subset of important columns from a larger dataset before fitting a multivariate model. This can improve performance and focus the model.

```python
# For multivariate, consider column subsets
# (`data` is assumed to be an existing DataFrame containing these columns)
important_cols = ['col1', 'col2', 'col3']
subset_data = data[important_cols]
```

--------------------------------

### Initialize Bivariate with Copula Type as String

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

Instantiate the Bivariate class using a string to specify the copula type, such as Frank.

```python
from copulas.bivariate import Bivariate

# Or using string
copula = Bivariate(copula_type='frank')
```

--------------------------------

### Initialize Bivariate with Reproducibility

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md

Instantiate the Bivariate class with a specified copula type and a random seed for reproducible results.

```python
from copulas.bivariate import Bivariate

# With random seed
copula = Bivariate(copula_type='gumbel', random_state=42)
```

--------------------------------

### Create Vine Copulas with Different Tree Types

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/types.md

Shows how to create vine copulas with different tree structures using string representations for tree types like 'center' or 'direct'.
```python
from copulas.multivariate import VineCopula, Tree

# Create vine copula with center tree structure
vine = VineCopula(vine_type='center')

# Or direct
vine = VineCopula(vine_type='direct')
```

--------------------------------

### Load Sample Data

Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/00_Quickstart.ipynb

Load a built-in trivariate dataset using the sample_trivariate_xyz function.

```python
from copulas.datasets import sample_trivariate_xyz

data = sample_trivariate_xyz()
data.head()
```

--------------------------------

### Fit Gaussian Copula and Generate Synthetic Data

Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/README Quickstart.ipynb

Load a sample dataset, fit a Gaussian copula, generate synthetic data, and compare it with the original data. Requires `copulas` library and its dependencies.

```python
import warnings
warnings.filterwarnings('ignore')

from copulas.datasets import sample_trivariate_xyz
from copulas.multivariate import GaussianMultivariate
from copulas.visualization import compare_3d

# Load a dataset with 3 columns that are not independent
real_data = sample_trivariate_xyz()

# Fit a gaussian copula to the data
copula = GaussianMultivariate()
copula.fit(real_data)

# Sample synthetic data
synthetic_data = copula.sample(len(real_data))

# Plot the real and the synthetic data to compare
compare_3d(real_data, synthetic_data)
```

--------------------------------

### Import Sample Datasets Functions

Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/INDEX.md

Imports various functions for sampling test and correlated data, as well as different types of univariate distributions.
```python from copulas.datasets import ( sample_trivariate_xyz, sample_bivariate_age_income, sample_univariate_bernoulli, sample_univariate_bimodal, sample_univariate_uniform, sample_univariate_normal, sample_univariate_degenerate, sample_univariate_exponential, sample_univariate_beta, sample_univariates, ) ``` -------------------------------- ### Initialize GaussianMultivariate with Reproducibility Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md Create a GaussianMultivariate copula with a specified random state for reproducible results. This is crucial for testing and consistent model behavior. ```python from copulas.multivariate import GaussianMultivariate from copulas.univariate import GaussianUnivariate import pandas as pd # With reproducibility copula = GaussianMultivariate(random_state=42) ``` -------------------------------- ### Import Optimization Functions Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/INDEX.md Imports functions for numerical optimization, including bisection and hybrid bisection-interpolation methods. ```python from copulas.optimize import ( bisect, chandrupatla, ) ``` -------------------------------- ### Load and Split Dataset Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/04_Synthetic_Data_for_Machine_Learning.ipynb Imports necessary libraries and loads the diabetes dataset, then splits it into training and testing sets. ```python import warnings warnings.filterwarnings('ignore') from sklearn.datasets import load_diabetes from sklearn.model_selection import train_test_split X, y = load_diabetes(return_X_y=True) X_train, X_test, y_train, y_test = train_test_split(X, y) ``` -------------------------------- ### Bisection Method with Different Tolerances Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/optimize.md Demonstrates how to set different convergence tolerances for the bisection method. 
A tighter tolerance yields higher precision but takes longer, while a looser tolerance is faster but less precise. The default tolerance is typically 1e-8. ```python from copulas.optimize import bisect # Tight tolerance for high precision result = bisect(f, xmin, xmax, tol=1e-12) ``` ```python # Loose tolerance for faster computation result = bisect(f, xmin, xmax, tol=1e-6) ``` ```python # Default is usually 1e-8 (good balance) result = bisect(f, xmin, xmax) ``` -------------------------------- ### Initialize Univariate for Bounded Distributions Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md Instantiate the Univariate class to filter for bounded distributions only. ```python from copulas.univariate import Univariate, BoundedType # Bounded distributions only model = Univariate(bounded=BoundedType.BOUNDED) ``` -------------------------------- ### Visualize C-Vine Samples Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/03_Multivariate_Distributions.ipynb Generates a 3D scatter plot for samples generated by the C-Vine copula. Allows visual assessment of the model's fit. ```python scatter_3d(center_samples, title='C-Vine') ``` -------------------------------- ### Import Visualization Functions Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/INDEX.md Imports functions for visualizing univariate and multivariate distributions, including comparisons between real and synthetic data. ```python from copulas.visualization import ( dist_1d, compare_1d, scatter_2d, compare_2d, dist_3d, compare_3d, PlotConfig, ) ``` -------------------------------- ### Sample from Fitted Generic Univariate Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/02_Univariate_Distributions.ipynb After fitting a generic `Univariate` instance, you can sample data from the selected distribution. The `to_dict` method can then be used to inspect the chosen distribution and its learned parameters.
```python parameters = univariate.to_dict() ``` ```python parameters ``` -------------------------------- ### Release Major Version Source: https://github.com/sdv-dev/copulas/blob/main/RELEASE.md Execute this command to bump and release the next major version. This is for changes that modify the user API in a backwards incompatible way after v1.0.0. ```bash make release-major ``` -------------------------------- ### Compare Original and Synthetic Data Distributions Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/01_Introduction_to_Copulas.ipynb Visualizes a comparison between the original dataset and the synthetic data generated by the copula. This is crucial for evaluating the quality of the synthetic data. ```python from copulas.visualization import compare_2d compare_2d(df, synthetic) ``` -------------------------------- ### Fit Gaussian Copula and Sample Data Source: https://github.com/sdv-dev/copulas/blob/main/README.md Model the loaded data using a Gaussian Copula and generate synthetic data. This demonstrates the core functionality of fitting a copula and sampling from it. ```python from copulas.multivariate import GaussianMultivariate copula = GaussianMultivariate() copula.fit(real_data) synthetic_data = copula.sample(len(real_data)) ``` -------------------------------- ### Visualize D-Vine Samples Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/03_Multivariate_Distributions.ipynb Generates a 3D scatter plot for samples generated by the D-Vine copula. Helps in evaluating the quality of the dependency modeling. ```python scatter_3d(direct_samples, title='D-Vine') ``` -------------------------------- ### Fit and Sample with GaussianMultivariate Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md Demonstrates fitting a GaussianMultivariate copula to data and then sampling synthetic data from the fitted model. Ensure the data is in a pandas DataFrame format. 
```python from copulas.multivariate import GaussianMultivariate from copulas.univariate import GaussianUnivariate import pandas as pd # Default: auto-select univariate for each column copula = GaussianMultivariate() # Fit and use data = pd.DataFrame({ 'x': [1, 2, 3, 4, 5], 'y': [0.1, 0.2, 0.3, 0.4, 0.5], 'z': [10, 20, 30, 40, 50] }) copula.fit(data) synthetic = copula.sample(100) ``` -------------------------------- ### Sample Data from Fitted Model Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/03_Multivariate_Distributions.ipynb Generates synthetic data from the fitted GaussianMultivariate model. The `sample` method creates new data points that mimic the characteristics of the original data. ```python sampled = dist.sample(1000) ``` -------------------------------- ### Visualize R-Vine Samples Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/03_Multivariate_Distributions.ipynb Generates a 3D scatter plot for samples generated by the R-Vine copula. Used to visually compare the modeled distribution with the real data. ```python scatter_3d(regular_samples, title='R-Vine') ``` -------------------------------- ### Initialize GaussianMultivariate with Default Settings Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md Use this snippet to create a GaussianMultivariate copula without specifying univariate distributions. The copula will automatically select appropriate univariate distributions for each column. ```python from copulas.multivariate import GaussianMultivariate from copulas.univariate import GaussianUnivariate import pandas as pd # Default: auto-select univariate for each column copula = GaussianMultivariate() ``` -------------------------------- ### Initialize Univariate for Parametric Distributions Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md Instantiate the Univariate class to filter for parametric distributions only. 
```python from copulas.univariate import Univariate, ParametricType # Parametric only model = Univariate(parametric=ParametricType.PARAMETRIC) ``` -------------------------------- ### Create Gaussian Copula Instance Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/00_Quickstart.ipynb Create an instance of the GaussianMultivariate class to model the dataset. ```python from copulas.multivariate import GaussianMultivariate copula = GaussianMultivariate() ``` -------------------------------- ### Fit and Sample with Bivariate Copula Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/configuration.md Demonstrates fitting a Bivariate copula to two-dimensional data and then sampling from the fitted copula. ```python import numpy as np from copulas.bivariate import Bivariate # Create the copula before fitting (Frank chosen here as an example type) copula = Bivariate(copula_type='frank') # Fit to data data = np.random.uniform(0, 1, (1000, 2)) copula.fit(data) samples = copula.sample(100) ``` -------------------------------- ### Logical Organization of Copulas Project Files Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/MANIFEST.md This structure outlines the key files and their purposes within the Copulas project documentation, aiding in navigation and understanding of the project's organization. ```text output/ ├── README.md ← Overview ├── INDEX.md ← Navigation hub ├── types.md ← Enumerations and types ├── configuration.md ← Constructor parameters ├── errors.md ← Exception handling └── api-reference/ ├── univariate.md ← Distribution classes ├── bivariate.md ← Copula pairs ├── multivariate.md ← High-dimensional ├── datasets.md ← Test data ├── visualization.md ← Plotting functions ├── utils.md ← Utility functions └── optimize.md ← Root-finding algorithms ``` -------------------------------- ### Compare Data Distributions Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/02_Univariate_Distributions.ipynb Visualizes and compares the distribution of the original data with the newly sampled data. Helps to assess the quality of the fit.
```python from copulas.visualization import compare_1d compare_1d(data, sampled) ``` -------------------------------- ### Instantiate Object from String, Class, or Instance Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/utils.md A utility function to create an object instance. It can handle a fully qualified name (string), a class type, or an existing instance, passing additional keyword arguments to the constructor. ```python from copulas.utils import get_instance ``` -------------------------------- ### Chandrupatla's Hybrid Root Finding Algorithm Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/optimize.md Employ `chandrupatla` for faster root finding using a hybrid bisection-interpolation approach, especially when function evaluations are expensive. It handles multiple root-finding problems in parallel and guarantees convergence, provided each interval brackets a root (the function changes sign between `xmin` and `xmax`). ```python from copulas.optimize import chandrupatla import numpy as np def f(x): return x**3 - 2 # Root at 2^(1/3) result = chandrupatla(f, xmin=np.array([1.0]), xmax=np.array([2.0])) print(result) # Approximately [1.26...] # Vectorized: two brackets solved at once; f changes sign inside each one xmin = np.array([1.0, 0.0]) xmax = np.array([2.0, 3.0]) result = chandrupatla(f, xmin, xmax) ``` -------------------------------- ### Sample New Data Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/00_Quickstart.ipynb Generate a specified number of synthetic data samples using the fitted copula model. ```python num_samples = 1000 synthetic_data = copula.sample(num_samples) synthetic_data.head() ``` -------------------------------- ### Create Copula from Parameters Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/00_Quickstart.ipynb Create a new copula instance from a dictionary of parameters.
```python new_copula = GaussianMultivariate.from_dict(copula_params) ``` -------------------------------- ### Release Minor Version Source: https://github.com/sdv-dev/copulas/blob/main/RELEASE.md Use this command to bump and release the next minor version. This is for changes that modify the existing user API, even if backwards compatible. ```bash make release-minor ``` -------------------------------- ### Fit GaussianMultivariate Model Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/03_Multivariate_Distributions.ipynb Initializes and fits a GaussianMultivariate model to the provided data. The model learns the marginal distributions and the correlations between them. ```python from copulas.multivariate import GaussianMultivariate dist = GaussianMultivariate() dist.fit(data) ``` -------------------------------- ### Display Probability Density Samples Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/02_Univariate_Distributions.ipynb Shows the first few calculated probability density values for the sampled data. Useful for inspecting the PDF output. ```python probability_density[0:5] ``` -------------------------------- ### Recreate Specific Univariate Model Instance Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/02_Univariate_Distributions.ipynb Demonstrates that when recreating a model using `Univariate.from_dict` after fitting to specific data (e.g., 'beta'), the resulting instance is of the specific subclass (e.g., `BetaUnivariate`), not the generic `Univariate` class. ```python univariate = Univariate() univariate.fit(data['beta']) # Export the fitted parameters, then rebuild a model from them parameters = univariate.to_dict() new_model = Univariate.from_dict(parameters) ``` ```python new_model.__class__ ``` -------------------------------- ### Fit Univariate Distribution with Parametric and Bounded Constraints Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/02_Univariate_Distributions.ipynb Filters the search for the best univariate distribution by specifying desired parametric and bounded types.
Requires importing ParametricType and BoundedType. ```python from copulas.univariate import ParametricType, BoundedType univariate = Univariate( parametric=ParametricType.PARAMETRIC, bounded=BoundedType.BOUNDED ) univariate.fit(data['bimodal']) univariate.to_dict() ``` -------------------------------- ### Compare Real and Synthetic Data in 3D Source: https://github.com/sdv-dev/copulas/blob/main/README.md Visualize the real and synthetic data side-by-side using a 3D scatterplot. This helps in assessing the quality of the generated synthetic data. ```python from copulas.visualization import compare_3d compare_3d(real_data, synthetic_data) ``` -------------------------------- ### Generate Beta Distribution Samples Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/api-reference/datasets.md Generates a pandas Series of samples from a beta distribution with specified parameters. Useful for creating bounded data within a specific range. ```python from copulas.datasets import sample_univariate_beta # Generate 1000 samples with default seed beta_samples = sample_univariate_beta(size=1000) # Display the first 5 samples and their characteristics print(beta_samples.head()) print(f"\nMin value: {beta_samples.min()}") print(f"Max value: {beta_samples.max()}") ``` -------------------------------- ### Handling NotFittedError with Try-Except Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/errors.md Shows how to use a try-except block to catch `NotFittedError` when attempting to use an unfitted model. The model is then fitted and the operation retried. 
```python from copulas.bivariate import Frank from copulas.errors import NotFittedError import numpy as np copula = Frank() try: # Attempt to use unfitted model samples = copula.sample(10) except NotFittedError: # Fit and retry data = np.random.uniform(0, 1, (100, 2)) copula.fit(data) samples = copula.sample(10) print(f"Generated {len(samples)} samples") ``` -------------------------------- ### Sample Bivariate Age and Income Data Source: https://github.com/sdv-dev/copulas/blob/main/tutorials/01_Introduction_to_Copulas.ipynb Generates a sample bivariate dataset for age and income using the copulas library. This is useful for initial exploration and testing. ```python from copulas.datasets import sample_bivariate_age_income df = sample_bivariate_age_income() df.head() ``` -------------------------------- ### Using check_fit() with Exception Handling Source: https://github.com/sdv-dev/copulas/blob/main/_autodocs/errors.md Illustrates how to use the `check_fit()` method within a try-except block to catch `NotFittedError`. This is useful for validating the model's state before proceeding with operations. ```python from copulas.multivariate import GaussianMultivariate from copulas.errors import NotFittedError copula = GaussianMultivariate() try: copula.check_fit() except NotFittedError as e: print(f"Cannot proceed: {e}") # Handle appropriately (fit model, load from file, etc.) ```
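The two patterns above (catching `NotFittedError` and calling `check_fit()` first) can be combined into a single guard helper. The following is a minimal sketch that runs without copulas installed: `TinyModel`, the local `NotFittedError`, and `sample_when_ready` are hypothetical stand-ins written for illustration and are not part of the library. With copulas, one would presumably import `NotFittedError` from `copulas.errors` and pass a real model instead.

```python
import random


class NotFittedError(Exception):
    """Local stand-in for copulas.errors.NotFittedError (assumption)."""


class TinyModel:
    """Hypothetical model exposing fit / check_fit / sample, as in the examples above."""

    def __init__(self, seed=0):
        self.fitted = False
        self._low = None
        self._high = None
        self._rng = random.Random(seed)

    def fit(self, data):
        # "Learn" only the observed range of the data.
        self._low, self._high = min(data), max(data)
        self.fitted = True

    def check_fit(self):
        # Raise if fit() has not been called yet.
        if not self.fitted:
            raise NotFittedError('This model is not fitted.')

    def sample(self, n):
        self.check_fit()
        return [self._rng.uniform(self._low, self._high) for _ in range(n)]


def sample_when_ready(model, n, fallback_data):
    """Return n samples, fitting on fallback_data first if the model is unfitted."""
    try:
        model.check_fit()
    except NotFittedError:
        model.fit(fallback_data)
    return model.sample(n)


model = TinyModel()
samples = sample_when_ready(model, 5, fallback_data=[1.0, 2.0, 3.0])
print(len(samples))  # 5
```

Checking the state before sampling keeps the fallback fit explicit in one place, rather than relying on whichever method happens to raise first.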