### Install from Main Branch using Git

Source: https://github.com/bashtage/linearmodels/blob/main/README.md

Clone the repository and install directly from the main branch to get the latest development version. Git must be installed.

```bash
git clone https://github.com/bashtage/linearmodels
cd linearmodels
pip install .
```

--------------------------------

### Install Latest Release using Pip

Source: https://github.com/bashtage/linearmodels/blob/main/README.md

Use this command to install the most recent stable release of the linearmodels package from PyPI.

```bash
pip install linearmodels
```

--------------------------------

### Import Common Libraries

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Imports the libraries used for data manipulation and statistical modeling. Make sure they are installed before running the examples.

```python
# Common libraries
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
```

--------------------------------

### Load and Prepare Birthweight Data

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_basic-examples.ipynb

Loads the birthweight dataset, prints its description and adds a constant column. This is a common setup for regression analysis.

```python
from statsmodels.api import add_constant

from linearmodels.datasets import birthweight

data = birthweight.load()
print(birthweight.DESCR)
data = add_constant(data)
```

--------------------------------

### Setup and Fit SUR Model

Source: https://github.com/bashtage/linearmodels/blob/main/doc/source/system/index.rst

Use an `OrderedDict` to define the equations for SUR estimation. Each equation is a dictionary that specifies its dependent and exogenous variables. The model is then fitted with the `.fit()` method.
```python
from collections import OrderedDict

import statsmodels.api as sm

from linearmodels.datasets import fringe
from linearmodels.system import SUR

data = sm.add_constant(fringe.load())
equations = OrderedDict()
equations['earnings'] = {'dependent': data.hrearn,
                         'exog': data[['const', 'exper', 'tenure']]}
equations['benefits'] = {'dependent': data.hrbens,
                         'exog': data[['const', 'exper', 'tenure']]}
mod = SUR(equations)
mod.fit(cov_type='unadjusted')
```

--------------------------------

### Load and Prepare Card Dataset

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_basic-examples.ipynb

Loads the card dataset, which is used in an example where proximity to college serves as an instrument for education, and adds a constant column.

```python
from statsmodels.api import add_constant

from linearmodels.datasets import card

data = card.load()
print(card.DESCR)
data = add_constant(data)
```

--------------------------------

### Feasible GLS-type Estimation Setup

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_basic-examples.ipynb

Squares the residuals from a previous IV estimation and regresses their logarithm on the exogenous variables. The exponentiated fitted values estimate the residual variance, which prepares for weighted IV estimation.

```python
# res: results of a previous IV estimation on the `men` data
res2 = res.resids**2
# Regress log squared residuals on the exogenous variables (OLS via IV2SLS)
fgls_mod = IV2SLS(np.log(res2), men[["const", "sibs", "exper"]], None, None)
fgls_res = fgls_mod.fit()
# Fitted values = log(res2) - residuals; exponentiate to get variance estimates
sigma2_hat = np.exp(np.log(res2) - fgls_res.resids)
print(fgls_res)
```

--------------------------------

### Estimate System of Equations using Dictionary (GLS)

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_three-stage-ls.ipynb

Estimates the system by GLS, which can produce more efficient estimates. The results may differ from those of the OLS-based fit.
```python
system_3sls = IV3SLS.from_formula(equations, data)
system_3sls_res = system_3sls.fit(method="gls", cov_type="unadjusted")
print(system_3sls_res)
```

--------------------------------

### Basic IV2SLS Model Estimation

Source: https://github.com/bashtage/linearmodels/blob/main/doc/source/iv/introduction.rst

Demonstrates how to set up and estimate a basic instrumental variables (IV) model with the `IV2SLS` estimator. The model requires dependent, exogenous and endogenous variables as well as instruments. The result object contains detailed estimation statistics.

```python
import pandas as pd
import numpy as np
import statsmodels.api as sm
from linearmodels.iv import IV2SLS
from linearmodels.datasets import wage

data = wage.load()
dependent = np.log(data.wage)
exog = sm.add_constant(data.exper)
endog = data.educ
instruments = data.sibs
mod = IV2SLS(dependent, exog, endog, instruments)
res = mod.fit(cov_type='unadjusted')
print(res)
```

--------------------------------

### Simulate Data for Absorbing Regression

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_absorbing-regression.ipynb

Generates a simulated dataset with state and firm effects for fitting absorbing regression models. Its structure mirrors data commonly encountered in practice.
```python
import numpy as np
import pandas as pd

rs = np.random.RandomState(0)
nobs = 1_000_000
state_id = rs.randint(50, size=nobs)
state_effects = rs.standard_normal(state_id.max() + 1)
state_effects = state_effects[state_id]
# 5 workers/firm, on average
firm_id = rs.randint(nobs // 5, size=nobs)
firm_effects = rs.standard_normal(firm_id.max() + 1)
firm_effects = firm_effects[firm_id]
cats = pd.DataFrame(
    {"state": pd.Categorical(state_id), "firm": pd.Categorical(firm_id)}
)
# Idiosyncratic error term
eps = rs.standard_normal(nobs)
x = rs.standard_normal((nobs, 2))
x = np.column_stack([np.ones(nobs), x])
y = x.sum(1) + firm_effects + state_effects + eps
```

--------------------------------

### Predict with Panel IV Model

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_basic-examples.ipynb

Generates predictions from an IV model fitted to the panel data in `deltas`. The prediction takes both the exogenous variables (here only a constant) and the endogenous variable.

```python
import numpy as np
import pandas as pd

from linearmodels.iv import IV2SLS

# deltas: panel data prepared in an earlier step
mod = IV2SLS.from_formula("lscrap ~ 1 + [hrsemp ~ grant]", deltas)
res_iv = mod.fit(cov_type="unadjusted")
n = deltas.shape[0]
# Exogenous regressors for prediction: a column of ones for the constant
pred_exog = pd.DataFrame(np.ones((n, 1)), index=deltas.index)
res_iv.predict(exog=pred_exog, endog=deltas[["hrsemp"]])
```

--------------------------------

### Estimate First Stage for Weak Instrument Example

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_basic-examples.ipynb

Estimates the first stage of a 2SLS model to check for weak instruments: the endogenous variable (`packs`) is regressed on the constant and the instrument (`cigprice`). Note that the unadjusted covariance estimator is used.

```python
res = IV2SLS(data.packs, data[["const", "cigprice"]], None, None).fit(
    cov_type="unadjusted"
)
print(res)
```

--------------------------------

### Estimate PanelOLS with Entity Effects

Source: https://github.com/bashtage/linearmodels/blob/main/doc/source/panel/pandas.rst

This example loads the Grunfeld dataset, prepares it for `PanelOLS` by setting a MultiIndex, and estimates a model with entity effects using the `PanelOLS` class.
The `fit` method is called with `debiased=True`, which applies a degree-of-freedom adjustment to the covariance estimator.

```python
import numpy as np
from statsmodels.datasets import grunfeld

from linearmodels import PanelOLS

data = grunfeld.load_pandas().data
data.year = data.year.astype(np.int64)
etdata = data.set_index(['firm', 'year'])
PanelOLS(etdata.invest, etdata[['value', 'capital']], entity_effects=True).fit(debiased=True)
```

--------------------------------

### Estimate IV Model with Formula

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_using-formulas.ipynb

Estimates an IV model with the `from_formula` interface, which takes a formula string and a DataFrame and simplifies model setup. The formula lists the dependent and exogenous variables. The endogenous variable and its instrument appear in square brackets.

```python
formula = (
    "ldrugexp ~ 1 + totchr + female + age + linc + blhisp + [hi_empunion ~ ssiratio]"
)
mod = IV2SLS.from_formula(formula, data)
```

```python
iv_res = mod.fit(cov_type="robust")
print(iv_res)
```

--------------------------------

### Print SUR Model Summary (First 33 Lines)

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Prints the first 33 lines of the SUR model's summary text, which cover the results for two regions.

```python
print("\n".join(res.summary.as_text().split("\n")[:33]))
```

--------------------------------

### Load and Prepare Panel Data

Source: https://github.com/bashtage/linearmodels/blob/main/examples/panel_examples.ipynb

Loads the wage panel dataset and prepares it for panel analysis by setting a MultiIndex and keeping the year as a categorical column. Prints the dataset description and the first few rows.
```python
import pandas as pd

from linearmodels.datasets import wage_panel

data = wage_panel.load()
year = pd.Categorical(data.year)
data = data.set_index(["nr", "year"])
data["year"] = year
print(wage_panel.DESCR)
print(data.head())
```

--------------------------------

### Kernel (HAC) Covariance Estimation

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Applies a kernel-based heteroskedasticity and autocorrelation consistent (HAC) covariance estimator. This example uses the Parzen kernel.

```python
# fmod: system model defined in an earlier step
hac_res = fmod.fit(cov_type="kernel", kernel="parzen")
print(hac_res.summary)
```

--------------------------------

### 2-Step Estimation with LinearFactorModel

Source: https://github.com/bashtage/linearmodels/blob/main/examples/asset-pricing_examples.ipynb

Estimates factor loadings and risk premia with a two-step procedure. Test portfolios and factors must be defined first.

```python
from linearmodels.asset_pricing import LinearFactorModel

factors = data[["MktRF", "SMB", "HML", "Mom"]]
portfolios = data[
    ["S1M1", "S1M3", "S1M5", "S3M1", "S3M3", "S3M5", "S5M1", "S5M3", "S5M5"]
]
mod = LinearFactorModel(portfolios, factors)
res = mod.fit()
print(res)
```

--------------------------------

### Load and Prepare Wage Dataset for Men

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_basic-examples.ipynb

Loads the wage dataset, keeps the columns used in the analysis, adds a constant and drops rows with missing values to prepare for regression analysis.

```python
from statsmodels.api import add_constant

from linearmodels.datasets import wage

men = wage.load()
print(wage.DESCR)
men = men[["educ", "wage", "sibs", "exper"]]
men = add_constant(men)
men = men.dropna()
```

--------------------------------

### Get Model Parameter Names

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Retrieves the parameter names of the SUR model. This helps you understand the layout of the parameter vector when defining constraints.
```python
mod.param_names[:14]
```

--------------------------------

### Load and Prepare Data for Formulas

Source: https://github.com/bashtage/linearmodels/blob/main/examples/panel_using-formulas.ipynb

Loads the Grunfeld dataset and prepares it for formula-based models by setting a MultiIndex of `firm` and `year`.

```python
from statsmodels.datasets import grunfeld

data = grunfeld.load_pandas().data
data = data.set_index(["firm", "year"])
print(data.head())
```

--------------------------------

### Clustered Covariance Estimation

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Estimates the model with a clustered covariance estimator, which requires the cluster variable(s). This example uses randomly generated clusters for illustration.

```python
rs = np.random.RandomState([983476381, 28390328, 23829810])
# Assign each of the 616 observations a random cluster id between 0 and 50
random_clusters = rs.randint(0, 51, size=(616, 1))
clustered_res = fmod.fit(cov_type="clustered", clusters=random_clusters)
print(clustered_res.summary)
```

--------------------------------

### Load and Prepare French Data

Source: https://github.com/bashtage/linearmodels/blob/main/examples/asset-pricing_formulas.ipynb

Loads the French dataset and converts the portfolio returns to excess returns by subtracting the risk-free rate.

```python
from linearmodels.datasets import french

data = french.load()
print(french.DESCR)
# Columns 6 onward hold portfolio returns; subtract the risk-free rate
data.iloc[:, 6:] = data.iloc[:, 6:].values - data[["RF"]].values
```

--------------------------------

### Load and Prepare Munnell Dataset

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Loads the Munnell dataset and prepares it for regional analysis. Each state is mapped to a region, each state's share of regional employment is computed, and unemployment rates are weighted by that share. Requires the `linearmodels` library.
```python
from linearmodels.datasets import munnell

data = munnell.load()

regions = {
    "GF": ["AL", "FL", "LA", "MS"],
    "MW": ["IL", "IN", "KY", "MI", "MN", "OH", "WI"],
    "MA": ["DE", "MD", "NJ", "NY", "PA", "VA"],
    "MT": ["CO", "ID", "MT", "ND", "SD", "WY"],
    "NE": ["CT", "ME", "MA", "NH", "RI", "VT"],
    "SO": ["GA", "NC", "SC", "TN", "WV", "AR"],
    "SW": ["AZ", "NV", "NM", "TX", "UT"],
    "CN": ["AK", "IA", "KS", "MO", "NE", "OK"],
    "WC": ["CA", "OR", "WA"],
}


def map_region(state):
    # Return the region whose list contains the state abbreviation
    for key in regions:
        if state in regions[key]:
            return key


data["REGION"] = data.ST_ABB.map(map_region)
data["TOTAL_EMP"] = data.groupby(["REGION", "YR"])["EMP"].transform("sum")
data["EMP_SHARE"] = data.EMP / data.TOTAL_EMP
data["WEIGHED_UNEMP"] = data.EMP_SHARE * data.UNEMP
```

--------------------------------

### Get Estimated Kappa for LIML

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_advanced-examples.ipynb

Prints the estimated value of the kappa parameter from a LIML estimation. Kappa determines the LIML estimator within the k-class family, where kappa = 1 corresponds to 2SLS. In this dataset, the estimate is close to 1.

```python
print(res_liml.kappa)
```

--------------------------------

### Prepare Data for SUR Model

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Prepares the data for a SUR model in an `OrderedDict`. Each key is an equation label. Each value is a dictionary containing `'dependent'`, `'exog'` and, optionally, `'weights'`.
```python
from collections import OrderedDict

# agg_data: region-level data indexed by region, built in an earlier step (not shown)
mod_data = OrderedDict()
for region in ["GF", "SW", "WC", "MT", "NE", "MA", "SO", "MW", "CN"]:
    region_data = agg_data.loc[region]
    dependent = region_data.lnGSP
    exog = region_data[
        ["Intercept", "lnPC", "lnHWY", "lnWATER", "lnUTIL", "lnEMP", "UNEMP"]
    ]
    mod_data[region] = {"dependent": dependent, "exog": exog}
```

--------------------------------

### Continuously Updating GMM (CUE) Estimation

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_advanced-examples.ipynb

Shows how to use the continuously updating GMM estimator. It optimizes over the moment conditions and the weighting matrix jointly, which can improve efficiency. The `display` option prints the optimizer output.

```python
# controls: list of exogenous control variables defined in an earlier step
ivmod = IVGMMCUE(
    data.ldrugexp, data[controls], data.hi_empunion, data[["ssiratio", "multlc"]]
)
res_gmm_cue = ivmod.fit(cov_type="robust", display=True)
```

--------------------------------

### Specify system model using dictionary of formulas

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_formulas.ipynb

Defines a system model with an `OrderedDict` whose keys are equation labels and whose values are formula strings. This preserves the equation order in the results.

```python
from collections import OrderedDict

formula = OrderedDict()
formula["benefits"] = (
    "hrbens ~ educ + exper + expersq + union + south + nrtheast + nrthcen + male"
)
formula["earnings"] = "hrearn ~ educ + exper + expersq + nrtheast + married + male"
```

--------------------------------

### Define Cross-Equation Parameter Restrictions

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Constructs a restriction matrix `r` that imposes linear constraints on the SUR model parameters. This example forces the coefficient on `UNEMP` to be the same in all equations.
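Written out, the restrictions have the linear form $R\beta = q$ accepted by `add_constraints`, with $q$ left at its default of zero. The layout below assumes the setup from the data-preparation step: nine equations ordered GF, SW, WC, MT, NE, MA, SO, MW, CN, each with the same seven regressors and `UNEMP` last. That gives a 63-element parameter vector, with the `UNEMP` coefficients at zero-based positions 6, 13, 20, ..., 62. Each of the eight rows of `r` then sets one equation's `UNEMP` coefficient equal to the GF coefficient:

$$
R\beta = 0
\quad\Longleftrightarrow\quad
\beta_{j,\mathrm{UNEMP}} - \beta_{\mathrm{GF},\mathrm{UNEMP}} = 0,
\qquad j \in \{\mathrm{SW}, \mathrm{WC}, \mathrm{MT}, \mathrm{NE}, \mathrm{MA}, \mathrm{SO}, \mathrm{MW}, \mathrm{CN}\}
$$

This is why the code places `-1` in column 6 of every row and an identity matrix in the columns selected by the stride `13::7`.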
```python
# One row per restriction: 8 restrictions tie the other equations to the first
r = pd.DataFrame(
    columns=mod.param_names,
    index=["rest{0}".format(i) for i in range(8)],
    dtype=np.float64,
)
r.loc[:, :] = 0.0
# Column 6 is UNEMP in the first equation (7 parameters per equation)
r.iloc[:, 6] = -1.0
# Columns 13, 20, ..., 62 are UNEMP in the remaining 8 equations
r.iloc[:, 13::7] = np.eye(8)
print(r.iloc[:, 6::7])
```

--------------------------------

### Import Estimation Models

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_three-stage-ls.ipynb

Imports the estimator classes: `IV2SLS` for single-equation 2SLS, `IV3SLS` for system estimation with instrumental variables, and `IVSystemGMM` for system GMM.

```python
from linearmodels import IV2SLS, IV3SLS, IVSystemGMM
```

--------------------------------

### Comparing Model Results

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_advanced-examples.ipynb

Compares the results of several models with the `compare` function. An `OrderedDict` is recommended because it preserves the order of the models in the output.

```python
from collections import OrderedDict

from linearmodels.iv.results import compare

# res_ols, res_2sls, ... are results fitted in earlier steps
res = OrderedDict()
res["OLS"] = res_ols
res["2SLS"] = res_2sls
res["2SLS-Homo"] = res_2sls_std
res["2SLS-Hetero"] = res_2sls_robust
res["GMM"] = res_gmm
res["GMM Cluster(Age)"] = res_gmm_clustered
res["GMM-CUE"] = res_gmm_cue
print(compare(res))
```

--------------------------------

### Load and Prepare Panel Data for IV Regression

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_basic-examples.ipynb

Loads the jobtraining dataset and prepares it for panel IV regression. The preparation keeps only 1987 and 1988, drops missing rows, sorts the data and sets the index.
```python
from linearmodels.datasets import jobtraining

data = jobtraining.load()
print(jobtraining.DESCR)
print(data.head())
# Keep 1987 and 1988 only; other rows become NaN and are dropped below
data = data.where(data.year.isin((1987, 1988)))
data = data.dropna(how="all", axis=0).sort_values(["fcode", "year"])
print(data.describe())
data = data.set_index("fcode")
data = data[["year", "hrsemp", "grant", "scrap", "lscrap"]]
```

--------------------------------

### Estimate Multi-Factor Model (Size and Value)

Source: https://github.com/bashtage/linearmodels/blob/main/examples/asset-pricing_examples.ipynb

Adds the size (SMB) and value (HML) factors to the market factor, then estimates and prints the resulting multi-factor model.

```python
from linearmodels.asset_pricing import TradedFactorModel

# portfolios: test portfolio returns selected in an earlier step
factors = data[["MktRF", "SMB", "HML"]]
mod = TradedFactorModel(portfolios, factors)
res = mod.fit()
print(res)
```

--------------------------------

### Specify system model using curly braces

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_formulas.ipynb

Defines a system model with a single string in which each equation is enclosed in curly braces. This is an alternative to the dictionary method for specifying system formulas.

```python
braces_formula = """
{hrbens ~ educ + exper + expersq + union + south + nrtheast + nrthcen + male}
{hrearn ~ educ + exper + expersq + nrtheast + married + male}
"""
braces_mod = SUR.from_formula(braces_formula, data)
braces_res = braces_mod.fit(cov_type="unadjusted")
print(braces_res)
```

--------------------------------

### Asset Pricing with Implicit Portfolios (Factors Only)

Source: https://github.com/bashtage/linearmodels/blob/main/examples/asset-pricing_formulas.ipynb

Estimates a `LinearFactorModel` with the second formula syntax. The formula lists only the factors, and the test portfolios are passed through the `portfolios` keyword argument. The example also sets `risk_free=True`.
```python
from linearmodels.asset_pricing import LinearFactorModel

ports = ["S{0}V{1}".format(i, j) for i in (1, 3, 5) for j in (1, 3, 5)]
ports += ["S{0}M{1}".format(i, j) for i in (1, 3, 5) for j in (1, 3, 5)]
portfolios = data[ports]
formula = "MktRF + HML + Mom"
mod = LinearFactorModel.from_formula(
    formula, data, portfolios=portfolios, risk_free=True
)
res = mod.fit()
print(res)
```

--------------------------------

### Specify system model with labeled equations using curly braces

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_formulas.ipynb

Defines a system model in a single string whose equations carry explicit labels inside the curly braces. Custom labels avoid conflicts with the names of the dependent variables.

```python
labeled_formula = """
{benefits: hrbens ~ educ + exper + expersq + union + south + nrtheast + nrthcen + male}
{earnings: hrearn ~ educ + exper + expersq + nrtheast + married + male}
"""
labels_mod = SUR.from_formula(labeled_formula, data)
labeled_res = labels_mod.fit(cov_type="unadjusted")
print("Unlabeled")
print(braces_res.equation_labels)
print("Labeled")
print(labeled_res.equation_labels)
```

--------------------------------

### Load and Prepare Mroz Dataset

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_basic-examples.ipynb

Imports the required libraries and prints the dataset description. It then loads the data, drops rows with missing values and adds a constant for regression analysis.

```python
import numpy as np
import pandas as pd
from statsmodels.api import add_constant

from linearmodels.datasets import mroz

print(mroz.DESCR)
data = mroz.load()
data = data.dropna()
data = add_constant(data, has_constant="add")
```

--------------------------------

### Asset Pricing with Explicit Portfolios and Factors

Source: https://github.com/bashtage/linearmodels/blob/main/examples/asset-pricing_formulas.ipynb

Estimates a `LinearFactorModel` with the first formula syntax, which defines the test portfolios and factors explicitly.
The `kernel` and `bandwidth` options configure the kernel (HAC) covariance estimator selected with `cov_type="kernel"`.

```python
from linearmodels.asset_pricing import LinearFactorModel

formula = "NoDur + Chems + S1V1 + S5V5 + S1M1 + S5M5 ~ MktRF + HML + Mom"
mod = LinearFactorModel.from_formula(formula, data)
res = mod.fit(cov_type="kernel", kernel="parzen", bandwidth=20)
print(res)
```

--------------------------------

### Display Comparison Image

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Displays a PNG image (Greene's Table 10-3) for comparison with the parameter estimates. Requires `display_png` and `Image` from `IPython.display`.

```python
from IPython.display import Image, display_png

display_png(Image("system_correct-greene-table-10-3.png"))
```

--------------------------------

### Fit SUR model with random weights

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_formulas.ipynb

Creates and fits a SUR model from a dictionary of formulas using random weights. The weights are assumed to be proportional to the inverse variance of the observations.

```python
# formula: the OrderedDict of equation formulas defined earlier
random_weights = np.random.chisquare(5, size=(616, 2))
random_weights = pd.DataFrame(random_weights, columns=["benefits", "earnings"])
weighted_mod = SUR.from_formula(formula, data, weights=random_weights)
print(weighted_mod.fit())
```

--------------------------------

### Estimate Model with Industry Portfolios

Source: https://github.com/bashtage/linearmodels/blob/main/examples/asset-pricing_examples.ipynb

Replaces the test portfolios with industry portfolios and re-estimates the multi-factor model. This shows that the choice of test assets is flexible.
```python
indu = [
    "NoDur",
    "Durbl",
    "Manuf",
    "Enrgy",
    "Chems",
    "BusEq",
    "Telcm",
    "Utils",
    "Shops",
    "Hlth",
    "Money",
    "Other",
]
portfolios = data[indu]
mod = TradedFactorModel(portfolios, factors)
res = mod.fit()
print(res)
```

--------------------------------

### Fit Absorbing Regression with LSMR and Options

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_absorbing-regression.ipynb

Fits an absorbing regression with the LSMR solver and custom options. The `absorb_options` dictionary is passed to LSMR to fine-tune its iterative process; `show=True` prints its iteration output.

```python
from linearmodels.iv.absorbing import AbsorbingLS

# x[:, 1:] drops the constant column; state and firm effects are absorbed
mod = AbsorbingLS(y, x[:, 1:], absorb=cats)
res = mod.fit(method="lsmr", absorb_options={"show": True})
```

--------------------------------

### Compare OLS, 2SLS, and Direct IV Estimates

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_basic-examples.ipynb

Compares OLS, 2SLS and a "direct" estimate obtained by regressing on the first-stage fitted values (`educ_hat`). The comparison highlights differences in the coefficients and their statistical significance.

```python
import numpy as np

from linearmodels.iv import IV2SLS, compare

res_direct = IV2SLS(np.log(data.wage), data[["const", "educ_hat"]], None, None).fit(
    cov_type="unadjusted"
)
print(compare({"OLS": res_ols, "2SLS": res_second, "Direct": res_direct}))
```

--------------------------------

### Asset Pricing with Standard Interface

Source: https://github.com/bashtage/linearmodels/blob/main/examples/asset-pricing_formulas.ipynb

Estimates a `LinearFactorModel` with the standard interface, passing the portfolios and factors as separate DataFrames. This is used to verify the results from the formula interface.
```python
portfolios = data[ports]
factors = data[["MktRF", "HML", "Mom"]]
mod = LinearFactorModel(portfolios, factors, risk_free=True)
print(mod.fit())
```

--------------------------------

### Load and Inspect Job Training Data

Source: https://github.com/bashtage/linearmodels/blob/main/examples/panel_data-formats.ipynb

Loads the job training dataset and displays the first few rows to show its structure.

```python
from linearmodels.datasets import jobtraining

data = jobtraining.load()
print(data.head())
```

--------------------------------

### Import Data and IV2SLS Model

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_using-formulas.ipynb

Imports `IV2SLS`, loads the MEPS dataset, drops rows with missing values and prints the dataset description. This step is a prerequisite for estimating models with formulas.

```python
from linearmodels.datasets import meps
from linearmodels.iv import IV2SLS

data = meps.load()
data = data.dropna()
print(meps.DESCR)
```

--------------------------------

### Add and Estimate Model with Constraints

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Adds linear constraints to the SUR model with `add_constraints` and then re-estimates it. The constrained parameters are displayed so you can check that they are equal.

```python
mod.add_constraints(r)
constrained_res = mod.fit()
# The UNEMP coefficients should now be identical across equations
print(constrained_res.params[6::7])
```

--------------------------------

### Load French Data

Source: https://github.com/bashtage/linearmodels/blob/main/examples/asset-pricing_examples.ipynb

Loads the French data, including factor returns and test portfolios, and prints the dataset description.
```python
from linearmodels.datasets import french

data = french.load()
print(french.DESCR)
```

--------------------------------

### Load and Prepare Data for IV Regression

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_basic-examples.ipynb

Loads and prepares the Mroz dataset for instrumental variables regression. The preparation drops rows with missing values, adds a constant, computes the log wage and defines the variable groups.

```python
import numpy as np
from statsmodels.api import add_constant

from linearmodels.datasets import mroz

data = mroz.load()
data = data.dropna()
data = add_constant(data, has_constant="add")
data["lnwage"] = np.log(data.wage)
dep = "lnwage"
exog = ["const", "exper", "expersq"]
endog = ["educ"]
instr = ["fatheduc", "motheduc"]
```

--------------------------------

### Display PNG Image

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Displays a PNG image (Greene's Table 10-1). Requires `display_png` and `Image` from `IPython.display`.

```python
from IPython.display import Image, display_png

display_png(Image("system_correct-greene-table-10-1.png"))
```

--------------------------------

### Load Data and Estimate Linear Factor Model

Source: https://github.com/bashtage/linearmodels/blob/main/doc/source/asset-pricing/introduction.rst

Loads data from Ken French's data library and converts the portfolio returns to excess returns. It then estimates a `LinearFactorModel` with a kernel-based covariance estimator. Use this estimator for general-purpose work with either traded or non-traded factors.
```python
from linearmodels.asset_pricing import LinearFactorModel
from linearmodels.datasets import french

data = french.load()
factors = data[['MktRF', 'SMB', 'HML']]
portfolios = data[['S1V1', 'S1V3', 'S1V5', 'S5V1', 'S5V3', 'S5V5']].copy()
# Convert to excess returns by subtracting the risk-free rate
portfolios.loc[:, :] = portfolios.values - data[['RF']].values
mod = LinearFactorModel(portfolios, factors)
res = mod.fit(cov_type='kernel')
print(res)
```

--------------------------------

### Display Estimated Factor Loadings (Betas)

Source: https://github.com/bashtage/linearmodels/blob/main/examples/asset-pricing_examples.ipynb

Prints the estimated factor loadings (betas) from the fitted `TradedFactorModel`. The output shows how the loadings on each factor vary across the test portfolios.

```python
print(res.betas)
```

--------------------------------

### Data Generation for System Estimation

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_three-stage-ls.ipynb

Generates synthetic data to demonstrate system estimation. `y1` and `y2` are built from the exogenous variables `x1` and `x2` plus correlated noise, and the last 10,000 rows are set aside as `oos`.

```python
import numpy as np
import pandas as pd

from linearmodels import IV3SLS

rs = np.random.default_rng(20220224)
e = rs.standard_normal((50000, 2))
x = rs.standard_normal((50000, 2))
y2 = x[:, 1] / 2 - x[:, 0] / 2 + e[:, 1] / 2 - e[:, 0] / 2
y1 = x[:, 1] / 2 + x[:, 0] / 2 + e[:, 1] / 2 + e[:, 0] / 2
df = pd.DataFrame(np.column_stack([y1, y2, x]), columns=["y1", "y2", "x1", "x2"])
in_sample = df.iloc[:-10000]
oos = df.iloc[-10000:]
mod = IV3SLS.from_formula(
    {"y1": "y1 ~ x1 + [y2 ~ x2]", "y2": "y2 ~ x2 + [y1 ~ x1]"}, data=df
)
res = mod.fit()
print(res)
```

--------------------------------

### Direct Model Specification for System Estimation

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_three-stage-ls.ipynb

Specifies a system of equations directly with a dictionary of dictionaries.
This interface is useful for generating models programmatically. Each equation must define `'dependent'`, `'exog'`, `'endog'` and `'instruments'`.

```python
hours = {
    "dependent": data[["hours"]],
    "exog": data[["educ", "age", "kidslt6", "nwifeinc"]],
    "endog": data[["lwage"]],
    "instruments": data[["exper", "expersq"]],
}
lwage = {
    "dependent": data[["lwage"]],
    "exog": data[["educ", "exper", "expersq"]],
    "endog": data[["hours"]],
    "instruments": data[["age", "kidslt6", "nwifeinc"]],
}
equations = {"hours": hours, "lwage": lwage}
system_3sls = IV3SLS(equations)
system_3sls_res = system_3sls.fit(cov_type="unadjusted")
print(system_3sls_res)
```

--------------------------------

### Estimate Log-Wage Model using Formula

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_three-stage-ls.ipynb

Estimates the log-wage model with the formula interface, in the same way as the `hours` model. It shows how different dependent variables and instrument sets can be specified.

```python
lwage = "lwage ~ educ + exper + expersq + [hours ~ age + kidslt6 + nwifeinc]"
lwage_mod = IV2SLS.from_formula(lwage, data)
lwage_res = lwage_mod.fit(cov_type="unadjusted")
print(lwage_res)
```

--------------------------------

### Enable Future Formula Sorting Behavior

Source: https://github.com/bashtage/linearmodels/blob/main/doc/source/changes/4.0.rst

Use this import to enable the future behavior in which variables in formulas are not sorted. This changes the order of the parameter estimates.

```python
from linearmodels.__future__ import ordering
```

--------------------------------

### OLS Estimation using IV2SLS

Source: https://github.com/bashtage/linearmodels/blob/main/doc/source/iv/introduction.rst

Shows how to estimate an ordinary least squares (OLS) model by setting both `endog` and `instruments` to `None` when creating the `IV2SLS` estimator. This works because OLS is nested within the IV framework.
```python
mod = IV2SLS(dependent, exog, None, None)
ols_res = mod.fit()
```

--------------------------------

### LIML Estimation and Comparison

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_advanced-examples.ipynb

Performs LIML estimation and compares the results with 2SLS and GMM. LIML can have better finite-sample properties than 2SLS when the model is not strongly identified.

```python
ivmod = IVLIML(
    data.ldrugexp, data[controls], data.hi_empunion, data[["ssiratio", "multlc"]]
)
res_liml = ivmod.fit(cov_type="robust")
print(compare({"2SLS": res_2sls_robust, "LIML": res_liml, "GMM": res_gmm}))
```

--------------------------------

### Display Greene's Table 10-2

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Displays an image of Greene's Table 10-2 for comparison with the model results. Requires `IPython.display`.

```python
from IPython.display import Image, display_png

display_png(Image("system_correct-greene-table-10-2.png"))
```

--------------------------------

### Initialize PanelOLS with xarray DataArray

Source: https://github.com/bashtage/linearmodels/blob/main/examples/panel_data-formats.ipynb

Creates and fits a `PanelOLS` model from an xarray `DataArray`. The data are transposed to the (time, entity) layout that the model expects.

```python
res = PanelOLS(da["lscrap"].T, da["hrsemp"].T, entity_effects=True).fit()
print(res)
```

--------------------------------

### Initialize PanelOLS with MultiIndex DataFrame

Source: https://github.com/bashtage/linearmodels/blob/main/examples/panel_data-formats.ipynb

Creates and fits a `PanelOLS` model from a MultiIndex DataFrame. Entity effects are included in the model specification.
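With `entity_effects=True`, the slope coefficients are those of the fixed-effects (within) estimator. The textbook derivation is shown below, with $y$ standing for `lscrap` and $x$ for `hrsemp`. It is a reminder of what entity effects remove, not a description of how linearmodels computes the estimate internally. Because $\alpha_i$ does not vary over time, demeaning within each entity eliminates it:

$$
y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}
$$

$$
y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i),
\qquad \bar{y}_i = \frac{1}{T_i}\sum_{t=1}^{T_i} y_{it}
$$

$$
\hat{\beta} = \Big(\sum_{i}\sum_{t} \ddot{x}_{it}\ddot{x}_{it}'\Big)^{-1} \sum_{i}\sum_{t} \ddot{x}_{it}\ddot{y}_{it},
\qquad \ddot{x}_{it} = x_{it} - \bar{x}_i,\quad \ddot{y}_{it} = y_{it} - \bar{y}_i
$$

Here $\bar{x}_i$ and $\bar{\varepsilon}_i$ are entity means defined in the same way as $\bar{y}_i$.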
```python
from linearmodels import PanelOLS

mod = PanelOLS(mi_data.lscrap, mi_data.hrsemp, entity_effects=True)
print(mod.fit())
```

--------------------------------

### Calculate First Stage Diagnostics with Multiple Instruments

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_advanced-examples.ipynb

Fits a 2SLS model with multiple instruments and prints the first-stage diagnostics. The reported F-statistic can be divided by the number of instruments to obtain a value comparable with the usual rule-of-thumb threshold.

```python
ivmod = IV2SLS(data.ldrugexp, data[controls], data.hi_empunion, data[instruments])
res_2sls_all = ivmod.fit()
print(res_2sls_all.first_stage)
```

--------------------------------

### PanelOLS with Time-Entity Data Format

Source: https://github.com/bashtage/linearmodels/blob/main/doc/source/panel/pandas.rst

Shows the equivalent call with the deprecated `pandas.stats.plm.PanelOLS`, using data indexed by year and firm. The `pandas.stats` module was removed in pandas 0.20, so this only runs with older pandas versions. Set a (time, entity) MultiIndex on the data before calling it.

```python
tedata = data.set_index(["year", "firm"])

from pandas.stats import plm

plm.PanelOLS(tedata["invest"], tedata[["value", "capital"]], entity_effects=True)
```

--------------------------------

### GMM Estimation with Clustered Weights

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_advanced-examples.ipynb

Demonstrates how to change the structure of the weighting matrix in GMM estimation by using clustered weights. The covariance estimator should match the weighting matrix.

```python
from linearmodels import IVGMM

ivmod = IVGMM(
    data.ldrugexp,
    data[controls],
    data.hi_empunion,
    data[["ssiratio", "multlc"]],
    weight_type="clustered",
    clusters=data.age,
)
res_gmm_clustered = ivmod.fit(cov_type="clustered", clusters=data.age)
```

--------------------------------

### Import IV Estimators

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_advanced-examples.ipynb

Imports the main estimators for linear IV models: `IV2SLS`, `IVGMM`, `IVGMMCUE`, and `IVLIML`.
```python
from linearmodels import IV2SLS, IVGMM, IVGMMCUE, IVLIML
```

--------------------------------

### Compare IV2SLS 'OLS' Version with Statsmodels OLS

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_advanced-examples.ipynb

Compares the parameters from the OLS version of IV2SLS with those from statsmodels OLS, as a final check that the two are equivalent.

```python
import pandas as pd

# `res_ols` is the statsmodels OLS result estimated earlier in the notebook
ivolsmod = IV2SLS(data.ldrugexp, data[["hi_empunion"] + controls], None, None)
res_ivols = ivolsmod.fit()
sm_ols = res_ols.params
sm_ols.name = "sm"
print(pd.concat([res_ivols.params, sm_ols], axis=1))
```

--------------------------------

### Load MROZ Dataset

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_three-stage-ls.ipynb

Loads the MROZ dataset from `linearmodels.datasets`.

```python
from linearmodels.datasets import mroz

data = mroz.load()
```

--------------------------------

### Load and Prepare MEPS Data

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_advanced-examples.ipynb

Loads the MEPS dataset, drops rows with missing values, and prints the dataset description.

```python
from linearmodels.datasets import meps

data = meps.load()
data = data.dropna()
print(meps.DESCR)
```

--------------------------------

### Estimate Risk-Free Rate with LinearFactorModel

Source: https://github.com/bashtage/linearmodels/blob/main/examples/asset-pricing_examples.ipynb

Optionally estimates the risk-free rate as an additional parameter by passing `risk_free=True`. This is useful when excess returns are not available or the risk-free rate may be misspecified.
```python
from linearmodels.asset_pricing import LinearFactorModel

factors = data[["MktRF", "HML", "Mom"]]
portfolios = data[
    ["S1M1", "S1M3", "S1M5", "S3M1", "S3M3", "S3M5", "S5M1", "S5M3", "S5M5"]
]
mod = LinearFactorModel(portfolios, factors, risk_free=True)
print(mod.fit())
```

--------------------------------

### Load Fringe Benefit Data

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Loads the fringe benefit dataset from `linearmodels.datasets` and prints its description.

```python
from linearmodels.datasets import fringe

print(fringe.DESCR)
fdata = fringe.load()
```

--------------------------------

### Estimate OLS for Men's Wages

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_basic-examples.ipynb

Estimates the effect of education on the log of wages for men using OLS, as a benchmark before applying IV methods.

```python
res_ols = IV2SLS(np.log(men.wage), men[["const", "educ"]], None, None).fit(
    cov_type="unadjusted"
)
print(res_ols)
```

--------------------------------

### Extract and Format Parameters

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Extracts the parameters of each equation, concatenates them, and formats them for display. Assumes that `res` (a fitted system model) and `pd` are defined.

```python
params = [res.equations[label].params for label in res.equation_labels]
params = pd.concat(params, axis=1)
params.columns = res.equation_labels
params.T.style.format("{:0.3f}")
```

--------------------------------

### Display First Stage Diagnostics

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_advanced-examples.ipynb

Prints the first-stage diagnostics for a 2SLS regression, which help assess the relevance of the instruments for the endogenous regressor. With a single instrument, the F-statistic is the squared t-statistic of that instrument.
```python
print(res_2sls.first_stage)
```

--------------------------------

### Replicating Wu-Hausman Test Statistic

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_advanced-examples.ipynb

Shows how to replicate the Wu-Hausman test statistic directly with a two-step approach. The first step regresses the endogenous variable on the controls and the instrument; the second step adds the first-step residuals to the outcome regression, and the squared t-statistic on those residuals is the test statistic.

```python
import pandas as pd

# Step 1: regress the endogenous variable on the instrument and the controls
step1 = IV2SLS(data.hi_empunion, data[["ssiratio"] + controls], None, None).fit()
resids = step1.resids
# Step 2: add the first-step residuals to the outcome regression
exog = pd.concat([data[["hi_empunion"] + controls], resids], axis=1)
step2 = IV2SLS(data.ldrugexp, exog, None, None).fit(cov_type="unadjusted")
# Squared t-statistic on the residual term
print(step2.tstats.residual**2)
```

--------------------------------

### Estimate OLS Model for Comparison

Source: https://github.com/bashtage/linearmodels/blob/main/examples/iv_basic-examples.ipynb

Estimates an OLS model as a baseline for the instrumental-variables estimate. It shows the effect of education on wages without accounting for endogeneity.

```python
res = IV2SLS(np.log(data.wage), data[exog + endog], None, None).fit()
print(res)
```

--------------------------------

### 2-Step Estimation after Dropping Insignificant Factor

Source: https://github.com/bashtage/linearmodels/blob/main/examples/asset-pricing_examples.ipynb

Re-estimates the model after dropping the insignificant SMB factor, illustrating model refinement.

```python
from linearmodels.asset_pricing import LinearFactorModel

factors = data[["MktRF", "HML", "Mom"]]
portfolios = data[
    ["S1M1", "S1M3", "S1M5", "S3M1", "S3M3", "S3M5", "S5M1", "S5M3", "S5M5"]
]
mod = LinearFactorModel(portfolios, factors)
print(mod.fit())
```

--------------------------------

### Visualize Factor Loadings

Source: https://github.com/bashtage/linearmodels/blob/main/examples/asset-pricing_examples.ipynb

Generates a heatmap of the estimated factor loadings using seaborn.
This makes patterns in the betas easy to spot.

```python
import seaborn as sns

%matplotlib inline
sns.heatmap(res.betas)
```

--------------------------------

### Estimation Results

Source: https://github.com/bashtage/linearmodels/blob/main/doc/source/asset-pricing/reference.rst

Classes for accessing and interpreting the results of asset pricing model estimations.

```APIDOC
## LinearFactorModelResults

### Description
Stores and provides access to the results of a LinearFactorModel estimation.

## GMMFactorModelResults

### Description
Stores and provides access to the results of a LinearFactorModelGMM estimation.
```

--------------------------------

### Estimate Multivariate OLS using SUR Interface

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_examples.ipynb

Estimates a multivariate OLS model with the `SUR.multivariate_ls` method, which takes the dependent variables and a common set of regressors. Uses the `french` dataset for demonstration.

```python
import statsmodels.api as sm

from linearmodels.datasets import french
from linearmodels.system import SUR

data = french.load()
factors = sm.add_constant(data[["MktRF"]])
mv_ols = SUR.multivariate_ls(
    data[["S1V1", "S1V3", "S1V5", "S5V1", "S5V3", "S5V5"]], factors
)
mv_ols_res = mv_ols.fit(cov_type="unadjusted")
print(mv_ols_res)
```

--------------------------------

### Linear Factor Models

Source: https://github.com/bashtage/linearmodels/blob/main/doc/source/asset-pricing/reference.rst

Classes for estimating linear factor models for asset prices.

```APIDOC
## LinearFactorModel

### Description
Represents a linear factor model for asset pricing.

## LinearFactorModelGMM

### Description
Represents a linear factor model estimated using the Generalized Method of Moments (GMM).

## TradedFactorModel

### Description
Represents a traded factor model for asset pricing.
```

--------------------------------

### System GMM Estimation with Robust Weighting and Iterations

Source: https://github.com/bashtage/linearmodels/blob/main/examples/system_three-stage-ls.ipynb

Estimates a system of equations with the IVSystemGMM estimator using a robust weighting matrix, which allows for conditional heteroskedasticity. The `iter_limit` argument sets the maximum number of GMM iterations.

```python
from linearmodels.system import IVSystemGMM

# `equations` maps equation labels to formula strings; `data` is the MROZ data
system_gmm = IVSystemGMM.from_formula(equations, data, weight_type="robust")
system_gmm_res = system_gmm.fit(cov_type="robust", iter_limit=100)
print("Number of iterations: " + str(system_gmm_res.iterations))
print(system_gmm_res)
```
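The `iter_limit` argument caps how many times the GMM weighting matrix is re-estimated. To make that loop concrete, here is a minimal single-equation sketch in plain NumPy. The function `iterated_gmm` and the simulated data are hypothetical illustrations, not part of linearmodels, and linearmodels' own system estimator, weight options and convergence rule may differ. The first pass uses the 2SLS weight `(Z'Z/n)^-1`; each later pass re-estimates a heteroskedasticity-robust weight from the residuals, and the loop stops once the estimates change by less than `tol`.

```python
import numpy as np


def iterated_gmm(y, x, z, iter_limit=100, tol=1e-8):
    """Hypothetical sketch of iterated linear IV-GMM with a robust weight.

    y: (n,) dependent variable, x: (n, k) regressors, z: (n, p) instruments
    with p >= k. Returns (params, iterations).
    """
    n = y.shape[0]
    xz = x.T @ z / n  # (k, p) cross-moments of regressors and instruments
    zy = z.T @ y / n  # (p,) cross-moments of instruments and y
    w = np.linalg.inv(z.T @ z / n)  # first-pass weight, so pass 1 is 2SLS
    params = None
    for iteration in range(1, iter_limit + 1):
        new_params = np.linalg.solve(xz @ w @ xz.T, xz @ w @ zy)
        # Re-estimate the weight as the inverse of the average outer
        # product of the moment conditions z_i * e_i
        resid = y - x @ new_params
        g = z * resid[:, None]
        w = np.linalg.inv(g.T @ g / n)
        if params is not None and np.max(np.abs(new_params - params)) < tol:
            return new_params, iteration
        params = new_params
    return params, iter_limit


# Hypothetical simulated data: one endogenous regressor, two instruments,
# and errors whose variance depends on z1 (conditional heteroskedasticity)
rng = np.random.default_rng(0)
n = 5_000
z1, z2, u = rng.standard_normal((3, n))
endog = 0.8 * z1 + 0.5 * z2 + 0.5 * u + rng.standard_normal(n)
eps = u * (1 + 0.5 * np.abs(z1))
y = 1.0 + 2.0 * endog + eps
x = np.column_stack([np.ones(n), endog])
z = np.column_stack([np.ones(n), z1, z2])

params, iterations = iterated_gmm(y, x, z, iter_limit=100)
print(params, iterations)
```

In an exactly identified model the weighting matrix has no effect, so a loop like this stops after the second pass with the simple IV estimate; iteration only matters when there are more instruments than regressors.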