# Starsim - Agent-Based Disease Modeling Framework ## Introduction Starsim is a powerful Python framework for building agent-based models (ABMs) of disease transmission and other complex systems. It provides a flexible, modular architecture that allows researchers and public health professionals to simulate disease dynamics, intervention strategies, and demographic processes. The framework supports a wide range of infectious diseases including SIR/SIS models, HIV, Ebola, Measles, and Cholera, while also enabling non-communicable disease (NCD) modeling. Starsim is built on top of NumPy, Pandas, and SciPy, offering high-performance simulations with intuitive APIs for defining disease parameters, network structures, and intervention strategies. At its core, Starsim uses an agent-based approach where individual people (agents) interact through networks, transmit diseases, and respond to interventions. The framework handles complex time-based calculations through its sophisticated duration and rate system, supporting both calendar dates and relative time periods. Key features include automatic unit conversion between days, weeks, months, and years; probability-based event modeling; multi-run simulations for uncertainty quantification; and Optuna-based calibration for fitting models to real-world data. Starsim is designed to be extensible, allowing users to create custom disease modules, network types, and interventions while leveraging the built-in infrastructure for state management and results tracking. ## API Documentation ### Core Simulation Class #### `ss.Sim` - Main Simulation Engine The `Sim` class is the central component for running simulations. It manages all modules (diseases, networks, demographics, interventions) and coordinates the simulation timeline. ```python import starsim as ss # Basic simulation setup sim = ss.Sim( n_agents=10000, # Number of agents (people) start=2020, # Start year stop=2030, # End year dt=1.0, # Timestep in years (default) rand_seed=42, # Random seed for reproducibility diseases='sir', # Disease module(s) networks='random', # Network type(s) demographics=True, # Enable births/deaths verbose=True # Print progress ) # Run simulation sim.run() # Access results print(sim.results) sim.plot() ``` ```python # Advanced simulation with explicit module configuration import starsim as ss # Define disease parameters sir = ss.SIR( beta=0.1, # Transmission rate init_prev=0.01, # Initial prevalence (1%) dur_inf=10 # Duration of infection (days) ) # Define network network = ss.random(n_contacts=ss.poisson(4)) # Average 4 contacts # Define demographics births = ss.Births(birth_rate=ss.peryear(0.02)) deaths = ss.Deaths(death_rate=ss.peryear(0.01)) # Create and run simulation sim = ss.Sim( n_agents=50000, start=ss.date('2020-01-01'), stop=ss.date('2025-12-31'), dt=ss.days(1), # Daily timestep diseases=sir, networks=network, demographics=[births, deaths], rand_seed=123 ) sim.run() results = sim.results ``` ### Disease Modules #### `ss.SIR` - Susceptible-Infected-Recovered Model ```python import starsim as ss # Basic SIR model sir = ss.SIR( beta=0.15, # Transmission probability per contact init_prev=0.005, # Initial 0.5% infected dur_inf=ss.days(14) # 14-day infection duration ) # SIR with time-varying beta sir_seasonal = ss.SIR( beta=lambda t: 0.1 + 0.05 * np.sin(2 * np.pi * t), # Seasonal variation init_prev=0.01, dur_inf=ss.weeks(2) ) sim = ss.Sim(n_agents=10000, diseases=sir, networks='random') sim.run() # Access disease-specific results print(f"Peak infections: {sim.results.sir.n_infected.max()}") print(f"Total recovered: {sim.results.sir.n_recovered[-1]}") ``` #### `ss.SIS` - Susceptible-Infected-Susceptible Model ```python import starsim as ss # SIS model (no permanent immunity) sis = ss.SIS( beta=0.2, init_prev=0.02, dur_inf=ss.days(7) # Agents return to susceptible after recovery ) sim = ss.Sim( n_agents=5000, diseases=sis, networks=ss.random(n_contacts=3), start=2020, stop=2025 ) sim.run() ``` #### `ss.SIS_vaccine` - SIS with Vaccination ```python import starsim as ss # SIS model with vaccine protection sis_vax = ss.SIS_vaccine( beta=0.25, init_prev=0.01, dur_inf=ss.days(10), waning=ss.peryear(0.1) # Vaccine protection wanes over time ) # Add vaccination intervention vax = ss.routine_vx( product=ss.Vaccine(efficacy=0.8), prob=0.05, # 5% vaccination rate per year start_year=2022 ) sim = ss.Sim( n_agents=10000, diseases=sis_vax, interventions=vax, networks='random' ) sim.run() ``` #### `ss.NCD` - Non-Communicable Disease ```python import starsim as ss # Model a non-communicable disease (e.g., diabetes) ncd = ss.NCD( init_prev=0.05, # 5% initial prevalence incidence=ss.peryear(0.01), # 1% annual incidence progression=ss.peryear(0.02), # 2% annual progression rate ) sim = ss.Sim( n_agents=20000, diseases=ncd, demographics=True, start=2020, stop=2040 ) sim.run() ``` ### Network Types #### `ss.random` - Random Network ```python import starsim as ss # Simple random network net = ss.random(n_contacts=4) # Fixed 4 contacts per agent # Random network with Poisson-distributed contacts net = ss.random(n_contacts=ss.poisson(mean=5)) sim = ss.Sim(n_agents=1000, networks=net, diseases='sir') ``` #### `ss.static` - Static Network ```python import starsim as ss import numpy as np # Create custom static network from edge list edges = np.array([[0, 1], [0, 2], [1, 3], [2, 3]]) net = ss.static(edges=edges) # Or from adjacency matrix adj_matrix = np.random.randint(0, 2, (100, 100)) net = ss.static(adj=adj_matrix) ``` #### `ss.sexual` - Sexual Contact Network ```python import starsim as ss # Sexual network with age-based mixing sexual_net = ss.sexual( mean_contacts=ss.lognormal(mean=2, std=1), duration=ss.years(3), # Average relationship duration debut_age=15, # Age of sexual debut mixing='assortative' # Age-assortative mixing ) # Multi-layer sexual network hiv_sim = ss.Sim( n_agents=10000, diseases='hiv', networks=sexual_net, demographics=True ) ``` #### `ss.MFNetwork` - Male-Female Partnership Network ```python import starsim as ss # Male-female partnership network mf_net = ss.MFNetwork( mean_dur=ss.years(5), # Average partnership duration n_partners=ss.poisson(1.5) # Average number of partners ) sim = ss.Sim( n_agents=5000, networks=mf_net, diseases='sti' ) ``` #### `ss.maternal` - Maternal Transmission Network ```python import starsim as ss # Maternal network for vertical transmission maternal_net = ss.maternal() # Often combined with other networks sim = ss.Sim( n_agents=10000, networks=[ss.random(n_contacts=3), maternal_net], diseases='hiv', demographics=True ) ``` ### Time and Duration System #### Duration Classes ```python import starsim as ss # Creating durations one_year = ss.years(1) two_weeks = ss.weeks(2) thirty_days = ss.days(30) six_months = ss.months(6) # Shorthand constants ss.year # = ss.years(1) ss.month # = ss.months(1) ss.week # = ss.weeks(1) ss.day # = ss.days(1) # Duration arithmetic total = ss.years(2) + ss.months(6) # 2.5 years diff = ss.weeks(10) - ss.days(14) # 8 weeks # Unit conversion dur = ss.years(0.5) print(dur.days) # ~182.5 print(dur.weeks) # ~26.1 print(dur.months) # 6.0 # Using in disease parameters sir = ss.SIR( dur_inf=ss.days(14), # Infection lasts 14 days beta=0.1 ) ``` #### Date Handling ```python import starsim as ss # Creating dates date1 = ss.date(2024) # <2024-01-01> date2 = ss.date('2024-06-15') # <2024-06-15> date3 = ss.date(year=2024, month=3, day=1) # <2024-03-01> # Date arithmetic future = ss.date(2024) + ss.years(2) # <2026-01-01> past = ss.date(2024) - ss.months(6) # <2023-07-01> # Date ranges dates = ss.date.arange(2020, 2025, step=ss.months(3)) # Converting dates to years year_float = ss.date('2024-07-01').to_year() # ~2024.5 # Using in simulations sim = ss.Sim( start=ss.date('2020-01-01'), stop=ss.date('2030-12-31'), dt=ss.weeks(1) ) ``` #### Rate Classes ```python import starsim as ss # ss.per - Instantaneous rate (converts to probability on multiplication) death_rate = ss.peryear(0.02) # 2% per year infection_rate = ss.perday(0.01) # 1% per day # Rate conversion to probability prob = death_rate * ss.year # Probability over 1 year daily_prob = death_rate * ss.day # Probability over 1 day # ss.prob - Probability with time units p_infection = ss.probperyear(0.1) # 10% probability per year p_daily = p_infection * ss.days(1) # Convert to daily probability # ss.freq - Event frequency (simple multiplication) acts_per_year = ss.freqperyear(50) # 50 acts per year acts_per_month = acts_per_year * ss.month # ~4.17 acts per month # Using rates in disease models disease = ss.SIR( beta=ss.perday(0.05), # Transmission rate per day dur_inf=ss.days(10) ) # Demographics with rates births = ss.Births(birth_rate=ss.peryear(0.025)) deaths = ss.Deaths(death_rate=ss.peryear(0.015)) ``` ### Distributions ```python import starsim as ss # Available distributions normal = ss.normal(mean=10, std=2) lognormal = ss.lognormal(mean=5, std=1) uniform = ss.uniform(low=0, high=10) poisson = ss.poisson(mean=4) bernoulli = ss.bernoulli(p=0.3) exponential = ss.exponential(rate=0.1) # Using distributions in parameters sir = ss.SIR( beta=ss.lognormal(mean=0.1, std=0.02), # Stochastic beta dur_inf=ss.normal(mean=14, std=3) # Variable infection duration ) # Network with distribution-based contacts net = ss.random(n_contacts=ss.poisson(mean=5)) # Sampling from distributions samples = normal.sample(n=1000) # Distributions with time units dur_dist = ss.days(ss.lognormal(mean=10, std=3)) # Duration in days ``` ### Demographics Modules #### `ss.Births` - Birth Module ```python import starsim as ss # Constant birth rate births = ss.Births(birth_rate=ss.peryear(0.02)) # Time-varying birth rate def declining_births(t): return ss.peryear(0.03 - 0.001 * (t - 2020)) births = ss.Births(birth_rate=declining_births) # Age-structured births births = ss.Births( birth_rate=ss.peryear(0.02), fertility_age=[15, 49] # Reproductive age range ) ``` #### `ss.Deaths` - Death Module ```python import starsim as ss # Simple death rate deaths = ss.Deaths(death_rate=ss.peryear(0.01)) # Age-dependent mortality deaths = ss.Deaths( death_rate=lambda age: ss.peryear(0.001 * np.exp(0.05 * age)) ) # Complete demographic module demographics = [ ss.Births(birth_rate=ss.peryear(0.02)), ss.Deaths(death_rate=ss.peryear(0.01)) ] sim = ss.Sim( n_agents=10000, demographics=demographics, diseases='sir' ) ``` #### `ss.Pregnancy` - Pregnancy Module ```python import starsim as ss # Pregnancy with maternal transmission pregnancy = ss.Pregnancy( fertility_rate=ss.peryear(0.1), dur_pregnancy=ss.months(9) ) maternal_net = ss.maternal() sim = ss.Sim( n_agents=10000, demographics=[pregnancy, ss.Deaths()], networks=[ss.random(), maternal_net], diseases='hiv' ) ``` ### Interventions #### Vaccination ```python import starsim as ss # Define vaccine product vaccine = ss.Vaccine( efficacy=0.9, # 90% efficacy dur_immunity=ss.years(5) # 5 years of protection ) # Routine vaccination (ongoing) routine_vax = ss.routine_vx( product=vaccine, prob=0.7, # 70% coverage start_year=2022, end_year=2030 ) # Campaign vaccination (specific years) campaign_vax = ss.campaign_vx( product=vaccine, prob=0.8, # 80% coverage years=[2023, 2025, 2027] ) # Age-targeted vaccination def eligible_for_vax(sim): return sim.people.age >= 12 # Ages 12+ routine_vax = ss.routine_vx( product=vaccine, prob=0.6, eligibility=eligible_for_vax, start_year=2022 ) sim = ss.Sim( n_agents=50000, diseases='measles', interventions=[routine_vax, campaign_vax], demographics=True ) ``` #### Screening and Testing ```python import starsim as ss # Define screening product test = ss.Diagnostic( sensitivity=0.95, specificity=0.98 ) # Routine screening routine_screen = ss.routine_screening( product=test, prob=0.1, # 10% annual screening rate start_year=2022, eligibility=lambda sim: sim.people.age >= 18 ) # Campaign screening campaign_screen = ss.campaign_screening( product=test, prob=0.5, # 50% coverage years=[2023, 2025] ) sim = ss.Sim( n_agents=20000, diseases='hiv', interventions=[routine_screen] ) ``` #### Treatment ```python import starsim as ss # Define treatment product treatment = ss.Treatment( efficacy=0.85, dur_treatment=ss.days(14) ) # Treat fixed number per timestep treat_intervention = ss.treat_num( product=treatment, max_capacity=100, # Treat up to 100 per timestep prob=0.9, # 90% acceptance eligibility=lambda sim: sim.diseases.sir.infected.uids ) sim = ss.Sim( n_agents=10000, diseases='sir', interventions=treat_intervention ) ``` ### Multi-Simulation and Analysis #### `ss.MultiSim` - Multiple Simulations ```python import starsim as ss # Create base simulation base_sim = ss.Sim( n_agents=10000, diseases='sir', networks='random', rand_seed=None # Will be varied ) # Run multiple simulations msim = ss.MultiSim( base_sim, n_runs=20, # 20 replicates parallel=True # Run in parallel ) msim.run() # Reduce to mean and confidence intervals msim.reduce() # Plot with uncertainty msim.plot() # Access individual sim results for sim in msim.sims: print(f"Seed {sim.pars.rand_seed}: Peak infected = {sim.results.sir.n_infected.max()}") ``` #### Parameter Sweeps ```python import starsim as ss import numpy as np # Sweep over beta values betas = np.linspace(0.05, 0.3, 10) sims = [] for beta in betas: sim = ss.Sim( n_agents=5000, diseases=ss.SIR(beta=beta, init_prev=0.01), networks='random', rand_seed=42 ) sims.append(sim) msim = ss.MultiSim(sims) msim.run() # Analyze results for sim in msim.sims: beta = sim.diseases.sir.pars.beta peak = sim.results.sir.n_infected.max() print(f"Beta={beta:.2f}: Peak={peak}") ``` ### Calibration #### `ss.Calibration` - Parameter Optimization ```python import starsim as ss import pandas as pd # Prepare observed data observed_data = pd.DataFrame({ 't': [2020.5, 2021.0, 2021.5, 2022.0], 'x': [50, 120, 200, 150], # Observed cases 'n': [10000, 10000, 10000, 10000] # Population }).set_index('t') # Define calibration parameters calib_pars = { 'beta': { 'low': 0.05, 'high': 0.3, 'guess': 0.15 }, 'init_prev': { 'low': 0.001, 'high': 0.05, 'guess': 0.01 } } # Define build function def build_sim(sim, calib_pars, **kwargs): sim.diseases.sir.pars.beta = calib_pars['beta']['value'] sim.diseases.sir.pars.init_prev = calib_pars['init_prev']['value'] return sim # Define extraction function for calibration component def extract_cases(sim): return pd.DataFrame({ 't': sim.t.yearvec, 'x': sim.results.sir.n_infected, 'n': [sim.n_agents] * len(sim.t.yearvec) }).set_index('t') # Create calibration component component = ss.Binomial( name='cases', expected=observed_data, extract_fn=extract_cases, conform='prevalent' ) # Create base simulation base_sim = ss.Sim( n_agents=10000, diseases='sir', networks='random', start=2020, stop=2023 ) # Run calibration calib = ss.Calibration( sim=base_sim, calib_pars=calib_pars, build_fn=build_sim, components=[component], total_trials=100, n_workers=4 ) calib.calibrate() # View results print(f"Best parameters: {calib.best_pars}") calib.check_fit() ``` #### Calibration Components ```python import starsim as ss import pandas as pd # BetaBinomial - For count data with uncertainty beta_binom = ss.BetaBinomial( name='prevalence', expected=observed_prev, extract_fn=extract_prevalence, conform='prevalent' ) # Normal - For continuous data normal_comp = ss.Normal( name='incidence_rate', expected=observed_rates, extract_fn=extract_rates, conform='incident', sigma2=0.01 # Known variance ) # GammaPoisson - For count data with overdispersion gamma_poisson = ss.GammaPoisson( name='deaths', expected=observed_deaths, extract_fn=extract_deaths, conform='incident' ) # DirichletMultinomial - For multinomial outcomes dirichlet = ss.DirichletMultinomial( name='age_distribution', expected=observed_ages, extract_fn=extract_age_dist, conform='prevalent' ) ``` ### Results and Plotting ```python import starsim as ss # Run simulation sim = ss.Sim( n_agents=10000, diseases='sir', networks='random' ) sim.run() # Access results results = sim.results # Disease-specific results sir_results = results.sir print(f"Total susceptible: {sir_results.n_susceptible}") print(f"Total infected: {sir_results.n_infected}") print(f"Total recovered: {sir_results.n_recovered}") print(f"Cumulative infections: {sir_results.cum_infections}") # Time vector print(f"Time points: {sim.t.timevec}") print(f"Year vector: {sim.t.yearvec}") # Basic plotting sim.plot() # Custom plotting import matplotlib.pyplot as plt fig, axes = plt.subplots(1, 2, figsize=(12, 5)) axes[0].plot(sim.t.yearvec, results.sir.n_infected) axes[0].set_xlabel('Year') axes[0].set_ylabel('Number Infected') axes[0].set_title('Infection Dynamics') axes[1].stackplot(sim.t.yearvec, results.sir.n_susceptible, results.sir.n_infected, results.sir.n_recovered, labels=['S', 'I', 'R']) axes[1].legend() axes[1].set_xlabel('Year') axes[1].set_ylabel('Population') axes[1].set_title('SIR Compartments') plt.tight_layout() plt.show() ``` ### Custom Modules ```python import starsim as ss import numpy as np # Custom disease module class CustomDisease(ss.Disease): def __init__(self, beta=0.1, gamma=0.05, **kwargs): super().__init__(**kwargs) self.define_pars( beta=beta, gamma=gamma ) self.define_states( ss.BoolArr('exposed'), ss.BoolArr('infected'), ss.FloatArr('ti_exposed'), ss.FloatArr('ti_infected') ) def init_results(self): super().init_results() self.define_results( ss.Result('n_exposed', dtype=int, scale=True), ss.Result('n_infected', dtype=int, scale=True) ) def step_state(self): # Custom state transitions ti = self.sim.ti exposed_uids = self.exposed.uids # Progress from exposed to infected dur_exposed = ti - self.ti_exposed[exposed_uids] progress = dur_exposed > 5 # 5 timesteps latent period new_infected = exposed_uids[progress] self.exposed[new_infected] = False self.infected[new_infected] = True self.ti_infected[new_infected] = ti return # Custom intervention class CustomIntervention(ss.Intervention): def __init__(self, coverage=0.5, **kwargs): super().__init__(**kwargs) self.coverage = coverage def step(self): sim = self.sim # Custom intervention logic eligible = self.check_eligibility() treated = ss.bernoulli(p=self.coverage).filter(eligible) # Apply effects... return treated # Use custom modules sim = ss.Sim( n_agents=10000, diseases=CustomDisease(beta=0.15), interventions=CustomIntervention(coverage=0.3) ) ``` ## Summary Starsim's primary use cases center on epidemiological modeling and public health research. Researchers use Starsim to simulate disease outbreaks, evaluate intervention strategies, and inform public health policy decisions. The framework excels at modeling both acute infectious diseases (like influenza or COVID-19 using SIR/SIS models) and chronic conditions (like HIV or non-communicable diseases). Its agent-based approach captures heterogeneity in populations, including varying contact patterns, demographics, and individual-level risk factors. The calibration system allows fitting models to real-world surveillance data, making Starsim suitable for forecasting and scenario analysis. Integration with Starsim typically follows a modular pattern: users define disease parameters, configure population networks reflecting realistic contact structures, add demographic processes for long-term projections, and layer interventions to test control strategies. The framework integrates seamlessly with the Python scientific ecosystem - NumPy arrays for efficient computation, Pandas DataFrames for data handling, Matplotlib for visualization, and Optuna for optimization. For production use, Starsim supports parallel execution through MultiSim for uncertainty quantification, database-backed calibration for distributed computing environments, and comprehensive results export for downstream analysis. Common integration patterns include connecting Starsim outputs to economic models for cost-effectiveness analysis, linking with geographic information systems for spatial modeling, and embedding in larger pipelines for automated scenario generation and reporting.