https://github.com/sciris/sciris
# Sciris

Sciris is a powerful Python library for scientific computing that provides a collection of utilities to simplify common tasks in data analysis, file I/O, parallel processing, and visualization. Built on top of NumPy, Pandas, and Matplotlib, it offers intuitive interfaces for everyday operations like saving/loading objects, timing code, working with dates, and handling nested data structures. The library is designed to reduce boilerplate code and make scientific workflows more efficient and readable.

The core philosophy of Sciris is to provide "batteries-included" functionality that covers the gaps between Python's standard library and scientific computing packages. It includes enhanced containers like `odict` (an ordered dictionary with array-like indexing), flexible file I/O supporting pickles, JSON, and YAML, easy parallelization tools, robust date/time handling, and numerous helper functions for common operations. Whether you're building simulations, analyzing data, or developing scientific applications, Sciris aims to make your code cleaner and more maintainable.

## File I/O Functions

### sc.save() / sc.load()

Save and load any Python object to/from disk using a compressed pickle format. These functions handle serialization automatically with gzip or zstandard compression, making it easy to persist complex objects including custom classes, NumPy arrays, and nested data structures.
```python
import sciris as sc
import numpy as np

# Save any Python object
data = {
    'name': 'experiment_1',
    'results': np.random.rand(100, 50),
    'metadata': {'date': '2024-01-15', 'version': 2.1}
}
sc.save('experiment.obj', data)

# Load the object back
loaded = sc.load('experiment.obj')
print(loaded['name'])          # 'experiment_1'
print(loaded['results'].shape) # (100, 50)

# Save with zstandard compression for a better compression ratio
sc.save('experiment.zst', data, compression='zstd')

# Or use the zsave shortcut
sc.zsave('experiment_zstd.obj', data)

# Loading works automatically with any compression format
loaded_zstd = sc.load('experiment.zst')
```

### sc.savejson() / sc.loadjson()

Save and load JSON files with automatic handling of NumPy arrays, dates, and other non-JSON-serializable types. Provides a more robust alternative to the standard `json` module for scientific data.

```python
import sciris as sc
import numpy as np

# Save data as JSON
config = {
    'parameters': {'alpha': 0.5, 'beta': 1.2},
    'array_data': np.array([1, 2, 3, 4, 5]),
    'enabled': True
}
sc.savejson('config.json', config)

# Load JSON with automatic type conversion
loaded_config = sc.loadjson('config.json')
print(loaded_config['parameters']['alpha']) # 0.5

# Pretty-print JSON to file
sc.savejson('config_pretty.json', config, indent=2)
```

### sc.loadtext() / sc.savetext()

Convenience functions for reading and writing text files with minimal boilerplate.

```python
import sciris as sc

# Save text to a file
content = ['Line 1: Introduction', 'Line 2: Methods', 'Line 3: Results']
sc.savetext('document.txt', content)

# Load text from a file
text = sc.loadtext('document.txt')
print(text) # Full file contents as a string

# Load as a list of lines
lines = sc.loadtext('document.txt', splitlines=True)
print(lines[0]) # 'Line 1: Introduction'
```

### sc.thisdir() / sc.glob()

Get the current directory path and find files using glob patterns. Essential utilities for file path handling in scripts.
```python
import sciris as sc

# Get the directory of the current script
current_dir = sc.thisdir()
print(current_dir) # e.g., '/home/user/project/scripts'

# Get a file path relative to the current script
config_path = sc.thisdir('config', 'settings.json')

# Find all Python files in a directory
py_files = sc.glob('~/projects', '*.py', abspath=True)
print(py_files) # List of absolute paths to .py files

# Find files recursively
all_csvs = sc.glob('.', '**/*.csv', recursive=True)

# Find only files (not directories)
data_files = sc.glob('./data', filesonly=True)
```

## Ordered Dictionary (odict)

### sc.odict

An enhanced ordered dictionary that supports integer indexing, slicing, and array-like operations. Combines the best features of dictionaries, lists, and arrays into a single flexible container.

```python
import sciris as sc
import numpy as np

# Create an odict
data = sc.odict(
    temperatures=[20.1, 21.3, 22.5, 23.1],
    pressures=[101.2, 101.5, 101.8, 102.0],
    humidity=[45, 50, 55, 60]
)

# Access by key (like a dict)
print(data['temperatures']) # [20.1, 21.3, 22.5, 23.1]

# Access by integer index (like a list)
print(data[0]) # [20.1, 21.3, 22.5, 23.1]
print(data[1]) # [101.2, 101.5, 101.8, 102.0]

# Slice access returns a numpy array
print(data[:2]) # array with the first two values

# Iterate with index, key, and value
for i, key, value in data.enumitems():
    print(f'{i}: {key} = {value}')

# Use as a defaultdict
nested = sc.odict(defaultdict=list)
nested['new_key'].append('auto-created')

# Infinitely nested dictionary
deep = sc.odict(defaultdict='nested')
deep['level1']['level2']['level3'] = 'value'
print(deep['level1']['level2']['level3']) # 'value'
```

### sc.objdict

Like `odict`, but also allows attribute-style access to dictionary keys. Perfect for configuration objects and structured data.
```python
import sciris as sc

# Create an objdict
config = sc.objdict(
    model='ResNet50',
    learning_rate=0.001,
    batch_size=32,
    epochs=100
)

# Access via attribute (cleaner syntax)
print(config.model)         # 'ResNet50'
print(config.learning_rate) # 0.001

# Still works like a dictionary
config['dropout'] = 0.5
print(config.dropout) # 0.5

# Iterate like a dictionary
for key, value in config.items():
    print(f'{key}: {value}')

# Convert from a regular dict
settings = sc.objdict({'name': 'experiment', 'seed': 42})
print(settings.name) # 'experiment'
```

## Date and Time Functions

### sc.now() / sc.date()

Get the current time and convert various date formats to Python datetime objects. Provides flexible date parsing and formatting.

```python
import sciris as sc

# Get the current time
current_time = sc.now()
print(current_time) # datetime object: 2024-01-15 14:30:45

# Get the time as a string
time_str = sc.now(astype='str')
print(time_str) # '2024-Jan-15 14:30:45'

# Get the time in a specific timezone
utc_time = sc.now(utc=True)
pacific = sc.now(timezone='US/Pacific')

# Convert a string to a date
date1 = sc.date('2024-03-15')
print(date1) # datetime.date(2024, 3, 15)

# Convert multiple dates at once
dates = sc.date(['2024-01-01', '2024-06-15', '2024-12-31'])

# Convert an integer offset to a date
day_10 = sc.date(10, start_date='2024-01-01')
print(day_10) # datetime.date(2024, 1, 11)

# Output as a string
date_str = sc.date('2024-03-15', to='str', outformat='%Y/%m/%d')
print(date_str) # '2024/03/15'
```

### sc.daterange() / sc.datedelta()

Generate date ranges and perform date arithmetic. Useful for time series analysis and scheduling.
```python
import sciris as sc

# Generate a range of dates
dates = sc.daterange('2024-01-01', '2024-01-10')
print(dates) # ['2024-01-01', '2024-01-02', ..., '2024-01-10']

# Generate dates with an interval
monthly = sc.daterange('2024-01-01', '2024-12-01', interval='month')
print(monthly) # First day of each month

# Generate dates using a delta
dates_5weeks = sc.daterange('2024-01-01', weeks=5)

# Perform date arithmetic
future = sc.datedelta('2024-01-15', days=30)
print(future) # '2024-02-14'

past = sc.datedelta('2024-06-15', months=-3)
print(past) # '2024-03-15'

# Add multiple units
new_date = sc.datedelta('2024-01-01', years=1, months=2, days=15)
print(new_date) # '2025-03-16'

# Calculate the difference between dates
diff = sc.daydiff('2024-01-01', '2024-03-15')
print(diff) # 74 days
```

### sc.tic() / sc.toc() / sc.timer()

Simple and intuitive timing functions for measuring code execution time. Much cleaner than using the `time` module directly.

```python
import sciris as sc
import numpy as np

# Simple timing with tic/toc
sc.tic()
result = np.random.rand(1000, 1000) @ np.random.rand(1000, 1000)
sc.toc() # Prints: Elapsed time: 0.234 s

# Named timing
sc.tic()
# ... some operation ...
elapsed = sc.toc(output=True) # Returns the elapsed time without printing

# Timer context manager (recommended)
with sc.timer('Matrix multiplication'):
    result = np.random.rand(1000, 1000) @ np.random.rand(1000, 1000)
# Output: Matrix multiplication: 0.234 s

# Timer object for multiple measurements
T = sc.timer()

T.tic()
np.sort(np.random.rand(1_000_000))
T.toc('Sorting 1M elements')

T.tic()
np.fft.fft(np.random.rand(1_000_000))
T.toc('FFT of 1M elements')

# Get a timing summary
print(T) # Shows all recorded times
print(f'Total time: {T.total} seconds')
```

## Parallelization

### sc.parallelize()

Easy parallelization of functions across multiple CPU cores. Abstracts away the complexity of multiprocessing with a simple interface.
```python
import sciris as sc
import numpy as np

# Define a function to parallelize
def process_data(x, multiplier=1):
    sc.randsleep() # Simulate variable processing time
    return x ** 2 * multiplier

# Parallelize over a range of values
results = sc.parallelize(process_data, iterarg=range(10))
print(results) # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Parallelize with additional arguments
results = sc.parallelize(
    process_data,
    iterarg=range(10),
    kwargs={'multiplier': 2}
)
print(results) # [0, 2, 8, 18, 32, 50, 72, 98, 128, 162]

# Use iterkwargs for varying keyword arguments
results = sc.parallelize(
    process_data,
    iterkwargs=[
        {'x': 1, 'multiplier': 1},
        {'x': 2, 'multiplier': 2},
        {'x': 3, 'multiplier': 3},
    ]
)
print(results) # [1, 8, 27]

# Control the number of CPUs
results = sc.parallelize(process_data, iterarg=range(20), ncpus=4)

# Show a progress bar
results = sc.parallelize(process_data, iterarg=range(100), progress=True)

# Run serially for debugging
results = sc.parallelize(process_data, iterarg=range(10), serial=True)
```

### sc.Parallel (Advanced Usage)

For more control over parallel execution, use the `Parallel` class directly with async monitoring.

```python
import sciris as sc

def slow_computation(i):
    sc.randsleep(seed=i)
    return i ** 2

# Create the parallel manager
P = sc.Parallel(
    slow_computation,
    iterarg=range(20),
    parallelizer='multiprocess-async',
    ncpus=4
)

# Start async execution
P.run_async()

# Monitor progress
P.monitor() # Displays a progress bar

# Get the results
P.finalize()
print(P.results) # [0, 1, 4, 9, 16, ...]
print(P.times)   # Timing information for each job
```

## Dataframe Extensions

### sc.dataframe

An extended pandas DataFrame with additional convenience methods for flexible row/column access, data manipulation, and easier syntax for common operations.
```python
import sciris as sc
import numpy as np

# Create a dataframe
df = sc.dataframe(
    a=[1, 2, 3, 4, 5],
    b=[10, 20, 30, 40, 50],
    c=['x', 'y', 'z', 'w', 'v']
)

# Access by column name (standard)
print(df['a']) # Column 'a'

# Access by row index (integer)
print(df[0])   # First row
print(df[1:3]) # Rows 1 and 2

# Access by row and column
print(df['a', 2]) # Value at column 'a', row 2 (result: 3)
print(df[2, 'a']) # Same result, with the order reversed

# Slice operations
df[0, :] = [100, 1000, 'new'] # Set an entire row
print(df)

# Add a new column
df.addcol('d', [1.1, 2.2, 3.3, 4.4, 5.5])

# Remove a column
df.rmcol('d')

# Append a row
df.append([6, 60, 'u'])

# Insert a row at a position
df.insertrow(2, [2.5, 25, 'y.5'])

# Remove a row by value
df.rmrow(100) # Remove the row where the first column equals 100

# Sort by column
df.sort('b', reverse=True)

# Find a row by value
row = df.findrow(3) # Find the row where the first column equals 3
print(row)
```

## Array and Math Utilities

### sc.findinds() / sc.findnearest()

Find array indices matching conditions or nearest values. More flexible than NumPy's native functions.
```python
import sciris as sc
import numpy as np

data = np.array([1.0, 2.5, 3.0, 2.5, 5.0, 2.5, 7.0])

# Find indices where the value equals 2.5 (with floating-point tolerance)
indices = sc.findinds(data, 2.5)
print(indices) # array([1, 3, 5])

# Find the first matching index
first = sc.findinds(data, 2.5, first=True)
print(first) # 1

# Find the last matching index
last = sc.findinds(data, 2.5, last=True)
print(last) # 5

# Find indices matching a condition
small = sc.findinds(data < 3)
print(small) # array([0, 1, 3, 5])

# Multiple conditions
combined = sc.findinds(data > 2, data < 6)
print(combined) # array([1, 2, 3, 4, 5])

# Find the nearest value in an array
series = np.array([0, 10, 20, 30, 40, 50])
nearest_idx = sc.findnearest(series, 23)
print(nearest_idx) # 2 (value 20 is nearest to 23)

# Find the nearest indices for multiple values
indices = sc.findnearest(series, [15, 35, 47])
print(indices) # array([1, 3, 5])
```

### sc.smooth() / sc.rolling()

Apply smoothing and rolling operations to data arrays for noise reduction and trend analysis.

```python
import sciris as sc
import numpy as np

# Create noisy data
np.random.seed(42)
noisy = np.sin(np.linspace(0, 4*np.pi, 100)) + np.random.randn(100) * 0.3

# Simple smoothing
smoothed = sc.smooth(noisy, window=5)

# Rolling average
rolling_avg = sc.rolling(noisy, window=10, operation='mean')

# Rolling with different operations
rolling_sum = sc.rolling(noisy, window=10, operation='sum')
rolling_std = sc.rolling(noisy, window=10, operation='std')

# 2D smoothing
data_2d = np.random.rand(50, 50)
smoothed_2d = sc.smooth(data_2d, window=3)
```

## Printing and Display

### sc.pr() / sc.prettyobj

Pretty-print detailed representations of objects, including methods, properties, and attributes. Invaluable for debugging and exploration.
```python
import sciris as sc
import numpy as np

# Pretty-print any object to see its structure
df = sc.dataframe(a=[1, 2, 3], b=[4, 5, 6])
sc.pr(df) # Shows all methods, properties, and attributes

# Create objects with automatic pretty printing
class MyModel(sc.prettyobj):
    def __init__(self):
        self.weights = np.random.rand(10, 10)
        self.bias = np.zeros(10)
        self.learning_rate = 0.001

    def train(self):
        pass

    def predict(self, x):
        return x @ self.weights + self.bias

model = MyModel()
print(model) # Automatically shows all attributes and methods

# For large objects, use quickobj (doesn't print values)
class BigModel(sc.quickobj):
    def __init__(self):
        self.big_array = np.random.rand(1000, 1000)

big = BigModel()
print(big) # Shows the structure without printing large arrays
```

### sc.sigfig() / sc.heading()

Format numbers with significant figures and print formatted headings for readable output.

```python
import sciris as sc
import numpy as np

# Format a number with significant figures
value = 3432.3842
print(sc.sigfig(value, sigfigs=3)) # '3430'
print(sc.sigfig(value, sigfigs=5)) # '3432.4'

# Use SI notation
print(sc.sigfig(1234567, SI=True))   # '1.235M'
print(sc.sigfig(0.00234, sigfigs=2)) # '0.0023'

# Format an array of numbers
values = np.array([1234, 5678, 91011])
formatted = sc.sigfig(values, sigfigs=2)
print(formatted) # ['1200', '5700', '91000']

# Print formatted headings
sc.heading('Results Summary')
# Output:
# =====================================
# Results Summary
# =====================================

sc.heading('Section 1', level=2)
# Output:
# --- Section 1 ---

# Print with color
sc.heading('Important Notice', color='red')
```

### sc.progressbar()

Display text-based progress bars for long-running operations without additional dependencies.
```python
import sciris as sc
import time

# Simple progress bar in a loop
n = 100
for i in range(n):
    sc.progressbar(i+1, n)
    time.sleep(0.01) # Simulate work

# With a custom label
for i in range(50):
    sc.progressbar(i+1, 50, label='Processing files')
    time.sleep(0.02)

# Manual control
sc.progressbar(25, 100, label='Download progress') # Shows 25%
```

## Nested Data Operations

### sc.getnested() / sc.setnested()

Access and modify deeply nested data structures using key lists. Essential for working with complex JSON configurations and hierarchical data.

```python
import sciris as sc

# Create nested data
data = {
    'experiment': {
        'config': {
            'model': {
                'name': 'transformer',
                'layers': 12,
                'hidden_size': 768
            },
            'training': {
                'epochs': 100,
                'batch_size': 32
            }
        },
        'results': {
            'accuracy': 0.95
        }
    }
}

# Get a nested value using a key list
model_name = sc.getnested(data, ['experiment', 'config', 'model', 'name'])
print(model_name) # 'transformer'

# Set a nested value
sc.setnested(data, ['experiment', 'config', 'training', 'epochs'], 200)
print(data['experiment']['config']['training']['epochs']) # 200

# Create a nested structure automatically
new_data = {}
sc.makenested(new_data, ['level1', 'level2', 'level3'], value='deep value')
print(new_data) # {'level1': {'level2': {'level3': 'deep value'}}}

# Iterate over all nested keys
for keylist in sc.iternested(data):
    value = sc.getnested(data, keylist)
    print(f'{" > ".join(keylist)}: {value}')
```

### sc.search()

Search through nested objects to find values, keys, or patterns. Powerful for exploring complex data structures.
```python
import sciris as sc

# Complex nested object
obj = sc.objdict(
    users=[
        {'name': 'Alice', 'email': 'alice@example.com', 'age': 30},
        {'name': 'Bob', 'email': 'bob@example.com', 'age': 25},
    ],
    settings={'theme': 'dark', 'notifications': True},
    metadata={'version': '1.0', 'name': 'MyApp'}
)

# Search for a value
results = sc.search(obj, 'Alice')
print(results) # Shows the paths where 'Alice' was found

# Search for a key
results = sc.search(obj, 'email', method='key')
print(results) # Shows all paths containing an 'email' key

# Search with a regex pattern
results = sc.search(obj, r'.*@example\.com', method='value')
```

## Memory and Profiling

### sc.checkmem() / sc.checkram()

Check the memory usage of objects and current RAM consumption. Essential for optimizing memory-intensive applications.

```python
import sciris as sc
import numpy as np

# Check the memory usage of an object
big_array = np.random.rand(1000, 1000)
sc.checkmem(big_array) # Output: DataFrame showing memory usage

# Check the memory of a nested structure
data = {
    'small': np.random.rand(10, 10),
    'medium': np.random.rand(100, 100),
    'large': np.random.rand(500, 500)
}
sc.checkmem(data, descend=1) # Shows the memory breakdown for each key

# Check current RAM usage
start_ram = sc.checkram(to_string=False)
large_data = np.random.rand(10000, 1000)
print(sc.checkram(start=start_ram)) # Shows the RAM increase
```

### sc.benchmark()

Quickly benchmark your system's Python and NumPy performance.

```python
import sciris as sc

# Run the standard benchmark
results = sc.benchmark()
print(results) # e.g. {'python': 11.2, 'numpy': 245.3} (MOPS)

# Benchmark only NumPy
numpy_mops = sc.benchmark(which='numpy')
if numpy_mops > 300:
    print('Fast system!')
elif numpy_mops < 100:
    print('Slow system')
else:
    print('Average system')

# More detailed benchmarking
results = sc.benchmark(repeats=10, verbose=True)
```

## 3D Plotting

### sc.plot3d() / sc.scatter3d()

Create 3D visualizations with minimal code. Simplifies matplotlib's 3D plotting interface.
```python
import sciris as sc
import numpy as np
import matplotlib.pyplot as plt

# Create a 3D line plot
t = np.linspace(0, 10*np.pi, 1000)
x = np.sin(t)
y = np.cos(t)
z = t
sc.plot3d(x, y, z, c='index')
plt.title('3D Helix')
plt.show()

# 3D scatter plot
n = 500
x = np.random.randn(n)
y = np.random.randn(n)
z = x**2 + y**2
sc.scatter3d(x, y, z, c=z, cmap='viridis', s=20)
plt.title('3D Scatter with Color')
plt.show()

# Surface plot
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))
sc.surf3d(Z, cmap='coolwarm')
plt.title('Surface Plot')
plt.show()
```

## Summary

Sciris excels at reducing the friction in everyday scientific computing tasks. Its primary use cases include: rapid prototyping of data analysis pipelines where you need flexible containers and quick file I/O; building scientific simulations that require parallel processing and timing instrumentation; developing applications that work with complex nested configurations and hierarchical data; and creating reproducible research workflows with robust object serialization. The library is particularly valuable when you find yourself repeatedly writing boilerplate code for common operations like timing, file handling, or date manipulation.

Integration with existing codebases is straightforward, since Sciris builds on standard libraries (NumPy, Pandas, Matplotlib) rather than replacing them. You can adopt individual functions as needed without committing to the entire library. Common patterns include using `sc.objdict` for configuration management, `sc.save`/`sc.load` for checkpointing long-running computations, `sc.parallelize` for embarrassingly parallel workloads, and `sc.timer` for performance monitoring. The library's design philosophy emphasizes sensible defaults while allowing full customization when needed, making it suitable for both quick scripts and production applications.