### Full Docker Compose Setup

Source: https://github.com/datajoint/datajoint-python/blob/master/CONTRIBUTING.md

Build and start the DataJoint test environment using Docker Compose.

```bash
docker compose --profile test up djtest --build
```

--------------------------------

### Alternative Development Setup with pip

Source: https://github.com/datajoint/datajoint-python/blob/master/CONTRIBUTING.md

Install the project in editable mode with test dependencies and run pytest.

```bash
pip install -e ".[test]"
pytest tests/
```

--------------------------------

### Install and Run Pre-commit Hooks

Source: https://github.com/datajoint/datajoint-python/blob/master/CONTRIBUTING.md

Install pre-commit hooks for the first time and run them manually across all files.

```bash
pixi run pre-commit install          # First time only
pixi run pre-commit run --all-files  # Run manually
```

--------------------------------

### Quick Start with pixi

Source: https://github.com/datajoint/datajoint-python/blob/master/CONTRIBUTING.md

Clone the repository and run tests or pre-commit hooks using pixi for dependency management.

```bash
git clone https://github.com/datajoint/datajoint-python.git
cd datajoint-python

# Run tests (containers managed automatically)
pixi run test

# Run with coverage
pixi run test-cov

# Run pre-commit hooks
pixi run pre-commit run --all-files
```

--------------------------------

### NumPy-style Docstring Example

Source: https://github.com/datajoint/datajoint-python/blob/master/CONTRIBUTING.md

Example of a public API docstring following NumPy style, including parameters, return values, raises, and examples.

```python
def insert(self, rows, *, replace=False):
    """
    Insert rows into the table.

    Parameters
    ----------
    rows : iterable
        Rows to insert. Each row can be a dict, numpy record, or sequence.
    replace : bool, optional
        If True, replace existing rows with matching keys. Default is False.
    Returns
    -------
    None

    Raises
    ------
    DuplicateError
        When inserting a duplicate key without ``replace=True``.

    Examples
    --------
    >>> Mouse.insert1({"mouse_id": 1, "dob": "2024-01-15"})
    """
```

--------------------------------

### Install PostgreSQL Driver

Source: https://github.com/datajoint/datajoint-python/blob/master/CONTRIBUTING.md

Install the PostgreSQL driver for DataJoint support.

```bash
pip install -e ".[postgres]"  # Installs psycopg2-binary
```

--------------------------------

### Install datajoint with pip

Source: https://github.com/datajoint/datajoint-python/blob/master/README.md

Use pip to install the datajoint package.

```bash
pip install datajoint
```

--------------------------------

### Install datajoint with Conda

Source: https://github.com/datajoint/datajoint-python/blob/master/README.md

Use Conda to install the datajoint package from the conda-forge channel.

```bash
conda install -c conda-forge datajoint
```

--------------------------------

### Running External Containers for Debugging

Source: https://github.com/datajoint/datajoint-python/blob/master/CONTRIBUTING.md

Start external MySQL, PostgreSQL, and MinIO containers using docker compose, run tests with DJ_USE_EXTERNAL_CONTAINERS=1, and then stop the containers.

```bash
# MySQL + MinIO
docker compose up -d db minio
DJ_USE_EXTERNAL_CONTAINERS=1 pixi run test
docker compose down

# MySQL + PostgreSQL + MinIO
docker compose up -d db postgres minio
DJ_USE_EXTERNAL_CONTAINERS=1 pixi run test
docker compose down
```

--------------------------------

### Conda-Forge `meta.yaml` Dependencies

Source: https://github.com/datajoint/datajoint-python/blob/master/RELEASE_MEMO.md

Example of the `requirements` section in a conda-forge `meta.yaml` file, detailing the host and run dependencies for the datajoint package. Ensure these match the `pyproject.toml`.
```yaml
requirements:
  host:
    - python {{ python_min }}
    - pip
    - setuptools >=62.0
  run:
    - python >={{ python_min }}
    - numpy
    - pandas
    - pymysql >=1.0
    - minio
    - packaging
    # ... etc
```

--------------------------------

### List and Get DataJoint Codecs

Source: https://context7.com/datajoint/datajoint-python/llms.txt

List all registered DataJoint codecs and retrieve a specific codec object by its name.

```python
print(dj.list_codecs())
# ['blob', 'npy', 'attach', 'filepath', 'hash', 'schema', 'graph']
codec_obj = dj.get_codec('graph')
```

--------------------------------

### Fetch Data as PyArrow Table

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Convert query results into a PyArrow Table. Requires the 'arrow' extra to be installed (`pip install datajoint[arrow]`).

```python
arrow_tbl = Subject.to_arrow()
```

--------------------------------

### Fetch Data as Pandas DataFrame

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Convert query results into a pandas DataFrame with primary keys as the index. Requires pandas to be installed.

```python
df = Subject.to_pandas()
# DataFrame with index=['subject_id'], columns=['species','dob']
```

--------------------------------

### Fetch Data as Polars DataFrame

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Retrieve data as a polars DataFrame. Requires the 'polars' extra to be installed (`pip install datajoint[polars]`).

```python
pl_df = Subject.to_polars(order_by='subject_id')
```

--------------------------------

### Get Table Description and Alter Definition

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Retrieve the DataJoint DDL string for a table using `describe()` and alter the table definition interactively or non-interactively.
```python
# Table description (DataJoint DDL)
print(Analysis.describe())

# Alter table definition (adds/modifies columns)
Analysis.alter()              # interactive prompt
Analysis.alter(prompt=False)  # immediate
```

--------------------------------

### Get SHA256 Hash for Conda Package (Bash)

Source: https://github.com/datajoint/datajoint-python/blob/master/RELEASE_MEMO.md

Bash command to retrieve the SHA256 hash for a Python package's source distribution from PyPI. It uses `curl` to fetch the JSON data and `jq` to parse and extract the specific hash.

```bash
curl -sL https://pypi.org/pypi/datajoint/2.1.0/json | jq -r '.urls[] | select(.packagetype=="sdist") | .digests.sha256'
```

--------------------------------

### Establishing Database Connections with dj.conn()

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Shows how to establish and manage database connections using the singleton `dj.conn()` function or the explicit `dj.Connection` class. Covers connecting with default credentials, explicit credentials, forcing a reconnect, using TLS, and managing transactions.

```python
import datajoint as dj

# Singleton connection: credentials from dj.config or env vars
connection = dj.conn()

# Explicit credentials
connection = dj.conn(host="localhost", user="root", password="secret")

# Force re-connect (e.g., after credential rotation)
connection = dj.conn(reset=True)

# With TLS
connection = dj.conn(host="secure-db.org", use_tls=True)

# Direct Connection object (bypasses singleton)
conn = dj.Connection("localhost", "root", "secret", port=3306)

# Using within a transaction
with conn.transaction:
    SomeTable.insert([{'id': 1, 'value': 42}])
    OtherTable.insert([{'id': 1, 'result': 'ok'}])
    # Rolls back automatically on exception

# List all accessible schemas
schemas = dj.list_schemas()
print(schemas)  # ['my_lab', 'shared_data', ...]
```

--------------------------------

### Multi-Tenant Instances with dj.Instance

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Illustrates the creation and usage of `dj.Instance` for thread-safe, isolated contexts, essential for multi-tenant applications or when `DJ_THREAD_SAFE=true`. Shows how to create an instance with specific configurations and bind schemas to it.

```python
import datajoint as dj

# Create a fully isolated instance
inst = dj.Instance(
    host="db.example.org",
    user="alice",
    password="secret",
    backend="postgresql",
    safemode=False,  # keyword config overrides
)

# Access config and connection
inst.config.display.limit = 50
print(inst.connection)

# Create schema bound to this instance
schema = inst.Schema("lab_alice")


@schema
class Session(dj.Manual):
    definition = '''
    session_id : int
    ---
    date : date
    '''


# Access an existing table without defining a class
tbl = inst.FreeTable("lab_alice.session")
print(len(tbl))
```

--------------------------------

### Configuration Management with dj.config

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Demonstrates reading, modifying, and transiently overriding DataJoint configuration settings. Configuration can be loaded from environment variables, a .secrets/ directory, or a datajoint.json file. It also shows how to configure object stores like S3.
```python
import datajoint as dj

# Read settings
print(dj.config.database.host)     # "localhost"
print(dj.config.database.backend)  # "mysql"
print(dj.config.safemode)          # True

# Modify at runtime
dj.config.database.host = "db.example.org"
dj.config['database.user'] = "alice"  # Dot-notation dict-style access
dj.config['loglevel'] = "DEBUG"

# Transient override (automatically restored)
with dj.config.override(safemode=False, database__host="staging-db"):
    SomeTable.delete()  # runs without confirmation prompt

# Generate a project template (creates datajoint.json + .secrets/)
dj.config.save_template()  # minimal template
dj.config.save_template("full-config.json", minimal=False)

# Load from explicit file
dj.config.load("my-project.json")

# Object store configuration (S3 example)
dj.config.stores['main'] = {
    "protocol": "s3",
    "endpoint": "s3.amazonaws.com",
    "bucket": "my-bucket",
    "access_key": "AKIAIOSFODNN7EXAMPLE",
    "secret_key": "wJalrXUtnFEMI/K7MDENG",
    "location": "my-project/data",
}
dj.config.stores['default'] = "main"
```

--------------------------------

### AutoPopulate: `populate` and `make`

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Drives automated computation. `populate()` calls `make(key)` for each primary key in `key_source` not yet present in the table. Supports parallel processing, progress display, error suppression, and distributed job queues.

```APIDOC
## AutoPopulate: `populate` and `make`

Drives automated computation. `populate()` calls `make(key)` for each primary key in `key_source` not yet present in the table. Supports parallel processing, progress display, error suppression, and distributed job queues.
```python
import datajoint as dj

schema = dj.Schema('pipeline')


@schema
class Subject(dj.Manual):
    definition = '''
    subject_id: int
    ---
    name: varchar(64)
    '''


@schema
class Analysis(dj.Computed):
    definition = '''
    -> Subject
    ---
    result : float
    '''

    def make(self, key):
        import time, random
        time.sleep(0.1)  # simulate computation
        result = random.gauss(0, 1)
        self.insert1(dict(key, result=result))


# Basic populate (direct mode)
Analysis.populate()

# With progress bar
Analysis.populate(display_progress=True)

# Only populate specific subjects
Analysis.populate(Subject & "subject_id < 100")

# Stop after N calls
Analysis.populate(max_calls=10)

# Suppress errors, collect failures
status = Analysis.populate(suppress_errors=True)
print(status['success_count'])
print(status['error_list'])  # list of (key, error_message) tuples

# Parallel (multi-process)
Analysis.populate(processes=4, display_progress=True)

# Check progress
remaining, total = Analysis.progress(display=True)

# Distributed mode with job table
# (creates ~~analysis job table automatically)
Analysis.populate(reserve_jobs=True)  # reserve + run
Analysis.jobs.refresh()               # populate job queue for others to claim


# Tripartite make for long computations (fetch outside transaction)
class HeavyAnalysis(dj.Computed):
    definition = '''
    -> Subject
    ---
    result: longblob
    '''

    def make_fetch(self, key):
        return (Subject & key).to_dicts(),  # trailing comma: returns a tuple

    def make_compute(self, key, subjects):
        import time
        time.sleep(60)  # long computation outside transaction
        return sum(s['subject_id'] for s in subjects),

    def make_insert(self, key, total):
        self.insert1(dict(key, result=total))
```
```

--------------------------------

### AutoPopulate with Progress Bar

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Run `.populate()` with the `display_progress=True` option to show a progress bar, which is helpful for long-running computations.
```python
Analysis.populate(display_progress=True)
```

--------------------------------

### Define DataJoint Schema and Manual Table

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Set up a DataJoint schema and define a manual table. Manual tables require data to be inserted explicitly.

```python
import datajoint as dj

schema = dj.Schema('pipeline')


@schema
class Subject(dj.Manual):
    definition = '''
    subject_id: int
    ---
    name: varchar(64)
    '''
```

--------------------------------

### Basic AutoPopulate Execution

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Trigger the population of a computed table using `.populate()`. This method computes entries for keys in the `key_source` that are not yet in the table.

```python
Analysis.populate()
```

--------------------------------

### Delete Data with Cascade Options

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Use `delete()` for cascading deletes with transaction previews and optional confirmation. `delete_quick()` offers a non-cascading fast delete. Part table integrity can be managed with `part_integrity` options.
```python
import datajoint as dj

# Delete with cascade (prompts if safemode=True)
n = (Subject & {'subject_id': 1}).delete()
print(f"Deleted {n} rows")
```

```python
# Delete without prompt
(Subject & "dob < '2023-01-01'").delete(prompt=False)
```

```python
# Non-cascading fast delete (fails if dependents exist)
(Subject & {'subject_id': 99}).delete_quick()
```

```python
# Part table integrity options
Analysis.Stats.delete(part_integrity='ignore')   # allow deleting parts directly
Analysis.Stats.delete(part_integrity='cascade')  # also delete master rows
```

```python
# Drop a table (cascading drop)
DeprecatedTable.drop()              # prompts for confirmation
DeprecatedTable.drop(prompt=False)  # immediate drop
```

```python
# Part table drop
Analysis.Stats.drop(part_integrity='ignore')
```

```python
# Schema drop (all tables)
schema.drop(prompt=False)
```

--------------------------------

### Parallel AutoPopulate Execution

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Enable parallel processing for `.populate()` by specifying the number of processes. This can significantly speed up computations on multi-core machines.

```python
Analysis.populate(processes=4, display_progress=True)
```

--------------------------------

### Ordered Fetch for Previewing

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Use `dj.Top()` within a relation to fetch a limited number of rows based on a specified order, useful for previews or sampling.

```python
first5 = (Subject & dj.Top(limit=5, order_by='subject_id')).to_dicts()
```

--------------------------------

### Visualize Table Dependencies with `dj.Diagram`

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Use `dj.Diagram` to visualize the dependency graph of tables. It supports set operators for subgraph selection and cascade preview for delete/drop operations. Drawing requires matplotlib and pygraphviz.
```python
import datajoint as dj

# Diagram of a single table (no display; just the graph)
diag = dj.Diagram(Analysis)
```

```python
# Expand n levels up (ancestors) or down (descendants)
diag_up2 = dj.Diagram(Analysis) - 2    # 2 levels of parents
diag_down1 = dj.Diagram(Analysis) + 1  # 1 level of children
diag_both = dj.Diagram(Analysis) - 1 + 1
```

```python
# Diagram of an entire schema
diag_schema = dj.Diagram(schema)
```

```python
# Set operators
combined = dj.Diagram(Analysis) + dj.Diagram(Subject)
diff = dj.Diagram(schema) - dj.Diagram(DeprecatedTable)
```

```python
# Draw (requires matplotlib + pygraphviz)
dj.Diagram(schema).draw()
```

```python
# Get row counts per table in the diagram
diag.counts()
```

```python
# Temporarily change layout direction
with dj.config.override(display__diagram_direction="TB"):
    dj.Diagram(schema).draw()
```

```python
# Preview cascade impact of a delete without executing
preview_diag = dj.Diagram.cascade(Subject & "subject_id < 5")
for ft in preview_diag:
    print(ft.full_table_name, len(ft))
```

--------------------------------

### Define Database Schema and Tables

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Binds table classes to a database schema. The schema creates the database if needed and declares tables from definition strings.

```python
import datajoint as dj

schema = dj.Schema('my_pipeline')


@schema
class Subject(dj.Manual):
    definition = '''
    subject_id : varchar(12)   # unique subject identifier
    ---
    species : enum('mouse','rat','human')
    date_of_birth: date
    notes='' : varchar(2048)
    '''


@schema
class Session(dj.Manual):
    definition = '''
    -> Subject
    session_id : smallint
    ---
    session_date : date
    experimenter : varchar(64)
    '''
```

--------------------------------

### Lineage Tracking and Job Management

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Manage lineage tracking for DataJoint schemas, including rebuilding the lineage table and checking its existence.
It also demonstrates how to refresh and inspect jobs for auto-populated tables.

```python
# Lineage tracking
print(schema.lineage)               # dict mapping attr -> origin
schema.rebuild_lineage()            # rebuild ~lineage table
print(schema.lineage_table_exists)

# Job management for auto-populated tables
Analysis.jobs.refresh()
pending_keys = Analysis.jobs.pending.to_dicts()
print(Analysis.jobs.errors.to_dicts())  # see error records
```

--------------------------------

### Distributed AutoPopulate with Job Table

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Configure `.populate()` for distributed processing by setting `reserve_jobs=True`. This creates and manages a job queue for distributed workers.

```python
# (creates ~~analysis job table automatically)
Analysis.populate(reserve_jobs=True)  # reserve + run
Analysis.jobs.refresh()               # populate job queue for others to claim
```

--------------------------------

### Tripartite make for Long Computations

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Implement a tripartite make pattern for long computations by defining `make_fetch`, `make_compute`, and `make_insert` methods. This keeps the long computation outside the transaction.

```python
class HeavyAnalysis(dj.Computed):
    definition = '''
    -> Subject
    ---
    result: longblob
    '''

    def make_fetch(self, key):
        return (Subject & key).to_dicts(),  # trailing comma: returns a tuple

    def make_compute(self, key, subjects):
        import time
        time.sleep(60)  # long computation outside transaction
        return sum(s['subject_id'] for s in subjects),

    def make_insert(self, key, total):
        self.insert1(dict(key, result=total))
```

--------------------------------

### Running DataJoint Tests

Source: https://github.com/datajoint/datajoint-python/blob/master/CONTRIBUTING.md

Execute various test suites using pixi, including all tests, coverage, unit tests, specific integration files, or tests filtered by backend.
```bash
pixi run test                                              # All tests (both backends)
pixi run test-cov                                          # With coverage
pixi run -e test pytest tests/unit/                        # Unit tests only
pixi run -e test pytest tests/integration/test_blob.py -v  # Specific file
pixi run -e test pytest -m mysql                           # MySQL tests only
pixi run -e test pytest -m postgresql                      # PostgreSQL tests only
```

--------------------------------

### Projection: Select, Rename, and Compute Attributes

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Project specific attributes, rename them, or compute new ones using SQL-like expressions. Use `...` to include all existing attributes along with new computed ones.

```python
pk_only = Subject.proj()
selected = Subject.proj('species', 'dob')
renamed = Subject.proj(birth_date='dob')
computed = Subject.proj(age_days="DATEDIFF(NOW(), dob)")
all_plus_computed = Subject.proj(..., age_days="DATEDIFF(NOW(), dob)")
```

--------------------------------

### Release Notes Markdown Format

Source: https://github.com/datajoint/datajoint-python/blob/master/RELEASE_MEMO.md

Use this markdown structure to organize release notes, categorizing changes into BREAKING, Added, Changed, Fixed, etc. Link to PRs/issues for detailed information.
```markdown
## What's Changed

### BREAKING CHANGES
- **`fetch()` removed** - Use `to_dicts()`, `to_pandas()`, or `to_arrays()` instead (#123)

### Added
- New `to_polars()` method for Polars DataFrame output (#456)
- Support for custom codecs via `@codec` decorator (#789)

### Changed
- Improved query performance for complex joins (2-3x faster)
- Default connection timeout increased to 30s

### Fixed
- Fixed incorrect NULL handling in aggregations (#234)

### Full Changelog
https://github.com/datajoint/datajoint-python/compare/v2.0.0...v2.1.0
```

--------------------------------

### Schema Introspection and Iteration

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Utilities for inspecting the schema, such as listing tables, checking for table existence, retrieving tables, and iterating through tables in dependency order.

```python
# Schema introspection
print(schema.list_tables())        # ['subject', 'session', ...]
print('Subject' in schema)         # True
tbl = schema.get_table('Subject')  # FreeTable
tbl2 = schema['session']           # bracket-notation alias

# Iterate all tables in dependency order
for table in schema:
    print(table.full_table_name, len(table))
```

--------------------------------

### Download Path for Attachments/Filepaths

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Specify a download path for attachments or filepaths using `dj.config.override`. This ensures downloaded files are stored in the designated location.

```python
with dj.config.override(download_path='/tmp/downloads'):
    data = AttachmentTable.to_dicts()
```

--------------------------------

### macOS Docker Host Configuration

Source: https://github.com/datajoint/datajoint-python/blob/master/CONTRIBUTING.md

Set the DOCKER_HOST environment variable for macOS Docker Desktop users if tests fail to connect.
```bash
export DOCKER_HOST=unix://$HOME/.docker/run/docker.sock
```

--------------------------------

### Trigger Documentation Build

Source: https://github.com/datajoint/datajoint-python/blob/master/RELEASE_MEMO.md

Manually trigger a documentation rebuild for datajoint-docs using the GitHub CLI. This is useful after updating docstrings in the datajoint-python repository.

```bash
gh workflow run development.yml --repo datajoint/datajoint-docs
```

--------------------------------

### Fetch Data as List of Dictionaries

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Retrieve data as a list of Python dictionaries using `.to_dicts()`. Supports ordering, limiting, and offsetting results.

```python
rows = Subject.to_dicts()
# [{'subject_id': 1, 'species': 'mouse', 'dob': datetime.date(2024, 1, 15)}, ...]

rows = Subject.to_dicts(order_by='dob DESC', limit=10, offset=20)
```

--------------------------------

### AutoPopulate for Specific Keys

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Limit the population process to a subset of keys by providing a relation. This allows targeted computation for specific data entries.

```python
Analysis.populate(Subject & "subject_id < 100")
```

--------------------------------

### Define Table Tiers

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Demonstrates the definition of various table tiers: Lookup, Manual, Imported, Computed, and Part. Each tier has specific characteristics and SQL name prefixes.
```python
import datajoint as dj

schema = dj.Schema('tiers_demo')


@schema
class Species(dj.Lookup):
    """Static reference data, auto-populated from `contents`."""
    definition = '''
    species : varchar(24)
    '''
    contents = [['mouse'], ['rat'], ['human']]


@schema
class Subject(dj.Manual):
    definition = '''
    subject_id : int auto_increment
    ---
    -> Species
    dob : date
    '''


@schema
class RawData(dj.Imported):
    """Populated from external files; make() called once per Subject."""
    definition = '''
    -> Subject
    ---
    raw_signal : <blob>   # serialised numpy array
    '''

    def make(self, key):
        import numpy as np
        signal = np.load(f"/data/{key['subject_id']}.npy")
        self.insert1(dict(key, raw_signal=signal))


@schema
class Analysis(dj.Computed):
    definition = '''
    -> RawData
    ---
    mean_value : float
    std_value : float
    '''

    class Stats(dj.Part):
        definition = '''
        -> Analysis
        bin_idx : int
        ---
        bin_mean : float
        '''

    def make(self, key):
        import numpy as np
        signal = (RawData & key).fetch1('raw_signal')
        # Insert the master row first, then its part rows.
        self.insert1(dict(key, mean_value=float(np.mean(signal)),
                          std_value=float(np.std(signal))))
        bins = np.array_split(signal, 10)
        self.Stats.insert([dict(key, bin_idx=i, bin_mean=float(b.mean()))
                           for i, b in enumerate(bins)])
```

--------------------------------

### Conda-Forge `meta.yaml` Configuration

Source: https://github.com/datajoint/datajoint-python/blob/master/RELEASE_MEMO.md

Configuration snippet for the `recipe/meta.yaml` file in a conda-forge feedstock. It specifies the package version, source URL, and SHA256 hash for the distribution.

```yaml
{% set version = "2.1.0" %}

package:
  name: datajoint
  version: {{ version }}

source:
  url: https://pypi.org/packages/source/d/datajoint/datajoint-{{ version }}.tar.gz
  sha256:

build:
  number: 0  # Reset to 0 for new version
```

--------------------------------

### Check AutoPopulate Progress

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Monitor the progress of a population job using `.progress()`. It returns the number of remaining and total entries to process.
```python
remaining, total = Analysis.progress(display=True)
```

--------------------------------

### Fetch Data as NumPy Structured Array

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Retrieve data as a NumPy structured array. Individual columns can be accessed by their names.

```python
arr = Subject.to_arrays()
print(arr['species'])  # array(['mouse', 'rat', ...])
```

--------------------------------

### Access Tables with FreeTable (Singleton Connection)

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Access a DataJoint table directly by its full name without defining a class, using a singleton connection. Retrieve data as dictionaries.

```python
import datajoint as dj

# Using singleton connection
tbl = dj.FreeTable("my_schema.subject")
print(tbl.to_dicts(limit=3))
```

--------------------------------

### Insert Data from File or DataFrame

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Inserts data from a CSV file or a pandas DataFrame. For DataFrames, the index is automatically detected as the primary key.

```python
from pathlib import Path

import pandas as pd

# Insert from CSV file
Subject.insert(Path('subjects.csv'))

# Insert from pandas DataFrame
df = pd.DataFrame({'subject_id': [4, 5], 'species': ['mouse', 'rat'],
                   'dob': ['2024-04-01', '2024-05-01']})
Subject.insert(df)

# Round-trip: fetch → modify → re-insert
df = Subject.to_pandas()  # PK becomes index
df.index                  # subject_id
Subject.insert_dataframe(df, skip_duplicates=True)  # auto-detects index as PK
```

--------------------------------

### Fetch Specific Columns as Separate Arrays

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Fetch specified columns as separate NumPy arrays. The order of returned arrays corresponds to the order of column names provided.
```python
species_arr, dob_arr = Subject.to_arrays('species', 'dob')
```

--------------------------------

### Top/Limit/Offset for Data Retrieval

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Retrieve a limited number of rows based on ordering. `dj.Top()` is used within a relation to apply limit and ordering.

```python
top5 = Subject & dj.Top(limit=5, order_by='dob DESC')
```

--------------------------------

### Navigate Table Dependencies

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Use graph-traversal methods on a DataJoint Table object to inspect relationships, including parents, children, ancestors, and descendants.

```python
import datajoint as dj

# Navigate the dependency graph
print(Analysis.parents())                  # list of parent table names
print(Analysis.children(as_objects=True))  # list of FreeTable objects
print(Analysis.ancestors())                # all upstream tables (topological order)
print(Analysis.descendants())              # all downstream tables

# Part tables of a master
print(Analysis.parts())
print(Analysis.parts(as_objects=True))
```

--------------------------------

### Chunked Insert for Large Datasets

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Performs bulk inserts using chunked batches to manage memory pressure for large datasets. Requires an iterator for the rows.

```python
# Chunked insert for large datasets
Subject.insert(large_rows_iter, chunk_size=10_000)
```

--------------------------------

### Schema-Addressed Storage with NpyRef

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Define a table to store numpy arrays in the default object store using the `<npy@>` type. Use NpyRef to create a lightweight handle to a stored numpy array and fetch its data.
```python
@schema
class LargeData(dj.Manual):
    definition = '''
    data_id : int
    ---
    matrix : <npy@>   # numpy array in default object store
    '''


# NpyRef: lightweight handle to a stored numpy array
ref = dj.NpyRef(schema="my_schema", table="large_data", field="matrix", key={'data_id': 1})
arr = ref.fetch()  # downloads only on access
```

--------------------------------

### Aggregation with DataJoint

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Aggregate data using the `.aggr()` method. Specify the table to join with and the aggregation function. Use `exclude_nonmatching=True` to only include entries with at least one match in the joined table.

```python
from datajoint import U

session_count = Subject.aggr(Session, n='count(*)')  # all subjects, n=0 if none

# Only subjects with at least one session:
active_subjects = Subject.aggr(Session, n='count(*)', exclude_nonmatching=True)
```

--------------------------------

### Extend DataJoint Type System with Codecs

Source: https://context7.com/datajoint/datajoint-python/llms.txt

The `dj.Codec` system allows extending DataJoint's type system with custom encode/decode logic. Codec subclasses auto-register on definition and can be used in table definitions with `<name>` syntax.

```python
import datajoint as dj
import numpy as np
import networkx as nx
```

--------------------------------

### Fetching Data

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Modern fetch API for retrieving data as Python objects, pandas DataFrames, polars DataFrames, PyArrow Tables, numpy arrays, or primary keys. Supports ordering, limiting, and offsetting results. `fetch1` retrieves exactly one row.

```APIDOC
## Fetching Data: `to_dicts`, `to_pandas`, `to_polars`, `to_arrays`, `keys`, `fetch1`

Modern fetch API (DataJoint 2.0). Returns decoded Python objects including numpy arrays, custom codec values, and downloaded file paths for attachment/filepath types.
```python
import datajoint as dj

# List of dicts (recommended)
rows = Subject.to_dicts()
# [{'subject_id': 1, 'species': 'mouse', 'dob': datetime.date(2024, 1, 15)}, ...]

# With ordering, limit, offset
rows = Subject.to_dicts(order_by='dob DESC', limit=10, offset=20)

# pandas DataFrame (PK as index)
df = Subject.to_pandas()
# DataFrame with index=['subject_id'], columns=['species','dob']

# polars DataFrame (requires: pip install datajoint[polars])
pl_df = Subject.to_polars(order_by='subject_id')

# PyArrow Table (requires: pip install datajoint[arrow])
arrow_tbl = Subject.to_arrow()

# numpy structured array
arr = Subject.to_arrays()
print(arr['species'])  # array(['mouse', 'rat', ...])

# Specific columns as separate arrays (tuple)
species_arr, dob_arr = Subject.to_arrays('species', 'dob')

# Primary keys only
keys = Subject.keys()  # [{'subject_id': 1}, {'subject_id': 2}, ...]

# Fetch exactly one row (raises if 0 or >1 matches)
row = (Subject & {'subject_id': 1}).fetch1()
# {'subject_id': 1, 'species': 'mouse', 'dob': datetime.date(2024,1,15)}
species = (Subject & {'subject_id': 1}).fetch1('species')  # 'mouse'
sp, dob = (Subject & {'subject_id': 1}).fetch1('species', 'dob')

# Ordered fetch for previewing
first5 = (Subject & dj.Top(limit=5, order_by='subject_id')).to_dicts()

# Download path for attachments/filepaths
with dj.config.override(download_path='/tmp/downloads'):
    data = AttachmentTable.to_dicts()
```
```

--------------------------------

### AutoPopulate with Maximum Calls

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Control the number of `make` calls during population using `max_calls`. This is useful for testing or limiting the scope of a population run.

```python
Analysis.populate(max_calls=10)
```

--------------------------------

### Union of Relations

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Combine results from multiple relations using the '+' operator.
This is useful for creating a union of datasets that share a common structure.

```python
all_subjects = (Subject & "species='mouse'") + (Subject & "species='rat'")
```

--------------------------------

### Define DataJoint Computed Table with make Method

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Define a computed table that automatically populates its data using the `make` method. The `make` method is called for each new primary key.

```python
@schema
class Analysis(dj.Computed):
    definition = '''
    -> Subject
    ---
    result : float
    '''

    def make(self, key):
        import time, random
        time.sleep(0.1)  # simulate computation
        result = random.gauss(0, 1)
        self.insert1(dict(key, result=result))
```

--------------------------------

### Extract Version from Release Name (Bash)

Source: https://github.com/datajoint/datajoint-python/blob/master/RELEASE_MEMO.md

This bash command uses grep with a Perl-compatible regular expression to extract a version string (e.g., X.Y.Z) from a GitHub release name. It is used in the `post_draft_release_published.yaml` workflow.

```bash
VERSION=$(echo "${{ github.event.release.name }}" | grep -oP '\d+\.\d+\.\d+')
```

--------------------------------

### Define and Use a Custom Graph Codec

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Define a custom codec for serializing and deserializing networkx graphs. Use the custom codec in a DataJoint table definition and insert/fetch data.
```python
import datajoint as dj
import networkx as nx


class GraphCodec(dj.Codec):
    name = "graph"

    def get_dtype(self, is_store: bool) -> str:
        return "<blob>"  # serializes to blob

    def encode(self, graph, *, key=None, store_name=None):
        return {'nodes': list(graph.nodes()), 'edges': list(graph.edges())}

    def decode(self, stored, *, key=None):
        G = nx.Graph()
        G.add_nodes_from(stored['nodes'])
        G.add_edges_from(stored['edges'])
        return G

    def validate(self, value):
        if not isinstance(value, nx.Graph):
            raise TypeError(f"Expected networkx.Graph, got {type(value).__name__}")


@schema
class Connectivity(dj.Manual):
    definition = '''
    conn_id : int
    ---
    graph_data : <graph>
    '''


G = nx.path_graph(5)
Connectivity.insert1({'conn_id': 1, 'graph_data': G})
row = Connectivity.fetch1()
assert len(row['graph_data'].nodes) == 5
```

--------------------------------

### Fetch Exactly One Row

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Fetch a single row using `.fetch1()`. This method raises an error if zero or more than one row matches the query.

```python
row = (Subject & {'subject_id': 1}).fetch1()
# {'subject_id': 1, 'species': 'mouse', 'dob': datetime.date(2024,1,15)}
species = (Subject & {'subject_id': 1}).fetch1('species')  # 'mouse'
sp, dob = (Subject & {'subject_id': 1}).fetch1('species', 'dob')
```

--------------------------------

### Drop Entire Schema

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Drops an entire database schema. This action prompts for confirmation unless `safemode=False` is explicitly set.

```python
# Drop entire schema (prompts unless safemode=False)
schema.drop()
```

--------------------------------

### AutoPopulate with Error Suppression

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Use `suppress_errors=True` to prevent population from stopping on errors. The status dictionary returned contains success counts and a list of errors.
```python
status = Analysis.populate(suppress_errors=True)
print(status['success_count'])
print(status['error_list'])  # list of (key, error_message) tuples
```

--------------------------------

### Introspect Schemas with `virtual_schema`

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Use `dj.virtual_schema` to introspect an existing database schema and auto-generate Python table classes. This is useful when working with data produced by a different codebase. `dj.VirtualModule` offers lower-level access.

```python
import datajoint as dj

# Access an existing schema without Python class definitions
lab = dj.virtual_schema('my_lab_schema')
```

```python
# Table classes are auto-generated as module attributes
df = lab.Subject.to_pandas()
sessions = lab.Session & "experimenter='alice'"
```

```python
# Iterate tables
for name in lab.schema.list_tables():
    print(name, len(getattr(lab, name.replace('_', ' ').title().replace(' ', ''))))
```

```python
# VirtualModule (lower-level)
lab2 = dj.VirtualModule('lab', 'my_lab_schema', connection=dj.conn())
lab2.Session.to_dicts(limit=5)
```

```python
# Schema bracket / attribute access
schema['Subject'].to_dicts(limit=3)
schema.get_table('session').to_dicts()
```

```python
# list_schemas helper
all_schemas = dj.list_schemas()
```

--------------------------------

### Fetch Primary Keys Only

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Retrieve only the primary keys of the relation as a list of dictionaries. This is efficient for operations that only require identifiers.

```python
keys = Subject.keys()
# [{'subject_id': 1}, {'subject_id': 2}, ...]
```

--------------------------------

### Search Conda-Forge Package (Bash)

Source: https://github.com/datajoint/datajoint-python/blob/master/RELEASE_MEMO.md

Bash command to search for the datajoint package within the conda-forge channel. This is used for verification after a release.
```bash
conda search datajoint -c conda-forge
```

--------------------------------

### NOT Wrapper for Conditions

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Use `dj.Not()` to negate a condition, effectively selecting rows that do not meet the specified criteria. This is useful for exclusion filtering.

```python
not_condition = dj.Not(Subject & "species='mouse'")
others = Subject & not_condition
```

--------------------------------

### Join Operations in DataJoint

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Perform natural joins on shared primary keys or extend relations with attributes from another table. '*' denotes a natural join, while `.extend()` performs a left join.

```python
joined = Session * RawData          # inner join
extended = Session.extend(Subject)  # left join: add Subject attrs to Session
```

--------------------------------

### Query Operators: Join and Semi-join

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Performs joins and semi-joins between tables. Semi-joins are used to find rows in one table that have corresponding entries in another table.

```python
# Restriction by another table (semi-join)
with_data = Subject & RawData     # subjects that have raw data
without_data = Subject - RawData  # subjects missing raw data
```

--------------------------------

### Insert Multiple Rows

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Inserts multiple rows into a table using a list of dictionaries. Supports skipping duplicate primary keys or replacing existing rows.
```python
# Multiple rows (list of dicts)
rows = [
    {'subject_id': 2, 'species': 'rat', 'dob': '2024-02-01'},
    {'subject_id': 3, 'species': 'mouse', 'dob': '2024-03-10'},
]
Subject.insert(rows)

# Skip duplicate primary keys silently
Subject.insert(rows, skip_duplicates=True)

# Replace existing rows
Subject.insert(rows, replace=True)
```

--------------------------------

### Validate Data Before Insertion

Source: https://context7.com/datajoint/datajoint-python/llms.txt

The `validate()` method checks field existence, types, null constraints, and more without database interaction. It returns a validation result object that can be used to gate an insert or to report errors.

```python
import datajoint as dj

rows = [
    {'subject_id': 10, 'species': 'mouse', 'dob': '2024-01-01'},
    {'subject_id': 11, 'species': 'INVALID_SPECIES'},  # missing dob
    {'subject_id': 12, 'unknown_field': 'x'},          # unknown field
]

result = Subject.validate(rows)
if result:
    Subject.insert(rows)
else:
    print(result.summary())
    # Validation failed: 2 error(s) in 3 rows
    # Row 1 in field 'dob': Required field 'dob' is missing
    # Row 2 in field 'unknown_field': Field 'unknown_field' not in table heading
```

```python
# Access individual errors
for row_idx, field_name, msg in result.errors:
    print(f"  Row {row_idx}, {field_name}: {msg}")
```

```python
# Raise on failure
result.raise_if_invalid()
```

```python
# Ignore extra fields during validation
result = Subject.validate(rows, ignore_extra_fields=True)
```

--------------------------------

### Restriction by List of Dicts (OR Logic)

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Use a list of dictionaries to apply OR logic when filtering relations. Each dictionary represents one condition; a row is kept if it matches any of them.

```python
some = Subject & [{'subject_id': 1}, {'subject_id': 3}]
```

--------------------------------

### Insert Single Row

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Inserts a single row into a table using a dictionary.
```python
# Single row
Subject.insert1({'subject_id': 1, 'species': 'mouse', 'dob': '2024-01-15'})
```

--------------------------------

### Access Tables with FreeTable (Explicit Connection)

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Access a DataJoint table directly by its full name using an explicit connection object. Supports standard query operators and conversion to pandas DataFrames.

```python
# Using explicit connection
conn = dj.conn()
tbl = dj.FreeTable(conn, "`my_schema`.`subject`")

# FreeTable supports all query operators
filtered = tbl & "species='mouse'"
df = filtered.to_pandas()

# Inspect structure
print(tbl.heading.names)
print(tbl.primary_key)
print(tbl.describe())  # DataJoint DDL string

# Dependency navigation
parents = tbl.parents()
children = tbl.children(as_objects=True)
ancestors = tbl.ancestors()
descendants = tbl.descendants()
parts = tbl.parts(as_objects=True)
```

--------------------------------

### Query Operators: Restriction

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Applies restrictions (AND, NOT) to query tables based on conditions. Supports string-based conditions and dictionary-based lookups.

```python
# Restriction: & (AND), - (NOT)
mice = Subject & "species='mouse'"
young = Subject & "dob > '2024-01-01'"
mice_and_young = Subject & "species='mouse'" & "dob > '2024-01-01'"
not_mice = Subject - "species='mouse'"

# Restriction by dict
one = Subject & {'subject_id': 1}
```

--------------------------------

### Update Single Row

Source: https://context7.com/datajoint/datajoint-python/llms.txt

Updates a single existing row in a table. All primary key fields are required in the dictionary.

```python
# Update a single existing row (all PK fields required)
Subject.update1({'subject_id': 1, 'dob': '2024-01-20'})
```
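
The single-row update rule described above (every primary-key field must be supplied, and the key must identify exactly one existing row) can be modeled without a database. The sketch below is a plain-Python illustration of that rule only, not DataJoint's implementation; `PRIMARY_KEY`, `update1_model`, and the in-memory `subjects` list are hypothetical names introduced for this example.

```python
# Plain-Python model of single-row update semantics (illustration only).
# PRIMARY_KEY, update1_model, and subjects are hypothetical, not DataJoint API.
PRIMARY_KEY = ("subject_id",)


def update1_model(table, primary_key, row):
    """Update the one stored row whose primary key matches `row`."""
    # Every primary-key field must be present in the update dict.
    missing = [k for k in primary_key if k not in row]
    if missing:
        raise KeyError(f"missing primary key field(s): {missing}")

    # The key must identify exactly one existing row.
    matches = [r for r in table if all(r[k] == row[k] for k in primary_key)]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one matching row, found {len(matches)}")

    # Only the non-key attributes supplied in `row` are changed.
    target = matches[0]
    target.update({k: v for k, v in row.items() if k not in primary_key})
    return target


subjects = [
    {"subject_id": 1, "species": "mouse", "dob": "2024-01-15"},
    {"subject_id": 2, "species": "rat", "dob": "2024-02-01"},
]
updated = update1_model(subjects, PRIMARY_KEY, {"subject_id": 1, "dob": "2024-01-20"})
```

In this model, attributes that are not named in the update dict (here `species`) keep their stored values, and a dict without `subject_id`, or one whose key matches no row, is rejected before anything is modified.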