### Install uv Only

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/contributing.md

Installs uv without setting up the full development environment.

```bash
make install-uv
```

--------------------------------

### Install PyIceberg Package

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/verify-release.md

Install the PyIceberg package from the source distribution using the `make install` command. This is a prerequisite for running tests.

```sh
make install
```

--------------------------------

### Install PyIceberg from Source

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/contributing.md

Clones the repository and installs PyIceberg locally with optional extras for S3 and Hive support.

```bash
git clone https://github.com/apache/iceberg-python.git
cd iceberg-python
pip3 install -e ".[s3fs,hive]"
```

--------------------------------

### Install PyIceberg with Bodo support

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Install PyIceberg with the `bodo` extra, which pulls in the dependencies needed for Bodo integration. Quoting the requirement keeps shells such as zsh from treating the square brackets as a glob pattern.

```bash
pip install "pyiceberg[bodo]"
```

--------------------------------

### Install PyIceberg Directly from GitHub

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/contributing.md

Installs PyIceberg directly from a GitHub repository, useful for testing unreleased changes. Includes PyArrow support.

```bash
pip install "git+https://github.com/apache/iceberg-python.git#egg=pyiceberg[pyarrow]"
```

--------------------------------

### Setup: Connect to a Catalog

Source: https://github.com/apache/iceberg-python/blob/main/notebooks/pyiceberg_example.ipynb

Configure and load a catalog using SQLite for local testing. This requires setting up a temporary warehouse directory.

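```python
# Import required libraries
import os
import tempfile

import pyarrow.compute as pc

# load_catalog is called below, so it has to be imported as well
from pyiceberg.catalog import load_catalog
```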

```python
# Create a temporary warehouse location
warehouse_path = tempfile.mkdtemp(prefix="iceberg_warehouse_")
print(f"Warehouse location: {warehouse_path}")
```

```python
# Configure and load the catalog
catalog = load_catalog(
    "default",
    type="sql",
    uri=f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
    warehouse=f"file://{warehouse_path}",
)

print("Catalog loaded successfully!")
print(f"Namespaces: {list(catalog.list_namespaces())}")
```

--------------------------------

### Install PyIceberg with Daft support

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Install PyIceberg with the `daft` extra, which pulls in the dependencies needed for Daft integration. Quoting the requirement keeps shells such as zsh from treating the square brackets as a glob pattern.

```bash
pip install "pyiceberg[daft]"
```

--------------------------------

### List Namespaces

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Retrieve a list of all existing namespaces in the catalog. The example asserts that the newly created 'docs_example' namespace is present.

```python
ns = catalog.list_namespaces()

assert ns == [("docs_example",)]
```

--------------------------------

### Install PyIceberg with Polars support

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Install PyIceberg with the `polars` extra, which pulls in the dependencies needed for Polars integration. Quoting the requirement keeps shells such as zsh from treating the square brackets as a glob pattern.

```bash
pip install "pyiceberg[polars]"
```

--------------------------------

### Launch Jupyter Lab for Basic Experimentation

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/contributing.md

Install notebook dependencies and launch Jupyter Lab in the 'notebooks/' directory for basic PyIceberg experimentation without external infrastructure.

```bash
make notebook
```

--------------------------------

### Import PyIceberg Libraries

Source: https://github.com/apache/iceberg-python/blob/main/notebooks/pyiceberg_example.ipynb

Import necessary libraries for PyIceberg operations and display the installed version.

```python
# Import required libraries
import pyarrow as pa

import pyiceberg
from pyiceberg.catalog import load_catalog

print(f"PyIceberg version: {pyiceberg.__version__}")
```

--------------------------------

### SimpleLocationProvider Data File Path (Partitioned)

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Example of a data file path generated by SimpleLocationProvider for a table partitioned by 'category'.

```txt
s3://bucket/ns/table/data/category=orders/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet
```

--------------------------------

### Install PyIceberg Nightly Build

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/nightly-build.md

Use this command to install the latest nightly build of PyIceberg from TestPyPI. This is recommended for testing purposes only.

```shell
pip install -i https://test.pypi.org/simple/ --pre pyiceberg
```

--------------------------------

### Install PyIceberg with S3 and Hive Support

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/index.md

Install the latest release of PyIceberg with optional dependencies for S3 file system and Hive metastore.

```sh
pip install "pyiceberg[s3fs,hive]"
```

--------------------------------

### SimpleLocationProvider Data File Path (Non-Partitioned)

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Example of a data file path generated by SimpleLocationProvider for a non-partitioned table.

```txt
s3://bucket/ns/table/data/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet
```
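
The partitioned and non-partitioned `SimpleLocationProvider` paths in this document differ only in the `key=value` partition segment. As a rough illustration of that layout, the sketch below rebuilds both paths with plain string handling. It is not PyIceberg's implementation, and `simple_data_path` is a hypothetical helper introduced only for this example.

```python
from typing import Optional


def simple_data_path(table_location: str, file_name: str, partition_path: Optional[str] = None) -> str:
    """Hypothetical helper that mirrors the layout of the documented example paths."""
    base = f"{table_location.rstrip('/')}/data"
    # Partitioned tables get an extra `key=value` segment before the file name.
    if partition_path:
        return f"{base}/{partition_path}/{file_name}"
    return f"{base}/{file_name}"


file_name = "0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet"
unpartitioned = simple_data_path("s3://bucket/ns/table", file_name)
partitioned = simple_data_path("s3://bucket/ns/table", file_name, "category=orders")
print(unpartitioned)
print(partitioned)
```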

--------------------------------

### YAML Configuration for Multiple Catalogs

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Example of a .pyiceberg.yaml file configuring both a Hive and a REST catalog. This demonstrates how to define multiple catalog configurations in a single file.

```yaml
catalog:
  hive:
    uri: thrift://127.0.0.1:9083
    s3.endpoint: http://127.0.0.1:9000
    s3.access-key-id: admin
    s3.secret-access-key: password
  rest:
    uri: https://rest-server:8181/
    warehouse: my-warehouse
```

--------------------------------

### Complete Filter Examples

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/row-filter-syntax.md

Demonstrates combining various row filter operations to create complex filtering conditions.

```sql
-- Complex filter with multiple conditions
status = 'active' AND age > 18 AND NOT (country IN ('US', 'CA'))
```

```sql
-- Filter with string pattern matching
name LIKE 'John%' AND age >= 21
```

```sql
-- Filter with NULL checks and numeric comparisons
price IS NOT NULL AND price > 100 AND quantity > 0
```

```sql
-- Filter with multiple logical operations
(status = 'pending' OR status = 'processing') AND NOT (priority = 'low')
```
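
To make the boolean logic of the first filter concrete, here is a small sketch that expresses it as a plain Python predicate over dictionaries. It only illustrates how `AND`, `NOT` and `IN` combine; it says nothing about how PyIceberg parses or evaluates the filter string, and the sample `rows` are made up for the example.

```python
# Made-up sample rows, used only to show which ones the filter keeps.
rows = [
    {"status": "active", "age": 30, "country": "DE"},
    {"status": "active", "age": 30, "country": "US"},
    {"status": "inactive", "age": 40, "country": "DE"},
    {"status": "active", "age": 17, "country": "FR"},
]


def complex_filter(row: dict) -> bool:
    # status = 'active' AND age > 18 AND NOT (country IN ('US', 'CA'))
    return (
        row["status"] == "active"
        and row["age"] > 18
        and not (row["country"] in ("US", "CA"))
    )


matching = [row for row in rows if complex_filter(row)]
print(matching)
```

Only the first row passes: the second is excluded by the `NOT ... IN` clause, the third by `status`, and the fourth by `age`.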

--------------------------------

### Install Pre-commit Hooks

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/contributing.md

Install pre-commit hooks to automatically run linters and formatters on code changes before each commit. This helps maintain code quality.

```bash
prek install
```

--------------------------------

### PyArrow Table Schema Example

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Example structure of a PyArrow Table returned by PyIceberg scans. Shows column names and data types.

```text
pyarrow.Table
VendorID: int64
tpep_pickup_datetime: timestamp[us, tz=+00:00]
tpep_dropoff_datetime: timestamp[us, tz=+00:00]
----
VendorID: [[2,1,2,1,1,...,2,2,2,2,2],[2,1,1,1,2,...,1,1,2,1,2],...,[2,2,2,2,2,...,2,6,6,2,2],[2,2,2,2,2,...,2,2,2,2,2]]
tpep_pickup_datetime: [[2021-04-01 00:28:05.000000,...,2021-04-30 23:44:25.000000]]
tpep_dropoff_datetime: [[2021-04-01 00:47:59.000000,...,2021-05-01 00:14:47.000000]]
```

--------------------------------

### ObjectStoreLocationProvider Data File Path (Partitioned)

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Example of a data file path generated by ObjectStoreLocationProvider for a partitioned table, including binary directories for hash distribution.

```txt
s3://bucket/ns/table/data/0101/0110/1001/10110010/category=orders/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet
```

--------------------------------

### Define Partition Specification

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Define how a table's data should be partitioned. This example partitions data by the 'day' of the 'datetime' field.

```python
from pyiceberg.partitioning import PartitionSpec, PartitionField

partition_spec = PartitionSpec(
    PartitionField(
        source_id=1, field_id=1000, transform="day", name="datetime_day"
    )
)
```
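
The `day` transform maps a timestamp to the number of days since 1970-01-01. The sketch below reproduces that arithmetic with the standard library to show what a partition value looks like. `day_transform` is a hypothetical helper for illustration, not a PyIceberg function, and it assumes timezone-aware input.

```python
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Days between the Unix epoch and the UTC calendar date of `ts`."""
    # Assumes `ts` is timezone-aware; it is normalized to UTC before truncating to a date.
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


pickup = datetime(2021, 4, 1, 0, 28, 5, tzinfo=timezone.utc)
print(day_transform(pickup))  # 18718
```

All rows whose timestamp falls on the same UTC day share the same partition value.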

--------------------------------

### Launch Jupyter Lab with Infrastructure for Spark Integration

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/contributing.md

Spin up the full integration test infrastructure (Spark, REST Catalog, Hive Metastore, Minio) via Docker Compose and launch Jupyter Lab for Spark integration examples.

```bash
make notebook-infra
```

--------------------------------

### ObjectStoreLocationProvider Data File Path (Partition Exclusion)

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Example of a data file path generated by ObjectStoreLocationProvider with partition exclusion enabled, omitting partition keys and values.

```txt
s3://bucket/ns/table/data/1101/0100/1011/00111010-00000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet
```
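
The `ObjectStoreLocationProvider` examples show a hash of the file rendered as binary directories of 4, 4, 4 and 8 bits. The following sketch imitates that shape. It uses SHA-256 from the standard library purely as a stand-in hash, so the bit patterns it produces will not match PyIceberg's. `object_store_path` and its parameters are hypothetical names, and treating a table without a partition path like the exclusion case is an assumption.

```python
import hashlib
from typing import Optional


def object_store_path(
    table_location: str,
    file_name: str,
    partition_path: Optional[str] = None,
    include_partition_paths: bool = True,
) -> str:
    """Hypothetical sketch of the hashed directory layout (stand-in hash, not PyIceberg's)."""
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    # Keep the top 20 bits of the digest and render them as a binary string.
    bits = f"{int.from_bytes(digest[:3], 'big') >> 4:020b}"
    # Split into directories of 4, 4, 4 and 8 bits, as in the example paths.
    dirs = [bits[0:4], bits[4:8], bits[8:12], bits[12:20]]
    hashed = "/".join(dirs)
    if partition_path and include_partition_paths:
        return f"{table_location}/data/{hashed}/{partition_path}/{file_name}"
    # Without partition paths the last hash segment is joined to the file name with "-".
    return f"{table_location}/data/{hashed}-{file_name}"


name = "0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet"
with_partition = object_store_path("s3://bucket/ns/table", name, "category=orders")
without_partition = object_store_path("s3://bucket/ns/table", name, "category=orders", include_partition_paths=False)
print(with_partition)
print(without_partition)
```

Spreading files across hash-derived prefixes is what lets object stores distribute requests more evenly than a single shared prefix would.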

--------------------------------

### YAML Configuration for REST Catalog

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Example of a .pyiceberg.yaml file to configure a REST catalog named 'prod'. This file specifies the catalog's URI and credentials.

```yaml
catalog:
  prod:
    uri: http://rest-catalog/ws/
    credential: t-1234:secret
```

--------------------------------

### Configure REST Catalog via YAML

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/contributing.md

Example configuration for a REST catalog named 'test_catalog' using a YAML file. This specifies the URI and credentials.

```yaml
catalog:
  test_catalog:
    uri: http://rest-catalog/ws/
    credential: t-1234:secret
```

--------------------------------

### Example File Metadata Data

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

This shows sample data for the PyArrow Table returned by `table.inspect.files()`. It illustrates the actual values for file paths, record counts, and other metadata fields for two Parquet files.

```text
content: [[0,0]]
file_path: [["s3://warehouse/default/table_metadata_files/data/00000-0-9ea7d222-6457-467f-bad5-6fb125c9aa5f.parquet","s3://warehouse/default/table_metadata_files/data/00000-0-afa8893c-de71-4710-97c9-6b01590d0c44.parquet"]]
file_format: [["PARQUET","PARQUET"]]
spec_id: [[0,0]]
record_count: [[3,3]]
file_size_in_bytes: [[5459,5459]]
column_sizes: [[keys:[1,2,3,4,5,...,8,9,10,11,12]values:[49,78,128,94,118,...,118,118,94,78,109],keys:[1,2,3,4,5,...,8,9,10,11,12]values:[49,78,128,94,118,...,118,118,94,78,109]]]
value_counts: [[keys:[1,2,3,4,5,...,8,9,10,11,12]values:[3,3,3,3,3,...,3,3,3,3,3],keys:[1,2,3,4,5,...,8,9,10,11,12]values:[3,3,3,3,3,...,3,3,3,3,3]]]
null_value_counts: [[keys:[1,2,3,4,5,...,8,9,10,11,12]values:[1,1,1,1,1,...,1,1,1,1,1],keys:[1,2,3,4,5,...,8,9,10,11,12]values:[1,1,1,1,1,...,1,1,1,1,1]]]
nan_value_counts: [[keys:[]values:[],keys:[]values:[]]]
lower_bounds: [[keys:[1,2,3,4,5,...,8,9,10,11,12]values:[00,61,61616161616161616161616161616161,01000000,0100000000000000,...,009B6ACA38F10500,009B6ACA38F10500,9E4B0000,01,00000000000000000000000000000000],keys:[1,2,3,4,5,...,8,9,10,11,12]values:[00,61,61616161616161616161616161616161,01000000,0100000000000000,...,009B6ACA38F10500,009B6ACA38F10500,9E4B0000,01,00000000000000000000000000000000]]]
upper_bounds:[[keys:[1,2,3,4,5,...,8,9,10,11,12]values:[00,61,61616161616161616161616161616161,01000000,0100000000000000,...,009B6ACA38F10500,009B6ACA38F10500,9E4B0000,01,00000000000000000000000000000000],keys:[1,2,3,4,5,...,8,9,10,11,12]values:[00,61,61616161616161616161616161616161,01000000,0100000000000000,...,009B6ACA38F10500,009B6ACA38F10500,9E4B0000,01,00000000000000000000000000000000]]] 
key_metadata: [[0100,0100]]
split_offsets: [[[],[]]]
equality_ids: [[[],[]]]
sort_order_id: [[[],[]]]
readable_metrics: [
  -- is_valid: all not null

```

--------------------------------

### PyArrow Import Example

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Imports the pyarrow module, commonly used for data manipulation and type handling in PyIceberg.

```python
import pyarrow as pa
```

--------------------------------

### Describe a Table in JSON Format

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/cli.md

Use the '--output json' flag with the 'describe' command to get table details in JSON format, suitable for programmatic processing and automation. The output is piped to 'jq' for pretty-printing.

```sh
➜  pyiceberg --output json describe nyc.taxis | jq
{
  "identifier": [
    "nyc",
    "taxis"
  ],
  "metadata_location": "file:/.../nyc.db/taxis/metadata/00000-aa3a3eac-ea08-4255-b890-383a64a94e42.metadata.json",
  "metadata": {
    "location": "file:/.../nyc.db/taxis",
    "table-uuid": "6cdfda33-bfa3-48a7-a09e-7abb462e3460",
    "last-updated-ms": 1661783158061,
    "last-column-id": 19,
    "schemas": [
      {
        "type": "struct",
        "fields": [
          {
            "id": 1,
            "name": "VendorID",
            "type": "long",
            "required": false
          },
...
          {
            "id": 19,
            "name": "airport_fee",
            "type": "double",
            "required": false
          }
        ],
        "schema-id": 0,
        "identifier-field-ids": []
      }
    ],
    "current-schema-id": 0,
    "partition-specs": [
      {
        "spec-id": 0,
        "fields": []
      }
    ],
    "default-spec-id": 0,
    "last-partition-id": 999,
    "properties": {
      "owner": "root",
      "write.format.default": "parquet"
    },
    "current-snapshot-id": 5937117119577207000,
    "snapshots": [
      {
        "snapshot-id": 5937117119577207000,
        "timestamp-ms": 1661783158061,
        "manifest-list": "file:/.../nyc.db/taxis/metadata/snap-5937117119577207079-1-94656c4f-4c66-4600-a4ca-f30377300527.avro",
        "summary": {
          "operation": "append",
          "spark.app.id": "local-1661783139151",
          "added-data-files": "1",
          "added-records": "2979431",
          "added-files-size": "46600777",
          "changed-partition-count": "1",
          "total-records": "2979431",
          "total-files-size": "46600777",
          "total-data-files": "1",
          "total-delete-files": "0",
          "total-position-deletes": "0",
          "total-equality-deletes": "0"
        },
        "schema-id": 0
      }
    ],
    "snapshot-log": [
      {
        "snapshot-id": "5937117119577207079",

```

--------------------------------

### REST Catalog Authentication Configuration

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Example YAML configuration for a REST catalog with pluggable authentication. Replace `<auth_type>` with the desired authentication method (e.g., `oauth2`, `basic`, `custom`).

```yaml
catalog:
  default:
    type: rest
    uri: http://rest-catalog/ws/
    auth:
      type: <auth_type>
      <auth_type>:
        # Type-specific configuration
      impl: <custom_class_path>  # Only for custom auth
```

--------------------------------

### Add Files with Custom Snapshot Properties and Duplicate Check

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

This example demonstrates adding Parquet files to an Iceberg table while specifying custom snapshot properties and explicitly enabling duplicate file checking. It also includes assertions to verify the NameMapping and the snapshot property.

```python
# Assume an existing Iceberg table object `tbl`

file_paths = [
    "s3a://warehouse/default/existing-1.parquet",
    "s3a://warehouse/default/existing-2.parquet",
]

# Custom snapshot properties
snapshot_properties = {"abc": "def"}

# Enable duplicate file checking
check_duplicate_files = True

# Add the Parquet files to the Iceberg table without rewriting
tbl.add_files(
    file_paths=file_paths,
    snapshot_properties=snapshot_properties,
    check_duplicate_files=check_duplicate_files
)

# NameMapping must have been set to enable reads
assert tbl.name_mapping() is not None

# Verify that the snapshot property was set correctly
assert tbl.metadata.snapshots[-1].summary["abc"] == "def"
```

--------------------------------

### Initiate Schema Transaction

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Start a transaction to perform multiple schema modifications, such as adding columns and updating properties.

```python
from pyiceberg.types import IntegerType

with table.transaction() as transaction:
    with transaction.update_schema() as update_schema:
        # Use the name bound by the `with` statement
        update_schema.add_column("some_other_field", IntegerType(), "doc")
    # ... Update properties etc
```

--------------------------------

### Represent Point Data in WKB Format

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/geospatial.md

Example of a Point(0, 0) represented as Well-Known Binary (WKB) bytes. PyIceberg uses WKB for storing geometry and geography values.

```python
# Example: Point(0, 0) in WKB format
point_wkb = bytes.fromhex("0101000000000000000000000000000000000000")
```
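
For orientation, a 2D WKB point consists of a one-byte byte-order flag, a four-byte geometry type (1 for Point) and two 8-byte floats for x and y. The sketch below packs and unpacks that layout with the standard `struct` module. The helper names are hypothetical, and it handles only plain 2D points.

```python
import struct
from typing import Tuple


def encode_point_wkb(x: float, y: float) -> bytes:
    """Pack a 2D point as little-endian WKB (hypothetical helper)."""
    # "<" = little-endian, no padding; B = byte-order flag (1 = little-endian),
    # I = geometry type (1 = Point), dd = x and y as float64.
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_point_wkb(wkb: bytes) -> Tuple[float, float]:
    """Unpack a 2D WKB point, honoring the byte-order flag."""
    endian = "<" if wkb[0] == 1 else ">"
    geom_type, x, y = struct.unpack(f"{endian}Idd", wkb[1:21])
    if geom_type != 1:
        raise ValueError(f"expected geometry type 1 (Point), got {geom_type}")
    return x, y


origin = encode_point_wkb(0.0, 0.0)
print(len(origin), origin.hex())
print(decode_point_wkb(encode_point_wkb(1.5, -2.0)))
```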

--------------------------------

### Serve Docs Locally

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/README.md

Run this command to serve the documentation locally. Ensure you are in the root directory of the project.

```sh
make docs-serve
```

--------------------------------

### Catalog Instantiation

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Demonstrates how to load an Iceberg catalog using configuration from a `.pyiceberg.yaml` file or by passing properties directly.

````APIDOC
## Catalog Instantiation

### Description
Instantiate an Iceberg catalog to read and write data. Catalogs can be loaded by name from a configuration file or by providing properties directly.

### Method
`load_catalog(name: str, **properties) -> Catalog`

### Parameters
- **name** (str) - The name of the catalog to load.
- **properties** (keyword arguments) - Properties used to configure the catalog.

### Example
```python
from pyiceberg.catalog import load_catalog

# Load catalog by name from .pyiceberg.yaml
catalog_by_name = load_catalog(name="prod")

# Load catalog by passing properties directly
catalog_direct = load_catalog(
    "docs",
    **{
        "uri": "http://127.0.0.1:8181",
        "s3.endpoint": "http://127.0.0.1:9000",
        "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
        "s3.access-key-id": "admin",
        "s3.secret-access-key": "password",
    }
)
```
````

--------------------------------

### Remove Deprecated API Example

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/how-to-release.md

Example of a deprecated API that should be removed before a release.

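```python
from pyiceberg.utils.deprecated import deprecated


@deprecated(
    deprecated_in="0.1.0",
    removed_in="0.2.0",
    help_message="Please use load_something_else() instead",
)
def load_something():  # hypothetical function name, for illustration only
    pass
```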

--------------------------------

### Deprecation Message Example

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/how-to-release.md

Example of using the deprecation_message function to inform users about API changes.

```python
from pyiceberg.utils.deprecated import deprecation_message

deprecation_message(
    deprecated_in="0.1.0",
    removed_in="0.2.0",
    help_message="The old_property is deprecated. Please use the something_else property instead.",
)
```

--------------------------------

### Display PyIceberg CLI Help

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/cli.md

Use the --help flag to display available commands and options for the pyiceberg CLI. This is useful for understanding the CLI's capabilities and structure.

```sh
➜  pyiceberg --help
Usage: pyiceberg [OPTIONS] COMMAND [ARGS]...

Options:
  --catalog TEXT
  --verbose BOOLEAN
  --output [text|json]
  --ugi TEXT
  --uri TEXT
  --credential TEXT
  --warehouse TEXT
  --help                Show this message and exit.

Commands:
  create      Operation to create a namespace.
  describe    Describe a namespace or a table.
  drop        Operations to drop a namespace or table.
  files       List all the files of the table.
  list        List tables or namespaces.
  list-refs   List all the refs in the provided table.
  location    Return the location of the table.
  properties  Properties on tables/namespaces.
  rename      Rename a table.
  schema      Get the schema of the table.
  spec        Return the partition spec of the table.
  uuid        Return the UUID of the table.
  version     Print pyiceberg version.
```

--------------------------------

### Set Release Version and Verification Directory

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/verify-release.md

Set environment variables for the PyIceberg version to verify and a temporary directory to store downloaded artifacts. Replace `<version>` with the actual release candidate version.

```sh
export PYICEBERG_VERSION=<version> # e.g. 0.6.1rc3
export PYICEBERG_VERIFICATION_DIR=/tmp/pyiceberg/${PYICEBERG_VERSION}
```

--------------------------------

### Complex Expression Example

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/expression-dsl.md

Combine multiple predicates and logical operators to construct intricate filtering logic. This example demonstrates nested AND and OR operations.

```python
from pyiceberg.expressions import And, Or, Not, EqualTo, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual

# (age >= 18 AND age <= 65) AND (status = 'active' OR status = 'pending')
complex_filter = And(
    And(
        GreaterThanOrEqual("age", 18),
        LessThanOrEqual("age", 65)
    ),
    Or(
        EqualTo("status", "active"),
        EqualTo("status", "pending")
    )
)

# NOT (age < 18 OR age > 65)
age_in_range = Not(
    Or(
        LessThan("age", 18),
        GreaterThan("age", 65)
    )
)
```
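
The `age_in_range` expression is the De Morgan form of the age check inside `complex_filter`: `NOT (age < 18 OR age > 65)` selects the same non-null ages as `age >= 18 AND age <= 65`. A quick plain-Python check of that equivalence, independent of PyIceberg and ignoring null handling, could look like this:

```python
def in_range_and(age: int) -> bool:
    # age >= 18 AND age <= 65
    return age >= 18 and age <= 65


def in_range_not_or(age: int) -> bool:
    # NOT (age < 18 OR age > 65)
    return not (age < 18 or age > 65)


# Compare both forms over a range of sample ages.
equivalent = all(in_range_and(age) == in_range_not_or(age) for age in range(0, 121))
print(equivalent)  # True
```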

--------------------------------

### Import GPG Keys

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/verify-release.md

Download the Apache Iceberg KEYS file and import it into your GPG keyring. This is the first step in verifying release signatures.

```sh
curl https://downloads.apache.org/iceberg/KEYS -o KEYS
gpg --import KEYS
```

--------------------------------

### Upgrade Pip

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/index.md

Ensure you are using an up-to-date version of pip before installing PyIceberg.

```sh
pip install --upgrade pip
```

--------------------------------

### Generate All Vendor Packages

Source: https://github.com/apache/iceberg-python/blob/main/vendor/README.md

Run `make all` to generate all vendor packages in a single step.

```bash
make all
```

--------------------------------

### Configure Custom Catalog Implementation

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Set up a custom catalog implementation by specifying the Python module and class path, along with any custom configuration keys.

```yaml
catalog:
  default:
    py-catalog-impl: mypackage.mymodule.MyCatalog
    custom-key1: value1
    custom-key2: value2
```

--------------------------------

### Update Column Requirement

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Change the nullability of a column. Making a required field optional, as shown here, is a compatible change; going the other way (optional to required) is an incompatible change.

```python
with table.update_schema() as update:
    # Make a field optional
    update.update_column("symbol", required=False)
```

--------------------------------

### Create a Partitioned Iceberg Table

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

This snippet demonstrates creating a partitioned Iceberg table with a schema and partition specification.

```python
from pyiceberg.schema import Schema
from pyiceberg.types import DoubleType, NestedField, StringType
from pyiceberg.partitioning import PartitionSpec, PartitionField
from pyiceberg.transforms import IdentityTransform

schema = Schema(
    NestedField(1, "city", StringType(), required=False),
    NestedField(2, "lat", DoubleType(), required=False),
    NestedField(3, "long", DoubleType(), required=False),
)

tbl = catalog.create_table(
    "default.cities",
    schema=schema,
    partition_spec=PartitionSpec(PartitionField(source_id=1, field_id=1001, transform=IdentityTransform(), name="city_identity"))
)
```

--------------------------------

### Load Catalog Programmatically

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Instantiate an Iceberg catalog by passing configuration properties directly to `load_catalog`. This is an alternative to using a `.pyiceberg.yaml` file.

```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "docs",
    **{
        "uri": "http://127.0.0.1:8181",
        "s3.endpoint": "http://127.0.0.1:9000",
        "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
        "s3.access-key-id": "admin",
        "s3.secret-access-key": "password",
    }
)
```

--------------------------------

### Prepare and Verify License Documentation

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/verify-release.md

Extract the source tarball, navigate into the extracted directory, and run the `./dev/check-license` script to validate the license headers. This ensures compliance with Apache licensing requirements.

```sh
export PYICEBERG_RELEASE_VERSION=${PYICEBERG_VERSION/rc?/}  # remove rcX qualifier
tar xzf pyiceberg-${PYICEBERG_RELEASE_VERSION}.tar.gz
cd pyiceberg-${PYICEBERG_RELEASE_VERSION}

./dev/check-license
```
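
The `${PYICEBERG_VERSION/rc?/}` expansion removes `rc` followed by exactly one character, so it covers candidates `rc0` through `rc9`. For scripted verification, a Python equivalent that also tolerates multi-digit candidate numbers might look like the sketch below; `release_version` is a hypothetical helper, not part of the release tooling.

```python
import re


def release_version(candidate: str) -> str:
    """Strip a trailing release-candidate qualifier such as `rc3` (hypothetical helper)."""
    return re.sub(r"rc\d+$", "", candidate)


print(release_version("0.6.1rc3"))  # 0.6.1
```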

--------------------------------

### Update Column Type

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Modify the data type of an existing column, for example, promoting a float to a double. Only type promotions that Iceberg permits, such as float to double, are expected to succeed.

```python
from pyiceberg.types import DoubleType

with table.update_schema() as update:
    # Promote a float to a double
    update.update_column("bid", field_type=DoubleType())
```

--------------------------------

### Get All Table Properties using Python CLI

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/cli.md

Retrieve all properties associated with a given table. This helps in understanding the current configuration of a table.

```sh
➜  pyiceberg properties get table nyc.taxis
```

--------------------------------

### Convert Iceberg Table to Polars LazyFrame

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Converts an Iceberg table to a Polars LazyFrame for efficient data manipulation and filtering. Requires Polars to be installed.

```python
import polars as pl

lf = iceberg_table.to_polars().filter(pl.col("ticket_id") > 10)
print(lf.collect())
```

--------------------------------

### Create Sample Data

Source: https://github.com/apache/iceberg-python/blob/main/notebooks/pyiceberg_example.ipynb

Generate a sample PyArrow table with taxi-like data for writing to an Iceberg table.

```python
# Create sample data using PyArrow

# Sample taxi-like data
data = {
    "vendor_id": [1, 2, 1, 2, 1],
    "trip_distance": [1.5, 2.3, 0.8, 5.2, 3.1],
    "fare_amount": [10.0, 15.5, 6.0, 22.0, 18.0],
    "tip_amount": [2.0, 3.0, 1.0, 4.5, 3.5],
    "passenger_count": [1, 2, 1, 3, 2],
}

df = pa.table(data)
print("Sample data:")
print(df)
```

--------------------------------

### Conclude Vote Thread

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/how-to-release.md

Example text for concluding a release vote thread on the dev mailing list after the voting period has ended and requirements are met.

```text
Thanks everyone for voting! The 72 hours have passed, and a minimum of 3 binding votes have been cast:

+1 Foo Bar (non-binding)
...
+1 Fokko Driesprong (binding)

The release candidate has been accepted as PyIceberg <VERSION>. Thanks everyone, when all artifacts are published the announcement will be sent out.

Kind regards,
```

--------------------------------

### Load SqlCatalog for Local Testing

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/index.md

Load the SqlCatalog implementation to manage Iceberg tables using a local SQLite database and filesystem warehouse. This is suitable for testing but not recommended for production.

```python
from pyiceberg.catalog import load_catalog

warehouse_path = "/tmp/warehouse"
catalog = load_catalog(
    "default",
    **{
        "type": "sql",
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
        "warehouse": f"file://{warehouse_path}",
    },
)
```

--------------------------------

### Configure OneLake REST Catalog with Entra ID Authentication

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Use this configuration for OneLake REST catalog with Entra ID authentication. Ensure pyiceberg[entra-auth] is installed.

```yaml
catalog:
  onelake_catalog:
    type: rest
    uri: https://onelake.table.fabric.microsoft.com/iceberg
    warehouse: <fabric_workspace_id>/<fabric_data_item_id>
    auth:
      type: entra
    adls.account-name: onelake
    adls.account-host: onelake.blob.fabric.microsoft.com
```

--------------------------------

### Configure REST Catalog via Environment Variables

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/contributing.md

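Set environment variables to configure the URI and access keys for a catalog named 'test_catalog'. Note that the example URI uses the `thrift://` scheme, which is what a Hive Metastore endpoint looks like; a REST catalog would normally be reached through an `http://` or `https://` URI.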

```bash
export PYICEBERG_CATALOG__TEST_CATALOG__URI=thrift://localhost:9083
export PYICEBERG_CATALOG__TEST_CATALOG__ACCESS_KEY_ID=username
export PYICEBERG_CATALOG__TEST_CATALOG__SECRET_ACCESS_KEY=password
```
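
The variable names follow a `PYICEBERG_CATALOG__<NAME>__<KEY>` pattern, with double underscores separating the levels. The sketch below shows one way such names could be folded into a per-catalog dictionary. The normalization rules (lowercasing, `_` to `-`, a further `__` to `.`) are assumptions made for this illustration, and `catalog_settings_from_env` is a hypothetical helper; PyIceberg's own parsing may differ in its details.

```python
from typing import Dict, Mapping

PREFIX = "PYICEBERG_CATALOG__"


def catalog_settings_from_env(environ: Mapping[str, str]) -> Dict[str, Dict[str, str]]:
    """Group PYICEBERG_CATALOG__<NAME>__<KEY> variables by catalog name (illustrative only)."""
    catalogs: Dict[str, Dict[str, str]] = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        # The remainder looks like <NAME>__<KEY>; split at the first double underscore.
        catalog_name, sep, key = name[len(PREFIX):].partition("__")
        if not sep or not key:
            continue
        # Assumed normalization: lowercase, "__" becomes ".", "_" becomes "-".
        normalized_key = key.lower().replace("__", ".").replace("_", "-")
        # Assumption: the catalog name is only lowercased.
        catalogs.setdefault(catalog_name.lower(), {})[normalized_key] = value
    return catalogs


example_env = {
    "PYICEBERG_CATALOG__TEST_CATALOG__URI": "thrift://localhost:9083",
    "PYICEBERG_CATALOG__TEST_CATALOG__ACCESS_KEY_ID": "username",
    "PYICEBERG_CATALOG__TEST_CATALOG__SECRET_ACCESS_KEY": "password",
    "HOME": "/home/user",  # unrelated variables are ignored
}
settings = catalog_settings_from_env(example_env)
print(settings)
```

In a real shell session the mapping would come from `os.environ` rather than a hand-written dictionary.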

--------------------------------

### Get Specific Table Property using Python CLI

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/cli.md

Retrieve the value of a single, specific property for a table. This is useful for checking the status of a particular configuration setting.

```sh
➜  pyiceberg properties get table nyc.taxis write.metadata.delete-after-commit.enabled
```

--------------------------------

### Create a Namespace

Source: https://github.com/apache/iceberg-python/blob/main/notebooks/pyiceberg_example.ipynb

Create a new namespace within the loaded catalog. Check available namespaces after creation.

```python
# Create a namespace
catalog.create_namespace("default")
print(f"Available namespaces: {list(catalog.list_namespaces())}")
```

--------------------------------

### Custom UUID Location Provider Implementation

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Example of a custom LocationProvider that generates unique data file locations using UUIDs. This implementation extends the base LocationProvider and customizes the new_data_location method.

```python
import uuid

class UUIDLocationProvider(LocationProvider):
    def __init__(self, table_location: str, table_properties: Properties):
        super().__init__(table_location, table_properties)

    def new_data_location(self, data_file_name: str, partition_key: Optional[PartitionKey] = None) -> str:
        # Can use any custom method to generate a file path given the partitioning information and file name
        prefix = f"{self.table_location}/{uuid.uuid4()}"
        return f"{prefix}/{partition_key.to_path()}/{data_file_name}" if partition_key else f"{prefix}/{data_file_name}"
```
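
To make the resulting path layout concrete, here is a minimal stand-alone sketch that mirrors the string formatting above without importing PyIceberg. The helper `uuid_data_location`, the `partition_path` argument (standing in for `partition_key.to_path()`), and the example bucket and file names are all hypothetical.

```python
import uuid
from typing import Optional


def uuid_data_location(
    table_location: str,
    data_file_name: str,
    partition_path: Optional[str] = None,
) -> str:
    """Build a data file path under a fresh UUID directory (hypothetical helper)."""
    # A new random UUID is generated on every call, so each file gets its own directory
    prefix = f"{table_location}/{uuid.uuid4()}"
    if partition_path:
        return f"{prefix}/{partition_path}/{data_file_name}"
    return f"{prefix}/{data_file_name}"


# Illustrative values only; real locations come from the table metadata
unpartitioned = uuid_data_location("s3://warehouse/db/tbl", "00000-0-data.parquet")
partitioned = uuid_data_location("s3://warehouse/db/tbl", "00000-0-data.parquet", "day=2024-01-01")
print(unpartitioned)
print(partitioned)
```

Because the UUID is random, the printed paths differ on every run; only the surrounding structure (`<table location>/<uuid>/[<partition path>/]<file name>`) is stable.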

--------------------------------

### Configure Glue Catalog with Static Credentials

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Configure a Glue catalog using static AWS access key ID, secret access key, and session token.

```yaml
catalog:
  default:
    type: glue
    glue.access-key-id: <ACCESS_KEY_ID>
    glue.secret-access-key: <SECRET_ACCESS_KEY>
    glue.session-token: <SESSION_TOKEN>
    glue.region: <REGION_NAME>
    s3.endpoint: http://localhost:9000
    s3.access-key-id: admin
    s3.secret-access-key: password
```

--------------------------------

### Describe a Table in Default Format

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/cli.md

Use the 'describe' command to view detailed information about a specific table, including its metadata, schema, and snapshots. The output is in a human-readable format.

```sh
➜  pyiceberg describe nyc.taxis
Table format version  1
Metadata location     file:/.../nyc.db/taxis/metadata/00000-aa3a3eac-ea08-4255-b890-383a64a94e42.metadata.json
Table UUID            6cdfda33-bfa3-48a7-a09e-7abb462e3460
Last Updated          1661783158061
Partition spec        []
Sort order            []
Current schema        Schema, id=0
├── 1: VendorID: optional long
├── 2: tpep_pickup_datetime: optional timestamptz
├── 3: tpep_dropoff_datetime: optional timestamptz
├── 4: passenger_count: optional double
├── 5: trip_distance: optional double
├── 6: RatecodeID: optional double
├── 7: store_and_fwd_flag: optional string
├── 8: PULocationID: optional long
├── 9: DOLocationID: optional long
├── 10: payment_type: optional long
├── 11: fare_amount: optional double
├── 12: extra: optional double
├── 13: mta_tax: optional double
├── 14: tip_amount: optional double
├── 15: tolls_amount: optional double
├── 16: improvement_surcharge: optional double
├── 17: total_amount: optional double
├── 18: congestion_surcharge: optional double
└── 19: airport_fee: optional double
Current snapshot      Operation.APPEND: id=5937117119577207079, schema_id=0
Snapshots             Snapshots
└── Snapshot 5937117119577207079, schema 0: file:/.../nyc.db/taxis/metadata/snap-5937117119577207079-1-94656c4f-4c66-4600-a4ca-f30377300527.avro
Properties            owner                 root
                      write.format.default  parquet
```

--------------------------------

### Create Patch Branch from Tag

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/how-to-release.md

Commands to create a new patch branch from an existing release tag and push it.

```bash
# Fetch all tags
git fetch --tags

# Assuming 0.8.0 is the latest release tag
git checkout -b pyiceberg-0.8.x pyiceberg-0.8.0

# Cherry-pick commits for the upcoming patch release
git cherry-pick <commit>

# Push the new branch
git push git@github.com:apache/iceberg-python.git pyiceberg-0.8.x
```

--------------------------------

### Configure SQL Catalog with PostgreSQL

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Configure a SQL catalog using PostgreSQL as the backend. Set init_catalog_tables to false to prevent automatic table creation.

```yaml
catalog:
  default:
    type: sql
    uri: postgresql+psycopg2://username:password@localhost/mydatabase
    init_catalog_tables: false
```

--------------------------------

### Create a Table with Iceberg Format Version 3

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/geospatial.md

Demonstrates creating a new table with the specified schema and explicitly setting the `format-version` property to '3', which is required for geospatial types.

```python
from pyiceberg.table import TableProperties

# Creating a v3 table
table = catalog.create_table(
    identifier="db.spatial_table",
    schema=schema,
    properties={
        TableProperties.FORMAT_VERSION: "3"
    }
)
```

--------------------------------

### Generate Individual Vendor Packages

Source: https://github.com/apache/iceberg-python/blob/main/vendor/README.md

Use these commands to generate specific vendor packages. 'make fb303' generates only the FB303 Thrift client, while 'make hive-metastore' generates only the Hive Metastore Thrift definitions.

```bash
make fb303           # FB303 Thrift client only
```

```bash
make hive-metastore  # Hive Metastore Thrift definitions only
```

--------------------------------

### Run Linting and Formatting

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/contributing.md

Execute the linting and autoformatting checks for the project. This command ensures code style consistency and catches potential issues.

```bash
make lint
```

--------------------------------

### Basic Authentication Configuration

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Configure basic authentication with a username and password. Ensure these credentials are kept secure.

```yaml
auth:
  type: basic
  basic:
    username: myuser
    password: mypass
```
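
As a rough illustration of why these credentials need protecting, the sketch below builds a standard HTTP Basic `Authorization` header value (RFC 7617) from the same username and password. It assumes the client follows the standard scheme; `basic_auth_header` is a hypothetical helper, not a PyIceberg function.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return an HTTP Basic Authorization header value (hypothetical helper)."""
    # RFC 7617: base64-encode "username:password" and prefix it with "Basic "
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


header = basic_auth_header("myuser", "mypass")
print(header)
```

Base64 is an encoding, not encryption, so anyone who can read the header can recover the password. This is one reason to use basic authentication only over TLS.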

--------------------------------

### Upload PyIceberg Release to PyPI

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/how-to-release.md

Checks out the release artifacts from Apache SVN and uploads them to PyPI using twine. Requires the VERSION environment variable to be set and may require a PyPI API token.

```bash
: "${VERSION:?ERROR: VERSION is not set or is empty}"

svn checkout https://dist.apache.org/repos/dist/release/iceberg/pyiceberg-${VERSION} /tmp/iceberg-dist-release/pyiceberg-${VERSION}

cd /tmp/iceberg-dist-release/pyiceberg-${VERSION}

twine upload pyiceberg-*.whl pyiceberg-*.tar.gz
```

--------------------------------

### Import PySpark Libraries

Source: https://github.com/apache/iceberg-python/blob/main/notebooks/spark_integration_example.ipynb

Import the necessary PySpark SQL library for creating a SparkSession.

```python
from pyspark.sql import SparkSession
```

--------------------------------

### Configure SQL Catalog with SQLite

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Configure a SQL catalog using SQLite. This is suitable for development and exploratory purposes only due to concurrency limitations.

```yaml
catalog:
  default:
    type: sql
    uri: sqlite:////tmp/pyiceberg.db
    init_catalog_tables: false
```

--------------------------------

### Show Tables in Default Namespace

Source: https://github.com/apache/iceberg-python/blob/main/notebooks/spark_integration_example.ipynb

List all tables present in the 'default' namespace. This is useful for identifying available Iceberg tables.

```python
# Show tables in the default namespace
spark.sql("SHOW TABLES FROM default").show()
```

--------------------------------

### Apache Gravitino Catalog Configuration

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Configure integration with Apache Gravitino. Requires catalog URI and delegation headers. Uses noop authentication by default.

```yaml
catalog:
  gravitino_catalog:
    type: rest
    uri: <gravitino-catalog-uri>
    header.X-Iceberg-Access-Delegation: vended-credentials
    auth:
      type: noop
```

--------------------------------

### Configure In-Memory Catalog

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Configure an in-memory catalog for testing and demos. It uses an in-memory SQLite database and is not suitable for production.

```yaml
catalog:
  default:
    type: in-memory
    warehouse: /tmp/pyiceberg/warehouse
```

--------------------------------

### Run Full Test Coverage

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/verify-release.md

Execute all unit and integration tests for PyIceberg with coverage reporting. This command spins up Docker containers to facilitate the testing process.

```sh
make test-coverage
```

--------------------------------

### Configure REST Catalog for Testing

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/contributing.md

Set the PYICEBERG_TEST_CATALOG environment variable to specify which REST catalog to use for integration tests. Warning: Do not run against production catalogs.

```bash
export PYICEBERG_TEST_CATALOG=test_catalog
```

--------------------------------

### Run Unit Tests

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/contributing.md

Execute the project's unit tests using pytest and coverage. Aims to enforce 90%+ code coverage.

```bash
make test
```

--------------------------------

### Configure Glue Catalog with Profile Name

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Configure a Glue catalog using an AWS profile name and region.

```yaml
catalog:
  default:
    type: glue
    glue.profile-name: <PROFILE_NAME>
    glue.region: <REGION_NAME>
    s3.endpoint: http://localhost:9000
    s3.access-key-id: admin
    s3.secret-access-key: password
```

--------------------------------

### Create and Push Signed Git Tag

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/how-to-release.md

Bash script to set version variables, create a signed Git tag, and push it to the repository.

```bash
export VERSION=0.7.0
export RC=1

export VERSION_WITH_RC=${VERSION}rc${RC}
export GIT_TAG=pyiceberg-${VERSION_WITH_RC}

git tag -s ${GIT_TAG} -m "PyIceberg ${VERSION_WITH_RC}"
git push git@github.com:apache/iceberg-python.git ${GIT_TAG}
```
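
The naming scheme used by the shell variables above can be sketched in Python as follows. The function `release_candidate_names` is a hypothetical helper for illustration; it only composes strings and does not run any Git commands.

```python
def release_candidate_names(version: str, rc: int) -> dict:
    """Compose the release-candidate version string and Git tag (hypothetical helper)."""
    # Mirrors: VERSION_WITH_RC=${VERSION}rc${RC}
    version_with_rc = f"{version}rc{rc}"
    return {
        "VERSION_WITH_RC": version_with_rc,
        # Mirrors: GIT_TAG=pyiceberg-${VERSION_WITH_RC}
        "GIT_TAG": f"pyiceberg-{version_with_rc}",
    }


names = release_candidate_names("0.7.0", 1)
print(names["VERSION_WITH_RC"])  # 0.7.0rc1
print(names["GIT_TAG"])          # pyiceberg-0.7.0rc1
```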

--------------------------------

### List Tables in Default Catalog

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/cli.md

Execute the 'list' command to explore the default catalog. Run without arguments, as here, it prints the catalog's top-level namespaces (`default` and `nyc` in this output).

```sh
➜  pyiceberg list
default
nyc
```

--------------------------------

### Specify Python Version for Environment

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/contributing.md

Creates a virtual environment using a specific Python version and runs tests against it.

```bash
PYTHON=3.12 make install # Create environment with Python 3.12
make test # Run tests against Python 3.12
```

--------------------------------

### Verify Release Signatures

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/verify-release.md

Check out the release artifacts from the Apache distribution repository, then iterate through the downloaded files and verify each signature with GPG. This confirms the integrity and authenticity of the release files.

```sh
svn checkout https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-$PYICEBERG_VERSION/ ${PYICEBERG_VERIFICATION_DIR}

cd ${PYICEBERG_VERIFICATION_DIR}

for name in $(ls pyiceberg-*.whl pyiceberg-*.tar.gz)
do
    gpg --verify ${name}.asc ${name}
done
```
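
The loop above pairs each artifact with a detached signature named `<artifact>.asc`. A minimal Python sketch of that pairing is shown below; it only builds the `gpg --verify` argument lists rather than running them. The helper name `gpg_verify_commands` and the placeholder file names in the demo are assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import List


def gpg_verify_commands(directory: str) -> List[List[str]]:
    """Build one `gpg --verify <file>.asc <file>` command per artifact (hypothetical helper)."""
    commands = []
    # Same glob patterns as the shell loop: wheels first, then source tarballs
    for pattern in ("pyiceberg-*.whl", "pyiceberg-*.tar.gz"):
        for artifact in sorted(Path(directory).glob(pattern)):
            commands.append(["gpg", "--verify", f"{artifact}.asc", str(artifact)])
    return commands


# Demo against empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("pyiceberg-0.0.0-py3-none-any.whl", "pyiceberg-0.0.0.tar.gz"):
        Path(tmp, name).touch()
    demo = gpg_verify_commands(tmp)

for cmd in demo:
    print(" ".join(cmd))
```

To actually verify, each command could be passed to `subprocess.run(cmd, check=True)`, which raises if GPG reports a bad or missing signature.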

--------------------------------

### Create and Append Data to an Iceberg Table

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

This snippet demonstrates how to create a new Iceberg table and append initial data using PyArrow and PyIceberg.

```python
import pyarrow as pa
df = pa.Table.from_pylist(
    [
        {"city": "Amsterdam", "lat": 52.371807, "long": 4.896029},
        {"city": "San Francisco", "lat": 37.773972, "long": -122.431297},
        {"city": "Drachten", "lat": 53.11254, "long": 6.0989},
        {"city": "Paris", "lat": 48.864716, "long": 2.349014},
    ],
)

from pyiceberg.catalog import load_catalog
catalog = load_catalog("default")

tbl = catalog.create_table("default.cities", schema=df.schema)

tbl.append(df)
```

--------------------------------

### Unity Catalog Configuration

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Configure integration with Databricks Unity Catalog using its REST API. Requires the workspace URL, the catalog name, and a Databricks personal access token (PAT).

```yaml
catalog:
  unity_catalog:
    type: rest
    uri: https://<workspace-url>/api/2.1/unity-catalog/iceberg-rest
    warehouse: <uc-catalog-name>
    token: <databricks-pat-token>
```

--------------------------------

### Upload Artifacts to Apache Dev SVN

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/how-to-release.md

Import the release candidate artifacts into the Apache Development SVN repository. The command requires the version and RC number to construct the correct paths.

```bash
: "${VERSION:?ERROR: VERSION is not set or is empty}"
: "${VERSION_WITH_RC:?ERROR: VERSION_WITH_RC is not set or is empty}"
: "${RC:?ERROR: RC is not set or is empty}"

svn import "svn-release-candidate-${VERSION}rc${RC}" "https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-${VERSION_WITH_RC}" -m "PyIceberg ${VERSION_WITH_RC}"
```

--------------------------------

### Inspect Table Manifests

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Use this to view a table's current file manifests. The output is a pyarrow.Table.

```python
table.inspect.manifests()
```

```python
pyarrow.Table
content: int8 not null
path: string not null
length: int64 not null
partition_spec_id: int32 not null
added_snapshot_id: int64 not null
added_data_files_count: int32 not null
existing_data_files_count: int32 not null
deleted_data_files_count: int32 not null
added_delete_files_count: int32 not null
existing_delete_files_count: int32 not null
deleted_delete_files_count: int32 not null
partition_summaries: list<item: struct<contains_null: bool not null, contains_nan: bool, lower_bound: string, upper_bound: string>> not null
  child 0, item: struct<contains_null: bool not null, contains_nan: bool, lower_bound: string, upper_bound: string>
      child 0, contains_null: bool not null
      child 1, contains_nan: bool
      child 2, lower_bound: string
      child 3, upper_bound: string
----
content: [[0]]
path: [["s3://warehouse/default/table_metadata_manifests/metadata/3bf5b4c6-a7a4-4b43-a6ce-ca2b4887945a-m0.avro"]]
length: [[6886]]
partition_spec_id: [[0]]
added_snapshot_id: [[3815834705531553721]]
added_data_files_count: [[1]]
existing_data_files_count: [[0]]
deleted_data_files_count: [[0]]
added_delete_files_count: [[0]]
existing_delete_files_count: [[0]]
deleted_delete_files_count: [[0]]
partition_summaries: [[    -- is_valid: all not null
    -- child 0 type: bool
[false]
    -- child 1 type: bool
[false]
    -- child 2 type: string
["test"]
    -- child 3 type: string
["test"]]]
```

--------------------------------

### Set Catalog Configuration via Environment Variables

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Use environment variables to configure catalog settings, such as the URI and S3 credentials. Double underscores represent nested fields in the YAML structure.

```sh
export PYICEBERG_CATALOG__DEFAULT__URI=thrift://localhost:9083
export PYICEBERG_CATALOG__DEFAULT__S3__ACCESS_KEY_ID=username
export PYICEBERG_CATALOG__DEFAULT__S3__SECRET_ACCESS_KEY=password
```
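
The naming rule can be sketched as a small translation function. This is an illustration of the convention, not PyIceberg's own parser: the helper `env_var_to_config_path` is hypothetical, and the lowercasing and single-underscore-to-dash steps are assumptions inferred from property names such as `s3.access-key-id`.

```python
from typing import Tuple


def env_var_to_config_path(name: str, prefix: str = "PYICEBERG_") -> Tuple[str, ...]:
    """Translate an environment variable name into nested config keys (hypothetical helper)."""
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    # Double underscores separate nesting levels
    parts = name[len(prefix):].split("__")
    # Assumption: keys are lowercased and single underscores become dashes
    return tuple(part.lower().replace("_", "-") for part in parts)


path = env_var_to_config_path("PYICEBERG_CATALOG__DEFAULT__S3__ACCESS_KEY_ID")
print(".".join(path))  # catalog.default.s3.access-key-id
```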

--------------------------------

### Create Branch with Default Settings

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md

Create a mutable branch referencing a specific snapshot with default retention settings.

```python
# Create a branch with default settings
table.manage_snapshots().create_branch(
    snapshot_id=snapshot_id,
    branch_name="dev"
).commit()
```

--------------------------------

### Run S3 Integration Tests

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/contributing.md

Run the S3-specific integration tests. MinIO must be running first.

```bash
make test-s3
```

--------------------------------

### No Authentication Configuration

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md

Use this configuration when no authentication is required for the catalog. This is the simplest authentication type.

```yaml
auth:
  type: noop
```

--------------------------------

### Show Namespaces in Spark

Source: https://github.com/apache/iceberg-python/blob/main/notebooks/spark_integration_example.ipynb

Display all available namespaces (databases) within the connected Spark environment. This helps in understanding the data organization.

```python
# Show available namespaces/databases
spark.sql("SHOW NAMESPACES").show()
```

--------------------------------

### Monitor and Watch GitHub Release Action

Source: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/how-to-release.md

Use the `gh` CLI to find the database ID of the release workflow and then watch its progress. This is useful for tracking the artifact generation process.

```bash
: "${GIT_TAG:?ERROR: GIT_TAG is not set or is empty}"

RUN_ID=$(gh run list --repo apache/iceberg-python --workflow "Python Build Release Candidate" --branch "${GIT_TAG}" --event push --json databaseId -q '.[0].databaseId')
: "${RUN_ID:?ERROR: RUN_ID could not be determined}"

echo "Waiting for workflow to complete, this will take several minutes..."
gh run watch $RUN_ID --repo apache/iceberg-python
```