### Install nox

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/contributing.md

Install the nox task runner using `uv`.

```bash
uv tool install nox
```

--------------------------------

### Install Nox for Development

Source: https://github.com/fsspec/universal_pathlib/blob/main/CONTRIBUTING.rst

Install the Nox tool, which is required for setting up the development environment and running tests.

```console
pip install nox
```

--------------------------------

### Install nox with uv

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/contributing.md

If you prefer using `uv` for package management, you can install nox with this command.

```bash
uv pip install nox
```

--------------------------------

### Install universal-pathlib with pip

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/install.md

Use this command to install universal-pathlib using pip.

```bash
python -m pip install universal-pathlib
```

--------------------------------

### Example Feature Request

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/contributing.md

When requesting a feature, explain the problem, your proposed solution, alternatives considered, and the use case. This example demonstrates how to request a `glob_with_info()` method for filtering paths by size.

```python
for path, info in bucket.glob_with_info("**/*.parquet"):
    if info.st_size > 100_000_000:
        process(path)
```

--------------------------------

### Install Pre-commit Hooks

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/contributing.md

Set up pre-commit hooks to automatically check your code before each commit.

```bash
pip install pre-commit
pre-commit install
```

--------------------------------

### Install universal_pathlib with pip

Source: https://github.com/fsspec/universal_pathlib/blob/main/README.md

Install the latest version of universal_pathlib using pip. Additional packages may be needed for specific filesystems.
```bash
python -m pip install universal_pathlib
```

--------------------------------

### Install universal-pathlib with uv

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/install.md

Use this command to add universal-pathlib to your project with the uv package manager.

```bash
uv add universal-pathlib
```

--------------------------------

### Install universal-pathlib with conda

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/install.md

Use this command to install universal-pathlib from the conda-forge channel.

```bash
conda install -c conda-forge universal-pathlib
```

--------------------------------

### Register Custom UPath via Entry Points (setup.cfg)

Source: https://github.com/fsspec/universal_pathlib/blob/main/README.md

Illustrates registering a custom UPath implementation using the 'entry_points' mechanism in a 'setup.cfg' file. This makes the custom implementation discoverable by 'universal_pathlib' upon package installation.

```ini
[options.entry_points]
universal_pathlib.implementations =
    myproto = my_module.submodule:MyPath
```

--------------------------------

### Reproducible Bug Report Example

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/contributing.md

Use this format to provide a minimal reproducible example when reporting bugs. Include environment details, the issue, reproduction steps, expected behavior, and actual behavior.

```markdown
**Environment:**
- OS: macOS 14.0
- Python: 3.11.5
- universal_pathlib: 0.2.2
- Filesystem: S3 (s3fs 2023.10.0)

**Issue:**
`UPath.glob()` doesn't match files with spaces in their names on S3.

**Reproduction:**
```python
from upath import UPath

path = UPath("s3://my-bucket/")
# This file exists: "my file.txt"
list(path.glob("my*.txt"))  # Returns empty list
```

**Expected:** Should find "my file.txt"
**Actual:** Returns empty list
```

--------------------------------

### Install AWS S3 filesystem dependencies

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/install.md

Install the s3fs package for AWS S3 support, or use fsspec extras for a more integrated approach.

```bash
pip install s3fs
```

```bash
pip install "fsspec[s3]"
```

--------------------------------

### Install universal_pathlib with conda

Source: https://github.com/fsspec/universal_pathlib/blob/main/README.md

Install the latest version of universal_pathlib using conda. Additional packages may be needed for specific filesystems.

```bash
conda install -c conda-forge universal_pathlib
```

--------------------------------

### UPath Public API Example

Source: https://github.com/fsspec/universal_pathlib/blob/main/README.md

Demonstrates how to access fsspec protocol, storage options, and path components from a UPath instance. This allows for direct interaction with fsspec filesystems.

```python
from upath import UPath
from fsspec import filesystem

p = UPath("s3://bucket/file.txt", anon=True)

fs = filesystem(p.protocol, **p.storage_options)  # equivalent to p.fs

with fs.open(p.path) as f:
    data = f.read()
```

--------------------------------

### Initialize UPath for S3 Filesystem

Source: https://github.com/fsspec/universal_pathlib/blob/main/notebooks/examples.ipynb

Creates a UPath object to interact with an S3 bucket. This example initializes a UPath for the 'spacenet-dataset' bucket.

```python
s3path = UPath("s3://spacenet-dataset")
```

--------------------------------

### Install Test Requirements

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/contributing.md

Install additional dependencies required for specific tests, such as S3, GCS, or Azure tests, using pip.
```bash
pip install -e ".[dev,s3,gcs,azure]"
```

--------------------------------

### Migrate custom _FSSpecAccessor __init__ using _fs_factory

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/migration.md

For custom filesystem instantiation in accessor subclasses, use the new `_fs_factory` classmethod in v0.2.0+ instead of overriding the `__init__` method directly. This provides a cleaner way to customize filesystem setup.

```python
# OLD: v0.1.x
import fsspec
from upath.core import UPath, _FSSpecAccessor

class MyAccessor(_FSSpecAccessor):
    def __init__(self, parsed_url, **kwargs):
        # custom filesystem setup
        super().__init__(parsed_url, **kwargs)

class MyPath(UPath):
    _default_accessor = MyAccessor

# NEW: v0.2.0+
from upath import UPath

class MyPath(UPath):
    @classmethod
    def _fs_factory(cls, protocol, storage_options):
        # custom filesystem setup
        return super()._fs_factory(protocol, storage_options)
```

--------------------------------

### Iterate Directory Contents with GitHub UPath

Source: https://github.com/fsspec/universal_pathlib/blob/main/notebooks/examples.ipynb

Demonstrates iterating through the contents of a directory on GitHub using UPath. This example shows how to list files and breaks after the first item.

```python
for p in ghpath.iterdir():
    print(p)
    break
```

--------------------------------

### Construct and Access File Path with GitHub UPath

Source: https://github.com/fsspec/universal_pathlib/blob/main/notebooks/examples.ipynb

Shows how to construct a full file path within a GitHub repository using the `/` operator and access it. This example creates a path to the README.md file.

```python
readme_path = ghpath / "README.md"
readme_path
```

--------------------------------

### Add universal_pathlib to project dependencies

Source: https://github.com/fsspec/universal_pathlib/blob/main/README.md

Example pyproject.toml configuration to add universal_pathlib and fsspec extras for S3 and HTTP support to your project.
```toml
[project]
name = "myproject"
requires-python = ">=3.9"
dependencies = [
    "universal_pathlib>=0.3.10",
    "fsspec[s3,http]",
]
```

--------------------------------

### Registering a Custom UPath Implementation

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/usage.md

Provides an example of subclassing UPath to fix specific methods for a non-standard fsspec filesystem and registering it with a custom protocol.

```python
from upath import UPath
from upath.registry import register_implementation

class MyCustomPath(UPath):
    # fix specific methods if the filesystem is a bit non-standard
    def is_dir(self, *, follow_symlinks=True):
        # some special way to check if it's a dir
        is_dir = ...
        return is_dir

# Register for your protocol
register_implementation("myproto", MyCustomPath)

# Now it works!
my_path = UPath("myproto://server/path")
```

--------------------------------

### Concrete Path Filesystem Operations

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/concepts/pathlib.md

Shows how to use concrete path objects for interacting with the actual filesystem, including checking existence, reading files, and getting file metadata.

```python
from pathlib import Path

# Concrete path - filesystem operations
p = Path("/home/user/file.txt")
exists = p.exists()       # Checks filesystem
content = p.read_text()   # Reads file
size = p.stat().st_size   # Gets file size
```

--------------------------------

### List Available UPath Implementations

Source: https://context7.com/fsspec/universal_pathlib/llms.txt

Use `available_implementations` to get a list of registered protocol strings. Set `fallback=True` to include all fsspec-known protocols.
```python
from upath.registry import available_implementations

# Only explicitly registered upath implementations
impls = available_implementations()
print(sorted(impls))
# ['abfs', 'abfss', 'adl', 'az', 'data', 'file', 'ftp', 'gcs', 'gs',
#  'github', 'hdfs', 'hf', 'http', 'https', 'local', 'memory',
#  's3', 's3a', 'sftp', 'simplecache', 'smb', 'ssh', 'tar', 'webdav',
#  'webdav+http', 'webdav+https', 'zip']

# Also include all fsspec-known protocols (much larger list)
all_protos = available_implementations(fallback=True)
print(len(all_protos))  # 50+ protocols
```

--------------------------------

### fsspec Core Functionality

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/concepts/fsspec.md

Demonstrates common filesystem operations like listing files, checking existence, getting info, reading, writing, copying, and deleting. Requires importing fsspec and creating a filesystem instance for a specific protocol.

```python
import fsspec

# Create a filesystem instance
# Returns an AbstractFileSystem subclass for the specified protocol
fs = fsspec.filesystem('s3', anon=True)

# List files
files = fs.ls('my-bucket/data/')

# Check if file exists
exists = fs.exists('my-bucket/data/file.txt')

# Get file info
info = fs.info('my-bucket/data/file.txt')

# Read file
with fs.open('my-bucket/data/file.txt', 'r') as f:
    content = f.read()

# Write file
with fs.open('my-bucket/output.txt', 'w') as f:
    f.write('Hello, World!')

# Copy files
fs.cp('my-bucket/source.txt', 'my-bucket/dest.txt')

# Delete files
fs.rm('my-bucket/file.txt')
```

--------------------------------

### Declare universal-pathlib and fsspec extras in pyproject.toml

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/install.md

Specify universal-pathlib and required fsspec filesystem extras in your project's pyproject.toml file. This example includes S3 and HTTP support.
```toml
[project]
name = "myproject"
requires-python = ">=3.9"
dependencies = [
    "universal_pathlib>=0.3.10",
    "fsspec[s3,http]",  # Add the filesystems you need
]
```

--------------------------------

### Basic UPath Operations

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/index.md

Provides examples of common file operations using UPath for local paths, S3, and HTTP. This snippet shows how to write to a local file, check for existence on S3, and read bytes from an HTTP resource using a consistent interface.

```python
from upath import UPath

# Works with local paths
local_path = UPath("documents/notes.txt")
local_path.write_text("Hello, World!")
print(local_path.read_text())  # "Hello, World!"

# Works with S3
s3_path = UPath("s3://my-bucket/data/processed/results.csv")
if s3_path.exists():
    data = s3_path.read_text()

# Works with HTTP
http_path = UPath("https://example.com/data/file.json")
if http_path.exists():
    content = http_path.read_bytes()

# Works with many more! 🌟
```

--------------------------------

### Basic Usage of UPath for S3

Source: https://github.com/fsspec/universal_pathlib/blob/main/README.md

Demonstrates basic operations like accessing file name, stem, suffix, checking existence, and reading text from an S3 path using UPath. Ensure fsspec[s3] is installed.

```python
# pip install universal_pathlib fsspec[s3]
>>> from upath import UPath
>>>
>>> s3path = UPath("s3://test_bucket") / "example.txt"
>>> s3path.name
'example.txt'
>>> s3path.stem
'example'
>>> s3path.suffix
'.txt'
>>> s3path.exists()
True
>>> s3path.read_text()
'Hello World'
```

--------------------------------

### Initialize UPath for Local Path

Source: https://github.com/fsspec/universal_pathlib/blob/main/notebooks/examples.ipynb

When a local path is provided, UPath defaults to using pathlib.PosixPath or pathlib.WindowsPath, similar to pathlib.Path. This example shows creating a UPath object from a temporary file's name.
```python
import pathlib
import warnings
from tempfile import NamedTemporaryFile

from upath import UPath

warnings.filterwarnings(action="ignore", message="UPath .*/", module="upath.core")

tmp = NamedTemporaryFile()
print(tmp.name, type(tmp.name))

local_path = UPath(tmp.name)
assert isinstance(local_path, (pathlib.PosixPath, pathlib.WindowsPath))
local_path
```

--------------------------------

### Instantiate UPath for Various Filesystems

Source: https://context7.com/fsspec/universal_pathlib/llms.txt

Demonstrates creating UPath instances for local files, file:// URIs, S3, GCS, Azure Blob Storage, HTTP, GitHub, and in-memory storage. Shows how to access basic path attributes and specify protocols.

```python
from upath import UPath

# ── Local filesystem (returns PosixUPath / WindowsUPath, a pathlib.Path subclass) ──
local = UPath("/tmp/data.csv")
print(type(local))    # e.g. <class 'upath.implementations.local.PosixUPath'> on Linux/macOS
print(local.suffix)   # .csv

# ── Local via file:// URI (returns FilePath, fsspec-backed) ──
file_path = UPath("file:///tmp/data.csv")
print(file_path.protocol)  # file

# ── Amazon S3 ──
s3 = UPath("s3://my-bucket/datasets/train.parquet", anon=True)
print(s3.protocol)  # s3
print(s3.name)      # train.parquet
print(s3.parent)    # S3Path('s3://my-bucket/datasets')

# ── Google Cloud Storage ──
gcs = UPath("gs://my-bucket/models/checkpoint.pt")
print(gcs.drive)  # my-bucket/

# ── Azure Blob Storage ──
az = UPath("az://container/blob.json", account_name="myaccount")
print(az.storage_options)  # {'account_name': 'myaccount'}

# ── HTTP/HTTPS (read-only) ──
http = UPath("https://example.com/data/file.json")
print(http.protocol)  # https

# ── GitHub repository (read-only) ──
gh = UPath("github://fsspec:universal_pathlib@main/")
readme = gh / "README.md"
print(readme.name)  # README.md

# ── In-memory (useful for tests) ──
mem = UPath("memory://scratch/temp.txt")
print(mem.protocol)  # memory

# ── Explicit protocol kwarg ──
p = UPath("/some/path/file.txt", protocol="memory")
print(p.protocol)  # memory
```
--------------------------------

### Initialize UPath with GitHub Filesystem (Keyword Args)

Source: https://github.com/fsspec/universal_pathlib/blob/main/notebooks/examples.ipynb

Connect to a GitHub repository using UPath by providing connection details as keyword arguments. This is useful for accessing files in a specific repository, organization, and branch/commit.

```python
ghpath = UPath('github:/', org='fsspec', repo='universal_pathlib', sha='main')
assert ghpath.exists()
ghpath.fs
```

--------------------------------

### fsspec Storage Options for Authentication and Connection

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/concepts/fsspec.md

Illustrates how to configure filesystem instances with specific storage options for authentication, anonymous access, tokens, service accounts, and connection settings.

```python
import fsspec

# Authentication credentials
fs = fsspec.filesystem('s3', key='...', secret='...')

# Anonymous/public access
fs = fsspec.filesystem('s3', anon=True)

# Tokens and service accounts
fs = fsspec.filesystem('gs', token='path/to/creds.json')

# Connection settings
fs = fsspec.filesystem('sftp', host='...', port=22, username='...')

# Behavioral options
fs = fsspec.filesystem('s3', use_ssl=True, default_block_size=5*2**20)
```

--------------------------------

### Create Path Objects with pathlib

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/concepts/pathlib.md

Demonstrates creating path objects for local filesystems using various methods like absolute paths, relative paths, home directory, and current working directory.
```python
from pathlib import Path

# Create path objects
p = Path("/home/user/documents")
p = Path("relative/path/to/file.txt")
p = Path.home()  # User's home directory
p = Path.cwd()   # Current working directory
```

--------------------------------

### Create directories with UPath.mkdir()

Source: https://context7.com/fsspec/universal_pathlib/llms.txt

Use `mkdir()` to create directories, with options for recursive creation (`parents=True`) and ignoring existing directories (`exist_ok=True`). The example uses an in-memory filesystem and an S3 path.

```python
from upath import UPath

# In-memory filesystem
base = UPath("memory://project")
(base / "data" / "raw").mkdir(parents=True, exist_ok=True)
(base / "data" / "processed").mkdir(parents=True, exist_ok=True)
print(list(base.glob("**")))

# S3: "directories" are virtual prefixes; mkdir is a no-op or creates a pseudo-dir
s3_dir = UPath("s3://my-bucket/experiments/run001")
s3_dir.mkdir(parents=True, exist_ok=True)

# exist_ok=False raises if directory already exists
try:
    (base / "data").mkdir(exist_ok=False)
except FileExistsError as e:
    print(f"Already exists: {e}")
```

--------------------------------

### Compare Filesystem APIs

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/why.md

Demonstrates the disparate APIs and import requirements for handling local, S3, and Azure Blob Storage files using traditional Python libraries.
```python
# Local file via pathlib
from pathlib import Path

local_file = Path("data/results.csv")
with local_file.open('r') as f:
    data = f.read()

# S3 object via boto3
import boto3

s3 = boto3.resource('s3')
obj = s3.Object('my-bucket', 'data/results.csv')
data = obj.get()['Body'].read().decode('utf-8')

# Azure blob via the Azure SDK (conn_str is your own connection string)
from azure.storage.blob import BlobServiceClient

blob_client = BlobServiceClient.from_connection_string(conn_str)
container_client = blob_client.get_container_client('my-container')
blob_client = container_client.get_blob_client('data/results.csv')
data = blob_client.download_blob().readall().decode('utf-8')

# Three different APIs, three different patterns 😫
```

--------------------------------

### Register Custom UPath via Entry Points (pyproject.toml)

Source: https://github.com/fsspec/universal_pathlib/blob/main/README.md

Shows how to register a custom UPath implementation for distribution by defining an entry point in the 'pyproject.toml' file under the 'project.entry-points."universal_pathlib.implementations"' section.

```toml
[project.entry-points."universal_pathlib.implementations"]
myproto = "my_module.submodule:MyPath"
```

--------------------------------

### Convert UPath to URI String

Source: https://context7.com/fsspec/universal_pathlib/llms.txt

Use `as_uri()` to get the string representation of a path as a URI. This method raises a ValueError for relative paths.

```python
from upath import UPath

s3 = UPath("s3://my-bucket/data/file.parquet")
print(s3.as_uri())  # s3://my-bucket/data/file.parquet

local = UPath("/home/user/data.csv")
print(local.as_uri())  # file:///home/user/data.csv

# Relative paths cannot be expressed as URIs
rel = s3.relative_to(UPath("s3://my-bucket/"))
try:
    rel.as_uri()
except ValueError as e:
    print(e)  # relative path can't be expressed as a s3 URI
```

--------------------------------

### UPath.mkdir(mode, parents, exist_ok)

Source: https://context7.com/fsspec/universal_pathlib/llms.txt

Creates directories, supporting recursive creation and ignoring existing ones.
```APIDOC
## UPath.mkdir(mode, parents, exist_ok)

### Description
Creates a directory at the specified path. Can create parent directories and optionally ignore errors if the directory already exists.

### Method
`mkdir(mode=0o777, parents=False, exist_ok=False)`

### Parameters
* **mode** (int) - Optional - The file mode to use for the new directory (default is 0o777).
* **parents** (bool) - Optional - If True, create parent directories as needed (default is False).
* **exist_ok** (bool) - Optional - If True, do not raise an error if the directory already exists (default is False).

### Example
```python
from upath import UPath

# In-memory filesystem
base = UPath("memory://project")
(base / "data" / "raw").mkdir(parents=True, exist_ok=True)
(base / "data" / "processed").mkdir(parents=True, exist_ok=True)
print(list(base.glob("**")))

# S3: "directories" are virtual prefixes; mkdir is a no-op or creates a pseudo-dir
s3_dir = UPath("s3://my-bucket/experiments/run001")
s3_dir.mkdir(parents=True, exist_ok=True)

# exist_ok=False raises if directory already exists
try:
    (base / "data").mkdir(exist_ok=False)
except FileExistsError as e:
    print(f"Already exists: {e}")
```
```

--------------------------------

### UPath.path

Source: https://context7.com/fsspec/universal_pathlib/llms.txt

Get the fsspec-internal path string, which is the path string as expected by the underlying fsspec filesystem (i.e., with the URI scheme and authority stripped).

```APIDOC
## `UPath.path`: fsspec-internal path string

Returns the path string as expected by the underlying fsspec filesystem (i.e. with the URI scheme and authority stripped). Pass this to `p.fs.open(p.path)`, `p.fs.exists(p.path)`, etc. when calling fsspec methods directly.

### Example
```python
from upath import UPath

p = UPath("s3://my-bucket/data/train/features.npy")
print(str(p))  # s3://my-bucket/data/train/features.npy
print(p.path)  # my-bucket/data/train/features.npy

m = UPath("memory:///foo/bar.txt")
print(str(m))  # memory://foo/bar.txt
print(m.path)  # /foo/bar.txt

# Use p.path when calling fsspec directly
with p.fs.open(p.path, "rb") as f:
    header = f.read(8)
```
```

--------------------------------

### Pass Connection Options to UPath

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/usage.md

Provide connection options as keyword arguments or use the 'protocol' keyword argument to specify the filesystem implementation.

```python
# GitHub with explicit parameters
ghpath = UPath('github:/', org='fsspec', repo='universal_pathlib', sha='main')
```

```python
# Using protocol kwarg for S3
s3path = UPath('my-bucket/data/file.txt', protocol='s3')
```

```python
# Using protocol kwarg for Azure with configuration
azpath = UPath(
    'container/blob.parquet',
    protocol='az',
    account_name='myaccount',
    account_key='mykey'
)
```

--------------------------------

### Instantiate UPath with Different Protocols

Source: https://github.com/fsspec/universal_pathlib/blob/main/README.md

Demonstrates instantiating UPath with different protocols like 's3', 'memory', and 'ftp'. Shows how the correct UPath subclass is returned based on the protocol or if a default UPath is used.

```python
from upath import UPath
from upath.implementations.cloud import S3Path
from upath.implementations.memory import MemoryPath

p0 = UPath("s3://bucket/file.txt")
assert p0.protocol == "s3"
assert type(p0) is S3Path
assert isinstance(p0, UPath)

p1 = UPath("/some/path/file.txt", protocol="memory")
assert p1.protocol == "memory"
assert type(p1) is MemoryPath
assert isinstance(p1, UPath)

# the ftp filesystem currently has no custom UPath implementation and is not
# tested in the universal_pathlib test-suite.
# Therefore, the default UPath implementation is returned, and a warning is
# emitted on instantiation.
p2 = UPath("ftp://ftp.ncbi.nih.gov/snp/archive")
assert p2.protocol == "ftp"
assert type(p2) is UPath
```

--------------------------------

### Get file metadata with UPath.stat()

Source: https://context7.com/fsspec/universal_pathlib/llms.txt

Retrieve file metadata such as size, modification time, and mode using `stat()`. This returns a `UPathStatResult` object similar to `os.stat_result`.

```python
from upath import UPath
import datetime

p = UPath("s3://spacenet-dataset/LICENSE.md", anon=True)
st = p.stat()

print(f"Size: {st.st_size} bytes")
print(f"Modified: {datetime.datetime.fromtimestamp(st.st_mtime)}")
print(f"Mode: {oct(st.st_mode)}")

# Access raw fsspec info dict via p.fs.info(p.path)
info = p.fs.info(p.path)
print(info.keys())  # dict_keys(['Key', 'LastModified', 'ETag', 'Size', ...])
```

--------------------------------

### Clone Repository

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/contributing.md

Fork the repository on GitHub first, then clone it locally and navigate into the project directory.

```bash
git clone https://github.com/YOUR-USERNAME/universal_pathlib.git
cd universal_pathlib
```

--------------------------------

### Initialize UPath with GitHub Filesystem (URL)

Source: https://github.com/fsspec/universal_pathlib/blob/main/notebooks/examples.ipynb

Connect to a GitHub repository by embedding connection details directly within the path/URL string. UPath parses these details to establish the connection.

```python
ghpath = UPath('github://fsspec:universal_pathlib@main/')
ghpath
```

--------------------------------

### Get Filesystem Protocol from UPath

Source: https://context7.com/fsspec/universal_pathlib/llms.txt

Retrieves the URI scheme (protocol) from a UPath instance, which identifies the underlying filesystem. This is useful for understanding or conditionally handling different storage types.
```python
from upath import UPath

paths = [
    UPath("s3://bucket/key.txt"),
    UPath("gs://bucket/key.txt"),
    UPath("az://container/blob"),
    UPath("hf://datasets/user/repo/file.parquet"),
    UPath("memory://tmp/x.bin"),
    UPath("github://owner:repo@main/src/"),
    UPath("https://example.com/api/data"),
]

for p in paths:
    print(f"{p.protocol!r:12} → {type(p).__name__}")
# 's3'         → S3Path
# 'gs'         → GCSPath
# 'az'         → AzurePath
# 'hf'         → HfPath
# 'memory'     → MemoryPath
# 'github'     → GitHubPath
# 'https'      → HTTPPath
```

--------------------------------

### Listing Known fsspec Implementations

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/usage.md

Demonstrates how to programmatically retrieve and print a list of all known fsspec filesystem implementations and their corresponding classes.

```python
from fsspec.registry import known_implementations

for name, details in sorted(known_implementations.items()):
    print(f"{name}: {details['class']}")
```

--------------------------------

### Register Custom Filesystem with Default UPath

Source: https://github.com/fsspec/universal_pathlib/blob/main/README.md

Shows how to register a custom fsspec AbstractFileSystem for a new protocol ('myproto') and instantiate it using the default UPath implementation. This is useful when the custom filesystem works with UPath's default behavior.

```python
import fsspec.registry
from fsspec.spec import AbstractFileSystem

class MyProtoFileSystem(AbstractFileSystem):
    protocol = ("myproto",)
    ...

fsspec.registry.register_implementation("myproto", MyProtoFileSystem)

from upath import UPath

p = UPath("myproto:///my/proto/path")
assert type(p) is UPath
assert p.protocol == "myproto"
assert isinstance(p.fs, MyProtoFileSystem)
```

--------------------------------

### Get UPath Class by Protocol

Source: https://context7.com/fsspec/universal_pathlib/llms.txt

Use `get_upath_class` to retrieve the `UPath` subclass for a protocol without instantiation.
It falls back to a generated subclass for known fsspec protocols, emitting a warning.

```python
from upath import UPath
from upath.registry import get_upath_class

# Well-known protocols
print(get_upath_class("s3").__name__)      # S3Path
print(get_upath_class("gs").__name__)      # GCSPath
print(get_upath_class("memory").__name__)  # MemoryPath
print(get_upath_class("github").__name__)  # GitHubPath
print(get_upath_class("").__name__)        # PosixUPath (on Linux/macOS)

# Unknown protocol → returns None
result = get_upath_class("nonexistent", fallback=False)
print(result)  # None

# Fallback: fsspec-registered but no upath impl → auto-generated class + warning
cls = get_upath_class("ftp")  # emits UserWarning
print(issubclass(cls, UPath))  # True
```

--------------------------------

### Get fsspec-internal path string with UPath.path

Source: https://context7.com/fsspec/universal_pathlib/llms.txt

Retrieve the path string as expected by the underlying fsspec filesystem, with the URI scheme and authority stripped. This is useful when calling fsspec methods directly.

```python
from upath import UPath

p = UPath("s3://my-bucket/data/train/features.npy")
print(str(p))  # s3://my-bucket/data/train/features.npy
print(p.path)  # my-bucket/data/train/features.npy

m = UPath("memory:///foo/bar.txt")
print(str(m))  # memory://foo/bar.txt
print(m.path)  # /foo/bar.txt

# Use p.path when calling fsspec directly
with p.fs.open(p.path, "rb") as f:
    header = f.read(8)
```

--------------------------------

### Get File Name, Stem, and Suffix from UPath

Source: https://github.com/fsspec/universal_pathlib/blob/main/notebooks/examples.ipynb

Demonstrates how to extract the file's name, stem (filename without extension), and suffix (file extension) from a UPath object, similar to pathlib.Path.
```python
readme_path.name, readme_path.stem, readme_path.suffix
```

--------------------------------

### Get Full Path String from UPath

Source: https://github.com/fsspec/universal_pathlib/blob/main/notebooks/examples.ipynb

Converts a UPath object representing a file path into its string representation. This is useful when the path needs to be used with other libraries or functions that expect string paths.

```python
str(readme_path)
```

--------------------------------

### Manage files and directories with UPath.touch(), unlink(), rmdir()

Source: https://context7.com/fsspec/universal_pathlib/llms.txt

Create empty files with `touch()`, delete files with `unlink()`, and remove directories with `rmdir()`. `rmdir()` is recursive by default in UPath.

```python
from upath import UPath

root = UPath("memory://workspace/")

# touch: create an empty file
f = root / "log.txt"
f.touch()
print(f.exists())        # True
print(f.stat().st_size)  # 0

# write then unlink
f.write_text("hello")
f.unlink()
print(f.exists())  # False

# missing_ok suppresses FileNotFoundError
f.unlink(missing_ok=True)  # no error

# rmdir: remove directory (recursive by default)
(root / "subdir" / "file.txt").write_text("data")
(root / "subdir").rmdir()  # removes non-empty directory
print((root / "subdir").exists())  # False
```

--------------------------------

### Run Linting and Formatting Checks with Nox

Source: https://github.com/fsspec/universal_pathlib/blob/main/CONTRIBUTING.rst

Execute the linting and code formatting checks using a specific Nox session.

```console
nox -s lint
```

--------------------------------

### fsspec URI-Based Access with urlpaths

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/concepts/fsspec.md

Shows how to open files directly using URIs by combining protocol, storage options, and path. This allows for convenient access to resources without explicitly creating a filesystem object first.
```python
import fsspec

# resource
protocol = "s3"
storage_options = {"anon": True}
path = "bucket/file.txt"

# Create filesystem and open path
fs = fsspec.filesystem("s3", anon=True)
with fs.open("bucket/file.txt", "r") as f:
    content = f.read()

# Or open a file via its urlpath with storage_options
with fsspec.open('s3://bucket/file.txt', 'r', anon=True) as f:
    content = f.read()
```

--------------------------------

### Filesystem-Agnostic Code with upath.types

Source: https://context7.com/fsspec/universal_pathlib/llms.txt

Shows how to use `ReadablePath` and `WritablePath` from `upath.types` to write functions that accept both `pathlib.Path` and `UPath` implementations.

```python
from pathlib import Path
from upath import UPath
from upath.types import ReadablePath, WritablePath

def transform(src: ReadablePath, dst: WritablePath, upper: bool = True) -> int:
    """Copy src → dst with optional uppercasing. Works with any path type."""
    data = src.read_text(encoding="utf-8")
    result = data.upper() if upper else data
    return dst.write_text(result, encoding="utf-8")

# stdlib Path → stdlib Path
transform(Path("/tmp/input.txt"), Path("/tmp/output.txt"))

# UPath (S3) → UPath (GCS)
transform(
    UPath("s3://raw-bucket/data.txt", anon=True),
    UPath("gs://processed-bucket/data.txt"),
)

# HTTP → local
transform(
    UPath("https://example.com/sample.txt"),
    UPath("/tmp/sample_upper.txt"),
)

# HTTP → memory (useful in tests)
result = UPath("memory://out/sample.txt")
transform(UPath("https://example.com/sample.txt"), result)
print(result.read_text())
```

--------------------------------

### WindowsUPath

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/api/implementations.md

Windows-style local filesystem paths.

```APIDOC
## WindowsUPath

### Description
Represents Windows-style paths for local filesystems.
```

--------------------------------

### Convert Local Path to URI and Initialize UPath

Source: https://github.com/fsspec/universal_pathlib/blob/main/notebooks/examples.ipynb

This snippet demonstrates converting a local UPath object to its absolute URI and then initializing a new UPath object from this URI. It also shows how to inspect the underlying fsspec filesystem implementation.

```python
# `local_path`, `tmp`, and the `UPath` import are assumed to come from
# earlier notebook cells; they are not defined in this snippet.
local_uri = local_path.absolute().as_uri()
print(f"{local_uri=}")

local_upath = UPath(local_uri)
print(f"{local_upath=}")
print(f"{type(local_upath)=}")
assert isinstance(local_upath, UPath)
print(f"{type(local_upath.fs)=}")

tmp.close()
```

--------------------------------

### Accessing Underlying fsspec Filesystem

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/usage.md

Demonstrates how to retrieve the fsspec filesystem object, path string, and storage options from a UPath instance for lower-level control.

```python
from upath import UPath

path = UPath("s3://my-bucket/file.txt")

# Access the filesystem object
fs = path.fs

# Get the path string for use with fsspec
path_str = path.path

# Get storage options
options = path.storage_options

# Use fsspec directly if needed
with fs.open(path_str, 'rb') as f:
    data = f.read()
```

--------------------------------

### Storage Options

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/api/types.md

Typed dictionaries providing type hints for filesystem-specific configuration options, ensuring correct parameter names and types.

```APIDOC
## SimpleCacheStorageOptions

### Description
Typed dictionary for SimpleCache filesystem storage options.

### Fields
- `cache_dir` (str) - Required - Path to the cache directory.
```

```APIDOC
## GCSStorageOptions

### Description
Typed dictionary for Google Cloud Storage (GCS) filesystem storage options.

### Fields
- `token` (str, optional) - Authentication token.
- `project` (str, optional) - GCS project ID.
- `client_options` (dict, optional) - Client options for GCS connection.
```

```APIDOC
## S3StorageOptions

### Description
Typed dictionary for Amazon S3 filesystem storage options.

### Fields
- `key` (str, optional) - S3 access key.
- `secret` (str, optional) - S3 secret key.
- `client_kwargs` (dict, optional) - Keyword arguments for S3 client.
```

```APIDOC
## AzureStorageOptions

### Description
Typed dictionary for Azure Blob Storage filesystem storage options.

### Fields
- `account_name` (str) - Required - Azure storage account name.
- `account_key` (str, optional) - Azure storage account key.
- `sas_token` (str, optional) - Azure SAS token.
```

```APIDOC
## DataStorageOptions

### Description
Typed dictionary for Data URI scheme storage options.

### Fields
- `encoding` (str, optional) - Encoding of the data.
- `mime_type` (str, optional) - MIME type of the data.
```

```APIDOC
## FTPStorageOptions

### Description
Typed dictionary for FTP filesystem storage options.

### Fields
- `user` (str, optional) - FTP username.
- `password` (str, optional) - FTP password.
- `host` (str) - Required - FTP host.
- `port` (int, optional) - FTP port.
```

```APIDOC
## GitHubStorageOptions

### Description
Typed dictionary for GitHub filesystem storage options.

### Fields
- `token` (str) - Required - GitHub personal access token.
- `repo` (str) - Required - Repository name (e.g., 'owner/repo').
- `branch` (str, optional) - Branch name.
```

```APIDOC
## HDFSStorageOptions

### Description
Typed dictionary for HDFS filesystem storage options.

### Fields
- `user` (str, optional) - HDFS username.
- `kerb_ticket` (str, optional) - Path to Kerberos ticket cache.
- `host` (str, optional) - HDFS namenode host.
- `port` (int, optional) - HDFS namenode port.
```

```APIDOC
## HTTPStorageOptions

### Description
Typed dictionary for HTTP/HTTPS filesystem storage options.

### Fields
- `headers` (dict, optional) - Custom HTTP headers.
- `auth` (tuple, optional) - Authentication tuple (username, password).
```

```APIDOC
## FileStorageOptions

### Description
Typed dictionary for local file system storage options.

### Fields
- `auto_mkdir` (bool, optional) - Whether to automatically create parent directories.
```

```APIDOC
## MemoryStorageOptions

### Description
Typed dictionary for in-memory filesystem storage options.

### Fields
- `base_dir` (str, optional) - Base directory for the in-memory filesystem.
```

```APIDOC
## SFTPStorageOptions

### Description
Typed dictionary for SFTP filesystem storage options.

### Fields
- `username` (str, optional) - SFTP username.
- `password` (str, optional) - SFTP password.
- `private_key` (str, optional) - Path to private key file.
- `host` (str) - Required - SFTP host.
- `port` (int, optional) - SFTP port.
```

```APIDOC
## SMBStorageOptions

### Description
Typed dictionary for SMB/CIFS filesystem storage options.

### Fields
- `username` (str, optional) - SMB username.
- `password` (str, optional) - SMB password.
- `domain` (str, optional) - SMB domain.
- `host` (str) - Required - SMB host.
```

```APIDOC
## WebdavStorageOptions

### Description
Typed dictionary for WebDAV filesystem storage options.

### Fields
- `username` (str, optional) - WebDAV username.
- `password` (str, optional) - WebDAV password.
- `auth_type` (str, optional) - WebDAV authentication type (e.g., 'basic', 'digest').
```

```APIDOC
## ZipStorageOptions

### Description
Typed dictionary for Zip archive filesystem storage options.

### Fields
- `compression` (str, optional) - Compression method (e.g., 'zip_deflated').
```

```APIDOC
## TarStorageOptions

### Description
Typed dictionary for Tar archive filesystem storage options.

### Fields
- `compression` (str, optional) - Compression method (e.g., 'gz', 'bz2').
```

--------------------------------

### Custom FSSpecAccessor __init__ Method (v0.2.0+)

Source: https://github.com/fsspec/universal_pathlib/blob/main/README.md

To customize how the fsspec filesystem instance is created in v0.2.0+, override the `_parse_storage_options` and `_fs_factory` class methods in your UPath subclass.

```python
# OLD: v0.1.x
import fsspec
from upath.core import UPath, _FSSpecAccessor
from typing import Any, Mapping
from urllib.parse import SplitResult

class MyAccessor(_FSSpecAccessor):
    def __init__(self, parsed_url: SplitResult | None, **kwargs: Any) -> None:
        # custom code
        protocol = ...
        storage_options = ...
        # storage_options is a mapping, so unpack it into keyword arguments
        self._fs = fsspec.filesystem(protocol, **storage_options)

class MyPath(UPath):
    _default_accessor = MyAccessor


# NEW: v0.2.0+
import fsspec
from upath import UPath
from typing import Any, Mapping
from fsspec.spec import AbstractFileSystem

class MyPath(UPath):
    @classmethod
    def _parse_storage_options(
        cls, urlpath: str, protocol: str, storage_options: Mapping[str, Any]
    ) -> dict[str, Any]:
        # custom code to change storage_options
        storage_options = ...
        return storage_options

    @classmethod
    def _fs_factory(
        cls, urlpath: str, protocol: str, storage_options: Mapping[str, Any]
    ) -> AbstractFileSystem:
        # custom code to instantiate fsspec filesystem
        protocol = ...
        storage_options = ...  # note changes to storage_options here won't
                               # show up in MyPath().storage_options
        return fsspec.filesystem(protocol, **storage_options)
```

--------------------------------

### UPath Constructor

Source: https://context7.com/fsspec/universal_pathlib/llms.txt

The primary entry point for creating universal path objects. It automatically dispatches to the appropriate filesystem-specific implementation based on the URI scheme.

```APIDOC
## UPath(*args, protocol=None, **storage_options)

### Description
The primary entry point.
Instantiate with a URI string and optional storage options; `UPath.__new__` reads the scheme, consults the implementation registry, and returns the correct subclass instance. Pass `protocol=` explicitly to override scheme detection or to attach a scheme to a plain path string.

### Parameters
#### Path Parameters
- **args** (str or PathLike) - Required - The path string or existing path object.
- **protocol** (str, optional) - The fsspec protocol to use. Overrides scheme detection.
- ****storage_options** (dict, optional) - Keyword arguments for the fsspec filesystem constructor.

### Request Example
```python
from upath import UPath

# Local filesystem
local = UPath("/tmp/data.csv")

# Local via file:// URI
file_path = UPath("file:///tmp/data.csv")

# Amazon S3
s3 = UPath("s3://my-bucket/datasets/train.parquet", anon=True)

# Google Cloud Storage
gcs = UPath("gs://my-bucket/models/checkpoint.pt")

# Azure Blob Storage
az = UPath("az://container/blob.json", account_name="myaccount")

# HTTP/HTTPS (read-only)
http = UPath("https://example.com/data/file.json")

# GitHub repository (read-only)
gh = UPath("github://fsspec:universal_pathlib@main/")

# In-memory
mem = UPath("memory://scratch/temp.txt")

# Explicit protocol kwarg
p = UPath("/some/path/file.txt", protocol="memory")
```

### Response
#### Success Response (Instance of appropriate UPath subclass)
- **UPath instance** - An object representing the path, with methods and properties corresponding to the underlying filesystem.
```

--------------------------------

### Familiar pathlib Operations with UPath

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/why.md

Illustrates how Universal Pathlib mirrors the familiar `pathlib.Path` interface for common operations like accessing file attributes, joining paths, and performing file/directory operations.
```python
from upath import UPath

# All the familiar pathlib operations
path = UPath("s3://bucket/data/file.txt")

print(path.name)    # "file.txt"
print(path.stem)    # "file"
print(path.suffix)  # ".txt"
print(path.parent)  # UPath("s3://bucket/data")

# Path joining
output = path.parent / "processed" / "output.csv"

# File operations
path.write_text("Hello!")
content = path.read_text()

# Directory operations
for item in path.parent.iterdir():
    print(item)
```

--------------------------------

### Run All Project Tests with Nox

Source: https://github.com/fsspec/universal_pathlib/blob/main/CONTRIBUTING.rst

Execute the complete test suite for the project using the Nox automation tool.

```console
nox
```

--------------------------------

### Adding Custom Methods with ProxyUPath

Source: https://github.com/fsspec/universal_pathlib/blob/main/docs/usage.md

Shows how to extend UPath with domain-specific methods like `download` and `get_metadata` by subclassing ProxyUPath, which delegates core functionality to the wrapped UPath instance.

```python
from upath import UPath
from upath.extensions import ProxyUPath

class MyCustomPath(ProxyUPath):
    """A path with extra convenience methods."""

    def download(self, local_path):
        """Download this remote file to a local path."""
        local = UPath(local_path)
        local.write_bytes(self.read_bytes())
        return local

    def get_metadata(self):
        """Get custom metadata for this file."""
        stat = self.stat()
        return {
            'size': stat.st_size,
            'modified': stat.st_mtime,
            'name': self.name,
        }

# Use it like a regular UPath
path = MyCustomPath("s3://my-bucket/data.csv", anon=True)

# Access standard UPath methods
print(path.exists())
print(path.name)

# Use your custom methods
metadata = path.get_metadata()
path.download("/tmp/data.csv")
```

--------------------------------

### UPath.touch() / UPath.unlink() / UPath.rmdir()

Source: https://context7.com/fsspec/universal_pathlib/llms.txt

Operations for creating empty files, deleting files, and removing directories.
```APIDOC
## UPath.touch() / UPath.unlink() / UPath.rmdir()

### Description
Provides methods to create empty files (`touch`), delete files (`unlink`), and remove directories (`rmdir`). `rmdir` is recursive by default for `UPath`.

### Methods
* `touch(mode=0o666, exist_ok=True)`: Creates an empty file or updates its modification time.
* `unlink(missing_ok=False)`: Deletes a file. If `missing_ok` is True, does not raise an error if the file does not exist.
* `rmdir(recursive=True)`: Removes a directory. By default, it removes non-empty directories recursively.

### Parameters
* **touch**: `mode` (int, optional), `exist_ok` (bool, optional, default True).
* **unlink**: `missing_ok` (bool, optional, default False).
* **rmdir**: `recursive` (bool, optional, default True).

### Example
```python
from upath import UPath

root = UPath("memory://workspace/")

# touch: create an empty file
f = root / "log.txt"
f.touch()
print(f.exists())        # True
print(f.stat().st_size)  # 0

# write then unlink
f.write_text("hello")
f.unlink()
print(f.exists())  # False

# missing_ok suppresses FileNotFoundError
f.unlink(missing_ok=True)  # no error

# rmdir: remove directory (recursive by default)
(root / "subdir" / "file.txt").write_text("data")
(root / "subdir").rmdir()  # removes non-empty directory
print((root / "subdir").exists())  # False
```
```
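
Because a recursive `rmdir()` departs from the standard library, a short comparison may help when porting code. The sketch below uses only `pathlib` and `tempfile` (no UPath involved) and shows that `pathlib.Path.rmdir()` raises `OSError` for a non-empty directory. The helper name `stdlib_rmdir_refuses_nonempty` is hypothetical and exists only for this illustration.

```python
import tempfile
from pathlib import Path


def stdlib_rmdir_refuses_nonempty() -> tuple[bool, bool]:
    """Return (raised, still_exists) after Path.rmdir() on a non-empty directory."""
    with tempfile.TemporaryDirectory() as tmp:
        subdir = Path(tmp) / "subdir"
        subdir.mkdir()
        (subdir / "file.txt").write_text("data")

        raised = False
        try:
            # The stdlib rmdir() only removes empty directories.
            subdir.rmdir()
        except OSError:
            raised = True

        # Check existence before the temporary directory is cleaned up.
        return raised, subdir.exists()


raised, still_exists = stdlib_rmdir_refuses_nonempty()
print(raised, still_exists)  # True True
```

Code written against `pathlib.Path` may rely on this error as a safety net against deleting populated directories, so the recursive default described above is worth keeping in mind when switching such code to `UPath`.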