### Install Frictionless with Visidata Support

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/console/explore.md

Install the Frictionless library with the dependencies needed for Visidata integration. Add the `zenodo` extra to access the examples from the tutorial.

```bash
pip install frictionless[visidata]
```

```bash
pip install frictionless[visidata,zenodo]
```

--------------------------------

### Install Frictionless with SQL support

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/console/query.md

Install the Frictionless package with SQL support. For the examples in this tutorial, also install Zenodo support.

```bash
pip install frictionless[sql]
```

```bash
pip install frictionless[sql,zenodo] # for examples in this tutorial
```

--------------------------------

### Install Frictionless Framework

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/getting-started.md

Install the Frictionless framework using pip. Optionally, install a core plugin such as SQL. In zsh, quote the requirement so the square brackets are not treated as a glob pattern.

```bash
pip install frictionless
```

```bash
pip install frictionless[sql] # to install a core plugin (optional)
```

```bash
pip install 'frictionless[sql]' # for zsh shell
```

--------------------------------

### Start Livemark Documentation Server

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/CONTRIBUTING.md

Start the local development server for the Frictionless documentation portal, which is built with Livemark.

```bash
livemark start
```

--------------------------------

### Install Frictionless with AWS support

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/schemes/aws.md

Install the Frictionless library with AWS integration. Use the second command in the zsh shell.
```bash
pip install frictionless[aws]
```

```bash
pip install 'frictionless[aws]'
```

--------------------------------

### Install Frictionless with CKAN Support

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/portals/ckan.md

Install the Frictionless library with the CKAN extra for full portal integration. The `--pre` flag allows pip to install pre-release versions.

```bash
pip install frictionless[ckan] --pre
```

```bash
pip install 'frictionless[ckan]' --pre # for zsh shell
```

--------------------------------

### Install Frictionless with Parquet Support

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/parquet.md

Install the Frictionless library with the dependencies needed for Parquet support. Use the second command in zsh.

```bash
pip install frictionless[parquet]
```

```bash
pip install 'frictionless[parquet]' # for zsh shell
```

--------------------------------

### Install Frictionless with SQL support

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/sql.md

Install the Frictionless framework with SQL support using pip. The `sql` extra installs the general SQL dependencies; the `postgresql`, `mysql`, and `duckdb` extras add support for the corresponding databases.

```bash
pip install frictionless[sql]
```

```bash
pip install frictionless[postgresql]
```

```bash
pip install frictionless[mysql]
```

```bash
pip install frictionless[duckdb]
```

--------------------------------

### Install Hatch

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/CONTRIBUTING.md

Install Hatch, a Python project management tool used to manage development environments and run commands.

```bash
pip3 install hatch
```

--------------------------------

### Install Frictionless with SQL support

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/console/index.md

Install the Frictionless library with SQL support using pip. This is a prerequisite for the database indexing features.
```bash
pip install frictionless[sql]
```

--------------------------------

### Get Frictionless CLI Help

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/getting-started.md

Display help for the Frictionless CLI, or for a specific command, using the `--help` flag.

```bash
frictionless --help
```

```bash
frictionless describe --help
```

```bash
frictionless extract --help
```

```bash
frictionless validate --help
```

```bash
frictionless transform --help
```

--------------------------------

### Install Frictionless with Excel Support

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/excel.md

Install the Frictionless library with the dependencies needed for Excel support. Use the second command in zsh.

```bash
pip install frictionless[excel]
```

```bash
pip install 'frictionless[excel]'
```

--------------------------------

### Install frictionless

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/README.md

Install the frictionless package using pip.

```bash
pip install frictionless
```

--------------------------------

### Install Frictionless with HTML support

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/html.md

Install the Frictionless library with the HTML extra to enable HTML parsing. Use the second command in zsh.

```bash
pip install frictionless[html]
```

```bash
pip install 'frictionless[html]'
```

--------------------------------

### Get Visidata Help in Console

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/console/explore.md

Display the Visidata help message from the command line to see its available options and commands.
```bash
vd --help
```

--------------------------------

### MultipartLoader Read Line Stream Example

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/advanced/system.md

An excerpt from the MultipartLoader showing how it iterates over the lines of multiple file paths using `system.create_loader`. Unless the data is headless, the header line of every file after the first is skipped, so the combined stream has a single header.

```python
def read_line_stream(self):
    for number, path in enumerate(self.__path, start=1):
        resource = Resource(path=path)
        resource.infer(sample=False)
        with system.create_loader(resource) as loader:
            for line_number, line in enumerate(loader.byte_stream, start=1):
                # Skip the header line of every file after the first (unless headless)
                if not self.__headless and number > 1 and line_number == 1:
                    continue
                yield line
```

--------------------------------

### Frictionless CLI Help Commands

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/console/overview.md

Shows how to get help for specific Frictionless CLI commands.

```bash
frictionless describe --help # to get help for describe
frictionless extract --help # to get help for extract
frictionless validate --help # to get help for validate
frictionless transform --help # to get help for transform
```

--------------------------------

### Fill and Replace Cells Example

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/steps/cell.md

This example combines cell replacement and cell filling: cells equal to `france` are first replaced with an empty value, and the empty cells in the `name` field are then filled with `FRANCE`.
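As a rough model of what these two steps do to the rows, here is a plain-Python sketch. It is not how Frictionless implements the steps: it assumes that a plain-string `pattern` with no `field_name` replaces exact matches in every field, and that `cell_fill` with a `value` fills `None` cells in the named field. The helper functions and sample rows are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def cell_replace(rows: list[Row], pattern: str, replace: Any) -> list[Row]:
    """Replace cells equal to `pattern` in every field (hypothetical helper)."""
    return [
        {key: (replace if value == pattern else value) for key, value in row.items()}
        for row in rows
    ]


def cell_fill(rows: list[Row], field_name: str, value: Any) -> list[Row]:
    """Fill empty (None) cells in one field with a constant (hypothetical helper)."""
    return [
        {**row, field_name: value if row[field_name] is None else row[field_name]}
        for row in rows
    ]


rows = [
    {"id": 1, "name": "germany"},
    {"id": 2, "name": "france"},
    {"id": 3, "name": "spain"},
]
# Replace first, then fill: "france" -> None -> "FRANCE"
result = cell_fill(cell_replace(rows, pattern="france", replace=None), field_name="name", value="FRANCE")
print(result)
```

Each helper builds new dictionaries, so the input rows are left untouched. The Frictionless example below applies the real steps to `transform.csv`.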
```python
from frictionless import Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.cell_replace(pattern="france", replace=None),
        steps.cell_fill(field_name="name", value="FRANCE"),
    ],
)
print(target.schema)
print(target.to_view())
```

--------------------------------

### Install Frictionless with GitHub Dependencies

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/portals/github.md

Install the frictionless library with the GitHub extra. Use the second command in zsh.

```bash
pip install frictionless[github] --pre
```

```bash
pip install 'frictionless[github]' --pre # for zsh shell
```

--------------------------------

### Install Frictionless with ODS support

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/ods.md

Install the Frictionless library with the dependencies needed for ODS support. Use the second command in zsh.

```bash
pip install frictionless[ods]
```

```bash
pip install 'frictionless[ods]'
```

--------------------------------

### Install Frictionless with Pandas Support

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/pandas.md

Install the Frictionless library with the pandas extra to enable dataframe support. Use the second command in zsh.

```bash
pip install frictionless[pandas]
```

```bash
pip install 'frictionless[pandas]'
```

--------------------------------

### Create a Checklist

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/checklist.md

Instantiate a Checklist with a list of checks. This example adds a row constraint check.
```python
from frictionless import Checklist, checks

checklist = Checklist(checks=[checks.row_constraint(formula='id > 1')])
print(checklist)
```

--------------------------------

### Install Frictionless with Zenodo Dependencies

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/portals/zenodo.md

Install the frictionless package with the Zenodo extra. Use the second command in zsh; the `--pre` flag allows pre-release versions.

```bash
pip install frictionless[zenodo] --pre
```

```bash
pip install 'frictionless[zenodo]' --pre # for zsh shell
```

--------------------------------

### Install Python Development Headers

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/CONTRIBUTING.md

Install the Python development headers required for building Python extensions. This is a prerequisite for setting up the development environment; the command below targets Debian/Ubuntu systems with Python 3.10.

```bash
sudo apt-get install libpython3.10-dev
```

--------------------------------

### Install Frictionless with JSON support

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/json.md

Install the Frictionless library with the `json` extra to enable JSON format support. The quoted form of the command also works in zsh.

```bash
pip install frictionless[json]
```

```bash
pip install 'frictionless[json]'
```

--------------------------------

### Install Frictionless with Gsheets Support

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/gsheets.md

Install the Frictionless package with the gsheets extra for CLI usage. Use the second command in zsh.

```bash
pip install frictionless[gsheets]
```

```bash
pip install 'frictionless[gsheets]'
```

--------------------------------

### Install Frictionless with SPSS support

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/spss.md

Install the Frictionless library with the SPSS extra to enable SPSS file support. Use the second command in zsh.
```bash
pip install frictionless[spss]
```

```bash
pip install 'frictionless[spss]'
```

--------------------------------

### Transforming a Package with Resource Operations

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/transforming-data.md

Demonstrates transforming a package by adding a resource, transforming an existing resource, and then removing the added resource. The example is artificial, but it shows how flexible package transformation is.

```Python
from frictionless import Package, Pipeline, Resource, steps

# Define source package
source = Package(resources=[Resource(name='main', path="transform.csv")])

# Create a pipeline
pipeline = Pipeline(steps=[
    steps.resource_add(name="extra", descriptor={"data": [['id', 'cars'], [1, 166], [2, 132], [3, 94]]}),
    steps.resource_transform(
        name="main",
        steps=[
            steps.table_normalize(),
            steps.table_join(resource="extra", field_name="id"),
        ],
    ),
    steps.resource_remove(name="extra"),
])

# Apply transform steps
target = source.transform(pipeline)

# Print resulting resources, schema and data
print(target.resource_names)
print(target.get_resource("main").schema)
print(target.get_resource("main").to_view())
```

--------------------------------

### Manage Resources in a Frictionless Package

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/package.md

Provides examples of managing resources within a Package: accessing all resources and their names, adding a new resource, retrieving a specific resource, checking whether it exists, and removing it.
```Python
from frictionless import Package, Resource

package = Package('datapackage.json')
print(package.resources)
print(package.resource_names)
package.add_resource(Resource(name='new', data=[['key1', 'key2'], ['val1', 'val2']]))
resource = package.get_resource('new')
print(package.has_resource('new'))
package.remove_resource('new')
```

--------------------------------

### Get Resource Format (Python)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/resource.md

Demonstrates how to access the format of a resource, which is usually derived from the file extension or set explicitly. Frictionless uses the format to select the appropriate parser.

```python
from frictionless import Resource

with Resource(b'header1,header2\nvalue1,value2', format='csv') as resource:
    print(resource.format)
    print(resource.to_view())
```

--------------------------------

### Create Catalog from GitHub Search

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/blog/2022/09-07-github-integration.md

Create a `Catalog` by searching GitHub repositories with queries and qualifiers. This example searches for repositories owned by `fdtester`, with pagination controls.

```Python
from frictionless import Catalog, portals

catalog = Catalog(
    control=portals.GithubControl(search="user:fdtester", per_page=1, page=1),
)
```

--------------------------------

### List Resources (Normal Mode)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/console/list.md

Use this command to get a formatted list of the resources in the specified data files. It works with the available metadata only.

```bash
frictionless list tables/*.csv
```

--------------------------------

### Infer Catalog from Zenodo with Search Query

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/portals/zenodo.md

Read a catalog from Zenodo, filtering records with a search query. Requires Frictionless with the `zenodo` extra installed.
```python
from frictionless import portals, Catalog

control = portals.ZenodoControl(search='notes:"TDWD"')
catalog = Catalog(control=control)
catalog.infer()
print("Total packages", len(catalog.packages))
print(catalog.packages)
```

--------------------------------

### Create a Pipeline

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/pipeline.md

Use the Pipeline class to define a sequence of transformation steps. This example normalizes a table and then melts it on the `name` field.

```Python
from frictionless import Pipeline, steps

pipeline = Pipeline(steps=[steps.table_normalize(), steps.table_melt(field_name='name')])
print(pipeline)
```

--------------------------------

### Enter Hatch Shell

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/CONTRIBUTING.md

Activate the Hatch shell environment, which makes the development dependencies available inside the project's virtual environment.

```bash
hatch shell
```

--------------------------------

### Describe Data Source - Python

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/actions.md

Use the `describe` function to infer metadata from a data source. Ensure the `frictionless` library is installed.

```python
from frictionless import describe

resource = describe('table.csv')
print(resource)
```

--------------------------------

### Search and Create Catalog with Pagination

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/portals/github.md

Use the `per_page` and `page` parameters of `GithubControl` to paginate search results when creating a catalog. This example fetches the first page of results, with one repository per page.
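Page-based pagination like this maps `page` (starting at 1) and `per_page` onto a slice of the full result list. The helper below is a hypothetical stdlib sketch of that arithmetic, not part of `GithubControl`.

```python
def page_slice(items: list[str], per_page: int, page: int) -> list[str]:
    """Return the items shown on a 1-indexed page (hypothetical helper)."""
    if per_page < 1 or page < 1:
        raise ValueError("per_page and page must be positive")
    start = (page - 1) * per_page  # page 1 starts at offset 0
    return items[start:start + per_page]


repos = ["repo-a", "repo-b", "repo-c"]
print(page_slice(repos, per_page=1, page=1))  # ['repo-a']
print(page_slice(repos, per_page=2, page=2))  # ['repo-c']
```

With `per_page=1, page=1`, as in the Frictionless example below, only the first matching repository is requested.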
```Python
from frictionless import portals, Catalog

apikey = "YOUR_GITHUB_TOKEN"  # placeholder: replace with a real GitHub API token
control = portals.GithubControl(apikey=apikey, search="user:fdtester sort:updated-desc 'TestAction: Read' in:readme", per_page=1, page=1)
catalog = Catalog(control=control)
```

--------------------------------

### Publish a Frictionless Package to a SQL database

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/sql.md

Publish a Frictionless Package to a SQL database. This example assumes a `datapackage.json` file and a valid database connection URL.

```python
from frictionless import Package

package = Package('path/to/datapackage.json')
package.publish('postgresql://database')
```

--------------------------------

### Get Resource Scheme (Python)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/resource.md

Demonstrates how to access the scheme (protocol) of a resource. The scheme determines which loader Frictionless uses to read or write the data.

```python
from frictionless import Resource

with Resource(b'header1,header2\nvalue1,value2', format='csv') as resource:
    print(resource.scheme)
    print(resource.to_view())
```

--------------------------------

### Run a Pipeline

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/pipeline.md

Execute a pipeline on a data source with the `transform` function. This example applies the pipeline to `table.csv` and prints the resulting schema and rows.
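`steps.table_melt(field_name='name')` reshapes a wide table into a long one. The sketch below models that reshaping in plain Python, assuming the petl-style output layout of the key column plus `variable` and `value` columns; the `melt` function and the sample rows are hypothetical, and the real step operates on a Frictionless resource.

```python
from typing import Any


def melt(rows: list[dict[str, Any]], key: str) -> list[dict[str, Any]]:
    """Turn every non-key column into its own (variable, value) row (hypothetical helper)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column != key:
                long_rows.append({key: row[key], "variable": column, "value": value})
    return long_rows


wide = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
]
for row in melt(wide, key="name"):
    print(row)
```

Two input rows with two non-key columns each yield four output rows, which is why melted tables grow long while keeping only three columns.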
```Python
from frictionless import Pipeline, transform, steps

pipeline = Pipeline(steps=[steps.table_normalize(), steps.table_melt(field_name='name')])
resource = transform('table.csv', pipeline=pipeline)
print(resource.schema)
print(resource.read_rows())
```

--------------------------------

### Frictionless Extract Function Usage (Python)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/extracting-data.md

Shows how to use the `extract` function in Python to get data as rows, a resource, or a package.

```python
from frictionless import extract

rows = extract('capital-3.csv')
resource = extract('capital-3.csv', type="resource")
package = extract('capital-3.csv', type="package")
```

--------------------------------

### Publish Data with GitHub Control Configuration

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/portals/github.md

Configure publishing with the parameters of `GithubControl`. This example sets the repository name, display name, email, and API key used for publishing.

```Python
from frictionless import portals, Package

apikey = "YOUR_GITHUB_TOKEN"  # placeholder: replace with a real GitHub API token
package = Package('datapackage.json')
control = portals.GithubControl(repo="test-repo", name='FD Test', email="test@gmail", apikey=apikey)
response = package.publish(control=control)
print(response)
```

--------------------------------

### Specifying a Comment Character

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/dialect.md

Shows how to define a custom character for commenting out rows with `comment_char` in the Dialect. Rows starting with this character are ignored. Ensure the `frictionless` library is installed.
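A minimal stdlib model of the `comment_char` option: lines that start with the comment character are dropped before the remaining text is parsed as CSV. This only illustrates the behavior described above and is not Frictionless's implementation; `read_rows` here is a hypothetical helper unrelated to `Resource.read_rows`.

```python
import csv


def read_rows(text: str, comment_char: str = "#") -> list[dict[str, str]]:
    """Parse CSV text, skipping lines that start with comment_char (illustrative sketch)."""
    lines = [line for line in text.splitlines() if not line.startswith(comment_char)]
    # DictReader uses the first remaining line as the header
    return list(csv.DictReader(lines))


print(read_rows("name\n#row1\nrow2"))  # [{'name': 'row2'}]
```

The input mirrors the byte string used in the Frictionless example below.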
```python
from frictionless import Resource, Dialect

dialect = Dialect(comment_char="#")
with Resource(b'name\n#row1\nrow2', format="csv", dialect=dialect) as resource:
    print(resource.read_rows())
```

--------------------------------

### Describe and Initialize Resource Metadata

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/resource.md

Shows how to create a Resource with metadata properties like name, title, and description, and how to access and modify them.

```Python
from frictionless import Resource

resource = Resource(
    name='resource',
    title='My Resource',
    description='My Resource for the Guide',
    path='table.csv',
    # it's possible to provide all the official properties like mediatype, etc
)
print(resource)
```

```Python
from frictionless import Resource

resource = Resource('resource.json')
print(resource.name)
# and others
```

```Python
from frictionless import Resource

resource = Resource('resource.json')
resource.name = 'new-name'
resource.title = 'New Title'
resource.description = 'New Description'
# and others
print(resource)
```

--------------------------------

### Access and Read Rows from a GitHub Resource

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/portals/github.md

After reading a package from GitHub, access specific resources by name and read their rows. This example accesses `first-resource`.

```python
from frictionless import Package

package = Package("https://github.com/fdtester/test-repo-with-datapackage-json")
print(package.get_resource('first-resource').read_rows())
```

--------------------------------

### Describe Files in Normal Mode

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/console/describe.md

Use this command to get visually formatted metadata for the CSV files in the `tables` directory. This is the default output mode.
```bash
frictionless describe tables/*.csv
```

--------------------------------

### Explore a Remote Dataset with Frictionless

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/console/explore.md

Use the `frictionless explore` command to open a dataset hosted on Zenodo in Visidata for interactive analysis in the console.

```bash
frictionless explore https://zenodo.org/record/3977957
```

--------------------------------

### Write Dataset to Parquet

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/parquet.md

Write a dataset from a CSV file to a Parquet file. This example creates a Resource from a CSV file, writes it to a Parquet file, and then reads the data back.

```python
from frictionless import Resource

resource = Resource('table.csv')
target = resource.write('table-output.parq')
print(target)
print(target.read_rows())
```

--------------------------------

### Creating a Dialect with Header Disabled

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/dialect.md

Demonstrates creating a Resource with a Dialect where the header is explicitly set to False, so the first row is treated as data. Ensure the `frictionless` library is installed.

```python
from frictionless import Resource, Dialect

dialect = Dialect(header=False)
with Resource('capital-3.csv', dialect=dialect) as resource:
    print(resource.header.labels)
    print(resource.to_view())
```

--------------------------------

### Manage Schema Fields

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/schema.md

Provides examples of managing fields within a Schema object, including accessing all fields and field names, adding a new field, retrieving a specific field, checking whether a field exists, and removing a field.
```python
from frictionless import Schema, fields

schema = Schema.from_descriptor('schema.json')
print(schema.fields)
print(schema.field_names)
schema.add_field(fields.StringField(name='new-name'))
field = schema.get_field('new-name')
print(schema.has_field('new-name'))
schema.remove_field('new-name')
```

--------------------------------

### Add Stats to Resource Descriptor (Python)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/validating-data.md

Extend a resource descriptor with statistics such as the MD5 hash and byte count. The values in this example are deliberately wrong, for demonstration.

```python
from frictionless import describe

resource = describe('capital-invalid.csv')
resource.add_defined('stats') # TODO: fix and remove this line
resource.stats.md5 = 'ae23c74693ca2d3f0e38b9ba3570775b' # made-up, intentionally incorrect hash
resource.stats.bytes = 100 # intentionally wrong byte count
resource.to_yaml('capital.resource-bad.yaml')
```

--------------------------------

### Configure Package Reading with GitHub Control

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/portals/github.md

Control how package data is read by passing a `GithubControl` to the Package, for example to restrict the formats that are read. This example reads only CSV files from a GitHub repository that has no `datapackage.json`.

```Python
from frictionless import portals, Package

control = portals.GithubControl(user="fdtester", formats=["csv"], repo="test-repo-without-datapackage")
package = Package("https://github.com/fdtester/test-repo-without-datapackage", control=control)
print(package)
```

--------------------------------

### Create and Infer File Resource

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/resources/file.md

Demonstrates how to create a FileResource and infer its metadata, including statistics. Ensure that `text.txt` exists relative to the basepath (the current working directory unless a basepath is set).
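With `stats=True`, inference adds file statistics such as hashes and the byte count. The sketch below computes comparable values with `hashlib`, reading the file in chunks so large files do not need to fit in memory; `file_stats` is a hypothetical helper, and the exact set of statistics Frictionless reports may differ by version.

```python
import hashlib
import os
import tempfile


def file_stats(path: str, chunk_size: int = 65536) -> dict[str, object]:
    """Compute md5, sha256 and byte count for a file (hypothetical helper)."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    size = 0
    with open(path, "rb") as file:
        # Read fixed-size chunks until read() returns b""
        for chunk in iter(lambda: file.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
            size += len(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest(), "bytes": size}


# Create a small sample file so the sketch is self-contained
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"text\n")
try:
    stats = file_stats(tmp.name)
finally:
    os.remove(tmp.name)
print(stats)
```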
```python
from frictionless.resources import FileResource

resource = FileResource(path='text.txt')
resource.infer(stats=True)
print(resource)
```

--------------------------------

### Configuring Header Rows in Dialect

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/dialect.md

Shows how to treat multiple rows as the header with the `header_rows` parameter in the Dialect. This is useful for multi-line headers. Ensure the `frictionless` library is installed.

```python
from frictionless import Resource, Dialect

dialect = Dialect(header_rows=[1, 2, 3])
with Resource('capital-3.csv', dialect=dialect) as resource:
    print(resource.header)
    print(resource.to_view())
```

--------------------------------

### Creating a Declarative Pipeline from a Descriptor

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/transforming-data.md

Constructs a pipeline from a JSON-style descriptor, mirroring the resource transformation example. Declarative pipelines can be saved as JSON files for sharing and for use from the CLI.

```Python
from frictionless import Pipeline

pipeline = Pipeline.from_descriptor({
    "steps": [
        {"type": "table-normalize"},
        {
            "type": "field-add",
            "name": "cars",
            "formula": "population*2",
            "descriptor": {"type": "integer"},
        },
    ],
})
print(pipeline)
```

--------------------------------

### Create Resource from Different Sources

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/resource.md

Demonstrates creating a Resource object from various sources, including file paths, descriptor paths, and descriptor dictionaries.
```Python
from frictionless import Resource

resource = Resource('table.csv') # from a resource path
resource = Resource('resource.json') # from a descriptor path
resource = Resource({'path': 'table.csv'}) # from a descriptor
resource = Resource(path='table.csv') # from arguments
```

```Python
from frictionless import Resource

resource = Resource(path='data/table.csv') # from a path
resource = Resource('data/resource.json') # from a descriptor
```

--------------------------------

### Create Frictionless Catalog from Datasets

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/catalog.md

Create a Catalog by providing a list of Dataset objects. Ensure the Package is correctly specified for each Dataset.

```python
from frictionless import Catalog, Dataset, Package

catalog = Catalog(datasets=[Dataset(name='name', package=Package('tables/*'))])
```

--------------------------------

### Update and Save Resource Descriptor

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/basic-examples.md

This example shows how to update a resource descriptor (setting missing value markers, a field type, and a foreign key) and then save it to a YAML file using Frictionless Python.

```python
from frictionless import Detector, describe

detector = Detector(field_missing_values=["", "n/a"])
resource = describe("countries.csv", detector=detector)
resource.schema.set_field_type("neighbor_id", "integer")
resource.schema.foreign_keys.append(
    {"fields": ["neighbor_id"], "reference": {"resource": "", "fields": ["id"]}}
)
resource.to_yaml("countries.resource.yaml")
```

--------------------------------

### CsvPlugin Select Control Hook

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/advanced/system.md

Example of a CsvPlugin implementing a hook that selects the CsvControl class based on type.
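A hook like this returns a class when the plugin recognizes `type` and `None` otherwise, so a system can ask each registered plugin in turn and use the first answer. The classes below are hypothetical stand-ins that model this first-match dispatch; they are not Frictionless's `system` or `Plugin` API.

```python
from typing import Optional


class CsvControl:
    """Hypothetical stand-in for a CSV control class."""


class JsonControl:
    """Hypothetical stand-in for a JSON control class."""


class CsvPlugin:
    def select_control(self, type_name: str) -> Optional[type]:
        # Answer only for the type this plugin owns
        if type_name == "csv":
            return CsvControl
        return None


class JsonPlugin:
    def select_control(self, type_name: str) -> Optional[type]:
        if type_name == "json":
            return JsonControl
        return None


class MiniSystem:
    """Hypothetical registry that asks each plugin in order and keeps the first match."""

    def __init__(self, plugins: list) -> None:
        self.plugins = plugins

    def select_control(self, type_name: str) -> type:
        for plugin in self.plugins:
            control = plugin.select_control(type_name)
            if control is not None:
                return control
        raise ValueError(f"no plugin provides a control for type {type_name!r}")


system = MiniSystem([CsvPlugin(), JsonPlugin()])
print(system.select_control("json").__name__)  # JsonControl
```

Returning `None` for unknown types keeps each plugin independent: adding a new format means registering one more plugin rather than editing a central lookup table.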
```python
def select_Control(self, type: str):
    if type == "csv":
        return CsvControl
```

--------------------------------

### CsvPlugin Parser and Detector Hooks

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/advanced/system.md

Example of a CsvPlugin implementing hooks that create a CsvParser and detect CSV resources.

```python
class CsvPlugin(Plugin):
    """Plugin for CSV"""

    # Hooks

    def create_parser(self, resource: Resource):
        if resource.format in ["csv", "tsv"]:
            return CsvParser(resource)

    def detect_resource(self, resource: Resource):
        if resource.format in ["csv", "tsv"]:
            resource.type = "table"
            resource.mediatype = f"text/{resource.format}"
```

--------------------------------

### Instantiate and Detect Resources with System Object

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/advanced/system.md

Use the system object to create adapters, loaders, and parsers, to detect resources and field candidates, and to select classes by type.

```python
from frictionless import Resource, system

# `source`, `control` and `resource` are assumed to be defined beforehand

# Create
adapter = system.create_adapter(source, control=control)
loader = system.create_loader(resource)
parser = system.create_parser(resource)

# Detect
system.detect_resource(resource)
field_candidates = system.detect_field_candidates()

# Select
Check = system.selectCheck('type')
Control = system.selectControl('type')
Error = system.selectError('type')
Field = system.selectField('type')
Step = system.selectStep('type')
```

--------------------------------

### Frictionless Describe Output Formats

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/console/overview.md

Shows how to specify the output format of the `describe` command.
```bash
frictionless describe # default YAML with a commented front-matter
frictionless describe --yaml # standard YAML
frictionless describe --json # standard JSON
```

--------------------------------

### Read SPSS Data with Frictionless

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/spss.md

Read data from an SPSS file using the `frictionless.Resource` class. Requires the SPSS extra to be installed.

```python
from pprint import pprint
from frictionless import Resource

resource = Resource('table.sav')
pprint(resource.read_rows())
```

--------------------------------

### Create Catalog from Zenodo Search

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/blog/2022/11-07-zenodo-integration.md

Create a Frictionless Catalog by searching across Zenodo repositories. The `ZenodoControl` can be configured with search terms to find relevant data.

```python
from frictionless import Catalog, portals

control = portals.ZenodoControl(search='title:"open science"')
catalog = Catalog(
    control=control,
)
```

--------------------------------

### Read HTML Data with Frictionless

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/html.md

Use `TableResource` to read rows from an HTML file. Ensure the `frictionless[html]` package is installed.

```python
from pprint import pprint
from frictionless import resources

resource = resources.TableResource(path='table1.html')
pprint(resource.read_rows())
```

--------------------------------

### Describe and Print Frictionless Package

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/package.md

Illustrates how to create a Package with metadata such as name, title, and description, and then print the package object. All the official properties can be provided.
```Python
from frictionless import Package, Resource

package = Package(
    name='package',
    title='My Package',
    description='My Package for the Guide',
    resources=[Resource(path='table.csv')],
    # it's possible to provide all the official properties like homepage, version, etc
)
print(package)
```

--------------------------------

### Read Data from Excel Resource

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/excel.md

Read rows from an Excel file using the `frictionless.Resource` class. Ensure the `excel` extra is installed.

```python
from pprint import pprint
from frictionless import Resource

resource = Resource(path='table.xlsx')
pprint(resource.read_rows())
```

--------------------------------

### Describe Remote Catalog from URL

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/catalog.md

Instantiate a Catalog from the URL of a remote data catalog, such as a CKAN instance. The catalog then describes the datasets available at that URL.

```python
from frictionless import Catalog

catalog = Catalog('https://demo.ckan.org/dataset/')
print(catalog)
```

--------------------------------

### Get Field from Resource Schema

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/schema.md

Demonstrates how to retrieve a specific field object from a schema inferred from a resource (e.g., a CSV file).

```python
from frictionless import describe

resource = describe('table.csv')
field = resource.schema.get_field('id')
print(field)
```

--------------------------------

### Build Docker Image

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/CONTRIBUTING.md

Build the Docker image for development. This command uses Hatch to run the image build, which sets up the complete development environment.
```bash
hatch run image
```

--------------------------------

### Skipping Blank Rows

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/dialect.md

Enable the `skip_blank_rows` option in the Dialect to ignore rows that are entirely empty. Ensure the `frictionless` library is installed.

```python
from frictionless import Resource, Dialect

dialect = Dialect(skip_blank_rows=True)
with Resource(b'name\n\nrow2', format="csv", dialect=dialect) as resource:
    print(resource.read_rows())
```

--------------------------------

### Run Release Process

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/CONTRIBUTING.md

Execute the release command to create a release commit and tag and push them to GitHub. The actual release is then handled by GitHub CI.

```bash
hatch run release
```

--------------------------------

### Write Data to Stream Scheme

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/schemes/stream.md

Write data to the stream scheme using `Resource.write`. This example writes the data in CSV format.

```python
from frictionless import Resource

source = Resource(data=[['id', 'name'], [1, 'english'], [2, 'german']])
target = source.write(scheme='stream', format='csv')
print(target)
print(target.to_view())
```

--------------------------------

### Explicit Package Creation in Frictionless

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/package.md

Shows the explicit ways to create a Package: either from arguments with a list of Resource objects, or directly from a descriptor file.
```python
from frictionless import Package, Resource

package = Package(resources=[Resource(path='table.csv')])  # from arguments
package = Package('datapackage.json')  # from a descriptor
```

--------------------------------

### Export Resource to PETL Table

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/transforming-data.md

Convert a Frictionless resource into a PETL table for use with the PETL data manipulation framework. Ensure PETL is installed.

```python
from frictionless import Resource

resource = Resource(path='transform.csv')
petl_table = resource.to_petl()
# Use it with PETL framework
print(petl_table)
```

--------------------------------

### Ignoring Specific Comment Rows

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/dialect.md

Ignore specific rows by passing their row numbers to the `comment_rows` parameter of the Dialect. Ensure the `frictionless` library is installed.

```python
from frictionless import Resource, Dialect

dialect = Dialect(comment_rows=[2])
with Resource(b'name\nrow1\nrow2', format="csv", dialect=dialect) as resource:
    print(resource.read_rows())
```

--------------------------------

### Describe a Resource to YAML (CLI)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/validating-data.md

Use the `frictionless describe` command to generate a resource descriptor in YAML format from a data file.

```bash
frictionless describe capital-invalid.csv > capital.resource.yaml
```

--------------------------------

### Apply Row Constraints with Expressions

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/checks/row.md

Use row constraints to evaluate arbitrary Python expressions on data rows. This requires the `simpleeval` package to be installed.
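Conceptually, a row constraint evaluates the formula once per row, with each cell bound to its column name. The plain-Python sketch below illustrates that idea on the same data as the library example that follows. It is an illustration only, not frictionless's implementation, and the helper name `failing_rows` is hypothetical. It uses `eval` with empty builtins purely for brevity; that is not a safe sandbox for untrusted formulas, which is where a dedicated evaluator such as `simpleeval` comes in.

```python
# Hypothetical helper illustrating per-row formula evaluation (NOT frictionless's code).
# eval() with empty builtins is used only for brevity; it is not a secure sandbox.

def failing_rows(table, formula):
    """Return the first-column value of every row for which `formula` is false."""
    header, *rows = table
    failures = []
    for cells in rows:
        names = dict(zip(header, cells))  # bind each cell to its column name
        if not eval(formula, {"__builtins__": {}}, names):
            failures.append(cells[0])
    return failures


source = [
    ["row", "salary", "bonus"],
    [2, 1000, 200],
    [3, 2500, 500],
    [4, 1300, 500],
    [5, 5000, 1000],
]
print(failing_rows(source, "salary == bonus * 5"))  # [4]
```

Only row 4 fails, because 500 * 5 is 2500, not 1300; this is the row a row-constraint check would be expected to flag for this formula.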
```python
from pprint import pprint
from frictionless import validate, checks

source = [
    ["row", "salary", "bonus"],
    [2, 1000, 200],
    [3, 2500, 500],
    [4, 1300, 500],
    [5, 5000, 1000],
]
report = validate(source, checks=[checks.row_constraint(formula="salary == bonus * 5")])
pprint(report.flatten(["type", "message"]))
```

--------------------------------

### Create a Frictionless Resource with Path

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/describing-data.md

Instantiate a Frictionless Resource object by providing only the file path; Frictionless infers the basic properties.

```python
from frictionless import Resource

resource = Resource("country-1.csv")
print(resource)
```

--------------------------------

### Create Schema from Various Sources

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/schema.md

Demonstrates creating a Schema object from a resource path, a descriptor file, or a descriptor dictionary. When given a resource path, the schema is inferred from the data.

```python
from frictionless import Schema, describe

schema = describe('table.csv', type='schema')  # from a resource path
schema = Schema.from_descriptor('schema.json')  # from a descriptor path
schema = Schema.from_descriptor({'fields': [{'name': 'id', 'type': 'integer'}]})  # from a descriptor
```

--------------------------------

### Write Table to JSON

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/steps/table.md

Use `table_write` to save the transformed table data to a specified file path. This example writes the data to a JSON file.
```python
from frictionless import Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.table_write(path='transform.json'),
    ]
)
```

```bash
cat transform.json
```

```python
with open('transform.json') as file:
    print(file.read())
```

--------------------------------

### Create Frictionless Package from Various Sources

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/package.md

Demonstrates creating a Package object from different sources such as file paths, globs, lists of files, and descriptor paths or dictionaries. The library detects the source type automatically.

```python
from frictionless import Package, Resource

package = Package('table.csv')  # from a resource path
package = Package('tables/*')  # from a resources glob
package = Package(['tables/chunk1.csv', 'tables/chunk2.csv'])  # from a list
package = Package('package/datapackage.json')  # from a descriptor path
package = Package({'resources': [{'path': 'table.csv'}]})  # from a descriptor ("resources" is a list)
package = Package(resources=[Resource(path='table.csv')])  # from arguments
```

--------------------------------

### Transform Data Source - Python

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/actions.md

Use the `transform` function with `steps` to modify tabular data. This example uses `cell_set` to set the value `'new'` in every cell of the `name` field.

```python
from frictionless import transform, steps

resource = transform('table.csv', steps=[steps.cell_set(field_name='name', value='new')])
print(resource.read_rows())
```

--------------------------------

### Download All CKAN Datasets

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/portals/ckan.md

Initialize a Catalog to download all datasets from a CKAN instance. The result is limited to the number of datasets the CKAN API returns by default.
```python
from frictionless import portals, Catalog

ckan_control = portals.CkanControl(baseurl='https://legado.dados.gov.br')
c = Catalog(control=ckan_control)
```

--------------------------------

### Configure CSV Writing with CsvControl

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/csv.md

Customize CSV output during writing with `CsvControl`. This example uses a semicolon delimiter for the output file.

```python
from frictionless import Resource, formats

resource = Resource(data=[['id', 'name'], [1, 'english'], [2, 'german']])
resource.write('tmp/table.csv', control=formats.CsvControl(delimiter=';'))
```

--------------------------------

### Stream Resource Data Using Context Manager

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/resource.md

Illustrates streaming resource data (bytes, text, cells, rows) using a `with` statement, which opens and closes the resource automatically.

```python
from pprint import pprint
from frictionless import Resource

with Resource('country-3.csv') as resource:
    pprint(resource.byte_stream)
    pprint(resource.text_stream)
    pprint(resource.cell_stream)
    pprint(resource.row_stream)
    for row in resource.row_stream:
        print(row)
```

--------------------------------

### Describe CSV File with Frictionless Python API

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/describing-data.md

Use the `frictionless.describe` function to get metadata about a CSV file. The output is printed in YAML format.

```python
from frictionless import describe

resource = describe("country-2.csv")
print(resource.to_yaml())
```

--------------------------------

### Describe Schema with Metadata

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/schema.md

Illustrates how to create a Schema with additional metadata such as missing values and primary keys, and how to access and print these properties.
```python
from frictionless import Schema, fields

schema = Schema(
    fields=[fields.StringField(name='id')],
    missing_values=['na'],
    primary_key=['id'],
    # foreign_keys
)
print(schema)
```

```python
from frictionless import Schema

schema = Schema.from_descriptor('schema.json')
print(schema.missing_values)  # and others
```

--------------------------------

### Extract Data from Files (Python)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/extracting-data.md

Use the Frictionless Python API to extract data from the CSV files matching a glob pattern. Ensure the `frictionless` library is installed.

```python
from pprint import pprint
from frictionless import extract

data = extract('*-3.csv')
pprint(data)
```

--------------------------------

### Describe Data with CLI

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/describing-data.md

Use the `frictionless describe` command to generate metadata for a data file. The `--type` flag selects the kind of metadata to produce (schema, resource, or package).

```bash
frictionless describe your-table.csv
```

```bash
frictionless describe your-table.csv --type schema
```

```bash
frictionless describe your-table.csv --type resource
```

```bash
frictionless describe your-table.csv --type package
```

--------------------------------

### Download CKAN Datasets with Offset

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/portals/ckan.md

Use `results_offset` to paginate through CKAN datasets, downloading packages starting from a specific position. This is useful for retrieving datasets in batches.
```python
from frictionless import portals, Catalog

ckan_control = portals.CkanControl(
    baseurl='https://legado.dados.gov.br',
    ignore_package_errors=True,  # skip datasets that fail to load
    results_offset=1000,  # start from the 1000th dataset
)
c = Catalog(control=ckan_control)
```

--------------------------------

### Read Remote Data with Resource

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/schemes/remote.md

Use the Resource class to read data directly from a remote URL. This example reads CSV data from GitHub.

```python
from pprint import pprint
from frictionless import Resource

path = 'https://raw.githubusercontent.com/frictionlessdata/frictionless-py/master/data/table.csv'
resource = Resource(path=path)
pprint(resource.read_rows())
```

--------------------------------

### Implement Custom Step in Python

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/transforming-data.md

Define a custom data transformation step by subclassing `frictionless.Step`. This example removes the `id` field (the first column) from both the resource's schema and its data.

```python
from frictionless import Pipeline, Resource, Step


class custom_step(Step):
    def transform_resource(self, resource):
        current = resource.to_copy()

        # Data: drop the first cell of every row (the "id" column)
        def data():
            with current:
                for cells in current.cell_stream:
                    yield cells[1:]

        # Meta: keep the schema in sync with the data
        resource.data = data
        resource.schema.remove_field("id")


source = Resource("transform.csv")
pipeline = Pipeline(steps=[custom_step()])
target = source.transform(pipeline)
print(target.schema)
print(target.to_view())
```

--------------------------------

### Fast Indexing with QSV

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/console/index.md

Use the `--qsv` option to specify the path to the QSV binary for fast indexing. In this mode, schema inference runs over the entire data file, which guarantees that the inferred types are valid before indexing.
```bash
frictionless index table.csv --database sqlite:///index/project.db --name table --fast --qsv qsv_path
```

--------------------------------

### Describe Files in YAML Mode

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/console/describe.md

Use this command to get metadata for the CSV files in the `tables` directory formatted as YAML. This is useful for programmatic consumption of the metadata.

```bash
frictionless describe tables/*.csv --yaml
```

--------------------------------

### Define a Custom Validation Check

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/checklist.md

Create a custom Check by inheriting from `frictionless.Check`. This example defines a duplicate-row check that keeps an in-memory dictionary of row hashes to detect rows it has already seen.

```python
import hashlib

from frictionless import Check, errors


class duplicate_row(Check):
    code = "duplicate-row"
    Errors = [errors.DuplicateRowError]

    def __init__(self, descriptor=None):
        super().__init__(descriptor)
        self.__memory = {}  # row hash -> position of the first row with that hash

    def validate_row(self, row):
        text = ",".join(map(str, row.values()))
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        match = self.__memory.get(digest)
        if match:
            note = 'the same as row at position "%s"' % match
            yield errors.DuplicateRowError.from_row(row, note=note)
        self.__memory[digest] = row.row_position

    # Metadata

    metadata_profile = {  # type: ignore
        "type": "object",
        "properties": {},
    }
```

--------------------------------

### View Updated Metadata (CLI)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/describing-data.md

Use the command line to view the contents of the updated metadata file.

```bash
cat country.resource-updated2.yaml
```

--------------------------------

### Write CSV Data with Resource

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/csv.md

The `Resource` class can also be used to write data to a CSV file.
This example creates a resource from in-memory data and writes it to `table-output.csv`.

```python
from frictionless import Resource

source = Resource(data=[['id', 'name'], [1, 'english'], [2, 'german']])
target = source.write('table-output.csv')
print(target)
print(target.to_view())
```

--------------------------------

### Infer Catalog with Complex Search Query

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/portals/zenodo.md

Creates a catalog by searching Zenodo for repositories matching a complex query string. Zenodo's advanced search syntax is supported.

```python
from frictionless import portals, Catalog

control = portals.ZenodoControl(search='+frictionlessdata +science')
catalog = Catalog(control=control)
```

--------------------------------

### Python vs CLI Argument Syntax

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/console/overview.md

Illustrates the difference in argument syntax between the Python API and CLI commands: a keyword argument such as `limit_errors` becomes the option `--limit-errors`.

```text
Python: validate('data/table.csv', limit_errors=1)
CLI:    $ frictionless validate data/table.csv --limit-errors 1
```

--------------------------------

### Read ODS data into a Resource

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/ods.md

Read data from an ODS file into a Frictionless Resource object so that you can access and process it programmatically. Ensure the `ods` extra is installed.

```python
from pprint import pprint
from frictionless import Resource

resource = Resource(path='table.ods')
pprint(resource.read_rows())
```

--------------------------------

### Extract Data with String Field Schema

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/fields/string.md

Use this snippet to extract data from a list of lists using a Frictionless Schema that includes a `StringField`. Ensure the `frictionless` library is installed.
```python
from frictionless import Schema, extract, fields

data = [['name'], ['value']]
rows = extract(data, schema=Schema(fields=[fields.StringField(name='name')]))
print(rows)
```

--------------------------------

### Create and Describe a Data Package with Metadata

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/describing-data.md

This Python code describes multiple CSV files, adds dataset-level metadata, renames resources, and defines a foreign key relationship. The result is saved to a YAML file.

```python
from frictionless import describe

package = describe("*-3.csv")
package.title = "Countries and their capitals"
package.description = "The data was collected as a research project"
package.get_resource("country-3").name = "country"
package.get_resource("capital-3").name = "capital"
package.get_resource("country").schema.foreign_keys.append(
    {"fields": ["capital_id"], "reference": {"resource": "capital", "fields": ["id"]}}
)
package.to_yaml("country.package.yaml")
```

--------------------------------

### Describe and Validate a Package (CLI)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/validating-data.md

Generate a package descriptor from multiple CSV files and then validate the package using CLI commands.

```bash
frictionless describe capital-*id.csv > capital.package.yaml
frictionless validate capital.package.yaml
```

--------------------------------

### Write Data to HTML Format with Frictionless

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/html.md

Write in-memory data to an HTML file by passing a `TableResource` target to `Resource.write`. The `to_view()` method can display the written data.
```python
from frictionless import Resource, resources

source = Resource(data=[['id', 'name'], [1, 'english'], [2, 'german']])
target = resources.TableResource(path='table-output.html')
source.write(target)
print(target)
print(target.to_view())
```

--------------------------------

### Read Resource Data in Various Formats

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/resource.md

Shows how to read resource data into memory as bytes, text, cells, or rows using the corresponding `read_*` methods.

```python
from pprint import pprint
from frictionless import Resource

resource = Resource('country-3.csv')
pprint(resource.read_bytes())
pprint(resource.read_text())
pprint(resource.read_cells())
pprint(resource.read_rows())
```

--------------------------------

### Inspecting Row Errors

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/table.md

Illustrates how to inspect the errors attached to a specific row, which is useful when dealing with invalid data. This example uses a resource with a malformed row that has more cells than the header.

```python
from pprint import pprint
from frictionless import Resource

with Resource([['name'], ['value', 'value']]) as resource:
    for row in resource.row_stream:
        pprint(row.errors)
```

--------------------------------

### Describe a CSV Resource using CLI

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/describing-data.md

Use the `frictionless describe` command to infer metadata for a CSV file and output it in YAML format.

```bash
frictionless describe country-2.csv
```

--------------------------------

### Implement Custom Encoding Function (Python)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/detector.md

Provide a custom encoding function to the Detector. This example uses a lambda that always returns `'utf-8'`, but the function can contain more complex logic.
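As an example of more complex logic, the hypothetical function below tries to decode the sample as UTF-8 and falls back to Latin-1 if that fails. It assumes the Detector passes the function a `bytes` sample and expects an encoding name back, consistent with the lambda in the example that follows; confirm this against the Detector API reference before relying on it.

```python
import codecs


def utf8_or_latin1(sample: bytes) -> str:
    """Hypothetical encoding function: "utf-8" if the sample decodes as UTF-8, else "latin-1"."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the end of the sample
        decoder.decode(sample, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        # Latin-1 can decode any byte sequence, so it works as a last resort
        return "latin-1"


print(utf8_or_latin1("café au lait".encode("utf-8")))    # utf-8
print(utf8_or_latin1("café au lait".encode("latin-1")))  # latin-1
```

Under that assumption, it could be passed to the detector as `Detector(encoding_function=utf8_or_latin1)`.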
```python
from frictionless import Detector, Resource

detector = Detector(encoding_function=lambda sample: "utf-8")
with Resource("table.csv", detector=detector) as resource:
    print(resource.encoding)
```

--------------------------------

### Define a Custom Transform Step

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/pipeline.md

Create a custom Step class by inheriting from `frictionless.Step`. This example defines a `cell_set` step that uses PETL to set the value of a specified field.

```python
from frictionless import Step


class cell_set(Step):
    code = "cell-set"

    def __init__(self, descriptor=None, *, value=None, field_name=None):
        self.setinitial("value", value)
        self.setinitial("fieldName", field_name)
        super().__init__(descriptor)

    def transform_resource(self, resource):
        value = self.get("value")
        field_name = self.get("fieldName")
        yield from resource.to_petl().update(field_name, value)
```

--------------------------------

### Describe a Resource to YAML (Python)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/validating-data.md

Use the `frictionless.describe` function to generate a resource descriptor programmatically and save it in YAML format.

```python
from frictionless import describe

resource = describe('capital-invalid.csv')
resource.to_yaml('capital.resource.yaml')
```

--------------------------------

### Describe and Validate a Package (Python)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/validating-data.md

Programmatically describe multiple data files to create a package descriptor, and then validate the package.
```python
from frictionless import describe, validate

# create package descriptor
package = describe("capital-*id.csv")
package.to_yaml("capital.package.yaml")

# validate
report = validate("capital.package.yaml")
print(report)
```

--------------------------------

### Read Package with API Key

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/portals/zenodo.md

Increase the access limit when reading from Zenodo by providing an API key.

```python
from frictionless import portals, Package

apikey = "YOUR-ZENODO-API-KEY"  # placeholder: replace with your own Zenodo API token
control = portals.ZenodoControl(apikey=apikey)
package = Package("https://zenodo.org/record/7078768", control=control)
print(package)
```

--------------------------------

### List Cleaned Data Files (Python)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/basic-examples.md

List the files in the current directory whose names start with `countries-cleaned.` using Python's `os` module. This checks programmatically for the generated output files.

```python
import os

files = [f for f in os.listdir('.') if os.path.isfile(f) and f.startswith('countries-cleaned.')]
print(files)
```

--------------------------------

### Import JsonSchema as Table Schema

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/formats/jsonschema.md

Use the `Schema.from_jsonschema` method to load a JsonSchema file and convert it into a Frictionless Table Schema. Ensure the JsonSchema file exists at the specified path.

```python
from frictionless import Schema

schema = Schema.from_jsonschema('table.jsonschema')
```

--------------------------------

### Update Schema with Schema Patch

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/detector.md

Apply a schema patch to update specific fields or properties without redefining the entire schema. This example changes the type of the `id` field to `string`.
```python
from frictionless import Detector, Resource

detector = Detector(schema_patch={'fields': {'id': {'type': 'string'}}})
with Resource('table.csv', detector=detector) as resource:
    print(resource.schema)
    print(resource.read_rows())
```

--------------------------------

### Create and Infer Resource Descriptor (Python)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/guides/extracting-data.md

Create a Resource object from a file, infer its schema, and add a value to be interpreted as missing. The descriptor can then be saved to YAML or JSON.

```python
from frictionless import Resource

resource = Resource('capital-3.csv')
resource.infer()
# as an example, the next line extends the schema's missing values
resource.schema.missing_values.append('3')  # will interpret 3 as a missing value
resource.to_yaml('capital.resource-test.yaml')  # use resource.to_json for JSON format
```

--------------------------------

### Disabling Header Case Sensitivity Validation

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/dialect.md

Disable case-sensitive header validation with `header_case=False` in the Dialect. This affects validation only, not the resulting header labels. Ensure the `frictionless` library is installed.

```python
from frictionless import Resource, Schema, Dialect, fields

dialect = Dialect(header_case=False)
schema = Schema(fields=[fields.StringField(name="ID"), fields.StringField(name="NAME")])
with Resource('capital-3.csv', dialect=dialect, schema=schema) as resource:
    print(f'Header: {resource.header}')
    print(f'Valid: {resource.header.valid}')  # without "header_case" it will have 2 errors
```

--------------------------------

### Index and Extract Data (Fast Mode CLI)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/resource.md

Use the CLI for faster indexing of CSV data into a SQLite database.
Fast mode requires specific database versions and fails if it encounters invalid data.

```bash
frictionless index table.csv --database sqlite:///index/project.db --name table --fast
```

```bash
frictionless extract sqlite:///index/project.db --table table --json
```

--------------------------------

### Specify Resource Compression (Python)

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/docs/framework/resource.md

Explicitly provide the compression algorithm for a resource. This is useful when automatic detection might fail, or to make sure the correct algorithm is used.

```python
from frictionless import Resource

with Resource('table.csv.zip', compression='zip') as resource:
    print(resource.compression)
    print(resource.to_view())
```

--------------------------------

### Publish Package to Zenodo Repository

Source: https://github.com/frictionlessdata/frictionless-py/blob/main/blog/2022/11-07-zenodo-integration.md

Publish a Frictionless data package to Zenodo using the `publish` method. This requires a metadata file and an API key; on success, the call returns the deposition ID.

```python
from frictionless import Package, portals

apikey = "YOUR-ZENODO-API-KEY"  # placeholder: replace with your own Zenodo API token
control = portals.ZenodoControl(
    metafn="data/zenodo/metadata.json",
    apikey=apikey,
)
package = Package("data/datapackage.json")
deposition_id = package.publish(control=control)
print(deposition_id)
```