### Install data.world-py using pip

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Installs the data.world-py library from PyPI. This is the standard installation method for the library.

```bash
pip install datadotworld
```

--------------------------------

### Install data.world-py with Pandas support

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Installs the data.world-py library with optional pandas support, enabling DataFrame integration for data manipulation.

```bash
pip install datadotworld[pandas]
```

--------------------------------

### Install Dependencies and Run Tests with Pip and Tox

Source: https://github.com/datadotworld/data.world-py/blob/main/CONTRIBUTING.md

Commands to install the project's required packages using pip and to run tests with tox, ensuring compatibility across different Python versions.

```bash
pip install -e .
tox --pre
```

--------------------------------

### Get Help for data.world Python Package

Source: https://github.com/datadotworld/data.world-py/blob/main/docs/index.html

Demonstrates how to use the built-in 'help' function to get inline assistance on the available functions within the data.world Python package.

```python
import datadotworld as dw

help(dw)
```

--------------------------------

### Install data.world-py using conda

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Installs the data.world-py library from the conda-forge channel, suitable for users managing Python distributions with conda.

```bash
conda install -c conda-forge datadotworld-py
```

--------------------------------

### Create and Manage Feature Branches

Source: https://github.com/datadotworld/data.world-py/blob/main/CONTRIBUTING.md

Steps to create a new feature branch for development, starting from the main branch and pulling the latest changes from the upstream repository.
```bash
git checkout main
git pull upstream main
git checkout -b my-feature-branch
```

--------------------------------

### Install data.world Python Package

Source: https://github.com/datadotworld/data.world-py/blob/main/docs/index.html

Installs the data.world Python package, including pandas support, using pip. This is the first step to integrate Python with data.world.

```bash
pip install datadotworld[pandas]
```

--------------------------------

### Run Code Style Checks with Flake8

Source: https://github.com/datadotworld/data.world-py/blob/main/CONTRIBUTING.md

Command to execute Flake8, a tool that checks Python code for style guide violations and potential errors, ensuring code quality.

```bash
flake8
```

--------------------------------

### Update Existing data.world Dataset Metadata (Python)

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Provides examples for updating specific metadata properties of an existing data.world dataset without needing to resubmit all properties.

```python
import datadotworld as dw

client = dw.api_client()

# Update dataset tags
client.update_dataset(
    'username/test-dataset',
    tags=['updated', 'new-tag', 'demo']
)

# Update description and summary
client.update_dataset(
    'username/test-dataset',
    description='Updated description for the dataset',
    summary='## Updated Summary\nNew content here.',
    tags=['data', 'science']
)

# Change visibility
client.update_dataset(
    'username/test-dataset',
    visibility='OPEN'
)

# Update license
client.update_dataset(
    'username/test-dataset',
    license='CC-BY-SA'
)
```

--------------------------------

### Write Binary Data to Remote File in Python

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Writes binary data (bytes or bytearray) to a file in a data.world dataset by opening the file in binary write mode ('wb'). The example demonstrates writing a sequence of bytes representing ASCII characters.
```python
import datadotworld as dw

with dw.open_remote_file('username/test-dataset', 'test.txt', mode='wb') as w:
    w.write(bytes([100,97,116,97,46,119,111,114,108,100]))
```

--------------------------------

### Write Text to a Remote File in Python

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Writes a string of text to a specified file within a data.world dataset. This uses the open_remote_file() function with a context manager to ensure the file is properly handled. The example writes simple text content.

```python
import datadotworld as dw

with dw.open_remote_file('username/test-dataset', 'test.txt') as w:
    w.write("this is a test.")
```

--------------------------------

### Write Dictionaries as CSV to Remote File in Python

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Writes a sequence of dictionaries to a CSV file in a data.world dataset. It uses Python's csv.DictWriter to handle the mapping of dictionary keys to CSV columns. The example includes writing headers and then rows.

```python
import csv

import datadotworld as dw

with dw.open_remote_file('username/test-dataset', 'test.csv') as w:
    csvw = csv.DictWriter(w, fieldnames=['foo', 'bar'])
    csvw.writeheader()
    csvw.writerow({'foo':42, 'bar':"A"})
    csvw.writerow({'foo':13, 'bar':"B"})
```

--------------------------------

### Access Query Results as Pandas DataFrame in Python

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Retrieves the query results from a data.world query as a pandas DataFrame. This is useful for data manipulation and analysis using the pandas library. The example shows the structure of the DataFrame with sample data.
```python
>>> results.dataframe
      Name  PointsPerGame  AssistsPerGame
0      Jon           20.4             1.3
1      Rob           15.5             8.0
2   Sharon           30.1            11.2
3     Alex            8.2             0.5
4  Rebecca           12.3            17.0
5   Ariane           18.1             3.0
6    Bryon           16.0             8.5
7     Matt           13.0             2.1
```

--------------------------------

### Access Query Results as Table (List of Dicts) in Python

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Accesses query results from data.world as a list of ordered dictionaries, where each dictionary represents a row with column names as keys. This format is useful for iterating through results row by row. The example shows the first row of the table.

```python
>>> results.table[0]
OrderedDict([('Name', 'Jon'),
             ('PointsPerGame', Decimal('20.4')),
             ('AssistsPerGame', Decimal('1.3'))])
```

--------------------------------

### Clone and Set Up data.world-py Repository

Source: https://github.com/datadotworld/data.world-py/blob/main/CONTRIBUTING.md

Instructions for forking the data.world-py project on GitHub, cloning your copy, and adding the upstream remote repository for future updates.

```bash
git clone https://github.com/[YOUR_GITHUB_NAME]/data.world-py.git
cd data.world-py
git remote add upstream https://github.com/datadotworld/data.world-py.git
```

--------------------------------

### Create New data.world Datasets (Python)

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Shows how to create new datasets on data.world with initial metadata and optionally populate them with files from specified URLs.
```python
import datadotworld as dw

client = dw.api_client()

# Create basic dataset
dataset_key = client.create_dataset(
    'username',
    title='My Test Dataset',
    visibility='PRIVATE',
    license='Public Domain',
    description='A test dataset created via API',
    summary='## Overview\nThis is a test dataset.',
    tags=['test', 'demo', 'api']
)
print(f"Created dataset: {dataset_key}")

# Create dataset with files from URLs
dataset_key = client.create_dataset(
    'username',
    title='Dataset with Files',
    visibility='OPEN',
    license='CC-BY',
    files={
        'sample.csv': {
            'url': 'http://example.com/data/sample.csv',
            'description': 'Sample data file',
            'labels': ['raw data', 'csv']
        },
        'readme.txt': {
            'url': 'http://example.com/docs/readme.txt',
            'description': 'Documentation',
            'labels': ['documentation']
        }
    }
)
```

--------------------------------

### Initialize data.world API Client

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Demonstrates how to obtain an instance of the ApiClient class to interact with the data.world API. This client is the primary interface for all API operations.

```python
import datadotworld as dw

# api_client() must be called; without parentheses you only get the function object
client = dw.api_client()
```

--------------------------------

### Stage and Commit Changes with Git

Source: https://github.com/datadotworld/data.world-py/blob/main/CONTRIBUTING.md

Basic Git commands to stage all changes and create a new commit with a descriptive log message.

```bash
git add ...
git commit
```

--------------------------------

### Configure data.world authentication token

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Configures the data.world-py library by running a command-line tool to set up API authentication. Alternatively, the token can be set as an environment variable.

```bash
dw configure
```

--------------------------------

### Configure data.world Python Package

Source: https://github.com/datadotworld/data.world-py/blob/main/docs/index.html

Configures the data.world Python package using the command line.
Requires a data.world API token, which can be obtained from the data.world Python integration page.

```bash
dw configure
```

--------------------------------

### Push Feature Branch to Origin

Source: https://github.com/datadotworld/data.world-py/blob/main/CONTRIBUTING.md

Command to push your local feature branch to your remote origin repository on GitHub.

```bash
git push origin my-feature-branch
```

--------------------------------

### Configure Git User Information

Source: https://github.com/datadotworld/data.world-py/blob/main/CONTRIBUTING.md

Commands to set your global Git username and email address, which is necessary for committing changes.

```bash
git config --global user.name "Your Name"
git config --global user.email "contributor@example.com"
```

--------------------------------

### Search Resources on data.world - Python

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Searches for various resources on data.world, including datasets and projects, using a keyword query and optional filters such as resource types, tags, and user. Results can be paginated using the 'limit' parameter.

```python
import datadotworld as dw

client = dw.api_client()

# Search for resources
results = client.search_resources(
    query='climate data',
    types=['dataset'],
    sort='relevance',
    limit=10
)

for item in results['records']:
    print(f"{item['owner']}/{item['id']}: {item['title']}")
    print(f"  Description: {item['description']}")
    print(f"  Tags: {item['tags']}")
    print()

# Search with filters
results = client.search_resources(
    query='sales',
    types=['dataset', 'project'],
    tags=['finance', 'quarterly'],
    user='username',
    limit=20
)
```

--------------------------------

### Retrieve data.world Dataset Metadata (Python)

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Demonstrates how to fetch the complete metadata for a data.world dataset, including details like title, owner, description, files, and update timestamps.
```python
import datadotworld as dw

client = dw.api_client()

# Get dataset metadata
dataset = client.get_dataset('jonloyens/an-intro-to-dataworld-dataset')

print(dataset['title'])  # Output: 'An Intro to data.world Dataset'
print(dataset['owner'])  # Output: 'jonloyens'
print(dataset['description'])
print(dataset['license'])
print(dataset['visibility'])
print(dataset['tags'])

# Access file information
for file_info in dataset['files']:
    print(f"{file_info['name']}: {file_info['sizeInBytes']} bytes")

# Check last updated timestamp
print(dataset['updated'])
```

--------------------------------

### Query a dataset live

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Queries a dataset live using SQL or SPARQL query languages. The `query()` function allows for dynamic data retrieval and analysis directly from data.world.

```python
import datadotworld as dw
results = dw.query('jonloyens/an-intro-to-dataworld-dataset', 'SELECT * FROM DataDotWorldBBallStats')
```

--------------------------------

### Add Files to data.world Dataset from URLs (Python)

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Explains how to add new files to an existing data.world dataset by specifying URLs, along with optional descriptions and labels for each file.
```python
import datadotworld as dw

client = dw.api_client()

# Add single file from URL
client.add_files_via_url(
    'username/test-dataset',
    files={
        'sample.xls': {
            'url': 'http://www.sample.com/sample.xls',
            'description': 'Sample Excel file',
            'labels': ['raw data', 'excel']
        }
    }
)

# Add multiple files at once
client.add_files_via_url(
    'username/test-dataset',
    files={
        'data1.csv': {
            'url': 'http://example.com/data1.csv',
            'description': 'First data file',
            'labels': ['csv', 'raw']
        },
        'data2.json': {
            'url': 'http://example.com/data2.json',
            'description': 'JSON data',
            'labels': ['json', 'processed']
        },
        'data3.xlsx': {
            'url': 'http://example.com/data3.xlsx'
        }
    }
)
```

--------------------------------

### Describe dataset metadata

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Retrieves metadata for an entire dataset or a specific file within the dataset. This includes information like homepage, name, resources, format, path, and schema.

```python
>>> intro_dataset.describe()
{'homepage': 'https://data.world/jonloyens/an-intro-to-dataworld-dataset',
 'name': 'jonloyens_an-intro-to-dataworld-dataset',
 'resources': [{'format': 'csv',
                'name': 'changelog',
                'path': 'data/ChangeLog.csv'},
               {'format': 'csv',
                'name': 'datadotworldbballstats',
                'path': 'data/DataDotWorldBBallStats.csv'},
               {'format': 'csv',
                'name': 'datadotworldbballteam',
                'path': 'data/DataDotWorldBBallTeam.csv'}]}
>>> intro_dataset.describe('datadotworldbballstats')
{'format': 'csv',
 'name': 'datadotworldbballstats',
 'path': 'data/DataDotWorldBBallStats.csv',
 'schema': {'fields': [{'name': 'Name', 'title': 'Name', 'type': 'string'},
                       {'name': 'PointsPerGame',
                        'title': 'PointsPerGame',
                        'type': 'number'},
                       {'name': 'AssistsPerGame',
                        'title': 'AssistsPerGame',
                        'type': 'number'}]}}
```

--------------------------------

### Manage Projects on data.world - Python

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Provides functionalities to create, link datasets to, retrieve, update, and
delete projects on data.world. Projects serve as containers for organizing related datasets and analysis.

```python
import datadotworld as dw

client = dw.api_client()

# Create project
project_key = client.create_project(
    'username',
    title='My Data Project',
    visibility='PRIVATE',
    objective='Analyze sales trends',
    tags=['sales', 'analytics']
)

# Add linked dataset to project
client.add_linked_dataset(
    'username/my-project',
    'username/sales-dataset'
)

# Get project details
project = client.get_project('username/my-project')
print(project['title'])
print(project['linkedDatasets'])

# Update project
client.update_project(
    'username/my-project',
    objective='Updated project objective',
    tags=['sales', 'analytics', 'q4']
)

# Remove linked dataset
client.remove_linked_dataset(
    'username/my-project',
    'username/sales-dataset'
)

# Delete project
client.delete_project('username/my-project')
```

--------------------------------

### Query a Dataset with data.world Python

Source: https://github.com/datadotworld/data.world-py/blob/main/docs/index.html

Shows how to query a dataset using the data.world Python package. This function allows executing SQL queries against a specified dataset.

```python
import datadotworld as dw

query = dw.query('jonloyens/an-intro-to-dataworld-dataset', 'SELECT * FROM DataDotWorldBBallStats')
```

--------------------------------

### Import and Use data.world Python Package

Source: https://github.com/datadotworld/data.world-py/blob/main/docs/index.html

Demonstrates how to import the data.world Python package, optionally with an alias like 'dw', and load a dataset using its ID.

```python
import datadotworld
# Or using an alias:
import datadotworld as dw

intro_dw = dw.load_dataset('jonloyens/an-intro-to-dataworld-dataset')
```

--------------------------------

### Add files to a dataset from a URL

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Explains how to use the `add_files_via_url` function to add files to a dataset from specified URLs.
Files are provided as a dictionary mapping desired filenames to objects containing the URL, description, and labels.

```python
>>> client = dw.api_client()
>>> client.add_files_via_url('username/test-dataset', files={'sample.xls': {'url':'http://www.sample.com/sample.xls', 'description': 'sample doc', 'labels': ['raw data']}})
```

--------------------------------

### Configure data.world Library Authentication - Python

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Configures authentication for the data.world-py library using various methods including CLI, environment variables, inline tokens, and named profiles. Select the method that best suits your environment and security needs.

```python
import datadotworld as dw
import os

# Method 1: Use CLI to configure (interactive)
# Run in terminal: dw configure
# Enter token when prompted

# Method 2: Set environment variable
os.environ['DW_AUTH_TOKEN'] = 'your_token_here'
dataset = dw.load_dataset('owner/dataset-id')

# Method 3: Pass token inline
dataset = dw.load_dataset(
    'owner/dataset-id',
    auth_token='your_token_here'
)

# Method 4: Use named profile
# Create profile with: dw configure --profile production
dataset = dw.load_dataset(
    'owner/dataset-id',
    profile='production'
)

# Get API client with specific profile
client = dw.api_client(profile='production')

# Get API client with inline auth
client = dw.api_client(auth_token='your_token_here')
```

--------------------------------

### Set DW_AUTH_TOKEN environment variable

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Sets the data.world API authentication token as an environment variable. This is an alternative to using the 'dw configure' command.
```bash
export DW_AUTH_TOKEN=
```

--------------------------------

### Upload Files to a Dataset using data.world Python

Source: https://github.com/datadotworld/data.world-py/blob/main/docs/index.html

Illustrates how to save a table from a loaded dataset to a CSV file and then upload it to a data.world dataset using the API client. Requires the dataset ID and a list of file paths.

```python
import datadotworld as dw

# intro_dw is the dataset returned by dw.load_dataset(); export one table via its DataFrame
intro_dw.dataframes['datadotworldbballstats'].to_csv('dataset_intro.csv', index=False)
client = dw.api_client()
client.upload_files('jonloyens/an-intro-to-dataworld-dataset', files=['dataset_intro.csv'])
```

--------------------------------

### Load a dataset using datadotworld

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Loads a dataset from data.world into the local filesystem cache. Subsequent calls will use the cached copy unless force_update or auto_update is set to True. The loaded dataset can be accessed via properties like dataframes, tables, or raw_data.

```python
import datadotworld as dw

intro_dataset = dw.load_dataset('jonloyens/an-intro-to-dataworld-dataset')
```

--------------------------------

### Access dataset tables

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Accesses dataset tables as lists of rows, where each row is a dictionary mapping column names to values. This provides structured access to non-DataFrame compatible data.

```python
>>> stats_table = intro_dataset.tables['datadotworldbballstats']
>>> stats_table[0]
OrderedDict([('Name', 'Jon'),
             ('PointsPerGame', Decimal('20.4')),
             ('AssistsPerGame', Decimal('1.3'))])
```

--------------------------------

### Load Dataset with data.world-py

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Downloads and caches datasets locally for offline access. It returns a LocalDataset object that allows access to data in multiple formats like pandas DataFrames, tables (lists of dicts), and raw bytes.
Options for forcing updates or auto-updating are available, along with functions to describe dataset metadata and resource schemas.

```python
import datadotworld as dw

# Load dataset with automatic caching
dataset = dw.load_dataset('jonloyens/an-intro-to-dataworld-dataset')

# Access data as pandas DataFrames
dataframes = dataset.dataframes
stats_df = dataframes['datadotworldbballstats']
print(stats_df.head())

# Access data as tables (list of dicts)
stats_table = dataset.tables['datadotworldbballstats']
print(stats_table[0])
# Output: OrderedDict([('Name', 'Jon'), ('PointsPerGame', Decimal('20.4')), ('AssistsPerGame', Decimal('1.3'))])

# Access raw data as bytes
raw_data = dataset.raw_data['datadotworldbballstats']

# Force update to get latest version
dataset = dw.load_dataset('jonloyens/an-intro-to-dataworld-dataset', force_update=True)

# Auto-update only if newer version exists
dataset = dw.load_dataset('jonloyens/an-intro-to-dataworld-dataset', auto_update=True)

# View dataset metadata
metadata = dataset.describe()
print(metadata['resources'])

# View specific resource metadata
resource_info = dataset.describe('datadotworldbballstats')
print(resource_info['schema'])
```

--------------------------------

### Read Remote Files from data.world Datasets (Python)

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Demonstrates reading various file types (text, CSV, binary) from data.world datasets using streaming file-like objects. Supports custom chunk sizes for large files.
```python
import datadotworld as dw
import csv

# Read text file
with dw.open_remote_file('username/test-dataset', 'test.txt', mode='r') as r:
    content = r.read()
    print(content)

# Read CSV file line by line
with dw.open_remote_file('username/test-dataset', 'test.csv', mode='r') as r:
    csvr = csv.DictReader(r)
    for row in csvr:
        print(row['column_a'], row['column_b'])

# Read binary file
with dw.open_remote_file('username/test-dataset', 'test.bin', mode='rb') as r:
    binary_data = r.read()
    print(len(binary_data))

# Read with custom chunk size for large files
# (process_chunk is a placeholder for your own handler)
with dw.open_remote_file('username/test-dataset', 'large_file.bin', mode='rb', chunk_size=8192) as r:
    for chunk in r:
        process_chunk(chunk)
```

--------------------------------

### Write Files to data.world Datasets with data.world-py

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Enables writing data to remote files within data.world datasets using file-like objects compatible with standard Python libraries. This supports writing various formats, including plain text, pandas DataFrames as CSV, CSV using the `csv` module, JSON Lines, and binary files. The `open_remote_file` function handles the interaction with the data.world API for these operations.
```python
import datadotworld as dw
import pandas as pd
import csv
import json

# Write a simple text file
with dw.open_remote_file('username/test-dataset', 'test.txt') as w:
    w.write("this is a test.")

# Write pandas DataFrame as CSV
df = pd.DataFrame({'foo': [1, 2, 3, 4], 'bar': ['a', 'b', 'c', 'd']})
with dw.open_remote_file('username/test-dataset', 'dataframe.csv') as w:
    df.to_csv(w, index=False)

# Write CSV using csv module
with dw.open_remote_file('username/test-dataset', 'test.csv') as w:
    csvw = csv.DictWriter(w, fieldnames=['foo', 'bar'])
    csvw.writeheader()
    csvw.writerow({'foo': 42, 'bar': "A"})
    csvw.writerow({'foo': 13, 'bar': "B"})

# Write JSON Lines file
with dw.open_remote_file('username/test-dataset', 'test.jsonl') as w:
    json.dump({'foo': 42, 'bar': "A"}, w)
    w.write("\n")
    json.dump({'foo': 13, 'bar': "B"}, w)
    w.write("\n")

# Write binary file
with dw.open_remote_file('username/test-dataset', 'test.bin', mode='wb') as w:
    w.write(bytes([100, 97, 116, 97, 46, 119, 111, 114, 108, 100]))
```

--------------------------------

### Upload Local Files to Dataset - Python

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Uploads one or more files from the local filesystem to a data.world dataset. Can specify custom names for uploaded files within the dataset. Ensure the dataset exists and the client has upload permissions.
```python
import datadotworld as dw

client = dw.api_client()

# Upload single file
client.upload_file(
    'username/test-dataset',
    'local_data.csv'
)

# Upload multiple files
client.upload_files(
    'username/test-dataset',
    [
        '/path/to/file1.csv',
        '/path/to/file2.json',
        '/path/to/file3.xlsx'
    ]
)

# Upload with custom name in dataset
client.upload_file(
    'username/test-dataset',
    '/local/path/my_file.csv',
    name='uploaded_data.csv'
)
```

--------------------------------

### Retrieve Query Result Metadata in Python

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Fetches metadata associated with a data.world query result using the describe() function. This metadata includes information about the fields, such as their names and data types. This is helpful for understanding the structure of the returned data.

```python
>>> results.describe()
{'fields': [{'name': 'Name', 'type': 'string'},
            {'name': 'PointsPerGame', 'type': 'number'},
            {'name': 'AssistsPerGame', 'type': 'number'}]}
```

--------------------------------

### Execute SQL Query and Access Results in Python

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Executes a SQL query against a specified data.world dataset and provides access to the results. The query results can be accessed as raw JSON, a list of rows (table), or a pandas DataFrame. This snippet demonstrates basic SQL querying.

```python
import datadotworld as dw

results = dw.query('jonloyens/an-intro-to-dataworld-dataset', 'SELECT * FROM DataDotWorldBBallStats')
```

--------------------------------

### Append Records to Dataset - Python

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Appends records to a specified dataset and table. Requires a valid data.world client and dataset/table identifiers. The records should be in a format compatible with the table schema.
```python
import datadotworld as dw
import time

client = dw.api_client()

# Example with complex nested data
event_data = {
    'event_type': 'user_action',
    'user_id': 'user123',
    'timestamp': time.time(),
    'metadata': {
        'action': 'click',
        'target': 'button_submit',
        'session_id': 'session456'
    },
    'properties': ['prop1', 'prop2']
}
client.append_records('username/test-dataset', 'events', event_data)
```

--------------------------------

### Access dataset dataframes

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Accesses tabular data from a loaded dataset as pandas DataFrames. The data is lazy loaded and cached. Files not containing tabular data are accessible via raw_data.

```python
>>> intro_dataset.dataframes
LazyLoadedDict({
    'changelog': LazyLoadedValue(),
    'datadotworldbballstats': LazyLoadedValue(),
    'datadotworldbballteam': LazyLoadedValue()})
```

--------------------------------

### Append Records to data.world Stream (Python)

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Illustrates how to append JSON records to a data stream within a data.world dataset, suitable for real-time data ingestion. Supports appending single or multiple records.

```python
import datadotworld as dw
import time

client = dw.api_client()

# Append single record to stream
client.append_records(
    'username/test-dataset',
    'sensor_stream',
    {'timestamp': time.time(), 'temperature': 72.5, 'humidity': 45}
)

# Append multiple records
for i in range(10):
    record = {
        'id': i,
        'timestamp': time.time(),
        'value': i * 2.5,
        'status': 'active'
    }
    client.append_records('username/test-dataset', 'data_stream', record)
    time.sleep(0.1)

# Stream appears as .jsonl file in dataset
```

--------------------------------

### Read Binary Data from Remote File in Python

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Reads the entire content of a binary file from a data.world dataset.
The file is opened in binary read mode ('rb'), and the read() method returns the content as a byte array. The file object also acts as an iterator of bytes.

```python
import datadotworld as dw

with dw.open_remote_file('username/test-dataset', 'test', mode='rb') as r:
    data = r.read()  # named `data` to avoid shadowing the built-in `bytes`
```

--------------------------------

### Read Text from Remote File in Python

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Reads the entire content of a text file from a data.world dataset. The file is opened in read mode ('r') using a context manager, and its content is retrieved using the read() method.

```python
import datadotworld as dw

with dw.open_remote_file('username/test-dataset', 'test.txt', mode='r') as r:
    print(r.read())  # call read(); printing r.read would only show the bound method
```

--------------------------------

### Write JSON Lines to Remote File in Python

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Writes multiple JSON objects, one per line, to a file in a data.world dataset. This is achieved by using json.dump() repeatedly within the context of an opened remote file. json.dump() does not emit a trailing newline, so each record is followed by an explicit newline write.

```python
import json

import datadotworld as dw

with dw.open_remote_file('username/test-dataset', 'test.jsonl') as w:
    json.dump({'foo':42, 'bar':"A"}, w)
    w.write("\n")  # terminate the record so the next object starts on its own line
    json.dump({'foo':13, 'bar':"B"}, w)
    w.write("\n")
```

--------------------------------

### Append records to a data stream

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Illustrates the use of the `append_records` function to add JSON data to a data stream within a dataset. Streams are created automatically on the first use of a `streamId`. Appended records appear as .jsonl files.

```python
>>> client = dw.api_client()
>>> client.append_records('username/test-dataset','streamId', {'data': 'data'})
```

--------------------------------

### Query Datasets with data.world-py

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Executes SQL or SPARQL queries against datasets on data.world without local caching.
Results can be accessed as pandas DataFrames, tables (lists of dicts), or raw JSON. The function supports query parameters for both SQL (positional) and SPARQL (named) queries. It also allows viewing the schema of the query results.

```python
import datadotworld as dw

# Execute SQL query
results = dw.query(
    'jonloyens/an-intro-to-dataworld-dataset',
    'SELECT * FROM DataDotWorldBBallStats WHERE PointsPerGame > 15'
)

# Access results as pandas DataFrame
df = results.dataframe
print(df)
#      Name  PointsPerGame  AssistsPerGame
# 0     Jon           20.4             1.3
# 1     Rob           15.5             8.0
# 2  Sharon           30.1            11.2
# 3  Ariane           18.1             3.0
# 4   Bryon           16.0             8.5

# Access results as table (list of dicts)
table = results.table
print(table[0])
# Output: OrderedDict([('Name', 'Jon'), ('PointsPerGame', Decimal('20.4')), ('AssistsPerGame', Decimal('1.3'))])

# Access raw JSON data
raw = results.raw_data
print(raw)

# Execute SPARQL query with parameters
# The two triple patterns below have no predicate; supply your dataset's
# property IRIs between subject and object before running this query.
sparql_query = """
SELECT ?name ?points
WHERE {
    ?player ?name .
    ?player ?points .
    FILTER(?points > ?minPoints)
}
"""
results = dw.query(
    'jonloyens/an-intro-to-dataworld-dataset',
    sparql_query,
    query_type='sparql',
    parameters={'minPoints': 15.0}
)

# Execute SQL with positional parameters
sql_query = "SELECT * FROM DataDotWorldBBallStats WHERE PointsPerGame > $data_world_param0"
results = dw.query(
    'jonloyens/an-intro-to-dataworld-dataset',
    sql_query,
    query_type='sql',
    parameters=[15.0]
)

# View query result schema
schema = results.describe()
print(schema['fields'])
```

--------------------------------

### Delete Dataset - Python

Source: https://context7.com/datadotworld/data.world-py/llms.txt

Permanently deletes a dataset from data.world. This action is irreversible. It's recommended to wrap the deletion call in a try-except block for robust error handling, especially when dealing with potentially non-existent datasets.
```python
import datadotworld as dw

client = dw.api_client()

# Delete dataset
client.delete_dataset('username/test-dataset')

# Wrap in try-except for error handling
try:
    client.delete_dataset('username/nonexistent-dataset')
except Exception as e:
    print(f"Error deleting dataset: {e}")
```

--------------------------------

### Read CSV Data from Remote File into DictReader in Python

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Reads a CSV file from a data.world dataset and processes it using csv.DictReader. The file is opened in read mode ('r'), and the DictReader iterates over the rows, yielding each row as a dictionary. This is suitable for structured CSV data.

```python
import csv

import datadotworld as dw

with dw.open_remote_file('username/test-dataset', 'test.txt', mode='r') as r:
    csvr = csv.DictReader(r)
    for row in csvr:
        print(row['column a'], row['column b'])
```

--------------------------------

### Write Pandas DataFrame to CSV in Remote File (Python)

Source: https://github.com/datadotworld/data.world-py/blob/main/README.rst

Serializes a pandas DataFrame and writes it as a CSV file to a data.world dataset. The example passes the remote file handle to the DataFrame's to_csv method as the output stream. It's a convenient way to upload tabular data.

```python
import pandas as pd

import datadotworld as dw

df = pd.DataFrame({'foo':[1,2,3,4],'bar':['a','b','c','d']})
with dw.open_remote_file('username/test-dataset', 'dataframe.csv') as w:
    df.to_csv(w, index=False)
```
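--------------------------------

### Check Writer Logic Locally Before Uploading (Python sketch)

The write examples in this collection pass the handle returned by `dw.open_remote_file()` to ordinary Python writers such as `csv.DictWriter` and `json.dump`. The sketch below exercises the same writer logic offline, assuming the remote handle behaves like a standard text file object. `io.StringIO` stands in for the remote handle, and the helper names `write_rows_as_csv` and `write_rows_as_jsonl` are hypothetical; neither is part of data.world-py.

```python
import csv
import io
import json


def write_rows_as_csv(handle, rows, fieldnames):
    """Write dict rows as CSV to any text file-like object (hypothetical helper)."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def write_rows_as_jsonl(handle, rows):
    """Write one JSON object per line to any text file-like object (hypothetical helper)."""
    for row in rows:
        json.dump(row, handle)
        handle.write("\n")  # json.dump() adds no newline of its own


rows = [{"foo": 42, "bar": "A"}, {"foo": 13, "bar": "B"}]

# io.StringIO is an assumption: it stands in for the handle from dw.open_remote_file(...)
csv_buffer = io.StringIO()
write_rows_as_csv(csv_buffer, rows, fieldnames=["foo", "bar"])
csv_text = csv_buffer.getvalue()

jsonl_buffer = io.StringIO()
write_rows_as_jsonl(jsonl_buffer, rows)
jsonl_text = jsonl_buffer.getvalue()

print(csv_text)
print(jsonl_text)
```

If the output looks right, the same helpers could in principle be handed a handle opened with `dw.open_remote_file('username/test-dataset', 'test.csv')`, although that step needs a configured token and is not covered by this sketch.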