### Install and run the 'ia' command-line tool

Source: https://archive.org/developers/internetarchive/cli

Download the binary for the `ia` command-line tool, make it executable, and view its help information. This is the initial setup for using the tool.

```bash
curl -LOs https://archive.org/download/ia-pex/ia
chmod +x ia
./ia --help
```

--------------------------------

### Get Item Metadata using ia CLI

Source: https://archive.org/developers/quick-start-cli

Command to retrieve metadata for a specific item on the Internet Archive using its unique identifier. Requires the 'ia' tool to be installed and configured.

```bash
ia metadata <identifier>
```

--------------------------------

### Install and Configure ia CLI Tool

Source: https://archive.org/developers/quick-start-cli

Steps to download the 'ia' CLI tool, make it executable, and configure it with Internet Archive credentials. Requires cURL and a Unix-like environment.

```bash
curl -LOs https://archive.org/download/ia-pex/ia
```

```bash
chmod +x ia
```

```bash
ia configure
```

--------------------------------

### Upload Item to Internet Archive using ia CLI

Source: https://archive.org/developers/quick-start-cli

Syntax for uploading files to the Internet Archive using the 'ia' CLI tool. Allows specifying metadata such as mediatype and other parameters. Requires the 'ia' tool to be installed and configured.

```bash
ia upload <identifier> file1 file2 --metadata="mediatype:texts" --metadata="param:arg"
```

--------------------------------

### Get Item Metadata using the `internetarchive` Python package

Source: https://archive.org/developers/quick-start-pip

Retrieves and prints the metadata associated with a specific item from the Internet Archive using the `internetarchive` Python package. Requires the `internetarchive` library to be installed.
```python
from internetarchive import get_item

item = get_item('<identifier>')
# Print each metadata field as "key : value".
for k, v in item.metadata.items():
    print(k, ":", v)
```

--------------------------------

### Upload Item to Internet Archive using the `internetarchive` Python package

Source: https://archive.org/developers/quick-start-pip

Uploads a new item to the Internet Archive, including specified files and metadata. This function requires valid access keys and item details. Ensure the `internetarchive` package is installed.

```python
from internetarchive import upload

md = {'collection': 'test_collection', 'title': 'My New Item', 'mediatype': 'movies'}
r = upload('<identifier>', files=['film.txt', 'film.mov'], metadata=md,
           access_key='YoUrAcCEssKey', secret_key='youRSECRETKEY')
r[0].status_code
```

--------------------------------

### Install Tox

Source: https://archive.org/developers/internetarchive/contributing

Installs tox, a tool for automating testing in multiple Python environments.

```bash
$ pip install tox
```

--------------------------------

### Install Testing Dependencies

Source: https://archive.org/developers/internetarchive/contributing

Installs the necessary packages for running tests: pytest, pytest-pep8, and responses.

```bash
$ pip install pytest pytest-pep8 responses
```

--------------------------------

### Install Internetarchive Locally

Source: https://archive.org/developers/internetarchive/contributing

Installs the internetarchive library in editable mode after navigating into its directory.

```bash
$ cd internetarchive
$ pip install -e .
```

--------------------------------

### Search Items by Identifier (Python)

Source: https://archive.org/developers/internetarchive/quickstart

Shows how to use the `search_items` function to find items on archive.org based on their identifier. The example iterates through results and prints each item's identifier.
```python
from internetarchive import search_items

for i in search_items('identifier:nasa'):
    print(i['identifier'])
```

--------------------------------

### Changes API Cold Start Response Example

Source: https://archive.org/developers/changes

Example JSON response from the Changes API during a cold start. It includes a flag for sleep behavior, a list of 'changes' with 'identifier', and a 'next_token' for subsequent calls.

```json
{
  "do_sleep_before_returning": false,
  "changes": [
    { "identifier": "0----------" },
    ...
    { "identifier": "008MRAnonymous" }
  ],
  "next_token": "eyJmaW5pc2hlZF9tYXJrZXIiOiI1MzIyODQzOTAiLCJzY2FuX3N0YXJ0IjoiMDA4TVJBbm9ueW1vdXMifQ=="
}
```

--------------------------------

### Install internetarchive using pipx

Source: https://archive.org/developers/internetarchive/installation

Installs the internetarchive Python library and command-line tool in an isolated environment using pipx. This is the recommended installation method.

```bash
pipx install internetarchive
```

--------------------------------

### Filter Downloads using Glob Pattern

Source: https://archive.org/developers/internetarchive/quickstart

Demonstrates how to download only specific files from an archive item using the `glob_pattern` parameter. This example downloads all XML files.

```python
download('nasa', verbose=True, glob_pattern='*xml')
```

--------------------------------

### Verify internetarchive Installation

Source: https://archive.org/developers/internetarchive/installation

Command to verify that the internetarchive library has been installed correctly. It should display the installed version number.

```bash
ia --version
```

--------------------------------

### Local Start Time Example

Source: https://archive.org/developers/metadata-schema/index

This snippet shows an example of the local start time for a program in its broadcast time zone. The format is YYYY-MM-DD HH:MM:SS and is primarily used for TV Archive items.
```date-time 2010-03-26 18:00:00 ``` -------------------------------- ### Reviews API Example Source: https://archive.org/developers/iarest An example of a POST request to the Reviews API to add or update a user's review, including request and success/error response examples. ```APIDOC ## POST /services/reviews.php ### Description Adds or updates a user's review for a specific item. ### Method POST ### Endpoint `/services/reviews.php` ### Query Parameters - **identifier** (string) - Required - The unique identifier of the item. - **version** (string) - Required - The version of the item. ### Request Body - **title** (string) - Required - The title of the review. - **body** (string) - Required - The content of the review. - **stars** (integer) - Required - The star rating for the review (1-5). ### Request Example ``` POST /services/reviews.php?identifier=foo&version=1 HTTP/1.1 Host: archive.org Authorization: LOW : Content-Type: application/json Accept-Encoding: gzip, deflate { "title":"A review title", "body":"A review body", "stars":1 } ``` ### Response #### Success Response (200) - **success** (boolean) - Indicates if the operation was successful. - **value** (object) - Contains task details. - **task_id** (integer) - The ID of the task. - **review_updated** (boolean) - Whether the review was updated or added. #### Response Example (Success) ``` HTTP/1.1 200 OK Content-Type: application/json Content-Encoding: gzip { "success":true, "value":{ "task_id":1234, "review_updated":false } } ``` #### Error Response Example (401 Unauthorized) ``` HTTP/1.1 401 Unauthorized Content-Type: application/json Content-Encoding: gzip { "success":false, "error":"Authentication failed" } ``` ``` -------------------------------- ### Example Request for Website Snapshots Source: https://archive.org/developers/tutorial-compare-snapshot-wayback An example cURL command to retrieve snapshots for 'tc.eserver.org' from the Wayback Machine CDX Server API. 
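The same request can be prepared from Python, and each space-separated line of the response split into named fields. The sketch below is a hedged illustration: `build_cdx_url` and `parse_cdx_line` are hypothetical helper names, the field names are an assumption based on the CDX server's default column order, and the sample line is taken from the example response elsewhere in this document.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"
# Assumed default column order of a CDX response line.
CDX_FIELDS = ("urlkey", "timestamp", "original", "mimetype",
              "statuscode", "digest", "length")


def build_cdx_url(url):
    """Return the CDX query URL for the given site (hypothetical helper)."""
    return f"{CDX_ENDPOINT}?{urlencode({'url': url})}"


def parse_cdx_line(line):
    """Split one space-separated CDX line into a dict keyed by field name."""
    return dict(zip(CDX_FIELDS, line.split()))


sample = ("org,eserver,tc)/ 20180515033912 http://tc.eserver.org:80/ "
          "text/html 302 RK36SX4X6VJ44FMUWDK4QYFPYGBYUJUH 404")
record = parse_cdx_line(sample)
print(build_cdx_url("tc.eserver.org"))
print(record["timestamp"], record["statuscode"])
```

The cURL form of the request: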
```bash curl -X GET "http://web.archive.org/cdx/search/cdx?url=tc.eserver.org" ``` -------------------------------- ### Check Python Version Source: https://archive.org/developers/internetarchive/installation Command to check the currently installed Python version. This is a prerequisite for installing the internetarchive library. ```bash python --version ``` -------------------------------- ### Download and execute internetarchive binary Source: https://archive.org/developers/internetarchive/installation Downloads the pre-compiled binary for the 'ia' command-line tool using curl and makes it executable. This is an alternative installation method for users who only need the CLI. ```bash $ curl -LOs https://archive.org/download/ia-pex/ia $ chmod +x ia ``` -------------------------------- ### Check cURL Installation Source: https://archive.org/developers/tutorial-compare-snapshot-wayback Verifies if the cURL command-line tool is installed on your system. cURL is required for interacting with the Wayback Machine CDX Server API. ```bash curl ``` -------------------------------- ### Runtime Examples (HH:MM:SS, M:SS) Source: https://archive.org/developers/metadata-schema/index Examples of runtime formats for audio or video items, showing different ways to represent duration. ```text 00:15:00 ``` ```text 2:12 ``` ```text 0:23 ``` -------------------------------- ### License URL Example Source: https://archive.org/developers/metadata-schema/index Example of a License URL pointing to a recognized license, such as Creative Commons. ```text http://creativecommons.org/licenses/by-nd/3.0/ ``` -------------------------------- ### Sort-By Example (Date) Source: https://archive.org/developers/metadata-schema/index An example of the sort-by field, specifying the default sorting order for a collection, in this case, by date in descending order. 
```text -date ``` -------------------------------- ### Check cURL Installation Source: https://archive.org/developers/tutorial-get-snapshot-wayback Verify if the cURL command-line tool is installed on your system. cURL is a prerequisite for making requests to the Wayback Machine API. ```bash curl ``` -------------------------------- ### Successful single-target write response example (HTTP) Source: https://archive.org/developers/md-write An example of a successful HTTP response for a single-target write operation. It indicates success, provides a task ID for tracking, and a URL to the execution log. ```http HTTP/1.1 200 OK Content-Type: application/json Transfer-Encoding: chunked { "success":true, "task_id":2391928033, "log":"https://catalogd.archive.org/log/2391928033" } ``` -------------------------------- ### Example Successful Response for Adding/Updating Review Source: https://archive.org/developers/reviews Provides an example of a successful response after adding or updating a review, including the task ID for the operation and a boolean indicating if a review was updated. ```json { "success":true, "value":{ "task_id":1234, "review_updated":false } } ``` -------------------------------- ### UTC Start Time Example Source: https://archive.org/developers/metadata-schema/index Defines the start time of a program in UTC, mainly for TV Archive items. Format: YYYY-MM-DD HH:MM:SS. ```text 2010-03-26 15:00:00 ``` -------------------------------- ### Fetching Task Listings (GET requests) Source: https://archive.org/developers/tasks Demonstrates how to retrieve task listings from the Internet Archive Tasks API using HTTP GET requests. It shows examples of specifying categories and criteria for filtering tasks. 
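A request URL for this endpoint can also be assembled programmatically, which keeps criteria such as `submittime>=` correctly percent-encoded. This is a minimal sketch using only the Python standard library; `tasks_url` is a hypothetical helper name, not part of any Internet Archive client.

```python
from urllib.parse import urlencode

TASKS_ENDPOINT = "https://archive.org/services/tasks.php"


def tasks_url(criteria):
    """Return a Tasks API GET URL for a dict of query parameters."""
    return f"{TASKS_ENDPOINT}?{urlencode(criteria)}"


print(tasks_url({"summary": 1, "history": 1}))
# The ">=" in the parameter name is percent-encoded by urlencode.
print(tasks_url({"submittime>=": "Jan 1 2018"}))
```

The example URLs from the documentation: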
```http
https://archive.org/services/tasks.php?summary=1&history=1
https://archive.org/services/tasks.php?catalog=1&summary=0
https://archive.org/services/tasks.php?identifier=prelinger&cmd=archive.php&history=1
https://archive.org/services/tasks.php?submittime%3E%3D=Jan+1+2018
```

--------------------------------

### Slice Array Elements from Item Files (HTTP GET)

Source: https://archive.org/developers/md-read

Demonstrates how to retrieve a range of elements from the 'files' array in an item's metadata using the 'start' and 'count' query parameters in the GET request.

```HTTP Request
https://archive.org/metadata/gov.uspto.patents.application.10743335/files?start=100&count=5
```

--------------------------------

### Download Files from Archive.org using Python

Source: https://archive.org/developers/internetarchive/quickstart

Demonstrates the basic usage of the `download` function to retrieve files from a specified archive item. It shows how to enable verbose output for progress monitoring.

```python
from internetarchive import download

download('nasa', verbose=True)
```

--------------------------------

### Collection Summary Metadata Example

Source: https://archive.org/developers/metadata-schema/index

This example demonstrates the format for a collection summary, which can include HTML, CSS, and potentially JavaScript. It is displayed at the top of collection pages.

```html
The Universal School Library (USL), is a growing collection of digitized books within the Internet Archive's larger holdings, made available through controlled digital lending, and curated by a national advisory group of school librarians, librarian educators and researchers.
```

--------------------------------

### Example Identifiers for Simple Lists

Source: https://archive.org/developers/simplelists

Illustrative identifiers used in examples for the Simple Lists service, demonstrating a collection parent, a book item child, and a list name.
```text Parent: library_of_atlantis (a collection) Child: isbn_9780920303122 (book item) List name: holdings ``` -------------------------------- ### Example Creator Codes Source: https://archive.org/developers/metadata-schema/index Illustrates examples of creator entries, representing the individual or organization that created the media content. Formats vary from standard naming conventions to organization names. ```plaintext Austen, Jane, 1775-1817 ``` ```plaintext Ralph Burns ``` -------------------------------- ### Example Item Description Source: https://archive.org/developers/metadata-schema/index An example of an item description, which can include details about the media content, physical item, or creator. HTML, CSS, and historically Javascript were supported for formatting. ```plaintext Cinemascope homage to the city of San Francisco made by amateur filmmaker and inventor Tullio Pellegrini. ``` -------------------------------- ### Example Request to Wayback Machine API Source: https://archive.org/developers/tutorial-get-snapshot-wayback An example of how to use the cURL command to query the Wayback Machine API for the archive status of a specific website (http://tc.eserver.org/). ```bash curl -X GET "https://archive.org/wayback/available?url=http://tc.eserver.org/" ``` -------------------------------- ### Run Basic Tests Source: https://archive.org/developers/internetarchive/contributing Executes the tests using py.test, including PEP8 compliance checks. ```bash $ py.test --pep8 ``` -------------------------------- ### Error single-target write response example (HTTP) Source: https://archive.org/developers/md-write An example of an error HTTP response for a single-target write operation. It indicates failure and provides an error message explaining the issue. 
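A client would typically branch on the `success` flag of the JSON body. The sketch below is illustrative only: `describe_write_result` is a hypothetical helper, and the two bodies are the success and error payloads shown in this document.

```python
import json


def describe_write_result(body):
    """Summarize a single-target write response body (a JSON string)."""
    result = json.loads(body)
    if result.get("success"):
        return f"ok: task {result['task_id']}, log {result['log']}"
    return f"failed: {result.get('error', 'unknown error')}"


ok_body = ('{"success":true,"task_id":2391928033,'
           '"log":"https://catalogd.archive.org/log/2391928033"}')
error_body = '{"success":false,"error":"No changes made to _meta.xml"}'

print(describe_write_result(ok_body))
print(describe_write_result(error_body))
```

The raw error response: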
```http HTTP/1.1 400 Bad Request Content-Type: application/json Transfer-Encoding: chunked { "success":false, "error":"No changes made to _meta.xml" } ``` -------------------------------- ### GET /changes/v1 Source: https://archive.org/developers/changes Retrieves changes from the archive. Supports a cold start by setting `start_date` to `0` to dump all items, followed by incremental changes using `next_token`. ```APIDOC ## GET /changes/v1 ### Description Retrieves changes from the archive. When `start_date` is set to `0`, it initiates a cold start by dumping all items, followed by incremental changes based on the dump time. Subsequent calls with the `next_token` will fetch subsequent changes. ### Method GET ### Endpoint https://be-api.us.archive.org/changes/v1 ### Parameters #### Query Parameters - **access** (string) - Required - Your access token. - **secret** (string) - Required - Your secret key. - **start_date** (integer) - Optional - Set to `0` for a cold start (dump all items). If not provided, it defaults to incremental changes since the last retrieval. - **token** (string) - Optional - Token obtained from a previous call's `next_token` to retrieve subsequent changes. ### Request Example ```bash curl --data-urlencode access="$ACCESS" --data-urlencode secret="$SECRET" \ --data-urlencode start_date=0 \ https://be-api.us.archive.org/changes/v1 | jq . ``` ### Response #### Success Response (200) - **do_sleep_before_returning** (boolean) - Indicates if a sleep is scheduled before returning. - **changes** (array) - A list of changes, where each change object may contain an `identifier`. - **identifier** (string) - The identifier of the changed item. - **next_token** (string) - A token to be used in subsequent calls to retrieve the next set of changes. 
#### Response Example ```json { "do_sleep_before_returning": false, "changes": [ { "identifier": "0----------" }, { "identifier": "008MRAnonymous" } ], "next_token": "eyJmaW5pc2hlZF9tYXJrZXIiOiI1MzIyODQzOTAiLCJzY2FuX3N0YXJ0IjoiMDA4TVJBbm9ueW1vdXMifQ==" } ``` ``` -------------------------------- ### Download On-the-Fly Files (Python) Source: https://archive.org/developers/internetarchive/quickstart Demonstrates downloading files that are generated on-the-fly by archive.org, such as EPUB, MOBI, DAISY, and MARCXML formats. This utilizes the `on_the_fly=True` parameter in the `download` function. ```python from internetarchive import download download('wonderfulwizardo00baumiala', verbose=True, formats='DAISY', on_the_fly=True) ``` -------------------------------- ### Run All Tests with Tox Source: https://archive.org/developers/internetarchive/contributing Executes tox from the repository root to run tests against all supported Python versions defined in tox.ini. ```bash $ tox ``` -------------------------------- ### Box ID Format Example Source: https://archive.org/developers/metadata-schema/index Demonstrates the required format for the 'boxid' field, which specifies the physical location of an item in the archive. Box IDs always start with 'IA' followed by numbers. ```text IA158001 ``` -------------------------------- ### Get archive.org item using Python Source: https://archive.org/developers/internetarchive/internetarchive Demonstrates how to use the `internetarchive` Python library to retrieve an item from archive.org. It requires the `internetarchive` library to be installed. The code fetches an item by its identifier and checks for its existence. 
```python
>>> from internetarchive import get_item
>>> item = get_item('govlawgacode20071')
>>> item.exists
True
```

--------------------------------

### Example Response for Website Snapshots

Source: https://archive.org/developers/tutorial-compare-snapshot-wayback

Sample output from the Wayback Machine CDX Server API, detailing various snapshots of a website. Each line represents a snapshot with its associated metadata.

```text
org,eserver,tc)/ 20180515033912 http://tc.eserver.org:80/ text/html 302 RK36SX4X6VJ44FMUWDK4QYFPYGBYUJUH 404
org,eserver,tc)/ 20180716082607 http://tc.eserver.org:80/ text/html 302 RK36SX4X6VJ44FMUWDK4QYFPYGBYUJUH 405
org,eserver,tc)/ 20180915160723 http://tc.eserver.org:80/ text/html 302 RK36SX4X6VJ44FMUWDK4QYFPYGBYUJUH 404
org,eserver,tc)/ 20181014163006 http://tc.eserver.org/ warc/revisit - RK36SX4X6VJ44FMUWDK4QYFPYGBYUJUH 502
org,eserver,tc)/ 20181115172501 http://tc.eserver.org:80/ text/html 302 RK36SX4X6VJ44FMUWDK4QYFPYGBYUJUH 404
org,eserver,tc)/ 20181228210547 http://tc.eserver.org/ warc/revisit - RK36SX4X6VJ44FMUWDK4QYFPYGBYUJUH 500
```

--------------------------------

### Simulate API Errors with Curl

Source: https://archive.org/developers/ias3

Developers can test error handling by simulating S3 API errors using the `x-archive-simulate-error` header. For example, to simulate a SlowDown error, set the header to `SlowDown`. Use `help` to get a list of available simulated errors.

```bash
$ curl s3.us.archive.org -v -H x-archive-simulate-error:SlowDown
```

```bash
$ curl s3.us.archive.org -v -H x-archive-simulate-error:help
```

--------------------------------

### Filter Downloads by Multiple Formats

Source: https://archive.org/developers/internetarchive/quickstart

Illustrates downloading multiple specified file formats from an archive item by providing a list of format strings to the `formats` parameter.
```python download('goodytwoshoes00newyiala', verbose=True, formats=['DjVuTXT', 'MARC']) ``` -------------------------------- ### Skip Downloaded Files Using MD5 Checksums Source: https://archive.org/developers/internetarchive/quickstart Shows how to use the `checksum=True` option in the `download` function to ensure file integrity by comparing MD5 checksums. This is safer but computationally more intensive. ```python download('nasa', verbose=True, checksum=True) ``` -------------------------------- ### Upgrade internetarchive using pipx Source: https://archive.org/developers/internetarchive/installation Command to upgrade the internetarchive library to the latest available version using pipx. ```bash pipx upgrade internetarchive ``` -------------------------------- ### Rights Statement Examples Source: https://archive.org/developers/metadata-schema/index Examples of rights statements, including a placeholder for Creative Commons licenses and a full text example for specific usage permissions. ```text Permission is granted under the Wikimedia Foundation's ``` ```text These National Treasury publications may not be reproduced wholly or in part without the express authorisation of the National Treasury in writing unless used for non-profit purposes. ``` -------------------------------- ### Install internetarchive package Source: https://archive.org/developers/tutorial-find-identifier-item Installs the necessary Python package for interacting with the Internet Archive. Ensure you have Python 3 and pip installed. ```shell pip install internetarchive ``` -------------------------------- ### Iterate Search Results as Item Objects (Python) Source: https://archive.org/developers/internetarchive/quickstart Illustrates retrieving full `Item` objects from search results using the `iter_as_items()` method of the `search_items` function. This allows for more detailed interaction with each found item. 
```python
from internetarchive import search_items

for item in search_items('identifier:nasa').iter_as_items():
    print(item)
```

--------------------------------

### Configure Archive.org Credentials using CLI

Source: https://archive.org/developers/internetarchive/quickstart

Configure the internetarchive library by saving your archive.org credentials to a configuration file using the `ia configure` command. This is necessary for uploading, searching, and modifying metadata. It supports providing credentials directly or using a netrc file.

```bash
$ ia configure
Enter your archive.org credentials below to configure 'ia'.
Email address: user@example.com
Password:
Config saved to: /home/user/.config/ia.ini
```

```bash
ia configure --netrc
```

```bash
ia --config-file '~/.ia-custom-config' configure
```

--------------------------------

### Skip Downloaded Files Based on Length and Date

Source: https://archive.org/developers/internetarchive/quickstart

Illustrates how the `download` function skips files that already exist locally when their filename, mtime, and size match those on archive.org. This prevents redundant downloads.

```python
download('nasa', verbose=True)
```

--------------------------------

### Clone Internetarchive Repository

Source: https://archive.org/developers/internetarchive/contributing

Clones the internetarchive library from its GitHub repository.

```bash
$ git clone https://github.com/jjjake/internetarchive
```

--------------------------------

### Sound Indicator Examples

Source: https://archive.org/developers/metadata-schema/index

Examples indicating whether media has sound or is silent.

```text
sound
```

```text
silent
```

--------------------------------

### Upload Files and Metadata with Python

Source: https://archive.org/developers/internetarchive/quickstart

Upload files to archive.org using the `internetarchive.upload` function.
You can specify metadata, pass a dictionary that maps remote filenames to local filenames, upload file-like objects, and upload directories. Existing files with the same name will be overwritten.

```python
from internetarchive import upload

md = {'collection': 'test_collection', 'title': 'My New Item', 'mediatype': 'movies'}
r = upload('<identifier>', files=['foo.txt', 'bar.mov'], metadata=md)
r[0].status_code
```

```python
r = upload('<identifier>', files={'remote-name.txt': 'local-name.txt'})
```

```python
from io import StringIO

r = upload('iacli-test-item301', {'foo.txt': StringIO('bar baz boo')})
```

```python
r = upload('my_item', 'my_dir')
```

```python
r = upload('my_item', 'my_dir/')
```

```python
r = upload('my_item', {'name': 'foo.txt', 'title': 'My File'})
```

```python
r = upload('my_item', [{'name': 'foo.txt', 'title': 'My File'}, {'name': 'bar.txt', 'title': 'My Other File'}])
```

--------------------------------

### Size Example (Number)

Source: https://archive.org/developers/metadata-schema/index

An example of the size field, typically representing the physical dimensions of a digitized item.

```text
10.0
```

--------------------------------

### Initialize Search Object with Various Parameters (Python)

Source: https://archive.org/developers/_modules/internetarchive/search

Shows the initialization of the Search class with a query and optional parameters such as fields, sorts, and request_kwargs. It highlights how parameters like 'page' and 'rows' are handled and how authentication is set up.
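The `page`/`rows` handling can be restated as a small standalone function. This is a sketch that mirrors the parameter branches of the constructor in this entry; `build_search_params` is a hypothetical name and is not part of the `internetarchive` API.

```python
def build_search_params(query, params=None):
    """Merge caller params with defaults, following the constructor's branches."""
    params = dict(params or {})  # copy so the caller's dict is not mutated
    default_params = {'q': query}
    if 'page' not in params:
        if 'rows' in params:
            # 'rows' without 'page' starts at the first page.
            params['page'] = 1
        else:
            # Neither given: request a large batch via 'count'.
            default_params['count'] = 10000
    else:
        default_params['output'] = 'json'
    # 'index' is accepted as an older name for 'scope'.
    if 'index' in params:
        params['scope'] = params.pop('index')
    merged = default_params.copy()
    merged.update(params)
    return merged


print(build_search_params('collection:nasa'))
print(build_search_params('collection:nasa', {'rows': 50}))
```

The constructor itself: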
```python
def __init__(self,
             archive_session,
             query,
             fields=None,
             sorts=None,
             params=None,
             full_text_search=None,
             dsl_fts=None,
             request_kwargs=None,
             max_retries=None):
    params = params or {}

    self.session = archive_session
    self.dsl_fts = False if not dsl_fts else True
    if self.dsl_fts or full_text_search:
        self.fts = True
    else:
        self.fts = False
    self.query = query
    if self.fts and not self.dsl_fts:
        self.query = f'!L {self.query}'
    self.fields = fields or []
    self.sorts = sorts or []
    self.request_kwargs = request_kwargs or {}
    self._num_found = None
    self.fts_url = f'{self.session.protocol}//be-api.us.archive.org/ia-pub-fts-api'
    self.scrape_url = f'{self.session.protocol}//{self.session.host}/services/search/v1/scrape'
    self.search_url = f'{self.session.protocol}//{self.session.host}/advancedsearch.php'
    if self.session.access_key and self.session.secret_key:
        self.auth = S3Auth(self.session.access_key, self.session.secret_key)
    else:
        self.auth = None
    self.max_retries = max_retries if max_retries is not None else 5

    # Initialize params.
    default_params = {'q': self.query}
    if 'page' not in params:
        if 'rows' in params:
            params['page'] = 1
        else:
            default_params['count'] = 10000
    else:
        default_params['output'] = 'json'
    # In the beta endpoint 'scope' was called 'index'.
    # Let's support both for a while.
    if 'index' in params:
        params['scope'] = params['index']
        del params['index']
    self.params = default_params.copy()
    self.params.update(params)

    # Set timeout.
    if 'timeout' not in self.request_kwargs:
        self.request_kwargs['timeout'] = 300

    # Set retries.
    self.session.mount_http_adapter(max_retries=self.max_retries)
```

--------------------------------

### ISBN Examples

Source: https://archive.org/developers/metadata-schema/index

Examples of ISBN-10 and ISBN-13 identifiers. The last digit of an ISBN can be a number from 0-9 or the character 'X'.
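An 'X' in the final position is the ISBN-10 check digit and stands for the value 10. The sketch below applies the standard ISBN-10 checksum (weights 10 down to 1, total divisible by 11); `is_valid_isbn10` is a hypothetical helper, not something the metadata schema provides, and the inputs are the two example values listed for this field.

```python
def is_valid_isbn10(isbn):
    """Return True if the string passes the ISBN-10 check-digit test."""
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # 'X' is only allowed as the final check digit
        elif char.isdigit():
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


print(is_valid_isbn10("3540212507"))
print(is_valid_isbn10("031294716X"))
```

The example values: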
```isbn 3540212507 ``` ```isbn 031294716X ``` -------------------------------- ### Download On-the-Fly Files with 'ia' Source: https://archive.org/developers/internetarchive/cli Demonstrates how to download files that are generated on-the-fly by archive.org using the --on-the-fly parameter. This is applicable for formats like EPUB, MOBI, DAISY, and MARCXML. ```bash $ ia download goodytwoshoes00newyiala --on-the-fly ``` -------------------------------- ### Example Contributor Identifier Source: https://archive.org/developers/metadata-schema/index Identifies the person or organization that provided the media. 'Robarts - University of Toronto' is an example of a contributor. ```text Robarts - University of Toronto ``` -------------------------------- ### Initialize ArchiveSession Source: https://archive.org/developers/internetarchive/internetarchive Demonstrates how to create an instance of the ArchiveSession class to begin interacting with the Internet Archive. ```python from internetarchive import ArchiveSession s = ArchiveSession() ``` -------------------------------- ### Uploader Email Address Example Source: https://archive.org/developers/metadata-schema/index Example of an email address for the 'uploader' field, which identifies the account that uploaded an item to archive.org. ```text footage@panix.com ``` -------------------------------- ### Download Collection or from Itemlist Source: https://archive.org/developers/internetarchive/cli Download all items from a specific collection or from a list of items specified in a text file. The `--search` flag is used for collections, and `--itemlist` is used for item lists. ```shell $ ia download --search 'collection:glasgowschoolofart' ``` ```shell $ ia download --itemlist itemlist.txt ``` -------------------------------- ### Scan Date Example (YYYYMMDDHHMMSS) Source: https://archive.org/developers/metadata-schema/index An example of the scan date format, representing the date and time of media capture or digitization. 
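A value in this format can be parsed with `datetime.strptime`. This is a short sketch that assumes the YYYYMMDDHHMMSS layout named in the heading; `parse_scandate` is a hypothetical helper, and no time zone is attached because the source does not state one.

```python
from datetime import datetime


def parse_scandate(value):
    """Parse a 14-digit YYYYMMDDHHMMSS scan date into a naive datetime."""
    return datetime.strptime(value, "%Y%m%d%H%M%S")


scanned = parse_scandate("20170329201345")
print(scanned.isoformat())
```

The example value: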
```text 20170329201345 ``` -------------------------------- ### Example of 'repub_state' value Source: https://archive.org/developers/metadata-schema/index Shows an example of the 'repub_state' field, which indicates the current state of a scanned book as a whole number. ```text 19 ``` -------------------------------- ### Year Metadata Example (Deprecated) Source: https://archive.org/developers/metadata-schema/index An example of the 'year' field, which is deprecated and should be replaced by the 'date' field. It expects a YYYY format. ```YYYY 1996 ``` -------------------------------- ### Title Metadata Example Source: https://archive.org/developers/metadata-schema/index An example of a media title. This field accepts plain text and does not support HTML or HTML entities. ```text San Francisco (1955 Cinemascope film) ``` -------------------------------- ### Filter Downloads by Single Format Source: https://archive.org/developers/internetarchive/quickstart Shows how to download a specific file format from an archive item using the `formats` parameter with a single string value. ```python download('goodytwoshoes00newyiala', verbose=True, formats='MARC') ``` -------------------------------- ### Sponsor Example Source: https://archive.org/developers/metadata-schema/index An example value for the 'sponsor' metadata field, identifying the person or organization that funded the digitization or collection of the media. ```text Kahle-Austin Foundation ``` -------------------------------- ### ArchiveSession Initialization Parameters Source: https://archive.org/developers/_modules/internetarchive/session Details the parameters for initializing an ArchiveSession, including configuration options, file paths, debug flags, and HTTP adapter arguments. ```python def __init__(self, config: Mapping | None = None, config_file: str = "", debug: bool = False, http_adapter_kwargs: MutableMapping | None = None): """Initialize :class:`ArchiveSession ` object with config. 
:param config: A config dict used for initializing the :class:`ArchiveSession ` object. :param config_file: Path to config file used for initializing the :class:`ArchiveSession ` object. :param http_adapter_kwargs: Keyword arguments used to initialize the :class:`requests.adapters.HTTPAdapter ` object. :returns: :class:`ArchiveSession` object. """ pass ``` -------------------------------- ### Skip Downloaded Files with Checksum Archive Source: https://archive.org/developers/internetarchive/quickstart Explains the use of `checksum_archive=True` for faster, safer downloads. It leverages a checksum archive file to avoid recalculating checksums for previously validated files. ```python download('nasa', verbose=True, checksum_archive=True) ``` -------------------------------- ### Example of an Internet Archive Item Metadata File URL Source: https://archive.org/developers/items An example demonstrating the structure of a URL pointing to a specific metadata file within an Internet Archive item. This shows how the identifier and filename are incorporated. ```text https://archive.org/download/popeye_taxi-turvey/popeye_taxi-turvey_meta.xml ``` -------------------------------- ### Scanner Examples (Hostname, Software) Source: https://archive.org/developers/metadata-schema/index Examples of scanner identifiers, including hostnames for web crawls and software versions for digitization tools. ```text scribe2.nj.archive.org ``` ```text selenium-101.us.archive.org ``` ```text Lasergraphics Scanstation ``` ```text ArchiveCD Version 2.1.15 ``` ```text Internet Archive HTML5 Uploader 1.6.3 ``` -------------------------------- ### Volume Metadata Example Source: https://archive.org/developers/metadata-schema/index This snippet shows an example of a volume number or name for a media item. This field is not overwritten by MARC data. 
```text 15 ``` -------------------------------- ### Subject/Keyword Metadata Example Source: https://archive.org/developers/metadata-schema/index This snippet shows an example of a subject or keyword for media content. The field accepts plain text in any alphabet. ```text France ``` -------------------------------- ### Wayback Availability API Response Example Source: https://archive.org/developers/_static/wayback Illustrates a successful response from the Wayback Machine Availability API, detailing results for requested URLs. Includes snapshot information like status, URL, and timestamp. ```json { "results": [ { "url": "http://www.entish.org", "timestamp": "2016-04-07T19:39:18Z", "snapshot": { "status": "200", "url": "http://web.archive.org/web/20160111075133/http://entish.org/", "timestamp": "2016-04-07T19:39:18Z" }, "tag": "0" }, { "url": "http://www.cnn.com/", "snapshot": { "url": "http://web.archive.org/web/20160413132039/http://www.cnn.com/", "timestamp": "2016-04-13T13:20:39Z" }, "tag": "1" }, { "url": "http://www.youcantfindthis.cat", "timestamp": "2016-04-07T19:39:18Z", "snapshot": {}, "tag": "2" } ] } ``` -------------------------------- ### LCCN Example Source: https://archive.org/developers/metadata-schema/index Example of a Library of Congress Control Number (LCCN), an identifier assigned by the Library of Congress. ```text 2004045278 ``` -------------------------------- ### Build request with progress bar for file upload using Python Source: https://archive.org/developers/_modules/internetarchive/item This code defines a function `_build_request` that prepares the file upload request, including progress reporting with `tqdm`. It determines the chunk size, creates a chunk generator, and wraps it with a progress bar for verbose output. It also handles empty files. ```python def _build_request(): body.seek(0, os.SEEK_SET) if verbose: try: # hack to raise exception so we get some output for # empty files.
if size == 0: raise Exception chunk_size = 1048576 expected_size = math.ceil(size / chunk_size) chunks = chunk_generator(body, chunk_size) progress_generator = tqdm(chunks, desc=f' uploading {key}', dynamic_ncols=True, total=expected_size, unit='MiB') data = None # pre_encode is needed because http doesn't know that it # needs to encode a TextIO object when it's wrapped # in the Iterator from tqdm. # So, this FileAdapter provides pre-encoded output data = IterableToFileAdapter( progress_generator, size, pre_encode=isinstance(body, io.TextIOBase) ) except Exception: ``` -------------------------------- ### ISSN Example Source: https://archive.org/developers/metadata-schema/index Examples of valid ISSN formats. An ISSN is a unique eight-digit serial number used to identify a serial publication. ```text 2528-7788 ``` ```text 1943-345X ``` -------------------------------- ### Install GNU Parallel with Homebrew Source: https://archive.org/developers/internetarchive/parallel This command installs the GNU Parallel tool using the Homebrew package manager on macOS. GNU Parallel is a shell tool for executing jobs in parallel, useful for bulk operations. ```shell brew install parallel ``` -------------------------------- ### Example Media Condition Source: https://archive.org/developers/metadata-schema/index Defines the physical condition of the media item. 'Good' is an example of an accepted value, representing the state of the disc or file. ```text Good ``` -------------------------------- ### Configure Internet Archive Credentials with Python Source: https://archive.org/developers/_modules/internetarchive/api Sets up authentication for the internetarchive library by prompting for username and password or using provided credentials. Writes configuration to a file. ```python def configure( username: str = "", password: str = "", config_file: str = "", host: str = "archive.org", ) -> str: """Configure internetarchive with your Archive.org credentials. 
:param username: The email address associated with your Archive.org account. :param password: Your Archive.org password. :returns: The config file path. Usage: >>> from internetarchive import configure >>> configure('user@example.com', 'password') """ auth_config = config_module.get_auth_config( username or input("Email address: "), password or getpass("Password: "), host, ) config_file_path = config_module.write_config_file(auth_config, config_file) return config_file_path ``` -------------------------------- ### Example Collection Identifier Source: https://archive.org/developers/metadata-schema/index Specifies the collection(s) an item belongs to. 'prelinger' is an example of a valid collection identifier. The primary collection should be listed first. ```xml prelinger ``` -------------------------------- ### Compare Website Versions Source: https://archive.org/developers/tutorial-compare-snapshot-wayback Manually compare two versions of a website by constructing specific URLs for each snapshot. ```APIDOC ## Website Version Comparison ### Description This section describes how to construct URLs for specific archived versions of a website to enable side-by-side comparison using external diff tools. ### Method GET (Constructed URLs) ### Endpoint Structure `http://web.archive.org/web/<timestamp>/<url>` ### Steps 1. Obtain snapshots using the `/cdx/search/cdx` endpoint. 2. Select two desired snapshots based on their `timestamp`. 3. Construct the comparison URLs using the format `http://web.archive.org/web/<timestamp>/<url>`. For example: - `http://web.archive.org/web/20180427130634/https://tc.eserver.org/` - `http://web.archive.org/web/20181115172501/https://tc.eserver.org/` 4. Use a diff tool to compare the content accessible via these constructed URLs. ### Note If no visible differences are found, try selecting snapshots with different `digest` values from the `cdx` API response.
``` -------------------------------- ### BaseItem Initialization and Loading Source: https://archive.org/developers/_modules/internetarchive/item Initializes a BaseItem object and loads its attributes from provided or fetched item metadata. Handles default attributes and sets the 'exists' flag. ```python class BaseItem: EXCLUDED_ITEM_METADATA_KEYS = ('workable_servers', 'server') def __init__( self, identifier: str | None = None, item_metadata: Mapping | None = None, ): # Default attributes. self.identifier = identifier self.item_metadata = item_metadata or {} self.exists = False # Archive.org metadata attributes. self.metadata: dict = {} self.files: list[dict] = [] self.created = None self.d1 = None self.d2 = None self.dir = None self.files_count = None self.item_size = None self.reviews: list = [] self.server = None self.uniq = None self.updated = None self.tasks = None self.is_dark = None # Load item. self.load() def load(self, item_metadata: Mapping | None = None) -> None: if item_metadata: self.item_metadata = item_metadata self.exists = bool(self.item_metadata) for key in self.item_metadata: setattr(self, key, self.item_metadata[key]) if not self.identifier: self.identifier = self.metadata.get('identifier') mc = self.metadata.get('collection', []) # TODO: The `type: ignore` on the following line should be removed. See #518 self.collection = IdentifierListAsItems(mc, self.session) # type: ignore ``` -------------------------------- ### Initialize Item with metadata in Python Source: https://archive.org/developers/_modules/internetarchive/item Initializes an Item object with metadata, either provided directly or retrieved from Archive.org using an identifier. It also sets up URLs and generates a MediaWiki-formatted link to the item. 
```python self.session = archive_session super().__init__(identifier, item_metadata) self.urls = Item.URLs(self) if self.metadata.get('title'): # A copyable link to the item, in MediaWiki format details = self.urls.details # type: ignore self.wikilink = f'* [{details} {self.identifier}] -- {self.metadata["title"]}' ``` -------------------------------- ### Fold Out Count Example Source: https://archive.org/developers/metadata-schema/index Shows an example of the 'foldoutcount' field, which indicates the number of foldouts captured for an item. This is relevant for items photographed on machinery other than the Scribe. ```text 1 ``` -------------------------------- ### Initialize ArchiveSession Source: https://archive.org/developers/_modules/internetarchive/session Initializes the ArchiveSession object, loading configuration from a file or dictionary. It sets up default headers, cookies, and mounts an HTTP adapter for managing requests. ```python from internetarchive import ArchiveSession s = ArchiveSession() item = s.get_item('nasa') print(item) ``` -------------------------------- ### Sound Examples Source: https://archive.org/developers/metadata-schema/index Examples of values for the 'sound' metadata field, indicating whether media contains sound or is silent. This is primarily used for video items. ```text sound ``` ```text silent ``` -------------------------------- ### Size Example Source: https://archive.org/developers/metadata-schema/index An example value for the 'size' metadata field, indicating the size of a physical item that has been digitized. The unit of measurement is assumed to be inches if not specified. ```text 10.0 ``` -------------------------------- ### Open Library Identifier Example Source: https://archive.org/developers/metadata-schema/index The 'openlibrary' field, now deprecated, previously held the Open Library edition identifier. The example shows the format for this identifier. 
```text OL2769393M ``` -------------------------------- ### Local Identifier Example Source: https://archive.org/developers/metadata-schema/index An example of an 'identifier-bib' field, used for additional local identifiers that do not fit into other predefined metadata fields. These are unique to the providing institution. ```text GLAD-84064318 ```
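--------------------------------

### Build and Diff Wayback Snapshot URLs (illustrative sketch)

The "Compare Website Versions" steps earlier in this section can be sketched in Python. This example is not taken from the archive.org documentation: the helper names `snapshot_url` and `diff_snapshots` are hypothetical, and the two page bodies are placeholder strings rather than fetched content, so nothing here touches the network. It only shows how the snapshot URL is assembled from a timestamp and an original URL, and how the standard-library `difflib` module could serve as the diff tool.

```python
import difflib

WAYBACK_PREFIX = "http://web.archive.org/web"


def snapshot_url(timestamp: str, url: str) -> str:
    """Build the URL of one archived snapshot (hypothetical helper)."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{url}"


def diff_snapshots(old_text: str, new_text: str) -> list[str]:
    """Return a unified diff of two snapshot bodies that were already fetched."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


# Timestamps and URL are the ones used in the comparison steps.
old_url = snapshot_url("20180427130634", "https://tc.eserver.org/")
new_url = snapshot_url("20181115172501", "https://tc.eserver.org/")
print(old_url)
print(new_url)

# Placeholder page bodies; in practice these would be downloaded
# from old_url and new_url before being compared.
old_body = "<h1>Old title</h1>\n<p>Same</p>"
new_body = "<h1>New title</h1>\n<p>Same</p>"
changes = diff_snapshots(old_body, new_body)
print("\n".join(changes))
```

If the resulting diff is empty, the two snapshots are textually identical, which matches the note about choosing snapshots with different `digest` values.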