### Install Unstructured[all-docs]

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'unstructured' package with the `all-docs` extra, which adds support for all document types.

```text
unstructured[all-docs]==0.17.2
```

--------------------------------

### Install Uvicorn

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'uvicorn' package, a lightning-fast ASGI server.

```text
uvicorn==0.34.3
```

--------------------------------

### Install Unstructured-Client

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'unstructured-client' package for interacting with the Unstructured API.

```text
unstructured-client==0.36.0
```

--------------------------------

### Install unstructured-client

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the unstructured-client library, version 0.36.0, for interacting with the Unstructured API.

```text
unstructured-client==0.36.0
```

--------------------------------

### Install Websocket-Client

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'websocket-client' package for implementing WebSocket clients.

```text
websocket-client==1.8.0
```

--------------------------------

### Install OmegaConf

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'omegaconf' version 2.3.0, a library for structured configuration, used by 'effdet'.

```bash
pip install omegaconf==2.3.0
```

--------------------------------

### Install Traitlets

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'traitlets' package, a configuration system and typed-attribute library used across the Jupyter ecosystem.
```text
traitlets==5.14.3
```

--------------------------------

### Install Watchdog

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'watchdog' package, an API for monitoring file system events.

```text
watchdog==6.0.0
```

--------------------------------

### Install Wheel

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'wheel' package, the reference implementation of the wheel built-package format for Python.

```text
wheel==0.45.1
```

--------------------------------

### Install HTTP Core

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'httpcore' version 1.0.9, a low-level HTTP client, used by 'httpx'.

```bash
pip install httpcore==1.0.9
```

--------------------------------

### Install Wrapt

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'wrapt' package, providing decorators and utilities for metaprogramming.

```text
wrapt==1.17.2
```

--------------------------------

### Install HTTPX

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'httpx' version 0.28.1, an HTTP client for Python, used by 'unstructured-client'.

```bash
pip install httpx==0.28.1
```

--------------------------------

### Install Uri-Template

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'uri-template' package for processing URI templates.

```text
uri-template==1.3.0
```

--------------------------------

### Install Wcwidth

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'wcwidth' package, which calculates the display width of Unicode characters.
```text
wcwidth==0.2.13
```

--------------------------------

### Install Widgetsnbextension

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'widgetsnbextension' package for interactive Jupyter widgets.

```text
widgetsnbextension==4.0.14
```

--------------------------------

### Install ratelimit

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the ratelimit library, version 2.2.1, for rate limiting functionality.

```text
ratelimit==2.2.1
```

--------------------------------

### Install Proto Plus

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'proto-plus' version 1.26.1, a library of Pythonic wrappers for Protocol Buffers, used by 'google-api-core' and 'google-cloud-vision'.

```bash
pip install proto-plus==1.26.1
```

--------------------------------

### Install Tornado

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'tornado' package, an asynchronous networking library and web framework.

```text
tornado==6.5.1
```

--------------------------------

### Install unstructured[all-docs]

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the unstructured library, version 0.17.2, with support for all document types.

```text
unstructured[all-docs]==0.17.2
```

--------------------------------

### Install Webcolors

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'webcolors' package for converting between CSS color formats.

```text
webcolors==24.11.1
```

--------------------------------

### Set up Python Virtual Environment with pyenv

Source: https://github.com/unstructured-io/unstructured-api/blob/main/README.md

This snippet shows the recommended steps for setting up a Python virtual environment for the unstructured-api project with pyenv.
It installs Python 3.12, then creates and activates a virtual environment named `unstructured-api`.

```bash
brew install pyenv-virtualenv
pyenv install 3.12
pyenv virtualenv 3.12 unstructured-api
pyenv activate unstructured-api
```

--------------------------------

### Install Humanfriendly

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'humanfriendly' version 10.0, a library for human-friendly output, used by 'coloredlogs'.

```bash
pip install humanfriendly==10.0
```

--------------------------------

### Download XBRL 10-K using curl

Source: https://github.com/unstructured-io/unstructured-api/blob/main/sample-docs/README.md

This command downloads an example 10-K filing in inline XBRL format from the SEC website. The SEC site rejects requests that lack a User-Agent header identifying the requester, so set the `organization` and `email` shell variables to your organization name and contact email first. Double quotes let the shell expand them, whereas single quotes would send the literal text `${organization} ${email}`. The downloaded file can then be processed with the HTML parser.

```bash
curl -O \
  -A "${organization} ${email}" \
  https://www.sec.gov/Archives/edgar/data/311094/000117184321001344/0001171843-21-001344.txt
```

--------------------------------

### Fetch XBRL 10-K with Curl

Source: https://github.com/unstructured-io/unstructured-api/blob/main/sample-docs/README.rst

Fetches an example 10-K filing in inline XBRL format from the SEC website using curl. The SEC site rejects requests without a User-Agent header identifying the requester, so set the `organization` and `email` shell variables first.

```bash
curl -O \
  -A "${organization} ${email}" \
  https://www.sec.gov/Archives/edgar/data/311094/000117184321001344/0001171843-21-001344.txt
```

--------------------------------

### Install Urllib3

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'urllib3' package, a powerful HTTP client for Python.
```text
urllib3==2.4.0
```

--------------------------------

### Install Google Auth

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'google-auth' version 2.40.3, a library for Google authentication, used by 'google-api-core' and 'google-cloud-vision'.

```bash
pip install google-auth==2.40.3
```

--------------------------------

### Install Unstructured-Inference

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'unstructured-inference' package for performing inference tasks with Unstructured.

```text
unstructured-inference==1.0.5
```

--------------------------------

### Install Xlsxwriter

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'xlsxwriter' package for writing Excel .xlsx files.

```text
xlsxwriter==3.2.3
```

--------------------------------

### Install Tinycss2

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'tinycss2' package, a low-level CSS parser.

```text
tinycss2==1.4.0
```

--------------------------------

### Install Tokenizers

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'tokenizers' package, providing fast tokenization algorithms for NLP models.

```text
tokenizers==0.21.1
```

--------------------------------

### Install Transformers

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'transformers' package, providing state-of-the-art NLP models and tools.

```text
transformers==4.52.4
```

--------------------------------

### Install Webencodings

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'webencodings' package, which provides the character encoding labels defined by the WHATWG Encoding Standard for legacy web content.
```text
webencodings==0.5.1
```

--------------------------------

### Install Protobuf

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'protobuf' version 6.31.1, a language-neutral, platform-neutral, extensible mechanism for serializing structured data, used by multiple Google libraries and ONNX.

```bash
pip install protobuf==6.31.1
```

--------------------------------

### Install Psutil

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'psutil' version 7.0.0, a cross-platform library to retrieve information on running processes and system utilization, used by 'accelerate' and 'unstructured'.

```bash
pip install psutil==7.0.0
```

--------------------------------

### Install Torch

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'torch' package, a fundamental library for deep learning and tensor computation.

```text
torch==2.7.1
```

--------------------------------

### Install Terminado

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'terminado' package, which provides a pseudo-terminal backend for Jupyter.

```text
terminado==0.18.1
```

--------------------------------

### Install Flatbuffers

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'flatbuffers' version 25.2.10, a serialization library, used as a dependency for 'onnxruntime'.

```bash
pip install flatbuffers==25.2.10
```

--------------------------------

### Install Timm

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'timm' package, a collection of pre-trained image models for PyTorch.
```text
timm==1.0.15
```

--------------------------------

### Install Six

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'six' package, a compatibility library for Python 2 and 3.

```text
six==1.17.0
```

--------------------------------

### Install AnyIO

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the 'anyio' package version 4.9.0, an asynchronous networking and concurrency library for Python, used by 'httpx' and 'starlette'.

```bash
pip install anyio==4.9.0
```

--------------------------------

### Install Unstructured-Pytesseract

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'unstructured-pytesseract' package for integrating Tesseract OCR with Unstructured.

```text
unstructured-pytesseract==0.3.15
```

--------------------------------

### Install Soupsieve

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'soupsieve' package, a CSS selector library for BeautifulSoup.

```text
soupsieve==2.7
```

--------------------------------

### Install Hugging Face Hub

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'huggingface-hub' version 0.33.0, a library for interacting with the Hugging Face Hub, used by 'accelerate', 'timm', 'tokenizers', 'transformers', and 'unstructured-inference'.

```bash
pip install huggingface-hub==0.33.0
```

--------------------------------

### Install EffDet

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'effdet' version 0.4.1, a PyTorch implementation of the EfficientDet object detection models, used by 'unstructured'.
```bash
pip install effdet==0.4.1
```

--------------------------------

### Install SymPy

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'sympy' package, a Python library for symbolic mathematics.

```text
sympy==1.14.0
```

--------------------------------

### Install IDNA

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'idna' version 3.10, a library for Internationalized Domain Names in Applications, used by 'anyio', 'httpx', and 'requests'.

```bash
pip install idna==3.10
```

--------------------------------

### Install pypdfium2

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the pypdfium2 library, version 4.30.1, a dependency for unstructured-inference.

```text
pypdfium2==4.30.1
```

--------------------------------

### Install Starlette

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'starlette' package, a lightweight ASGI framework for building high-performance web applications.

```text
starlette==0.41.2
```

--------------------------------

### Install Typing-Inspect

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'typing-inspect' package, providing utilities for inspecting type hints.

```text
typing-inspect==0.9.0
```

--------------------------------

### Install MPmath

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'mpmath' version 1.3.0, a library for arbitrary-precision floating-point arithmetic, used by 'sympy'.

```bash
pip install mpmath==1.3.0
```

--------------------------------

### Install Joblib

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'joblib' version 1.5.1, a set of tools that provide lightweight pipelining of Python functions, used by 'nltk'.
```bash
pip install joblib==1.5.1
```

--------------------------------

### `# via` Annotation: Accelerate, Timm, and Transformers

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

This fragment does not install anything. It is part of a pip-compile `# via` annotation in test.txt that names 'accelerate', 'timm', and 'transformers' (machine learning and deep learning libraries) as the packages requiring the pin it annotates.

```text
# accelerate
# timm
# transformers
```

--------------------------------

### Install Sniffio

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'sniffio' package, which provides a way to detect the running asynchronous I/O library.

```text
sniffio==1.3.1
```

--------------------------------

### Install MarkupSafe

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'markupsafe' version 3.0.2, a dependency for 'jinja2', used for escaping strings for safe inclusion in HTML.

```bash
pip install markupsafe==3.0.2
```

--------------------------------

### Install rsa

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the rsa library, version 4.9.1, a pure-Python RSA implementation, a dependency for google-auth.

```text
rsa==4.9.1
```

--------------------------------

### Install Google API Common Protobufs

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'googleapis-common-protos' version 1.70.0, Protocol Buffer definitions for Google APIs, used by 'google-api-core' and 'grpcio-status'.

```bash
pip install googleapis-common-protos==1.70.0
```

--------------------------------

### Install Typing-Extensions

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'typing-extensions' package, providing backported and experimental type hints for Python.
```text
typing-extensions==4.14.0
```

--------------------------------

### Install ET-XMLFile

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'et-xmlfile' version 2.0.0, a low-memory library for creating large XML files, used as a dependency for 'openpyxl'.

```bash
pip install et-xmlfile==2.0.0
```

--------------------------------

### Install Jinja2

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'jinja2' version 3.1.6, a modern and designer-friendly templating language for Python, used by 'torch'.

```bash
pip install jinja2==3.1.6
```

--------------------------------

### Install Tqdm

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'tqdm' package, which provides fast, extensible progress bars for loops and long-running tasks.

```text
tqdm==4.67.1
```

--------------------------------

### Install Stack-data

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'stack-data' package, which extracts data from Python stack frames and tracebacks for more informative displays.

```text
stack-data==0.6.3
```

--------------------------------

### Install Tzdata

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'tzdata' package, providing IANA time zone data for Python.

```text
tzdata==2025.2
```

--------------------------------

### Install Click

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'click' version 8.2.1, a package for creating command-line interfaces, used by 'unstructured' and other tools.

```bash
pip install click==8.2.1
```

--------------------------------

### Install HF XET

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'hf-xet' version 1.1.3, the client for Hugging Face's Xet storage backend that enables fast transfer of large files, used by 'huggingface-hub'.
```bash
pip install hf-xet==1.1.3
```

--------------------------------

### Install starlette

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the starlette library, version 0.41.2, a lightweight ASGI framework, used by fastapi.

```text
starlette==0.41.2
```

--------------------------------

### Install Cachetools

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'cachetools' version 5.5.2, a collection of caching functions, used here as a dependency for 'google-auth'.

```bash
pip install cachetools==5.5.2
```

--------------------------------

### Install unstructured-inference

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the unstructured-inference library, version 1.0.5, for performing inference tasks with unstructured models.

```text
unstructured-inference==1.0.5
```

--------------------------------

### Install gRPC IO

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'grpcio' version 1.73.0, a high-performance RPC framework, used by 'google-api-core' and 'grpcio-status'.

```bash
pip install grpcio==1.73.0
```

--------------------------------

### Install unstructured-pytesseract

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the unstructured-pytesseract library, version 0.3.15, for integrating pytesseract with unstructured.

```text
unstructured-pytesseract==0.3.15
```

--------------------------------

### Install Typing-Inspection

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'typing-inspection' package, a separate project from 'typing-inspect' that provides runtime tools for introspecting type annotations.
```text
typing-inspection==0.4.1
```

--------------------------------

### Install python-pptx

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the python-pptx library, version 1.0.2, for creating, reading, and updating PowerPoint (.pptx) files, a dependency for unstructured.

```text
python-pptx==1.0.2
```

--------------------------------

### Install Xlrd

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'xlrd' package for reading data from legacy Microsoft Excel (.xls) files.

```text
xlrd==2.0.1
```

--------------------------------

### Install Accelerate

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the 'accelerate' package version 1.7.0, which is often used for optimizing deep learning model training and inference.

```bash
pip install accelerate==1.7.0
```

--------------------------------

### Install Markdown

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'markdown' version 3.8, a library for converting Markdown text to HTML, used by the 'unstructured' package.

```bash
pip install markdown==3.8
```

--------------------------------

### Install Types-Python-Dateutil

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs type stubs for the 'python-dateutil' library, improving type checking.

```text
types-python-dateutil==2.9.0.20250516
```

--------------------------------

### Install Marshmallow

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'marshmallow' version 3.26.1, a library for object serialization/deserialization, used by 'dataclasses-json'.
```bash
pip install marshmallow==3.26.1
```

--------------------------------

### Install Google API Core with gRPC

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'google-api-core' version 2.25.1 with the `grpc` extra, the core library shared by Google Cloud client libraries.

```bash
pip install "google-api-core[grpc]==2.25.1"
```

--------------------------------

### Install SciPy

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'scipy' package, a fundamental library for scientific and technical computing in Python.

```text
scipy==1.15.3
```

--------------------------------

### Install HTML5lib

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'html5lib' version 1.1, a Python library for parsing HTML, used by the 'unstructured' package.

```bash
pip install html5lib==1.1
```

--------------------------------

### Install MyPy Extensions

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'mypy-extensions' version 1.1.0, type system extensions for programs checked with mypy, used by 'typing-inspect'.

```bash
pip install mypy-extensions==1.1.0
```

--------------------------------

### Install H11

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'h11' version 0.16.0, an HTTP/1.1 protocol implementation, used by 'httpcore' and 'uvicorn'.

```bash
pip install h11==0.16.0
```

--------------------------------

### Install Torchvision

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'torchvision' package, providing datasets, models, and transforms for computer vision with PyTorch.
```text
torchvision==0.22.1
```

--------------------------------

### Install Nest Asyncio

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'nest-asyncio' version 1.6.0, a library that enables asyncio to run in nested event loops, used by 'unstructured-client'.

```bash
pip install nest-asyncio==1.6.0
```

--------------------------------

### Install pypdf

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the pypdf library, version 5.6.0, used for PDF processing.

```text
pypdf==5.6.0
```

--------------------------------

### Install typing-inspect

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the typing-inspect library, version 0.9.0, for inspecting type hints, used by dataclasses-json.

```text
typing-inspect==0.9.0
```

--------------------------------

### Install tqdm

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the tqdm library, version 4.67.1, for creating progress bars, used by huggingface-hub, nltk, transformers, and unstructured.

```text
tqdm==4.67.1
```

--------------------------------

### Install python-multipart

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the python-multipart library, version 0.0.20, for parsing multipart/form-data, used by unstructured-inference.

```text
python-multipart==0.0.20
```

--------------------------------

### Install sniffio

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the sniffio library, version 1.3.1, for detecting the asynchronous library in use, a dependency for anyio.
```text
sniffio==1.3.1
```

--------------------------------

### Install NetworkX

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'networkx' version 3.5, a library for creating, manipulating, and studying the structure, dynamics, and functions of complex networks, used by 'torch' and 'unstructured'.

```bash
pip install networkx==3.5
```

--------------------------------

### Install webencodings

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the webencodings library, version 0.5.1, which provides the character encoding labels defined by the WHATWG Encoding Standard (it does not detect encodings), used by html5lib.

```text
webencodings==0.5.1
```

--------------------------------

### Install PI-Heif

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'pi-heif' version 0.22.0, a library for reading and writing HEIF image files, used by 'unstructured'.

```bash
pip install pi-heif==0.22.0
```

--------------------------------

### Install pytz

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the pytz library, version 2025.2, for timezone calculations, used by pandas.

```text
pytz==2025.2
```

--------------------------------

### Install rapidfuzz

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the rapidfuzz library, version 3.13.0, for fast string matching, used by unstructured and unstructured-inference.

```text
rapidfuzz==3.13.0
```

--------------------------------

### Install Langdetect

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'langdetect' version 1.0.9, a library for language detection, used by the 'unstructured' package.
```bash
pip install langdetect==1.0.9
```

--------------------------------

### Install FastAPI

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'fastapi' version 0.115.12, a modern, fast web framework for building APIs with Python, a core dependency for the project.

```bash
pip install fastapi==0.115.12
```

--------------------------------

### Install FSSpec

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'fsspec' version 2025.5.1, a file system interface library, used by 'huggingface-hub' and 'torch'.

```bash
pip install fsspec==2025.5.1
```

--------------------------------

### Install Matplotlib

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'matplotlib' version 3.10.3, a plotting library for Python, used by 'unstructured-inference'.

```bash
pip install matplotlib==3.10.3
```

--------------------------------

### Install FileLock

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'filelock' version 3.18.0, a platform-independent file lock library, used by 'huggingface-hub', 'torch', and 'transformers'.

```bash
pip install filelock==3.18.0
```

--------------------------------

### Install ONNX Runtime

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'onnxruntime' version 1.22.0, a high-performance inference engine for ONNX models, used by 'unstructured' and 'unstructured-inference'.

```bash
pip install onnxruntime==1.22.0
```

--------------------------------

### Install sympy

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the sympy library, version 1.14.0, for symbolic mathematics, used by onnxruntime and torch.
```text
sympy==1.14.0
```

--------------------------------

### Install Pikepdf

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'pikepdf' version 9.8.1, a library for working with PDF files, used by 'unstructured'.

```bash
pip install pikepdf==9.8.1
```

--------------------------------

### Install Chardet

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'chardet' version 5.2.0, a library for character encoding detection, used by the 'unstructured' package.

```bash
pip install chardet==5.2.0
```

--------------------------------

### Install Kiwisolver

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'kiwisolver' version 1.4.8, an implementation of the Cassowary constraint-solving algorithm, used as a dependency for 'matplotlib'.

```bash
pip install kiwisolver==1.4.8
```

--------------------------------

### Install Packaging

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'packaging' version 25.0, a library for Python package and version handling, used by various libraries including 'accelerate', 'huggingface-hub', and 'matplotlib'.

```bash
pip install packaging==25.0
```

--------------------------------

### Install transformers

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the transformers library, version 4.52.4, for state-of-the-art NLP models, a key dependency for unstructured-inference.

```text
transformers==4.52.4
```

--------------------------------

### Install scipy

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the scipy library, version 1.15.3, for scientific and technical computing, a dependency for unstructured-inference.
```text
scipy==1.15.3
```

--------------------------------

### Install Cycler

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'cycler' version 0.12.1, a utility for composable style cycles (for example, a repeating sequence of line colors), used as a dependency for 'matplotlib'.

```bash
pip install cycler==0.12.1
```

--------------------------------

### Install typing-inspection

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the typing-inspection library, version 0.4.1, for inspecting type hints, used by pydantic.

```text
typing-inspection==0.4.1
```

--------------------------------

### Install uvicorn

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the uvicorn library, version 0.34.3, an ASGI server, used in the project's base requirements.

```text
uvicorn==0.34.3
```

--------------------------------

### Install gRPC IO Status

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'grpcio-status' version 1.73.0, providing status codes for gRPC, used by 'google-api-core'.

```bash
pip install grpcio-status==1.73.0
```

--------------------------------

### Install Google Cloud Vision

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'google-cloud-vision' version 3.10.2, a client library for Google Cloud Vision API, used by the 'unstructured' package.

```bash
pip install google-cloud-vision==3.10.2
```

--------------------------------

### Install requests-toolbelt

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the requests-toolbelt library, version 1.0.0, providing additional utilities for the requests library, used by unstructured-client.
```Python
requests-toolbelt==1.0.0
```

--------------------------------

### Install ColoredLogs

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'coloredlogs' version 15.0.1, which enhances Python's logging output with colors and other features, used by 'onnxruntime'.

```bash
pip install coloredlogs==15.0.1
```

--------------------------------

### Install python-magic

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the python-magic library, version 0.4.27, for file type identification, a dependency of unstructured.

```Python
python-magic==0.4.27
```

--------------------------------

### Install wrapt

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the wrapt library, version 1.17.2, for monkey patching and decorators, used by deprecated and unstructured.

```Python
wrapt==1.17.2
```

--------------------------------

### Install torch

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the torch library, version 2.7.1, for deep learning and tensor computation, used by several libraries including accelerate and unstructured-inference.

```Python
torch==2.7.1
```

--------------------------------

### Install Filetype

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'filetype' version 1.2.0, a library for detecting file types, used by the 'unstructured' package.

```bash
pip install filetype==1.2.0
```

--------------------------------

### Install python-dateutil

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs python-dateutil, version 2.9.0.post0, which extends the standard `datetime` module with utilities such as date parsing, used by matplotlib and pandas.
```Python
python-dateutil==2.9.0.post0
```

--------------------------------

### Install ContourPy

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'contourpy' version 1.3.2, a library for calculating contours of 2D quadrilateral grids, used as a dependency of 'matplotlib'.

```bash
pip install contourpy==1.3.2
```

--------------------------------

### Install Dataclasses-JSON

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'dataclasses-json' version 0.6.7, a library for serializing dataclasses to JSON and deserializing them back, used by 'unstructured'.

```bash
pip install dataclasses-json==0.6.7
```

--------------------------------

### Install Certifi

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'certifi' version 2025.4.26, which provides Mozilla's root certificates for validating TLS/SSL connections, a dependency of 'httpcore', 'httpx', and 'requests'.

```bash
pip install certifi==2025.4.26
```

--------------------------------

### Load PowerPoint Presentation

Source: https://github.com/unstructured-io/unstructured-api/blob/main/exploration-notebooks/exploration-powerpoint.ipynb

This snippet uses the python-pptx library (imported as `pptx`) to load a PowerPoint file. It takes the file path returned by the notebook's `get_filename` helper and creates a `Presentation` object, the entry point for accessing the presentation's slides and shapes.

```python
import pptx

# `filename` is the path returned by the notebook's `get_filename` helper
presentation = pptx.Presentation(filename)
```

--------------------------------

### Install tokenizers

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the tokenizers library, version 0.21.1, for natural language processing tokenization, used by transformers.
```Python
tokenizers==0.21.1
```

--------------------------------

### Install xlsxwriter

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the xlsxwriter library, version 3.2.3, for writing Excel (.xlsx) files, used by python-pptx.

```Python
xlsxwriter==3.2.3
```

--------------------------------

### Install python-iso639

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the python-iso639 library, version 2025.2.18, for handling ISO 639 language codes, used by unstructured.

```Python
python-iso639==2025.2.18
```

--------------------------------

### Install Openpyxl

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'openpyxl' version 3.1.5, a library for reading and writing Excel .xlsx files, used by 'unstructured'.

```bash
pip install openpyxl==3.1.5
```

--------------------------------

### Install typing-extensions

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the typing-extensions library, version 4.14.0, which provides backported and experimental type hints, used by many core libraries.

```Python
typing-extensions==4.14.0
```

--------------------------------

### Install regex

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the regex library, version 2024.11.6, for regular expression operations, used by nltk and transformers.

```Python
regex==2024.11.6
```

--------------------------------

### Install urllib3

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the urllib3 library, version 2.4.0, for making HTTP requests, a dependency of the requests library.
```Python
urllib3==2.4.0
```

--------------------------------

### Install xlrd

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the xlrd library, version 2.0.1, for reading data from legacy Excel (.xls) files, a dependency of unstructured.

```Python
xlrd==2.0.1
```

--------------------------------

### Install safetensors

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the safetensors library, version 0.5.3, for safe tensor serialization, used by accelerate, timm, and transformers.

```Python
safetensors==0.5.3
```

--------------------------------

### Install PDFMiner Six

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'pdfminer-six' version 20250506, a PDF parsing library, used by 'unstructured' and 'unstructured-inference'.

```bash
pip install pdfminer-six==20250506
```

--------------------------------

### Install ONNX

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'onnx' version 1.18.0, the Open Neural Network Exchange format, used by 'unstructured' and 'unstructured-inference'.

```bash
pip install onnx==1.18.0
```

--------------------------------

### Install timm

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the timm (PyTorch Image Models) library, version 1.0.15, for deep learning models, used by effdet and unstructured-inference.

```Python
timm==1.0.15
```

--------------------------------

### Install Base Requirements

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the core set of Python packages required by the Unstructured API project. These include libraries for data processing, API interaction, and general utilities.
```bash
pip install -r requirements/base.txt
```

--------------------------------

### Install AIOFiles

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the 'aiofiles' package version 24.1.0, which provides asynchronous file operations for Python and is commonly used with async web frameworks.

```bash
pip install aiofiles==24.1.0
```

--------------------------------

### Install Pandas

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'pandas' version 2.3.0, a data manipulation and analysis library, used by 'unstructured' and 'unstructured-inference'.

```bash
pip install pandas==2.3.0
```

--------------------------------

### Install PDF2Image

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'pdf2image' version 1.17.0, a Python library that converts PDF pages to PIL Image objects, used by 'unstructured'.

```bash
pip install pdf2image==1.17.0
```

--------------------------------

### Install torchvision

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the torchvision library, version 0.22.1, for computer vision tasks in PyTorch, used by effdet and timm.

```Python
torchvision==0.22.1
```

--------------------------------

### Install Test Requirements

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the Python packages needed to run the Unstructured API project's tests, typically testing frameworks and related utilities.

```bash
pip install -r requirements/test.in
```

--------------------------------

### Install six

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the six library, version 1.17.0, a compatibility layer for Python 2 and 3, used by several libraries such as html5lib and langdetect.
```Python
six==1.17.0
```

--------------------------------

### Install OLEFile

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'olefile' version 0.47, a library for reading and writing the Microsoft OLE2 Compound Document format, used by 'python-oxmsg'.

```bash
pip install olefile==0.47
```

--------------------------------

### Install OpenCV Python

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'opencv-python' version 4.11.0.86, a package for computer vision tasks, used by 'unstructured-inference'.

```bash
pip install opencv-python==4.11.0.86
```

--------------------------------

### Install Send2Trash

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Installs the 'send2trash' package, which sends files to the trash or recycle bin instead of deleting them permanently.

```Python
send2trash==1.8.3
```

--------------------------------

### Install python-oxmsg

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the python-oxmsg library, version 0.0.2, for handling Outlook message files, a dependency of unstructured.

```Python
python-oxmsg==0.0.2
```

--------------------------------

### Install CFFI

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'cffi' version 1.17.1, a package for calling C code from Python, used here as a dependency of 'cryptography'.

```bash
pip install cffi==1.17.1
```

--------------------------------

### Install LXML

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'lxml' version 5.4.0, a library for processing XML and HTML, used by 'pikepdf', 'python-docx', 'python-pptx', and 'unstructured'.
```bash
pip install lxml==5.4.0
```

--------------------------------

### Install FontTools

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'fonttools' version 4.58.2, a library for manipulating font files, used as a dependency of 'matplotlib'.

```bash
pip install fonttools==4.58.2
```

--------------------------------

### Install Cryptography

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'cryptography' version 45.0.4, a package providing cryptographic recipes and primitives, used by 'pdfminer-six' and 'unstructured-client'.

```bash
pip install cryptography==45.0.4
```

--------------------------------

### Install soupsieve

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the soupsieve library, version 2.7, a CSS selector library for Beautiful Soup, used by beautifulsoup4.

```Python
soupsieve==2.7
```

--------------------------------

### Install Deprecated

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'deprecated' version 1.2.18, a library for marking Python code as deprecated, used by 'pikepdf'.

```bash
pip install deprecated==1.2.18
```

--------------------------------

### Install NLTK

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'nltk' version 3.9.1, the Natural Language Toolkit, a popular library for NLP tasks, used by 'unstructured'.

```bash
pip install nltk==3.9.1
```

--------------------------------

### Install Emoji

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'emoji' version 2.14.1, a library for working with emoji characters in Python, used by 'unstructured'.
```bash
pip install emoji==2.14.1
```

--------------------------------

### Install Backoff

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the 'backoff' package version 2.2.1, a Python library for exponential backoff and retry logic, used by 'unstructured' and other requirements.

```bash
pip install backoff==2.2.1
```

--------------------------------

### Install requests

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the requests library, version 2.32.4, for making HTTP requests, a core dependency for many parts of the project.

```Python
requests==2.32.4
```

--------------------------------

### Install Annotated Types

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the 'annotated-types' package version 0.7.0, which provides reusable constraint types for use with `typing.Annotated`, a dependency of 'pydantic'.

```bash
pip install annotated-types==0.7.0
```

--------------------------------

### Install python-docx

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the python-docx library, version 1.1.2, for creating and updating Microsoft Word (.docx) files, a dependency of unstructured.

```Python
python-docx==1.1.2
```

--------------------------------

### Install Charset Normalizer

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'charset-normalizer' version 3.4.2, a library for detecting text encoding, used by 'pdfminer-six' and 'requests'.

```bash
pip install charset-normalizer==3.4.2
```

--------------------------------

### Install Beautiful Soup 4

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'beautifulsoup4' version 4.13.4, a library for parsing HTML and XML documents, used by the 'unstructured' package.
```bash
pip install beautifulsoup4==4.13.4
```

--------------------------------

### Install Pillow

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'Pillow' version 11.3.0, the friendly PIL fork for image processing, used by 'matplotlib', 'pdf2image', 'pi-heif', 'pikepdf', 'python-pptx', 'torchvision', and 'unstructured-pytesseract'.

```bash
pip install Pillow==11.3.0
```

--------------------------------

### Install tzdata

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the tzdata library, version 2025.2, which provides time zone data, used by pandas.

```Python
tzdata==2025.2
```

--------------------------------

### Install PyYAML

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs PyYAML, version 6.0.2, for YAML parsing and serialization, used by several libraries including accelerate and transformers.

```Python
pyyaml==6.0.2
```

--------------------------------

### Install Constraints

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/test.txt

Applies version constraints to Python packages to keep the project's dependencies compatible and stable. A constraints file only restricts which versions pip may choose; it installs nothing on its own, so it must be passed together with requirements, for example `requirements/base.txt`.

```bash
pip install -r requirements/base.txt -c ./requirements/constraints.in
```

--------------------------------

### Install ANTLR4 Python3 Runtime

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs the ANTLR4 Python3 runtime version 4.9.3, the runtime for a parser generator used to define and process grammars, here a dependency of 'omegaconf'.
```bash
pip install antlr4-python3-runtime==4.9.3
```

--------------------------------

### Install NumPy

Source: https://github.com/unstructured-io/unstructured-api/blob/main/requirements/base.txt

Installs 'numpy' version 1.26.4, the fundamental package for scientific computing with Python, used by many libraries including 'accelerate', 'matplotlib', 'torch', and 'unstructured'.

```bash
pip install numpy==1.26.4
```
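--------------------------------

### Check Installed Versions Against the Pins

Because each entry pins an exact `name==version`, it can be useful to confirm that an environment actually matches a requirements file. This is a minimal sketch and is not part of the Unstructured API repository: `parse_pins` and `find_mismatches` are hypothetical helpers. It assumes plain `name==version` lines without `--hash` continuation lines, and it compares version strings exactly using the standard library's `importlib.metadata`, so an equivalent but differently normalized version string would be reported as a mismatch.

```python
from importlib import metadata
from typing import Callable, Dict, List, Optional, Tuple


def parse_pins(lines: List[str]) -> Dict[str, str]:
    """Return {name: version} for each exact `name==version` pin (hypothetical helper)."""
    pins: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop full-line and inline comments
        if not line or line.startswith("-") or "==" not in line:
            continue  # skip blanks, `-r`/`-c` options and non-exact specifiers
        name, version = line.split("==", 1)
        name = name.split("[", 1)[0].strip().lower()  # drop extras like [all-docs]
        pins[name] = version.split(";", 1)[0].strip()  # drop environment markers
    return pins


def find_mismatches(
    pins: Dict[str, str],
    installed: Callable[[str], str] = metadata.version,
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (name, pinned, installed or None) for every pin that is not met."""
    problems: List[Tuple[str, str, Optional[str]]] = []
    for name, pinned in sorted(pins.items()):
        try:
            found: Optional[str] = installed(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != pinned:  # exact string comparison, no version normalization
            problems.append((name, pinned, found))
    return problems
```

For example, `find_mismatches(parse_pins(open("requirements/base.txt", encoding="utf-8").readlines()))` returns an empty list when every pin is satisfied. The version lookup is a parameter so the check can be exercised without inspecting the real environment.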