### Setup python-libzim on Ubuntu/Debian
Source: https://github.com/openzim/python-libzim/blob/main/CONTRIBUTING.md
Installs necessary system packages, Python dependencies, and clones the repository. Configures environment variables for CFLAGS and LDFLAGS to point to the libzim installation. Builds the Cython extension and installs the package in development mode.
```bash
apt install coreutils wget git ca-certificates \
g++ pkg-config libtool automake autoconf make meson ninja-build \
liblzma-dev zlib1g-dev libicu-dev libgumbo-dev libmagic-dev
pip3 install --upgrade pip pipenv
export CFLAGS="-I${LIBZIM_DIR}/include"
export LDFLAGS="-L${LIBZIM_DIR}/lib/x86_64-linux-gnu"
git clone https://github.com/openzim/python-libzim
cd python-libzim
python setup.py build_ext
pipenv install --dev
pipenv run pip install -e .
```
--------------------------------
### Complete Workflow Example
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/modules.md
This example demonstrates the full lifecycle of creating a ZIM archive, adding articles, writing metadata, and then reading and querying the archive using search and suggestion functionalities. It requires importing necessary classes from libzim.reader, libzim.writer, libzim.search, and libzim.suggestion.
```python
import pathlib
import datetime
from libzim.reader import Archive
from libzim.writer import Creator, Item, StringProvider, Hint
from libzim.search import Searcher, Query
from libzim.suggestion import SuggestionSearcher
# STEP 1: Define content
class Article(Item):
    def __init__(self, path, title, content):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content
    def get_path(self):
        return self.path
    def get_title(self):
        return self.title
    def get_mimetype(self):
        return "text/html"
    def get_contentprovider(self):
        return StringProvider(self.content)
    def get_hints(self):
        return {Hint.FRONT_ARTICLE: True}
# STEP 2: Create archive
output = pathlib.Path("myarchive.zim")
articles = [
    Article("Main", "Home", "<h1>Welcome</h1><p>Homepage</p>"),
    Article("About", "About Us", "<h1>About</h1><p>About this site</p>"),
]
with Creator(output) \
        .config_indexing(True, "eng") \
        .config_compression("zstd") as creator:
    for article in articles:
        creator.add_item(article)
    creator.add_metadata("Title", "My Wiki")
    creator.add_metadata("Creator", "Admin")
    creator.add_metadata("Date", datetime.date.today())
    creator.set_mainpath("Main")
print(f"Created: {output}")
# STEP 3: Read and query
archive = Archive(output)
print(f"\nArchive info:")
print(f" Entries: {archive.entry_count}")
print(f" Main: {archive.main_entry.path}")
print(f" Title: {archive.get_metadata('Title').decode()}")
# STEP 4: Search
if archive.has_fulltext_index:
    searcher = Searcher(archive)
    query = Query().set_query("Welcome")
    results = searcher.search(query)
    print(f"\nSearch results for 'Welcome': {results.getEstimatedMatches()}")
# STEP 5: Suggestions
if archive.has_title_index:
    suggester = SuggestionSearcher(archive)
    suggestions = suggester.suggest("A")
    print(f"Suggestions for 'A': {suggestions.getEstimatedMatches()}")
```
--------------------------------
### Custom Item Hints Example
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Example of implementing the get_hints method in a custom Item subclass to provide entry-specific hints like FRONT_ARTICLE and COMPRESS.
```python
def get_hints(self):
    return {
        Hint.FRONT_ARTICLE: True,  # This is a main article
        Hint.COMPRESS: True        # Compress this content
    }
```
--------------------------------
### FileProvider Example
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Demonstrates initializing FileProvider with a file path. This provider is used to serve content directly from a file on the disk.
```python
import pathlib
from libzim.writer import FileProvider
filepath = pathlib.Path("large_file.html")
provider = FileProvider(filepath)
```
--------------------------------
### Install python-libzim
Source: https://github.com/openzim/python-libzim/blob/main/README.md
Install the python-libzim package using pip. This command installs pre-compiled wheels for supported platforms.
```sh
pip install libzim
```
--------------------------------
### Complete Usage Example for Article Suggestions
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/suggestion.md
Demonstrates how to get article title suggestions from a ZIM archive using the SuggestionSearcher. Ensure the archive has a title index for this functionality.
```python
from libzim.reader import Archive
from libzim.suggestion import SuggestionSearcher
def get_suggestions(archive_path: str, prefix: str, limit: int = 10):
    """Get article title suggestions for a prefix."""
    archive = Archive(archive_path)
    # Check if archive has title index
    if not archive.has_title_index:
        print("Archive does not have a title index")
        return []
    # Create searcher and get suggestions
    searcher = SuggestionSearcher(archive)
    suggestions = searcher.suggest(prefix)
    # Get result count
    total_suggestions = suggestions.getEstimatedMatches()
    if total_suggestions == 0:
        print(f"No suggestions found for '{prefix}'")
        return []
    # Fetch suggestions up to the limit
    fetch_count = min(limit, total_suggestions)
    result_set = suggestions.getResults(0, fetch_count)
    results = []
    for path in result_set:
        entry = archive.get_entry_by_path(path)
        results.append({
            "path": entry.path,
            "title": entry.title
        })
    return results
# Usage
suggestions = get_suggestions("wikipedia.zim", "Albert", limit=20)
for item in suggestions:
    print(f"{item['title']} ({item['path']})")
```
--------------------------------
### Complete Archive Search Example
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md
Demonstrates how to open a ZIM archive, check for a full-text index, create a searcher, execute a query, and display a batch of results.
```python
from libzim.reader import Archive
from libzim.search import Searcher, Query
def search_archive(archive_path: str, search_term: str):
    """Search a ZIM archive and display results."""
    archive = Archive(archive_path)
    # Check if archive has search index
    if not archive.has_fulltext_index:
        print("Archive does not have a full-text search index")
        return
    # Create searcher and execute query
    searcher = Searcher(archive)
    query = Query().set_query(search_term)
    results = searcher.search(query)
    # Get result count
    total_matches = results.getEstimatedMatches()
    print(f"Search for '{search_term}': {total_matches} matches")
    if total_matches == 0:
        return
    # Fetch and display first 20 results
    batch_size = 20
    result_set = results.getResults(0, min(batch_size, total_matches))
    for i, path in enumerate(result_set, 1):
        entry = archive.get_entry_by_path(path)
        print(f"{i}. {entry.title} ({entry.path})")
# Usage
search_archive("wikipedia.zim", "Albert Einstein")
```
--------------------------------
### Use Creator as a context manager
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Demonstrates the basic usage of the `Creator` class with a `with` statement, ensuring the ZIM file is properly started and finalized.
```python
with Creator("output.zim") as creator:
    creator.add_item(item)
# File is finalized and closed here
```
--------------------------------
### get_versions
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/version.md
Get version information as an ordered dictionary, mapping component names to their version strings.
```APIDOC
## get_versions
### Description
Get version information as an ordered dictionary.
### Method
`get_versions() -> OrderedDict[str, str]`
### Parameters
None
### Request Example
```python
from libzim.version import get_versions
versions = get_versions()
```
### Response
#### Success Response
- **versions** (`OrderedDict[str, str]`) - Mapping of component names to version strings.
#### Response Example
```python
{
"libzim": "9.0.1",
"python": "3.11.5",
"python-libzim": "3.10.1-dev0"
}
```
```
--------------------------------
### Setup Debug Logging with Version Info
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/version.md
Configures the logging system to DEBUG level and logs the output of `print_versions` to aid in debugging. This provides a comprehensive view of the environment during troubleshooting.
```python
import io
import logging
from libzim.version import print_versions
def setup_debug_logging():
    """Set up logging with version information."""
    logging.basicConfig(level=logging.DEBUG)
    logger = logging.getLogger(__name__)
    # Capture the print_versions() output in memory, then log it
    version_output = io.StringIO()
    print_versions(out=version_output)
    logger.debug(f"Version info:\n{version_output.getvalue()}")
setup_debug_logging()
```
--------------------------------
### StringProvider Example
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Illustrates how to initialize StringProvider with either a string or bytes object to provide inline content. The string content is automatically encoded as UTF-8.
```python
from libzim.writer import StringProvider
# From string
provider = StringProvider("<p>HTML Content</p>")
# From bytes
provider = StringProvider(b"Binary data")
```
--------------------------------
### Retrieving and Iterating OrderedDict Versions
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md
Shows how to get version information as an OrderedDict and iterate through its components and versions.
```python
from collections import OrderedDict
from libzim.version import get_versions
versions: OrderedDict[str, str] = get_versions()
# Iterate in order
for component, version in versions.items():
    print(f"{component}: {version}")
```
--------------------------------
### Handling Configuration Errors During Creator Setup
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/errors.md
When using the Creator, ensure all configurations are done before adding items or entering the context. Attempting to configure compression or indexing after items have been added will raise a RuntimeError.
```python
from libzim.writer import Creator
try:
    creator = Creator("output.zim")
    with creator:
        creator.add_item(item1)
        # Error: trying to configure after items added
        creator.config_compression("zstd")
except RuntimeError as e:
    print(f"Configuration error: {e}")
```
```python
# Correct way
with Creator("output.zim") \
        .config_compression("zstd") \
        .config_indexing(True, "eng") as creator:
    creator.add_item(item1)
```
--------------------------------
### Implementing a Generator for Blobs
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md
Example of implementing a ContentProvider that yields Blobs using a generator. This is useful for providing content in chunks.
```python
from collections.abc import Generator
from libzim.writer import Blob, ContentProvider
class MyProvider(ContentProvider):
    def gen_blob(self) -> Generator[Blob]:
        yield Blob(b"Part 1")
        yield Blob(b"Part 2")
```
--------------------------------
### Clone and Test python-libzim
Source: https://github.com/openzim/python-libzim/blob/main/README.md
Clone the python-libzim repository and run tests with coverage. Ensure you have the necessary development tools installed.
```sh
git clone git@github.com:openzim/python-libzim.git && cd python-libzim
# hatch run test:coverage
```
--------------------------------
### Paginate Through All Search Results
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md
Provides an example of iterating through all search results using pagination. It calculates the total matches and fetches results in batches of a defined size.
```python
results = searcher.search(query)
total = results.getEstimatedMatches()
batch_size = 20
for offset in range(0, total, batch_size):
    batch = results.getResults(offset, batch_size)
    for path in batch:
        entry = archive.get_entry_by_path(path)
        print(f"- {entry.title}")
```
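The batching arithmetic in the loop above can be checked without an archive. The following is a minimal sketch, assuming a hypothetical helper named `batch_ranges` (not part of libzim) that yields the `(offset, count)` pairs one might pass to `getResults`, with the final batch clamped to the remaining matches:

```python
from collections.abc import Iterator


def batch_ranges(total: int, batch_size: int) -> Iterator[tuple[int, int]]:
    """Yield (offset, count) pairs covering `total` results.

    Hypothetical helper for illustration; not part of libzim.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for offset in range(0, total, batch_size):
        # The last batch may be shorter than batch_size.
        yield offset, min(batch_size, total - offset)


print(list(batch_ranges(45, 20)))  # [(0, 20), (20, 20), (40, 5)]
```

Since `getEstimatedMatches()` returns an estimate, a batch may still come back with fewer paths than requested, so code that consumes the batches should not rely on the exact count.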
--------------------------------
### Error Handling Example
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/suggestion.md
Demonstrates how to handle potential errors when using the SuggestionSearcher, such as a missing title index or no suggestions found.
```APIDOC
## Suggestion Searcher Error Handling
### Description
This example shows how to safely use the SuggestionSearcher, including checks for the archive's title index and handling of potential runtime errors.
### Usage
1. Ensure the ZIM archive has a title index.
2. Perform a search using `searcher.suggest()`.
3. Handle cases where no suggestions are found.
4. Catch `RuntimeError` for other potential issues.
### Example
```python
from libzim.reader import Archive
from libzim.suggestion import SuggestionSearcher
try:
    archive = Archive("example.zim")
    if not archive.has_title_index:
        print("Archive lacks title index for suggestions")
        exit(1)
    searcher = SuggestionSearcher(archive)
    suggestions = searcher.suggest("search term")
    count = suggestions.getEstimatedMatches()
    if count > 0:
        for path in suggestions.getResults(0, count):
            print(archive.get_entry_by_path(path).title)
    else:
        print("No suggestions found")
except RuntimeError as e:
    print(f"Error: {e}")
```
```
--------------------------------
### Creating an Item with FileProvider
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
An example of defining a custom Item class that uses FileProvider to serve content from a specified file. This includes setting the item's path, title, mimetype, content provider, and hints.
```python
from libzim.writer import Item, FileProvider
class FileItem(Item):
    def __init__(self, path, title, filepath):
        super().__init__()
        self.path = path
        self.title = title
        self.filepath = filepath
    def get_path(self):
        return self.path
    def get_title(self):
        return self.title
    def get_mimetype(self):
        return "text/html"
    def get_contentprovider(self):
        return FileProvider(self.filepath)
    def get_hints(self):
        return {}
```
--------------------------------
### Get All Version Information as an Ordered Dictionary
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/version.md
Retrieves version details for all components, including libzim, Python, and dependencies, as an ordered dictionary. This is ideal for structured access and iteration.
```python
from libzim.version import get_versions
from collections import OrderedDict
versions = get_versions()
for component, version in versions.items():
    print(f"{component}: {version}")
# Access specific versions
print(f"libzim: {versions.get('libzim')}")
print(f"Python: {versions.get('python')}")
```
--------------------------------
### Hint Dictionary Example
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md
Illustrates the structure of the dictionary used for hints in Item.get_hints() and Creator.add_redirection(). It shows mapping Hint enum members to boolean or integer values.
```python
from libzim.writer import Hint
hints: dict[Hint, int] = {
    Hint.FRONT_ARTICLE: True,  # or 1
    Hint.COMPRESS: False       # or 0
}
```
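The `dict[Hint, int]` annotation accepts `True` and `False` because `bool` is a subclass of `int` in Python. A small sketch of that point follows; `FakeHint` is a hypothetical stand-in enum (its member values are arbitrary) so the snippet runs without libzim installed:

```python
from enum import Enum


class FakeHint(Enum):
    """Hypothetical stand-in for libzim.writer.Hint (illustration only)."""

    COMPRESS = 1
    FRONT_ARTICLE = 2


hints = {FakeHint.FRONT_ARTICLE: True, FakeHint.COMPRESS: False}

# bool is a subclass of int, so True/False satisfy an int annotation.
print(isinstance(True, int))  # True

# Normalize to plain integers if a consumer expects 0/1 explicitly.
normalized = {hint: int(value) for hint, value in hints.items()}
print(list(normalized.values()))  # [1, 0]
```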
--------------------------------
### Execute Search Query and Process Results
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md
Illustrates executing a search query and retrieving results. It shows how to get the estimated number of matches and iterate through the entry paths to access entry titles.
```python
from libzim.search import Query, Searcher
searcher = Searcher(archive)
query = Query().set_query("python programming")
results = searcher.search(query)
# Get number of matches
count = results.getEstimatedMatches()
print(f"Found {count} matches")
# Get results
result_paths = list(results.getResults(0, count))
for path in result_paths:
    entry = archive.get_entry_by_path(path)
    print(f"- {entry.title}")
```
--------------------------------
### Configure Creator for verbose output
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Enable verbose console output during ZIM file creation by calling `config_verbose(True)`. This provides detailed progress information. Must be called before creation starts.
```python
with Creator("output.zim").config_verbose(True) as c:
    c.add_item(item)
    # Prints progress information
```
--------------------------------
### Read a ZIM File
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/README.md
Demonstrates how to open a ZIM file, retrieve an entry by its path, get its item, and read its content. It also shows how to perform a full-text search if the archive has an index.
```python
from libzim.reader import Archive
archive = Archive("example.zim")
# Get entry
entry = archive.get_entry_by_path("Main_Page")
item = entry.get_item()
# Read content
html = bytes(item.content).decode("UTF-8")
# Search (if available)
if archive.has_fulltext_index:
    from libzim.search import Searcher, Query
    searcher = Searcher(archive)
    results = searcher.search(Query().set_query("python"))
```
--------------------------------
### Creator Context Manager
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
The `Creator` class acts as a context manager, simplifying the process of file creation. The `__enter__` method starts the ZIM file creation, and the `__exit__` method ensures the file is properly finalized and closed. This is the recommended way to use the `Creator`.
```APIDOC
## Creator Context Manager
### Description
Supports the `with` statement. `__enter__` starts the ZIM file creation, `__exit__` finalizes and closes the file.
### Method
__enter__
__exit__
### Example (Recommended)
```python
with Creator("output.zim") as creator:
    creator.add_item(item)
# File is finalized and closed here
```
### Example (Linear Usage)
```python
creator = Creator("output.zim")
creator.__enter__()
creator.add_item(item)
creator.__exit__(None, None, None)
```
```
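When a `with` block does not fit the control flow, the standard library's `contextlib.ExitStack` offers a way to enter a context manager and close it later without calling `__enter__` and `__exit__` by hand. The sketch below uses a hypothetical stand-in class, `FakeCreator`, so that it runs without libzim; it assumes the real `Creator` behaves like any other context manager.

```python
from contextlib import ExitStack


class FakeCreator:
    """Hypothetical stand-in for libzim.writer.Creator (illustration only)."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def __enter__(self) -> "FakeCreator":
        self.events.append("started")
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.events.append("finalized")
        return False  # do not swallow exceptions

    def add_item(self, item: str) -> None:
        self.events.append(f"added {item}")


stack = ExitStack()
creator = stack.enter_context(FakeCreator())  # calls __enter__
creator.add_item("item")
stack.close()  # calls __exit__(None, None, None)

print(creator.events)  # ['started', 'added item', 'finalized']
```

Compared with the linear usage, this keeps the dunder calls out of application code; the stack should still be closed in a `finally` clause (or used as a context manager itself) so the file is finalized on error paths.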
--------------------------------
### Basic Search Result Set Iteration
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md
A simple example of iterating through a SearchResultSet to print each result path. This is a fundamental way to access the data returned by a search.
```python
results = searcher.search(query)
result_set = results.getResults(0, 10)
for path in result_set:
    print(path)
```
--------------------------------
### Entry Path Formats
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md
Provides examples of different string formats used for entry paths within ZIM files, including standard hierarchical paths and the newer namespace scheme. It also lists common namespace prefixes.
```python
path: str = "A/Main_Page" # Standard format
path: str = "mainPage" # New namespace scheme
path: str = "Media/image.png" # Media files
```
--------------------------------
### Check libzim Version Compatibility
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/version.md
Compares the installed libzim version against a required version to ensure compatibility. This is crucial for features that depend on specific libzim capabilities.
```python
from libzim.version import get_libzim_version
required_version = "9.0.0"
actual_version = get_libzim_version()
if tuple(map(int, actual_version.split("."))) >= \
        tuple(map(int, required_version.split("."))):
    print(f"libzim {actual_version} meets requirement {required_version}")
else:
    print(f"ERROR: libzim {actual_version} does not meet requirement {required_version}")
```
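The comparison above calls `int()` on every dot-separated part, which raises `ValueError` if a version string carries a suffix such as `-dev0`. A more tolerant variant might look like the following sketch; `version_tuple` and `meets_requirement` are hypothetical helper names, and ignoring the suffix is an assumption that may not suit every release scheme.

```python
import re


def version_tuple(version: str) -> tuple[int, ...]:
    """Return the leading dotted integers of a version string as a tuple.

    Hypothetical helper: any suffix such as "-dev0" is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric version in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_requirement(actual: str, required: str) -> bool:
    """Compare two versions numerically, component by component."""
    return version_tuple(actual) >= version_tuple(required)


print(meets_requirement("9.0.1", "9.0.0"))       # True
print(meets_requirement("9.0.1-dev0", "9.0.1"))  # True (suffix ignored)
print(meets_requirement("8.2.1", "9.0.0"))       # False
print(meets_requirement("10.0.0", "9.0.0"))      # True (numeric, not lexical)
```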
--------------------------------
### Pathlib.Path Conversion and Properties
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md
Provides examples of converting strings to pathlib.Path objects and accessing common path properties like name, parent, stem, suffix, and absolute path. This is useful for file system operations.
```python
import pathlib
# From string
path = pathlib.Path("archive.zim")
# To string
path_str = str(path)
# Properties
print(path.name) # "archive.zim"
print(path.parent) # Current directory
print(path.stem) # "archive"
print(path.suffix) # ".zim"
print(path.absolute()) # Full path
```
--------------------------------
### Iterate Through Search Result Paths
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md
Shows how to iterate directly over a SearchResultSet to get entry paths. It also demonstrates converting the result set to a list or unpacking it.
```python
results = searcher.search(query)
result_set = results.getResults(0, 10)
# Convert to list
paths = list(result_set)
# Or iterate directly
for path in result_set:
    entry = archive.get_entry_by_path(path)
    print(f"Found: {entry.title}")
# Or unpack for small result sets
first_path, *rest = result_set
```
--------------------------------
### Iterating Content with StringProvider
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
An example demonstrating how to use the feed() method of a StringProvider (or any ContentProvider) to iteratively retrieve content chunks until an empty blob is returned, signaling the end of content.
```python
from libzim.writer import StringProvider
provider = StringProvider("Content")
while True:
    blob = provider.feed()
    if not blob.size():
        break
    process(blob)
```
--------------------------------
### Validate Item Before Adding to Creator
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/errors.md
This example demonstrates how to implement and use a custom `validate` method for an `Item` subclass before adding it to a `Creator`. It ensures essential item properties are present and valid.
```python
from libzim.writer import Creator, Item
class MyItem(Item):
    def __init__(self, path, title, content):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content
    # ... implement required methods
    def validate(self):
        """Validate item before adding."""
        assert self.get_path(), "Path cannot be empty"
        assert self.get_title(), "Title cannot be empty"
        assert self.get_mimetype(), "Mimetype cannot be empty"
        provider = self.get_contentprovider()
        assert provider, "ContentProvider cannot be None"
        assert provider.get_size() > 0, "Content cannot be empty"
        hints = self.get_hints()
        assert isinstance(hints, dict), "Hints must be dict"
item = MyItem("path", "title", "content")
item.validate()
with Creator("output.zim") as creator:
    creator.add_item(item)
```
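Because `assert` statements are skipped when Python runs with `-O`, a validation step that must always execute could collect problems explicitly instead. The sketch below assumes a hypothetical helper, `collect_item_errors`, and a stand-in `StubItem` class exposing the same getters as the example above, so it runs without libzim:

```python
def collect_item_errors(item) -> list[str]:
    """Return a list of problems found on an Item-like object.

    Hypothetical helper: it only calls get_path(), get_title(),
    get_mimetype() and get_hints(), as used in the example above.
    """
    errors = []
    if not item.get_path():
        errors.append("path is empty")
    if not item.get_title():
        errors.append("title is empty")
    if not item.get_mimetype():
        errors.append("mimetype is empty")
    if not isinstance(item.get_hints(), dict):
        errors.append("hints must be a dict")
    return errors


class StubItem:
    """Stand-in with the same getters, so the sketch runs without libzim."""

    def __init__(self, path, title):
        self._path, self._title = path, title

    def get_path(self):
        return self._path

    def get_title(self):
        return self._title

    def get_mimetype(self):
        return "text/html"

    def get_hints(self):
        return {}


print(collect_item_errors(StubItem("Main", "Home")))  # []
print(collect_item_errors(StubItem("", "Home")))      # ['path is empty']
```

Returning a list lets the caller decide whether to skip the item, log the problems, or raise.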
--------------------------------
### Build wheel using system libzim
Source: https://github.com/openzim/python-libzim/blob/main/README.md
This command builds a wheel for python-libzim, utilizing a system-installed C++ libzim. The resulting wheel will not bundle the C++ libzim binary. Ensure libzim is installed via your system's package manager.
```sh
# using system-installed C++ libzim
brew install libzim # macOS
apt-get install libzim-dev # debian
dnf install libzim-devel # fedora
USE_SYSTEM_LIBZIM=1 python3 -m build --wheel
```
--------------------------------
### Initialize Searcher and Perform Basic Search
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md
Demonstrates how to initialize the Searcher with an archive and perform a basic search using a Query object. Ensure the archive has a full-text index.
```python
from libzim.reader import Archive
from libzim.search import Searcher, Query
archive = Archive("example.zim")
if archive.has_fulltext_index:
    searcher = Searcher(archive)
    query = Query().set_query("search term")
    results = searcher.search(query)
```
--------------------------------
### Get Batch Suggestions
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/suggestion.md
Efficiently retrieve suggestions for multiple prefixes from a ZIM archive. This function processes a list of prefixes and returns a dictionary mapping each prefix to its top suggested paths.
```python
from libzim.reader import Archive
from libzim.suggestion import SuggestionSearcher
def batch_suggestions(archive_path: str, prefixes: list[str]):
    """Get suggestions for multiple prefixes."""
    archive = Archive(archive_path)
    # Reuse one searcher for every prefix
    searcher = SuggestionSearcher(archive)
    results = {}
    for prefix in prefixes:
        suggestions = searcher.suggest(prefix)
        count = suggestions.getEstimatedMatches()
        # Keep at most the first 5 paths per prefix
        paths = list(suggestions.getResults(0, min(5, count)))
        results[prefix] = paths
    return results
```
--------------------------------
### Get libzim Version Information
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/modules.md
Retrieve version details for the libzim library. Use `get_libzim_version()` for the C++ core version or `get_versions()` for a dictionary of all components. `print_versions()` outputs directly to stdout.
```python
from libzim.version import get_versions, get_libzim_version
# Get specific version
libzim_version = get_libzim_version()
print(f"libzim: {libzim_version}")
# Get all versions
versions = get_versions()
for component, version in versions.items():
    print(f"{component}: {version}")
# Print to stdout
from libzim.version import print_versions
print_versions()
```
--------------------------------
### Move DLLs for editable install on Windows
Source: https://github.com/openzim/python-libzim/blob/main/README.md
When installing python-libzim in editable mode (`pip install -e .`) on Windows, you need to manually place the bundled DLLs (libzim and libicu) at the root of the repository.
```powershell
Move-Item -Force -Path .\libzim\*.dll -Destination .\
```
--------------------------------
### Initialize Searcher with Archive
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md
Shows how to create a Searcher instance for a given ZIM archive. The archive must possess a full-text index for search operations to be valid.
```python
from libzim.reader import Archive
from libzim.search import Searcher
archive = Archive("wikipedia.zim")
searcher = Searcher(archive)
```
--------------------------------
### get_libzim_version
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/version.md
Get the version of the underlying C++ libzim library as a string.
```APIDOC
## get_libzim_version
### Description
Get the version of the underlying C++ libzim library.
### Method
`get_libzim_version() -> str`
### Parameters
None
### Request Example
```python
from libzim.version import get_libzim_version
version = get_libzim_version()
```
### Response
#### Success Response
- **version** (`str`) - Version string (e.g., "9.0.1")
#### Response Example
```
"9.0.1"
```
```
--------------------------------
### Create a ZIM Archive with Custom Items
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/modules.md
Shows how to create a new ZIM file, configure compression and indexing, add custom content items defined by a `MyItem` class, and include metadata and illustrations.
```python
import pathlib
import datetime
from libzim.writer import (
    Creator, Item, StringProvider, Hint, Compression
)
# Define custom item
class MyItem(Item):
    def __init__(self, path, title, content):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content
    def get_path(self):
        return self.path
    def get_title(self):
        return self.title
    def get_mimetype(self):
        return "text/html"
    def get_contentprovider(self):
        return StringProvider(self.content)
    def get_hints(self):
        return {Hint.FRONT_ARTICLE: True}
# Create ZIM file
with Creator(pathlib.Path("output.zim")) \
        .config_compression(Compression.zstd) \
        .config_indexing(True, "eng") \
        .config_verbose(True) \
        .set_mainpath("Main") as creator:
    # Add content
    item = MyItem("Main", "Home", "<h1>Welcome</h1>")
    creator.add_item(item)
    # Add metadata
    creator.add_metadata("Title", "My Archive")
    creator.add_metadata("Creator", "My Name")
    creator.add_metadata("Date", datetime.date.today())
    creator.add_metadata("Language", "eng")
    # Add icon
    icon_bytes = pathlib.Path("icon.png").read_bytes()
    creator.add_illustration(48, icon_bytes)
```
--------------------------------
### Get libzim Version
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/README.md
Retrieve the version string of the underlying C++ libzim library.
```python
from libzim.version import get_libzim_version
version = get_libzim_version()
# Returns C++ libzim version string (e.g., "9.0.1")
```
--------------------------------
### Get Keywords for an Item
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Retrieves comma-separated keywords or tags associated with an item. Returns a string.
```python
def get_keywords(self) -> str:
```
--------------------------------
### Create a ZIM file using Creator context manager
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Use the `Creator` class as a context manager to automatically handle ZIM file creation and cleanup. This is the recommended approach for creating ZIM files.
```python
from libzim.writer import Creator
import pathlib
with Creator(pathlib.Path("output.zim")) as creator:
    creator.add_item(my_item)
    creator.add_metadata("Title", "My Archive")
```
--------------------------------
### config_nbworkers
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Configures the number of worker threads used for compression. This should be called before the creation process starts.
```APIDOC
## config_nbworkers
### Description
Set number of worker threads for compression.
### Method
`config_nbworkers(self, nbWorkers: int) -> Self`
### Parameters
- **nbWorkers** (int) - Required - Number of worker threads for compression
### Request Example
```python
# Use 4 threads for compression
with Creator("output.zim").config_nbworkers(4) as c:
    c.add_item(item)
```
### Response
#### Success Response (200)
None
#### Response Example
None
### Raises
- `RuntimeError`: If called after creation has started
```
--------------------------------
### Get Current Cache Size
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/configuration.md
Retrieves the current size of the cluster cache in bytes. This is a read-only value.
```python
current = get_cluster_cache_current_size() # Returns bytes
```
--------------------------------
### Create ZIM file with metadata and illustration
Source: https://github.com/openzim/python-libzim/blob/main/README.md
This snippet demonstrates creating a ZIM file, adding items, setting metadata, and including an illustration. The illustration must be passed as raw bytes, obtained here by decoding a base64 string.
```python
import base64
from libzim.writer import Creator
# `item` and `item2` are Item instances defined elsewhere
illustration = base64.b64decode("iVBORw0KGgoAAAANSUhEUgAAADAAAAAwAQMAAABtzGvEAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAANQTFRFR3BMgvrS0gAAAAF0Uk5TAEDm2GYAAAANSURBVBjTY2AYBdQEAAFQAAGn4toWAAAAAElFTkSuQmCC")
with Creator("test.zim").config_indexing(True, "eng") as creator:
    creator.set_mainpath("home")
    creator.add_item(item)
    creator.add_item(item2)
    creator.add_illustration(48, illustration)
    for name, value in {
        "creator": "python-libzim",
        "description": "Created in python",
        "name": "my-zim",
        "publisher": "You",
        "title": "Test ZIM",
        "language": "eng",
        "date": "2024-06-30",
    }.items():
        creator.add_metadata(name.title(), value)
```
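The metadata loop above relies on `str.title()` to turn lower-case keys into capitalized names such as `Creator`. The short sketch below shows what that call produces. The `long_description` key is a hypothetical example that does not appear in the snippet; it is included only to show how `title()` treats word boundaries.

```python
metadata = {
    "creator": "python-libzim",
    "title": "Test ZIM",
    "language": "eng",
}

# str.title() upper-cases the first letter of each word.
print([name.title() for name in metadata])  # ['Creator', 'Title', 'Language']

# Caveat: word boundaries are kept, so a multi-word key is not collapsed.
print("long_description".title())  # Long_Description
```

For single-word keys the shortcut works as intended; a name that needs a specific capitalization is probably best passed to `add_metadata` verbatim.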
--------------------------------
### Create a Query Object
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md
Demonstrates the creation of a Query object and setting the search string using the set_query method, which supports method chaining.
```python
from libzim.search import Query
query = Query().set_query("search term")
```
--------------------------------
### Configure Compression Workers
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Set the number of worker threads used for compression. This should be called before creation starts.
```python
with Creator("output.zim").config_nbworkers(4) as c:
    c.add_item(item)
```
--------------------------------
### Initialize Archive Object
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/archive.md
Opens a ZIM file at the specified path for reading. Ensure the path points to a valid ZIM file.
```python
from libzim.reader import Archive
import pathlib
archive = Archive(pathlib.Path("example.zim"))
```
--------------------------------
### Get Word Count for an Item
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Retrieves the word count of an item, which is used for relevance ranking. Returns an integer.
```python
def get_wordcount(self) -> int: ...
```
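One way to produce a value for `get_wordcount` is to count the words in the item's text. The sketch below is an illustration only: `count_words` is a hypothetical helper, and it assumes a simple whitespace split over the HTML text nodes is good enough. The way libzim itself tokenizes content may differ.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def count_words(html: str) -> int:
    """Count whitespace-separated words in the text nodes of `html`."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return len(" ".join(extractor.parts).split())


print(count_words("<h1>Welcome to this ZIM</h1><p>Kiwix</p>"))  # 5
```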
--------------------------------
### Get Entry Item
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/archive.md
Retrieves the content Item for a given ZIM archive entry, automatically resolving any redirects.
```python
entry = archive.get_entry_by_path("Redirect_Page")
item = entry.get_item() # Automatically follows redirect
print(bytes(item.content).decode("UTF-8"))
```
--------------------------------
### Print Version Information to Console
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/version.md
Prints version details for libzim and its dependencies directly to standard output. This is useful for quick checks.
```python
from libzim.version import print_versions
import sys
print_versions() # Prints to stdout
```
--------------------------------
### SuggestionSearch.getResults
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/suggestion.md
Fetches a specific batch of suggestion results, starting from a given offset and returning a specified number of items.
```APIDOC
## SuggestionSearch.getResults
### Description
Get a batch of suggestions starting from a given offset.
### Method
```python
def getResults(self, start: int, count: int) -> SuggestionResultSet
```
### Parameters
#### Path Parameters
- **start** (int) - Yes - Starting offset (0-based index)
- **count** (int) - Yes - Number of suggestions to retrieve
### Returns
- **SuggestionResultSet** - Iterable of entry paths
### Request Example
```python
suggestions = searcher.suggest("Albert")
total = suggestions.getEstimatedMatches()
# Fetch first 10 suggestions
batch1 = suggestions.getResults(0, 10)
paths1 = list(batch1)
# Fetch next 10 suggestions
batch2 = suggestions.getResults(10, 10)
paths2 = list(batch2)
# Fetch all suggestions
all_paths = list(suggestions.getResults(0, total))
for path in all_paths:
    entry = archive.get_entry_by_path(path)
    print(f"- {entry.title}")
```
```
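The batching pattern above can be wrapped in a small generator. This is a minimal sketch, not library code: `iter_pages` and `FakeSuggestionSearch` are hypothetical names, and the fake class only stands in for a real `SuggestionSearch` so the example runs without a ZIM file. Because `getEstimatedMatches()` returns an estimate, a real page may come back shorter than requested.

```python
from collections.abc import Iterator


def iter_pages(search, total: int, page_size: int = 10) -> Iterator[list[str]]:
    """Yield pages of paths from any object exposing getResults(start, count)."""
    for start in range(0, total, page_size):
        count = min(page_size, total - start)
        yield list(search.getResults(start, count))


class FakeSuggestionSearch:
    """Hypothetical stand-in so the sketch runs without a ZIM file."""

    def __init__(self, paths):
        self._paths = list(paths)

    def getEstimatedMatches(self) -> int:
        return len(self._paths)

    def getResults(self, start: int, count: int):
        return iter(self._paths[start:start + count])


fake = FakeSuggestionSearch(f"A/Albert_{i}" for i in range(25))
pages = list(iter_pages(fake, fake.getEstimatedMatches(), page_size=10))
print([len(page) for page in pages])  # [10, 10, 5]
```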
--------------------------------
### Implementing a Custom ContentProvider
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Shows how to create a custom content provider by inheriting from ContentProvider and implementing the get_size and gen_blob methods. This allows for streaming large or dynamically generated content.
```python
from libzim.writer import ContentProvider, Blob
class CustomProvider(ContentProvider):
    def get_size(self) -> int:
        return 1000  # Total size in bytes
    def gen_blob(self):
        yield Blob(b"First chunk")
        yield Blob(b"Second chunk")
```
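When streaming content in chunks, the size reported up front and the bytes actually yielded should agree. The sketch below shows the chunking logic on its own, without libzim: `iter_chunks` is a hypothetical helper, and the assumption is that a provider's `gen_blob` would wrap each chunk in a `Blob` while `get_size` returns the payload length.

```python
from collections.abc import Iterator


def iter_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of `data`, each at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


payload = b"First chunk" + b"Second chunk"
chunks = list(iter_chunks(payload, 8))
reported_size = len(payload)  # what get_size() would return in this sketch

# The yielded bytes add up to exactly the reported size.
assert sum(len(chunk) for chunk in chunks) == reported_size
print(reported_size, [len(chunk) for chunk in chunks])  # 23 [8, 8, 7]
```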
--------------------------------
### Complete Module Import Pattern
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/modules.md
Demonstrates importing the entire libzim library and accessing its modules and classes via the top-level namespace.
```python
import libzim
from libzim import reader, writer, search, suggestion, version
# Access via module
archive = libzim.reader.Archive("file.zim")
```
--------------------------------
### Initialize Creator for writing to a ZIM file
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Manually initialize a `Creator` instance for writing to a specified ZIM file path. Ensure the directory exists and has write permissions.
```python
import pathlib
from libzim.writer import Creator
zim_path = pathlib.Path("archive.zim")
creator = Creator(zim_path)
with creator:
    # Add content here
    pass
```
--------------------------------
### Configure Creator with Zstd Compression
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/README.md
Configure the creator with zstd compression, cluster size, indexing, worker count, verbosity, and main path.
```python
creator = Creator("output.zim") \
.config_compression("zstd") \
.config_clustersize(8192) \
.config_indexing(True, "eng") \
.config_nbworkers(4) \
.config_verbose(True) \
.set_mainpath("Main")
```
--------------------------------
### Get Geographic Position of an Item
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Retrieves the geographic position (latitude, longitude) of an item if applicable. Returns a tuple of floats or None.
```python
def get_geoposition(self) -> tuple[float, float] | None: ...
```
--------------------------------
### Configure Creator for Zstandard compression
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Set the compression algorithm to Zstandard using `config_compression(Compression.zstd)` or `config_compression("zstd")`. This configuration must be done before adding items.
```python
from libzim.writer import Creator, Compression
# Using enum
with Creator("output.zim").config_compression(Compression.zstd) as c:
    c.add_item(item)
# Using string
with Creator("output.zim").config_compression("zstd") as c:
    c.add_item(item)
```
--------------------------------
### Read a ZIM File with python-libzim
Source: https://github.com/openzim/python-libzim/blob/main/README.md
Demonstrates how to open a ZIM file, access its main entry, retrieve specific entries by path, and perform full-text searches and suggestion lookups.
```python
from libzim.reader import Archive
from libzim.search import Query, Searcher
from libzim.suggestion import SuggestionSearcher
zim = Archive("test.zim")
print(f"Main entry is at {zim.main_entry.get_item().path}")
entry = zim.get_entry_by_path("home/fr")
print(f"Entry {entry.title} at {entry.path} is {entry.get_item().size}b.")
print(bytes(entry.get_item().content).decode("UTF-8"))
# searching using full-text index
search_string = "Welcome"
query = Query().set_query(search_string)
searcher = Searcher(zim)
search = searcher.search(query)
search_count = search.getEstimatedMatches()
print(f"there are {search_count} matches for {search_string}")
print(list(search.getResults(0, search_count)))
# accessing suggestions
search_string = "kiwix"
suggestion_searcher = SuggestionSearcher(zim)
suggestion = suggestion_searcher.suggest(search_string)
suggestion_count = suggestion.getEstimatedMatches()
print(f"there are {suggestion_count} matches for {search_string}")
print(list(suggestion.getResults(0, suggestion_count)))
```
--------------------------------
### libzim.suggestion Module
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/modules.md
Enables autocomplete and title suggestions for ZIM archives. It allows users to get suggestions based on a given prefix.
```APIDOC
## Module: libzim.suggestion
**Purpose**: Autocomplete and title suggestions
**Key Classes**
| Class | Purpose |
|-------|---------|
| `SuggestionSearcher` | Get prefix-based suggestions |
| `SuggestionSearch` | Suggestion results container |
| `SuggestionResultSet` | Iterable of suggestion paths |
**Requirements**
- Archive must have `has_title_index == True`
- Created with title index support
**Common Operations**
```python
from libzim.reader import Archive
from libzim.suggestion import SuggestionSearcher
archive = Archive("archive.zim")
if not archive.has_title_index:
    print("Archive has no title index")
else:
    searcher = SuggestionSearcher(archive)
    suggestions = searcher.suggest("Albert")
    count = suggestions.getEstimatedMatches()
    print(f"Found {count} suggestions")
    # Get all suggestions
    for path in suggestions.getResults(0, count):
        entry = archive.get_entry_by_path(path)
        print(f"- {entry.title}")
```
```
--------------------------------
### Creator Class Initialization
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Initializes a `Creator` object to begin writing a ZIM file. The `filename` parameter specifies the output path, and the directory must exist with write permissions. It's recommended to use this class as a context manager.
```APIDOC
## Creator Constructor
### Description
Initialize a Creator for writing to a ZIM file.
### Method
__init__
### Parameters
#### Path Parameters
- **filename** (`pathlib.Path`) - Yes - Path where the ZIM file will be created. Directory must exist.
### Raises
- **IOError**: If the directory doesn't exist or lacks write permissions.
### Example
```python
import pathlib
from libzim.writer import Creator
zim_path = pathlib.Path("archive.zim")
creator = Creator(zim_path)
with creator:
    # Add content here
    pass
```
```
--------------------------------
### Get Estimated Matches from SuggestionSearch
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/suggestion.md
Obtain the estimated count of suggestions that match the search query. This method is called on a SuggestionSearch object.
```python
suggestions = searcher.suggest("Einstein")
count = suggestions.getEstimatedMatches()
print(f"Found {count} articles starting with 'Einstein'")
```
--------------------------------
### Create a ZIM File
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/README.md
Illustrates how to create a new ZIM file using the Creator class. It defines a custom Item class for articles and configures indexing before adding items and metadata.
```python
from libzim.writer import Creator, Item, StringProvider, Hint
import pathlib
class Article(Item):
    def __init__(self, path, title, content):
        super().__init__()
        self.path, self.title, self.content = path, title, content
    def get_path(self): return self.path
    def get_title(self): return self.title
    def get_mimetype(self): return "text/html"
    def get_contentprovider(self): return StringProvider(self.content)
    def get_hints(self): return {Hint.FRONT_ARTICLE: True}
with Creator(pathlib.Path("output.zim")) \
        .config_indexing(True, "eng") as creator:
    creator.add_item(Article("Main", "Home", "Welcome"))
    creator.add_metadata("Title", "My Archive")
```
--------------------------------
### Iterating Search Results
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md
Demonstrates iterating over a SearchResultSet to get individual paths. The result set can be iterated directly or converted to a list.
```python
from collections.abc import Iterator
from libzim.search import SearchResultSet
result_set: SearchResultSet
for path in result_set:  # Uses __iter__
    print(path)  # path: str
# Or convert to list
paths: list[str] = list(result_set)
```
--------------------------------
### Hint Enum Definition
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md
Defines hints for the ZIM creator to manage entry content. Use these hints to guide the creation process.
```python
from libzim.writer import Hint
class Hint(enum.Enum):
    COMPRESS: Self
    FRONT_ARTICLE: Self
```
--------------------------------
### Write a ZIM File with python-libzim
Source: https://github.com/openzim/python-libzim/blob/main/README.md
Provides a custom Item class and demonstrates how to create a ZIM file with multiple entries, including one from a string and another from a file.
```python
import base64
import pathlib
from libzim.writer import Creator, Item, StringProvider, FileProvider, Hint
class MyItem(Item):
    def __init__(self, title, path, content="", fpath=None):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content
        self.fpath = fpath
    def get_path(self):
        return self.path
    def get_title(self):
        return self.title
    def get_mimetype(self):
        return "text/html"
    def get_contentprovider(self):
        # Serve content from a file when a path is given, else from the string.
        if self.fpath is not None:
            return FileProvider(self.fpath)
        return StringProvider(self.content)
    def get_hints(self):
        return {Hint.FRONT_ARTICLE: True}
content = """Web Page Title
Welcome to this ZIM
Kiwix
"""
pathlib.Path("home-fr.html").write_text(
"""
Bonjour
this is home-fr
"""
)
item = MyItem("Hello Kiwix", "home", content)
item2 = MyItem("Bonjour Kiwix", "home/fr", None, "home-fr.html")
```
--------------------------------
### Get Suggestions with SuggestionSearcher
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/suggestion.md
Retrieves suggestions based on a query string using prefix-based autocomplete. The searcher must be initialized with a valid archive.
```python
from libzim.suggestion import SuggestionSearcher
searcher = SuggestionSearcher(archive)
suggestions = searcher.suggest("Albert")
# Get number of matches
count = suggestions.getEstimatedMatches()
print(f"Found {count} suggestions")
# Get suggestions
suggestion_paths = list(suggestions.getResults(0, count))
for path in suggestion_paths:
    entry = archive.get_entry_by_path(path)
    print(f"- {entry.title}")
```
--------------------------------
### Manage Archive Caches
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/README.md
Get and set the maximum cluster cache size, and check current usage. Also configure the entry cache.
```python
from libzim.reader import (
    get_cluster_cache_max_size,
    set_cluster_cache_max_size,
    get_cluster_cache_current_size,
)
archive.dirent_cache_max_size = 256
print(archive.dirent_cache_current_size)
```
--------------------------------
### Implementing IndexData with Optional Geoposition
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md
Example of implementing the IndexData interface where a method can return a tuple of floats or None, indicating the absence of geoposition data.
```python
from libzim.writer import IndexData
class MyIndexData(IndexData):
    def get_geoposition(self) -> tuple[float, float] | None:
        # Return None if no position available
        return None
```
--------------------------------
### Get Current Cluster Cache Size in Python
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/archive.md
Retrieves the current size of the cluster cache in bytes. This function requires importing `get_cluster_cache_current_size` from `libzim.reader`.
```python
from libzim.reader import get_cluster_cache_current_size
current = get_cluster_cache_current_size()
print(f"Current cache usage: {current} bytes")
```
--------------------------------
### Get Archive UUID
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/archive.md
Retrieves the unique identifier (UUID) of the ZIM file and prints its hexadecimal representation. This is useful for identifying specific ZIM archives.
```python
from libzim.reader import Archive
archive = Archive("test.zim")
print(archive.uuid.hex) # 32-character hex string
```
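The `.hex` form is convenient for storing or comparing archive identifiers. The sketch below assumes `archive.uuid` behaves like a standard-library `uuid.UUID`, as the `.hex` attribute suggests. The UUID value is made up for illustration and does not come from a real archive.

```python
import uuid

# Hypothetical value standing in for archive.uuid.
archive_uuid = uuid.UUID("12345678-1234-5678-1234-567812345678")

hex_id = archive_uuid.hex          # 32 hex characters, no dashes
restored = uuid.UUID(hex=hex_id)   # parse the stored string back into a UUID

print(hex_id, len(hex_id))
print(restored == archive_uuid)  # True
```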
--------------------------------
### Linear usage of Creator without context manager
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Shows how to use the `Creator` class linearly by manually calling `__enter__` and `__exit__` methods. This approach requires explicit management of the file lifecycle.
```python
creator = Creator("output.zim")
creator.__enter__()
creator.add_item(item)
creator.__exit__(None, None, None)
```
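When the lifecycle is managed by hand, an exception between `__enter__` and `__exit__` would skip the closing call. One option is the standard library's `contextlib.ExitStack`, which keeps the code linear while still guaranteeing that `__exit__` runs. This is a sketch, not the project's recommended pattern: `FakeCreator` is a hypothetical stand-in that only records calls, so the example runs without libzim.

```python
from contextlib import ExitStack


class FakeCreator:
    """Hypothetical stand-in for libzim.writer.Creator; it only records calls."""

    def __init__(self, filename):
        self.filename = filename
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False

    def add_item(self, item):
        self.events.append(f"add:{item}")


stack = ExitStack()
creator = stack.enter_context(FakeCreator("output.zim"))  # calls __enter__
try:
    creator.add_item("home")
finally:
    # close() calls __exit__(None, None, None), even if add_item raised.
    stack.close()

print(creator.events)  # ['enter', 'add:home', 'exit']
```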
--------------------------------
### Custom Item with Index Data
Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md
Example of implementing the optional get_indexdata method in a custom Item subclass to provide custom indexing data for full-text search.
```python
from libzim.writer import Item, IndexData
class SearchableItem(Item):
    def get_indexdata(self):
        class CustomIndexData(IndexData):
            def has_indexdata(self):
                return True
            def get_title(self):
                return "Article Title"
            def get_content(self):
                return "Searchable content text"
            def get_keywords(self):
                return "keyword1, keyword2"
            def get_wordcount(self):
                return 50
        return CustomIndexData()
```