### Setup python-libzim on Ubuntu/Debian

Source: https://github.com/openzim/python-libzim/blob/main/CONTRIBUTING.md

Installs the necessary system packages and Python dependencies, and clones the repository. Sets the `CFLAGS` and `LDFLAGS` environment variables to point to the libzim installation (`LIBZIM_DIR`), then builds the Cython extension and installs the package in development mode.

```bash
apt install coreutils wget git ca-certificates \
    g++ pkg-config libtool automake autoconf make meson ninja-build \
    liblzma-dev zlib1g-dev libicu-dev libgumbo-dev libmagic-dev

pip3 install --upgrade pip pipenv

# LIBZIM_DIR must point to the libzim installation prefix
export CFLAGS="-I${LIBZIM_DIR}/include"
export LDFLAGS="-L${LIBZIM_DIR}/lib/x86_64-linux-gnu"

git clone https://github.com/openzim/python-libzim
cd python-libzim
python setup.py build_ext
pipenv install --dev
pipenv run pip install -e .
```

--------------------------------

### Complete Workflow Example

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/modules.md

This example demonstrates the full lifecycle of a ZIM archive: creating it, adding articles, writing metadata, and then reading and querying it with the search and suggestion APIs. It imports classes from `libzim.reader`, `libzim.writer`, `libzim.search`, and `libzim.suggestion`.
```python
import pathlib
import datetime
from libzim.reader import Archive
from libzim.writer import Creator, Item, StringProvider, Hint
from libzim.search import Searcher, Query
from libzim.suggestion import SuggestionSearcher


# STEP 1: Define content
class Article(Item):
    def __init__(self, path, title, content):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content

    def get_path(self):
        return self.path

    def get_title(self):
        return self.title

    def get_mimetype(self):
        return "text/html"

    def get_contentprovider(self):
        return StringProvider(self.content)

    def get_hints(self):
        return {Hint.FRONT_ARTICLE: True}


# STEP 2: Create archive
output = pathlib.Path("myarchive.zim")
articles = [
    Article("Main", "Home", "<h1>Welcome</h1><p>Homepage</p>"),
    Article("About", "About Us", "<h1>About</h1><p>About this site</p>"),
]

with Creator(output) \
        .config_indexing(True, "eng") \
        .config_compression("zstd") as creator:
    for article in articles:
        creator.add_item(article)

    creator.add_metadata("Title", "My Wiki")
    creator.add_metadata("Creator", "Admin")
    creator.add_metadata("Date", datetime.date.today())
    creator.set_mainpath("Main")

print(f"Created: {output}")

# STEP 3: Read and query
archive = Archive(output)
print("\nArchive info:")
print(f"  Entries: {archive.entry_count}")
print(f"  Main: {archive.main_entry.path}")
print(f"  Title: {archive.get_metadata('Title').decode()}")

# STEP 4: Search
if archive.has_fulltext_index:
    searcher = Searcher(archive)
    query = Query().set_query("Welcome")
    results = searcher.search(query)
    print(f"\nSearch results for 'Welcome': {results.getEstimatedMatches()}")

# STEP 5: Suggestions
if archive.has_title_index:
    suggester = SuggestionSearcher(archive)
    suggestions = suggester.suggest("A")
    print(f"Suggestions for 'A': {suggestions.getEstimatedMatches()}")
```

--------------------------------

### Custom Item Hints Example

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Example of implementing the `get_hints` method in a custom `Item` subclass to provide entry-specific hints such as `FRONT_ARTICLE` and `COMPRESS`.

```python
def get_hints(self):
    return {
        Hint.FRONT_ARTICLE: True,  # This is a main article
        Hint.COMPRESS: True,       # Compress this content
    }
```

--------------------------------

### FileProvider Example

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Demonstrates initializing `FileProvider` with a file path. This provider serves content directly from a file on disk.
```python
import pathlib
from libzim.writer import FileProvider

filepath = pathlib.Path("large_file.html")
provider = FileProvider(filepath)
```

--------------------------------

### Install python-libzim

Source: https://github.com/openzim/python-libzim/blob/main/README.md

Install the python-libzim package using pip. This command installs pre-compiled wheels on supported platforms.

```sh
pip install libzim
```

--------------------------------

### Complete Usage Example for Article Suggestions

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/suggestion.md

Demonstrates how to get article title suggestions from a ZIM archive using `SuggestionSearcher`. The archive must have a title index for this to work.

```python
from libzim.reader import Archive
from libzim.suggestion import SuggestionSearcher


def get_suggestions(archive_path: str, prefix: str, limit: int = 10):
    """Get article title suggestions for a prefix."""
    archive = Archive(archive_path)

    # Check if archive has title index
    if not archive.has_title_index:
        print("Archive does not have a title index")
        return []

    # Create searcher and get suggestions
    searcher = SuggestionSearcher(archive)
    suggestions = searcher.suggest(prefix)

    # Get result count
    total_suggestions = suggestions.getEstimatedMatches()
    if total_suggestions == 0:
        print(f"No suggestions found for '{prefix}'")
        return []

    # Fetch suggestions up to the limit
    fetch_count = min(limit, total_suggestions)
    result_set = suggestions.getResults(0, fetch_count)

    results = []
    for path in result_set:
        entry = archive.get_entry_by_path(path)
        results.append({
            "path": entry.path,
            "title": entry.title,
        })
    return results


# Usage
suggestions = get_suggestions("wikipedia.zim", "Albert", limit=20)
for item in suggestions:
    print(f"{item['title']} ({item['path']})")
```

--------------------------------

### Complete Archive Search Example

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md
Demonstrates how to open a ZIM archive, check for a full-text index, create a searcher, execute a query, and display a batch of results.

```python
from libzim.reader import Archive
from libzim.search import Searcher, Query


def search_archive(archive_path: str, search_term: str):
    """Search a ZIM archive and display results."""
    archive = Archive(archive_path)

    # Check if archive has search index
    if not archive.has_fulltext_index:
        print("Archive does not have a full-text search index")
        return

    # Create searcher and execute query
    searcher = Searcher(archive)
    query = Query().set_query(search_term)
    results = searcher.search(query)

    # Get result count
    total_matches = results.getEstimatedMatches()
    print(f"Search for '{search_term}': {total_matches} matches")

    if total_matches == 0:
        return

    # Fetch and display first 20 results
    batch_size = 20
    result_set = results.getResults(0, min(batch_size, total_matches))
    for i, path in enumerate(result_set, 1):
        entry = archive.get_entry_by_path(path)
        print(f"{i}. {entry.title} ({entry.path})")


# Usage
search_archive("wikipedia.zim", "Albert Einstein")
```

--------------------------------

### Use Creator as a context manager

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Demonstrates the basic usage of the `Creator` class with a `with` statement, ensuring the ZIM file is properly started and finalized.

```python
with Creator("output.zim") as creator:
    creator.add_item(item)
# File is finalized and closed here
```

--------------------------------

### get_versions

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/version.md

Get version information as an ordered dictionary, mapping component names to their version strings.

```APIDOC
## get_versions

### Description
Get version information as an ordered dictionary.
### Method
`get_versions() -> OrderedDict[str, str]`

### Parameters
None

### Request Example
```python
from libzim.version import get_versions

versions = get_versions()
```

### Response
#### Success Response
- **versions** (`OrderedDict[str, str]`) - Mapping of component names to version strings.

#### Response Example
```python
{
    "libzim": "9.0.1",
    "python": "3.11.5",
    "python-libzim": "3.10.1-dev0"
}
```
```

--------------------------------

### Setup Debug Logging with Version Info

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/version.md

Configures logging at DEBUG level and logs the output of `print_versions`, captured in a string buffer, so the environment is recorded when troubleshooting.

```python
import io
import logging

from libzim.version import print_versions


def setup_debug_logging():
    """Setup logging with version information."""
    logging.basicConfig(level=logging.DEBUG)
    logger = logging.getLogger(__name__)

    # Capture print_versions() output in a buffer and log it
    version_output = io.StringIO()
    print_versions(out=version_output)
    logger.debug(f"Version info:\n{version_output.getvalue()}")


setup_debug_logging()
```

--------------------------------

### StringProvider Example

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Illustrates how to initialize `StringProvider` with either a string or a bytes object to provide inline content. String content is automatically encoded as UTF-8.

```python
from libzim.writer import StringProvider

# From string
provider = StringProvider("<p>HTML Content</p>")

# From bytes
provider = StringProvider(b"Binary data")
```

--------------------------------

### Retrieving and Iterating OrderedDict Versions

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md

Shows how to get version information as an `OrderedDict` and iterate through its components and versions.

```python
from collections import OrderedDict
from libzim.version import get_versions

versions: OrderedDict[str, str] = get_versions()

# Iterate in order
for component, version in versions.items():
    print(f"{component}: {version}")
```

--------------------------------

### Handling Configuration Errors During Creator Setup

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/errors.md

When using the `Creator`, complete all configuration before entering the context and adding items. Attempting to configure compression or indexing after creation has started raises a `RuntimeError`.

```python
from libzim.writer import Creator

try:
    creator = Creator("output.zim")
    with creator:
        creator.add_item(item1)
        # Error: trying to configure after items added
        creator.config_compression("zstd")
except RuntimeError as e:
    print(f"Configuration error: {e}")
```

```python
# Correct way
with Creator("output.zim") \
        .config_compression("zstd") \
        .config_indexing(True, "eng") as creator:
    creator.add_item(item1)
```

--------------------------------

### Implementing a Generator for Blobs

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md

Example of implementing a `ContentProvider` that yields `Blob` objects from a generator. This is useful for providing content in chunks.
```python
from collections.abc import Generator
from libzim.writer import Blob, ContentProvider


class MyProvider(ContentProvider):
    # Generator[Blob, None, None]: yields Blobs, accepts no sent values, returns nothing
    def gen_blob(self) -> Generator[Blob, None, None]:
        yield Blob(b"Part 1")
        yield Blob(b"Part 2")
```

--------------------------------

### Clone and Test python-libzim

Source: https://github.com/openzim/python-libzim/blob/main/README.md

Clone the python-libzim repository and run the tests with coverage. Ensure you have the necessary development tools installed.

```sh
git clone git@github.com:openzim/python-libzim.git && cd python-libzim
# hatch run test:coverage
```

--------------------------------

### Paginate Through All Search Results

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md

Iterates through all search results using pagination. It reads the estimated total number of matches and fetches results in batches of a fixed size.

```python
results = searcher.search(query)
total = results.getEstimatedMatches()
batch_size = 20

for offset in range(0, total, batch_size):
    batch = results.getResults(offset, batch_size)
    for path in batch:
        entry = archive.get_entry_by_path(path)
        print(f"- {entry.title}")
```

--------------------------------

### Error Handling Example

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/suggestion.md

Demonstrates how to handle potential errors when using the `SuggestionSearcher`, such as a missing title index or no suggestions found.

```APIDOC
## Suggestion Searcher Error Handling

### Description
This example shows how to safely use the SuggestionSearcher, including checks for the archive's title index and handling of potential runtime errors.

### Usage
1. Ensure the ZIM archive has a title index.
2. Perform a search using `searcher.suggest()`.
3. Handle cases where no suggestions are found.
4. Catch `RuntimeError` for other potential issues.
### Example
```python
from libzim.reader import Archive
from libzim.suggestion import SuggestionSearcher

try:
    archive = Archive("example.zim")

    if not archive.has_title_index:
        print("Archive lacks title index for suggestions")
        exit(1)

    searcher = SuggestionSearcher(archive)
    suggestions = searcher.suggest("search term")

    count = suggestions.getEstimatedMatches()
    if count > 0:
        for path in suggestions.getResults(0, count):
            print(archive.get_entry_by_path(path).title)
    else:
        print("No suggestions found")

except RuntimeError as e:
    print(f"Error: {e}")
```
```

--------------------------------

### Creating an Item with FileProvider

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

An example of defining a custom `Item` class that uses `FileProvider` to serve content from a specified file. It sets the item's path, title, mimetype, content provider, and hints.

```python
from libzim.writer import Item, FileProvider


class FileItem(Item):
    def __init__(self, path, title, filepath):
        super().__init__()
        self.path = path
        self.title = title
        self.filepath = filepath

    def get_path(self):
        return self.path

    def get_title(self):
        return self.title

    def get_mimetype(self):
        return "text/html"

    def get_contentprovider(self):
        return FileProvider(self.filepath)

    def get_hints(self):
        return {}
```

--------------------------------

### Get All Version Information as an Ordered Dictionary

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/version.md

Retrieves version details for all components, including libzim, Python, and dependencies, as an ordered dictionary. This is ideal for structured access and iteration.
```python
from libzim.version import get_versions

versions = get_versions()
for component, version in versions.items():
    print(f"{component}: {version}")

# Access specific versions
print(f"libzim: {versions.get('libzim')}")
print(f"Python: {versions.get('python')}")
```

--------------------------------

### Hint Dictionary Example

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md

Illustrates the structure of the dictionary used for hints in `Item.get_hints()` and `Creator.add_redirection()`. It maps `Hint` enum members to boolean or integer values.

```python
from libzim.writer import Hint

hints: dict[Hint, int] = {
    Hint.FRONT_ARTICLE: True,  # or 1
    Hint.COMPRESS: False,      # or 0
}
```

--------------------------------

### Execute Search Query and Process Results

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md

Illustrates executing a search query and retrieving results. It shows how to get the estimated number of matches and iterate through the entry paths to access entry titles.

```python
from libzim.search import Query, Searcher

searcher = Searcher(archive)
query = Query().set_query("python programming")
results = searcher.search(query)

# Get number of matches
count = results.getEstimatedMatches()
print(f"Found {count} matches")

# Get results
result_paths = list(results.getResults(0, count))
for path in result_paths:
    entry = archive.get_entry_by_path(path)
    print(f"- {entry.title}")
```

--------------------------------

### Configure Creator for verbose output

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Enable verbose console output during ZIM file creation by calling `config_verbose(True)`. This prints detailed progress information. It must be called before creation starts.
```python
with Creator("output.zim").config_verbose(True) as c:
    c.add_item(item)
    # Prints progress information
```

--------------------------------

### Read a ZIM File

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/README.md

Demonstrates how to open a ZIM file, retrieve an entry by its path, get its item, and read its content. It also shows how to perform a full-text search if the archive has an index.

```python
from libzim.reader import Archive

archive = Archive("example.zim")

# Get entry
entry = archive.get_entry_by_path("Main_Page")
item = entry.get_item()

# Read content
html = bytes(item.content).decode("UTF-8")

# Search (if available)
if archive.has_fulltext_index:
    from libzim.search import Searcher, Query

    searcher = Searcher(archive)
    results = searcher.search(Query().set_query("python"))
```

--------------------------------

### Creator Context Manager

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

The `Creator` class acts as a context manager, simplifying file creation. The `__enter__` method starts the ZIM file creation, and the `__exit__` method ensures the file is properly finalized and closed. This is the recommended way to use the `Creator`.

```APIDOC
## Creator Context Manager

### Description
Supports the `with` statement. `__enter__` starts the ZIM file creation, `__exit__` finalizes and closes the file.

### Method
`__enter__`, `__exit__`

### Example (Recommended)
```python
with Creator("output.zim") as creator:
    creator.add_item(item)
# File is finalized and closed here
```

### Example (Linear Usage)
```python
creator = Creator("output.zim")
creator.__enter__()
creator.add_item(item)
creator.__exit__(None, None, None)
```
```

--------------------------------

### Basic Search Result Set Iteration

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md

A simple example of iterating through a `SearchResultSet` to print each result path.
This is a fundamental way to access the data returned by a search.

```python
results = searcher.search(query)
result_set = results.getResults(0, 10)

for path in result_set:
    print(path)
```

--------------------------------

### Entry Path Formats

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md

Provides examples of different string formats used for entry paths within ZIM files, including standard hierarchical paths and the newer namespace scheme. It also lists common namespace prefixes.

```python
path: str = "A/Main_Page"      # Standard format
path: str = "mainPage"         # New namespace scheme
path: str = "Media/image.png"  # Media files
```

--------------------------------

### Check libzim Version Compatibility

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/version.md

Compares the installed libzim version against a required version to ensure compatibility. This matters for features that depend on specific libzim capabilities. The comparison assumes purely numeric, dot-separated version strings.

```python
from libzim.version import get_libzim_version

required_version = "9.0.0"
actual_version = get_libzim_version()

# Compare as tuples of integers, e.g. (9, 0, 1) >= (9, 0, 0)
if tuple(map(int, actual_version.split("."))) >= \
        tuple(map(int, required_version.split("."))):
    print(f"libzim {actual_version} meets requirement {required_version}")
else:
    print(f"ERROR: libzim {actual_version} does not meet requirement {required_version}")
```

--------------------------------

### Pathlib.Path Conversion and Properties

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md

Provides examples of converting strings to `pathlib.Path` objects and accessing common path properties such as name, parent, stem, suffix, and absolute path. This is useful for file system operations.
```python
import pathlib

# From string
path = pathlib.Path("archive.zim")

# To string
path_str = str(path)

# Properties
print(path.name)        # "archive.zim"
print(path.parent)      # Current directory
print(path.stem)        # "archive"
print(path.suffix)      # ".zim"
print(path.absolute())  # Full path
```

--------------------------------

### Iterate Through Search Result Paths

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md

Shows how to iterate directly over a `SearchResultSet` to get entry paths. It also demonstrates converting the result set to a list or unpacking it.

```python
results = searcher.search(query)
result_set = results.getResults(0, 10)

# Convert to list
paths = list(result_set)

# Or iterate directly
for path in result_set:
    entry = archive.get_entry_by_path(path)
    print(f"Found: {entry.title}")

# Or unpack for small result sets
first_path, *rest = result_set
```

--------------------------------

### Iterating Content with StringProvider

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

An example demonstrating how to use the `feed()` method of a `StringProvider` (or any `ContentProvider`) to retrieve content chunks iteratively until an empty blob is returned, signaling the end of content.

```python
from libzim.writer import StringProvider

provider = StringProvider("Content")
while True:
    blob = provider.feed()
    if not blob.size():
        # An empty blob marks the end of the content
        break
    process(blob)
```

--------------------------------

### Validate Item Before Adding to Creator

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/errors.md

This example demonstrates how to implement and use a custom `validate` method on an `Item` subclass before adding it to a `Creator`. It checks that essential item properties are present and valid.

```python
from libzim.writer import Creator, Item


class MyItem(Item):
    def __init__(self, path, title, content):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content

    # ... implement required methods

    def validate(self):
        """Validate item before adding."""
        assert self.get_path(), "Path cannot be empty"
        assert self.get_title(), "Title cannot be empty"
        assert self.get_mimetype(), "Mimetype cannot be empty"

        provider = self.get_contentprovider()
        assert provider, "ContentProvider cannot be None"
        assert provider.get_size() > 0, "Content cannot be empty"

        hints = self.get_hints()
        assert isinstance(hints, dict), "Hints must be dict"


item = MyItem("path", "title", "content")
item.validate()

with Creator("output.zim") as creator:
    creator.add_item(item)
```

--------------------------------

### Build wheel using system libzim

Source: https://github.com/openzim/python-libzim/blob/main/README.md

Builds a wheel for python-libzim against a system-installed C++ libzim. The resulting wheel does not bundle the C++ libzim binary. Install libzim through your system's package manager first.

```sh
# using system-installed C++ libzim
brew install libzim            # macOS
apt-get install libzim-dev     # debian
dnf install libzim-devel       # fedora
USE_SYSTEM_LIBZIM=1 python3 -m build --wheel
```

--------------------------------

### Initialize Searcher and Perform Basic Search

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md

Demonstrates how to initialize the `Searcher` with an archive and perform a basic search using a `Query` object. The archive must have a full-text index.

```python
from libzim.reader import Archive
from libzim.search import Searcher, Query

archive = Archive("example.zim")
if archive.has_fulltext_index:
    searcher = Searcher(archive)
    query = Query().set_query("search term")
    results = searcher.search(query)
```

--------------------------------

### Get Batch Suggestions

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/suggestion.md

Efficiently retrieve suggestions for multiple prefixes from a ZIM archive.
This function processes a list of prefixes and returns a dictionary mapping each prefix to its top suggested paths (at most five per prefix).

```python
from libzim.reader import Archive
from libzim.suggestion import SuggestionSearcher


def batch_suggestions(archive_path: str, prefixes: list[str]):
    """Get suggestions for multiple prefixes."""
    archive = Archive(archive_path)
    searcher = SuggestionSearcher(archive)

    results = {}
    for prefix in prefixes:
        suggestions = searcher.suggest(prefix)
        count = suggestions.getEstimatedMatches()
        paths = list(suggestions.getResults(0, min(5, count)))
        results[prefix] = paths

    return results
```

--------------------------------

### Get libzim Version Information

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/modules.md

Retrieve version details for the libzim library. Use `get_libzim_version()` for the C++ core version or `get_versions()` for a dictionary of all components. `print_versions()` writes directly to stdout.

```python
from libzim.version import get_versions, get_libzim_version, print_versions

# Get specific version
libzim_version = get_libzim_version()
print(f"libzim: {libzim_version}")

# Get all versions
versions = get_versions()
for component, version in versions.items():
    print(f"{component}: {version}")

# Print to stdout
print_versions()
```

--------------------------------

### Move DLLs for editable install on Windows

Source: https://github.com/openzim/python-libzim/blob/main/README.md

When installing python-libzim in editable mode (`pip install -e .`) on Windows, you need to manually place the bundled DLLs (libzim and libicu) at the root of the repository.

```powershell
Move-Item -Force -Path .\libzim\*.dll -Destination .\
```

--------------------------------

### Initialize Searcher with Archive

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md

Shows how to create a `Searcher` instance for a given ZIM archive. The archive must have a full-text index for search operations to be valid.
```python
from libzim.reader import Archive
from libzim.search import Searcher

archive = Archive("wikipedia.zim")
searcher = Searcher(archive)
```

--------------------------------

### get_libzim_version

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/version.md

Get the version of the underlying C++ libzim library as a string.

```APIDOC
## get_libzim_version

### Description
Get the version of the underlying C++ libzim library.

### Method
`get_libzim_version() -> str`

### Parameters
None

### Request Example
```python
from libzim.version import get_libzim_version

version = get_libzim_version()
```

### Response
#### Success Response
- **version** (`str`) - Version string (e.g., "9.0.1")

#### Response Example
```
"9.0.1"
```
```

--------------------------------

### Create a ZIM Archive with Custom Items

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/modules.md

Shows how to create a new ZIM file, configure compression and indexing, add custom content items defined by a `MyItem` class, and include metadata and an illustration.

```python
import pathlib
import datetime
from libzim.writer import (
    Creator, Item, StringProvider, Hint, Compression
)


# Define custom item
class MyItem(Item):
    def __init__(self, path, title, content):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content

    def get_path(self):
        return self.path

    def get_title(self):
        return self.title

    def get_mimetype(self):
        return "text/html"

    def get_contentprovider(self):
        return StringProvider(self.content)

    def get_hints(self):
        return {Hint.FRONT_ARTICLE: True}


# Create ZIM file
with Creator(pathlib.Path("output.zim")) \
        .config_compression(Compression.zstd) \
        .config_indexing(True, "eng") \
        .config_verbose(True) \
        .set_mainpath("Main") as creator:

    # Add content
    item = MyItem("Main", "Home", "<h1>Welcome</h1>")
    creator.add_item(item)

    # Add metadata
    creator.add_metadata("Title", "My Archive")
    creator.add_metadata("Creator", "My Name")
    creator.add_metadata("Date", datetime.date.today())
    creator.add_metadata("Language", "eng")

    # Add icon
    icon_bytes = pathlib.Path("icon.png").read_bytes()
    creator.add_illustration(48, icon_bytes)
```

--------------------------------

### Get libzim Version

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/README.md

Retrieve the version string of the underlying C++ libzim library.

```python
from libzim.version import get_libzim_version

version = get_libzim_version()
# Returns C++ libzim version string (e.g., "9.0.1")
```

--------------------------------

### Get Keywords for an Item

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Retrieves comma-separated keywords or tags associated with an item. Returns a string.

```python
def get_keywords(self) -> str:
    ...
```

--------------------------------

### Create a ZIM file using Creator context manager

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Use the `Creator` class as a context manager to handle ZIM file creation and cleanup automatically. This is the recommended approach for creating ZIM files.

```python
from libzim.writer import Creator
import pathlib

with Creator(pathlib.Path("output.zim")) as creator:
    creator.add_item(my_item)
    creator.add_metadata("Title", "My Archive")
```

--------------------------------

### config_nbworkers

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Configures the number of worker threads used for compression. This should be called before the creation process starts.

```APIDOC
## config_nbworkers

### Description
Set number of worker threads for compression.
### Method
`config_nbworkers(self, nbWorkers: int) -> Self`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Parameters
- **nbWorkers** (int) - Required - Number of worker threads for compression

### Request Example
```python
# Use 4 threads for compression
with Creator("output.zim").config_nbworkers(4) as c:
    c.add_item(item)
```

### Response
#### Success Response (200)
None

#### Response Example
None

### Raises
- `RuntimeError`: If called after creation has started
```

--------------------------------

### Get Current Cache Size

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/configuration.md

Retrieves the current size of the cluster cache in bytes. This is a read-only value.

```python
current = get_cluster_cache_current_size()  # Returns bytes
```

--------------------------------

### Create ZIM file with metadata and illustration

Source: https://github.com/openzim/python-libzim/blob/main/README.md

This snippet demonstrates creating a ZIM file, adding items, setting metadata, and including an illustration. The illustration must be passed as bytes, here decoded from a base64 string.
```python
import base64

from libzim.writer import Creator

# `item` and `item2` are Item instances defined elsewhere
illustration = base64.b64decode("iVBORw0KGgoAAAANSUhEUgAAADAAAAAwAQMAAABtzGvEAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAANQTFRFR3BMgvrS0gAAAAF0Uk5TAEDm2GYAAAANSURBVBjTY2AYBdQEAAFQAAGn4toWAAAAAElFTkSuQmCC")

with Creator("test.zim").config_indexing(True, "eng") as creator:
    creator.set_mainpath("home")
    creator.add_item(item)
    creator.add_item(item2)
    creator.add_illustration(48, illustration)
    for name, value in {
        "creator": "python-libzim",
        "description": "Created in python",
        "name": "my-zim",
        "publisher": "You",
        "title": "Test ZIM",
        "language": "eng",
        "date": "2024-06-30",
    }.items():
        # Metadata names are capitalized: "creator" -> "Creator"
        creator.add_metadata(name.title(), value)
```

--------------------------------

### Create a Query Object

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/search.md

Demonstrates creating a `Query` object and setting the search string with the `set_query` method, which supports method chaining.

```python
from libzim.search import Query

query = Query().set_query("search term")
```

--------------------------------

### Configure Compression Workers

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Set the number of worker threads used for compression. This should be called before creation starts.

```python
with Creator("output.zim").config_nbworkers(4) as c:
    c.add_item(item)
```

--------------------------------

### Initialize Archive Object

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/archive.md

Opens a ZIM file at the specified path for reading. Ensure the path points to a valid ZIM file.

```python
from libzim.reader import Archive
import pathlib

archive = Archive(pathlib.Path("example.zim"))
```

--------------------------------

### Get Word Count for an Item

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Retrieves the word count of an item, which is used for relevance ranking. Returns an integer.
```python
def get_wordcount(self) -> int:
    ...  # signature only; implemented by IndexData subclasses
```

--------------------------------

### Get Entry Item

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/archive.md

Retrieves the content Item for a given ZIM archive entry, automatically resolving any redirects.

```python
entry = archive.get_entry_by_path("Redirect_Page")
item = entry.get_item()  # Automatically follows redirect
print(bytes(item.content).decode("UTF-8"))
```

--------------------------------

### Print Version Information to Console

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/version.md

Prints version details for libzim and its dependencies directly to standard output. This is useful for quick checks.

```python
from libzim.version import print_versions
import sys

print_versions()  # Prints to stdout
```

--------------------------------

### SuggestionSearch.getResults

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/suggestion.md

Fetches a specific batch of suggestion results, starting from a given offset and returning a specified number of items.

```APIDOC
## SuggestionSearch.getResults

### Description
Get a batch of suggestions starting from a given offset.

### Method
```python
def getResults(self, start: int, count: int) -> SuggestionResultSet
```

### Parameters
- **start** (int) - Required - Starting offset (0-based index)
- **count** (int) - Required - Number of suggestions to retrieve

### Returns
- **SuggestionResultSet** - Iterable of entry paths

### Request Example
```python
# `searcher` is a SuggestionSearcher and `archive` the Archive it was built from
suggestions = searcher.suggest("Albert")
total = suggestions.getEstimatedMatches()

# Fetch first 10 suggestions
batch1 = suggestions.getResults(0, 10)
paths1 = list(batch1)

# Fetch next 10 suggestions
batch2 = suggestions.getResults(10, 10)
paths2 = list(batch2)

# Fetch all suggestions
all_paths = list(suggestions.getResults(0, total))
for path in all_paths:
    entry = archive.get_entry_by_path(path)
    print(f"- {entry.title}")
```
```

--------------------------------

### Implementing a Custom ContentProvider

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Shows how to create a custom content provider by inheriting from ContentProvider and implementing the get_size and gen_blob methods. This allows for streaming large or dynamically generated content. The size reported by get_size should match the total number of bytes yielded by gen_blob.

```python
from libzim.writer import ContentProvider, Blob

class CustomProvider(ContentProvider):
    def __init__(self):
        super().__init__()
        self.chunks = [b"First chunk", b"Second chunk"]

    def get_size(self) -> int:
        # Total size in bytes, computed from the chunks yielded below
        return sum(len(chunk) for chunk in self.chunks)

    def gen_blob(self):
        for chunk in self.chunks:
            yield Blob(chunk)
```

--------------------------------

### Complete Module Import Pattern

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/modules.md

Demonstrates importing the entire libzim library and accessing its modules and classes via the top-level namespace.

```python
import libzim
from libzim import reader, writer, search, suggestion, version

# Access via module
archive = libzim.reader.Archive("file.zim")
```

--------------------------------

### Initialize Creator for writing to a ZIM file

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Manually initialize a `Creator` instance for writing to a specified ZIM file path. Ensure the directory exists and has write permissions.

```python
import pathlib
from libzim.writer import Creator

zim_path = pathlib.Path("archive.zim")
creator = Creator(zim_path)
with creator:
    # Add content here
    pass
```

--------------------------------

### Configure Creator with Zstd Compression

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/README.md

Configure the creator with zstd compression, cluster size, indexing, worker count, verbosity, and main path.

```python
from libzim.writer import Creator

creator = Creator("output.zim") \
    .config_compression("zstd") \
    .config_clustersize(8192) \
    .config_indexing(True, "eng") \
    .config_nbworkers(4) \
    .config_verbose(True) \
    .set_mainpath("Main")
```

--------------------------------

### Get Geographic Position of an Item

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Retrieves the geographic position (latitude, longitude) of an item if applicable. Returns a tuple of floats or None.

```python
def get_geoposition(self) -> tuple[float, float] | None:
    ...  # signature only; implemented by IndexData subclasses
```

--------------------------------

### Configure Creator for Zstandard compression

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Set the compression algorithm to Zstandard using `config_compression(Compression.zstd)` or `config_compression("zstd")`. This configuration must be done before adding items.

```python
from libzim.writer import Creator, Compression

# Using enum
with Creator("output.zim").config_compression(Compression.zstd) as c:
    c.add_item(item)

# Using string
with Creator("output.zim").config_compression("zstd") as c:
    c.add_item(item)
```

--------------------------------

### Read a ZIM File with python-libzim

Source: https://github.com/openzim/python-libzim/blob/main/README.md

Demonstrates how to open a ZIM file, access its main entry, retrieve specific entries by path, and perform full-text searches and suggestion lookups.

```python
from libzim.reader import Archive
from libzim.search import Query, Searcher
from libzim.suggestion import SuggestionSearcher

zim = Archive("test.zim")
print(f"Main entry is at {zim.main_entry.get_item().path}")
entry = zim.get_entry_by_path("home/fr")
print(f"Entry {entry.title} at {entry.path} is {entry.get_item().size}b.")
print(bytes(entry.get_item().content).decode("UTF-8"))

# searching using full-text index
search_string = "Welcome"
query = Query().set_query(search_string)
searcher = Searcher(zim)
search = searcher.search(query)
search_count = search.getEstimatedMatches()
print(f"there are {search_count} matches for {search_string}")
print(list(search.getResults(0, search_count)))

# accessing suggestions
search_string = "kiwix"
suggestion_searcher = SuggestionSearcher(zim)
suggestion = suggestion_searcher.suggest(search_string)
suggestion_count = suggestion.getEstimatedMatches()
print(f"there are {suggestion_count} matches for {search_string}")
print(list(suggestion.getResults(0, suggestion_count)))
```

--------------------------------

### libzim.suggestion Module

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/modules.md

Enables autocomplete and title suggestions for ZIM archives. It allows users to get suggestions based on a given prefix.

```APIDOC
## Module: libzim.suggestion

**Purpose**: Autocomplete and title suggestions

**Key Classes**

| Class | Purpose |
|-------|---------|
| `SuggestionSearcher` | Get prefix-based suggestions |
| `SuggestionSearch` | Suggestion results container |
| `SuggestionResultSet` | Iterable of suggestion paths |

**Requirements**
- Archive must have `has_title_index == True`
- Created with title index support

**Common Operations**
```python
from libzim.reader import Archive
from libzim.suggestion import SuggestionSearcher

archive = Archive("archive.zim")
if not archive.has_title_index:
    print("Archive has no title index")
else:
    searcher = SuggestionSearcher(archive)
    suggestions = searcher.suggest("Albert")
    count = suggestions.getEstimatedMatches()
    print(f"Found {count} suggestions")

    # Get all suggestions
    for path in suggestions.getResults(0, count):
        entry = archive.get_entry_by_path(path)
        print(f"- {entry.title}")
```
```

--------------------------------

### Creator Class Initialization

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Initializes a `Creator` object to begin writing a ZIM file. The `filename` parameter specifies the output path, and the directory must exist with write permissions. Using this class as a context manager is recommended.

```APIDOC
## Creator Constructor

### Description
Initialize a Creator for writing to a ZIM file.

### Method
__init__

### Parameters
- **filename** (`pathlib.Path`) - Required - Path where the ZIM file will be created. Directory must exist.

### Raises
- **IOError**: If the directory doesn't exist or lacks write permissions.

### Example
```python
import pathlib
from libzim.writer import Creator

zim_path = pathlib.Path("archive.zim")
creator = Creator(zim_path)
with creator:
    # Add content here
    pass
```
```

--------------------------------

### Get Estimated Matches from SuggestionSearch

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/suggestion.md

Obtain the estimated count of suggestions that match the search query. This method is called on a SuggestionSearch object.

```python
suggestions = searcher.suggest("Einstein")
count = suggestions.getEstimatedMatches()
print(f"Found {count} articles starting with 'Einstein'")
```

--------------------------------

### Create a ZIM File

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/README.md

Illustrates how to create a new ZIM file using the Creator class. It defines a custom Item class for articles and configures indexing before adding items and metadata.

```python
from libzim.writer import Creator, Item, StringProvider, Hint
import pathlib

class Article(Item):
    def __init__(self, path, title, content):
        super().__init__()
        self.path, self.title, self.content = path, title, content

    def get_path(self): return self.path
    def get_title(self): return self.title
    def get_mimetype(self): return "text/html"
    def get_contentprovider(self): return StringProvider(self.content)
    def get_hints(self): return {Hint.FRONT_ARTICLE: True}

with Creator(pathlib.Path("output.zim")) \
        .config_indexing(True, "eng") as creator:
    # The HTML markup here is illustrative
    creator.add_item(Article("Main", "Home", "<h1>Welcome</h1>"))
    creator.add_metadata("Title", "My Archive")
```

--------------------------------

### Iterating Search Results

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md

Demonstrates iterating over a SearchResultSet to get individual paths. The result set can be iterated directly or converted to a list.

```python
from collections.abc import Iterator
from libzim.search import SearchResultSet

result_set: SearchResultSet  # type annotation only; obtain a real one from a search's getResults()
for path in result_set:  # Uses __iter__
    print(path)  # path: str

# Or convert to list
paths: list[str] = list(result_set)
```

--------------------------------

### Hint Enum Definition

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md

Defines the hints an item can return from `get_hints()` to tell the ZIM creator how to handle its entry. The block below is a type-stub view of the enum; at runtime it is imported with `from libzim.writer import Hint`.

```python
import enum
from typing import Self

class Hint(enum.Enum):
    COMPRESS: Self
    FRONT_ARTICLE: Self
```

--------------------------------

### Write a ZIM File with python-libzim

Source: https://github.com/openzim/python-libzim/blob/main/README.md

Provides a custom Item class and prepares two items for a ZIM file, one whose content comes from a string and another whose content comes from a file.

```python
import base64
import pathlib

from libzim.writer import Creator, Item, StringProvider, FileProvider, Hint

class MyItem(Item):
    def __init__(self, title, path, content="", fpath=None):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content
        self.fpath = fpath

    def get_path(self):
        return self.path

    def get_title(self):
        return self.title

    def get_mimetype(self):
        return "text/html"

    def get_contentprovider(self):
        # Prefer the file on disk when one was given, else the in-memory string
        if self.fpath is not None:
            return FileProvider(self.fpath)
        return StringProvider(self.content)

    def get_hints(self):
        return {Hint.FRONT_ARTICLE: True}

# Minimal HTML pages (markup is illustrative)
content = """<html><head><title>Web Page Title</title></head>
<body>
<h1>Welcome to this ZIM</h1>
<p>Kiwix</p>
</body>
</html>"""

pathlib.Path("home-fr.html").write_text(
    """<html><head><title>Bonjour</title></head>
<body>
<h1>this is home-fr</h1>
</body>
</html>"""
)

item = MyItem("Hello Kiwix", "home", content)
item2 = MyItem("Bonjour Kiwix", "home/fr", None, "home-fr.html")
```

--------------------------------

### Get Suggestions with SuggestionSearcher

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/suggestion.md

Retrieves suggestions based on a query string using prefix-based autocomplete. The searcher must be initialized with a valid archive.

```python
from libzim.suggestion import SuggestionSearcher

searcher = SuggestionSearcher(archive)
suggestions = searcher.suggest("Albert")

# Get number of matches
count = suggestions.getEstimatedMatches()
print(f"Found {count} suggestions")

# Get suggestions
suggestion_paths = list(suggestions.getResults(0, count))
for path in suggestion_paths:
    entry = archive.get_entry_by_path(path)
    print(f"- {entry.title}")
```

--------------------------------

### Manage Archive Caches

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/README.md

Get and set the maximum cluster cache size, and check current usage. Also configure the entry cache.

```python
from libzim.reader import (
    get_cluster_cache_max_size,
    set_cluster_cache_max_size,
    get_cluster_cache_current_size
)

# `archive` is an already opened libzim.reader.Archive
archive.dirent_cache_max_size = 256
print(archive.dirent_cache_current_size)
```

--------------------------------

### Implementing IndexData with Optional Geoposition

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/types.md

Example of implementing the IndexData interface where `get_geoposition` returns either a tuple of floats or None, with None indicating the absence of geoposition data.

```python
from libzim.writer import IndexData

class MyIndexData(IndexData):
    def get_geoposition(self) -> tuple[float, float] | None:
        # Return None if no position available
        return None
```

--------------------------------

### Get Current Cluster Cache Size in Python

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/archive.md

Retrieves the current size of the cluster cache in bytes. This function requires importing `get_cluster_cache_current_size` from `libzim.reader`.

```python
from libzim.reader import get_cluster_cache_current_size

current = get_cluster_cache_current_size()
print(f"Current cache usage: {current} bytes")
```

--------------------------------

### Get Archive UUID

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/archive.md

Retrieves the unique identifier (UUID) of the ZIM file and prints its hexadecimal representation. This is useful for identifying specific ZIM archives.

```python
from libzim.reader import Archive

archive = Archive("test.zim")
print(archive.uuid.hex)  # 32-character hex string
```

--------------------------------

### Linear usage of Creator without context manager

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Shows how to use the `Creator` class linearly by manually calling `__enter__` and `__exit__` methods. This approach requires explicit management of the file lifecycle.

```python
creator = Creator("output.zim")
creator.__enter__()
creator.add_item(item)
creator.__exit__(None, None, None)
```

--------------------------------

### Custom Item with Index Data

Source: https://github.com/openzim/python-libzim/blob/main/_autodocs/api-reference/creator.md

Example of implementing the optional get_indexdata method in a custom Item subclass to provide custom indexing data for full-text search.

```python
from libzim.writer import Item, IndexData

class SearchableItem(Item):
    def get_indexdata(self):
        # Index data is defined inline and returned as a fresh instance
        class CustomIndexData(IndexData):
            def has_indexdata(self): return True
            def get_title(self): return "Article Title"
            def get_content(self): return "Searchable content text"
            def get_keywords(self): return "keyword1, keyword2"
            def get_wordcount(self): return 50

        return CustomIndexData()
```
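
The fixed strings and the word count of 50 returned above are placeholders. As a rough sketch of how the values for `get_content()` and `get_wordcount()` might instead be derived from an item's HTML, the following uses only the Python standard library. `html_to_text` and `count_words` are hypothetical helper names, not part of libzim, and the word-splitting rule is an assumption rather than a description of how libzim itself counts words.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Hypothetical helper: return the text of `html` with whitespace collapsed.

    Note: text inside <script> or <style> elements is not filtered out.
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def count_words(text: str) -> int:
    """Hypothetical helper: count runs of word characters (an assumed rule)."""
    return len(re.findall(r"\w+", text))


page = "<h1>Welcome to this ZIM</h1><p>Kiwix</p>"
text = html_to_text(page)
print(text)               # Welcome to this ZIM Kiwix
print(count_words(text))  # 5
```

An `IndexData` subclass could then return `html_to_text(...)` from `get_content()` and `count_words(...)` from `get_wordcount()` instead of hard-coded values.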