### Minimal Configuration Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/configuration.md

This example shows the basic usage of the `parse()` function with only the required URL and stream parameters.

```python
import podcastparser

# Minimal configuration: only the required URL and stream arguments
with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.rss', f)
```

--------------------------------

### Install Podcast Parser

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/quick-reference.md

Installs the podcastparser library using pip. Requires Python 3.x.

```bash
pip install podcastparser
```

--------------------------------

### Full Configuration Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/configuration.md

This example parses a feed from bytes using `io.BytesIO`, sets `max_episodes`, and prints the number of parsed episodes.

```python
import podcastparser
import io

# Read feed from bytes
feed_bytes = b'...'
feed = podcastparser.parse(
    url='https://example.com/podcast.rss',
    stream=io.BytesIO(feed_bytes),
    max_episodes=100,
)

print(f"Parsed {len(feed['episodes'])} episodes")
```

--------------------------------

### Parse Podcast Feed with Requests Library

Source: https://github.com/gpodder/podcastparser/blob/master/doc/index.md

This example demonstrates parsing a podcast feed using the popular Requests library. It passes the raw response stream to the parser so the feed content does not have to be loaded into memory first. Make sure to install the Requests library (`pip install requests`).

```python
import podcastparser
import requests

url = 'https://example.net/podcast.atom'

with requests.get(url, stream=True) as response:
    response.raw.decode_content = True
    parsed = podcastparser.parse(url, response.raw)

# parsed is a dict
import pprint
pprint.pprint(parsed)
```

--------------------------------

### Parse Default Namespace Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Demonstrates parsing a default namespace declaration from XML attributes.

```python
from podcastparser import Namespace

# Default namespace
result = Namespace.parse_namespaces({'xmlns': 'example'})
# Returns: {'': 'example'}
```

--------------------------------

### Lookup Namespace with Parent Context Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Illustrates how a child namespace context can look up a prefix defined in its parent context.

```python
# With parent namespace
parent = Namespace({'xmlns:m': 'http://search.yahoo.com/mrss/'}, None)
child = Namespace({}, parent)
child.lookup('m')
# Returns: 'http://search.yahoo.com/mrss/'
```

--------------------------------

### Map Tag with iTunes Namespace Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Demonstrates mapping a tag name that belongs to the iTunes namespace to its standardized podcastparser format.

```python
from podcastparser import Namespace

# Prefixed element with iTunes namespace
ns = Namespace({
    'xmlns:it': 'http://www.itunes.com/dtds/podcast-1.0.dtd'
}, None)
ns.map('it:duration')
# Returns: 'itunes:duration'
```

--------------------------------

### Podcast Feed Return Structure Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/types.md

Illustrates a common structure returned by the parse() function, including episode details and enclosures.

```python
feed = {
    'title': 'Example Podcast',
    'description': 'A great podcast about examples.',
    'link': 'https://example.com',
    'cover_url': 'https://example.com/cover.jpg',
    'language': 'en',
    'type': 'episodic',
    'explicit': False,
    'episodes': [
        {
            'title': 'Episode 1',
            'description': 'First episode',
            'guid': 'ep-001',
            'published': 1609459200,  # 2021-01-01 00:00:00 UTC
            'enclosures': [
                {
                    'url': 'https://example.com/episode-1.mp3',
                    'file_size': 52428800,
                    'mime_type': 'audio/mpeg',
                }
            ],
            'total_time': 3600,
            'chapters': [
                {
                    'start': 0,
                    'title': 'Introduction',
                }
            ],
        },
    ],
}
```

--------------------------------

### Parsing Podlove Simple Chapters

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/supported-formats.md

This XML snippet demonstrates the structure of Podlove Simple Chapters, including attributes like start time, title, href, and image. Chapters missing 'start' or 'title' are skipped.

```xml
```

--------------------------------

### Lookup Namespace Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Demonstrates looking up a defined namespace prefix. The lookup checks the current context first and then recursively checks parent contexts.

```python
from podcastparser import Namespace

ns = Namespace({'xmlns:it': 'http://www.itunes.com/dtds/podcast-1.0.dtd'}, None)
ns.lookup('it')
# Returns: 'http://www.itunes.com/dtds/podcast-1.0.dtd'
```

--------------------------------

### Inspect Error Details with getLocator()

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-errors.md

Example demonstrating how to inspect detailed error information, including message, line/column number, and underlying exception.

```python
import podcastparser

try:
    with open('feed.xml', 'rb') as f:
        feed = podcastparser.parse('http://example.com/feed.xml', f)
except podcastparser.FeedParseError as e:
    print(f"Error message: {e.getMessage()}")

    locator = e.getLocator()
    if locator:
        print(f"Line {locator.getLineNumber()}, column {locator.getColumnNumber()}")

    if e.getException():
        print(f"Caused by: {e.getException()}")
```

--------------------------------

### Limited Episodes Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/configuration.md

This example demonstrates how to limit the number of episodes parsed by setting the `max_episodes` parameter. Only the most recent episodes are returned.

```python
import podcastparser

# Limit to 10 most recent episodes
with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse(
        'https://example.com/feed.rss',
        f,
        max_episodes=10
    )
```

--------------------------------

### Parse Multiple Namespaces Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Illustrates parsing multiple namespace declarations, including a default and several prefixed ones, from XML attributes.

```python
# Multiple namespaces
result = Namespace.parse_namespaces({
    'xmlns': 'foo',
    'xmlns:a': 'bar',
    'xmlns:b': 'bla'
})
# Returns: {'': 'foo', 'a': 'bar', 'b': 'bla'}
```

--------------------------------

### Lookup Undefined Namespace Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Shows the result of looking up a namespace prefix that is not defined in the current or any parent context.

```python
ns.lookup('undefined')
# Returns: None
```

--------------------------------

### Parse Prefixed Namespace Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Shows how to parse a single prefixed namespace declaration from XML attributes.

```python
# Prefixed namespace
result = Namespace.parse_namespaces({'xmlns:foo': 'http://example.com/bar'})
# Returns: {'foo': 'http://example.com/bar'}
```

--------------------------------

### Podcast Person Element Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/supported-formats.md

Shows the structure of a podcast:person element used for parsing person information.

```xml
<podcast:person>Jane Doe</podcast:person>
```

--------------------------------

### EpisodeGuid Handler

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Parses GUIDs and honors the isPermaLink attribute. It treats the GUID as a URL if isPermaLink is true, and as a literal string otherwise.

```python
class EpisodeGuid(EpisodeAttr):
    """Parses GUID and honors isPermaLink attribute"""
```

```xml
<guid isPermaLink="true">http://example.com/ep1</guid>
<guid isPermaLink="false">custom-id-123</guid>
```

--------------------------------

### Stream Feed from Network (Python)

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Shows how to download a podcast feed from a URL and parse it using a byte stream. Includes error handling for network and parsing issues. This is a conceptual example for network streaming.

```python
import podcastparser
import io
import urllib.error
import urllib.request

def parse_feed_from_url(url):
    """Download and parse a podcast feed from a URL"""
    try:
        # Download feed content
        with urllib.request.urlopen(url) as response:
            feed_content = response.read()

        # Parse from bytes stream
        stream = io.BytesIO(feed_content)
        feed = podcastparser.parse(url, stream)
        return feed
    except urllib.error.URLError as e:
        print(f"Network error: {e}")
        return None
    except podcastparser.FeedParseError as e:
        print(f"Parse error: {e}")
        return None

# Usage
feed = parse_feed_from_url('https://example.com/podcast.rss')
if feed:
    print(f"Parsed: {feed['title']}")
    print(f"Episodes: {len(feed['episodes'])}")
```

--------------------------------

### Extract and Display Chapter Information

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Shows how to extract and display chapter information, including titles, start times, and optional URLs or images, for podcast episodes.

```python
import podcastparser

with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.xml', f)

for episode in feed['episodes']:
    if 'chapters' not in episode:
        continue

    print(f"{episode['title']}")
    print("Chapters:")

    for chapter in episode['chapters']:
        # Format start time
        total_sec = chapter['start']
        hours = total_sec // 3600
        minutes = (total_sec % 3600) // 60
        seconds = total_sec % 60

        if hours > 0:
            time_str = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
        else:
            time_str = f"{minutes:02d}:{seconds:02d}"

        print(f"  [{time_str}] {chapter['title']}")

        if 'href' in chapter:
            print(f"    URL: {chapter['href']}")

        if 'image' in chapter:
            print(f"    Image: {chapter['image']}")

    print()
```

--------------------------------

### Map Tag with Unknown Namespace URI Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Illustrates how a tag name with an unknown namespace URI is mapped, resulting in a prefix of '!'.

```python
# Unknown namespace URI gets prefixed with !
child.map('x:y')
# Returns: '!x:y'
```

--------------------------------

### Podcast Parser Mapping Dictionary Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

The MAPPING dictionary maps XML element paths to handler instances. Paths use namespace-mapped names.

```python
MAPPING = {
    'rss/channel/title': PodcastAttr('title', squash_whitespace),
    'rss/channel/item': EpisodeItem(),
    'rss/channel/item/title': EpisodeAttr('title', squash_whitespace),
    'rss/channel/item/enclosure': Enclosure('length'),
    'atom:feed/atom:entry/psc:chapters/psc:chapter': PodloveChapter(),
    # ... 70+ more entries
}
```

--------------------------------

### Map Tag with Inherited Namespace Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Shows how a child namespace context maps a tag name using a namespace defined in its parent context.

```python
# Child inherits parent namespace
parent = Namespace({
    'xmlns:m': 'http://search.yahoo.com/mrss/',
    'xmlns:x': 'http://example.com/'
}, None)
child = Namespace({}, parent)
child.map('m:content')
# Returns: 'media:content'
```

--------------------------------

### Get First Episode

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/quick-reference.md

Safely retrieve the first episode from a feed, returning None if no episodes exist.

```python
first = feed['episodes'][0] if feed['episodes'] else None
```

--------------------------------

### EpisodeItem Handler

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Handles the opening and closing of episode elements. It calls add_episode() on start and validate_episode() on end.

```python
class EpisodeItem(Target):
    """Creates a new episode and validates it after parsing"""
```

--------------------------------

### Parse Dates and Durations with Python

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Demonstrates how to work with publication dates and durations from podcast episodes. It converts Unix timestamps to human-readable dates and formats durations in hours, minutes, and seconds. It also parses arbitrary date/time strings and durations using the `parse_pubdate()` and `parse_time()` helper functions.

```python
import podcastparser
from datetime import datetime, timezone

with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.xml', f)

for episode in feed['episodes']:
    # Convert Unix timestamp to human-readable format (UTC)
    pub_time = datetime.fromtimestamp(episode['published'], tz=timezone.utc)
    date_str = pub_time.strftime('%Y-%m-%d %H:%M:%S')

    print(f"{episode['title']}")
    print(f"  Published: {date_str} UTC")

    # Format duration
    if 'total_time' in episode:
        total_sec = episode['total_time']
        hours = total_sec // 3600
        minutes = (total_sec % 3600) // 60
        seconds = total_sec % 60

        if hours > 0:
            duration = f"{hours}h {minutes}m {seconds}s"
        else:
            duration = f"{minutes}m {seconds}s"

        print(f"  Duration: {duration}")

    print()

# Parsing arbitrary date/time strings
print("Date parsing examples:")
print(podcastparser.parse_pubdate('2023-12-25T00:00:00Z'))
print(podcastparser.parse_pubdate('Mon, 25 Dec 2023 12:30:45 -0500'))

print("\nTime parsing examples:")
print(f"'30' seconds = {podcastparser.parse_time('30')} seconds")
print(f"'01:30' = {podcastparser.parse_time('01:30')} seconds")
print(f"'01:30:45' = {podcastparser.parse_time('01:30:45')} seconds")
```

--------------------------------

### PodcastAttrList Handler Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Parses comma-separated text content into a list.
Use this handler for podcast fields that contain multiple values separated by commas.

```python
'rss/channel/itunes:keywords': PodcastAttrList('itunes_keywords', squash_whitespace),
```

--------------------------------

### Map Tag with Undefined Prefix Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Demonstrates the behavior when mapping a tag name with a prefix that is not defined in any namespace context. The tag is returned unchanged, and a warning is logged.

```python
# Undefined prefix returns unchanged
child.map('atom:link')
# Returns: 'atom:link' (with warning logged)
```

--------------------------------

### Minimal Podcast Feed Parsing Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/README.md

This snippet demonstrates the basic usage of the podcastparser library: it parses a podcast feed from a local file, passing the feed's URL alongside it, and prints the title and the number of episodes. Ensure the feed.xml file is accessible and contains a valid podcast feed.

```python
import podcastparser

with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/podcast.xml', f)

print(f"Podcast: {feed['title']}")
print(f"Episodes: {len(feed['episodes'])}")
```

--------------------------------

### Parsing Podcast Feed from HTTP Stream

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-parse.md

Shows how to parse a podcast feed directly from an HTTP stream. This example uses `io.BytesIO` to simulate a stream, but in a real application, you would fetch the content using libraries like `urllib.request` or `requests`.

```python
import podcastparser
import io

# In real code, you'd use urllib.request or requests
# feed_content = urllib.request.urlopen(url).read()
feed_content = b'...'
stream = io.BytesIO(feed_content)
feed = podcastparser.parse('https://example.com/feed.rss', stream)
```

--------------------------------

### PodcastAttrFromHref Handler Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Extracts an attribute value (defaults to 'href') and stores it as an absolute URL. Useful for image sources or links specified in attributes.

```python
'rss/channel/itunes:image': PodcastAttrFromHref('cover_url'),
```

--------------------------------

### Namespace.NAMESPACES Dictionary Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

This static dictionary maps XML namespace URIs to their corresponding standardized prefixes. It is used internally for parsing feeds with various namespace declarations.

```python
{
    'http://www.itunes.com/dtds/podcast-1.0.dtd': 'itunes',
    'http://www.w3.org/2005/Atom': 'atom',
    'http://search.yahoo.com/mrss/': 'media',
    'http://podlove.org/simple-chapters': 'psc',
    'https://github.com/Podcastindex-org/podcast-namespace/...': 'podcast',
    # ... more entries
}
```

--------------------------------

### PodcastAttrRelativeLink Handler Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Resolves relative URLs against the feed's base URL before storing them. Use this when links within the feed might be relative paths.

```python
'rss/channel/link': PodcastAttrRelativeLink('link'),
```

--------------------------------

### Podcastparser Return Structure Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/README.md

This JSON-like structure illustrates the data returned by the `podcastparser.parse()` function. It includes top-level podcast information and a list of episodes, each with details like title, publication date, and media enclosures. Note that some fields like 'total_time', 'chapters', and 'persons' are optional.

```json
{
    "title": "string",          # Podcast title
    "description": "string",    # Podcast description
    "link": "string",           # Podcast website
    "cover_url": "string",      # Cover art URL
    "episodes": [
        {
            "title": "string",
            "description": "string",
            "published": int,   # Unix timestamp
            "guid": "string",   # Unique ID
            "enclosures": [
                {
                    "url": "string",
                    "file_size": int,
                    "mime_type": "string",
                },
            ],
            "total_time": int,  # Duration (optional)
            "chapters": [...],  # Chapters (optional)
            "persons": [...],   # Contributors (optional)
            # ... more optional fields
        },
    ],
    # ... more optional fields
}
```

--------------------------------

### Podcast Feed Parsing with Error Handling

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/README.md

This example shows how to handle errors during podcast feed parsing using a try-except block. It catches the specific `FeedParseError` and prints an informative message, which matters for applications that consume external feeds they do not control.

```python
import podcastparser

try:
    feed = podcastparser.parse(url, stream)
except podcastparser.FeedParseError as e:
    print(f"Parse error: {e}")
```

--------------------------------

### PodcastAttr Handler Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Stores element text content as a podcast attribute. Use this handler to extract simple string values for podcast fields.

```python
'rss/channel/title': PodcastAttr('title', squash_whitespace),
```

--------------------------------

### Chapter Dictionary Structure

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/types.md

Defines the structure for representing chapter markers within a podcast episode. Includes start time, title, and optional href and image URLs.

```python
{
    'start': int,
    'title': str,
    'href': str,   # optional
    'image': str,  # optional
}
```

--------------------------------

### Find Episode by GUID

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/quick-reference.md

Searches for an episode within a feed using its unique 'guid'. Returns None if not found.

```python
target_guid = 'ep-001'
episode = next((e for e in feed['episodes'] if e['guid'] == target_guid), None)
```

--------------------------------

### Import Podcastparser Module

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/module-overview.md

Demonstrates how to import the entire podcastparser module for use in your project.

```python
# Import the entire module
import podcastparser
```

--------------------------------

### Import the Library

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/quick-reference.md

Import the podcastparser library to begin using its functionality.

```python
import podcastparser
```

--------------------------------

### Initialize iTunes Owner Dictionary

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Sets up the `itunes_owner` dictionary within the podcast's metadata. This is used for the owner's name and email address.

```python
def add_itunes_owner(self):
    self.data['itunes_owner'] = {}
```

--------------------------------

### Podlove Single Chapter Parsing

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Parses a single chapter marker from Podlove feeds. Extracts the required 'start' and 'title' attributes and the optional 'href' and 'image' attributes. Chapters missing 'start' or 'title' are skipped.

```python
class PodloveChapter(Target):
    """Parses a single chapter marker"""

    # Extracts chapter attributes (all optional except `start` and `title`):
    # - `start` → parsed as seconds via `parse_time()`
    # - `title` → chapter name
    # - `href` → chapter URL (optional)
    # - `image` → chapter image URL (optional)
    #
    # Skips chapters missing either `start` or `title`.
```

--------------------------------

### Initialize iTunes Categories List

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Initializes the `itunes_categories` list in the podcast's metadata. This prepares the structure for adding category information.

```python
def add_itunes_categories(self):
    self.data['itunes_categories'] = []
```

--------------------------------

### Get Episode Audio URL

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/quick-reference.md

Extracts the audio URL from the first enclosure of an episode, if available.

```python
if episode['enclosures']:
    audio_url = episode['enclosures'][0]['url']
```

--------------------------------

### Initialize Episode Persons List

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Initializes the `persons` list for the current episode, specifically for data within the Podcast Index namespace.

```python
def add_episode_persons(self):
    self.episodes[-1]['persons'] = []
```

--------------------------------

### add_itunes_owner() -> None

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Initializes the iTunes owner dictionary within the podcast's metadata.

```APIDOC
## add_itunes_owner() -> None

### Description
Initializes the iTunes owner dictionary.
```

--------------------------------

### Get Podcast Attribute

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Retrieves a podcast-level attribute from the metadata dictionary. Provides a default value if the attribute is not found.

```python
def get_podcast_attr(self, key, default=None):
    return self.data.get(key, default)
```

--------------------------------

### Standard Namespace Mappings

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Provides a static dictionary mapping various XML namespace URIs to their standardized prefixes used by podcastparser. This includes namespaces for iTunes, Atom, Media RSS, Podlove Simple Chapters, Content Module, and Podcast Index.

```python
NAMESPACES = {
    'http://www.itunes.com/dtds/podcast-1.0.dtd': 'itunes',
    'http://www.itunes.com/DTDs/Podcast-1.0.dtd': 'itunes',
    'http://www.w3.org/2005/Atom': 'atom',
    'http://www.w3.org/2005/Atom/': 'atom',
    'http://search.yahoo.com/mrss/': 'media',
    'http://search.yahoo.com/mrss': 'media',
    'http://podlove.org/simple-chapters': 'psc',
    'http://podlove.org/simple-chapters/': 'psc',
    'http://purl.org/rss/1.0/modules/content/': 'content',
    'https://github.com/Podcastindex-org/podcast-namespace/blob/main/docs/1.0.md': 'podcast',
    'https://github.com/podcastindex-org/podcast-namespace/blob/main/docs/1.0.md': 'podcast',
}
```

--------------------------------

### Catch and Handle Parse Errors

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-errors.md

A basic example of catching FeedParseError during feed parsing and printing a user-friendly error message.

```python
import podcastparser

try:
    with open('feed.xml', 'rb') as f:
        feed = podcastparser.parse('http://example.com/feed.xml', f)
except podcastparser.FeedParseError as e:
    print(f"Failed to parse feed: {e}")
```

--------------------------------

### Podcastparser Type Hints

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

Illustrates the expected type hints for the parse function and the conceptual structure of its return types (Feed, Episode, Enclosure). These are not enforced by the library but serve as documentation.

```python
import podcastparser
from typing import Dict, List, Any, Optional

# Parse function signature
def parse(url: str, stream: Any, max_episodes: int = 0) -> Dict[str, Any]: ...

# Return type structure (conceptual, not enforced)
Feed = Dict[str, Any]       # Contains 'episodes', 'title', etc.
Episode = Dict[str, Any]    # Contains 'title', 'guid', 'enclosures', etc.
Enclosure = Dict[str, Any]  # Contains 'url', 'file_size', 'mime_type'
```

--------------------------------

### get_episode_attr(key: str, default: any = None) -> any

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Gets an attribute from the current episode. Returns a default value if the attribute is not set.

```APIDOC
## get_episode_attr(key: str, default: any = None) -> any

### Description
Gets an attribute from the current episode. Returns `default` if not set.

### Parameters
#### Path Parameters
- **key** (str) - Required - The attribute name.
- **default** (any) - Optional - The default value to return if the key is not found.
```

--------------------------------

### Get Current Episode Attribute

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Retrieves an attribute from the current episode. Returns a specified default value if the attribute does not exist.

```python
def get_episode_attr(self, key, default=None):
    return self.episodes[-1].get(key, default)
```

--------------------------------

### Accessing Public API Functions and Classes

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

Demonstrates how to import and access the main functions, classes, and constants provided by the podcastparser module directly.

```python
import podcastparser

# All functions directly accessible
podcastparser.parse(...)
podcastparser.normalize_feed_url(...)
podcastparser.FeedParseError

# Classes directly accessible
handler = podcastparser.PodcastHandler(...)
ns = podcastparser.Namespace(...)

# Constants directly accessible
mapping = podcastparser.MAPPING
roots = podcastparser.VALID_ROOTS
namespaces = podcastparser.Namespace.NAMESPACES
```

--------------------------------

### Import Specific Podcastparser Functions

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/module-overview.md

Shows how to import specific names from the podcastparser module, allowing for more targeted usage.

```python
# Import specific functions
from podcastparser import parse, normalize_feed_url, FeedParseError
```

--------------------------------

### add_itunes_categories() -> None

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Initializes the iTunes categories list within the podcast's metadata.

```APIDOC
## add_itunes_categories() -> None

### Description
Initializes the iTunes categories list.
```

--------------------------------

### get_podcast_attr(key: str, default: any = None) -> any

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Gets a podcast-level attribute from the metadata dictionary. Returns a default value if the attribute is not found.

```APIDOC
## get_podcast_attr(key: str, default: any = None) -> any

### Description
Gets a podcast-level attribute from the metadata dictionary. Returns `default` if not set.

### Parameters
#### Path Parameters
- **key** (str) - Required - The attribute name.
- **default** (any) - Optional - The default value to return if the key is not found.
```

--------------------------------

### Safe Access for Optional Fields (Check Key Presence)

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/types.md

Demonstrates how to safely access optional fields like 'chapters' by checking for their presence in the episode dictionary.

```python
# Safe access pattern
if 'chapters' in episode:
    for chapter in episode['chapters']:
        print(chapter['title'])
```

--------------------------------

### validate_episode() -> None

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Validates and cleans up the current episode after parsing is complete. This includes handling chapters, descriptions, GUIDs, titles, and internal flags.

```APIDOC
## validate_episode() -> None

### Description
Validates and cleans up the current episode after parsing is complete. Performs checks for empty chapters, HTML descriptions, missing GUIDs and titles, and cleans up internal flags.
```

--------------------------------

### Importing All Module Contents

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

This pattern imports the entire podcastparser module, allowing access to its functions and classes using the module name as a prefix.

```python
import podcastparser

feed = podcastparser.parse(...)
```

--------------------------------

### Main Parse Function

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/MANIFEST.txt

Documentation for the main parse() function, which is the primary entry point for parsing podcast feeds.

```APIDOC
## parse()

### Description
This is the main function used to parse podcast feeds. It takes the feed URL and a file-like stream containing the feed content, and returns a structured representation of the podcast.

### Parameters
- **url** (str) - Required - The URL of the podcast feed.
- **stream** (file-like object) - Required - A binary stream containing the raw feed content.
- **max_episodes** (int) - Optional - The maximum number of episodes to return. Defaults to 0.

### Returns
- **dict** - A dictionary containing the parsed feed data, including an `episodes` list.

### Example
```python
from podcastparser import parse

# parse() needs both the feed URL and a stream, and returns a dict
with open('feed.xml', 'rb') as f:
    podcast_data = parse('http://example.com/podcast.rss', f)
print(podcast_data['title'])
```
```

--------------------------------

### Check Media Type (Python)

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Iterates through episode enclosures and checks the 'mime_type' to identify and print URLs for audio or video content. Uses startswith() for flexible matching of MIME types.

```python
for enclosure in episode['enclosures']:
    if enclosure['mime_type'].startswith('audio/'):
        print(f"Audio: {enclosure['url']}")
    elif enclosure['mime_type'].startswith('video/'):
        print(f"Video: {enclosure['url']}")
```

--------------------------------

### Basic Podcast Parsing from Local File

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-parse.md

Demonstrates how to parse a podcast feed from a local file using the `podcastparser.parse` function. It then prints the podcast title, the episode count, and each episode's title and publication timestamp.

```python
import podcastparser

with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('http://example.com/podcast.xml', f)

print(f"Podcast: {feed['title']}")
print(f"Episodes: {len(feed['episodes'])}")

for episode in feed['episodes']:
    print(f"  - {episode['title']} ({episode['published']})")
```

--------------------------------

### add_episode_persons() -> None

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Initializes the `persons` list for the current episode, specifically for the Podcast Index namespace.

```APIDOC
## add_episode_persons() -> None

### Description
Initializes the `persons` list for the current episode (Podcast Index namespace).
```

--------------------------------

### Initialize iTunes Owner Handler
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

This handler runs when the 'itunes:owner' element opens and initializes the 'itunes_owner' dictionary.

```python
class ItunesOwnerItem(Target):
    """Creates iTunes owner object"""
```

--------------------------------

### Validate Current Episode
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Performs validation and cleanup on the current episode after parsing. This includes handling descriptions, generating missing GUIDs and titles, and cleaning internal flags.

```python
def validate_episode(self):
    # Validates and cleans up the current episode
    ...
```

--------------------------------

### Find Podcast Episode by ID or Title
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Search for a specific episode within a parsed feed using its GUID or a substring of its title. Ensure the feed object is already parsed.
```python
import podcastparser


def find_episode_by_guid(feed, target_guid):
    """Find episode by GUID"""
    for episode in feed['episodes']:
        if episode['guid'] == target_guid:
            return episode
    return None


def find_episode_by_title(feed, title_substring):
    """Find episode by partial title match"""
    for episode in feed['episodes']:
        if title_substring.lower() in episode['title'].lower():
            return episode
    return None


with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.xml', f)

# Search by GUID
ep = find_episode_by_guid(feed, 'episode-123')
if ep:
    print(f"Found: {ep['title']}")

# Search by title substring
ep = find_episode_by_title(feed, 'Interview')
if ep:
    print(f"Found: {ep['title']}")
```

--------------------------------

### Access Episode Data
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/quick-reference.md

Iterate through the list of episodes to access individual episode details including title, description, publication timestamp, GUID, link, total duration, and enclosures.

```python
for episode in feed['episodes']:
    episode['title']        # str
    episode['description']  # str
    episode['published']    # int (Unix timestamp)
    episode['guid']         # str
    episode['link']         # str
    episode['total_time']   # int (seconds)
    episode['enclosures']   # list of enclosure dicts
```

--------------------------------

### Base Target Class Definition
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Defines the base class for all element handlers. It includes attributes for storing parsed data and methods for handling element start and end events.
```python
class Target(object):
    WANT_TEXT: bool = False

    def __init__(self, key=None, filter_func=lambda x: x.strip(), overwrite=True):
        self.key = key
        self.filter_func = filter_func
        self.overwrite = overwrite

    def start(self, handler, attrs):
        pass

    def end(self, handler, text):
        pass
```

--------------------------------

### Process HTML Content Safely
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Demonstrates how to safely handle HTML descriptions in podcast episodes by stripping tags and decoding entities. It also shows how to detect if a string is HTML and strip tags from arbitrary HTML.

```python
import podcastparser

with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.xml', f)

for episode in feed['episodes']:
    # Determine which description to use
    if 'description_html' in episode:
        # Strip HTML tags and decode entities
        text = podcastparser.remove_html_tags(episode['description_html'])
        print(f"{episode['title']} (from HTML)")
    elif episode['description']:
        # Use plain text description
        text = episode['description']
        print(f"{episode['title']} (plain text)")
    else:
        text = "(no description)"
        print(f"{episode['title']} (no description)")

    # Display excerpt
    excerpt = text[:200] if text else ''
    print(f"{excerpt}...")
    print()

# Manual HTML detection
sample = "<p>This is HTML</p>"
is_html = podcastparser.is_html(sample)
print(f"\nIs HTML: {is_html}")

# Strip tags from arbitrary HTML
html = '<p>This is <b>bold</b> text</p>'
plain = podcastparser.remove_html_tags(html)
print(f"Original: {html}")
print(f"Stripped: {plain}")
```

--------------------------------

### Initialize Namespace Context
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Creates a new namespace context. It parses namespace declarations from XML attributes and sets the parent context for hierarchical lookup.

```python
def __init__(self, attrs, parent=None):
    self.namespaces = self.parse_namespaces(attrs)
    self.parent = parent
```

--------------------------------

### Episode Dictionary Structure
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/module-overview.md

Represents the structure of an individual episode dictionary within the podcast feed. Includes details such as title, description, link, publication date, GUID, enclosures, and total duration.

```APIDOC
## Episode Dictionary

This dictionary represents a single episode within a podcast feed.

### Fields
- **title** (str) - The title of the episode.
- **description** (str) - A description of the episode.
- **link** (str) - A URL related to the episode.
- **published** (int) - The publication date as a Unix timestamp.
- **guid** (str) - A globally unique identifier for the episode.
- **enclosures** (list[Enclosure, ...]) - A list of media file enclosures for the episode.
- **total_time** (int) - The total duration of the episode in seconds.
- **...** (optional fields) - May include additional fields.
```

--------------------------------

### Iterate with Error Handling (Python)
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Shows how to iterate over a list of episodes while gracefully handling potential KeyError or TypeError exceptions that might arise from unexpected data structures.
```python
try:
    for episode in feed['episodes']:
        print(episode['title'])
except (KeyError, TypeError):
    print("Unexpected data structure")
```

--------------------------------

### Importing Specific Items from Module
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

This pattern imports only the necessary functions and classes from the podcastparser module, reducing namespace pollution and improving clarity.

```python
from podcastparser import parse, normalize_feed_url, FeedParseError

feed = parse(...)
url = normalize_feed_url(...)
try:
    ...
except FeedParseError:
    ...
```

--------------------------------

### PodcastHandler Constructor
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Initializes the handler for parsing a single feed. It sets up attributes for storing feed data, episode limits, and managing the XML parsing state.

```python
def __init__(self, url, max_episodes):
    self.url = url
    self.max_episodes = max_episodes
    self.base = url
    self.text = None
    self.episodes = []
    self.data = {
        'title': file_basename_no_extension(url),
        'episodes': self.episodes,
    }
    self.path_stack = []
    self.namespace = None
```

--------------------------------

### Add Media Enclosure to Episode
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Adds a media enclosure (like an audio or video file) to the current episode. Requires URL, file size, and MIME type.

```python
def add_enclosure(self, url, file_size, mime_type):
    self.episodes[-1]['enclosures'].append({
        'url': url,
        'file_size': file_size,
        'mime_type': mime_type,
    })
```

--------------------------------

### Extracting Enclosure URLs from Episodes
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-parse.md

Demonstrates how to iterate through parsed episodes and extract details about their enclosures, including the URL, file size, and MIME type. This is useful for accessing media files associated with podcast episodes.

```python
import podcastparser

with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('http://example.com/podcast.xml', f)

for episode in feed['episodes']:
    for enclosure in episode['enclosures']:
        print(f"Episode: {episode['title']}")
        print(f" URL: {enclosure['url']}")
        print(f" Size: {enclosure['file_size']} bytes")
        print(f" Type: {enclosure['mime_type']}")
```

--------------------------------

### Parse Podcast Feed with podcastparser
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/INDEX.md

Demonstrates how to parse a podcast feed from a file and access its title and episode information. Ensure the feed file is opened in binary read mode ('rb').

```python
import podcastparser

# Parse a feed
with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.xml', f)

# Access data
print(f"Podcast: {feed['title']}")
print(f"Episodes: {len(feed['episodes'])}")
for episode in feed['episodes']:
    print(f" {episode['title']} ({episode['published']})")
```

--------------------------------

### Generate Feed Statistics (Python)
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Calculates and prints statistics such as total episodes, average duration, and total file size from a parsed podcast feed. Requires the feed to have episodes and enclosures with size information.
```python
import podcastparser
import statistics
from datetime import datetime


def feed_statistics(feed):
    """Calculate statistics about a feed"""
    if not feed['episodes']:
        return None

    episodes = feed['episodes']
    durations = [ep['total_time'] for ep in episodes if 'total_time' in ep]
    sizes = [enc['file_size'] for ep in episodes
             for enc in ep['enclosures'] if enc['file_size'] > 0]

    stats = {
        'total_episodes': len(episodes),
        'episodes_with_duration': len(durations),
        'avg_duration': statistics.mean(durations) if durations else 0,
        'median_duration': statistics.median(durations) if durations else 0,
        'total_size_bytes': sum(sizes),
        'avg_file_size': statistics.mean(sizes) if sizes else 0,
    }
    return stats


with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.xml', f)

stats = feed_statistics(feed)
if stats:
    print(f"Feed Statistics: {feed['title']}")
    print(f" Total episodes: {stats['total_episodes']}")
    print(f" Episodes with duration: {stats['episodes_with_duration']}")
    print(f" Average duration: {stats['avg_duration']//60} minutes")
    print(f" Median duration: {stats['median_duration']//60} minutes")
    print(f" Total size: {stats['total_size_bytes']//(1024**3)} GB")
    print(f" Average file size: {stats['avg_file_size']//(1024**2)} MB")
```

--------------------------------

### Constants
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

Publicly accessible constants for configuration and data mapping.

```APIDOC
## Constants

### Description
Publicly accessible constants.

### Accessible Constants
- `MAPPING`: Data mapping configuration.
- `VALID_ROOTS`: List of valid root elements for podcast feeds.
- `Namespace.NAMESPACES`: Predefined XML namespaces.
```

--------------------------------

### add_enclosure(url: str, file_size: int, mime_type: str) -> None
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Adds a media enclosure to the current episode.
Enclosures represent the media files associated with an episode.

```APIDOC
## add_enclosure(url: str, file_size: int, mime_type: str) -> None

### Description
Adds a media enclosure to the current episode.

### Parameters
- **url** (str) - Required - Normalized media file URL.
- **file_size** (int) - Required - File size in bytes (-1 if unknown).
- **mime_type** (str) - Required - MIME type (e.g., 'audio/mpeg').
```

--------------------------------

### Global Scope Functions
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/module-overview.md

Lists the available functions in the global scope for parsing podcast data, normalizing URLs, and processing text.

```python
parse(url, stream, max_episodes=0) -> dict
normalize_feed_url(url) -> str | None
parse_url(text) -> str | None
parse_time(value) -> int
parse_pubdate(text) -> int
parse_length(text) -> int
parse_type(text) -> str
file_basename_no_extension(filename) -> str
squash_whitespace(text) -> str
squash_whitespace_not_nl(text) -> str
is_html(text) -> bool
remove_html_tags(html) -> str | None
```

--------------------------------

### Module-Level Version and Author Information
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

These constants define the version, author, website, and license of the podcastparser library. They are typically used for informational purposes.

```python
__version__ = '0.6.11'  # Version string
__author__ = 'Thomas Perl '  # Author info
__website__ = 'http://gpodder.org/podcastparser/'  # Project URL
__license__ = 'ISC License'  # License type
```

--------------------------------

### Main Entry Point
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/module-overview.md

The primary function to parse a podcast feed. It takes the feed URL and a stream containing the feed content, plus an optional maximum number of episodes to retrieve, and returns a dictionary containing podcast metadata and a list of episodes.

```APIDOC
## parse(url, stream, max_episodes=0)

### Description
Parses a podcast feed read from the given stream, using the URL to identify the feed. It extracts structured metadata for the podcast and its episodes.

### Parameters
- **url** (string) - Required - The URL of the podcast feed.
- **stream** (file-like object) - Required - A file-like object containing the feed content.
- **max_episodes** (integer) - Optional - The maximum number of episodes to include in the output. Defaults to 0 (all episodes).

### Returns
A dictionary containing podcast metadata and a list of episodes.
```

--------------------------------

### Namespace Constructor
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Initializes a new Namespace object, parsing XML attributes to establish namespace contexts. It can optionally link to a parent namespace for hierarchical lookups.

```APIDOC
## Constructor

### `__init__(attrs: dict, parent: Namespace = None)`

Creates a new namespace context.

#### Parameters
- `attrs` (dict) - Required - XML element attributes (from SAX handler)
- `parent` (Namespace or None) - Optional - Parent namespace context (for hierarchical lookup)

#### Attributes
- `namespaces` (dict) - Parsed namespace declarations (prefix → URI mapping)
- `parent` (Namespace or None) - Parent context for inheritance lookup
```

--------------------------------

### Namespace
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

Represents XML namespaces used in podcast feeds.

```APIDOC
## Namespace(...)

### Description
Represents an XML namespace.

### Attributes
- `NAMESPACES`: A dictionary of predefined namespaces.
```

--------------------------------

### Basic Podcast Feed Parsing
Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Parses a podcast feed from a file and displays basic information such as title, description, number of episodes, and details of the latest episode.
Ensure the feed file is opened in binary read mode.

```python
import podcastparser

# Open and parse a feed
with open('podcast.rss', 'rb') as f:
    feed = podcastparser.parse('https://example.com/podcast.rss', f)

# Display podcast information
print(f"Podcast: {feed['title']}")
print(f"Description: {feed['description']}")
print(f"Episodes: {len(feed['episodes'])}")

# Display first episode
if feed['episodes']:
    ep = feed['episodes'][0]
    print(f"\nLatest episode: {ep['title']}")
    print(f"Published: {ep['published']}")
    print(f"Length: {ep['total_time']} seconds")
```
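
The example above prints `published` as a raw Unix timestamp and `total_time` as a count of seconds. The following is a minimal sketch of how both could be rendered in a more readable form. `format_published` and `format_duration` are hypothetical helpers, not part of podcastparser, and the sample episode dict is made up to mirror the shape of an entry in `feed['episodes']`.

```python
from datetime import datetime, timezone


def format_published(timestamp):
    """Render a Unix timestamp as a UTC date string (hypothetical helper)."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime('%Y-%m-%d %H:%M UTC')


def format_duration(seconds):
    """Render a duration in seconds as H:MM:SS (hypothetical helper)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Made-up episode dict, assumed to look like an entry in feed['episodes']
episode = {'title': 'Example episode', 'published': 86400, 'total_time': 3725}

print(f"Latest episode: {episode['title']}")
print(f"Published: {format_published(episode['published'])}")
print(f"Length: {format_duration(episode['total_time'])}")
```

Formatting in UTC keeps the output independent of the machine's local time zone; swap in a local `tzinfo` if listeners should see their own time.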