### Minimal Configuration Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/configuration.md

This example shows the basic usage of the `parse()` function with only the required URL and stream parameters.

```python
import podcastparser

# Minimal configuration; the context manager closes the file after parsing
with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.rss', f)
```

--------------------------------

### Install Podcast Parser

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/quick-reference.md

Installs the podcastparser library using pip. Requires Python 3.x.

```bash
pip install podcastparser
```

--------------------------------

### Full Configuration Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/configuration.md

This example shows parsing a feed from bytes using `io.BytesIO` and setting `max_episodes`. It also includes printing the number of parsed episodes.

```python
import podcastparser
import io

# Read feed from bytes
feed_bytes = b'<?xml version="1.0"?><rss>...'

feed = podcastparser.parse(
    url='https://example.com/podcast.rss',
    stream=io.BytesIO(feed_bytes),
    max_episodes=100,
)

print(f"Parsed {len(feed['episodes'])} episodes")
```

--------------------------------

### Parse Podcast Feed with Requests Library

Source: https://github.com/gpodder/podcastparser/blob/master/doc/index.md

This example demonstrates parsing a podcast feed using the popular Requests library. It handles the response stream to efficiently process the feed content. Make sure to install the Requests library (`pip install requests`).

```python
import podcastparser
import requests

url = 'https://example.net/podcast.atom'

with requests.get(url, stream=True) as response:
    response.raw.decode_content = True
    parsed = podcastparser.parse(url, response.raw)

# parsed is a dict
import pprint
pprint.pprint(parsed)
```

--------------------------------

### Parse Default Namespace Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Demonstrates parsing a default namespace declaration from XML attributes.

```python
from podcastparser import Namespace

# Default namespace
result = Namespace.parse_namespaces({'xmlns': 'example'})
# Returns: {'': 'example'}
```

--------------------------------

### Lookup Namespace with Parent Context Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Illustrates how a child namespace context can look up a prefix defined in its parent context.

```python
from podcastparser import Namespace

# With parent namespace
parent = Namespace({'xmlns:m': 'http://search.yahoo.com/mrss/'}, None)
child = Namespace({}, parent)
child.lookup('m')
# Returns: 'http://search.yahoo.com/mrss/'
```

--------------------------------

### Map Tag with iTunes Namespace Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Demonstrates mapping a tag name that belongs to the iTunes namespace to its standardized podcastparser format.

```python
from podcastparser import Namespace

# Prefixed element with iTunes namespace
ns = Namespace({
    'xmlns:it': 'http://www.itunes.com/dtds/podcast-1.0.dtd'
}, None)
ns.map('it:duration')
# Returns: 'itunes:duration'
```

--------------------------------

### Podcast Feed Return Structure Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/types.md

Illustrates a common structure returned by the parse() function, including episode details and enclosures.

```python
feed = {
    'title': 'Example Podcast',
    'description': 'A great podcast about examples.',
    'link': 'https://example.com',
    'cover_url': 'https://example.com/cover.jpg',
    'language': 'en',
    'type': 'episodic',
    'explicit': False,
    'episodes': [
        {
            'title': 'Episode 1',
            'description': 'First episode',
            'guid': 'ep-001',
            'published': 1609459200,  # 2021-01-01 00:00:00 UTC
            'enclosures': [
                {
                    'url': 'https://example.com/episode-1.mp3',
                    'file_size': 52428800,
                    'mime_type': 'audio/mpeg',
                }
            ],
            'total_time': 3600,
            'chapters': [
                {
                    'start': 0,
                    'title': 'Introduction',
                }
            ],
        },
    ],
}
```

--------------------------------

### Parsing Podlove Simple Chapters

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/supported-formats.md

This XML snippet demonstrates the structure of Podlove Simple Chapters, including attributes like start time, title, href, and image. Chapters missing 'start' or 'title' are skipped.

```xml
<psc:chapters version="1.1">
  <psc:chapter start="00:00:00" title="Introduction" href="https://..." image="https://..." />
  <psc:chapter start="00:05:00" title="Main Topic" />
</psc:chapters>
```
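A minimal sketch of how a block like this could be read with only the standard library. It assumes the `psc` prefix is bound to `http://podlove.org/simple-chapters` and that `start` uses a plain `HH:MM:SS` form. The helper names `to_seconds` and `read_chapters` and the sample data are hypothetical, and this is not podcastparser's own implementation.

```python
import xml.etree.ElementTree as ET

PSC_NS = 'http://podlove.org/simple-chapters'

# Made-up sample; the third chapter has no start time on purpose
SAMPLE = f'''
<psc:chapters version="1.1" xmlns:psc="{PSC_NS}">
  <psc:chapter start="00:00:00" title="Introduction" href="https://example.com/intro" />
  <psc:chapter start="00:05:00" title="Main Topic" />
  <psc:chapter title="No start time" />
</psc:chapters>
'''


def to_seconds(text):
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' to whole seconds (fractions are not handled)."""
    seconds = 0
    for part in text.split(':'):
        seconds = seconds * 60 + int(part)
    return seconds


def read_chapters(xml_text):
    """Return chapter dicts, skipping chapters without 'start' or 'title'."""
    root = ET.fromstring(xml_text)
    chapters = []
    for node in root.findall(f'{{{PSC_NS}}}chapter'):
        start, title = node.get('start'), node.get('title')
        if start is None or title is None:
            continue
        chapter = {'start': to_seconds(start), 'title': title}
        for optional in ('href', 'image'):
            if node.get(optional) is not None:
                chapter[optional] = node.get(optional)
        chapters.append(chapter)
    return chapters


chapters = read_chapters(SAMPLE)
print(chapters)
```

The optional `href` and `image` keys are only added when the attribute is present, which matches the "check key presence" access pattern used elsewhere in these examples.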

--------------------------------

### Lookup Namespace Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Demonstrates looking up a defined namespace prefix. The lookup checks the current context first and then recursively checks parent contexts.

```python
from podcastparser import Namespace

ns = Namespace({'xmlns:it': 'http://www.itunes.com/dtds/podcast-1.0.dtd'}, None)
ns.lookup('it')
# Returns: 'http://www.itunes.com/dtds/podcast-1.0.dtd'
```
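The lookup order described above (current context first, then each parent in turn) can be sketched without the library. `PrefixScope` is a hypothetical stand-in written for illustration, not the `Namespace` class itself, and it stores bare prefixes rather than `xmlns:` attribute names.

```python
class PrefixScope:
    """Hypothetical stand-in for a namespace context with an optional parent."""

    def __init__(self, prefixes, parent=None):
        self.prefixes = dict(prefixes)
        self.parent = parent

    def lookup(self, prefix):
        """Walk from this scope up through its parents; None if never declared."""
        scope = self
        while scope is not None:
            if prefix in scope.prefixes:
                return scope.prefixes[prefix]
            scope = scope.parent
        return None


root = PrefixScope({'m': 'http://search.yahoo.com/mrss/'})
inner = PrefixScope({'it': 'http://www.itunes.com/dtds/podcast-1.0.dtd'}, root)

print(inner.lookup('it'))         # found in the current scope
print(inner.lookup('m'))          # found in the parent scope
print(inner.lookup('undefined'))  # not declared anywhere
```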

--------------------------------

### Inspect Error Details

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-errors.md

Example demonstrating how to inspect detailed error information: the message, the line and column number, and the underlying exception. `FeedParseError` derives from `xml.sax.SAXParseException`, so the position accessors `getLineNumber()` and `getColumnNumber()` are called on the exception itself.

```python
import podcastparser

try:
    with open('feed.xml', 'rb') as f:
        feed = podcastparser.parse('http://example.com/feed.xml', f)
except podcastparser.FeedParseError as e:
    print(f"Error message: {e.getMessage()}")
    # The SAX parse exception exposes the error position directly
    print(f"Line {e.getLineNumber()}, column {e.getColumnNumber()}")
    if e.getException():
        print(f"Caused by: {e.getException()}")
```
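To see what these accessors return without a feed file, the sketch below triggers a parse error with the standard library's `xml.sax` alone. It assumes `FeedParseError` behaves like the `xml.sax.SAXParseException` it is caught as here. The `BROKEN` document is made up for the demonstration.

```python
import xml.sax

# Mismatched tags: <channel> is never closed before </rss>
BROKEN = b'<rss>\n  <channel>\n</rss>'

try:
    xml.sax.parseString(BROKEN, xml.sax.ContentHandler())
    error = None
except xml.sax.SAXParseException as e:
    error = e
    print(f"Message: {e.getMessage()}")
    print(f"Line {e.getLineNumber()}, column {e.getColumnNumber()}")
```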

--------------------------------

### Limited Episodes Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/configuration.md

This example demonstrates how to limit the number of episodes parsed by setting the `max_episodes` parameter. Only the most recent episodes are returned.

```python
import podcastparser

# Limit to 10 most recent episodes
with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.rss', f, max_episodes=10)
```

--------------------------------

### Parse Multiple Namespaces Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Illustrates parsing multiple namespace declarations, including a default and several prefixed ones, from XML attributes.

```python
from podcastparser import Namespace

# Multiple namespaces
result = Namespace.parse_namespaces({
    'xmlns': 'foo',
    'xmlns:a': 'bar',
    'xmlns:b': 'bla'
})
# Returns: {'': 'foo', 'a': 'bar', 'b': 'bla'}
```
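The mapping shown in these examples (the `xmlns` attribute becomes the empty prefix, `xmlns:foo` becomes `foo`) can be reproduced in a few lines. The function below is a standalone sketch with a hypothetical name, `split_xmlns`, written to illustrate the rule rather than to mirror the library's source.

```python
def split_xmlns(attrs):
    """Collect 'xmlns' and 'xmlns:prefix' attributes into a prefix -> URI dict."""
    result = {}
    for key, value in attrs.items():
        if key == 'xmlns':
            result[''] = value  # default namespace uses the empty prefix
        elif key.startswith('xmlns:'):
            result[key[len('xmlns:'):]] = value
        # any other attribute is not a namespace declaration and is ignored
    return result


declared = split_xmlns({'xmlns': 'foo', 'xmlns:a': 'bar', 'version': '2.0'})
print(declared)
```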

--------------------------------

### Lookup Undefined Namespace Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Shows the result of looking up a namespace prefix that is not defined in the current or any parent context.

```python
from podcastparser import Namespace
ns = Namespace({}, None)
ns.lookup('undefined')
# Returns: None
```

--------------------------------

### Parse Prefixed Namespace Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Shows how to parse a single prefixed namespace declaration from XML attributes.

```python
from podcastparser import Namespace

# Prefixed namespace
result = Namespace.parse_namespaces({'xmlns:foo': 'http://example.com/bar'})
# Returns: {'foo': 'http://example.com/bar'}
```

--------------------------------

### Podcast Person Element Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/supported-formats.md

Shows the structure of a podcast:person element used for parsing person information.

```xml
<podcast:person name="Jane Doe" role="host" group="cast" href="https://..." img="https://...">
  Jane Doe
</podcast:person>
```

--------------------------------

### EpisodeGuid Handler

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Parses GUIDs and honors the isPermaLink attribute. It treats the GUID as a URL if isPermaLink is true, and as a literal string otherwise.

```python
class EpisodeGuid(EpisodeAttr):
    """Parses GUID and honors isPermaLink attribute"""
```

```xml
<guid isPermaLink="true">http://example.com/ep1</guid>
<guid isPermaLink="false">custom-id-123</guid>
```

--------------------------------

### Stream Feed from Network (Python)

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Shows how to download a podcast feed from a URL, buffer the response in memory, and parse it from a byte stream. Includes error handling for network and parsing issues.

```python
import podcastparser
import io
import urllib.error
import urllib.request

def parse_feed_from_url(url):
    """Download and parse a podcast feed from a URL"""
    try:
        # Download feed content
        with urllib.request.urlopen(url) as response:
            feed_content = response.read()
        
        # Parse from bytes stream
        stream = io.BytesIO(feed_content)
        feed = podcastparser.parse(url, stream)
        return feed
    
    except urllib.error.URLError as e:
        print(f"Network error: {e}")
        return None
    except podcastparser.FeedParseError as e:
        print(f"Parse error: {e}")
        return None

# Usage
feed = parse_feed_from_url('https://example.com/podcast.rss')
if feed:
    print(f"Parsed: {feed['title']}")
    print(f"Episodes: {len(feed['episodes'])}")
```

--------------------------------

### Extract and Display Chapter Information

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Shows how to extract and display chapter information, including titles, start times, and optional URLs or images, for podcast episodes.

```python
import podcastparser

with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.xml', f)

for episode in feed['episodes']:
    if 'chapters' not in episode:
        continue
    
    print(f"{episode['title']}")
    print("Chapters:")
    
    for chapter in episode['chapters']:
        # Format start time
        total_sec = chapter['start']
        hours = total_sec // 3600
        minutes = (total_sec % 3600) // 60
        seconds = total_sec % 60
        
        if hours > 0:
            time_str = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
        else:
            time_str = f"{minutes:02d}:{seconds:02d}"
        
        print(f"  [{time_str}] {chapter['title']}")

        if 'href' in chapter:
            print(f"    URL: {chapter['href']}")
        if 'image' in chapter:
            print(f"    Image: {chapter['image']}")

    print()
```

--------------------------------

### Map Tag with Unknown Namespace URI Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Illustrates how a tag name with an unknown namespace URI is mapped, resulting in a prefix of '!'.

```python
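from podcastparser import Namespace
# Unknown namespace URI gets prefixed with !
child = Namespace({}, Namespace({'xmlns:x': 'http://example.com/'}, None))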
child.map('x:y')
# Returns: '!x:y'
```

--------------------------------

### Podcast Parser Mapping Dictionary Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

The MAPPING dictionary maps XML element paths to handler instances. Paths use namespace-mapped names.

```python
MAPPING = {
    'rss/channel/title': PodcastAttr('title', squash_whitespace),
    'rss/channel/item': EpisodeItem(),
    'rss/channel/item/title': EpisodeAttr('title', squash_whitespace),
    'rss/channel/item/enclosure': Enclosure('length'),
    'atom:feed/atom:entry/psc:chapters/psc:chapter': PodloveChapter(),
    # ... 70+ more entries
}
```

--------------------------------

### Map Tag with Inherited Namespace Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Shows how a child namespace context maps a tag name using a namespace defined in its parent context.

```python
# Child inherits parent namespace
parent = Namespace({
    'xmlns:m': 'http://search.yahoo.com/mrss/',
    'xmlns:x': 'http://example.com/'
}, None)
child = Namespace({}, parent)
child.map('m:content')
# Returns: 'media:content'
```

--------------------------------

### Get First Episode

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/quick-reference.md

Safely retrieve the first episode from a feed, returning None if no episodes exist.

```python
first = feed['episodes'][0] if feed['episodes'] else None
```

--------------------------------

### EpisodeItem Handler

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Handles the opening and closing of episode elements. It calls add_episode() on start and validate_episode() on end.

```python
class EpisodeItem(Target):
    """Creates a new episode and validates it after parsing"""
```

--------------------------------

### Parse Dates and Durations with Python

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Demonstrates how to parse publication dates and durations from podcast episodes. It shows converting Unix timestamps to human-readable dates and formatting duration in hours, minutes, and seconds. Also includes examples of parsing arbitrary date/time strings and durations using helper functions.

```python
import podcastparser
from datetime import datetime, timezone

with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.xml', f)

for episode in feed['episodes']:
    # Convert Unix timestamp to human-readable format (timezone-aware UTC)
    pub_time = datetime.fromtimestamp(episode['published'], tz=timezone.utc)
    date_str = pub_time.strftime('%Y-%m-%d %H:%M:%S')
    print(f"{episode['title']}")
    print(f"  Published: {date_str} UTC")
    
    # Format duration
    if 'total_time' in episode:
        total_sec = episode['total_time']
        hours = total_sec // 3600
        minutes = (total_sec % 3600) // 60
        seconds = total_sec % 60
        
        if hours > 0:
            duration = f"{hours}h {minutes}m {seconds}s"
        else:
            duration = f"{minutes}m {seconds}s"
        print(f"  Duration: {duration}")
    
    print()

# Parsing arbitrary date/time strings
print("Date parsing examples:")
print(podcastparser.parse_pubdate('2023-12-25T00:00:00Z'))
print(podcastparser.parse_pubdate('Mon, 25 Dec 2023 12:30:45 -0500'))

print("\nTime parsing examples:")
print(f"'30' seconds = {podcastparser.parse_time('30')} seconds")
print(f"'01:30' = {podcastparser.parse_time('01:30')} seconds")
print(f"'01:30:45' = {podcastparser.parse_time('01:30:45')} seconds")
```
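The two formatting steps above can also be factored into small reusable helpers. This is a sketch using only the standard library. `format_published` and `format_duration` are hypothetical names, not part of podcastparser.

```python
from datetime import datetime, timezone


def format_published(timestamp):
    """Render a Unix timestamp as a UTC date string."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime('%Y-%m-%d %H:%M:%S UTC')


def format_duration(total_sec):
    """Render a duration in seconds as 'Xh Ym Zs', omitting hours when zero."""
    hours, remainder = divmod(total_sec, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours > 0:
        return f"{hours}h {minutes}m {seconds}s"
    return f"{minutes}m {seconds}s"


print(format_published(1609459200))
print(format_duration(3600))
print(format_duration(90))
```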

--------------------------------

### PodcastAttrList Handler Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Parses comma-separated text content into a list. Use this handler for podcast fields that contain multiple values separated by commas.

```python
'rss/channel/itunes:keywords': PodcastAttrList('itunes_keywords', squash_whitespace),
```
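As a rough illustration of what "comma-separated text into a list" means, the sketch below splits a keywords string with plain Python. Trimming whitespace and dropping empty items are assumptions made for this example. The handler's exact behavior may differ, and `split_keywords` is a hypothetical name.

```python
def split_keywords(text):
    """Split on commas, trim each item, and drop empty items (assumed behavior)."""
    return [item.strip() for item in text.split(',') if item.strip()]


keywords = split_keywords('tech, news ,, python')
print(keywords)
```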

--------------------------------

### Map Tag with Undefined Prefix Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Demonstrates the behavior when mapping a tag name with a prefix that is not defined in any namespace context. The tag is returned unchanged, and a warning is logged.

```python
from podcastparser import Namespace
# Undefined prefix returns unchanged
child = Namespace({}, Namespace({}, None))
child.map('atom:link')
# Returns: 'atom:link' (with warning logged)
```

--------------------------------

### Minimal Podcast Feed Parsing Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/README.md

This snippet demonstrates the basic usage of the podcastparser library: it parses a podcast feed from a local file and prints its title and the number of episodes. The URL argument is not fetched; it is used to resolve relative links in the feed. Ensure that `feed.xml` exists and contains a valid podcast feed.

```python
import podcastparser

with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/podcast.xml', f)

print(f"Podcast: {feed['title']}")
print(f"Episodes: {len(feed['episodes'])}")
```

--------------------------------

### Parsing Podcast Feed from HTTP Stream

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-parse.md

Shows how to parse feed content that was fetched over HTTP. The example uses `io.BytesIO` to stand in for the downloaded bytes; in a real application, you would fetch the content using libraries like `urllib.request` or `requests`.

```python
import podcastparser
import io

# In real code, you'd use urllib.request or requests
# feed_content = urllib.request.urlopen(url).read()
feed_content = b'<?xml version="1.0"?>...'

stream = io.BytesIO(feed_content)
feed = podcastparser.parse('https://example.com/feed.rss', stream)
```

--------------------------------

### PodcastAttrFromHref Handler Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Extracts an attribute value (defaults to 'href') and stores it as an absolute URL. Useful for image sources or links specified in attributes.

```python
'rss/channel/itunes:image': PodcastAttrFromHref('cover_url'),
```

--------------------------------

### Namespace.NAMESPACES Dictionary Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

This static dictionary maps XML namespace URIs to their corresponding standardized prefixes. It is used internally for parsing feeds with various namespace declarations.

```python
{
    'http://www.itunes.com/dtds/podcast-1.0.dtd': 'itunes',
    'http://www.w3.org/2005/Atom': 'atom',
    'http://search.yahoo.com/mrss/': 'media',
    'http://podlove.org/simple-chapters': 'psc',
    'https://github.com/Podcastindex-org/podcast-namespace/...': 'podcast',
    # ... more entries
}
```

--------------------------------

### PodcastAttrRelativeLink Handler Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Resolves relative URLs against the feed's base URL before storing them. Use this when links within the feed might be relative paths.

```python
'rss/channel/link': PodcastAttrRelativeLink('link'),
```
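Resolving a relative link against a base URL follows the usual URL-joining rules, which the standard library's `urllib.parse.urljoin` demonstrates. This sketch only illustrates those rules. The feed URL and paths are made up, and it does not claim to be the call the handler makes internally.

```python
from urllib.parse import urljoin

feed_url = 'https://example.com/podcasts/feed.rss'

relative = urljoin(feed_url, 'about.html')         # relative to the feed's directory
rooted = urljoin(feed_url, '/about.html')          # relative to the site root
absolute = urljoin(feed_url, 'https://cdn.example.net/cover.jpg')  # already absolute

print(relative)
print(rooted)
print(absolute)
```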

--------------------------------

### Podcastparser Return Structure Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/README.md

This JSON-like structure illustrates the data returned by the `podcastparser.parse()` function. It includes top-level podcast information and a list of episodes, each with details like title, publication date, and media enclosures. Note that some fields like 'total_time', 'chapters', and 'persons' are optional.

```python
{
    "title": "string",           # Podcast title
    "description": "string",     # Podcast description
    "link": "string",            # Podcast website
    "cover_url": "string",       # Cover art URL
    "episodes": [
        {
            "title": "string",
            "description": "string",
            "published": int,       # Unix timestamp
            "guid": "string",            # Unique ID
            "enclosures": [
                {
                    "url": "string",
                    "file_size": int,
                    "mime_type": "string",
                },
            ],
            "total_time": int,      # Duration (optional)
            "chapters": [...],      # Chapters (optional)
            "persons": [...],       # Contributors (optional)
            # ... more optional fields
        },
    ],
    # ... more optional fields
}
```

--------------------------------

### Podcast Feed Parsing with Error Handling

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/README.md

This example shows how to handle potential errors during podcast feed parsing using a try-except block. It catches the specific `FeedParseError` and prints an informative message. This is crucial for robust applications that interact with external feeds.

```python
import podcastparser

url = 'https://example.com/feed.xml'

try:
    with open('feed.xml', 'rb') as stream:
        feed = podcastparser.parse(url, stream)
except podcastparser.FeedParseError as e:
    print(f"Parse error: {e}")

--------------------------------

### PodcastAttr Handler Example

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Stores element text content as a podcast attribute. Use this handler to extract simple string values for podcast fields.

```python
'rss/channel/title': PodcastAttr('title', squash_whitespace),
```

--------------------------------

### Chapter Dictionary Structure

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/types.md

Defines the structure for representing chapter markers within a podcast episode. Includes start time, title, and optional href and image URLs.

```python
{
    'start': int,
    'title': str,
    'href': str,      # optional
    'image': str,     # optional
}
```

--------------------------------

### Find Episode by GUID

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/quick-reference.md

Searches for an episode within a feed using its unique 'guid'. Returns None if not found.

```python
target_guid = 'ep-001'
episode = next((e for e in feed['episodes'] if e['guid'] == target_guid), None)
```
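When many lookups are needed, scanning the list each time can be replaced by a dictionary keyed on `guid`. The sketch below uses a hand-written feed dict shaped like the documented return structure. It assumes every episode carries a `guid`, and `by_guid` is a hypothetical variable name.

```python
# Hand-written stand-in for a parsed feed
feed = {
    'episodes': [
        {'guid': 'ep-001', 'title': 'Episode 1'},
        {'guid': 'ep-002', 'title': 'Episode 2'},
    ],
}

# Build the index once, then look up episodes in constant time
by_guid = {e['guid']: e for e in feed['episodes']}

episode = by_guid.get('ep-002')
missing = by_guid.get('does-not-exist')

print(episode['title'] if episode else 'not found')
print(missing)
```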

--------------------------------

### Import Podcastparser Module

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/module-overview.md

Demonstrates how to import the entire podcastparser module for use in your project.

```python
# Import the entire module
import podcastparser
```

--------------------------------

### Import the Library

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/quick-reference.md

Import the podcastparser library to begin using its functionalities.

```python
import podcastparser
```

--------------------------------

### Initialize iTunes Owner Dictionary

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Sets up the `itunes_owner` dictionary within the podcast's metadata. This is used for the owner's name and email address.

```python
def add_itunes_owner(self):
    self.data['itunes_owner'] = {}
```

--------------------------------

### Podlove Single Chapter Parsing

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Parses a single chapter marker from Podlove feeds.

```python
class PodloveChapter(Target):
    """Parses a single chapter marker"""
```

Extracts chapter attributes (all optional except `start` and `title`):
- `start` → parsed as seconds via `parse_time()`
- `title` → chapter name
- `href` → chapter URL (optional)
- `image` → chapter image URL (optional)

Skips chapters missing either `start` or `title`.

--------------------------------

### Initialize iTunes Categories List

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Initializes the `itunes_categories` list in the podcast's metadata. This prepares the structure for adding category information.

```python
def add_itunes_categories(self):
    self.data['itunes_categories'] = []
```

--------------------------------

### Get Episode Audio URL

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/quick-reference.md

Extracts the audio URL from the first enclosure of an episode, if available.

```python
if episode['enclosures']:
    audio_url = episode['enclosures'][0]['url']
```

--------------------------------

### Initialize Episode Persons List

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Initializes the `persons` list for the current episode, specifically for data within the Podcast Index namespace.

```python
def add_episode_persons(self):
    self.episodes[-1]['persons'] = []
```

--------------------------------

### add_itunes_owner() -> None

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Initializes the iTunes owner dictionary within the podcast's metadata.

```APIDOC
## add_itunes_owner() -> None

### Description
Initializes the iTunes owner dictionary.
```

--------------------------------

### Get Podcast Attribute

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Retrieves a podcast-level attribute from the metadata dictionary. Provides a default value if the attribute is not found.

```python
def get_podcast_attr(self, key, default=None):
    return self.data.get(key, default)
```

--------------------------------

### Standard Namespace Mappings

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Provides a static dictionary mapping various XML namespace URIs to their standardized prefixes used by podcastparser. This includes namespaces for iTunes, Atom, Media RSS, Podlove Simple Chapters, Content Module, and Podcast Index.

```python
NAMESPACES = {
    'http://www.itunes.com/dtds/podcast-1.0.dtd': 'itunes',
    'http://www.itunes.com/DTDs/Podcast-1.0.dtd': 'itunes',
    'http://www.w3.org/2005/Atom': 'atom',
    'http://www.w3.org/2005/Atom/': 'atom',
    'http://search.yahoo.com/mrss/': 'media',
    'http://search.yahoo.com/mrss': 'media',
    'http://podlove.org/simple-chapters': 'psc',
    'http://podlove.org/simple-chapters/': 'psc',
    'http://purl.org/rss/1.0/modules/content/': 'content',
    'https://github.com/Podcastindex-org/podcast-namespace/blob/main/docs/1.0.md': 'podcast',
    'https://github.com/podcastindex-org/podcast-namespace/blob/main/docs/1.0.md': 'podcast',
}
```
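To show how a table like this is used, the sketch below maps a prefixed tag name to its standardized form, following the three outcomes documented in the mapping examples: known URI, unknown URI (prefixed with `!`), and undefined prefix (unchanged). `map_tag` and `KNOWN` are hypothetical names, `KNOWN` is a two-entry subset of the table above, and default-namespace handling and the logged warning are left out.

```python
# Subset of the URI -> prefix table, copied from the listing above
KNOWN = {
    'http://www.itunes.com/dtds/podcast-1.0.dtd': 'itunes',
    'http://search.yahoo.com/mrss/': 'media',
}


def map_tag(name, declared):
    """Map 'prefix:local' using 'declared', a prefix -> URI dict from the document."""
    if ':' not in name:
        return name  # unprefixed names are not handled in this sketch
    prefix, local = name.split(':', 1)
    uri = declared.get(prefix)
    if uri is None:
        return name  # undefined prefix: returned unchanged
    standard = KNOWN.get(uri)
    if standard is None:
        return f'!{prefix}:{local}'  # declared, but the URI is not in the table
    return f'{standard}:{local}'


declared = {
    'it': 'http://www.itunes.com/dtds/podcast-1.0.dtd',
    'x': 'http://example.com/',
}
print(map_tag('it:duration', declared))
print(map_tag('x:y', declared))
print(map_tag('atom:link', declared))
```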

--------------------------------

### Catch and Handle Parse Errors

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-errors.md

A basic example of catching FeedParseError during feed parsing and printing a user-friendly error message.

```python
import podcastparser

try:
    with open('feed.xml', 'rb') as f:
        feed = podcastparser.parse('http://example.com/feed.xml', f)
except podcastparser.FeedParseError as e:
    print(f"Failed to parse feed: {e}")
```

--------------------------------

### Podcastparser Type Hints

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

Illustrates the expected type hints for the parse function and the conceptual structure of its return types (Feed, Episode, Enclosure). These are not enforced by the library but serve as documentation.

```python
import podcastparser
from typing import Dict, List, Any, Optional

# Parse function signature
def parse(url: str, stream: Any, max_episodes: int = 0) -> Dict[str, Any]:
    ...

# Return type structure (conceptual, not enforced)
Feed = Dict[str, Any]  # Contains 'episodes', 'title', etc.
Episode = Dict[str, Any]  # Contains 'title', 'guid', 'enclosures', etc.
Enclosure = Dict[str, Any]  # Contains 'url', 'file_size', 'mime_type'
```
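For projects that want stricter static checking than `Dict[str, Any]`, the conceptual types could be written as `TypedDict` classes. This is a sketch only: the library does not ship these types, the class names are hypothetical, and which keys are always present is an assumption based on the return-structure examples.

```python
from typing import List, TypedDict


class Enclosure(TypedDict):
    url: str
    file_size: int
    mime_type: str


class EpisodeRequired(TypedDict):
    # Keys assumed to be present on every episode
    title: str
    guid: str
    published: int
    enclosures: List[Enclosure]


class Episode(EpisodeRequired, total=False):
    # Keys assumed to be optional
    description: str
    total_time: int


episode: Episode = {
    'title': 'Episode 1',
    'guid': 'ep-001',
    'published': 1609459200,
    'enclosures': [
        {'url': 'https://example.com/episode-1.mp3',
         'file_size': 52428800,
         'mime_type': 'audio/mpeg'},
    ],
}
print(episode['enclosures'][0]['url'])
```

At runtime a `TypedDict` is an ordinary `dict`, so values returned by `parse()` can be annotated this way without conversion. The annotations are only checked by static type checkers.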

--------------------------------

### get_episode_attr(key: str, default: any = None) -> any

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Gets an attribute from the current episode. Returns a default value if the attribute is not set.

```APIDOC
## get_episode_attr(key: str, default: any = None) -> any

### Description
Gets an attribute from the current episode. Returns `default` if not set.

### Parameters
- **key** (str) - Required - The attribute name.
- **default** (any) - Optional - The default value to return if the key is not found.
```

--------------------------------

### Get Current Episode Attribute

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Retrieves an attribute from the current episode. Returns a specified default value if the attribute does not exist.

```python
def get_episode_attr(self, key, default=None):
    return self.episodes[-1].get(key, default)
```

--------------------------------

### Accessing Public API Functions and Classes

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

Demonstrates how to import and access the main functions, classes, and constants provided by the podcastparser module directly.

```python
import podcastparser

# All functions directly accessible
podcastparser.parse(...)
podcastparser.normalize_feed_url(...)
podcastparser.FeedParseError

# Classes directly accessible
handler = podcastparser.PodcastHandler(...)
ns = podcastparser.Namespace(...)

# Constants directly accessible
mapping = podcastparser.MAPPING
roots = podcastparser.VALID_ROOTS
namespaces = podcastparser.Namespace.NAMESPACES
```

--------------------------------

### Import Specific Podcastparser Functions

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/module-overview.md

Shows how to import specific functions from the podcastparser module, allowing for more targeted usage.

```python
# Import specific functions
from podcastparser import parse, normalize_feed_url, FeedParseError
```

--------------------------------

### add_itunes_categories() -> None

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Initializes the iTunes categories list within the podcast's metadata.

```APIDOC
## add_itunes_categories() -> None

### Description
Initializes the iTunes categories list.
```

--------------------------------

### get_podcast_attr(key: str, default: any = None) -> any

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Gets a podcast-level attribute from the metadata dictionary. Returns a default value if the attribute is not found.

```APIDOC
## get_podcast_attr(key: str, default: any = None) -> any

### Description
Gets a podcast-level attribute from the metadata dictionary. Returns `default` if not set.

### Parameters
- **key** (str) - Required - The attribute name.
- **default** (any) - Optional - The default value to return if the key is not found.
```

--------------------------------

### Safe Access for Optional Fields (Check Key Presence)

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/types.md

Demonstrates how to safely access optional fields like 'chapters' by checking for their presence in the episode dictionary.

```python
# Safe access pattern
if 'chapters' in episode:
    for chapter in episode['chapters']:
        print(chapter['title'])
```
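An equivalent pattern uses `dict.get()` with a default, which avoids the explicit membership test. The sketch below runs against a hand-written episode dict that has no optional fields.

```python
# Hand-written episode without 'chapters' or 'total_time'
episode = {'title': 'Episode 1', 'guid': 'ep-001'}

# Falls back to an empty list, so the comprehension simply yields nothing
chapter_titles = [chapter['title'] for chapter in episode.get('chapters', [])]

# Falls back to None when the key is absent
duration = episode.get('total_time')

print(chapter_titles)
print(duration)
```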

--------------------------------

### validate_episode() -> None

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Validates and cleans up the current episode after parsing is complete. This includes handling chapters, descriptions, GUIDs, titles, and internal flags.

```APIDOC
## validate_episode() -> None

### Description
Validates and cleans up the current episode after parsing is complete. Performs checks for empty chapters, HTML descriptions, missing GUIDs and titles, and cleans up internal flags.
```

--------------------------------

### Importing All Module Contents

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

This pattern imports the entire podcastparser module, allowing access to its functions and classes using the module name as a prefix.

```python
import podcastparser

feed = podcastparser.parse(...)
```

--------------------------------

### Main Parse Function

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/MANIFEST.txt

Documentation for the main parse() function, which is the primary entry point for parsing podcast feeds.

```APIDOC
## parse(url, stream, max_episodes=0)

### Description
This is the main function used to parse podcast feeds. It reads the feed content from a stream and returns a structured representation of the podcast.

### Parameters
- **url** (str) - Required - The URL of the podcast feed.
- **stream** (file-like object) - Required - A file-like object containing the feed content.
- **max_episodes** (int) - Optional - The maximum number of episodes to include in the output. Defaults to 0 (all episodes).

### Returns
- **dict** - A dictionary containing podcast metadata and a list of episodes.
```

```python
import podcastparser

# parse() needs both the feed URL and an open binary stream of the feed content
with open('feed.xml', 'rb') as f:
    podcast_data = podcastparser.parse('http://example.com/podcast.rss', f)

print(podcast_data['title'])
```

--------------------------------

### Check Media Type (Python)

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Iterates through episode enclosures and checks the 'mime_type' to identify and print URLs for audio or video content. Uses startswith() for flexible matching of MIME types.

```python
for enclosure in episode['enclosures']:
    if enclosure['mime_type'].startswith('audio/'):
        print(f"Audio: {enclosure['url']}")
    elif enclosure['mime_type'].startswith('video/'):
        print(f"Video: {enclosure['url']}")
```
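
The same check can be wrapped in a helper that groups enclosure URLs by media kind. This is a sketch under stated assumptions: `classify_enclosures` is a hypothetical name, the episode is hand-built sample data, and a missing or empty `mime_type` is treated as "other".

```python
def classify_enclosures(episode):
    """Group enclosure URLs by the top-level part of their MIME type."""
    groups = {'audio': [], 'video': [], 'other': []}
    for enclosure in episode.get('enclosures', []):
        # Tolerate a missing or empty MIME type instead of raising
        mime_type = enclosure.get('mime_type') or ''
        kind = mime_type.split('/', 1)[0]
        if kind not in ('audio', 'video'):
            kind = 'other'
        groups[kind].append(enclosure['url'])
    return groups


# Hand-built sample episode for illustration
sample_episode = {
    'enclosures': [
        {'url': 'https://example.com/a.mp3', 'mime_type': 'audio/mpeg'},
        {'url': 'https://example.com/b.mp4', 'mime_type': 'video/mp4'},
        {'url': 'https://example.com/c.pdf', 'mime_type': 'application/pdf'},
    ]
}
print(classify_enclosures(sample_episode))
```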

--------------------------------

### Basic Podcast Parsing from Local File

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-parse.md

Demonstrates how to parse a podcast feed from a local file using the `podcastparser.parse` function. It then prints the podcast title and a list of episodes with their titles and publication dates.

```python
import podcastparser

with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('http://example.com/podcast.xml', f)

print(f"Podcast: {feed['title']}")
print(f"Episodes: {len(feed['episodes'])}")
for episode in feed['episodes']:
    print(f"  - {episode['title']} ({episode['published']})")
```

--------------------------------

### add_episode_persons() -> None

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Initializes the `persons` list for the current episode, specifically for the Podcast Index namespace.

```APIDOC
## add_episode_persons() -> None

### Description
Initializes the `persons` list for the current episode (Podcast Index namespace).
```

--------------------------------

### Initialize iTunes Owner Handler

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Handler class invoked when the `itunes:owner` element opens; it initializes the `itunes_owner` dictionary.

```python
class ItunesOwnerItem(Target):
    """Creates iTunes owner object"""
```

--------------------------------

### Validate Current Episode

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Performs validation and cleanup on the current episode after parsing. This includes handling descriptions, generating missing GUIDs and titles, and cleaning internal flags.

```python
def validate_episode(self):
    """Validate and clean up the current episode."""
    ...  # method body omitted in this excerpt
```

--------------------------------

### Find Podcast Episode by ID or Title

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Search for a specific episode within a parsed feed using its GUID or a substring of its title. Ensure the feed object is already parsed.

```python
import podcastparser

def find_episode_by_guid(feed, target_guid):
    """Find episode by GUID"""
    for episode in feed['episodes']:
        if episode['guid'] == target_guid:
            return episode
    return None

def find_episode_by_title(feed, title_substring):
    """Find episode by partial title match"""
    for episode in feed['episodes']:
        if title_substring.lower() in episode['title'].lower():
            return episode
    return None

with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.xml', f)

# Search by GUID
ep = find_episode_by_guid(feed, 'episode-123')
if ep:
    print(f"Found: {ep['title']}")

# Search by title substring
ep = find_episode_by_title(feed, 'Interview')
if ep:
    print(f"Found: {ep['title']}")
```
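
When many GUID lookups run against the same feed, a dictionary index avoids rescanning the episode list each time. The sketch below uses a hand-built feed dictionary, and `build_guid_index` is a hypothetical helper. Unlike the linear search above, a later episode with a duplicate GUID would replace an earlier one in the index.

```python
def build_guid_index(feed):
    """Map each episode GUID to its episode dict for constant-time lookup."""
    return {episode['guid']: episode for episode in feed['episodes']}


# Hand-built sample feed for illustration
sample_feed = {
    'episodes': [
        {'guid': 'episode-123', 'title': 'Interview with a Guest'},
        {'guid': 'episode-124', 'title': 'News Roundup'},
    ]
}

index = build_guid_index(sample_feed)
ep = index.get('episode-123')
if ep:
    print(f"Found: {ep['title']}")
```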

--------------------------------

### Access Episode Data

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/quick-reference.md

Iterate through the list of episodes to access individual episode details including title, description, publication timestamp, GUID, link, total duration, and enclosures.

```python
for episode in feed['episodes']:
    episode['title']             # str
    episode['description']       # str
    episode['published']         # int (Unix timestamp)
    episode['guid']              # str
    episode['link']              # str
    episode['total_time']        # int (seconds)
    episode['enclosures']        # list of enclosure dicts
```
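
Since `published` is a Unix timestamp and `total_time` is a number of seconds, both usually need formatting before display. The helpers below are a sketch with hypothetical names (`format_published`, `format_duration`), applied to a hand-built episode dictionary.

```python
from datetime import datetime, timezone


def format_published(timestamp):
    """Render a Unix timestamp as a UTC date string."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime('%Y-%m-%d %H:%M UTC')


def format_duration(seconds):
    """Render a duration in seconds as H:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Hand-built sample episode for illustration
sample_episode = {'title': 'Ep 1', 'published': 0, 'total_time': 3725}
print(format_published(sample_episode['published']))
print(format_duration(sample_episode['total_time']))
```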

--------------------------------

### Base Target Class Definition

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/handlers.md

Defines the base class for all element handlers. It includes attributes for storing parsed data and methods for handling element start and end events.

```python
class Target(object):
    WANT_TEXT: bool = False
    
    def __init__(self, key=None, filter_func=lambda x: x.strip(), overwrite=True):
        self.key = key
        self.filter_func = filter_func
        self.overwrite = overwrite
    
    def start(self, handler, attrs): pass
    def end(self, handler, text): pass
```
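
To illustrate how a subclass might use these hooks, the sketch below repeats the base class so it runs on its own, then adds a hypothetical `RecordText` subclass and a hypothetical `StubHandler` that only holds a `data` dictionary. Neither name comes from the library, and the real handler offers more than this stub.

```python
class Target(object):
    # Repeated from the definition above so the sketch is self-contained
    WANT_TEXT = False

    def __init__(self, key=None, filter_func=lambda x: x.strip(), overwrite=True):
        self.key = key
        self.filter_func = filter_func
        self.overwrite = overwrite

    def start(self, handler, attrs): pass
    def end(self, handler, text): pass


class RecordText(Target):
    """Hypothetical subclass: stores filtered element text under self.key."""
    WANT_TEXT = True

    def end(self, handler, text):
        value = self.filter_func(text)
        # Respect the overwrite flag when the key already holds a value
        if self.overwrite or self.key not in handler.data:
            handler.data[self.key] = value


class StubHandler:
    """Hypothetical stand-in for the real handler; holds a data dict only."""

    def __init__(self):
        self.data = {}


handler = StubHandler()
RecordText(key='title').end(handler, '  My Podcast \n')
RecordText(key='title', overwrite=False).end(handler, 'Ignored')
print(handler.data)
```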

--------------------------------

### Process HTML Content Safely

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Demonstrates how to safely handle HTML descriptions in podcast episodes by stripping tags and decoding entities. It also shows how to detect if a string is HTML and strip tags from arbitrary HTML.

```python
import podcastparser

with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.xml', f)

for episode in feed['episodes']:
    # Determine which description to use
    if 'description_html' in episode:
        # Strip HTML tags and decode entities
        text = podcastparser.remove_html_tags(episode['description_html'])
        print(f"{episode['title']} (from HTML)")
    elif episode['description']:
        # Use plain text description
        text = episode['description']
        print(f"{episode['title']} (plain text)")
    else:
        text = "(no description)"
        print(f"{episode['title']} (no description)")
    
    # Display excerpt
    excerpt = text[:200] if text else ''
    print(f"{excerpt}...")
    print()

# Manual HTML detection
sample = "<p>This is HTML</p>"
is_html = podcastparser.is_html(sample)
print(f"\nIs HTML: {is_html}")

# Strip tags from arbitrary HTML
html = '<p>This is <strong>bold</strong> text</p>'
plain = podcastparser.remove_html_tags(html)
print(f"Original: {html}")
print(f"Stripped: {plain}")
```

--------------------------------

### Initialize Namespace Context

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Creates a new namespace context. It parses namespace declarations from XML attributes and sets the parent context for hierarchical lookup.

```python
def __init__(self, attrs, parent=None):
    self.namespaces = self.parse_namespaces(attrs)
    self.parent = parent
```

--------------------------------

### Episode Dictionary Structure

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/module-overview.md

Represents the structure of an individual episode dictionary within the podcast feed. Includes details such as title, description, link, publication date, GUID, enclosures, and total duration.

```APIDOC
## Episode Dictionary

This dictionary represents a single episode within a podcast feed.

### Fields
- **title** (str) - The title of the episode.
- **description** (str) - A description of the episode.
- **link** (str) - A URL related to the episode.
- **published** (int) - The publication date as a Unix timestamp.
- **guid** (str) - A globally unique identifier for the episode.
- **enclosures** (list[Enclosure, ...]) - A list of media file enclosures for the episode.
- **total_time** (int) - The total duration of the episode in seconds.
- **...** (optional fields) - May include additional fields.
```
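
For projects that use static type checking, the fields listed above could be expressed with `typing.TypedDict`. This is a sketch only: the class names `Enclosure` and `Episode` are illustrative, the library itself returns plain dictionaries, and optional fields are left out.

```python
from typing import List, TypedDict


class Enclosure(TypedDict):
    url: str
    file_size: int   # bytes
    mime_type: str


class Episode(TypedDict):
    # Only the fields listed above; optional fields are omitted in this sketch
    title: str
    description: str
    link: str
    published: int   # Unix timestamp
    guid: str
    enclosures: List[Enclosure]
    total_time: int  # seconds


# Hand-built sample value; at runtime a TypedDict is an ordinary dict
episode: Episode = {
    'title': 'Ep 1',
    'description': 'First episode',
    'link': 'https://example.com/ep1',
    'published': 0,
    'guid': 'episode-1',
    'enclosures': [
        {'url': 'https://example.com/ep1.mp3', 'file_size': 1024, 'mime_type': 'audio/mpeg'},
    ],
    'total_time': 60,
}
print(episode['enclosures'][0]['mime_type'])
```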

--------------------------------

### Iterate with Error Handling (Python)

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Shows how to iterate over a list of episodes while gracefully handling potential KeyError or TypeError exceptions that might arise from unexpected data structures.

```python
try:
    for episode in feed['episodes']:
        print(episode['title'])
except (KeyError, TypeError):
    print("Unexpected data structure")
```
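
Wrapping the whole loop means one malformed entry stops the iteration. A variation, sketched below with a hypothetical `safe_titles` helper and hand-built sample data, moves the `try` inside the loop so that bad entries are skipped and the rest are still processed.

```python
def safe_titles(feed):
    """Collect episode titles, skipping entries that are malformed."""
    titles = []
    for episode in feed.get('episodes', []):
        try:
            titles.append(episode['title'])
        except (KeyError, TypeError):
            # Missing 'title' key or a non-dict entry: skip it and continue
            continue
    return titles


# Hand-built sample feed with two malformed entries
sample_feed = {'episodes': [{'title': 'Ep 1'}, None, {'guid': 'no-title'}, {'title': 'Ep 2'}]}
print(safe_titles(sample_feed))
```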

--------------------------------

### Importing Specific Items from Module

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

This pattern imports only the necessary functions and classes from the podcastparser module, reducing namespace pollution and improving clarity.

```python
from podcastparser import parse, normalize_feed_url, FeedParseError

feed = parse(...)
url = normalize_feed_url(...)
try:
    ...
except FeedParseError:
    ...
```

--------------------------------

### PodcastHandler Constructor

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Initializes the handler for parsing a single feed. It sets up attributes for storing feed data, episode limits, and managing the XML parsing state.

```python
def __init__(self, url, max_episodes):
    self.url = url
    self.max_episodes = max_episodes
    self.base = url
    self.text = None
    self.episodes = []
    self.data = {
        'title': file_basename_no_extension(url),
        'episodes': self.episodes,
    }
    self.path_stack = []
    self.namespace = None
```

--------------------------------

### Add Media Enclosure to Episode

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Adds a media enclosure (like an audio or video file) to the current episode. Requires URL, file size, and MIME type.

```python
def add_enclosure(self, url, file_size, mime_type):
    self.episodes[-1]['enclosures'].append({
        'url': url,
        'file_size': file_size,
        'mime_type': mime_type,
    })
```

--------------------------------

### Extracting Enclosure URLs from Episodes

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-parse.md

Demonstrates how to iterate through parsed episodes and extract details about their enclosures, including the URL, file size, and MIME type. This is useful for accessing media files associated with podcast episodes.

```python
import podcastparser

with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('http://example.com/podcast.xml', f)

for episode in feed['episodes']:
    for enclosure in episode['enclosures']:
        print(f"Episode: {episode['title']}")
        print(f"  URL: {enclosure['url']}")
        print(f"  Size: {enclosure['file_size']} bytes")
        print(f"  Type: {enclosure['mime_type']}")
```

--------------------------------

### Parse Podcast Feed with podcastparser

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/INDEX.md

Demonstrates how to parse a podcast feed from a file and access its title and episode information. Ensure the feed file is opened in binary read mode ('rb').

```python
import podcastparser

# Parse a feed
with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.xml', f)

# Access data
print(f"Podcast: {feed['title']}")
print(f"Episodes: {len(feed['episodes'])}")

for episode in feed['episodes']:
    print(f"  {episode['title']} ({episode['published']})")
```

--------------------------------

### Generate Feed Statistics (Python)

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Calculates and prints statistics such as total episodes, average duration, and total file size from a parsed podcast feed. Requires the feed to have episodes and enclosures with size information.

```python
import podcastparser
import statistics

def feed_statistics(feed):
    """Calculate statistics about a feed"""
    if not feed['episodes']:
        return None
    
    episodes = feed['episodes']
    durations = [ep['total_time'] for ep in episodes if 'total_time' in ep]
    sizes = [enc['file_size'] for ep in episodes for enc in ep['enclosures']
             if enc['file_size'] > 0]
    
    stats = {
        'total_episodes': len(episodes),
        'episodes_with_duration': len(durations),
        'avg_duration': statistics.mean(durations) if durations else 0,
        'median_duration': statistics.median(durations) if durations else 0,
        'total_size_bytes': sum(sizes),
        'avg_file_size': statistics.mean(sizes) if sizes else 0,
    }
    return stats

with open('feed.xml', 'rb') as f:
    feed = podcastparser.parse('https://example.com/feed.xml', f)

stats = feed_statistics(feed)
if stats:
    print(f"Feed Statistics: {feed['title']}")
    print(f"  Total episodes: {stats['total_episodes']}")
    print(f"  Episodes with duration: {stats['episodes_with_duration']}")
    print(f"  Average duration: {stats['avg_duration'] / 60:.1f} minutes")
    print(f"  Median duration: {stats['median_duration'] / 60:.1f} minutes")
    print(f"  Total size: {stats['total_size_bytes'] / 1024**3:.2f} GB")
    print(f"  Average file size: {stats['avg_file_size'] / 1024**2:.1f} MB")
```

--------------------------------

### Constants

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

Publicly accessible constants for configuration and data mapping.

```APIDOC
## Constants

### Description
Publicly accessible constants.

### Accessible Constants
- `MAPPING`: Data mapping configuration.
- `VALID_ROOTS`: List of valid root elements for podcast feeds.
- `Namespace.NAMESPACES`: Predefined XML namespaces.
```

--------------------------------

### add_enclosure(url: str, file_size: int, mime_type: str) -> None

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-podcast-handler.md

Adds a media enclosure to the current episode. Enclosures represent the media files associated with an episode.

```APIDOC
## add_enclosure(url: str, file_size: int, mime_type: str) -> None

### Description
Adds a media enclosure to the current episode.

### Parameters
- **url** (str) - Required - Normalized media file URL.
- **file_size** (int) - Required - File size in bytes (-1 if unknown).
- **mime_type** (str) - Required - MIME type (e.g., 'audio/mpeg').
```
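
Because `file_size` is -1 when the size is unknown, code that sums or displays sizes should filter that sentinel out first. The helpers below are a sketch with hypothetical names, applied to hand-built enclosure dictionaries.

```python
def total_known_size(enclosures):
    """Sum file sizes in bytes, skipping enclosures whose size is unknown (-1)."""
    return sum(enc['file_size'] for enc in enclosures if enc['file_size'] >= 0)


def describe_size(file_size):
    """Return a human-readable size, or a placeholder for the -1 sentinel."""
    if file_size < 0:
        return 'unknown size'
    return f"{file_size / (1024 ** 2):.1f} MB"


# Hand-built sample enclosures for illustration
sample_enclosures = [
    {'url': 'https://example.com/a.mp3', 'file_size': 5 * 1024 ** 2, 'mime_type': 'audio/mpeg'},
    {'url': 'https://example.com/b.mp3', 'file_size': -1, 'mime_type': 'audio/mpeg'},
]
print(total_known_size(sample_enclosures))
for enc in sample_enclosures:
    print(enc['url'], describe_size(enc['file_size']))
```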

--------------------------------

### Global Scope Functions

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/module-overview.md

Lists the available functions in the global scope for parsing podcast data, normalizing URLs, and processing text.

```python
parse(url, stream, max_episodes=0) -> dict
normalize_feed_url(url) -> str | None
parse_url(text) -> str | None
parse_time(value) -> int
parse_pubdate(text) -> int
parse_length(text) -> int
parse_type(text) -> str
file_basename_no_extension(filename) -> str
squash_whitespace(text) -> str
squash_whitespace_not_nl(text) -> str
is_html(text) -> bool
remove_html_tags(html) -> str | None
```

--------------------------------

### Module-Level Version and Author Information

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

These constants define the version, author, website, and license of the podcastparser library. They are typically used for informational purposes.

```python
__version__ = '0.6.11'              # Version string
__author__ = 'Thomas Perl <m@thp.io>'  # Author info
__website__ = 'http://gpodder.org/podcastparser/'  # Project URL
__license__ = 'ISC License'         # License type
```

--------------------------------

### Main Entry Point

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/module-overview.md

The primary function to parse a podcast feed. It takes the feed URL, a stream containing the feed content, and an optional maximum number of episodes to retrieve, and returns a dictionary containing podcast metadata and a list of episodes.

```APIDOC
## parse(url, stream, max_episodes=0)

### Description
Parses a podcast feed read from the given stream, using the URL to identify the feed. It extracts structured metadata for the podcast and its episodes.

### Parameters
- **url** (string) - Required - The URL of the podcast feed.
- **stream** (file-like object) - Required - A file-like object containing the feed content.
- **max_episodes** (integer) - Optional - The maximum number of episodes to include in the output. Defaults to 0 (all episodes).

### Returns
A dictionary containing podcast metadata and a list of episodes.
```

--------------------------------

### Namespace Constructor

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-namespace.md

Initializes a new Namespace object, parsing XML attributes to establish namespace contexts. It can optionally link to a parent namespace for hierarchical lookups.

```APIDOC
## Constructor

### `__init__(attrs: dict, parent: Namespace = None)`

Creates a new namespace context.

#### Parameters
- `attrs` (dict) - Required - XML element attributes (from SAX handler)
- `parent` (Namespace or None) - Optional - Parent namespace context (for hierarchical lookup)

#### Attributes
- `namespaces` (dict) - Parsed namespace declarations (prefix → URI mapping)
- `parent` (Namespace or None) - Parent context for inheritance lookup
```
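
The prefix-to-URI mapping and parent fallback described above can be sketched as follows. `MiniNamespace` is a hypothetical re-creation for illustration, not the library class. The `lookup` method name and the handling of `xmlns` and `xmlns:prefix` attribute names are assumptions about how such a context would typically work.

```python
class MiniNamespace:
    """Hypothetical sketch of a hierarchical namespace context."""

    def __init__(self, attrs, parent=None):
        self.namespaces = self.parse_namespaces(attrs)
        self.parent = parent

    @staticmethod
    def parse_namespaces(attrs):
        # Assumption: declarations arrive as 'xmlns' or 'xmlns:prefix' attributes
        result = {}
        for name, uri in attrs.items():
            if name == 'xmlns':
                result[''] = uri
            elif name.startswith('xmlns:'):
                result[name[len('xmlns:'):]] = uri
        return result

    def lookup(self, prefix):
        # Check this context first, then walk up through the parents
        if prefix in self.namespaces:
            return self.namespaces[prefix]
        if self.parent is not None:
            return self.parent.lookup(prefix)
        return None


root = MiniNamespace({'xmlns:itunes': 'http://www.itunes.com/dtds/podcast-1.0.dtd', 'version': '2.0'})
child = MiniNamespace({'xmlns:atom': 'http://www.w3.org/2005/Atom'}, parent=root)
print(child.lookup('atom'))
print(child.lookup('itunes'))
print(child.lookup('missing'))
```

Non-namespace attributes such as `version` are ignored, and a prefix declared on an ancestor element resolves through the `parent` chain.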

--------------------------------

### Namespace

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/api-surface.md

Represents XML namespaces used in podcast feeds.

```APIDOC
## Namespace(...)

### Description
Represents an XML namespace.

### Attributes
- `NAMESPACES`: A dictionary of predefined namespaces.
```

--------------------------------

### Basic Podcast Feed Parsing

Source: https://github.com/gpodder/podcastparser/blob/master/_autodocs/examples.md

Parses a podcast feed from a file and displays basic information such as title, description, number of episodes, and details of the latest episode. Ensure the feed file is opened in binary read mode.

```python
import podcastparser

# Open and parse a feed
with open('podcast.rss', 'rb') as f:
    feed = podcastparser.parse('https://example.com/podcast.rss', f)

# Display podcast information
print(f"Podcast: {feed['title']}")
print(f"Description: {feed.get('description', '')}")
print(f"Episodes: {len(feed['episodes'])}")

# Display first episode
if feed['episodes']:
    ep = feed['episodes'][0]
    print(f"\nLatest episode: {ep['title']}")
    print(f"Published: {ep['published']}")
    print(f"Length: {ep['total_time']} seconds")
```