### Install cmudict Package

Source: https://github.com/prosegrinder/python-cmudict/blob/main/README.md

Install the cmudict package using pip. This is the standard method for installing Python packages.

```bash
pip install cmudict
```

--------------------------------

### Get Punctuation Pronunciation Dictionary with cmudict.vp()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns a dictionary mapping 52 punctuation tokens to their ARPAbet pronunciations. Useful for handling punctuation in text processing.

```python
import cmudict

vp = cmudict.vp()
print(len(vp))   # 52

print(vp["!exclamation-point"])
# [['EH2', 'K', 'S', 'K', 'L', 'AH0', 'M', 'EY1', 'SH', 'AH0', 'N', 'P', 'OY2', 'N', 'T']]

print(vp[",comma"])
# [['K', 'AA1', 'M', 'AH0']]

# List all punctuation tokens
for token in sorted(vp.keys()):
    print(token)
# !exclamation-point
# "close-quote
# ...
```
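
As a rough sketch of how these keys might be used in text processing, the helper below splits each token into its punctuation symbol and its spoken name. The names `split_token`, `names_by_symbol`, and `sample_vp` are hypothetical and not part of cmudict. The sketch assumes every key is a run of non-letter characters followed by a lowercase name, as in the two keys shown above; in practice you would pass the dictionary returned by `cmudict.vp()`.

```python
import re

# Illustrative sample in the same shape as cmudict.vp(); in practice,
# pass the real dictionary returned by cmudict.vp().
sample_vp = {
    "!exclamation-point": [["EH2", "K", "S", "K", "L", "AH0", "M", "EY1",
                            "SH", "AH0", "N", "P", "OY2", "N", "T"]],
    ",comma": [["K", "AA1", "M", "AH0"]],
}


def split_token(token: str) -> tuple[str, str]:
    """Split a key such as '!exclamation-point' into ('!', 'exclamation-point').

    Assumption: the key is one or more non-letter characters followed by a name.
    """
    match = re.match(r"([^a-z]+)([a-z].*)", token)
    if match is None:
        raise ValueError(f"unexpected token format: {token!r}")
    return match.group(1), match.group(2)


def names_by_symbol(vp: dict[str, list[list[str]]]) -> dict[str, list[str]]:
    """Map each punctuation symbol to the spoken names listed for it."""
    result: dict[str, list[str]] = {}
    for token in vp:
        symbol, name = split_token(token)
        result.setdefault(symbol, []).append(name)
    return result


symbol_names = names_by_symbol(sample_vp)
print(symbol_names)
# {'!': ['exclamation-point'], ',': ['comma']}
```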

--------------------------------

### Get CMUdict file content as a string

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Retrieve the full raw text of cmudict.dict as a string. This is identical to cmudict.raw().

```python
import cmudict

s = cmudict.dict_string()
print(len(s))   # 3618488
print(s[:80])
```

--------------------------------

### Get CMUdict entries as (word, phones) tuples

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Retrieve the CMU lexicon as a flat list of (word, phones) tuples. Preserves all entries, including multiple pronunciations for the same word. Compatible with NLTK's CMUDictCorpusReader.entries().

```python
import cmudict

entries = cmudict.entries()
print(len(entries))  # 135166

# First five entries
for word, phones in entries[:5]:
    print(word, "->", " ".join(phones))
# a -> AH0
# a(1) -> EY1
# a's -> EY1 Z
# a. -> EY1
# a.'s -> EY1 Z

# Count words with more than one pronunciation
from collections import Counter
word_counts = Counter(word for word, _ in entries)
multi_pron = [w for w, c in word_counts.items() if c > 1]
print(f"{len(multi_pron)} words have multiple pronunciations")
# e.g. 9114 words have multiple pronunciations
```
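
If entries carry variant suffixes such as `a(1)`, as in the output shown above, it may help to fold them back onto the base word before counting. The following is a minimal sketch under that assumption; `group_by_base_word` and `sample_entries` are hypothetical names, and the sample data is copied from the output above rather than loaded from the package.

```python
import re
from collections import defaultdict

# Illustrative sample copied from the output above; in practice,
# pass the list returned by cmudict.entries().
sample_entries = [
    ("a", ["AH0"]),
    ("a(1)", ["EY1"]),
    ("a's", ["EY1", "Z"]),
]

# Matches a trailing variant marker such as "(1)" or "(2)".
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def group_by_base_word(entries):
    """Collect all pronunciations of a word under its suffix-free spelling."""
    grouped = defaultdict(list)
    for word, phones in entries:
        base = VARIANT_SUFFIX.sub("", word)
        grouped[base].append(phones)
    return dict(grouped)


grouped = group_by_base_word(sample_entries)
print(grouped["a"])    # [['AH0'], ['EY1']]
print(grouped["a's"])  # [['EY1', 'Z']]
```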

--------------------------------

### Get all words from CMUdict lexicon

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Obtain a flat list of all lowercase word strings from the CMU lexicon. Words with multiple pronunciations appear multiple times. Compatible with NLTK's CMUDictCorpusReader.words().

```python
import cmudict

words = cmudict.words()
print(len(words))   # 135166
print(words[:5])    # ['a', 'a(1)', "a's", 'a.', "a.'s"]

# Check if a word is in the lexicon
word_set = set(words)
print("hello" in word_set)   # True
print("foobar" in word_set)  # False
```

--------------------------------

### Retrieve CMUdict License

Source: https://github.com/prosegrinder/python-cmudict/blob/main/README.md

Get the license for the CMUdict data set as a string. This function is useful for understanding the terms of use for the dictionary data.

```python
cmudict.license_string() # Returns the cmudict license as a string
```

--------------------------------

### Access CMUdict Data Files

Source: https://github.com/prosegrinder/python-cmudict/blob/main/README.md

Import the cmudict library and access its data files. Functions are provided to get raw string content, binary streams, or minimally processed structures for dictionary, phones, symbols, and voice print data.

```python
import cmudict

cmudict.dict() # Compatible with NLTK
cmudict.dict_string()
cmudict.dict_stream()

cmudict.phones()
cmudict.phones_string()
cmudict.phones_stream()

cmudict.symbols()
cmudict.symbols_string()
cmudict.symbols_stream()

cmudict.vp()
cmudict.vp_string()
cmudict.vp_stream()
```

--------------------------------

### Access CMUdict as a Python dictionary

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Get the full CMU pronouncing dictionary as a Python dict. Keys are lowercase words, and values are lists of pronunciations. Compatible with NLTK's CMUDictCorpusReader.dict().

```python
import cmudict

d = cmudict.dict()

# Single-pronunciation word
print(d["hello"])
# [['HH', 'AH0', 'L', 'OW1']]

# Word with multiple pronunciations
print(d["spieth"])
# [['S', 'P', 'IY1', 'TH'], ['S', 'P', 'AY1', 'AH0', 'TH']]

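# Pronunciation lookup that raises a descriptive KeyError for unknown words
def get_pronunciation(word: str) -> list[list[str]]:
    pronunciations = d.get(word.lower())
    if pronunciations is None:
        raise KeyError(f"'{word}' not found in CMUdict ({len(d)} entries)")
    return pronunciations

print(get_pronunciation("Python"))
# [['P', 'AY1', 'TH', 'AH0', 'N']]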
```
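
One common use of this structure is syllable counting. In ARPAbet as used by CMUdict, vowel phones carry a trailing stress digit (0, 1, or 2), so counting the phones that end in a digit gives the syllable count of a pronunciation. The sketch below uses the two entries shown above as sample data; `count_syllables` and `sample_dict` are hypothetical names, not part of the package.

```python
# Illustrative sample copied from the output above; in practice,
# use the dictionary returned by cmudict.dict().
sample_dict = {
    "hello": [["HH", "AH0", "L", "OW1"]],
    "spieth": [["S", "P", "IY1", "TH"], ["S", "P", "AY1", "AH0", "TH"]],
}


def count_syllables(phones: list[str]) -> int:
    """Count phones that end in a stress digit, i.e. the vowel phones."""
    return sum(1 for phone in phones if phone[-1].isdigit())


# A word can have several pronunciations, so report one count per pronunciation.
syllables = {
    word: [count_syllables(phones) for phones in pronunciations]
    for word, pronunciations in sample_dict.items()
}
print(syllables)
# {'hello': [2], 'spieth': [1, 2]}
```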

--------------------------------

### Get Parsed Phone Table with cmudict.phones()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Retrieves the 39 ARPAbet phones used in the dictionary. Useful for analyzing phonetic categories or filtering specific types of phones like vowels.

```python
import cmudict

phones = cmudict.phones()
print(len(phones))   # 39

for phone, categories in phones[:5]:
    print(phone, "->", categories)
# AA -> ['vowel']
# AE -> ['vowel']
# AH -> ['vowel']
# AO -> ['vowel']
# AW -> ['vowel']

# Get all vowel phones
vowels = [p for p, cats in phones if "vowel" in cats]
print("Vowels:", vowels)
# Vowels: ['AA', 'AE', 'AH', 'AO', 'AW', 'AY', 'EH', 'ER', 'EY', 'IH', 'IY', 'OW', 'OY', 'UH', 'UW']
```
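
To analyze phonetic categories more generally, the list of `(phone, [category])` tuples can be inverted into a mapping from category to phones. This is a minimal sketch; `phones_by_category` and `sample_phones` are hypothetical names, and the `("B", ["stop"])` row is an assumed consonant entry included only for illustration.

```python
from collections import defaultdict

# Illustrative sample in the shape returned by cmudict.phones().
# The vowel rows come from the output above; the "B" row is an assumption.
sample_phones = [
    ("AA", ["vowel"]),
    ("AE", ["vowel"]),
    ("B", ["stop"]),
]


def phones_by_category(phones):
    """Invert (phone, [categories]) tuples into {category: [phones]}."""
    grouped = defaultdict(list)
    for phone, categories in phones:
        for category in categories:
            grouped[category].append(phone)
    return dict(grouped)


by_category = phones_by_category(sample_phones)
print(by_category)
# {'vowel': ['AA', 'AE'], 'stop': ['B']}
```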

--------------------------------

### Read raw CMUdict file content as a string

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Get the complete cmudict.dict file as a single string, including comment lines. Useful for regex-based processing or passing raw content to other tools. Compatible with NLTK's CMUDictCorpusReader.raw().

```python
import cmudict

raw = cmudict.raw()
print(type(raw))   # <class 'str'>
print(len(raw))    # 3618488

# Print the first three non-comment lines
lines = [line for line in raw.splitlines() if not line.startswith(";")]
for line in lines[:3]:
    print(line)
# A  AH0
# A(1)  EY1
# A'S  EY1 Z
```
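
The regex-based processing mentioned above could look roughly like the following sketch. It runs on a small hand-written string in the line format shown in the output above, not on the real file; `parse_raw`, `LINE_PATTERN`, and `sample_raw` are hypothetical names, and the comment line in the sample is invented for illustration.

```python
import re

# Hand-written sample in the line format shown above; in practice,
# pass the string returned by cmudict.raw().
sample_raw = ";;; illustrative comment line\nA  AH0\nA(1)  EY1\nA'S  EY1 Z\n"

# A word, then whitespace, then the space-separated phones.
LINE_PATTERN = re.compile(r"^(?P<word>\S+)\s+(?P<phones>.+)$")


def parse_raw(raw: str) -> list[tuple[str, list[str]]]:
    """Parse raw dictionary text into (word, phones) tuples, skipping comments."""
    parsed = []
    for line in raw.splitlines():
        if not line or line.startswith(";"):
            continue
        match = LINE_PATTERN.match(line)
        if match:
            parsed.append((match["word"], match["phones"].split()))
    return parsed


parsed = parse_raw(sample_raw)
print(parsed)
# [('A', ['AH0']), ('A(1)', ['EY1']), ("A'S", ['EY1', 'Z'])]
```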

--------------------------------

### cmudict.vp_stream() / cmudict.vp_string()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns the cmudict.vp file as a binary stream or as a string respectively.

```APIDOC
## `cmudict.vp_stream()` / `cmudict.vp_string()` — Raw VP data

Returns the `cmudict.vp` file as a binary stream or as a string respectively.

### Usage Example (String)
```python
import cmudict

s = cmudict.vp_string()
print(len(s))   # 1747
```

### Usage Example (Stream)
```python
import cmudict

with cmudict.vp_stream() as stream:
    for line in stream:
        print(line.decode("utf-8").strip())
        break
# !exclamation-point  EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T
```
```

--------------------------------

### Access Raw VP Data with cmudict.vp_stream() / cmudict.vp_string()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Provides the cmudict.vp file content as a binary stream or a string. Useful for direct processing of punctuation pronunciation data.

```python
import cmudict

s = cmudict.vp_string()
print(len(s))   # 1747

with cmudict.vp_stream() as stream:
    for line in stream:
        print(line.decode("utf-8").strip())
        break
# !exclamation-point  EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T
```

--------------------------------

### cmudict.vp()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns the cmudict.vp file as a dict mapping 52 punctuation tokens to their ARPAbet pronunciations.

```APIDOC
## `cmudict.vp()` — Punctuation pronunciation dictionary

Returns the `cmudict.vp` file as a `dict` mapping 52 punctuation tokens (e.g., `"!exclamation-point"`, `",comma"`) to their ARPAbet pronunciations, in the same structure as `cmudict.dict()`.

### Usage Example
```python
import cmudict

vp = cmudict.vp()
print(len(vp))   # 52

print(vp["!exclamation-point"])
# [['EH2', 'K', 'S', 'K', 'L', 'AH0', 'M', 'EY1', 'SH', 'AH0', 'N', 'P', 'OY2', 'N', 'T']]

print(vp[",comma"])
# [['K', 'AA1', 'M', 'AH0']]

# List all punctuation tokens
for token in sorted(vp.keys()):
    print(token)
# !exclamation-point
# "close-quote
# ...
```
```

--------------------------------

### cmudict.license_string()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns the full text of the CMU Pronouncing Dictionary data license as a string.

```APIDOC
## `cmudict.license_string()` — CMUdict data license text

Returns the full text of the CMU Pronouncing Dictionary data license as a string, useful for attribution or display in downstream applications.

### Usage Example
```python
import cmudict

license_text = cmudict.license_string()
print(len(license_text))   # 1754
print(license_text[:200])
# CMUdict
# -------
# Copyright (C) 1993-2015 Carnegie Mellon University. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
```
```

--------------------------------

### Access Raw Phone Data with cmudict.phones_stream() / cmudict.phones_string()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Provides the cmudict.phones file content as a binary stream or a string. Useful for direct processing or when the full parsed table is not needed.

```python
import cmudict

# As string (382 characters)
s = cmudict.phones_string()
print(len(s))   # 382
print(s[:60])
# AA	vowel
# AE	vowel
# AH	vowel

# As stream (useful for streaming pipelines)
with cmudict.phones_stream() as stream:
    for line in stream:
        phone, ptype = line.decode("utf-8").strip().split()
        # process each phone inline
        pass
```

--------------------------------

### cmudict.phones_stream() / cmudict.phones_string()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns the cmudict.phones file as a binary stream or as a string respectively.

```APIDOC
## `cmudict.phones_stream()` / `cmudict.phones_string()` — Raw phone data

Returns the `cmudict.phones` file as a binary stream or as a string respectively.

### Usage Example (String)
```python
import cmudict

s = cmudict.phones_string()
print(len(s))   # 382
print(s[:60])
# AA\tvowel
# AE\tvowel
# AH\tvowel
```

### Usage Example (Stream)
```python
import cmudict

with cmudict.phones_stream() as stream:
    for line in stream:
        phone, ptype = line.decode("utf-8").strip().split()
        # process each phone inline
        pass
```
```

--------------------------------

### cmudict.phones()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns the 39 ARPAbet phones used in the dictionary as a list of (phone, [category]) tuples.

```APIDOC
## `cmudict.phones()` — Parsed phone table

Returns the 39 ARPAbet phones used in the dictionary as a list of `(phone, [category])` tuples where each phone maps to its phonetic category (e.g., vowel or consonant type).

### Usage Example
```python
import cmudict

phones = cmudict.phones()
print(len(phones))   # 39

for phone, categories in phones[:5]:
    print(phone, "->", categories)
# AA -> ['vowel']
# AE -> ['vowel']
# AH -> ['vowel']
# AO -> ['vowel']
# AW -> ['vowel']

# Get all vowel phones
vowels = [p for p, cats in phones if "vowel" in cats]
print("Vowels:", vowels)
# Vowels: ['AA', 'AE', 'AH', 'AO', 'AW', 'AY', 'EH', 'ER', 'EY', 'IH', 'IY', 'OW', 'OY', 'UH', 'UW']
```
```

--------------------------------

### Retrieve CMUdict License Text with cmudict.license_string()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns the full text of the CMU Pronouncing Dictionary data license as a string. Essential for attribution and compliance in applications using the dictionary.

```python
import cmudict

license_text = cmudict.license_string()
print(len(license_text))   # 1754
print(license_text[:200])
# CMUdict
# -------
# Copyright (C) 1993-2015 Carnegie Mellon University. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
```

--------------------------------

### List All Phonetic Symbols with cmudict.symbols()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns a flat list of all 84 phonetic symbols, including ARPAbet phones and stress markers. Useful for validating pronunciations against the complete set of symbols.

```python
import cmudict

syms = cmudict.symbols()
print(len(syms))    # 84
print(syms[:10])
# ['AA', 'AA0', 'AA1', 'AA2', 'AE', 'AE0', 'AE1', 'AE2', 'AH', 'AH0']

# Validate that a pronunciation uses only known symbols
def is_valid_pronunciation(phones: list[str]) -> bool:
    valid = set(cmudict.symbols())
    return all(p in valid for p in phones)

print(is_valid_pronunciation(["HH", "AH0", "L", "OW1"]))  # True
print(is_valid_pronunciation(["HH", "XX", "L", "OW1"]))   # False
```
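
Because the symbol list mixes bare phones with their stress-marked variants, it can be useful to separate the two. The sketch below, run on the first ten symbols shown above, strips the trailing stress digit to recover the base phone and reads the stress level where one is present. `strip_stress` and `stress_of` are hypothetical helper names, not part of cmudict.

```python
from typing import Optional

# Illustrative sample copied from the output above; in practice,
# use the list returned by cmudict.symbols().
sample_symbols = ["AA", "AA0", "AA1", "AA2", "AE", "AE0", "AE1", "AE2", "AH", "AH0"]


def strip_stress(symbol: str) -> str:
    """Remove a trailing stress digit (0, 1, or 2) if present."""
    return symbol.rstrip("012")


def stress_of(symbol: str) -> Optional[int]:
    """Return the stress level of a symbol, or None if it has no stress digit."""
    return int(symbol[-1]) if symbol[-1].isdigit() else None


base_phones = sorted({strip_stress(s) for s in sample_symbols})
print(base_phones)        # ['AA', 'AE', 'AH']
print(stress_of("AH0"))   # 0
print(stress_of("AH"))    # None
```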

--------------------------------

### cmudict.symbols_stream() / cmudict.symbols_string()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns the cmudict.symbols file as a binary stream or as a string respectively.

```APIDOC
## `cmudict.symbols_stream()` / `cmudict.symbols_string()` — Raw symbols data

Returns the `cmudict.symbols` file as a binary stream or as a string respectively.

### Usage Example (String)
```python
import cmudict

s = cmudict.symbols_string()
print(len(s))   # 281
print(s[:40])
# AA
# AA0
# AA1
```

### Usage Example (Stream)
```python
import cmudict

with cmudict.symbols_stream() as stream:
    all_syms = [line.decode("utf-8").strip() for line in stream]
print(all_syms[:5])
# ['AA', 'AA0', 'AA1', 'AA2', 'AE']
```
```

--------------------------------

### cmudict.dict_string()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns the full raw text of cmudict.dict as a string, identical to cmudict.raw().

```APIDOC
## cmudict.dict_string()

### Description
Returns the full raw text of `cmudict.dict` as a string. Identical to `cmudict.raw()`.

### Usage
```python
import cmudict

s = cmudict.dict_string()
print(len(s))   # 3618488
print(s[:80])
```
```

--------------------------------

### cmudict.entries()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns the CMU lexicon as a flat list of (word, phones) tuples, preserving all entries including multiple pronunciations for the same word.

```APIDOC
## cmudict.entries()

### Description
Returns the CMU lexicon as a flat list of `(word, phones)` tuples, preserving all 135,166 entries including multiple pronunciations for the same word (each as a separate tuple). Compatible with NLTK's `CMUDictCorpusReader.entries()`.

### Usage
```python
import cmudict

entries = cmudict.entries()
print(len(entries))  # 135166

# First five entries
for word, phones in entries[:5]:
    print(word, "->", " ".join(phones))
# a -> AH0
# a(1) -> EY1
# a's -> EY1 Z
# a. -> EY1
# a.'s -> EY1 Z
```

### Example: Counting Multiple Pronunciations
```python
from collections import Counter
word_counts = Counter(word for word, _ in entries)
multi_pron = [w for w, c in word_counts.items() if c > 1]
print(f"{len(multi_pron)} words have multiple pronunciations")
# e.g. 9114 words have multiple pronunciations
```
```

--------------------------------

### cmudict.raw()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns the complete cmudict.dict file as a single string, including comment lines. Useful for regex-based processing.

```APIDOC
## cmudict.raw()

### Description
Returns the complete `cmudict.dict` file as a single string (3,618,488 characters), including comment lines that begin with `;`. Useful for regex-based processing or passing the raw content to another tool. Compatible with NLTK's `CMUDictCorpusReader.raw()`.

### Usage
```python
import cmudict

raw = cmudict.raw()
print(type(raw))   # <class 'str'>
print(len(raw))    # 3618488
```

### Example: Printing First Three Non-Comment Lines
```python
lines = [l for l in raw.splitlines() if not l.startswith(";")]
for line in lines[:3]:
    print(line)
# A  AH0
# A(1)  EY1
# A'S  EY1 Z
```
```

--------------------------------

### cmudict.words()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns a flat list of all lowercase word strings from the CMU lexicon, including duplicates for words with multiple pronunciations.

```APIDOC
## cmudict.words()

### Description
Returns a flat list of all 135,166 lowercase word strings from the CMU lexicon (one entry per pronunciation row, so words with multiple pronunciations appear more than once). Compatible with NLTK's `CMUDictCorpusReader.words()`.

### Usage
```python
import cmudict

words = cmudict.words()
print(len(words))   # 135166
print(words[:5])    # ['a', 'a(1)', "a's", 'a.', "a.'s"]
```

### Example: Checking Word Existence
```python
word_set = set(words)
print("hello" in word_set)   # True
print("foobar" in word_set)  # False
```
```

--------------------------------

### Access Raw Symbols Data with cmudict.symbols_stream() / cmudict.symbols_string()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Provides the cmudict.symbols file content as a binary stream or a string. Useful for direct processing of the symbol list.

```python
import cmudict

s = cmudict.symbols_string()
print(len(s))   # 281
print(s[:40])
# AA
# AA0
# AA1

with cmudict.symbols_stream() as stream:
    all_syms = [line.decode("utf-8").strip() for line in stream]
print(all_syms[:5])
# ['AA', 'AA0', 'AA1', 'AA2', 'AE']
```

--------------------------------

### cmudict.dict()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns the full CMU pronouncing dictionary as a dict. Keys are lowercase words, and values are lists of pronunciations (each pronunciation is a list of ARPAbet phone strings).

```APIDOC
## cmudict.dict()

### Description
Returns the full CMU pronouncing dictionary as a `dict` whose keys are lowercase words and whose values are lists of pronunciations (each pronunciation is a list of ARPAbet phone strings). Words with multiple valid pronunciations have multiple entries in the list. Compatible with NLTK's `CMUDictCorpusReader.dict()`.

### Usage
```python
import cmudict

d = cmudict.dict()

# Single-pronunciation word
print(d["hello"])
# [['HH', 'AH0', 'L', 'OW1']]

# Word with multiple pronunciations
print(d["spieth"])
# [['S', 'P', 'IY1', 'TH'], ['S', 'P', 'AY1', 'AH0', 'TH']]
```

### Example Function
```python
def get_pronunciation(word: str) -> list[list[str]]:
    pronunciations = d.get(word.lower())
    if pronunciations is None:
        raise KeyError(f"'{word}' not found in CMUdict ({len(d)} entries)")
    return pronunciations

print(get_pronunciation("Python"))
# [['P', 'AY1', 'TH', 'AH0', 'N']]
```
```

--------------------------------

### cmudict.dict_stream()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns an open binary file-like object for cmudict.dict, suitable for memory-efficient line-by-line processing.

```APIDOC
## cmudict.dict_stream()

### Description
Returns an open binary file-like object (`IO[bytes]`) for `cmudict.dict`. Useful for memory-efficient line-by-line processing of the full dictionary without loading it entirely into memory. The caller is responsible for closing the stream.

### Usage
```python
import cmudict

pronunciations = []
with cmudict.dict_stream() as filehandle:  # closes the stream on exit
    for line in filehandle:
        decoded = line.strip().decode("utf-8")
        if decoded.startswith(";"):  # skip comment lines
            continue
        word, phones = decoded.split(" ", 1)
        pronunciations.append((word.split("(", 1)[0].lower(), phones))

print(len(pronunciations))          # 135166
print(pronunciations[0])            # ('a', 'AH0')
```
```

--------------------------------

### cmudict.symbols()

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Returns all 84 phonetic symbols (ARPAbet phones plus stress markers) used in the dictionary as a flat list of strings.

```APIDOC
## `cmudict.symbols()` — List of all phonetic symbols

Returns all 84 phonetic symbols (ARPAbet phones plus stress markers) used in the dictionary as a flat list of strings.

### Usage Example
```python
import cmudict

syms = cmudict.symbols()
print(len(syms))    # 84
print(syms[:10])
# ['AA', 'AA0', 'AA1', 'AA2', 'AE', 'AE0', 'AE1', 'AE2', 'AH', 'AH0']

# Validate that a pronunciation uses only known symbols
def is_valid_pronunciation(phones: list[str]) -> bool:
    valid = set(cmudict.symbols())
    return all(p in valid for p in phones)

print(is_valid_pronunciation(["HH", "AH0", "L", "OW1"]))  # True
print(is_valid_pronunciation(["HH", "XX", "L", "OW1"]))   # False
```
```

--------------------------------

### Access CMUdict file as a binary stream

Source: https://context7.com/prosegrinder/python-cmudict/llms.txt

Obtain an open binary file-like object for cmudict.dict. Useful for memory-efficient line-by-line processing without loading the entire dictionary into memory. The caller is responsible for closing the stream.

```python
import cmudict

pronunciations = []
with cmudict.dict_stream() as filehandle:  # closes the stream on exit
    for line in filehandle:
        decoded = line.strip().decode("utf-8")
        if decoded.startswith(";"):  # skip comment lines
            continue
        word, phones = decoded.split(" ", 1)
        pronunciations.append((word.split("(", 1)[0].lower(), phones))

print(len(pronunciations))          # 135166
print(pronunciations[0])            # ('a', 'AH0')
```
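
To keep memory use low, the same parsing can be wrapped in a generator so that entries are produced one at a time instead of being collected into a list. The sketch below is an assumption-laden illustration: `iter_entries` is a hypothetical helper, and an in-memory `io.BytesIO` with hand-written lines stands in for the stream returned by `cmudict.dict_stream()`.

```python
import io
from itertools import islice
from typing import IO, Iterator


def iter_entries(stream: IO[bytes]) -> Iterator[tuple[str, list[str]]]:
    """Lazily yield (word, phones) tuples from a binary dictionary stream."""
    for raw_line in stream:
        line = raw_line.decode("utf-8").strip()
        if not line or line.startswith(";"):  # skip blank and comment lines
            continue
        word, phones = line.split(None, 1)
        # Drop a variant marker such as "(1)" and normalize to lowercase.
        yield word.split("(", 1)[0].lower(), phones.split()


# Stand-in for cmudict.dict_stream(); the lines are hand-written samples.
sample_stream = io.BytesIO(b";;; illustrative comment\nA  AH0\nA(1)  EY1\nA'S  EY1 Z\n")

with sample_stream as stream:
    # Only the first two entries are read; the rest of the stream is untouched.
    first_two = list(islice(iter_entries(stream), 2))

print(first_two)
# [('a', ['AH0']), ('a', ['EY1'])]
```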

--------------------------------

### NLTK Compatibility Functions

Source: https://github.com/prosegrinder/python-cmudict/blob/main/README.md

Utilize functions that maintain compatibility with NLTK's corpus reader for CMUdict. These include accessing entries, raw data, and words.

```python
import cmudict

cmudict.entries()  # Compatible with NLTK
cmudict.raw()  # Compatible with NLTK
cmudict.words()  # Compatible with NLTK
```
