### ftfy Negative Examples and Manual Fixes

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

Demonstrates text examples that ftfy does not alter because they are not considered mojibake. It also shows how to manually attempt fixes using different encodings when ftfy's automatic correction is not applied.

```python
import ftfy  # importing ftfy also registers the 'sloppy-windows-1252' codec

NEGATIVE_EXAMPLES = [
    "Con il corpo e lo spirito ammaccato,\u00a0è come se nel cuore avessi un vetro conficcato.",
    "2012—∞",
    "TEM QUE SEGUIR, SDV SÓ…",
    "Join ZZAJÉ’s Official Fan List",
    "(-1/2)! = √π",
    "OK??:(   `¬´    ):"
]

for example in NEGATIVE_EXAMPLES:
    # ftfy doesn't "fix" these because they're not broken, but we can manually try fixes
    try:
        print(example.encode('sloppy-windows-1252').decode('utf-8'))
    except UnicodeError:
        print(example.encode('macroman').decode('utf-8'))
    assert ftfy.fix_encoding(example) == example
```

--------------------------------

### Fix Text Examples

Source: https://github.com/rspeer/python-ftfy/blob/main/README.md

Illustrates the core functionality of ftfy's `fix_text` function with various real-world examples of mojibake, including nested encoding issues, curly quotes, and incorrect handling of non-breaking spaces.

```python
import ftfy

# Basic mojibake
print(ftfy.fix_text('âœ” No problems'))

# Multiple layers of mojibake
print(ftfy.fix_text('The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.'))

# Mojibake with curly quotes
print(ftfy.fix_text("l’humanitÃ©"))

# Mojibake with non-breaking spaces
print(ftfy.fix_text('Ã\xa0 perturber la rÃ©flexion'))
print(ftfy.fix_text('Ã perturber la rÃ©flexion'))

# HTML entities
print(ftfy.fix_text('P&EACUTE;REZ'))

# Unchanged text (avoids false positives)
print(ftfy.fix_text('IL Y MARQUÉ…'))
```

--------------------------------

### Chained Encoding/Decoding Errors

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

This example demonstrates a common scenario where mojibake is created and then exacerbated by repeatedly encoding and decoding text with incompatible encodings (UTF-8 and Windows-1252).

```python
text = "l’Hôpital"
print(text.encode('utf-8').decode('windows-1252').encode('utf-8').decode('windows-1252'))
```
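Each round above is an exact mapping from bytes to characters, so the damage can be undone by running the same codecs in the opposite order. The sketch below is a hand-rolled illustration using only the standard library, not ftfy's algorithm; the helper names `add_layer` and `remove_layer` are made up for this example.

```python
def add_layer(text: str) -> str:
    """Misread UTF-8 bytes as Windows-1252 (one round of mojibake)."""
    return text.encode("utf-8").decode("windows-1252")


def remove_layer(text: str) -> str:
    """Reverse one round: recover the bytes, then decode them as UTF-8."""
    return text.encode("windows-1252").decode("utf-8")


original = "l’Hôpital"
garbled = add_layer(add_layer(original))        # two rounds of damage
restored = remove_layer(remove_layer(garbled))  # two rounds of repair

print(garbled)
print(restored)
```

This manual reversal only works while every byte involved maps to a defined Windows-1252 character; Python's `windows-1252` codec leaves a handful of bytes unassigned, and text that hits them needs a more forgiving codec.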

--------------------------------

### Fix Encoding Example

Source: https://github.com/rspeer/python-ftfy/blob/main/README.md

Demonstrates how to use the `fix_encoding` function from the ftfy library to correct text that has been improperly encoded.

```python
from ftfy import fix_encoding
print(fix_encoding("(à¸‡'âŒ£')à¸‡"))
```

--------------------------------

### Avoid False Positives

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/index.rst

Highlights ftfy's heuristic approach to avoid altering text that is already correct. This example shows text that might appear like mojibake but is intentionally left unchanged.

```python
import ftfy

ftfy.fix_text('IL Y MARQUÉ…')
```

--------------------------------

### Fixing MacRoman Mojibake with ftfy

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

This example starts from text that has been through several rounds of MacRoman mojibake. Encoding the string as MacRoman and decoding it as UTF-8 undoes one round by hand; ftfy's `fix_and_explain` then repairs what remains.

```python
from ftfy import fix_and_explain

EXAMPLES = [
    "Merci de t‚Äö√†√∂¬¨¬©l‚Äö√†√∂¬¨¬©charger le plug-in"
]

# Undo one layer of MacRoman mojibake by hand
partially_fixed = EXAMPLES[0].encode('macroman').decode('utf-8')

# Let ftfy fix the remaining layers
fixed_text = fix_and_explain(partially_fixed)[0]

print(f"After undoing one layer by hand: {partially_fixed}")
print(f"Fixed by ftfy: {fixed_text}")
```

--------------------------------

### ftfy Module Documentation

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/heuristic.rst

Provides an overview of the `ftfy` library's modules and functions related to mojibake detection and fixing.

```APIDOC
Module: ftfy

Description: Fixes mojibake in text.

Submodules:
  - ftfy.badness: Contains heuristics for detecting mojibake.
    - Functions:
      - badness(text: str) -> int: Counts the character sequences in the given text that look like mojibake.
      - is_bad(text: str) -> bool: Returns True if the text is considered 'bad' (mojibake).
  - ftfy.chardata: Contains regular expressions for mojibake detection.
    - Constants:
      - UTF8_DETECTOR_RE: Regex for detecting specific UTF-8 decoding errors.
      - LOSSY_UTF8_RE: Regex for detecting lossy UTF-8 sequences (with replacements).
  - ftfy.fixes: Contains functions to fix various types of mojibake.
    - Functions:
      - decode_inconsistent_utf8(text: str) -> str: Fixes text with inconsistent UTF-8 decoding.
      - replace_lossy_sequences(byts: bytes) -> bytes: Replaces lossy UTF-8 byte sequences with the UTF-8 encoding of the replacement character (U+FFFD).
```

--------------------------------

### Highlighting Mojibake Matches with ftfy

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

Illustrates how ftfy identifies potential mojibake using a regular expression. It shows how to find these matches and manually highlight them in a string, demonstrating the 'badness' metric.

```python
import blessings
import ftfy
import ftfy.badness

text = "Ã perturber la rÃ©flexion des thÃ©ologiens jusqu'Ã nos jours"

# We want to highlight the matches to this regular expression:
ftfy.badness.BADNESS_RE.findall(text)

# We'll just highlight it manually:
term = blessings.Terminal()
highlighted_text = term.on_yellow("Ã ") + "perturber la r" + term.on_yellow("Ã©") + "flexion des th" + term.on_yellow("Ã©") + "ologiens jusqu\'Ã nos jours"

# Highlighted text shows matches for the 'badness' expression.
# If we've confirmed from them that this is mojibake, and there's a consistent fix, we
# can fix even text in contexts that were too unclear for the regex, such as the final Ã.

print(highlighted_text)
print(ftfy.fix_text(highlighted_text))
```

--------------------------------

### ftfy Command-line Usage

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/cli.rst

Provides the usage documentation for the 'ftfy' command-line tool. It outlines the available arguments and options for processing text files, including input/output handling, encoding detection, normalization, and entity preservation.

```text
usage: ftfy [-h] [-o OUTPUT] [-g] [-e ENCODING] [-n NORMALIZATION]
            [--preserve-entities]
            [filename]

ftfy (fixes text for you), version 6.0

positional arguments:
  filename              The file whose Unicode is to be fixed. Defaults to -, meaning standard input.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        The file to output to. Defaults to -, meaning standard output.
  -g, --guess           Ask ftfy to guess the encoding of your input. This is risky. Overrides -e.
  -e ENCODING, --encoding ENCODING
                        The encoding of the input. Defaults to UTF-8.
  -n NORMALIZATION, --normalization NORMALIZATION
                        The normalization of Unicode to apply. Defaults to NFC. Can be "none".
  --preserve-entities   Leave HTML entities as they are. The default is to decode them, as long as no HTML tags have appeared in the file.
```
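The `-n` option names a Unicode normalization form, with NFC as the default. As a small illustration of what NFC does, independent of ftfy and using only the standard library's `unicodedata` module, a letter followed by a combining accent is composed into a single code point:

```python
import unicodedata

decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(composed == "\u00e9")  # compare with the precomposed 'é'
```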

--------------------------------

### Fix Text and Explain Mojibake

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/explain.rst

Demonstrates how to use `fix_and_explain` to correct mojibake in text and obtain a step-by-step explanation of the transformations applied. This is useful for understanding encoding issues.

```python
from ftfy import fix_and_explain, apply_plan
shipping_label = "L&AMP;AMP;ATILDE;&AMP;AMP;SUP3;PEZ"
fixed, explanation = fix_and_explain(shipping_label)
print(fixed)
# Output: LóPEZ
print(explanation)
# Output: [('apply', 'unescape_html'), ('apply', 'unescape_html'), ('apply', 'unescape_html'), ('encode', 'latin-1'), ('decode', 'utf-8')]

label2 = "CARR&AMP;AMP;ATILDE;&AMP;AMP;COPY;"
print(apply_plan(label2, explanation))
# Output: CARRé
```
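The explanation is a list of `(action, parameter)` pairs, which is why the same plan can be replayed on another string with the same damage. As a rough illustration of that idea (not ftfy's `apply_plan`), the hypothetical helper `replay_codec_steps` below handles only the `encode` and `decode` actions, using the standard library.

```python
from typing import List, Tuple, Union


def replay_codec_steps(text: str, plan: List[Tuple[str, str]]) -> str:
    """Replay 'encode' and 'decode' steps from a plan; other actions are rejected."""
    value: Union[str, bytes] = text
    for action, parameter in plan:
        if action == "encode" and isinstance(value, str):
            value = value.encode(parameter)
        elif action == "decode" and isinstance(value, bytes):
            value = value.decode(parameter)
        else:
            raise ValueError(f"unsupported step: {(action, parameter)!r}")
    if not isinstance(value, str):
        raise ValueError("plan must end with a decode step")
    return value


plan = [("encode", "latin-1"), ("decode", "utf-8")]
first = replay_codec_steps("LÃ³PEZ", plan)
second = replay_codec_steps("CARRÃ©", plan)  # same damage, same plan
print(first)
print(second)
```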

--------------------------------

### ftfy License Requirements

Source: https://github.com/rspeer/python-ftfy/blob/main/README.md

Outlines the core requirements of the Apache 2.0 license for using and distributing ftfy, including attribution.

```APIDOC
If you use or distribute ftfy, you must follow the terms of the [Apache license](https://www.apache.org/licenses/LICENSE-2.0), including that you must attribute the author of ftfy (Robyn Speer) correctly.
```

--------------------------------

### Opening Files with UTF-8 Encoding in Python

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/avoid.rst

Demonstrates how to open files in Python 3 using UTF-8 encoding and specifying error handling. This is the recommended approach for most text files.

```python
openfile = open(filename, encoding='utf-8', errors='replace')
```
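To see what `errors='replace'` does in practice, the following sketch writes a file containing a byte that is not valid UTF-8 and reads it back. The temporary file and its contents are made up for the illustration.

```python
import os
import tempfile

# Write bytes that are not valid UTF-8 (0xE9 is 'é' in Latin-1).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"caf\xe9 au lait\n")
    filename = tmp.name

try:
    # errors='replace' turns the undecodable byte into U+FFFD instead of raising.
    with open(filename, encoding="utf-8", errors="replace") as infile:
        content = infile.read()
finally:
    os.remove(filename)

print(repr(content))
```

Using a `with` block, as here, also makes sure the file is closed when reading is done.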

--------------------------------

### Encode and Decode Text

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

Demonstrates mojibake by encoding a string in Windows-1252 and then decoding it with MacRoman.

```python
phrase = "Plus ça change, plus c’est la même chose"

phrase.encode('windows-1252').decode('macroman')
```

--------------------------------

### Import and Version Check

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

Imports the ftfy library and accesses its version attribute.

```python
import ftfy
ftfy.__version__
```

--------------------------------

### Show CP437 Character Table

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

Displays the CP437 character table.

```python
show_char_table('cp437')
```

--------------------------------

### ftfy Configuration Options

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/config.rst

This section outlines various configuration options for the ftfy library, allowing users to customize text fixing behavior. Options include disabling HTML unescaping, preserving CJK text spacing, managing quotation marks, and controlling UTF-8 decoding.

```APIDOC
ftfy.fix_text(text, config=None, **kwargs)
  Fixes text using a sequence of fixes.

ftfy.fix_and_explain(text, config=None, **kwargs)
  Fixes text and explains the changes made.

Configuration Options:
  unescape_html (bool): If True, unescapes HTML entities. Set to False to preserve HTML.
  fix_character_width (bool): If True, fixes character width issues, especially for CJK text. Set to False to preserve spacing.
  uncurl_quotes (bool): If True, replaces typographically correct quotes with standard ones. Set to False to preserve them or use smartypants.
  decode_inconsistent_utf8 (bool): If True, attempts to fix decoding errors in UTF-8. Set to False for cautious fixing.

TextFixerConfig:
  An object that holds the configuration for ftfy. Can be passed directly to fix_text and fix_and_explain.
  Keyword arguments can be passed to override default configuration values.
```

--------------------------------

### Recognizing and Fixing Mojibake with ftfy

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

This section demonstrates the use of the `ftfy.fix_and_explain` function to identify and correct various types of mojibake. It takes a potentially corrupted string, fixes it, and provides an explanation of the detected issues.

```python
from ftfy import fix_and_explain
from pprint import pprint

def show_explanation(text):
    print(f"Original: {text}")
    fixed, expl = fix_and_explain(text)
    print(f"   Fixed: {fixed}\n")
    pprint(expl)

EXAMPLES = [
    "Merci de t‚Äö√†√∂¬¨¬©l‚Äö√†√∂¬¨¬©charger le plug-in",

    "The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.",

    "I just figured out how to tweet emojis! â\x9a½í\xa0½í¸\x80í\xa0½í¸\x81í\xa0½í¸"
    "\x82í\xa0½í¸\x86í\xa0½í¸\x8eí\xa0½í¸\x8eí\xa0½í¸\x8eí\xa0½í¸\x8e"
]

show_explanation(EXAMPLES[0])

show_explanation(EXAMPLES[1])

show_explanation(EXAMPLES[2])
```

--------------------------------

### Show ASCII Character Table

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

Displays the ASCII character table using the utility function.

```python
show_char_table("ascii")
```

--------------------------------

### Show MacRoman Character Table

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

Displays the MacRoman character table.

```python
show_char_table('macroman')
```

--------------------------------

### ftfy.ExplainedText Class

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/explain.rst

Describes the structure of the NamedTuple returned by `fix_and_explain` and `fix_encoding_and_explain` functions. It contains the fixed text and a list of applied transformations.

```python
from ftfy import fix_and_explain

text = "Some text"
result = fix_and_explain(text)

# fix_and_explain returns an ExplainedText named tuple
print(type(result).__name__)
# Output: ExplainedText
print(result.text)
# Output: Some text
print(result.explanation)
# Output: []
```

--------------------------------

### ftfy Citation for Research

Source: https://github.com/rspeer/python-ftfy/blob/main/README.md

Provides the recommended citation format for the ftfy library in research contexts, including version and DOI.

```APIDOC
Robyn Speer. (2019). ftfy (Version 5.5). Zenodo.
http://doi.org/10.5281/zenodo.2591652
```

--------------------------------

### ftfy.fix_and_explain Function

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/explain.rst

Similar to `fix_text`, but also returns a detailed explanation of the transformations performed on the entire text. It does not process text line by line, aiming for a unified explanation.

```python
from ftfy import fix_and_explain

text_to_explain = "Another example with &amp;#39;"  # a doubly escaped apostrophe
fixed, explanation = fix_and_explain(text_to_explain)
print(fixed)
# Example Output: Another example with '
print(explanation)
# Example Output: [('apply', 'unescape_html'), ('apply', 'unescape_html')]
```

--------------------------------

### ftfy Test Case Structure

Source: https://github.com/rspeer/python-ftfy/blob/main/tests/test-cases/README.md

Defines the structure of a test case JSON file for the ftfy library. It includes fields for labeling, commenting, original text, and expected fixed text.

```json
{
  "label": "A description of the test case.",
  "comment": "Further details on the test case.",
  "original": "The text to run through ftfy.",
  "fixed-encoding": "(optional) The expected result of ftfy.fix_encoding(original)",
  "fixed": "The expected result of ftfy.fix_text(original)",
  "expect": "pass | fail"
}
```
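A test-case file with this shape can be sanity-checked before it is used. The sketch below is an illustration only: `check_test_case` is a hypothetical helper, the sample case was written for this example, and treating every field except `fixed-encoding` as required is an assumption based on the structure above.

```python
import json

# Field names come from the structure above; which ones are required
# is an assumption of this sketch.
REQUIRED_KEYS = {"label", "comment", "original", "fixed", "expect"}
OPTIONAL_KEYS = {"fixed-encoding"}


def check_test_case(case: dict) -> list:
    """Return a list of problems found in one test-case dict (empty if none)."""
    problems = []
    keys = set(case)
    missing = REQUIRED_KEYS - keys
    unknown = keys - REQUIRED_KEYS - OPTIONAL_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if unknown:
        problems.append(f"unknown keys: {sorted(unknown)}")
    if case.get("expect") not in ("pass", "fail"):
        problems.append('"expect" must be "pass" or "fail"')
    return problems


raw = """
{
  "label": "Simple UTF-8 / Latin-1 mix-up",
  "comment": "Illustrative case written for this sketch",
  "original": "rÃ©flexion",
  "fixed": "réflexion",
  "expect": "pass"
}
"""
case = json.loads(raw)
print(check_test_case(case))
print(check_test_case({"label": "incomplete"}))
```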

--------------------------------

### Show Windows-1251 Character Table

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

Displays the Windows-1251 character table.

```python
show_char_table("windows-1251")
```

--------------------------------

### Displaying Mojibake from DOS NFO Files

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

This snippet shows how to interpret and display text that was originally encoded in CP437 (common in DOS) and then potentially misinterpreted as Windows-1252. It highlights the 'vintage mojibake' often seen in .NFO files.

```python
crack_nfo = r"""
  ───── ────────────── ───────────────── ────────────── ─────────────── ────
 ▄█████▄ ▀█████▄ ████▄ ▀█████████ ████▄   ▄█████▄▀████▄ ▀██████▄ ▀████▄  ▄██▄
 ████████ █████▀ ██████ ▀████████ ██████ ▀███████▌██████  ███████  ████▄█████
 ███ ▀███▌█▀ ▄▄▄█▌▀█████ ▌████ ▄▄▄█▀▐████ ███▀▌▀█▌█▌ ███▌ ██▀▐████ ██████████
 ███  ▐██▌▌  ████  ▌████  ████ ████  ████ ██▌  ▐▌█▌ ████ ██  ████ ██▌▐█▌▐███
 ███ ▄███▌▄▄ ████   ████  ████ ████ ▄████ ██▌  █▄▄█▌ ████ ██ ▄████ ██  █  ███
 ████████ ██ ████   ████  ████ ████ █████ ██▌  ▀▀██▌▐███▀ ██▐█████ ██  ▄  ███
 ██████▀ ▄▀▀ ████   ████  ████ ████ ▀████ █████▄▐██▄▀▀ ▄███ ▀████ ██  ▄  ███
 ███▀ ▄▄██   ████  ▐████  ████ ████  ████ ██▌▐▐██▌█▀██▄ ████  ████ ██     ███
 ███  █████▄  ▀▀█  ████▀ ▐████ ░███ ▐████ ███▄▐██▌█▌▐██▌▐███ ▐███░ ██▌   r███
 ░██  █████████▄   ▐██▌▄██████ ▒░██ █████ ███████▌█▌▐███ ███ ███░▒ ███   o██░
 ▒░█   ▀███████▀  ▀ ███▐████▀  ▓▒░█ ▐███▌ ▀██████▐█▌▐███ ███ ▐█░▒▓ ██▌   y█░▒
 - ▌─────▐▀─ ▄▄▄█ ── ▀▀ ───── ────── ▀▀▀ ─ ▐▀▀▀▀ ▀▀ ████ ──── ▀█▀ ─ ▀ ────▐ ─

   ╓────────────────────────[ RELEASE INFORMATION ]───────────────────────╖
╓────────────────────────────────────────────────────────────────────────────╖
║  -/- THE EVEN MORE INCREDIBLE MACHINE FOR *DOS* FROM SIERRA/DYNAMIX -/-  ║
╙────────────────────────────────────────────────────────────────────────────╜
"""

print(crack_nfo.encode('cp437').decode('windows-1252'))
```

--------------------------------

### Badness Heuristic Functions - ftfy.badness

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/heuristic.rst

Provides functions for calculating the 'badness' of text, a heuristic for detecting mojibake. It includes the main `badness` function and `is_bad` for checking if text is considered bad.

```python
import ftfy.badness

# Example usage:
text = "This is some text."
score = ftfy.badness.badness(text)
print(f"Badness score: {score}")

is_text_bad = ftfy.badness.is_bad(text)
print(f"Is text bad? {is_text_bad}")
```

--------------------------------

### Show Windows-1252 Character Table

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

Displays the Windows-1252 character table.

```python
show_char_table('windows-1252')
```

--------------------------------

### Fix Multiple Layers of Mojibake

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/index.rst

Shows how ftfy can resolve text that has undergone several layers of encoding corruption, a common issue with legacy systems or complex data pipelines.

```python
import ftfy

ftfy.fix_text('The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.')
```

--------------------------------

### Show Latin-1 Character Table

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

Displays the Latin-1 character table, highlighting the 'here be dragons' section.

```python
show_char_table('latin-1')
```

--------------------------------

### Display Character Table

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

A utility function to display character tables for different encodings, highlighting printable and control characters. It uses the blessings library for terminal formatting and ftfy.formatting for centering text.

```python
import blessings
import ftfy
import ftfy.formatting

term = blessings.Terminal()  # enable colorful text

def displayable_codepoint(codepoint, encoding):
    char = bytes([codepoint]).decode(encoding, 'replace')
    if char == '\ufffd':  # the byte has no character in this encoding
        return '▓▓'
    elif not char.isprintable():
        return '░░'
    else:
        return char

def show_char_table(encoding):
    print(f"encoding: {encoding}\n     0 1 2 3 4 5 6 7 8 9 a b c d e f\n")
    for row in range(16):
        print(f"{row*16:>02x}", end="   ")
        if row == 0:
            print(ftfy.formatting.display_center(term.green(" control characters "), 32, "░"))
        elif row == 8 and encoding == 'latin-1':
            print(ftfy.formatting.display_center(term.green(" here be dragons "), 32, "░"))
        else:
            for col in range(16):
                char = displayable_codepoint(row * 16 + col, encoding)
                print(f"{char:<2}", end="")
            print()

```

--------------------------------

### UTF-8 Character Encoding Visualization

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

This function demonstrates how characters are represented in UTF-8 by encoding them and displaying the resulting byte sequence. It helps visualize the multi-byte nature of UTF-8 for non-ASCII characters.

```python
# Code to look at the encoding of each character in UTF-8

def show_utf8(text):
    for char in text:
        char_bytes = char.encode('utf-8')
        byte_sequence = ' '.join([f"{byte:>02x}" for byte in char_bytes])
        print(f"{char} = {byte_sequence}")

text = "l’Hôpital"
show_utf8(text)
```

--------------------------------

### Fix Mojibake Text

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

Demonstrates the primary function of the ftfy library by fixing a string containing mojibake.

```python
ftfy.fix_text("merci de t‚Äö√†√∂¬¨¬©l‚Äö√†√∂¬¨¬©charger le plug-in")
```

--------------------------------

### Decoding Bytes to Text as UTF-8 in Python

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/avoid.rst

Shows how to decode a byte buffer into a text string using UTF-8 encoding in Python. This is used when converting raw bytes to readable text.

```python
text = bytebuffer.decode('utf-8', 'replace')
```

--------------------------------

### Fix Mojibake with Non-Breaking Spaces

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/index.rst

Demonstrates ftfy's handling of mojibake where non-breaking spaces (U+00A0) were incorrectly converted to ASCII spaces, potentially leading to multiple spaces.

```python
import ftfy

ftfy.fix_text('Ã\xa0 perturber la rÃ©flexion')
ftfy.fix_text('Ã perturber la rÃ©flexion')
```
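The reason the non-breaking space matters is that `à` is the bytes `C3 A0` in UTF-8, and byte `A0` read as Latin-1 is a non-breaking space. Once that character becomes an ordinary space, the plain encode/decode reversal no longer works. The following standard-library sketch shows the failure and a hand-made repair; it is a simplification of the idea, not ftfy's implementation, and `undo_latin1_mojibake` is a name invented for this example.

```python
intact = "Ã\xa0 perturber la rÃ©flexion"  # non-breaking space preserved
damaged = "Ã perturber la rÃ©flexion"     # it became an ordinary space


def undo_latin1_mojibake(text: str) -> str:
    """Reverse a UTF-8-read-as-Latin-1 mix-up using only the standard library."""
    return text.encode("latin-1").decode("utf-8")


print(undo_latin1_mojibake(intact))

try:
    undo_latin1_mojibake(damaged)
    decodable = True
except UnicodeDecodeError as error:
    decodable = False
    print("cannot decode:", error.reason)

# Putting U+00A0 back after the first 'Ã' makes the sequence decodable again.
repaired = damaged.replace("Ã ", "Ã\xa0 ", 1)
print(undo_latin1_mojibake(repaired))
```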

--------------------------------

### ftfy Restrictions for AI Training

Source: https://github.com/rspeer/python-ftfy/blob/main/README.md

Specifies restrictions on creating derived works from ftfy, particularly those involving AI training datasets or obscuring authorship.

```APIDOC
You _may not_ make a derived work of ftfy that obscures its authorship, such as by putting its code in an AI training dataset, including the code in AI training at runtime, or using a generative AI that copies code from such a dataset.
```

--------------------------------

### Fix Mojibake (Encoding Mix-ups)

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/index.rst

Demonstrates ftfy's ability to correct text where characters were misinterpreted due to incorrect encoding. It handles common UTF-8 decoding errors.

```python
import ftfy

ftfy.fix_text('âœ” No problems')
```

--------------------------------

### ftfy BibTeX Citation

Source: https://github.com/rspeer/python-ftfy/blob/main/README.md

Presents the citation for the ftfy library in BibTeX format, suitable for LaTeX documents.

```BibTeX
@misc{speer-2019-ftfy,
  author       = {Robyn Speer},
  title        = {ftfy},
  note         = {Version 5.5},
  year         = 2019,
  howpublished = {Zenodo},
  doi          = {10.5281/zenodo.2591652},
  url          = {https://doi.org/10.5281/zenodo.2591652}
}
```

--------------------------------

### ftfy.explain_unicode Function

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/explain.rst

A utility function that breaks down a string into its constituent Unicode characters, providing details about each character. Useful for understanding the composition of strings.

```python
from ftfy import explain_unicode

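unicode_string = "Héllo"
# explain_unicode prints one line per character and returns None
explain_unicode(unicode_string)
# Output:
# U+0048  H       [Lu] LATIN CAPITAL LETTER H
# U+00E9  é       [Ll] LATIN SMALL LETTER E WITH ACUTE
# U+006C  l       [Ll] LATIN SMALL LETTER L
# U+006C  l       [Ll] LATIN SMALL LETTER L
# U+006F  o       [Ll] LATIN SMALL LETTER O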
```
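The same kind of per-character breakdown can be approximated with the standard library's `unicodedata` module. This is a minimal sketch of the idea, not ftfy's code; `describe_characters` is a hypothetical helper and its column spacing is simplified.

```python
import unicodedata


def describe_characters(text: str) -> list:
    """Return one 'U+XXXX  char  [category] NAME' line per character."""
    lines = []
    for char in text:
        name = unicodedata.name(char, "<unknown>")
        category = unicodedata.category(char)
        lines.append(f"U+{ord(char):04X}  {char}  [{category}] {name}")
    return lines


for line in describe_characters("Héllo"):
    print(line)
```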

--------------------------------

### ftfy.fix_encoding and ftfy.fix_encoding_and_explain Functions

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/explain.rst

These functions specifically address encoding and decoding problems, excluding other issues like HTML entities. `fix_encoding` returns only the fixed string, while `fix_encoding_and_explain` provides an explanation.

```python
from ftfy import fix_encoding, fix_encoding_and_explain

encoding_issue_text = "\xc3\xa9cole"
fixed_string = fix_encoding(encoding_issue_text)
print(fixed_string)
# Output: école

fixed_str, explanation = fix_encoding_and_explain(encoding_issue_text)
print(fixed_str)
# Output: école
print(explanation)
# Output: [('encode', 'latin-1'), ('decode', 'utf-8')]
```

--------------------------------

### Fix Mojibake with Curly Quotes

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/index.rst

Illustrates ftfy's capability to fix mojibake that has been combined with the incorrect rendering of 'curly quotes', requiring a two-step correction process.

```python
import ftfy

ftfy.fix_text("l’humanitÃ©")
```

--------------------------------

### Lossy UTF-8 Heuristic - ftfy.chardata

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/heuristic.rst

Describes the `LOSSY_UTF8_RE` regular expression in `chardata.py`. This heuristic targets sequences that appear to be incorrectly decoded UTF-8, where characters are replaced by question marks or the Unicode replacement character (''). It's utilized by `ftfy.fixes.replace_lossy_sequences`.

```python
import ftfy.fixes

# LOSSY_UTF8_RE is defined in ftfy/chardata.py; the regex itself is not
# reproduced here. ftfy applies it through ftfy.fixes.replace_lossy_sequences
# while fixing encodings, so in practice you reach it via fix_encoding or fix_text.

# Conceptual example: mojibake curly quotes where the last byte was lost
# and shows up as the replacement character.
# text_with_replacements = "â€œlossy decodingâ€\ufffd"
# cleaned_text = ftfy.fix_encoding(text_with_replacements)

print("The lossy UTF-8 heuristic uses the regex LOSSY_UTF8_RE in chardata.py.")
print("It replaces mojibake sequences containing '?' or '\ufffd' with the replacement character.")
```

--------------------------------

### UTF-8 Detector Heuristic - ftfy.chardata

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/heuristic.rst

Details the `UTF8_DETECTOR_RE` regular expression in `chardata.py`. This heuristic identifies specific sequences of mojibake characters resulting from common UTF-8 decoding errors. It's used in `ftfy.fixes.decode_inconsistent_utf8`.

```python
import re
import ftfy.fixes

# Assuming UTF8_DETECTOR_RE is accessible or its logic is demonstrated
# This is a conceptual example as the regex itself is not provided in the text.
# In practice, you would use ftfy.fixes.decode_inconsistent_utf8 directly.

# Example of how it might be used (conceptual):
# mojibake_text = "..."
# fixed_text = ftfy.fixes.decode_inconsistent_utf8(mojibake_text)

print("The UTF-8 detector heuristic uses the regex UTF8_DETECTOR_RE in chardata.py.")
print("It helps fix text that has specific UTF-8 decoding errors.")
```

--------------------------------

### UTF-8 to Windows-1252 Conversion

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

This snippet encodes a string containing non-ASCII characters as UTF-8 and then decodes the bytes as Windows-1252. Each non-ASCII character comes out as two or three characters of mojibake.

```python
text = "l’Hôpital"
print(text.encode('utf-8').decode('windows-1252'))
```

--------------------------------

### ftfy.fix_encoding

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/config.rst

A specialized function to detect and repair decoding errors (mojibake) in text. While it can be used independently, mojibake might be entangled with other text issues, and limiting the process to this step could make some mojibake unfixable.

```python
import ftfy

fixed_text = ftfy.fix_encoding(text_with_mojibake)
```

--------------------------------

### Decode HTML Entities Outside HTML

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/index.rst

Shows ftfy's ability to correctly decode HTML entities that appear in plain text contexts, even when they are not properly formed or capitalized according to standards.

```python
import ftfy

# by the HTML 5 standard, only 'P&Eacute;REZ' is acceptable
ftfy.fix_text('P&EACUTE;REZ')
```
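For comparison, the standard library's `html.unescape` follows the standard entity names, so it decodes the correctly capitalized form and leaves the all-caps one untouched. The snippet below uses only the standard library and is meant to show why ftfy's more lenient handling is useful here.

```python
import html

standard = html.unescape("P&Eacute;REZ")  # standard capitalization
shouting = html.unescape("P&EACUTE;REZ")  # all-caps name is not a known entity

print(standard)
print(shouting)
```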

--------------------------------

### ftfy.fixes Module Functions

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/fixes.rst

This section details the various text-fixing functions available in the ftfy.fixes module. These functions address specific text normalization tasks, such as decoding escaped characters, fixing inconsistent UTF-8 encoding, handling control characters, and uncurling quotes.

```APIDOC
ftfy.fixes:
  decode_escapes(text: str) -> str
    Decodes escape sequences within a string.

  decode_inconsistent_utf8(text: str) -> str
    Fixes strings with inconsistent or incorrect UTF-8 encoding.

  fix_c1_controls(text: str) -> str
    Replaces C1 control characters with appropriate replacements.

  fix_character_width(text: str) -> str
    Normalizes character widths, particularly for East Asian characters.

  fix_latin_ligatures(text: str) -> str
    Replaces common Latin ligatures with their constituent characters.

  fix_line_breaks(text: str) -> str
    Normalizes different types of line break characters.

  fix_surrogates(text: str) -> str
    Handles and potentially replaces surrogate characters.

  remove_control_chars(text: str) -> str
    Removes common control characters from a string.

  remove_terminal_escapes(text: str) -> str
    Removes ANSI escape codes often found in terminal output.

  replace_lossy_sequences(byts: bytes) -> bytes
    Replaces UTF-8-like byte sequences in which information was lost with the UTF-8 encoding of U+FFFD.

  restore_byte_a0(byts: bytes) -> bytes
    Restores byte 0xA0 (the non-breaking space, U+00A0) where it was replaced by an ASCII space inside a mojibake sequence.

  uncurl_quotes(text: str) -> str
    Replaces curly quotation marks with standard straight quotes.

  unescape_html(text: str) -> str
    Unescapes HTML entities within a string.
```

--------------------------------

### Fix Mascot Text

Source: https://github.com/rspeer/python-ftfy/blob/main/notebook/ftfy talk.ipynb

Fixes a string in which an emoticon, built from Thai letters and combining accents, has been through two rounds of mojibake.

```python
ftfy.fix_text("(Ã\xa0Â¸â€¡'ÃŒâ‚¬Ã¢Å’Â£'ÃŒÂ\x81)Ã\xa0Â¸â€¡")
```

--------------------------------

### Guess Bytes Encoding

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/detect.rst

This function attempts to guess the encoding of a byte sequence. It relies on strong signals like UTF-16 byte-order marks or successful UTF-8 decoding. It cannot guess non-Unicode CJK encodings such as Shift-JIS or Big5.

```python
def guess_bytes(bstring):  # parameter renamed so it does not shadow the built-in `bytes`
    """Guess the encoding of a byte sequence.

    This function attempts to be less terrible than other byte-encoding-guessers
    in common cases. Instead of using probabilistic heuristics, it picks up
    on very strong signals like "having a UTF-16 byte-order mark" or
    "decoding successfully as UTF-8".

    This function won't solve everything. It can't solve everything. In particular,
    it has no capacity to guess non-Unicode CJK encodings such as Shift-JIS
    or Big5.
    """
    pass
```
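The "strong signals" approach described in the docstring can be sketched with the standard library. This is not ftfy's implementation: `guess_bytes_sketch` is a hypothetical name, and the Windows-1252 fallback with `errors="replace"` is an assumption made to keep the example short.

```python
import codecs
from typing import Tuple


def guess_bytes_sketch(data: bytes) -> Tuple[str, str]:
    """Decode bytes using strong signals only; returns (text, encoding_name)."""
    # Signal 1: a UTF-16 byte-order mark.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16"), "utf-16"
    # Signal 2: the bytes decode cleanly as UTF-8.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # Fallback (an assumption of this sketch): a single-byte encoding,
    # replacing the few bytes Windows-1252 leaves undefined.
    return data.decode("windows-1252", errors="replace"), "windows-1252"


from_utf8 = guess_bytes_sketch("voilà".encode("utf-8"))
from_utf16 = guess_bytes_sketch("voilà".encode("utf-16"))
from_cp1252 = guess_bytes_sketch("voilà".encode("windows-1252"))
print(from_utf8, from_utf16, from_cp1252)
```

As with the function documented above, nothing here can recognize encodings such as Shift-JIS or Big5; bytes in those encodings would fall through to the single-byte fallback and come out garbled.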

--------------------------------

### ftfy.fix_text Function

Source: https://github.com/rspeer/python-ftfy/blob/main/docs/explain.rst

The primary function for fixing text encoding issues. It processes the input string, applies all possible fixes, and returns the cleaned text. It operates on lines of text independently.

```python
from ftfy import fix_text

text_with_errors = "This text has some encoding issues like \xe2\x80\x93 a dash."
fixed_text = fix_text(text_with_errors)
print(fixed_text)
# Example Output: This text has some encoding issues like – a dash.
```
