### Get Tree Builders for DOM Implementations in html5lib

Source: https://context7.com/html5lib/html5lib-python/llms.txt

Demonstrates how to retrieve tree builder classes for the supported tree implementations: ElementTree (`etree`), DOM minidom (`dom`), and lxml. Each builder is passed to `html5lib.HTMLParser`, and the resulting tree is then used through its native API. The example also handles the case where lxml is not installed and shows how to specify an implementation module explicitly.

```python
import html5lib
import xml.etree.ElementTree as ET

# Get default ElementTree builder (fast, stdlib)
ETreeBuilder = html5lib.getTreeBuilder("etree")
parser_etree = html5lib.HTMLParser(tree=ETreeBuilder)
doc_etree = parser_etree.parse("Content")

# Get DOM minidom builder (standard DOM API)
DOMBuilder = html5lib.getTreeBuilder("dom")
parser_dom = html5lib.HTMLParser(tree=DOMBuilder)
doc_dom = parser_dom.parse("Content")

# Use DOM methods
print(doc_dom.getElementsByTagName('body')[0].toxml())

# Get lxml builder (CPython only, full XPath/XSLT support)
try:
    LxmlBuilder = html5lib.getTreeBuilder("lxml")
    # namespaceHTMLElements=False keeps elements out of the XHTML namespace,
    # so the unprefixed XPath expression below can match them.
    parser_lxml = html5lib.HTMLParser(tree=LxmlBuilder, namespaceHTMLElements=False)
    doc_lxml = parser_lxml.parse("<p class='test'>Text</p>")

    # Use lxml features
    paragraphs = doc_lxml.xpath("//p[@class='test']")
    print(paragraphs[0].text)  # 'Text'
except ImportError:
    print("lxml not available")

# Specify implementation module explicitly
ETreeBuilder = html5lib.getTreeBuilder("etree", implementation=ET)
```

--------------------------------

### Interface Definition Example

Source: https://github.com/html5lib/html5lib-python/blob/master/benchmarks/data/html.html

Demonstrates the syntax for defining an interface in IDL (Interface Definition Language), which is commonly used to describe web APIs. The example defines an interface named 'Example' whose body contains only a comment.

```idl
interface Example {
  // this is an IDL definition
};
```

--------------------------------

### CSS Fragment Example

Source: https://github.com/html5lib/html5lib-python/blob/master/benchmarks/data/html.html

Shows a minimal fragment of CSS (Cascading Style Sheets), the language used for styling web pages. The fragment consists of a single C-style comment.

```css
/* this is a CSS fragment */
```

--------------------------------

### Install html5lib (Bash)

Source: https://github.com/html5lib/html5lib-python/blob/master/README.rst

Installs the html5lib Python package using pip. The command is run in a bash shell.

```bash
pip install html5lib
```

--------------------------------

### Method Call Syntax Example

Source: https://github.com/html5lib/html5lib-python/blob/master/benchmarks/data/html.html

Illustrates the general syntax for calling a method on an object, including optional arguments. This pattern is common in object-oriented programming languages.

```pseudocode
variable = object . `method`( [ optionalArgument ] )
```

--------------------------------

### HTML Custom Attribute Extension Example

Source: https://github.com/html5lib/html5lib-python/blob/master/benchmarks/data/html.html

Demonstrates how to use vendor-specific custom attributes for experimental features in HTML.
Attributes starting with 'x-' are reserved for user agent use and are guaranteed not to conflict with future HTML specifications.

```html
This smells of lemons!
```

--------------------------------

### JavaScript DOM Manipulation Examples

Source: https://github.com/html5lib/html5lib-python/blob/master/benchmarks/data/html.html

Demonstrates how to interact with the DOM using JavaScript. The example obtains a link element by its index and then modifies its 'href' attribute, its protocol, or its content attribute via `setAttribute`.

```javascript
var a = document.links[0]; // obtain the first link in the document
a.href = 'sample.html'; // change the destination URL of the link
a.protocol = 'https'; // change just the scheme part of the URL
a.setAttribute('href', 'http://example.com/'); // change the content attribute directly
```

--------------------------------

### Algorithm Step Example

Source: https://github.com/html5lib/html5lib-python/blob/master/benchmarks/data/html.html

Shows how steps that run within synchronous sections of an algorithm are marked. The '⌛' symbol denotes these synchronous steps.

```pseudocode
In an algorithm, steps in synchronous sections are marked with ⌛.
```

--------------------------------

### Setup default policy (JavaScript)

Source: https://github.com/html5lib/html5lib-python/blob/master/benchmarks/data/wpt/random/worker-constructor.https.html

Sets up a default Trusted Types policy named 'default'. The policy transforms script URLs by replacing 'potato' with 'https', which allows certain URLs that might otherwise be blocked.

```javascript
promise_test(t => {
  trustedTypes.createPolicy("default", {
    createScriptURL: s => s.replace("potato", "https")
  });
  return Promise.resolve();
}, "Setup default policy.");
```

--------------------------------

### CSS Grid auto-fill Rows Syntax Examples

Source: https://github.com/html5lib/html5lib-python/blob/master/benchmarks/data/wpt/weighted/grid-auto-fill-rows-001.html

Demonstrates various CSS Grid `repeat(auto-fill, ...)` syntaxes for defining row tracks.
These examples showcase how to combine `auto-fill` with fixed track sizes, percentages, `minmax()`, and named grid lines to control row layout.

```css
.grid {
  border: 2px solid magenta;
  height: 200px;
  width: 25px;
  align-content: start;
  grid-auto-rows: 157px;
  grid-auto-columns: 25px;
  float: left;
  position: relative;
  margin-right: 2px;
}

.gridOnlyAutoRepeat {
  grid-template-rows: repeat(auto-fill, 30px [autobar]);
}

.gridAutoRepeatAndFixedBefore {
  grid-template-rows: 10px [foo] 20% [bar] repeat(auto-fill, [autofoo] 35px);
}

.gridAutoRepeatAndFixedAfter {
  grid-template-rows: repeat(auto-fill, [first] 30px [last]) [foo] minmax(60px, 80px) [bar] minmax(45px, max-content);
}

.gridAutoRepeatAndFixed {
  grid-template-rows: [start] repeat(2, 50px [a]) [middle] repeat(auto-fill, [autofoo] 15px [autobar]) minmax(5%, 10%) [end];
}

.gridMultipleNames {
  grid-template-rows: [start] 20px [foo] 50% repeat(auto-fill, [bar] 20px [start foo]) [foo] 10% [end bar];
}

.gridMultipleTracks {
  grid-template-rows: [start] 20px repeat(auto-fill, [a] 2em [b c] 10% [d]) [e] minmax(75px, 1fr) [last];
}

.item { background-color: blue; }
.item:nth-child(2) { background: green; }
.item:nth-child(3) { background: orange; }

.gap { grid-row-gap: 20px; }
```

--------------------------------

### JavaScript Event Extension Example

Source: https://github.com/html5lib/html5lib-python/blob/master/benchmarks/data/html.html

Illustrates how to prefix custom event types with a vendor-specific string to prevent naming conflicts with future specification additions. This approach ensures that experimental event handling remains distinct.
```javascript
if (user_agent == "Pleasold") {
  // User agent 'Pleasold' adds a 'pleasoldgoingup' event
  element.addEventListener('pleasoldgoingup', handleElevatorEvent);
  element.setAttribute('onpleasoldgoingup', 'handleElevatorEvent()');
}
```

--------------------------------

### JavaScript DOM Extension Example

Source: https://github.com/html5lib/html5lib-python/blob/master/benchmarks/data/html.html

Shows an example of adding a vendor-specific IDL attribute to a DOM interface to avoid conflicts with future HTML specifications. The attribute name in IDL drops the 'x-' prefix from the content attribute.

```javascript
// Example: Implementation might add a 'fooTypeTime' attribute
// corresponding to a content attribute like 'x-foo-type-time'

// In IDL:
interface HTMLControlElement {
  // ... other members ...
  attribute long fooTypeTime;
};

// In HTML:
//
```

--------------------------------

### HTML Attribute Examples

Source: https://github.com/html5lib/html5lib-python/blob/master/benchmarks/data/html.html

Illustrates various ways to define attributes within HTML tags, including unquoted values, empty attributes, and attributes with quoted values (single and double quotes).

```html
simple
```

--------------------------------

### Get Tree Walker for Traversal in html5lib

Source: https://context7.com/html5lib/html5lib-python/llms.txt

Demonstrates how to obtain a tree walker using `html5lib.getTreeWalker` for a specific backend (e.g., 'etree'). The example parses an HTML document, creates a walker instance from the parsed tree, and iterates over the walker to process the document as a stream of tokens. Each token is a dictionary and is printed to the console.

```python
import html5lib

# Parse document
tree = html5lib.parse("""
Test

Heading

Paragraph
""")

# Get walker class and create instance
Walker = html5lib.getTreeWalker("etree")
walker = Walker(tree)

# Iterate over tokens; each token is a dict
for token in walker:
    print(token)
```

--------------------------------

### Basic HTML Document with CSS Styling

Source: https://github.com/html5lib/html5lib-python/blob/master/benchmarks/data/html.html

Demonstrates a fundamental HTML5 document structure. It includes a title, a header, and a paragraph, with inline CSS that sets the background to navy and the text color to yellow. The example illustrates how to apply basic styling directly within the HTML.

```html
Sample styled page

Sample styled page

This page is just a demo.

```

--------------------------------

### Parsing HTML with a specific tree builder class in Python

Source: https://github.com/html5lib/html5lib-python/blob/master/doc/movingparts.rst

Shows how to get a tree builder class by name using `html5lib.getTreeBuilder` and pass it to the `HTMLParser` constructor. This gives more control over tree construction, for example by selecting the 'dom' tree builder.

```python
import html5lib

TreeBuilder = html5lib.getTreeBuilder("dom")
parser = html5lib.HTMLParser(tree=TreeBuilder)
minidom_document = parser.parse("<p>Hello World!")
```

--------------------------------

### Sanitizing HTML using a filter in Python

Source: https://github.com/html5lib/html5lib-python/blob/master/doc/movingparts.rst

Demonstrates how to sanitize an HTML string to remove unsafe markup, such as script tags, using `sanitizer.Filter` from `html5lib.filters`. The example parses the HTML into a 'dom' tree, gets a walker for it, and wraps the walker's token stream in the filter. Serializing the filtered stream then yields the sanitized HTML.

```python
import html5lib
from html5lib.filters import sanitizer

# Input containing a script element
dom = html5lib.parse("<p><script>alert('Boo!')", treebuilder="dom")

# Walk the tree and wrap the token stream in the sanitizer filter
walker = html5lib.getTreeWalker("dom")
stream = walker(dom)
clean_stream = sanitizer.Filter(stream)
```

--------------------------------

### Serialize HTML Tree to String with html5lib

Source: https://context7.com/html5lib/html5lib-python/llms.txt

Illustrates how to serialize an HTML tree back into an HTML string using `html5lib.serialize`. The example covers basic serialization, serialization with a specified encoding (which returns bytes), and serialization with options such as keeping optional tags, not minimizing boolean attributes, always quoting attribute values, and sorting attributes alphabetically. It also shows how to serialize an lxml tree and how to use a trailing solidus for void elements.

```python
import html5lib

# Parse document
tree = html5lib.parse("Hello")

# Basic serialization (returns string)
html_string = html5lib.serialize(tree, tree="etree")
print(html_string)  # Output contains "Hello"
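# A minimal sketch of how a serializer option changes the result, assuming
# html5lib is installed. serialize() forwards extra keyword arguments to
# HTMLSerializer. The names `sample_tree`, `compact` and `verbose` are
# hypothetical and used only for this illustration.
import html5lib

sample_tree = html5lib.parse("<p>Hello</p>")
compact = html5lib.serialize(sample_tree, tree="etree")
verbose = html5lib.serialize(sample_tree, tree="etree", omit_optional_tags=False)
print(compact)  # optional tags are omitted by default
print(verbose)  # every start and end tag is written out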

# Serialize with encoding (returns bytes)
html_bytes = html5lib.serialize(tree, tree="etree", encoding="utf-8")
print(type(html_bytes))  # <class 'bytes'>

# Serialize with options
html_formatted = html5lib.serialize(
    tree,
    tree="etree",
    omit_optional_tags=False,           # Keep all tags
    minimize_boolean_attributes=False,  # disabled="disabled"
    quote_attr_values="always",         # Always quote attributes
    alphabetical_attributes=True        # Sort attributes
)

# Serialize lxml tree
lxml_tree = html5lib.parse("Test", treebuilder="lxml")
html_output = html5lib.serialize(lxml_tree, tree="lxml")

# Serialize with trailing solidus for void elements
html_xhtml = html5lib.serialize(
    tree,
    tree="etree",
    use_trailing_solidus=True,          # void elements end with "/", e.g. <br />
    space_before_trailing_solidus=True
)
```

--------------------------------

### Create Worker via string with default policy (JavaScript)

Source: https://github.com/html5lib/html5lib-python/blob/master/benchmarks/data/wpt/random/worker-constructor.https.html

Tests Worker creation from a string URL after a default Trusted Types policy has been set up. The default policy is expected to rewrite the URL, so the Worker is created successfully.

```javascript
const default_url = "support/WorkerGlobalScope-importScripts.potato.js"

promise_test(t => {
  new Worker(default_url);
  return Promise.resolve();
}, "Create Worker via string with default policy.");
```

--------------------------------

### Configure HTMLSerializer for Custom Output in html5lib

Source: https://context7.com/html5lib/html5lib-python/llms.txt

Demonstrates creating and using a custom `HTMLSerializer` instance from `html5lib.serializer` for fine-grained control over HTML output formatting. It shows how to configure options such as boolean attribute minimization, attribute quoting, tag omission, trailing solidus usage, entity resolution, attribute sorting, and whitespace handling. It also covers serialization with encoding and streaming serialization that yields chunks.

```python
from html5lib import parse, getTreeWalker
from html5lib.serializer import HTMLSerializer

tree = parse("""
Photo

Content
""")

# Create serializer with custom options
serializer = HTMLSerializer(
    # Boolean attributes
    minimize_boolean_attributes=True,  # disabled vs disabled="disabled"

    # Attribute quoting
    quote_attr_values="legacy",  # "legacy", "spec", or "always"
    quote_char='"',  # or "'"
    use_best_quote_char=True,  # Auto-select " or ' based on content
    escape_lt_in_attrs=False,  # Escape < in attribute values

    # Tag options
    omit_optional_tags=False,  # Don't omit optional tags (e.g. <html>, <body>)
    use_trailing_solidus=False,  # True writes <br/> instead of <br>
    space_before_trailing_solidus=True,  # <br /> vs <br/> when a solidus is written

    # Content options
    escape_rcdata=False,  # Escape in