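### Serialize Element to HTML with ElementRef::html and inner_html
Source: https://context7.com/rust-scraper/scraper/llms.txt
Use `html()` to get the full HTML string of an element, including the element's own tags. Use `inner_html()` to get only the HTML of its children. Both are useful for serializing parts of the DOM back into strings.
```rust
use scraper::{Html, Selector};
// Assumed input markup (any fragment with a <p> inside a container behaves the same way).
let html = r#"<div><p>Hello, world!</p></div>"#;
let document = Html::parse_fragment(html);
// Select nested element
let p_sel = Selector::parse("p").unwrap();
let p = document.select(&p_sel).next().unwrap();
println!("Paragraph html: {}", p.html());
// Output: Paragraph html: <p>Hello, world!</p>
println!("Paragraph inner: {}", p.inner_html());
// Output: Paragraph inner: Hello, world!
```
--------------------------------
### Serialize HTML and Inner HTML in Rust
Source: https://github.com/rust-scraper/scraper/blob/master/scraper/README.md
Get the HTML representation of a selected element with `.html()`, or only its inner HTML content with `.inner_html()`. This is useful for extracting or displaying specific parts of the DOM.
```rust
use scraper::{Html, Selector};
let fragment = Html::parse_fragment("<h1>Hello, <i>world!</i></h1>");
let selector = Selector::parse("h1").unwrap();
let h1 = fragment.select(&selector).next().unwrap();
assert_eq!("<h1>Hello, <i>world!</i></h1>", h1.html());
assert_eq!("Hello, <i>world!</i>", h1.inner_html());
```
--------------------------------
### Read Element Attributes with ElementRef::attr
Use `attr()` to read an attribute value by name. It returns an `Option<&str>`, so a missing attribute can be given a default with `unwrap_or`, and a boolean attribute such as `required` can be detected with `is_some()`.
```rust
use scraper::{Html, Selector};
// Assumed input markup, inferred from the outputs shown below.
let html = r#"
<a href="https://example.com">Example</a>
<img src="/image.png" alt="Description">
<input type="text" required>
"#;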
let document = Html::parse_fragment(html);
// Get href from link
let link_sel = Selector::parse("a").unwrap();
let link = document.select(&link_sel).next().unwrap();
println!("URL: {}", link.attr("href").unwrap_or("no href"));
// Output: URL: https://example.com
// Get multiple attributes from image
let img_sel = Selector::parse("img").unwrap();
let img = document.select(&img_sel).next().unwrap();
println!(
    "Image src: {}, alt: {}",
    img.attr("src").unwrap_or(""),
    img.attr("alt").unwrap_or("no alt")
);
// Output: Image src: /image.png, alt: Description
// Check for boolean attribute
let input_sel = Selector::parse("input").unwrap();
let input = document.select(&input_sel).next().unwrap();
let is_required = input.attr("required").is_some();
println!("Input is required: {}", is_required);
// Output: Input is required: true
```
--------------------------------
### Extract Text Content with ElementRef::text
Source: https://context7.com/rust-scraper/scraper/llms.txt
Use `ElementRef::text()` to iterate over every text node inside an element, including text in nested elements. Collect the iterator into a `Vec` to keep the individual segments, or into a `String` to concatenate them.
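When the segments carry stray whitespace, a common approach is to trim each one, drop the empties, and join the rest. The sketch below is an illustration only: `join_text` is a hypothetical helper name, not part of scraper, and it accepts any iterator of `&str` (the item type that `ElementRef::text()` yields), so it runs without the crate. The sample segments stand in for the text nodes of a paragraph with two inline elements.

```rust
/// Hypothetical helper (not part of scraper): trims every text segment,
/// drops segments that are empty after trimming, and joins the rest
/// with single spaces.
fn join_text<'a, I>(segments: I) -> String
where
    I: IntoIterator<Item = &'a str>,
{
    segments
        .into_iter()
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Stand-in for `paragraph.text()`; these segments are assumed sample data.
    let segments = vec!["Hello, ", "bold", " and ", "italic", " text!"];
    println!("{}", join_text(segments));
    // prints: Hello, bold and italic text!
}
```

Joining with a space inserts a space at every text-node boundary, so `world` followed by `!` in separate nodes would become `world !`. Collecting directly into a `String` is the better choice when the original spacing must be preserved.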
```rust
use scraper::{Html, Selector};
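let html = r#"<p>Hello, <strong>bold</strong> and <em>italic</em> text!</p>"#;
let document = Html::parse_fragment(html);
let selector = Selector::parse("p").unwrap();
let paragraph = document.select(&selector).next().unwrap();
// Collect all text segments
let text_parts: Vec<_> = paragraph.text().collect();
println!("Text parts: {:?}", text_parts);
// Output: Text parts: ["Hello, ", "bold", " and ", "italic", " text!"]
// Join into single string
let full_text: String = paragraph.text().collect();
println!("Full text: {}", full_text);
// Output: Full text: Hello, bold and italic text!
```
```rust
use scraper::{Html, Selector};
// Assumed input markup (any element with class "article" works here).
let html = r#"
<div class="article">
    <h1>Title</h1>
    <p>First paragraph.</p>
</div>
"#;
let document = Html::parse_fragment(html);
let article_sel = Selector::parse(".article").unwrap();
let article = document.select(&article_sel).next().unwrap();
// Trim each text segment, drop the empty ones, and join the rest.
// The space separator is an assumption.
let all_text: String = article.text()
    .map(|s| s.trim())
    .filter(|s| !s.is_empty())
    .collect::<Vec<_>>()
    .join(" ");
```
--------------------------------
### Parse a Full Document with Html::parse_document
Use `Html::parse_document()` to parse a complete HTML document. `root_element()` returns the root `html` element, and `html()` serializes the whole document back to a string.
```rust
use scraper::Html;
// Assumed input markup: a minimal document around the paragraph text.
let html = r#"
<html>
<head><title>Example</title></head>
<body><p>This is a paragraph.</p></body>
</html>
"#;
let document = Html::parse_document(html);
// Access the root element
let root = document.root_element();
println!("Root element: {}", root.value().name());
// Output: Root element: html
// Serialize the entire document back to HTML
let serialized = document.html();
println!("{}", serialized);
// Output: <html><head>...</head><body>...</body></html>
```
--------------------------------
### Selector::parse
Source: https://context7.com/rust-scraper/scraper/llms.txt
Parses a CSS selector string into a reusable Selector object.
```APIDOC
## Selector::parse
### Description
Parses a CSS selector string into a reusable Selector object. Supports standard CSS selector syntax including element, class, ID, attribute selectors, combinators, and pseudo-classes.
### Parameters
#### Arguments
- **selector_string** (&str) - Required - The CSS selector string to parse.
### Example Input
"div.container#main"
### Response
#### Success (`Ok`)
- **selector** (Selector) - A parsed Selector object.
#### Error (`Err`)
- **error** (SelectorErrorKind) - Returned if the selector syntax is invalid.
```
--------------------------------
### HtmlTreeSink
Source: https://context7.com/rust-scraper/scraper/llms.txt
Enables DOM manipulation such as removing, reparenting, and modifying nodes.
```APIDOC
## HtmlTreeSink
### Description
The `HtmlTreeSink` wrapper enables DOM manipulation using the `TreeSink` trait from html5ever. This allows removing, reparenting, and modifying nodes.
### Method
Struct wrapper for DOM manipulation
### Response
- **Html** - The modified document returned after calling `finish()`.
```
--------------------------------
### Html::select - Select Elements from Document
Source: https://context7.com/rust-scraper/scraper/llms.txt
Returns an iterator over all elements matching a CSS selector. Use this to find every occurrence of a specific tag or class within the entire document.
```rust
use scraper::{Html, Selector};
// Assumed input markup and selector: a minimal sketch of iterating over all matches.
let html = r#"<ul><li>One</li><li>Two</li></ul>"#;
let document = Html::parse_document(html);
let selector = Selector::parse("li").unwrap();
for element in document.select(&selector) {
    println!("{}", element.inner_html());
}
```
--------------------------------
### Remove Nodes with HtmlTreeSink
Collect the IDs of the matching nodes first, then wrap the document in `HtmlTreeSink`, detach each node from its parent, and call `finish()` to get the modified `Html` back.
```rust
use html5ever::tree_builder::TreeSink;
use scraper::{Html, HtmlTreeSink, Selector};
// Assumed input markup, inferred from the output shown below.
let html = r#"<div>Keep me</div><div class="remove">Remove me</div><div>Also keep</div>"#;
let selector = Selector::parse(".remove").unwrap();
// Parse and collect node IDs to remove
let document = Html::parse_document(html);
let node_ids: Vec<_> = document.select(&selector).map(|x| x.id()).collect();
// Wrap in TreeSink for manipulation
let tree = HtmlTreeSink::new(document);
// Remove nodes
for id in node_ids {
    tree.remove_from_parent(&id);
}
// Finish manipulation and get back the Html
let document = tree.finish();
println!("{}", document.html());
// Output: <html><head></head><body><div>Keep me</div><div>Also keep</div></body></html>
```
--------------------------------
### Select Descendent Elements in Rust
Source: https://github.com/rust-scraper/scraper/blob/master/scraper/README.md
Query for elements that are descendants of another selected element. This narrows a selection to one particular part of the DOM.
```rust
use scraper::{Html, Selector};
// Assumed input markup and selectors for illustration.
let html = r#"<div class="box"><p>Paragraph in box</p></div>"#;
let fragment = Html::parse_fragment(html);
let box_selector = Selector::parse(".box").unwrap();
let p_selector = Selector::parse("p").unwrap();
// Select the container first, then query only within it
let container = fragment.select(&box_selector).next().unwrap();
for element in container.select(&p_selector) {
    assert_eq!("p", element.value().name());
}
```
--------------------------------
### Remove Elements from the DOM in Rust
Remove every element that matches a selector by detaching it from its parent through `HtmlTreeSink`.
```rust
use html5ever::tree_builder::TreeSink;
use scraper::{Html, HtmlTreeSink, Selector};
let html = "<html><body>hello<p class=\"hello\">REMOVE ME</p></body></html>";
let selector = Selector::parse(".hello").unwrap();
let document = Html::parse_document(html);
let node_ids: Vec<_> = document.select(&selector).map(|x| x.id()).collect();
let tree = HtmlTreeSink::new(document);
for id in node_ids {
    tree.remove_from_parent(&id);
}
let document = tree.finish();
assert_eq!(document.html(), "<html><head></head><body>hello</body></html>");
```
--------------------------------
### Generic Element Selection with Selectable Trait in Rust
Source: https://context7.com/rust-scraper/scraper/llms.txt
The `Selectable` trait lets generic functions work with both `Html` documents and `ElementRef` instances, so the same selection logic can be reused across different scopes within the HTML structure.
```rust
use scraper::{Html, Selector};
use scraper::selectable::Selectable;
use scraper::element_ref::ElementRef;
// Generic function that works with Html or ElementRef
fn extract_links<'a, S>(selectable: S) -> Vec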