### Install htmlquery Package

Source: https://github.com/antchfx/htmlquery/blob/master/README.md

Use go get to install the htmlquery package.

```bash
go get github.com/antchfx/htmlquery
```

--------------------------------

### Quick Start Example: Bing Search Results

Source: https://github.com/antchfx/htmlquery/blob/master/README.md

A complete example that loads a URL, queries for news items, and extracts the title and link of each result.

```go
package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	doc, err := htmlquery.LoadURL("https://www.bing.com/search?q=golang")
	if err != nil {
		panic(err)
	}
	// Find all news items.
	list, err := htmlquery.QueryAll(doc, "//ol/li")
	if err != nil {
		panic(err)
	}
	for i, n := range list {
		// Take the first link inside each item, if there is one.
		a := htmlquery.FindOne(n, "//a")
		if a != nil {
			fmt.Printf("%d %s(%s)\n", i, htmlquery.InnerText(a), htmlquery.SelectAttr(a, "href"))
		}
	}
}
```

--------------------------------

### Optimized Node Searching with Pre-compiled XPath (Go)

Source: https://context7.com/antchfx/htmlquery/llms.txt

Use `QuerySelectorAll` with a pre-compiled `*xpath.Expr` when the same expression is evaluated repeatedly in a tight loop. This bypasses string parsing and the LRU cache. The example compiles an expression once and reuses it across multiple HTML documents.

```go
package main

import (
	"fmt"
	"strings"

	"github.com/antchfx/htmlquery"
	"github.com/antchfx/xpath"
)

func main() {
	// Compile once, reuse many times
	expr, err := xpath.Compile("//a[@href]")
	if err != nil {
		panic(err)
	}

	// Sample pages; the href values are illustrative.
	pages := []string{
		`<a href="/one">One</a>`,
		`<a href="/two">Two</a> <a href="/three">Three</a>`,
	}

	for _, page := range pages {
		doc, _ := htmlquery.Parse(strings.NewReader(page))
		nodes := htmlquery.QuerySelectorAll(doc, expr)
		for _, n := range nodes {
			fmt.Printf("%s -> %s\n", htmlquery.InnerText(n), htmlquery.SelectAttr(n, "href"))
		}
	}
}
```

--------------------------------

### Find Nodes in HTML Document (Go)

Source: https://context7.com/antchfx/htmlquery/llms.txt

Use `Find` to search for nodes with XPath. It panics if the expression is invalid, which suits hardcoded expressions where invalid syntax indicates a programming error. This example finds all table rows and then the cells within each row.

```go
package main

import (
	"fmt"
	"strings"

	"github.com/antchfx/htmlquery"
)

func main() {
	html := `

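<table>
  <tr><td>A</td><td>B</td></tr>
  <tr><td>C</td><td>D</td></tr>
</table>
`
	doc, _ := htmlquery.Parse(strings.NewReader(html))

	// Find every row, then the cells within each row.
	for _, row := range htmlquery.Find(doc, "//tr") {
		for _, cell := range htmlquery.Find(row, "//td") {
			fmt.Print(htmlquery.InnerText(cell), " ")
		}
		fmt.Println()
	}
}
```

--------------------------------

### Select Attribute Values (Go)

`SelectAttr` returns the value of the named attribute, or an empty string when the attribute is absent. Attribute nodes can also be selected directly with XPath and read with `InnerText`.

```go
package main

import (
	"fmt"
	"strings"

	"github.com/antchfx/htmlquery"
)

func main() {
	html := `<img src="photo.jpg" alt="A mountain">`
	doc, _ := htmlquery.Parse(strings.NewReader(html))

	img := htmlquery.FindOne(doc, "//img")
	fmt.Println(htmlquery.SelectAttr(img, "src"))     // Output: photo.jpg
	fmt.Println(htmlquery.SelectAttr(img, "alt"))     // Output: A mountain
	fmt.Println(htmlquery.SelectAttr(img, "missing")) // Output: (empty string)

	// Attribute nodes via XPath
	srcNode := htmlquery.FindOne(doc, "//img/@src")
	fmt.Println(htmlquery.InnerText(srcNode)) // Output: photo.jpg
}
```

--------------------------------

### Extracting Inner Text from HTML Nodes (Go)

Source: https://context7.com/antchfx/htmlquery/llms.txt

The `InnerText` function returns the combined text content of a node and its descendants, stripping all tags and ignoring comments. This is useful for getting clean text from HTML elements.

```go
package main

import (
	"fmt"
	"strings"

	"github.com/antchfx/htmlquery"
)

func main() {
	html := `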

<div><p>Hello, <b>World</b>!</p><!-- ignored comment --></div>

`
	doc, _ := htmlquery.Parse(strings.NewReader(html))

	div := htmlquery.FindOne(doc, "//div")
	fmt.Println(htmlquery.InnerText(div)) // Output: Hello, World!
}
```

--------------------------------

### Load and Query HTML from a Local File

Source: https://context7.com/antchfx/htmlquery/llms.txt

Opens an HTML file from the local filesystem and parses it, then finds and prints the text content of every paragraph element. Requires importing the htmlquery package.

```go
package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	doc, err := htmlquery.LoadDoc("/var/www/html/index.html")
	if err != nil {
		fmt.Printf("failed to load file: %v\n", err)
		return
	}

	// Find all paragraphs
	for i, p := range htmlquery.Find(doc, "//p") {
		fmt.Printf("p[%d]: %s\n", i+1, htmlquery.InnerText(p))
	}
}
```

--------------------------------

### Load HTML Document from File Path

Source: https://github.com/antchfx/htmlquery/blob/master/README.md

Load an HTML document from a local file path.

```go
filePath := "/home/user/sample.html"
doc, err := htmlquery.LoadDoc(filePath)
```

--------------------------------

### Load HTML Document from URL

Source: https://github.com/antchfx/htmlquery/blob/master/README.md

Load an HTML document directly from a given URL.

```go
doc, err := htmlquery.LoadURL("http://example.com/")
```

--------------------------------

### Load HTML from File

Source: https://context7.com/antchfx/htmlquery/llms.txt

Opens an HTML file from the local filesystem and parses it into a node tree.

```APIDOC
## LoadDoc(path string) (*html.Node, error)

### Description
Opens an HTML file from the local filesystem and parses it into a node tree.

### Method
```go
LoadDoc(path string) (*html.Node, error)
```

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Parameters
- **path** (string) - The path to the HTML file.

### Request Example
```go
package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	doc, err := htmlquery.LoadDoc("/var/www/html/index.html")
	if err != nil {
		fmt.Printf("failed to load file: %v\n", err)
		return
	}

	// Find all paragraphs
	for i, p := range htmlquery.Find(doc, "//p") {
		fmt.Printf("p[%d]: %s\n", i+1, htmlquery.InnerText(p))
	}
}
```

### Response
#### Success Response (200)
- **doc** (*html.Node) - The root node of the parsed HTML document.
- **err** (error) - An error if opening or parsing the file fails.

#### Response Example
None provided.
```

--------------------------------

### Load and Query HTML from a URL

Source: https://context7.com/antchfx/htmlquery/llms.txt

Fetches an HTML page from a URL, handling compression and charsets, then extracts the page title and lists all links with their href attributes. Requires importing the htmlquery package.

```go
package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	doc, err := htmlquery.LoadURL("https://example.com/")
	if err != nil {
		fmt.Printf("failed to load URL: %v\n", err)
		return
	}

	// Extract the page title
	title := htmlquery.FindOne(doc, "//title")
	if title != nil {
		fmt.Println("Page title:", htmlquery.InnerText(title))
	}

	// List all links
	for _, a := range htmlquery.Find(doc, "//a[@href]") {
		fmt.Printf("Link: %s -> %s\n",
			htmlquery.InnerText(a),
			htmlquery.SelectAttr(a, "href"),
		)
	}
}
```

--------------------------------

### CreateXPathNavigator

Source: https://context7.com/antchfx/htmlquery/llms.txt

Creates an `xpath.NodeNavigator` from an `*html.Node` tree, enabling advanced XPath evaluations.

```APIDOC
## CreateXPathNavigator(top *html.Node) *NodeNavigator

### Description
Creates an `xpath.NodeNavigator` backed by an `*html.Node` tree, enabling direct use of the `github.com/antchfx/xpath` evaluation API for advanced use cases such as XPath `Evaluate()` (returning numbers, booleans, or strings rather than node-sets).

### Parameters
- **top** (*html.Node) - The root node of the HTML document to create a navigator for.

### Return Value
*NodeNavigator - An XPath navigator for the given HTML node tree.
```

--------------------------------

### Create XPath Navigator for Advanced Evaluation

Source: https://context7.com/antchfx/htmlquery/llms.txt

Creates an XPath navigator from an HTML node tree for advanced XPath evaluations, such as returning numbers, booleans, or strings from the `Evaluate()` method of a compiled `*xpath.Expr`.

```go
package main

import (
	"fmt"
	"strings"

	"github.com/antchfx/htmlquery"
	"github.com/antchfx/xpath"
)

func main() {
	html := `

<ul><li>A</li><li>B</li><li>C</li></ul>
`
	doc, _ := htmlquery.Parse(strings.NewReader(html))

	// Count nodes using XPath Evaluate (returns float64 for count())
	expr, _ := xpath.Compile("count(//li)")
	nav := htmlquery.CreateXPathNavigator(doc)
	count := expr.Evaluate(nav).(float64)
	fmt.Printf("Number of <li> elements: %.0f\n", count) // Output: 3

	// Check a boolean expression
	exprBool, _ := xpath.Compile("boolean(//li[text()='B'])")
	result := exprBool.Evaluate(htmlquery.CreateXPathNavigator(doc)).(bool)
	fmt.Println("Contains 'B':", result) // Output: true
}
```

--------------------------------

### Load HTML from URL

Source: https://context7.com/antchfx/htmlquery/llms.txt

Fetches an HTML page over HTTP, automatically handling gzip and deflate Content-Encoding, charset detection, and response body cleanup.

```APIDOC
## LoadURL(url string) (*html.Node, error)

### Description
Fetches an HTML page over HTTP, automatically handling gzip and deflate Content-Encoding, charset detection, and response body cleanup.

### Method
```go
LoadURL(url string) (*html.Node, error)
```

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Parameters
- **url** (string) - The URL of the HTML page to fetch.

### Request Example
```go
package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	doc, err := htmlquery.LoadURL("https://example.com/")
	if err != nil {
		fmt.Printf("failed to load URL: %v\n", err)
		return
	}

	// Extract the page title
	title := htmlquery.FindOne(doc, "//title")
	if title != nil {
		fmt.Println("Page title:", htmlquery.InnerText(title))
	}

	// List all links
	for _, a := range htmlquery.Find(doc, "//a[@href]") {
		fmt.Printf("Link: %s -> %s\n",
			htmlquery.InnerText(a),
			htmlquery.SelectAttr(a, "href"),
		)
	}
}
```

### Response
#### Success Response (200)
- **doc** (*html.Node) - The root node of the parsed HTML document.
- **err** (error) - An error if fetching or parsing fails.

#### Response Example
None provided.
```

--------------------------------

### Query All Elements with XPath

Source: https://github.com/antchfx/htmlquery/blob/master/README.md

Execute an XPath query to find all matching nodes in an HTML document. `QueryAll` returns an error if the XPath expression is invalid; the example below chooses to panic on that error.

```go
nodes, err := htmlquery.QueryAll(doc, "//a")
if err != nil {
	panic(`not a valid XPath expression.`)
}
```

--------------------------------

### Load HTML Document from String Reader

Source: https://github.com/antchfx/htmlquery/blob/master/README.md

Parse an HTML document from an io.Reader, such as a string reader.

```go
s := `<html>....</html>`
doc, err := htmlquery.Parse(strings.NewReader(s))
```

--------------------------------

### Find Single Node and Attributes (Go)

Source: https://context7.com/antchfx/htmlquery/llms.txt

Use `FindOne` to retrieve the first matching node, or `nil` if nothing matches. It is useful for accessing specific elements like the `<title>` tag or for retrieving attribute values. Because a missing node yields `nil`, check the result before passing it to other functions.

```go
package main

import (
	"fmt"
	"strings"

	"github.com/antchfx/htmlquery"
)

func main() {
	html := `<html lang="en"><head><title>Test</title></head><body></body></html>`
	doc, _ := htmlquery.Parse(strings.NewReader(html))

	htmlNode := htmlquery.FindOne(doc, "//html")
	fmt.Println(htmlquery.SelectAttr(htmlNode, "lang")) // Output: en

	title := htmlquery.FindOne(doc, "//title")
	fmt.Println(htmlquery.InnerText(title)) // Output: Test

	missing := htmlquery.FindOne(doc, "//article")
	fmt.Println(missing) // Output: <nil>
}
```

--------------------------------

### Find All Anchor Elements

Source: https://github.com/antchfx/htmlquery/blob/master/README.md

Find all anchor (`<a>`) elements within the loaded HTML document.

```go
list := htmlquery.Find(doc, "//a")
```

--------------------------------

### Configure HTML Query Cache

Source: https://context7.com/antchfx/htmlquery/llms.txt

Control the built-in LRU XPath expression cache using `SelectorCacheMaxEntries` and `DisableSelectorCache`. Caching is enabled by default with a capacity of 50 entries and is safe for concurrent use.

```go
package main

import (
	"fmt"
	"strings"

	"github.com/antchfx/htmlquery"
)

func main() {
	html := `

<p>Hello</p>

`
	doc, _ := htmlquery.Parse(strings.NewReader(html))

	// Increase cache capacity for workloads with many distinct XPath expressions
	htmlquery.SelectorCacheMaxEntries = 200

	node, _ := htmlquery.Query(doc, "//p")
	fmt.Println(htmlquery.InnerText(node)) // Output: Hello

	// Disable the cache (e.g., for testing or one-shot scripts)
	htmlquery.DisableSelectorCache = true
	node2, _ := htmlquery.Query(doc, "//p")
	fmt.Println(htmlquery.InnerText(node2)) // Output: Hello

	// Re-enable
	htmlquery.DisableSelectorCache = false
}
```

--------------------------------

### Parse HTML from a Reader

Source: https://context7.com/antchfx/htmlquery/llms.txt

Parses HTML from any io.Reader. Useful when the HTML is already in memory or arrives from a stream. Requires importing the htmlquery package.

```go
package main

import (
	"fmt"
	"strings"

	"github.com/antchfx/htmlquery"
)

func main() {
	s := `