### Composer Autoloader with QueryPath Functions

Source: https://github.com/gravitypdf/querypath/blob/main/examples/quickstart-guide.md

Combines Composer's autoloader with QueryPath's own autoloader (qp.php) to enable both the Object-Oriented API and procedural functions like htmlqp(). This gives access to the full set of QueryPath features.

```php
<?php
// NOTE: the opening lines of this snippet were lost in extraction.
// The require paths below are assumed; adjust them to your install.
require 'vendor/autoload.php';
require 'vendor/gravitypdf/querypath/src/qp.php';

print QueryPath::withHTML('http://technosophos.com', 'title')->text();

// This works because qp.php was imported
print htmlqp('http://technosophos.com', 'title')->text();
?>
```

--------------------------------

### QueryPath Autoloader Integration

Source: https://github.com/gravitypdf/querypath/blob/main/examples/quickstart-guide.md

Includes QueryPath's autoloader to enable the use of QueryPath classes and functions. This method ensures both the OO API and procedural functions like htmlqp() are available.

```php
<?php
// NOTE: the opening lines of this snippet were lost in extraction.
// The path to qp.php is assumed; adjust it to your install.
require 'vendor/gravitypdf/querypath/src/qp.php';

print QueryPath::withHTML('http://technosophos.com', 'title')->text();
print htmlqp('http://technosophos.com', 'title')->text();
?>
```

--------------------------------

### QueryPath Document Parsing Methods

Source: https://github.com/gravitypdf/querypath/blob/main/examples/quickstart-guide.md

Describes the different entry points for parsing XML and HTML documents with QueryPath, including auto-detection.

```php
// QueryPath::withXML():
//   This *only* handles XML documents. If you give it an HTML document,
//   it will attempt to force XML parsing on that document.
//
// htmlqp(), QueryPath::withHTML():
//   This will force QueryPath to use the HTML parser. It will also make a
//   number of adjustments to QueryPath to accommodate common HTML breakages.
//
// qp(), QueryPath::with():
//   This will attempt to guess whether the document is XML or HTML.
//   In general, it favors XML slightly. Guessing may be done by:
//   - File extension
//   - XML declaration
//   - The suggestions made by any options passed into the document
```

--------------------------------

### QueryPath Character Encoding Handling

Source: https://github.com/gravitypdf/querypath/blob/main/examples/quickstart-guide.md

Explains QueryPath's automatic character encoding conversion and how to adjust it manually.
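As a rough, library-independent illustration of what a detect-then-convert step involves, the sketch below uses PHP's mbstring functions. It is not QueryPath's implementation: the helper name `toUtf8()` and the candidate encoding list are assumptions made for this example only.

```php
<?php
// Hypothetical helper (not part of QueryPath): detect a string's encoding
// from a short candidate list and convert it to UTF-8 with mbstring.
function toUtf8(string $markup, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // Strict mode rejects byte sequences that are invalid in a candidate.
    $from = mb_detect_encoding($markup, $candidates, true);

    if ($from === false || $from === 'UTF-8') {
        // Unknown or already UTF-8: return the input unchanged.
        return $markup;
    }

    return mb_convert_encoding($markup, 'UTF-8', $from);
}

// "caf\xE9" is the word "café" encoded as ISO-8859-1.
$utf8 = toUtf8("<p>caf\xE9</p>");
```

The candidate list is kept short on purpose, since byte-level detection becomes less reliable as more single-byte encodings are added to it.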
```php
// QueryPath attempts to convert documents automatically using PHP's internal character detection libraries.
// You can adjust this feature manually by passing in language settings in the $options array.
// See the documentation on qp() for details.
```

--------------------------------

### Composer Autoloader with QueryPath OO API

Source: https://github.com/gravitypdf/querypath/blob/main/examples/quickstart-guide.md

Uses Composer's autoloader for PSR-0 compatibility, allowing the use of QueryPath's Object-Oriented API (e.g., QueryPath::withHTML()). Note that procedural functions like htmlqp() are not available with this method alone.

```php
<?php
// NOTE: the opening lines of this snippet were lost in extraction and
// are reconstructed from the description above.
require 'vendor/autoload.php';

print QueryPath::withHTML('http://technosophos.com', 'title')->text();

// THIS DOESN'T WORK!
// print htmlqp('http://technosophos.com', 'title')->text();
?>
```

--------------------------------

### QueryPath Method: text()

Source: https://github.com/gravitypdf/querypath/blob/main/examples/quickstart-guide.md

Illustrates the use of the `text()` method in QueryPath to retrieve the text content of the first selected element. If no element is found, it returns an empty string.

```php
<?php
// NOTE: everything before text() was lost in extraction; the loading
// call below is assumed from the neighbouring examples.
require 'vendor/autoload.php';

print QueryPath::withHTML('http://technosophos.com', 'title')->text();
?>
```

--------------------------------

### Basic QueryPath HTML Parsing and Text Extraction

Source: https://github.com/gravitypdf/querypath/blob/main/examples/quickstart-guide.md

Demonstrates loading an HTML document from a URL, selecting the 'title' tag using a CSS selector, and extracting its text content. This is the core QueryPath workflow for web scraping and data extraction.

```php
<?php
// NOTE: everything before text() was lost in extraction; the loading
// call below is assumed from the description above.
require 'vendor/autoload.php';

// Load the page, select <title> with a CSS selector, print its text.
print QueryPath::withHTML('http://technosophos.com', 'title')->text();
?>
```

--------------------------------

### Install QueryPath

Source: https://github.com/gravitypdf/querypath/blob/main/README.md

Installs the QueryPath library using Composer. This is the recommended method for managing dependencies.
```bash
composer require gravitypdf/querypath
```

--------------------------------

### Encoding Conversion Example

Source: https://github.com/gravitypdf/querypath/wiki/How-to-parse-HTML-in-PHP-using-querypath-library

Shows how to convert the encoding of an HTML page to UTF-8 using the `htmlqp` function, and then select specific child elements.

```php
htmlqp($html, 'body', array('convert_to_encoding' => 'utf-8'))->children('p.a');
```

--------------------------------

### Chained Method Manipulation

Source: https://github.com/gravitypdf/querypath/blob/main/README.md

Illustrates QueryPath's chained methods for manipulating HTML documents. This example creates an HTML5 document, adds text, appends elements, modifies attributes and styles, and outputs the result.

```php
<?php
// NOTE: the opening lines and all inline HTML tags in this snippet were
// lost in extraction. The require line, the HTML5 stub constant, and the
// tag names are reconstructed from the comments and may differ from the
// upstream README.
require 'vendor/autoload.php';

try {
    html5qp(\QueryPath\QueryPath::HTML5_STUB, 'title')
        // Add some text to the title
        ->text('Example of QueryPath.')
        // Now look for the <body> element
        ->top('body')
        // Inside the body, add a title and paragraph.
        ->append('<h1>This is a test page</h1><p>Test text</p>')
        // Now we select the paragraph we just created inside the body
        ->children('p')
        // Add a 'class="some-class"' attribute to the paragraph
        ->attr('class', 'some-class')
        // And add a style attribute, too, setting the background color.
        ->css('background-color', '#eee')
        // Now go back to the paragraph again
        ->parent()
        // Before the paragraph and the title, add an empty table.
        ->prepend('<table id="my-table"></table>')
        // Now let's go to the table...
        ->top('#my-table')
        // Add a couple of empty rows
        ->append('<tr></tr><tr></tr>')
        // select the rows (both at once)
        ->children()
        // Add a CSS class to both rows
        ->addClass('table-row')
        // Now just get the first row (at position 0)
        ->eq(0)
        // Add a table header in the first row
        ->append('<th>This is the header</th>')
        // Now go to the next row
        ->next()
        // Add some data to this row
        ->append('<td>This is the data</td>')
        // Write it all out as HTML
        ->writeHTML5();
} catch (\QueryPath\Exception $e) {
    // Handle error
}
```

--------------------------------

### Find and Iterate Nodes

Source: https://github.com/gravitypdf/querypath/blob/main/README.md

Demonstrates how to find specific nodes within a document using the `find()` method and iterate over the results. This example finds all `<li>` elements and prints their text content.

```php
<?php
// NOTE: the opening lines and the list markup were lost in extraction;
// the require line and the <ul>/<li>/<br> tags are reconstructed.
require 'vendor/autoload.php';

try {
    $html = '
    <ul>
        <li>Foo</li>
        <li>Bar</li>
        <li>FooBar</li>
    </ul>';

    $qp = html5qp($html);

    foreach ($qp->find('li') as $li) {
        echo $li->text() .'<br>';
    }
} catch (\QueryPath\Exception $e) {
    // Handle error
}
```

--------------------------------

### Basic HTML Parsing and Link Extraction

Source: https://github.com/gravitypdf/querypath/wiki/How-to-parse-HTML-in-PHP-using-querypath-library

Demonstrates how to create a QueryPath object from an HTML string, find all anchor (`<a>`) tags, and extract their text content. It also shows how to quickly get the text of the title tag.

```php
// Create a new QueryPath object and supply it with the source $html page
$qp = QueryPath::withHTML($html);

// Find the desired HTML nodes
$linkNodes = $qp->find('a');

// Loop through all the links in the page
foreach ($linkNodes as $link) {
    echo $link->text();
}

// Quickly get the title text
$titleText = $qp->find('title')->text();
```

--------------------------------

### Data Extraction Functions

Source: https://github.com/gravitypdf/querypath/wiki/How-to-parse-HTML-in-PHP-using-querypath-library

Lists common functions used to extract data from matched HTML nodes. These include getting text content, attribute values, and HTML content. Note that if multiple nodes are matched, these functions return data from the first node.

```php
text()      // Get combined text contents of each element in the set of matched elements, including their descendants.
attr('src') // Get value of an attribute with a given name.
html()      // Get HTML contents of matching node
innerHtml() // Get the HTML contents INSIDE the node.
```

--------------------------------

### Running QueryPath Tests

Source: https://github.com/gravitypdf/querypath/blob/main/README.md

Commands to run the linter and PHPUnit tests for the QueryPath project. Run both to check code quality and verify bug fixes before submitting contributions.
```bash
composer run lint
```

```bash
vendor/bin/phpunit
```

--------------------------------

### QueryPath API Documentation

Source: https://github.com/gravitypdf/querypath/blob/main/examples/testGrid.html

Comprehensive API documentation for QueryPath, covering its classes, methods, and functionalities for XML and HTML manipulation.

```APIDOC
QueryPath:
  __construct(string $html = '', array $options = [])
    Initializes QueryPath with HTML content and options.
    $html: The HTML content to parse.
    $options: An array of configuration options.

  find(string $selector)
    Finds elements matching the given CSS selector.
    $selector: The CSS selector.
    Returns: A QueryPath object representing the found elements.

  html(string $html = null)
    Gets or sets the HTML content.
    $html: The HTML content to set (optional).
    Returns: The HTML content or the QueryPath object for chaining.

  text(string $text = null)
    Gets or sets the text content of the selected elements.
    $text: The text content to set (optional).
    Returns: The text content or the QueryPath object for chaining.

  attr(string $attribute, string $value = null)
    Gets or sets the value of an attribute.
    $attribute: The attribute name.
    $value: The attribute value to set (optional).
    Returns: The attribute value or the QueryPath object for chaining.

  children(string $selector = null)
    Gets the children of the selected elements.
    $selector: An optional selector to filter children.
    Returns: A QueryPath object representing the children.

  parent(string $selector = null)
    Gets the parent of the selected elements.
    $selector: An optional selector to filter parents.
    Returns: A QueryPath object representing the parent.

  remove()
    Removes the selected elements.
    Returns: The QueryPath object for chaining.

  replaceWith(string $html)
    Replaces the selected elements with new HTML content.
    $html: The HTML content to replace with.
    Returns: The QueryPath object for chaining.

  append(string $html)
    Appends HTML content to the selected elements.
    $html: The HTML content to append.
    Returns: The QueryPath object for chaining.

  prepend(string $html)
    Prepends HTML content to the selected elements.
    $html: The HTML content to prepend.
    Returns: The QueryPath object for chaining.

  before(string $html)
    Inserts HTML content before the selected elements.
    $html: The HTML content to insert.
    Returns: The QueryPath object for chaining.

  after(string $html)
    Inserts HTML content after the selected elements.
    $html: The HTML content to insert.
    Returns: The QueryPath object for chaining.

  wrap(string $html)
    Wraps the selected elements with new HTML content.
    $html: The HTML content to wrap with.
    Returns: The QueryPath object for chaining.

  wrapAll(string $html)
    Wraps all selected elements with new HTML content.
    $html: The HTML content to wrap with.
    Returns: The QueryPath object for chaining.

  wrapInner(string $html)
    Wraps the inner content of the selected elements with new HTML content.
    $html: The HTML content to wrap with.
    Returns: The QueryPath object for chaining.

  clone() -> QueryPath
    Clones the current QueryPath object.
    Returns: A new QueryPath object.

  is(string $selector)
    Checks if the selected elements match the given selector.
    $selector: The CSS selector.
    Returns: True if the elements match, false otherwise.

  hasClass(string $class)
    Checks if the selected elements have the given class.
    $class: The class name.
    Returns: True if the elements have the class, false otherwise.

  addClass(string $class)
    Adds a class to the selected elements.
    $class: The class name to add.
    Returns: The QueryPath object for chaining.

  removeClass(string $class)
    Removes a class from the selected elements.
    $class: The class name to remove.
    Returns: The QueryPath object for chaining.

  toggleClass(string $class)
    Toggles a class on the selected elements.
    $class: The class name to toggle.
    Returns: The QueryPath object for chaining.

  css(string $property, string $value = null)
    Gets or sets a CSS property.
    $property: The CSS property name.
    $value: The CSS property value to set (optional).
    Returns: The CSS property value or the QueryPath object for chaining.

  data(string $key, mixed $value = null)
    Gets or sets data associated with the selected elements.
    $key: The data key.
    $value: The data value to set (optional).
    Returns: The data value or the QueryPath object for chaining.

  removeData(string $key = null)
    Removes data associated with the selected elements.
    $key: The data key to remove (optional, removes all if null).
    Returns: The QueryPath object for chaining.

  empty()
    Removes all child nodes from the selected elements.
    Returns: The QueryPath object for chaining.

  removeAttributes(string $attribute = null)
    Removes attributes from the selected elements.
    $attribute: The attribute name to remove (optional, removes all if null).
    Returns: The QueryPath object for chaining.

  serialize() -> string
    Serializes the QueryPath object to an HTML string.
    Returns: The HTML string.

  toXmlString() -> string
    Serializes the QueryPath object to an XML string.
    Returns: The XML string.

  toDomDocument() -> DOMDocument
    Returns the underlying DOMDocument object.
    Returns: The DOMDocument object.

  toNodeList() -> DOMNodeList
    Returns the underlying DOMNodeList object.
    Returns: The DOMNodeList object.

  each(callable $callback)
    Iterates over the selected elements, executing a callback for each.
    $callback: The callback function to execute for each element.
    Returns: The QueryPath object for chaining.

  map(callable $callback) -> array
    Maps the selected elements to a new array using a callback function.
    $callback: The callback function to execute for each element.
    Returns: An array of mapped values.

  filter(callable $callback) -> QueryPath
    Filters the selected elements based on a callback function.
    $callback: The callback function to execute for filtering.
    Returns: A QueryPath object containing the filtered elements.

  not(string $selector) -> QueryPath
    Removes elements matching the given selector from the selection.
    $selector: The CSS selector.
    Returns: A QueryPath object with the filtered elements.

  eq(int $index) -> QueryPath
    Selects the element at the specified index.
    $index: The index of the element.
    Returns: A QueryPath object containing the selected element.

  first() -> QueryPath
    Selects the first element in the selection.
    Returns: A QueryPath object containing the first element.

  last() -> QueryPath
    Selects the last element in the selection.
    Returns: A QueryPath object containing the last element.

  prev(string $selector = null) -> QueryPath
    Selects the previous sibling of the selected elements.
    $selector: An optional selector to filter previous siblings.
    Returns: A QueryPath object containing the previous siblings.

  next(string $selector = null) -> QueryPath
    Selects the next sibling of the selected elements.
    $selector: An optional selector to filter next siblings.
    Returns: A QueryPath object containing the next siblings.

  siblings(string $selector = null) -> QueryPath
    Selects all siblings of the selected elements.
    $selector: An optional selector to filter siblings.
    Returns: A QueryPath object containing the siblings.

  closest(string $selector) -> QueryPath
    Selects the closest ancestor element matching the given selector.
    $selector: The CSS selector.
    Returns: A QueryPath object containing the closest ancestor.

  findParent(string $selector) -> QueryPath
    Finds the parent element matching the given selector.
    $selector: The CSS selector.
    Returns: A QueryPath object containing the matching parent.

  findChildren(string $selector) -> QueryPath
    Finds the child elements matching the given selector.
    $selector: The CSS selector.
    Returns: A QueryPath object containing the matching children.

  findNext(string $selector) -> QueryPath
    Finds the next sibling element matching the given selector.
    $selector: The CSS selector.
    Returns: A QueryPath object containing the matching next sibling.

  findPrev(string $selector) -> QueryPath
    Finds the previous sibling element matching the given selector.
    $selector: The CSS selector.
    Returns: A QueryPath object containing the matching previous sibling.

  findFirstChild(string $selector) -> QueryPath
    Finds the first child element matching the given selector.
    $selector: The CSS selector.
    Returns: A QueryPath object containing the matching first child.

  findLastChild(string $selector) -> QueryPath
    Finds the last child element matching the given selector.
    $selector: The CSS selector.
    Returns: A QueryPath object containing the matching last child.

  findNthChild(int $n, string $selector = null) -> QueryPath
    Finds the nth child element matching the given selector.
    $n: The index of the child (1-based).
    $selector: An optional selector to filter the nth child.
    Returns: A QueryPath object containing the matching nth child.

  findParentUntil(string $selector) -> QueryPath
    Finds the parent element matching the given selector, stopping at a certain ancestor.
    $selector: The CSS selector.
    Returns: A QueryPath object containing the matching parent.

  findChildrenUntil(string $selector) -> QueryPath
    Finds the child elements matching the given selector, stopping at a certain descendant.
    $selector: The CSS selector.
    Returns: A QueryPath object containing the matching children.

  findNextUntil(string $selector) -> QueryPath
    Finds the next sibling element matching the given selector, stopping at a certain sibling.
    $selector: The CSS selector.
    Returns: A QueryPath object containing the matching next sibling.

  findPrevUntil(string $selector) -> QueryPath
    Finds the previous sibling element matching the given selector, stopping at a certain sibling.
    $selector: The CSS selector.
    Returns: A QueryPath object containing the matching previous sibling.

  findFirstChildUntil(string $selector) -> QueryPath
    Finds the first child element matching the given selector, stopping at a certain descendant.
    $selector: The CSS selector.
    Returns: A QueryPath object containing the matching first child.

  findLastChildUntil(string $selector) -> QueryPath
    Finds the last child element matching the given selector, stopping at a certain descendant.
    $selector: The CSS selector.
    Returns: A QueryPath object containing the matching last child.

  findNthChildUntil(int $n, string $selector = null) -> QueryPath
    Finds the nth child element matching the given selector, stopping at a certain descendant.
    $n: The index of the child (1-based).
    $selector: An optional selector to filter the nth child.
    Returns: A QueryPath object containing the matching nth child.

  queryPath(string $selector) -> QueryPath
    Applies a QueryPath selector to the current selection.
    $selector: The QueryPath selector.
    Returns: A QueryPath object representing the result of the query.

  xpath(string $xpath) -> QueryPath
    Finds elements matching the given XPath expression.
    $xpath: The XPath expression.
    Returns: A QueryPath object representing the found elements.

  css2xpath(string $selector) -> string
    Converts a CSS selector to an XPath expression.
    $selector: The CSS selector.
    Returns: The equivalent XPath expression.

  parse(string $html) -> QueryPath
    Parses HTML content and returns a QueryPath object.
    $html: The HTML content to parse.
    Returns: A QueryPath object representing the parsed HTML.

  save(string $filename)
    Saves the current QueryPath content to a file.
    $filename: The name of the file to save.
    Returns: The number of bytes written.

  saveHtml() -> string
    Saves the current QueryPath content to an HTML string.
    Returns: The HTML string.

  saveXml() -> string
    Saves the current QueryPath content to an XML string.
    Returns: The XML string.

  saveHtmlFile(string $filename)
    Saves the current QueryPath content to an HTML file.
    $filename: The name of the file to save.
    Returns: The number of bytes written.

  saveXmlFile(string $filename)
    Saves the current QueryPath content to an XML file.
    $filename: The name of the file to save.
    Returns: The number of bytes written.

  load(string $filename)
    Loads content from a file into QueryPath.
    $filename: The name of the file to load.
    Returns: The QueryPath object.

  loadHtml(string $html)
    Loads HTML content into QueryPath.
    $html: The HTML content to load.
    Returns: The QueryPath object.

  loadXml(string $xml)
    Loads XML content into QueryPath.
    $xml: The XML content to load.
    Returns: The QueryPath object.

  loadHtmlFile(string $filename)
    Loads HTML content from a file into QueryPath.
    $filename: The name of the file to load.
    Returns: The QueryPath object.

  loadXmlFile(string $filename)
    Loads XML content from a file into QueryPath.
    $filename: The name of the file to load.
    Returns: The QueryPath object.

  create(string $tag, array $attributes = [], string $content = '') -> QueryPath
    Creates a new HTML element.
    $tag: The tag name of the element.
    $attributes: An array of attributes for the element.
    $content: The content of the element.
    Returns: A QueryPath object representing the created element.

  createTextNode(string $text) -> QueryPath
    Creates a new text node.
    $text: The text content.
    Returns: A QueryPath object representing the text node.

  createComment(string $text) -> QueryPath
    Creates a new comment node.
    $text: The comment content.
    Returns: A QueryPath object representing the comment node.

  createDocumentFragment() -> DOMDocumentFragment
    Creates a new DOMDocumentFragment.
    Returns: A DOMDocumentFragment object.

  createAttribute(string $name, string $value) -> DOMAttr
    Creates a new DOMAttr.
    $name: The name of the attribute.
    $value: The value of the attribute.
    Returns: A DOMAttr object.

  createCdataSection(string $data) -> DOMCdataSection
    Creates a new DOMCdataSection.
    $data: The CDATA content.
    Returns: A DOMCdataSection object.

  createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction
    Creates a new DOMProcessingInstruction.
    $target: The target of the instruction.
    $data: The instruction data.
    Returns: A DOMProcessingInstruction object.

  createEntityReference(string $name) -> DOMEntityReference
    Creates a new DOMEntityReference.
$name: The name of the entity. Returns: A DOMEntityReference object. createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType Creates a new DOMDocumentType. $name: The name of the document type. $publicId: The public identifier. $systemId: The system identifier. Returns: A DOMDocumentType object. createNotation(string $name) -> DOMNotation Creates a new DOMNotation. $name: The name of the notation. Returns: A DOMNotation object. createCDATASection(string $data) -> DOMCdataSection Creates a new DOMCdataSection. $data: The CDATA content. Returns: A DOMCdataSection object. createComment(string $data) -> DOMComment Creates a new DOMComment. $data: The comment content. Returns: A DOMComment object. createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction Creates a new DOMProcessingInstruction. $target: The target of the instruction. $data: The instruction data. Returns: A DOMProcessingInstruction object. createEntityReference(string $name) -> DOMEntityReference Creates a new DOMEntityReference. $name: The name of the entity. Returns: A DOMEntityReference object. createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType Creates a new DOMDocumentType. $name: The name of the document type. $publicId: The public identifier. $systemId: The system identifier. Returns: A DOMDocumentType object. createNotation(string $name) -> DOMNotation Creates a new DOMNotation. $name: The name of the notation. Returns: A DOMNotation object. createAttribute(string $name, string $value) -> DOMAttr Creates a new DOMAttr. $name: The name of the attribute. $value: The value of the attribute. Returns: A DOMAttr object. createCDATASection(string $data) -> DOMCdataSection Creates a new DOMCdataSection. $data: The CDATA content. Returns: A DOMCdataSection object. createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction Creates a new DOMProcessingInstruction. 
$target: The target of the instruction. $data: The instruction data. Returns: A DOMProcessingInstruction object. createEntityReference(string $name) -> DOMEntityReference Creates a new DOMEntityReference. $name: The name of the entity. Returns: A DOMEntityReference object. createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType Creates a new DOMDocumentType. $name: The document type name. $publicId: The public identifier. $systemId: The system identifier. Returns: A DOMDocumentType object. createNotation(string $name) -> DOMNotation Creates a new DOMNotation. $name: The name of the notation. Returns: A DOMNotation object. createComment(string $data) -> DOMComment Creates a new DOMComment. $data: The comment content. Returns: A DOMComment object. createCDATASection(string $data) -> DOMCdataSection Creates a new DOMCdataSection. $data: The CDATA content. Returns: A DOMCdataSection object. createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction Creates a new DOMProcessingInstruction. $target: The target of the instruction. $data: The instruction data. Returns: A DOMProcessingInstruction object. createEntityReference(string $name) -> DOMEntityReference Creates a new DOMEntityReference. $name: The name of the entity. Returns: A DOMEntityReference object. createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType Creates a new DOMDocumentType. $name: The document type name. $publicId: The public identifier. $systemId: The system identifier. Returns: A DOMDocumentType object. createNotation(string $name) -> DOMNotation Creates a new DOMNotation. $name: The name of the notation. Returns: A DOMNotation object. createAttribute(string $name, string $value) -> DOMAttr Creates a new DOMAttr. $name: The name of the attribute. $value: The value of the attribute. Returns: A DOMAttr object. createCDATASection(string $data) -> DOMCdataSection Creates a new DOMCdataSection. 
$data: The CDATA content. Returns: A DOMCdataSection object. createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction Creates a new DOMProcessingInstruction. $target: The target of the instruction. $data: The instruction data. Returns: A DOMProcessingInstruction object. createEntityReference(string $name) -> DOMEntityReference Creates a new DOMEntityReference. $name: The name of the entity. Returns: A DOMEntityReference object. createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType Creates a new DOMDocumentType. $name: The document type name. $publicId: The public identifier. $systemId: The system identifier. Returns: A DOMDocumentType object. createNotation(string $name) -> DOMNotation Creates a new DOMNotation. $name: The name of the notation. Returns: A DOMNotation object. createAttribute(string $name, string $value) -> DOMAttr Creates a new DOMAttr. $name: The name of the attribute. $value: The value of the attribute. Returns: A DOMAttr object. createCDATASection(string $data) -> DOMCdataSection Creates a new DOMCdataSection. $data: The CDATA content. Returns: A DOMCdataSection object. createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction Creates a new DOMProcessingInstruction. $target: The target of the instruction. $data: The instruction data. Returns: A DOMProcessingInstruction object. createEntityReference(string $name) -> DOMEntityReference Creates a new DOMEntityReference. $name: The name of the entity. Returns: A DOMEntityReference object. createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType Creates a new DOMDocumentType. $name: The document type name. $publicId: The public identifier. $systemId: The system identifier. Returns: A DOMDocumentType object. createNotation(string $name) -> DOMNotation Creates a new DOMNotation. $name: The name of the notation. Returns: A DOMNotation object. 
createAttribute(string $name, string $value) -> DOMAttr Creates a new DOMAttr. $name: The name of the attribute. $value: The value of the attribute. Returns: A DOMAttr object. createCDATASection(string $data) -> DOMCdataSection Creates a new DOMCdataSection. $data: The CDATA content. Returns: A DOMCdataSection object. createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction Creates a new DOMProcessingInstruction. $target: The target of the instruction. $data: The instruction data. Returns: A DOMProcessingInstruction object. createEntityReference(string $name) -> DOMEntityReference Creates a new DOMEntityReference. $name: The name of the entity. Returns: A DOMEntityReference object. createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType Creates a new DOMDocumentType. $name: The document type name. $publicId: The public identifier. $systemId: The system identifier. Returns: A DOMDocumentType object. createNotation(string $name) -> DOMNotation Creates a new DOMNotation. $name: The name of the notation. Returns: A DOMNotation object. createAttribute(string $name, string $value) -> DOMAttr Creates a new DOMAttr. $name: The name of the attribute. $value: The value of the attribute. Returns: A DOMAttr object. createCDATASection(string $data) -> DOMCdataSection Creates a new DOMCdataSection. $data: The CDATA content. Returns: A DOMCdataSection object. createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction Creates a new DOMProcessingInstruction. $target: The target of the instruction. $data: The instruction data. Returns: A DOMProcessingInstruction object. createEntityReference(string $name) -> DOMEntityReference Creates a new DOMEntityReference. $name: The name of the entity. Returns: A DOMEntityReference object. createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType Creates a new DOMDocumentType. $name: The document type name. 
$publicId: The public identifier. $systemId: The system identifier. Returns: A DOMDocumentType object. createNotation(string $name) -> DOMNotation Creates a new DOMNotation. $name: The name of the notation. Returns: A DOMNotation object. createAttribute(string $name, string $value) -> DOMAttr Creates a new DOMAttr. $name: The name of the attribute. $value: The value of the attribute. Returns: A DOMAttr object. createCDATASection(string $data) -> DOMCdataSection Creates a new DOMCdataSection. $data: The CDATA content. Returns: A DOMCdataSection object. createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction Creates a new DOMProcessingInstruction. $target: The target of the instruction. $data: The instruction data. Returns: A DOMProcessingInstruction object. createEntityReference(string $name) -> DOMEntityReference Creates a new DOMEntityReference. $name: The name of the entity. Returns: A DOMEntityReference object. createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType Creates a new DOMDocumentType. $name: The document type name. $publicId: The public identifier. $systemId: The system identifier. Returns: A DOMDocumentType object. createNotation(string $name) -> DOMNotation Creates a new DOMNotation. $name: The name of the notation. Returns: A DOMNotation object. createAttribute(string $name, string $value) -> DOMAttr Creates a new DOMAttr. $name: The name of the attribute. $value: The value of the attribute. Returns: A DOMAttr object. createCDATASection(string $data) -> DOMCdataSection Creates a new DOMCdataSection. $data: The CDATA content. Returns: A DOMCdataSection object. createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction Creates a new DOMProcessingInstruction. $target: The target of the instruction. $data: The instruction data. Returns: A DOMProcessingInstruction object. 
createEntityReference(string $name) -> DOMEntityReference Creates a new DOMEntityReference. $name: The name of the entity. Returns: A DOMEntityReference object. createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType Creates a new DOMDocumentType. $name: The document type name. $publicId: The public identifier. $systemId: The system identifier. Returns: A DOMDocumentType object. createNotation(string $name) -> DOMNotation Creates a new DOMNotation. $name: The name of the notation. Returns: A DOMNotation object. createAttribute(string $name, string $value) -> DOMAttr Creates a new DOMAttr. $name: The name of the attribute. $value: The value of the attribute. Returns: A DOMAttr object. createCDATASection(string $data) -> DOMCdataSection Creates a new DOMCdataSection. $data: The CDATA content. Returns: A DOMCdataSection object. createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction Creates a new DOMProcessingInstruction. $target: The target of the instruction. $data: The instruction data. Returns: A DOMProcessingInstruction object. createEntityReference(string $name) -> DOMEntityReference Creates a new DOMEntityReference. $name: The name of the entity. Returns: A DOMEntityReference object. createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType Creates a new DOMDocumentType. $name: The document type name. $publicId: The public identifier. $systemId: The system identifier. Returns: A DOMDocumentType object. createNotation(string $name) -> DOMNotation Creates a new DOMNotation. $name: The name of the notation. Returns: A DOMNotation object. createAttribute(string $name, string $value) -> DOMAttr Creates a new DOMAttr. $name: The name of the attribute. $value: The value of the attribute. Returns: A DOMAttr object. createCDATASection(string $data) -> DOMCdataSection Creates a new DOMCdataSection. $data: The CDATA content. Returns: A DOMCdataSection object. 
```

--------------------------------

### Parse HTML/XML Documents

Source: https://github.com/gravitypdf/querypath/blob/main/README.md

Demonstrates how to parse HTML or XML documents using QueryPath.
It shows loading from a file or a string, with options for HTML5 parsing (using masterminds/html5) or legacy libxml parsing.

```php
// Assumes QueryPath's procedural functions (html5qp(), htmlqp(), qp()) are already loaded.
// NOTE: the opening of this block and the markup inside the string literals were lost
// in extraction; the html5qp() calls and the sample markup below are reconstructed placeholders.

try {
    // HTML5: uses masterminds/html5 to parse HTML
    $qp = html5qp(__DIR__.'/path/to/file.html'); // load a file from disk
    $qp = html5qp('<p>You can pass a string of HTML directly to the function</p>'); // load a string
} catch (\QueryPath\Exception $e) {
    // Handle error
}

try {
    // Legacy: uses libxml to parse HTML
    $qp = htmlqp(__DIR__.'/path/to/file.html'); // load a file from disk
    $qp = htmlqp('<p>You can pass a string of HTML directly to the function</p>'); // load a string
} catch (\QueryPath\Exception $e) {
    // Handle error
}

try {
    // XML or XHTML
    $qp = qp(__DIR__.'/path/to/file.html'); // load a file from disk
    $qp = qp("<root><item/></root>"); // load a string (placeholder XML)
} catch (\QueryPath\Exception $e) {
    // Handle error
}
```

--------------------------------

### General CSS Styling

Source: https://github.com/gravitypdf/querypath/blob/main/examples/doc.html

Defines base styles for the body, hr, and p elements so the documentation pages share a consistent look. It sets margins, the font stack, and the rule border.

```css
body {
    margin: 0;
    font-family: Lucida Sans,Lucida Grande,Helvetica,Arial,"Bitstream Vera Sans",sans-serif;
}

hr {
    border-top: 1px solid black;
    width: 100%;
    float: left;
    margin: 0;
}

p {
    margin: -10px 0 -10px 0;
}
```

--------------------------------

### QueryPath DOMQuery API Reference

Source: https://github.com/gravitypdf/querypath/wiki/How-to-parse-HTML-in-PHP-using-querypath-library

Provides an overview of the QueryPath DOMQuery class methods for parsing and manipulating HTML/XML. This includes methods for finding elements using CSS selectors or XPath, traversing the DOM tree, and extracting data.

```apidoc
QueryPath.DOMQuery:
  withHTML(html_string, options = null)
    Parses an HTML string and returns a QueryPath object.
    Parameters:
      html_string: The HTML content to parse.
      options: An array of options for parsing (e.g., 'convert_to_encoding').
    Returns: A QueryPath object.
  find(selector)
    Selects elements matching the given CSS selector.
    Parameters:
      selector: A CSS selector string.
    Returns: A QueryPath object representing the matched elements.
  xpath(xpath_query)
    Selects elements matching the given XPath query.
    Parameters:
      xpath_query: An XPath query string.
    Returns: A QueryPath object representing the matched elements.
  top(selector = null)
    Selects the document element or an element matching the selector.
    Parameters:
      selector: Optional CSS selector for the root element.
    Returns: A QueryPath object.
  parents(selector = null)
    Selects ancestor elements.
    Parameters:
      selector: Optional CSS selector to filter ancestors.
    Returns: A QueryPath object.
  parent(selector = null)
    Selects the direct parent element.
    Parameters:
      selector: Optional CSS selector to filter the parent.
    Returns: A QueryPath object.
  siblings(selector = null)
    Selects sibling elements.
    Parameters:
      selector: Optional CSS selector to filter siblings.
    Returns: A QueryPath object.
  next(selector = null)
    Selects the next sibling element.
    Parameters:
      selector: Optional CSS selector to filter the next sibling.
    Returns: A QueryPath object.
  nextAll(selector = null)
    Selects all subsequent sibling elements.
    Parameters:
      selector: Optional CSS selector to filter subsequent siblings.
    Returns: A QueryPath object.
  prev(selector = null)
    Selects the previous sibling element.
    Parameters:
      selector: Optional CSS selector to filter the previous sibling.
    Returns: A QueryPath object.
  prevAll(selector = null)
    Selects all preceding sibling elements.
    Parameters:
      selector: Optional CSS selector to filter preceding siblings.
    Returns: A QueryPath object.
  children(selector = null)
    Selects immediate child elements.
    Parameters:
      selector: Optional CSS selector to filter children.
    Returns: A QueryPath object.
  deepest(selector = null)
    Selects the deepest node(s) within the current selection.
    Parameters:
      selector: Optional CSS selector to filter the deepest nodes.
    Returns: A QueryPath object.
  text()
    Gets the combined text content of the matched elements.
    Returns: String containing the text content.
  attr(attribute_name)
    Gets the value of a specified attribute from the first matched element.
    Parameters:
      attribute_name: The name of the attribute.
    Returns: String containing the attribute value.
  html()
    Gets the HTML content of the matched elements.
    Returns: String containing the HTML content.
  innerHtml()
    Gets the inner HTML content of the matched elements.
    Returns: String containing the inner HTML content.
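```

--------------------------------

### Chained Traversing Functions (CSS Selector)

Source: https://github.com/gravitypdf/querypath/wiki/How-to-parse-HTML-in-PHP-using-querypath-library

Demonstrates chaining traversal functions with CSS selectors to narrow the selection step by step until it reaches a specific table row.

```php
// $this->qp is assumed to hold a QueryPath object created elsewhere in the class.
// Start at <body>, narrow to the table with id "main", then take the row that is its parent's third child.
$tr = $this->qp->top('body')->find('table[id="main"]')->find('tr:nth-child(3)');
```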
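
For comparison, the same lookup can be sketched as an XPath query using only PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`), with no QueryPath involved. This is a minimal illustration, not how QueryPath resolves selectors internally; the sample markup and the variable names (`$html`, `$rows`, `$thirdRowText`) are hypothetical and exist only for this example.

```php
<?php
// Hypothetical sample markup; the table id "main" mirrors the selector used above.
$html = <<<'HTML'
<html><body>
<table id="main">
  <tr><td>first</td></tr>
  <tr><td>second</td></tr>
  <tr><td>third</td></tr>
</table>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// tr:nth-child(3) means "a tr that is the third element child of its parent",
// i.e. a tr with exactly two preceding element siblings.
$rows = $xpath->query('//table[@id="main"]//tr[count(preceding-sibling::*) = 2]');

// Fall back to an empty string when nothing matches.
$thirdRowText = $rows->length > 0 ? trim($rows->item(0)->textContent) : '';

echo $thirdRowText, PHP_EOL; // prints "third"
```

The predicate counts preceding element siblings rather than using `tr[3]`, because `:nth-child(3)` depends on the element's position among all siblings, not only among `tr` siblings.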