### Composer Autoloader with QueryPath Functions
Source: https://github.com/gravitypdf/querypath/blob/main/examples/quickstart-guide.md
Combines Composer's autoloader with QueryPath's own autoloader (qp.php) to enable both the Object-Oriented API and procedural functions like htmlqp(). This provides the most comprehensive access to QueryPath's features.
```php
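<?php
require 'vendor/autoload.php';
// Placeholder path: point this at the qp.php shipped with your QueryPath install.
require 'path/to/qp.php';

// Use the OO API
print QueryPath::withHTML('http://technosophos.com', 'title')->text();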
// This works because qp.php was imported
print htmlqp('http://technosophos.com', 'title')->text();
?>
```
--------------------------------
### QueryPath Autoloader Integration
Source: https://github.com/gravitypdf/querypath/blob/main/examples/quickstart-guide.md
Includes QueryPath's autoloader to enable the use of QueryPath classes and functions. This method ensures both the OO API and procedural functions like htmlqp() are available.
```php
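<?php
// Placeholder path: point this at the qp.php shipped with your QueryPath install.
require 'path/to/qp.php';

// Use the OO API
print QueryPath::withHTML('http://technosophos.com', 'title')->text();
// Use the procedural API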
print htmlqp('http://technosophos.com', 'title')->text();
?>
```
--------------------------------
### QueryPath Document Parsing Methods
Source: https://github.com/gravitypdf/querypath/blob/main/examples/quickstart-guide.md
Demonstrates the different methods for parsing XML and HTML documents with QueryPath, including auto-detection.
```text
QueryPath::withXML(): This *only* handles XML documents. If you give it an HTML document, it will attempt to force XML parsing on that document.
htmlqp(), QueryPath::withHTML(): This will force QueryPath to use the HTML parser. It will also make a number of adjustments to QueryPath to accommodate common HTML breakages.
qp(), QueryPath::with(): This will attempt to guess whether the document is XML or HTML. In general, it favors XML slightly. Guessing may be done by:
- File extension
- XML declaration
- The suggestions made by any options passed into the document
```
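The guessing order listed above can be pictured with a small heuristic. This is only a sketch of the idea, not QueryPath's actual detection code: the `guessParser()` function and the `'parser'` option key are hypothetical names introduced for illustration.

```php
<?php
// Hypothetical helper illustrating the guessing order; not QueryPath's real logic.
function guessParser(string $source, array $options = []): string
{
    // 1. An explicit hint in the options wins (the 'parser' key is an assumption).
    if (isset($options['parser'])) {
        return $options['parser'];
    }
    // 2. An XML declaration at the start of the markup suggests XML.
    if (strncmp(ltrim($source), '<?xml', 5) === 0) {
        return 'xml';
    }
    // 3. Otherwise treat the source as a file name and look at its extension.
    $ext = strtolower(pathinfo($source, PATHINFO_EXTENSION));
    if ($ext === 'html' || $ext === 'htm') {
        return 'html';
    }
    // Fall back to XML, mirroring the slight preference for XML.
    return 'xml';
}

echo guessParser('page.html'), PHP_EOL;                      // html
echo guessParser('<?xml version="1.0"?><root/>'), PHP_EOL;   // xml
echo guessParser('feed.xml', ['parser' => 'html']), PHP_EOL; // html
```

When the guess is likely to be wrong for your input, calling the HTML-specific or XML-specific entry point directly avoids the heuristic altogether.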
--------------------------------
### QueryPath Character Encoding Handling
Source: https://github.com/gravitypdf/querypath/blob/main/examples/quickstart-guide.md
Explains QueryPath's automatic character encoding conversion and how to manually adjust it.
```php
// QueryPath attempts to convert documents automatically using PHP's internal character detection libraries.
// You can adjust this feature manually by passing in language settings in the $options array.
// See the documentation on qp() for details.
```
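To make the idea of automatic conversion concrete, here is a minimal sketch built on PHP's mbstring functions. It is an assumption about the general approach, not a copy of what QueryPath does internally; `toUtf8()` and its candidate list are hypothetical.

```php
<?php
// Hypothetical helper: detect a source encoding and normalise the markup to UTF-8.
function toUtf8(string $markup, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // Strict mode rejects encodings in which the byte sequence is invalid.
    $detected = mb_detect_encoding($markup, $candidates, true);
    if ($detected === false || $detected === 'UTF-8') {
        // Nothing detected, or already UTF-8: leave the input untouched.
        return $markup;
    }
    return mb_convert_encoding($markup, 'UTF-8', $detected);
}

$latin1 = "<p>caf\xE9</p>"; // "café" encoded as ISO-8859-1
$utf8 = toUtf8($latin1);
echo $utf8, PHP_EOL;
```

Detection of this kind is inherently a guess, which is why an explicit setting in the `$options` array is the more reliable route when the source encoding is known.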
--------------------------------
### Composer Autoloader with QueryPath OO API
Source: https://github.com/gravitypdf/querypath/blob/main/examples/quickstart-guide.md
Uses Composer's autoloader for PSR-0 compatibility, allowing the use of QueryPath's Object-Oriented API (e.g., QueryPath::withHTML()). Note that procedural functions like htmlqp() are not available with this method alone.
```php
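<?php
require 'vendor/autoload.php';

// Use the OO API
print QueryPath::withHTML('http://technosophos.com', 'title')->text();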
// THIS DOESN'T WORK!
// print htmlqp('http://technosophos.com', 'title')->text();
?>
```
--------------------------------
### QueryPath Method: text()
Source: https://github.com/gravitypdf/querypath/blob/main/examples/quickstart-guide.md
Illustrates the use of the `text()` method in QueryPath to retrieve the combined text content of the matched elements, including their descendants. If no element is matched, it returns an empty string.
```php
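<?php
// Assumes QueryPath's procedural functions are already loaded.
print htmlqp('http://technosophos.com', 'title')->text();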
?>
```
--------------------------------
### Basic QueryPath HTML Parsing and Text Extraction
Source: https://github.com/gravitypdf/querypath/blob/main/examples/quickstart-guide.md
Demonstrates loading an HTML document from a URL, selecting the 'title' tag using a CSS selector, and extracting its text content. This showcases the core functionality of QueryPath for web scraping and data extraction.
```php
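<?php
// Assumes QueryPath's procedural functions are already loaded.
// Load the page, select the <title> element, and print its text.
print htmlqp('http://technosophos.com', 'title')->text();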
?>
```
--------------------------------
### Install QueryPath
Source: https://github.com/gravitypdf/querypath/blob/main/README.md
Installs the QueryPath library using Composer. This is the recommended method for managing dependencies.
```bash
composer require gravitypdf/querypath
```
--------------------------------
### Encoding Conversion Example
Source: https://github.com/gravitypdf/querypath/wiki/How-to-parse-HTML-in-PHP-using-querypath-library
Shows how to convert the encoding of an HTML page to UTF-8 using the `htmlqp` function, and then select specific child elements.
```php
htmlqp($html, 'body', array('convert_to_encoding' => 'utf-8'))->children('p.a');
```
--------------------------------
### Chained Method Manipulation
Source: https://github.com/gravitypdf/querypath/blob/main/README.md
Illustrates the power of QueryPath's chained methods for manipulating HTML documents. This example creates an HTML5 document, adds text, appends elements, modifies attributes and styles, and outputs the result.
```php
<?php
try {
    // A blank HTML5 document to start from.
    $stub = '<!DOCTYPE html><html><head><title></title></head><body></body></html>';
    html5qp($stub, 'title')
        // Add some text to the title
        ->text('Example of QueryPath.')
        // Now look for the <body> element
        ->top('body')
        // Inside the body, add a title and paragraph.
        ->append('<h1>This is a test page</h1><p>Test text</p>')
        // Now we select the paragraph we just created inside the body
        ->children('p')
        // Add a 'class="some-class"' attribute to the paragraph
        ->attr('class', 'some-class')
        // And add a style attribute, too, setting the background color.
        ->css('background-color', '#eee')
        // Now go back to the paragraph's parent (the body)
        ->parent()
        // Before the paragraph and the title, add an empty table.
        ->prepend('<table id="my-table"></table>')
        // Now let's go to the table...
        ->top('#my-table')
        // Add a couple of empty rows
        ->append('<tr></tr><tr></tr>')
        // Select the rows (both at once)
        ->children()
        // Add a CSS class to both rows
        ->addClass('table-row')
        // Now just get the first row (at position 0)
        ->eq(0)
        // Add a table header in the first row
        ->append('<th>This is the header</th>')
        // Now go to the next row
        ->next()
        // Add some data to this row
        ->append('<td>This is the data</td>')
        // Write it all out as HTML
        ->writeHTML5();
} catch (\QueryPath\Exception $e) {
    // Handle error
}
```
--------------------------------
### Find and Iterate Nodes
Source: https://github.com/gravitypdf/querypath/blob/main/README.md
Demonstrates how to find specific nodes within a document using the `find()` method and iterate over the results. This example finds all `<li>` elements and prints their text content.
```php
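<?php
try {
    $html = '
    <ul>
        <li>Foo</li>
        <li>Bar</li>
        <li>FooBar</li>
    </ul>';
    $qp = html5qp($html);
    // Each iteration yields a wrapper around a single matched <li>.
    foreach ($qp->find('li') as $li) {
        echo $li->text() . '<br>';
    }
} catch (\QueryPath\Exception $e) {
    // Handle error
}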
```
--------------------------------
### Basic HTML Parsing and Link Extraction
Source: https://github.com/gravitypdf/querypath/wiki/How-to-parse-HTML-in-PHP-using-querypath-library
Demonstrates how to create a QueryPath object from an HTML string, find all anchor (`<a>`) tags, and extract their text content. It also shows how to quickly get the text of the title tag.
```php
// Create a new QueryPath object and supply it with the source $html page
$qp = QueryPath::withHTML($html);
// Find the desired HTML nodes
$linkNodes = $qp->find('a');
// Loop through all the links in the page
foreach ($linkNodes as $link) {
    echo $link->text();
}
// Quickly get title text
$titleText = $qp->find('title')->text();
```
--------------------------------
### Data Extraction Functions
Source: https://github.com/gravitypdf/querypath/wiki/How-to-parse-HTML-in-PHP-using-querypath-library
Illustrates common functions used to extract data from matched HTML nodes. These include getting text content, attribute values, and HTML content. Note that if multiple nodes are matched, `attr()`, `html()`, and `innerHtml()` return data from the first node, whereas `text()` combines the text of every matched node.
```php
text() // Get combined text contents of each element in the set of matched elements, including their descendants.
attr('src') // Get value of an attribute with a given name.
html() // Get HTML contents of matching node
innerHtml() // Get the HTML contents INSIDE the node.
```
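The difference between these accessors is easier to see at the DOM level. The sketch below uses PHP's built-in DOM extension directly rather than QueryPath, so it only approximates what the functions above return; the sample markup and variable names are made up for illustration.

```php
<?php
// Sketch using PHP's DOM extension directly; QueryPath itself is not loaded here.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p class="a">One <b>bold</b></p><p>Two</p><img src="logo.png"></body></html>');

$xpath = new DOMXPath($doc);
$paragraphs = $xpath->query('//p');

// Roughly text(): combined text of every matched node, descendants included.
$text = '';
foreach ($paragraphs as $p) {
    $text .= $p->textContent;
}
echo $text, PHP_EOL; // One boldTwo

// Roughly attr('src'): attribute value read from the first matched node.
$img = $xpath->query('//img')->item(0);
echo $img->getAttribute('src'), PHP_EOL; // logo.png

// Roughly html(): markup of the first matched node, including the node itself.
$first = $paragraphs->item(0);
echo $doc->saveHTML($first), PHP_EOL;

// Roughly innerHtml(): markup of only the children of that node.
$inner = '';
foreach ($first->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}
echo $inner, PHP_EOL; // One <b>bold</b>
```

The practical consequence is that text extraction over a multi-node match yields one concatenated string, so iterate over the matches when each node's text is needed separately.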
--------------------------------
### Running QueryPath Tests
Source: https://github.com/gravitypdf/querypath/blob/main/README.md
Commands to run the linter and PHPUnit tests for the QueryPath project. These are essential for ensuring code quality and verifying bug fixes before submitting contributions.
```bash
composer run lint
```
```bash
vendor/bin/phpunit
```
--------------------------------
### QueryPath API Documentation
Source: https://github.com/gravitypdf/querypath/blob/main/examples/testGrid.html
Comprehensive API documentation for QueryPath, covering its classes, methods, and functionalities for XML and HTML manipulation.
```APIDOC
QueryPath:
__construct(string $html = '', array $options = [])
Initializes QueryPath with HTML content and options.
$html: The HTML content to parse.
$options: An array of configuration options.
find(string $selector)
Finds elements matching the given CSS selector.
$selector: The CSS selector.
Returns: A QueryPath object representing the found elements.
html(string $html = null)
Gets or sets the HTML content.
$html: The HTML content to set (optional).
Returns: The HTML content or the QueryPath object for chaining.
text(string $text = null)
Gets or sets the text content of the selected elements.
$text: The text content to set (optional).
Returns: The text content or the QueryPath object for chaining.
attr(string $attribute, string $value = null)
Gets or sets the value of an attribute.
$attribute: The attribute name.
$value: The attribute value to set (optional).
Returns: The attribute value or the QueryPath object for chaining.
children(string $selector = null)
Gets the children of the selected elements.
$selector: An optional selector to filter children.
Returns: A QueryPath object representing the children.
parent(string $selector = null)
Gets the parent of the selected elements.
$selector: An optional selector to filter parents.
Returns: A QueryPath object representing the parent.
remove()
Removes the selected elements.
Returns: The QueryPath object for chaining.
replaceWith(string $html)
Replaces the selected elements with new HTML content.
$html: The HTML content to replace with.
Returns: The QueryPath object for chaining.
append(string $html)
Appends HTML content to the selected elements.
$html: The HTML content to append.
Returns: The QueryPath object for chaining.
prepend(string $html)
Prepends HTML content to the selected elements.
$html: The HTML content to prepend.
Returns: The QueryPath object for chaining.
before(string $html)
Inserts HTML content before the selected elements.
$html: The HTML content to insert.
Returns: The QueryPath object for chaining.
after(string $html)
Inserts HTML content after the selected elements.
$html: The HTML content to insert.
Returns: The QueryPath object for chaining.
wrap(string $html)
Wraps the selected elements with new HTML content.
$html: The HTML content to wrap with.
Returns: The QueryPath object for chaining.
wrapAll(string $html)
Wraps all selected elements with new HTML content.
$html: The HTML content to wrap with.
Returns: The QueryPath object for chaining.
wrapInner(string $html)
Wraps the inner content of the selected elements with new HTML content.
$html: The HTML content to wrap with.
Returns: The QueryPath object for chaining.
clone() -> QueryPath
Clones the current QueryPath object.
Returns: A new QueryPath object.
is(string $selector)
Checks if the selected elements match the given selector.
$selector: The CSS selector.
Returns: True if the elements match, false otherwise.
hasClass(string $class)
Checks if the selected elements have the given class.
$class: The class name.
Returns: True if the elements have the class, false otherwise.
addClass(string $class)
Adds a class to the selected elements.
$class: The class name to add.
Returns: The QueryPath object for chaining.
removeClass(string $class)
Removes a class from the selected elements.
$class: The class name to remove.
Returns: The QueryPath object for chaining.
toggleClass(string $class)
Toggles a class on the selected elements.
$class: The class name to toggle.
Returns: The QueryPath object for chaining.
css(string $property, string $value = null)
Gets or sets a CSS property.
$property: The CSS property name.
$value: The CSS property value to set (optional).
Returns: The CSS property value or the QueryPath object for chaining.
data(string $key, mixed $value = null)
Gets or sets data associated with the selected elements.
$key: The data key.
$value: The data value to set (optional).
Returns: The data value or the QueryPath object for chaining.
removeData(string $key = null)
Removes data associated with the selected elements.
$key: The data key to remove (optional, removes all if null).
Returns: The QueryPath object for chaining.
empty()
Removes all child nodes from the selected elements.
Returns: The QueryPath object for chaining.
removeAttributes(string $attribute = null)
Removes attributes from the selected elements.
$attribute: The attribute name to remove (optional, removes all if null).
Returns: The QueryPath object for chaining.
serialize() -> string
Serializes the QueryPath object to an HTML string.
Returns: The HTML string.
toXmlString() -> string
Serializes the QueryPath object to an XML string.
Returns: The XML string.
toDomDocument() -> DOMDocument
Returns the underlying DOMDocument object.
Returns: The DOMDocument object.
toNodeList() -> DOMNodeList
Returns the underlying DOMNodeList object.
Returns: The DOMNodeList object.
each(callable $callback)
Iterates over the selected elements, executing a callback for each.
$callback: The callback function to execute for each element.
Returns: The QueryPath object for chaining.
map(callable $callback) -> array
Maps the selected elements to a new array using a callback function.
$callback: The callback function to execute for each element.
Returns: An array of mapped values.
filter(callable $callback) -> QueryPath
Filters the selected elements based on a callback function.
$callback: The callback function to execute for filtering.
Returns: A QueryPath object containing the filtered elements.
not(string $selector) -> QueryPath
Removes elements matching the given selector from the selection.
$selector: The CSS selector.
Returns: A QueryPath object with the filtered elements.
eq(int $index) -> QueryPath
Selects the element at the specified index.
$index: The index of the element.
Returns: A QueryPath object containing the selected element.
first() -> QueryPath
Selects the first element in the selection.
Returns: A QueryPath object containing the first element.
last() -> QueryPath
Selects the last element in the selection.
Returns: A QueryPath object containing the last element.
prev(string $selector = null) -> QueryPath
Selects the previous sibling of the selected elements.
$selector: An optional selector to filter previous siblings.
Returns: A QueryPath object containing the previous siblings.
next(string $selector = null) -> QueryPath
Selects the next sibling of the selected elements.
$selector: An optional selector to filter next siblings.
Returns: A QueryPath object containing the next siblings.
siblings(string $selector = null) -> QueryPath
Selects all siblings of the selected elements.
$selector: An optional selector to filter siblings.
Returns: A QueryPath object containing the siblings.
closest(string $selector) -> QueryPath
Selects the closest ancestor element matching the given selector.
$selector: The CSS selector.
Returns: A QueryPath object containing the closest ancestor.
findParent(string $selector) -> QueryPath
Finds the parent element matching the given selector.
$selector: The CSS selector.
Returns: A QueryPath object containing the matching parent.
findChildren(string $selector) -> QueryPath
Finds the child elements matching the given selector.
$selector: The CSS selector.
Returns: A QueryPath object containing the matching children.
findNext(string $selector) -> QueryPath
Finds the next sibling element matching the given selector.
$selector: The CSS selector.
Returns: A QueryPath object containing the matching next sibling.
findPrev(string $selector) -> QueryPath
Finds the previous sibling element matching the given selector.
$selector: The CSS selector.
Returns: A QueryPath object containing the matching previous sibling.
findFirstChild(string $selector) -> QueryPath
Finds the first child element matching the given selector.
$selector: The CSS selector.
Returns: A QueryPath object containing the matching first child.
findLastChild(string $selector) -> QueryPath
Finds the last child element matching the given selector.
$selector: The CSS selector.
Returns: A QueryPath object containing the matching last child.
findNthChild(int $n, string $selector = null) -> QueryPath
Finds the nth child element matching the given selector.
$n: The index of the child (1-based).
$selector: An optional selector to filter the nth child.
Returns: A QueryPath object containing the matching nth child.
findParentUntil(string $selector) -> QueryPath
Finds the parent element matching the given selector, stopping at a certain ancestor.
$selector: The CSS selector.
Returns: A QueryPath object containing the matching parent.
findChildrenUntil(string $selector) -> QueryPath
Finds the child elements matching the given selector, stopping at a certain descendant.
$selector: The CSS selector.
Returns: A QueryPath object containing the matching children.
findNextUntil(string $selector) -> QueryPath
Finds the next sibling element matching the given selector, stopping at a certain sibling.
$selector: The CSS selector.
Returns: A QueryPath object containing the matching next sibling.
findPrevUntil(string $selector) -> QueryPath
Finds the previous sibling element matching the given selector, stopping at a certain sibling.
$selector: The CSS selector.
Returns: A QueryPath object containing the matching previous sibling.
findFirstChildUntil(string $selector) -> QueryPath
Finds the first child element matching the given selector, stopping at a certain descendant.
$selector: The CSS selector.
Returns: A QueryPath object containing the matching first child.
findLastChildUntil(string $selector) -> QueryPath
Finds the last child element matching the given selector, stopping at a certain descendant.
$selector: The CSS selector.
Returns: A QueryPath object containing the matching last child.
findNthChildUntil(int $n, string $selector = null) -> QueryPath
Finds the nth child element matching the given selector, stopping at a certain descendant.
$n: The index of the child (1-based).
$selector: An optional selector to filter the nth child.
Returns: A QueryPath object containing the matching nth child.
queryPath(string $selector) -> QueryPath
Applies a QueryPath selector to the current selection.
$selector: The QueryPath selector.
Returns: A QueryPath object representing the result of the query.
xpath(string $xpath) -> QueryPath
Finds elements matching the given XPath expression.
$xpath: The XPath expression.
Returns: A QueryPath object representing the found elements.
css2xpath(string $selector) -> string
Converts a CSS selector to an XPath expression.
$selector: The CSS selector.
Returns: The equivalent XPath expression.
parse(string $html) -> QueryPath
Parses HTML content and returns a QueryPath object.
$html: The HTML content to parse.
Returns: A QueryPath object representing the parsed HTML.
save(string $filename)
Saves the current QueryPath content to a file.
$filename: The name of the file to save.
Returns: The number of bytes written.
saveHtml() -> string
Saves the current QueryPath content to an HTML string.
Returns: The HTML string.
saveXml() -> string
Saves the current QueryPath content to an XML string.
Returns: The XML string.
saveHtmlFile(string $filename)
Saves the current QueryPath content to an HTML file.
$filename: The name of the file to save.
Returns: The number of bytes written.
saveXmlFile(string $filename)
Saves the current QueryPath content to an XML file.
$filename: The name of the file to save.
Returns: The number of bytes written.
load(string $filename)
Loads content from a file into QueryPath.
$filename: The name of the file to load.
Returns: The QueryPath object.
loadHtml(string $html)
Loads HTML content into QueryPath.
$html: The HTML content to load.
Returns: The QueryPath object.
loadXml(string $xml)
Loads XML content into QueryPath.
$xml: The XML content to load.
Returns: The QueryPath object.
loadHtmlFile(string $filename)
Loads HTML content from a file into QueryPath.
$filename: The name of the file to load.
Returns: The QueryPath object.
loadXmlFile(string $filename)
Loads XML content from a file into QueryPath.
$filename: The name of the file to load.
Returns: The QueryPath object.
create(string $tag, array $attributes = [], string $content = '') -> QueryPath
Creates a new HTML element.
$tag: The tag name of the element.
$attributes: An array of attributes for the element.
$content: The content of the element.
Returns: A QueryPath object representing the created element.
createTextNode(string $text) -> QueryPath
Creates a new text node.
$text: The text content.
Returns: A QueryPath object representing the text node.
createComment(string $text) -> QueryPath
Creates a new comment node.
$text: The comment content.
Returns: A QueryPath object representing the comment node.
createDocumentFragment() -> DOMDocumentFragment
Creates a new DOMDocumentFragment.
Returns: A DOMDocumentFragment object.
createAttribute(string $name, string $value) -> DOMAttr
Creates a new DOMAttr.
$name: The name of the attribute.
$value: The value of the attribute.
Returns: A DOMAttr object.
createCdataSection(string $data) -> DOMCdataSection
Creates a new DOMCdataSection.
$data: The CDATA content.
Returns: A DOMCdataSection object.
createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction
Creates a new DOMProcessingInstruction.
$target: The target of the instruction.
$data: The instruction data.
Returns: A DOMProcessingInstruction object.
createEntityReference(string $name) -> DOMEntityReference
Creates a new DOMEntityReference.
$name: The name of the entity.
Returns: A DOMEntityReference object.
createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType
Creates a new DOMDocumentType.
$name: The name of the document type.
$publicId: The public identifier.
$systemId: The system identifier.
Returns: A DOMDocumentType object.
createNotation(string $name) -> DOMNotation
Creates a new DOMNotation.
$name: The name of the notation.
Returns: A DOMNotation object.
createCDATASection(string $data) -> DOMCdataSection
Creates a new DOMCdataSection.
$data: The CDATA content.
Returns: A DOMCdataSection object.
createComment(string $data) -> DOMComment
Creates a new DOMComment.
$data: The comment content.
Returns: A DOMComment object.
createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction
Creates a new DOMProcessingInstruction.
$target: The target of the instruction.
$data: The instruction data.
Returns: A DOMProcessingInstruction object.
createEntityReference(string $name) -> DOMEntityReference
Creates a new DOMEntityReference.
$name: The name of the entity.
Returns: A DOMEntityReference object.
createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType
Creates a new DOMDocumentType.
$name: The name of the document type.
$publicId: The public identifier.
$systemId: The system identifier.
Returns: A DOMDocumentType object.
createNotation(string $name) -> DOMNotation
Creates a new DOMNotation.
$name: The name of the notation.
Returns: A DOMNotation object.
createAttribute(string $name, string $value) -> DOMAttr
Creates a new DOMAttr.
$name: The name of the attribute.
$value: The value of the attribute.
Returns: A DOMAttr object.
createCDATASection(string $data) -> DOMCdataSection
Creates a new DOMCdataSection.
$data: The CDATA content.
Returns: A DOMCdataSection object.
createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction
Creates a new DOMProcessingInstruction.
$target: The target of the instruction.
$data: The instruction data.
Returns: A DOMProcessingInstruction object.
createEntityReference(string $name) -> DOMEntityReference
Creates a new DOMEntityReference.
$name: The name of the entity.
Returns: A DOMEntityReference object.
createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType
Creates a new DOMDocumentType.
$name: The document type name.
$publicId: The public identifier.
$systemId: The system identifier.
Returns: A DOMDocumentType object.
createNotation(string $name) -> DOMNotation
Creates a new DOMNotation.
$name: The name of the notation.
Returns: A DOMNotation object.
createComment(string $data) -> DOMComment
Creates a new DOMComment.
$data: The comment content.
Returns: A DOMComment object.
createCDATASection(string $data) -> DOMCdataSection
Creates a new DOMCdataSection.
$data: The CDATA content.
Returns: A DOMCdataSection object.
createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction
Creates a new DOMProcessingInstruction.
$target: The target of the instruction.
$data: The instruction data.
Returns: A DOMProcessingInstruction object.
createEntityReference(string $name) -> DOMEntityReference
Creates a new DOMEntityReference.
$name: The name of the entity.
Returns: A DOMEntityReference object.
createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType
Creates a new DOMDocumentType.
$name: The document type name.
$publicId: The public identifier.
$systemId: The system identifier.
Returns: A DOMDocumentType object.
createNotation(string $name) -> DOMNotation
Creates a new DOMNotation.
$name: The name of the notation.
Returns: A DOMNotation object.
createAttribute(string $name, string $value) -> DOMAttr
Creates a new DOMAttr.
$name: The name of the attribute.
$value: The value of the attribute.
Returns: A DOMAttr object.
createCDATASection(string $data) -> DOMCdataSection
Creates a new DOMCdataSection.
$data: The CDATA content.
Returns: A DOMCdataSection object.
createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction
Creates a new DOMProcessingInstruction.
$target: The target of the instruction.
$data: The instruction data.
Returns: A DOMProcessingInstruction object.
createEntityReference(string $name) -> DOMEntityReference
Creates a new DOMEntityReference.
$name: The name of the entity.
Returns: A DOMEntityReference object.
createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType
Creates a new DOMDocumentType.
$name: The document type name.
$publicId: The public identifier.
$systemId: The system identifier.
Returns: A DOMDocumentType object.
createNotation(string $name) -> DOMNotation
Creates a new DOMNotation.
$name: The name of the notation.
Returns: A DOMNotation object.
createAttribute(string $name, string $value) -> DOMAttr
Creates a new DOMAttr.
$name: The name of the attribute.
$value: The value of the attribute.
Returns: A DOMAttr object.
createCDATASection(string $data) -> DOMCdataSection
Creates a new DOMCdataSection.
$data: The CDATA content.
Returns: A DOMCdataSection object.
createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction
Creates a new DOMProcessingInstruction.
$target: The target of the instruction.
$data: The instruction data.
Returns: A DOMProcessingInstruction object.
createEntityReference(string $name) -> DOMEntityReference
Creates a new DOMEntityReference.
$name: The name of the entity.
Returns: A DOMEntityReference object.
createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType
Creates a new DOMDocumentType.
$name: The document type name.
$publicId: The public identifier.
$systemId: The system identifier.
Returns: A DOMDocumentType object.
createNotation(string $name) -> DOMNotation
Creates a new DOMNotation.
$name: The name of the notation.
Returns: A DOMNotation object.
createAttribute(string $name, string $value) -> DOMAttr
Creates a new DOMAttr.
$name: The name of the attribute.
$value: The value of the attribute.
Returns: A DOMAttr object.
createCDATASection(string $data) -> DOMCdataSection
Creates a new DOMCdataSection.
$data: The CDATA content.
Returns: A DOMCdataSection object.
createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction
Creates a new DOMProcessingInstruction.
$target: The target of the instruction.
$data: The instruction data.
Returns: A DOMProcessingInstruction object.
createEntityReference(string $name) -> DOMEntityReference
Creates a new DOMEntityReference.
$name: The name of the entity.
Returns: A DOMEntityReference object.
createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType
Creates a new DOMDocumentType.
$name: The document type name.
$publicId: The public identifier.
$systemId: The system identifier.
Returns: A DOMDocumentType object.
createNotation(string $name) -> DOMNotation
Creates a new DOMNotation.
$name: The name of the notation.
Returns: A DOMNotation object.
createAttribute(string $name, string $value) -> DOMAttr
Creates a new DOMAttr.
$name: The name of the attribute.
$value: The value of the attribute.
Returns: A DOMAttr object.
createCDATASection(string $data) -> DOMCdataSection
Creates a new DOMCdataSection.
$data: The CDATA content.
Returns: A DOMCdataSection object.
createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction
Creates a new DOMProcessingInstruction.
$target: The target of the instruction.
$data: The instruction data.
Returns: A DOMProcessingInstruction object.
createEntityReference(string $name) -> DOMEntityReference
Creates a new DOMEntityReference.
$name: The name of the entity.
Returns: A DOMEntityReference object.
createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType
Creates a new DOMDocumentType.
$name: The document type name.
$publicId: The public identifier.
$systemId: The system identifier.
Returns: A DOMDocumentType object.
createNotation(string $name) -> DOMNotation
Creates a new DOMNotation.
$name: The name of the notation.
Returns: A DOMNotation object.
createAttribute(string $name, string $value) -> DOMAttr
Creates a new DOMAttr.
$name: The name of the attribute.
$value: The value of the attribute.
Returns: A DOMAttr object.
createCDATASection(string $data) -> DOMCdataSection
Creates a new DOMCdataSection.
$data: The CDATA content.
Returns: A DOMCdataSection object.
createProcessingInstruction(string $target, string $data) -> DOMProcessingInstruction
Creates a new DOMProcessingInstruction.
$target: The target of the instruction.
$data: The instruction data.
Returns: A DOMProcessingInstruction object.
createEntityReference(string $name) -> DOMEntityReference
Creates a new DOMEntityReference.
$name: The name of the entity.
Returns: A DOMEntityReference object.
createDocumentType(string $name, string $publicId = '', string $systemId = '') -> DOMDocumentType
Creates a new DOMDocumentType.
$name: The document type name.
$publicId: The public identifier.
$systemId: The system identifier.
Returns: A DOMDocumentType object.
createNotation(string $name) -> DOMNotation
Creates a new DOMNotation.
$name: The name of the notation.
Returns: A DOMNotation object.
createAttribute(string $name, string $value) -> DOMAttr
Creates a new DOMAttr.
$name: The name of the attribute.
$value: The value of the attribute.
Returns: A DOMAttr object.
createCDATASection(string $data) -> DOMCdataSection
Creates a new DOMCdataSection.
$data: The CDATA content.
Returns: A DOMCdataSection object.
```
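As a minimal sketch of how factory methods like these are typically used, the following assumes the entries above mirror the methods of the same names on PHP's built-in `DOMDocument`. The element name `note` and the stylesheet path are made up for illustration. It relies only on the bundled DOM extension, not on QueryPath.

```php
<?php
// Sketch using PHP's built-in DOM extension (ext-dom); QueryPath is not required here.
$doc = new DOMDocument('1.0', 'UTF-8');

// A processing instruction appended before the root element lands in the document prolog.
$pi = $doc->createProcessingInstruction('xml-stylesheet', 'type="text/xsl" href="style.xsl"');
$doc->appendChild($pi);

// Hypothetical root element for the illustration.
$root = $doc->createElement('note');
$doc->appendChild($root);

// A CDATA section keeps markup-like characters literal instead of escaping them.
$root->appendChild($doc->createCDATASection('5 < 6 && 7 > 2'));

// An entity reference node is serialized as a named reference.
$root->appendChild($doc->createEntityReference('copy'));

$xml = $doc->saveXML();
echo $xml;
```

Each `create*` call only builds a detached node; nothing shows up in the serialized output until the node is attached with `appendChild()` or a similar method.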
--------------------------------
### Parse HTML/XML Documents
Source: https://github.com/gravitypdf/querypath/blob/main/README.md
Demonstrates how to parse HTML or XML documents using QueryPath. It shows loading from a file or a string, with options for HTML5 parsing (using masterminds/html5) or legacy libxml parsing.
```php
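<?php
// Assumes QueryPath was installed with Composer
require 'vendor/autoload.php';

try {
    // HTML5: uses the masterminds/html5 parser
    $qp = html5qp(__DIR__.'/path/to/file.html'); // load a file from disk
    $qp = html5qp('<p>You can pass a string of HTML directly to the function</p>'); // load a string
} catch (\QueryPath\Exception $e) {
    // Handle error
}

try {
    // Legacy: uses libxml to parse HTML
    $qp = htmlqp(__DIR__.'/path/to/file.html'); // load a file from disk
    $qp = htmlqp('<p>You can pass a string of HTML directly to the function</p>'); // load a string
} catch (\QueryPath\Exception $e) {
    // Handle error
}

try {
    // XML or XHTML
    $qp = qp(__DIR__.'/path/to/file.html'); // load a file from disk
    $qp = qp("<?xml version='1.0'?><root/>"); // load a string (illustrative markup)
} catch (\QueryPath\Exception $e) {
    // Handle error
}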
```
--------------------------------
### General CSS Styling
Source: https://github.com/gravitypdf/querypath/blob/main/examples/doc.html
Defines basic styling for HTML elements like body, hr, and p, ensuring a consistent look and feel across the documentation pages. It sets margins, fonts, and border styles.
```css
body {
    margin: 0;
    font-family: Lucida Sans,Lucida Grande,Helvetica,Arial,"Bitstream Vera Sans",sans-serif;
}
hr {
    border-top: 1px solid black;
    width: 100%;
    float: left;
    margin: 0;
}
p {
    margin: -10px 0 -10px 0;
}
```
--------------------------------
### QueryPath DOMQuery API Reference
Source: https://github.com/gravitypdf/querypath/wiki/How-to-parse-HTML-in-PHP-using-querypath-library
Provides an overview of the QueryPath DOMQuery class methods for parsing and manipulating HTML/XML. This includes methods for finding elements using CSS selectors or XPath, traversing the DOM tree, and extracting data.
```apidoc
QueryPath.DOMQuery:
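QueryPath::withHTML(html_string, selector = null, options = [])
Static factory on the QueryPath class (not a DOMQuery instance method). Parses an HTML string and returns a QueryPath object.
Parameters:
html_string: The HTML content to parse.
selector: Optional CSS selector applied to the parsed document.
options: An array of options for parsing (e.g., 'convert_to_encoding').
Returns: A QueryPath object (a DOMQuery instance).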
find(selector)
Selects elements matching the given CSS selector.
Parameters:
selector: A CSS selector string.
Returns: A QueryPath object representing the matched elements.
xpath(xpath_query)
Selects elements matching the given XPath query.
Parameters:
xpath_query: An XPath query string.
Returns: A QueryPath object representing the matched elements.
top(selector = null)
Selects the document element or an element matching the selector.
Parameters:
selector: Optional CSS selector for the root element.
Returns: A QueryPath object.
parents(selector = null)
Selects ancestor elements.
Parameters:
selector: Optional CSS selector to filter ancestors.
Returns: A QueryPath object.
parent(selector = null)
Selects the direct parent element.
Parameters:
selector: Optional CSS selector to filter the parent.
Returns: A QueryPath object.
siblings(selector = null)
Selects sibling elements.
Parameters:
selector: Optional CSS selector to filter siblings.
Returns: A QueryPath object.
next(selector = null)
Selects the next sibling element.
Parameters:
selector: Optional CSS selector to filter the next sibling.
Returns: A QueryPath object.
nextAll(selector = null)
Selects all subsequent sibling elements.
Parameters:
selector: Optional CSS selector to filter subsequent siblings.
Returns: A QueryPath object.
prev(selector = null)
Selects the previous sibling element.
Parameters:
selector: Optional CSS selector to filter the previous sibling.
Returns: A QueryPath object.
prevAll(selector = null)
Selects all preceding sibling elements.
Parameters:
selector: Optional CSS selector to filter preceding siblings.
Returns: A QueryPath object.
children(selector = null)
Selects immediate child elements.
Parameters:
selector: Optional CSS selector to filter children.
Returns: A QueryPath object.
deepest(selector = null)
Selects the deepest node(s) within the current selection.
Parameters:
selector: Optional CSS selector to filter the deepest nodes.
Returns: A QueryPath object.
text()
Gets the combined text content of the matched elements.
Returns: String containing the text content.
attr(attribute_name)
Gets the value of a specified attribute from the first matched element.
Parameters:
attribute_name: The name of the attribute.
Returns: String containing the attribute value.
html()
Gets the HTML content of the matched elements.
Returns: String containing the HTML content.
innerHtml()
Gets the inner HTML content of the matched elements.
Returns: String containing the inner HTML content.
```
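The reference above can be exercised with a short script. This is a sketch under stated assumptions: QueryPath is installed through Composer, the procedural `htmlqp()` function is available after loading `vendor/autoload.php`, and the markup and variable names are hypothetical. Each chain starts from `top()` so it begins at the document root regardless of earlier selections.

```php
<?php
// Assumes QueryPath was installed with Composer and htmlqp() is autoloaded.
require 'vendor/autoload.php';

// Hypothetical markup used only for this illustration.
$html = '<html><body>'
      . '<ul id="links">'
      . '<li><a href="/first">First</a></li>'
      . '<li><a href="/second">Second</a></li>'
      . '</ul></body></html>';

$qp = htmlqp($html);

// attr() reads the attribute from the first matched element.
$firstHref = $qp->top()->find('#links a')->attr('href');

// text() concatenates the text content of every matched element.
$allText = $qp->top()->find('#links a')->text();

// children() narrows the selection to immediate child elements.
$itemCount = $qp->top()->find('#links')->children('li')->count();

printf("%s | %s | %d\n", $firstHref, $allText, $itemCount);
```

Most traversal methods return a query object, which is what allows the calls to be chained. `text()`, `attr()`, and `count()` end the chain because they return plain values.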
--------------------------------
### Chained Traversing Functions (CSS Selector)
Source: https://github.com/gravitypdf/querypath/wiki/How-to-parse-HTML-in-PHP-using-querypath-library
Demonstrates using a chain of traversing functions with CSS selectors to find a specific table row element.
```php
$tr = $this->qp->top('body')->find('table[id="main"]')->find('tr:nth-child(3)');
```
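For context, here is a standalone sketch of the same chain. It swaps the `$this->qp` property for a local `$qp` variable built with `htmlqp()`. The table markup is hypothetical, and the script assumes QueryPath is installed through Composer.

```php
<?php
// Assumes QueryPath was installed with Composer and htmlqp() is autoloaded.
require 'vendor/autoload.php';

// Hypothetical three-row table used only for this illustration.
$html = '<html><body><table id="main">'
      . '<tr><td>Row 1</td></tr>'
      . '<tr><td>Row 2</td></tr>'
      . '<tr><td>Row 3</td></tr>'
      . '</table></body></html>';

$qp = htmlqp($html);

// Same chain as above: start at <body>, narrow to the table, then pick the third row.
$tr = $qp->top('body')->find('table[id="main"]')->find('tr:nth-child(3)');

// Guard against an empty selection before reading from it.
if ($tr->count() > 0) {
    echo trim($tr->text()), "\n";
} else {
    echo "No matching row\n";
}
```

Splitting the lookup into several `find()` calls keeps each selector simple. A single combined selector such as `table[id="main"] tr:nth-child(3)` would be an alternative way to write the same query.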