### Install reliq-python Library
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Instructions to install the reliq-python library using pip, the Python package installer.
```bash
pip install reliq
```
--------------------------------
### Traversing All Elements
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Provides an example of using the 'everything' axis to iterate through all elements within the HTML structure, using a generator.
```python
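from reliq import reliq

rq = reliq("""
<head>
  <title>Title</title>
</head>
<body>
  <section>
    <h1>Title</h1>
    <p>A</p>
  </section>
  <h2>List</h2>
  <ul>
    <li>A</li>
    <li>B</li>
    <li>C</li>
  </ul>
  <section>
    TEXT
  </section>
</body>
""")
# traverse everything through a generator
for i in rq.everything(True):
    print(str(i))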
```
--------------------------------
### Get Raw or String Data
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Retrieves the HTML content from which the reliq object was compiled. The 'raw' parameter determines if the output is bytes or a string.
```python
from pathlib import Path
from reliq import reliq

data = Path('index.html').read_bytes()
rq = reliq(data)
x = rq[0][2][1][8]
# if both objects are bytes() then their ids should be the same
x.get_data(True) is data
```
--------------------------------
### Reliq Error Handling
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Details the custom error types provided by the reliq library. Includes examples of catching `HtmlError` for exceeding depth limits and `ScriptError` for invalid expressions.
```python
from reliq import reliq

# Example of catching HtmlError
try:
    # 8193 nested opening tags exceed the HTML depth limit
    reliq('<div>' * 8193)
except reliq.HtmlError:
    print('HTML depth limit exceeded')

# Example of catching ScriptError
try:
    reliq.expr('| |')  # an invalid expression
except reliq.ScriptError:
    print('Incorrect expression')
```
--------------------------------
### Get All Nodes (self + descendants) in Reliq-Python
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
The `full()` method is a combination of `self()` and `descendants()`, retrieving all nodes within the context and all nodes below them. It effectively provides a complete view of the current node and its entire subtree.
```python
# struct
rq.full()
# [<tag head>, <tag title>, <tag body>, <tag section>, <tag h1>, <tag p>, <tag h2>, <tag ul>, <tag li>, <tag li>, <tag li>, <tag section>]
# list
rq.filter('[0] section').full()
# [<tag section>, <tag h1>, <tag p>]
# single
rq[1][0].full()
# [<tag section>, <tag h1>, <tag p>]
```
--------------------------------
### Get Preceding Nodes (preceding) in Reliq-Python
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
The `preceding()` method is similar to `before()` but ignores ancestors. It retrieves nodes with a lower `.position` property that are not ancestors of the context nodes. It does not work for 'struct' types.
```python
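# list
rq.filter('[0] title, [1] section').preceding()
# [<tag head>, <tag title>, <tag section>, <tag h1>, <tag p>, <tag h2>, <tag ul>, <tag li>, <tag li>, <tag li>]
# single
title = rq[0][0]
title.preceding() # all tags before it are its ancestors
# []
# single
second_section = rq[1][3]
second_section.preceding()
# [<tag head>, <tag title>, <tag section>, <tag h1>, <tag p>, <tag h2>, <tag ul>, <tag li>, <tag li>, <tag li>]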
```
--------------------------------
### Get Following Nodes (after) in Reliq-Python
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
The `after()` method retrieves all nodes that have a higher `.position` property than the context nodes. It does not work for 'struct' types. For nodes without following nodes, it returns an empty list.
```python
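# list
rq.filter('h2, ul').after()
# [<tag ul>, <tag li>, <tag li>, <tag li>, <tag section>, <tag li>, <tag li>, <tag li>, <tag section>]
# single
h2 = rq[1][1]
h2.after()
# [<tag ul>, <tag li>, <tag li>, <tag li>, <tag section>]
# single
ul = rq[1][2]
ul.after()
# [<tag li>, <tag li>, <tag li>, <tag section>]
# single
third_section = rq[1][3] # last element
third_section.after()
# []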
```
--------------------------------
### Get Preceding Nodes (before) in Reliq-Python
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
The `before()` method retrieves all nodes that have a lower `.position` property than the context nodes. It does not work for 'struct' types. For nodes without preceding nodes, it returns an empty list.
```python
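# list
rq.filter('[0] title, [1] section').before()
# [<tag head>, <tag head>, <tag title>, <tag body>, <tag section>, <tag h1>, <tag p>, <tag h2>, <tag ul>, <tag li>, <tag li>, <tag li>]
# single
title = rq[0][0]
title.before()
# [<tag head>]
# single
second_section = rq[1][3]
second_section.before()
# [<tag head>, <tag title>, <tag body>, <tag section>, <tag h1>, <tag p>, <tag h2>, <tag ul>, <tag li>, <tag li>, <tag li>]
# single
head = rq[0]
head.before() # the first element has no nodes before it
# []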
```
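The difference between `before()` and `preceding()` can be modeled without the library. The sketch below is a toy model, not reliq's implementation: `Node`, `NODES`, and the helper functions are hypothetical names, and the node list is assumed to hold only tag nodes in document order together with their nesting level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    position: int  # index in document order
    lvl: int       # nesting depth, 0 for top-level nodes


# Assumed document shape: head > title; body > (section > h1, p), h2, ul > li*3, section
SHAPE = [("head", 0), ("title", 1), ("body", 0), ("section", 1), ("h1", 2), ("p", 2),
         ("h2", 1), ("ul", 1), ("li", 2), ("li", 2), ("li", 2), ("section", 1)]
NODES = [Node(name, pos, lvl) for pos, (name, lvl) in enumerate(SHAPE)]


def ancestors(node):
    """Walk backwards, collecting the nearest node of each shallower level."""
    found, want = [], node.lvl - 1
    for other in reversed(NODES[:node.position]):
        if want < 0:
            break
        if other.lvl == want:
            found.append(other)
            want -= 1
    return found  # nearest ancestor (the parent) comes first


def before(node):
    """Every node with a lower position, ancestors included."""
    return [n for n in NODES if n.position < node.position]


def preceding(node):
    """Like before(), but with the ancestors filtered out."""
    skip = set(ancestors(node))
    return [n for n in before(node) if n not in skip]


def after(node):
    """Every node with a higher position."""
    return [n for n in NODES if n.position > node.position]


title, last_section = NODES[1], NODES[11]
print([n.name for n in before(title)])     # ['head']
print([n.name for n in preceding(title)])  # []
print(len(before(last_section)), len(preceding(last_section)))  # 11 10
```

In this model the only node that separates the two results for the last `section` is `body`, its single ancestor.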
--------------------------------
### Get All Descendant Nodes in Reliq-Python
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
The `descendants()` method retrieves all nodes that are one or more levels below the current context nodes. It applies to 'struct', 'list', and 'single' types, returning all nested nodes.
```python
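# struct
rq.descendants()
# [<tag title>, <tag section>, <tag h1>, <tag p>, <tag h2>, <tag ul>, <tag li>, <tag li>, <tag li>, <tag section>]
# list
rq.filter('[0] section').descendants()
# [<tag h1>, <tag p>]
# single
rq[1][0].descendants()
# [<tag h1>, <tag p>]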
```
--------------------------------
### Get Ancestor Nodes in Reliq-Python
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
The `ancestors()` method retrieves all ancestor nodes of the context nodes. It does not work for 'struct' types. For nodes without ancestors (e.g., top-level), it returns an empty list.
```python
# list
rq.filter('li').ancestors()
# [<tag ul>, <tag body>, <tag ul>, <tag body>, <tag ul>, <tag body>]
# single
rq[1][2][0].ancestors()
# [<tag ul>, <tag body>]
# first element of ancestors() should be the same as for parent()
rq[1][2][0].ancestors()[0].name == rq[1][2][0].parent()[0].name
# single
rq[0].ancestors() # top level nodes don't have ancestors
# []
```
--------------------------------
### Get Parent Nodes in Reliq-Python
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
The `parent()` method retrieves the immediate parent node of the context nodes. It does not work for 'struct' types. For nodes without parents (e.g., top-level), it returns an empty list.
```python
# list
rq.filter('li').parent()
# [<tag ul>, <tag ul>, <tag ul>]
# single
rq[1][2][0].parent()
# [<tag ul>]
# single
rq[0].parent() # top level nodes don't have parents
# []
```
--------------------------------
### Get Direct Children Nodes in Reliq-Python
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
The `children()` method retrieves all nodes that are directly one level below the current context nodes. It works across 'struct', 'list', and 'single' types, returning immediate descendants.
```python
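# struct
rq.children()
# [<tag title>, <tag section>, <tag h2>, <tag ul>, <tag section>]
# list
rq.filter('head, ul').children()
# [<tag title>, <tag li>, <tag li>, <tag li>]
# single
first_section = rq[1][0]
first_section.children()
# [<tag h1>, <tag p>]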
```
--------------------------------
### Get Context Nodes (self) in Reliq-Python
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
The `self()` method retrieves nodes based on the context type. For 'single' and 'list' types, it returns unfiltered nodes. For 'struct' types, it filters by `reliq.Type.tag`. It can also accept a `type` argument for custom filtering.
```python
# rq is a reliq.Type.struct object
rq.self()
# [<tag head>, <tag body>]
rq.self(type=None)
# [, , , , , ]
rq.self(type=reliq.Type.tag|reliq.Type.comment)
# [,, ]
# ls is a reliq.Type.list object that has comments and text types
ls = rq.filter('[:3] ( comment@ * )( text@ * )')
ls.self()
# [, , , ]
ls.self(type=reliq.Type.tag|reliq.Type.comment)
# []
# body is a reliq.Type.single object
body = rq[1].self()
len(body.self())
# 1
body.self()[0].name
# "body"
```
--------------------------------
### Get Relative Parent Nodes (rparent) in Reliq-Python
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
The `rparent()` method behaves like `parent()` but returns the parent to which the current object is relative. It does not take a `rel` argument, and returned objects are always relative. It does not work for 'struct' types.
```python
# Example usage for rparent() would follow similar patterns to parent(),
# but the provided text does not contain explicit code examples for rparent().
# It is described as a variation of parent() that always returns relative objects.
```
--------------------------------
### Initialize reliq Object from String, File, or Empty
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Illustrates various ways to instantiate the `reliq` object, including directly from an HTML string, from a file path using `Path()`, or as an empty object.
```Python
rq = reliq('<p>Example</p>') #passed directly
rq2 = reliq(Path('index.html')) #passed from file
rq3 = reliq(None) # empty object
rq4 = reliq() # empty object
```
--------------------------------
### Importing reliq and RQ Classes
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Shows how to import the main `reliq` class and its alias `RQ` from the `reliq` library for use in Python applications.
```Python
from reliq import reliq, RQ
```
--------------------------------
### Reliq Project Wrapper (RQ)
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Introduces the `RQ` function for creating project-specific `reliq` instances. It allows for caching compiled expressions and managing paths relative to the calling function's directory, promoting modularity.
```python
from reliq import RQ
# Create a cached RQ instance
reliq_cached = RQ(cached=True)
# Use the instance to parse HTML
rq_element = reliq_cached('<p>Alive!</p>')
print(rq_element)
# RQ function signature: def RQ(path="", cached=False)
# - path: Directory to save/cache expressions. Merged with calling function's directory if not absolute.
# - cached: If True, compiled expressions are saved and reused.
# Paths passed to functions accepting expressions are relative to the first declared 'path' argument,
# unless they are absolute or start with './' or '../'.
```
--------------------------------
### Initialize reliq Object with Reference URL (`ref`)
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Explains how the optional `ref` argument influences the base URL of the parsed HTML document. It demonstrates its behavior when a base tag is present in the HTML or when `ref` is an empty string.
```Python
rq4 = reliq('<base href="https://wikipedia.org">', ref="")
rq4.ref # https://wikipedia.org
```
--------------------------------
### URL Joining Functions
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Explains `urljoin` and `ujoin` functions, which are similar to `urllib.parse.urljoin`. `urljoin` supports byte arguments and returns based on the `raw` parameter, while `ujoin` uses a default reference URL.
```APIDOC
urljoin(base_url, url, *, raw=False)
- Joins a base URL with another URL, similar to urllib.parse.urljoin.
- Supports byte arguments for base_url and url.
- Returns str or bytes based on the raw argument.
- Parameters:
- base_url: The base URL (str or bytes).
- url: The URL to join (str or bytes).
- raw: If True, returns bytes; otherwise, returns str.
ujoin(url, *, raw=False)
- Works like urljoin but uses a default reference URL for the 'ref' argument.
- Parameters:
- url: The URL to join (str or bytes).
- raw: If True, returns bytes; otherwise, returns str.
```
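Since both functions are described as similar to `urllib.parse.urljoin`, the standard-library function gives a feel for the joining rules. This sketch uses only `urllib.parse`, not reliq, and the URLs are illustrative. The standard-library version requires both arguments to be of the same type and has no `raw` parameter.

```python
from urllib.parse import urljoin

base = "https://en.wikipedia.org/wiki/Main_Page"

# An absolute path replaces the path of the base URL
print(urljoin(base, "/wiki/HTML"))             # https://en.wikipedia.org/wiki/HTML
# A relative path replaces the last path segment
print(urljoin(base, "HTML"))                   # https://en.wikipedia.org/wiki/HTML
# A full URL ignores the base entirely
print(urljoin(base, "https://example.com/a"))  # https://example.com/a
# Bytes in, bytes out (both arguments must be bytes)
print(urljoin(base.encode(), b"HTML"))         # b'https://en.wikipedia.org/wiki/HTML'
```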
--------------------------------
### Reference URL Access
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Shows how to access the saved reference URL provided during the initialization of a reliq object, both as a string and as raw bytes.
```python
rq = reliq('',ref="http://en.wikipedia.org")
rq.ref # "http://en.wikipedia.org"
rq.ref_raw # b"http://en.wikipedia.org"
```
--------------------------------
### Item Access and Indexing
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Explains how the __getitem__ method works for accessing elements, distinguishing between 'single' and 'list' results based on the axis used.
```python
rq = reliq('<div><p>1</p> Text <b>H</b></div>')
first = rq[0] # struct
# <tag div>
first[1] # single
# <tag b>
r = first.filter('( text@ * )( * ) child@')
r[1] # list
# " Text " obj
r[2] == first[1]
--------------------------------
### Decode HTML Entities with Reliq
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Demonstrates the `reliq.decode` function for converting HTML entities back into characters. Shows variations using `raw=True` and `no_nbsp=False` to control output format and non-breaking space handling.
```python
from reliq import reliq

# Decode with raw=True, returning bytes
print(reliq.decode(r"text &amp; &lt; &tdot; &#212;", True))
# Expected output: b'text & < \xe2\x83\x9b\xe2\x83\x9b \xc3\x94'
# Decode with no_nbsp=False, preserving non-breaking space
print(reliq.decode('ex&nbsp;t', no_nbsp=False))
# Expected output: 'ex\xa0t'
# Decode with raw=True and no_nbsp=False, returning bytes
print(reliq.decode('ex&nbsp;t', True, no_nbsp=False))
# Expected output: b'ex\xc2\xa0t'
```
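For comparison, the standard library's `html.unescape` performs the same kind of entity decoding. The sketch below is an analogy only and does not call reliq. The final line replaces the non-breaking space with a plain space, which is an assumption about what suppressing non-breaking spaces amounts to.

```python
import html

decoded = html.unescape("text &amp; &lt; &#212;")
print(decoded)                # text & < Ô

nbsp = html.unescape("ex&nbsp;t")
print(repr(nbsp))             # 'ex\xa0t'
print(nbsp.encode("utf-8"))   # b'ex\xc2\xa0t'

# Assumed equivalent of dropping non-breaking spaces: swap them for a plain space
plain = nbsp.replace("\xa0", " ")
print(plain)                  # ex t
```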
--------------------------------
### Tag Properties and Content Access
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Illustrates accessing various properties of tag elements, including structural levels, counts of tags/text/comments, attributes, and content.
```python
rq = reliq("""
\n '
```
--------------------------------
### Comprehensive reliq-python HTML Parsing and Data Extraction
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Demonstrates advanced usage of the reliq library for parsing HTML, compiling complex expressions, filtering elements, traversing ancestors, searching with compiled expressions, handling errors, and extracting text content. It also shows how to decode HTML entities and convert parsed data to a dictionary based on a structured expression.
```Python
import json
import re

from reliq import reliq

html = ""
with open('index.html','r') as f:
    html = f.read()

rq = reliq(html) #parse html
expr = reliq.expr(r"""
    div .user; {
        a href; {
            .name @ | "%i",
            .link @ | "%(href)v"
        },
        .score.u span .score,
        .info dl; {
            .key dt | "%i",
            .value dd | "%i"
        } |,
        .achievements.a li class=b>"achievement-" | "%i\n"
    }
""") #expressions can be compiled

users = []
links = []

for i in rq.filter(r'table; { tr, text@ iw>lisp }')[:-2]:
    # ignore comments and text nodes
    if i.type is not reliq.Type.tag:
        continue

    first_child = i[0]
    if first_child.desc_count < 3 and first_child.name == "div" and first_child.starttag == '<div>':
        continue

    link = first_child[2].attrib['href']
    if re.match('^https://$',link):
        links.append(link)
        continue

    #make sure that object is an ancestor of tag
    for j in i.ancestors():
        if j.name == "main":
            break
    else:
        continue

    #search() returns str, in this case expression is already compiled
    # but can be also passed as a str() or bytes(). If Path() is passed
    # file will be read
    user = json.loads(i.search(expr))
    users.append(user)

try: #handle errors
    rq.search('p / /','')
except reliq.ScriptError: # all errors inherit from reliq.Error
    print("error")

#get text from all text nodes that are descendants of object
print(rq[2].text_recursive)
#get text from all text nodes that are children of object
print(rq[2].text)

#decode html entities
reliq.decode('loop &amp; &lt; &tdot; &#212;')

#execute and convert to dictionary
rq.json(r"""
    .files * #files; ( li )( span .head ); {
        .type i class child@ | "%(class)v" / sed "s/^flaticon-//",
        .name @ | "%Dt" / trim sed "s/ ([^)]* [a-zA-Z][Bb])$//",
        .size @ | "%t" / sed 's/.* \(([^)]* [a-zA-Z][Bb])\)$/\1/; s/,//g; /^[0-9].* [a-zA-Z][bB]$/!d' "E"
    } |
""") #dict format is enforced and any incompatible expressions will raise reliq.ScriptError
--------------------------------
### Encode Strings with Reliq
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Illustrates the `reliq.encode` function for converting special characters and HTML entities into their encoded forms. Covers options like `raw=True` for byte output and `full=True` for comprehensive entity encoding.
```python
from reliq import reliq

text = "<p>li & \t 'seq' \n </p>"
# Encode a string with standard HTML entities; returns str,
# with markup characters such as '<' and '>' written as &lt; and &gt;
print(reliq.encode(text))
# Encode with raw=True, returning bytes
print(reliq.encode(text, True))
# Encode with full=True for comprehensive entity encoding;
# characters such as the tab and the newline are then written as entities too
print(reliq.encode(text, True, full=True))
```
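The standard library's `html.escape` is a close analog of basic entity encoding and shows the round trip with decoding. This sketch does not use reliq, and reliq's exact output (in particular with `full=True`) may differ from what `html.escape` produces.

```python
import html

text = "<p>li & 'seq'</p>"

# Escape markup characters; quotes are escaped too by default
escaped = html.escape(text)
print(escaped)  # &lt;p&gt;li &amp; &#x27;seq&#x27;&lt;/p&gt;

# Decoding the escaped text returns the original string
round_trip = html.unescape(escaped)
print(round_trip == text)  # True

# Bytes output is one explicit encode() away
escaped_bytes = escaped.encode("utf-8")
print(escaped_bytes)
```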
--------------------------------
### String Representation of Objects
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Demonstrates the string conversion for reliq objects, including the full structure, filtered lists, single elements, and empty objects.
```python
rq = reliq("""
<h1>H1</h1>
<h2>N2</h2>
<h2>N3</h2>
""")
str(rq) # struct
# '\n<h1>H1</h1>\n<h2>N2</h2>\n<h2>N3</h2>\n'
str(rq.filter('h2')) # list
# '<h2>N2</h2><h2>N3</h2>'
str(rq[0]) # single
# '<h1>H1</h1>'
str(reliq()) # empty
# ''
```
--------------------------------
### Relativity and Context in Reliq Filtering
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Explains the concept of 'relativity' in reliq, where filtered objects retain a pointer to their context node. Demonstrates how `.rlvl`, `.rposition`, and `.rparent()` work with relative selections.
```python
from reliq import reliq
rq = reliq("""
  <body>
    <nav>
      <ul>
        <li> A </li>
        <li> B </li>
        <li> C </li>
      </ul>
    </nav>
  </body>
""")
# Non-relative selection
li = rq[0][0][0][1] # Selects the second 'li' element
# Relative selection based on a filter
# 'li i@w>"B"' selects 'li' elements whose inner text contains the word "B"
li_rel = rq.filter('nav; li i@w>"B"')[0] # Relative to 'nav'
# .rlvl and .rposition show relative levels and positions
print(f"li.rlvl: {li.rlvl}, li_rel.rlvl: {li_rel.rlvl}")
# Expected output: li.rlvl: 3, li_rel.rlvl: 2
print(f"li.rposition: {li.rposition}, li_rel.rposition: {li_rel.rposition}")
# Expected output: li.rposition: 10, li_rel.rposition: 9
# Using .rparent() to get the parent relative to the current node
nav_rel = li_rel.rparent()[0]
print(f"nav_rel.rlvl: {nav_rel.rlvl}, nav_rel.rposition: {nav_rel.rposition}")
# Expected output: nav_rel.rlvl: -2, nav_rel.rposition: -7
# Iterating descendants with relativity
nav = rq[0][0]
for i in nav.descendants(rel=True):
    if i.rlvl == 2 and i.name == 'li':
        print(f"Descendant li: lvl={i.lvl}, rlvl={i.rlvl}")
        # Expected output: Descendant li: lvl=3, rlvl=2
        break
```
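One way to read the relative properties is as offsets from the context node. The toy model below is a sketch under that assumption and is not reliq's implementation: the `Node` class, its `context` field, and the absolute positions 3 and 10 are all hypothetical, chosen so that the level and parent offsets match the values quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    lvl: int
    position: int
    context: Optional["Node"] = None  # node this one is relative to, if any

    @property
    def rlvl(self) -> int:
        # Level measured from the context node (absolute when there is none)
        return self.lvl - (self.context.lvl if self.context else 0)

    @property
    def rposition(self) -> int:
        # Position measured from the context node (absolute when there is none)
        return self.position - (self.context.position if self.context else 0)


nav = Node("nav", lvl=1, position=3)
li = Node("li", lvl=3, position=10)                    # non-relative
li_rel = Node("li", lvl=3, position=10, context=nav)   # relative to nav
nav_rel = Node("nav", lvl=1, position=3, context=li_rel)  # parent seen from li

print(li.rlvl, li_rel.rlvl)             # 3 2
print(nav_rel.rlvl, nav_rel.rposition)  # -2 -7
```

Negative values fall out naturally in this reading: a relative parent sits above and before the node it is measured from.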
--------------------------------
### Comment Content Access
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Demonstrates how to retrieve the content of HTML comments using the 'insides' property and how comments can be converted to bytes or strings.
```python
c = reliq('<!-- Comment -->').self(type=None)[0]
c.insides
# ' Comment '
bytes(c)
# b'<!-- Comment -->'
str(c)
# '<!-- Comment -->'
```
--------------------------------
### Text Content Conversion
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Shows the conversion of plain text elements into string format using the str() function.
```python
t = reliq('Example').self(type=None)[0]
str(t)
# 'Example'
```
--------------------------------
### reliq.Type Enum and Object States
Source: https://github.com/tuvimen/reliq-python/blob/master/README.md
Documents the `reliq.Type` enumeration, which defines the internal state or type of a `reliq` object. This type influences the behavior of various methods. It describes the `empty`, `unknown`, `struct`, and `list` types.
```APIDOC
reliq.Type(Flag):
- empty:
Description: Returned from `reliq(None)` or `reliq.filter()` when no matches are found. All methods return default values.
- unknown:
Description: Similar to `empty` but should ideally not occur.
- struct:
Description: Returned upon successful initialization, e.g., `reliq('<p>Example</p>')`.
- list:
Description: Returned by `reliq.filter()` when it successfully finds matches.
```
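Because `reliq.Type` is declared as a `Flag`, its members can be combined with `|`, as in `self(type=reliq.Type.tag|reliq.Type.comment)`. The sketch below shows how Python's `enum.Flag` behaves in general. `NodeType` is a hypothetical stand-in, and its member list is not taken from the library.

```python
from enum import Flag, auto


class NodeType(Flag):  # hypothetical stand-in for reliq.Type
    tag = auto()
    comment = auto()
    text = auto()


# Combine members into a mask, then test membership
wanted = NodeType.tag | NodeType.comment
print(NodeType.tag in wanted)           # True
print(NodeType.text in wanted)          # False
print(bool(wanted & NodeType.comment))  # True
```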