### Install reliq-python Library

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

Instructions to install the reliq-python library using pip, the Python package installer.

```shell
pip install reliq
```

--------------------------------

### Traversing All Elements

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

Provides an example of using the `everything` axis to iterate through all elements within the HTML structure, using a generator.

```python
rq = reliq(""" Title

Title

A

List

TEXT
") #traverse everything through generator for i in rq.everything(True): print(str(i)) ``` -------------------------------- ### Get Raw or String Data Source: https://github.com/tuvimen/reliq-python/blob/master/README.md Retrieves the HTML content from which the reliq object was compiled. The 'raw' parameter determines if the output is bytes or a string. ```python data = Path('index.html').read_bytes rq = reliq(data) x = rq[0][2][1][8] # if both objects are bytes() then their ids should be the same x.get_data(True) is data ``` -------------------------------- ### Reliq Error Handling Source: https://github.com/tuvimen/reliq-python/blob/master/README.md Details the custom error types provided by the reliq library. Includes examples of catching `HtmlError` for exceeding depth limits and `ScriptError` for invalid expressions. ```python import reliq # Example of catching HtmlError try: # This might exceed a hypothetical HTML depth limit reliq.parse('
' * 8193)
except reliq.HtmlError:
    print('HTML depth limit exceeded')

# Example of catching ScriptError
try:
    reliq.parse('| |')  # An invalid expression
except reliq.ScriptError:
    print('Incorrect expression')
```

--------------------------------

### Get All Nodes (self + descendants) in Reliq-Python

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

The `full()` method combines `self()` and `descendants()`: it returns the context nodes together with every node below them, giving a complete view of the current node and its entire subtree.

```python
# struct
rq.full()
# [, , , , , , , , , , , ]

# list
# Note: The example provided in the source text for list/single appears to be descendants(), not full().
rq.filter('[0] section').descendants()
# [, , ]

# single
rq[1][0].descendants()
# [, , ]
```

--------------------------------

### Get Preceding Nodes (preceding) in Reliq-Python

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

The `preceding()` method is similar to `before()` but ignores ancestors. It retrieves nodes with a lower `.position` property that are not ancestors of the context nodes. It does not work for 'struct' types.

```python
# list
rq.filter('[0] title, [1] section').preceding()
# [, , , , , , , , , ]

# single
title = rq[0][0]
title.preceding() # all tags before it are its ancestors
# []

# single
second_section = rq[1][3]
second_section.preceding()
# [, , , , , , , , , ]
```

--------------------------------

### Get Following Nodes (after) in Reliq-Python

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

The `after()` method retrieves all nodes that have a higher `.position` property than the context nodes. It does not work for 'struct' types. For nodes without following nodes, it returns an empty list.
```python
# list
rq.filter('h2, ul').after()
# [, , , , , , , , ]

# single
h2 = rq[1][1]
h2.after()
# [, , , , ]

# single
ul = rq[1][2]
ul.after()
# [, , , ]

# single
third_section = rq[1][3] # last element
third_section.after()
# []
```

--------------------------------

### Get Preceding Nodes (before) in Reliq-Python

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

The `before()` method retrieves all nodes that have a lower `.position` property than the context nodes. It does not work for 'struct' types. For nodes without preceding nodes, it returns an empty list.

```python
# list
rq.filter('[0] title, [1] section').before()
# [, , , , , , , , , , , ]

# single
title = rq[0][0]
title.before()
# []

# single
second_section = rq[1][3]
second_section.before()
# [, , , , , , , , , , ]

# single
head = rq[0]
head.before() #first element doesn't have any nodes before it
# []
```

--------------------------------

### Get All Descendant Nodes in Reliq-Python

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

The `descendants()` method retrieves all nodes that are one or more levels below the current context nodes. It applies to 'struct', 'list', and 'single' types, returning all nested nodes.

```python
# struct
rq.descendants()
# [, , , , , , , , , ]

# list
rq.filter('[0] section').descendants()
# [, ]

# single
rq[1][0].descendants()
# [, ]
```

--------------------------------

### Get Ancestor Nodes in Reliq-Python

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

The `ancestors()` method retrieves all ancestor nodes of the context nodes. It does not work for 'struct' types. For nodes without ancestors (e.g., top-level), it returns an empty list.
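The position-based axes described in these sections can be illustrated without reliq itself. The sketch below is a toy model, not the library's implementation: `Node`, `number`, `ancestors`, `before`, `preceding`, and `after` are hypothetical helpers written for this example, and the small document tree is an assumed stand-in. It numbers nodes in document order and then applies the definitions from the text: `before` keeps lower positions, `preceding` additionally drops ancestors, and `after` keeps higher positions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for an element; this is not a reliq class."""
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    position: int = -1

    def add(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def number(roots: List[Node]) -> List[Node]:
    """Assign positions in document order (pre-order) and return every node."""
    order: List[Node] = []

    def walk(node: Node) -> None:
        node.position = len(order)
        order.append(node)
        for child in node.children:
            walk(child)

    for root in roots:
        walk(root)
    return order


def ancestors(node: Node) -> List[Node]:
    """Parent first, then grandparent, and so on up to the top level."""
    found = []
    while node.parent is not None:
        node = node.parent
        found.append(node)
    return found


def before(order: List[Node], node: Node) -> List[Node]:
    return [n for n in order if n.position < node.position]


def preceding(order: List[Node], node: Node) -> List[Node]:
    skip = {id(a) for a in ancestors(node)}
    return [n for n in before(order, node) if id(n) not in skip]


def after(order: List[Node], node: Node) -> List[Node]:
    return [n for n in order if n.position > node.position]


# Assumed example tree: head > title; body > section(h1, p), h2, ul(li x3), section
head, body = Node("head"), Node("body")
title = head.add("title")
first_section = body.add("section")
first_section.add("h1")
first_section.add("p")
h2 = body.add("h2")
ul = body.add("ul")
for _ in range(3):
    ul.add("li")
last_section = body.add("section")
order = number([head, body])

print([n.name for n in before(order, title)])     # ['head']
print([n.name for n in preceding(order, title)])  # []
print([n.name for n in after(order, h2)])         # ['ul', 'li', 'li', 'li', 'section']
print(len(before(order, last_section)), len(preceding(order, last_section)))  # 11 10
```

In this model `before` and `preceding` differ only by the ancestor chain, which is why `title` has one node before it (`head`) but none preceding it.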
```python
# list
rq.filter('li').ancestors()
# [, , , , , ]

# single
rq[1][2][0].ancestors()
# [, ]

# first element of ancestors() should be the same as for parent()
rq[1][2][0].ancestors()[0].name == rq[1][2][0].parent()[0].name

# single
rq[0].ancestors() # top level nodes don't have ancestors
# []
```

--------------------------------

### Get Parent Nodes in Reliq-Python

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

The `parent()` method retrieves the immediate parent node of the context nodes. It does not work for 'struct' types. For nodes without parents (e.g., top-level), it returns an empty list.

```python
# list
rq.filter('li').parent()
# [, , ]

# single
rq[1][2][0].parent()
# []

# single
rq[0].parent() # top level nodes don't have parents
# []
```

--------------------------------

### Get Direct Children Nodes in Reliq-Python

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

The `children()` method retrieves all nodes that are directly one level below the current context nodes. It works across 'struct', 'list', and 'single' types, returning immediate descendants.

```python
# struct
rq.children()
# [, , , , ]

# list
rq.filter('head, ul').children()
# [, , , ]

# single
first_section = rq[1][0]
first_section.children()
# [, ]
```

--------------------------------

### Get Context Nodes (self) in Reliq-Python

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

The `self()` method retrieves nodes based on the context type. For 'single' and 'list' types, it returns unfiltered nodes. For 'struct' types, it filters by `reliq.Type.tag`. It can also accept a `type` argument for custom filtering.
```python
# rq is a reliq.Type.struct object
rq.self()
# [, ]
rq.self(type=None)
# [, , , , , ]
rq.self(type=reliq.Type.tag|reliq.Type.comment)
# [,, ]

# ls is a reliq.Type.list object that has comments and text types
ls = rq.filter('[:3] ( comment@ * )( text@ * )')
ls.self()
# [, , , ]
ls.self(type=reliq.Type.tag|reliq.Type.comment)
# []

# body is a reliq.Type.single object
body = rq[1].self()
len(body.self())
# 1
body.self()[0].name
# "body"
```

--------------------------------

### Get Relative Parent Nodes (rparent) in Reliq-Python

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

The `rparent()` method behaves like `parent()` but returns the parent to which the current object is relative. It does not take a `rel` argument, and returned objects are always relative. It does not work for 'struct' types.

```python
# Example usage for rparent() would follow similar patterns to parent(),
# but the provided text does not contain explicit code examples for rparent().
# It is described as a variation of parent() that always returns relative objects.
```

--------------------------------

### Initialize reliq Object from String, File, or Empty

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

Illustrates various ways to instantiate the `reliq` object, including directly from an HTML string, from a file path using `Path()`, or as an empty object.

```Python
rq = reliq('

Example

') #passed directly
rq2 = reliq(Path('index.html')) #passed from file
rq3 = reliq(None) # empty object
rq4 = reliq() # empty object
```

--------------------------------

### Importing reliq and RQ Classes

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

Shows how to import the main `reliq` class and its alias `RQ` from the `reliq` library for use in Python applications.

```Python
from reliq import reliq, RQ
```

--------------------------------

### Reliq Project Wrapper (RQ)

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

Introduces the `RQ` function for creating project-specific `reliq` instances. It allows for caching compiled expressions and managing paths relative to the calling function's directory, promoting modularity.

```python
from reliq import RQ

# Create a cached RQ instance
reliq_cached = RQ(cached=True)

# Use the instance to parse HTML
rq_element = reliq_cached('

Alive!

')
print(rq_element)

# RQ function signature: def RQ(path="", cached=False)
# - path: Directory to save/cache expressions. Merged with calling function's directory if not absolute.
# - cached: If True, compiled expressions are saved and reused.
# Paths passed to functions accepting expressions are relative to the first declared 'path' argument,
# unless they are absolute or start with './' or '../'.
```

--------------------------------

### Initialize reliq Object with Reference URL (`ref`)

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

Explains how the optional `ref` argument influences the base URL of the parsed HTML document. It demonstrates its behavior when a base tag is present in the HTML or when `ref` is an empty string.

```Python
rq = reliq('

Example

')
rq.ref # None

rq2 = reliq(b'

Second example

',ref="http://en.wikipedia.org")
rq2.ref # http://en.wikipedia.org

rq3 = reliq(b'

Second example

',ref="http://en.wikipedia.org")
rq3.ref # https://wikipedia.org

rq4 = reliq(b'

Second example

',ref="")
rq4.ref # https://wikipedia.org
```

--------------------------------

### URL Joining Functions

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

Explains the `urljoin` and `ujoin` functions, which are similar to `urllib.parse.urljoin`. `urljoin` accepts byte arguments and returns `str` or `bytes` depending on the `raw` parameter, while `ujoin` uses a default reference URL.

```APIDOC
urljoin(base_url, url, *, raw=False)
  - Joins a base URL with another URL, similar to urllib.parse.urljoin.
  - Supports byte arguments for base_url and url.
  - Returns str or bytes based on the raw argument.
  - Parameters:
    - base_url: The base URL (str or bytes).
    - url: The URL to join (str or bytes).
    - raw: If True, returns bytes; otherwise, returns str.

ujoin(url, *, raw=False)
  - Works like urljoin but uses a default reference URL for the 'ref' argument.
  - Parameters:
    - url: The URL to join (str or bytes).
    - raw: If True, returns bytes; otherwise, returns str.
```

--------------------------------

### Reference URL Access

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

Shows how to access the saved reference URL provided during the initialization of a reliq object, both as a string and as raw bytes.

```python
rq = reliq('',ref="http://en.wikipedia.org")

rq.ref
# "http://en.wikipedia.org"
rq.ref_raw
# b"http://en.wikipedia.org"
```

--------------------------------

### Item Access and Indexing

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

Explains how the `__getitem__` method works for accessing elements, distinguishing between 'single' and 'list' results based on the axis used.

```python
rq = reliq('

1

Text H
') first = rq[0] # struct #
first[1] # single # r = first.filter('( text@ * )( * ) child@') r[1] # list # " Text " obj r[2] == first[1] ``` -------------------------------- ### Decode HTML Entities with Reliq Source: https://github.com/tuvimen/reliq-python/blob/master/README.md Demonstrates the `reliq.decode` function for converting HTML entities back into characters. Shows variations using `raw=True` and `no_nbsp=False` to control output format and non-breaking space handling. ```python import reliq # Decode with default settings print(reliq.decode(r"text & < ⃛ Ô")) # Expected output: b'text & < \xe2\x83\x9b\xe2\x83\x9b \xc3\x94' # Decode with no_nbsp=False, preserving non-breaking space print(reliq.decode('ex t', no_nbsp=False)) # Expected output: 'ex\xa0t' # Decode with raw=True and no_nbsp=False, returning bytes print(reliq.decode('ex t', True, no_nbsp=False)) # Expected output: b'ex\xc2\xa0t' ``` -------------------------------- ### Tag Properties and Content Access Source: https://github.com/tuvimen/reliq-python/blob/master/README.md Illustrates accessing various properties of tag elements, including structural levels, counts of tags/text/comments, attributes, and content. ```python rq = reliq("""
""") ul = rq[0][0] a = ul[0] li1 = a[0] li2 = ul[1] ul.name # 'ul' ul.name_raw # b'ul' ul.lvl # 1 li1.lvl # 3 ul.text # '\n \n \n ' ul.text_recursive # '\n \n L1\n \n L2\n ' a.insides # '\n
  • L1
  • \n ' ``` -------------------------------- ### Comprehensive reliq-python HTML Parsing and Data Extraction Source: https://github.com/tuvimen/reliq-python/blob/master/README.md Demonstrates advanced usage of the reliq library for parsing HTML, compiling complex expressions, filtering elements, traversing ancestors, searching with compiled expressions, handling errors, and extracting text content. It also shows how to decode HTML entities and convert parsed data to a dictionary based on a structured expression. ```Python from reliq import reliq html = "" with open('index.html','r') as f: html = f.read() rq = reliq(html) #parse html expr = reliq.expr(r""" div .user; { a href; { .name @ | "%i", .link @ | "%(href)v" }, .score.u span .score, .info dl; { .key dt | "%i", .value dd | "%i" } |, .achievements.a li class=b>"achievement-" | "%i\n" } """) #expressions can be compiled users = [] links = [] for i in rq.filter(r'table; { tr, text@ iw>lisp }')[:-2]: # ignore comments and text nodes if i.type is not reliq.Type.tag: continue first_child = i[0] if first_child.desc_count < 3 and first_child.name == "div" and first_child.starttag == '
    ': continue link = first_child[2].attrib['href'] if re.match('^https://$',link): links.append(link) continue #make sure that object is an ancestor of
    tag for j in i.ancestors(): if j.name == "main": break else: continue #search() returns str, in this case expression is already compiled # but can be also passed as a str() or bytes(). If Path() is passed # file will be read user = json.loads(i.search(expr)) users.append(user) try: #handle errors rq.search('p / /','

    ') except reliq.ScriptError: # all errors inherit from reliq.Error print("error") #get text from all text nodes that are descendants of object print(rq[2].text_recursive) #get text from all text nodes that are children of object print(rq[2].text) #decode html entities reliq.decode('loop & < ⃛ Ô') #execute and convert to dictionary rq.json(r""" .files * #files; ( li )( span .head ); { .type i class child@ | "%(class)v" / sed "s/^flaticon-//", .name @ | "%Dt" / trim sed "s/ ([^)]* [a-zA-Z][Bb])$//", .size @ | "%t" / sed 's/.* \(([^)]* [a-zA-Z][Bb])\)$/\1/; s/,//g; /^[0-9].* [a-zA-Z][bB]$/!d' "E" } | """) #dict format is enforced and any incompatible expressions will raise reliq.ScriptError ``` -------------------------------- ### Encode Strings with Reliq Source: https://github.com/tuvimen/reliq-python/blob/master/README.md Illustrates the `reliq.encode` function for converting special characters and HTML entities into their encoded forms. Covers options like `raw=True` for byte output and `full=True` for comprehensive entity encoding. ```python import reliq # Encode a string with standard HTML entities print(reliq.encode("

    li & \t 'seq' \n

    ")) # Expected output: '<p>li &amp; \t 'seq' \n </p>' # Encode with raw=True, returning bytes print(reliq.encode("

    li & \t 'seq' \n

    ", True)) # Expected output: b'<p>li &amp; \t 'seq' \n </p>' # Encode with full=True for comprehensive entity encoding print(reliq.encode("

    li & \t 'seq' \n

    ", full=True)) # Expected output: '<p>li &amp; 'seq' </p>' # Encode with raw=True, full=True, returning bytes print(reliq.encode("

    li & \t 'seq' \n

    ", True, full=True)) # Expected output: b'<p>li &amp; 'seq' </p>' ``` -------------------------------- ### String Representation of Objects Source: https://github.com/tuvimen/reliq-python/blob/master/README.md Demonstrates the string conversion for reliq objects, including the full structure, filtered lists, single elements, and empty objects. ```python rq = reliq("""

    H1

    N2

    N3

    """) str(rq) # struct # '\n

    H1

    \n

    N2

    \n

    N3

    \n' str(rq.filter('h2')) # list # '

    N2

    N3

    ' str(rq[0]) # single # '

    H1

    '

str(reliq()) # empty
# ''
```

--------------------------------

### Relativity and Context in Reliq Filtering

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

Explains the concept of 'relativity' in reliq, where filtered objects retain a pointer to their context node. Demonstrates how `.rlvl`, `.rposition`, and `.rparent()` work with relative selections.

```python
import reliq

rq = reliq.parse(""" """)

# Non-relative selection
li = rq[0][0][0][1]  # Selects the second 'li' element

# Relative selection based on a filter
# 'li i@w>"B"' selects 'li' elements whose insides contain the word "B"
li_rel = rq.filter('nav; li i@w>"B"')[0]  # Relative to 'nav'

# .rlvl and .rposition show relative levels and positions
print(f"li.rlvl: {li.rlvl}, li_rel.rlvl: {li_rel.rlvl}")
# Expected output: li.rlvl: 3, li_rel.rlvl: 2
print(f"li.rposition: {li.rposition}, li_rel.rposition: {li_rel.rposition}")
# Expected output: li.rposition: 10, li_rel.rposition: 9

# Using .rparent() to get the parent relative to the current node
nav_rel = li_rel.rparent()[0]
print(f"nav_rel.rlvl: {nav_rel.rlvl}, nav_rel.rposition: {nav_rel.rposition}")
# Expected output: nav_rel.rlvl: -2, nav_rel.rposition: -7

# Iterating descendants with relativity
nav = rq[0][0]
for i in nav.descendants(rel=True):
    if i.rlvl == 2 and i.name == 'li':
        print(f"Descendant li: lvl={i.lvl}, rlvl={i.rlvl}")
        # Expected output: Descendant li: lvl=3, rlvl=2
        break
```

--------------------------------

### Comment Content Access

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

Demonstrates how to retrieve the content of HTML comments using the `insides` property and how comments can be converted to bytes or strings.
```python
c = reliq('<!-- Comment -->').self(type=None)[0]
c.insides
# ' Comment '
bytes(c)
# b'<!-- Comment -->'
str(c)
# '<!-- Comment -->'
```

--------------------------------

### Text Content Conversion

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

Shows the conversion of plain text elements into string format using the `str()` function.

```python
t = reliq('Example').self(type=None)[0]
str(t)
# 'Example'
```

--------------------------------

### reliq.Type Enum and Object States

Source: https://github.com/tuvimen/reliq-python/blob/master/README.md

Documents the `reliq.Type` enumeration, which defines the internal state or type of a `reliq` object. This type influences the behavior of various methods. It describes the `empty`, `unknown`, `struct`, and `list` types.

```APIDOC
reliq.Type(Flag):
- empty: Description: Returned from `reliq(None)` or `reliq.filter()` when no matches are found. All methods return default values.
- unknown: Description: Similar to `empty` but should ideally not occur.
- struct: Description: Returned upon successful initialization, e.g., `reliq('

    Example

    ')`.
- list: Description: Returned by `reliq.filter()` when it successfully finds matches.
```
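
Because `reliq.Type` is documented as a `Flag`, its members can be combined with `|` and used as a mask, as in `self(type=reliq.Type.tag|reliq.Type.comment)`. The sketch below shows that mechanism with Python's standard `enum.Flag` only. `NodeType` and `keep` are hypothetical stand-ins invented for this illustration, the `text` member name is an assumption, and nothing here calls into reliq.

```python
from enum import Flag, auto


class NodeType(Flag):
    """Hypothetical stand-in for reliq.Type; not the library's enum."""
    tag = auto()
    comment = auto()
    text = auto()  # assumed member name, for illustration only


def keep(nodes, mask=NodeType.tag):
    """Return the (name, type) pairs whose type is in mask; None disables filtering."""
    if mask is None:
        return list(nodes)
    # `type & mask` is an empty (falsy) flag when the type is not part of the mask
    return [node for node in nodes if node[1] & mask]


nodes = [
    ("head", NodeType.tag),
    ("note", NodeType.comment),
    ("hello", NodeType.text),
    ("body", NodeType.tag),
]

print([name for name, _ in keep(nodes)])
# ['head', 'body']
print([name for name, _ in keep(nodes, NodeType.tag | NodeType.comment)])
# ['head', 'note', 'body']
print(len(keep(nodes, None)))
# 4
```

The default mask of `NodeType.tag` mirrors the documented behavior of `self()` on 'struct' objects, and passing `None` mirrors `self(type=None)` returning every node regardless of type.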