### Install apachelogs

Source: https://github.com/jwodder/apachelogs/blob/master/docs/index.md

Install the apachelogs library using pip. Requires Python 3.10 or higher.

```bash
python3 -m pip install apachelogs
```

--------------------------------

### Parse Apache Log Lines with LogParser

Source: https://context7.com/jwodder/apachelogs/llms.txt

Demonstrates creating a `LogParser` instance with a predefined format (`COMBINED`) and parsing a single log line. Shows how to access parsed fields and how to handle `InvalidEntryError`.

```python
from apachelogs import LogParser, COMBINED, InvalidEntryError

# Build a reusable parser for the NCSA Combined log format
parser = LogParser(COMBINED)
# Equivalent explicit format string:
# parser = LogParser('%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"')

line = (
    '209.126.136.4 - bob [01/Nov/2017:07:28:29 +0000] '
    '"GET /index.html HTTP/1.1" 200 4321 '
    '"https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"'
)

try:
    entry = parser.parse(line)
except InvalidEntryError as e:
    print(f"Parse failed: {e}")
    raise

print(entry.remote_host)               # '209.126.136.4'
print(entry.remote_user)               # 'bob'
print(entry.request_time)              # datetime.datetime(2017, 11, 1, 7, 28, 29, tzinfo=datetime.timezone.utc)
print(entry.request_line)              # 'GET /index.html HTTP/1.1'
print(entry.final_status)              # 200
print(entry.bytes_sent)                # 4321
print(entry.headers_in["Referer"])     # 'https://example.com/'
print(entry.headers_in["User-Agent"])  # 'Mozilla/5.0 (X11; Linux x86_64)'
print(entry.remote_logname)            # None (a bare '-' is decoded to None)

# Directive-based lookup (alternative to named attributes)
print(entry.directives["%h"])   # '209.126.136.4'
print(entry.directives["%>s"])  # 200
```

--------------------------------

### Log Format Constants

Source: https://context7.com/jwodder/apachelogs/llms.txt

Shows how to import the predefined log format constants, such as `COMMON`, `COMBINED`, and `COMBINED_DEBIAN`, and use them with the `LogParser` class.
```python
import apachelogs

# Common Log Format (CLF)
print(apachelogs.COMMON)
# '%h %l %u %t "%r" %>s %b'

# CLF with virtual host prepended
print(apachelogs.VHOST_COMMON)
# '%v %h %l %u %t "%r" %>s %b'

# NCSA Combined (CLF + Referer + User-Agent)
print(apachelogs.COMBINED)
# '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"'

# Combined with %O (total bytes sent, including headers) in place of %b;
# this is the "combined" format used by Debian/Ubuntu Apache
print(apachelogs.COMBINED_DEBIAN)
# '%h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"'

# COMBINED_DEBIAN with virtual host:port prepended
print(apachelogs.VHOST_COMBINED)
# '%v:%p %h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"'

# Debian CLF variant (%O in place of %b)
print(apachelogs.COMMON_DEBIAN)
# '%h %l %u %t "%r" %>s %O'

# Parse using a constant
from apachelogs import LogParser, VHOST_COMBINED

parser = LogParser(VHOST_COMBINED)
entry = parser.parse(
    'www.example.com:443 203.0.113.5 - carol [02/Feb/2024:08:30:00 +0000] '
    '"GET /page HTTP/2" 200 8192 "https://google.com/" "Firefox/121.0"'
)
print(entry.virtual_host)  # 'www.example.com'
print(entry.server_port)   # 443
print(entry.remote_host)   # '203.0.113.5'
print(entry.remote_user)   # 'carol'
print(entry.bytes_out)     # 8192
```

--------------------------------

### Parse Single Log Entry with LogParser.parse()

Source: https://context7.com/jwodder/apachelogs/llms.txt

Shows how to use the `parse()` method of a `LogParser` instance to process a single log line. Includes handling of successful parses and catching `InvalidEntryError` for non-matching lines.
```python
from apachelogs import LogParser, InvalidEntryError

parser = LogParser('%h %l %u %t "%r" %>s %b')

# Successful parse (the trailing newline is stripped before matching)
entry = parser.parse(
    '127.0.0.1 - alice [10/Oct/2023:13:55:36 -0700] '
    '"DELETE /api/item/9 HTTP/1.1" 204 0\n'
)
print(entry.remote_host)   # '127.0.0.1'
print(entry.remote_user)   # 'alice'
print(entry.final_status)  # 204
print(entry.bytes_sent)    # 0 (a '-' in the %b field would give None)

# Failed parse
try:
    parser.parse("this does not match the format at all")
except InvalidEntryError as e:
    print(e)
    # Could not match log entry 'this does not match the format at all'
    # against log format '%h %l %u %t "%r" %>s %b'
    print(e.entry)   # 'this does not match the format at all'
    print(e.format)  # '%h %l %u %t "%r" %>s %b'
```

--------------------------------

### Stream and Parse Log Lines with LogParser.parse_lines()

Source: https://context7.com/jwodder/apachelogs/llms.txt

Demonstrates using `parse_lines()` to process an iterable of log entries, such as a file handle. Shows how to skip invalid lines by setting `ignore_invalid=True` and how to filter entries by status code.

```python
from apachelogs import LogParser, COMBINED

parser = LogParser(COMBINED)

# Read a log file line by line, skipping lines that don't match the format
# (e.g. comment lines)
with open("/var/log/apache2/access.log") as fp:
    for entry in parser.parse_lines(fp, ignore_invalid=True):
        if entry.final_status >= 500:
            print(
                entry.request_time,
                entry.request_line,
                entry.final_status,
            )

# Example output:
# 2023-10-10 14:02:11+00:00 GET /api/crash HTTP/1.1 500
```

--------------------------------

### LogParser Initialization

Source: https://context7.com/jwodder/apachelogs/llms.txt

Compiles an Apache log format string into a reusable parser. The `encoding` parameter controls how escaped byte strings are decoded. Raises `InvalidDirectiveError` or `UnknownDirectiveError` for malformed or unsupported directives.
```APIDOC
## LogParser(format, encoding='iso-8859-1', errors=None)

Compiles an Apache log format string into a reusable parser. The `encoding` parameter controls how escaped byte strings in log fields are decoded into Python `str`; pass `'bytes'` to skip decoding and receive raw `bytes` values. Raises `InvalidDirectiveError` for malformed directives and `UnknownDirectiveError` for unsupported ones.
```

--------------------------------

### parse(format, entry, encoding='iso-8859-1', errors=None)

Source: https://context7.com/jwodder/apachelogs/llms.txt

Parses a single log entry string into a structured object. This is a convenience function that internally creates and uses a `LogParser`.
```APIDOC
## `parse(format, entry, encoding='iso-8859-1', errors=None)`

Module-level convenience function for parsing a single entry without constructing a `LogParser` directly. Internally creates and immediately uses a `LogParser`.

```python
from apachelogs import parse, COMMON

entry = parse(
    COMMON,
    '10.0.0.1 - - [25/Dec/2023:00:00:01 +0000] "GET /health HTTP/1.1" 200 12',
)
print(entry.remote_host)   # '10.0.0.1'
print(entry.final_status)  # 200
print(entry.request_time)  # datetime.datetime(2023, 12, 25, 0, 0, 1, tzinfo=datetime.timezone.utc)
```
```

--------------------------------

### Custom Encoding and LogEntry Directives

Source: https://context7.com/jwodder/apachelogs/llms.txt

Configure the encoding used to decode escaped byte sequences in fields marked with `*` in the apachelogs directive table, or pass `'bytes'` to receive the unescaped values as raw `bytes`. Log fields can also be looked up by directive through `LogEntry.directives`.

```python
from apachelogs import LogParser

# Default iso-8859-1 decoding
parser_default = LogParser('%h %U', encoding='iso-8859-1')
# UTF-8 decoding of the unescaped bytes
parser_utf8 = LogParser('%h %U', encoding='utf-8')
# Raw bytes: escape sequences are unescaped but not decoded
parser_bytes = LogParser('%h %U', encoding='bytes')

line = r'10.0.0.1 /caf\xc3\xa9'

entry_str = parser_default.parse(line)
print(type(entry_str.request_uri))  # <class 'str'>
print(entry_str.request_uri)        # '/cafÃ©' (bytes C3 A9 read as two iso-8859-1 characters)

print(parser_utf8.parse(line).request_uri)  # '/café'

entry_bytes = parser_bytes.parse(line)
print(type(entry_bytes.request_uri))  # <class 'bytes'>
print(entry_bytes.request_uri)        # b'/caf\xc3\xa9'

# Directive-keyed access
print(entry_str.directives["%h"])  # '10.0.0.1'
print(entry_str.directives["%U"])

# Status-code-conditional directives: "%400r" logs the request line only
# when the status is 400 and is written as "-" otherwise
parser_cond = LogParser('"%400r" "%r"')
entry_cond = parser_cond.parse('"-" "GET /index.html HTTP/1.1"')
print(entry_cond.directives["%400r"])
print(entry_cond.directives["%r"])
```

--------------------------------

### Parse Multiple Log Lines with `parse_lines()`

Source: https://context7.com/jwodder/apachelogs/llms.txt

Demonstrates parsing an iterable of raw log lines using the `parse_lines` convenience function.
It iterates through the parsed entries and prints specific fields.

```python
from apachelogs import parse_lines, COMBINED

raw_lines = [
    '1.2.3.4 - - [01/Jan/2024:12:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"',
    '5.6.7.8 - alice [01/Jan/2024:12:00:01 +0000] "POST /login HTTP/1.1" 302 0 "-" "Mozilla/5.0"',
]

for entry in parse_lines(COMBINED, raw_lines):
    print(f"{entry.remote_host:15} {entry.final_status} {entry.request_line}")
# 1.2.3.4         200 GET / HTTP/1.1
# 5.6.7.8         302 POST /login HTTP/1.1
```

--------------------------------

### Parse a single Apache log entry

Source: https://github.com/jwodder/apachelogs/blob/master/docs/index.md

Instantiate a `LogParser` with a custom format string and use it to parse a single log entry. The parsed entry provides an attribute for each log directive and automatically converts values to appropriate Python types.

```python
from apachelogs import LogParser

parser = LogParser('%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"')
# The above log format is also available as the constant `apachelogs.COMBINED`.

entry = parser.parse(
    '209.126.136.4 - - [01/Nov/2017:07:28:29 +0000] "GET / HTTP/1.1" 301 521 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/57.0.2987.133 Safari/537.36"\n'
)
print(entry.remote_host)                    # '209.126.136.4'
print(entry.request_time)                   # datetime.datetime(2017, 11, 1, 7, 28, 29, tzinfo=datetime.timezone.utc)
print(entry.request_line)                   # 'GET / HTTP/1.1'
print(entry.final_status)                   # 301
print(entry.bytes_sent)                     # 521
print(entry.headers_in["Referer"] is None)  # True (a bare '-' is decoded to None)
print(entry.headers_in["User-Agent"])       # the full User-Agent string from the line

# Log entry components can also be looked up by directive:
print(entry.directives["%r"])   # 'GET / HTTP/1.1'
print(entry.directives["%>s"])  # 301
print(entry.directives["%t"])   # datetime.datetime(2017, 11, 1, 7, 28, 29, tzinfo=datetime.timezone.utc)
```

--------------------------------

### Aggregate Statistics from Log File

Source: https://context7.com/jwodder/apachelogs/llms.txt

Reads an Apache log file, parses its entries, and calculates the total bytes served and the count of each final status code. Requires the `apachelogs` library and a log file in the Combined format at `/var/log/apache2/access.log`.

```python
from collections import Counter

from apachelogs import LogParser, COMBINED

# Assumes the log uses the NCSA Combined format; adjust to match your LogFormat
parser = LogParser(COMBINED)

total_bytes = 0
status_counts = Counter()

with open("/var/log/apache2/access.log") as fp:
    for e in parser.parse_lines(fp, ignore_invalid=True):
        total_bytes += e.bytes_sent or 0  # %b logged as '-' is parsed as None
        status_counts[e.final_status] += 1

print(f"Total bytes served: {total_bytes}")
print(f"Status counts: {dict(status_counts)}")
```

--------------------------------

### Parse a file of Apache log entries

Source: https://github.com/jwodder/apachelogs/blob/master/docs/index.md

Iterate over the log entries in a file using the `parse_lines()` method of a `LogParser` instance. Because `parse_lines()` yields entries one at a time, large log files can be processed without reading them into memory first.

```python
from apachelogs import LogParser, COMBINED

parser = LogParser(COMBINED)

with open('/var/log/apache2/access.log') as fp:
    for entry in parser.parse_lines(fp):
        print(str(entry.request_time), entry.request_line)
```

--------------------------------

### parse_lines(format, entries, encoding='iso-8859-1', errors=None, ignore_invalid=False)

Source: https://context7.com/jwodder/apachelogs/llms.txt

Parses an iterable of log entry strings into structured objects.
This is a convenience function that avoids the need to instantiate a `LogParser` directly.

```APIDOC
## `parse_lines(format, entries, encoding='iso-8859-1', errors=None, ignore_invalid=False)`

Module-level convenience function for parsing an iterable of entries without constructing a `LogParser` directly.
```

--------------------------------

### Handle `UnknownDirectiveError`

Source: https://context7.com/jwodder/apachelogs/llms.txt

Illustrates catching `UnknownDirectiveError` when a log format string includes a syntactically valid but unsupported directive. The directive itself is accessible via the exception object.

```python
from apachelogs import LogParser, UnknownDirectiveError

# UnknownDirectiveError: the directive is well-formed but not supported
try:
    LogParser("%h %Z")  # %Z is not a valid Apache log directive
except UnknownDirectiveError as e:
    print(e)            # Unknown log format directive: '%Z'
    print(e.directive)  # '%Z'
```

--------------------------------

### LogParser.parse()

Source: https://context7.com/jwodder/apachelogs/llms.txt

Parses a single log entry string and returns a `LogEntry` object. Trailing `\r\n` is stripped automatically. Raises `InvalidEntryError` if the line does not match the format.

```APIDOC
## LogParser.parse(entry)

Parses a single log entry string and returns a `LogEntry`. Trailing `\r\n` is stripped automatically. Raises `InvalidEntryError` if the line does not match the format.
```

--------------------------------

### LogParser.parse_lines()

Source: https://context7.com/jwodder/apachelogs/llms.txt

Parses an iterable of log entry strings (e.g., a file handle) and yields `LogEntry` objects. When `ignore_invalid=True`, lines that do not match the format are silently skipped instead of raising `InvalidEntryError`.

```APIDOC
## LogParser.parse_lines(entries, ignore_invalid=False)

Parses an iterable of log entry strings (e.g., a file handle) and yields `LogEntry` objects. When `ignore_invalid=True`, lines that do not match the format are silently skipped instead of raising `InvalidEntryError`.
```

--------------------------------

### Handle `InvalidDirectiveError`

Source: https://context7.com/jwodder/apachelogs/llms.txt

Demonstrates catching `InvalidDirectiveError` when a log format string contains malformed directive syntax, and how to read the format string and the position of the error from the exception.

```python
from apachelogs import LogParser, InvalidDirectiveError

# InvalidDirectiveError: malformed directive syntax in the format string
try:
    LogParser("%h %")  # incomplete directive at the end
except InvalidDirectiveError as e:
    print(e)         # Invalid log format directive at index 3 of '%h %'
    print(e.format)  # '%h %'
    print(e.pos)     # 3
```

--------------------------------

### parse_apache_timestamp(s)

Source: https://context7.com/jwodder/apachelogs/llms.txt

Parses an Apache-formatted timestamp string into a timezone-aware datetime object. It handles the standard format and always reads month abbreviations as English, regardless of locale.

```APIDOC
## `parse_apache_timestamp(s)`

Parses an Apache-format timestamp string into a timezone-aware `datetime.datetime`. Handles the `[DD/Mon/YYYY:HH:MM:SS ±HHMM]` format (brackets optional) and always interprets month abbreviations as English regardless of the current locale.
Returns `None` if `s` is `None`; raises `ValueError` on malformed input.

```python
from apachelogs import parse_apache_timestamp

# With brackets (as it appears in log lines)
dt = parse_apache_timestamp('[01/Nov/2017:07:28:29 +0000]')
print(dt)         # datetime.datetime(2017, 11, 1, 7, 28, 29, tzinfo=datetime.timezone.utc)
print(dt.tzinfo)  # datetime.timezone.utc

# Without brackets
dt2 = parse_apache_timestamp('14/Apr/2018:18:39:42 +0530')
print(dt2)  # datetime.datetime(2018, 4, 14, 18, 39, 42, tzinfo=datetime.timezone(datetime.timedelta(seconds=19800)))

# None passthrough
print(parse_apache_timestamp(None))  # None

# Invalid input
try:
    parse_apache_timestamp("not-a-timestamp")
except ValueError as e:
    print(e)  # not-a-timestamp
```
```

--------------------------------

### Handle Invalid Log Entries with InvalidEntryError

Source: https://context7.com/jwodder/apachelogs/llms.txt

Catch `InvalidEntryError` when a log line does not match the compiled format. Access the original entry and format from the exception object.

```python
from apachelogs import LogParser, InvalidEntryError

parser = LogParser('%h %>s')

try:
    parser.parse("not-an-ip 200 extra-field")
except InvalidEntryError as e:
    print(e)
    # Could not match log entry 'not-an-ip 200 extra-field'
    # against log format '%h %>s'
    print(e.entry)   # 'not-an-ip 200 extra-field'
    print(e.format)  # '%h %>s'
```
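--------------------------------

### Sketch: Compiling a Log Format String into a Regex

The `LogParser` entries above describe compiling an Apache log format string into a reusable parser. The block below is a minimal, standard-library-only sketch of that idea. It is not apachelogs' implementation, and it rests on several assumptions:

- `compile_format()` and `parse_line()` are hypothetical names.
- Only `%h %l %u %t %r %>s %b` are supported.
- Any other `%` raises `ValueError`, standing in for the library's `UnknownDirectiveError`.
- `%t` is left as a raw bracketed string.

```python
import re

# Hypothetical toy subset: directive -> (field name, regex for its value).
# This is NOT apachelogs' directive table; it only borrows a few field names.
_DIRECTIVES = {
    "%h": ("remote_host", r"\S+"),
    "%l": ("remote_logname", r"\S+"),
    "%u": ("remote_user", r"\S+"),
    "%t": ("request_time", r"\[[^\]]+\]"),
    "%r": ("request_line", r'(?:[^"\\]|\\.)*'),
    "%>s": ("final_status", r"\d{3}|-"),
    "%b": ("bytes_sent", r"\d+|-"),
}
_TOKEN_RE = re.compile(r"%>s|%[hlutrb]")
_INT_FIELDS = {"final_status", "bytes_sent"}


def _literal(text):
    """Escape literal text; any leftover '%' is an unsupported directive."""
    if "%" in text:
        raise ValueError(f"unsupported directive in {text!r}")
    return re.escape(text)


def compile_format(fmt):
    """Translate a log format string into (compiled regex, field names)."""
    parts, names, pos = [], [], 0
    for m in _TOKEN_RE.finditer(fmt):
        parts.append(_literal(fmt[pos:m.start()]))
        name, pattern = _DIRECTIVES[m.group()]
        parts.append(f"({pattern})")  # one capturing group per directive
        names.append(name)
        pos = m.end()
    parts.append(_literal(fmt[pos:]))
    return re.compile("".join(parts)), names


def parse_line(compiled, line):
    """Match one log line and convert the captured strings."""
    regex, names = compiled
    m = regex.fullmatch(line.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"could not match {line!r}")
    fields = {}
    for name, raw in zip(names, m.groups()):
        if raw == "-":
            fields[name] = None  # CLF convention: '-' means "no value"
        elif name in _INT_FIELDS:
            fields[name] = int(raw)
        else:
            fields[name] = raw  # %t stays a raw "[...]" string in this sketch
    return fields


compiled = compile_format('%h %l %u %t "%r" %>s %b')
entry = parse_line(
    compiled,
    '127.0.0.1 - alice [10/Oct/2023:13:55:36 -0700] '
    '"DELETE /api/item/9 HTTP/1.1" 204 0\n',
)
print(entry["remote_user"], entry["final_status"], entry["bytes_sent"])  # alice 204 0
```

Compiling once and reusing the regex is what makes a parser object worth keeping when many lines share one format. The `'-'` to `None` step mirrors the behavior shown above for `remote_logname` and `headers_in`.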
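--------------------------------

### Sketch: Unescaping and Decoding a Log Field

The Custom Encoding example above relies on escape sequences such as `\xc3\xa9` first being turned into bytes, with `encoding` deciding only how those bytes become text. The hypothetical helper `unescape_field()` below is a stdlib-only sketch of that two-step idea. It assumes only the `\xHH`, `\"` and `\\` escapes (Apache can emit others), and it is not apachelogs' code.

```python
import re

# Matches \xHH, \" and \\ (a subset of the escapes Apache writes to access logs).
_ESCAPE_RE = re.compile(rb'\\(x[0-9A-Fa-f]{2}|["\\])')


def unescape_field(raw, encoding="iso-8859-1"):
    """Hypothetical helper: unescape a log field, then decode it.

    Pass encoding="bytes" to get the unescaped bytes without decoding.
    """
    # iso-8859-1 maps U+0000..U+00FF one-to-one onto bytes, so this is lossless
    # for ASCII log text (characters above U+00FF would raise UnicodeEncodeError)
    data = raw.encode("iso-8859-1")

    def _replace(m):
        token = m.group(1)
        if token.startswith(b"x"):
            return bytes([int(token[1:], 16)])  # \xHH -> that single byte
        return token  # \" -> " and \\ -> \

    unescaped = _ESCAPE_RE.sub(_replace, data)
    if encoding == "bytes":
        return unescaped
    return unescaped.decode(encoding)


field = r"/caf\xc3\xa9"
print(unescape_field(field))           # /cafÃ©
print(unescape_field(field, "utf-8"))  # /café
print(unescape_field(field, "bytes"))  # b'/caf\xc3\xa9'
```

Decoding with iso-8859-1 never fails, because every byte maps to one character. That is why it makes a safe default even when the output looks wrong for UTF-8 data.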
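--------------------------------

### Sketch: Locale-Independent Timestamp Parsing

`parse_apache_timestamp()` is documented above as always reading month abbreviations as English, whatever the current locale. One way to get that property is to avoid `datetime.strptime`'s locale-dependent `%b` and look months up in a fixed table. The sketch below does this with the standard library only. `parse_timestamp` is a hypothetical name, and apachelogs' own function may be implemented differently.

```python
import re
from datetime import datetime, timedelta, timezone

# Fixed English month abbreviations, independent of the current locale.
_MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

# DD/Mon/YYYY:HH:MM:SS +HHMM, optionally wrapped in brackets
_TS_RE = re.compile(
    r"\[?(\d{2})/([A-Za-z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})\]?"
)


def parse_timestamp(s):
    """Hypothetical stand-in for parse_apache_timestamp()."""
    if s is None:
        return None
    m = _TS_RE.fullmatch(s)
    if m is None or m.group(2) not in _MONTHS:
        raise ValueError(s)
    day, mon, year, hh, mm, ss, sign, off_h, off_m = m.groups()
    offset = timedelta(hours=int(off_h), minutes=int(off_m))
    if sign == "-":
        offset = -offset
    return datetime(
        int(year), _MONTHS[mon], int(day), int(hh), int(mm), int(ss),
        tzinfo=timezone(offset),
    )


print(parse_timestamp("[01/Nov/2017:07:28:29 +0000]"))  # 2017-11-01 07:28:29+00:00
print(parse_timestamp("14/Apr/2018:18:39:42 +0530"))    # 2018-04-14 18:39:42+05:30
```

Out-of-range values such as day 31 in April also surface as `ValueError`, because the `datetime` constructor raises it.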