### Run the Parser

Source: https://github.com/pest-parser/book/blob/master/src/examples/ini.md

Example output from executing the parser via cargo.

```shell
$ cargo run
  [ ... ]
{
    "": {
        "password": "plain_text",
        "username": "noha",
        "salt": "NaCl"
    },
    "second_server": {
        "ip": "",
        "document_root": "/var/www/example.com",
        "interface": "eth1"
    },
    "server_1": {
        "interface": "eth0",
        "document_root": "/var/www/example.org",
        "ip": "127.0.0.1"
    }
}
```

--------------------------------

### Complete JSON Parser Example

Source: https://context7.com/pest-parser/book/llms.txt

Build a full JSON parser by defining the grammar, constructing an Abstract Syntax Tree (AST), and implementing recursive parsing logic. This example requires a separate `json.pest` grammar file.

```rust
// grammar: json.pest
// json = _{ SOI ~ (object | array) ~ EOI }
// value = _{ object | array | string | number | boolean | null }
// object = { "{" ~ "}" | "{" ~ pair ~ ("," ~ pair)* ~ "}" }
// pair = { string ~ ":" ~ value }
// array = { "[" ~ "]" | "[" ~ value ~ ("," ~ value)* ~ "]" }
// string = ${ "\"" ~ inner ~ "\"" }
// inner = @{ char* }
// char = { !("\"" | "\\") ~ ANY | "\\" ~ ("\"" | "\\" | "n" | "r" | "t" | "u" ~ ASCII_HEX_DIGIT{4}) }
// number = @{ "-"? ~ ("0" | ASCII_NONZERO_DIGIT ~ ASCII_DIGIT*) ~ ("." ~ ASCII_DIGIT*)? ~ (^"e" ~ ("+" | "-")? ~ ASCII_DIGIT+)?
// }
// boolean = { "true" | "false" }
// null = { "null" }
// WHITESPACE = _{ " " | "\t" | "\r" | "\n" }

use pest::Parser;
use pest::error::Error;
use pest::iterators::Pair;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "json.pest"]
struct JSONParser;

#[derive(Debug)]
enum JSONValue<'a> {
    Object(Vec<(&'a str, JSONValue<'a>)>),
    Array(Vec<JSONValue<'a>>),
    String(&'a str),
    Number(f64),
    Boolean(bool),
    Null,
}

fn parse_json(input: &str) -> Result<JSONValue, Error<Rule>> {
    // `json` is a silent rule, so the first pair is the top-level object or array
    let json = JSONParser::parse(Rule::json, input)?.next().unwrap();

    fn parse_value(pair: Pair<Rule>) -> JSONValue {
        match pair.as_rule() {
            Rule::object => JSONValue::Object(
                pair.into_inner()
                    .map(|p| {
                        // each `pair` rule holds a string key followed by a value
                        let mut inner = p.into_inner();
                        let key = inner.next().unwrap().into_inner().next().unwrap().as_str();
                        let value = parse_value(inner.next().unwrap());
                        (key, value)
                    })
                    .collect(),
            ),
            Rule::array => JSONValue::Array(pair.into_inner().map(parse_value).collect()),
            Rule::string => JSONValue::String(pair.into_inner().next().unwrap().as_str()),
            Rule::number => JSONValue::Number(pair.as_str().parse().unwrap()),
            Rule::boolean => JSONValue::Boolean(pair.as_str() == "true"),
            Rule::null => JSONValue::Null,
            _ => unreachable!(),
        }
    }

    Ok(parse_value(json))
}

fn main() {
    let input = r#"{"name": "pest", "version": 2.6, "active": true, "tags": ["parser", "peg"]}"#;

    match parse_json(input) {
        Ok(json) => println!("{:#?}", json),
        Err(e) => eprintln!("Error: {}", e),
    }
    // Output:
    // Object([
    //     ("name", String("pest")),
    //     ("version", Number(2.6)),
    //     ("active", Boolean(true)),
    //     ("tags", Array([String("parser"), String("peg")])),
    // ])
}
```

--------------------------------

### Example INI Configuration File

Source: https://github.com/pest-parser/book/blob/master/src/examples/ini.md

A sample INI file structure containing key-value pairs and sections.
```ini
username = noha
password = plain_text
salt = NaCl

[server_1]
interface=eth0
ip=127.0.0.1
document_root=/var/www/example.org

[empty_section]

[second_server]
document_root=/var/www/example.com
ip=
interface=eth1
```

--------------------------------

### Example of string interpolation

Source: https://github.com/pest-parser/book/blob/master/src/grammars/syntax.md

A Python example demonstrating string interpolation where whitespace is handled normally.

```python
#!/bin/env python3
print(f"The answer is {2 + 4}.")
```

--------------------------------

### J Program Example

Source: https://github.com/pest-parser/book/blob/master/src/examples/jlang.md

Demonstrates basic J language syntax for strings, monadic operations, matrix manipulation, and arithmetic operations on lists.

```j
'A string'
*: 1 2 3 4
matrix =: 2 3 $ 5 + 2 3 4 5 6 7
10 * matrix
1 + 10 20 30
1 2 3 + 10
residues =: 2 | 0 1 2 3 4 5 6 7
residues
```

--------------------------------

### Example J Program

Source: https://github.com/pest-parser/book/blob/master/src/examples/jlang.md

This is an example J program demonstrating various J language constructs, including arithmetic operations, assignments, and monadic/dyadic verbs. It is used as input for the Rust parser.

```j
_2.5 ^ 3
*: 4.8
title =: 'Spinning at the Boundary'
*: _1 2 _3 4
1 2 3 + 10 20 30
1 + 10 20 30
1 2 3 + 10
2 | 0 1 2 3 4 5 6 7
another =: 'It''s Escaped'
3 | 0 1 2 3 4 5 6 7
(2+1)*(2+2)
3 * 2 + 1
1 + 3 % 4
x =: 100
x - 1
y =: x - 1
y
```

--------------------------------

### Rust raw string literal example

Source: https://github.com/pest-parser/book/blob/master/src/grammars/syntax.md

Example of a raw string literal in Rust.

```rust
const raw_str: &str = r###"
    Some number of number signs # followed by a quotation mark ".
    Quotation marks can be used anywhere inside: """""""",
    as long as one is not followed by a matching number of number signs,
    which ends the string: "###;
```

--------------------------------

### Example JSON Document

Source: https://github.com/pest-parser/book/blob/master/src/examples/json.md

A sample JSON document illustrating nested objects, arrays, and string escaping.

```json
{
    "nesting": { "inner object": {} },
    "an array": [1.5, true, null, 1e-6],
    "string with escaped double quotes" : "\"quick brown foxes\""
}
```

--------------------------------

### Get Span and Position Information

Source: https://context7.com/pest-parser/book/llms.txt

Use `as_span()` to get the span of a matched rule, then `start()` and `end()` for byte offsets into the input. `start_pos()` and `end_pos()` return positions whose `line_col()` gives line and column details.

```rust
use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar_inline = r#"
// Without a WHITESPACE rule, `~` would not skip the spaces in "42 + 17"
WHITESPACE = _{ " " }
number = { ASCII_DIGIT+ }
expression = { number ~ "+" ~ number }
"#]
struct MyParser;

fn main() {
    let input = "line one\n42 + 17\nline three";

    // Parse just the expression part
    let pair = MyParser::parse(Rule::expression, "42 + 17")
        .unwrap()
        .next()
        .unwrap();

    // Get span information
    let span = pair.as_span();
    println!("Matched: '{}'", span.as_str());
    println!("Start: {}, End: {}", span.start(), span.end());
    // Output:
    // Matched: '42 + 17'
    // Start: 0, End: 7

    // Get position details
    let start_pos = span.start_pos();
    let end_pos = span.end_pos();
    println!("Start line:col = {:?}", start_pos.line_col());
    println!("End line:col = {:?}", end_pos.line_col());
    // Output:
    // Start line:col = (1, 1)
    // End line:col = (1, 8)
}
```

--------------------------------

### Example AWK Program

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

Demonstrates a typical AWK program structure with BEGIN, pattern matching, and END blocks. This program processes employee data, counts engineers, and prints information based on age.
```awk
BEGIN { print "Processing employee data..." }
/Engineer/ { engineers++ }
$2 > 30 { print $1, "is over 30 years old" }
END { print "Found", engineers, "engineers" }
```

--------------------------------

### Reading AWK Program from File

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

Execute an AWK program stored in a file. This example filters employees older than 30.

```bash
echo '$2 > 30 { print $1, "is over 30" }' > filter.awk
cargo run -- -f filter.awk employees.txt
```

--------------------------------

### Shell Command Output

Source: https://github.com/pest-parser/book/blob/master/src/examples/csv.md

Example output from running the Rust program using `cargo run`. Shows the calculated sum and record count.

```shell
$ cargo run
  [ ... ]
Sum of fields: 2643429302.327908
Number of records: 5
```

--------------------------------

### Example CSV Data

Source: https://github.com/pest-parser/book/blob/master/src/examples/csv.md

A sample CSV file containing numeric fields. Ensure the file ends with a trailing newline to avoid parsing errors.

```text
65279,1179403647,1463895090
3.1415927,2.7182817,1.618034
-40,-273.15
13,42
65537
```

--------------------------------

### Pest Grammar for Integers and Atoms

Source: https://github.com/pest-parser/book/blob/master/src/examples/calculator.md

Defines the basic building blocks for the calculator grammar, starting with integers.

```pest
// No whitespace allowed between digits
integer = @{ ASCII_DIGIT+ }

atom = _{ integer }
```

--------------------------------

### Statistical Processing with AWK Clone

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

Perform statistical calculations using AWK's BEGIN, pattern, and END blocks. This example calculates the average age of individuals over 25.
```bash
cargo run -- -p 'BEGIN {
    sum = 0
    count = 0
}
$2 > 25 {
    sum += $2
    count++
}
END {
    print "Average age over 25:", sum/count
}' employees.txt
```

--------------------------------

### Enforce Full Input Matching with SOI and EOI

Source: https://github.com/pest-parser/book/blob/master/src/grammars/syntax.md

Use Start of Input (SOI) and End of Input (EOI) markers to ensure a rule matches the entire input string.

```pest
main = { SOI ~ (...) ~ EOI }
```

--------------------------------

### Define Number and Expression Rules in Pest

Source: https://github.com/pest-parser/book/blob/master/src/grammars/peg.md

Defines rules for recognizing numbers and simple expressions using Pest syntax. Requires no specific setup beyond the Pest parser generator.

```pest
number = { ASCII_DIGIT+ }
expression = { number | "true" }
```

--------------------------------

### Pest Grammar for Whitespace and Equation

Source: https://github.com/pest-parser/book/blob/master/src/examples/calculator.md

Defines whitespace handling and the top-level rule for a complete equation, ensuring it starts at the beginning (SOI) and ends at the end (EOI) of the input.

```pest
WHITESPACE = _{ " " }

// We can't have SOI and EOI on expr directly, because it is used
// recursively (e.g. with parentheses)
equation = _{ SOI ~ expr ~ EOI }
```

--------------------------------

### Pest Grammar: Lexing Rules for Numbers, Identifiers, and Strings

Source: https://github.com/pest-parser/book/blob/master/src/examples/jlang.md

Defines how integers, decimals, identifiers, and strings are tokenized. Negative numbers use an underscore prefix. Identifiers start with a letter. Strings are enclosed in single quotes, with escaped quotes represented by double single quotes.

```pest
integer = @{ "_"? ~ ASCII_DIGIT+ }
decimal = @{ "_"? ~ ASCII_DIGIT+ ~ "."
    ~ ASCII_DIGIT* }
ident = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
string = @{ "'" ~ ( "''" | (!"'" ~ ANY) )* ~ "'" }
```

--------------------------------

### Initialize Project

Source: https://github.com/pest-parser/book/blob/master/src/examples/csv.md

Commands to create a new binary project using Cargo.

```shell
$ cargo init --bin csv-tool
     Created binary (application) project
$ cd csv-tool
```

--------------------------------

### Initialize Interpreter

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

Sets up the Interpreter with default AWK built-in variable values and provides methods for configuration.

```rust
impl Interpreter {
    pub fn new() -> Self {
        let mut variables = HashMap::new();

        // Initialize AWK's built-in variables with default values
        variables.insert("FS".to_string(), Value::String(" ".to_string()));
        variables.insert("OFS".to_string(), Value::String(" ".to_string()));
        variables.insert("RS".to_string(), Value::String("\n".to_string()));
        variables.insert("ORS".to_string(), Value::String("\n".to_string()));

        Self {
            variables,
            fields: Vec::new(),
            record_number: 0,
            field_separator: " ".to_string(),
            output_field_separator: " ".to_string(),
            record_separator: "\n".to_string(),
            output_record_separator: "\n".to_string(),
        }
    }

    // Allow external configuration of field separator (for -F command line option)
    pub fn set_field_separator(&mut self, fs: String) {
        self.field_separator = fs.clone();
        self.variables.insert("FS".to_string(), Value::String(fs));
    }
}
```

--------------------------------

### Define Identifier Rule in Pest

Source: https://github.com/pest-parser/book/blob/master/src/examples/rust/literals.md

Defines a Pest rule for Rust-like identifiers. It handles two cases: identifiers starting with a letter followed by alphanumeric characters or underscores, and identifiers starting with an underscore followed by at least one alphanumeric character or underscore.
```pest
ident = {
    ('a'..'z' | 'A'..'Z') ~ ident_char*
    | "_" ~ ident_char+
}
```

--------------------------------

### Cargo Commands for Help and Version

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

These commands print the application's help text and version information when run through Cargo. Arguments after `--` are passed to the application rather than to Cargo.

```bash
cargo run -- --help
```

```bash
cargo run -- --version
```

--------------------------------

### Parse Input Strings with Parser::parse

Source: https://context7.com/pest-parser/book/llms.txt

The `Parser::parse` method is the primary entry point for parsing. It returns a `Result` which must be handled. Basic parsing involves calling `parse` with a rule and input, then iterating through the resulting `Pairs`. Quick parsing uses `unwrap()` and panics on error.

```rust
use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "grammar.pest"]
struct MyParser;

fn main() {
    // Basic parsing with error handling
    match MyParser::parse(Rule::expression, "4 + 5 * 2") {
        Ok(mut pairs) => {
            let expr = pairs.next().unwrap();
            println!("Parsed: {:?}", expr.as_str());
            // Output: Parsed: "4 + 5 * 2"
        }
        Err(error) => {
            println!("Parse error: {}", error);
        }
    }

    // Quick parsing (panics on error)
    let result = MyParser::parse(Rule::number, "42")
        .unwrap()   // unwrap Result to get Pairs
        .next()     // get first Pair (Option)
        .unwrap();  // unwrap Option
    println!("Matched: {}", result.as_str());
    // Output: Matched: 42
}
```

--------------------------------

### Invalid tagging examples

Source: https://github.com/pest-parser/book/blob/master/src/grammars/syntax.md

Tags are ignored if applied to silent rules or built-in rules.

```pest
rule = { #tag = expression }
expression = _{ ...
}
```

```pest
rule = { #tag = ASCII }
```

--------------------------------

### Defining a repeated rule

Source: https://github.com/pest-parser/book/blob/master/src/parser_api.md

Example of a grammar rule using the asterisk quantifier to match zero or more occurrences.

```pest
list = { number* }
```

--------------------------------

### Configure Parser Dependencies

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

Sets up the necessary imports and the derive macro for the AWK parser.

```rust
use pest::Parser;
use pest::iterators::{Pair, Pairs};
use pest::pratt_parser::{Assoc, Op, PrattParser};
use anyhow::{Result, anyhow};
use crate::ast::*;

#[derive(pest_derive::Parser)]
#[grammar = "awk.pest"]
pub struct AwkParser;
```

--------------------------------

### Initialize Pest Parser in Rust

Source: https://github.com/pest-parser/book/blob/master/src/examples/ini.md

Sets up the Rust struct to use the generated Pest grammar.

```rust
use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "ini.pest"]
pub struct INIParser;
```

--------------------------------

### Main Application Structure in Rust

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

This Rust code sets up the main application logic for an AWK clone. It uses `clap` for command-line argument parsing, `anyhow` for error handling, and integrates custom `parser` and `interpreter` modules. It reads program logic from a file or string, processes input from a file or stdin, and executes the AWK program.

```rust
use clap::{Arg, Command};
use std::fs;
use std::io::{self, Read};
use anyhow::Result;

// Import our modules
mod ast;
mod parser;
mod interpreter;

use parser::*;
use interpreter::*;

fn main() -> Result<()> {
    let matches = Command::new("awk-clone")
        .version("1.0")
        .author("Your Name ")
        .about("A simple AWK clone built with pest")
        .long_about("
This AWK clone demonstrates how to build a complete language interpreter using Rust and pest.
It supports AWK's core features including pattern-action programming, field processing,
built-in variables, and control flow constructs.

Examples:
  awk-clone -p '{ print $1, $2 }' data.txt
  awk-clone -f script.awk input.txt
  cat data.txt | awk-clone -p 'BEGIN { sum = 0 } { sum += $2 } END { print sum }'
        ")
        .arg(Arg::new("program")
            .short('p')
            .long("program")
            .value_name("PROGRAM")
            .help("AWK program string")
            .long_help("AWK program as a command-line string. Cannot be used with --file.")
            .required_unless_present("file")
            .conflicts_with("file"))
        .arg(Arg::new("file")
            .short('f')
            .long("file")
            .value_name("FILE")
            .help("AWK program file")
            .long_help("Read AWK program from a file. Cannot be used with --program.")
            .required_unless_present("program")
            .conflicts_with("program"))
        .arg(Arg::new("input")
            .help("Input file (reads from stdin if not provided)")
            .long_help("Input data file to process. If not provided, reads from standard input.")
            .index(1))
        .arg(Arg::new("field-separator")
            .short('F')
            .long("field-separator")
            .value_name("FS")
            .help("Field separator pattern")
            .long_help("Set the field separator. Default is whitespace. Examples: -F ',' for CSV, -F ':' for /etc/passwd"))
        .get_matches();

    // Parse program source
    let program_text = if let Some(prog) = matches.get_one::<String>("program") {
        prog.clone()
    } else if let Some(file) = matches.get_one::<String>("file") {
        fs::read_to_string(file)
            .map_err(|e| anyhow::anyhow!("Failed to read program file '{}': {}", file, e))?
    } else {
        unreachable!("clap should ensure either program or file is provided");
    };

    // Read input data
    let input_text = if let Some(input_file) = matches.get_one::<String>("input") {
        fs::read_to_string(input_file)
            .map_err(|e| anyhow::anyhow!("Failed to read input file '{}': {}", input_file, e))?
    } else {
        let mut buffer = String::new();
        io::stdin().read_to_string(&mut buffer)
            .map_err(|e| anyhow::anyhow!("Failed to read from stdin: {}", e))?;
        buffer
    };

    // Parse the AWK program
    let program = parse_program(&program_text)
        .map_err(|e| anyhow::anyhow!("Parse error: {}", e))?;

    // Create and configure interpreter
    let mut interpreter = Interpreter::new();

    if let Some(fs) = matches.get_one::<String>("field-separator") {
        interpreter.set_field_separator(fs.clone());
    }

    // Execute the program
    interpreter.run_program(&program, &input_text)
        .map_err(|e| anyhow::anyhow!("Runtime error: {}", e))?;

    Ok(())
}
```

--------------------------------

### Core Dependencies for AWK Clone

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

Lists the essential Rust crates required for building the AWK clone, including the Pest parser, command-line argument parser (clap), regex engine, and error handling library (anyhow).

```toml
[dependencies]
pest = "2.8"
pest_derive = "2.8"
clap = { version = "4", features = ["derive"] }
regex = "1.5"
anyhow = "1"
```

--------------------------------

### Wrap Rules with Whitespace

Source: https://github.com/pest-parser/book/blob/master/src/grammars/syntax.md

Implicit whitespace is not inserted at the start or end of rules; sandwich rules between SOI and EOI to include surrounding whitespace.

```pest
WHITESPACE = _{ " " }
expression = { "4" ~ "+" ~ "5" }
main = { SOI ~ expression ~ EOI }
```

```text
"4+5"
" 4 + 5 "
```

--------------------------------

### Cargo.toml Dependencies for AWK Clone

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

Lists the necessary dependencies for building the AWK clone project, including pest for parsing, clap for CLI arguments, regex for pattern matching, and anyhow for error handling.
```toml
[package]
name = "awk-clone"
version = "0.1.0"
edition = "2021"

[dependencies]
pest = "2.8"
pest_derive = "2.8"
clap = { version = "4", features = ["derive"] }
regex = "1.5"
anyhow = "1"
```

--------------------------------

### CSV Processing with Custom Separator

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

Process CSV data by specifying a custom field separator. This example uses a comma and prints the name and department.

```bash
echo "name,age,dept
Alice,25,Engineering
Bob,30,Sales" | cargo run -- -F "," -p '{ print $1, ": ", $3 }'
```

--------------------------------

### Compare text vs pattern matching

Source: https://github.com/pest-parser/book/blob/master/src/grammars/syntax.md

Demonstrates the difference between matching the same text via the stack versus matching the same pattern.

```pest
same_text = { PUSH( "a" | "b" | "c" ) ~ POP }
same_pattern = { ("a" | "b" | "c") ~ ("a" | "b" | "c") }
```

--------------------------------

### Regex Pattern Matching in AWK Clone

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

Use a regex pattern to filter lines and perform actions. This example prints a message for lines containing 'Engineer'.

```bash
cargo run -- -p '/Engineer/ { print $1, "is an engineer" }' employees.txt
```

--------------------------------

### Configure Input Boundaries and Implicit Whitespace

Source: https://context7.com/pest-parser/book/llms.txt

Define global WHITESPACE and COMMENT rules to handle automatic spacing and comments, and use SOI/EOI to enforce full input matching.
```pest
// Ensure entire input is parsed
main = { SOI ~ expression ~ EOI }

// Define implicit whitespace (automatically inserted at ~ and repetitions)
WHITESPACE = _{ " " | "\t" | "\r" | "\n" }

// Define implicit comments
COMMENT = _{ "//" ~ (!"\n" ~ ANY)* }

// With WHITESPACE defined, a rule such as
// expression = { "4" ~ "+" ~ "5" }
// matches "4+5", "4 + 5", and "4  +  5" alike

// Expression grammar with whitespace
number = @{ ASCII_DIGIT+ }  // atomic: no whitespace between digits
operator = { "+" | "-" | "*" | "/" }
expression = { number ~ (operator ~ number)* }
// matches "1 + 2 - 3" with any spacing
```

--------------------------------

### Pest Grammar: Comment Rule

Source: https://github.com/pest-parser/book/blob/master/src/examples/jlang.md

Defines J comments, which start with 'NB.' and extend to the end of the line. The newline character at the end of the comment line is not consumed.

```pest
COMMENT = _{ "NB." ~ (!"\n" ~ ANY)* }
```

--------------------------------

### Perform Stack Operations

Source: https://context7.com/pest-parser/book/llms.txt

Use stack operations like PUSH, POP, and PEEK to handle balanced delimiters or context-dependent content like raw strings.
```pest
// PUSH: match and push to stack
// POP: match top of stack and pop
// PEEK: match top of stack without popping
// DROP: remove top without matching

// Match same delimiter on both sides
quoted = {
    PUSH("'" | "\"")  // push the opening quote
    ~ inner
    ~ POP             // match the same quote
}
inner = { (!PEEK ~ ANY)* }

// Rust raw string literals: r#"..."#
raw_string = {
    "r" ~ PUSH("#"*)  // match and push any number of #
    ~ "\"" ~ raw_content ~ "\""
    ~ POP             // match same number of #
}
raw_content = { (!("\"" ~ PEEK) ~ ANY)* }

// PUSH_LITERAL: push without consuming input
balanced = {
    "(" ~ PUSH_LITERAL(")") ~ content ~ POP
    | "<" ~ PUSH_LITERAL(">") ~ content ~ POP
}
```

--------------------------------

### Error Handling with `anyhow`

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

Use `anyhow` for context-aware error messages and error chaining. Ensure `anyhow` is added as a dependency.

```rust
Err(anyhow!("Failed to read program file '{}': {}", file, e))
```

```rust
parse_program(&program_text)?
```

--------------------------------

### Define Integer Literals

Source: https://github.com/pest-parser/book/blob/master/src/examples/rust/literals.md

Defines decimal integer literals that start with a digit and can be followed by digits or underscores. Uses a silent rule for digits to avoid repetition.

```pest
int = { '0'..'9' ~ ('0'..'9' | "_")* }
```

```pest
digit = _{ '0'..'9' }
int = { digit ~ (digit | "_")* }
```

--------------------------------

### Load multiple grammar files

Source: https://github.com/pest-parser/book/blob/master/src/grammars/grammars.md

Combine multiple grammar files into a single parser by stacking the grammar attribute.
```rust
use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "parser/base.pest"]
#[grammar = "parser/grammar.pest"]
struct MyParser;
```

--------------------------------

### Rust Main Function for Parser

Source: https://github.com/pest-parser/book/blob/master/src/examples/jlang.md

The `main` function in Rust reads a J program file, passes it to the `parse` function, and prints the resulting Abstract Syntax Tree (AST). Ensure the `example.ijs` file exists in the directory the program is run from.

```rust
fn main() {
    let unparsed_file = std::fs::read_to_string("example.ijs")
        .expect("cannot read ijs file");
    let astnode = parse(&unparsed_file).expect("unsuccessful parse");
    println!("{:?}", &astnode);
}
```

--------------------------------

### Define JSON Root Rule

Source: https://github.com/pest-parser/book/blob/master/src/examples/json.md

Defines the top-level rule for a JSON file, ensuring it starts with `SOI`, contains either an object or array, and ends with `EOI`. Marked as silent to exclude these tokens from the parse tree.

```pest
json = _{ SOI ~ (object | array) ~ EOI }
```

--------------------------------

### Enable grammar-extras feature

Source: https://github.com/pest-parser/book/blob/master/src/grammars/syntax.md

Add the grammar-extras feature to your Cargo.toml to support rule tagging.

```toml
# ...
pest_derive = { version = "2.7", features = ["grammar-extras"] }
```

--------------------------------

### Abstract Syntax Tree Output

Source: https://github.com/pest-parser/book/blob/master/src/examples/jlang.md

This is the expected Abstract Syntax Tree (AST) output generated by the Rust parser when run with the provided `example.ijs` file. It represents the structure of the J program.

```shell
$ cargo run
  [ ...
  ]
[Print(DyadicOp { verb: Power, lhs: DoublePrecisionFloat(-2.5), rhs: Integer(3) }),
Print(MonadicOp { verb: Square, expr: DoublePrecisionFloat(4.8) }),
Print(IsGlobal { ident: "title", expr: Str("Spinning at the Boundary") }),
Print(MonadicOp { verb: Square, expr: Terms([Integer(-1), Integer(2), Integer(-3), Integer(4)]) }),
Print(DyadicOp { verb: Plus, lhs: Terms([Integer(1), Integer(2), Integer(3)]), rhs: Terms([Integer(10), Integer(20), Integer(30)]) }),
Print(DyadicOp { verb: Plus, lhs: Integer(1), rhs: Terms([Integer(10), Integer(20), Integer(30)]) }),
Print(DyadicOp { verb: Plus, lhs: Terms([Integer(1), Integer(2), Integer(3)]), rhs: Integer(10) }),
Print(DyadicOp { verb: Residue, lhs: Integer(2), rhs: Terms([Integer(0), Integer(1), Integer(2), Integer(3), Integer(4), Integer(5), Integer(6), Integer(7)]) }),
Print(IsGlobal { ident: "another", expr: Str("It\'s Escaped") }),
Print(DyadicOp { verb: Residue, lhs: Integer(3), rhs: Terms([Integer(0), Integer(1), Integer(2), Integer(3), Integer(4), Integer(5), Integer(6), Integer(7)]) }),
Print(DyadicOp { verb: Times, lhs: DyadicOp { verb: Plus, lhs: Integer(2), rhs: Integer(1) }, rhs: DyadicOp { verb: Plus, lhs: Integer(2), rhs: Integer(2) } }),
Print(DyadicOp { verb: Times, lhs: Integer(3), rhs: DyadicOp { verb: Plus, lhs: Integer(2), rhs: Integer(1) } }),
Print(DyadicOp { verb: Plus, lhs: Integer(1), rhs: DyadicOp { verb: Divide, lhs: Integer(3), rhs: Integer(4) } }),
Print(IsGlobal { ident: "x", expr: Integer(100) }),
Print(DyadicOp { verb: Minus, lhs: Ident("x"), rhs: Integer(1) }),
Print(IsGlobal { ident: "y", expr: DyadicOp { verb: Minus, lhs: Ident("x"), rhs: Integer(1) } }),
Print(Ident("y"))]
```

--------------------------------

### Add Dependencies

Source: https://github.com/pest-parser/book/blob/master/src/examples/csv.md

Required dependencies for Cargo.toml to use Pest.
```toml
[dependencies]
pest = "2.6"
pest_derive = "2.6"
```

--------------------------------

### Basic Field Processing with AWK Clone

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

Execute a simple AWK program to print the first two fields of each line from a file.

```bash
cargo run -- -p '{ print $1, $2 }' employees.txt
```

--------------------------------

### Activate pest with an external grammar file

Source: https://github.com/pest-parser/book/blob/master/src/grammars/grammars.md

Use the Parser derive macro to link a struct to an external .pest grammar file located relative to the src directory.

```rust
use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "parser/grammar.pest"] // relative to project `src`
struct MyParser;
```

--------------------------------

### Test Parser

Source: https://github.com/pest-parser/book/blob/master/src/examples/csv.md

Main function to test the parser with valid and invalid inputs.

```rust
fn main() {
    let successful_parse = CSVParser::parse(Rule::field, "-273.15");
    println!("{:?}", successful_parse);

    let unsuccessful_parse = CSVParser::parse(Rule::field, "this is not a number");
    println!("{:?}", unsuccessful_parse);
}
```

--------------------------------

### Basic Parsing Logic with Choice

Source: https://github.com/pest-parser/book/blob/master/src/grammars/peg.md

A general representation of parsing logic using ordered choice and sequence. This pattern tries one sequence and, if it fails, tries another.

```pest
(this ~ next_thing) | (other_thing)
```

--------------------------------

### Bash Script for Performance Benchmarking

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

Shell commands to generate large test data and benchmark the AWK clone against GNU AWK using a simple filtering and aggregation task. This helps assess performance under load.
```bash
# Generate test data
seq 1 100000 | awk '{print "user" $1, int(rand()*100), "dept" int(rand()*10)}' > large_dataset.txt

# Benchmark against GNU AWK
time awk '$2 > 50 {
    sum += $2
    count++
}
END { print sum/count }' large_dataset.txt

time ./target/release/awk-clone -p '$2 > 50 {
    sum += $2
    count++
}
END { print sum/count }' large_dataset.txt
```

--------------------------------

### Implement Lookahead Predicates

Source: https://context7.com/pest-parser/book/llms.txt

Use positive and negative lookahead predicates to match patterns without consuming input, useful for context-sensitive parsing.

```pest
// Positive predicate (&): succeeds if inner matches
digit_followed_by_letter = { &ASCII_DIGIT ~ ASCII_DIGIT ~ ASCII_ALPHA }

// Negative predicate (!): succeeds if inner fails
// Common idiom: "any character except..."
not_quote = { !"\"" ~ ANY }

// String content: any character except quote or backslash,
// or a backslash followed by an escaped character
string_char = {
    !("\"" | "\\") ~ ANY
    | "\\" ~ ("\"" | "\\" | "n" | "r" | "t")
}

// Block comment (not containing closing delimiter)
block_comment = { "/*" ~ (!"*/" ~ ANY)* ~ "*/" }
```

--------------------------------

### Initialize Properties HashMap

Source: https://github.com/pest-parser/book/blob/master/src/examples/ini.md

Initializes the nested HashMap structure to store parsed INI data.

```rust
fn main() {
    // ...

    let mut properties: HashMap<&str, HashMap<&str, &str>> = HashMap::new();

    // ...
}
```

--------------------------------

### Define Statements and Control Flow

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

Defines rules for assignments, control flow structures, and print statements.
```pest
// Variable and field assignment
assignment = {
    (identifier | field_ref)
    ~ (assign | add_assign | sub_assign | mul_assign | div_assign | mod_assign)
    ~ expr
}

// Increment and decrement statements
increment_stmt = { (identifier | field_ref) ~ increment }
decrement_stmt = { (identifier | field_ref) ~ decrement }

// Print statement with optional arguments
print_stmt = { "print" ~ print_args? }
print_args = { expr ~ ("," ~ expr)* }

// Control flow
if_stmt = { "if" ~ "(" ~ expr ~ ")" ~ statement ~ ("else" ~ statement)? }
while_stmt = { "while" ~ "(" ~ expr ~ ")" ~ statement }
for_stmt = { "for" ~ "(" ~ assignment? ~ ";" ~ expr? ~ ";" ~ assignment? ~ ")" ~ statement }

// Block statements
block = { "{" ~ statement* ~ "}" }

// Expression as statement
expr_stmt = { expr }

// Union of all statement types
statement = _{
    assignment | increment_stmt | decrement_stmt | print_stmt
    | if_stmt | while_stmt | for_stmt | block | expr_stmt
}
```

--------------------------------

### Initialize a PrattParser instance

Source: https://github.com/pest-parser/book/blob/master/src/precedence.md

Configures operator precedence and associativity. Operators defined earlier have lower precedence, and those chained with '|' share equal precedence.

```rust
let pratt = PrattParser::new()
    .op(Op::infix(Rule::add, Assoc::Left) | Op::infix(Rule::sub, Assoc::Left))
    .op(Op::infix(Rule::mul, Assoc::Left) | Op::infix(Rule::div, Assoc::Left))
    .op(Op::infix(Rule::pow, Assoc::Right))
    .op(Op::postfix(Rule::fac))
    .op(Op::prefix(Rule::neg));
```

--------------------------------

### Inspecting a Pair

Source: https://github.com/pest-parser/book/blob/master/src/parser_api.md

Demonstrates how to extract a rule, access the matched string, and retrieve inner rules from a Pair.

```rust
let pair = Parser::parse(Rule::enclosed, "(..6472..) 
and more text") .unwrap().next().unwrap(); assert_eq!(pair.as_rule(), Rule::enclosed); assert_eq!(pair.as_str(), "(..6472..)"); let inner_rules = pair.into_inner(); println!("{}", inner_rules); // --> [number(3, 7)] ``` -------------------------------- ### Implement Parser Struct Source: https://github.com/pest-parser/book/blob/master/src/examples/csv.md Rust code to compile the grammar file into a parser struct. ```rust use pest::Parser; use pest_derive::Parser; #[derive(Parser)] #[grammar = "csv.pest"] pub struct CSVParser; ``` -------------------------------- ### Iterate Over Parse Tokens Source: https://github.com/pest-parser/book/blob/master/src/parser_api.md Demonstrates how to obtain and iterate over tokens from a successful parse result using the `tokens()` method. This is useful for analyzing the structure of the parsed input. ```rust let parse_result = Parser::parse(Rule::sum, "1773 + 1362").unwrap(); let tokens = parse_result.tokens(); for token in tokens { println!("{:?}", token); } ``` -------------------------------- ### Read and Parse INI File Source: https://github.com/pest-parser/book/blob/master/src/examples/ini.md Reads an INI file from disk and invokes the parser. ```rust use std::collections::HashMap; use std::fs; fn main() { let unparsed_file = fs::read_to_string("config.ini").expect("cannot read file"); let file = INIParser::parse(Rule::file, &unparsed_file) .expect("unsuccessful parse") // unwrap the parse result .next().unwrap(); // get and unwrap the `file` rule; never fails // ... } ``` -------------------------------- ### Define Section and Property Rules in Pest Source: https://github.com/pest-parser/book/blob/master/src/examples/ini.md Defines the structure for sections and key-value properties. 
```pest
section = { "[" ~ name ~ "]" }
property = { name ~ "=" ~ value }
```

--------------------------------

### Execute Parser from Standard Input

Source: https://github.com/pest-parser/book/blob/master/src/examples/calculator.md

Reads lines from stdin and processes them using the defined parser and AST mapping function.

```rust
// `BufRead` must be in scope for `.lines()` on the locked stdin handle.
use std::io::{self, BufRead};

fn main() -> io::Result<()> {
    for line in io::stdin().lock().lines() {
        match CalculatorParser::parse(Rule::equation, &line?) {
            Ok(mut pairs) => {
                println!(
                    "Parsed: {:#?}",
                    parse_expr(
                        // inner of expr
                        pairs.next().unwrap().into_inner()
                    )
                );
            }
            Err(e) => {
                eprintln!("Parse failed: {:?}", e);
            }
        }
    }
    Ok(())
}
```

--------------------------------

### Configure Implicit Whitespace and Comments

Source: https://github.com/pest-parser/book/blob/master/src/grammars/syntax.md

Define WHITESPACE and COMMENT rules to automatically handle spacing between tokens. These rules are run repeatedly and should match a single unit.

```pest
expression = { "4" ~ "+" ~ "5" }
WHITESPACE = _{ " " }
COMMENT = _{ "/*" ~ (!"*/" ~ ANY)* ~ "*/" }
```

```text
"4+5"
"4 + 5"
"4  +     5"
"4 /* comment */ + 5"
```

```pest
expression = { "4" ~ (ws | com)* ~ "+" ~ (ws | com)* ~ "5" }
ws = _{ " " }
com = _{ "/*" ~ (!"*/" ~ ANY)* ~ "*/" }
```

--------------------------------

### Handling parse results

Source: https://github.com/pest-parser/book/blob/master/src/parser_api.md

Demonstrates matching on the Result returned by the parse method to handle success or failure.

```rust
// check whether parse was successful
match Parser::parse(Rule::enclosed, "(..6472..)") {
    Ok(mut pairs) => {
        let enclosed = pairs.next().unwrap();
        // ...
    }
    Err(error) => {
        // ...
    }
}
```

--------------------------------

### Pest Grammar for Binary Operators

Source: https://github.com/pest-parser/book/blob/master/src/examples/calculator.md

Defines the binary operators (addition, subtraction, multiplication, division) used in the calculator grammar.

```pest
bin_op = _{ add | subtract | multiply | divide }
add = { "+" }
subtract = { "-" }
multiply = { "*" }
divide = { "/" }
```

--------------------------------

### Parse raw strings using the stack

Source: https://github.com/pest-parser/book/blob/master/src/grammars/syntax.md

Uses the stack to track the number of hash signs in a raw string literal.

```pest
raw_string = {
    "r" ~ PUSH("#"*) ~ "\""    // push the number signs onto the stack
    ~ raw_string_interior
    ~ "\"" ~ POP               // match a quotation mark and the number signs
}
raw_string_interior = {
    (
        !("\"" ~ PEEK)    // unless the next character is a quotation mark
                          // followed by the correct amount of number signs,
        ~ ANY             // consume one character
    )*
}
```

--------------------------------

### Define doc comments in Pest

Source: https://github.com/pest-parser/book/blob/master/src/grammars/comments.md

Use /// for rule-level documentation and //! for file-level documentation.

```pest
//! A parser for JSON file.

json = { ... }

/// Matches object, e.g.: `{ "foo": "bar" }`
object = { ... }
```

--------------------------------

### Implement Top-Level Parsing

Source: https://github.com/pest-parser/book/blob/master/src/examples/awk.md

Parses the input string into a Program AST by iterating over the generated parse tree rules.

```rust
pub fn parse_program(input: &str) -> Result {
    let mut pairs = AwkParser::parse(Rule::program, input)?;
    let program_pair = pairs.next().unwrap();

    let mut rules = Vec::new();
    for rule_pair in program_pair.into_inner() {
        if rule_pair.as_rule() == Rule::rule {
            rules.push(parse_rule(rule_pair)?);
        }
    }

    Ok(Program { rules })
}
```

--------------------------------

### Add Pest Dependencies to Cargo.toml

Source: https://github.com/pest-parser/book/blob/master/src/examples/rust/setup.md

Include pest and pest_derive in your Cargo.toml file to use the Pest parsing library and its derive macros.

```toml
[dependencies]
pest = "2.6"
pest_derive = { version = "2.6", features = ["grammar-extras"] }
```

--------------------------------

### Label Rule Parts with Tags

Source: https://context7.com/pest-parser/book/llms.txt

Enable the `grammar-extras` feature to use tags for labeling parts of rules. Use `find_first_tagged` to extract tagged pairs by their label.

```rust
// In Cargo.toml:
// pest_derive = { version = "2.7", features = ["grammar-extras"] }

// In grammar.pest:
// assignment = { #name = identifier ~ "=" ~ #value = expression }

use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar_inline = r#"
identifier = @{ ASCII_ALPHA+ }
number = @{ ASCII_DIGIT+ }
assignment = { #name = identifier ~ "=" ~ #value = number }
"#]
struct MyParser;

fn main() {
    let pairs = MyParser::parse(Rule::assignment, "count=42")
        .unwrap();

    // Find tagged pairs
    if let Some(name) = pairs.clone().find_first_tagged("name") {
        println!("Name: {}", name.as_str());
    }
    if let Some(value) = pairs.clone().find_first_tagged("value") {
        println!("Value: {}", value.as_str());
    }
    // Output:
    // Name: count
    // Value: 42

    // Check tag on individual pair
    for pair in pairs {
        if let Some(tag) = pair.as_node_tag() {
            println!("Found tag: {}", tag);
        }
    }
}
```

--------------------------------

### Define a calculator grammar in pest

Source: https://github.com/pest-parser/book/blob/master/src/intro.md

A complete grammar definition for a simple calculator using PEG syntax.

```pest
num = @{ int ~ ("." ~ ASCII_DIGIT*)? ~ (^"e" ~ int)? }
int = { ("+" | "-")?
      ~ ASCII_DIGIT+ }

operation = _{ add | subtract | multiply | divide | power }
add = { "+" }
subtract = { "-" }
multiply = { "*" }
divide = { "/" }
power = { "^" }

expr = { term ~ (operation ~ term)* }
term = _{ num | "(" ~ expr ~ ")" }

calculation = _{ SOI ~ expr ~ EOI }

WHITESPACE = _{ " " | "\t" }
```

--------------------------------

### Define atomic and compound atomic rules in Pest

Source: https://github.com/pest-parser/book/blob/master/src/grammars/syntax.md

Use @ for atomic rules to silence interior rules, or $ for compound atomic rules to keep interior tokens.

```pest
/// Atomic rules start with `@`
atomic = @{ ... }

/// Compound atomic rules start with `$`
compound_atomic = ${ ... }
```

--------------------------------

### Define term and expression rules

Source: https://github.com/pest-parser/book/blob/master/src/examples/rust/syntax.md

Combines unaries, values, and calls into a term rule, then defines the expression rule using infix operators.

```pest
term = { op_unary* ~ value ~ (dot ~ call)* }
expr = { term ~ (op_infix ~ term)* }
```

--------------------------------

### Parse J Program Source to AST

Source: https://github.com/pest-parser/book/blob/master/src/examples/jlang.md

Parses a J program string into a vector of AST nodes. It uses the Pest parser to tokenize the input and then builds the AST, wrapping each top-level expression in a `Print` node.

```rust
pub fn parse(source: &str) -> Result, Error> {
    let mut ast = vec![];

    let pairs = JParser::parse(Rule::program, source)?;
    for pair in pairs {
        match pair.as_rule() {
            Rule::expr => {
                ast.push(Print(Box::new(build_ast_from_expr(pair))));
            }
            _ => {} // Ignore other rules
        }
    }

    Ok(ast)
}
```

--------------------------------

### Test Floating Point Parsing

Source: https://github.com/pest-parser/book/blob/master/src/examples/rust/literals.md

Tests the parsing of floating-point numbers, including cases with a decimal point and exponents, using the `parses_to!` macro.

```rust
#[test]
fn zero_point() {
    parses_to! {
        parser: RustParser,
        input: "0.",
        rule: Rule::float,
        tokens: [
            float(0, 2, [
                int(0, 1)
            ])
        ]
    };
}
```

```rust
#[test]
fn one_exponent() {
    parses_to! {
        parser: RustParser,
        input: "1e10",
        rule: Rule::float,
        tokens: [
            float(0, 4, [
                int(0, 1),
                exp(1, 4, [
                    int(2, 4)
                ])
            ])
        ]
    };
}
```

--------------------------------

### Eager Repetition in Pest

Source: https://github.com/pest-parser/book/blob/master/src/grammars/peg.md

Illustrates the eager matching behavior of the repetition operator (+) in Pest. This rule matches one or more ASCII digits, consuming as many as are available.

```pest
ASCII_DIGIT+
```
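
The effect of eager repetition can be illustrated without pest itself. The following is a minimal std-only sketch, not pest's implementation: `match_digits_plus` and `match_digits_then_digit` are hypothetical helpers that mimic how a PEG evaluates `ASCII_DIGIT+` and `ASCII_DIGIT+ ~ ASCII_DIGIT`. Because the repetition takes every digit it can and is never retried with fewer, the second expression has nothing left to match.

```rust
/// Mimics `ASCII_DIGIT+`: consume as many leading ASCII digits as possible.
/// Returns the number of bytes consumed, or `None` if there is no digit.
/// (Hypothetical helper for illustration; not part of pest.)
fn match_digits_plus(input: &str) -> Option<usize> {
    let consumed = input
        .bytes()
        .take_while(|b| b.is_ascii_digit())
        .count();
    if consumed == 0 { None } else { Some(consumed) }
}

/// Mimics `ASCII_DIGIT+ ~ ASCII_DIGIT`: the repetition runs first and is
/// never retried with fewer digits, so the trailing digit cannot match.
/// (Hypothetical helper for illustration; not part of pest.)
fn match_digits_then_digit(input: &str) -> Option<usize> {
    let consumed = match_digits_plus(input)?;
    // Only ASCII bytes were consumed, so slicing here is on a char boundary.
    let next = input[consumed..].bytes().next()?;
    if next.is_ascii_digit() { Some(consumed + 1) } else { None }
}

fn main() {
    // The repetition stops at the first non-digit.
    println!("{:?}", match_digits_plus("2024-01")); // Some(4)
    // All four digits are already consumed, so the sequence fails.
    println!("{:?}", match_digits_then_digit("2024")); // None
}
```

In a real grammar, the usual way around this is to restructure the rule, for example with a lookahead predicate, so that the repetition stops before the part that must follow it.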