### Match Any Character Example

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/AnyChar.txt

Demonstrates the usage of the 'A' token, which matches '\' followed by any character and then '/'. This example shows a successful match with three 'A' tokens.

```lezer
\_/ /\x/

==> T(A, A, A)
```

--------------------------------

### Example: Using Extended Token 'async'

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Specialize.txt

Shows how an extended token, 'async' in this example, can be interpreted differently based on the following token. When followed by a number, it's parsed as 'Async(Number)'; otherwise, it's treated as a regular identifier.

```lezer
# Use of extended token can be determined by next token

async 1; async;

==> T(Async(Number), Id)
```

--------------------------------

### Example of Explicit Inline Rule Usage

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/ExplicitInline.txt

Demonstrates how an explicit inline rule, 'prefix', is used in a grammar. The example shows a valid input string and its corresponding parse tree.

```lezer
# Compiles

a::a

==> T(Foo(A, A))
```

--------------------------------

### Word Token Example

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/TokenExpr.txt

Illustrates the recognition of 'Word' tokens, which start with an ASCII letter followed by letters or digits. Example input includes mixed alphanumeric strings.

```lezer
# Word tokens

Hello Catch22 Foo azAZ09

==> T(Word,Word,Word,Word)
```

--------------------------------

### Token Calls: 'let' and 'if' Keywords

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/TokenArgs.txt

Shows how keyword tokens defined with arguments can be matched. This example specifically matches 'let' and 'if' keywords.

```lezer
let if let

==> T(let, if, let)
```

--------------------------------

### Lezer Grammar Example with Empty Productions

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/EmptyAfterLookahead.txt

Illustrates the usage of the previously defined Lezer grammar, showing two 'GlobalConstantDeclaration' examples that result in empty productions due to the semicolon placement. This demonstrates the parser's ability to handle such cases without corrupting the generated tree.

```lezer
// Comment
// Comment Comment
const ACES_INPUT = ;
// Comment Comment Comment
const ACES_OUTPUT = ;

==> Program(
  GlobalConstantDeclaration(AttributeList,Keyword,Identifier),
  GlobalConstantDeclaration(AttributeList,Keyword,Identifier))
```

--------------------------------

### Parse 'a' to Doc(A)

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/NodeDeclaration.txt

Example of parsing the input 'a' which maps to the 'A' token within the 'Doc' node.

```lezer
# Adds a document node

a

==> Doc(A)
```

--------------------------------

### Example of Grouped Parser Output

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/DefineGroup.txt

Demonstrates the resulting parser tree structure when the `@isGroup` property is applied.

```lezer
# Adds the group prop

a(1)

==> T(Id[group=Expression],ParenExpr[group=Expression]("(", Number[group=Expression], ")"))
```

--------------------------------

### Example: Using 'async' as a Contextual Keyword

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Specialize.txt

Demonstrates how 'async' can still be used as a regular identifier when it is defined as a contextual keyword with @extend. The parser correctly identifies 'let async = 10;' as a declaration.

```lezer
# Can use a contextual keyword as regular identifier

let async = 10; exit;

==> T(Decl(Id, Number), Exit)
```

--------------------------------

### Example Input and Output

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/ExternalSpecializer.txt

Demonstrates how the grammar and external specializer handle a specific input string. The input 'one a two b three' is parsed according to the grammar rules, resulting in a specialized token structure.

```text
one a two b three

==> T(One(Id), Two(Id), Id)
```

--------------------------------

### Example of Division vs. RegExp

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/TokenGroups.txt

Demonstrates how the parser disambiguates division operators from regular expression literals based on defined rules. The output shows the parsed structure.

```lezer
# Disambiguates division from regexp

x / y / /foo/

==> T(BinOp(BinOp(Symbol,Symbol),RegExp))
```

--------------------------------

### Example Input and Parse Tree

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/SkipPosition.txt

Illustrates an input string processed by the grammar defined above, showing how skipped tokens are represented in the resulting parse tree. This demonstrates the practical application of the @skip directive.

```text
# Puts skipped content in the right tree position

x'c';'c' x 'c' x 'c'; 'c'

==> T(Statement(Variable, Comment), Comment, Statement(Variable, Comment, Variable, Comment), Comment)
```

--------------------------------

### Lezer Parser Example: Single Part

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/TailRecursiveToken.txt

Demonstrates a simple single-part token definition in Lezer. This is used for basic string literals.

```lezer
`foo`
```

--------------------------------

### Parse Input with Comments and Whitespace

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Whitespace.txt

Example showing that comments and surrounding whitespace are skipped, and comments themselves can be captured if not skipped.

```lezer
# Skips comments

x x # I'm a comment!
x

==> T(X, X, Comment, X)
```

--------------------------------

### Lezer Parser Example: Dollar Signs

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/TailRecursiveToken.txt

Illustrates handling of dollar signs within string literals in Lezer, including escaped dollar signs and interpolations starting with a dollar sign. This is useful for languages where '$' has special meaning.

```lezer
`$$` `$` `$${.}`
```

--------------------------------

### Parse Input with Whitespace

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Whitespace.txt

Example input demonstrating that consecutive spaces and newlines are skipped, resulting in a clean parse tree.

```lezer
# Skips whitespace

x x

==> T(X, X)
```

--------------------------------

### Number and Operator Token Example

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/TokenExpr.txt

Demonstrates the parsing of 'Number' and 'Operator' tokens. This includes various number formats (integers, decimals, scientific notation) and operators like '+' and '++'.

```lezer
# Numbers and operators

50 + 200e-5 ++ .2 - 111.111e+111

==> T(Number,Operator,Number,Operator,Number,Operator,Number)
```

--------------------------------

### Lezer Example: Arrow Expression Parsing

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Conflict.txt

Demonstrates parsing an arrow expression with a parameter and an identifier. This shows how the ArrowExpr rule is applied.

```lezer
# Arrow

(a) => b

==> T(ArrowExpr(ParamName, Identifier))
```

--------------------------------

### Token Precedence Example

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/TokenPrecedence.txt

This example shows how precedence affects parsing. Given the input 'ABBAB', the parser correctly identifies 'BB' before 'B' due to the defined precedence, resulting in the parse tree T(A, BB, A, B).

```lezer
# Token precedence

ABBAB

==> T(A, BB, A, B)
```

--------------------------------

### Resynchronizing to an Outer Context

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Recover.txt

This example shows how the parser can resynchronize to an outer context after encountering an error. It demonstrates the parser's ability to recover and continue parsing.

```lezer
@top T { (Class | Block)* }

Class { "class" "{" "classitem"* "}" }

Block { "{" (Block | "statement")* "}" }

@skip { whitespace }

@tokens {
  whitespace { @whitespace+ }
}

# Can resynchronize to an outer context

{ { { class { classitem classitem }

==> T(Block(Block(Block(⚠),⚠),⚠),Class)
```

--------------------------------

### Lezer Parser Example: Interpolation

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/TailRecursiveToken.txt

Shows how to define interpolation within strings in Lezer. This allows embedding expressions or other constructs within string literals.

```lezer
`foo${.}bar`
```

--------------------------------

### Properly Placing End-of-File Errors

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Recover.txt

This example illustrates how the parser correctly identifies and places end-of-file errors. It shows the parser's behavior when the input ends unexpectedly.

```lezer
# Properly places end-of-file errors

{ { {

==> T(Block(Block(Block(⚠),⚠),⚠))
```

--------------------------------

### Parse '(b)' to Doc(ParenOpen, B, ParenClose)

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/NodeDeclaration.txt

Example of parsing the input '(b)' which includes parentheses and the 'b' token, mapping to 'ParenOpen', 'B', and 'ParenClose' within the 'Doc' node.

```lezer
# Applies punctuation info

(b)

==> Doc(ParenOpen, B, ParenClose)
```

--------------------------------

### Example: Non-Contextual Keyword Usage

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Specialize.txt

Illustrates that non-contextual keywords, like 'print' in this case, cannot be used as identifiers. Attempting to declare 'let print = 2;' results in a parse error.

```lezer
# Can not use a non-contextual keyword as identifier

print 10; let print 2;

==> T(Print(Number),Decl(⚠),⚠,Print(Number))
```

--------------------------------

### Example of Precedence Resolution

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/PrecedenceOrder.txt

Demonstrates how precedence rules resolve ambiguity. In this case, `Tag` has higher precedence than `<`, `<<`, and `<<<`, ensuring it's matched first when applicable.

```lezer
T(Tag, "<<<", "<<", "<")
```

--------------------------------

### Apply Scoped Skip Rules

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/ScopedSkip.txt

Demonstrates the application of the defined scoped skip rules. The first example shows valid parsing with dashes being skipped, while the second highlights invalid whitespace within parentheses, resulting in error markers.

```lezer
# Applies the correct skip rules

x (---b---) (b)

==> T(A, A, A)
```

```lezer
# Marks invalid whitespace

( b )

==> T(A(⚠, ⚠))
```

--------------------------------

### Define Line Comment and Whitespace Tokens

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/ForcedSkipReduce.txt

Defines tokens for line comments (starting with '//' and running to the end of the line) and whitespace. These are typically skipped during parsing.

```lezer
@tokens {
  LineComment { "//" ![\n]* }
  space { @whitespace+ }
}
```

--------------------------------

### Example Parse with EOF Token

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Eof.txt

Illustrates a successful parse of the input 'xxx' into the grammar rule 'A'. The output A(X, X, Y) shows that the last 'x' was successfully parsed as token Y, utilizing the @eof marker.

```lezer
xxx

==> A(X, X, Y)
```

--------------------------------

### Handle Unfinished Skipped Terms

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/ForcedSkipReduce.txt

Demonstrates how the parser generator properly terminates unfinished skipped terms, illustrated by an example where a block comment is not properly closed.

```lezer
# Properly terminates unfinished skipped terms

// Line
/* Block

==> T(LineComment, BlockComment(⚠))
```

--------------------------------

### Lezer Example: Parenthesized Expression Parsing

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Conflict.txt

Illustrates parsing a parenthesized expression containing an identifier. This showcases the ParenExpr rule.

```lezer
# Paren expr

(a)

==> T(ParenExpr(Identifier))
```

--------------------------------

### Example Usage with Astral Characters

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Astral.txt

Demonstrates parsing input containing astral characters, including identifiers with astral characters and tokens defined using astral emojis. Note the handling of unsupported characters.

```lezer
# Tokens with astral characters

foo föö 象𫝄鼻 -💩 -🦆 -& -🍰

==> T(Id,Id,Id,Thing,Thing,⚠,⚠,Id)
```

--------------------------------

### Match Astral Characters

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/AnyChar.txt

This example shows that the token definition 'A' can successfully match astral characters like emojis. The pattern '\' followed by an astral character and then '/' results in a single 'A' token.

```lezer
\🤢/

==> T(A)
```

--------------------------------

### Parse String with Local Tokens

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/LocalTokens.txt

Example demonstrating how a string containing interpolated expressions is parsed using the defined local and global tokens. The parser correctly identifies nested structures and local token types.

```lezer
"foo{{x}}bar{{x}}yz"

==> T(String(Interpolation(InterpolationStart,X,InterpolationEnd),Interpolation(InterpolationStart,X,InterpolationEnd),Letter,Letter))
```

--------------------------------

### Define Global Tokens in Lezer Parser

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/LocalTokensBadSkip.txt

Use the @tokens directive to define tokens that are available globally throughout the parser. This example defines the InterpolationEnd token.

```lezer
@tokens {
  InterpolationEnd { "}}" }
}
```

--------------------------------

### Lezer Grammar with Local and Global Tokens

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/LocalTokensBadMix.txt

Defines a grammar with a top-level rule 'T' and an 'expr' rule that includes string literals. It showcases local token definitions for 'stringEnd', 'InterpolationStart', and 'stringContent', alongside global tokens for 'InterpolationEnd'. This setup is useful for parsing structured text with embedded expressions.

```lezer
@top T { expr* }

expr {
  X { "x" } |
  String { '"' (stringContent | Interpolation)* stringEnd? }
}

Interpolation { InterpolationStart expr InterpolationEnd }

@local tokens {
  stringEnd { '"' }
  InterpolationStart { "{{" }
  @else stringContent
}

@tokens {
  InterpolationEnd { "}}" }
}
```

--------------------------------

### Parse Tuple with Trailing Comma

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Tuple.txt

Example of parsing a tuple with a trailing comma. The grammar correctly identifies two 'Expr' tokens within the tuple.

```lezer
(E,E,)
```

--------------------------------

### Define Grammar for Binary Expressions

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/BinaryExpr.txt

Defines the grammar rules for expressions, including atoms and binary expressions with multiplication and addition operators. This setup is crucial for parsing mathematical or logical expressions.

```lezer
@precedence { mult @left, plus @left }

@top T { expr }

expr { atom | BinaryExpr }

BinaryExpr {
  expr !mult MultOp expr |
  expr !plus AddOp expr
}

atom { Symbol | "(" expr ")" }

@tokens {
  MultOp { "*" | "/" }
  AddOp { "+" | "-" }
  Symbol { "x" | "y" }
}
```

--------------------------------

### Define Grammar with Skipped Tokens

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/ForcedSkipReduce.txt

Defines the top-level grammar rule 'T' and specifies tokens to be skipped, including line comments, spaces, and block comments. This setup ensures that comments and whitespace are ignored during parsing.

```lezer
@top T { "a"* }

@skip { LineComment | space | BlockComment }
```

--------------------------------

### Doesn't assign delim when tokens are part of a choice

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/AutoDelimTag.txt

Example showing that delimiter properties are not assigned when tokens are part of a choice rule, such as in DualExpr, where multiple delimiter options exist.

```lezer
{{8}}

==> T(DualExpr("{{", Number, "}}"))
```

--------------------------------

### Define Block Comment Structure

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/ForcedSkipReduce.txt

Specifies the structure for block comments, including their start ('/*'), content, and end ('*/'). It also defines local tokens for the block comment end and its content, allowing for nested or complex comment structures.

```lezer
@skip {} {
  BlockComment { "/*" blockCommentContent* blockCommentEnd }
}

@local tokens {
  blockCommentEnd { "*/" }
  @else blockCommentContent
}
```

--------------------------------

### buildParser(text, options?)

Source: https://context7.com/lezer-parser/generator/llms.txt

Compiles a grammar string directly into a live `LRParser` instance. Ideal for unit tests since no file I/O or bundling is needed. External tokenizers, specializers, prop sources, and context trackers can be injected via `options`.

````APIDOC
## `buildParser(text, options?)`: Build an in-memory parser for testing

```typescript
import { buildParser } from "@lezer/generator"
import { ExternalTokenizer, InputStream } from "@lezer/lr"

const grammar = `
@top Program { statement+ }

statement {
  LetDecl { kw<"let"> Name "=" Number ";" } |
  PrintStmt { kw<"print"> Number ";" }
}

kw<term> { @specialize<Name, term> }

@skip { spaces }

@tokens {
  spaces { @whitespace+ }
  Name { @asciiLetter+ }
  Number { @digit+ }
}
`

const parser = buildParser(grammar, {
  // Custom warning handler instead of console.warn
  warn(msg) { console.error("Grammar warning:", msg) },
  // Include term name metadata in the serialized tables (useful for debugging)
  includeNames: true,
})

// Parse source text; returns a Lezer Tree
const tree = parser.parse("let foo = 42; print 99;")
console.log(tree.toString())
// => Program(LetDecl(Name,Number),PrintStmt(Number))

// Inject external tokenizers when the grammar uses @external tokens
const grammarWithExternal = `
@top T { expr* }

expr {
  X { "x" } |
  Braced { braceOpen expr braceClose }
}

@external tokens ext from "./tokens" { braceOpen, braceClose }
`

const parserWithExt = buildParser(grammarWithExternal, {
  externalTokenizer(name, terms) {
    if (name === "ext") {
      return new ExternalTokenizer((input: InputStream) => {
        if (input.next === "{".charCodeAt(0)) {
          input.advance()
          input.acceptToken(terms.braceOpen)
        } else if (input.next === "}".charCodeAt(0)) {
          input.advance()
          input.acceptToken(terms.braceClose)
        }
      })
    }
    throw new Error("Unknown external tokenizer: " + name)
  },
})

console.log(parserWithExt.parse("x{x}x").toString())
// => T(X, Braced(X), X)
```
````

--------------------------------

### Define and Use Parameterized Tokens

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/TokenArgs.txt

Demonstrates defining tokens with arguments and how they can be used in grammar rules. The `kw<word>` syntax allows creating keyword tokens with specific names.

```lezer
@top T { (kw<"let"> | kw<"if"> | Foo)+ }

@skip { whitespace }

@tokens {
  whitespace { @whitespace+ }
  kw<word>[@name={word}] { word }
  Foo { bar<"foo"> }
  bar<x> { baz<x> }
  baz<z> { "!" z }
}
```

--------------------------------

### Define External and Local Tokens

Source: https://context7.com/lezer-parser/generator/llms.txt

Demonstrates how to declare external tokenizers resolved at runtime and local, context-sensitive token groups.

```lezer
@external tokens indentation from "./indent-tokens" { indent, dedent }

// @local tokens: context-sensitive token group (e.g., inside strings)
@local tokens {
  stringEnd { '"' }
  Interpolation { "${" }
  @else stringContent
}
```

--------------------------------

### Parse Tuple without Trailing Comma

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Tuple.txt

Example of parsing a tuple without a trailing comma. The grammar correctly identifies three 'Expr' tokens within the tuple.

```lezer
(E,E,E)
```

--------------------------------

### Basic lezer-generator CLI Usage

Source: https://context7.com/lezer-parser/generator/llms.txt

Compiles a .grammar file into JavaScript parser and terms files. Use -o to specify the output file name.

```bash
lezer-generator my-lang.grammar -o my-lang.parser.js
```

```bash
lezer-generator my-lang.grammar
```

```bash
lezer-generator --cjs my-lang.grammar -o my-lang.parser.cjs
```

```bash
lezer-generator --names my-lang.grammar -o my-lang.parser.js
```

```bash
lezer-generator --noTerms my-lang.grammar -o my-lang.parser.js
```

```bash
lezer-generator --typeScript my-lang.grammar -o my-lang.parser.ts
```

```bash
lezer-generator --export myParser my-lang.grammar -o my-lang.parser.js
```

```bash
lezer-generator --help
```

--------------------------------

### Build In-Memory Parser with buildParser

Source: https://context7.com/lezer-parser/generator/llms.txt

Compiles a grammar string into a live LRParser instance for testing. External tokenizers, specializers, prop sources, and context trackers can be injected via options. Use this for unit tests as it avoids file I/O and bundling.

```typescript
import { buildParser } from "@lezer/generator"
import { ExternalTokenizer, InputStream } from "@lezer/lr"

const grammar = `
@top Program { statement+ }

statement {
  LetDecl { kw<"let"> Name "=" Number ";" } |
  PrintStmt { kw<"print"> Number ";" }
}

kw<term> { @specialize<Name, term> }

@skip { spaces }

@tokens {
  spaces { @whitespace+ }
  Name { @asciiLetter+ }
  Number { @digit+ }
}
`

const parser = buildParser(grammar, {
  // Custom warning handler instead of console.warn
  warn(msg) { console.error("Grammar warning:", msg) },
  // Include term name metadata in the serialized tables (useful for debugging)
  includeNames: true,
})

// Parse source text; returns a Lezer Tree
const tree = parser.parse("let foo = 42; print 99;")
console.log(tree.toString())
// => Program(LetDecl(Name,Number),PrintStmt(Number))
```

```typescript
// Inject external tokenizers when the grammar uses @external tokens
const grammarWithExternal = `
@top T { expr* }

expr {
  X { "x" } |
  Braced { braceOpen expr braceClose }
}

@external tokens ext from "./tokens" { braceOpen, braceClose }
`

const parserWithExt = buildParser(grammarWithExternal, {
  externalTokenizer(name, terms) {
    if (name === "ext") {
      return new ExternalTokenizer((input: InputStream) => {
        if (input.next === "{".charCodeAt(0)) {
          input.advance()
          input.acceptToken(terms.braceOpen)
        } else if (input.next === "}".charCodeAt(0)) {
          input.advance()
          input.acceptToken(terms.braceClose)
        }
      })
    }
    throw new Error("Unknown external tokenizer: " + name)
  },
})

console.log(parserWithExt.parse("x{x}x").toString())
// => T(X, Braced(X), X)
```

--------------------------------

### Configure Parser Build Options

Source: https://context7.com/lezer-parser/generator/llms.txt

Defines `BuildOptions` for customizing parser generation, including file names, warning handlers, module formats, and external tokenizer/prop sources.

```typescript
import { buildParserFile, BuildOptions } from "@lezer/generator"
import { NodeProp, NodePropSource } from "@lezer/common"
import { ExternalTokenizer, Stack, ContextTracker } from "@lezer/lr"

const options: BuildOptions = {
  // Filename shown in error/warning messages
  fileName: "my-lang.grammar",

  // Replace the default console.warn handler
  warn(message: string) {
    process.stderr.write(`[grammar] ${message}\n`)
  },

  // Include term names in serialized data (larger output, useful for debugging)
  includeNames: true,

  // Output module format: "es" (default) or "cjs"
  moduleStyle: "es",

  // Emit TypeScript source instead of JavaScript
  typeScript: false,

  // Name of the exported parser variable (default: "parser")
  exportName: "myLangParser",

  // Provide placeholder ExternalTokenizer instances (required by buildParser)
  externalTokenizer(name: string, terms: Record<string, number>): ExternalTokenizer {
    throw new Error("No external tokenizer: " + name)
  },

  // Resolve @external prop sources
  externalPropSource(name: string): NodePropSource {
    throw new Error("No prop source: " + name)
  },

  // Provide external specializer functions (buildParser only)
  externalSpecializer(name: string, terms: Record<string, number>) {
    return (value: string, stack: Stack) => -1
  },

  // Create NodeProp instances for @external prop declarations
  externalProp(name: string): NodeProp<any> {
    return new NodeProp({ deserialize: x => x })
  },

  // Attach a context tracker to the built parser (buildParser only)
  contextTracker: new ContextTracker({ start: null, shift: () => null, strict: false }),
}

const { parser, terms } = buildParserFile("@top T { 'x'+ }", options)
```

--------------------------------

### Test Parser Output with testTree and fileTests

Source: https://context7.com/lezer-parser/generator/llms.txt

Utilities for verifying parser output against expected tree structures using compact s-expression notation or .txt case files.

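To make the `.txt` case layout concrete (a `# name` line, the input, then `==>` followed by the expected tree), here is a minimal sketch that splits a case file into its parts. `TestCase` and `splitCases` are hypothetical names introduced only for illustration; this is not how `fileTests` is implemented, and it assumes no input line starts with `# ` and ignores any per-test configuration.

```typescript
// Hypothetical helper (not part of @lezer/generator): splits case-file text
// into { name, input, expected } records.
interface TestCase {
  name: string
  input: string
  expected: string
}

function splitCases(source: string): TestCase[] {
  const cases: TestCase[] = []
  // Assumption: every case starts at a line beginning with "# "
  for (const chunk of source.split(/^# /m).slice(1)) {
    const newline = chunk.indexOf("\n")
    const arrow = chunk.indexOf("==>")
    if (newline < 0 || arrow < newline) continue // skip malformed chunks
    cases.push({
      name: chunk.slice(0, newline).trim(),
      input: chunk.slice(newline + 1, arrow).trim(),
      expected: chunk.slice(arrow + 3).trim(),
    })
  }
  return cases
}

const sample = `# Skips whitespace
x   x

==> T(X, X)

# Token precedence
ABBAB
==> T(A, BB, A, B)
`

const parsed = splitCases(sample)
console.log(parsed.map(c => `${c.name}: ${c.input} -> ${c.expected}`).join("\n"))
```

For real test suites, the `fileTests` utility from `@lezer/generator/test` is the better choice, since it also runs the parser and compares trees.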
```typescript
import { buildParser } from "@lezer/generator"
import { testTree, fileTests } from "@lezer/generator/test"
import { readFileSync } from "fs"

// A and B are defined once, as named tokens
const parser = buildParser(`
@top T { (A | B)+ }

@tokens {
  A { "a" }
  B { "b" }
}
`)

// testTree(tree, expectation, mayIgnore?)
// - Node names must match exactly
// - ⚠ matches an error node
// - (...) lists expected children
// - NodeName[prop=value] checks NodeProp values
// - NodeName(...) is a wildcard child list
const tree = parser.parse("aab")
testTree(tree, "T(A, A, B)") // passes
testTree(tree, "T(A, ...)")  // passes: wildcard children

// fileTests parses the .txt case format:
//   # Test name
//   input text
//   ==> ExpectedTree(...)
const caseFile = readFileSync("test/cases/BinaryExpr.txt", "utf8")
const grammarSrc = caseFile.split(/\n# /)[0] // grammar is before first test case
const casesSrc = caseFile.slice(grammarSrc.length)
const binaryParser = buildParser(grammarSrc)

for (const { name, run } of fileTests(casesSrc, "BinaryExpr.txt")) {
  try {
    run(binaryParser)
    console.log(`PASS: ${name}`)
  } catch (e) {
    console.error(`FAIL: ${name}: ${(e as Error).message}`)
  }
}
// PASS: Parenthesized
// PASS: Associativity
// PASS: Precedence
// PASS: Mixed precedence
// ...
```

--------------------------------

### Skip Simple Whitespace

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/SkipExpr.txt

Demonstrates parsing a sequence of 'foo' tokens separated by simple whitespace, which is ignored by the skip expression.

```lezer
# Can skip the simple part of the skip expression

foo foo

==> T(Foo,Foo)
```

--------------------------------

### Token Calls: Chained Token Definitions

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/TokenArgs.txt

Illustrates how tokens can call other tokens, creating a chain of definitions. The `Foo` token calls `bar`, which in turn calls `baz`.

```lezer
!foo

==> T(Foo)
```

--------------------------------

### buildParserFile(text, options?)

Source: https://context7.com/lezer-parser/generator/llms.txt

Compiles a grammar string into JavaScript (or TypeScript) source code suitable for bundling. Returns an object with two properties: `parser` (the main module exporting the `LRParser` instance) and `terms` (a companion module exporting numeric constants for every named terminal).

````APIDOC
## `buildParserFile(text, options?)`: Generate parser source code files

```typescript
import { buildParserFile } from "@lezer/generator"
import { writeFileSync } from "fs"

const grammar = `
@precedence { mult @left, add @left }

@top Expr { expr }

expr { Atom | BinaryExpr }

BinaryExpr {
  expr !mult MultOp expr |
  expr !add AddOp expr
}

Atom { Number | "(" expr ")" }

@tokens {
  MultOp { "*" | "/" }
  AddOp { "+" | "-" }
  Number { @digit+ }
}
`

// Generate ES module output (default)
const { parser, terms } = buildParserFile(grammar, {
  moduleStyle: "es",       // "es" (default) or "cjs"
  exportName: "parser",    // name of the exported parser variable (default: "parser")
  includeNames: false,     // embed term name strings for debugging
  typeScript: false,       // emit TypeScript (.ts) output instead of JS
  fileName: "expr.grammar" // used in error messages
})

writeFileSync("expr.parser.js", parser)
writeFileSync("expr.terms.js", terms)

// expr.parser.js will look like:
// import {LRParser} from "@lezer/lr"
// export const parser = LRParser.deserialize({ version: ..., states: "...", ... })

// expr.terms.js will look like:
// export const
//   MultOp = 1,
//   AddOp = 2,
//   Number = 3,
//   BinaryExpr = 4,
//   Atom = 5,
//   Expr = 6

// Generate CommonJS output
const { parser: cjsParser } = buildParserFile(grammar, { moduleStyle: "cjs" })
writeFileSync("expr.parser.cjs", cjsParser)
// => const {LRParser} = require("@lezer/lr")
//    exports.parser = LRParser.deserialize({ ... })
```
````

--------------------------------

### Parse Mixed Precedence

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/BinaryExpr.txt

Demonstrates parsing an expression that mixes multiplicative (`*`, `/`) and additive (`+`) operators, respecting their precedence and associativity rules.

```lezer
x*x+y/y

==> T(BinaryExpr(BinaryExpr(Symbol,MultOp,Symbol),AddOp,BinaryExpr(Symbol,MultOp,Symbol)))
```

--------------------------------

### Generate Parser Source Files with buildParserFile

Source: https://context7.com/lezer-parser/generator/llms.txt

Compiles a grammar string into JavaScript or TypeScript source code for bundling. Returns an object with `parser` and `terms` modules. Options control module style, export name, name inclusion, and TypeScript output.

```typescript
import { buildParserFile } from "@lezer/generator"
import { writeFileSync } from "fs"

const grammar = `
@precedence { mult @left, add @left }

@top Expr { expr }

expr { Atom | BinaryExpr }

BinaryExpr {
  expr !mult MultOp expr |
  expr !add AddOp expr
}

Atom { Number | "(" expr ")" }

@tokens {
  MultOp { "*" | "/" }
  AddOp { "+" | "-" }
  Number { @digit+ }
}
`

// Generate ES module output (default)
const { parser, terms } = buildParserFile(grammar, {
  moduleStyle: "es",       // "es" (default) or "cjs"
  exportName: "parser",    // name of the exported parser variable (default: "parser")
  includeNames: false,     // embed term name strings for debugging
  typeScript: false,       // emit TypeScript (.ts) output instead of JS
  fileName: "expr.grammar" // used in error messages
})

writeFileSync("expr.parser.js", parser)
writeFileSync("expr.terms.js", terms)

// expr.parser.js will look like:
// import {LRParser} from "@lezer/lr"
// export const parser = LRParser.deserialize({ version: ..., states: "...", ... })

// expr.terms.js will look like:
// export const
//   MultOp = 1,
//   AddOp = 2,
//   Number = 3,
//   BinaryExpr = 4,
//   Atom = 5,
//   Expr = 6
```

```javascript
// Generate CommonJS output
const { parser: cjsParser } = buildParserFile(grammar, { moduleStyle: "cjs" })
writeFileSync("expr.parser.cjs", cjsParser)
// => const {LRParser} = require("@lezer/lr")
//    exports.parser = LRParser.deserialize({ ... })
```

--------------------------------

### Recognize 'a' with Default Top Rule

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/AlternativeTop.txt

Parses the input 'a' using the default top-level rule. The output indicates that rule 'A' was successfully applied.

```lezer
a

==> A(a)
```

--------------------------------

### Sees through rules

Source: https://github.com/lezer-parser/generator/blob/main/test/cases/AutoDelimTag.txt

Shows how delimiter properties are assigned even when tokens are part of a rule like BracketExpr, with explicit opening and closing tokens.

```lezer [|50|] ==> T(BracketExpr(BracketLeft[closedBy="BracketRight"], Number, BracketRight[openedBy="BracketLeft"])) ``` -------------------------------- ### Define Contextual Keywords with @specialize Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Specialize.txt Use @specialize to turn specific values of a token (here, Id) into keyword tokens, and @extend to define contextual keywords that can still be parsed as regular identifiers where the keyword reading does not apply. ```lezer @top T { (statement ";")+ } statement { Decl { kw<"let"> Id "=" Number } | Print { kw<"print"> Number } | Exit { kw<"exit"> } | Async { kwExt<"async"> Number } | Id } kw<word> { @specialize<Id, word> } kwExt<word> { @extend<Id, word> } @skip { whitespace } @tokens { whitespace { @whitespace+ } Id { @asciiLetter+ } Number { @digit+ } } ``` -------------------------------- ### Define Alternative Top Rules A and B Source: https://github.com/lezer-parser/generator/blob/main/test/cases/AlternativeTop.txt Defines two distinct top-level rules, 'A' and 'B', each recognizing a specific string. These can be used as alternative entry points for parsing. ```lezer @top A { "a" } @top B { "b" } @tokens { "a" "b" } ``` -------------------------------- ### Handling Multiple Skipped Comments Source: https://github.com/lezer-parser/generator/blob/main/test/cases/MoveSkip.txt Demonstrates the parser's ability to handle multiple consecutive skipped tokens (LineComment) within the grammar structure. ```lezer # Can handle multiple comments if a // comment // comment ==> T(IfStatement(IfClause), LineComment, LineComment) ``` -------------------------------- ### Recognizes named literals Source: https://github.com/lezer-parser/generator/blob/main/test/cases/AutoDelimTag.txt Demonstrates the parser's ability to recognize named literals, like DoubleLeft and DoubleRight, and assign delimiter properties accordingly. 
```lezer [[5]] ==> T(DoubleExpr(DoubleLeft[closedBy="DoubleRight"], Number, DoubleRight[openedBy="DoubleLeft"])) ``` -------------------------------- ### Error Case: No Dialect Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Dialect.txt Demonstrates an error case when parsing 'ca' without a specified dialect, resulting in 'T(C, ⚠)'. ```lezer # No dialect, error ca ==> T(C, ⚠) ``` -------------------------------- ### Parse Left Associativity Source: https://github.com/lezer-parser/generator/blob/main/test/cases/BinaryExpr.txt Illustrates how the parser handles left associativity for addition operators. Consecutive additions are grouped correctly from left to right. ```lezer x+x+x+x ==> T(BinaryExpr(BinaryExpr(BinaryExpr(Symbol,AddOp,Symbol),AddOp,Symbol),AddOp,Symbol)) ``` -------------------------------- ### Recognize 'b' with Explicit Top Rule 'B' Source: https://github.com/lezer-parser/generator/blob/main/test/cases/AlternativeTop.txt Parses the input 'b' by explicitly specifying 'B' as the top-level rule. This demonstrates using an alternative top-level rule to parse specific input. ```lezer b ==> B(b) { "top": "B" } ``` -------------------------------- ### Recognize 'a' with Explicit Top Rule 'A' Source: https://github.com/lezer-parser/generator/blob/main/test/cases/AlternativeTop.txt Parses the input 'a' by explicitly specifying 'A' as the top-level rule. This achieves the same result as the default top-level recognition. ```lezer a ==> A(a) { "top": "A" } ``` -------------------------------- ### Rollup Plugin for Lezer Grammar Compilation Source: https://context7.com/lezer-parser/generator/llms.txt Integrates lezer-generator into Rollup build process. Import .grammar files directly; the plugin handles compilation. Import .grammar.terms for term constants. 
```javascript // rollup.config.js import { lezer } from "@lezer/generator/rollup" export default { input: "./src/index.js", output: { file: "dist/index.js", format: "es" }, plugins: [ lezer({ // Optional: override the exported parser name (default: "parser") exportName: "myParser" }) ] } // src/index.js: import the grammar directly; Rollup resolves it at build time. // The parser export is named after the exportName option set above. import { myParser } from "./my-lang.grammar" import { Identifier, Number, BinaryExpr } from "./my-lang.grammar.terms" export { myParser, Identifier, Number, BinaryExpr } ``` -------------------------------- ### Parse Input with LR(1) Grammar Source: https://github.com/lezer-parser/generator/blob/main/test/cases/NotLALR.txt Demonstrates parsing an input string 'aeb' using the defined LR(1) grammar. The parser correctly identifies the structure as T(F). ```lezer # Can parse aeb ==> T(F) ``` -------------------------------- ### Skip Whitespace and Comments Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Whitespace.txt Configures the parser to skip both 'whitespace' and 'Comment' tokens. This ensures these are ignored during parsing. ```lezer @skip { whitespace | Comment } ``` -------------------------------- ### Define Default Skip Set Source: https://github.com/lezer-parser/generator/blob/main/test/cases/InconsistentSkip.txt Specifies a default skip set consisting of the 'space' token. This is typically used for whitespace. ```lezer @skip { space } ``` -------------------------------- ### Configure Rollup Plugin for Lezer Grammars Source: https://github.com/lezer-parser/generator/blob/main/README.md Use this Rollup plugin to automatically transform .grammar and .grammar.terms files during the build process. It integrates seamlessly with Rollup's build pipeline. 
```javascript import {lezer} from "@lezer/generator/rollup" export default { input: "./in.js", output: {file: "out.js", format: "cjs"}, plugins: [lezer()] } ``` -------------------------------- ### Define Scoped Skip Rules in Lezer Source: https://github.com/lezer-parser/generator/blob/main/test/cases/ScopedSkip.txt Defines grammar rules, including a top-level rule, a general skip rule for spaces, and a scoped skip rule for dashes within a specific context. This allows for more granular control over whitespace and ignored characters. ```lezer @top T { A+ } @skip { spaces } @skip { dashes } { A { "x" | "(" "b" ")" } } @tokens { spaces { " "+ } dashes { "-"+ } } ``` -------------------------------- ### Define Skip Expression with Simple Content Source: https://github.com/lezer-parser/generator/blob/main/test/cases/SkipExpr.txt Defines a grammar where spaces and angle-bracketed content are skipped. This is useful for ignoring whitespace or comments. ```lezer @top T { Foo { "foo" }+ } @skip { " " | "<" skipContent* ">" } skipContent { A { "a" } | B { "b" } } ``` -------------------------------- ### Skipped Tokens Within Optional Content Source: https://github.com/lezer-parser/generator/blob/main/test/cases/MoveSkip.txt Shows how a skipped token (LineComment) is correctly placed inside the IfStatement node when it appears before an ElseClause that is present. ```lezer # Puts them inside if the node continues if a // comment else b ==> T(IfStatement(IfClause, LineComment, ElseClause)) ``` -------------------------------- ### Define Tokens Source: https://github.com/lezer-parser/generator/blob/main/test/cases/InconsistentSkip.txt Defines the 'space' token, which consists of one or more space characters. 
```lezer @tokens { space { " "+ } } ``` -------------------------------- ### Define Tokens with Dialects Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Dialect.txt Defines tokens 'a' and 'b', specifying that 'a' belongs to dialect 'a' and 'b' belongs to dialect 'b'. ```lezer @tokens { "a"[@dialect=a,@name=] "b"[@dialect=b,@name=] } ``` -------------------------------- ### Define Token Precedence in Lezer Source: https://github.com/lezer-parser/generator/blob/main/test/cases/PrecedenceOrder.txt Use the `@precedence` keyword to define the order in which tokens are resolved when they have overlapping definitions. Tokens listed earlier in a `@precedence` block are preferred over those listed later. ```lezer @tokens { space { " "+ } Tag { "<" "<"* @asciiLetter+ } @precedence { Tag, "<<" } @precedence { Tag, "<" } @precedence { Tag, "<<<" } "<" "<<" "<<<" } ``` -------------------------------- ### Parse Parenthesized Expression Source: https://github.com/lezer-parser/generator/blob/main/test/cases/BinaryExpr.txt Demonstrates parsing a simple parenthesized expression involving addition and division. The parentheses force the addition to be grouped first, overriding the default operator precedence. ```lezer (x+y)/x ==> T(BinaryExpr(BinaryExpr(Symbol,AddOp,Symbol),MultOp,Symbol)) ``` -------------------------------- ### Error Case: Dialect A Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Dialect.txt Shows an error when parsing 'cab' with dialect 'a' specified, resulting in 'T(C, A, ⚠)'. ```lezer # Dialect A, error {"dialect": "a"} cab ==> T(C, A, ⚠) ``` -------------------------------- ### Define Token Groups and Precedence Source: https://github.com/lezer-parser/generator/blob/main/test/cases/TokenGroups.txt Defines token groups, precedence rules, and skip rules for a Lezer parser. Use this to structure your grammar. 
```lezer @precedence { div @left } @top T { expr } expr { RegExp | Symbol | BinOp } BinOp { expr !div "/" expr } @skip { whitespace } @tokens { whitespace { @whitespace+ } Symbol { @asciiLetter+ } RegExp { "/" ![/]+ "/" } } ``` -------------------------------- ### Parse with Both Dialects Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Dialect.txt Parses the input 'abc' with both dialects 'a' and 'b' specified, resulting in 'T(A, B, C)'. ```lezer # Both dialects {"dialect": "a b"} abc ==> T(A, B, C) ``` -------------------------------- ### Define Token Expressions Source: https://github.com/lezer-parser/generator/blob/main/test/cases/TokenExpr.txt Define various token types like words, numbers, and operators using Lezer's grammar definition syntax. Ensure whitespace is skipped. ```lezer @top T { (Word | Number | Operator)+ } @skip { whitespace } @tokens { whitespace { @whitespace+ } Word { @asciiLetter (letter | @digit)* } Number { (@digit+ ("." @digit*)? | "." @digit+) (("e" | "E") ("+" | "-")? @digit+)? } Operator { "+" "+"? | "-" } letter { $[a-zA-Z] } } ``` -------------------------------- ### Define Grammar with External Tokens Source: https://github.com/lezer-parser/generator/blob/main/test/cases/ExternalTokens.txt Defines a grammar structure that utilizes tokens imported from an external source. Ensure the external tokens are correctly defined and exported in the specified JavaScript file. ```lezer @top T { expr* } expr { Braced { braceOpen expr braceClose } | X { "x" (Dot Y { "y" })? } } @external tokens ext1 from "./external_tokens.js" { braceOpen, braceClose, Dot } # Uses external tokens x{x}x.y ==> T(X, Braced(X), X(Dot, Y)) ``` -------------------------------- ### Handle Missing Operator Error Source: https://github.com/lezer-parser/generator/blob/main/test/cases/BinaryExpr.txt Demonstrates how the parser detects and represents an error when an operator is missing between two symbols. 
```lezer xy ==> T(Symbol,⚠(Symbol)) ``` -------------------------------- ### Parse with Dialect A Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Dialect.txt Parses the input 'ca' with dialect 'a' specified, resulting in 'T(C, A)'. ```lezer # Dialect A {"dialect": "a"} ca ==> T(C, A) ``` -------------------------------- ### Lezer Grammar Definition with Empty Production Source: https://github.com/lezer-parser/generator/blob/main/test/cases/EmptyAfterLookahead.txt Defines a Lezer grammar with a top-level rule 'Program' that allows zero or more 'GlobalConstantDeclaration' nodes, each followed by a semicolon. It includes token definitions for whitespace, comments, and identifiers, and specifies a skip rule for whitespace and comments. ```lezer // This example caused the moving of nodes for reductions to move a // node across a buffer position stored on the parser stack, causing a // node to fall out of its parent node. @skip { space | lineComment } @top Program { (GlobalConstantDeclaration ";")* } GlobalConstantDeclaration { AttributeList @specialize[@name='Keyword']<Identifier, "const"> Identifier "=" } AttributeList { Attribute { "@" Identifier }* } @tokens { space { std.whitespace+ } lineComment { "//" ![\n\r]* $[ \r]? } Identifier { $[a-zA-Z_] $[0-9a-zA-Z_]* } } ``` -------------------------------- ### Skip Nested Content within Angle Brackets Source: https://github.com/lezer-parser/generator/blob/main/test/cases/SkipExpr.txt Shows how nested content within angle brackets, including skipped elements like 'a' and 'b', is correctly parsed and accounted for. ```lezer # Outputs tags from skipped content foo <aba> foo ==> T(Foo,A,B,A,Foo) ``` -------------------------------- ### Define Token for Any Character Source: https://github.com/lezer-parser/generator/blob/main/test/cases/AnyChar.txt This token definition matches a literal backslash followed by any single character (represented by '_') and then a literal forward slash. It's useful for simple character matching within tokens. 
```lezer @tokens { A { "\\" _ "/" } } ``` -------------------------------- ### Skipped Tokens Before Optional Content Source: https://github.com/lezer-parser/generator/blob/main/test/cases/MoveSkip.txt Illustrates that a skipped token (LineComment) appearing where an optional ElseClause could follow, but does not, is placed outside the IfStatement node rather than inside it. ```lezer @top T { IfStatement+ } IfStatement { IfClause { "if" "a" } ElseClause { "else" "b" }? } @skip { "\n" | " " | LineComment } @tokens { LineComment { "//" ![\n]* } } # Puts skipped tokens before optional content outside of a node if a // comment ==> T(IfStatement(IfClause), LineComment) ``` -------------------------------- ### Non-Match Multiple Characters Source: https://github.com/lezer-parser/generator/blob/main/test/cases/AnyChar.txt Illustrates that the 'A' token is designed to match only a single character between the literal '\' and '/'. Attempting to match multiple characters results in an error, indicated by '⚠'. ```lezer \xy/ ==> T(⚠) ``` -------------------------------- ### Define Token Precedence in Lezer Source: https://github.com/lezer-parser/generator/blob/main/test/cases/TokenPrecedence.txt Use the `@precedence` directive within the `@tokens` block to specify the order in which tokens should be preferred when they can match the same input. Higher precedence tokens are matched before lower precedence ones. ```lezer @top T { (A | B | BB)+ } @tokens { @precedence { BB, B } A { "A" } B { "B" "."? } BB { "BB" } } ``` -------------------------------- ### Define Grammar with Dialects Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Dialect.txt Declares the dialects 'a' and 'b' and defines a top-level rule 'T' that accepts one or more occurrences of 'A', 'B', or 'C'. 'A', 'B', and 'C' are simple rules that each wrap a single literal token. 
```lezer @dialects { a, b } @top T { (A | B | C)+ } A { "a" } B { "b" } C { "c" } ``` -------------------------------- ### Parse Operator Precedence Source: https://github.com/lezer-parser/generator/blob/main/test/cases/BinaryExpr.txt Shows the parser correctly applying precedence rules: in a mixed expression, multiplication binds more tightly than addition and subtraction, so the product is grouped first. ```lezer x+x*x-x ==> T(BinaryExpr(BinaryExpr(Symbol,AddOp,BinaryExpr(Symbol,MultOp,Symbol)),AddOp,Symbol)) ``` -------------------------------- ### Lezer Grammar with @skip Source: https://github.com/lezer-parser/generator/blob/main/test/cases/SkipPosition.txt Defines a simple grammar with rules for statements and variables, and uses the @skip directive to ignore whitespace and comments. This is useful for defining parsers where certain tokens should not be part of the main syntax tree. ```lezer @top T { Statement+ } Statement { Variable ";" | Variable Variable ";" } Variable { identifier } @skip { space | Comment } @tokens { space { @whitespace+ } Comment { "'" ![']* "'" } identifier { "x" } } ``` -------------------------------- ### Assigns delimiter node props Source: https://github.com/lezer-parser/generator/blob/main/test/cases/AutoDelimTag.txt Demonstrates how the parser assigns 'openedBy' and 'closedBy' properties to nodes when delimiters are explicitly defined in the grammar. ```lezer (11) ==> T(ParenExpr("("[closedBy=")"], Number, ")"[openedBy="("])) ``` -------------------------------- ### Parse with Dialect B Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Dialect.txt Parses the input 'bc' with dialect 'b' specified, resulting in 'T(B, C)'. ```lezer # Dialect B {"dialect": "b"} bc ==> T(B, C) ``` -------------------------------- ### Parse without Dialect Source: https://github.com/lezer-parser/generator/blob/main/test/cases/Dialect.txt Parses the input 'cc' with no dialect enabled, resulting in 'T(C, C)'. 
```lezer # No dialect cc ==> T(C, C) ``` -------------------------------- ### Define Tokens Source: https://github.com/lezer-parser/generator/blob/main/test/cases/NodeDeclaration.txt Declares the tokens that can be recognized by the parser, including their names and associated character sequences. ```lezer @tokens { "a"[@name=A] "("[@name=ParenOpen] ")"[@name=ParenClose] } ``` -------------------------------- ### Define Grammar Rules Source: https://github.com/lezer-parser/generator/blob/main/test/cases/NodeDeclaration.txt Defines the top-level document structure and a nested rule for grouping. ```lezer @top Doc { ("a" | "(" B ")")+ } B { "b" } ```
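-------------------------------- ### Sketch: Reading the `input ==> tree` Test Notation

Many snippets in this reference pair an input with an expected tree using `==>`, optionally preceded by a `# name` line. The following is a minimal sketch of how such a case could be split into its parts. `TreeCase` and `splitCase` are hypothetical names introduced here for illustration; they are not part of `@lezer/generator`, and the sketch deliberately ignores trailing option objects such as `{"top": "B"}` and inputs that themselves contain `==>` or begin with `#`.

```typescript
// Hypothetical shape for one test case; not an API of @lezer/generator.
interface TreeCase {
  name: string | null // text of a leading "# ..." line, if present
  input: string       // the text that would be handed to the parser
  expected: string    // the tree notation after "==>"
}

// Assumption: exactly one "==>" separates the input from the expected tree.
function splitCase(text: string): TreeCase {
  const arrow = text.indexOf("==>")
  if (arrow < 0) throw new Error("test case has no ==> separator")

  // Everything before the arrow is an optional name line plus the input.
  const lines = text.slice(0, arrow).trim().split("\n")
  let name: string | null = null
  if (lines[0].startsWith("#")) {
    // Drop the "#" marker and surrounding whitespace from the name line.
    name = lines.shift()!.slice(1).trim()
  }

  return {
    name,
    input: lines.join("\n").trim(),
    expected: text.slice(arrow + 3).trim(),
  }
}

const example = splitCase("# Adds a document node\na\n==> Doc(A)")
console.log(example.name, "|", example.input, "|", example.expected)
```

Keeping the expected tree as a plain string keeps the sketch small; comparing it against a real parse result would additionally require building the parser and normalizing whitespace in both tree strings.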