Grammars organize regexes, just like classes organize methods. The following example demonstrates how to parse JSON, a data exchange format already introduced (see ).
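# file lib/JSON/Tiny/Grammar.pm
grammar JSON::Tiny::Grammar {
    # the whole input must be a single object or array
    rule TOP      { ^[ <object> | <array> ]$ }
    rule object   { '{' ~ '}' <pairlist> }
    # zero or more pairs, separated by commas
    rule pairlist { <pair> * % \, }
    rule pair     { <string> ':' <value> }
    # zero or more values, separated by commas
    rule array    { '[' ~ ']' [ <value> * % \, ] }

    # a value is one of several alternatives
    proto token value { <...> };
    token value:sym<number> {
        '-'?                               # optional sign
        [ 0 | <[1..9]> <[0..9]>* ]         # integer part, no leading zeros
        [ \. <[0..9]>+ ]?                  # optional fraction
        [ <[eE]> [\+|\-]? <[0..9]>+ ]?     # optional exponent
    }
    token value:sym<true>   { <sym> };
    token value:sym<false>  { <sym> };
    token value:sym<null>   { <sym> };
    token value:sym<object> { <object> };
    token value:sym<array>  { <array> };
    token value:sym<string> { <string> }

    # a double-quoted string made of plain runs and escape sequences
    token string {
        \" ~ \" [ <str> | \\ <str_escape> ]*
    }
    # a run of characters that are not tab, newline, backslash or quote
    token str {
        [
            <!before \t>
            <!before \n>
            <!before \\>
            <!before \">
            .
        ]+
        # equivalent, shorter form: <-["\\\t\n]>+
    }
    token str_escape {
        <["\\/bfnrt]> | u <xdigit>**4
    }
}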
# test it:
my $tester = '{
    "country": "Austria",
    "cities": [ "Wien", "Salzburg", "Innsbruck" ],
    "population": 8353243
}';
if JSON::Tiny::Grammar.parse($tester) {
    say "It's valid JSON";
} else {
    # TODO: error reporting
    say "Not quite...";
}
A grammar contains various named regexes. Regex names are constructed
the same way as subroutine or method names. While regex names are
completely up to the grammar writer, a rule named TOP
will, by default, be invoked when the .parse() method is
executed on a grammar. The above call to JSON::Tiny::Grammar.parse($tester)
starts by attempting to match the regex named TOP against the string $tester.
In this example, the TOP rule anchors the match to the start and end
of the string, so the whole string has to be in valid JSON format
for the match to succeed. After matching the anchor at the start of the
string, the regex attempts to match either an <object> or an <array>.
Enclosing a regex name in angle brackets causes the regex
engine to attempt to match the regex of that name within the same grammar.
Subsequent matches are straightforward and reflect the structure in
which JSON components can appear.
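A minimal sketch of the same mechanics on a small grammar; the name Digits is hypothetical and used only for illustration. It assumes a current Rakudo, where .parse also requires the match to cover the whole string, .subparse does not, and the :rule named argument selects a starting regex other than TOP.

```raku
# Hypothetical grammar, used only to illustrate .parse and TOP.
grammar Digits {
    token TOP    { ^ <number> $ }   # anchored: the whole string must be digits
    token number { \d+ }
}

# .parse starts at TOP; it returns a Match on success and Nil on failure.
say so Digits.parse('123');                      # True
say so Digits.parse('123abc');                   # False: TOP's $ cannot match

# :rule starts the match at another named regex instead of TOP.
say ~Digits.parse('42', :rule<number>);          # 42

# .subparse does not require the match to reach the end of the string.
say ~Digits.subparse('42abc', :rule<number>);    # 42
```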
Regexes can be recursive. An array contains values, and in turn a
value can be an array. This does not cause an infinite loop as
long as at least one regex per recursive call consumes at least one
character. If a set of regexes were to call each other recursively
without progressing in the string, the recursion could go on infinitely
and never proceed to other parts of the grammar.
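A small sketch of safe recursion, using a hypothetical grammar Nested for balanced parentheses. Each recursive call to group happens only after '(' has been consumed, so every call starts further along the string and the recursion always terminates.

```raku
# Hypothetical grammar for nested parentheses such as "(()(()))".
grammar Nested {
    token TOP   { <group> }
    # group calls itself, but only after consuming '(',
    # so each recursive call starts at a later position.
    token group { '(' <group>* ')' }

    # By contrast, something like  token bad { <bad>? 'x' }  would call
    # itself at the same position forever (left recursion); not used here.
}

say so Nested.parse('(()(()))');   # True
say so Nested.parse('(()');        # False: the outer ')' is missing
```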
The example grammar given above introduces the goal-matching syntax,
which can be presented abstractly as A ~ B C. In the object rule of
JSON::Tiny::Grammar, A is '{', B is '}' and C is <pairlist>. The atom on the left of the tilde (A) is matched
normally, but the atom to the right of the tilde (B) is set as the
goal, and then the final atom (C) is matched. Once the final atom
matches, the regex engine attempts to match the goal (B). This has
the effect of switching the match order of the final two atoms (B and
C), but since Perl knows that the regex engine should be looking for
the goal, a better error message can be given when the goal does not
match. This is very helpful for bracketing constructs, as it keeps the
opening and closing brackets next to each other in the regex source.
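A minimal sketch of goal matching on a hypothetical grammar Paren: '(' ~ ')' <digits> accepts the same strings as '(' <digits> ')'. How a missing goal is reported may depend on the Rakudo version (it may raise an error naming the missing ')' rather than just returning Nil), so the failing call is wrapped in try and only its truthiness is checked.

```raku
# Hypothetical grammar: a number inside parentheses.
grammar Paren {
    # Same language as  '(' <digits> ')' , written with goal matching:
    # '(' is matched, ')' becomes the goal, <digits> is matched,
    # and finally the goal ')' is tried.
    token TOP    { '(' ~ ')' <digits> }
    token digits { \d+ }
}

say so Paren.parse('(42)');    # True

# The goal ')' is missing, so the match cannot succeed; try catches
# any error raised for the unmatched goal.
my $result = try Paren.parse('(42');
say so $result;                # False
```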
Another novelty is the declaration of a proto token:
proto token value { <...> };
token value:sym<number> {
    '-'?
    [ 0 | <[1..9]> <[0..9]>* ]
    [ \. <[0..9]>+ ]?
    [ <[eE]> [\+|\-]? <[0..9]>+ ]?
}
token value:sym<true>  { <sym> };
token value:sym<false> { <sym> };
The proto token syntax indicates that value will be a set of
alternatives instead of a single regex. Each alternative has a name of the
form token value:sym<thing>, which can be read as "alternative of
value with parameter sym set to thing". The body of such an
alternative is a normal regex, in which the call <sym> matches the value
of the parameter, in this example thing.
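A small sketch of a proto token with three alternatives, using a hypothetical grammar Keyword. It writes the proto body as {*}, the form current Raku documentation uses (the listing above writes <...>). Because each alternative calls <sym>, the match for <value> carries a sym capture holding the literal that matched.

```raku
# Hypothetical grammar with a proto token and three alternatives.
grammar Keyword {
    token TOP { <value> }
    proto token value {*}
    token value:sym<true>  { <sym> }   # <sym> matches the literal "true"
    token value:sym<false> { <sym> }
    token value:sym<null>  { <sym> }
}

my $m = Keyword.parse('false');
say ~$m<value>;                  # false
say ~$m<value><sym>;             # false: the sym capture of the chosen alternative
say so Keyword.parse('maybe');   # False: no alternative matches
```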
When the grammar calls <value>, the grammar engine attempts to
match the alternatives in parallel, and the longest match wins. This is
exactly like normal alternation with |, but, as we'll see in the next section,
it has the advantage of being extensible.
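A sketch of "longest match wins", using a hypothetical grammar Op in which the shorter alternative is declared first. .subparse is used because it does not force the match to consume the whole string, so the result shows which alternative was actually chosen. The last two lines contrast | (also longest match) with ||, which tries its branches in order.

```raku
# Hypothetical grammar: the shorter alternative is declared first,
# yet the longest match wins.
grammar Op {
    token TOP { <op> }
    proto token op {*}
    token op:sym<+>  { <sym> }
    token op:sym<++> { <sym> }
}

say ~Op.subparse('++');          # ++

# Ordinary | alternation also picks the longest match ...
say ~('++' ~~ / '+' | '++' /);   # ++
# ... while || tries its branches in order and keeps the first success.
say ~('++' ~~ / '+' || '++' /);  # +
```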