Home | Libraries | People | FAQ | More |
This example demonstrates:
The symbol table holds a dictionary of symbols where each symbol is a sequence of characters. The template class, can work efficiently with 8, 16, 32 and even 64 bit characters. Mutable data of type T are associated with each symbol.
Traditionally, symbol table management is maintained separately outside the
BNF grammar through semantic actions. Contrary to standard practice, the
Spirit symbol table class symbols
is a parser. An object of which may be used anywhere in the EBNF grammar
specification. It is an example of a dynamic parser. A dynamic parser is
characterized by its ability to modify its behavior at run time. Initially,
an empty symbols object matches nothing. At any time, symbols may be added
or removed, thus, dynamically altering its behavior.
Each entry in a symbol table may have an associated mutable data slot. In this regard, one can view the symbol table as an associative container (or map) of key-value pairs where the keys are strings.
The symbols class expects one template parameter to specify the data type associated with each symbol: its attribute. There are a couple of namespaces in X3 where you can find various versions of the symbols class for handling different character encoding including ascii, standard, standard_wide, iso8859_1, and unicode. The default symbol parser type in the main x3 namespace is standard.
Here's a parser for roman hundreds (100..900) using the symbol table. Keep in mind that the data associated with each slot is the parser's attribute (which is passed to attached semantic actions).
struct hundreds_ : x3::symbols<unsigned> { hundreds_() { add ("C" , 100) ("CC" , 200) ("CCC" , 300) ("CD" , 400) ("D" , 500) ("DC" , 600) ("DCC" , 700) ("DCCC" , 800) ("CM" , 900) ; } } hundreds;
Here's a parser for roman tens (10..90):
struct tens_ : x3::symbols<unsigned> { tens_() { add ("X" , 10) ("XX" , 20) ("XXX" , 30) ("XL" , 40) ("L" , 50) ("LX" , 60) ("LXX" , 70) ("LXXX" , 80) ("XC" , 90) ; } } tens;
and, finally, for ones (1..9):
struct ones_ : x3::symbols<unsigned> { ones_() { add ("I" , 1) ("II" , 2) ("III" , 3) ("IV" , 4) ("V" , 5) ("VI" , 6) ("VII" , 7) ("VIII" , 8) ("IX" , 9) ; } } ones;
Now we can use hundreds
,
tens
and ones
anywhere in our parser expressions. They are all parsers.
Up until now, we've been inlining our parser expressions, passing them directly
to the phrase_parse
function.
The expression evaluates into a temporary, unnamed parser which is passed
into the phrase_parse
function,
used, and then destroyed. This is fine for small parsers. When the expressions
get complicated, you'd want to break the expressions into smaller easier-to-understand
pieces, name them, and refer to them from other parser expressions by name.
A parser expression can be assigned to what is called a "rule". There are various ways to declare rules. The simplest form is:
rule<ID> const r = "some-name";
At the very least, the rule needs an identification tag. This ID can be any struct or class type and need not be defined. Forward declaration would suffice. In subsequent tutorials, we will see that the rule ID can have additional functionalities for error handling and annotation.
The name is optional, but is useful for debugging and error handling, as
we'll see later. Notice that rule r
is declared const
. Rules are
immutable and are best declared as const
.
Rules are lightweight and can be passed around by value. Its only member
variable is a std::string
: its name.
Note | |
---|---|
Unlike Qi (Spirit V2), X3 rules can be used with both |
For our next example, there's one more rule form you should know about:
rule<ID, Attribute> const r = "some-name";
The Attribute parameter specifies the attribute type of the rule. You've
seen that our parsers can have an attribute. Recall that the double_
parser has an attribute of double
. To be precise, these are synthesized
attributes. The parser "synthesizes" the attribute value. If the
parser is a function, think of them as function return values.
After having declared a rule, you need a definition for the rule. Example:
auto const r_def = double_ >> *(',' >> double_);
By convention, rule definitions have a _def suffix. Like rules, rule definitions
are immutable and are best declared as const
.
Now that we have a rule and its definition, we tie the rule with a rule definition
using the BOOST_SPIRIT_DEFINE
macro:
BOOST_SPIRIT_DEFINE(r);
Behind the scenes, what's actually happening is that we are defining a parse_rule
function in the client namespace
that tells X3 how to invoke the rule. For example, given a rule named my_rule
and a corresponding definition
named my_rule_def
, BOOST_SPIRIT_DEFINE(my_rule)
expands to this code:
template <typename Iterator, typename Context> inline bool parse_rule( decltype(my_rule) , Iterator& first, Iterator const& last , Context const& context, decltype(my_rule)::attribute_type& attr) { using boost::spirit::x3::unused; static auto const def_ = my_rule_def; return def_.parse(first, last, context, unused, attr); }
And so for each rule defined using BOOST_SPIRIT_DEFINE
,
there is an overloaded parse_rule
function. At parse time, Spirit X3 recursively calls the appropriate parse_rule
function.
Note | |
---|---|
|
Unlike Qi (Spirit V2), X3 discards the notion of a grammar as a concrete entity for encapsulating rules. In X3, a grammar is simply a logical group of rules that work together, typically with a single top-level start rule which serves as the main entry point. X3 grammars are grouped using namespaces. The roman numeral grammar is a very nice and simple example of a grammar:
namespace parser { using x3::eps; using x3::lit; using x3::_val; using x3::_attr; using ascii::char_; auto set_zero = [&](auto& ctx){ _val(ctx) = 0; }; auto add1000 = [&](auto& ctx){ _val(ctx) += 1000; }; auto add = [&](auto& ctx){ _val(ctx) += _attr(ctx); }; x3::rule<class roman, unsigned> const roman = "roman"; auto const roman_def = eps [set_zero] >> ( -(+lit('M') [add1000]) >> -hundreds [add] >> -tens [add] >> -ones [add] ) ; BOOST_SPIRIT_DEFINE(roman); }
Things to take notice of:
unsigned
.
_val(ctx)
gets a reference to the rule's synthesized attribute.
_attr(ctx)
gets a reference to the parser's synthesized attribute.
eps
is a special spirit
parser that consumes no input but is always successful. We use it to
initialize the rule's synthesized attribute, to zero before anything
else. The actual parser starts at +lit('M')
, parsing
roman thousands. Using eps
this way is good for doing pre and post initializations.
roman
and the
definition roman_def
are const objects.
class roman
. C++ allows you to declare the
class in the actual template declaration as you can see in the example:
x3::rule<class roman, unsigned> const roman = "roman";
bool r = parse(iter, end, roman, result); if (r && iter == end) { std::cout << "-------------------------\n"; std::cout << "Parsing succeeded\n"; std::cout << "result = " << result << std::endl; std::cout << "-------------------------\n"; } else { std::string rest(iter, end); std::cout << "-------------------------\n"; std::cout << "Parsing failed\n"; std::cout << "stopped at: \": " << rest << "\"\n"; std::cout << "-------------------------\n"; }
roman
is our roman numeral
parser. This time around we are using the no-skipping version of the parse
functions. We do not want to skip any spaces! We are also passing in an attribute,
unsigned result
,
which will receive the parsed value.
The full cpp file for this example can be found here: roman.cpp