TinyPG doesn't properly parse keyword based grammar #20

Open
jrleek opened this issue May 22, 2015 · 2 comments

jrleek commented May 22, 2015

I could be wrong, but I'm pretty sure this is a bug in TinyPG. It isn't able to properly parse this grammar:

EOF -> @"^\s*$";
[Skip] WHITESPACE -> @"\s+";
LIST -> "LIST";
END -> "END";
IDENTIFIER -> @"[a-zA-Z_][a-zA-Z0-9_]*";
Expr -> LIST IDENTIFIER+ END;
Start -> (Expr)+ EOF;

The resulting parser cannot parse this:

LIST foo BAR Baz END

because it greedily lexes END as an IDENTIFIER, instead of properly as the END keyword.
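The overlap between the two token patterns can be checked outside TinyPG. The following is a minimal Python sketch, not TinyPG code: it only copies the two regular expressions from the grammar above, and the helper name `matches` is made up for illustration.

```python
import re

# The two token patterns from the grammar above, copied as plain regexes.
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
END = re.compile(r"END")

def matches(pattern, word):
    """Return True if the whole word matches the given token pattern."""
    return pattern.fullmatch(word) is not None

# "END" is a valid lexeme for both tokens, so a scanner that is only asked
# about IDENTIFIER has no reason to reject it.
for word in ["foo", "BAR", "END"]:
    print(word, matches(IDENTIFIER, word), matches(END, word))
```

Every word in the sample input, including END, satisfies the IDENTIFIER pattern, which is why the keyword can only win if the scanner is also asked about END at that position.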

@Theoistic

Here is an example from the Simple-CIL-compiler project. The identifier has to match single words except the ones listed, which means you have to build the excluded tokens into the identifier pattern:

IDENTIFIER-> @"[a-zA-Z_][a-zA-Z0-9_]*(?<!(^)(end|else|do|while|for|true|false|return|to|incby|global|or|and|not|write|readnum|readstr|call))(?!\w)";

Hope that helps.
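For the grammar in this issue, the same idea can be tried out in Python. This is a sketch under assumptions: Python's `re` module does not accept the variable-width lookbehind used in the .NET pattern above, so the keyword exclusion is written here as a negative lookahead instead, which is a swapped-in formulation rather than the Simple-CIL-compiler regex. The names `KEYWORDS` and `is_identifier` are hypothetical.

```python
import re

# Keywords of the grammar in this issue (assumption: only LIST and END are reserved).
KEYWORDS = ("LIST", "END")
ALTERNATION = "|".join(KEYWORDS)

# Negative lookahead: refuse to start an identifier where a whole keyword
# stands (keyword followed by a word boundary), otherwise match as usual.
IDENTIFIER = re.compile(r"(?!(?:" + ALTERNATION + r")\b)[a-zA-Z_][a-zA-Z0-9_]*")

def is_identifier(word):
    """Return True if the whole word is an identifier and not a keyword."""
    return IDENTIFIER.fullmatch(word) is not None
```

With this pattern, `is_identifier("foo")` and `is_identifier("ENDING")` hold while `is_identifier("END")` does not, because the word boundary only rejects the exact keyword and not longer words that begin with it.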

ultrasuperpingu commented Jan 25, 2017

First of all, sorry for my English, which is far from perfect. I hope this post is still understandable.

I had the same issue. Because I wasn't trying to match a particular grammar, I simply changed mine as a workaround (here, I would replace the LIST and END tokens with something like [ and ]). But I looked into the code to see why this wasn't working. The cause is the "Partial Context Sensitive/Ambiguous Grammars" feature (take a look at the documentation). The parser asks the scanner to look ahead only for the tokens it expects, but if a rule has OneOrMultiple cardinality (+), the expected-token list does not contain the first terminal tokens of the rules that follow. In this example, that means that while parsing the IDENTIFIER+ rule, the lookahead only looks for IDENTIFIER tokens, so it matches the END token as an identifier and consumes it.

The solution here would be to also provide the first terminals of the next rule(s?) as expected tokens. I will give it a try and, if it works, propose it as a pull request.
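To make the mechanism concrete, here is a rough Python sketch of a scanner that only tries the tokens the parser says it expects. It is not TinyPG's code and not the pull request: `look_ahead`, `parse_identifiers`, `parse_expr` and the `follow_aware` flag are hypothetical names, and the rule that the longest match wins (with ties going to the token listed first) is an assumption of this sketch.

```python
import re

# Token patterns from the grammar in this issue.
PATTERNS = {
    "LIST": re.compile(r"LIST"),
    "END": re.compile(r"END"),
    "IDENTIFIER": re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*"),
}
WHITESPACE = re.compile(r"\s+")

def look_ahead(text, pos, expected):
    """Try only the expected tokens at pos; return (name, lexeme, end) or None."""
    ws = WHITESPACE.match(text, pos)
    if ws:
        pos = ws.end()  # skipped token, as with [Skip] WHITESPACE
    best = None
    for name in expected:
        m = PATTERNS[name].match(text, pos)
        # Assumed policy: longest match wins, ties go to the earlier entry.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group(), m.end())
    return best

def parse_identifiers(text, pos, expected):
    """Consume IDENTIFIER+ and return (identifiers, position after them)."""
    names = []
    while True:
        tok = look_ahead(text, pos, expected)
        if tok is None or tok[0] != "IDENTIFIER":
            return names, pos
        names.append(tok[1])
        pos = tok[2]

def parse_expr(text, follow_aware):
    """Parse LIST IDENTIFIER+ END and return the identifiers."""
    tok = look_ahead(text, 0, ["LIST"])
    if tok is None:
        raise SyntaxError("expected LIST")
    # The proposed change: also expect the first terminal of what follows the loop.
    expected = ["END", "IDENTIFIER"] if follow_aware else ["IDENTIFIER"]
    names, pos = parse_identifiers(text, tok[2], expected)
    if not names or look_ahead(text, pos, ["END"]) is None:
        raise SyntaxError("expected END")
    return names
```

Calling `parse_expr("LIST foo BAR Baz END", follow_aware=False)` raises a `SyntaxError`, because the loop swallows END as an identifier and nothing is left for the END terminal. With `follow_aware=True` the same input yields the three identifiers, since END is among the expected tokens when the loop reaches it.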

Edit: Pull request submitted.
