Published 2006-12-12 10:16:00

I've been spending far too much time looking at the autocompletion in leds for D. The justification, while growing weaker the more time I spend on it, remains the same: autocompletion and AutoHelp save a huge amount of time when writing code.

The PHP parser in leds is working quite well, is organically written, and has room to grow; the next major step is guessing object types by matching method calls on unknown objects. But since I now spend about the same amount of time coding in D as I do in PHP, I decided to look at the D parser and push that along.

Antonio had written the current parser, which is in the dantfw project. It aimed to parse all C-like languages (C, D, Java, C# etc.). This meant that rather than following the D specification, the tokenizer and parser were quite generic, which led to two issues.

  • When I added autocompletion as you type, rather than on demand (ctrl space), I noticed that it was attempting to autocomplete when I was typing "strings" and comments.
  • Quite often it appeared to be missing some key variables or methods that it should have known about.


The first issue I resolved by rewriting the tokenizer (following the D specification closely), and re-tokenizing the document as you type (if it thinks your scope may have changed), then deducing the scope being input.
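
To make the "deducing the scope being input" step concrete, here is a minimal sketch in D. It is not the leds tokenizer: the names `Context` and `contextAt` are made up for illustration, and it only covers a subset of D's lexical rules (quoted and backtick strings, character literals, and the three comment styles). It scans from the start of the text to the cursor and reports whether the cursor sits in code, in a string, or in a comment, so autocompletion can be suppressed in the last two cases.

```d
import std.stdio : writeln;

/// Lexical context at a cursor position (hypothetical names, not from leds).
enum Context { code, stringLiteral, comment }

/// Scan `src` up to `cursor` and report what the cursor is inside.
/// Simplified: handles "..." and '...' (with backslash escapes), `...`,
/// // line comments, /* */ block comments and nesting /+ +/ comments.
Context contextAt(string src, size_t cursor)
{
    size_t i = 0;
    immutable end = cursor < src.length ? cursor : src.length;

    // True when the two characters at i are a and b, both before the cursor.
    bool pair(char a, char b)
    {
        return i + 1 < end && src[i] == a && src[i + 1] == b;
    }

    while (i < end)
    {
        if (pair('/', '/'))
        {
            i += 2;
            while (i < end && src[i] != '\n') i++;
            if (i >= end) return Context.comment;
        }
        else if (pair('/', '*'))
        {
            i += 2;
            while (i < end && !pair('*', '/')) i++;
            if (i >= end) return Context.comment;
            i += 2;                               // step over the closing */
        }
        else if (pair('/', '+'))
        {
            int depth = 1;                        // /+ +/ comments nest in D
            i += 2;
            while (i < end && depth > 0)
            {
                if (pair('/', '+'))      { depth++; i += 2; }
                else if (pair('+', '/')) { depth--; i += 2; }
                else i++;
            }
            if (depth > 0) return Context.comment;
        }
        else if (src[i] == '"' || src[i] == '\'')
        {
            immutable quote = src[i++];
            while (i < end && src[i] != quote)
                i += (src[i] == '\\') ? 2 : 1;    // skip escaped characters
            if (i >= end) return Context.stringLiteral;
            i++;                                  // step over the closing quote
        }
        else if (src[i] == '`')
        {
            i++;                                  // backtick strings have no escapes
            while (i < end && src[i] != '`') i++;
            if (i >= end) return Context.stringLiteral;
            i++;
        }
        else
        {
            i++;
        }
    }
    return Context.code;
}

void main()
{
    string src = "int x = 1; // note\nwriteln(\"hi\"); foo";
    assert(contextAt(src, 4) == Context.code);            // inside `int x`
    assert(contextAt(src, 15) == Context.comment);        // inside `// note`
    assert(contextAt(src, 29) == Context.stringLiteral);  // inside "hi"
    assert(contextAt(src, src.length) == Context.code);   // after `foo`
    writeln("offer completion at end of input: ",
            contextAt(src, src.length) == Context.code);
}
```

Rescanning from the top on every keystroke is the simplest thing that works; a real editor would presumably cache the state at line starts and only rescan when the scope may have changed.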

The second, I began to conclude, was more related to the generic nature of the parser. It was making many assumptions about the language that turned out to be incorrect in D's context. The only, rather drastic, solution was to rewrite the parser...

So on a rather quiet day I started that thankless task. Using Antonio's original design of having object constructors parse and eat their own tokens, I set about writing loads of switch/case combos.

My first effort worked reasonably well, but the more I wrote, the more I began to realize that both the complexity and the use of a series of tokens had a number of flaws. The declaration code for parsing methods and variables was quite large, and very similar (resulting in a lot of redundant, duplicate code). Declaring multiple variables in one line also made the resulting abstract syntax tree objects quite clunky.

At this point I put it on hold for a while, partly to consider alternative approaches, and more to ensure paid work was not falling behind.

When I returned, I had come up with a few ideas:

  • look at dmd for inspiration
  • check dsource to see if somebody had already done this
  • consider preparsing tokens into a tree before sending them to the parser.

From studying the dmd parser, I realized that Walter had gone for a parser object in C++ (rather than our constructor parsing) and broken the different scopes into different parsing routines. When the current parsing routine hits a syntax pattern that matches, it calls the parsing routine that deals with that scope, which in turn adds a new object to the AST stack. He also appeared to skip all comments, whitespace and EOLs in the token fetching routine (actually merging the comment block into the next token, although it's a touch more complex than that).
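
The token fetching idea can be sketched in a few lines of D. This is not dmd's code, only an illustration of the behaviour described above; `Kind`, `Token`, `TokenStream` and `leadingComment` are hypothetical names. The fetch routine drops whitespace and EOL tokens, and attaches any comment text to the next significant token, so the parsing routines never see either.

```d
import std.stdio : writeln;

enum Kind { word, comment, whitespace, eol }   // hypothetical token kinds

struct Token
{
    Kind kind;
    string text;
    string leadingComment;   // comment text merged in from preceding tokens
}

/// Wraps a raw token list and hands the parser only significant tokens.
struct TokenStream
{
    Token[] raw;
    size_t pos;

    /// Returns false at end of input; otherwise fills `tok` with the next
    /// significant token, carrying any comments that came before it.
    bool next(out Token tok)
    {
        string pending;
        while (pos < raw.length)
        {
            auto t = raw[pos++];
            final switch (t.kind)
            {
                case Kind.whitespace, Kind.eol:
                    break;                   // never shown to the parser
                case Kind.comment:
                    pending ~= t.text;       // keep for the next real token
                    break;
                case Kind.word:
                    t.leadingComment = pending;
                    tok = t;
                    return true;
            }
        }
        return false;
    }
}

void main()
{
    auto stream = TokenStream([
        Token(Kind.comment, "/// entry point"),
        Token(Kind.eol, "\n"),
        Token(Kind.word, "void"),
        Token(Kind.whitespace, " "),
        Token(Kind.word, "main"),
    ]);

    Token tok;
    bool ok = stream.next(tok);
    assert(ok && tok.text == "void" && tok.leadingComment == "/// entry point");
    ok = stream.next(tok);
    assert(ok && tok.text == "main" && tok.leadingComment == "");
    ok = stream.next(tok);
    assert(!ok);                             // input exhausted
    writeln("last significant token: ", tok.text.length ? tok.text : "(none)");
}
```

Keeping the comment attached to the following token is what would let something like AutoHelp show documentation for a declaration without the parser having to deal with comments at all.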

On dsource I found codeanalyser, which, after a bit of hacking to remove what appeared to be the C++ memory stack allocation routines, I finally got to compile (without crashing dmd), parse and output what looked a little bit like syntax trees.

The downside to this project was that extracting the detailed information I required (type definitions, method declarations, line numbers etc.) was not really feasible, and the tree produced by the code included a significant amount of noise (it frequently created tree nodes for failed syntax matches). Along with this, it would require some work to store the token data within the tree that was created.

On the upside, in debugging the compile issues I began to get a better understanding of templates (one of those black magic features of D). I wonder if the introduction to templates in D should basically start with the statement "D does not have preprocessor macros, it has templates", as while templates are considerably more powerful than macros, from a comprehension point of view that is effectively what they are. And given D's decision not to have macros, on what appears to be the sensible view that they obfuscate the code (along with often making it a nightmare to compile), I consider templates to be on the list of features to be used with extreme caution, as they have a similar obfuscating effect, although perhaps not to the same degree...

Anyway, the last option was to chuck my first draft and use the idea of building a token tree and passing that to the parser, rather than just giving it a series of tokens. Having got about half as far as I did before, I consider this one of those little gems of an idea, one that makes writing the parser considerably simpler.

Consider the simple statement
void main(char[][] argv) { }
Which would tokenize into the following.
Token.T_void : void
Token.IDENTIFIER main
'('
Token.T_char char;
'['
']'
'['
']'
Token.IDENTIFIER argv
')'
'{'
'}'

Using a post-tokenizing tree building routine, it now looks like this, which from the perspective of the parser is considerably simpler to deal with.

Token.T_void : void
Token.IDENTIFIER main
'('
    Token.T_char char;
    '['
        ']'
    '['
        ']'
    Token.IDENTIFIER argv
    ')'
'{'
    '}'

From a pattern perspective, we are just looking for
BaseType, IDENTIFIER '(' '{' == a method declaration.
(although it's a bit more complex than that in real life!)
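
As a rough illustration of that pattern check, here is a minimal sketch in D. The `Kind` enum, the `TreeToken` struct and `isMethodDecl` are hypothetical names, not the leds classes, and the tree is built by hand (with the brackets inside the argument list left out for brevity). The point is that the match only looks at four top-level tokens and never has to walk into the argument list or the body.

```d
import std.stdio : writeln;

/// Hypothetical token kinds; a real token set is much larger.
enum Kind { baseType, identifier, parenGroup, braceGroup, other }

struct TreeToken
{
    Kind kind;
    string text;
    TreeToken[] children;   // tokens collapsed into a '(' or '{' group
}

/// True when the four tokens starting at `at` look like a method declaration:
/// BaseType, IDENTIFIER, '(' group, '{' group.
bool isMethodDecl(const TreeToken[] toks, size_t at = 0)
{
    static immutable Kind[] pattern =
        [Kind.baseType, Kind.identifier, Kind.parenGroup, Kind.braceGroup];

    if (at + pattern.length > toks.length)
        return false;
    foreach (i, kind; pattern)
        if (toks[at + i].kind != kind)
            return false;
    return true;
}

void main()
{
    // Hand-built top level for: void main(char[][] argv) { }
    auto decl = [
        TreeToken(Kind.baseType, "void"),
        TreeToken(Kind.identifier, "main"),
        TreeToken(Kind.parenGroup, "(", [
            TreeToken(Kind.baseType, "char"),
            TreeToken(Kind.identifier, "argv"),
        ]),
        TreeToken(Kind.braceGroup, "{"),
    ];

    // A plain call such as main(x); has no leading type and no body.
    auto call = [
        TreeToken(Kind.identifier, "main"),
        TreeToken(Kind.parenGroup, "("),
        TreeToken(Kind.other, ";"),
    ];

    assert(isMethodDecl(decl));
    assert(!isMethodDecl(call));
    writeln("decl matches: ", isMethodDecl(decl), ", call matches: ", isMethodDecl(call));
}
```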

The tokens inside a '(', '{' or '[' collapse into that token, so we can either ignore them when they are not needed, or send them as a set to a parsing routine, which doesn't have to keep dealing with nesting or determining the closer.
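
The collapsing step itself can be sketched as a small recursive routine in D. This is only an illustration under simplifying assumptions: tokens are plain strings, and `Node`, `closerFor`, `buildTree` and `dump` are made-up names rather than the leds code. Everything between an opener and its closer (closer included, which is a guess at the layout) becomes a child of the opener.

```d
import std.array : replicate;
import std.stdio : writeln;

/// A token plus the tokens nested inside it (hypothetical type).
class Node
{
    string text;
    Node[] children;
    this(string text) { this.text = text; }
}

/// Returns the closer matching an opener, or null if `tok` is not an opener.
string closerFor(string tok)
{
    switch (tok)
    {
        case "(": return ")";
        case "{": return "}";
        case "[": return "]";
        default:  return null;
    }
}

/// Fold a flat token list into a tree: everything between an opener and its
/// closer (closer included) becomes a child of the opener node.
Node[] buildTree(string[] tokens)
{
    size_t pos = 0;

    Node[] parseUntil(string closer)
    {
        Node[] nodes;
        while (pos < tokens.length)
        {
            auto node = new Node(tokens[pos++]);
            nodes ~= node;
            if (closer !is null && node.text == closer)
                return nodes;                  // closer ends this level
            auto inner = closerFor(node.text);
            if (inner !is null)
                node.children = parseUntil(inner);
        }
        return nodes;                          // end of input (or unclosed bracket)
    }

    return parseUntil(null);
}

/// Print the tree with one indent step per nesting level.
void dump(Node[] nodes, size_t depth = 0)
{
    foreach (n; nodes)
    {
        writeln(replicate("    ", depth), n.text);
        dump(n.children, depth + 1);
    }
}

void main()
{
    auto tokens = ["void", "main", "(", "char", "[", "]", "[", "]",
                   "argv", ")", "{", "}"];
    auto tree = buildTree(tokens);
    assert(tree.length == 4);                  // void, main, '(' and '{'
    assert(tree[2].children.length == 5);      // char, '[', '[', argv, ')'
    dump(tree);
}
```

With the nesting resolved once, up front, each parsing routine can be handed `children` as a self-contained set and simply iterate over it.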

Anyway, the new parser is slowly ticking away. Once it parses correctly, the next step is to work out how to make the resolver cleaner...