I’ve recently started a research project to port a Ruby workflow engine (OpenWFEru) to Python. I thought that if I could write a basic Ruby parser, I could at least generate the skeleton of my project: package, module, and class notations are similar between the two languages, so mapping the Abstract Syntax Tree to actual code would be a breeze.
I did a little Googling to find a Ruby grammar that I could work from, and there seemed to be some activity around using Antlr. Antlr supports generating Python parsers, so I downloaded it along with the Ruby grammar and gave it a shot.
When I ran Antlr against the grammar I got a bunch of warnings about lexical non-determinism. That didn’t surprise me given Ruby’s complexity, but a parser and a lexer were generated and I hoped that they would suffice. However, my hopes were immediately dashed when I fed actual Ruby code into the lexer:
error: exception caught while lexing: unexpected char: ‘#’
It didn’t understand comments! I really only wanted to spend a little time on this portion of the project (what’s commonly referred to as a “spike”), so that’s as far as I took it. The Antlr docs spoke of comment ambiguity in the Python parsers, but I didn’t have time to dig deeper. Along the way I Googled to see how the Python community handles its parser generator needs, and ran across this excellent analysis. YAPPS seemed to be the clear winner in that analysis, but to use a different parser generator I would probably need to translate the only Ruby grammar I found into its syntax.
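To illustrate why a bare ‘#’ is trickier than it looks, here is a rough sketch of a hand-written comment stripper in Python. It is not the rule from the Antlr grammar and not something I ran against OpenWFEru; the function name strip_ruby_comments is made up for this example. It assumes only single- and double-quoted strings with backslash escapes, and ignores heredocs, %q literals, regex literals, =begin/=end blocks, and quotes nested inside an interpolation.

```python
def strip_ruby_comments(source: str) -> str:
    """Drop `#` line comments, leaving `#` inside simple quoted strings alone.

    Assumption: only '...' and "..." strings with backslash escapes are
    recognised; heredocs, %q literals, regexes and =begin/=end are not.
    """
    out = []
    quote = None  # quote character of the string we are inside, if any
    i = 0
    while i < len(source):
        ch = source[i]
        if quote:
            out.append(ch)
            if ch == "\\" and i + 1 < len(source):
                # Keep the escaped character verbatim so \" does not close the string.
                out.append(source[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            out.append(ch)
        elif ch == "#":
            # Skip to the end of the line; the newline itself is kept
            # because the next loop iteration appends it.
            while i < len(source) and source[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    ruby = 'x = 1 # set x\nputs "#{x}" # show it\n'
    print(strip_ruby_comments(ruby))
```

The point of the sketch is that the ‘#’ in "#{x}" has to survive while the one after the closing quote starts a comment, which is the sort of context dependence a generated lexer can stumble over.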
So for now I’ll be using the parser generator between my ears and lexing with my eyes. If anyone’s had better success at parsing Ruby, please leave a comment below.