When parsing JavaScript, what determines the meaning of a slash?

JavaScript has a tricky grammar to parse. A forward slash can mean several different things: division operator, regular expression literal, multiline comment introducer, or line comment introducer. The last two are easy to distinguish: if the slash is followed by a star, it starts a multiline comment; if it is followed by another slash, it starts a line comment.
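The two comment cases can be decided from the next character alone, as this small TypeScript sketch shows. `classifySlash` and `SlashKind` are hypothetical names, and the function assumes the index points at the start of a token (not inside a string, regex literal, or comment). The check is safe because a regular expression literal's body cannot begin with `*`, and `//` is never an empty regex; only the last outcome needs more context.

```typescript
// Hypothetical helper: what can a '/' at index `i` start? Assumes `i` is at
// the start of a token, not inside a string, regex literal, or comment.
type SlashKind = "block-comment" | "line-comment" | "division-or-regex";

function classifySlash(src: string, i: number): SlashKind {
  if (src.charAt(i) !== "/") throw new Error(`expected '/' at index ${i}`);
  const next = src.charAt(i + 1);
  if (next === "*") return "block-comment"; // starts a multiline comment
  if (next === "/") return "line-comment";  // runs to the end of the line
  // Anything else is division, '/=', or a regex literal: the next character
  // alone cannot tell these apart.
  return "division-or-regex";
}

console.log(classifySlash("a /* c */", 2)); // block-comment
console.log(classifySlash("// note", 0));   // line-comment
console.log(classifySlash("a / b", 2));     // division-or-regex
```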

But the rules for distinguishing division from a regex literal escape me, and I can't find them in the ECMAScript standard. There, the lexical grammar has two goal symbols, InputElementDiv and InputElementRegExp, depending on what a slash will mean, but nothing explains when to use which.

And of course the dreaded semicolon insertion rules complicate everything.
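A minimal sketch of one common workaround, assuming the "previous significant token" heuristic rather than the mechanism the standard itself uses (the standard ties the choice between InputElementDiv and InputElementRegExp to the syntactic grammar context, which a standalone lexer cannot see). Under this heuristic, a slash is division when it follows an identifier, a literal, `)`, `]`, `}`, `++` or `--`, and starts a regex literal anywhere else, including after keywords such as `return` and `typeof`. The names `tokenize`, `regexAllowed`, `scanRegex`, and the keyword and punctuator tables are hypothetical, and the lexer is deliberately simplified (no template literals, Unicode identifiers, hex or exponent numbers, or line terminators other than `\n`).

```typescript
// Hypothetical sketch: '/' disambiguation via the previous significant token.

interface Token {
  type: "name" | "keyword" | "number" | "string" | "regex" | "punct";
  value: string;
}

// Assumed list: reserved words after which an expression (so a regex) may begin.
const EXPR_KEYWORDS = new Set(["return", "typeof", "instanceof", "in", "new",
  "delete", "void", "throw", "case", "do", "else"]);

// Punctuators that end an operand (a '/' after them divides); `)` and `}` are weak spots.
const OPERAND_ENDERS = new Set([")", "]", "}", "++", "--"]);

// Sorted longest first so that `find` performs longest-match lexing.
const PUNCTUATORS = [
  ">>>=", "===", "!==", "**=", "<<=", ">>=", ">>>", "...", "=>", "==", "!=",
  "<=", ">=", "&&", "||", "??", "**", "++", "--", "+=", "-=", "*=", "/=",
  "%=", "&=", "|=", "^=", "<<", ">>", "{", "}", "(", ")", "[", "]", ";", ",",
  "<", ">", "+", "-", "*", "/", "%", "&", "|", "^", "!", "~", "?", ":", "=", ".",
].sort((a, b) => b.length - a.length);

// True when a '/' after `prev` should start a regex literal.
function regexAllowed(prev: Token | undefined): boolean {
  if (prev === undefined) return true;       // start of input
  if (prev.type === "keyword") return true;  // e.g. `return /x/`
  if (prev.type === "punct") return !OPERAND_ENDERS.has(prev.value);
  return false;                              // a name or literal just ended
}

// Index just past the flags of the regex literal whose '/' is at `start`.
function scanRegex(src: string, start: number): number {
  let inClass = false;                       // inside [...], '/' is literal
  for (let i = start + 1; i < src.length; i++) {
    const c = src.charAt(i);
    if (c === "\n") break;                   // line breaks are not allowed
    if (c === "\\") { i++; continue; }       // skip the escaped character
    if (c === "[") inClass = true;
    else if (c === "]") inClass = false;
    else if (c === "/" && !inClass) {
      let end = i + 1;
      while (end < src.length && /[\w$]/.test(src.charAt(end))) end++; // flags
      return end;
    }
  }
  throw new Error(`unterminated regex literal at ${start}`);
}

function tokenize(src: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < src.length) {
    const ch = src.charAt(i);
    if (/\s/.test(ch)) { i++; continue; }
    const rest = src.slice(i);
    // Comments need no context: a regex body cannot start with '*' or '/'.
    if (rest.startsWith("//")) {
      const nl = src.indexOf("\n", i);
      i = nl === -1 ? src.length : nl;
      continue;
    }
    if (rest.startsWith("/*")) {
      const close = src.indexOf("*/", i + 2);
      if (close === -1) throw new Error(`unterminated comment at ${i}`);
      i = close + 2;
      continue;
    }
    // The ambiguous case. Comments were skipped, so this sees the last real token.
    if (ch === "/" && regexAllowed(tokens[tokens.length - 1])) {
      const end = scanRegex(src, i);
      tokens.push({ type: "regex", value: src.slice(i, end) });
      i = end;
      continue;
    }
    const lexeme = /^(?:[A-Za-z_$][\w$]*|\d+(?:\.\d+)?)/.exec(rest)?.[0];
    if (lexeme !== undefined) {
      const type: Token["type"] =
        /^\d/.test(lexeme) ? "number" : EXPR_KEYWORDS.has(lexeme) ? "keyword" : "name";
      tokens.push({ type, value: lexeme });
      i += lexeme.length;
      continue;
    }
    if (ch === '"' || ch === "'") {
      let j = i + 1;
      while (j < src.length && src.charAt(j) !== ch) j += src.charAt(j) === "\\" ? 2 : 1;
      if (j >= src.length) throw new Error(`unterminated string at ${i}`);
      tokens.push({ type: "string", value: src.slice(i, j + 1) });
      i = j + 1;
      continue;
    }
    const p = PUNCTUATORS.find((q) => rest.startsWith(q));
    if (p === undefined) throw new Error(`unexpected ${JSON.stringify(ch)} at ${i}`);
    tokens.push({ type: "punct", value: p });
    i += p.length;
  }
  return tokens;
}
```

For example, `tokenize("x = /ab+c/g.test(s)")` produces a `regex` token `/ab+c/g`, while in `tokenize("a = b\n/hi/g.exec(c)")` both slashes come out as division. That second input is a shortened form of the example the ECMAScript specification itself gives in its lexical conventions: no semicolon is inserted after `b`, so the code means `a = b / hi / g.exec(c)`, which is what the heuristic produces. The heuristic does misread `if (ok) /re/.test(s)`, where `)` closes a statement's condition, and a regex following a `}` that ends a block; handling those correctly requires feedback from the parser about which goal symbol applies.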

Does anyone have an example of clear JavaScript lexing code that answers this?

asked by Ned Batchelder, 1 April 2011 at 22:38