Well, I've postponed it long enough. It's time to look at the production for a number. Normally, this would be part of the grammar that the lexer follows, not the parser. However, I'm just going to drop in these rules and include them as part of the overall parser we are working on. In fact, if you've been using the earlier programs, you are already using the grammar we'll talk about here.

Numbers

I think most of us are now used to numbers, including integers, decimal values, and values in scientific notation. It would be nice to have a set of productions that properly relate to them. Here's my version:
If you try this out on various legal and illegal numbers, I think you'll find it works well. Try and trip it up!

Scanning Text

Let's focus now on the detailed process of analyzing some text. I said, at the outset, that this parser will only process single strings. This simplifies the management of scanning the text somewhat, and allows us to focus on the important details without getting hung up on scanning details. I'll leave more sophisticated text scanning for you.

One thing that should be clear now is that we need to keep track of where we are within the string. This means some kind of "position" needs to be tracked, along with the string itself, to tell us where the next character is. This position needs to be updated as the scanning proceeds. We may also need to keep track of the various values as parsing proceeds, but I'm leaving that as a later concern. For now, let's focus on the scanning process and on simply determining whether or not the mathematical expression is valid.

I'm using a variable called eqpos, which is the position we are currently processing within the string called eq. It always starts out with the value of 1 so that it refers to the first character in the string. As the string is processed by the various productions, eqpos is advanced. If the entire equation matches our production rules, then eqpos will point past the end of the string when everything is done and the final status from Expression will be TRUE (-1). If some failure occurs, then eqpos will point to the first unprocessed character, and that fact can be used to show where the error in the equation is located.

I've neglected much talk about spaces, tabs, and other forms of so-called 'white space.' But it's necessary, in practice, to make some decisions about it. I've already been using SkipSpaces in prior code examples, but I haven't actually shown the code for it yet. For my purposes, the only valid white space is a space character. Anything else, such as a tab, will be rejected as invalid input. However, it would be easy to add that support to the routine. I'll leave that for you, if you are interested. Also, while it's fine for spaces to be present elsewhere, I think it should be forbidden to include spaces in the middle of a number -- numbers should not be broken up with spaces, in other words.

Okay, that said, let's look at SkipSpaces:
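The original listing is not reproduced at this point. As a minimal sketch of a routine that fits the description here, and not necessarily the author's exact code, the following assumes SkipSpaces is a SUB taking eqpos and eq in the same order and with the same types that Match uses. The parameter list and the small test at the top are assumptions made for illustration.

```basic
DECLARE SUB SkipSpaces (eqpos AS INTEGER, eq AS STRING)

' Small demonstration (hypothetical test string, not from the article).
DIM eq AS STRING
DIM eqpos AS INTEGER

LET eq = "   42"
LET eqpos = 1
SkipSpaces eqpos, eq
PRINT eqpos             ' eqpos is now 4, the position of the "4"

' Assumed signature: eqpos is passed by reference (the BASIC default),
' so the caller sees the updated position.
SUB SkipSpaces (eqpos AS INTEGER, eq AS STRING)
    DO WHILE eqpos <= LEN(eq)
        ' Stop at the first character that is not a space.
        ' A tab is deliberately NOT skipped here.
        IF MID$(eq, eqpos, 1) <> " " THEN EXIT DO
        LET eqpos = eqpos + 1
    LOOP
END SUB
```

Supporting tabs would only mean widening the test inside the loop, for example by also accepting CHR$(9).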
This routine advances eqpos until it points at a character in eq that is NOT a space. That's about all.

I've also been using a routine called Match. Let's look at it:

    FUNCTION Match% (charlist AS STRING, eqpos AS INTEGER, eq AS STRING)
        DIM status AS INTEGER    ' starts out as 0 (no match)
        IF eqpos <= LEN(eq) THEN
            ' Is the current character one of those in charlist?
            IF INSTR(charlist, MID$(eq, eqpos, 1)) <> 0 THEN
                LET eqpos = eqpos + 1    ' consume the character
                LET status = -1
            END IF
        ELSE
            LET status = 0    ' already past the end of eq
        END IF
        LET Match = status
    END FUNCTION

That routine tests the character at eqpos against charlist and advances 'eqpos' only if the character is matched. The return status is (-1), if successful, and (0), if not. You can provide more than one character in the string called charlist that is passed to Match. In this way, you can check whether the next character in eq is any one of several different characters.

Well, those are the two helper routines mentioned earlier.

Number Coding

Okay, it's time to get back to writing the code for scanning numbers. Here's everything it takes:
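The original listing is not reproduced at this point, and neither are the number productions it implements. What follows is only a sketch built on Match under an assumed grammar, which may differ from the author's: a number is one or more digits, optionally followed by "." and one or more digits, optionally followed by "E" or "e", an optional sign, and one or more digits. The routine names Number, Exponent-style handling, Digits, and Digit are hypothetical.

```basic
DECLARE FUNCTION Match% (charlist AS STRING, eqpos AS INTEGER, eq AS STRING)
DECLARE FUNCTION Number% (eqpos AS INTEGER, eq AS STRING)
DECLARE FUNCTION Digits% (eqpos AS INTEGER, eq AS STRING)
DECLARE FUNCTION Digit% (eqpos AS INTEGER, eq AS STRING)

DIM eq AS STRING
DIM eqpos AS INTEGER
DIM status AS INTEGER

LET eq = "6.02E+23"
LET eqpos = 1
LET status = Number%(eqpos, eq)
IF status <> 0 AND eqpos > LEN(eq) THEN
    PRINT eq; " is a number"
ELSE
    PRINT eq; " fails at position"; eqpos
END IF

' Match, as given in the article.
FUNCTION Match% (charlist AS STRING, eqpos AS INTEGER, eq AS STRING)
    DIM status AS INTEGER
    IF eqpos <= LEN(eq) THEN
        IF INSTR(charlist, MID$(eq, eqpos, 1)) <> 0 THEN
            LET eqpos = eqpos + 1
            LET status = -1
        END IF
    ELSE
        LET status = 0
    END IF
    LET Match = status
END FUNCTION

' ASSUMED production: number := digits [ "." digits ] [ E [sign] digits ]
FUNCTION Number% (eqpos AS INTEGER, eq AS STRING)
    DIM status AS INTEGER
    DIM ignored AS INTEGER
    LET status = Digits%(eqpos, eq)
    IF status THEN
        ' Optional fraction: a "." must be followed by digits.
        IF Match%(".", eqpos, eq) THEN
            LET status = Digits%(eqpos, eq)
        END IF
    END IF
    IF status THEN
        ' Optional exponent: the sign may be absent, the digits may not.
        IF Match%("Ee", eqpos, eq) THEN
            LET ignored = Match%("+-", eqpos, eq)
            LET status = Digits%(eqpos, eq)
        END IF
    END IF
    LET Number = status
END FUNCTION

' ASSUMED production: digits := digit { digit }
FUNCTION Digits% (eqpos AS INTEGER, eq AS STRING)
    DIM status AS INTEGER
    LET status = Digit%(eqpos, eq)
    IF status THEN
        ' Consume any further digits; Digit% advances eqpos itself.
        DO WHILE Digit%(eqpos, eq)
        LOOP
    END IF
    LET Digits = status
END FUNCTION

' ASSUMED production: digit := "0" | "1" | ... | "9"
FUNCTION Digit% (eqpos AS INTEGER, eq AS STRING)
    DIM status AS INTEGER
    LET status = Match%("0123456789", eqpos, eq)
    LET Digit = status
END FUNCTION
```

Because no routine here calls SkipSpaces, a space inside a number simply stops the scan, which is in keeping with the rule that numbers should not be broken up with spaces.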
The above example code also illustrates a coding style I've used earlier, but want to call out here. If I'm writing code for a FUNCTION, I place the value assignment for that function at the very bottom of the routine. In cases like this, where there is a status value, I create a temporary variable in the routine, assign the status to it, then at the bottom transfer that value to the function's value.

It's not obvious here why I do that, because the above routine could be trivially reduced to a single statement. But I do it for consistency with other routines I'll be writing here where things aren't so trivial -- if I need to change the name of the FUNCTION, I only have one place in the body that I need to edit, and this reduces editing mistakes. Since BASIC will often just create a variable without telling you that you made a typing error, this helps me. In any case, this is my style for these examples. (In fact, this detail has helped me in writing all the earlier examples, where I sometimes needed to just change the name of a routine.)

As I said, you've already been using these routines in the code so far.
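To tie this back to the earlier point that eqpos is left at the first unprocessed character on failure, here is a small sketch of how that position could be used to mark an error. The digits-only loop is a stand-in for the real parser and the test string is made up; only Match is taken from the article.

```basic
DECLARE FUNCTION Match% (charlist AS STRING, eqpos AS INTEGER, eq AS STRING)

DIM eq AS STRING
DIM eqpos AS INTEGER

LET eq = "12a4"         ' hypothetical input with a bad character
LET eqpos = 1

' Stand-in for the parser: consume digits until one fails to match.
DO WHILE Match%("0123456789", eqpos, eq)
LOOP

IF eqpos > LEN(eq) THEN
    PRINT "OK"
ELSE
    ' Show the input with a caret under the first unprocessed character.
    PRINT eq
    PRINT SPACE$(eqpos - 1); "^"
END IF

' Match, as given in the article.
FUNCTION Match% (charlist AS STRING, eqpos AS INTEGER, eq AS STRING)
    DIM status AS INTEGER
    IF eqpos <= LEN(eq) THEN
        IF INSTR(charlist, MID$(eq, eqpos, 1)) <> 0 THEN
            LET eqpos = eqpos + 1
            LET status = -1
        END IF
    ELSE
        LET status = 0
    END IF
    LET Match = status
END FUNCTION
```

With this input the scan stops with eqpos at 3, so the caret lands under the "a".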
Last updated: Saturday, January 14, 2006 00:02