6. Numbers

Well, I've postponed it long enough. It's time to look at the production for a number.

Normally, this would be the grammar that the lexer would follow, not the parser. However, I'm just going to drop in these rules and include them as part of the overall parser we are working on. In fact, if you've been using the earlier programs, you are already using the grammar we'll talk about here.

Numbers

I think most of us are now used to numbers, including integers, decimal values, and scientific values. It would be nice to have a set of productions that properly relate to them.

Here's my version:

digits := digit digits | <null>
digit := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
mantissa := . digit digits | digit digits . digits | digit digits
scaleid := e | E
scale := scaleid sign digit digits | <null>
sign := + | - | <null>
number := sign mantissa scale

If you try this out on various legal and illegal numbers, I think you'll find it works well. Try and trip it up!
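
To see how the productions fit together, here is one derivation worked by hand, for the string -.5e3. It is only a sketch: each step expands the leftmost unexpanded name using one alternative from the rules above, and the notes in parentheses are mine.

```text
number => sign mantissa scale
       => - mantissa scale               (sign := -)
       => - . digit digits scale         (mantissa := . digit digits)
       => - . 5 digits scale             (digit := 5)
       => - . 5 scale                    (digits := <null>)
       => - . 5 scaleid sign digit digits
       => - . 5 e sign digit digits      (scaleid := e)
       => - . 5 e digit digits           (sign := <null>)
       => - . 5 e 3 digits               (digit := 3)
       => - . 5 e 3                      (digits := <null>)
```

No step leaves anything over, so -.5e3 is a number. A lone "." has no such derivation, because every mantissa alternative requires at least one digit.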

Scanning Text

Let's focus now on the detailed process of analyzing some text. I've said, at the outset, that this parser will only process single strings. This simplifies the management of scanning the text, somewhat, and allows us to focus on the important details without getting overly hung up on scanning details. I'll leave more sophisticated text scanning for you.

One thing that should be clear now is that we need to keep track of where we are within the string. This means some kind of "position" needs to be tracked, along with the string itself, to tell us where the next character is. This position needs to be updated as the scanning proceeds. We may also need to keep track of the various values as parsing proceeds, but I'm leaving that as a later concern. For now, let's focus on the scanning process and simply determining whether or not the mathematical expression is valid.

I'm using a variable called eqpos which is the position within the string called eq that we are currently processing. It always starts out with the value of 1 so that it refers to the first character in the string. As the string is processed by the various productions, eqpos is advanced. If the entire equation matches our production rules, then eqpos will point past the end of the string when everything is done and the final status will be TRUE (-1) from Expression. If some failure occurs, then eqpos will point to the first unprocessed character and that fact can be used to illustrate where the error in the equation is located.
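
As a small illustration of using eqpos to show where scanning stopped, here is a sketch of a helper that prints the string with a caret under the offending character. ShowErrorPos is a hypothetical name of my own, not one of the parser's routines, and the values passed to it are made up for the example rather than produced by a real parse.

```basic
DECLARE SUB ShowErrorPos (eqpos AS INTEGER, eq AS STRING)

' Made-up example: pretend scanning of "12.5e" stopped at position 5,
' the "e" that has no exponent digits after it.
ShowErrorPos 5, "12.5e"

' Hypothetical helper: print the string, then a caret under the
' character that eqpos refers to.
SUB ShowErrorPos (eqpos AS INTEGER, eq AS STRING)

        PRINT eq
        PRINT SPACE$(eqpos - 1); "^"

END SUB
```

If eqpos points past the end of the string, the caret simply lands one column beyond the last character.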

I've neglected much talk about spaces, tabs, and other forms of so-called 'white space.' But it's necessary, in practice, to make some decisions about it. I've already been including, in prior code examples, the use of SkipSpaces. But I haven't actually shown the code for it, just yet. For my purposes, the only valid white space is a space character. Anything else, such as a tab, will be rejected as invalid input. However, it would be easy to add that support in the routine. I'll leave that for you, if you are interested. Also, while it's fine for spaces to be present elsewhere, I think it should be forbidden to include spaces in the middle of a number -- numbers should not be broken up with spaces, in other words.

Okay, that said, let's look at SkipSpaces:

SUB SkipSpaces (eqpos AS INTEGER, eq AS STRING)

        DO WHILE eqpos <= LEN(eq)
            IF MID$(eq, eqpos, 1) <> " " THEN
                EXIT DO
            END IF
            LET eqpos = eqpos + 1
        LOOP

END SUB

This routine advances eqpos until it points at a character in eq that is NOT a space. That's about all.
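
If you do want tabs treated as white space, one possible variation looks like the sketch below. SkipWhite is a hypothetical name, chosen so it doesn't collide with SkipSpaces; the only change is the extra test against CHR$(9), the tab character.

```basic
' Sketch only: a variant of SkipSpaces that also skips tabs.
SUB SkipWhite (eqpos AS INTEGER, eq AS STRING)

    DIM ch AS STRING

        DO WHILE eqpos <= LEN(eq)
            LET ch = MID$(eq, eqpos, 1)
            ' Stop at the first character that is neither space nor tab.
            IF ch <> " " AND ch <> CHR$(9) THEN
                EXIT DO
            END IF
            LET eqpos = eqpos + 1
        LOOP

END SUB
```

The rest of the parser would have to call SkipWhite wherever it now calls SkipSpaces for this to have any effect.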

I've also been using a routine called Match. Let's look at it:

FUNCTION Match% (charlist AS STRING, eqpos AS INTEGER, eq AS STRING)

    DIM status AS INTEGER

        IF eqpos <= LEN(eq) THEN
            IF INSTR(charlist, MID$(eq, eqpos, 1)) <> 0 THEN
                LET eqpos = eqpos + 1
                LET status = -1
            END IF
        ELSE
            LET status = 0
        END IF

    LET Match = status

END FUNCTION

That routine does exactly what I said and updates 'eqpos' if the character is matched. The return status is (-1), if successful, and (0), if not. You can provide more than one character in the string called charlist that is passed to Match. In this way, you can check for several different characters matching the next character in eq.
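
A short usage sketch may help here. It assumes the Match function above is in the same module; the test string and the printed messages are just for illustration.

```basic
DECLARE FUNCTION Match% (charlist AS STRING, eqpos AS INTEGER, eq AS STRING)

DIM eq AS STRING, eqpos AS INTEGER

LET eq = "+7"
LET eqpos = 1

' The first character is "+", which is in the list, so Match
' returns -1 and advances eqpos from 1 to 2.
IF Match("+-", eqpos, eq) THEN
    PRINT "sign matched; eqpos is now"; eqpos
END IF

' The next character is "7", which is not in the list, so Match
' returns 0 and leaves eqpos alone.
IF NOT Match("+-", eqpos, eq) THEN
    PRINT "no sign at position"; eqpos
END IF
```

Because Match returns exactly -1 or 0, the result can be used directly in IF and combined with NOT.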

Well, those are the two helper routines mentioned earlier.

Number Coding?

Okay, it's time to get back to writing the code for scanning numbers. Here's everything it takes:

SUB Digits (eqpos AS INTEGER, eq AS STRING)

        DO WHILE Digit(eqpos, eq)
        LOOP

END SUB

FUNCTION Digit% (eqpos AS INTEGER, eq AS STRING)

    DIM status AS INTEGER

        LET status = Match("0123456789", eqpos, eq)

    LET Digit = status

END FUNCTION

FUNCTION Mantissa% (eqpos AS INTEGER, eq AS STRING)

    DIM status AS INTEGER

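        IF Match(".", eqpos, eq) THEN
            ' Leading point: at least one digit must follow it.
            LET status = Digit(eqpos, eq)
            IF status THEN
                Digits eqpos, eq
            END IF
        ELSEIF Digit(eqpos, eq) THEN
            ' Leading digit: more digits, an optional point, more digits.
            Digits eqpos, eq
            ' The point is optional, so status is TRUE (-1) either way.
            ' Match is called only to advance eqpos past the point.
            LET status = -1 OR Match(".", eqpos, eq)
            Digits eqpos, eq
        ELSE
            LET status = 0
        END IF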

    LET Mantissa = status

END FUNCTION

FUNCTION ScaleID% (eqpos AS INTEGER, eq AS STRING)

    DIM status AS INTEGER

        LET status = Match("eE", eqpos, eq)

    LET ScaleID = status

END FUNCTION

SUB Scale (eqpos AS INTEGER, eq AS STRING)

    DIM status AS INTEGER, savepos AS INTEGER

        LET savepos = eqpos
        IF ScaleID(eqpos, eq) THEN
            Sign eqpos, eq
            LET status = Digit(eqpos, eq)
            IF status THEN
                Digits eqpos, eq
            ELSE
                LET eqpos = savepos
            END IF
        END IF

END SUB

SUB Sign (eqpos AS INTEGER, eq AS STRING)

    DIM status AS INTEGER

        LET status = Match("+-", eqpos, eq)

END SUB

FUNCTION Number% (eqpos AS INTEGER, eq AS STRING)

    DIM status AS INTEGER

        Sign eqpos, eq
        LET status = Mantissa(eqpos, eq)
        IF status THEN
            Scale eqpos, eq
        END IF

    LET Number = status

END FUNCTION

The above example code also illustrates a coding style I've used earlier, but want to call out here. If I'm writing code for a FUNCTION, I place the value assignment for that function at the very bottom of the routine. In cases like this, where there is a status value, I create a temporary variable in the routine, assign the status to it, then at the bottom transfer that value to the function's value. It's not obvious here why I do that, because routines such as Digit and ScaleID could be trivially reduced to a single statement. But I do it for consistency with other routines I'll be writing here where things aren't so trivial -- if I need to change the name of the FUNCTION, I only have one place I need to edit and this reduces editing mistakes. Since BASIC will often just create a variable without telling you that you made a typing error, this helps me. In any case, this is my style for these examples. (In fact, this detail has helped me in writing all the earlier examples, where I sometimes needed to just change the name of a routine.)

As I said, you've already been using these routines in the code, so far.
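
If you'd like to exercise the number routines on their own, a small driver along these lines should do. It is a sketch: TryNumber is a hypothetical name of my own, and it assumes all of the routines from this section, with their DECLARE statements, are present in the same module. A string counts as a number only if Number succeeds and eqpos has moved past the last character.

```basic
DECLARE FUNCTION Number% (eqpos AS INTEGER, eq AS STRING)
DECLARE SUB TryNumber (eq AS STRING)

' These three should be accepted.
TryNumber "42"
TryNumber "-3.14"
TryNumber ".5e-3"

' These three should be rejected.
TryNumber "1e"
TryNumber "."
TryNumber "1 2"

' Hypothetical test helper: scan eq from the start and report the result.
SUB TryNumber (eq AS STRING)

    DIM eqpos AS INTEGER, status AS INTEGER

        LET eqpos = 1
        LET status = Number(eqpos, eq)
        IF status AND eqpos > LEN(eq) THEN
            PRINT eq; " is a number"
        ELSE
            PRINT eq; " is not a number; scanning stopped at position"; eqpos
        END IF

END SUB
```

Note the "1e" case: Number itself returns TRUE there, because Scale backs up over the "e" when no digit follows it, so it is the leftover character that marks the string as invalid.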

Last updated: Saturday, January 14, 2006 00:02