1. Expressions

To start this process, we could take this from two directions:

start out trying to understand the detailed anatomy of a number and then work out from there to expressions involving combinations of them; or,
start out describing the general form of an expression and leaving the details of worrying about number anatomy until later.

In some sense, this is a kind of bottom-up versus top-down choice. Both will work fine in this case. The definition of number formats is so well worked-out by now so that it integrates smoothly with algebraic expressions that we can be pretty sure we won't need to modify what a number is after we start working on the expressions. So we are safe in choosing to work on the numbers, first. But it would also work about as well to proceed from defining the general anatomy of expressions, first -- similar reasoning suggests that will succeed, too.

I'm going to start with the anatomy of expressions (direction 2, above) and eventually wind up talking about the details of what a number looks like, later on. I'm choosing this approach because most of us already have a fairly concrete idea in our heads about what constitutes a number and we can safely leave the precise details until later and cover it as a "necessary evil," then.

Let's first list some possible expressions. Maybe that will help in getting some initial ideas started.

45
23 + 10
3 * 4
( 6 + 5 ) * 7
2 * -8
6 + ( 4 - (2) ) * -3
(-6)

We are probably certain we'd like to accept addition and subtraction, and multiplication and division, too. What about raising a value to a power? I think so, but for now let's defer that detail. In fact, this suggests an idea -- why don't we defer ALL of the mathematical operations for a moment and see where that takes us?

Handling Parentheses

If we defer considering the mathematical operations until later, we have the following general possibilities for an expression:

number
( number )
( ( number ) )
( ... ( number ) ... )

If you aren't familiar with my use of ... above, it's an ellipis and just means "more of the same." In this case, more pairs of parentheses. It's short-hand. And rather than using specific numbers here, I've decided to just say number and leave the details of exactly what that is until later on. We can pack the number inside as many parenthesis pairs as we want, or none at all. Without the math operations to worry about, there's not much else -- it's just numbers and parentheses.

How might we write a specification for this, though? Perhaps something like:

expression := number | ( expression )

Before going on, let's see how what I mean by the way I wrote that. First, when I use lower-case name above, it's a production and is meant to be taken symbolically. A number does not mean the literal text, "number," for example. It means any general number can be inserted at that point. It's conceptual. Also, the single character, |, means --or--, as in this OR that. It's helpful short-hand to learn. Finally, the := character pair means IS -- the thing on the left side IS the thing on the right side. In this case, an expression is declared to be the same thing as either a number or else an open paren followed by an expression followed by a close paren.

Here's how you should read it: "An expression is either a number or else it is an open paren, followed by an expression, followed by a close paren." This kind of special notation is often called a "production" in the specialized literature (at least, those I've read.)

Note that this production is recursive. A concept on the right side uses a concept on the left side that uses that same concept, yet again. An expression is either a number or else an enclosed expression. But an enclosed expression can be either a number or an enclosed expression. Etc. In other words, it only ends when you decide to say that an expression is a number. At that point, the process ends. So it kind of goes like this:

number                        first option
( number )                    second option, first option
( ( number ) )                second option, second option, first option
( ... ( number ) ... )        etc.

The nice thing about specifying an expression that way is that it compactly describes what an expression really is at this point. We don't need to list each possibility -- instead, we can simply refer back, recursively, to an expression and capture all the possibilities that way.

It should be clear now that arranging it so that what goes inside the two parens is another expression is what allows nested parens -- otherwise, if what went inside was just a number, then we couldn't nest the paren pairs. Using that production, I think you can see how you can construct all of the simplest examples mentioned, those composed only of a number or a number surrounded by one or more parenthesis pairs.

Coding?

Let's connect what we've discussed above to the idea of writing code for it. How might we write a program to analyze the above syntax?

Well, let's say we had two subroutines, called Expression and Number. When called, the Expression subroutine would first check for an open paren, right away. If one is present, Expression would then simply call itself to get the value of the expression inside and then, upon getting that value, it would then insist on seeing a close paren in order to match it up with the open paren it saw earlier. And if so, it could return successfully with the value it returned to itself. However, if Expression didn't see an open paren to begin with, it would instead call the subroutine Number to get the value of the number and return with the result.

Note that when Expression sees an open paren to begin with, it calls itself. Recursive calls like this are common in "recursive descent" parsers. In fact, it's where the name comes from.

We haven't yet defined exactly what a number should look like so we really can't code up Number, yet. But we can get a look at what Expression might look like and this concrete example might help a little:

FUNCTION Expression% (eqpos AS INTEGER, eq AS STRING)

    DIM status AS INTEGER

        IF Match("(", eqpos, eq) THEN
            LET status = Expression(eqpos, eq)
            IF status THEN
                LET status = Match(")", eqpos, eq)
            END IF
        ELSE
            LET status = Number(eqpos, eq)
        END IF

    LET Expression = status

END FUNCTION

There are some important details to examine, here.

First, the routine returns a value. This value is 0 (FALSE) if the expression wasn't matched properly and returns -1 (TRUE) if the expression was matched. The routine first checks for an open paren. If this is matched, then it calls itself recursively to see if an expression lies within a parenthesis pair. It then checks, if an expression was found to be valid, for the closing paren. Third, if an open paren isn't found, then it must be a number and that function is called to verify this fact. Either way, the resulting status is returned.

Second, the routine is passed two variables, eqpos and eq. eq is the equation string, in its entirety, and eqpos is the current position in the equation string. Functions like Match and Number may make adjustments to eqpos, depending on their success. They won't change eq, though.

Third, I've introduced a "helper" function called Match, which simply checks to see if the next character in the string happens to be in the provided list. The provided list here is always just one character, so only that character can be matched. If Match does detect a match, it returns TRUE (-1) and adjusts the eqpos variable to reflect the fact that a match occured.

So, if an open paren is matched, eqpos is updated by Match so that the next routine seeing eqpos will not "see" the open paren. The routine then asks itself to examine the string from that point (after the open paren) on to see if another expression can be validly matched. If so, eqpos will be updated automatically so that the matched expression is now skipped and then the routine can check for the final close paren, which is needed to match up with the earlier open paren. However, if an open paren isn't seen in the first place, then the routine insists on finding a number.

One thing to notice, if you haven't already, is that this method won't handle spaces. The expression, ((5)), is valid, while, ((5 )), is not. The inserted space is enough to cause problems. To fix that, we need another "helper" function:

FUNCTION Expression% (eqpos AS INTEGER, eq AS STRING)

    DIM status AS INTEGER

        SkipSpaces eqpos, eq
        IF Match("(", eqpos, eq) THEN
            LET status = Expression(eqpos, eq)
            IF status THEN
                SkipSpaces eqpos, eq
                LET status = Match(")", eqpos, eq)
            END IF
        ELSE
            LET status = Number(eqpos, eq)
        END IF

    LET Expression = status

END FUNCTION

It we skip spaces before a hard character. like the open and close parens, then we've got it covered. Now spaces can take place anywhere except for inside a number, itself.

To test all this, you can try the following:

EQ_01.ZIP or EQ_01.BAS: An example program that will only accept the an expression of the type mentioned so far. In this example, the code also includes routines to handle examining numbers, but you can ignore that necessity for now. Just note the code for Expression and the support routines called Match and SkipSpaces, if you are interested.

Last updated: Friday, January 13, 2006 23:50