Basic!'s code blocks - RFO-BASIC/Basic GitHub Wiki

Summary

Basic!'s code blocks are currently independent of each other, which can cause some interesting, confusing, and hazardous side effects .

There has been discussion of combining code blocks into one stack, but it is still too early to tell the benefits or repercussions.

The issue can be discussed further here on,Paulon0n's fork issue.

Background

"Code block" here means a block of code that is defined by a control structure. For example,
IF a < b
PRINT "Less"
ELSE
PRINT "More"
ENDIF
consists of two independent code blocks. One is executed if the condition is true, and the other is executed if the condition is false. Other commands that define code blocks are FOR/NEXT, DO/UNTIL, WHILE/REPEAT, and SW.BEGIN/SW.END. Each of these, except IF/ELSE, has a corresponding break (e.g., SW.BREAK).

Any of these structures can be nested. BASIC! manages nested structures by putting information in stacks. Each type of structure has its own stack. For example, a FOR/NEXT loop needs to know the current index value, the index limit, and the step. Because each structure type has its own stack, the stacks operations are very fast. The most common, IF/ELSEIF/ELSE/ENDIF, is the fastest, since each stack entry carries only one number (an object of the Integer class.)

This works very well as long as entry and exit from each block is orderly, meaning block are entered only by a single kind of control statement, and exited in the reverse order from which they were entered -- last-in-first-out stack order.

So far, we have identified three ways to break this order ("block-busters"):

  • An IF/ELSE block nested within another block executes a break from the outer block. This exits the IF block and without hitting its ENDIF.
  • A user-defined function is called from a code block, and the function contains a break of the same type as the block. This is a syntax error but may not be caught (@Paulon0n, can you confirm this?)
  • GOTO. It has long been understood that GOTO is the nemesis of block-structured code. GOTO within a block, or outside of all blocks, is never a problem. GOTO out of a block is conceptually safe, but BASIC! currently does not detect this case and manage its stacks as needed. GOTO into a block is almost always a really bad idea. Fortunately, a GOTO that crosses block boundaries almost always causes a syntax error -- for example, it may look like a NEXT with no FOR. If the user manages the code carefully, there will be no error at all. Unfortunately, not all errors will be caught, and some failures will just give bad results with no error message. [We probably need to come up with some cases to prove that statement!]

Another way to look at the problem:

When IF/ELSE finishes a block, it scans forward through the text looking for "ELSEIF", "ELSE", or "ENDIF". When SW.BREAK exits a switch, it scans forward looking for "SW.END". It does not look for "ENDIF". One kind of control structure skips over the termination command of other kinds of control structures without executing them. This breaks the block structure. (This comment does not apply to GOTO, because GOTO does not scan. It jumps directly to the labeled line.)

There is a difference between the scan after an IF/ELSE block finishes and the scan after any of the xx.BREAK commands. An IF, ELSEIF, or ELSE sets a flag in the IfElseStack so that the interpreter will skip lines as it reads them. The other control blocks are different - they scan immediately, in the BREAK keyword's execution function.

Proposal

@Paulon0n has designed a stack structure that combines all the control stacks into a single stack, referred to as a "token stack". A partial implementation is available for review and test on @Paulon0n's fork. [Link?]

We would like to discuss if this structure should be implemented in BASIC!. Feel free to modify this page to carry on the discussion.

Every design change is a trade-off. A fix that affects everybody but helps only a few might not be worth it, depending on how much it affects everybody and how many it helps.

Pros

  • The biggest reason to implement the single token-stack is that it could be extended to solve the two block-busters described above that do not involve GOTO.
  • Implementation so far indicates it is a simple change (large, but simple) from the current multiple-stack implementation, making the implementation relatively safe.
  • Block-structuring code is generally considered better coding practice than the flat field of the original GOTO/GOSUB-based BASIC. BASIC! should support block-structuring as well as possible.

Cons

  • Large change - the code change is simple, but the conceptual change is large, which is by definition risky. BASIC! has been getting along just fine for quite a while now, asking its users to use control structures and GOTO appropriately.
  • Incomplete - we have not determined yet how to fix the GOTO problem, even with the single token-stack.
  • Consideration of alternatives. Is there a better way?
  • Possible performance impact - replacing a simple Integer in IfElseStack with a Bundle may be expensive. On the other hand, some of the other control stacks are already Bundles.

Alternatives

Static structural analysis - a preprocessor/prescanner could find all the block boundaries in the code and record where they are. Each jump (break, if/else boundary, or goto) would check the record to see what boundaries are being crossed. This starts to sound like a compiler. Start-up penalty could be substantial.

Smarter text scan - this is based on "Another way to look at the problem", above. We could fix non-GOTO block-busters by scanning for all kinds of block-termination commands, not just the current block type's terminator. This is a much smaller conceptual change than re-writing the control stacks.

Alternate control implementations - the most complex and most fragile structure is the IF/ELSEIF/ELSE/ENDIF tree. Is a nested control structure the only way to manage it? Can an IF tree be managed with a Finite State Machine whose "events" are control structure commands? How did early BASICs (e.g., Atari) handle them?


Atari and other early Basics had single line IF statements ...

IF condition THEN statement

All on one line. No ELSE, ELSEIF, ENDIF. No blocks.


Would a Java class for the stack nodes be faster than a Bundle? (Or is Bundle fast enough and we don't need to worry about it?)

Are we trying to solve a real world problem here? I [Paul] have only seen a couple of cases of overflowing IF/ELSE stacks. I have seen one or two cases of user function stacks overflows. I have never seen any cases of other control stack overflows. Is this a good case for the application of, "If it ain't broke, don't fix it?"

"If it ain't broke" always depends on your definition of "broke". GOTO breaks block-structured programming, and outside of a compiler that's hard to fix. So maybe we tell users who use it to use it carefully and leave it at that. The detector that warns of IF/ELSE stack abuse helps users figure out what they did wrong. I think that covers the real-world problems.

But I [Marc] am an idealist, I guess. Right now there is a way good code can corrupt BASIC! stacks. It does not happen often, so it should be low priority, but I would still like to see that fixed. And I'd like to find a way to fix it that does not turn BASIC! on its ear. We need to find a way to define the problem that suggests a good solution, one that works within what BASIC! is. We don't have that yet.