Indexing - r3n/rebol-wiki GitHub Wiki

Table of Contents Introduction Does 0 "exist"? What should an index be? X-based indices The NEXT function The SUCC function The benefits of the SUCC function Help string BACK function compatibility More flexible indexing PICK function compatibility The BACK function Getting the value at the current position The INDEX? function The AT function The SKIP function The PICK function Simple index arithmetic task Summary

Introduction

This article is meant as an illustration of the principles behind indexing. Unless stated otherwise, the examples were tested using:

rebol/version ; == 2.7.8.3.1

Does 0 "exist"?

To start the discussion in a "philosophical" way, let's discuss one of the "philosophical" themes first.

Some opponents of 0 state: "0 does not even exist, it should not be allowed as an index, then"

It is fair to admit that the question whether 0 "exists" may indeed be resolved by stating that it actually does not "exist". However, that does not mean the opponents of 0 could start to celebrate. The situation is the same as with the number 1, e.g. Does 1 "exist"? Not "much", we cannot find it in the real world any easier than we can find 0. Note that many Greek mathematicians did not consider 1 to be "a number", so to them 2 was "the smallest number" (they used the term "multitude" and it is easy to see why they considered inappropriate to call 1 "multitude").

It suffices to say that 1 is not something that "exists" in the common sense of the word, it is just an abstract notion invented by clever people to point to the common property of a hand containing one apple, a letter consisting of one page, a room in which one man is sitting, a net containing just one fish, a purse containing just one coin, etc.

In a similar sense 0 is as an abstract notion pointing to the common property of a hand containing no apple, an empty envelope, a room in which nobody is sitting, a net containing no fish, an empty purse, etc.

What should an index be?

For the purpose of referring to positions (elements) in a series like Rebol blocks, strings, etc. we need an index value.

The index value should be an integer. This statement may look uncontroversial, but it may be useful to mention the reasons why:

An element in a series may have a next element, its successor in the series. The successor element is such an element that there is no element between the element and its successor in the series.

In integer arithmetic there is also a successor operation. The successor of an integer is the integer increased by one.

The correspondence between the successor operations in the series and in the range of integers (the "successor to successor correspondence") is the reason why integers are used for indexing.

When the successor to successor correspondence is maintained, other operations are in correspondence as well, for example:

skipping two elements in a series corresponds to adding 2 to the index,
skipping one element in a series corresponds to adding 1 to the index,
skipping no element in a series (remaining at the same place, in fact) corresponds to adding 0 to the index
skipping one element backwards corresponds to subtracting 1 from the index (or adding -1 to the index),
skipping two elements backwards corresponds to subtracting 2 from the index (or adding -2 to the index)

This also illustrates that when the successor to successor correspondence is kept, knowing the index of one element we can calculate indices of all other elements using addition.

X-based indices

If the successor to successor correspondence is kept, knowing the index of one element in a series determines the indices of all other elements. Usually, the element with the smallest index value is called the base element. If the index of the "base element" is x, the index is called x-based.

The NEXT function

The next function is not an "indexing operator" not doing any indexing for us (not defining any correspondence between indices and positions in a series). However, as discussed above, the "successor to successor correspondence" is crucial for indexing. Therefore, it is wise to examine whether the NEXT function can serve as a successor operator.

The help string of the next function says:

"Returns the series at its next position."

Let's test how it works:

s: [a]
index? s ; == 1
mold/all s ; == [a]
next-s: next s
index? next-s ; == 2
mold/all next-s ; == #[block! [a] 2]
same? next-s s ; == false

In this case we took series s and used the next function to obtain its successor next-s. We obtained a new, distinct, "successor" series, as all our examinations proved.

Also, there is one more property worth mentioning. The next-s series is a tail, as the following test reveals:

same? next-s tail next-s ; == true

However, if we continue using the next function on next-s:

next-next-s: next next-s

we get:

index? next-next-s ; == 2
mold/all next-next-s ; == #[block! [a] 2]
same? next-next-s next-s ; == true

This illustrates that when supplying the tail series, the next function does not yield its successor, instead it just yields the series tail.

The SUCC function

To determine whether the next function is the successor operator we need to find out whether it could yield the successor of the next-s series. If yes, then the next function isn't actually the successor operator, and its help string is misleading.

It turns out that our concerns are justified. To prove it, let's define:

succ: func [
    "Returns the series at its next position."
    series [series!]
    /local t
] [
    t: tail :series
    insert/dup t #"^(00)" 1 + (index? :series) - (index? :t)
    series: next :series
    clear :t
    :series
]

Now our testing reveals:

succ-next-s: succ next-s
mold/all succ-next-s ; == "#[block![a]3]"
same? succ-next-s next-s ; == false

, i.e., we really obtained the successive position confirming our concerns.

The benefits of the SUCC function

Our new succ function is actually our proposal how the new next function implementation should behave. Let's list some of the advantages of the new behaviour.

Help string

The help string of the succ function isn't misleading. This is not a big advantage, however. For the original next function implementation to not have a misleading help string it would suffice to adjust the help string to "Returns the series at its next position, stops at tail."

BACK function compatibility

This is a bigger advantage. To demonstrate it, let's see how the current next function behaves:

same? s: tail [a] back next s ; == false

, while:

same? s: tail [a] back succ s ; == true

More flexible indexing

Together with the improved back function it generates the new, improved skip function, which definines new (essentially "any base") indexing. (see the discussion in the SKIP section)

PICK function compatibility

Defining the new curr function and pick function different index access methods become compatible (finally).

The BACK function

The back function help string states:

"Returns the series at its previous position."

Similarly as with the next function we may want to determine whether the help string is misleading or not. We can find out that the back function does not yield the preceding position in case it obtains a head series.

Now we may want to find out whether it is actually possible to obtain a position preceding the head. Like in the case of the next function, such series exist in Rebol:

index? #[block! [1 2] -2] ; == -2

Therefore, it makes sense to redefine the back function as proposed for the next function, i.e., so that the function yields a predecessor of the given series without stopping at head.

Getting the value at the current position

Such a function already exists in R2, it is called first, but there are several issues with it:

the function name assumes that the current position is (relatively to itself) indexed by 1. If the successor to successor correspondence is kept, index 0 (relative to the current position) would need to be assigned to the position immediately preceding the current position. Opponents of this (relative) indexing method correctly object that it is counterintuitive and uncomfortable for 0 to "point one step backwards from the current position". The simple indexing arithmetic example below demonstrates that such choice makes index arithmetic more complicated than necessary. Last but not least, the function name doesn't need to assume any (relative) index value.
the behaviour of the first function was found uncomfortable when compared to the behaviour of the pick function in that it triggered an error for out-of-range (past-head, tail or past-tail) positions.

The INDEX? function

The index? function uses 1-based indexing, the "base element" being the series head. The series head has index 1. The correspondence between series successor and the integer successor (addition of 1) looks like being kept.

When testing the above succ-next-s series we obtain:

index? succ-next-s ; == 2

, while we know that all other indicators (the mold/all output, the same? function) demonstrate that it is not true.

Similarly as we patched the next function we can patch also the index? function:

index?': func [
    {Returns the index number of the current position in the series.}
    series [series! port!]
    /local t i
] [
    t: tail :series
    while [all [tail? :series not :series =? :t]] [
        insert tail :series #"^(00)"
    ]
    i: index? :series
    clear :t
    i
]

Now we obtain:

index?' next-s ; == 2
index?' next-next-s ; == 3

The AT function

This function uses different indexing than the index? function, the indexing of positions is "relative to the given series argument". If we examine the function thoroughly, we find that "the base element" (the element with the smallest index) can actually (depending on the situation) have any index not exceeding 1. Example:

s: skip [a b c d e f] 3
at s -2 ; == [b c d e f]
at s -3 ; == [a b c d e f]

However, the at function uses "ambiguous indexing":

at s 1 ; == [d e f g]
at s 0 ; == [d e f g]

, since it assigns two different indices (0 and 1) to one position. The ambiguous indexing cannot maintain the successor to successor correspondence. Moreover, it is the only ambiguous-type indexing used, i.e., it is not compatible with any other indexing used.

The SKIP function

The skip function uses indexing relative to the series argument as well, but, fortunately, it does not use ambiguous indexing like the at function. Maintains the correspondence between series positions and integers (its integer argument is called "offset", but the argument name is not relevant).

Similarly as the next function, though, it does not skip past tail, nor does it skip past head (interestingly, past-head positions are equally possible in R2 as past-tail positions). While this property looks like logically justifiable, there is a problem that this property is not compatible with indexing used by the pick function indexing.

Note that if skip were able to skip past head as well as past tail, any-based indexing would be possible. That is of advantage for some code, Rebol would be one of languages allowing any bounds for series (when using the appropriate skip), not just nonpositive minimal index and positive maximal index.

The PICK function

The pick function uses indexing relative to the given series. It is able to refer to the past-head:

pick [a b] -1 ; == none

yielding #[none] in such case or past tail:

pick [a b] 3 ; == none

yielding #[none] as well.

However, the pick function does not maintain the successor to successor correspondence. Example:

s: skip [a b c d e f g] 3
pick s -1 ; == c
pick s 1 ; == d

R2 assigns index -1 to the c element, while it does not assign 0 (the successor of -1) to the d element, assigning 1 to it instead. Because of that, the index arithmetic (the arithmetic correspondence between indices and positions in the series) is broken.

Simple index arithmetic task

Define a head-index? function obtaining a series s and an index value i yielding an index value j such that the pick s i expression will be equivalent to the pick head s j expression. Here are the solutions comparing different approaches:

The solution working when the index? function and the pick function both use indexing method compatible with the skip function indexing method:

head-index?: func [s [series!] i [integer!]] [i + index? s]

(This solution is the simplest.)

The solution working when the index? function and the pick function use the indexing methods from R3:

head-index?: func [s [series!] i [integer!]] [i - 1 + index? s]

(This solution is the second simplest.)

The solution working in the present state of R2:

head-index?: func [s [series!] i [integer!]] [case [i < 0 [i - index? s] i > 0 [i - 1 + index? s] i = 0 [0]]]

(This solution is the most complicated and it is neither "simple" nor "arithmetic". Neither it can be, having to deal with a successor to successor incompatible non-arithmetic indexing method.)

Summary

Function	Properties	Problems
NEXT	go to the successive position	unable to go past-tail
BACK	go to preceding position	unable to go past-head
SKIP	relative to the SERIES, nonpositive-based	unable to go past-tail or -head
AT	relative to the SERIES	ambiguous, unable to go past-tail or -head
INDEX?	1-based	lying for past-tail positions
PICK	relative to the SERIES, goes past-tail, -head	not s-to-s
POKE	relative to the SERIES, does not go past-tail, -head, otherwise compatible with PICK	not s-to-s

The next and back functions are not indexing functions. However, if they don't yield true successor and predecessor, they cannot be used for construction of a successor to successor compatible index, i.e. of a system with fundamentally correct indexing arithmetic.

The skip function based on true successor and predecessor functions would generate the proper indexing arithmetic automatically, but when based on incorrect principles, the indexing arithmetic generated by this function will also become incorrect (having to handle many unnecessary and incomfortable exceptions).

The at function is even worse:

it generates ambiguous indexing adding more exceptions to the already existing set of exceptions from the skip function
it uses index shift (addition of 1) of the series position generating some new boundary exceptions
due to the newly introduced sets of exceptions and due to shifting indexing generated by the at function is also incompatible with the indexing generated by the skip function

The pick function is similarly bad as the indexing generated by the at function:

it generates "indexing hole" introducing new exceptions to the already existing set of exceptions from the skip function
it uses index shift (addition of 1) of the series position generating the same additional boundary exceptions as the at function
the head-index? function is quite complicated, most probably not circumventing all the indexing bugs nevertheless
the indexing is incompatible with the indexing generated by the skip function
the indexing is incompatible with the indexing generated by the at function

The index? function is less general than other functions (not defining relative indexing, just absolute indexing, both the indexing generated by the at function as well as the indexing generated by the pick function generalize it, although they are mutually incompatible), yet it introduces unacceptable bugs for past-tail positions.

For the mathematical model of Rebol series and for the definitions of the new, improved indexing system see the Mathematical_model_of_Rebol_series article.