TIP 225: Arithmetic Series with Optimized Space Complexity

Login
Author:         Salvatore Sanfilippo <[email protected]>
Author:         Miguel Sofer <[email protected]>
Author:         Brian Griffin <[email protected]>
Author:         Eric Taylor <[email protected]>
State:          Withdrawn
Type:           Project
Vote:           Pending
Created:        25-Oct-2004
Post-History:   
Tcl-Version:    8.7
Tcl-Ticket:     1052584
Obsoleted-By:   629

Abstract

This TIP proposes to add a new command to generate arithmetic sequences as Tcl lists that may be stored in constant space in many practical situations. The only change from the point of view of the Tcl programmer is the addition of a new command named range.

Rationale

An idiomatic way to assign successive elements of an arithmetic series to a variable is to use the for command. Usually the loop variable is initialized to the first element of the sequence, and incremented at every iteration of a given step using incr. The for test condition is used to limit the sequence generation to a given element, like in the following example:

for {set i 0} {$i < 10} {incr i} {
    puts $i
}

The Tcl programming language is at higher level than the C language, where this idiom firstly appeared, so it may be desiderable to be able to generate arithmetic sequences of integer numbers in a more comfortable way. Being the Tcl list a central data structure of the Tcl language, it apperas natural to generate a Tcl list of integers, and possibly use the foreach command to loop over every element, so that the above for loop can be translated into the following fragment of code:

foreach i [range 0 10] {
    puts $i
}

The range command can be also conveniently used in different contexts. The following code generates a list of squares of 0, 1, 2, 3, ... 9.

	proc map {varname script mylist} {
	    upvar $varname var
	    set res {}
	    foreach var $mylist {
	        lappend res [uplevel 1 $script]
	    }
	    return $res
	}

	puts [map x {expr {$x*$x}} [range 0 10]]

	# Will output "0 1 4 9 16 25 36 49 64 81"

The range command can be implemented in a way that makes it possible to internally store the arithmetic sequences genereated in constant space if they are only accessed using foreach, llength and lindex commands (lrange may also be handled in a special way). When needed, the object will be converted into a List object automatically. From the Tcl programmer point of view this optimization is transparent.

Specification of the Behaviour

The range command takes three arguments in the complete format, named start, end and step, and generates a sequence of integers accordingly to the following algorithm in pseudo code:

RangeLen(start, end, step)
1. if step = 0
2.     then ERROR
3. if start = end
4.     then return 0
5. if step > 0 AND start > end
6.     then ERROR
7. if setp < 0 AND end > start
8.     then ERROR
9. return 1+((ABS(end-start)-1)/ABS(step))

Range(start, end, step)
1. result <- EMPTY LIST
2. len <- RangeLen(start, end, step)
3. for i <- 0 to len - 1
4.     result.append(start+(i*step))
6. return result

The step argument can be omitted, and default to the value of 1, so [range 0 10 1] is the same as [range 0 10]. It's also possible to call the range command with a single argument, omitting both the start and step argument that will default respectively to 0 and 1, so that the following three commands will generate the same sequence of integers:

range 0 10 1
range 0 10
range 10

The following are examples of outputs.

[range 0 10 1] => 0 1 2 3 4 5 6 7 8 9
[range 10 0 -2] => 10 8 6 4 2
[range 10 10] => empty list
[range 10 20 -3] => ERROR
[range 5] -> 0 1 2 3 4

Infinite series and series resulting in lists bigger than the maximum list length that the Tcl code can handle are detected and reported as an error. start, end, and step can be anything can fit into a Tcl wide integer.

Note that there is a practical justification for the fact that the elements generated never reach the value of the End argument, with the effect of [range 0 10 1] generating the sequence 0, 1, 2, ..., 9 and a range with the same value of start and end always generating an emtpy list. This is needed in order to make it comfortable to use range and foreach instead of for loops like in the following example:

foreach i [range [llength $mylist]] {
    foobar [lindex $mylist $i]
}

Because Tcl indexes are mostly zero-based, and it is often useful to access every element of a sequence given it's length, this appears to be the more sensible behaviour (this semantic is very similar to the range() function of the Python programming language, where range() is fully used to replace C-like for loops.)

Unfortunately this behaviour is not as comfortable to run the indexes in reverse order:

foreach i [range [expr {[llength $mylist]-1}] -1 -1] {
    foobar [lindex $mylist $i]
}

But the access from the first to the last element is far more common in programs, and the range command needs to be consistent when the step is negative.

An alternative syntax for reverse-indexing is:

foreach i [range [llength $mylist]] {
   foobar [lindex $mylist end-$i]
}

Proposed Change

The change proposed is to modify the Tcl core in order to handle a new object type called ArithSeries, that is recongnized and handled as a special case by at least the llength, lindex and foreach commands. Syntactically, the ArithSeries object will have the string representation that is exactly that that would be produced by creating a list with the elements that would be iterated over by foreach as previously described. This TIP also proposes to add logic into SetListFromAny method of the List type in order to convert an Arithmetic Series object into a List directly without to pass from the string representation.

This TIP proposes to add a range command to the Tcl core having the semantics specified above, and returning an Arithmetic Series object. Formally, the syntax is:

range ?start? end ?step?

The proposed changes are available as a Patch against HEAD that can be found in the SourceForge Tcl patch 1052584 http://sf.net/tracker/?func=detail&aid=1052584&group_id=10894&atid=310894

Proposed Extended Change

Current Status -- August 2022

This TIP has been implemented, in it's entirety, along with the additional command syntax proposed in TIP-629 (see tcl branch tip-629). Currently, the command name is "range", but I propose changing it to "lseq"; it is less likely to conflict with user code, and is in line with other list related commands, i.e, begins with an "l". Note that the original tip-225 proposes a new Obj type: ArithSeries. This new type is fully implemented and supported for foreach, lmap, llength, lindex, lrange, lreverse, and the "in & ni" operators. List operations that modify the list will shimmer the ArithSeries into a List type before completing the operation, as Salvatore described below in the Appendix Discussion

Variant Implementation

However, I propose an expanded approach.

The ArithSeries Obj type basically replaces the elemCount and elements fields with functions that perform llength(), lindex(), and slice() operations without having to generate an actual elements[] array. In theory, these functions could be provided abstractly to allow any kind of list value. Here are some silly examples:

I bring this up because currently, the ArithSeries is hard-coded into the various list operations as a special case (somewhat like the dict obj type is in certain cases). But if this obj type can be abstracted, then the list operations would only require one "special case", and not n-special cases. Even the current List type could be implemented as an Abstract List. This would allow for future type implementations, that can achive the goal of the original authors, to create "Tcl lists that may be stored in constant space". I have an initial implementation of an Abstract List type along with an implemented [lseq] command that defines the ArithSeries as an Abstract List.

Creating an AbstractList type requires providing a set of functions that allow the type to emulate a List.

typedef struct AbstractList {
    /* List emulation functions */

    /* How to create a new Tcl_Obj of this custom type */
    Tcl_ALNewObjProc *newObjProc;

    /* How to duplicate a internal rep of this custom type */
    Tcl_ALDupRepProc *dupRepProc;

    /* Return the [llength] of the AbstractList */
    Tcl_ALLengthProc *lengthProc;

    /* Return a value (Tcl_Obj) for [lindex $al $index ...] */
    Tcl_ALIndexProc *indexProc;

    /* Return an AbstractList for [lrange $al $start $end] */
    Tcl_ALSliceProc *sliceProc;

    /* Return an AbstractList for [lreverse $al] */
    Tcl_ALReverseProc *reverseProc;

    size_t alvaluesize;  /* value size */
    void* alvalue;       /* Custom value reference */
} AbstractList;

If a function is not provided, the code will revert to shimmering the value to a List before completing the operation. The bare minimum for a functioning implementation is lengthProc and indexProc.

To illustrate how a new AbstractList is created, here is the code for the ArithSeries:

Tcl_Obj *
TclNewArithSeriesObj(Tcl_WideInt start, Tcl_WideInt end, Tcl_WideInt step, Tcl_WideInt len)
{
    Tcl_WideInt length = (len>=0 ? len : ArithSeriesLen(start, end, step));
    Tcl_Obj *arithSeriesPtr;
    ArithSeries *arithSeriesRepPtr;

    if (length == -1) return NULL; /* Invalid range error */

    arithSeriesPtr = Tcl_NewAbstractListObj(NULL, "arithseries", sizeof (ArithSeries));
    arithSeriesRepPtr = (ArithSeries*)Tcl_AbstractListGetTypeRep(arithSeriesPtr);
    arithSeriesRepPtr->start = start;
    arithSeriesRepPtr->end = end;
    arithSeriesRepPtr->step = step;
    arithSeriesRepPtr->len = length;
    Tcl_SetAbstractListNewProc(     arithSeriesPtr, Tcl_NewArithSeriesObj    );
    Tcl_SetAbstractListLengthProc(  arithSeriesPtr, Tcl_ArithSeriesObjLength );
    Tcl_SetAbstractListIndexProc(   arithSeriesPtr, Tcl_ArithSeriesObjIndex  );
    Tcl_SetAbstractListSliceProc(   arithSeriesPtr, TclArithSeriesObjRange   );
    Tcl_SetAbstractListReverseProc( arithSeriesPtr, TclArithSeriesObjReverse );
    Tcl_SetAbstractListDupRepProc(  arithSeriesPtr, DupArithSeriesRep        );

    if (length > 0) {
        Tcl_InvalidateStringRep(arithSeriesPtr);
    } else {
        TclInitStringRep(arithSeriesPtr, NULL, length);
    }
    return arithSeriesPtr;
}

This proposal includes a set of Tcl C API functions for extension authors to create new forms of abstract lists.

Some suggestions so far include:

AbstractList Implementation

An implementation of AbstractList, along with an [lseq] ArithSeries command and abstact obj type has been done. At this writing, it is still a work in progress, but it has shown that:

ToDo

Concerns

-Brian


Copyright

This document has been placed in the public domain.


Appendix: Reference Pure-Tcl Implementation

It may be useful to test the behaviour of the range command without having to apply the Patch, so the following is a pure Tcl implementation that should be exactly equivalent in the semantic to the specification in this TIP, but of course not able to store ranges in O(1) space.

# RangeLen(start, end, step)
# 1. if step = 0
# 2.     then ERROR
# 3. if start = end
# 4.     then return 0
# 5. if step > 0 AND start > end
# 6.     then ERROR
# 7. if setp < 0 AND end > start
# 8.     then ERROR
# 9. return 1+((ABS(end-start)-1)/ABS(step))
proc rangeLen {start end step} {
    if {$step == 0} {return -1}
    if {$start == $end} {return 0}
    if {$step > 0 && $start > $end} {return -1}
    if {$step < 0 && $end > $start} {return -1}
    expr {1+((abs($end-$start)-1)/abs($step))}
}

# Range(start, end, step)
# 1. result <- EMPTY LIST
# 2. len <- RangeLen(start, end, step)
# 3. for i <- 0 to len - 1
# 4.     result.append(start+(i*step))
# 6. return result
proc range args {
    # Check arity
    set l [llength $args]
    if {$l == 1} {
        set start 0
        set step 1
        set end [lindex $args 0]
    } elseif {$l == 2} {
        set step 1
        foreach {start end} $args break
    } elseif {$l == 3} {
        foreach {start end step} $args break
    } else {
        error {wrong # of args: should be "range ?start? end ?step?"}
    }

    # Generate the range
    set rlen [rangeLen $start $end $step]
    if {$rlen == -1} {
        error {invalid (infinite?) range specified}
    }
    set result {}
    for {set i 0} {$i < $rlen} {incr i} {
        lappend result [expr {$start+($i*$step)}]
    }
    return $result
}

Appendix: Discussion

Does the TIP include a C-level api to ranges, or are they transparent also in C - in the sense that they are addressable with any of the list-oriented functions of the Tcl api? What if any changes and caveats are necessary in the documentation of Tcl's C api? Miguel

Ranges are transparent to C level too, in the proposed patch, because the logic is put inside the commands, so directly in the code implementing lindex, foreach, ... In all the other cases, when a SetListFromAny() call occurs the range is converted into a normal Tcl list object. Salvatore