TIP 304: A Standalone [chan pipe] Primitive for Advanced Child IPC

Login
Author:         Alexandre Ferrieux <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        07-Feb-2007
Post-History:   
Keywords:       Tcl,exec,process,subprocess,pipeline,channel
Tcl-Version:    8.6
Tcl-Ticket:     1978495

Abstract

Currently, it is not easy to get both (separate) dataflows from the stdout and stderr of a child. BLT's bgexec does this in an extension, and could be added to the core. But the point of this TIP is to show that a much smaller code addition can provide a lower-level primitive with much more potential than bgexec's: a standalone pipe creation tool like TclX's pipe command.

Background

Getting back both stdout and stderr from a child has long been an FAQ on news:comp.lang.tcl, to the point that bgexec has been offered in an extension, BLT, whose main job is very remote from IPC. Now this has been a problem for many, who didn't want to have the problems of distributing a script depending on an extension. Moreover, bgexec does not scale up, in that it cannot bring back the separate stderrs of all four children in:

	set ff [open "|a | b | c | d" r]

A popular workaround for script-only purists is to spawn an external "pump" like cat in an [open ... r+], and redirect the wanted stderr to the write side of the pump. Its output can then be monitored through the read side:

	set pump [open "|cat" r+]
	set f1 [open "|cmd args 2>@ $pump" r]
	fileevent $f1 readable got_stdout
	fileevent $pump readable got_stderr

Now this is all but elegant of course, difficult to deploy on Windows (where you need an extra cat.exe), and not especially efficient since the "pump" consumes context switches and memory bandwidth only to emulate a single OS pipe when Tcl is forced to create two of them via [open ... r+].

For this latter performance issue, a better alternative is a named pipe. But it is even harder to create on Windows, and it is a nightmare to handle its lifecycle properly (it doesn't die automagically with the creating process; blocks on open() if other side is not ready).

Proposed Change

All this points to the obvious solution: steal TclX's pipe command, which wraps the OS's pipe()/CreatePipe() syscall, yielding two Tcl channels wrapping the read and write side of the underlying pipe.

TclX's pipe command allows two syntaxes:

	pipe pr pw
	foreach {pr pw} [pipe] break

Its application to following stderr is straightfoward:

	pipe pr pw
	set f1 [open "|cmd args 2>@ $pw" r]
	fileevent $f1 readable got_stdout
	fileevent $pr readable got_stderr

Specification

The basic functionality is to return a pair of channels, so the script-level API does just that:

**chan pipe**

Create a pipe, and return a list of two channels wrapping either side of the pipe. The first element is the side opened for reading, the second for writing:

	lassign [chan pipe] pr pw

Subsequently, anything written to $pw will be readable on $pr. We purposefully drop the other [pipe pr pw] syntax from TclX, for the sake of minimality.

C Interface

A counterpart to chan pipe is added to Tcl's C API:

int Tcl_CreatePipe(Tcl_Interp *interp, Tcl_Channel *rchan, Tcl_Channel *wchan, int flags)

The interp argument is used for reporting errors and registering the channels created, the rchan and wchan arguments point to variables into which to place the channels for each end of the pipeline, and the flags argument is reserved for future expansion.

Discussion

It has been in TclX for years, so it is fireproof. The code can be directly copied from there, and is dead simple, just calling Tcl_MakeFileChannel and Tcl_RegisterChannel on both fds/handles. Just the low-level syscall changes between unix and windows.

Additional Benefits

It should also be noted that this primitive allows to deprecate the use of [open "|..."] and pid themselves !

	lassign [chan pipe] p1r p1w
	lassign [chan pipe] p2r p2w
	set pids [exec cmd $args >@ $p1w 2>@ $p2w &]
	fileevent $p1r readable got_stdout
	fileevent $p2r readable got_stderr

Taking this idea one step further, one can even deprecate the "|" in exec, with:

	exec a | b | c &

being rewritten as:

	exec a >@ $p1w &
	exec b <@ $p1r >@ $p2w &
	exec c <@ $p2r &

(I'm not crazy about actually deprecating ol'good and concise idioms in favour of the tedious lines above; however as an orthogonality worshipper I like to think it would be theoretically possible to reimplement them in pure Tcl thanks to the added power of one tiny extra primitive)

A more serious advantage, is that it allows the "half-close" of a command pipeline (detailed in [301] which the current TIP makes partly obsolete). Indeed

	lassign [chan pipe] pr pw
	lassign [chan pipe] qr qw
	exec a <@$pr >@$qw &

allows to close just the read side with

	close $pw

and still read all the output of the command up to its EOF:

	while {[gets $qr line]>=0} {...}

which is notoriously impossible with

	set pq [open |a r+]

Directions for Future Work

On unix, there's no reason to limit the use of unnamed pipes to stdout and stderr. An arbitrary pipe topology can be set up between several childs with descriptors 3,4,5... For this we could imagine a natural extension of exec's 2>@ to 3>@, 4<@, etc.

	exec foo >@ $p 3>@ $q 4<@ $r &
	exec bar <@ $q >@ $r 4<@ $p &

Of course, such uses are extreme, but useful in complicated IPC setups to achieve much better performance (through direct point-to-point pipes) than would a simpler "star" or "blackboard" topology (where all children write back to a central message routing process).

Reference Implementation

A patch against the Tcl HEAD (8.6) is located at http://sf.net/tracker/?func=detail&aid=1978495&group\_id=10894&atid=310894

Copyright

This document has been placed in the public domain.