The first command in an osh command sequence writes a stream of objects but has no input stream. Each subsequent command reads a stream of objects and writes a stream of objects. For a given command, the relationship between inputs and outputs is not necessarily one-to-one. The f command is one-to-one: it reads one object from its input stream, applies a function to it, and writes one object to its output stream. The select command copies an object from its input stream to its output stream if and only if select's predicate evaluates to true for that object. expand generates any number of output objects from a single input object.
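To make these three input/output shapes concrete, here is a minimal sketch that models f, select and expand as Python generator functions over a stream of tuples. The names f_cmd, select_cmd and expand_cmd are hypothetical, and the code only illustrates the relationships described above; it is not how the osh runtime implements these commands. In particular, expand_cmd assumes the simplest case of emitting each field of an input tuple as a separate output.

```python
from typing import Any, Callable, Iterable, Iterator

# Hypothetical plain-Python models of three osh commands.
# Not osh's implementation: they only show how many outputs
# each command produces per input.

def f_cmd(fn: Callable[..., Any], stream: Iterable[tuple]) -> Iterator[Any]:
    # One output per input: apply fn to the fields of each tuple.
    for t in stream:
        yield fn(*t)

def select_cmd(pred: Callable[..., bool], stream: Iterable[tuple]) -> Iterator[tuple]:
    # Zero or one output per input: keep the tuple only if pred is true.
    for t in stream:
        if pred(*t):
            yield t

def expand_cmd(stream: Iterable[tuple]) -> Iterator[Any]:
    # Any number of outputs per input: emit each field separately.
    for t in stream:
        yield from t

if __name__ == "__main__":
    stream = [(1, 'a'), (2, 'b'), (3, 'c')]
    print(list(f_cmd(lambda n, s: n * 10, stream)))           # [10, 20, 30]
    print(list(select_cmd(lambda n, s: n % 2 == 1, stream)))  # [(1, 'a'), (3, 'c')]
    print(list(expand_cmd([(1, 2, 3), (4,)])))                # [1, 2, 3, 4]
```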
Commands, in addition to operating on input and output streams, may have side-effects. For example, out writes to stdout or a file; sql may update a database; and commands with function arguments (e.g. f, select) can operate on variables in the osh command sequence's namespace.
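For example, suppose the cluster fred consists of the nodes fred1, fred2 and fred3, each with a database containing a request table. The following command runs the same query on every node of fred and writes one (node, count) tuple per node:

jao@zack$ osh @fred [ sql "select count(*) from request where state = 'open'" ] ^ out
('fred1', 1)
('fred2', 0)
('fred3', 5)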
Now suppose you want to find the total number of open requests across the cluster. You can pipe the (node, request count) tuples into an aggregation command:
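jao@zack$ osh @fred [ sql "select count(*) from request where state = 'open'" ] ^ agg 0 'total, node, count: total + count' $
6

Here 0 is the initial value of the total. agg calls the function once per input tuple, passing the running total followed by the tuple's fields (node and count), and the function's result becomes the new running total.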
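The aggregation above behaves like a left fold over the (node, count) tuples. The following sketch reproduces the computation in plain Python with functools.reduce; it assumes agg's semantics are exactly as inferred from this example (initial value, then one call per tuple) and is not osh's own code.

```python
from functools import reduce

# (node, count) tuples, as written by the @fred [ sql ... ] command above.
counts = [('fred1', 1), ('fred2', 0), ('fred3', 5)]

# The same function as the agg argument 'total, node, count: total + count'.
step = lambda total, node, count: total + count

# Fold: start at 0, then feed the running total plus each tuple's fields to step.
total = reduce(lambda acc, t: step(acc, *t), counts, 0)
print(total)  # 6
```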
The same computation can be done using the API as follows:
#!/usr/bin/python
from osh.api import *

# Run the query on every node of cluster fred, then sum the per-node counts.
osh(remote("fred", sql("select count(*) from request where state = 'open'")),
    agg(0, lambda total, node, count: total + count))
zack$ osh gen 10 ^ f 'x: x**2' $
(0,)
(1,)
(4,)
(9,)
(16,)
(25,)
(36,)
(49,)
(64,)
(81,)

gen 10 generates the first ten integers, 0 through 9. These integers are passed to the next command, f. The argument to f is a function specification, x: x**2. This is a lambda expression (the CLI permits the keyword lambda to be omitted). When the f command receives an input, it computes the square and writes the result to its output stream. The squared numbers are then passed to the out command, which writes its inputs to stdout; the trailing $ is shorthand for ^ out.
The streams connecting commands always contain tuples. If a command writes a single object to a stream (e.g., gen, which generates integers), the osh runtime wraps that object in a 1-tuple. This is why the output of the command above contains 1-tuples rather than integers.
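The gen ^ f pipeline and the 1-tuple wrapping rule can be modeled with two Python generators. This is a minimal sketch: wrap, gen and f here are hypothetical stand-ins, not the osh runtime, and the sketch assumes that an object which is already a tuple is passed through unwrapped.

```python
def wrap(obj):
    # Assumed rule: non-tuple objects become 1-tuples; tuples pass through.
    return obj if isinstance(obj, tuple) else (obj,)

def gen(n):
    # Like 'gen 10': the integers 0 .. n-1, each wrapped in a 1-tuple.
    for i in range(n):
        yield wrap(i)

def f(fn, stream):
    # Like 'f': unpack each tuple into fn's arguments, wrap the result.
    for t in stream:
        yield wrap(fn(*t))

squares = list(f(lambda x: x ** 2, gen(10)))
for t in squares:
    print(t)  # (0,), (1,), (4,), ..., (81,), one per line
```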
Arbitrary-length argument lists work as usual. For example, suppose you have a file of CSV (comma-separated values) data in which each row contains 20 items, and you want to add the integers in columns 7 and 18 (0-based). You could invoke f with a function of 20 arguments and add the items at positions 7 and 18. Or you could use an arbitrary-length argument list:
osh cat data.csv ^ f 's: s.split(",")' ^ f '*row: int(row[7]) + int(row[18])' $

cat data.csv writes the lines of data.csv to the output stream. Each line contains values separated by commas; f 's: s.split(",")' splits each line into a tuple of values. The next command, f '*row: int(row[7]) + int(row[18])', binds the entire tuple to row instead of binding each tuple value to a separate function argument, and then adds the integers at positions 7 and 18.
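The effect of *row can be checked in plain Python. The sketch below uses a hypothetical 20-field line, "0,1,...,19", in place of a row of data.csv, and assumes f calls its function with the tuple's values as positional arguments, as the description above implies.

```python
# Hypothetical CSV line with 20 fields: "0,1,2,...,19".
line = ",".join(str(i) for i in range(20))

# First f: split the line into a tuple of field strings.
fields = tuple(line.split(","))

# Second f: '*row: ...' collects all 20 positional arguments into row,
# so fields can be picked out by 0-based position.
add_7_and_18 = lambda *row: int(row[7]) + int(row[18])

print(add_7_and_18(*fields))  # 25
```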
Command-line interface (CLI): The osh executable interprets its command-line arguments as osh syntax. Any shell should be usable; however, some osh CLI syntax may need to be escaped in some shells. (The osh CLI has been tested most extensively with the bash shell.)
Python application programming interface (API): The osh CLI invokes the osh runtime, which invokes Python modules corresponding to each command. The runtime and command modules can also be invoked from a Python API.