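Data Storage: Documentation for the In-Memory Columnar Database KDB+ (Q)

 Arthur Whitney, one of the founders of Kx Systems, developed the columnar database KDB and its language Q in 2003.    Website: www.kx.com


Main features:

  • In-memory database: one way to think of KDB is as an in-memory database that can also persist data to disk.
  • Interpreted language: development cycles are shorter, and q aims to be concise, efficient and expressive. (That said, the learning curve is anything but gentle.)
  • Lists are ordered: unlike rows in a relational database, lists are ordered, and therefore tables are ordered too.
  • Evaluated from right to left (q draws on several influences, including APL, LISP and functional programming).
  • Table-oriented (tables are used as routinely as strings are in other languages).
  • Column-oriented: relational databases process and store data by row, whereas kdb stores data by column and applies operations directly to column vectors.
  • Strongly typed.
  • Null values have special meaning (see the details later in this document).
  • Built-in I/O support (very concise).

 Introduction to KDB+ (Q)  (the following documentation was found on the web and is recorded here for reference)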


Now that we know how q works and how to start it up, let's examine some real code that shows the power of q. The following program reads a csv file of time-stamped symbols and prices, places the data into a table and computes the maximum price for each day. It then opens a socket connection to a q process on another machine and retrieves a similar daily aggregate. Finally, it merges the two intermediate tables and appends the result to an existing file.

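sample:{
 t:("DSF"; enlist ",") 0: `:c:/q/data/px.csv;          / read date, symbol and float columns; the first row holds the column names
 tmpx:select mpx:max Price by Date,Sym from t;         / maximum price per day and symbol
 h:hopen `:aerowing:5042;                              / connect to the remote q process
 rtmpx:h "select mpx:max Price by Date,Sym from tpx";  / ask it for the same aggregate
 hclose h;                                             / close the connection
 .[`:c:/q/data/tpx.dat; (); ,; rtmpx,tmpx]             / merge both results and append them to the file
 }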

 

 


2. Atoms

Overview

All data is ultimately built from atoms, so we begin with atoms. An atom is an irreducible value with a specific data type. The basic data types in q correspond to those of SQL with some additional date and time related types that facilitate time series. We summarize the data types in the tables below, giving the corresponding types in SQL, and where appropriate Java and C#. We cover enumerations in Casting and Enumerations.

Q            SQL       Java       C#
boolean      boolean   Boolean    Boolean
byte         byte      Byte       Byte
short        smallint  Short      Int16
int          int       Integer    Int32
long         bigint    Long       Int64
real         real      Float      Single
float        float     Double     Double
char         char(1)   Character  Char
symbol       varchar   (String)   (String)
date         date      Date
datetime     datetime  Timestamp  DateTime
minute
second
time         time      Time       TimeSpan
enumeration

Note: The words boolean, short, int, etc. are not keywords in q, so they are not displayed in a special font in this text. They do have special meaning when used as name arguments in some operators. You should avoid using them as names.

The next table collects the important information about each of the q data types. We shall refer to this in subsequent sections.

type         size  char type  num type  notation                     null value
boolean      1     b          1         1b                           0b
byte         1     x          4         0x26                         0x00
short        2     h          5         42h                          0Nh
int          4     i          6         42                           0N
long         8     j          7         42j                          0Nj
real         4     e          8         4.2e                         0Ne
float        8     f          9         4.2                          0n
char         1     c          10        "z"                          " "
symbol       *     s          11        `zaphod                      `
month        4     m          13        2006.07m                     0Nm
date         4     d          14        2006.07.21                   0Nd
datetime     8     z          15        2006.07.21T09:13:39          0Nz
minute       4     u          17        23:59                        0Nu
second       4     v          18        23:59:59                     0Nv
time         4     t          19        09:01:02.042                 0Nt
enumeration  *                          `u$v
dictionary                    99        `a`b`c!10 20 30
table                         98        ([] c1:`a`b`c; c2:10 20 30)

2.1 Integer Data

The basic integer data type is common to nearly all programming environments.

int

An int is a signed four-byte integer. A numeric value is identified as an int by the fact that it contains only numeric digits, possibly with a leading minus sign, without a decimal point. In particular, it has no trailing character that would indicate that it is another numeric type (see below). Here is a typical int value,

        42

short and long

The other two integer data types are short and long. The short type represents a two-byte signed integer and is denoted by a trailing 'h' after optionally signed numeric digits. For example,

        b:-123h
        b
-123h

Similarly, the long type represents an eight-byte signed integer denoted by a trailing 'j' after optionally signed numeric digits.

        c:1234567890j
        c
1234567890j

Important: Type promotion is performed automatically in q primitive operations. However, if a specific integer type is required in a list and a narrower type is presented (e.g., an int is expected and a short is presented), the submitted type will not be automatically promoted and an error will result. This may be unintuitive for programmers coming from languages of C ancestry, but it will make sense in the context of tables.

2.2 Floating Point Data

Single and double precision floating point data types are supported.

float

The float type represents an IEEE standard eight-byte floating point number, often called "double" in other languages. It is denoted by optionally signed numeric digits containing a decimal point with an optional trailing 'f'. A floating point number can hold at least 15 decimal digits of precision.

For example,

        pi:3.14159265
 
        float1:1f

real

The real type represents a four-byte floating point number and is denoted by numeric digits containing a decimal point and a trailing 'e'. Keep in mind that this type is called 'float' in some languages. A real can hold at least 6 decimal digits of precision, 7 being the norm. Thus

        r:1.4142e
        r
1.4142e

is a valid real number.

Note: The q console abbreviates the display of float or real values having zeros to the right of the decimal.

        2.0
2f
        4.00e
4e

The behavior of substituting floating point types of different widths is analogous to the case of integer types.

Scientific Notation

Both float and real values can be specified in IEEE standard scientific notation for floating point values.

        f:1.23456789e-10
        r:1.2345678e-10e

By default, the q console displays only seven decimal digits of accuracy for float and real values by rounding the display in the seventh significant digit.

        f
1.234568e-10
        r
1.234568e-10e

You can change this by using the \P command (note upper case) to specify a display width up to 16 digits.

        f12:1.23456789012
        f16:1.234567890123456
 
        \P 12
        f12
1.23456789012
        f16
1.23456789012
 
        \P 16
        f12
1.23456789012
        f16
1.234567890123456

2.3 Binary Data

Binary data can be represented as bit or byte values.

boolean

The boolean type uses one byte to store an individual bit and is denoted by the bit value followed by 'b'.

        bit:0b
        bit
0b

byte

The byte type uses one byte to store 8 bits of data and is denoted by '0x' followed by a hexadecimal value,

        byte:0x2a

Binary Data is Numeric

In handling binary data, q is more like C than its descendants, in that both binary types are considered to be unsigned integers that can participate in arithmetic expressions or comparisons with other numeric types. There are no keywords for 'true' or 'false', nor are there separate logical operators. With pi and byte as above,

        a:42
        bit:1b
        a+bit
43

is an int and

        byte+pi
45.14159

is a float. Observe that type promotion has been performed automatically.
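As a small sketch of what "binary data is numeric" means in practice, the lines below treat booleans and bytes as numbers. The variable name bits is chosen for illustration, and the use of & and | (the minimum and maximum operators) in place of logical "and" and "or" is something the text above does not cover, so read it as an assumption checked against standard q behavior rather than as part of the original tutorial. Each line is written as a comparison so that the result does not depend on the default integer width of the q version in use.

```q
bits:01011b
3=sum bits          / 1b: summing booleans counts the 1 bits
2=1b+1b             / 1b: boolean arithmetic promotes to an integer type
0x2a=42             / 1b: a byte compares equal to the integer of the same value
1b&0b               / 0b: minimum acts as "and" on booleans
1b|0b               / 1b: maximum acts as "or" on booleans
```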

2.4 Character Data

There are two atomic character types in q. They resemble the SQL types CHAR and VARCHAR more than the character types of verbose languages.

char

A char holds an individual ASCII character and is stored in one byte. This corresponds to a SQL CHAR. A char is denoted by a single character enclosed in double quotes.

        ch:"q"
        ch
"q"

Some keyboard characters, such as the double-quote, cannot be entered directly into a char since they have special meaning in q. As in C, these characters are escaped with a preceding back-slash ( \ ). While the console display also includes the escape, these are actually single characters.

        ch:"\""                        / double-quote
        ch                              / console also displays the escape "\""
        ch:"\\"                        / back-slash
        ch:"\n"                        / newline
        ch:"\r"                        / return
        ch:"\t"                         / horizontal tab

You can also escape a character with an underlying numeric value expressed as three octal digits.

        "\142"
"b"

symbol

A symbol holds a sequence of characters as a single unit. A symbol is denoted by a leading back-quote (` ), also read "back tick" in q circles.

        s1:`q
        s2:`zaphod

A symbol is irreducible, meaning that the individual characters that comprise it are not directly accessible. Symbols are often used in q to hold names of other entities.

Important: A symbol is not a string. We shall see in Lists that there is an analogue of strings in q, namely a list of char. While a list of char is a kissing cousin to a symbol, we emphasize that a symbol is not made up of char. The symbol `a and the char "a" are not the same.

Advanced: You may ask whether a symbol can include embedded blanks and special characters such as back-tick. The answer is yes. You create such a symbol using the relationship between lists of char and symbols. See Creating Symbols from Strings for more on this.

        `$"A symbol with `backtick"
`A symbol with `backtick

Note: A symbol is somewhat akin to a SQL VARCHAR, in that it can hold an arbitrary number of characters. It is different in that it is atomic. The char "q" and the symbol `kdb are both atomic entities.

2.5 Temporal Data

A major benefit of q is that it can process both time series and relational data in a consistent and efficient manner. Q extends the basic SQL date and time data types to facilitate temporal arithmetic, which is minimal in SQL and can be clumsy in verbose languages (e.g., Java's date library and its use of time zones). We begin with the equivalents to SQL temporal types. The additional temporal types in q deal with constituents of a date or time.

date

A date is stored in four bytes and is denoted by yyyy.mm.dd, where yyyy represents the year, mm the month and dd the day. A date value stores the count of days from Jan 1, 2000.

        d:2006.07.04
        d
2006.07.04

Important: Months and days begin at 1 (not zero) so January is '01'.

Leading zeroes in months and days are required; their omission causes an error.

        bday:2007.1.1
'2007.1.1

Advanced: The underlying day count can be obtained by casting to int.

        `int$2000.02.01
31
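Because a date is stored as a day count, ordinary arithmetic applies to it. The sketch below is an illustration added here, not part of the original text; it assumes standard q date arithmetic and is written as comparisons so that it reads the same regardless of the default integer width of the q version in use.

```q
d:2006.07.04
31=2000.02.01-2000.01.01      / 1b: subtracting two dates gives the number of days between them
(d+1)=2006.07.05              / 1b: adding an integer moves the date forward by that many days
2000.01.01=`date$0            / 1b: day count 0 is Jan 1, 2000
```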

time

A time is stored in four bytes and is denoted by hh:mm:ss.uuu where hh represents hours on the 24-hour clock, mm represents minutes, ss represents seconds, and uuu represents milliseconds. A time value stores the count of milliseconds from midnight.

        t:09:04:59.000
        t
09:04:59.000

Again, leading zeros are required in all constituents of a time.

Advanced: The underlying millisecond count can be obtained by casting to int.

        `int$12:34:56.789
45296789

datetime

A datetime is the combination of a date and a time, separated by 'T' as in the ISO standard format. A datetime value stores the fractional day count from midnight Jan 1, 2000.

        dt:2006.07.04T09:04:59.000
        dt
2006.07.04T09:04:59.000

Advanced: The underlying fractional day count can be obtained by casting to float.

        `float$2000.02.01T12:00:00.000
31.5

month

The month type uses four bytes and is denoted by yyyy.mm with a trailing 'm'. A month value stores the count of months since January 2000.

        mon:2006.07m
        mon
2006.07m

Advanced: The underlying month offset can be obtained by casting to int.

        `int$2000.04m
3

minute

The minute type uses four bytes and is denoted by hh:mm. A minute value stores the count of minutes from midnight.

        mm:09:04
        mm
09:04

Note: We did not use min for the variable name because min is a reserved name in q.

Advanced: The underlying minute offset can be obtained by casting to int.

        `int$01:23
83

second

The second type uses four bytes and is denoted by hh:mm:ss. A second value stores a count of seconds from midnight.

        sec:09:04:59
        sec
09:04:59

The representation of the second type makes it look like an everyday time value. However, a q time value is a count of milliseconds from midnight, so the underlying values are different.

Advanced: The underlying values can be obtained by casting to int. This manifests the inequality.

        `int$12:34:56
45296
        `int$12:34:56.000
45296000
        12:34:56=12:34:56.789
0b

Constituents and Dot Notation

The constituents of dates, times and datetimes can be extracted using dot notation. The individual field values are all extracted as int. The field values of a date are named 'year', 'mm' and 'dd'.

        d:2006.07.04
        d.year
2006
        d.mm
7
        d.dd
4

Similarly, the field values of time are 'hh', 'mm', 'ss'.

        t:12:45:59.876
        t.hh
12
        t.mm
45
        t.ss
59

Note: At the time of this writing (Jun 2007) there is no syntax to retrieve the millisecond constituent. Use the construct,

        t mod 1000
876

In addition to the individual field values, you can also extract higher-order constituents.

        d.month
2006.07m
        t.minute
12:45
        t.second
12:45:59

Of course, this works for a datetime as well.

        dt:2006.07.04T12:45:59.876
        dt.date
2006.07.04
        dt.time
12:45:59.876
        dt.month
2006.07m
        dt.mm
7
        dt.minute
12:45

Advanced: It is a quirk in q that dot notation for accessing temporal constituents does not work on function arguments. For example,

        fmm:{[x] x.mm}
        fmm 2006.09.15
{[x] x.mm}
'x.mm

Instead, cast to the constituent type,

        fmm:{[x] `mm$x}
        fmm 2006.09.15
9
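The same casting approach extends to the other constituents. The following is a minimal sketch, assuming the standard q cast names `year, `dd, `month, `hh, `ss, `minute and `second; the function names dateparts and timeparts are made up for this illustration. The checks are written as comparisons so that they do not depend on how a particular q version displays integers.

```q
dateparts:{[x] (`year$x; `mm$x; `dd$x)}        / year, month and day of a date
timeparts:{[t] (`hh$t; `ss$t)}                 / hour and second of a time
all 2006 9 15=dateparts 2006.09.15             / 1b
all 12 59=timeparts 12:45:59.876               / 1b
2006.09m={[x] `month$x} 2006.09.15             / 1b: higher-order constituents work the same way
12:45={[t] `minute$t} 12:45:59.876             / 1b
12:45:59={[t] `second$t} 12:45:59.876          / 1b
```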

2.6 Infinities and NaN

In addition to the regular numeric and temporal values, special values represent infinities, whose absolute values are greater than any "normal" numeric or temporal value.

Token  Value
0w     Positive float infinity
0W     Positive int infinity
0Wh    Positive short infinity
0Wj    Positive long infinity
0Wd    Positive date infinity
0Wt    Positive time infinity
0Wz    Positive datetime infinity
0n     NaN, or not a number

Important: Observe the distinction between lower case 'w' and upper case 'W'.

The result of dividing any positive (or unsigned) non-zero value by any zero value is positive float infinity, denoted 0w. Dividing a negative value by zero results in negative float infinity, denoted by -0w. The way to remember these is that 'w' looks like the infinity symbol ∞.

The integral infinities cannot be produced via an arithmetic division on normal int values, since the result of division in q is always a float.

The result of dividing any zero value by any zero value is undefined, so q represents this as the floating point null 0n.

The q philosophy is that any valid arithmetic expression will produce a result rather than an error. Therefore, dividing by 0 produces a special float value rather than an exception. You can perform a complex sequence of calculations without worrying about things blowing up in the middle or inserting cumbersome exception trapping. We shall see more about this in Primitive Operations.
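The division rules above can be checked directly. This is a small illustrative sketch rather than part of the original text; it uses the q keywords null and neg, which the tutorial has not introduced at this point.

```q
0w=1%0              / 1b: a positive value over zero is positive float infinity
(-1%0)=neg 0w       / 1b: a negative value over zero is negative float infinity
null 0%0            / 1b: zero over zero is the float null 0n
-9h=type 6%3        / 1b: division yields a float even when the operands divide evenly
```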

Advanced: While infinities can participate in arithmetic operations, infinite arithmetic is not implemented. Instead, q performs the operation on the underlying bit patterns. Math propeller heads (including the author) find the following disconcerting.

        0W-2
2147483645
 
        2*0W
-2

2.7 Null Values

Overview of Nulls

The concept of a null value generally indicates missing data. This is an area in which q differs from both verbose programming languages and SQL.

In such languages as C++, Java and C#, the concept of a null value applies to complex entities (i.e., objects) that are accessed indirectly by pointer or by reference. A null value for such an entity corresponds to an un-initialized pointer, meaning that it has not been assigned the address of an allocated block of memory. There is no concept of null for entities that are of simple or value type. For those types that admit null, you test for being null by asking if the value is equal to null.

The NULL value in SQL indicates that the data value is inapplicable or missing. The NULL value is distinct from any value that can actually be contained in a field and does not have '=' semantics. That is, you cannot test a field for being null with = NULL. Instead, you ask if it IS NULL. Because NULL is a separate value, Boolean fields actually have three states: 0, 1 and NULL.

In q, the situation is more interesting. While most types have distinct null values, some types have no designated way of representing a null value.

The following table summarizes the way nulls are handled.

type      null
boolean   0b
byte      0x00
short     0Nh
int       0N
long      0Nj
real      0Ne
float     0n
char      " "
symbol    `
month     0Nm
date      0Nd
datetime  0Nz
minute    0Nu
second    0Nv
time      0Nt

Binary Nulls

Let's start with the binary types. As you can see, they have no special null value, which means that null is equivalent to the value zero. Consequently, you cannot distinguish between a missing boolean value and the value that represents false.

In practice, this isn't an issue, since in most applications it isn't a critical distinction. It can be a problem if the default value of a boolean flag in your application is not zero, so you must ensure that this does not occur. A similar precaution applies to byte values.

Numeric and Temporal Nulls

Next, observe that all the numeric and temporal types have their own designated null values. Here the situation is similar to SQL, in that you can distinguish missing data from data whose underlying value is zero. The difference from SQL is that there is no universal null value.

The advantage of the q approach is that the null values have equals semantics. The tradeoff is that you must use the correct null value in type-checked situations.
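The sketch below illustrates both halves of that tradeoff. It is an added example, not part of the original text: the q keyword null, which tests for a null of any type, has not been introduced yet, and the variable name f is chosen for illustration.

```q
0N=0N                 / 1b: unlike SQL NULL, a q null equals itself
null 1 0N 3           / 010b: flags the missing item in a simple list
null (0n;`;0Nd)       / 111b: each type's own null is recognized
f:1.5 2.5 3.5
f[1]:0n               / the float null matches the float list, so the assignment succeeds
null f                / 010b
```

Assigning the integer null 0N into f instead would be a type-checked situation of the kind described above, and it should signal a type error.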

Character Nulls

Finally, we consider the character types. Considering a symbol as a variable-length collection of characters explains why the symbol null value is the empty symbol, designated by a back-tick (` ).

In contrast, the null value for the char type is the char consisting of the blank character ( " " ). As with binary data, you cannot distinguish between a missing char value and a blank value. Again, this is not seriously limiting in practice, but you should ensure that your application does not rely on this distinction.

Note: The value "" is not the char null. Instead, it is the empty list of char.
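The difference between the blank char and the empty list of char can be seen from their types and counts. This is an added illustration that relies on the q keywords null, type and count behaving as in standard q; the type numbers are those listed in the data type table earlier.

```q
null " "            / 1b: the blank char is the char null
-10h=type " "       / 1b: a negative type number marks an atom
10h=type ""         / 1b: "" is a list of char
0=count ""          / 1b: and it has no items
```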


 

 

 


3. Lists

Overview

Data complexity is built up from atoms, which we know, and lists. It is important to achieve a thorough understanding of lists since nearly all q programming involves processing lists. The concepts are simple but complexity can build rapidly. Our approach is to introduce the basic notion of a general list in the first section, take a quick detour to cover simple and singleton lists, then return to cover general lists in more detail.

Introduction to Lists

A list is simply an ordered collection. A collection of what, you ask. More precisely, a list is an ordered collection of atoms and other lists. Since this definition is recursive, let's start with the simplest case in which the list comprises only atoms.

List Definition and Assignment

The notation for a general list encloses its items within matching parentheses and separates them with semicolons. For readability, optional whitespace is used after the semicolon separators in the last example below.

        (1;2;3)
 
        ("a";"b";"c";"d")
 
        (`Life;`the;`Universe;`and;`Everything)
 
        (-10.0; 3.1415e; 1b; `abc; "z")

In the preceding examples, the first three lists are simple, meaning that the list comprises atoms of uniform type. The last example is a general list, meaning that it is not simple. Otherwise put, a general list contains items that are not atoms of a uniform type. This could be atoms of mixed type, nested lists of uniform type, or nested lists of mixed type.

Important: The order of the items in the list is positional (i.e., left-to-right) and is part of its definition. The lists (1;2) and (2;1) are different. SQL is based on sets, which are inherently unordered. This distinction leads to some subtle differences between the results of queries on q tables versus the result sets from analogous SQL queries. The inherent ordering of lists makes time series processing natural and fast in q, while it is cumbersome and performs poorly in standard SQL.

Lists can be assigned to variables exactly like atoms.

        L1:(1;2;3)
 
        L2:("z";"a";"p";"h";"o";"d")
 
        L3:(`Life;`the;`Universe;`and;`Everything)
 
        L4:(0b;1b;0b;1b;1b;0b)
 
        L5:(-10.0;3.1415e;1b;`abc;"z")

count

The number of items in a list is its count. You can obtain the count of a list as follows,

        count L1
3

This is our first example of a function, which we will learn about in Functions. For now, we need only understand that count returns an int value equal to the number of items in a list to its right.

Observe that the count of any atom is 1.

        count 42
1
        count `abcd
1

Simple Lists

A simple list - that is, a list of atoms of a uniform type - corresponds to the mathematical notion of a vector. Such lists are treated specially in q. They have a simplified notation, take less storage and compute faster than general lists. Of course, you can use general list notation for a vector, but q converts a general list to a vector whenever feasible.

Simple Integer Lists

A simple list of any numeric type omits the enclosing parentheses and replaces the separating semi-colons with blanks. The following two expressions for a simple list of int are equivalent,

        (100;200;300)
 
        100 200 300

This is confirmed by the console display,

        (100;200;300)
100 200 300

Similar notation is used for simple lists of short and long with the addition of the type indicator.

        H:(1h;2h;255h)
        H
1 2 255h

We conclude that a trailing type indicator in the display applies to the entire list and not just the last item of the list; otherwise, the list would not be simple and would be displayed in general form.

        G:(1; 2; 255h)
        G
1
2
255h

Simple Floating Point Lists

Simple lists of float and real are notated similarly. Observe that the q console suppresses the decimal point when displaying a float having zero(s) to the right of the decimal, but the value is not an int.

        F:(123.4567;9876.543;99.0)
        F
123.4567 9876.543 99

This notational efficiency for float display means that a list of floats having no decimal parts displays with a trailing f.

        FF:1.0 2.0 3.0
        FF
1 2 3f

Simple Binary Lists

The simplified notation for a simple list of binary data juxtaposes the individual data values together with a type indicator. The type indicator for boolean trails the value.

        bits:(0b;1b;0b;1b;1b)
        bits
01011b

The indicator for byte leads,

        bytes:(0x20;0xa1;0xff)
        bytes
0x20a1ff

Note: A simple list of boolean atoms requires the same number of bytes to store as it has atoms. While the simplified notation is suggestive, multiple bits are not compressed to fit inside a single byte. The list bits above holds its values in 5 bytes of storage.

Simple Symbol Lists

The simplified notation for simple lists of symbols juxtaposes the individual atoms with no intervening whitespace.

        symbols:(`Life;`the;`Universe;`and;`Everything)
        symbols
`Life`the`Universe`and`Everything

Inserting spaces between the atoms causes an error.

        bad:`This `is `wrong
'is

Simple char Lists and Strings

The simplified notation for a list of char looks just like a string in most languages, with the juxtaposed sequence of characters enclosed in double quotes.

        chars:("s";"o";" ";"l";"o";"n";"g")
        chars
"so long"

Note: A simple list of char is called a string.

Entering Simple Lists

Lists can be defined using simplified notation,

        L:100 200 300
 
        H:1 2 255h
 
        F:123.4567 9876.543 99.99
 
        bits:01011b
 
        bytes:0x20a1ff
 
        symbols:`Life`the`Universe`and`Everything
 
        chars:"so long"

Finally, we observe that a list entered as intermixed ints and floats is converted to a simple list of floats.

        1 2.0 3
1 2 3f
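One way to confirm the conversion is to look at the type of the result. The sketch below is an added illustration; it relies on the type numbers from the data type table (9 for float) and on the convention, used later in this chapter, that a general list has type 0h. It also shows that the conversion applies to the blank-separated simplified notation and not to the parenthesized general notation.

```q
9h=type 1 2.0 3         / 1b: simplified notation yields a simple list of float
0h=type (1;2.0;3)       / 1b: general notation keeps the mixed items as entered
```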

Lists of Temporal Data

Specifying a list of mixed temporal types has a different behavior from that of a list of mixed numeric types. In this case, the list takes the type of the first item in the list; other items are widened or narrowed to match.

        12:34 01:02:03
12:34 01:02
 
        01:02:03 12:34
01:02:03 12:34:00

To force the type of a mixed list of temporal values, append a type specifier.

        01:02:03 12:34 11:59:59.999u
01:02 12:34 11:59

Empty and Singleton Lists

Lists with one or no items merit special consideration.

The General Empty List

It is useful to have lists with no items. A pair of parentheses with nothing (except possibly whitespace) between denotes the empty list.

        L:(  )
        L
-

We shall see in Creating Typed Empty Lists that it is possible to define an empty list with a specific type.

Lists with a Single Item

There is a quirk in q regarding how it handles a list containing a single item, called a singleton. Creation of a singleton presents a notational problem. To see the issue, first realize that a list containing a single atom is distinct from the individual atom. As any UPS driver will readily tell you, an item in a box is not the same as an unboxed item. By now, we recognize the following as atoms,

        42
 
        1b
 
        0x2a
 
        `beeblebrox
 
        "z"

We also recognize that the following are all lists with two elements,

        (42;6)
 
        01b
 
        `zaphod`beeblebrox
 
        "zb"
 
        (40;`two)

How do you create a list of a single item? Good question. The answer is that there is no syntactic way to do so. You might think that you could simply enclose the item in parentheses, but this doesn't work since the result is an atom.

        singleton:(42)
        singleton
42

The reason for this is that parentheses are used for multiple purposes in q. As we have seen, paired parentheses are used to delimit items in the specification of a general list. Paired parentheses are also used for grouping in expressions - that is, to isolate the result of the expression inside the parentheses. The latter usage forces (42) to be the same as the atom 42 and so precludes the intention in the specification of singleton above.

The way to make a list with a single item is to use the enlist function, which returns a singleton list containing what is to its right.

        singleton:enlist 42
        singleton
,42

To distinguish between an atom and the equivalent singleton, examine the sign of their types.

        signum type 42
-1
        signum type enlist 42
1

As a final check before moving on, make sure that you understand that the following also defines a list containing a single item,

        singleton:enlist 1 2 3
        count singleton
1

Indexing

Recall that a list is ordered from left to right by the position of its items. The offset of an item from the beginning of the list is called its index. Thus, the first item has index 0, the second item (if there is one) has index 1, etc. A list of count n has index domain 0 to n-1.

Index Notation

Given a list L, the item at index i is accessed by L[i]. Retrieving an item by its index is called item indexing. For example,

        L:(-10.0;3.1415e;1b;`abc;"z")
        L[0]
-10f
        L[1]
3.1415e
        L[2]
1b
        L[3]
`abc
        L[4]
"z"

Indexed Assignment

Items in a list can also be assigned via item indexing. Thus,

        L1:1 2 3
        L1[2]:42
        L1
1 2 42

Important: Index assignment into a simple list enforces strict type matching with no type promotion. Otherwise put, when you reassign an item in a simple list, the type must match exactly and a narrower type is not widened.

        L:100 200 300
        L[1]:42h
'type
 
        f:100.0 200.0 300.0
        f
100 200 300f
        f[1]:400
'type

This may come as a surprise if you are accustomed to numeric values always being promoted to wider types in a verbose language.
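When the value to be assigned has a narrower type, one way to satisfy the strict matching is to convert it explicitly before the assignment. The following is a minimal sketch of that idea, assuming the `float$ cast form used for temporal values earlier; it is an added illustration rather than part of the original text.

```q
f:100.0 200.0 300.0
f[1]:`float$400       / cast the integer to float so the types match
f[2]:500f             / or enter the value with a float suffix
f                     / 100 400 500f
```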

Indexing Domain

Providing an invalid data type for the index results in an error.

        L:(-10.0;3.1415e;1b;`abc;"z")
        L[`1]
'type

If you attempt to index outside of the bounds of the list, the result is not an error. Rather, you get a null value. If the list is simple, this is the null for the type of atoms in the list. For general lists, the result is 0n.

        L[5]
0n

One way to understand this is that the result of asking for a non-existent index is "missing value." Keep this in mind, since indexing one position past the end of the list is easy to do, especially if you're not used to indexing relative to 0.
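For simple lists, the null returned depends on the type of the list. The sketch below is an added illustration with made-up variable names; it uses the q keyword null to test the result so that the checks do not depend on how each null is displayed.

```q
ints:10 20 30
syms:`a`b`c
chars:"abc"
null ints[3]          / 1b: the integer null
null syms[3]          / 1b: the empty symbol
" "=chars[3]          / 1b: the blank char
```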

Empty Index and Null Item

An empty index returns the entire list.

        L[]
-10f
3.1415e
1b
`abc
"z"

Note: An empty index is not the same as indexing with an empty list. The latter returns an empty list.

        L[()]
_

The syntactic form double-colon ( :: ) denotes the null item, which allows explicit notation or programmatic generation of an empty index.

        L[::]
-10f
3.1415e
1b
`abc
"z"

Advanced: The type of the null item is undefined; in particular, its type does not match that of any normal item in a list. As a consequence, inclusion of the null item in a list forces the list to be general.

        L:(1;2;3;::)
        L
1
2
3
::
        type L
0h

This can be used to avoid a nasty surprise when q is too clever. To see how, consider the general list,

        L:(1;2;3;`a)
        type L
0h

Now, reassign the last item to an int and note what happens to the list.

        L[3]:4
        L
1 2 3 4
        type L
6h

The list has been converted to a simple list of int! A subsequent attempt to reassign the last item back to its original value fails with a type error.

        L[3]:`a
'type

This can be circumvented by placing a null item in the list, forcing it to remain general.

        L:(1;2;3;`a;::)
        L[3]:4
        L
1
2
3
4
::
       type L
0h
        L[3]:`a
        L
1
2
3
`a
::

Lists from Variables

Lists can be created from variables.

        L1:(1;2;100 200)
        L2:(1 2 3;`ab`c)
 
        L6:(L1;L2)
        L6
1     2   100 200
1 2 3 `ab`c

Joining Lists

We preview the next chapter's presentation of operations to describe an important operation on lists. Probably the most common operation on two lists is to join them together to form a larger list. More precisely, the join operator (,) appends its right operand to the end of the left operand and returns the result. It accepts an atom in either argument.

        1 2,3 4 5
1 2 3 4 5
        1,2 3 4
1 2 3 4
        1 2 3,4
1 2 3 4

Observe that if the arguments are not of uniform type, the result is a general list.

        1 2 3,4.4 5.5
1
2
3
4.4
5.5
        1 2 3,"ab"
1
2
3
"a"
"b"

Note: To accept either a scalar or a list x and produce a uniform shape, use the idiom,

        (),x

which always yields a list with the content of x.
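The idiom can be wrapped in a small helper. This is an added sketch: the name aslist is hypothetical, and the match operator (~), which tests whether two values are identical, has not been introduced in the text at this point.

```q
aslist:{(),x}                   / hypothetical helper wrapping the idiom
1=count aslist 42               / 1b: an atom becomes a singleton list
(aslist 42)~enlist 42           / 1b: the same result as enlist
(aslist 1 2 3)~1 2 3            / 1b: a list is returned unchanged
```

A function that must loop over its argument can apply the idiom first and then treat atoms and lists uniformly.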

Lists as Maps

Thus far, we have viewed a list as a static collection of its items. We can also consider a list to be a mapping provided by item indexing. Specifically, a list L of count n represents a monadic mapping over the domain of non-negative integers 0,...,n-1. The list mapping assigns the output value L[i] to the input value i. Succinctly, the I/O association for the list is,

        i --> L[i]

Here are the I/O tables for some basic lists:

101 102 103 104

I  O
0  101
1  102
2  103
3  104

(`a; 123.45; 1b)

I  O
0  `a
1  123.45
2  1b

(1 2; 3 4)

I  O
0  1 2
1  3 4

The first two examples demonstrate ranges of a collection of atoms. The last example has a range comprised of lists.

A list not only looks like a map, it is a map whose notation is a shortcut for the I/O table assignment. This is a useful way of looking at things. We shall see in Primitive Operations that a nested list can be viewed as a multivalent map whose range is atoms.

From the perspective of list as map, the fact that indexing outside the bounds of a list returns null means the map is implicitly extended to the domain of all integers with null values outside the list items.

Nesting

Data complexity is built by using lists as items of lists.

Depth

Now that we're comfortable with simple lists, we return to general lists. We can nest by including lists as items of lists. The number of levels of nesting for a list is called its depth. Atoms are considered to have depth 0 and simple lists have depth 1.

The notation of complex lists reflects their nesting. For pedagogical purposes, in this section, we shall often use general notation to define even simple lists; however, the console always displays lists in simplified form. In subsequent sections, we shall use only simplified notation for simple lists.

Following is a list of depth 2 that has three items, the first two being atoms and the last a list.

        L1: (1;2;(100;200))
        count L1
3

Following is the simplified notation for the inner list,

        L1:(1;2;100 200)
        L1
1
2
100 200

Pictorial Representation

We present a pictorial representation that may help in visualizing levels of nesting. An atom is represented as a circle containing its value. A list is represented as a box containing its items. A general list is a box containing boxes and atoms.

Examples

Following is a list of depth two having two elements, each of which is a simple list,

        L2:((1;2;3);(`ab;`c))
        L2
1 2 3
`ab`c
        count L2
2

Following is a list of depth two having three elements, each of which is a general list,

        L3:((1;2h;3j);("a";`bc);(1.23;4.56e))
        L3
(1;2h;3j)
("a";`bc)
(1.23;4.55999994278e)
        count L3
3

Following is a list of depth two having one item that is a simple list,

        L4:enlist 1 2 3 4
        L4
1 2 3 4
        count L4
1
        L4[0]
1 2 3 4

Following is a list of depth three having two items. The second item is a list of depth two having three items, the last of which is a simple list of four items.

        L5:(1;(100;200;(1000;2000;3000;4000)))
        L5
1
(100;200;1000 2000 3000 4000)
        count L5
2
        count L5[1]
3

Following is a "rectangular" list that can be thought of as a 3x4 matrix,

        m:((11;12;13;14);(21;22;23;24);(31;32;33;34))
        m
11 12 13 14
21 22 23 24
31 32 33 34

Indexing at Depth

It is possible to index directly into the items of a nested list.

Repeated Item Indexing

Retrieving an item via a single index always retrieves an uppermost item from a nested list.

        L:(1;(100;200;(1000;2000;3000;4000)))
        L[0]
1
        L[1]
100
200
1000 2000 3000 4000

Recalling that q evaluates expressions from right-to-left, we interpret the second retrieval above as,

·        Retrieve the item at index 1 from L

Alternatively, reading it functionally as left-of-right,

·        Retrieve from L the item at index 1

Since the result L[1] is itself a list, we can retrieve its elements using a single index.

        L[1][2]
1000 2000 3000 4000

Read this as:

·        Retrieve the item at index 2 from the item at index 1 in L

or,

·        Retrieve the item at index 1 from L, and from it retrieve the item at index 2

We can repeat single indexing once more to retrieve an item from the innermost nested list.

        L[1][2][0]
1000

Read this as,

·        Retrieve the item at index 0 from the item at index 2 in the item at index 1 in L

or,

·        Retrieve the item at index 1 from L, and from it retrieve the item at index 2, and from it retrieve the item at index 0

Notation for Indexing at Depth

There is an alternate notation for repeated indexing into the constituents of a nested list. The last retrieval can also be written as,

        L[1;2;0]
1000

Retrieving inner items for a nested list with this notation is called indexing at depth.

Important: The semicolons in indexing at depth are critical.

Assignment via index also works at depth.

        L:(1;(100;200;(1000 2000 3000 4000)))
        L[1;2;0]:999
        L
1
(100;200;999 2000 3000 4000)

To verify that the notation for indexing at depth is reasonable, we return to our matrix example,

        m:((11;12;13;14);(21;22;23;24);(31;32;33;34))
        m[0;2]
13
        m[0][2]
13

The indexing at depth notation suggests thinking of m as a multi-dimensional matrix, whereas repeated single indexing suggests thinking of m as an array of arrays. Chacun à son goût.

List Indexing

A list of positions can be used to index a list.

Retrieving Multiple Items

In this section, we begin to see the power of q for manipulating lists. We start with,

        L1:100 200 300 400

We know how to index single items of the list

        L1[0]
100
        L1[2]
300

By extension, we can retrieve a list of multiple items via multiple indices,

        L1[0 2]
100 300

The indices can be in any order, and the corresponding items are retrieved,

        L1[3 2 0 1]
400 300 100 200

An index can be repeated,

        L1[0 2 0]
100 300 100

Some more examples,

        bits:01101011b
        bits[0 2 4]
011b
        chars:"beeblebrox"
        chars[0 7 8]
"bro"

This explains why including the semi-colon separators is essential when indexing at depth. Leaving them out effectively specifies multiple indices, and you will get a corresponding list of values from the top level as a result.
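
To see the difference side by side, here is a small sketch that reuses the matrix m from the indexing at depth discussion; expected results are given as comments.

```q
m:(11 12 13 14;21 22 23 24;31 32 33 34)
m[0;2]                            / 13: item 2 of item 0
m[0 2]                            / items 0 and 2 of m, i.e. two whole rows
m[0 2]~(11 12 13 14;31 32 33 34)  / 1b
```

The only textual difference between the two retrievals is the semicolon, yet the first returns an atom and the second a list of two rows.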

Indexing via a Simple List

You have no doubt noticed that retrieving items via multiple indices looks just like we've substituted a list for the index. Indeed, this is exactly what is happening. Here are some examples of a simple index list,

        I:3 2 0
        L1[I]
400 300 100
        L2:(-10.0;3.1415e;1b;`abc;"z")
        L2[I]
`abc
1b
-10f
        L3:(1;(100;200;(1000;2000;3000;4000));5;(600 700))
        L3
1
(100;200;1000 2000 3000 4000)
5
600 700
 
        J:2 1 0
        L3[J]
5
(100;200;1000 2000 3000 4000)
1
1

Indexing via a General List

Observe that in every case, the result of indexing a given list via a simple list is a new list whose values are retrieved from the first level of the given list and whose shape is the same as that of the index list. This suggests the behavior with an index that is a non-simple list.

        L1:100 200 300 400
        L1[(0 1; 2 3)]
100 200
300 400
 
        I:(1;(0;(3 2)))
        L1[I]
200
(100;400 300)

To figure out the result of indexing by any non-simple list, start with the fact that the result always has the same shape as the index.

Advanced: More precisely, the result of indexing via a list conforms to the index list. The notion of conformability of lists is defined recursively. All atoms conform. Two lists conform if they have the same number of items and each of their corresponding items conform. In plain language, two lists conform if they have the same shape.
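
The claim that the result conforms to the index can be checked level by level. The following sketch reuses L1 and I from the example above; the name R is ours.

```q
L1:100 200 300 400
I:(1;(0;3 2))
R:L1[I]
R~(200;(100;400 300))      / 1b
(count R)=count I          / 1b: same number of items at the top level
(count R 1)=count I 1      / 1b: and again one level down
```

Only counts are compared here; a full conformability test would recurse through every level in the same way.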

Assignment with List Indexing

Recall that a list item can be assigned via item indexing,

        L:100 200 300 400
        L[0]:1000
        L
1000 200 300 400

Assignment via index extends to indexing via a simple list.

        L:100 200 300 400
        L[1 2 3]:2000 3000 4000
        L
100 2000 3000 4000

Note: Assignment via a simple index list is processed in index order - i.e., from left-to-right. Thus,

        L[3 2 1]:999 888 777

is equivalent to,

        L[3]:999
        L[2]:888
        L[1]:777

Consequently, in the case of a repeated item in the index list, the right-most assignment prevails.

        L:100 200 300 400
        L[0 1 0 3]:1000 2000 3000 4000
        L
3000 2000 300 4000

You can assign a single value to multiple items in a list by indexing on a simple list and using an atom for the assignment value.

        L:100 200 300 400
        L[1 3]:999
        L
100 999 300 999
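
Retrieval and assignment via an index list can be combined. As an illustration of our own (not taken from the examples above), the following sketch swaps the first and last items of a list in a single statement.

```q
L:100 200 300 400
L[0 3]:L[3 0]      / the right side is evaluated first, then assigned item by item
L                  / 400 200 300 100
```

This works because the expression to the right of the colon is fully evaluated before any assignment takes place.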

Juxtaposition

Now that we're familiar with retrieving and assigning via an index list, we introduce a simplified notation. It is permissible to leave out the brackets and juxtapose the list and index with a separating blank. Some examples follow.

        L:100 200 300 400
        L[0]
100
        L 0
100
        L[2 1]
300 200
        L 2 1
300 200
 
        I:2 1
        L[I]
300 200
        L I
300 200
 
        L[::]
100 200 300 400
        L ::
100 200 300 400

Which notation you use is a matter of personal preference. In this manual, we usually use brackets, since this notation is probably most familiar from verbose programming. Experienced q programmers often use juxtaposition since it reduces notational density.

Find (?)

The dyadic primitive find ( ? ) returns the index of the right operand in the left operand list.

        1001 1002 1003?1002
1

Performing find on a list is the inverse to positional indexing because it maps an item to its position.

If you try to find an item that is not in the list, the result is an int equal to the count of the list.

        1001 1002 1003?1004
3

The way to think of this result is that the position of an item that is not in the list is one past the end of the list, which is where it would be if you were to append it to the list.

Of course, find extends to lists of items.

        1001 1002 1003?1003 1001
2 0
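
Two consequences of these rules can be sketched as follows, with the list from the examples assigned to a variable we call L. The membership test in the second expression is our own illustration of the "one past the end" convention.

```q
L:1001 1002 1003
L[L?1002]                 / 1002: find followed by indexing returns the item
(L?1002 1004)<count L     / 10b: 1002 is present, 1004 is not
```

Comparing the result of find against the count of the list is a simple way to tell found items from missing ones.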

Elided Indices

Eliding Indices for a Matrix List

We return to the situation of indexing at depth for nested lists. For simplicity, let's start with a list that looks like a matrix.

        m:(1 2 3 4; 100 200 300 400; 1000 2000 3000 4000)

Analogy with traditional matrix notation suggests that we could retrieve a row or column from m by providing a "partial" index at depth. Indeed, this works.

        m[1;]
100 200 300 400
 
        m[;3]
4 400 4000

Observe that eliding the last index reduces to item indexing at the top level.

        m[1;]
100 200 300 400
 
        m[1]
100 200 300 400

Note: In the previous example, the two syntactic forms have the same result, but the first more clearly connotes the situation.

The situation of eliding other than the last index is more interesting. The way to read m[;3] above is,

·        Retrieve the items in the fourth position (index 3) from all items at the top level of m

Eliding Indices for a General List

Let's tackle another level of nesting.

        L:((1 2 3;4 5 6 7);(`a`b`c`d;`z`y`x`;`0`1`2);("now";"is";"the"))
        L
(1 2 3;4 5 6 7)
(`a`b`c`d;`z`y`x`;`0`1`2)
("now";"is";"the")
 
        L[;1;]
4 5 6 7
`z`y`x`
"is"
        L[;;2]
3 6
`c`x`2
"w e"

Interpret L[;1;] as,

·        Retrieve all items in the second position of each list at the top level

Interpret L[;;2] as,

·        Retrieve the items in the third position for each list at the second level

Observe that in L[;;2] the attempt to retrieve the item at the third position of the string "is" resulted in the null value " "; hence the blank in "w e" of the result.

Recommendation: In general, it will make things more evident if you do not omit trailing semi-colons when eliding indices. For example, with L as above,

        L[ ;;]                 / instead of L[]
        L[1;;]                / instead of L[1]
        L[;1;]                / instead of L[;1]

As the final exam for this section, let's combine an elided index with indexing by simple arrays. Let L be as above. Then we can retrieve a cross-section of L using a combination of elided and list indices.

        L[0 2;;0 1]
(1 2;4 5)
("no";"is";"th")

Interpret this as,

·        Retrieve the items from positions 0 and 1 from all columns in rows 0 and 2

Rectangular Lists and Matrices

Rectangular Lists

In this section, we further investigate the matrix-like lists from the previous section. A "rectangular" list is a list of lists, all having the same count. Understand that this does not mean that a rectangular list is necessarily a traditional matrix, since there can be additional levels of nesting. For example, the following list is rectangular because each of its items has count three, but is not a matrix.

        L:(1 2 3; (10 20; 100 200; 1000 2000))
        L
1         2         3
10   20   100  200  1000 2000

In a rectangular list, elision of the second index corresponds to generalized row retrieval and elision of the first index corresponds to generalized column retrieval.

        r:(`a`b`c;(1 2 3 4;10 20 30 40;100 200 300 400))
        r[0;]
`a`b`c
        r[;1]
`b
10 20 30 40

Advanced: A rectangular list can be transposed with flip (see flip), meaning that the rows and columns are reflected, effectively reversing the first two indices in indexing at depth. For example, the transpose of L above is,

        flip L
1 10 20
2 100 200
3 1000 2000

Matrices

Matrices are a special case of rectangular lists and can most easily be defined recursively. A matrix of dimension 1 is a simple list. In the context of mathematical operations, the simple list would have numeric type, but this is not a restriction. The count of a one-dimensional matrix is called its size. In some contexts, a simple one-dimensional matrix is called a vector, its count is called its length, and an atom is called a scalar. Some examples.

        v1:1 2 3
        v2:98.60 99.72 100.34 101.93
        v3:`so`long`and`thanks`for`all`the`fish

For n>1, we define a matrix of dimension n recursively as a list of matrices of dimension n-1 all having the same size. Thus, a matrix of dimension 2 is a list of matrices of dimension 1, all having the same size. If all items in a matrix have the same type, we call this the type of the matrix.

Two and Three Dimensional Matrices

Two-dimensional matrices are frequently encountered and have special terminology. Let m be a two-dimensional matrix. The items of m are its rows. As we have already seen, the ith row of m can be obtained via item indexing as m[i]. Equivalently, we can use an elided index with indexing at depth to obtain the ith row as m[i;].

By laying out the rows of m in tabular form, we realize that the list m[;j] is the jth column of m. Note that the expressions m[i][j] and m[i;j] both retrieve the same item - namely, the element in row i and column j.

Following is an example of a two dimensional matrix of int, having size 4x3,

        m:(1 2 3;10 20 30;100 200 300;1000 2000 3000)
        m[0]
1 2 3
        m[0;]
1 2 3
        m[;2]
3 30 300 3000
        m[0][2]
3
        m[0;2]
3

The specification of m demonstrates that our approach to matrix definition treats m as a collection of rows - i.e., m is in row order. Since each row is a simple list, the elements of a row are in fact stored in contiguous memory. This makes retrieval of an entire row very fast, but retrieval of a column will be slower since its elements are not contiguous. This choice was made so that list indexing would result in the conventional matrix notation.
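
When columns are needed repeatedly, one option is to transpose once with flip, mentioned earlier for rectangular lists, and then retrieve columns as items. The following is a sketch only; the name t is ours, and no timing claim is made.

```q
m:(1 2 3;10 20 30;100 200 300;1000 2000 3000)
t:flip m            / t holds the columns of m as its items
t[2]                / 3 30 300 3000
t[2]~m[;2]          / 1b
```

After the transpose, each column of m is itself a simple list, so retrieving it is an ordinary item retrieval.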

Advanced: It is equally valid to consider a one-dimensional array as a column and a two-dimensional array as a collection of column vectors. This would make column retrieval very fast, but index order would be transposed from conventional notation. As we shall see in Tables, a table is in fact a collection of columns that are notationally transposed for convenience. The constraints and calculations of q-sql operate on columns, so they are fast, especially when the columns are vectors (i.e., simple lists). In particular, a simple time series can be represented by two parallel ordered columns, one holding the datetimes and the second holding the associated values. Retrieving and manipulating the points stored in time sequence is faster by orders of magnitude than performing the same operations in an RDBMS that stores data by row with undefined row order.

For completeness, here is an example of a three dimensional 2x3x3 matrix - i.e., each item of mm is a 3x3 matrix,

        mm:((1 2 3;4 5 6;7 8 9);(10 20 30; 40 50 60; 70 80 90))
        mm[0]
1 2 3
4 5 6
7 8 9
        mm[1;2]
70 80 90
        mm[1;;2]
30 60 90

Matrix Flexibility

We have seen that matrices in q look and act like their mathematical counterparts. However, they have additional features not available in simple mathematical notation or in many verbose languages. We have seen that a matrix can be viewed and manipulated both as a multi-dimensional array (i.e., indexing at depth) and as an array of arrays (repeated item indexing). In addition, we can extend individual item indexing with indexing via a simple list. With m as above,

        m[0 2]
1   2   3
100 200 300

 

 

4. Primitive Operations

Introduction to Functions

Function Notation

Operators and functions are closely related. In fact, operators are just functions used with infix notation. We cover functions in depth in Functions, but provide a brief overview here. Function evaluation in q uses square brackets to enclose the arguments and semicolons to separate them. Thus the output value of a monadic function f for the input x is written,

        f[x]

Similarly, the value of a dyadic function is written,

        f[x;y]

The simplest functions are those whose domain and range are atomic data types. These functions are called (what else?) atomic functions.

Primitives, Verbs and Functional Notation

The normal way of writing addition in mathematics and most programming languages uses an operator with infix notation - that is, a plus symbol between the two operands,

        2+3

In q, we can also consider addition to be a dyadic function that takes two numeric arguments and returns a numeric result. You probably wouldn't think twice at seeing,

        sum[a;b]

But you might blink at the following perfectly logical equivalent,

        +[a;b]

A dyadic function that is written with infix notation is called a verb. This terminology arises from thinking of the left operand as the subject which acts on the right operand as object.

The primitive operators are the built-in atomic verbs, including the basic arithmetic, relation and comparison operators. Some are represented by a single ASCII symbol such as '+', '-', '=', and '<'. Others use compound symbols, such as '<=', '>=', and '<>'. Still others have names such as 'not', 'neg'. The extent of operations is not limited to the primitives, since any monadic or dyadic function can be made into a verb.

Any verb, including all the primitive operators, can also use regular function notation. So, in q you can write,

        +[2;3]
5

It is even possible, and sometimes useful, to write a binary verb using a combination of infix and functional notation for the two operands. This may look very strange at first,

        (2+)[3]
5

It is even possible to write,

        (2+)3
5
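
A verb with one operand fixed can also be given a name and reused. The following sketch uses bracket notation with an elided second argument; the name add2 is ours.

```q
add2:+[2;]          / project the first argument of + onto 2
add2 3              / 5
add2 10 20 30       / 12 22 32
add2[3]~(2+)3       / 1b
```

The named form and the parenthesized form (2+) are applied in the same way.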

Item-wise Extension of Atomic Functions

A fundamental feature of an atomic function or operator is that its domain is extended to lists by item-wise application. Thus, a monadic atomic function is applied to a simple list by operating element-wise on the list. A dyadic atomic operator is extended to operate on an atom and a simple list by applying its operation to the atom and the items in each position of the list. Similarly, a dyadic atomic operator is extended to operate on a pair of simple lists by operating pair-wise on elements in corresponding positions.

Symbolically, let m be a unary atomic verb, op a binary atomic verb, a an atom, L, L1 and L2 simple lists, and i an int index. Then,

i-th element of    is
m[L]               m[L[i]]
a op L             a op L[i]
L op a             L[i] op a
L1 op L2           L1[i] op L2[i]

For example, the result of applying neg to a simple list is obtained by application to each item of the list.

        L:100 200 300 400
        neg L
-100 -200 -300 -400

The result of adding an atom to a simple list is obtained by adding the atom to each item of the list.

        99+L
199 299 399 499

The result of adding two simple lists of the same length is addition of items at corresponding positions.

        L1:100 200 300 400
        L2:9 8 7 6
        L1+L2
109 208 307 406
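
The requirement that the two lists have the same length matters. The following sketch confirms the pair-wise result and then uses protected evaluation to show what happens when the lengths differ; the error handler {x} simply returns the error text.

```q
L1:100 200 300 400
L2:9 8 7 6
(L1+L2)~(100+9;200+8;300+7;400+6)    / 1b
@[{x+10 20};L1;{x}]                  / "length": the two lists do not have the same count
```

Without the protected evaluation, the same addition typed at the console reports a length error.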

Operator Precedence

Traditional Operator Precedence

Recall that mathematical notation and verbose programming languages have a concept of operator precedence, which attempts to resolve ambiguities in the evaluation of arithmetic and logical operations in expressions. The arithmetic precedence rules were drummed into you in elementary school: multiplication and division are equal and come before addition and subtraction, etc. There are similar precedence rules for =, <, >, 'and' and 'or'.

Left of Right Evaluation

Although the traditional notion of operator precedence has the weight of many years of incumbency (not to mention the imprecations of your fifth grade math teacher), it's time to throw the bum out. As mentioned in Atoms, q has no rules for operator precedence. Instead, it has one simple rule for evaluating any expression:

Expressions are evaluated left of right

We could also say "right to left" since the interpreter evaluates an expression from right-to-left. However, every action in q is essentially a function evaluation, and it is more natural to read "f of x" rather than "x evaluated by f". Thinking functionally makes "of" a paradigm, not just a preposition.

The adoption of left-of-right expression evaluation frees q to treat infix notation simply and uniformly. Which notation is used, infix or functional, depends on what is clearer in the specific context.

Left-of-right expression evaluation also means that there is no ambiguity in any expression. (This is from the interpreter's perspective; it is certainly possible to write q expressions comprehensible to only the interpreter and q gods). Parentheses can still be used to override the default evaluation order but there will be far fewer once you abandon the old (bad) habit of using them to override operator precedence. You should arrange your expressions with a goal of placing parentheses on the endangered species list.

A Gotcha of Left of Right Evaluation

Due to left-of-right evaluation, parentheses are needed to isolate the result of an expression that is the left operand of a verb. Omitting such parentheses is a common error for q newbies, as this grouping is often unnecessary in verbose languages.

Here is a canonical example, where < and > have their usual meanings. As we shall see shortly, the | operator returns the maximum of its operands; this reduces to "or" for boolean operands. It is a rite of passage of q newbies to write the first expression intending the second,

        x:100
        x<42|x>98
0b
        (x<42)|x>98
1b

The first expression parses from right to left as:

·        x is tested against 98 by greater than, yielding 1b, which is compared for the larger to 42, yielding 42, against which x is tested by less than, yielding 0b.

The second expression parses from right to left as:

·        x is tested against 98 by greater than, yielding 1b, which is compared for the larger to 0b (being the result of testing x against 42 by less than), yielding 1b.

Should this seem unnatural, don't worry. Once you complete this chapter, revisit here and it'll feel right as rain.
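
The same rule can be seen with plain arithmetic. The three expressions below are our own illustration, with the expected results as comments.

```q
2*3+4        / 14: 3+4 is evaluated first
(2*3)+4      / 10: parentheses isolate the left operand
10-2-3       / 11: evaluated as 10-(2-3)
```

The last line is the one most likely to surprise, since conventional notation would give 5.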

Rationale for No Operator Precedence

Operator precedence is quite feeble in that it requires all the components of an expression to be analyzed (think for a moment about how you do it manually) before it can be evaluated. Ironically, it results in the frequent use of parentheses to override the very rules that are purportedly there to help.

Even more damning is that operator precedence forces semantic content onto infix notation. Suppose a programming language wished to allow dyadic functions to be verbs - i.e., expressed in infix notation - so that

        f[x;y]

can also be written,

        x f y

This would entail the extension of precedence rules to cover verbs whenever they are mixed with arithmetic operations. Aside from being impractical, this would result in yet more parentheses.

Match (~)

The non-atomic, binary match operator ( ~ ) applies to any two entities, returning a boolean result of 1b if they are identical and 0b otherwise. For two entities to match, they must have the same shape, the same type and the same value(s), but they may occupy separate storage locations. Colloquially, clones are considered identical in q because they are indistinguishable.

Advanced: This differs from the notion of identity in some verbose languages, in that distinct q entities can be identical. For example, in languages of C ancestry, objects are equal if and only if their underlying pointers address the same memory location. Identical twins are not equal. You must write your own equivalence method to determine if one object is a deep copy of another.

There are no restrictions as to the type or shape of the two operands for match. Try to predict each of the following results of match,

        42~42
1b
        42~42h
0b
        42f~42.0
1b
        42~`42
0b
        `42~"42"
0b
        4 2~2 4
0b
        42~(4 2;(1 0))
0b
        (4 2)~(4; 2*1)
1b
        (1 2; 3 4)~(1; 2 3 4)
0b

While you are learning q, applying match can be an effective way to determine if you have entered what you intended, or to discover whether two different ways of expressing something produce the same result. For example, q newbies often trip over

        42~(42)
1b

This technique can be useful in checking intermediate results when debugging (except for the q gods who enter perfect q code every time).

Relational Operators

The relational operators are atomic verbs that return boolean results. Relational operations on atomic types have requirements regarding the compatibility of the operands.

Equality (=) and Inequality (<>)

We begin with the equality operator ( = ), which differs from match in that it is atomic, so it tests its operands component-wise instead of in entirety. All atoms of numeric or char type are mutually compatible for equality, but symbols are compatible only with symbols.

Equality is not strict with regard to type, meaning types with the same underlying value are equal. For example, chars are equal to their underlying values.

        42h=2*21
1b
        42=42.0
1b
        42=(42)
1b
        42=0x42
0b
        42="*"
1b
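
The contrast with match is easiest to see on lists. The following sketch is ours and applies both operators to the same operands.

```q
1 2 3=1 2 4        / 110b: = answers item by item
1 2 3~1 2 4        / 0b: ~ answers once for the whole operands
all 1 2 3=1 2 3    / 1b: all collapses the item-wise result
42=42.0            / 1b
42~42.0            / 0b: match also requires the same type
```

Use = when a result per item is wanted and ~ when a single answer for the whole entity is wanted.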

A symbol and a character are not compatible and an error results from the test,

        `a="a"
'type

The not-equal primitive is ( <> ).

        42<>0x42
1b

Note: The test "not equal" can also be expressed by applying not to the result of testing with =.

        a:42
        b:98.6
        a<>b
1b
        not a=b
1b

Note: When comparing floats, q uses multiplicative tolerance, so that small rounding errors in floating point arithmetic do not spoil comparisons that ought to succeed.

        r:1%3
        r
0.3333333
        2=r+r+r+r+r+r
1b
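
Two further comparisons illustrate how the tolerance behaves. This is a sketch under the assumption of standard double-precision floats; the literals are ours.

```q
0.3=0.1+0.2        / 1b: the two sides differ only in the last binary digits
(1%3)=0.3333333    / 0b: a seven-digit approximation is not within tolerance
```

The tolerance absorbs rounding noise only; values that differ in a displayed digit remain unequal.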

Not Zero (not)

The monadic, atomic relational operator not differs from its equivalent in some verbose languages. It returns a boolean result and has a domain of all numeric and character types; it is not defined for symbols. The not operator generalizes the reversal of true and false values to any entity having an underlying numeric value by testing its argument against an underlying 0. In other words, it answers the Hamletonian question: to be, or not to be, zero.

The test against zero yields the expected results forboolean arguments.

        not 0b
1b
        not 1b
0b

More generally, the test against zero applies to any numeric type.

        not 42
0b
        not 0
1b
        not 0j
1b
        not 0xff
0b
        f:98.6
        not f
0b
        not 0.0
1b

For char values, not returns false except for the character representing the underlying value of 0.

        not "a"
0b
        not " "
0b
        not "\000"
1b

For date and datetime values, not tests against midnight of Jan 1, 2000, since this is the datetime with underlying value 0.

        not 2042.04.02
0b
        not 2000.01.01T00:00:00.000
1b
        not 2000.01.01
1b

The last example holds because omitted temporal constituents default to their underlying numeric 0 values.

For time values, not tests against 00:00:00.000.

         not 00:00:00.000
1b
         not 04:02:42.042
0b

Ordering: <, <=, >, >=

We consider the binary atomic order operators. Less than ( < ), greater than ( > ), less or equal ( <= ) and greater or equal ( >= ) are defined for all atoms with the requirement that the operands be of compatible types. Numeric and char types are mutually compatible, but symbols are only compatible with symbols. Comparison for numeric and char types is based on underlying numeric value, independent of type.

        4<42
1b
        4h>=0x2a
0b
        -1.59e<=99j
1b

For char atoms, the underlying numeric value results in comparison according to ASCII character sequence.

        "A"<"Z"
1b
        "a"<="Z"
0b
        "A"<"0"
0b
        "?"<"/"
0b

A numeric atom and a char are compared according to the underlying numeric value of the char.

        42<"z"
1b

For symbols, comparison is based on lexicographic order.

        `a>=`b
0b
        `ab<`abc
1b

Now that we are familiar with relational operations on atoms, let's examine their item-wise extensions to simple lists.

        2<1 2 3
001b
 
        1 2 3h>=-987.65 1.234 567.89
110b
 
        " "="Life the Universe and Everything"
00001000100000000100010000000000b
 
        "zaphod"="Arthur"
000100b
 
        "zaphod">"Arthur"
100000b

Note: As of this writing (Jun 2007), the primitive > is converted to the equivalent < under the covers by the q interpreter. That is,

        a>b

is actually evaluated as,

        b<a

This does not matter when a and b are atoms or lists, but it does have consequences when they are dictionaries.

Basic Arithmetic: +, -, *, %

The arithmetic operators are atomic verbs and come in two flavors: binary (in the mathematical sense of having two operands) and unary (one operand). We begin with the four operations of elementary arithmetic.

Symbol    Name        Example
+         add         42+67
-         subtract    42-3
*         times       2h*3h
%         divide      42%6

On the surface, things look pretty much like other programming languages, except that division is represented by % since / is used to delimit comments. We have,

        6*7
42
        a:42
        b:3
        c:a-b
        c
39
        100*a
4200
        c%b
13f

Note: The result of division is always a float.

For a programmer not accustomed to left-of-right evaluation, the following may take some getting used to.

        2*1+1
4

Things can get funky fast for the q newbie.

        c:1000*b:1+a:42
        c
43000

One way to read this is:

The integer value 42 is assigned to the variable named a, then the assigned value is added to 1, then this result is assigned to the variable named b, whose assigned value is multiplied by 1000 and the result is assigned to the variable named c

The arithmetic operations are defined for all numeric types, and all numeric types are compatible. The type of the result depends on the operands. Loosely speaking, smaller types are promoted to their wider cousins and division always results in floats. Typing does not get in the way of arithmetic.

When boolean or byte values participate in addition, subtraction and multiplication, they are promoted to int. In other words, arithmetic is not performed modulo 2 (i.e., in base 2) for boolean values, or modulo 256 for byte values.

        1b+1b
2
        0x2a+0x11
59
        42+1b
43
        5*0x2a
210

When integer types are used in addition, subtraction and multiplication, the result is an int or the widest type present, whichever is wider.

        a:42
        b:123h
        c:1234567890j
        b+b
246
        a+b
165
        a+b+c
1234568055j

The result of addition, subtraction and multiplication of integer data types is modulo the width of the result. That is, overflow is ignored. For example, int arithmetic is modulo 2^32.

        i:2147483647
        i+3
-2147483646

When any numeric types participate in division, they are promoted to float and the result is a float.

        1%3
0.3333333
        3%1
3f

When floating point data types are mixed, the result is float.

        6.0*7.0e
42f

Note: The arithmetic operators are always dyadic. In particular, while ( - ) is also used syntactically to denote a negative number, there is no unary function ( - ) to negate a value. Its attempted use for such generates an error. Use the operator neg for this purpose.

        a:-4
        a
-4
        -a                / This is an error
'-
        neg a
4

According to the discussion in Item-wise Extension of Atomic Functions, the arithmetic operators are extended item-wise to lists. Thus,

        2+100 200 300
102 202 302
 
        b:1000.0 2000.0 3000.0 4000.0
        b*2
2000 4000 6000 8000f
 
        c:2 4 6 8
        b%c
500 500 500 500f

In the following example, observe that item-wise atomic application is recursive when all the list components are numeric,

        e:(100 200;1000 2000)
        e-2
98  198
998 1998

Max (|) and Min (&)

The max and min operators are atomic and binary, and return the type of the widest operand. Numeric types and char are mutually compatible; comparison is not defined for symbols.

The max operation ( | ) returns the maximum of its operands based on underlying numeric values; this reduces to logical "or" for boolean operands. The min operation ( & ) returns the minimum of its operands based on underlying numeric values; this reduces to logical "and" for boolean operands. The same type promotion rules apply as for the arithmetic operators.

         0b|1b
1b
        1b&0b
0b
        42|0x2b
43
        4.2e&42j
4.2e
        "a"|"z"
"z"
        "0"&"A"
"0"
        `a|`z                / this is an error
'type

Following are examples of comparison extended item-wise to simple lists.

        2|0 1 2 3 4
2 2 2 3 4
        11010101b&01100101b
01000101b
        "zaphod"|"arthur"
"zrthur"

Note: For the symbolically challenged, the operator | can also be written as or. The operator & can be written as and.

        1 and 3
1
        "a" or "z"
"z"

Exponential Primitives: sqrt, exp, log, xexp, xlog

sqrt

The atomic unary sqrt has as domain all non-negative numeric values and returns a float representing the square root of its argument.

         sqrt 2
1.414214
         sqrt 4
2f
         sqrt 0x42
8.124038
         sqrt -1
0n

exp

The atomic unary exp has as domain all numeric values and returns a float representing the base e raised to the power of its argument.

        exp 1
2.718282
        exp 4.2
66.68633
        exp -12h
6.144212e-06

Note: Do not confuse the 'e' used in the display of scientific notation with the mathematical base of natural logarithms.

log

The atomic unary log has as domain all numeric values and returns a float representing the natural logarithm of its argument.

        log 1
0f
        log 0x2a
3.73767
        log 0.0001
-9.21034
        log -1
0n

xexp

The atomic binary xexp has as domain all numeric values in both operands and returns a float representing the left operand raised to the power of the right operand. If the mathematical operation does not make sense, the result is 0n.

        2 xexp 5
32f
        -2 xexp .5
0n

xlog

The atomic binary xlog has as domain all numeric values in both operands and returns a float representing the logarithm of the right operand with respect to the base of the left operand. If the mathematical operation does not make sense, the result is 0n.

        2 xlog 32
5f
        2 xlog -1
0n

More Primitives: mod, signum, reciprocal, floor, ceiling and abs

These functions are useful in calculations.

Modulus (mod)

The binary mod is atomic in its left operand (dividend), which is any numeric value. The right operand (divisor) is a numeric atom. The result is the remainder of dividing the dividend by the divisor. This produces the usual remainder from elementary school for positive integers but is somewhat more complex for general numeric arguments.

For a positive divisor, the remainder is defined as the difference between the dividend and the largest integral multiple of the divisor not exceeding the dividend.

        4 mod 3
1
        0x2a mod 0x10
10
        4.5 mod 2.3
2.2
        4.5 mod -2.3
-0.1
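
The four results above are consistent with the formula x - y*floor(x/y). The Python sketch below is a model under that assumption, not q code, and the name q_mod is hypothetical. Results are compared with a tolerance because binary floating point cannot represent 2.3 exactly.

```python
import math

def q_mod(x, y):
    """Remainder modeled as x - y*floor(x/y); the result takes the sign of the divisor."""
    return x - y * math.floor(x / y)

print(q_mod(4, 3))                      # 1
print(q_mod(0x2a, 0x10))                # 10
print(round(q_mod(4.5, 2.3), 10))       # 2.2
print(round(q_mod(4.5, -2.3), 10))      # -0.1
```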

Sign (signum)

The atomic unary signum has as domain all integral and floating point types and returns an int representing the sign of its argument. Here 1 represents "positive", -1 represents "negative" and 0 represents a zero argument.

        signum 4.2
1
        signum -42
-1
        signum 0
0

reciprocal

The atomic unary reciprocal has as domain all numeric types and returns a float representing 1.0 divided by the argument.

        reciprocal 0.02380952
42.00001
        reciprocal 0
0w

floor

The atomic unary floor has as domain int and floating point types and returns an int representing the largest integer that is less than or equal to its argument.

        floor 4
4
        floor 4.0
4
        floor 4.2
4
        floor -4.0
-4
        floor -4.2
-5

The floor operator can be used to truncate or round floating point values to a specific number of digits to the right of the decimal.

        a:4.242
        0.01*floor 100*a
4.24
        0.1*floor 0.5+10*a
4.2
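
The same scale, floor, rescale pattern can be checked in any language. The Python sketch below mirrors the two q expressions step by step (q evaluates them right to left, so the multiplication by 100 or 10 happens first); it is an illustration, not q code.

```python
import math

a = 4.242

# Truncate to two decimals: scale up, drop the fraction, scale back down.
truncated = 0.01 * math.floor(100 * a)

# Round to one decimal: scale up, add one half, drop the fraction, scale back down.
rounded = 0.1 * math.floor(0.5 + 10 * a)

print(round(truncated, 2), round(rounded, 1))   # 4.24 4.2
```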

Note: The floor function does not apply to boolean, byte, or short types.

        floor 0x2a
'type

ceiling

Analogous to floor, the atomic unary ceiling has as domain int, long and floating point types and returns the smallest int that is greater than or equal to its argument.

        ceiling 4
4
        ceiling 4.0
4
        ceiling 4.2
5
        ceiling -4.0
-4
        ceiling -4.2
-4

Note: For reasons known to the q gods, ceiling does apply to boolean or byte types but not to short type.

        ceiling 0b
0
        ceiling 42h
'type

Absolute Value (abs)

The atomic unary abs has as domain all integral and floating point types. It returns its argument if the argument is greater than or equal to zero, or neg applied to its argument otherwise. The result of abs has the same type as the argument.

        abs 4
4
        abs -4
4
        abs -4.2
4.2
        abs -4.0
4f
        abs -4.2e
4.2e
        abs -4j
4j

Operations on Temporal Values

We have separated temporal types and their operations into this section because they have richer semantics.

Internal Format of Temporal Types

First, we note that a date or datetime is actually stored under the covers as a signed float, with 0.0 corresponding to midnight of January 1, 2000. So,

        0.0=2000.01.01T00:00:00.000
1b

The integral part of the floating point value corresponds to the number of days after (positive) or before (negative) the start of the millennium. The decimal portion of a datetime is the fractional portion of a 24-hour day represented by its time component. Thus,

        33.5=2000.02.03T12:00:00.000
1b

Time is stored as the number of milliseconds from the start of day. Thus, a time value is between 0 and 86,400,000 (24*60*60*1000). So,

        43200000=12:00:00.000
1b
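
The two encodings described above can be recomputed with ordinary calendar arithmetic. The Python sketch below is an illustration, not q code; EPOCH, day_count and millis_of_day are hypothetical names chosen for this example.

```python
from datetime import datetime, time

EPOCH = datetime(2000, 1, 1)      # the instant that corresponds to 0.0

def day_count(dt):
    """Fractional number of days between the epoch and dt."""
    return (dt - EPOCH).total_seconds() / 86400.0

def millis_of_day(t):
    """Milliseconds elapsed since midnight for a time of day."""
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1000 + t.microsecond // 1000

print(day_count(datetime(2000, 2, 3, 12)))   # 33.5
print(millis_of_day(time(12, 0)))            # 43200000
```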

Basic Operations

In contrast to some verbose languages, any expression involving temporal types and numerical types that should make sense actually does, and it works in the expected fashion. Comparison of dates or datetimes reduces to comparison of the underlying floating point values. Thus,

        2006.01.01T00:00:00.000<2005.12.25T12:00:00.000
0b
        2005.12.25=2005.12.25T00:00:00.000
1b
        2005.12.25<2005.12.25T12:00:00.000
1b

Time values can be compared with each other and the result is based on the underlying millisecond counts.

        12:01:10.987<17:05:42.986
1b

A date and a time can be added to give a datetime.

        2007.07.04+12:45:59.876
2007.07.04T12:45:59.876

Note: A time is implicitly converted to a fractional day when it is added to a date to get a datetime.

Day Counts and Time Counts

A date or datetime can be compared, or tested for equality, with a float,

        366.0=2001.01.01
1b

A time can be compared with an int.

        43200000<12:00:00.001
1b

A float representing a fractional day count can be added to or subtracted from a datetime (or date) to give a datetime. In this context, the integral part of the fractional day count represents the number of days and the decimal part represents the fractional part of a 24-hour day. For example, to move forward 33 days and 12 hours,

        2000.01.01T00:00:00.000+33.5
2000.02.03T12:00:00.000

Or, to move back 2 hours and 30 minutes,

        2000.01.01T00:00:00.000-2.5%24
1999.12.31T21:30:00.000

An int representing a day count can be added to or subtracted from a date to give a date.

        2006.07.04+5
2006.07.09

The difference of two datetimes is a float representing the fractional day count between them.

        2007.02.03T12:00:00.000-2007.01.01T00:00:00.000
33.5

The difference between two dates is an int day count representing the number of days between them.

        2006.07.04-2006.04.04
91
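
The day-count results in this section can be cross-checked with standard calendar arithmetic. The Python sketch below is an illustration, not q code: timedelta(days=...) plays the role of the numeric day count.

```python
from datetime import date, datetime, timedelta

start = datetime(2000, 1, 1)

forward = start + timedelta(days=33.5)        # 33 days and 12 hours later
back = start - timedelta(days=2.5 / 24)       # 2.5 hours is 2.5/24 of a day

later = date(2006, 7, 4) + timedelta(days=5)  # whole-day count added to a date
gap = (date(2006, 7, 4) - date(2006, 4, 4)).days

print(forward)   # 2000-02-03 12:00:00
print(later)     # 2006-07-09
print(gap)       # 91
```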

An int representing a time count of milliseconds can be added to or subtracted from a time to give a time.

        12:00:00.000+1000
12:00:01.000

The difference between two times is an int count of the number of milliseconds between them.

        23:59:59.999-00:00:00.000
86399999

Observe that a time does not wrap when it exceeds 24 hours.

        23:59:59.999+2
24:00:00.001

Operations on Infinities and Nulls

As you gain experience with the way q handles infinities and nulls, you'll find that it is simpler and more rational than verbose languages. Injection of such an exceptional value into a calculation stream propagates through subsequent steps in a predictable way without the need for special error trapping and handling. While the result will contain some meaningless data, portions that do not depend on the invalid values will still compute correctly.

Producing Infinities

We show how to produce and operate with the infinities we met in Infinities and NaN. Division of a positive numeric value by any 0 results in float infinity, denoted by 0w.

        4.0%0
0w
        3.14%0.0
0w
        0x32%0
0w
        1b%0
0w

Similarly, division of a negative numeric value by any 0 results in negative float infinity, denoted by -0w.

        -4%0.0
-0w
        -3.14%0
-0w

The int infinities cannot be produced via an arithmetic operation on normal int values, since the result of division in q is always of type float.

        42%0
0w
        -42%0
-0w

Producing NaN

When any numeric zero is divided by zero, the mathematical result is undefined. This is sometimes represented in writing as NaN ("not-a-number"). It is denoted in q by 0n, which is the float null value,

        0%0
0n
        0.0%0.0
0n
        0.0e%0b
0n
        0j%0x00
0n

Basic Arithmetic on Infinities and Nulls

The infinities and nulls act reasonably in numeric expressions and comparisons. Generally, if one member of an expression is infinite or null so is the result. In an arithmetic mix of infinity, null or NaN, the null prevails over infinity and NaN prevails over other nulls. Note that the signs of infinities are carried correctly through arithmetic and meaningless expressions involving infinities result in NaN.

        2+0w-3
0w
        0w*-0w
-0w
        -0w+0w
0n
        42+0n
0n
        42+0N
0N
        0w+0n
0n
        0n+0N
0n
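
The float rows above follow the usual IEEE 754 rules for infinity and NaN, which can be observed in other environments too. The Python sketch below covers only the float cases (Python has no counterpart to the integer null 0N) and is an illustration, not q code.

```python
import math

inf = math.inf

same_sign = 2 + inf - 3      # a finite offset leaves positive infinity unchanged
product = inf * -inf         # the sign rule gives negative infinity
undefined = -inf + inf       # no meaningful value, so the result is NaN
with_nan = 42 + math.nan     # NaN propagates through arithmetic

print(same_sign, product, math.isnan(undefined), math.isnan(with_nan))
# inf -inf True True
```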

The exception to the above is that any integral infinity can be added to its negative infinity to yield 0.

        -0Wj+0Wj
0j

Type Promotion

When nulls occur in expressions of mixed type, the same type promotion rules apply as for finite values.

        42+0N
0N
        42j+0N
0Nj
        0N+0Nj
0Nj
        0n+0N
0n

Equality

Infinities are distinct from all numeric values and from all nulls as well, since they do not represent missing data. All nulls are equal since they differ only by type.

        42=0W                / can compare a numeric value to infinity
0b
        0w=42%0            / can compare float infinity to itself
1b
        0=0N                   / 0 is not the same as missing integer
0b
        0=0n                   / 0 is not the same as missing float
0b
        0w=0W               / float infinity is not the same as int infinity
0b
        0w=0N                / float infinity is not the same as null integer
0b
        0w=0n                / float infinity is not the same as missing float
0b
        0Nj=0N               / missing long and missing int are the same
1b
        0N=0n                / missing int and missing float are the same
1b

Note: In contrast to some languages, such as C, separate NaNs are equal.

        (0%0)=0%0
1b
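
For contrast, this is what the IEEE 754 comparison rule looks like in a language that follows it. The Python sketch below shows two NaNs comparing unequal, then a hypothetical helper null_equal that models the q result by treating NaN as a single missing value. It is an illustration, not q code.

```python
import math

nan_a = float("nan")
nan_b = 0.0 * math.inf            # a second NaN, produced arithmetically

def null_equal(x, y):
    """Equality in which two NaNs count as the same missing value."""
    if math.isnan(x) and math.isnan(y):
        return True
    return x == y

print(nan_a == nan_b)             # False under IEEE 754 comparison
print(null_equal(nan_a, nan_b))   # True, matching the q result
```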

Advanced: The integral infinities, positive and negative, have underlying values whose bit patterns correspond to legitimate base-2 integral values.

Value   Bit Representation
0Wh     0111111111111111b
0W      01111111111111111111111111111111b
0Wj     0111111111111111111111111111111111111111111111111111111111111111b

Consequently, we find

        32767=0Wh
1b
        2147483647j=0W
1b
        -32767=-0Wh
1b
        -2147483647j=-0W
1b
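
Each bit pattern is a zero sign bit followed by all ones, which is the largest signed value of that width. The Python sketch below rebuilds the three patterns and converts them to integers; it is an illustration, not q code.

```python
patterns = {}
for name, width in (("0Wh", 16), ("0W", 32), ("0Wj", 64)):
    bits = "0" + "1" * (width - 1)        # sign bit clear, every other bit set
    patterns[name] = int(bits, 2)         # interpret the string as a base-2 number

print(patterns["0Wh"])   # 32767
print(patterns["0W"])    # 2147483647
print(patterns["0Wj"])   # 9223372036854775807
```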

Match

Match is a different story because type matters.

        42~0w                    / can try to match a numeric value to infinity
0b
        0w~42%0                / can match infinity to itself
1b
        0~0N                       / 0 does not match a missing integer
0b
        0~0n                        / 0 does not match missing float
0b
        0w~0W                     / float infinity does not match int infinity
0b
        0w~0N                      / infinity does not match missing integer
0b
        0w~0n                      / infinity does not match missing float
0b
        0Nj~0N                     / missing long and missing int do not match
0b
        0N~0n                      / missing int and missing float do not match
0b

not

The not operator returns 0b for all infinities and nulls since they all fail the test of equality with 0.

        not 0w
0b
        not 0W
0b
        not 0N
0b
        not 0n
0b

neg

The neg operator returns -1 times its operand, so it reverses the sign on infinities but does nothing to nulls since sign is meaningless for missing data.

        neg 0W
-0W
        neg -0w
0w
        neg 0N
0N
        not " "
0b

Comparison

Comparisons apply to infinities and nulls, as summarized in the following diagram.

nulls < -0w < -0Wj < -0W < -0Wh < numeric values < 0Wh < 0W < 0Wj < 0w

As rules:

  • Positive float infinity is greater than any positive integral infinity
  • Positive integral infinities are ordered by their type, widest largest
  • Nulls and all negative infinities are less than all normal values which are less than all positive infinities
  • Negative float infinity is less than any integral negative infinity
  • Negative integral infinities are ordered by their type, widest least
  • Any null is less than any infinity or numeric value

Note: These relations characterize the infinities, in the sense that they are larger or smaller than all normal values. The integral infinities have underlying bit patterns corresponding to legitimate base 2 values that yield the above relations. Infinite arithmetic will parse, but the results are not particularly useful. It is recommended that you limit operations on integral infinities to equals, not equals and inequalities.

Some examples,

         42<0W
1b
        -0w<42.0
1b
        -0w<1901.01.01
1b
        -0w<0w
1b
        0W<0w
1b
        -0w<0W
1b
        -10000000<0N
0b
        0Nj<42
1b
        0n<-0w
1b

The null symbol is less than any other symbol

        `a<`                / the right side is the null symbol
0b

Max and Min

The behavior of | and & with infinities and nulls derives from that of equality and comparison.

        42|0W
0W
        -42&0N
0N
        0w|0n
0w
        -0w&0n
0n
        0n|0N
0n
        0n&0n
0n
        0W&0Wj
2147483647j

The last result obtains because int infinity is promoted to a long and its bit pattern corresponds to the listed value.

Alias (Advanced)

An alias is a variable that is defined as an expression involving other variables. This differs from ordinary assignment, which defines a variable as the result of an expression.

Alias and Double Assignment

Double assignment (::) outside a function defines the left operand as an alias of the right operand. When the alias is referenced, the underlying expression will be (re)evaluated. For example, the following defines b as an alias for a. Observe that changing the value of a is reflected in b but not in c:

        a:42
        b::a
        c:a
        b
42
 
        c
42
 
        a:98.6
        b
98.6
 
        c
42

Aliasing is useful when the underlying expression represents a calculation.

        u:4
        v:3
        w::v+sqrt u
        w
5f
 
        u:9
        w
6f
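
The difference between an alias and an ordinary assignment is the difference between storing an expression and storing its value. The Python sketch below models this with a dictionary standing in for the workspace and a zero-argument function standing in for the alias; the names env, w and snapshot are hypothetical, and this is not how q implements aliases.

```python
import math

env = {"u": 4, "v": 3}                           # stand-in for the workspace

# "Alias": the expression is stored and re-evaluated on every reference.
w = lambda: env["v"] + math.sqrt(env["u"])

# Ordinary assignment: the expression is evaluated once and the value kept.
snapshot = env["v"] + math.sqrt(env["u"])

first = w()
env["u"] = 9                                     # change an underlying variable
second = w()

print(first, second, snapshot)                   # 5.0 6.0 5.0
```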

The result of aliasing can also be achieved with a function. In the previous example, we could define,

        f:{y+sqrt x}
        f[4;3]
5f

Aliasing provides convenient variable syntax instead of function semantics, but the dependencies are more evident in the function.

Alias chains are resolved and dependency loops are detected.

        a:42
        b::a
        c::b+1000
        b
42
 
        c
1042
 
        a:98.6
        b
98.6
 
        c
1098.6
 
        a::c
'loop
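
One way to picture loop detection is as a reachability check over the alias definitions: defining a as an alias of c is circular if c already depends, directly or through a chain, on a. The Python sketch below shows that check; deps and depends_on are hypothetical names, and nothing here is claimed about how q performs the check internally.

```python
def depends_on(deps, start, target):
    """Return True if start reaches target by following alias definitions."""
    stack, seen = [start], set()
    while stack:
        name = stack.pop()
        if name == target:
            return True
        if name not in seen:
            seen.add(name)
            stack.extend(deps.get(name, []))
    return False

# b is an alias of a, and c is an alias of an expression involving b.
deps = {"b": ["a"], "c": ["b"]}

# Defining a in terms of c would close the loop a -> c -> b -> a.
would_loop = depends_on(deps, "c", "a")
print(would_loop)    # True
```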

Advanced: Aliasing can be used to provide a view in a database by specifying a query as the right operand. For example,

        t:([]c1:`a`b`c`a;c2:20 15 10 20;c3:99.5 99.45 99.42 99.4)
        va::select sym:c1,px:c3 from t where c1=`a
        va
sym px
--------
a   99.5
a   99.4

Dependencies

Double assignment establishes a dependency of the alias on the entities in its underlying expression. For example,

        u:4
        v:3
        w::u+v

establishes a dependency of w on u and v. Q maintains a list of dependencies in the dictionary .z.b.

        .z.b
u| w
v| w

Each entity in the domain of .z.b is mapped to the entities that depend on it. If we add an alias z of u in our example (for instance, z::u), we find,

        .z.b
u| w z
v| w
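
The dictionary shown is the inverse of the alias definitions: the definitions say what each alias reads, while the dictionary says who reads each variable. The Python sketch below builds that inverse mapping from hypothetical input data (w reads u and v, z reads u); it models the shape of the result only and is not q code.

```python
from collections import defaultdict

# alias name -> variables referenced by its defining expression
aliases = {"w": ["u", "v"], "z": ["u"]}

dependents = defaultdict(list)
for alias, sources in aliases.items():
    for source in sources:
        dependents[source].append(alias)   # record that alias depends on source

print(dict(dependents))   # {'u': ['w', 'z'], 'v': ['w']}
```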

Advanced: The table dependencies implicit in views are not reflected in .z.b.

        t:([]c1:`a`b`c`a;c2:20 15 10 20;c3:99.5 99.45 99.42 99.4)
        s:select c1,c3 from t where c2=20
        .z.b
u| w z
v| w

 

 


5. Functions

Overview

In this chapter, we cover functions in depth. Before starting, you may wish to review the Mathematical Functions Refresher if it has been a while since your last encounter with mathematical functions.

Appendix A contains specifics and examples of all the q built-in functions. We shall use built-in functions in the following sections without introduction. Simply look them up in Appendix A.

Function Specification

The notion of a function in q corresponds to a (mathematical) map that is specified by an algorithm. A function is a sequence of expressions to be evaluated, having optional input parameters and a return value. Application of a function is the process of evaluating the expressions in sequence, substituting actual arguments for any formal parameters. If a return value is specified, the function evaluates to its return value.

Advanced: Because a q function can access global variables, the corresponding mathematical mapping actually includes the workspace as an implicit parameter. In other words, q is not a pure functional language because functions can have side effects.

Function Definition

The distinguishing characteristic of function definition is a matching pair of braces { and } enclosing a sequence of expressions separated by semi-colons. In contrast to verbose languages, a function's input parameters and the return value are not typed. In fact, they don't even need to be declared explicitly. Even the function name is optional.

Following is a full specification of a function that returns the square of its input. Observe that we have added optional whitespace after the parameter for readability.

        f:{[x] x*x}

You call f by enclosing its actual parameter in square brackets,

        f[3]
9

Here is a compact form of an equivalent function evaluation in which optional aspects are omitted,

        {x*x}[5]
25

Function Notation and Terminology

The notation for function definition is,

{[p1;...;pn]e1; ...; en}

where the optional p1, ... , pn are formal parameters and e1, ... , en is a sequence of expressions to be evaluated in left-to-right sequence.

For readability, we shall normally insert optional whitespace after the closing square bracket that closes the parameter list, as well as after each semicolon separator. Other styles may differ.

Note: The reason the expressions in a function are evaluated in left-to-right sequence is so that the sequence becomes top-to-bottom when the function definition is split across multiple lines. Specifically, right-to-left expression evaluation would result in the following definition,

        f:{[p1;...;pn]
                e1;
                ...;
                en}

being evaluated from bottom to top, which would be very unnatural.

The number of formal input parameters, either implicit or explicit, is the function's valence. Most common are monadic (valence 1) and dyadic (valence 2). You specify a function with no parameters (niladic) with an empty argument list,

        {[] ...}

Important: The maximum valence currently permitted is 8, so specifying more than eight arguments will cause an error. You can circumvent this restriction by encapsulating multiple parameters in a list argument.

Recommendation: Q functions should be compact and modular: each function should perform a well-defined unit of work. Due to the power of q operators and built-in functions, helper functions are often one-liners. When a function exceeds 20 expressions, you should ask yourself if it can be factored.

Variables that are defined within the expression(s) of a function are called local variables.

The "return value" of a function is the value carried by the function evaluation. It is determined by the following rules:

  • If an empty assignment appears - i.e., a ':' with no variable name to the left - then its assignment value is returned.
  • Otherwise, if any local variables are assigned, the assigned value of the last one is returned.
  • Otherwise, the result of the last expression evaluation is returned.

For example, the following function specifications result in the same input-output mapping.

        f1:{[x] :x*x}                / explicit return
        f2:{[x] r:x*x}               / local variable is returned
        f3:{[x] x*x}                 / last expression is result

So does this one, even though it includes useless and unexecuted evaluations.

        f4:{[x] a:1;:x*x;3}

Advanced: In contrast to k, the q operators are not overloaded on valence, meaning that an operation does not have different functionality for different numbers of arguments. However, some q operators (and built-in functions) are overloaded on the types of the arguments, or even the sign of the arguments. For example, to understand the exact use of (?), you must carefully examine the operands.

Implicit Parameters

If you omit the formal parameters and their brackets, three implicit positional parameters x, y and z are automatically available in the function's expressions. Thus, the following two specifications are equivalent:

        f:{[x] x*x}
 
        g:{x*x}

And so are,

        f:{[x;y] x+y}
 
        g:{x+y}

When using implicit parameters, x is always the first actual argument, y the second and z the third. The following function g generates an error unless it is called with three parameters.

        g:{x+z}        / likely meant x+y; requires 3 parms in call
 
        g[1;2]         / error...needs three parameters
{x+z}[1;2]
 
        g[1;2;3]       / OK...2nd value is required but ignored
4

Recommendation: If you use the names x, y and z in a function, reserve them for the first three parameters, either explicit or implicit. Any other use will almost certainly lead to confusion, if not to trouble.

Anonymous Functions

A function can be defined without being assigned to a variable. Such a function is called anonymous since it cannot be evaluated by name.

        {x+y}[4;5]
9

An anonymous function can be appropriate when it will be evaluated in only one location. A prevalent use is in-line helper functions within other functions.

        f:{[...] ...; {...}[...]; ...}

It is arguably more readable to extract anonymous functions.

        g:{...}
        f:{...; g[...]; ...}

This is a matter of coding style.

The Identity Function (::)

The identity function :: returns its argument. It is useful for specifying defaults when using functional forms of amend and select.

Important: The identity function cannot be used with juxtaposition.

        ::[`a]
`a
        ::[1 2 3]
1 2 3
        :: 42
'

Functions are Nouns

The q entities we have met until now have been either nouns or verbs. Atoms and lists are nouns. Operators are verbs. In the following expression,

        a:1+L:100 200 300

a, L and the literals 100, 200, 300 are nouns, while the assign and plus operators are verbs.

It may come as a surprise that functions are also nouns. We can write,

        a:3
        f:{[x] 2*x}
        a:f
        a 3
6

Operators used as functions are also nouns, so continuing the previous example we can also write,

        L:(f;+)
        L
{2*x}
+

Note: The display of L illustrates that a function name is resolved to its body at the time of assignment. If the definition of f is subsequently modified, L will not change.

Local and Global Variables

Local Variables

A variable that is defined by assignment in an expression in a function is called a local variable. For example, a is a local variable in the following function.

        f:{a:42; a+x}

A local variable exists only from the time it is first assigned until the completion of the enclosing function's evaluation; it has no value until it is actually assigned. Provided there is no variable a already assigned in the workspace, evaluation of the function does not create such a variable. Using f as above,

        f[6]
48
        a
'a

Global Variables

Variables that have been assigned outside any function definition are called global variables.

        b:6
        f:{x*b}
        f[7]
42

To assign a global variable inside a function, use a double colon ( :: ), which tells the interpreter not to create a local variable with the same name.

        b:6
        f:{b::7; x*b}
        f[6]
42
        b
7

Local and Global Collision

When a local variable is defined with the same name as a global variable, the global variable is obscured.

        a:42
        f:{a:98; x+a}
        f[6]
104
        a
42
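
Shadowing of this kind is common to many languages. As a loose analogy only (Python's scoping rules differ from q's in other respects), the same experiment in Python gives the same two results: the function sees its local a, and the module-level a is untouched.

```python
a = 42                     # module-level ("global") variable

def f(x):
    a = 98                 # assignment creates a local that hides the global
    return x + a

result = f(6)
print(result)              # 104
print(a)                   # 42, the global was never modified
```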

Important: When local and global names collide, the global variable is always obscured. Even double colon assignment affects the local variable. For example,

        a:42
        f:{a:6;a::98; x*a}
        f[6]
588
        a
42

Amend (:)

Amend in C Language

We have already seen the basic form of assignment using amend,

        a:42

Programmers from languages with C heritage will be familiar with expressions such as,

        x += 2;                // C expression representing amend

which is shorthand for,

        x = x + 2;            // C expression

This is usually read simply "add 2 to x" but more precisely is, "assign to x the result of adding 2 to the current value of x." This motivates the interpretation of such an operation as "amend," in which x is re-assigned the value obtained by applying the operation + to the operands x and 2. By implication, a variable can only be amended if it has been previously assigned.

Simple Amend

In q, the equivalent to the above C expression uses +: as the operator.

        x:42
        x+:2
        x
44

There is nothing special about + in the above discussion. Amend is available with any binary verb, as long as the operand types are compatible.

        a:42
        a-:1
        a
41

We shall see interesting examples of amend with other operators in later chapters.

Amend with Lists

This capability to amend in one step extends to lists and indexing,

        L1:100 200 300 400
        L1[1]+:9
        L1
100 209 300 400
 
        L1[0 2]+:99
        L1
199 209 399 400
 
        L1:100 200 300 400
        L1[0 1 2]+:1 2 3
        L1
101 202 303 400
 
        L2:(1 2 3; 10 20 30)
        L2[;1]+:9
        L2
1  11  3
10 29 30
 
        L2:(1 2 3; 10 20 30)
        L2[0;1]+:100
        L2
1  102 3
10 20  30
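
The first three list examples follow one pattern: for each selected index, add the matching value (or the single value, if only one is given) to the item in place. The Python sketch below models that pattern for flat lists with a hypothetical helper named amend; it is an illustration, not q code, and does not cover the nested cases.

```python
def amend(items, indices, values):
    """Add values to items at the given indices, in place.

    A single number is reused for every index; a list is paired index by index.
    """
    if not isinstance(values, (list, tuple)):
        values = [values] * len(indices)
    for i, v in zip(indices, values):
        items[i] += v
    return items

L1 = [100, 200, 300, 400]
amend(L1, [1], 9)
amend(L1, [0, 2], 99)
print(L1)                                            # [199, 209, 399, 400]

L1b = amend([100, 200, 300, 400], [0, 1, 2], [1, 2, 3])
print(L1b)                                           # [101, 202, 303, 400]
```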

Note: Amend enforces strict type matching with simple lists, since the result must be placed back into the list,

        L1[0]+:42f
'type

Projection

Function Projection

Sometimes a function of valence two or more is evaluated repeatedly while some of its arguments are held constant. For this situation, a multivalent function can have one or more arguments fixed and the result is a function of lower valence called the projection of the original function onto the fixed arguments. Notationally, a projection appears as a function call with the fixed arguments in place and nothing in the other positions.

For example, the dyadic function which returns the difference of its arguments,

        diff:{[x;y] x-y}

can be projected onto the first argument by setting it to 42, written as,

        diff[42;]

The projected function is the monadic function "subtract from 42",

        diff[42;][6]
36

This projection is equivalent to,

        g:{[x] 42-x}
        g[6]
36

We can also project diff onto its second argument to get "subtract 42",

        diff[;42][6]
-36

which is equivalent to,

        h:{[x] x-42}

When a function is projected onto any argument other than the last, the trailing semi-colons can be omitted. Given diff as above,

        diff[42][6]
36

Recommendation: It will make your intent more evident if you do not omit trailing semi-colons when projecting. For example, with diff as above, a reader will immediately recognize the projection,

        diff[42;][6]                / instead of diff[42][6]

The brackets denoting a function projection are required, but the additional brackets in the projection's evaluation can be omitted with juxtaposition (as for any regular function).

        diff[;42] 6
-36
        diff[42] 6
36
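
Projection is what other languages call partial application. As a rough analogy in Python (an illustration, not q code), functools.partial fixes leading arguments, and a small wrapper is needed to fix a later one.

```python
from functools import partial

def diff(x, y):
    return x - y

subtract_from_42 = partial(diff, 42)      # first argument fixed, like diff[42;]
subtract_42 = lambda x: diff(x, 42)       # second argument fixed, like diff[;42]

print(subtract_from_42(6))                # 36
print(subtract_42(6))                     # -36
```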

Which notation to use is a matter of coding style.

Verb Projection

A binary verb can also be projected onto its left argument, although the notation may take some getting used to. For example, the projection of - onto its left argument is,

        (42-)6
36

A verb cannot be projected onto its right argument, since this would lead to notational ambiguity. For example, (-42) is the atom -42 and not a projection.

        (-42)
-42

If you really want to project onto the right argument of an operator, you can do so by using the dyadic function form and juxtaposition of the argument.

        -[;42] 98
56

In fact, the whitespace is not necessary in this example.

        -[;42]98
56

We warned you about the notation.

Multiple Projections

When the original function has valence greater than two, it is possible to project onto multiple arguments simultaneously. For example, given,

        f:{x+y+z}

we can project f onto its first and third arguments and end up with a monadic function,

        f[1;;3][5]
9

We arrive at the same result by taking the projection f[1;;] - now a dyadic function - and projecting onto its second argument to arrive at f[1;;][;3].

        f[1;;][;3][5]
9

This is equivalent to projecting in the reverse order,

        f[;;3][1;][5]
9
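
Projection with open positions anywhere in the argument list can be modeled with a placeholder value. The Python sketch below uses a hypothetical sentinel HOLE and helper project to reproduce the three equivalent forms above; it is an illustration, not q code.

```python
HOLE = object()                  # marks an argument position that is left open

def project(func, *pattern):
    """Return a function whose arguments fill the HOLE positions, left to right."""
    def projected(*args):
        remaining = iter(args)
        full = [next(remaining) if p is HOLE else p for p in pattern]
        return func(*full)
    return projected

def f(x, y, z):
    return x + y + z

direct = project(f, 1, HOLE, 3)(5)                                  # f[1;;3][5]
in_steps = project(project(f, 1, HOLE, HOLE), HOLE, 3)(5)           # f[1;;][;3][5]
reversed_order = project(project(f, HOLE, HOLE, 3), 1, HOLE)(5)     # f[;;3][1;][5]

print(direct, in_steps, reversed_order)                             # 9 9 9
```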

Note: If g is defined as a projection of f and the definition of f is changed, g remains the projection of the original f.

        f:{[x;y] x-y}
        g:f[42;]
        g
{[x;y] x-y}[42;]
 
        g[6]
36
 
        f:{[x;y] x+y}
        g[6]
36

This can be seen by displaying g on the console,

        g
{[x;y] x-y}[42;]
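
The same capture-at-definition behavior shows up with partial application in Python, offered here as an analogy rather than q code: the partial object holds the function that the name f referred to when it was created, so rebinding f later has no effect on it.

```python
from functools import partial

def f(x, y):
    return x - y

g = partial(f, 42)       # g holds the function object currently named f

def f(x, y):             # rebinding the name f does not touch g
    return x + y

print(g(6))              # 36, still the original subtraction
print(f(42, 6))          # 48, the new definition
```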

Lists and Functions as Maps

This section explores the deeper relationship between lists and functions. While it can be skipped on first reading by the mathematically faint of heart, that would be like not eating your vegetables when you were a kid.

Similarity of Notation

You have no doubt noticed that the notation for list indexing is identical to that for function evaluation. That is,

        L:(0 1 4 9 16 25 36)
        f:{[x] x*x}
        L[2]
4
        f[2]
4
        L 5
25
        f 5
25
        L 3 6
9 36
        f 3 6
9 36

This is not an accident. In Creating Typed Empty Lists we saw that a list is a map defined by means of the implicit input-output correspondence given by item indexing. A function is a map defined by a sequence of expressions representing the algorithm used to obtain an output value from the input parameters. For consistency, the two different mechanisms for implementing a map do have the same notation. It may take a little time to get accustomed to the rationality of q.

Item-wise Extension of Atomic Functions

With the interpretation of lists and functions as maps, we can motivate the behavior of list indexing and function application when a simple index or atomic parameter is replaced by a simple list of the same. Specifically, we are referring to,

        L[2 5]
4 25
        f[2 5]
4 25

in the previous examples. The expression enclosed in brackets is a simple list, call it I. Viewing the list I as a map, the two expressions are the composition of L and I, and the composition of f and I,

        L[2 5]   is   (L[2]; L[5])
 
        f[2 5]   is   (f[2]; f[5])

For a general list L, function f and item index list I, the compositions are, writing i_j for the jth item of I,

        (L ◦ I)(j) = L(i_j)
 
        (f ◦ I)(j) = f(i_j)
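
The composition can be checked directly on the console. The following is a minimal sketch that reuses the L and f defined earlier in this section; the index list I and the helper lambda {L x} are introduced here purely for illustration.

        L:0 1 4 9 16 25 36
        f:{[x] x*x}
        I:2 5
        L[I]~{L x} each I        / index with a list vs. index item by item
1b
        f[I]~f each I            / apply to a list vs. apply item by item
1b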

Indexing at Depth and Ragged Arrays

Next, we show the deeper correspondence between list indexing and multivalent function evaluation. Notationally, a nested list is a list of lists, but it can also be viewed functionally as a compact form of the input-output relationship for a multivariate map. This mapping transforms tuples of integers onto the constituent atoms of the list and has valence equal to one plus the level of nesting of the list.

For example, a list with no nesting is a monadic map of integers to its atoms via item indexing.

        L1:(1;2h;`three;"4")
        L1[3]
"4"

A list with one level of nesting can be viewed as an irregular (or ragged) array by laying its rows out one above another. For example, the list L2 specified as,

        L2:((1b;2j;3.0);(4.0e;`five);("6";7;0x08;2000.01.10))

can be thought of as a ragged array. The console display does just this,

        L2
(1b;2j;3f)
(4e;`five)
("6";7;0x08;2000.01.10)

This representation of a ragged array is a generalization of the I/O table for monadic maps. From this perspective, indexing at depth is a function whose output value is obtained by indexing into the ragged array via position. In other words, the output value L2[i;j] is the jth element of the ith row,

        L2[1;0]
4e

This motivates the interpretation of L2 as a dyadic map over a sub-domain of the two-dimensional Cartesian product of non-negative integers and with range equal to the atoms of L2. The duple i,j is mapped positionally, analogous to simple item indexing.

Advanced: It is possible to create a ragged array with any number of columns by using 0N as the number of rows with the reshape operator (#).

        0N 3#til 10
0 1 2
3 4 5
6 7 8
,9

Projection and Index Elision

You may have also noticed that the notations of function projection and elided indices in a list are identical. Revisiting the example of elided indices we used in Nesting,

        L :((1 2 3;4 5 6 7);(`a`b`c`d;`z`y`x`;`0`1`2);("now";"is";"the"))

Define the list L1 by eliding the first and last indices as,

        L1:L[;1;]
        L1
4 5 6 7
`z`y`x`
"is"

Viewing L as a map of valence three whose output value is obtained by indexing at depth, this makes L1 the projection of L onto its second argument. From this perspective, L1 is a dyadic map that retrieves values from a sub-list,

        L1[1;2]
`x

Out of Bounds Index

The previous discussion also motivates the behavior of item indexing when an "out of bounds" index is presented. In verbose languages, this would result in either some sort of error - the infamous indexing off the end of an array in C - or an exception in Java and C#.

By viewing a list as a function defined on a sub-domain of integers, it is reasonable to extend the domain of the function to all integers by assigning a null output value to any input not in the original domain. In this context, null should be thought of as "missing value." This is exactly what happens.

In the following examples, observe that the type of null returned matches the item type for simple lists and is 0N for a general list.

        L1:1 2 3
        L1[-1]
0N
        L2:100.1 200.2 300.3 400.4
        L2[100]
0n
        L3:"abcde"
        L3[-1]
" "
        L4:1001101b
        L4[7]
0b
        L5:(1;`two;3.0e)
        L5[5]
 
0N
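
Because an out-of-bounds index yields a null rather than an error, it is often useful to test for or replace the missing values afterwards. The following sketch assumes the built-in null and fill (^) operators; the default value 0 is an arbitrary choice for illustration.

        L1:1 2 3
        null L1 0 5              / index 5 is out of bounds
01b
        0^L1 0 5                 / replace the null with a default
1 0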

Creating Strings from Data

As mentioned earlier, q strings are simple lists of char, which play a role similar to strings in verbose languages. It is possible to convert data into strings, akin to the toString() method in O-O languages.

The function string can be applied to any q entity to produce a textual representation suitable for display or use in external contexts such as text editors, Excel, etc. In particular, the string result does not contain any q formatting information. Also, note that the result of string is always a list of char. Following are some examples.

        string 42
"42"
        string 6*7
"42"
        string 42422424242j
"42422424242"
        string `Zaphod
"Zaphod"

See Appendix A for more details on string.
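
Since the result of string is an ordinary list of char, it can be counted and joined like any other list. A small sketch (the label text "Total: " is invented for illustration):

        count each string 10 200 3000
2 3 4
        "Total: ",string 6*7
"Total: 42"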

Adverbs

Syntactically q has nouns, verbs and adverbs. Data entities such as atoms, lists, dictionaries and tables are nouns. Functions are also nouns. Primitive symbol operators and operations expressed in infix notation are verbs. For example, in the expression,

        c:a+b

a, b and c are nouns, while : and + are verbs. On the other hand, in

        c:+[a;b]

a, b, c and + are nouns, while : is a verb.

An adverb is an entity that modifies a verb or function to produce a new verb or function whose behavior is derived from the original.

The following adverbs are available in q.

Symbol    Name
'         each both
each      each monadic
/:        each right
\:        each left
/         over
\         scan
':        each previous

Note: The character that represents each-both is the single quote ('), which is distinct from the back-tick (`) used with symbols.

each-both (')

Loosely speaking, the adverb each-both (') modifies a verb or function by applying its behavior item-wise to corresponding list elements. This concept is similar to the manner in which an atomic verb or function is extended to lists.

Important: There cannot be any whitespace between ' and the verb it modifies.

Perhaps the most common example of each-both is join-each (,'), which concatenates two lists item-wise. In its base form, join takes two lists and returns the result of the second appended to the first.

        L1:1 2 3 4
        L2: 5 6
        L1,L2
1 2 3 4 5 6

Two lists of the same count can be joined item-wise to form pairs.

        L3:100 200 300 400
        L1,'L3
1 100
2 200
3 300
4 400

As in the case of item-wise extension of atomic functions, the two arguments must be of the same length, or either can be an atom.

        L1,'1000
1 1000
2 1000
3 1000
4 1000
 
        `One,'L1
`One 1
`One 2
`One 3
`One 4
 
        "a" ,' "z"
"az"

When both arguments of a derived function are atoms, the adverb has no effect.

        3,'4
3 4
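
Each-both is not limited to join. As a hedged illustration, the sketch below applies an ad hoc lambda and the take operator (#) pairwise; the lambda {x+2*y} is invented for this example.

        {x+2*y}'[1 2 3;10 20 30]     / pairs (1;10), (2;20), (3;30)
21 42 63
        2 1 3#'"abc"                 / take 2 of "a", 1 of "b", 3 of "c"
"aa"
,"b"
"ccc"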

Advanced: A useful example of join-each arises when both arguments are tables. Since a table is a list of records, it is possible to apply join-each to tables with the same count. The item-wise join of records results in a sideways join of the tables.

 
        t1:([] c1:1 2 3)
        t2:([] c2:`a`b`c)
        t1
c1
--
1
2
3
 
        t2
c2
--
a
b
c
 
        t1,'t2
c1 c2
-----
1  a
2  b
3  c

Monadic each

There is a form of each that applies to monadic functions and unary operators. It applies a (non-atomic) function to each element of a list. Monadic each can be notated in two equivalent ways for a monadic function f,

        f each
 
        each[f]

The latter form underscores the fact that each transforms a function into a new function.

        reverse each (1 2;`a`b`c;"xyz")
2 1
`c`b`a
"zyx"
 
        each[reverse] (1 2;`a`b`c;"xyz")
2 1
`c`b`a
"zyx"

The transform is arguably more readable when the base operation is a projection.

        (1#) each 1001 1002 1004 1003
1001
1002
1004
1003
        each[1#] 1001 1002 1004 1003
1001
1002
1004
1003

Observe that the result of the last example can also be obtained with enlist.

        enlist each 1001 1002 1004 1003
1001
1002
1004
1003
 
        flip enlist 1001 1002 1004 1003
1001
1002
1004
1003

The last expression executes fastest for long lists.

each-left (\:)

The each-left adverb \: modifies the base function so that it applies the entire second argument to each item of the first argument.

Important: There cannot be any whitespace between \: and the verb it modifies.

To append a given string to every string in a list,

        ("Now";"is";"the";"time") ,\: ", "
"Now, "
"is, "
"the, "
"time, "

each-right (/:)

The each-right adverb /: modifies the base function so that it applies the entire first argument to each item of the second argument.

Important: There cannot be any whitespace between /: and the verb it modifies.

To prepend a given string to every string in a list,

        " ," ,/: ("Now";"is";"the";"time")
" ,Now"
" ,is"
" ,the"
" ,time"

Cartesian Product (,/:\:)

To achieve a Cartesian (cross) product of two lists, begin with join-right ,/: and modify it with each-left. The net effect is to join every item of the first argument with every element of the second argument.

        L1:1 2
        L2:`a`b`c
        L1,/:\:L2
1 `a 1 `b 1 `c
2 `a 2 `b 2 `c

There is an extra level of nesting that can be eliminated with raze.

        raze L1,/:\:L2
1 `a
1 `b
1 `c
2 `a
2 `b
2 `c

You can also begin with join-left ,\: and modify it with each-right.

        raze L1,\:/:L2
1 `a
2 `a
1 `b
2 `b
1 `c
2 `c

Observe that the orders of the resulting items for ,/:\: and for ,\:/: are transposed.

Note: Cartesian product is also encapsulated in the function cross.

        L1 cross L2
1 `a
1 `b
1 `c
2 `a
2 `b
2 `c
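
Each-left and each-right work with verbs other than join. As a minimal sketch, applying them to + makes the difference in result shape easy to see: each-left produces one row per item of the left argument, each-right one row per item of the right argument.

        1 2 3 +\: 10 20          / each item on the left added to the whole right list
11 21
12 22
13 23
        1 2 3 +/: 10 20          / the whole left list added to each item on the right
11 12 13
21 22 23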

Over (/)

The over adverb / modifies a base dyadic function so that the items of the second argument are applied iteratively to the first argument.

Important: There cannot be any whitespace between / and the function it modifies.

To add multiple items to another entity,

        L:100 200 300
        ((L+1)+2)+3
106 206 306
 
        L+/1 2 3
106 206 306
 
        0+/10 20 30                / easy way to add a list
60

To raze a list,

        L1:(1; 2 3; (4 5; 6))
        (),/L1
1
2
3
4 5
6

To use your own function,

        f:{2*x+y}
       100 f/ 1 2 3
822

Advanced: To delete multiple items from a dictionary,

        d:1 2 3!`a`b`c
        d _/1 3
2| b

Scan (\)

The scan adverb \ modifies a base dyadic function so that the items of the right operand are applied cumulatively to the left operand.

Important: There cannot be any whitespace between \ and the function it modifies.

To find running sums,

        100+\1 2 3
101 103 106
        0+\10 20 30        / easy way to find running sums of list
10 30 60

To use your own function,

        f:{2*x+y}
        100 f\ 1 2 3
202 408 822
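
Over and scan can also be applied without an explicit left operand by enclosing the derived function in parentheses, in which case the first item of the list serves as the starting value. The sketch below uses this monadic form, which is not shown in the text above, with multiply (*) and maximum (|).

        (*/) 1 2 3 4             / product
24
        (*\) 1 2 3 4             / running product
1 2 6 24
        (|/) 3 1 4 1 5           / maximum
5
        (|\) 3 1 4 1 5           / running maximum
3 3 4 4 5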

each-previous (':)

The each-previous adverb ': modifies a base dyadic function so that each item of the right operand is applied to its predecessor. The left operand of the adverb is taken as the predecessor for the initial item.

Important: There cannot be any whitespace between ': and the function it modifies.

To find the running 2-item sum with 0 before the initial item,

        0+':1 2 3 4 5
1 3 5 7 9

More interesting is to determine the positions where items increase in value.

        0w>':8 9 7 8 6 7
010101b
        -0w>':8 9 7 8 6 7
110101b

The left operand controls the initial result. The first expression results in initial 0b for all numeric lists, while the second results in initial 1b. Why?
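
Another common use of each-previous is computing successive differences. In the sketch below, each item has its predecessor subtracted from it, with 0 supplied as the predecessor of the first item; the built-in function deltas gives the same result for this input.

        0-':1 3 6 10
1 2 3 4
        deltas 1 3 6 10
1 2 3 4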

Verb Forms of Indexing and Evaluation

We are familiar with the syntactic forms of indexing and function application using either square brackets or juxtaposition.

        L:(1 2;3 4 5; 6)
        L[0]
1 2
        L[0 2]
1 2
6
        L 0 2
1 2
6
        L[1;2]
5
 
        f:{x*x}
        f[0]
0
        f[0 2]
0 4
        f 0 2
0 4
        g:{x+y}
        g[1;2]
3

There are equivalent verb forms for indexing and function application. The verb forms are read "index" or "apply" depending on the context.

Verb @

The verb @ takes a list or a unary function as its left operand and a list of indices or a list of arguments as its right operand. For a list operand, @ returns the items specified by the right operand - i.e., indexing at the top level. For a function operand, @ returns the result of applying the function to the arguments item-wise.

With L and f as above,

       L@0
1 2
        L@0 2
1 2
6
        f@0
0
        f@0 4
0 16

The evaluation of a niladic function with @ requires an arbitrary scalar argument.

        fn:{6*7}
        fn[]
42
        fn@0N
42

Advanced: The verb @ also applies to dictionaries, tables and keyed tables. For dictionaries and keyed tables it performs lookup. Since a table is a list of records, it indexes records.

        d:`a`b`c!10 20 30
        d@`b
20
 
        t:([]c1:1 2 3; c2:`a`b`c)
        t@1
c1| 2
c2| b
 
        kt:([k:`a`b`c]f:1.1 2.2 3.3)
        kt@`c
f| 3.3

Verb Dot (.)

The verb . takes a list or a multivalent function as its left operand and a list of indices or a list of arguments as its right operand. For a list left operand, verb . returns the result of indexing the list at depth as specified by the right operand. For a function left operand, verb . returns the result of applying the function to the arguments.

Important: Verb . must be separated from its operands by whitespace if they are names or literal constants.

With L and g as above,

        L . 1 2
5
        g . 1 2
3

The verb . evaluates functions of any valence. This is useful when the function or arguments are supplied programmatically and the valence cannot be known beforehand.

Note: The right argument of . must be a list.

        f . 4
'type
        f . enlist 4
16
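
As an illustration of supplying arguments programmatically, the following sketch builds an argument list in a variable and applies a three-argument function to it. The names h and args are invented for this example.

        h:{x+y+z}
        h . 1 2 3
6
        args:(10;20;30)          / argument list assembled elsewhere
        h . args
60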

Use the null item :: to elide an index when using verb . to index at depth.

        m:(1 2 3;4 5 6)
        m[;1]
2 5
        m . (::;1)
2 5

Evaluating a niladic function with . requires a singleton operand, which is arbitrary.

        fn:{6*7}
        fn[]
42
        fn . enlist 0N
42

Advanced: Verb . provides a generalization of indexing at depth for complex entities comprised of general lists, dictionaries, tables and keyed tables. Perhaps the easiest way to understand its action is to view all such entities as composite mappings. Verb . evaluates the composite map by iteratively applying indexing/lookup on each item of the right operand to the result of the previous step.

In the first of the following examples, verb . performs list indexing at every position; in the second, the middle item is a lookup.

        L1:(1;2 3;(4; 5 6))
        L1 . 2 1 1
6
 
        L2:(1;2 3;`a`b!(4;5 6))
        L2 . (2;`b;1)
6

In the following complex dictionary, the first use of verb . yields lookup followed by indexing, whereas the second use is two lookups.

        dd:`a`b`c!(1 2;1.1 2.2 3.3;`aa`bb!10 20)
        dd . (`a;1)
2
        dd . (`c`bb)
20

Because a table is a list of records, verb . indexes a record on the first item and then performs a field lookup on the second.

        t:([]c1:1 2 3;c2:`a`b`c)
        t . (1;`c2)
`b

Because a keyed table is a dictionary mapping between two tables, verb . performs key lookup on the first item and then a field lookup on the second.

        kt:([k:`a`b`c]f:1.1 2.2 3.3)
        kt . `b`f
2.2

Functional Forms of Amend

The functions @ and . can be used with valence three or four to apply any function to an indexed sublist and an optional second argument. The fact that the list can be a table that may be stored on disk makes this very powerful.

Apply (@) for Dyadic Functions

The general form of functional @ for dyadic functions is,

        @[L;I;f;y]

While the notation is suggestive of lists, in fact L can be any mapping with explicit domain, such as a list, dictionary, table, keyed table or open handle to a table on disk. I is a list of items in the domain of the map, f is a dyadic function and y is an atom or a list conforming to I. When L is a list, the result is the item-wise application of f and the parameter y to the items of L indexed at the top level by I. Over the subdomain I, the map output becomes,

        L[I] f y                / written as binary verb
 
        f[L[I];y]                / written as dyadic function

Or, using verb @ for indexing,

        (L@I) f y              / written as binary verb
 
        f[L@I;y]               / written as dyadic function

For example, to add values to certain items in a list,

        L:100 200 300 400
        I:1 2
        @[L;I;+;42 43]
100 242 343 400

To replace these items,

        @[L;I;:;42 43]
100 42 43 400

Observe that the argument L is unchanged,

        L
100 200 300 400

In order to change the list argument, it must be referenced by name.

        @[`L;I;:;42]                / update L
`L
        L
100 42 42 400

Note: The result of functional amend with a reference by name is a symbol containing the name of the entity affected, not to be confused with an error message.

Advanced: As mentioned previously, L can be a dictionary, a table, or even an open handle to a table on disk. In the general case, the result f[L@I;y] is applied along the subdomain.

        d:`a`b`c!10 20 30
        @[d;`a`c;+;9]
a| 19
b| 20
c| 39
        t:([] c1:`a`b`c; c2:10 20 30)
        @[t;0;:;(`aa;100)]
c1 c2
------
aa 100
b  20
c  30

Apply (@) for Monadic Functions

The general form of functional @ for a monadic function is,

        @[L;I;f]

Again the notation is suggestive of lists, but L is any map with explicit domain, I is a list of items in the domain of L, and f is a monadic function. When L is a list, the result is the item-wise application of f to the items of L indexed at the top level by I. Over the subdomain I, the map output becomes,

        f L[I]                / written as unary verb
 
        f[L[I]]                / written as monadic function

Or, using the verb form of @,

 
        f[L@I]

For example,

        L:101 102 103
        I:0 2
        @[L;I;neg]
-101 102 -103
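
One way to read this result is as a copy of L in which the indexed items have been reassigned. The sketch below spells out that reading with ordinary indexed assignment on a copy (the name r is introduced only for this example) and confirms that L itself is untouched.

        L:101 102 103
        I:0 2
        r:L
        r[I]:neg L[I]            / the same update written as indexed assignment
        r~@[L;I;neg]
1b
        L                        / the original is unchanged
101 102 103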

Advanced: In the general case, the result f[L@I] is applied along the subdomain.

        d:`a`b`c!10 20 30
        @[d;`a`c;neg]
a| -10
b| 20
c| -30

Dot (.) for Dyadic Functions

The general form of functional . for dyadic functions is,

        .[L;I;f;y]

Again the notation is suggestive of lists, but L is a mapping with explicit domain, I is a list in the domain of L, f is a dyadic function and y is an atom or list of the proper shape. For a list, the result is the item-wise application of f and the parameter y to the items of L indexed at depth by I. Over the subdomain I, the map output becomes,

        (L . I) f y            / binary operator
 
        f[L . I;y]             / dyadic function

For example, to add along a sublist,

        L:(100 200;300 400 500)
        I1:1 2
        I2:(1;0 2)
        .[L;I1;+;42]
100 200
300 400 542
        .[L;I2;+;42 43]
100 200
342 400 543

To replace the same items,

       .[L;I2;:;42 43]
100 200
42 400 43

Observe that the argument L is not modified.

        L
100 200
300 400 500

In order to change L, it must be referenced by name.

        L:(100 200;300 400 500)
        .[`L;I1;:;42]                / update L
`L
        L
100 200
300 400 42

Note: The result of functional amend with a reference by name is the name of the entity affected, not an error message.

Advanced: In the general case, the result f[L . I;y] is applied along the subdomain.

        d:`a`b`c!(100 200;300 400 500;600)
        .[d;(`b;1);+;42]
a| 100 200
b| 300 442 500
c| 600

Dot (.) for Monadic Functions

The general form of functional . for a monadic function is,

        .[L;I;f]

Again the notation is suggestive of lists, but L is any map with explicit domain, I is a list in the domain of L, and f is a monadic function. For a list, the result is the item-wise application of f to the items of L indexed at depth by I. Over the subdomain I, the map output becomes,

        f[L . I]

For example,

        L:(100 200;300 400 500)
        I:1 2
        .[L;I;neg]
100 200
300 400 -500
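
As with @, the result can be read as a copy of L with the item at depth reassigned. A minimal sketch, using an illustrative copy named r:

        L:(100 200;300 400 500)
        r:L
        r[1;2]:neg L[1;2]        / the same update written as indexed assignment at depth
        r~.[L;1 2;neg]
1b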

Advanced: In the general case, the result f[L . I] is applied along the subdomain.

        d:`a`b`c!(100 200;300 400 500;600)
        .[d;(`b;1 2);neg]
a| 100 200
b| 300 -400 -500
c| 600

 

 


6. Casting and Enumerations

Types and Cast

Casting manifests the malleability of data. In some cases, such as changing a string to a symbol, this is obvious and straightforward. Converting a char to its underlying ASCII code or converting a datetime to a float requires a little more consideration. Enumerations also fit into the cast pattern.

Basic Types

Every atom has both an associated numeric and symbolic data type. For convenience we repeat the data types table from Atoms.

type        type symbol    type char    type num
boolean     `boolean       b            1h
byte        `byte          x            4h
short       `short         h            5h
int         `int           i            6h
long        `long          j            7h
real        `real          e            8h
float       `float         f            9h
char        `char          c            10h
symbol      `              s            11h
month       `month         m            13h
date        `date          d            14h
datetime    `datetime      z            15h
minute      `minute        u            17h
second      `second        v            18h
time        `time          t            19h

type

The monadic function type can be applied to any entity in q to find its (numeric) short data type. It is a quirk of q that the data type of an atom is a short with the negative of the value in the fourth column above.

        type 42
-6h
        type 1b
-1h
        type 4.2
-9h
        type 4h
-5h
        type `42
-11h
        type "4"
-10h
        type 2007.04.02
-14h

Observe that infinities also carry a type.

        type 0W
-6h
        type -0w
-9h

The type of a simple list is a short containing the positive value of the type of its constituent atoms.

        type 1 2 3
6h
        type "abc"
10h
        type 1 2 3f
9h

The type of any general list is 0h.

        type (1;2h;3j)
0h
        type (1;2;(3 4))
0h
        type (`1;"2";3)
0h
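
The sign convention makes it easy to distinguish atoms from lists programmatically. The following is a small sketch; the helper name isAtom is invented for this example and is not a built-in.

        type each (1b;4.2;"4";`a)
-1 -9 -10 -11h
        isAtom:{0>type x}        / atoms have negative type
        isAtom each (42;1 2 3;`a;"abc")
1001b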

Type of a Variable

How q handles the type of a variable may be confusing to those coming from verbose languages. In many typed languages, the variable's type must be specified before the variable is assigned a value - that is, when it is declared. In q, a variable is assigned without declaration. The variable can subsequently be reassigned a new value of a different type.

        a:42
        type a
-6h
 
        a:98.6
        type a
-9h

This can be understood by considering that q treats a variable as a name (symbol) associated with a value. The association is made upon assignment. A variable has the type of the value associated with its name.

In the example at hand, a variable with name 'a' is created when the initial assignment is made. Since this is the first time that the name 'a' is assigned, the q interpreter creates an entry for 'a' in its dictionary of variable names and associates it with the int value 42. On the second assignment, there is already an entry for 'a' in the dictionary, so this name is simply re-associated with the float value 98.6.

When you ask q for the type of a variable, it returns the type of the value associated with the variable's name. Thus, when you reassign the variable, the type of the variable reflects the type of its new value.

Cast ($)

As in verbose languages, it is possible to cast an entity from one type to another, provided the underlying values are compatible. In those languages, such a cast informs the compiler that you want it to consider the variable to be of the specified type for subsequent operations, and it may result in a compile-time or run-time error if it cannot be performed.

The q cast operator, denoted $, is a binary verb that is atomic in its right operand, the source value, and whose left operand is the target type. The target can be specified using any of the three type designators in the Basic Types table.

  • The type's (positive) numeric short value
  • A char type value
  • A type name symbol

First, examples using the numeric type.

        5h$42
42h
        6h$4.2
4

This form is useful when the target type is obtained programmatically using the type function.

It is arguably more readable to use the type's char in a cast.

        "i"$4.2
4
        "x"$42
0x2a
        "d"$2004.04.02T04:02:24.042
2004.04.02

The most readable (but longest) form uses the symbolic type name.

        `int$4.2
4
        `short$42
42h
        `date$2004.04.02T04:02:24.042
2004.04.02
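
The examples above all cast 4.2, which could suggest that casting a float to an integer truncates. As far as we are aware, the cast rounds to the nearest integer instead, and floor should be used when truncation toward negative infinity is wanted. A quick check:

        5=`int$4.8               / cast rounds to the nearest integer
1b
        floor 4.8                / floor discards the fractional part
4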

The result of casting between superficially distinct types can be uncovered by considering the underlying numeric values. Chars correspond to their underlying ASCII sequence; dates to their offset from Jan 1, 2000; and times to their count of milliseconds.

        "c"$0x42
"B"
        `date$42
2000.02.12

Because cast is atomic in its right operand, it is extended item-wise to a list.

        "x"$(10 20 30;255)
0x0a141e
0xff

Cast is also atomic in its left operand.

        5 6 7h$42
42h
42
42j

Advanced: When integral infinities are cast to integers of wider type, they are considered to be their underlying bit patterns. Since these bit patterns are legitimate values for the wider type, the cast results in a finite value.

        "i"$0Wh
32767
        "i"$-0Wh
-32767
        "j"$-0W
-2147483647j
        "j"$0W
2147483647j

Creating Symbols from Strings

Casting from a string (i.e., a list of char) to a symbol is a convenient way to create symbols. It is the preferred way to create symbols with embedded blanks or other special characters. To cast a char or a string to a symbol, use the empty symbol (`) as the target domain.

        `$"z"
`z
        `$"Zaphod Beeblebrox"
`Zaphod Beeblebrox
        `$("Life";"the";"Universe";"and";"Everything")
`Life`the`Universe`and`Everything

Cast is atomic in both operands.

A string is trimmed as part of the cast.

        `$"   abc   "
`abc
        string `$"   abc   "
"abc"

Parsing Strings to Data

Cast can also be used to parse data from a string by using an upper case type char in the left argument.

        "I"$"4267"
4267
        "T"$"23:59:59.999"
23:59:59.999

Date string parsing is flexible with respect to the format of the date.

        "D"$"2007-04-24"
2007.04.24
        "D"$"12/25/2006"
2006.12.25
        "D"$"07/04/06"
2006.07.04
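
When a string cannot be parsed as the requested type, the result is a null of that type rather than an error, so it is worth checking parsed input with null. A minimal sketch with one valid and one invalid string:

        null "I"$("42";"abc")    / the second string is not a valid int
01b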

Coercing Types

Casting can be used to coerce type-safe assignment. Recall that assignment into a simple list must strictly match the type.

        c:10 20 30 40
        c[1]:42h
'type

This situation can arise when the list and the assignment value are created dynamically. You can coerce the type by casting it to that of the target.

        c[1]:(type c)$42h
        c
10 42 30 40
 
        c[0 1 3]:(type c)$(1.1; 42j; 0x2a)
        c
1 42 30 42

Creating Typed Empty Lists

We met the empty list in Lists. Observe that it has type 0h, meaning that it is a general list whose elements have no specific type,

        type ()
0h

This empty list can be considered as the degenerate case of a general list, so we call it the general empty list. In situations where type enforcement is desired, it is necessary to have an empty list with a specific type. Casting the general empty list using a symbolic type name makes this clear.

        L1:`int$()
        type L1
6h
        L2:`float$()
        type L2
9h
        L3:`$()
        type L3
11h

A typed empty list is the degenerate case of a simple list of the specified type. This is useful because type matching is enforced when you append items.

        L1,:4.2
'type
        L1,:42
        L1
,42

Enumerations

We have seen that the dyadic casting operator ($) transforms its right operand into a conforming entity of the type specified by the left operand. In the basic operation, the left operand can be a char type abbreviation, a type short, or a symbol type name. In this section, casting is extended to user-defined target domains, providing a functional version of enumerated types.

Traditional Enumerations

To begin, recall that in some verbose languages, an enumerated type is a way of associating a series of names with a corresponding set of integral values. Often the sequence of numbers is consecutive and begins with 0. The specific set of names/values is called the domain of the enumerated type and its name identifies the enumeration.

A traditional enumerated type serves multiple purposes.

  • It allows a descriptive name to be used instead of an arbitrary number - e.g., 'blue' instead of 3.
  • It permits strong type checking to ensure that only permissible values are supplied - i.e., choosing a named color from a list instead of remembering a number is less prone to error.
  • It can provide name spaces, meaning the same name can be reused in different domains without fear of confusion - e.g., color.blue and mood.blue.

There is a subtler, more powerful use: an enumeration normalizes data.

Data Normalization

Broadly speaking, data normalization seeks to eliminate duplicates and retain the minimum amount of data. Suppose you know that you will have a list (in either the colloquial or the q sense) of text entries taken from a fixed and reasonably short set of values. Storing a long list of such strings verbatim presents two problems.

  • Values of variable length complicate storage management for the list
  • There is potentially much duplication of data in the list arising from repeated values

An enumeration solves both problems.

To see how, we start with the case of a q list v containing arbitrary symbols representing character values. Let u be the unique values in v. This is achieved with the distinct function (see Appendix A for a detailed description).

        u:distinct v

Let's try a simple example.

        v:`c`b`a`c`c`b`a`b`a`a`a`c
        u:distinct v
        u
`c`b`a

Observe that the order of the items in u is the order of their first appearances in v.

Now consider a new list k that represents the positions in u of each of the items in v. This is achieved with the find (?) operator (see Find).

        k:u?v
        k
0 1 2 0 0 1 2 1 2 2 2 0

Then we have,

        u[k]
`c`b`a`c`c`b`a`b`a`a`a`c
 
        v~u[k]
1b

We observe that u and k indeed normalize the data of v. In general, v will have many repetitions of each of the underlying values, but u stores each value once. Changing an underlying value requires only one operation in the normalized version but potentially many updates to the non-unique list.

Extra credit for recognizing that v is simply the composite map u◦k. Effectively, we have factored the non-unique list v through the unique list u via the index map k.

        v = u◦k

Why would we want to do this? Easy: compactness and speed.

Advanced: Let's say that the count of u is a and the maximum width (in the colloquial sense) of the symbols in u is b. For a list v of variable count x, the amount of storage required is potentially

        b*x

For the factored form, the storage is known to be

        a*b+4*x

which represents the fixed amount of storage for u plus the variable amount of storage for the simple integer list k. If a is small and b is even moderately large, the factorization is significantly smaller.

This can be seen by comparing the sizes of v, u and k in a slightly modified version of our example.
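
To make the comparison concrete, here is the arithmetic for one hypothetical case: 3 distinct symbols of width 7 and a list of one million items. The figures are invented for illustration. Note that the formulas above are written in conventional mathematical notation; in q, which evaluates right to left, the product a*b must be parenthesized.

        a:3; b:7; x:1000000
        b*x                      / unfactored: every item stored at full width
7000000
        (a*b)+4*x                / factored: unique values plus 4 bytes per index
4000021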

        v:`ccccccc`bbbbbbb`aaaaaaa`ccccccc`ccccccc`bbbbbbb
        u:distinct v
        u
`ccccccc`bbbbbbb`aaaaaaa
 
        k:u?v
        k
0 1 2 0 0 1

Now imagine v and k to be much longer.

Reading and writing the factored index list from/to disk is a block operation that will be very fast.

Assuming that items of v are symbols stored in a hash-table, item indexing in the un-factored list requires looking up each symbol. Indexing into the factored list can be done directly via position since it is a uniform list of integers. This will be faster.

Enumerations

Enumeration encapsulates the above factorization of an arbitrary list of symbols through a list of unique values. An enumeration uses the binary cast operator ($) and is a generalization of the basic cast between types.

The general form of an enumerated value is,

        `u$v

where u is a simple list of unique symbol values and v is either an atom in u or a list of such. The projection `u$ is the enumeration, u is the domain of the enumeration and `u$v represents the enumerated value(s).

Under the covers, applying the enumeration `u$ to a vector v actually factors v through u as in the previous section. The resulting index list k is stored internally and the lookup is performed automatically.

Working with an Enumeration

We recast our factorization example as an enumeration,

        u:`c`b`a
        v:`c`b`a`c`c`b`a`b`a`a`a`c
        ev:`u$v
        ev
`u$`c`b`a`c`c`b`a`b`a`a`a`c

While the display of the enumeration ev shows the values of v within the domain u, only the implicit int index list is actually stored.
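
The stored indices can be inspected by casting the enumeration to an integer type, and the underlying symbols can be recovered with value. The following sketch assumes both behave this way in your version of q; it compares the indices with the list u?v computed by hand in the previous section.

        u:`c`b`a
        v:`c`b`a`c`c`b`a`b`a`a`a`c
        ev:`u$v
        all (u?v)=`int$ev        / the stored indices match the hand-built index list
1b
        v~value ev               / de-enumerating recovers the original symbols
1b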

The enumeration ev acts just like the original v.

        v[3]
`c
 
        ev[3]
`u$`c
 
        v[3]:`b
        v
`c`b`a`b`c`b`a`b`a`a`a`c
 
        ev[3]:`b
        ev
`u$`c`b`a`b`c`b`a`b`a`a`a`c
 
        v=`a
001000101110b
 
        ev=`a
001000101110b
 
        v in `a`b
011101111110b
 
        ev in `a`b
011101111110b

Note: While the enumeration is item-wise equal to the original, and can be freely substituted for it, the two are not identical.

        v=ev
111111111111b
 
        v~ev
0b

The find operator ( ? ) can be used with an enumeration to locate the first position of specific values.

        v?`a
2
        ev?`a
2

The function where can be used to find all occurrences of a specific value.

        where v=`a
2 6 8 9 10
 
        where ev=`a
2 6 8 9 10

Updating an Enumeration

The normalization provided by an enumeration reduces updating all occurrences of a value to a single operation. This can have significant performance implications for large lists with many repetitions.

With u, v and ev as originally defined (before the amendments at position 3),

        u[1]:`x
        ev
`u$`c`x`a`c`c`x`a`x`a`a`a`c
 
        v
`c`b`a`c`c`b`a`b`a`a`a`c

To make the equivalent update to v, it is necessary to change every occurrence.

        v[where v=`b]:`x
        v
`c`x`a`c`c`x`a`x`a`a`a`c

Appending to an Enumeration

One situation in which an enumeration is more complicated than working with the denormalized data is when you want to add a new value. Continuing with the example above, appending a new item to v is a single operation but this is not the case for the corresponding enumeration ev.

        u:`c`b`a
        v:`c`b`a`c`c`b`a`b`a`a`a`c
        ev:`u$v
        v,:`d
        v
`c`b`a`c`c`b`a`b`a`a`a`c`d
 
        ev,:`d
'cast

What went wrong? The new value must first be added to the unique list.

        u,:`d
        ev,:`d
        ev
`u$`c`b`a`c`c`b`a`b`a`a`a`c`d

You may have already recognized that this presents a complication in practice. Because you may not know whether the value you are appending to v is already in u, in order to maintain uniqueness in u you must test this before appending.

Fortunately, q has anticipated this situation. When dyadic ? is used with the name of a (simple) list of symbols as its left argument and a symbol as its right argument, it appends the symbol to the list if and only if it is not already an item in the list.

        u
`c`b`a`d
 
        `u?`a
`u$`a
 
        u
`c`b`a`d
 
        `u?`e
`u$`e
 
        u
`c`b`a`d`e

If you wish to append items to an enumerated value programmatically, simply add to the unique list using ? before appending to the enumerated value.

        u:`c`b`a
        v:`c`b`a`c`c`b`a`b`a`a`a`c
        ev:`u$v
 
        `u?`e
`u$`e
        ev,:`e
 
        u
`c`b`a`e
 
        ev
`u$`c`b`a`c`c`b`a`b`a`a`a`c`e
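Because `u? returns the enumerated symbol as well as extending the domain when needed, the two steps above can be collapsed into one append. The following is a minimal sketch using a shortened ev; it assumes only the behavior of ? shown above.

```q
u:`c`b`a
ev:`u$`c`b`a`c
ev,:`u?`e     / `e is new: u is extended, then `u$`e is appended
ev,:`u?`a     / `a is already in u: u is left unchanged
u             / `c`b`a`e
value ev      / `c`b`a`c`e`a
```

This form avoids having to test membership in u before each append.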

Resolving an Enumeration

If you are given an enumerated value, you can recover the original value by applying value. In our example,

        ev
`u$`c`b`a`c`c`b`a`b`a`a`a`c
 
        value ev
`c`b`a`c`c`b`a`b`a`a`a`c

Type of an Enumeration

Each enumeration is assigned a new numeric data type, beginning with 20h. If you start a new q session and load no script files, you will observe the following.

        u1:`c`b`a
        u2:`2`4`6`8
        u3:`a`b`c
        u4:`c`b`a
 
        type `u1$`c`a`c`b`b`a
20h
        type `u1$`a`a`b`b`c`c
20h
        type `u2$`8`8`4`2`6`4
21h
        type `u3$`c`a`c`b`b`a
22h
        type `u4$`c`a`c`b`b`a
23h

Note: Enumerations with distinct domains are distinct, even when the domains match.

        u1~u4
1b
        v:`c`a`c`b`b`a
        (`u1$v)~`u4$v
0b

7. Dictionaries

Overview

Dictionaries are a generalization of lists and provide the foundation for tables. A dictionary is a (mathematical) mapping defined by an explicit I/O association between a domain list and range list. The two lists must have the same count and the domain list should be a unique collection. While general lists can be used to create a dictionary, many useful dictionaries involve lists of special forms. The domain is frequently a collection of symbols representing names. As we shall see, a dictionary whose domain is a unique list of symbols and whose range is rectangular corresponds to a table.

Dictionary Basics

A dictionary is an ordered collection of key-value pairs, that is, roughly what more verbose languages call a hashtable.

Definition

A dictionary, also called an association, is a mapping defined by an explicit I/O association between a domain list and a range list via positional correspondence. The creation of a dictionary uses the binary operator ( ! ),

Ldomain!Lrange

Recall from Mathematical Functions Refresher the view of a map's I/O table as a pair of input and output columns. Dictionary notation is simply the map's I/O table turned on its side for ease of entry and compactness of display.

Note: All dictionaries have type 99h.

The domain list comprises the keys of the dictionary and the range list its values. The keys of a dictionary are retrieved by the unary primitive key and the values by the unary primitive value. The count of the dictionary is the (common) count of its keys and values.

Note: Although q does not enforce the requirement that the key items are unique, a dictionary does provide a unique output value for each input value, thus guaranteeing a well-defined mathematical map. See below for details.

The most basic dictionary maps a simple list to a simple list. The following I/O table represents a mapping of three symbols containing names to the corresponding individual's intelligence quotient,

I            O
`Dent        98
`Beeblebrox  42
`Prefect     126

This mapping is defined compactly as a dictionary.

        d:`Dent`Beeblebrox`Prefect!98 42 126
        count d
3
        key d
`Dent`Beeblebrox`Prefect
 
        value d
98 42 126

The console displays a dictionary I/O table in columnar form.

        d
Dent      | 98
Beeblebrox| 42
Prefect   | 126

The function cols also returns the domain.

        cols d
`Dent`Beeblebrox`Prefect

Note: The order of the items in the domain and range lists is significant, just as positional order is significant for lists. Although the I/O assignments and the associated mappings are equivalent regardless of order, differently ordered dictionaries are not identical.

        d1:`Prefect`Beeblebrox`Dent!126 42 98
        d~d1
0b

Lookup

Finding the dictionary output value corresponding to an input value is called looking up the input. This is actually achieved via a hash-table lookup under the covers. Similar to functions and lists, both d[x] and d x look up the output value for x.

        d[`Beeblebrox]
42
        d `Beeblebrox
42

As with item indexing, lookup of a key not in the domain of a dictionary results in an appropriately typed null value, not an error.

        d[`Slartibartfast]
0N

As with lists and functions, key lookup in a dictionary is extended item-wise to a simple list of keys.

        d[`Dent`Prefect]
98 126

Advanced: We can interpret key list lookup as the composition of the key lookup map with the item indexing map. Symbolically, let d be a dictionary and K a key list in the domain of d. Then for 0 ≤ j < count K,

        d[K][j] = d[K[j]]

Using one of our examples,

        d:`Dent`Beeblebrox`Prefect! 98 42 126
        K:`Dent`Prefect
        d[K][1]
126
        d[K[1]]
126

Or, using the entire index list,

        d K
98 126
        d[K]
98 126

Dictionary vs. List

A dictionary is a generalization of a list in which item indexing has been extended to a non-integral domain. In particular, a dictionary cannot be indexed implicitly via position. Attempting this on a dictionary whose domain is not integral generates an error.

        d:"abcde"!1.1 2.2 3.3 4.4 6.5
        d["c"]
3.3
        d[0]
'type

We can define a dictionary whose lookup emulates the mapping of list item indexing.

        L3:`one`two`three
        L3[1]
`two
        d3:0 1 2!`one`two`three
        d3[1]
`two

When we ask q to compare the two entities for equality, it obliges by considering both as mappings with integral domain. It then tests the assignments item-wise.

        L3=d3
0| 1
1| 1
2| 1

However, the dictionary so specified is not the same as the list.

        L3~d3
0b

Although retrieving items from a list-like dictionary is notationally identical to item indexing, it is not the same. Item indexing is a positional offset, whereas dictionary retrieval is a lookup. They are implemented differently under the covers.

Lookup with Verb @

Recall that indexing into a list can be achieved with verb @.

        L:100 200 300
        L[1]
200
        L@1
200

The same syntax works for dictionary lookup.

        d:`a`b`c!10 20 30
        d[`b]
20
        d@`b
20

Uniqueness of Keys

We noted earlier that q does not enforce uniqueness in a dictionary domain list. In the event of a repeated domain item, only the output value associated with the first occurrence in left-to-right order is accessible via lookup. This guarantees that a dictionary provides a unique output for each input value and is thus a well-defined mathematical map.

For example,

        ddup:8 4 8 2 3 1!`one`two`three`four`five`six
        ddup[8]
`one

Advanced: Reverse lookup works properly for a non-unique domain.

        ddup?`three
8

Non-simple Domain or Range

The range values of a dictionary are not required to be atoms. The range can be a general list that contains nested lists.

        dgv:(1;2h;3.3;"4")!(`one;2 3;"456";(7;8 9))
        dgv["4"]
7
8 9

Nor are keys required to be atoms.

        dgk:(0 1; 2 3)!`first`second
        dgk[0 1]
`first
        dgk[2 3]
`second

Advanced: If the keys are not a list of items of uniform shape, lookup does not work in a useful way.

        dweird:(0 1; 2; 3)!`first`second`third
        dweird[0 1]
`first
        dweird[2]
`
        dweird[3]
`

The observed behavior is that key lookup fails at the first key of different shape.

Extracting a Sub-Dictionary by Key

Dictionary lookup on a key or a list of keys returns the associated values. It is also possible to extract the key-value associations using the take operator (#). The left operand is a list of keys, the right operand is the source dictionary and the result is a new dictionary whose mapping is that of the original restricted to the specified keys.

        (enlist `c)#d
c| 30
 
        `a`c#d
a| 10
c| 30

This works when the keys are not simple.

        dns:(1 2; 3 4; 5 6)!("onetwo"; "threefour"; "fivesix")
        (1 2; 5 6)#dns
1 2| "onetwo"
5 6| "fivesix"

Operations on Dictionaries

Amend and Upsert

As with lists, the items of a dictionary can be modified via indexed assignment.

        d:10 20 30!"abc"
        d[30]:"x"
        d
10| a
20| b
30| x

Important: In contrast to lists, dictionaries can be extended via index assignment. For example,

        d[40]:"y"
        d
10| a
20| b
30| x
40| y
 
        L:"abc"
        L[3]:"x"
'length

Let's examine this capability to modify or extend a dictionary via index assignment more closely. Let d be a dictionary, c be an atom whose type matches the domain of d, and x an item whose type is compatible with the range of d. The assignment,

        d[c]:x

updates the existing range value if c is in the domain of d, but inserts a new entry at the end of the dictionary if c is not in the domain of d.

This insert/update behavior is called upsert semantics. Because tables are essentially dictionaries, upsert semantics carry through to tables.
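The two branches of upsert semantics can be seen side by side in a short sketch. The dictionary and keys here are illustrative only.

```q
d:`a`b`c!10 20 30
d[`b]:21      / `b is in the domain: the existing value is updated
d[`z]:99      / `z is not in the domain: a new entry is appended
key d         / `a`b`c`z
value d       / 10 21 30 99
```

The same assignment syntax therefore covers both cases, and the new entry always lands at the end.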

Reverse Lookup with Find (?)

Recall that the dyadic primitive find ( ? ) returns the index of the right operand in a list.

        1001 1002 1003?1002
1

Extending this concept to dictionaries means reversing the domain-to-range mapping. We expect ? to perform reverse lookup by mapping a range element to its domain element.

        d:`a`b`c!1001 1002 1003
        d?1002
`b

The result of find on an entity not in the range is a null whose type matches the domain list. For simple lists, the null matches the type of the list; for general lists, the null is 0N.

        d?1004        / the result is the null symbol `
`
        dg:(1;`a;"z")!10 20 30
        dg?50
0N

Note: For a non-unique range element, find returns the first item mapping to it from the domain list.

        d:`a`b`c`d!1001 1002 1003 1002
        d?1002
`b
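When every key mapping to a value is wanted rather than just the first, one option is to combine a comparison with where, which on a boolean dictionary returns the keys whose values are true. A minimal sketch, reusing d from the example above:

```q
d:`a`b`c`d!1001 1002 1003 1002
d?1002          / `b, the first matching key only
where d=1002    / `b`d, every key whose value matches
```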

Removing Entries

The binary operation delete ( _ ) returns the result of removing an entry from a dictionary by key value. The left operand of delete is the dictionary (target) and the right operand is a key value whose type matches that of the domain of target.

Note: Whitespace is required to the left of _ if the first operand is a variable.

For example,

        d:1 2 3!`a`b`c
        d _2
1| a
3| c

Observe that attempting to remove a key that does not exist has no effect.

        d _42
1| a
2| b
3| c

Delete ( _ ) can also remove multiple entries from a dictionary. In this form the left operand is a list of key values whose type matches that of the dictionary domain and the right operand is the dictionary (target). The result is a dictionary obtained by removing the specified key-value pairs from target.

Note: Whitespace is also required to the left of _ if the first operand is a variable.

Note: Since the left operand is required to be a list, a single key value must be enlisted.

For example,

        d:1 2 3!`a`b`c
        (enlist 2)_d
1| a
3| c
        1 3_d
2| b
        (enlist 42)_d
1| a
2| b
3| c

Attempting to remove a key that does not exist has no effect.

        4 5_d
1| a
2| b
3| c

Observe that removing all the entries in a dictionary leaves a dictionary with empty domain and range lists of the appropriate types.

        1 2 3_d

The binary operator cut is the same as ( _ ) on a dictionary.

        (enlist 2) cut d
1| a
3| c

Primitive Operations

Because dictionaries are maps, it is possible to compose their mappings with function mappings to perform operations on dictionaries. Of course, this assumes that the range of each dictionary is in the domain of the indicated operation, so that the operation makes sense. The application of a unary operator, or of a binary operator with an atom, is straightforward.

        d1:`a`b`c!1 2 3
        neg d1
a| -1
b| -2
c| -3
        2*d1
a| 2
b| 4
c| 6
        d1=2
a| 0
b| 1
c| 0

When the domains of two dictionaries are identical, performing binary operations is straightforward. For example, to add two dictionaries with a common domain, add their corresponding range elements,

        d2:`a`b`c!10 20 30
        d1+d2
a| 11
b| 22
c| 33

How do we combine two dictionaries whose domains are not identical? First, the domain of the resulting dictionary is the union of the domains of its operands. For items in the intersection of the domain lists, clearly we should simply apply the indicated operation on the corresponding range items.

The real question is, what to do on non-common domain items? The answer: do what makes sense for the operation. We start with joining two dictionaries.

Join

In the simple case of joining two disjoint dictionaries, the result should be the merge.

        d3:`e`f`g!100 200 300
        d1,d3
a| 1
b| 2
c| 3
e| 100
f| 200
g| 300
 
        d3,d1
e| 100
f| 200
g| 300
a| 1
b| 2
c| 3

Observe that although the mappings arising from opposite-order joins have equivalent input-output assignments, the dictionaries are not identical because order is significant.

We examine another simple example of joining dictionaries with a special form. The particular dictionaries map symbols to lists of simple lists. When the two are disjoint the result should again be the merge. For example,

        dc1:`a`b!(1 2 3; 10 20 40)
        dc2:(enlist `c)!enlist 10 20 30
        dc1,dc2
a| 1  2  3
b| 10 20 40
c| 10 20 30

As in the previous example, join simply appends the domains and ranges in the obvious way. We shall refer to this case later.

Now we tackle the case of non-disjoint dictionaries. The issue is how to merge items that are common to both dictionary domains, since these elements each have two I/O assignments.

Important: In a join of dictionaries, the right operand's I/O assignment prevails for common domain elements.

The result is another illustration of upsert semantics. Each I/O assignment of the right operand is applied as an update if the domain element is assigned in the left operand, or as an insert if the domain element is not already assigned.

With d1 as above,

        d3:`c`d!33 44
        d1,d3
a| 1
b| 2
c| 33
d| 44

Observe that upsert is not commutative, even over a common domain. Join order matters.

        d4:`a`b`c!300 400 500
        d1,d4
a| 300
b| 400
c| 500
        d4,d1
a| 1
b| 2
c| 3
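Because the right operand prevails, join gives a compact way to lay overrides on top of defaults. The sketch below uses hypothetical names (defaults, overrides, settings) and values that do not appear in the original text.

```q
defaults:`host`port`timeout!(`localhost;5010;30)   / hypothetical settings
overrides:`port`timeout!(5042;60)                  / hypothetical overrides
settings:defaults,overrides
settings[`host]       / `localhost, untouched because it is not overridden
settings[`port]       / 5042, taken from the right operand
settings[`timeout]    / 60, taken from the right operand
```

Reversing the operands would let the defaults overwrite the overrides, which is why the order of the join matters.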

Arithmetic Operations

Now that we understand how to join two dictionaries, we examine other operations. When arithmetic and comparison operations are performed on dictionaries, the indicated operation is performed on the common domain elements and the dictionaries are merged elsewhere,

        d5:`c`x`y!1000 2000 3000
        d1+d5
a| 1
b| 2
c| 1003
x| 2000
y| 3000
        d1*d5
a| 1
b| 2
c| 3000
x| 2000
y| 3000
        d1|d5
a| 1
b| 2
c| 1000
x| 2000
y| 3000

When a relational operation is performed on two dictionaries, the indicated operation is performed over the entire union domain. Effectively, each dictionary is extended to the union domain with (type-matched) nulls. Otherwise put, for non-common domain items, the operation is performed on a pair of items in which a null whose type matches the provided range item is substituted for the missing range item.

In the following examples, observe that operations on d1 and d6 are equivalent to the corresponding operations on d11 and d66,

        d1:`a`b`c!1 2 3
        d6:`b`c`d`e!22 3 44 55
        d1=d6
a| 0
b| 0
c| 1
d| 0
e| 0
        d1<d6
a| 0
b| 1
c| 0
d| 1
e| 1
        d6<d1
b| 0
c| 0
d| 0
e| 0
a| 1
        d1>d6
b| 0
c| 0
d| 0
e| 0
a| 1
 
        d11:`a`b`c`d`e!1 2 3 0N 0N
        d66:`a`b`c`d`e!0N 22 3 44 55
        d11=d66
a| 0
b| 0
c| 1
d| 0
e| 0
        d11<d66
a| 0
b| 1
c| 0
d| 1
e| 1
        d66<d11
a| 1
b| 0
c| 0
d| 0
e| 0
        d11>d66
a| 1
b| 0
c| 0
d| 0
e| 0

Note: The > operation is evidently converted to the equivalent < operation with reversed operands.

Column Dictionaries

Column dictionaries are the foundation for tables.

Definition and Terminology

A very useful type of dictionary is one that maps a list of symbols to a rectangular list of lists. Such a dictionary has the form,

c1...cn!(v1;...;vn)

where each ci is a symbol and the vi are lists with common count. Such a dictionary associates the symbol ci with the list of values vi.

Interpreting each symbol as a column name and the corresponding vector as the column values, we call such a dictionary a column dictionary. The type of the column named by ci is the type of its value list vi. For many column dictionaries, the vi are all simple lists, meaning that each column is a vector of atoms of uniform type. We call this a simple column dictionary.

Simple Example

Let's reorganize the example of the previous section as a simple column dictionary.

        scores:`name`iq!(`Dent`Beeblebrox`Prefect;42 98 126)

In this dictionary, the values for the name column are,

        scores[`name]
`Dent`Beeblebrox`Prefect

It is possible to retrieve the values for a column in a column dictionary using dot notation.

        scores.name
`Dent`Beeblebrox`Prefect

The value in row 1 of the name column is,

        scores[`name][1]
`Beeblebrox

Similarly, the value in row 2 of the iq column is,

        scores[`iq][2]
126

The console display of the dictionary shows the mapping clearly.

        scores
name| Dent Beeblebrox Prefect
iq  | 42   98         126

Accessing Values

For a general column dictionary defined as,

dcols:c1...cn!(v1;...;vn)

the ith element of column cj is retrieved by,

dcols[cj][i]

What should we make of the following notation?

dcols[cj;i]

We can interpret it in three ways:

  • Indexing at depth in the dictionary
  • A generalization of a two dimensional matrix in which item indexing in the first dimension has become lookup into the list of column names
  • A dyadic mapping

All three interpretations are equivalent and give the same result,

dcols[cj][i]

In our example,

        scores[`iq][2]
126
        scores[`iq; 2]
126

Rows and Columns

Viewing the dictionary as a dyadic function, we can project onto its first argument by fixing it to obtain the monadic function dcols[cj;]. This projected form yields item indexing into the column list.

In simple terms, projecting onto the first argument retrieves a vector of column values from a column dictionary.

        scores[`iq;]
42 98 126

Analogously, we would expect projection onto the second argument to retrieve a "row" corresponding to the values in the ith position of each column vector. What form does such a row take?

Observe that the projection of the dyadic function onto its second argument by fixing the item index,

dcols[;i]

is a monadic function corresponding to generalized indexing by column name, i.e., dictionary lookup. Thus, we expect the ith row to be a dictionary that maps each column name to the value in that column's ith row.

This is exactly what we find.

        scores[;2]
name| `Prefect
iq  | 126

Notational differences aside, this resembles the result of retrieving a record from a table using a SQL query: we get the column names and the associated row values.

A column dictionary seems to be the perfect data structure to serve as the basis for a table: a generalized matrix with indexed rows and named columns. But you no doubt notice the fly in the ointment: the indices are in the wrong order. It is unnatural to retrieve a column in the first index and a row in the second.

Column Dictionary with a Single Column

The domain of a column dictionary must always be a list of symbols and the range must be a list of column vectors. Consequently, when there is only one column you must enlist the domain and range. The following is a valid column dictionary (the parentheses are necessary),

        ds:(enlist `c)!enlist 100 200 300

The following dictionary that maps a symbol to a list is not a valid column dictionary,

        dnot:`c!1 2 3

Flipping a Dictionary

Transpose of a Column Dictionary

A column dictionary can be viewed as a generalized rectangular matrix. Let d be a column dictionary defined as,

d:c1...cn!(v1;...;vn)

where ci is a symbol and the vi have common count, say m. We can index at depth into d for each ci and each j,

d[ci;j] = vi[j]

Since all the vi have count m, in analogy with matrices, it makes sense to define the transpose t of d by the formula,

t[j;ci] = d[ci;j]

Exactly what is the t so defined? The answer comes from realizing that indexing at depth into t should be the same as repeated indexing,

t[j;ci] = t[j][ci]

The right hand side of this equation makes explicit that t is a list of m items t[j], for 0≤j<m.

What is each item in the list t? Combining the three previous equations, we see that,

t[j][ci] = vi[j]

Now fix j in this equation. We see that t[j] is a dictionary with the same domain as d, meaning the list of ci. This dictionary assigns to each item ci the output value vi[j]. Thus, the range of the dictionary is the collection of values v1[j],...,vn[j].

We summarize our findings,

  • The transpose of a column dictionary is a list of dictionaries.
  • The dictionaries in the transpose have as common domain the column names of the original dictionary.
  • The dictionary in the jth item of the transpose maps the column names to the jth row of values across the column vectors.

Flip of a Column Dictionary

As in the case of lists, the transpose of a dictionary is obtained by applying the unary flip operator,

        flip d

When flip is applied to a column dictionary, no data is actually rearranged. The console display confirms the transposition of rows and columns.

        d:`name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
        flip d
name       iq
--------------
Dent       98
Beeblebrox 42
Prefect    126

The net effect of flipping a column dictionary is simply reversing the order of the indices. This is logically equivalent to transposing rows and columns.
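The transpose formula t[j;ci] = d[ci;j] can be checked directly. This is a small sketch using the same d as above and an arbitrary row index of 1.

```q
d:`name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
t:flip d
type t[1]               / 99h: each item of t is a dictionary
t[1;`iq]~d[`iq;1]       / 1b: indexing at depth with the indices reversed
t[1][`name]~d[`name;1]  / 1b: repeated indexing gives the same item
```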

Flip of a Flipped Column Dictionary

If you transpose a dictionary twice, you obtain the original dictionary,

        d~flip flip d        / true for any column dictionary d
1b

Consequently, if you are given t, the transpose of a column dictionary, and you flip it, you obtain a column dictionary.

        t:flip d        / pretend you didn't see this step
        flip t
name| Dent Beeblebrox Prefect
iq  | 98   42         126

Advanced: As of this writing (Jan 2007), flip has been implemented in q for dictionaries of columns, although the operation makes sense for any rectangular dictionary. In the event that flip is implemented for a general rectangular dictionary (i.e., any dictionary in which the range is a list of lists all having the same count) we would find the following:

The transpose of a rectangular dictionary is a list of dictionaries. The dictionaries in the transpose have a common domain that is the domain of the original dictionary. The jth dictionary of the transpose maps the original domain to the jth row of values across the range list.

In this case, data likely will be rearranged.

8. Tables

Overview

Tables form the basis for kdb+. A table is a collection of named columns implemented as a dictionary. Consequently, q tables are column-oriented, in contrast to row-oriented tables in relational databases. Moreover, a column's values in q comprise an ordered list; this contrasts to SQL, in which the order of rows is undefined. The fact that q tables comprise ordered column lists makes kdb+ very efficient at storing, retrieving and manipulating sequenced data. One important example is data that arrives in time sequence.

Kdb+ handles relational and time series data in the unified environment of q tables. There is no separate data definition language, no separate stored procedure language and no need to map internal representations to a separate form for persistence. Just q tables, expressions and functions.

Tables are built from dictionaries, so it behooves the cursory reader to review Dictionaries before proceeding.

Table Definition

Table is the flip of Column Dictionary

You undoubtedly realized at the end of Dictionaries that a table is implemented as a column dictionary that has been flipped (i.e., transposed). The only effect of flipping the column dictionary is to reverse the order of its indices; no data is rearranged under the covers.

Note: All tables have type 98h.

For example,

        d:`name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
        d[`iq;]
98 42 126
 
        d[;2]
name| `Prefect
iq  | 126
 
        d[`iq; 2]
126
 
        t: flip `name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
        t[;`iq]
98 42 126
 
        t[2;]
name| `Prefect
iq  | 126
 
        t[2;`iq]
126

To access items in a table t created by flipping a column dictionary d, simply reverse the order of the arguments in the projections of d. We also reverse the roles of i and j compared to dictionaries to make things more natural from the table perspective.

t[i;]    / row i is a dictionary mapping column names to values
t[i]     / ith element of list t...same as previous
t[;cj]   / vector of column values for column cj

This validates the implementation of a table as a flipped column dictionary. Retrieving rows and columns conforms to conventional matrix notation in which the first index denotes the row and the second index the column.

Table Display

Observe that rows and columns of a table display are indeed the transpose of the dictionary display, even though the internal data layout is the same.

        d
name| Dent Beeblebrox Prefect
iq  | 98   42         126
 
        t
name       iq
--------------
Dent       98
Beeblebrox 42
Prefect    126

Table Definition Syntax

Table definition can also be accomplished using a syntax that manifests the columns,

([] c1:L1;...;cn:Ln)

Here ci is a symbol containing a column name and Li is the corresponding list of column values. The Li are lists of equal count, but in some circumstances can be atoms. The purpose of the square brackets is to specify a primary key and will be explained in Basic Select.

Note: For readability, we shall normally include optional whitespace after the closing square bracket and to the right of semicolon separators.

In our example, we can define t as,

        t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
        t[;`iq]
98 42 126
 
        t[2;]
name| `Prefect
iq  | 126
 
        t[2;`iq]
126

Defining t syntactically yields the same result as creating the column dictionary and flipping it. It is arguably simpler and clearer.

The value columns can be stored in variables, which is useful for programmatic table definition.
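The claim that the two definitions agree can be confirmed with match. A minimal sketch, using the same column dictionary d as earlier in this section:

```q
d:`name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
t~flip d      / 1b: table syntax and flipped column dictionary are identical
d~flip t      / 1b: flipping the table recovers the column dictionary
type t        / 98h
```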

        c1:`Dent`Beeblebrox`Prefect
        c2:98 42 126
        t:([]c1;c2)
        t
c1         c2
--------------
Dent       98
Beeblebrox 42
Prefect    126

Note: When every column holds a single value, that is, you are defining a table with a single row, the values must be enlisted so that each Li is a singleton list; atoms alone are not accepted.

        tt:([]c1:`a;c2:100)
'type
        tt:([]c1:enlist `a; c2:enlist 100)

Note: When at least one column is a list and one or more columns are atoms, each atom column is extended into a list whose count matches the other columns. This can be used to assign a default value.

        tdef:([]c1:`a`b`c; c2:42; c3:1.1 2.2 3.3)
        tdef
c1 c2 c3
---------
a  42 1.1
b  42 2.2
c  42 3.3

Advanced: If you create a table as the flip of a column dictionary, item-wise extension of an atom column is not performed on the dictionary definition but it is performed when the column dictionary is flipped into a table.

        ddef:`c1`c2`c3!(`a`b`c;42;1.1 2.2 3.3)
        ddef
c1| `a`b`c
c2| 42
c3| 1.1 2.2 3.3
 
        flip ddef
c1 c2 c3
---------
a  42 1.1
b  42 2.2
c  42 3.3

Table Metadata

The column names of a table can be retrieved by using the unary cols.

        cols t
`name`iq

Recall that it is possible to retrieve the column values in a column dictionary using dot notation. This is also true after it is flipped to a table. For a table t and a column c, the expression t.c retrieves the value list for column c. In our example,

        t.name
`Dent`Beeblebrox`Prefect
        t.iq
98 42 126

The dot effectively disassociates a column's values from its name.

The function meta can be applied to a table t to retrieve its metadata. The result is a keyed table with one record for each column in t. The key column c of the result contains the column names. The column t contains a type character denoting the type of the column. The column f contains the domains of any foreign keys. The column a contains any attribute associated with the column.

        meta t
c   | t f a
----| -----
name| s
iq  | i

Advanced: If the result of meta displays an upper case type char for a column, this indicates that column is a non-simple list in which each item is a list of the corresponding type. Such tables arise, for example, when you group without aggregating in a select.

        t:([] sc:1 2 3; nsc:(1 2; 3 4; 5 6 7))
        t
sc nsc
--------
1  1 2
2  3 4
3  5 6 7

Advanced: The function tables takes a symbol representing a context (see workspace organization) and returns a sorted list of symbol names of the tables in that context. For example,

tables `.
`s#`t`tt

lists all the tables in the default context. Alternatively, the command \a provides the same result. If no parameter is provided, it returns the result for the current context.

Records

We observe that count returns the number of rows in the table since each row is an item in the list. In our example,

        count t
3

Now let's inspect the sequence of dictionaries that comprise the rows.

        t[0]
name| `Dent
iq  | 98
 
        t[1]
name| `Beeblebrox
iq  | 42

The dictionary in each row maps the common domain list of column names to the column values of the row. This motivates calling each row dictionary a record in the table.

Important: A table is a sequentially ordered list of records. Each record is an association of column names with one row's values.

Sometimes it is useful to separate a record's values from its column names. In this context, we shall refer to the row value list. The row value list for the ith row of a table is obtained by retrieving the ith item of each of the column vectors. This is simply the range of the record dictionary.

        value t[1]
`Beeblebrox
42

Flipped Column Dictionary vs. List of Records

Is a table a flipped column dictionary or a list of records? Logically it is both, but physically it is stored as a column dictionary with a flipped indicator.

To verify this, we create a list of records, each of which is a dictionary that maps (common) column names to a row's values.

        lrows:(`name`iq!(`Dent;98); `name`iq!(`Beeblebrox;42))

While this list is apparently different from the equivalent column dictionary, observe the curious result when you display the list of rows,

        lrows
name       iq
-------------
Dent       98
Beeblebrox 42

The q interpreter has recognized that this list conforms to the requirements for a list of records of a table - i.e., the domain lists of all the dictionaries are the same, the range lists have common count, and the types of the range lists are consistent by position. It has converted the list of dictionaries to a flipped column dictionary by reorganizing the values that we specified record-by-record into column vectors.
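The conversion can be checked directly. The following is a minimal sketch, assuming a fresh q session; the names cd and tcol are introduced here for illustration only.

```q
/ the column-oriented definition: a column dictionary, flipped
cd:`name`iq!(`Dent`Beeblebrox;98 42)
tcol:flip cd
/ the record-oriented definition from the text
lrows:(`name`iq!(`Dent;98); `name`iq!(`Beeblebrox;42))
tcol~lrows     / 1b: both expressions produce the same table
type lrows     / 98h, the type of a table, not 0h for a general list
```

Match (~) compares the two values exactly, so a result of 1b means the list of records and the flipped column dictionary are indistinguishable once created.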

Advanced: In general, column retrieval and manipulation on a simple column dictionary will be significantly faster than operations on rows. The values in a simple column are stored contiguously, whereas the values in each row must be retrieved by indexing into all columns.

Be mindful that deletion of a row is an expensive operation because all the column lists must be compressed to close the resulting gap. This can result in large amounts of data being moved in a table with many rows.

Empty Tables and Schema

We saw in the previous section that a table can be defined and populated in one step using table syntax.

        t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)

This is infrequently done with individual values in practice, other than for small tests. Often values are deferred to run-time or the value lists may be prohibitively long.

In these circumstances, it is useful to create an empty table initially and then populate it later. The empty parentheses here signify the empty list.

        t:([] name:(); iq:())

The table will then be populated, for example, by reading the values from a file.

When an empty table is created as above, the columns are lists of general type, so data of any type can initially be loaded. The type of each column will be determined by the type of the first item placed in it. Thereafter, type checking is enforced for all inserts and updates, with no type promotion performed.

It is possible to fix the type of any column in an empty table definition by specifying a null list of the appropriate type.

        t:([] name:`symbol$(); iq:`int$())

Shorter, and arguably less obvious,

        t:([] name:0#`; iq:0#0N)

Note: Either of the previous two forms of empty table definition is the q version of the table's schema.
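To confirm that a schema carries the intended column types, meta can be applied to the empty table. This is a small sketch; the name tschema is hypothetical.

```q
tschema:([] name:`symbol$(); iq:`int$())
count tschema             / 0: no rows yet
cols tschema              / `name`iq
exec t from meta tschema  / "si": the symbol and int type chars
```

One caveat, stated as an observation about versions rather than about this text: on kdb+ 3.0 and later an integer literal such as 0N is a long, so the shorter 0#0N form yields a long column there instead of an int column.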

Basic select

We shall use the following definition in this section,

        t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)

Syntax

We shall cover select expressions in depth in q-sql, but we provide an introduction here in order to extract and display data in our examples. The basic select expression takes the form,

select cols from table

where table is either a table or a keyed table and cols is a comma separated list of columns from table. This expression results in a list of all records for the specified columns in table.

The simplest form of select is,

select from table

which corresponds to the SQL statement,

SELECT * FROM table

In q you do not need to write the wildcard character when you want all columns in the table.

Note: The basic select expression may look familiar from SQL, but it should seem odd to the q newbie who is finally becoming accustomed to parsing expressions right-to-left. Neither select nor from represents a function that can stand alone. Instead, they are part of a template and always appear together.

Q has a host of extensions to the basic select template whose elements appear between the select and from or after the table element. As we shall see in q-sql, it is possible to convert any select template to a purely functional form, although this form isn't particularly friendly to the q newbie.

Displaying the Result

Since the result of select is a list of records, it too is a table.

       select from t
name       iq
--------------
Dent       98
Beeblebrox 42
Prefect    126

We shall use this method of display in what follows unless we need to see the structure of the underlying column dictionary.

Selecting Columns

To select specific columns, list them in the desired order, comma-separated, between select and from.

        select name from t
name
------
Dent
Beeblebrox
Prefect
 
        select iq,name from t
iq  name
--------------
98  Dent
42  Beeblebrox
126 Prefect

Basic update

The syntax of basic update is similar to select, but named columns represent replacement by the specified values. In our example,

        show update iq:iq%100 from t
name       iq
---------------
Dent       0.98
Beeblebrox 0.42
Prefect    1.26
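Like select, update returns a new table and leaves its source untouched unless the table is referred to by name. A short sketch of the difference, where r is a hypothetical name for the result:

```q
t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
r:update iq:iq%100 from t    / r holds the result; t is unchanged
t`iq                         / still 98 42 126
update iq:iq%100 from `t     / pass the table by name to amend it in place
type t`iq                    / 9h: the column now holds floats
```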

Primary Keys and Keyed Tables

Keyed Table

In SQL, it is possible to declare column(s) of a table as a primary key. Amongst other things, this means that the values in the column(s) are unique, making it possible to retrieve a row via its key value. These two features motivate how q implements a primary key.

We begin with a simple key - i.e., the key is a single column. The idea is to place the key column in a separate table parallel to a table containing the remaining columns. How to associate each key with its corresponding value record? Simple: set up a dictionary mapping between the key records and the associated value records.

A keyed table is a dictionary that maps each row in a table of unique keys to a corresponding row in a table of values.

Simple Example

Let's see how this works for our previous example. Viewing the data table as a flipped column dictionary will make things explicit.

        values:flip `name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)

Now say we want to add a key column named eid containing employee identifiers. We place the identifiers in a separate table. Recall from Column Dictionary with a Single Column that we must enlist both the column name and the value list for a column dictionary having a single column.

        k:flip (enlist `eid)!enlist 1001 1002 1003

Now we establish the mapping between the two tables.

        kt:k!values

Voilà! A keyed table. The console display of a keyed table lists the key column(s) on the left, separated by a vertical bar from the value columns on the right.

        kt
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126

Note: The key mapping assumes that the key rows and value records are in corresponding order since the dictionary associates a key with the data row in the same position.

Note: The keys should be unique. As we have already noted, dictionary creation does not enforce uniqueness, but a value row associated with a repeated key is not accessible via key lookup. It can be retrieved via a select on the key column.

Keyed Table Specification

The console display of a keyed table demonstrates how to define it in one step as a dictionary of flipped dictionaries,

        kt:(flip (enlist `eid)!enlist 1001 1002 1003)!flip `name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)

Unless you are constructing the keyed table from its constituents, it is simpler to use table syntax. The key column goes between the square brackets and the value columns to the right as in a normal table definition.

       kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
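The two ways of building the keyed table can be compared with match. This is a sketch for checking the equivalence; kt1 and kt2 are hypothetical names.

```q
/ built from its constituents: key table ! value table
kt1:(flip (enlist `eid)!enlist 1001 1002 1003)!flip `name`iq!(`Dent`Beeblebrox`Prefect;98 42 126)
/ built with table syntax
kt2:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
kt1~kt2     / 1b: the definitions are identical
type kt2    / 99h: a keyed table has the type of a dictionary
```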

To define an empty keyed table, use empty key and value columns.

        ktempty:([eid:()] name:(); iq:())

The empty columns can be typed with either of the following constructs,

        ktempty:([eid:`int$()] name:`symbol$(); iq:`int$())
        ktempty:([eid:0#0] name:0#`; iq:0#0)

Accessing Records of a Keyed Table

Since a keyed table is a dictionary mapping, it provides access to records in the value table via key lookup. Remember that the records in the key table and value table are both dictionary mappings for their rows.

        kt[`eid!1002]
name| `Beeblebrox
iq  | 42

You can abbreviate the full dictionary specification of a key record to its key value. Our example reduces to,

        kt[1002]
name| `Beeblebrox
iq  | 42

An individual column in the value record can be accessed via repeated indexing or indexing at depth.

        kt[1002][`iq]
42
        kt[1002;`iq]
42

Important: The net effect of placing a key on a table is to convert item indexing of the rows to generalized indexing via key value. Otherwise put, the first index is converted from positional retrieval to key lookup.

Retrieving Multiple Records

Given that it is possible to look up a single record in a keyed table by the key value,

        kt[1001]

you might think it is possible to retrieve multiple records from a keyed table via a simple list of keys. You would be wrong.

        kt[1001 1002]
`length

To look up multiple key values in a keyed table, you must use a list of enlisted keys.

        kt[(enlist 1001; enlist 1002)]
name       iq
-------------
Dent       98
Beeblebrox 42

A fast way to do this is,

         kt[flip enlist 1001 1002]
name       iq
-------------
Dent       98
Beeblebrox 42

Another convenient way to look up multiple keys is to index using a table having a single column with the name of the primary key and value list of the desired keys. In our example,

        kt[([] eid:1001 1002)]
name       iq
-------------
Dent       98
Beeblebrox 42

This works because the records of the inner table are in the domain of the keyed table dictionary. See Operations on Dictionaries for details.

If you want to retrieve the full entries of the keyed table instead of just the value records, use the # operator.

        ([]eid:1001 1002)#kt
eid | name       iq
----| -------------
1001| Dent       98
1002| Beeblebrox 42

Reverse Lookup

Because a keyed table is a dictionary, it is possible to perform reverse lookup from a value to a key. In a simple example,

        kts:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect)
        kts
eid | name
----| ----------
1001| Dent
1002| Beeblebrox
1003| Prefect
 
        kts?`Prefect
eid| 1003

Components of a Keyed Table

Since a keyed table is a dictionary mapping between the table of keys and the table of values, the functions key and value provide a convenient way to retrieve the two constituent tables.

        key kt
eid
----
1001
1002
1003
 
        value kt
name       iq
--------------
Dent       98
Beeblebrox 42
Prefect    126

A list containing the names of the key column(s) can be retrieved with the function keys.

        keys kt
,`eid

Observe that cols retrieves both the key and value column names for a keyed table.

        cols kt
`eid`name`iq

Tables vs. Keyed Tables

It is sometimes convenient to convert between a regular table having a column of (presumably) unique values and the corresponding keyed table.

The dyadic primitive xkey converts a table with a column of unique values to a keyed table. The right operand of xkey is the table and the left operand is a symbol (or list of symbols) with the name of the column(s) to be used as the key(s).

        t:([] eid:1001 1002 1003; name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
        t
eid  name       iq
-------------------
1001 Dent       98
1002 Beeblebrox 42
1003 Prefect    126
 
       `eid xkey t
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126

Conversely, to convert a keyed table to a regular table, use xkey with an empty list as the left operand.

        kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
        kt
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126
 
        () xkey kt
eid  name       iq
-------------------
1001 Dent       98
1002 Beeblebrox 42
1003 Prefect    126

Note: The conversion expressions above do not affect the original table. You must refer to the table by name to modify the original.

        `eid xkey `t
`t
        t
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126
 
        () xkey `kt
`kt
 
        kt
eid  name       iq
-------------------
1001 Dent       98
1002 Beeblebrox 42
1003 Prefect    126
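A compact alternative to xkey, offered here as an aside and assuming a kdb+ version that supports an integer left operand to !, is to state how many leading columns form the key. The sketch below checks that it agrees with the xkey forms above.

```q
t:([] eid:1001 1002 1003; name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
(1!t)~`eid xkey t    / 1b: key on the first column
(0!1!t)~t            / 1b: zero key columns gives back the plain table
```

This form depends on column position rather than column name, so xkey remains the clearer choice when the key column is not the first one.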

Advanced: It is possible to apply xkey against a column that does not contain unique values. The result is a keyed table that does not have a primary key.

        t:([] eid:1001 1002 1003 1001; name:`Dent`Beeblebrox`Prefect`Dup )
        t
eid  name
---------------
1001 Dent
1002 Beeblebrox
1003 Prefect
1001 Dup
 
        ktdup:`eid xkey t
        ktdup
eid | name
----| ----------
1001| Dent
1002| Beeblebrox
1003| Prefect
1001| Dup

Duplicate key values are not accessible via key lookup,

        ktdup 1001
name| Dent

but they are accessible via select.

        select from ktdup where eid=1001
eid | name
----| ----
1001| Dent
1001| Dup

Compound Primary Key

We understand that the q implementation of a SQL table with a simple key is actually a dictionary mapping between a pair of tables in which the first table has a single key column. This has a straightforward extension to a compound key.

Recall that a compound key in SQL is a collection of two or more columns that together provide a unique value for each row. To implement a compound key in q, we generalize the key table from a single column to multiple columns by requiring that each record in the key table has a unique combination of column values.

Here is our example redone to replace the employee id with a compound key comprising the last and first names.

          ktc:([lname:`Dent`Beeblebrox`Prefect; fname:`Arthur`Zaphod`Ford] iq:98 42 126)

Observe that the console displays a compound keyed table with the key columns on the left separated by a vertical bar | from the value columns to the right.

          ktc
lname      fname | iq
-----------------| ---
Dent       Arthur| 98
Beeblebrox Zaphod| 42
Prefect    Ford  | 126

As in the case of a simple primary key, we can abbreviate the full key record to the key value for retrieval.

        ktc[`Dent`Arthur]
iq| 98

Here is how to initialize our example as an empty table,

        ktc:([lname:();fname:()] iq:())

The empty keyed table can be typed with either of the following,

        ktc:([lname:`symbol$();fname:`symbol$()] iq:`int$())
 
        ktc:([lname:0#`;fname:0#`] iq:0#0)

We shall see in Insert into Keyed Tables how to fill both key columns and data tables in a keyed table simultaneously.

For the fundamentalist, here is the same compound keyed table built from its constituent pair of tables,

        ktc:(flip `lname`fname!(`Dent`Beeblebrox`Prefect;`Arthur`Zaphod`Ford))!
        flip (enlist `iq)!enlist 98 42 126

And here is retrieval by full key record,

        ktc[`lname`fname!`Beeblebrox`Zaphod]
iq| 42

Most will agree that the table definition syntax and abbreviated key value retrieval are simpler.

Retrieving Records with a Compound Primary Key

Retrieval of multiple records via a compound primary key is actually easier than with a simple key, since each compound key value is already a list.

        ktc (`Dent`Arthur; `Prefect`Ford)
iq
---
98
126

As was the case with a keyed table having a simple key, retrieval can be performed via a table whose columns and values match the key columns.

        K:([] lname:`Dent`Prefect; fname:`Arthur`Ford)
        ktc[K]
iq
---
98
126
 
        ktc K            /use juxtaposition
iq
---
98
126

As in the case of a simple key, you can use # to retrieve the full entries of the keyed table instead of just the value records.

        K#ktc
lname   fname | iq
--------------| ---
Dent    Arthur| 98
Prefect Ford  | 126

Key Lookup with txf

Looking up keys in a keyed table is complicated by the different formats for simple and compound keys. The triadic function txf provides a uniform way to perform such key lookup. The first argument is a keyed table (target). The second argument is a list of key values, either simple or compound. The third argument is a list of symbol column names in the value table of target. The result is a list comprising the matching row values from the specified columns of the value table of target.

In the following example using a simple key, observe the column order of the result.

        kts:([k:101 102 103] c1:`a`b`c; c2:1.1 2.2 3.3)
        txf[kts;101 103;`c2`c1]
1.1 `a
3.3 `c

With a compound key, the values to be looked up must be listed in columns.

        ktc:([k1:`a`b`a; k2:`x`y`z] c1:100 200 300; c2:1.1 2.2 3.3)
        txf[ktc;(`a`b;`z`y);`c1`c2]
300 3.3
200 2.2

Foreign Keys and Virtual Columns

A foreign key in SQL is a column in one table whose values are members of a primary key column in another table. Foreign keys are the mechanism for establishing relations between tables.

One of the important features of a foreign key is that the RDBMS enforces referential integrity, meaning that the value in a foreign key column is required to be in the referenced primary key column. To insert a foreign key value that is not in the primary key column, it must first be inserted into the primary key column.

Definition of Foreign Key

Q has the notion of a foreign key that also provides referential integrity. Extra credit to the reader who has guessed that a foreign key is implemented using an enumeration. In our introduction to enumerations, we saw that an enumeration domain can be any list with unique items. A keyed table meets the criterion of a unique domain, since the key records in the dictionary domain are unique.

A foreign key is a table column defined as an enumerated value over a keyed table. As an enumeration, a foreign key indeed provides referential integrity by restricting values in the foreign key column to be in the list of primary key values.

Example of Simple Foreign Key

An enumeration over a keyed table domain acts just like our simple enumeration examples. Let's return to a previous example.

        kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)

To enumerate over the primary key of kt, use a symbol containing the keyed table name as the domain in the enumeration.

       `kt$

The primary key table records provide the unique set of values for enumerating records.

       `kt$`eid!1001
`kt$1001

As usual, q saves us the trouble of being so explicit and allows the enumeration to be applied to items in the value list for the primary key dictionary - that is, the primary key values.

        e1:`kt$1002 1001 1001 1003 1002 1003
        e1 = 1003
000101b

As with any enumeration, attempting to enumerate a key value that is not in the domain causes an error.

        `kt$1004
`cast

We can use table definition syntax to define a table with a foreign key over kt.

        tdetails:([] eid:`kt$1003 1001 1002 1001 1002 1001; sc:126 36 92 39 98 42)

The foreign key column has simply been defined as an enumeration over the keyed table.

We see the foreign key table in the f column when we invoke meta on the table.

        meta tdetails
c  | t f  a
---| ------
eid| i kt
sc | i

As of release 2.4, the built-in function fkeys returns a dictionary in which each foreign key column name is mapped to its key domain, that is, its primary key table name.

        treport:([] eid:`kt$1001 1002 1003; mgrid:`kt$1002 0N 1002)
        fkeys treport
eid  | kt
mgrid| kt

Resolving a Foreign Key

There are occasions when you wish to resolve a foreign key, by which we mean substitute the actual values in place of the enumerated values. As with an ordinary enumeration, this is done by applying the value function to the foreign key column.

        update eid:value eid from tdetails
eid  sc
--------
1003 126
1001 36
1002 92
1001 39
1002 98
1001 42

Foreign Keys and Relations

In SQL, an inner join is used to splice back together data that has been normalized via relations. The splice is usually done along a foreign key, which establishes a relation to the keyed table via the primary key. In the join, columns from both tables are available using dot notation.

In q the same effect is achieved using foreign keys without explicitly creating the joined table. The notation is similar, but different enough to warrant close attention.

Let tf be a table having a foreign key f whose enumeration domain is the keyed table kt. All columns in kt are available via dot notation in any select expression whose from domain is tf. A column c in kt that is accessed in this way is called a virtual column and is specified with dot notation f.c in the select expression.

For example, given kt as above, we create a details table that contains individual test results for each person. We name the foreign key in the details table the same as the primary key it refers to, but this is not required,

      tdetails:([] eid:`kt$1003 1002 1001 1002 1001 1002; sc:126 36 92 39 98 42)

Now we can access columns in kt via a select on tdetails.

        select eid.name, sc from tdetails
name       sc
--------------
Prefect    126
Beeblebrox 36
Dent       92
Beeblebrox 39
Dent       98
Beeblebrox 42
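Conceptually, a virtual column is a lookup of each foreign key value in the key column, followed by indexing into the target column. The sketch below spells that out by hand with plain, unenumerated ids; the names ids and pos are hypothetical, and this illustrates the idea rather than how q implements it internally.

```q
kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
ids:1003 1002 1001 1002       / plain key values, not an enumeration
pos:(exec eid from kt)?ids    / position of each id in the key column
(exec name from kt) pos       / `Prefect`Beeblebrox`Dent`Beeblebrox
```

With a real foreign key, the enumeration already stores these positions, which is why eid.name needs no explicit join.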

The case in which the enumeration domain of a foreign key has a compound primary key is slightly more complicated. We cover this in Operations on Compound Column Data.

Working with Tables and Keyed Tables

In this section, we use the following examples.

    t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
 
    kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)

First and Last Records

Because a table is a list of records, the functions first and last retrieve the first and last records, respectively.

         first t
name| `Dent
iq  | 98
 
         last t
name| `Prefect
iq  | 126
 
         first kt
name| `Dent
iq  | 98
 
         last kt
name| `Prefect
iq  | 126

These functions are useful in select expressions, especially with grouping and aggregation.

Note: Every table in kdb+ has a first and last record since it is an ordered list of records. Moreover, the result of a select template is a table and so is also ordered. Contrast this with SQL, in which tables and result sets are unordered, and you must use ORDER BY to impose an order.

You can retrieve the first or last n records of a table or keyed table using the take operator ( # ).

        2#t
name       iq
-------------
Dent       98
Beeblebrox 42
 
        -3#kt
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126

See Appendix A for more on using take. Also see select[n] for another way to achieve this result.

Find

The find operator ( ? ) used with a table performs a reverse lookup of a record and returns the corresponding row number. With t as above,

        t?`name`iq!(`Dent;98)
0

As usual, the record can be abbreviated to a list of row values.

        t?(`Dent;98)
0

You can reverse-lookup a list of multiple row values.

        t?((`Dent;98);(`Prefect;126))
0 2

Since a keyed table is a dictionary, find performs a reverse lookup of a value record and returns the key record.

        kt?`name`iq!(`Dent;98)
eid| 1001
        kt?(`Dent;98)
eid| 1001

In the case of find on a table with a single column, each list of row values must be a singleton list.

        t1:([] eid:1001 1002 1003)
        t1?(enlist 1001; enlist 1002)
0 1

The list of singletons can be created by either of the following expressions, although the first executes faster, especially for long lists.

        flip enlist 1001 1002
1001
1002
 
        enlist each 1001 1002
1001
1002
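That the two expressions really build the same list of singletons can be confirmed with match. A quick sketch, with the hypothetical names a and b:

```q
a:flip enlist 1001 1002     / transpose a one-row matrix into a one-column matrix
b:enlist each 1001 1002     / enlist every item separately
a~b                         / 1b: identical results
count a                     / 2: one singleton per key value
```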

Primitive Join (,)

The join operator ( , ) is defined for tables and keyed tables.

You can use join to append a record to a table.

        t:([]c1:`a`b;c2:10 20)
        t,`c1`c2!(`c;30)
c1 c2
-----
a  10
b  20
c  30

This join is one situation in which you cannot use a list of row values.

        t,(`a;30)
`c1`c2!(`a;10)
`c1`c2!(`b;20)
`a
30

You can, however, use a list of row values to amend the original table.

        t,:(`c;30)
        t
c1 c2
-----
a  10
b  20
c  30

Only tables having exactly the same list of column names and compatible column types can be joined. Since a table is a list of records, the result is obtained by appending the rows of the right operand to those of the left operand.

        t1:([] a:1 2 3; b:100 200 300)
        t2:([] a:3 4 5; b:300 400 500)
        t1,t2
a b
-----
1 100
2 200
3 300
3 300
4 400
5 500

Note that common rows are duplicated in the result.

Two tables with the same columns in different order cannot be joined with , because the order of columns in records is significant in q,

        t3:([]b:1001 2001 3001; a:101 201 301)
        t1,t3
'mismatch

Two keyed tables with the same key and value columns can be joined. Because a keyed table is a dictionary, the result has upsert semantics, as we saw in Join. Keys in the right operand that are not in the left operand are treated as inserts, whereas the right operand acts as an update for common key values.

        kt1:([k:1 2 3] c:10 20 30)
        kt2:([k:3 4 5] c:300 400 500)
        kt1,kt2
k| c
-| ---
1| 10
2| 20
3| 300
4| 400
5| 500
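The upsert behavior is inherited from join on ordinary dictionaries, which can be seen by repeating the example with plain dictionaries. In this sketch d1 and d2 are hypothetical names holding the same keys and values as kt1 and kt2.

```q
d1:1 2 3!10 20 30
d2:3 4 5!300 400 500
d1,d2                                  / 1 2 3 4 5!10 20 300 400 500
kt1:([k:1 2 3] c:10 20 30)
kt2:([k:3 4 5] c:300 400 500)
((value kt1,kt2)`c)~value d1,d2        / 1b: same values as the dictionary join
```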

Coalesce (^)

The coalesce operator ( ^ ) is defined for keyed tables and differs from primitive join ( , ) in its treatment of null column items in the value tables.

When two keyed tables have the same key and value columns and the column values of both keyed tables are non-null atoms, ^ behaves the same as primitive join ( , ).

        kt1:([k:1 2 3] c1:10 20 30;c2:`a`b`c)
        kt2:([k:3 4 5] c1:300 400 500;c2:`cc`dd`ee)
        kt1,kt2
k| c1  c2
-| ------
1| 10  a
2| 20  b
3| 300 cc
4| 400 dd
5| 500 ee
 
        kt1^kt2
k| c1  c2
-| ------
1| 10  a
2| 20  b
3| 300 cc
4| 400 dd
5| 500 ee

When the right operand has null column values, the column values of the left operand are only updated with non-null values of the right operand.

        kt3:([k:2 3] c1:0N 3000;c2:`bbb`)
        kt3
k| c1   c2
-| --------
2|      bbb
3| 3000
 
        kt1,kt3
k| c1   c2
-| --------
1| 10   a
2|      bbb
3| 3000
 
        kt1^kt3
k| c1   c2
-| --------
1| 10   a
2| 20   bbb
3| 3000 c

Note: The performance of ^ is slower than that of , since each column value of the right operand must be checked for null.
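The null handling is easiest to see on simple lists, where ^ keeps each item of the right operand unless it is null, in which case the corresponding item of the left operand shows through. A minimal sketch with the hypothetical names leftc and rightc:

```q
leftc:10 20 30
rightc:0N 3000 0N
leftc^rightc       / 10 3000 30: only the non-null right item replaces the left
```

Coalescing two keyed tables applies this same rule to the value columns of rows with matching keys.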

Column Join

Two tables with the same number of rows can be combined with join-each ( ,' ) to form a sideways, or column, join in which the columns are aligned in parallel.

        t1:([] a:1 2 3)
        t2:([] b:100 200 300)
        t1,'t2
a b
-----
1 100
2 200
3 300
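Two equivalent views of this result may help. Join-each joins the tables record by record, and the outcome is the same as merging the two underlying column dictionaries. A sketch that checks both readings:

```q
t1:([] a:1 2 3)
t2:([] b:100 200 300)
(first t1,'t2)~(first t1),first t2    / 1b: each result row is a join of two records
(t1,'t2)~flip (flip t1),flip t2       / 1b: same as joining the column dictionaries
```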

When the column lists of the tables are not disjoint, the operation on the common columns has upsert semantics because each record is a dictionary.

        t3:([] a:10 20 30; b:100 200 300)
        t1,'t3
a  b
------
10 100
20 200
30 300

Because keyed tables are dictionaries, they can only be sideways joined if they have identical key columns. In such a situation, we can deduce the behavior by recalling from Removing Entries that any operation on a dictionary is applied on the common elements of the merged domains and is extended to the non-common domain elements with appropriate nulls.

Thus, a sideways join on keyed tables with identical key columns has simple upsert semantics for common data columns. More interesting are the non-common data columns, where the result becomes a column join spliced along common key values.

        t4:([a:1 2 3] x:100 200 300)
        t4
a| x
-| ---
1| 100
2| 200
3| 300
 
        t5:([a:3 4 5] y:1000 2000 3000)
        t5
a| y
-| ----
3| 1000
4| 2000
5| 3000
 
        t4,'t5
a| x   y
-| --------
1| 100
2| 200
3| 300 1000
4|     2000
5|     3000

Complex Column Data

Simple Example

Recall from the definition of a column dictionary in Dictionary vs. List that there is no restriction that the column vectors must be lists of simple type. We have heretofore worked with examples having homogeneous atomic values in each column because they correspond to familiar SQL tables, but there is no need to limit ourselves to simple columns.

Suppose we want to keep track of a pair of daily observations, say a low temperature and a high temperature. We can do this by storing the low and high values in separate columns.

        t1:([] d:2006.01.01 2006.01.02; l:67.9 72.8; h:82.1 88.4)
        t1
d          l    h
--------------------
2006.01.01 67.9 82.1
2006.01.02 72.8 88.4
 
        t1[0]
d| 2006.01.01
l| 67.9
h| 82.1
 
        t1.l
67.9 72.8
 
        t1.h
82.1 88.4

We can also store pairs in a single column.

        t2:([] d:2006.01.01 2006.01.02; lh:(67.9 82.10; 72.8 88.4))
        t2
d          lh
--------------------
2006.01.01 67.9 82.1
2006.01.02 72.8 88.4
 
        t2[0]
d | 2006.01.01
lh| 67.9 82.1
 
        t2.lh
67.9 82.1
72.8 88.4
 
        t2.lh[;0]
67.9 72.8
 
        t2.lh[;1]
82.1 88.4

The first form is arguably more natural if you intend to manipulate the low and high values separately. This example can easily be generalized to the situation of n-tuples. In this case, storing multiple values in a single column has a definite advantage since defining and populating n columns is unwieldy when n is not known in advance. Storing and retrieving n-tuples to/from a single column is a simple operation in q. A useful example in finance is storing daily values for a yield curve.

Operations on Compound Column Data

We generalize the above example to the case of storing a set of repeated observations in which the number of observations is not fixed - i.e., varies with each occurrence. Say we want to perform a statistical analysis on the weekly gross revenues for movies and we don't care about the specific titles. Since there will be a different number of movies in release each week, the number of observations will not be constant. An oversimplified version of this might look something like,

        t3:([] wk:2006.01.01 2006.01.08; gr:( 38.92 67.34; 16.99 5.14 128.23 31.69))
        t3
wk         gr
----------------------------------
2006.01.01 38.92 67.34
2006.01.08 16.99 5.14 128.23 31.69

Handling the situation in which the number of column values is not known in advance, or is variable, is cumbersome in SQL. You normalize the data into a master-detail pair of tables, but you cannot re-assemble the details into separate columns via a join. Instead, for each master record you get a collection of records that must be iterated over via some sort of cursor/loop. In verbose programming, this results in many lines of code that are slow and prone to error on edge cases.

By storing complex values in a single column in a q table, sophisticated operations can be performed in a single expression that executes fast. In the following q-sql examples, don't worry about the details of the syntax, and remember to read individual expressions from right to left. Observe that because there are no stinking loops, we never need to know the number of detail records.

Using our movie data, we can produce the sorted gross, the average and high gross for each week in one expression.

        select wk, srt:desc each gr, avgr:avg each gr, hi:max each gr from t3
wk         srt                     avgr    hi
-------------------------------------------------
2006.01.01 67.34 38.92             53.13   67.34
2006.01.08 128.23 31.69 16.99 5.14 45.5125 128.23

While sorts and aggregates such as Max and Avg are standard SQL, think of how you'd produce the sorted sublist and the aggregates together. In your favorite verbose programming environment, you'll soon discover that you have a sordid list of rows requiring a loop to unravel into a single output line.

Now think about what you'd do to compute the drops between successive gross numbers within each week. Because the sorted detail items are rows in SQL, this requires another loop. In q,

        select wk,drp:neg 1_'deltas each desc each gr,avgr:avg each gr,hi:max each gr from t3
wk         drp              avgr    hi
------------------------------------------
2006.01.01 ,28.42           53.13   67.34
2006.01.08 96.54 14.7 11.85 45.5125 128.23
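For comparison with the normalized master-detail layout described above, the nested column can be flattened into one row per observation. The sketch below assumes the built-in ungroup is available in your version; flat is a hypothetical name.

```q
t3:([] wk:2006.01.01 2006.01.08; gr:(38.92 67.34; 16.99 5.14 128.23 31.69))
select wk, n:count each gr from t3    / n is 2 and 4: observations per week
flat:ungroup t3                       / one row per (wk; gr) pair
count flat                            / 6
select avgr:avg gr by wk from flat    / weekly averages from the flat form
```

The nested form keeps each week's observations together in one item, while the flat form needs a grouping step to bring them back together.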

Compound Foreign Key

Storing multiple values in a column is how to make a foreign key on a compound primary key. We return to the example using last name and first name as the primary key.

        ktc:([lname:`Dent`Beeblebrox`Prefect; fname:`Arthur`Zaphod`Ford] iq:98 42 126)

We create a details table with a foreign key enumeration over ktc by placing the names in the foreign key column.

        tdetails:([] name:`ktc$(`Beeblebrox`Zaphod;`Prefect`Ford;`Beeblebrox`Zaphod); sc:36 126 42)

The columns of ktc are available as virtual columns from tdetails.

        select name.lname,name.iq,sc from tdetails
lname      iq  sc
------------------
Beeblebrox 42  36
Prefect    126 126
Beeblebrox 42  42

Attributes

Attributes are metadata applied to lists of special form. They are used on a dictionary domain or a table column to reduce storage requirements and/or speed retrieval. When it sees an attribute, the q interpreter can make certain optimizations based on the structure of the list.

Important: Attributes are descriptive rather than prescriptive. Consequently, applying an attribute (other than `g#) to a list will not make it so. Moreover, a modification that respects the form specified by the attribute leaves the attribute intact (other than `p#), while a modification that breaks the form is permitted but the attribute is lost on the result.

The syntax for applying an attribute looks like the verb # with a left operand containing the symbol for the attribute and the list as the right operand. However, this use of # is not functional.

Note: You will not see significant benefit from an attribute for fewer than a million items. This is why attributes are not automatically applied in mundane situations such as the result of til or distinct. You should test your particular situation to see whether applying an attribute actually provides a performance benefit.

Sorted (`s#)

Applying the sorted attribute (`s#) to a list indicates that the items of the list are sorted in ascending order.

Note: As of this writing (Jun 2007) there is no way to indicate a descending sort.

When a list has the sorted attribute, the default linear search used in lookups is replaced with binary search. Sorted also makes certain operations much faster, for example min and max.

The following fragments show situations in which this applies.

x?v
... where x = v, ...
... where x in v, ...
... where x within v, ...

The sorted attribute can be applied to a simple list,

        L:`s#1 2 2 4 8
        L
`s#1 2 2 4 8
 
        L,:16                      / respects sort
        L
`s#1 2 2 4 8 16
 
        L,:0                        /  does not, attribute lost
        L
1 2 2 4 8 16 0

or a column of a table,

        t:([] t:`s#04:02:42.001 04:02:42.003; v:101.05 100.95)

The sorted attribute can be applied to a dictionary, which makes the dictionary into a step function.

        ds:`s#1 2 3 4 5!`a`b`c`d`e
        ds
1| a
2| b
3| c
4| d
5| e
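
To make the step-function behavior concrete, here is a minimal sketch. The breakpoints and labels are hypothetical, chosen only for illustration. A lookup that falls between two keys steps back to the value at the nearest key below it.

```q
/ hypothetical breakpoints and labels, for illustration only
ds:`s#10 20 30!`low`mid`high
ds 20      / exact key: `mid
ds 25      / between 20 and 30: steps back to the value at 20, `mid
ds 35      / above the last key: `high
ds 5       / below the first key: null symbol
```

Without the sorted attribute, the lookups for 25 and 35 would find no matching key and return the null symbol.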

Applying the sorted attribute to a table implies binary search on the table and also that the first column is sorted.

         ts:`s#([]t:04:02:42.001 04:02:42.003;v:101.05 100.95)
         ts
t            v
-------------------
04:02:42.001 101.05
04:02:42.003 100.95

Applying the sorted attribute to a keyed table means that the dictionary, its key table and its key column(s) are all sorted.

        kt:`s#([k:1 2 3 4] v:`d`c`b`a)
        kt
k| v
-| -
1| d
2| c
3| b
4| a
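
The following sketch shows one way to inspect attributes, using the built-in attr function and protected evaluation. It illustrates the point that an attribute describes a list rather than reshaping it: q checks the data when `s# is applied and signals an error if the list is not in ascending order.

```q
L:`s#1 2 2 4 8
attr L                             / `s
attr 1 2 2 4 8                     / ` (no attribute)
attr asc 3 1 2                     / `s, since asc marks its result as sorted

/ protected evaluation traps the error raised for unsorted data
@[{`s#x}; 3 1 2; {"error: ",x}]    / "error: s-fail"
```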

Unique (`u#)

Applying the unique attribute (`u#) to a list indicates that the items of the list are distinct. Knowing that the elements of a list are unique dramatically speeds up distinct and allows q to exit some comparisons early.

Operations on the list must preserve uniqueness or the attribute is lost.

        LU:`u#4 2 6 18 1
        LU
`u#4 2 6 18 1
 
        LU,:0                / uniqueness preserved
        LU
`u#4 2 6 18 1 0
 
        LU,:2                /  attribute lost
        LU
4 2 6 18 1 0 2

The unique attribute can be applied to the domain of a dictionary, a column of a table, or the key column of a keyed table. It cannot be applied to a dictionary, a table or a keyed table directly.
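
As a sketch of these placements, the following applies `u# to a dictionary domain and to a table column, then inspects the result. The names d and t and their contents are hypothetical.

```q
d:(`u#`a`b`c)!10 20 30               / unique attribute on the dictionary domain
attr key d                           / `u

t:([] id:`u#101 102 103; v:1 2 3)    / unique attribute on a table column
meta t                               / the a column of meta shows u for id

/ data that is not unique is rejected when the attribute is applied
@[{`u#x}; 1 2 1; {"error: ",x}]      / "error: u-fail"
```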

Parted (`p#)

The parted attribute (`p#) indicates that the list represents a step function in which all occurrences of a particular output value are adjacent. The range is an int or temporal type that has an underlying int value, such as years, months, days, etc. You can also partition over a symbol provided it is enumerated.

Advanced: Applying the parted attribute causes the creation of an index dictionary that maps each unique output value to the position of its first occurrence.

When a list is parted, lookup is much faster since linear search is replaced by hashtable lookup.

Sorting in ascending or descending order is one way to produce the partitioned structure, but the list need not be in sorted order. For example,

        L:`p#2 2 2 1 1 4 4 4 4 3 3
        L,:3
        L
2 2 2 1 1 4 4 4 4 3 3 3

The parted attribute is not preserved under an operation on the list, even if the operation preserves the partitioning.

Note: The parted attribute should be considered when the number of entities reaches a billion and most of the partitions are of substantial size, i.e., there is significant repetition.

Grouped (`g#)

The grouped attribute (`g#) differs from other attributes in that it imposes additional structure on the list by causing q to create and maintain an index. Grouping can be applied to a list when no other assumptions about its structure can be made.

Applying the grouped attribute to a table column roughly corresponds to placing a SQL index on a column. For example, if you wish to query a table via a symbol column sym, applying the grouped attribute to the column drastically speeds up queries such as,

        select[-100] ... where sym=`xyz

Here we are retrieving the last 100 records matching a sym value.

Advanced: The index is a dictionary that maps each unique output value to a list of the positions of all its occurrences. This speeds lookups and some operations (e.g., distinct). The tradeoff is significant storage overhead.

For example,

        L:`g#1 2 3 2 3 4 3 4 5  2 3 4 5 4 3 5 6
        L
`g#1 2 3 2 3 4 3 4 5 2 3 4 5 4 3 5 6
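
The shape of such an index can be pictured with the built-in group function, which returns a dictionary from each distinct value to the positions of its occurrences. This is only an illustration of the idea; it is an assumption that the internal index resembles this dictionary, not a description of the actual storage.

```q
L:1 2 3 2 3 4
ix:group L      / 1 -> ,0   2 -> 1 3   3 -> 2 4   4 -> ,5
ix 3            / 2 4: positions of the value 3, found without scanning L
where L=3       / 2 4: the same positions, found by a linear scan
```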

Note: The grouped attribute is preserved for both inserts and upserts.

Applying the grouped attribute to a table column,

        t:([] c1:`g#1 2 3 2 3 4 3 4; c2:`a`b`a`c`a`d`b`c)

Note: As of this writing (Jun 2007), the maximum number of `g# attributes that can be placed on a single table is 99.

9. Queries: q-sql

Overview

Q has a collection of functions for manipulating tables that are similar to their counterparts in SQL. This collection, which we call q-sql, includes the usual suspects such as insert, select, update, etc., as well as functionality that is not available in traditional SQL. While q-sql provides a superset of SQL functionality, there are some significant differences in the syntax and behavior.

The first important difference is that a q table has well-defined record and column orders. This is particularly useful in dealing with the situation in which records are inserted in a canonical order. Subsequent actions against the table will then retrieve records in this order. For example, a time series can be created by inserting (in time order) pairs consisting of a time (or date, or datetime) value and data value(s). The result of any select will then be in time order, without requiring a sort.

A second difference is that every q table is stored physically as a collection of column vectors. This means that operations on column data are easy and fast since atomic, aggregate or uniform functions applied to columns are optimized vector operations.

A third difference is that q-sql provides upsert semantics. This means that one dataset can be applied to another without the need to separate inserts from updates. Upsert can simplify operations significantly in practice.

In this chapter, we cover the important features of q-sql, including all the basic operations in kdb+. We demonstrate each feature with a simple example. Gradually, more complex examples are introduced.

Many examples are based on the sp.q distribution script. The schemas for the tables in the script are,

        s:([s:()]name:();status:();city:())
        p:([p:()]name:();color:();weight:();city:())
        sp:([]s:`s$();p:`p$();qty:())

The contents of the tables are,

        s
s | name  status city
--| -------------------
s1| smith 20     london
s2| jones 10     paris
s3| blake 30     paris
s4| clark 20     london
s5| adams 30     athens
 
        p
p | name  color weight city
--| -------------------------
p1| nut   red   12     london
p2| bolt  green 17     paris
p3| screw blue  17     rome
p4| screw red   14     london
p5| cam   blue  12     paris
p6| cog   red   19     london
 
        sp
s  p  qty
---------
s1 p1 300
s1 p2 200
s1 p3 400
s1 p4 200
s4 p5 100
s1 p6 100
s2 p1 300
s2 p2 400
s3 p2 200
s4 p2 200
s4 p4 300
s1 p5 400

Insert

Insert appends records to a table or keyed table.

Basic Insert

To add records to a table, use the dyadic function insert,

insert[st;L]

where st is a symbol containing the name of a table (the target) and L is a list whose items correspond to records of the target. The result of insert is a list of int representing the positions of the new record(s).

Note: Since the items in L are appended to the column vectors of st, each value must type-match the corresponding column vector.

For a regular (i.e., non-keyed) table, the effect of insert is to append a new record holding the specified values. Let's use our simple example.

        t:([] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)

Insert a record into t as follows,

        insert[`t;(`Slartibartfast;156)]
,3
 
        t
name           iq
------------------
Dent           42
Beeblebrox     98
Prefect        126
Slartibartfast 156

Alternate Forms

Since the dyadic insert is also a verb, it can take various notational forms. For example, the previous insert can be written as a binary operator.

        `t insert (`Slartibartfast; 156)

It can also be expressed as a projection onto the first argument with juxtaposition of the second argument.

        insert[`t] (`Slartibartfast; 156)

You may find one of these more readable. We shall use them interchangeably.

You can also insert a record, as opposed to a list of row values.

        `t insert `name`iq!(`Slartibartfast; 156)

This is useful when you wish to insert a table which is the result of a select.

Repeated Inserts

For a (non-keyed) table, repeatedly inserting the same data is permissible and it results in duplicate records.

        t:([] name:`Dent`Beeblebrox`Prefect; iq:42 98 126)
        `t insert (`Slartibartfast; 156)                 / one form
,3
 
        insert[`t] (`Slartibartfast; 156)                / equivalent form
,4
 
        t
name           iq
------------------
Dent           42
Beeblebrox     98
Prefect        126
Slartibartfast 156
Slartibartfast 156

Columnar Bulk Insert

In the preceding, we have considered the case when the list in an insert represents a set of values for a single row. Each item is an atom destined for the corresponding column in the table. It is also possible to bulk insert multiple entries.

Recall that a table is a dictionary of columns. So in the example,

        t:([] name:`Dent`Beeblebrox; iq:98 42)
        `t insert (`Prefect;126)
,2

the right operand looks like a row, but is in fact a list of column values. With this perspective, a bulk insert can be achieved with a compound list, each of whose items is a list of column values destined for the corresponding column in the table.

        t:([] name:`Dent`Beeblebrox; iq:98 42)
        `t insert (`Prefect`Mickey;126 1024)
2 3

Table Insert

It is also possible to bulk insert records (i.e., rows). A table can be viewed as a list of records (and vice versa), so it is reasonable to insert one table into another provided the columns are compatible.

        t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
        tnew:([] name:`Slartibartfast`Mickey; iq:158 1042)
        `t insert tnew
3 4
 
        t
name           iq
-------------------
Dent           98
Beeblebrox     42
Prefect        126
Slartibartfast 158
Mickey         1042

Insert into Keyed Tables

Inserting data into a keyed table works just like inserting data into a regular table, with the additional requirement that the key must not already exist in the table. Using our previous example of a keyed table,

        t:([eid:1001 1002] name:`Dent`Beeblebrox; iq:98 42)
        t
eid | name       iq
----| -------------
1001| Dent       98
1002| Beeblebrox 42
 
        `t insert (1004; `Slartibartfast; 158)
,2
 
        t
eid | name           iq
----| ------------------
1001| Dent           98
1002| Beeblebrox     42
1004| Slartibartfast 158

The following insert fails because the key 1004 already exists in t,

        `t insert (1004; `Slartibartfast; 158)
'insert
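
When a duplicate key is a realistic possibility, the failure can be anticipated. The sketch below shows two hedged options: testing for the key before inserting, and trapping the error with protected evaluation. The record values are hypothetical.

```q
t:([eid:1001 1002] name:`Dent`Beeblebrox; iq:98 42)

/ option 1: test whether the key is already present
1002 in exec eid from t                               / 1b

/ option 2: trap the error; the handler receives the error text
.[insert; (`t; (1002; `Zaphod; 1)); {"failed: ",x}]   / "failed: insert"
```

After the failed insert, t still holds its original two records.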

Observe that, by default, the records in a keyed table are stored in insert order rather than key order.

        `t insert (1003; `Prefect; 126)
        t
eid | name           iq
----| ------------------
1001| Dent           98
1002| Beeblebrox     42
1004| Slartibartfast 158
1003| Prefect        126

Insert into Empty Tables

We consider the situation of an empty table with no column types specified. The column types are inferred from the first insert.

        t:([] name:(); iq:())
        type t.name
0h
 
        type t.iq
0h
 
        `t insert (`Dent; 98)
,0
 
        type t.name
11h
 
        type t.iq
6h

If you define an empty table without types, be especially careful to get the first insert correct.

        `t insert (98; `Dent)
                .
                .
                .
 
        `t insert (`Beeblebrox; 42)
'type

It is advantageous to define an empty table with types. In our example,

        t:([] name:`symbol$(); iq:`int$())
        t:([] name:0#`; iq:0#0)        / an equivalent way
 
        type t.name
11h
        type t.iq
6h
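
A typed empty table rejects a misordered insert immediately instead of silently adopting the wrong column types. The sketch below uses a hypothetical table with a symbol column and a float column, so the literals have the same type in every q version.

```q
t:([] name:`symbol$(); score:`float$())
meta t                                              / the t column of meta shows s and f

`t insert (`Dent; 98.5)                             / ,0
.[insert; (`t; (98.5; `Dent)); {"failed: ",x}]      / "failed: type"
count t                                             / 1: the bad record was not added
```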

Insert and Foreign Keys

When inserting data into a table that has a foreign key, everything works as for a regular table, except that a value destined for a foreign key column must already exist as a key in the corresponding primary key table.

Note: This last requirement is how q implements referential integrity.

Returning to our example of the previous section,

        kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
        tdetails:([] eid:`kt$1003 1002 1001 1002 1001; sc:126 36 92 39 98)
        kt
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126
 
        tdetails
eid  sc
--------
1003 126
1002 36
1001 92
1002 39
1001 98
 
        `tdetails insert (1002; 42)
,5
 
         tdetails
eid  sc
--------
1003 126
1002 36
1001 92
1002 39
1001 98
1002 42

The following insert fails because the key 1004 does not exist in kt.

        `tdetails insert (1004; 158)
'cast
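
To see which columns of a table are foreign keys before inserting, the built-in meta and fkeys functions can be consulted. The following is a small sketch on a cut-down, hypothetical version of the tables above.

```q
kt:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect)
tdetails:([] eid:`kt$1003 1002; sc:126 36)

meta tdetails      / the f column of meta shows kt for eid
fkeys tdetails     / dictionary mapping eid to kt

/ a value that is not a key of kt is rejected
.[insert; (`tdetails; (1004; 158)); {"failed: ",x}]   / "failed: cast"
```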

The select and exec Templates

In this section, we investigate the general form of select, which we met briefly in Basic select. We present select as a template having required and optional elements. The template elements, in turn, contain phrases whose expressions involve column values. The q interpreter applies the template against the specified table to produce a result table. While the syntax and results resemble those of the analogous SQL statement, the underlying mechanics are quite different.

We shall examine each of the constituents of the select template in detail. Our approach is to introduce the concepts with illustrative examples using trivial tables and then to proceed with more meaningful examples using time series. Here are our sample table definitions:

        tk:([eid:1001 1002 1003] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)
        tdetails:([] eid:`tk$1003 1002 1001 1002 1001 1002; sc:126 36 92 39 98 42)

Syntax

The select template has the following form, where elements enclosed in matching angle brackets (<...>) are optional.

select <ps> <by pb> from texp <where pw>

The select and from keywords are required, as is texp, which is a q expression whose result is a table or keyed table. The elements ps, pb and pw are the select, the by and the where phrases, respectively. The result of select is a list of records or, equivalently, a table.

Note: If where is present and texp is itself the result of a select, the expression that produces texp must be enclosed in parentheses.

Some simple examples follow.

        select from tk
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126
 
        select eid,name from tk where name=`Dent
eid  name
---------
1001 Dent
 
        select cnt:count sc by eid.name from tdetails
name      | cnt
----------| ---
Beeblebrox| 3
Dent      | 2
Prefect   | 1
 
        select topsc:max sc, cnt:count sc by eid.name from tdetails where eid.name<>`Prefect
name      | topsc cnt
----------| ---------
Beeblebrox| 42    2
Dent      | 98    2

The order of execution for select is:

(1) from expression texp

(2) where phrase pw

(3) by phrase pb

(4) select phrase ps

In particular, the from expression is always evaluated first and the select phrase last.

Note: If ps is absent, all columns are returned. There is no need for the * wildcard of SQL.

Each phrase in the select template is a comma-separated list of subphrases. A subphrase is an expression involving columns of texp or virtual columns of a table related to texp via foreign key. The subphrases within a phrase are evaluated left-to-right, but each expression comprising a subphrase is parsed right-to-left, like any q expression.

Important: The commas separating the subphrases are separators, meaning that it is not necessary to enclose a subphrase in parentheses. However, any expression containing the join operator ( , ) must be enclosed in parentheses to distinguish it from the separator.
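
A short sketch of the parenthesis rule, using a hypothetical two-column table. The first subphrase joins the columns item by item, so it is wrapped in parentheses; the comma after the closing parenthesis is then read as the subphrase separator.

```q
t:([] a:1 2 3; b:10 20 30)

/ ab holds the pairs (1 10; 2 20; 3 30); b is a second, separate subphrase
select ab:(a,'b), b from t
```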

The where Phrase

The where phrase controls which records appear in the result. The action of this phrase is a generalization of the built-in where function (see Appendix A).

Each subphrase is a criterion on columns. It produces a boolean result vector corresponding to records passing or failing the criterion. The effect of a where subphrase is to select only the records that pass its criterion.

The individual where subphrases are applied from left-to-right. Each step produces a result whose rows are a subset of the previous one. The net effect is a series of progressively narrowed interim tables.

        select from tk where iq <100
eid  name       iq
------------------
1001 Dent       98
1002 Beeblebrox 42
 
        select from tdetails where eid=1002
eid  sc
-------
1002 36
1002 39
1002 42
 
        select from tdetails where eid=1002,sc<eid.iq
eid  sc
-------
1002 36
1002 39
 
         select from tdetails where (eid=1002)&sc<eid.iq
eid  sc
-------
1002 36
1002 39

We point out that the last two queries return the same result but execute differently; we shall see more about this later. Also observe that the parentheses in the last query are necessary due to right-to-left evaluation of expressions.
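
The connection with the built-in where function can be sketched by carrying out a single-criterion query by hand on a hypothetical unkeyed table: build the boolean vector, convert it to row positions with where, and index the table.

```q
t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)

b:t[`iq]<100       / 110b: one boolean per record
where b            / 0 1: positions of the records that pass
t where b          / the passing records, retrieved by position

(t where b)~select from t where iq<100   / 1b
```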

The select Phrase

The select phrase controls which columns appear in the result. Each select subphrase produces a column. The name of the result column from each subphrase is taken from the last underlying column referenced in the subphrase evaluation unless the result is renamed by assignment.

        select LastName:name,iq from tk
LastName   iq
--------------
Dent       98
Beeblebrox 42
Prefect    126

If a column is repeated in the select phrase, it appears more than once in the result. This behaves like SQL SELECT.

        select iq,iq from tk
iq  iq
-------
98  98
42  42
126 126

Important: A virtual column i holding the position of each record is implicitly available in the select phrase. This is useful, for example, in aggregation if you want a column with record counts without reference to a specific column name.

        select cnt:count i by eid from tdetails
eid | cnt
----| ---
1001| 2
1002| 3
1003| 1

In this situation, i plays a role somewhat similar to * in SQL, but is more useful since it can be used to select specific records. For example, criteria on i can be used to fill only one page of results when you do not wish to transmit an entire result set. Here is the second page of detail records for a page size of 3, noting that the within function includes both its endpoints (see Appendix A).

        select from tdetails where i within 3 5
eid  sc
-------
1002 39
1001 98
1002 42

This is difficult to do in SQL and vendors have added proprietary extensions to handle it.

The by Phrase

The by phrase controls how rows are grouped in the result. The action of this phrase is a generalization of the built-in group function (see Appendix A).

Each by subphrase is an expression involving a column. It produces a grouping criterion for that column. The columns resulting from the by phrase become the primary keys of the select result. Multiple subphrases in the by phrase result in a compound primary key in the result.

Note: If the by phrase is included, the result of select is a keyed table; if it is omitted, the result is a table.

Important: Every column included in the by phrase is automatically included in the result and should not be included separately in the select phrase.

It is possible to group without aggregation. The result is a table with non-simple lists for columns, that is, non-atomic column values. (See Complex Column Data for more on tables with non-simple column lists.)

        select sc by eid from tdetails
eid | sc
----| --------
1001| 92 98
1002| 36 39 42
1003| ,126

This cannot be achieved easily with GROUP BY in SQL.

The function ungroup can be used to normalize the result of grouping back to a flat table.

        seid:select sc by eid from tdetails
        seid
eid | sc
----| -----
1001| 92 98
1002| 36 39 42
1003| ,126
 
       ungroup seid
eid  sc
--------
1001 92
1001 98
1002 36
1002 39
1002 42
1003 126
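
The relationship between the by phrase and the built-in group function can be sketched by grouping by hand. The table below is a hypothetical unkeyed copy of the detail data, with plain values in eid rather than a foreign key.

```q
t:([] eid:1003 1002 1001 1002 1001 1002; sc:126 36 92 39 98 42)

g:group t`eid              / 1003 -> ,0   1002 -> 1 3 5   1001 -> 2 4
t[`sc] g                   / 1003 -> ,126   1002 -> 36 39 42   1001 -> 92 98

select sc by eid from t    / the same groups, as a keyed table
```

One visible difference: group lists the keys in order of first appearance, whereas the result of the by phrase is sorted by key.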

The exec Template

The syntax of the exec template is identical to select.

exec <ps> <by pb> from texp <where pw>

The difference from select is that the result is not a table.

If only one column is produced by the select phrase, the result of exec is a list containing the column values produced. This contrasts with select, which produces a table with a single column in this situation.

With tk as above,

        tk
eid | name       iq
----| --------------
1001| Dent       98
1002| Beeblebrox 42
1003| Prefect    126
 
        select name from tk
name
----------
Dent
Beeblebrox
Prefect
 
        exec name from tk
`Dent`Beeblebrox`Prefect

Using exec to extract a single column of a table (as opposed to a keyed table) is more powerful than other mechanisms to extract the column because you can apply constraints on other columns.

        tdetails.sc
126 36 92 39 98 42
        tdetails[`sc]
126 36 92 39 98 42
        exec sc from tdetails
126 36 92 39 98 42
        exec sc from tdetails where eid in 1001 1002
36 92 39 98 42

If more than one column is produced by the select phrase, the result of exec is a dictionary mapping the column names to the values produced. This contrasts with select, which produces a table with the specified columns.

        select eid,name from tk
eid  name
---------------
1001 Dent
1002 Beeblebrox
1003 Prefect
 
        exec eid,name from tk
eid | 1001 1002       1003
name| Dent Beeblebrox Prefect
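
Because exec returns ordinary lists and dictionaries, its result can be passed straight to other functions. A brief sketch on a hypothetical detail table:

```q
t:([] eid:1003 1002 1001 1002; sc:126 36 92 39)

max exec sc from t where eid=1002    / 39: aggregate over the extracted list

d:exec eid,sc from t                 / dictionary of two column lists
d`sc                                 / 126 36 92 39
```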

Using distinct in select and exec

The built-in distinct function (see Appendix A) applied to a source table returns a table containing the unique records in the source.

        tdup:([]c1:10 20 10 30 10 20 40 30;c2:`a`b`a`c`z`b`d`c)
        tdup
c1 c2
-----
10 a
20 b
10 a
30 c
10 z
20 b
40 d
30 c
 
        distinct tdup
c1 c2
-----
10 a
20 b
30 c
10 z
40 d

By including distinct in the select phrase of a select or exec query, you can similarly suppress duplicates from the result.

        select distinct c1 from tdup
c1
--
10
20
30
40
 
        exec distinct c2 from tdup
`a`b`c`z`d

Note: When distinct is used in select, it appears immediately after 'select' and is applied across all the specified columns, meaning that it returns rows with distinct values in those columns. By contrast, in exec, distinct can apply to any column and the result will be non-rectangular in general.

        select distinct c2,c1 from tdup
c2 c1
-----
a  10
b  20
c  30
z  10
d  40
 
        exec distinct c2,c1 from tdup
c2| `a`b`c`z`d
c1| 10 20 10 30 10 20 40 30
 
        exec distinct c2,distinct c1 from tdup
c2| `a`b`c`z`d
c1| 10 20 30 40

One way to understand this behavior is as follows. The result of select is a table, which is rectangular; hence distinct must produce full rows. The result of exec is a dictionary, so each column name (i.e., key) can have a different number of values.
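
The difference in shape can be checked by counting. In this sketch on the same data, select yields five distinct rows, while exec yields a dictionary whose two value lists have different lengths.

```q
tdup:([] c1:10 20 10 30 10 20 40 30; c2:`a`b`a`c`z`b`d`c)

count select distinct c2,c1 from tdup                / 5 distinct rows
count each exec distinct c2,distinct c1 from tdup    / c2 has 5 values, c1 has 4
```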

Using each in where

If a function or operator used in a where criterion is not atomic or uniform in its argument, you must use an each adverb. This is because the criterion is applied across the column vector(s).

        ts:([]f:1.1 2.2 3.3;s:("abc";"d";"ef"))
        select from ts where  s~"abc"
f s
---
 
        select from ts where  ("abc"~) each s
f   s
---------
1.1 "abc"

The first select does not achieve the desired result because it asks if the entire column matches the specified string. The second select works correctly because it is the projection of the binary match operator applied to each item of the column.
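
Two other spellings of an item-by-item string test are sketched below: the like function, which compares every string in the column against a pattern, and match with the each-left adverb. In this sketch the one-character entry is written with enlist so that every item of s is a string rather than a char atom; that choice is ours, not part of the original example.

```q
ts:([] f:1.1 2.2 3.3; s:("abc"; enlist "d"; "ef"))

select from ts where s like "abc"    / the single record with s equal to "abc"
select from ts where s~\:"abc"       / match applied to each item of s
```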

Nested where

As was mentioned in The where Phrase, the criteria in the subphrases of a where phrase are applied to the records of the table sequentially from left to right. Consequently, the final list of records is obtained via a succession of intermediate results, each of which is narrowed by the following subphrase criterion. Otherwise put, the where subphrases constitute a nested set of criteria.

The order of the subphrases in a nested where can have significant performance implications for queries against large tables. Whenever possible, list the subphrases in order of decreasing restrictiveness. That is, choose the subphrase at each position to be the one that results in the greatest narrowing. Each intermediate table will be smallest and consequently less processing will be required at the next step.

Note: If there is one where subphrase that will always result in a significantly smaller result set, it should be placed first in the sequence. In the case of a partitioned table, place any constraint on the partition column first.

A typical example is a series of measurements for entities with an identifier. This could be real-time stock prices, daily bond yields, yearly batting averages, test scores, etc. Say there are many different identifier values and you want to select certain records for a given identifier. It is better to filter on the identifier first since this will immediately restrict the result set to a small subset of the original. This can lead to an order of magnitude improvement.

Let's take our trivial example of IQ test scores and imagine that the table contains the result of SAT scores for all high school seniors in the United States. In this case, there will be several million students with only a few records per student. Clearly if you want to perform an analysis on the scores of an individual, it is best to limit the result by student first, since the initial and subsequent intermediate tables will be tiny.

Imagine the following table containing millions of student social security numbers and scores.

        tscores:([]ssn:0#`;sc:0#0)
        `tscores insert (`$"111-11-1111"; 999)
        `tscores insert (`$"222-22-2222"; 1242)
        `tscores insert (`$"333-33-3333"; 735)
        `tscores insert (`$"444-44-4444"; 1600)
        `tscores insert (`$"555-55-5555"; 1178)
        `tscores insert (`$"111-11-1111"; 1021)
        `tscores insert (`$"666-66-6666"; 882)
         .
         .
         .

Since each student takes the test only a few times, the following query,

        select from tscores where ssn=`$"111-11-1111",0<deltas sc

executes significantly faster than,

        select from tscores where (ssn=`$"111-11-1111")&0<deltas sc

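We point out that a nested where phrase is logically equivalent to an unnested phrase in which the subphrases are joined by &, provided every criterion is atomic, that is, evaluated record by record. Be aware that deltas is uniform rather than atomic: in the nested form it sees only the records that survived the preceding subphrase, so for this particular criterion the nested and unnested results can differ. The unnested forms of our example are,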

        select from tscores where (ssn=`$"111-11-1111")&0<deltas sc
 
        / or
 
        select from tscores where (0<deltas sc)&ssn=`$"111-11-1111"

However, both unnested versions will execute more slowly since the compound criterion is applied against all records in the table.
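
Nesting can also change the answer, not only the speed, when a criterion depends on neighboring records. The sketch below uses a hypothetical miniature of tscores, with short symbols standing in for the identifiers. In the nested form deltas is computed over the two records of `a only; in the unnested form it is computed over the whole column.

```q
ts:([] ssn:`a`b`c`d`e`a`f; sc:999 1242 735 1600 1178 1021 882)

/ nested: deltas over 999 1021 gives 999 22, so both `a records pass
count select from ts where ssn=`a, 0<deltas sc       / 2

/ unnested: the second `a record is compared with 1178, so it fails
count select from ts where (ssn=`a)&0<deltas sc      / 1
```

When each criterion looks only at its own record, as with ssn=`a or sc>1000, the two forms return the same records.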

select[n]

You can return the first or last n records of a select result using function parameter syntax on the select. A positive parameter returns the first n records specified by the select body, while a negative parameter returns the last n records.

        select[2] from tk
eid | name       iq
----| -------------
1001| Dent       98
1002| Beeblebrox 42
 
        select[-1] from tk
eid | name    iq
----| -----------
1003| Prefect 126
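
For a plain table with no other phrases, this corresponds to the take operator ( # ) applied to the table. A sketch of the correspondence on a hypothetical unkeyed table:

```q
t:([] name:`Dent`Beeblebrox`Prefect; iq:98 42 126)

(select[2] from t)~2#t         / 1b: first two records
(select[-1] from t)~(-1)#t     / 1b: last record
```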

fby

It is sometimes desirable to use an aggregate function in the where phrase of select. For example, suppose we are given a table with a foreign key and we wish to determine which key values have more than one entry in the table. A first attempt might be to place a condition in the where phrase that filters on the count being greater than 1. In our example of tdetails, this would be something like,

        select distinct eid from tdetails where 1<count eid
eid
----
1003
1002
1001

You can see this doesn't work, as the record for eid value 1003 is included even though it has only a single entry in tdetails. What went wrong?

The better question is, what does this where expression actually do? Since count is an aggregate function, it is applied against the list of column values for eid. It cannot select individual rows since it does not return a boolean vector result. Indeed, it returns the scalar 6, the number of items in the column vector, so the criterion 1<6 holds for every record.

You could achieve the desired result with a correlated subquery. The inner query counts the records for each key value using aggregation and grouping.

        q1:select cnt:count eid by eid from tdetails
        q1
eid | cnt
----| ---
1001| 2
1002| 3
1003| 1

The outer query selects the records with the desired count.

        select eid from q1 where 1<cnt
eid
----
1001
1002

An easier way to accomplish this result is to use fby in the where phrase. Placing fby in a where subphrase allows an aggregate function to be used to select individual rows. The action is similar to the grouping of by, with the specified aggregate function applied across the grouped values. (Hence the name "fby" which is short for "function by").

The use of fby is somewhat more abstract than other elements of the select template. It is a binary operator of the form,

(fagg;expcol) fby c

The left operand is a two-item list consisting of an aggregate function fagg and a column expression expcol on which the function will be applied. The right operand c is a symbol containing the name of the column whose values are grouped to form lists for the aggregate function.

Inclusion of fby in a where subphrase selects those records whose group passes the subphrase criterion specified by the aggregate function. This means that all records in a group either pass or fail together, depending on the result of the aggregation on the group.

In our example above, we can achieve the desired result with an un-nested select using fby. First, we verify that fby does indeed accomplish what we want. Remember to evaluate the where criterion right-to-left.

        select eid from tdetails where 1<(count;eid) fby eid
eid
----
1002
1001
1002
1001
1002

Now we eliminate the duplicates.

        select distinct eid from tdetails where 1<(count;eid) fby eid
eid
----
1002
1001
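
It can help to look at what fby returns on its own. Applied to plain lists outside a query, it produces a vector as long as its inputs in which every position holds the aggregate for that position's group. The lists below are hypothetical copies of the detail columns.

```q
eid:1003 1002 1001 1002 1001 1002
sc:126 36 92 39 98 42

(count;sc) fby eid               / 1 3 2 3 2 3: size of each record's group
(max;sc) fby eid                 / 126 42 98 42 98 42: top score of each group
sc where sc=(max;sc) fby eid     / 126 98 42: the top score per eid
```

Comparing this vector with a constant or with a column yields the boolean vector that the where phrase needs.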

Note: Multiple columns in the right operand of fby must be encapsulated in a table. To do this, create an anonymous empty table with the desired column names only.

        t:([]sym:`IBM`IBM`MSFT`IBM`MSFT;
               ex:`N`O`N`N`N;
               time:12:10:00.0 12:30:00.0 12:45:00.0 12:50:00.0 13:30:00.0;
               price:82.1 81.95 23.45 82.05 23.40)
        t
sym  ex time         price
--------------------------
IBM  N  12:10:00.000 82.1
IBM  O  12:30:00.000 81.95
MSFT N  12:45:00.000 23.45
IBM  N  12:50:00.000 82.05
MSFT N  13:30:00.000 23.4
 
        select from t where price=(max;price) fby ([]sym;ex)
sym  ex time         price
--------------------------
IBM  N  12:10:00.000 82.1
IBM  O  12:30:00.000 81.95
MSFT N  12:45:00.000 23.45

It may take a while to get used to this notation.
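One way to get used to the notation is to see that fby is an ordinary function that returns the group aggregate aligned to each row. The following is a minimal sketch; the table t and the dictionary m are illustrative names that do not appear in the examples above.

```q
/ illustrative table, not one of the tutorial's tables
t:([] sym:`a`b`a`b; px:1 5 3 2)

/ fby returns the aggregate of each row's group, aligned to the rows
(max;t`px) fby t`sym

/ the same vector built by hand from a grouped dictionary
m:exec max px by sym from t
m t`sym

/ inside where, each row is compared with the value for its own group
select from t where px=(max;px) fby sym
```

Because the aggregate is aligned to the rows, the comparison in the where phrase is a plain item-by-item comparison of two lists of equal length.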

The update Template

Basic update

The update template has the same form as the select template.

update <pu> <by pb> from texp <where pw>

The difference is that column assignments in the update phrase pu represent modifications to columns instead of column name aliases.

        t:([] c1:`one`two`three; c2:10 20 30)
        update c1:`third,c2:33 from t where c1=`three
c1    c2
--------
one   10
two   20
third 33

Important: In order to modify the contents of texp you must refer to a table by name.

After execution of the query above, we still find,

        t
c1    c2
--------
one   10
two   20
three 30

However, t can be modified in place by referring to the table by name.

        update c1:`third,c2:33 from `t where c1=`three
`t
        t
c1    c2
--------
one   10
two   20
third 33

Note: Unlike updates in SQL, update can add a new column.

        t:([] c1:20 10 30 20; c2:`z`y`x`a)
        t
c1 c2
-----
20 z
10 y
30 x
20 a
 
        update c3:100+c1 from `t
`t
 
        t
c1 c2 c3
---------
20 z  120
10 y  110
30 x  130
20 a  120

update-by

When the by phrase is present, update can be used to create new columns from the grouped values. When an aggregate function is used, it is applied to each group of values and the result is assigned to all records in the group.

        t:([] n:`a`b`a`c`c`b; p:10 15 12 20 25 14)
        t
n p
----
a 10
b 15
a 12
c 20
c 25
b 14
 
        update av:avg p by n from t
n p  av
---------
a 10 11
b 15 14.5
a 12 11
c 20 22.5
c 25 22.5
b 14 14.5

If a uniform function is used, it is applied across the grouped values and the result is assigned in sequence to the records in the group. With t as above,

        update s:sums p by n from t
n p  s
-------
a 10 10
b 15 15
a 12 22
c 20 20
c 25 45
b 14 29
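Since the aggregate result is spread over every record in its group, it can be combined with the original column in the same expression. The sketch below computes each record's deviation from its group average; the column name dev is an assumption chosen for illustration.

```q
t:([] n:`a`b`a`c`c`b; p:10 15 12 20 25 14)

/ avg p is computed per group of n, then subtracted from each p in that group
update dev:p-avg p by n from t
```

As elsewhere, the expression is read right to left: avg p is evaluated first for each group and the subtraction is applied item by item.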

upsert

The dyadic function upsert is an alternate name for join ( , ) on tables and keyed tables.

For keyed tables, the match is done by key value.

        kt:([k:`one`two`three] c:10 20 30)
        kt
k    | c
-----| --
one  | 10
two  | 20
three| 30
 
        ku:([k:`three`four] c:300 400)
        ku
k    | c
-----| ---
three| 300
four | 400
 
        kt upsert ku
k    | c
-----| ---
one  | 10
two  | 20
three| 300
four | 400

For regular (non-keyed) tables, the records are appended.

        t:([]c1:`one`two`three;c2:10 20 30)
        t
c1    c2
--------
one   10
two   20
three 30
 
        u:([]c1:`three`four;c2:30 40)
        u
c1    c2
--------
three 30
four  40
 
        t upsert u
c1    c2
--------
one   10
two   20
three 30
three 30
four  40

Note: The upsert expressions above do not affect the original table. You must refer to the table by name to modify the original.
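A short sketch of the in-place form follows, assuming the same kt as above. Passing the table name as a symbol modifies the global table; the second call supplies a single record as a list, and the key value `five is made up for the illustration.

```q
kt:([k:`one`two`three] c:10 20 30)

/ by name: existing key `three is updated, new key `four is appended
`kt upsert ([k:`three`four] c:300 400)

/ a single record can be supplied as a list of key and value items
`kt upsert (`five;500)

kt
```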

delete

The syntax of the delete template is simpler than that of select, with the added restriction that either pcols or pw can be present but not both.

delete <pcols> from texp <where pw>

If pcols is present as a list of column names, the result is a table derived from texp in which the specified columns are removed. If pw is present, the result is a table derived from texp in which records meeting the criteria of pw are removed.

        t:([]c1:`a`b`c;c2:`x`y`z)
        t
c1 c2
-----
a  x
b  y
c  z
 
        delete c1 from t
c2
--
x
y
z
        delete from t where c2=`z
c1 c2
-----
a  x
b  y

Important: In order to modify the contents of texp you must refer to the table by name.

Thus, after execution of the last query above, we still find,

        t
c1 c2
-----
a  x
b  y
c  z

However, t can be modified in place with,

        delete from `t where c2=`z
`t
        t
c1 c2
-----
a  x
b  y

Grouping and Aggregation

Aggregation is the result of applying an aggregate function (one that produces an atom from a list) to a column.

SQL Aggregation

In traditional SQL, aggregation and grouping are limited and cumbersome. Aggregation and grouping are bound together: only columns that appear in the GROUP BY can participate in the SELECT result. Moreover, there is a limited collection of built-in aggregation functions.

In q, grouping and aggregation can be used independently or together.

Grouping without Aggregation

Grouping in q collects rows having a common value in the group domain. Unlike SQL, any column can participate in the select result when grouping. Moreover, the columns in the by phrase are automatically included in the result as keys.

When a column not in the by phrase is explicitly specified in the select phrase, the result of grouping without aggregation has a corresponding column of non-simple type. There will be one item in the value list for each record matching a given group domain value.

For example, we can group order quantities by supplier in the sp.q script sample tables.

        select qty by s from sp
s | qty
--| -----------------------
s1| 300 200 400 200 100 400
s2| 300 400
s3| ,200
s4| 100 200 300

You can group by the result of a function applied to a column. For example, the following query groups all products meeting a certain order quantity threshold.

        select distinct p by thrsh:qty>200 from sp
thrsh| p
-----| ------------------
0    | `p$`p2`p4`p5`p6
1    | `p$`p1`p3`p2`p4`p5

You can also group by virtual columns from foreign keys.

        select sname:s.name, qty by pname:p.name from sp
pname| sname                    qty
-----| ----------------------------------------
bolt | `smith`jones`blake`clark 200 400 200 200
cam  | `clark`smith             100 400
cog  | ,`smith                  ,100
nut  | `smith`jones             300 300
screw| `smith`smith`clark       400 200 300

Important: When no columns are explicitly specified in the select phrase, the result of grouping without aggregation has columns of simple type. The value for each result column is obtained by picking the value of the last record matching the group domain value.

For example, the following query,

        select by p from sp
p | s  qty
--| ------
p1| s2 300
p2| s4 200
p3| s1 400
p4| s4 300
p5| s1 400
p6| s1 100

is equivalent to the following query using the aggregate last on each non-grouped column,

        select last s, last qty by p from sp
p | s  qty
--| ------
p1| s2 300
p2| s4 200
p3| s1 400
p4| s4 300
p5| s1 400
p6| s1 100

One way to obtain all the remaining columns in a grouping without explicitly listing them in a select is to use the xgroup function. It takes column symbol(s) as the left operand and a table as its right operand. The result is a keyed table that is the same as the one obtained by listing all the non-grouped columns in the comparable select.

Using the distribution example,

        `p xgroup sp
p | s               qty
--| -------------------------------
p1| `s$`s1`s2       300 300
p2| `s$`s1`s2`s3`s4 200 400 200 200
p3| `s$,`s1         ,400
p4| `s$`s1`s4       200 300
p5| `s$`s4`s1       100 400
p6| `s$,`s1         ,100

Aggregation without Grouping

Aggregation can be applied against a column of non-simple type in any table. The aggregate function can be any function that processes a list of the appropriate form and produces an atom. While q has many built-in aggregates, you can also define and use your own.

We calculate the total and average order quantity in the sp.q script sample tables by using the built-in aggregates sum and avg.

        select totq:sum qty, avgq:avg qty from sp
totq avgq
-------------
3100 258.3333

Grouping with Aggregation

The equivalent of SQL aggregation is achieved in q by combining grouping with aggregation.

Continuing with the sp.q script example, we combine grouping and aggregation to compute the average order quantity by supplier.

        select avgqty:avg qty by s.name from sp
name | avgqty
-----| --------
blake| 200
clark| 200
jones| 350
smith| 266.6667

Using Uniform and Aggregate Functions

Any uniform or aggregate function can be applied directly to columns in aggregation.

Again using the sp.q distribution example, for each supplier we can find the cumulative low quantity at the same time as the average and high.

        select cumlo:mins qty, av:avg qty, hi:max qty by s.name from sp
name | cumlo                   av       hi
-----| ------------------------------------
blake| 200                     200      200
clark| 100 100 100             200      300
jones| 300 300                 350      400
smith| 300 200 200 200 100 100 266.6667 400

Using each

If the data in a column is not atomic (that is, the column has a list of values for each row), you must use the each modifier to apply an aggregate.

In our sp.q example, suppose we define a table of intermediate results as,

        o:select qty by p.name from sp
        o
name | qty
-----| ---------------
bolt | 200 400 200 200
cam  | 100 400
cog  | ,100
nut  | 300 300
screw| 400 200 300

We must use each to compute the average order size for each product in o.

        select name, avqty:avg qty from o
'length
 
        select name, avqty:avg each qty from o
name  avqty
-----------
bolt  250
cam   250
cog   100
nut   300
screw 300
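The same pattern works with any function of a list, including one you define yourself. The following sketch is self-contained; the two-row table o2 and the column names n and rng are assumptions for illustration and stand in for the grouped result o above.

```q
/ a small table whose qty column holds a list in each row
o2:([] name:`bolt`cam; qty:(200 400 200 200;100 400))

/ each applies the function to every row's list separately
select name, n:count each qty, rng:{(max x)-min x} each qty from o2
```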

Using ungroup

The monadic function ungroup is a partial inverse to the resultant keyed tables of select and xgroup. It unwinds the keyed table into a table whose records have the same format as the original table. How closely its output resembles the original table depends on whether information has been collapsed in the grouping.

We use the sp table from the distribution script for our examples.

        sp
s  p  qty
---------
s1 p1 300
s1 p2 200
s1 p3 400
s1 p4 200
s4 p5 100
s1 p6 100
s2 p1 300
s2 p2 400
s3 p2 200
s4 p2 200
s4 p4 300
s1 p5 400
 
        `p xgroup sp
p | s               qty
--| -------------------------------
p1| `s$`s1`s2       300 300
p2| `s$`s1`s2`s3`s4 200 400 200 200
p3| `s$,`s1         ,400
p4| `s$`s1`s4       200 300
p5| `s$`s4`s1       100 400
p6| `s$,`s1         ,100

Since no aggregation has been performed and all non-key columns are present, the result of ungroup is the same as the original table with the rows sorted by the group column(s).

        ungroup `p xgroup sp
p  s  qty
---------
p1 s1 300
p1 s2 300
p2 s1 200
p2 s2 400
p2 s3 200
p2 s4 200
p3 s1 400
p4 s1 200
p4 s4 300
p5 s4 100
p5 s1 400
p6 s1 100

If aggregation has been performed or columns have been omitted, then only the selected values will be reflected after the ungroup. For example, we omit the s column in the following grouping, so it is also missing after the ungroup.

        select qty by p from sp
p | qty
--| ---------------
p1| 300 300
p2| 200 400 200 200
p3| ,400
p4| 200 300
p5| 100 400
p6| ,100
 
        ungroup select qty by p from sp
p  qty
------
p1 300
p1 300
p2 200
p2 400
p2 200
p2 200
p3 400
p4 200
p4 300
p5 100
p5 400
p6 100
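The round trip can be tried without loading sp.q. This is a minimal sketch on a made-up three-row table; the names t3 and g are assumptions for illustration.

```q
t3:([] p:`p1`p2`p1; qty:300 200 100)

/ grouping collects the qty values of each p into a list
g:select qty by p from t3

/ ungroup flattens the lists again; rows come back ordered by p
ungroup g
```

The flattened result holds the same records as t3, but in group order rather than the original row order.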

Note: The result of a select in which grouping is specified but no columns are explicitly listed is not a keyed table of the proper form for ungroup. You will receive an error if you apply ungroup to the result of such a query.

Sorting

Recall that tables and keyed tables are lists of records and therefore have an inherent order. A table or keyed table can be sorted by the values of any column(s).

We use the following table definition in this section.

        t:([]c1:20 10 30 20;c2:`z`y`x`a)
        t
c1 c2
-----
20 z
10 y
30 x
20 a

xasc

The dyadic xasc takes a scalar or list of symbols containing column names as its left argument and a table as its right argument. It returns the records of the table sorted in ascending order of the items in the specified column(s). The order of the column names indicates the sort order, from major to minor.

        `c1 xasc t
c1 c2
-----
10 y
20 z
20 a
30 x
 
        `c2 xasc t
c1 c2
-----
20 a
30 x
10 y
20 z
 
        `c1`c2 xasc t
c1 c2
-----
10 y
20 a
20 z
30 x

Important: In order to modify the contents of a table you must refer to the table by name.

After execution of the expressions above, we still find,

        t
c1 c2
-----
20 z
10 y
30 x
20 a

However, t can be sorted in place with,

        `c1`c2 xasc `t
`t
        t
c1 c2
-----
10 y
20 a
20 z
30 x

xdesc

The dyadic xdesc behaves exactly as xasc, except that the sort is performed in descending order.

        t
c1 c2
-----
20 z
10 y
30 x
20 a
 
        `c1`c2 xdesc t
c1 c2
-----
30 x
20 z
20 a
10 y
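A mixed ordering (one column ascending, another descending) can be sketched by composing the two functions. This relies on xasc keeping tied records in their existing order, so the minor sort is applied first and the major sort last; the variable name r is an assumption for illustration.

```q
t:([] c1:20 10 30 20; c2:`z`y`x`a)

/ evaluated right to left: sort by c2 descending, then by c1 ascending
r:`c1 xasc `c2 xdesc t
r
```

In the result, c1 is ascending and, within the two records where c1 is 20, c2 is descending.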

Renaming and Rearranging Columns

Since a table is the flip of a column dictionary, its columns are named and ordered by the list of symbols in the dictionary domain. It is sometimes necessary to rename or reorder the columns. This is accomplished using the dyadic functions xcol and xcols.

We use the following table definition in this section.

        t:([]c1:20 10 30 20;c2:`z`y`x`a;c3:101.1 202.2 303.3 404.4)
        t
c1 c2 c3
-----------
20 z  101.1
10 y  202.2
30 x  303.3
20 a  404.4

xcol

The dyadic xcol takes a scalar or list of symbols containing column names as its left argument (names) and a table (source) as its right argument. The count of names must be less than or equal to the number of columns in source. The result is a table obtained from source by renaming the columns, in order, using the symbols in names.

For example,

        `id`name`val xcol t
id name val
-------------
20 z    101.1
10 y    202.2
30 x    303.3
20 a    404.4

Important: The function xcol does not modify its table operand.

After execution of the expressions above, we still find,

        t
c1 c2 c3
-----------
20 z  101.1
10 y  202.2
30 x  303.3
20 a  404.4

However, t can effectively be renamed with,

        t:`id`name`val xcol t
        t
id name val
-------------
20 z    101.1
10 y    202.2
30 x    303.3
20 a    404.4

If the count of names is less than the number of columns in source, the remaining columns are unaffected. Returning to the original definition of t,

        `id`name xcol t
id name c3
-------------
20 z    101.1
10 y    202.2
30 x    303.3
20 a    404.4

xcols

The dyadic xcols takes a scalar or list of symbols containing column names as its left argument ("names") and a table ("source") as its right argument. The count of "names" must be less than or equal to the number of columns in "source". It returns a table obtained from "source" by reordering the columns according to the symbols in "names".

Note: The source operand cannot be a keyed table.

For example,

        `c3`c2`c1 xcols t
c3    c2 c1
-----------
101.1 z  20
202.2 y  10
303.3 x  30
404.4 a  20

Important: The function xcols does not modify its source.

After execution of the expressions above, we still find,

        t
c1 c2 c3
-----------
20 z  101.1
10 y  202.2
30 x  303.3
20 a  404.4

However, t can effectively be reordered with,

        t: `c3`c2`c1 xcols t
        t
c3    c2 c1
-----------
101.1 z  20
202.2 y  10
303.3 x  30
404.4 a  20

If the count of names is less than the number of columns in source, the specified columns are reordered at the beginning of the column list and the remaining columns are left unchanged. Returning to the original definition of t,

        `c3`c1 xcols t
c3    c1 c2
-----------
101.1 20 z
202.2 10 y
303.3 30 x
404.4 20 a
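Renaming and reordering can be combined in a single expression. The sketch below uses the original definition of t; the new column names id, name and val are the ones used earlier in this section.

```q
t:([] c1:20 10 30 20; c2:`z`y`x`a; c3:101.1 202.2 303.3 404.4)

/ evaluated right to left: xcol renames first, then xcols moves val to the front
`val xcols `id`name`val xcol t
```

Because neither function modifies its operand, t itself is unchanged unless the result is assigned back to it.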

Joins

It is common in SQL to reassemble normalized data by joining a table having a foreign key (source) to its primary key table along common key values. This situation occurs when the tables have a master-detail relation, or when the values of a field have been factored into a lookup table. Such an inner join with equals in the join criterion is called an equal join or an equijoin. In an equijoin, the join can be specified in either order, and there will be exactly one record in the result for each record in the source.

An inner join combines two tables having compatible columns by selecting a subset of the Cartesian product along matching column values. In a left inner join, each row from the first table (source) is paired with any matching rows from the second table. In a right inner join, each row from the second table (source) is paired with any matching rows from the first table. The match columns do not need to be key columns. In an inner join, there may be no rows or multiple rows in the result for each row in the source.

SQL also has outer joins, in which each element of one table (source) is paired with all matching elements of the other table. The match columns do not need to be key columns. In an outer join, there is at least one row in the result for each row in the source.

Equijoin on Foreign Key

Given a primary key table m, foreign key table d and common key column k, an equijoin can be expressed in various SQL notations, among them,

        m,d WHERE m.k = d.k
 
        m INNER JOIN d ON m.k = d.k

A SELECT statement for this join refers to columns in the join by using dot notation based on the constituent tables.

SELECT d.cold,m.colm FROM m,d WHERE m.k = d.k

As we saw in Foreign Keys and Relations, a join along a foreign key is accomplished with an enumeration in q. The join is implicit in the following select on the detail table.

select cold, k.colm from d

This generalizes to the situation where d has multiple foreign keys. Say d has foreign keys k1, k2,... ,kn referring to primary key tables m1, m2,... ,mn. Columns from the n-way join of d to the primary key tables are accessed via a select of the form,

select cold, k1.colm1, k2.colm2,... , kn.colmn from d

For example, in the sp.q distribution script,

        select sname:s.name,pname:p.name,qty from sp
sname pname qty
---------------
smith nut   300
smith bolt  200
smith screw 400
smith screw 200
clark cam   100
smith cog   100
jones nut   300
jones bolt  400
blake bolt  200
clark bolt  200
clark screw 300
smith cam   400
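For readers without the sp.q script at hand, the generic m and d case above can be sketched with two tiny tables. The table contents are made up for illustration; only the names m, d, k, colm and cold follow the text.

```q
/ primary key table
m:([k:`a`b`c] colm:10 20 30)

/ detail table; `m$ enumerates k over the key of m, making it a foreign key
d:([] k:`m$`b`a`b; cold:1 2 3)

/ dot notation follows the foreign key, so no explicit join is written
select cold, k.colm from d
```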

Multi-way equijoins also arise when m and d are as above and additionally d has a primary key l. If s is a table with a foreign key whose enumeration domain is l, then m, d and s can be joined. In SQL,

SELECT m.colm, d.cold, s.cols FROM m,d,s WHERE m.k=d.k AND d.l=s.l

In q this is

select l.k.colm, l.cold, cols from s

Pseudo Join

It is possible to look up a table's values in a keyed table even if there is no foreign key relationship defined. One method to achieve this is to perform a dictionary lookup in select. There is no requirement for column names to match and the result will be a left outer join.

In the following example, observe that we must transform the column to be looked up into the proper shape.

        kt:([k:101 102 103] v:`a`b`c)
        t:([] c1:101 103 104)
        select c1, v:kt[flip enlist c1;`v] from t
c1  v
-----
101 a
103 c
104

Here is an example with compound keys.

        t:([]c1:`a`b`c; c2:`x`x`z)
        ktc:([k1:`a`b`a; k2:`y`x`x] v:`one`two`three)
        select c1, c2, v:txf[ktc;(c1;c2);`v] from t
c1 c2 v
-----------
a  x  three
b  x  two
c  z

Ad hoc Left Join

You can also create a left outer join using the dyadic lj. The right operand is a keyed table (lookup) and the left operand is a table (source) having column(s) that match the key column(s) in lookup. In particular, source can have a foreign key defined over lookup. The ad hoc join lj uses lookup to map the records of the appropriate source column(s) and upserts source with the value column(s) from lookup.

In our example,

        tdetails lj tk
eid  sc  name       iq
-----------------------
1003 126 Prefect    126
1002 36  Beeblebrox 42
1001 92  Dent       98
1002 39  Beeblebrox 42
1001 98  Dent       98
1002 42  Beeblebrox 42

The same result can be obtained with a foreign key join by listing all the columns,

        select eid, sc, eid.name, eid.iq from tdetails
eid  sc  name       iq
-----------------------
1003 126 Prefect    126
1002 36  Beeblebrox 42
1001 92  Dent       98
1002 39  Beeblebrox 42
1001 98  Dent       98
1002 42  Beeblebrox 42

Note: The performance of an equijoin on a key is approximately 2.5 times faster than an ad hoc left join.

In contrast to the equijoin, an ad hoc left join does not require a column in the source table to be defined explicitly as a foreign key into the lookup keyed table.

        td:([] eid:1003 1001 1002 1001 1002; sc:126 36 92 39 98)
        td lj tk
eid  sc  name       iq
-----------------------
1003 126 Prefect    126
1001 36  Dent       98
1002 92  Beeblebrox 42
1001 39  Dent       98
1002 98  Beeblebrox 42

Note: If the column(s) for the join are not foreign key(s) into the keyed table, the name(s) must match the key name(s).

Let's examine the general result of lj closely. Say t is the source table and kt is the lookup keyed table. For each record in t, the result has exactly one record. If there is no record in kt whose values in the join column(s) match those in the corresponding column(s) of t, the t columns are present in the result and the remaining columns are null. If there is a matching record in kt, the result record is the catenation of the matching records.

        kt:([k:1 2 3] b:100 200 300)
        kt
k| b
-| ---
1| 100
2| 200
3| 300
 
        t:([]k:1 1 2 2 3 4; a:10 11 20 21 30 40)
        t
k a
----
1 10
1 11
2 20
2 21
3 30
4 40
 
        t lj kt
k a  b
--------
1 10 100
1 11 100
2 20 200
2 21 200
3 30 300
4 40

Advanced: The behavior of lj differs from that of a SQL outer join when there are duplicate columns in the two tables. The SQL left outer join will display both columns, whereas lj upserts the appropriate column items of the source table with those of the lookup keyed table.

        t2:([]k:1 2 3;b:10 20 30)
        t2
k b
----
1 10
2 20
3 30
 
        kt2:([k:1 2 3 4]b:100 200 300 400)
        kt2
k| b
-| ---
1| 100
2| 200
3| 300
4| 400
 
        t2 lj kt2
k b
-----
1 100
2 200
3 300
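If both b columns are wanted in the result, as in the SQL outer join, one option is to rename the lookup's value column before joining. This is a sketch only; the name b2 is an assumption chosen for illustration.

```q
t2:([] k:1 2 3; b:10 20 30)
kt2:([k:1 2 3 4] b:100 200 300 400)

/ xcol renames the key and value columns in order, so the key stays k
t2 lj `k`b2 xcol kt2
```

With no name clash, lj has nothing to overwrite and the source column b keeps its original values alongside b2.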

Plus Join

The plus join pj is a type of left join that is useful for adding matching values in tables containing numeric data. As with an ad hoc join, the right operand of plus join is a keyed table (lookup) and the left operand is a table (source) having column(s) that match the key column(s) in lookup. The plus join pj uses lookup to map the records of the appropriate source column(s), zero filling nulls in the result from the lookup value column(s). It then performs a table add of this interim result into the source table.

For example,

        kt:([k1:1 2; k2:`x`y] a:10 20; b:1.1 2.2)
        t:([]k1:1 2 3; k2:`x`y`z; a:100 200 300)
        t pj kt
k1 k2 a   b
-------------
1  x  110 1.1
2  y  220 2.2
3  z  300 0

We examine the result of pj more closely. Each record of t has a corresponding record in the result.

Along the matching rows, the value columns from lookup kt are added to those of source t. In our example, this means that columns a and b are added into t on matching rows. Since a exists in both tables, corresponding values are added. According to the rules of table arithmetic, since b does not exist in t, it is implicitly assumed to have 0 values in t for the addition.

For non-matching rows, the values of the source t are extended with 0 in the columns of lookup.

Advanced: Note that the result in our example can also be obtained by the expression,

        t+0^kt[`k1`k2#t]
k1 k2 a   b
-------------
1  x  110 1.1
2  y  220 2.2
3  z  300 0

Union Join

The union join uj combines any two tables. In the result, the rows and columns of the left operand appear before those of the right operand. Column value lists are joined for common columns. For non-common columns, the value lists are extended with nulls so that they are the same length. The column value lists of the left operand have nulls appended, whereas those of the right operand have nulls prepended.

        t1:([]c1:1 2 3;c2:101 102 103;c3:`x`y`z)
        t2:([]c2:103 104 105 106;c4:`a`b`c`d)
        t1
c1 c2  c3
---------
1  101 x
2  102 y
3  103 z
 
        t2
c2  c4
------
103 a
104 b
105 c
106 d
 
        t1 uj t2
c1 c2  c3 c4
------------
1  101 x
2  102 y
3  103 z
   103    a
   104    b
   105    c
   106    d

Asof Join

The asof join is so-named because it is often used to join tables along time columns, but this is not a restriction. In general, the triadic function aj can be used to join two tables along common columns. Significantly, there is no requirement for any of the join columns to be keys. The syntax of asof join is,

aj[c1...cn;t1;t2]

where c1...cn is a symbol list of common column names for the join and t1 and t2 are the tables to be joined. The result is a table containing records from the left outer join of t1 and t2 along the specified columns.

For each record in t1, the result has one record containing all the items in t1. If there is no record in t2 whose values in the specified columns match those in the corresponding columns of t1, there are no further items in the result record. If there are matching records in t2, the items of the last (in row order) matching record are appended to those of the t1 record in the result.

For example,

        t:([]ti:10:01:01 10:01:03 10:01:04;sym:`msft`ibm`ge;qty:100 200 150)
        t
ti       sym  qty
-----------------
10:01:01 msft 100
10:01:03 ibm  200
10:01:04 ge   150
 
        q:([]ti:10:01:00 10:01:01 10:01:01 10:01:03;sym:`ibm`msft`msft`ibm; px:100 99 101 98)
        q
ti       sym  px
-----------------
10:01:00 ibm  100
10:01:01 msft 99
10:01:01 msft 101
10:01:03 ibm  98
 
        aj[`ti`sym;t;q]
ti       sym  qty px
---------------------
10:01:01 msft 100 101
10:01:03 ibm  200 98
10:01:04 ge   150

Parameterized Queries

Relational databases have the concept of stored procedures, which are programs that operate on tables via SQL statements. The programming languages that extend SQL are not part of the SQL standard, differ across vendors and the capabilities of the programming environments vary greatly.

This situation forces a programmer to make a difficult choice: pay a steep price in programming power to place functionality close to the data, or extract the data into an application server in order to perform calculations. Multi-tier architectures with separate database and application servers have evolved largely to address this problem, but they increase cost and complexity.

This choice is obviated in kdb+, since the q programming environment has all the power and performance you need. In fact, q is much faster than traditional database programming environments for retrieval and calculations on large time series. Other components of the application can perform their data retrieval and manipulation by making calls to q.

Traditional calls to a database are made via stored procedures, which are programs executed by the database manager. Often the stored procedure has parameters that supply specific values to the queries. Such parameters are limited to the basic data types of SQL.

Any q program can serve as a stored procedure; there is no distinction between data retrieval and calculations. Any valid q expression that operates on tables or dictionaries can be invoked in a function. Function parameters can be used to supply specific values for queries. In particular, the select, update and delete templates can be invoked within a function by using parameters to pass specific values to the query. Such a function is called a parameterized query.

Important: Parameterized queries have restrictions. First, a parameterized query cannot use implicit function parameters. Second, columns cannot be passed as parameters.

In the following example using our tdetails table, we pass a specific value for a foreign key match criterion.

        getScByEid:{[e] select from tdetails where eid=e}
        getScByEid 1003
eid  sc
--------
1003 126

This example can be generalized to handle a scalar or list argument.

        getScByEid:{[e] select from tdetails where eid in ((),e)}
        getScByEid 1001
eid  sc
-------
1001 92
1001 98
 
        getScByEid 1001 1003
eid  sc
--------
1003 126
1001 92
1001 98

The last expression in the revised function definition warrants closer examination. The empty-list join turns a scalar argument into a list and has no effect on a list. It must be enclosed in parentheses because it appears in a phrase in select; otherwise the comma would be interpreted as a separator.

You can pass a table as a parameter to a stored procedure. Suppose we have multiple trade tables, all having at least the columns px (price) and date in common. The following parameterized query returns the maximum price over a specified date range from any trade table.

        maxpx:{[t;range] select max px from t where date within range}

Here t is a trade table and range is a list of two dates in increasing order.
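A usage sketch for maxpx follows. The table trade and its contents are made up for illustration; any table with px and date columns could be passed instead.

```q
maxpx:{[t;range] select max px from t where date within range}

/ hypothetical trade table with the two required columns
trade:([] date:2024.01.01 2024.01.02 2024.01.03; px:10 30 20f)

/ within is inclusive at both ends, so the first two rows qualify
maxpx[trade; 2024.01.01 2024.01.02]
```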

Advanced: You can effectively parameterize column names in two ways. First, you can mimic a common technique from SQL in which the query is created dynamically: build the query text in a string and then pass the string to value for execution. There is a performance penalty for this approach. Also, you must remember to escape special characters in the string.

The second method is to use the functional form of the query, which has no performance penalty. In the functional form, all columns are referred to by name, so column names are passed as symbols.
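Both methods can be sketched side by side. The function names maxByText and maxByFn and the table trade are hypothetical; the first function takes the table name as a symbol because the query text is evaluated against a global table.

```q
/ hypothetical global trade table
trade:([] date:2024.01.01 2024.01.02 2024.01.03; px:10 30 20f)

/ method 1: build the query text and pass it to value
maxByText:{[tn;c] value "select max ",string[c]," from ",string tn}
maxByText[`trade;`px]

/ method 2: functional select; the column name arrives as a symbol
maxByFn:{[t;c] ?[t;();0b;(enlist c)!enlist (max;c)]}
maxByFn[trade;`px]
```

The two calls are intended to return the same one-row table; the second avoids building and parsing a string on every call.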

Views

In SQL, a view is essentially a stored procedure whose result set is used like a table. Views are used to encapsulate such data transformations as hiding data columns or rows, renaming columns, or simplifying complex queries. Q-sql implements a view as an alias to a query.

View

A view is a named query created as an alias with the double assignment (::) operator. In the following, the double colon signifies that v is an alias for the query rather than the current result of the query.

        t:([] c1:`a`b`c; c2:1 2 3)
        v::select c1 from t where c2=2
        v
c1
--
b

When the content of the underlying table changes, the result will be reflected in the view. This is not true of the equivalent single assignment.

        r:select c1 from t where c2=2
        `t insert (`d;2)
,3
 
        t
c1 c2
-----
a  1
b  2
c  3
d  2
 
        r
c1
--
b
 
        v
c1
--
b
d

Functional Forms

The functional forms of select, update and delete can be used in any situation but are especially useful for programmatically generated queries, such as when column names are dynamically produced. The functional forms are,

        ?[t;c;b;a]                / select
 
        ![t;c;b;a]                / update and delete

where t is a table, a is a dictionary of aggregates, b is a dictionary of groupbys and c is a list of constraints.

Note: All q entities in a, b and c must be referenced by name, meaning they appear as symbols containing the entity names.

The q interpreter parses the syntactic forms of select, exec, update and delete into their equivalent functional forms, so there is no performance difference.

Advanced: The function parse can be applied to a string containing a query template to produce a parse tree whose items are close to the arguments of the equivalent functional form. See the description of parse in Appendix A for more details.

Functional select

Let's start with a simple select example.

        t:([]n:`x`y`x`z`z`y;p:0 15 12 20 25 14)
        t
n p
----
x 0
y 15
x 12
z 20
z 25
y 14
 
        select m:max p,s:sum p by name:n from t where p>0,n in `x`y
name| m  s
----| -----
x   | 12 12
y   | 15 29

Following is the equivalent functional form. Note the use of enlist to create singletons, ensuring that appropriate entities are lists.

        c: ((>;`p;0);(in;`n;enlist `x`y))
        b: (enlist `name)!enlist `n
        a: `m`s!((max;`p);(sum;`p))
        ?[t;c;b;a]
name| m  s
----| -----
x   | 12 12
y   | 15 29

Of course, the functional form can be written without the intermediate variables a, b and c. We leave this as an exercise to the macho coder.

The general form of functional select is,

         ?[t;c;b;a]

where t is a table, c is a list of wherespecifications (constraints), b is a dictionary of groupingspecifications (by phrase), and a is a dictionary ofselectspecifications (aggregations).

Every item in c is a triple consisting of aboolean or int valued dyadic function together with its arguments, each anexpression containing column names and other variables. The function is appliedto the two arguments, producing a boolean vector. The resulting boolean vectorselects the rows that yield non-zero results. The selection is performed in theorder of the items in c, from left to right.

The domain of b is a list of symbols that are thekey names for the grouping. The range ofb is a list of columnexpressions whose results are used to construct the groups. The grouping isordered by the domain elements, from major to minor.

The domain of a is a list of symbols containing the names of the produced columns. Each element of the range of a is an evaluation list consisting of a function and its argument(s), each of which is a column name or another such result list. For each evaluation list, the function is applied to the specified value(s) for each row and the result is returned. The evaluation lists are resolved recursively when operations are nested.

Note: Here are the degenerate cases. For no constraints, make c the empty (general) list. For no grouping, make b a boolean 0b. To produce all columns of the original table in the result, make a the empty list.

For example,

        select from t                / is equivalent to functional form
        ?[t;();0b;()]                   / degenerate case for c, b, a

Functional exec

The functional form of exec is a simplified form of select. Since the constraint parameter is the same as in select, we omit it in the following.

In the simplest example of a single result column, the groupby parameter is the empty list and the aggregate parameter is a symbol atom.

        exec n from t
`x`y`x`z`z`y
        ?[t;();();`n]  / same as previous exec
`x`y`x`z`z`y

In the same query with multiple columns, the groupby parameter is the empty list and the aggregate parameter is a dictionary as it would be in a select. Remember that the result is a dictionary rather than a table.

        exec n,p from t
n| x y  x  z  z  y
p| 0 15 12 20 25 14
        ?[t;();();`n`p!`n`p]   / same as previous exec
n| x y  x  z  z  y
p| 0 15 12 20 25 14

If you wish to group by a single column, specify it as a symbol atom.

        exec p by n from t
x| 0  12
y| 15 14
z| 20 25
        ?[t;();`n;`p]          / same as previous exec
x| 0  12
y| 15 14
z| 20 25

More complex examples of exec seem to reduce to the equivalent select.

Functional update

The functional form of update is completely analogous to that of select. Again note the use of enlist to create singletons to ensure that appropriate entities are lists.

        update p:max p by n from t where p>0
n p
----
x 0
y 15
x 12
z 25
z 25
y 15
 
        c: enlist (>;`p;0)
        b: (enlist `n)!enlist `n
        a: (enlist `p)!enlist (max;`p)
        ![t;c;b;a]
n p
----
x 0
y 15
x 12
z 25
z 25
y 15

Note: The degenerate cases are the same as in functional select.
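
As a small illustration of the degenerate cases, the following sketch updates every row with no constraints and no grouping. The doubling expression is chosen only for illustration.

```q
t:([]n:`x`y`x`z`z`y;p:0 15 12 20 25 14)
/ update p:2*p from t
/ c is the empty list (no constraints) and b is 0b (no grouping)
![t;();0b;(enlist `p)!enlist (*;2;`p)]
```

Because t is passed by value here, the original table is unchanged; the call returns a new table with the p column doubled.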

Functional delete

The functional form of delete is a simplified form of functional update,

         ![t;c;0b;a]

where t is a table, c is a list of where specifications (constraints) and a is a list of column names. Either c or a, but not both, must be present. The list of constraints, which has the same format as in functional select and update, chooses which rows will be removed. The aggregates argument is a simple list of symbols with the names of columns to be removed.

In the following examples, note the use of enlist to create singletons to ensure that appropriate entities are lists.

        t:([]c1:`a`b`c;c2:`x`y`z)
 
        / following is: delete c2 from t
        ![t;();0b;enlist `c2]
c1
--
a
b
c
 
        / following is: delete from t where c2 = `y
        ![t;enlist (=;`c2; enlist `y);0b;`symbol$()]
c1 c2
-----
a  x
c  z

Examples

In this section we demonstrate many of the capabilities of q-sql using semi-serious examples taken from the world of finance. We create a sample table representing a month's worth of trades for a small set of American stocks. To make things easy, we treat all trades as buys.

The Table Schemas

Our vastly over-simplified trading example involves two tables. The instrument table is a reference keyed table that contains basic information about the companies whose financial instruments (stocks in our case) are traded. Its schema has fields for the stock symbol, the name of the company and the industry classification of the company.

        instrument:([sym:`symbol$()] name:`symbol$(); industry:`symbol$())
        instrument
sym| name industry
---| -------------

The trade table represents a collection of trades. Each trade record comprises: the symbol of the instrument; the date and time of the trade; the quantity (i.e., the number of shares traded); and the price of the trade.

        trade:([] sym:`instrument$(); date:`date$(); time:`time$(); quant:`int$();px:`float$())
        trade
sym date time quant px
----------------------

Note: In practice, the trade table would likely be partitioned by day on disk and only the current day's trades would be stored in memory.

Creating the Tables

Populating the instrument reference table is done via simple inserts.

 `instrument insert (`ibm; `$"International Business Machines"; `$"Computer Services")
 `instrument insert (`msft; `$"Microsoft"; `$"Software")
 `instrument insert (`g; `$"Google"; `$"Internet")
 `instrument insert (`intc; `$"Intel"; `$"Semiconductors")
 `instrument insert (`gm; `$"General Motors"; `$"Automobiles")
 `instrument insert (`ge; `$"General Electric"; `$"Diversified Industries")

Here is the console display of instrument,

        instrument
sym | name                            industry
----| ------------------------------------------------------
ibm | International Business Machines Computer Services
msft| Microsoft                       Software
g   | Google                          Internet
intc| Intel                           Semiconductors
gm  | General Motors                  Automobiles
ge  | General Electric                Diversified Industries

In order to populate the trade table with somewhat realistic data, we create an auxiliary function. The filltrade function takes the name of the target trade table, a stock symbol, a median price and a count. It populates the named table with simulated trade data for the month of Jan 2007. The trades are randomly distributed across days and times. The quantities occur in multiples of 10. The prices are uniformly distributed around the median price. We do not claim that this represents realistic trade data; only that it is sufficient to serve our query examples.

filltrade:{[tname;s;p;n]
        // tname is name of target table
        // s is stock symbol
        // p is median price
        // n is count of items
        //
        / sym column duplicates stock symbol n times
        sc:n#s;
        / date column has n random days in Jan 2007
        dc:2007.01.01+n?31;
        / time column has n random times
        tc:n?24:00:00.000;
        / quantity column has n random multiples of 10
        qc:10*n?1000;
        / price column has n random prices that are
        / distributed uniformly around p
        / prices are in pennies
        pc:.01*floor (.9*p)+n?.2*p*:100;
        / bulk insert columns into target table
        tname insert (sc;dc;tc;qc;pc)
        }
 
        filltrade[`trade;`ibm;115;10000]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 ..
 
        trade
sym date       time         quant px
----------------------------------------
ibm 2007.01.15 02:32:54.217 9280  111.59
ibm 2007.01.20 08:56:05.985 9960  110.69
ibm 2007.01.24 19:20:17.727 5970  114.58
ibm 2007.01.21 08:44:50.939 1090  113.32
..

We invoke filltrade on each of the remaining instruments.

        filltrade[`trade;`msft;30;5000]
10000 10001 10002 10003 10004 10005 10006 10007 10008 10009 10010..
        filltrade[`trade;`g;540;12000]
15000 15001 15002 15003 15004 15005 15006 15007 15008 15009 15010..
        filltrade[`trade;`intc;25;4000]
27000 27001 27002 27003 27004 27005 27006 27007 27008 27009 27010..
        filltrade[`trade;`ge;40;9000]
31000 31001 31002 31003 31004 31005 31006 31007 31008 31009 31010..
        filltrade[`trade;`gm;35;3000]
40000 40001 40002 40003 40004 40005 40006 40007 40008 40009 40010..

Finally, we sort trade by date and time so that it represents trades as they came in.

        `date`time xasc `trade
`trade
        trade
sym  date       time         quant px
-----------------------------------------
intc 2007.01.01 00:00:04.569 5440  26.63
ge   2007.01.01 00:02:24.871 8280  40.11
gm   2007.01.01 00:02:43.419 4280  32.13
ibm  2007.01.01 00:03:06.278 5070  105.73
intc 2007.01.01 00:03:24.229 1740  24.47
gm   2007.01.01 00:04:17.590 830   36.53
gm   2007.01.01 00:04:18.227 5060  33.02
ge   2007.01.01 00:04:18.772 8290  43.73
msft 2007.01.01 00:06:01.424 5170  27.71
..

Basic Queries

In this section, we demonstrate the use of basic q-sql to query the trade and instrument tables we have created.

We can count the total number of trades in several ways.

        count trade
43000
        select count i from trade
x
-----
43000
        exec count i from trade
43000

We can count the number of trades for an individual symbol.

        exec count i from trade where sym=`ibm
10000
        count select from trade where sym=`ibm
10000

Observe that the former retrieves only a single record from the query whereas the latter retrieves all matching records and then counts them.

We can count the number of trades across all symbols.

        select count i by sym from trade
sym | x
----| -----
g   | 12000
ge  | 9000
gm  | 3000
ibm | 10000
intc| 4000
msft| 5000
 
        () xkey select count i by sym from trade
sym  x
----------
g    12000
ge   9000
gm   3000
ibm  10000
intc 4000
msft 5000

Observe that the former retrieves the results as a keyed table and the latter removes the key.
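
A common alternative for removing the key is the 0! idiom. The following sketch uses a tiny stand-in table named tr (a hypothetical name, so as not to disturb the trade table built earlier) to check that the two spellings agree.

```q
/ tr is a small illustrative table, not the trade table populated above
tr:([]sym:`g`ibm`g`msft;px:540 115 541 30f)
k:select count i by sym from tr   / keyed table
0!k                               / same rows with the key removed
(0!k)~() xkey k                   / 1b
```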

We find one day's trades for GM.

        select from trade where sym=`gm, date=2007.01.07
sym date       time         quant px
---------------------------------------
gm  2007.01.07 00:29:31.311 4390  32.24
gm  2007.01.07 00:29:57.886 1270  38.08
gm  2007.01.07 00:30:35.671 3370  35.67
gm  2007.01.07 00:30:43.216 8090  36.77
gm  2007.01.07 00:44:26.336 1800  35.03
..

We find all lunch hour trades for GM.

        select from trade where sym=`gm, time within (12:00:00;13:00:00)
 
sym date       time         quant px
---------------------------------------
gm  2007.01.01 12:01:32.133 7960  33.61
gm  2007.01.01 12:37:45.021 8480  31.84
gm  2007.01.01 12:39:46.197 5350  32.34
gm  2007.01.01 12:57:13.215 1090  33.34
gm  2007.01.02 12:53:06.764 1080  31.63
..

We find the maximum daily price for GE. Due to our simplistic construction, it is statistically constant.

        select maxpx:max px by date from trade where sym=`ge
date      | maxpx
----------| -----
2007.01.01| 43.97
2007.01.02| 43.99
2007.01.03| 43.99
2007.01.04| 43.98
..

We find the minimum and maximum trade price over the time span for each symbol and display the result by company name. The latter resolves the foreign key to the instrument table with an implicit inner join.

        select lo:min px, hi:max px by sym.name from trade
 
name                           | lo    hi
-------------------------------| ------------
General Electric               | 36    43.99
General Motors                 | 31.5  38.49
Google                         | 486   593.99
Intel                          | 22.5  27.49
International Business Machines| 103.5 126.49
Microsoft                      | 27    32.99

We find the total and average trade volume for three symbols. Due to our simplistic construction, the latter are statistically the same.

        select totq:sum quant, avgq:avg quant by sym from trade where sym in `ibm`msft`g
sym | totq     avgq
----| -----------------
g   | 59748830 4979.069
ibm | 49983940 4998.394
msft| 24988910 4997.782

We find the daily volume weighted average price for Intel.

        select vwap:quant wavg px by date from trade where sym=`intc
date      | vwap
----------| --------
2007.01.01| 24.86849
2007.01.02| 25.00113
2007.01.03| 24.82538
2007.01.04| 24.98049
2007.01.05| 25.27898
..

We find the high, low and close over one minute intervals for Intel.

        select hi:max px,lo:min px,close:last px by date, time.minute from trade where sym=`intc
 
date       minute| hi    lo    close
-----------------| -----------------
2007.01.01 00:12 | 23.3  23.3  23.3
2007.01.01 00:17 | 24.03 24.03 24.03
2007.01.01 00:26 | 24.45 24.45 24.45
2007.01.01 00:51 | 25.73 25.73 25.73
2007.01.01 00:55 | 25.34 25.34 25.34
..

We demonstrate how to use your own functions in queries. Suppose we define a funky average that weights items by their position.

        favg:{(sum x*1+til count x)%(count x)*count x}

Then we can apply this just as we did the built-in q function avg.

        select favgpx:favg px by sym from trade
sym | favgpx
----| --------
g   | 270.0021
ge  | 19.99897
gm  | 17.51145
ibm | 57.53255
intc| 12.48081
msft| 15.00309

Meaty Queries

In this section, we demonstrate more interesting q-sql against the trade table.

We find the volume weighted average price over 5 minute intervals for Intel.

        select vwap:quant wavg px by date, bucket:5 xbar time.minute from trade where sym=`intc
date       bucket| vwap
-----------------| --------
2007.01.01 00:10 | 23.3
2007.01.01 00:15 | 24.03
2007.01.01 00:25 | 24.45
2007.01.01 00:50 | 25.73
2007.01.01 00:55 | 25.34
..

We use favg from the previous section to demonstrate that user functions can appear in any phrase of the query.

        select from trade where px<2*(favg;px) fby sym
sym  date       time         quant px
-----------------------------------------
gm   2007.01.01 00:06:02.168 5270  33.6
g    2007.01.01 00:07:36.023 9340  527.71
g    2007.01.01 00:09:46.313 3640  491.6
intc 2007.01.01 00:12:05.909 610   23.3
ibm  2007.01.01 00:12:17.056 6410  112.92
..

We find the average daily volume and price for all instruments and store the result for the next example.

        atrades:select avgqt:avg quant, avgpx:avg px by sym, date from trade
        atrades
sym date      | avgqt    avgpx
--------------| -----------------
g   2007.01.01| 5098.892 542.3796
g   2007.01.02| 5021.136 538.6672
g   2007.01.03| 5114     539.1208
g   2007.01.04| 4712.385 541.5371
g   2007.01.05| 5202.108 539.6128
..

We find the days when the average price went up. Note that we must explicitly exclude the first day because deltas is funky on its first value. Observe that the avgpx column scrolls off the page.

        select date, avgpx by sym from atrades where 0<{0,1_deltas x} avgpx
sym | date
----| -------------------------------------------...
g   | 2007.01.03 2007.01.04 2007.01.06 2007.01.08...
ge  | 2007.01.02 2007.01.04 2007.01.06 2007.01.08...
gm  | 2007.01.02 2007.01.04 2007.01.05 2007.01.07...
ibm | 2007.01.01 2007.01.03 2007.01.05 2007.01.08...
intc| 2007.01.04 2007.01.05 2007.01.08 2007.01.10...
msft| 2007.01.01 2007.01.02 2007.01.04 2007.01.07...

To see a more representative display, take only the first few field values.

        select 2#date, 2#avgpx by sym from atrades where 0<{0,1_deltas x} avgpx
sym | date                  avgpx
----| ---------------------------------------
g   | 2007.01.03 2007.01.04 539.1208 541.5371
ge  | 2007.01.02 2007.01.04 39.98092 40.115
gm  | 2007.01.02 2007.01.04 35.13107 35.25371
ibm | 2007.01.01 2007.01.03 115.1667 115.1036
intc| 2007.01.04 2007.01.05 24.83024 25.18836
msft| 2007.01.01 2007.01.02 29.73195 30.03784

We can denormalize trade to obtain a keyed table with one row and complex columns for each symbol. We display the first two items of each field to make the structure more evident.

        dntrades:select date,time,quant,px by sym from trade
        select 2#date,2#time,2#quant,2#px by sym from trade
sym | date                  time                      quant     px
----| -----------------------------------------------------------------------
g   | 2007.01.01 2007.01.01 00:09:54.444 00:12:34.851 4670 3080 591.05 523.08
ge  | 2007.01.01 2007.01.01 00:02:24.871 00:04:18.772 8280 8290 40.11  43.73
gm  | 2007.01.01 2007.01.01 00:02:43.419 00:04:17.590 4280 830  32.13  36.53
ibm | 2007.01.01 2007.01.01 00:03:06.278 00:06:27.951 5070 9740 105.73 117.76
intc| 2007.01.01 2007.01.01 00:00:04.569 00:03:24.229 5440 1740 26.63  24.47
msft| 2007.01.01 2007.01.01 00:06:01.424 00:23:28.908 5170 1370 27.71  29.86

In such a complex table or keyed table, you must use each to apply a monadic (unary) function across the items in a field.

        select sym,cnt:count each date, avgpx:avg each px from dntrades
 
        / or the following alternate notation is equivalent
 
        select sym,cnt:each[count] date, avgpx: each[avg] px from dntrades
 
sym  cnt   avgpx
-------------------
g    12000 540.0778
ge   9000  39.99574
gm   3000  34.98716
ibm  10000 114.978
intc 4000  24.96621
msft 5000  29.98583

We can also apply our own monadic favg function with each.

        select sym, favgpx:favg each px from dntrades
sym  favgpx
-------------
g    269.94
ge   19.98121
gm   17.48443
ibm  57.49667
intc 12.48413
msft 15.0314

We find the volume weighted average price by applying the dyadic wavg. In this case we must use the each-both adverb '. Observe that our simplistic construction makes the average price and volume weighted average price statistically the same.

        select sym, vwap:quant wavg' px from dntrades
        / is equivalent to the alternate notation
        select sym, vwap:wavg'[quant;px] from dntrades
sym  vwap
-------------
g    540.1832
ge   40.00807
gm   34.95398
ibm  114.9836
intc 24.97542
msft 29.96661

Note that the latter form generalizes to n-adic functions for any n>1.
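
To make the bracketed each-both notation concrete for n=3, here is a minimal sketch. The function f3 is a hypothetical example and is not part of the trading tables.

```q
/ f3 is an illustrative triadic function
f3:{x+y*z}
/ each-both applies f3 item by item across the three lists
f3'[1 2 3;4 5 6;7 8 9]     / 29 42 57
```

Remember that q evaluates right to left, so each item is computed as x+(y*z).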

We find the profit of the ideal transaction over the month for each symbol. This is the maximum amount of money that could be made with 20-20 hindsight. In other words, find the largest profit obtainable by buying at any traded price and selling at the highest subsequently traded price. To solve this, we reverse the perspective. For each traded price, we look at the minimum of the prices that preceded it. The largest such difference is our answer.

        select max px-mins px by sym from trade
sym | px
----| ------
g   | 107.99
ge  | 7.99
gm  | 6.99
ibm | 22.99
intc| 4.99
msft| 5.99
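
The running-minimum trick is easier to follow on a short list. The following sketch uses a made-up price vector; the values are illustrative only.

```q
px:5 3 6 2 7 4
mins px          / 5 3 3 2 2 2   lowest price seen so far
px-mins px       / 0 0 3 0 5 2   best profit if selling at each price
max px-mins px   / 5             buy at 2, sell at 7
```

Because mins only looks backward, a sale is never matched with a purchase that happened after it.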

Remote Queries

In this section, we demonstrate how to execute q-sql queries against a remote server. We assume that our sample tables have been created in a q instance (the server) that is listening on some port, say 5042. We also assume that we have another q process (the client) with an open handle h to the server. See IO for details on how to connect to remote processes in q. The following expressions are all executed on the client.

We can ask the server to list its tables.

        h "tables `."
`dntrades`instrument`trade

We can ask the server for the count of its trade table.

        h "count trade"
43000

We look up a sym by name. Observe that the result is a vector.

        h "exec sym from instrument where name=`Intel"
,`intc

We can also match on a name that contains spaces. Observe the necessity of escaping the double quotes inside the dynamic q-sql string.

        h "exec name from instrument where name=`$\"General Electric\""
,`General Electric

We can construct a query on the client and send it to the server along with parameters to be executed.

        qdaily:{[s;d] select from trade where sym=s, date=d}
        h (qdaily;`g;2007.01.12)
sym date       time         quant px
----------------------------------------
g   2007.01.12 00:03:24.082 3570  507.44
g   2007.01.12 00:05:31.920 2900  588.99
..

We can construct the same query on the server and execute it remotely.

        h "qdaily:{[s;d] select from trade where sym=s, date=d}"
        / verify that it's there
        h "qdaily"
{[s;d] select from trade where sym=s, date=d}
 
        / execute it
        h "qdaily[`msft;2007.01.31]"
sym  date       time         quant px
----------------------------------------
msft 2007.01.31 00:00:41.237 9940  29.65
msft 2007.01.31 00:01:36.508 580   27.19
..

 

 

10. Execution Control

Overview

Function evaluation provides sequential execution of a series of expressions. In this chapter, we demonstrate how to control execution in q.

Control Flow

In a vector-oriented language such as q, the clearest code and best performance is generally obtained by avoiding loops and individual tests. For those times when you simply must write iffy or loopy code, q has versions of the usual constructs.

Warning: The constructs in this section all involve branching in the byte code that is generated by the q interpreter. The offset of the branch destination is limited (currently to 255), which means that the sequence of q expressions that can be contained in any part of $, if, do, or while must be short. At some point, insertion of one additional statement will result in a branch error, which is q's way of rejecting bloated code. If you insist on writing iffy or loopy code (never a good idea in q), factor code blocks into separate functions.

Basic Conditional Evaluation

Languages of C heritage have a form of in-line 'if' called conditional evaluation that has the form,

exprcond ? exprtrue : exprfalse

where exprcond is an expression that evaluates to a boolean (or int in C and C++). The result of the expression is exprtrue when exprcond is true (or non-zero) and exprfalse otherwise.

The same effect can be achieved in q using basic conditional evaluation,

$[exprcond;exprtrue;exprfalse]

where exprcond is an expression that evaluates to a boolean or int. The result is exprtrue when exprcond is not zero and exprfalse if it is zero.

        a:42
        b:98
        $[a>60;`Pass;`Fail]
`Fail
        $[b>60;`Pass;`Fail]
`Pass

Observe that a test for zero in exprcond can be abbreviated.

        c:0
        $[a;`Nonzero;`Zero]
`Nonzero
        $[b;`Nonzero;`Zero]
`Nonzero
        $[c;`Nonzero;`Zero]
`Zero

Note: A null is not accepted for exprcond.

        d:0N
        $[d;`NonNull;`Null]
'type

Extended Conditional Evaluation

In languages of C heritage, the if-else construct has the form,

if (exprcond) {
    statementtrue1;
    ...
}
else {
    statementfalse1;
    ...
}

where exprcond is an expression that evaluates to a boolean (or int in C and C++). If the expression exprcond is true (or non-zero) the first sequence of statements in braces is executed; otherwise, the second sequence of statements in braces is executed.

A similar effect can be achieved in q using an extended form of conditional evaluation.

$[exprcond;[exprtrue1;...];[exprfalse1;...]]

where exprcond is an expression that evaluates to a boolean or int. When exprcond evaluates to non-zero, the first bracketed sequence of expressions is executed in left-to-right order; otherwise, the second bracketed sequence of expressions is executed.

        a1:42
        a2:24
        $[a1<>42;[a:6;b:7;a*b];[a:`Life;b:`the;c:`Universe;a,b,c]]
`Life`the`Universe
 
        $[a2<>42;[a:6;b:7;a*b];[a:`Life;b:`the;c:`Universe;a,b,c]]
42

Languages of C heritage have a cascading form of if-else in which multiple tests can be made,

if (exprcond1) {
    statementtrue11;
    ...
}
else if (exprcondn) {
    statementtruen1;
    ...
}
...
else {
    statementfalse;
    ...
}

In this construction, the exprcond are evaluated consecutively until one is true (or non-zero), at which point the associated block of statements is executed and the statement is complete. If none of the expressions passes, the final block of statements, called the default case, is executed.

Note that any conditional other than the first is only evaluated if all those prior to it have evaluated to false. In addition, only one of the statement blocks will be executed.

A similar effect can be achieved in q with another extended form of conditional execution.

$[exprcond1;exprtrue1;... ;exprcondn;exprtruen;exprfalse]

In this form, the conditional expressions are evaluated consecutively until one is non-zero, at which point the associated exprtrue is evaluated and its result is returned. If none of the conditional expressions evaluates to non-zero, exprfalse is evaluated and its result is returned. Observe that exprfalse is distinguished as the last expression following a sequence of paired expressions.

Note: Any conditional other than the first is only evaluated if all those prior to it have evaluated to zero. Otherwise put, a conditional evaluating to non-zero short-circuits the evaluation of all those after it.

         a:42
         b:0
         c:-42
         $[a=0;`zero;a>0;`pos;`neg]
`pos
         $[b=0;`zero;b>0;`pos;`neg]
`zero
         $[c=0;`zero;c>0;`pos;`neg]
`neg

Finally, the previous extended form of conditional execution can be further extended by substituting a bracketed sequence of expressions for any exprtrue or exprfalse.

$[exprcond1;[exprtrue11;...];... ; exprcondn;[exprtruen1;...];[exprfalse1;...]]
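
A small sketch of this fully extended form follows. The function name classify and the local variable r are hypothetical; each bracketed block runs its expressions left to right and the value of its last expression becomes the result.

```q
/ classify is an illustrative function; r is a local variable
classify:{$[x=0;[r:`zero;r];x>0;[r:`pos;r];[r:`neg;r]]}
classify each 42 0 -42     / `pos`zero`neg
```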

Vector Conditional Evaluation

Triadic vector-conditional evaluation ( ? ) has the form,

?[vb; exprtrue ; exprfalse]

where vb is a simple boolean list and exprtrue and exprfalse are atoms or vectors of the same type that conform to vb. The result conforms to vb, and contains exprtrue in positions where vb has 1b and exprfalse in positions where vb has 0b.

The following example inserts 42 for odd-valued items of a list.

        L:(til 10) mod 3
        L
0 1 2 0 1 2 0 1 2 0
 
        ?[0=L mod 2;L;42]
0 42 2 0 42 2 0 42 2 0

Note: All arguments of a vector-conditional are fully executed. In other words, there is no short circuiting of the evaluation.
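
As a further illustration, the following sketch computes an absolute value with the vector conditional. The list x is made up for the example. Both x and neg x are computed in full before any selection takes place.

```q
x:3 -4 0 5
/ keep x where it is positive, otherwise take the negated value
?[x>0;x;neg x]    / 3 4 0 5
```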

if

The if statement conditionally evaluates a sequence of expressions. It has the form,

if[exprcond;expr1;... ;exprn]

where exprcond is evaluated and if it is non-zero the expressions expr1 thru exprn are evaluated in left-to-right order. The if statement does not have an explicit result.

For example,

        a:42
        b:98
        z:""
        if[a=42;z:"Life the universe and everything"]
        z
"Life the universe and everything"
 
        if[b<>42;x:6;y:7;z:x*y]
        z
42

do

The do statement is an iterator of the form,

do[exprcount; expr1;... ; exprn]

where exprcount must evaluate to an int. The expressions expr1 thru exprn are evaluated exprcount times in left-to-right order. The do statement does not have an explicit result.

For example, the following expression computes n factorial. It iterates n-1 times, decrementing the factor f on each pass.

        n:5
        do[-1+f:r:n;r*:f-:1]
        r
120
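
In keeping with the earlier advice to avoid loops, the same result can be obtained without do. This is a sketch of the vector-style alternative using the built-in til and prd.

```q
n:5
/ 1+til n is 1 2 3 4 5 and prd multiplies the items together
prd 1+til n    / 120
```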

while

The while statement is an iterator of the form,

while[exprcond;expr1;... ;exprn]

where exprcond is evaluated and the expressions expr1 thru exprn are evaluated repeatedly in left-to-right order as long as exprcond is non-zero. The while statement does not have an explicit result.

Let's examine a nifty example taken from the Q Language Reference Manual. The following function returns a list in which each null item in the argument list x has been replaced with the item before it.

        f:{r:x;r[i]:r[-1+i:where null r];r}

Now observe that the expression,

        max null v

indicates whether there are any nulls in a list v (why?).

The following expression applies f iteratively until there are no nulls left in v.

        while[max null v;v:f v]

Effectively, non-null values are propagated forward across nulls.

        v:10 -3.1 0n 42 0n 0n 0n 3.4
        while[max null v;v:f v]
        v
10 -3.1 -3.1 42 42 42 42 3.4

Do you see the problem with this example? Hint: consider the case where v has one or more initial null items and remember that Ctrl-C terminates execution of a long-running q expression. The while expression will iterate forever because there is no value to propagate across the initial item.

When you know v will be of a type having an underlying numeric value, one solution is to prepend a default initial value and remove it afterward. We use a type-matched zero,

        v:0n -3.1 0n 42 0n 0n 0n 3.0
        w:((type v)$0),v
        while[max null w;w:f w]
        1_w
0 -3.1 -3.1 42 42 42 42 3
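
For comparison, current versions of q provide the keyword fills, which propagates non-null values forward without an explicit loop. The sketch below assumes a q version in which fills is available; leading nulls are left in place, so they are replaced separately with the fill operator ^.

```q
v:10 -3.1 0n 42 0n 0n 0n 3.4
fills v            / 10 -3.1 -3.1 42 42 42 42 3.4
w:0n -3.1 0n 42
/ the leading null survives fills, so substitute 0f for it afterward
0f^fills w         / 0 -3.1 -3.1 42
```

This version cannot loop forever on a leading null, which is the hazard of the while form.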

Return and Signal

Normal function execution evaluates each expression in the function and terminates after the last one. There are two mechanisms for ending the execution early: one returns successfully and the other aborts.

To terminate a function's execution successfully and return a value, use an empty assignment, which is assign (:) with a value to its right and no variable to its left. For example, in the following contrived function, execution is terminated and the result is returned after the third expression. The final expression is never evaluated.

        c:0
        f:{a:6;b:7;:a*b;c::98}
        f 0
42
        c
0

To abort function execution immediately, use signal, which is single-quote (') with a value to its right. For example, in the following function, execution will be aborted in the third expression. The final expression that assigns c is never evaluated.

        c:0
        g:{a:6;b:7;'`TheEnd;c::98}
        g 0
{a:6;b:7;'`TheEnd;c::98}
'TheEnd
 
        c
0

Note: Unless a function issuing a signal is invoked with protected execution, the signal will cause the calling routine to fail.

You can also use signal within an if statement to terminate execution. Compare the following,

        a:42
        if[a<50; '`Stop; b:100]
'Stop

Protected Evaluation

Languages of C++ heritage have the concept of protected execution using a try-catch. The idea is that an unexpected condition arising from any statement enclosed in the try portion does not abort execution. Instead, control transfers to the catch block, where the exception can be handled or passed up to the caller. This mechanism allows the call stack to be unwound gracefully.

Q provides a similar capability using triadic forms of function evaluation ( @ ) and ( . ). Triadic @ is used for monadic functions and triadic . is used for multivalent functions. The syntax is similar for both,

@[fmon;a;exprfail]

.[fmul;Largs;exprfail]

Here fmon is a monadic function, a is a single argument, fmul is a multivalent function, Largs is a list of arguments, and exprfail is any expression. In both forms, the function is applied to its argument(s). Provided there is no error in evaluating the function, the return value of the function is returned from the protected evaluation. Should an error arise, exprfail is evaluated.

Note: If exprfail results in an error, the protected call itself will fail.

These functions are especially useful when processing input received from users. In the following examples, you would replace the unhelpful error message with more useful error handling.

Suppose a user wishes to enter dynamic q expressions. You could place the expression in a string and pass it to value. The problem with this is that if the user types an invalid q expression, it will cause the application to fail. You should instead use protected execution.

        s:"6*7"
        @[value;s;`$"Invalid q expression"]
42
 
        s:"6x7"
        @[value;s;`$"Invalid q expression"]
`Invalid q expression

Similarly, triadic . provides protected execution for multivalent functions.

        x:6
        y:7
        .[*;(x;y);`$" Invalid args for *"]
42
        x:6
        y:`7
        .[*;(x;y);`$" Invalid args for *"]
`Invalid args for *
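
The failure expression can also be a function, in which case q applies it to the error message as a string. The following sketch combines a signal with a protected call; the function name safediv and the symbol `divzero are hypothetical and chosen only for illustration.

```q
/ safediv is an illustrative function that signals on a zero divisor
safediv:{[x;y] if[y=0;'`divzero]; x%y}
/ the handler receives the error text and builds a message from it
.[safediv;(6;3);{"caught: ",x}]    / 2f
.[safediv;(6;0);{"caught: ",x}]    / "caught: divzero"
```

Passing a handler function rather than a constant lets the caller log or inspect the actual error instead of returning a fixed value.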

Debugging

Debugging in q harkens back to the olden days, before the advent of debuggers and integrated development environments. The q gods don't give debugging much consideration because their code always runs correctly the first time. For the rest of us, things aren't quite as bad as inserting print statements, but you are certainly on your own. There is no debugger, nor is there any notion of break points or tracing execution.

When any expression evaluation fails, the console displays an (often cryptic) error message along with a dump of the offending values. Many errors manifest as either 'type or 'length, indicating an incompatibility in function arguments with respect to type or length. The goal is to discover the root cause of the superficial error.

The first step is to examine the dump of the offending arguments. Sometimes, the error will be obvious. A common 'type culprit is violation of type checking by attempting to assign a non-matching value to a simple list (e.g., a table column). Another common 'type offense is attempting to perform an operation on an atom not in the domain of the operation. A common culprit is failure to enlist an argument when a list is expected.

In a technique passed on by Simon Garland, you can get a more useful display of relevant information when a function is suspended. Define a function, say zs, as follows,

        zs:{`d`P`L`G`D!(system"d"),v[1 2 3],enlist last v:value x}

This function takes another function as its argument and returns a dictionary with entries for the current directory, function parameters, local variables referenced, global variables referenced and the function definition.

We demonstrate this with a trivial example.

        b:7
        f:{a:6;x+a*b}
 
        f[100]                / this is OK
142
        f[`100]                / this is an error
{a:6;x+a*b}
'type
+
`100
42
        zs f                / see what's what
d| `.
P| ,`x
L| ,`a
G| ``b
D| "{a:6;x+a*b}"

Stopping execution prior to the offending expression is helpful. This can be done by inserting a signal before the expression you wish to examine. You can then evaluate the various items in the offending evaluation. Stopping execution with a signal is a poor man's break point.

However the execution is suspended, you can evaluate the expressions of the function by hand from the console. To resume execution with a return value, issue a return ( : ) with the desired value at the command prompt. To return an error, issue a signal ( ' ) from the command line. To terminate execution and clear the call stack, issue ( \ ) from the command line.

Scripts

A script is a q program stored in a text file with an extension of 'q'. A script can contain any q expressions or commands. The contents of the script are executed sequentially from top to bottom. Non-local entities created in the script exist in the workspace after the script is loaded.

Creating and Loading a Script

You can create a script in a text editor and save it with a q extension. For example, enter the following lines and save to a file named trades.q in the q directory.

        trades:([] sym:(); ex:(); time:(); price:())
        `trades insert (`IBM;`N; 12:10:00.0; 82.1)
        `trades insert (`IBM;`O; 12:30:00.0; 81.95)
        `trades insert (`MSFT;`N; 12:45:00.0; 23.45)
        `trades insert (`IBM;`N; 12:50:00.0; 82.05)
        `trades insert (`MSFT;`N; 13:30:00.0; 23.40)

Now issue the load command,

        \l trades.q
,0
,1
,2
,3
,4

You can verify that the trades table has been created and the records have been inserted.

       count trades
5

A script can be loaded at the start of the q session, or at any time during the session using the \l command. The load command can be executed from the console or from another script. See here for more on commands.

Special Notations

You can comment out a block of code by surrounding it with matching / and \. An unmatched \ exits the script.

Multi-line expressions are permitted in a script but they have a special form. The first line must be out-dented, meaning that it begins at the left of the line with no initial whitespace. Any continuation lines must be indented, meaning that there is at least one whitespace character at the beginning of the line. Empty lines between expressions are permitted.

Table definition syntax and function definition syntax have the same rule for splitting across multiple lines:

A table or function can have line breaks after the closing square bracket or after a semicolon separator (;).
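As an illustration of the rule, here is a small script fragment. The names quotes and wmean are made up for this sketch. Each definition starts at the left margin and every continuation line is indented; note that the closing brace stays on the last indented line rather than on a line of its own.

```q
/ table definition: line break after the closing square bracket
quotes:([]
  sym:`IBM`MSFT;
  px:82.1 23.45)

/ function definition: line breaks after semicolons
wmean:{[p;w]
  total:sum p*w;
  total%sum w}
```

After loading a script containing these lines, quotes is a two-row table and wmean[quotes`px;1 1] evaluates the weighted mean of the prices.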

Passing Parameters

Parameters are passed to a q script at q startup similarly to command line parameters in a C or Java program. They are strings that are not explicitly declared and are accessed positionally corresponding to the order in which they are passed.

Note: As of this writing (Jun 2007), parameters can be passed when a script is loaded at q startup but not when a script is loaded with the \l command.

Specifically, the system variable .z.x is a list of strings, each of which contains the char representation of an argument present when the script was invoked. For example, the script captureargs.q,

         / script that captures its first three arguments
        p0:.z.x 0;
        p1:.z.x 1;
        p2:.z.x 2;

can be loaded during q startup,

        q.exe captureargs.q 42 forty 2.0

and in the new q session you will find,

        p0
"42"
        p1
"forty"
        p2
"2.0"
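Because every argument arrives as a string, a script usually converts each one to the type it needs. The sketch below assumes a hypothetical helper parseArgs and hypothetical argument names; in a real script you would pass it .z.x rather than a literal list.

```q
/ parseArgs is a hypothetical helper; pass it .z.x in a real script
/ "J"$ parses a long, `$ makes a symbol, "F"$ parses a float
parseArgs:{[args] `n`name`scale!("J"$args 0; `$args 1; "F"$args 2)}

parseArgs ("42";"forty";"2.0")
```

The result is a dictionary whose values carry proper q types, so later code can use them without further casting.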

Example

Here is the commented script text for the sample program from Overview.

        / read px.csv file into table t
        t:("DSF"; enlist ",") 0: `:c:/q/data/px.csv;
 
        / select max Price from t grouped by Date and Sym
        tmpx:select mpx:max Price by Date,Sym from t;
 
        / open connection to q process on port 5042 on aerowing
        h:hopen `:aerowing:5042;
 
        / issue above query against table tpx on remote machine
        rtmpx:h "select mpx:max Price by Date, Sym from tpx";
 
        / close connection
        hclose h;
 
        / append merger of local and remote results to file tpx.dat
        .[`:c:/q/data/tpx.dat; (); ,; rtmpx,tmpx]

 

 

 


11. I/O

Overview

I/O in q is achieved using handles, which are symbols whose values are file names. The handle acts as a mapping to an I/O stream, in the sense that retrieving a value from the handle results in a read and passing a value to the handle is a write.

Data Files

All q entities are automatically serializable to disk. The persistent form is a self-describing version of the in-memory form. A data file comprises a q entity written to disk.

File Handle

A file handle is a symbol that starts with a colon ( : ) and has the form,

        `:[path]fname

where the bracketed expression represents an optional path and fname is a file name. Both path and fname must be valid names as recognized by the underlying operating system.

Important: The one caveat is that separators in q paths are always represented by the forward slash ( / ), even for Windows.

Using hcount and hdel

Use hcount with a file handle to determine the size of the file in bytes. The result is a long.

        hcount`:c:/q/Life.txt

21210j

Use hdel with a file handle to delete a file from the file system of the underlying operating system. A return value of the file handle indicates that the deletion was successful. You will get an error message if the file does not exist or if the delete cannot be performed.

        hdel`:c:/q/Life.txt

`:c:/q/Life.txt

Using set and get

A data file is created and a q entity written to it in a single step using binary set. The left operand is a file handle, the right operand is the entity to be written and the result is the handle of the written file. The file is closed once the write is complete.

       `:/q/qdata.dat set 101 102 103

`:/q/qdata.dat

Note: The behavior of set is to create the file if it does not exist and overwrite it if it does.

A data file can be read using unary get, whose argument is a file handle and whose result is the q entity contained in the data file.

        get`:/q/qdata.dat

101 102 103

An alternate way to read a data file is with value,

       value`:/q/qdata.dat

101 102 103

Using hopen and hclose

A data file handle is opened with hopen. The result of hopen is an int file handle that acts like a function for writing to the file once assigned to a variable.

        h:hopen`:/q/qdata.dat

 

        h[42]                        / handle used as function

 

        h 1 2 3 4                    / juxtaposition notation

If the file already exists, opening it with hopen appends to it rather than overwriting it.

To close the handle, issue hclose on the result of hopen. This flushes any data that might be buffered.

        hclose h

After the operations above, we find,

        get`:/q/qdata.dat

101 102 103 42 1 2 3 4
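The whole sequence can be run end to end as a short script. This is a sketch that uses a hypothetical file name in the current directory instead of the /q directory, and deletes the file afterwards.

```q
f:`:qdata_demo.dat          / hypothetical file in the current directory
f set 101 102 103           / create or overwrite the data file
h:hopen f                   / open the existing file for append
h 42                        / append an atom
h 1 2 3 4                   / append a list
hclose h                    / flush and close
r:get f                     / 101 102 103 42 1 2 3 4
hdel f                      / clean up
```

Keeping the file handle in a variable such as f avoids retyping the path and makes a path mismatch between the write and the read less likely.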

Using Dot Amend

Fundamentalists can use dot amend to write to data files. To overwrite the file if it exists, use assign ( : ).

       .[`:/q/qdata.dat;();:;1001 1002 1003]

`:/q/qdata.dat

 

        get`:/q/qdata.dat

1001 1002 1003

To append to the file if it exists, use join ( , ).

       .[`:/q/qdata.dat;();,;42 43]

`:/q/qdata.dat

 

         get`:/q/qdata.dat

1001 1002 1003 42 43

Writing Splayed Tables

Writing a table to a data file using the above methods puts it into a single file. For example,

        t:([] c1:101 102 103; c2:1.1 2.2 3.3)

       `:/q/data/t.dat set t

`:/q/data/t.dat

creates a single file in the data subdirectory of the q directory. List the directory on your disk now to verify this.

You can write each column of the table to its own file in the directory specified in the handle; this is especially useful for large tables. A table written in this form is called a splayed table.

To splay a table, specify the path as a directory - that is, with a trailing slash (/) and no file name.

       `:/q/data/t/ set t

`:/q/data/t/

If you list the directory in the OS, you will see a new subdirectory named 't'. It contains three files: one file for each of the two columns in the original table, plus a '.d' file containing q meta data. The latter describes how to put the columns back together.

Important: For a table to be splayed, each column must be of uniform width. Consequently a splayed table cannot contain any symbol or non-simple columns. A table with symbol column(s) can effectively be splayed by enumerating the symbols.

Thus, the following fails,

       ts:([]c1:`a`b`c`a;c2:10 20 30 40)

       `:/q/data/ts/ set ts

'type

Enumerate the symbol column and the write succeeds.

       syms:distinct ts.c1

        update c1:`syms$c1 from `ts

`ts

 

        ts

c1 c2

-----

a  10

b  20

c  30

a  40

 

       `:/q/data/ts/ set ts

`:/q/data/ts/
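The manual enumeration above keeps the syms list only in memory. In current kdb+ releases the .Q.en utility is commonly used for the same purpose, and it also persists the enumeration domain to a sym file under the database root. The following is a sketch under that assumption; the directory name qdemo_db is hypothetical.

```q
ts:([] c1:`a`b`c`a; c2:10 20 30 40)
db:`:qdemo_db                        / hypothetical database root
/ .Q.en enumerates the symbol columns against a sym file kept under db
`:qdemo_db/ts/ set .Q.en[db; ts]
cols get `:qdemo_db/ts/              / column names read back from the splayed table
```

With this approach a later q session can load the sym file from the same root and resolve the enumerated column, which the in-memory syms list alone does not allow.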

Save and Load on Tables

The save and load functions simplify the process of writing and reading tables to/from disk files.

In its simplest form, save writes a table to a file with the same name as the table. The form,

        save`:path/tname

in which path is an optional path name and tname is the name of a table in the workspace, is equivalent to,

        `:path/tname set tname

Thus,

        save`:/q/trade

writes the trade table to a file named trade in the q directory.

Similarly,

        save`:path/tname/

splays the table within the directory tname.

As you would expect, load is the inverse of save, in that it reads a table from a file into a variable with the same name as the file. In other words,

        load`:path/tname

is equivalent to,

        tname:get`:path/tname

Thus, the expression,

        load `:/q/trade

creates a table variable trade and populates it from the file data.

As before, appending a / indicates that the table has been splayed. So,

        load`:path/tname/

populates a table tname from the directory tname.

You can also use save to write a table as delimited text simply by appending an appropriate file extension. The expression,

        save`:path/tname.txt

writes the table as text records. The expression,

        save`:path/tname.csv

writes the table as csv records. The expression,

        save`:path/tname.xml

writes the table as xml records.

Note: Tables written as .txt or .csv can be read as text files.

As an example, we take the simple table,

        tsimp:([]c1:`a`b`c; c2:10 20 30)

We save it,

        save`:/q/tsimp

`:/q/tsimp

Then reload it

        tsimp:()

        load `:/q/tsimp

`tsimp

        tsimp

c1 c2

-----

a  10

b  20

c  30

Next we save it in delimited text formats,

        save`:/q/tsimp.txt

`:/q/tsimp.txt

        save`:/q/tsimp.csv

`:/q/tsimp.csv

        save`:/q/tsimp.xml

`:/q/tsimp.xml

Now we inspect the files with a text editor. In tsimp.txt, we find,

c1      c2

a       10

b       20

c       30

In tsimp.csv we have,

c1,c2

a,10

b,20

c,30

In tsimp.xml, we have,

<R>

<r><c1>a</c1><c2>10</c2></r>

<r><c1>b</c1><c2>20</c2></r>

<r><c1>c</c1><c2>30</c2></r>

</R>

Text Files

Importing and exporting data often involves reading and writing text files. The mechanism for doing this in q differs from processing q data files.

Writing (0:) and Reading (read0)

The q primitive verb denoted 0: takes a file handle as its left argument and a list of q strings as its right argument. It writes each string as a line of text in the specified file.

        `:/q/Life.txt 0: ("So";"Long")

`:/q/Life.txt

Opening the file Life.txt in a text editor will show a file with two lines.

Read a text file with read0. The result is a list of strings, one for each line in the file.

        read0`:/q/Life.txt

"So"

"Long"

Using hopen and hclose

A text file handle can be opened with hopen. The result of hopen is a positive int whose negative is a file handle that can be used to write text to the file.

        h:hopen`:/q/Life.txt

        (neg h)["and"]

-152

        (neg h)("Thanks";"for";"all";"the";"Fish")

-152

If the file already exists, opening it with hopen will append to it rather than overwriting it.

To close the handle, issue hclose on the int result of hopen. This flushes any data that might be buffered.

        hclose h

        read0`:/q/Life.txt

"So"

"Long"

"and"

"Thanks"

"for"

"all"

"the"

"Fish"
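The text-file operations in this section can be collected into one short script. This is a sketch with a hypothetical file name in the current directory; the name txt for the result is also made up.

```q
f:`:life_demo.txt                 / hypothetical file name
f 0: ("So";"Long")                / write two lines, replacing any existing file
h:hopen f                         / open for append
(neg h) "and"                     / append one line
(neg h) ("Thanks";"for")          / append several lines
hclose h
txt:read0 f                       / list of strings, one per line
hdel f                            / clean up
```

Note the division of labor: 0: with a file handle rewrites the whole file, while the negated int handle from hopen appends to it.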

Binary Files

It is also useful to read and write data from/to binary files. The mechanism for doing this is similar to that for processing text files. In q, a binary record is simply a list of byte values.

Writing (1:) and Reading (read1)

The q primitive verb denoted 1: takes a file handle as its left argument and a simple byte list as its right argument. It writes each byte in the list as a byte in the specified file.

       `:/q/answer.bin 1: 0x2a0607

`:/q/answer.bin

Opening the file answer.bin in an editor that displays binary data will show a file with three bytes.

Read a binary file with read1. The result is a list of bytes.

        read1`:/q/answer.bin

0x2a0607

Using hopen and hclose

A binary file handle can be opened with hopen. The result of hopen is a positive file handle int that can be used to write a list of bytes to the file. Close the file by issuing hclose on the handle.

        h:hopen`:/q/answer.bin

        h[0x01]

152

 

        h 0x020304

152

 

        hclose h

        read1`:/q/answer.bin

0x2a060701020304

Reading Text Files as Binary

A text file can also be read as binary data by using read1. With Life.txt as above,

        read0`:/q/Life.txt

"So"

"Long"

"and"

"Thanks"

"for"

"all"

"the"

"Fish"

        read1`:c:/q/Life.txt

0x536f0d0a4c6f6e670d0a616e640d0a5468616e6b730d0a666f720d0...

To convert this binary data to char, cast the binary. On a Windows machine, this looks as follows,

       "c"$read1 `:c:/q/Life.txt

"So\r\nLong\r\nand\r\nThanks\r\nfor\r\nall\r\nthe\r\nFish\r\n"

Parsing File Records

Binary forms of 0: and 1: parse individual fields of a text or binary record according to data type. Field parsing is based on the following field types.

0        1  Type           Width(1)  Format(0)
------------------------------------------------------------------------
B        b  boolean        1         [1tTyY]
X        x  byte           1
H        h  short          2         [0-9a-fA-F][0-9a-fA-F]
I        i  int            4
J        j  long           8
E        e  real           4
F        f  float          8
C        c  char           1
S        s  symbol         n
M        m  month          4         [yy]yy[?]mm
D        d  date           4         [yy]yy[?]mm[?]dd or [m]m/[d]d/[yy]yy
Z        z  datetime       8         date?time
U        u  minute         4         hh[:]mm
V        v  second         4         hh[:]mm[:]ss
T        t  time           4         hh[:]mm[:]ss[[.]ddd]
(blank)     skip
*           literal chars

The column labeled '0' contains the (upper case) field type char for text data. The (lower case) char in column '1' is for binary data. The column labeled 'Width(1)' contains the number of bytes that will be parsed for a binary read. The column labeled 'Format(0)' displays the format(s) that are accepted in a text read.

Note: The parsed records are presented in column form rather than in row form because q considers a table to be a collection of columns.

Fixed Length Records

The binary form of 0: and 1: for reading fixed length files is,

(Lt;Lw) 0: f

(Lt;Lw) 1: f

The left operand is a (general) list containing two sublists: Lt is a simple list of char containing one letter per field; Lw is a simple list of int containing one int width per field. The sum of the field widths in Lw must equal the width of the record. The result of the function in all cases is a (general) list of lists with an item for each field.

The simplest form of the right operand f is a symbol representing a file handle. For example,

        ("IFC D";4 8 10 6 4) 0: `:/q/Fixed.txt

reads a text file containing fixed length records of width 32. The first field is an int of width 4; the second field is a float of width 8; the third field consists of 10 char; the fourth slot of 6 positions is skipped; the fifth field is a date of width 4.

You might think that the widths are superfluous, but they are not. The actual width can be narrower than the default for small values. Alternatively, you may wish to specify a width larger than that required by the corresponding data type to indicate blanks between fields. If the file in the previous example were rewritten with one additional blank character between fields, the proper left operand to read it would be,

        ("IFC D"; 5 9 11 6 4)

For example, we take a file c:/q/data/Px.txt having the form,

       1001DBT12345678  98.61002EQT98765432 24.751004CCR00000001121.23

The read is,

       ("ISF";4 11 6) 0: `:/q/data/Px.txt

1001        1002        1004
DBT12345678 EQT98765432 CCR00000001
98.6        24.75       121.23

The second form of the right operand f is,

(hfile;i;n)

where hfile is a symbol containing a file name, i is the offset into the file to begin reading and n is the number of bytes to read. This is useful for large files that cannot be read into memory in one operation.

Note: A read operation must begin and end on a record boundary.

In our trivial example, the following reads the second and third records,

       ("ISF";4 11 6) 0: (`:/q/data/Px.txt; 21; 42)

1002        1004
EQT98765432 CCR00000001
24.75       121.23
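To experiment without creating a file, the same parse can be applied to a list of strings, which 0: also accepts as its right operand, one string per record. This is a sketch; the names recs and r and the column names seq, sym and px are made up for the illustration.

```q
/ three fixed-width records of width 21 (4 + 11 + 6)
recs:("1001DBT12345678  98.6";"1002EQT98765432 24.75";"1004CCR00000001121.23")
r:("ISF";4 11 6) 0: recs          / list of three column lists
flip `seq`sym`px!r                / attach column names to make a table
```

Because the result is a list of columns, building a table only requires pairing the columns with names in a dictionary and flipping it.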

Variable Length Records

The binary form of 0: and 1: for reading variable length delimited files is,

(Lt;D) 0: f

(Lt;D) 1: f

The left operand is a (general) list comprising two items: Lt is a simple list of char containing one type letter per field; D is either a char representing the delimiting character or an enlisted such.

If D is a delimiter char, the result is a general list of lists. Each list in the result is made up of items of type specified by Lt. The simplest form of the right operand f is a symbol representing a file handle.

For example, say we have a csv file /q/data/Px.csv having records,

 1001,"DBT12345678",98.6

 1002,"EQT98765432",24.75

 1004,"CCR00000001",121.23

Reading with a simple delimiter char results in a list of column lists,

       ("ISF";",") 0: `:c:/q/data/Px.csv

1001        1002        1004
DBT12345678 EQT98765432 CCR00000001
98.6        24.75       121.23

If D is the enlist of a delimiter char, the first record is taken to be a list of column names. Subsequent records are read as data specified by the types in Lt. The result is a table in which each record is formed from a file record.

Say we have a csv file /q/data/pxtitles.csv having records,

 "Seq","Sym","Px"

 1001,"DBT12345678",98.6

 1002,"EQT98765432",24.75

 1004,"CCR00000001",121.23

Reading with an enlisted delimiter results in a table,

       ("ISF";enlist ",") 0: `:/q/data/pxtitles.csv

Seq  Sym         Px

-----------------------

1001 DBT12345678 98.6

1002 EQT98765432 24.75

1004 CCR00000001 121.23
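The difference between the two delimiter forms can be tried on a list of strings instead of a file. This is a sketch; the names csv, t and l are made up, and the quotes around the text fields are omitted for brevity.

```q
csv:("Seq,Sym,Px";"1001,DBT12345678,98.6";"1002,EQT98765432,24.75";"1004,CCR00000001,121.23")

t:("ISF";enlist ",") 0: csv       / enlisted delimiter: first record supplies column names
l:("ISF";",") 0: 1_csv            / atomic delimiter on the data records only: list of columns
cols t
```

Dropping the header with 1_ before using the atomic delimiter avoids a header row being parsed as data.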

You can also read this file with an atomic delimiter. The result is a list of lists with nulls in the positions where the header records do not match the specified types.

       ("ISF";",") 0: `:c:/q/data/pxtitles.csv

    1001        1002        1004
Sym DBT12345678 EQT98765432 CCR00000001
    98.6        24.75       121.23

Saving and Loading Contexts

It is possible to save or restore all the entities in a q context in one operation. This is useful to restore the state of a system to its initial condition or from a checkpoint.

Saving a Context

Recall that a context is actually a dictionary. You can write an entire context, with all its entities, to a single data file by writing the dictionary.

For example, to write out the default context,

        `:currentws set value `.

`:currentws

Loading a Context

To retrieve a saved context, use get with the file handle,

        dc:get`:currentws

Use set with a symbol containing the context name to replace the context,

        `. set dc

Important: Overlaying the root context replaces all its entities. This is convenient for re-initialization, but be sure of your intent.

Interprocess Communication

A q process can communicate with another q process residing anywhere on the network, provided that process is accessible. The process that initiates the communication is the client, while the process receiving and processing the request is the server. The server process can be on the same machine, the same network, a different network or on the internet. The communication can be synchronous (wait for a result to be returned) or asynchronous (don't wait and no result returned).

The easiest way to examine interprocess communication (IPC) is to start another q process on the same machine running your current q session. Make sure it is listening on a different port (the default port is 5000). In what follows we shall assume that a server q process has been started on the same machine with the command,

        q -p 5042

This means it is listening on port 5042.

Communication Handle

A communication handle is similar to a file handle. It is a symbol that starts with a colon (:) and has the form,

       `:[server]:port

where the bracketed expression represents an optional server machine identifier and port is a port number.

If the server process is running on the same machine as the client process, you can omit the server identifier. In our case, the communication handle is,

       `::5042

If the server is on the same network as your machine, you can use its machine name. In our case,

       `:aerowing:5042

You can use the IP address of the server,

       `:198.162.0.2:5042

If the server is running on the internet, you can use a url,

       `:www.yourco.com:5042

Connection Handle

Use a communication handle as the argument of hopen to open a connection to the server process. Store the int result of hopen, called the connection handle, in a variable. You issue commands to the server by treating this variable as if it were a function.

For example, if the server process is running on the same machine and is listening on port 5042, the following q code opens a connection to the server process. It assigns the value 42 to the variable a on the server and then retrieves the value of a from the server. Finally, the connection is closed.

        h:hopen`::5042

        h"a:42"

        h"a"

42

        hclose h

Note: Whitespace between h and the quoted string is optional, as it is in function juxtaposition. We include it for readability.

Message Format

The general message format for interprocess communication is a list,

(f; arg1; arg2; ...)

Here f is a symbol or string representing an expression to be evaluated on the server. It can be an expression containing q operators or it can be a function, dictionary or list. The remaining items arg1, arg2, ... are optional parameters for the map. The parameters are arguments when f is a function, indices when f is a list, or domain items when f is a dictionary. Message execution returns the result of the server's evaluation.

This form of remote call is very powerful, in that it can send a mapping to a remote q instance for evaluation. In particular, the lambda of a function is transported. In a simple example, say we already have an open handle h to a server. If f is defined on the client as,

        f:{x*x}

then executing the following expression on the client,

        h (f;2)

results in f being sent to the server with the argument 2 and then evaluated there. The result is,

        h (f;2)

4
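The list format can be explored without a second process. This chapter states that the default server behavior is to apply value to the incoming message, so as a sketch, value itself can stand in for the server. The names f, r1, r2 and r3 are made up for the illustration.

```q
f:{x*x}

r1:value "6*7"        / string form with no arguments
r2:value (f;2)        / function followed by its argument
r3:value (`f;3)       / symbol form: f is looked up by name
```

Over a real connection the same three messages would be written h "6*7", h (f;2) and h (`f;3); in the symbol form the name is resolved on the server, so f must be defined there.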

Important: Exercise caution when sending entities to a remote server. A trivial mistake could place the server into a non-responding state. It is safer to define a function on the server and screen its input internally.

A special case of the general message format, which we used previously, is a string in which f is a q expression to be executed on the server and there are no args. For example,

       "a:6*7"

       "select avg price from t where date>2006.01.01"

This format can be used to execute a function that has been defined on the server. For example, suppose g is defined on the server as,

        g:{x*x*x}

Executing the following on the client sends the string "g 2" to the server where it is evaluated. The result is,

        h "g 2"

8

Compare this with the example above where f is defined on the client.

Note: If the expression in the execution string contains special characters, they must be escaped. For example, to define a string on the server, you must escape the double quotes in the message string.

       "str:\"abc\""

When the remote function performs an operation on a table, it can be viewed as a remote stored procedure. For example, suppose t and f are defined on the server as,

       t:([]c1:`a`b`c;c2:1 2 3)

        f:{[x] select c2 from t where c1=x}

The following expression on the client executes f on the server, selecting rows that match the value `b in c1,

        h "f`b"

c2

--

2

The equivalent of dynamic SQL can be achieved by passing a function definition.

        h ({[x]select c2 from t where c1=x};`b)

c2
--
2

Synchronous Messages

The messages sent in the previous sections were synchronous, meaning that the sending client process waits for a result from the server before proceeding. The result of the operation on the server becomes the return value of the remote call that uses the connection handle.

To send a synchronous message, use the original positive int value of the connection handle as if it were a function. A typical example of sending a synchronous message is executing a select expression on the server. In this case, you surely want to wait for the result to return.

For example, suppose a table has been defined on the server as,

       t:([]c1:`a`b`c;c2:1 2 3)

The following message executes a query against t, assuming h is an open connection handle to the server.

        h"select from t where c1=`b"

c1 c2

-----

b  2

Note: The previous example demonstrates how to perform the equivalent of dynamic SQL against the server process.

As another example, send an insert synchronously if you want confirmation of the operation.

       h "`t insert (`x;42)"

,3

       h"t"

c1 c2

-----

a  1

b  2

c  3

x  42

Asynchronous Messages

It is also possible to send messages asynchronously, meaning that the client does not wait and there is no result containing a return value. You would typically send an asynchronous message to kick off a long-running operation on the server. You might also send an asynchronous message if the operation does not have a meaningful result, or if you simply don't care to wait for the result.

To send an asynchronous message, use the negative of the int connection handle returned by hopen. For example, the insert that was sent synchronously in the previous example can also be sent asynchronously,

        (neg h)"`t insert (`y;43)"

        h"t"

c1 c2

-----

a  1

b  2

c  3

x  42

y  43

Observe that there is no return value from the first message.

Advanced: In the previous example, because the first message is asynchronous, it is possible that the second message will be sent from the client before the insert has completed on the server. However, the second message will not execute on the server until the first has completed.

Message Handlers

When a q process receives a message via interprocess communication, the default behavior is to evaluate the message, effectively executing the message content. If the message is synchronous, the result is returned to the client.

During message processing on the server, the server connection handle is automatically placed in .z.w. This can be used to manage connections on the server. See below for a simple example.

Note: The connection handle on the client side and the connection handle on the server side are assigned independently by their respective q processes. In general, they are not equal.

The default message processing can be overridden using message filters. Message filters are event-handling functions in the .z context. The .z.pg message filter processes synchronous requests and .z.ps processes asynchronous requests.

Advanced: The names end in 'g' and 's' because synchronous processing has "get" semantics and asynchronous processing has "set" semantics.

The following two assignments on the server recreate the default message processing behavior.

       .z.ps:{value x}

       .z.pg:{value x}

Message filtering can be used for a variety of purposes. For example, suppose the connection allows a user on the client side to execute dynamic q-sql against the server. You could improve on the default processing by enclosing the evaluation in protected execution.

       .z.pg:{@[value; x; errHandler x]}

Here errHandler is a function that recovers from an unexpected error.
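The text does not define errHandler, so the following is only one possible sketch. It assumes a two-argument function that receives the original request and the error text; the projection errHandler x in the filter fixes the request, and protected execution supplies the error text when value fails. The dictionary keys are hypothetical.

```q
/ errHandler is hypothetical: it receives the request and the error text
errHandler:{[req;err] `ok`request`error!(0b;req;err)}
.z.pg:{@[value; x; errHandler x]}

.z.pg "6*7"       / normal result of the request
.z.pg "1+`a"      / dictionary describing the failure
```

Since .z.pg is an ordinary function, it can be called directly on the server like this to check its behavior before any client connects.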

A more interesting example is a server that keeps track of the clients connected to it. A simplistic way to do this is to maintain a dictionary of connection handles mapped to client names. The following function on the server registers a new client connection by upserting it to the global dictionary cp. Remember, .z.w has the connection handle.

       cp:()!()                                               / server

 

       regConn:{cp[.z.w]::x}                          / server

The client could pass its machine name,

        h:hopen`::5042                                / client

        h                                                       / client

224

        h"regConn `",string .z.h                    / client

After this call, cp will contain an entry that reflects the specific handle assigned to the connection on the server. For example,

        cp                                                       / server

4| macpro.local

As additional connections are made to the server, cp will contain one entry for each connection.

Handling Close

An open connection can be closed by either the client or the server. The close can be deliberate, meaning it occurs under user or program control, or it can be unanticipated due to a process terminating unexpectedly.

The close handler .z.pc can be used to perform processing whenever a connection is closed from the other end. While it will be invoked on any close, it does not know how the close was initiated.

In our example above, we use a close handler to remove the information about a connection once it is closed. Specifically, we create a handler to remove the appropriate entry from cp.

       .z.pc:{cp::cp _ x}                                / server

When the client issues an hclose on its connection handle,

        hclose h                                              / client

the dictionary cp no longer shows the connection,

        cp                                                       / server

_

Now that we have established basic close handling on the server, we turn our attention to the client. We want the client to reconnect automatically in the event the server disconnects for any reason. The easiest way to do this is with the timer.

We create a close handler that resets the global connection handle to 0 and issues a command that sets the timer to fire every 2 seconds (2000 milliseconds).

       .z.pc:{h::0; value"\\t 2000"}

The timer handler attempts to re-open the connection. Upon success, it issues a command that turns the timer off.

       .z.ts:{h::hopen`::5042; if[h>0;value"\\t 0"]}

Note: In practice, you should restrict the number of connection retries rather than try forever.
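One way to bound the retries is sketched below. The names maxRetries, retries, tryOpen and stopTimer are hypothetical, and the limit of 5 is arbitrary. The sketch also wraps hopen in protected execution, on the assumption that a failed open signals an error rather than returning 0.

```q
maxRetries:5                                  / hypothetical limit
retries:0
tryOpen:{@[hopen; x; 0i]}                     / 0i when the connection fails
stopTimer:{[h;n] (h>0) or n>=maxRetries}      / stop on success or when retries run out

/ timer handler: one connection attempt per tick
.z.ts:{
  h::tryOpen `::5042;
  retries+::1;
  if[stopTimer[h;retries]; system "t 0"]}

/ close handler: reset state and start retrying every 2 seconds
.z.pc:{h::0; retries::0; system "t 2000"}
```

After the timer stops, a value of 0 in h tells the rest of the client that the reconnect attempts were exhausted.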

Http Connection Handler

There is also a message handler for http connections, named .z.ph. Since http communication is always synchronous, there is only one handler. In contrast to other system handlers, there is a default handler for http, which is used for the q web viewer.

The default handler allows a q process to be accessed programmatically over the web, similar to a servlet. The ambitious reader could replace this with a handler that processes SOAP, thus enabling q to be a web service. (Such a handler would be the object of derision from those who decry SOAP as unnecessary and wasteful.)

 

 

 


12. Workspace Organization

Overview

The collection of entities that exist in a q session comprises the workspace. In other words, the workspace includes all atoms, lists, dictionaries, functions, enumerations, etc., that have been created through the console or via script execution.

Any programming environment of reasonable complexity has the potential for name clashes. For example, should two separate q scripts both create a variable called 'foobar', one will overwrite the value of the other. Variable typing is of no help here, since a variable can be reassigned with a different type at any time.

The solution to name clashes is to create namespaces. This is accomplished with a hierarchical naming structure implemented with a separator character, usually a dot or a slash. For example, the namespaces A and B can both have an entity foobar, yet A.foobar and B.foobar are distinct. A familiar example of this is the hierarchical directory/file system used by operating systems.

Namespaces in q are called directories or contexts. Contexts provide an organization of the workspace.

Contexts

The q workspace provides a simple namespace structure using dot notation for entity names. Each of the nodes is called a context, or a directory. The default context, also called the root, comprises all entities whose names start with an initial alpha character. The variables we have created heretofore have resided in the default context.

Context Notation

A context name has the form of a dot (.) followed by alphanumerics, starting with an alpha. The following are all valid context names.

        .a
        .q
        .z0
        .zaphod

There is no need to pre-declare the context name. As in the case of variables, a context is created dynamically as required. You place a variable in a context by prepending the context name to the variable name, separated by a dot (.). The variable foobar can be created in various contexts,

        foobar:42
        .aa.foobar:43
        .z0.foobar:45
        .zaphod.foobar:46

Variables of the same name in different contexts are indeed distinct,

        foobar
42
        .aa.foobar
43
        .z0.foobar
45
        .zaphod.foobar
46

When an entity name includes its full context name, we say the name is fully qualified. When an entity name omits the context name, we say the name is unqualified.

Reserved Contexts

All contexts of a single letter (both lower and upper case) are reserved for q itself. Some of these are listed below:

Name    Use
.q      Built-in functions
.Q      Low-level routines used by q
.z      Environmental interaction

Important: While q will not prevent you from placing entities in the reserved contexts, doing so risks serious problems should you collide with names used by q.

Working with Contexts

At any time in a q session, there is a current or working context. When you start a q session, the current context is the default context. You change the current context with the \d command. For example, to switch to the 'files' context,

        \d .files

To switch back to the default context,

        \d .

To display the current context,

        \d
`.

Any entity in the current context can be specified using its unqualified name.

        \d .                / switch to default context
 
        .files.home:`c:
        .files.home
`c:
        \d .files
        home
`c:

A Context is a Dictionary

A context is actually a sorted dictionary whose domain is a list of symbols with the names of the entities defined in the context. Apply the key function to the dictionary name to display the names of the entities in the context. Apply value to see the entire dictionary mapping.

        .new.a:42
        .new.L:1 2 3
        .new.d:`a`b`c!1 2 3
        key `.new
``a`L`d
 
        value `.new
 | ::
a| 42
L| 1 2 3
d| `a`b`c!1 2 3

Observe that q places an entry into any non-default context that maps the null symbol to the null item.

You can look up an entity name in the directory to get its associated value. Use a symbol containing the context name to refer to the dictionary.

        `.new[`L]
1 2 3

Note: In order to access an entity in the default context from another context, you must retrieve the value from the context dictionary. There is no syntactic form.

        \d .
        ztop:42
        \d .new
        `.[`ztop]
42

Expunging from a Context

We have seen that a context is a dictionary that maps entity names for the context to their values. This means that in order to expunge an entity from a context, we can simply delete it from the dictionary.

For example, we can define a variable a in the context .new and then remove it from the workspace when it is no longer needed. Observe that we use the symbolic name of the context to ensure that the delete is applied to it by reference.

        .new.a:42
        .new
 | ::
a| 42
        /
        / do some work ...
        /
        delete a from `.new
`.new
        .new
| ::

In particular, to expunge a global entity from the default context, use `. as the directory name. In a fresh workspace we find,

        a:42
        b:98.6
        c:`life
        \v
`s#`a`b`c
        delete a from `.
`.
        \v
`s#`b`c

Functions and Contexts

Function definition presents an issue with respect to global variable references and unqualified names. In the following function, the variable a is an unqualified global variable,

        f:{a+x}

There is a potential ambiguity with respect to the context of a. Is the context resolved at the time f is defined, or is it resolved at the time f is evaluated?

Important: The context of an unqualified global variable in a function is the context in which the function is defined, not the context in which it is evaluated.

Thus, we find

        \d .
        a:42
       \d .lib
       f:{a+x}
       f[6]
{a+x}
'a
)\
 
        a:100
        f[6]
106
 
        \d .
        .lib.f[6]
106

We also find the following result, because even though g lives in the .lib context, it is defined in the default context.

        \d .
        .lib.g:{a*x}
        a:42
        g[2]
'g
 
        \d .lib
        g[3]
126
        a:6
        g[7]
294
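One way to sidestep the question of where an unqualified name resolves is to write fully qualified names inside the function body. The following is a small sketch along those lines; the name .lib.h and the values are illustrative only.

```q
\d .
a:42                  / a in the default context
.lib.a:100            / a different a in the .lib context
.lib.h:{.lib.a+x}     / fully qualified reference, independent of where h is defined
.lib.h[6]             / 106
```

The trade-off is verbosity: the function no longer follows the context it was defined in, which may or may not be what you want.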

Namespaces (Advanced)

It is possible to simulate a multi-level namespace hierarchy by using multiple dots in names.

        .lib1.vars.op1:6
        .lib1.vars.op2:7
        .math.fns.f:{x*y}
        .math.fns.f[.lib1.vars.op1;.lib1.vars.op2]
42

In the example above, q creates dictionaries at each node of the tree.

       value `.lib1.vars
   | ::
op1| 6
op2| 7
        value `.math.fns
 | ::
f| {x*y}

But appearances are deceiving. As of this writing (Jan 2007), q does not recognize a context tree below the first level. So, in our example, you cannot switch to a context .lib1.vars using the \d command.

        \d .math.fns
'.math.fns

You must access the contents of a node dictionary below the top level functionally.

        `.math.fns[`f] [6;7]
42

The following is arguably more readable.

        mlib:`.math.fns
        mlib[`f][6;7]
42
 
        vlib:`.lib1.vars
        vlib[`op1`op2]
6 7
        mlib[`f] . vlib[`op1`op2]
42

This is one way to perform late-bound computation using members in the context tree.

 

 

 


13. Commands and System Variables

Command Format

Commands control aspects of the q environment. A command begins with a back-slash (\) and is followed by one or more characters. Some commands have an optional parameter that is separated from the command by whitespace.

Important: Case is significant in the command characters.

To execute a command programmatically, place it in a string and use the value function.

        value "\\p 5042"

Note: A backslash in the string must be escaped.
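The same effect can be had with the system keyword, which takes the command text without the leading backslash and so needs no escaping. A minimal sketch, using the precision command \P (described later in this chapter) only because it does not touch the network:

```q
value "\\P 10"        / via value: the backslash must be escaped
system "P 12"         / via system: no backslash at all
system "P"            / with no argument, returns the current setting (12 here)
```

Which form to use is largely a matter of taste; system tends to read more cleanly inside functions.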

Tables (\a)

The command \a returns a list of symbols with the names of all tables in the current context. For example, in a fresh q session,

        t:([]c1:1 2 3; c2:`a`b`c)
        \a
,`t

Console (\c)

The command \c (note lower case) controls the size of the q virtual console display. The first parameter specifies the number of rows and the second the number of columns. The default setting is 23 by 79.

        til 100
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 ..
 
        \c 23 200
        til 1000
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56
 57 58 59 60 61 62 63 64 65 66 67 68 ..

Web Console (\C)

The command \C (note upper case) controls the size of the q web console display. The first parameter specifies the number of rows and the second the number of columns. The default setting is 36 by 2000.

Change O/S Directory (\cd path)

The \cd command affects the current working directory of the underlying operating system. To display the current directory, issue \cd with no argument.

        \cd
"/Users/jeffry/bin"

The result of \cd is the text string as received from the O/S with escapes where applicable. For Windows, the back-slash characters are escaped and are not converted to forward-slashes.

To change the current working directory, issue \cd with the path of the desired directory.

        \cd /q

If the specified directory does not exist, it will be created.

Note: Since the argument of \cd is not a string, special characters do not need to be escaped.

Directory (\d)

The \d command controls the current context (directory).

To determine the current context, issue \d with no parameter.

        \d
`.

To set the current context, issue \d followed by the target context.

        \d .tutorial
        \d
`.tutorial

Note: If the specified context does not exist, using it in \d will cause its creation.

Issue \d . to set the current working context to the default context.

        \d .
        \d
`.

Functions (\f)

The \f command returns a sorted list containing the functions in a context (directory). When used with no parameters, it returns the functions in the current context.

        \f
`s#`diff`f`g

Use \f with the name of a context to list its functions.

        \f .debug
`s#``addBPs`break`clearBPs`deleteBPs`stop

Load (\l)

A script can be loaded at startup of q or during a session. To load the script from the session, issue the \l command with the (optionally qualified) name of the script file.

For example, to load the distribution script sp.q from the current directory,

        \l sp.q
+`p`city!(`p$`p1`p2`p3`p4`p5`p6`p1`p2;`london`london`london`london`london`lon..
(+(,`color)!,`blue`green`red)!+(,`qty)!,900 1000 1200
+`s`p`qty!(`s$`s1`s1`s1`s2`s3`s4;`p$`p1`p4`p6`p2`p2`p4;300 200 100 400 200 300)

Offset (\o)

The \o command sets the offset in hours from GMT used to determine local time in .z.Z. For example,

        .z.z
2007.04.12T11:31:13.352
        .z.Z
2007.04.12T07:31:15.365
 
        \o -2
 
        .z.z
2007.04.12T11:31:35.954
        .z.Z
2007.04.12T09:31:37.587

Port (\p)

The \p command controls which port the kdb+ server listens on. For example,

        \p 5001

means that it will listen for connections on port 5001.

Note: When you issue the \p command, kdb+ attempts to open the port. For this to be successful, the security settings of the machine must allow it.

If the port has not been set, you will see,

        \p
0

This means that no connection to this instance of kdb+ is currently possible because it is not listening on any port. You can also issue \p 0 to stop listening on any port.

Precision (\P)

The precision command \P (note the upper case) sets the display precision for floating point numbers to the specified number of digits.

The default precision is 7, meaning that the display of float or real values is rounded to the seventh significant digit.

        \P
7
 
        f:1.23456789012345678
        f
1.234568

Set the precision with a non-negative int parameter.

        \P 12
        f
1.23456789012

Set the precision to the maximum available that respects multiplicative tolerance with 0. This is currently the same as using 16.

        \P 0
        f
1.234567890123457

Set the precision to the maximum available with 17.

        \P 17
        f
1.2345678901234569

Seed (\S)

The \S command (note upper case) sets the seed for pseudo-random number generation. The default value is -314159. The argument is an integer.

        \S
-314159
 
        \S 424242
        \S
424242

Timer (\t)

The \t command controls the timer. The optional parameter is the number of milliseconds between timer ticks, with 0 signifying that the timer is off. On each timer tick, the function .z.ts is invoked if it has been assigned.

To determine the current timer setting, issue \t with no parameter.

        \t
0

To set the timer, issue \t with the number of milliseconds. For example, to set the timer to tick once a second,

        \t 1000

Note: The actual timer tick frequency is determined by the timing granularity supported by the underlying operating system. This can be considerably less than a millisecond.

To turn the timer off,

        \t 0

Elapsed Time (\t expr)

When the \t command is invoked with an expression as its parameter, the expression is evaluated and its duration of execution is reported. This can be used to profile code execution when tuning an application.

In q there are often multiple ways to achieve a desired result, but one may execute significantly faster. This may not matter for small tables or sporadic updates, but for processing very large volumes of data in real time it can be essential. Inserting \t at key points in the program can identify the critical routines that are consuming the most time. By measuring the execution times of alternate expressions for the critical routines, you can determine which is most efficient in your environment.

The following measures the time required to add the first 100,000 integers 10,000 times on the author's laptop.

        \t do[10000; sum til 100000]
2553

We conclude that adding the first 100,000 integers once requires approximately .25 milliseconds.

If it is actually necessary to add the first 100,000 integers in an application, you could use the formula,

        sn = (n*n-1)%2

We time it for n = 100,000.

        \t do[10000; (100000*99999)%2]
10

As you can see, this is roughly 200 times faster than performing the actual addition. We can do even better by replacing the division with a multiplication,

        sn = .5*n*n-1

To see the effects clearly, we increase the counter to 100,000.

        \t do[100000; sum til 100000]
25216
        \t do[100000; (100000*99999)%2]
120
        \t do[100000; .5*100000*99999]
80
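Before substituting a formula for a computation, it is worth confirming that the two agree. The sketch below checks the closed forms against the direct sum. It assumes a q version in which integer literals and til produce 64-bit longs (3.0 or later); with 32-bit ints the intermediate product 100000*99999 would overflow.

```q
n:100000
s1:sum til n          / direct sum of 0 1 2 ... n-1
s2:(n*n-1)%2          / right-to-left evaluation makes this n*(n-1), then divided by 2
s3:.5*n*n-1           / the same value using multiplication instead of division
s1=s2                 / 1b
s2=s3                 / 1b
```

Note that the formulas are written in q's right-to-left order, so n*n-1 means n times (n-1), not n squared minus 1.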

Timeout (\T)

The \T command (note upper case) controls execution timeout. The int parameter is the number of seconds any call from a client will execute before it is timed out and terminated. The default value is 0, which means no timeout.

Variables (\v)

The \v command returns a sorted list containing the variables in a context (directory). When used with no parameters, it returns the variables in the current context.

        \v
`s#`L`h`kt`p`pi`r`sqrt2`t`tdetails`third

Use \v with the name of a context to list its variables.

        \v .debug
`s#`breakPoints`stopPoints

Workspace (\w)

The workspace command \w (note lower case) displays six integer values that indicate memory usage by the current workspace.

        \w
168144 67108864 67108864 0 0 8589934592j

The first value indicates the number of bytes currently allocated. The second indicates the total number of bytes available in the heap. The third indicates the maximum heap seen so far in the current session. The fourth indicates the maximum bytes available if set with the -w command line option, else 0. The fifth displays the bytes mapped. The sixth displays the physical memory.
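To make the six values easier to read, you can pair them with labels in a dictionary. This is only a sketch: the label names below are chosen for illustration, and 6# is used on the assumption that some versions may return additional values.

```q
/ label the first six values returned by \w, in the order described above
labels:`allocated`heap`maxheap`limit`mapped`physical
mem:labels!6#system "w"
mem`allocated         / bytes currently allocated
```

Looking up a single label, as in the last line, avoids having to remember the position of each value.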

Week Offset (\W)

The week offset command \W (note upper case) specifies the start of week offset. An offset of 0 corresponds to Saturday. The default is 2, which is Monday.

Expunge Handler (\x)

The expunge handler command \x deletes the assignment of a user-specified function to one of the .z.p* event handlers and restores the default behavior. For example, if you have assigned a routine to .z.pc in order to process remote connection close, reset with,

        \x .z.pc

Date Format (\z)

The date format command \z specifies the format for date parsing. A value of 0 corresponds to mm/dd/yyyy; a value of 1 corresponds to dd/mm/yyyy.

        \z
0
 
        "D"$"12/31/2007"
2007.12.31
 
        "D"$"31/12/2007"
0Nd
 
        \z 1
        "D"$"12/31/2007"
0Nd
 
        "D"$"31/12/2007"
2007.12.31

Operating System (\text)

If a backslash is followed by characters not recognized as a kdb+ command, the text is assumed to be an operating system command and is passed to the O/S for execution.

For example, you can issue,

        \dir                          / display Windows directory
(" Volume in drive C has no label.";" Volume Serial Number is E89F-3533";..
 
        \pwd                          / display Unix directory
"/Users/jeffry/bin"

Any return value from the O/S is displayed as a list of strings.

Interrupt (Ctrl-C)

You can terminate a long-running routine by pressing the Ctrl-C combination.

Terminate (\)

The terminate command, denoted by a single backslash (\), exits one level of the q interpreter. This is useful when debugging a failed function evaluation. In the following console shot, we do not suppress the q prompt.

 
        q)f:{x*y}
        q)f[2;`3]
{x*y}
'type
*
2
`3
        q))\
        q)_

Here the underscore denotes the blinking cursor.

Advanced: If you issue \ at the "q)" prompt, you drop into a k session.

        q)\
_

Again, the underscore denotes the blinking cursor. Because k is q's underlying implementation language, some q expressions will execute as expected in the k session but most will not. Explanation of k is beyond the scope of this manual.

To return to the q console from a k session and see the "q)" prompt again, enter a single \ at the prompt.

        \
q)

Exit Q (\\)

To exit the q process, enter a double backslash (\\),

        \\

Important: There is no confirmation prompt for \\. The q session is terminated with extreme prejudice.

System Variables

Variables in certain reserved contexts provide useful q environmental interaction.

IP Address (.z.a)

The variable .z.a is an int representing the IP address of the current running kdb instance. To see the usual four-integer IP address, decode the int using base 256. For example, on the author's laptop,

        .z.a
-1442929031
        `int$0x00 vs .z.a
169 254 166 121
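To display the address in the familiar dotted form, the decoded integers can be converted to strings and joined. A small sketch, using the value shown above as a stand-in for .z.a (the variable names are illustrative):

```q
ip:-1442929031i               / stand-in for .z.a, taken from the example above
octets:`int$0x00 vs ip        / 169 254 166 121
"." sv string octets          / "169.254.166.121"
```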

Dependencies (.z.b)

The system variable .z.b is a dictionary that represents variable dependencies. Recall that non-local assignment with :: establishes a dependency between the variable and variables in the expression assigned to it. These dependencies are recorded in the dictionary .z.b that maps a variable name to a list of the names of variables that depend on it.

For example, in a new q session, we find,

        a:42
        b:98
        c::a+b
        .z.b
a| c
b| c

Global Date (.z.d)

The variable .z.d retrieves the date component of Greenwich Mean Time (GMT) and is equivalent to,

        `date$.z.z

Local Date (.z.D)

The variable .z.D retrieves the local date component from the local datetime and is equivalent to,

        `date$.z.Z

Startup File (.z.f)

The system variable .z.f is a symbol representing the name of the file or directory provided on the command line when the running instance of q was invoked. For example, if q is invoked from the O/S console with,

        q.exe convertargs.q 42 forty 2.0

we find,

        .z.f
`convertargs.q
        .z.x
("42";"forty";"2.0")

Host (.z.h)

The variable .z.h is a symbol representing the name of the host running the q instance.

        .z.h
`macpro.local

Process ID (.z.i)

The system variable .z.i is an int representing the process id of the running q instance.

        .z.i
8615

Note: As of this writing (Jun 2007), .z.i is not yet implemented on Windows.

Release Date (.z.k)

The system variable .z.k is a date value representing the release date of the running kdb+ instance.

        .z.k
2006.06.01

Release Major Version (.z.K)

The system variable .z.K is a float value representing the major version of the running kdb+ instance.

        .z.K
2.4

License Information (.z.l)

The variable .z.l is a list of strings containing information about the license of the running kdb+ instance. The most useful are the items in positions 1 and 2, which represent the expiry date and update date, respectively.

        .z.l
("";"2007.07.01";"2007.07.01";,"1";,"1";,"0";,"0")

O/S (.z.o)

The system variable .z.o is a symbol representing the underlying operating system. For example, this tutorial is being written on a 64 bit Mac system.

        .z.o
`m64

Process Close (.z.pc)

The variable .z.pc is a q function representing an event handler that is executed whenever a connection to the current q process is closed. See Interprocess Communication for a discussion.

To reset .z.pc to the default behavior, issue the command,

        \x .z.pc

Process Get (.z.pg)

The variable .z.pg is a q function representing an event handler that is executed whenever a client q process makes a synchronous call to the current q process. The name derives from the fact that a synchronous call has get semantics. See Interprocess Communication for a discussion.

To reset .z.pg to the default setting, issue the command,

        \x .z.pg

Process HTTP Get (.z.ph)

The variable .z.ph is a q function representing an event handler that is executed whenever an HTTP get is routed to the current q process. See Interprocess Communication for a discussion.

To reset .z.ph to the default setting, issue the command,

        \x .z.ph

Process Input (.z.pi)

The variable .z.pi is a q function representing an event handler that is executed when q echoes the result of user input to the console. You can make the console display mimic that of 2.3 by assigning,

        .z.pi:{-1 .Q.s1 value x}

You can make the console display mimic that of 2.4 by assigning,

        .z.pi:{-1 .Q.s value x}

To reset .z.pi to the default setting, issue the command,

        \x .z.pi

Process Open (.z.po)

The variable .z.po is a q function representing an event handler that is executed whenever a connection to the current q process is opened. See Interprocess Communication for a discussion.

To reset .z.po to the default setting, issue the command,

        \x .z.po

Process HTTP Post (.z.pp)

The variable .z.pp is a q function representing an event handler that is executed whenever an HTTP post is routed to the current q process.

To reset .z.pp to the default setting, issue the command,

        \x .z.pp

Process Set (.z.ps)

The variable .z.ps is a q function representing an event handler that is executed whenever a client q process makes an asynchronous call to the current q process. The name derives from the fact that an asynchronous call has set semantics. See Interprocess Communication for a discussion.

To reset .z.ps to the default setting, issue the command,

        \x .z.ps

Global Time (.z.t)

The variable .z.t retrieves the time component of Greenwich Mean Time (GMT) and is equivalent to,

        `time$.z.z

Local Time (.z.T)

The variable .z.T retrieves the local time component from the local datetime and is equivalent to,

        `time$.z.Z

Timer Expression (.z.ts)

The variable .z.ts is a q function representing an event handler that is executed on every timer tick (see the command \t). For example, the following displays local time to the console approximately every two seconds.

        .z.ts:{0N!`time$.z.Z}
         \t 2000
07:20:00.329
07:20:02.332
07:20:04.335
...

User (.z.u)

The variable .z.u is a symbol representing the user id that invoked the running q instance.

         .z.u
`Jeffry

Value Set (.z.vs)

The variable .z.vs is a q function representing an event handler that is executed whenever any global variable in the root namespace is assigned in q. You could use .z.vs, for example, to monitor who is modifying certain variables.

The signature of the handler is,

        {[v;i]...}

where v represents a symbol with the name of the variable being assigned and i is the index for which the assignment is applied. The following trivial handler displays v and i to the console.

        .z.vs:{[v;i]0N!v;0N!i;}
        a:42
`a
()
 
        a:til 5
`a
()
 
        a[2]:42
`a
,2
 
        a[0 3]:6
`a
,0 3

Since the granularity of .z.vs is all or nothing, you'd need to write your own logic to monitor only certain variables, for instance.
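As a sketch of such logic, the handler can consult a list of names and ignore everything else. The list name watched is hypothetical, and the example simply echoes to the console rather than recording who made the change.

```q
watched:`a`b                                  / hypothetical list of variables to monitor
.z.vs:{[v;i] if[v in watched; 0N!(v;i)];}     / report only assignments to watched names
a:42                                          / reported, since a is in watched
c:1 2 3                                       / ignored, since c is not
```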

To remove the handler, issue the command \x .z.vs.

        \x .z.vs
        a:42
_

Handle (.z.w)

The variable .z.w contains an int with the connection handle (i.e., "who") during synchronous or asynchronous request processing. See Interprocess Communication for a discussion.

Command Line Parameters (.z.x)

The system variable .z.x is a list of strings representing the command line parameters provided after the name of the file or directory on the command line when the running instance of q was invoked. For example, if q is invoked from the O/S console with,

        q.exe convertargs.q 42 forty 2.0

we find,

        .z.f
`convertargs.q
        .z.x
("42";"forty";"2.0")
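Because the parameters arrive as strings, a script typically parses them into typed values before use. A minimal sketch, with a literal list standing in for .z.x and variable names that are illustrative only:

```q
args:("42";"forty";"2.0")     / stand-in for .z.x from the invocation above
n:"J"$args 0                  / parse the first argument as a long
s:`$args 1                    / turn the second into a symbol
f:"F"$args 2                  / parse the third as a float
(n;s;f)                       / 42, `forty and 2f
```

A string that does not parse yields the null of the requested type rather than an error, so it is worth checking for nulls after conversion.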

GMT (.z.z)

The variable .z.z is a datetime value representingthe current Greenwich Mean Time (GMT) as reported by the operating system.

        .z.z
2007.02.02T15:24:28.156

Local Date and Time (.z.Z)

The variable .z.Z is a datetime value representing the current local time as known to the operating system.

        .z.Z
2007.02.02T10:24:30.820

Note: The -o startup option or \o command override the default time zone offset as determined by the operating system. This is useful when you want to adjust time manually, such as for daylight savings time.

Command Line Parameters

We describe here the options of a q session that can be set via command line parameters. A command line parameter is denoted by a dash (-) and a single character, followed by whitespace and then the value(s) of the parameter. Multiple command line parameters are separated by whitespace and can be entered in any order.

Note: The case of the command line character is significant.

Most command line parameters have equivalent workspace commands denoted by the same character. See Command Format for detailed descriptions and examples.

Console (-c)

The console parameter is a pair of ints that specify the size of the q virtual console display. The first specifies the number of rows and the second the number of columns. The default setting is 23 by 79. This parameter corresponds to the command \c.

Web Browser Console (-C)

The web console parameter (note upper case) is a pair of ints that specify the size of the q web console display. The first parameter specifies the number of rows and the second the number of columns. The default setting is 36 by 2000. This parameter corresponds to the command \C.

Offset (-o)

The offset parameter is an int that sets the offset in hours from GMT used to determine local time in .z.Z. This parameter corresponds to the command \o.

Port (-p)

The port parameter is an int that specifies the port number on which the kdb+ server listens. This parameter corresponds to the command \p.

Print Digits (-P)

The print digits parameter is an int that specifies the display precision for floating point numbers to the specified number of digits. The default precision is 7, meaning that the display of float or real values is rounded to the seventh significant digit. This parameter corresponds to the command \P.

Timer (-t)

The timer parameter is an int that specifies the number of milliseconds between timer ticks, with 0 signifying that the timer is turned off. This parameter corresponds to the command \t.

Timeout (-T)

The timeout parameter (note upper case) is an int that specifies the number of seconds any call from a client will execute before it is timed out and terminated. The default value is 0, which means no timeout. This parameter corresponds to the command \T.

Workspace Size (-w)

The workspace parameter is an int that specifies the maximum workspace size in megabytes. The default value is unlimited. A value of 0 means an unlimited workspace. In a multithreaded mode, as each thread has its own heap, this limit is per thread and not per process.

Week Offset (-W)

The week offset parameter (note upper case) is an int that specifies the start of week as an offset from Saturday. For example,

        q -W 2

starts a q session in which Monday is considered the beginning of the week.

Date Format (-z)

The date format parameter is a boolean value that specifies the format expected in date parsing. A value of 0 corresponds to mm/dd/yyyy; a value of 1 corresponds to dd/mm/yyyy. This parameter corresponds to the command \z.


14. Built-in Functions

Overview

The collection of built-in functions in q is rich and powerful. In this chapter, we group functions by form. A string function takes a string and returns a string. An aggregate function takes a list and returns an atom. A uniform function takes a list and returns a list of the same count. A mathematical function takes numeric arguments and returns a numeric result derived by some numerical calculation.

Note that these categories are not mutually exclusive. For example, some mathematical functions are also aggregate functions.

String Functions

The basic string functions perform the usual string manipulations on a list of char. There are also powerful functions that are unique to q.

like

The dyadic like performs pattern matching on its first string argument (source) according to the pattern in its second string argument (pattern). It returns a boolean result indicating whether pattern is matched. The pattern is expressed as a mix of regular characters and special formatting characters. The special chars are "?", "*", the pair "[" and "]", and "^" enclosed in square brackets.

The special char "?" represents an arbitrary single character in the pattern.

        "fan" like "f?n"
1b
        "fun" like "f?n"
1b
        "foP" like "f?p"
0b

The special char "*" represents an arbitrary sequence of characters in the pattern.

Note: As of this writing (Jan 2007), only a single occurrence of * is allowed in the pattern.

        "how" like "h*"
1b
        "hercules" like "h*"
1b
        "wealth" like "*h"
1b
        "flight" like "*h*"
1b
        "Jones" like "J?ne*"
1b
        "Joynes" like "J?ne*"
0b
        "Joynes" like "J*ne*"
'nyi

The special character pair "[" and "]" encloses a sequence of alternatives for a single character match. A range of characters can be written with a hyphen, as in [0-9].

        "flap" like "fl[ao]p"
1b
        "flip" like "fl[ao]p"
0b
        "459-0609" like "[0-9][0-9][0-9]-0[0-9][0-9][0-9]"
1b
        "459-0609" like "[0-9][0-9][0-9]-1[0-9][0-9][0-9]"
0b

The special character "^" is used in conjunction with "[" and "]" to indicate that the enclosed sequence of characters is disallowed. For example, to test whether a string ends in a non-numeric character,

        "M26d" like "*[^0-9]"
1b
        "Joe999" like "*[^0-9]"
0b
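Since like accepts a list of strings or symbols as its first argument, one pattern can be tested against many values at once. A short sketch with made-up data (the variable name names is illustrative):

```q
names:`fan`fun`foP
names like "f?n"                        / 110b
("how";"wealth";"echo") like "*h"       / 010b
names where names like "f?n"            / `fan`fun
```

The last line shows the common idiom of using the boolean result with where to filter the original list.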

lower

The monadic lower takes a char or string argument and returns the result of converting any alpha characters to lower case.

        lower "A"
"a"
        lower "a Bc42De"
"a bc42de"

ltrim

The monadic ltrim takes a string argument and returns the result of removing leading blanks.

        ltrim "   abc  "
"abc  "

You can also apply ltrim to a non-blank char.

        ltrim "a"
"a"

rtrim

The monadic rtrim takes a string argument and returns the result of removing trailing blanks.

        rtrim "   abc  "
"   abc"

You can also apply rtrim to a non-blank char.

        rtrim "a"
"a"

ss

The dyadic ss ("string search") performs the same pattern matching as like against its first string argument (source), looking for matches to its second string argument (pattern). However, the result of ss is a list containing the position(s) of the matches of the pattern in source. See above for a discussion of like.

        "Now is the time for all good men to come to" ss "me"
13 29 38
        "fun" ss "f?n"
,0

If no matches are found, an empty int list is returned.

        "aa" ss "z"
`int$()

Note:You cannot use * to match withss.

ssr

The triadic ssr ("string search and replace") extends the capability of ss with replacement. The result is a string based on the first string argument (source) in which all occurrences of the second string argument (pattern) are replaced with the third string argument.

        ssr["suffering succotash";"s";"th"]
"thuffering thuccotathh"

Note: You cannot use * to match with ssr.

string

The monadic string can be applied to any q entity to produce a textual representation of the entity. For scalars, lists and functions, the result of string is a list of char that does not contain any q formatting characters. Following are some examples.

        string 42
"42"
        string 6*7
"42"
        string 42422424242j
"42422424242"
 
        string `Zaphod
"Zaphod"
 
        f:{[x] x*x}
        string f
"{[x] x*x}"

The next example demonstrates that string is not atomic, because the result of applying it to an atom is a list of char.

        string "4"
,"4"

The next example may be surprising.

        string 0x42
"42"

To see why, recall from Creating Symbols from Strings that a string can be parsed into q data using $ with the appropriate upper-case type domain character. Now, converting to a string and parsing from a string should be inverse maps, in that their composite returns the original input value. That is, we should find,

        "X"$string 0x42
0x42

Thus, the behavior of string is determined by that of parse.

        "X"$"42"
0x42

Comparing these two results, we see that the result of string on a byte must not contain the format characters (the leading 0x). This reasoning works for other types as well.

Although string is not atomic (it returns a list from an atom), it does act like an atomic function in that its application is extended item-wise to a list.

        string 42 98
"42"
"98"
        string 1 2 3
,"1"
,"2"
,"3"
        string "Beeblebrox"
,"B"
,"e"
,"e"
,"b"
,"l"
,"e"
,"b"
,"r"
,"o"
,"x"
        string(42; `life; ("the"; 0x42))
"42"
"life"
((,"t";,"h";,"e");"42")

Considering a list as a mapping, we see that string acts on the range of the mapping. Viewing a dictionary as a generalized list, we conclude that the action of string on a dictionary should also apply to its range.

        d:1 2 3!100 101 102
        string d
1| "100"
2| "101"
3| "102"

A table is the flip of a column dictionary, so we expect string to operate on the range of the column dictionary.

        t:([] a:1 2 3; b:`a`b`c)
        string t
a    b
---------
,"1" ,"a"
,"2" ,"b"
,"3" ,"c"

Finally, a keyed table is a dictionary, so we expect string to operate on the value table.

        kt:([k:1 2 3] c:100 101 102)
        string kt
k| c
-| -----
1| "100"
2| "101"
3| "102"

sv

The basic form of dyadic sv ("string from vector") takes a char as its left operand and a list of strings (source) as its right operand. It returns a string that is the concatenation of the strings in source, separated by the specified char.

        ";" sv("Now";"is";"the";"time";"")
"Now;is;the;time;"

When sv is used with an empty symbol as its left operand and a list of symbols as its right operand (source), the result is a symbol in which the items in source are concatenated with a separating dot.

        ` sv `qalib`stat
`qalib.stat

This is useful for q context names.

When sv is used with an empty symbol as its left operand and a symbol list right operand (source) whose first item is a file handle, the result is a symbol in which the items in source are concatenated with a separating forward-slash. This is useful for fully qualified q path names.

        ` sv `:`q`tutorial`draft1
`:/q/tutorial/draft1

When sv is used with an int left operand (base) that is greater than 1, together with a right operand of a simple list of place values expressed in base, the result is an int representing the converted base 10 value.

        2 sv 101010b
42
        10 sv 1 2 3 4 2
12342
 
        256 sv 0x001092
4242

Advanced: More precisely, the last version of sv evaluates the polynomial,

        (d[0]*b xexp n-1) + ... + d[n-1]

where d is the list of digits (most significant first), n is the count of d, and b is the base.

Thus, we find,

        10 sv 1 2 3 11 2
12412
        -10 sv 2 1 5
195
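
The polynomial evaluation behind the numeric form of sv can be sketched in Python using Horner's rule. This is a model for illustration, not the q implementation: the function name sv is reused only for readability, and digits are taken most significant first, matching the order of the q list.

```python
def sv(base, digits):
    """Evaluate digits (most significant first) as a polynomial in base."""
    total = 0
    for d in digits:
        total = total * base + d   # Horner's rule: shift one place, add digit
    return total

print(sv(2, [1, 0, 1, 0, 1, 0]))     # 42
print(sv(10, [1, 2, 3, 11, 2]))      # 12412
print(sv(-10, [2, 1, 5]))            # 195
```

Nothing in the loop requires a digit to be smaller than the base or the base to be positive, which is why the last two calls still give well-defined answers.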

trim

The monadic trim takes a string argument and returns the result of removing leading and trailing blanks.

        trim "   abc  "
"abc"

Note: The function trim is equivalent to,

        {ltrim rtrim x}

You can also apply trim to a non-blank char.

        trim "a"
"a"

upper

The monadic upper takes a char, string or symbol argument and returns the result of converting any alpha characters to upper case.

        upper "a"
"A"
        upper "a Bc42De"
"A BC42DE"

vs

The dyadic vs ("vector from string") takes a char as its left operand and a string (source) as its right operand. It returns a list of strings containing the tokens of source as delimited by the specified char.

        " " vs "Now is the time "
"Now"
"is"
"the"
"time"
""

When vs is used with an empty symbol as its left operand and a symbol right operand (source) containing separating dots, it returns a simple symbol list obtained by splitting source along the dots.

        ` vs `qalib.stat
`qalib`stat

When vs is used with an empty symbol as its left operand and a symbol representing a fully qualified file name as the right operand, it returns a simple list of symbols in which the first item is the path and the second item is the file name.

       ` vs `:/q/tutorial/draft
`:/q/tutorial`draft

Note that in the last usage, vs is not quite the inverse of sv.

When vs is used with a null of binary type as the left operand and a value of integer type as the right operand (source), it returns a simple list whose items comprise the digits of the corresponding binary representation of source.

        0x00 vs 4242
0x00001092
 
        10h$0x00 vs 8151631268726338926j
"q is fun"
 
        0b vs 42
00000000000000000000000000101010b

Advanced: The last form can be used to display the internal representation of special values.

        0b vs 0W
01111111111111111111111111111111b
 
        0b vs -0W
10000000000000000000000000000001b
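
The byte and bit decompositions shown for vs can be reproduced in Python. The sketch below is a model only: the helper names vs_bytes and vs_bits are hypothetical, and the default widths of 4 bytes and 32 bits are assumptions chosen to match the 32-bit int results printed in this section.

```python
def vs_bytes(n, width=4):
    """Big-endian bytes of n in two's complement, like 0x00 vs n."""
    mask = (1 << (8 * width)) - 1
    return (n & mask).to_bytes(width, "big")

def vs_bits(n, width=32):
    """Two's-complement bits of n, most significant first, like 0b vs n."""
    mask = (1 << width) - 1
    return format(n & mask, "0{}b".format(width))

print(vs_bytes(4242).hex())                  # 00001092
print(vs_bytes(8151631268726338926, 8))      # b'q is fun'
print(vs_bits(42))                           # 00000000000000000000000000101010
```

Masking with 2^width - 1 before formatting is what turns a negative Python integer into its two's-complement bit pattern.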

Mathematical Functions

The mathematical functions perform the mathematical operations for basic calculations. Their implementations are efficient.

acos

The monadic acos is the mathematical inverse of cos. For a float argument between -1 and 1, acos returns the float between 0 and π whose cosine is the argument. For an argument outside that range, the result is null.

        sqrt2:1.414213562373095
        acos 1
0f
 
        acos sqrt2
0n
 
        acos -1
3.141592653589793
 
        acos 0
1.570796326794897

asin

The monadic asin is the mathematical inverse of sin. For a float argument between -1 and 1, asin returns the float between -π/2 and π/2 whose sine is the argument.

        sqrt2:1.414213562373095
        asin 0
0f
 
        asin sqrt2%2
0.7853982
 
        asin 1
1.570796
 
        asin -1
-1.570796326794897

atan

The monadic atan is the mathematical inverse of tan. For a float argument, it returns the float between -π/2 and π/2 whose tangent is the argument.

        sqrt2:1.414213562373095
 
        atan 0
0f
 
        atan sqrt2
0.9553166181245093
 
        atan 1
0.7853981633974483

cor

The dyadic cor takes two numeric lists of the same count and returns a float equal to the mathematical correlation between the items of the two arguments.

        23 -11 35 0 cor 42  21 73 39
0.9070229

Note: The function cor is equivalent to,

        {cov[x;y]%dev[x]*dev y}

cos

The monadic cos takes a float argument and returns the mathematical cosine of the argument.

        pi:3.141592653589793
        cos 0
1f
 
        cos pi%3
0.5000000000000001
 
        cos pi%2
6.123032e-017
 
        cos pi
-1f

cov

The dyadic cov takes a numeric atom or list in both arguments and returns a float equal to the mathematical covariance between the items of the two arguments. If both arguments are lists, they must have the same count.

        98 cov 42
0f
 
        23 -11 35 0 cov 42  21 73 39
308.4375

Note: The function cov is equivalent to,

        {avg[x*y]-avg[x]*avg y}
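
The equivalences quoted for cov and cor, together with the ones given for var and dev elsewhere in this chapter, all reduce to averages. The Python sketch below models them as population statistics (divide by n). It is meant to illustrate the formulas only; the Python function names mirror the q names for readability.

```python
from math import sqrt

def avg(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # avg[x*y] - avg[x]*avg[y]
    return avg([x * y for x, y in zip(xs, ys)]) - avg(xs) * avg(ys)

def var(xs):
    # variance is the covariance of a list with itself
    return cov(xs, xs)

def dev(xs):
    return sqrt(var(xs))

def cor(xs, ys):
    # cov[x;y] % dev[x]*dev[y]
    return cov(xs, ys) / (dev(xs) * dev(ys))

x = [23, -11, 35, 0]
y = [42, 21, 73, 39]
print(cov(x, y))                 # 308.4375
print(round(cor(x, y), 7))       # 0.9070229
```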

cross

The binary cross takes atoms or lists as arguments and returns their Cartesian product - that is, the set of all pairs drawn from the two arguments.

        1 2 cross `a`b`c
1 `a
1 `b
1 `c
2 `a
2 `b
2 `c

Note: The cross operator is equivalent to the function,

        {raze x,/:\:y}
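
The ordering of the pairs matters: in the example output the left argument varies slowest. A Python sketch of the same ordering, using the standard library's itertools.product (the wrapper name cross is only a label for this illustration):

```python
from itertools import product

def cross(xs, ys):
    """All pairs, with items of xs varying slowest, as in the example output."""
    return [list(pair) for pair in product(xs, ys)]

for pair in cross([1, 2], ["a", "b", "c"]):
    print(pair)
```

The result has count (len(xs) * len(ys)), and every item of xs is paired with the whole of ys before the next item of xs is taken.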

inv

The monadic inv returns the inverse of a float matrix.

        m:(1.1 2.1 3.1; 2.3 3.4 4.5; 5.6 7.8 9.8)
        inv m
-8.165138 16.51376  -5
12.20183  -30.18349 10
-5.045872 14.58716  -5

Note: An integer argument will cause an error, so cast it to float.

lsq

The dyadic matrix function lsq returns the matrix X that solves the following matrix equation, where A is the float matrix left operand, B is the float matrix right operand and · is matrix multiplication.

        A = X·B

For example,

        A:(1.1 2.2 3.3;4.4 5.5 6.6;7.7 8.8 9.9)
        B:(1.1 2.1 3.1; 2.3 3.4 4.5; 5.6 7.8 9.8)
        A lsq B
1.211009  -0.1009174 2.993439e-12
-2.119266 2.926606   -3.996803e-12
-5.449541 5.954128   -1.758593e-11

Observe that the result of lsq can be obtained as,

        A mmu inv B
1.211009  -0.1009174 1.77991e-12
-2.119266 2.926606   -5.81224e-12
-5.449541 5.954128   -1.337952e-11

Note: Integer arguments will cause an error, so cast them to float.

mmu

The dyadic matrix multiplication function mmu returns the matrix product of its two float vector or matrix arguments, which must be of the correct shape.

Note: Integer arguments will cause an error, so cast them to float.

Here is an example of multiplying a matrix and its transpose.

        m1:(1.1 2.2 3.3;4.4 5.5 6.6;7.7 8.8 9.9)
        m2:flip m1
        m1 mmu m2
16.94 38.72  60.5
38.72 93.17  147.62
60.5  147.62 234.74

The $ operator is overloaded to yield matrix multiplication when its arguments are float vectors or matrices.

        1 2 3f mmu 1 2 3f
14f
 
        1 2 3f$1 2 3f
14f

sin

The monadic sin takes a float argument and returns the mathematical sine of the argument.

        pi:3.141592653589793
        sin 0
0f
 
        sin pi%4
0.7071068
 
        sin pi%2
1f
 
        sin pi
1.224606e-016

tan

The monadic tan takes a float argument and returns the mathematical tangent of the argument.

Note: The value tan x is (sin x)%cos x

        pi:3.141592653589793
        tan 0
0f
 
        tan pi%8
0.4142136
 
        tan pi%4
1f
 
        tan pi%2
1.633178e+016
 
        tan pi
-1.224606e-016

var

The monadic var takes a scalar or numeric list and returns a float equal to the mathematical variance of the items.

        var 42
0f
 
        var 42 45 37 38
10.25

Note: The function var is equivalent to

        {(avg[x*x]) - (avg[x])*(avg[x])}

wavg

The dyadic wavg takes two numeric lists of the same count and returns the average of the second argument weighted by the first argument. The result is always of type float.

        1 2 3 4 wavg 500 400 300 200
300f

Note: The expression w wavg a is equivalent to,

        (sum w*a)%sum w

In our example,

        (sum (1 2 3 4)*500 400 300 200)%sum 1 2 3 4
300f

It is possible to apply wavg to a nested list provided all sublists of both arguments conform. In this context, the result conforms to the sublists and the weighted average is calculated recursively across the sublists.

        (1 2;3 4) wavg (500 400; 300 200)
350 266.6667
 
        ((1;2 3);(4;5 6)) wavg ((600;500 400);(300;200 100))
360f
285.7143 200

wsum

The dyadic wsum takes two numeric lists of the same count and returns the sum of the second argument weighted by the first argument. The result is always of type float.

        1 2 3 4 wsum 500 400 300 200
3000f

Note: The expression w wsum a is equivalent to,

        sum w*a

In our example,

        sum (1 2 3 4)*500 400 300 200
3000

It is possible to apply wsum to a nested list provided all sublists of both arguments conform. In this context, the result conforms to the sublists and the weighted sum is calculated recursively across the sublists.

        (1 2;3 4) wsum (500 400;300 200)
1400 1600
 
        ((1;2 3);(4;5 6)) wsum ((600;500 400);(300;200 100))
1800
2000 1800
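
The weighted sum and weighted average reduce to two short formulas. The Python sketch below models the flat-list case and then applies it column by column to reproduce the first nested example. It is an illustration, not the q implementation; note that Python keeps integer sums as integers, whereas the text states that q returns a float.

```python
def wsum(w, a):
    """sum w*a"""
    return sum(wi * ai for wi, ai in zip(w, a))

def wavg(w, a):
    """(sum w*a) % sum w"""
    return wsum(w, a) / sum(w)

print(wsum([1, 2, 3, 4], [500, 400, 300, 200]))   # 3000
print(wavg([1, 2, 3, 4], [500, 400, 300, 200]))   # 300.0

# Nested case: weights and values are paired up position by position.
W = [[1, 2], [3, 4]]
A = [[500, 400], [300, 200]]
print([wsum(wc, ac) for wc, ac in zip(zip(*W), zip(*A))])   # [1400, 1600]
```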

Aggregate Functions

An aggregate function operates on a list and returns an atom. Aggregates are especially useful with grouping in select expressions.

all

The monadic all takes a scalar or list of numeric type and returns the result of & applied across the items.

        all 1b
1b
 
        all 100100b
0b
 
        all 10 20 30
10

any

The monadic any takes a scalar or list of numeric type and returns the result of | applied across the items.

        any 1b
1b
 
        any 100100b
1b
 
        any 2001.01.01 2006.10.13
2006.10.13

avg

The monadic avg takes a scalar, list, dictionary or table of numeric type and returns the arithmetic average. The result is always of type float.

        avg 42
42f
 
        avg 1 2 3 4 5
3f
 
        avg `a`b`c!10 20 40
23.33333

It is possible to apply avg to a nested list provided the sublists conform. In this context, the result conforms to the sublists and the average is calculated recursively on the sublists.

        avg (1 2; 100 200; 1000 2000)
367 734f
 
        avg ((1 2;3 4); (100 200;300 400))
50.5  101
151.5 202

For tables, the result is a dictionary that maps each column name to the average of its column values.

        t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
        t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
 
        avg t
c1| 2.75
c2| 3.5

dev

The monadic dev takes a scalar, list, or dictionary of numeric type and returns the standard deviation. The result is a float.

        dev 42
0f
 
        dev 42 45 37 38
3.201562
 
        dev `a`b`c!10 20 40
12.47219

Note: The function dev is equivalent to

        {sqrt[var[x]]}

med

The monadic med takes a list, dictionary or table of numeric type and returns the statistical median.

For lists and dictionaries, the result is a float.

        med 42  21 73 39
40.5
 
        med `a`b`c!10 20 40
20f

Note: The function med is equivalent to,

        {$[n:count x;.5*sum x[iasc x]@floor .5*n-1 0;0n]}
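
The index arithmetic in that definition picks the one or two middle items of the sorted list: floor .5*n-1 0 yields positions floor((n-1)/2) and floor(n/2), which coincide when n is odd. A Python sketch of the same idea, for illustration only:

```python
def med(xs):
    """Median via the two middle positions of the sorted list."""
    n = len(xs)
    if n == 0:
        return float("nan")              # plays the role of 0n
    s = sorted(xs)
    lo, hi = (n - 1) // 2, n // 2        # equal when n is odd
    return 0.5 * (s[lo] + s[hi])

print(med([42, 21, 73, 39]))   # 40.5
print(med([10, 20, 40]))       # 20.0
```

Summing the two picked items and halving gives the item itself for odd counts and the midpoint of the two central items for even counts, so no separate branch is needed.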

For tables, the result is a dictionary mapping the column names to their value medians.

        t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
        t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
 
        med t
c1| 2.75
c2| 3.5

prd

The monadic prd takes a scalar, list, dictionary or table of numeric type and returns the arithmetic product.

For scalars, lists and dictionaries the result has the type of its argument.

        prd 42
42
 
        prd 1.1 2.2 3.3 4.4 5.5
193.2612
 
        prd `a`b`c!10 20 40
8000

It is possible to apply prd to a nested list provided the sublists conform. In this case, the result conforms to the sublists and the product is calculated recursively on the sublists.

        prd (1 2; 100 200; 1000 2000)
100000 800000
 
        prd ((1 2;3 4); (100 200;300 400))
100 400
900 1600

For tables, the result is a dictionary that maps each column name to the product of its column values.

        t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
        t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
 
        prd t
c1| 35.1384
c2| 120

sum

The monadic sum takes a scalar, list, dictionary or table of numeric type and returns the arithmetic sum.

For scalars, lists and dictionaries the result has the type of its argument.

        sum 42
42
 
        sum 1.1 2.2 3.3 4.4 5.5
16.5
 
        sum `a`b`c!10 20 40
70

It is possible to apply sum to a nested list provided the sublists conform. In this case, the result conforms to the sublists and the sum is calculated recursively on the sublists.

        sum (1 2; 100 200; 1000 2000)
1101 2202
 
        sum ((1 2;3 4); (100 200;300 400))
101 202
303 404

For tables, the result is a dictionary that maps each column name to the sum of its column values.

        t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
        t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
 
        sum t
c1| 11
c2| 14

Uniform Functions

Uniform functions operate on lists and return lists of the same shape. They are useful in select expressions.

deltas

The uniform deltas takes as its argument (source) a scalar, list, dictionary or table of numeric type and returns the difference of each item from its predecessor.

        deltas 42
42
 
        deltas 1 2 3 4 5
1 1 1 1 1
 
        deltas 96.25 93.25 58.25 73.25 89.50 84.00 84.25
96.25 -3 -35 15 16.25 -5.5 0.25
 
        deltas `a`b`c!10 20 40
a| 10
b| 10
c| 20
 
        t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
        t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
 
        deltas t
c1  c2
------
1.1 5
1.1 -1
1.1 -1
1.1 -1

Important: As the third example shows, the result of deltas contains the initial item of source in its initial position. This may be inconsistent with the behavior of similar functions in other languages or libraries that return 0 in the initial position. The alternate behavior can be achieved with the expression

        1_deltas (1#x),x

In our example above,

        1_deltas (1#x),x:96.25 93.25 58.25 73.25 89.50 84.00 84.25
0 -3 -35 15 16.25 -5.5 0.25
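
Both conventions can be modelled in a few lines of Python. This sketch is an illustration of the behavior described above, not q's implementation; deltas0 is a hypothetical name for the zero-start variant.

```python
def deltas(xs):
    """Difference of each item from its predecessor; the first item is kept."""
    return [x - p for x, p in zip(xs, [0] + list(xs[:-1]))]

def deltas0(xs):
    """Zero-start variant: prepend a copy of the first item, then drop one."""
    return deltas([xs[0]] + list(xs))[1:]

px = [96.25, 93.25, 58.25, 73.25, 89.50, 84.00, 84.25]
print(deltas(px))    # [96.25, -3.0, -35.0, 15.0, 16.25, -5.5, 0.25]
print(deltas0(px))   # [0.0, -3.0, -35.0, 15.0, 16.25, -5.5, 0.25]
```

Treating the missing predecessor of the first item as 0 is what leaves that item unchanged; duplicating the first item makes its difference 0 instead.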

differ

The uniform differ takes as its argument (source) a list and returns a boolean list whose item in position i is the negation of match (~) applied to the item at position i and the item at position i-1. The result of differ on a scalar is 0b.

Note: The item at position 0 in the result is always 1b.

        differ 1 1 2
101b
 
        differ 0N 0N 1 1 2
10101b
 
        differ "mississippi"
11101101101b
 
        differ (1 2; 1 2; 3 4 5)
101b

One use of differ is to locate runs of repeated items in a list.

        L:0 1 1 2 3 2 2 2 4 1 1 3 4 4 4 4 5
        L where nd|next nd:not differ L
1 1 2 2 2 1 1 4 4 4 4
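
The run-finding expression reads more easily when unpacked step by step. The Python sketch below models it for illustration; the helper name runs is hypothetical. An item belongs to a run if it equals its predecessor (nd) or if its successor equals it (next nd).

```python
def differ(xs):
    """True where an item differs from its predecessor; position 0 is True."""
    return [i == 0 or xs[i] != xs[i - 1] for i in range(len(xs))]

def runs(xs):
    nd = [not d for d in differ(xs)]      # item repeats its predecessor
    nxt = nd[1:] + [False]                # successor repeats this item
    return [x for x, a, b in zip(xs, nd, nxt) if a or b]

L = [0, 1, 1, 2, 3, 2, 2, 2, 4, 1, 1, 3, 4, 4, 4, 4, 5]
print(runs(L))   # [1, 1, 2, 2, 2, 1, 1, 4, 4, 4, 4]
```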

fills

The uniform fills takes as its argument (source) a scalar, list, dictionary or table of numeric type and returns a copy of the source in which non-null items are propagated forward to fill nulls.

        fills 42
42
 
        fills 1 0N 3 0N 5
1 1 3 3 5
 
        fills `a`b`c`d`e`f!10 0N 30 0N 0N 60
a| 10
b| 10
c| 30
d| 30
e| 30
f| 60
 
        tt:([] c1:1 0N 3 0N; c2:`a`b``d)
        tt
c1 c2
-----
1  a
   b
3
   d
 
        fills tt
c1 c2
-----
1  a
1  b
3  b
3  d

Note: Initial nulls are not affected by fills.

        fills 0N 0N 3 0N 5
0N 0N 3 3 5
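
Forward filling amounts to carrying the last non-null value along the list. A Python sketch of that behavior, using None to stand in for a q null (an assumption of this illustration):

```python
def fills(xs):
    """Replace each None with the most recent non-None item, if any."""
    out, last = [], None
    for x in xs:
        if x is not None:
            last = x          # remember the latest real value
        out.append(last)      # leading Nones stay None
    return out

print(fills([1, None, 3, None, 5]))        # [1, 1, 3, 3, 5]
print(fills([None, None, 3, None, 5]))     # [None, None, 3, 3, 5]
```

Because nothing has been seen yet when the list starts with nulls, those leading positions have no value to inherit and remain null.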

mavg

The uniform dyadic mavg takes as its first argument an int (length) and as its second argument (source) a numeric list. It returns the moving average of source, obtained by applying avg over length consecutive items. For positions less than length-1, avg is applied only through that position.

In the following example, the first item in the result is the average of itself only; the second result item is the average of the first two source items; all other items reflect the average of the item at the position along with its two predecessors.

        3 mavg 10 20 30 40 50
10 15 20 30 40f

For length 1, the result is the source converted to float. For length less than or equal to 0 the result is all nulls.

Note: As of release 2.4, mavg ignores null values.

         3 mavg  10 20 0N 40 50 60 0N
10 15 15 30 45 50 55f
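
The windowing rule, including the treatment of short initial windows and of nulls, can be modelled in Python as follows. This is a sketch for illustration, with None standing in for a q null; it recomputes each window from scratch for clarity rather than speed.

```python
def mavg(length, xs):
    """Average of the last `length` items at each position, skipping None."""
    out = []
    for i in range(len(xs)):
        start = max(0, i - length + 1)     # early windows are shorter
        window = [x for x in xs[start:i + 1] if x is not None]
        out.append(sum(window) / len(window) if window else None)
    return out

print(mavg(3, [10, 20, 30, 40, 50]))
# [10.0, 15.0, 20.0, 30.0, 40.0]
print(mavg(3, [10, 20, None, 40, 50, 60, None]))
# [10.0, 15.0, 15.0, 30.0, 45.0, 50.0, 55.0]
```

With a length of 0 or less every window is empty, so every result is None, which corresponds to the all-nulls result described above.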

maxs

The uniform maxs takes as its argument (source) a scalar, list, dictionary or table and returns the cumulative maximum of the source items.

        maxs 42
42
 
        maxs 1 2 5 4 10
1 2 5 5 10
 
        maxs "Beeblebrox"
"Beeelllrrx"
 
        maxs `a`b`c`d!10 30 20 40
a| 10
b| 30
c| 30
d| 40
 
        t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
        t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
 
       maxs t
c1  c2
------
1.1 5
2.2 5
3.3 5
4.4 5

mcount

The uniform dyadic mcount takes as its first argument an int (length) and as its second argument (source) a numeric list. It returns the moving count of source, obtained by applying count over length consecutive items. For positions less than length-1, count is applied only through that position.

This function is useful in computing other moving quantities. For example,

        3 mcount 10 20 30 40 50
1 2 3 3 3

For length less than or equal to 0 the result is all zeroes.

Note: As of release 2.4, mcount ignores null values.

        3 mcount 10 20 0N 40 50 60 0N
1 2 2 2 2 3 2

mdev

The uniform dyadic mdev takes as its first argument an int (length) and as its second argument (source) a numeric list. It returns the moving standard deviation of source, obtained by applying dev over length consecutive items. For positions less than length-1, dev is applied only through that position.

In the following example, the first item in the result is the standard deviation of itself only; the second result item is the standard deviation of the first two source items; all other items reflect the standard deviation of the item at the position along with its two predecessors.

        3 mdev 10 20 30 40 50
0 5 8.164966 8.164966 8.164966

For length less than or equal to 0 the result is all nulls.

mins

The uniform mins takes as its argument (source) a scalar, list, dictionary or table and returns the cumulative minimum of the source items.

        mins 42
42
 
        mins 10 4 5 1 2
10 4 4 1 1
 
        mins "Beeblebrox"
"BBBBBBBBBB"
 
        mins `a`b`c`d!40 10 30 20
a| 40
b| 10
c| 10
d| 10
 
        t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
        t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
 
        mins t
c1  c2
------
1.1 5
1.1 4
1.1 3
1.1 2

mmax

The uniform dyadic mmax takes as its first argument an int (length) and as its second argument (source) a numeric list. It returns the moving maximum of source, obtained by applying max over length consecutive items. For positions less than length-1, max is applied only through that position.

In the following example, the first item in the result is the max of itself only; the second result item is the max of the first two source items; all other items reflect the max of the item at the position along with its two predecessors.

        3 mmax 20 10 30 50 40
20 20 30 50 50

For length less than or equal to 0 the result is source.

mmin

The uniform dyadic mmin takes as its first argument an int (length) and as its second argument (source) a numeric list. It returns the moving minimum of source, obtained by applying min over length consecutive items. For positions less than length-1, min is applied only through that position.

In the following example, the first item in the result is the min of itself only; the second result item is the min of the first two source items; all other items reflect the min of the item at the position along with its two predecessors.

        3 mmin 20 10 30 50 40
20 10 10 10 30

For length less than or equal to 0 the result is source.

msum

The uniform dyadic msum takes as its first argument an int (length) and as its second argument (source) a numeric list. It returns the moving sum of source, obtained by applying sum over length consecutive items. For positions less than length-1, sum is applied only through that position.

In the following example, the first item in the result is the sum of itself only; the second result item is the sum of the first two source items; all other items reflect the sum of the item at the position along with its two predecessors.

        3 msum 10 20 30 40 50
10 30 60 90 120

For length less than or equal to 0 the result is all zeros.

next

The uniform next takes as its argument (source) a scalar, list or table of numeric type and returns the source shifted one position to the left with no wrapping. For lists and dictionaries, the last item of the result is a null matching the type of source. For tables, the last record of the result is a row of nulls.

        next 1 2 3 4 5
2 3 4 5 0N
 
        t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
        t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
 
        next t
c1  c2
------
2.2 4
3.3 3
4.4 2

prds

The uniform prds takes as its argument (source) a scalar, list, dictionary or table of numeric type and returns the cumulative product of the source items.

        prds 42
42
 
        prds 1 2 3 4 5
1 2 6 24 120
 
        prds `a`b`c!10 20 40
a| 10
b| 200
c| 8000
 
        t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
        t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
 
        prds t
c1      c2
-----------
1.1     5
2.42    20
7.986   60
35.1384 120

prev

The uniform prev takes as its argument (source) a scalar, list, dictionary or table. It returns the source shifted one position forward with initial null filling.

        prev 42
42
 
        prev 1 2 3 4 5
0N 1 2 3 4
 
        prev `a`b`c!10 20 40
a|
b| 10
c| 20
 
        t:([]c1:`a`b`c;c2:10 20 40)
        t
c1 c2
-----
a  10
b  20
c  40
 
        prev t
c1 c2
-----
 
a  10
b  20

rank

The uniform rank takes as its argument (source) a list, dictionary or table whose values are sortable. It returns a list of int containing the order of each item in the source under an ascending sort. For dictionaries, the operation is against the range.

        rank 5 2 3 1 4
4 1 2 0 3
 
        rank `a`b`c`e`f! 5 2 3 1 4
4 1 2 0 3

For tables and keyed tables, the result is a list with the rank of the records under ascending sort of the first column or the key column.

        ttt:([] c1:2.2 1.1 3.3 5.5 4.4; c2:1 2 3 4 5)
        ttt
c1  c2
------
2.2 1
1.1 2
3.3 3
5.5 4
4.4 5
 
       rank ttt
1 0 2 4 3
 
        kt:([k:103 102 101 105 104] d:1 2 3 4 5)
        kt
k  | d
---| -
103| 1
102| 2
101| 3
105| 4
104| 5
 
        rank kt
2 1 0 4 3
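
Ranking can be understood as sorting indices twice: the first pass gives the order in which items would be visited by an ascending sort, and the second pass inverts that permutation to give each item's position in the sorted result. A Python sketch for illustration (iasc here is a Python stand-in written for this example):

```python
def iasc(xs):
    """Indices that would sort xs in ascending order (stable)."""
    return sorted(range(len(xs)), key=lambda i: xs[i])

def rank(xs):
    """Position each item would occupy after an ascending sort."""
    return iasc(iasc(xs))

print(rank([5, 2, 3, 1, 4]))              # [4, 1, 2, 0, 3]
print(rank([103, 102, 101, 105, 104]))    # [2, 1, 0, 4, 3]
```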

ratios

The uniform ratios takes as its argument (source) a scalar, list, dictionary or table of numeric type and returns the float ratio of each item to its predecessor.

        ratios 42
42
 
        ratios 1 2 3 4 5
1 2 1.5 1.333333 1.25
 
        ratios 96.25 93.25 58.25 73.25 89.50 84.00 84.25
96.25 0.9688312 0.6246649 1.257511 1.221843 0.9385475 1.002976
 
        ratios `a`b`c!10 20 40
a| 10
b| 2
c| 2
 
        t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
        t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
 
        ratios t
c1       c2
------------------
1.1      5
2        0.8
1.5      0.75
1.333333 0.6666667

Important: As the third example shows, the result of ratios contains the initial item of source in its initial position. This may be inconsistent with the behavior of similar functions in other languages or libraries that return 1 in the initial position. The alternate behavior can be achieved with the expression,

        1_ratios (1#x),x

In our example above,

        1_ratios (1#x),x:96.25 93.25 58.25 73.25 89.50 84.00 84.25
1 0.9688312 0.6246649 1.257511 1.221843 0.9385475 1.002976

rotate

The uniform dyadic rotate takes as its first argument an int (length) and as its second argument (source) a numeric list or table. It returns the source shifted length positions to the left with wrapping if length is positive, or length positions to the right with wrapping if length is negative. For length 0, it returns the source.

        2 rotate 1 2 3 4 5
3 4 5 1 2
 
       -2 rotate 1 2 3 4 5
4 5 1 2 3
 
        t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
        t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
 
        2 rotate t
c1  c2
------
3.3 3
4.4 2
1.1 5
2.2 4

sums

The uniform sums takes as its argument (source) a scalar, list, dictionary or table of numeric type and returns the cumulative sum of the source items.

        sums 42
42
 
        sums 1 2 3 4 5
1 3 6 10 15
 
        sums `a`b`c!10 20 40
a| 10
b| 30
c| 70
 
        t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
        t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
 
        sums t
c1  c2
------
1.1 5
3.3 9
6.6 12
11  14

xbar

The uniform dyadic xbar takes as its first argument a non-negative numeric atom (width) and a second argument (source) that is a numeric list, dictionary or table. It returns an entity that conforms to source, in which each item of source is mapped to the largest multiple of the width that is less than or equal to that item. The type of the result is that of the width parameter.

        3 xbar 2 7 12 17 22
0 6 12 15 21
 
        5.5 xbar 59.25 53.75 81.00 96.25 93.25 58.25 73.25 89.50 84.00 84.25
55 49.5 77 93.5 88 55 71.5 88 82.5 82.5
 
        15 xbar `a`b`c!10 20 40
a| 0
b| 15
c| 30
 
        t:([]c1:1.1 2.2 3.3 4.4; c2:5 4 3 2)
        t
c1  c2
------
1.1 5
2.2 4
3.3 3
4.4 2
 
        2 xbar t
c1 c2
-----
0  4
2  4
2  2
4  2

Since xbar is atomic in its second argument it can be applied to a nested list.

        5 xbar ((11;21 31);201 301)
10  20 30
200 300
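
The rule "largest multiple of the width that is less than or equal to the item" is a floor division followed by a multiplication. A Python sketch for flat lists, offered as an illustration of the arithmetic rather than of q's implementation:

```python
def xbar(width, xs):
    """Round each item down to the nearest multiple of width."""
    return [width * (x // width) for x in xs]

print(xbar(3, [2, 7, 12, 17, 22]))          # [0, 6, 12, 15, 21]
print(xbar(5.5, [59.25, 53.75, 81.00]))     # [55.0, 49.5, 77.0]
```

This is the usual way to bucket a column, for example grouping timestamps into fixed-width intervals before aggregating.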

xprev

The dyadic xprev takes an int as its first argument (shift) and is uniform in its second argument (source), which can be a list or a table. It returns a result that conforms to source. When shift is 0 or positive, each entity in source is shifted shift positions forward in the result, with the initial shift entries null filled.

        2 xprev 10 20 30 40
0N 0N 10 20
 
        t:([]c1:`a`b`c`d;c2:10 20 30 40)
        t
c1 c2
-----
a  10
b  20
c  30
d  40
 
        2 xprev t
c1 c2
-----
 
 
a  10
b  20

When shift is negative, each entity in source is shifted backward by the magnitude of shift, with the final entries null filled.

        -2 xprev 10 20 30 40
30 40 0N 0N

xrank

The binary xrank is uniform in its right operand (source), which is a list, dictionary, table or keyed table whose values are sortable. The left operand is a positive int (quantile). It returns a list of int containing the quantile of the source distribution to which each item of source belongs. The analysis is applied to the range of a dictionary and the first column of a table.

For example, by choosing quantile to be 4, xrank determines into which quartile each item of source falls.

        4 xrank 30 10 40 20 90
1 0 2 0 3
 
        4 xrank `a`b`c`d`e!30 10 40 20 90
1 0 2 0 3
 
        t:([]c1:30 10 40 20 90;c2:`a`b`c`d`e)
        t
c1 c2
-----
30 a
10 b
40 c
20 d
90 e
 
        4 xrank t
1 0 2 0 3

Choosing quantile to be 100 gives percentile ranking.
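
One way to model the bucket assignment is to scale each item's rank by quantile over the count and round down. The Python sketch below makes that assumption explicitly; it reproduces the quartile example in this section, but the formula is an inference from the example rather than a quoted definition.

```python
def xrank(quantile, xs):
    """Bucket number of each item, assuming floor(quantile * rank / count)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    rank = [0] * n
    for pos, i in enumerate(order):
        rank[i] = pos                      # position under ascending sort
    return [quantile * r // n for r in rank]

print(xrank(4, [30, 10, 40, 20, 90]))      # [1, 0, 2, 0, 3]
```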

Miscellaneous Functions

We collect here the built-in functions that don't fit into any of the previously defined categories.

Conditional Append (?)

The left operand of conditional append ( ? ) is a symbol representing the name of a list of symbols (target) and the right operand is a symbol. The right operand is appended to target if and only if it is not in target; there is no effect when the right operand is already in target. The result is the enumeration of the right operand in target.

        v:`a`b`c
        `v?`z
`v$`z
 
        v
`a`b`c`z
 
        `v?`b
`v$`b
 
        v
`a`b`c`z

Note: While conditional append is normally used with a target list of unique items, this is not a requirement.

asc

The monadic function asc operates on a list or a dictionary (source). The result of asc on a list is a list comprising the items of source sorted in increasing order with the s# attribute applied. The result of asc on a dictionary is an equivalent mapping with the range items sorted in increasing order and with the s# attribute applied.

        asc 3 7 2 8 1 9
`s#1 2 3 7 8 9
 
        asc `b`c`a!3 2 1
a| 1
c| 2
b| 3

bin

The dyadic bin takes a simple list of items (target) in strictly increasing order as its first argument and is atomic in its second argument (token). Loosely speaking, the result of bin is the position at which token would fall in target.

More precisely, the result is -1 if token is less than the first item in target. Otherwise, the result is the position of the right-most item of target that is less than or equal to token; this reduces to the found position if the token is in target. If token is greater than the last item in target, the result is the position of that last item, which is one less than the count of target.

Note: For large sorted lists, the binary search performed by bin is generally more efficient than the linear search algorithm used by in.

Some examples with simple lists,

        1 2 3 4 bin 3
2
 
        "xyz" bin "a"
-1
 
        1.0 2.0 3.0 bin 0.0 2.0 2.5 3.0
-1 1 1 2

Observe that the type of token must strictly match that of target.

        1 2 3 bin 1.5
'type
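
The position rule for simple lists corresponds to a standard right-biased binary search. The Python sketch below uses bisect_right from the standard library to model it; q_bin is a hypothetical name, and unlike q this model does not enforce matching types.

```python
from bisect import bisect_right

def q_bin(target, token):
    """Index of the right-most item <= token, or -1 if there is none."""
    return bisect_right(target, token) - 1

print(q_bin([1, 2, 3, 4], 3))                                    # 2
print(q_bin(list("xyz"), "a"))                                   # -1
print([q_bin([1.0, 2.0, 3.0], t) for t in [0.0, 2.0, 2.5, 3.0]]) # [-1, 1, 1, 2]
```

bisect_right returns the insertion point after any equal items, so subtracting one lands on the matching item when the token is present and on its nearest smaller neighbor otherwise.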

We can apply bin to a dictionary to perform reverse lookup, provided the dictionary domain is in increasing order. When target is a dictionary, bin takes a token whose type matches that of the dictionary range. The result is null if token is less than every item of the range. Otherwise, the result is the right-most domain element whose corresponding range element is less than or equal to token. Loosely put, when token is not found, the result is the domain item after which you would make an insertion to place it into the dictionary in proper order.

Note that the result reduces to the corresponding domain item if token is found in target, and is the last domain item if token is greater than every range item.

        d:10 20 30!`first`second`third
        d bin `second
20
 
        d bin `missing
10
 
        d bin `zero
30
 
        d bin `aaa
0N

Because a table is a list of records, we expect bin to return the row number of a record.

        t:([] a:1 2 3; b:`a`b`c)
        t
a b
---
1 a
2 b
3 c
 
        t bin `a`b!(2;`b)
1

As always, the record can be abbreviated to the list of row values.

        t bin (1;`a)
0
        t bin (0;`z)
0N

Observe that a record that is not found results in a null result.

Finally, since a keyed table is a dictionary, bin will perform a reverse lookup on a record of the value table, which can be abbreviated to a list of row values.

        kt:([k:1 2 3] c:100 101 102)
        kt
k| c
-| ---
1| 100
2| 101
3| 102
 
         kt bin (enlist `c)!enlist 101
k| 2
 
        kt bin 101
k| 2

Warning: While the items of the first argument of bin should be in strictly increasing order for the result to be meaningful, this condition is not enforced. The results of bin when the first argument is not strictly increasing are predictable but not particularly useful.

count

The monadic count returns a non-negative int representing the number of entities in its argument. Its domain comprises scalars, lists, dictionaries, tables and keyed tables.

        count 3
1
 
        count 10 20 30
3
 
        count `a`b`c`d!10 20 30 40
4
 
        count ([] a:10 20 30; b:1.1 2.2 3.3)
3
 
        count ([k:10 20] c:`one`two)
2

Note: You cannot use count to determine whether an entity is a scalar or a list, since scalars and singletons both have count 1.

        count 3
1
 
        count enlist 3
1

This test is accomplished instead by testing the sign of the type of the entity.

        0>type 3
1b
        0>type enlist 3
0b
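
The sign test can be wrapped in a small helper. This is a minimal sketch; the name isAtom is hypothetical and not a built-in.

```q
isAtom:{0>type x}     / atoms have negative type numbers
isAtom 3              / 1b
isAtom enlist 3       / 0b
isAtom "a"            / 1b
isAtom "abc"          / 0b
```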

Aside: Do you know why they call it count? Because it loves to count!! Nyah, ha, ha, ha, ha. Vun, and two, and tree, and....

cut

The binary operator cut is related to the _ operator. It is the same as _ when the right operand is a dictionary and the left operand is a list of items from the dictionary domain.

        d:1 2 3!`a`b`c
        (enlist 2) cut d
1| a
3| c

However, for a list right operand source and an int left operand size, cut returns a new list created by collecting the items of source into sublists of count size.

        5 cut til 13
0 1 2 3 4
5 6 7 8 9
10 11 12
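
The same chunking works on strings, and the final chunk is simply shorter when the count is not a multiple of the size. A minimal sketch, with the illustrative name chunks:

```q
chunks:3 cut "abcdefgh"
chunks                  / ("abc";"def";"gh")
count each chunks       / 3 3 2
raze chunks             / "abcdefgh", the original string
```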

Advanced: The cut function is equivalent to,

        {$[0>type x;x*til neg floor neg(count y)%x;x]_y}

delete (_)

The symbol _ is overloaded to have several meanings depending on the signature of its operands. See also drop.

Note: When _ is used as an operator, whitespace is required to the left if the left operand is a name. This is because _ is a valid non-initial name character. Whitespace is permitted but not required to the right.

When the first argument of dyadic ( _ ) is a list of non-negative int and the second argument (source) is a list, it produces a new list obtained by breaking source into sublists at the positions indicated in the first argument. An example will make this clear.

        0 3_100 200 300 400 500
100 200 300
400 500

Each sublist includes the items from the beginning cut position up to, but not including, the next cut position. The final cut includes the items to the end of source. Observe that if the left argument does not begin with 0, the initial items of source will not be included in the result.

        2 4_2006.01 2006.02 2006.03 2006.04 2006.05 2006.06
2006.03 2006.04
2006.05 2006.06
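
Cutting at positions is a convenient way to split a string into fields of known width. A minimal sketch, assuming the illustrative name s and cut positions chosen by hand:

```q
s:"Now is the time"
0 4 7 _ s               / ("Now ";"is ";"the time")
count each 0 4 7 _ s    / 4 3 8
```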

When the right operand of _ is a dictionary (source) and the left operand is a list of key values whose type matches source, the result is a dictionary obtained by removing the specified key-value pairs from source.

For example,

        d:1 2 3!`a`b`c
        (enlist 42) _ d
1| a
2| b
3| c
 
        (enlist 2) _ d
1| a
3| c
 
        1 3 _ d
2| b
 
        (enlist 32) _ d
1| a
2| b
3| c
 
        1 2 3 _ d
_

Note: The left operand must be a list, so a single key value must be enlisted.

When the first argument of dyadic delete ( _ ) is a list or a dictionary (source) and the second argument is a position in the list or an item in the domain of the dictionary, the result is a new entity obtained by deleting the specified item from source.

        L: 101 102 103 104 105
        L _2
101 102 104 105
 
        d:`a`b`c`d!101 102 103 104
        d _ `b
a| 101
c| 103
d| 104

Since a table is a list, delete can be applied by row number.

        t:([]c1:1 2 3;c2:101 102 103;c3:`x`y`z)
        t
c1 c2  c3
---------
1  101 x
2  102 y
3  103 z
 
        t _ 1
c1 c2  c3
---------
1  101 x
3  103 z

Since a keyed table is a dictionary, delete can be applied by key value.

        kt:([k:101 102 103]c:`one`two`three)
        kt
k  | c
---| -----
101| one
102| two
103| three
 
         kt _ 102
k  | c
---| -----
101| one
103| three

desc

The monadic function desc operates on a list or a dictionary (source). The result of desc on a list is a list comprising the items of source sorted in decreasing order. The result of desc on a dictionary is an equivalent mapping with the range items sorted in decreasing order. Unlike asc, desc does not apply the `s# attribute, since that attribute marks ascending order.

        desc 3 7 2 8 1 9
9 8 7 3 2 1
 
        desc `b`c`a!3 2 1
b| 3
c| 2
a| 1

distinct

The monadic function distinct returns the distinct entities in its argument. For a list, it returns the distinct items in the list, in order of first occurrence.

        distinct 1 2 3 2 3 4 6 4 3 5 6
1 2 3 4 6 5

For a table, distinct returns a table comprising the distinct records of the argument, in the order of first occurrence.

        tdup:([]a:1 2 3 2 1; b:`washington`adams`jefferson`adams`wasington)
        tdup
a b
------------
1 washington
2 adams
3 jefferson
2 adams
1 wasington
 
        distinct tdup
a b
------------
1 washington
2 adams
3 jefferson
1 wasington

Observe that all fields of the records must be identical for the records to be considered identical. Put otherwise, if any field differs, the records are distinct.
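
Combining distinct with count gives a quick test for duplicates. A minimal sketch, with the illustrative name L:

```q
L:1 2 3 2 3 4 6 4 3 5 6
count distinct L                / 6
(count L)=count distinct L      / 0b, so L contains duplicates
```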

When applied to an int n, distinct produces a random int between 0 (inclusive) and n (exclusive).

        distinct 42
37
        distinct 42
39

drop (_)

The symbol _ is overloaded to have several meanings depending on the signature of its operands. See also delete.

Note: When _ is used as an operator, whitespace is required to the left if the left operand is a name. This is because _ is a valid non-initial name character. Whitespace is permitted but not required to the right.

When the first argument of the dyadic _ is an int and the second argument (source) is a list, the result is a new list created via removal from source. A positive int in the first argument indicates that the removal occurs from the beginning of source, whereas a negative int in the first argument indicates that the removal occurs from the end of source.

The source can be a list, a dictionary, a table or a keyed table.

        2_10 20 30 40
30 40
 
        -3_`one`two`three`four`five
`one`two
 
         2_`a`b`c`d!10 20 30 40
c| 30
d| 40
 
        -1_([] a:10 20 30 40; b:1.1 2.2 3.3 4.4)
a  b
------
10 1.1
20 2.2
30 3.3
 
        2_([k:10 20 30] c:`one`two`three)
k | c
--| -----
30| three

The result of drop is of the same type and shape as source and is never a scalar.

       1_42 67
,67

Observe that for nested lists, the deletion occurs at the top-most level.

        1_(100 101 102;103 104 105)
103 104 105
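
Drops from both ends can be chained, since evaluation proceeds from right to left. A minimal sketch, with illustrative values:

```q
L:10 20 30 40
1 _ -1 _ L          / 20 30: the last item is dropped first, then the first
-2 _ "hello.q"      / "hello": trims a two-character suffix
```

Note the whitespace to the left of each _ whose left operand could otherwise be read as part of a name.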

In the degenerate case, the result is an empty entity derived from source.

        4_10 20 30 40
`int$()
 
        4_`a`b`c`d!10 20 30 40
        4_([] a:10 20 30 40; b:1.1 2.2 3.3 4.4)
a b
--
 
        3_([k:10 20 30] c:`one`two`three)
k| c
-| -

eval

The monadic eval evaluates a list that represents a valid q parse tree, which can be produced by parse or by hand (if you know what you're doing). A discussion of parse trees is beyond the scope of this manual.

        show pt:parse "a:6*7"
:
`a
(*;6;7)
 
        eval pt
42
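
For simple arithmetic, a parse tree can be written by hand as a list whose first item is the operator and whose remaining items are the operands. A minimal sketch:

```q
eval (+;1;2)            / 3
eval (*;6;(+;3;4))      / 42: the inner tree is evaluated first
eval parse "6*3+4"      / 42: same computation, tree produced by parse
```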

except

The dyadic except takes a simple list or a dictionary whose range is a simple list as its first argument (target) and returns a list containing the items of target excluding those that are in its second argument, which can be a scalar or a list. The returned items keep the order in which they occur in target.

        1 2 3 4 3 2 except 2
1 3 4 3
 
        1 2 3 4 3 2 except 1 2 10
3 4 3
 
        "Now is the time_" except "_"
"Now is the time"
 
        d:`a`c`d`e!1 2 1 2
        d except 1
2 2

The result of except is never a scalar.

        1 2 except 1
,2
 
        1 2 except 2 1
`int$()
 
        d except 1 2
`int$()
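
A typical use of except is to strip unwanted characters or values in one step. A minimal sketch with illustrative data:

```q
"a,b;c" except ",;"     / "abc"
1 2 3 4 except 2 4      / 1 3
```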

exit

The monadic exit takes an int as its argument and executes the system command \\ with the specified parameter.

Warning: exit does not prompt for confirmation.

fill (^)

The dyadic fill ( ^ ) takes an atom as its first argument and a list or dictionary (target) as its second argument. For a list, it returns a list obtained by substituting the first argument for every occurrence of null in target. It operates on the range of a dictionary.

        42^1 2 3 0N 5 0N
1 2 3 42 5 42
 
        ";"^"Now is the time"
"Now;is;the;time"
 
        `NULL^`First`Second``Fourth
`First`Second`NULL`Fourth
 
        d:`a`b`c`d!100 0N 200 0N
        42^d
a| 100
b| 42
c| 200
d| 42

Observe that the action of fill is recursive - i.e., it is applied to sublists of the target.

        42^(1;0N;(100;200 0N))
1
42
(100;200 42)

find (?)

When the first argument (target) of find ( ? ) is a simple list, find is atomic in the second argument (source) and returns the positions in target of the initial occurrence of each item of source.

The simplest case is when source is a scalar.

         100 99 98 87 96?98
2
        "Now is the time"?"t"
7

If source is not found in target, find returns the count of target - i.e., the position one past the last element.

        `one`two`three?`four
3

In this context, find is atomic in its second argument, so it is extended item-wise to a source list.

        "Now is the time"?"the"
7 8 9

Note that find always returns the position of the first occurrence of each atom.

        "Now is the time"?"time"
7 4 13 9
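
Because a missing item yields the count of target, find pairs naturally with a value list that has one extra item appended as a default. A minimal sketch of this lookup idiom; the names ks and vs are hypothetical:

```q
ks:`a`b`c
vs:10 20 30
ks?`b`z`a               / 1 3 0: `z is not found, so its position is count ks
(vs,0N) ks?`b`z`a       / 20 0N 10: position 3 picks up the appended null
```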

When the first argument (target) of find is a general list, find considers both arguments to be general lists and attempts to locate the second argument (source) in target, returning the position where it is found or the count of target if not found.

        (1 2;3 4)?3 4
1

Observe that find only compares items at the top level of the two arguments and does not look for nested items,

        ((0;1 2);3 4;5 6)?1 2
3
       ((0;1 2);3 4;5 6)?(1;(2;3 4))
3

When the first argument (target) of find is a dictionary, find represents reverse lookup and is atomic in the second argument (source). In other words, find returns the domain item mapping to source if source is in the range, or a null appropriate to the domain type otherwise.

        d:1 2 3!100 101 102
        d
1| 100
2| 101
3| 102
 
        d?101
2
 
        d?99
0N
 
        d?102 100
3 1

When the first argument (target) of find is a table and the second argument (source) is a record of the target, find returns the position of source if it is in target, or the count of target otherwise.

        t:([] a:1 2 3; b:`a`b`c)
        t
a b
---
1 a
2 b
3 c
        t?`a`b!(2;`b)
1

As usual with records, you can abbreviate the record to its row values.

        t?(3;`c)
2

When the first argument of find is a keyed table, since a keyed table is a dictionary, find performs a reverse lookup on a record from the value table.

        kt:([k:1 2 3] c:100 101 102)
        kt
k| c
-| ---
1| 100
2| 101
3| 102
 
        kt?(enlist `c)!enlist 101
k| 2

Again, a record of the value table can be abbreviated to its row value(s).

        kt?102
k| 3

flip

The monadic function flip takes a rectangular list, a column dictionary or a table as its argument (source). The result is the transpose of source.

When source is a rectangular list, the items are rearranged, effectively reversing the first two indices in indexing at depth. For example,

        L:(1 2 3; (10 20; 100 200; 1000 2000))
        L
1         2         3
10   20   100  200  1000 2000
 
        L[1;0]
10 20
 
        fL:flip L
        fL
1 10 20
2 100 200
3 1000 2000
 
        fL[0;1]
10 20
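
For a plain matrix of atoms, flip is the familiar transpose, and applying it twice recovers the original. A minimal sketch, with the illustrative name m:

```q
m:(1 2 3;4 5 6)
flip m              / (1 4;2 5;3 6)
m~flip flip m       / 1b
```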

When source is a singleton list whose item is a simple list, flip creates a vertical list.

        flip enlist 101 103
101
103

This idiom is used to index multiple key values into keyed tables.

        kt:([k:101 102 103] c:`one`two`three)
        kt flip enlist 101 103
c
-----
one
three

When source is a column dictionary, the result is a table with the given column names and values. Row and column access are effectively reversed, but no data is rearranged.

        d:`a`b`c!(1 2 3;1.1 2.2 3.3;("one";"two";"three"))
        d
a| 1     2     3
b| 1.1   2.2   3.3
c| "one" "two" "three"
 
        d[`b;0]
1.1
 
        t:flip d
        t
a b   c
-------------
1 1.1 "one"
2 2.2 "two"
3 3.3 "three"
 
        t[0;`b]
1.1

When source is a table, the result is the underlying column dictionary. Row and column access are effectively reversed, but no data is rearranged.

        t:([]a:1 2 3;b:1.1 2.2 3.3;c:("one";"two";"three"))
        t
a b   c
-------------
1 1.1 "one"
2 2.2 "two"
3 3.3 "three"
 
        t[1;`c]
"two"
 
        d:flip t
        d
a| 1     2     3
b| 1.1   2.2   3.3
c| "one" "two" "three"
 
        d[`c;1]
"two"

getenv

The monadic function getenv takes a symbol argument representing the name of an OS environment variable and returns the value (if any) of that environment variable.

        getenv `SHELL
"/bin/bash"

group

The monadic function group operates on a list (source) and returns a dictionary in which each distinct item in source is mapped to a list of the indices of its occurrences in source. The items in the domain of the result are in the order of their first appearance in source.

        group "i miss mississippi"
i| 0 3 8 11 14 17
 | 1 6
m| 2 7
s| 4 5 9 10 12 13
p| 15 16

This can be used to extract specific information about the occurrences, such as,

        dm:group "i miss mississippi"
        count each dm
i| 6
 | 2
m| 2
s| 6
p| 2
        first each dm
i| 0
 | 1
m| 2
s| 4
p| 15
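
The same building blocks give a frequency table and the most frequent item. A minimal sketch, assuming the hypothetical names L and freq:

```q
L:`a`b`a`c`b`a
freq:count each group L     / `a`b`c!3 2 1
first key desc freq         / `a, the most frequent item
```

Here desc sorts the dictionary by its range in decreasing order, so the first domain item is the one with the highest count.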

iasc

The monadic function iasc operates on a list or a dictionary (source). Considering source as a mapping, the result of iasc is a list comprising the domain items arranged in increasing order of their associated range items. Put otherwise, retrieving the items of source in the order specified by iasc sorts source in ascending order.

        L:3 7 2 8 1 9
        iasc L
4 2 0 1 3 5
 
        L[iasc L]
1 2 3 7 8 9
 
        d:`b`c`a!3 2 1
        iasc d
`a`c`b
 
        d[iasc d]
1 2 3
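
The indices returned by iasc can also reorder a second, parallel list, which sorts one list by another. A minimal sketch with hypothetical data:

```q
names:`carol`alice`bob
ages:35 28 41
iasc ages               / 1 0 2
names iasc ages         / `alice`carol`bob, ordered by increasing age
```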

identity

The monadic function denoted by double colon ( :: ) is the identity function, meaning that the return value is the same as the argument.

        ::[42]
42
 
        ::[`zaphod]
`zaphod
 
        ::["Life the Universe and Everything"]
"Life the Universe and Everything"

Note: The identity function cannot be used with juxtaposition or @. Its argument must be enclosed in brackets.

        :: 42
'

idesc

The monadic function idesc operates on a list or a dictionary (source). Considering source as a mapping, the result of idesc is a list comprising the domain items arranged in decreasing order of their associated range items. Put otherwise, retrieving the items of source in the order specified by idesc sorts source in descending order.

        L:3 7 2 8 1 9
        idesc L
5 3 1 0 2 4
 
        L[idesc L]
9 8 7 3 2 1
 
        d:`b`c`a!3 2 1
        idesc d
`b`c`a
        d[idesc d]
3 2 1

in

The dyadic function in is atomic in its first argument (source) and takes a second argument (target) that is an atom or list. It returns a boolean result that indicates whether source appears in target. The comparison is strict with regard to type.

        3 in 8
0b
 
        42 in 0 6 7 42 98
1b
 
        "cat" in "abcdefg"
110b
 
        `zap in `zaphod`beeblebrox
0b
 
        2 in 0 2 4j
'type
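
Since in returns a boolean per item, it combines with where to filter a list. A minimal sketch with illustrative values:

```q
L:0 6 7 42 98
L where L in 6 42 100       / 6 42
L where not L in 6 42 100   / 0 7 98
```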

inter

The dyadic inter can be applied to lists, dictionaries and tables. It returns an entity of the same type as its arguments, containing those elements of the first argument that appear in the second argument.

        1 1 2 3 inter 1 2 3 4
1 1 2 3
 
       "ab cd " inter " bc f"
"b c "

Note: Lists are not sets and the operation of inter on lists is not identical to intersection of sets. In particular, the result of inter does not comprise the distinct items common to the two arguments. One consequence is that the expression,

        (x inter y)~y inter x

is not true in general.
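
A small case in which the order of the operands matters, using the list from the first example (the names x and y are only illustrative):

```q
x:1 1 2 3
y:1 2 3 4
x inter y               / 1 1 2 3: the duplicate 1 in x is kept
y inter x               / 1 2 3
(x inter y)~y inter x   / 0b
```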

When applied to dictionaries, inter returns the set of common range items that are mapped from the same domain items.

        d1:1 2 3!100 200 300
        d2:2 4 6!200 400 600
        d1 inter d2
,200

Tables that have the same columns can participate in inter. The result is a table with the records that are common to the two tables.

        t1
a b
--------
1 first
2 second
3 third
 
        t2
a b
--------
2 second
4 fourth
6 sixth
 
        t1 inter t2
a b
--------
2 second

join (,)

The dyadic join ( , ) can take many different combinations of arguments.

When both operands are either lists or atoms, the result is a list with the item(s) of the left operand followed by the item(s) of the right operand.

        2,3
2 3
 
        `a,`b`c
`a`b`c
 
        "xy","yz"
"xyyz"
 
        1.1 2.2,3 4
1.1
2.2
3
4

Observe that the result is a general list unless all items are of a homogeneous type.

When both operands are dictionaries, the result is the merge of the dictionaries using upsert semantics. The domain of the result is the (set theoretic) union of the two domains. Range assignment of the right operand prevails on common domain items.

        d1:1 2 3!`a`b`c
        d2:3 4 5!`cc`d`e
        d1,d2
1| a
2| b
3| cc
4| d
5| e

When both operands are tables having the same column names and types, the result is a table in which the records of the right operand are appended to those of the left operand.

        t1:([]a:1 2 3;b:`x`y`z)
        t1
a b
---
1 x
2 y
3 z
 
        t2:([]a:3 4;b:`yy`z)
        t2
a b
----
3 yy
4 z
 
        t1,t2
a b
----
1 x
2 y
3 z
3 yy
4 z

When both operands are keyed tables having the same key and value columns, the result is a keyed table in which the records of the left operand are upserted with those of the right operand.

        kt1:([k:1 2 3]v:`a`b`c)
        kt1
k| v
-| -
1| a
2| b
3| c
 
        kt2:([k:3 4]v:`cc`d)
        kt2
k| v
-| --
3| cc
4| d
 
        kt1,kt2
k| v
-| --
1| a
2| b
3| cc
4| d

join-each(,')

The verb join ( , ) can be combined with the adverb each-both ( ' ) to yield join-each ( ,' ), which can be used on lists, dictionaries or tables.

List operands must have the same count.

        L1:1 2 3
        L2:`a`b`c
        L1,'L2
1 `a
2 `b
3 `c

As always with dictionaries, the operation occurs along the common domain items, with null extension elsewhere.

        d1:1 2 3!10 20 30
        d2:2 3 4!`a`b`c
        d1,'d2
1| 10 `
2| 20 `a
3| 30 `b
4| 0N `c

For two tables with the same count of records, join-each results in a column join (see Column Join), in which columns with non-common names are juxtaposed and overlapping columns are upserted.

        t1:([]c1:1 2 3;c2:1.1 2.2 3.3)
        t1
c1 c2
------
1  1.1
2  2.2
3  3.3
 
        t2:([]c2:`a`b`c;c3:100 200 300)
        t2
c2 c3
------
a  100
b  200
c  300
 
        t1,'t2
c1 c2 c3
---------
1  a  100
2  b  200
3  c  300

Note: When join-each is used in a select, it must be enclosed in parentheses to avoid the comma being interpreted as a separator.

       select j:(c1,'c2) from t1
j
-----
1 1.1
2 2.2
3 3.3

list

The function list replaces plist. It takes a variable number of arguments and returns a list whose items are the arguments. It is useful for creating lists programmatically.

Note: Unlike user-defined functions, the number of arguments to list is not restricted to eight.

For example,

        list[6;7;42;`Life;"The Universe"]
6
7
42
`Life
"The Universe"
 
        list[1;2;3;4;5;6;7;8;9;10]
1 2 3 4 5 6 7 8 9 10

null

The atomic function null takes a list (source) and returns a boolean list comprising the result of testing each item in source against null.

        null 1 2 3 0N 5 0N
000101b
 
        null `a`b``d```f
0010110b

Since null is atomic, it is applied recursively to sublists.

        null (1 2;3 0N)
00b
01b

It is useful to combine where with null to obtain the positions of the null items.

         where null 1 2 3 0N 5 0N
3 5
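
The same combination counts the nulls or removes them. A minimal sketch, with the illustrative name L:

```q
L:1 2 3 0N 5 0N
count where null L      / 2 nulls
L where not null L      / 1 2 3 5
```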

When applied to a dictionary (source), null returns a dictionary in which each item in the source range is replaced with the result of testing the item against null.

        null 1 2 3!100 0N 300
1| 0
2| 1
3| 0

The action of null on a table (source) is explained by recalling that the table is a flipped column dictionary. Based on the action of null on a dictionary, we expect the result of null on a table will be a new table in which each column value in the source is replaced with the result of testing the value against null.

        tnull:([]a:1 0N 3; b:0N 200 300)
        null tnull
a b
---
0 1
1 0
0 0

Similarly, we expect null to operate on a keyed table by returning a result keyed table whose value table entries are the result of testing those of the argument against null.

        ktnull:([k:101 102 103] v:`first``third)
        null ktnull
k  | v
---| ---
101| 0
102| 1
103| 0

parse

The monadic function parse takes a string argument containing a valid q expression and returns a list containing the corresponding parse tree. Applying the function eval to the result will evaluate it. A discussion of q parse trees is beyond the scope of this tutorial.

        .Q.s1 parse "a:6*7"
"(:;`a;(*;6;7))"
        eval parse "a:6*7"
42

Note: It is useful to apply parse to a query template in order to discover its functional form. The result is not always exactly the functional form, especially for exec, but a little experimenting will lead to the correct form.

        t:([]c1:`a`b`a; c2:1 2 3)
        select c2 by c1 from t
c1| c2
--| ---
a | 1 3
b | ,2
 
        parse "select c2 by c1 from t"
?
`t
()
(,`c1)!,`c1
(,`c2)!,`c2
 
        ?[t;();(enlist `c1)!enlist `c1;(enlist `c2)!enlist `c2]
c1| c2
--| ---
a | 1 3
b | ,2
 
        exec c2 by c1 from t
a| 1 3
b| ,2
 
        parse "exec c2 by c1 from t"
?
`t
()
,`c1
,`c2
 
        ?[t;();`c1;`c2]
a| 1 3
b| ,2

rand (?)

The dyadic function rand ( ? ) is overloaded to have different meanings. In the case where both arguments are numeric scalars, ? returns a list of random numbers. More specifically, the first argument must be of integer type, and the second argument can be any numeric value. In this context, ? returns a list of pseudo-random numbers of count given by the first argument.

In case the second argument is a positive number of floating point type and the first argument is positive, the result is a list of random floats selected with replacement from the range between 0 (inclusive) and the second argument (exclusive).

        5?4.2
3.778553 1.230056 1.572286 0.517468 0.07107598
 
        4?1.0
0.5274765 0.5435815 0.4611484 0.7493561

In case the second argument is of integer type and the first argument is positive, the result is a list of random integers selected with replacement from the range between 0 (inclusive) and the second argument (exclusive).

        10?5
1 2 0 3 4 4 4 0 3 1
 
        10?5
0 2 1 0 2 4 2 3 4 0
 
        1+10?5
4 2 3 3 3 2 1 1 5 3

The last example shows how to select random integers between 1 and 5. More generally, for integers i and j, where i<j, and any positive integer n, the idiom,

       i+n?j+1-i

selects n random integers between i and j inclusive.

        i:3
        j:7
        n:10
        i+n?j+1-i
3 4 5 7 7 5 4 4 7 4

In case the second argument is of integer type and the first argument is negative, the result is a list of random integers selected without replacement from the range between 0 (inclusive) and the second argument (exclusive). Since the selected values are not replaced, the absolute value of the first argument cannot exceed the second argument,

        -3?5
2 3 0
 
       -5?5
4 1 2 0 3
 
       -6?5
'length
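
Selecting all n values without replacement yields a random permutation of the indices, which can be used to shuffle a list. A minimal sketch; the names deck, n and shuffled are hypothetical, and no output is shown for the shuffle itself because it varies from run to run.

```q
deck:`a`b`c`d`e
n:count deck
shuffled:deck[neg[n]?n]     / index by a random permutation of 0..n-1
(asc shuffled)~asc deck     / 1b: same items, possibly in a different order
```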

raze

The monadic raze takes a list or dictionary (source) and returns the entity derived from the source by eliminating the top-most level of nesting.

        raze (1 2;`a`b)
1
2
`a
`b

One way to envision the action of raze is to write the source list in general form, then remove the parentheses directly beneath the outer-most enclosing pair.

        raze ((1;2);(`a;`b))
1
2
`a
`b

Observe that raze only removes the top-most level of nesting and does not apply recursively to sublists.

        raze ((1 2;3 4);(5;(6 7;8 9)))
1 2
3 4
5
(6 7;8 9)
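
To flatten completely, raze can be applied repeatedly until the result stops changing, which the adverb over ( / ) does when given a monadic function. A minimal sketch, with the illustrative name nested:

```q
nested:((1 2;3 4);(5;(6 7;8 9)))
raze nested         / (1 2;3 4;5;(6 7;8 9)): one level removed
(raze/) nested      / 1 2 3 4 5 6 7 8 9: repeated until nothing changes
```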

If source is not nested, the result is the source.

        raze 1 2 3 4
1 2 3 4

When raze is applied to an atom, the result is a list.

        raze 42
,42

When raze is applied to a dictionary, the result is raze applied to the range.

        dd:`a`b`c!(1 2; 3 4 5;6)
        raze dd
1 2 3 4 5 6

reshape(#)

When the first argument of the dyadic reshape ( # ) is a list (shape) of two positive int, the result reshapes the source into a rectangular list according to shape. Specifically, the count of the result in dimension i is given by the item in position i in shape. The elements are taken from the beginning of the source.

A simple example makes this clear.

        2 3#1 2 3 4 5 6
1 2 3
4 5 6

As in the case of take, if the number of elements in the source exceeds what is necessary to form the result, trailing elements are ignored.

          2 2#`a`b`c`d`e`f`g`h
a b
c d

Similarly, if the number of elements in the source is less than necessary to form the result, the extraction resumes from the initial item of the source; this process is repeated until the result is complete.

       5 4#"Now is the time"
"Now "
"is t"
"he t"
"imeN"
"ow i"

It is possible to create a ragged array with a given number of columns by using 0N as the number of rows with the reshape operator ( # ).

        0N 3#til 10
0 1 2
3 4 5
6 7 8
,9

reverse

The monadic reverse inverts the order of the constituents of its argument. In the case of an atom, it simply returns the argument.

        reverse 42
42

In the case of a list, the result is a list in which the items are in reverse order of the argument.

        reverse 1 2 3 4 5
5 4 3 2 1

For nested lists, the reversal takes place only at the topmost level.

        reverse (1 2 3; "abc"; `Four`Score`and`Seven)
`Four`Score`and`Seven
"abc"
1 2 3

In the case of an empty list, reverse returns the argument.

        reverse ()
()

In the case of a dictionary, reverse inverts both the domain and range lists.

        reverse`a`b`c!1 2 3
c| 3
b| 2
a| 1

Since a table is a list of records, reverse inverts the order of the records.

        t:([] c1:`a`b`c; c2:1 2 3)
        t
c1 c2
-----
a  1
b  2
c  3
        reverse t
c1 c2
-----
c  3
b  2
a  1

Since a keyed table is a dictionary, reverse inverts both the domain and range tables, effectively inverting the row order.

       kt
k| c
-| ---
1| 100
2| 101
3| 102
 
        reverse kt
k| c
-| ---
3| 102
2| 101
1| 100

sublist

The dyadic function sublist retrieves a sublist of contiguous items from a list. The left operand is a simple list of two ints: the first item is the starting index (start); the second item is the number of items to retrieve (count). The right operand (target) is a list or dictionary.

If target is a list, the result is a list comprising count items from target beginning at index start.

        L:1 2 3 4 5
        1 3 sublist L
2 3 4
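
For a range that lies entirely within the list, sublist agrees with indexing by start+til count. A minimal sketch that checks this for the example above:

```q
L:1 2 3 4 5
L[1+til 3]                      / 2 3 4
(1 3 sublist L)~L[1+til 3]      / 1b
```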

If target is a dictionary, the result is a dictionary whose domain comprises count items from the target domain beginning at index start, and whose range is the corresponding items in the target range.

        d:`a`b`c`d`e!1 2 3 4 5
        1 3 sublist d
b| 2
c| 3
d| 4

Since a table is a list of records, sublist applies to the rows of a table.

        t:([]c1:`a`b`c`d`e;c2:1 2 3 4 5)
        1 3 sublist t
c1 c2
-----
b  2
c  3
d  4

Since a keyed table is a dictionary, sublist is applied to the key table.

         kt:([k:`a`b`c`d`e]c1:1 2 3 4 5)
         1 3 sublist kt
k| c1
-| --
b| 2
c| 3
d| 4

system

The monadic system takes a string argument and executes it as a q command, if recognized, or as an OS command otherwise. The function system is equivalent to \ but can be more convenient or readable in situations such as remote or programmatic execution in which the backslashes must be escaped.

The following changes the current working directory to its parent directory.

        system "cd .."

take (#)

When the left operand of take ( # ) is an int atom, it creates a new entity via extraction from its right operand (source) as specified by the first operand. A positive integer in the first operand indicates that the extraction occurs from the beginning of source, whereas a negative integer in the first operand indicates that the extraction occurs from the end of source.

The source can be an atom, a list, a dictionary, a table or a keyed table.

        2#3
3 3
 
       -1#10 20 30 40
,40
 
        -2#`a`b`c`d!10 20 30 40
c| 30
d| 40
 
        3#([] a:10 20 30 40; b:1.1 2.2 3.3 4.4)
a  b
------
10 1.1
20 2.2
30 3.3
 
        1#([k:10 20 30] c:`one`two`three)
k | c
--| ---
10| one

The result of take is of the same type and shape as the source, except the result is never a scalar.

        1#42
,42

If the number of elements in source exceeds what is necessary to form the result, trailing elements are ignored.

        4#`a`b`c`d`e`f`g`h
`a`b`c`d

If the number of elements in source is less than necessary to form the result, the extraction resumes from the starting point of the source list; this process is repeated until the result is filled.

        5#98 99
98 99 98 99 98
 
        -7#`a`b`c
`c`a`b`c`a`b`c

In the degenerate case, the result is an empty entity with the same type as the source. This is an effective way to obtain the schema of a q dictionary or list.

        0#42
`int$()
 
        0#10 20 30 40
`int$()
 
        0#`a`b`c`d!10 20 30 40
_
 
        0#([] a:10 20 30 40; b:1.1 2.2 3.3 4.4)
a b
---
 
        0#([k:10 20 30] c:`one`two`three)
k| c
-| -

Note: Since the result of 0# on a list is always a list, we can use this construct as shorthand to initialize an empty value column with a definite type in a table definition. This ensures that only values of the specified type can be inserted into the column. For example,

        ([] a:0#0; b:0#`)
a b
---

defines an empty table whose first column is of type int and whose second column is of type symbol.

When the left operand of # is a list of symbol column names and the right operand is a table, the result is the table obtained by extracting the specified columns from the table.

        t:([] c1:`a`b`c; c2:1 2 3; c3:1.1 2.2 3.3)
        `c1`c3#t
c1 c3
------
a  1.1
b  2.2
c  3.3

When the left operand of # is a table (keys) and the second operand is a keyed table whose key table contains keys, the result is the keyed table corresponding to those values in keys.

        ktc:([lname:`Dent`Beeblebrox`Prefect; fname:`Arthur`Zaphod`Ford] iq:98 42 126)
        ktc
lname      fname | iq
-----------------| ---
Dent       Arthur| 98
Beeblebrox Zaphod| 42
Prefect    Ford  | 126
 
        K:([] lname:`Dent`Prefect; fname:`Arthur`Ford)
        K#ktc
lname   fname | iq
--------------| ---
Dent    Arthur| 98
Prefect Ford  | 126

til

The monadic til returns a list of the integers from 0 to n-1, where its argument n is a non-negative integer.

        til 4
0 1 2 3

The result of til is always a list of int. So,

        til 1
,0
 
        til 0
`int$()

Generating sequences is simple with til.

 
        2*til 10               / evens
0 2 4 6 8 10 12 14 16 18
 
        1+2*til 10     / odds
1 3 5 7 9 11 13 15 17 19
 
        20+til 5
20 21 22 23 24
 
        0.5*til 10
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

The function til is useful for extracting a sublist from a list. The idiom,

        L[i+til n]

extracts from the list L the sublist of length n starting with the element in position i. For example,

        L:10 20 30 40 50 60 70
        i:2
        n:3
 
        L[i+til n]
30 40 50

Similarly, the idiom

        L[i+til j+1-i]

extracts the sublist from positions i through j, inclusive. With L and i as above,

        i:2
        j:5
        L[i+til j+1-i]
30 40 50 60

Note: In the second idiom, omitting the increment-by-one retrieves one fewer item than you probably intend. This is an easy error to make.

These idioms are useful for extracting substrings.

        s:"abcdefg"
        i:1
        n:2
        j:4
        s[i+til n]
"bc"
 
        s[i+til j+1-i]
"bcde"

Note: You can use the built-in function sublist to retrieve substrings.
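The two idioms can be wrapped as small functions and compared with sublist. This is a sketch only; takeN and takeRange are hypothetical helper names, not built-ins.

```q
L:10 20 30 40 50 60 70
takeN:{[lst;i;n] lst[i+til n]}          / n items starting at position i
takeRange:{[lst;i;j] lst[i+til 1+j-i]}  / positions i through j, inclusive
takeN[L;2;3]                            / 30 40 50
takeRange[L;2;5]                        / 30 40 50 60
2 3 sublist L                           / 30 40 50 (start 2, length 3)
```

One difference worth keeping in mind: indexing past the end of the list yields nulls, whereas sublist stops at the end of the list.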

The expression,

        n = count til n

is true for every n ≥ 0. Similarly, the expression,

        L~L[til count L]

is true for every list L. Both expressions remain valid in the degenerate case of the empty list.
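Both identities can be spot-checked in a session. The sample values below, and the names L and E, are arbitrary choices for this sketch.

```q
all {x=count til x} each 0 1 5 100    / 1b, including the n=0 case
L:10 20 30
L~L[til count L]                      / 1b
E:`int$()                             / an empty typed list
E~E[til count E]                      / 1b, the degenerate case
```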

ungroup

The monadic ungroup can be applied to a keyed table that is the result of a select with grouping or of the xgroup function. The result will have the selected records in the same format as the original table but they may be in a different order since they will be sorted by the grouping column(s).

Using the distribution example,

        sp
s  p  qty
---------
s1 p1 300
s1 p2 200
s1 p3 400
s1 p4 200
s4 p5 100
s1 p6 100
s2 p1 300
s2 p2 400
s3 p2 200
s4 p2 200
s4 p4 300
s1 p5 400
 
        ungroup select s, qty by p from sp
p  s  qty
---------
p1 s1 300
p1 s2 300
p2 s1 200
p2 s2 400
p2 s3 200
p2 s4 200
p3 s1 400
p4 s1 200
p4 s4 300
p5 s4 100
p5 s1 400
p6 s1 100

Note: You can apply ungroup to a keyed table that did not arise from a group operation, but it must have the correct form or an error will result.
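The round trip through xgroup can be sketched on a small table. The table ts and the names g and u are made up for this illustration; the rows are already ordered by p, so the flattened result is expected to match the original.

```q
ts:([] p:`p1`p1`p2; s:`s1`s2`s1; qty:300 300 200)
g:`p xgroup ts       / keyed on p, with nested s and qty columns
u:ungroup g          / flattened back to one row per record
u~ts                 / 1b here, since ts was already ordered by p
```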

union

The dyadic union can be applied to lists and tables. It returns an entity of the same type as its arguments containing the distinct elements from both arguments.

        1 union 2 3
1 2 3
 
        1 2 union 2 3
1 2 3
 
        1 1 3 union 1 2 3 1
1 3 2
 
        "a good time" union "was had by all"
"a godtimewshbyl"

Observe that the items of the first argument appear first in the result.
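One way to read these results is that union behaves like joining the two arguments and keeping the first occurrence of each item. A short sketch checking that reading against the examples above:

```q
(1 1 3 union 1 2 3 1) ~ distinct 1 1 3,1 2 3 1                                    / 1b
("a good time" union "was had by all") ~ distinct "a good time","was had by all"  / 1b
```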

Tables that have the same columns can participate in union. The result is a table with the distinct records from the combination of the two tables.

        t1:([] a:1 2 3 4; b:`first`second`third`fourth)
        t2:([] a:2 4 6; b:`dos`cuatro`seis)
        t1
a b
--------
1 first
2 second
3 third
4 fourth
 
        t2
a b
--------
2 dos
4 cuatro
6 seis
 
        t1 union t2
a b
--------
1 first
2 second
3 third
4 fourth
2 dos
4 cuatro
6 seis

Note: As of this writing (Jun 2007), union does not apply to dictionaries or keyed tables.

value

The function value has two uses. When applied to a dictionary, value returns the range of the dictionary.

        d:`a`b`c!1 2 3
        value d
1 2 3

Logically enough, for a keyed table, value returns the value table.

        kt:([k:101 102 103] c1:`a`b`c)
        kt
k  | c1
---| --
101| a
102| b
103| c
 
        value kt
c1
--
a
b
c       

When value is applied to a string, it passes the string to the q interpreter and returns the result.

        value "6*7"
42
 
        value "{x*x} til 10"
0 1 4 9 16 25 36 49 64 81
        
        z:98.6
        value"z"
98.6
 
        value "a:6;b:7;c:a*b"
        a
6
 
        b
7
 
        c
42

Note: This use of the value function is a powerful feature that allows q code to be written and executed on the fly. If abused, it can quickly lead to unmaintainable code. (The spellchecker suggests "unmentionable" instead of "unmaintainable." How did it know?)
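When strings are evaluated on the fly, it can help to trap errors rather than let them abort the caller. A minimal sketch using protected evaluation; safeValue is a hypothetical helper name, not a built-in.

```q
/ safeValue is a hypothetical helper name, not a built-in
safeValue:{[str] @[value; str; {"error: ",x}]}
safeValue "6*7"         / 42
safeValue "1+`a"        / an error message string instead of an abort
```

This only catches evaluation errors; it does nothing to restrict what the evaluated string is allowed to do.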

A common use of value is to convert a symbol or string containing the name of a q entity into the value associated with the entity.

        a:42
        s:`a
        value `a
42
 
        value s
42
 
        value "a"
42

where

The monadic where has multiple uses, depending on the type of its argument.

When the argument is a boolean list, where returns a list of int comprising the positions in the argument having value 1b.

        where 00110101b
2 3 5 7

This is useful when the boolean list is generated by a test on a list.

        L:"Now;is;the;time"
 
        where L=";"
3 6 10
 
        L[where L=";"]:" "
        L
"Now is the time"

Note: The behavior of the where phrase in the select template is related to the where function on a boolean list. The former limits the selection to table rows in those positions where the value of the where expression is not zero. Since the expression involves test(s) on column value(s), the where phrase effectively selects the rows satisfying its column condition, just as in SQL. See The where Phrase for more on the where phrase.

When the argument s of where is a list of non-negative int, the result is a list of int comprising the items 0, ..., -1+count s, in which each index i is repeated s[i] times.

For example,

        where 2 1 3
0 0 1 2 2 2
        where 4 0 2
0 0 0 0 2 2
        where 4#1
0 1 2 3

Note: The behavior of where on an int list reduces to that on a boolean list by considering the boolean values as ints.
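The repetition rule can be written out by hand and compared with where. This is a sketch of the described behavior, not a claim about how where is implemented; s is an arbitrary sample list.

```q
s:2 1 3
(where s) ~ raze s #' til count s            / 1b: index i is taken s[i] times
(where 4 0 2) ~ raze 4 0 2 #' til 3          / 1b: a zero count contributes nothing
(where 00110101b) ~ where `int$00110101b     / 1b: booleans treated as 0 and 1
```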

When the argument s is a dictionary whose range is a list of non-negative int, where returns a list comprising items of the domain of s, in which each domain item k is repeated s[k] times.

For example,

        where `a`b`c!2 1 3
`a`a`b`c`c`c
        where `a`b`c!4 0 2
`a`a`a`a`c`c

Note: The behavior of where on a dictionary is consistent with its behavior on a list by considering a list L as a mapping whose implicit domain is til count L.
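The same hand-written repetition extends to dictionaries by taking each domain item as many times as its range value says. A sketch under that reading, with d as a sample dictionary:

```q
d:`a`b`c!2 1 3
(where d) ~ raze (value d) #' key d      / 1b: each key repeated by its count
where d                                   / `a`a`b`c`c`c
```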

within

The dyadic function within is atomic in its first argument (source) and takes a second argument that is a list of two items that have underlying numeric values. It returns a boolean value representing whether source is between the two items of the second argument (inclusive).

        3 within 2 5
1b
 
        100 within 0 100
1b
 
        "c" within "az"
1b
 
        2006.11.19 2007.07.04 2008.08.12 within 2007.01.01 2007.12.31
010b

Observe that within is type tolerant provided both arguments have underlying numeric values, meaning that the types of its arguments do not need to match.

        0x42 within (30h;100j)
1b
 
        100 within "aj"
1b

It is also possible to apply within to symbols since they have lexicographic order.

        `ab within `a`z
1b

Note: The expression

        x within (a;b)

is equivalent to,

        (a<=x)&x<=b

Thus, if the items of the second argument are not in increasing order, the result of within will always be 0b.

        5 within 6 2
0b
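The equivalence can be checked on a list, since within is atomic in its first argument. The values of x, a and b below are arbitrary samples for this sketch.

```q
x:1 3 5 7
a:2; b:5
(x within (a;b)) ~ (a<=x)&x<=b     / 1b: both sides give 0110b
x within (b;a)                      / 0000b, since the bounds are reversed
```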

 

 
