When does : (amend) not modify its first argument?

Short answer: When 1) projected or 2) modified by an adverb, e.g., :/: :\: or :'


Normally, the : (amend) function is used to assign a value to a name:

q)foo: 47
q)foo
47
q)

Like other built-in functions that take two arguments, : (amend) can be called using either infix notation (as above) or function call notation:

q):[foo; 747]
q)foo
747
q)

Note that : (amend) displays its special semantics (which it shares with assignments in all strict languages) of not evaluating its first argument when that argument is simply a name, regardless of whether : (amend) is invoked infix or functionally.

q)delete bar from `.    / make sure bar is not defined
q):[bar; 42]
q)bar
42
q)

On the other hand, : (amend) does evaluate its first argument when that argument is an expression, but : (amend) still produces l-values when it does so:

q)list: 0 1 2 3
q)list[0]: 47
q)list
47 1 2 3
q):[list 1 2 3; 10 20 30]
q)list
47 10 20 30
q)

Based on the previous examples, you might be tempted create a projection from : (amend) and a left-hand side, but it turns out you can't:

q):[foo]
'foo
q)

It is possible to close the first argument to : (amend), but the resulting projection is no longer an assignment; it is an identity function:

q)foo: 42
q)(:[foo])[47]
47
q)foo
42
q)

In fact, the value attached to the first argument of : (amend) is irrelevant:

q)(:[`xyzzy])[47]
47
q)

Closing the second argument will cause : (amend) to return the same value always:

q)f: :[; 3]     / bind the 2nd argument
q)f 18          / no matter what we pass
3
q)f "hello"     / f will always return 3
3
q)

The other case when : (amend) does not perform assignment is when it is modified by an adverb:

q)list1: 0 1 2 3
q)list2: 4 5 6 7
q)(list1; list2) :\: 10 20 30 40   / we get the rhs
10 20 30 40                        / once for each list
10 20 30 40
q)list1                            / neither list
0 1 2 3
q)list2                            / has been modified
4 5 6 7
q)list2 :/: 0 1 2 3
0 1 2 3
q)list1[0 1 2 3] :' 10 20 30 40
10 20 30 40
q)

What is q's equivalent to sql's ORDER BY?

xasc and xdesc. Consider the following table:

q)table: ([] x: 10 ? "abc"; y: 10 ? til 10)
q)table
x y
---
a 7
a 7
b 1
b 9
a 1
a 0
c 8
b 8
c 3
c 1
q)

To sort by column x, instead of "ORDER BY x", we write `x xasc table:

q)`x xasc table // sort by column x
x y
---
a 7
a 7
a 1
a 0
b 1
b 9
b 8
c 8
c 3
c 1
q)

The application of xdesc is similar.

In addition, to sort by multiple columns, instead of "ORDER BY y, x" or "ORDER BY y, x DESC", we pass the list of column names as the left argument to xasc or xdesc, respectively. For example,

q)`y`x xdesc select from table where y > 5
x y
---
b 9
c 8
b 8
a 7
a 7
q)

Don't confuse xasc and xdesc with asc and desc, which operate on a vector instead of a table. Read more about sorting vectors in this related faq.

How do I extract hours, minutes, or seconds from a time?

Short answer: Times are integers (think milliseconds).

q)(`hh`mm`ss $ .z.T), `int $ .z.T mod 1000
16 49 2 233
q)


You can extract the hours, minutes, and seconds from a time by passing `hh, `mm, and `ss, respectively, as left arguments to $ (cast):

q)now: .z.T
q)now
16:49:02.233
q)`hh $ now
16
q)`mm $ now
49
q)`ss $ now
2
q)`hh`mm`ss $ now
16 49 2
q)

Getting the milliseconds from a time is slightly less obvious. Times (type -19) are represented internally by q as 32-bit integers; typically the value counts the number of milliseconds since midnight, but it can also represent a span of time. We can cast freely back and forth between the two types and the values are preserved:

q)`time $ 0
00:00:00.000
q)`int $ 00:00:00.000
0
q)`time $ 24 * 60 * 60 * 1000
24:00:00.000
q)`int $ 24:00:00.000
86400000
q)`int $ now
60542233
q)`time $ 60542233
16:49:02.233
q)

Not only is the internal representation of time simply an integer, we can mix integers and times in integer arithmetic operations, and the result is always a time:

q)01:00:00.000 + 00:01:00.000
01:01:00.000
q)01:00:00.000 + 60000
01:01:00.000
q)01:00:00.000 * 4
04:00:00.000
q)now - 01:30:20.123
15:18:42.110
q)

By using div and mod, then, we have an alternative means to calculate the components of a time:

q)now div 3600000 // milliseconds per hour
00:00:00.016
q)now mod 1000    // just the milliseconds, please
00:00:00.233
q)

Although extracting milliseconds from a time while keeping the time type (as in the second example above) is sometimes useful, we normally want to get back these components of a time as integers, so let's cast it:

q)`int $ now mod 1000
233
q)


By the way, there is a shortcut for getting hours, minutes, and seconds from global variables that hold times: dot notation.

q)x: .z.T
q)x.hh, x.mm, x.ss
16 49 2
q)

However, we rarely use global variables to hold time values.

How do I parse a date and time?

Short answer: "D"$, "T"$, and "Z"$


In general you can parse strings to some particular type by passing the uppercase form of the corresponding type character as the left argument to $ (cast).

q can parse dates in YYYYMMDD, YYYY-MM-DD, or YYYY.MM.DD format:

q)"D"$ ("20070809"; "2007-08-09"; "2007.08.09")
2007.08.09 2007.08.09 2007.08.09
q)

Times must have the form HH:MM:SS.mmm, but you can shorten them:

q)"T"$ ("07:08:09.010"; "07:08:09"; "07:08"; "07")
07:08:09.010 07:08:09.000 07:08:00.000 07:00:00.000
q)

In addition, "T"$ will tolerate a missing zero for hours (and hours only) less than 10:

q)"T"$ "7:08:09"
07:08:09.000
q)

"Z"$ expects (as in XML) that dates and times are joined with a T:

q)"Z"$ "2007-08-09T07:08:09.101"
2007.08.09T07:08:09.101
q)

"Z"$ usually treats the text analogously to "D"$ and "T"$; that is, the date must be complete, but the time may have its finer points elided:

q)"Z"$ "2007"
0Nz
q)"Z"$ "20070809"
2007.08.09T00:00:00.000
q)

Note, however, the "Z"$ - unlike "T"$ - requires that hours less than 10 be prepended with zero:

q)"Z"$ "20070809T7:08:09"
0Nz
q)

You can also parse months with "M"$, minutes with "U"$, and seconds with "V"$:

q)"M"$ ("2007-08"; "2007.08"; "200708")
2007.08 2007.08 2007.08m
q)"U"$ "07:08:09.010"
07:08
q)"V"$ "07:08:09.010"
07:08:09
q)

Make sure you don't pass any extra text to "M"$, though:

q)"M"$ "2007.08.01"
2007.01m
q)

There are two more type characters for parsing temporal values. "N"$ and "P"$ are for parsing nanosecond-precise time spans and datetimes, respectively:

q)"N"$ "12D07:08:09.123456789"
12D07:08:09.123456789
q)"P"$ "2007-08-09T07:08:09.123456789"
2007.08.09D07:08:09.123456789
q)

You may have discovered that "N"$ can parse floating point numbers, as well, but beware - depending on the range of the input, you might not get what you expect:

q)"N"$ "0.123456789"
0D00:00:00.123456789
q)"N"$ "1.123456789"
0D01:00:00.123456789
q)"N"$ "12.123456789"
0D12:00:00.123456789
q)"N"$ "24.123456789"
1D00:00:00.123456789
q)"N"$ "48.123456789"
2D00:00:00.123456789
q)"N"$ "4800.123456789"
2D00:00:00.123456789

Stick with strings that represent durations in the way q does, and you'll steer clear of surprises.

Lastly, if you find yourself with a text full of times expressed as seconds (and, optionally, fractions thereof) since the Unix epoch, "P"$, and "Z"$ can parse them, too:

q)unix_time: first system "date +%s"
q)unix_time
"1311635286"
q)"Z"$ unix_time
2011.07.25T23:08:06.000
q)"P"$ unix_time , ".123456789"
2011.07.25D23:08:06.123456789

How do I get the length of a list?

Short answer:

count list
or
count each column_name
to get the length of a list-valued field in a query.


The length of any list can be found using the count function; the only complication occurs with certain applications of count in queries. We'll illustrate with a table of simulated trade data:

q)trade
time         sym price size
---------------------------
09:30:17.623 glo 26.79 3200
09:30:26.602 ojb 28.17 1600
09:30:33.657 fjc 91.31 5400
09:30:40.884 knb 59.7  2900
09:31:19.256 apl 1.26  7900
..
q)

The following query uses count to return the number of rows that satisfy the where clause:

q)exec count i from trade where sym = `aif
6
q)

This is usually what you want.

However, suppose we created another table (perhaps to improve the performance of certain queries) that grouped prices for each symbol:

q)grouped: select price by sym from trade
q)grouped
sym| price                                                   ..
---| --------------------------------------------------------..
acl| 18.1 85.5 86.31 45.65 10.91 31.25 5.49 51.66 1.02 31.82 ..
aif| 44.02 47.74 11.83 14.28 85.11 99.09                     ..
alk| 51.12 16.7 78.71 96.56 26.55 32.09 33.66 30.01 83.24 30...
amg| 59.25 56.18 92.63 46.57 57.25 14.58 69.21 88.25 94.1 28...
aog| 92.59 51.52 96.95 83.2 6.21 59.77 44.19 94.19 3.13 41.94..
..
q)

In table grouped, the price column is of type list-of-float - note the uppercase type letter F:

q)meta grouped
c    | t f a
-----| -----
sym  | s    
price| F    

It appears that we can retrieve the prices for a given symbol easily enough:

q)exec price from grouped where sym = `aif
44.02 47.74 11.83 14.28 85.11 99.09
q)

Suppose, however, that we want to know how many trades occurred for a particular symbol:

q)exec count price from grouped where sym = `aif
1    
q)

What went wrong? A where function yields a list of row indexes that meet the constraints, and then each projection (i.e., the column names between exec and from — in this case, price) yields a list of corresponding field values. Since the number of rows that met our sole constraint is 1, the result of the projection is an untyped list with a single element:

q)type exec price from grouped where sym = `aif
0h
q)count exec price from grouped where sym = `aif
1
q)

Projection results are the arguments to aggregation functions in queries. In other words, the untyped list in our example is the same as the one passed to count. Since the projection's single element contains the list of prices we want to count, the way out is to combine count with first:

q)exec count first price from grouped where sum = `aif
6
q)

Applying the same logic when counting the trades for every symbol, we need to use count each:

q)select count price by sym from grouped
sym| price
---| -----
acl| 1    
aif| 1    
alk| 1    
amg| 1    
aog| 1    
..
q)select count each price by sym from grouped
sym| price
---| -----
acl| 14   
aif| 6    
alk| 10   
amg| 12   
aog| 13   
..
q)

Constraints in where clauses behave the same as well. Consider the following select statement:

q)t
x y z
-------------
a 1 "xyzzy"
b 2 "grue"
c 3 "frobozz"
q)select from t where 5 < count z
x y z
-----
q)

Newcomers to q often expect the above query to return the rows from t whose z column has more than 5 characters (i.e., c 3 frobozz). Rather than counting the contents of each z field, however, count is actually counting the list t `z:

q)count t `z
3
q)t `z
"xyzzy"
"foobar"
"frobozz"
q)

This insight suggests the solution:

q)count each t `z
5 6 7
q)select from t where 5 < count each z
x y z
-------------
c 3 "frobozz"
q)

select sum size from a large table unexpectedly returns a negative number.  Why?

Short answer: cast the column to long before applying sum:

select sum `long $ size from large_table

In many programming languages, including q, anytime you add integers (unless you somehow know for sure that the sum will fit in - roughly - 31 bits) you risk integer overflow. If you're lucky, the error will be obvious (e.g., you'll get a negative number when you expected a positive one). Casting the arguments to sum to long will give you (almost) 63 bits of breathing room.

If your sum won't fit in 63 bits, you'll need to explore other options:

  • Switch to floats. Although overflow is still possible, it's rare. However, you lose some precision.
  • Use a bignum library such as gmp.
  • Use a language (e.g., haskell or clojure) that has built-in support for arbitrary-precision arithmetic.

Lastly, keep in mind that literal values that cannot fit into an int must be suffixed with j - even if the context suggests that a long is expected:

q)select from meta large_table where c = `id
c | t f a
--| -----
id| j
q)count select from large_table where id = 84066472837652480
'84066472837652480
q)count select from large_table where id = 84066472837652480j
1
q)

Does the order of field constraints matter in the performance of a select query?

Short answer: Yes, put the most restrictive constraint first.

Although expressions in q are usually evaluated from right to left, there is one exception: the constraints in a where clause are evaluated from left to right. Consider the following example:

q)t: ([] x: `a`b`c; y: 1 2 3)
q)select from t where x < `c, y > 1
x y
---
b 2
q)

The comma separating x < `c from y > 1 is not the join operator; instead, it delimits the constraints of the where clause. If you need to use the join operator in a where clause, use parentheses:

q)select from t where x in (`a, `b), y > 1
x y
---
b 2
q)

Each constraint is evaluated in the usual order, i.e., right-to-left:

q)select from t where x < first -1 # `p`c, y > 1
x y
---
b 2
q)

Because constraints are evaluated from left to right, putting the most restrictive constraint first will speed up queries on large tables. On a date partitioned database, for example, the placement of a constraint on the virtual date column will make a marked performance difference:

select from large_table where date = 2011.02.25, name = `JOE

is significantly faster than

select from large_table where name = `JOE, date = 2011.02.25

The latter query will attempt to load the entire contents of table for rows matching name `JOE prior to narrowing the result set down to a single date. kdb does not provide behind-the-scenes optimization in cases like these.

How do I set a breakpoint inside a q function?

Consider the following function, testfunc:

testfunc: {[]
x: `aaa;
y: `bbb;
z:: `ccc;
-1 "finished!";
}

Let's demonstrate the placement of a breakpoint prior to the assignment of global variable z. Although there is no explicit support for breakpoints in q, insertion of non-compliant code, such as breakhere; shown below, does the job (don't forget the trailing semicolon):

testfunc: {[]
x: `aaa;
y: `bbb;
breakhere;
z:: `ccc;
-1 "finished!"
}

We are tricking q into throwing a signal 'breakhere.

q)testfunc[]
{[]
x: `aaa;
y: `bbb;
breakhere;
z:: `ccc;
-1 "finished!"
}
'breakhere
q))

At this point, we can examine the local stack.

q))x
`aaa
q))y
`bbb
q))z
()
q))

kdb has suspended execution, leaving the remaining two lines of function testfunc unexecuted. : (colon) resumes execution.

q)):
finished!
-1
q)x
'x
q)y
'y
q)z
`ccc
q)

See: global amend (::)

How do I expose a subset of partitions and remove the rest from view?

Short answer: .Q.view partition_list

By default, all partitions of a given historical kdb database are available. The .Q.view function enables a given kdb process instance to selectively expose an arbitrary set of partitions into view.

The following example restricts all operations of the database instance to the most recent partition date (typically the previous business day):

q)yesterday: last date
q).Q.view yesterday
q)count date
1
q)yesterday ~ value exec from select distinct date from trade
1b
q)

Here is a more realistic example of a date partitioned historical database where viewable partitions are limited to those year-to-date:

q)start_date: "D" $ string[`year $ .z.D], ".01.01"
q)end_date: last date
q).Q.view date where date within (start_date; end_date)
q)count select from trade where not date within (start_date;
end_date)
0
q)

See also: .Q.pn

I have a very large file to uncompress. zip and gzip do not work!

Short answer: Use 7-zip. It contains large-file support.

$ unzip verylargefile.zip /destination_dir/
I/O error: File too large

Make sure your file system can handle LFS.

$ 7za e -o /destination_dir/ verylargefile.zip

See also: 7-zip usage examples.