What exactly does column-oriented mean?

It means that all the values in a column are stored together. This is in contrast to row-oriented, in which all the values for a row are stored together. Which organization is better depends on the access pattern of your application.

Many operations are more efficient with a column-oriented approach. In particular, operations that need to access a sequence of values from a particular column are much faster. If all the values in a column have the same size (which is true, by design, in kdb), things get even better. This type of access pattern is typical of the applications for which q and kdb are used.

To make this concrete, let's examine a column of 64-bit, floating point numbers:

q).Q.w[] `used
108464j
q)t: ([] f: 1000000 ? 1.0)
q).Q.w[] `used
8497328j
q)

As you can see, the memory needed to hold one million 8-byte values is only a little over 8MB. That's because the data are being stored sequentially in an array. To clarify, let's create another table:

q)u: update g: 1000000 ? 5.0 from t
q).Q.w[] `used
16885952j
q)

Both t and u are sharing the column f. If q organized its data in rows, the memory usage would have gone up another 8MB. Another way to confirm this is to take a look at k.h.

Now let's see what happens when we write the table to disk:

q)`:t/ set t
`:t/
q)\ls -l t
"total 15632"
"-rw-r--r-- 1 kdbfaq staff 8000016 May 29 19:57 f"
q)

16 bytes of overhead. Clearly, all of the numbers are being stored sequentially on disk. Efficiency is about avoiding unnecessary work, and here we see that q does exactly what needs to be done when reading and writing a column - no more, no less.

OK, so this approach is space efficient. How does this data layout translate into speed?

If we ask q to sum all 1 million numbers, having the entire list packed tightly together in memory is a tremendous advantage over a row-oriented organization, because we'll encounter fewer misses at every stage of the memory hierarchy. Avoiding cache misses and page faults is essential to getting performance out of your machine.

Moreover, doing math on a long list of numbers that are all together in memory is a problem that modern CPU instruction sets have special features to handle, including instructions to prefetch array elements that will be needed in the near future. Although those features were originally created to improve PC multimedia performance, they turned out to be great for statistics as well. In addition, the same synergy of locality and CPU features enables column-oriented systems to perform linear searches (e.g., in where clauses on unindexed columns) faster than indexed searches (with their attendant branch prediction failures) up to astonishing row counts.

A thorough introduction to these topics is given in Ulrich Drepper's 'What every programmer should know about memory' and Scott Meyer's 2011 talk on 'CPU caches and why you care'.

How do I convert a character to its ASCII code?

By passing the integer type symbol, `int, as the left argument to the $ (cast) operator:

q)`int $ "A"
65
q)

You can convert a list of characters at once:

q)`int $ "Hello, world!"
72 101 108 108 111 44 32 119 111 114 108 100 33
q)

Alternatively, the left argument to $ (cast) can also be the integer type character, "i" or type value 6h:

q)"i" $ "Hello, world!"
72 101 108 108 111 44 32 119 111 114 108 100 33
q)
q)type 1 2 3
6h
q)6h $ "Hello, world!"
72 101 108 108 111 44 32 119 111 114 108 100 33
q)

See also: Unicode, UTF-8

I'm in a hurry. Can multiple kdb processes write simultaneously to a splayed database?

Short answer: Yes, multiple processes can write to a database as long as the processes avoid concurrent write operations to the same table on the same partition. To see a significant speedup, you'll need to use segments.


With the exception of the ? operator, which can be used to provide concurrency protection for sym files, q processes do not coordinate file I/O operations in any way. Thus, kdb splayed table write operations require the same degree of care as ordinary file writes.

The mechanism to protect sym files, however, is used throughout the functions in the .Q namespace that pertain to writing splayed tables. So, you can use those functions to build a database faster by running multiple loader processes in parallel.

Example 1: Concurrent writes to the same table on the same partition. This means two processes are writing to the same (table column) files simultaneously, and that is not safe! The potential for corruption is high.

kdb_proc1: .Q.dpft[`:/vol/db; 2011.04.07; `sym; `t]
kdb_proc2: .Q.dpft[`:/vol/db; 2011.04.07; `sym; `t]

Example 2: Concurrent writes to two different tables on the same partition. This is will not cause corruption.

kdb_proc1: .Q.dpft[`:/vol/db; 2011.04.07; `sym; `t]
kdb_proc2: .Q.dpft[`:/vol/db; 2011.04.07; `sym; `q]

Example 3: Concurrent writes to the same table on different partitions. This is also safe, with no potential corruption.

kdb_proc1: .Q.dpft[`:/vol/db; 2011.04.07; `sym; `t]
kdb_proc2: .Q.dpft[`:/vol/db; 2011.04.08; `sym; `t]

If the two partitions are on different I/O channels using segments, this last approach can be much faster than performing the writes serially.

How can I reorder column names of a splayed table on disk?

Modify the file .d in the splayed table directory. The contents are editable via operators get and set.

$ rlwrap q sp.q
KDB+ 2.7 2011.02.16 Copyright © 1993-2011 Kx Systems
+`p`city!(`p$`p1`p2`p3`p4`p5`p6`p1`p2;`london`london`london..
(+(,`color)!,`blue`green`red)!+(,`qty)!,900 1000 1200
+`s`p`qty!(`s$`s1`s1`s1`s2`s3`s4;`p$`p1`p4`p6`p2`p2`p4;300 ..
q)sp
s p qty
---------
s1 p1 300
s1 p2 200
s1 p3 400
q)`:splaydir/ set sp
`:splaydir/
q)\ls -a splaydir
,"."
".."
".d" // this is where the col names live
,"p"
"qty"
,"s"
q)get `:splaydir/.d
`s`p`qty
q)`:splaydir/.d set reverse get `:splaydir/.d
`:splaydir/.d
q)get `:splaydir/.d
`qty`p`s
q)

Verify with a fresh kdb instance:

$ rlwrap q splaydir
KDB+ 2.7 2011.02.16 Copyright © 1993-2011 Kx Systems
q)\v
,`splaydir
q)splaydir
qty p s
-------
300 0 0
200 1 0
400 2 0
q)

See also: xcol and xcols faq

How do I split a list?

If you want to split the list into lists of equal size, see this related faq on the # (take) operator.

If you want to split a string on a delimiter, use the vs (vector from scalar) function:

q)" " vs "The quick brown fox"
"The"
"quick"
"brown"
"fox"
q)", " vs "Hello, world!"
"Hello"
"world!"
q)

When its left argument is the null symbol, `, the vs function breaks apart a symbol on dots:

q)` vs `foo.bar.baz
`foo`bar`baz
q)

Those are the most common cases. We can also split a list into lists of varying length by passing a list of indexes as the left argument to the _ (cut) operator:

q)0 1 4 9 _ til 10
,0
1 2 3
4 5 6 7 8
,9
q)

To split a list of some type other than char using a delimiter is a little more complicated. We start by finding the indexes in the list that match the delimiter:

q)list: 1 2 0 3 4 0 5 6
q)delimiter: 0
q)indexes: where delimiter = list
q)indexes
2 5
q)

Now we can break list into pieces using the _ (cut) operator as above:

q)indexes _ list
0 3 4
0 5 6
q)

This is almost what we want. We'll use _/: (drop-each-right) to get rid of the delimiters:

q)1 _/: indexes _ list
3 4
5 6
q)

We can grab the first element of the result with # (take):

q)first[indexes] # list
1 2
q)

Then we can just join (using ,) the two together:

q)(enlist first[indexes] # list), 1 _/: indexes _ list
1 2
3 4
5 6
q)

Note that we must call enlist on the front of the list or else we'll get something a little different from what we intended:

q)(first[indexes] # list), 1 _/: indexes _ list
1
2
3 4
5 6
q)

Lastly, we can generalize to non-atomic types by replacing = with ~/: (match-each-right):

split: {[list; delimiter]
    indexes: where delimiter ~/: list;
    front: first[indexes] # list;
    rest: 1 _/: indexes _ list;
    : (enlist front), rest;
    }

How can I find the data type of a given table column?

Short answer: meta tablename

For example -

q)\l sp.q        / found in $QHOME/sp.q
q)s
s | name  status city
--| -------------------
s1| smith 20     london
s2| jones 10     paris
s3| blake 30     paris
s4| clark 20     london
s5| adams 30     athens
q)meta s
c     | t f a
------| -----
s     | s
name  | s
status| i
city  | s
q)

The output of meta is a kdb table, where columns c and t indicate the column name and data type, respectively.

The key the provides an alternate method:

q)key exec city from s
`symbol
q)

If you need the type code, use the type function:

q)type exec city from s
11h
q)

See also: Type character definitions, Meta output faq.

How do I set the return value of a function?

Short answer:      : return_value

Unless you use : (return) or ' (signal) to specify otherwise, a q function returns the value of its last expression. Since functions are lists of expressions separated by semicolons, both of the following functions return 3:

q)single_expression: {3}
q)single_expression[]
3
q)last_expression: {1; 2; 3}
q)last_expression[]
3
q)

To return a value other than that of the last expression, use : (return):

q)explicit_return: {: 3; 4}
q)explicit_return[]
3
q)

The last expression in explicit_return (i.e., 4) is never reached.

Meanwhile, if a function that does not return via : (return) (or ' (signal)) does not have a final expression (i.e, no expression after its last semicolon), that function returns a generic null:

q)no_return: {3; }
q)no_return[]
q)null no_return[]
1b
q)type no_return[]
101h
q)no_return[] ~ (::)
1b
q)

Lastly, functions can exit in two other ways:

1. Calling the exit function:

q)die: {exit 0}
q)die[]
$

(By the way, you can add code to run automatically at exit using .z.exit.)

2. Signaling an error:

q)signal: {' "oops"}
q)signal[]
'oops
q)

See this related faq for more information on the use of ' (signal)

How can I test for the existence of a file or directory in q?

Short answer: key file_handle or key dir_handle

File handle example:

q)\touch /tmp/test.txt
q)key hsym ` $ "/tmp/test.txt"
`:/tmp/test.txt
q)

The function key returns an empty list when the file does not exist:

q)hdel `:/tmp/test.txt
`:/tmp/test.txt
q)() ~ key `:/tmp/test.txt
1b
q)

key is one of the most heavily overloaded functions in q. It accepts an argument of type dictionary, keyed table, table column, list, and integer. It's easy to confuse it with keys, which turns out to be an entirely different function.

See also: ~ (match), \ (system), hdel, hsym

How do I remove a key from a dictionary?

Pass the key you want to remove as the left argument to the _ (drop) operator, and pass the dictionary as its right argument:

q)dict: `a`b`c ! 1 2 3
q)dict
a| 1
b| 2
c| 3
q)`b _ dict
a| 1
c| 3
q)

If you reverse the order of the arguments, the removal still works:

q)dict _ `b
a| 1
c| 3
q)

Although we can't think of a good reason to do this, you might come across it. As our commenter Attila pointed out, however, you are more likely to see this overload in its assignment form:

q)dict _: `b / same as dict: dict _ `b
q)dict
a| 1
c| 3
q)

Meanwhile, you can remove multiple keys at once by passing a list of keys as the left argument to _ (drop):

q)`a`c _ dict
b| 2
q)

Reversing the arguments does not work in this case:

q)dict _ `a`c
'type
q)

I have a very big text file. Can I read a text file in chunks?

By defining a function to handle one chunk

process_chunk: {[list_of_lines]
    // Split and parse each line in list_of_lines and
    // then - most likely - create some sort of output.
    // After all, the idea here is to avoid holding
    // all of the data in memory.
    }

and passing your chunk handler along with the file handle to .Q.fs:

bytes_read: .Q.fs[process_chunk; `:big_file_name]

For details on how to parse lines of text inside your chunk handler, see this related faq.