Entries in hdb (6)

Does the order of field constraints matter in the performance of a select query?

Short answer: Yes, put the most restrictive constraint first.

Although expressions in q are usually evaluated from right to left, there is one exception: the constraints in a where clause are evaluated from left to right. Consider the following example:

q)t: ([] x: `a`b`c; y: 1 2 3)
q)select from t where x < `c, y > 1
x y
b 2

The comma separating x < `c from y > 1 is not the join operator; instead, it delimits the constraints of the where clause. If you need to use the join operator in a where clause, use parentheses:

q)select from t where x in (`a, `b), y > 1
x y
b 2

Each constraint is evaluated in the usual order, i.e., right-to-left:

q)select from t where x < first -1 # `p`c, y > 1
x y
b 2

Because constraints are evaluated from left to right, putting the most restrictive constraint first will speed up queries on large tables. On a date partitioned database, for example, the placement of a constraint on the virtual date column will make a marked performance difference:

select from large_table where date = 2011.02.25, name = `JOE

is significantly faster than

select from large_table where name = `JOE, date = 2011.02.25

The latter query will attempt to load the entire contents of table for rows matching name `JOE prior to narrowing the result set down to a single date. kdb does not provide behind-the-scenes optimization in cases like these.

I'm in a hurry. Can multiple kdb processes write simultaneously to a splayed database?

Short answer: Yes, multiple processes can write to a database as long as the processes avoid concurrent write operations to the same table on the same partition. To see a significant speedup, you'll need to use segments.

With the exception of the ? operator, which can be used to provide concurrency protection for sym files, q processes do not coordinate file I/O operations in any way. Thus, kdb splayed table write operations require the same degree of care as ordinary file writes.

The mechanism to protect sym files, however, is used throughout the functions in the .Q namespace that pertain to writing splayed tables. So, you can use those functions to build a database faster by running multiple loader processes in parallel.

Example 1: Concurrent writes to the same table on the same partition. This means two processes are writing to the same (table column) files simultaneously, and that is not safe! The potential for corruption is high.

kdb_proc1: .Q.dpft[`:/vol/db; 2011.04.07; `sym; `t]
kdb_proc2: .Q.dpft[`:/vol/db; 2011.04.07; `sym; `t]

Example 2: Concurrent writes to two different tables on the same partition. This is will not cause corruption.

kdb_proc1: .Q.dpft[`:/vol/db; 2011.04.07; `sym; `t]
kdb_proc2: .Q.dpft[`:/vol/db; 2011.04.07; `sym; `q]

Example 3: Concurrent writes to the same table on different partitions. This is also safe, with no potential corruption.

kdb_proc1: .Q.dpft[`:/vol/db; 2011.04.07; `sym; `t]
kdb_proc2: .Q.dpft[`:/vol/db; 2011.04.08; `sym; `t]

If the two partitions are on different I/O channels using segments, this last approach can be much faster than performing the writes serially.

Why doesn’t my query 'select from reallybigtable' ever return?

It's a valid query, but q will run out of memory before returning. You'll have to add some constraints to the query to reduce the size of the result set. Something like

select from reallybigtable where date = YYYY.MM.DD

would be a good start.

What should I know about the sym file, and why back it up?

In a partitioned database, the sym file is a string intern pool which grows every time a new distinct symbol is introduced and persisted to that database. The sym file cannot be regenerated (unless you know the exact order in which your data was loaded from the inception of your database) and does not automatically shrink when symbols are removed from the database.

See this solution to compacting a bloated sym file.

What does '\l .' or 'system “l ."' do?

Typically, it's used to reinitialize and alert an existing kdb process of a new partition. Note that command triggers the execution of any q scripts existing in the current working directory of the running kdb process.

This command allows kdb to continue running and servicing requests while a new partition is constructed. Running kdb remains unware of the new partition until the \l . command is issued.

See also: .Q.l

Why can’t I use exec on partitioned tables?

We don't know. We imagine that kx has something clever up their sleeve that hasn't been released yet. For now, just use exec select from tablename.