How can I experiment on a production kdb while keeping the data free from accidental damage?

Use the -b command line argument to start kdb in read only mode, or start up with a negative port number.

one can verify the read-only mode by examining the return value of function \_

$ q -b
KDB+ 2.7 2011.02.16 Copyright (C) 1993-2011 Kx Systems
...
q)\_
1
q)

What is a virtual column?

A virtual column is not stored. There are two kinds of virtual columns in kdb:

  1. The index column, i, which is present in all tables, and
  2. the partition column on a partitioned table. For example, the most common partitioning scheme is by date. Rather than storing the date column along with the other columns in the table, the date is inferred from the partition directory; splayed table directories will not contain files for the date column.

See also: .Q.pf and .Q.pt

What is the difference between the functions xcol and xcols?

Use xcol to rename columns. Note that it doesn't require a complete list of columns.

$ rlwrap q sp.q
KDB+ 2.7 2011.02.16 Copyright © 1993-2011 Kx Systems
+`p`city!(`p$`p1`p2`p3`p4`p5`p6`p1`p2;`london`london`london`l..
(+(,`color)!,`blue`green`red)!+(,`qty)!,900 1000 1200
+`s`p`qty!(`s$`s1`s1`s1`s2`s3`s4;`p$`p1`p4`p6`p2`p2`p4;300 200 100 400 200 300)
q)sp
s p qty
---------
s1 p1 300
s1 p2 200
s1 p3 400
s1 p4 200
q)`p`s xcol sp // rename the first two cols
p s qty
---------
s1 p1 300
s1 p2 200
s1 p3 400
s1 p4 200
q)

Use xcols to reorder columns:

q)(reverse cols sp) xcols sp
qty p s
---------
300 p1 s1
200 p2 s1
400 p3 s1
200 p4 s1

One way to remember which is which is that, just as rename precedes reorder (in alphabetical order), so does xcol precede xcols.

What should I know about the sym file, and why back it up?

In a partitioned database, the sym file is a string intern pool which grows every time a new distinct symbol is introduced and persisted to that database. The sym file cannot be regenerated (unless you know the exact order in which your data was loaded from the inception of your database) and does not automatically shrink when symbols are removed from the database.

See this solution to compacting a bloated sym file.

How do I get the day of the week?

q)`Sat`Sun`Mon`Tue`Wed`Thu`Fri .z.D mod 7
`Sat
q)\date
"Sat Mar 5 18:56:04 PST 2011"
q)

see also: -W, .z.D, mod, \

Is the output of meta available for use in my program?

Yes. meta's result is an ordinary q table:

$ rlwrap q sp.q
KDB+ 2.7 2011.02.16 Copyright © 1993-2011 Kx Systems
q)type meta sp
99h
q)meta sp
c  | t f a
---| -----
s  | s s
p  | s p
qty| i
q)cols meta sp
`c`t`f`a
q)meta meta sp
c| t f a
-| -----
c| s
t| c
f| s
a| s
q)

Who uses time series or column-oriented databases and why?

Like any database technology platform, time series databases have found specialized applications in several industries, including telecommunications, utility, finance, and military.

For example, cell carriers use time series databases to monitor peak mobile phone usage. Similarly, utilities use them to study optimal tariff structure to shape on-peak or off-peak electricity usage.

See wikipedia for more on time series and column-oriented databases.

How do I delete a function in namespace .z?

The usual way to delete a function - any entity, for that matter, is

delete entity from namespace

e.g.

delete foo from `.

However, .z is a special namespace, so delete doesn't work:

q).z.ts: {47}
q)delete ts from `.z
'type
q)

There is a special command just for removing elements from .z:

q)\x .z.pi

See \x (expunge).

How do I choose whether to make a field a symbol or string?

A q string is a vector of characters. If you need to manipulate the string, or the set of possible values the string might take is unbounded, then go with string.

Use symbol when the set of values is restricted, e.g., exchange names, or tickers. Symbols are faster for comparison operations (and that means lookup as well).

See string interning.

How can I group all (other) columns?

Short Answer

xgroup.

Explanation

When you write a grouping query (i.e., one with a by clause) that lists columns in its select clause, the result nests all of the values from the selected columns. However, if no columns are listed in the select clause, the result only includes each column's last value for each key.

For example, consider the following table:

trade: ([] time:  `time$   ();
           sym:   `symbol$ ();
           price: `float$  ();
           size:  `int$    ());

`trade insert (09:30:00t + til 1000;
               1000 # 10 ? `3;
               10.0 + 1000 ? 15.0;
               100 + 100 * 1000 ? 10);

If we execute a query with a column in its select clause, the result is nested:

q)select price by sym from trade
sym| price                                         ..
---| ----------------------------------------------..
dgh| 13.68204 18.47666 19.59256 23.92803 12.75742 2..
dmo| 19.39943 19.1779  12.71798 10.9528  24.89916 2..
ekh| 22.16764 19.13283 23.42802 11.1409  24.42824 1..
gap| 16.33531 11.47509 13.90435 16.40958 22.59581 2..
log| 23.23925 17.83933 21.00673 14.97167 18.14587 1..
q)

However, if the select clause is empty, then the query returns only the last item from each group:

q)select by sym from trade
sym| time         price    size
---| --------------------------
dgh| 09:30:00.992 22.55807 1000
dmo| 09:30:00.999 22.72169 1000
ekh| 09:30:00.993 24.27423 300 
gap| 09:30:00.995 20.08908 900 
log| 09:30:00.998 24.42545 600 
q)

Usually, this behavior is what you want. However, if you want to group all columns (except the grouping column), you have two options:

1. List each column explicitly in your select clause,

q)select time, price, size by sym from trade
sym| time                                          ..
---| ----------------------------------------------..
dgh| 09:30:00.002 09:30:00.012 09:30:00.022 09:30:0..
dmo| 09:30:00.009 09:30:00.019 09:30:00.029 09:30:0..
ekh| 09:30:00.003 09:30:00.013 09:30:00.023 09:30:0..
gap| 09:30:00.005 09:30:00.015 09:30:00.025 09:30:0..
log| 09:30:00.008 09:30:00.018 09:30:00.028 09:30:0..
q)

or,

2. Use xgroup:

q)`sym xgroup trade
sym| time                                          ..
---| ----------------------------------------------..
pln| 09:30:00.000 09:30:00.010 09:30:00.020 09:30:0..
nch| 09:30:00.001 09:30:00.011 09:30:00.021 09:30:0..
dgh| 09:30:00.002 09:30:00.012 09:30:00.022 09:30:0..
ekh| 09:30:00.003 09:30:00.013 09:30:00.023 09:30:0..
ojn| 09:30:00.004 09:30:00.014 09:30:00.024 09:30:0..
q)

The more columns your table has, the more attractive xgroup will be. However, every time we tried, xgroup was slightly slower:

q)\t do[10000; select time, price, size by sym from trade]
246
q)\t do[10000; `sym xgroup trade]
275

On the other hand, if you want the code in question to be generically applied to several tables, xgroup is much better than the alternatives.

Notice that xgroup, in contrast to select by, does not sort the result by its key. If you need the rows sorted by key, follow xgroup with xasc:

q)`sym xasc `sym xgroup trade
sym| time                                          ..
---| ----------------------------------------------..
dgh| 09:30:00.002 09:30:00.012 09:30:00.022 09:30:0..
dmo| 09:30:00.009 09:30:00.019 09:30:00.029 09:30:0..
ekh| 09:30:00.003 09:30:00.013 09:30:00.023 09:30:0..
gap| 09:30:00.005 09:30:00.015 09:30:00.025 09:30:0..
log| 09:30:00.008 09:30:00.018 09:30:00.028 09:30:0..
q)

The cost of the sort will make the xgroup significantly slower than using select:

q)\t do[10000; `sym xasc `sym xgroup trade]
479
Page 1 ... 5 6 7 8 9 ... 12 Next 10 Entries »