Short answer: cast the column to long before applying sum:

select sum `long $ size from large_table

In many programming languages, including q, anytime you add integers (unless you somehow know for sure that the sum will fit in – roughly – 31 bits) you risk integer overflow. If you’re lucky, the error will be obvious (e.g., you’ll get a negative number when you expected a positive one). Casting the arguments to sum to long will give you (almost) 63 bits of breathing room.

If your sum won’t fit in 63 bits, you’ll need to explore other options:

  • Switch to floats. Although overflow is still possible, it’s rare. However, you lose some precision.
  • Use a bignum library such as gmp.
  • Use a language (e.g., haskell or clojure) that has built-in support for arbitrary-precision arithmetic.

Lastly, keep in mind that literal values that cannot fit into an int must be suffixed with j – even if the context suggests that a long is expected:

q)select from meta large_table where c = `id
c | t f a
--| -----
id| j
q)count select from large_table where id = 84066472837652480
'84066472837652480
q)count select from large_table where id = 84066472837652480j
1
q)

Short answer: Yes, put the most restrictive constraint first.

Although expressions in q are usually evaluated from right to left, there is one exception: the constraints in a where clause are evaluated from left to right. Consider the following example:

q)t: ([] x: `a`b`c; y: 1 2 3)
q)select from t where x < `c, y > 1
x y

b 2
q)

The comma separating x < `c from y > 1 is not the join operator; instead, it delimits the constraints of the where clause. If you need to use the join operator in a where clause, use parentheses:

q)select from t where x in (`a, `b), y > 1
x y

b 2
q)

Each constraint is evaluated in the usual order, i.e., right-to-left:

q)select from t where x < first -1 # `p`c, y > 1
x y

b 2
q)

Because constraints are evaluated from left to right, putting the most restrictive constraint first will speed up queries on large tables. On a date partitioned database, for example, the placement of a constraint on the virtual date column will make a marked performance difference:

select from large_table where date = 2011.02.25, name = `JOE

is significantly faster than

select from large_table where name = `JOE, date = 2011.02.25

The latter query will attempt to load the entire contents of table for rows matching name `JOE prior to narrowing the result set down to a single date. kdb does not provide behind-the-scenes optimization in cases like these.

Consider the following function, testfunc:

testfunc: {[] x: `aaa;
y: `bbb;
z:: `ccc;
-1 “finished!”;
}

Let’s demonstrate the placement of a breakpoint prior to the assignment of global variable z. Although there is no explicit support for breakpoints in q, insertion of non-compliant code, such as breakhere; shown below, does the job (don’t forget the trailing semicolon):

testfunc: {[] x: `aaa;
y: `bbb;
breakhere;
z:: `ccc;
-1 “finished!”
}

We are tricking q into throwing a signal ‘breakhere.

q)testfunc[] {[] x: `aaa;
y: `bbb;
breakhere;
z:: `ccc;
-1 “finished!”
}
‘breakhere
q))

At this point, we can examine the local stack.

q))x
`aaa
q))y
`bbb
q))z
()
q))

kdb has suspended execution, leaving the remaining two lines of function testfunc unexecuted. : (colon) resumes execution.

q)):
finished!
-1
q)x
‘x
q)y
‘y
q)z
`ccc
q)

See: global amend (::)