Entries in 0: (5)

How do I read in a text file?

Short answer:

1. (types; delimiter) 0: `:filename
2. .Q.fs[chunk_handler; `:filename]

Although there are many ways to read an ASCII file in q - depending on the content, how big the file is, and what you want to do with it - most of the time you will use one of two methods. The first method is for files you want to read into memory in their entirety, while the other approach is for situations in which you want to deal with the file in chunks. The latter scenario is covered in this related faq.

If the file is small enough (compared to the available memory in your system), you can read it all in as a list of lines in one go using read0. Given the following file, lines.txt,

foo=10
bar=20
baz=30

we can write

q)lines: read0 `:lines.txt
q)lines
"foo=10"
"bar=20"
"baz=30"
q)

To break up the lines, we use the vs (vector from scalar) function, applying the /: (each right) adverb so that we split each line:

q)split: "=" vs/: lines
q)split
"foo" "10"
"bar" "20"
"baz" "30"
q)

At this point, you probably want to parse each piece of text into its corresponding type to facilitate fast searching or arithmetic etc. You use the $ (cast) operator to do this, passing an uppercase type character as its left argument:

q)"S" $ "foo"
`foo
q)"I" $ "10"
10
q)

You may remember from this related faq that you can convert a list of items at once:

q)"S" $ ("foo"; "bar"; "baz")
`foo`bar`baz
q)"I" $ ("10"; "20"; "30")
10 20 30
q)

But wait! There's more! If you pass a list of type characters as the left argument to $, you can parse multiple lists:

q)"SI" $ (("foo"; "bar"; "baz"); ("10"; "20"; "30"))
foo bar baz
10 20 30
q)

The list of type characters can be as long as you like:

q)"SSFI*" $ ("foo"; "bar"; "10.5"; "47"; "left as a string")
`foo
`bar
10.5
47
"left as a string"
q)

Thus, we can parse our file with the following code:

q)"SI" $ flip "=" vs/: read0 `:lines.txt
foo bar baz
10 20 30
q)

Since this sequence of operations is so common, it has been wrapped up in an overload of that workhorse of text I/O, 0: (load text). The trick is to pass a pair as the left argument to 0: where the first element of the pair is the string of type characters and the second element of the pair is the delimiter between each name and value in the file:

q)("SI"; "=") 0: `:lines.txt
foo bar baz
10 20 30

Putting it all together, we can turn our file into a table like so:

q)flip `name`val ! ("SI"; "=") 0: `:lines.txt
name val
--------
foo 10
bar 20
baz 30
q)

Using 0: instead of the combination of read0, vs and $ is faster and less memory-intensive. The differences become significant as the file size grows:

q)system "wc trade_small.csv"
" 1000001 1000001 29997921 trade_small.csv"
q)\ts ("TSIF"; ",") 0: `:trade_small.csv
554 20971840j
q)\ts "TSIF" $ flip "," vs/: read0 `:trade_small.csv
2649 232389280j
q)

As a consequence of the 0: function's superior memory efficiency, it can handle much larger files than the other approach:

q)system "wc trade.csv"
" 10000001 10000001 299888328 trade_big.csv"
q)\ts ("TSIF"; ",") 0: `:trade_big.csv
5672 335544640j
q)\ts "TSIF" $ flip "," vs/: read0 `:trade_big.csv
wsfull
q)

If you don't actually need to parse the file contents, then read0 (by itself) is fine.

By the way, if you only want to grab part of the file, you can pass a triple to read0 in order to read a subset of the bytes. You'll still get a list of lines broken on newlines:

q)offset: 3
q)number_of_bytes_to_read: 6
q)read0 (`:lines.txt; offset; number_of_bytes_to_read)
"=10"
"ba"
q)

Unless your file has fixed-length records, however, you may find it easier - assuming you have the head and tail utilities (or similar) available, to use the system function to get exactly the lines you want. For example,

q)first_line: first system "head -1 lines.txt"
q)

(Note the call to first; system always returns a list of strings, even when there is only one.) This particular example is handy when you need to examine the start of a file to figure out how to read it properly.

How do I parse a ctrl-A delimited string?

Short answer: "\001" vs data

Use the vs (vector from scalar) function:

q)fstring
"f1=va\001f2=vb\001f3=vc"
q)"\001" vs fstring
"f1=va"
"f2=vb"
"f3=vc"
q)

We can combine vs with /: (each right) to break up each component:

q)"=" vs/: "\001" vs fstring
"f1" "va"
"f2" "vb"
"f3" "vc"
q)

Note that you cannot put any space between vs and /:. If you did, that would be a comment.

We might next turn fstring's contents into a table using a combination of flip and ! (dict):

q)flip "=" vs/: "\001" vs fstring
"f1" "f2" "f3"
"va" "vb" "vc"
q)`field`value ! flip "=" vs/: "\001" vs fstring
field| "f1" "f2" "f3"
value| "va" "vb" "vc"
q)flip `field`value ! flip "=" vs/: "\001" vs fstring
field value
-----------
"f1"  "va" 
"f2"  "vb" 
"f3"  "vc"
q)

This is almost what we want. Usually, though, you want the field names to be symbols rather than strings. We can do that by applying (@) the $ (cast) operator to (only) the values of the field column:

q)columns: @[flip "=" vs/: "\001" vs fstring; 0; `$]
q)columns
f1   f2   f3  
"va" "vb" "vc"
q)flip `field`value ! columns
field value
-----------
f1    "va" 
f2    "vb" 
f3    "vc" 
q)

However, there is a shortcut we can take to deconstruct the string into a table using one of the many variants of 0::

q)flip `field`value ! "S=\001" 0: fstring
field value
-----------
f1 "va"
f2 "vb"
f3 "vc"
q)

Not only is this last example more succinct, it's much faster:

q)\t do[100000;
        flip `field`value !
                 @[flip "=" vs/: "\001" vs fstring; 0; `$]]
485
q)\t do[100000; flip `field`value ! "S=\001" 0: fstring]
110
q)

See also How do I build a ctrl-A delimited string?

How do I build a ctrl-A delimited string?

Short answer: "\001" sv ("one"; "two"; "three")

To build delimited strings in general, use the sv (scalar from vector) function:

q)", " sv ("Hello"; "world!")
"Hello, world!"
q)

As in many programming languages, the syntax for non-printable ASCII characters is borrowed from C, although only a subset of C's sequences is supported:

        \n      newline
        \r      carriage return
        \t      tab

(q also supports a few printable escape sequences: \\, \" and \'). However, as in C, you can express any ASCII code in q by following a backslash with the code's 3 digit octal representation:

q)"\054\040" sv ("Hello"; "world!")
"Hello, world!"
q)

Since ctrl-A is ASCII 1, we need to use \001:

q)fstring: "\001" sv ("f1=va"; "f2=vb"; "f3=vc")
q)fstring
"f1=va\001f2=vb\001f3=vc"

We can use 0: to write fstring to a file, and use an external program to confirm the result:

q)`:test 0: enlist fstring
`:test
q)\\
$ hexdump test
0000000 66 31 3d 76 61 01 66 32 3d 76 62 01 66 33 3d 76
0000010 63 0a                                          
0000012

To break fstring up into its parts, see How do I parse a ctrl-A delimited string?

How do I save a table to delimited text?

One way, if the format you want is csv, is automatically invoked when using the save function if the name of the destination file ends in .csv:

q)t: ([] x: `a`b; y: 1 2)
q)save `:t.csv
q)\cat t.csv
"x,y"
"a,1"
"b,2"
q)

For other delimiters, you must first format the text using the overload of the 0: function whose first parameter is a character:

q)t: ([] x: `a`b; y: 1 2)
q)"|" 0: t
"x|y"
"a|1"
"b|2"
q)

Notice that, by default, q saves the table column names as a header in the first line of the file. If you want to get rid of the header line, use _ (drop):

q)t: ([] x: `a`b; y: 1 2)
q)1 _ "\t" 0: t
"a\t1"
"b\t2"
q)

Once you have your data as a list of strings complete with delimiters, you use another overload of the 0: function to save the data to disk; this time, the first parameter is the file handle (name) of the destination file:

q)t: ([] x: `a`b; y: 1 2)
q)`:filename.psv 0: 1 _ "|" 0: t
`:filename.psv
q)\cat filename.psv
"a|1"
"b|2"
q)

Since we can't seem to keep all of the overloads for 0: straight (there are more we didn't cover here), we like to wrap the above idiom in a function with a descriptive name. For example,

SaveTableDataAsText: {[path; table; delimiter]
hsym[path] 0: 1 _ delimiter 0: table}

See also: .h.cd

How do I load in a csv file into kdb with the headers as table fieldnames?

Insert the keyword enlist before the delimiter argument to 0: or 1:.

(“SSS”; enlist “,”) 0: `:filename.csv

See also: field parsing and operator 0: