linux

Linux snippets: sorting a file, ignoring the header

When working with large data files that have a header, it is sometimes more efficient to sort the files first so that a streaming algorithm can be used to evaluate them. You may also simply want to sort the data by some key for organization and readability. Either way, a lot of data preparation involves doing something to a delimited file with a header while preserving the position and contents of that header.

Here is a short example that sorts a tab-delimited file with a header by the first field in the file:

(head -n 1 data.tsv && tail -n +2 data.tsv | sort -t$'\t' -k1,1) > data_sorted.tsv

This command spawns a subshell that runs everything in the parentheses and redirects the combined output to a second file. Within the parentheses, we first get the header (head -n 1). Then we run another command that takes everything except the header (tail -n +2) and pipes it to the sort utility. The arguments to sort are the field to sort by (-k1,1, which restricts the key to the first field; a bare -k1 would use everything from the first field to the end of the line) and a delimiter (-t$'\t', which is bash and zsh syntax for a tab; in other shells you can type a literal tab between plain quotes by pressing Ctrl-V followed by Tab). You could substitute whatever routine you want for sort.
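Since sort can be swapped for any other routine, the same idea can be wrapped in a small function. The following is a minimal sketch, not part of the original one-liner: body is a hypothetical helper name, and the sample rows are made up for illustration. It reads the header from standard input once, prints it unchanged, and then runs whatever command you pass on the remaining lines.

```shell
# body: print the first line of stdin unchanged, then run the given
# command on the remaining lines. (Hypothetical helper name.)
body() {
    IFS= read -r header          # consume only the header line
    printf '%s\n' "$header"      # emit the header as-is
    "$@"                         # the command reads the rest of stdin
}

# Example with inline sample data (made up for illustration).
# "$(printf '\t')" is a portable way to pass a tab to sort -t.
printf 'id\tname\n3\tcarol\n1\talice\n2\tbob\n' |
    body sort -t "$(printf '\t')" -k1,1
```

With a real file, the equivalent call would look like body sort -t "$(printf '\t')" -k1,1 < data.tsv > data_sorted.tsv, which reads the input only once instead of twice.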

Linux snippets: using xclip to pipe to the system clipboard

I often write scripts to generate code, typically when I need a large number of SQL column names. If I then want to paste the output into a file in the appropriate place, I can either copy and paste from the terminal (which is cumbersome, especially on Linux) or pipe it to a file and then copy and paste from there (which is also a bit unwieldy).

Instead, we can save a step by piping directly to the system (X) clipboard using xclip. On Ubuntu, we can install it from the repositories:

sudo apt-get install xclip

By default, xclip does not put its input onto the system clipboard. It writes to the X primary selection instead, so you can middle-click to paste in X applications, but the regular paste command in your IDE will not see the text. To change that, I created an alias in my .bashrc (or .zshrc) file:

alias xclip='xclip -selection c'

Then, you can pipe to the system clipboard with:

cat long_file.txt | xclip

Now you can paste the output of cat long_file.txt with the system paste command into any other application.
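An alias named xclip shadows the original command, so the default behavior is no longer available under that name. As a hedged alternative, here is a minimal sketch that uses two separate shell functions instead. The names toclip and fromclip are hypothetical; the only xclip options used are -selection and -o (print the selection to standard output).

```shell
# Hypothetical helper names; these are not part of xclip itself.

# toclip: copy stdin (or the named files) to the CLIPBOARD selection.
toclip() {
    xclip -selection clipboard "$@"
}

# fromclip: print the current CLIPBOARD contents to stdout.
fromclip() {
    xclip -selection clipboard -o
}

# Usage (requires xclip and a running X session):
#   cat long_file.txt | toclip
#   fromclip > pasted.txt
```

Put these in .bashrc or .zshrc in place of the alias if you would rather keep plain xclip untouched.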