Open Source Gem


A little-known, very powerful data processor for your scripts, datamash makes long, complex calculations simple.

GNU datamash [1] is a command-line program that can analyze, summarize, or transform tables of numbers, with or without accompanying text, stored in plaintext files. For these kinds of tasks, datamash is often a faster, more productive alternative to tools like AWK or sed, or to a general-purpose scripting language.

Just like those other tools, datamash is a good team player in the traditional Unix and Linux sense: You can use it interactively at the prompt, run it automatically in shell scripts, and even connect it directly to other programs (including itself!) via Unix pipes.

What's more, in almost all the cases I have seen or can imagine, datamash does what you need with less typing than those alternatives, possibly a lot less. Last but not least, datamash lets you easily perform basic quality checks on raw data. I’ll show you how to do all this from scratch, starting with the basic options and ways of working with datamash and then moving on to more complicated examples.


