Some programmers and systems engineers try to do everything they can with basic command line tools on the grounds that someday they may be in an environment where that’s all they have. I think of this as a sort of computational survivalism.
I’m not much of a computational survivalist, but I’ve come to appreciate such a perspective. It’s an efficiency/robustness trade-off, and in general I’ve come to appreciate the robustness side of such trade-offs more over time. It especially makes sense for consultants who find themselves working on someone else’s computer with no ability to install software. I’m not often in that position, but that’s kinda where I am on one project.
Example
I’m working on a project where all my work has to be done on the client’s laptop, and the laptop is locked down for security. I can’t install anything. I can request to have software installed, but it takes a long time to get approval. It’s a Windows box, and I requested a set of ports of basic Unix utilities at the beginning of the project, not knowing what I might need them for. That has turned out to be a fortunate choice on several occasions.
For example, today I needed to count how many times certain characters appear in a large text file. My first instinct was to write a Python script, but I don’t have Python. My next idea was to use grep -c, but that would count the number of lines containing a given character, not the number of occurrences of the character per se.

I did a quick search and found a Stack Overflow question “How can I use the UNIX shell to count the number of times a letter appears in a text file?” On the nose! The top answer said to use grep -o and pipe it to wc -l.

The -o option tells grep to output the regex matches, one per line. So counting the number of lines with wc -l gives the number of matches.
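For instance, counting the occurrences of the letter a might look like this (a minimal sketch; the filename is a placeholder):

grep -o 'a' bigfile.txt | wc -l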
Computational minimalism
Computational minimalism is a variation on computational survivalism. Computational minimalists limit themselves to a small set of tools, maybe the same set of tools as a computational survivalist, but for different reasons.
I’m more sympathetic to minimalism than survivalism. You can be more productive by learning to use a small set of tools well than by hacking away with a large set of tools you hardly know how to use. I use a lot of different applications, but not as many as I once used.
Comments

I didn’t know about the -o option, but `grep … | sort | uniq -c | sort -nr` is very firmly embedded in my muscle memory.
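For the character-counting task above, that idiom would look something like this sketch (the filename is hypothetical): grep -o puts each match on its own line, sort groups the duplicates, uniq -c tallies them, and the final sort -nr orders the counts descending.

grep -o '[abc]' file.txt | sort | uniq -c | sort -nr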
And awk/gawk is for everything else. Unless you also have TCL.
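For example, a minimal awk sketch of the same character count, assuming a hypothetical file.txt: gsub returns the number of substitutions it makes on each line, and the running total is printed at the end (the +0 forces a numeric 0 when there are no matches).

awk '{ n += gsub(/a/, "") } END { print n+0 }' file.txt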
You should have requested a Go compiler at the beginning. Et voilà.
I didn’t know about `-o`. That’s handy.
I think I told you about Gow years ago, and you weren’t interested. I’m glad you’ve come around. :)
That is the bioinformatician way! We stick to the Unix basics because they are reliable, efficient (handy when working with huge files), work on text files, and are available on most Unix machines.
To count all characters at once you can use
fold -w1 file | sort | uniq -c
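(fold -w1 wraps its input to one character per line, so sort | uniq -c then produces a tally for every distinct character in the file.)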
Fatih: Very clever. I would not have thought of using fold that way.
I tend to install BusyBox on Windows: https://frippery.org/busybox/ and also the Tiny C compiler https://sourceforge.net/projects/tinycc-win32/
That gives me a UNIX environment from the 1990s (or even later) for about 2 MB (*not* GB), with most of the common utilities, including awk, plus the vi editor, and a fast ANSI C compiler (though the executables are up to two times slower than gcc-produced executables; not a real-world problem most of the time).
Well, Windows has PowerShell installed, and it is quite powerful.
Just checked, and in PowerShell Core the command

> (Get-Content ".\pareto.R" | Select-String -Pattern "a" -AllMatches).Matches.Count

produced exactly the same count (26) on both Windows 10 and Linux (Ubuntu 18.04) PowerShell installations.
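(Here -AllMatches makes Select-String report every match on each line rather than just the first, and PowerShell’s member enumeration flattens the per-line .Matches collections so .Count gives the total across the whole file.)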