I was looking back at Jeroen Janssen’s book Data Science at the Command Line and his dseq utility caught my eye. This utility prints out a sequence of dates relative to the current date. I’ve needed this and didn’t know it.
Suppose you have a CSV file and you need to add a column of dates as the first column. I’d probably open a spreadsheet, create a column of the dates I needed, then open the CSV file and paste in the column of dates.
With Jeroen’s utility I could run
dseq 5 | paste -d, - foo.csv
to create the same sequence of dates and add them as the first column of the file foo.csv. The option -d, tells paste to use a comma as the field separator rather than the default tab. The dash tells paste to use the piped output from dseq as its first input file.
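As a minimal sketch of how paste combines standard input with a file, the example below uses two fixed dates in place of the output of dseq so that the result is reproducible. The temporary file and its contents are hypothetical stand-ins for foo.csv.

```shell
# Hypothetical sample data standing in for foo.csv.
tmp=$(mktemp)
printf '%s\n' 'alice,3' 'bob,7' > "$tmp"

# Fixed dates stand in for dseq output. The "-" tells paste to read its
# first column from standard input; -d, joins columns with a comma.
result=$(printf '%s\n' 2024-01-01 2024-01-02 | paste -d, - "$tmp")
printf '%s\n' "$result"
# 2024-01-01,alice,3
# 2024-01-02,bob,7

rm -f "$tmp"
```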
You can run dseq three ways. With one argument, such as the 5 above, it returns the next five days from today (starting with tomorrow). With two arguments, the first is the beginning and the second is the end. With three arguments, the middle argument is an increment. As the source file summarizes it:
# Usage: dseq LAST
#    or: dseq FIRST LAST
#    or: dseq FIRST INCREMENT LAST
If you just want to use dseq, grab it here. If you’d like to understand how dseq is implemented, maybe in order to modify it, keep reading.
How it works
The code is a clever one-liner:
seq -f "%g day" "$@" | date --file - +%F
The source file has 17 lines: a shebang, several lines of documentation, and one line of code.
The one-liner starts with seq, a standard utility that produces a sequence of numbers. Like many command line utilities, seq is trivial, but it composes nicely with other utilities. And so it can be used in a pipeline to create useful scripts, as it is above.
The argument "$@" simply passes on the arguments of the script calling seq as arguments to seq. So the arguments of dseq become the arguments to seq.
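To see the argument forwarding in isolation, here is a small sketch with a hypothetical wrapper function named myseq. It does nothing but hand its arguments to seq, so it accepts the same three calling forms as dseq.

```shell
# Hypothetical wrapper: "$@" forwards every argument to seq unchanged.
myseq() {
  seq "$@"
}

myseq 3       # LAST:                 prints 1 2 3, one per line
myseq 2 4     # FIRST LAST:           prints 2 3 4, one per line
myseq 0 2 6   # FIRST INCREMENT LAST: prints 0 2 4 6, one per line
```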
The rest of the call to seq is formatting. It tells seq to append " day" after each number. The command
seq -f "%g day" 5
produces
1 day
2 day
3 day
4 day
5 day
This creates strings which the date utility will interpret.
The command
date -d "1 day"
returns the date one day from now. It includes the time, so it’s the date and time 24 hours from now.
The command
date -d "1 day" +%F
uses the format string +%F to format the date like YYYY-MM-DD, chopping off the time.
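The output of the commands above depends on the day you run them. For a reproducible illustration, the sketch below gives GNU date a fixed starting date (an assumption made only for this example) followed by the relative offset. The TZ=UTC prefix is likewise an addition to keep the result independent of the local time zone.

```shell
# Requires GNU date. The fixed base date 2024-02-28 is an arbitrary choice.
TZ=UTC date -d "2024-02-28 +1 day" +%F   # 2024-02-29 (2024 is a leap year)
TZ=UTC date -d "2024-02-28 +2 day" +%F   # 2024-03-01
```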
The date option --file says to take a file as input and process each line as if it were passed into date with the -d option. The dash option says to use standard input as the file, just as the example with the paste command above used dash to signify standard input, i.e. the output of the command to the left of the pipe symbol.
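Putting the pieces together, the following sketch has the same pipeline shape as the one-liner but anchors every line at a fixed date instead of today, so the output is the same whenever it is run. The base date and the TZ=UTC prefix are assumptions added for reproducibility; they are not part of dseq.

```shell
# Requires GNU date. seq emits "2024-12-30 +1 day", "2024-12-30 +2 day", ...
# and date reads one such string per line from standard input.
seq -f "2024-12-30 +%g day" 3 | TZ=UTC date --file - +%F
# 2024-12-31
# 2025-01-01
# 2025-01-02
```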
Note that this script works with the GNU coreutils implementation of date. It does not work, for example, with the version of date that ships with macOS.
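One possible workaround, offered as an untested-on-every-platform sketch rather than part of Jeroen's script, is a wrapper that picks whichever date implementation is available. The function name dseq_portable is hypothetical. The sketch assumes that GNU date may be installed under the name gdate (as Homebrew's coreutils package does) and otherwise falls back to the BSD date -v flag, which only handles whole-number day offsets.

```shell
# Hypothetical portable variant of dseq.
dseq_portable() {
  if date --version >/dev/null 2>&1; then
    # GNU date is the default date: use the original one-liner.
    seq -f "%g day" "$@" | date --file - +%F
  elif command -v gdate >/dev/null 2>&1; then
    # GNU date installed alongside the system date under the name gdate.
    seq -f "%g day" "$@" | gdate --file - +%F
  else
    # BSD date: -v+Nd / -v-Nd shifts the current date by N days.
    # An explicit sign is required; a bare number would set the day of month.
    for n in $(seq "$@"); do
      case $n in
        -*) adj=$n ;;
        *)  adj="+$n" ;;
      esac
      date -v"${adj}d" +%F
    done
  fi
}

dseq_portable 3   # the next three dates, starting with tomorrow
```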
Thank you, John! Your blog has many command-line gems that have served as inspiration for the book.
I like how you’ve explained `dseq`. I must admit that that one-liner made me think again, which makes me wonder whether it’s not too clever. Then again, it’s robust and has served me well over the past decade.
To tie this back to the use case you mentioned in the introduction, this one-liner adds a column of dates to a CSV file named my.csv, regardless of its length:
```
paste -d, <(echo "date"; dseq $(($(< my.csv wc -l) -1))) my.csv
```
It once again shows the power of the command line but IMHO this is taking it too far. At this point it's probably better to use AWK. I'm looking forward to your next blog post :)
Cheers,
Jeroen
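The one-liner in the comment above relies on Bash process substitution. As a sketch of the same idea for shells without that feature, a brace group can feed the header and the dates to paste through a pipe. The sample file and its contents are hypothetical stand-ins for my.csv, and dseq is defined inline as a function using the one-liner from the post, so GNU date is required.

```shell
# dseq as a shell function, using the one-liner from the post.
dseq() { seq -f "%g day" "$@" | date --file - +%F; }

# Hypothetical stand-in for my.csv: one header row and two data rows.
csv=$(mktemp)
printf '%s\n' 'name,value' 'alice,3' 'bob,7' > "$csv"

# One date per data row: total line count minus the header line.
rows=$(wc -l < "$csv")
n=$((rows - 1))

# The brace group prints the new header, then the dates; paste reads them
# from standard input ("-") and joins them to the CSV with commas.
out=$({ echo date; dseq "$n"; } | paste -d, - "$csv")
printf '%s\n' "$out"

rm -f "$csv"
```

The first output line is date,name,value; the remaining lines depend on the day the script is run.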
Jeroen’s example in his comment can avoid the ‘echo date’ by inserting 0 as dseq’s first argument.
It’s a shame that the -1 is needed later in the line, but that’s due to seq(1)’s arguments being a closed interval. We now know a half-open interval is generally more useful, unlike back when Kernighan created seq in Eighth Edition Unix.
I’ve used date(1)’s -d option together with seq(1) to produce dates in the past. Beware of time zones and in particular daylight saving adjustments. Asking date to add ‘3 day’ adds that much time; it does not wind today’s date on by three days. If the result is near a daylight-saving boundary then ‘3 day’ can give the same date as ‘2 day’ or ‘4 day’ if the command is run at an unfortunate time of day.
$ TZ=America/Los_Angeles date -d '93 day'
2023-11-05 00:51:33 -0700 Sun
$ TZ=America/Los_Angeles date -d '94 day'
2023-11-05 23:51:38 -0800 Sun
Using the UTC time zone avoids this. Giving the time as 12:00 also avoids a leap second causing a similar problem.
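The effect can be imitated reproducibly with explicit epoch arithmetic. The sketch below starts from the first timestamp shown above and adds exactly 86400 seconds. It is an illustration of the pitfall, not a claim about how date computes relative offsets internally. The POSIX-style TZ rule string is an assumption used in place of America/Los_Angeles so that the example does not depend on an installed time zone database.

```shell
# Requires GNU date (for -d @SECONDS).
# POSIX TZ rule: Pacific time, DST ends on the first Sunday of November.
pacific='PST8PDT,M3.2.0,M11.1.0'
t=1699170693   # 2023-11-05 00:51:33 -0700, shortly before DST ends

# In Pacific time, 24 hours later is still the same calendar date.
TZ=$pacific date -d "@$t" +%F                # 2023-11-05
TZ=$pacific date -d "@$((t + 86400))" +%F    # 2023-11-05 (23:51:33 -0800)

# In UTC there is no DST shift, so the date advances as expected.
TZ=UTC date -d "@$t" +%F                     # 2023-11-05
TZ=UTC date -d "@$((t + 86400))" +%F         # 2023-11-06
```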