You may have seen sed programs even if you didn’t know that’s what they were. In online discussions it’s common to hear someone say s/foo/bar/ as a shorthand to mean “replace foo with bar.” The line s/foo/bar/ is a complete sed program to do such a replacement.
sed comes with every Unix-like operating system and is also available for Windows. It has a range of features for editing files, but sed is worth using even if you only know how to do one thing with it:
sed "s/pattern1/pattern2/g" file.txt > newfile.txt
This will replace every instance of pattern1 with pattern2 in the file file.txt and will write the result to newfile.txt. The original file file.txt is unchanged.
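For example, suppose greetings.txt (an illustrative file name) contains the single line “hello world”. Then

sed "s/hello/goodbye/g" greetings.txt > newfile.txt

writes “goodbye world” to newfile.txt and leaves greetings.txt untouched.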
I used to think there was no reason to use sed when other languages like Python will do everything sed does and much more. Suppose you agree with that. Now suppose you find you often have to make global search-and-replace operations and so you write a script to do this, say a Python script. You’ve got to call your script something, remember what you called it, and put it in your path. How about calling it sed? Or better, don’t write your script, but pretend that you did. If you’re on Linux, it’s already in your path. One advantage of the real sed over your script named sed is that the former can do a lot more, should you ever need it to.
Now for a few details regarding the sed command above. The “s” on the front stands for “substitute” and the “g” on the end stands for “global.” Without the “g” on the end, sed would only replace the first instance of the pattern on each line. If that’s what you want, then remove the “g.”
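A quick way to see the difference:

echo "aaa" | sed "s/a/b/"     # prints baa: only the first instance on the line
echo "aaa" | sed "s/a/b/g"    # prints bbb: every instance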
The patterns inside a sed command are regular expressions, so it’s best to get in the habit of always quoting sed commands. This isn’t necessary for simple string substitutions, but regular expressions often contain characters that you’ll need to prevent the shell from interpreting.
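For example, a command to strip trailing spaces contains both a space and a *, either of which the shell could mangle if the command were unquoted (trimmed.txt is just an illustrative name):

sed "s/ *$//" file.txt > trimmed.txt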
You may find the default regular expression support in sed odd or restrictive. If you’re used to regular expressions in Perl, Python, JavaScript, etc., and you’re using a GNU implementation of sed, you can add the -r option for more familiar regular expression syntax.
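For example, grouping and the + repetition operator work unescaped under -r:

echo "foofoo bar" | sed -r "s/(foo)+/X/"     # prints "X bar"
echo "foofoo bar" | sed "s/\(foo\)\+/X/"     # the same thing in the default syntax

(The \+ in the second form is itself a GNU extension.)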
I got the idea for this post from Greg Grouthaus’ post Why you should learn just a little Awk. He makes a good case that you can benefit from learning just a few commands of a language like Awk with no intention to learn more of the language.
I have used sed for quite a while now. I have found its default regular expression syntax quite restrictive, and even with extended regular expressions, matching is still implemented as a DFA per the POSIX standard (this can be annoying when you’re using an expression like foo.*bar, since foobar may simply just match foo.*), and it does not have the familiar character classes, again for POSIX compliance.

The best way to accomplish everything that sed and awk do is to use the following command:

perl -p -e 'some perl command' myfile > newfile

This tells the Perl interpreter to execute the Perl command on each line of myfile, assigning the line in question to $_ (as far as the Perl command, or set of commands, is concerned). This construct has more than fulfilled all my needs which I would have previously used awk or sed for, and Perl is shipped with every Unix distro anyway. See this article: http://www.techrepublic.com/article/use-command-line-perl-to-make-unix-administration-easier/1044668 . Cheers!

Dan, thanks for the tip. Along with Greg Grouthaus’ line of thinking, someone could use Perl as you suggested without learning any more Perl. The “some perl command” could be sed commands of the form in this post.

In fact, this is how I use Perl, pretty much. That is:
cat file | perl -pe "s/old/new/g" > newfile
I third ‘perl -e’. I use it all the time, much more than invoking Perl scripts, although for more complicated jobs the script is the way to go.
I don’t use ‘perl -e’ so much for file editing as for things like:
perl -e "foreach $i (0..999) {mkdir "foo$i"}"
I haven’t found a simpler or more convenient way to do this on Windows. PowerShell could probably do it, and I’m sure there’s a way to install and use Unix-y tools to do it, too. But I find the Perl one-liners extremely convenient.
Some distributions come with a command called ‘replace’.
I often use it for simple recursive substitutions in multiple files at once:
find . -type f -name "*.ext" | xargs replace "BEFORE" "AFTER" --
@Will: Why the use of cat?
@Chris: it’s a common pattern if you’re using more than one pipe. E.g.: cat foo | sed -e 's/foo/bar/' | grep '37' > foo.37s.fixed_foo. Doing it this way makes it easier to reorder the pipe components, and also makes it easier for people who look at your scripts to tell what file you’re working with. Also, it’s a bit easier to reason about if you’re doing a for x in a b c; do … done loop. In the end, though, I find it’s most common amongst people who do a lot of work on one command line (which, like many things, is naughty and something you should never do, even though everyone starts doing it sooner or later).
With -i you can do all the changes within the original file without the need to create a second output file (I think it just moves the output to the input after it’s done).
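For example, with GNU sed:

sed -i "s/pattern1/pattern2/g" file.txt

(BSD/macOS sed expects a backup suffix after -i; pass an empty one, as in sed -i '' "s/pattern1/pattern2/g" file.txt.)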
Some parts of Perl were designed to appeal to then-current UNIX users. The Perl Power Tools were a proof-of-concept reimplementation of the (common) UNIX tools, and generally only took a few lines for each tool.
As to “learn one xyz command,” the fact that these sorts of commands are re-implemented over and over indicates that there is a common problem that isn’t solved by the UNIX tools. That problem might just be the horrible documentation, but I’m not so sure.
If you want to replace newline characters, that’s a problem with sed. Also if your text spans multiple lines. After I learned Perl, I thought Perl superseded all the Unix commands, but actually I find it faster and more efficient to mine Apache log files by piping through several of sed/awk/grep/cat/sort, etc. These days, all my text processing scripts are Elisp in Emacs.
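A sketch of the multi-line case in Perl, using -0777 to slurp the whole file so a pattern can cross line boundaries (the file names are illustrative):

perl -0777 -pe 's/foo\nbar/foobar/g' input.txt > newfile.txt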