ETOOBUSY 🚀 minimal blogging for the impatient
Grep o Treat
TL;DR
there is more to GNU grep than I thought for a long time, including the possibility of avoiding sed from time to time.
I guess everybody has those nice little Today I Learnt… moments every now and then. This happened to me not long ago, while reading an article that used GNU grep in a way that’s not seen every day. And, by the way, it’s not portable (i.e. not supported in POSIX grep), if you’re wondering.
A Couple Of Interesting Options
GNU grep provides a couple of interesting command-line options, like these:
-o
: prints only the matched text, instead of the whole line that contains it. So, if you’re looking for IP Addresses in some text and you’re fine with a simplified, approximate regex, you can do something like this:
grep -o '[0-9]\+\(\.[0-9]\+\)\{3\}' <input.txt
-P
: activates the mighty engine for Perl-Compatible Regular Expressions (a.k.a. PCRE). This basically boils down to using Perl’s regular expressions, which provide much more flexibility. The previous example would become:
grep -Po '\d+(\.\d+){3}' <input.txt
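To see the two forms side by side, here is a minimal sketch run against a made-up input instead of input.txt. The variable name sample and the addresses in it are assumptions for illustration only; GNU grep built with PCRE support is assumed as well.

```shell
# Hypothetical sample input; the addresses are invented for this example.
sample='host 192.168.1.10 talked to 10.0.0.1
no address on this line'

# Basic regular expression; \+ is a GNU extension.
printf '%s\n' "$sample" | grep -o '[0-9]\+\(\.[0-9]\+\)\{3\}'

# Same search with PCRE: fewer backslashes, \d for digits.
printf '%s\n' "$sample" | grep -Po '\d+(\.\d+){3}'
```

Both commands should print each address on a line of its own, and nothing for the line without any address.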
Also, \K Can Save The Day
Using PCRE gives much more than just fewer backslashes or shortcuts for matching digits; some of the add-ons play particularly well with option -o.
One such feature is \K, which resets the start of the reported match to the position where it appears in the pattern.
Let’s see an example.
Suppose that you’re looking for the right IP Address in your input, the one that is preceded by the word “right”. This can of course be accomplished by just putting the additional constraint in the regular expression:
grep -Po 'right \d+(\.\d+){3}' <input.txt
Alas, this kind of defeats the purpose of -o though, because now we also get the word right in the output! This is where \K comes to the rescue:
grep -Po 'right \K\d+(\.\d+){3}' <input.txt
This says that a successful match MUST include right, as we wish, but everything before \K is left out of the reported match. So we are back to our IP-address-only output, like before.
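For comparison, here is a rough sketch of the same extraction done with GNU grep alone and with sed. The sample text is made up, and the sed command is only one possible equivalent (it relies on GNU sed for \+), not necessarily what anyone would write in practice.

```shell
# Hypothetical sample input for illustration only.
sample='the wrong one is 10.9.8.7
this is right 10.1.12.34'

# GNU grep alone: \K drops "right " from the reported match.
printf '%s\n' "$sample" | grep -Po 'right \K\d+(\.\d+){3}'

# A rough sed equivalent: capture the address and print only that.
printf '%s\n' "$sample" | sed -n 's/.*right \([0-9]\+\(\.[0-9]\+\)\{3\}\).*/\1/p'
```

Both should print only the address that follows the word right. One difference worth keeping in mind: this sed command prints at most one address per line, while grep -o reports every match on a line.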
And \b, Of Course
Another interesting escape sequence available in PCRE is \b, a zero-width match for a word boundary (GNU grep also accepts it without -P, as an extension).
In our example, suppose that your file might also contain IP Addresses that are adjacent to words, like this:
this is not right 10.20.1.230bis
...
this is right 10.1.12.34
To tell the two situations apart, we can just require that our approximate pattern for IP addresses ends on a word boundary:
grep -Po 'right \K\d+(\.\d+){3}\b' <input.txt
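The effect of the trailing \b can be checked with a small sketch that feeds the two example lines above through both patterns. The variable name sample is just a placeholder for this illustration.

```shell
# The two example lines from above, without the elided part in between.
sample='this is not right 10.20.1.230bis
this is right 10.1.12.34'

# Without \b, the address glued to "bis" is reported too.
printf '%s\n' "$sample" | grep -Po 'right \K\d+(\.\d+){3}'

# With \b, the match must end on a word boundary, so only the clean address remains.
printf '%s\n' "$sample" | grep -Po 'right \K\d+(\.\d+){3}\b'
```

The first command should print both addresses, while the second should print only 10.1.12.34, because there is no word boundary between the final 0 of 230 and the b that follows it.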
Today I Learnt…
… that grep + sed is a wonderful and powerful combination, but sometimes GNU grep alone can do wonders.