ETOOBUSY 🚀 minimal blogging for the impatient
Pounded by #
TL;DR
I’ve been hit by an alleged non-bug in Raku.
While doing some parsing for Advent of Code 2018 puzzle 4, I ended up with the following regular expression:
/Guard \s+ \# (\d+)/
Alas, this does not work in Raku. The # character is treated as the start of a comment despite the preceding backslash, which makes the rest of the line invisible to the parser and makes the compilation fail spectacularly:
$ raku
Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2021.07.
Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
Built on MoarVM version 2021.07.
To exit type 'exit' or '^D'
> '[1518-11-01 00:00] Guard #10 begins shift' ~~ /Guard \s+ \# (\d+)/
===SORRY!===
Regex not terminated.
at line 2
------> <BOL>⏏<EOL>
Unable to parse regex; couldn't find final '/'
at line 2
------> <BOL>⏏<EOL>
expecting any of:
infix stopper
I looked around and it seems that there is no plan to fix this in Rakudo, see the issue titled “'#' literals in Grammars: syntax error”. So I opened a documentation issue about this.
The workaround is to put the # character in quotes:
/Guard \s+ '#' (\d+)/
which works fine:
> '[1518-11-01 00:00] Guard #10 begins shift' ~~ /Guard \s+ '#' (\d+)/
「Guard #10」
0 => 「10」
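For use in a script rather than at the REPL, the quoted form might be wrapped in a small helper. This is only a sketch: the name guard-id and the choice to return Nil when nothing matches are assumptions made for illustration, not something the puzzle or Raku prescribes.

```raku
# Hypothetical helper (name and Nil-on-failure are illustrative choices):
# extract the guard ID from a log line, or return Nil if there is none.
sub guard-id(Str $line) {
    # The '#' is quoted, so it is matched literally
    # instead of being read as the start of a comment.
    $line ~~ / Guard \s+ '#' (\d+) / ?? $0.Int !! Nil;
}

say guard-id('[1518-11-01 00:00] Guard #10 begins shift');
say guard-id('[1518-11-01 00:05] falls asleep').defined;
```

The first call should print the numeric ID, while the second line has no guard in it and so the helper's result is undefined.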
Actually, “workaround” is a bit of a misnomer, as it’s really a different, approved and maybe even suggested way of doing this kind of thing. But you know, conciseness.
Another way of doing this might be to create a character class for the character, like this:
/Guard \s+ <[#]> (\d+)/
Is it any better? I don’t know, maybe it’s a bit too line-noisy…
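To compare the two spellings side by side, here is a small sketch that runs both against the same sample line and checks that they capture the same digits. The variable names are made up for the example, and it assumes the character class form behaves in a script as described here.

```raku
my $line = '[1518-11-01 00:00] Guard #10 begins shift';

# Same pattern, two spellings of the literal character
my $quoted = $line ~~ / Guard \s+ '#'   (\d+) /;
my $class  = $line ~~ / Guard \s+ <[#]> (\d+) /;

# Both matches should have captured the same guard ID
say $quoted[0].Str eq $class[0].Str;
```

If both forms work as expected, the choice between them comes down to readability.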
Anyway, if you need to put # in your Raku regular expressions… quote it or, at least, don’t escape it!
Stay safe and have -Ofun, people!