PWC111 - Ordered Letters


On with TASK #2 from the Perl Weekly Challenge #111. Enjoy!

The challenge

Given a word, you can sort its letters alphabetically (case insensitive). For example, “beekeeper” becomes “beeeeekpr” and “dictionary” becomes “acdiinorty”.

Write a script to find the longest English words that don’t change when their letters are sorted.

The questions

Can I overengineer it?!?

For reasons that will be clear shortly…

The solution

I know. It says English words.

But but…

… I wanted to make it more generic.

So I took a chance to look at the Perl Unicode Cookbook, hoping to not make too much of a mess:

#!/usr/bin/env perl
use utf8;                     # so literals and identifiers can be in UTF-8
use v5.24;
use strict;                   # quote strings, declare variables
use warnings;                 # on by default
use warnings qw(FATAL utf8);  # fatalize encoding glitches
use open qw(:std :utf8);      # undeclared streams in UTF-8
use experimental qw< postderef signatures >;
no warnings qw< experimental::postderef experimental::signatures >;

use Unicode::Normalize;
use Unicode::Collate;
use Encode qw(decode_utf8);

@ARGV = map { decode_utf8($_, 1) } @ARGV;

my @pairs;
while (<>) {
   my $pair = check_ordered(NFD($_)) // next;
   push @pairs, $pair;
say for reverse map { $_->[1] } sort { $a->[0] <=> $b->[0] } @pairs;

sub check_ordered ($x) {
   state $coll = Unicode::Collate->new(level => 1);
   state $es = Unicode::Collate->new(level => 1, normalization => undef);
   my @chars = $x =~ m{(\X)}gmxs;
   shift @chars while @chars && $chars[0] =~ m{[\h\v]}mxs;
   pop @chars   while @chars && $chars[-1] =~ m{[\h\v]}mxs;
   my $original   = join '', @chars;
   my $rearranged = join '', $coll->sort(@chars);
   return [scalar(@chars), $original] if $es->eq($original, $rearranged);
} ## end sub check_ordered ($x)

I have to admit that I don’t understand the 100% of it. In particular, I’m using two instances of Unicode::Collate:

but I didn’t really understand what the difference is.

The comparison and check is performed without caring for either case or accent. The latter should not be an issue in English, although I noticed that it gives the green light to words like access's. Go figure.

The input list for the words is taken from /usr/share/dict/words.

Running the program gives back all the words, so the filtering can be done from the shell:

$ perl perl/ /usr/share/dict/words | head

It takes a bit… but it’s hopefully correct.

It’s interesting that the longest word composed of letters only is… billowy. Today I learned that it indicates something that is full or forming large waves or swell of something (I guess water, usually).

Stay safe everybody!

Comments? Octodon, Twitter, GitHub, Reddit, or drop me a line!