PWC161 - Pangrams

TL;DR

On with TASK #2 from The Weekly Challenge #161. Enjoy!

The challenge

A pangram is a sentence or phrase that uses every letter in the English alphabet at least once. For example, perhaps the most well known pangram is:

the quick brown fox jumps over the lazy dog

Using the provided dictionary, so that you don’t need to include individual copy, generate at least one pangram.

Your pangram does not have to be a syntactically valid English sentence (doing so would require far more work, and a dictionary of nouns, verbs, adjectives, adverbs, and conjunctions). Also note that repeated letters, and even repeated words, are permitted.

BONUS: Constrain or optimize for something interesting (completely up to you) […]
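Just to fix ideas: checking whether a phrase is a pangram boils down to lowercasing it, keeping only the letters, and counting how many distinct ones show up. Here’s a minimal, self-contained sketch in Perl (the is_pangram helper is mine, just for illustration):

use v5.24;
use warnings;
use experimental 'signatures';
no warnings 'experimental::signatures';

# a phrase is a pangram when all 26 letters appear at least once
sub is_pangram ($phrase) {
   my %seen;
   $seen{$_}++ for grep { m{[a-z]}mxs } split m{}mxs, lc $phrase;
   return keys(%seen) == 26;
}

say is_pangram('the quick brown fox jumps over the lazy dog')
   ? 'pangram!' : 'not a pangram';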

The questions

Only a few annoying ones:

  • is it OK to consider uppercase and lowercase as the same letter? I’ll assume yes because of the constraint on the provided dictionary
  • isn’t it a bit mean to give such an open space for coming up with something interesting?!?

The solution

OK, I like randomness because it gives the perfect excuse for not doing things properly! This time, just to exercise this laziness a bit, we’ll try to build something that might seem to make some sense.

We’ll be using a Markov Chain text generator: something that looks at how frequently words appear after other words, and then tries to generate new text following the same statistics.
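Under the hood, the model is just a hash of hashes: each word maps to the words that were observed right after it, together with how many times that happened. As a toy example (made-up input, ignoring the dictionary for a moment), the text "the cat sat and the cat ran" would produce:

# each key is a word, each value holds its observed successors with counts
my %successors_for = (
   the => { cat => 2 },
   cat => { sat => 1, ran => 1 },
   sat => { and => 1 },
   and => { the => 1 },
);
# drawing the word after "cat" then means picking "sat" or "ran",
# each with weight 1 (i.e. a 50% chance apiece)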

As the base of our generator we’ll be taking the book Three Men in a Boat (To Say Nothing of the Dog), because it’s funny and I was recently reminded about it. Just to take advantage of the latest and greatest typo fixes, we’ll be downloading it dynamically from the internet:

#!/usr/bin/env perl
use v5.24;
use warnings;
use experimental 'signatures';
no warnings 'experimental::signatures';

use HTTP::Tiny;
use constant DEFAULT_BOOK_URL =>
  'https://www.gutenberg.org/cache/epub/308/pg308.txt';
use FindBin '$Bin';
use List::Util 'sum';

# Assume Unixish filesystem
my $dictionary_file = shift // "$Bin/../../../data/dictionary.txt";
my $book_url = shift // DEFAULT_BOOK_URL;

my $dictionary = read_dictionary($dictionary_file);
my $book = get_stuff($book_url);
my $model = markov_model($book, $dictionary);

# keep drawing words until every letter of the alphabet has been seen
my ($src, $word, %hits, @trail);
while (keys(%hits) < 26) {
   # draw among the successors of the last word, or from the whole
   # dictionary when there is no previous word yet
   $src = $model->{$word // ''} // $dictionary;
   push @trail, $word = random_word_draw($src);
   ++$hits{$_} for split m{}mxs, $word;   # track letters seen so far
}

say join ' ', @trail;

sub random_word_draw ($weighted_candidates) {
   my $total = sum values $weighted_candidates->%*;
   my $draw = rand $total; # definitely space for improvement...
   for my $word (keys $weighted_candidates->%*) {
      $draw -= $weighted_candidates->{$word};
      return $word if $draw < 0;
   }
   die "unreached, hopefully\n";
}

sub read_dictionary ($filename) {
   open my $fh, '<:encoding(utf-8)', $filename or die "open(): $!\n";
   return { map {chomp; $_ => 1} readline $fh };
}

sub get_stuff ($url) {
   my $response = HTTP::Tiny->new->get($url);
   die "$response->{status} $response->{reason}\n"
      unless $response->{success};
   return $response->{content};
}

sub markov_model ($text, $dictionary) {
   my $previous = undef;
   my %successors_for;
   for my $word (split m{[^a-z]+}mxs, lc $text) {
      if ($dictionary->{$word}) {
         $successors_for{$previous}{$word}++ if defined $previous;
         $previous = $word;
      }
      else {
         $previous = undef; # restart
      }
   }
   return \%successors_for;
}

Now, isn’t this a beautiful concoction of overengineering, derailing, and not caring one bit about how, and in how many ways, this can be improved? As an example, the restart in the model building might be avoided altogether, by keeping every word in the chain (dictionary or not) and only counting letters from allowed words when generating.
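For the curious, here’s one possible reading of that suggestion: a variant of markov_model that never restarts, feeding every word to the chain. It reuses the same pragmas as the program above and is a sketch of the idea, not what the program actually does; the generation loop would then count letters only for words that appear in the dictionary:

sub markov_model_norestart ($text) {
   my $previous;
   my %successors_for;
   for my $word (split m{[^a-z]+}mxs, lc $text) {
      next unless length $word;   # skip empty fields from leading separators
      $successors_for{$previous}{$word}++ if defined $previous;
      $previous = $word;          # no restart: every word feeds the chain
   }
   return \%successors_for;
}

# ... while the main loop would only count letters from allowed words:
#   if ($dictionary->{$word}) { ++$hits{$_} for split m{}mxs, $word }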

Anyway, here’s a run:

grinds subtraction belfries arduously agitating putties inelegant consults loftiest mollify concussions municipalities rascals rattler cheeped textile whimpered medleys transfers rewarding empties enormous prices for about its best to take your life was sure it there rode missing his board by the project freshman superseding dressers berets capitalist forage arterial reconciling tandem scooters lazier rabbles abridge seclude overstep madams jell grub eluding sneakiest certain number of five we were into it again an evening the grave as a flesh and live shameful conduct of his hair round before the woods all but it is certainly rather be worrying work is to put upon the sail at this seemed that dipped down at the man hobbling across your time either bank of justice to sell your fellow in its would break off in the wall and he rose growling at all the boat and tried to a knock kneed broken and at the boat they know after a bit of the evening however is most virulent form the husband she could not care or not lie in my friends watch me this agreement will he finished we went on its king the lock it up the water and he said it about donations to ever tried to have been a rock which he swore he wanted more would all the current they said seven and putting my wrist and sit and king the bottom of god draws nearer and with which are not know that is doing more and sat on the bank from this work and nothing whatever way back with me half dressed nothing about take your guessing it is always a fearful row them and then that it down with one evening we had she does you saw on rocks and have her own bedroom of the wave comes without betraying any you took up from the people gather together so sad and look and a few he would not agree to a fraternity in life that wash up his mother gently down not like doing a very big dogs is and the pie between them it you must appear to wake you will do it made our school master had been very happy to be altered foresee worsens aisles horizons grislier vogue quips

We’ll translate this into Raku too, with just a slight concession to my ubiquitous laziness: I don’t really want to look for the equivalent of FindBin at this point, so the dictionary file becomes a mandatory argument. Apart from this, it does everything the other program does:

#!/usr/bin/env raku
use v6;
use HTTP::Tiny;

constant \DEFAULT_BOOK_URL =
  'https://www.gutenberg.org/cache/epub/308/pg308.txt';

sub MAIN (Str:D $dictionary-file, Str:D $book-url = DEFAULT_BOOK_URL) {
   my $dictionary = read-dictionary($dictionary-file);
   my $book = get-stuff($book-url);
   my $model = markov-model($book, $dictionary);

   # keep drawing words until every letter of the alphabet has been seen
   my ($src, $word, %hits);
   my @trail = gather while %hits.elems < 26 {
      $src = $model{$word // ''} // $dictionary;
      take $word = random-word-draw($src);
      ++%hits{$_} for $word.comb;
   };
   @trail.join(' ').put;
}

sub random-word-draw ($weighted-candidates) {
   my $total = $weighted-candidates.values.sum;
   my $draw = $total.rand;
   for $weighted-candidates.kv -> $word, $weight {
      $draw -= $weight;
      return $word if $draw < 0;
   }
   die "unreached, hopefully\n";
}

sub read-dictionary ($filename) {
   $filename.IO.words.map({$_ => 1}).Hash;
}

sub get-stuff ($url) {
   my $response = HTTP::Tiny.new.get($url);
   die "$response<status> $response<reason>\n"
      unless $response<success>;
   return $response<content>.decode;
}

sub markov-model ($text, $dictionary) {
   my $previous = Nil;
   my $successors-for;
   for $text.lc.split(/<-[ a..z ]>+/) -> $word {
      if $dictionary{$word}:exists {
         $successors-for{$previous}{$word}++ if defined $previous;
         $previous = $word;
      }
      else {
         $previous = Nil; # restart
      }
   }
   return $successors-for;
}

And now – would you have guessed it? Stay safe!!!


Comments? Octodon, GitHub, Reddit, or drop me a line!