Find fishy Pods in Kubernetes

TL;DR

A little utility to spot fishy Pods in a Kubernetes cluster.

Here’s a handy program to figure out which Pods in a Kubernetes cluster are still… on their way, with the possibility of getting a hint as to why they might be prevented from going Ready:

#!/usr/bin/env perl
use strict;
use warnings;

$|++; # disable buffering
my $show_last_event = $ENV{KGP_SHOW_LAST_EVENT} || $0 =~ m{kgpe\z}mxs;
my $namespace = $show_last_event ? get_namespace(@ARGV) : undef;

unshift @ARGV, qw< kubectl get pods >;
print {*STDERR} "# Suspicious/transient Pods from:\n";
print {*STDERR} "#    @ARGV\n";
open my $fh, '-|', @ARGV or die "ERROR: $!\n";

while (<$fh>) {
   my ($ready, $total, $status) = m{
      \s+ (\d+) / (\d+) \s+   # $ready / $total
      (\S+)               # $status
   }mxs or print && next; # e.g. header line, ...
   next if ($status eq 'Completed') || ($ready == $total);
   print;

   next unless $show_last_event;
   my ($time, $severity, $error, $object, $msg) = get_last_event($namespace, $_) or next;
   print "   [$severity] $error: $msg";
}

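sub get_namespace {
   my @args = @_;
   my $namespace = '';    # empty string: no namespace given, kubectl uses the current one
   for my $i (0 .. $#args) {
      # undef: all namespaces, so each output line starts with a NAMESPACE column
      return if $args[$i] =~ m{\A (?:--all-namespaces|-A) \z}mxs;
      if ($args[$i] =~ m{\A (?:--namespace|-n) \z}mxs) {
         $namespace = $args[$i + 1];
      }
      elsif ($args[$i] =~ m{\A (?:--namespace=|-n=?) (\S+) \z}mxs) {
         $namespace = $1;    # also accept --namespace=NS, -n=NS and -nNS
      }
   }
   return $namespace;
}

sub get_last_event {
   my ($namespace, $line) = @_;
   my ($first, $second) = split m{\s+}mxs, $line;
   # with -A the line is "NAMESPACE NAME ...", otherwise it is "NAME ..."
   my ($ns, $name) = defined $namespace ? ($namespace, $first) : ($first, $second);

   my @command = (qw< kubectl get events >, '--field-selector', "involvedObject.name=$name");
   push @command, '-n', $ns if length $ns;    # empty: let kubectl pick the current namespace
   open my $pfh, '-|', @command or return;
   scalar readline $pfh; # ditch first line (column headers)
   my $last_event;
   while (<$pfh>) { $last_event = $_ }
   close $pfh;

   return unless defined $last_event;
   return split m{\s+}mxs, $last_event, 5;    # LAST SEEN, TYPE, REASON, OBJECT, MESSAGE
}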

Save it as kgp, make it executable (chmod +x kgp), and put it somewhere in your PATH.

Its usage is pretty straightforward: use it as if it were kubectl get pods. It runs that command for you with the options you pass on the command line. It then filters the output to keep only the header line and the Pods that are not Completed and have fewer ready containers than the total in their READY column.
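
To see the filtering rule in isolation, here is a minimal Perl sketch that applies the same test the script uses to a few made-up kubectl get pods lines. The rule is that READY is not full and STATUS is not Completed. The is_fishy helper and the sample lines are hypothetical and shown for illustration only; nothing here talks to a cluster.

```perl
use strict;
use warnings;

# Hypothetical helper: true if kgp would keep this `kubectl get pods` line
# as a Pod line (same rule as the script: READY not full, not Completed).
sub is_fishy {
   my ($line) = @_;
   my ($ready, $total, $status) = $line =~ m{
      \s+ (\d+) / (\d+) \s+   # READY column, e.g. 0/1
      (\S+)                   # STATUS column
   }mxs or return 0;          # e.g. the header line, which kgp prints as-is
   return 0 if $status eq 'Completed';
   return $ready != $total;
}

# Made-up sample lines, for illustration only
my @lines = (
   "web-1      1/1   Running     0   3d\n",
   "foobar     0/1   Pending     0   12m\n",
   "job-x      0/1   Completed   0   1h\n",
   "api-2      1/2   Running     4   5m\n",
);
print for grep { is_fishy($_) } @lines;    # prints the foobar and api-2 lines
```

Note that a Pod like api-2 is reported even though its STATUS is Running, because one of its two containers is not ready.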

Examples (output slightly redacted for readability):

# get Pods in a weird state from any namespace
$ kgp -A
NAMESPACE       NAME     READY   STATUS    RESTARTS   AGE
polettix        foobar   0/1     Pending   0          12m

# get Pods in a weird state in namespace "polettix" only
$ kgp -n polettix
NAME     READY   STATUS    RESTARTS   AGE
foobar   0/1     Pending   0          12m

Often, though, it’s also interesting to know what is going wrong with a Pod, and a good starting point for the investigation is usually the last event of the Pod itself. For this reason, you can get the last event line by setting the environment variable KGP_SHOW_LAST_EVENT to 1 (again, output slightly redacted for readability):

$ KGP_SHOW_LAST_EVENT=1 kgp -n polettix
NAME     READY   STATUS    RESTARTS   AGE
foobar   0/1     Pending   0          12m
   [Warning] FailedScheduling:...unbound immediate PersistentVolumeClaims
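
Under the hood, the event lookup runs kubectl get events with --field-selector involvedObject.name=<pod> in the Pod's namespace and keeps only the last line of the output. It then splits that line into at most five whitespace-separated fields, so the message keeps its inner spaces. The following sketch isolates that parsing step. The parse_event_line helper and the sample line are hypothetical. The sketch assumes kubectl's default LAST SEEN, TYPE, REASON, OBJECT, MESSAGE column layout, as the script itself does.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the parsing in kgp's get_last_event: a limit
# of 5 on split leaves the whole MESSAGE (spaces and newline) in the last field.
sub parse_event_line {
   my ($line) = @_;
   my ($last_seen, $type, $reason, $object, $message) = split m{\s+}mxs, $line, 5;
   return {
      last_seen => $last_seen,
      type      => $type,
      reason    => $reason,
      object    => $object,
      message   => $message,
   };
}

# Made-up event line, for illustration only
my $line  = "12m   Warning   FailedScheduling   pod/foobar   pod has unbound immediate PersistentVolumeClaims\n";
my $event = parse_event_line($line);
print "   [$event->{type}] $event->{reason}: $event->{message}";
```

The fields come back in the same order the script assigns them ($time, $severity, $error, $object, $msg). This is why kgp can print the severity, the reason and the message directly.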

I know, it can be a real hassle to do this every time. You can either set the environment variable persistently, or just call the program under a different name, kgpe (the added e is for event), for example through a symbolic link:

$ ln -s kgp "$(which kgp)e"
$ kgpe -n polettix
NAME     READY   STATUS    RESTARTS   AGE
foobar   0/1     Pending   0          12m
   [Warning] FailedScheduling:...unbound immediate PersistentVolumeClaims

In this way you can decide whether you want the more compact behaviour or the more verbose one.

Oh, a question!

Why would you want the compact one?

Sometimes Pods just need some time to get up and reach the Ready state, without necessarily being problematic. In these cases, the compact behaviour prints less clutter and is perfect for running the program periodically with watch, like this:

watch kgp -A

I hope you will never find yourself needing this program, of course… but, more realistically, I hope it will be useful when you need to do some troubleshooting.

Cheers!


Comments? Octodon, GitHub, Reddit, or drop me a line!