Extract links/images from files or URLs

TL;DR

if you need to squeeze all link/image URLs from HTML files or URLs, look no further. Quick’n’dirty but should serve most needs.

If the above snippet from GitLab doesn’t show up, please take a look at this possibly outdated local version.

This is a quick’n’dirty way of extracting all links (i.e. href attributes of a tags) and images (i.e. src attributes of img tags) out of a list of local files (interpreted as HTML) or URLs (dynamically downloaded). Leverages Mojolicious.

It’s very bare-bones, e.g. it does not pre-pend a base URL in case of relative URLs. It should be a good starting point though.


Comments? Octodon, , GitHub, Reddit, or drop me a line!