Extracting regular data from a file in the terminal

Once in a while I need to write a shell script that extracts some information from a file and does something with it. This is easy to do with the standard UNIX tools, such as cat, grep, head, and sed, once you’ve learned their features. Today’s example: given an index.html file, we need to extract the CSS files it imports, except the commented-out ones and, say, mobile.css.

Here’s our sample index.html:

<!DOCTYPE html>
<html>
    <head>
        <title>Hello there</title>
        <link rel="stylesheet" type="text/css" href="css/main.css" />
        <link rel="stylesheet" type="text/css" href="css/mobile.css" />
        <link rel="stylesheet" type="text/css" href="css/nav.css" />
        <!--<link rel="stylesheet" type="text/css" href="css/old.css" />-->
    </head>
    …
</html>

If you only know grep, here’s a way:

grep '\.css' index.html | grep -Ev '(mobile\.css|<!--.*-->)' | grep -o 'href=".*"' | cut -d'"' -f2

It works, but it is kind of long: keep only the lines containing .css, drop the ones matching the excluded patterns, strip everything except href="…", and finally cut out the value between the quotes. It turns out this can be rewritten as a single command, using sed:

sed -nE '/\.css/ { /(mobile\.css|<!--.*-->)/ { d; }; s/.*href="([^"]+)".*/\1/; p; }' index.html

Here the -n option suppresses sed's default printing of every line, and -E enables extended regular expressions. The command reads: for each line containing .css, delete it if it matches one of the excluded patterns; otherwise replace the whole line with the value inside href="…" and print it. Nice, easy enough, and since it runs one process instead of four, it should be faster. Enjoy!
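For readability, the same sed program can be spread over several lines with a comment on each step. This is only a sketch: the helper name extract_css and the temporary copy of the sample file are assumptions made for illustration, and the loop at the end stands in for whatever "doing something" with each file means in a real script.

```shell
#!/bin/sh
# Recreate a trimmed copy of the sample index.html in a temporary directory
# (illustrative setup, so the snippet can run on its own).
tmpdir=$(mktemp -d)
cat > "$tmpdir/index.html" <<'EOF'
<!DOCTYPE html>
<html>
    <head>
        <title>Hello there</title>
        <link rel="stylesheet" type="text/css" href="css/main.css" />
        <link rel="stylesheet" type="text/css" href="css/mobile.css" />
        <link rel="stylesheet" type="text/css" href="css/nav.css" />
        <!--<link rel="stylesheet" type="text/css" href="css/old.css" />-->
    </head>
</html>
EOF

# extract_css is a hypothetical helper name; it wraps the sed one-liner.
extract_css() {
    sed -nE '
        # Only act on lines that mention .css
        /\.css/ {
            # Drop excluded lines and start the next cycle
            /(mobile\.css|<!--.*-->)/ d
            # Replace the whole line with the value of the href attribute
            s/.*href="([^"]+)".*/\1/
            # Print the result explicitly, since -n turned off auto-printing
            p
        }
    ' "$1"
}

# Capture the list once so it can be reused.
css_files=$(extract_css "$tmpdir/index.html")

# Do something with each extracted path; here we just print it.
printf '%s\n' "$css_files" | while IFS= read -r css; do
    printf 'found stylesheet: %s\n' "$css"
done

rm -rf "$tmpdir"
```

On the sample file this reports css/main.css and css/nav.css. Reading the paths line by line with `while IFS= read -r` keeps the loop safe for names containing spaces or backslashes.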

About Me

Hi there! Nice to have you here.

My name is Eugene and I'm a software developer. This blog is about my thoughts, discoveries, and experiences in IT, computers, software development, and maybe something else. My current programming languages: PHP, Haskell, TypeScript, shell scripting; I'm also familiar with Swift, Objective-C, Python, Java.
