KISS

Keep It Simple Stupid

Automation: PDFs, Preview, and vim


I sometimes need to extract certain pages from a big PDF document into a new one. For a 150-page PDF, for example, I can use a command like this:

pdftk input.pdf cat 1 2 4-8 16-32 64 128 output output.pdf

pdftk is a great tool for this task. However, it takes a lot of time to go through the document, pick the pages to extract, and type their numbers on the command line. Luckily, I’ve come up with a solution.

First, I tried OS X’s native tools, Automator.app and AppleScript, to get the current page number from Preview.app and build the page list. Automator can’t do that; AppleScript fares better. I managed to get the page number, but removing an element from a list with that “user-friendly” syntax was too complicated.

I decided to go back to the tried-and-true command-line tools.

Getting page number

First, use AppleScript to get Preview’s window title (which contains the current page number):

osascript -e 'tell app "Preview" to get name of front window'

This outputs something like “input.pdf (page 5 of 109)”. Extracting the number is a piece of cake:

osascript -e 'tell app "Preview" to get name of front window' | sed -E 's/.*page ([0-9]+).*/\1/'
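
The same extraction can be wrapped in a small shell function, which makes it easy to try without Preview running. This is only a sketch: the function name page_from_title is made up for illustration, and the stricter pattern (anchoring on a closing "(page N of M)") is my assumption about the title format, based on the sample title quoted earlier.

```shell
# Hypothetical helper: print the current page number found in a Preview
# window title such as "input.pdf (page 5 of 109)".
# Prints nothing if the title has no "(page N of M)" suffix.
page_from_title() {
    printf '%s\n' "$1" | sed -nE 's/.*\(page ([0-9]+) of [0-9]+\)$/\1/p'
}

# Sample title; with Preview open, the argument would instead be:
#   "$(osascript -e 'tell app "Preview" to get name of front window')"
page_from_title "input.pdf (page 5 of 109)"
```

Printing nothing on a non-matching title (instead of echoing the whole title back, as the plain substitution does) keeps stray text out of the pdftk command.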

Using vim to edit command

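To make use of the command above while typing the pdftk command, it is convenient to edit the command line in vim. In bash that is done with the Ctrl-X Ctrl-E shortcut; zsh offers the same through its edit-command-line widget, which may need to be bound to that key first.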

You may need to set this environment variable: export EDITOR=vim. Also, read this post.
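
For reference, here is a sketch of the shell startup lines this setup needs. It assumes they go into ~/.bashrc or ~/.zshrc. As far as I know, bash binds Ctrl-X Ctrl-E out of the box, while a stock zsh only does so after the edit-command-line widget has been loaded and bound, which is what the guarded part takes care of.

```shell
# Editor opened when editing the current command line.
export EDITOR=vim

# zsh only: load the edit-command-line widget and bind it to Ctrl-X Ctrl-E.
# bash skips this block because ZSH_VERSION is unset there.
if [ -n "${ZSH_VERSION:-}" ]; then
    autoload -U edit-command-line
    zle -N edit-command-line
    bindkey '^X^E' edit-command-line
fi
```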

vim macros

I have two use cases for extraction: a single page and a range of pages. So I created these macros (insert-mode mappings) in ~/.vimrc:

inoremap <silent> <F7> <C-R>=system("osascript -e 'tell app \"Preview\" to get name of front window' \| sed -E 's/.*page ([0-9]+).*/ \\1/' \| tr -d '\\n'")<CR>
inoremap <silent> <F8> <C-R>=system("echo -n -$(($(osascript -e 'tell app \"Preview\" to get name of front window' \| sed -E 's/.*page ([0-9]+).*/\\1/' \| tr -d '\\n') - 1))\\ ")<CR>

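With these two shortcuts you can insert the current page number (preceded by a space) with <F7> and the end of a page range with <F8>, which inserts a hyphen, the number of the page before the current one, and a space. Let me explain the second one, as the first one is a little simpler: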

<C-R>=system(       # insert the result of a custom shell command
    "echo
        -n          # don't append a newline character
        -$((        # print - and the calculated expression
            $(
                    # get Preview's window title
                osascript -e 'tell app \"Preview\" to get name of front window' \|
                    # extract the page number
                sed -E 's/.*page ([0-9]+).*/\\1/' \|
                    # remove the trailing newline character
                tr -d '\\n'
            ) - 1   # subtract one from the page number
        ))\\ "      # append a trailing space
)<CR>
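
To check the arithmetic outside vim, the core of the <F8> mapping can be sketched as a plain shell function. The name range_end and passing the page number as an argument are assumptions for illustration; the mapping itself reads the number from Preview.

```shell
# Hypothetical helper: given the current page number, print the text the
# <F8> mapping inserts: a hyphen, the previous page number, and a space.
range_end() {
    printf -- '-%d ' "$(( $1 - 1 ))"
}

# With Preview on page 9 this prints "-8 " (note the trailing space),
# so a range started as "4" becomes "4-8 ".
range_end 9
```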

Using these macros inserts an extra space after a page range. This is a minor inconvenience that can be fixed with this command in vim:

:%s/\s\{2,\}/ /g
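
If the page list ends up in a shell variable rather than a vim buffer, a similar cleanup can be done with tr. A small sketch with a made-up page list; unlike the vim substitution, it only squeezes spaces and leaves tabs alone.

```shell
# Collapse every run of spaces into a single space.
pages="1 2  4-8  16-32  64"
printf '%s\n' "$pages" | tr -s ' '
```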

Thank you for reading. Please leave a comment if you have any questions or suggestions.
