Delete Blank Pages from a PDF on Linux Command Line

By | 26/03/2018

OK, so I was looking for a way to delete perceptually blank pages from PDF files in bulk. The scripts I found only remove truly blank pages, which doesn’t help with scanned pages: a single stray pixel is enough to make a page count as non-blank.

The script below, which I run on CentOS 7, checks each page of a PDF file for ink coverage of 0.1% or less and removes those pages.

The trick here is to use the new(ish) Ghostscript output device, inkcov. It reports how much of the page is covered by each of the cyan, magenta, yellow and black inks, as a fraction between 0 and 1 (so 0.1% is 0.001). I add the four values together with awk to get the total coverage of the page.
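As a rough sketch of the summing step on its own: the function name sum_coverage is made up for illustration, and the sample line only imitates the shape of an inkcov line (four numbers followed by "CMYK OK"); its values are invented.

```shell
#!/bin/sh
# Sum the four ink channels from inkcov-style output read on stdin.
# sum_coverage is a hypothetical helper name, not part of Ghostscript.
sum_coverage() {
    awk '/CMYK/ { sum += $1 + $2 + $3 + $4 } END { printf "%.5f\n", sum }'
}

# Illustrative input only: a page with a little black ink and nothing else.
printf ' 0.00000  0.00000  0.00000  0.02230 CMYK OK\n' | sum_coverage
```

With real files, the input would come from something like gs -o - -sDEVICE=inkcov file.pdf instead of printf.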

If that total is at or below the threshold, the page number is left out of the list of pages to keep.
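The comparison can also be sketched as a small helper. This version uses awk for the floating-point test rather than bc, purely to keep the example dependency-free; is_non_blank and its optional second argument are hypothetical names, and the default of 0.001 is the 0.1% threshold.

```shell
#!/bin/sh
# Succeed (exit 0) when the coverage is strictly above the threshold.
# Usage: is_non_blank COVERAGE [THRESHOLD]   (hypothetical helper)
is_non_blank() {
    awk -v c="$1" -v t="${2:-0.001}" 'BEGIN { exit !(c + 0 > t + 0) }'
}

# A page at exactly the threshold counts as blank and is dropped.
if is_non_blank 0.02230; then
    echo "keep"
else
    echo "drop"
fi
```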

Finally, pass the list of pages to keep to pdftk and create a new PDF with them.
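One edge case is worth a sketch here. As far as I know, pdftk treats a cat with no page ranges as "all pages", so an empty keep-list would copy the whole document instead of producing nothing. The dry-run helper below only prints the command it would run; keep_pages_cmd and the file names are made up for illustration.

```shell
#!/bin/sh
# Print the pdftk command that would keep the given pages (dry run only).
# Usage: keep_pages_cmd INPUT OUTPUT PAGE...   (hypothetical helper)
keep_pages_cmd() {
    src=$1
    dst=$2
    shift 2
    # Refuse to build a command when every page was judged blank.
    if [ "$#" -eq 0 ]; then
        echo "no pages to keep" >&2
        return 1
    fi
    echo "pdftk \"$src\" cat $* output \"$dst\""
}

keep_pages_cmd scan.pdf scan-clean.pdf 1 3 4
```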

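Here is the full script:

#!/bin/sh
# Usage: pass the input PDF as the first argument.
# The result is written to the current directory under the input's base
# name, so run this from a directory other than the one holding the input.
IN="$1"
filename=$(basename "${IN}")
filename="${filename%.*}"
PAGES=$(pdfinfo "${IN}" | grep ^Pages: | tr -dc '0-9')

# Print the number of every page whose summed CMYK coverage exceeds 0.001.
non_blank() {
    for i in $(seq 1 "$PAGES")
    do
        PERCENT=$(gs -o - -dFirstPage="${i}" -dLastPage="${i}" -sDEVICE=inkcov "${IN}" | grep CMYK | awk 'BEGIN { sum=0; } {sum += $1 + $2 + $3 + $4;} END { printf "%.5f\n", sum } ')
        if [ "$(echo "$PERCENT > 0.001" | bc)" -eq 1 ]
        then
            echo "$i"
            #echo $i 1>&2
        fi
        # Progress dot goes to stderr so it stays out of the page list
        printf . 1>&2
    done | tee "${filename}.tmp"
    echo 1>&2
}

set +x
# $(non_blank) is left unquoted on purpose: each page number must be its own argument
pdftk "${IN}" cat $(non_blank) output "${filename}.pdf"

if [ $? -eq 0 ]
then
   rm "${filename}.tmp"
   # Uncomment the line below to delete the input file
   # rm "${IN}"
fi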
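
For the "in bulk" part, a small wrapper loop is one way to go. This is only a sketch: for_each_pdf is a hypothetical helper, and both the script name remove-blank-pages.sh and the directory in the commented usage line are placeholders.

```shell
#!/bin/sh
# Run a command once per PDF in a directory, appending the file as the last argument.
# Usage: for_each_pdf DIR COMMAND [ARG...]   (hypothetical helper)
for_each_pdf() {
    dir=$1
    shift
    for f in "$dir"/*.pdf; do
        # Skip the literal pattern when the directory holds no PDFs
        [ -e "$f" ] || continue
        "$@" "$f" || echo "failed: $f" >&2
    done
}

# Dry run: list the files that would be processed.
# for_each_pdf /path/to/scans echo
# Real run, assuming the script above was saved under this placeholder name:
# for_each_pdf /path/to/scans ./remove-blank-pages.sh
```

Running it with echo first is a cheap way to check which files would be touched before any PDF is rewritten.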
