PDF management in Linux

Compress PDF with ghostscript

ghostscript -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

For an even smaller file, use -dPDFSETTINGS=/ebook:

ghostscript -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
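
The two commands above differ only in the -dPDFSETTINGS preset, so they can be wrapped in a small shell function. This is a minimal sketch rather than a definitive implementation: compress_pdf is a hypothetical helper name, the default preset ebook is an arbitrary choice, and it assumes the ghostscript binary used above is on the PATH (on many systems it is called gs).

```shell
# Hypothetical helper: compress a PDF with a chosen Ghostscript preset.
# Usage: compress_pdf input.pdf output.pdf [screen|ebook|printer|prepress|default]
compress_pdf() {
  local input="$1"
  local output="$2"
  local preset="${3:-ebook}"   # assumed default; pick what suits you

  if [ -z "$input" ] || [ -z "$output" ]; then
    echo "usage: compress_pdf input.pdf output.pdf [preset]" >&2
    return 1
  fi

  # Reject anything that is not one of Ghostscript's predefined presets
  case "$preset" in
    screen|ebook|printer|prepress|default) ;;
    *) echo "unknown preset: $preset" >&2; return 2 ;;
  esac

  ghostscript -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
    "-dPDFSETTINGS=/$preset" -dNOPAUSE -dQUIET -dBATCH \
    "-sOutputFile=$output" "$input"
}
```

Calling compress_pdf input.pdf output.pdf printer corresponds to the first command; leaving out the preset gives the ebook variant.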

Rotate PDF

Rotate 90 degrees anti-clockwise

pdfjam --angle 90 Cuckoocaster-Stencil-BALLISTIC.pdf --outfile Cuckoocaster-Stencil-BALLISTIC-portrait.pdf

Rotate 90 degrees clockwise

pdfjam --angle -90 Cuckoocaster-Stencil-BALLISTIC.pdf --outfile Cuckoocaster-Stencil-BALLISTIC-portrait.pdf

Resize PDF files

pdfjam keeps the aspect ratio of the original PDF and fits it into the new page size, in this example DIN A4:

pdfjam --papersize '{210mm,297mm}' Cuckoocaster-Stencil-BALLISTIC-portrait.pdf --outfile Cuckoocaster-Stencil-BALLISTIC-pdfjam-A4.pdf

Split PDF file into pages and sections

Extract page 12 from PDF:

pdftk source.pdf cat 12 output new.pdf

Extract a range and single pages:

pdftk source.pdf cat 1 3-6 313-389 output new.pdf

Split into individual pages

pdftk source.pdf burst

Merge many PDF files into one

pdftk *.pdf output all.pdf
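
The shell expands *.pdf in lexicographic order, so a file named scan10.pdf is merged before scan2.pdf. The following is a minimal sketch for merging in natural order instead. It assumes a sort that supports -V (as in GNU coreutils) and file names without whitespace or glob characters; natural_sort and merge_sorted are hypothetical helper names.

```shell
# Print the given names one per line in natural (version) order,
# so that scan2.pdf comes before scan10.pdf.
natural_sort() {
  printf '%s\n' "$@" | sort -V
}

# Hypothetical helper: merge the given PDFs in natural order.
# Usage: merge_sorted all.pdf scan*.pdf
merge_sorted() {
  local output="$1"
  shift
  # Relies on word splitting, so file names must not contain whitespace
  pdftk $(natural_sort "$@") output "$output"
}
```

Passing the output name first keeps it out of the list of input files.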

Merge multiple images into A4 PDF

pdfjoin --a4paper --fitpaper false --rotateoversize false scan*.png

pdfjoin is installed as part of texlive-extra-utils:

sudo apt install texlive-extra-utils

Extract text from PDF

This works when the text is already embedded in the PDF, i.e. it is not stored as images that would require OCR:

pdftotext input.pdf output.txt 
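
To find out whether a PDF has an extractable text layer at all, one option is to count the non-whitespace characters that pdftotext produces. This is a rough sketch, assuming that writing to - (standard output) is acceptable for the check; count_nonblank and has_text_layer are hypothetical helper names, and a scanned PDF with a few stray characters would still count as having text.

```shell
# Count the non-whitespace characters read from standard input
count_nonblank() {
  tr -d '[:space:]' | wc -c
}

# Hypothetical helper: succeed if pdftotext extracts any visible text.
# Usage: has_text_layer input.pdf && pdftotext input.pdf output.txt
has_text_layer() {
  # "-" as the output file makes pdftotext write to standard output
  [ "$(pdftotext "$1" - | count_nonblank)" -gt 0 ]
}
```

If has_text_layer fails, the pages are most likely images and need OCR first.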

Extract images in original format from PDF

pdfimages -all fileWithImages.pdf ../../path/to/save/to

Create booklet from PDF for double-sided printing

A booklet is a PDF file which contains a number of pages resized and fit to be printed on double-sided sheets in a way that allows the printed pages to be collated, folded, and stapled in the middle, resulting in a single booklet with the correct page order.
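
The page order this results in can be illustrated with a short sketch. It shows the generic saddle-stitch ordering (two pages per side, four per sheet) and is not taken from pdfbook2's source; booklet_order is a hypothetical helper name, and page numbers above the real page count stand for blank padding pages.

```shell
# Print the page order for a saddle-stitched booklet of N pages.
# Usage: booklet_order 8
booklet_order() {
  local n="$1"
  # Each folded sheet carries 4 pages, so round up to a multiple of 4
  local total=$(( (n + 3) / 4 * 4 ))
  local i=0
  local out=""
  while [ "$i" -lt $(( total / 4 )) ]; do
    # Front side: outermost remaining pair; back side: the next pair inwards
    out="$out $(( total - 2 * i )) $(( 2 * i + 1 )) $(( 2 * i + 2 )) $(( total - 2 * i - 1 ))"
    i=$(( i + 1 ))
  done
  printf '%s\n' "${out# }"
}
```

For an 8-page document this prints 8 1 2 7 6 3 4 5: the first sheet has pages 8 and 1 on the front and 2 and 7 on the back, the second sheet has 6 and 3 on the front and 4 and 5 on the back.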

Before the PDF is composed, the INPUT file is cropped to the relevant area in order to discard unnecessary whitespace. In this process, all pages are cropped to the same dimensions. Extra margins can be defined at the edges of the booklet and in the middle where the binding occurs.

The OUTPUT is written to INPUT-book.pdf. Existing files will be overwritten. All input files are processed separately.

https://manpages.ubuntu.com/manpages/xenial/man1/pdfbook2.1.html

By default, pdfbook2 assumes that the printer flips the pages along the long edge in double-sided printing.

pdfbook2 --paper=a4paper Weihnachtslieder-2022.pdf

To have the pages flipped along the short edge instead, set --short-edge. I ended up using this option with a custom-sized A5 format (148mm x 230mm) as the source, because the --no-crop option didn’t work. (Also, setting the margins manually didn’t look as neat as the crop done by pdfbook2.)

pdfbook2 --paper=a4paper --short-edge Weihnachtslieder-2022.pdf

The following --no-crop version didn’t work well for me. Some special characters were missing and displayed only as whitespace (e.g. the fl in Schneeflöckchen), although they had been printed correctly before; the only difference was the --no-crop option.

pdfbook2 --paper=a4paper --short-edge --no-crop Weihnachtslieder-2022.pdf

Password protection for PDF files

pdftk sourcefilename.pdf output targetfilename.pdf user_pw PROMPT

This will prompt you:

Please enter the user password to use on the output PDF.
   It can be empty, or have a maximum of 32 characters:

pdftk then saves a password-protected PDF file named targetfilename.pdf.
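
Removing the password again follows the same pattern. This is a minimal sketch, assuming pdftk's input_pw keyword, which also accepts PROMPT; unlock_pdf is a hypothetical helper name.

```shell
# Hypothetical helper: write an unprotected copy of a password-protected PDF.
# pdftk asks for the password interactively, so it does not end up
# in the shell history.
# Usage: unlock_pdf targetfilename.pdf unlocked.pdf
unlock_pdf() {
  local input="$1"
  local output="$2"
  pdftk "$input" input_pw PROMPT output "$output"
}
```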