PDF management in Linux
Compress PDF with ghostscript
ghostscript -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
Even smaller with -dPDFSETTINGS=/ebook:
ghostscript -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
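Since the two commands above differ only in the -dPDFSETTINGS value, they can be wrapped in a small shell function. This is a minimal sketch: the function name compress_pdf and its argument order are my own invention, and the list of accepted presets (screen, ebook, printer, prepress, default) is the set of values Ghostscript documents for pdfwrite.

```shell
# compress_pdf INPUT OUTPUT [PRESET]
# Hypothetical helper around the ghostscript call shown above.
# PRESET defaults to "ebook"; smaller presets mean lower image resolution.
compress_pdf() (
    input=$1
    output=$2
    preset=${3:-ebook}
    case $preset in
        screen|ebook|printer|prepress|default) ;;
        *) echo "unknown preset: $preset" >&2; exit 2 ;;
    esac
    ghostscript -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
        "-dPDFSETTINGS=/$preset" -dNOPAUSE -dQUIET -dBATCH \
        "-sOutputFile=$output" "$input"
)

# Usage:
#   compress_pdf input.pdf output.pdf printer
#   compress_pdf input.pdf output.pdf          # uses /ebook
```

Comparing the results with ls -l input.pdf output.pdf shows whether a given preset actually helped; for PDFs that are already well compressed the output can end up larger than the input.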
Rotate PDF
Rotate 90 degrees anti-clockwise
pdfjam --angle 90 Cuckoocaster-Stencil-BALLISTIC.pdf --outfile Cuckoocaster-Stencil-BALLISTIC-portrait.pdf
Rotate 90 degrees clockwise
pdfjam --angle -90 Cuckoocaster-Stencil-BALLISTIC.pdf --outfile Cuckoocaster-Stencil-BALLISTIC-portrait.pdf
Resize PDF files
Using pdfjam will keep the aspect ratio of the original PDF and fit it into the new dimensions. In this example, DIN A4:
pdfjam --papersize '{210mm,297mm}' Cuckoocaster-Stencil-BALLISTIC-portrait.pdf --outfile Cuckoocaster-Stencil-BALLISTIC-pdfjam-A4.pdf
Split PDF file into pages and sections
Extract page 12 from PDF:
pdftk source.pdf cat 12 output new.pdf
Extract a range and single pages:
pdftk source.pdf cat 1 3-6 313-389 output new.pdf
Split into individual pages
pdftk source.pdf burst
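By default, burst writes the pages as pg_0001.pdf, pg_0002.pdf, ... into the current directory, together with a doc_data.txt report. To keep them out of the way, the output pattern can point into a separate directory. A minimal sketch; the function name burst_pdf and the default directory name pages are assumptions of mine:

```shell
# burst_pdf SOURCE [OUTDIR]
# Hypothetical helper: split SOURCE into single pages inside OUTDIR.
burst_pdf() (
    src=$1
    outdir=${2:-pages}
    mkdir -p "$outdir" || exit 1
    # %04d is replaced by the zero-padded page number.
    pdftk "$src" burst output "$outdir/page_%04d.pdf"
)

# Usage:
#   burst_pdf source.pdf            # writes pages/page_0001.pdf, ...
#   burst_pdf source.pdf chapters
```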
Merge many PDF files into one
pdftk *.pdf output all.pdf
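Two things to watch with the glob: the shell expands *.pdf in lexical order (so 10.pdf comes before 2.pdf), and on a second run it also matches all.pdf itself. Listing the inputs explicitly avoids both. A minimal sketch, with merge_pdfs being a hypothetical helper name:

```shell
# merge_pdfs OUTPUT INPUT...
# Hypothetical helper: concatenate the INPUT files in the given order.
merge_pdfs() (
    output=$1
    shift
    if [ "$#" -lt 1 ]; then
        echo "usage: merge_pdfs OUTPUT INPUT..." >&2
        exit 2
    fi
    # Refuse to read and write the same file.
    for f in "$@"; do
        if [ "$f" = "$output" ]; then
            echo "refusing to use $output as both input and output" >&2
            exit 2
        fi
    done
    pdftk "$@" cat output "$output"
)

# Usage:
#   merge_pdfs all.pdf 1.pdf 2.pdf 10.pdf
```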
Merge multiple images into A4 PDF
pdfjoin --a4paper --fitpaper false --rotateoversize false scan*.png
Install pdfjoin as part of texlive-extra-utils:
sudo apt install texlive-extra-utils
Extract text from PDF
This works when the text is already embedded in the PDF, i.e. it is not stored as images that would require OCR:
pdftotext input.pdf output.txt
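To find out beforehand whether a PDF has a text layer at all, the output of pdftotext can be checked for any letters or digits. This is only a rough heuristic, and has_text_layer is a name I made up for the sketch:

```shell
# has_text_layer FILE
# Hypothetical helper: succeeds if pdftotext extracts at least one
# alphanumeric character. "-" sends the extracted text to stdout.
has_text_layer() {
    pdftotext "$1" - 2>/dev/null | grep -q '[[:alnum:]]'
}

# Usage:
#   if has_text_layer input.pdf; then
#       pdftotext input.pdf output.txt
#   else
#       echo "input.pdf seems to contain only images, OCR needed" >&2
#   fi
```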
Extract images in original format from PDF
pdfimages -all fileWithImages.pdf ../../path/to/save/to
Create booklet from PDF for double-sided printing
A booklet is a PDF file which contains a number of pages resized and fit to be printed on double-sided sheets in a way that allows the printed pages to be collated, folded, and stapled in the middle, resulting in a single booklet with the correct page order.
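To illustrate what "correct page order" means, here is the usual saddle-stitch arrangement computed in plain shell arithmetic. This is a generic sketch of the ordering, not a description of how pdfbook2 works internally; booklet_order is a hypothetical name. The page count is rounded up to a multiple of 4, so numbers above the real page count stand for blank pages.

```shell
# booklet_order PAGES
# Print, for each sheet, which pages go on its front and back side
# (left page first, then right page).
booklet_order() (
    pages=$1
    # Each folded sheet carries 4 pages, so round up to a multiple of 4.
    total=$(( (pages + 3) / 4 * 4 ))
    sheet=0
    while [ "$sheet" -lt "$(( total / 4 ))" ]; do
        lo=$(( 2 * sheet + 1 ))      # lowest page still unplaced
        hi=$(( total - 2 * sheet ))  # highest page still unplaced
        echo "sheet $(( sheet + 1 )) front: $hi $lo back: $(( lo + 1 )) $(( hi - 1 ))"
        sheet=$(( sheet + 1 ))
    done
)

# booklet_order 8 prints:
#   sheet 1 front: 8 1 back: 2 7
#   sheet 2 front: 6 3 back: 4 5
```

The outermost sheet holds the first and last pages, and each following sheet moves one step towards the middle of the document.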
Before the PDF is composed, the INPUT file is cropped to the relevant area in order to discard unnecessary whitespace. In this process, all pages are cropped to the same dimensions. Extra margins can be defined at the edges of the booklet and in the middle where the binding occurs.
The OUTPUT is written to INPUT-book.pdf. Existing files will be overwritten. All input files are processed separately.
https://manpages.ubuntu.com/manpages/xenial/man1/pdfbook2.1.html
By default, pdfbook2 assumes that the pages are turned along the long edge inside the printer during double-sided printing.
pdfbook2 --paper=a4paper Weihnachtslieder-2022.pdf
To turn the pages along the short edge instead, set --short-edge. I ended up using this option with a custom-sized A5 format (148mm x 230mm) as the source, because the --no-crop option didn’t work. (Also: setting the margins manually didn’t look as neat as the crop done by pdfbook2.)
pdfbook2 --paper=a4paper --short-edge Weihnachtslieder-2022.pdf
The following --no-crop version didn’t work well for me. Some special characters were missing and displayed only as whitespace (e.g. fl in Schneeflöckchen), although they had printed correctly before, the only difference being the --no-crop option.
pdfbook2 --paper=a4paper --short-edge --no-crop Weihnachtslieder-2022.pdf
Password protection for PDF files
pdftk sourcefilename.pdf output targetfilename.pdf user_pw PROMPT
This will prompt you:
Please enter the user password to use on the output PDF.
It can be empty, or have a maximum of 32 characters:
It then saves a password-protected PDF file named targetfilename.pdf.
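pdftk can also set a separate owner password, allow printing once the file is open, and remove the protection again with input_pw. A minimal sketch using PROMPT everywhere so that no password ends up in the shell history; protect_pdf and unprotect_pdf are hypothetical helper names:

```shell
# protect_pdf SOURCE TARGET
# Hypothetical helper: ask for an owner and a user password and
# allow printing for anyone who can open the file.
protect_pdf() (
    pdftk "$1" output "$2" owner_pw PROMPT user_pw PROMPT allow printing
)

# unprotect_pdf SOURCE TARGET
# Hypothetical helper: ask for the password of SOURCE and write an
# unprotected copy to TARGET.
unprotect_pdf() (
    pdftk "$1" input_pw PROMPT output "$2"
)

# Usage:
#   protect_pdf sourcefilename.pdf targetfilename.pdf
#   unprotect_pdf targetfilename.pdf plain.pdf
```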