5 Handy Tips for Working with PDFs on Linux

You probably don’t think of Linux as a premier platform for editing, converting, splitting, manipulating or otherwise working with PDF files.

After all, Adobe Acrobat, the leading commercial platform for managing PDFs, doesn’t run natively on Linux. Few people use Linux to create or edit PDF files.

But thanks to the power of Bash scripting and Linux command-line tools like pdftk, Ghostscript and pdftotext, your Linux PC or laptop can be a very efficient environment for working with PDF files.

In this article, we’ll take a look at some useful tasks you can accomplish with PDF files on Linux. Some of the utilities we’ll be examining could be used on other operating systems, too, but I think they’re most powerful when you run them in a Bash shell in which you can script tasks easily.

Merging PDF Files from the Command Line

Ever need to combine multiple PDFs into a single PDF file? I do all the time. Fortunately, pdftk makes it easy.

To use pdftk, you first need to install it. On Ubuntu, you can do that with a simple:

sudo apt-get install pdftk

To merge PDFs with pdftk, simply open up a terminal and run a command like this:

pdftk file1.pdf file2.pdf file3.pdf cat output combined.pdf

In this example, file1.pdf, file2.pdf and file3.pdf are the PDFs you want to merge. Make sure you list the file names within the command in the order that you want them to appear within the PDF you are creating. The resulting PDF is named combined.pdf.

I like pdftk because it’s fast and can easily be incorporated into a Bash script in order to automate the generation of PDFs. I find this handy when, for example, I am working on a book manuscript and need to update the master manuscript by combining individual chapters (which are separate PDFs) into a single PDF file.
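Here is a rough sketch of the kind of script I mean. The function name merge_chapters and the directory layout are hypothetical, and it assumes the chapter files sort correctly by name (ch01.pdf, ch02.pdf and so on):

```shell
# merge_chapters DIR OUTPUT
# Hypothetical helper: merge every PDF in DIR into OUTPUT with pdftk.
# Write OUTPUT somewhere other than DIR so that a previous result
# is not picked up by the glob on the next run.
merge_chapters() {
    local dir="$1" out="$2"

    # The shell expands the glob in sorted order, so zero-padded
    # names (ch01.pdf, ch02.pdf, ...) end up in reading order.
    set -- "$dir"/*.pdf

    # An unmatched glob is left as a literal string; bail out then.
    if [ ! -e "$1" ]; then
        echo "merge_chapters: no PDF files in $dir" >&2
        return 1
    fi

    pdftk "$@" cat output "$out"
}
```

After sourcing the function, a call such as merge_chapters chapters manuscript.pdf would rebuild the master file from whatever is currently in the chapters directory.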

Merging or Reordering PDFs Graphically

If you prefer a visual interface for merging or editing PDFs, you can use PDF-Shuffler. Install it with:

sudo apt-get install pdfshuffler

You can start it by typing in a terminal:

pdfshuffler

Note that there’s no hyphen in the name of the package or the command-line utility.

PDF-Shuffler is relatively basic, but it makes it easy to merge PDFs, as well as remove or reorder pages within a PDF file by dragging and dropping.

Unfortunately, PDF-Shuffler doesn’t support editing the content of a PDF. It only allows you to move pages around, delete pages, and so on.
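If you would rather script the same kind of page shuffling, pdftk’s cat operation also accepts page ranges. A minimal sketch follows; the helper name pick_pages and the file names in the usage note are hypothetical:

```shell
# pick_pages INPUT OUTPUT RANGE...
# Hypothetical helper: copy only the listed page ranges, in the order
# given, from INPUT into OUTPUT. Ranges use pdftk's cat syntax,
# for example 1-3, 7 or 5-end.
pick_pages() {
    local input="$1" output="$2"
    shift 2
    if [ "$#" -eq 0 ]; then
        echo "pick_pages: give at least one page range" >&2
        return 1
    fi
    pdftk "$input" cat "$@" output "$output"
}
```

For example, pick_pages report.pdf trimmed.pdf 1-3 5-end would drop page 4, and pick_pages report.pdf reordered.pdf 2 1 3-end would swap the first two pages.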

Converting .doc, .docx or .odt Files to PDF

If I had a dollar for every time I have had to convert a word processor file to a PDF, I’d no longer have to work.

Fortunately, although no one pays me to convert documents to PDF, I can do it easily on Linux using LibreOffice.

Sometimes, I export a document to PDF in LibreOffice using the graphical interface. There’s a button in the toolbar (and an Export as PDF entry in the File menu) that allows you to do this very easily.

But other times, it can be useful to convert documents to PDF format using the command line. This is much faster and easier when you have a large number of documents to convert, or you need to do the conversions programmatically.

In that case, you can take advantage of LibreOffice’s command-line interface, which is called soffice. Yes, LibreOffice has a relatively powerful CLI utility, although it’s unfortunately poorly documented.

To convert a .doc document to PDF from the command line using LibreOffice, run a command like:

soffice --convert-to pdf file.doc --headless

In this example, file.doc is the file you want to convert to PDF. The output PDF file will be named file.pdf. Using the same type of command, you could convert .docx, .odt, .rtf or any other type of word processor document that LibreOffice supports.

You can use wildcard characters to convert more than one file at once. For example, this command would convert all of the .doc files in your working directory to PDF:

soffice --convert-to pdf *.doc --headless

Note that you can’t have the graphical version of LibreOffice running when you use soffice on the command line. You have to close LibreOffice first.

While soffice is a bit high-maintenance in some respects, being able to convert word processor documents to PDFs on the command line is really handy—especially if you have a large number of files to convert.
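For larger batches, it can help to wrap the conversion in a small function. The sketch below is only one way to do it: the name docs_to_pdf and the directory arguments are hypothetical, and it relies on soffice’s --outdir option to keep the generated PDFs separate from the source documents.

```shell
# docs_to_pdf SRC_DIR DEST_DIR
# Hypothetical helper: convert every .doc, .docx and .odt file in
# SRC_DIR to PDF and collect the results in DEST_DIR.
docs_to_pdf() {
    local src="$1" dest="$2" f
    mkdir -p "$dest" || return 1

    for f in "$src"/*.doc "$src"/*.docx "$src"/*.odt; do
        # Skip patterns that matched nothing.
        [ -e "$f" ] || continue
        # --outdir tells soffice where to write the converted file.
        soffice --headless --convert-to pdf --outdir "$dest" "$f" || return 1
    done
}
```

Calling docs_to_pdf drafts pdfs would convert everything in a drafts directory and stop at the first file that fails to convert.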

Convert PDFs to Text

Sometimes, it’s handy to be able to convert a PDF file to text so that you can manipulate the file’s contents easily.

On Linux, pdftotext is a handy utility for doing this.

On Ubuntu, pdftotext is part of the poppler-utils package, so you can install it with:

sudo apt-get install poppler-utils

Then, to convert a PDF file to text, use a command like:

pdftotext file.pdf text.txt

In this example, file.pdf will be converted to a text file named text.txt.

Note that the binary you run is called pdftotext, not pdf2text as you might expect from similarly named converters. It’s a little confusing, but that’s what makes Linux fun!
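Because pdftotext can write to standard output when you give it - as the output file, it also combines nicely with other command-line tools. As a small sketch, the hypothetical pdf_grep function below prints the names of the PDFs whose text contains a given phrase:

```shell
# pdf_grep PATTERN FILE...
# Hypothetical helper: list the PDF files whose extracted text
# matches PATTERN (case-insensitive).
pdf_grep() {
    local pattern="$1" f
    shift
    for f in "$@"; do
        # "-" sends the extracted text to stdout instead of a file;
        # -q keeps pdftotext from printing its own messages.
        if pdftotext -q "$f" - | grep -q -i -- "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, pdf_grep invoice *.pdf would list every PDF in the working directory that mentions the word invoice. Scanned documents that contain only images will not match, since there is no text for pdftotext to extract.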

Reduce PDF File Size Using Ghostscript

Sometimes you can get a PDF file that is very large and therefore difficult to email. This happens to me frequently when someone sends me a high-quality scan of a document in PDF form.

Unless I need the document to be high resolution, I like to make it smaller so that it takes up less space on my hard drive and can be distributed to other people more easily. You can do this using Ghostscript.

To install Ghostscript on Ubuntu, run (you guessed it!):

sudo apt-get install ghostscript

Then, to reduce the size of a PDF file, run a command such as:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

Here, input.pdf is the file you want to make smaller, and output.pdf is your output file. Your input file will be left intact.

You can play around with the Ghostscript parameters to find the best balance between file size and quality. Keep in mind, of course, that making a file much smaller will reduce its quality, and sometimes, if a PDF has embedded images, there is no way to make it small without severely reducing readability.
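The parameter that matters most here is -dPDFSETTINGS. Besides /screen, Ghostscript also accepts presets such as /ebook and /printer, which keep images at progressively higher resolution. One way to compare them is sketched below; the function names shrink_pdf and try_presets are hypothetical wrappers around the gs command shown above.

```shell
# shrink_pdf INPUT OUTPUT [PRESET]
# Hypothetical wrapper around the gs command shown earlier.
# PRESET is one of Ghostscript's PDFSETTINGS names, such as
# screen, ebook or printer. Defaults to screen.
shrink_pdf() {
    local input="$1" output="$2" preset="${3:-screen}"
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
       -dPDFSETTINGS="/$preset" \
       -dNOPAUSE -dQUIET -dBATCH \
       -sOutputFile="$output" "$input"
}

# try_presets INPUT
# Write one output file per preset and print its size in bytes,
# so the results can be compared side by side.
try_presets() {
    local input="$1" preset out
    for preset in screen ebook printer; do
        out="${input%.pdf}-$preset.pdf"
        shrink_pdf "$input" "$out" "$preset" || return 1
        printf '%s\t%s bytes\n' "$out" "$(wc -c < "$out" | tr -d ' ')"
    done
}
```

Running try_presets scan.pdf would leave scan-screen.pdf, scan-ebook.pdf and scan-printer.pdf next to the original, so you can open each one and pick the smallest file that is still readable.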

But again, some PDFs are much larger than they need to be, and Ghostscript is a handy way to make them lean.

My One Wish: A Better Way to Edit PDFs Directly on Linux

The one thing I wish I could do more easily on Linux is edit the contents of a PDF without converting the PDF to a different format. In Adobe Acrobat, it’s possible to edit a PDF directly (although even there, the experience can sometimes be imperfect; your fonts may be thrown off, for example), but on Linux, there’s no good way to do this.

You can edit PDFs in LibreOffice or GIMP and then export your new file directly back to PDF, but neither of these approaches works super well.

Still, there are plenty of very powerful things you can do with PDFs on Linux, and you don’t need Adobe for any of them.

http://www.fixate.io

Chris Tozzi has worked as a journalist and Linux systems administrator. He has particular interests in open source, agile infrastructure and networking. He is Senior Editor of content and a DevOps Analyst at Fixate IO.

