Sed, cat, grep, tr and other Linux command-line tools are well known to programmers and sysadmins.
But these Linux programs can be handy even if your job is not in development or ITOps. Here’s a list of ways that I use Unix CLI tools in my non-tech life.
1. Merge PDFs
Combining multiple PDF files into a single PDF takes a long time if you do it through a GUI app. It requires lots of pointing, clicking and searching. A faster way is to use pdftk, an open source CLI tool for editing PDFs.
You’ll probably need to install pdftk, since it is not pre-installed in any Linux distribution I know of. But on Ubuntu, it’s a simple apt-get install pdftk away.
Once installed, pdftk lets you merge PDFs with a command like:
pdftk file1.pdf file2.pdf cat output combined.pdf
You can do lots of other stuff with pdftk, too. For example, you can extract only a specific range of pages from a PDF file and merge those with another file.
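Here is a rough sketch of that page-range trick, in case it helps. It assumes you want pages 1 through 3 of the first file followed by all of the second file. The function name merge_page_range and the file names are placeholders I made up for illustration.

```shell
# Sketch: take pages 1-3 of the first PDF, then append the whole second PDF.
# merge_page_range is a hypothetical helper name; file names are placeholders.
merge_page_range() {
    first_pdf=$1
    second_pdf=$2
    out_pdf=$3
    # A= and B= are pdftk "handles"; A1-3 selects pages 1 through 3 of A,
    # and a bare B selects every page of B.
    pdftk A="$first_pdf" B="$second_pdf" cat A1-3 B output "$out_pdf"
}

# Usage: merge_page_range file1.pdf file2.pdf combined.pdf
```

Adjust the page range to taste. The handles only need to be distinct uppercase letters.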
2. Convert word processor documents to PDF
Where do I get my PDFs in the first place? In many cases, they’re from word processor files that I converted to PDF.
And if I converted my documents to PDF using the GUI dialogue in my word processor, I’d be an old man by the time I finished. I have better things to do with the next forty years of my life.
So, to avoid the tedium, I use soffice, the CLI tool for the LibreOffice word processor. It can convert a .doc file (or .odt, .docx or almost any other kind of word processor file) to PDF using a command like:
soffice --convert-to pdf *.doc --headless
This will convert all .doc files in the working directory to PDF files.
The soffice program supports lots of other conversions, too. Pretty much any file type that is supported by LibreOffice will work with soffice.
You might want to note, though, that soffice has some quirks. For one, it doesn’t work if another instance of LibreOffice is already running. (There is a workaround here, which involves starting soffice as a different user, but I forget the exact syntax.) So you have to close LibreOffice first.
I have also found that soffice is finicky about its syntax. You can’t move arguments and flags around as easily as you can with most Linux CLI tools, and the exact syntax required seems to vary somewhat by Linux distribution. The command above works on Ubuntu 16.04. (It may complain about not being able to find a Java runtime, but the command still works.)
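If you want the PDFs to land in a separate folder, something like the sketch below may work. Treat it as an assumption-laden starting point. The function name docs_to_pdf and the profile path are hypothetical. The -env:UserInstallation option points soffice at a throwaway profile, which is a different workaround from the "different user" one mentioned above, and the argument order shown here is not the one I tested on Ubuntu 16.04.

```shell
# Sketch: convert documents to PDF into a chosen directory, using a
# throwaway LibreOffice profile so that a running GUI instance is less
# likely to block the conversion.
# docs_to_pdf and the profile path are hypothetical.
docs_to_pdf() {
    out_dir=$1
    shift
    profile_dir="${TMPDIR:-/tmp}/soffice_convert_profile"
    # --outdir tells soffice where to write the converted files.
    soffice "-env:UserInstallation=file://$profile_dir" \
        --headless --convert-to pdf --outdir "$out_dir" "$@"
}

# Usage: docs_to_pdf ./pdf *.doc
```

Given how finicky soffice is, you may need to shuffle the flags around before this behaves on your distribution.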
3. Count instances of words in a file
Ever find yourself needing to determine how many times a particular word appears within a document or web page?
A quick and easy way to find out is to save the document as a text file, then use the programs grep and wc to do the math for you. For example, this command would count how many times the word “Linux” appears within the file somefile.txt:
cat somefile.txt | grep -o Linux | wc -l
You could make the search case-insensitive by passing the -i flag to grep, by the way.
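A case-insensitive version might look like the sketch below. The function name count_word is just something I made up so that the command is easy to reuse.

```shell
# Sketch: count every occurrence of a word in a text file, ignoring case.
# count_word is a hypothetical helper name.
count_word() {
    word=$1
    file=$2
    # -o prints each match on its own line, so wc -l counts matches
    # rather than matching lines; -i ignores case.
    grep -o -i -- "$word" "$file" | wc -l
}

# Usage: count_word Linux somefile.txt
```

Note that this also counts the word when it appears inside a longer word. Adding grep's -w flag should restrict the count to whole words.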
4. Make an invoice
As a freelance writer, I often have to make invoices for blog posts that I publish. And as a lazy and disorganized person, I do a poor job of keeping my own records of my publications. Instead, I wait until my invoice is due, then log into WordPress to pull data about how many posts I published in that month.
If I copy and paste the WordPress “Posts” page to a text file, I get a messy bunch of text, which looks like this:
"5 Cool Unikernels Projects Edit | Quick Edit | View Christopher Tozzi Features ClickOS, Clive, Jitsu, Mirage OS, Rump Kernels, Unikernels 0No approved comments22 pending comments Published 2016/08/24 Good Bad Select Containers vs. VMs: Which Virtualization Solution is Better? Containers vs. VMs: Which Virtualization Solution is Better? Edit | Quick Edit | View Christopher Tozzi Features —No tags —No comments Published 2016/08/22 Good Good Select Docker, System Containers and VMs: Virtuozzo’s Take Docker, System Containers and VMs: Virtuozzo’s Take Edit | Quick Edit | View Christopher Tozzi Features containerization, docker, OS, system containers, virtual machines, Virtuozzo —No comments Published 2016/08/17 Good"
But I can use grep to pull out just the parts I want, like so:
cat hey | grep -v Edit | grep -v Christopher | grep -v comment | grep -v keyword | grep -v "Last Modified" | grep -v Published | grep -v Select | grep -v Bad | grep -v Good | grep -v OK
(I know: That grep command could be a lot cleaner. I do not claim to be a grep wizard.)
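For what it's worth, one way to tidy it up is to hand grep all of the patterns at once. This is only a sketch: filter_posts is a hypothetical name, and it assumes the pasted text sits in the same file as above, with one field per line.

```shell
# Sketch: one grep call that drops the same lines as the chained
# grep -v commands. filter_posts is a hypothetical helper name.
filter_posts() {
    # -v inverts the match; -E allows alternation with |.
    grep -v -E 'Edit|Christopher|comment|keyword|Last Modified|Published|Select|Bad|Good|OK' "$1"
}

# Usage: filter_posts hey
```

Like the long version, this throws away any post title that happens to contain one of those words, so check the output before you send it anywhere.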
The result is nicely formatted text, which I can send to my very nice editors in order to get paid:
"5 Cool Unikernels Projects 2016/08/24 Containers vs. VMs: Which Virtualization Solution is Better? 2016/08/22 Docker, System Containers and VMs: Virtuozzo’s Take 2016/08/17"
5. Download web pages with wget
Sure, it’s easy enough to save a web page directly from your browser.
But what if you want to save, say, one hundred web pages? Saving each one manually would take a very long time. That’s why, when I want to do something such as pull library catalog records for a hundred different search terms, I do it with wget. I write a short bash script with a for loop, feed it the list of search terms, then have wget iteratively download the HTML results from each search term.
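A script along those lines could look roughly like this. Everything specific in it is an assumption for illustration: the function name fetch_searches, the catalog URL and the output file names are made up. It also uses a while read loop rather than a for loop, so that search terms containing spaces stay in one piece.

```shell
# Sketch: download one results page per search term.
# fetch_searches, the URL and the file naming are all hypothetical.
fetch_searches() {
    terms_file=$1    # text file with one search term per line
    out_dir=$2       # directory to hold the downloaded HTML
    base_url='https://catalog.example.org/search?q='
    mkdir -p "$out_dir"
    n=0
    while IFS= read -r term; do
        [ -n "$term" ] || continue    # skip blank lines
        n=$((n + 1))
        # Crude encoding: only spaces are replaced; other special
        # characters would need real URL encoding.
        query=$(printf '%s' "$term" | tr ' ' '+')
        wget -q -O "$out_dir/result_$n.html" "$base_url$query"
    done < "$terms_file"
}

# Usage: fetch_searches terms.txt results
```

If you are hitting a real site a hundred times in a row, it is polite to add a sleep between requests.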