It is not often that I come across a tool which does something I’ve been trying to do for a long time. In this instance the tool is for PDF manipulation. This blog entry is primarily a reminder to myself of how to do this, but hopefully it will help someone else.
Scanning documents and converting them to PDF files is fairly simple these days thanks to freebies such as Foxit Reader. However, for anything more complicated (page insertion, deletion, rotation, etc.) I’ve always had trouble finding free tools. I do own an old copy of Adobe Acrobat, which can do many of these things, but it is on an old computer and I don’t seem to be able to move it thanks to Adobe’s licensing regime. In the past I’ve managed to find a variety of tools for one-off PDF manipulations, say a merge of two documents, but no real “workhorse” tool or programme worth keeping around. Today, however, I found a real gem!
Some background. I’ve started scanning a variety of documents rather than keeping paper copies. Single pages are typically no problem, as described above. However, today I had the need to scan an A5 booklet formed of A4 sheets. Each printed portrait A5 page was half of a landscape A4 sheet. The page numbers of the booklet were as follows:
A4 sheet1: A5 pages: 12 / 1 (reverse side: 2 / 11)
A4 sheet2: A5 pages: 10 / 3 (reverse side: 4 / 9)
A4 sheet3: A5 pages: 8 / 5 (reverse side: 6 / 7)
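The numbering above follows a regular pattern, so it can be worked out for a booklet of any size. Here is a minimal Python sketch of that pattern. Python and the function name booklet_sheets are my own additions for illustration, and the sketch assumes a simple folded booklet whose page count is a multiple of four.

```python
def booklet_sheets(total_pages):
    """Return one tuple per A4 sheet:
    (front left, front right, reverse left, reverse right)."""
    if total_pages <= 0 or total_pages % 4 != 0:
        raise ValueError("a folded booklet has a multiple of 4 pages")
    sheets = []
    for i in range(total_pages // 4):
        # Outer sheets carry the highest and lowest page numbers;
        # each sheet further in moves two pages towards the middle.
        front = (total_pages - 2 * i, 2 * i + 1)
        back = (2 * i + 2, total_pages - 2 * i - 1)
        sheets.append(front + back)
    return sheets


if __name__ == "__main__":
    for n, (fl, fr, bl, br) in enumerate(booklet_sheets(12), start=1):
        print(f"A4 sheet{n}: A5 pages: {fl} / {fr} (reverse side: {bl} / {br})")
```

For a 12-page booklet this prints the same three lines as the list above.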
In the past, I tried to solve this type of task by cutting the A4 sheets in half and then scanning the resultant A5 sheets. This worked after a fashion, but the scanner did not feed the A5 sheets reliably. I figured there must be a better way. It turns out there is, and a bit of Googling led me to it.
Three free tools can accomplish what I want. These tools obviously need to be installed/available on your computer.
- Foxit Reader – for the scanning of the document (any “scan to PDF” tool will be fine for this)
- Briss – for cropping the scanned A4 PDF pages into A5 PDF pages
- PDFtk – for rotating and reordering the PDF file’s pages – PDFtk Server (the command line tool) is described herein
PDFtk is the real gem which I found today. Super powerful and free!
The process to scan and process such a booklet and get a usable resulting PDF is as follows.
Remove any staples from the booklet and check the pages remain in order
This is fairly straightforward. Make sure the sheets will pass through the scanner without issues: check for any wrinkles or bent corners. The key is to ensure that the pages scan as smoothly and repeatably as possible.
Scan the double-sided A4 sheets
Using a typical tool, such as Foxit Reader, scan the double-sided A4 sheets into a PDF named ff1.pdf. I ended up with sheets that alternated between upside down and right side up, so the PDF pages were as follows:
PDF page 1: pages 12/1 upside down
PDF page 2: pages 2/11 rightside up
PDF page 3: pages 10/3 upside down
PDF page 4: pages 4/9 rightside up
PDF page 5: pages 8/5 upside down
PDF page 6: pages 6/7 rightside up
Rotate the upside down pages
We need to get all the pages in the PDF file to be the correct orientation. This is a breeze with PDFtk. In my case, I used the following command line:
pdftk ff1.pdf cat 1south 2 3south 4 5south 6 output ff2.pdf
Reading the documentation further, I discovered that one could instead use:
pdftk ff1.pdf rotate oddsouth output ff2.pdf
This command rotates pages 1, 3 and 5 by 180 degrees, leaves the even pages untouched, and writes the resulting PDF to ff2.pdf. We now have a PDF with the scanned pages correctly orientated, but each PDF page still holds two A5 pages:
PDF page 1: pages 12/1
PDF page 2: pages 2/11
PDF page 3: pages 10/3
PDF page 4: pages 4/9
PDF page 5: pages 8/5
PDF page 6: pages 6/7
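For a longer booklet, typing out the explicit page list by hand gets tedious. The following is a rough Python sketch that builds the same kind of "cat" command for any number of scanned pages. The function name rotate_command and its parameters are hypothetical, and it assumes filenames without spaces. It only uses the "south" suffix already shown in the command above, and it only builds the command text; it does not run PDFtk.

```python
def rotate_command(src, dst, page_count, upside_down="odd"):
    """Build a pdftk 'cat' command that turns every upside-down page by 180 degrees.

    upside_down is "odd" or "even", depending on which scanned pages came out
    the wrong way up.
    """
    if upside_down not in ("odd", "even"):
        raise ValueError("upside_down must be 'odd' or 'even'")
    parts = []
    for page in range(1, page_count + 1):
        is_odd = page % 2 == 1
        flip = is_odd if upside_down == "odd" else not is_odd
        # "south" asks pdftk to rotate the page by 180 degrees
        parts.append(f"{page}south" if flip else str(page))
    return f"pdftk {src} cat {' '.join(parts)} output {dst}"


if __name__ == "__main__":
    print(rotate_command("ff1.pdf", "ff2.pdf", 6))
```

For my six scanned pages this produces the first command I used.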
Crop each A4 PDF page into a pair of A5 PDF pages
This is where the Briss tool works its magic. Briss crops PDF files, and in this case we want two crop regions on each PDF page. Load ff2.pdf into Briss. Note that Briss overlays all the even-numbered pages into one stack and all the odd-numbered pages into another, so only two sets of crop definitions are needed for a multi-page PDF. A single crop area is displayed by default on each of the two stacks. A second crop area can be created by clicking and dragging on the stacked pages. Carefully define two similar-sized crop areas over the pair of A5 pages on each displayed A4 stack. Once the areas are defined, generate the new PDF, ff3.pdf.
Reorder the PDF pages
The resultant PDF, ff3.pdf, should now consist of A5-sized pages, but they will be out of order. In my case the PDF pages contained the booklet pages in the following order: 12, 1, 2, 11, 10, 3, 4, 9, 8, 5, 6, 7.
We turn again to PDFtk and run the following command:
pdftk ff3.pdf cat 2 3 6 7 10 11 12 9 8 5 4 1 output ff4.pdf
This creates a PDF file, ff4.pdf, with reordered pages. The first page in the new PDF was the second from the input PDF and so on.
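The page list in that command can be derived rather than worked out by hand. The Python sketch below does this for a booklet of any size. The function names are hypothetical, and it assumes the cropped PDF has the same layout as mine: each scanned side contributes its left half and then its right half, fronts and reverses alternate, and the page count is a multiple of four. It only builds the command text; it does not run PDFtk.

```python
def reorder_list(total_pages):
    """Return the cropped-PDF page numbers in reading order (booklet page 1 first)."""
    if total_pages <= 0 or total_pages % 4 != 0:
        raise ValueError("a folded booklet has a multiple of 4 pages")
    # Booklet page numbers in the order they appear in the cropped PDF.
    scanned = []
    for i in range(total_pages // 4):
        scanned += [total_pages - 2 * i, 2 * i + 1,       # front: left, right
                    2 * i + 2, total_pages - 2 * i - 1]   # reverse: left, right
    # Invert the mapping: where in the cropped PDF does each booklet page sit?
    position = {page: index for index, page in enumerate(scanned, start=1)}
    return [position[page] for page in range(1, total_pages + 1)]


def reorder_command(src, dst, total_pages):
    """Build the pdftk 'cat' command that puts the cropped pages in reading order."""
    pages = " ".join(str(p) for p in reorder_list(total_pages))
    return f"pdftk {src} cat {pages} output {dst}"


if __name__ == "__main__":
    print(reorder_command("ff3.pdf", "ff4.pdf", 12))
```

For the 12-page booklet this reproduces the command above. If your scanner or crop order differs from mine, the scanned list would need adjusting to match.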
Enjoy the completed PDF
We now have a completed PDF containing the individual pages from the booklet, all correctly ordered and rotated. A little bit of work, sure. But much easier than manually trying to scan each individual page.
A further comment. I think the PDFtk Server tool is fabulous. It is a command line tool and I can see myself returning to it time and again. It is seriously powerful with a vast array of options. I am sorry and amazed that I’ve not come across it before. There is a free GUI version available which isn’t as powerful and a paid-for GUI with a similar feature set.
I stumbled upon your ‘page’ and wondered whether you would advise me regarding my PDF requirements. I simply want a reliable and relatively accurate program which I can use to convert an occasional PDF document into a M.S.Word document, for subsequent adaption to my own specific needs. I have looked at many such programs and they seem to me to be too expensive and/or too complicated.
Am I correct in thinking that “PDFtk” could be a program that would fit my needs? Any advice will be very much appreciated.
For initial scanning of documents, I highly recommend “NAPS2”. It’s free and open source (Windows). I haven’t tried it under Wine on Linux. New features are continually being added.
Word 2013 (and Office 365 ProPlus) have native support for opening and editing PDFs. So if you have the latest Office you don’t need to convert the file. Naturally, if the PDF is protected, it can’t be edited in Office.
Word 2013 (and Office 365 ProPlus) DO NOT HAVE native support for opening and editing PDFs. They can only save documents to the PDF format.
At the Command Prompt, enter pdftk --help
The complete help file is there.