Using an iPhone 3GS to Scan Documents and Create PDFs

During my field research in Korea, Taiwan, and China, I carried a hefty camera with me to archives and libraries. On those fortunate occasions when I was allowed to use it, I snapped high-contrast “text mode” photos of everything from handwritten documents and mimeographed newspapers to pages of books, along with thousands of pictures of microfilm reader screens zoomed in on a particular item. I also developed my own coding system linking the image numbers in the digital camera to items in my notes, so that I can easily find the images again when I need them for my dissertation.

On other occasions I carried a smaller camera in my backpack for emergencies, when I wanted to copy some pages out of a book, but its pictures were often blurry. I recently discovered, however, that the camera on my iPhone 3GS is good enough to take decent pictures of books and documents in moderate indoor lighting.

The Pics Need Processing

To get optimal results, however, pictures of books and documents taken with an iPhone 3GS need to be processed: the contrast and brightness need to be turned way up, the image size can be reduced significantly (from about 1.1MB to 0.25MB each), and if you are copying an article or part of a book, you ideally want the result to be a PDF, not a folder full of pictures. Indeed, it is for this purpose that I have logged dozens of hours standing in front of the various PDF scanners in the libraries here at Harvard, which I wrote about here.

Processing these pictures is time-consuming and begs for a hack. iPhone applications like JotNot and PocketScan are a nice idea, but I find them incredibly slow and awkward to use.

So I spent a few hours last night and came up with an inelegant but effective solution that, once set up, makes getting iPhone pictures processed and into a readable PDF fast and painless. A real hacker would write a script that does all of this in a single step, and I would love to get my hands on one, but in the meantime, in case someone out there finds it useful, here is my current solution using OS X 10.6 and Adobe Photoshop CS3.

Preparations

You only need to do these steps once to set up your computer, but they are somewhat convoluted. I’m sure someone out there has a more efficient method:

1. Create a folder somewhere easy to get to on your hard drive and call it “Convert”.

2. Create another folder (in the same location as Convert, for example) and call it “Converted”.

3. Open “Automator” in your Applications folder and create a new Automator workflow that looks like this:

Save this as a workflow that we can attach to the “Convert” folder as a folder action. In the top pop-up menu, select “Other…” and choose the “Convert” folder, which will hold the iPhone photos you drop in to be converted into a PDF. The AppleScript tells Photoshop to run an action I have called “CreatePDF”, which processes the images one at a time (see below). The workflow then grabs all the files that Photoshop saves into the “Converted” folder (which you should specify) and creates a PDF from them. The final step cleans up by deleting the images in the Convert and Converted folders. You can remove this step if you don’t want the images deleted, but I usually drop in copies or exported images, so I don’t need them once the PDF has been created. If you like, you can download my Automator application version of this workflow here, modify it for your own folder locations, and save it as a workflow. Keep in mind that you need to change the paths in the “rm” commands to point to your Convert and Converted folders.

4. Now open Photoshop to create two actions. You can see what my actions look like below and create your own versions, or download mine here, import them into Photoshop, and modify them for your own needs. In the picture below you can see that one action, called PrepPDF, processes a single image by a) converting it from color to grayscale, b) increasing the brightness and contrast, c) reducing the size of the image, and d) saving it as a heavily compressed JPEG. You may want to process your images differently. The second action, CreatePDF, runs Photoshop’s Batch command, performing the PrepPDF action on every image it finds in the Convert folder and saving each processed image in the Converted folder.

5. Finally, in the Finder, right-click the “Convert” folder, choose “Folder Actions Setup…”, and attach the workflow you created in Automator.
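For readers without Photoshop, PrepPDF steps a) through c) can be approximated in a few lines of Python. This is a minimal sketch, not Photoshop’s actual Brightness/Contrast algorithm: it assumes the image has already been decoded into rows of (R, G, B) tuples, and it uses the standard ITU-R BT.601 luma weights for grayscale, a simple linear stretch around mid-gray for contrast, and block averaging for shrinking. The function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Gray = List[List[int]]


def to_grayscale(rgb: List[List[RGB]]) -> Gray:
    """Step a): convert RGB rows to 8-bit gray using ITU-R BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb]


def brightness_contrast(gray: Gray, brightness: int, contrast: float) -> Gray:
    """Step b): stretch values around mid-gray (128) by `contrast`,
    shift them by `brightness`, and clamp the result to 0..255."""
    def adjust(v: int) -> int:
        return max(0, min(255, round((v - 128) * contrast + 128 + brightness)))
    return [[adjust(v) for v in row] for row in gray]


def downscale(gray: Gray, factor: int) -> Gray:
    """Step c): shrink by an integer factor by averaging factor x factor blocks.
    Rows or columns that don't fill a whole block are dropped."""
    height, width = len(gray) // factor, len(gray[0]) // factor
    out: Gray = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [gray[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(round(sum(block) / len(block)))
        out.append(row)
    return out
```

For example, `downscale(brightness_contrast(to_grayscale(pixels), 20, 1.5), 2)` would give a grayscale, higher-contrast image at half the width and height; the brightness and contrast values are arbitrary starting points to tune against your own pages. Step d), JPEG compression, needs an image library and is left out here.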

Now everything is set up, and you can convert your pictures to PDF whenever you like using the steps below. You won’t have to repeat the steps above.

If something goes wrong during setup, check that the workflow and actions point to the correct folders, and that the action names in Photoshop, and the name of the action set they are saved in, match the ones your workflow uses.
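Stripped of the Automator and Photoshop specifics, the workflow from step 3 has three stages: process each image in Convert into Converted, build a PDF from the processed images, then delete the images in both folders. Here is a minimal Python sketch of that sequence. The `process_image` and `make_pdf` callbacks are hypothetical stand-ins for Photoshop’s CreatePDF batch and Automator’s PDF action, and the set of accepted file extensions is an assumption.

```python
from pathlib import Path
from typing import Callable, List

# Assumption: the file types you drop into Convert; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(folder: Path) -> List[Path]:
    """Return the image files in `folder`, sorted by file name."""
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)


def run_workflow(convert_dir: Path, converted_dir: Path,
                 process_image: Callable[[Path, Path], Path],
                 make_pdf: Callable[[List[Path]], Path]) -> Path:
    # Stage 1: process every image (stand-in for the Photoshop batch action).
    for src in list_images(convert_dir):
        process_image(src, converted_dir)
    # Stage 2: build one PDF from the processed images, in file-name order.
    pdf_path = make_pdf(list_images(converted_dir))
    # Stage 3: delete the images in both folders (the workflow's "rm" step).
    for folder in (convert_dir, converted_dir):
        for image in list_images(folder):
            image.unlink()
    return pdf_path
```

Sorting by file name keeps the pages in shooting order as long as the image file names sort in the order you took the pictures. Only image files are deleted, so anything else left in the folders is untouched.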

Going from iPhone Pictures to Readable PDF

1. Take pictures of the document or book in decent lighting. Tap the screen to focus if the camera isn’t focusing properly. It would be nice to build a copy stand to hold the iPhone steady while taking pictures, but I’m not that kind of hacker.

2. Import your pictures of the documents or books from your iPhone into iPhoto, or use Image Capture to copy them somewhere on your computer. I don’t recommend importing the pictures from the iPhone directly into the “Convert” folder: the copying is slow, and the workflow seems to run ahead of it and end up with incomplete PDFs.

3. Open Photoshop. The workflow should launch it, but I find it works better when it is already open.

4. With Photoshop open, drag and drop your images (or copies of them, by holding down the Option key) into the “Convert” folder. This triggers the Automator workflow, which runs the Photoshop action CreatePDF, which in turn runs PrepPDF on each picture in the Convert folder and saves the processed images in the Converted folder. When Photoshop is done, the workflow creates a PDF from the processed images in the Converted folder and deletes all the images in both folders so they are clean and ready for the next job. The PDF will be saved on the Desktop (this old Automator action seems to be broken in Snow Leopard, and I can’t get it to save the PDF anywhere else).
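As for what the final Automator step does: turning a set of JPEGs into a PDF does not require re-encoding the images, because a PDF can embed JPEG data as-is using the DCTDecode filter. The sketch below, in plain Python with no third-party libraries, builds a minimal PDF with one page per JPEG and can write it to any location, which might be one way around the Desktop-only problem. It assumes grayscale or RGB JPEGs, sizes each page at one point per pixel, and ignores EXIF orientation; the function names are hypothetical.

```python
import struct
from typing import List, Tuple


def jpeg_size(data: bytes) -> Tuple[int, int, int]:
    """Return (width, height, components) from a JPEG's start-of-frame header.
    Assumes no standalone markers appear before the frame header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG")
    i = 2
    while i < len(data):
        if data[i] != 0xFF:
            raise ValueError("corrupt marker")
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before a marker
            i += 1
            continue
        seg_len = struct.unpack(">H", data[i + 2:i + 4])[0]
        # SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC)
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height, data[i + 9]
        i += 2 + seg_len  # skip this segment
    raise ValueError("no start-of-frame header found")


def jpegs_to_pdf(jpegs: List[bytes]) -> bytes:
    """Build a PDF with one page per JPEG, embedding each image unchanged."""
    objects: List[bytes] = []  # objects[k] holds the body of object k + 1

    def add(body: bytes) -> int:
        objects.append(body)
        return len(objects)

    add(b"<< /Type /Catalog /Pages 2 0 R >>")  # object 1
    add(b"")  # object 2: page tree, filled in once the pages exist
    page_ids = []
    for n, data in enumerate(jpegs):
        w, h, comps = jpeg_size(data)
        space = {1: b"/DeviceGray", 3: b"/DeviceRGB"}[comps]  # CMYK not handled
        img = add(b"<< /Type /XObject /Subtype /Image /Width %d /Height %d /ColorSpace %s "
                  b"/BitsPerComponent 8 /Filter /DCTDecode /Length %d >>\nstream\n"
                  % (w, h, space, len(data)) + data + b"\nendstream")
        # Draw the image over the whole page (1 pixel = 1 point).
        draw = b"q %d 0 0 %d 0 0 cm /Im%d Do Q" % (w, h, n)
        content = add(b"<< /Length %d >>\nstream\n%s\nendstream" % (len(draw), draw))
        page_ids.append(add(b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %d %d] "
                            b"/Resources << /XObject << /Im%d %d 0 R >> >> /Contents %d 0 R >>"
                            % (w, h, n, img, content)))
    kids = b" ".join(b"%d 0 R" % p for p in page_ids)
    objects[1] = b"<< /Type /Pages /Kids [%s] /Count %d >>" % (kids, len(page_ids))

    # Serialize the objects, recording byte offsets for the cross-reference table.
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref)
    return bytes(out)
```

For example, with `from pathlib import Path`, `Path("chapter.pdf").write_bytes(jpegs_to_pdf([p.read_bytes() for p in sorted(Path("Converted").glob("*.jpg"))]))` would write the PDF next to the script; the folder and file names are placeholders.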

With this setup, even while standing in the stacks of my library, I can whip out my iPhone, hold the book open, snap pictures of an interesting chapter, and turn them quickly and easily into PDFs once I get home. Here is one short example of a PDF created from pictures taken in the stacks with my iPhone.
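On the problem mentioned in step 2, where the workflow runs ahead of a slow copy into the Convert folder: one common heuristic, sketched below in Python, is to wait until every file’s size has stopped changing across a few consecutive checks before processing anything. It is a generic technique, not part of the workflow above, and the polling interval, number of checks, and timeout are arbitrary assumptions. Importing elsewhere first and then dropping in copies remains the simpler fix.

```python
import os
import time
from typing import Dict, Iterable


def wait_until_stable(paths: Iterable[str], interval: float = 1.0,
                      stable_checks: int = 2, timeout: float = 120.0) -> bool:
    """Return True once every file exists and its size has stayed the same for
    `stable_checks` consecutive polls; return False if `timeout` seconds pass first."""
    paths = list(paths)
    deadline = time.monotonic() + timeout
    last: Dict[str, int] = {}
    unchanged = 0
    while time.monotonic() < deadline:
        # -1 marks a file that hasn't appeared yet.
        sizes = {p: os.path.getsize(p) if os.path.exists(p) else -1 for p in paths}
        if sizes == last and all(s >= 0 for s in sizes.values()):
            unchanged += 1
            if unchanged >= stable_checks:
                return True
        else:
            unchanged = 0  # something is still growing or missing
        last = sizes
        time.sleep(interval)
    return False
```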

Update: If you are converting a lot of pictures into a single PDF, the AppleScript in the first step of the workflow can time out. I added two lines to my workflow to raise the timeout from the default two minutes to 10 minutes:

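tell application "Adobe Photoshop CS3"
  -- raise the Apple event timeout from the default 120 seconds to 600 (10 minutes)
  with timeout of 600 seconds
    -- the action and action set names must match those in your Photoshop Actions panel
    do action "CreatePDF" from "Default Actions"
  end timeout
end tell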

4 thoughts on “Using an iPhone 3GS to Scan Documents and Create PDFs”

Anthony Barker says:

2010.5.11 at 10:25

I am not a huge fan of PDFs – they get large and unwieldy quickly.

Why not use the comic formats (often called .cbz, .cbr and .cbt)? All you need to do is number the images and then zip or rar them.

http://en.wikipedia.org/wiki/Comic_Book_Archive_file

An alternative is DJVU
http://en.wikipedia.org/wiki/DjVu

and here is a comparison of the djvu and pdf
http://www.djvu.org/resources/djvu_digital_vs_super_hero_pdf.php
Sayaka says:

2010.5.15 at 16:12

you nerd.
Muninn says:

2010.5.20 at 21:31

Dear Anthony, thanks that is a great idea. I understand there are an increasing number of apps for that on iPad. To keep it simple for people I exchange PDFs with, I think for now I will stick with that older format but am certainly open to embracing another format in the future.

I couldn’t find any good OS X DJVU viewers – the only one I found looks like crap. Hope more apps embrace it in the future if the compression is significantly better.

Also looks like there are slim pickings for cbz/cbr readers. If they go more mainstream I will definitely look at that again!
Ken says:

2010.10.27 at 21:48

I took a peek at your sample PDF. It’s too bad that you aren’t working with Roman script. DEVONthink Pro Office has built-in OCR. I don’t think it will work with East Asian characters, though.
