I have finally declared war on my PDF organizing system. I am struggling to manage some 2,400 PDF files on my computer. This pile consists of downloaded or scanned journal articles, newspaper articles, historical documents, PhD dissertations, books, and various personal documents that I got sick of dragging around the world in fat, disorganized folders.

This week I tried to find a dissertation I had partly read, which discussed the relationship between liberalism in Japanese domestic politics under the premiership of Hara Kei and the aftermath of the March 1st Movement in colonial Korea.

Did I file this PDF away in the Academic Papers/Korea/ folder, the Academic Papers/Japan/ folder, the Documents/To Read/ folder, the Docs/Dissertation/MUST READ/ folder, or was it still stranded in the downloads folder? Apparently none of these, and I still haven’t been able to find the damn file. My folder system is an embarrassing mess.

However, in the 21st century, where tagging rules, why should my folder system matter? Why can’t I tag the stupid files and be done with it? Each PDF file could have a dozen tags that I could easily search through later. The file I described above, for example, could be tagged as “Academic Papers, Korea, Japan, colonial period, Taisho, liberalism, political, Hara Kei, March 1st Movement, dissertations”. Well, in order to do this I broke down and paid for the PDF indexing software Yep (Mac only; I’m sure there is something similar for those unfortunate Windows users out there).

I feel better now and have already made serious progress. With tag clouds, smart folders, and an iTunes-like interface in Yep, I’m hoping I will gradually overcome my jumbled mess of PDF files and therefore be able to write my own dissertation in no time. Or not, but at least I feel less like I’m hunting for a document in a bombed-out archive. I hope that future versions of the software will offer the option of including not only PDFs but also images, since scanning documents to PDF is much slower than taking pictures of them. I have thousands of crisp, high-contrast black and white photos of historical documents and newspaper articles that I would love to be able to tag in the same way without having to do it in a separate piece of software.

There are half a dozen other programs out there, like Yojimbo, DEVON products, etc., which also allow you to store PDF files in a database of various kinds of data that might also include regular text, images, and so on. However, what I don’t like about them is that 1) they are often slow to import the PDFs and 2) they usually import the PDFs into the program’s own database, thus swelling the size of the DB and slowing down its overall performance. I am not really interested in having over 30,000 pages of PDFs all inside the DB of a program. I would rather keep them scattered in various places on my hard drive and merely indexed by a program like Yep.
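
For the curious, the difference between indexing files in place and importing them into a database can be sketched in a few lines of Python. This is only my own toy illustration, not how Yep or any of the other programs actually work: the `TagIndex` class and the file names below are made up for the example. The point is that the index stores nothing but tags and paths, so the PDFs stay wherever they are on disk and the index stays small.

```python
from collections import defaultdict


class TagIndex:
    """Toy tag index: maps tags to file paths. The files are never copied or moved."""

    def __init__(self):
        # tag -> set of path strings that carry that tag
        self._paths_by_tag = defaultdict(set)

    def tag(self, path, *tags):
        # Record only the path string under each tag (tags are case-insensitive).
        for t in tags:
            self._paths_by_tag[t.strip().lower()].add(str(path))

    def find(self, *tags):
        # Return the sorted paths that carry every one of the requested tags.
        sets = [self._paths_by_tag.get(t.strip().lower(), set()) for t in tags]
        if not sets:
            return []
        return sorted(set.intersection(*sets))


index = TagIndex()
# Hypothetical file names, for illustration only.
index.tag("Academic Papers/Korea/example-dissertation.pdf",
          "Korea", "Japan", "Hara Kei", "dissertations")
index.tag("Downloads/example-article.pdf", "Japan", "Taisho")

print(index.find("japan", "korea"))
```

With something like this, it no longer matters which folder a file ended up in: a search for "japan" and "korea" together turns up the dissertation whether it lives in Academic Papers/Korea/ or is still stranded in the downloads folder.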