Little dh and Planting Seeds

The following is a reposting of an entry I contributed to the THATCamp New England blog:

I am excited to have the opportunity to join THATCamp New England this November and look forward to learning from everyone I meet there. I was asked to post an entry here about the issues that I hope will be discussed at the event. I have no doubt many of the main themes I’m interested in will receive plentiful attention, but I would like to bring up two issues that I find of particular importance: 1) the continued need to appreciate and promote what I’ll call the “little dh” of the digital humanities and 2) an action-oriented discussion about the need to plant seeds within each and every department that promote the cultivation of both real skills and the requisite appreciation for a spirit of experimentation with technology in the humanities, not merely among the faculty but, even more importantly, as a part of the graduate curriculum.

Little dh

I have been unable to keep up with the ever-growing body of scholarship on the digital humanities but what I have read suggests that much of the work that has been done focuses upon the development of new techniques and new tools that assist us in conducting research and teaching in the humanities in roughly four areas: the organization of sources and data (for example Zotero, metadata practices), the analysis of data (e.g. using GIS, statistical text analysis), the delivery and representation of sources and research results (e.g. Omeka) and effective means for promoting student learning (e.g. teaching with clickers, promoting diverse online interactions).

I’m confident that these areas should and will remain the core of Digital Humanities for the foreseeable future. I do hope, however, that there continues to be an appreciation for digital humanities with a small “d” or little dh, if you will, that has a much longer history and, I believe, will remain important as we go forward. So what do I mean by little dh? I mean the creation of limited, often unscalable, and usually quickly assembled ad hoc solutions tailored to the problems of individual academics or specific projects. In other words, hacks. These solutions might consist of helping a professor, student, or specific research project effectively use a particular combination of software applications, writing short scripts to process data or to create workflows that move information smoothly from one application to another, creating customized web sites for highly specialized tasks, and so on. These tasks might be very simple, such as helping a classics professor develop a particular keyboard layout for a group of students or a particular project. They might be more complex and involve, for example, helping a Chinese literature professor create a workflow to extract passages from an old and outdated database, perform certain repetitive tasks on the resulting text using regular expressions, and then transform that text into a clean website with automatic annotations in particular places.

The skill set needed to perform “little dh” tasks is such that it is impossible to train all graduate students or academics for them, especially if they have little interest or time to tinker with technology. “Little dh” is usually performed by an inside amateur, for example, the departmental geek, or with the assistance of technology services at an educational institution that are willing to go beyond the normal bounds of “technical support” defined as “fixing things that go wrong.” Unfortunately, my own experience suggests that the creation of specialized institutes that focus on innovation and technology in education has sometimes reduced, rather than increased, scholars’ access to resources that can provide little dh, because it is far more sexy to produce larger tools that can be widely distributed than it is to provide simple customized solutions for the problems of individual scholars or projects. One such center to promote innovative uses of technology in education that I have seen in action, for example, started out providing very open-ended help to scholars but very quickly shifted to creating and customizing a very small set of tools that may or may not have been useful for the specific needs of the diverse kinds of scholarship being carried out in the humanities. There is a genuine need for both, even though one is far less glamorous.

I hope that we can discuss how it is possible to continue to provide and expand the availability of technical competence that can provide help with little dh solutions within our departments and recognize the wide diversity of needs within the academic community, even as we celebrate and increasingly adopt more generalized tools and techniques for our research and teaching.

Planting Seeds

I have been impressed with progress in the digital humanities among the more stubborn professors that I’ve come across in three areas: 1) an increasing awareness of open access and its benefits to the academic community, 2) an appreciation for the importance of utilizing online resources and online sites of interaction, and 3) the spread of bibliographic software among the older generation of scholars. These are, to be honest, the only areas of digital humanities that I have really seen begin to widely penetrate the departments I’ve interacted with, both as a graduate student and earlier as a technology consultant within a university. I’m now convinced the biggest challenge we face is not teaching the skills needed to use the software and techniques themselves to the professors and scholars of our academic community, but the pressing need for us to, as it were, “poison the young,” and infect them with a curiosity for the opportunities that the digital humanities offer to change our field in the three key areas of research, teaching, and, most threateningly for the status quo, publishing.

There are a growing number of centers dedicated to the digital humanities, but I wonder if we might discuss the opening of an additional front (and perhaps such a front has already been opened, and I would love to learn more of it) that attempts to plant a seed of digital humanities within every university humanities department by asking graduate students to take, or at least offering them the opportunity to take, courses or extended workshops on the digital humanities that focus on: some basic training in self-chosen areas of digital humanities techniques and tools, the cultivation of a spirit of experimentation among students, and finally a more theoretical discussion on the implications of the use of digital humanities for the humanities in general (particularly on professional practices such as publishing, peer review, and the interaction of academics with the broader community of the intellectually curious public). Promoting the incorporation of such an element into the graduate curriculum will, of course, be a department by department battle, but there are surely preparations that we can make as a community to help arm sympathetic scholars with the arguments and pedagogical tools needed to bring that struggle into committee meetings at the university and department level.

Zotero and DEVONThink

I have a bibliographic database in Zotero. The citation information is easy to scrape from web databases such as my own library, Amazon, and the many journal databases that I use. It is convenient to be able to tag and organize my sources, and I can use Zotero to format footnotes and bibliographies for my dissertation and other papers.

I recently shifted my note taking to a knowledge database called DEVONthink and, as I recently discussed here, I created a special template script to add individual note files (like note cards in the old days) for each fragment or note I take on a source. These are kept in a folder with, and linked to, a main overview note file for the source. Each fragment created duplicates the tags of the main note file.

Automating a Zotero to DEVONthink Workflow with Applescript

I thought it would be nice to automate the creation of a folder and a main note file in DEVONthink for each and every source I have in my Zotero database with a script. I also thought it would be nice if the script brought over any and all tags the source had in Zotero, including tags for whatever collections the source was found in. Finally, I wanted the script to only create folders and note files for those sources that I did not already have a folder for in DEVONthink so that I can run the script every once in a while to keep it synced with my Zotero without too much difficulty.
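The "only create what is missing" idea can be sketched in a few lines. This is a minimal illustration in Python, not the AppleScript the script itself is written in; the function name and the sample titles are hypothetical, and it assumes a source counts as already imported when a group with exactly the same title exists.

```python
def sources_needing_folders(zotero_titles, existing_group_names):
    """Return the Zotero titles that have no matching group yet, in original order."""
    existing = set(existing_group_names)
    seen = set()
    missing = []
    for title in zotero_titles:
        # Skip titles that already have a group, and skip duplicate titles.
        if title not in existing and title not in seen:
            missing.append(title)
            seen.add(title)
    return missing


# Hypothetical titles, for illustration only.
print(sources_needing_folders(["Book A", "Book B", "Book A", "Book C"], ["Book B"]))
```

Because the comparison is by exact title, renaming a group afterwards would make the same source look new again on the next run.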

I created such a script this evening and it can be downloaded here:

Zotero to DEVONthink (Version 1.6 2010.10.16)

To use it:

1. Download and unzip the script

2. Open it in the Applescript Editor and edit the two configuration variables (the name of the group you want to put all the source folders and note files in, and the location of your Zotero database)

3. Put a copy of the saved script into your DEVONthink scripts folder and run it every time you want to import all your Zotero entries or, thereafter, check if there are new sources to be added.

4. Please read my notes in the script for more details on how it works and some things to keep in mind.

Note to developers who can do better:

My script accomplishes its task by using sqlite3 shell commands to directly query the Zotero database. While this is read only and seems to work fine, this is not the most graceful way of going about this.
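For readers curious what a read-only query of this kind might look like, here is a rough sketch using Python's built-in sqlite3 module instead of shell commands. The table and column names (items, itemData, itemDataValues, fields, itemTags, tags) are assumptions about the Zotero schema and should be checked against your own database before relying on them; the function names are made up for the example.

```python
import sqlite3
from pathlib import Path

# ASSUMPTION: these table and column names reflect the Zotero schema as I
# understand it; verify them against your own database.
TITLE_AND_TAGS_SQL = """
SELECT items.itemID, itemDataValues.value, tags.name
FROM items
JOIN itemData ON itemData.itemID = items.itemID
JOIN fields ON fields.fieldID = itemData.fieldID
JOIN itemDataValues ON itemDataValues.valueID = itemData.valueID
LEFT JOIN itemTags ON itemTags.itemID = items.itemID
LEFT JOIN tags ON tags.tagID = itemTags.tagID
WHERE fields.fieldName = 'title'
ORDER BY items.itemID, tags.name
"""


def open_read_only(db_path):
    """Open the database so that SQLite refuses any write."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def titles_and_tags(conn):
    """Return {itemID: {"title": ..., "tags": [...]}} for every titled item."""
    sources = {}
    for item_id, title, tag in conn.execute(TITLE_AND_TAGS_SQL):
        entry = sources.setdefault(item_id, {"title": title, "tags": []})
        if tag is not None:  # items without tags come back with a NULL tag
            entry["tags"].append(tag)
    return sources
```

Opening the file with mode=ro is a guard against accidental writes; it does not make direct queries any more graceful than the approach described above.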

Zotero has an open API and a developer who is less of an amateur than I could probably think of some effective way of using applescript and perhaps a combination of something else like a Firefox plugin to talk to this API directly in Firefox and get information out of the database that way, which is recommended by the Zotero team. Read more here, here, and here.

Here are some other possible future improvements that I think could be made to this script, or to some combination of it and a Firefox plugin, if someone has the time to work on them. I unfortunately don’t have any time to work on any of these. Please let me know if you add such features and I’ll post updates to the script:

1. If Zotero has any attachments, such as PDFs of the sources or snapshots for the webpages, the script could be improved so that these could be imported along with the entry.

2. It would be great if a formatted bibliographic entry for the source was added to the main note file created in DEVONthink. Currently this must be done by hand by dragging and dropping the citation into the note file.

3. Any notes already in the Zotero entry for a source should be added to the main note file created in DEVONthink.

4. Ideally, the script would recreate the collections structure found in Zotero within DEVONthink.

5. Ideally, the script would check to see if any tags have been added or deleted from Zotero and not only add tags on the first import of the source.

6. Ideally, the script would keep track of itemID info for each entry and use that to judge whether an entry has already been imported into DEVONthink. That way the user can shorten or edit the folder titles etc. in DEVONthink without the script re-importing that entry because it doesn’t find an exact match by title.

7. Ideally, the script or a script and a plugin would somehow eliminate the need to be run – that somehow every time I added a new source to Zotero, DEVONthink would automatically get updated.
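Improvement 6 in the list above could be approached by keeping a small ledger of itemIDs that have already been imported. The following is only a sketch in Python under stated assumptions: the ledger is a JSON file whose location you choose, and the function names are hypothetical; none of this exists in the script itself.

```python
import json
from pathlib import Path


def load_imported_ids(ledger_path):
    """Read the set of itemIDs imported on earlier runs (empty on first run)."""
    path = Path(ledger_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def select_new_items(items, imported_ids):
    """items maps itemID -> title; keep only the entries not yet imported."""
    return {item_id: title for item_id, title in items.items()
            if item_id not in imported_ids}


def save_imported_ids(ledger_path, imported_ids):
    """Write the ledger back out so the next run can skip these itemIDs."""
    Path(ledger_path).write_text(json.dumps(sorted(imported_ids)), encoding="utf-8")
```

Since the decision is made on itemID rather than title, a group renamed in DEVONthink would no longer trigger a second import.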


-I got rid of a “display dialog” command left over from my last minute debugging
-I added a check to see if Firefox is running, which gives the user the opportunity to quit Firefox; otherwise the script cannot run, since the Zotero database is locked.


-The script now puts all the sources in a sub group “_All” (option to change the name) and then recreates the collection group hierarchy you had in Zotero. (Note: if you move or change the hierarchy of folders in Zotero and run the script again later, it will not delete the old groups or move them)

-I added a few more configurations: an option to control whether you are warned that Firefox will be quit, and an option to add the author name to the name of the group/note file created

-Firefox is automatically restarted at the end of the script
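As an illustration of what recreating a collection hierarchy involves, here is a small Python sketch that turns parent/child records into nested group paths. The input format (collection ID mapped to a name and a parent ID) and the sample names are assumptions made for the example, not the layout the script reads.

```python
def collection_paths(collections):
    """Map each collection ID to a slash-separated path such as 'Parent/Child'.

    collections maps collection ID -> (name, parent ID or None).
    """
    paths = {}

    def resolve(collection_id, visiting):
        if collection_id in paths:
            return paths[collection_id]
        if collection_id in visiting:
            raise ValueError(f"collection {collection_id} is its own ancestor")
        name, parent_id = collections[collection_id]
        if parent_id is None:
            path = name
        else:
            # Build the parent's path first, then append this collection.
            path = resolve(parent_id, visiting | {collection_id}) + "/" + name
        paths[collection_id] = path
        return path

    for collection_id in collections:
        resolve(collection_id, frozenset())
    return paths


# Hypothetical collections, for illustration only.
example = {1: ("Dissertation", None), 2: ("Chapter 1", 1), 3: ("Teaching", None)}
print(collection_paths(example))
```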


-The script now supports logging. Every new group and file created will log an entry in a log file. This can be turned off and the location of the log file can be customized in the configuration section of the script.


There are some problems with the way it handles unusual titles of books and although the script works fine for me, I have seen some comments saying others are having trouble. Please post your comments on the script at the DEVONthink thread for the script where I hope we can get some help from more skilled scripters who might have time to work more on this:

DEVONthink and Zotero – DEVONthink Forum

Revisiting the Note Taking Problem with DEVONthink

Though I continue to enjoy using the excellent software Scrivener to compose my dissertation, I am still unhappy with my note taking strategies and how I collect and organize this information digitally. After writing several postings on what I wish existed in terms of a software solution for doing research for a book or dissertation (1,2,3) and writing a little script to help improve the imperfect solution I have been using, I still find myself frustrated.

To summarize what I wish I had again in terms of a knowledge database:

As I make a note on a source, e.g. recording a single fact, fragment of information, observation, or summary of an idea from a work, I want that piece of information to be taggable so that it can be easily found in the future when searching for that tag. I want to be able to add and tag many such notes quickly and efficiently, some of which are “under” others in the form of a hierarchical order, and which then inherit the tags of their parent notes so that I am saved a lot of repetitive tagging. Every single fragment or note must also contain some link, tag, or meta-data which indicates the source it came from (a book, article, archival document, interview, etc.) so that when I use that note in my dissertation or book, I can easily find the source it came from.1
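To make the wish list a little more concrete, here is a minimal Python sketch of such a note: it carries its source, has its own tags, and inherits the tags of any parent note. The class, field names, and sample text are all hypothetical; this is a way of stating the idea, not a description of any existing application.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Note:
    text: str
    source: str                      # every note remembers where it came from
    own_tags: set = field(default_factory=set)
    parent: Optional["Note"] = None  # notes can sit "under" other notes

    def tags(self):
        """Own tags plus everything inherited from the chain of parents."""
        inherited = self.parent.tags() if self.parent else set()
        return inherited | self.own_tags


def find_by_tag(notes, tag):
    """Return every note that carries the tag, directly or by inheritance."""
    return [note for note in notes if tag in note.tags()]


# Hypothetical notes, for illustration only.
overview = Note("General summary", "Example Book", {"land reform"})
detail = Note("A single fact", "Example Book", {"1946"}, parent=overview)
print(sorted(detail.tags()), detail.source)
```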

DEVONthink Pro

I am in the process of shifting my note taking to a powerful knowledge database program called DEVONthink Pro. I was impressed at how quickly and easily I could import all of my nearly one thousand OmniOutliner documents, which I can now preview, search, tag, and group within DEVONthink. I don’t just want to reproduce my existing source-based note structure. I want to experiment with using this application to get just a little closer to my dream knowledge database described above. How am I doing this?

In DEVONthink, I create a group (which is what DEVONthink calls folders) for Sources.

Add a Group for a New Source – Each time I take notes on a new source (a book, movie, archive document, etc.), I create a group for it within this Source folder with the title of the source.

Create and Tag an Overview Document for the Source – In this newly created group I create a new text document with the name of the source in which I give some general information about that source (an overall description or summary) and give it some general tags that well represent the whole source.

Because DEVONthink also adds a gray colored pseudo-tag with the name of the group to every member of that group, any notes that go into this source group will carry a pseudo-tag indicating what source they are from.

Add Notes Using Customized Template Script – After creating and tagging the overview document, every time I want to add a note from this source, I select the overview document and invoke a keyboard shortcut connected to a DEVONthink template I have called “Note On Source” (I’m using Ctrl-Cmd-M). This runs a hacked version of an existing template that comes with DEVONthink called “Annotation”, written by Eric Böhnisch-Volkmann and modified by Christian Grunenberg. In its modified form the new template script does the following:

a. A new note is created in the source’s group
b. The new note gets a link created by the template script which links it back to the overview document for the source (assuming it was selected when invoking the script).
c. The new note is then automatically tagged with whatever tags the overview document contained. I can then, of course, add further tags or delete any that may not be relevant to this particular fragment or note.

So what does this method accomplish?

Well, using this method, all my fragments, quotes, and notes from a particular source are together in their own folder, a typical default way of organizing one’s notes. However, every single note can also be found by searching for a particular combination of tags using DEVONthink’s various methods for looking up tagged items. Alternatively, one can create “Smart Groups” that include notes using certain tags. Every note also contains a link back to its source, both through a direct link in the document and through the pseudo-tag of its originating group. In short, one can find all notes related to certain tags without losing their source (or needing to input it manually in the note), and all notes related to a particular source. The default tagging of new notes on a source saves me a lot of typing, and I can just add any more specific tags relevant for that specific note.

Remaining Issues

Although I’m really impressed with the new 2.x version of the application, there are still a few things that I find less than ideal when working in DEVONthink. Some of these are no fault of the designers, but merely a result of the developers not having had the same specific goals that I have when they created the application.

1. Unlike Yojimbo or Evernote, DEVONthink supports hierarchical groups/folders. This is wonderful, and makes a lot of things possible. However, when selected, parent groups do not list the contents of their child groups. Thus if I have a group called “Sources” and a sub-group called “Movies” inside of which I have files or groups related to individual movies, clicking on Sources reveals only an empty folder/group in the standard three pane view (or in icon view, a list of the folders that are in it) instead of all files under it in the hierarchy. Of course, the Finder and other applications often work the same way, but it would be fantastic if there was an option to be able to “Go Deep” as one can when viewing folder contents in an application like Leap.

2. Although I think someone could further modify the script I hacked to make this work, the system as I have it now does not permit cascading notes: all notes are children of the original source, and there aren’t any child notes of notes on a source. Thus the benefits of the kind of hierarchy of bullet points one is used to seeing in a note file are lost.

3. Because almost everything that was originally a fragment in the form of a hierarchical bullet point in a single document (in a note taking app like OmniOutliner) is now a fragment file in a hierarchy of folders, much of the power of viewing all of the content of these various fragments together at once is lost. DEVONthink lists all notes as files with single-line names. Ideally my dream note-taking software wouldn’t even need names for the fragments (my hacked version of the script just names them the date plus the name of the source) and would merely directly display the contents of notes so they can be seen juxtaposed with whatever other notes are in the list.
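The “Go Deep” behaviour wished for in the first item above amounts to a recursive listing. As a rough analogy only, here is how that looks in Python for ordinary folders on disk; DEVONthink keeps its groups inside its own database, so this sketch assumes a plain folder tree and a made-up function name.

```python
from pathlib import Path


def go_deep(group_dir):
    """List every file anywhere under group_dir, as paths relative to it."""
    root = Path(group_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")   # rglob descends into all sub-folders
        if path.is_file()
    )
```

Called on a “Sources” folder, this would return the files inside “Movies” and every other sub-folder rather than an apparently empty list.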

Downloading and Using the “Note on Source” Template

Again, I didn’t write this from scratch, but modified an existing template that comes with DEVONthink Pro. To use it, follow the instructions above. To install it:

1. Download the Script: Note on Source
2. Unzip the script and double click on the _Note on Source___Cmd-Ctrl-M.templatescriptd file inside. DEVONthink Pro will ask you if you want to import or install the template. Choose “Install” and it should now be active with the Cmd-Ctrl-M shortcut or directly in the menu at Data->New from Template->Note on Source

  1. My more ambitious and detailed description of this (including the idea of a “smart outline” which would then become possible) can be found in summary form in this posting.

Using an iPhone 3GS to Scan Documents and Create PDFs

During my field research in Korea, Taiwan, and China I carried around a hefty camera with me to archives and libraries. On those fortunate occasions when I was allowed to use it, I snapped nice high-contrast “text mode” photos of everything from handwritten documents, mimeographed newspapers, and pages of books to thousands of microfilm reader screens zoomed in on a particular item. I also developed my own coding system to connect the numbers of the images in the digital camera to items in my notes in order to easily find the images again when I need them in my dissertation.

On other occasions I carried another smaller camera in my backpack for emergencies when I wanted to copy some pages out of books, but the pictures were often blurry. I recently discovered, however, that the camera on my iPhone 3GS is good enough to take decent pictures of books and documents if you have moderate indoor lighting.

The Pics Need Processing

To get optimal results, however, pictures of books and documents taken from an iPhone 3GS need to be processed: the contrast and brightness need to be turned way up, the size of the image can be significantly reduced (from about 1.1MB to 0.25MB each), and if you are making copies of an article or part of a book, ideally you want the result to be a PDF, not a folder full of pictures. Indeed, it is for this purpose I have logged dozens of hours standing in front of the various PDF scanners in the libraries here at Harvard that I wrote about here.

Processing these pictures is time consuming, and begs for a hack. iPhone applications like JotNot and PocketScan are a nice idea but I find them to be incredibly slow and awkward to use.

So I spent a few hours last night and came up with an inelegant but effective solution that, once set up, makes the whole process of getting iPhone pictures processed and into a readable PDF fast and painless. A real hacker would create a script that does all this for the user in a single step, and I would love to get my hands on such a script but in the meantime, in case there is someone out there who would find this useful, here is my current solution using OS X 10.6 and Adobe Photoshop CS3.


You only need to do these steps once to get your computer set up, but they are kind of convoluted. I’m sure someone out there has a more efficient method:

1. Create a folder somewhere easy to get to on your hard drive and call it “Convert”

2. Create a folder (in the same folder as Convert for example) and call it “Converted”

3. Open “Automator” in your Applications folder and create a new Automator workflow that looks like this:


Save this as a workflow that we can attach to the “Convert” folder as a folder action. In the top pop-up menu select “Other…” and choose the “Convert” folder, which will contain the iPhone photos you drop in to be converted into a PDF. The AppleScript will command Photoshop to do an action I have called “CreatePDF”, which will process the images one at a time (see below). The Automator workflow then grabs all the files that Photoshop has saved into a folder called “Converted” (which you should indicate) and creates a PDF from them. The final step cleans up the images in the Convert and Converted folders by deleting them. You can delete this step if you don’t want it to delete the images, but I usually drop in copies or exported images so I don’t need them once the PDF has been created. You can, if you like, download my Automator application version of this workflow here, modify it for your own use and folder locations, and save it as a workflow. Keep in mind you need to change the path on the “rm” commands to point to your Convert and Converted folders.

4. Now we need to open Photoshop in order to create two actions. You can see what my actions look like below and create your own version, or download mine here, import them into Photoshop and modify them for your own needs. In the picture below you can see that I have one action called PrepPDF which actually processes a single image by a) changing it from color to grayscale, b) increasing the brightness and contrast, c) reducing the size of the image, and d) saving the image as a JPEG and compressing it significantly. You may find that you want to process it in some different way. The second action, CreatePDF, runs Photoshop’s batch command, performing the PrepPDF action on every image it finds in the Convert folder and saving the resulting processed image in the Converted folder.


5. Finally, in the Finder, right click on the “Convert” folder and choose “Folder Actions Setup…” and attach the workflow you created in Automator.
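For readers who want to see what the per-image processing in step 4 amounts to, here is a toy Python sketch of the first three operations on raw pixel values. It is not what Photoshop does internally: the grayscale weights are the common luma weights, and the brightness and contrast numbers are arbitrary assumptions chosen for illustration.

```python
def to_gray(rgb):
    """Convert one (r, g, b) pixel to a single gray value from 0 to 255."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def adjust(gray, brightness=40, contrast=1.5):
    """Stretch values away from mid-gray, shift them brighter, and clamp."""
    value = (gray - 128) * contrast + 128 + brightness
    return max(0, min(255, round(value)))


def halve(rows):
    """Shrink a grid of gray values by averaging each 2x2 block of pixels.

    An odd final row or column is dropped.
    """
    return [
        [
            round((rows[y][x] + rows[y][x + 1] + rows[y + 1][x] + rows[y + 1][x + 1]) / 4)
            for x in range(0, len(rows[y]) - 1, 2)
        ]
        for y in range(0, len(rows) - 1, 2)
    ]
```

Halving both dimensions leaves a quarter of the pixels, which is the main reason the processed files come out so much smaller.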

Now things are set up and you will be able to convert your pictures to PDF whenever you like by following the steps in the next section. You won’t have to repeat the steps above.

If things don’t go right when setting up, make sure the files are all pointing to the right locations, the correct folders, and the correct names for the actions in Photoshop and the action set they are saved in.

Going from iPhone Pictures to Readable PDF

1. Take pictures of the document or book in decent lighting. Tap on the screen to focus if it is not focusing properly. It would be nice to put together a copy stand to hold the iPhone up while you take pictures, but I’m not that kind of hacker.

2. Import your pictures of the documents/books from your iPhone into iPhoto or, via Image Capture, into your computer somewhere. I don’t recommend importing the pictures from the iPhone directly into the “Convert” folder, as the copying process is slow and the script seems to speed ahead of the copying and ends up with incomplete PDFs.

3. Open Photoshop. The script should launch it, but I find it works better when it is already open.

4. With Photoshop open, drag and drop your images (or a copy of them, by holding down the option key) into the “Convert” folder. It will run the Automator workflow, which will run the Photoshop action CreatePDF, which will run PrepPDF on each picture found in the Convert folder and dump them into the Converted folder after processing them. When it is done, the Automator script will take those processed images in the Converted folder, create a PDF out of them, and delete all the images in both folders so they are clean and ready for the next job. The PDF will be found on the Desktop (this old Automator action seems to be broken in Snow Leopard and I can’t get it to save the PDF anywhere else).
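The incomplete-PDF problem mentioned in step 2 comes from processing starting before the files have finished copying. One common way around this kind of race, sketched here in Python, is to wait until every file's size stops changing before going on. This is not part of my workflow; the function name and default timings are assumptions for illustration.

```python
import os
import time


def wait_until_stable(paths, checks=3, interval=1.0, timeout=120.0):
    """Return True once all file sizes match the previous poll `checks` times in a row.

    Returns False if that has not happened before `timeout` seconds pass.
    Empty files are treated as still being copied.
    """
    deadline = time.monotonic() + timeout
    previous = None
    stable_polls = 0
    while time.monotonic() < deadline:
        sizes = [os.path.getsize(path) for path in paths]
        if sizes == previous and all(size > 0 for size in sizes):
            stable_polls += 1
            if stable_polls >= checks:
                return True
        else:
            stable_polls = 0  # something changed, so start counting again
        previous = sizes
        time.sleep(interval)
    return False
```

A folder action could call something like this on the dropped files first and only then hand them to Photoshop.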

With this I have been able to, even while standing in the stacks of my library, whip out my iPhone and, holding the book open, snap pictures of an interesting chapter etc. and process them quickly and easily into PDFs once I get home. Here is one short example of a PDF created from some pictures taken in the stacks with my iPhone.

Update: If you are converting a lot of pictures into a single PDF, the Applescript in the first command can time out. I added two lines to my workflow to increase the timeout from the default two minutes to 10 minutes:

tell application "Adobe Photoshop CS3"
  with timeout of 600 seconds
    do action "PrepSave" from "Default Actions"
  end timeout
end tell

iAnki for iPad Hack

I recently broke down and got an iPad. I use it mostly for reading PDFs on the run, watching movies, taking notes (with external bluetooth keyboard), and studying my daily flashcards.

After trying (and writing reviews of) many different flashcard programs over the years, and even designing some of my own many years ago, I have become a loyal daily user of an open source project called Anki (read my review here). It is, in my opinion, the best program around that uses “spaced repetition” or interval study to prompt you only to review information that you are on the verge of forgetting. It helps me keep up on vocabulary in various languages and even serves as a kind of daily “meditation of repetitive action” for me.
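For anyone unfamiliar with the idea, spaced repetition simply means that the gap before the next review grows when you remember a card and shrinks when you forget it. The toy Python sketch below illustrates that principle only; it is not Anki's actual scheduling algorithm, and the four answer grades and the multipliers are made-up assumptions.

```python
def next_interval(previous_days, answer):
    """Return the number of days until a card should be shown again.

    answer runs from 1 (forgot) to 4 (easy). The multipliers are arbitrary.
    """
    if answer not in (1, 2, 3, 4):
        raise ValueError("answer must be 1, 2, 3, or 4")
    if answer == 1:
        return 1  # forgotten cards come back tomorrow
    multiplier = {2: 1.2, 3: 2.5, 4: 3.5}[answer]
    # Always move at least one day further out than last time.
    return max(previous_days + 1, round(previous_days * multiplier))


print(next_interval(10, 3))
```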

I can use Anki on my iPhone/iPad through a browser based script called iAnki but there were some things about the layout of the iAnki plug-in which I didn’t think worked well for the big screen of the iPad, which is now my primary way of studying vocab decks when I’m out of the house.

I made some changes to the HTML in the plug-in that I think work better for me. These include:

1. Increasing the font sizes of several fields.
2. Removing the “Show Answer” button and making most of the screen function as a “Show Answer” button so you don’t need to reach and hit the button.
3. Moving the 1 and 3 buttons to the left edge where I can easily reach them while holding my iPad.
4. Moving the 2 and 4 buttons to the right edge where I can easily reach them.

For anyone out there also using iAnki on an iPad who wants to try my hack, here is what you do:

1. Download the hacked template here.

2. Unzip it and use it to replace the existing ianki.html file that is in the iAnki plugin folder. For example, on my Mac the old ianki.html file is in

~/Library/Application Support/Anki/plugins/ianki_ext/templates/

Replace that file with the new one you downloaded.

3. Open up Anki, launch the iAnki plug-in and install it on your iPad (you’ll need to install and bookmark it again if you had it installed already).

If you use Anki, please support Damien’s programming efforts in Japan with a donation and congratulate him on his recent marriage.

Time to Walk the Walk

I am deeply frustrated with the sometimes closed atmosphere in academic life. I feel a profound discomfort when I encounter students and scholars who are paranoid that their research ideas will be stolen, that their sources will be discovered and, shock and horror, will be used by someone else. I’m simply incapable of sympathizing with them. I don’t like it when scholars pass around papers with bold warnings commanding me, “Do not circulate,” and I’m even less happy when I have been given handouts at a presentation only to have the speaker collect them again following the talk as if I was looking over instructor comments on a graded final exam. I feel my stomach churn as, to give a recent example, a professor opens up a database file of archival information and, smiling mischievously to the audience, declares that this is his “secret” source.

Such is life, people say to me, or else quote me some snotty French equivalent. That is the reality of this harsh academic world we live in. Well, perhaps I’m suffering from an early onset of old-age grumpiness, but I just don’t want to play that game. I don’t care that I’m still a graduate student, that job committees will look over everything they can find by me in search of sub-standard material, or that publishing firms will want me to explain why an earlier version of something I have submitted to them is available for download somewhere online. I don’t care if someone else finds some topic I have done some preliminary work on interesting, runs with it, and ends up publishing something on it. I may feel a momentary pang of regret that I didn’t get my own butt in gear and finish the project myself, but if they did a good job, then I really have no cause for complaint.

I’ve decided to just go ahead and start posting everything I produce academically, including short conference presentations and other research works in progress. You can find this material on a new research page here at Muninn.

OmniOutliner AppleScript to Append a Note to Selected Rows

In the last of three postings I wrote on note taking for the dissertation about a year ago, I proposed a kind of note taking software that would allow researchers to bridge the huge gap between our notes on individual sources and the work we do when outlining and structuring large writing projects.

I argued that one of the key elements of this new note taking software would be that every bullet point in one’s notes carried within itself information about where it came from: what its source was.1 This would allow a graduate student like me who is writing a dissertation, or a scholar who is writing a book, to drag and drop individual bullet points of notes made on a given source, such as an archival document, into a broader outline of a chapter or dissertation without having to make an additional note as to which source that bullet point came from.

This is particularly useful if, like me, you have dozens of pages of notes, taken from hundreds of archival documents, books, articles, etc. but you want to extract individual points from these sources and compile them into a larger outline as you plan the writing process.

While the tooth fairy never produced such software for me, I have created a little AppleScript hack for my favorite note taking software, OmniOutliner, that gets me closer to what I want. Simply put, the AppleScript assigns some text to the “note” field of a group of selected bullet points in an OmniOutliner document. Thus, if those bullet points are dragged and dropped into another document, they will always carry with them whatever source information you assigned to them.

I created two AppleScripts: One which takes whatever text is in the clipboard and sets the “note” of each selected row in your notes to that text. The second script ignores the clipboard and asks you directly in a dialog box what text you wish to put into the “note” of each selected row. You may download these two scripts from my Huginn script collection:

Set Note to Source

I don’t get it, why would you want to do this? Watch my screencast where I explain what I’m trying to do. If the YouTube video below doesn’t appear, you can visit it directly on YouTube through this link.

(Sorry about the poor sound quality)

How do you get this to work?

1. Download the script.
2. Place the “Set Note to Source in Clipboard” script in one of your script folders (either the one for OmniOutliner or [your home folder]/Library/Scripts/).
3. Open OmniOutliner
4. Select some text that corresponds to the source of some notes that you took, and Copy it via the edit menu (or Command-C).
5. Select one or more rows that you wish to assign this source to.
6. Choose the “Set Note to Source in Clipboard” script from the script menu. If your script menu is not visible in the menubar, turn it on.
7. If you find yourself doing this often, consider adding a keyboard shortcut for the script, using triggers in Quicksilver, FastScripts, or the like.

  1. I also suggested adding the ability to easily add tags to a bullet point and those bullet points under it.

Organizing Information for Dissertation Writing – Part 2 of 3

In the first of three postings on this topic I explained that I have become increasingly concerned that there exists a vast and empty middle layer of organization between the various primary sources, notes, and ‘notes on notes’ I have on the one hand, and my dissertation outline on the other. I have felt the need to develop some way, while I’m still out here in the field conducting my research, of better tying up the many individual fragments of information I find in the sources with the arguments I want to make in the written dissertation.

I’d be very interested in hearing about how other graduate students have sought to resolve the problem of connecting the large quantity of notes, outlines, and unprocessed raw sources with the grand outline of a huge writing project like a dissertation. Below I describe briefly how I have essentially integrated this process into my own task management routine.

First, let me describe how I have been organizing the historical materials I have been collecting in the field and while back at university. Read on for the details.

The Workshop Wire

In addition to my Muninn blog, I occasionally contribute to the various Frog in a Well blogs, the East Asia Libraries and Archives wiki, and some programming weblogs at Fool’s Workshop, and I create various scripts and other small projects. I have a Friendfeed profile but I don’t really like leaving this sort of thing to third-party sites that come and go with the fads. I’ve decided to keep track of most of my own online projects with a little site and feed called the Workshop Wire.

Workshop Wire (The RSS Feed for it is here)

Script for Creating a Chinese Vocab List

The Problem: Let us say you have a list of Chinese words or single Chinese characters in a file. There are a lot of them. You want some easy and fast way of getting the pinyin and English definitions of that list of words or single characters and you want this in a format that can be easily imported into a flashcard program so you can practice these words.

Today I faced this kind of problem. There are lots of “annotator” websites online that make use of the free CEDICT Chinese dictionary but I have yet to find one which outputs a simple and nicely formatted (with all the […] and /…/ stuff removed) tab-delimited vocab list.

I have recently been frustrated by the fact that I often come across Chinese characters that I haven’t learned, or, more often, characters that I only know how to pronounce in Japanese or Korean. I am also frustrated by the fact that I have forgotten the tones for a lot of characters I knew well many years ago when I studied Chinese formally.

Over the summer I want to review or learn the 3500 most frequently used Chinese characters, particularly their pronunciation, so that I can improve my tones and more quickly look up compounds I don’t know.1

I found a few frequency lists online (see here and here for example) and I stripped out the data I didn’t need to create a list with nothing but one character on each line.2 Although it is an older list based on a huge set of Usenet postings from ’93-’94, you can download an already converted list of 3500 characters here.3
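The stripping step could be sketched in Ruby roughly as follows. This is only an illustration: the column layout of the frequency lists is an assumption (one character per line, mixed in with ranks and counts), the sample rows are made up, and the method name extract_characters is hypothetical.

```ruby
# Sketch only: assumes each line of the frequency list contains one
# Chinese character somewhere among other columns (rank, counts, etc.).

# Returns the first Han character found on each line, skipping lines
# that contain none and dropping repeats while keeping the original order.
def extract_characters(lines)
  lines.map { |line| line[/\p{Han}/] }.compact.uniq
end

# Made-up sample rows, not taken from any real frequency list.
sample = [
  "1\t的\t100",
  "# a header line with no hanzi",
  "2\t一\t90",
  "3\t是\t80"
]

# Prints one character per line, ready to save as a UTF-8 text file.
puts extract_characters(sample)
```

To run this on a real list, the lines could be read with File.readlines and the result written back out with File.write.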

Since I’m not in the mood to look up 3500 characters one by one, I spent a few hours this evening using this problem as an excuse to write my second script in the Ruby programming language.

On the remote chance that others using Mac OS X find it useful, you can download the result of my tinkering here:

Cedict Vocabulary List Generator 1.1

This download includes the 2007.8 version of CEDICT, the latest I could find here.4

How this script works:

1. After unzipping the download, launch the “” AppleScript application. It will ask you to identify the file you want to annotate. It is looking for a text file (not a Word or rich text file) in Unicode (UTF-8) format with either simplified or traditional Chinese characters or word compounds, one on each line.

2. This application will then send this information to the convert.rb Ruby script, which will search for the words in the CEDICT dictionary in the same folder and format the information it finds (the hanzi, pinyin, and English definition). Multiple hits for the same character or word are put within the same entry, with the definitions numbered. It does not currently add the alternate form of the hanzi (it won’t add the simplified version to traditional or vice versa).

3. It will then produce a new file with the word “converted” added to its name. It will create tab-delimited files by default but you can change the delimiter by editing the option at the top of the convert.rb file in a text editor.

4. Though this version of the script doesn’t do this yet, you may want to run the resulting text through the Pinyin Tone dashboard widget or a similar online tool such as the one here or here. That will get rid of the syllable-final tone numbers and add the appropriate tone marks. I am having a bit of trouble converting the JavaScript that my widget and this site use into Ruby so if anyone is interested in working on this let me know!
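For anyone curious what the lookup in step 2 involves, here is a minimal Ruby sketch. It is not the code in convert.rb; the names CEDICT_LINE, load_cedict and vocab_row are hypothetical, and the way multiple hits are merged is only one plausible reading of the description above. It relies on the standard CEDICT line layout: traditional form, simplified form, pinyin in square brackets, then definitions between slashes.

```ruby
# Sketch only; hypothetical names, not taken from convert.rb.
# A CEDICT line looks like: 學 学 [xue2] /to learn/to study/
CEDICT_LINE = %r{\A(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*\z}

# Builds a lookup table from headword (traditional or simplified)
# to a list of [pinyin, definitions] pairs.
def load_cedict(lines)
  index = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    next if line.start_with?("#") # comment lines in the dictionary file
    match = CEDICT_LINE.match(line.chomp)
    next unless match
    traditional, simplified, pinyin, defs = match.captures
    entry = [pinyin, defs.split("/").join("; ")]
    # Register the entry once per distinct written form.
    [traditional, simplified].uniq.each { |word| index[word] << entry }
  end
  index
end

# Formats one word as hanzi, pinyin, and definitions joined by the
# delimiter. Multiple dictionary hits share one row, with numbered
# definitions. Unknown words get empty pinyin and definition fields.
def vocab_row(word, index, delimiter = "\t")
  hits = index.fetch(word, [])
  return [word, "", ""].join(delimiter) if hits.empty?
  if hits.size == 1
    pinyin, defs = hits.first
  else
    pinyin = hits.map(&:first).uniq.join(", ")
    defs = hits.each_with_index.map { |(_, d), i| "#{i + 1}. #{d}" }.join(" ")
  end
  [word, pinyin, defs].join(delimiter)
end

# Abbreviated, illustrative dictionary lines.
dictionary = load_cedict([
  "# CEDICT-style sample",
  "學 学 [xue2] /to learn/to study/",
  "中 中 [zhong1] /within/among/",
  "中 中 [zhong4] /to hit (the mark)/"
])

%w[学 中].each { |word| puts vocab_row(word, dictionary) }
```

Keeping the delimiter as a parameter mirrors the option described in step 3, so a comma or other separator could be swapped in for flashcard programs that expect one.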

If the script doesn’t work: make sure you are saving your text file as UTF-8 before you convert. I am also having trouble when my script is placed somewhere on a hard disk where the path has lots of spaces. Try putting the script folder on your Desktop.

Note: If you don’t have Mac OS X but can run Ruby scripts on your operating system, you may be able to run my script convert.rb from the command line. It takes this format:

convert.rb /path/to/file.txt /path/to/cedict.u8

UPDATE 1.1: The script now replaces “u:” with “ü” (CEDICT uses u:).
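The “u:” replacement is also the first step of the tone mark conversion mentioned in step 4. What follows is a rough Ruby sketch of that conversion, not something included in version 1.1 of the script. The method names are hypothetical, and it uses the usual placement rule: the mark goes on a or e if present, on the o in ou, and otherwise on the last vowel. Tone 5 (neutral) simply loses its number. Uppercase vowels assume Ruby 2.4 or later, where upcase handles non-ASCII letters.

```ruby
# Sketch only; hypothetical names, not part of convert.rb.
TONE_MARKS = {
  "a" => "āáǎà", "e" => "ēéěè", "i" => "īíǐì",
  "o" => "ōóǒò", "u" => "ūúǔù", "ü" => "ǖǘǚǜ"
}.freeze

# Returns the position of the vowel that should carry the tone mark,
# or nil if the syllable has no vowel (for example "m" or "ng").
def tone_vowel_index(syllable)
  lower = syllable.downcase
  return lower.index("a") if lower.include?("a")
  return lower.index("e") if lower.include?("e")
  return lower.index("ou") if lower.include?("ou")
  lower.rindex(/[iouü]/)
end

# Converts one syllable given its letters and tone number,
# e.g. ("lu:", 4) or ("hao", 3).
def mark_syllable(letters, tone)
  letters = letters.gsub("u:", "ü").gsub("U:", "Ü")
  return letters unless (1..4).cover?(tone)
  pos = tone_vowel_index(letters)
  return letters if pos.nil?
  vowel = letters[pos]
  marked = TONE_MARKS.fetch(vowel.downcase)[tone - 1]
  marked = marked.upcase if vowel != vowel.downcase
  letters[0...pos] + marked + letters[(pos + 1)..-1]
end

# Replaces every numbered syllable in a string of pinyin.
# Any run of Latin letters followed by a digit 1-5 is treated as a
# syllable, so this should only be applied to the pinyin column.
def add_tone_marks(text)
  text.gsub(/((?:[uU]:|[A-Za-züÜ])+)([1-5])/) { mark_syllable($1, $2.to_i) }
end

puts add_tone_marks("ni3 hao3")
puts add_tone_marks("lu:4 se4")
```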

  1. The top 3000 make up some 98-99% when their cumulative frequency is considered.
  2. A few of the frequency lists I have seen have Cedict dictionary data included but not in a very clean format.
  3. I notice that there is a high frequency of phonetic hanzi for expressing emotion in the postings and some other characters one doesn’t come across as often in more formal texts, but I actually don’t mind.
  4. If you find a newer version (in UTF-8) put it in the same directory as my script and name it cedict.u8.