Tech | Muninn | Page 2

Chinese Input Method: QIM

Apple’s Macintosh operating system and the Chinese language have a long history. Many years ago, when I was an undergraduate college student, well before the advent of Mac OS X and the rise of Unicode, I was already happily inputting Chinese on my Mac and delighted in amazing my friends with the Apple Chinese voice recognition software I had gotten soon after its release in 1996. Meanwhile, PC users I knew across campus and the world were drowning in the technical challenges of mysterious programs such as Twinbridge and its earlier and more obscure competitors. I know from my own experience as a former tech support geek at Columbia University that the legacies of these issues continue to haunt Chinese language departments around the US.

With Windows XP, however, Microsoft finally started getting its act together and created a typically clunky but still relatively easy method (with about a dozen clicks plus the use of your OS CD) for adding Chinese input to a non-Chinese OS. Since then I have felt that the Mac Chinese input options have lagged behind, especially in the convenience of inputting traditional characters (繁體/繁体). The “Hanin” input method was something of an improvement, but with tens of millions of customers in China using pirated copies of Windows XP and only a handful using the more expensive Macintosh solutions for their computing, it is not surprising that Apple has lost its innovation edge in the area of Chinese input.

Well, I have apparently been somewhat out of the loop since mid-2006. Today I took a few minutes to skim through a year or two of the postings on the Google Group “Chinese Mac.” Thanks to this I was able to learn about a fantastic new piece of software for the Mac:

QIM Input Method ($20)

You can read a bit more about the software on the internet’s premier resource for (English language) information about inputting Chinese on the Mac.

I would recommend that anyone who inputs Chinese frequently on the Mac try out QIM, which is fantastic. I dished out the $20 within 10 minutes of confirming that the software works in all the basic work applications I frequently use Chinese in (OmniOutliner, Microsoft Word, Wenlin, Apple Mail, iFlash). QIM produces characters in real time as you type, has amazing shortcut options, and optionally defaults all output to traditional characters.

PDF Scanner – A Researcher’s Lifesaver

During the past year or so, and especially in the last few weeks, I have come to owe a great deal of thanks to a machine I call a PDF scanner, since I don’t know what it is normally called.

The scanner looks like a photocopy machine with a computer screen attached to it. Like a regular photocopy machine you can use the glass or the feeder on top to copy documents and books at the same speed as you might expect from such a machine. However, instead of charging you money and outputting these copies on regular paper, the result of the free scan is displayed as thumbnails on the screen to the right. When you are happy with the resulting scans, you may save them together as a PDF (or as separate image files) and have the file sent to a USB drive or to a server of your choice via FTP.

The machine can be set to a number of resolutions (200 dpi and up) and scans in black and white, grayscale, or color. You may also indicate the paper size of the scanned image. If you are using the feeder tray, you may scan either single or double-sided documents. The model I have used on campus does not have shrink or enlargement features available and lacks some of the other advanced features we are used to dealing with on a regular photocopy machine. However, if you are scanning English language documents, there is one wonderful extra feature: Putting a check next to “Hidden Text Layer” will direct the machine to OCR the scanned pages of text and make the PDF documents searchable. The accuracy is far from perfect, but more than good enough to make those usually dead images great for keyword searching.

This machine, and in one case a different variation of it, can now be found at several library locations throughout the Harvard campus. Competition for its use is heavy in some libraries, especially those where visiting researchers are desperate to copy materials before they return home and want to avoid the costs of large amounts of photocopying and the weight of carrying these copies back.

The advantages of this machine are huge:

1. Use of the machine is completely free (at least on our campus). This has probably saved me hundreds of dollars in the past year and a half or so.
2. Except when scanning poor quality documents or large amounts of double-sided documents using the feeder tray, there are far fewer jams and other problems than arise when using a photocopy machine.
3. There is no wasting of paper or ink. No paper also means no lugging around heavy photocopies.
4. The scans are very fast: the machine surpasses the speed of all but the most expensive personal scanners and is much faster than most document feeding trays I have seen.
5. The scanner’s glass is much larger than all but the most expensive personal scanners and can thus easily handle very large books.
6. The OCR text recognition provides no opportunity for correcting mistakes but is transparently built into the scanning process. You never actually see it happen. It adds only a short time to the final saving of the document as it is transferred to the USB drive. This dramatically reduces the time the OCR process would take if you were to do it after scanning documents on a personal scanner with something like OmniPage Pro or using Adobe Acrobat Professional or other tools.
7. Easy OCR means searchable PDFs which means faster research through your own scanned materials.

Potential general complaints from the perspective of librarians and researchers:

1. The product is a scan – which you view on a screen. This is less fun to read than on paper and less convenient to annotate and scribble on.
2. Free and fast copying means that violating copyrights in the library is now free and fast too. Since the products are PDF files, rather than a single hard copy, it is easier than ever to distribute these PDFs in ways that violate copyrights.

What have I found this useful for?

1. I digitized the entire journal Sino-Japanese Studies, which is now hosted online. I have been wanting to do this project with Josh Fogel for a long time, and only with the introduction of these PDF scanners around campus has it become manageable on a limited budget of time.

2. I have boxes upon boxes of photocopies that I have made throughout the years. Dragging them around is a pain. The PDF scanner has allowed me to eliminate several boxes of paper (I simply haven’t had the time to go through them all, and I want to keep some highlighted materials and materials that don’t scan well). These documents are now all on my computer, and backed up on other media.

3. I often take handouts from presentations, various mail and personal documents, and scan them up quickly using the document feeder.

4. Any books I might need to have as reference in the field but which I don’t want to bring with me in my baggage, I simply scan before I go. It takes me about 30 minutes to scan a 300-page book, or about ten pages per minute. It takes another 2 minutes to save the book if you choose black and white at 200 dpi. This means that many of my favorite history books in my field are not only on my computer, but those in English are easily searchable, thanks to the OCR feature included on the machine. I can then leave the original book in storage while I travel around in East Asia. When you are sitting in an archive or on a train in the middle of nowhere, without any internet connection or access to Google Books and other search engines – there is nothing like being able to search through a lot of locally stored data on one’s own machine.
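The arithmetic behind those figures can be written out as a rough estimator. This is only a sketch: the function name is made up for illustration, the defaults are the numbers quoted above (ten pages per minute, about 2 minutes to save 300 pages in black and white at 200 dpi), and the idea that saving time grows in proportion to page count is an assumption rather than something measured.

```python
def estimate_scan_minutes(pages, pages_per_minute=10.0, save_minutes_per_300=2.0):
    """Rough total time, in minutes, to scan and then save a book.

    Assumption: saving time scales linearly with page count, calibrated
    to the quoted figure of about 2 minutes for a 300-page book.
    """
    if pages < 0 or pages_per_minute <= 0:
        raise ValueError("pages must be >= 0 and pages_per_minute must be > 0")
    scan_minutes = pages / pages_per_minute
    save_minutes = pages * save_minutes_per_300 / 300
    return scan_minutes + save_minutes


if __name__ == "__main__":
    # A 300-page book: 30 minutes of scanning plus 2 minutes of saving.
    print(estimate_scan_minutes(300))
```

Higher resolutions or color scans would presumably need a larger saving figure, which can be passed in through `save_minutes_per_300`.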

Wish List for the Future

1. As more and more people around the Harvard campus discover the power of these machines to reduce paper and produce OCRed PDF files of everything from our personal papers, I have watched as competition for their use has exploded – especially for the PDF scanner in the Harvard-Yenching Library. I hope that the librarians come to see that the advantages outweigh the disadvantages and add more machines to the collection. I would also love to see PDF scanners in libraries and especially archives around the world. The National Archives, for example, is perfectly happy to have me click away with my personal camera at thousands and thousands of pages of articles but still charges considerable photocopying fees. If the archives had a PDF scanner (perhaps the alternative kind found in the Phillips Reading Room of Harvard’s Widener Library, which is face-up rather than face-down and thus less damaging to books) they could seriously cut down on machine maintenance costs while providing an incredibly valuable service to researchers.

Obviously the question of copyright needs to be addressed – but the solution is not to cripple the gains from technology advances that improve on existing tools that perform the same essential task: the paper-based photocopier, the slower personal scanner, and the camera, all of which we have had for years.

2. I would love to see these machines support OCR in many more languages.

3. It would be nice for there to be some kind of semi-automated “submission” or “registration” system for scanned materials so that eventually you can reduce the physical burden on the scanned materials in libraries and archives. If certain pages, articles, or archival documents have been scanned before, and are found in the system, then you could simply retrieve this previously scanned document and thereby contribute to the preservation of the original by not subjecting it to further copying.

4. I would like these machines to have more options than their current software provides, such as enlarge/shrink options, crop features, auto-crop features, more media size options, much better color scans of glossy photographs, etc.

Honorable Mention

Another similar machine to which I also owe a lot recently is the Microfilm PDF scanner. A number of my recent postings at Frog in a Well and contributions to the Frog in a Well Library refer to documents that I found on microfilms. The documents I have been uploading are PDFs directly created by the PDF scanning software on the computers attached to the microfilm reading machines that I use in the Government Documents section in the basement of Harvard’s Lamont Library. It works very much like the microfilm printers we have seen in libraries for years, but this time the product is a PDF rather than paper copies. Like the regular PDF scanner above, all these scans are free and allow me to easily share my findings with others.

Hack: Griffin AirClick USB for use with iFlash

I got a cheap used Griffin AirClick for USB to control my older laptop Macintosh by remote control. Another remote I like better (KeyPOINT) has been acting up so I got the Griffin as a replacement. The downside with Griffin is that it has fewer buttons, no mouse control, and a limited set of applications that it works with. One of the applications that I want to use the remote with is the best flashcard program on the Macintosh, iFlash. I use this almost every day to practice Korean vocab and other languages. Since this is not one of the supported applications, this afternoon I hacked the AirClick.app program that comes with the remote to add support for iFlash. You may download my modified version of the AirClick application here.

For those who wish to add support for their own program I briefly outline how I did the hack below:
Continue reading Hack: Griffin AirClick USB for use with iFlash

Microsoft Book Search

Microsoft’s new book search site Live Search Books is, well, live. It doesn’t work in the Safari browser, but it works fine in Firefox. I have only played around with it a bit, but I can already say that for those interested in doing historical research, the new Microsoft book search offers two major advantages over Google book search, despite the fact that the former only provides search results for books out of copyright (mostly before 1923).

I have both lauded Google’s book search and complained about its severe flaws in an earlier posting here at Muninn and also in a Frog in a Well posting. My two biggest complaints at this time are:

1) Not all books which are clearly out of copyright are fully viewable on Google book search. Sometimes only a partial view, or “snippet view,” is available.
2) Though there is the wonderful feature of PDF download on Google book search for books that they recognize as out of copyright, once you download the book, you cannot search the document within your PDF viewer because Google does not supply the text layer for these documents.

Microsoft’s book search has neither of these problems as far as I can tell:

1) All of the books I have clicked on can be downloaded as a full PDF.
2) The PDFs I have downloaded are fully searchable on the text layer.

This is wonderful news and I hope Google Books will respond accordingly. Microsoft book search still seems a bit rough around the edges and doesn’t have the nice new smooth scroll view that Google Books recently added, but I am very happy to see that there are two competing services in this area. I hope the Microsoft search will continue to add books, and also, hopefully, consider adding materials out of copyright for later periods when this can be determined.

Create Yojimbo Note: Applescript for Apple Mail

My last few postings have all been tech related. Since most of my history related postings are also Asian history related postings, I post those increasingly at Frog in a Well. I’ve been in a hacking mood in the last few weeks, especially since I’ve been working on some new projects at Frog in a Well in my spare time. My last two scripts are essentially shortcuts for creating iCal To-Dos. The first provides a fast way to use Quicksilver to create To-do items without leaving whatever application you are working in. The second provides a way for you to email yourself To-dos that then automatically tell iCal to create a To-Do with content from the email.

The script below is not related to To-Dos but also aims to improve my organization a bit. I use a program called Yojimbo (it is a great program but I think the title and its Karate Kid icon are both cheesy) made by Bare Bones, the same people who make BBEdit, the essential text editor for web programmers and coders of all stripes. Yojimbo is a program which keeps snippets of notes, web passwords, and serial numbers. Best of all, you can drag and drop any web page you are viewing onto a little tab it keeps at the side of the screen and Yojimbo will download a static copy of the web page and store it along with all your other notes. Great for when you think you might be away from the internet and want to keep the content handy for later searching.

Sometimes I’m away from my laptop, and as in the case of emailing myself To-Dos that I want created automatically in iCal, there are times I jot down notes from something and want to email myself those notes. It would then be nice for Mail to automatically detect those emails, and create a new Note in Yojimbo with the appropriate title and contents. I hacked together a little script which does just that.
Continue reading Create Yojimbo Note: Applescript for Apple Mail

Create iCal To-do: Applescript for Apple Mail

In my last posting I shared a script I put together to use with the shortcut application Quicksilver to quickly create To-do items in iCal. I am now using this all the time. I have separate scripts, all based on the same one: a script that creates a medium priority “READ: ” to-do for things I want to put on my reading to-do list, a medium priority “LOOKUP: ” script to remind myself of things I want to look up at some point (assigned to a different color calendar), and a high priority to-do script with its own iCal calendar.

Now, what if I’m away from home, am without my laptop computer, but have access to a nearby terminal with an internet connection, and I want to create to-dos for myself? I sync iCal with my Palm Pilot so I could easily write down the to-do directly into my Palm. However, Palms are still not fun to use when you have a lot to write down. Today, I was in the Fung library reading some Chinese historical journals and wanted to remind myself to look up a few things I found there and come back and read some articles I didn’t have time for today. There were internet terminals all over the place so instead of entering the titles (in Chinese) on the Palm, which is a pain and time consuming, I sent myself some email messages with the subject “LOOKUP: ” or “READ: ” followed by what I wanted to look up or read. In the body of the email I put more information, such as a URL or notes to myself. I decided that tonight when I came home I would hack up an AppleScript that used Apple’s mail filter to automatically find those messages as they arrived in my inbox, create the appropriate to-do in iCal (like the Quicksilver script) with the subject line of the email, put the body of the email into the to-do’s notes, and then move the message into an archived folder. The results are below.
Continue reading Create iCal To-do: Applescript for Apple Mail
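The routing idea in this teaser (match a subject prefix, use the subject as the to-do summary and the body as its notes) can be sketched independently of Mail and iCal. What follows is not the AppleScript from the full post. It is a minimal Python illustration in which the `Todo` class, the `todo_from_email` function, and the calendar names in `RULES` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical routing table: subject prefix -> (calendar name, priority).
# The prefixes come from the post; the calendar names are invented here.
RULES = {
    "LOOKUP:": ("Lookup", "medium"),
    "READ:": ("Reading", "medium"),
}


@dataclass
class Todo:
    summary: str   # taken from the email subject line
    notes: str     # taken from the email body
    calendar: str
    priority: str


def todo_from_email(subject: str, body: str) -> Optional[Todo]:
    """Return a Todo if the subject starts with a known prefix, else None."""
    cleaned = subject.strip()
    for prefix, (calendar, priority) in RULES.items():
        if cleaned.startswith(prefix):
            return Todo(summary=cleaned, notes=body.strip(),
                        calendar=calendar, priority=priority)
    return None  # not a to-do message; leave it in the inbox


if __name__ == "__main__":
    print(todo_from_email("READ: article on treaty ports", "See the journal index."))
    print(todo_from_email("Lunch on Friday?", "Are you free?"))
```

A real mail rule would additionally move each matched message into an archive folder, a step left out of this sketch.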

Create iCal To-do: Applescript for Quicksilver

While waiting for a plane here at the airport I got bored and decided to make a little Applescript for Quicksilver. For those of you who don’t already use Quicksilver and use a Macintosh computer, I recommend it as a way of significantly increasing your productivity. It is hard to explain what it is but it acts like a launcher and provides easy ways to give shortcut commands for all sorts of tasks.

My simple goal was this: I want to be able to add a “To-Do” item to iCal very quickly. I don’t want to have to switch to iCal, press Command-K, then type in my summary (sometimes this is slow in iCal and the focus doesn’t always jump to the new item correctly, I find), and then choose “High Priority.”

Instead I: a) use my keyboard shortcut to activate Quicksilver, b) type “.” and the text for the To-do item, c) press “Tab” and start typing “todo” (the name of the script), and d) press Enter.

The number of steps may seem the same, but it is all done through the keyboard and iCal adds the to-do item in the background, so I have found it to be very handy. The script adds the to-do item to the “Home” calendar to keep it simple. I may one day expand the script so you can first type a number, corresponding to the number of days until the to-do’s “due date,” but I can’t be bothered.

You can download the script I wrote here. Full text of the script below:

UPDATE: There is a much more powerful script for those who want a more advanced and more comprehensive solution. See this Hawk Wings entry for an explanation of Benjamin Harley’s new iCal action.
Continue reading Create iCal To-do: Applescript for Quicksilver

Fighting the Korean Internet Again

Does anyone have an ID/password at the Korean newspaper Hankyoreh (or the means to easily create one for me) and might be willing to share their access with me? If so, I would be most grateful if I might use it to access their web page (you can email me via the contact link above).

I don’t know when this happened but I can’t view archived articles anymore without logging in (I assume it is behind a simple registration like NYT instead of paid access). However, as always, I have to fight the twin problems of the Korean internet: 1) I’m not Korean. 2) I don’t use Windows and Internet Explorer.

Being a foreigner and using a Macintosh is pretty much suicidal for internet use in Korea. I had to wait 1 full week for Naver.com to inspect my Norwegian passport photo (which they required me to upload) and make sure that it matches my registration info. I was thrilled that I could register at all as many places require me to have a Korean citizenship/residence number, but having to wait this long is ridiculous.

Now, yet again, I have to go through this horrible process with Hankyoreh, though I had hoped it wouldn’t take a full week for my registration to come through. Today ended in complete failure and frustration though. When I tried to register through the special “foreigner” registration page at Hankyoreh, and after choosing “Other European Country” (since Norway wasn’t important enough to get listed), I gave them my Norwegian passport picture for upload and was all ready to go. Then the second problem arose: horrible programming. For some reason, no matter what I put in as my birth year, either my real year of birth or any other number from 1 to 2006, it tells me that I haven’t entered my birth year. This is classic JavaScript validation gone bad.

I really hate it when lazy programmers do JavaScript validation or other web scripting and then only test it on Windows with Internet Explorer…

In this case, they slapped some crap together, as they often do, and wow – it worked in Internet Explorer on Windows – so that means it will work for everyone, right? I will happily spread the word that Korea is a place where the internet and technology are making great strides…as soon as web programmers and designers can master absolutely basic programming skills and create standards-compliant web sites. As with so many other websites I have struggled with around the world, this lack of quality on large scale commercial sites is really unacceptable. It might as well be 1995 all over again.

In this particular case, the JavaScript Console in Firefox shows more than 30 errors for the registration page…I’m lucky the year of birth was the only thing that didn’t work…
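One common safeguard against this kind of browser-specific breakage is to repeat the check on the server, so that registration never depends on how a particular browser runs a script. Here is a minimal sketch of such a check in Python. Nothing in it is taken from Hankyoreh’s site: the function name, the error messages, and the lower bound of 1900 are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional

MIN_BIRTH_YEAR = 1900  # assumed lower bound, chosen only for illustration


def parse_birth_year(raw: str, today: Optional[date] = None) -> int:
    """Validate a submitted birth year and return it as an int.

    Raises ValueError with a specific message instead of reporting every
    failure as "not entered".
    """
    current_year = (today or date.today()).year
    text = raw.strip()
    if not text:
        raise ValueError("birth year is missing")
    if not (text.isascii() and text.isdigit()):
        raise ValueError("birth year must be written in digits")
    year = int(text)
    if not MIN_BIRTH_YEAR <= year <= current_year:
        raise ValueError(
            f"birth year must be between {MIN_BIRTH_YEAR} and {current_year}"
        )
    return year


if __name__ == "__main__":
    print(parse_birth_year(" 1980 ", today=date(2006, 6, 1)))
```

Because each failure raises a distinct message, a visitor who did type a year would at least be told what the form disliked about it.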

Automatic Paper Generation

Some of you may have heard the news that some MIT students got their computer-generated computer science paper, consisting of grammatically correct mumbo jumbo, accepted as a non-reviewed paper at some conference.

You can create your own paper with their code here. My friend Jai and I were discussing how fun it would be to add to this code the ability to feed it a selection of essay texts, say a collection of essays by Bhabha or Spivak to take two examples that come to mind, so that the nouns and verbs it chooses are roughly in correspondence to the frequency with which they appear in work by these scholars. It would then be interesting to see if readers can actually tell the difference.

If someone ever gets around to adding this feature, I think all you would have to do is: 1) employ some kind of textual analysis algorithm that must be out there already (they are used, for example, in the processing of texts for speech recognition software or in linguistic studies doing word frequency analysis), 2) take the output of the analysis and reconstruct the file “system_names.in” which is in the scigen code package you can download from their site, and 3) add some kind of frequency control to the code to make sure the output uses nouns and verbs with a frequency similar to the originals.
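Steps 1 and 3 can be sketched in a few lines of Python. This is only an illustration of frequency counting and frequency-weighted selection: it makes no attempt to separate nouns from verbs, and it does not read or write scigen’s “system_names.in”, whose format is not described here. The function names are hypothetical.

```python
import random
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercase alphabetic word appears in text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def sample_words(freqs, k, rng=None):
    """Draw k words, each with probability proportional to its count."""
    rng = rng or random.Random()
    words = list(freqs)
    weights = [freqs[word] for word in words]
    return rng.choices(words, weights=weights, k=k)


if __name__ == "__main__":
    corpus = "hybridity and mimicry and hybridity"
    freqs = word_frequencies(corpus)
    print(freqs.most_common())
    print(sample_words(freqs, 5, random.Random(0)))
```

Over many draws, a word that appears twice as often in the source essays should turn up roughly twice as often in the generated text, which is the kind of frequency control step 3 asks for.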

Music Plasma

I played around a little with the Music Plasma website. Put in a favorite singer and get a spatial map of music “surrounding” that artist. It is fascinating to use, but more importantly, when combined with something like the iTunes music store for browsing the songs of various artists (30 second snippets of each song) it has been incredible in introducing me to new artists or ones I have simply never bothered to listen to before.