When Archive Digitization Goes Wrong

Last week I paid a visit to a wonderful archive in a medium-sized city in Shandong province, China. There I looked up various documents from the 1940s for my dissertation research that are a bit more local in scope than those I have been looking at in the Shandong Provincial Archives here in Jinan.

The archivists were incredibly friendly, and warned me in advance that they didn’t think they would have too much from the period I was looking at. After providing the letters of introduction that are required at most archives in China and having the way paved for me thanks to a phone call from a contact I made in Jinan, I was allowed to search for documents using their digital database. They even gave me a free lunch from their cafeteria on the first day and a free copy of a book they had published, containing documents from the wartime period, that I was interested in getting.

Unlike the provincial archives, this archive found its collection manageable enough to scan all the files, store digital copies, and make them available for viewing by visitors in place of the originals. Unfortunately, I was not given the option of looking at the originals instead. Also unlike the provincial archives, the online search of their database seems to return results from a much larger proportion of the materials that are found by searching for the same terms on their internal database.[1] They did not allow me to save any of the digital TIF image collections of individual documents onto a USB drive,[2] but I was allowed to print documents and, after their contents were checked over by the archivist,[3] to make off with these environmentally less friendly non-digital printouts.

Unfortunately, almost everything that could have been done wrong with this digitization program and its presentation to the visitor was. So let me list the issues as a warning to other archives, especially smaller ones, that might consider going the digital route. I have listed them from the least worrisome to the most serious:

1) Environment: The computer designated for viewing documents had a cheap monitor with little screen brightness (even when set to full) which faced a window where sunlight beamed into the room (even when I convinced them to partially lower the shades), providing a horrible viewing experience and harm to the eyes. An uncomfortable mini-mouse, a horrible chair, and a table with almost no spare room for visitors to put a notebook or their laptop made it a nightmare to spend any length of time looking at documents.

2) Software: The custom built database software had an advanced query system which is useful for advanced users and archivists but requires multiple stages to search and although I quickly got used to it, I think it would confuse users not used to such systems. Also, when it shows images of archive files, a lot of vertical screen space is wasted on software options and interface components, which leads to a great deal of scrolling at any zoom level that makes reading possible.

3) Page Numbers: At the archive in question I requested a lot of documents that were essentially local versions of other documents that I had seen before from other districts. Having seen many originals of this kind, I know most of them are on small, roughly A5-sized sheets of very thin paper that are held together with string. Despite the age of these documents, I have surprisingly never run into paging issues at the provincial archives, mostly because I’m seeing them still strung together. By contrast, pages were all over the place in these documents in their digital form. While it is possible they were already unstrung and out of order when the contractors got the documents, I suspect that they got mixed up through negligence when the originals were unstrung in order to be scanned.

4) Indexing: This is a very serious problem I found with all but two of the 70 or so documents I looked up during the two days I was at the archive. Before coming to the archive, I used the online database to make a list of file names and file numbers for documents I was interested in. I brought these to the archive and looked up the same numbers in the internal database. Each file number, unfortunately, corresponds to a packet of multiple files ranging, at least judging by what I saw, from 15 to 50 or so in number. I could then easily locate the appropriate document by its file name and open the images directly in the system. To my horror, in all but two of the cases, the documents in the file images did not correspond to the file name. For each document I would have to hunt through the other dozen or several dozen documents in the same general area to find the images for the file I was looking for. Sometimes I was never able to locate the file, suggesting that those images are probably found in other file groups, if at all. Now, what am I supposed to do as a historian when I cite the documents I did find? I’ll record the correct file numbers, found in the database, but any other historian wishing to confirm the information I am citing will look them up and find a completely different document unless the archivists have gone in and fixed all the indexing issues throughout their scanned collection.

I asked two of the archivists about this issue and I essentially got a, “That is funny. Well, just hunt through the rest of them and find your document. It’s probably like that for this whole collection. We paid a contractor to have it done and didn’t have the resources to check all their work.”

5) Quality: The documents I’m looking at are Communist public security bureau reports and Communist party internal reports. Some of them are hand written or are characters carved onto a special surface that allows a sort of reproduction process frequently used in the 1940s (any printing history buffs know what this ancient photocopying method is called?). In either case, they are very difficult to read, faded with time, on surfaces that are themselves often in poor condition, and most importantly, written in tiny sizes. If you are going to digitize these kinds of documents, then, you need to digitize them with a much higher quality. As I mentioned in my posting on triage in the archives, I have had to sometimes completely skip some of the more hopelessly unreadable documents or those for which the pages per hour drops to a rate that makes the investment of time not worth it. I would say that this happens in perhaps 1/10 documents I look at here.

Now, take these same kinds of documents and scan them. If you scan them well, at high resolution and with color, then you can actually make those difficult to read but important sections more readable thanks to the power of zooming in on parts of the image. However, that is not what happened here.

The contractors here decided to take these extremely difficult to read originals and scan them in black and white (not even in greyscale!). Now I know the evidence seems to suggest that if you are going to run a massive scale OCR program on historical newspapers, for example, then black and white is not significantly worse than greyscale. However, OCR is not even worth trying on these hard documents, unless there are some major breakthroughs in artificial intelligence. If, however, you are trying to use human eyes to read difficult to read handwritten or carved Chinese characters on poorly preserved mediums, you need to preserve as much of the quality of the originals as possible. The cost benefit analysis done in this case resulted, in the case of many documents, in completely unreadable digital copies.
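To make the black-and-white problem concrete, here is a minimal sketch in Python of what one-bit scanning does to faded ink. The pixel values and the threshold of 128 are invented for illustration; they are not measurements from the archive's scans, and real scanner software may choose its cut-off differently.

```python
# Hypothetical 8-bit grey values along one scan line (0 = black, 255 = white).
# These numbers are invented for illustration only.
PAPER = 225        # yellowed paper
FADED_INK = 170    # a faded stroke, only slightly darker than the paper
DARK_INK = 60      # a well-preserved stroke

scan_line = [PAPER, DARK_INK, PAPER, FADED_INK, FADED_INK, PAPER, DARK_INK, PAPER]


def to_black_and_white(pixels, threshold=128):
    """Reduce each pixel to pure black (0) or pure white (255)."""
    return [0 if value < threshold else 255 for value in pixels]


def count_strokes(pixels, background):
    """Count pixels that still differ from the background value."""
    return sum(1 for value in pixels if value != background)


bw_line = to_black_and_white(scan_line)

# In greyscale, all four ink pixels are still distinguishable from the paper.
greyscale_strokes = count_strokes(scan_line, PAPER)
# After thresholding, the faded strokes have turned into white paper.
bw_strokes = count_strokes(bw_line, 255)

print(greyscale_strokes, bw_strokes)
```

Under these made-up values the two faded pixels simply become paper, and no amount of zooming on the one-bit image can bring them back.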

This really left me depressed. In the case of the completely botched indexing described in number four above, an archivist or the hired contractor can go back and meticulously re-index the documents so that they point to the correct images. Since some of the documents have visible page numbers, messed up page numbers might also be fixed in those cases. However, I suspect it is harder to go back and explain to the budget committee, “Ya, our contractor blew the scanning job and made thousands of once barely readable documents in our collection now completely unreadable to visitors. Can we pay to do the scanning all over again?”

I came back to Jinan yesterday morning and felt incredibly happy to go back to reading similar documents in my own hands.[4] Digitization can do amazing things for improving access and preservation. When the Japanese national library set about digitizing all Meiji and now Taisho period publications, I found myself complaining mostly about the slower speed at which I could browse or skim through the books. I didn’t find that readability itself suffered too much during the process. In the case of these far more difficult to read wartime Communist documents, however, which are only gradually opening up to researchers and historians, sloppy digitization actually reduces rather than increases access.

  1. When I asked one of the archivists at the provincial archives why they did not provide full online access to the database, rather than a very small sampler of the full internal database so that visitors could come prepared with a list of documents to request, I got a bewildered and serious look, “Do you want to put me out of a job?” This answer only makes sense if you realize that one of the primary duties of two of the archivists is to sit at the database search engine and help first time visitors search for documents. Given the fact many of the, especially older, visitors are completely computer illiterate, however, I still believe their services would continue to be required to help elderly comrades who come to search for their records.
  2. though, as was the case with the Korean national archive, it would have been simple enough for a less scrupulous person to do this given the access to the “Save As…” option in the file menu and apparent lack of any security on the machine I was given access to. In fact, in the case of the Korean national archive at Daejeon, web browser access was restricted but I was able to confirm, at least as of 2008, the DOS command line still gave me FTP access to my server where I could have uploaded hundreds of pages of Korean archive documents they were requiring me to wastefully print and pay for, had I been so inclined to disregard their rules.
  3. A bizarre and surely unnecessary step, since the documents have been screened once when they were added to the database for classified information. I could easily note down in my notes anything I read in the documents before printing them so not letting me keep the print outs hardly serves to prevent sensitive or privacy violating information from leaking out. If privacy issues are primary there should be a system, like the one at the Korean national archive, which charges the visitor to process accessed documents to redact out the names of people mentioned. At the Pusan branch of the Korean National Archive I paid about $50 and waited three days to get access to some old police logs. It took that much time because they had to go through and erase the names and provide me copies. However, I’m still grateful I got access at all. Although this is an important issue that deserves consideration, I generally feel that the privacy laws of Korea and Japan are far too strict and that they seriously inhibit serious historical work from the 19th through the period I’m working on in the mid-20th century.
  4. Note to super friendly archivists: if you encourage a visiting PhD student to eat while looking at the documents by suddenly (and generously) giving him a handful of juicy baby tomatoes, you might end up with a bit of tomato juice on one of the pages of part two of the 1946 treason elimination report from the Donghai public security bureau of the Jiaodong district.

A Night in Changdao

I’ve been outside of Jinan this week, traveling about a bit. Yesterday I caught a ferry from Penglai (蓬莱) to a group of islands known as Changdao (長島) county which I had been told were well known for their scenic beauty. I had a day left of traveling with no specific plans and it seemed like a nice quiet place to spend a day before I head back to Jinan for my last week in China. I arrived in Changdao late in the afternoon and after checking into one of the only hotels open before the summer tourist season starts in May, I wandered about the town a bit. I didn’t ever get outside the sleepy fishing town in the south of the islands either that evening or the next morning when I caught the ferry back to the mainland. Instead of making it out to see the Changdao National Forest Park and Changdao National Nature Reserve, I mostly roamed about the back streets of the town and port.

I couldn’t help noticing that the locals gave me more than the usual amount of attention with a much higher frequency of gasps, cries of “Laowai!” and in one case a mother in a grocery store giving a short lecture to her child, surely too young to understand, about what this monster in their midst was (“You have never seen one of those before, have you? Don’t be scared. A foreigner is someone from another country and they don’t all look like us…”). This is nothing new, of course, to those who have traveled outside the major cities of Asia and I simply attributed this to the natural curiosity for non-Asians I have experienced throughout the countryside of Japan, Korea, and China.

During that first evening, though, I learned something about Changdao almost by accident. Walking back to my hotel late in the evening I passed by a TV shop where my iPod detected a wireless internet connection. I stopped outside the shop to download some email and, since I really knew nothing about the place I was visiting, also downloaded the Chinese and English Wikipedia articles for the islands on the little offline Wikipedia client on my iPod. When I read the articles later that evening, I found the English page had these two surprising paragraphs:

Changdao Island is closed to non-Chinese nationals. Westerners found on the island are swiftly taken to the passenger ferry terminal and placed on the next ferry back to Penglai by the islands Police service. Islanders promptly report all “outsiders” to the islands police service. (First hand experience) Police explain the reasons for this, due to the high number of military installations on the Island.

The Changdao Islands are now open to non-Chinese nationals, including westerners This was agreed by the local and national governments as of 1st December 2008.

Given the fact that non-Chinese nationals have apparently only been permitted on the island since December, and the tourism season hasn’t really started, the relative isolation of these islands may not have been the only reason there was extra surprise at the sight of a (visibly identifiable) foreigner in their midst.

The next day, I checked out of the hotel, and made my way back to the ferry terminal. On the way, I walked over to the nearby TV shop to download my morning email (I know, I’m an addict). A middle aged man across the street yelled at me to stop. None of the many townspeople I had come across the day before had stopped me but armed with my new knowledge about the island I nervously complied. He came up to me and asked me if I had registered with the police. I told him I hadn’t. He asked me what I was doing on the islands, where I had stayed, etc. I answered honestly. Although he was polite, he said he wouldn’t let me go until he had called the police to ask if I had registered yet. I explained I hadn’t registered but I had only arrived the night before[1] and, at any rate, was now on my way to the ferry terminal to return to the mainland. “Ah,” he said, “but why are you going this way, when the ferry terminal is that way?!” Fortunately, a little more explanation made him understand that I simply wanted to walk a few more meters up the road to steal a wireless connection I had come across to check my email before hopping into a cab and going to the ferry terminal. At any rate, I avoided this concerned citizen’s detention, and the potentially time-consuming process of going to the Changdao county police station to register myself.

Two notes to the Changdao authorities:

1. If I hadn’t downloaded that Wikipedia article, I never would have known there was any special status for the islands or any kind of military installations. Only the English Wikipedia entry and this 2005 blog entry from someone who was refused entry some years ago alerted me to the fact, and only after I had checked into my hotel on the island. If foreigners need to take care to register when visiting the scenic islands or are subject to other restrictions, perhaps you could post a sign somewhere in the ferry terminal,[2] or a notice somewhere on the nice English language website for Changdao county where I am welcomed to the “peaceful, sincere, civilized and beautiful Changdao for business investment and holiday!” If there is some kind of required registration procedure, can I recommend that one be able and asked to do this upon arrival at the ferry terminal or when one checks into the hotel (the hotel didn’t even look inside my Norwegian passport when I checked in). Finally, if a potential military adversary like the United States really wanted to send a spy to reconnoiter your military bases on the islands, do you really think it would be a good idea to send an easily identifiable Caucasian instead of one of its many citizens of Asian or similar complexion or, even better, a hired local?


  1. I think foreigners are technically supposed to register with the police everywhere in China within 24 hours of their arrival, and I did register in Jinan soon after my arrival, but almost no tourists traveling in China register in every city they stay in. At any rate, the registration he spoke of is thus not a Changdao specific requirement. Technically though, I hadn’t yet reached the 24th hour and I was off the island before my time ran out.
  2. I confirmed there is no special information in either Chinese or English posted about the status of the islands when I returned to Penglai.

Legal MP3 Downloads in China via Google

I listened to a great podcast recently of a Columbia University SIPA sponsored talk by Kai-Fu Lee on Google’s many different efforts to compete in the China market (Find their China site easily at g.cn). One of the things Lee mentioned was the initial difficulty of competing with the MP3 downloads available, often illegally, through Baidu, their now scandal plagued competitor.

Google went into some kind of licensing agreement with Chinese music distributors and now provides download of a lot of Chinese music in an even easier fashion than that of their competitors. The service, however, is only provided to Chinese users with a Chinese IP address to avoid cannibalizing the music industry’s income outside of the mainland where illegal music download is, unlike China, somewhere below 100% of the available market.

I have to say, having now used this Google China service, I’m very impressed. This is really like the old Napster days back with a vengeance, at least for Chinese music – but this time it is actually legal!

Here is a step by step demonstration of how one gets the MP3 of a song in China through Google:

1. Search for the song’s name. Google Suggest, unlike the US, is on by default in China because, as Kai-Fu Lee says, “Typing Chinese is hard.”


2. If Google recognizes the search item as a song it knows, you will get album art and a series of special music links above the other web links for the given search, including direct links to listen (试听), download [as MP3] (下载), a link for the artist, etc.


3. Click on the download link, and a pop-up window results, showing the size of the file, its format (MP3), and a big green download button. There is also a banner advertisement, where I presume some of the revenue is generated for the music industry.


Click the download button, and you will soon have a downloaded 192kbps quality MP3 of 许巍’s song 难忘的一天, complete with lyrics. Unfortunately, the encoding of the metadata is not Unicode so it doesn’t show up correctly in iTunes, but it is easy enough to copy/paste this info from the Google download window.
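For anyone who would rather repair the tags than copy and paste them, here is a minimal sketch in Python. It rests on an unconfirmed assumption: that the tags were written as GBK bytes (a common Chinese encoding) and that the player is misreading them as Latin-1. The function name is hypothetical, and reading the raw tag out of the MP3 file is left out entirely.

```python
def repair_tag(garbled: str, actual_encoding: str = "gbk") -> str:
    """Undo a wrong Latin-1 decode by re-reading the same bytes as GBK.

    Assumption: the tag bytes were GBK and were displayed as Latin-1.
    If the real encoding differs, pass it in or expect a UnicodeError.
    """
    raw_bytes = garbled.encode("latin-1")   # recover the original bytes
    return raw_bytes.decode(actual_encoding)


# Simulate what a player shows when it misreads a GBK title as Latin-1.
title = "难忘的一天"
garbled_title = title.encode("gbk").decode("latin-1")

print(repair_tag(garbled_title))
```

If the guess about the encoding is wrong, the decode step will either fail or produce different nonsense, in which case copying from the Google download window remains the safer route.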

The ease of this process really impressed me. Google China and the Chinese music industry are way ahead of the game here. I don’t know if they have found this distribution mechanism to be profitable but from the user’s perspective, this is really hard to beat.

Apartment Heating: Democracy at Work

In some countries you pay for your electricity every month to an electric company, based on the amount you have used in the preceding month. When you set up the account they will come by and check your meter and begin the count. In the Komaba International House in Japan, near Tokyo University’s Komaba campus, where I lived for a few months as a student, you “charged” your room with electricity and the amount still remaining on your account was displayed conveniently on a little meter near the entrance to one’s room.


Every day you could see how much “juice” you had left and could make a guess as to whether you had enough to make it through another day. This was a reasonable system, once you got used to it, although the dormitory had its other issues.

Here in my apartment in China, they opted for another method. You charge your room with electricity, as one does at the Komaba dormitory in Japan, but the only place you can read the meter is hidden deep in the bowels of the pipe room of one’s floor where it is accessible only to a custodian with a flashlight and a hefty collection of keys.

So when I came down with a horrible fever and cold this week, and was drifting in and out of consciousness, I was not happy to discover that, in the middle of the night, my electricity shut off, and therefore my electric heater, because my charge had run out. Stumbling around in the darkness, knocking over a cup of cold tea leaf filled tea, I managed to make my way down to security and they got the poor janitor up to charge my room with another 10RMB (the maximum amount the janitor is allowed to accept from me outside of regular hours), which, it turned out, provided only another 8 hours of continuous heating for my room with my electric heater.

So, as you can imagine, I have been eagerly awaiting that beautiful moment when “the heating” turns on for my building (and my city? I’m not sure, but this is usually a pretty centralized operation in China) and I can stop wasting electricity on my (less efficient) electric heater. The nice steamy water pipe heating I have been waiting for made for a cozy and comfortable winter when I lived in China last time in 1999-2000, as long as one didn’t touch the pipes at their entry point. I’m not exactly sure by what mysterious process it gets decided by the powers that be that it is cold enough to have heating but, believe me, it is.

Thus consider my dismay when I got on the elevator in my apartment complex today and saw a sign that says,

“53 households said they want heating this winter, while 34 households said they did not want heating this winter. In accordance with the law, since less than 70% of the households in the building want heating this winter, we will not be turning on the heat.”

It looks like, in addition to not getting any of the mail sent to me here in China over the last month despite several confirmations of my address, somebody else might have used my ballot for this crucial election…
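The sign's arithmetic does hold up, as a quick Python check shows. The sketch assumes that the 87 households counted are the whole building and that the rule means "at least 70% in favor"; the sign does not spell out either detail.

```python
# Figures from the elevator sign.
WANT_HEATING = 53
NO_HEATING = 34
THRESHOLD_PERCENT = 70  # assumed to mean "at least 70% in favor"

# Assumption: every household in the building is in one of the two groups.
total_households = WANT_HEATING + NO_HEATING

share_in_favor = 100 * WANT_HEATING / total_households

# Smallest whole number of households that reaches the threshold
# (integer ceiling division avoids floating-point rounding).
votes_needed = -(-THRESHOLD_PERCENT * total_households // 100)
votes_short = votes_needed - WANT_HEATING

print(f"{share_in_favor:.1f}% in favor; {votes_needed} needed; {votes_short} short")
```

Under those assumptions the pro-heating camp fell eight households short, so one missing ballot would not have saved the winter.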

Reporting Residency in Japan and China

In many countries you’re required to register your place of residence with some local government body. I think this is technically also the case in Norway, although I’ve rarely done it, but it’s something that I’ve now had to do both in Japan and China. In Korea, either because of my unusual visa status during my longest stay of a year, or because I simply ignore the rules regarding residence registration, I have yet to experience this process.

When I lived in Tokyo and Yokohama the registration of my place of residence was done almost immediately after my entry into Japan, because this was one of the first steps that allowed you to function in Japanese society, and do things like open a bank account, get a cell phone, and other such important initial steps to starting a life in the country. This is because the registration of one’s residency is combined with the getting of a foreigner registration card. Once, when I moved from an international dormitory near Tokyo University to a new apartment in Kichijôji, I had to go through the registration process again since the new location was technically in the city bounds of Musashino, a city in the suburbs of Tokyo.

Overall I’ve been really impressed with the smooth nature of the process in Japan, even though getting one’s registration card can take a few days and finding the local ward or city office can sometimes be a challenge if you’re fresh off the boat. Generally speaking, Japanese ward and city offices, especially in larger cities, have relatively good services for foreigners and I was always impressed with the fact that when I registered myself, even though I went through the application process for a foreigner registration card, I still felt like I was being welcomed into a given community. I was often offered a handful of brochures about local athletic and health facilities, trash sorting and recycling, and other information. The two times I lived in Japan for a year or more I was able to get registered into the Japanese national health insurance program, which allowed me to get access to relatively cheap and high quality Japanese public health services, often offered at prices much lower than those I might get in the US. Also, the local city and ward offices often host a number of community activities, language classes and other cultural activities, and sometimes make an extra effort to reach out to foreigners in the community through these activities and by providing information in multiple languages. This probably wasn’t always the case but I do get the impression that Japan has come a long way in addressing the increasingly large foreign community in its cities.

Today I went through the registration process for my residency in China for the first time. Although I lived in China twice before, once for three months and once for a year, on both of these occasions I lived in a dormitory on a university campus and I don’t remember going through a similar process.

After staying in a hotel for a few days I found an apartment here by going directly to a real estate agent (believe it or not, my first stop was the nearby Century 21, which has branches in all the neighborhoods around Shandong University). Compared to my experience here in Jinan, I found apartment hunting to be a much less foreigner-friendly process in Japan and also in Korea, as in both places it’s relatively difficult to find short-term housing options that aren’t quite expensive for very modestly sized apartments. Here, despite the fact that I’m in a provincial capital with most likely a small number of foreign students, I was surprised to see that real estate agents almost immediately offered me a number of options when I told them I was looking for an apartment to rent for a few months. I ended up renting an apartment from a military officer who owns a number of small places in the area in clean and recently built apartment complexes, and although the real estate agent told me that registering my new residency with the local police station was more trouble than it was worth, the international office at my university told me that I did have to go through this process. The process was very different from the Japanese one.

I was first directed to the local ward police station and then sent up to the second floor where the plain clothes police officers work in various offices. After a short conversation with a female police officer in charge, I was then sent back to the local police corner branch located just around the corner from my apartment complex to meet someone she called on the phone there. There I was met by a friendly elderly police officer who was to join me for an “inspection.” He told me that he had to accompany me back to my apartment in order to see if my apartment was “appropriate” or not. Now, I’m sure that there was some rational reason behind this and there is some logic at work here which I’m just not aware of, but I did feel kind of strange having a police officer escort me back to my apartment, inspect it, and decide whether or not it was appropriate as my residence. He did come back with me, poked around my kitchen and bathroom, and inspected my bookshelf but didn’t go as far as opening drawers or inspecting the contents of my refrigerator. He was friendly throughout and we had a little chat before we went back to the police station. I was then sent back, again, to the main ward police station, which is about 15 minutes’ walk down the road, where I had to fill out a registration form for outsiders taking up residency in that particular ward. I was then sent back, again, to the corner police station where I filled out another form to register me as a resident of that particular block. They also contacted my landlady to get some more information from her. Although there was a lot of going back and forth and at times it seemed like I was one of the only foreigners who had actually gone about this process, since several of the police officers seemed somewhat unfamiliar with the procedures, things went smoothly enough that I only had to spend a single afternoon on the process.

Thinking about all the visa nightmares faced, and hoops needing to be jumped through, by non-imperial citizens traveling to the US, I really don’t think I have much reason to complain.

Jinan Used Book Market

I have just gotten settled in here in Jinan, in Shandong province, China. Except for a few weeks in Shanghai and Nanjing, I’ll be here until the end of next April doing my dissertation research affiliated with Shandong University.

A young history master’s student who has been helping me out since I got here and showing me around the libraries of the university invited me to join him for a trip to the used book market here. He told me he makes the trip down there every two or three weeks to look for good deals on academic history books on his period.

The used book market is open on weekends from around 8am until noon in Sun Yat-sen Park (中山公园). There are perhaps close to a hundred bookstore stalls and open-air table-based vendors. The selection varies widely of course, with some stores specializing in books on Chinese medicine, others in test prep books, others in Chinese literature, but most have a wide selection of what appears to be leftover stock from bookstores. I’m guessing this since many books are cut partly on the spine to distinguish them from new books. I was surprised to see such a large selection of academic and especially history books, including collections of historical materials, obscure reference books, and historical journals. Amazingly, and thanks to the good eyes of my friend, one of the 18 books I bought today for just over $10 was a very useful pamphlet put out by the office of the Shandong provincial historical society that I had noted down for future copying only a few days earlier in the library of Shandong University’s history department. It has an index of periodicals published in Shandong before 1949, with a list of extant issues and which library or archive in the province still has those issues (建国前山东旧期刊目录1903-1949).

The price of the academic books on history I was looking at currently seems to average around 5 RMB (less than $1) but many books go for 1, 1.5, or 3 RMB. Sometimes, and I have no idea what market forces are at work here since it really seemed quite arbitrary, prices could go as high as 10 or 30 RMB. Perhaps a bookseller catches a glint in the eye of the purchaser indicating that he desperately wants a copy? Regardless, considering that many of the books in question go for 30-50 RMB new, these books are quite heavily discounted, in contrast with the Japanese used book market for academic works.

The used book market clearly draws a lot of students and there was an excellent showing from the department I’m affiliated with. I was told there are currently 13 graduate students in the history department of Shandong University, mostly master’s students. A good half dozen of these were in the book market today prowling for good deals. These students would often keep an eye out for books each of them might have particular interest in and sometimes made cellphone calls to absent friends who might appreciate them snapping up some bargains. They would also compare prices with each other and use these in their efforts to bargain. One student found a Chinese translation of a volume of the Cambridge History of China for just over $1, while another who heard about this was frustrated in his efforts to bargain down a separate copy found elsewhere to under its $5 price. I was also interested to hear that students had been directed to snap up available copies of a book by one of their professors to hand over to the professor. While the professors can buy somewhat discounted copies of their own books from the publisher, it is even cheaper to get them, or have their students get them, from the used book market, perhaps for use as gifts to friends.

I’m really impressed at how much some of these graduate students seem to know about Chinese history works coming out of the US and with their excellent critical skills and strong curiosity for new approaches to history. One student invited me to some kind of history reading club in the afternoon and said he wanted me to share with them what good stuff was being published in the US academic field on Chinese history. I explained that I had been out of the country for a while and had been reading mostly Korean and Japanese history of late so that I wasn’t really up to date on trends in English language scholarship on China, but that I was willing to pass on a few orals lists used by graduate students in the US. I was surprised to be assured that this wasn’t necessary since all they really needed were Chinese history books newly published in English in 2007 and 2008!




More pictures available in a variety of sizes can be found here.

Quotational Quarantines

As historians, we often engage in the liberal use of quotations to sanitize and quarantine distasteful terms or phrases that lend legitimacy to a category or a way of referring to an institution or other body. The use of these quotes, which I confess to frequently using, presumably robs such terms of their nomenclatural power and further serves to establish distance between us and the ideas and terms we enlist to talk about the past.

Finally, use of these quotation marks excuses us from having to spend time analyzing the terms themselves, putting them aside as if to say, “Yes, yes, this is a very inappropriate term that needs careful and sensitive discussion, but since I’ve a lot to do in this essay, I just can’t be bothered at the moment to deal with it.”

Some people seem to feel that the aesthetic impact on one’s work is such that the frequent use of quotations is just not worth it, or perhaps feel that we simply aren’t accomplishing anything useful by using them for direct translations or references to terms as they were used decades or centuries ago. However, not using quotations or confronting problematic terms can earn the ire of book reviewers, as I discussed in a response to a review of the book Collaboration by Timothy Brook. Brook was criticized for using the term “pacification teams” to refer to the units the Japanese called “pacification teams” in occupied China during the war, even though he is anything but sympathetic to the Japanese in his book.

One strategy is to use quotations once, and then announce that you won’t be using them anymore. I came across this tactic today when reading a Chinese translation of an essay by Matsuda Toshihiko, called 日本帝國在殖民地的憲兵警察制度:從朝鮮,關東州致滿洲國的統治樣式遷移 (the English title was listed as “The ‘Gendarme-oriented’ Police System in the Japanese Colonial Empire: The Transfer of Models of Rule Used in Colonial Korea to Kwantung Province and Manchukuo”). After putting Japan’s 內地 (the interior of Japan, that is, Japan proper excluding its colonies) and terms like 滿洲 (Manchuria) and 滿洲國 (Manchukuo, the largely Japanese-controlled Manchurian state from 1932-1945, often called 僞滿州 or the “puppet Manchukuo”) in quotations, he follows each with “以下省略括號” (“Brackets left out below”).

Another strategy that can sometimes be used, which is one I follow for some words like “traitors,” is to embrace a word and use it quite shamelessly in order to deliberately provoke the reader. In English, the word traitor has lost much of its punch of late – a good thing in my opinion – but still holds great power in many other places and languages. The discomfort generated by the word and the way it forces readers to think about what it really means is part of what I aim to achieve when I use the term. Far from wanting to contribute to the term’s legitimacy, my deliberate use of it is partly out of a kind of mockery, but more importantly out of a desire to help set the scene of the politically charged context in which it was used.

Though I can’t speak for them, I suspect something similar is being done in some other famous cases of this. Some scholars of Korean history have been strongly criticized for using words like “terrorist” to describe Korea’s national tragic hero Kim Koo. I suspect these same critics would have much less opposition to his being referred to by his popular nickname, “the assassin.” I really don’t have strong feelings on this issue and I don’t think it is as straightforward as my own case, but it raises some interesting questions. What if these scholars are also engaging in a dual process of linguistic mockery and a deliberate attempt at reviving a historical scene? Should the word be off limits entirely, should it necessarily be accompanied by quotations, or are there alternatives? What I think escapes some critics of such scholars is that I believe at least some of them are using the word terrorist not as a way to conjure images of Kim Koo as a suicide bomber in a crowded market but, on the contrary, to show how the word terrorist itself has a history and potentially embraces a wide range of figures we might be less willing to unconditionally condemn. In doing so, they potentially open a space in which to critique the way the word has come to be used and what it now narrowly represents, as well as the wide range of activities and contexts it covered both in the past and now. Can we only engage in such a rhetorical technique through the use of quotations?

I’d be interested in hearing from other students and scholars about this. What strategies do others take when they are faced with the need or potential need to establish quotational quarantines? What conventions do you follow?

Script for Creating a Chinese Vocab List

The Problem: Let us say you have a list of Chinese words or single Chinese characters in a file. There are a lot of them. You want some easy and fast way of getting the pinyin and English definitions of that list of words or single characters and you want this in a format that can be easily imported into a flashcard program so you can practice these words.

Today I faced this kind of problem. There are lots of “annotator” websites online that make use of the free CEDICT Chinese dictionary, but I have yet to find one which outputs a simple and nicely formatted (with all the […] and /…/ stuff removed) tab-delimited vocab list.

I have recently been frustrated by the fact that I often come across Chinese characters that I haven’t learned, or, more often, characters that I only know how to pronounce in Japanese or Korean. I am also frustrated at the fact that I have forgotten the tones for a lot of characters I knew well many years ago when I studied Chinese formally.

Over the summer I want to review or learn the 3500 most frequently used Chinese characters, particularly their pronunciation, so that I can improve my tones and more quickly look up compounds I don’t know.1

I found a few frequency lists online (see here and here for example) and I stripped out the data I didn’t need to create a list with nothing but one character on each line.2 Although it is an older list based on a huge set of Usenet postings from ’93-’94 you can download an already converted list of 3500 characters here.3
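For anyone who wants to do the same stripping themselves, here is a rough sketch of the idea in Ruby. It is only an illustration and makes an assumption about the input: that each line of the frequency list contains its character somewhere among rank numbers, counts, or definitions, and that the first Chinese character on the line is the one you want. The method name first_han_characters is made up for this example.

```ruby
# Rough sketch: reduce a frequency list to one character per line.
# Assumption: the first Han character on each line is the entry itself;
# lines without any Han character (headers, comments) are skipped.
def first_han_characters(lines)
  lines.map { |line| line[/\p{Han}/] }  # first Han character, or nil
       .compact                         # drop lines with no character
       .uniq                            # keep each character only once
end

# Illustrative input, not taken from any real frequency list.
sample = ["1 的 7.2 /of/", "# comment line", "2 一 3.1 /one/"]
puts first_han_characters(sample).join("\n")
```

The result can be saved as a UTF-8 text file and fed to a script like the one described below.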

Since I’m not in the mood to look up 3500 characters one by one, I spent a few hours this evening using this problem as an excuse to write my second script in the Ruby programming language.

On the remote chance that others using Mac OS X find it useful, you can download the result of my tinkering here:

Cedict Vocabulary List Generator 1.1

This download includes the 2007.8 version of CEDICT, the latest I could find here.4

How this script works:

1. After unzipping the download, launch the “Convert.app” AppleScript application. It will ask you to identify the file you want to annotate. It is looking for a text file (not a Word or rich text file) in Unicode (UTF-8) format with either simplified or traditional Chinese characters or word compounds, one on each line.

2. This application will then send this information to the convert.rb Ruby script, which will search for the words in the CEDICT dictionary in the same folder and format the information it finds (the hanzi, pinyin, and English definition). Multiple hits for the same character or word are put within the same entry, with the definitions numbered. It does not currently add the alternate form of the hanzi (it won’t add the simplified version to traditional or vice versa).

3. It will then produce a new file with the word “converted” added to its name. It will create tab-delimited files by default but you can change this by changing this option at the top of the convert.rb file in a text editor.

4. Though this version of the script doesn’t do this yet, you may want to run the resulting text through the Pinyin Tone dashboard widget or a similar online tool such as the one here or here. That will get rid of the syllable-final tone numbers and add the appropriate tone marks. I am having a bit of trouble converting the JavaScript that my widget and this site use into Ruby, so if anyone is interested in working on this let me know!
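To give a feel for steps 2 and 3, here is a minimal sketch of the lookup and formatting in Ruby. It is not the contents of convert.rb. It assumes dictionary lines of the form `traditional simplified [pin1 yin1] /definition/definition/`, and the names build_index and vocab_row are invented for the illustration.

```ruby
# Minimal sketch of a CEDICT lookup that emits tab-delimited vocab rows.
# Assumed line format: 中國 中国 [Zhong1 guo2] /China/Middle Kingdom/
CEDICT_LINE = /\A(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+\/(.*)\/\s*\z/

# Map each headword (traditional and simplified) to its [pinyin, definition] hits.
def build_index(dictionary_lines)
  index = Hash.new { |hash, key| hash[key] = [] }
  dictionary_lines.each do |line|
    next if line.start_with?("#")        # skip comment lines
    match = CEDICT_LINE.match(line.strip)
    next unless match                    # skip lines in some other format
    traditional, simplified, pinyin, defs = match.captures
    entry = [pinyin, defs.split("/").join("; ")]
    [traditional, simplified].uniq.each { |word| index[word] << entry }
  end
  index
end

# Format one row: hanzi, pinyin, definition. Multiple hits are numbered.
def vocab_row(word, index, separator = "\t")
  hits = index.fetch(word, [])
  return [word, "", ""].join(separator) if hits.empty?
  pinyin = hits.map(&:first).uniq.join(", ")
  definitions =
    if hits.size == 1
      hits.first.last
    else
      hits.each_with_index.map { |(_, text), i| "(#{i + 1}) #{text}" }.join(" ")
    end
  [word, pinyin, definitions].join(separator)
end

# Illustrative sample lines, not the real dictionary file.
sample = [
  "中國 中国 [Zhong1 guo2] /China/Middle Kingdom/",
  "行 行 [hang2] /row/line/",
  "行 行 [xing2] /to walk/to go/"
]
index = build_index(sample)
puts vocab_row("中国", index)
puts vocab_row("行", index)
```

Passing a different separator, for example a comma, plays the role of the delimiter option mentioned in step 3. Words that are not found come out with empty pinyin and definition fields so the row count still matches the input.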

If the script doesn’t work: make sure you are saving your text file as UTF-8 before you convert. I am also having trouble when my script is placed somewhere on a hard disk where the path has lots of spaces. Try putting the script folder on your Desktop.

Note: If you don’t have Mac OS X but can run Ruby scripts on your operating system, you may be able to run my script convert.rb from the command line. It takes this format:

convert.rb /path/to/file.txt /path/to/cedict.u8

UPDATE 1.1: The script now replaces “u:” with “ü” (CEDICT uses u:).

  1. The top 3000 make up some 98-99% when their cumulative frequency is considered.
  2. A few of the frequency lists I have seen have Cedict dictionary data included but not in a very clean format.
  3. I notice that there is a high frequency of phonetic hanzi for expressing emotion in the postings, and some other characters one doesn’t come across as often in more formal texts. I actually don’t mind.
  4. If you find a newer version (in UTF-8), put it in the same directory as my script and name it cedict.u8.

Pinyin Tone Dashboard Widget

I’m happy to announce the results of a few hours of tinkering: The Pinyin Tone Widget. This OS X dashboard widget will take a series of Chinese pinyin words with tone numbers appended at the end of each syllable and will add the tone marks where appropriate (e.g. zhong1guo2 becomes zhōngguó).
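For the curious, the conversion itself can be sketched in a few lines of Ruby. This is not the widget's code or Mark Wilbur's JavaScript, just an illustration of the usual placement convention: the mark goes on a or e if the syllable has one, on the o of ou, and otherwise on the last vowel. The names mark_syllable and add_tone_marks are made up for this example, and it assumes tone 5 means a neutral tone with no mark and that “u:” stands for ü, as in CEDICT.

```ruby
# Sketch of converting tone numbers to tone marks (zhong1guo2 -> zhōngguó).
TONE_MARKS = {
  "a" => "āáǎà", "e" => "ēéěè", "i" => "īíǐì",
  "o" => "ōóǒò", "u" => "ūúǔù", "ü" => "ǖǘǚǜ"
}.freeze

# Add the mark for one syllable; tone 5 (neutral) just drops the number.
def mark_syllable(letters, tone)
  letters = letters.gsub("u:", "ü")      # CEDICT writes ü as "u:"
  return letters unless (1..4).cover?(tone)
  lower = letters.downcase
  # a or e takes the mark; in "ou" the o does; otherwise the last vowel.
  position = lower.index(/[ae]/) || lower.index("ou") || lower.rindex(/[aeiouü]/)
  return letters if position.nil?
  marked = TONE_MARKS[lower[position]][tone - 1]
  marked = marked.upcase if letters[position] != lower[position]
  letters[0...position] + marked + letters[(position + 1)..-1]
end

# Convert every letters-plus-digit syllable in a text.
def add_tone_marks(text)
  text.gsub(/([A-Za-züÜ:]+)([1-5])/) { mark_syllable($1, $2.to_i) }
end

puts add_tone_marks("zhong1guo2")
puts add_tone_marks("nu:3 hai2")
```

Because the tone digit ends each syllable, the pattern can simply grab the run of letters before each digit, which is why zhong1guo2 splits correctly without any syllable table.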

Many years ago, before Unicode became dominant, I used a Microsoft Word macro written by a Chinese language scholar, James Dew, as the basis for making an old Mac OS 9 application that translated texts between various pinyin fonts that were floating around online. Later, I made an online script that could convert tone numbers into Unicode tone marks. I was surprised to hear from various Chinese language instructors at a conference I presented at a few years later (2003) that many of them used the script regularly when preparing texts for their Chinese language classes.

The online script still works, but there is a much more elegantly written online script that does the same thing, by Mark Wilbur, a more skilled programmer in Taiwan, hosted on his site Doubting to Shuō. You can find his tool here: Pinyin Tone Tool.

My old PHP script is ugly by comparison to Mark’s compact JavaScript, so I have essentially installed his script to work in an OS X dashboard widget. You can download the widget here:

Pinyin Tone Widget v. 1.02

Pitfalls of a Hotel Hallway

I am usually not too picky about my sleeping quarters when I travel. On my recent trip to China however, I met a really friendly university student on the bus who offered to help me scout out a relatively clean and conveniently located hotel near Shandong University. The place turned out to be more than adequate and I enjoyed numerous conversations with the half dozen or so staff there during the five days I was there.

One morning, however, I woke up to what seemed to be the sound of a jackhammer drilling into my hotel room wall. I managed to ignore it, but as I walked out of my room, I found the source of the noise as soon as I opened the door: