Workshop


Workshop | 29 Jun 2008 06:09 am

In addition to my Muninn blog, I occasionally contribute to the various Frog in a Well blogs, the East Asia Libraries and Archives wiki, and some programming weblogs at Fool’s Workshop, and I put together various scripts and other creations. I have a Friendfeed profile, but I don’t really like leaving this sort of thing to third-party sites that come and go with the fads. I’ve decided to keep track of most of my own online projects with a little site and feed called the Workshop Wire.

Workshop Wire (The RSS Feed for it is here)

China and Language and Tech and Workshop | 22 Jun 2008 02:24 pm

The Problem: Let us say you have a list of Chinese words or single Chinese characters in a file. There are a lot of them. You want some easy and fast way of getting the pinyin and English definitions of that list of words or single characters and you want this in a format that can be easily imported into a flashcard program so you can practice these words.

Today I faced this kind of problem. There are lots of “annotator” websites online that make use of the free CEDICT Chinese dictionary, but I have yet to find one that outputs a simple, nicely formatted (with all the [...] and /…/ markup removed) tab-delimited vocab list.

I have recently been frustrated by the fact that I often come across Chinese characters that I haven’t learned or, more often, characters that I only know how to pronounce in Japanese or Korean. I am also frustrated that I have forgotten the tones for a lot of characters I knew well many years ago when I studied Chinese formally.

Over the summer I want to review or learn the 3500 most frequently used Chinese characters, particularly their pronunciation, so that I can improve my tones and more quickly look up compounds I don’t know.[1]

I found a few frequency lists online (see here and here for example) and I stripped out the data I didn’t need to create a list with nothing but one character on each line.[2] Although it is an older list based on a huge set of Usenet postings from ‘93-’94, you can download an already converted list of 3500 characters here.[3]

Since I’m not in the mood to look up 3500 characters one by one, I spent a few hours this evening using this problem as an excuse to write my second script in the Ruby programming language.

On the remote chance that others using Mac OS X find it useful, you can download the result of my tinkering here:

Cedict Vocabulary List Generator 1.1

This download includes the 2007.8 version of CEDICT, the latest I could find here.[4]

How this script works:

1. After unzipping the download, launch the “Convert.app” AppleScript application. It will ask you to identify the file you want to annotate. It is looking for a text file (not a Word or rich text file) in Unicode (UTF-8) format with either simplified or traditional Chinese characters or word compounds, one on each line.

2. This application will then send this information to the convert.rb Ruby script, which will search for the words in the CEDICT dictionary in the same folder and format the information it finds (the hanzi, pinyin, and English definition). Multiple hits for the same character or word are combined into a single entry with the definitions numbered. It does not currently add the alternate form of the hanzi (it won’t add the simplified version to traditional or vice versa).

3. It will then produce a new file with the word “converted” added to its name. It will create tab-delimited files by default, but you can change the delimiter by editing the option at the top of the convert.rb file in a text editor.

4. Though this version of the script doesn’t do this yet, you may want to run the resulting text through the Pinyin Tone dashboard widget or a similar online tool such as the one here or here. That will get rid of the syllable-final tone numbers and add the appropriate tone marks. I am having a bit of trouble converting the JavaScript that my widget and this site use into Ruby, so if anyone is interested in working on this let me know!
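The lookup-and-format steps above can be sketched in Ruby roughly as follows. This is not the actual convert.rb: the method names (parse_cedict, vocab_row) and the "; " joining of glosses are hypothetical, and the sketch assumes each CEDICT line has the form `traditional simplified [pin1 yin1] /gloss/gloss/`.

```ruby
# Hypothetical sketch, not the real convert.rb.
# Assumed CEDICT line shape: TRAD SIMP [pin1 yin1] /gloss 1/gloss 2/
CEDICT_LINE = %r{\A(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*\z}

# Build a lookup table from hanzi (either form) to a list of entries.
def parse_cedict(lines)
  index = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    next if line.start_with?("#")          # skip header comments
    match = CEDICT_LINE.match(line.strip)
    next unless match                      # skip lines that do not fit the shape
    trad, simp, pinyin, glosses = match.captures
    entry = { pinyin: pinyin, english: glosses.split("/").join("; ") }
    [trad, simp].uniq.each { |form| index[form] << entry }
  end
  index
end

# Format one word as "hanzi<TAB>pinyin<TAB>english", numbering the
# definitions when the dictionary has several hits for the same word.
def vocab_row(word, index, delimiter = "\t")
  hits = index.fetch(word, [])
  return nil if hits.empty?
  pinyin = hits.map { |hit| hit[:pinyin] }.uniq.join(", ")
  english =
    if hits.size == 1
      hits.first[:english]
    else
      hits.each_with_index.map { |hit, i| "#{i + 1}. #{hit[:english]}" }.join(" ")
    end
  [word, pinyin, english].join(delimiter)
end
```

A caller would read the word list line by line, call vocab_row for each word, and write the non-nil rows to the “converted” file. Changing the delimiter argument corresponds to the option mentioned in step 3.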

If the script doesn’t work: make sure you are saving your text file as UTF-8 before you convert. I am also having trouble when my script is placed somewhere on a hard disk where the path has lots of spaces. Try putting the script folder on your Desktop.

Note: If you don’t have Mac OS X but can run Ruby scripts on your operating system, you may be able to run my script convert.rb from the command line. It takes this format:

convert.rb /path/to/file.txt /path/to/cedict.u8

UPDATE 1.1: The script now replaces “u:” with “ü” (CEDICT uses u:).
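A replacement like the one described in this update can be as small as the sketch below. The method name normalize_umlaut is hypothetical, and handling an uppercase “U:” is my own assumption rather than something the script is known to do.

```ruby
# Hypothetical helper: CEDICT writes the vowel ü as "u:" in its pinyin field.
def normalize_umlaut(pinyin)
  pinyin.gsub("u:", "ü").gsub("U:", "Ü") # uppercase case is an assumption
end
```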

  1. The top 3000 make up some 98-99% when their cumulative frequency is considered.
  2. A few of the frequency lists I have seen have Cedict dictionary data included, but not in a very clean format.
  3. I notice that there is a high frequency of phonetic hanzi for expressing emotion in the postings, and some other characters one doesn’t come across as often in more formal texts. I actually don’t mind.
  4. If you find a newer version (in UTF-8), put it in the same directory as my script and name it cedict.u8.

Workshop | 21 May 2008 08:11 am

I yearn for the old days of HyperCard, which I started learning back in the glory days of the late 1980s. I’m trying to learn how to use the monster currently maintained by Apple that is AppleScript Studio. It combines the flawed scripting language AppleScript, which has some similarities with the scripting language of HyperCard, with the programming environment of Xcode. It feels like a marriage between a nuclear power plant and a water wheel that’s missing some of its blades.

I am, however, going through one of my 3-month programming cravings, so I have decided to play with AppleScript Studio and see if I can make an upgrade for one of my favorite old applications (more on that if I ever make any progress). I made a little weblog to chronicle my efforts and leave some tips behind for other beginners who might happen upon it later:

Fool’s Applescript Workshop

China and Language and Workshop | 13 May 2008 04:12 am

I’m happy to announce the results of a few hours of tinkering: The Pinyin Tone Widget. This OS X dashboard widget will take a series of Chinese pinyin words with tone numbers appended at the end of each syllable and will add the tone marks where appropriate (e.g. zhong1guo2 becomes zhōngguó).
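For readers curious what such a conversion involves, here is a minimal Ruby sketch. It is not the widget’s JavaScript; the names (TONE_MARKS, mark_syllable, add_tone_marks) are hypothetical. It assumes the usual placement rule: the mark goes on a or e if present, on the o of “ou”, and otherwise on the last vowel, with tone 5 treated as neutral and left unmarked.

```ruby
# Hypothetical sketch of numbered pinyin to tone-marked pinyin.
TONE_MARKS = {
  "a" => %w[ā á ǎ à], "e" => %w[ē é ě è], "i" => %w[ī í ǐ ì],
  "o" => %w[ō ó ǒ ò], "u" => %w[ū ú ǔ ù], "ü" => %w[ǖ ǘ ǚ ǜ]
}.freeze

# Put the mark for `tone` (1-4) on the right vowel of one syllable.
def mark_syllable(letters, tone)
  return letters unless (1..4).cover?(tone)   # tone 5: just drop the digit
  target = letters.index(/[ae]/i) ||          # a or e always takes the mark
           letters.index(/ou/i) ||            # in "ou" the o takes it
           letters.rindex(/[aeiouü]/i)        # otherwise the last vowel
  return letters if target.nil?
  vowel = letters[target]
  marked = TONE_MARKS[vowel.downcase][tone - 1]
  marked = marked.upcase unless vowel == vowel.downcase
  letters[0...target] + marked + letters[(target + 1)..-1]
end

# Replace every run of letters followed by a tone digit.
def add_tone_marks(text)
  text.gsub(/([a-zü]+?)([1-5])/i) { mark_syllable($1, $2.to_i) }
end
```

Calling add_tone_marks("zhong1guo2") follows the example in the paragraph above: each syllable is matched together with its trailing digit, and the digit is replaced by a mark on the chosen vowel.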

Many years ago, before Unicode became dominant, I used a Microsoft Word macro written by a Chinese language scholar, James Dew, as the basis for making an old Mac OS 9 application that translated texts between various pinyin fonts that were floating around online. Later, I made an online script that could convert tone numbers into Unicode tone marks. I was surprised to hear from various Chinese language instructors at a conference I presented at a few years later (2003) that many of them used the script regularly when preparing texts for their Chinese language classes.

The online script still works, but there is a much more elegant online script that does the same thing, written by a more skilled programmer in Taiwan named Mark Wilbur and hosted on his site Doubting to Shuō. You can find his tool here: Pinyin Tone Tool.

My old PHP script is ugly by comparison to Mark’s compact JavaScript, so I have essentially wrapped his script in an OS X dashboard widget. You can download the widget here:

Pinyin Tone Widget v. 1.02
(more…)

Tech and Workshop | 20 Nov 2007 09:45 am

The song name, artist, and album tags in many music files (whether they are acquired legally or otherwise) from Chinese and Korean sources are completely garbled in iTunes on a Macintosh. I assume this is because iTunes assumes that the text is in one encoding (Unicode or MacRoman?) when it was in fact encoded in another (often EUC-KR for Korean, Big5 for Taiwanese files, GB for files with simplified Chinese characters). I used to frequently get this problem with Japanese music files, but for some reason (perhaps because Unicode is more popular in Japan?) this has gradually become less of a problem.
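If that guess about mismatched encodings is right, the repair amounts to recovering the original bytes and decoding them again with the correct encoding. The Ruby sketch below illustrates the idea only. The method name repair_tag is hypothetical, and ISO-8859-1 is an assumed stand-in for whatever single-byte encoding the player actually guessed; it says nothing about how iTunes or any tag editor works internally.

```ruby
# Hypothetical sketch: undo a wrong single-byte decoding of a tag.
# assumed_encoding is a stand-in (an assumption); actual_encoding is what
# the tag was really written in, e.g. "EUC-KR" or "Big5".
def repair_tag(garbled, actual_encoding, assumed_encoding = "ISO-8859-1")
  raw = garbled.encode(assumed_encoding)                # back to the original bytes
  raw.force_encoding(actual_encoding).encode("UTF-8")   # decode them correctly
end
```

A real tag editor also has to guess which encoding was actually used, which is the hard part; this sketch assumes the caller already knows it.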

Fixing these tags can be a pain, and some of the older tools such as the once-awesome “MP3 Rage” and “ID3 Editor” often make things worse due to their inconsistent handling of 2-byte non-Roman languages.

An Apple Support page, however, recently pointed me to a great shareware application ($12) called ID3Mod2 which looks like it is made by the same people that made the incredible Chinese input method QIM that I talked about in an earlier posting (I don’t know this developer personally so it is not as if I’m trying to find good things to say about their work). You can freely use the software for a number of days, during which I was able to go through and fix all of the garbled tags in music files I have collected in China, Korea, and Japan over the last decade. Amazing - I might now actually learn the names of some of the songs I have been listening to for so long and someday even gather the courage to request them on a future karaoke adventure.

Workshop | 11 Sep 2007 08:57 pm

I’m split between using Safari and Firefox. The former provides a faster and more pleasant browsing experience, but Firefox sometimes renders certain pages better, has the indispensable Zotero, and offers a specific kind of integration with del.icio.us that I like better than the alternatives available for Safari.

To save me a few keystrokes I wrote the following simple AppleScript to take the current page open in Safari and open it in a new tab in Firefox:

tell application "Safari"
   activate
   set my_URL to the URL in document 1
end tell

-- Convert Unicode text of my_URL to plain text
set my_URL to «class ktxt» of ((my_URL as string) as record)

tell application "Firefox"
   activate
   Get URL my_URL
end tell

Download this as an AppleScript or as a compiled AppleScript application that you can invoke easily from Quicksilver.

Workshop | 11 Sep 2007 07:59 pm

I keep a journal of sorts, but wanted to filter out some of the quantitative or repetitive data I occasionally record about myself (how much I’m sleeping, my weight, exercise stats, etc.) into separate files that can be easily manipulated in something like Excel and displayed in charts if I ever choose to do so. Since I hate Excel, I want to do this without having to open it, or anything else if possible. I created a simple AppleScript which, while not elegant, helps me with this task. I simply launch the script whenever I want to record this information for the day, and it saves the data in separate text files as tab-delimited data by date.

See the script below.
(more…)
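For comparison, the same idea (one tab-delimited text file per kind of data, one dated row per entry) can be sketched in a few lines of Ruby. This is not the AppleScript from the post; the method name log_measurement, the file naming, and the ISO date format are all assumptions made for illustration.

```ruby
require "date"

# Hypothetical sketch: append "date<TAB>value" to a per-measurement file,
# e.g. sleep.txt or weight.txt inside the given directory.
def log_measurement(dir, name, value, date = Date.today)
  path = File.join(dir, "#{name}.txt")
  File.open(path, "a") { |file| file.puts [date.iso8601, value].join("\t") }
  path
end
```

Because each file is plain tab-delimited text, it can later be opened in a spreadsheet or charting tool without any conversion.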

Tech and Workshop | 18 Jan 2007 02:16 pm

I got a cheap used Griffin AirClick for USB to control my older laptop Macintosh by remote control. Another remote I like better (KeyPOINT) has been acting up so I got the Griffin as a replacement. The downside with Griffin is that it has fewer buttons, no mouse control, and a limited set of applications that it works with. One of the applications that I want to use the remote with is the best flashcard program on the Macintosh, iFlash. I use this almost every day to practice Korean vocab and other languages. Since this is not one of the supported applications, this afternoon I hacked the AirClick.app program that comes with the remote to add support for iFlash. You may download my modified version of the AirClick application here.

For those who wish to add support for their own program I briefly outline how I did the hack below:
(more…)

Tech and Workshop | 14 Dec 2006 01:24 am

My last few postings have all been tech-related. Since most of my history-related postings are also Asian history postings, I increasingly post those at Frog in a Well. I’ve been in a hacking mood in the last few weeks, especially since I’ve been working on some new projects at Frog in a Well in my spare time. My last two scripts are essentially shortcuts for creating iCal To-Dos. The first provides a fast way to use Quicksilver to create To-Do items without leaving whatever application you are working in. The second provides a way for you to email yourself To-Dos that then automatically tell iCal to create a To-Do with content from the email.

The script below is not related to To-Dos but is also aiming to increase my organization a bit. I use a program called Yojimbo (it is a great program but I think the title and its Karate Kid icon are both cheesy) made by Bare Bones, the same people who make BBEdit, the essential text editor for web programmers and coders of all stripes. Yojimbo is a program which keeps snippets of notes, web passwords, serial numbers, and best of all: you can drag and drop any web page you are viewing onto a little tab it keeps at the side of the screen and Yojimbo will download a static copy of the web-page and store it along with all your other notes. Great for when you think you might be away from the internet and want to keep the content handy for later searching.

Sometimes I’m away from my laptop, and as in the case of emailing myself To-Dos that I want created automatically in iCal, there are times I jot down notes from something and want to email myself those notes. It would then be nice for Mail to automatically detect those emails, and create a new Note in Yojimbo with the appropriate title and contents. I hacked together a little script which does just that.
(more…)

Tech and Workshop | 28 Nov 2006 10:23 pm

In my last posting I shared a script I put together to use with the shortcut application Quicksilver to quickly create to-do items in iCal. I am now using this all the time. I have separate scripts based on the same code: one creates a medium priority “READ: ” to-do for things I want to put on my reading to-do list, another creates a medium priority “LOOKUP: ” to-do to remind myself of things I want to look up at some point (assigned to a different color calendar), and a third creates a high priority to-do in its own iCal calendar.

Now, what if I’m away from home without my laptop, but have access to a nearby terminal with an internet connection, and I want to create to-dos for myself? I sync iCal with my Palm Pilot, so I could easily write the to-do directly into my Palm. However, Palms are still not fun to use when you have a lot to write down. Today, I was in the Fung library reading some Chinese historical journals and wanted to remind myself to look up a few things I found there and come back and read some articles I didn’t have time for today. There were internet terminals all over the place, so instead of entering the titles (in Chinese) on the Palm, which is a pain and time consuming, I emailed myself some messages with the subject “LOOKUP: ” or “READ: ” followed by what I wanted to look up or read. In the body of the email I put more information, such as a URL or notes to myself. I decided that tonight when I came home I would hack up an AppleScript that used Apple’s mail filter to automatically find those messages as they arrived in my inbox, create the appropriate to-do in iCal (like the Quicksilver script) with the subject line of the email, put the body of the email into the to-do’s notes, and then move the message into an archived folder. The results are below.
(more…)
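The routing logic described above (match a subject prefix, then build a to-do from the subject and body) is independent of AppleScript and can be sketched in Ruby. This is an illustration only, not the script from the post: the rule table, the calendar names, and the method name todo_from_email are hypothetical, and the sketch stops short of talking to Mail or iCal.

```ruby
# Hypothetical rules: subject prefix => where the to-do should go.
# Calendar names are assumptions; the medium priority follows the post.
TODO_RULES = {
  "READ: "   => { calendar: "Reading", priority: :medium },
  "LOOKUP: " => { calendar: "Lookup",  priority: :medium }
}.freeze

# Return a to-do description for a matching email, or nil if the
# subject does not start with one of the known prefixes.
def todo_from_email(subject, body)
  prefix, rule = TODO_RULES.find { |candidate, _| subject.start_with?(candidate) }
  return nil unless prefix
  {
    summary:  subject,          # the subject line becomes the to-do title
    notes:    body,             # the email body becomes the to-do's notes
    calendar: rule[:calendar],
    priority: rule[:priority]
  }
end
```

A mail rule would call something like this for each incoming message, create the to-do when the result is not nil, and then archive the message.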


Creative Commons License