Wenlin Conversion Script

Wenlin is the best piece of software around for students of Chinese. Among other tools, it has a powerful and handy offline dictionary with very flexible and fast search options.

I know many students of Chinese who use Wenlin to get their definitions and input vocabulary into flashcard software. Most recently I saw someone do this in a coffee shop here in Taipei, and it brought back a lot of memories of doing the same in Beijing almost a decade ago.

Wenlin doesn’t make it easy, however, to get the word entries into a format that can be imported into flashcard applications. There is no “export” feature, presumably because the developer doesn’t like the idea of large parts of the Wenlin dictionary getting out of the software and into a separate database. The lack of such a feature means that students have to copy and paste words from Wenlin and add their own tabs. In my case, I also like to delete the alternate hanzi to keep my flashcards cleaner.

Although a more experienced programmer with good regular expression skills could easily take this further, I am releasing the results of an evening spent trying to learn how to program in Ruby:

Wenlin Conversion Script 1.3

Here is a screencast explaining how to use the script:

Wenlin Conversion Script Screencast

This script takes a text file containing a list of Wenlin dictionary entries (saved in TextEdit, not in Wenlin) and puts a tab between the hanzi and the pinyin and another between the pinyin and the definition. It then saves the converted file, which can be imported into your favorite flashcard program.
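To give a rough idea of what that conversion amounts to, here is a minimal sketch in Ruby. It is not the code from convert.rb; the method name to_tab_separated and the sample entry are made up for illustration, and it assumes each entry sits on one line as hanzi, then pinyin, then definition, separated by spaces.

```ruby
# Illustrative sketch only, not the actual convert.rb.
# Assumption: one entry per line, laid out as "hanzi pinyin definition".
def to_tab_separated(line)
  # Split on the first two runs of whitespace; everything after
  # the second one is kept together as the definition.
  hanzi, pinyin, definition = line.strip.split(/\s+/, 3)
  [hanzi, pinyin, definition].join("\t")
end

# Hypothetical sample entry; real Wenlin entries may look different.
puts to_tab_separated("你好 nǐhǎo v.p. hello")
```

Splitting on plain whitespace like this is the simplest approach, but it only works when the pinyin itself contains no spaces.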

It is made up of two parts: convert.app, the AppleScript application that you use to run the script, and convert.rb, the Ruby script that does the actual conversion. You can customize three options in convert.rb. Just open it and set the three option variables at the top to true or false according to your preference. The Ruby file describes what each option does, but basically they control whether the alternate traditional/simplified hanzi are removed, whether the “|” character is changed to “Example: ”, and whether the “~” in examples is replaced by the pinyin of the word.
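For readers curious how three true/false options like these could work, here is a hedged sketch. The option names (remove_alternates, label_examples, expand_tilde) and the method apply_options are hypothetical and are not the variable names used in convert.rb. It also assumes the alternate hanzi appear in square brackets right after the headword.

```ruby
# Illustrative sketch only; option and method names are hypothetical.
# Assumption: alternate hanzi are shown in [square brackets] after the word.
def apply_options(hanzi, pinyin, definition,
                  remove_alternates: true,
                  label_examples: true,
                  expand_tilde: true)
  # Option 1: drop the bracketed alternate traditional/simplified form.
  hanzi = hanzi.sub(/\[.*?\]/, "") if remove_alternates
  # Option 2: turn the "|" separator into a readable label.
  definition = definition.gsub(/\|\s*/, "Example: ") if label_examples
  # Option 3: replace the "~" placeholder with the word's pinyin.
  definition = definition.gsub("~", pinyin) if expand_tilde
  [hanzi, pinyin, definition]
end

# Hypothetical sample entry.
fields = apply_options("反动[-動]", "fǎndòng", "adj. reactionary | ~ sīxiǎng")
puts fields.join("\t")
```

Passing false for any of the three keyword arguments leaves that part of the entry untouched, which mirrors setting the corresponding variable to false.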

I haven’t tested this too extensively, so if you see it do strange things with the Wenlin vocab items, let me know and I’ll tweak the script in the future.

UPDATES:

-I just noticed in the screencast that it split the word “fandong fenzi” and put “fenzi” into the definition. I need to update the regular expression so that it looks for the part of speech rather than a space to separate the pinyin from the definition. I didn’t realize that Wenlin sometimes puts spaces into its pinyin words. I’ll release this soon.

-I just updated the script to version 1.1. See the enclosed Read Me file for things I have fixed and changed in this new version of the script.

-I just updated the script again to 1.3. See the Read Me file in the download for the details.
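The part-of-speech idea from the first update note can be sketched roughly like this. This is my own illustration and not necessarily how the released script does it: the list of abbreviations in POS_MARKERS is a hypothetical, incomplete sample, and split_entry is a made-up name.

```ruby
# Illustrative sketch only. POS_MARKERS is a hypothetical, incomplete
# list of the part-of-speech abbreviations that begin a definition.
POS_MARKERS = %w[n. v. adj. adv. v.o. v.p. s.v. m. conj. prep.]
POS_ALTERNATION = POS_MARKERS.map { |m| Regexp.escape(m) }.join("|")
# Match the whitespace that sits directly before a part-of-speech marker.
POS_BOUNDARY = /\s+(?=(?:#{POS_ALTERNATION})\s)/

def split_entry(line)
  # The hanzi still end at the first whitespace.
  hanzi, rest = line.strip.split(/\s+/, 2)
  # The pinyin now ends at the first part-of-speech marker,
  # so spaces inside the pinyin no longer cause a bad split.
  pinyin, definition = rest.to_s.split(POS_BOUNDARY, 2)
  [hanzi, pinyin, definition]
end

# Hypothetical sample entry with a space inside the pinyin.
puts split_entry("反动分子 fǎndòng fènzǐ n. reactionary element").join("\t")
```

An entry whose part of speech is missing from the list would not be split at all, so a real version would need the full set of abbreviations Wenlin uses.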