::G's Link Page: Japanese Language Parsing

2006-05-18

Japanese Language Parsing

Interesting-looking piece of software, MeCab. I had been wondering how to parse Japanese text into keywords, the way search engines have to. Turns out it's not as easy as splitting text on spaces, as you can in English, since Japanese is written without spaces between words. There are Perl bindings, too.
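To see why splitting on spaces gets you nowhere, here is a small Python sketch. It compares a plain whitespace split with a greedy longest-match lookup against a word list. The word list is made up for this example, and longest-match is just the simplest dictionary approach I could think of, not how MeCab itself works.

```python
# Toy illustration only: this tiny word list is made up for the example.
TOY_DICTIONARY = {"私", "は", "日本語", "日本", "語", "を", "勉強", "します"}


def naive_split(text):
    """Split on whitespace, the way one might for English."""
    return text.split()


def longest_match(text, dictionary=TOY_DICTIONARY):
    """Greedy longest-match segmentation against a word list."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


sentence = "私は日本語を勉強します"
print(naive_split(sentence))    # the whole sentence comes back as one "word"
print(longest_match(sentence))  # the word list breaks it into pieces
```

The whitespace split returns the sentence untouched, while the dictionary lookup at least finds word boundaries. Greedy matching picks the wrong boundary whenever a longer entry happens to overlap the right one, which is presumably why the real parsers weigh alternatives instead.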

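MeCab apparently treats parsing as a probabilistic problem, using "conditional random fields" (a relative of Markov models) to estimate its parameters. Supposedly its design doesn't depend on any particular dictionary or corpus, though as far as I can tell it still needs a dictionary to run. Cool!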
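To get a feel for what parsing with cost or probability data might look like, here is a rough Python sketch that picks the cheapest way to cover a sentence with known words, using a Viterbi-style dynamic program. The costs are numbers I invented, and the function name `min_cost_segmentation` is mine. A real parser would learn its costs from data and, as I understand it, would also score how well adjacent words fit together, which this sketch leaves out.

```python
# Hypothetical word costs (lower = more plausible); not from any real dictionary.
TOY_COSTS = {
    "東": 6, "京": 6, "都": 4,
    "東京": 2, "京都": 2, "東京都": 3,
    "に": 1, "住む": 2,
}


def min_cost_segmentation(text, costs=TOY_COSTS, unknown_cost=10):
    """Return (words, total_cost) for the cheapest segmentation of `text`."""
    n = len(text)
    max_len = max(len(word) for word in costs)
    # best[i] = (total cost, start index of the last word) for the prefix text[:i]
    best = [(0, None)] + [(float("inf"), None)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in costs:
                cost = costs[word]
            elif end - start == 1:
                cost = unknown_cost  # let unknown single characters through
            else:
                continue
            total = best[start][0] + cost
            if total < best[end][0]:
                best[end] = (total, start)
    # Walk the back-pointers from the end to recover the chosen words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return list(reversed(words)), best[n][0]


print(min_cost_segmentation("東京都に住む"))
```

Changing the numbers changes the answer: if "東京都" is made expensive, the cheapest path becomes "東京" followed by "都". That is the appeal of the approach, since the segmentation follows from the data rather than from hand-written rules.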

According to the MeCab page, other parsers include ChaSen, JUMAN, and KAKASI. In my searches, the last of these was cited quite a bit.
