Lemmatizer.org

Using C++ wrapper

First of all, we create a tl::lemmatizer object and load the dictionaries. If loading fails, the lemmatizer throws an exception with full diagnostic output:

#include <cstdio>      // printf
#include <exception>   // std::exception
#include <turglem/lemmatizer.hpp>

// ...
    tl::lemmatizer lem;
    try
    {
        lem.load_lemmatizer("/usr/local/share/turglem/russian/dict_russian.auto",
            "/usr/local/share/turglem/russian/paradigms_russian.bin",
            "/usr/local/share/turglem/russian/prediction_russian.auto"
        );
    }
    catch (const std::exception &e)
    {
        printf("EXCEPTION: %s\n", e.what());
        return -1;
    }
// ...

After this, to lemmatize a word s (a const char *) encoded in, for example, UTF-8, you only need to call one template function, passing the charset adapter for that encoding as the template parameter:

    tl::lem_result lr;
    size_t sz_lem = lem.lemmatize<russian_utf8_adapter>(s, lr);

The word is now lemmatized. The variable sz_lem holds the number of answers the lemmatizer returned. There can be more than one because a word form may be ambiguous: for example, the word «call» may be a noun or a verb.

The most common use of the lemmatizer is to get the initial (normal) form of a word. To do this, simply iterate over all the lemmatizer answers:

    for (size_t i = 0; i < sz_lem; i++)
    {
        std::string nform = lem.get_text(lr, i, 0);
        printf("\tnormal form: '%s'\n", nform.c_str());
    }
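
Several answers may turn out to share the same normal form, so the loop can print the same string more than once. If you only need the distinct forms, one option is to collect them first and drop the repeats. The following is a minimal sketch, not part of turglem: the helper name unique_normal_forms is hypothetical, it works on plain std::string values only, and it assumes a C++11 compiler.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not a turglem function): returns the distinct
// strings from `forms`, keeping the order of first appearance.
std::vector<std::string> unique_normal_forms(const std::vector<std::string> &forms)
{
    std::unordered_set<std::string> seen;
    std::vector<std::string> result;
    for (const std::string &f : forms)
    {
        // insert() reports in .second whether the string was new to the set
        if (seen.insert(f).second)
            result.push_back(f);
    }
    return result;
}
```

To use it, push each nform from the loop above into a std::vector<std::string> instead of printing it, then print the vector returned by unique_normal_forms.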

Next time I'll explain how to get the grammatical characteristics of a word.


© 2007, Lemmatizer Team.
Contact e-mail: lemmatizer@mail.ru
(spammers are welcome, I always order your products nowhere. :) )