Lemmatizer.org | European languages lemmatizer
Libraries documentation
Using C++ wrapper

First of all, we create a tl::lemmatizer object and load the dictionaries. If loading fails, the lemmatizer throws an exception with full diagnostic output:
#include <cstdio>
#include <exception>
#include <turglem/lemmatizer.hpp>
// ...
tl::lemmatizer lem;
try
{
    // Paths to the dictionary, paradigms and prediction files.
    lem.load_lemmatizer("/usr/local/share/turglem/russian/dict_russian.auto",
                        "/usr/local/share/turglem/russian/paradigms_russian.bin",
                        "/usr/local/share/turglem/russian/prediction_russian.auto"
    );
}
catch (const std::exception &e)
{
    // e.what() carries the diagnostic output of the failed load.
    printf("EXCEPTION: %s\n", e.what());
    return -1;
}
// ...
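If the loading step is needed in several places, the try/catch can be folded into a small helper. The following is only a sketch: dict_paths, try_load and fake_lemmatizer are hypothetical names that are not part of turglem. The only assumptions taken from the example above are that load_lemmatizer() accepts three paths and reports failure with a std::exception.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper type (not part of turglem): groups the three file
// paths that load_lemmatizer() receives.
struct dict_paths
{
    std::string dict;
    std::string paradigms;
    std::string prediction;
};

// Works with any type that has a three-argument load_lemmatizer() which
// throws std::exception on failure. Returns true on success; on failure
// stores the diagnostic text in `error` and returns false.
template <class Lemmatizer>
bool try_load(Lemmatizer &lem, const dict_paths &p, std::string &error)
{
    try
    {
        lem.load_lemmatizer(p.dict.c_str(), p.paradigms.c_str(), p.prediction.c_str());
        return true;
    }
    catch (const std::exception &e)
    {
        error = e.what();
        return false;
    }
}

// Stand-in for the real lemmatizer so the sketch compiles without the
// library: it "fails" whenever the dictionary path is empty.
struct fake_lemmatizer
{
    void load_lemmatizer(const char *dict, const char *, const char *)
    {
        if (dict[0] == '\0')
            throw std::runtime_error("cannot open dictionary");
    }
};

// Usage example with the stand-in type.
inline void demo()
{
    fake_lemmatizer lem;
    std::string error;
    const dict_paths good = {"dict.auto", "paradigms.bin", "prediction.auto"};
    const dict_paths bad = {"", "paradigms.bin", "prediction.auto"};
    assert(try_load(lem, good, error));
    assert(!try_load(lem, bad, error));
    assert(error == "cannot open dictionary");
}
```

Because try_load is a template, the same helper should accept the real lemmatizer object in place of the stand-in, provided its load_lemmatizer() can be called with three C strings.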
Once the dictionaries are loaded, lemmatizing a word s (const char *) in, for example, UTF-8 takes a single call to a template function, with the charset adapter for that encoding passed as the template parameter.
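To illustrate the idea of passing a charset adapter as a template parameter, here is a self-contained sketch. It does not use turglem: latin1_adapter, utf8_adapter and decode_word are hypothetical names, and the adapter interface shown here is an assumption made for the illustration, not the library's actual one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical adapters (illustrative only). Each one decodes a single
// character starting at s (s[0] must not be '\0') and returns the number
// of bytes it consumed.
struct latin1_adapter
{
    static std::size_t decode(const char *s, unsigned &code_point)
    {
        code_point = static_cast<unsigned char>(s[0]);
        return 1;
    }
};

struct utf8_adapter
{
    // Handles 1- and 2-byte sequences only, which covers Latin and
    // Cyrillic letters; anything else becomes U+FFFD.
    static std::size_t decode(const char *s, unsigned &code_point)
    {
        const unsigned char b0 = static_cast<unsigned char>(s[0]);
        if (b0 < 0x80)
        {
            code_point = b0;
            return 1;
        }
        const unsigned char b1 = static_cast<unsigned char>(s[1]);
        if ((b0 & 0xE0) == 0xC0 && (b1 & 0xC0) == 0x80)
        {
            code_point = ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
            return 2;
        }
        code_point = 0xFFFD; // unsupported or malformed sequence
        return 1;
    }
};

// The algorithm is written once; the encoding is chosen at compile time
// through the Adapter template parameter.
template <class Adapter>
std::vector<unsigned> decode_word(const char *s)
{
    std::vector<unsigned> out;
    while (*s != '\0')
    {
        unsigned cp = 0;
        s += Adapter::decode(s, cp);
        out.push_back(cp);
    }
    return out;
}

// Usage example: the same bytes read through two different adapters.
inline void demo()
{
    const char *word = "\xD0\xB4\xD0\xB0"; // two Cyrillic letters in UTF-8
    assert(decode_word<utf8_adapter>(word).size() == 2);
    assert(decode_word<latin1_adapter>(word).size() == 4);
}
```

The design point is that the caller states the encoding of the input once, at the call site, and the word-processing code itself stays independent of the codepage.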
The most common use of the lemmatizer is to get the initial form of a word. To do this, you can simply enumerate all of the lemmatizer's answers:
for (size_t i = 0; i < sz_lem; i++)
{
std::string nform = lem.get_text
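The same enumeration pattern can be sketched without the library. In the sketch below, fake_answers and its initial_form() accessor are hypothetical stand-ins for the lemmatizer's result and for the call that returns the text of answer i. It also assumes that several answers may share one initial form, so repeats are skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the lemmatizer's answers (not turglem's type).
struct fake_answers
{
    std::vector<std::string> initial_forms;

    std::size_t size() const { return initial_forms.size(); }

    // Plays the role of the library call that returns the text of answer i.
    std::string initial_form(std::size_t i) const { return initial_forms[i]; }
};

// Collects the initial form of every answer, keeping the original order
// and skipping forms that were already seen.
template <class Answers>
std::vector<std::string> unique_initial_forms(const Answers &answers)
{
    std::vector<std::string> forms;
    for (std::size_t i = 0; i < answers.size(); i++)
    {
        const std::string nform = answers.initial_form(i);
        if (std::find(forms.begin(), forms.end(), nform) == forms.end())
            forms.push_back(nform);
    }
    return forms;
}

// Usage example: "saw" may be a form of the verb "see", the noun "saw"
// or the verb "saw", so two of the three answers share an initial form.
inline void demo()
{
    const fake_answers answers = {{"see", "saw", "saw"}};
    const std::vector<std::string> forms = unique_initial_forms(answers);
    assert(forms.size() == 2);
    assert(forms[0] == "see");
    assert(forms[1] == "saw");
}
```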
Next time I'll explain how to get the grammatical characteristics of a word.

Contact e-mail: lemmatizer@mail.ru (spammers are welcome, I never order your products anyway. :) )