Lemmatizer.org | European languages lemmatizer
libMAFSA documentation

libMAFSA is an implementation of a minimal finite state automaton used as storage for dictionaries. In particular, it implements Jan Daciuk's automaton building algorithm. The main difference between our implementation and others is its small memory requirement. MAFSA stands for Minimal Acyclic Finite State Automaton. You can use it for any of your needs. The Turglem lemmatizer uses libMAFSA to determine the paradigm number and flexia (inflectional ending) of a given word.

The core entity of the library is the MAFSA_letter type, which represents a letter of the automaton's alphabet. You can combine these letters into words, feed them to the automaton, and then check whether a given word exists in it. You can also search for one or more words by prefix.

To create and use an automaton, you first need to define its alphabet. The usage example in the source package defines an English alphabet that contains the 26 letters from «A» to «Z», the symbol «-», the symbol «'», and the special symbol «DELIMITER», which helps to implement a get-something-by-keyword feature (something like std::map<keyword, ...>).

Contact e-mail: lemmatizer@mail.ru (spammers are welcome, I always order your products nowhere. :) )
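To make the alphabet described above more concrete, here is a minimal sketch in C of how such a 29-symbol alphabet could be encoded. It does not use the libMAFSA API: the type demo_letter, the DEMO_* constants, the functions demo_char_to_letter and demo_encode, and the choice of «|» as the text form of DELIMITER are all hypothetical names and assumptions made for illustration. The real MAFSA_letter definition and the conversion code in the source package may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a letter of the automaton alphabet.
   This is NOT the library's MAFSA_letter definition. */
typedef uint8_t demo_letter;

enum {
    DEMO_LETTER_A          = 0,   /* 'A'..'Z' map to 0..25 */
    DEMO_LETTER_Z          = 25,
    DEMO_LETTER_DASH       = 26,  /* '-' */
    DEMO_LETTER_APOSTROPHE = 27,  /* '\'' */
    DEMO_LETTER_DELIM      = 28,  /* DELIMITER; written as '|' here (assumption) */
    DEMO_LETTER_COUNT      = 29
};

/* Map one ASCII character to a letter code.
   Returns -1 if the character is not in the alphabet. */
static int demo_char_to_letter(char c)
{
    if (c >= 'a' && c <= 'z') return c - 'a';   /* case-insensitive */
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c == '-')  return DEMO_LETTER_DASH;
    if (c == '\'') return DEMO_LETTER_APOSTROPHE;
    if (c == '|')  return DEMO_LETTER_DELIM;
    return -1;
}

/* Encode a NUL-terminated string into letters.
   Returns the number of letters written, or -1 if the string contains
   a character outside the alphabet or does not fit into 'out'. */
static long demo_encode(const char *s, demo_letter *out, size_t cap)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        int l = demo_char_to_letter(*s);
        if (l < 0 || n >= cap) return -1;
        out[n++] = (demo_letter)l;
    }
    return (long)n;
}
```

One plausible reading of the DELIMITER symbol is that a key and its value are stored as a single word of the form keyword, DELIMITER, value. A prefix search for keyword followed by DELIMITER would then return all values attached to that keyword, which gives the map-like behaviour mentioned above.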