Using Markov models to convert all-caps text to mixed case, and related tasks

Question

I've been thinking about using Markov techniques to restore missing information to natural language text.

Restore all-caps text to mixed-case.
Restore accents/diacritics to text in languages that use them but that has been converted to plain ASCII.
Convert rough phonetic transcriptions back into native alphabets.

These seem to be in order from least to most difficult. Basically, the problem is resolving ambiguities based on context.
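One way to frame all three tasks is as a lossy mapping. Each original word collapses to a key (uppercased, or with accents stripped), and restoring it means choosing among the words that share that key. The sketch below builds such an index. It assumes a plain word list as the vocabulary. The toy list is hypothetical and stands in for something extracted from Wiktionary, and the helper names are illustrative. Each key's candidate set is exactly the ambiguity that context has to resolve.

```python
import unicodedata
from collections import defaultdict


def strip_accents(word: str) -> str:
    """Remove combining diacritical marks, e.g. 'café' -> 'cafe'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(vocabulary, lossy):
    """Map each lossy key to the set of original spellings that produce it."""
    index = defaultdict(set)
    for word in vocabulary:
        index[lossy(word)].add(word)
    return index


# Hypothetical toy vocabulary; in practice this might come from Wiktionary.
vocab = ["US", "us", "Us", "café", "cafe", "résumé", "resume"]

case_index = build_index(vocab, str.upper)        # all-caps -> cased forms
accent_index = build_index(vocab, strip_accents)  # plain ASCII -> accented forms

print(sorted(case_index["US"]))        # ['US', 'Us', 'us']
print(sorted(accent_index["resume"]))  # ['resume', 'résumé']
```

A key with a single candidate can be restored directly. Only keys with several candidates need a context model.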

I could use Wiktionary as a dictionary and Wikipedia as a corpus, using n-grams and Hidden Markov Models to resolve the ambiguities.
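For the all-caps case, an n-gram model and an HMM can be combined roughly as follows. This is a minimal sketch under stated assumptions:

- It uses a bigram model with add-one smoothing.
- It trains on a tiny hypothetical corpus that stands in for Wikipedia text.
- The cased forms are the hidden states and the all-caps tokens are the observations.
- Emission is deterministic: a cased form can only emit its own uppercase.
- It decodes with the Viterbi algorithm.

The function names are illustrative, not from any existing library.

```python
import math
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count unigrams and bigrams, padding each sentence with a start token."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(prev, word, unigrams, bigrams, vocab_size):
    """Add-one (Laplace) smoothed log P(word | prev)."""
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))


def truecase(caps_tokens, candidates, unigrams, bigrams):
    """Viterbi decoding: hidden states are cased forms, observations are ALL-CAPS tokens."""
    vocab_size = len(unigrams)
    # best[w] = (log score, path) of the most likely path ending in cased form w.
    best = {"<s>": (0.0, [])}
    for token in caps_tokens:
        # Only forms whose uppercase equals `token` can emit it;
        # unknown tokens are passed through unchanged.
        options = candidates.get(token, {token})
        best = {
            word: max(
                (score + log_prob(prev, word, unigrams, bigrams, vocab_size), path + [word])
                for prev, (score, path) in best.items()
            )
            for word in options
        }
    return max(best.values())[1]


# Hypothetical toy corpus; a real model would be trained on a large corpus.
corpus = [s.split() for s in [
    "The US army left",
    "they told us to leave",
    "Apple sells the iPhone",
    "I ate an apple",
]]
unigrams, bigrams = train_bigrams(corpus)

# Map each ALL-CAPS key to the cased forms seen in training.
candidates = defaultdict(set)
for word in unigrams:
    if word != "<s>":
        candidates[word.upper()].add(word)

print(truecase("THEY TOLD US".split(), candidates, unigrams, bigrams))  # ['they', 'told', 'us']
print(truecase("THE US ARMY".split(), candidates, unigrams, bigrams))   # ['The', 'US', 'army']
```

The same `US` token decodes differently depending on its left neighbour, which is the context-based disambiguation described above. Accent restoration could reuse the decoder by keying `candidates` on accent-stripped forms instead of uppercase. A real system would need better smoothing and probably a wider n-gram window.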

Am I on the right track? Are there already some services, libraries, or tools for this sort of thing?


asked by Brock Adams, 6 August 2011 at 16:27

0 answers
