Evaluation of part-of-speech tagging in the Uzbek language: problems and proposals
Keywords:
Tag, markup, annotation, tagset, NLP, corpus, CLAWS
Abstract
In building a language corpus, constructing the linguistic database
is a central concern because it is both complex and
important. The process of assigning appropriate
identifiers to speech fragments in corpus texts is problematic since language
modeling is associated with the rules and patterns of tagging existing in the
language. Tagging, especially grammatical or part-of-speech (PoS) tagging, is also
a topical issue for Uzbek corpus linguistics, because a special “encoded”
symbol system serves as the primary key to solving NLP problems related
to the Uzbek language. This article reviews studies of tagging and PoS
tagging in world linguistics and considers the current tagging process in
Uzbek linguistics. Based on the rules of the Uzbek language, an alternative
tagset is proposed that takes widely used international tagsets into
consideration.