Steps in creating a tagged Uzbek-English parallel corpus
Keywords:
Corpus, tagging, alignment, parallel corpora, POS tagging, XML encoding
Abstract
This article discusses corpus linguistics, one of the main branches of computational linguistics, together with monolingual and parallel corpora, and describes the stages of creating an Uzbek-English parallel corpus, drawing on international experience in the field of parallel corpora. It also covers priority tasks such as establishing the programming and linguistic principles of the Uzbek-English parallel corpus, the linguistic and extralinguistic tagging of selected units, and the development of an algorithm for creating the corpus. It considers how data for the parallel corpus should be selected, what requirements the data must meet, and what opportunities the Uzbek-English parallel corpus offers researchers and users. Finally, it addresses the linguistic and methodological problems that arise in this process, such as material selection, as well as the programming difficulties of creating a parallel corpus.