Corpus Tagging: Concept and Domains
DOI:
https://doi.org/10.35682/mjhsc.v38i1.603
Keywords:
Tagging, dictionary, computational technologies, concordancer, lexical entries, corpora, word frequency
Abstract
This paper reviews corpus tagging, a topic rarely explored in the Arabic-language literature despite its importance in linguistics and natural language processing (NLP). It defines the terms corpus and corpus tagging, then reviews several earlier studies of corpus tagging, which did not draw a clear line between the types of tags that can be added to a corpus. The contribution of this paper is to distinguish three such types: linguistic tags attached to words (tagging), marks that record text structure (markup), and descriptive data about the corpus (metadata). The paper explains the forms each type takes and how each is added to Arabic-language corpora, with examples. It also describes how the three types can be combined in a single corpus, which makes the corpus richer and more useful for researchers in linguistics and NLP.
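
As a rough illustration of how the three types might sit together in one corpus document, the following Python sketch builds a small XML record with a metadata header, structural markup for sentences, and a linguistic tag on each word. It is only a sketch under stated assumptions: the element and attribute names (`text`, `header`, `body`, `s`, `w`, `pos`), the helper `build_annotated_text`, and the sample sentence with its part-of-speech labels are hypothetical choices made for this example and are not taken from the paper's own scheme.

```python
import xml.etree.ElementTree as ET


def build_annotated_text(title, language, tagged_sentences):
    """Build one corpus document that carries all three annotation layers.

    All element and attribute names here are illustrative assumptions.
    """
    text = ET.Element("text")

    # Metadata: descriptive data about the document as a whole.
    header = ET.SubElement(text, "header")
    ET.SubElement(header, "title").text = title
    ET.SubElement(header, "language").text = language

    # Markup: structural units (here, a body divided into numbered sentences).
    body = ET.SubElement(text, "body")
    for number, sentence in enumerate(tagged_sentences, start=1):
        s = ET.SubElement(body, "s", n=str(number))
        # Tagging: a linguistic tag attached to each individual word.
        for word, pos in sentence:
            w = ET.SubElement(s, "w", pos=pos)
            w.text = word
    return text


# Hypothetical sample: one Arabic sentence with illustrative POS labels.
sample = [[("ذهب", "VERB"), ("الولد", "NOUN"), ("إلى", "ADP"), ("المدرسة", "NOUN")]]
doc = build_annotated_text("Sample text", "ar", sample)
print(ET.tostring(doc, encoding="unicode"))
```

Keeping the three layers in separate elements means a concordancer could, for instance, filter documents by a header field and then search words by their `pos` attribute without the layers interfering with one another.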