Corpus Tagging: Concept and Domains

Authors

  • Abdullah Alfaifi, Imam Muhammad Ibn Saud Islamic University, Kingdom of Saudi Arabia

DOI:

https://doi.org/10.35682/mjhsc.v38i1.603

Keywords:

Tagging, dictionary, computational technologies, concordancer, lexical entries, corpora, word frequency

Abstract

This paper reviews Corpus Tagging, a topic rarely explored in the Arabic research literature despite its importance in the fields of Linguistics and Natural Language Processing. It defines Corpus and Corpus Tagging, then reviews several studies that investigated Corpus Tagging but did not draw a clear boundary between the types of tags that can be added. The contribution of this paper lies in distinguishing three types of tags that can be added to a corpus: linguistic tags attached to words (Tagging), marking up the structure of the text (Markup), and descriptive data about the corpus (Metadata). The paper also explains the forms each of these types takes and the mechanism for adding them to Arabic language corpora, with examples. Finally, it describes how the three types can be combined in a single corpus, which makes corpora richer and more useful for researchers in Linguistics and Natural Language Processing.
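To make the three-way distinction concrete, here is a minimal sketch of one way the three layers could sit together in a single XML-encoded text. It is an illustration only, not the format described in the paper: the element names (`text`, `header`, `body`, `p`, `w`), the `pos` attribute, the metadata fields and the part-of-speech labels (`VERB`, `NOUN`, `PREP`) are all hypothetical choices. Metadata describes the whole text, markup records its structure (here only paragraphs), and tagging attaches a linguistic label to each word.

```python
import xml.etree.ElementTree as ET


def build_tagged_text(metadata, paragraphs):
    """Combine metadata, structural markup and word-level tags in one XML tree.

    metadata:   dict of descriptive fields (hypothetical names, e.g. title, genre)
    paragraphs: list of paragraphs, each a list of (word, pos_tag) pairs
    """
    root = ET.Element("text")

    # 1. Metadata: descriptive data about the text as a whole, kept in a header
    header = ET.SubElement(root, "header")
    for key, value in metadata.items():
        field = ET.SubElement(header, key)
        field.text = value

    # 2. Markup: the structure of the text (here only numbered paragraphs)
    body = ET.SubElement(root, "body")
    for n, para in enumerate(paragraphs, start=1):
        p = ET.SubElement(body, "p", n=str(n))

        # 3. Tagging: a linguistic label (hypothetical POS tag) on each word
        for word, pos in para:
            w = ET.SubElement(p, "w", pos=pos)
            w.text = word

    return root


if __name__ == "__main__":
    # "The boy went to the school", with illustrative part-of-speech labels
    doc = build_tagged_text(
        {"title": "Sample text", "genre": "news"},
        [[("ذهب", "VERB"), ("الولد", "NOUN"), ("إلى", "PREP"), ("المدرسة", "NOUN")]],
    )
    print(ET.tostring(doc, encoding="unicode"))
```

Keeping the three layers in separate places (a header for metadata, nested elements for structure, attributes on words for linguistic tags) means each can be queried independently, for example selecting all words tagged `NOUN` within news texts only.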

Published

2023-05-29

How to Cite

Alfaifi, A. (2023). Corpus Tagging: Concept and Domains. Humanities and Social Sciences Series Mutah Lil-Buhuth Wad-Dirasat, 38(1). https://doi.org/10.35682/mjhsc.v38i1.603

Section

Articles