Arabic Dialect Processing
Published Dec 1, 2006 · Mona T. Diab, Nizar Habash, Ńawilatan ζadīdatan
Abstract
The existence of dialects for any language poses a challenge for Natural Language Processing (NLP) in general, since it adds another set of dimensions of variation from a known standard. The problem is particularly interesting and challenging in Arabic and its dialects, where the divergence from the standard could, in some linguistic theories, warrant classification as a different language. This problem would not be as pronounced if Standard Arabic were a living language; however, it is not. Any realistic and practical approach to processing Arabic will have to account for dialectal usage, since it is so pervasive. In this tutorial, we will attempt to highlight different dialectal phenomena, how they diverge from the standard, and why they pose challenges to NLP. Our tutorial will have four parts. First, we will give a background overview of issues in Standard Arabic NLP. Then, we will present a high-level, generic view of dialects and the aspects of them that are of interest to the NLP community, addressing both text and speech issues in addition to standardization issues. In the third and fourth parts of the tutorial, we will focus in depth on two aspects of dialect processing, namely dialectal morphology and dialectal syntactic parsing. Throughout the presentation we will refer to the different resources available and draw contrastive links with Standard Arabic and English. We will provide links to recent publications and available toolkits/resources for all four sections.