Arabic NLP Resources

for the Arabic WordNet Project



William BLACK,
Sabri ELKATEB
School of Informatics
University of Manchester
Sackville Street, Manchester, M60 1QD,
w.black@manchester.ac.uk,
sabri.elkateb@manchester.ac.uk
Manuel BERTRAN, Xavier FARRERES, David FARWELL, Reda HALKOUM, Horacio RODRIGUEZ
Polytechnic University of Catalonia,
horacio@lsi.upc.edu,
mbertran@lsi.upc.edu
Musa ALKHALIFA,
Tànit ASSAF,
M.A. MARTI,
University of Barcelona
Gran Via 585, 08007-Barcelona
{musa,tanit}@thera-clic.com,
amarti@ub.edu
Piek VOSSEN
Irion Technologies
Delftechpark 26, 2628XH,
Delft, The Netherlands
piek.vossen@irion.nl
Adam PEASE
Articulate Software Inc,
420 College Ave
Angwin, CA 94508
apease@articulatesoftware.com
Christiane FELLBAUM
Princeton University,
Department of Psychology,
Green Hall, Princeton, NJ 08544
fellbaum@clarity.princeton.edu
Table of Contents

1. INTRODUCTION
2. OPEN DOMAIN LEXICAL RESOURCES
2.1 Arabic Monolingual Corpora.
2.2 Arabic/English/... Parallel Corpora.
2.3 Arabic Monolingual Dictionaries and Lexicons
2.4 Arabic/English Bilingual dictionaries and lexicons
2.4.1 Printed bilingual dictionaries
2.4.2. On-line MRD (Machine Readable Dictionaries)
2.5 Lexicons obtained from (selective) access to online MT systems:
2.6 Stopwords
2.7 Gazetteers
2.8 Online newspapers
2.9 On line Press Agencies
2.10 List of verbs
2.11 List of roots
2.12 Electronic Books
3. DOMAIN RESTRICTED LEXICAL RESOURCES
3.1 Medical domain
3.2 Agriculture and related domains
3.3 Psychology
3.4 Hydrology
3.5 Urbanism
3.6 Chemistry
3.7 Zoology
3.8 Mathematics
3.9 Islamic terms
3.10 Finance and Banking
3.11 Botany
4. OTHER LINGUISTIC RESOURCES
4.1 Arabic Conjugators
4.2 Arabic Dependency Treebank
5. ARABIC NL PROCESSORS
5.1. Morphological Analyzers
5.2. Stemmers
5.3. Root extractors
5.4 Transliterators
5.5 Other Arabic NL Processors
6. OTHER ARABIC NL TOOLS
7. SLIGHTLY COMMENTED BIBLIOGRAPHY
7.1 Theses
7.2. Articles
8. OTHER LINKS
9. MISCELLANEOUS


1. INTRODUCTION


This report is a guide to the resources (linguistic data as well as linguistic processors and tools) that have been used, tried, or at least considered for use during the development of the Arabic WordNet (AWN).
Our intention is to maintain an evolving document for the duration of the project, to which new resources and new comments or assessments of existing items can be added on the fly. This initial version 0 will therefore be followed, we hope, by increasingly useful versions.
The report is not intended to be a complete survey of Arabic NLP resources and tools. We have focused on resources related to the needs of AWN and on free resources.
For more in-depth information on Arabic NLP resources, beyond the content of this report and the links included in it, the following references may be useful:
http://www.ccls.columbia.edu/cadim/links.html
http://www.nemlar.org/
http://cf.linguistlist.org/cfdocs/new-website/LL-WorkingDirs/search/search-all-res2.cfm?res=All&AppLanguageId=43&search1=search1
Resource repositories that are not Arabic-specific, but include valuable Arabic resources and tools, can be found at:
- Arabic Gigaword http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T12
- Arabic Gigaword Second Edition: http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T02
http://www.elsnet.org/
- An-Nahar Newspaper Text Corpus http://www.elda.org/catalogue/en/text/W0027.html
- DixAF (Bilingual Dictionary French Arabic, Arabic French) http://www.elda.org/catalogue/en/text/M0040.html
- Arabic Data Set http://www.elda.org/catalogue/en/text/W0030.html
- "Le Monde Diplomatique" Text corpus in Arabic http://www.elda.org/catalogue/en/text/W0030.html

Additional useful information and links can be found on the Web pages of a number of people and institutions:
- http://www-nlp.stanford.edu/links/statnlp.html: an annotated list of resources for statistical NLP and corpus-based computational linguistics
- http://www.siglex.org/: the Special Interest Group on the Lexicon of the ACL (Association for Computational Linguistics)
- http://sites.univ-lyon2.fr/langues_promodiinar/Accueil.htm
- http://elsap1.unicaen.fr/

A useful recent survey (very extensive but mainly focussing on commercial products) is:
- Mahtab Nikkhou, Khalid Choukri (2005), Survey of Arabic Language Resources and Tools in the Mediterranean Countries. Nemlar Report, March 2005
For the sake of completeness, a slightly commented bibliography of Arabic NLP is included in Section 7.

2. OPEN DOMAIN LEXICAL RESOURCES


2.1 Arabic Monolingual Corpora.


- From LDC several corpora are available.
- This corpus must be pre-processed in order to use it for probability estimation. To this end normalization and light stemming should be sufficient (see available tools for this purpose below).
- Nijmegen Corpus: http://www.let.kun.nl/wba/Content2/1.4.5_Nijmegen_Corpus.htm
- News articles
- Arabic Corpus
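
The pre-processing mentioned above (normalization followed by light stemming, then counting for probability estimation) can be sketched as follows. This is a minimal illustration and not one of the stemmers listed in Section 5.2: the affix lists, the minimum stem length and the function names are our own assumptions, chosen only to show the idea.

```python
import re
from collections import Counter

# Short vowels, tanween, shadda, sukun (U+064B..U+0652) and dagger alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character

# Illustrative affix lists (an assumption, not an official inventory).
# Longer prefixes come first so that "wa+al" is tried before "al" or "wa".
PREFIXES = (
    "\u0648\u0627\u0644",  # wa + al
    "\u0628\u0627\u0644",  # bi + al
    "\u0643\u0627\u0644",  # ka + al
    "\u0641\u0627\u0644",  # fa + al
    "\u0627\u0644",        # al
    "\u0644\u0644",        # li + (a)l
    "\u0648",              # wa
)
SUFFIXES = (
    "\u0647\u0627",  # -ha
    "\u0627\u0646",  # -an
    "\u0627\u062A",  # -at
    "\u0648\u0646",  # -un
    "\u064A\u0646",  # -in
    "\u064A\u0647",  # -yh
    "\u0647",        # -h
    "\u064A",        # -y
)


def normalize(token):
    """Remove diacritics and tatweel, and unify common spelling variants."""
    token = DIACRITICS.sub("", token).replace(TATWEEL, "")
    token = re.sub("[\u0622\u0623\u0625]", "\u0627", token)  # alef variants -> bare alef
    token = token.replace("\u0649", "\u064A")  # alef maqsura -> ya
    token = token.replace("\u0629", "\u0647")  # ta marbuta -> ha
    return token


def light_stem(token, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= min_len:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_len:
            token = token[:-len(suffix)]
            break
    return token


def unigram_probabilities(text):
    """Relative frequency of each light stem in a whitespace-tokenized text."""
    stems = [light_stem(normalize(tok)) for tok in text.split()]
    counts = Counter(stems)
    total = sum(counts.values())
    return {stem: n / total for stem, n in counts.items()}
```

Conflating surface variants in this way reduces data sparseness, at the cost of occasionally merging unrelated words; a real stemmer from Section 5.2 should be preferred for actual experiments.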

2.2 Arabic/English/... Parallel Corpora.


http://157.150.97.21/dgaacs/unterm.nsf
http://lib-thesaurus.un.org/LIB/DHLUNBISThesaurus.nsf/$$searcha?OpenForm
http://arabiCorpus.byu.edu is being designed to allow students and scholars to search large untagged Arabic corpora for words and structures. It provides information on word frequency, citations giving 10 words before and 10 words after, and information on collocates of the word in question.
http://english.aljazeera.net/HomePage
www.aps.dz New web page with interesting parallel articles.
http://termweb.unesco.org/Default.asp?admin=1&internet=1
http://www.fao.org
http://www.arabization.org.ma/dictionnaire.asp
ftp://ftp.microsoft.com/developr/msdn/newup/Glossary 27 Mb of Computer Science Glossary
- Environmental Terms - A Trilingual Glossary by Yaron Batit.
http://www.clsp.jhu.edu/ws99/projects/mt/
http://enlil.ff.cuni.cz/veda/projekty/clara.htm

2.3 Arabic Monolingual Dictionaries and Lexicons



http://www.malayin.com/laut.asp?catid=2
http://fadakbooks.com/ardia.html
http://dictionary.sakhr.com/
http://qamoos.sakhr.com/intro/introles.asp?lex_id=6 (in Arabic)
http://www.lsi.upc.es/~halkoum/aralisan.php
- Vegetables http://www.tammar.4t.com/vegta.htm
- Plants http://www.tammar.4t.com/herb.htm
- Fruits http://www.tammar.4t.com/fruit.htm
http://www.khayma.com/roqia/nabaway.HTM (5 old books online)

2.4 Arabic/English Bilingual dictionaries and lexicons


2.4.1 Printed bilingual dictionaries


From the large quantity of dictionaries that are available, the most relevant sources for this section are:
http://www.malayin.com/laut.asp?catid=2
The list includes the popular Al Mawrid English-Arabic Dictionary and Al Mawrid Arabic-English Dictionary (printed version with CD-ROM).
http://www.bibliomonde.com/pages/fiche-auteur.php3?id_auteur=1508
- Dictionnaire Larousse Saturne arabe - français / français – arabe. Publishing House Larousse: 150,000 words and phrases
The Dutch Language Union, Amsterdam
The MSA-Dutch dictionary is based on a corpus of 3,000,000 words. Mark Van Mol compiled the lexical database and may have an electronic version of this dictionary. He is the director of the Leuven Group (Belgium) and has many publications in ANLP.
http://www.kuleuven.ac.be/ilt/arabic/index_en.htm mark.vanmol at ilt.kuleuven.ac.be

For many, this dictionary is one of the most important. For many years it was perhaps the only one in use, and it has been quoted by numerous English-language authors.
It can be found at:
www.spokenlanguage.com or http://www.amazon.com/gp/reader/0879500034/ref=sib_dp_pt/103-9733622-2675046
We consider this dictionary necessary for our needs.
www.let.kun.nl
This dictionary must be considered to be very important and useful.

2.4.2. On-line MRD (Machine Readable Dictionaries)


In this section we deal with a large quantity of information that is continuously changing and being updated.
http://www.jccm.es/educacion/atenc_div/diccionario_arabe/

www.StudyQuran.co.uk
A complete version is available online for free (even though www.aramedia.com sells it for over $450).
This is the largest lexicon available, comprising 8 volumes (about 3200 pages). The dictionary’s author spent over 30 years compiling it.
As is usual for Arabic dictionaries, it is organized by roots, and it is also available online.
Although this dictionary was expected to be finished months earlier, it only became available in December 2005. Because heavy online traffic was expected, the maintainers announced that the links would sometimes not work properly.

It contains 122,920 entries in XML format including Arabic proper names and it is organized as follows: Arabic word # Part of Speech # English word
http://crl.nmsu.edu/~ahmed/
http://crl.nmsu.edu/Resources/lang_res/arabic.html
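
The record layout described above (Arabic word # Part of Speech # English word) lends itself to a very simple loader. The sketch below assumes that each entry has already been flattened to one `#`-separated line; the XML wrapping of the actual resource is not described here, and the POS tag values and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Entry(NamedTuple):
    arabic: str
    pos: str
    english: str


def parse_entry(line: str) -> Entry:
    """Split one 'Arabic # POS # English' record into its three fields."""
    fields = [field.strip() for field in line.split("#")]
    if len(fields) != 3 or not all(fields):
        raise ValueError(f"expected 'Arabic # POS # English', got: {line!r}")
    return Entry(*fields)


def build_index(lines: Iterable[str]) -> Dict[str, List[Entry]]:
    """Index entries by lower-cased English word, skipping blank lines."""
    index = defaultdict(list)
    for line in lines:
        if line.strip():
            entry = parse_entry(line)
            index[entry.english.lower()].append(entry)
    return dict(index)


# Hypothetical sample records (the tags "N" and "V" are assumptions).
sample = [
    "\u0643\u062A\u0627\u0628 # N # book",
    "\u0643\u062A\u0628 # V # write",
]
index = build_index(sample)
```

Indexing by the English side is the direction needed when looking up Arabic candidates for English WordNet entries. An English gloss that itself contains `#` would break this simple split.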

www.mesiti.it/arabic/search_dict.asp
This dictionary was created by a group of teachers from Italy. The user can enter Arabic or English words, although the search is performed on roots. Plurals and feminine forms of adjectives and nouns are also provided where necessary, using a manify function (JavaScript).
www.arabsun.de
It also allows the user to enter English and Arabic words using an Arabic keyboard.
http://www.jccm.es/educacion/atenc_div/diccionario_arabe
http://www.edu365.com/agora/dic/catala_arab/



- Bidirectional dictionary ($49.9). Free sample.
- English <-> Arabic lexicon ($159).
- French <-> Arabic dictionary (800 words).







http://qamoos.sakhr.com/intro/mgz01.asp


This has information about the Sakhr English-Arabic dictionary and useful information on Arabic grammar and Arabic language technology in general.



http://lib-thesaurus.un.org/LIB/DHLUNBISThesaurus.nsf/$$searcha?OpenForm


2.5 Lexicons obtained from (selective) access to online MT systems:





- http://www.sakhr.com/Sakhr_e/Products/Idrisi.htm?Index=2&Main=Products&Sub=Idrisi
- http://qamoos.sakhr.com/idrisidic_1.asp?Sentence=car

2.6 Stopwords


Some papers and tools related to Arabic stopwords:
http://www.lemurproject.org/
The toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The system is written in the C and C++ languages, and is designed as a research system to run under Unix operating systems, although it can also run under Windows.






2.7 Gazetteers











2.8 Online newspapers



Title Web page Country

El Khabar http://www.elkhabar.com/accueil/ Algeria
El Moudjahid http://www.elmoudjahid-dz.com/ Algeria
Ech-Chaab http://www.ech-chaab.com/ Algeria
El Rai http://www.elrai.com/ Algeria
El Watan http://www.elwatan.com/ Algeria
El Youm http://www.el-youm.com/ Algeria

Al Moharer http://al-moharer.freeservers.com/ Australia

Al Ayam http://www.alayam.com/ Bahrain

Al Ahram http://www.ahram.org.eg/ Egypt
Al Shaab http://www.alshaab.com/ Egypt

Asharq Al-Awsat http://www.asharqalawsat.com/ England

Al Ahali http://www.ahali-iraq.com/ Iraq
Al-Efyaa Iraq

Alquds Daily Newspaper http://www.alquds.com/ Israel
Arabynet (Yediot Achronot) www.arabynet.com Israel

Ad Dustour http://www.addustour.com/Default/Default.asp Jordan
Al Arab Al Yawm http://www.alarabalyawm.net/ Jordan
Al Rai http://www.alrai.com/ Jordan
Albawa http://www.albawa.com/ Jordan

Al Qabas http://www.alqabas.com.kw/ Kuwait
Al Rai Al Aam http://www.alqabas.com.kw/ Kuwait
Al Seyassah http://www.alarabalyawm.net/ Kuwait
Al Watan http://www.alwatan.com.kw/ Kuwait

Al Anwar http://www.alanwar.com/ar/ Lebanon
Al Liwaa http://www.aliwaa.com/ Lebanon
Al Mustaqbal http://www.almustaqbal.com/ Lebanon
Annahar http://www.annaharonline.com/ Lebanon
An-Nahar http://www.annahar.com.lb/ Lebanon
As-Safir http://assafir.com/iso/today/front/summary.html Lebanon

Al Fajr al Jadid http://www.alfajraljadeed.com Libya
Al Jamahiriyah http://www.aljamahiria.com/ Libya
Al Shames http://www.alshames.com/ Libya
Libyan Press http://www.libyanpress.com/ Libya

Al Watan http://www.alwatan.com/ Oman
Oman Arabia Daily http://www.omandaily.com/ Oman

Al Ayyam Palestine
Al hayat Al Jadedah http://www.alhayat-j.com/ Palestine
Al Manar http://www.manar.com/ Palestine
Al Quds http://www.alquds.com/ Palestine
Al Sabar http://www.hanitzotz.com/alsabar/ Palestine

Fasl Al-Maqal http://www.fasl-almaqal.com/ Qatar
Al Watan http://www.al-watan.com/ Qatar
Al-Sharq http://www.al-sharq.com/site/topics/index.asp?cu_no=1&temp_type=44 Qatar
Raya http://www.al-sharq.com/site/topics/index.asp?cu_no=1&temp_type=44 Qatar

Akhbar http://www.elakhbar.org.eg/ Saudi Arabia
Al Hayat http://www.alhayat.com/ Saudi Arabia
Al Itidal http://www.ynh.com/al-itidal/ Saudi Arabia
Al Jazirah http://www.ynh.com/al-itidal/ Saudi Arabia
Al Madinah http://www.almadinah.com/ Saudi Arabia
Al Riyadh Saudi Arabia
Al-Aalam Al-Islami http://www.muslimworldleague.org/paper/1875/index.htm Saudi Arabia
Asharq Al-Awsat http://www.asharqalawsat.com/ Saudi Arabia
Okaz http://www.okaz.com.sa/okaz/ Saudi Arabia

Adaraweesh http://members.tripod.com/~adaraweesh/ Sudan
Al Rayaam http://www.rayaam.net/ Sudan
Alosbua Sudanese Daily http://alosbua.com/alosbua/ Sudan
Alray Alaa’m http://www.rayaam.net/ Sudan
Alwaan http://www.qatartop.com/ Sudan

Al Baath http://www.albaath.news.sy/epublisher/user/ Syria
Al Bawaba http://www.albawaba.com/ar/countries/Syria/ Syria
Al Furat http://furat.alwehda.gov.sy/ Syria
Al Jamahir http://jamahir.alwehda.gov.sy/ Syria
Al Maukef Al Riadi http://riadi.alwehda.gov.sy/_View_news2.asp?FileName=2748410720060206235922 Syria
Al Ouruba http://ouruba.alwehda.gov.sy/ Syria
Al Thawra http://www.thawra.com/ Syria
Al Wahda http://www.thawra.com/data/wehda/ Syria
Teshreen http://tishreen.info/ Syria

Assabah Tunisia
Essahafa http://www.tunisie.com/LaPresse/ Tunisia

Akhbar Al Arab http://www.akhbaralarab.co.ae/ United Arab Emirates
Al Bayan http://www.albayan.ae/servlet/Satellite?pagename=Bayan/Page/BayanPage&c=Page&cid=1039065325549 United Arab Emirates
Al Ittihad http://www.alittihad.co.ae/ United Arab Emirates
Al Khaleej http://www.alkhaleej.co.ae/ United Arab Emirates

26 September http://www.26september.com/ Yemen
Al Mathak http://www.gpc.org.ye/mathak.htm Yemen
Al Thagafiah http://www.y.net.ye/althaqafiah/ Yemen
Al-Ayyam http://www.al-ayyam.info/ Yemen
Al-Gumhuryah http://www.y.net.ye/al-gumhuryah/ Yemen
Al-Sahwa http://www.alsahwa-yemen.net/ Yemen
Al-Shoura http://www.y.net.ye/shoura/ Yemen
Al-Thawrah http://www.althawra.gov.ye/ Yemen
Al-Wahdawi http://www.alwahdawi.net/ Yemen
Attariq Yemen
Naba Al Hakekah http://www.y.net.ye/naba/ Yemen
Ray http://www.ray-yem.com/ Yemen



2.9 On line Press Agencies

www.aps.dz //Fr En Ar
http://www.afp.fr/arabic/home/ //Fr En Ar Sp
http://www.map.ma/ar //Fr En Ar Sp
http://www.arabic.xinhuanet.com/arabic/index.htm //En Ar
http://news.bbc.co.uk/hi/arabic/news/ //En Ar
Of the agencies listed, only the Chinese and Algerian agencies lay out parallel articles.

2.10 List of verbs


http://www.verba.org/verbi_utf8/all_verbs_index_ar.html
The database covers 620 verbs and includes, for each verb, the vowelized forms of the imperative, conditional and jussive.


2.11 List of roots


A list of triliteral and quadriliteral roots organized in Arabic alphabetical order, compiled by Tim Buckwalter but not available on his web page www.qamus.org.
We found it at: www.angelfire.com/tx4/lisan/roots1.htm

http://www.openburhan.com/ob_main_frame.html
We will use this list to generate a corpus automatically. Instead of extracting the root from a word, we take the opposite step: starting from the roots and the various pattern forms, we reconstitute a lexicon.
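
The generation step (from roots and patterns to candidate word forms) can be sketched as below. This is only an illustration under our own assumptions: roots are written in a Latin transliteration with one character per consonant, and patterns are strings in which the digits 1 to 4 mark the slots for the root consonants. The pattern notation and the function names are hypothetical, not a format used by any of the resources listed here.

```python
SLOTS = "1234"  # digits in a pattern stand for the 1st..4th root consonant


def arity(pattern):
    """Number of root consonants a pattern expects (its highest slot digit)."""
    return max((int(ch) for ch in pattern if ch in SLOTS), default=0)


def apply_pattern(root, pattern):
    """Fill the numbered slots of a pattern with the consonants of a root."""
    out = []
    for ch in pattern:
        if ch in SLOTS:
            idx = int(ch) - 1
            if idx >= len(root):
                raise ValueError(f"pattern {pattern!r} needs more consonants than {root!r} has")
            out.append(root[idx])
        else:
            out.append(ch)
    return "".join(out)


def generate_lexicon(roots, patterns):
    """Map every generated form to the (root, pattern) pair that produced it."""
    lexicon = {}
    for root in roots:
        for pattern in patterns:
            if arity(pattern) == len(root):
                lexicon[apply_pattern(root, pattern)] = (root, pattern)
    return lexicon


# Triliteral root k-t-b and quadriliteral root t-r-j-m (transliterated).
forms = generate_lexicon(["ktb", "trjm"], ["1a2a3a", "1A2i3", "ma12uw3", "1a23a4a"])
```

Such a procedure overgenerates, since not every root combines with every pattern, so the candidate forms would still need to be checked against a corpus or a dictionary before being accepted into the lexicon.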

2.12 Electronic Books


A large collection of free Arabic books can be found at:
http://www.almeshkat.net/books/index.php (2282 books)
http://www.al-eman.com/Islamlib/
http://tafsir.org/books/menu.php?action=new


3. DOMAIN RESTRICTED LEXICAL RESOURCES


UNTERM, the United Nations Terminology Database (AR-EN-ES-FR-RU-ZH): this has 70,000 entries in the 6 official languages, and its content can be extracted because queries return long lists of words in English and Arabic. It covers over 80 different domains:


COUNTRY NAME, AIDS, agriculture, atmospheric science, biodiversity, bioscience, budget and management, cartography and geography, child welfare, climate change, codes and regulations, communication, core concept, culture, declarations, demographics, development, disarmament, disasters, discrimination, documents, economics, education and training, energy, environment, export controls and sanctions, finance, fisheries, food, forestry, functional and other titles, geoscience, governance, Greek, habitat, health and medicine, human rights, humanitarian issues, indigenous peoples, information technology, intellectual property, international law, international relations, international trade, labour, landmines and mine action, Latin, law enforcement, law of the sea, logistics and supplies, meetings, migrations and refugees, military abbreviations, military issues, multilateral instruments, narcotic drugs, national law, natural resources, nuclear science, oceanography, organizational structure, peace and security, peace operations, plans of action and initiatives, political life, poverty, religions, science and technology, set phrases, small arms, social issues, space, staff matters, statistics, TALOS, terrorism, transport and communications, water, weapons of mass destruction, women.





UNESCOTERM Search (AR-DE-EN-ES-FR-RU-ZH) :

This can be used as a reference and its content can be extracted. It includes terms related to UNESCO (administrative and financial terms, education, conferences and meetings, etc.).


UNESCO Structures, Superseded UNESCO Structures, Institutions: (IGOs, NGOs, Networks, Systems, Foundations), IOC: Titles, Terms and Acronyms, Administrative and Financial Terms, International (Days, Weeks, Years and Decades), Campaigns and Appeals, UNESCO's Member States, UNESCO's Standard-Setting Instruments, International Prizes, (Non-Member States, Non-Self-governing Territories, Dependent Territories etc.), UNESCO Chairs, Miscellaneous, UN and International Legal Instruments, UNESCO Functions and Titles, (Conferences, Meetings etc.), Terms in the field of Education, (UNESCO's Programmes, Projects, Initiatives), (International Programmes, Projects, Initiatives), Former Institutions: (IGOs, NGOs, Networks, Systems, Foundations)



3.1 Medical domain


This has the Unified Medical Dictionary (UMD) from the World Health Organization along with its specialized UMD dictionaries, which cover more than 70 domains. Entries are arranged in alphabetical order within each domain, and all the English entries can be browsed page by page with their Arabic equivalents. All medical terms were approved by the Arab Academies in Cairo, Damascus, Baghdad and Amman, which also made sure that the Arabic terms were selected carefully in accordance with a very strict, clear, simplified and user-friendly methodology. An electronic version of this edition is available on CD-ROM for Windows, and comprises about 150 000 terms.


A copy of this CD-ROM is available from khayat@emro.who.int



The domains include (numbers refer to the number of entries in the domains sampled):

All specialised UMD dictionaries: Abbreviations (799 entries), Acidology (1669), Acronyms (248), Anatomy (2000), Anesthesiology (484), Anthropology and anthropometrics (1427), Bacteriology (1827), Biochemistry and Chemistry (2000), Biology, Biomedical engineering, Biomedical ethics, Biostatistics, Blood transfusion medicine, Botany, Cardiology and cardiovascular surgery, Cell biology, Demography, Dentistry (2000), Dermatology, Diagnostics (symptoms & signs), Embryology & teratology, Emergency medicine, Endocrinology & metabolism, Entomology, Environmental health, Enzymology and Zymology, Family and community medicine, Food safety, Forensic medicine, Gastroenterology, Genitourinary medicine, venereology and STDs, Health services, Helminthology, Hematology, Histology, Hospital administration, Immunology, Infectious diseases, Informatics, Laboratory medicine, Maternal and child health, Measures, Microbiology, Mycology, Nephrology, Neurology, Nutrition and dietetics, Obstetrics and gynecology, Occupational medicine, industrial medicine, Oncology, Ophthalmology and optics, Orthopedics, Otorhinolaryngology, Parasitology, Pathology, Pediatrics, Pharmacology and therapeutics, Physiatrics and physical medicine, Physiology, Prefixes, Preventive medicine, Public health, community medicine and hygiene, Reproductive health, Sexology, Suffixes, Surgery, Taxonomy, nosology and classification (1118), Toxicology, Transplantation, Tropical medicine, Virology, WHO managerial terms (2000), Zoology (997).




3.2 Agriculture and related domains



You can download a copy of AGROVOC from http://www.fao.org/aims/ag_download.htm

Each descriptor has its equivalent in other languages. Descriptors are indexing terms which consist of one or more words representing one and the same concept. Non-descriptors are terms which help the user to find the appropriate descriptor(s). Non-descriptors are followed by a reference (USE operator) to the descriptor, which is the preferred term. For indexing purposes, it is important that only descriptor terms are used.
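
The descriptor / non-descriptor mechanism described above can be sketched as a small lookup structure: a non-descriptor is resolved through its USE reference, and only descriptors are kept for indexing. The class, its method names and the sample terms are our own illustrative assumptions and do not reflect the actual AGROVOC file formats or its real content.

```python
class Thesaurus:
    """Minimal model of descriptors and USE references (illustrative only)."""

    def __init__(self, descriptors, use_refs):
        self.descriptors = set(descriptors)
        self.use_refs = dict(use_refs)  # non-descriptor -> preferred descriptor
        for non_descriptor, target in self.use_refs.items():
            if target not in self.descriptors:
                raise ValueError(f"USE target {target!r} of {non_descriptor!r} is not a descriptor")

    def resolve(self, term):
        """Return the descriptor for a term, following a USE reference if needed."""
        if term in self.descriptors:
            return term
        if term in self.use_refs:
            return self.use_refs[term]
        raise KeyError(term)

    def index_terms(self, terms):
        """Reduce a list of terms to the sorted set of descriptors used for indexing."""
        return sorted({self.resolve(term) for term in terms})


# Hypothetical sample data, not taken from AGROVOC itself.
thesaurus = Thesaurus(
    descriptors={"maize", "irrigation"},
    use_refs={"corn": "maize"},
)
```

With this structure, `thesaurus.index_terms(["corn", "maize", "irrigation"])` keeps a single entry for the two synonymous terms, which is the behaviour required for indexing.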



The AGROVOC website states clearly that AGROVOC is free of charge for educational or other strictly non-commercial purposes.
AGROVOC is available for downloading in MySQL, TagText, ISO2709 and Microsoft Access formats. To download the AGROVOC database for off-line use, please send your request to fao-agris-caris@fao.org. When sending the request please specify the following: Full Name, Email, Organisation, Reason for downloading AGROVOC, Comments. AGROVOC is also available through web services. More information available here: http://www.fao.org/aims/ag_webservices.jsp



3.3 Psychology






3.4 Hydrology



http://www.disclic.unige.it/glos_idro/indice.php?list=0&lang=ar&style=1

http://www.cemagref.fr (2000 entries, PDF format)

3.5 Urbanism


Habitat and Urbanism Glossary (AR-EN-FR): This has 3850 Arabic-English-French entries in PDF.



3.6 Chemistry


Elementymology & Elements Multidict (MULTI): This is a multilingual dictionary of the names of chemical elements in many languages. There are alphabetical and numerical lists. Clicking on the name of an element brings the element information page up in the main window. It can be used as a reference.


3.7 Zoology


Zoology Dictionary (EN>AR): This has 2500 terms in alphabetical order.


3.8 Mathematics


Glencoe Online: This is a multilingual Mathematics Glossary (AR-EN-ES-KO-RU-UR-VI-ZH) in PDF files, in the form of an alphabetical list with glosses.


3.9 Islamic terms



3.10 Finance and Banking




3.11 Botany


  • Spices http://stephkup.nexenservices.com/epices/affichage/liste.htm

4. OTHER LINGUISTIC RESOURCES


4.1 Arabic Conjugators



This allows the online generation of individual verb forms (I to X) for Arabic verbs from given tri-consonantal roots (entered in Arabic letters).



  • Arabic word form generator. Rudolf W. Meijer


This is more complete than the above. It uses Latin characters for entering the Arabic root and works offline. It has been downloaded and it runs on Windows.


  • Jerzy Łacina Poland (MS-Dos programs)

  • Muhallil a simple analyzer of Arabic verbs. Musarrif a simple generator of Arabic verbs.
http://www.staff.amu.edu.pl/~lacina/page4.html
http://www.verba.org/verbi_utf8/all_verbs_index_ar.html
  • Conjugation of Arabic verbs
www.freshmeat.net
  • fa.ala: This is a tool that conjugates Arabic verbs

  • Morfix Arabic Search: a multilingual search engine using Arabic morphology and cross-language search.


www.morfix.il


An interesting and useful online tool. Arabic Morfix offers extensive morphological search and is a standalone search engine. The tool is a demonstration based on a collection of 200 articles containing general news items from various sources. Its search takes the following features into account: context sensitivity, expanded morphological search, thesaurus search, and queries entered in Latin transcription for Arabic names.





  • Off line Conjugator


www.geocities.com/effel_dahling



  • aConCorde: concordance program for Arabic by Andrew Roberts. A multilingual tool for processing a corpus. It has been downloaded and tested. It works.


www.comp.leeds.co.uk
www.freshmeat.net


Concordancers are tools whose main tasks are searching, sorting and classifying words. They are a real help in manipulating corpora.
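
The core of such a tool can be sketched in a few lines. The following is a minimal keyword-in-context illustration, assuming the corpus is already tokenized; it is not the implementation of aConCorde, and the function names and the default window of 10 words are our own choices.

```python
from collections import Counter
from typing import List, Tuple

# One hit: (left context, keyword, right context)
Hit = Tuple[List[str], str, List[str]]


def concordance(tokens: List[str], keyword: str, window: int = 10) -> List[Hit]:
    """Search: collect every occurrence of keyword with its surrounding words."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((left, token, right))
    return hits


def sort_by_right_context(hits: List[Hit]) -> List[Hit]:
    """Sort: order the hits by the words that follow the keyword."""
    return sorted(hits, key=lambda hit: hit[2])


def collocates(hits: List[Hit]) -> Counter:
    """Classify: count the words that co-occur with the keyword."""
    counts = Counter()
    for left, _, right in hits:
        counts.update(left)
        counts.update(right)
    return counts
```

Because the functions only compare and slice token lists, they work unchanged on Arabic script or on transliterated text; matching inflected variants of the keyword would additionally require a stemmer or morphological analyzer.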


4.2 Arabic Dependency Treebank

http://www.ircs.upenn.edu/arabic/


5. ARABIC NL PROCESSORS


5.1. Morphological Analyzers


  • Sebawai
Morphological Analyzer (Kareem Darwish)
  • Xerox
http://www.xrce.xerox.com/competencies/content-analysis/arabic/
  • Aramorph

  • Buckwalter
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004L02
  • Morphological Analyzer ($590)
http://www.cimos.com/



5.2. Stemmers


  • Al-Stem
- Light stemmer (Kareem Darwish)
  • Light10
- Larkey
  • Chen
  • Khoja

5.3. Root extractors


  • gendic: reduces Arabic words to their roots
www.freshmeat.net

5.4 Transliterators


  • Jtransliterator: a tool that transliterates Arabic script to Latin script

www.freshmeat.net
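
As an illustration of what a transliterator does, the sketch below converts between Arabic script and the Buckwalter transliteration (the scheme handled by the buckwalter2unicode script mentioned in Section 6). It is not the scheme or the code of Jtransliterator, which is not described here; the function names are our own.

```python
def _span(start, latin):
    """Map consecutive Unicode code points, beginning at start, to Latin symbols."""
    return {chr(start + i): symbol for i, symbol in enumerate(latin)}


# Arabic -> Buckwalter, built from two contiguous Unicode ranges.
AR2BW = {}
AR2BW.update(_span(0x0621, "'|>&<}AbptvjHxd*rzs$SDTZEg"))  # hamza .. ghain
AR2BW.update(_span(0x0640, "_fqklmnhwYyFNKaui~o"))         # tatweel .. sukun
AR2BW["\u0670"] = "`"  # dagger alef
AR2BW["\u0671"] = "{"  # alef wasla

# Buckwalter -> Arabic is the inverse mapping.
BW2AR = {latin: arabic for arabic, latin in AR2BW.items()}


def to_buckwalter(text):
    """Transliterate Arabic script to Buckwalter, leaving other characters as is."""
    return "".join(AR2BW.get(ch, ch) for ch in text)


def from_buckwalter(text):
    """Convert Buckwalter transliteration back to Arabic script."""
    return "".join(BW2AR.get(ch, ch) for ch in text)
```

The mapping is one-to-one, so the conversion is reversible. Note that `from_buckwalter` treats every Latin letter it knows as transliteration, so it should only be applied to text that contains no ordinary Latin-script words.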

5.5 Other Arabic NL Processors


  • Syntactic Analyser ($990)
http://www.cimos.com/


6. OTHER ARABIC NL TOOLS


Illinois Institute of Technology Information Retrieval System: online querying of the TREC Arabic collection
http://www.ir.iit.edu:8180/arabic-interface/index.html
  • Arabeyes: It includes various resources.

http://www.arabeyes.org/ Arabeyes is a meta project aimed at fully supporting the Arabic language in the Unix/Linux environment. It is designed to be a central repository for standardizing the Arabization process. Arabeyes relies on voluntary contributions by computer professionals and enthusiasts from all over the world.
- Katoob: editor of Arabic texts
- Mozilla: Arabization of Mozilla
- ITL: Islamic tools (data calculus,…)
- BiCon: console in Arabic
- Quran: tools for reading the Quran
- QaMoose: on-line access to a dictionary (information extracted from the word list)
- Akka: Arabization of Linux consoles
- Arabbix: Arabized Linux Live-CD
- Bayani: Arabized scientific plotter
- Distros: Arabized Linux distributions
- Duali: orthographical corrector
- FreeBSD: FreeBSD Arabization

- lala: a localization tool for Linux Arabic support
- conv_ara_html: a tool for converting Arabic numeric character references
- PostArabic: Arabic shaping for PostgreSQL
- ToIpt: PHP class for writing Farsi and Arabic text on images
- mule: multilingual emacs
- ClearlyU: BDF fonts usable for Unicode text
- Arabeske: an arabesque-like pattern design tool
- buckwalter2unicode: a Python script to convert from Buckwalter to Unicode
- Encode::Arabic: Perl module that can convert from and to some Arabic encodings (including Buckwalter, araTeX, …)
- FriBidi: a free implementation of the Unicode Bidi algorithm




7. SLIGHTLY COMMENTED BIBLIOGRAPHY


Bibliography on Arabic Linguistics http://www.lib.umich.edu/area/Near.East/ALSLING.html
Selective Bibliography on Arabic Grammar and linguistics http://www.lib.umich.edu/area/Near.East/WFischerBibliography.pdf

7.1 Theses



Ahmed Farouk Ahmed. () Developing an Arabic Parser in a Multilingual Machine Translation System. Master Thesis. Cairo University (with Prolog code)
Azza Abd El-Moniem Mohamed. Machine Translation of Noun Phrases: From English to Arabic. Master Thesis. Cairo University.
Kadri Y., Benyamina A. (1992). “Un système d’analyse syntaxico-sémantique du langage arabe non voyellé”, Mémoire d’ingénieur, Université d’Oran.
Kareem Darwish (2003). Probabilistic Methods for Searching OCR-Degraded Arabic Text. PhD Thesis.
  • Sebawai Morphological Analyzer
  • Al-Stem Light stemmer
Mohamed Attia and Mohamed Elaraby Ahmed (2000). A large-scale computational processor of the Arabic morphology and application. Master Thesis, Cairo University
  • MORPHO3 morphological analyzer: 4000 roots, 1000 patterns
Mona Diab (2003). Word Sense Disambiguation within a Multilingual Framework. PhD Thesis.

R. Al-shalabi (1996). Design and implementation of an Arabic morphological system to support natural language processing. Ph.D. Dissertation. Computer Science Department, Illinois Institute of Technology. Chicago, 1996.

Sabri Elkateb (2005) Design and implementation of an English Arabic dictionary/editor. PhD thesis, The University of Manchester, United Kingdom.


7.2. Articles


Abdelhadi Soudi, Violetta Cavalli-Sforza () "Interfacing an Arabic Morphology and sentence generation system with an English-to-Arabic knowledge-based Machine Translation System".
  • KANT MT system

Abdelhadi Soudi, Violetta Cavalli-Sforza, Abderrahim Jamari (2002a) "The Arabic Noun System Generation", in Proceedings of the International Conference on Arabic Processing, Manouba University, Tunisia.
  • Arabic broken plural
  • Lexeme-based model for broken plural
  • Implemented in Morphe

Abdelhadi Soudi, Violetta Cavalli-Sforza, Abderrahim Jamari (2002b) "A Prototype English-to-Arabic Interlingua-based MT System", in Proceedings of the Processing of Arabic Workshop, Language Resources Evaluation Conference, Las Palmas, Spain.
Abdelhadi Soudi, Jim Cowie, Hamdy S. Soliman (1999) "Interfacing an Arabic Morphological Generator with an Interlingua-based Machine Translation System", Carnegie Mellon University, USA.

Abdelmajid Ben Hamadou (1986) "A Compression Technique for Arabic Dictionaries: The Affix Analysis" COLING 1986
  • Morphological Analyzer
Ahmed Rafea, Khaled Shaalan (1993) "Lexical Analysis of Inflected Arabic Words using Exhaustive Search of an Augmented Transition Network". Software Practice & Experience, Vol 23 (6), pp. 567-588.
  • [Begin_1 | Begin_1] + Stem + [Last_1] + [Last_2] + [Last_3]
  • ATN implemented in Pascal
  • 5 registers
  • 17 flags
  • types of rules
  • Some details are given


Alexander Fraser, Jinxi Xu, Ralph Weischedel (2002) "Cross-Lingual Retrieval at BBN", TREC 2002


Allan Ramsay, Hanady Mansur (2000) "Arabic Morphology: a categorial approach"

  • Recover diacritics missing in MSA texts

Allan Ramsay, Hanady Mansur (2004) "The parser from an Arabic Text-to-speech system", Le traitement automatique de l'arabe, JEP-TALN, Fes, 19-21 April 2004
  • Sign-based system


Alshalabi, R. and Evens, M. (1998). "A Computational Morphology System for Arabic", In Workshop on Computational Approaches to Semitic Languages COLING-ACL98, August 16, Montreal, 1998.

Azza Abdel Monem, Khaled Shaalan, Ahmed Rafea, Hoda Baraka. () "A Proposed Approach for Generating Arabic from Interlingua in a Multilingual Machine Translation System"

  • Nespole
  • Grammar rules: Cavalli-Sforza, Soudi
  • Morphological rules: Timothy

Azzah Al-Maskari and Mark Sanderson, "The effect of Machine Translation on the performance of Arabic-English QA System"
Black, W. J., and Elkateb, S. (2004) A Prototype English-Arabic Dictionary Based on WordNet, Proceedings of 2nd Global WordNet Conference, GWC2004, Czech Republic, 67-74.
  • AE bilingual WN
  • Good editor
  • Using Prolog for WN navigation


Black, W., Elkateb, S., Rodriguez, H, Alkhalifa, M., Vossen, P., Pease, A. and Fellbaum, C., (2006). Introducing the Arabic WordNet Project, in Proceedings of the Third International WordNet Conference, Sojka, Choi, Fellbaum and Vossen eds.

Beesley, K. R. and L. Karttunen: (2000) ‘Finite-State Non-Concatenative Morphotactics’. In: Proceedings of the fifth workshop of the ACL special interest group in computational phonology, SIGPHON-2000. Luxembourg.
Beesley, K. R. and L. Karttunen (2003). Finite-State Morphology: Xerox Tools and Techniques. Cambridge University Press.

Berg, H. (2001) ‘Computers and the Qur’ān’. In: J. D. McAuliffe (ed.): Encyclopaedia of the Qur’ān, Vol. One. Leiden–Boston–Köln: Brill, pp. 391–395.
Chen, A., Gey, F. (2002). "Building an Arabic Stemmer for Information Retrieval". The Eleventh Text Retrieval Conference (TREC 2002)
  • Two light stemmers:
      MT-based light stemmer (similar to Larkey's)


Chiang, David, Mona Diab, Nizar Habash, Owen Rambow and Safi Sharif. 2006. Parsing Arabic Dialects. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics. Trento, Italy.

Chowdhury, A., Aljlayl, M., Jensen, E., Beitzel, S., Grossman, D., Frieder, O. (2002). "IIT at TREC 2002 Linear Combinations Based on Document Structure and Varied Stemming for Arabic Retrieval." The Eleventh Text Retrieval Conference (TREC 2002)

  • Two stemmers:
      pattern-based (how to get the patterns)
      deeper light stemmer
Darwish, K., Oard, D. W. (2002). “CLIR experiments at Maryland for TREC-2002: Evidence combination for Arabic-English retrieval”, Eleventh Text Retrieval Conference (TREC 2002).
Mona Diab.() "Feasibility of Bootstrapping an Arabic WordNet Leveraging Parallel Corpora and an English WordNet"
Mona Diab (2004) "An Unsupervised Approach for Bootstrapping Arabic Sense Tagging"

Diab, Mona, Kadri Hacioglu and Daniel Jurafsky. Automated Methods for Processing Arabic Text: From Tokenization to Base Phrase Chunking. Book Chapter. In Arabic Computational Morphology: Knowledge-based and Empirical Methods. Editors Antal van den Bosch and Abdelhadi Soudi. Kluwer/Springer Publications, 2007.

Diab, Mona, Kadri Hacioglu and Daniel Jurafsky (2004). Automatic Tagging of Arabic Text: From raw text to Base Phrase Chunks. In Proceedings of HLT-NAACL 2004.
Dichy, J. (2001) "On Lemmatization in Arabic – A Formal Definition of the Arabic Entries of Multilingual Lexical Databases," Proc. of the Workshop on Arabic Language Processing, Toulouse, 2001.
Dichy, J. / A. Farghaly (2003) “Roots & Patterns vs. Stems: on what grounds should a multilingual database centred on Arabic be built?”, in Proceedings of the MT Summit IX Workshop on Machine Translation for Semitic Languages: Issues and Approaches, September 23, 2003, New Orleans, Louisiana, U.S.A.

Elkateb, S., Black, W., Rodriguez, H., Alkhalifa, M., Vossen, P., Pease, A. and Fellbaum, C. (2006). Introducing a WordNet for Arabic, in Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy.

El-Sadany, T. A. and M. A. Hashish (1989) “An Arabic Morphological System.” In IBM Systems Journal, Vol. 28, No. 4, 600-612, 1989.

Feddagi, A., (1992) ‘Arabic Morpho-syntax and semantic parsing’, Department of Computer Science, University of Manchester, 3rd International Conference on Multilingual, 10-12 Dec., 1992, Univ. of Durham.


Franz, M., McCarley, J. S. (2002). "Arabic Information Retrieval at IBM". The Eleventh Text Retrieval Conference (TREC 2002).
  • Presentation of two models for cross-language IR (English queries, Arabic documents)


George Anton Kiraz (1994) "Computational Analysis of Arabic Morphology." In Narayanan A. and Ditters E. (eds) The linguistic Computation of Arabic

  • multi-tape two level FST
  • grammars and sample lexicon are included

George Anton Kiraz (1998) "Arabic Computational Morphology in the West." In Proceedings of the 6th International Conference and Exhibition on Multi-lingual Computing, Cambridge, 1998.



Habash, Nizar, Owen Rambow and George Kiraz. Morphological Analysis and Generation for Arabic Dialects. In Proceedings of the Workshop on Computational Approaches to Semitic Languages at the Conference of American Association for Computational Linguistics (ACL'05).

Habash, Nizar and Owen Rambow. Arabic Tokenization, Morphological Analysis, and Part-of-Speech Tagging in One Fell Swoop. In Proceedings of the Conference of American Association for Computational Linguistics (ACL'05).

Habash, Nizar. Large Scale Lexeme Based Arabic Morphological Generation. In Proceedings of Traitement Automatique du Langage Naturel (TALN-04). Fez, Morocco, 2004.

Habash, Nizar and Owen Rambow. MAGEAD: A Morphological Analyzer and Generator for the Arabic Dialects. In Proceedings of COLING-ACL, Sydney, Australia, 2006 (Main Volume).

Habash, Nizar and Owen Rambow. A Morphological Analyzer for MSA and the Arabic Dialects. Presented at the Arabic Linguistic Society annual meeting, Kalamazoo. 2006.

Habash, Nizar. "Arabic Morphological Representations for Machine Translation." Book Chapter. In Arabic Computational Morphology: Knowledge-based and Empirical Methods. Editors Antal van den Bosch and Abdelhadi Soudi. Kluwer/Springer Publications, 2007.

Habash, Nizar, Bonnie Dorr and Christof Monz. Challenges in Building an Arabic Generation-heavy Machine Translation System and Extending it with Statistical Components. In Proceedings of the Association for Machine Translation in the Americas (AMTA-2006).

Habash, Nizar and Fatiha Sadat. Arabic Preprocessing Schemes for Statistical Machine Translation. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), New York, 2006.

Habash, Nizar, Abdelhadi Soudi, and Tim Buckwalter. "On Arabic Transliteration." Book Chapter. In Arabic Computational Morphology: Knowledge-based and Empirical Methods. Editors Antal van den Bosch and Abdelhadi Soudi. Kluwer/Springer Publications, 2007.

Habash, Nizar. "On Arabic and its Dialects," Multilingual Magazine. Volume 21 Number 3. 2006.

Habash, Nizar and Owen Rambow. Extracting a Tree Adjoining Grammar from the Penn Arabic Treebank. In Proceedings of Traitement Automatique du Langage Naturel (TALN-04). Fez, Morocco, 2004.

Habash, Nizar, Clinton Mah, Randy Calistri-Yeh, Sabiha Imran and Paraic Sheridan. The Design and Validation of an Arabic Conceptual Interlingua for Information Retrieval. In Proceedings of the International Conference on Language Resources and Evaluation (LREC). 2006.

Haidar M. Harmanani, Walid T. Keirouz, Saeed Raheel () "A rule-based extensible stemmer for information retrieval with application to Arabic".
Hasnah, A. / Evens, M. (2001), “Arabic/English Cross Language Information Retrieval Using a Bilingual Dictionary”, in: Proceedings of the ACL/EACL 2001 Workshop on Arabic Language Processing: Status and Prospects, July 6, 2001, Toulouse, France.
Hassan Sawaf, Jörg Zaplo, Hermann Ney (2001) "Statistical Classification Methods for Arabic News Articles"

  • Character 3-grams and full words
  • Maximum entropy
  • Document clustering
  • Mutual Information


HLAL Y. (1987) ‘Information system and Arabic: the use of Arabic in information system’, Linguistics and Signal & information processing, A subsidiary of Harper & Row publishing, Inc. 191-197, 1987.


Hudson, G. (1986) "Arabic Root and Pattern Morphology without Tiers" Journal of Linguistics, 22:85-122.


Imad A. Al-Sughaiyer and Ibrahim A. Al-Kharashi. () "Arabic Morphological Analysis Techniques: A Comprehensive Survey". 25 pages; very good. See the Sakhr link there.
Imad A. Al-Sughaiyer and Ibrahim A. Al-Kharashi. () "Rule Parser for Arabic Stemmer"
Jawad Berri, Hamza Zidoum and Yacine Atif (2001), "Web-based Arabic Morphological Analyzer." In: A. Gelbukh (ed.): CICLing 2001, No. 2004 in Lecture Notes in Computer Science.
John Maloney and Michael Niv. () "TAGARAB: A Fast, Accurate Arabic Name Recognizer Using High-Precision Morphological Analysis".


Judith Dror. () "Morphological Tagging of the Qur’an", Department of Arabic Language and Literature, University of Haifa.
Kadri, Y. (2003) “Recherche d’information translinguistique sur les documents en arabe”, Rapport de prédoctoral, DIRO, Université de Montréal.
Kenneth R. Beesley (1996). "Arabic Finite-State Morphological Analysis and Generation".
  • Using Xerox tools for Arabic morphology
Kenneth R. Beesley (1998). "Arabic Morphological Analysis on the Internet", in Proceedings of the International Conference on Multi-Lingual Computing (Arabic & English), Cambridge, G.B., 17-18 April, 1998.
  • Using Xerox tools for Arabic morphology

Kenneth R. Beesley (2001). "Finite-State Morphological Analysis and Generation of Arabic at Xerox Research: Status and Plans in 2001".
  • Using Xerox tools for Arabic morphology
Kareem Darwish, Douglas W. Oard () "Term Selection for Searching Printed Arabic"
Kareem Darwish, Douglas W. Oard () "Probabilistic Structured Query Methods"
Kazem Taghva, Rania Elkhoury, Jeffrey S. Coombs (2005) "Arabic Stemming Without A Root Dictionary". ITCC (1) 2005: 152-157. More works by Kazem can be found in: http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/t/Taghva:Kazem.html

  • ISRI proposal: Khoja's without root dictionary
  • Complete Algorithm (without pattern sets) is provided

Khoja, Shereen and Garside, Roger (1999) "Stemming Arabic Text", Computing Department, Lancaster University, Lancaster, 1999. http://www.comp.lancs.ac.uk/computing/users/khoja/stemmer.ps
  • System of Arabic stemming. Accuracy over 96%

Larkey, Leah S., Ballesteros, Lisa, and Connell, Margaret. (2002) "Improving Stemming for Arabic Information Retrieval: Light Stemming and Co-occurrence Analysis" In Proceedings of the 25th Annual International Conference on Research and Development in Information Retrieval (SIGIR 2002), Tampere, Finland, August 11-15, 2002, pp. 275-282. http://ciir.cs.umass.edu/pubfiles/ir-249.pdf
  • Improves previous approaches to Arabic stemming using co-occurrence statistics

  • Participation in TREC-11, Light1, Light2, Light3, Light8

Larkey, Leah S. and Connell, Margaret, (2002) "Arabic Information Retrieval at UMass in TREC-10" In Voorhees, E.M. & Harman, D.K. (Eds.), The Tenth Text Retrieval Conference, TREC 2001, NIST Special Publication 500-250, pp. 562-570. http://ciir.cs.umass.edu/pubfiles/ir-254.pdf
  • Participation of UMass in TREC-10 Cross-language track,
  • INQUERY + Language Modelling (LM)
  • Arabic corpus, normalization, using Khoja stemmer
  • several resources
  • AFP Arabic Corpus 383,872 documents
  • Ectaco Dictionary
  • Sakhr Dictionary
  • Sakhr SET MT
  • Place Name Lexicon
  • Stop words

Larkey, Leah S., Ballesteros, Lisa, and Connell, Margaret. (2005) "Light Stemming for Arabic Information Retrieval"
  • Light10
  • Lemur toolkit
  • Affix Removal
  • Statistical Techniques
  • See references
  • Good description of tools (stemmers & morphological analyzers)

Maamouri, Mohamed, Ann Bies, Tim Buckwalter, Mona Diab, Nizar Habash, Owen Rambow, Dalila Tabessi. Developing and Using a Pilot Dialectal Arabic Treebank. In Proceedings of the International Conference on Language Resources and Evaluation (LREC). 2006.

Mahtab Nikkhou, Khalid Choukri (2005) "Survey on Arabic Language Resources and Tools in the Mediterranean Countries", Nemlar Report, March 2005.
Mark Sanderson, Asaad Alberair (2001) "Keep it simple Sheffield - a KISS approach to the Arabic track".

  • Using almisbar, ajeeb MT systems




Rambow, Owen, Bonnie Dorr, David Farwell, Rebecca Green, Nizar Habash, Stephen Helmreich, Eduard Hovy, Lori Levin, Keith J. Miller, Teruko Mitamura, Florence Reeder, Advaith Siddharthan. Parallel Syntactic Annotation of Multiple Languages. In Proceedings of the International Conference on Language Resources and Evaluation (LREC). 2006.

Snider, Neal and Mona Diab. Unsupervised Induction of Modern Standard Arabic Verb Classes Using Syntactic Frames and LSA. In Proceedings of the Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics (ACL-Coling'06). Sydney, Australia. 2006.

Snider, Neal and Mona Diab. Unsupervised Induction of Modern Standard Arabic Verb Classes. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), New York, 2006.

René Schneider, Thomas Mandl, and Christa Womser-Hacker () "Integration of Arabic to a Cross-Lingual Retrieval Tool: Challenges and Perspectives".
Riyad Al-Shalabi and Martha Evens (1998) "A computational morphology system for Arabic". In Michael Rosner, editor, Proceedings of the Workshop on Computational Approaches to Semitic languages, pages 66–72, Montreal, Quebec, August. COLING-ACL’98.
Saliba, B. and Al Dannan, A. (1989) “Automatic Morphological Analysis of Arabic: A study of Content Word Analysis”, In Proceedings of the Kuwait Computer Conference, Kuwait, March 3-5, 1989.


Sabri El-Kateb, William J. Black (2004) "English-Arabic Dictionary for translation"
Sabri El-Kateb, William J. Black (2001) "Towards the design of English-Arabic terminological and lexical knowledge base"
Schramm, G. (1962), An Outline of Classical Arabic Verb Structure, Language vol. 38, pp. 360-75.
Shereen Khoja (2001) "APT: Arabic Part-of-speech Tagger" Proceedings of the Student Workshop at the Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL2001), Carnegie Mellon University, Pittsburgh, Pennsylvania. June 2001. http://www.comp.lancs.ac.uk/computing/users/khoja/NAACL.pdf

  • Tagger for Arabic.
  • Mixed statistic + rule based
  • Trained from a corpus of 50,000 words manually tagged
  • Accuracy 90%

Shereen Khoja, Roger Garside and Gerry Knowles (2001) "An Arabic Tagset for the Morphosyntactic Tagging of Arabic", Corpus Linguistics 2001, Lancaster University, Lancaster, UK, March 2001. To appear in a book entitled A Rainbow of Corpora: Corpus Linguistics and the Languages of the World, edited by Andrew Wilson, Paul Rayson, and Tony McEnery; Lincom-Europa, Munich. http://www.comp.lancs.ac.uk/computing/users/khoja/CL2001.pdf
  • Proposed tagset for Arabic Language
  • Hierarchical tagset
  • 177 tags

Smets, M. (1998). "Paradigmatic Treatment of Arabic Morphology", In Workshop on Computational Approaches to Semitic Languages COLING-ACL98, August 16, Montreal, 1998.
Soudi, A. (2004) "Challenges in the Generation of Arabic from Interlingua".
Soudi, A. (1999), "Interfacing an Arabic Morphological Generator with an Interlingua-based Machine Translation System", MS. Carnegie Mellon University, USA.


Soudi, A., Eisele, A. (2004) "Generating an Arabic Full-Form Lexicon for Bidirectional Morphology Lookup", in Proceedings of Language Resources Evaluation Conference (LREC), Lisbon, Portugal.
Soudi, A., Cavalli-Sforza, V., Jamari, A. (2001), "A Computational Lexeme-based Treatment of Arabic Morphology", in Proceedings of The Arabic Processing Workshop, Association For Computational Linguistics, Toulouse, France, 2001.


Tomlinson, S. (2002) "Experiments in Named Page Finding and Arabic Retrieval with Hummingbird." Eleventh Text Retrieval Conference (TREC 2002)



Violetta Cavalli-Sforza, Abdelhadi Soudi, and Teruko Mitamura () "Arabic Morphology Generation Using a Concatenative Strategy"

  • Regular and Hollow verbs in detail
  • Using MORPHE for writing rules

Youssef Kadri & Jian-Yun Nie (1992) "Traduction des requêtes pour la recherche d’information translinguistique anglais-arabe". IR Laboratoire RALI, Département d’informatique et de recherche opérationnelle, Université de Montréal
Zahed Ahmed () "Arabic weak verb formulation and computation".
  • Arabic weak verb formulation using FST implemented in Prolog

Zajac, R. and Casper, M. (1997) “The Temple Web Translator”, 1997. Available at: http://www.crl.nmsu.edu/Research/Projects/tide/papers/twt.aaai97.html

Abdelghani Bellaachia: a list of his works is available at: http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/b/Bellaachia:Abdelghani.html

More works by Leah Larkey are available at: http://ciir.cs.umass.edu/~larkey

JEP-TALN (2004), Traitement Automatique de l’Arabe, Fès, 20 avril 2004

8. OTHER LINKS


http://129.69.218.213/arabtex/doc/arabdoc.pdf ArabTeX link document on typesetting Arabic, Hebrew, etc.
http://cpan.uwinnipeg.ca/dist/Encode-Arabic Encode-Arabic, a Perl extension for Arabic encodings, available for download.
http://www.arabic-domains.org/intrnational-entites.php Links to International entities concerned with Arabic domain names for completely Arabic internet
http://www.arabismo.com/ Arabic resources list
http://www.alburaq.net/dictionary1/transform.cfm Online dictionary: English-Arabic and Arabic-English, with a search facility for Arabic words by root or free search.
http://literary.ajeeb.com/ (Registration needed). Only links to different sites related to Sakhr and its Arabic solutions, like Tarjim, Johaina (news), Siraj (text mining), etc.
http://english.ajeeb.com/ (Registration needed). In English. On-line literary dictionary; virtual keyboard. Only links to different sites related to Sakhr and its Arabic solutions, like Tarjim, Johaina (news), Siraj (text mining), etc.
http://www.cimos.com/
  • Can translate full text and words.
  • Multilingual NLP tools (English, French, Arabic….)



http://www.lexicool.com/ Lists of several resources of many languages and different language pairs.
http://www.al-bab.com/arab/comp2.htm Provides links to several resources like dictionaries, keyboard layouts, translation software, etc.
http://www.languageguide.org/arabic/ In Arabic. Visual vocabulary classified by subject.
http://wordnet.princeton.edu/links WordNet WEB-GUIs
www.memodata.com Alexandria: an application that looks up a word in a dictionary when the word is clicked in a web page. Covers several source and target languages.

9. MISCELLANEOUS


  • Workshops (for saving all the proceedings)

Atlas 1999, Arabic Translation and Localization Symposium (University of Tunis)
ACL Workshop on Arabic Language Processing: Status and Prospects (2001)
ACL Workshop on Computational Approaches to Semitic Languages (2002, University of Pennsylvania)
TAL 06, France (EURAR project DICO may be ongoing)
  • Gateways

ayna.com
alltheweb.com
alidrisi.com
hahoua.com
google.au (interesting Arabic google version)