Arabic NLP Resources for the Arabic WordNet Project



William BLACK,
Sabri ELKATEB
School of Informatics
University of Manchester
Sackville Street, Manchester, M60 1QD,
w.black@manchester.ac.uk,
sabri.elkateb@manchester.ac.uk
Manuel BERTRAN, Xavier FARRERES, David FARWELL, Reda HALKOUM, Horacio RODRIGUEZ
Polytechnic University of Catalonia,
horacio@lsi.upc.edu,
mbertran@lsi.upc.edu
Musa ALKHALIFA,
Tànit ASSAF,
M.A. MARTI,
University of Barcelona
Gran Via 585, 08007-Barcelona
{musa,tanit}@thera-clic.com,
amarti@ub.edu
Piek VOSSEN
Irion Technologies
Delftechpark 26, 2628XH,
Delft, The Netherlands
piek.vossen@irion.nl
Adam PEASE
Articulate Software Inc,
420 College Ave
Angwin, CA 94508
apease@articulatesoftware.com
Christiane FELLBAUM
Princeton University,
Department of Psychology,
Green Hall, Princeton, NJ 08544
fellbaum@clarity.princeton.edu
Table of Contents

1. INTRODUCTION
2. OPEN DOMAIN LEXICAL RESOURCES
2.1 Arabic Monolingual Corpora.
2.2 Arabic/English/... Parallel Corpora.
2.3 Arabic Monolingual Dictionaries and Lexicons
2.4 Arabic/English Bilingual dictionaries and lexicons
2.4.1 Printed bilingual dictionaries
2.4.2. On-line MRD (Machine Readable Dictionaries)
2.5 Lexicons obtained from (selective) access to online MT systems
2.6 Stopwords
2.7 Gazetteers
2.8 Online newspapers
2.9 Online Press Agencies
2.10 List of verbs
2.11 List of roots
2.12 Electronic Books
3. DOMAIN RESTRICTED LEXICAL RESOURCES
3.1 Medical domain
3.2 Agriculture and related domains
3.3 Psychology
3.4 Hydrology
3.5 Urbanism
3.6 Chemistry
3.7 Zoology
3.8 Mathematics
3.9 Islamic terms
3.10 Finance and Banking
3.11 Botany
4. OTHER LINGUISTIC RESOURCES
4.1 Arabic Conjugators
4.2 Arabic Dependency Treebank
5. ARABIC NL PROCESSORS
5.1. Morphological Analyzers
5.2. Stemmers
5.3. Root extractors
5.4 Transliterators
5.5 Other Arabic NL Processors
6. OTHER ARABIC NL TOOLS
7. SLIGHTLY COMMENTED BIBLIOGRAPHY
7.1 Thesis
7.2. Articles
8. OTHER LINKS
9. MISCELLANEOUS


1. INTRODUCTION


This report is intended as a guide to resources (both linguistic data and linguistic processors and tools) that have been used, tried, or simply considered for use during the development of Arabic WordNet (AWN).
We intend to maintain this as an evolving document for the duration of the project, adding new resources, and new comments or assessments on existing items, as they become available. This initial version 0 will, we hope, be followed by increasingly useful versions.
The report is not intended to be a complete survey of Arabic NLP resources and tools. We have focused on resources related to the needs of AWN and on free resources.
For more in-depth information on Arabic NLP resources, beyond the content of this report and the links it includes, the following references may be useful:
http://www.ccls.columbia.edu/cadim/links.html
http://www.nemlar.org/
http://cf.linguistlist.org/cfdocs/new-website/LL-WorkingDirs/search/search-all-res2.cfm?res=All&AppLanguageId=43&search1=search1
Repositories that are not specific to Arabic, but that contain valuable Arabic resources and tools, include:
- Arabic Gigaword: http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T12
- Arabic Gigaword Second Edition: http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T02
http://www.elsnet.org/
- An-Nahar Newspaper Text Corpus: http://www.elda.org/catalogue/en/text/W0027.html
- DixAF (Bilingual Dictionary French Arabic, Arabic French): http://www.elda.org/catalogue/en/text/M0040.html
- Arabic Data Set: http://www.elda.org/catalogue/en/text/W0030.html
- "Le Monde Diplomatique" Text corpus in Arabic: http://www.elda.org/catalogue/en/text/W0030.html

Additional useful information and links can be found on the Web pages of a number of people and institutions:
- http://www-nlp.stanford.edu/links/statnlp.html: an annotated list of resources for statistical NLP and corpus-based computational linguistics
- http://www.siglex.org/: the Special Interest Group on the Lexicon of the ACL (Association for Computational Linguistics)
- http://sites.univ-lyon2.fr/langues_promodiinar/Accueil.htm
- http://elsap1.unicaen.fr/

A useful recent survey (very extensive but mainly focussing on commercial products) is:
- Mahtab Nikkhou, Khalid Choukri (2005), Survey of Arabic Language Resources and Tools in the Mediterranean Countries. Nemlar Report, March 2005
For the sake of completeness a slightly commented bibliography of Arabic NLP is included.

2. OPEN DOMAIN LEXICAL RESOURCES


2.1 Arabic Monolingual Corpora.


- Several corpora are available from LDC. These must be pre-processed before they can be used for probability estimation; for this purpose, normalization and light stemming should be sufficient (see the tools for this purpose listed below).
- Nijmegen Corpus: http://www.let.kun.nl/wba/Content2/1.4.5_Nijmegen_Corpus.htm
- News articles
- Arabic Corpus
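A minimal sketch of the kind of normalization meant here, followed by maximum-likelihood unigram estimates. The set of conflations (stripping diacritics and tatweel, mapping hamzated alef forms to bare alef, alef maqsura to yaa, and taa marbuta to haa) is a commonly used choice in Arabic IR work and is our assumption; it is not prescribed by the LDC corpora. Light stemming would then be applied to the resulting tokens.

```python
import re
from collections import Counter

# Tanween, short vowels, shadda, sukun (U+064B-U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Letter variants conflated before counting (an assumed, common choice).
_CONFLATE = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yaa
    "\u0629": "\u0647",  # taa marbuta -> haa
})


def normalize(text: str) -> str:
    """Remove diacritics and tatweel, then conflate orthographic variants."""
    return _DIACRITICS.sub("", text).translate(_CONFLATE)


def unigram_probabilities(text: str) -> dict[str, float]:
    """Maximum-likelihood unigram estimates over normalized tokens."""
    counts = Counter(normalize(text).split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}
```

Because the same normalization must also be applied to queries and to any word lists compared against the corpus (stopwords, lexicon entries), it is worth keeping it in a single function.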

2.2 Arabic/English/... Parallel Corpora.


http://157.150.97.21/dgaacs/unterm.nsf
http://lib-thesaurus.un.org/LIB/DHLUNBISThesaurus.nsf/$$searcha?OpenForm
http://arabiCorpus.byu.edu: this corpus is being designed to allow students and scholars to search large untagged Arabic corpora for words and structures. It provides information on word frequency, citations giving the 10 words before and the 10 words after, and information on collocates of the word in question.
http://english.aljazeera.net/HomePage
www.aps.dz: new webpage with interesting parallel articles.
http://termweb.unesco.org/Default.asp?admin=1&internet=1
http://www.fao.org
http://www.arabization.org.ma/dictionnaire.asp
ftp://ftp.microsoft.com/developr/msdn/newup/Glossary: a 27 MB computer science glossary
- Environmental Terms - A Trilingual Glossary by Yaron Batit.
http://www.clsp.jhu.edu/ws99/projects/mt/
http://enlil.ff.cuni.cz/veda/projekty/clara.htm

2.3 Arabic Monolingual Dictionaries and Lexicons



http://www.malayin.com/laut.asp?catid=2
http://fadakbooks.com/ardia.html
http://dictionary.sakhr.com/
http://qamoos.sakhr.com/intro/introles.asp?lex_id=6 (in Arabic)
http://www.lsi.upc.es/~halkoum/aralisan.php
- Vegetables: http://www.tammar.4t.com/vegta.htm
- Plants: http://www.tammar.4t.com/herb.htm
- Fruits: http://www.tammar.4t.com/fruit.htm
http://www.khayma.com/roqia/nabaway.HTM (5 old books online)

2.4 Arabic/English Bilingual dictionaries and lexicons


2.4.1 Printed bilingual dictionaries


Of the many dictionaries available, the most relevant sources for this section are:
http://www.malayin.com/laut.asp?catid=2
The list includes the popular Al Mawrid English-Arabic Dictionary and Al Mawrid Arabic-English Dictionary (printed version with CD-ROM).
http://www.bibliomonde.com/pages/fiche-auteur.php3?id_auteur=1508
- Dictionnaire Larousse Saturne arabe-français / français-arabe. Publisher: Larousse; 150,000 words and phrases.
The Dutch Language Union, Amsterdam
The MSA-Dutch dictionary is based on a corpus of 3,000,000 words. Mark Van Mol compiled the lexical database and may have an electronic version of the dictionary. He is the director of the Leuven group (Belgium) and has many publications in Arabic NLP (ANLP).
http://www.kuleuven.ac.be/ilt/arabic/index_en.htm
mark.vanmol at ilt.kuleuven.ac.be

This dictionary is one of the most important for many users: for many years it was perhaps the only one in use, and it has been cited by numerous English-language authors.
It can be found at:
www.spokenlanguage.com or http://www.amazon.com/gp/reader/0879500034/ref=sib_dp_pt/103-9733622-2675046
We consider this dictionary necessary for our needs.
www.let.kun.nl
This dictionary should be considered very important and useful.

2.4.2. On-line MRD (Machine Readable Dictionaries)


In this section we deal with a large quantity of information that is continuously changing and being updated.
http://www.jccm.es/educacion/atenc_div/diccionario_arabe/

www.StudyQuran.co.uk
A complete version is available free of charge online (even though www.aramedia.com sells it for over $450).
This is the largest lexicon available, comprising 8 volumes (about 3,200 pages); its author spent over 30 years compiling it.
Like most Arabic dictionaries, it is organized by roots; it is also available online.
Although this dictionary was expected to be completed months earlier, it only became available in December 2005. Anticipating heavy online traffic, its providers announced that the links might sometimes not work properly.

It contains 122,920 entries in XML format, including Arabic proper names, and each entry is organized as follows: Arabic word # Part of Speech # English word
http://crl.nmsu.edu/~ahmed/
http://crl.nmsu.edu/Resources/lang_res/arabic.html
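A minimal sketch for loading records in the layout described above. It assumes the records have already been extracted from the XML as plain `Arabic word # Part of Speech # English word` lines; the XML element names are not given here, so that extraction step is left out. `Entry`, `parse_entry` and `load_lexicon` are hypothetical names.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    arabic: str
    pos: str
    english: str


def parse_entry(line: str) -> Entry:
    """Split one 'Arabic word # Part of Speech # English word' record."""
    fields = [field.strip() for field in line.split("#", 2)]
    if len(fields) != 3:
        raise ValueError(f"expected 3 '#'-separated fields, got {len(fields)}: {line!r}")
    return Entry(*fields)


def load_lexicon(lines) -> dict[str, list[Entry]]:
    """Group records by Arabic headword, skipping blank lines."""
    lexicon: dict[str, list[Entry]] = {}
    for line in lines:
        if line.strip():
            entry = parse_entry(line)
            lexicon.setdefault(entry.arabic, []).append(entry)
    return lexicon
```

Splitting with `maxsplit=2` means that a `#` occurring inside an English gloss stays in the last field instead of producing a fourth one.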

www.mesiti.it/arabic/search_dict.asp
This dictionary was created by a group of teachers in Italy. Users can enter Arabic or English words, although the search is performed on roots. Plurals and feminine forms of adjectives and nouns are also provided where relevant through a "magnify" function (JavaScript).
www.arabsun.de
It also allows users to enter English and Arabic words using an Arabic keyboard.
http://www.jccm.es/educacion/atenc_div/diccionario_arabe
http://www.edu365.com/agora/dic/catala_arab/



- Bidirectional dictionary ($49.90). Free sample.
- English <-> Arabic lexicon ($159).
- French <-> Arabic dictionary (800 words).







http://qamoos.sakhr.com/intro/mgz01.asp


This page provides information on the Sakhr English-Arabic dictionary, as well as useful information on Arabic grammar and Arabic language technology in general.



http://lib-thesaurus.un.org/LIB/DHLUNBISThesaurus.nsf/$$searcha?OpenForm


2.5 Lexicons obtained from (selective) access to online MT systems





http://www.sakhr.com/Sakhr_e/Products/Idrisi.htm?Index=2&Main=Products&Sub=Idrisi
http://qamoos.sakhr.com/idrisidic_1.asp?Sentence=car

2.6 Stopwords


Some papers and tools related to Arabic stopwords:
http://www.lemurproject.org/
The Lemur toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The system is written in C and C++ and is designed as a research system to run under Unix operating systems, although it can also run under Windows.
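Stopword removal itself is simple: the substantive work lies in the list. The following is a minimal sketch. The word list is a small illustrative sample of frequent Arabic function words chosen by us; it is not the list distributed with Lemur or used in any of the papers mentioned.

```python
# Small illustrative sample of frequent Arabic function words (our choice).
ARABIC_STOPWORDS = frozenset({
    "في", "من", "على", "إلى", "عن", "أن", "إن", "مع",
    "التي", "الذي", "هذا", "هذه", "ما", "لا", "قد",
})


def remove_stopwords(tokens, stopwords=ARABIC_STOPWORDS):
    """Drop tokens that appear verbatim in the stopword set."""
    return [t for t in tokens if t not in stopwords]
```

Matching is exact. If the text is normalized (for example إلى becoming الي), the list must be normalized in the same way. Function words carrying attached clitics (such as وفي) are not caught without prior segmentation.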






2.7 Gazetteers











2.8 Online newspapers



Title | Web page | Country

El Khabar | http://www.elkhabar.com/accueil/ | Algeria
El Moudjahid | http://www.elmoudjahid-dz.com/ | Algeria
Ech-Chaab | http://www.ech-chaab.com/ | Algeria
El Rai | http://www.elrai.com/ | Algeria
El Watan | http://www.elwatan.com/ | Algeria
El Youm | http://www.el-youm.com/ | Algeria

Al Moharer | http://al-moharer.freeservers.com/ | Australia

Al Ayam | http://www.alayam.com/ | Bahrain

Al Ahram | http://www.ahram.org.eg/ | Egypt
Al Shaab | http://www.alshaab.com/ | Egypt

Asharq Al-Awsat | http://www.asharqalawsat.com/ | England

Al Ahali | http://www.ahali-iraq.com/ | Iraq
Al-Efyaa |  | Iraq

Alquds Daily Newspaper | http://www.alquds.com/ | Israel
Arabynet (Yediot Achronot) | www.arabynet.com | Israel

Ad Dustour | http://www.addustour.com/Default/Default.asp | Jordan
Al Arab Al Yawm | http://www.alarabalyawm.net/ | Jordan
Al Rai | http://www.alrai.com/ | Jordan
Albawa | http://www.albawa.com/ | Jordan

Al Qabas | http://www.alqabas.com.kw/ | Kuwait
Al Rai Al Aam | http://www.alqabas.com.kw/ | Kuwait
Al Seyassah | http://www.alarabalyawm.net/ | Kuwait
Al Watan | http://www.alwatan.com.kw/ | Kuwait

Al Anwar | http://www.alanwar.com/ar/ | Lebanon
Al Liwaa | http://www.aliwaa.com/ | Lebanon
Al Mustaqbal | http://www.almustaqbal.com/ | Lebanon
Annahar | http://www.annaharonline.com/ | Lebanon
An-Nahar | http://www.annahar.com.lb/ | Lebanon
As-Safir | http://assafir.com/iso/today/front/summary.html | Lebanon

Al Fajr al Jadid | http://www.alfajraljadeed.com | Libya
Al Jamahiriyah | http://www.aljamahiria.com/ | Libya
Al Shames | http://www.alshames.com/ | Libya
Libyan Press | http://www.libyanpress.com/ | Libya

Al Watan | http://www.alwatan.com/ | Oman
Oman Arabia Daily | http://www.omandaily.com/ | Oman

Al Ayyam |  | Palestine
Al Hayat Al Jadedah | http://www.alhayat-j.com/ | Palestine
Al Manar | http://www.manar.com/ | Palestine
Al Quds | http://www.alquds.com/ | Palestine
Al Sabar | http://www.hanitzotz.com/alsabar/ | Palestine

Fasl Al-Maqal | http://www.fasl-almaqal.com/ | Qatar
Al Watan | http://www.al-watan.com/ | Qatar
Al-Sharq | http://www.al-sharq.com/site/topics/index.asp?cu_no=1&temp_type=44 | Qatar
Raya | http://www.al-sharq.com/site/topics/index.asp?cu_no=1&temp_type=44 | Qatar

Akhbar | http://www.elakhbar.org.eg/ | Saudi Arabia
Al Hayat | http://www.alhayat.com/ | Saudi Arabia
Al Itidal | http://www.ynh.com/al-itidal/ | Saudi Arabia
Al Jazirah | http://www.ynh.com/al-itidal/ | Saudi Arabia
Al Madinah | http://www.almadinah.com/ | Saudi Arabia
Al Riyadh |  | Saudi Arabia
Al-Aalam Al-Islami | http://www.muslimworldleague.org/paper/1875/index.htm | Saudi Arabia
Asharq Al-Awsat | http://www.asharqalawsat.com/ | Saudi Arabia
Okaz | http://www.okaz.com.sa/okaz/ | Saudi Arabia

Adaraweesh | http://members.tripod.com/~adaraweesh/ | Sudan
Al Rayaam | http://www.rayaam.net/ | Sudan
Alosbua Sudanese Daily | http://alosbua.com/alosbua/ | Sudan
Alray Alaa’m | http://www.rayaam.net/ | Sudan
Alwaan | http://www.qatartop.com/ | Sudan

Al Baath | http://www.albaath.news.sy/epublisher/user/ | Syria
Al Bawaba | http://www.albawaba.com/ar/countries/Syria/ | Syria
Al Furat | http://furat.alwehda.gov.sy/ | Syria
Al Jamahir | http://jamahir.alwehda.gov.sy/ | Syria
Al Maukef Al Riadi | http://riadi.alwehda.gov.sy/_View_news2.asp?FileName=2748410720060206235922 | Syria
Al Ouruba | http://ouruba.alwehda.gov.sy/ | Syria
Al Thawra | http://www.thawra.com/ | Syria
Al Wahda | http://www.thawra.com/data/wehda/ | Syria
Teshreen | http://tishreen.info/ | Syria

Assabah |  | Tunisia
Essahafa | http://www.tunisie.com/LaPresse/ | Tunisia

Akhbar Al Arab | http://www.akhbaralarab.co.ae/ | United Arab Emirates
Al Bayan | http://www.albayan.ae/servlet/Satellite?pagename=Bayan/Page/BayanPage&c=Page&cid=1039065325549 | United Arab Emirates
Al Ittihad | http://www.alittihad.co.ae/ | United Arab Emirates
Al Khaleej | http://www.alkhaleej.co.ae/ | United Arab Emirates

26 September | http://www.26september.com/ | Yemen
Al Mathak | http://www.gpc.org.ye/mathak.htm | Yemen
Al Thagafiah | http://www.y.net.ye/althaqafiah/ | Yemen
Al-Ayyam | http://www.al-ayyam.info/ | Yemen
Al-Gumhuryah | http://www.y.net.ye/al-gumhuryah/ | Yemen
Al-Sahwa | http://www.alsahwa-yemen.net/ | Yemen
Al-Shoura | http://www.y.net.ye/shoura/ | Yemen
Al-Thawrah | http://www.althawra.gov.ye/ | Yemen
Al-Wahdawi | http://www.alwahdawi.net/ | Yemen
Attariq |  | Yemen
Naba Al Hakekah | http://www.y.net.ye/naba/ | Yemen
Ray | http://www.ray-yem.com/ | Yemen



2.9 Online Press Agencies

www.aps.dz (Fr, En, Ar)
http://www.afp.fr/arabic/home/ (Fr, En, Ar, Sp)
http://www.map.ma/ar (Fr, En, Ar, Sp)
http://www.arabic.xinhuanet.com/arabic/index.htm (En, Ar)
http://news.bbc.co.uk/hi/arabic/news/ (En, Ar)
Of the agencies listed, only the Chinese (Xinhua) and Algerian (APS) agencies publish parallel articles.

2.10 List of verbs


http://www.verba.org/verbi_utf8/all_verbs_index_ar.html: the database covers 620 verbs and includes, for each verb, the vowelized imperative, conditional and jussive forms.


2.11 List of roots


A list of triliteral and quadriliteral roots in Arabic alphabetical order, compiled by Tim Buckwalter but not available on his webpage (www.qamus.org).
We found it at: www.angelfire.com/tx4/lisan/roots1.htm

http://www.openburhan.com/ob_main_frame.html
We will use this list to generate a corpus automatically. Instead of extracting the root of a word, we take the opposite step: starting from each root, we apply the various patterns to reconstitute a lexicon.
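The generation step described above (root plus pattern gives word form) can be sketched as follows. The pattern inventory is a hypothetical, deliberately tiny sample of unvocalized patterns, with the digits 1, 2 and 3 marking the radical slots. Only sound triliteral roots are handled. Weak, hamzated and doubled roots, which a real generator must treat specially, are not covered. The output overgenerates by design, so the candidate forms would need to be filtered, for example against a corpus.

```python
# Hypothetical, deliberately tiny inventory of unvocalized patterns.
# The digits 1, 2, 3 mark the slots of the first, second and third radical.
TRILITERAL_PATTERNS = {
    "123": "perfect verb stem, Form I",
    "1ا23": "active participle",
    "م12و3": "passive participle",
    "12ا3": "noun pattern fi'aal",
    "م123ة": "noun of place (maf'ala)",
}


def apply_pattern(root: str, pattern: str) -> str:
    """Interdigitate a triliteral root with a pattern."""
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    slots = str.maketrans({"1": root[0], "2": root[1], "3": root[2]})
    return pattern.translate(slots)


def generate_lexicon(roots):
    """Map each generated form to the (root, pattern label) pairs producing it."""
    lexicon = {}
    for root in roots:
        if len(root) != 3:
            continue  # quadriliteral roots would need their own patterns
        for pattern, label in TRILITERAL_PATTERNS.items():
            form = apply_pattern(root, pattern)
            lexicon.setdefault(form, []).append((root, label))
    return lexicon
```

Keeping the (root, label) pairs alongside each form lets the generated lexicon be traced back to the root list. The same surface form can also arise from different roots or patterns.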

2.12 Electronic Books


The following sites offer a large collection of free Arabic books:
http://www.almeshkat.net/books/index.php (2282 books)
http://www.al-eman.com/Islamlib/
http://tafsir.org/books/menu.php?action=new


3. DOMAIN RESTRICTED LEXICAL RESOURCES


UNTERM, United Nations Terminology Database (AR-EN-ES-FR-RU-ZH): this has 70,000 entries in the six official UN languages, and its content can be extracted because queries return long lists of words in English and Arabic. It covers over 80 domains:


COUNTRY NAME, AIDS, agriculture, atmospheric science, biodiversity, bioscience, budget and management, cartography and geography, child welfare, climate change, codes and regulations, communication, core concept, culture, declarations, demographics, development, disarmament, disasters, discrimination, documents, economics, education and training, energy, environment, export controls and sanctions, finance, fisheries, food, forestry, functional and other titles, geoscience, governance, Greek, habitat, health and medicine, human rights, humanitarian issues, indigenous peoples, information technology, intellectual property, international law, international relations, international trade, labour, landmines and mine action, Latin, law enforcement, law of the sea, logistics and supplies, meetings, migrations and refugees, military abbreviations, military issues, multilateral instruments, narcotic drugs, national law, natural resources, nuclear science, oceanography, organizational structure, peace and security, peace operations, plans of action and initiatives, political life, poverty, religions, science and technology, set phrases, small arms, social issues, space, staff matters, statistics, TALOS, terrorism, transport and communications, water, weapons of mass destruction, women.





UNESCOTERM Search (AR-DE-EN-ES-FR-RU-ZH):

This can be used as a reference and its content can be extracted. It includes UNESCO-related terms such as administrative and financial terms, education, and conferences and meetings, among others.


UNESCO Structures, Superseded UNESCO Structures, Institutions: (IGOs, NGOs, Networks, Systems, Foundations), IOC: Titles, Terms and Acronyms, Administrative and Financial Terms, International (Days, Weeks, Years and Decades), Campaigns and Appeals, UNESCO's Member States, UNESCO's Standard-Setting Instruments, International Prizes, (Non-Member States, Non-Self-governing Territories, Dependent Territories etc.), UNESCO Chairs, Miscellaneous, UN and International Legal Instruments, UNESCO Functions and Titles, (Conferences, Meetings etc.), Terms in the field of Education, (UNESCO's Programmes, Projects, Initiatives), (International Programmes, Projects, Initiatives), Former Institutions: (IGOs, NGOs, Networks, Systems, Foundations)



3.1 Medical domain


This has the Unified Medical Dictionary (UMD) from the World Health Organization, along with its specialized UMD dictionaries covering more than 70 domains. Entries are arranged alphabetically within each domain, and all the English entries can be browsed page by page with their Arabic equivalents. All medical terms were approved by the Arab academies in Cairo, Damascus, Baghdad and Amman, and the compilers selected the Arabic terms carefully according to a strict, clear, simplified and user-friendly methodology. An electronic version of this edition is available on CD-ROM for Windows and comprises about 150,000 terms.


A copy of this CD-ROM is available from khayat@emro.who.int



The domains include the following (numbers refer to the number of entries in the domains sampled):

All specialized UMD dictionaries: Abbreviations (799 entries), Acidology (1669), Acronyms (248), Anatomy (2000), Anesthesiology (484), Anthropology and anthropometrics (1427), Bacteriology (1827), Biochemistry and Chemistry (2000), Biology, Biomedical engineering, Biomedical ethics, Biostatistics, Blood transfusion medicine, Botany, Cardiology and cardiovascular surgery, Cell biology, Demography, Dentistry (2000), Dermatology, Diagnostics (symptoms & signs), Embryology & teratology, Emergency medicine, Endocrinology & metabolism, Entomology, Environmental health, Enzymology and Zymology, Family and community medicine, Food safety, Forensic medicine, Gastroenterology, Genitourinary medicine, venereology and STDs, Health services, Helminthology, Hematology, Histology, Hospital administration, Immunology, Infectious diseases, Informatics, Laboratory medicine, Maternal and child health, Measures, Microbiology, Mycology, Nephrology, Neurology, Nutrition and dietetics, Obstetrics and gynecology, Occupational medicine, industrial medicine, Oncology, Ophthalmology and optics, Orthopedics, Otorhinolaryngology, Parasitology, Pathology, Pediatrics, Pharmacology and therapeutics, Physiatrics and physical medicine, Physiology, Prefixes, Preventive medicine, Public health, community medicine and hygiene, Reproductive health, Sexology, Suffixes, Surgery, Taxonomy, nosology and classification (1118), Toxicology, Transplantation, Tropical medicine, Virology, WHO managerial terms (2000), Zoology (997).




3.2 Agriculture and related domains



You can download a copy of AGROVOC from http://www.fao.org/aims/ag_download.htm

Each descriptor has its equivalent in other languages. Descriptors are indexing terms which consist of one or more words representing one and the same concept. Non-descriptors are terms which help the user to find the appropriate descriptor(s). Non-descriptors are followed by a reference (USE operator) to the descriptor, which is the preferred term. For indexing purposes, it is important that only descriptor terms are used.
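The descriptor / non-descriptor mechanism amounts to a lookup with one level of redirection: a non-descriptor is resolved through its USE reference before indexing or translation. The following is a minimal sketch with hypothetical toy data. The terms, equivalents and dictionary layout are ours; the actual AGROVOC distributions (MySQL, TagText, ISO2709, Access) and their identifiers are not modelled.

```python
# Hypothetical toy data: one descriptor with equivalents in other languages,
# and one non-descriptor that points to it through a USE reference.
DESCRIPTORS = {
    "maize": {"ar": "ذرة", "fr": "maïs", "es": "maíz"},
}
USE = {
    "corn": "maize",  # non-descriptor -> preferred descriptor
}


def descriptor_for(term: str) -> str:
    """Return the descriptor to use for indexing a term."""
    if term in DESCRIPTORS:
        return term
    if term in USE:
        return USE[term]
    raise KeyError(f"{term!r} is neither a descriptor nor a non-descriptor")


def equivalent(term: str, lang: str) -> str:
    """Translate via the descriptor, so non-descriptors are resolved first."""
    return DESCRIPTORS[descriptor_for(term)][lang]
```

Routing every lookup through `descriptor_for` enforces the rule that only descriptors are used for indexing.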



The FAO website states clearly that AGROVOC is free of charge for educational or other strictly non-commercial purposes.
AGROVOC is available for download in MySQL, TagText, ISO2709 and Microsoft Access formats. To download the AGROVOC database for offline use, send a request to fao-agris-caris@fao.org, specifying: full name, email, organisation, reason for downloading AGROVOC, and comments. AGROVOC is also available through web services; more information is available at http://www.fao.org/aims/ag_webservices.jsp



3.3 Psychology






3.4 Hydrology



http://www.disclic.unige.it/glos_idro/indice.php?list=0&lang=ar&style=1

http://www.cemagref.fr (2,000 entries, PDF format)

3.5 Urbanism


Habitat and Urbanism Glossary (AR-EN-FR): This has 3850 Arabic-English-French entries in PDF.



3.6 Chemistry


Elementymology & Elements Multidict (MULTI): This is a multilingual dictionary of the names of chemical elements in many languages. There are alphabetical and numerical lists. Clicking on the name of an element brings the element information page up in the main window. It can be used as a reference.


3.7 Zoology


Zoology Dictionary (EN>AR): This has 2500 terms in alphabetical order.


3.8 Mathematics


Glencoe Online: a multilingual mathematics glossary (AR-EN-ES-KO-RU-UR-VI-ZH) in PDF files, arranged as an alphabetical list with glosses.


3.9 Islamic terms



3.10 Finance and Banking




3.11 Botany


  • Spices http://stephkup.nexenservices.com/epices/affichage/liste.htm

4. OTHER LINGUISTIC RESOURCES


4.1 Arabic Conjugators



This allows online generation of the individual verb forms (I to X) of Arabic verbs from a given triconsonantal root, entered in Arabic letters.
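For a sound root, Forms I to X follow regular templates, so the core generation step can be sketched as template filling. This is not the code of any conjugator listed here. We use a simple Latin transliteration of our own (doubled vowels for long vowels, `'` for hamza) and give only the third-person masculine singular perfect. Weak, hamzated and doubled roots are not handled, and neither is the assimilation of the infixed t of Form VIII after certain first radicals. Not every root is attested in every form.

```python
# Perfect, 3rd person masculine singular templates for Forms I-X.
# Digits 1, 2, 3 are the radical slots; "aa" is long a, "'" is hamza.
FORM_TEMPLATES = {
    "I": "1a2a3a",
    "II": "1a22a3a",
    "III": "1aa2a3a",
    "IV": "'a12a3a",
    "V": "ta1a22a3a",
    "VI": "ta1aa2a3a",
    "VII": "in1a2a3a",
    "VIII": "i1ta2a3a",
    "IX": "i12a33a",
    "X": "ista12a3a",
}


def derived_forms(root: tuple[str, str, str]) -> dict[str, str]:
    """Fill every template with the three radicals of a sound root."""
    c1, c2, c3 = root
    slots = str.maketrans({"1": c1, "2": c2, "3": c3})
    return {form: template.translate(slots) for form, template in FORM_TEMPLATES.items()}
```

Radicals are passed as separate strings, so multi-letter transliterations (for example `"sh"`) fill a slot correctly.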



  • Arabic word form generator. Rudolf W. Meijer


This is more complete than the above. The Arabic root is entered in Latin characters, and the tool runs offline. We have downloaded it, and it works under Windows.


  • Jerzy Łacina, Poland (MS-DOS programs)

  • Muhallil: a simple analyzer of Arabic verbs. Musarrif: a simple generator of Arabic verbs.
http://www.staff.amu.edu.pl/~lacina/page4.html
http://www.verba.org/verbi_utf8/all_verbs_index_ar.html
  • Conjugation of Arabic verbs
www.freshmeat.net
  • fa.ala: This is a tool that conjugates Arabic verbs

  • Morfix Arabic Search: a multilingual search engine using Arabic morphology and cross-language search.


www.morfix.il


An interesting and useful online tool. Arabic Morfix offers extensive morphological search and is a standalone search engine. The demonstration is based on a collection of 200 general news articles from various sources. Its search takes into account the following features: context sensitivity, expanded morphological search, thesaurus search, and queries entered in Latin transcription for Arabic names.





  • Offline Conjugator


www.geocities.com/effel_dahling



  • aConCorde: a concordance program for Arabic by Andrew Roberts; a multilingual tool for processing corpora. We have downloaded and tested it, and it works.


www.comp.leeds.ac.uk
www.freshmeat.net


Concordancers are tools whose main tasks are searching, sorting and classifying words; they are a real help in manipulating corpora.
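At the core of a concordancer is a keyword-in-context (KWIC) listing, sorted so that recurring contexts line up. The following is a minimal sketch, not the implementation of aConCorde or any other tool listed. It uses a default window of 10 words on each side, the window arabiCorpus reports. Matching is on exact surface tokens. For Arabic, one would normally match on normalized or stemmed tokens, because attached clitics and affixes otherwise hide many occurrences.

```python
def kwic(tokens, keyword, window=10):
    """Return (left context, keyword, right context) for each occurrence."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


def sort_by_right_context(hits):
    """Sort concordance lines on the words following the keyword."""
    return sorted(hits, key=lambda hit: hit[2])
```

Sorting on the right context groups collocations that follow the keyword. A sort on the reversed left context would do the same for words that precede it.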


4.2 Arabic Dependency Treebank

http://www.ircs.upenn.edu/arabic/


5. ARABIC NL PROCESSORS


5.1. Morphological Analyzers


  • Sebawai
Morphological Analyzer (Kareem Darwish)
  • Xerox
http://www.xrce.xerox.com/competencies/content-analysis/arabic/
  • Aramorph

  • Buckwalter
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004L02
  • Morphological Analyzer ($590)
http://www.cimos.com/



5.2. Stemmers


  • Al-Stem
- Light stemmer (Kareem Darwish)
  • Light10
- Larkey
  • Chen
  • Khoja
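The following is a minimal sketch of a light stemmer in the spirit of Larkey's Light10. It strips a leading و ("and"), then at most one definite-article prefix, then makes a single pass over a fixed suffix list. The affix lists and length thresholds are given as we recall them from the published description of Light10. Treat them as assumptions to be checked against that description. This is not Larkey's code.

```python
# Affix lists as we recall them for Light10; verify before relying on them.
DEFINITE_ARTICLES = ("وال", "بال", "كال", "فال", "ال", "لل")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")


def light_stem(word: str) -> str:
    """Strip a leading waw, one definite article, then listed suffixes."""
    # 1. Leading waw ("and"), only if at least 3 letters would remain.
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    # 2. At most one definite-article prefix, if at least 2 letters remain.
    for prefix in DEFINITE_ARTICLES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break
    # 3. One pass over the suffix list, in order; each suffix is removed
    #    if present and at least 2 letters would remain.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
    return word
```

Light stemming does not try to reach the root, which is why it suits probability estimation and retrieval. It also errs in predictable ways. For example, the initial و of والد ("father") is a radical, yet it is stripped, giving الد.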

5.3. Root extractors


  • gendic: reduces Arabic words to their roots
www.freshmeat.net

5.4 Transliterators


  • Jtransliterator: a tool that transliterates Arabic script into Latin script

www.freshmeat.net
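Jtransliterator's own scheme is not described here. To illustrate what such a tool does, the following is a minimal sketch of the Buckwalter transliteration, the one-character-per-letter ASCII scheme used by the Buckwalter analyzer in Section 5.1. It covers the basic Arabic letters, hamza forms and diacritics, but not extended characters such as Persian letters.

```python
# Buckwalter transliteration: one ASCII character per Arabic character.
ARABIC_TO_BUCKWALTER = {
    "ء": "'", "آ": "|", "أ": ">", "ؤ": "&", "إ": "<", "ئ": "}",
    "ا": "A", "ب": "b", "ة": "p", "ت": "t", "ث": "v", "ج": "j",
    "ح": "H", "خ": "x", "د": "d", "ذ": "*", "ر": "r", "ز": "z",
    "س": "s", "ش": "$", "ص": "S", "ض": "D", "ط": "T", "ظ": "Z",
    "ع": "E", "غ": "g", "ف": "f", "ق": "q", "ك": "k", "ل": "l",
    "م": "m", "ن": "n", "ه": "h", "و": "w", "ى": "Y", "ي": "y",
    "\u0640": "_",  # tatweel
    # Tanween, short vowels, shadda, sukun, dagger alef, alef wasla.
    "\u064B": "F", "\u064C": "N", "\u064D": "K", "\u064E": "a",
    "\u064F": "u", "\u0650": "i", "\u0651": "~", "\u0652": "o",
    "\u0670": "`", "\u0671": "{",
}

_TO_BW = str.maketrans(ARABIC_TO_BUCKWALTER)
_FROM_BW = str.maketrans({bw: ar for ar, bw in ARABIC_TO_BUCKWALTER.items()})


def to_buckwalter(text: str) -> str:
    """Arabic script -> Buckwalter; unmapped characters pass through."""
    return text.translate(_TO_BW)


def from_buckwalter(text: str) -> str:
    """Buckwalter -> Arabic script; unmapped characters pass through."""
    return text.translate(_FROM_BW)
```

Because the scheme is one-to-one, the round trip is lossless for Arabic-only text. On mixed text, however, `from_buckwalter` would also convert any Latin letters that happen to be Buckwalter symbols.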

5.5 Other Arabic NL Processors


  • Syntactic Analyser ($990)
http://www.cimos.com/


6. OTHER ARABIC NL TOOLS


Illinois Institute of Technology Information Retrieval System: an online interface for querying the TREC Arabic collection, http://www.ir.iit.edu:8180/arabic-interface/index.html
  • Arabeyes: includes various resources.

http://www.arabeyes.org/
Arabeyes is a meta-project aimed at fully supporting the Arabic language in the Unix/Linux environment. It is designed to be a central repository for standardizing the Arabization process, and it relies on voluntary contributions from computer professionals and enthusiasts all over the world. Its projects include:
- Katoob: editor for Arabic texts
- Mozilla: Arabization of Mozilla
- ITL: Islamic tools (data calculus, …)
- BiCon: console in Arabic
- Quran: tools for reading the Quran
- QaMoose: online access to a dictionary (information extracted from the word list)
- Akka: Arabization of Linux consoles
- Arabbix: Arabized Linux Live-CD
- Bayani: Arabized scientific plotter
- Distros: Arabized Linux distributions
- Duali: spelling checker
- FreeBSD: FreeBSD Arabization

- lala: a localization tool for Arabic support on Linux
- conv_ara_html: a tool for converting Arabic numeric character references
- PostArabic: Arabic shaping for PostgreSQL
- ToIpt: PHP class for writing Farsi and Arabic text on images
- mule: multilingual Emacs
- ClearlyU: BDF fonts usable for Unicode text
- Arabeske: an arabesque-like pattern design tool
- buckwalter2unicode: A Python script to c