Arabic NLP Resources for the Arabic WordNet Project
William BLACK, Sabri ELKATEB
School of Informatics, University of Manchester, Sackville Street, Manchester, M60 1QD
w.black@manchester.ac.uk, sabri.elkateb@manchester.ac.uk

Manuel BERTRAN, Xavier FARRERES, David FARWELL, Reda HALKOUM, Horacio RODRIGUEZ
Polytechnic University of Catalonia
horacio@lsi.upc.edu, mbertran@lsi.upc.edu

Musa ALKHALIFA, Tànit ASSAF, M.A. MARTI
University of Barcelona, Gran Via 585, 08007 Barcelona
{musa,tanit}@thera-clic.com, amarti@ub.edu

Piek VOSSEN
Irion Technologies, Delftechpark 26, 2628XH Delft, The Netherlands
piek.vossen@irion.nl

Adam PEASE
Articulate Software Inc, 420 College Ave, Angwin, CA 94508
apease@articulatesoftware.com

Christiane FELLBAUM
Princeton University, Department of Psychology, Green Hall, Princeton, NJ 08544
fellbaum@clarity.princeton.edu
2. OPEN DOMAIN LEXICAL RESOURCES
2.1 Arabic Monolingual Corpora
2.2 Arabic/English/... Parallel Corpora
2.3 Arabic Monolingual Dictionaries and Lexicons
2.4 Arabic/English Bilingual Dictionaries and Lexicons
2.4.1 Printed Bilingual Dictionaries
2.4.2 On-line MRDs (Machine Readable Dictionaries)
2.5 Lexicons Obtained from (Selective) Access to Online MT Systems
3. DOMAIN RESTRICTED LEXICAL RESOURCES
3.2 Agriculture and Related Domains
4. OTHER LINGUISTIC RESOURCES
4.2 Arabic Dependency Treebank
5.1 Morphological Analyzers
5.5 Other Arabic NL Processors
7. SLIGHTLY COMMENTED BIBLIOGRAPHY
This has information about the Sakhr English-Arabic dictionary and useful information on Arabic grammar and Arabic language technology in general.
COUNTRY NAME, AIDS, agriculture, atmospheric science, biodiversity, bioscience, budget and management, cartography and geography, child welfare, climate change, codes and regulations, communication, core concept, culture, declarations, demographics, development, disarmament, disasters, discrimination, documents, economics, education and training, energy, environment, export controls and sanctions, finance, fisheries, food, forestry, functional and other titles, geoscience, governance, Greek, habitat, health and medicine, human rights, humanitarian issues, indigenous peoples, information technology, intellectual property, international law, international relations, international trade, labour, landmines and mine action, Latin, law enforcement, law of the sea, logistics and supplies, meetings, migrations and refugees, military abbreviations, military issues, multilateral instruments, narcotic drugs, national law, natural resources, nuclear science, oceanography, organizational structure, peace and security, peace operations, plans of action and initiatives, political life, poverty, religions, science and technology, set phrases, small arms, social issues, space, staff matters, statistics, TALOS, terrorism, transport and communications, water, weapons of mass destruction, women.
UNESCOTERM Search (AR-DE-EN-ES-FR-RU-ZH): This can be used as a reference, and its content can be extracted. It includes UNESCO-related terms such as administrative and financial terms, education, conferences and meetings, etc.
UNESCO Structures, Superseded UNESCO Structures, Institutions: (IGOs, NGOs, Networks, Systems, Foundations), IOC: Titles, Terms and Acronyms, Administrative and Financial Terms, International (Days, Weeks, Years and Decades), Campaigns and Appeals, UNESCO's Member States, UNESCO's Standard-Setting Instruments, International Prizes, (Non-Member States, Non-Self-governing Territories, Dependent Territories etc.), UNESCO Chairs, Miscellaneous, UN and International Legal Instruments, UNESCO Functions and Titles, (Conferences, Meetings etc.), Terms in the field of Education, (UNESCO's Programmes, Projects, Initiatives), (International Programmes, Projects, Initiatives), Former Institutions: (IGOs, NGOs, Networks, Systems, Foundations)
A copy of this CD-ROM is available from khayat@emro.who.int
The domains include the following (numbers refer to the number of entries in the sampled domains):
All specialised UMD dictionaries: Abbreviations (799 entries), Acidology (1669), Acronyms (248), Anatomy (2000), Anesthesiology (484), Anthropology and anthropometrics (1427), Bacteriology (1827), Biochemistry and Chemistry (2000), Biology, Biomedical engineering, Biomedical ethics, Biostatistics, Blood transfusion medicine, Botany, Cardiology and cardiovascular surgery, Cell biology, Demography, Dentistry (2000), Dermatology, Diagnostics (symptoms & signs), Embryology & teratology, Emergency medicine, Endocrinology & metabolism, Entomology, Environmental health, Enzymology and Zymology, Family and community medicine, Food safety, Forensic medicine, Gastroenterology, Genitourinary medicine, venereology and STDs, Health services, Helminthology, Hematology, Histology, Hospital administration, Immunology, Infectious diseases, Informatics, Laboratory medicine, Maternal and child health, Measures, Microbiology, Mycology, Nephrology, Neurology, Nutrition and dietetics, Obstetrics and gynecology, Occupational medicine, industrial medicine, Oncology, Ophthalmology and optics, Orthopedics, Otorhinolaryngology, Parasitology, Pathology, Pediatrics, Pharmacology and therapeutics, Physiatrics and physical medicine, Physiology, Prefixes, Preventive medicine, Public health, community medicine and hygiene, Reproductive health, Sexology, Suffixes, Surgery, Taxonomy, nosology and classification (1118), Toxicology, Transplantation, Tropical medicine, Virology, WHO managerial terms (2000), Zoology (997).
You can download a copy of AGROVOC from http://www.fao.org/aims/ag_download.htm
Each descriptor has its equivalent in other languages. Descriptors are indexing terms which consist of one or more words representing one and the same concept. Non-descriptors are terms which help the user to find the appropriate descriptor(s). Non-descriptors are followed by a reference (USE operator) to the descriptor, which is the preferred term. For indexing purposes, it is important that only descriptor terms are used.
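The descriptor/non-descriptor distinction can be illustrated with a minimal sketch, assuming a toy in-memory thesaurus: each descriptor stores its equivalents in other languages, and each non-descriptor carries a USE reference to its preferred descriptor, so that indexing always resolves to a descriptor. The `Thesaurus` class, its methods and the sample terms ("maize", "corn" and the French label) are hypothetical illustrations; they are not taken from AGROVOC or from any FAO download format or web-service API.

```python
from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    """Toy thesaurus (hypothetical, not the AGROVOC data model).

    labels: descriptor -> {language code: equivalent term}
    use:    non-descriptor -> preferred descriptor (the USE reference)
    """
    labels: dict[str, dict[str, str]] = field(default_factory=dict)
    use: dict[str, str] = field(default_factory=dict)

    def add_descriptor(self, descriptor: str, **equivalents: str) -> None:
        # Register a descriptor with its equivalents in other languages.
        self.labels[descriptor] = dict(equivalents)

    def add_non_descriptor(self, term: str, descriptor: str) -> None:
        # A non-descriptor may only point to an existing descriptor.
        if descriptor not in self.labels:
            raise KeyError(f"unknown descriptor: {descriptor}")
        self.use[term] = descriptor

    def index_term(self, term: str) -> str:
        """Return the descriptor that should be used to index `term`."""
        if term in self.labels:
            return term
        if term in self.use:
            return self.use[term]
        raise KeyError(f"not in thesaurus: {term}")


if __name__ == "__main__":
    thes = Thesaurus()
    thes.add_descriptor("maize", fr="maïs")   # illustrative entry only
    thes.add_non_descriptor("corn", "maize")  # corn USE maize
    descriptor = thes.index_term("corn")
    print(descriptor, thes.labels[descriptor]["fr"])
```

Looking up a non-descriptor returns its preferred descriptor, and from there the equivalents in other languages, which mirrors the rule that only descriptors should be used for indexing.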
AGROVOC is available in 9 languages: the five FAO official languages (which are English, French, Spanish, Chinese and Arabic), Czech, Portuguese, Japanese and Thai. Other languages like German, Italian, Korean, Hungarian, Slovak and Lao are currently being prepared.
Their website states clearly that AGROVOC is free of charge for educational or other strictly non-commercial purposes.
AGROVOC is available for download in MySQL, TagText, ISO2709 and Microsoft Access formats. To download the AGROVOC database for offline use, send a request to fao-agris-caris@fao.org specifying the following: Full Name, Email, Organisation, Reason for downloading AGROVOC, Comments. AGROVOC is also available through web services; more information is available at: http://www.fao.org/aims/ag_webservices.jsp
International Glossary of Hydrology (1418 entries): This is a multilingual resource that includes Arabic and English (to view Arabic characters choose Unicode UTF-8).
http://www.disclic.unige.it/glos_idro/indice.php?list=0&lang=ar&style=1
Habitat and Urbanism Glossary (AR-EN-FR): This has 3850 Arabic-English-French entries in PDF.
Elementymology & Elements Multidict (MULTI): This is a multilingual dictionary of the names of chemical elements in many languages. There are alphabetical and numerical lists. Clicking on the name of an element brings up the element information page in the main window. It can be used as a reference.
Zoology Dictionary (EN>AR): This has 2500 terms in alphabetical order.
Glencoe Online: This is a multilingual Mathematics Glossary (AR-EN-ES-KO-RU-UR-VI-ZH) in PDF files, in the form of an alphabetical list with glosses.
This allows online generation of the individual verb forms (I to X) of Arabic verbs from a given triconsonantal root (entered in Arabic letters).
This is more complete than the above. It uses Latin characters for entering the Arabic root, and it works offline. It has been downloaded and works under Windows.
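The kind of output these conjugators produce can be sketched as template filling. This is a minimal sketch, assuming sound (non-weak) triconsonantal roots, a simplified ASCII transliteration ("aa" for a long vowel, "'" for the glottal stop) and only the perfect, 3rd person masculine singular. The pattern table and the function names `apply_pattern` and `verb_forms` are hypothetical and do not describe how either tool works internally; real generation must also handle the Form I vowel class, weak and geminate roots, and consonant assimilation in Form VIII, none of which this sketch attempts.

```python
# Perfect, 3rd person masculine singular patterns for forms I to X.
# Digits 1, 2, 3 are slots for the root consonants (simplified ASCII scheme).
PATTERNS = {
    "I": "1a2a3a",      # Form I vowel fixed to "a" (an assumption)
    "II": "1a22a3a",    # doubled second radical
    "III": "1aa2a3a",   # long vowel after first radical
    "IV": "'a12a3a",
    "V": "ta1a22a3a",
    "VI": "ta1aa2a3a",
    "VII": "in1a2a3a",
    "VIII": "i1ta2a3a",  # infixed -t-, no assimilation handled
    "IX": "i12a33a",
    "X": "ista12a3a",
}


def apply_pattern(pattern: str, root: tuple[str, str, str]) -> str:
    """Replace each radical slot (1, 2, 3) in `pattern` with its consonant."""
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in pattern)


def verb_forms(root: tuple[str, str, str]) -> dict[str, str]:
    """Return forms I-X for a sound triconsonantal root."""
    if len(root) != 3:
        raise ValueError("expected exactly three root consonants")
    return {form: apply_pattern(p, root) for form, p in PATTERNS.items()}


if __name__ == "__main__":
    for form, word in verb_forms(("k", "t", "b")).items():
        print(f"{form:>4}  {word}")
```

For the root k-t-b this prints kataba, kattaba, kaataba, 'aktaba, takattaba, takaataba, inkataba, iktataba, iktabba and istaktaba. The patterns are applied mechanically, so a form is produced whether or not the verb is actually attested (Form IX of k-t-b, for instance, is not a real verb).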
An interesting and useful online tool. Arabic Morfix offers extensive morphological search capabilities and is a standalone search engine. This tool is a demonstration based on a collection of 200 articles containing general news items from various sources. Its search takes into account the following features: context sensitivity, expanded morphological search, thesaurus search, and queries entered in Latin transcription for Arabic names.
Offline Conjugator
www.geocities.com/effel_dahling
www.comp.leeds.co.uk
www.freshmeat.net
Concordancers are tools whose main tasks are searching, sorting and classifying words; they are a real help when working with corpora.
R. Al-Shalabi (1996). Design and implementation of an Arabic morphological system to support natural language processing. Ph.D. Dissertation, Computer Science Department, Illinois Institute of Technology, Chicago, 1996.
Sabri Elkateb (2005). Design and implementation of an English Arabic dictionary/editor. PhD thesis, The University of Manchester, United Kingdom.
Sebawai Morphological Analyzer.
Al-Stem light stemmer.
Allan Ramsay, Hanady Mansur (2000). "Arabic Morphology: a categorial approach".
Alshalabi, R. and Evens, M. (1998). "A Computational Morphology System for Arabic". In Workshop on Computational Approaches to Semitic Languages, COLING-ACL98, August 16, Montreal, 1998.
Azza Abdel Monem, Khaled Shaalan, Ahmed Rafea, Hoda Baraka (). "A Proposed Approach for Generating Arabic from Interlingua in a Multilingual Machine Translation System".
Chiang, David, Mona Diab, Nizar Habash, Owen Rambow and Safi Sharif. 2006. Parsing Arabic Dialects. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics. Trento, Italy.
Chowdhury, A., Aljlayl, M., Jensen, E., Beitzel, S., Grossman, D., Frieder, O. (2002). "IIT at TREC 2002 Linear Combinations Based on Document Structure and Varied Stemming for Arabic Retrieval". The Eleventh Text Retrieval Conference (TREC 2002).
Diab, Mona, Kadri Hacioglu and Daniel Jurafsky (2004). Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks. In Proceedings of HLT-NAACL 2004.
Dichy, J. (2001). "On Lemmatization in Arabic – A Formal Definition of the Arabic Entries of Multilingual Lexical Databases". Proc. of the Workshop on Arabic Language Processing, Toulouse, 2001.
Dichy, J. / A. Farghaly (2003). "Roots & Patterns vs. Stems: on what grounds should a multilingual database centred on Arabic be built?", in Proceedings of the MT Summit IX Workshop on Machine Translation for Semitic Languages: Issues and Approaches, September 23, 2003, New Orleans, Louisiana, U.S.A.
Elkateb, S., Black, W., Rodriguez, H., Alkhalifa, M., Vossen, P., Pease, A. and Fellbaum, C. (2006). Introducing a WordNet for Arabic. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy.
El-Sadany, T. A. and M. A. Hashish (1989). "An Arabic Morphological System". IBM Systems Journal, Vol. 28, No. 4, 600-612, 1989.
Feddagi, A. (1992). "Arabic Morpho-syntax and semantic parsing", Department of Computer Science, University of Manchester. 3rd International Conference on Multilingual, 10-12 Dec., 1992, Univ. of Durham.
Franz, M., McCarley, J. S. (2002). "Arabic Information Retrieval at IBM". The Eleventh Text Retrieval Conference (TREC 2002). Presents two models for cross-language IR (English queries, Arabic documents).
George Anton Kiraz (1994). "Computational Analysis of Arabic Morphology". In Narayanan A. and Ditters E. (eds), The Linguistic Computation of Arabic.
Habash, Nizar, Owen Rambow and George Kiraz. Morphological Analysis and Generation for Arabic Dialects. In Proceedings of the Workshop on Computational Approaches to Semitic Languages at the Conference of the Association for Computational Linguistics (ACL'05).
Habash, Nizar and Owen Rambow. Arabic Tokenization, Morphological Analysis, and Part-of-Speech Tagging in One Fell Swoop. In Proceedings of the Conference of the Association for Computational Linguistics (ACL'05).
Habash, Nizar. Large Scale Lexeme Based Arabic Morphological Generation. In Proceedings of Traitement Automatique du Langage Naturel (TALN-04). Fez, Morocco, 2004.
Habash, Nizar and Owen Rambow. MAGEAD: A Morphological Analyzer and Generator for the Arabic Dialects. In Proceedings of COLING-ACL, Sydney, Australia, 2006 (Main Volume).
Habash, Nizar and Owen Rambow. A Morphological Analyzer for MSA and the Arabic Dialects. Presented at the Arabic Linguistic Society annual meeting, Kalamazoo. 2006.
Habash, Nizar. "Arabic Morphological Representations for Machine Translation." Book Chapter. In Arabic Computational Morphology: Knowledge-based and Empirical Methods. Editors Antal van den Bosch and Abdelhadi Soudi. Kluwer/Springer Publications, 2007.
Habash, Nizar, Bonnie Dorr and Christof Monz. Challenges in Building an Arabic Generation-heavy Machine Translation System and Extending it with Statistical Components. In Proceedings of the Association for Machine Translation in the Americas (AMTA-2006).
Habash, Nizar and Fatiha Sadat. Arabic Preprocessing Schemes for Statistical Machine Translation. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), New York, 2006.
Habash, Nizar, Abdelhadi Soudi, and Tim Buckwalter. "On Arabic Transliteration." Book Chapter. In Arabic Computational Morphology: Knowledge-based and Empirical Methods. Editors Antal van den Bosch and Abdelhadi Soudi. Kluwer/Springer Publications, 2007.
Habash, Nizar. "On Arabic and its Dialects," Multilingual Magazine. Volume 21 Number 3. 2006.
Habash, Nizar and Owen Rambow. Extracting a Tree Adjoining Grammar from the Penn Arabic Treebank. In Proceedings of Traitement Automatique du Langage Naturel (TALN-04). Fez, Morocco, 2004.
Habash, Nizar, Clinton Mah, Randy Calistri-Yeh, Sabiha Imran and Paraic Sheridan. The Design and Validation of an Arabic Conceptual Interlingua for Information Retrieval. In Proceedings of the International Conference on Language Resources and Evaluation (LREC). 2006.
Haidar M. Harmanani, Walid T. Keirouz, Saeed Raheel (). "A rule-based extensible stemmer for information retrieval with application to Arabic".
Hasnah, A. / Evens, M. (2001). "Arabic/English Cross Language Information Retrieval Using a Bilingual Dictionary", in: Proceedings of the ACL/EACL 2001 Workshop on Arabic Language Processing: Status and Prospects, July 6, 2001, Toulouse, France.
Hassan Sawaf, Jörg Zaplo, Hermann Ney (2001). "Statistical Classification Methods for Arabic News Articles".
Hudson, G. (1986). "Arabic Root and Pattern Morphology without Tiers". Journal of Linguistics, 22:85-122.
Imad A. Al-Sughaiyer and Ibrahim A. Al-Kharashi (). "Arabic Morphological Analysis Techniques: A Comprehensive Survey". 25 pages; very good. See the Sakhr link there.
Imad A. Al-Sughaiyer and Ibrahim A. Al-Kharashi (). "Rule Parser for Arabic Stemmer".
Jawad Berri, Hamza Zidoum and Yacine Atif (2001). "Web-based Arabic Morphological Analyzer". In: A. Gelbukh (ed.): CICLing 2001, No. 2004 in Lecture Notes in Computer Science.
John Maloney and Michael Niv (). "TAGARAB: A Fast, Accurate Arabic Name Recognizer Using High-Precision Morphological Analysis".
Judith Dror (). "Morphological Tagging of the Qur'an", Department of Arabic Language and Literature, University of Haifa.
Kadri, Y. (2003). "Recherche d'information translinguistique sur les documents en arabe" [Cross-language information retrieval on Arabic documents], predoctoral report, DIRO, Université de Montréal.
Kenneth R. Beesley (1996). "Arabic Finite-State Morphological Analysis and Generation". In Using Xerox tools for Arabic morphology.
Kenneth R. Beesley (1998). "Arabic Morphological Analysis on the Internet". In Proceedings of the International Conference on Multi-Lingual Computing (Arabic & English), Cambridge, G.B., 17-18 April, 1998. Using Xerox tools for Arabic morphology.
Kenneth R. Beesley (2001). "Finite-State Morphological Analysis and Generation of Arabic at Xerox Research: Status and Plans in 2001". Using Xerox tools for Arabic morphology.
Kareem Darwish, Douglas W. Oard (). "Term Selection for Searching Printed Arabic".
Kareem Darwish, Douglas W. Oard (). "Probabilistic Structured Query Methods".
Kazem Taghva, Rania Elkhoury, Jeffrey S. Coombs (2005). "Arabic Stemming Without A Root Dictionary". ITCC (1) 2005: 152-157.
More works by Kazem Taghva can be found at: http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/t/Taghva:Kazem.html
Maamouri, Mohamed, Ann Bies, Tim Buckwalter, Mona Diab, Nizar Habash, Owen Rambow, Dalila Tabessi. Developing and Using a Pilot Dialectal Arabic Treebank. In Proceedings of the International Conference on Language Resources and Evaluation (LREC). 2006.
Mahtab Nikkhou, Khalid Choukri (2005). "Survey on Arabic Language Resources and Tools in the Mediterranean Countries", Nemlar Report, March 2005.
Mark Sanderson, Asaad Alberair (2001). "Keep it simple Sheffield - a KISS approach to the Arabic track".
Rambow, Owen, Bonnie Dorr, David Farwell, Rebecca Green, Nizar Habash, Stephen Helmreich, Eduard Hovy, Lori Levin, Keith J. Miller, Teruko Mitamura, Florence Reeder, Advaith Siddharthan. Parallel Syntactic Annotation of Multiple Languages. In Proceedings of the International Conference on Language Resources and Evaluation (LREC). 2006.
Snider, Neal and Mona Diab. Unsupervised Induction of Modern Standard Arabic Verb Classes Using Syntactic Frames and LSA. In Proceedings of the Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics (ACL-Coling'06). Sydney, Australia. 2006.
Snider, Neal and Mona Diab. Unsupervised Induction of Modern Standard Arabic Verb Classes. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), New York, 2006.
René Schneider, Thomas Mandl, and Christa Womser-Hacker (). "Integration of Arabic to a Cross-Lingual Retrieval Tool: Challenges and Perspectives".
Riyad Al-Shalabi and Martha Evens (1998). "A computational morphology system for Arabic". In Michael Rosner, editor, Proceedings of the Workshop on Computational Approaches to Semitic Languages, pages 66-72, Montreal, Quebec, August. COLING-ACL'98.
Saliba, B. and Al Dannan, A. (1989). "Automatic Morphological Analysis of Arabic: A Study of Content Word Analysis", In Proceedings of the Kuwait Computer Conference, Kuwait, March 3-5, 1989.
Sabri El-Kateb, William J. Black (2004). "English-Arabic Dictionary for translation".
Sabri El-Kateb, William J. Black (2001). "Towards the design of English-Arabic terminological and lexical knowledge base".
Schramm, G. (1962). An Outline of Classical Arabic Verb Structure. Language, vol. 38, pp. 360-75.
Shereen Khoja (2001). "APT: Arabic Part-of-speech Tagger". Proceedings of the Student Workshop at the Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL2001), Carnegie Mellon University, Pittsburgh, Pennsylvania, June 2001. http://www.comp.lancs.ac.uk/computing/users/khoja/NAACL.pdf
Soudi, A., Eisele, A. (2004). "Generating an Arabic Full-Form Lexicon for Bidirectional Morphology Lookup", in Proceedings of the Language Resources and Evaluation Conference (LREC), Lisbon, Portugal.
Soudi, A., Cavalli-Sforza, V., Jamari, A. (2001). "A Computational Lexeme-based Treatment of Arabic Morphology", in Proceedings of the Arabic Processing Workshop, Association for Computational Linguistics, Toulouse, France, 2001.
Tomlinson, S. (2002). "Experiments in Named Page Finding and Arabic Retrieval with Hummingbird". Eleventh Text Retrieval Conference (TREC 2002).
Violetta Cavalli-Sforza, Abdelhadi Soudi, and Teruko Mitamura (). "Arabic Morphology Generation Using a Concatenative Strategy".