Arabic NLP Resources for the Arabic WordNet Project
William BLACK, Sabri ELKATEB
School of Informatics, University of Manchester, Sackville Street, Manchester, M60 1QD
w.black@manchester.ac.uk, sabri.elkateb@manchester.ac.uk

Manuel BERTRAN, Xavier FARRERES, David FARWELL, Reda HALKOUM, Horacio RODRIGUEZ
Politechnical University of Catalonia
horacio@lsi.upc.edu, mbertran@lsi.upc.edu

Musa ALKHALIFA, Tànit ASSAF, M.A. MARTI
University of Barcelona, Gran Via 585, 08007-Barcelona
{musa,tanit}@thera-clic.com, amarti@ub.edu

Piek VOSSEN
Irion Technologies, Delftechpark 26, 2628XH, Delft, The Netherlands
piek.vossen@irion.nl

Adam PEASE
Articulate Software Inc, 420 College Ave, Angwin, CA 94508
apease@articulatesoftware.com

Christiane FELLBAUM
Princeton University, Department of Psychology, Green Hall, Princeton, NJ 08544
fellbaum@clarity.princeton.edu
2. OPEN DOMAIN LEXICAL RESOURCES
2.1 Arabic Monolingual Corpora
2.2 Arabic/English/... Parallel Corpora
2.3 Arabic Monolingual Dictionaries and Lexicons
2.4 Arabic/English Bilingual Dictionaries and Lexicons
2.4.1 Printed bilingual dictionaries
2.4.2 On-line MRD (Machine Readable Dictionaries)
2.5 Lexicons obtained from (selective) access to online MT systems
3. DOMAIN RESTRICTED LEXICAL RESOURCES
3.2 Agriculture and related domains
4. OTHER LINGUISTIC RESOURCES
4.2 Arabic Dependency Treebank
5.1 Morphological Analyzers
5.5 Other Arabic NL Processors
7. SLIGHTLY COMMENTED BIBLIOGRAPHY
This has information about the Sakhr English-Arabic dictionary and useful information on Arabic grammar and Arabic language technology in general.
COUNTRY NAME, AIDS, agriculture, atmospheric science, biodiversity,
bioscience, budget and management, cartography and geography, child
welfare, climate change, codes and regulations, communication, core
concept, culture, declarations, demographics, development,
disarmament, disasters, discrimination, documents, economics,
education and training, energy, environment, export controls and
sanctions, finance, fisheries, food, forestry, functional and other
titles, geoscience, governance, Greek, habitat, health and medicine,
human rights, humanitarian issues, indigenous peoples, information
technology, intellectual property, international law, international
relations, international trade, labour, landmines and mine action,
Latin, law enforcement, law of the sea, logistics and supplies,
meetings, migrations and refugees, military abbreviations, military
issues, multilateral instruments, narcotic drugs, national law,
natural resources, nuclear science, oceanography, organizational
structure, peace and security, peace operations, plans of action and
initiatives, political life, poverty, religions, science and
technology, set phrases, small arms, social issues, space, staff
matters, statistics, TALOS, terrorism, transport and communications,
water, weapons of mass destruction, women.
UNESCOTERM Search (AR-DE-EN-ES-FR-RU-ZH):
This can be used as a reference, and its content can be extracted. It includes terms related to UNESCO, such as administrative and financial terms, education, conferences and meetings, etc.
UNESCO Structures,
Superseded UNESCO Structures, Institutions: (IGOs, NGOs, Networks,
Systems, Foundations), IOC: Titles, Terms and Acronyms,
Administrative and Financial Terms, International (Days, Weeks,
Years and Decades), Campaigns and Appeals, UNESCO's Member States,
UNESCO's Standard-Setting Instruments, International Prizes,
(Non-Member States, Non-Self-governing Territories, Dependent
Territories etc.), UNESCO Chairs, Miscellaneous, UN and International
Legal Instruments, UNESCO Functions and Titles, (Conferences,
Meetings etc.), Terms in the field of Education, (UNESCO's
Programmes, Projects, Initiatives), (International Programmes,
Projects, Initiatives), Former Institutions: (IGOs, NGOs, Networks,
Systems, Foundations)
A copy of this CD-ROM is available from khayat@emro.who.int
The domains include (numbers refer to the number of entries in the domains sampled):
All specialised UMD dictionaries: Abbreviations (799 entries), Acidology (1669), Acronyms (248), Anatomy (2000), Anesthesiology (484), Anthropology and anthropometrics (1427), Bacteriology (1827), Biochemistry and Chemistry (2000), Biology, Biomedical engineering, Biomedical ethics, Biostatistics, Blood transfusion medicine, Botany, Cardiology and cardiovascular surgery, Cell biology, Demography, Dentistry (2000), Dermatology, Diagnostics (symptoms & signs), Embryology & teratology, Emergency medicine, Endocrinology & metabolism, Entomology, Environmental health, Enzymology and Zymology, Family and community medicine, Food safety, Forensic medicine, Gastroenterology, Genitourinary medicine, venereology and STDs, Health services, Helminthology, Hematology, Histology, Hospital administration, Immunology, Infectious diseases, Informatics, Laboratory medicine, Maternal and child health, Measures, Microbiology, Mycology, Nephrology, Neurology, Nutrition and dietetics, Obstetrics and gynecology, Occupational medicine, industrial medicine, Oncology, Ophthalmology and optics, Orthopedics, Otorhinolaryngology, Parasitology, Pathology, Pediatrics, Pharmacology and therapeutics, Physiatrics and physical medicine, Physiology, Prefixes, Preventive medicine, Public health, community medicine and hygiene, Reproductive health, Sexology, Suffixes, Surgery, Taxonomy, nosology and classification (1118), Toxicology, Transplantation, Tropical medicine, Virology, WHO managerial terms (2000), Zoology (997).
You can download a copy of AGROVOC from
http://www.fao.org/aims/ag_download.htm
Each descriptor has its equivalent in other languages. Descriptors are indexing terms which consist of one or more words representing one and the same concept. Non-descriptors are terms which help the user to find the appropriate descriptor(s). Non-descriptors are followed by a reference (USE operator) to the descriptor, which is the preferred term. For indexing purposes, it is important that only descriptor terms are used.
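The relation between descriptors and non-descriptors described above can be sketched as a small lookup structure. This is only an illustrative sketch, not the AGROVOC schema: the `Term` class, the `resolve` and `index_terms` helpers, and the three sample entries are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Term:
    """One thesaurus entry (hypothetical structure, not the AGROVOC schema)."""
    label: str
    use: Optional[str] = None  # None for a descriptor; otherwise the preferred term


def resolve(thesaurus: Dict[str, Term], label: str) -> str:
    """Follow the USE reference of a non-descriptor to its descriptor."""
    term = thesaurus[label]
    return term.label if term.use is None else term.use


def index_terms(thesaurus: Dict[str, Term], labels: Iterable[str]) -> List[str]:
    """Map candidate terms to descriptors only, dropping duplicates."""
    resolved: List[str] = []
    for label in labels:
        descriptor = resolve(thesaurus, label)
        if descriptor not in resolved:
            resolved.append(descriptor)
    return resolved


# Invented sample entries, for illustration only.
THESAURUS = {
    "maize": Term("maize"),
    "corn": Term("corn", use="maize"),  # non-descriptor: corn USE maize
    "soil": Term("soil"),
}

print(index_terms(THESAURUS, ["corn", "soil", "maize"]))
```

Resolving every candidate term before indexing is one simple way to guarantee that only descriptors end up in the index.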
AGROVOC is available in 9 languages: the five FAO official languages (which are English, French, Spanish, Chinese and Arabic), Czech, Portuguese, Japanese and Thai. Other languages like German, Italian, Korean, Hungarian, Slovak and Lao are currently being prepared.
The AGROVOC website states clearly that AGROVOC is free of charge for
educational or other strictly non-commercial purposes.
AGROVOC
is available for downloading in MySQL, TagText, ISO2709 and Microsoft
Access formats. To download the AGROVOC database for off-line use,
please send your request to fao-agris-caris@fao.org. When sending the
request please specify the following: Full Name, Email, Organisation,
Reason for downloading AGROVOC, Comments. AGROVOC is also available
through web services. More information is available here:
http://www.fao.org/aims/ag_webservices.jsp
International Glossary of Hydrology (1418 entries): This is a multilingual resource that includes Arabic and English (to view Arabic characters choose Unicode UTF-8).
http://www.disclic.unige.it/glos_idro/indice.php?list=0&lang=ar&style=1
Habitat and Urbanism Glossary (AR-EN-FR):
This has 3850 Arabic-English-French entries in PDF.
Elementymology & Elements Multidict (MULTI):
This is a multilingual dictionary of the names of chemical elements
in many languages. There are alphabetical and numerical lists.
Clicking on the name of an element brings the element information
page up in the main window. It can be used as a reference.
Zoology Dictionary (EN>AR):
This has 2500 terms in alphabetical order.
Glencoe Online:
This is a multilingual Mathematics Glossary (AR-EN-ES-KO-RU-UR-VI-ZH)
in PDF files, in the form of an alphabetical list with glosses.
This allows the online generation of the individual verb forms
(from I to X) of an Arabic verb from a given tri-consonantal root
(entered in Arabic letters).
This is more complete than the above. The Arabic root is entered in
Latin characters, and the tool runs off line. It has been downloaded
and it works under Windows.
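The kind of derivation these conjugators perform, going from a tri-consonantal root to the ten verb forms, can be sketched as pattern filling. This is a minimal sketch, not the implementation of either tool. It assumes an ad hoc Latin transliteration (`aa` for long a, `'` for hamza), gives only the third person masculine singular perfect, fixes the Form I middle vowel to `a` although it varies by verb, and ignores weak roots and the consonant assimilation that occurs in Form VIII. The names `FORM_PATTERNS` and `derive_forms` are hypothetical.

```python
# Perfect-tense templates for Forms I to X; {0}, {1}, {2} are the root consonants.
FORM_PATTERNS = {
    "I": "{0}a{1}a{2}a",        # fa'ala (middle vowel simplified to a)
    "II": "{0}a{1}{1}a{2}a",    # fa''ala
    "III": "{0}aa{1}a{2}a",     # faa'ala
    "IV": "'a{0}{1}a{2}a",      # 'af'ala
    "V": "ta{0}a{1}{1}a{2}a",   # tafa''ala
    "VI": "ta{0}aa{1}a{2}a",    # tafaa'ala
    "VII": "in{0}a{1}a{2}a",    # infa'ala
    "VIII": "i{0}ta{1}a{2}a",   # ifta'ala
    "IX": "i{0}{1}a{2}{2}a",    # if'alla
    "X": "ista{0}{1}a{2}a",     # istaf'ala
}


def derive_forms(root: str) -> dict:
    """Fill each template with a root written as 'c1-c2-c3', e.g. 'k-t-b'."""
    consonants = root.split("-")
    if len(consonants) != 3 or not all(consonants):
        raise ValueError("expected a tri-consonantal root such as 'k-t-b'")
    return {form: pattern.format(*consonants)
            for form, pattern in FORM_PATTERNS.items()}


for form, stem in derive_forms("k-t-b").items():
    print(form, stem)
```

The sketch generates all ten patterns mechanically; in practice a given root is attested in only some of the forms, so a real conjugator also needs a lexicon to filter the output.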
This is an interesting and useful online tool. Arabic Morfix offers extensive morphological search and is a standalone search engine. The tool is a demonstration based on a collection of 200 articles containing general news items from various sources. Its search takes the following features into account: context sensitivity, expanded morphological search, thesaurus search, and queries entered in Latin transcription for Arabic names.
Off line Conjugator
www.geocities.com/effel_dahling
www.comp.leeds.co.uk
www.freshmeat.net
Concordancers are tools whose main tasks are searching, sorting and classifying words; they are a real help when manipulating a corpus.
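The core of a concordancer, the keyword-in-context (KWIC) search, can be sketched in a few lines. This is a minimal illustration and not the code of any tool listed here: the function names, the sample sentence and the context width are assumptions. The `\w+` tokenizer works on plain Unicode letters, so it would need adjusting for vocalized Arabic text, where diacritics would split words.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Very simple tokenizer: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens: List[str], keyword: str, width: int = 2) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each occurrence,
    sorted by right context, as concordancers commonly offer."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return sorted(lines, key=lambda line: line[2])


# Invented sample text, for illustration only.
sample = "The cat sat on the mat and the dog sat too"
tokens = tokenize(sample)

for left, key, right in kwic(tokens, "sat"):
    print(f"{left:>10} [{key}] {right}")

# Word frequencies, a simple form of classifying the vocabulary.
print(Counter(tokens).most_common(2))
```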