Christiane Fellbaum, Princeton University, USA
Tutorial on Wordnet, 1st International Wordnet Conference, Mysore, India, 21-25 Jan 2002
===============

PRELIMINARY REMARK:
Many of these comments are based on experience gained during the construction of the Princeton WordNet. WordNet started as an experiment testing theories of human semantic memory and was not initially designed as a tool for NLP. It evolved gradually and underwent many changes and additions, some of which were motivated by specific funding situations. The WordNet experiment now gives us the benefit of hindsight; many shortcomings of the Princeton WordNet were successfully addressed in EuroWordNet.

DESIGN
Design with a purpose or application(s) in mind (e.g., Word Sense Disambiguation, a basic requirement for Machine Translation, Information Retrieval, etc.). This goal affects the degree of polysemy in the lexicon.
Observe international standards as proposed in Mysore (cf. minutes of business meeting).

METHODOLOGY
1. Use corpora and minimize reliance on intuition, esp. when distinguishing senses.
   Use tests (e.g., Cruse, 1986) to distinguish, e.g., hyponyms from synonyms.
   See also the tests listed by Vossen for EWN construction.
2. Acquisition of contents
   -automatically
     --fast
     --requires good Machine Readable Dictionaries
     --requires manual checking
   -manually
     --slow, expensive

CONTENTS
1. Which parts of speech should be included?
   Adjectives are less important than nouns and verbs.
   Adverbs are "icing on the cake" and may be excluded for lack of money or time.
2. Phrasal verbs? Idioms? Noncompositional compound nouns?
3. Decide on Core Vocabulary
   Specific Domains for specific applications
4. Synsets:
   --As there is arguably no absolute synonymy in language, (some) synonym sets can have a single member. Synonyms are most useful for WSD, but definitions (glosses) and/or example sentences can do that work as well.
   --Glosses should be uniform (e.g., superordinate plus distinguishing features)
   --Example sentences should be taken from a corpus and/or the web
5. Polysemy:
   Decide what level of granularity is appropriate for the purpose. Ideally, both fine- and coarse-grained sense distinctions should be available. This can be achieved by clustering related senses (as was done for many verbs in WordNet); the user can then choose to view either fine- or coarse-grained distinctions.
6. Conceptual-semantic and Lexical Relations:
   Number of Relations: Find a balance between too many and too few.
   E.g., how many kinds of meronymy are truly useful?
   Consider a distinction between two kinds of hyponymy, ISA and function/role, as in
     dog ISA mammal; dog HAS ROLE pet
   Cross-part of speech links:
   --Verb-noun and adjective-noun links show important information about subcategorization properties and selectional restrictions. They can serve to disambiguate as well.
   --Morphologically and semantically related words (as currently done for WN 2.0)
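
EXAMPLE SKETCH
The content decisions above (single-member synsets, uniform glosses, clustered senses, and the ISA vs. HAS ROLE distinction) could be represented roughly as follows. This is a minimal in-memory sketch, not the Princeton WordNet database format: the names Synset, Lexicon, Relation, and the cluster field are hypothetical, and the glosses, examples, sense identifiers, and cluster labels are made up for illustration rather than taken from a corpus.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Relation(Enum):
    # The two kinds of hyponymy are kept apart, as suggested in the notes.
    ISA = "ISA"
    HAS_ROLE = "HAS ROLE"


@dataclass
class Synset:
    synset_id: str                 # hypothetical identifier scheme
    pos: str                       # part of speech: "n", "v", "a", ...
    lemmas: List[str]              # may contain a single member
    gloss: str                     # superordinate plus distinguishing features
    examples: List[str] = field(default_factory=list)
    cluster: Optional[str] = None  # label shared by related fine-grained senses
    relations: List[Tuple[Relation, str]] = field(default_factory=list)


class Lexicon:
    def __init__(self) -> None:
        self.synsets: Dict[str, Synset] = {}

    def add(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset

    def senses(self, lemma: str, coarse: bool = False) -> List[List[str]]:
        """Return the senses of a lemma as groups of synset ids.

        Fine-grained view: every synset forms its own group.
        Coarse-grained view: synsets sharing a cluster label are merged;
        unclustered synsets stay on their own.
        """
        hits = [s for s in self.synsets.values() if lemma in s.lemmas]
        if not coarse:
            return [[s.synset_id] for s in hits]
        groups: Dict[str, List[str]] = {}
        for s in hits:
            key = s.cluster if s.cluster is not None else s.synset_id
            groups.setdefault(key, []).append(s.synset_id)
        return list(groups.values())

    def targets(self, synset_id: str, relation: Relation) -> List[str]:
        """Return the ids linked from a synset by one relation type."""
        links = self.synsets[synset_id].relations
        return [target for rel, target in links if rel is relation]


def build_demo() -> Lexicon:
    """Build a tiny lexicon with illustrative (invented) entries."""
    lex = Lexicon()
    lex.add(Synset("mammal.n.1", "n", ["mammal"],
                   "an animal that nourishes its young with milk"))
    lex.add(Synset("pet.n.1", "n", ["pet"],
                   "an animal kept for companionship"))
    lex.add(Synset("dog.n.1", "n", ["dog"],
                   "a mammal that barks and is commonly domesticated",
                   relations=[(Relation.ISA, "mammal.n.1"),
                              (Relation.HAS_ROLE, "pet.n.1")]))
    lex.add(Synset("break.v.1", "v", ["break"],
                   "separate into pieces", cluster="break:damage"))
    lex.add(Synset("break.v.2", "v", ["break"],
                   "make inoperable", cluster="break:damage"))
    lex.add(Synset("break.v.3", "v", ["break", "interrupt"],
                   "stop an ongoing activity"))
    return lex
```

With this demo data, lex.senses("break") yields three separate senses, while lex.senses("break", coarse=True) merges the two senses that share a cluster label and yields two groups. Keeping the cluster as a label on each synset, rather than deleting fine-grained senses, is one way to offer both views to the user.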