Base Concepts in Wordnets

This document describes the synsets (sets of synonymous word meanings) that are most important in 3 or 4 wordnets for different languages, the so-called Base Concepts. The Base Concepts are the major building blocks on which the other word meanings in the wordnets depend. The importance of a synset is based on two criteria: a high number of relations with other synsets and a high position in the hierarchy. These criteria have been applied independently to 4 different wordnets:

In the near future we will verify this set and, where necessary, extend it on the basis of similar selections in German, French, Czech and Estonian.

The resulting synsets in the local wordnets have been translated to the closest WordNet1.5 synset. Note that a synset can have more than one WordNet1.5 synset as its translation. Next, we selected those WordNet1.5 synsets that occurred in 3 or 4 selections (a set of 1024 synsets selected by at least 2 sites is described in http://www.hum.uva.nl/~ewn). The selection shared by 3 to 4 sites consists of 206 synsets:

In some cases, the selection was rather unbalanced: e.g. dog was selected but not cat. In those cases we abstracted from this level and selected only the hyperonym class. In addition, we added all the WordNet1.5 hyperonyms of these synsets that were not yet part of the selection. The total result is a set of 164 synsets that can be seen as the most important synsets in the wordnets of 4 languages. The set is divided into two parts:

 66 Concrete synsets (all nouns)

 98 Abstract synsets (63 nouns and 35 verbs)
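The two mechanical steps described above (keeping synsets chosen by at least 3 sites, then adding their WordNet1.5 hyperonyms) can be sketched roughly as follows. This is only an illustration: the function names, the site labels and the toy data are invented here and are not taken from the actual selections, and the manual abstraction step for unbalanced cases such as dog/cat is not modelled.

```python
from collections import Counter


def select_shared(selections, min_sites=3):
    """Return the synsets chosen by at least `min_sites` site selections."""
    counts = Counter(s for site in selections.values() for s in set(site))
    return {s for s, n in counts.items() if n >= min_sites}


def add_hyperonyms(selected, hyperonym_of):
    """Extend `selected` with every hyperonym reachable from its members."""
    result = set(selected)
    stack = list(selected)
    while stack:
        synset = stack.pop()
        for parent in hyperonym_of.get(synset, ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


# Toy data, invented for illustration only (hypothetical sites and synsets).
selections = {
    "site_a": {"animal 1", "dog 1", "object 1"},
    "site_b": {"animal 1", "object 1"},
    "site_c": {"animal 1", "dog 1", "object 1"},
    "site_d": {"animal 1", "cat 1"},
}
hyperonym_of = {
    "dog 1": ["animal 1"],
    "cat 1": ["animal 1"],
    "animal 1": ["life form 1"],
    "life form 1": ["entity 1"],
    "object 1": ["entity 1"],
}

shared = select_shared(selections, min_sites=3)
base = add_hyperonyms(shared, hyperonym_of)
```

In this toy run "dog 1" is chosen by only two sites and therefore drops out, while the hyperonyms of the surviving synsets are pulled in even though no site selected them directly.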

We have listed all these synsets in two files where we specify the part-of-speech, the gloss in WordNet1.5, the hyperonyms in WordNet1.5 and the EuroWordNet Top-Concepts that have been assigned, e.g.:

WordNet1.5 Synset: {ability 2; power 3}

Part-of-Speech: n

WordNet1.5 File off-set: 3841132

WordNet1.5 Gloss: possession of the qualities (especially mental qualities) required to do something or get something done

WordNet1.5 Hyperonyms: cognition 1

EuroWordNet Top Concepts: Modal Property
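An entry in this layout could be read into a small record along the following lines. This is a sketch under assumptions: the class name BaseConcept, its field names and the function parse_record are hypothetical, and only the "label: value" layout and the "; " separator inside the synset braces are taken from the example above. Hyperonyms and top concepts are kept as plain strings because the example does not show how multiple values are separated.

```python
from dataclasses import dataclass


@dataclass
class BaseConcept:
    synset: list        # sense-numbered variants, e.g. ["ability 2", "power 3"]
    pos: str            # part-of-speech code as given in the entry
    offset: int         # WordNet1.5 file off-set
    gloss: str
    hyperonyms: str     # kept verbatim (separator for several values unknown)
    top_concepts: str   # kept verbatim


def parse_record(text):
    """Parse one 'label: value' entry into a BaseConcept (hypothetical helper)."""
    raw = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank lines
        key, value = line.split(":", 1)
        raw[key.strip()] = value.strip()
    variants = raw["WordNet1.5 Synset"].strip("{}").split(";")
    return BaseConcept(
        synset=[v.strip() for v in variants],
        pos=raw["Part-of-Speech"],
        offset=int(raw["WordNet1.5 File off-set"]),
        gloss=raw["WordNet1.5 Gloss"],
        hyperonyms=raw["WordNet1.5 Hyperonyms"],
        top_concepts=raw["EuroWordNet Top Concepts"],
    )


entry = """WordNet1.5 Synset: {ability 2; power 3}

Part-of-Speech: n

WordNet1.5 File off-set: 3841132

WordNet1.5 Gloss: possession of the qualities (especially mental qualities) required to do something or get something done

WordNet1.5 Hyperonyms: cognition 1

EuroWordNet Top Concepts: Modal Property
"""

concept = parse_record(entry)
```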

When clicking on the WordNet1.5 hyperonym you will be linked to a file with the WordNet1.5 hyperonym chain that goes with the synset. The same holds for the EuroWordNet top concepts that are given, as is illustrated in these examples. You can also go directly to the EuroWordNet top-ontology or the WordNet hierarchy for Concrete or Abstract synsets.

It is possible to further reduce this set. In some cases, several synsets seem to represent more or less the same concept:

{form 1; shape 1}

{form 6; pattern 5; shape 5}

 

{attribute 1}

{character 2; lineament 2; quality 4}

{property 2}

{quality 1}

{attribute 2; dimension 3; property 3}

 

{fluid 2}

{liquid 4}

 

{instrument 2}

{instrumentality 1; instrumentation 2}

All these examples can be replaced by a single synset or concept. Another way in which the list can be minimized is by removing or reducing Functional classes. In addition to the general class Function, we see many specific realizations of function which happen to be important in the wordnets, e.g. furniture, garment, building. This list can never be complete, because there can be as many functions as there are conceivable roles in a situation. The fact that these functions are selected and not others only means that they are more strongly lexicalized in 4 languages. By minimizing the set in this way we have been able to reduce the 164 synsets to 71 Base Types, which can be organized into an ontology.
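Collapsing groups of near-equivalent synsets into one representative can be sketched as below. The groups are copied from the examples above; the helper names and the choice of the first member as representative are assumptions made for illustration, not the procedure actually used to arrive at the 71 Base Types.

```python
# Groups of near-equivalent synsets, copied from the examples in the text.
GROUPS = [
    ["{form 1; shape 1}", "{form 6; pattern 5; shape 5}"],
    [
        "{attribute 1}",
        "{character 2; lineament 2; quality 4}",
        "{property 2}",
        "{quality 1}",
        "{attribute 2; dimension 3; property 3}",
    ],
    ["{fluid 2}", "{liquid 4}"],
    ["{instrument 2}", "{instrumentality 1; instrumentation 2}"],
]


def build_merge_map(groups):
    """Map every synset to the first member of its group (arbitrary choice)."""
    return {member: group[0] for group in groups for member in group}


def reduce_synsets(synsets, merge_map):
    """Replace merged synsets by their representative, keeping first-seen order."""
    seen = set()
    reduced = []
    for synset in synsets:
        representative = merge_map.get(synset, synset)
        if representative not in seen:
            seen.add(representative)
            reduced.append(representative)
    return reduced


sample = [
    "{fluid 2}",
    "{liquid 4}",
    "{property 2}",
    "{quality 1}",
    "{ability 2; power 3}",
]
reduced = reduce_synsets(sample, build_merge_map(GROUPS))
```

Synsets that belong to no group, such as {ability 2; power 3} here, pass through unchanged.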

This ontology closely resembles the EuroWordNet top-ontology, which has 63 ontological distinctions. The only difference is that the EuroWordNet ontology is organized as a lattice, which makes it possible to derive many more feature combinations, whereas the minimal set of Base Types is a more static hierarchy.

 

Some important remarks:

The list of synsets is not the most minimal set possible. It is also not necessarily the most representative set for any top-level ontology. These Base Concepts at most reflect strong lexicalizations shared by 3 or 4 languages, and as such represent the synsets in the wordnets that are most important for relating all the other synsets. For a more minimal set go to the 71 Base Types.

The synsets are primarily defined in terms of their relations to other concepts. In principle it is possible to create a wordnet without formally defining the concepts (intensionally or extensionally). Using so-called Diagnostic Frames (Cruse 1986), such as "If it is a stallion then it is also a horse", it is possible to establish lexical semantic relations between (most) words (although there are cases where these frames do not elicit answers). For example, it is possible to state that the relation between person-man-woman is the same as the relation between horse-stallion-mare, without explicitly defining what each of these words means. In order to judge the meaning of a synset in a wordnet it is therefore necessary to look primarily at the lexical semantic relations it has with other synsets. The glosses are just clues that help to distinguish senses from each other; they are not formal definitions as such. Consequently, it may happen that the information in the gloss does not match the way a synset has been related to other synsets.
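The idea that relations can be stated and compared without defining the words themselves can be illustrated with a small sketch. The frame wording comes from the example above; the relation table and the helper names are hypothetical, and only the hyperonym relation is modelled.

```python
FRAME = "If it is a {hyponym} then it is also a {hyperonym}"


def diagnostic_sentence(hyponym, hyperonym):
    """Fill in the diagnostic frame for a candidate hyponym/hyperonym pair."""
    return FRAME.format(hyponym=hyponym, hyperonym=hyperonym)


# Relations as they might be recorded after informants accept the frame.
# No word is defined here; only the links between words are stored.
HYPERONYM = {
    "man": "person",
    "woman": "person",
    "stallion": "horse",
    "mare": "horse",
}


def same_pattern(triple_a, triple_b, hyperonym):
    """True if both (general, specific, specific) triples share one structure."""
    def structure(triple):
        top, *rest = triple
        return tuple(hyperonym.get(word) == top for word in rest)
    return structure(triple_a) == structure(triple_b)


sentence = diagnostic_sentence("stallion", "horse")
parallel = same_pattern(
    ("person", "man", "woman"),
    ("horse", "stallion", "mare"),
    HYPERONYM,
)
```

The comparison looks only at which links exist between the words, which is the sense in which the two triples can be called parallel without any definitions.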

Furthermore, the EuroWordNet top ontology and the Base Type Ontology are sometimes in conflict with the WordNet1.5 classification. This specifically holds for "communication 1", "sign 3" and for "amount 1". This is mainly due to the fact that the (sub-)hyponyms of these concepts are concrete whereas the gloss and the hyperonyms suggest that they are abstract.