|
The Berkeley FrameNet project is creating an on-line lexical resource
for English, based on frame semantics and supported by
corpus evidence. The aim is to document the range of semantic and
syntactic combinatory possibilities (valences) of each word in each of
its senses, through computer-assisted annotation of example sentences
and automatic tabulation and display of the annotation results. The
major product of this work, the FrameNet lexical database, currently
contains more than 11,600 lexical units (defined below), more than
6,800 of which are fully annotated, in more than 960 semantic frames,
exemplified in more than 150,000 annotated sentences. It has gone
through five releases, and is now in use by hundreds of researchers,
teachers, and students around the world.
Please have a look at our work:
- Type a word into the "Search" box (upper left) to see if FrameNet has it.
- Use the FrameGrapher to browse the network of frames, or
- Click on the "View FrameNet data" link at the left.
Active research projects are now seeking to produce
comparable frame-semantic lexicons for other languages and to devise
means of automatically labeling running text with semantic frame
information.
A lexical unit (LU) is a pairing of a word with a meaning.
Typically, each sense of a polysemous word belongs to a different
semantic frame, a script-like conceptual structure that
describes a particular type of situation, object, or event and the
participants and props involved in it. For example, the Apply_heat
frame describes a common situation involving a Cook, some Food, and a
Heating_instrument, and is evoked by words such as
bake, blanch, boil, broil, brown, simmer, and steam. We
call roles such as Cook and Food frame elements (FEs), and the frame-evoking
words are LUs in the Apply_heat frame. Some frames are more
abstract, such as Change_position_on_a_scale, evoked by LUs such
as decline, decrease, gain, plummet, and rise, with FEs
such as Item, Attribute, Initial_value and Final_value.
In the simplest case, the frame-evoking LU is a verb and the FEs are
its syntactic dependents:
[Cook Matilde] fried [Food the catfish]
[Heating_instrument in a heavy iron skillet]
[Item Colgate's stock] rose [Difference $3.64]
[Final_value to $49.94]
but LUs can also be event nouns such as reduction in the
Cause_change_of_scalar_position frame:
...the reduction [Item of debt levels]
[Value_2 to $665 million] [Value_1 from $2.6 billion]
or adjectives such as asleep in the Sleep frame:
[Sleeper They] [Copula were] asleep
[Duration for hours]
The lexical entry for a predicating word, derived from such
annotations, identifies the frame which underlies a given meaning and
specifies the ways in which FEs are realized in structures headed by
the word.
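
The bracketed examples above can be read mechanically. The following is a minimal sketch, assuming only the plain-text notation used on this page ("[Label text]"); it is not the format of the FrameNet data release, and the names SPAN and parse_annotation are hypothetical. Note that a label such as Copula marks a span on another annotation layer rather than an FE, so the function speaks of labeled spans in general.

```python
import re

# One bracketed span: "[Label text ...]". The label is the first
# whitespace-delimited token after the opening bracket.
SPAN = re.compile(r"\[(\S+)\s+([^\]]+)\]")

def parse_annotation(line):
    """Split one bracketed example into labeled spans and leftover text.

    Returns (labeled_spans, remainder): labeled_spans is a list of
    (label, text) pairs in sentence order; remainder is the text left
    outside the brackets, which in the examples above contains the
    frame-evoking word.
    """
    labeled_spans = [(m.group(1), m.group(2).strip()) for m in SPAN.finditer(line)]
    remainder = " ".join(SPAN.sub(" ", line).split())
    return labeled_spans, remainder

spans, remainder = parse_annotation(
    "[Cook Matilde] fried [Food the catfish] "
    "[Heating_instrument in a heavy iron skillet]"
)
for label, text in spans:
    print(f"{label}: {text}")
print("outside brackets:", remainder)
```

For the first example sentence, the text left outside the brackets is the target verb, fried.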
Many common nouns, such as artifacts like hat or
tower, typically serve as dependents rather than
clearly evoking their own frames. When we annotate such lexical
units, the main purpose is to identify the most common predicates that
govern phrases headed by them, and thus to illustrate the
ways in which these common nouns function as FEs within frames evoked
by the governing predicates.
We do recognize that artifact and natural kind nouns also
have a minimal frame structure of their own. For example, artifacts
often occur together with expressions indicating their sub-type, the
material of which they are made, their manner of production, and their
purpose/use; these are defined as FEs in the frames for various types
of artifacts. However, the frames evoked by artifact and natural kind
nouns rarely dominate the clauses in which they occur, and so we
seldom select them as targets of annotation.
Formally, the annotation of each sentence is a constellation of triples
that together make up the frame element realization for that sentence.
Each triple consists of a frame element (for example, Food), a
grammatical function (say, Object), and a
phrase type (say, NP). We think of these three
types of annotation on each tagged frame element as "layers" and
they are displayed as such in the annotation software used in the
project, but the grammatical function and phrase type layers are not
displayed in the web-based report system, to avoid visual clutter. The
full data, available as part of the data download (see FNdata),
includes these three layers (and several more not discussed here) for
all of the annotated sentences, along with complete frame and FE
descriptions, frame-frame relations, and lexical entries summarizing
the valence patterns for each annotated LU.
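
As a rough illustration of the triples just described, the sketch below stores each one as a small record and tallies how often each constellation occurs, which is one way a valence summary could be derived. It is an assumption-laden sketch, not the project's software: the names FERealization and valence_patterns are hypothetical, and apart from the Food/Object/NP triple mentioned above, the grammatical function and phrase type labels are illustrative guesses.

```python
from collections import Counter
from typing import NamedTuple

class FERealization(NamedTuple):
    """One annotation triple; the three fields mirror the three layers."""
    frame_element: str         # e.g. "Food"
    grammatical_function: str  # e.g. "Object"
    phrase_type: str           # e.g. "NP"

# Hypothetical annotated sentences for one LU. Only Food/Object/NP comes
# from the text; the other labels are assumptions made for illustration.
COOK = FERealization("Cook", "External", "NP")
FOOD = FERealization("Food", "Object", "NP")
INSTRUMENT = FERealization("Heating_instrument", "Dependent", "PP")

sentences = [
    [COOK, FOOD],
    [COOK, FOOD],
    [COOK, FOOD, INSTRUMENT],
]

def valence_patterns(annotated_sentences):
    """Count how often each constellation of triples occurs."""
    return Counter(tuple(triples) for triples in annotated_sentences)

patterns = valence_patterns(sentences)
for pattern, count in patterns.most_common():
    print(count, [t.frame_element for t in pattern])
```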
The FrameNet database is a lexical resource with unique
characteristics that differentiate it from other resources such as
commercially available dictionaries and thesauri and from the
best-known lexical resource, WordNet.
- Like dictionary subentries, our lexical units come with
definitions, either from the Concise Oxford Dictionary, 10th Edition
(courtesy of Oxford University Press) or a definition written by a
FrameNet staff member.
- Unlike commercial dictionaries, we provide multiple annotated
examples of each sense of a word (i.e. each lexical unit). Moreover,
the set of examples (roughly 20 per LU) is meant to illustrate
all of the combinatorial possibilities of the lexical unit.
- The examples we provide are attestations taken from naturalistic
corpora rather than constructed by a linguist or lexicographer. The
main FrameNet corpus is the 100-million-word British National Corpus
(BNC), which is both large and balanced across genres (editorials,
textbooks, advertisements, novels, sermons, etc.), but, of course,
lacks many specifically American expressions. We are also using
U.S. newswire texts provided by the Linguistic Data Consortium, and
have recently acquired the newly released initial part of the
American National Corpus, which we plan to begin using soon.
- Our analysis of the English lexicon proceeds frame by frame
rather than by lemma, whereas traditional dictionary-making proceeds
word by word through the alphabet. Thus, while a traditional
lexicographer measures progress in words completed, FrameNet measures
progress in frames completed. The fact that there are one or more LUs
for a given word in completed frames does not mean that there could
not be other LUs for the same word in future frames.
- Each lexical unit is linked to a semantic frame, and hence to
the other words which evoke that frame. This makes the FrameNet
database similar to a thesaurus, grouping together semantically
similar words.
- All ontologies and WordNet provide some sort of hierarchical
relations between their nodes; likewise, FrameNet includes a network
of relations between frames. Several types are defined, of which the
most important are:
- Inheritance: The child frame is a subtype of the parent frame,
and each FE in the parent is bound to a corresponding FE in the
child. (An IS-A relation.)
- Using: The child frame presupposes the parent frame as
background, e.g. the Speed frame "uses" (or presupposes) the Motion
frame; however, not all parent FEs need to be bound to child FEs.
- Subframe: The child frame is a subevent of a complex event
represented by the parent, e.g. the Criminal_process frame has
subframes of Arrest, Arraignment, Trial, and Sentencing.
These frame-frame relations are shown in the frame reports
(see the Frame Index); the FE-FE relations are not shown. There is also a new graphical browser for them, the FrameGrapher.
- Since we do not do much annotation of nouns denoting artifacts
and natural kinds, the FrameNet database is not readily usable as an
ontology of things. In this area, we mostly defer to WordNet, which
provides extensive coverage, including hierarchical relations, of areas
such as animals and plants.
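
The frame-to-frame relations described in the list above can be pictured as typed edges between frames. The sketch below is a minimal illustration using only the examples given there (Speed uses Motion; Criminal_process has four subframes); the names RELATIONS and children_by_relation are hypothetical, and this is not how the database itself stores relations. No Inheritance edge is included because the text gives no example of one.

```python
from collections import defaultdict

# Each entry is (relation type, parent frame, child frame).
RELATIONS = [
    ("Using", "Motion", "Speed"),
    ("Subframe", "Criminal_process", "Arrest"),
    ("Subframe", "Criminal_process", "Arraignment"),
    ("Subframe", "Criminal_process", "Trial"),
    ("Subframe", "Criminal_process", "Sentencing"),
]

def children_by_relation(relations):
    """Index child frames by (relation type, parent frame)."""
    index = defaultdict(list)
    for relation_type, parent, child in relations:
        index[(relation_type, parent)].append(child)
    return index

index = children_by_relation(RELATIONS)
subevents = index[("Subframe", "Criminal_process")]
print("Subframes of Criminal_process:", ", ".join(subevents))
print("Frames using Motion:", ", ".join(index[("Using", "Motion")]))
```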
In this discussion, we have used the word "word" in talking
about lexical units. The reality is actually rather complex. When we
say that the word bake is polysemous, we mean that the lemma
bake.v (which has the word-forms bake, bakes, baked,
and baking) is linked to three different frames:
- Apply_heat: Michelle baked the potatoes for 45 minutes.
- Cooking_creation: Michelle baked her mother a cake for her
birthday.
- Absorb_heat: The potatoes have to bake for more than 30 minutes.
These constitute three different LUs, with different
definitions. Multiword expressions such as given name and
hyphenated words like shut-eye can also be LUs. Idiomatic
phrases such as middle of nowhere and give the slip
(to) are also defined as LUs in the appropriate frames
(Isolated_places and Evading, respectively), and their internal
structure is not analyzed.
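
The relationship between word-forms, lemmas, and LUs in the bake example can be sketched as a small lookup. This is only an illustration under stated assumptions: the names LexicalUnit, LEXICAL_UNITS, WORD_FORMS, and frames_for_word_form are hypothetical, and the tables hold just the data given above.

```python
from typing import NamedTuple

class LexicalUnit(NamedTuple):
    """A pairing of a lemma (with part of speech) and a frame."""
    lemma: str
    frame: str

# The three LUs for bake.v listed above.
LEXICAL_UNITS = [
    LexicalUnit("bake.v", "Apply_heat"),
    LexicalUnit("bake.v", "Cooking_creation"),
    LexicalUnit("bake.v", "Absorb_heat"),
]

# Word-forms of the lemma, as given in the text.
WORD_FORMS = {
    "bake": "bake.v",
    "bakes": "bake.v",
    "baked": "bake.v",
    "baking": "bake.v",
}

def frames_for_word_form(form):
    """Return the frames of every LU whose lemma has this word-form."""
    lemma = WORD_FORMS.get(form)
    return [lu.frame for lu in LEXICAL_UNITS if lu.lemma == lemma]

print(frames_for_word_form("baked"))
```

A word-form that is not in the table simply yields an empty list, which matches the point made earlier that the absence of an LU only reflects the frames completed so far.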
For a more detailed discussion of the topics covered here, please see
the Book and the FAQs.
|