<words> tags) being just a very flat tree, with one branch for each word and the second (inside the <non-terminals> tags) being a tree of phrase nodes (P tags) which in turn contain basic element nodes (E tags). These nodes and edges are intended to constitute a tree, rooted in a node labeled <s>, with ID number 00, with the word nodes in the first part of the file as its leaves.
The comparison process is as follows:
First, check to be sure that the top-level structure of each file is identical, and that the list of word (W) elements is the same in both files; if not, stop and return an informative error message.
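The preliminary check can be sketched as follows. This is only an illustration, not the official scoring program: the function name check_files, the "ID" attribute on W elements, and the idea that the word form is the element text are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def check_files(gold_xml, system_xml):
    """Return None if the two files are comparable, else an error message."""
    gold = ET.fromstring(gold_xml)
    system = ET.fromstring(system_xml)

    # Top-level structure: the root tag plus the tags of its direct children.
    gold_top = (gold.tag, [child.tag for child in gold])
    system_top = (system.tag, [child.tag for child in system])
    if gold_top != system_top:
        return f"top-level structure differs: {gold_top} vs {system_top}"

    # Word list: (ID, text) of every W element, in document order.
    # The "ID" attribute name is an assumption of this sketch.
    gold_words = [(w.get("ID"), w.text) for w in gold.iter("W")]
    system_words = [(w.get("ID"), w.text) for w in system.iter("W")]
    if len(gold_words) != len(system_words):
        return (f"word lists differ in length: "
                f"{len(gold_words)} vs {len(system_words)}")
    for i, (g, s) in enumerate(zip(gold_words, system_words)):
        if g != s:
            return f"word {i} differs: {g} vs {s}"
    return None
```

Returning a message rather than raising keeps the "stop and return an informative error message" behavior explicit for the caller.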
Then, each other element to be compared is either a P (phrase) element or an E element. For every phrase that is a frame (indicated by the presence of a "Frame" attribute), compare the frame name and the list of word nodes that is the value of the target attribute; the match is valid only if they are identical. (Note that for most targets this list has length one, but the target can contain more than one word for multi-word expressions.) The list of word nodes is unordered, so it may be sorted for the comparison. If the frame node also has a "denoted" attribute, this counts as a valid FE match if the denoted FE names are the same.
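A minimal sketch of the frame comparison, assuming each frame P element has already been read into a Python dict keyed by attribute name, with the target stored as a list of word-node IDs. The dict layout, the function names, and the example frame and FE names are illustrative assumptions, not part of the file format.

```python
def frame_match(gold, system):
    """Frame names must be equal and the unordered target lists identical."""
    same_frame = gold["Frame"] == system["Frame"]
    # The target list is unordered, so sort both sides before comparing.
    same_target = sorted(gold["target"]) == sorted(system["target"])
    return same_frame and same_target


def denoted_match(gold, system):
    """A denoted FE counts as an FE match if the denoted FE names are equal."""
    return "denoted" in gold and gold["denoted"] == system.get("denoted")
```

For example, a two-word target matches regardless of the order in which its word nodes are listed:

```python
gold = {"Frame": "Motion", "target": ["3", "4"]}
system = {"Frame": "Motion", "target": ["4", "3"]}
print(frame_match(gold, system))
```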
Within each frame P element, find each E that is a frame element (i.e. has an "FE" attribute). The match is valid if the FE names are identical and the (unordered) lists of headwords are identical. If the Ref attribute contains only (one or more) word nodes, these constitute the list of headwords. If the Ref attribute on an FE is not a list of word nodes, then it must point to a phrase; starting from that node, collect the list of headwords as described below.
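The FE comparison can be illustrated in the same way. The sketch assumes each frame element has already been reduced to a pair of FE name and headword list (word-node IDs); that representation and the function name fe_match are assumptions made for the example.

```python
def fe_match(gold_fe, system_fe):
    """Compare two (FE name, headword list) pairs."""
    gold_name, gold_heads = gold_fe
    system_name, system_heads = system_fe
    # Headword lists are unordered, so compare them after sorting.
    return gold_name == system_name and sorted(gold_heads) == sorted(system_heads)
```

Sorting, rather than converting to sets, keeps the comparison strict about repeated word nodes; whether duplicates can occur at all is not stated in this description.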
To collect a list of headwords:
If the current node is a frame node and has no Transparent attribute, the value of its target attribute is the head word list; stop. If the current node is not a frame node, one or more of its daughters will be marked with either a "Head" attribute or a "SemHead" attribute. (If some are Head and others are SemHead, return an error.) Recursively follow each of the Head (or SemHead) references until you return with a word list; concatenate the word lists of all the Head or SemHead daughter nodes and return this list.
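The recursion can be sketched as below. The in-memory representation is hypothetical: nodes maps a node ID to a dict of its attributes plus a "daughters" list of (child ID, role) pairs, where role is "Head", "SemHead", or None, and words is the set of word-node IDs. Treating a frame node that does have a Transparent attribute like a non-frame node is also an assumption of this sketch, as is raising an error when no daughter is marked.

```python
def collect_headwords(node_id, nodes, words):
    """Return the list of head word IDs reachable from node_id."""
    if node_id in words:
        # A word node is its own headword.
        return [node_id]
    node = nodes[node_id]
    if "Frame" in node and "Transparent" not in node:
        # Non-transparent frame node: its target is the head word list.
        return list(node["target"])
    roles = {role for _, role in node["daughters"] if role in ("Head", "SemHead")}
    if len(roles) > 1:
        raise ValueError(f"node {node_id} mixes Head and SemHead daughters")
    if not roles:
        # Not handled here; this sketch only covers marked daughters.
        raise ValueError(f"node {node_id} has no Head or SemHead daughter")
    heads = []
    for child_id, role in node["daughters"]:
        if role in roles:
            # Concatenate the word lists of all head daughters.
            heads.extend(collect_headwords(child_id, nodes, words))
    return heads
```

A small example with one phrase whose head daughter is a frame node:

```python
words = {"1", "2", "3"}
nodes = {
    "500": {"daughters": [("501", "Head"), ("3", None)]},
    "501": {"Frame": "Motion", "target": ["1", "2"], "daughters": []},
}
print(collect_headwords("500", nodes, words))  # ['1', '2']
```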
Supports (with the "Supp" attribute) are to be labeled and will be scored like non-core FEs.
Denoted FEs (which are shown as an attribute on the same P node as the Frame name and the Target) are handled like other FEs. These are cases where the target word both evokes a frame *and* denotes a filler of one of the frame elements.
Scoring: Frame and Core FE matches have a weight of 1. Peripheral and Extra-thematic FEs, along with Supps, have a weight of 0.5. Named entities (including the denoted FE) have a weight of 0.5 in total.
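As an illustration of the weights only, the table below maps each kind of scored item to its weight and sums the weights of a list of matched items. The category labels and the function name are made up for this sketch, and how the weighted totals are combined into final scores is not specified here, so the example stops at the sum.

```python
# Weights as stated above; the dictionary keys are illustrative labels.
WEIGHTS = {
    "Frame": 1.0,
    "Core": 1.0,
    "Peripheral": 0.5,
    "Extra-thematic": 0.5,
    "Supp": 0.5,
    "NamedEntity": 0.5,  # 0.5 in total, including the denoted FE
}


def weighted_total(kinds):
    """Sum the weights of a list of matched item kinds."""
    return sum(WEIGHTS[kind] for kind in kinds)


print(weighted_total(["Frame", "Core", "Peripheral", "Supp"]))  # 3.0
```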
Syntactic phrases or lists are not counted; they serve only as parts of chains linking FEs to their headwords. In other words, if some of the E nodes are marked as "Head" or "SemHead", the other E nodes are ignored. There are a few cases in which the fttosem process does not recognize any heads among the E nodes; in those cases, all of the E nodes will be treated as heads and followed recursively.
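The fallback for nodes without any marked head can be sketched as a small selection step. As before, the daughters are assumed to be given as (child ID, role) pairs, with role being "Head", "SemHead", or None; this representation and the function name are hypothetical.

```python
def head_daughters(daughters):
    """Pick the daughters to follow when collecting headwords."""
    marked = [child for child, role in daughters if role in ("Head", "SemHead")]
    # If nothing is marked, treat every daughter as a head.
    return marked if marked else [child for child, _ in daughters]
```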
A DTD for the semantic XML files and a scoring program, along with examples of output files and their scores, will be posted on the task website soon after the beginning of the evaluation period.