THE PROBLEM OF UNREALIZED COMPLEMENTS AND ITS RELATION TO FRAMES AND SCRIPTS
============================================================================
	Working Note 29, Peter Clark, Boeing Research and Technology (2008)
		     peter.e.clark@boeing.com

1. The Problem
--------------
Complements and adjuncts are often unrealized in language, e.g.:

   (1) A man sold a book [to someone] [for a price]
   (2) A man sold a book for $10 [to someone]
   (3) A woman received a book [from someone]
   (4) Proposals [for funds] [from people/institutions] [to a funding agency] 
		 must be 15 pages.
   (5) Members [of an organization] objected to the proposal.
   (6) There was violence [by someone] [to someone/something] in Budapest last 
	     night

Their absence causes problems for a simple deductive rule engine trying to
apply rules, as the rules' conditions may not be satisfied by the explicitly
stated information, e.g., the rules below are (undesirably) not triggered by 
the above sentences:

   (r1) IF X sells Y to Z THEN Z buys Y from X
   (r2) IF X sells Y to Z for Cash THEN Z gives Cash to X
   (r3) IF X receives Y from Z THEN Z gives Y to X
   (r4) IF X is violent to Y THEN X damages Y

As a result, entailments may be missed, e.g.:

   T1: A book was sold.
   H1: A book was bought / received / paid for.

In the above example, although the DIRT paraphrase database contains rules 
suggesting T1 -> H1, these rules don't fire when applied in the normal 
deductive fashion because their conditions are not explicitly satisfied, and 
hence the system misses the desired implications.
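
To make the failure concrete, here is a minimal sketch of strict deductive 
matching. The (head, role, filler) triple format, the role names, and the 
function name are my own assumptions for illustration, not the actual rule 
engine; the second text is a made-up contrast case in which every role is 
stated.

```python
# Hypothetical triple representation: (head, role, filler).
TEXT_T1 = {("sell", "object", "book")}            # T1: "A book was sold."
TEXT_FULL = {("sell", "subject", "man"),          # made-up contrast case:
             ("sell", "object", "book"),          # "A man sold a book
             ("sell", "to", "woman")}             #  to a woman."

# Roles required by the condition of (r1), "X sells Y to Z".
R1_ROLES = ["subject", "object", "to"]


def strict_match(verb, roles, triples):
    """Return role bindings only if every role is explicitly stated."""
    bindings = {}
    for role in roles:
        fillers = [f for (h, r, f) in triples if h == verb and r == role]
        if not fillers:
            return None          # a role is unrealized: the rule cannot fire
        bindings[role] = fillers[0]
    return bindings


print(strict_match("sell", R1_ROLES, TEXT_T1))    # None: (r1) does not fire
print(strict_match("sell", R1_ROLES, TEXT_FULL))
```

With T1 the seller and buyer are unrealized, so the condition fails even 
though a selling event is clearly described.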

This phenomenon has also been referred to as "implicit role reference" 
in the literature (see Postscript).

2. Solutions
------------

(a) Unrealized complements
--------------------------
If a rule with the condition "X sells Y" does not fire on the text "Y was 
sold", the system is essentially assuming that if no seller (X) is mentioned, 
then we cannot be sure that there is a seller. However, in language, often 
unstated complements DO exist in the real world, implied by other elements
(e.g., the verb) of the sentence. To handle this, one can relax 
the inference engine's assumption in the rule application, so a rule 
triple (X r Y), e.g., ("sell" agent "person"), will match any text 
containing X', e.g., "sell", providing there is no clashing 
text triple (X' r Y'), e.g., ("sell" agent "company"). Now rule condition 
testing is starting to look a little more like matching.
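
A minimal sketch of this relaxed test, under assumptions of my own: triples 
are (head, role, filler) tuples, and "clashing" is approximated by a small 
hand-written table of disjoint types (DISJOINT below is hypothetical, not a 
real resource).

```python
# Hypothetical disjointness table: pairs of types that cannot co-refer.
DISJOINT = {frozenset({"person", "company"})}


def types_clash(a, b):
    return frozenset({a, b}) in DISJOINT


def relaxed_match(rule_triple, text_triples):
    """Match a rule triple (X r Y) against text, abducing Y if unstated.

    Returns ("stated", filler), ("abduced", filler), or None (no match).
    """
    head, role, filler = rule_triple
    if head not in {h for (h, _, _) in text_triples}:
        return None                                # X' not in the text at all
    stated = [f for (h, r, f) in text_triples if h == head and r == role]
    if not stated:
        return ("abduced", filler)                 # unrealized: assume it
    if any(types_clash(filler, f) for f in stated):
        return None                                # clashing text triple
    return ("stated", stated[0])


rule = ("sell", "agent", "person")
print(relaxed_match(rule, {("sell", "object", "book")}))    # agent abduced
print(relaxed_match(rule, {("sell", "agent", "company")}))  # clash
```

The first call abduces a person as seller; the second is blocked because the 
text already names a clashing agent.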

We could take this one step further and also allow rule conditions like 
"X sells Y for Money" to match text "X sells Y", thus abducing the existence 
of Money. The existence of the "for" complement may be a certain thing, or 
just a probable thing, depending on whether the "for" complement is always or 
usually present in selling (if it's only sometimes present, it's really
an adjunct, not a complement). Similarly whether that complement is of 
type Money may be certain or just probable, depending on the semantics of 
selling; this is real world knowledge, not syntactic knowledge. It is 
knowledge of the expectations that the frame (verb) should invoke.

Alternatively, a simple way of filling in complements would be to use a 
resource like FrameNet (say) to find the complements of a verb and their 
types, and assert them. For example, FrameNet tells us that "sell" has a 
buyer, seller, goods, and money (although it doesn't specify their types).
We can match the frame with the text and fill in the complements. In 
fact, this is really very similar to the matching of a rule's condition 
described above, the difference being that FrameNet frames are geared 
towards describing the complements (players, Frame Elements) of a verb without 
further implication, while rules are primarily geared towards implication
(i.e., saying what facts follow from the given elements).
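
As a rough sketch of frame-based filling: the frame below is a hand-written 
stand-in listing the four players mentioned above. It does not call any real 
FrameNet API, and the placeholder format is an assumption of mine.

```python
# Hand-written stand-in for a FrameNet-style frame for "sell" (hypothetical).
SELL_FRAME = ["seller", "buyer", "goods", "money"]


def fill_frame(frame_elements, stated):
    """Keep stated fillers; assert a placeholder for each unrealized element."""
    filled = {}
    for element in frame_elements:
        if element in stated:
            filled[element] = stated[element]
        else:
            filled[element] = f"<unrealized {element}>"
    return filled


# Sentence (2): "A man sold a book for $10 [to someone]"
sentence_2 = {"seller": "man", "goods": "book", "money": "$10"}
print(fill_frame(SELL_FRAME, sentence_2))
```

Here the buyer is the only element left as a placeholder. Note that nothing 
further is implied by the filled frame; that is the job of the rules.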

Of course, we cannot universally assume that an unmentioned preposition or 
complement exists. Consider the rule:

   IF a Covering is over a Thing THEN the Covering covers the Thing

This rule should *not* fire on a covering (e.g., a blanket), abducing that
the covering is implicitly over something. Similarly:

   IF a Person walks under a Thing THEN the Thing is over the Person

we shouldn't abduce from the existence of a Person walking (or just a Person)
that the Person is under something. The bottom line is that we need to 
distinguish essential arguments (complements) from optional arguments 
(adjuncts), and only allow abduction on the essential arguments. (A trivial 
rule, implemented in our software, is that only subjects and objects can be 
abduced.) To make matters more complicated, the complement/adjunct distinction 
may depend on the sense of the verb in play, e.g., the syntactic object of 
break#v1 as in "the engine broke" is an adjunct (optional), while the object 
of break#v2 as in "the engine broke the axle" is a complement (obligatory).
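
One way to sketch this restriction is a per-sense table of essential roles 
that gates abduction. The table contents and the walk#v1 label are 
illustrative assumptions of mine; only the break#v1 / break#v2 contrast comes 
from the discussion above.

```python
# Hypothetical per-sense table of essential roles (complements).
# Anything not listed for a sense is treated as an adjunct.
ESSENTIAL_ROLES = {
    "break#v1": {"subject"},              # "the engine broke"
    "break#v2": {"subject", "object"},    # "the engine broke the axle"
    "walk#v1": {"subject"},               # "under X" is only an adjunct
}


def can_abduce(verb_sense, role):
    """Allow abduction only for roles that are essential for this sense."""
    return role in ESSENTIAL_ROLES.get(verb_sense, set())


print(can_abduce("break#v2", "object"))   # True
print(can_abduce("break#v1", "object"))   # False
print(can_abduce("walk#v1", "under"))     # False
```

This presupposes that the verb sense has already been disambiguated, which is 
itself a hard problem.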

We said earlier that if the abduced complement "clashes" with what is known, 
then the complement should not be abduced. But the notion of "clash" is 
tricky, as often multiple objects can be in the same semantic relation with a 
verb. Consider:
	 Rule: "A bandage covers a wound" 
	 Text: "A bandage covers stitches"
Although the objects of "cover" are distinct, they do not clash as more than
one thing can be covered; in this case it is perfectly sensible for a
bandage to cover both a wound AND stitches, and for the existence of the
wound to be abduced.
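
A crude sketch of a clash test that respects this: a different filler only 
counts as a clash when the role is known to take a single value. The 
SINGLE_VALUED table is a hypothetical assumption, and a real test would also 
need type compatibility (e.g., "man" does not clash with "person").

```python
# Hypothetical list of (head, role) pairs that admit only one filler.
SINGLE_VALUED = {("sell", "agent")}


def clashes(head, role, rule_filler, text_fillers):
    """True if abducing rule_filler would contradict the stated fillers."""
    if rule_filler in text_fillers:
        return False                     # already stated, nothing to abduce
    if (head, role) not in SINGLE_VALUED:
        return False                     # role can hold several things at once
    return bool(text_fillers)            # single-valued and already taken


# Rule: "A bandage covers a wound"; Text: "A bandage covers stitches"
print(clashes("cover", "object", "wound", {"stitches"}))   # False
print(clashes("sell", "agent", "person", {"company"}))     # True
```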

(b) Coordinated completion of unrealized complements
----------------------------------------------------
Often a verb's complements can be completed in multiple, coordinated ways, e.g., 
people are often violent to people, storms are often violent to property, etc. 
Thus sometimes if we know one complement, it might help us determine the 
likely types of other complements. If, as in FrameNet, we only have a single 
frame encompassing all uses of the verb sense, then it may have such general 
type restrictions on its roles (frame elements) as to be largely uninformative.
Instead, it may be better to have a collection of different, alternative 
specialized uses of a verb sense. These essentially correspond to having
multiple, distinct FrameNet-style frames for a given verb sense.

For example, consider again the earlier sentence (4):

   (4) Proposals for funds [to a funding agency] must be 15 pages.

Here the complement "for funds" to "proposal" suggests the recipient ("to") 
of the proposals is a funding agency. We'd like one instantiation of 
the "proposal" frame to include the complements "funds" and "funding agency"
(and maybe "institution wanting funds"). Alternatively, if we had a sentence
about a "proposal for marriage", we would expect the "to" complement to be
different. [Aside: note here we're working with noun modifiers, rather than 
verb complements, but the problem is essentially the same.]

Similarly if the algorithm suggests a set of complements, but one clashes
with a complement that *is* given, then this suggests the whole set may
be inapplicable. There should be a coordinated application of the whole set.
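
A minimal sketch of coordinated completion: each variant is a whole set of 
complements, and a variant is applied only if it agrees with everything that 
is stated. The variant contents (in particular the marriage fillers) are 
invented for illustration.

```python
# Hypothetical specialized variants of a "proposal" frame.
PROPOSAL_VARIANTS = [
    {"for": "funds", "to": "funding agency", "from": "institution"},
    {"for": "marriage", "to": "prospective spouse", "from": "person"},
]


def compatible_variants(variants, stated):
    """Variants whose fillers agree with every stated complement."""
    return [v for v in variants
            if all(v.get(role) == filler for role, filler in stated.items())]


def complete(variants, stated):
    """Fill in the whole set of complements, or nothing if unclear."""
    matches = compatible_variants(variants, stated)
    if len(matches) != 1:
        return dict(stated)          # none or several fit: abduce nothing
    return {**matches[0], **stated}  # apply the whole variant together


print(complete(PROPOSAL_VARIANTS, {"for": "funds"}))
print(complete(PROPOSAL_VARIANTS, {"for": "funds", "to": "prospective spouse"}))
```

In the second call one stated complement clashes with the "funds" variant, so 
the whole variant is rejected rather than applied piecemeal.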

(c) Anaphoric unrealized complements
------------------------------------
The appropriate complement may not be attached to the verb in the
sentence, but still be mentioned earlier in the text. For example:

   (7) Members of the Redding Fire Department brought their ladder truck to 
       campus and raised the 45-foot ladder. Students took turns climbing 
       to the top [of the ladder]. 
       (Source: news article)

The complement of "climb" is the earlier-mentioned "ladder". We'd like a
system to realize this, not just introduce a Thing (the "climbee") as
the object complement of "climb". The fact the complement is realized in
a different sentence adds another level of complication; we cannot just
match frames/rule conditions with text on a sentence by sentence basis, but
need to take into account the whole paragraph. Similarly, consider:
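
One simple strategy (a sketch only, not necessarily the best approach) is to 
search backwards through the paragraph for the most recent mention of a 
suitable type. The mention lists and type labels below are hand-assigned 
assumptions; "climbable" stands in for whatever selectional knowledge we have 
about the object of "climb".

```python
# Example (7), hand-annotated as (mention, type) pairs per sentence.
PARAGRAPH = [
    [("Redding Fire Department", "organization"),
     ("ladder truck", "vehicle"),
     ("campus", "place"),
     ("45-foot ladder", "climbable")],
    [("students", "person"),
     ("top", "part")],
]


def resolve_unrealized(paragraph, sentence_index, required_type):
    """Search the current and then earlier sentences, most recent first."""
    for i in range(sentence_index, -1, -1):
        for mention, mention_type in reversed(paragraph[i]):
            if mention_type == required_type:
                return mention
    return None                      # nothing suitable: introduce a new Thing


# The unrealized object of "climb" in the second sentence (index 1):
print(resolve_unrealized(PARAGRAPH, 1, "climbable"))   # 45-foot ladder
```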

   (8) Jeff decided to go surfing. There were sightings of Great Whites off 
       Newport, but Jeff wasn't concerned [about himself being eaten by the 
       Great Whites]. 
       (Source: Lange and Wharton, 1999). 

(8) is particularly interesting, because the unrealized complement 
(namely the proposition "Great Whites eat Jeff") isn't mentioned in the text, 
but an element of that proposition ("Great Whites") is.

In fact, the process of filling in complements with more specific 
values (e.g., "funds"-"funding agency") again looks a lot like matching scripts
with text, where each "script" is a single event frame. If there are
multiple words with missing complements, then each can help disambiguate
the other as to the correct complements to fill in.

3. From Unrealized Complements to Scripts
-----------------------------------------

3.1 Scripts
-----------
It is not a huge leap to go from the unrealized complement problem to the whole
script-matching problem. Consider my two favorite script examples:

   T2: The bomb attack destroyed the shrine
   H2: The bomb exploded

   T3: A soldier was killed in a gun battle
   H3: The soldier was shot

In T2 there is something akin to a "frame" or "complements" associated with
"bomb" in which it explodes and destroys things, and matching this against
T2 will suggest H2. Strictly, this frame is not attached to/part of the
semantic knowledge about just "bomb"; rather it's an independent structure
associated with several terms including "bomb" and "destroy".

Similarly in T3, there are expectations associated with "kill" and "gun" (and
"soldier") in which someone is shot, which should apply in this case.

The bottom line is that it's all really matching against expectations
at various levels of granularity.

3.2 The Matching and Knowledge Problems
---------------------------------------
A concept in context often suggests other concepts 
in relation to it (e.g., each complement), even if those other concepts are
not stated in text. Matching consists of hypothesizing these additional 
concepts. This process may be as simple as a word suggesting a single triple 
(e.g., "sell" -> "sell to person"), or as complex as a whole structure 
suggesting a full-blown script. Implications, both small and large, will 
reinforce or contradict each other in various ways. We might say that
there are two key problems:
 - The MATCHING PROBLEM is to find the most coherent set of implications
   from a given set of assertions (e.g., stated in text).
 - The KNOWLEDGE PROBLEM is to construct a database of plausible implications
   in the first place, including a degree of confidence on what constitutes 
   evidence to trigger them, and a degree of confidence in their various 
   implications. 
   Tuples seem like they should play a role in solving the knowledge problem.
   Similarly the DIRT paraphrase database seems like a possible knowledge 
   source for this. It might be one can construct larger tuple-structure 
   expectations, but with lower confidence -- even spanning multiple 
   sentences, perhaps.

It is not clear whether such expectations should be encoded as rules or
scripts. The good thing about rules is that there are well-defined mechanisms 
for using them. The problem with rules is that they impose a somewhat 
artificial directionality on knowledge. For example, for "bomb" we might need 
rules expressing different permutations such as (informally) 
   bomb & destroy -> explode
   bomb & explode -> destroy
   explode & destroy -> bomb
These are all different ways of saying "some evidence should suggest the whole 
lot". On the other hand, matching against the script "bomb & destroy & explode"
is poorly specified: What degree of match is needed to consider the script
applicable? Clearly some features are more significant than others, and
some are essential; some weighting scheme would be needed.
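
To illustrate what such a weighting scheme might look like, here is a toy 
sketch. The weights, the threshold, and the optional set of essential 
features are all invented; the point is only that one undirected structure 
can reproduce all three rule permutations above.

```python
# Hypothetical feature weights for the script "bomb & destroy & explode".
BOMB_SCRIPT = {"bomb": 2, "destroy": 1, "explode": 2}
THRESHOLD = 3


def match_script(script, observed, threshold, essential=frozenset()):
    """Return the unobserved script features to hypothesize, or None."""
    if not essential <= observed:
        return None                               # an essential cue is absent
    score = sum(w for feature, w in script.items() if feature in observed)
    if score < threshold:
        return None                               # too little evidence
    return sorted(set(script) - observed)


# T2: "The bomb attack destroyed the shrine"
print(match_script(BOMB_SCRIPT, {"bomb", "destroy", "shrine"}, THRESHOLD))
print(match_script(BOMB_SCRIPT, {"explode", "destroy"}, THRESHOLD))
print(match_script(BOMB_SCRIPT, {"destroy"}, THRESHOLD))
```

Any two of the three features clear the threshold and suggest the third, 
while "destroy" alone does not. Choosing the weights and the threshold is, of 
course, an instance of the knowledge problem.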

REFERENCES
==========
T. Lange and C. Wharton (1999). "Retrieval from Episodic Memory by
Inferencing and Disambiguation", in Understanding Language Understanding, 
pp 107-180, Ed. A. Ram and K. Moorman. MA: MIT Press. 

POSTSCRIPT 1/2/15
=================
I later discovered this issue has also been explored by Joel Tetreault 
(Univ Rochester) under the title of "Implicit Role Reference" ([1,2]). Here,
he treats these as the complement of indirect anaphora:

  Indirect anaphora: Explicit reference back to an implicit entity.
  Implicit Role Reference: Implicit reference back to an explicit entity. e.g.,

	(1) Take engine E1 from Avon to Dansville
	(2a) Pick up the boxcar and take it to Broxburn [from ?]
    here, the (implicit) "from" of "take" = (previously mentioned) Dansville.
    The computational task is to "resolve" this implicit reference.

In Joel's statistics, about half (= a lot!) of a verb's semantic roles are 
implicit and need to be filled in. The fillers are typically mentioned earlier.

He did specific experiments with missing to/from fillers in transportation
events (TRAINS domain). His solution was to search backwards in the text, rather
than do a world simulation (I'd have preferred the latter approach).

(Papers and PPT available at 
 http://www.cs.rochester.edu/~tetreaul/academic.html)
[1] Joel R. Tetreault. Tense and Implicit Role Reference. Annotation Standards 
    for Temporal Information in Natural Language, Workshop in LREC 2002 Las 
    Palmas de Gran Canaria, May 27, 2002, p.61-64.
[2] Joel R. Tetreault. Implicit Role Reference. 2002 International Symposium 
    on Reference Resolution for Natural Language Processing. Alicante, Spain, 
    June 3 - 4, 2002, p.109-115.