Proceedings 2001

A locative-question answering system

Nicolas Denand and Monique Rolbert


1        Introduction

This paper presents a natural language processing (NLP) system able to answer where-questions. The main application objective is to query a knowledge base containing spatial information. Our system is based on a general NLP platform named ILLICO, which offers a lexical, syntactic and semantic framework for sentence analysis. The specific nature of our application domain leads us to propose a pragmatic level of processing that accounts for a number of space-related and question-answering issues.

We first briefly present the main features of ILLICO and the target application. We then highlight the problems that arise in answering spatial questions and the solutions our approach offers.

2        System main features

Currently, the application objectives of our system are twofold:

  • to query a geographical database about the state of the world
  • to propose linguistic games in which a child can query the system about the location of objects in a picture

Here are two examples of typical uses of our system:

(1) - Where is France?

(2) - Where is the tree?

In both cases, the system has to "understand" the given question, then find an appropriate answer.

 

Our development platform is ILLICO, a generic NLP system that provides a set of tools for developing various NLP applications (database interfaces, communication aid systems, ...) [Pasero and Sabatier, 1995]. It is based on the following principles:

  1. Different types of linguistic knowledge, independently encoded in separate modules:
     • a lexicon containing expected words and expressions
     • a grammar specifying the expected sentence structures and the grammatical agreement
     • a set of semantic composition rules producing semantic representations from the syntactic rules of the grammar
     • a conceptual model specifying the world of the application in terms of relations and their conceptual domains
     • a contextual model specifying the objects introduced in the preceding sentences and the initial state of the world (a database or a picture in our system)
  2. Sentence composition using partial synthesis and guided composition

 

For each application, we develop a lexicon, a set of grammar rules and an application-specific conceptual model. When the user enters a question, the system first checks its lexical, syntactic and conceptual well-formedness. The question is then translated into a lambda-calculus formula, which an evaluator processes with respect to the set of formulae representing the context of the application, yielding the set of possible answers to the user’s question.
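As a rough illustration of the evaluation step, the sketch below encodes the context as a set of relation triples and treats a where-question as a lookup over them. The triple encoding, the `where` function and the sample facts about a picture are assumptions made for this example only; they do not reproduce ILLICO's actual formula representation or evaluator.

```python
# Hypothetical context: each fact is a (relation, figure, ground) triple.
CONTEXT = {
    ("next_to", "tree", "house"),
    ("in_front_of", "tree", "fence"),
    ("on", "tree", "hill"),
    ("on", "house", "hill"),
}

def where(figure, context):
    """Return every (relation, ground) pair that truly locates `figure`."""
    return {(rel, ground) for (rel, fig, ground) in context if fig == figure}

# "Where is the tree?" yields several formally correct answers.
for rel, ground in sorted(where("tree", CONTEXT)):
    print(f"{rel}({ground})")
```

Even in this tiny context the question has three true answers, which is why an additional selection step is needed.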

 

Our application has to deal with where-questions, each of which may have several possible answers. ILLICO's strategy alone is therefore not sufficient to produce a relevant answer. The most important part of our contribution to ILLICO consists in introducing a pragmatic level of post-processing that accounts for pragmatic skills and space-specific cognitive behaviors.

3      Answering the question Where?

Our main objective is to implement a system able to answer locative questions by producing a locative prepositional phrase:

Figure 1: situation example

(3) Where is the chair? In front of the desk.

following the syntactic structures:

  • Question: Interrogative Pronoun + State Locative Verb + Noun Phrase
  • Answer: Locative Preposition + Noun Phrase

The main problem here is that such questions frequently imply a large number of possible answers depending on the spatial context. In example (3), for instance, there are several other possible answers (in front of the computer, near the corner of the room, on the floor, under the ceiling, etc.). From a formal point of view, all of these answers are correct (they express true facts). Nevertheless, it is obvious that some of them are better than others. Therefore, we have to find a way to automatically choose between them.

 

Following previous related work in various fields of research, we consider that the answer to a given locative question must satisfy two (sometimes conflicting) constraints:

  1. The amount of information contained in (or implied by) the answer is maximal. This point of view comes from the studies of [Sperber and Wilson, 1989] on relevance and from Grice's maxims [Grice, 1975].
  2. The choice of a reference point in space accounts for cognitive and perceptual phenomena. This idea is related to studies on space, both in cognitive science [Herskovits, 1985] and in linguistics [Talmy, 1978] [Vandeloise, 1988].

3.1        Maximal Informativeness

According to [Sperber and Wilson, 1989], the more contextual effects an utterance produces, the more relevant it is. [Reboul and Moeschler, 1998] define three contextual effects produced by the interpretation of any given utterance U:

  1. The interpretation of U modifies the way a proposition of the context is believed.
  2. The interpretation of U refutes one of the propositions of the context. One of the two propositions P (obtained by the interpretation of the utterance) and P’ (the proposition that P refutes) therefore has to be removed: the less strongly believed of the two.
  3. The interpretation of U gives one or more contextual implications: these are the conclusions produced by the deductive inferential mechanism from the set of propositions of the context, plus the logical formula issued from the utterance.

Currently, we account only for this last point. The system's answer to a user's question therefore affects the user's context by adding new information, either directly or through inferences licensed by universal real-world laws. An example of such a law is the transitivity of the relation is_above:

is_above(X,Y) ∧ is_above(Y,Z) ⇒ is_above(X,Z)
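A minimal sketch of how such a law can be applied mechanically is given below. It assumes that is_above facts are stored as (upper, lower) pairs and repeatedly applies the transitivity rule until no new fact appears; the function name `close_transitive` and the pair encoding are illustrative choices, not part of the system described here.

```python
def close_transitive(pairs):
    """Return the transitive closure of a set of (upper, lower) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Snapshot the set so it can be extended while we iterate.
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

is_above = {("X", "Y"), ("Y", "Z")}
print(sorted(close_transitive(is_above)))
# [('X', 'Y'), ('X', 'Z'), ('Y', 'Z')]
```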

 

Taking this type of inference into account implies that two different answers to the same question can have different effects on the user’s context. For instance, let us consider the following situation:


Figure 2: inference example

 

If the user asks for the location of object X, there are two possible answers: X is above Y or X is above Z. If the user already knows that Y is above Z, then, by the transitivity of the relation is_above, he or she can deduce the latter answer from the former. This makes the first answer better than the second. More generally, we consider that an answer is maximally relevant, among all possible answers, if it maximizes the difference between the state of the user's world before the answer is added and its state after the answer has been added[1].
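The comparison can be written out explicitly. The notation below is introduced here for illustration only: K is the user's context before the answer, Cl(·) denotes closure under the transitivity law, and Δ is the number of formulae an answer adds to K.

```latex
\begin{align*}
K &= \{\mathit{is\_above}(Y,Z)\} \\
\mathrm{Cl}\bigl(K \cup \{\mathit{is\_above}(X,Y)\}\bigr)
  &= \{\mathit{is\_above}(Y,Z),\ \mathit{is\_above}(X,Y),\ \mathit{is\_above}(X,Z)\},
  & \Delta &= 3 - 1 = 2 \\
\mathrm{Cl}\bigl(K \cup \{\mathit{is\_above}(X,Z)\}\bigr)
  &= \{\mathit{is\_above}(Y,Z),\ \mathit{is\_above}(X,Z)\},
  & \Delta &= 2 - 1 = 1
\end{align*}
```

The answer X is above Y adds two formulae to the user's context, whereas X is above Z adds only one, so the former is preferred.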

3.2        Cognitive and perceptual aspects

When a locative question is uttered, computing the "best" answer mostly means choosing the best ground for a given figure[2].  In other words, when looking for the chair in example (3), there are a number of objects or locations that could be used as a spatial reference. What makes the desk the most valuable ground for the given figure? A number of criteria play a part in this issue, most of them being related to the concept of salience (that is, what makes an object more "visible" or "accessible" than others within a scene with respect to a given figure):

  • perceived size: the way the user perceives the size of an object (a function of its physical size and of the distance between the object and the point of view). The larger an object appears, the more salient it is.
  • physical properties: any physical property making an object salient (color, shape, luminosity...)
  • contextual properties: non-physical properties that allow a total order to be established on objects of the same conceptual category sharing the same property. For instance, in terms of economic weight, Japan would be ranked before Ethiopia, which gives it a better ground potential for localizing a third country.
  • physical distance between figure and ground: the closer an object is to the figure (in metric terms), the better a ground it is
  • conceptual distance between figure and ground: the more similar an object is to the figure, the better a ground it is

For a more detailed description of the criteria above, refer to [Denand, 2001].
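To make the first criterion concrete, the sketch below models perceived size as the visual angle subtended by an object. This particular formula is an assumption chosen for illustration; the text above only states that perceived size is a function of physical size and viewing distance. The function name and the sample values are likewise hypothetical.

```python
import math

def perceived_size(physical_size, distance):
    """Visual angle (in radians) of an object of a given size seen from a given distance."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return 2.0 * math.atan(physical_size / (2.0 * distance))

# Hypothetical scene: a 1 m wide desk seen from 2 m, a 3 m wide wardrobe seen from 10 m.
desk = perceived_size(1.0, 2.0)
wardrobe = perceived_size(3.0, 10.0)
print(desk > wardrobe)  # True: the nearer desk appears larger than the farther wardrobe
```

Under this model a physically smaller object can be the more salient one when it is close enough to the point of view.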

3.3        Implementation within ILLICO

As noted in Section 2, we introduced a pragmatic level of processing into the ILLICO framework. This new level is an add-on module in charge of implementing the theoretical aspects described in subsections 3.1 and 3.2. Roughly speaking, the module takes the set of possible answers given by ILLICO’s evaluator and sorts it. Each answer is a short formula consisting of a spatial relation between figure and ground.

First of all, to account for the maximal informativeness of an answer, the answer is processed in its primary form, as a formula. We set up a user’s context to which each of the n possible answers to a given question is applied in turn, giving n possible new user’s contexts (each being the whole set of theorems of the initial context, plus all the theorems recursively inferred once the considered formula has been added). Let Ci be the cardinality of the i-th new user’s context (that is, its number of true formulae); the best answers are those that produce a context of cardinality Max(Ci). The complexity of the algorithm computing all the possible new contexts may be very high (due, for instance, to cascading inferences). From a cognitive point of view, however, we can assume that not all inferences have to be processed, but only the first levels of the inference cascade.
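The selection by context cardinality, with a bounded inference cascade, might be sketched as follows. The sketch assumes that the only inference law is the transitivity of is_above, that facts are (upper, lower) pairs, and that a hypothetical `max_depth` parameter limits the number of inference rounds; none of these details is taken from the actual module.

```python
def infer(context, max_depth):
    """Apply the transitivity of is_above for at most `max_depth` rounds."""
    known = set(context)
    for _ in range(max_depth):
        new = {(x, z) for (x, y) in known for (y2, z) in known
               if y == y2 and (x, z) not in known}
        if not new:          # nothing left to infer: stop early
            break
        known |= new
    return known

def best_answers(user_context, answers, max_depth=2):
    """Return the answers whose resulting context has maximal cardinality."""
    sizes = {a: len(infer(user_context | {a}, max_depth)) for a in answers}
    top = max(sizes.values())
    return [a for a in answers if sizes[a] == top]

user_context = {("Y", "Z")}            # the user already knows that Y is above Z
answers = [("X", "Y"), ("X", "Z")]     # candidate answers locating X
print(best_answers(user_context, answers))
# [('X', 'Y')]
```

Bounding the number of rounds keeps the cost of each candidate under control, at the price of possibly underestimating the cardinality of contexts that need long inference chains.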

Secondly, in order to process the cognitive and perceptual aspects of a possible answer, the pragmatic module uses the ground of that answer. As a prerequisite, the objects of the initial world must be described with respect to the properties that the criteria involve. A level of satisfaction is then computed for each ground with respect to each criterion. To account for the respective influence of each criterion, we finally assign a weight to each computed level of satisfaction. This last step gives us the final ordering of the possible grounds (the highest value representing the best ground) with respect to the cognitive and perceptual aspects.
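One simple way to realize this weighting is a weighted sum of satisfaction levels, as sketched below. The aggregation by sum, the normalization of satisfaction levels to [0, 1], the weight values and the sample grounds are all assumptions made for this illustration; they are not the empirically determined values used by the system.

```python
# Hypothetical weights, one per criterion.
WEIGHTS = {
    "perceived_size": 0.3,
    "physical_properties": 0.1,
    "contextual_properties": 0.1,
    "physical_distance": 0.3,
    "conceptual_distance": 0.2,
}

def ground_score(satisfaction, weights=WEIGHTS):
    """Weighted sum of per-criterion satisfaction levels (missing criteria count as 0)."""
    return sum(w * satisfaction.get(criterion, 0.0) for criterion, w in weights.items())

def rank_grounds(grounds):
    """Order candidate grounds from best to worst."""
    return sorted(grounds, key=lambda g: ground_score(grounds[g]), reverse=True)

# Hypothetical satisfaction levels for two candidate grounds of the chair.
grounds = {
    "desk": {"perceived_size": 0.8, "physical_distance": 0.9, "conceptual_distance": 0.7},
    "ceiling": {"perceived_size": 1.0, "physical_distance": 0.1, "conceptual_distance": 0.1},
}
print(rank_grounds(grounds))
# ['desk', 'ceiling']
```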

Combining these two results, we finally obtain an ordered set of answers, the best of which is displayed by the system’s user interface as a synthesized locative prepositional phrase.
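The text does not specify how the two orderings are combined. Purely as one possible reading, the sketch below sorts answers by informativeness first and uses the ground score to break ties; the `Answer` record, its field names, the sample values and this lexicographic policy are all hypothetical.

```python
from typing import NamedTuple

class Answer(NamedTuple):
    relation: str
    ground: str
    informativeness: int   # e.g. cardinality of the resulting user's context
    ground_score: float    # e.g. weighted cognitive and perceptual score

def order_answers(answers):
    """Sort by informativeness, then by ground score, both in descending order."""
    return sorted(answers, key=lambda a: (a.informativeness, a.ground_score), reverse=True)

candidates = [
    Answer("on", "floor", 3, 0.20),
    Answer("in_front_of", "desk", 3, 0.65),
    Answer("under", "ceiling", 2, 0.35),
]
best = order_answers(candidates)[0]
print(f"{best.relation.replace('_', ' ')} the {best.ground}")
# in front of the desk
```

A weighted combination of the two values would be an equally plausible policy; the choice between them would have to be settled empirically.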

4      Conclusion

We have implemented most of the cognitive and perceptual criteria in our system, which already gives interesting results. To generate the best among all possible answers to a locative question, we had to build a corpus revealing the linguistic behaviour of actual human speakers in a given set of spatial situations. This corpus was collected through a form-based survey on the World Wide Web. Statistical analysis of the results allowed us to refine the combination of the constraints by assigning each of them an empirically determined weight.

For now, the system can handle several spatial situations, giving any locative question about the related context an answer that reflects our theoretical developments. In fact, the order in which the system displays the answers corresponds to the reference order statistically computed from our corpus.

 

Although it already works on some example applications, the system still needs to be tested on every situation of the corpus. Since we have not yet implemented every theoretical aspect, the system is not complete. In particular, the effects of an answer on the user's context are currently not taken into account. Similarly, computing distances within a qualitative framework is a complex operation. It raises issues in terms of ontology (conceptual distance) as well as of qualitative space representation [Vieu, 1991].

 

References

[Denand, 2001] Nicolas Denand. Answering the question Where? – some experimental results. In H. Bunt, I. Van der Sluis & E. Thijsse (Eds.), Proceedings of the 4th International Workshop on Computational Semantics IWCS-4, Tilburg, The Netherlands, pages 81-96, 2001.

[Grice, 1975] H. P. Grice. Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and semantics 3: Speech acts, Academic Press, N.Y., 1975.

[Herskovits, 1985] Annette Herskovits. Semantics and Pragmatics of Locative Expressions. Cognitive Science, pages 345-348, 1985.

[Pasero and Sabatier, 1995] Robert Pasero and Paul Sabatier. Guided Sentences Composition: Some Problems, Solutions and Applications. In Proceedings of the 5th International Workshop on Natural Language Understanding and Logic Programming (NLULP 95, Lisbon), pages 97-110, 1995.

[Reboul and Moeschler, 1998] Anne Reboul and Jacques Moeschler. Pragmatique du discours. Armand Colin, 1998.

[Sperber and Wilson, 1989] Dan Sperber and Deirdre Wilson. La pertinence, communication et cognition. Paris, Minuit, 1989.

[Talmy, 1978] Leonard Talmy. Figure and ground in complex sentences. In J. H. Greenberg, C. Ferguson & J. Moravcsik (Eds.), Universals of human language, Vol. 4. Stanford University Press, Stanford, CA, 1978.

[Vandeloise, 1988] Claude Vandeloise. Les usages spatiaux statiques de la préposition à. Cahiers de Lexicologie, 53(2), 1988.

[Vieu, 1991] Laure Vieu. Sémantique des relations spatiales et inférences spatio-temporelles : Une contribution à l’étude des structures formelles de l’espace en Langage Naturel, PhD Thesis, Université Paul Sabatier, Toulouse, 1991.

[1] Note that this is relevant only in discourse situations where the user does not hold the whole knowledge available to the system.

[2] According to Talmy's terminology [Talmy, 1978], we call figure the object whose location is unknown (namely the chair in example (3)) and ground the reference object that localizes it (the desk).