A Brief Overview of Some Datapoints in the Query Relaxation Literature ====================================================================== Working Note 40 Peter Clark, Boeing Research & Technology September 2009 I: The Literature ================= Query relaxation is a topic of interest in the database community. A classic illustration of the need is mentioned in [1]: a person is doing some shopping on the Web, and wants to search a product database on a Web site. He fills in attribute fields in a form, but gets zero results back. So he changes some fields, and still gets zero results, so deletes some fields, and then gets too many results back. A frustrating user experience, to say the least. Three topics emerge in the segment of literature I examined: 1. Defining relaxation operators 2. Expressing preferences between different relaxation operations and different query answers 3. Involving the user in the relaxation process 1. Defining relaxation operators ================================ Relaxation involves rewriting the query to target a greater space than the original. The mechanics of relaxation are typically fairly straightforward, involving generalizing or dropping a predicate or the type constraints on its arguments, e.g., [2]. Gaasterland [3,4] presents a more general characterization of relaxation operators: 1. rewrite a predicate 2. broaden a type constraint on a query variable 3. break a join dependency Rewriting a predicate means replacing it with a "more general" predicate. Gaasterland presents a useful notion of what this means in terms of inference, namely: p is a generalization of q IF q->p In other words, we can relax the query q by applying a deductive rule to the query. If the relaxed query p can be answered, then we can abduce the original query q (by following the deductive rule in reverse): DB |- p <- q Note that solutions to the relaxed query are not guaranteed to satisfy the original query, of course, as an abductive step is involved. Gaasterland also provides a formal definition of relaxation using Horn clauses with multiple conditions. First we illustrate his definition. If we know the deductive rule: R: q & b & c -> p and the user queries: Q: q then a relaxed version of the query according to his definition would be: Q': p & b & c Essentially, p is the generalization of q, and b & c are constraints on p. If the DB can prove b, c, and p, then we can abduce q. The general definition Gaasterland provides is that if the rule is B1,...,Bn,Q -> H, then the relaxation of Q can be written B1,...,Bn,H. He writes this: B1,...,Bn,H -> relax(Q) In other words, the relaxed query is the conclusion clause constrained by the other clauses. [ This can be justified by considering that the relaxed query is the maximally constrained generalization of the original. In the example above, if p is true, then we only need to abduce that q is true for the rule to have fired (rather than abducing q, b, and c are true), because we already know b and c are true in solutions to Q'. Or consider an example: R: fast(Person) & runs-for(Person,Team) -> successful(Team) Q: fast(fred)? Q': runs-for(fred,X), successful(X)? Note that Q' is a more constrained generalization than simply the rule conclusion on it own: successful(X). ] 2. Expressing preferences between different relaxation operations ================================================================= In general, there may be many ways to relax a query, resulting in many possible solutions from the database. There has been work on both (a) formalisms for specifying preferred relaxations of a query (b) formalisms for specifying preferred solutions to a query (a) formalisms for specifying preferred relaxations of a query -------------------------------------------------------------- There have been formalisms designed to allows relaxation preferences to be supplied by the user, e.g., Cooperative SQL (CSQL) [5,6]. An example CSQL query looks: SELECT aport_name, runway_length_ft, runway_width_ft FROM aports WHERE runway_length_ft > 7500 AND runway_width_ft > 100 RELAXATION-ORDER runway_length_ft, runway_width_ft Here the user has specifyied that runway length should be relaxed in preference to runway width, to help guide the relaxation process. (b) formalisms for specifying preferred solutions to a query ------------------------------------------------------------ Preference SQL [1] allows preferred answers to be specified, e.g., SELECT * FROM car WHERE make = 'Opel’ PREFERRING(category = 'roadster' ELSE category <> 'passenger' AND price AROUND 40000 AND HIGHEST(power)) CASCADE color = 'red' CASCADE LOWEST(mileage); SELECT * FROM trips PREFERRING start_date AROUND '2001/11/23' AND duration AROUND 14 BUT ONLY DISTANCE(start_date)<=2 AND DISTANCE(duration)<=2; Similarly, Preference Datalog Programs (PDPs) [7] allows the specification and computation of preferred answers in Datalog. Essentially, these formalisms allow additional, soft constraints to be specified in database queries [1]. This can be viewed as an alternative to query relaxation: Rather than relax a query, we provide a more general query with soft constraints, and prioritize the answers returned. 3. Involving the user in the relaxation process =============================================== There is a branch of research in building interactive ("cooperative") query answering systems that allow the user to guide the system as to which relaxations to explore. In the simplest case, the user is presented with the set of alternative, one-step relaxations and asked to choose between them [2,3]. II: Discussion ============== The notion of query relaxation used in the DB literature is largely the one we have been considering for AURA QF, the most interesting extension being Gaasterland's definition of query relaxation. As well as relaxing by applying rules deductively to the query (as we have already been considering in AURA), he also considers abductive rewrites (see the definition of relax(Q) earlier). These might also be applicable to AURA, e.g.,: Q: Does the nucleolus synthesize ribosomes? Q': The nucleolus is part of X AND X synthesizes ribosomes? (applying Gaasterland's relax(Q) operator using the rule IF X does Y AND X is part of Z THEN Z does Y). Note that if AURA does relax a query, it should NOT assume that the relaxed answer(s) = the answer(s) to the original question. Rather, AURA should simply present the user with the information it has found using the relaxed query. Also, the survey shows that the issue of how to prioritize relaxations and answers is quite important when working with databaes. It might also be important in AURA, particularly if multiple answers are returned. If so, we may be able to borrow some of the ideas for representing and using priorities/preferences described above. REFERENCES ========== [1] W. Kiebling. Foundations of Preferences in Database Systems. Tech Report 2001-8, October 2001, Univ Augsburg. [2] T. Gaasterland. Cooperative answering through controlled query relaxation. IEEE Expert, 1997. [3] T. Gaasterland, P. Godfrey, J. Minker. Relaxation as a platform for cooperative answering. Journal of Intelligent Information Systems, 1992 - Springer. [4] C. Tempich, S. Staab, A. Wranik. Remindin': semantic query routing in peer-to-peer networks based on social metaphors. Proc. 13th Int Conf on the World Wide Web, 2004 [5] W. W. Chu. CoBase: A Cooperative Query Answering Facility for Database Systems. Proc 4th Inf Conf on Database and Expert Systems, 1993 [6] W. W. Chu, Q. Chen, M. Merzbacher. CoBase: A Cooperative Database System. In "Nonstandard queries and nonstandard answers" by R. Demolombe and T. Imielinski, 1994. [7] K. Govndarajan, B. Jayaraman, S. Mantha. Preference Queries in Deductive Databases. In: New Generation Computing Volume 19, Issue 1, March 2001.