Sedna XML Database: Query Parser & Optimizing Rewriter

  1. Sedna Query Parser & Optimizing Rewriter
     Dmitry Lizorkin, Ph.D., software developer, Sedna team
  2. Goals
     - Support for a wide range of queries and statements: XQuery queries, XML update statements, data definition language statements
     - High performance for both query evaluation and update execution
       - Query optimization strategies designed to match Sedna's internal data representation
  3. Query processing steps
     [Diagram: XQuery query / XML update statement / Data Definition Language statement -> Parser -> Static Analyzer -> Optimizing Rewriter -> Physical Plan Generator -> query execution plan (QEP), passed to the Sedna executor. Intermediate representation: operation tree.]
  4. Query parser
     - Input: 3 types of queries/statements
       - XQuery queries
       - XML update statements
       - Data Definition Language statements (e.g. the create document statement)
     - Output: a uniform operation tree for all 3 query/statement types above
  5. Static analyzer
     - Query static analysis phase
     - Static context is initialized with XQuery Functions and Operators and augmented with query prolog declarations
     - Query operation tree is expanded with imported XQuery modules
     - All namespace prefixes, function names and variable names are resolved
     - XQuery static errors are reported, if any
  6. Optimizing rewriter step
     [Diagram: XQuery query / XML update statement / Data Definition Language statement -> Parser -> Static Analyzer -> Optimizing Rewriter -> Physical Plan Generator -> query execution plan. Intermediate representation: operation tree.]
  7. Optimizing Rewriter
     - Optimization based on query rewriting
     - Removing unnecessary ordering operations
     - Combining the abbreviated descendant-or-self path step with the next path step
     - Removing unnecessary node copies for constructed content
     - Analyzing nested for-clauses
  8. Ordering operations: challenges
     - Many XQuery expressions require their resulting sequences to be in document order with duplicate nodes removed
       - This "Distinct-Document Order" (DDO) semantics is expressed via explicit DDO operations in the Sedna query operation tree
     - DDO operations decrease query execution performance
       - They require the whole argument sequence to be evaluated before the first result item can be produced
       - They break the execution pipeline
  9. Ordering operations: optimization
     - Idea: remove unnecessary ordering operations
     - Analysis: for each operation in the query operation tree, the following properties of its resulting sequence are determined recursively
       - whether it is in DDO
       - whether it consists of at most one item
       - whether it consists of nodes on a common level of the XML tree
     - Result: a DDO operation is removed if
       - either its argument is known to be in DDO, or
       - DDO is not required for the resulting sequence
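
The analysis on this slide can be illustrated with a minimal Python sketch. It is not Sedna's implementation: the `Op` and `Props` classes, the four operation kinds and the inference rules are assumptions made for illustration, and only the first removal rule (argument already in DDO) is covered. The rules rely on one fact about XML trees: nodes on a common level have disjoint subtrees that follow each other in document order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    """Hypothetical operation-tree node (not Sedna's real representation)."""
    kind: str                   # "doc", "child", "descendant" or "ddo"
    arg: Optional["Op"] = None  # input operation, None for "doc"
    name: str = ""              # document name or node test


@dataclass(frozen=True)
class Props:
    ddo: bool         # result is in distinct-document order
    single: bool      # result has at most one item
    same_level: bool  # all result nodes are on one level of the XML tree


def analyze(op: Op) -> Props:
    """Recursively infer the three properties for the result of `op`."""
    if op.kind == "doc":
        return Props(ddo=True, single=True, same_level=True)
    inner = analyze(op.arg)
    if op.kind == "ddo":
        return Props(True, inner.single, inner.same_level)
    # A step keeps document order if its input is a single node, or an
    # ordered sequence of nodes on one level (their subtrees are disjoint).
    ordered = inner.single or (inner.ddo and inner.same_level)
    if op.kind == "child":
        return Props(ordered, False, inner.single or inner.same_level)
    if op.kind == "descendant":
        return Props(ordered, False, False)
    raise ValueError(f"unknown operation kind: {op.kind}")


def remove_redundant_ddo(op: Op) -> Op:
    """Drop every DDO operation whose argument is already known to be in DDO."""
    if op.arg is None:
        return op
    arg = remove_redundant_ddo(op.arg)
    if op.kind == "ddo" and analyze(arg).ddo:
        return arg
    return Op(op.kind, arg, op.name)


def count_ddo(op: Optional[Op]) -> int:
    """Count the DDO operations left in a tree."""
    if op is None:
        return 0
    return (op.kind == "ddo") + count_ddo(op.arg)


# doc('d')/a//b/c with a DDO operation after every step
plan = Op("ddo", Op("child", Op("ddo", Op("descendant", Op("ddo", Op(
    "child", Op("doc", name="d"), "a")), "b")), "c"))
optimized = remove_redundant_ddo(plan)
```

In this example only the outermost DDO survives: the `b` elements selected by the descendant step may be nested inside each other, so their `c` children can come out of order.
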
  10. Descendant-or-self: challenges
      - The "//" abbreviated path step (expanded into descendant-or-self::node()) is frequently used in practical XQuery queries
      - Expensive to evaluate
        - Poor selectivity: in general, it selects almost all nodes in an XML document
        - Prevents the query from benefiting from Sedna's descriptive schema-driven storage strategy
  11. Descendant-or-self: optimization idea
      - Idea: combine the // step with the next path step
        - E.g., //para is transformed to /descendant::para
        - Better intermediate selectivity
        - Benefits from Sedna schema-driven storage
      - Technical issue: context position/size
        - "The path expression //para[1] does not mean the same as the path expression /descendant::para[1]" (XQuery Spec., Subsect. 3.2.4)
  12. Descendant-or-self: solution
      - For the // path step, the predicate expressions of its next step are analyzed (if any)
      - If the results of the predicate expressions do not depend on context position and size (neither explicitly nor implicitly),
        - then the // step can be combined with its next step while keeping the original query semantics
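
The check described on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not the rewriter's actual code: the `Step` class, the tuple encoding of predicates (`("cmp", ...)`, `("call", ...)`, `("num", ...)` and so on) and the list of "safe" top-level predicate kinds are all hypothetical. The sketch is deliberately conservative: any top-level predicate that might be numeric is treated as an implicit position test.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """Hypothetical path step: axis, node test and predicate expressions."""
    axis: str
    test: str
    predicates: Tuple[tuple, ...] = ()


# Top-level predicate kinds assumed never to yield a number. Anything else
# (a numeric literal, a variable, a function call) might be numeric, and a
# numeric predicate is implicitly compared with the context position.
SAFE_TOP_LEVEL = {"cmp", "path", "and", "or"}


def calls_position_or_last(expr) -> bool:
    """Explicit dependency: position() or last() occurs anywhere in expr."""
    if not isinstance(expr, tuple):
        return False
    if expr[0] == "call" and expr[1] in ("position", "last"):
        return True
    return any(calls_position_or_last(part) for part in expr[1:])


def depends_on_context_position(pred: tuple) -> bool:
    """True if the predicate may depend on context position or size."""
    if pred[0] not in SAFE_TOP_LEVEL:
        return True  # implicit dependency, e.g. para[1]
    return calls_position_or_last(pred)


def combine_descendant_or_self(steps: List[Step]) -> List[Step]:
    """Rewrite descendant-or-self::node()/child::T into descendant::T when safe."""
    dos = Step("descendant-or-self", "node()")
    out: List[Step] = []
    i = 0
    while i < len(steps):
        nxt = steps[i + 1] if i + 1 < len(steps) else None
        if (steps[i] == dos and nxt is not None and nxt.axis == "child"
                and not any(depends_on_context_position(p)
                            for p in nxt.predicates)):
            out.append(Step("descendant", nxt.test, nxt.predicates))
            i += 2
        else:
            out.append(steps[i])
            i += 1
    return out


# //para[@id = "x"]: no position dependency, so the steps are merged
by_id = ("cmp", "=", ("path", "attribute", "id"), ("str", "x"))
merged = combine_descendant_or_self(
    [Step("descendant-or-self", "node()"), Step("child", "para", (by_id,))])

# //para[1]: implicit position test, so the path is left unchanged
first = [Step("descendant-or-self", "node()"),
         Step("child", "para", (("num", 1),))]
kept = combine_descendant_or_self(first)
```

Treating unknown predicate kinds as position-dependent means the sketch may miss some valid rewrites, but it never changes the query semantics.
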
  13. Removing unnecessary node copies
      - XQuery constructor semantics implies that constructor content consists of new nodes, "even if they are copies of existing nodes"
      - Problem: making a deep copy of an XML subtree is expensive
      - Idea: avoid node copies that do not affect query result semantics
      - Algorithm: a constructed node need not be copied if
        - it is a part of the query result sequence, or
        - it becomes a direct child of another constructed node
  14. Analyzing nested for-clauses
      - FLWOR expressions generally contain multiple iteration variables in for-clauses

            for $u in doc('users')//user_tuple,
                $i in doc('items')//item_tuple
            where ... return ...

      - Binding sequences with nested loop semantics
      - The expression associated with an inner iteration variable is analyzed
        - If the associated expression does not depend on outer iteration variables, it is marked as "lazy"
        - The value of a lazy associated expression can be evaluated just once, with the query semantics preserved
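
The dependency check behind the "lazy" marking can be sketched in a few lines of Python. The `ForClause` class and the idea of passing in a precomputed set of referenced variables are assumptions for illustration; the slides do not say how the real rewriter collects this information. The second example query, with `$u/bid`, is hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ForClause:
    """Hypothetical for-clause: iteration variable plus the variables
    referenced by its binding expression."""
    var: str
    free_vars: FrozenSet[str] = frozenset()


def mark_lazy(clauses: List[ForClause]) -> List[bool]:
    """For each clause, report whether its binding expression is 'lazy',
    i.e. an inner expression that references no outer iteration variable.
    The outermost clause is evaluated once anyway and is never marked."""
    outer = set()
    flags = []
    for clause in clauses:
        flags.append(bool(outer) and not (clause.free_vars & outer))
        outer.add(clause.var)
    return flags


# for $u in doc('users')//user_tuple, $i in doc('items')//item_tuple
independent = [ForClause("u"), ForClause("i")]

# Hypothetical variant: for $u in doc('users')//user_tuple, $b in $u/bid
dependent = [ForClause("u"), ForClause("b", frozenset({"u"}))]

lazy_independent = mark_lazy(independent)
lazy_dependent = mark_lazy(dependent)
```

In the first query the `doc('items')//item_tuple` sequence is the same for every `$u`, so it can be computed once and reused; in the second, `$u/bid` has to be re-evaluated per user.
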
  15. Physical plan generation step
      [Diagram: XQuery query / XML update statement / Data Definition Language statement -> Parser -> Static Analyzer -> Optimizing Rewriter -> Physical Plan Generator -> query execution plan. Intermediate representation: operation tree.]
  16. Query physical plan generation
      - A query execution plan (QEP) is constructed for the query operation tree
      - Structural location path fragments are extracted
        - A structural path starts from an XML document node and contains only descending axes and no predicates
        - Structural paths are mapped to Sedna descriptive schema access operations
      - For-clauses and order-by clauses are mapped to tuple stream generation and reordering operations, respectively
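
As a rough illustration of the structural path definition above, the Python sketch below splits a location path that starts at a document node into its longest structural prefix and the remaining steps. The `Step` class, the string encoding of predicates, and the choice of `child`, `descendant` and `descendant-or-self` as the "descending axes" are assumptions; the slides do not enumerate the axes, and the real plan generator may extract fragments differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """Hypothetical path step; predicates are kept as opaque strings here."""
    axis: str
    test: str
    predicates: Tuple[str, ...] = ()


# Assumed set of descending axes (not taken from the slides).
DESCENDING_AXES = {"child", "descendant", "descendant-or-self"}


def split_structural_prefix(steps: List[Step]) -> Tuple[List[Step], List[Step]]:
    """Split a path rooted at a document node into (structural prefix, rest).

    The prefix contains only descending axes and no predicates, so it is
    the part a schema-level access operation could answer directly.
    """
    prefix: List[Step] = []
    for step in steps:
        if step.axis in DESCENDING_AXES and not step.predicates:
            prefix.append(step)
        else:
            break
    return prefix, steps[len(prefix):]


# doc('users')/users/user_tuple[rating = 'A']/name (hypothetical query)
path = [
    Step("child", "users"),
    Step("child", "user_tuple", ("rating = 'A'",)),
    Step("child", "name"),
]
structural, remainder = split_structural_prefix(path)
```

Here only `/users` ends up in the structural prefix, because the predicate on `user_tuple` has to be evaluated against actual data.
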
  17. Implementation details
      - The query parser is implemented with the ANTLR parser generator (www.antlr.org)
      - The ANTLR native representation for an operation tree is an S-expression
      - The S-expression is a native data representation for the Scheme programming language
        - Scheme provides extensive native features for rewriting S-expressions
      - The optimizing rewriter is implemented in Scheme
      - The Scheme-to-C compiler produces high-performance code
  18. Summary
      - Complete query static analysis phase and rewriting-based optimization processing (i.e. from the textual query representation to the physical plan)
  19. Thank you for your attention!
  20. Future work: Cost-based optimization
      - Problem analysis
      - Join operations are the primary candidates to benefit from cost-based optimization
      - In XML, the problem of join evaluation is not as vital as for relational data
        - One-to-many entity relationships can often be modeled by nesting XML elements
        - For many-to-many relationships, Sedna explicit indexes can be used
  21. Cost-based optimization: plans
      - Extraction of join operations in a query operation tree
      - Selectivity estimation mechanism for XQuery expressions
        - A storage for selectivity statistics is required (XML?)
      - Cost-based physical plan selection
        - is a complicated task in general; however, a lot of related work exists
        - can be relatively easily implemented for simple cases
        - hints can be used for complex cases
