Standard SQL functionality focuses on individual rows. When you wanted to explore the relationship between rows, those rows had to come from different data sources, even if that meant the same table under a different alias, and those sources had to be joined. Analytic functions allow the rows in a result set to 'peek' at each other, avoiding the need to join duplicated data sources.
A partition separates discrete and independent slices of data. A window dictates how far back or forward a row can peek WITHIN the partition. The default window runs from the 'start' of the partition to the current row. The concept of 'start' generally requires an explicit ORDER BY.
Without analytics, all the expressions and columns in a row of a result set had to come from within that row. Analytics let you 'peek' at your neighbours.
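A sketch of the difference, assuming the simple EMP table (name, wage, sector) used later in these notes:

```sql
-- Pre-analytic: to compare each employee's wage with the sector
-- total, the table has to be joined to an aggregated copy of itself.
select e.name, e.wage, t.sector_total
from   emp e
join   (select sector, sum(wage) sector_total
        from   emp
        group  by sector) t
on     t.sector = e.sector;

-- Analytic: each row 'peeks' at its neighbours instead -- one
-- data source, no join.
select name, wage,
       sum(wage) over (partition by sector) sector_total
from   emp;
```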
You can have multiple analytics in a single select, all with different partitioning and/or ordering characteristics. You may want to partition by Customer or Order, and order by dates ascending and descending. Like any SQL, there will be limits on how complex you ALLOW things to get.
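As an illustration (the ORDERS table and its columns here are made up for the example), several analytics with different partitioning and ordering can sit side by side in one select:

```sql
-- Three analytics, three different partition/order specifications,
-- all evaluated in a single pass over the result set.
select customer_id, order_id, order_date, amount,
       sum(amount)  over (partition by customer_id)          cust_total,
       row_number() over (partition by customer_id
                          order by order_date)               order_seq,
       rank()       over (order by order_date desc)          recency_rank
from   orders;
```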
This 'New Aggregate' section is pretty much only here to emphasize this function.
If I want the biggest selling item, there's little point in telling me how many I've sold if you can't tell me WHAT I sold.
'X' is made up, just to give a row in Cities that has the same population as Sydney
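A minimal sketch of the function being emphasized, assuming a CITIES table with NAME and POPULATION columns: MAX ... KEEP returns an attribute of the row holding the extreme value, not just the value itself.

```sql
-- Tell me the biggest population AND which city has it.
-- With a tie (Sydney and 'X' sharing the same population),
-- max(name) picks the alphabetically last of the tied names.
select max(population)                                      biggest_pop,
       max(name) keep (dense_rank last order by population) biggest_city
from   cities;
```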
It looks better with your own user defined collections.
XML allows complex or 'wide' data sets (lots of columns) to be pulled together. It is a (mostly) reliable mechanism for grouping data so that it can be ungrouped at a later point.
You can group up an entire record this way, treat it as a single value as you move it around (eg in a MAX … KEEP) and then decompose it back at the end.
Wraps the <Name><Pop> into a higher level record.
Turn the results into a single row. This is the AGGREGATION function.
Finally, you may need to wrap the XML records into a higher level chunk. Here we have a PARENT, with multiple LINE entries each of which consists of a NAME and POP element.
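Putting the three steps together as a sketch (again assuming the CITIES table with NAME and POPULATION columns):

```sql
-- 1. XMLELEMENT("LINE", ...) wraps the NAME/POP pair into a record.
-- 2. XMLAGG collapses all the LINE records into a single value.
-- 3. The outer XMLELEMENT wraps the lot in a higher-level PARENT chunk.
select xmlelement("PARENT",
         xmlagg(
           xmlelement("LINE",
             xmlelement("NAME", name),
             xmlelement("POP",  population))))
from   cities;
```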
Better than the old days when you had to rely on DECODE, which was a right royal PITA if you needed greater than/less than style comparisons, especially with strings where you couldn't use SIGN.
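For example, CASE handles the range comparisons that DECODE could not do cleanly (the wage bands here are made up for illustration):

```sql
-- Greater-than/less-than comparisons, including on strings,
-- where DECODE needed SIGN tricks -- and SIGN was no help at all
-- for string comparisons.
select name, wage,
       case
         when wage >= 150 then 'High'
         when wage >= 50  then 'Medium'
         else                  'Low'
       end wage_band,
       case when name < 'M' then 'A-L' else 'M-Z' end name_half
from   emp;
```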
These are also available in SQL Server: Google "DAT317: T-SQL Power! The OVER Clause: Your Key to No-Sweat Problem Solving", which was presented at "Tech Ed North America 2010".
The most common scenario is a Top-N query.
Not wanting to lose all the employees from a particular sector, Smithers brings a report ranking the employees within their department (or sector).
NOTE: Since I used WAGE DESC as the order by, nulls were put to the top. I can avoid this with a NULLS LAST clause.

select name, wage, sector,
       row_number() over (partition by sector order by wage desc nulls last) rn,
       rank()       over (partition by sector order by wage desc nulls last) rnk,
       dense_rank() over (partition by sector order by wage desc nulls last) drnk
from   emp
where  sector in ('7G','9i')
order  by sector, wage desc nulls last;

ROW_NUMBER gives non-duplicating, consecutive numbers. If the ORDER BY is not deterministic, the results may differ between executions. It can answer "Give me the two highest paid employees" and guarantee no more than two rows, but with the risk that the choice isn't deterministic: you might get Lenny or you might get Carl.

RANK gives the same number when the ORDER BY values match, but will skip numbers. It can be the best choice for "Give me the two highest paid employees", with the caveat that you may get more than two records if there are 'ties' at the end. For 7G, Homer, Lenny and Carl would be returned.

DENSE_RANK gives the same number when the ORDER BY values match, and the next number is always consecutive. In this case "Give me the three highest salaries and the people to whom they are paid" would return the four people in 7G with salaries of 200, 100 and 50.

If there are no ties, the three functions give equivalent results.
Cumulative amount demonstrates the ORDER BY. It is generally less confusing if, where you have an ORDER BY in an analytic, you have the same ORDER BY at the bottom of the query.
Because Lenny and Carl both earn 100, the SUM analytic 'groups' both together. However the ROW_NUMBER analytic orders them uniquely. Using the ROW_NUMBER derived value as a filter means that a group is broken and the results look wrong.
The default for the windowing clause is RANGE, which is deterministic and has rows of equivalent value at the same level. The alternative is ROWS which puts in an artificial, and arbitrary, differentiator as a tie-breaker. The UNBOUNDED PRECEDING is also the default. It just means start at the beginning (eg the beginning of the partition, but we'll get on to partitions later).
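A sketch of the difference, using the EMP table from earlier (where Lenny and Carl both earn 100): RANGE treats the tied rows as one level, so both show the same cumulative figure, while ROWS breaks the tie arbitrarily and gives them different running totals.

```sql
select name, wage,
       -- RANGE (the default): deterministic, ties share a value.
       sum(wage) over (order by wage desc
                       range between unbounded preceding
                                 and current row) cumul_range,
       -- ROWS: an arbitrary tie-breaker separates Lenny and Carl.
       sum(wage) over (order by wage desc
                       rows  between unbounded preceding
                                 and current row) cumul_rows
from   emp
where  sector = '7G'
order  by wage desc;
```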
NTILE (and the similar PERCENT_RANK) can be useful for excluding outliers. For example, you select only the rows falling between the 10th and 90th percentiles to exclude the top and bottom 10% of 'weird' values, or select the middle third. PERCENT_RANK is always a percentage; NTILE allows you to choose your own number of buckets. NTILE is also available in SQL Server. Postgres also has analytics, which are enhanced in Postgres 9.
create table test_mill as
select round(dbms_random.normal,3) n
from   dual
connect by level < 100000;

select round(n*2) label, count(*) val
from   (select n, ntile(9) over (order by n) nt
        from   test_mill)
where  nt [not] in (1,9)
group  by round(n*2)
order  by 2;
IGNORE NULLS
Emphasize that May went from 130 to 170, an increase of 40 (or about 31%). There's not much use here for LEAD, but it is pretty similar.

create table sales (period date, amount number);

insert into sales
select add_months(trunc(sysdate,'YYYY'), rownum - 1),
       round(dbms_random.value(100, 500), -1)
from   dual
connect by level < 10;

column perc format 9999.99

select to_char(period,'Month') mon,
       amount,
       lag(amount) over (order by period) prev_amt,
       100*(amount - lag(amount) over (order by period))
          / lag(amount) over (order by period) perc
from   sales
order  by period
/
Ignore nulls syntax (11g):

select to_char(period,'Month') mon,
       amount,
       lag(amount) ignore nulls over (order by period) prev_amt
from   sales
order  by period
/
Partitioning by Quarter: get the sales value for the first month of each quarter.
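One way to sketch this, using the SALES table created earlier: FIRST_VALUE with the partition defined by truncating the date to the quarter.

```sql
-- Carry the first month's sales figure across every month
-- of its quarter.
select to_char(period,'YYYY "Q"Q') qtr,
       to_char(period,'Month')     mon,
       amount,
       first_value(amount)
         over (partition by trunc(period,'Q')
               order by period) first_of_qtr
from   sales
order  by period;
```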
Answers questions that I never need to ask, such as the rolling total of the last three months.
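A rolling three-month total is a windowing clause with a fixed number of preceding ROWS, again against the SALES table from earlier:

```sql
-- Total of the current month and the two before it.
select to_char(period,'Month') mon,
       amount,
       sum(amount) over (order by period
                         rows between 2 preceding
                                  and current row) rolling_3m
from   sales
order  by period;
```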
Filter is applied AFTER the analytic.

SQL> explain plan for
  2  select * from
  3    (select order_id, line_id,
  4            sum(value) over (order by line_id) cumul
  5     from order_lines)
  6  where order_id = 10;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 2716399136

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |     6 |   234 |     4  (25)| 00:00:01 |
|*  1 |  VIEW               |             |     6 |   234 |     4  (25)| 00:00:01 |
|   2 |   WINDOW SORT       |             |     6 |   234 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| ORDER_LINES |     6 |   234 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ORDER_ID"=10)
In this case, it allows predicate pushing.

SQL> explain plan for
  2  select * from
  3    (select order_id, line_id,
  4            sum(value) over
  5              (partition by order_id order by line_id) cumul
  6     from order_lines)
  7  where order_id = 10;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 2716399136

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |     3 |   117 |     4  (25)| 00:00:01 |
|   1 |  VIEW               |             |     3 |   117 |     4  (25)| 00:00:01 |
|   2 |   WINDOW SORT       |             |     3 |   117 |     4  (25)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| ORDER_LINES |     3 |   117 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("ORDER_ID"=10)

---------------------------------------------------------------------------------------------------
SQL_ID  7zc4srwzv21gq, child number 0
-------------------------------------
SELECT * FROM ORD_LN_VW WHERE ORDER_ID = :B1

Plan hash value: 2082499838

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |       |       |     4 (100)|          |
|   1 |  VIEW               | ORD_LN_VW   |     1 |    39 |     4  (25)| 00:00:01 |
|   2 |   WINDOW SORT       |             |     1 |    39 |     4  (25)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| ORDER_LINES |     1 |    39 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("ORDER_ID"=:B1)
select decode(grouping(to_char(period,'Q')), 1, 'Total',
              nvl(to_char(period,' MM Month'),'   Subtotal')) mnth,
       sum(amount) amt
from   sales
group  by rollup(to_char(period,'Q'), period);

Sometimes you need GROUPING in a filter predicate to avoid duplicate results. [You should be able to use HAVING, but it raises ORA-00600 in XE.]

select m1, m2, amt
from   (select m1, m2, sum(amount) amt,
               grouping(m1) gm1, grouping(m2) gm2
        from   (select to_char(period,'Month') m1,
                       to_char(period,'MM') m2,
                       amount
                from   sales)
        group  by rollup (m1, m2))
where  gm1 = gm2
order  by m2, m1, amt
/
The example above excludes the detail rows shown below.

SQL> select colour, shape, count(*)
  2  from stc
  3  group by cube(colour,shape)
  4  /

COLOUR     SHAPE        COUNT(*)
---------- ---------- ----------
                             249
           Oval              83
           Round             83
           Square            83
Red                          50
Red        Oval              16
Red        Round             17
Red        Square            17
Blue                         49
Blue       Oval              17
Blue       Round             16
Blue       Square            16
Green                        50
Green      Oval              17
Green      Round             16
Green      Square            17
White                        50
White      Oval              16
White      Round             17
White      Square            17
Yellow                       50
Yellow     Oval              17
Yellow     Round             17
Yellow     Square            16

24 rows selected.
The number, datatypes and names of the columns are fixed at parse time. There can't be any chance of a subsequent execution, potentially with different bind variables, returning a differently structured data set.
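Assuming this note refers to the 11g PIVOT clause, a sketch of why it matters, reusing the STC table from the CUBE example: the IN list must be literal, because the output columns (RED_CT, BLUE_CT, ...) are fixed when the statement is parsed.

```sql
-- The values in the IN list, not the data, determine the columns
-- of the result, so they cannot be bind variables or a subquery
-- (short of falling back to PIVOT XML).
select *
from   (select colour, shape from stc)
pivot  (count(*) for colour in ('Red'   as red_ct,
                                'Blue'  as blue_ct,
                                'Green' as green_ct));
```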