表: Customer
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| customer_id | int |
| name | varchar |
| visited_on | date |
| amount | int |
+---------------+---------+
在 SQL 中,(customer_id, visited_on) 是该表的主键。
该表包含一家餐馆的顾客交易数据。
visited_on 表示 (customer_id) 的顾客在 visited_on 那天访问了餐馆。
amount 是一个顾客某一天的消费总额。
你是餐馆的老板,现在你想分析一下可能的营业额变化增长(每天至少有一位顾客)。
计算以 7 天(某日期 + 该日期前的 6 天)为一个时间段的顾客消费平均值。average_amount
要 保留两位小数。
结果按 visited_on
升序排序。
返回结果格式的例子如下。
示例 1:
输入:
Customer 表:
+-------------+--------------+--------------+-------------+
| customer_id | name | visited_on | amount |
+-------------+--------------+--------------+-------------+
| 1 | Jhon | 2019-01-01 | 100 |
| 2 | Daniel | 2019-01-02 | 110 |
| 3 | Jade | 2019-01-03 | 120 |
| 4 | Khaled | 2019-01-04 | 130 |
| 5 | Winston | 2019-01-05 | 110 |
| 6 | Elvis | 2019-01-06 | 140 |
| 7 | Anna | 2019-01-07 | 150 |
| 8 | Maria | 2019-01-08 | 80 |
| 9 | Jaze | 2019-01-09 | 110 |
| 1 | Jhon | 2019-01-10 | 130 |
| 3 | Jade | 2019-01-10 | 150 |
+-------------+--------------+--------------+-------------+
输出:
+--------------+--------------+----------------+
| visited_on | amount | average_amount |
+--------------+--------------+----------------+
| 2019-01-07 | 860 | 122.86 |
| 2019-01-08 | 840 | 120 |
| 2019-01-09 | 840 | 120 |
| 2019-01-10 | 1000 | 142.86 |
+--------------+--------------+----------------+
解释:
第一个七天消费平均值从 2019-01-01 到 2019-01-07 是restaurant-growth/restaurant-growth/ (100 + 110 + 120 + 130 + 110 + 140 + 150)/7 = 122.86
第二个七天消费平均值从 2019-01-02 到 2019-01-08 是 (110 + 120 + 130 + 110 + 140 + 150 + 80)/7 = 120
第三个七天消费平均值从 2019-01-03 到 2019-01-09 是 (120 + 130 + 110 + 140 + 150 + 80 + 110)/7 = 120
第四个七天消费平均值从 2019-01-04 到 2019-01-10 是 (130 + 110 + 140 + 150 + 80 + 110 + 130 + 150)/7 = 142.86
# Write your MySQL query statement below
# SELECT
# a.visited_on,
# sum( b.amount ) AS amount,
# round(sum( b.amount ) / 7, 2 ) AS average_amount
# FROM
# ( SELECT DISTINCT visited_on FROM customer ) a JOIN customer b
# ON datediff( a.visited_on, b.visited_on ) BETWEEN 0 AND 6
# WHERE
# a.visited_on >= (SELECT min( visited_on ) FROM customer ) + 6
# GROUP BY
# a.visited_on
# ORDER BY
# a.visited_on
SELECT DISTINCT visited_on,
sum_amount AS amount,
ROUND(sum_amount/7, 2) AS average_amount
FROM (
SELECT visited_on, SUM(amount) OVER ( ORDER BY visited_on RANGE interval 6 day preceding ) AS sum_amount
FROM Customer) t
-- 最后手动地从第7天开始
WHERE DATEDIFF(visited_on, (SELECT MIN(visited_on) FROM Customer)) >= 6
窗口函数
当使用MySQL 8.0中的窗口函数时,不同的函数类型有不同的功能。我会逐个讲解每种函数的用法,并提供相应的示例。
ROW_NUMBER(): 为结果集中的每一行分配一个唯一的序号。
语法:ROW_NUMBER() OVER (ORDER BY column_name)
示例:
SELECT column1, column2, ROW_NUMBER() OVER (ORDER BY column1) AS row_num
FROM table_name;
RANK(): 为结果集中的每一行分配一个排名。
语法:RANK() OVER (ORDER BY column_name)
示例:
SELECT column1, column2, RANK() OVER (ORDER BY column2 DESC) AS ranking
FROM table_name;
DENSE_RANK(): 为结果集中的每一行分配一个密集排名。
语法:DENSE_RANK() OVER (ORDER BY column_name)
示例:
SELECT column1, column2, DENSE_RANK() OVER (ORDER BY column1) AS dense_ranking
FROM table_name;
这些函数可以在窗口中进行聚合计算,而不影响结果集的行数。
SUM(), AVG(), MIN(), MAX(): 聚合函数也可以作为窗口函数,用于计算特定窗口或分区中的总和、平均值、最小值或最大值。
示例:
SELECT column1, column2,
SUM(column3) OVER (PARTITION BY column1) AS sum_column3,
AVG(column4) OVER (ORDER BY column2) AS avg_column4
FROM table_name;
CUME_DIST(): 计算当前行的累积分布值,表示当前行在窗口中的位置。
语法:CUME_DIST() OVER (ORDER BY column_name)
示例:
SELECT column1, column2, CUME_DIST() OVER (ORDER BY column2) AS cumulative_dist
FROM table_name;
这些函数用于访问当前行之前或之后的行的数据。
LAG(): 获取当前行之前的行的数据。
语法:LAG(column_name, offset, default_value) OVER (ORDER BY column_name)
示例:
SELECT column1, column2, LAG(column2, 1, 0) OVER (ORDER BY column1) AS lag_value
FROM table_name;
LEAD(): 获取当前行之后的行的数据。
语法:LEAD(column_name, offset, default_value) OVER (ORDER BY column_name)
示例:
SELECT column1, column2, LEAD(column2, 1, 0) OVER (ORDER BY column1) AS lead_value
FROM table_name;
FIRST_VALUE(): 返回窗口中第一个行的值。
语法:FIRST_VALUE(column_name) OVER (ORDER BY column_name)
示例:
SELECT column1, column2, FIRST_VALUE(column2) OVER (ORDER BY column1) AS first_value
FROM table_name;
LAST_VALUE(): 返回窗口中最后一个行的值。
语法:LAST_VALUE(column_name) OVER (ORDER BY column_name ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
示例:
SELECT column1, column2, LAST_VALUE(column2) OVER (ORDER BY column1 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_value
FROM table_name;
NTH_VALUE(): 返回窗口中指定位置的值。
语法:NTH_VALUE(column_name, n) OVER (ORDER BY column_name)
示例:
SELECT column1, column2, NTH_VALUE(column2, 3) OVER (ORDER BY column1) AS third_value
FROM table_name;
NTILE(): 将结果集分成指定数量的桶,并为每个行分配桶的编号。
语法:NTILE(n) OVER (ORDER BY column_name)
示例:
SELECT column1, column2, NTILE(4) OVER (ORDER BY column2) AS quartile
FROM table_name;
这些示例展示了 MySQL 8.0 中各种窗口函数的具体用法和相应的示例,能够帮助您更好地理解和应用这些函数进行数据分析。
在使用MySQL的窗口函数时,OVER
子句用于定义窗口的范围和条件。OVER
后面的内容指定了窗口函数计算的范围、排序方式和分组方式。
ORDER BY column_name: 指定窗口函数按照哪个列的值进行排序。
SELECT column1, column2, ROW_NUMBER() OVER (ORDER BY column1) AS row_num
FROM table_name;
PARTITION BY column_name: 指定窗口函数计算的分区方式,将结果集分成多个分区进行独立计算。
SELECT column1, column2, SUM(column3) OVER (PARTITION BY column1) AS sum_column3
FROM table_name;
ROWS/RANGE BETWEEN clause: 指定窗口函数的行范围或值范围。
SELECT column1, column2, SUM(column3) OVER (ORDER BY column1 ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS sum_range
FROM table_name;
UNBOUNDED PRECEDING/FOLLOWING: 指定窗口函数的起始点和终止点。
UNBOUNDED PRECEDING
表示从结果集的第一行开始。UNBOUNDED FOLLOWING
表示到结果集的最后一行结束。SELECT column1, column2, SUM(column3) OVER (ORDER BY column1 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM table_name;
SELECT
column1,
column2,
SUM(column3) OVER (
PARTITION BY column1
ORDER BY column2
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
) AS sum_range
FROM table_name;
在上述示例中,OVER
子句后面的内容用于定义窗口函数的范围和条件。您可以根据具体的需求使用 ORDER BY
、PARTITION BY
、ROWS/RANGE BETWEEN
和 UNBOUNDED PRECEDING/FOLLOWING
等子句来构建窗口函数,以满足不同的分析需求。
一些关键字的信息:
语法:
[你要的操作] OVER ( PARTITION BY <用于分组的列名>
ORDER BY <按序叠加的列名>
ROWS|RANGE <窗口滑动的数据范围> )
<窗口滑动的数据范围> 用来限定 [你要的操作] 所运用的数据的范围,具体有如下这些:
当前 - current row
之前的 - preceding
之后的 - following
无界限 - unbounded
表示从前面的起点 - unbounded preceding
表示到后面的终点 - unbounded following
举例:
取当前行和前五行:ROWS between 5 preceding and current row --共6行
取当前行和后五行:ROWS between current row and 5 following --共6行
取前五行和后五行:ROWS between 5 preceding and 5 following --共11行
取当前行和前六行:ROWS 6 preceding(等价于between...and current row) --共7行
这一天和前面6天:RANGE between interval 6 day preceding and current row --共7天
这一天和前面6天:RANGE interval 6 day preceding(等价于between...and current row) --共7天
字段值落在当前值-100到+200的区间:RANGE between 100 preceding and 200 following --共301个数值
大家好,我是xwhking,一名技术爱好者,目前正在全力学习 Java,前端也会一点,如果你有任何疑问请你评论,或者可以加我QQ(2837468248)说明来意!希望能够与你共同进步