Table user_power_rank:
name | type | pk |
---|---|---|
id | string | 1 |
userid | integer | 0 |
power | integer | 0 |
atime | integer | 0 |
Indexes of this table:
index name | columns |
---|---|
i_power_desc_atime_asc | power desc, atime asc |
i_id_power | id, power |
Query to get user rank:
SELECT * FROM (
    SELECT id, power, row_number() OVER (ORDER BY power DESC, atime ASC) AS ranking
    FROM user_power_rank
) WHERE id = 'the-data-id'
Run time of this query:
data count | avg query time(ms) |
---|---|
10000 | 17.81 |
50000 | 101.32 |
100000 | 218.69 |
Performance is bad. Query plan:
id | parent | detail |
---|---|---|
2 | 0 | CO-ROUTINE SUBQUERY 1 |
5 | 2 | CO-ROUTINE SUBQUERY 3 |
9 | 5 | SCAN user_power_rank USING INDEX i_power_desc_atime_asc |
24 | 2 | SCAN SUBQUERY 3 |
63 | 0 | SCAN SUBQUERY 1 |
How do I optimize performance?
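(For anyone who wants to reproduce the measurements, here is a minimal harness sketch using Python's built-in sqlite3 module. The schema, indexes, and query are taken from the question; the row count, generated data, and resulting timing are illustrative only, not the OP's. Window functions require SQLite >= 3.25.)

```python
import random
import sqlite3
import time

con = sqlite3.connect(":memory:")  # OP confirmed an in-memory database
con.executescript("""
    CREATE TABLE user_power_rank (
        id     TEXT PRIMARY KEY,
        userid INTEGER,
        power  INTEGER,
        atime  INTEGER
    );
    CREATE INDEX i_power_desc_atime_asc ON user_power_rank(power DESC, atime ASC);
    CREATE INDEX i_id_power ON user_power_rank(id, power);
""")

# Fill the table with synthetic rows (atime = insertion order).
n = 10_000
con.executemany(
    "INSERT INTO user_power_rank VALUES (?, ?, ?, ?)",
    [(f"id-{i}", i, random.randrange(1_000_000), i) for i in range(n)],
)

# Time the original window-function ranking query for one id.
start = time.perf_counter()
row = con.execute("""
    SELECT * FROM (
        SELECT id, power,
               row_number() OVER (ORDER BY power DESC, atime ASC) AS ranking
        FROM user_power_rank
    ) WHERE id = ?
""", ("id-123",)).fetchone()
elapsed = (time.perf_counter() - start) * 1000
print(row, f"{elapsed:.2f} ms")
```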
asked Nov 19, 2024 at 3:14 by Ace.Yin; edited Nov 19, 2024 at 23:50 by user4157124
Comments:
- Why do you do a 2-step request and not just one request with the WHERE done directly in it? It would make the request less expensive if most rows are filtered out. – Jérôme Richard, Nov 19, 2024 at 14:24
- @JérômeRichard, because it's the global row_number() that OP is after. The query is correct, and the query plan is also optimal: at worst a full table scan is required to establish the rank; this is done walking the index and yielding the one row where id = ? as soon as it comes along. If the table is read-heavy, it might be worth the cost to update the ranking (for all rows) using a trigger. – user2722968, Nov 19, 2024 at 18:29
- OK. I wonder if the problem is the SQLite implementation itself. Assuming your ID strings are <= 8 bytes and you use 64-bit integers, the DB with 100,000 rows should take only a few MiB, so the operation cannot be memory-bound (< 1% of the bandwidth), and I don't see why it would be compute-bound either. That said, I assumed the indexes are stored in memory, but apparently that is not the case from what I can read. If they are not and you have an HDD, this can be very slow; even with an SSD it is not great (though far better). That makes things latency-bound. – Jérôme Richard, Nov 19, 2024 at 23:26
- Is it OK for you to use an in-memory database? If so, can you try that just to check whether IO latency is the issue? – Jérôme Richard, Nov 19, 2024 at 23:32
- @JérômeRichard yes, this is an in-memory database. – Ace.Yin, Nov 20, 2024 at 1:09
1 Answer
Since your ranking is simply the global row_number(), it is equivalent to the number of rows that sort "above" it. Instead of establishing the row_number() for all rows and then picking one result, we can count the number of rows that sort "above" that row for that exact sort order; that count is the value row_number() would have had. (The atime comparison below uses <=, so the row itself is included in the count and the result starts at 1, just like row_number().)
SELECT id, power,
       (SELECT COUNT(*)
        FROM user_power_rank AS "inner"  -- quoted: INNER is a reserved word in SQLite
        WHERE "inner".power > user_power_rank.power
           OR ("inner".power = user_power_rank.power
               AND "inner".atime <= user_power_rank.atime)) AS ranking
FROM user_power_rank
WHERE id = ?;
For 200,000 rows in user_power_rank the above query executes in 0.04 s instead of 0.24 s; for 1,000,000 rows it executes in 0.18 s instead of 1.6 s. Your mileage may vary depending on how the values for power are distributed.
One needs to be careful about how NULL values would have been treated by ORDER BY compared to > / <, if NULL values in power / atime are allowed at all.
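To make that caveat concrete, here is a small sketch with made-up rows (data and ids are hypothetical). SQLite sorts NULL as smaller than every value, so under ORDER BY power DESC a NULL-power row still gets the last position, while every > / = comparison against NULL is NULL, so the COUNT-based rewrite counts nothing for that row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE user_power_rank"
            " (id TEXT PRIMARY KEY, userid INTEGER, power INTEGER, atime INTEGER)")
con.executemany("INSERT INTO user_power_rank VALUES (?, ?, ?, ?)", [
    ("a", 1, 30, 1),
    ("b", 2, 20, 2),
    ("c", 3, None, 3),  # NULL power
])

# Window-function ranking: the NULL row still gets a position (last under DESC).
win = con.execute("""
    SELECT id, row_number() OVER (ORDER BY power DESC, atime ASC)
    FROM user_power_rank ORDER BY id
""").fetchall()
print(win)  # [('a', 1), ('b', 2), ('c', 3)]

# COUNT-based ranking: comparisons against NULL are NULL, so the NULL row
# counts no rows at all and gets ranking 0 -- the two queries disagree.
cnt = con.execute("""
    SELECT id,
           (SELECT COUNT(*) FROM user_power_rank AS "inner"
            WHERE "inner".power > user_power_rank.power
               OR ("inner".power = user_power_rank.power
                   AND "inner".atime <= user_power_rank.atime)) AS ranking
    FROM user_power_rank ORDER BY id
""").fetchall()
print(cnt)  # [('a', 1), ('b', 2), ('c', 0)]
```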
Notice that in the construction of this query:
- inner.power > user_power_rank.power is equivalent to OVER (ORDER BY power DESC);
- OR (inner.power = user_power_rank.power AND inner.atime <= user_power_rank.atime) is equivalent to OVER (..., atime ASC), but only for those values where power does not establish an order (because the values are equal).
Also notice that with the original query there is ambiguity with respect to the value of ranking if there are multiple rows with the same (power, atime) values; this could have been solved by OVER (ORDER BY power DESC, atime ASC, id) (notice id at the bottom of the sort tree) to guarantee an unambiguous order. The ambiguity is preserved in the rewritten query, which may cause it to return a different result than OP's query. This is not an error; the exact sort order was ambiguous to begin with.
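As a quick self-check of the rewrite, the sketch below (sample data of my own, not OP's) generates rows with no NULLs and unique (power, atime) pairs, per the caveats above, and verifies that the window-function query and the COUNT-based query agree on every row's rank:

```python
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE user_power_rank"
            " (id TEXT PRIMARY KEY, userid INTEGER, power INTEGER, atime INTEGER)")

random.seed(42)
# atime = i keeps every (power, atime) pair unique even when power repeats.
rows = [(f"id-{i}", i, random.randrange(100), i) for i in range(500)]
con.executemany("INSERT INTO user_power_rank VALUES (?, ?, ?, ?)", rows)

window = dict(con.execute("""
    SELECT id, row_number() OVER (ORDER BY power DESC, atime ASC)
    FROM user_power_rank
""").fetchall())

counted = dict(con.execute("""
    SELECT id,
           (SELECT COUNT(*) FROM user_power_rank AS "inner"
            WHERE "inner".power > user_power_rank.power
               OR ("inner".power = user_power_rank.power
                   AND "inner".atime <= user_power_rank.atime))
    FROM user_power_rank
""").fetchall())

assert window == counted
print("rankings agree for all", len(window), "rows")
```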
Title: sql - How can I improve performance of this row_number() ranking window function query? - Stack Overflow