admin管理员组文章数量:1130349
Sets are unordered, or rather their order is an implementation detail. I'm interested in that detail. And I saw a case that surprised me:
print({2, 3, 10})
x = 2
print({x, 3, 10})
Output (Attempt This Online!):
{3, 10, 2}
{10, 2, 3}
Despite identical elements written in identical order, they get ordered differently. How does that happen, and is that done intentionally for some reason, e.g., for optimizing lookup speed?
My sys.version and sys.implementation:
3.13.0 (main, Nov 9 2024, 10:04:25) [GCC 14.2.1 20240910]
namespace(name='cpython', cache_tag='cpython-313', version=sys.version_info(major=3, minor=13, micro=0, releaselevel='final', serial=0), hexversion=51183856, _multiarch='x86_64-linux-gnu')
Sets are unordered, or rather their order is an implementation detail. I'm interested in that detail. And I saw a case that surprised me:
print({2, 3, 10})
x = 2
print({x, 3, 10})
Output (Attempt This Online!):
{3, 10, 2}
{10, 2, 3}
Despite identical elements written in identical order, they get ordered differently. How does that happen, and is that done intentionally for some reason, e.g., for optimizing lookup speed?
My sys.version and sys.implementation:
3.13.0 (main, Nov 9 2024, 10:04:25) [GCC 14.2.1 20240910]
namespace(name='cpython', cache_tag='cpython-313', version=sys.version_info(major=3, minor=13, micro=0, releaselevel='final', serial=0), hexversion=51183856, _multiarch='x86_64-linux-gnu')
Share
Improve this question
asked Jan 13 at 15:24
Stefan PochmannStefan Pochmann
28.6k9 gold badges47 silver badges113 bronze badges
16
|
Show 11 more comments
1 Answer
Reset to default 25It's a function of a couple things:
Hash bucket collisions - For the smallest
setsize,8(implementation detail of CPython),2and10collide on their cutdown hash codes (which, again implementation detail, are2and10; mod8, they're both 2). Whichever one is inserted first "wins" and gets bucket index 2, the other gets moved by the probing operation. The probing operation (again, CPython implementation detail) initially checks linearly adjacent buckets for an empty bucket (because it usually finds one, and better memory locality improves cache performance), and only if it doesn't find one does it begin the randomized jumping about algorithm to find an empty bucket (it can't do pure linear probing, because that would make it far too easy to trigger pathological cases that changesetoperations from amortized average-caseO(1)toO(n)).Compile-time optimizations: In modern CPython,
sets andlists of constant literals that are at least three elements long are constructed at compile time as an immutable container (frozensetandtuplerespectively). At runtime, it builds an emptyset/list, thenupdates/extends it with the immutable container, rather than performing individual loads andadds/appends for each element. This means that when you build withs = {2, 3, 10}, you're actually doings = set(),s.update(frozenset({2, 3, 10}))(with thefrozensetpulled from cache), whiles = {x, 3, 10}is building by loadingx,3and10on the stack, then building thesetas a single operation.
The two of these mean that you're actually building it differently; {x, 3, 10} is inserting 2, then 3, then 10, so buckets 2 and 3 are filled, and 10 gets relocated (the probing strategy clearly puts it in bucket 0 or 1, before bucket 2). When you do {2, 3, 10}, at compile-time it's making a frozenset({3, 10, 2}), then at runtime, it's creating the empty set, then updating it by iterating that frozenset, which has already reordered the elements, so now they're no longer being added in 2, 3, 10 order, and the race for "preferred" buckets is won by different elements.
In summary, the behavior of {x, 3, 10} is equivalent to:
s = set()
s.add(x)
s.add(3)
s.add(10)
which predictably gives buckets 2 and 3 to 2 and 3 themselves, with 10 being displaced to bucket 0 or 1.
By contrast, {2, 3, 10} builds a frozenset({3, 10, 2}) (note: it's in that order after conversion to frozenset; if you tried to run that exact line and print it, you'd see a different order), then updates an empty set with it. There is an optimized code path for populating an empty set from another set/frozenset that just copies the contents directly (rather than iterating and inserting piecemeal), so the {3, 10, 2} ordering in the cached frozenset is preserved in each set created from it, the same as as if you'd run:
s = set()
s.update(frozenset({2, 3, 10}))
but more performant (because the frozenset is created once at compile time and loaded cheaply for each new set to initialize).
Sets are unordered, or rather their order is an implementation detail. I'm interested in that detail. And I saw a case that surprised me:
print({2, 3, 10})
x = 2
print({x, 3, 10})
Output (Attempt This Online!):
{3, 10, 2}
{10, 2, 3}
Despite identical elements written in identical order, they get ordered differently. How does that happen, and is that done intentionally for some reason, e.g., for optimizing lookup speed?
My sys.version and sys.implementation:
3.13.0 (main, Nov 9 2024, 10:04:25) [GCC 14.2.1 20240910]
namespace(name='cpython', cache_tag='cpython-313', version=sys.version_info(major=3, minor=13, micro=0, releaselevel='final', serial=0), hexversion=51183856, _multiarch='x86_64-linux-gnu')
Sets are unordered, or rather their order is an implementation detail. I'm interested in that detail. And I saw a case that surprised me:
print({2, 3, 10})
x = 2
print({x, 3, 10})
Output (Attempt This Online!):
{3, 10, 2}
{10, 2, 3}
Despite identical elements written in identical order, they get ordered differently. How does that happen, and is that done intentionally for some reason, e.g., for optimizing lookup speed?
My sys.version and sys.implementation:
3.13.0 (main, Nov 9 2024, 10:04:25) [GCC 14.2.1 20240910]
namespace(name='cpython', cache_tag='cpython-313', version=sys.version_info(major=3, minor=13, micro=0, releaselevel='final', serial=0), hexversion=51183856, _multiarch='x86_64-linux-gnu')
Share
Improve this question
asked Jan 13 at 15:24
Stefan PochmannStefan Pochmann
28.6k9 gold badges47 silver badges113 bronze badges
16
- 3 @JonSG Don't know what you guys are talking about. 10 and 11 are binary. 111 is three digits, so a ternary. – Ted Klein Bergman Commented Jan 13 at 15:40
-
2
{2, 3, 10}are compiled into afrozenset, which changes their order. At runtime, python creates an empty set and then adds the values.{x, 3, 10}are built as a set from the get go. I'd wager its just a question of the hash implementation and how differing the order of insertion matters. – tdelaney Commented Jan 13 at 15:41 -
2
@TedKleinBergman - How in the world are 10 and 11 binary? Are you saying we count 1,2,3,4,5,6,7,8,9,3,4,12,13,14,15? Look at the definition of python integer literals and you'll see they are decimal.
0b10would be binary.0x10would be hex, etc. – tdelaney Commented Jan 13 at 15:46 - 2 @OldBoy I'm often able to predict the order. For example with 9 instead of 10, all three numbers differ modulo 8 so there are no collisions, and so I get the order {9, 2, 3} because that's their "by modulo 8" order. – Stefan Pochmann Commented Jan 13 at 15:46
- 2 @tdelaney It was an attempt at a joke. It's of course not binary, but just the same representation and a coincidental correlation. – Ted Klein Bergman Commented Jan 13 at 15:50
1 Answer
Reset to default 25It's a function of a couple things:
Hash bucket collisions - For the smallest
setsize,8(implementation detail of CPython),2and10collide on their cutdown hash codes (which, again implementation detail, are2and10; mod8, they're both 2). Whichever one is inserted first "wins" and gets bucket index 2, the other gets moved by the probing operation. The probing operation (again, CPython implementation detail) initially checks linearly adjacent buckets for an empty bucket (because it usually finds one, and better memory locality improves cache performance), and only if it doesn't find one does it begin the randomized jumping about algorithm to find an empty bucket (it can't do pure linear probing, because that would make it far too easy to trigger pathological cases that changesetoperations from amortized average-caseO(1)toO(n)).Compile-time optimizations: In modern CPython,
sets andlists of constant literals that are at least three elements long are constructed at compile time as an immutable container (frozensetandtuplerespectively). At runtime, it builds an emptyset/list, thenupdates/extends it with the immutable container, rather than performing individual loads andadds/appends for each element. This means that when you build withs = {2, 3, 10}, you're actually doings = set(),s.update(frozenset({2, 3, 10}))(with thefrozensetpulled from cache), whiles = {x, 3, 10}is building by loadingx,3and10on the stack, then building thesetas a single operation.
The two of these mean that you're actually building it differently; {x, 3, 10} is inserting 2, then 3, then 10, so buckets 2 and 3 are filled, and 10 gets relocated (the probing strategy clearly puts it in bucket 0 or 1, before bucket 2). When you do {2, 3, 10}, at compile-time it's making a frozenset({3, 10, 2}), then at runtime, it's creating the empty set, then updating it by iterating that frozenset, which has already reordered the elements, so now they're no longer being added in 2, 3, 10 order, and the race for "preferred" buckets is won by different elements.
In summary, the behavior of {x, 3, 10} is equivalent to:
s = set()
s.add(x)
s.add(3)
s.add(10)
which predictably gives buckets 2 and 3 to 2 and 3 themselves, with 10 being displaced to bucket 0 or 1.
By contrast, {2, 3, 10} builds a frozenset({3, 10, 2}) (note: it's in that order after conversion to frozenset; if you tried to run that exact line and print it, you'd see a different order), then updates an empty set with it. There is an optimized code path for populating an empty set from another set/frozenset that just copies the contents directly (rather than iterating and inserting piecemeal), so the {3, 10, 2} ordering in the cached frozenset is preserved in each set created from it, the same as as if you'd run:
s = set()
s.update(frozenset({2, 3, 10}))
but more performant (because the frozenset is created once at compile time and loaded cheaply for each new set to initialize).
本文标签: pythonHowwhy are 2310 and x10 with x2 ordered differentlyStack Overflow
版权声明:本文标题:python - Howwhy are {2,3,10} and {x,3,10} with x=2 ordered differently? - Stack Overflow 内容由热心网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:https://it.en369.cn/questions/1737438556a1473354.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。


{2, 3, 10}are compiled into afrozenset, which changes their order. At runtime, python creates an empty set and then adds the values.{x, 3, 10}are built as a set from the get go. I'd wager its just a question of the hash implementation and how differing the order of insertion matters. – tdelaney Commented Jan 13 at 15:410b10would be binary.0x10would be hex, etc. – tdelaney Commented Jan 13 at 15:46