python - How/why are {2,3,10} and {x,3,10} with x=2 ordered differently?

Sets are unordered, or rather their order is an implementation detail. I'm interested in that detail. And I saw a case that surprised me:

print({2, 3, 10})
x = 2
print({x, 3, 10})

Output (Attempt This Online!):

{3, 10, 2}
{10, 2, 3}

Despite identical elements written in identical order, they get ordered differently. How does that happen, and is that done intentionally for some reason, e.g., for optimizing lookup speed?

My sys.version and sys.implementation:

3.13.0 (main, Nov  9 2024, 10:04:25) [GCC 14.2.1 20240910]
namespace(name='cpython', cache_tag='cpython-313', version=sys.version_info(major=3, minor=13, micro=0, releaselevel='final', serial=0), hexversion=51183856, _multiarch='x86_64-linux-gnu')

Asked Jan 13 at 15:24 by Stefan Pochmann

3 @JonSG Don't know what you guys are talking about. 10 and 11 are binary. 111 is three digits, so a ternary. – Ted Klein Bergman Commented Jan 13 at 15:40
2 {2, 3, 10} is compiled into a frozenset, which changes the order. At runtime, Python creates an empty set and then adds the values. {x, 3, 10} is built as a set from the get-go. I'd wager it's just a question of the hash implementation and how much the order of insertion matters. – tdelaney Commented Jan 13 at 15:41
2 @TedKleinBergman - How in the world are 10 and 11 binary? Are you saying we count 1,2,3,4,5,6,7,8,9,3,4,12,13,14,15? Look at the definition of python integer literals and you'll see they are decimal. 0b10 would be binary. 0x10 would be hex, etc. – tdelaney Commented Jan 13 at 15:46
2 @OldBoy I'm often able to predict the order. For example with 9 instead of 10, all three numbers differ modulo 8 so there are no collisions, and so I get the order {9, 2, 3} because that's their "by modulo 8" order. – Stefan Pochmann Commented Jan 13 at 15:46
2 @tdelaney It was an attempt at a joke. It's of course not binary, but just the same representation and a coincidental correlation. – Ted Klein Bergman Commented Jan 13 at 15:50

1 Answer

It's a function of a couple of things:

Hash bucket collisions: For the smallest set table size, 8 buckets (a CPython implementation detail), 2 and 10 collide once their hash codes are cut down to a bucket index (the hash codes, again an implementation detail, are 2 and 10; mod 8, both are 2). Whichever one is inserted first "wins" and gets bucket index 2; the other gets moved by the probing operation. The probing operation (again, a CPython implementation detail) first checks linearly adjacent buckets for an empty one (it usually finds one, and better memory locality improves cache performance), and only if that fails does it start jumping around pseudo-randomly to find an empty bucket (it can't use pure linear probing, because that would make it far too easy to trigger pathological cases that degrade set operations from average-case O(1) to O(n)).

Compile-time optimizations: In modern CPython, set and list literals made up of at least three constants are constructed at compile time as an immutable container (frozenset and tuple respectively). At runtime, CPython builds an empty set/list, then updates/extends it with the immutable container, rather than performing individual loads and adds/appends for each element. This means that s = {2, 3, 10} actually does s = set(); s.update(frozenset({2, 3, 10})) (with the frozenset loaded from the code object's constants), while s = {x, 3, 10} loads x, 3 and 10 onto the stack, then builds the set in a single operation.

Together, these mean the two sets are built differently. {x, 3, 10} inserts 2, then 3, then 10, so buckets 2 and 3 are filled and 10 gets relocated (the probing strategy evidently puts it in bucket 0 or 1, before bucket 2). With {2, 3, 10}, the compiler makes a frozenset({3, 10, 2}), and at runtime CPython creates an empty set and updates it from that frozenset, which has already reordered the elements. They are no longer added in 2, 3, 10 order, so the race for "preferred" buckets is won by different elements.
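One way to see the two construction paths is to disassemble both literals. The sketch below is only an illustration: the helper names opnames and frozenset_constants are made up for this example, and the exact opcodes and constants depend on the CPython version, so no output is shown.

```python
import dis


def opnames(source):
    """Return the opcode names CPython emits for an expression (never evaluated)."""
    code = compile(source, "<example>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


def frozenset_constants(source):
    """Return any frozenset objects the compiler stored as constants."""
    code = compile(source, "<example>", "eval")
    return [c for c in code.co_consts if isinstance(c, frozenset)]


if __name__ == "__main__":
    for src in ("{2, 3, 10}", "{x, 3, 10}"):
        # On versions with the optimization described above, the
        # all-constant literal should show a frozenset constant and an
        # update-style opcode; the literal containing x should not.
        print(src, opnames(src), frozenset_constants(src))
```

Because compile() only compiles the expression, x does not need to be defined.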

In summary, the behavior of {x, 3, 10} is equivalent to:

s = set()
s.add(x)
s.add(3)
s.add(10)

which predictably gives buckets 2 and 3 to 2 and 3 themselves, with 10 being displaced to bucket 0 or 1.
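To make the bucket race concrete, here is a small pure-Python model of the probing. It is a sketch based on my reading of CPython's setobject.c, not the real implementation: the constants LINEAR_PROBES = 9 and PERTURB_SHIFT = 5, the rule that the linear window is only used when it fits before the end of the table, and the function name simulate are all assumptions, and the model ignores resizing, deletions, and negative hashes.

```python
LINEAR_PROBES = 9  # assumed value, from my reading of setobject.c
PERTURB_SHIFT = 5  # assumed value, same source


def simulate(values, size=8):
    """Insert small non-negative ints into a model table; return iteration order."""
    mask = size - 1
    table = [None] * size
    for value in values:
        h = hash(value)  # for small non-negative ints this is the int itself
        perturb = h
        i = h & mask     # "cut down" hash: the preferred bucket
        placed = False
        while not placed:
            # Linear window only when it fits before the end of the table.
            probes = LINEAR_PROBES if i + LINEAR_PROBES <= mask else 0
            for j in range(i, i + probes + 1):
                if table[j] is None or table[j] == value:
                    table[j] = value
                    placed = True
                    break
            if not placed:
                # Pseudo-random jump to a new starting bucket.
                perturb >>= PERTURB_SHIFT
                i = (i * 5 + 1 + perturb) & mask
    # Iteration walks the buckets from index 0 upward.
    return [v for v in table if v is not None]


if __name__ == "__main__":
    print(simulate([2, 3, 10]))  # [10, 2, 3]
    print(simulate([10, 2, 3]))  # [3, 10, 2]
    print(simulate([2, 3, 9]))   # [9, 2, 3]
```

In this model, inserting 2, 3, 10 displaces 10 to bucket 0, matching the second printed set in the question, and inserting the same values in the order 10, 2, 3 yields 3, 10, 2, matching the first. Whether that is exactly the insertion order the compiler ends up using is not established here. It only shows that a different insertion order is enough to produce the other result.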

By contrast, {2, 3, 10} builds a frozenset({3, 10, 2}) at compile time (note: that is its order after conversion to a frozenset; if you ran that exact expression and printed it, you'd see a different order), then updates an empty set with it. There is an optimized code path for populating an empty set from another set/frozenset that copies the contents directly (rather than iterating and inserting piecemeal), so the {3, 10, 2} ordering of the cached frozenset is preserved in each set created from it, the same as if you'd run:

s = set()
s.update(frozenset({2, 3, 10}))

but more performant (because the frozenset is created once at compile time and loaded cheaply for each new set to initialize).
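The two build paths can also be compared at runtime without relying on the compiler. This is a minimal sketch, with build_both_ways as a hypothetical helper name. The orders it returns are implementation details and may vary between interpreters and versions, so no output is shown.

```python
def build_both_ways(add_order, frozen_order):
    """Build equal sets by piecemeal adds and by a bulk update from a frozenset.

    Returns the iteration order of the piecemeal set, the frozenset,
    and the bulk-updated set.
    """
    one_by_one = set()
    for value in add_order:
        one_by_one.add(value)

    frozen = frozenset(frozen_order)  # insertion order decides its bucket layout
    bulk = set()
    bulk.update(frozen)               # may take the direct-copy path described above

    return list(one_by_one), list(frozen), list(bulk)


if __name__ == "__main__":
    added, frozen, bulk = build_both_ways((2, 3, 10), (10, 2, 3))
    print("add one by one:", added)
    print("frozenset:     ", frozen)
    print("bulk update:   ", bulk)
```

If the direct-copy path applies, the last two lists should match each other even when they differ from the first.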
