
I was wondering if someone knows a really efficient way to get all unique numbers from a sorted list. Example: if I have list = [1,1,2,3,3,3,6,6,8,10,100,180,180] I want to get a list like this: [1,2,3,6,8,10,100,180]. I am looking for a solution that is more efficient than going through the whole list, i.e. with complexity better than O(n).

My first idea was to check every second number, but it did not work. Here is my code:

solution = []
i = 0
while i < len(values) - 2:
    if values[i] == values[i + 2]:
        if values[i] not in solution:
            solution.append(values[i])
    else:
        if values[i] not in solution:
            solution.append(values[i])

        if values[i + 1] != values[i + 2] \
                and values[i] != values[i + 1]:
            solution.append(values[i + 1])
        solution.append(values[i + 2])
    i += 2

if values[-1] not in solution:
    solution.append(values[-1])
return solution


4 Answers


Let k be the number of distinct elements, and n the total number of elements.

get all unique numbers from the sorted list [with] ... complexity better than O(n).

In the general case, clearly you can't, since k might be some fraction of n or even be as large as n, and merely writing out Θ(n) distinct values already costs Ω(n). There do appear to be at least two degrees of wiggle room here.

counting sort

How did that sorted list come to be, in the first place? Perhaps it was constructed using some comparison sort, with O(n log n) cost. But perhaps we know the elements have restricted range, and we chose to construct the list using counting sort, with O(n) time complexity and O(k) memory complexity.

In which case, preserve those k counters and read out the element values they correspond to, with O(k) time complexity. So rather than accepting just "a list" as input, we also accept the counters data structure.
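A minimal Python sketch of that idea, assuming the values are small non-negative integers (the restricted-range premise of counting sort):

def counting_sort(values, max_value):
    """Counting sort over a known range: O(n + max_value) time,
    O(max_value) memory for the counters."""
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1
    sorted_list = [v for v, c in enumerate(counts) if c for _ in range(c)]
    return sorted_list, counts

# Keep the counters around, and the unique values fall out later
# without re-scanning the n-element list:
sorted_list, counts = counting_sort([1, 1, 2, 3, 3, 3, 6, 6], max_value=8)
unique = [v for v, c in enumerate(counts) if c]   # [1, 2, 3, 6]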

compression

Almost the same thing: insist that the sorted values arrive in some general-purpose compressed format such as Lempel-Ziv. The PyArrow Apache Parquet binary file format works nicely. With k = 12 and n = 10^6, those million randomly chosen values will be represented in a file of just 675 bytes. And now it's a matter of uncompressing to recover a few (count, value) tuples, rather than recovering a giant number of element values.
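What such a compressor exploits is the run-length structure of sorted data. A toy sketch of the (count, value) view, leaving out the Parquet machinery (building the pairs is itself O(n); the win comes when the data already arrives in this form):

from itertools import groupby

def runs(sorted_values):
    # One (count, value) pair per distinct value in a sorted list.
    return [(len(list(g)), v) for v, g in groupby(sorted_values)]

print(runs([1, 1, 2, 3, 3, 3, 6, 6]))   # [(2, 1), (1, 2), (3, 3), (2, 6)]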

binary search

As we sequentially scan through all n elements we shall encounter k "breaks", where we advance to a brand new element value. We can bisect to locate such breaks.

Consider the degenerate case. Maybe each element corresponds to "female" or "male", and the application was used at a girls' school, so the list simply indicates we have n females. Clearly we can identify this case in O(1) constant time, by examining the first and last elements and relying on monotonicity of the input.

Let's visit a public school and try that again. It develops that first and last elements differ by one, and we will call the start of the list the "first break". Now we need to find a single break. Apply binary search in the usual way, with O(log n) time complexity.

In a more general setting where k << n still holds, we can find k breaks in O(k log n) time, with O(k) memory complexity.


The following Rust code will find breaks in values pretty quickly. It uses a half-open interval.

/// Finds the first index in the half-open range [start, end) at which
/// the element value changes from xs[start]; returns end if it never does.
fn find_break(xs: &[i16], start: usize, end: usize) -> usize {
    assert!(!xs.is_empty());
    let mut i = start;
    let mut j = end;
    assert!(i < j);

    // Invariant: xs[start..i] all equal xs[start]; the first break,
    // if any, lies in [i, j].
    while i < j {
        let mid = i + (j - i) / 2;
        if xs[start] == xs[mid] {
            i = mid + 1;
        } else {
            j = mid;
        }
    }
    assert!(i == end || xs[i - 1] != xs[i]);

    i
}
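The same repeated-jump idea is compact in Python, since bisect_right can skip an entire run in O(log n); this sketch collects all k distinct values in O(k log n):

from bisect import bisect_right

def unique_sorted(s):
    uniq = []
    i = 0
    while i < len(s):
        uniq.append(s[i])
        i = bisect_right(s, s[i], i)  # first index past the current run
    return uniq

print(unique_sorted([1, 1, 2, 3, 3, 3, 6, 6, 8, 10, 100, 180, 180]))
# [1, 2, 3, 6, 8, 10, 100, 180]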

Going left-to-right, finding the end of the current value with exponential search:

from bisect import bisect

def uniq1(s):
    n = len(s)
    i = 0
    uniq = []
    while i < n:
        uniq.append(s[i])
        k = 1
        while i+k < n and s[i+k] == s[i]:
            k *= 2
        i = bisect(s, s[i], i+k//2+1, min(i+k, n))
    return uniq
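Each run of length m costs O(log m) for the doubling phase plus O(log m) for the final bisect, so the whole scan is O(k log(n/k)) in the worst case, slightly better than k independent binary searches over the full list.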

Divide-and-conquer idea from someone's comment:

def uniq2(s):
    uniq = []
    def add(i, j):
        if s[i] == s[j]:
            if s[i] not in uniq[-1:]:
                uniq.append(s[i])
            return
        m = (i + j) // 2
        add(i, m)
        add(m+1, j)
    if s:
        add(0, len(s) - 1)
    return uniq

Rough benchmark with a million elements from a pool of 1000 unique values:

True   4.4 ms  uniq_exponential
True   9.0 ms  uniq_divide
True  64.4 ms  uniq_dict
True  26.0 ms  uniq_groupby_1
True  36.4 ms  uniq_groupby_2
True  29.3 ms  uniq_groupby_3

Test script:

from bisect import bisect

def uniq_exponential(s):
    n = len(s)
    i = 0
    uniq = []
    while i < n:
        uniq.append(s[i])
        k = 1
        while i+k < n and s[i+k] == s[i]:
            k *= 2
        i = bisect(s, s[i], i+k//2+1, min(i+k, n))
    return uniq


def uniq_divide(s):
    uniq = []
    def add(i, j):
        if s[i] == s[j]:
            if s[i] not in uniq[-1:]:
                uniq.append(s[i])
            return
        m = (i + j) // 2
        add(i, m)
        add(m+1, j)
    if s:
        add(0, len(s) - 1)
    return uniq


def uniq_dict(s):
    return list(dict.fromkeys(s))


from itertools import groupby

def uniq_groupby_1(s):
    return [x for x, _ in groupby(s)]

def uniq_groupby_2(s):
    return list(next(zip(*groupby(s))))

from operator import itemgetter

def uniq_groupby_3(s):
    return list(map(itemgetter(0), groupby(s)))


funcs = [uniq_exponential, uniq_divide, uniq_dict, uniq_groupby_1, uniq_groupby_2, uniq_groupby_3]

import random
from time import time

def test(s):
    expect = sorted(set(s))
    for f in funcs:
        t = time()
        print(f(s) == expect, f'{(time()-t)*1e3:5.1f} ms ', f.__name__)
    print()

test([1,1,2,3,3,3,6,6,8,10,100,180,180])
test(sorted(random.choices(random.sample(range(1000000), 1000), k=1000000)))
test(list(range(10000)))


You haven't told us how many unique elements k there are among the n elements, but if k were small relative to n, and n were really large, I'd do something like this.

For a sublist x[low .. < high], we want to know all the runs that start in that sublist.

  • If x[low] == x[high - 1], the whole sublist is a single value. If low > 0 and x[low] == x[low - 1], then no runs start in this sublist. Otherwise, x[low] is the start of a run.
  • If high - low < delta, for some value delta to be determined, just look at the values linearly. There's nothing to be gained by the recursive algorithm.
  • Let s = sqrt(high - low). Break the sublist into about s chunks of s elements each. Perform the algorithm recursively on each one.

I suspect that this will, in most cases, be sublinear. It will quickly identify large blocks of identical elements without looking at each element.

My gut tells me that dividing into s chunks of s elements is the best choice in the third case above. I might also experiment with just splitting the list in two and doing a binary search for run starts that way. Probably easier to code.
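A sketch of that recursion in Python, with a hypothetical cutoff DELTA and math.isqrt for the chunk size (both are illustrative choices, not tuned):

import math

DELTA = 8  # hypothetical cutoff below which a plain linear scan wins

def run_starts(x):
    """Indices in sorted list x where a new run begins."""
    starts = []

    def visit(low, high):  # half-open sublist x[low:high]
        if x[low] == x[high - 1]:
            # Whole sublist is one value; it starts a run only if it
            # differs from the element just before the sublist.
            if low == 0 or x[low - 1] != x[low]:
                starts.append(low)
            return
        if high - low < DELTA:
            for i in range(low, high):  # small sublist: scan linearly
                if i == 0 or x[i - 1] != x[i]:
                    starts.append(i)
            return
        step = math.isqrt(high - low)  # ~sqrt chunks of ~sqrt elements
        for lo in range(low, high, step):
            visit(lo, min(lo + step, high))

    if x:
        visit(0, len(x))
    return starts

x = [1, 1, 2, 3, 3, 3, 6, 6, 8, 10, 100, 180, 180]
print([x[i] for i in run_starts(x)])   # [1, 2, 3, 6, 8, 10, 100, 180]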

from itertools import groupby

sorted_list = [1, 1, 2, 3, 3, 3, 6, 6, 8, 10, 100, 180, 180]

unique_list = [key for key, _ in groupby(sorted_list)]

print(unique_list)
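# prints [1, 2, 3, 6, 8, 10, 100, 180]

This is still an O(n) pass, but itertools.groupby collapses each run of equal adjacent values into a single key, which is exactly the right primitive for deduplicating sorted data.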

