Remove duplicates in a list while keeping its order (Python)

Question

I'm not sure whether there is any other way than:

$result = [ $result ]   if ref($result) ne 'ARRAY';  
foreach .....

20

python list

asked by Community on 23 May 2017 at 12:25

6 answers

Generators are great.

def unique( seq ):
    seen = set()
    for item in seq:
        if item not in seen:
            seen.add( item )
            yield item

biglist[:] = unique( biglist )
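
One caveat: if the items are dicts, as in the biglist of title/link dicts used elsewhere on this page, they are not hashable, so seen.add( item ) would raise TypeError. A minimal sketch of a keyed variant follows. The name unique_by and its key parameter are my own additions for illustration, not part of the answer above.

```python
def unique_by(seq, key=lambda item: item):
    """Yield items of seq in order, skipping any whose key was already seen."""
    seen = set()
    for item in seq:
        marker = key(item)  # must be hashable, e.g. the 'link' string
        if marker not in seen:
            seen.add(marker)
            yield item


biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]

# Slice assignment replaces the contents while keeping the same list object.
biglist[:] = unique_by(biglist, key=lambda d: d['link'])
```

With the default key the function behaves like unique() for hashable items such as strings or numbers.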

9

answered 29 November 2019 at 22:38

This page discusses different methods and their speeds: http://www.peterbe.com/plog/uniqifiers-benchmark

The recommended* method:

def f5(seq, idfun=None):  
    # order preserving 
    if idfun is None: 
        def idfun(x): return x 
    seen = {} 
    result = [] 
    for item in seq: 
        marker = idfun(item) 
        # in old Python versions: 
        # if seen.has_key(marker) 
        # but in new ones: 
        if marker in seen: continue 
        seen[marker] = 1 
        result.append(item) 
    return result

f5(biglist,lambda x: x['link'])

*by that page

3

answered 29 November 2019 at 22:38

dups = {}
newlist = []
for x in biglist:
    if x['link'] not in dups:
        newlist.append(x)
        dups[x['link']] = None

print(newlist)

produces (on Python 3.7+, where dicts print in insertion order)

[{'title': 'U2 Band', 'link': 'u2.com'}, {'title': 'ABC Station', 'link': 'abc.com'}]

Note that here I used a dictionary. This makes the test not in dups much more efficient than using a list.
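
On Python 3.7 and later, where dicts are guaranteed to keep insertion order, the dictionary can hold the result itself instead of only serving as a lookup table. This is a sketch under that version assumption, and first_by_link is a name made up for illustration.

```python
biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]

first_by_link = {}
for x in biglist:
    # setdefault stores x only if the link is not already a key,
    # so the first dict seen for each link is the one that is kept.
    first_by_link.setdefault(x['link'], x)

# Keys were inserted in order of first appearance, so values() keeps that order.
newlist = list(first_by_link.values())
```

A plain dict comprehension keyed on the link would also remove duplicates, but it would keep the last item for each link rather than the first.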

1

answered 29 November 2019 at 22:38

A super simple way to do it:

def uniq(a):
    if len(a) == 0:
        return []
    else:
        return [a[0]] + uniq([x for x in a if x != a[0]])

This is not the most efficient way, because:

it scans the whole list for every element in the list, so it is O(n^2)
it is recursive, so it uses a stack depth of up to the length of the list (one call per distinct element)

However, for simple purposes (no more than a few hundred elements, not performance-critical) it is good enough.
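
The same idea can be written as a loop, which avoids the recursion depth concern while staying O(n^2). This is only a sketch, and uniq_iter is a hypothetical name. Like uniq above, it compares with == rather than hashing, so it also works for unhashable items such as dicts.

```python
def uniq_iter(a):
    """Return the items of a in order, keeping only first occurrences."""
    result = []
    for x in a:
        # Linear scan of what has been kept so far: O(n) per item, O(n^2) overall.
        if x not in result:
            result.append(x)
    return result


result = uniq_iter([3, 1, 3, 2, 1])  # [3, 1, 2]
```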

0

answered 29 November 2019 at 22:38

I think using a set should be pretty efficient.

seen_links = set()
index = 0
while index < len(biglist):
    link = biglist[index]['link']
    if link in seen_links:
        # Do not advance: the next item has shifted into this slot.
        del biglist[index]
    else:
        seen_links.add(link)
        index += 1

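The set lookups are O(1) on average, but each del shifts the rest of the list, so I think this comes in at O(n^2) in the worst case (many duplicates) rather than O(n log(n)).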
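
Deleting an element from the middle of a Python list shifts every element after it, so repeated del calls can add up to quadratic time. One possible alternative that still works in place is to copy each kept item forward and truncate the tail once at the end. The helper name dedupe_in_place and its key parameter are assumptions made for this sketch.

```python
def dedupe_in_place(items, key):
    """Remove later duplicates (by key) from items, in place, in one pass."""
    seen = set()
    write = 0  # next slot for a kept item; never ahead of the read position
    for item in items:
        marker = key(item)
        if marker not in seen:
            seen.add(marker)
            items[write] = item
            write += 1
    # Everything from write onward is leftover; drop it in a single operation.
    del items[write:]


biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]
dedupe_in_place(biglist, key=lambda d: d['link'])
```

The list length does not change during the loop, so iterating and assigning at the same time is safe here.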

0

answered 29 November 2019 at 22:38


score 24 · Accepted Answer

My answer to your other question, which you completely ignored!, shows you're wrong in claiming that

The answers of that question did not keep the "order"

my answer did keep order, and it clearly said it did. Here it is again, with added emphasis to see if you can just keep ignoring it...:

Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:

biglist = [ 
    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'ABC Station','link':'abc.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'} 
]

known_links = set()
newlist = []

for d in biglist:
  link = d['link']
  if link in known_links: continue
  newlist.append(d)
  known_links.add(link)

biglist[:] = newlist