您现在的位置： Linux教程網 >> UnixLinux > >> Linux編程 >> Linux編程

Python進階強化訓練之數據結構與算法進階

如何在列表、字典、集合中根據條件篩選數據？

實際問題

過濾列表中的負數
篩選出字典種值高於90的項
篩選出集合種能被3整出的元素

圍繞上面三個問題我們來進行討論，比如下面有一個列表：

>>>from random import randint
>>>li = [randint(-10, 10) for _ in range(10)]
>>>li
[-10, -9, 1, 10, -3, -7, -6, -7, 4, -5]

我們常規的做法就是通過for循環對列表中的每一個值進行迭代，然後判斷如果值大於等於0，就確定這個值是一個整數，否則就丟棄，比如下面的代碼：

>>>result = []
>>>for n in li:
# 如果這個元素大於等於0
...   if n >= 0:
# 加入的result列表中
...     result.append(n)
...
>>>result
[1, 10, 4]

實例

本篇所有的代碼均在Python 3.5.x種運行，如果你使用的是python 2.7.x，那麼請自行測試，在此之前，請導入一下模塊用於測試：

# 用於生成隨機數
>>>from random import randint
# 准確測量小段代碼的執行時間
>>>import timeit

請仔細閱讀下面的代碼，看完後你將會有不一樣的收獲。

列表

filter函數

生成一個隨機列表

>>>li = [randint(-10, 10) for _ in range(10)]
>>>li
[6, -8, 9, 3, 3, 8, 9, -4, 9, -6]

# x=列表中的一個元素，有多少個元素就迭代多少次
>>>result = filter(lambda x: x >=0, li)
>>>for n in result:
...  print(n)
...
6
9
3
3
8
9
9

列表解析

生成一個隨機列表

>>>li = [randint(-10, 10) for _ in range(10)]
>>>li
[8, -5, -2, 8, 9, 4, -6, -5, 5, 4]

>>>[x for x in li if x >=0 ]
[8, 8, 9, 4, 5, 4]

filter與列表解析性能對比

使用filter執行時間

>>>timeit.Timer('filter(lambda x: x >=0, [4, -1, 1, 3, -10, 5, -8, 0, 6, 3])').timeit()
0.38938787838583266

使用列表解析執行時間

>>>timeit.Timer('[x for x in [4, -1, 1, 3, -10, 5, -8, 0, 6, 3] if x >=0 ]').timeit()
1.1142896312373978

通過以上的測試可以看出filter的執行時間明顯比列表解析要快些，當然這並不是一個非常准確的數字，還是有待考察的。

字典

先隨機生成一個字典：

>>>dic = { x: randint(60, 100) for x in range(1, 21) }
>>>dic
{1: 61, 2: 75, 3: 69, 4: 70, 5: 79, 6: 90, 7: 74, 8: 85, 9: 77, 10: 86, 11: 93, 12: 96, 13: 86, 14: 79, 15: 60, 16: 84, 17: 70, 18: 72, 19: 61, 20: 87}

字典解析

>>>{ k: v for k, v in dic.items() if v > 90 }
{11: 93, 12: 96}

集合

生成一個集合：

>>>li = [randint(-10, 10) for _ in range(10)]
>>>s = set(li)
>>>s
{0, 1, 3, 4, 7, -9, -8}

集合解析

>>>{ x for x in s if x % 3 == 0 }
{0, 3, -9}

如何為元組中的每個元素命名、提高程序可讀性？

實際問題

某校的學生信息系統中的數據存儲格式如下：

(名字,年齡,性別,郵箱地址)

比如有如下學生信息:

student1 = ('Hello', 15, 'Schoolboy', 'hello@gmail.com')
student2 = ('World', 16, 'Girls', 'World@gmail.com')
student3 = ('ansheng', 20, 'Schoolboy', 'anshengme.com@gmail.com')
.....

通常我們會以如下方式進行取值：

>>>student1[2]
'Schoolboy'
>>>student1[3]
'hello@gmail.com'

在代碼比較多的情況下，使用大量的索引進行訪問會降低程序的可讀性，如何解決這個問題呢？

方案1

定義類似於其他語言的枚舉類型，也就是定義一系列的數值常量.

# 創建一個學生
>>>student = ('ansheng', 20, 'Schoolboy', 'anshengme.com@gmail.com')
# 定義常量
>>>NAME, AGE, SEX, EMAIL = range(4)
# 通過常量進行取值
>>>student[NAME]
'ansheng'
>>>student[AGE]
20
>>>student[EMAIL]
'anshengme.com@gmail.com'

方案2

使用標准庫中的collections.namedtuple替代內置的tuple

>>>from collections import namedtuple
>>>Student = namedtuple('Student', ['name','age','sex','email'])
>>>s = Student('ansheng', 20, 'Schoolboy', 'anshengme.com@gmail.com')
>>>s
Student(name='ansheng', age=20, sex='Schoolboy', email='anshengme.com@gmail.com')
# 使用屬性進行訪問
>>>s.name
'ansheng'
>>>s.age
20
>>>s.sex
'Schoolboy'

s是tuple的一個子類

>>>isinstance(s, tuple)
True

如何統計序列中元素的出現頻度？

實際問題

某隨機的列表中，找出出現次數最高的三個元素，他們的出現次數是多少？
對某英文文章的單詞進行詞頻統計，找出出現次數最高的10個單詞，他們出現的次數是多少？

解決方案

使用collections.Counter對象，將序列傳入Counter的構造器，得到Counter對象是元素頻度的字典，Counter.most_common(n)方法得到頻度最高的n個元素的列表

實例

常規的解決方法

生成隨機序列的列表

>>>from random import randint
>>>li = [randint(0, 20) for _ in range(30)]
>>>li
[14, 11, 10, 13, 19, 10, 3, 17, 12, 13, 18, 5, 10, 1, 9, 17, 1, 8, 3, 15, 8, 3, 20, 10, 9, 20, 6, 13, 8, 20]

以字典的形式創建每個數字出現的次數，

# 默認的出現次數為0
>>>count = dict.fromkeys(li, 0)
>>>count
{1: 0, 3: 0, 5: 0, 6: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 17: 0, 18: 0, 19: 0, 20: 0}

每遇到一個x(列表中的數)，就去字典中讓值+1

>>>for x in li:
...  count[x] += 1
...
>>>count
{1: 2, 3: 3, 5: 1, 6: 1, 8: 3, 9: 2, 10: 4, 11: 1, 12: 1, 13: 3, 14: 1, 15: 1, 17: 2, 18: 1, 19: 1, 20: 3}

然後循環count找到最大的三個數字，取出來就好。

使用collections.Counter對象

# 導入Counter
>>>from collections import Counter

>>>count = Counter(li)
>>>count
Counter({10: 4, 3: 3, 8: 3, 13: 3, 20: 3, 1: 2, 9: 2, 17: 2, 5: 1, 6: 1, 11: 1, 12: 1, 14: 1, 15: 1, 18: 1, 19: 1})
>>>count.most_common(3)
[(10, 4), (3, 3), (8, 3)]

英文文章詞頻統計實例

>>>from collections import Counter
>>>import re
>>>txt = open('jquery.cookie.js').read()
>>>count = Counter(re.split('\W+', txt))
>>>count
Counter({'s': 20, 'cookie': 18, 'options': 16, 'value': 12, 'function': 11, 'key': 10, 'var': 10, 'return': 9, 'expires': 8, 'if': 8, 'config': 8, 't': 6, 'result': 5, 'the': 5, 'a': 5, 'it': 5, '1': 4, 'decode': 4, 'i': 4, 'converter': 4, 'factory': 4, 'undefined': 4, 'cookies': 4, 'read': 3, 'domain': 3, 'g': 3, 'encode': 3, 'raw': 3, 'path': 3, 'name': 3, 'replace': 3, 'typeof': 3, 'parts': 3, 'we': 3, 'define': 3, 'document': 3, 'is': 3, 'pluses': 3, 'jquery': 3, 'If': 3, '': 2, 'else': 2, 'for': 2, 'object': 2, 'can': 2, 'json': 2, 'join': 2, 'days': 2, 'not': 2, 'jQuery': 2, 'in': 2, 'l': 2, 'parse': 2, 'ignore': 2, 'split': 2, 'isFunction': 2, 'unusable': 2, 'stringifyCookieValue': 2, 'secure': 2, 'extend': 2, 'decodeURIComponent': 2, 'parseCookieValue': 2, 'JSON': 2, '0': 2, 'defaults': 2, 'extending': 1, 'storing': 1, 'Copyright': 1, 'thus': 1, 'use': 1, 'place': 1, 'require': 1, 'max': 1, 'length': 1, 'according': 1, 'AMD': 1, 'carhartl': 1, 'first': 1, 'globals': 1, 'catch': 1, 'https': 1, 'are': 1, 'try': 1, 'Write': 1, 'spaces': 1, 'written': 1, 'array': 1, 'age': 1, 'supported': 1, 'attribute': 1, 'under': 1, 'exports': 1, 'that': 1, 'Klaus': 1, 'RFC2068': 1, 'Replace': 1, 'as': 1, 'shift': 1, 'Prevent': 1, '864e': 1, 'slice': 1, 'prevents': 1, 'stringify': 1, 'loop': 1, 'e': 1, 'at': 1, 'v1': 1, '5': 1, 'com': 1, 'when': 1, 'toUTCString': 1, 'setTime': 1, 'CommonJS': 1, 'false': 1, 'amd': 1, 'Plugin': 1, 'quoted': 1, 'couldn': 1, 'an': 1, 'no': 1, 'github': 1, 'argument': 1, 'Must': 1, 'license': 1, 'second': 1, 'break': 1, 'Browser': 1, 'IE': 1, 'Date': 1, 'to': 1, 'prevent': 1, 'To': 1, 'Released': 1, 'by': 1, 'with': 1, 'Cookie': 1, 'Also': 1, 'indexOf': 1, 'there': 1, 'side': 1, 'MIT': 1, 'odd': 1, 'case': 1, 'number': 1, 'encodeURIComponent': 1, 'calling': 1, 'Hartl': 1, 'unescape': 1, 'all': 1, 'removeCookie': 1, 'Read': 1, 'new': 1, 'assign': 1, 'fresh': 1, 'server': 1, '2013': 1, 'String': 1, 'empty': 1, '4': 1, 'This': 1, 'alter': 1})
>>>count.most_common(10)
[('s', 20), ('cookie', 18), ('options', 16), ('value', 12), ('function', 11), ('key', 10), ('var', 10), ('return', 9), ('expires', 8), ('if', 8)]

如何根據字典中值的大小, 對字典中的項排序？

實際問題

某班英語成績以字典形式進行存儲，格式為：

{
   'ansheng': 79,
   'Jim': 66,
   'Hello': 99,
   ...
}

要求根據成績高低，計算學生排名。

解決方案

使用內置函數sorted()，但是默認情況下sorted()並不能對字典進行排序，這裡提供了兩種解決方法：

利用zip()將字典數據轉化為元組然後把值傳給sorted()進行排序
傳遞sorted()函數的key參數

實例

先創建一個成績單：

>>>from random import randint
>>>Transcripts = { x: randint(60,100) for x in 'xyzabc' }
>>>Transcripts
{'z': 61, 'x': 74, 'b': 81, 'c': 65, 'y': 88, 'a': 98}

使用sorted()進行排序的時候是以字典的key進行的

>>>sorted(Transcripts)
['a', 'b', 'c', 'x', 'y', 'z']

第一種解決方法

獲取字典的所有建

>>>Transcripts.keys()
dict_keys(['z', 'x', 'b', 'c', 'y', 'a'])

獲取字典所有的值

>>>Transcripts.values()
dict_values([61, 74, 81, 65, 88, 98])

通過zip()把字典轉換為元組

>>>T = zip(Transcripts.values(), Transcripts.keys())

通過sorted()進行排序得到結果

>>>sorted(T)
[(61, 'z'), (65, 'c'), (74, 'x'), (81, 'b'), (88, 'y'), (98, 'a')]

元組在進行比較的時候是先從第一個元素進行比較，如果比較值為True，則後面的就不進行比較：

>>>(100, 'a') > (50, 'b')
True
>>>(50, 'a') > (50, 'b')
# a不大於b，返回False
False

第二種解決方法

>>>Transcripts.items()
dict_items([('z', 61), ('x', 74), ('b', 81), ('c', 65), ('y', 88), ('a', 98)])
# key需要傳入一個函數，每次迭代Transcripts.items()的時候，把第一個元素傳入進去，然後進行排序
>>>sorted(Transcripts.items(), key=lambda x: x[1])
[('z', 61), ('c', 65), ('x', 74), ('b', 81), ('y', 88), ('a', 98)]

如何快速找到多個字典中的公共鍵(key)？

在每一個字典中都會出現的鍵稱之為公共鍵。

>>>from random import randint, sample
# abcdefg是隨機產生的key，每次隨機取3-6個key
>>>sample('abcdefg', randint(3, 6))
['g', 'f', 'c', 'a', 'e', 'd']

生成隨機的字典

>>>s1 = { x: randint(1, 4) for x in sample('abcdefg', randint(3, 6)) }
>>>s2 = { x: randint(1, 4) for x in sample('abcdefg', randint(3, 6)) }
>>>s3 = { x: randint(1, 4) for x in sample('abcdefg', randint(3, 6)) }
>>>s1
{'c': 3, 'g': 1, 'e': 2}
>>>s2
{'a': 2, 'f': 2, 'g': 3, 'e': 1, 'd': 2}
>>>s3
{'a': 2, 'c': 2, 'e': 2, 'f': 4}

解決方案

傳統的做法如下:

# 生成一個列表
>>>res = []
# 循環s1字典的所有鍵
>>>for k in s1:
# 如果鍵在s2和s3中都存在就添加到res列表中
...   if k in s2 and k in s3:
...     res.append(k)
...
>>>res
# 得到的鍵e，在s2和s3中都存在
['e']

利用集合(set)的交集操作

使用字典的keys()方法，得到一個字典的keys的集合

>>> s1.keys() & s2.keys() & s3.keys()
{'e'}

使用map函數，得到所有字典的keys的集合，然後使用reduce函數，取所有字典的keys的集合的交集

>>>from functools import reduce
>>>reduce(lambda a, b: a & b, map(dict.keys, [s1, s2, s3]))
{'e'}

如何讓字典保持有序？

解決方案

使用collections.OrderedDict，以OrderedDict替代內置字典Dict，依次將數據存入OrderedDict

# 導入OrderedDict模塊
>>>from collections import OrderedDict

# 創建一個OrderedDict()對象
>>>dic = OrderedDict()
# 添加數據
>>>dic['H1'] = ('a', 'b')
>>>dic['H2'] = ('ab', 'cd')
>>>dic['H3'] = ('ww', 'ss')
# 變量字典中的數據
>>>for n in dic: print(n)
...
H1
H2
H3
>>>for n in dic: print(n)
...
H1
H2
H3

如何實現用戶的歷史記錄功能(最多n條)？

解決方案

可以使用容量為n的隊列存儲歷史紀錄，使用標准庫collections中的deque，它是一個雙端循環隊列。

如果要保存到文件中，可以在程序堆出前，可以使用pickle將隊列對象存入文件，再次運行程序時將其導入。

小例子

制作一個簡單的猜數字小游戲，添加歷史紀錄的功能，顯示用戶最近猜過的數字，限定最近5條。

# 導入deque模塊
>>>from collections import deque
# 創建一個隊列，容量初始值為空，最多放5個值
>>>q = deque([], 5)
>>>q
deque([], maxlen=5)
>>>q.append(1)
>>>q.append(2)
>>>q.append(3)
>>>q.append(4)
>>>q.append(5)
>>>q
deque([1, 2, 3, 4, 5], maxlen=5)
# 當超過5個值的時候，第一個就會被擠出去
>>>q.append(6)
>>>q
deque([2, 3, 4, 5, 6], maxlen=5)

實例的腳本文件為：

from random import randint
from collections import deque

# 隨機生成1-100的數字
N = randint(1, 100)
# 創建一個隊列histort，默認為空，最大限度值為5個
histort = deque([], 5)


def guess(k):
    # 數字猜對
    if k == N:
        print('right')
        return True
    if k < N:
        print('%s is less-than N' % (k))
    else:
        print('%s is greater-than N' % (k))
    return False


while True:
    line = input('please input a number: ')
    if line.isdigit():
        k = int(line)
        histort.append(k)
        if guess(k):
            break
    # 如果輸入'history'就輸出隊列中的內容
    elif line == 'history':
        print(list(histort))

演示截圖

Ubuntu 14.04安裝Python 3.3.5 http://www.linuxidc.com/Linux/2014-05/101481.htm

CentOS上源碼安裝Python3.4 http://www.linuxidc.com/Linux/2015-01/111870.htm

《Python核心編程第二版》.(Wesley J. Chun ).[高清PDF中文版] http://www.linuxidc.com/Linux/2013-06/85425.htm

《Python開發技術詳解》.( 周偉,宗傑).[高清PDF掃描版+隨書視頻+代碼] http://www.linuxidc.com/Linux/2013-11/92693.htm

Python腳本獲取Linux系統信息 http://www.linuxidc.com/Linux/2013-08/88531.htm

在Ubuntu下用Python搭建桌面算法交易研究環境 http://www.linuxidc.com/Linux/2013-11/92534.htm

Python 語言的發展簡史 http://www.linuxidc.com/Linux/2014-09/107206.htm

Python 的詳細介紹：請點這裡
Python 的下載地址：請點這裡

上一篇文章：高質量equals方法和HashCode
下一篇文章： Python進階強化訓練之字符串處理技巧

Linux編程

第五章 Python數據結構

Python常見數據結構整理

Python之函數進階

Python進階強化訓練之字符串處理技巧

Python高級進階：細說Python淺拷貝和深拷貝

Python 的3大內置數據結構

Linux C 算法與數據結構

Linux　c　算法與數據結構－－棧

相關文章

Python入門(一)----什麼是python?python及模塊的安裝

二叉樹數據結構的實現

JavaScript進階：數據類型

數據結構--堆的實現(下)

數據結構--堆的實現(上)

Python的數據類型總結

Python復雜數據類型

[Python]遞歸算法時間復雜度----漢諾塔

[Python]遞歸算法時間復雜度

Python算法之插入排序

C++數據結構之二叉樹

Python:實現一個小算法

Linux編程

SHELL編程

PERL編程