Monday, December 30, 2013

How to compute the most frequently occurring element in a list in Python

Python has a module named "collections" which provides specialized container types that can be used as alternatives to the built-in dict, list, etc.

One of them is Counter, a dict subclass that tallies the number of occurrences of each object in a list (or any other iterable). Here is how you can use it to compute the mode, i.e. the most frequently occurring element in a list.

%First import the Counter tool from the collections module
>>> from collections import Counter

%Build a counter from a list. The list can contain numbers or strings (any hashable objects). For example, with a list of numbers:
>>> info = Counter([1, 2, 1, 1, 1, 2])

%To see the frequency of each element, call the most_common method with no arguments. It returns a list of (element, frequency) tuples sorted by frequency in descending order.
>>> info.most_common()
[(1, 4), (2, 2)]

%If you only want the element with the highest frequency, pass the argument 1
>>> info.most_common(1)
[(1, 4)]             %Here 1 is the element and 4 is the frequency of '1'

You can access these elements like this:
>>> info.most_common(1)[0][0]
1
>>> info.most_common(1)[0][1]
4
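
The indexing above can be wrapped in a small helper. This is only a sketch: the name most_frequent is hypothetical (it is not part of the collections module), and returning None for an empty list is an assumption about what you would want in that case.

```python
from collections import Counter

def most_frequent(items):
    """Hypothetical helper: return (element, count) for the most common
    element, or None if items is empty."""
    top = Counter(items).most_common(1)
    # most_common(1) returns an empty list when the counter is empty
    return top[0] if top else None

element, count = most_frequent([1, 2, 1, 1, 1, 2])
print(element, count)  # 1 4
```

Returning the tuple as-is lets the caller unpack the element and its frequency in one step instead of indexing twice.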

%If you want the top 2 elements with their frequencies, pass the argument 2
>>> info.most_common(2)
[(1, 4), (2, 2)]

To get the number of unique elements, use len:
>>> len(info)
2
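
Since Counter is a dict subclass, you can also look up the frequency of a single element directly. A short sketch, reusing the same list of numbers (the value 99 is just an arbitrary element that is not in the list):

```python
from collections import Counter

info = Counter([1, 2, 1, 1, 1, 2])

print(info[1])             # 4
print(info[2])             # 2
print(info[99])            # 0, missing elements count as zero (no KeyError)
print(sum(info.values()))  # 6, the total number of items tallied
```

Looking up a missing element does not add it to the counter, so len(info) is still 2 afterwards.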

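Similar operations on a list of strings (elements with equal frequencies, such as 'b' and 'c' here, may come out in a different order depending on your Python version):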

>>> info = Counter(["a","b","c","a","b","c","a","a"])

>>> info.most_common()
[('a', 4), ('c', 2), ('b', 2)]
>>> info.most_common(1)
[('a', 4)]
>>> info.most_common(2)
[('a', 4), ('c', 2)]
>>> info.most_common(3)
[('a', 4), ('c', 2), ('b', 2)]
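
Note that 'b' and 'c' are tied here, and most_common(2) returns only one of them. If you need every element that shares the highest frequency, one possible approach is sketched below. The function name all_modes is hypothetical, not something provided by collections.

```python
from collections import Counter

def all_modes(items):
    """Hypothetical helper: return every element that has the highest frequency."""
    counts = Counter(items)
    if not counts:
        return []  # an empty input has no mode
    highest = max(counts.values())
    # keep each element whose count equals the highest count
    return [element for element, count in counts.items() if count == highest]

print(all_modes(["a", "b", "c", "a", "b", "c", "a", "a"]))  # ['a']
print(sorted(all_modes(["a", "b", "a", "b", "c"])))         # ['a', 'b']
```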

%The interesting thing is that even if you pass an argument greater than the length of the counter, it does not raise an error. It simply returns all the elements. So, if your code relies on getting exactly that many results back, it is a good idea to check the length of the counter first.

>>> info.most_common(4)
[('a', 4), ('c', 2), ('b', 2)]

>>> len(info)
3
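
The length check could look something like this. It is a minimal sketch: the helper name top_n is hypothetical, and raising ValueError is just one way to handle the case (you might prefer to return the shorter list instead).

```python
from collections import Counter

def top_n(items, n):
    """Hypothetical helper: return the n most common (element, count) pairs.

    Raises ValueError when there are fewer than n unique elements,
    instead of silently returning a shorter list.
    """
    counts = Counter(items)
    if n > len(counts):
        raise ValueError("asked for %d elements, only %d unique" % (n, len(counts)))
    return counts.most_common(n)

letters = ["a", "b", "c", "a", "b", "c", "a", "a"]
print(top_n(letters, 1))  # [('a', 4)]

try:
    top_n(letters, 4)
except ValueError as err:
    print("Error:", err)
```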

Hope this is useful.
