A Complete Overview of Python itertools groupby() Method

So you want to learn about the itertools.groupby() function? Then you are in the right place. groupby() is part of Python’s itertools module, a collection of tools for working with iterators. Together, these tools form what the Python documentation calls an “iterator algebra”: small building blocks that can be combined into more specialised tools. Used well, they keep your code concise and, above all, more Pythonic.

Why use itertools at all? Its functions produce their results lazily, one item at a time, so they are usually fast and memory-efficient compared with building intermediate lists by hand.

The itertools functions fall into the following categories:

  • Infinite iterators
  • Terminating iterators
  • Combinatoric iterators

Although the itertools library contains many valuable functions, this article focuses on itertools.groupby(), which belongs to the terminating iterators. The other functions are described in the official itertools documentation.

Decoding itertools.groupby()

The groupby() function goes through an iterable and groups consecutive values that share the same key. It returns an iterator that yields a stream of tuples.

The first value of each tuple is the key on which the items of the iterable were grouped. The second value is an iterator over all the items grouped under that key.
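
As a minimal sketch of that structure (the input 'aab' and the variable names are illustrative, not from the article), the tuples can be pulled out one at a time with next():

```python
from itertools import groupby

grouped = groupby('aab')  # illustrative input

key, group = next(grouped)    # first (key, group) tuple
first_items = list(group)     # consume the group iterator
print(key, first_items)       # a ['a', 'a']

key2, group2 = next(grouped)  # second (key, group) tuple
second_items = list(group2)
print(key2, second_items)     # b ['b']
```

Each group is read before asking for the next tuple, which is the order groupby() is designed for.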

If you feel that it was a lot to take in, don’t worry; we will make it lucid using some concise examples.

Importing itertools.groupby()

# want to use it like itertools.groupby()
import itertools

# want to use directly
from itertools import groupby

Syntax

itertools.groupby(iterable, key=None)

Parameters of itertools.groupby()

  1. iterable: An iterable is any object that can produce an iterator. Common Python iterables include lists, tuples, strings, and dictionaries. itertools.groupby() groups the elements of this iterable.
  2. key: A function that computes the grouping key for each element. If key is not specified or is None, the element itself is used as the key.

Note: itertools.groupby() only collects together contiguous items with the same key. In practice, this means you should usually sort the iterable by the same key function first; otherwise the same key can show up in several separate groups.
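
To illustrate the two parameters, here is a small sketch that uses the built-in len as the key function. The word list is made up for the example and is already ordered by length:

```python
from itertools import groupby

words = ['ant', 'bee', 'crab', 'deer']  # illustrative, already ordered by length

# key=len groups neighbouring words of equal length
by_length = [(length, list(group)) for length, group in groupby(words, key=len)]
print(by_length)  # [(3, ['ant', 'bee']), (4, ['crab', 'deer'])]

# With key=None (the default), each element is its own key
by_value = [(value, list(group)) for value, group in groupby([1, 1, 2], key=None)]
print(by_value)  # [(1, [1, 1]), (2, [2])]
```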
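
The effect of contiguity is easy to see on a deliberately unsorted input. In this sketch (the string 'aabba' is illustrative), the key 'a' appears twice until the data is sorted:

```python
from itertools import groupby

letters = 'aabba'  # illustrative, deliberately unsorted

unsorted_keys = [key for key, _ in groupby(letters)]
sorted_keys = [key for key, _ in groupby(sorted(letters))]

print(unsorted_keys)  # ['a', 'b', 'a']
print(sorted_keys)    # ['a', 'b']
```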

Return type

itertools.groupby() returns an iterator that yields (key, group) tuples, where each group is itself an iterator.
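
One consequence worth knowing: the group iterators share the underlying iterable with groupby() itself, so a group should be consumed (or copied into a list) before moving on to the next one. The sketch below, with illustrative data and variable names, contrasts reading groups late with reading them while they are current:

```python
from itertools import groupby

data = 'aabbb'  # illustrative input

# Storing the group iterators and reading them later loses items,
# because advancing groupby() moves past each earlier group.
stale = [(key, group) for key, group in groupby(data)]
stale_contents = [(key, list(group)) for key, group in stale]

# Materialising each group while it is current keeps the items.
fresh_contents = [(key, list(group)) for key, group in groupby(data)]
print(fresh_contents)  # [('a', ['a', 'a']), ('b', ['b', 'b', 'b'])]
```

If the grouped data is needed more than once, storing list(group) as shown in fresh_contents is the safer pattern.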


Complexity

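itertools.groupby() runs in O(n) time, since it makes a single pass over the iterable (assuming the key function itself takes constant time). If you sort the data first, the sort adds O(n log n).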

Examples for itertools.groupby()

Example 1:

Let’s start with the most basic example.

import itertools

string = 'aaaabbbbbbcccdddddd'
string_tuple = itertools.groupby(string)
print(string_tuple, type(string_tuple))

Let’s go through the above code:

  • In the above code, the string variable is our iterable. We have chosen a string that is already sorted.
  • Passing string to itertools.groupby() gives us an iterator, which we store in the string_tuple variable.
Figure: the iterator created by itertools.groupby()
for item in string_tuple:
    print(item)
  • Iterating over string_tuple yields tuples. In each tuple, the first value is the key, and the second value is an iterator over the elements of the iterable that share that key.
Figure: tuples returned when iterating over the string_tuple iterator
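# Re-create the iterator: the previous loop has already exhausted it
string_tuple = itertools.groupby(string)
for key, iter_item in string_tuple:
    print(f"Key:{key}")
    for item in iter_item:
        print(item, end=" ")
    print()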
  • Notice in the output how the elements of the iterable string = ‘aaaabbbbbbcccdddddd’ have been grouped together, with one group each for ‘a’, ‘b’, ‘c’ and ‘d’.
Figure: iterable elements grouped together by the groupby method
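
A compact way to summarise the grouping from Example 1 is to pair each key with the size of its group, a simple run-length encoding. This is only a sketch; the names run_lengths and rebuilt are illustrative:

```python
from itertools import groupby

string = 'aaaabbbbbbcccdddddd'

# Pair each key with the number of items in its group
run_lengths = [(key, sum(1 for _ in group)) for key, group in groupby(string)]
print(run_lengths)

# The original string can be rebuilt from the (key, count) pairs
rebuilt = ''.join(key * count for key, count in run_lengths)
print(rebuilt == string)  # True
```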

Example 2:

Let’s take another example: grouping anagrams together. (An anagram is a word formed by rearranging the characters of another word; for example, ‘cat’ and ‘act’ are anagrams.)

import itertools

anagrams = ['angel', 'below', 'glean', 'bored', 'robed', 'study', 'dusty', 'cat', 'act', 'inch', 'chin', 'taste', 'state', 'elbow']
# Sort the words by their sorted letters so anagrams are adjacent, then group on the same key
grouped_anagrams = [list(group) for key, group in itertools.groupby(sorted(anagrams, key=sorted), key=sorted)]
print(grouped_anagrams)
Figure: output of the grouped anagrams using itertools.groupby()

Let’s break down the code provided above:

  • We have seven pairs of anagrams stored in the variable named anagrams.
  • Using a list comprehension, we save the list of grouped anagrams in the variable named grouped_anagrams.
  • The inner call, sorted(anagrams, key=sorted), orders the words by their sorted letters, which places each word next to its anagram counterpart.
  • For instance, look at the image below:
Figure: sorted(anagrams, key=sorted) places the pairs of anagrams next to each other
  • sorted is passed again as the key function of groupby, where it computes the key each word is grouped on.
  • To put it differently, ‘cat’ and ‘act’ produce the same key, [‘a’, ‘c’, ‘t’]. Hence, they are grouped together.
  • Similarly, groupby pairs up the remaining six pairs of anagrams.
  • The image below makes this clearer.
Figure: example of the anagram grouping
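
To make the keys visible, the sketch below stores the groups in a dictionary. Because the list returned by sorted() cannot be used as a dictionary key, it uses a hypothetical helper, signature(), that joins the sorted letters into a string. Only a subset of the article’s word list is used:

```python
from itertools import groupby

words = ['cat', 'act', 'inch', 'chin']  # subset of the article's list

def signature(word):
    # Hypothetical helper: the word's letters in sorted order, as a hashable string
    return ''.join(sorted(word))

groups = {
    key: list(group)
    for key, group in groupby(sorted(words, key=signature), key=signature)
}
print(groups)  # {'act': ['cat', 'act'], 'chin': ['inch', 'chin']}
```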

Real-world example of groupby

students = [
	{
		'name':'Teri Howard',
		'state':'CA'
	},
	{
		'name':'Stephen Reyes',
		'state':'CA'
	},
	{
		'name':'Thalia Franklin',
		'state':'CA'
	},
	{
		'name':'Yvonne Slater',
		'state':'Tx'
	},
	{
		'name':'Rolf Wilcher',
		'state':'Tx'
	},
	{
		'name':'Teri Dinwiddie',
		'state':'MS'
	},
	{
		'name':'Fred Greer',
		'state':'AL'
	},
	{
		'name':'Lane Snee',
		'state':'AL'
	}
]

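import itertools

def get_state(student):
    # Key function: return the state of a single student record
    return student['state']

# The records above are already ordered so that equal states are adjacent
grouped_students = itertools.groupby(students, get_state)

for state, group in grouped_students:
    print(f"State: {state}")
    for student in group:
        print(f"Students: {student}", end=" ")
    print()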
  • Suppose you are given a large set of student details in dictionary/JSON format, and your task is to group the students by some key (here, the state). Observe the output below:
Figure: students grouped by using the state as the key
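
When the records are not already ordered by state, one possible variation is to sort them first and collect just the names per state. This sketch uses operator.itemgetter as the key function instead of a hand-written one, and a shortened, reordered sample of the records:

```python
from itertools import groupby
from operator import itemgetter

# Shortened, illustrative records in the same shape as the data above
students = [
    {'name': 'Teri Howard', 'state': 'CA'},
    {'name': 'Fred Greer', 'state': 'AL'},
    {'name': 'Stephen Reyes', 'state': 'CA'},
]

by_state = itemgetter('state')  # same role as a get_state() function
names_by_state = {
    state: [s['name'] for s in group]
    for state, group in groupby(sorted(students, key=by_state), key=by_state)
}
print(names_by_state)
# {'AL': ['Fred Greer'], 'CA': ['Teri Howard', 'Stephen Reyes']}
```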

itertools.groupby() on a 2-D array

import itertools

key_func = lambda x: x[0]
organisms = [['land', 'lion'], ['aquatic', 'shark'], ['air', 'eagle'], ['land', 'bear'], ['land', 'monkey'], ['aquatic', 'octopus']]

# Sort by the first element so that rows with the same key are adjacent
for key, group in itertools.groupby(sorted(organisms, key=key_func), key_func):
    print('{}: {}'.format(key, [i[1] for i in group]))
Figure: groupby method on a 2-D array

FAQs on itertools.groupby()

Q1. Why is itertools.groupby() not grouping correctly?

itertools.groupby() only collects together contiguous items with the same key. If equal keys are scattered through the iterable, each run becomes its own group. Sorting the iterable by the same key function before grouping avoids this.

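Q2. SQL’s GROUP BY vs Python’s groupby

A significant difference between the two is that SQL’s GROUP BY collects all rows with the same key regardless of their order, so no sorting is required. Python’s groupby only groups consecutive items, so the data normally has to be sorted by the grouping key first. Without sorting, no error is raised; the same key simply appears in several separate groups.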
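
If order-independent grouping in the SQL style is what you need and sorting is undesirable, a plain dictionary is a common alternative to groupby. The sketch below uses collections.defaultdict rather than itertools, with illustrative rows:

```python
from collections import defaultdict

# Illustrative rows, deliberately not sorted by the first column
rows = [('land', 'lion'), ('aquatic', 'shark'), ('land', 'bear')]

grouped = defaultdict(list)
for habitat, animal in rows:
    # Every row with the same first value lands in the same list
    grouped[habitat].append(animal)

print(dict(grouped))  # {'land': ['lion', 'bear'], 'aquatic': ['shark']}
```

The trade-off is that this builds every group in memory, whereas groupby yields groups lazily.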

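Q3. Error: itertools.groupby() has no len()

itertools.groupby() returns an iterator, that is, a lazy stream of tuples. len() only works on objects that know their size, such as lists, tuples, and strings, so calling it on the iterator raises a TypeError. You can convert the iterator to a list and take its length, but that holds every tuple in memory at once. To avoid that, initialise a counter variable and increment it in a loop over the iterator. Either way, counting consumes the iterator, so re-create it if you need the groups afterwards.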
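
A short way to write that counter is a generator expression passed to sum(). This is a sketch; group_count is an illustrative name:

```python
from itertools import groupby

string = 'aaaabbbbbbcccdddddd'

# Count the groups lazily; len(groupby(string)) would raise a TypeError
group_count = sum(1 for _ in groupby(string))
print(group_count)  # 4
```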

Conclusion

itertools.groupby() has its own niche: grouping consecutive items of an iterable that share a key. We discussed the itertools library, the groupby function, its syntax, and its parameters. We then looked at some examples, including groupby on a 2-D array, to consolidate our understanding of the topic.
