TimSort: Algorithm and Implementation in Python

Hello coders!! In this article, we will learn about the TimSort algorithm and learn its implementation in Python. Tim Peters created TimSort in the year 2002 to improve the sorting performance of the list. sort() function makes use of this algorithm and is the fastest sorting algorithm.

What is Python TimSort Algorithm?

TimSort is a sorting algorithm that is a hybrid of merge sort and insertion sort algorithm. It is a stable algorithm and works on real-time data. In this, the list that needs to be sorted is first analyzed, and based on that the best approach is selected.

Python TimSort Algorithm:

  1. Divide the array into blocks known as run
  2. The size of a run can either be 32 or 64
  3. Sort the elements of every run using insertion sort
  4. Merge the sorted runs using the merge sort algorithm
  5. Double the size of the merged array after every iteration

Understanding the algorithm in detail:

Calculation of minrun:

We know now that the first step of the Python Timsort algorithm is to divide the blocks into runs. minrun is the minimum length of each run. It is calculated from the number of elements of the array N.

  • It should not be very long as we will implement insertion sort to sort each run and we know that insertion sort works more efficiently for shorter arrays
  • However, it should also not be too short as the next step is to merge these runs, shorter runs will result in more number of runs
  • It is beneficial if N/minrun is a power of 2 as it will result in the best performance by merge sort.

Splitting and sorting of runs:

When we reach this step, we already have two values – N and minrun

  • descending flag is by the comparison between the first 2 items, if there is only 1 item left, then it is set as false.
  • Then the other elements are iterated and checked whether they are still in ascending or “strict” descending order until an item that does not follow this order is found.
  • After this, we will have a run in either ascending or “strict” descending order. If it is in a “strict” descending order we need to reverse it.
  • Then we check to make sure that the length of the run satisfies minrun. If it is smaller, we compensate for the following items to make it achieve the minrun size.

merging of runs:

  • At first, we will create a temporary run having the size of the smallest of the merged runs.
  • After this, we need to copy the shortest run into the temporary one.
  • Now, we will mark the first element of the large and temporary array is as the current position.
  • In each following step, we will compare the elements in the large and temporary arrays, and the smaller ones will be moved into a new sorted array.
  • We need to repeat the above step until one of the arrays runs out.
  • Lastly, we will add All the remaining elements to the end of the new one.

Implementation of Tim’s Algorithm in Python:

MINIMUM= 32
 
def find_minrun(n): 
 
    r = 0
    while n >= MINIMUM: 
        r |= n & 1
        n >>= 1
    return n + r 
 
def insertion_sort(array, left, right): 
    for i in range(left+1,right+1):
        element = array[i]
        j = i-1
        while element<array[j] and j>=left :
            array[j+1] = array[j]
            j -= 1
        array[j+1] = element
    return array
             
def merge(array, l, m, r): 
 
    array_length1= m - l + 1
    array_length2 = r - m 
    left = []
    right = []
    for i in range(0, array_length1): 
        left.append(array[l + i]) 
    for i in range(0, array_length2): 
        right.append(array[m + 1 + i]) 
 
    i=0
    j=0
    k=l
  
    while j < array_length2 and  i < array_length1: 
        if left[i] <= right[j]: 
            array[k] = left[i] 
            i += 1
 
        else: 
            array[k] = right[j] 
            j += 1
 
        k += 1
 
    while i < array_length1: 
        array[k] = left[i] 
        k += 1
        i += 1
 
    while j < array_length2: 
        array[k] = right[j] 
        k += 1
        j += 1
 
def tim_sort(array): 
    n = len(array) 
    minrun = find_minrun(n) 
 
    for start in range(0, n, minrun): 
        end = min(start + minrun - 1, n - 1) 
        insertion_sort(array, start, end) 
  
    size = minrun 
    while size < n: 
 
        for left in range(0, n, 2 * size): 
 
            mid = min(n - 1, left + size - 1) 
            right = min((left + 2 * size - 1), (n - 1)) 
            merge(array, left, mid, right) 
 
        size = 2 * size 
 
 
 
 
array = [-1,5,0,-3,11,9,-2,7,0] 
 
print("Array:") 
print(array) 
 
tim_sort(array) 
 
print("Sorted Array:") 
print(array)

Output:

Output for timsort algorithm obtained by executing the given code
OUTPUT

Complexity Analysis of Python TimSort Algorithm:

  • Worst-case time complexity: O(n log n)
  • Best-case time complexity: O(n)
  • Average-case time complexity: O(n log n)
  • Worst-case space complexity: O(n)

Advantages of Python Timsort:

  • It is a stable sorting algorithm
  • Works for real-time data

Must Read

Conclusion:

That is all about the Python TimSort algorithm. The in-built sort() function of Python uses this algorithm as it provides a stable sorting approach. It is faster than other sorting algorithms and can be directly implemented in python using the sort() method.

Subscribe
Notify of
guest
4 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Richard Whipple
Richard Whipple
2 years ago

Lines 23 and 24 have an extra array_ in the variable name. With this data set this code is never reached.

Pratik Kinage
Admin
2 years ago

Thank you for finding this bug! I’ve fixed it and updated the post accordingly.

Regards,
Pratik

Roo
Roo
2 years ago

Seems to be missing critical points of what makes a TimSort: identification of natural runs, checking invariants for perfectly balanced sub-array merges and galloping function

Jeremy Sanders
Jeremy Sanders
1 year ago

If translating this code into another language, note that it depends on Python’s negative array index feature to run. Line 15 can access index -1 of the array before checking whether it is negative. If your language supports short-circuit expression evaluation, the two conditions should be swapped. An example where this occurs is the input [3, 6, 3, 4, 5, -1].