Achieving String Compression in Python


Hello geeks! Today we are here with another module of string. The topic of discussion will be String compression using the python programming language. String compression is crucial for in-memory database management systems (IMDBMS) to reduce memory consumption and processing time. It has many applications, such as shortening the URL and messages as well to save space. Let’s have a detailed discussion on string compression.

What do you mean by String compression?

String compression in python basically means to shorten any string whose length is very long. Compressing the string will never change the original intent of the string. Let me give you an example for a better understanding.

Let us suppose we have an URL of an image .


As you can see, the length of the URL is very long. So to shorten this URL, we will use string compression. By compressing the URL, the length of the URL changes, but the URL which you will get after shortening will take you to the same image if you paste that URL to google.

string compression python
String Compression

Why do we need string compression?

The main aim of string compression in python is to save the memory as much as possible. This is because consumption of memory will lead to requiring more resources which are ultimately very expensive.

Nowadays, the world wants speed in whichever task they are searching for. The compressed data or the string will take less processing time and will deliver the output as quickly as possible.

It also features fast read operations, i.e., if a message is compressed, it will take less time for a user to read.

Thus, String compression will reduce the consumption of memory and the processing time, and the user’s time to read a message.

Until now, you must have understood the importance of string compression and its use when it comes to real-life problems.

Algorithm for string compression in python

Let us consider the following example

  • ‘AAABAACCDDDD’ —> ‘A3B1A2C2D4’
  • ‘BAAACCDDDD’ —> ‘B1A3C2D4’
  • ‘AABBCC’ –> ‘A2B2C2’
  • ‘BBBAACCCCDD’ –> ‘B3A2C4D2
  1. Pick the first character from the input string (str).
  2. Append it to the compressed string.
  3. Count the number of subsequent occurrences of the character (in str) and append the count to the compressed string if it is more than 1 only​.
  4. Then, pick the next character and repeat the steps above until the end of str is reached.

Code in Python Without Using Any Library

def compress(string):
    index = 0
    compressed = ""
    len_str = len(string)
    while index != len_str:
        count = 1
        while (index < len_str-1) and (string[index] == string[index+1]):
            count = count + 1
            index = index + 1
        if count == 1:
            compressed += str(string[index])
            compressed += str(string[index]) + str(count)
        index = index + 1
    return compressed

string = "pythooonnnpool"
OUTPUT:- pytho3n3po2l

Time complexity

The time complexity of this code is O(n).

Code in Python Using itertools Library

from itertools import takewhile, dropwhile

def compress(s):
    if not s:
        return ""
    return str(len(list(takewhile(lambda c: c == s[0], s)))) + s[0] + compress("".join(list(dropwhile(lambda c: c == s[0], s))))

OUTPUT:-  1p1y1t1h3o3n1p2o1l

Explanation of the code

Lets look at each function used in the code above

  1. Right after return you can see Python variation of ternary operator – [value_true] if [condition] else [value_false].
    So line return ” if not s is identical to the guardian clause in the first algorithm. The else return value is implementing the rest of the logic.
  2. The loop is implemented as takewhile(lambda c: c == s[0], s)
    This will iterate over characters of the string argument for as long as the letter equals the first letter of the string argument (s[0]). I used “s” to make it shorter on the screen. However, “takewhile” returns a generator. Therefore we need the next function
  3. And the next function in this chain is the list([generator]). The generator returns one item at a time, and the list() function gets all items out of it.
    However, it returns a list, so string “AAA” will become [‘A’, ‘A’, ‘A’]
  4. Using the len([list]) function, we get the number of elements found in this substring. Combining it with the first element of the argument, we get “3A” out of [‘A’, ‘A’, ‘A’]. And this creates a “head” of every recursive call. Now we need to create a tail.
  5. The tail is created by using dropwhile(lambda c: c == s[0], s) function, which drops the number of elements already taken by the “head”.
  6. Since dropwhile is also a generator, we use list() as well.
  7. But we cannot pass a list with the recursive call, so “”.join() function will join elements of the list into a string, and the resulting string will be passed as a new argument into the recursive call.
  8. The recursion will stop when we take all characters out of the string and pass in an empty string.

Using simple Loop

new_string = ""
string = "pythooonnnpool"
count = 1
for i in range(len(string)-1):
    if string[i] == string[i+1]:
        count = count + 1
        new_string =  new_string + string[i] + str(count)
        count = 1 
new_string = new_string + string[i+1] + str(count)
Enter the string:p1y1t1h1o3n3p1o2l1

Also, Read


1. How to use zlib for performing string compression?

Ans :-

import zlib
text = b"Python Pool!" * 100
compressed = zlib.compress(text)

print("size of original: " + str(len(text)))
print("size of compressed: " + str(len(compressed)))

decompressed = zlib.decompress(compressed)
print("size of decompressed: " + str(len(decompressed)))
size of original: 1300
size of compressed: 32


I hope you have understood today’s detailed tutorial on string compression using python. We have discussed the importance of string compression in real life and why it is important. We also understood the algorithm that is supposed to be used and a detailed explanation of the code using the library and without using the library.

If you have any confusion or doubt feel free to comment down below. Practice some questions on string compression to gain confidence. Keep pythonning geeks!

Notify of
Newest Most Voted
Inline Feedbacks
View all comments
2 years ago

It says “‘i’ is not defined” for the Simple Loop compression program when trying to compress 1 character.

Pratik Kinage
2 years ago
Reply to  James

Can you try using a different code? Practically it’s hard to compress 1 character unless you have zero-width space characters. Let me know if you still need help with anything.


1 year ago

how can i make my for loop do it but in a way where it does say 1 for example abbc compressed would be ab2c

Pratik Kinage
1 year ago
Reply to  bob

Following code will do –

new_string = ""
string = "pythooonnnpool"
count = 1
for i in range(len(string)-1):
if string[i] == string[i+1]:
count = count + 1
new_string = new_string + string[i] + (str(count) if count > 1 else "")
count = 1
new_string = new_string + string[i+1] + (str(count) if count > 1 else "")

Last edited 1 year ago by Pratik Kinage