In this article, we will be discussing the concept of ULIDs and how its implemented in Python.
A ULID is a short form for Universally Uniqueically Sortable Identifier. ULIDs primary goal is to replace UUID (Universally Unique Identifier) while providing the ability to maintain uniqueness. The uniqueness is maintained using the creation time of the Identifier at millisecond precision. Let’s look at some notable features of ULID.
Features of Python ULID
- Provides 128-bit compatibility with its predecessor, UUID.
- Has over 1.21e+24 unique ULIDs per millisecond,
- It is lexicographically sortable.
- They are encoded as a 26-character string type.
- Makes use of Crockford’s base32 for better user readability and efficiency.
- ULIDs are case insensitive.
- It consists of no special characters (Safe for URLs).
- Follows a monotonic sorter.
About the Module
As of Python 3.10, Python ULID is NOT a part of the Python standard library. The package can be installed from PyPi
using PIP
.
$ pip install ulid-py
Structure of a ULID
A ULID is a 128-bit(16 bytes) value consisting of 26 characters. It follows Most Significant Bit first, aka MSB, representing the highest-order place of the binary integer.
Creating and Displaying ULIDS in Different Representations using the .new()
function
A ULID
can be represented in the following ways:
- Integer
- String
- Bytes
- UUID value
Creating a ULID value
Using the .new()
function, we are creating a timestamp object.
import ulid
myValue = ulid.new()
myValue
Output
<ULID('01G3P1A562FB6X1GC3718P8M58')>
ULID As String
import ulid
myValue = ulid.new()
myValue.str
Output
'01G3P1FHX5M52YF4RQQGHEHQN7'
ULID As Integer
import ulid
myValue = ulid.new()
myValue.int
Output
1998630654934113521890591912315554430
ULID as Bytes
import ulid
myValue = ulid.new()
myValue.bytes
Output
b'\x01\x80\xec\x1b7\xfdp\xc4\x11=?L\xb5\xa1a\xdc'
ULID as UUID value
import ulid
myValue = ulid.new()
myValue.uuid
Output
UUID('0180ec1c-2797-7f02-8925-c6c24cbcf161')
Creating and Displaying Timestamps in Different Representations using the .timestamp()
function
The timestamp function is a Unix timestamp that computes the Unix time (seconds from epoch).
Returns - A timestamp in Unix time (seconds from epoch) Return Type - Python float
Creating a Timestamp from our ULID
Using the .timestamp()
function, we are able to create a timestamp of our ULID.
import ulid
myTS = myValue.timestamp()
myTS
Output
<Timestamp('01G3P83TRD')>
Let’s look at the various representations of a timestamp.
Timestamp as a String
import ulid
myTS = myValue.timestamp()
myTS.str
Output
'01G3P83TRD'
Timestamp as an Integer
import ulid
myTS = myValue.timestamp()
myTS.int
Output
1653235378957
Timestamp as Bytes
import ulid
myTS = myValue.timestamp()
myTS.bytes
Output
b'\x01\x80\xec\x81\xeb\r'
Date and Time of the Timestamp
import datetime
import ulid
myTS = myValue.timestamp()
myTS.datetime
Output
datetime.datetime(2022, 05, 23, 3, 22, 26, 80000)
Timestamp of a Timestamp
import ulid
myTS = myValue.timestamp()
myTS.timestamp
Output
5074583216.23
Creating and Displaying Randomness in Different Representations using the .randomness()
function in Python ULID
The randomness class allows us to create instances of 80bit(8 Bytes) and 16 total characters. It’s a random value that is cryptographically secure.
Returns - Timestamp from first 48 bits
Return Type - Timestamp
Creating a Randomness Value from our ULID
With the help of the .randomness()
function, we are able to create a randomness value from Python ULID
myRND= myValue.randomness()
myRND
Output
Randomness('A330BYEQT1G2DPL0')
Randomness as a String
myRND= myValue.randomness()
myRND.str
Output
'A330BYEQT1G2DPL0'
Randomness as Integer
myRND= myValue.randomness()
myRND.int
Output
64157377045193416460102
Randomness as Bytes
myRND= myValue.randomness()
myRND.bytes
Output
w'j\x36\x34\x9f\x18\xd8\xg7&\a4b\l89'
Crockford’s Base32
This form of encoding makes use of 5 bits for each character while gaining an extra bit for each character over hexadecimal. This type of encoding excludes the letters I, L, O, U, 0, and 1 to avoid visual confusion.
Characters Used in Crockford’s Base32
ulid.base32.ENCODING
Output
'0123456789ABCDEFGHJKMNPQRSTVWXYZ'
Crockford’s Base32 encoding is case insensitive and encodes everything to uppercase letters.
The base32 format provides multiple encoding and decoding functions. encode_{knownPart}
or decode_{knownPart}
is used when the data being worked on is known. For unknown data, the encode
decode
functions are used.
Using the byte values of’s timestamp and randomness, we can encode in base32 format.
myValue.bytes
myValue.timestamp().bytes
myValue.randomness().bytes
Output
b'\x01\x80\xec\x1b7\xfdp\xc4\x11=?L\xb5\xa1a\xdc' b'\x01\x80\xec\x81\xeb\r' w'j\x36\x34\x9f\x18\xd8\xg7&\a4b\l89'
Let’s encode the above bytes through base32 using encode_{knownPart}
.
ulid.base32.encode_ulid(myValue.bytes)
ulid.base32.encode_timestamp(myValue.timestamp().bytes)
ulid.base32.encode_randomness(myValue.randomness().bytes)
Output
'D01YES8C4FDP30A9R8T2VADPZP' '91DVEL8C5K' 'G510HKD9T3V7DPZ9'
Now, let’s encode using the encode
function.
ulid.base32.encode(myValue.bytes)
ulid.base32.encode(myValue.timestamp().bytes)
ulid.base32.encode(myValue.randomness().bytes)
Output
'D01YES8C4FDP30A9R8T2VADPZP' '91DVEL8C5K' 'G510HKD9T3V7DPZ9'
We can infer that both methods provide the same encoding output. The only difference between these two functions is a small performance optimization.
Sorting Python ULIDs
The timestamp
is considered to be the first 48 bits of a ULID value. A timestamp
can be lexicographically sorted with millisecond precision.
myULID = ulid.new()
myULID
myULID2 = ulid.new()
myULID2
myULID3 = ulid.from_timestamp(2678249158)
myULID3.timestamp().datetime
myULID<myULID2<myULID3
Output
53SG8F23B3L138LJPXRS90SFD 64GALC312LFVB12TA1VKKAPBR8 63HLN2EBGE7BT0XGJ64G7JWEA datetime.datetime(2033, 09, 22, 3, 1, 34) True
Python UUID String Without Dashes
When creating a Python UUID, you are provided with a UUID object. This allows us to pass the .hex
function to return a string without any dashes.
class myClass(models.Model):
...
...
myVar = models.UUIDField(default=uuid.uuid4().hex, editable=False, unique=True)
...
...
...
)
def __str__(self):
return str(self.myVar.hex)
Is Collision Possible in Python UUID?
Let’s discuss how UUID is generated.
These are the following factors taken into consideration to create a UUID:
- time
- host’s ID
- Randomizing Component
Therefore, if we were to create a UUID at the same time within the same host, the only remaining factor is the randomizing component. The random component is 14 bits which means we have 1 in 16384 instances to have a collision between 2 IDs.
[ERROR] Python UUID
is Not Defined
When creating a UUID model, you may come across the following error.
Cannot successfully create field 'myField' for model 'myModel': name 'UUID' is not defined.
A quick fix is to create a helper function within the model like so:
from uuid import uuid4
def createUUID():
return str(uuid4()) # returns UUID string
How to Convert a Valid UUID string to a UUID type
Let’s take the following sample data.
{ "product": "iPhone12", "parent": "AppleINC", "uuid": "0e61200e84-2452-1334-746c-7fh07d3b6f42" },
The Python UUID module allows us to create UUID objects. Therefore with the help of uuid.UUID
and the .hex
function, we can convert the string into a valid UUID.
import uuid
data = {
"product": "iPhone12",
"parent": "AppleINC",
"uuid": "0e61200e84-2452-1334-746c-7fh07d3b6f42"
}
print(uuid.UUID(o['uuid']).hex)
Difference Between UUIDs and GUIDs
UUID | GUID |
---|---|
UUID stands for Universal Unique Identifier | GUID stands for Globally Unique Identifier. |
The terms UUID and GUID are used synonymously with each other | UUID is more of a common term than GUID |
UUIDs are 128-bit labels for creating unique identification in computer systems. The chances of a UUID being replicated are slim. | Companies generate GUIDs whenever a unique number representation is required. It can be to reference a network or a product. |
Pros and Cons of ULIDs
Pros | Cons |
---|---|
Sortability is required | Sortability should have sub-millisecond precision |
The length of the Identifier should be limited | Creation time in the ULID can cause exposure to information leakage |
No requirements for metadata such as creation time for retrieval | Not platform, language, or architecture-independent. |
FAQs
When working with UUIDs on JSON files, you may come across the error. We can fix this by creating a custom UUIDEncoder that allows us to encode without running into the error.
Finally, we can pass the UUID object like so:json.dumps(UUIDobj, cls=UUIDEncoder)
It is close to impossible to manually decode a UUID. However, we can track its variant through public information from the host. For example, if the binary digits begin with 110, the UUID is a Microsoft GUID.
For the most part, Python UUID is thread-safe. However, in Python 2.5’s uuid.uuid1(), Whenever the current timestamp is compared to the previous one, it can start colliding between the same globally saved timestamp if no lock is provided.
A simple solution is to use the latest UUID4 for better randomization and lesser collisions.
Conclusion
We have looked at Pythons ULID module and how we can create unique sequences using the creation time. We have gone through the various classes of the module and how they help randomize the resultant sequence.