The Secret Guide To Regex Optional Group

Pattern matching is a core part of building many applications and websites. Doing it without regular expressions is hard, so we rely heavily on them (in Python, through the re module) whenever we need to match patterns.

Regular expressions (regex) are patterns written in a compact, specialized syntax. You can think of it this way: we give a pattern to a regex engine, which scans the source data and reports whether, and where, the pattern occurs.
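As a minimal sketch of that idea with Python's re module: re.search takes a pattern and a string, and returns a match object when the pattern is found or None when it is not. The sample strings are made up for illustration.

```python
import re

# re.search scans the text for the first place the pattern matches.
found = re.search(r"\d+", "Order number: 4521")
missing = re.search(r"\d+", "no digits here")

print(found.group() if found else "No match")  # 4521
print(missing)                                  # None
```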

Regex is a large topic, and covering all of it would take a long time. So, in this article, we focus only on the regex optional group: what it is and how to use it. Before that, though, we need a basic idea of groups. Let's look at those first.

Regex Groups

Groups are useful because writing a complex pattern as one undivided piece is painful. When a pattern has several parts, it is easier to build and read it part by part, and that is where developers use groups.

With groups, we divide the whole pattern into several parts and write a sub-pattern for each part. Each such part is a group, and a group is written inside parentheses ( ). A group also captures the text it matched, so that text can be read back separately. Let's see an example to understand groups.

  • Normal regex
re.findall(r"[a-zA-Z]{1,100}\[edit\]", wiki)
            |-part1-|-part2-|-part3-|

In the pattern above, part 1 asks for letters in the ranges A to Z or a to z. Part 2 is a quantifier that says the word must be between 1 and 100 characters long. Part 3 says the word must be followed by the literal text "[edit]" (the square brackets are escaped with backslashes because [ and ] are special characters in regex). Using groups, we can write a similar pattern as follows.

re.findall(r"([\w ]*)(\[edit\])", wiki)
            |---g1---|----g2---|

Here the pattern is divided into two groups, g1 and g2. Group g1, ([\w ]*), matches a run of word characters and spaces, so it can capture multi-word headings. Group g2 matches the literal "[edit]" that must follow g1. Because g1 uses the * quantifier (zero or more), we don't have to pick an arbitrary upper limit such as 100. Note that when a pattern contains groups, re.findall returns a tuple of the captured groups for each match instead of the whole matched text. For more complex patterns, splitting them into groups like this helps even more.
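To see how the two patterns behave differently, here is a small sketch. The article never defines wiki, so the sample text below is a hypothetical stand-in with Wikipedia-style "[edit]" headings.

```python
import re

# Hypothetical sample text standing in for the article's undefined `wiki`.
wiki = "History[edit] Early life[edit] Career[edit]"

# Without groups: findall returns the whole matched text of each match.
plain = re.findall(r"[a-zA-Z]{1,100}\[edit\]", wiki)
print(plain)

# With two groups: findall returns one tuple per match, one item per group.
grouped = re.findall(r"([\w ]*)(\[edit\])", wiki)
print(grouped)

# Keep only the heading text from group 1, trimming the captured spaces.
headings = [title.strip() for title, _ in grouped]
print(headings)
```

The plain pattern only catches "life[edit]" from "Early life[edit]", because [a-zA-Z] does not allow the space. The grouped pattern captures the whole heading, including the leading space between headings, which strip() removes.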

Optional Group

By now, you can probably guess what an optional group is. As the name suggests, a group is optional when we allow it to be absent: the pattern matches whether or not that part is present in the text.

To make a group optional, put a "?" right after it. The question mark makes the preceding group (or single character) optional, meaning it may occur zero times or once. The "?" is a quantifier, equivalent to {0,1}. Let's take an example to understand it clearly.
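A short sketch of "?" in both positions: after a single character and after a group. The sample strings are made up for illustration.

```python
import re

text = "color colour colouur"

# "?" after a single character: the "u" may appear zero times or once.
print(re.findall(r"colou?r", text))

# "{0,1}" is the explicit form of the same quantifier.
print(re.findall(r"colou{0,1}r", text))

# "?" after a group: the whole "ab" unit is optional, not just one letter.
for candidate in ["xaby", "xy", "xay"]:
    print(candidate, re.fullmatch(r"x(ab)?y", candidate) is not None)
```

Both findall calls return ['color', 'colour']: "colouur" has two u's, which is more than "zero or once" allows. In the group case, "xay" fails because the optional group must match all of "ab" or nothing.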

Example 1: Making Part of a Regex Optional

Consider a string that contains telephone numbers. Different people write the same number in different ways: with dots, with dashes, or with the area code in parentheses. The meaning is the same in every case, so our pattern should match all of them. Let's see how we can do it.

import re

string = """ This is a string we want to match
             123.123.1234
             123-123-1234
             (123)-123-1234
             (123).123.1234"""

# Writing regex without the optional "?" (parentheses not included)
print("Optional group is not used and doesn't include parentheses")
print(re.findall(r"\d+[.-]\d{3}[.-]\d{4}", string))

Output:

Optional group is not used and doesn't include parentheses
['123.123.1234', '123-123-1234']

The first pattern matches the first section of the number without parentheses, because the pattern does not mention parentheses at all. As a result, it returns only the numbers written without parentheses. This isn't what we want, since all four strings represent the same number.

# Writing regex without the optional "?" (parentheses required)
print("Optional group is not used and includes parentheses")
print(re.findall(r"\(\d+\)[.-]\d{3}[.-]\d{4}", string))

Output:

Optional group is not used and includes parentheses
['(123)-123-1234', '(123).123.1234']

The second pattern includes the parentheses, so every match must contain them; a number written without parentheses will not match. This is not what we want either, because we need to match all the strings.

# Writing regex with the optional "?" after each parenthesis
print("Optional group is used and matching with and without parentheses")
print(re.findall(r"\(?\d+\)?[.-]\d{3}[.-]\d{4}", string))

Output:

Optional group is used and matching with and without parentheses
['123.123.1234', '123-123-1234', '(123)-123-1234', '(123).123.1234']

In the third pattern, we say that the parentheses may appear but are not mandatory. To do that, we put "?" after each parenthesis, which makes the preceding character optional. Hence, all four numbers match. Strictly speaking, each "?" here applies to a single character, so this pattern would also accept an unbalanced number such as "(123.123.1234". Placing "?" after a parenthesized group makes the whole group optional in exactly the same way.
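Because each "?" in the third pattern applies to one parenthesis, the two parentheses are optional independently of each other. One way to make them optional only as a pair is sketched below. It swaps in a technique the article does not use: an optional capturing group (\()? followed by a conditional (?(1)...), a feature of Python's re module that applies a sub-pattern only if group 1 matched. The test numbers are made up, and the area code is assumed to be exactly three digits.

```python
import re

candidates = ["123.123.1234", "(123)-123-1234", "(123.123.1234", "123).123.1234"]

# The article's pattern: each parenthesis is optional on its own.
loose = re.compile(r"\(?\d+\)?[.-]\d{3}[.-]\d{4}")

# Sketch: "(\()?" optionally captures "(" as group 1, and "(?(1)\))"
# then requires ")" only if group 1 actually matched.
strict = re.compile(r"(\()?\d{3}(?(1)\))[.-]\d{3}[.-]\d{4}")

for number in candidates:
    print(f"{number:<16} loose={bool(loose.fullmatch(number))} "
          f"strict={bool(strict.fullmatch(number))}")
```

The loose pattern accepts all four strings, while the strict sketch rejects the two with a lone parenthesis.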

Regex Optional Group at the End

Optional groups are also commonly placed at the end of a pattern, to match a trailing part that may or may not be present. To do it, we add "?" after the last group. Let's see an example.

import re

string = """ This is the string we will check
             /api/1.0/clients/0/
             /clients/
             /clients/0/
             /clients/0/delete """

# The trailing "(?:\d+/)?" group is optional, so "/clients/" alone also matches
print(re.findall(r"/clients/(?:\d+/)?", string))

Output:

['/clients/0/', '/clients/', '/clients/0/', '/clients/0/']
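The "?:" inside the group matters here because of how re.findall treats groups. The sketch below runs the same paths through a capturing version (\d+/)? and the non-capturing version (?:\d+/)? so the two results can be compared.

```python
import re

paths = """/api/1.0/clients/0/
/clients/
/clients/0/
/clients/0/delete"""

# Capturing group: findall returns only what the group captured,
# or an empty string when the optional group matched nothing.
print(re.findall(r"/clients/(\d+/)?", paths))

# Non-capturing group: still groups and can be made optional,
# but captures nothing, so findall returns the full matches.
print(re.findall(r"/clients/(?:\d+/)?", paths))
```

The first call prints ['0/', '', '0/', '0/'], which is rarely what you want when you are after the whole path.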

Example: Regex optional group without capture

import re

string = """ This is the string we will check
             https://www.pythonpool.com/
             www.pythonpool.com/
             www.pythonpool.com/category/numpy/
             https://www.pythonpool.com/category/numpy/ """

print(re.findall(r"(?=\w+://)?w{3}\.\w+\.\w+(?:/\w*/\w*)?", string))

Output:

['www.pythonpool.com', 'www.pythonpool.com', 'www.pythonpool.com/category/numpy', 'www.pythonpool.com/category/numpy']

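Here, the non-capturing optional group is the trailing (?:...): the "?:" right after the opening parenthesis tells the engine to group the sub-pattern without capturing it, and the "?" after the closing parenthesis makes the whole path part optional. The (?=...) at the start is a lookahead, which is a different construct: it checks what comes next without consuming any characters. Made optional with "?", it always succeeds, so it has no effect on the result. "https://" is left out of the matches simply because each match starts at w{3}.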
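If you want the scheme included in the match when it is present, a minimal sketch is to make it an optional non-capturing group, (?:https?://)?, instead of a lookahead. This assumes, as in the sample data, a www host with a two-part domain and at most a two-segment path.

```python
import re

urls = """https://www.pythonpool.com/
www.pythonpool.com/
www.pythonpool.com/category/numpy/
https://www.pythonpool.com/category/numpy/"""

# Optional non-capturing scheme at the start, optional non-capturing path at the end.
pattern = r"(?:https?://)?www\.\w+\.\w+(?:/\w+/\w+)?"
print(re.findall(pattern, urls))
```

Because both optional groups are non-capturing, findall returns the full matches, with "https://" kept on the first and last URLs.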

Example: Regex Optional Group using Greedy method

Optional items are already greedy (you would use ?? to make them non-greedy, or lazy). But greediness only means trying to find the longest match that still allows the rest of the regular expression to match; the engine will still backtrack if necessary. If you want the match to fail when certain text follows, one way to do that is with a negative lookahead, (?!...). Consider the following example to understand it.
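A tiny sketch of greedy "?", lazy "??", and backtracking, using made-up inputs:

```python
import re

# "?" is greedy: it tries to match the optional "b" first.
greedy = re.match(r"ab?", "abc").group()
# "??" is lazy: it tries to skip the optional "b" first.
lazy = re.match(r"ab??", "abc").group()
# Greedy does not mean "never give up": b? first takes the "b",
# then backtracks to empty so that the literal "b" after it can match.
backtracked = re.match(r"ab?b", "abc").group()
# A lazy "??" still expands when the rest of the pattern needs it.
lazy_expanded = re.match(r"ab??c", "abc").group()

print(greedy, lazy, backtracked, lazy_expanded)  # ab a ab abc
```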

import re

expr = 'sinh x'
output = re.sub(r'(sin|cos|tan|cot|sec|csc)(h?)\s*(|\^\s*[\(]?\s*\-?\s*\d+\s*[\)]?\s*)?([a-z0-9]+)', r'\1\2\3(\4)', expr)
print(output)  # sinh(x)

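The greedy (h?) still gives up its "h" whenever that lets the rest of the pattern match. For an input that is already bracketed, such as 'sinh(x)', the pattern above backtracks (h?) to empty and rewrites the text as 'sin(h)(x)'. To make the match fail in that case instead, add a negative lookahead right after the function name: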

(sin|cos|tan|cot|sec|csc)(?!.\([^)]*\))(h?)\s*(|\^\s*[\(]?\s*\-?\s*\d+\s*[\)]?\s*)?([a-z0-9]+)
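The sketch below runs the pattern with and without the negative lookahead on two inputs, 'sinh x' and the already-bracketed 'sinh(x)', so you can see what the lookahead changes. Both patterns and the replacement string are taken from the example above.

```python
import re

# Pattern without the negative lookahead.
without_lookahead = r"(sin|cos|tan|cot|sec|csc)(h?)\s*(|\^\s*[\(]?\s*\-?\s*\d+\s*[\)]?\s*)?([a-z0-9]+)"
# Same pattern with "(?!.\([^)]*\))" added after the function name.
with_lookahead = r"(sin|cos|tan|cot|sec|csc)(?!.\([^)]*\))(h?)\s*(|\^\s*[\(]?\s*\-?\s*\d+\s*[\)]?\s*)?([a-z0-9]+)"

for expr in ["sinh x", "sinh(x)"]:
    print(expr, "->",
          re.sub(without_lookahead, r"\1\2\3(\4)", expr), "|",
          re.sub(with_lookahead, r"\1\2\3(\4)", expr))
```

Both versions turn 'sinh x' into 'sinh(x)'. For 'sinh(x)', the version without the lookahead produces 'sin(h)(x)', while the lookahead makes the match fail at "sin" and leaves the text unchanged.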

Conclusion

In this article, we have seen what groups in a regex are and how they help with pattern matching. Then we looked at optional groups and how to make any group optional with "?". Finally, we went through several examples of using optional groups while matching patterns. I hope this article has helped you. Thank you.
