Learn Urlparse Function Better Than Anyone Else

Hello geeks! In this article, we will learn about the urlparse() function in Python. We will see how it gives us a clean way to break a URL into its components and use them, and we will walk through its features. Once we have the theory down, we will work through examples to better understand urlparse(). So what are we waiting for? Let’s get started.

Urllib Package

Before diving into the urlparse() function, we need to understand what it is used for and where it comes from. Let’s start with that.

urllib is a package that collects several modules for working with URLs, whether that means requesting a URL or parsing one. Together, these modules make up the urllib library.

Module                Description
urllib.request        For opening and reading URLs
urllib.error          For the exceptions raised during URL requests
urllib.parse          For parsing URLs
urllib.robotparser    For parsing robots.txt files

Although urllib is too extensive a package to cover in one go, we will focus only on the urllib.parse module and, more specifically, on the urlparse() function.

urllib.parse Module

This module is used for parsing URLs, which means we can split any given URL into its components. It can also build a full URL from a base URL and a relative path.
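
To illustrate the second use case, here is a minimal sketch using urljoin() from urllib.parse, which resolves a relative path against a base URL. The example.com URLs and the variable names are placeholders chosen for illustration, not taken from this article.

```python
from urllib.parse import urljoin

# Hypothetical base URL, used only for illustration.
base = "https://www.example.com/docs/tutorial/index.html"

# A relative path is resolved against the directory of the base URL.
relative = urljoin(base, "images/logo.png")

# ".." moves one directory up before resolving.
parent = urljoin(base, "../about.html")

# A path starting with "/" replaces the whole path of the base URL.
absolute = urljoin(base, "/contact")

print(relative)  # https://www.example.com/docs/tutorial/images/logo.png
print(parent)    # https://www.example.com/docs/about.html
print(absolute)  # https://www.example.com/contact
```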

Importing urllib

urllib is part of Python’s standard library, so there is nothing to install. You only need to import the submodule you want before using it. For parsing, that is:

import urllib.parse

urlparse()

This is a function in the urllib.parse module that splits a URL into its components. These components hold helpful information about the given URL. Let’s understand it with an example.

from urllib.parse import urlparse, urlunparse

parse_url = urlparse('https://www.pythonpool.com/')
print("Url Components:",parse_url)

# parse_url now holds the components of the given URL.
# urlunparse() puts those components back together into a URL string.

unparsed_url = urlunparse(parse_url)
print("Original URL:",unparsed_url)

Output:

Url Components: ParseResult(scheme='https', netloc='www.pythonpool.com', path='/', params='', query='', fragment='')

Original URL: https://www.pythonpool.com/

We have seen how to use the urlparse() and urlunparse() functions. Now let’s understand the meaning of each component of the returned ParseResult.

Components of urlparse() method

The ParseResult returned by urlparse() has the following components.

  • scheme :- The scheme identifies the protocol used to access the resource, for example “http” or “https”. Other schemes such as “ftp” or “file” are also possible.
  • netloc :- The network location of the URL, usually the host name, optionally with user information and a port.
  • path :- The path to the specific resource that a web client wants to access.
  • params :- The parameters for the last path element, written after a semicolon (;).
  • query :- The query string that follows the path after a question mark (?). It carries information that the resource can use.
  • fragment :- The fragment identifier that follows a hash sign (#). It points to a part of the resource, such as a section of a page.
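
To see every component filled in at once, here is a small sketch that parses a URL containing all six parts. The URL below is a made-up example.com address chosen for illustration, not one used elsewhere in this article.

```python
from urllib.parse import urlparse

# Hypothetical URL that contains every component.
url = "https://www.example.com:8080/docs/page;type=a?lang=en&page=2#intro"
parts = urlparse(url)

print(parts.scheme)    # https
print(parts.netloc)    # www.example.com:8080
print(parts.path)      # /docs/page
print(parts.params)    # type=a
print(parts.query)     # lang=en&page=2
print(parts.fragment)  # intro
```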

As discussed above, each of these components holds some information about the URL. Because the returned object is a tuple, the components can also be accessed by index position. Let’s see how we can access them.

print(parse_url.scheme, "==", parse_url[0])
print(parse_url.netloc, "==", parse_url[1])
print(parse_url.path, "==", parse_url[2])
print(parse_url.params, "==", parse_url[3])    # empty string, as no parameter is defined
print(parse_url.query, "==", parse_url[4])     # empty string, as there is no query
print(parse_url.fragment, "==", parse_url[5])  # empty string, as no fragment identifier is present

Output:

https == https
www.pythonpool.com == www.pythonpool.com
/ == /
 == 
 == 
 == 

Although these are the main indexed components, a few more attributes can be accessed by name only. Each of them is None when the URL does not contain that part.

  • hostname :- The host name of the URL, in lowercase
  • username :- The user name, if the URL includes one
  • password :- The password, if the URL includes one
  • port :- The port number as an integer, if present
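
As a quick sketch of these name-only attributes, the snippet below parses a URL that carries user information and a port. The credentials, host, and port are invented for illustration.

```python
from urllib.parse import urlparse

# Hypothetical URL with a user name, password, and explicit port.
parts = urlparse("https://alice:s3cret@Example.COM:8443/index.html")

print(parts.netloc)    # alice:s3cret@Example.COM:8443
print(parts.hostname)  # example.com  (always lowercased)
print(parts.username)  # alice
print(parts.password)  # s3cret
print(parts.port)      # 8443  (an int, not a string)

# When a part is missing, the attribute is None.
plain = urlparse("https://www.example.com/")
print(plain.port)      # None
```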

How is urlsplit similar to urlparse

urlparse() is not the only way to get the components of a URL. We can also use urlsplit(). It works almost the same way as urlparse(), with one difference: it does not split the parameters out of the URL, so any parameters stay inside the path. The remaining components are the same. In short, urlsplit() returns a 5-item named tuple (a SplitResult) with the same fields as urlparse() except “params”.
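
The difference is easiest to see side by side. This is a minimal sketch that runs both functions on the same URL. The URL is a made-up example containing a “;type=a” parameter.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL whose last path element has a parameter.
url = "https://www.example.com/docs/page;type=a?lang=en#intro"

parsed = urlparse(url)
split = urlsplit(url)

# urlparse() separates the parameter from the path.
print(parsed.path, "|", parsed.params)  # /docs/page | type=a

# urlsplit() leaves the parameter inside the path.
print(split.path)                       # /docs/page;type=a

# 6 fields versus 5 fields.
print(len(parsed), len(split))          # 6 5
```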

FAQs

Q1) What is the return datatype of urlparse()?

urlparse() returns a ParseResult object, which is a named tuple (a subclass of tuple).
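
A short sketch of what that means in practice: the result can be checked, unpacked, and copied like any other named tuple. The URL here is an illustrative placeholder.

```python
from urllib.parse import urlparse

# Hypothetical URL used only for illustration.
result = urlparse("https://www.example.com/index.html?lang=en")

print(type(result).__name__)      # ParseResult
print(isinstance(result, tuple))  # True
print(result._fields)

# Being a tuple, it can be unpacked into six variables.
scheme, netloc, path, params, query, fragment = result

# _replace() returns a modified copy; geturl() rebuilds the URL string.
updated = result._replace(scheme="http")
print(updated.geturl())           # http://www.example.com/index.html?lang=en
```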

Q2) How to solve no module named ‘urlparse’ error?

This error appears in Python 3 because the standalone urlparse module from Python 2 was moved into urllib.parse. Import the function from there instead:
from urllib.parse import urlparse

Q3) What does the allow_fragments parameter do in urlparse() function?

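The allow_fragments parameter controls whether the fragment identifier is recognized. When it is set to False, the fragment is not split off. Instead, it is left as part of the preceding component (path, params, or query), and the fragment field is an empty string. The default value for this parameter is True.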
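
Here is a minimal sketch of that behaviour, parsing the same URL with and without fragment handling. The URL is a placeholder chosen for illustration.

```python
from urllib.parse import urlparse

# Hypothetical URL with both a query and a fragment.
url = "https://www.example.com/docs/page?lang=en#intro"

with_fragment = urlparse(url)  # allow_fragments defaults to True
without_fragment = urlparse(url, allow_fragments=False)

print(with_fragment.query, "|", with_fragment.fragment)  # lang=en | intro

# With allow_fragments=False, "#intro" stays attached to the query
# and the fragment field is empty.
print(repr(without_fragment.query))     # 'lang=en#intro'
print(repr(without_fragment.fragment))  # ''
```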

Conclusion

So, today we learned about the urlparse() function from the urllib.parse module. We discussed each component of a URL, and we saw how to access those components and extract information from them.

I hope this article helped you! I appreciate your support.
