Hello geeks! Today in this article, we will learn about the urlparse() function in Python. We will see how it gives us a convenient way to break any URL into its components and use them. We will then walk through its features. Once we have a theoretical understanding, we will work through examples to understand the urlparse() function better. So what are we waiting for? Let's get started.
Urllib Package
So, before diving into the urlparse() function, we need to understand where it comes from and what it is used for. Let's start with that.
urllib is a package that collects several modules for working with URLs, whether that means requesting a URL or parsing one. Together, these modules make up the urllib package:
Module | Description |
---|---|
urllib.request | For opening and reading URLs |
urllib.error | For the exceptions raised by urllib.request |
urllib.parse | For parsing URLs |
urllib.robotparser | For parsing robots.txt files |
Although urllib is too extensive to cover in one go, we will focus only on the urllib.parse module and, more specifically, on its urlparse() function.
urllib.parse Module
This module is used for parsing URLs: it splits a given URL into its components. It can also build an absolute URL from a base URL and a relative path.
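The base-plus-relative case is handled by urljoin(), which also lives in urllib.parse. Here is a minimal sketch. The base URL and the relative paths are hypothetical and were chosen only to show how directories and ".." are resolved:

```python
from urllib.parse import urljoin

# Hypothetical base URL, used only for illustration
base = "https://www.pythonpool.com/docs/guide/intro.html"

# A plain relative path replaces the last segment of the base path
print(urljoin(base, "setup.html"))   # https://www.pythonpool.com/docs/guide/setup.html
# ".." climbs one directory up before appending
print(urljoin(base, "../faq.html"))  # https://www.pythonpool.com/docs/faq.html
# A leading "/" replaces the whole path but keeps scheme and host
print(urljoin(base, "/about/"))      # https://www.pythonpool.com/about/
```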
Installing urllib
urllib is part of Python's standard library, so there is nothing to install. You only need to import it before using it, which we can do with the following statement.
import urllib.parse
urlparse()
urlparse() is a function in the urllib.parse module that splits a URL into its components. These components hold useful information about the URL. Let's understand it with an example.
from urllib.parse import urlparse, urlunparse
parse_url = urlparse('https://www.pythonpool.com/')
print("Url Components:",parse_url)
# parse_url now holds the components of the URL; urlunparse() reassembles them into a URL string.
# Let's try it
unparsed_url = urlunparse(parse_url)
print("Original URL:",unparsed_url)
Output:
Url Components: ParseResult(scheme='https', netloc='www.pythonpool.com', path='/', params='', query='', fragment='')
Original URL: https://www.pythonpool.com/
We have seen how to use the urlparse() and urlunparse() functions. Now let's understand what each component of the returned ParseResult means.
Components of urlparse() method
The ParseResult returned by urlparse() has the following components.
- scheme :- The scheme identifies the protocol used to access the resource, such as "http", "https", or "ftp". urlparse() always returns it in lowercase.
- netloc :- The network location of the URL: the host name, optionally preceded by user information and followed by a port number.
- path :- The hierarchical path to the specific resource that the web client wants to access.
- params :- The parameters for the last path element, introduced by a semicolon (;). They are rarely used in modern URLs.
- query :- The query string that follows the path after a "?", which the resource can use, for example, to filter or search.
- fragment :- The fragment identifier that follows a "#", usually pointing to a section within the resource.
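The pythonpool.com URL used earlier leaves most components empty. To see every field populated, here is a sketch that parses a hypothetical URL containing credentials, a port, params, a query, and a fragment. It also uses parse_qs() from the same module to turn the query string into a dictionary:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL that exercises every component
url = "https://user:secret@www.example.com:8080/docs/page;version=2?q=python&page=1#section-3"
result = urlparse(url)

print(result.scheme)    # https
print(result.netloc)    # user:secret@www.example.com:8080
print(result.path)      # /docs/page
print(result.params)    # version=2
print(result.query)     # q=python&page=1
print(result.fragment)  # section-3

# parse_qs() maps each query key to a list of its values
print(parse_qs(result.query))  # {'q': ['python'], 'page': ['1']}
```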
As discussed above, each of these components holds some information about the URL. Because the returned object is a named tuple, the components can also be accessed by index position. Let's see how we can access them.
print(parse_url.scheme, "==",parse_url[0])
print(parse_url.netloc, "==",parse_url[1])
print(parse_url.path, "==",parse_url[2])
print(parse_url.params, "==",parse_url[3]) # empty string: no parameters are defined
print(parse_url.query, "==",parse_url[4]) # empty string: there is no query
print(parse_url.fragment, "==",parse_url[5]) # empty string: no fragment identifier is present
Output:
https == https
www.pythonpool.com == www.pythonpool.com
/ == /
==
==
==
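Besides these six indexed components, ParseResult provides a few more attributes that can only be accessed by name, not by index.
- hostname :- The host name in lowercase, without the user information or port (None if absent)
- username :- The user name, if one is included in the netloc (otherwise None)
- password :- The password, if one is included in the netloc (otherwise None)
- port :- The port number as an integer, if present (otherwise None)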
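Here is a short sketch of these named-only attributes. The URL, the credentials, and the port are hypothetical. Note that hostname is lowercased and port comes back as an int:

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials and an explicit port
result = urlparse("https://alice:s3cret@WWW.Example.com:8443/login")

print(result.hostname)  # www.example.com (lowercased, port stripped)
print(result.username)  # alice
print(result.password)  # s3cret
print(result.port)      # 8443 (an int, not a string)

# Attributes missing from the URL come back as None
plain = urlparse("https://www.pythonpool.com/")
print(plain.username, plain.password, plain.port)  # None None None
```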
How is urlsplit similar to urlparse
Although urlparse() gives us the components, urllib.parse also provides an alternative: urlsplit(). It is almost the same as urlparse(), with one difference: it does not split the parameters out of the path, so any ";params" stay inside the path component. In short, urlsplit() returns a SplitResult, a 5-item named tuple (scheme, netloc, path, query, fragment) that has no params field.
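The following sketch shows the difference side by side on a hypothetical URL that contains ";params":

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL containing path parameters
url = "https://www.example.com/docs/page;version=2?q=python#intro"

parsed = urlparse(url)  # 6 fields, params split out
split = urlsplit(url)   # 5 fields, params left in the path

print(len(parsed), len(split))     # 6 5
print(parsed.path, parsed.params)  # /docs/page version=2
print(split.path)                  # /docs/page;version=2
```

Because most modern URLs never use ";params", urlsplit() is often enough, and it avoids the extra parameter-splitting step.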
FAQs
The returned datatype of urlparse() is ParseResult, which is a subclass of a named tuple.
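Because ParseResult is a named tuple, it is immutable, but the standard namedtuple method _replace() returns a modified copy. ParseResult's geturl() then turns that copy back into a string. Here is a small sketch on a hypothetical URL that switches the scheme to https and drops the query:

```python
from urllib.parse import urlparse

result = urlparse("http://www.pythonpool.com/path?x=1")

print(type(result).__name__)      # ParseResult
print(isinstance(result, tuple))  # True

# _replace() builds a new ParseResult; the original is left unchanged
secure = result._replace(scheme="https", query="")
print(secure.geturl())            # https://www.pythonpool.com/path
```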
To use urlparse(), you need to import it first using the following statement.
from urllib.parse import urlparse
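The allow_fragments parameter controls whether fragment identifiers are recognized, and its default value is True. If it is set to False, the "#..." part is not split out. Instead, it stays in the preceding component (path, params, or query), and fragment is set to an empty string.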
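Here is a quick sketch of both settings on a hypothetical URL. With allow_fragments=False, the "#section" part ends up inside the query because the query is the component that precedes it:

```python
from urllib.parse import urlparse

url = "https://www.pythonpool.com/page?q=python#section"

default = urlparse(url)                         # allow_fragments=True
no_frag = urlparse(url, allow_fragments=False)  # "#" is not treated specially

print(default.query, default.fragment)        # q=python section
print(no_frag.query, repr(no_frag.fragment))  # q=python#section ''
```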
Conclusion
So, today we learned about the urlparse() function from the urllib.parse module. We discussed each component of a URL, how to access those components, and how to fetch information from them.
I hope this article helped you! I appreciate your support.