Files store information that can be read and written. The operations that can be performed on files in Python are read, write, open, close, rename, and delete. There are two main types of files in Python: binary files and text files. Binary files include images such as .png, .gif, and .jpg, and documents such as .pdf, .xls, and .doc. Text files include source code, web standards, and tabular data. In this article, we shall look at one tabular text format, the .tsv file, and see how to read a tsv file in Python.
What is a TSV file?
TSV stands for tab-separated values. A TSV file is a text file that stores data in tabular form. The format is widely used for exchanging data between databases and spreadsheets. Each record sits on its own line, and the fields within a record are separated by a tab character ( \t ). It is an alternative to the .csv format. The difference between the two is that the .csv format uses commas to separate columns, whereas the .tsv format uses tabs.
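To make the layout concrete, here is a minimal sketch that splits a small TSV string by hand. The sample text reuses the first rows of the product data from this article, and the variable names are only illustrative.

```python
# A tiny TSV document: tabs separate fields, newlines separate records.
tsv_text = "Month\tProduct A Sales\nJanuary\t297\nFebruary\t305\n"

# Split into records first, then split each record into fields.
records = [line.split("\t") for line in tsv_text.splitlines()]

for record in records:
    print(record)
```

Splitting by hand works for simple data like this, but it does not handle quoted fields that themselves contain tabs, which is one reason to prefer the csv module for real files.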
Reading TSV file in Python Using open Function
We can read the tsv file in Python using the open() function. It opens the given file and returns a file object. With open(), we can perform several file handling operations such as reading, writing, appending, and creating files.
After opening the file, we shall use the reader() function from the csv module to wrap the file object in a csv.reader object. To use the reader, we first import csv.
import csv
Then, we shall write the open() function. We shall be using a tsv file named ‘product.tsv’ , which consists of the sales count for three products over a span of 12 months. We will pass the tsv file as an argument to the open() function, and ‘file’ will be the file’s object.
Then we use csv.reader to wrap the file object in a csv.reader object. We pass the delimiter as ‘\t’ to csv.reader. The delimiter is the character that separates the fields in each row.
Since this is a tsv file, we pass the tab character as the delimiter. The variable ‘tsv_file’ is the reader object for the tsv file. Then, we iterate over the reader and print each row, one per line.
with open("product.tsv") as file:
    tsv_file = csv.reader(file, delimiter="\t")
    for line in tsv_file:
        print(line)
The tsv file is printed line by line as the output:
['Month', 'Product A Sales', 'Product B Sales', 'Product C Sales']
['January', '297', '119', '289']
['February', '305', '437', '362']
['March', '234', '247', '177']
['April', '184', '193', '219']
['May', '373', '316', '177']
['June', '433', '169', '370']
['July', '294', '403', '429']
['August', '156', '445', '216']
['September', '441', '252', '498']
['October', '328', '472', '491']
['November', '270', '251', '372']
['December', '146', '159', '156']
The entire code is:
import csv
with open("product.tsv") as file:
    tsv_file = csv.reader(file, delimiter="\t")
    for line in tsv_file:
        print(line)
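As the output shows, csv.reader returns every field as a string, and the header comes back as an ordinary row. A minimal sketch of one way to separate the header and convert the sales counts to integers follows. It reads from an in-memory io.StringIO stand-in holding the first two months of the data instead of the real product.tsv, and the names sample, header, and rows are assumptions made for illustration.

```python
import csv
import io

# In-memory stand-in for product.tsv (first two months only).
sample = io.StringIO(
    "Month\tProduct A Sales\tProduct B Sales\tProduct C Sales\n"
    "January\t297\t119\t289\n"
    "February\t305\t437\t362\n"
)

reader = csv.reader(sample, delimiter="\t")
header = next(reader)  # the first row holds the column names

# csv.reader yields every field as a string, so convert the counts to int.
rows = [[row[0]] + [int(value) for value in row[1:]] for row in reader]

print(header)
for row in rows:
    print(row)
```

With a real file, the same pattern should work by replacing the StringIO object with the file object returned by open().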
Reading TSV file in Python Using Pandas
Another way to read a tsv file is with the pandas library. Pandas is a Python library for data analysis and data manipulation, and it is well suited to working with numerical tables.
First, we shall be importing the pandas library.
import pandas as pd
Now, we shall use the read_csv() function from the pandas library. We pass the tsv file to read_csv() along with the separator ‘\t’, because in tsv files the tab character separates the fields.
tsv_data = pd.read_csv('product.tsv', sep='\t')
tsv_data
In an interactive session such as a Jupyter notebook, this displays the contents of the tsv file as a DataFrame.
The Entire Code is:
import pandas as pd
tsv_data = pd.read_csv('product.tsv', sep='\t')
tsv_data
Now, to read the first five rows from product.tsv, we shall use the head() function. Called as head(n), it returns the first n rows of the DataFrame.
print(tsv_data.head())
By default, if you don’t specify the number of rows, head() returns the first 5 rows.
      Month  Product A Sales  Product B Sales  Product C Sales
0   January              297              119              289
1  February              305              437              362
2     March              234              247              177
3     April              184              193              219
4       May              373              316              177
To print all the entries of a particular column, we shall be using the following code. We will print the entire ‘Product A Sales’ column.
print(tsv_data['Product A Sales'])
The output will be:
0     297
1     305
2     234
3     184
4     373
5     433
6     294
7     156
8     441
9     328
10    270
11    146
Name: Product A Sales, dtype: int64
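The pandas steps above can be tried without a file on disk. The sketch below is a minimal example, assuming pandas is installed: it passes an in-memory io.StringIO object holding the first three months of the data to read_csv(), then calls head() with an explicit row count and selects one column. The names sample, first_two, and product_a are illustrative only.

```python
import io
import pandas as pd

# In-memory stand-in for product.tsv (first three months only).
sample = io.StringIO(
    "Month\tProduct A Sales\tProduct B Sales\tProduct C Sales\n"
    "January\t297\t119\t289\n"
    "February\t305\t437\t362\n"
    "March\t234\t247\t177\n"
)

# read_csv accepts a file-like object as well as a path.
tsv_data = pd.read_csv(sample, sep="\t")

first_two = tsv_data.head(2)               # explicit n instead of the default 5
product_a = tsv_data["Product A Sales"]    # a single column, returned as a Series

print(first_two)
print(product_a.sum())
```

Because pandas infers the column types, the sales columns are read as integers, so numeric operations such as sum() work without manual conversion.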
Writing Over a TSV File Using csv.writer
Now, we shall see how to write over an already existing tsv file. We shall use the open() function, but this time we open the file in ‘wt’ mode. This mode opens the file for writing as text and discards its previous contents. Instead of csv.reader(), here we shall use csv.writer(). We pass the file object and the delimiter ‘\t’ to the writer() function.
After that, we use writerow() to write individual rows to the file. Here, we write two rows with this function.
import csv
with open('product.tsv', 'wt') as file:
    tsv_writer = csv.writer(file, delimiter='\t')
    tsv_writer.writerow(['January', 324, 122, 191])
    tsv_writer.writerow(['February', 291, 322, 291])
Now, let us read the ‘product.tsv’ file again, using the same code as before.
with open("product.tsv") as file:
    tsv_file = csv.reader(file, delimiter="\t")
    for line in tsv_file:
        print(line)
For the output, we can see that the file has been overwritten and it only contains two rows instead of the twelve rows which were present before.
['January', '324', '122', '191']
['February', '291', '322', '291']
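Note that the overwritten file no longer has a header row. A minimal sketch of writing the header together with the data rows is shown below. It writes to a hypothetical file named product_copy.tsv in a temporary directory so that no real file is touched, uses writerows() to write several rows in one call, and opens the file with newline="", which the csv module documentation recommends.

```python
import csv
import os
import tempfile

# Hypothetical output path in a temporary directory, so no real file is touched.
path = os.path.join(tempfile.mkdtemp(), "product_copy.tsv")

header = ["Month", "Product A Sales", "Product B Sales", "Product C Sales"]
rows = [
    ["January", 324, 122, 191],
    ["February", 291, 322, 291],
]

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="") as file:
    tsv_writer = csv.writer(file, delimiter="\t")
    tsv_writer.writerow(header)   # keep the column names
    tsv_writer.writerows(rows)    # write all data rows in one call

# Read the file back to check what was written.
with open(path, newline="") as file:
    written = list(csv.reader(file, delimiter="\t"))

print(written)
```

When read back, the numbers appear as strings again, since a tsv file stores only text.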
Appending to a TSV File Without Pandas
To append to a tsv file without using the pandas library, we shall use the following code. Here, we append the contents of a file named ‘total_sales’ to another tsv file named ‘product’. The ‘total_sales’ file holds the combined sales of all products for each month of a year, whereas the ‘product’ file holds the sales of each product individually.
with open("total_sales.tsv") as file, open("product.tsv", "a") as f:
    for line in file:
        f.write(line)
Now, to read the file:
import csv
with open("product.tsv") as file:
    tsv_file = csv.reader(file, delimiter="\t")
    for line in tsv_file:
        print(line)
The output is:
['Month', 'Product A Sales', 'Product B Sales', 'Product C Sales']
['January', '297', '119', '289']
['February', '305', '437', '362']
['March', '234', '247', '177']
['April', '184', '193', '219']
['May', '373', '316', '177']
['June', '433', '169', '370']
['July', '294', '403', '429']
['August', '156', '445', '216']
['September', '441', '252', '498']
['October', '328', '472', '491']
['November', '270', '251', '372']
['December', '146', '159', '156Month', 'Total Sales']
['January', '558']
['February', '871']
['March', '756']
['April', '509']
['May', '987']
['June', '625']
['July', '862']
['August', '548']
['September', '669']
['October', '827']
['November', '776']
['December', '955']
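As seen above, the ‘product’ file has been appended with the contents of the ‘total_sales’ file. Note that the last value of the December row and the header of the appended file have merged into ‘156Month’. This happens when the file being appended to does not end with a newline character.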
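One way to avoid the merged ‘156Month’ value is to check whether the target file ends with a newline before appending. The sketch below illustrates the idea under stated assumptions: it creates small stand-in files in a temporary directory instead of using the real product.tsv and total_sales.tsv, and the variable names are only illustrative.

```python
import os
import tempfile

# Hypothetical stand-ins for product.tsv and total_sales.tsv.
folder = tempfile.mkdtemp()
target = os.path.join(folder, "product.tsv")
source = os.path.join(folder, "total_sales.tsv")

with open(target, "w") as f:
    f.write("December\t146\t159\t156")      # note: no trailing newline
with open(source, "w") as f:
    f.write("Month\tTotal Sales\nJanuary\t558\n")

# A non-empty target that lacks a final newline needs one before appending.
with open(target) as f:
    content = f.read()
needs_newline = bool(content) and not content.endswith("\n")

with open(source) as src, open(target, "a") as dst:
    if needs_newline:
        dst.write("\n")                      # finish the last record first
    for line in src:
        dst.write(line)

with open(target) as f:
    result = f.read().splitlines()

print(result)
```

Reading the whole target file just to inspect its last character is fine for small files like these; for large files, a different check would be preferable.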
Reading TSV into dictionary with open()
We can read a given tsv file and store its contents into a dictionary. To achieve that, we shall be taking a tsv file containing two columns – month and total sales. Then, with the help of the open() function, we shall store each month as the dictionary’s key and the total sales amount for the month as the values.
We shall strip the trailing newline from each line and split the month and sales on the tab character. Then, we shall iterate over the dictionary and print each month with its sales.
sales_dictionary = {}
with open("total_sales.tsv") as f:
    for line in f:
        # Strip the trailing newline before splitting on the tab character
        month, sales = line.rstrip('\n').split('\t')
        sales_dictionary[month] = sales
for month, sales in sales_dictionary.items():
    print(f'{month} : {sales}')
The output is:
Month : Total Sales
January : 558
February : 871
March : 756
April : 509
May : 987
June : 625
July : 862
August : 548
September : 669
October : 827
November : 776
December : 955
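In the output above, the header row ends up in the dictionary as the entry ‘Month : Total Sales’, and the sales values are stored as strings. A minimal sketch of a variant that skips the header and stores the sales as integers is given below. It uses csv.reader on an in-memory io.StringIO stand-in with the first two months of the data, and the name sales_by_month is an assumption made for illustration.

```python
import csv
import io

# In-memory stand-in for total_sales.tsv (first two months only).
sample = io.StringIO(
    "Month\tTotal Sales\n"
    "January\t558\n"
    "February\t871\n"
)

reader = csv.reader(sample, delimiter="\t")
next(reader)  # skip the header row so it does not become a dictionary entry

# Each remaining row has exactly two fields: the month and its total sales.
sales_by_month = {month: int(sales) for month, sales in reader}

print(sales_by_month)
```

Storing the values as integers makes it possible to do arithmetic on them directly, for example summing the yearly total.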
That sums up everything about the tsv file. If you have any questions, let us know in the comments below.
Until next time, Keep Learning!