Introduction
NumPy, short for ‘Numerical Python’, is a Python library for working with n-dimensional arrays. But how do we load data into NumPy from text files? Two functions handle this: numpy.genfromtxt() and numpy.loadtxt(). In this tutorial, we will study numpy.genfromtxt().
What is numpy genfromtxt()?
We use NumPy genfromtxt() to load data from text files, with missing values handled as specified.
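To make this concrete, here is a minimal sketch. The sample text is made up for illustration, and StringIO is used only so that the example does not depend on a file on disk. With the default float dtype, an empty field is read as nan.

```python
import numpy as np
from io import StringIO

# Two comma-separated rows; the middle field of the second row is empty.
sample = StringIO("1,2,3\n4,,6")

# With the default dtype (float), the missing field becomes nan.
arr = np.genfromtxt(sample, delimiter=",")
print(arr)
```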
Syntax
numpy.genfromtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=" !#$%&'()*+,-./:;<=>?@[\\]^{|}~", replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None, encoding='bytes', *, like=None)
Parameters
- fname: The file, filename, list, string, list of strings, or generator to read. If the filename has the extension .gz or .bz2, the file is decompressed first. Note that a generator must return bytes or strings. Strings in a list are treated as lines.
- dtype: optional. The data type of the resulting array. If dtype is None, the dtype of each column is determined from its contents individually.
- comments: optional. The character that marks the start of a comment. All characters on a line after the comment marker are discarded.
- delimiter: optional. The string used to separate values. By default, any run of consecutive whitespace acts as the delimiter. An integer or a sequence of integers can also be given, in which case it is read as the width of each field.
- skip_header: optional. The number of lines to skip at the beginning of the file.
- skip_footer: optional. The number of lines to skip at the end of the file.
- skiprows: optional. It was removed in NumPy 1.10. Use skip_header instead.
- converters: optional. A set of functions that convert the data of a column to a value. Converters can also be used to provide a default value for missing data: converters = {3: lambda s: float(s or 0)}.
- missing: optional. It was removed in NumPy 1.10. Use missing_values instead.
- missing_values: optional. The set of strings that correspond to missing data.
- filling_values: optional. The set of values used as defaults when data are missing.
- usecols: optional. The columns to read, with 0 being the first. For example, usecols = (1, 4, 5) extracts the 2nd, 5th, and 6th columns.
- names: optional. If names is True, the field names are read from the first line after the first skip_header lines. This line can optionally begin with a comment delimiter. If names is a sequence or a single string of comma-separated names, those names define the field names in a structured dtype. If names is None, the names of the dtype fields are used, if any.
- excludelist: optional. A list of names to exclude. This list is appended to the default list [‘return’, ‘file’, ‘print’]. Excluded names get an underscore appended: for example, file becomes file_.
- deletechars: optional. A string combining the invalid characters that must be deleted from the names.
- replace_space: optional. The character(s) used to replace white spaces in variable names. By default, ‘_’ is used.
- autostrip: optional. Whether to strip white spaces from the variables automatically.
- defaultfmt: optional. The format used to define default field names, such as ‘f%i’ or ‘f_%02i’.
- unpack: optional. If True, the returned array is transposed, so that the columns can be unpacked using x, y, z = genfromtxt(...).
- case_sensitive: optional. If True, field names are case sensitive. If False or ‘upper’, field names are converted to uppercase. If ‘lower’, they are converted to lowercase.
- usemask: optional. A boolean value. If True, a masked array is returned. Otherwise, a regular array is returned.
- loose: optional. A boolean value. If True, no error is raised for invalid values.
- invalid_raise: optional. If True, an exception is raised if an inconsistency is detected in the number of columns. Otherwise, a warning is emitted, and the offending lines are skipped.
- max_rows: optional. The maximum number of rows to read. It cannot be used together with skip_footer. By default, the entire file is read.
- encoding: optional. The encoding used to decode the input file. It does not apply when fname is a file object.
- like: optional. A reference object that allows the creation of arrays that are not NumPy arrays.
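Several of these parameters are often combined in a single call. The following is a minimal sketch rather than a definitive recipe: the sample text, its column labels, and the placeholder value -1 are all assumptions made for illustration.

```python
import numpy as np
from io import StringIO

# Hypothetical sample: one header line, then three records with "N/A" gaps.
raw = "id,height,weight\n1,170,65\n2,N/A,70\n3,180,N/A"

data = np.genfromtxt(
    StringIO(raw),
    delimiter=",",
    skip_header=1,         # skip the line with the column labels
    usecols=(1, 2),        # keep only the 2nd and 3rd columns
    missing_values="N/A",  # treat this string as missing data
    filling_values=-1,     # value stored in place of missing data
)
print(data)
```

The result is a regular two-column float array in which each "N/A" entry has been replaced by -1.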
Return value of numpy genfromtxt()
The function returns an array containing the data read from the text file. If usemask is set to True, the result is a masked array.
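As a small illustration of the masked return value, the sketch below reads a made-up sample with one empty field and sets usemask to True. The sample text and variable names are assumptions for this example only.

```python
import numpy as np
from io import StringIO

# Hypothetical sample: the last field of the first row is empty.
raw = "10,20,\n40,50,60"

masked = np.genfromtxt(StringIO(raw), delimiter=",", usemask=True)

print(type(masked).__name__)  # the class of the returned object
print(masked.mask)            # True marks the entries that were missing
```

Working with the mask keeps missing entries separate from real values, instead of relying on a sentinel such as nan.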
Examples of numpy genfromtxt()
Let us understand numpy genfromtxt() and some of its parameters with the help of examples:
1. Using dtype, encoding, and delimiter as parameters
In this example, we import NumPy and StringIO. We then wrap an input string in a StringIO object so that it behaves like a file, pass it to the function with the given parameters, and look at the output.
# import numpy and StringIO
import numpy as np
from io import StringIO
# use a name other than str, so the built-in str type can still be passed as dtype
s = StringIO("1,5.5,Latracal")
data1 = np.genfromtxt(s, dtype=str, encoding=None, delimiter=",")
print("output :", data1)
Output:
output : ['1' '5.5' 'Latracal']
Explanation:
First, we import NumPy under the alias np and StringIO from the io module. Next, we wrap the input string in a StringIO object named s. Naming it str would shadow the built-in str type that we pass as dtype. Finally, we call genfromtxt() with the input, dtype, encoding, and delimiter, and print the output. Because dtype is str, every field is returned as a string.
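When the columns hold different kinds of data, a single dtype for the whole array is often not what we want. The sketch below, which uses a made-up sample with hypothetical column names, sets dtype to None so that each column gets its own type, and names to True so that the first line supplies the field names.

```python
import numpy as np
from io import StringIO

# Hypothetical sample: the first line holds the column names.
raw = "name,age,weight\nAnn,21,72.5\nBob,35,58.0"

people = np.genfromtxt(
    StringIO(raw),
    delimiter=",",
    names=True,        # read the field names from the first line
    dtype=None,        # infer the type of each column separately
    encoding="utf-8",  # keep text columns as str rather than bytes
)

print(people.dtype.names)  # ('name', 'age', 'weight')
print(people["age"])       # [21 35]
```

The result is a structured array, so a column is selected by its field name rather than by its position.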
2. Using skip_header and skip_footer
In this example, we read from a file. We import the NumPy library under the alias np. Then, we call the function with the file name and the other parameters.
# import numpy
import numpy as np
# delimiter="\n" keeps each line as a single field instead of splitting it on whitespace
data = np.genfromtxt("latra.txt", dtype=str, delimiter="\n",
                     encoding=None, skip_footer=1, skip_header=1)
# print one line per row
for line in data:
    print(line)
Output:
This is the best website
I love this website
Explanation:
Here, we have a text file named latra.txt in which we have written some content. We import the NumPy library and call the genfromtxt() function with the file name, dtype, delimiter, encoding, skip_header, and skip_footer. The call skips the first and the last line of the file and returns the remaining lines, which we print one by one. The delimiter is set to a newline because the lines contain different numbers of words. With the default whitespace delimiter, the rows would have different numbers of columns, and genfromtxt() would raise an error.
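The same two parameters can be tried without a file on disk. The following sketch assumes a small whitespace-separated table (made up for illustration) whose first line holds labels and whose last line holds column totals.

```python
import numpy as np
from io import StringIO

# Hypothetical table: a label line, two data rows, and a totals line.
raw = "a b c\n1 2 3\n4 5 6\n5 7 9"

# Drop one line at the top and one line at the bottom.
body = np.genfromtxt(StringIO(raw), skip_header=1, skip_footer=1)
print(body)
```

Only the two data rows in the middle end up in the resulting array.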
3. Showing comments in numpy genfromtxt()
In this example, we show how comments work in a file. For this, we import NumPy and StringIO and define the input. After that, we apply the numpy genfromtxt() function and look at the output.
#import numpy and StringIO library
import numpy as np
from io import StringIO
f = StringIO('''
Latra,# of chars
Latracal Solution,13
Solutions,8''')
Data = np.genfromtxt(f, dtype='S12,S12', delimiter=',')
print(Data)
Output:
[(b'Latra', b'') (b'Latracal Sol', b'13') (b'Solutions', b'8')]
Explanation:
Here, we show how comments work in a file. We import NumPy and StringIO, then create an input f in which one line contains the # symbol. We then apply the genfromtxt() function and print the output. Everything after the # symbol on that line is discarded, so the second field of the first row is empty in the output array.
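The comment marker does not have to be #. As a sketch under the assumption that a file uses % for its notes (the sample text is made up), the comments parameter can be set accordingly:

```python
import numpy as np
from io import StringIO

# Hypothetical sample: a full-line note and a trailing note, both marked with %.
raw = "% recorded on site A\n1,2\n3,4 % trailing note"

values = np.genfromtxt(StringIO(raw), delimiter=",", comments="%")
print(values)
```

The line that contains only a comment is skipped, and the trailing note is cut off before the row is split into fields.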
4. Using autostrip in numpy genfromtxt()
In this example, we import NumPy and StringIO. Then we define the input as data, apply the function with and without the autostrip parameter, and print both outputs so that we can see the difference.
import numpy as np
from io import StringIO
data = u"Latra, sol , 2\n 3, xyz, 4"
# Without autostrip
d = np.genfromtxt(StringIO(data), delimiter=",", dtype="|U5")
print("without autostrip :\n ", d)
# With autostrip
d = np.genfromtxt(StringIO(data), delimiter=",", dtype="|U5", autostrip=True)
print("with autostrip : \n", d)
Output:
without autostrip :
[['Latra' ' sol ' ' 2']
['3' ' xyz' ' 4']]
with autostrip :
[['Latra' 'sol' '2']
['3' 'xyz' '4']]
Explanation:
Here, we import NumPy and StringIO and store the input in the data string. We then apply the genfromtxt() function twice. First, we print the output without the ‘autostrip’ parameter, where the surrounding spaces stay part of the values. After that, we print the output with ‘autostrip’ set to True, where those spaces are removed.
Difference between Genfromtxt() and loadtxt()
Numpy Genfromtxt()
We use Numpy genfromtxt() to load the data from the text files, with missing values handled as specified.
Numpy Loadtxt()
We use NumPy loadtxt() to load data from text files. It aims to be a fast reader for simply formatted text files.
Example of genfromtxt() and loadtxt()
In this example, we use both functions and observe the difference between them from their output and their definitions.
import numpy as np
# StringIO behaves like a file object
from io import StringIO
a = StringIO("M 21 72\nF 35 58")
b = np.loadtxt(a, dtype={'names': ('gender', 'age', 'weight'),
                         'formats': ('S1', 'i4', 'f4')})
print(b)
d = StringIO(u"11.3abcde")
# delimiter=[1, 3, 5] gives the width of each fixed-width field
e = np.genfromtxt(d, dtype=None, names=['intvar', 'fltvar', 'strvar'],
                  delimiter=[1, 3, 5], encoding=None)
print("\n")
print(e)
Output:
[(b'M', 21, 72.) (b'F', 35, 58.)]
(1, 1.3, 'abcde')
Explanation:
Here, we first import the NumPy library as np and StringIO from the io module. Second, we define the input a, apply the loadtxt() function to it, and print the output. Third, we define the input d, apply the genfromtxt() function to it, and print the output. In the genfromtxt() call, the delimiter is a list of field widths, so the string is split into fields of 1, 3, and 5 characters.
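The difference becomes visible as soon as a value cannot be converted. The sketch below feeds the same made-up text to both functions. It relies on the default loose=True behavior of genfromtxt(), and the variable names are chosen for this example only.

```python
import numpy as np
from io import StringIO

# Hypothetical sample: "N/A" stands in for a value that is not available.
raw = "1,2,3\n4,N/A,6"

# genfromtxt() replaces the unreadable float field with nan.
filled = np.genfromtxt(StringIO(raw), delimiter=",")
print(filled)

# loadtxt() expects every field to be convertible and raises instead.
try:
    np.loadtxt(StringIO(raw), delimiter=",")
    loadtxt_failed = False
except ValueError as exc:
    loadtxt_failed = True
    print("loadtxt raised:", exc)
```

For clean, regular files loadtxt() is enough, while genfromtxt() is the safer choice when gaps or invalid entries are expected.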
Conclusion
In this tutorial, we have learned how to use the numpy genfromtxt() function. We have described its parameters and worked through several of them in the examples, so that you understand them in depth. You can now use the function and its parameters according to your needs.