NumPy recarray Guide for Record Arrays

NumPy, short for Numerical Python, is a Python library for working with multi-dimensional arrays and matrices. It is also used for linear algebra and other scientific computations on arrays. In this article, we will be looking at numpy recarray.


What are record arrays?

Record arrays are structured arrays with one extra convenience: they let us access fields as attributes instead of by index. A record array is an instance of numpy.recarray, a subclass of ndarray. Its dtype uses the special scalar type numpy.record, which enables field access as an attribute lookup on individual elements of the array. We can construct record arrays with numpy.recarray directly, with helpers such as numpy.rec.array, or by viewing an existing structured array as a recarray.

Suppose we have a structured array whose dtype is [('x', int), ('y', float)], that is, each element has an integer field x and a float field y. With an ordinary structured array, we access these fields by indexing: arr['x'] and arr['y']. With a record array, we can access the same fields as attributes: arr.x and arr.y.
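The following is a minimal sketch of the two access styles. It uses the dtype above with made-up values and relies on np.rec.array(), one of NumPy's helpers for building a record array directly from a list of tuples:

```python
import numpy as np

# Hypothetical data for the dtype [('x', int), ('y', float)]
arr = np.rec.array([(1, 2.0), (3, 4.0)], dtype=[('x', int), ('y', float)])

# Indexing by field name works on structured and record arrays alike
print(arr['x'])  # [1 3]
# Attribute access is the extra feature a record array provides
print(arr.y)     # [2. 4.]
```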

Syntax of numpy.recarray

The syntax of the numpy recarray is:

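np.recarray(shape, dtype=None, buf=None, offset=0, strides=None, formats=None, names=None, titles=None, byteorder=None, aligned=False, order='C')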

Parameters of Numpy recarray:

shape: It is a tuple which contains the shape of the output array.

dtype: An optional parameter giving the data type of the array, usually a structured dtype that lists each field's name and type.

formats: An optional list of data types, one for each column of the array. It is used together with names when dtype is not given.

names: An optional list of strings (or a single comma-separated string) giving the name of each column.

buf: An optional buffer. If it is not passed (the default is None), a new array of the given shape is created. If a buffer is passed, the array uses the memory of that existing buffer instead.

Other Parameters:

There are other parameters as well: titles, byteorder, aligned, strides, offset and order.

Return Value:

rec: A record array (a numpy.recarray) of the given shape and data type.
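The sketch below calls the constructor directly and passes formats and names instead of dtype. Because buf is left at its default, the new array's memory is not initialized (much like np.empty()), so the fields are filled before they are read. The field names and values are illustrative:

```python
import numpy as np

# Three records with a 6-byte string field and a 64-bit integer field.
# No buf is passed, so the contents start out as arbitrary (uninitialized) data.
rec = np.recarray((3,), formats=['S6', 'i8'], names=['name', 'age'])

# Fields can be assigned as attributes on a record array
rec.name = [b'Harry', b'Rachel', b'Bailey']
rec.age = [17, 25, 21]

print(rec)
```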

Accessing arrays using indexing

Here, we will access the fields of a structured array using regular indexing. First, we import the numpy library.

import numpy as np

Next, we use the array() function from NumPy to create a structured array named ‘array’. Each record stores two fields: ‘name’, a string, and ‘age’, an integer. Here, ‘S6’ means a byte string of up to 6 characters and ‘i8’ means a 64-bit integer. As the first argument, we pass three records as tuples. As the dtype argument, we pass a list of (field name, type) pairs.

array = np.array([('Harry', 17), ('Rachel', 25), ('Bailey', 21 )], dtype=[('name', 'S6'), ('age', 'i8')])

Now, we shall print the entire array ‘array’.

print(array)

The output array is:

[(b'Harry', 17) (b'Rachel', 25) (b'Bailey', 21)]

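The ‘b’ before every name indicates that the values are byte strings (bytes objects), because ‘S6’ is a byte-string type. A Unicode type such as ‘U6’ would store ordinary Python strings instead. If we evaluate the array itself in an interactive session, its representation also shows the data type: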

array

The return value is:

array([(b'Harry', 17), (b'Rachel', 25), (b'Bailey', 21)],
      dtype=[('name', 'S6'), ('age', '<i8')])

Here we can see that the array's dtype has two fields, ‘name’ and ‘age’, along with their types.
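The field names and their layout can also be read directly from the dtype. In this short sketch with the same array, dtype.names lists the fields in order, and dtype.fields maps each name to its type and its byte offset inside a record:

```python
import numpy as np

array = np.array([('Harry', 17), ('Rachel', 25), ('Bailey', 21)],
                 dtype=[('name', 'S6'), ('age', 'i8')])

print(array.dtype.names)          # ('name', 'age')
# 'age' starts right after the 6-byte 'name' field
print(array.dtype.fields['age'])  # (dtype('int64'), 6)
print(array.dtype.itemsize)       # 14 bytes per record
```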

To access a single field (column), we index the array with the field name. For example, to get the ‘name’ column, we use ‘name’ as the index.

array['name']

Output:

array([b'Harry', b'Rachel', b'Bailey'], dtype='|S6')

To print the ‘age’ column:

array['age']

Output:

array([17, 25, 21])
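Field indexing combines with the usual row indexing and with boolean masks. The sketch below uses the same array, and the age threshold of 18 is only an illustrative value:

```python
import numpy as np

array = np.array([('Harry', 17), ('Rachel', 25), ('Bailey', 21)],
                 dtype=[('name', 'S6'), ('age', 'i8')])

# A single record, then one of its fields
print(array[1]['name'])                  # b'Rachel'
# A single field, then one of its elements
print(array['age'][0])                   # 17
# Boolean mask built from one field, used to select another field
print(array[array['age'] > 18]['name'])  # [b'Rachel' b'Bailey']
```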

The entire code is:

import numpy as np

array = np.array([('Harry', 17), ('Rachel', 25), ('Bailey', 21 )], dtype=[('name', 'S6'), ('age', 'i8')])

print(array)

print(array['name'])

print(array['age'])

Accessing Record Arrays Using Numpy recarray

If we want to access an array’s columns as attributes instead of by indexing, we use a record array. We first import NumPy and then create the same structured array with np.array().

import numpy as np

array = np.array([('Harry', 17), ('Rachel', 25), ('Bailey', 21 )], dtype=[('name', 'S6'), ('age', 'i8')])

Next, we use the array's view() method to create a record array. We call array.view() and pass np.recarray as the argument. This returns a record array that shares the same data as the original array, so nothing is copied.

array = array.view(np.recarray)
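Because view() reinterprets the existing data instead of copying it, a change made through the record array is also visible in the original structured array. This small sketch checks that behavior. The call to np.shares_memory() only confirms that both objects use the same buffer:

```python
import numpy as np

array = np.array([('Harry', 17), ('Rachel', 25), ('Bailey', 21)],
                 dtype=[('name', 'S6'), ('age', 'i8')])
rec = array.view(np.recarray)

# Write through the record array's attribute-style field access
rec.age[0] = 18

print(array['age'][0])               # 18, the original sees the change
print(np.shares_memory(array, rec))  # True
```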

Now, we can access the columns as attributes of the array. To access the ‘name’ column, we use array.name.

array.name

The output is:

array([b'Harry', b'Rachel', b'Bailey'], dtype='|S6')

Similarly, to access the ‘age’ column, we use array.age.

array.age

The output will be:

array([17, 25, 21])

To access a field of a single record, we first index the record and then use the field name as an attribute:

print(array[1].name)

This prints the ‘name’ field of the second record, i.e., the record at index 1:

b'Rachel'

The entire code is:

import numpy as np

array = np.array([('Harry', 17), ('Rachel', 25), ('Bailey', 21 )], dtype=[('name', 'S6'), ('age', 'i8')])

array = array.view(np.recarray)

print(array)

print(array.name)

print(array.age)
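NumPy can also build a record array in a single step from a list of tuples with np.rec.fromrecords(). In this sketch, NumPy infers the field types from the data and we pass only the field names. The strings then get a Unicode dtype (here ‘<U6’) instead of ‘S6’, so the names come back as str without the b prefix:

```python
import numpy as np

rec = np.rec.fromrecords([('Harry', 17), ('Rachel', 25), ('Bailey', 21)],
                         names='name,age')

print(type(rec).__name__)  # recarray
print(rec.name)            # ['Harry' 'Rachel' 'Bailey']
print(rec[1].age)          # 25
```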

Converting numpy.recarray to list

To convert a numpy array into a Python list, we use the tolist() method. We convert the ‘age’ column of the record array into a list and save it as a new list named ‘rec_array_list’.

The python code is:

import numpy as np

array = np.array([('Harry', 17), ('Rachel', 25), ('Bailey', 21 )], dtype=[('name', 'S6'), ('age', int)])

rec_array = array.view(np.recarray)

rec_array_list = rec_array.age.tolist()

print(rec_array_list)

The output is:

[17, 25, 21]
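tolist() also works on the whole record array, in which case each record becomes a tuple. The short sketch below uses the same data. Because ‘name’ is an ‘S6’ field, its values come back as bytes, and decoding them (assuming ASCII text) gives ordinary Python strings:

```python
import numpy as np

array = np.array([('Harry', 17), ('Rachel', 25), ('Bailey', 21)],
                 dtype=[('name', 'S6'), ('age', int)])
rec_array = array.view(np.recarray)

# The whole record array becomes a list of tuples, one per record
print(rec_array.tolist())  # [(b'Harry', 17), (b'Rachel', 25), (b'Bailey', 21)]

# The 'S6' field yields bytes; decode them to get Python strings
names = [n.decode('ascii') for n in rec_array.name.tolist()]
print(names)               # ['Harry', 'Rachel', 'Bailey']
```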

Numpy.recarray to pandas dataframe

We can also convert a numpy recarray into a pandas DataFrame. For that, we first import the pandas library.

Then, we pass ‘rec_array’ to the DataFrame() constructor from pandas. We store the result in a variable ‘df’ and print both its value and its type, which is a pandas DataFrame.

import numpy as np

import pandas as pd

array = np.array([('Harry', 17), ('Rachel', 25), ('Bailey', 21 )], dtype=[('name', 'S6'), ('age', int)])

rec_array = array.view(np.recarray)

df = pd.DataFrame(rec_array)

print(df)

print(type(df))

The output is:

        name  age
0   b'Harry'   17
1  b'Rachel'   25
2  b'Bailey'   21
<class 'pandas.core.frame.DataFrame'>

Adding new records to a numpy record array

To add new records to a record array, we use the np.append() function. We create the new record with the same dtype and convert it into a record array as well, and then append it to the existing record array.

import numpy as np

array = np.array([('Harry', 17), ('Rachel', 25), ('Bailey', 21 )], dtype=[('name', 'S6'), ('age', int)])

rec_array = array.view(np.recarray)

new = np.array([('Sue',16)], dtype=[('name', 'S6'), ('age', int)])

new = new.view(np.recarray)

rec_array = np.append(rec_array, new)

print(rec_array)

The output is:

[(b'Harry', 17) (b'Rachel', 25) (b'Bailey', 21) (b'Sue', 16)]
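np.append() returns a new array rather than growing the existing one, so the original keeps its three records unless the result is assigned back, as the code above does. The sketch below keeps both arrays to show this. As a precaution, it applies .view(np.recarray) to the result so that attribute access is guaranteed to be available. When many records are added, collecting them first and building the array once avoids copying the whole array on every append.

```python
import numpy as np

dt = [('name', 'S6'), ('age', int)]
rec_array = np.array([('Harry', 17), ('Rachel', 25), ('Bailey', 21)],
                     dtype=dt).view(np.recarray)
new = np.array([('Sue', 16)], dtype=dt).view(np.recarray)

# np.append returns a new, larger array; rec_array itself is unchanged
combined = np.append(rec_array, new)
# View the result as a record array so attribute access is available
combined = combined.view(np.recarray)

print(len(rec_array), len(combined))  # 3 4
print(combined.name[-1])              # b'Sue'
```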

Adding columns to existing record array

To add a column to an existing record array, we first convert it into a pandas DataFrame and add the column to the DataFrame. Then we convert the DataFrame back to a numpy record array with its to_records() method. Passing index=False leaves out the DataFrame's index.

import numpy as np

import pandas as pd

array = np.array([('Harry', 17), ('Rachel', 25), ('Bailey', 21)], dtype=[('name', 'S6'), ('age', int)])

rec_array = array.view(np.recarray)

df = pd.DataFrame(rec_array)

df['grade'] = ['A', 'B', 'A']

new = df.to_records(index=False)

print(new)

The output with the added column value is:

[(b'Harry', 17, 'A') (b'Rachel', 25, 'B') (b'Bailey', 21, 'A')]
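If pandas is not otherwise needed, NumPy ships a helper module, numpy.lib.recfunctions, whose append_fields() function returns a new array with an extra field. This minimal sketch reuses the ‘grade’ values from the pandas example. The arguments usemask=False and asrecarray=True request a plain record array rather than the masked array that append_fields() returns by default. Unlike the pandas route, the ‘name’ field keeps its ‘S6’ dtype here.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

array = np.array([('Harry', 17), ('Rachel', 25), ('Bailey', 21)],
                 dtype=[('name', 'S6'), ('age', int)])
rec_array = array.view(np.recarray)

# Returns a new array; rec_array itself keeps only 'name' and 'age'
new = rfn.append_fields(rec_array, 'grade', np.array(['A', 'B', 'A']),
                        usemask=False, asrecarray=True)

print(new.dtype.names)  # ('name', 'age', 'grade')
print(new.grade)        # ['A' 'B' 'A']
```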

Methods of Numpy Recarray

Numpy.recarray has several methods to work on record arrays. We will be looking at some of the methods here.

Numpy.recarray.all()

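The numpy.recarray.all() method checks whether all elements evaluate to True and returns True only if they do. For numbers, any non-zero value counts as True. Record arrays inherit this method from ndarray, so it also works on a single column.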

import numpy as np


array = np.array([(97.4, 17), (0, 25), (58, 21 )], dtype=[('marks', 'f8'), ('age', 'i8')])

rec_array = array.view(np.recarray)

# Columns are ordinary arrays, so all() can be called on them directly
print(rec_array.marks.all())

print(rec_array.age.all())

Output:

False
True

The first output is False because the ‘marks’ value of the second record is 0, which evaluates to False, so not every value is True. The second output is True because every value in the ‘age’ column is non-zero.
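all() is often combined with a comparison to check a condition across every record. Its counterpart any() checks whether at least one element is True. This small sketch uses the same data, and the threshold of 18 is just an illustrative value:

```python
import numpy as np

array = np.array([(97.4, 17), (0, 25), (58, 21)],
                 dtype=[('marks', 'f8'), ('age', 'i8')])
rec_array = array.view(np.recarray)

# any(): True if at least one element is non-zero
print(rec_array.marks.any())        # True
# all() on a comparison: does every record satisfy the condition?
print((rec_array.age >= 18).all())  # False, because one age is 17
```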

Numpy.recarray.argmax()

The numpy.recarray.argmax() method returns the index of the maximum value. Here we apply it to individual columns of the record array.

import numpy as np

array = np.array([(97.4, 17), (0, 25), (58, 21 )], dtype=[('marks', 'f8'), ('age', 'i8')])

rec_array = array.view(np.recarray)

# argmax() returns the position of the largest value in a column
index1 = rec_array.marks.argmax()
print(rec_array[index1].marks)

index2 = rec_array.age.argmax()
print(rec_array[index2].age)

Here we store the indices of the maximum values of the ‘marks’ and ‘age’ columns in ‘index1’ and ‘index2’, respectively. Then we use each index to select the record and print the corresponding field.

The output is:

97.4
25
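The index returned by argmax() can also select the entire record, which makes the other fields of that row available too. This sketch uses the same data and also calls argmin(), the counterpart that returns the index of the smallest value:

```python
import numpy as np

array = np.array([(97.4, 17), (0, 25), (58, 21)],
                 dtype=[('marks', 'f8'), ('age', 'i8')])
rec_array = array.view(np.recarray)

# Use the index from one column to fetch the whole record
best = rec_array[rec_array.marks.argmax()]
print(best.marks, best.age)      # 97.4 17

# argmin() is the counterpart for the smallest value
print(rec_array.marks.argmin())  # 1
```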

Numpy.recarray.sort()

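The numpy.recarray.sort() method sorts an array in place and returns None. To get a sorted copy instead, we use np.sort(). Calling sort() in place on a single column such as rec_array.marks would reorder only that column inside the record array, which separates each mark from the age it belongs to. So here we use np.sort() on the column, which leaves rec_array unchanged.

import numpy as np

array = np.array([(97.4, 17), (0, 25), (58, 21)], dtype=[('marks', 'f8'), ('age', 'i8')])

rec_array = array.view(np.recarray)

# np.sort() returns a sorted copy; rec_array itself is not modified
sorted_marks = np.sort(rec_array.marks)
print(sorted_marks)

In the above code, we sort a copy of the ‘marks’ column of the recarray and print it. The marks are printed in ascending order.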

Output:

[ 0.  58.  97.4]
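Sorting one column on its own separates it from the rest of its record. To reorder entire records by a field, pass the field name as the order argument of np.sort() or of the in-place sort() method. Here is a sketch with the same data:

```python
import numpy as np

array = np.array([(97.4, 17), (0, 25), (58, 21)],
                 dtype=[('marks', 'f8'), ('age', 'i8')])
rec_array = array.view(np.recarray)

# Sorted copy of the whole record array, ordered by the 'marks' field
by_marks = np.sort(rec_array, order='marks')
print(by_marks['marks'])  # [ 0.  58.  97.4]
print(by_marks['age'])    # [25 21 17], ages stay with their marks

# The in-place variant reorders rec_array (and the array it views)
rec_array.sort(order='age')
print(rec_array['age'])   # [17 21 25]
```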

FAQs

What is the difference between numpy recarray and numpy structured array?

The main difference is how we access the fields. With a structured array, we access a field by indexing, i.e., by passing the column name as the index to the array. A recarray supports the same indexing, and it also lets us access a field by using the column name as an attribute name. Because recarray is a subclass of ndarray, all the usual array methods still work on it.
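Attribute access has one catch. A recarray looks up ndarray's own attributes first, so a field whose name matches one of them (such as size or shape) can only be reached by indexing. This sketch shows the behavior with a hypothetical field called ‘size’:

```python
import numpy as np

# Hypothetical data with a field named 'size', which clashes with ndarray.size
rec = np.rec.array([(1, 10), (2, 20), (3, 30)],
                   dtype=[('id', 'i8'), ('size', 'i8')])

print(rec.id)       # [1 2 3], attribute access works for 'id'
print(rec.size)     # 3, this is ndarray.size (the element count), not the field
print(rec['size'])  # [10 20 30], indexing still reaches the field
```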

That is it for numpy recarray in Python. If you have any questions, do let us know in the comments below.

Until then, keep learning!