How NumPy Extrapolation Is Changing the Game in Data Analysis

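What do you do when you need the value of a function outside the range of your data? That is extrapolation, a common task in scientific computing. NumPy has no single function named extrapolate; instead, you combine functions such as numpy.polyfit() and numpy.polyval(), or use SciPy's interpolators with extrapolation enabled. Let’s look at how that works.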

About numpy extrapolate

Extrapolation estimates the value of a function beyond the range of the known data points. The same fitted model can also fill in missing data points inside the range, which is interpolation.
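
As a minimal sketch of the difference between staying inside the range and going beyond it, the snippet below compares np.interp, which returns the nearest end value for points outside the known range, with a straight line fitted by np.polyfit. The sample arrays are illustrative only.

```python
import numpy as np

# Illustrative data lying on the line y = 2x
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# np.interp only interpolates: outside [1, 5] it returns the nearest end value
clamped = np.interp(6, x, y)

# A fitted line, by contrast, can be evaluated at any x
slope, intercept = np.polyfit(x, y, 1)
extrapolated = slope * 6 + intercept

print(clamped)       # 10.0, the last known y value
print(extrapolated)  # close to 12.0
```

This is why the examples in this article fit a model first instead of calling np.interp directly.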

Use cases

Extrapolation with NumPy is used in a large number of domains. The notable ones are listed below.

Forecasting
Prediction
Trend analysis
Anomaly detection
Natural language processing
Signal processing
Research computing
Scientific computing
Financial forecasting
Economic forecasting
Real-time computing

Numpy linear extrapolation

Linear extrapolation works on one-dimensional data. You fit a straight line to the given points and then evaluate that line at new x values outside the original range, which gives estimates of the function beyond the given points.

It is done using two functions:

numpy.polyfit() function
numpy.polyval() function

How to do linear extrapolation?

First, import NumPy and use the np.array function to create two arrays for the x and y values. The polyfit() function of NumPy fits a polynomial to the data by least squares. It takes three parameters:

x values
y values
degree of polynomial (1 for linear)

It returns the polynomial coefficients ordered from the highest degree to the lowest, so for a linear fit you get the slope followed by the intercept. Next, create an array of new x values and use the polyval() function of NumPy to obtain the extrapolated y values. It takes two arguments:

obtained coefficients of your polynomial equation
new x values

It returns the corresponding y values. The example given below explains this process.

import numpy as np

# Known x and y values
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Fit a linear line to the known data points
model = np.polyfit(x, y, 1)

# New x values for which to extrapolate the y values
new_x = np.array([6, 7])

# Extrapolate the y values for the new x values
new_y = np.polyval(model, new_x)

print(new_y)

Thus, the output will be: [12. 14.]
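
The same two functions work for higher degrees. The sketch below, which assumes hypothetical data following y = x squared, shows the coefficient order returned by polyfit() and how the degree argument changes the fit.

```python
import numpy as np

# Hypothetical data following y = x**2
x = np.array([1, 2, 3, 4, 5])
y = x ** 2

# Degree 2 fit; coefficients come back highest degree first: [a, b, c] for a*x**2 + b*x + c
coeffs = np.polyfit(x, y, 2)

# Evaluate the fitted polynomial beyond the known range
new_y = np.polyval(coeffs, np.array([6, 7]))

print(np.round(coeffs, 6))  # close to [1, 0, 0]
print(np.round(new_y, 6))   # close to [36, 49]
```

Higher degrees follow the known points more closely but tend to diverge quickly outside the range, so a low degree is usually the safer choice for extrapolation.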

Numpy spline extrapolation

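You might want to fit a curve to the data instead of a straight line. Spline extrapolation does exactly that: it fits a piecewise polynomial curve through the points and then uses that curve to obtain extrapolated y values at the new x values. NumPy itself has no spline routines, so this example relies on SciPy.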

How to do spline extrapolation?

Import the required libraries, then use the np.array function to create two arrays for the x and y values. The interp1d() function from scipy.interpolate creates an interpolator object; here you specify the kind of spline you want. By default, interp1d() raises a ValueError for x values outside the original range, so pass fill_value='extrapolate' to allow extrapolation. Once this is done, create an array with new x values and, lastly, extrapolate the y values.

import numpy as np
from scipy.interpolate import interp1d

# Known x and y values
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Fit a cubic spline to the known data points and allow evaluation outside their range
interp = interp1d(x, y, kind='cubic', fill_value='extrapolate')

# New x values for which to extrapolate the y values
new_x = np.array([6, 7])

# Extrapolate the y values for the new x values
new_y = interp(new_x)

print(new_y)

Thus, the output will be approximately [12. 14.], because the sample data lie on a straight line and the spline reproduces it.
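
The sketch below illustrates how interp1d() treats points outside the known range: with its default settings it raises a ValueError, and with fill_value="extrapolate" it evaluates the spline beyond the data. The variable names strict and relaxed are illustrative only.

```python
import numpy as np
from scipy.interpolate import interp1d

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Default settings: bounds checking is on, so out-of-range input is rejected
strict = interp1d(x, y, kind="cubic")
try:
    strict(6)
    raised = False
except ValueError:
    raised = True

# With fill_value="extrapolate" the same spline is evaluated beyond the data
relaxed = interp1d(x, y, kind="cubic", fill_value="extrapolate")
value = float(relaxed(6))

print(raised)  # True
print(value)   # close to 12.0
```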

Logarithmic extrapolation

You can also extrapolate on a logarithmic scale. The steps are the same as in the approaches above, except that you first take the logarithm of the x and y values. A straight line fitted in log-log space corresponds to a power law, y = a * x**b, on the original scale.

In this example, before fitting, you take the log of the x and y values in the arrays. After that, you also convert the new x values to the log scale, evaluate the fitted line with the polyval function, and apply np.exp to bring the result back to the original scale.

import numpy as np

# Known x and y values
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Convert the x and y values to a logarithmic scale
log_x = np.log(x)
log_y = np.log(y)

# Fit a linear line to the transformed data points
model = np.polyfit(log_x, log_y, 1)

# New x values, converted to the logarithmic scale
new_x = np.array([6, 7])
log_new_x = np.log(new_x)

# Evaluate the fitted line at the new x values for which you want to extrapolate the y values
new_y = np.exp(np.polyval(model, log_new_x))

print(new_y)

Thus, you will get approximately [12. 14.] as the output, because the data follow y = 2x exactly and the log-log fit recovers that relationship.
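
To show what the log-log fit recovers, here is a small sketch on hypothetical power-law data, y = 3 * x**2. The slope of the fitted line is the exponent b, and the exponential of the intercept is the factor a.

```python
import numpy as np

# Hypothetical power-law data: y = 3 * x**2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x ** 2

# Fit a line in log-log space: log(y) = b * log(x) + log(a)
b, log_a = np.polyfit(np.log(x), np.log(y), 1)
a = np.exp(log_a)

# Extrapolate on the original scale with the recovered power law
new_x = np.array([6.0, 7.0])
new_y = a * new_x ** b

print(round(a, 6), round(b, 6))  # close to 3.0 and 2.0
print(np.round(new_y, 6))        # close to [108, 147]
```

Note that this approach needs strictly positive x and y values, since the logarithm is undefined for zero and negative numbers.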

Extrapolate multidimensional data

You might have data in more than one dimension. One simple approach is to flatten the array first, carry out the extrapolation on the flattened values using the numpy.polyfit() function, and then reshape the extrapolated values back to the original dimensions.

Limitations

Flattening discards the spatial structure of the data, so this approach only makes sense when the values follow a single trend in flattened order. The spline method is generally considered more suitable for multidimensional data because curves can capture more complex shapes. Apart from this, you can use cross-validation to check the result: split the data into train and test sets, fit the extrapolation on the train data, and compare its predictions with the test data. This gives you an estimate of the generalization error.
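
The validation idea can be sketched with a single hold-out split, which is a simplified stand-in for full cross-validation. The data here are hypothetical and noise-free, so the error comes out near zero; with real data it would give an estimate of how well the fit extrapolates.

```python
import numpy as np

# Hypothetical data following y = 2x + 1
x = np.arange(1, 11, dtype=float)
y = 2.0 * x + 1.0

# Train on the first 7 points and hold out the last 3, which lie beyond the training range
x_train, y_train = x[:7], y[:7]
x_test, y_test = x[7:], y[7:]

# Fit on the training part and extrapolate to the held-out x values
model = np.polyfit(x_train, y_train, 1)
pred = np.polyval(model, x_test)

# Root-mean-square error on the held-out points
rmse = np.sqrt(np.mean((pred - y_test) ** 2))
print(rmse)
```

Holding out the points at the end of the range, instead of random ones, matters here because the goal is to test predictions outside the training range.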

Working methodology in multidimensional data

Consider the following example for better insights. It uses the basic linear extrapolation method to keep things simple.

import numpy as np

# Known multidimensional data
data = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# Flatten the multidimensional data into a one-dimensional array
flat_data = data.flatten()

# Fit a line to the flattened values against their positions 0..7
positions = np.arange(len(flat_data))
model = np.polyfit(positions, flat_data, 1)

# Evaluate the line at positions 8..15, just past the known data
# (np.interp is not used here: it returns the end values instead of extrapolating)
new_positions = positions + len(flat_data)
extrapolated_flat_data = np.polyval(model, new_positions)

# Reshape the extrapolated one-dimensional array back into the original multidimensional shape
extrapolated_data = extrapolated_flat_data.reshape(data.shape)

print(extrapolated_data)

Thus, you will get approximately [[[ 9. 10.] [11. 12.]] [[13. 14.] [15. 16.]]] as the output, in the original multidimensional shape.
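
When the dimensions carry meaning, an alternative to flattening is to extrapolate along one axis only. The sketch below assumes a hypothetical layout where rows are time steps and each column is a separate series; np.polyfit accepts such a two-dimensional y and fits every column independently.

```python
import numpy as np

# Hypothetical layout: 4 time steps (rows), 2 independent series (columns)
t = np.array([0.0, 1.0, 2.0, 3.0])
values = np.array([[1.0, 10.0],
                   [2.0, 12.0],
                   [3.0, 14.0],
                   [4.0, 16.0]])

# One linear fit per column; result has shape (2, 2): row 0 slopes, row 1 intercepts
coeffs = np.polyfit(t, values, 1)
slopes, intercepts = coeffs

# Evaluate every fitted line at the new time steps via broadcasting
new_t = np.array([4.0, 5.0])
new_values = new_t[:, None] * slopes + intercepts

print(np.round(new_values, 6))  # close to [[5, 18], [6, 20]]
```

This keeps each series separate, so a trend in one column does not leak into another.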

Extrapolate irregularly spaced data

If your data points are irregularly spaced and you wish to extrapolate them, one approach is to first interpolate the data onto a regular grid within the known range and then extrapolate from the gridded values. This approach has limitations, however: it can be inaccurate for noisy data, and the further you move from the known range, the less reliable it becomes.

The given example explains how you can extrapolate the irregularly spaced data.

import numpy as np
from scipy.interpolate import griddata

# Known irregularly spaced data
x = np.array([1, 2.5, 3.2, 4.1, 5])
y = np.array([2, 4, 6, 8, 10])

# Create a grid of regularly spaced points inside the known x range
xi = np.linspace(x.min(), x.max(), 100)

# Interpolate the irregularly spaced data onto the grid
yi = griddata(x, y, xi, method='cubic')

# Fit a line to the gridded data so that it can be evaluated outside the known range
model = np.polyfit(xi, yi, 1)

# Evaluate the fitted line at the new points for which you want to extrapolate the values
new_x = np.array([6.5, 7.5])
new_y = np.polyval(model, new_x)

print(new_y)

The printed array holds the extrapolated y values at x = 6.5 and x = 7.5. The exact numbers depend on the interpolation method and on the degree of the fitted polynomial.
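
For one-dimensional data, the gridding step is optional, because np.polyfit does not require evenly spaced x values. A minimal sketch, assuming hypothetical data that lie exactly on y = 2x:

```python
import numpy as np

# Irregularly spaced x values with hypothetical y values on the line y = 2x
x = np.array([1.0, 2.5, 3.2, 4.1, 5.0])
y = 2.0 * x

# Least-squares fitting works directly on unevenly spaced points
model = np.polyfit(x, y, 1)

# Extrapolate beyond the last known x value
new_y = np.polyval(model, np.array([6.5, 7.5]))

print(np.round(new_y, 6))  # close to [13, 15]
```

Gridding first is mainly useful when a later step needs regularly spaced samples.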

Extrapolate using robust regression

Robust regression is useful when the data contain outliers. The only difference from the earlier approaches is that you fit a regression model to the data points and then use it to predict values outside the known range. Have a look at the given example. It uses scikit-learn's Huber regression model to fit the x and y values. Note that scikit-learn expects the x values as a two-dimensional array with one column per feature, hence the reshape(-1, 1) calls. Lastly, you just need to print and check your values.

import numpy as np
from sklearn.linear_model import HuberRegressor

# Known x and y values; x is reshaped to (n_samples, n_features) as scikit-learn expects
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 6, 8, 10])

# Fit a Huber regression model to the known data points
regressor = HuberRegressor()
regressor.fit(x, y)

# New x values for which to extrapolate the y values
new_x = np.array([6, 7]).reshape(-1, 1)

# Extrapolate the y values for the new x values
extrapolated_y = regressor.predict(new_x)
print(extrapolated_y)

The output you will get is close to [12. 14.]; the exact digits can vary slightly because the model is fitted iteratively.
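
The benefit of a robust model shows up once the data contain an outlier. The sketch below uses hypothetical data on y = 2x with one corrupted value and compares the Huber prediction at x = 12 with an ordinary least-squares line from np.polyfit.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

# Hypothetical data on the line y = 2x, with one corrupted value
x = np.arange(1, 11, dtype=float)
y = 2.0 * x
y[7] = 100.0  # outlier

# Robust fit: the outlier has limited influence on the result
huber = HuberRegressor().fit(x.reshape(-1, 1), y)
huber_pred = huber.predict(np.array([[12.0]]))[0]

# Ordinary least squares for comparison: the outlier pulls the line upward
ols_pred = np.polyval(np.polyfit(x, y, 1), 12.0)

print(huber_pred)  # expected to stay near the true value 24
print(ols_pred)    # noticeably further from 24
```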

The preferred dtypes

The two data types generally preferred for numerical work in NumPy are numpy.float32 and numpy.float64. numpy.float64 is the default floating-point type and carries roughly 15 to 17 significant decimal digits, while numpy.float32 carries about 7 and uses half the memory. Both cover a far wider range of values than the integer types, so overflow and underflow are rare for typical data. The given example illustrates how you can use float values while creating an array and extrapolate them.

import numpy as np

# Known x and y values (floating-point dtypes)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0], dtype=np.float32)
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0], dtype=np.float32)

# Fit a linear regression model to the known data points
model = np.polyfit(x, y, 1)

# New x value for which to extrapolate the y value (floating-point dtype)
new_x = 6.0

# Extrapolate the y value for the new x value
new_y = np.polyval(model, new_x)

print(new_y)
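
To compare the two types yourself, np.finfo reports the precision and range of a floating-point dtype. The following sketch also shows converting an integer array with astype before fitting; the variable names are illustrative.

```python
import numpy as np

# Approximate number of reliable decimal digits for each type
p32 = np.finfo(np.float32).precision
p64 = np.finfo(np.float64).precision
print(p32, p64)  # 6 and 15

# Largest representable values
print(np.finfo(np.float32).max, np.finfo(np.float64).max)

# Convert integer data to floating point explicitly before fitting
ints = np.array([1, 2, 3, 4, 5])
floats = ints.astype(np.float64)
print(floats.dtype)  # float64
```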

FAQs on Numpy Extrapolate

How can you improve performance while extrapolating data?

To boost performance, use vectorized functions on whole arrays instead of Python loops and prefer compiled routines over pure-Python ones. NumPy itself runs on the CPU; to use a CUDA-capable GPU you need a separate GPU array library such as CuPy.
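
As a small illustration of the vectorization advice, the sketch below evaluates the same line once with a Python loop and once with a single np.polyval call. The coefficients are hypothetical; both versions give the same values, but the vectorized call does the work in compiled code.

```python
import numpy as np

# Hypothetical fitted line: slope 2, intercept 0
model = np.array([2.0, 0.0])
new_x = np.linspace(6.0, 7.0, 1000)

# Python loop: one element at a time
looped = np.array([model[0] * v + model[1] for v in new_x])

# Vectorized: one call over the whole array
vectorized = np.polyval(model, new_x)

print(np.allclose(looped, vectorized))  # True
```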

Conclusion

This blog covers extrapolation with NumPy in detail. It discusses the different types of extrapolation that are available to users. NumPy does not provide a dedicated extrapolation function, but numpy.polyfit() and numpy.polyval(), together with SciPy's interpolators, cover the common cases. In addition, this article shows how you can extrapolate data with a regression model and which data types are preferred for extrapolation with NumPy.