Numpy Variance | What the var() Function Does in Numpy

In today’s article, we will learn about the Numpy var() function. The Numpy variance function calculates the variance of Numpy array elements. Variance is the average of the squared deviations from the mean, i.e., var = mean(abs(x - x.mean())**2). The mean is x.sum() / N, where N = len(x) for an array x. By default, the variance is computed over the flattened array; otherwise, it is computed over the specified axis.
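The formula above can be checked by hand. The following is a minimal sketch that assumes a small float array chosen only for illustration; the variable names are not part of the Numpy API.

```python
import numpy as np

# Illustrative data (an assumption, not from the article)
x = np.array([1.0, 2.0, 3.0, 4.0])

# Mean as defined in the text: x.sum() / N
mean = x.sum() / len(x)

# Variance as the average of the squared deviations from the mean
manual_var = np.mean(np.abs(x - mean) ** 2)

print(manual_var)  # 1.25
print(np.var(x))   # 1.25
```

Both lines print the same value because np.var() uses the divisor N by default.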

In layman’s terms, variance is the expected squared deviation of a variable from its mean. Numpy var() calculates this quantity over an array of numbers. You get a single value when the variance is taken over the whole array, and a Numpy array of variances when you specify an axis. In this post, we’ll look at this variance function in detail.

Syntax of Numpy var():

numpy.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)

Parameter of Numpy Variance

a = Array containing elements whose variance is to be calculated

axis = The axis or axes along which the variance is computed. The default is None, which computes the variance of the flattened array. It can also be an int or a tuple of ints to compute the variance along one particular axis or along several axes, respectively. (Optional)

dtype = Data type to use in computing the variance. The default is float64 for arrays of integer type; for arrays of float types, it is the same as the array type. (Optional)

out = Alternate output array in which to place the result. It must have the same shape as the expected output, but the type is cast if needed. (Optional)

ddof = “Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. ddof is zero by default. (Optional)

keepdims = If this is set to True, the reduced axes are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array. If the default value is used, keepdims is not passed through to the var() method of sub-classes of ndarray; any non-default value is passed through. (Optional)
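To make the ddof and keepdims parameters concrete, here is a small sketch. The 2x3 array and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative 2x3 array (an assumption, not from the article)
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# ddof=0 (the default) divides by N; ddof=1 divides by N - 1
pop_var = np.var(data)             # 17.5 / 6
sample_var = np.var(data, ddof=1)  # 17.5 / 5

# keepdims=True keeps the reduced axis as a dimension of size one
row_var = np.var(data, axis=1)
row_var_kept = np.var(data, axis=1, keepdims=True)

print(sample_var)          # 3.5
print(row_var.shape)       # (2,)
print(row_var_kept.shape)  # (2, 1)

# The kept dimension lets the result broadcast against the input
print((data / row_var_kept).shape)  # (2, 3)
```

Setting ddof=1 is the usual choice when the data is a sample and you want the sample variance rather than the population variance.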

Return type of Numpy var() function in Python:

Returns variance of the data elements of the input array. If out=None, returns a new array containing the variance; otherwise, a reference to the output array is returned.
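The behaviour of the out parameter described above can be sketched as follows. The array and the name buffer are assumptions used only for this illustration.

```python
import numpy as np

# Illustrative 2x3 array (an assumption, not from the article)
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Pre-allocated array with the shape of the expected result:
# one variance per column when axis=0
buffer = np.empty(3)

result = np.var(data, axis=0, out=buffer)

print(result is buffer)  # True
print(buffer)            # [2.25 2.25 2.25]
```

The result is written into buffer, and the returned object is that same array rather than a new one.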

Example of Numpy Variance:

import numpy as np 
  
# create array 
array = np.arange(10) 
print(array) 

r = np.var(array) 
print("\nvariance: ", r)

Output:

[0 1 2 3 4 5 6 7 8 9]

variance:  8.25

Explanation

In the above example, the Numpy var() function calculates the variance of an array created with np.arange(10). All parameters other than the array are optional, so passing only the array is enough to get its variance.

Numpy Variance var() with desired dtype

import numpy as np  
      
# 1D array  
a = [20, 2, 7, 1, 34]  
  
print("array : ", a)  
print("var of array : ", np.var(a))  
  
print("\nvar of array : ", np.var(a, dtype = np.float32))  
print("\nvar of array : ", np.var(a, dtype = np.float64))

Output:

array :  [20, 2, 7, 1, 34]
var of array :  158.16

var of array :  158.16

var of array :  158.16

Explanation:

In the above example, we first print the variance of the given 1D array without the dtype parameter. dtype is the data type used while computing the variance. It is optional and, by default, is float64 for integer type arrays. When we include the dtype parameter, the variance is computed and returned in the requested dtype. Here, we set dtype to float32 and float64, respectively.
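One way to see the effect of dtype is to inspect the type of the returned value. This is a minimal sketch that reuses the list from the example; the variable names are assumptions.

```python
import numpy as np

a = [20, 2, 7, 1, 34]

# Integer input: the computation defaults to float64
default_var = np.var(a)

# Requesting float32: computed and returned in single precision
single_var = np.var(a, dtype=np.float32)

print(default_var.dtype)  # float64
print(single_var.dtype)   # float32
```

A lower-precision dtype uses less memory but can lose accuracy on large or widely spread data, so float64 is generally the safer default.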

Numpy Variance function in Python for multi dimensional array

import numpy as np  
      
# 2D array  
arr = [[2, 2, 2, 2, 2],
       [15, 6, 27, 8, 2],
       [23, 2, 54, 1, 2],
       [11, 44, 34, 7, 2]]
  
      
# var of the flattened array  
print("\nvar of arr, axis = None : ", np.var(arr))  
      
# var along the axis = 0  
print("\nvar of arr, axis = 0 : ", np.var(arr, axis = 0))  
  
# var along the axis = 1  
print("\nvar of arr, axis = 1 : ", np.var(arr, axis = 1))

Output:

var of arr, axis = None :  236.14000000000004

var of arr, axis = 0 :  [ 57.1875 312.75   345.6875   9.25     0.    ]

var of arr, axis = 1 :  [  0.    77.04 421.84 269.04]

Explanation:

In the above example, the function calculates the variance of the given multidimensional array for different values of the axis parameter. When the axis is None, which is the default value, it calculates the variance of the flattened array. When the axis is 0, it calculates the variance down the rows, giving one value per column (five values here). When the axis is 1, it calculates the variance across the columns, giving one value per row (four values here).
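A quick way to see which axis is collapsed is to compare the shape of the input with the shape of each result. The sketch below reuses the 4x5 array from the example; the tuple axis=(0, 1) is an extra illustration not shown in the example.

```python
import numpy as np

arr = np.array([[2, 2, 2, 2, 2],
                [15, 6, 27, 8, 2],
                [23, 2, 54, 1, 2],
                [11, 44, 34, 7, 2]])

print(arr.shape)                  # (4, 5)
print(np.var(arr, axis=0).shape)  # (5,) -> one variance per column
print(np.var(arr, axis=1).shape)  # (4,) -> one variance per row

# Reducing over both axes at once matches the flattened variance
both_axes = np.var(arr, axis=(0, 1))
print(np.isclose(both_axes, np.var(arr)))  # True
```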

Numpy var() vs. statistics.variance()

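The statistics module in the Python standard library also calculates the variance of given data through its variance() function. Note that statistics.variance() returns the sample variance (divisor N - 1), whereas Numpy var() returns the population variance (divisor N) by default; statistics.pvariance() is the population counterpart. However, the statistics module does not work well with a multi-dimensional array because: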

The statistics module does not create multidimensional arrays. We need the Numpy library for that.

Also, it has no parameter to specify the axis along which the variance is to be calculated for multidimensional arrays.

Syntax of statistics.variance()

statistics.variance(data, xbar=None)

Here, data is an iterable of at least two real-valued numbers, which may include Decimal and Fraction values, and xbar is the mean of data. The xbar parameter is optional. If it is not given, the mean is calculated automatically.

Example of statistics.variance()

import statistics

dataset = [21, 19, 11, 21, 19, 46, 29]
output = statistics.variance(dataset) 

print(output)

Output:

124.23809523809524
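Because statistics.variance() divides by N - 1 while Numpy var() divides by N by default, the two give different numbers on the same data unless ddof is adjusted. The following sketch compares them on the dataset above; the variable names are assumptions.

```python
import statistics
import numpy as np

dataset = [21, 19, 11, 21, 19, 46, 29]

# Standard library: sample variance (N - 1) and population variance (N)
sample_stats = statistics.variance(dataset)
pop_stats = statistics.pvariance(dataset)

# Numpy equivalents, selected through ddof
sample_np = np.var(dataset, ddof=1)
pop_np = np.var(dataset)  # ddof=0 by default

print(np.isclose(sample_stats, sample_np))  # True
print(np.isclose(pop_stats, pop_np))        # True
```

So np.var(dataset, ddof=1) is the call to use when you want Numpy to match statistics.variance().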

Variance of Array without NumPy

We can calculate the variance without using the Numpy module. The following example illustrates how it’s done:

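def variance(a, n):
    # Mean of the n elements (total avoids shadowing the built-in sum)
    total = 0
    for i in range(n):
        total += a[i]
    mean = total / n
    # Sum of the squared differences from the mean
    sq_diff = 0
    for i in range(n):
        sq_diff += (a[i] - mean) * (a[i] - mean)
    # Population variance: divide by n
    return sq_diff / n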

arr = [500, 460, 270, 400, 350] 
n = len(arr) 
print("Variance: ", int(variance(arr, n)))

Output:

Variance:  6584

Explanation:

The variance depends on the squared difference between each value and the mean. As a result, the farther the values lie from the mean, the larger the variance. In the above example, we created a function named variance() that accepts the array and its length and returns its variance. First, the mean is calculated, then the sum of the squared differences, which is finally divided by the number of elements.
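The same two steps can also be written more compactly with the built-in sum() and a generator expression. This is only a sketch; the function name population_variance is hypothetical and not part of any library.

```python
def population_variance(values):
    """Return the population variance (divisor N) of a non-empty sequence."""
    n = len(values)
    if n == 0:
        raise ValueError("variance requires at least one value")
    # Step 1: the mean
    mean = sum(values) / n
    # Step 2: the average of the squared differences from the mean
    return sum((v - mean) ** 2 for v in values) / n


arr = [500, 460, 270, 400, 350]
print(population_variance(arr))  # 6584.0
```

Taking only the sequence as an argument avoids passing a length that could disagree with the data.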

Conclusion

In conclusion, this article covers the Numpy variance function in Python. The var() function is used to find the variance of a given data set. Importing the Numpy module lets you create an ndarray and compute statistics such as the mean, the standard deviation, and the variance over it using functions built into the Numpy module itself. You can refer to the above examples for any queries regarding the Numpy var() function in Python.

If you have any doubts or questions, do let me know in the comment section below. I will try to help you as soon as possible.

Happy Pythoning!