This article explains how to draw and read a scree plot in Python. A scree plot is used in PCA (Principal Component Analysis) to show how much variation each principal component captures. It can be drawn as a line graph or as a bar diagram.
A scree plot is a graph of the eigenvalues of the principal components (or factors). It helps decide how many components to retain in PCA (Principal Component Analysis) or how many factors to retain in FA (Factor Analysis). Reading the plot this way is also known as the scree test. Because the eigenvalues are ordered from largest to smallest, the curve always slopes downward.
Importance of Scree Plot in PCA
PCA is a dimensionality reduction technique that transforms a high-dimensional data set into a new, lower-dimensional one while preserving as much of the information in the original data as possible. Whenever we work with PCA, we encounter eigenvalues and eigenvectors.
A scree plot is a tool for checking whether PCA is working well on our data. The principal components are ordered by the amount of variation they capture and are labeled PC1, PC2, PC3, and so on. PC1 captures the most variation, PC2 the next most, and so on. The advantage is that if PC1, PC2, and PC3 together capture most of the variation, we can ignore the rest.
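To make "capture most of the variation" concrete, the sketch below turns a list of eigenvalues into cumulative variance shares and reports how many leading components are needed to reach a chosen share. The function name `components_for_threshold`, the threshold value, and the example eigenvalues are all made up for illustration.

```python
import numpy as np

def components_for_threshold(eigenvalues, threshold=0.95):
    """Return (k, cumulative), where k is the smallest number of leading
    components whose cumulative variance share reaches `threshold`."""
    # Sort from largest to smallest, as a scree plot does.
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    ratios = eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(ratios)
    # Index of the first cumulative share >= threshold, as a 1-based count.
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return min(k, len(eigenvalues)), cumulative

# Hypothetical eigenvalues, not taken from any real data set.
example_eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]
k, cumulative = components_for_threshold(example_eigenvalues, threshold=0.85)
print("components needed:", k)
print("cumulative shares:", cumulative)
```

With these made-up numbers the first three components are enough to pass the 85% mark, so PC4 and PC5 could be dropped.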
Scree Plot Criterion
The scree plot is a graphical method for determining the number of principal components to keep. It shows the eigenvalue of each principal component.
The graph shows eigenvalues on the y-axis and the number of components (or factors) on the x-axis. It is a downward curve. Most scree plots have a similar shape: PC1 accounts for most of the variation, PC2 for a moderate amount, and the remaining components each account for only a small part, so the tail of the curve flattens out. The usual criterion is to keep the components to the left of the point where the curve flattens (the "elbow").
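A numeric rule of thumb that is often used next to the visual check is the Kaiser criterion: keep the components whose eigenvalue of the correlation matrix is greater than 1. The sketch below is illustrative only; the synthetic data and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic data (an assumption for illustration): 200 samples of 6 variables
# that are driven by 2 underlying signals plus some noise.
signals = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
data = signals @ mixing + 0.3 * rng.normal(size=(200, 6))

# Eigenvalues of the correlation matrix, ordered from largest to smallest.
corr = np.corrcoef(data, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Kaiser criterion: keep components whose eigenvalue exceeds 1.
n_keep = int(np.sum(eigenvalues > 1.0))
print("eigenvalues:", np.round(eigenvalues, 3))
print("components kept by the Kaiser criterion:", n_keep)
```

The eigenvalues of a correlation matrix sum to the number of variables, so an eigenvalue above 1 means the component explains more than a single standardized variable would on its own.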
Steps to be followed in PCA
- First, standardize the data.
- Second, calculate the covariance matrix.
- Third, find the eigenvalues and eigenvectors of that covariance matrix.
- Fourth, sort the eigenvalues (together with their eigenvectors) in descending order.
- Fifth, transform the original data by projecting it onto the selected eigenvectors.
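The five steps above can be sketched in plain NumPy as follows. This is a minimal illustration, not a replacement for a library implementation; the function name `pca_steps` and the random demo data are assumptions.

```python
import numpy as np

def pca_steps(X, n_components):
    """PCA following the five steps listed above.
    Returns (projected_data, eigenvalues_sorted_descending)."""
    X = np.asarray(X, dtype=float)
    # 1. Standardize each column to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized data.
    cov = np.cov(Z, rowvar=False)
    # 3. Eigenvalues and eigenvectors (eigh is meant for symmetric matrices).
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort eigenvalues and matching eigenvectors from largest to smallest.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # 5. Project the standardized data onto the leading eigenvectors.
    projected = Z @ eigenvectors[:, :n_components]
    return projected, eigenvalues

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(50, 4))  # made-up data: 50 samples, 4 features
projected, eigenvalues = pca_steps(X_demo, n_components=2)
print(projected.shape)
print(eigenvalues)
```

The sorted `eigenvalues` array returned here is exactly what a scree plot puts on its y-axis.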
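    import numpy as np
    import matplotlib.pyplot as plt

    # Random data: 6 samples, 9 features (for illustration only)
    N = np.random.randn(6, 9)
    # 9 x 9 symmetric matrix N^T N
    N = N.T @ N
    # For a symmetric positive semi-definite matrix, the singular values
    # are its eigenvalues, so they must not be squared again
    A, B, C = np.linalg.svd(N)
    # Normalize so the values sum to 1
    eigen_values = B / np.sum(B)

    figure = plt.figure(figsize=(10, 6))
    sing_vals = np.arange(len(eigen_values)) + 1
    plt.plot(sing_vals, eigen_values, 'ro-', linewidth=2)
    plt.title('Scree Plot')
    plt.xlabel('Principal Component')
    plt.ylabel('Eigenvalue')
    plt.show()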
This code draws the scree plot. The x-axis shows the principal component number, and the y-axis shows the eigenvalues, normalized so that they sum to 1. The result is a downward curve.
Applications of PCA
- PCA is useful for image compression. We can choose how many components to keep, and so how strongly the image is compressed.
- It is also used in food science.
- It is used in banking and in the healthcare industry.
- It is used in finance.
PCA using sklearn library
    import numpy as np
    from sklearn import decomposition
    from sklearn import datasets
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D
First, import the necessary libraries: NumPy, matplotlib, and sklearn. From sklearn, we import the datasets and decomposition modules.
    np.random.seed(5)
    a = datasets.load_iris()
    X = a.data      # feature matrix (one row per flower)
    y = a.target    # class labels
Next, load the iris data set. X holds the feature matrix, and y holds the target (class) labels.
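    figure = plt.figure(1, figsize=(5, 4))
    plt.clf()
    # Create the 3D axis through the figure; recent matplotlib versions
    # no longer attach Axes3D(figure, ...) to the figure automatically
    axis = figure.add_subplot(111, projection='3d', elev=45, azim=132)
    axis.set_position([0, 0, 0.95, 1])
    pca = decomposition.PCA(n_components=3)
    pca.fit(X)
Next, create the figure and a 3D axis, then fit a PCA with 3 components.
    X = pca.transform(X)
    # Reorder the labels so the classes get clearly different colors
    y = np.choose(y, [1, 2, 0]).astype(float)
    axis.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap=plt.cm.nipy_spectral, edgecolor='k')
    # Hide the tick labels on all three axes
    axis.xaxis.set_ticklabels([])
    axis.yaxis.set_ticklabels([])
    axis.zaxis.set_ticklabels([])
    plt.show()
Finally, project the data onto the three components, draw the 3D scatter plot, and display it with plt.show().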
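The same sklearn objects can also produce a scree plot directly. The sketch below fits a PCA with all components on the iris data and plots the `explained_variance_` attribute, which holds the eigenvalues. Standardizing with `StandardScaler` first is an assumption made here to follow the usual PCA steps, and the variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, decomposition
from sklearn.preprocessing import StandardScaler

# Load the iris data and standardize the four features.
X_iris = datasets.load_iris().data
X_scaled = StandardScaler().fit_transform(X_iris)

# Fit PCA with all components so that every eigenvalue is available.
pca_full = decomposition.PCA().fit(X_scaled)
eigenvalues = pca_full.explained_variance_    # variance along each component
ratios = pca_full.explained_variance_ratio_   # share of the total variance

# Scree plot: component number on the x-axis, eigenvalue on the y-axis.
components = np.arange(1, len(eigenvalues) + 1)
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(components, eigenvalues, "ro-", linewidth=2)
ax.set_xticks(components)
ax.set_title("Scree Plot (iris)")
ax.set_xlabel("Principal Component")
ax.set_ylabel("Eigenvalue")
plt.show()

print("explained variance ratios:", np.round(ratios, 3))
```

Plotting `ratios` instead of `eigenvalues` gives the same shape on a 0 to 1 scale, which can be easier to read when comparing data sets.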
Factor Analysis using PCA
Factor analysis is a statistical method for discovering root causes, or hidden factors, that are present in a data set but not directly observable. Using factor analysis, we can find latent variables that explain the pattern of observed behavior.
It explains the variance among the observed variables by condensing them into a smaller set of unobserved variables. These are called factors.
- The data set should not contain outliers.
- The sample size should be greater than the number of factors.
- For example, with 10 samples and 3 factors, the data set can be used for FA.
- Fraud detection
- Spam detection
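As a rough illustration of factor analysis in code, the sketch below fits sklearn's `FactorAnalysis` to the iris data. The choice of two factors and the standardization step are assumptions made for the example, not recommendations.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Load and standardize the four iris features.
X_iris = datasets.load_iris().data
X_scaled = StandardScaler().fit_transform(X_iris)

# Two latent factors is an arbitrary choice for illustration.
fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(X_scaled)  # factor scores, one row per sample

# Loadings: how strongly each observed variable depends on each factor.
print("loadings (factors x variables):")
print(np.round(fa.components_, 2))
# Variance of each variable that the factors leave unexplained.
print("noise variance per variable:", np.round(fa.noise_variance_, 2))
```

Unlike PCA, the model estimates a separate noise variance for every observed variable, which is what `noise_variance_` reports.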
Difference between PCA and FA
|PCA stands for Principal Component Analysis||FA stands for Factor Analysis|
|It is useful to transform the data from a larger to a smaller number of components.||It is useful to understand the underlying “cause.”|
|It is commonly computed with SVD (Singular Value Decomposition) or with an eigendecomposition of the covariance matrix.||It is also known as Common Factor Analysis.|
|It explains the cumulative variance in the predictors.||It explains the correlation between the variables.|
FAQs related to scree plot in python
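A scree plot shows eigenvalues on the y-axis and the number of components or factors on the x-axis.
Because the eigenvalues are sorted from largest to smallest, the scree plot always shows a downward curve.
A scree plot is useful for deciding how many components to keep in PCA (Principal Component Analysis) or how many factors to keep in FA (Factor Analysis).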
Another name for the scree plot is the scree test.
In The End
We have looked at the scree plot in Python. A scree plot is nothing more than a simple graph of eigenvalues. We covered what a scree plot is, the scree plot criterion, the importance of the scree plot in PCA, and the applications of PCA. We hope this article was easy to follow and useful.