A deep dive into Factor Analysis

Large datasets full of redundant attributes are a nightmare for any data scientist, since unnecessary attributes degrade the performance of machine learning algorithms. Dimensionality reduction techniques are therefore needed to reduce the number of attributes used for further analysis.

In this article, I will discuss Factor Analysis, a dimensionality reduction technique, and give an overview of this statistical method: how Factor Analysis works, and when and how to apply it to a dataset.

If you want to get an overview of Dimensionality Reduction, you can refer to this blog.

Factor Analysis is an unsupervised, probabilistic machine learning algorithm used for dimensionality reduction. It aims to regroup correlated variables into fewer latent variables, called factors, that share a common variance. The main aim of factor analysis is to explain the intercorrelations among n variables through a set of common factors (where the number of factors is less than n). In simple terms, it groups the variables into meaningful categories.

Factor Analysis is based on the idea that the latent factors live in a lower-dimensional space. The observations are modeled as a linear transformation of the latent variables plus Gaussian noise.
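To make this concrete, the standard factor analysis model (a general formulation, not tied to any particular library) writes each observed vector x in terms of a k-dimensional latent vector z:

```latex
x = \mu + \Lambda z + \varepsilon, \qquad
z \sim \mathcal{N}(0, I_k), \qquad
\varepsilon \sim \mathcal{N}(0, \Psi)
```

Here Λ is the p × k matrix of factor loadings and Ψ is a diagonal noise covariance, so all of the shared (common) variance among the observed variables is carried by the factors.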

To get accurate results, we should check a few prerequisites before applying factor analysis: an adequate sample size, no significant outliers, and no perfect multicollinearity among the variables.

Before learning how to apply factor analysis, let's briefly go over the basic terminology used later in this article.

A factor is a latent (hidden or unobserved) variable that represents a group of correlated variables sharing a common variance. The maximum number of factors is equal to the number of variables.

Eigenvalues represent the total variance that a given factor can explain. Variance cannot be negative, so negative eigenvalues imply an incorrect model. Eigenvalues close to zero, on the other hand, indicate multicollinearity, since the first factor can then take up almost all of the variance. For example, an eigenvalue of 2.5 means that the factor explains as much variance as 2.5 observed variables.

Factor loading is the correlation coefficient between a variable and a factor. It measures how much the variable contributes to the factor, so a high factor loading means that the variable is strongly represented by that factor.

Communalities are the sum of the squared loadings for each variable. They indicate the amount of variance in each variable that the factors explain. If the communality for a particular variable is low, say between 0 and 0.5, the variable will not load significantly on any factor. Rotations don't have any influence over the communalities of the variables.
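For concreteness, with m retained factors the communality of variable i is just the sum of its squared loadings:

```latex
h_i^2 = \sum_{j=1}^{m} \lambda_{ij}^2
```

where λᵢⱼ is the loading of variable i on factor j.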

The various steps involved in factor analysis are:

1. Checking the factorability of the dataset
2. Choosing the number of factors to extract
3. Interpreting the factors
4. Rotating the factors

Let’s go through each step one by one in detail.

Checking the factorability of the dataset answers the question 'can we find factors in this dataset?'. The methods to check factorability are the correlation matrix, the Kaiser-Meyer-Olkin (KMO) measure, and Bartlett's test of sphericity.

First, inspect the correlation matrix: is the dataset a combination of high and low correlations, i.e., are at least some variables substantially correlated with each other? If yes, then we can proceed with factor analysis.

The Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy tests whether the partial correlations among variables are small. It is a statistic that estimates the proportion of variance in the variables that might be caused by underlying factors. High values, i.e., close to 1, indicate that factor analysis is useful for the dataset. If the value is less than 0.5, one shouldn't proceed with factor analysis.

Bartlett's test of sphericity tests the hypothesis that the correlation matrix is an identity matrix, which would mean that the variables are unrelated and factor analysis is not applicable. Essentially, it checks whether there is enough redundancy between the variables that we can summarize them with a few factors; the null hypothesis is that the variables are orthogonal, i.e., uncorrelated. If the p-value is less than 0.05, you can go ahead with factor analysis.
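As a rough sketch, both checks can be run in Python with the factor_analyzer package (my choice of library here, not something the article prescribes); the DataFrame df below is synthetic stand-in data:

```python
import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Synthetic stand-in data: 500 observations of 10 variables driven by
# 2 latent factors plus noise (purely illustrative).
rng = np.random.default_rng(42)
scores = rng.normal(size=(500, 2))
true_loadings = rng.normal(size=(10, 2))
df = pd.DataFrame(scores @ true_loadings.T + rng.normal(scale=0.5, size=(500, 10)),
                  columns=[f"x{i}" for i in range(10)])

# Bartlett's test of sphericity: a p-value below 0.05 rejects the null
# hypothesis that the correlation matrix is an identity matrix.
chi_square, p_value = calculate_bartlett_sphericity(df)
print(f"Bartlett chi-square = {chi_square:.2f}, p-value = {p_value:.4f}")

# Kaiser-Meyer-Olkin measure: an overall KMO above 0.5 (ideally close to 1)
# suggests the data are suitable for factor analysis.
kmo_per_variable, kmo_overall = calculate_kmo(df)
print(f"Overall KMO = {kmo_overall:.3f}")
```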

If we extract too many factors, undesirable error variance enters the solution. On the other hand, extracting too few factors leaves out valuable common variance. So it's essential to select a sound way of deciding how many factors to extract.

The number of factors to retain is mainly determined by the eigenvalue criterion and the scree test, i.e., the scree plot. The eigenvalue criterion is an analytical approach, while the scree plot is a graphical one. Let's see both methods in detail.

This approach is also known as Kaiser's Criterion. In this method, all the factors with an eigenvalue above 1 are retained. An eigenvalue of more than one means that the factor explains more variance than a single variable. The reason for this cut-off is quite simple: the data are standardized, so each feature has a variance of 1, and we keep only the factors that explain more variance than a single observed variable.

It has been found that this criterion sometimes overestimates the number of factors to extract. So, a better approach is to use the scree test in conjunction with the analytical method.
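A minimal sketch of Kaiser's criterion, again using the factor_analyzer package on synthetic stand-in data:

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Synthetic stand-in data: 2 latent factors behind 10 observed variables.
rng = np.random.default_rng(42)
scores = rng.normal(size=(500, 2))
true_loadings = rng.normal(size=(10, 2))
df = pd.DataFrame(scores @ true_loadings.T + rng.normal(scale=0.5, size=(500, 10)),
                  columns=[f"x{i}" for i in range(10)])

# Fit an unrotated model just to obtain the eigenvalues of the correlation matrix.
fa = FactorAnalyzer(rotation=None)
fa.fit(df)
eigenvalues, _ = fa.get_eigenvalues()

# Kaiser's criterion: retain every factor with an eigenvalue above 1.
n_factors = int((eigenvalues > 1).sum())
print("Eigenvalues:", np.round(eigenvalues, 3))
print("Factors retained by Kaiser's criterion:", n_factors)
```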

The graphical approach is based on a visual representation of the factors' eigenvalues, called a scree plot.

The scree plot plots the eigenvalues against the factor numbers. The number of factors to retain corresponds to the data points to the left of the graph's "elbow", the point where the eigenvalues seem to level off.
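The scree plot itself can be drawn from the same eigenvalues with matplotlib; the sketch below assumes the same synthetic stand-in data as above:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Synthetic stand-in data, as in the earlier sketches.
rng = np.random.default_rng(42)
scores = rng.normal(size=(500, 2))
true_loadings = rng.normal(size=(10, 2))
df = pd.DataFrame(scores @ true_loadings.T + rng.normal(scale=0.5, size=(500, 10)),
                  columns=[f"x{i}" for i in range(10)])

fa = FactorAnalyzer(rotation=None)
fa.fit(df)
eigenvalues, _ = fa.get_eigenvalues()

# Scree plot: eigenvalue for each candidate factor; retain the factors
# to the left of the "elbow", where the curve levels off.
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.axhline(1.0, color="grey", linestyle="--", label="eigenvalue = 1")
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.legend()
plt.show()
```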

After finding the optimal number of factors, we need to interpret the factors with the help of factor loadings, communalities, and variance. Interpreting the factors is essential to determine the strength of the relationships among the variables within each factor.

We can identify the factors through their largest loadings; the zero and low loadings help confirm that identification. The signs of the loadings show the direction of the correlation and do not affect the interpretation of the magnitude of a factor loading or the number of factors to retain. Loading scores range from -1 to 1: values close to -1 or 1 indicate that the factor strongly influences the variable, while values close to 0 indicate that the factor has little influence on it.

If the factor analysis is unrotated, the variances are equal to the eigenvalues; rotation changes how the proportional variance is distributed across factors while keeping the cumulative variance the same.

Communality is the proportion of each variable's variance that the factors explain. Rotations don't have any influence over the communality of the variables. A higher communality indicates that a larger share of the variable's variance has been extracted by the factor solution. For a well-behaved factor analysis, communalities should be 0.4 or greater.
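Putting these pieces together, here is a sketch of how the loadings, communalities, and explained variance might be inspected with factor_analyzer; the synthetic data, the choice of two factors, and the varimax rotation are all illustrative assumptions:

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Synthetic stand-in data: 2 latent factors behind 10 observed variables.
rng = np.random.default_rng(42)
scores = rng.normal(size=(500, 2))
true_loadings = rng.normal(size=(10, 2))
df = pd.DataFrame(scores @ true_loadings.T + rng.normal(scale=0.5, size=(500, 10)),
                  columns=[f"x{i}" for i in range(10)])

# Two factors and a varimax rotation are illustrative assumptions.
fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(df)

# Factor loadings: correlation of each variable with each factor.
print(pd.DataFrame(fa.loadings_, index=df.columns,
                   columns=["Factor1", "Factor2"]).round(3))

# Communalities: variance in each variable explained by the retained
# factors (the sum of squared loadings per variable).
print(pd.Series(fa.get_communalities(), index=df.columns).round(3))

# Variance explained by each factor: sum of squared loadings,
# proportional variance, and cumulative variance.
ss_loadings, proportional, cumulative = fa.get_factor_variance()
print("Proportional variance:", np.round(proportional, 3))
print("Cumulative variance:  ", np.round(cumulative, 3))
```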

Applying rotation in factor analysis does not inherently improve the predictive value of the derived factors, but it does help in visualizing and interpreting them, since unrotated factors are sometimes ambiguous.

There are two types of rotation: Orthogonal Rotation & Oblique Rotation.

Orthogonal rotation keeps the factors at 90° from each other, so they remain uncorrelated. Two standard orthogonal techniques are Quartimax and Varimax rotation. Quartimax minimizes the number of factors needed to explain each variable. Varimax minimizes the number of variables with high loadings on each factor and makes small loadings even smaller.

Oblique rotation allows the factors to sit at angles other than 90° from each other, so they are allowed to correlate. The standard oblique rotation techniques are Direct Oblimin and Promax. Direct Oblimin attempts to simplify the output structure, while Promax is expedient because of its speed on larger datasets: it raises the loadings to the fourth power, which produces greater correlations among the factors and a simpler structure. The trade-off with oblique rotation is that the resulting factors are correlated.
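As a brief sketch, the two families of rotation can be compared directly in factor_analyzer by switching the rotation argument; the two-factor choice and the synthetic data are again assumptions:

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Synthetic stand-in data: 2 latent factors behind 10 observed variables.
rng = np.random.default_rng(42)
scores = rng.normal(size=(500, 2))
true_loadings = rng.normal(size=(10, 2))
df = pd.DataFrame(scores @ true_loadings.T + rng.normal(scale=0.5, size=(500, 10)),
                  columns=[f"x{i}" for i in range(10)])

# Compare an orthogonal rotation (varimax) with an oblique one (promax).
for rotation in ("varimax", "promax"):
    fa = FactorAnalyzer(n_factors=2, rotation=rotation)
    fa.fit(df)
    print(f"\n{rotation} loadings:")
    print(pd.DataFrame(fa.loadings_, index=df.columns,
                       columns=["Factor1", "Factor2"]).round(3))
```

The loadings differ between the two rotations while the communalities stay the same; with promax the factors are additionally allowed to correlate.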

Hello 👋

I am an aspiring researcher deeply interested in Machine Learning & Deep Learning. I hope you find this article useful. 😇
