This document shows the process used to find SUMA. First, Principal Component Analysis (PCA) in RStudio is used to determine the final model dimensions. Next, the final model tool and its corresponding VBA code are provided here: http://rmarkdown.rstudio.com.
# Set working directory
setwd("C:/Users/vanes/Documents/A-Research/A_Indoor Navigation/A_Defense/PCA")
# Load libraries
library(ggplot2)
library(FactoMineR)
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
# Load data
data <- read.csv('Final Data.csv')
Here PCA will be conducted.
# Run Principal Component Analysis
# scale. is the full argument name in prcomp(); "scale" only works through partial matching
data.pca <- prcomp(data, center = TRUE, scale. = TRUE)
Researcher judgement is now used to determine which principal components (PCs) to keep, based on 3 criteria.
3.1. Visualization (Scree plot): visualize the variability each PC accounts for. An elbow in the plot usually indicates where to set the cutoff for the PCs to keep.
3.2. Kaiser's Rule: Selected PCs should each have an eigenvalue greater than 1.
3.3. Proportion of Variance: Selected PCs should together account for at least 50% of the original dataset's variability.
# 3.1. Visualization (Scree plot)
plot(data.pca, type="l", main= "Scree Plot")
# 3.2. Kaiser's Rule
eigen(cor(data))$values
## [1] 4.2892832 2.5078757 1.2962672 1.0134558 0.8086376 0.7008438 0.6659542
## [8] 0.5464710 0.3083299 0.2727792 0.2424242 0.1861854 0.1614928
# 3.3. Proportion of Variance
# Shown on Scree plot
fviz_eig(data.pca, addlabels = TRUE, main= "Scree Plot")
# Proportion of variance of each PC and cumulative variance
summary(data.pca)
## Importance of components:
##                           PC1    PC2     PC3     PC4    PC5     PC6     PC7
## Standard deviation     2.0711 1.5836 1.13854 1.00671 0.8992 0.83716 0.81606
## Proportion of Variance 0.3299 0.1929 0.09971 0.07796 0.0622 0.05391 0.05123
## Cumulative Proportion  0.3299 0.5229 0.62257 0.70053 0.7627 0.81664 0.86787
##                            PC8     PC9    PC10    PC11    PC12    PC13
## Standard deviation     0.73924 0.55527 0.52228 0.49237 0.43149 0.40186
## Proportion of Variance 0.04204 0.02372 0.02098 0.01865 0.01432 0.01242
## Cumulative Proportion  0.90991 0.93362 0.95461 0.97326 0.98758 1.00000
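Kaiser's Rule and the proportion-of-variance criterion can also be applied programmatically. The following is a minimal sketch, assuming the eigenvalues printed above are hard-coded into a vector; the names `eig`, `kaiser_keep`, `cum_var`, and `n_for_half` are illustrative and not part of the original analysis. With the real data loaded, `data.pca$sdev^2` should give the same eigenvalues.

```r
# Eigenvalues copied from the eigen(cor(data))$values output above
eig <- c(4.2892832, 2.5078757, 1.2962672, 1.0134558, 0.8086376, 0.7008438,
         0.6659542, 0.5464710, 0.3083299, 0.2727792, 0.2424242, 0.1861854,
         0.1614928)

# Kaiser's Rule: indices of the PCs whose eigenvalue exceeds 1
kaiser_keep <- which(eig > 1)

# Proportion of variance and cumulative proportion for each PC
prop_var <- eig / sum(eig)
cum_var  <- cumsum(prop_var)

# Smallest number of PCs whose cumulative proportion reaches 50%
n_for_half <- which(cum_var >= 0.5)[1]

print(kaiser_keep)
print(round(cum_var, 4))
print(n_for_half)
```

On these eigenvalues, Kaiser's Rule alone would admit four PCs, while the 50% criterion is already met by the first two, so the final choice still rests on researcher judgement.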
# Rotation matrix:
print(data.pca$rotation)
## PC1 PC2 PC3 PC4
## Satisfaction 0.40743266 -0.12412628 0.06300737 0.29007977
## Percievable 0.36098097 -0.26082404 0.14168258 -0.10163382
## Operable 0.38014743 -0.12562730 -0.01195847 0.38912573
## Understandable 0.37523835 -0.24409040 0.05821591 0.29383288
## Robust 0.25146943 -0.22280222 0.16742767 -0.58055299
## Flexibility 0.21368316 -0.28899384 0.08706650 -0.47325845
## Recall 0.10238364 0.33937353 0.31576946 0.11403152
## recognition 0.06860482 0.26417873 0.44885222 -0.09502660
## Errors -0.27243654 -0.06886467 0.46023554 -0.03258493
## Hints -0.25365329 -0.40140601 -0.11531735 0.07042785
## Trials.before.mastery -0.29259051 -0.33528459 0.38199373 0.15701651
## Times -0.20221902 -0.40896128 -0.30222492 0.01551537
## Task.Completetion 0.17095517 0.27846701 -0.41624368 -0.23195333
## PC5 PC6 PC7 PC8
## Satisfaction -6.382866e-02 -0.05814290 -0.09134504 0.072806791
## Percievable 8.474734e-02 -0.02164993 -0.07684818 -0.251118586
## Operable 7.432292e-02 0.01846792 -0.10258219 0.080594815
## Understandable -1.066560e-02 -0.09391828 -0.11173894 -0.005618854
## Robust 9.489796e-02 0.03438693 0.06356460 -0.543170525
## Flexibility -1.373159e-01 -0.13897093 0.19361905 0.743381432
## Recall -2.697261e-01 -0.65794657 0.46444916 -0.163706438
## recognition -6.465558e-01 0.31189661 -0.43409357 0.021846042
## Errors 4.106476e-01 -0.35960673 -0.40590319 0.145051827
## Hints -3.988881e-01 -0.19030383 -0.01236963 -0.110976421
## Trials.before.mastery 5.044985e-02 -0.07309710 -0.11270007 -0.055735673
## Times -3.662921e-01 -0.19941005 -0.10840659 -0.115893315
## Task.Completetion -7.256901e-06 -0.47505822 -0.57245000 -0.017052466
## PC9 PC10 PC11 PC12
## Satisfaction 0.40997585 -0.00543586 -0.25967070 0.060965255
## Percievable -0.57546260 -0.02427120 -0.59321972 -0.061999832
## Operable -0.39176333 -0.30531505 0.61191582 -0.202542533
## Understandable 0.34065065 0.29658156 0.03569358 0.286392537
## Robust 0.23978526 -0.03935357 0.37399904 0.031202287
## Flexibility -0.01759915 0.01483129 0.03522686 -0.090279321
## Recall -0.03608573 -0.08474132 0.01408464 -0.050235540
## recognition -0.03559159 -0.06469405 0.03440940 0.030287365
## Errors -0.05410484 -0.20772107 0.05049328 0.412992545
## Hints -0.32403870 0.49473219 0.20677412 0.292142287
## Trials.before.mastery 0.17408134 0.21393257 -0.01381877 -0.727797627
## Times 0.18031804 -0.65658662 -0.12677012 -0.004317853
## Task.Completetion 0.01087488 0.20071686 0.02968442 -0.266856207
## PC13
## Satisfaction -0.6856531697
## Percievable 0.0845177160
## Operable -0.0556780714
## Understandable 0.6333556968
## Robust -0.1135394154
## Flexibility 0.0327733631
## Recall 0.0475995496
## recognition 0.0567698389
## Errors -0.0722757522
## Hints -0.2657354623
## Trials.before.mastery 0.0418771774
## Times 0.1464584515
## Task.Completetion -0.0005291889
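Using these criteria, it was decided to keep the first 2 PCs: both have eigenvalues greater than 1 (4.29 and 2.51), and together they account for 52.29% of the original dataset's variability.
Now that the first 2 PCs have been retained, the original dataset dimensions to keep are selected from their loadings. In behavioral research, loading thresholds from 0.32 to 0.50 are used for selection.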
# Print the loadings for the first two principal components
rotation_matrix <- data.pca$rotation
print(rotation_matrix[, 1:2])
## PC1 PC2
## Satisfaction 0.40743266 -0.12412628
## Percievable 0.36098097 -0.26082404
## Operable 0.38014743 -0.12562730
## Understandable 0.37523835 -0.24409040
## Robust 0.25146943 -0.22280222
## Flexibility 0.21368316 -0.28899384
## Recall 0.10238364 0.33937353
## recognition 0.06860482 0.26417873
## Errors -0.27243654 -0.06886467
## Hints -0.25365329 -0.40140601
## Trials.before.mastery -0.29259051 -0.33528459
## Times -0.20221902 -0.40896128
## Task.Completetion 0.17095517 0.27846701
Given the results for this dataset, a cutoff of 0.35 on the absolute loadings was selected, which yielded a reduced dataset of 6 dimensions: satisfaction, perceivable, operable, understandable, hints, and task completion time.
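The cutoff can be applied to the loadings directly. The sketch below assumes that a dimension is kept when its absolute loading on either PC1 or PC2 reaches 0.35; that reading of the rule is an assumption, and the names `loadings`, `cutoff`, `keep`, and `selected` are illustrative. The loadings are hard-coded from the output above so the example runs without the data file; the variable names are kept exactly as they appear in the dataset.

```r
# Loadings for PC1 and PC2, copied from the rotation matrix output above
loadings <- matrix(
  c( 0.40743266, -0.12412628,
     0.36098097, -0.26082404,
     0.38014743, -0.12562730,
     0.37523835, -0.24409040,
     0.25146943, -0.22280222,
     0.21368316, -0.28899384,
     0.10238364,  0.33937353,
     0.06860482,  0.26417873,
    -0.27243654, -0.06886467,
    -0.25365329, -0.40140601,
    -0.29259051, -0.33528459,
    -0.20221902, -0.40896128,
     0.17095517,  0.27846701),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Satisfaction", "Percievable", "Operable", "Understandable", "Robust",
      "Flexibility", "Recall", "recognition", "Errors", "Hints",
      "Trials.before.mastery", "Times", "Task.Completetion"),
    c("PC1", "PC2")
  )
)

cutoff <- 0.35  # threshold chosen in the text

# Assumption: keep a dimension if |loading| >= cutoff on PC1 or PC2
keep     <- apply(abs(loadings) >= cutoff, 1, any)
selected <- rownames(loadings)[keep]
print(selected)
```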
Lastly, checks can be run to confirm that the fundamental assumptions and properties of PCA hold.
5.1. Eigenvalues of Correlation Matrix = Variance of Transformed Data: If the data have been standardized (mean-centered and scaled to unit variance), the covariance matrix of the standardized data equals the correlation matrix of the original data, so the eigenvalues of the correlation matrix equal the variances of the PC scores.
5.2. Proof of Orthogonality: The correlation matrix of the PC scores has 1s on the diagonal and 0s elsewhere (up to floating-point error), which shows that the PCs are mutually uncorrelated.
# PCA Proof
# 5.1. Need PC to have mean=0, and variance=eigenvalue
eigen(cor(data))$values
## [1] 4.2892832 2.5078757 1.2962672 1.0134558 0.8086376 0.7008438 0.6659542
## [8] 0.5464710 0.3083299 0.2727792 0.2424242 0.1861854 0.1614928
diag(var(data.pca$x[,]))
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 4.2892832 2.5078757 1.2962672 1.0134558 0.8086376 0.7008438 0.6659542 0.5464710
## PC9 PC10 PC11 PC12 PC13
## 0.3083299 0.2727792 0.2424242 0.1861854 0.1614928
# 5.2. Show PC are orthogonal
cor(data.pca$x)
## PC1 PC2 PC3 PC4 PC5
## PC1 1.000000e+00 6.025343e-20 -1.224612e-16 2.204008e-16 9.997869e-17
## PC2 6.025343e-20 1.000000e+00 -2.273681e-17 -3.381308e-16 -2.954093e-16
## PC3 -1.224612e-16 -2.273681e-17 1.000000e+00 -3.601802e-16 -1.148432e-17
## PC4 2.204008e-16 -3.381308e-16 -3.601802e-16 1.000000e+00 -1.072004e-16
## PC5 9.997869e-17 -2.954093e-16 -1.148432e-17 -1.072004e-16 1.000000e+00
## PC6 -2.447470e-16 -1.052046e-16 -1.304982e-16 2.173398e-16 4.213683e-17
## PC7 -4.989024e-16 2.074557e-16 1.461877e-16 7.784022e-17 -4.929628e-16
## PC8 2.706046e-16 1.207084e-17 3.432536e-16 9.348395e-16 1.952246e-16
## PC9 -1.472830e-15 -1.673737e-17 1.211626e-16 -3.573365e-17 4.965619e-16
## PC10 -1.240592e-15 6.787356e-17 -1.430841e-16 -2.520490e-16 -1.203846e-16
## PC11 9.393499e-16 -4.722369e-16 4.435577e-16 1.696966e-16 -9.196416e-17
## PC12 9.844457e-17 -1.484719e-16 -5.418184e-16 -9.149781e-16 -7.012350e-16
## PC13 9.087503e-17 2.374371e-16 -2.939414e-16 -5.938340e-16 1.063313e-16
## PC6 PC7 PC8 PC9 PC10
## PC1 -2.447470e-16 -4.989024e-16 2.706046e-16 -1.472830e-15 -1.240592e-15
## PC2 -1.052046e-16 2.074557e-16 1.207084e-17 -1.673737e-17 6.787356e-17
## PC3 -1.304982e-16 1.461877e-16 3.432536e-16 1.211626e-16 -1.430841e-16
## PC4 2.173398e-16 7.784022e-17 9.348395e-16 -3.573365e-17 -2.520490e-16
## PC5 4.213683e-17 -4.929628e-16 1.952246e-16 4.965619e-16 -1.203846e-16
## PC6 1.000000e+00 -1.565405e-16 -4.889407e-16 4.550821e-16 -1.318238e-16
## PC7 -1.565405e-16 1.000000e+00 -1.162650e-16 3.349880e-17 -2.330568e-16
## PC8 -4.889407e-16 -1.162650e-16 1.000000e+00 -8.573692e-16 -2.751941e-16
## PC9 4.550821e-16 3.349880e-17 -8.573692e-16 1.000000e+00 -2.644992e-16
## PC10 -1.318238e-16 -2.330568e-16 -2.751941e-16 -2.644992e-16 1.000000e+00
## PC11 1.598627e-16 -1.787284e-16 -3.556507e-16 -4.862962e-16 -6.681759e-17
## PC12 2.814743e-16 -3.878900e-16 -1.494996e-16 -5.800430e-16 9.684632e-17
## PC13 5.764499e-16 6.344024e-16 -1.008947e-15 4.340642e-16 5.482013e-16
## PC11 PC12 PC13
## PC1 9.393499e-16 9.844457e-17 9.087503e-17
## PC2 -4.722369e-16 -1.484719e-16 2.374371e-16
## PC3 4.435577e-16 -5.418184e-16 -2.939414e-16
## PC4 1.696966e-16 -9.149781e-16 -5.938340e-16
## PC5 -9.196416e-17 -7.012350e-16 1.063313e-16
## PC6 1.598627e-16 2.814743e-16 5.764499e-16
## PC7 -1.787284e-16 -3.878900e-16 6.344024e-16
## PC8 -3.556507e-16 -1.494996e-16 -1.008947e-15
## PC9 -4.862962e-16 -5.800430e-16 4.340642e-16
## PC10 -6.681759e-17 9.684632e-17 5.482013e-16
## PC11 1.000000e+00 -4.661230e-16 -7.923753e-16
## PC12 -4.661230e-16 1.000000e+00 -1.311666e-17
## PC13 -7.923753e-16 -1.311666e-17 1.000000e+00
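Comparing these outputs by eye works, but the same checks can be made with a numerical tolerance. The following is a minimal sketch, assuming the built-in `USArrests` dataset as a stand-in for the study data (which is not reproduced here); the names `pca`, `means_ok`, `vars_ok`, and `orth_ok` are illustrative. `all.equal()` treats values such as 1e-16 as equal to 0.

```r
# USArrests is a built-in dataset used here as a stand-in for the study data
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# 5.1: score means are 0 and score variances equal the eigenvalues
eig        <- eigen(cor(USArrests))$values
score_var  <- unname(apply(pca$x, 2, var))
score_mean <- unname(colMeans(pca$x))

means_ok <- isTRUE(all.equal(score_mean, rep(0, length(score_mean))))
vars_ok  <- isTRUE(all.equal(score_var, eig))

# 5.2: the correlation matrix of the scores equals the identity matrix
score_cor <- unname(cor(pca$x))
orth_ok   <- isTRUE(all.equal(score_cor, diag(ncol(pca$x))))

print(c(means_ok = means_ok, vars_ok = vars_ok, orth_ok = orth_ok))
```

Replacing `USArrests` with the study data frame should run the same three checks on `data.pca`.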
Use the PCAtest R package to test the overall significance of a PCA, among other things. More about the PCAtest package is available at https://github.com/arleyc/PCAtest.
If the p-value is significant, it can be concluded that the observed value is unlikely to have occurred by random chance. This conclusion also supports that there were enough data points to provide meaningful PCA results.
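The idea behind a permutation-based significance test can be illustrated in base R. The sketch below is a simplified stand-in and not the PCAtest implementation: it uses the largest eigenvalue of the correlation matrix as the test statistic, which is an assumption made here for brevity, and it runs on the built-in `USArrests` dataset rather than the study data. The names `first_eig`, `null_eig`, and `p_value` are illustrative.

```r
set.seed(1)  # for reproducible permutations

# USArrests is a built-in dataset used here as a stand-in for the study data
x <- USArrests

# Test statistic (an assumption for this sketch): largest eigenvalue
# of the correlation matrix
first_eig <- function(d) eigen(cor(d))$values[1]
observed  <- first_eig(x)

# Null distribution: shuffle each column independently to break the
# correlations between variables, then recompute the statistic
n_perm   <- 999
null_eig <- replicate(n_perm, first_eig(as.data.frame(lapply(x, sample))))

# Permutation p-value: share of null statistics at least as large as observed
p_value <- (1 + sum(null_eig >= observed)) / (n_perm + 1)
print(p_value)
```

A small p-value here means the observed first eigenvalue is larger than what uncorrelated variables with the same marginal distributions would typically produce.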