Finding SUMA

This document shows the process for finding SUMA. First, Principal Component Analysis (PCA) in RStudio is used to determine the final model dimensions. Next, the final model tool and its corresponding VBA code are provided here: http://rmarkdown.rstudio.com.

1. Load data and libraries

# Set working directory
setwd("C:/Users/vanes/Documents/A-Research/A_Indoor Navigation/A_Defense/PCA")

# Load libraries
library(ggplot2)
library(FactoMineR)
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
# Upload data
data <- read.csv('Final Data.csv')
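
Before running PCA it can help to confirm that the input is suitable, since `prcomp` expects numeric columns without missing values and cannot scale a constant column to unit variance. The following is a minimal sketch of such a check; `check_pca_input` is a hypothetical helper and the small `toy` data frame is made up for illustration (it is not taken from 'Final Data.csv').

```r
# Hypothetical helper: report columns that would stop or distort prcomp()
check_pca_input <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  has_na <- vapply(df, function(col) anyNA(col), logical(1))
  # A constant numeric column has zero variance and cannot be scaled
  constant <- vapply(df, function(col) {
    is.numeric(col) && length(unique(col[!is.na(col)])) <= 1
  }, logical(1))
  list(
    non_numeric = names(df)[!is_num],
    with_na     = names(df)[has_na],
    constant    = names(df)[constant]
  )
}

# Made-up example data (assumption, for illustration only)
toy <- data.frame(a = c(1, 2, 3, 4),
                  b = c(5, 5, 5, 5),
                  c = c("x", "y", "z", "w"))
issues <- check_pca_input(toy)
print(issues)
```

Any column reported here would need to be dropped, recoded, or imputed before the data are passed to `prcomp`.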

2. Run PCA

PCA is run on the data with each variable centered and scaled to unit variance.

# Run Principal Component Analysis
data.pca <- prcomp(data, center = TRUE, scale. = TRUE)
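
As a rough illustration of what the centering and scaling options do, the sketch below standardizes the columns by hand with `scale()` and checks that PCA on the result gives the same component standard deviations. It uses R's built-in `USArrests` dataset as a stand-in (an assumption; it is not the study data).

```r
X <- USArrests  # built-in stand-in dataset (assumption, not the study data)

# PCA with centering and scaling done inside prcomp()
pca_auto <- prcomp(X, center = TRUE, scale. = TRUE)

# PCA on columns standardized by hand, with no further centering or scaling
pca_manual <- prcomp(scale(X), center = FALSE, scale. = FALSE)

# The standard deviations of the components should agree
same_sdev <- isTRUE(all.equal(pca_auto$sdev, pca_manual$sdev))
print(same_sdev)
```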

3. Interpret PCA: PC Selection

Researcher judgement is now used to determine which principal components (PCs) to keep, based on 3 criteria.

3.1. Visualization (Scree plot): visualize the variability each PC accounts for. An elbow in the plot usually marks the cutoff for which PCs to keep.

3.2. Kaiser's Rule: Selected PCs should have an eigenvalue greater than 1.

3.3. Proportion of Variance: Selected PCs should together account for at least 50% of the original dataset’s variability.

# 3.1. Visualization (Scree plot)
plot(data.pca, type="l", main= "Scree Plot")  

# 3.2. Kaiser's Rule
eigen(cor(data))$values 
##  [1] 4.2892832 2.5078757 1.2962672 1.0134558 0.8086376 0.7008438 0.6659542
##  [8] 0.5464710 0.3083299 0.2727792 0.2424242 0.1861854 0.1614928
# 3.3. Proportion of Variance 
# Shown on Scree plot
fviz_eig(data.pca, addlabels = TRUE, main= "Scree Plot")

# Proportion of variance of each PC and cumulative variance 
summary(data.pca)  
## Importance of components:
##                           PC1    PC2     PC3     PC4    PC5     PC6     PC7
## Standard deviation     2.0711 1.5836 1.13854 1.00671 0.8992 0.83716 0.81606
## Proportion of Variance 0.3299 0.1929 0.09971 0.07796 0.0622 0.05391 0.05123
## Cumulative Proportion  0.3299 0.5229 0.62257 0.70053 0.7627 0.81664 0.86787
##                            PC8     PC9    PC10    PC11    PC12    PC13
## Standard deviation     0.73924 0.55527 0.52228 0.49237 0.43149 0.40186
## Proportion of Variance 0.04204 0.02372 0.02098 0.01865 0.01432 0.01242
## Cumulative Proportion  0.90991 0.93362 0.95461 0.97326 0.98758 1.00000
# Rotation matrix:
print(data.pca$rotation)
##                               PC1         PC2         PC3         PC4
## Satisfaction           0.40743266 -0.12412628  0.06300737  0.29007977
## Percievable            0.36098097 -0.26082404  0.14168258 -0.10163382
## Operable               0.38014743 -0.12562730 -0.01195847  0.38912573
## Understandable         0.37523835 -0.24409040  0.05821591  0.29383288
## Robust                 0.25146943 -0.22280222  0.16742767 -0.58055299
## Flexibility            0.21368316 -0.28899384  0.08706650 -0.47325845
## Recall                 0.10238364  0.33937353  0.31576946  0.11403152
## recognition            0.06860482  0.26417873  0.44885222 -0.09502660
## Errors                -0.27243654 -0.06886467  0.46023554 -0.03258493
## Hints                 -0.25365329 -0.40140601 -0.11531735  0.07042785
## Trials.before.mastery -0.29259051 -0.33528459  0.38199373  0.15701651
## Times                 -0.20221902 -0.40896128 -0.30222492  0.01551537
## Task.Completetion      0.17095517  0.27846701 -0.41624368 -0.23195333
##                                 PC5         PC6         PC7          PC8
## Satisfaction          -6.382866e-02 -0.05814290 -0.09134504  0.072806791
## Percievable            8.474734e-02 -0.02164993 -0.07684818 -0.251118586
## Operable               7.432292e-02  0.01846792 -0.10258219  0.080594815
## Understandable        -1.066560e-02 -0.09391828 -0.11173894 -0.005618854
## Robust                 9.489796e-02  0.03438693  0.06356460 -0.543170525
## Flexibility           -1.373159e-01 -0.13897093  0.19361905  0.743381432
## Recall                -2.697261e-01 -0.65794657  0.46444916 -0.163706438
## recognition           -6.465558e-01  0.31189661 -0.43409357  0.021846042
## Errors                 4.106476e-01 -0.35960673 -0.40590319  0.145051827
## Hints                 -3.988881e-01 -0.19030383 -0.01236963 -0.110976421
## Trials.before.mastery  5.044985e-02 -0.07309710 -0.11270007 -0.055735673
## Times                 -3.662921e-01 -0.19941005 -0.10840659 -0.115893315
## Task.Completetion     -7.256901e-06 -0.47505822 -0.57245000 -0.017052466
##                               PC9        PC10        PC11         PC12
## Satisfaction           0.40997585 -0.00543586 -0.25967070  0.060965255
## Percievable           -0.57546260 -0.02427120 -0.59321972 -0.061999832
## Operable              -0.39176333 -0.30531505  0.61191582 -0.202542533
## Understandable         0.34065065  0.29658156  0.03569358  0.286392537
## Robust                 0.23978526 -0.03935357  0.37399904  0.031202287
## Flexibility           -0.01759915  0.01483129  0.03522686 -0.090279321
## Recall                -0.03608573 -0.08474132  0.01408464 -0.050235540
## recognition           -0.03559159 -0.06469405  0.03440940  0.030287365
## Errors                -0.05410484 -0.20772107  0.05049328  0.412992545
## Hints                 -0.32403870  0.49473219  0.20677412  0.292142287
## Trials.before.mastery  0.17408134  0.21393257 -0.01381877 -0.727797627
## Times                  0.18031804 -0.65658662 -0.12677012 -0.004317853
## Task.Completetion      0.01087488  0.20071686  0.02968442 -0.266856207
##                                PC13
## Satisfaction          -0.6856531697
## Percievable            0.0845177160
## Operable              -0.0556780714
## Understandable         0.6333556968
## Robust                -0.1135394154
## Flexibility            0.0327733631
## Recall                 0.0475995496
## recognition            0.0567698389
## Errors                -0.0722757522
## Hints                 -0.2657354623
## Trials.before.mastery  0.0418771774
## Times                  0.1464584515
## Task.Completetion     -0.0005291889

Using these criteria, it was decided to keep the first 2 PCs:

  1. Visualization: the scree plot appears to level off after PC 2
  2. Kaiser's Rule: up to the first 4 PCs can be selected since they have an eigenvalue above 1
  3. Proportion of Variance: at least the first 2 PCs should be selected to account for more than 50% of the cumulative variance
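
The two numeric criteria can also be checked directly from the eigenvalues. The sketch below copies the eigenvalues printed above into a vector (rather than reloading the data) and counts how many PCs each rule suggests; the variable names are illustrative.

```r
# Eigenvalues copied from the eigen(cor(data))$values output above
eig <- c(4.2892832, 2.5078757, 1.2962672, 1.0134558, 0.8086376, 0.7008438,
         0.6659542, 0.5464710, 0.3083299, 0.2727792, 0.2424242, 0.1861854,
         0.1614928)

# Kaiser's Rule: number of PCs with an eigenvalue greater than 1
n_kaiser <- sum(eig > 1)

# Proportion of variance: fewest PCs whose cumulative share reaches 50%
cum_prop <- cumsum(eig) / sum(eig)
n_half <- which(cum_prop >= 0.5)[1]

cat("Kaiser's Rule allows up to", n_kaiser, "PCs\n")
cat("At least", n_half, "PCs are needed to reach 50% of the variance\n")
```

The scree plot criterion remains a visual judgement and is not automated here.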

4. Reduced Dimension Selection

Now that the first 2 PCs have been retained, the original dataset dimensions to keep are selected. In behavioral research, loadings from 0.32 to 0.50 are commonly used as selection thresholds.

# Print the loadings for the first two principal components
rotation_matrix <- data.pca$rotation
print(rotation_matrix[, 1:2])
##                               PC1         PC2
## Satisfaction           0.40743266 -0.12412628
## Percievable            0.36098097 -0.26082404
## Operable               0.38014743 -0.12562730
## Understandable         0.37523835 -0.24409040
## Robust                 0.25146943 -0.22280222
## Flexibility            0.21368316 -0.28899384
## Recall                 0.10238364  0.33937353
## recognition            0.06860482  0.26417873
## Errors                -0.27243654 -0.06886467
## Hints                 -0.25365329 -0.40140601
## Trials.before.mastery -0.29259051 -0.33528459
## Times                 -0.20221902 -0.40896128
## Task.Completetion      0.17095517  0.27846701

Given the results for this dataset, a cutoff of 0.35 was selected. Applied to the absolute loadings on PC1 and PC2, it yielded a reduced dataset of 6 dimensions: satisfaction, perceivable, operable, understandable, hints, and task completion time.
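
The selection can be reproduced from the loadings printed above. The sketch below assumes that a dimension is kept when its absolute loading on either PC1 or PC2 meets the cutoff; that rule is an interpretation, chosen because it gives 6 dimensions for these loadings, and is not stated explicitly in the analysis.

```r
# Loadings copied from the PC1 and PC2 columns printed above
vars <- c("Satisfaction", "Percievable", "Operable", "Understandable",
          "Robust", "Flexibility", "Recall", "recognition", "Errors",
          "Hints", "Trials.before.mastery", "Times", "Task.Completetion")
pc1 <- c(0.40743266, 0.36098097, 0.38014743, 0.37523835, 0.25146943,
         0.21368316, 0.10238364, 0.06860482, -0.27243654, -0.25365329,
         -0.29259051, -0.20221902, 0.17095517)
pc2 <- c(-0.12412628, -0.26082404, -0.12562730, -0.24409040, -0.22280222,
         -0.28899384, 0.33937353, 0.26417873, -0.06886467, -0.40140601,
         -0.33528459, -0.40896128, 0.27846701)
loadings <- cbind(PC1 = pc1, PC2 = pc2)
rownames(loadings) <- vars

# Assumed rule: keep a dimension if |loading| >= cutoff on PC1 or PC2
cutoff <- 0.35
keep <- rownames(loadings)[apply(abs(loadings) >= cutoff, 1, any)]
print(keep)
```

Raising or lowering `cutoff` within the 0.32 to 0.50 range shows how sensitive the reduced set is to the threshold.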

5. PCA proof

Lastly, checks can be run to confirm that the fundamental assumptions and properties of PCA are upheld.

5.1. Eigenvalues of Correlation Matrix = Variance of Transformed Data: If the data have been standardized (mean-centered and scaled to unit variance), the covariance matrix of the standardized data equals the correlation matrix of the original data, so the eigenvalues of the correlation matrix should match the variances of the PC scores.

5.2. Proof of Orthogonality: The correlation matrix of the PC scores should be an identity matrix (1s on the diagonal and 0s elsewhere, up to floating-point error), which shows that the PCs are uncorrelated with one another.

# PCA Proof
# 5.1. Need PC to have mean=0, and variance=eigenvalue
eigen(cor(data))$values 
##  [1] 4.2892832 2.5078757 1.2962672 1.0134558 0.8086376 0.7008438 0.6659542
##  [8] 0.5464710 0.3083299 0.2727792 0.2424242 0.1861854 0.1614928
diag(var(data.pca$x[,]))  
##       PC1       PC2       PC3       PC4       PC5       PC6       PC7       PC8 
## 4.2892832 2.5078757 1.2962672 1.0134558 0.8086376 0.7008438 0.6659542 0.5464710 
##       PC9      PC10      PC11      PC12      PC13 
## 0.3083299 0.2727792 0.2424242 0.1861854 0.1614928
# 5.2. Show PC are orthogonal
cor(data.pca$x)
##                PC1           PC2           PC3           PC4           PC5
## PC1   1.000000e+00  6.025343e-20 -1.224612e-16  2.204008e-16  9.997869e-17
## PC2   6.025343e-20  1.000000e+00 -2.273681e-17 -3.381308e-16 -2.954093e-16
## PC3  -1.224612e-16 -2.273681e-17  1.000000e+00 -3.601802e-16 -1.148432e-17
## PC4   2.204008e-16 -3.381308e-16 -3.601802e-16  1.000000e+00 -1.072004e-16
## PC5   9.997869e-17 -2.954093e-16 -1.148432e-17 -1.072004e-16  1.000000e+00
## PC6  -2.447470e-16 -1.052046e-16 -1.304982e-16  2.173398e-16  4.213683e-17
## PC7  -4.989024e-16  2.074557e-16  1.461877e-16  7.784022e-17 -4.929628e-16
## PC8   2.706046e-16  1.207084e-17  3.432536e-16  9.348395e-16  1.952246e-16
## PC9  -1.472830e-15 -1.673737e-17  1.211626e-16 -3.573365e-17  4.965619e-16
## PC10 -1.240592e-15  6.787356e-17 -1.430841e-16 -2.520490e-16 -1.203846e-16
## PC11  9.393499e-16 -4.722369e-16  4.435577e-16  1.696966e-16 -9.196416e-17
## PC12  9.844457e-17 -1.484719e-16 -5.418184e-16 -9.149781e-16 -7.012350e-16
## PC13  9.087503e-17  2.374371e-16 -2.939414e-16 -5.938340e-16  1.063313e-16
##                PC6           PC7           PC8           PC9          PC10
## PC1  -2.447470e-16 -4.989024e-16  2.706046e-16 -1.472830e-15 -1.240592e-15
## PC2  -1.052046e-16  2.074557e-16  1.207084e-17 -1.673737e-17  6.787356e-17
## PC3  -1.304982e-16  1.461877e-16  3.432536e-16  1.211626e-16 -1.430841e-16
## PC4   2.173398e-16  7.784022e-17  9.348395e-16 -3.573365e-17 -2.520490e-16
## PC5   4.213683e-17 -4.929628e-16  1.952246e-16  4.965619e-16 -1.203846e-16
## PC6   1.000000e+00 -1.565405e-16 -4.889407e-16  4.550821e-16 -1.318238e-16
## PC7  -1.565405e-16  1.000000e+00 -1.162650e-16  3.349880e-17 -2.330568e-16
## PC8  -4.889407e-16 -1.162650e-16  1.000000e+00 -8.573692e-16 -2.751941e-16
## PC9   4.550821e-16  3.349880e-17 -8.573692e-16  1.000000e+00 -2.644992e-16
## PC10 -1.318238e-16 -2.330568e-16 -2.751941e-16 -2.644992e-16  1.000000e+00
## PC11  1.598627e-16 -1.787284e-16 -3.556507e-16 -4.862962e-16 -6.681759e-17
## PC12  2.814743e-16 -3.878900e-16 -1.494996e-16 -5.800430e-16  9.684632e-17
## PC13  5.764499e-16  6.344024e-16 -1.008947e-15  4.340642e-16  5.482013e-16
##               PC11          PC12          PC13
## PC1   9.393499e-16  9.844457e-17  9.087503e-17
## PC2  -4.722369e-16 -1.484719e-16  2.374371e-16
## PC3   4.435577e-16 -5.418184e-16 -2.939414e-16
## PC4   1.696966e-16 -9.149781e-16 -5.938340e-16
## PC5  -9.196416e-17 -7.012350e-16  1.063313e-16
## PC6   1.598627e-16  2.814743e-16  5.764499e-16
## PC7  -1.787284e-16 -3.878900e-16  6.344024e-16
## PC8  -3.556507e-16 -1.494996e-16 -1.008947e-15
## PC9  -4.862962e-16 -5.800430e-16  4.340642e-16
## PC10 -6.681759e-17  9.684632e-17  5.482013e-16
## PC11  1.000000e+00 -4.661230e-16 -7.923753e-16
## PC12 -4.661230e-16  1.000000e+00 -1.311666e-17
## PC13 -7.923753e-16 -1.311666e-17  1.000000e+00
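
Rather than reading these matrices by eye, the same checks can be expressed with a numerical tolerance. The sketch below is one way to do this, using R's built-in `USArrests` dataset as a stand-in (an assumption; it is not the study data) and an illustrative tolerance of 1e-10.

```r
X <- USArrests  # built-in stand-in dataset (assumption, not the study data)
pca <- prcomp(X, center = TRUE, scale. = TRUE)
p <- ncol(X)

# 5.1: PC scores have mean 0 and variance equal to the eigenvalues
means_zero  <- all(abs(colMeans(pca$x)) < 1e-10)
var_matches <- isTRUE(all.equal(unname(diag(var(pca$x))),
                                eigen(cor(X))$values))

# 5.2: the correlation matrix of the scores is the identity matrix
is_identity <- max(abs(cor(pca$x) - diag(p))) < 1e-10

print(c(means_zero = means_zero,
        var_matches = var_matches,
        is_identity = is_identity))
```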

6. Test Significance of PCA

Use the PCAtest R package to find the overall significance of a PCA and more. Find more about the PCAtest package here: https://github.com/arleyc/PCAtest.

If the p-value is significant, it can be concluded that the observed value is unlikely to have occurred by random chance. This conclusion can also support that there were enough data points to provide meaningful PCA results.
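
To illustrate the idea behind a permutation-based significance test, the sketch below uses base R only. It is a simplified stand-in and not the PCAtest implementation: it tests only whether the share of variance explained by PC1 is larger than expected when the correlations between variables are destroyed. The built-in `USArrests` dataset, the helper `first_share`, and the choice of 200 permutations are all assumptions made for illustration.

```r
set.seed(1)            # for a reproducible illustration
X <- scale(USArrests)  # built-in stand-in dataset (assumption)
nperm <- 200           # illustrative number of permutations

# Hypothetical helper: share of total variance explained by PC1
first_share <- function(m) {
  ev <- eigen(cor(m), symmetric = TRUE, only.values = TRUE)$values
  ev[1] / sum(ev)
}
observed <- first_share(X)

# Null distribution: shuffle each column independently, which keeps each
# variable's values but breaks the correlations between variables
null_share <- replicate(nperm, first_share(apply(X, 2, sample)))

# Permutation p-value, with the usual +1 correction
p_value <- (sum(null_share >= observed) + 1) / (nperm + 1)
cat("Observed PC1 share:", round(observed, 3),
    " p-value:", round(p_value, 4), "\n")
```

A small p-value here indicates that the observed PC1 captures more variance than would be expected from uncorrelated variables.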
