ABSTRACT
In this paper we show how the PCA (Principal Component Analysis) method can be implemented in the Java programming
language. We consider using the PCA algorithm mainly to analyse data obtained from Raman spectroscopy
measurements, but other applications of the developed software should also be possible. Our goal is to create a
general-purpose PCA application, ready to run on every platform supported by Java.
Keywords: PCA, statistical procedures, Raman spectroscopy, Java.
1. INTRODUCTION
Nowadays large amounts of data are used in many applications. Unfortunately, it is sometimes difficult to notice
differences in data presented in tables or even in plots. One way to address this problem is to use statistical
methods of factor analysis, in particular the relatively simple PCA (Principal Component Analysis) method. PCA
has found many applications in different fields, including image compression [1], several approaches to face recognition [2, 3]
and more advanced applications such as communication between the human brain and external devices (the BCI method)
[4]. As these examples show, PCA is a widely used tool for finding patterns in analyzed data. In this paper the basics of
principal component analysis are explained with some computations and one example using multivariate data. The purpose is to
explain how PCA works and to give a better understanding of its application to real-world measurement data. Our goal is to
create a general-purpose application ready to work on every platform supported by Java. The developed software has been
tested on data obtained from Raman spectroscopy.
Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2015,
edited by Ryszard S. Romaniuk, Proc. of SPIE Vol. 9662, 96623P · © 2015 SPIE
CCC code: 0277-786X/15/$18 · doi: 10.1117/12.2205857
Now, given the covariance matrix (let C be the m x m covariance matrix), there exists an orthogonal m x m matrix A whose
columns are the eigenvectors of C, and a diagonal matrix B whose diagonal elements are the eigenvalues of C, such that:
A^T C A = B (8)
This linear transformation maps the data points from the original axis system to a new one in which the variables are uncorrelated.
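To make equation (8) concrete, the following is a minimal sketch, not the authors' implementation, of how C, A and B could be computed in plain Java. The class name `CovarianceEigenSketch`, its method names and the sample values are hypothetical, and the cyclic Jacobi rotation method is an assumption chosen here because it is short and works on any symmetric matrix; the paper does not state which eigensolver the application uses.

```java
import java.util.Arrays;

/** Hypothetical helper (not from the paper): covariance matrix and Jacobi eigendecomposition. */
final class CovarianceEigenSketch {

    /** Sample covariance matrix (m x m) of data with n rows (observations) and m columns (variables). */
    static double[][] covariance(double[][] data) {
        int n = data.length, m = data[0].length;
        double[] mean = new double[m];
        for (double[] row : data)
            for (int j = 0; j < m; j++) mean[j] += row[j] / n;
        double[][] cov = new double[m][m];
        for (double[] row : data)
            for (int j = 0; j < m; j++)
                for (int k = 0; k < m; k++)
                    cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (n - 1);
        return cov;
    }

    /**
     * Cyclic Jacobi rotations for a symmetric matrix C.
     * Returns {A, B}: the columns of A are eigenvectors and B = A^T C A is numerically diagonal.
     * The eigenvalues on the diagonal of B are NOT sorted.
     */
    static double[][][] jacobi(double[][] c) {
        int m = c.length;
        double[][] b = new double[m][];
        for (int i = 0; i < m; i++) b[i] = c[i].clone();
        double[][] a = new double[m][m];
        for (int i = 0; i < m; i++) a[i][i] = 1.0; // start from the identity
        for (int sweep = 0; sweep < 100; sweep++) {
            double off = 0.0, diag = 0.0;
            for (int p = 0; p < m; p++) {
                diag += b[p][p] * b[p][p];
                for (int q = p + 1; q < m; q++) off += b[p][q] * b[p][q];
            }
            if (off <= 1e-22 * diag) break; // off-diagonal part is negligible
            for (int p = 0; p < m; p++) {
                for (int q = p + 1; q < m; q++) {
                    if (b[p][q] == 0.0) continue;
                    // Angle that zeroes b[p][q]: tan(2*theta) = 2*b_pq / (b_qq - b_pp)
                    double theta = 0.5 * Math.atan(2.0 * b[p][q] / (b[q][q] - b[p][p]));
                    double cs = Math.cos(theta), sn = Math.sin(theta);
                    for (int k = 0; k < m; k++) { // B <- B * J (columns p and q)
                        double bkp = b[k][p], bkq = b[k][q];
                        b[k][p] = cs * bkp - sn * bkq;
                        b[k][q] = sn * bkp + cs * bkq;
                    }
                    for (int k = 0; k < m; k++) { // B <- J^T * B (rows p and q)
                        double bpk = b[p][k], bqk = b[q][k];
                        b[p][k] = cs * bpk - sn * bqk;
                        b[q][k] = sn * bpk + cs * bqk;
                    }
                    for (int k = 0; k < m; k++) { // A <- A * J accumulates the eigenvectors
                        double akp = a[k][p], akq = a[k][q];
                        a[k][p] = cs * akp - sn * akq;
                        a[k][q] = sn * akp + cs * akq;
                    }
                }
            }
        }
        return new double[][][] { a, b };
    }

    public static void main(String[] args) {
        double[][] data = { {1, 2}, {2, 4}, {3, 6} }; // made-up sample values
        double[][][] ab = jacobi(covariance(data));
        System.out.println("A = " + Arrays.deepToString(ab[0]));
        System.out.println("B = " + Arrays.deepToString(ab[1]));
    }
}
```

For large spectra with thousands of variables, a tested linear algebra library would likely be a better choice than this hand-written loop.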
[Figure 1 plot: points labelled Wal, Eng, Scot and N Ire along a horizontal PC1 axis running from -300 to 500.]
Figure 1. First principal component (PC1) for data presented in Table 1 [7].
As can be seen in Figure 1, the points are clustered. Two major clusters form: one contains Wales,
England and Scotland, and the second is Northern Ireland alone, separate from the others. This is already
enough to see that something is different about Northern Ireland. Figure 2 shows the first two (orthogonal) principal
components, which project the data onto a 2D scores plot. Now it is much easier to see the differences between the UK countries. One
visible difference is that Northern Ireland is a major outlier, which matches real life: Northern Ireland is the only one of the
four countries that is not on the island of Great Britain.
One may ask why the points are clustered in that specific way. The answer is simple: if we look again at
Table 1, we notice that Northern Ireland consumes less fish, fruit, alcoholic drinks and cheese and, at the same time,
more fresh potatoes. Such differences are much harder to see in a table with a large amount of data, whereas PCA
makes the data analysis considerably easier. For a better visual understanding of PCA, a
website with a multivariate data example can be useful [9].
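The coordinates shown in a scores plot are obtained by centering the data and projecting each observation onto the chosen principal directions. The sketch below illustrates this under stated assumptions: the class `ScoresSketch`, the method `scores` and the sample values are hypothetical, and the principal directions are assumed to be supplied as unit-length rows of `components` (for example, eigenvectors of the covariance matrix).

```java
import java.util.Arrays;

/** Hypothetical helper (not from the paper): projects centered data onto principal directions. */
final class ScoresSketch {

    /**
     * data: n observations (rows) by m variables (columns).
     * components: k unit-length directions, one per row, each of length m.
     * Returns the n x k matrix of scores.
     */
    static double[][] scores(double[][] data, double[][] components) {
        int n = data.length, m = data[0].length, k = components.length;
        double[] mean = new double[m];
        for (double[] row : data)
            for (int j = 0; j < m; j++) mean[j] += row[j] / n;
        double[][] t = new double[n][k];
        for (int i = 0; i < n; i++)
            for (int c = 0; c < k; c++)
                for (int j = 0; j < m; j++)
                    t[i][c] += (data[i][j] - mean[j]) * components[c][j]; // dot product with direction c
        return t;
    }

    public static void main(String[] args) {
        double[][] data = { {1, 2}, {2, 4}, {3, 6} }; // made-up sample values
        double s = Math.sqrt(5.0);
        double[][] components = { {1 / s, 2 / s}, {-2 / s, 1 / s} }; // PC1 and PC2 for this data
        System.out.println(Arrays.deepToString(scores(data, components)));
    }
}
```

Plotting the first column of the result against the second gives a 2D scores plot of the kind described above.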
[Figure residue, captions not recoverable: a scores plot with points labelled Eng, Scot and N Ire; a plot with an X axis running from 0 to about 4000 and series apparently labelled ABS, PEHD, PP and PS; and a scores plot along PC1 with points carrying the same four labels.]
4. CONCLUSION
PCA is a statistical method used to reduce dimensionality and to visualize correlations and proximities. It has
five steps: data preparation, covariance or correlation matrix calculation, eigenvector and eigenvalue decomposition,
principal component selection, and computation of the new data set [12]. Figures 1, 2 and 4 are good examples of how the variables
were reduced with little loss of the information in the original data set. In PCA the direction with the largest variance is the most
important or, in this case, the most principal. The method is particularly useful with highly correlated variables. The above example shows
that, in order to classify the data, only 2 variables are needed rather than the nearly two thousand in the input data. The motivation
for this application and article was to show that a powerful statistical method (used in image compression, face recognition,
Raman spectroscopy and many more fields) can be implemented in a modern programming language and easily applied to
any .txt data on every platform supported by Java.
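As an illustration of the principal component selection step, the sketch below ranks eigenvalues by the fraction of total variance they explain and picks the smallest number of components that reaches a chosen threshold. This is one common selection rule, assumed here for illustration; the paper does not state which rule the application uses. The class `ComponentSelectionSketch`, its methods, the threshold and the sample eigenvalues are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper (not from the paper): selects components by explained variance. */
final class ComponentSelectionSketch {

    /** Fraction of the total variance carried by each eigenvalue, largest first. */
    static double[] explainedVarianceRatio(double[] eigenvalues) {
        double[] sorted = eigenvalues.clone();
        Arrays.sort(sorted); // ascending order
        double total = 0.0;
        for (double v : sorted) total += v;
        double[] ratio = new double[sorted.length];
        for (int i = 0; i < sorted.length; i++)
            ratio[i] = sorted[sorted.length - 1 - i] / total; // read from the largest down
        return ratio;
    }

    /** Smallest number of leading components whose cumulative ratio reaches the threshold. */
    static int componentsFor(double[] eigenvalues, double threshold) {
        double[] ratio = explainedVarianceRatio(eigenvalues);
        double cumulative = 0.0;
        for (int k = 0; k < ratio.length; k++) {
            cumulative += ratio[k];
            if (cumulative >= threshold) return k + 1;
        }
        return ratio.length; // threshold not reached: keep every component
    }

    public static void main(String[] args) {
        double[] eigenvalues = { 1.0, 4.0, 2.0, 3.0 }; // made-up sample values
        System.out.println(Arrays.toString(explainedVarianceRatio(eigenvalues)));
        System.out.println(componentsFor(eigenvalues, 0.8));
    }
}
```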
REFERENCES
[1] Qian, D. and Fowler, J., "Hyperspectral Image Compression Using JPEG2000 and Principal Component Analysis",
Geoscience and Remote Sensing Letters IEEE (4), 201-205 (2007).
[2] Nedevschi, S., "PCA type algorithm applied in face recognition", Intelligent Computer Communication and
Processing (ICCP), 167-171 (2012).
[3] Zhang, D., Zhou, Z. and Chen, S., "Diagonal principal component analysis for face recognition", Pattern
Recognition 39 (1), 140-142 (2006).
[4] Kottaimalai, R., Rajasekaran, M., Selvam, V. and Kannapiran, B., "EEG signal classification using Principal
Component Analysis with Neural Network in Brain Computer Interface applications", Emerging Trends in
Computing, Communication and Nanotechnology (ICE-CCN), 227-231 (2013).
[5] Smith, L., "A tutorial on Principal Components Analysis", Cornell University, 1-22 (2002).
[6] Gillies, D., "DOC493: Intelligent Data Analysis and Probabilistic Inference Lecture 15", Department of Computing,
Imperial College London.
[7] Richardson, M., "Principal Component Analysis", Wiley Interdisciplinary Reviews: Computational Statistics (2),
3-14 (2009).
[8] http://www.jascoinc.com/applications (on 30.06.2015).
[9] http://setosa.io/ev/principal-component-analysis/ (on 30.06.2015).
[10] Belka, R., Suchańska, M., Czerwosz, E. and Kęczkowska, J., "Raman studies of Pd-C nanocomposites", Central
European Journal of Physics 11 (2), 245-250 (2013).
[11] Belka, R. and Suchańska, M., "Properties of the carbon-palladium nanocomposites studied by Raman spectroscopy
method", Proceedings of SPIE 8903, (2013).
[12] http://www.sthda.com/english/wiki/principal-component-analysis-the-basics-you-should-read-r-software-and-data-mining (on 30.06.2015).