
Pavel Paclik, Serguei Verzakov, and Robert P.W.Duin

Contents

1 Hypertools toolbox
2 Installing hypertools
3 Data handling
3.1 Data handling in hypertools
3.2 Spectral images in PRTools dataset
3.3 Conversion between DIP image and dataset representation
3.4 Importing binary BioRad FTIR data
4.1 Plotting spectral data
4.2 Interactive visualization tool for spectral images
4.3 Area under spectra
5 Preprocessing
5.1 Baseline subtraction
5.2 Smoothing of spectral data
5.3 Unmixing
6 Dissimilarity measures
6.1 Dissimilarity measures implemented in hypertools
6.2 Visualization using dissimilarity measures
6.3 Building dissimilarity representation for pattern recognition
7.1 Generalized Local Discriminant Bases (GLDB)
7.2 Multi-class GLDB feature extraction
7.3 Genetic algorithm for feature selection
7.4 Maximum Autocorrelation Transformation (MAF)
7.5 Principal Component Analysis
7.6 PCA shaving
7.7 Canonical Correlation Analysis (CCA)
7.8 Partial Least Squares (PLS) regression mapping
8 Image segmentation
8.1 Segmentation combining spatial and spectral domain
8.2 ECHO segmentation
9 Classifiers

1 Hypertools toolbox

Hypertools is a Matlab toolbox for the analysis of hyperspectral images. It contains algorithms for visualization, preprocessing, representation and classification of spectral data. This toolbox is being developed at TU Delft in The Netherlands within the Hyperspectral Image Analysis project, sponsored by the Dutch technology foundation STW. Hypertools is available under an academic or a commercial license. Although we especially target spectral images, a number of routines may also be used for generic spectral datasets without spatial information. Hypertools is based on the PRTools toolbox version 4. This document briefly describes how to use hypertools for analyzing spectral images.

2 Installing hypertools

Hypertools requires Matlab version 6.1 or higher and PRTools version 4.x. Many routines in hypertools also require DIPimage version 1.4 or higher. To install hypertools, extract the archive into a directory and add its path to the Matlab environment.

3 Data handling

3.1 Data handling in hypertools

Hyperspectral images consist of spectral measurements organized in a spatial setup:

In hypertools, image cubes may be stored either in a PRTools dataset or in a dip_image. Which representation is preferable depends on the type of processing we want to execute on the data cube. For extensive filtering or 2D-connected processing, dip_image is a good choice. For pattern recognition tasks such as clustering, image segmentation, feature extraction or classification, the dataset representation offers far more flexibility. Because of the availability of additional meta information, hypertools uses the PRTools dataset as the primary data representation.

In this tutorial, we illustrate various data analysis approaches on a spectral image from a plastic sorting application (NIR spectra). The image depicts four types of plastics and a background class. Throughout the text, the image a1 is used for training and a2 as a test set.
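The relation between an image cube and the dataset representation can be sketched in a few lines. The snippet below is plain Python, not hypertools code; the function names and the row-by-row pixel order are assumptions made for illustration only (the toolbox may order the pixels differently). It flattens a rows x cols x bands cube into a table with one spectrum per row, which is how a 33 by 40 image with 240 wavelengths becomes a 1320 by 240 dataset.

```python
def cube_to_dataset(cube):
    """Flatten a rows x cols x bands cube (nested lists) into one spectrum per row.

    Pixels are visited row by row; this ordering is an assumption for illustration.
    """
    return [list(spectrum) for row in cube for spectrum in row]


def dataset_to_cube(data, rows, cols):
    """Inverse of cube_to_dataset for a known image size."""
    if len(data) != rows * cols:
        raise ValueError("number of spectra does not match rows * cols")
    return [[list(data[r * cols + c]) for c in range(cols)] for r in range(rows)]


# A small synthetic cube with the same layout as the tutorial image.
rows, cols, bands = 33, 40, 240
cube = [[[float(r + c + b) for b in range(bands)] for c in range(cols)]
        for r in range(rows)]
data = cube_to_dataset(cube)
print(len(data), "by", len(data[0]))
```

The inverse function needs the image size, which is the role played by the objsize field in the dataset representation.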

3.2 Spectral images in PRTools dataset

Spectral images stored in a PRTools dataset contain the following meta information in structure fields:

spectral image dimensions in objsize and the number of spectral wavelengths in featsize

>> a1

1320 by 240 dataset with 5 classes: [372 360 258 161 169]

>> a1.objsize

ans =

33 40

>> a1.featsize

ans =

240

optional labels per spectrum in nlab. Labelling of spectral images may be visualized as an image using the getli (get label image) function:

>> a1

1320 by 240 dataset with 5 classes: [372 360 258 161 169]

>> getli(a1)

Displayed in figure 3

unique identifier per spectrum (pixel) in ident. Identifiers are useful for tracing back which pixels are present in a subset of the original spectral image.

>> a1

1320 by 240 dataset with 5 classes: [372 360 258 161 169]

% randomly select 1% of the pixels per class

>> proto=gendat(a1,0.01)

15 by 240 dataset with 5 classes: [4 4 3 2 2]

>> drawident(a1(:,100),proto)

Displayed in figure 3


Figure: Highlighting the prototypes

version info capturing the PRTools version used for dataset generation and the date of dataset creation. The dataset name may identify the dataset content or the project name:

>> a1

1320 by 240 dataset with 5 classes: [372 360 258 161 169]

>> a1.version{1}

ans =

Version: 3.2.5

Release:

Date: 04-Apr-2003

>> a1.version{2}

ans =

22-Sep-2003 15:33:40

>> a1.name='projectA'

projectA, 1320 by 240 dataset with 5 classes: [372 360 258 161 169]

additional info in the user structure. Additional information, such as spectral range, units, or

spectral imaging technique description, may be stored in the user field.

3.3 Conversion between DIP image and dataset representation

In hypertools, spectral image data may be transformed back and forth between PRTools datasets and dip_image objects using the data2dip and dip2data functions. Both transformations preserve data connectivity, but when the data are transformed into a dip_image, the meta information stored in the dataset is lost.

todo:examples of data2dip, dip2data and fig2dip

3.4 Importing binary BioRad FTIR data

Binary BioRad files may be imported as spectral datasets using the ftir_load routine. This routine is experimental and limited to transmission BioRad datasets. The starting wavelength and the step may be defined. This meta information is stored in the dataset user field and is used, e.g., when rendering spectral domain plots via plots.

>> fim=ftir_load('a1.dat',910.399010, 15.43049)

4096 by 512 dataset with 0 classes: []

>> fim.user

ans =

type: spectra

format: FTIR converted from bio-rad

mode: transmission

units: cm^-1

start_wavelength: 910.3990

step: 15.4305

>> plots(gendat(fim,10))

4.1 Plotting spectral data

Spectral data stored in a PRTools dataset may be plotted using the plots function. Dataset features are assumed to represent densely sampled wavelengths. Each data sample is rendered as a 1D function of the wavelength.

4.2 Interactive visualization tool for spectral images

Hypertools contains a simple interactive visualization tool for spectral imagery. It allows the user to inspect the spectral and the spatial data domains simultaneously. Using the showsi command, the spectral image is rendered in two windows: the spatial view (using dip_image) and the spectral plot (using plots). It is useful to store the handle returned by the showsi command, as it enables access to the visualized data and provides additional functionality.

>> a1

projectA, 1320 by 240 dataset with 5 classes: [372 360 258 161 169]

>> sih=showsi(a1);


Figure: Visualizing the spectral image using showsi

When the mouse pointer moves over the spatial image, the spectrum at the current point is shown in the spectral plot. A spectral wavelength may be selected by pressing the left mouse button in the spectral plot and dragging the mouse over the plot. The spatial view is updated accordingly.

Points may be selected by clicking in the spatial image. The corresponding spectra are also plotted in the spectral pane. Three buttons in the spectral figure allow the user to choose among three different colors or classes. A right mouse click in the spatial window cancels the point selection.

Figure: Highlighting the points of interest

Selected points may be retrieved from the spectral image view using the spectral image handle sih

and used e.g. as prototypes. A dissimilarity may be computed from all the data points to the selected

prototypes:

>> proto=si_get_spectra(sih)

5 by 240 dataset with 3 classes: [2 2 1]

>> D=dasam(a,proto)

5280 by 5 dataset with 5 classes: [1461 1426 1025 664 704]

% the dissimilarities are computed and returned as a dataset,

% which may be visualized using the scatterdui command:

>> fig=scatterdui(D);

% fig is a figure handle of the scatter plot which we will use later.


Figure: Scatter plot of a dissimilarity space representation

Buttons along each scatter axis allow us to step easily between different feature space dimensions. Clicking on a data point shows its sample index nearby. This enables us to trace a point in the feature space back to a sample or a pixel.

In order to see the correspondence between different representations of spectral data, scatter plots may be attached to the spectral image. We need the handles of both the spectral image (sih) and the scatter plot (fig):

>> si_attach_display(sih,fig)

Now, we may move the mouse over the spatial image and observe the corresponding data sample in

the feature space, denoted by the yellow circle.

4.3 Area under spectra

todo:example of generating an area under the spectrum image

5 Preprocessing

5.1 Baseline subtraction

A baseline may be subtracted from a dataset using the basesubm mapping. First, a single spectrum to be used for the baseline correction must be selected, and then the baseline region must be identified. Both steps may be carried out using the interactive showsi tool included in hypertools:

>> a1

1320 by 240 dataset with 5 classes: [372 360 258 161 169]

>> sih=showsi(a1)

>> b=si_get_spectra(sih)

1 by 240 dataset with 1 class: [1]

>> w=basesubm(b,[1:40 220:240])

Baseline subtraction mapping, 240 to 179 trained mapping --> basesubm

>> c=a1*w

1320 by 179 dataset with 5 classes: [372 360 258 161 169]

>> sih2=showsi(c)


Figure: Spectral image with subtracted baseline

The baseline subtraction mapping reduced the dimensionality of our dataset from the original 240 to 179 wavelengths. The difference of 61 equals the number of wavelengths in the baseline region [1:40 220:240]. Please note the zero values in the tails of the corrected spectra, which result from the default clipping.
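The text does not describe how basesubm estimates the baseline, so the following plain Python sketch should be read as one plausible scheme under stated assumptions, not as the toolbox implementation: a straight line is fitted by least squares to the reference spectrum inside the baseline region, the line is subtracted from every spectrum, negative values are clipped to zero, and the baseline wavelengths are dropped. All function names and the synthetic spectrum are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def train_baseline(reference, region):
    """Estimate a linear baseline from the reference spectrum inside the region."""
    slope, intercept = fit_line(region, [reference[i] for i in region])
    return slope, intercept, set(region)


def apply_baseline(spectrum, model):
    """Subtract the baseline, clip negatives to zero, drop the baseline bands."""
    slope, intercept, region = model
    return [max(0.0, v - (slope * i + intercept))
            for i, v in enumerate(spectrum) if i not in region]


# 0-based equivalent of the region [1:40 220:240] used above: 61 bands.
region = list(range(0, 40)) + list(range(219, 240))
# Made-up spectrum: a sloping baseline plus one triangular peak around band 120.
reference = [0.01 * i + 2.0 + max(0.0, 5.0 - abs(i - 120) / 10.0)
             for i in range(240)]
model = train_baseline(reference, region)
corrected = apply_baseline(reference, model)
print(len(corrected))
```

With these assumptions the output has 240 - 61 = 179 bands, in line with the dimensionality reported for the trained mapping above.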

5.2 Smoothing of spectral data

Two smoothing algorithms are implemented in hypertools via the smoothm mapping: Gaussian and Savitzky-Golay. For Gaussian smoothing, the parameter sigma may be set. For Savitzky-Golay smoothing, three parameters may be set: the window size, the polynomial degree and the derivative order.

>> w=smoothm

Spectral smoothing mapping (Gaussian sigma=1.0), fixed mapping --> smoothm

>> w=smoothm('gauss',3)

Spectral smoothing mapping (Gaussian sigma=3.0), fixed mapping --> smoothm

>> w=smoothm('sg')

Spectral smoothing mapping (Savitsky-Golay ws=3,p=1,d=0), fixed mapping --> smoothm

>> w=smoothm('sg',7,2)

Spectral smoothing mapping (Savitsky-Golay ws=7,p=2,d=0), fixed mapping --> smoothm

>> a1

1320 by 240 dataset with 5 classes: [372 360 258 161 169]

>> b=gendat(a1,1)

1 by 240 dataset with 5 classes: [0 1 0 0 0]

>> plots(b)

% smoothing

>> w=smoothm('sg',7,2)

Spectral smoothing mapping (Savitsky-Golay ws=7,p=2,d=0), fixed mapping --> smoothm

>> plots(b*w,'r')


Figure: Smoothing spectra. Original spectrum (blue) and the smoothed spectrum (red)
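To make the two smoothing options concrete, the following plain Python sketch filters a spectrum with a Gaussian kernel and with the standard 7-point quadratic Savitzky-Golay weights (the ws=7, p=2, d=0 setting used above). It illustrates the filters themselves and is not the smoothm code; the truncation of the Gaussian at three sigma, the border handling by repeating the edge values, and all names are assumptions.

```python
import math


def gaussian_kernel(sigma):
    """Normalized Gaussian weights truncated at three standard deviations."""
    half = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Standard Savitzky-Golay smoothing weights: 7-point window, quadratic fit,
# derivative order 0.
SG_7_2 = [c / 21.0 for c in (-2, 3, 6, 7, 6, 3, -2)]


def smooth(spectrum, kernel):
    """Filter a spectrum with a symmetric kernel, repeating the edge values."""
    half = len(kernel) // 2
    last = len(spectrum) - 1
    out = []
    for i in range(len(spectrum)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), last)  # clamp the index at the borders
            acc += w * spectrum[j]
        out.append(acc)
    return out


# Made-up spectrum: a smooth bump plus an alternating disturbance.
clean = [math.exp(-((i - 50) / 12.0) ** 2) for i in range(100)]
noisy = [v + (0.05 if i % 2 else -0.05) for i, v in enumerate(clean)]
by_gauss = smooth(noisy, gaussian_kernel(3.0))
by_sg = smooth(noisy, SG_7_2)
```

A useful property of the Savitzky-Golay weights is that, away from the borders, they reproduce any polynomial up to the fitted degree exactly, so peak shapes are distorted less than with a Gaussian of comparable width.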

5.3 Unmixing

Unmixing is also known as blind source separation (in signal processing), multivariate curve resolution (in chemometrics) or factor analysis. The goal of unmixing is to represent a dataset as a product of two matrices: the concentrations (scores) and the spectra (loadings) of pure components. For spectral data, we can make use of the non-negativity of both matrices (scores and loadings). The following unmixing routines are implemented in hypertools:

varimax : VARIMAX, given the loadings found by PCA, tries to find the rotation after which they look as sparse as possible, i.e. it assumes that pure spectra consist of a number of compact peaks. Varimax is provided in two versions:

varimaxfm performing feature selection and

varimaxom implementing object selection.

opa : OPA (orthogonal projection approach) looks for the set of the most dissimilar (orthogonal) spectra.

opam implements selection of the most orthogonal features,

opaps selection of the most orthogonal prototypes (examples).

simplisma : SIMPLISMA is similar to OPA but also takes into account the purity of the candidate spectra (a pure spectrum is supposed to have a large variance).

simplismam implements selection of the purest features,

simplismaps selection of the purest prototypes (examples).

als : ALS (alternating least squares) is the last step in the unmixing procedure. Taking as input the data and the candidate loadings (pure spectra) found by the previous routines, it decomposes the data into positive concentration and spectra matrices.

todo:unmixing examples

>> a1

1320 by 240 dataset with 5 classes: [372 360 258 161 169]

>> b=simplismaps(a1)

5 by 240 dataset with 5 classes: [1 1 2 0 1]

>> [c,b2]=als(a1,b)

1320 by 5 dataset with 5 classes: [372 360 258 161 169]


>> OPAOptions.maxsn = inf;

>> OPAOptions.eps = 0.05;

>> OPAOptions.include_mean = 0;

>> OPAOptions.verbose = 0;

>> [Y, Y_ind, dis_max, dis] = opa(X,OPAOptions);

>> ALSOptions.mode = 'samples'; % 'samples' | 'features'

>> ALSOptions.maxiter = 100;

>> ALSOptions.crit = 'rec'; % 'conv' | 'rec'

>> ALSOptions.eps = 1e-2;

>> ALSOptions.verbose = 0;

>> [Zp,Yp] = als(X,Y,ALSOptions);

6 Dissimilarity measures

A dissimilarity measure assigns a scalar value to the difference between two spectra. The dissimilarity values may be used for visualization or for building a dissimilarity representation for pattern recognition.

6.1 Dissimilarity measures implemented in hypertools

dasam : Spectral Angle Mapper (SAM), (arc cosine)

dsam : Spectral Angle Mapper (as normalized inner product)

dkolmogorov : Kolmogorov dissimilarity between cumulative distributions, computed from unit-normalized spectra

dmatch : matching dissimilarity (sum of differences between cumulative distributions, computed from unit-normalized spectra)

dspec_shape : L1 norm between derivatives of spectra (using a smoothed Gaussian derivative filter)

dquadform : quadratic form dissimilarity
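Three of the measures listed above are simple enough to sketch in plain Python. The code below follows the textbook definitions and is not taken from the toolbox: the spectral angle is the arc cosine of the normalized inner product, and the Kolmogorov and matching dissimilarities compare cumulative distributions. Normalizing each spectrum to unit area and using absolute differences are assumptions, as are the function names.

```python
import math


def spectral_angle(x, y):
    """Spectral Angle Mapper: angle in radians between two spectra seen as vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    cosine = max(-1.0, min(1.0, dot / (nx * ny)))  # guard against rounding
    return math.acos(cosine)


def cumulative(x):
    """Cumulative distribution of a spectrum normalized to unit area."""
    total = float(sum(x))
    out, acc = [], 0.0
    for v in x:
        acc += v / total
        out.append(acc)
    return out


def kolmogorov(x, y):
    """Largest absolute difference between the two cumulative distributions."""
    return max(abs(a - b) for a, b in zip(cumulative(x), cumulative(y)))


def match(x, y):
    """Sum of absolute differences between the two cumulative distributions."""
    return sum(abs(a - b) for a, b in zip(cumulative(x), cumulative(y)))


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]  # same shape as a, twice the intensity
c = [4.0, 3.0, 2.0, 1.0]  # mirrored shape
print(spectral_angle(a, b), spectral_angle(a, c))
print(kolmogorov(a, c), match(a, c))
```

The spectral angle between a and b is zero up to rounding, which shows why SAM is popular for spectra: it ignores overall intensity differences such as those caused by illumination.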


6.2 Visualization using dissimilarity measures

Dissimilarities may be computed between a complete spectral image and a set of prototype spectra. In the following example, we randomly choose a set of prototypes from a labelled spectral image and compute the Spectral Angle Mapper dissimilarity between all image spectra and these prototypes. The result of the dissimilarity computation is a dataset with 15 features (each measuring the dissimilarity to the corresponding prototype). Because the dataset originates from a spectral image, it may be converted into a dip_image object and visualized.

>> a1

projectA, 1320 by 240 dataset with 5 classes: [372 360 258 161 169]

>> proto=gendat(a1,0.01)

projectA, 15 by 240 dataset with 5 classes: [4 4 3 2 2]

>> D=dasam(a1,proto)

1320 by 15 dataset with 5 classes: [372 360 258 161 169]

>> data2dip(D)

Displayed in figure 1

The image contains 15 bands (features in a dataset), rendering the dissimilarities to the 15 prototypes. By pressing the n or p keys, we can move back and forth between the bands (a dip_image feature). The following figure shows the dissimilarity to the 5th prototype (indices in the DIPimage package are zero-based).

6.3 Building dissimilarity representation for pattern recognition

Dissimilarity measures may be used to create a representation on which a classifier is built. Traditionally, mean class spectra are used as prototypes, a spectra-specific dissimilarity to these prototypes is computed, and the minimum distance classifier (mindistc) is applied.

% compute mean class spectra - use them as class prototypes

>> m=meancov(a1)

5 by 240 dataset with 5 classes: [1 1 1 1 1]

>> dtr=dasam(a1,m)

1320 by 5 dataset with 5 classes: [372 360 258 161 169]

>> dts=dasam(a2,m)

1320 by 5 dataset with 5 classes: [359 353 254 170 184]

>> w=mindistc(dtr)

Minimum distance classifier, 5 to 5 trained mapping --> mindistc

% execute the trained mapping on the test set and get the average class error:

>> dts*w*testc

ans =

0.0992

>> lab=dts*w*classim;

>> getli(lab)

Displayed in figure 1

Prototypes (the representation set) may also be selected randomly or via the interactive tool, as shown above.
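The decision rule behind a minimum distance classifier on a dissimilarity representation can be sketched as follows. This is a plain Python illustration with made-up numbers, assuming one prototype per column of the dissimilarity matrix; it is not the mindistc implementation, and the simple error fraction computed here is not necessarily the quantity reported by testc.

```python
def nearest_prototype_labels(D, proto_labels):
    """Label each object with the class of its least dissimilar prototype.

    D holds one row per object and one column per prototype.
    """
    labels = []
    for row in D:
        best = min(range(len(row)), key=row.__getitem__)
        labels.append(proto_labels[best])
    return labels


def error_rate(predicted, truth):
    """Fraction of misclassified objects."""
    wrong = sum(1 for p, t in zip(predicted, truth) if p != t)
    return wrong / len(truth)


# Made-up dissimilarities of four spectra to three class prototypes.
D = [
    [0.05, 0.40, 0.60],
    [0.50, 0.10, 0.45],
    [0.70, 0.35, 0.08],
    [0.30, 0.28, 0.55],
]
proto_labels = ["A", "B", "C"]
truth = ["A", "B", "C", "A"]
predicted = nearest_prototype_labels(D, proto_labels)
print(predicted, error_rate(predicted, truth))
```

The last spectrum is nearly equally dissimilar to the first two prototypes, which is the typical situation in which such a rule makes errors.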

7.1 Generalized Local Discriminant Bases (GLDB)

The Generalized Local Discriminant Bases (GLDB) feature extraction algorithm was proposed by Kumar, Ghosh, and Crawford:

Kumar, Ghosh, Crawford: Best-Bases Feature Extraction Algorithms for Classification of Hyperspectral Data, IEEE Trans. on Geoscience and Remote Sensing, vol. 39, no. 7, July 2001

The GLDB algorithm splits a spectrum into a set of non-overlapping regions maximizing the separability between classes. It starts with all the wavelengths forming singleton feature groups. It then tries to grow each group and selects the one maximizing a criterion based on the Fisher ratio and minimum correlation (max-min). The groups grow until no further improvement can be made. Applied to a training dataset, the GLDB algorithm results in a set of non-overlapping wavelength groups and the corresponding Fisher projection mappings. Applied to a test set, a new feature space is generated by projecting each wavelength group into the 1D Fisher space. In order to mitigate the influence of non-informative data, the authors recommend running a subsequent feature selection or extraction algorithm after applying the GLDB method.

The gldbm routine implements the bottom-up two-class GLDB feature extractor as proposed by Kumar et al. The selected wavelength groups may be visualized by the plot_gldb_groups function.

>> a1

1320 by 240 dataset with 5 classes: [372 360 258 161 169]

>> b1=seldat(a1,[2 3])

projectA, 618 by 240 dataset with 2 classes: [360 258]

>> w=gldbm(b1)

.......Best Bases mapping, 240 to 50 trained mapping --> gldbm

>> c1=b1*w

projectA, 618 by 50 dataset with 2 classes: [360 258]

>> plot_gldb_groups(b1,w)

As you can see, the GLDB feature extraction algorithm decomposes the complete spectral range into a set of wavelength groups. Many of these probably do not add any discriminatory information to the classification problem at hand. In order to identify the informative groups, a second-stage feature selection may be carried out.

In the following example, we select the best features generated by the GLDB extraction using a sequential forward selection procedure. Because GLDB is effectively both feature selection (groups of adjacent wavelengths) and feature extraction (within each group), the feature selection result may be combined with the trained GLDB mapping, leading to a reduced GLDB mapping:

% select the best subset of GLDB features in the dataset c1

>> wfsel=featself_simple(c1)

Forward Feature Selection, 50 to 1 fixed mapping --> cmapm

% let us now derive a reduced GLDB mapping retaining only wavelength groups,

% selected by the feature selection

>> wnew=gldbm_featsel(w,wfsel)

240 to 1 trained mapping --> gldbm

% we can look into the wnew mapping to see only the group of wavelengths 87 to 170

% is used in this reduced GLDB mapping:

>> getdata(wnew)

clf: [1x1 struct]

l: 87

u: 170
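The sequential forward selection used in the second stage is a generic greedy procedure, sketched here in plain Python. The stopping rule (stop when no candidate improves the criterion) and the toy criterion are assumptions for illustration; the text does not state which criterion or stopping rule featself_simple uses.

```python
def forward_selection(n_features, criterion, max_features=None):
    """Greedy forward selection.

    `criterion` maps a list of feature indices to a score (higher is better).
    In every step the feature that gives the best score is added; the search
    stops when no candidate improves the score or the limit is reached.
    """
    selected, best_score = [], float("-inf")
    limit = n_features if max_features is None else min(max_features, n_features)
    while len(selected) < limit:
        candidates = [f for f in range(n_features) if f not in selected]
        score, feature = max((criterion(selected + [f]), f) for f in candidates)
        if score <= best_score:
            break  # no candidate improves the criterion
        selected.append(feature)
        best_score = score
    return selected, best_score


USEFULNESS = [0.1, 0.9, 0.2, 0.85]  # made-up per-feature scores


def toy_criterion(subset):
    """Made-up criterion: reward useful features, penalize size and redundancy."""
    score = sum(USEFULNESS[f] for f in subset) - 0.15 * len(subset)
    if 1 in subset and 3 in subset:
        score -= 0.8  # features 1 and 3 carry the same information
    return score


selected, score = forward_selection(4, toy_criterion)
print(selected, score)
```

In this toy problem the second-best single feature is never selected because it is redundant with the best one, which mirrors the situation where neighbouring wavelength groups carry nearly the same information.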

Hypertools also provides an implementation of the top-down GLDB algorithm using the log-odds probability-based criterion (gldbm_td_prob), as proposed by the GLDB authors. We have also implemented the top-down GLDB using the apparent error criterion (gldbm_td_ae) and using the combined Fisher separability and correlation criterion that is also used in the bottom-up case (gldbm_td).

7.2 Multi-class GLDB feature extraction

Originally, the GLDB algorithm was defined for two-class problems only. For multi-class problems, the authors proposed to derive all pair-wise GLDB extractors and to train the classifiers in the respective feature spaces. In a C-class classification problem, a new object is subjected to all C(C-1)/2 stored feature extractors and classifiers and finally to a majority voting combiner.
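The pair-wise scheme with a majority voting combiner can be sketched in plain Python as follows. The pair-wise "classifiers" here are hypothetical stand-ins (nearest of two 1D class centres) used only to make the example runnable; in the scheme described above each of them would be a GLDB extractor followed by a trained classifier.

```python
from collections import Counter
from itertools import combinations


def pairwise_vote(x, classes, pair_classifiers):
    """Classify x by majority vote over all C(C-1)/2 pair-wise classifiers.

    `pair_classifiers[(a, b)]` is a callable returning either a or b for x.
    """
    votes = Counter(pair_classifiers[pair](x) for pair in combinations(classes, 2))
    return votes.most_common(1)[0][0]


# Hypothetical stand-in: five classes with 1D class centres.
centers = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}


def make_pair(a, b):
    """Two-class rule choosing the nearer of the two class centres."""
    return lambda x: a if abs(x - centers[a]) <= abs(x - centers[b]) else b


classes = sorted(centers)
classifiers = {(a, b): make_pair(a, b) for a, b in combinations(classes, 2)}
print(len(classifiers), pairwise_vote(2.1, classes, classifiers))
```

For five classes this already requires ten extractor and classifier pairs per object, which is the execution cost that a single multi-class extractor avoids.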

In our paper:

Paclik, Verzakov, Duin, Multi-class extensions of the GLDB feature extraction algorithm for spectral

data, In proc. of ICPR, 2004

we have presented two alternative solutions that significantly limit the execution complexity of multi-class classifiers employing GLDB feature extraction. The hypertools toolbox contains an implementation of one of them: a GLDB feature extractor with a multi-class criterion. The criterion utilizes the same concept as in the two-class case: a combination of Fisher separability and inter-band correlation. If more than two classes are available, the Fisher projection yields min(number of wavelengths in a group, C-1) output features for each wavelength group.

% train the multi-class GLDB extraction on a training set a1 with five classes:

>> a1

1320 by 240 dataset with 5 classes: [372 360 258 161 169]

>> w=gldbm_multi(a1)

Best Bases mapping, 240 to 42 trained mapping --> gldbm_multi

% let us look into the mapping (asking for a user-specific data of the trained mapping):

>> getdata(w)

clf: [1x26 struct]

l: [1x26 double]

u: [1x26 double]

info_iter: 94

We can observe that the multi-class GLDB mapping is composed of 26 wavelength groups but yields 42 features, because with C=5 classes each group may contribute up to C-1=4 output features.

Similarly to the two-class GLDB, a second-stage feature selection may be carried out and the resulting feature selection mapping combined with the trained GLDB mapping, so that only the informative wavelength groups are retained.

>> c1=a1*w

1320 by 42 dataset with 5 classes: [372 360 258 161 169]

>> wfsel=featself_simple(c1)

Forward Feature Selection, 42 to 22 fixed mapping --> cmapm

>> wnew=gldbm_mc_featsel(w,wfsel)

240 to 22 trained mapping --> gldbm

Note that the feature selection operates on the level of output features, not of wavelength groups.

Finally, hypertools provides a multi-class GLDB extractor leveraging the non-linear Fisher criterion, introduced in:

Marco Loog, R.P.W. Duin, R. Haeb-Umbach, Multiclass Linear Dimension Reduction by Weighted

Pairwise Fisher Criteria, IEEE PAMI, vol. 23, no. 7, July 2001

The non-linear Fisher criterion is beneficial in situations where some classes are very distant from the others in the feature space. While the classical Fisher projection would emphasize the distant class, the non-linear criterion re-weights the class contributions and offers a better overall performance. In PRTools, the non-linear Fisher mapping is implemented in the nlfisherm function. Hypertools implements the non-linear multi-class GLDB in the nlgldbm routine.

7.3 Genetic algorithm for feature selection

A genetic algorithm is an optimization method based on evolutionary concepts. A genetic feature selection algorithm works as follows: feature subsets, encoded by binary vectors, form a population of solutions. The quality of a feature subset (in genetic terminology, a chromosome) may be evaluated by a criterion based on class separability. The initial population of feature subsets is generated randomly and all chromosomes are evaluated. Chromosomes providing better class separation have a higher chance of being selected for mating than the worse ones. The crossover operation randomly mixes the features of pairs of good chromosomes. The underlying idea is that the generated offspring may often improve on the qualities of the parents. Additionally, with low probability, some chromosomes in the population are subjected to mutation, which means that some genes (features) are randomly flipped. Mutation introduces new qualities or distortions not present in the original population. In the optimization sense, mutation may help to escape from a local optimum.
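The steps above can be sketched in plain Python. This is a minimal illustration, not the genfeatsel routine: the fitness-proportional selection, the uniform crossover, the elitism step (keeping the best chromosome), the parameter names and the toy fitness function are all assumptions chosen to keep the example short.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=100, generations=5,
                              mutation_rate=0.02, seed=0):
    """Return (best_chromosome, history); history holds the best score per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        low = min(scores)
        # Better chromosomes get a larger chance of being selected for mating.
        weights = [s - low + 1e-9 for s in scores]
        children = [best[:]]  # elitism (an assumption): the best subset survives
        while len(children) < pop_size:
            mother, father = rng.choices(population, weights=weights, k=2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [m if rng.random() < 0.5 else f for m, f in zip(mother, father)]
            # Mutation: flip a gene with low probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Toy criterion (made up): reward three "informative" features, penalize subset size.
INFORMATIVE = {2, 5, 7}


def toy_fitness(chromosome):
    hits = sum(1 for i, g in enumerate(chromosome) if g and i in INFORMATIVE)
    return hits - 0.1 * sum(chromosome)


best, history = genetic_feature_selection(12, toy_fitness, pop_size=30, generations=5)
print(best, history)
```

Because of the elitism step, the best score in the population never decreases from one generation to the next; without it, crossover and mutation could destroy the best solution found so far.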

The hypertools toolbox provides a simple genetic algorithm, genfeatsel, for feature selection, as described in:

Siedlecki, Sklansky, A note on genetic algorithms for large-scale feature selection, Pattern Recognition Letters, vol. 10, pp. 335-347, 1989

The apparent error of the Fisher classifier is used to measure chromosome quality. In the following experiment, we use the genetic algorithm to select the best feature subset for a two-class dataset. We build a population of 100 chromosomes (solutions) and perform 5 generations:

>> a1

projectA, 1320 by 240 dataset with 5 classes: [372 360 258 161 169]

% training set

>> b=seldat(a1,[2 3])

projectA, 618 by 240 dataset with 2 classes: [360 258]

>> w=genfeatsel(b,100,5)

Feature Selection, 240 to 2 fixed mapping --> cmapm

>> ts=seldat(a2,[2 3])

607 by 240 dataset with 2 classes: [353 254]

>> scatterdui(ts*w)

Figure: Scatter plot of a test set with features, identified by genetic algorithm

7.4 Maximum Autocorrelation Transformation (MAF)

MAF is a special-purpose version of Principal Component Analysis (PCA) for image data. The covariance matrix is modified such that the covariances are computed for one-pixel shifted images. Further, the transformed data are constrained to have unit variance. As a result, the transformation maximizes the image autocorrelation.

>> a1

projectA, 1320 by 240 dataset with 5 classes: [372 360 258 161 169]

>> w=maf(a1,0.9)

240 to 2 trained mapping --> affine

>> scatterdui(a1*w)


Figure: Scatter plot of the spectral data projected by MAF mapping

7.5 Principal Component Analysis

todo:describe the use of pcatrain, pcaapply, pcanpc, pcacrit and pcastat

7.6 PCA shaving

PCA shaving performs a backward elimination procedure to find groups (clusters) of correlated wavelengths. Elimination is based on ranking the wavelengths according to their contribution to the first PC loading. PCA shaving may be run in a supervised or an unsupervised mode.

The hypertools pcashave implementation is fully functional but does not yet provide a unified mapping output. The following figure illustrates a possible use of PCA shaving for unsupervised grouping of wavelengths in a spectrum. The wavelength color denotes the membership in one of 10 identified groups.

7.7 Canonical Correlation Analysis (CCA)

Canonical Correlation Analysis computes linear transformations of the data X to a new representation T and of the output (target) data Y to a new representation U such that the i-th columns of T and U have the maximum possible correlation and, at the same time, are orthogonal to the previous columns in both matrices. The columns of T and U are normalized to have unit variance. Because X and Y may not have full rank, the pseudo-inverse pinv is used in the algorithm.

>> b=seldat(a1,[2 3])

projectA, 618 by 240 dataset with 2 classes: [360 258]

>> bts=seldat(a2,[2 3])

607 by 240 dataset with 2 classes: [353 254]

>> plot_cca(b,bts)

Figure: Superimposed training (lighter colors) and test data (dark colors), projected by the CCA mapping

We can see the bright unimodal clusters corresponding to the training data projected by the CCA mapping. The darker markers denote the test set, projected using the same mapping. It is apparent that, in this case, the training set is not representative of the problem.

7.8 Partial Least Squares (PLS) regression mapping

Partial Least Squares (PLS) is a multiple linear regression technique which maps input data onto a set of target variables. It can be used for regression, classification (targets derived from crisp labels) or visualization. In hypertools, PLS regression is implemented as the plsm mapping.

todo:plsm examples

8 Image segmentation

Image segmentation is an unsupervised pattern recognition technique producing a unique assignment of

image pixels (spectra) into a set of classes. Because even the number of classes is usually unknown, image

segmentation is, in fact, an ill-posed clustering problem.

8.1 Segmentation combining spatial and spectral domain

In segment_comb, hypertools implements an image segmentation algorithm that combines spectral and spatial information using a combined classifier approach:

Paclik P., Duin R.P.W., van Kempen G.M.P., Kohlus R.: Segmentation of multi-spectral images using the combined classifier approach, Image and Vision Computing, vol. 21, no. 6, pp. 473-482, June 2003

First, a set of labels is created by clustering in the spectral data domain. Then, in a loop, separate classifiers are trained and executed in both domains: by default, the nearest mean classifier (nmc) in the spectral domain and the Parzen classifier with a Gaussian kernel in the spatial domain (implemented by convolution). The outputs of both domains are combined using the product combination rule. The process is repeated until stability.
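The product combination step of the loop can be sketched in plain Python. The sketch assumes that each domain delivers, for every pixel, a list of class posteriors; the function names and the numbers are made up, and the training of the two classifiers and the iteration until stability are left out.

```python
def combine_product(spectral_post, spatial_post):
    """Product rule: multiply the per-class posteriors of both domains and renormalize."""
    combined = []
    for p_spec, p_spat in zip(spectral_post, spatial_post):
        prod = [a * b for a, b in zip(p_spec, p_spat)]
        total = sum(prod)
        combined.append([v / total for v in prod] if total > 0 else prod)
    return combined


def labels_from_posteriors(posteriors):
    """Assign each pixel to the class with the largest combined posterior."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


# Made-up posteriors for three pixels and two classes.
spectral = [[0.45, 0.55], [0.20, 0.80], [0.60, 0.40]]
spatial = [[0.90, 0.10], [0.30, 0.70], [0.70, 0.30]]
combined = combine_product(spectral, spatial)
print(labels_from_posteriors(spectral), labels_from_posteriors(combined))
```

The first pixel shows the intended effect: its spectrum weakly favours the second class, but the spatial neighbourhood strongly supports the first class, so the combined label changes and the segmentation becomes spatially more consistent.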

First, we segment a spectral image using the raw spectral data with 240 spectral wavelengths. We cluster the data using the k-means algorithm and using the combined spectral-spatial segmentation algorithm (starting from the output of k-means). In the results below, we can see that some spatial inconsistencies are reduced by the combined segmentation algorithm because the spatial information is employed.

>> a1

projectA, 1320 by 240 dataset with 5 classes: [372 360 258 161 169]

>> lab=kmeans(a1,5);

>> getli(lab,a1) % output is a dip_image window

>> speccolormap % high contrast color map for labels

% combined segmentation, starting from the labels provided by k-means

>> seg=segment_comb(setlabels(a1,lab),5)

>> speccolormap

Figure: Result of kmeans clustering (left) and combined spectral-spatial algorithm (right) using raw

spectra

In the second experiment, we build a dissimilarity-based representation of the spectral image using the Spectral Angle Mapper (SAM) distance measure. A set of five randomly selected prototypes forms the representation set. Again, both the k-means and the combined spectral-spatial algorithms generate an image labelling:

>> proto=gendat(+a1,5)

5 by 240 dataset with 1 class: [5]

>> drawident(a1(:,100),proto)

>> D=dasam(a1,proto)

1320 by 5 dataset with 5 classes: [372 360 258 161 169]


Figure: Five randomly selected prototype pixels (spectra)

>> lab=kmeans(D,5);

>> getli(lab,a1) % left figure

Figure: Result of kmeans clustering (left) and combined spectral-spatial algorithm (right) using SAM

distances

The k-means clustering algorithm does not take spatial connectivity into account and, therefore, provides a noisy solution, which may be homogenized using the combined spectral-spatial algorithm.

8.2 ECHO segmentation

The ECHO algorithm, proposed by D. Landgrebe

todo:echo

9 Classifiers

Hypertools implements several classifiers, traditionally used by the spectral community:

mindistc : Minimum distance classifier for dissimilarity to prototypes. An example is given in section 6.3

corrc : correlation classifier

samc : Spectral Angle Mapper classifier

todo:simca description + simcac and simcam examples
