Data Analysis Case Study: Changes in Sensory Discrimination among 12 Numerical Values


Sensory discrimination methods are supported by the sensR package, which provides a
range of features for these methods. Not every feature is implemented for every method.

Synopsis
A sensor is designed to detect the presence of one of two groups of substances. The
sensor measures 12 different numerical attributes of a sample of an unknown substance to
determine which of the two groups the sample falls into. The sensor also indicates a false
alarm when the sample does not fall into either group.

Attributes
12 numerical values for attributes named Input 1 to Input 12
Dim
Retrieve or set the dimension of an object.
> dim(RedWhiteWine)
[1] 6497 13

Head
head returns the first several rows of a matrix or data frame (six by default).
> head(RedWhiteWine)
  fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
1           4.6             0.52        0.15            2.1     0.054
2           4.7             0.60        0.17            2.3     0.058
3           4.9             0.42        0.00            2.1     0.048
4           5.0             0.38        0.01            1.6     0.048
5           5.0             0.40        0.50            4.3     0.046
6           5.0             0.42        0.24            2.0     0.060
  free.sulfur.dioxide total.sulfur.dioxide density   pH sulphates
1                   8                   65 0.99340 3.90      0.56
2                  17                  106 0.99320 3.85      0.60
3                  16                   42 0.99154 3.71      0.74
4                  26                   60 0.99084 3.70      0.75
5                  29                   80 0.99020 3.49      0.66
6                  19                   50 0.99170 3.72      0.74
  alcohol R.W quality
1    13.1   R       4
2    12.9   R       6
3    14.0   R       7
4    14.0   R       6
5    13.6   R       6
6    14.0   R       8

Tail
tail returns the last several rows of a matrix or data frame (six by default).
> tail(SensorDiscrimination)
> tail(RedWhiteWine)
     fixed.acidity volatile.acidity citric.acid residual.sugar
6492          10.3             0.17        0.47            1.4
6493          10.3             0.25        0.48            2.2
6494          10.7             0.22        0.56            8.2
6495          10.7             0.22        0.56            8.2
6496          11.8             0.23        0.38           11.1
6497          14.2             0.27        0.49            1.1
     chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
6492     0.037                   5                   33  0.9939 2.89
6493     0.042                  28                  164  0.9980 3.19
6494     0.044                  37                  181  0.9980 2.87
6495     0.044                  37                  181  0.9980 2.87
6496     0.034                  15                  123  0.9997 2.93
6497     0.037                  33                  156  0.9920 3.15
     sulphates alcohol R.W quality
6492      0.28     9.6   W       3
6493      0.59     9.7   W       5
6494      0.68     9.5   W       6
6495      0.68     9.5   W       6
6496      0.55     9.7   W       3
6497      0.54    11.1   W       6

Mean
Generic function for the (trimmed) arithmetic mean.
> mean(RedWhiteWine$fixed.acidity)
[1] 7.215346
> mean(RedWhiteWine$volatile.acidity)
[1] 0.339666
> mean(RedWhiteWine$citric.acid)
[1] 0.3186332
> mean(RedWhiteWine$residual.sugar)
[1] 5.445344
> mean(RedWhiteWine$chlorides)
[1] 0.05603386
> mean(RedWhiteWine$free.sulfur.dioxide)
[1] 30.52532
> mean(RedWhiteWine$total.sulfur.dioxide)
[1] 115.7446
> mean(RedWhiteWine$density)
[1] 0.9946966
> mean(RedWhiteWine$pH)
[1] 3.218501
> mean(RedWhiteWine$sulphates)
[1] 0.5312683
> mean(RedWhiteWine$alcohol)
[1] 10.49206
> mean(RedWhiteWine$R.W)
[1] NA

> mean(RedWhiteWine$quality)
[1] 5.818378
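
The per-column calls above can be condensed, and the NA reported for R.W explained, with a short sketch. It uses a small stand-in data frame (`wine` is a hypothetical name with made-up rows shaped like the head output, not the RedWhiteWine data itself) and assumes R.W is stored as a non-numeric column.

```r
# Hypothetical stand-in with two numeric columns and one non-numeric column.
wine <- data.frame(
  alcohol = c(13.1, 12.9, 14.0),
  quality = c(4, 6, 7),
  R.W     = c("R", "R", "W")
)

# Select the numeric columns, then take the mean of each one in a single call.
numeric_cols <- sapply(wine, is.numeric)
col_means <- sapply(wine[numeric_cols], mean)
print(col_means)

# mean() on a non-numeric column returns NA with a warning,
# which matches the NA shown for R.W above.
rw_mean <- suppressWarnings(mean(wine$R.W))
print(rw_mean)
```

Filtering on `is.numeric` first avoids the warning and keeps the result a plain named numeric vector.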

Var
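The variance is a numerical measure of how the data values are dispersed around
the mean. In particular, the sample variance is defined as:

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

where n is the number of observations and \bar{x} is the sample mean.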

> var(SensorDiscrimination$Input.1)
[1] 1.295837
> var(SensorDiscrimination$Input.2)
[1] 1.2879
> var(SensorDiscrimination$Input.3)
[1] 0.8122589
> var(SensorDiscrimination$Input.4)
[1] 1.373952
> var(SensorDiscrimination$Input.5)
[1] 0.4726337
> var(SensorDiscrimination$Input.6)
[1] 0.8495474
> var(SensorDiscrimination$Input.7)
[1] 0.2619285
> var(SensorDiscrimination$Input.8)
[1] 0.06095695
> var(SensorDiscrimination$Input.9)
[1] 0.3875063
> var(SensorDiscrimination$Input.10)
[1] 2.656813
> var(SensorDiscrimination$Input.11)
[1] 1.874917
> var(SensorDiscrimination$Input.12)
[1] 12.4342
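
As a minimal sketch of what var computes, the sample variance (the sum of squared deviations from the mean, divided by n - 1) can be evaluated by hand and compared with the built-in function. The vector `x` below is made-up illustration data, not a column of SensorDiscrimination.

```r
# Made-up illustration data.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample variance from the definition: squared deviations over (n - 1).
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Built-in sample variance; var() also uses the n - 1 denominator.
builtin_var <- var(x)

print(c(manual = manual_var, builtin = builtin_var))
```
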
SD
The standard deviation of an observation variable is the square root of its variance.
> sd(SensorDiscrimination$Input.1)
[1] 1.138348
> sd(SensorDiscrimination$Input.2)
[1] 1.134857
> sd(SensorDiscrimination$Input.3)
[1] 0.9012541
> sd(SensorDiscrimination$Input.4)
[1] 1.172157
> sd(SensorDiscrimination$Input.5)
[1] 0.6874836
> sd(SensorDiscrimination$Input.6)
[1] 0.921709
> sd(SensorDiscrimination$Input.7)
[1] 0.5117895
> sd(SensorDiscrimination$Input.8)
[1] 0.2468946
> sd(SensorDiscrimination$Input.9)
[1] 0.6225
> sd(SensorDiscrimination$Input.10)
[1] 1.629973
> sd(SensorDiscrimination$Input.11)
[1] 1.369276
> sd(SensorDiscrimination$Input.12)
[1] 3.526215
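
The relationship between sd and var can be checked for every column at once. This is a sketch on a hypothetical two-column data frame (`sensor`, with invented values); the same calls should apply to the numeric columns of SensorDiscrimination.

```r
# Hypothetical stand-in for two of the input columns.
sensor <- data.frame(
  Input.1 = c(1.5, 3.0, 4.1, 4.5, 5.1),
  Input.2 = c(0.3, 0.9, 1.5, 2.6, 4.7)
)

# Standard deviation and variance of each column.
sds  <- sapply(sensor, sd)
vars <- sapply(sensor, var)

# Each standard deviation equals the square root of the matching variance.
print(rbind(sd = sds, sqrt_var = sqrt(vars)))
```
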

Length
Get or set the length of vectors
> length(SensorDiscrimination$Input.1)
[1] 2212
> length(SensorDiscrimination$Input.2)
[1] 2212
> length(SensorDiscrimination$Input.3)
[1] 2212
> length(SensorDiscrimination$Input.4)
[1] 2212
> length(SensorDiscrimination$Input.5)
[1] 2212
> length(SensorDiscrimination$Input.6)
[1] 2212
> length(SensorDiscrimination$Input.7)
[1] 2212
> length(SensorDiscrimination$Input.8)
[1] 2212
> length(SensorDiscrimination$Input.9)
[1] 2212
> length(SensorDiscrimination$Input.10)
[1] 2212
> length(SensorDiscrimination$Input.11)
[1] 2212
> length(SensorDiscrimination$Input.12)
[1] 2212

Sum
sum returns the sum of all the values present in its arguments.
> sum(SensorDiscrimination$Input.1)
[1] 7986.513
> sum(SensorDiscrimination$Input.2)
[1] 4075.915
> sum(SensorDiscrimination$Input.3)
[1] 10959.68
> sum(SensorDiscrimination$Input.4)
[1] 10283.42
> sum(SensorDiscrimination$Input.5)
[1] 2183.266
> sum(SensorDiscrimination$Input.6)
[1] 3231.868
> sum(SensorDiscrimination$Input.7)
[1] 1500.403
> sum(SensorDiscrimination$Input.8)
[1] 801.2843
> sum(SensorDiscrimination$Input.9)
[1] 1952.437
> sum(SensorDiscrimination$Input.10)
[1] 6132.516
> sum(SensorDiscrimination$Input.11)
[1] 4178.967
> sum(SensorDiscrimination$Input.12)
[1] 8602.95
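
The sum and length outputs can be cross-checked against the mean, since the arithmetic mean is the sum divided by the number of values. The sketch below reuses the two figures reported above for Input.1 and then repeats the check on a made-up vector `x`.

```r
# Figures reported above for SensorDiscrimination$Input.1.
input_sum <- 7986.513
input_len <- 2212

# The mean implied by those two numbers.
implied_mean <- input_sum / input_len
print(round(implied_mean, 3))

# The same identity on a made-up vector.
x <- c(1.5, 2.5, 4.0)
identity_holds <- isTRUE(all.equal(sum(x) / length(x), mean(x)))
print(identity_holds)
```

The implied mean rounds to 3.611, which agrees with the Mean entry that summary reports for Input.1.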

Range
A range is a vector containing the minimum and maximum of all the given arguments.
> range(SensorDiscrimination$Input.1)
[1] 1.129 5.105
> range(SensorDiscrimination$Input.2)
[1] 0.3247 4.6750
> range(SensorDiscrimination$Input.3)
[1] 2.676 5.944
> range(SensorDiscrimination$Input.4)
[1] 1.705 6.013
> range(SensorDiscrimination$Input.5)
[1] 0.01343 2.75400
> range(SensorDiscrimination$Input.6)
[1] -0.1758 3.6380
> range(SensorDiscrimination$Input.7)
[1] -0.1074 2.4460
> range(SensorDiscrimination$Input.8)
[1] -0.07324 1.19900
> range(SensorDiscrimination$Input.9)
[1] -0.06226 2.56100
> range(SensorDiscrimination$Input.10)
[1] 0.741 5.312
> range(SensorDiscrimination$Input.11)
[1] 0.7043 5.6400
> range(SensorDiscrimination$Input.12)
[1] -0.2417 20.0000
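
Because range returns a two-element vector, the minimum, the maximum, and the spread between them are one step away. A small sketch on made-up values (only the two endpoints are taken from the Input.12 output above):

```r
# Made-up vector whose endpoints match the Input.12 range shown above.
x <- c(-0.2417, 3.5, 20.0)

r <- range(x)        # c(minimum, maximum)
spread <- diff(r)    # maximum minus minimum

print(r)
print(spread)
```
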

Readline
readline reads a line from the terminal in interactive use; its argument is the prompt string to display.
> enames <-readline(SensorDiscrimination)
c(1.473, 1.46, 1.552, 1.605, 1.534, 1.796, 1.566, 1.425, 1.595, 1.628, 1.583, 1.454, 1.617, 1.482, 1.443, 1.477, 1.527, 1.599,
1.587, 1.521, 1.562, 1.438, 1.406, 1.644, 1.378, 1.537, 1.583, 1.527, 1.819, 1.573, 1.516, 1.63, 1.497, 1.688, 1.741, 1.703, 2.6

Names
Functions to get the names of an object.
> names(SensorDiscrimination)
 [1] "Input.1"         "Input.2"         "Input.3"         "Input.4"
 [5] "Input.5"         "Input.6"         "Input.7"         "Input.8"
 [9] "Input.9"         "Input.10"        "Input.11"        "Input.12"
[13] "Substance.Group"


Assigning attributes to variables

> x0 <-SensorDiscrimination$Input.1
> x1 <-SensorDiscrimination$Input.2

Summary
summary is a generic function used to produce summaries of the results of
various model fitting functions. For a numeric vector it reports the minimum,
first quartile, median, mean, third quartile, and maximum.
> summary(x0)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.129 3.005 4.052 3.611 4.485 5.105
> summary(x1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3247 0.8655 1.5330 1.8430 2.6320 4.6750
> summary(SensorDiscrimination$Input.6)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.1758 0.5924 1.5830 1.4610 2.1350 3.6380
> summary(SensorDiscrimination$Input.7)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.1074 0.2954 0.5444 0.6783 0.9229 2.4460
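
To make the six entries of a numeric summary concrete, the sketch below compares summary with quantile and mean on a made-up vector. The vector `x` is illustrative only.

```r
# Made-up data: the integers 1 to 9.
x <- 1:9

s <- summary(x)                                        # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))     # the five quantiles behind it

print(s)
print(q)
```

For this vector the quartiles are 3, 5, and 7, and the mean equals the median (5), so any gap between Mean and Median in the outputs above hints at skew in that input.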

Results
Histogram
A histogram is a visual representation of the distribution of a dataset. As such, the shape
of a histogram is its most obvious and informative characteristic.
It lets us easily see where a relatively large amount of the data is situated and
where there is very little data to be found.
In other words, it shows where the middle of the data distribution is, how closely the data
lie around this middle, and where possible outliers are to be found. For these reasons,
histograms are a great way to get to know your data.
The histograms below show the Sensor Discrimination data for the numerical
attributes named Input 1 to Input 12.

> hist(SensorDiscrimination$Input.1)

> hist(SensorDiscrimination$Input.2)

> hist(SensorDiscrimination$Input.3)

> hist(SensorDiscrimination$Input.8)

> hist(SensorDiscrimination$Input.9)
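
Since the plots themselves are not reproduced here, it may help to see the numbers a histogram is built from. The sketch below calls hist with plot = FALSE on a made-up vector to return the bin counts without drawing; the data and the break points are assumptions chosen for illustration.

```r
# Made-up data with a peak at 3.
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

# plot = FALSE returns the histogram object instead of drawing it.
h <- hist(x, breaks = 0:5, plot = FALSE)

print(h$breaks)   # bin edges
print(h$counts)   # number of values in each bin
print(h$mids)     # bin midpoints
```

The counts rise to the middle bin and fall off on both sides, which is the "middle of the data" that the drawn histogram makes visible.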

Box plot
Boxplots can be created for individual variables or for variables by group. The general
form is boxplot(x, data=), where x is a formula and data= names the data frame providing
the data. The calls below instead pass log-transformed vectors directly.
> x0 <-SensorDiscrimination$Input.1
> x1 <-SensorDiscrimination$Input.2
> boxplot(log(x0),log(x1))

> boxplot(log(SensorDiscrimination$Input.5),log(SensorDiscrimination$Input.4))

> boxplot(log(SensorDiscrimination$Input.1),log(SensorDiscrimination$Input.12))

> boxplot(log(SensorDiscrimination$Input.2),log(SensorDiscrimination$Input.3))
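
The formula form of boxplot groups one variable by another. The sketch below is a hedged illustration on a hypothetical data frame (`sensor`, with invented readings and invented group labels "A" and "B"); plot = FALSE is used so the grouped statistics can be inspected without drawing. Note also that log() of a non-positive reading, such as the negative minimum reported by range for Input.12, yields NaN with a warning, so such values would need handling before a log-scale boxplot.

```r
# Hypothetical readings with invented group labels.
sensor <- data.frame(
  Input.1 = c(1.2, 1.5, 1.9, 2.4, 3.8, 4.1, 4.4, 5.0),
  Substance.Group = rep(c("A", "B"), each = 4)
)

# Formula interface: Input.1 split by Substance.Group.
b <- boxplot(Input.1 ~ Substance.Group, data = sensor, plot = FALSE)

print(b$names)        # group labels
print(b$n)            # observations per group
print(b$stats[3, ])   # third row of the stats matrix holds the medians
```

Dropping plot = FALSE draws one box per group side by side.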

Scatter Plot
A scatter plot pairs up the values of two quantitative variables in a data set and displays
them as points in a Cartesian diagram.
> plot(x0, x1, main="Scatterplot Example",xlab ="Input.1",ylab = "Input.2",pch=19)

> plot(SensorDiscrimination$Input.4,SensorDiscrimination$Input.5, main="Scatterplot Example",xlab ="Input.4",ylab = "Input.5",pch=19)

> plot(SensorDiscrimination$Input.6,SensorDiscrimination$Input.7, main="Scatterplot Example",xlab ="Input.6",ylab = "Input.7",pch=25)

> plot(SensorDiscrimination$Input.8,SensorDiscrimination$Input.12, main="Scatterplot Example",xlab ="Input.8",ylab = "Input.12",pch=100)

> plot(SensorDiscrimination$Input.12,SensorDiscrimination$Input.1, main="Scatterplot Example",xlab ="Input.12",ylab = "Input.1",pch=5)

> group <- as.Date(as.character(SensorDiscrimination$Substance.Group), "%Y%m%d")  # yields NA unless the values are YYYYMMDD strings
> rng <- range(x0, x1, na.rm = TRUE)
> plot(group, x1, pch=20, ylim=rng, xlab = "")

Multiple Scatter plot


R makes it easy to combine multiple scatter plots into one overall graph, as shown
below.
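> par(mfrow=c(1,6))
> with(SensorDiscrimination, plot(Input.1,Input.2, main="Scatterplot of input 1,2"))
> with(SensorDiscrimination, plot(Input.3,Input.4, main="input 3,4"))
> with(SensorDiscrimination, plot(Input.5,Input.6, main="input 5,6"))
> with(SensorDiscrimination, plot(Input.7,Input.8, main="input 7,8"))
> with(SensorDiscrimination, plot(Input.9,Input.10, main="input 9,10"))
> with(SensorDiscrimination, plot(Input.11,Input.12, main="input 11,12"))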
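
The repeated plot calls can also be generated in a loop. The following is a sketch under stated assumptions: `sensor` is a hypothetical six-column stand-in built from sequential numbers, the columns are paired in order, and pdf(NULL) is used so the example runs without opening a window or writing a file.

```r
# Hypothetical stand-in: six columns named Input.1 to Input.6.
sensor <- as.data.frame(matrix(seq_len(24), nrow = 4,
                               dimnames = list(NULL, paste0("Input.", 1:6))))

# Pair the columns in order: (Input.1, Input.2), (Input.3, Input.4), ...
pairs_to_plot <- matrix(names(sensor), ncol = 2, byrow = TRUE)

pdf(NULL)                                            # null device, nothing is written
old_par <- par(mfrow = c(1, nrow(pairs_to_plot)))    # one row, one panel per pair
for (i in seq_len(nrow(pairs_to_plot))) {
  x_name <- pairs_to_plot[i, 1]
  y_name <- pairs_to_plot[i, 2]
  plot(sensor[[x_name]], sensor[[y_name]],
       xlab = x_name, ylab = y_name, main = paste(x_name, "vs", y_name))
}
par(old_par)                                         # restore the previous layout
invisible(dev.off())
```

Taking the axis labels from the column names keeps each panel's labels in step with the data actually plotted.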

Multiple Plots
R makes it easy to combine multiple plots into one overall graph, as shown below.
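> par(mfrow=c(2,2))
> with(SensorDiscrimination, plot(Input.1,Input.2, main="Scatterplot"))
> with(SensorDiscrimination, plot(Input.3,Input.4, main="Scatterplot"))
> with(SensorDiscrimination, hist(Input.5, main="Histogram"))
> with(SensorDiscrimination, boxplot(Input.6, main="Boxplot"))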

Sapply
sapply traverses a set of data such as a list or vector and calls the specified
function on each item. Here the data frame is split by the value of Input.10 and
the rows in each piece are counted.
> sapply(split(SensorDiscrimination,SensorDiscrimination$Input.10),nrow)
 0.741 0.8484  0.863 0.8667 0.8789 0.8801 0.8887 0.8899 0.8936 0.8984 0.9033
     1      1      1      1      1      1      2      1      2      1      1
0.9045 0.9058  0.907 0.9082 0.9106 0.9131 0.9143 0.9155 0.9167  0.918 0.9192
     1      1      1      1      1      1      2      1      1      2      2
0.9216 0.9253  0.929 0.9302 0.9338 0.9351 0.9363 0.9375 0.9387 0.9424 0.9436
     2      1      1      1      3      2      1      1      3      3      2
0.9448  0.946 0.9473 0.9485 0.9497 0.9521 0.9534 0.9558  0.957 0.9583 0.9595
     2      1      2      2      3      2      1      3      1      2      5
0.9607 0.9619 0.9631 0.9644 0.9656 0.9668  0.968 0.9692 0.9705 0.9717 0.9729
     2      3      2      2      2      2      1      3      1      3      1
0.9753 0.9766 0.9778  0.979 0.9802 0.9814 0.9839 0.9851 0.9863 0.9875 0.9888
     3      4      2      1      4      2      1      1      1      1      4
0.9912 0.9924 0.9937 0.9949 0.9961 0.9973 0.9985  1.001  1.003  1.005  1.007
     1      2      2      1      1      3      1      2      2      3      1
  1.01  1.011  1.012  1.013  1.014  1.016  1.017  1.018  1.019  1.022  1.023
     3      2      2      1      2      1      3      3      2      1      2
 1.024  1.025  1.027  1.028   1.03  1.031  1.033  1.034  1.035  1.036  1.038
     4      3      2      1      1      1      6      1      1      2      1
 1.039   1.04  1.041  1.042  1.044  1.047  1.049   1.05  1.051  1.052  1.053
...
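
Splitting on a continuous reading such as Input.10 produces one tiny group per distinct value, which is why most counts above are 1 or 2. A sketch on a hypothetical data frame (`sensor`, with invented rows and invented group labels) contrasts that with splitting on a categorical column such as Substance.Group:

```r
# Hypothetical rows: a continuous reading and a categorical group label.
sensor <- data.frame(
  Input.10 = c(0.741, 0.8887, 0.8887, 0.8936, 0.8936),
  Substance.Group = c("A", "A", "B", "B", "B")
)

# One group per distinct reading: many small counts.
counts_by_value <- sapply(split(sensor, sensor$Input.10), nrow)
print(counts_by_value)

# One group per label: a compact count per substance group.
counts_by_group <- sapply(split(sensor, sensor$Substance.Group), nrow)
print(counts_by_group)

# table() gives the same per-group counts in one call.
print(table(sensor$Substance.Group))
```
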
