Synopsis
A sensor is designed to detect the presence of one of two groups of substances. The
sensor measures 12 different numerical attributes of a sample of an unknown substance to
determine which of the two groups the sample falls into. The sensor also indicates a false
alarm when the sample does not fall into either group.
Attributes
12 numerical values for attributes named Input 1 to Input 12
Dim
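Retrieve or set the dimension of an object. For a data frame the result is the number of rows followed by the number of columns, so RedWhiteWine has 6497 rows and 13 columns.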
> dim(RedWhiteWine)
[1] 6497 13
Head
head returns the first several rows (six by default) of a matrix or data frame.
> head(RedWhiteWine)
  fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
1           4.6             0.52        0.15            2.1     0.054
2           4.7             0.60        0.17            2.3     0.058
3           4.9             0.42        0.00            2.1     0.048
4           5.0             0.38        0.01            1.6     0.048
5           5.0             0.40        0.50            4.3     0.046
6           5.0             0.42        0.24            2.0     0.060
  free.sulfur.dioxide total.sulfur.dioxide density   pH sulphates
1                   8                   65 0.99340 3.90      0.56
2                  17                  106 0.99320 3.85      0.60
3                  16                   42 0.99154 3.71      0.74
4                  26                   60 0.99084 3.70      0.75
5                  29                   80 0.99020 3.49      0.66
6                  19                   50 0.99170 3.72      0.74
  alcohol R.W quality
1    13.1   R       4
2    12.9   R       6
3    14.0   R       7
4    14.0   R       6
5    13.6   R       6
6    14.0   R       8
Tail
tail returns the last several rows (six by default) of a matrix or data frame.
> tail(SensorDiscrimination)
> tail(RedWhiteWine)
     fixed.acidity volatile.acidity citric.acid residual.sugar
6492          10.3             0.17        0.47            1.4
6493          10.3             0.25        0.48            2.2
6494          10.7             0.22        0.56            8.2
6495          10.7             0.22        0.56            8.2
6496          11.8             0.23        0.38           11.1
6497          14.2             0.27        0.49            1.1
     chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
6492     0.037                   5                   33  0.9939 2.89
6493     0.042                  28                  164  0.9980 3.19
6494     0.044                  37                  181  0.9980 2.87
6495     0.044                  37                  181  0.9980 2.87
6496     0.034                  15                  123  0.9997 2.93
6497     0.037                  33                  156  0.9920 3.15
     sulphates alcohol R.W quality
6492      0.28     9.6   W       3
6493      0.59     9.7   W       5
6494      0.68     9.5   W       6
6495      0.68     9.5   W       6
6496      0.55     9.7   W       3
6497      0.54    11.1   W       6
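The three inspection functions above can be tried together on a small data frame. The sketch below is only an illustration: `wine_demo` and its values are made up and merely stand in for RedWhiteWine.

```r
# Hypothetical stand-in data; the values are made up for illustration.
wine_demo <- data.frame(
  alcohol = c(13.1, 12.9, 14.0, 14.0, 13.6, 14.0, 9.6, 9.7),
  R.W     = c("R", "R", "R", "R", "R", "R", "W", "W")
)

shape     <- dim(wine_demo)           # rows first, then columns
first_six <- head(wine_demo)          # default n = 6
first_two <- head(wine_demo, n = 2)   # only the first two rows
last_two  <- tail(wine_demo, n = 2)   # only the last two rows
all_but_2 <- head(wine_demo, n = -2)  # a negative n drops the last 2 rows
```

The `n` argument is the usual way to see more or fewer than six rows.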
Mean
Generic function for the (trimmed) arithmetic mean.
> mean(RedWhiteWine$fixed.acidity)
[1] 7.215346
> mean(RedWhiteWine$volatile.acidity)
[1] 0.339666
> mean(RedWhiteWine$citric.acid)
[1] 0.3186332
> mean(RedWhiteWine$residual.sugar)
[1] 5.445344
> mean(RedWhiteWine$chlorides)
[1] 0.05603386
> mean(RedWhiteWine$free.sulfur.dioxide)
[1] 30.52532
> mean(RedWhiteWine$total.sulfur.dioxide)
[1] 115.7446
> mean(RedWhiteWine$density)
[1] 0.9946966
> mean(RedWhiteWine$pH)
[1] 3.218501
> mean(RedWhiteWine$sulphates)
[1] 0.5312683
> mean(RedWhiteWine$alcohol)
[1] 10.49206
> mean(RedWhiteWine$R.W)
[1] NA
> mean(RedWhiteWine$quality)
[1] 5.818378
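The NA returned for R.W arises because that column holds the R/W labels rather than numbers, and mean only handles numeric or logical input (R also issues a warning in this case). One possible way to get every column mean in a single call while skipping non-numeric columns is sketched below; `wine_demo` is a made-up stand-in, not the RedWhiteWine data.

```r
# Hypothetical stand-in data; the values are made up for illustration.
wine_demo <- data.frame(
  alcohol = c(13.1, 12.9, 9.6, 9.7),
  quality = c(4, 6, 3, 5),
  R.W     = c("R", "R", "W", "W")
)

# Flag the numeric columns, then apply mean to each of them.
is_num    <- sapply(wine_demo, is.numeric)
col_means <- sapply(wine_demo[is_num], mean)

# A label column can still be summarised by counting each label.
type_counts <- table(wine_demo$R.W)
```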
Var
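The variance is a numerical measure of how the data values are dispersed around
the mean. In particular, the sample variance of observations $x_1, \dots, x_n$ with sample mean $\bar{x}$ is defined as:
$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$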
> var(SensorDiscrimination$Input.1)
[1] 1.295837
> var(SensorDiscrimination$Input.2)
[1] 1.2879
> var(SensorDiscrimination$Input.3)
[1] 0.8122589
> var(SensorDiscrimination$Input.4)
[1] 1.373952
> var(SensorDiscrimination$Input.5)
[1] 0.4726337
> var(SensorDiscrimination$Input.6)
[1] 0.8495474
> var(SensorDiscrimination$Input.7)
[1] 0.2619285
> var(SensorDiscrimination$Input.8)
[1] 0.06095695
> var(SensorDiscrimination$Input.9)
[1] 0.3875063
> var(SensorDiscrimination$Input.10)
[1] 2.656813
> var(SensorDiscrimination$Input.11)
[1] 1.874917
> var(SensorDiscrimination$Input.12)
[1] 12.4342
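To connect var with the definition of the sample variance (the sum of squared deviations from the mean, divided by n - 1), the following sketch computes it by hand on a made-up vector and compares the result with the built-in function.

```r
x <- c(2.1, 3.4, 1.9, 4.8, 3.3)   # made-up values for illustration
n <- length(x)

# Sum of squared deviations from the mean, divided by n - 1.
manual_var  <- sum((x - mean(x))^2) / (n - 1)
builtin_var <- var(x)

agree <- isTRUE(all.equal(manual_var, builtin_var))
```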
SD
The standard deviation of an observation variable is the square root of its variance.
> sd(SensorDiscrimination$Input.1)
[1] 1.138348
> sd(SensorDiscrimination$Input.2)
[1] 1.134857
> sd(SensorDiscrimination$Input.3)
[1] 0.9012541
> sd(SensorDiscrimination$Input.4)
[1] 1.172157
> sd(SensorDiscrimination$Input.5)
[1] 0.6874836
> sd(SensorDiscrimination$Input.6)
[1] 0.921709
> sd(SensorDiscrimination$Input.7)
[1] 0.5117895
> sd(SensorDiscrimination$Input.8)
[1] 0.2468946
> sd(SensorDiscrimination$Input.9)
[1] 0.6225
> sd(SensorDiscrimination$Input.10)
[1] 1.629973
> sd(SensorDiscrimination$Input.11)
[1] 1.369276
> sd(SensorDiscrimination$Input.12)
[1] 3.526215
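The relationship between sd and var can be checked directly. The sketch below does so on a made-up vector, and also on the two figures reported above for Input.1.

```r
x <- c(2.1, 3.4, 1.9, 4.8, 3.3)   # made-up values for illustration
same_on_x <- isTRUE(all.equal(sd(x), sqrt(var(x))))

# Figures copied from the var and sd output above for Input.1.
reported_var <- 1.295837
reported_sd  <- 1.138348
gap <- abs(sqrt(reported_var) - reported_sd)  # small; both values are rounded
```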
Length
Get or set the length of a vector. For a data frame column this is the number of observations.
> length(SensorDiscrimination$Input.1)
[1] 2212
> length(SensorDiscrimination$Input.2)
[1] 2212
> length(SensorDiscrimination$Input.3)
[1] 2212
> length(SensorDiscrimination$Input.4)
[1] 2212
> length(SensorDiscrimination$Input.5)
[1] 2212
> length(SensorDiscrimination$Input.6)
[1] 2212
> length(SensorDiscrimination$Input.7)
[1] 2212
> length(SensorDiscrimination$Input.8)
[1] 2212
> length(SensorDiscrimination$Input.9)
[1] 2212
> length(SensorDiscrimination$Input.10)
[1] 2212
> length(SensorDiscrimination$Input.11)
[1] 2212
> length(SensorDiscrimination$Input.12)
[1] 2212
Sum
sum returns the sum of all the values present in its arguments.
> sum(SensorDiscrimination$Input.1)
[1] 7986.513
> sum(SensorDiscrimination$Input.2)
[1] 4075.915
> sum(SensorDiscrimination$Input.3)
[1] 10959.68
> sum(SensorDiscrimination$Input.4)
[1] 10283.42
> sum(SensorDiscrimination$Input.5)
[1] 2183.266
> sum(SensorDiscrimination$Input.6)
[1] 3231.868
> sum(SensorDiscrimination$Input.7)
[1] 1500.403
> sum(SensorDiscrimination$Input.8)
[1] 801.2843
> sum(SensorDiscrimination$Input.9)
[1] 1952.437
> sum(SensorDiscrimination$Input.10)
[1] 6132.516
> sum(SensorDiscrimination$Input.11)
[1] 4178.967
> sum(SensorDiscrimination$Input.12)
[1] 8602.95
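Because the mean is the sum divided by the number of observations, the sum and length outputs above can be cross-checked against each other. A small sketch using the reported Input.1 figures, plus a made-up vector `x`:

```r
# Figures copied from the sum and length output above for Input.1.
reported_sum <- 7986.513
reported_n   <- 2212
implied_mean <- reported_sum / reported_n   # mean = sum / length

# The same identity on a small made-up vector.
x <- c(1.5, 2.5, 4.0)
same <- isTRUE(all.equal(mean(x), sum(x) / length(x)))
```

Rounded to four significant digits, `implied_mean` agrees with the Mean of 3.611 that summary(x0) reports for Input.1.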
Range
range returns a vector containing the minimum and maximum of all the given arguments.
> range(SensorDiscrimination$Input.1)
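As a brief illustration of what range returns and how it treats missing values, here is a sketch on a made-up vector (not the sensor data):

```r
x <- c(3.2, 1.1, 4.8, NA, 2.5)      # made-up values with one missing entry

with_na <- range(x)                 # a missing value makes both ends NA
r       <- range(x, na.rm = TRUE)   # minimum and maximum of the rest
spread  <- diff(r)                  # maximum minus minimum
```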
Readline
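readline reads a line of input from the terminal in interactive use. Its argument is the prompt string to display, not a data set; in the call below the data frame is passed as the prompt, and the output shown appears to be its first column echoed back.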
> enames <-readline(SensorDiscrimination)
c(1.473, 1.46, 1.552, 1.605, 1.534, 1.796, 1.566, 1.425, 1.595, 1.628, 1.583, 1.454, 1.617, 1.482, 1.443, 1.477, 1.527, 1.599,
1.587, 1.521, 1.562, 1.438, 1.406, 1.644, 1.378, 1.537, 1.583, 1.527, 1.819, 1.573, 1.516, 1.63, 1.497, 1.688, 1.741, 1.703, 2.6
Names
Functions to get the names of an object.
> names(SensorDiscrimination)
 [1] "Input.1"         "Input.2"         "Input.3"         "Input.4"
 [5] "Input.5"         "Input.6"         "Input.7"         "Input.8"
 [9] "Input.9"         "Input.10"        "Input.11"        "Input.12"
[13] "Substance.Group"
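The character vector returned by names can be used to pick out columns or to rename them. The sketch below is an illustration on a made-up two-row data frame (`sensor_demo`, with hypothetical group labels "A" and "B"); the new name "Group" is likewise only an example.

```r
# Hypothetical stand-in data; values and group labels are made up.
sensor_demo <- data.frame(
  Input.1 = c(1.5, 1.6),
  Input.2 = c(0.4, 0.5),
  Substance.Group = c("A", "B")
)

# Select every column whose name starts with "Input."
input_cols <- grep("^Input\\.", names(sensor_demo), value = TRUE)

# names() can also be assigned to, which renames a column in place.
names(sensor_demo)[names(sensor_demo) == "Substance.Group"] <- "Group"
```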
Summary
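summary is a generic function that produces result summaries of its argument, including the results of
various model fitting functions. For a numeric vector it reports the minimum, first quartile, median, mean, third quartile, and maximum. Here x0 and x1 hold Input.1 and Input.2; they are assigned in the Box plot section.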
> summary(x0)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.129 3.005 4.052 3.611 4.485 5.105
> summary(x1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3247 0.8655 1.5330 1.8430 2.6320 4.6750
> summary(SensorDiscrimination$Input.6)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.1758 0.5924 1.5830 1.4610 2.1350 3.6380
> summary(SensorDiscrimination$Input.7)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.1074 0.2954 0.5444 0.6783 0.9229 2.4460
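For a numeric vector, the six figures in a summary correspond to the quantiles at 0, 25, 50, 75, and 100 percent together with the mean. The following sketch shows that correspondence on a made-up vector.

```r
x <- c(1.2, 3.0, 4.1, 4.5, 5.1, 2.8)   # made-up values for illustration

s <- summary(x)                        # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))

# Rebuild the summary from its parts: three quantiles, the mean, two more.
rebuilt <- c(q[1:3], mean(x), q[4:5])
```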
Results
Histogram
A histogram is a visual representation of the distribution of a dataset. As such, the shape
of a histogram is its most obvious and informative characteristic.
It allows us to see easily where a relatively large amount of the data is situated and
where there is very little data to be found.
In other words, a histogram shows where the middle of the data distribution lies, how closely the data cluster around
this middle, and where possible outliers are to be found. Because of all this, histograms
are a great way to get to know your data.
The histograms below are of the Sensor Discrimination data, for the numerical
attributes named Input 1 to Input 12.
> hist(SensorDiscrimination$Input.1)
> hist(SensorDiscrimination$Input.2)
> hist(SensorDiscrimination$Input.3)
> hist(SensorDiscrimination$Input.8)
> hist(SensorDiscrimination$Input.9)
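The numbers behind a histogram can be inspected without drawing it. The sketch below uses simulated data (a made-up normal sample, not the sensor inputs) and asks hist to return the bins only.

```r
set.seed(1)                              # make the simulated sample repeatable
x <- rnorm(200, mean = 3.6, sd = 1.1)    # made-up data for illustration

# plot = FALSE returns the bin edges and counts instead of drawing.
h <- hist(x, breaks = 10, plot = FALSE)

bin_edges  <- h$breaks   # one more edge than there are bins
bin_counts <- h$counts   # number of observations in each bin
```

Note that `breaks = 10` is treated by hist as a suggestion, so the number of bins actually used may differ.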
Box plot
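Boxplots can be created for individual variables or for variables by group. The formula format
is boxplot(x, data=), where x is a formula and data= denotes the data frame providing the data.
The calls below instead pass log-transformed numeric vectors directly, which draws one box per vector.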
> x0 <- SensorDiscrimination$Input.1
> x1 <- SensorDiscrimination$Input.2
> boxplot(log(x0),log(x1))
> boxplot(log(SensorDiscrimination$Input.5),log(SensorDiscrimination$Input.4))
> boxplot(log(SensorDiscrimination$Input.1),log(SensorDiscrimination$Input.12))
> boxplot(log(SensorDiscrimination$Input.2),log(SensorDiscrimination$Input.3))
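A minimal sketch of the formula interface described above, which draws one box per group. The data frame `sensor_demo` and its group labels "A" and "B" are made up; the actual values of Substance.Group are not shown in this report. The last lines illustrate one caveat of the log transform: it is undefined for negative values, and the summaries above show negative minima for Input.6 and Input.7.

```r
# Hypothetical stand-in data; values and group labels are made up.
sensor_demo <- data.frame(
  Input.1 = c(1.5, 1.6, 1.4, 3.9, 4.2, 4.0),
  Substance.Group = c("A", "A", "A", "B", "B", "B")
)

# plot = FALSE returns the box statistics instead of drawing.
b <- boxplot(Input.1 ~ Substance.Group, data = sensor_demo, plot = FALSE)
group_names   <- b$names        # one box per group
group_medians <- b$stats[3, ]   # row 3 of the statistics holds the medians

# log() of a negative number is NaN (R warns), so such points are lost.
# The two values are the Min. and 1st Qu. reported above for Input.6.
logged  <- suppressWarnings(log(c(-0.1758, 0.5924)))
has_nan <- any(is.nan(logged))
```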
Scatter Plot
A scatter plot pairs up values of two quantitative variables in a data set and displays
them as geometric points inside a Cartesian diagram.
> plot(x0, x1, main="Scatterplot Example",xlab ="Input.1",ylab = "Input.2",pch=19)
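> # Substance.Group is a group label, not a date, so treat it as a factor
> group <- as.factor(SensorDiscrimination$Substance.Group)
> rng <- range(x0, x1, na.rm = TRUE)
> # as.integer(group) gives one x position per group
> plot(as.integer(group), x1, pch = 20, ylim = rng, xlab = "")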
Multiple Plots
R makes it easy to combine multiple plots into one overall graph, as shown below.
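> par(mfrow = c(2, 2))  # arrange the next four plots in a 2 x 2 grid
> plot(SensorDiscrimination$Input.1, SensorDiscrimination$Input.2, main = "Scatterplot")
> plot(SensorDiscrimination$Input.3, SensorDiscrimination$Input.4, main = "Scatterplot")
> hist(SensorDiscrimination$Input.5, main = "Histogram")
> boxplot(SensorDiscrimination$Input.6, main = "Boxplot")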
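Setting mfrow stays in effect for later plots on the same device, so it is often convenient to save the previous settings and restore them afterwards. The sketch below shows one way to do that on made-up vectors `x` and `y`; it draws to an off-screen device (`pdf(NULL)`) only so that it runs without a display, which is an assumption of this example and not part of the original session.

```r
pdf(NULL)                          # off-screen device; no file is written
old_par <- par(mfrow = c(2, 2))    # par() returns the previous settings

x <- c(1.5, 2.0, 2.7, 3.1, 4.4)    # made-up values for illustration
y <- c(0.4, 0.9, 1.1, 1.8, 2.6)

plot(x, y, main = "Scatterplot")
plot(y, x, main = "Scatterplot")
hist(x, main = "Histogram")
boxplot(y, main = "Boxplot")

par(old_par)                       # back to one plot per page
restored <- par("mfrow")
invisible(dev.off())               # close the off-screen device
```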
Sapply
sapply traverses a set of data such as a list or vector and calls the specified
function on each item, simplifying the results to a vector or matrix where possible.
> sapply(split(SensorDiscrimination, SensorDiscrimination$Input.10), nrow)
 0.741 0.8484  0.863 0.8667 0.8789 0.8801 0.8887 0.8899 0.8936 0.8984 0.9033
     1      1      1      1      1      1      2      1      2      1      1
0.9045 0.9058  0.907 0.9082 0.9106 0.9131 0.9143 0.9155 0.9167  0.918 0.9192
     1      1      1      1      1      1      2      1      1      2      2
0.9216 0.9253  0.929 0.9302 0.9338 0.9351 0.9363 0.9375 0.9387 0.9424 0.9436
     2      1      1      1      3      2      1      1      3      3      2
0.9448  0.946 0.9473 0.9485 0.9497 0.9521 0.9534 0.9558  0.957 0.9583 0.9595
     2      1      2      2      3      2      1      3      1      2      5
0.9607 0.9619 0.9631 0.9644 0.9656 0.9668  0.968 0.9692 0.9705 0.9717 0.9729
     2      3      2      2      2      2      1      3      1      3      1
0.9753 0.9766 0.9778  0.979 0.9802 0.9814 0.9839 0.9851 0.9863 0.9875 0.9888
     3      4      2      1      4      2      1      1      1      1      4
0.9912 0.9924 0.9937 0.9949 0.9961 0.9973 0.9985  1.001  1.003  1.005  1.007
     1      2      2      1      1      3      1      2      2      3      1
  1.01  1.011  1.012  1.013  1.014  1.016  1.017  1.018  1.019  1.022  1.023
     3      2      2      1      2      1      3      3      2      1      2
 1.024  1.025  1.027  1.028   1.03  1.031  1.033  1.034  1.035  1.036  1.038
     4      3      2      1      1      1      6      1      1      2      1
 1.039   1.04  1.041  1.042  1.044  1.047  1.049   1.05  1.051  1.052  1.053
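Splitting on a continuous column such as Input.10 yields one small group per distinct value, as the output above shows. Splitting on a grouping column is a common alternative, sketched below on made-up data; `sensor_demo` and the labels "A" and "B" are hypothetical, since the actual values of Substance.Group are not shown in this report.

```r
# Hypothetical stand-in data; values and group labels are made up.
sensor_demo <- data.frame(
  Input.10 = c(0.90, 0.95, 1.01, 1.02, 0.98),
  Substance.Group = c("A", "A", "B", "B", "B")
)

# split() returns a list with one data frame per group.
by_group <- split(sensor_demo, sensor_demo$Substance.Group)

# sapply() calls the function on each list element and returns a named vector.
group_sizes <- sapply(by_group, nrow)
group_means <- sapply(by_group, function(d) mean(d$Input.10))
```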