
Problem 3

Statistic            m2            Price         price/m2      Construction year
Mean                 45.018        148124.712    3401.56       1957.4
Standard Error       1.6259        4362.7338     95.2751       0.8447
Median               48            149000        3322          1955
1st quartile         36.5          130250        3039.75       1954
3rd quartile         52.875        165000        3617          1959
Mode                 48            110000        N/A           1954
Standard Deviation   11.4969       30849.1863    673.6968      5.9727
Sample Variance      132.1794653   9.5167E+08    453867.3943   35.6735
Kurtosis             -0.4417       -0.0166       4.3338        2.3710
Skewness             -0.6508       0.03440       1.7144        1.8547
Range                45            140006        3464          21
Minimum              20            77994         2400          1952
Maximum              65            218000        5864          1973
Sum                  2250.9        7406235.6     170078        97870
Count                50            50            50            50

[Figure: Box plot - Price/m2]

[Figure: Box plot - Area (m2)]
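As a cross-check of the m2 column of the descriptive statistics, the following minimal sketch recomputes a few of the figures with Python's standard `statistics` module. The list `m2` is copied from the m2 column of the data for problems 3-4-5; the variable names are illustrative, and the `inclusive` quantile method is an assumption chosen because it reproduces the quartiles reported in the table.

```python
import statistics

# Apartment areas (m2), copied from the data for problems 3-4-5.
m2 = [
    22, 22.5, 20, 27, 25, 27, 26, 34, 27, 32,
    48, 47, 35, 51.5, 52.5, 35, 41, 52, 42.5, 52,
    42, 48, 51, 48, 45, 35, 60, 55, 45, 48,
    50, 45, 53.4, 48, 54, 55, 53.5, 60, 49, 60,
    48, 55, 48, 53, 47, 51, 65, 42, 56, 62,
]

count = len(m2)
mean = statistics.mean(m2)
variance = statistics.variance(m2)       # sample variance (n - 1 denominator)
std_dev = statistics.stdev(m2)           # sample standard deviation
std_error = std_dev / count ** 0.5       # standard error of the mean

# Quartiles with linear interpolation over the closed range [min, max].
q1, median, q3 = statistics.quantiles(m2, n=4, method="inclusive")

print(f"count={count} mean={mean:.3f} std_error={std_error:.4f}")
print(f"q1={q1} median={median} q3={q3}")
print(f"std_dev={std_dev:.4f} variance={variance:.4f}")
```

The same calls can be repeated for the Price, price/m2 and construction-year columns.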

[Figure: Apartments by construction year]

[Figure: Box plot - Price]

         m2       Price    price/m2
25th     36.5     130250   3039.75
50th     11.5     18750    282.25
75th     4.875    16000    295
lower    16.5     52256    639.75
upper    12.125   53000    2247
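The helper values for the box plots appear to be segment lengths for a stacked-bar style box plot: the 25th row holds the first quartile itself, and the other rows hold differences between neighbouring summary values. The sketch below is an interpretation inferred from the numbers, not a description of the original spreadsheet formulas; the function name is hypothetical.

```python
def box_plot_segments(minimum, q1, median, q3, maximum):
    """Return the segment lengths used to draw a box plot from stacked bars.

    Assumed meaning of each entry (inferred from the reported values):
      25th  -> first quartile (start of the box)
      50th  -> median minus first quartile (lower half of the box)
      75th  -> third quartile minus median (upper half of the box)
      lower -> first quartile minus minimum (lower whisker)
      upper -> maximum minus third quartile (upper whisker)
    """
    return {
        "25th": q1,
        "50th": median - q1,
        "75th": q3 - median,
        "lower": q1 - minimum,
        "upper": maximum - q3,
    }


# Five-number summaries taken from the descriptive statistics of Problem 3.
m2_segments = box_plot_segments(20, 36.5, 48, 52.875, 65)
price_segments = box_plot_segments(77994, 130250, 149000, 165000, 218000)

print(m2_segments)
print(price_segments)
```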

Problem 4
Analyze the correlation between price per m2 and the m2, construction year and condition of the flat.
Regression Statistics
Multiple R          0.710462679
R Square            0.504757219
Adjusted R Square   0.472458776
Standard Error      489.3196916
Observations        50

ANOVA
             df   SS           MS        F          Significance F
Regression   3    11225549.3   3741850   15.62791   3.81315E-07
Residual     46   11013953     239434
Total        49   22239502.3

                    Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept           57167.74046    22989.6637       2.48667    0.016585   10891.94825    103443.5
m2                  -36.8973676    6.11807657       -6.03088   2.6E-07    -49.2124168    -24.5823
Construction year   -26.70170116   11.7566349       -2.2712    0.02786    -50.36657939   -3.03682
condition           127.5992637    110.668395       1.15299    0.254872   -95.16465752   350.3632


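With R Square = 0.505 (Multiple R = 0.710) and Significance F = 3.8E-07, the regression is statistically significant: the chosen independent variables explain about half of the variation in the dependent variable (price per m2).
There is a negative relationship between the area and the price per m2 (coefficient -36.90, p = 2.6E-07), which means the larger the apartment, the cheaper it gets per m2.
There is a negative relationship between the price per m2 and the construction year (coefficient -26.70, p = 0.028). This is contrary to my expectation.
There is a positive relationship between the condition and the price per m2 (coefficient 127.60), which is quite reasonable: the better the flat, the more expensive it is. However, with p = 0.255 this coefficient is not statistically significant at the 5% level.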
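The summary figures of the regression can be re-derived from the ANOVA sums of squares. The following sketch is only an arithmetic cross-check using the standard definitions of R Square, Adjusted R Square and the F statistic; the variable names are illustrative, and the residual sum of squares is taken as printed (rounded to a whole number).

```python
# Sums of squares and sizes as reported in the ANOVA table of Problem 4.
ss_regression = 11225549.3
ss_residual = 11013953.0
n_obs = 50          # observations
k = 3               # independent variables: m2, construction year, condition

ss_total = ss_regression + ss_residual
df_residual = n_obs - k - 1

r_squared = ss_regression / ss_total
multiple_r = r_squared ** 0.5
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
f_stat = (ss_regression / k) / (ss_residual / df_residual)

print(f"R Square          = {r_squared:.6f}")
print(f"Multiple R        = {multiple_r:.6f}")
print(f"Adjusted R Square = {adj_r_squared:.6f}")
print(f"F                 = {f_stat:.4f}")
```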
Problem 5
Independent variables
m2                  60
Construction year   1971
Condition           1

Estimated market price per m2:

57167.74 - 36.8974*60 - 26.7017*1971 + 127.5993*1 = 2452.4

The price is a bit lower than I expected, though it may make sense because the rent for my room is the lowest in the neighborhood.
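The estimate can be reproduced by plugging the inputs into the fitted regression equation from Problem 4. This is a minimal sketch using the unrounded coefficients from the regression output; the function and dictionary names are hypothetical.

```python
# Coefficients copied from the Problem 4 regression output.
COEFFICIENTS = {
    "intercept": 57167.74046,
    "m2": -36.8973676,
    "construction_year": -26.70170116,
    "condition": 127.5992637,
}


def estimate_price_per_m2(m2, construction_year, condition):
    """Evaluate the fitted linear model for one apartment."""
    return (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["m2"] * m2
        + COEFFICIENTS["construction_year"] * construction_year
        + COEFFICIENTS["condition"] * condition
    )


estimate = estimate_price_per_m2(m2=60, construction_year=1971, condition=1)
print(round(estimate, 1))  # 2452.4
```

Rounding the coefficients to one or two decimals before multiplying shifts the result by a few units, because the construction-year term multiplies the coefficient by 1971.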

Data for problems 3, 4 and 5

Area: Pohjois Haaga. Condition: 2 = good, 1 = satisfactory, 0 = terrible.

Price      price/m2   m2     Construction year   Condition   Floor   Elevator   Type
129000     5864       22     1954                2           3/4     no         1h,kk
77994      3466       22.5   1960                0           2/3     no         1h, kk
110000     5500       20     1954                2           2/4     no         1h, bk
90000      3333       27     1959                0           2/4     no         1h+kk+kh
125000     5000       25     1954                2           3/4     yes        1h+kk+kh
97312.7    3604       27     1959                1           4/4     no         1h,kk,alkovi
99000      3808       26     1954                2           2/6     yes        1h, avok.
110000     3235       34     1955                1           1/3     no         1h, k
98500      3648       27     1959                2           3/4     no         1h,avok,kph
127000     3969       32     1972                2           5/6     yes        1h, kk, las.p
166500     3469       48     1954                1           3/6     yes        2h,k,kph,ransk. p
169489.4   3606       47     1953                1           1/3     no         2 h, k, p
148733     4250       35     1956                1           2/10    yes        2h, avokeitti
146500     2845       51.5   1956                2           2/3     no         2h,kk,parveke
160000     3048       52.5   1959                2           1/4     no         2h, k
135000     3857       35     1956                1           7/10    yes        2h, kk
153049.9   3733       41     1953                1           2/3     no         2 h, kk, p
144000     2769       52     1955                1           3/3     no         2h, k
159000     3741       42.5   1957                1           1/4     no         2h, k, kph, parv.
153000     2942       52     1955                2           2/3     no         2h, k
149000     3548       42     1954                2           4/4     no         2 h, kk, p
165000     3438       48     1956                1           1/3     no         2h, kk, kph, piha
143000     2804       51     1960                1           1/4     yes        2h,k,kph,parv.
152000     3167       48     1954                1           2/4     no         2h+k+kph+p
124000     2756       45     1973                1           5/6     yes        2h+kk
119000     3400       35     1956                0           8/10    yes        2h+kk+kph+vh
187000     3117       60     1972                2           1/6     yes        2h, k, kph, las.parv.
174000     3164       55     1954                1           1/3     no         2h, k, kph, parveke
137000     3044       45     1973                1           3/6     yes        2h,kk
155000     3229       48     1954                2           2/4     no         2 h, k, vh, p
134000     2680       50     1960                2           2/3     no         2h, tupakeitti,ransk.par
149000     3311       45     1954                1           2/3     no         2h,kk,kph
154000     2884       53.4   1955                1           3/4     yes        2h, k
149704.6   3119       48     1957                1           3/4     no         2h, k, p
137000     2537       54     1958                2           3/4     no         2h, kk, ruokailuerkkeri,
135000     2455       55     1955                1           4/7     yes        3h, k
192000     3589       53.5   1952                2           4/4     no         2h, k, kph, vh, parveke
144000     2400       60     1972                1           1/6     yes        2h, k, kph, las.parveke
188000     3837       49     1952                1           4/4     no         2 h, k, p
162000     2700       60     1972                1           1/6     yes        2h, k
150000     3125       48     1954                2           3/4     no         2 h, k, kph, vh, parv
189500     3445       55     1954                2           3/3     no         2 h, k, p
146000     3042       48     1957                2           3/3     no         2h, k, kph, parveke
199800     3770       53     1952                1           2/4     no         2 h, k, p
170000     3617       47     1953                1           3/3     no         2h, ke, kph, parveke
155000     3039       51     1952                1           3/4     no         2h, kk, parveke
204500     3146       65     1954                1           2/4     no         2h+k+kh+parv
120000     2857       42     1954                1           4/4     no         2h, kk
204652     3655       56     1956                0           2/3     no         3h, kk, p
218000     3516       62     1957                0           1/4     no         3 h, k, p

Construction year   Apartments
1953                3
1954                14
1955                5
1956                6
1957                4
1958                1
1959                4
1960                3
1972                4
1973                2

Problem 1
Spearman's correlation coefficient (d = rank x - rank y)

A
x     y       rank x   rank y   d      d^2
4     4.26    1        1        0      0
7     4.82    4        2        2      4
5     5.68    2        3        -1     1
8     6.95    5        4        1      1
6     7.24    3        5        -2     4
13    7.58    10       6        4      16
10    8.04    7        7        0      0
11    8.33    8        8        0      0
9     8.81    6        9        -3     9
14    9.96    11       10       1      1
12    10.84   9        11       -2     4
sum(d^2) = 40, n = 11, Spearman = 0.818182, Pearson = 0.816421

B
x     y       rank x   rank y   d      d^2
4     3.1     1        1        0      0
5     4.74    2        2        0      0
6     6.13    3        3        0      0
7     7.26    4        4        0      0
14    8.1     11       5        6      36
8     8.14    5        6        -1     1
13    8.74    10       7        3      9
9     8.77    6        8        -2     4
12    9.13    9        9        0      0
10    9.14    7        10       -3     9
11    9.26    8        11       -3     9
sum(d^2) = 68, n = 11, Spearman = 0.690909, Pearson = 0.816237

C
x     y       rank x   rank y   d      d^2
4     5.39    1        1        0      0
5     5.73    2        2        0      0
6     6.08    3        3        0      0
7     6.42    4        4        0      0
8     6.77    5        5        0      0
9     7.11    6        6        0      0
10    7.46    7        7        0      0
11    7.81    8        8        0      0
12    8.15    9        9        0      0
14    8.84    11       10       1      1
13    12.74   10       11       -1     1
sum(d^2) = 2, n = 11, Spearman = 0.990909, Pearson = 0.816287

D (the ten tied x values share the average rank 55/10 = 5.5)
x     y       rank x   rank y   d      d^2
8     5.25    5.5      1        4.5    20.25
8     5.56    5.5      2        3.5    12.25
8     5.76    5.5      3        2.5    6.25
8     6.58    5.5      4        1.5    2.25
8     6.89    5.5      5        0.5    0.25
8     7.04    5.5      6        -0.5   0.25
8     7.71    5.5      7        -1.5   2.25
8     7.91    5.5      8        -2.5   6.25
8     8.47    5.5      9        -3.5   12.25
8     8.84    5.5      10       -4.5   20.25
19    12.5    11       11       0      0
sum(d^2) = 82.5, n = 11, Spearman = 0.625, Pearson = 0.816521

Result interpretation
In all four cases, both Pearson's and Spearman's correlation coefficients are positive and fairly close to 1 (Pearson's is about 0.816 in every case), implying a clear positive correlation between x and y.
In case A, the data is scattered roughly evenly around an upward trend, so Spearman's and Pearson's coefficients are nearly equal (0.818 vs 0.816).
In case B, y stops increasing with x from about x = 11, which breaks the monotonic ordering of the ranks, while the points still lie reasonably close to a line overall. As a result, Pearson's coefficient (0.816) is larger than Spearman's (0.691).
In case C, the data is almost perfectly monotonic, with just one exception (the point at x = 13). Hence Spearman's coefficient (0.991) is larger than Pearson's (0.816): because Pearson's measures linear association, it is more sensitive to a single outlier that lies far off the linear trend.
In case D, because most of the x values are the same, they have to share the same average rank in Spearman's calculation, which makes Spearman's coefficient (0.625) smaller than Pearson's (0.817).
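The Spearman values in Problem 1 follow the shortcut formula rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)), with tied values sharing their average rank. The sketch below recomputes both coefficients for the four cases; the helper names are hypothetical. Note that the shortcut formula is exact only when there are no ties, so for case D it differs from Pearson's coefficient computed on the ranks, which is 0.5 for this data.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # positions are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_shortcut(x, y):
    """Spearman's rho via the sum of squared rank differences."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


cases = {
    "A": ([4, 7, 5, 8, 6, 13, 10, 11, 9, 14, 12],
          [4.26, 4.82, 5.68, 6.95, 7.24, 7.58, 8.04, 8.33, 8.81, 9.96, 10.84]),
    "B": ([4, 5, 6, 7, 14, 8, 13, 9, 12, 10, 11],
          [3.1, 4.74, 6.13, 7.26, 8.1, 8.14, 8.74, 8.77, 9.13, 9.14, 9.26]),
    "C": ([4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 13],
          [5.39, 5.73, 6.08, 6.42, 6.77, 7.11, 7.46, 7.81, 8.15, 8.84, 12.74]),
    "D": ([8] * 10 + [19],
          [5.25, 5.56, 5.76, 6.58, 6.89, 7.04, 7.71, 7.91, 8.47, 8.84, 12.5]),
}

for name, (x, y) in cases.items():
    print(name, round(spearman_shortcut(x, y), 6), round(pearson(x, y), 6))

# Tie-aware alternative for case D: Pearson's coefficient on the ranks.
x_d, y_d = cases["D"]
rho_d_on_ranks = pearson(average_ranks(x_d), average_ranks(y_d))
print("D (Pearson on ranks):", round(rho_d_on_ranks, 6))
```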

Problem 2

NHL season

[Figure: Shots on Goal - NHL. Goals plotted against Shots on Goal, with trendline y = 0.1213x - 2.9563, R^2 = 0.7633]

Regression Statistics
Multiple R          0.873654179
R Square            0.763271624
Adjusted R Square   0.762905738
Standard Error      4.704100328
Observations        649

ANOVA
             df    SS            MS       F         Significance F
Regression   1     46162.17152   46162    2086.09   1.3328E-204
Residual     647   14317.17825   22.129
Total        648   60479.34977

                Coefficients   Standard Error   t Stat    P-value   Lower 95%      Upper 95%
Intercept       -2.95626228    0.350319845      -8.4388   2.1E-16   -3.644163385   -2.26836118
Shots on Goal   0.121338901    0.002656645      45.674    1E-204    0.116122214    0.12655559

FIN season

[Figure: Shots and Goals - FIN. Goals plotted against Shots, with trendline y = 0.0612x - 0.5066, R^2 = 0.551]

Regression Statistics
Multiple R          0.742305964
R Square            0.551018144
Adjusted R Square   0.549735339
Standard Error      3.994377462
Observations        352

ANOVA
             df    SS            MS       F         Significance F
Regression   1     6853.357041   6853.4   429.542   7.90773E-63
Residual     350   5584.267959   15.955
Total        351   12437.625

                Coefficients   Standard Error   t Stat    P-value   Lower 95%      Upper 95%
Intercept       -0.50660787    0.39189584       -1.2927   0.19696   -1.277374861   0.26415913
Shots           0.061176623    0.002951773      20.725    7.9E-63   0.05537118     0.06698207

        NHL       FIN
Alpha   -2.9563   -0.5066
Beta    0.1213    0.0612
R^2     0.7633    0.5510

The linear model fits the NHL data better than the FIN data (its R^2 is larger: 0.7633 vs 0.5510).
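To make the comparison in Problem 2 concrete, the sketch below evaluates both fitted lines at the same number of shots and recomputes R^2 from the ANOVA sums of squares. The coefficients and sums of squares are copied from the regression outputs; the function name, the dictionary layout and the choice of 200 shots are illustrative assumptions.

```python
# (alpha, beta, SS regression, SS total) per league, from the regression outputs.
MODELS = {
    "NHL": (-2.95626228, 0.121338901, 46162.17152, 60479.34977),
    "FIN": (-0.50660787, 0.061176623, 6853.357041, 12437.625),
}


def predicted_goals(league, shots):
    """Goals predicted by the fitted line: goals = alpha + beta * shots."""
    alpha, beta, _, _ = MODELS[league]
    return alpha + beta * shots


def r_squared(league):
    """Share of the variation in goals explained by the fitted line."""
    _, _, ss_regression, ss_total = MODELS[league]
    return ss_regression / ss_total


for league in MODELS:
    goals = predicted_goals(league, 200)
    print(f"{league}: predicted goals at 200 shots = {goals:.2f}, "
          f"R^2 = {r_squared(league):.4f}")
```

Under these fits, each additional shot is associated with roughly twice as many additional goals in the NHL data (beta 0.1213) as in the FIN data (beta 0.0612).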