Probability
Course Objectives
Graphical displays help in understanding the uncertainty in data and the differences within it. These displays include line diagrams, bar charts, dot diagrams, histograms, frequency polygons, duration curves, and others.
Example: Annual Rainfall Intensity Data
Esopus Creek Watershed (1918-1946)
Quantities such as these can be evaluated from the histogram provided; statistically, they are usually summarized by the sample mean and the sample standard deviation.
Parameters
Random Sample
Estimation
Statistics
Descriptive Statistics, Scales of Measurement (1)
Nominal
Has no numerical or quantitative properties; classifies observations into groups or categories
Gender: male or female
Field: Structures or Water Resources
Ordinal
Used to rank the levels of the variable being analyzed. The positions on the rating scale carry no specific numerical value.
Hotel rating: 4 stars, 3 stars, 2 stars, and 1 star
Descriptive Statistics, Scales of Measurement (2)
Interval
Differences between values on the scale are equal in size. There is no true zero point.
Measured values can be compared by their differences
Temperature: the difference between 20 and 30 degrees is the same as the difference between 30 and 40 degrees. We cannot say that 40 degrees is twice as hot as 20 degrees, only that it is 20 degrees hotter.
Ratio
A scale with a true zero point, which indicates that the variable is absent.
Values can be expressed as ratios
Weight: 100 kg is half of 200 kg
Descriptive Statistics, Frequency Distribution
In a table, a frequency distribution is built by summarizing the data as the number of observations in each category, score, or score interval.
[Figure: histogram; vertical axis: Frequency (0 to 40); horizontal axis: Age in years (22.5 to 60.0)]
Frequency Table
Constructing a Frequency Table
Group the data into classes
Count the number of items/samples in each class
Arrange the table so the data are easy to understand
Considerations
Classes must not overlap
The number of classes is generally between 5 and 18
Where possible, classes should have equal widths, although unequal widths are sometimes necessary
Each observation belongs to exactly one class
Number of Classes
Small enough to provide a summary
Large enough to show the relevant characteristics
The class with the lowest boundary must include the smallest data value
The class with the highest boundary must include the largest data value
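The construction steps above can be sketched in Python. This is a minimal illustration, not the method used in the slides: the function name `frequency_table` and the sample ages are hypothetical, and it assumes equal-width classes spanning the smallest to the largest value.

```python
def frequency_table(data, num_classes):
    """Group data into equal-width, non-overlapping classes and count each class.

    Assumes the data are not all identical (otherwise the class width is zero).
    """
    lowest, highest = min(data), max(data)
    width = (highest - lowest) / num_classes
    counts = [0] * num_classes
    for x in data:
        # Clamp the index so the largest value falls in the last class;
        # every observation therefore lands in exactly one class.
        index = min(int((x - lowest) / width), num_classes - 1)
        counts[index] += 1
    return [
        (lowest + i * width, lowest + (i + 1) * width, counts[i])
        for i in range(num_classes)
    ]


# Hypothetical ages in years, for illustration only.
ages = [23, 25, 31, 34, 35, 38, 41, 44, 47, 52, 58, 60]
table = frequency_table(ages, num_classes=5)
for lower, upper, count in table:
    print(f"{lower:5.1f} to {upper:5.1f}: {count}")
```

Each row is (lower boundary, upper boundary, frequency). The lowest class starts at the smallest value and the highest class ends at the largest value, as the considerations above require.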
Histogram
Visually displays the information from a frequency table
Plot the group boundaries on the horizontal axis, using a constant linear scale
Plot the frequency (or relative frequency) on the vertical axis
Draw a vertical bar for each group, where the area of the bar is proportional to the (relative) frequency
For equal class widths, the height is proportional to the frequency
Note that there are no gaps between the bars for continuous data
To convert to a relative frequency histogram or a percentage histogram, just change the vertical scale
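The "area is proportional to frequency" rule matters most when class widths differ. The sketch below, with a hypothetical function name `bar_heights` and made-up boundaries and counts, computes bar heights as relative frequency divided by class width, so that each bar's area equals its relative frequency.

```python
def bar_heights(boundaries, frequencies):
    """Return bar heights such that bar area equals relative frequency."""
    total = sum(frequencies)
    heights = []
    for left, right, freq in zip(boundaries, boundaries[1:], frequencies):
        relative_frequency = freq / total
        heights.append(relative_frequency / (right - left))
    return heights


# Hypothetical classes: 0-10, 10-20, and a wider class 20-40.
boundaries = [0, 10, 20, 40]
frequencies = [5, 10, 10]
heights = bar_heights(boundaries, frequencies)

# Areas (height * width) add up to 1 for a relative frequency histogram.
areas = [h * (r - l) for h, l, r in zip(heights, boundaries, boundaries[1:])]
print(heights, sum(areas))
```

Here the second and third classes have the same frequency, but the third bar is half as tall because its class is twice as wide.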
Other Examples
Descriptive Statistics
[Figure panels: normal curve, bimodal curve, positively skewed, negatively skewed]
Ogive (Cumulative Frequency Polygon)
Visually displays the cumulative information from a frequency table
On the horizontal axis, draw the group boundaries on a linear scale
On the vertical axis, plot the cumulative percentages or proportions
For the first point, plot (lower boundary of lowest class, 0)
Then, for each class, plot (upper class boundary, cumulative frequency) as the x and y coordinates respectively
Join the points
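As a rough sketch of the plotting recipe above, the following Python function returns the ogive points as cumulative percentages. The name `ogive_points` and the class boundaries and counts are hypothetical.

```python
from itertools import accumulate


def ogive_points(boundaries, frequencies):
    """Return (x, y) points of the ogive, with y as cumulative percentage."""
    total = sum(frequencies)
    # First point: lower boundary of the lowest class, at 0 percent.
    points = [(boundaries[0], 0.0)]
    # Then one point per class at its upper boundary.
    for upper, cumulative in zip(boundaries[1:], accumulate(frequencies)):
        points.append((upper, 100.0 * cumulative / total))
    return points


# Hypothetical classes 20-30, 30-40, 40-50 with frequencies 2, 5, 3.
points = ogive_points([20, 30, 40, 50], [2, 5, 3])
print(points)  # [(20, 0.0), (30, 20.0), (40, 70.0), (50, 100.0)]
```

Joining these points with straight lines gives the cumulative frequency polygon; the last point always reaches 100 percent.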
Histogram and Frequency Polygon
A histogram can also be presented as a frequency polygon
Comparisons with other data sets can be shown by superimposing the polygons
Central Tendency
Mode
The value with the highest frequency
3 3 3 4 4 4 5 5 5 6 6 6 6: Mode = 6
3 3 3 4 4 4 5 5 6 6 7 7 8: Modes are 3 and 4
Median
The value that splits the data into two groups, with 50% of the values above and 50% below the median
3 3 3 5 8 8 8: Median = 5
3 3 5 6: Median = 4 (the average of the two middle values)
Mean
The measure usually preferred, and the only measure of central tendency used in advanced statistical analysis
More accurate and reliable
Suited to arithmetic calculation
In general, sum all the values and divide by the number of values
2 3 4 6 10: Mean = 5 (25/5)
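The worked examples above can be checked with Python's standard `statistics` module. This is only a verification sketch; the variable names are illustrative, and `multimode` requires Python 3.8 or later.

```python
import statistics

# Data sets copied from the examples above.
single_mode = statistics.multimode([3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6])
two_modes = statistics.multimode([3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 7, 8])
median_odd = statistics.median([3, 3, 3, 5, 8, 8, 8])
median_even = statistics.median([3, 3, 5, 6])  # average of the two middle values
mean_value = statistics.mean([2, 3, 4, 6, 10])

print(single_mode)   # [6]
print(two_modes)     # [3, 4]
print(median_odd)    # 5
print(median_even)   # 4.0
print(mean_value)    # 5
```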
Review: Measure of Central Tendency (Mean)
Advantages:
It is easy to compute
It combines well, i.e. the mean of a combined sample is the weighted mean of the two sample means, where the weights are the proportions in each sample
It corresponds to the 'centre of gravity' of the data values
Disadvantages:
It is affected by outliers or extreme values
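The "combines well" property listed among the advantages can be illustrated with a short sketch. The function name `combined_mean` is hypothetical; the two samples are chosen so that together they form the data set 2 3 4 6 10, whose mean is 5.

```python
def combined_mean(mean_a, n_a, mean_b, n_b):
    """Weighted mean of two sample means; weights are the sample proportions."""
    total = n_a + n_b
    return (n_a / total) * mean_a + (n_b / total) * mean_b


sample_a = [2, 3, 4]   # mean 3
sample_b = [6, 10]     # mean 8
mean_a = sum(sample_a) / len(sample_a)
mean_b = sum(sample_b) / len(sample_b)

pooled = combined_mean(mean_a, len(sample_a), mean_b, len(sample_b))
direct = sum(sample_a + sample_b) / len(sample_a + sample_b)
print(pooled, direct)  # both are 5, up to floating-point rounding
```

No comparable shortcut exists for the median: the median of a combined sample cannot in general be recovered from the two sample medians.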
Median and Mode
Advantages of the median
As it is the central observation, it is not affected by extreme values
Disadvantages of the median
It doesn't combine well
It requires ranking the observations
Advantages of the mode
None
Disadvantages of the mode
It may not exist
If it exists, it may be far from the "centre" of the data
Choosing a Measure of Central Tendency:
Mean
Affected by outliers
Median
Not affected by a small number of outliers
The median is generally used for skewed data (but not always)
A single measure of central tendency is not always sufficient
Be careful with averaging; only average when it makes sense in context
The mean and median generally differ unless the data are symmetric
Properties of a frequency distribution: Spread/Variability/Dispersion
Range
Computed by subtracting the lowest value from the highest value
Used only for ordinal, interval, and ratio scales, and the data must be ordered
Example: 2 3 4 6 8 11 24 (Range = 22)
Variance
The extent to which individual scores in a distribution of scores differ from one another
Standard Deviation
The square root of the variance
Used to describe the dispersion of a set of observations in a distribution
Review: Measures of Spread
Range
The largest observation minus the smallest observation
$X_{max} - X_{min}$
Advantage
It is easy to calculate
Disadvantage
It is strongly influenced by outliers
Interquartile Range
Shows the range of the middle 50% of observations
Defined as the difference between the third quartile and the first quartile
$Q_3 - Q_1$
Advantage
It is not affected by outliers
Disadvantage
It is harder to calculate, as it requires ranking
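To make the contrast between the range and the interquartile range concrete, here is a small sketch using the data set 2 3 4 6 8 11 24 from the range example and a hypothetical variant with a more extreme outlier. The helper names are illustrative. Note that several quartile conventions exist; this sketch relies on the default "exclusive" method of `statistics.quantiles`, and other conventions can give slightly different quartiles.

```python
import statistics


def value_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)


def interquartile_range(data):
    """Q3 minus Q1, using the default method of statistics.quantiles."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


data = [2, 3, 4, 6, 8, 11, 24]
with_outlier = [2, 3, 4, 6, 8, 11, 240]  # hypothetical: 24 replaced by 240

print(value_range(data), value_range(with_outlier))
print(interquartile_range(data), interquartile_range(with_outlier))
```

The range jumps from 22 to 238 when the largest value changes, while the interquartile range stays the same because it depends only on the middle 50% of observations.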
Review: Measures of Spread (2)
Variance, $s^2$
This is the "average" of the squared deviations from the mean
$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Note the use of the divisor (n - 1) for the sample variance. This makes the sample variance an unbiased estimator of the population variance
Advantages
Good mathematical properties
Disadvantages
It is strongly influenced by outliers
It is in squared units, so it is hard to get an intuitive sense of its size
Standard Deviation
This is the square root of the variance
Advantages
This statistic is in the original units and is thus directly comparable to the mean
Disadvantages
It is influenced by outliers
Coefficient of Variation
This is the ratio of the standard deviation to the mean
It is usually expressed as a percentage
It measures the spread relative to the average size
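The three measures above can be computed step by step. The sketch below reuses the data set 2 3 4 6 10 from the mean example; the variable names are illustrative, and the result is cross-checked against `statistics.variance`, which also uses the (n - 1) divisor.

```python
import math
import statistics

data = [2, 3, 4, 6, 10]
n = len(data)
mean = sum(data) / n                      # 5.0

# Sample variance: squared deviations summed, divided by (n - 1).
squared_deviations = [(x - mean) ** 2 for x in data]   # 9, 4, 1, 1, 25
sample_variance = sum(squared_deviations) / (n - 1)    # 40 / 4

# Standard deviation is back in the original units.
sample_sd = math.sqrt(sample_variance)

# Coefficient of variation, as a percentage of the mean.
cv_percent = 100 * sample_sd / mean

print(sample_variance, statistics.variance(data))
print(round(sample_sd, 3), round(cv_percent, 1))
```

For this data set the variance is 10 (in squared units), the standard deviation is about 3.16, and the coefficient of variation is about 63%.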
Estimating the mean and variance from grouped data
With grouped data, we have lost the information about where in the group each observation lies
All we know is the group in which each observation lies
Therefore we assume each observation in a group lies at the group midpoint
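Under the midpoint assumption just stated, the grouped estimates can be sketched as follows. The function name `grouped_mean_variance` and the example classes are hypothetical, and the variance uses the (n - 1) divisor for a sample.

```python
def grouped_mean_variance(boundaries, frequencies):
    """Estimate mean and sample variance, treating each observation
    as if it sat at the midpoint of its class."""
    midpoints = [(left + right) / 2 for left, right in zip(boundaries, boundaries[1:])]
    n = sum(frequencies)
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    variance = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints)) / (n - 1)
    return mean, variance


# Hypothetical classes 0-10, 10-20, 20-30 with frequencies 2, 6, 2.
mean, variance = grouped_mean_variance([0, 10, 20, 30], [2, 6, 2])
print(mean, variance)
```

These are approximations: the true mean and variance of the raw data will differ to the extent that observations are not centred within their classes.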
About Dispersion
Dispersion is the amount of spread or scatter that occurs in a data set
If the values in a set are clustered tightly around their mean, the measured dispersion (std. dev.) is small
If the standard deviation is small, the items are grouped closely around their mean
If the standard deviation is large, the data values are widely dispersed about their mean
Rule of Thumb
For data with a bell-shaped (approximately normal) distribution
About 68% of observations lie within one standard deviation of the mean
About 95% of observations lie within two standard deviations of the mean
About 99.7% of observations lie within three standard deviations of the mean
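The 68%, 95%, and 99.7% figures can be reproduced from the normal distribution itself. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability of falling within k standard deviations of the mean.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, probability in coverage.items():
    print(f"within {k} SD: {100 * probability:.1f}%")
```

The figures apply to an exactly normal distribution; real bell-shaped data will only match them approximately.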
Chebyshev's theorem
The proportion (percentage) of any data set that lies within k standard deviations of the mean (k is any number greater than 1) is at least
$1 - 1/k^2$
e.g. k = 2: at least 75% of items in a data set lie within 2 standard deviations of the mean, no matter how skewed the data set is
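A quick numerical check of Chebyshev's bound is sketched below. The function names and the skewed data set are hypothetical; the check uses the population standard deviation (`statistics.pstdev`) of the data set itself.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def fraction_within(data, k):
    """Observed proportion of values within k standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)


# Hypothetical, strongly right-skewed data.
skewed = [1, 1, 1, 2, 2, 3, 4, 5, 9, 40]

bound = chebyshev_lower_bound(2)        # 0.75
observed = fraction_within(skewed, 2)
print(bound, observed)
```

The observed proportion is at least the bound, and usually well above it; Chebyshev's theorem is a guarantee, not an estimate.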
Z-Scores and T-Scores
Z-Scores
The most widely used standard score in statistics
It is the number of standard deviations above or below the mean
A Z-score of 1.5 means that the score is 1.5 standard deviations above the mean; a Z-score of -1.5 means that the score is 1.5 standard deviations below the mean
Z-scores always have the same meaning in all distributions
To find a percentile rank, first convert to a Z-score and then look up the percentile rank in a normal-curve table
T-Scores
The most commonly used standard score for reporting performance
May be converted from Z-scores and are always rounded to two figures, thereby eliminating decimals
Always reported as positive numbers
The mean is always 50 and the standard deviation is always 10
A T-score of 70 is 2 SDs above the mean
A T-score of 20 is 3 SDs below the mean
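The conversions described above can be sketched as follows. The function names are hypothetical, the T-score uses the relation T = 50 + 10Z implied by a mean of 50 and a standard deviation of 10, and the percentile rank assumes the scores are approximately normally distributed (it replaces the normal-curve table lookup with `statistics.NormalDist`).

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def t_score(z):
    """Convert a Z-score to a T-score (mean 50, SD 10), rounded to a whole number."""
    return round(50 + 10 * z)


def percentile_rank(z):
    """Percentile rank of a Z-score, assuming a normal distribution."""
    return 100 * NormalDist().cdf(z)


print(t_score(2))     # 70, i.e. 2 SDs above the mean
print(t_score(-3))    # 20, i.e. 3 SDs below the mean

# Hypothetical raw score of 65 on a test with mean 50 and SD 10.
z = z_score(65, mean=50, sd=10)
print(z, round(percentile_rank(z), 1))
```

A Z-score of 1.5 corresponds to roughly the 93rd percentile under the normality assumption.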
Correlation and Linear Regression
Correlation or Covariation
Linear Regression
The purpose of a regression equation is to predict new sample observations based on findings from a previous sample.
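As a minimal sketch of both ideas, the code below fits a least-squares line and computes the Pearson correlation coefficient by hand. The slides do not specify a fitting method, so ordinary least squares is an assumption here, and the function name `linear_fit` and the data points are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Return (slope, intercept, r) for a least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
    return slope, intercept, r


# Hypothetical earlier sample.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r = linear_fit(xs, ys)

# Use the fitted equation to predict a new observation at x = 6.
predicted = intercept + slope * 6
print(round(slope, 2), round(intercept, 2), round(r, 3), round(predicted, 2))
```

The correlation coefficient quantifies how strongly the two variables move together, while the fitted equation is what produces the prediction for a new observation.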
Types of Statistical Analysis - Descriptive
Quantify the degree of relationship between variables
Parametric tests are used to test hypotheses with stringent assumptions about the observations
e.g., t-test, ANOVA
Nonparametric tests are used with data on a nominal or ordinal scale
e.g., Chi-Square, Mann-Whitney U, Wilcoxon
Types of Statistical Analysis - Inferential
Allow generalization about populations using data from samples
Non-parametric
Non-parametric tests do not require any assumptions about normal distribution, but are generally less sensitive than parametric tests
The test for nominal data is the Chi-Square test
The tests for ordinal data are the Kolmogorov-Smirnov test, the Mann-Whitney U test, and the Wilcoxon Matched-Pairs Signed-Ranks test
Parametric
Tests for interval and ratio data include the t-test, among others
Statistics and Probability
Statistics: procedures for describing, analyzing, and interpreting quantitative data
The choice of statistical technique should be guided by the research design and the type of data collected
Probability represents a judgment about the likelihood of outcomes, i.e., how likely is it that I could obtain a result like this purely by chance?
Statistical inferences
Significant: it is very unlikely that the effect would occur by chance, e.g. a probability of less than 5%
Not significant: the results are likely to have occurred by chance
Inferential Statistics: Sampling (1)
Sampling relates to the degree to which those surveyed are representative of a specific population
Sampling Distributions