Statistics: I. II. III. IV

Statistics
  Body of mathematical techniques and procedures dealing with the collection, organization, analysis, interpretation and presentation of information that can be stated numerically.
  Statistical methodology and theory.
  Basic tool for measurement, evaluation and research.
  The interpretation of statistics is both an art and a science.
  Opinion which attempts to draw inferences as to the outcome.

Statistical tests by level of measurement
  Test of Difference (Test of Means)
    Ratio/Interval: Parametric tests: z-test, t-test, ANOVA
    Ordinal: Non-parametric tests: Sign test, Mann-Whitney U, Kruskal-Wallis
    Nominal: Chi-square, Fisher exact test
  Test of Relationship
    Ordinal: Spearman rank-difference coefficient of correlation
    Ratio/Interval: Pearson product-moment coefficient of correlation, regression analysis
Inferential statistics
Population
  A group or set of people, objects or events having common observable characteristics.
Parameters
  Data obtained about a population.
  The MEAN and STANDARD DEVIATION of a population are examples of its parameters.
Sample
  Collection of some elements in a population; a subset of a population.
Statistics or estimates
  Data about samples.

Purposes:
1. Research relies on statistics to organize, summarize and interpret the numerical data generated by researchers.
2. Enables researchers to assign precise and universally accepted quantitative values to the properties of objects, people and events.
3. Provides measurement and evaluation of quantitative data.

Two major categories of statistics:
1. Descriptive statistics
2. Inferential statistics

Descriptive statistics
  Deals with: enumeration, organization and graphical representation of data.
  Example: age, sex, race and marital status.

  Level of Measurement    Measures in the interpretation
  Nominal                 Frequency, percentages, mode
  Ordinal                 Frequency, percentages, mode
  Interval/Ratio          Mean and standard deviation
Classification of Statistics
I. Measures to condense data: tabular and graphical data presentation
II. Measures of central tendency: mean, median, mode
III. Measures of variability: standard deviation, variance, range, coefficient of variation
IV. Measures of location: percentile, quartile, decile

Why Study Statistics?
  A knowledge of statistics is essential for both understanding and conducting research in any of the health professions.
  It gives familiarity with statistical terminology and methodology.
  It helps to discriminate between fact and fiction.

Sources of Data
A. Variables
  Information on specific characteristics.
  Ex. Age, weight, height, marital status or smoking habits
B. Data
  Values of the observations recorded; the raw materials of statistics.
Surveys
  Represent observations of events or phenomena over which few, if any, controls are imposed, or for which controls are seldom possible.
Experiment
  Designed to impose controls over the amount of exposure (treatment) to a phenomenon.
  Imposes controls on the methods, treatment or conditions.

Three general ways of organizing and presenting data:
  Tables
  Graphs
  Numerical techniques

Categories of Data
Qualitative data - attributes or characteristics used as labels
Inferential statistics
  Concerned with reaching conclusions from incomplete information: generalizing from the specific.
  Uses information obtained from a sample to say something about an entire population.
Quantitative data- numerical information
Scale of Measurement
1. Nominal
  Grouping/categorizing/labeling data
  Qualitative variables
  Ex. Marital status, age group of older women, race
Ex 90110, CB=109.5.

Graphical presentation
  Presentation of data in the form of a graph or diagram.

Line graph
  Description: relationship between 2 or more sets of quantities; used for time and trend.
  Example: annual population growth rate from 2001-2005.
Bar graph
  Description: compares data taken at a particular time; for qualitative and discrete quantitative data; a convenient graphical device for nominal/ordinal data.
  Example: top 10 morbidity/mortality in Brgy. P, 2011.
Pie chart, circle graph, component bar
  Description: qualitative variables (at least 5 variables); the area is proportional to the frequency.
  Example: educational attainment of the female population ages 25-60 in Brgy. P, 2011.
Pictograph
  Description: actual pictures represent values.
  Example: immunization of children ages 1-6 in regions 1-4, 2001.
Histogram
  Description: continuous data; the most common graph.
  Example: age distribution in Brgy. P, 2011; systolic blood pressure.

Measures of Central Tendency
1. Mean
  The sum of all observations divided by the number of observations.
  Types: harmonic, geometric and arithmetic.
  Affected by extreme values; a very large value may distort the mean.
  Most commonly used.
2. Median
  Observations are arranged in an array.
  Middlemost value; divides the distribution into equal parts.
  Not affected by extreme values.
  Suited to skewed data (e.g. income).
3. Mode
  Most typical observation.
  The value that occurs most frequently; the peak of the distribution.
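As a minimal sketch of the three measures of central tendency, the snippet below uses Python's standard `statistics` module. The income values and variable names are made up for illustration and are not from these notes. It shows how a single extreme value pulls the mean upward while leaving the median and mode unchanged.

```python
import statistics

# Hypothetical monthly incomes; the last value is an extreme observation.
incomes = [10, 12, 12, 13, 15, 16, 90]

mean_income = statistics.mean(incomes)      # sum of values / number of values
median_income = statistics.median(incomes)  # middle value of the sorted array
mode_income = statistics.mode(incomes)      # most frequently occurring value

print(mean_income, median_income, mode_income)
```

With skewed data like this, the median is usually the more representative summary, which matches the note on income above.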
2. Ordinal
  Ordered series of relationships; ranking order
  Ex. Nutritional level, pain scale, 5-point Likert scale
3. Interval
  Numerical; artificial (arbitrary) zero
  Ex. Temperature in degrees Celsius or Fahrenheit
4. Ratio
  Numerical; true or absolute zero, which means total absence of the quantity
  Ex. Weight in pounds, blood pressure, age in months
Discrete
  Discontinuous; whole numbers; always integers.
  Ex. Number of human fingers
Continuous
  Can take fractional/decimal values.
  Ex. Age, height and weight

Tabular method
  Process of presenting data in the form of a table.
  Displays quantitative data in a clear and understandable way.
Figures
  Any type of illustration other than a table, such as charts, graphs, photographs or drawings.
Frequency Table
  One of the most convenient ways of summarizing data.
Frequency
  Number of cases with a particular value.
Valid percent
  The percentage out of 100, using only those subjects with data. When no data are missing, the percentage is the same as the valid percent.
Class intervals
  Usually equal in length, thereby aiding comparisons between any two intervals.
Interval width - number of units between the upper and lower class limits, counting both limits. Ex. 90-99, IW=10
Range - difference between the highest and lowest numbers in the data set
Class boundaries - true limits; points that demarcate the true upper limit of one class and the true lower limit of the next.
Preferred measure of central tendency by level of measurement:
  Interval: mean
  Ordinal: median (also preferred when the data are skewed)
  Nominal: mode

Measures of Variation
1. Range
2. Mean Deviation
3. Standard Deviation
4. Variance
5. Coefficient of variation
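The five measures of variation listed above can be sketched with the standard `statistics` module. This is only an illustration under the usual textbook definitions (sample variance with divisor n - 1, mean deviation taken around the mean); the data values and variable names are invented.

```python
import statistics

# Hypothetical data set (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = statistics.mean(data)

value_range = max(data) - min(data)                    # Xmax - Xmin
mean_deviation = sum(abs(x - mean) for x in data) / n  # average absolute deviation from the mean
variance = statistics.variance(data)                   # sum of squared deviations / (n - 1)
std_dev = statistics.stdev(data)                       # square root of the sample variance
coeff_variation = std_dev / abs(mean) * 100            # SD as a percentage of the mean

print(value_range, mean_deviation, variance, std_dev, coeff_variation)
```

The coefficient of variation is unit-free, so it can be used to compare the spread of variables measured on different scales.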
Measures of variability
Range = Xmax - Xmin; not very useful because it considers only the extremes and does not take into consideration the bulk of the observations.
Mean Deviation = more sophisticated than the range; the average deviation of all observations from the mean; taking the absolute value ignores the sign of each difference.
Standard Deviation = most widely used measure of variation; the square root of the variance.
Variance = obtained by squaring each deviation from the mean, summing the squares and dividing their sum by one less than n (n - 1). The larger the variance and standard deviation, the more heterogeneous the observations; the smaller they are, the more homogeneous.
Coefficient of variation = ratio of the SD to the absolute value of the mean, expressed as a percentage.

Measuring and interpreting skewness
  The MEAN is the preferred measure of central tendency; if the data are skewed, the preferred measure of central tendency is the MEDIAN.
  Skewness can be measured with SPSS.
  The measure of skewness is based on sample size, and each variable has a sample of 100.
  The key issue is to determine whether the data are symmetrical or skewed.
  Symmetrical data = data are not skewed; a variable that is not symmetrical = skewed.
  If the data are skewed, then the distribution cannot be normal; this means that statistical procedures based on the normal distribution should not be used.
Positive Skewness
  There is a pileup of cases to the left and the right tail of the distribution is too long. The mean will be more than the median, and the coefficient will be positive.
Negative Skewness
  There is a pileup of cases to the right and the left tail of the distribution is too long. The mean will be less than the median, and the coefficient will be negative.
  As a rule of thumb, skewness values for approximately symmetrical data will fall between -1 and +1.

Measures of location
  Decile   Percentile   Quartile
  1        10
  2        20           P25 = Q1 (between deciles 2 and 3)
  3        30
  4        40
  5        50           P50 = Q2
  6        60
  7        70           P75 = Q3 (between deciles 7 and 8)
  8        80
  9        90
  10       100          P100 = Q4

Probability
  A numeric measure of the likelihood that a particular event will or will not occur.
  Defined on the range 0 to 1 only, never more and never less.
  A probability of 1.0 means that the event will happen with certainty.
  A probability of 0 means that the event will not happen.
A probability of 0.5 means that the event should occur once in every two attempts, on average.
If the probability is close to 1.0, the event is more likely to happen; if the probability is close to 0, it is unlikely to happen.

Measures of Kurtosis or Peakedness
  Measures whether the bell shape is too flat or too peaked.
  Leptokurtic: if the kurtosis value is a large positive number, the distribution is too peaked to be normal.
  Platykurtic: the curve is too flat to be normal; if too flat, the kurtosis value is negative.
  If a distribution is markedly skewed, there is no particular need to examine kurtosis.

II. Normal distribution
  The normal distribution is the most important statistical distribution; it was discovered by the French mathematician Abraham de Moivre.
  Normal curve = Gaussian distribution.
  The normal distribution is the basis for the use of inferential statistics.
  Has 3 main properties:
    Appearance: symmetrical, bell-shaped curve.
    All normal distributions have a particular internal distribution for the area under the curve.
    The area under the curve is directly proportional to the percentage of raw scores.
  It is a theoretical distribution defined by two parameters, the mean and the standard deviation; for the standard normal distribution, Mean = 0 and Standard deviation = 1.

Statistical Inference
  Obtaining information from a sample of data about the population from which the sample is drawn, and setting up a model to describe this population.
  Two types:
  a) Parameter estimation: point estimation (ex. sample mean, median, variance, standard deviation) and interval estimation (ex. confidence interval, the upper and lower limits of a range of values)
  b) Hypothesis testing

Random Selection
  When a random sample is drawn from the population, every member has an equal chance of being drawn.
  If a sampling frame is available, a table of random numbers can be used to select a random sample of the needed size.
  Nonprobability samples are very unlikely to represent the population under study.
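Random selection from a sampling frame can be sketched as follows. Note the substitution: the sketch uses Python's pseudo-random generator in place of a printed table of random numbers. The sampling frame, sample size and seed are hypothetical values chosen only for illustration.

```python
import random

# Hypothetical sampling frame: ID numbers of 50 population members.
sampling_frame = list(range(1, 51))
sample_size = 10  # assumed sample size for illustration

rng = random.Random(2011)  # fixed seed only so the sketch is repeatable
# Draw without replacement; every member has the same chance of selection.
sample = rng.sample(sampling_frame, sample_size)

print(sorted(sample))
```

Because the draw is without replacement, no member of the frame can appear in the sample twice.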
Areas under the curve
  The standardized score, Z, gives the relative position of any observation in the distribution.
  Z is referred to as the Z score, Z value or standard score.
  The net effect of this so-called Z transformation is to change any normal distribution into the standard normal distribution, with Mean = 0 and Standard deviation = 1.
  Because the normal curve is symmetrical, the area between zero and any negative point is equal to the area between zero and the corresponding positive point.
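The Z transformation and the symmetry of the areas can be illustrated with `statistics.NormalDist` from the Python standard library. The mean, standard deviation and observed value below are assumed numbers for a hypothetical blood-pressure variable, not values from these notes.

```python
from statistics import NormalDist

# Hypothetical normal distribution of systolic blood pressure (assumed values).
mu, sigma = 120.0, 10.0
x = 135.0

z = (x - mu) / sigma  # Z transformation: distance from the mean in SD units

std_normal = NormalDist(mu=0.0, sigma=1.0)
area_right = std_normal.cdf(z) - std_normal.cdf(0.0)   # area between 0 and +z
area_left = std_normal.cdf(0.0) - std_normal.cdf(-z)   # area between -z and 0

print(z, round(area_right, 4), round(area_left, 4))
```

The two areas agree, which is the symmetry property stated above; a printed Z table gives the same area for z = 1.5.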
The area under the curve is equal to 1 and the curve is symmetrical about zero.
A negative Z score means that the corresponding observation lies below the mean.
Standard Score
  A way of expressing a score in terms of its relative distance from the mean.

Distribution of sample means
  The set of values of sample means obtained from all possible samples of the same size.
  Symmetrical, roughly bell-shaped and centered close to the population mean, but with less variation than the population distribution.

Central Limit Theorem
  One of the most remarkable features of mathematical statistics.
  States that for randomly selected samples of size n from a population with mean μ and standard deviation σ, the distribution of sample means is approximately normal when n is sufficiently large, regardless of whether the population distribution is normal.
  From statistical theory come these two additional principles:
    Mean of the distribution of sample means = mean of the population distribution.
    SD of the distribution of sample means = SD of the population divided by the square root of the sample size (σ/√n).
  When the sample SD, s, is used in place of σ (s/√n), it indicates that the standard error is being estimated from the SD of a sample of size n.
  Note: As the sample size increases, the variability of the sampling distribution becomes progressively smaller.

Standard Error of the Mean
  The measure of variation of the distribution of sample means.
  It is a counterpart of the standard deviation in that it is a measure of variation, but of variation of sample means rather than of individual observations.
  An important statistical tool because it is a measure of the amount of sampling error.

Student's t Distribution
  Described in 1908 by William S. Gosset, an English chemist.
  The t distribution (t score) is similar to the standard normal distribution in that it is unimodal, bell-shaped and symmetrical, and extends infinitely in either direction.
  It is a function of a quantity called the degrees of freedom.
  Degrees of freedom measure the quantity of information available in a data set that can be used in estimating the population variance.
The t distribution for infinite degrees of freedom is precisely equal to the normal distribution. This equality is readily seen by comparing the critical values of t for infinite df with the corresponding critical values of z.
Degrees of freedom are equal to the number of independent pieces of information used to estimate the parameter.

Population parameters: point estimate and confidence interval estimate.
  The point estimate of the population mean is the sample mean computed from a random sample of the population.
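The point estimate and an interval estimate of a population mean can be sketched as below. The sample values are hypothetical. As a simplifying assumption, the sketch uses the normal critical value (the infinite-df case of t mentioned above) instead of a value from a t table, because the Python standard library has no t distribution.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical random sample (illustrative values only).
sample = [98, 102, 101, 97, 105, 99, 100, 103, 96, 99]

n = len(sample)
point_estimate = mean(sample)                  # sample mean estimates the population mean
standard_error = stdev(sample) / math.sqrt(n)  # estimated SE = s / sqrt(n)

# Normal critical value for 95% confidence (the infinite-df limit of t).
# With only 10 observations, the t table value for df = 9 would be larger,
# giving a wider interval than the one computed here.
z_crit = NormalDist().inv_cdf(0.975)
lower = point_estimate - z_crit * standard_error
upper = point_estimate + z_crit * standard_error

print(point_estimate, round(lower, 2), round(upper, 2))
```

For small samples, replacing `z_crit` with the tabulated t value for n - 1 degrees of freedom is the more appropriate choice.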
Confidence interval - a range of values for which we have a specified degree of assurance that the value of the parameter was captured. It allows us to estimate the unknown parameter and provides a margin of error indicating how good our estimate is.

Assumptions necessary to perform t-tests
1. Observations are randomly selected
2. The population distribution is a normal distribution

Hypothesis testing
Hypothesis - statement of belief used in the evaluation of population values
Null Hypothesis - states that there is no difference
Alternative Hypothesis (Research Hypothesis)
  The hypothesis that an investigator believes in.
  A claim that disagrees with the null hypothesis.
  Specifies that there is a difference between the sample mean and the population mean.
Test statistic - a statistic used to determine the relative position of the mean in the hypothesized probability distribution of sample means
Critical region (rejection region) - the region on the far end of the distribution. It sets guidelines for rejecting or failing to reject the null hypothesis.
Critical value - the number that divides the distribution into the region where we reject the null hypothesis and the region where we fail to reject the null hypothesis
Significance level - the level that corresponds to the area in the critical region
Nonrejection region - the region located under the middle portion of the curve. Whenever a test statistic falls in this region, the evidence does not permit us to reject the null hypothesis; the results are not unexpected.
Test of significance (hypothesis test) - a procedure used to establish the validity of a claim by determining whether the test statistic falls in the critical region

One-tailed (Directional Test)
  The region of rejection lies on either the left or the right tail of the normal curve.
  Right directional test: the region of rejection is on the right tail. Used when the alternative hypothesis uses COMPARATIVES such as greater, higher than, better, superior to, exceeds.
  Left directional test: the region of rejection is on the left tail. Used when the alternative hypothesis uses words such as less than, smaller than, inferior to, lower than, below.
Two-tailed (Non-Directional Test)
  The region of rejection lies on both tails.
  Used when the alternative hypothesis uses words such as "not equal to" or "significantly different".

Type I error (Alpha Error)
  Rejecting a true null hypothesis.
Type II error (Beta Error)
  Accepting (failing to reject) a false null hypothesis.

Independent (unpaired) t Tests
  No connection between any subject in group 1 and group 2.
  Ex. Comparison between males and females
  Assumption: 2 samples assuming unequal variance
Dependent (Paired) t Tests
  There is a connection between scores in group 1 and group 2.
  Ex. Pretest and posttest, before and after
  Assumption: samples must be equal in size
With fewer DF, the critical t value is larger and the CI is wider.

ANOVA
  Analysis of variance is a statistical procedure that handles the difficulties of running repeated t tests: the choice of a proper significance level when overtesting, the numerous tests needed if many groups are involved, and the lack of one overall measure of significance for the differences among the means.
  It is a logical extension of the t test when we have data from 3 or more independent groups.
  Analysis of variance is unique in that it compares 2 different estimates of the population variance to test a hypothesis concerning the population means:
    Within-group variance: the estimate pooled from the variances within each group.
    Between-group variance: the estimate based on the variation of the group means around the overall mean.
  Assumptions:
    Observations are independent.
    Observations in each group are normally distributed.
    The variance of each group is equal to that of any other group (homogeneity of variances).
  It is a robust technique.

Chi-Square test
  Easy to perform; has a wide variety of applications in the health and medical sciences.
  Determines whether there is an association between 2 variables.
  Compares the observed frequencies with the expected frequencies.
  Types:
  a. whether 2 variables are independent
  b. whether various subgroups are homogeneous
  c. whether there is a significant difference in the proportions in the subclasses among the subgroups
  A generally well-accepted rule is that no expected frequency should be less than 1 and not more than 20% of the cells should have an expected frequency of less than 5.
Use Fisher's exact test if the expected frequencies are too small.
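The comparison of observed with expected frequencies, and the rule of thumb about small expected frequencies, can be sketched in plain Python. The 2 x 2 table is invented for illustration, and the statistic is the ordinary chi-square sum without a continuity correction, which is an assumption about the intended variant.

```python
# Hypothetical 2 x 2 table of observed frequencies
# (rows: exposed / not exposed, columns: disease / no disease).
observed = [[20, 30],
            [10, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for each cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic = sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Rule of thumb from the notes: no expected frequency below 1, and
# no more than 20% of cells with an expected frequency below 5.
cells = [e for row in expected for e in row]
rule_ok = min(cells) >= 1 and sum(e < 5 for e in cells) / len(cells) <= 0.20

print(expected, round(chi_square, 4), rule_ok)
```

For a 2 x 2 table there is 1 degree of freedom, and the 0.05 critical value in a chi-square table is 3.84. If `rule_ok` were false, Fisher's exact test would be the alternative named above.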
