
Title: Student Survey on Different Types of Data

Abstract:

Introduction
The objective of this project is to learn how to analyse data and draw conclusions by performing
statistical tests with computer software, so that the results can support decision-making in a given
situation or in the running of a company or organisation.

Process used
The survey data were collected using a scale to measure weight, a ruler to measure height and a
survey question sheet to collect the remaining information. The collected data were then analysed
with the SPSS software.

Project Phases
Questionnaire Design

Data Collection
The data collected by all the groups were combined in an Excel workbook and then imported into
SPSS for testing and analysis.
Samples of the collected data in SPSS and Excel are given below.

Figure 1 Sample of data collection in SPSS

Figure 2 Sample of data collection in Excel

EDA (Graphical and Numerical)


Descriptive statistics table

Figure 3 The descriptive statistics table for the Dept, CATOTAL, SVTOTAL and EVTOTAL

The Valid N (listwise) is the number of cases that have no missing values on any of the variables in
the table.

The N is the number of valid observations (the sample size) for each variable. In this case, each
variable has N = 87.
The Minimum is the smallest observed value of each variable.
The Maximum is the largest observed value of each variable.
The Mean is the arithmetic mean of the observations. It is also known as the average and is the most
widely used measure of central tendency.
The Std. Deviation measures how spread out the values are around the mean: the larger it is, the
further the values typically lie from the mean. It is the square root of the variance.
The Variance is the square of the standard deviation.
Skewness measures the degree and direction of asymmetry. A distribution that is skewed to the left
has a negative skewness. A distribution that is skewed to the right has a positive skewness. A normal
distribution has a skewness of 0.

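Kurtosis is a measure of the heaviness of the tails of a distribution. SPSS reports excess kurtosis, so
a normal distribution has a kurtosis of 0; positive values indicate heavier tails and negative values
lighter tails than a normal distribution.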


The Statistic columns contain the descriptive statistics themselves.
The Std. Error column gives the standard error of a statistic, an estimate of how much that statistic
would vary from one sample to another.
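
To make the definitions above concrete, here is a minimal Python sketch that computes the same quantities for a list of scores. It assumes no missing values and at least four observations; the function name `describe` and the sample scores are hypothetical. The skewness and kurtosis lines use the bias-adjusted sample formulas that SPSS is understood to apply, so other software may give slightly different values.

```python
import math
import statistics


def describe(values):
    """Descriptive statistics similar to an SPSS Descriptives table.

    Assumes at least four non-missing values. Skewness and kurtosis use the
    bias-adjusted sample formulas, so a normal distribution scores about 0.
    """
    n = len(values)
    mean = statistics.mean(values)
    variance = statistics.variance(values)   # sample variance (divides by n - 1)
    sd = math.sqrt(variance)                 # standard deviation = sqrt(variance)

    # Central moments (divided by n) feed the skewness and kurtosis formulas.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

    # Standard errors, as shown in the Std. Error column.
    se_mean = sd / math.sqrt(n)
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

    return {
        "N": n, "Minimum": min(values), "Maximum": max(values),
        "Mean": mean, "Std. Error Mean": se_mean,
        "Std. Deviation": sd, "Variance": variance,
        "Skewness": skewness, "Std. Error Skewness": se_skew,
        "Kurtosis": kurtosis, "Std. Error Kurtosis": se_kurt,
    }


# Hypothetical scores, not the survey data.
for name, value in describe([12, 15, 15, 18, 20, 22, 25, 31]).items():
    print(f"{name:>20}: {value:.3f}")
```

For a perfectly symmetric list such as 1, 2, 3, 4, 5 the skewness is 0, while a single large value pulls the skewness above 0.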

Stem and Leaf

Figure 4 The stem and leaf plot for the CATOTAL

A stem and leaf plot is a graphical summary of the data that also keeps the individual values. In the
plot, the Frequency column gives the number of values (leaves) in each row. Here the stem width is
10, so the stem represents the tens: for example, the number 2 in the Stem column represents 20. The
leaf is the digit in the ones place of each value, so the number of leaves in a row shows how many
values share that stem. For example, a stem of 2 with leaves 1 and 7 represents the values 21 and 27.
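
The rules above can be sketched in a few lines of Python. This simplified version assumes non-negative whole numbers and a stem width of 10; unlike SPSS, it does not split stems or list extreme values separately. The function name `stem_and_leaf` and the sample scores are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Rows of (frequency, stem, leaves) with a stem width of 10.

    Assumes non-negative whole numbers: the stem is the tens part and the
    leaf is the ones digit, so 27 becomes stem 2, leaf 7.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    table = []
    # Include empty stems between the smallest and largest so gaps stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = rows.get(stem, [])
        table.append((len(leaves), stem, "".join(str(leaf) for leaf in leaves)))
    return table


# Hypothetical CATOTAL-style scores, not the survey data.
print("Frequency  Stem &  Leaf")
for freq, stem, leaves in stem_and_leaf([12, 15, 21, 23, 23, 27, 34, 38, 41]):
    print(f"{freq:>9}  {stem:>4} .  {leaves}")
```

The Frequency value in each row is simply the number of leaves in that row.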

Frequency

Figure 5 The statistics table for the Gef, CATOTAL, SVTOTAL and EVTOTAL

The Statistics table above summarises the selected variables. It contains the same kind of information
as the descriptive statistics table, except that the statistics are listed in the rows and the variables in
the columns. It also includes the Sum, which is the total of all the values of each variable.

Figure 6 Frequency Table for Gef

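The frequency table above for Gef shows how the Gef values are distributed. The Frequency column
gives the number of male and female respondents. An SPSS frequency table also includes Percent,
Valid Percent and Cumulative Percent columns, which express these counts as percentages of the total.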
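
A frequency table like this can be sketched in Python as follows. The sketch assumes there are no missing values, so Percent and Valid Percent coincide; the function name `frequency_table` and the responses are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Rows of (value, frequency, percent, cumulative percent).

    Assumes no missing values, so Percent and Valid Percent are the same.
    """
    counts = Counter(values)
    total = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]          # running count for Cumulative Percent
        rows.append((value, counts[value],
                     round(100 * counts[value] / total, 1),
                     round(100 * running / total, 1)))
    return rows


# Hypothetical responses, not the survey data.
for row in frequency_table(["Female", "Male", "Female", "Female"]):
    print(row)
# ('Female', 3, 75.0, 75.0)
# ('Male', 1, 25.0, 100.0)
```

The last Cumulative Percent is always 100, because every respondent falls into some category.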

Figure 7 Frequency Table for the CATOTAL

This is the CATOTAL frequency table. It shows how many times each value of CATOTAL occurs.

Figure 8 Frequency table for the SVTOTAL

The SVTOTAL frequency table shows how many times each value of SVTOTAL occurs.

Figure 9 Frequency table for the EVTOTAL

The frequency table for EVTOTAL shows how many times each value of EVTOTAL occurs.

Graphs
Box plot

Figure 10 Boxplot of the CATOTAL based on Gender

A box plot visually represents five summary statistics: the minimum, the first quartile, the median,
the third quartile and the maximum.
The short horizontal line at the end of the upper whisker (the vertical line extending from the top of
the box) represents the maximum value. The top of the box represents the third quartile. The line
inside the box represents the median (the second quartile). The bottom of the box represents the first
quartile. The short horizontal line at the end of the lower whisker represents the minimum value. In
SPSS, values lying more than 1.5 box lengths beyond the edges of the box are treated as outliers and
plotted separately, in which case the whiskers end at the largest and smallest values that are not
outliers.
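
The quantities drawn in a box plot can be computed with a short Python sketch. It uses the standard library's `statistics.quantiles` with the "inclusive" method; SPSS may use a different quartile rule, so box edges can differ slightly. The function name `boxplot_summary` and the scores are hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Five-number summary, whisker ends and outliers for a box plot.

    Quartiles use Python's "inclusive" method; SPSS may use a different
    quartile rule, so the box edges can differ slightly from SPSS output.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1                     # interquartile range (IQR)
    low_fence = q1 - 1.5 * box_length
    high_fence = q3 + 1.5 * box_length
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "minimum": min(values), "q1": q1, "median": median, "q3": q3,
        "maximum": max(values),
        # Whiskers stop at the most extreme values that are not outliers.
        "lower_whisker": min(inside), "upper_whisker": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical CATOTAL-style scores with one unusually large value.
print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

In this example the maximum (100) is an outlier, so the upper whisker stops at 9 instead of reaching the maximum.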

Histogram

A histogram shows how many values of a variable fall within each interval (bin). This histogram
shows the distribution of SVTOTAL. The overlaid normal curve suggests that SVTOTAL is
approximately normally distributed.
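
A histogram is built by counting values in equal-width intervals. The sketch below assumes the interval start, width and number of bins are chosen by hand (SPSS chooses them automatically); the function name `histogram_counts` and the scores are hypothetical.

```python
def histogram_counts(values, start, width, bins):
    """Count the values in each interval [start + i*width, start + (i+1)*width).

    The top edge of the last interval is included so the maximum is not lost;
    values outside the whole range are ignored.
    """
    counts = [0] * bins
    top = start + bins * width
    for v in values:
        if start <= v <= top:
            # Clamp so a value exactly on the top edge lands in the last bin.
            index = min(int((v - start) // width), bins - 1)
            counts[index] += 1
    return counts


# Hypothetical SVTOTAL-style scores, not the survey data.
print(histogram_counts([31, 35, 38, 40, 42, 44, 47, 51, 55, 60], start=30, width=10, bins=3))
# [3, 4, 3]
```

Each count becomes the height of one bar in the histogram.
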
Pie Chart

A pie chart emphasises the relative proportions of a whole. Each sector represents one part of the
whole, and its size is proportional to that part's share. In this case, the chart represents the sums of
three variables: CATOTAL, SVTOTAL and EVTOTAL. The pie chart shows at a glance that CATOTAL
has the largest sector, which means that the sum of CATOTAL is larger than the sums of the other
two variables.
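
Each sector's share and angle follow directly from the totals: share = part / whole, and angle = 360 degrees times the share. A minimal Python sketch, where the function name `pie_sectors` and the sums are hypothetical:

```python
def pie_sectors(totals):
    """Map each category to (percent of the whole, sector angle in degrees)."""
    grand_total = sum(totals.values())
    return {
        name: (100 * value / grand_total, 360 * value / grand_total)
        for name, value in totals.items()
    }


# Hypothetical sums, not the values in the chart above.
sectors = pie_sectors({"CATOTAL": 50, "SVTOTAL": 30, "EVTOTAL": 20})
print(sectors)    # CATOTAL gets 50% of the circle, i.e. 180 degrees
print(max(sectors, key=lambda name: sectors[name][0]))   # CATOTAL
```

The largest total always gets the largest sector, which is why the chart can be read at a glance.
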
Scatter plot

A scatter plot graphically represents the relationship between two quantitative variables, with each
point representing one observation. It can also show whether the relationship appears to be linear.
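
The strength of a linear relationship seen in a scatter plot is commonly summarised by Pearson's correlation coefficient r, which ranges from -1 to +1. The sketch below (function name `pearson_r` and points hypothetical) also shows why the plot itself matters: a perfectly U-shaped pattern can give r = 0, because r only measures linear association.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical points: a straight line and a U-shaped curve.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))          # 1.0: perfect positive linear
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0: strong but non-linear
```
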
Inferential Statistics
Inferential statistics draw conclusions about a population from one or more samples. This is done
using statistical tests based on probability distributions. T-tests, the ANOVA test, factor analysis, the
chi-square contingency test, and regression and correlation are some of the methods that come under
inferential statistics.
T-tests
There are three main types of t-test: the one-sample t-test (single-sample t-test), the independent
samples t-test and the dependent samples t-test (paired t-test).
The one-sample t-test compares the mean of a single sample with a specified test value.
The independent samples t-test compares the means of the same variable in two independent groups.
The dependent samples t-test is used when the two sets of values are not independent, for example
when two variables are measured on the same students. It tests whether the mean of the paired
differences equals a given value, usually 0.

Single-sample t-test
Below is the one-sample t-test for Regor. The t-statistic is 22.445 with 86 degrees of freedom. The
two-tailed p-value (Sig. (2-tailed)) is reported as 0.000, that is, less than 0.001, which is below the
0.05 significance level. We therefore conclude that the mean of Regor is significantly different from
the test value of 0.
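
The t-statistic above comes from t = (sample mean - test value) / (s / sqrt(n)) with n - 1 degrees of freedom, so 87 students give 86 degrees of freedom. A minimal Python sketch of that arithmetic follows; the function name `one_sample_t` and the data are hypothetical. It stops at t and df; the p-value then comes from the t distribution (if SciPy is available, `scipy.stats.ttest_1samp` performs the complete test).

```python
import math
import statistics


def one_sample_t(values, test_value=0.0):
    """t statistic and degrees of freedom for a one-sample t-test."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)   # standard error of the mean
    t = (statistics.mean(values) - test_value) / se
    return t, n - 1


# Hypothetical data, not the Regor values.
print(one_sample_t([2, 4, 6, 8], test_value=0))
```
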

Independent sample t-test


The test below is for the variable CATOTAL, and the two groups are female and male. The t-statistic is
0.099 with 85 degrees of freedom, and the two-tailed p-value is 0.921, which is greater than the
significance level (0.05). We therefore conclude that there is no statistically significant difference in
the mean CATOTAL between males and females.
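
With equal variances assumed, the independent samples t-test pools the two sample variances and uses n1 + n2 - 2 degrees of freedom, which for the 87 students gives the 85 degrees of freedom quoted above. A minimal Python sketch of the arithmetic, with the function name `independent_t` and the data hypothetical (SciPy's `scipy.stats.ttest_ind` performs the full test, including the p-value):

```python
import math
import statistics


def independent_t(group1, group2):
    """Pooled-variance t statistic and df ("equal variances assumed")."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, n1 + n2 - 2


# Hypothetical scores for two groups, not the survey data.
print(independent_t([1, 2, 3], [4, 5, 6]))
```
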

Paired sample t-test


The t-statistic is 20.340 with 86 degrees of freedom, and the two-tailed p-value is reported as 0.000,
which is less than the 0.05 significance level. We therefore conclude that the mean difference between
SVTOTAL and EVTOTAL is significantly different from 0.
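
A paired t-test is a one-sample t-test on the pairwise differences (first variable minus second) with a test value of 0. A minimal Python sketch, where the function name `paired_t` and the scores are hypothetical (SciPy's `scipy.stats.ttest_rel` performs the full test):

```python
import math
import statistics


def paired_t(first, second):
    """t statistic and df for a paired t-test on first[i] - second[i]."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1


# Hypothetical SVTOTAL/EVTOTAL-style pairs for four students.
print(paired_t([5, 7, 9, 11], [3, 4, 6, 7]))
```
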

The Paired Samples Statistics table has five columns: the variable column (SVTOTAL and EVTOTAL),
the Mean column, the N column, the Standard Deviation (Std. Deviation) column and the Standard
Error Mean (Std. Error Mean) column.
The Mean column gives the mean of each of the two variables.
The N is the number of students in the sample.
The Std. Deviation measures how far the values are spread from the mean. A large standard deviation
means that the values are spread far from the mean; a small standard deviation means that they are
close to the mean.
The Std. Error Mean is the standard error of the mean, calculated as the standard deviation divided by
the square root of N.
ANOVA test
An ANOVA (analysis of variance) test compares the means of two or more groups by comparing the
variation between the groups with the variation within them.
In the following example, we test whether the year levels have the same or different mean study hours.

This is the descriptives table, with the year levels in the leftmost column. The N is the number of
students in each year level, and the Mean column gives the mean study hours of each year level, so
the means can be compared across levels. The table also includes the standard deviation, the
standard error, the confidence interval for the mean, and the minimum and maximum.

The ANOVA table consists of the Sum of Squares, the degrees of freedom (df), the Mean Square, F and
Sig. F is calculated by dividing the mean square between groups by the mean square within groups.
Sig. is the p-value.
To decide whether the mean study hours of the year levels are the same or different, we look at the
Sig. value. If it is greater than the significance level, we conclude that there is no significant difference
among the means. If it is less than the significance level, we conclude that at least one mean is
different. In this case, the significance level is 0.05 and the Sig. value is 0.004, which is less than 0.05,
so we conclude that there is a difference among the means.
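
The entries of the ANOVA table can be reproduced by hand: the between-groups sum of squares measures how far the group means are from the grand mean, the within-groups sum of squares measures how far values are from their own group mean, and F is the ratio of the two mean squares. A minimal Python sketch, with the function name `one_way_anova` and the study hours hypothetical (SciPy's `scipy.stats.f_oneway` returns F and the p-value directly):

```python
import statistics


def one_way_anova(*groups):
    """Return (SS between, SS within, df between, df within, F) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(g) for g in groups]

    # Between groups: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: how far each value is from its own group mean.
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means))

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)  # MS between / MS within
    return ss_between, ss_within, df_between, df_within, f


# Hypothetical weekly study hours for three year levels, not the survey data.
print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # F is 27.0 for these values
```
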
The ANOVA result shows that there is a difference among the means, but not which means differ. To
find out which means are different, a post hoc test is conducted.
A post hoc test is used after an ANOVA test finds a significant difference between the means, to
identify which pairs of groups differ.

The Multiple Comparisons table lists the results of the post hoc test. The first columns list the year
levels being compared: each single year level is listed on the left, and the other year levels it is
compared with are listed on the right.
Most of the Sig. values are greater than 0.05, meaning that those pairs of means are not significantly
different. The exceptions are four values: two of 0.002 and two of 0.014. Each comparison appears
twice in the table (for example, year 1 compared with year 3 and year 3 compared with year 1), which
is why each value occurs twice. The 0.002 values come from the comparison between year level 1
and year level 3, and the 0.014 values from the comparison between year level 2 and year level 3.
This means that year levels 1 and 3 have different mean study hours, as do year levels 2 and 3, while
the other pairs of year levels are not significantly different from each other.

The means plot shows the mean study hours of each year level, making the differences among them
easy to see.

Factor Analysis
Factor analysis is a data reduction technique that summarises a larger number of variables with a
smaller number of underlying factors and shows the relationships among the variables. It requires a
large sample. The factor analysis performed is given below.

The communality of a variable is the proportion of its variance that is explained by the factors. It is
the sum of the variable's squared factor loadings. The Initial column gives the values determined by
the squared multiple correlation of each variable with all the other variables.

The Component column numbers the components; there are as many components as there are
variables in the factor analysis. The Initial Eigenvalues are the variances of the components, and the
Total column lists these eigenvalues. The first component always has the largest variance, and the
variance decreases down the list, which is why the totals in the table above are listed from highest to
lowest. The % of Variance column gives each component's percentage of the total variance, and the
Cumulative % column gives the cumulative percentage of variance accounted for by the current and
all preceding components.
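
The link between communalities, eigenvalues and % of Variance can be shown with a loading matrix: a communality is a row sum of squared loadings, a component's sum of squared loadings is a column sum (for principal components this equals its eigenvalue), and % of Variance divides by the total variance. This Python sketch assumes the analysis uses the correlation matrix, so the total variance equals the number of variables; the function name `variance_explained` and the loadings are hypothetical.

```python
def variance_explained(loadings):
    """Communalities, sums of squared loadings, % of variance and cumulative %.

    loadings[i][j] is the loading of variable i on component j. Assumes the
    analysis is of the correlation matrix, so the total variance equals the
    number of variables.
    """
    n_vars = len(loadings)
    n_comps = len(loadings[0])
    # Communality of a variable: sum of its squared loadings (row sum).
    communalities = [sum(x ** 2 for x in row) for row in loadings]
    # Sum of squared loadings of a component: column sum.
    eigenvalues = [sum(row[j] ** 2 for row in loadings) for j in range(n_comps)]
    percent = [100 * e / n_vars for e in eigenvalues]
    cumulative, running = [], 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return communalities, eigenvalues, percent, cumulative


# Hypothetical loadings of three variables on two components.
print(variance_explained([[0.9, 0.3], [0.8, 0.4], [0.2, 0.9]]))
```

Because both are built from the same squared loadings, the communalities and the component sums always add up to the same total.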

The scree plot graphs the eigenvalue against the component number.
Chi-Square Contingency Test
A chi-square test is used to test whether the observed proportions of a variable differ from the
expected or hypothesised proportions.

Gef has 1 degree of freedom and a p-value of 0.004, which is less than the 0.05 significance level, so
we conclude that the Gef composition is different from the expected composition. For SVTOTAL, the
p-value is also less than 0.05, so we conclude that it is also different from the expected values.
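
The chi-square statistic sums (observed - expected)^2 / expected over the categories, with k - 1 degrees of freedom for k categories. This Python sketch assumes equal expected proportions unless others are given (the usual default in SPSS); the function name `chi_square_gof` and the counts are hypothetical. It computes the p-value only for 1 degree of freedom, where it equals erfc(sqrt(chi-square / 2)); other cases need a chi-square table or a statistics library.

```python
import math


def chi_square_gof(observed, expected_props=None):
    """Chi-square goodness-of-fit statistic, df, and p-value (p only when df = 1)."""
    k = len(observed)
    total = sum(observed)
    if expected_props is None:
        expected_props = [1 / k] * k          # equal shares by default
    expected = [total * p for p in expected_props]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = k - 1
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2)) if df == 1 else None
    return chi2, df, p


# Hypothetical counts of two categories, not the survey data.
print(chi_square_gof([30, 10]))   # chi-square = 10.0 with 1 df
```
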
Regression & Correlation

Conclusion & Recommendation


