Data Analysis & Interpretation 11jan2009

Data Analysis and Interpretation
Zakiah Ahmad
Faculty of Civil Engineering, UiTM Shah Alam
Zakiahah@hotmail.com
PhD/Master Research Methodology Course, UiTM (20 Feb 2010)
Content of presentation

- Introduction
- Administration of the data collection program
- Data acquisition
- Data analysis
- Data presentation
- The construction of good charts and graphs for data presentation
- Statistical tests in data analysis
- Conclusion
Introduction
What data do we need? Why do we place great emphasis on quality and quantity of data? Why do research problems and objectives (or hypotheses) need to be carefully constructed?
Hypothesis
A hypothesis can be described as a tentative answer to a research question or a provisional prediction
A hypothesis should be:
- Stated clearly, using appropriate terminology
- Testable
- A statement of the relationship between variables
- Limited in scope
Types of hypothesis
- When words such as "influence" or "affect" are used without stating a direction, the hypothesis is called a two-tailed hypothesis.
- When directional words such as "reduce", "increase", "lower" or "raise" are used, the hypothesis is called a one-tailed hypothesis. This can lead to easier data analysis.
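The difference between the two types of hypothesis shows up in how the p-value is computed. The sketch below is only an illustration: the strength values are hypothetical, and a normal (z) approximation is assumed so that the example needs nothing beyond the Python standard library. For small samples a t-test would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_test_p_values(sample, mu0):
    """Return (z, two-tailed p, one-tailed p) for H0: mean == mu0.

    The one-tailed p-value is for the directional alternative mean > mu0.
    A normal approximation is assumed (an illustrative simplification).
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    upper_tail = 1.0 - NormalDist().cdf(z)                   # P(Z >= z)
    p_one_tailed = upper_tail                                # H1: mean > mu0
    p_two_tailed = 2.0 * min(upper_tail, 1.0 - upper_tail)   # H1: mean != mu0
    return z, p_two_tailed, p_one_tailed

# Hypothetical compressive strengths (MPa) of cubes made with an additive.
strengths = [31.2, 30.8, 32.5, 31.9, 30.4, 32.1, 31.6, 32.8, 31.1, 31.7]
z, p_two, p_one = z_test_p_values(strengths, mu0=31.0)
print(f"z = {z:.2f}, two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```

When the observed effect lies in the predicted direction, the one-tailed p-value is half the two-tailed one, which is why a directional hypothesis is easier to support with the same data.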
Administration of the data collection program
Of utmost importance! You cannot rely heavily on others to administer the data collection program. This is especially true when the study involves issuing questionnaires. Others may not know what level of accuracy or precision you require during sampling or when making measurements.
Data acquisition

- Identification of the data/variables to be measured.
- Sources of data, in terms of their reliability.
- Characteristics of the data, in terms of quantity and quality.
- The need to refer back to the research problems and objectives (or hypotheses).
- Documentation of your data.
Example1 summary data to be collected Example2 data collection and analysis
Some fundamental questions.
- What data are needed?
- What are the required characteristics in terms of quantity and quality (data type, quality, quantity, resolution, precision, accuracy and coverage) in order to properly address the research objectives?
- Are the data sufficient? How would you assess their suitability?
- If the data are not available, how would you generate them?
Some fundamental questions (continued)
- What variables will be measured?
- How will measurements be made?
- What sampling scheme will be employed, and why?
- What logistical problems (e.g., accessibility) need to be considered?
- At what scale(s) will measurements be made?
- How will you ensure that you are measuring what you think you are measuring?
Executive summary
What implications are there for the subsequent analysis?
- How does sample size constrain the effectiveness (e.g., power) of statistical tests?
- Are replicate observations needed?
- Is there a spatial dimension to your data, and if so, have you worked out what the distance between your samples should be?
- Have you over-sampled or under-sampled, and can this be remedied beforehand?
- Are the data "representative", and how do you know?
- Are the data "random", "stratified" or "nested", and does this matter?
- Does the type of data (ratio scale, interval, ordinal, discrete, nominal, closed, directional) have implications for data analysis?
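The first question above, how sample size constrains the power of a test, can be explored before any data are collected. The sketch below is a rough illustration only: it assumes a two-sided one-sample z-test with a known standard deviation, and the effect size and standard deviation are hypothetical numbers.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect: true difference from the hypothesised mean
    sigma:  population standard deviation (assumed known)
    n:      sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(effect) * sqrt(n) / sigma
    # Probability that the test statistic lands in either rejection region.
    return (1.0 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

# Power grows with sample size for a fixed (hypothetical) effect and spread.
for n in (5, 10, 20, 40, 80):
    print(n, round(z_test_power(effect=1.0, sigma=2.0, n=n), 3))
```

Running a calculation like this for the smallest effect that matters to the study gives a first estimate of whether the planned sample is too small.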
Experimental design
DATA
Variables and Parameters

- What is the difference between a variable and a parameter?
- At which stage of your research would you use variables? (Chapter 1)
- At which stage of your research would you use parameters?
parameter
What is a Variable?
- A variable can be defined as a characteristic, a measurable quantity, a trait, an attribute of a person, etc.
- It can take more than one value; that is, it varies.
- Variables can be classified into: 1) qualitative variables 2) quantitative variables
1) Qualitative variables
- describe a characteristic, a recognition, a pattern, a preference, etc.
2) Quantitative variables
- represent a characteristic, a parameter, etc. numerically, with a range of values
- are normally of two kinds: a) discrete variables b) continuous variables
Variables can be further sub-divided by type and level of measurement:

i) Continuous
- can take an infinite number of values
- can be expressed in decimals
- e.g. stress, COD, bearing capacity, velocity
ii) Discrete
- normally whole numbers; decimals and fractions are excluded
- e.g. number of cars per house
iii) Categorical/Nominal
- no hierarchical order
- mutually exclusive
- e.g. color, race, religion, skin color, etc.
iv) Ordinal/Ranked
- no common unit of measurement
- classes can be ranked
- e.g. income level, job position, opinion
v) Interval
- have a common unit of measurement
- do not have a true zero point
- e.g. temperature (in °C), pH, turbidity, etc.
vi) Ratio
- have a common unit of measurement
- have a true zero point
- e.g. height, weight, density, pressure, etc.
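The level of measurement decides which summary statistics are meaningful. The sketch below illustrates the usual convention (mode at every level, median once values can be ordered, mean once there is a common unit). The function name `summarise` and the sample values are hypothetical, and Python 3.8 or later is assumed for the behaviour of `statistics.mode`.

```python
from statistics import mean, median_low, mode

LEVELS = ("nominal", "ordinal", "interval", "ratio")

def summarise(values, level):
    """Return only the summary statistics that make sense at this level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level of measurement: {level}")
    summary = {"mode": mode(values)}             # valid at every level
    if level != "nominal":
        summary["median"] = median_low(values)   # needs an ordering
    if level in ("interval", "ratio"):
        summary["mean"] = mean(values)           # needs a common unit
    return summary

print(summarise(["red", "blue", "red", "green"], "nominal"))
print(summarise([1, 2, 2, 3, 3, 3], "ordinal"))          # coded ranks, e.g. income level
print(summarise([25.5, 27.0, 26.5, 28.0], "interval"))   # temperature in deg C
```

`median_low` is used so that the median of ranked data is always one of the observed categories rather than an average of two ranks.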
Variables can also be grouped as: 1) independent variables 2) dependent variables
1) Independent variables
- are manipulated or subjected to some form of treatment
- are sometimes called experimental variables
2) Dependent variables
- change in response to changes in the independent variables
Variables are quantities which vary from individual to individual. By contrast, parameters do not relate to actual measurements or attributes but to quantities defining a theoretical model.
The word parameter can also be related to its original mathematical meaning as the value(s) defining one of a family of curves. The slope and intercept of a line (more generally known as regression coefficients) are examples of the parameters defining a model.
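To make the distinction concrete, the sketch below fits a straight line by ordinary least squares. The measured loads and deflections are the variables; the slope and intercept returned by the fit are the parameters of the model. The data and the function name `fit_line` are hypothetical and serve only as an illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Variables: hypothetical measured loads (kN) and deflections (mm).
load = [0.0, 5.0, 10.0, 15.0, 20.0]
deflection = [0.1, 1.1, 2.0, 3.1, 3.9]

# Parameters: the two numbers that define the fitted model.
slope, intercept = fit_line(load, deflection)
print(f"deflection = {intercept:.3f} + {slope:.3f} * load")
```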
Measurement and Process

- The method of measurement is as important as the process of measurement in obtaining exact and precise values.
Measurement Methods
- Basically, there are two approaches in measurement, namely:
a) direct/primary comparison b) indirect comparison
a) Direct/primary comparison
- The measurement is compared with a primary standard, such as a dimensional standard (e.g. ISO standards, SI units).
- SI units have traditionally been grouped into 3 classes: i) base units ii) supplementary units iii) derived units. (Since 1995 the supplementary units, the radian and steradian, have been classed as derived units.)
b) Indirect comparison
- The measurement is compared through the use of a calibrated system.
- The calibration procedure establishes the correct output of the measuring system.
Quality of Measurement
- The quality of a measurement is usually described by the following terms:

a) accuracy: determined by the difference between the measured and true values
b) precision: determined by the variation between the values an instrument reports under repeated measurement of the same quantity under the same conditions of use
c) reliability: a measure of how consistently an instrument measures what it is supposed to measure
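Accuracy and precision can be estimated separately from repeated readings of a known reference. In the sketch below, the bias (mean reading minus true value) stands in for accuracy and the sample standard deviation stands in for precision. The readings and the 100.0 mm reference are hypothetical.

```python
from statistics import mean, stdev

def accuracy_and_precision(readings, true_value):
    """Return (bias, spread) for repeated readings of one quantity.

    bias:   mean reading minus the true value (smaller magnitude = more accurate)
    spread: sample standard deviation of the readings (smaller = more precise)
    """
    return mean(readings) - true_value, stdev(readings)

# Hypothetical readings of a 100.0 mm reference length from two instruments.
instrument_a = [100.9, 101.0, 101.1, 101.0, 100.9]   # tight but offset
instrument_b = [99.2, 100.9, 100.1, 99.5, 100.4]     # centred but scattered

for name, readings in (("A", instrument_a), ("B", instrument_b)):
    bias, spread = accuracy_and_precision(readings, true_value=100.0)
    print(f"instrument {name}: bias = {bias:+.2f} mm, spread = {spread:.2f} mm")
```

Instrument A is precise but not accurate, while instrument B is accurate on average but not precise, which is why the two terms have to be reported separately.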
Purpose of Measurement
- The purposes of measurement include:
1) to establish the system characteristics
2) to establish and learn from past records and evidence
3) to compare the measurement with a standard reference
4) to form new or truthful knowledge
5) to predict phenomena or conditions
Measurement Process
- The measurement process can generally be described as a sequence of 5 operations:

a) Designing an efficient measurement set-up
b) Intelligent operation or use of the measuring instrument
c) Recording the information/data in a manner that is clear and complete
d) Estimating the accuracy of the measurement and eliminating sources of error
e) Reporting the measurements made
Measurable Variables
- Theoretically, we can measure almost everything, either qualitatively or quantitatively.
- Some common measures frequently encountered may be classified as:

- Chemical measures
- Physical measures
- Biological measures
- Behavioral measures
- Cognitive measures
- Affective measures
- Economic measures
Manipulative and Non-Manipulative Variables

A manipulative variable is an independent variable under the control of the researcher, as in experimental research.
A non-manipulative variable is an independent variable not under the direct control of the researcher. There are two types of non-manipulative variables, namely selected variables and natural treatment (or natural exposure) variables.
Sources of Error in Measurement

1) Human Error or Gross Error
- cannot be treated mathematically
- can be avoided only by taking care during work
- preferably take more than one reading
2) Systematic Error
- can be divided into two: i) instrumental error ii) environmental error
3) Random Error
- due to chance; occurs even when all systematic errors have been accounted for
- increasing the number of readings or samples reduces this error
- statistical methods are used to obtain the best approximation of the true value of the quantity being measured
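The effect of taking more readings on random error can be illustrated with the standard error of the mean, s / sqrt(n). The sketch below simulates readings with purely random (Gaussian) error. The true value and noise level are hypothetical, and the simulation says nothing about systematic error, which averaging does not remove.

```python
import random
from math import sqrt
from statistics import mean, stdev

def standard_error(readings):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(readings) / sqrt(len(readings))

random.seed(1)      # fixed seed so the illustration is repeatable
TRUE_VALUE = 50.0   # hypothetical true value
NOISE_SD = 2.0      # hypothetical random error (standard deviation) of one reading

for n in (4, 16, 64, 256):
    readings = [random.gauss(TRUE_VALUE, NOISE_SD) for _ in range(n)]
    print(f"n = {n:3d}: mean = {mean(readings):.2f}, "
          f"standard error = {standard_error(readings):.2f}")
```

Because the standard error falls with the square root of n, halving the random error of the mean requires roughly four times as many readings.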
Overview of data processing

1) Recording of raw data: lab-based, field-based, modeling-based, participatory-based, census-based, interviews
2) Editing & Cleaning
3) Coding*: developing a code book, pre-testing the code book, coding the data, verifying the coded data
4) Analysis: engineering software, statistical software, empirical or mathematical models; used to describe the characteristics of a unit or a system and to establish cause-effect relationships
5) Interpretation
6) Conclusions
7) Recommendations
*qualitative studies
Data Presentation
- Raw data should not be presented in the main report; use appendices.
- Tables in the main report should consist of analyzed data, NOT raw data.
- Typical results (graphs) should be provided in the main report.
- The characteristics of each sample should be summarized in a table.
- The same information should be presented in either a table or a graph, NOT both.
- Each table and graph provided in the main report must be discussed.
Data Presentation
- Tables and figures must be consistent.
- All figures and tables are to be accompanied by text.
- Figures and tables should come immediately after the text that refers to them.
- Label all figures and tables accordingly. (Refer to the university guidelines for style and format.)
- Only typical values or cross-sections are shown. Raw data are to be placed in the appendices.
- Use proper graphs to present your results effectively: bar charts, histograms, box plots (rarely used), scatter plots, line plots, pie charts (avoid if you can!), and cone and cylinder charts (NEVER!).
- Provide equations wherever appropriate.
Data Presentation
Tables
- Tables must have: title, column headings, body, and supplementary notes or footnotes.
Graphs
- Types of graphs: histogram, bar chart, pie chart, line diagram or trend curve, scattergram.
Data presentation
Ensure that units are given in Tables and Figures.
Pictures Example sem
The construction of good charts and graphs for data presentation
- Charts are used to convey ideas about the data that are not apparent when the data are displayed in a table or in text. (Remember: efficient display of meaningful and unambiguous data.)
- Charts allow the reader to grasp quickly what the data mean.
- Every dot on a scatter plot, every point on a time series line, and every bar on a bar chart represents a number (actually, in the case of a scatter plot, two numbers). It is important to tell the reader what each of those numbers represents.
Basic components of a Chart
- the labeling that defines the data: the title, axis titles, scale and data labels, legends defining separate data series, and notes (often to indicate the data source),
- the scales defining the range of the Y (and sometimes the X) axis, and
- the graphical elements that represent the data: the bars in bar charts, the lines in time series plots, the points in scatter plots, or the slices of a pie chart.
Basic components of a Chart (continued)
- Titles. The title should define the data series without imposing a data interpretation on the reader. Often, the units of measurement are specified at the end of the title after a colon, or in parentheses in a subtitle.
- Axis titles. Axis titles should be brief, and should not be used at all if they merely repeat what is already clear from the title and axis labels.
- Axis scale and data labels. The value or magnitude of the main graphical elements of the chart is defined by the axis scale, by individual data labels, or by both. Avoid using too many numbers to define the data points. A chart that labels the value of each individual data point does not need labeling on the Y axis. If it seems necessary to label every value in a chart, a table is probably a more efficient way of presenting the data.
- Legends. Legends are used in charts with more than one data series. They should not be placed outside the chart in a way that reduces the plot area, that is, the amount of space given to representing the data. The legend can be placed inside the chart (although some think that this detracts from the main graphical elements) or at the bottom of the chart.
- Gridlines. If used at all, gridlines should use as little ink as possible so as not to overwhelm the main graphical elements of the chart.
- The source. Specifying the source of the data is important for proper academic citation, but it can also give knowledgeable readers, who are often familiar with common data sources, important insights into the reliability and validity of the data.
- Other chart elements. The amount of ink given over to non-data elements that are not necessary for defining the meaning and values of the data should be kept to an absolute minimum. Plot area borders and plot area shading are unnecessary. Keep the shading of the graphical elements simple and always avoid unnecessary 3-D effects.
Avoid the following
Using pie charts. Pie charts usually contain more ink than is necessary to display the data, and the slices give a poor representation of the magnitude of the data points. Forcing the reader to draw comparisons across two pie charts is also a bad idea. 3-D pie charts are even worse, as they add visual distortion to the data.
Data Presentation
Examples of Good/Bad Tables and Figures To be discussed in class
How best can you present your data?
Rotated bar chart with two data series
Stacked bar chart

Stacked bar charts should be used with caution, as it is difficult to visualize the differences in the sizes of the components. The same goes for stacked line and area charts!
The use of a stacked bar chart to display the change in land use
[Figure: stacked bar chart of Land Use Change (%), 0% to 100%, against Period Measured (Year), 1947 to 2005, with Measured Deposits (m3), 0 to 7,000,000, on a second Y axis. Land use classes: Forest; Tree cultivation, orchards and tea; Agriculture experimental stations; Grassland, scrub forest and shifting cultivation; Residential/estate buildings and associated areas; Water body; Market and mixed agriculture and floriculture; Mining; Cleared/open land. Additional series: Measured deposits.]
Avoid using 3-D bar charts

Revised chart without the 3-D effect
Really bad 3-D bar chart
Avoid placing the legend at the side, as it reduces the chart size
Time series plot with annotations

The time series chart is one of the most efficient means of displaying large amounts of data in ways that provide for meaningful analysis. The typical time series line chart is a scatter plot with time represented on the X-axis and lines connecting the data points.
Rules for Time Series (Line) Charts
- Time is almost always displayed on the X-axis, from left to right.
- Display as much data with as little ink as possible.
- Make sure the reader can clearly distinguish the lines for separate data series.
Time series chart with second Y axis
Scatter plots
[Figure: Predicted against Measured Sediment Load. Scatter plot with logarithmic scales; X-axis: measured load (kg/s); Y-axis: predicted load (kg/s). Annotations point out the scales, the gridlines and the difference ratio.]
Rules for Scatterplots
- Fully define the variables with the axis titles.
- Place the independent variable (the one that causes the other) on the X-axis and the dependent variable (the one that may be caused by the other) on the Y-axis.
- Scale the axes to maximize the use of the plot area for displaying the data points.
- You may add data labels to identify the cases.
Bar chart / Histogram
[Figure: two annotated bar charts pointing out the chart components: Y axis, X axis, scales, gridlines, legend and data labels. First chart: floor area, Keluasan Lantai (m2), with series Purata (mean), Minimum and Maksimum. Second chart: index, Indeks (m2/pelajar, i.e. m2 per student), by university, Y scale 0 to 14.]
Data Analysis
Data analysis

- Pay attention to the units used. (Students often make mistakes here!)
- You may use statistical tools in your analysis.
- When evaluating equations developed by others, refer to the original paper written by the respective authors. Those who quote an equation second-hand may well have made mistakes, and you will end up using the wrong equation!
- Do not discard data without first checking how influential the data are. You may not know until you have done some tests on this!
- If the study involves developing empirical equations, check the significance of the controlling variables. This can be done through statistical tests. Students may also review the underlying theories and fundamentals to confirm the factors that govern the dependent variables, although what the theory suggests may not necessarily hold for your data.
Data analysis (continued)
- In modeling work (simulation, optimization, etc.), do not always believe the results generated by the model. Some engineering judgement is necessary.
- Explain the trends you observe in the figures you present. Do not explain what cannot be seen, as the examiners will be looking at the same figures with their unaided eyes!
- Do not throw away your raw data. Keep them safe so that, if your computer files become corrupted, you still have your raw data to refer to.
Data analysis statistically

Preliminary data analysis
1. Descriptive Statistics
2. Inferential Statistics
3. Predictive Statistics
Statistical data analysis techniques

Purpose of research and typical techniques:
- Descriptive: mean, median, mode, minimum, maximum, variance, proportions, etc.
- Inferential: parametric tests (hypothesis testing), non-parametric tests
- Predictive: forecasting methods, including multiple regression analysis
List of statistical tools
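The descriptive statistics named above can be computed in a few lines. The sketch below uses Python's standard `statistics` module. The function name `describe` and the survey counts are hypothetical.

```python
from statistics import mean, median, mode, variance

def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "minimum": min(values),
        "maximum": max(values),
        "variance": variance(values),   # sample variance (n - 1 denominator)
    }

# Hypothetical counts of cars per house from a small survey.
cars_per_house = [1, 2, 2, 0, 3, 2, 1, 4, 2, 1]
for name, value in describe(cars_per_house).items():
    print(f"{name:>8}: {value}")
```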
Statistical tests in data analysis

It is imperative to first categorize your variables as independent and dependent. Determine the number of independent and dependent variables in the study. You may treat intervening or nuisance variables as additional independent variables. Determine the level of measurement (nominal, ordinal or interval) applied to each relevant variable.
Examples
t-test, chi-square test, F-test
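As an illustration of one of these tests, the sketch below computes the Pearson chi-square statistic for a 2 x 2 table of counts and compares it with the 5% critical value for 1 degree of freedom (3.841). The survey counts are hypothetical, no continuity correction is applied, and in practice a statistical package would normally report the p-value directly.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic

# Hypothetical survey: rows = site type, columns = (satisfied, not satisfied).
table = [[30, 20],
         [15, 35]]
chi2 = chi_square_statistic(table)
CRITICAL_1DF_5PCT = 3.841   # chi-square critical value, 1 degree of freedom, alpha = 0.05
print(f"chi-square = {chi2:.2f}, reject independence: {chi2 > CRITICAL_1DF_5PCT}")
```

Both variables here are nominal, which is the situation where a chi-square test of independence is commonly chosen over a t-test or F-test.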
Sample of the data summary
Conclusion
Most research requires data and data analysis. Data acquisition is of utmost importance and considerable effort should be made to obtain or generate good data. Good data are data whose characteristics enable the research objectives to be met. Data of poor quality or undesirably low quantity will lead to unsatisfactory data analysis and vague results. The characteristics of the data, particularly their type, quantity, and how they were sampled, constrain the choice of data analysis techniques able to be used on the data. Use proper charts and graphs for effective presentation of your data.
Thank you
