
Data Analysis and Interpretation

Faculty of Civil Engineering, UiTM Shah Alam Zakiahah@hotmail.com PhD/Master Research Methodology Course UiTM (20 Feb 2010)

Zakiah Ahmad

Content of presentation

- Introduction
- Administration of the data collection program
- Data acquisition
- Data analysis
- Data presentation
- The construction of good charts and graphs for data presentation
- Statistical tests in data analysis
- Conclusion

Introduction

What data do we need? Why do we place great emphasis on quality and quantity of data? Why do research problems and objectives (or hypotheses) need to be carefully constructed?

Hypothesis
A hypothesis can be described as a tentative answer to a research question or a provisional prediction.
A hypothesis should be:
- stated clearly, using appropriate terminology
- testable
- a statement of the relationship between variables
- limited in scope

Types of hypothesis
- When words such as "influence" or "affect" are used without stating a direction, the hypothesis is called a two-tailed hypothesis.
- When directional words such as "reduce", "increase", "lower" or "raise" are used, the hypothesis is called a one-tailed hypothesis. This can lead to easier data analysis.
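The difference between the two types can be illustrated with a small sketch. It assumes a test statistic that follows the standard normal distribution. The value z = 1.8 and the function name `p_values` are hypothetical and are used only for illustration.

```python
from statistics import NormalDist


def p_values(z):
    """Return (one-tailed, two-tailed) p-values for a standard normal statistic z.

    The one-tailed value assumes the hypothesis predicts an increase,
    so only the upper tail counts as evidence.
    """
    one_tailed = 1.0 - NormalDist().cdf(z)
    two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return one_tailed, two_tailed


# Hypothetical test statistic, not taken from any real study.
one_tailed, two_tailed = p_values(1.8)
print(f"one-tailed p = {one_tailed:.4f}")
print(f"two-tailed p = {two_tailed:.4f}")
```

For the same statistic, the one-tailed p-value is half the two-tailed one when the effect lies in the predicted direction. The direction must therefore be fixed before the data are examined.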

Administration of the data collection program

Of utmost importance! You cannot rely too much on others to administer the data collection program. This is especially true when the study involves issuing questionnaires. Others may not know what level of accuracy or precision you require during sampling or when making measurements.

Data acquisition

Identification of data/variables to be measured. Sources of data in terms of its reliability. Characteristics of data in terms of quantity and quality. The need to refer back to the research problems and objectives (or hypothesis). Documentation of your data.

Example1 summary data to be collected Example2 data collection and analysis

Some fundamental questions.

What data are needed? What are the required characteristics in terms of quantity and quality (data type, quality, quantity, resolution, precision, accuracy and coverage) in order to properly address the research objectives? Are the data sufficient? How would you assess their suitability? If the data are not available, how would you generate them?

Continue..

What variables will be measured? How will measurements be made? What sampling scheme will be employed, and why? What logistical problems (e.g., accessibility) need to be considered? At what scale(s) will measurements be made? How will you ensure that you are measuring what you think you are measuring?
Executive summary

What implications are there for the subsequent analysis?

How does sample size constrain the effectiveness (e.g., power) of statistical tests? Are replicate observations needed? Is there a spatial dimension to your data, and if so have you worked out what the distance between your samples should be? Have you over-sampled or under-sampled, and can this be remedied beforehand? Are the data "representative" and how do you know? Are the data "random" or "stratified" or "nested" and does this matter? Does the type of data (ratio scale, interval, ordinal, discrete, nominal, closed, directional) have implications for data analysis?
Experimental design

DATA

Variables and Parameters


What is the difference between a variable and a parameter? At which stage of your research would you use variables? Chapter 1 At which stage of your research would you use parameters?
parameter

What is a Variable?
Can be defined as a characteristic, a measurable quantity, a trait, an attribute of a person, etc. Basically, it can take more than one value, i.e. it varies. Variables can be classified into: 1) qualitative variables 2) quantitative variables

1) qualitative variables
- describe a characteristic, a recognition, a pattern, a preference, etc.
2) quantitative variables
- represent a characteristic, a parameter, etc. numerically, with a range of values
- normally represented in two ways, namely: a) discrete variables b) continuous variables

Quantitative variables can be further sub-divided:


i) Continuous
- can take an infinite number of values
- can be expressed in decimals
- e.g. stress, COD, bearing capacity, velocity
ii) Discrete
- normally whole numbers
- decimals and fractions are excluded
- e.g. number of cars per house
iii) Categorical/Nominal
- non-hierarchical order
- mutually exclusive
- e.g. color, race, religion, skin color, etc.
iv) Ordinal/Ranked
- do not have a common unit of measurement
- can be ranked into classes
- e.g. income level, job position, opinion
v) Interval
- have a common unit of measurement
- do not have a true zero point
- e.g. temperature, pH, turbidity, etc.
vi) Ratio
- have a common unit of measurement
- have a true zero point
- e.g. height, weight, density, pressure, etc.
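The practical consequence of a missing true zero point can be sketched with temperature. The two readings below are hypothetical. Differences are meaningful on the Celsius scale, but ratios are meaningful only after conversion to a scale with a true zero (Kelvin).

```python
def celsius_to_kelvin(t_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    return t_c + 273.15


# Hypothetical readings in degrees Celsius (interval scale).
t1_c, t2_c = 10.0, 20.0

# Differences are valid on an interval scale.
difference = t2_c - t1_c

# A ratio of Celsius values is arithmetically possible but not meaningful:
# 20 degC is not "twice as hot" as 10 degC.
naive_ratio = t2_c / t1_c

# On the Kelvin scale (ratio scale, true zero) the ratio is meaningful.
true_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(f"difference  = {difference:.1f} degC")
print(f"naive ratio = {naive_ratio:.3f}")
print(f"true ratio  = {true_ratio:.3f}")
```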

Variables can also be grouped as: 1) independent variables 2) dependent variables

1) independent variables
- the ones that are manipulated or subjected to some form of treatment
- sometimes called experimental variables

2) dependent variables
- the ones that change in response to changes in the independent variables
- depend on the changes of the independent variables

Variables are quantities which vary from individual to individual. By contrast, parameters do not relate to actual measurements or attributes but to quantities defining a theoretical model.
The word parameter can also be related to its original mathematical meaning as the value(s) defining one of a family of curves. The slope and intercept of a line (more generally known as regression coefficients) are examples of the parameters defining a model.
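As a minimal sketch of this distinction, the block below fits a straight line by ordinary least squares. The x and y values are hypothetical measurements (the variables); the slope and intercept returned by the illustrative function `fit_line` are the parameters of the model.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements: x and y are variables (they vary between observations).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

# slope and intercept are parameters (they define the fitted model).
slope, intercept = fit_line(x, y)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```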

Measurement and Process


- The method of measurement is as important as the process of measurement in order to obtain exact and precise values.

Measurement Methods
- Basically, there are two approaches in measurement, namely:

a) direct/primary comparison b) indirect comparison

a) Direct/primary comparison
- compares the measurement with a primary standard, such as a dimensional standard (e.g. ISO, SI units, etc.)
- there are 3 classes of measurement unit, namely: i) the base units ii) the supplementary units iii) the derived units

b) Indirect comparison
- compares the measurement through the use of a calibrated system
- the calibration procedure establishes the correct output of the measurement

Quality of Measurement
- Usually described by the following terms, namely:

a) accuracy: determined by the variation between the measured and true values
b) precision: determined by the variation between the instrument's reported values under repeated measurement of the same parameter/quantity under the same conditions of use
c) reliability: a measure of how well an instrument is able to measure what it is supposed to measure
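One simple way to put numbers on the first two terms is sketched below. It assumes that the true value is known, and it uses the mean offset from the true value as a stand-in for accuracy and the sample standard deviation of repeated readings as a stand-in for precision. The readings and the function name are hypothetical.

```python
from statistics import mean, stdev


def accuracy_and_precision(readings, true_value):
    """Return (bias, spread) for repeated readings of the same quantity.

    bias   : mean reading minus true value (smaller magnitude = more accurate)
    spread : sample standard deviation (smaller = more precise)
    """
    bias = mean(readings) - true_value
    spread = stdev(readings)
    return bias, spread


true_value = 50.0  # assumed known reference value

# Hypothetical repeated readings from two instruments.
instrument_a = [50.1, 49.9, 50.0, 50.2, 49.8]  # accurate and precise
instrument_b = [52.1, 51.9, 52.0, 52.2, 51.8]  # equally precise, but biased

bias_a, spread_a = accuracy_and_precision(instrument_a, true_value)
bias_b, spread_b = accuracy_and_precision(instrument_b, true_value)
print(f"A: bias = {bias_a:+.3f}, spread = {spread_a:.3f}")
print(f"B: bias = {bias_b:+.3f}, spread = {spread_b:.3f}")
```

Instrument B shows that a precise instrument is not necessarily an accurate one: its readings agree with each other but are offset from the true value.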

Purpose of Measurement
- The purposes of measurement include:
1) to establish the system characteristics
2) to establish and learn from past records and evidence
3) to compare the measurement with a standard reference
4) to form new or truthful knowledge
5) to be used for predicting phenomena or conditions

Measurement Process
- The measurement process can generally be described as a sequence of 5 operations:

a) Designing an efficient measurement set-up
b) Intelligent operation or use of the measuring instrument
c) Recording of the information/data in a manner that is clear and complete
d) Estimating the accuracy of measurement and eliminating sources of error
e) Reporting of the measurement made

Measurable Variables - Theoretically, we can measure almost everything either qualitatively or quantitatively. - Some common measures frequently encountered may be classified as:

Chemical measures Physical measures Biological measures Behavioral measures Cognitive measures Affective measures Economic measures

Manipulative and Non-Manipulative Variables


A manipulative variable is an independent variable under the control of the researcher, as in experimental research.

A non-manipulative variable is an independent variable not under the direct control of the researcher.
- there are two types of non-manipulative variables, namely selected variables and natural treatment (or natural exposure) variables

Sources of Error in Measurement


1) Human Error or Gross Error
- cannot be treated mathematically
- can be avoided only by taking care during work
- preferably take more than one reading
2) Systematic Error
- can be divided into two: i) instrumental error ii) environmental error

3) Random Error
- due to chance; occurs even when all systematic errors have been accounted for
- increasing the number of readings or samples can reduce this error
- statistical means are used to obtain the best approximation of the true value of the quantity under measurement
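The effect of taking more readings can be sketched with simulated data. The block assumes purely random, normally distributed error around a known true value. The seed, the true value and the error size are arbitrary choices for illustration. The standard error of the mean, s / sqrt(n), shrinks as the number of readings n grows.

```python
import random
from statistics import mean, stdev


def standard_error(readings):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(readings) / len(readings) ** 0.5


random.seed(42)       # fixed seed so the sketch is repeatable
true_value = 10.0     # assumed true value of the quantity
error_sd = 0.5        # assumed size of the random error


def take_readings(n):
    """Simulate n readings affected by random error only."""
    return [random.gauss(true_value, error_sd) for _ in range(n)]


few = take_readings(10)
many = take_readings(1000)

print(f"n = 10:   mean = {mean(few):.3f}, standard error = {standard_error(few):.3f}")
print(f"n = 1000: mean = {mean(many):.3f}, standard error = {standard_error(many):.3f}")
```

Note that averaging reduces random error only. A systematic error would shift every reading by the same amount and would survive the averaging.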

Overview of data processing


1) Recording of raw data
- Interviews
- Census-based
- Lab-based
- Field-based
- Modeling-based
- Participatory-based

2) Editing & Cleaning

3) Coding*
- Developing a code book
- Pre-testing the code book
- Coding the data
- Verifying the coded data

4) Analysis
- Engineering software
- Statistical software
- Empirical or mathematical models
- Characteristics of a unit or a system
- Establish cause-effect relationships

5) Interpretation, Conclusions, Recommendations

*qualitative studies

Data Presentation
- Raw data should not be presented in the main report; use appendices.
- Tables in the main report should consist of analyzed data, NOT raw data.
- Typical results (graphs) should be provided in the main report.
- Characteristics of each sample should be summarized in a table.
- The same information should be presented in either a table or a graph, NOT both.
- Each table and graph that is provided in the main report must be discussed.

Data Presentation

- Tables and figures must be consistent.
- All figures and tables are to be accompanied by text.
- Figures and tables should come immediately after the text that refers to them.
- Label all figures and tables accordingly (refer to the university guidelines for style and format).
- Only typical values or cross-sections are shown. Raw data are to be placed in the appendices.
- Use proper graphs to present your results effectively: bar charts (histograms), box plots (rarely used), scatter plots, line plots and pie charts (avoid if you can!). Cone and cylinder charts: NEVER!
- Provide equations wherever appropriate.

Data Presentation
Tables
- Tables must have: title, column headings, body, supplementary notes or footnotes
Graphs
- Types of graphs: histogram, bar chart, pie chart, line diagram or trend curve, scattergram
Data presentation

Ensure that units are given in Tables and Figures.

Pictures Example sem

The construction of good charts and graphs for data presentation

Charts are used to convey ideas about the data that are not apparent when the data are displayed in a table or in text (remember: efficient display of meaningful and unambiguous data). Charts allow the reader a quick grasp of what the data mean. Every dot on a scatter plot, every point on a time series line, every bar on a bar chart represents a number (actually, in the case of a scatter plot, two numbers). It is important to tell the reader what each of those numbers represents.

Basic components of a Chart

- the labeling that defines the data: the title, axis titles, scales and data labels, legends defining separate data series, and notes (often, to indicate the data source),
- scales defining the range of the Y (and sometimes the X) axis, and
- the graphical elements that represent the data: the bars in bar charts, the lines in time series plots, the points in scatter plots, or the slices of a pie chart.

Continue

Titles. The title should be used to define the data series without imposing a data interpretation on the reader. Often, the units of measurement are specified at the end of the title after a colon, or in parentheses in a subtitle.

Axis titles. Axis titles should be brief and should not be used at all if the information merely repeats what is clear from the title and axis labels. Repeating a phrase that already appears there is redundant and unnecessary.

Axis scale and data labels. The value or magnitude of the main graphical elements of the chart is defined by the axis scale, by individual data labels, or by both. Avoid using too many numbers to define the data points. A chart that labels the value of each individual data point does not need labeling on the Y-axis. If it seems necessary to label every value in a chart, consider that a table is probably a more efficient way of presenting the data.

Legends. Legends are used in charts with more than one data series. They should not be placed outside the chart in a way that reduces the plot area, the amount of space given to represent the data. The legend can be placed inside the chart (although some think that detracts from the main graphical elements) or at the bottom of the chart.

Gridlines. If used at all, gridlines should use as little ink as possible so as not to overwhelm the main graphical elements of the chart.

The source. Specifying the source of the data is important for proper academic citation, but it can also give knowledgeable readers, who are often familiar with common data sources, important insights into the reliability and validity of the data.

Other chart elements. The amount of ink given over to non-data elements that are not necessary for defining the meaning and values of the data should be kept to an absolute minimum. Plot area borders and plot area shading are unnecessary. Keep the shading of the graphical elements simple and always avoid unnecessary 3-D effects.

Avoid the following

Using pie charts. Pie charts usually contain more ink than is necessary to display the data, and the slices give a poor representation of the magnitude of the data points. Forcing the reader to draw comparisons across two pie charts is also a bad idea. 3-D pie charts are even worse, as they add visual distortion to the data.

Data Presentation
Examples of Good/Bad Tables and Figures To be discussed in class

How best can you present your data?

Rotated bar chart with two data series

Stacked bar chart


Should be used with caution, as it is difficult to visualize the difference in the size of the components. The same goes for stacked line and area charts!

The use of stacked bar chart to display the change in land use

[Figure: stacked bar chart with two Y-axes. X-axis: Period Measured (Year), 1947 to 2005. Y-axes: Land Use Change (%), 0% to 100%, and Measured Deposits (m3), 0 to 7,000,000. Series: Forest; Tree cultivation, orchards and tea; Agriculture experimental stations; Measured deposits; Grassland, scrub forest and shifting cultivation; Residential/estate buildings and associated areas; Water body; Market and mixed agriculture and floriculture; Mining; Cleared/open land.]

Avoid using 3-D bar charts


Revised chart without the 3-D effect

Really bad 3-D bar chart

Avoid placing legend at the side as it reduces the chart size

Time series plot with annotations


Rules for Time Series (Line) Charts Time is almost always displayed on the X-axis from left to right. Display as much data with as little ink as possible. Make sure the reader can clearly distinguish the lines for separate data series.

This is one of the most efficient means of displaying large amounts of data in ways that provide for meaningful analysis. The typical time series line chart is a scatterplot chart with time represented on the X-axis and lines connecting the data points.

Time series chart with second Y axis

Scatter plots
[Figure: Predicted against Measured Sediment Load. Scatter plot with X-axis: measured load (kg/s) and Y-axis: predicted load (kg/s), both with ticks from 0.1 to 100. Annotations point out the scales, the gridlines and the difference ratio.]

Rules for Scatterplots
- Fully define the variables with the axis titles.
- Place the independent variable (the one that causes the other) on the X-axis and the dependent variable (the one that may be caused by the other) on the Y-axis.
- Scale the axes to maximize the use of the plot area for displaying the data points.
- You may add data labels to identify the cases.

Bar chart / Histogram

[Figure: two annotated bar charts with labels in Malay. The first plots Keluasan Lantai (m2), i.e. floor area, by category, with the series Purata (average), Minimum and Maksimum (maximum) and a Y-axis scale from 0 to 4500. The second plots Indeks (m2/pelajar), i.e. an index in m2 per student, by institution, with a Y-axis scale from 0 to 14 and data labels on the bars. Annotations point out the X axis, Y axis, scales, gridlines, legend and data labels. The category labels were garbled in extraction.]

Data Analysis

Data analysis

- Pay attention to the units used. (Students often make mistakes here!)
- You may use statistical tools in your analysis.
- When evaluating others' equations, it is important to refer to the paper written by the original authors. Equations quoted by a third party may contain mistakes, and you will end up using the wrong equations!
- Do not discard data without first checking how influential the data are. You may not know until you have done some tests on this!
- If the study involves developing empirical equations, check the significance of the controlling variables. This can be done through statistical tests. Students may also review the underlying theories and fundamentals to confirm the factors that govern the dependent variables, although what the theory suggests may not necessarily hold true for your data.
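One simple way to check the influence of individual data points before deciding anything about them is a leave-one-out comparison, sketched below: the line is refitted with each point removed in turn, and the change in slope is recorded. This is only one of several possible checks, and the data are hypothetical. The function names are illustrative.

```python
def fit_slope(x, y):
    """Least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def leave_one_out_slopes(x, y):
    """Slope obtained when each observation is left out in turn."""
    slopes = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        slopes.append(fit_slope(x_rest, y_rest))
    return slopes


# Hypothetical data: the last point lies far from the trend of the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [2.0, 4.1, 5.9, 8.0, 10.1, 12.0]

full_slope = fit_slope(x, y)
loo_slopes = leave_one_out_slopes(x, y)
changes = [abs(s - full_slope) for s in loo_slopes]
most_influential = changes.index(max(changes))

print(f"slope with all data: {full_slope:.3f}")
for i, s in enumerate(loo_slopes):
    print(f"without point {i}: slope = {s:.3f} (change {changes[i]:.3f})")
print(f"most influential point: index {most_influential}")
```

A large change flags a point for closer inspection (units, transcription, test conditions). It does not by itself justify discarding the point.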

Continue..

- In modeling work (simulation, optimization, etc.), do not always believe the results generated by the model. Some engineering judgement is necessary.
- Explain the trends you observe in the figures you present. Do not explain what you did not see, as the examiners will be looking at the same figures with their own unaided eyes!
- Do not throw away your raw data. Keep them safe, so that if your computer files become corrupted you still have your raw data to refer to.

Data analysis statistically


Preliminary data analysis 1. Descriptive Statistics 2. Inferential Statistics 3. Predictive Statistics

Statistical data analysis techniques


Purpose of Research

1) Descriptive
- Mean
- Median
- Mode
- Minimum
- Maximum
- Variance
- Proportions
- Etc.

2) Inferential
- Parametric tests (hypothesis testing)
- Non-parametric tests

3) Predictive
- Forecasting methods, including multiple regression analysis
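The descriptive measures listed above can be computed with Python's standard `statistics` module, as in the following sketch. The data values and the threshold used for the proportion are hypothetical.

```python
from statistics import mean, median, mode, variance

# Hypothetical sample of eight observations.
data = [12, 15, 15, 18, 20, 22, 15, 30]

summary = {
    "mean": mean(data),
    "median": median(data),
    "mode": mode(data),
    "minimum": min(data),
    "maximum": max(data),
    "variance": variance(data),  # sample variance (divides by n - 1)
    # Proportion of observations above an arbitrary threshold of 20.
    "proportion_above_20": sum(1 for v in data if v > 20) / len(data),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```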

List of statistical tools

Statistical tests in data analysis


It is imperative to first categorize your variables as independent and dependent. Determine the number of independent and dependent variables in the study. You may treat intervening or nuisance variables as additional independent variables. Determine the level of measurement (nominal, ordinal or interval) applied to each relevant variable.

Examples

T-test Chi-square F-test
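As a minimal sketch of the first of these tests, the block below computes the t statistic for two independent samples using Welch's form, which does not assume equal variances. This variant is chosen here for illustration; the slides do not specify one. The two samples are hypothetical. The resulting statistic would then be compared with a critical value from t tables, or converted to a p-value with a statistics package.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical measurements from two groups.
group_a = [5.1, 4.9, 5.0, 5.2, 4.8]
group_b = [5.6, 5.4, 5.5, 5.7, 5.3]

t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.1f}")
```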

Sample of the data summary

Conclusion

Most research requires data and data analysis. Data acquisition is of utmost importance and considerable effort should be made to obtain or generate good data. Good data are data whose characteristics enable the research objectives to be met. Data of poor quality or undesirably low quantity will lead to unsatisfactory data analysis and vague results. The characteristics of the data, particularly their type, quantity, and how they were sampled, constrain the choice of data analysis techniques able to be used on the data. Use proper charts and graphs for effective presentation of your data.

Thank you
