Faculty of Civil Engineering, UiTM Shah Alam
Zakiahah@hotmail.com
PhD/Master Research Methodology Course, UiTM (20 Feb 2010)
Zakiah Ahmad
Content of presentation
Introduction
Administration of the data collection program
Data acquisition
Data analysis
Data presentation
The construction of good charts and graphs for data presentation
Statistical test in data analysis
Conclusion
Introduction
What data do we need? Why do we place great emphasis on quality and quantity of data? Why do research problems and objectives (or hypotheses) need to be carefully constructed?
Hypothesis
A hypothesis can be described as a tentative answer to a research question, or a provisional prediction.
A hypothesis should be: stated clearly, using appropriate terminology; testable; a statement of the relationship between variables; limited in scope.
Types of hypothesis
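When words such as "influences" or "affects" are used without stating a direction, the hypothesis is called a two-tailed (non-directional) hypothesis. When words such as "reduces", "increases", "lowers" or "raises" are used, the hypothesis states a direction and is called a one-tailed (directional) hypothesis. This can lead to easier data analysis.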
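To see how the direction of a hypothesis feeds into the analysis, the sketch below converts a test statistic into one-tailed and two-tailed p-values using Python's standard `statistics.NormalDist`. It assumes a large-sample z-test and an illustrative value z = 1.80 (not taken from any study); `p_values` is a hypothetical helper name.

```python
from statistics import NormalDist

def p_values(z: float) -> dict:
    """Return one-tailed (upper) and two-tailed p-values for a z statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Directional hypothesis (e.g. "increases"): only the upper tail counts.
    one_tailed = 1.0 - std_normal.cdf(z)
    # Non-directional hypothesis (e.g. "affects"): both tails count.
    two_tailed = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return {"one_tailed": one_tailed, "two_tailed": two_tailed}

result = p_values(1.80)
print(f"one-tailed p = {result['one_tailed']:.4f}")
print(f"two-tailed p = {result['two_tailed']:.4f}")
```

With z = 1.80 the one-tailed p-value (about 0.036) falls below 0.05 while the two-tailed value (about 0.072) does not, so the direction has to be stated before the data are examined rather than chosen afterwards.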
Administration of the data collection program is of utmost importance! You cannot rely too much on others to administer it. This is especially true when the study involves issuing questionnaires. Others may not know what level of accuracy or precision you require when sampling or making measurements.
Data acquisition
Identification of the data/variables to be measured. Sources of data and their reliability. Characteristics of the data in terms of quantity and quality. The need to refer back to the research problems and objectives (or hypotheses). Documentation of your data.
What data are needed? What characteristics must they have in terms of quantity and quality (data type, quality, quantity, resolution, precision, accuracy and coverage) to properly address the research objectives? Are the data sufficient? How would you assess their suitability? If the data are not available, how would you generate them?
What variables will be measured? How will measurements be made? What sampling scheme will be employed, and why? What logistical problems (e.g., accessibility) need to be considered? At what scale(s) will measurements be made? How will you ensure that you are measuring what you think you are measuring?
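One common answer to "what sampling scheme will be employed?" is stratified random sampling: the population is split into strata and a simple random sample is drawn within each, so that every stratum is represented. The sketch below is a minimal illustration assuming a hypothetical sampling frame of 30 river cross-sections grouped by reach; `stratified_sample` and the site labels are made up for the example.

```python
import random

def stratified_sample(units, stratum_of, n_per_stratum, seed=1):
    """Draw a simple random sample of n_per_stratum units from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced and documented
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = min(n_per_stratum, len(members))  # cannot take more units than exist
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical sampling frame: 30 cross-sections tagged by river reach.
reaches = ["upper"] * 10 + ["middle"] * 12 + ["lower"] * 8
sites = [(f"S{i:02d}", reach) for i, reach in enumerate(reaches)]
chosen = stratified_sample(sites, stratum_of=lambda s: s[1], n_per_stratum=3)
print(chosen)
```

A simple random sample of nine sites from the whole frame could, by chance, miss the lower reach entirely; stratifying rules that out.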
Executive summary
How does sample size constrain the effectiveness (e.g. power) of statistical tests? Are replicate observations needed? Is there a spatial dimension to your data and, if so, have you worked out what the distance between your samples should be? Have you over-sampled or under-sampled, and can this be remedied beforehand? Are the data "representative", and how do you know? Are the data "random", "stratified" or "nested", and does this matter? Does the type of data (ratio, interval, ordinal, discrete, nominal, closed, directional) have implications for data analysis?
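The link between sample size and power can be made concrete with the normal approximation for a two-sided one-sample z-test. The sketch below assumes a known standard deviation and a standardized effect size d (difference in means divided by sigma); the value d = 0.5 and the helper names `power_two_sided` and `n_required` are illustrative assumptions, and real studies may need a t-based or design-specific calculation instead.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_two_sided(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided one-sample z-test for standardized effect d."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    # Probability that the test statistic lands in either rejection region.
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

def n_required(effect_size: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Smallest n giving roughly the requested power (normal approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return ceil(((z_crit + z_power) / effect_size) ** 2)

for n in (10, 20, 40, 80):
    print(n, round(power_two_sided(0.5, n), 3))
print("n for 80% power at d = 0.5:", n_required(0.5))
```

Under these assumptions about 32 observations are needed to detect d = 0.5 with 80% power; with only 10 the test would miss such an effect more often than not.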
Experimental design
DATA
What is a Variable?
A variable can be defined as a characteristic, a measurable quantity, a trait, an attribute of a person, etc. It can take more than one value, i.e. it varies. Variables can be classified into: 1) qualitative variables and 2) quantitative variables.
1) Qualitative variables describe a characteristic, a recognition, a pattern, a preference, etc. 2) Quantitative variables represent a characteristic, a parameter, etc. numerically, over a range of values. They are normally of two kinds: a) discrete variables and b) continuous variables.
Discrete variables: normally whole numbers; decimals and fractions are excluded, e.g. number of cars per house.
iii) Categorical/Nominal: no hierarchical order; categories are mutually exclusive, e.g. color, race, religion, skin color. iv) Ordinal/Ranked: no common unit of measurement, but the classes can be ranked, e.g. income level, job position, opinion.
v) Interval: common unit of measurement but no true zero point, e.g. temperature (in °C), pH, turbidity. vi) Ratio: common unit of measurement and a true zero point, e.g. height, weight, density, pressure.
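The practical difference between interval and ratio scales is which arithmetic is meaningful. The sketch below uses temperature as the example: in °C (interval, arbitrary zero) differences are meaningful but ratios are not, whereas in kelvin (true zero) both are. The readings of 10 °C and 20 °C are illustrative.

```python
def celsius_to_kelvin(t_c: float) -> float:
    """Convert a Celsius (interval-scale) reading to kelvin (ratio scale)."""
    return t_c + 273.15

t_low_c, t_high_c = 10.0, 20.0

# Differences survive a shift of the zero point, so they are meaningful on an interval scale.
diff_c = t_high_c - t_low_c
diff_k = celsius_to_kelvin(t_high_c) - celsius_to_kelvin(t_low_c)

# Ratios do not: 20 degC is not "twice as hot" as 10 degC.
ratio_c = t_high_c / t_low_c
ratio_k = celsius_to_kelvin(t_high_c) / celsius_to_kelvin(t_low_c)

print(f"difference: {diff_c:.2f} degC vs {diff_k:.2f} K")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (kelvin)")
```

The difference is 10 in both units, but the ratio changes from 2.000 to about 1.035, which is why ratio statements (and statistics such as the coefficient of variation) are reserved for ratio-scale data.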
1) Independent variables: the variable that is manipulated or subjected to some form of treatment; some call these experimental variables.
2) Dependent variables: the variable that changes when the independent variables change, i.e. it depends on them.
Variables are quantities which vary from individual to individual. By contrast, parameters do not relate to actual measurements or attributes but to quantities defining a theoretical model.
The word parameter can also be related to its original mathematical meaning as the value(s) defining one of a family of curves. The slope and intercept of a line (more generally known as regression coefficients) are examples of the parameters defining a model.
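The distinction between variables and parameters shows up directly when a straight line is fitted: the measured x and y values are the variables, and the intercept a and slope b of y = a + bx are the model parameters estimated from them. A minimal ordinary least-squares sketch, with illustrative data and a hypothetical helper `fit_line`:

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of intercept a and slope b in y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx        # slope (regression coefficient)
    a = y_bar - b * x_bar  # intercept
    return a, b

# Illustrative measurements: these are the variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.9, 5.1, 7.0, 9.1, 10.9]
a, b = fit_line(x, y)  # these are the parameters of the model
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```

For these numbers the fit gives a ≈ 1.0 and b ≈ 2.0; a new set of measurements would change the variables and, through them, the estimated parameters.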
The method of measurement is as important as the measurement process itself if accurate and precise values are to be obtained.
There are two approaches to measurement:
a) Direct/primary comparison: comparing the measurement with primary standards such as dimensional standards (e.g. ISO, SI units). There are three classes of measurement unit, namely: i) the base units; ii) the supplementary units; iii) the derived units. (Since 1995 the former supplementary units, the radian and the steradian, have been classed as derived units.)
b) Indirect comparison: comparing the measurement through the use of a calibrated system. The calibration procedure establishes the correct output of the measurement.
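The simplest form of the calibration described in b) is a two-point calibration: the instrument is read against two reference standards and a straight line (gain and offset) maps raw readings to calibrated values. The sketch assumes a linear instrument response between the standards; the load-cell readings are hypothetical, and `two_point_calibration` is an illustrative name. Multi-point calibrations usually fit the line through several standards by least squares instead.

```python
def two_point_calibration(raw_low, raw_high, ref_low, ref_high):
    """Return a function mapping raw instrument readings to calibrated values.

    Assumes the instrument responds linearly between the two reference standards.
    """
    if raw_high == raw_low:
        raise ValueError("the two raw readings must differ")
    gain = (ref_high - ref_low) / (raw_high - raw_low)
    offset = ref_low - gain * raw_low
    return lambda raw: offset + gain * raw

# Hypothetical load cell: reads 0.12 mV under 0 kN and 4.12 mV under a 100 kN reference load.
to_kn = two_point_calibration(0.12, 4.12, 0.0, 100.0)
print(round(to_kn(2.12), 3))  # raw reading half-way between the standards
```

Here the gain is 25 kN/mV and the offset is -3 kN, so a raw reading of 2.12 mV corresponds to about 50 kN.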
Quality of Measurement
Purpose of Measurement
The purposes of measurement include: 1) to establish the characteristics of a system; 2) to establish, and learn from, past records and evidence; 3) to compare the measurement against a standard reference; 4) to form new or truthful knowledge; 5) to predict phenomena or conditions.
Measurement Process
Measurable variables: theoretically, we can measure almost everything, either qualitatively or quantitatively. Some commonly encountered measures may be classified as chemical, physical, biological, behavioral, cognitive, affective and economic measures.
Non-manipulative variables: a non-manipulative variable is an independent variable that is not under the direct control of the researcher. There are two types of non-manipulative variables, namely selected variables and natural treatment (or natural exposure) variables.
3) Random error: arises by chance and occurs even when all systematic errors have been accounted for. Increasing the number of readings or samples can reduce this error, and statistical means are used to obtain the best approximation of the true value of the quantity under measurement.
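Why more readings reduce random error can be shown with the standard error of the mean, sigma / sqrt(n): quadrupling the number of readings halves the random error in their average. The sketch simulates repeated readings with Gaussian noise; the true value 25.0, the spread 0.4 and the seed are hypothetical choices for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: random error in an average shrinks as 1/sqrt(n)."""
    return sigma / sqrt(n)

rng = random.Random(42)        # fixed seed: repeatable illustration
TRUE_VALUE, SIGMA = 25.0, 0.4  # hypothetical true value and random-error spread

for n in (4, 16, 64):
    readings = [rng.gauss(TRUE_VALUE, SIGMA) for _ in range(n)]
    print(f"n={n:3d}  mean={mean(readings):.3f}  s={stdev(readings):.3f}  "
          f"SE={standard_error(SIGMA, n):.3f}")
```

The sample standard deviation s stays near 0.4 whatever n is, because it describes the scatter of single readings; only the uncertainty of the mean (SE) shrinks. Averaging does nothing for systematic error, which is why that must be dealt with first.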
[Figure: data collection and analysis. Data collection: interviews; census-based; lab-based; field-based; modeling-based; participatory-based. Coding* (developing a code book, pre-testing the code book, coding the data). Analysis: engineering software, statistical software, empirical or mathematical models. Outcomes: characteristics of a unit or a system; establishing cause-effect relationships. *Qualitative studies.]
Data Presentation
Raw data should not be presented in the main report; use appendices. Tables in the main report should contain analyzed data, NOT raw data. Typical results (graphs) should be provided in the main report. The characteristics of each sample should be summarized in a table. The same information should be presented in either a table or a graph, NOT both. Each table and graph provided in the main report must be discussed.
Tables and figures must be consistent. All figures and tables are to be accompanied by text and should come immediately after the text that refers to them. Label all figures and tables accordingly (refer to the university guidelines for style and format). Show only typical values or cross-sections; place raw data in the appendices. Use proper graphs (bar charts, histograms, box plots (rarely used), scatter plots and line plots) to present your results effectively; avoid pie charts if you can, and NEVER use cone or cylinder charts. Provide equations wherever appropriate.
Tables: tables must have a title, column headings, a body, and supplementary notes or footnotes.
Graphs: types of graphs include the histogram, bar chart, pie chart, line diagram or trend curve, and scattergram.
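Unlike a bar chart of separate categories, a histogram groups a continuous variable into equal class intervals and plots how many observations fall in each. The sketch below computes those frequencies and prints a text histogram; the strength values, the 28 MPa lower bound and the 2 MPa class width are hypothetical, and `histogram_counts` is an illustrative helper.

```python
def histogram_counts(data, lower, width, n_bins):
    """Count observations in equal-width bins [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        k = int((x - lower) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
        elif x == lower + n_bins * width:  # include the top edge in the last bin
            counts[-1] += 1
    return counts

strengths = [28.4, 30.1, 31.7, 29.9, 33.2, 35.0, 31.1, 30.6, 32.4, 34.8]  # hypothetical MPa values
counts = histogram_counts(strengths, lower=28.0, width=2.0, n_bins=4)
for k, c in enumerate(counts):
    lo = 28.0 + k * 2.0
    print(f"{lo:4.1f}-{lo + 2.0:4.1f} MPa | {'#' * c}")
```

Values outside the chosen range are silently dropped here, so the bin range should be chosen to cover all the data.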
Charts are used to convey ideas about the data that are not apparent when the data are displayed in a table or text (remember: efficient display of meaningful and unambiguous data). Charts allow the reader to grasp quickly what the data mean. Every dot on a scatter plot, every point on a time-series line and every bar on a bar chart represents a number (in the case of a scatter plot, actually two numbers). It is important to tell the reader what each of those numbers represents.
A chart therefore has two kinds of elements: the labeling that defines the data (the title, axis titles, scales and data labels, legends defining separate data series, notes (often indicating the data source), and scales defining the range of the Y and sometimes the X axis), and the graphical elements that represent the data (the bars in bar charts, the lines in time-series plots, the points in scatter plots, or the slices of a pie chart).
Titles. The title should define the data series without imposing an interpretation of the data on the reader. Often the units of measurement are given at the end of the title after a colon, or in parentheses in a subtitle. Axis titles. Axis titles should be brief, and should be omitted altogether if they merely repeat what is clear from the title and axis labels; repeating that information is redundant. Axis scale and data labels. The value or magnitude of the main graphical elements is defined by the axis scale, the individual data labels, or both. Avoid using too many numbers to define the data points: a chart that labels the value of each individual data point does not need labels on the y axis. If it seems necessary to label every value in a chart, a table is probably a more efficient way of presenting the data.
Legends. Legends are used in charts with more than one data series. They should not be placed outside the chart in a way that reduces the plot area, i.e. the space given to representing the data. The legend can be placed inside the chart (although some think this detracts from the main graphical elements) or at the bottom of the chart. Gridlines. If used at all, gridlines should use as little ink as possible so as not to overwhelm the main graphical elements. The source. Specifying the source of the data is important for proper academic citation, and it can also give knowledgeable readers, who are often familiar with common data sources, important insights into the reliability and validity of the data. Other chart elements. The ink given over to non-data elements that are not needed to define the meaning and values of the data should be kept to an absolute minimum. Plot-area borders and plot-area shading are unnecessary. Keep the shading of the graphical elements simple and always avoid unnecessary 3-D effects.
Using pie charts. Pie charts usually contain more ink than is necessary to display the data, and the slices give a poor representation of the magnitude of the data points. Forcing the reader to compare across two pie charts is also a bad idea. 3-D pie charts are even worse, as they add visual distortion to the data.
Examples of good and bad tables and figures: to be discussed in class.
The use of a stacked bar chart to display the change in land use
[Figure: stacked bar chart of land use (0 to 100% of area) for each period measured (year), 1947 to 2005, with measured deposits (m³, 0 to 7,000,000) on a second axis. Series: forest; tree cultivation, orchards and tea; agriculture experimental stations; grassland, scrub forest and shifting cultivation; residential/estate buildings and associated areas; water body; market and mixed agriculture and floriculture; mining; cleared/open land; measured deposits.]
The time-series line chart is one of the most efficient means of displaying large amounts of data in ways that allow meaningful analysis. A typical time-series line chart is a scatter plot with time on the X-axis and lines connecting the data points.
Scatter plots
[Figure: scatter plot of predicted against measured sediment load (kg/s) on logarithmic axes from 0.1 to 100; annotations point out the scales, the gridlines and the difference ratio.]
Rules for scatter plots: fully define the variables with the axis titles. Place the independent variable (the one that may cause the other) on the X-axis and the dependent variable (the one that may be caused by the other) on the Y-axis. Scale the axes to maximize the use of the plot area for displaying the data points. You may add data labels to identify the cases.
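For data spanning several orders of magnitude, such as sediment loads, "scale the axes to maximize the use of the plot area" usually means choosing log-axis limits at the powers of ten that just enclose the data. A minimal sketch, assuming positive values and hypothetical measured and predicted loads; `log_axis_limits` is an illustrative helper, not part of any plotting library.

```python
import math

def log_axis_limits(values):
    """Return (low, high) powers of ten that just enclose all positive values."""
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("a log axis needs at least one positive value")
    low = 10.0 ** math.floor(math.log10(min(positive)))
    high = 10.0 ** math.ceil(math.log10(max(positive)))
    return low, high

measured = [0.35, 1.2, 4.8, 12.0, 45.0]   # hypothetical sediment loads (kg/s)
predicted = [0.52, 0.9, 6.1, 9.5, 60.0]
print(log_axis_limits(measured + predicted))
```

Computing one pair of limits from both series and applying it to both axes keeps the plot square, so the line of perfect agreement (predicted = measured) runs along the diagonal; that is a design choice for predicted-against-measured plots, not a general rule.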
[Figure: annotated bar chart of floor area (m², 0 to 4,500) by civil engineering discipline (labels in Malay, e.g. structures, concrete and materials, geotechnics, hydraulics, traffic and highways, environment), showing the average, minimum and maximum; annotations point out the X axis, Y axis, scales, gridlines, legend and data labels.]
[Figure: bar chart of the floor-area index (m²/student, 0 to 14) for each university, with data labels.]
Data Analysis
Pay attention to the units used (students often make mistakes here!). You may use statistical tools in your analysis. When evaluating equations developed by others, refer to the original paper written by the respective authors, because secondary sources may well contain mistakes and you could end up using the wrong equations! Do not discard data without first checking how influential they are; you may not know until you have done some tests. If the study involves developing empirical equations, check the significance of the controlling variables (this can be done through statistical tests). You may also review the underlying theories and fundamentals to confirm the factors that govern the dependent variables, although these may not necessarily hold true.
In modeling work (simulation, optimization, etc.), do not blindly trust the results generated by the model; engineering judgement is necessary. Explain the trends you observe in the figures you present. Do not describe what cannot be seen, as the examiners will be looking at the same figures with their own unaided eyes! Do not throw away your raw data. Keep them safe so that, if your computer becomes corrupted, you still have your raw data to refer to.
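The advice not to discard data without checking their influence can be put into practice with an influence measure. One widely used choice, swapped in here as an example rather than a prescribed method, is Cook's distance for a fitted regression. The sketch computes it for a simple straight-line fit; the data are hypothetical, and the 4/n cut-off is only a common rule of thumb for "look at this point", not a test for deletion.

```python
def cooks_distance(xs, ys):
    """Cook's distance for each point of a simple linear regression y = a + b*x."""
    n = len(xs)
    p = 2  # number of fitted parameters: intercept and slope
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    mse = sum(e * e for e in residuals) / (n - p)
    leverage = [1 / n + (x - x_bar) ** 2 / s_xx for x in xs]
    return [e * e / (p * mse) * h / (1 - h) ** 2 for e, h in zip(residuals, leverage)]

x = [1, 2, 3, 4, 5, 10]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 5.0]
d = cooks_distance(x, y)
for xi, di in zip(x, d):
    flag = "  <- investigate before discarding" if di > 4 / len(x) else ""
    print(f"x={xi:>2}  D={di:.2f}{flag}")
```

Only the point at x = 10 is flagged: it sits far from the other x values (high leverage) and well off the line. Whether it is an error or a genuine effect still has to be decided from the measurement record, not from the statistic alone.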
Descriptive statistics: mode, minimum, maximum, variance, proportions, etc.
Inferential statistics
Predictive statistics: forecasting methods, including multiple regression analysis
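The descriptive statistics listed above are available in Python's standard `statistics` module or as built-ins. A minimal sketch, assuming hypothetical slump-test results and a hypothetical pass/fail variable; `describe` and `proportions` are illustrative helper names.

```python
from statistics import mode, variance

def describe(values):
    """Basic descriptive statistics of a numeric sample."""
    return {
        "n": len(values),
        "mode": mode(values),
        "minimum": min(values),
        "maximum": max(values),
        "variance": variance(values),  # sample variance (n - 1 in the denominator)
    }

def proportions(labels):
    """Share of each category of a nominal variable."""
    total = len(labels)
    return {k: labels.count(k) / total for k in sorted(set(labels))}

slump_mm = [75, 80, 80, 90, 85, 80, 70]  # hypothetical slump test results (mm)
print(describe(slump_mm))
print(proportions(["pass", "pass", "fail", "pass"]))
```

`statistics.variance` is the sample variance; `statistics.pvariance` gives the population version if the data are the whole population.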
It is imperative to first categorize your variables as independent or dependent. Determine the number of independent and dependent variables in the study. You may treat intervening or nuisance variables as additional independent variables. Determine the level of measurement (nominal, ordinal or interval) applied to each relevant variable.
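Once the variables are categorized and their levels of measurement known, the choice of test largely follows. The sketch below encodes a few simplified textbook rules of thumb for one dependent and one independent variable; `suggest_test` is a hypothetical helper, "interval" is assumed to cover ratio data too, and a real choice must also consider distributional assumptions, paired versus independent samples, and the number of variables.

```python
def suggest_test(dv_level: str, iv_level: str, n_groups: int = 2) -> str:
    """Suggest a common bivariate test (simplified rules of thumb).

    dv_level / iv_level: "nominal", "ordinal" or "interval" (interval covers ratio).
    n_groups: number of groups when the independent variable is nominal.
    """
    levels = {"nominal", "ordinal", "interval"}
    if dv_level not in levels or iv_level not in levels:
        raise ValueError(f"levels must be one of {sorted(levels)}")
    if dv_level == "nominal":
        return "chi-square test of independence" if iv_level == "nominal" else "logistic regression"
    if iv_level == "nominal":
        if dv_level == "interval":
            return "independent-samples t-test" if n_groups == 2 else "one-way ANOVA"
        return "Mann-Whitney U test" if n_groups == 2 else "Kruskal-Wallis test"
    if dv_level == "interval" and iv_level == "interval":
        return "Pearson correlation / linear regression"
    return "Spearman rank correlation"

print(suggest_test("interval", "nominal", n_groups=3))  # one-way ANOVA
print(suggest_test("ordinal", "nominal"))               # Mann-Whitney U test
print(suggest_test("nominal", "nominal"))               # chi-square test of independence
```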
Examples
Conclusion
Most research requires data and data analysis. Data acquisition is of utmost importance, and considerable effort should be made to obtain or generate good data. Good data are data whose characteristics enable the research objectives to be met. Data of poor quality or undesirably low quantity will lead to unsatisfactory data analysis and vague results. The characteristics of the data, particularly their type, quantity and how they were sampled, constrain the choice of data analysis techniques that can be applied to them. Use proper charts and graphs to present your data effectively.
Thank you