
Course B1129 Statistics for Management, Sem-1, Assessment 1

Question 1.

(a) What is the difference between a qualitative and a quantitative variable? [5 Marks]

(b) A town has 15 neighbourhoods. If you interviewed everyone living in one particular neighbourhood, would you be interviewing a population or a sample from the town? Would this be a random sample? If you had a list of everyone living in the town, called a frame, and you randomly selected 100 people from all neighbourhoods, would this be a random sample? [5 Marks]

Answer 1:

1 A) Answer: Qualitative data deals with meanings, while quantitative data deals with numbers. Qualitative data describes properties or characteristics that are used to identify things. Quantitative data describes things in terms of quantity, using a numerical figure accompanied by a unit of measurement.

By definition, something that is qualitative concerns or describes a quality. A qualitative variable is descriptive, and qualitative variables are sometimes referred to as categorical. The variable may be the colours in the light spectrum or a comparison between red and green grapes. Qualitative variables can influence the outcome of an experiment or research because they can influence other factors or parameters. Qualitative variables are frequently used in social research. Qualitative research is considered to be inductive.

By definition, something that is quantitative can be expressed as a quantity or number. Quantitative variables can be measured and are numerical. A quantitative variable can be a percentage of something, a number of units or any other measurement. Temperature is a quantitative variable, expressed as a number of degrees. Speed, area, population, voltage and time are all examples of quantitative variables that can be measured. Quantitative research is most often considered to be deductive in nature.

An example of quantitative variables in an experiment would be testing the change in speed of a turntable as additional weight is applied. The turntable itself is the controlled variable: the experimenter will use only one. The independent quantitative variable is the amount of weight applied for each measurement. The dependent quantitative variable is the resulting speed that is measured.

An example of a qualitative variable in testing would be the drying time required for red and green grapes at a constant temperature. The outcome, time, is measured and is therefore quantitative. The controlled variable, temperature, is also quantitative. The independent variable is qualitative: the difference between red and green grapes. In this particular example the weight of each grape, a quantitative variable, would also need to be kept consistent (controlled).

1 B) Answer: A town has 15 neighbourhoods. If we interviewed everyone living in one particular neighbourhood, we would be interviewing a sample from the town, not the population, because the population is everyone living in the town. This would not be a random sample of the town: the residents of the other 14 neighbourhoods have no chance of being selected. If we had a list of everyone living in the town, called a frame, and we randomly selected 100 people from all neighbourhoods, this would be a random sample.

Explanation: When collecting data on a large group of people (called a "population"), you might want to minimize the impact that the survey will have on the group that you are surveying. It is often not necessary to survey the entire population. Instead, you can select a random sample of people from the population and survey just them. You can then draw conclusions about how the entire population would respond, based on the responses from this randomly selected group of people. This is exactly what political pollsters do: they ask a group of people a list of questions and, based on the results, they draw conclusions about the population as a whole, with those often-heard disclaimers of "plus or minus 5%."
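Drawing a random sample from a frame can be sketched in a few lines of Python. This is only an illustration: the frame below (15 neighbourhoods with 200 residents each) and the function name draw_random_sample are made up for the example and do not come from the question.

```python
import random


def draw_random_sample(frame, sample_size, seed=None):
    """Return a simple random sample (without replacement) from the frame."""
    rng = random.Random(seed)
    return rng.sample(frame, sample_size)


# Hypothetical frame: (neighbourhood, resident) pairs for the whole town.
frame = [(n, r) for n in range(1, 16) for r in range(1, 201)]

# Every person on the list has the same chance of being selected.
sample = draw_random_sample(frame, 100, seed=1)
neighbourhoods_covered = {n for n, _ in sample}
print(len(sample), "people drawn from", len(neighbourhoods_covered), "neighbourhoods")
```

Interviewing everyone in a single neighbourhood would instead correspond to keeping only the pairs with one fixed neighbourhood number, which gives the rest of the frame no chance of selection.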

Question 2.

a) Explain the steps involved in planning of a statistical survey?

b) What are the merits and demerits of Direct personal observation and Indirect Oral Interview?

Answer 2a) The accuracy of data obtained in a survey depends upon the care exercised in planning. A properly planned investigation can lead to the best result with the least cost and time. The steps are defined below.

1) The nature of the problem to be investigated should be properly defined in an unambiguous manner.

2) The objective of the investigation should be stated at the outset. The objective could be to:
- Obtain certain estimates.
- Establish a theory.
- Verify an existing statement.
- Find a relationship between characteristics.

3) The scope of the investigation has to be made clear. The scope of investigation refers to the area to be covered, identification of units to be studied, nature of characteristics to be observed, accuracy of measurement, analytical methods, time, cost and other resources required.

4) Whether to use data collected from a primary or a secondary source should be determined in advance.

5) The organization of the investigation is the final step in the process. It encompasses the determination of the number of investigators required, their training, the supervision work needed and the funds required.

Answer 2b) When the investigator collects the data by having direct contact with the units of investigation, the method is called direct personal observation. The direct personal method is suitable where the scope of investigation is narrow, the investigation is confidential and requires the personal attention of the investigator, and accuracy of data is important. However, direct personal observation also has some drawbacks.

Merits and demerits of direct personal observation:

Merits
1) We get the original data, which is more accurate and reliable.
2) Satisfactory information can be extracted by the investigator.
3) Data is homogeneous and comparable.
4) Additional information can be gathered.
5) Misinterpretation of questions can be avoided.

Demerits
1) This method consumes more cost.
2) This method consumes more time.
3) This method cannot be used when the scope of the work is wide.
4) Most of the data collected by this method is maintained as confidential. Hence, there is a chance of leakage of data.

Question 3A) Draw ogives from the following data and measure the median value. Verify it by actual calculation.

Central size    Frequency
5               5
15              11
25              21
35              16
45              10

Answer 3

[Figure: chart of the frequency data above (series "Series1"; data labels 5, 11, 21, 16, 10; horizontal axis 0 to 50; vertical axis 0 to 70).]
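The "actual calculation" asked for in the question can be sketched as follows. The sketch assumes classes of width 10 centred on the given central sizes (0-10, 10-20, ..., 40-50), which is inferred from the midpoints rather than stated in the question, and it uses the usual grouped-data interpolation formula: median = L + ((N/2 - cf) / f) × h. The function names are illustrative.

```python
def less_than_ogive(midpoints, frequencies, width):
    """Return (upper class boundary, cumulative frequency) points."""
    points, cumulative = [], 0
    for mid, freq in zip(midpoints, frequencies):
        cumulative += freq
        points.append((mid + width / 2, cumulative))
    return points


def grouped_median(midpoints, frequencies, width):
    """Interpolated median: L + ((N/2 - cf) / f) * h."""
    half = sum(frequencies) / 2
    cumulative = 0
    for mid, freq in zip(midpoints, frequencies):
        if freq > 0 and cumulative + freq >= half:
            lower = mid - width / 2  # lower boundary L of the median class
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("frequencies must contain a positive value")


midpoints = [5, 15, 25, 35, 45]
frequencies = [5, 11, 21, 16, 10]

ogive = less_than_ogive(midpoints, frequencies, 10)
median = grouped_median(midpoints, frequencies, 10)
print(ogive)             # cumulative frequencies 5, 16, 37, 53, 63
print(round(median, 2))  # 27.38
```

Here N = 63, so N/2 = 31.5 falls in the class 20-30 (cumulative frequency before it is 16, class frequency 21), giving 20 + (15.5 / 21) × 10 ≈ 27.38. On the ogive, this is the value on the horizontal axis where the curve reaches a cumulative frequency of 31.5.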

4. a) What is the main difference between correlation analysis and regression analysis? b) In a multiple regression model with 12 independent variables, what are the degrees of freedom for error? Explain?

Answer 4 a) Correlation analysis attempts to study the relationship between two variables x and y. Regression analysis attempts to predict the average y for a given x.

Correlation analysis deals with the following:
- Measuring the relationship between the variables.
- Testing the relationship for its significance.
- Giving a confidence interval for the population correlation measure.

Regression analysis deals with the following:
- Estimating the values of the dependent variable from the values of the independent variables.
- Getting a measure of the error involved while using the regression line as a basis for estimation.
- The regression coefficients can be used to calculate the correlation coefficient.

Correlation quantifies the degree to which two variables are related. Correlation does not find a best-fit line (that is regression). You are simply computing a correlation coefficient (r) that tells you how much one variable tends to change when the other one does.

With correlation you don't have to think about cause and effect. You simply quantify how well two variables relate to each other. With regression, you do have to think about cause and effect, as the regression line is determined as the best way to predict Y from X.

With correlation, it doesn't matter which of the two variables you call "X" and which you call "Y". You'll get the same correlation coefficient if you swap the two. With linear regression, the decision of which variable you call "X" and which you call "Y" matters a lot, as you'll get a different best-fit line if you swap the two. The line that best predicts Y from X is not the same as the line that predicts X from Y.

Correlation is almost always used when you measure both variables. It is rarely appropriate when one variable is something you experimentally manipulate. With linear regression, the X variable is often something you experimentally manipulate (time, concentration...) and the Y variable is something you measure.
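The point about swapping X and Y can be checked numerically. The following is a minimal sketch with made-up data; the helper names pearson_r and slope are illustrative, and the formulas are the standard ones for Pearson's r and the least-squares slope.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def slope(x, y):
    """Least-squares slope of the line that predicts y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

r = pearson_r(x, y)
b_yx = slope(x, y)  # regression of y on x
b_xy = slope(y, x)  # regression of x on y

print(r, pearson_r(y, x))  # the same value either way
print(b_yx, 1 / b_xy)      # two different lines in the same x-y plane
print(b_yx * b_xy, r ** 2) # product of the two slopes equals r squared
```

The last line also illustrates how the regression coefficients relate to the correlation coefficient: their product is r squared.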

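Answer 4 b) The degrees of freedom for error in a multiple regression equal n - k - 1, where n is the number of data points and k is the number of independent variables. One degree of freedom is used up by the intercept and one by each of the k regression coefficients. The more variables you add, the more you erode your ability to test the model.

With k = 12 independent variables, the degrees of freedom for error are n - 12 - 1 = n - 13.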
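As a small worked check of the n - k - 1 rule, the sketch below uses a hypothetical sample size of 50, since the question does not give n. The function name is illustrative.

```python
def error_degrees_of_freedom(n_observations, n_predictors):
    """Error degrees of freedom for a regression with an intercept: n - k - 1."""
    df = n_observations - n_predictors - 1
    if df <= 0:
        raise ValueError("need more observations than estimated parameters")
    return df


# Hypothetical n = 50 with the 12 independent variables from the question.
print(error_degrees_of_freedom(50, 12))  # 37
```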

Question 5. a) Discuss what is meant by quality control and quality improvement.

Answer 5 A) Quality control: Statistical quality control refers to the use of statistical methods in the monitoring and maintaining of the quality of products and services. One method, referred to as acceptance sampling, can be used when a decision must be made to accept or reject a group of parts or items based on the quality found in a sample. A second method, referred to as statistical process control, uses graphical...

Quality control refers to the systematic use of methods to ensure that a service or product conforms to a desired standard. Primary emphasis is placed on monitoring processes and/or outcomes. Quality improvement refers to the betterment or enhancement of a product or service. When enhancements are ongoing or occur repeatedly over time, the process is known as continuous quality improvement.

Quality improvement: Complete control and improvement of any process depends on accurate measurements at critical points within the process. To gain confidence, measurements must be taken often and at various points so that all the variations of the process are detected. The quantity of measurements accumulates over time, and simple tables or listings of these numbers are not enough to evaluate the process. Statistical tools are used to understand what the numbers mean. The numbers from measurements that represent something in common, rather than a scattering of unrelated numbers, are called a set. When measuring properties of the process that are different, for example, gradation, crush count, or chert count, each property requires a set of

Quality management is the act of overseeing all activities and tasks needed to maintain a desired level of excellence. This includes creating and implementing quality planning and assurance, as well as quality control and quality improvement. It is also referred to as total quality management (TQM). A major aspect of quality control is the establishment of well-defined controls. These controls help standardize both production and reactions to quality issues. Limiting room for error by specifying which production activities are to be completed by which personnel reduces the chance that employees will be involved in tasks for which they do not have adequate training.

Question 5 b) What are the limitations of quality control charts?

Answer 5 b) A control chart is a popular statistical tool for monitoring the quality of goods and services, and for detecting as early as possible when the process goes "out of control". Samples are taken from the process at regular time intervals and their quality is measured. Control charts are used to track the sample quality over time and detect any unusual behaviour.

A control chart is a statistical tool used in quality control to analyze and understand process variables, determine process capabilities, and monitor the effects of the variables on the difference between target and actual performance. Control charts indicate upper and lower control limits, and often include a central (average) line, to help detect trends in the plotted values. If all data points are within the control limits, variations in the values may be due to a common cause and the process is said to be "in control". If data points fall outside the control limits, variations may be due to a special cause and the process is said to be "out of control".

Traditional statistical tools are subject to certain constraints when they are applied to quality control in industries where the number of faults per working day is limited. An effective quality monitoring and analyzing tool is therefore needed to meet the specific requirements of these industrial sectors. Proposes a so-called Cause-classified Control Chart,
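The in-control / out-of-control rule described in this answer can be sketched as follows. This is a simplified illustration, not a full Shewhart procedure: it assumes limits at three sample standard deviations around the mean of a baseline period (a conventional choice that the answer itself does not specify), and the measurements and function names are made up.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower limit, centre line, upper limit) from baseline data."""
    centre = mean(baseline)
    spread = stdev(baseline)
    return centre - sigmas * spread, centre, centre + sigmas * spread


def out_of_control_points(samples, lower, upper):
    """Return (index, value) for every sample outside the control limits."""
    return [(i, v) for i, v in enumerate(samples) if v < lower or v > upper]


# Made-up measurements from a period when the process was assumed stable.
baseline = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.9]
lower, centre, upper = control_limits(baseline)

# New samples to monitor against those limits.
new_samples = [10.0, 10.3, 9.7, 11.2, 10.1]
flagged = out_of_control_points(new_samples, lower, upper)
print(round(lower, 3), round(centre, 3), round(upper, 3))
print(flagged)  # only the 11.2 reading falls outside the limits
```

Points inside the limits are treated as common-cause variation; the flagged point would prompt a search for a special cause.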

Types of charts

Chart | Process observation | Process observations relationships | Process observations type | Size of shift to detect
x̄ and R chart | Quality characteristic measurement within one subgroup | Independent | Variables | Large (≥ 1.5σ)
x̄ and s chart | Quality characteristic measurement within one subgroup | Independent | Variables | Large (≥ 1.5σ)
Shewhart individuals control chart (ImR chart or XmR chart) | Quality characteristic measurement for one observation | Independent | Variables | Large (≥ 1.5σ)
Three-way chart | Quality characteristic measurement within one subgroup | Independent | Variables | Large (≥ 1.5σ)
p-chart | Fraction nonconforming within one subgroup | Independent | Attributes | Large (≥ 1.5σ)
np-chart | Number nonconforming within one subgroup | Independent | Attributes | Large (≥ 1.5σ)
c-chart | Number of nonconformances within one subgroup | Independent | Attributes | Large (≥ 1.5σ)
u-chart | Nonconformances per unit within one subgroup | Independent | Attributes | Large (≥ 1.5σ)
EWMA chart | Exponentially weighted moving average of quality characteristic measurement within one subgroup | Independent | Attributes or variables | Small (< 1.5σ)
CUSUM chart | Cumulative sum of quality characteristic measurement within one subgroup | Independent | Attributes or variables | Small (< 1.5σ)
Time series model | Quality characteristic measurement within one subgroup | Autocorrelated | Attributes or variables | N/A
Regression control chart | Quality characteristic measurement within one subgroup | Dependent of process control variables | Variables | Large (≥ 1.5σ)
Real-time contrasts chart | Sliding window of quality characteristic measurement within one subgroup | Independent | Attributes or variables | Small (< 1.5σ)

Limitations of control charts: several authors have criticised the control chart on the following grounds.

1) It violates the likelihood principle. However, the principle is itself controversial, and supporters of control charts further argue that, in general, it is impossible to specify a likelihood function for a process not in statistical control, especially where knowledge about the cause system of the process is weak.

2) Some authors have criticised the use of average run lengths (ARLs) for comparing control chart performance, because the run length usually follows a geometric distribution, which has high variability.

3) Some authors have criticised that most control charts focus on numeric data. Nowadays, process data can be much more complex, e.g. non-Gaussian, a mix of numerical and categorical values, or with missing values.

4) Critics argue that control charts should not be used when their underlying assumptions are violated, such as when process data is neither normally distributed nor binomially (or Poisson) distributed. Such processes are not in control and should be improved before control charts are applied. Additionally, applying the charts in the presence of such deviations increases the type I and type II error rates of the control charts, and may make the chart of little practical use.
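The criticism of average run lengths in point 2 can be made concrete with a short calculation. The sketch below assumes independent, normally distributed observations and three-sigma limits, neither of which is stated in the answer; under those assumptions the number of samples until a false alarm is geometric, so its standard deviation is almost as large as its mean.

```python
from statistics import NormalDist


def in_control_run_length(sigmas=3.0):
    """Return (false-alarm probability, ARL, run-length standard deviation)."""
    # Probability that one in-control point falls outside +/- sigmas limits.
    p = 2 * (1 - NormalDist().cdf(sigmas))
    arl = 1 / p                 # mean of a geometric distribution
    sd = (1 - p) ** 0.5 / p     # standard deviation of the same distribution
    return p, arl, sd


p, arl, sd = in_control_run_length()
print(round(p, 5), round(arl, 1), round(sd, 1))
```

With three-sigma limits the ARL is roughly 370 samples, but the standard deviation of the run length is of nearly the same size, which is why a single average says little about how any one chart will behave.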

6. a) Suggest a more suitable average in each of the following cases: (i) Average size of ready-made garments. (ii) Average marks of a student.

b) State the nature of symmetry in the following cases: (i) When median is greater than mean, and (ii) When mean is greater than median.

Answer 6 a) (i) Average size of ready-made garments: mode. (ii) Average marks of a student: arithmetic mean.

Answer 6 B) (i) When the median is greater than the mean, the distribution is skewed to the left (negatively skewed). (ii) When the mean is greater than the median, the distribution is skewed to the right (positively skewed). The examples that follow illustrate this.

Consider the following data set: 4 ; 5 ; 6 ; 6 ; 6 ; 7 ; 7 ; 7 ; 7 ; 7 ; 7 ; 8 ; 8 ; 8 ; 9 ; 10 This data produces the histogram shown below. Each interval has width one and each value is located in the middle of an interval.

The histogram displays a symmetrical distribution of data. A distribution is symmetrical if a vertical line can be drawn at some point in the histogram such that the shapes to the left and the right of the vertical line are mirror images of each other. The mean, the median, and the mode are each 7 for these data. In a perfectly symmetrical distribution, the mean, the median, and the mode are often the same.

The histogram for the data 4 ; 5 ; 6 ; 6 ; 6 ; 7 ; 7 ; 7 ; 7 ; 8 is not symmetrical. The right-hand side seems "chopped off" compared to the left side. The shape of the distribution is called skewed to the left because it is pulled out to the left.

The mean is 6.3, the median is 6.5, and the mode is 7. Notice that the mean is less than the median and they are both less than the mode. The mean and the median both reflect the skewing, but the mean more so.

The histogram for the data 6 ; 7 ; 7 ; 7 ; 7 ; 8 ; 8 ; 8 ; 9 ; 10 is also not symmetrical. It is skewed to the right.

The mean is 7.7, the median is 7.5, and the mode is 7. Notice that the mean is the largest statistic, while the mode is the smallest. Again, the mean reflects the skewing the most.

To summarize, generally if the distribution of data is skewed to the left, the mean is less than the median, which is less than the mode. If the distribution of data is skewed to the right, the mode is less than the median, which is less than the mean. Skewness and symmetry become important when we discuss probability distributions in later chapters.
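The three data sets above can be checked with Python's statistics module. The function describe_skew is an illustrative name, and the mean-versus-median comparison it applies is the general rule of thumb from the summary, not a formal skewness measure.

```python
from statistics import mean, median, mode

datasets = {
    "first": [4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10],
    "second": [4, 5, 6, 6, 6, 7, 7, 7, 7, 8],
    "third": [6, 7, 7, 7, 7, 8, 8, 8, 9, 10],
}


def describe_skew(values):
    """Classify skew by comparing the mean with the median (rule of thumb)."""
    m, md = mean(values), median(values)
    if m < md:
        return "skewed to the left"
    if m > md:
        return "skewed to the right"
    return "mean equals median"


for name, values in datasets.items():
    print(name, mean(values), median(values), mode(values), describe_skew(values))
```

The output reproduces the figures quoted in the text: 7, 7, 7 for the first data set, 6.3, 6.5, 7 for the second, and 7.7, 7.5, 7 for the third.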

Part II

1. (a) What are the characteristics of a good measure of central tendency?

(b) What are the uses of averages?

Answer 1 a) A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The mean (often called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode.

The mean, median and mode are all valid measures of central tendency but, under different conditions, some measures of central tendency become more appropriate to use than others. There can often be a "good" measure of central tendency with regard to the data you are analyzing, but there is no one "good" measure of central tendency. This is because whether you use the median, mean or mode will depend on the type of data you have, such as nominal or continuous data; whether your data has outliers and/or is skewed; and what you are trying to show from your data. So the good measure of central tendency depends on the type of data.

Type of variable               Good measure of central tendency
Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median
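The table above amounts to a simple lookup, which can be sketched as a small function. The function name and the string labels are illustrative choices, not part of the original answer.

```python
def recommended_measure(variable_type, skewed=False):
    """Return the measure of central tendency suggested by the table above."""
    kind = variable_type.lower()
    if kind == "nominal":
        return "mode"
    if kind == "ordinal":
        return "median"
    if kind in ("interval", "ratio"):
        return "median" if skewed else "mean"
    raise ValueError(f"unknown variable type: {variable_type}")


print(recommended_measure("nominal"))             # mode
print(recommended_measure("ratio"))               # mean
print(recommended_measure("ratio", skewed=True))  # median
```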

Answer 1 b) The uses of averages: The use or application of a particular average depends upon the purpose of the investigation. Some of the uses of different averages are as follows.

Arithmetic Mean: The arithmetic mean is considered an ideal average. It is frequently used in all aspects of life. It possesses many mathematical properties, and due to this it is of immense utility in further statistical analysis. In economic analysis the arithmetic mean is used extensively to calculate average production, average wage, average cost, per capita income, exports, imports, consumption, prices, etc. When different items of a series have different relative importance, the weighted arithmetic mean is used.

Geometric Mean: The geometric mean is important in a series having items of wide dispersion. It is used in the construction of index numbers. The averages of proportions, percentages and compound rates are computed by the geometric mean. The growth of population is measured with it, as population increases in geometric progression.

Harmonic Mean: The harmonic mean is applied in problems where small items must get more relative importance than large ones. It is useful in cases where time, speed, values given in quantities, rates and prices are involved. But in practice, it has little applicability.

Median and Partition Values: The median and partition values are positional measures of central tendency. They are mainly used in qualitative cases like honesty, intelligence, ability, etc. In distributions which are positively skewed, the median is a more suitable average. These are also suitable for problems of distribution of income, wealth, investment, etc.

Mode: The mode is also a positional average. Its applicability to daily problems is increasing. The mode is used to calculate the 'modal size of a collar', 'modal size of shoe', or 'modal size of ready-made garments', etc. It is also used in the sciences of Biology, Meteorology, Business and Industry.

Question 2. For each one of the following null hypotheses, determine if it is a left-tailed, a right-tailed, or a two-tailed test. [10 Marks]
a. 10 =
b. P 0.5
c. is at least 100.
d. -20
e. p is exactly 0.22

Answer 2) One-Tailed and Two-Tailed Tests

A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called a one-tailed test. For example, suppose the null hypothesis states that the mean is less than or equal to 10. The alternative hypothesis would be that the mean is greater than 10. The region of rejection would consist of a range of numbers located on the right side of the sampling distribution; that is, a set of numbers greater than 10.

A test of a statistical hypothesis, where the region of rejection is on both sides of the sampling distribution, is called a two-tailed test. For example, suppose the null hypothesis states that the mean is equal to 10. The alternative hypothesis would be that the mean is less than 10 or greater than 10. The region of rejection would consist of a range of numbers located on both sides of the sampling distribution; that is, the region of rejection would consist partly of numbers that were less than 10 and partly of numbers that were greater than 10.

a. 10 =
b. P 0.5
c. is at least 100.
d. -20
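The rule in this answer depends only on the relation stated in the null hypothesis, so it can be sketched as a lookup. The function name and the relation strings ("=", "<=", ">=") are illustrative; the left-tailed case is the mirror image of the right-tailed example in the text.

```python
def tail_of_test(null_relation):
    """Return the type of test implied by the relation in the null hypothesis."""
    tails = {
        "=": "two-tailed",     # alternative: parameter differs in either direction
        "<=": "right-tailed",  # alternative: parameter is greater
        ">=": "left-tailed",   # alternative: parameter is smaller
    }
    return tails[null_relation]


# "Is at least 100" is a ">=" null hypothesis; "is exactly 0.22" is an "=" one.
print(tail_of_test(">="))  # left-tailed
print(tail_of_test("="))   # two-tailed
print(tail_of_test("<="))  # right-tailed, as in the "mean <= 10" example
```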

Question 3) Why do we have to know the distribution of a test statistic?

Answer 3) A statistic is calculated from the sample. To begin with, we assume that the hypothesis about the population parameter is true. We compare the value of the statistic with the hypothetical value of the parameter. If the difference between them is small, the hypothesis is accepted, and if the difference between them is large, the hypothesis is rejected. A statistic on which the decision to accept or reject a hypothesis can be based is called a test statistic. It is important to remember that a test statistic does not prove the hypothesis to be correct, but it furnishes evidence against the hypothesis. Some of the test statistics to be discussed later are Z, t and Chi-Square.

Statistics is a diverse subject, and thus the mathematics required depends on the kind of statistics we are studying. A strong background in linear algebra is needed for most multivariate statistics, but is not necessary for introductory statistics. A background in calculus is useful no matter what branch of statistics is being studied, but is not required for most introductory statistics classes. At a bare minimum the student should have a grasp of the basic concepts taught in algebra and be comfortable with "moving things around" and solving for an unknown.

In statistical hypothesis testing, a hypothesis test is typically specified in terms of a test statistic, which is a function of the sample; it is considered a numerical summary of a set of data that reduces the data to one or a small number of values that can be used to perform a hypothesis test. Given a null hypothesis and a test statistic T, we can specify a "null value" T0 such that values of T close to T0 present the strongest evidence in favor of the null hypothesis, whereas values of T far from T0 present the strongest evidence against the null hypothesis. An important property of a test statistic is that we must be able to determine its sampling distribution under the null hypothesis, which allows us to calculate p-values.

For example, suppose we wish to test whether a coin is fair (i.e. has equal probabilities of producing a head or a tail). If we flip the coin 100 times and record the results, the raw data can be represented as a sequence of 100 heads and tails. If our interest is in the marginal probability of obtaining a head, we only need to record the number T out of the 100 flips that produced a head, and use T0 = 50 as our null value. In this case, the exact sampling distribution of T is the binomial distribution, but for larger sample sizes the normal approximation can be used. Using one of these sampling distributions, it is possible to compute either a one-tailed or two-tailed p-value for the null hypothesis that the coin is fair. Note that the test statistic in this case reduces a set of 100 observations to a single numerical summary that can be used for testing.

A test statistic shares some of the same qualities as a descriptive statistic, and many statistics can be used as both test statistics and descriptive statistics. However, a test statistic is specifically intended for use in statistical testing, whereas the main quality of a descriptive statistic is that it is easily interpretable. Some informative descriptive statistics, such as the sample range, do not make good test statistics since it is difficult to determine their sampling distribution.
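The coin example shows why the sampling distribution matters: once T is known to be binomial under the null hypothesis, a p-value follows directly. The sketch below is one way to compute an exact two-tailed p-value; the observed count of 60 heads is a made-up figure, and the two-tailed rule used (summing every outcome that is no more likely than the observed one) is one common convention rather than the only one.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips with head probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def two_tailed_p_value(heads, flips, p_null=0.5):
    """Sum the probabilities of all outcomes at most as likely as the observed one."""
    observed = binomial_pmf(heads, flips, p_null)
    threshold = observed * (1 + 1e-9)  # small tolerance for floating-point ties
    return sum(
        prob
        for k in range(flips + 1)
        if (prob := binomial_pmf(k, flips, p_null)) <= threshold
    )


# Hypothetical result: 60 heads in 100 flips, tested against a fair coin.
p_value = two_tailed_p_value(60, 100)
print(round(p_value, 4))
```

With 60 heads the p-value is a little under 0.06, so at the 5% level the hypothesis of a fair coin would not be rejected. Without the binomial distribution of T there would be no way to attach such a probability to the observed count.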
