
Q1. What is Regression Analysis? How can you find the regression lines? Explain with the help of a suitable example.

Ans.: Regression analysis reveals the average relationship between two variables and thereby makes estimation or prediction possible. Definitions of the term regression:
1. Regression is the measure of the average relationship between two or more variables in terms of the original units of the data.
2. The term regression analysis refers to the methods by which estimates are made of the values of one variable from a knowledge of the values of one or more other variables, and to the measurement of the errors involved in this estimation process.
3. One of the most frequently used techniques in economics and business research, to find a relation between two or more variables that are related causally, is regression analysis.
4. Regression analysis attempts to establish the nature of the relationship between variables and thereby provide a mechanism for prediction or forecasting.
It is clear from the above definitions that regression analysis is a statistical device with the help of which we can estimate the unknown value of one variable from the known value of another variable. The variable which is used to predict the variable of interest is called the independent variable or explanatory variable, and the variable we are trying to predict is called the dependent variable or explained variable. The independent variable is denoted by X and the dependent variable by Y. The analysis used is called simple linear regression analysis: simple because there is only one predictor variable, and linear because of the assumed linear relationship between the dependent and independent variables. For example, while estimating the sales of a product from figures on advertising expenditure, sales are generally taken as the dependent variable. However, there may or may not be a causal connection between these two factors, in the sense that changes in advertising expenditure cause changes in sales.
USES OF REGRESSION ANALYSIS
Regression analysis is a branch of statistical theory that is widely used in almost all the scientific disciplines. In economics it is the basic technique for measuring or estimating the relationships among economic variables that constitute the essence of economic theory and economic life. For example, if we know that two variables, price (X) and demand (Y), are closely related, we can find out the most probable value of Y for a given value of X. Similarly, if we know that the amount of tax and the rise in the price of a commodity are closely related, we can find

out the expected price for a certain amount of tax levy. Regression analysis attempts to accomplish the following:
1. Regression analysis provides estimates of values of the dependent variable from values of the independent variable.
2. The second goal of regression analysis is to obtain a measure of the error involved in using the regression line as a basis of estimation.
3. With the regression coefficients we can calculate the correlation coefficient. The square of the correlation coefficient (r), called the coefficient of determination, measures the degree of association or correlation that exists between the two variables.
REGRESSION LINES
If we take the case of two variables X and Y, we shall have two regression lines: the regression of X on Y and the regression of Y on X. The regression line of Y on X gives the most probable values of Y for given values of X, and the regression line of X on Y gives the most probable values of X for given values of Y. However, when there is either perfect positive or perfect negative correlation between the two variables (r = ±1), the two regression lines coincide, i.e. we have only one line. The farther the two regression lines are from each other, the lesser is the degree of correlation; the nearer they are to each other, the higher is the degree of correlation. If the variables are independent, r is zero and the lines of regression are at right angles, i.e. parallel to OX and OY. The regression lines cut each other at the point of the averages of X and Y: if from the point where both regression lines cut each other a perpendicular is dropped on the X-axis, we get the mean value of X, and if from that point a horizontal line is drawn to the Y-axis, we get the mean value of Y. The regression lines are drawn on the least squares assumption, which stipulates that the sum of the squares of the deviations of the observed Y values from the fitted line shall be minimum. The total of the squares of the deviations of the various points is minimum only from the line of best fit. The regression line of X on Y likewise minimizes the total of the squares of the horizontal deviations. For example:

Height of fathers (X, inches): 65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71
Height of sons (Y, inches):    68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70

The two regression equations corresponding to these variables are:
x = -3.38 + 1.036 y   ...(i)
y = 35.82 + 0.476 x   ...(ii)

By assuming any value of Y we can find the corresponding value of X from Eq. (i). For example, if Y = 65, X would be -3.38 + 1.036(65) = 63.96. Similarly, if Y = 70, X would be -3.38 + 1.036(70) = 69.14. We can plot these points on the graph and obtain the regression line of X on Y. Similarly, by assigning any value to X in Eq. (ii) we can obtain the corresponding value of Y. Thus, if X = 63, Y would be 35.82 + 0.476(63) = 65.808, or 65.81, and for X = 70, Y would be 35.82 + 0.476(70) = 69.14. [The original shows a graph of the data and the two regression lines here.]
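As a check on the arithmetic, a minimal NumPy sketch can reproduce the two fitted equations from the father/son data above (the printed coefficients agree with equations (i) and (ii) to rounding):

```python
import numpy as np

fathers = np.array([65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71])  # X
sons    = np.array([68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70])  # Y

# Least-squares regression of Y on X and of X on Y
b_yx, a_yx = np.polyfit(fathers, sons, 1)   # slope, intercept
b_xy, a_xy = np.polyfit(sons, fathers, 1)

print(f"Y on X: y = {a_yx:.2f} + {b_yx:.3f} x")   # y = 35.82 + 0.476 x
print(f"X on Y: x = {a_xy:.2f} + {b_xy:.3f} y")   # x = -3.38 + 1.036 y

# The correlation coefficient is the geometric mean of the two slopes
print(f"r = {np.sqrt(b_yx * b_xy):.3f}")          # about 0.70
```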

Q2. What is Statistical Quality Control? What are the various tools of Statistical Quality Control?
Ans: Meaning of statistical quality control: Statistical quality control refers to the use of statistical techniques in controlling the quality of manufactured goods. It is the means of establishing and achieving quality specifications, which requires use of the tools and techniques of statistics. It is an important application of the theory of probability and the theory of sampling for the maintenance of uniform quality in a continuous flow of manufactured products. One of the major tools of S.Q.C. is the control chart, first introduced by W. A. Shewhart through the application of the normal distribution.

Definitions of statistical quality control (S.Q.C.):
1. Statistical quality control can be simply defined as an economic and effective system of maintaining and improving the quality of outputs throughout the whole operating process of specification, production and inspection, based on continuous testing with random samples.
2. Statistical quality control should be viewed as a kit of tools which may influence decisions relating to the functions of specification, production and inspection.

Methods of statistical quality control: S.Q.C. methods are applied to two distinct phases of plant operation. They are:
(1) PROCESS CONTROL
(2) PRODUCT CONTROL
(1) PROCESS CONTROL: Under process control, the quality of the products is controlled while the products are in the process of production. Process control is secured with the technique of control charts. Control charts are used as a measure of quality control not only in the production process but also in areas such as advertising, packing, airline reservations, etc. Control charts indicate whether or not the products conform to the specified quality standards. (2) PRODUCT CONTROL: Under product control, the quality of the product is controlled when the product is ready for sale and dispatch to the customers. Product control is secured with the technique of acceptance sampling. In acceptance sampling, the manufactured articles are formed into lots, a few items are chosen at random, and the lot is either accepted or rejected on the basis of a certain set of rules, usually called a sampling inspection plan. Thus, process control is concerned with controlling the quality of goods during the process of manufacturing,

whereas product control is concerned with the inspection of finished goods when they are ready for delivery.
CONTROL CHARTS
Control charts are graphic devices developed by Walter A. Shewhart for detecting unnatural patterns of variation in a production process and determining the permissible limits of variation. Control charts are the core of statistical quality control. They are based on the theory of probability and sampling. Control charts are simple to construct and easy to interpret, and they tell the production manager at a glance whether or not the process is in control, i.e. within the tolerance limits. A control chart consists of three horizontal lines:
(1) Central line (CL): The central line is the middle line of the chart. It indicates the grand average of the measurements of the samples and shows the desired standard or level of the process. The central line is generally drawn as a bold line.
(2) Upper control limit (UCL): The upper control limit is usually obtained by adding 3 sigma (3σ) to the process average, i.e. UCL = Mean + 3σ. The upper control limit is generally drawn as a dotted line.
(3) Lower control limit (LCL): The lower control limit is usually obtained by subtracting 3 sigma (3σ) from the process average, i.e. LCL = Mean - 3σ. The lower control limit is generally drawn as a dotted line.

In the control chart, the values of the statistic T (mean, range, S.D., etc.) for successive samples are plotted and often joined by a broken line for visual clarity.

Types of control charts


Control charts are of two types, depending on whether a given quality characteristic of a product is measurable or not. These are:

(A) Control charts for variables: (1) X̄-chart (2) R-chart (3) σ-chart
(B) Control charts for attributes: (1) p-chart (2) np-chart (3) c-chart
A. Control charts for variables: These charts are used when the quality characteristic of a product is capable of being measured quantitatively, such as the gauge of a steel almirah, the diameter of a screw, or the tensile strength of a steel pipe. Such charts are of three types:
(1) Mean chart (X̄-chart)
(2) R-chart (range chart)
(3) σ-chart (standard deviation chart)
(1) Mean chart: This chart is constructed for controlling the variation in the average quality standard of the products in a production process.
Procedure: The construction of the mean chart involves the following steps:
(i) Compute the mean of each sample, X̄ = ΣX / n.
(ii) Compute the mean of the sample means (the central line) by dividing the sum of the sample means by the number of samples, X̿ = (X̄1 + X̄2 + ... + X̄k) / k.

(2) R-chart: The range chart is constructed for controlling the variation in the dispersion or variability of the quality standard of the products in a production process.
Procedure: The construction of the R-chart involves the following steps:
(i) Compute the range of each sample using the formula R = L - S, where L = largest value and S = smallest value.

(ii) Compute the mean of the ranges by dividing the sum of the sample ranges by the number of samples:
R̄ = (R1 + R2 + ... + Rk) / k, where k = number of samples.
(iii) Find the control limits: UCL = D4·R̄ and LCL = D3·R̄, where D3 and D4 are quality control factors.
(3) σ-chart: This chart is constructed to get a better picture of the variation in the quality standard in a process than that obtained from the range chart, provided the standard deviations of the various samples are readily available.
Procedure: The construction of the σ-chart involves the following steps:
(i) Find the S.D. of each sample, if not given.
(ii) Compute the mean of the S.D.s by using the formula:

S̄ = (S1 + S2 + ... + Sk) / k

The mean of the S.D.s represents the central line (CL).
(iii) Find the upper and lower control limits by using the formulae:
(a) On the basis of the quality control factors B1, B2 and the population standard deviation (σ): UCL = B2·σ, LCL = B1·σ.
(b) On the basis of the quality control factors B3, B4 and the estimated population standard deviation (S̄): UCL = B4·S̄, LCL = B3·S̄.
Here B1, B2, B3 and B4 are quality control factors.
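A minimal sketch of the variables charts, using hypothetical measurement data. It computes the mean-chart limits in the common table-based form X̿ ± A2·R̄ (which estimates the 3σ limits from the average range) along with the R-chart limits D4·R̄ and D3·R̄; A2, D3 and D4 are the standard control-chart factors for samples of size 5.

```python
import numpy as np

# Hypothetical data: 6 samples (subgroups) of size n = 5
samples = np.array([
    [10.2,  9.9, 10.1, 10.4,  9.8],
    [10.0, 10.3,  9.7, 10.1, 10.2],
    [ 9.9, 10.1, 10.0,  9.8, 10.2],
    [10.3, 10.0, 10.1,  9.9, 10.4],
    [ 9.8, 10.2, 10.0, 10.1,  9.9],
    [10.1,  9.9, 10.3, 10.0, 10.2],
])
A2, D3, D4 = 0.577, 0.0, 2.114      # standard factors for n = 5

xbar = samples.mean(axis=1)         # sample means
R = np.ptp(samples, axis=1)         # sample ranges, R = L - S

CL_x, CL_r = xbar.mean(), R.mean()                  # central lines
UCL_x, LCL_x = CL_x + A2 * CL_r, CL_x - A2 * CL_r   # mean-chart limits
UCL_r, LCL_r = D4 * CL_r, D3 * CL_r                 # R-chart limits

print(f"Mean chart: CL={CL_x:.3f}, UCL={UCL_x:.3f}, LCL={LCL_x:.3f}")
print(f"R chart:    CL={CL_r:.3f}, UCL={UCL_r:.3f}, LCL={LCL_r:.3f}")
```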

(B) Control charts for attributes
These charts are used when the quality of a product cannot be measured in quantitative form and the data are studied on the basis of attributes such as defective and non-defective. Such charts are of three types:
(i) p-chart (fraction defective chart)
(ii) np-chart (number of defectives chart)
(iii) c-chart (number of defects per unit chart)

(1) p-chart: This chart is constructed for controlling the quality standard in the average fraction defective of the product in a process, when the observed sample items are classified into defectives and non-defectives.
Procedure: The construction of the p-chart involves the following steps:
(i) Find the fraction defective (proportion of defectives) in each sample, i.e. p1, p2, p3, ..., pk.
(ii) Find the mean fraction defective by using the formula:
p̄ = (total number of defectives) / (total number of units inspected), with q̄ = 1 - p̄.
The control limits are p̄ ± 3·√(p̄q̄ / n):
UCL = p̄ + 3·√(p̄q̄ / n)
LCL = p̄ - 3·√(p̄q̄ / n)
(iii) Construct the p-chart by plotting the sample number on the x-axis and the sample fraction defectives, UCL, LCL and central line on the y-axis.
(iv) Interpret the p-chart. If all the sample fractions (p) fall within the control limits, the process is in a state of control; otherwise it is out of control.
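A minimal sketch of the p-chart calculation, with hypothetical inspection data (equal samples of n = 100 items each); a negative lower limit is taken as zero:

```python
import numpy as np

n = 100                                         # hypothetical sample size
defectives = np.array([4, 6, 3, 7, 5, 2, 6, 4]) # hypothetical counts per sample

p = defectives / n                              # fraction defective per sample
p_bar = defectives.sum() / (n * len(defectives))
q_bar = 1 - p_bar
se = np.sqrt(p_bar * q_bar / n)

UCL = p_bar + 3 * se
LCL = max(p_bar - 3 * se, 0.0)
print(f"CL={p_bar:.4f}, UCL={UCL:.4f}, LCL={LCL:.4f}")
print((p > UCL) | (p < LCL))    # True marks out-of-control samples
```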

(2) np-chart: This chart is constructed for controlling the quality standard of attributes in a process where the sample size is equal and it is required to plot the number of defectives (np) in the samples instead of the fraction defectives.
Procedure:
(i) Find the average number of defectives.
(ii) Find the value of p̄ by using the formula p̄ = Σnp / Σn, with q̄ = 1 - p̄.
(iii) Determine the control limits by using the formulae:
UCL = np̄ + 3·√(np̄q̄)
LCL = np̄ - 3·√(np̄q̄)

(iv) Construct the np-chart by plotting the sample number on the x-axis and the sample numbers of defectives, UCL, LCL and central line on the y-axis.
(v) Interpret the np-chart. If all the sample numbers of defectives fall within the control limits, the process is in a state of control; otherwise it is out of control.
(3) c-chart: This chart is used for the control of the number of defects per unit, say a piece of cloth, which may contain more than one defect. The inspection unit in this chart is a single unit of product. The probability of occurrence of each defect tends to remain very small; hence the distribution of the number of defects may be assumed to be a Poisson distribution, with mean = variance.
Procedure:
(i) Determine the number of defects per unit in samples of equal size.
(ii) Find the mean of the numbers of defects counted over the N inspected units: C̄ = ΣC / N.
(iii) Determine the control limits: UCL = C̄ + 3·√C̄, LCL = C̄ - 3·√C̄.
(iv) Construct the c-chart by plotting the sample number on the x-axis and the number of defects per unit, UCL, LCL and central line on the y-axis.
(v) Interpret the c-chart. If the observed values of the number of defects per unit fall within the control limits, the process is in a state of control; otherwise it is out of control.
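A minimal sketch of the c-chart limits, with hypothetical defect counts; since C̄ - 3√C̄ can come out negative, the lower limit is then taken as zero:

```python
import numpy as np

# Hypothetical: defects counted on N = 10 inspected units (e.g. pieces of cloth)
defects = np.array([3, 5, 2, 7, 4, 3, 6, 2, 4, 4])
c_bar = defects.mean()                       # C-bar = sum(C) / N

UCL = c_bar + 3 * np.sqrt(c_bar)
LCL = max(c_bar - 3 * np.sqrt(c_bar), 0.0)   # negative limit taken as 0
print(f"CL={c_bar:.2f}, UCL={UCL:.2f}, LCL={LCL:.2f}")
print(np.where((defects > UCL) | (defects < LCL))[0])  # out-of-control units
```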


Q3. What are the various measures of dispersion? Define them with their merits and demerits.
Ans: Introduction: The measures of central tendency serve to locate the centre of the distribution, but they do not reveal how the items are spread out on either side of the centre. This characteristic of a frequency distribution is commonly referred to as dispersion. In a series all the items are not equal; there is difference or variation among the values. The degree of variation is evaluated by various measures of dispersion. Small dispersion indicates high uniformity of the items, while large dispersion indicates less uniformity. For example, consider the following marks of two students:

Student I:  68, 75, 65, 67, 70
Student II: 85, 90, 80, 25, 65

Both have a total of 345 and an average of 69 each. In fact the second student has failed in one paper. When the averages alone are considered, the two students are equal; but the first student has less variation than the second, and less variation is a desirable characteristic.
Characteristics of a good measure of dispersion: An ideal measure of dispersion is expected to possess the following properties:
1. It should be rigidly defined.
2. It should be based on all the items.
3. It should not be unduly affected by extreme items.
4. It should lend itself to algebraic manipulation.
5. It should be simple to understand and easy to calculate.
Absolute and relative measures: There are two kinds of measures of dispersion, namely (1) absolute measures of dispersion and (2) relative measures of dispersion. An absolute measure of dispersion indicates the amount of variation in a set of values in terms of the units of the observations; for example, when rainfall on different days is recorded in mm, an absolute measure of dispersion gives the variation in rainfall in mm. On the other hand, relative measures of dispersion are free from the units of measurement of the observations; they are pure numbers, used to compare the variation in two or more sets which have different units of measurement. The main absolute measures of dispersion are listed below:
1. Range
2. Quartile deviation
3. Mean deviation
4. Standard deviation

Range: This is the simplest possible measure of dispersion and is defined as the difference between the largest and smallest values of the variable. In symbols, Range = L - S, where L = largest value and S = smallest value. In individual observations and discrete series, L and S are easily identified.
Merits and demerits of range:
Merits: 1. It is simple to understand. 2. It is easy to calculate. 3. In certain types of problems like quality control, weather forecasts, share price analysis, etc., range is widely used.
Demerits: 1. It is very much affected by extreme items. 2. It is based on only two extreme observations. 3. It cannot be calculated from open-end class intervals. 4. It is not suitable for mathematical treatment. 5. It is a rarely used measure.
Inter-quartile range and quartile deviation:
Definition: Quartile deviation is half of the difference between the first and third quartiles; hence it is also called the semi-inter-quartile range.
Inter-quartile range = Q3 - Q1
Quartile deviation = (Q3 - Q1) / 2
Merits and demerits of quartile deviation:
Merits: 1. It is simple to understand and easy to calculate. 2. It is not affected by extreme values. 3. It can be calculated for data with open-end classes also. 4. It is superior to and more reliable than the range.
Demerits: 1. It is not based on all the items; it is based on the two positional values Q1 and Q3 and ignores the extreme 50% of the items. 2. It is not amenable to further mathematical treatment. 3. It is affected by sampling fluctuations.
Mean deviation: Mean deviation is another measure of dispersion, also known as average deviation. Mean deviation is defined as the arithmetic average of the deviations of the various items of a series computed from some measure of central tendency, say the mean or median. In taking the deviations of the various items, the algebraic signs + and - are not taken into consideration. Although the mean deviation can be computed either from the mean or from the median, theoretically

the median is preferred, because the sum of the absolute deviations of the items taken from the median is minimum. The formulae are:

M.D. from mean = Σ|X - X̄| / n
M.D. from median = Σ|X - Med| / n

Merits and demerits of mean deviation:
Merits: 1. It is simple to understand and easy to compute. 2. It is based on all the observations. 3. It is less affected by extreme items. 4. It is very useful in various fields such as economics, commerce, etc.
Demerits: 1. It is not an accurate method when it is calculated from the mode. 2. It is not capable of further algebraic treatment. 3. It is not used in statistical inference.
Standard deviation: Standard deviation is also called the root mean square deviation. It is defined as the square root of the arithmetic mean of the squares of the deviations of the values taken from the mean. Standard deviation is denoted by the small Greek letter σ (read as sigma) and is computed as follows:

σ = √( Σ(X - X̄)² / N )

Merits and demerits of standard deviation:
Merits: 1. It is rigidly defined. 2. It is based on all the observations. 3. It is capable of being treated mathematically; for example, if the standard deviations of a number of groups are known, their combined standard deviation can be computed. 4. It is not very much affected by fluctuations of sampling and is therefore widely used in sampling theory and tests of significance.
Demerits: 1. As compared to the quartile deviation and range, it is difficult to understand and to calculate. 2. It gives more importance to extreme observations. 3. Since it depends upon the units of measurement of the observations, it cannot be used to compare the dispersion of distributions expressed in different units.
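The four measures can be computed for the two students of the earlier example with a short NumPy sketch (note that NumPy's default quartile interpolation may differ slightly from the textbook (n + 1) method):

```python
import numpy as np

marks_1 = np.array([68, 75, 65, 67, 70])   # Student I, from the example above
marks_2 = np.array([85, 90, 80, 25, 65])   # Student II

for name, x in (("Student I", marks_1), ("Student II", marks_2)):
    rng = x.max() - x.min()                # Range = L - S
    q1, q3 = np.percentile(x, [25, 75])
    qd = (q3 - q1) / 2                     # Quartile deviation
    md = np.abs(x - np.median(x)).mean()   # Mean deviation from the median
    sd = x.std()                           # Standard deviation (divisor N)
    print(f"{name}: range={rng}, QD={qd:.1f}, MD={md:.1f}, SD={sd:.2f}")
```

Both students have the same mean (69), but every measure of dispersion is far larger for Student II, which is exactly what the averages alone fail to show.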

Q. 1. "Statistics is a body of methods for making wise decisions in the face of uncertainty." Comment on the statement, bringing out clearly how statistics helps in business decision-making.
Ans.: There have been many definitions of the term statistics. A very simple and concise definition, by Croxton and Cowden, says: "Statistics may be defined as the collection, presentation, analysis and interpretation of numerical data." This definition identifies five stages in a statistical investigation, namely (1) collection of data, (2) organization, (3) presentation of data, (4) analysis and (5) interpretation of numerical data. Since statistical methods help in taking decisions, statistics may be regarded as a body of methods for making wise decisions in the face of uncertainty. The scope of statistics is so vast that it is used in almost every field, whether trade, industry, commerce, economics, service, education or sociology.
Statistics and business: With growing size and ever increasing competition, the problems of business enterprises are becoming complex and they are using more and more statistics in decision-making. With the growth in the size of business firms it has often become impossible for the owners to maintain personal contact with thousands and lakhs of customers. Management has become a specialized job, and a manager is called upon to plan, organize, supervise and control the operations of the business house. Since very little personal contact is possible with customers, a modern business firm faces a greater degree of uncertainty concerning future operations than it did when the size of business was small. Moreover, most production these days is in anticipation of demand; therefore, unless a very careful study of the market is made, the firm may not be able to make profits. Thus a businessman who has to deal in an atmosphere of uncertainty can no longer depend on trial and error in taking decisions. If he is to be successful in decision-making, he must be able to deal systematically with the uncertainty itself, by careful evaluation and application of statistical methods concerning the business activities. Business indeed runs on estimates and probabilities: the higher the degree of accuracy of a businessman's estimates, the greater is the success of his business. Business activities can be broadly classified into various groups, like production, sales, finance, personnel, accounting, market and product research, and quality control. With the help of statistical methods, abundant quantitative information can be obtained in respect of each of these areas and used in formulating suitable policies; the information might be in the form of reports, computer printouts, records kept in ledgers or other books, etc. The ability of the manager to extract important information from the data and use it in making decisions can have a significant effect on his or her own future as well as that of the organization. For example, a marketing researcher in a large company uses data on consumer buying habits to help develop new products or improve existing ones. A manager needs individual worker performance data to support promotion decisions. A production manager looks at quality control data to decide when to make adjustments in a manufacturing process. Statistical tables and charts are frequently used by sales managers to present numerical facts about sales.

The techniques of time series analysis and business forecasting enable the businessman to predict, with a fair degree of accuracy, the effect of a large number of variables. However, it should be remembered that though statistical methods are extremely useful in taking decisions, they are not a perfect substitute for common sense. The user of business statistics must therefore combine knowledge of the business environment in which he operates, and of its technological characteristics, with common sense and the ability to interpret statistical methods to non-statisticians.

Q. 2. Comment on the following:
a) None of the methods of collecting primary data can be regarded as the best.
b) In the collection of data, common sense is the chief requisite and experience the chief teacher.
Ans: Data constitute the foundation of statistical analysis. Data may be obtained either from a primary source or a secondary source. A primary source is one that itself collects the data; a secondary source is one that makes available data which were collected by some other agency. Depending on the source, statistical data are classified under two categories: (i) primary data and (ii) secondary data.
i) Primary data are obtained by a study specifically designed to fulfil the data needs of the problem at hand. Such data are original in character and are generated in the large number of surveys conducted mostly by the Government and also by individuals, institutions and research bodies. For example, data obtained in a population census by the office of the Registrar General and Census Commissioner, Ministry of Home Affairs, are primary data. Data which are not originally collected but rather obtained from published or unpublished sources are known as secondary data. For example, data collected by the Census Commission are primary data for it and secondary data for all others who use them.
Methods of collecting primary data: There are different methods used in the collection of primary data. Every method has its own merits and also its limitations, and different methods are used in different situations. Some of these methods are:
1) Direct personal interviews
2) Indirect oral interviews
3) Information from correspondents
4) Mailed questionnaire method
5) Schedules sent through enumerators

1. Direct personal interviews: Under this method of collecting data, there is face-to-face contact with the persons from whom the information is to be obtained. The interviewer asks them questions pertaining to the survey and collects the desired information. The information obtained is likely to be more accurate, as the interviewer can clear the doubts of the informants. But it may be very costly and time-consuming, and personal prejudice and bias may affect the information gathered. Hence this method should be used only in those cases where an intensive study of a limited field is desired.
2. Indirect oral interviews: Under this method, the investigator contacts third parties, called witnesses, capable of supplying the necessary information. This method is generally adopted in those cases where the information to be obtained is of a complex nature and the informants are not inclined to respond if approached directly. The method, however, depends on the honesty and ability of the interviewers and the type of persons whose evidence is being recorded.
3. Information from correspondents: Under this method, the investigator appoints local agents or correspondents in different places to collect information. Newspaper agencies generally adopt this method, and various government departments use it to collect information, e.g. in the construction of the wholesale price index. The main advantage of this method is that it is cheap and appropriate for extensive investigation; however, it may not always ensure accurate results, because of personal prejudice.
4. Mailed questionnaire method: Under this method, a list of questions pertaining to the survey (a questionnaire) is prepared and sent to the various informants by post. A request is made through a covering letter to fill up the questionnaire and send it back within a specified period. This method can be adopted where the field of investigation is very vast, and it is relatively cheap; however, it can be used only where the informants are literate.
5. Schedules sent through enumerators: Yet another method of collecting information is that of sending schedules through enumerators or interviewers. The enumerators contact the informants, get replies to the questions contained in the schedule, and fill them in, in their own handwriting, in the questionnaire form. It can be adopted in cases where the informants are illiterate, but it is quite costly and requires experience and training of the enumerators.
After analysing the different methods of collecting primary data, with their advantages and disadvantages, it can be concluded that no one method is best. Every method has its own advantages and weaknesses, and the method to be adopted depends on the situation. If the population to be interviewed happens to be illiterate, it is not logical to use the mailed questionnaire method; either direct personal interviews or schedules sent through enumerators are used instead.

Sources of secondary data: In most studies it is not necessary to collect first-hand information on all related issues, and data collected by others are used. The sources of secondary data can broadly be classified under two heads: (1) published sources and (2) unpublished sources.
1) Published sources: The various sources of published data are:
a) Reports and official publications of: i) international bodies such as the World Bank and the International Labour Organisation; ii) Central and State Governments, such as the Economic Survey of India; iii) ad hoc committees and commissions appointed by the government, such as the Sarkaria Committee, the Mehrotra Committee and the Fifth Pay Commission.
b) Semi-official publications of various local bodies such as municipal corporations and district boards.
c) Publications of autonomous and private institutes, such as: i) trade and professional bodies, e.g. the Federation of Indian Chambers of Commerce & Industry, the Institute of Chartered Accountants and the Institute of Foreign Trade, whose journals include Economic Trends, The Chartered Accountant and Foreign Trade Review; ii) financial and economic journals such as Indian Economic Review, Reserve Bank of India Bulletin and Indian Finance; iii) annual reports of joint stock companies and corporations; iv) publications brought out by various autonomous research institutes, such as the Institute of Economic Growth, Delhi, and the National Council of Applied Economic Research, New Delhi.
2) Unpublished sources: Not all statistical material is published. There are various sources of unpublished data, such as records maintained by various government and private offices and research institutes. Such sources can be used where necessary.

Q. 4. Write notes on the following:
a) Sampling distributions and their characteristics
b) Acceptance sampling
Ans.: Sampling theory is the study of the relationship between a population and the samples drawn from it, and it is applicable to random samples only. Its aim is to estimate the true value of a population parameter (the population mean, population standard deviation, population proportion, etc.) by using sample statistics (the sample mean, sample standard deviation, sample proportion, etc.), and to find the limits of accuracy of estimates based on samples. Sampling theory also helps us to determine whether the differences between two samples are actually due to chance variation or whether they are really significant. Let us consider a population of size N and draw all possible random samples of a given size n. We get C(N, n) = k (say) samples, each of size n. For each of these k samples we compute a statistic θ (the sample mean, sample standard deviation or sample proportion, say). The value of θ may vary from sample to sample. Let θ1, θ2, ..., θk be the values of the statistic for the k samples; each of these values occurs with a definite probability. Thus we can construct a table showing the set of values θ1, θ2, ..., θk of θ with their respective probabilities. This probability distribution of θ is known as the sampling distribution of θ. A statistic (sample mean, sample standard deviation, sample proportion) always has a sampling distribution, but a parameter (population mean, standard deviation, proportion, etc.) has no sampling distribution. The sampling distribution of a statistic reveals some important features:
1) A sampling distribution is generated from a population distribution, known or assumed.
2) The same population may generate an infinite number of sampling distributions for the statistic, one for each sample size n.
3) A population may generate sampling distributions for two or more different statistics.
Sampling distributions are of great importance in the theory and practice of statistics, because the sampling distribution of a statistic has well defined properties, and it is from these properties that we can calculate the risks (errors due to chance) involved in making generalizations about populations on the basis of samples. Related topics are the standard error of an estimate, the Central Limit Theorem and the law of inertia of large numbers; details are given in the chapter on sampling distributions.
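A small simulation sketch (with a hypothetical skewed population) shows a sampling distribution of the mean taking shape: the mean of the sample means approaches the population mean, and their standard deviation approaches the standard error σ/√n.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical skewed population (exponential, mean 2.0, s.d. 2.0)
population = rng.exponential(scale=2.0, size=100_000)

n = 30
means = np.array([rng.choice(population, size=n).mean() for _ in range(5000)])

print(population.mean(), means.mean())              # both close to 2.0
print(population.std() / np.sqrt(n), means.std())   # both close to the S.E.
```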

b) Acceptance Sampling: Introduction to statistical quality control (SQC): The quality of a product is the most important property that one desires while purchasing it. The success of a manufacturer depends mostly on the quality of his product, and the quality of the product should be maintained so that the reputation of the company does not suffer. Statistical quality control deals with the quality of the articles produced in an industry. Articles produced by a machine differ from one another. If an article does not differ much from the desired measurements it is acceptable; if it differs much, it is to be rejected. Rejection is generally not affordable, so it is necessary to keep constant vigil on the quality of the finished product. Statistical quality control helps to find out whether some cause can be assigned for this variation. SQC methods are applied to two phases of the manufacturing process: 1) process control and 2) product control.
1) Process control: When SQC is applied to a repetitive manufacturing process itself, it is known as process control.
2) Product control: In this case the checking of the quality of the manufactured product is in respect of its acceptability. This is achieved through an acceptance inspection or sampling inspection plan, called an acceptance sampling plan.

Acceptance sampling plan: Sampling inspection is carried out to inspect raw materials, machine spares and the manufacturer's own product in finished form. In short, sampling inspection is meant for product control. Since it is not practically possible to inspect each and every unit produced in a large factory, one has to resort to sampling inspection: a small fraction of items, selected randomly from a lot, is inspected to decide whether the lot should be accepted or rejected on the basis of the information supplied by the sample inspection. This is also called acceptance sampling. Acceptance sampling serves the following purposes:
1) It provides the basis to know whether the manufacturing process is yielding a product up to the mark or not.
2) It tells about the quality of the product at hand.
3) It minimizes the risk of the purchaser and protects the producer from future losses.
Acceptance sampling is of two types: 1) sampling inspection by attributes and 2) sampling inspection by variables. Sampling inspection by attributes is considered the simpler and more useful; here the quality of an item is judged only as defective or non-defective. It leads to two types of plans: i) the acceptance-rejection type and ii) the acceptance-rectification type. The acceptance-rectification plan is better, because if the sampling inspection does not lead to the acceptance of the lot, each and every unit of the lot is inspected, defectives are replaced by non-defectives, and the lot is released for sale. There are two risks involved in acceptance sampling. The producer's risk, which is the same as a Type I error, is the risk that a sampling inspection plan leads to the rejection of a lot of satisfactory quality. The consumer's risk, the same as a Type II error, is the risk that the consumer accepts a lot which is not up to the mark.
Types of sampling plans:
1) Single sampling plan: When the decision about the acceptance or rejection of a lot is based on a single sample, it is known as a single sampling plan (illustrated in the sketch below).
2) Double sampling plan: A double sampling plan is used when a single sample does not lead to a satisfactory decision about the acceptance or rejection of the lot. In this situation a second sample is drawn and a decision is taken on the basis of the results of the first and second samples combined.
3) Multiple sampling plan: The process of second sampling can be extended to a third, fourth, etc., sampling stage; this is called a multiple sampling plan.
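A minimal sketch of a single sampling plan, with hypothetical parameters n = 50 and acceptance number c = 2. The probability of accepting a lot whose true fraction defective is p is the binomial probability of finding at most c defectives in the sample; good lots (small p) are accepted with high probability and bad lots are mostly rejected, which illustrates the producer's and consumer's risks.

```python
from scipy.stats import binom

# Hypothetical plan: sample n = 50 items, accept the lot if defectives <= c = 2
n, c = 50, 2
for p in (0.01, 0.02, 0.05, 0.10):        # true lot fraction defective
    p_accept = binom.cdf(c, n, p)         # P(at most c defectives in sample)
    print(f"p = {p:.2f}: P(accept lot) = {p_accept:.3f}")
```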

Q. 5. Out of 8000 graduates in a town, 800 are females; out of 1600 graduate employees, 120 are females. Use the χ² test to determine whether any distinction is made in appointment on the basis of sex. The table value of χ² at the 5% level of significance for one degree of freedom is 3.84.
Ans.: The observed frequencies are:

            Graduates   Graduate emp.   Total
Females        800           120          920
Males         7200          1480         8680
Total         8000          1600         9600

Let us take the hypothesis that no distinction is made in appointment on the basis of sex. Applying the χ² test, we first find the expected frequencies from the observed frequencies. The expected frequency for any cell is E = (RT × CT) / N, where RT = row total, CT = column total and N = total number of observations. For example, the expected frequency corresponding to the observed frequency 800 (first row, first column) is 920 × 8000 / 9600 = 766.6 ≈ 767. Similarly, the other expected frequencies are:

            Graduates   Graduate emp.   Total
Females        767           153          920
Males         7233          1447         8680

Observed (O)   Expected (E)   (O - E)²   (O - E)²/E
    800            767          1089        1.41
    120            153          1089        7.11
   7200           7233          1089        0.15
   1480           1447          1089        0.75

χ² = Σ(O - E)² / E = 9.42

The table value of χ² at the 5% significance level for 1 degree of freedom (v = 1) is 3.84. Since the calculated value of χ² (9.42) is greater than the table value (3.84), the hypothesis that no distinction is made is rejected. Hence it is concluded that distinction is made in appointment on the basis of sex.
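The same test can be run with SciPy's chi-square test of independence; it reports about 9.62 rather than 9.42 because it does not round the expected frequencies:

```python
from scipy.stats import chi2_contingency

# Observed frequencies from the table above: rows = sex, columns = group
observed = [[800, 120],      # females: graduates, graduate employees
            [7200, 1480]]    # males
chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(chi2, dof, p)   # chi2 about 9.62 with 1 d.f.; p well below 0.05
```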

Q. 7. Show, with the help of the following data, that the factor reversal test and the time reversal test are satisfied by Fisher's Ideal Index Number:

            Base year           Current year
Commodity   Price   Expen.      Price   Expen.
A            12      600         20      1200
B             4      400          4       480
C             8      480         12       720
D            20      600         24       720
E            16      640         24       960
Ans: According to the factor reversal test of Prof. Irving Fisher, the formula for constructing an index number should permit the interchange of the factors (price and quantity): the change in price multiplied by the change in quantity should equal the total change in value. Thus the factor reversal test is satisfied if

P01 × Q01 = ΣP1Q1 / ΣP0Q0

To prove this, we use the given data, the expenditure figures serving as the quantity weights Q0 and Q1 (as in the solution table):

Commodity   P0    Q0    P1    Q1     P0Q0    P0Q1    P1Q0    P1Q1
A           12   600    20  1200     7200   14400   12000   24000
B            4   400     4   480     1600    1920    1600    1920
C            8   480    12   720     3840    5760    5760    8640
D           20   600    24   720    12000   14400   14400   17280
E           16   640    24   960    10240   15360   15360   23040
Total                               34880   51840   49120   74880

Fisher's Ideal Index numbers are:

P01 = √[ (ΣP1Q0 / ΣP0Q0) × (ΣP1Q1 / ΣP0Q1) ] = √[ (49120 / 34880) × (74880 / 51840) ]
Q01 = √[ (ΣQ1P0 / ΣQ0P0) × (ΣQ1P1 / ΣQ0P1) ] = √[ (51840 / 34880) × (74880 / 49120) ]

Therefore

P01 × Q01 = √[ (49120 / 34880) × (74880 / 51840) × (51840 / 34880) × (74880 / 49120) ]
          = √[ (74880 × 74880) / (34880 × 34880) ]
          = 74880 / 34880
          = ΣP1Q1 / ΣP0Q0

Hence the factor reversal test is satisfied.

The time reversal test requires that the formula for calculating the index number should give the same ratio between one point of comparison and another no matter which of the two is taken as base; in other words, the index number computed forward should be the reciprocal of the index number computed backward. Therefore the time reversal test is satisfied if

P01 × P10 = 1

Using the data, it can be shown that Fisher's Ideal Index Number satisfies the time reversal test also:

P01 = √[ (ΣP1Q0 / ΣP0Q0) × (ΣP1Q1 / ΣP0Q1) ]
P10 = √[ (ΣP0Q1 / ΣP1Q1) × (ΣP0Q0 / ΣP1Q0) ]

P01 × P10 = √[ (49120 / 34880) × (74880 / 51840) × (51840 / 74880) × (34880 / 49120) ] = √1 = 1

Hence the time reversal test is satisfied.
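Both tests can be verified numerically with a short NumPy sketch using the data above:

```python
import numpy as np

p0 = np.array([12, 4, 8, 20, 16]);  q0 = np.array([600, 400, 480, 600, 640])
p1 = np.array([20, 4, 12, 24, 24]); q1 = np.array([1200, 480, 720, 720, 960])

def fisher(pa, qa, pb, qb):
    """Fisher's Ideal Index of period b relative to period a."""
    laspeyres = (pb * qa).sum() / (pa * qa).sum()
    paasche = (pb * qb).sum() / (pa * qb).sum()
    return np.sqrt(laspeyres * paasche)

P01 = fisher(p0, q0, p1, q1)   # forward price index
Q01 = fisher(q0, p0, q1, p1)   # quantity index (roles of p and q interchanged)
P10 = fisher(p1, q1, p0, q0)   # backward price index

print(P01 * Q01, (p1 * q1).sum() / (p0 * q0).sum())  # factor reversal: equal
print(P01 * P10)                                     # time reversal: 1.0
```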

Q. 8. Write notes on the following: a) Bayes' Theorem b) Poisson Distribution c) Hypothesis
Ans: Bayes' Theorem:
One of the most interesting applications of the results of probability theory involves estimating unknown probabilities and making decisions on the basis of new (sample) information. Bayes' theorem is used for this purpose. Conditional probability takes into account information about the occurrence of one event to predict the probability of another event. The concept can be extended to revise probabilities based on new information and to determine the probability that a particular effect was due to a specific cause. The procedure for revising these probabilities is known as Bayes' theorem. Quite often the businessman has extra information on a particular event or proposition, either through personal belief or from the past history of the event. Probabilities assigned on the basis of personal experience, before observing the outcomes of the experiment, are called prior probabilities; for example, probabilities assigned to past sales records or to the past number of defectives produced by a machine are prior probabilities. When the probabilities are revised with the use of Bayes' rule, they are called posterior probabilities. Bayes' theorem is very useful in solving practical business problems in the light of additional information.

In its general form, Bayes' theorem deals with specific events, such as E1, E2, ..., Ek, that have prior probabilities P(E1), P(E2), ..., P(Ek), each already known to the decision maker. The events with prior probabilities produce, cause or give rise to another event, say B. A conditional probability relation exists between the events E1, E2, ..., Ek and the event B; the conditional probabilities are P(B/E1), P(B/E2), ..., P(B/Ek). Bayes' formula allows us to calculate the probability of an event, say E1, occurring given that event B has already occurred with a known probability P(B). The probability of E1 given that B has occurred is the posterior (or revised) probability, denoted by P(E1/B). Thus P(Ei) and P(B/Ei) are given and are used to calculate

P(Ei/B) = P(Ei)·P(B/Ei) / [ P(E1)·P(B/E1) + P(E2)·P(B/E2) + ... + P(Ek)·P(B/Ek) ]
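A minimal numerical sketch with hypothetical figures: three machines E1, E2, E3 produce 50%, 30% and 20% of a factory's output (the priors), with defective rates of 2%, 3% and 4%. Given that an item is found defective, the posterior probability of each machine is computed by Bayes' formula.

```python
# Hypothetical priors P(Ei) and conditionals P(B | Ei), B = "item is defective"
prior = [0.5, 0.3, 0.2]
p_def = [0.02, 0.03, 0.04]

# Total probability of a defective item, P(B)
p_b = sum(pr * pd for pr, pd in zip(prior, p_def))

# Posterior (revised) probabilities P(Ei | B)
posterior = [pr * pd / p_b for pr, pd in zip(prior, p_def)]
print(p_b)         # 0.027
print(posterior)   # [0.370..., 0.333..., 0.296...]
```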

(b) Poisson Distribution: The probability distribution of a random variable may be the probability listing of outcomes and their observed relative frequencies. Amongst probability or expected frequency distributions, the following three are the most popular: (i) the Binomial distribution, (ii) the Poisson distribution and (iii) the Normal distribution. The Poisson distribution is a discrete probability distribution and is very widely used in statistical work. It is used in practice in a wide variety of problems where there are infrequently occurring events with respect to time, area, volume, etc. A Poisson distribution may be expected in cases where the chance of any individual event being a success is small; the distribution is used to describe the behaviour of rare events, such as the number of accidents on a road or the number of printing mistakes in a book. Characteristics of the Poisson distribution: i) The Poisson distribution is a discrete distribution with a single parameter m. As m increases the distribution shifts to the right; all Poisson probability distributions are skewed to the right. This is why the Poisson distribution has been called the probability distribution of rare events (the probabilities are highest for small numbers of occurrences). ii) The occurrences of the events are independent: the occurrence of one event has no effect on the probability of a second occurrence. iii) Theoretically, an infinite number of occurrences of the event must be possible in the interval. iv) In an extremely small portion of the interval, the probability of two or more occurrences of the event is negligible.
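A minimal sketch of Poisson probabilities using SciPy, with a hypothetical mean of m = 2 defects per unit:

```python
from scipy.stats import poisson

m = 2.0                        # hypothetical mean, m = np
print(poisson.pmf(0, m))       # P(X = 0) = e^-2, about 0.1353
print(poisson.pmf(3, m))       # P(X = 3) = e^-2 * 2^3 / 3!, about 0.1804
print(poisson.mean(m), poisson.var(m))   # mean = variance = m
```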

The Poisson distribution has many applications in business and has been widely used in management science and operations research. Some examples which may be analysed with the use of the distribution are: a) demand for a product, b) typographical errors occurring on the pages of a book, c) the occurrence of accidents in a factory, d) the arrival pattern in a departmental store, e) the occurrence of defects in a bolt of cloth in a factory, f) arrival of incoming calls. Mean and variance of the Poisson distribution: mean = m (= np) and variance σ² = m.
c) Hypothesis: A hypothesis is a proposition or assumption which the researcher wants to verify. In problem-oriented research it is necessary to formulate a hypothesis concerned with the causes of a certain phenomenon or a relationship between two or more variables under investigation.
Procedure of hypothesis testing: 1) Formulate a hypothesis. 2) Set up a suitable significance level. 3) Choose a test criterion. 4) Compute. 5) Make a decision.
Formulate a hypothesis: The conventional approach to hypothesis testing is to set up two hypotheses instead of one, in such a way that if one hypothesis is true the other is false; alternatively, if one hypothesis is false or rejected, the other is true or accepted. These two hypotheses are (i) the null hypothesis and (ii) the alternative hypothesis. The term "null" means nothing or invalid. Let us assume that the mean of the population is μ0 and the mean of the sample is X̄. The assumption that the population has mean μ0 is our null hypothesis: H0: μ = μ0, where H0 denotes the null hypothesis. The alternative hypothesis is then H1: μ ≠ μ0. The rejection of the null hypothesis shows that the mean of the population is not μ0, which implies that some other hypothesis is accepted; this other hypothesis is called the alternative hypothesis. There can be two or more alternative hypotheses, though only one alternative hypothesis can be tested at a time against the null hypothesis.
Set up a suitable significance level: The next step after formulating a hypothesis is to test its validity at a certain level of significance. The confidence with which a null hypothesis is rejected

or accepted depends upon the significance level used for the purpose. A significance level of 5 per cent means that the risk of making a wrong decision is about 5 per cent: the researcher is likely to be wrong in accepting a false hypothesis, or in rejecting a true hypothesis, in about 5 out of 100 occasions.
Select a test criterion: The next step is the selection of an appropriate statistical technique as the test criterion. There are many techniques from which one is to be chosen. For example, when the hypothesis pertains to a large sample (30 or more), the Z test, implying a normal distribution, is used; when the sample is small (less than 30), the t, F and χ² tests are used.
Make computations: After the statistical technique has been selected to verify the hypothesis, the next step is the performance of the various computations necessary for the application of that particular test. These computations include the test statistic and its standard error.
Make a decision: The last step in hypothesis testing is to draw a statistical conclusion, involving the acceptance or rejection of the null hypothesis. This depends on whether the computed value of the test criterion falls in the region of acceptance or in the region of rejection at the given level of significance.
Two types of errors in hypothesis testing: When a hypothesis is tested there are four possibilities:
1) The hypothesis is true but our test leads to its rejection.
2) The hypothesis is false but our test leads to its acceptance.
3) The hypothesis is true and our test leads to its acceptance.
4) The hypothesis is false and our test leads to its rejection.
Of these four possibilities, the first two lead to wrong decisions. The first possibility is a Type I error: the hypothesis is true but our test leads to its rejection. The second is a Type II error: the hypothesis is false but our test leads to its acceptance. It is often more risky to commit a Type II error than a Type I error. To reduce the risk of committing a Type I error, we should reduce the size of the rejection region, i.e. the level of significance: when the significance level is reduced from 10% to 1%, the risk of rejecting a true hypothesis is reduced from 10% to 1%. But a reduction in the probability of committing a Type I error increases the risk of committing a Type II error, i.e. the probability of accepting a null hypothesis when it is false increases. An increase in the sample size is the only way to reduce the risk of committing both types of errors simultaneously. Let us take a hypothetical example of a business firm which wants to introduce a new product in the market. It has to choose one of two decisions: not to introduce the product, or to introduce it. The states of nature are two, namely the failure of the product and its success. Examples of Type I and Type II errors:

Decision                            H0 is true (product fails)   H0 is false (product succeeds)
Do not introduce the product (A1)   Correct decision             Type II error (β)
Introduce the product (A2)          Type I error (α)             Correct decision

The firm thus runs the risk of a wrong decision in two ways. It may not introduce the product although it would have succeeded had it been introduced; this is the Type II (β) error. The second risk is that the product is introduced but does not succeed; this is the Type I (α) error.
ONE-TAILED AND TWO-TAILED TESTS: A two-tailed test of hypothesis will reject the null hypothesis if the sample statistic is significantly higher than or lower than the hypothesized population parameter; thus, in a two-tailed test the rejection region is located in both tails. If we are testing a hypothesis at the 5% level of significance, the size of the acceptance region on each side of the mean is 0.475 and the size of each rejection region is 0.025. If we consult the table of areas under the normal curve, an area of 0.475 corresponds to 1.96 standard errors on each side of the hypothesized mean, and this determines the acceptance region. If the sample mean falls into this area, the hypothesis is accepted; if it falls beyond 1.96 standard errors, the hypothesis is rejected, because it falls into the rejection region.
[Diagram: normal curve with the acceptance region between μH - 1.96σx̄ and μH + 1.96σx̄, and a rejection region in each tail.]

If we have to test whether the population mean has a specified value μ0, the null hypothesis is H0: μ = μ0 and the alternative hypothesis may be:
i) H1: μ ≠ μ0 (i.e. μ > μ0 or μ < μ0) (two-tailed test)
ii) H1: μ > μ0 (right-tailed alternative/test)
iii) H1: μ < μ0 (left-tailed alternative/test)
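A minimal sketch of a large-sample two-tailed Z test, with hypothetical figures (n = 64, X̄ = 52, μ0 = 50, σ = 8):

```python
import math
from scipy.stats import norm

n, xbar, mu0, sigma = 64, 52.0, 50.0, 8.0   # hypothetical values
z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic
p_value = 2 * (1 - norm.cdf(abs(z)))        # two-tailed p-value
print(z, p_value)   # z = 2.0, p about 0.046: reject H0 at 5% since |z| > 1.96
```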

Paper Code 4346: Statistical Analysis
1. What are non-parametric tests? In what way are they different from parametric tests? Explain the various types of non-parametric tests known to you and the specific situations in which they are applicable.
Ans: Parts (i) and (ii) are already discussed in Paper 2621, Question 5.
(iii) The various types of non-parametric tests are:
a) The sign test
b) The runs test of randomness
c) The median test for randomness
d) The Wilcoxon signed rank-sum test
e) The Mann-Whitney-Wilcoxon test
f) The Kruskal-Wallis test
g) Spearman's rank correlation test
a) The sign test: When samples are small and the t-test is used to determine whether two population means are equal, it is assumed that each of the two populations is normal and that the variances of the two populations are identical. In many instances either or both of these assumptions may be false, and the t-test cannot be used. In this case we use a non-parametric procedure known as the sign test. First we rephrase our hypothesis: rather than hypothesizing that μ1 = μ2, we hypothesize that the two populations have identical distributions. The two hypotheses are not exactly the same, since two populations may have different distributions and yet possess the same mean. Suppose that A and B are two random variables and it is hypothesized that the population distribution of A is the same as that of B. For testing this hypothesis we draw two samples of equal size, one for A and the other for B. The observations in the two samples may be matched pairs, such as a worker's daily output before a strike and his daily output after the strike. The sign test is based on the signs, plus or minus, of the differences between the observations of matched pairs, without regard to the magnitude of the differences. Thus, where the normality assumption is not warranted and the t-test cannot be used, the sign test provides a convenient substitute for tests involving dependent samples. The test statistic is

Z = (X - np) / √( np(1 - p) )

where X = number of positive signs, n = total number of signs of differences, and p = probability of a positive sign. This can be tested at the 5% or 1% level of significance.
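A minimal sketch of the sign test with hypothetical figures: 30 matched pairs of which 20 differences are positive (ties discarded), tested against p = 1/2:

```python
import math
from scipy.stats import norm

# Hypothetical: 30 matched pairs, 20 of the differences are positive
n, x, p = 30, 20, 0.5          # under H0, + and - signs are equally likely
z = (x - n * p) / math.sqrt(n * p * (1 - p))
p_value = 2 * (1 - norm.cdf(abs(z)))
print(f"z = {z:.2f}, two-tailed p = {p_value:.3f}")   # z = 1.83, p about 0.068
```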

b) The Runs Test of Randomness : The basic assumption in all statistical methods for estimation and hypothesis testing is that the samples obtained are random. To test the randomness of a given sample (i.e. whether the sample observations are independent and identically distributed), one such test is the Runs Test.
The runs test is based on a sample of n observations recorded in the order in which they are made. Taking each pair of consecutive observations in turn, a plus or minus sign is inserted between them depending on whether the earlier observation is less than the later or vice-versa. A run can be defined as a sequence of elements of one kind, preceded and followed by elements of another kind. Defining a run as a continuous sequence of plus or minus signs, the number of runs R can be shown to have an approximately normal distribution with mean (2n − 1)/3 and variance (16n − 29)/90, if the sample is in fact random. This approximation is sufficiently close for most purposes when n > 20 and improves as n increases.
c) Median Test for Randomness : Another test for the randomness of a sample is based upon the number of runs, R, above and below the median of the series as a test statistic. If the series is random, it can be shown that R is approximately normally distributed with mean (n + 2)/2 and variance n(n − 2)/[4(n − 1)]. Again, the approximation improves as n increases.
d) Wilcoxon Signed Rank-Sum Test : The disadvantage of the sign test is that although it takes account of the signs of the differences, it makes no allowance for their magnitudes. The signed rank test developed by Frank Wilcoxon takes into account the magnitude of the difference between paired values. The test requires the ranking of all the absolute differences between paired values from the smallest to the largest. The smallest absolute difference is assigned the rank 1, the next smallest the rank 2, and so on. Any pair with a difference of 0 may be disregarded. Since the differences are ranked without regard to their signs, a difference of −1 or +1 is assigned the same rank; likewise −2 and +2 are assigned the same rank. Once the differences are ranked, the sign of each difference is affixed to its rank. The sums of the negative and positive ranks are then computed. Under the assumption that there is no difference between the population means, it can be shown that for a large sample (n > 30) the statistic T = sum of the ranks of the less frequent sign has approximately a normal distribution with mean n(n + 1)/4 and variance n(n + 1)(2n + 1)/24.
e) Mann-Whitney-Wilcoxon Test : This test is used to test the hypothesis that two independent samples have come from two populations with equal means or medians. The test can be a one- or two-tailed test. The basic assumption of the test is that the distributions of the two populations are continuous with equal standard deviations. If the sample sizes are n1 and n2, the sum of the rank sums R1 and R2 is simply the sum of the first n1 + n2 positive integers, which is known to be
(n1 + n2)(n1 + n2 + 1)/2
This formula enables us to find R2 if we know R1 and vice-versa. The decision on the rank sums is based on either of the related statistics, when each of n1 and n2 is at least 10:
mean of R1 = n1(n1 + n2 + 1)/2
mean of R2 = n2(n1 + n2 + 1)/2
and standard error = √[n1 n2 (n1 + n2 + 1)/12]
f) The Kruskal-Wallis Test : This test is used to determine whether k independent samples can be regarded as having been obtained from identical populations with respect to their means. The Kruskal-Wallis test is the non-parametric counterpart of the one-way analysis of variance. The assumption of the F-test was that each of the k populations should be normal with equal variance. The Kruskal-Wallis test, however, only assumes that the k populations are continuous and have the same pattern of (symmetrical) distribution. The null and alternative hypotheses of the K-W test are :
H0 : m1 = m2 = … = mk (i.e. the means of the k populations are equal)
H1 : not all the mi are equal.
The test statistic, denoted by H, is given by
H = [12 / (n(n + 1))] × [R1²/n1 + R2²/n2 + … + Rk²/nk] − 3(n + 1)

It can be shown that the distribution of H is χ² with k − 1 degrees of freedom when the size of each sample is at least 5. Thus, if H > χ²(k−1), H0 is rejected.
g) The Spearman's Rank Correlation Test : The Spearman's rank correlation test can be used to test the significance of correlation in a population, both when ranks are given and when ranks are not given. It is given by
R = 1 − [6Σd²] / [n(n² − 1)]
h) Chi-Square Test (χ²) : The χ² test is one of the simplest and most widely used non-parametric tests in statistical work. The quantity χ² describes the magnitude of the discrepancy between theory and observation. It is defined as :
χ² = Σ[(O − E)² / E]
where O refers to the observed frequencies and E refers to the expected frequencies. In general, the expected frequencies are estimated by
E = (RT × CT) / N
RT - the row total for the row containing the cell
CT - the column total for the column containing the cell
N - the total number of observations.
The calculated value of χ² is compared with the table value of χ² for the given degrees of freedom at a certain specified level of significance. If at the stated level (generally 5%) the calculated value of χ² is more than the table value of χ², the difference between theory and observation is considered to be significant, i.e. it could not have arisen due to fluctuations of simple sampling. If, on the other hand, the calculated value of χ² is less than the table value, the difference between theory and observation is not considered significant, i.e. it is regarded as due to fluctuations of simple sampling and hence ignored. The value of χ² is always positive and its upper limit is infinity. Also, since χ² is derived from observations, it is a statistic and not a parameter. It is one of the great advantages of this test that it involves no assumption about the form of the original distribution from which the observations come.
Degrees of Freedom : While comparing the calculated value of χ² with the table value we have to determine the degrees of freedom. By degrees of freedom we mean the number of classes to which values can be assigned arbitrarily, or at will, without violating the restrictions or limitations placed. For example, if we are to choose any 5 numbers whose total is 100, we can choose any 4 numbers freely and the 5th number is fixed by virtue of the total being 100. Suppose the 4 numbers are 20, 35, 15 and 10; the 5th number is 100 − (20 + 35 + 15 + 10) = 20.
The degrees of freedom are ν = n − k, where n = number of observations and k = number of independent constraints.
Uses of the χ² test :
i) χ² test as a test of Independence : With the help of the χ² test we can find whether two or more attributes are associated or not. We take the null hypothesis that there is no association between the attributes. If the calculated value of χ² is less than the table value at a certain level of significance (say the 5% level), the hypothesis that the attributes are not associated holds good. If the calculated value of χ² is greater than the table value, the attributes are associated.
ii) χ² test as a test of Goodness of Fit : The χ² test as a test of goodness of fit enables us to ascertain how appropriately theoretical distributions such as the Binomial, Poisson and Normal fit empirical distributions, i.e. those obtained from sample data.

iii) χ² test as a test of Homogeneity : The χ² test of homogeneity is an extension of the chi-square test of independence. Tests of homogeneity are designed to determine whether two or more independent random samples are drawn from the same population or from different populations.

Q. 7. Find the coefficient of correlation from the following data.
x : 300 350 400 450 500 550 600 650 700
y : 800 900 1000 1100 1200 1300 1400 1500 1600
Ans.
X      x = (X − 500)   x²        Y      y = (Y − 1200)   y²        xy
300    −200            40000     800    −400             160000    80000
350    −150            22500     900    −300             90000     45000
400    −100            10000     1000   −200             40000     20000
450    −50             2500      1100   −100             10000     5000
500    0               0         1200   0                0         0
550    +50             2500      1300   +100             10000     5000
600    +100            10000     1400   +200             40000     20000
650    +150            22500     1500   +300             90000     45000
700    +200            40000     1600   +400             160000    80000
ΣX = 4500   Σx = 0   Σx² = 150000   ΣY = 10800   Σy = 0   Σy² = 600000   Σxy = 300000
Since the actual means are whole numbers, we use the actual-mean method of Karl Pearson to calculate the coefficient of correlation. Its formula is :
r = Σxy / √(Σx² × Σy²)
where x = X − X̄ and y = Y − Ȳ. Putting the values in the formula,
r = 300000 / √(150000 × 600000) = 300000 / 300000 = 1
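The same computation can be checked mechanically; the following minimal Python sketch reproduces the actual-mean method used above.

```python
import math

x = [300, 350, 400, 450, 500, 550, 600, 650, 700]
y = [800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600]

# Deviations from the actual means (500 and 1200).
mx, my = sum(x) / len(x), sum(y) / len(y)
dx = [xi - mx for xi in x]
dy = [yi - my for yi in y]

sxy = sum(a * b for a, b in zip(dx, dy))   # sum of xy  = 300000
sxx = sum(a * a for a in dx)               # sum of x^2 = 150000
syy = sum(b * b for b in dy)               # sum of y^2 = 600000

r = sxy / math.sqrt(sxx * syy)
print(r)  # 1.0 -- the two series move in perfect lockstep
```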

8) Find the regression equations of x and y and the coefficient of correlation from the following data :
Σx = 60, Σy = 40, Σxy = 1500, Σx² = 4160, Σy² = 1720, N = 10

Ans 8 i) The regression equation of y on x is
y = a + bx ------ (1)
and the two normal equations are
Σy = Na + bΣx ------ (2)
Σxy = aΣx + bΣx² ------ (3)
Putting the values in the two simultaneous equations (2) and (3),
40 = 10a + 60b ------ (2)
1500 = 60a + 4160b ------ (3)
Multiplying equation (2) by 6,
240 = 60a + 360b ------ (4)
1500 = 60a + 4160b ------ (5)
Deducting equation (5) from (4),
−3800b = −1260
b = 1260/3800 = 0.33
Substituting the value of b in equation (2),
40 = 10a + 60 × 0.33 = 10a + 19.8
10a = 40 − 19.8 = 20.2
a = 20.2/10 = 2.02
Putting the values of a and b in the equation, the regression of y on x is
y = 2.02 + 0.33x
The regression equation of x on y is x = a + by, and the two normal equations are :
Σx = Na + bΣy
Σxy = aΣy + bΣy²
Substituting the values,
60 = 10a + 40b ------ (1)
1500 = 40a + 1720b ------ (2)
Multiplying equation (1) by 4,
240 = 40a + 160b ------ (3)
1500 = 40a + 1720b ------ (4)
Deducting equation (4) from (3),
−1560b = −1260
b = 1260/1560 = 0.81 (taken as 0.8)
Substituting the value of b in equation (1),
60 = 10a + 40 × 0.8
10a + 32 = 60
10a = 60 − 32 = 28
a = 28/10 = 2.8
Putting the values of a and b in the equation, the regression line of x on y is
x = 2.8 + 0.8y

ii) To calculate the coefficient of correlation from the given data, we use the direct method of finding correlation, for which the formula is
r = [NΣxy − (Σx)(Σy)] / √{[NΣx² − (Σx)²][NΣy² − (Σy)²]}
Putting the values in the formula,
r = [(10 × 1500) − (60)(40)] / √{[(10 × 4160) − (60)²][(10 × 1720) − (40)²]}
= (15000 − 2400) / √[(41600 − 3600)(17200 − 1600)]
= 12600 / √(38000 × 15600)
= 12600 / 24347.5 ≈ 0.52
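As a check, the two regression lines and r can also be obtained directly from the summary statistics; the sketch below uses the closed-form solution of the normal equations (the small differences from the figures above come from rounding in the hand computation).

```python
import math

# Summary statistics given in the question.
N, sx, sy, sxy, sxx, syy = 10, 60, 40, 1500, 4160, 1720

# Slopes of the two least-squares lines, from the normal equations:
# b(yx) = (N*Sxy - Sx*Sy) / (N*Sxx - Sx^2), and symmetrically for b(xy).
byx = (N * sxy - sx * sy) / (N * sxx - sx ** 2)   # slope of y on x
bxy = (N * sxy - sx * sy) / (N * syy - sy ** 2)   # slope of x on y

# Intercepts via the means (both lines pass through (x-bar, y-bar)).
xbar, ybar = sx / N, sy / N
a_yx = ybar - byx * xbar
a_xy = xbar - bxy * ybar

r = (N * sxy - sx * sy) / math.sqrt((N * sxx - sx ** 2) * (N * syy - sy ** 2))
print(f"y = {a_yx:.2f} + {byx:.2f}x")   # y = 2.01 + 0.33x
print(f"x = {a_xy:.2f} + {bxy:.2f}y")   # x = 2.77 + 0.81y
print(f"r = {r:.2f}")                   # r = 0.52
```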

2)(a) Write short notes on the following :
i) Additive rule of Probability ii) Normal distribution
(b) One bag contains 5 red balls and 3 white balls. A second bag contains 4 red balls and 7 black balls. If one ball is drawn at random from each bag, what is the probability that both are of the same colour?
Ans. : i) Additive Rule of Probability : The addition rule of probability states that if two events A and B are mutually exclusive, the probability of the occurrence of either A or B is the sum of the individual probabilities of A and B :
P (A or B) = P (A) + P (B)
Proof of the Theorem : If an event A can happen in a1 ways and B in a2 ways, then the number of ways in which either event can happen is a1 + a2. If the total number of possibilities is n, then by definition the probability of either the first or the second event happening is
(a1 + a2)/n = a1/n + a2/n
but a1/n = P (A) and a2/n = P (B). Thus, P (A or B) = P (A) + P (B).
The additive rule can be extended to three or more mutually exclusive events. Thus,
P (A or B or C) = P (A) + P (B) + P (C)
When events are not mutually exclusive : When events are not mutually exclusive, or in other words it is possible for both events to occur, the addition rule must be modified. For example, what is the probability of drawing a king or a heart from a standard pack of playing cards? The events "a king" and "a heart" can occur together (the king of hearts). So the addition rule becomes
P (A or B) = P (A) + P (B) − P (A and B)
Multiplication Rule of Probability : This theorem states that if two events A and B are independent, the probability that they both will occur is equal to the product of their individual probabilities. If A and B are independent, then
P (A and B) = P (A) × P (B)
Proof of the Theorem : If an event A can happen in n1 ways of which a1 are successful, and the event B can happen in n2 ways of which a2 are successful, we can combine each successful event in the first case with each successful event in the second case. Thus, the total number of successful happenings in both cases is a1 × a2. Similarly, the total number of possible cases is n1 × n2. The probability of the occurrence of both events is
(a1 × a2)/(n1 × n2) = (a1/n1) × (a2/n2)
But a1/n1 = P (A) and a2/n2 = P (B), so P (A and B) = P (A) × P (B).
The theorem can be extended to three or more events.

ii) Normal Distribution : The normal distribution, also called the normal probability distribution, is the most useful probability distribution for continuous variables, the reason being that so many physical measurements and natural phenomena have actual observed frequency distributions that closely resemble the normal distribution. The normal distribution is the most frequently used of all probability distributions (including the Binomial and Poisson). The probability distributions of most sample statistics are derived from and closely connected with the normal distribution. The fundamental importance of the normal distribution in statistics arises from the fact that the distribution of sample means and other statistics for large sample sizes is approximately normal, even though the parent population may not be normal. The normal distribution has convenient mathematical properties and it also serves as an approximation to discrete probability distributions such as the binomial and Poisson.
Shape and Properties of the Normal Curve :
1) The normal curve is bell-shaped and symmetrical and depends on two parameters, the mean and the standard deviation. While it is always bell-shaped and symmetrical about its mean, its actual shape is determined by the standard deviation of the distribution.
2) All its measures of central tendency are equal (mean = median = mode).
3) All observations are included inside the curve above the X-axis.
4) The area lying between the normal curve and the horizontal axis is said to be the area under the curve and is equal to the number of frequencies in the distribution. The area under the normal curve is distributed as follows :
a) Mean ± 1σ covers 68.27% of the area; 34.135% of the area lies on either side of the mean.
b) Mean ± 2σ covers 95.45% of the area.
c) Mean ± 3σ covers 99.73% of the area.
b) Probability of drawing a red ball from the first bag = 5/8
Probability of drawing a red ball from the second bag = 4/11
Red is the only colour common to the two bags, so "both of the same colour" means both red. Since the events are independent, the probability that both balls are red = 5/8 × 4/11 = 20/88 ≈ 0.23
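As a check on the multiplication rule, the ball computation can be done with exact fractions; a minimal sketch:

```python
from fractions import Fraction

# Bag 1: 5 red, 3 white.  Bag 2: 4 red, 7 black.
# Red is the only colour common to both bags, so "both the same colour"
# can only mean both red; the two draws are independent.
p_red1 = Fraction(5, 8)
p_red2 = Fraction(4, 11)
p_same = p_red1 * p_red2
print(p_same, float(p_same))  # 5/22 ~ 0.227
```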

Q. 3. Explain and illustrate with the help of suitable examples the reasons for and theoretical basis of sampling. What precautions should be taken while collecting sample data through the mail questionnaire method?
Ans : Refer to question paper 8560.
Q4. Distinguish between correlation and regression. From the following data obtain the two regression equations and also calculate the coefficient of correlation
X : 2 4 6 8
Y : 5 10 7 14
Ans : The techniques of regression and correlation are different, but they are also related. The coefficient of correlation (r) is related to the coefficient of determination R² in regression analysis. Also, the correlation coefficient (r) takes the same sign as the regression coefficient (b). However, the differences between correlation and regression are :
1) Regression analysis aims at establishing the functional relationship between the dependent and independent variables, such that the relationship can be used for prediction of the dependent variable on the basis of the independent variables. Correlation aims at measuring the degree of covariation between X and Y, but it does not imply a functional relationship. Therefore, the correlation coefficient (r) does not by itself make it possible to predict Y on the basis of X.
2) Correlation is merely a tool for ascertaining the degree of relationship between two variables, and therefore we cannot say that one variable is the cause and the other the effect. However, in regression analysis one variable is taken as dependent while the other is independent, thus making it possible to study the cause and effect relationship.
3) In correlation analysis rxy is a measure of the direction and degree of linear relationship between two variables x and y; rxy and ryx are symmetric, so it makes no difference which of x and y is treated as dependent. In regression analysis the regression coefficients bxy and byx are not symmetric, and hence it makes a difference which variable is dependent and which is independent.
4) There may be nonsense correlation due to chance, but there is nothing like nonsense regression.
5) The correlation coefficient is independent of change of scale and origin. Regression coefficients are independent of change of origin but not of scale.

ii)
X    x = (X − 5)   x²    Y    y = (Y − 9)   y²    xy
2    −3            9     5    −4            16    12
4    −1            1     10   +1            1     −1
6    +1            1     7    −2            4     −2
8    +3            9     14   +5            25    15
ΣX = 20   Σx² = 20   ΣY = 36   Σy² = 46   Σxy = 24
X̄ = 20/4 = 5,  Ȳ = 36/4 = 9
Regression equation of y on x :
y − Ȳ = (Σxy / Σx²)(x − X̄)
Σxy / Σx² = 24/20 = 1.2
y − 9 = 1.2(x − 5)
y − 9 = 1.2x − 6
y = 3 + 1.2x
Regression equation of x on y :
x − X̄ = (Σxy / Σy²)(y − Ȳ)
Σxy / Σy² = 24/46 = 0.52
x − 5 = 0.52(y − 9)
x − 5 = 0.52y − 4.68
x = 0.32 + 0.52y
So the two regression equations are :
y on x : y = 3 + 1.2x
x on y : x = 0.32 + 0.52y
Coefficient of correlation :
r = √(bxy × byx) = √(0.52 × 1.2) = √0.624 ≈ 0.79
(bxy = 0.52, byx = 1.2)
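The deviation-from-mean computation above can be verified with a short Python sketch:

```python
X = [2, 4, 6, 8]
Y = [5, 10, 7, 14]

xbar, ybar = sum(X) / len(X), sum(Y) / len(Y)   # 5 and 9
dx = [v - xbar for v in X]
dy = [v - ybar for v in Y]

sxy = sum(a * b for a, b in zip(dx, dy))        # 24
sxx = sum(a * a for a in dx)                    # 20
syy = sum(b * b for b in dy)                    # 46

byx = sxy / sxx                                 # 1.2
bxy = sxy / syy                                 # ~0.52
print(f"y on x: y = {ybar - byx * xbar:.2f} + {byx:.2f}x")
# x-intercept prints 0.30 rather than 0.32 because the hand
# computation above rounds bxy to 0.52 before substituting.
print(f"x on y: x = {xbar - bxy * ybar:.2f} + {bxy:.2f}y")
print(f"r = {(byx * bxy) ** 0.5:.2f}")          # sqrt(1.2 * 0.522) ~ 0.79
```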

Q5. Distinguish between parametric and non-parametric tests. In an experiment on anti-malaria treatment the following results were obtained :
Treatment     Fever   No Fever   Total
Quinine       12      28         40
No Quinine    48      12         60
Total         60      40         100
Discuss the effectiveness of quinine in checking malaria.
Ans : There are two types of tests - parametric and non-parametric. Parametric tests assume that parameters such as the mean, standard deviation etc. exist and are used in testing a hypothesis. The underlying assumption in such tests is that the source of the data is normally distributed. In some cases, however, the population may not have a normal distribution, and then such tests cannot be properly used. The parametric tests that are commonly used are :
a) Z-test b) t-test c) F-test
But there are certain situations, particularly in psychological or marketing research studies, where the assumption underlying the parametric tests is not valid. There is then no assumption that a particular distribution is applicable or that a certain value is attached to a parameter of the population. Hence non-parametric tests are used. These tests are also known as distribution-free tests. They include the chi-square test, rank-sum tests, the run test for randomness etc. A major advantage of non-parametric tests is that they are quick and easy to use. Moreover, when data are not as accurate as they should be for the proper application of standard tests of significance, these tests, which are very convenient to use, can give fairly satisfactory results. However, the use of a non-parametric test involves a greater risk of accepting a false hypothesis and thus committing a Type II error; they yield less precise results than parametric tests, as the null hypothesis is more loosely defined.
Let us take the hypothesis that quinine is not effective in checking malaria. Applying the χ² test, the observed frequencies are given and we now find the expected frequencies.
Expected frequency corresponding to a cell = (RT × CT)/N
RT - row total, CT - column total, N - total number of observations
For example, the expected frequency corresponding to the observed frequency 12 = (40 × 60)/100 = 24.
The expected frequencies are :
Treatment     Fever   No Fever   Total
Quinine       24      16         40
No Quinine    36      24         60
Total         60      40         100
Observed frequency (O)   Expected frequency (E)   (O − E)²   (O − E)²/E
12                       24                       144        6
48                       36                       144        4
28                       16                       144        9
12                       24                       144        6
χ² = Σ(O − E)²/E = 25
Degrees of freedom ν = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1, where r = number of rows and c = number of columns.
The table value of χ² at the 5% significance level (assumed, as it is not given in the question) is 3.84. The calculated value of χ² (25) is greater than the table value 3.84. The hypothesis that quinine is not effective in checking malaria is rejected. Hence quinine is useful in checking malaria.
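For verification, the same test is available in SciPy. Note that correction=False is needed to reproduce the plain χ² computed by hand, since SciPy applies Yates' continuity correction to 2x2 tables by default.

```python
from scipy.stats import chi2_contingency

observed = [[12, 28],   # quinine:    fever, no fever
            [48, 12]]   # no quinine: fever, no fever

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(chi2, dof, p)   # 25.0, 1, p ~ 5.7e-07 -> reject H0 at the 5% level
print(expected)       # [[24. 16.] [36. 24.]] -- matches the hand computation
```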
Q. 6) Distinguish between estimate, estimation and estimator. Also discuss the properties of a good estimator.
Ans. : Estimation is a procedure by which sample information is used to estimate the numerical magnitude of one or more parameters of the population. A function of the sample values is called an estimator (or statistic), while its numerical value is called an estimate. For example, X̄ is an estimator of the population mean μ. On the other hand, if X̄ = 50 for a sample, the estimate of the population mean is said to be 50.
Properties of a good estimator : R.A. Fisher has given the following properties of good estimators. These are :
a) Unbiasedness : An estimator is said to be unbiased if its expected value is identical with the population parameter being estimated. That is, an estimator t(X1, X2, …, Xn) is said to be an unbiased estimator of a parameter θ if E(t) = θ.
b) Consistency : It is desirable to have an estimator with a probability distribution that comes closer and closer to the population parameter as the sample size is increased. An estimator possessing this property is called a consistent estimator. An estimator tn(X1, X2, …, Xn) is said to be consistent if its probability distribution converges to θ as n increases, i.e.
P(|tn − θ| < ε) → 1 as n → ∞, for every ε > 0.
c) Efficiency : Let t1 and t2 be two estimators of a population parameter θ such that both are either unbiased or consistent. To select a good estimator from t1 and t2, we consider another property, based upon variance: t1 is said to be more efficient than t2 if variance (t1) < variance (t2). The efficiency of an estimator is thus measured by its variance. If the population is symmetrically distributed, then both the sample mean and the sample median are consistent and unbiased estimators of the population mean μ; yet the sample mean is better than the sample median, as the variance of the sample mean is less than the variance of the sample median.
d) Sufficiency : An estimator t is said to be a sufficient estimator of θ if it utilizes all the information given in the sample about θ. For example, the sample mean is a sufficient estimator of μ, because no other estimator of μ can add any further information about μ. The following must be noted about sufficient estimators :
1) A sufficient estimator is always consistent.
2) A sufficient estimator is most efficient if an efficient estimator exists.
3) A sufficient estimator may or may not be unbiased.
8. Explain and illustrate the basic principles underlying the control chart. Discuss how control limits are determined for :
i) Mean chart ii) Range chart
Ans : The quality of a large number of products is judged on the basis of measurements of characteristics such as length, diameter, weight etc. These variables are of a continuous type. For continuous variables, Shewhart developed control charts known as X̄ and R charts to keep a control on the quality of the product. These charts involve the location parameter (the mean) and the scale parameters (the range and the standard deviation). The mean chart (X̄ chart) is a graphic device which depicts the measurements, revealing the extent of their scatter about the standard value. A control chart has three horizontal lines: the lower line, the middle line and the upper line. The middle or central line (C.L.) shows the standard value of the quality characteristic (variable) of the manufactured units. The lower line is the lower control limit (L.C.L.) of the variate values and the upper line is the upper control limit (U.C.L.) of the variate values.

[Figure : a control chart - the statistic value is plotted against the sample number, with an acceptance region between the upper and lower control limits and rejection regions beyond them.]
A small sample is drawn at regular intervals, and the decision about the process, whether it is under control or not, is taken on the basis of the plotted points, which here are the mean values of the samples plotted against the sample numbers. The decision criterion is: if all the plotted points lie on or in between the upper and lower control limits, the process is under control. On the contrary, if any one or more points lie outside the control limits, it is concluded that the process is not under control, which indicates the presence of some assignable causes.
Setting of control limits :
Let μ be the mean and σ the standard deviation of the variate values for all the units of the process. In the case of a normal population, 99.73% of the units fall within the μ ± 3σ limits. Hence the process is under control if the mean values of the samples fall within plus and minus three times the standard error of the statistic from the true mean. The control limits for the X̄ chart are set at
U.C.L. = X̿ + 3σx̄
L.C.L. = X̿ − 3σx̄
where X̿ (the grand mean) = ΣX̄ / number of samples.
When the standard deviation is not known, the range is calculated and the control limits are
U.C.L. = X̿ + A2 R̄
L.C.L. = X̿ − A2 R̄
where R̄ = mean sample range and A2 = tabulated value.
Range Chart (R chart) : In the case of small samples, the standard deviation S and the range R fluctuate together: if S is small, R is also small. In statistical quality control generally small samples are drawn, and hence the range can be used in place of the standard deviation. The charts constructed using the range are termed R-charts. The R-chart is used to show the variability or dispersion of the quality produced by a given process. The R chart (range chart) is the companion chart to the X̄ chart, and both are required for an adequate analysis of the production process under study. The U.C.L. and L.C.L. of the R-chart are
U.C.L.(R) = R̄ + 3σR
L.C.L.(R) = R̄ − 3σR
where R̄ = mean of the sample ranges and σR = the standard error of the range. Where the tabulated values are used, the two limits are
U.C.L.(R) = D4 R̄
L.C.L.(R) = D3 R̄
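A minimal sketch of setting these limits from subgroup data follows; the measurements are hypothetical, and A2, D3, D4 are the standard tabulated constants for subgroups of size n = 5.

```python
# Standard control-chart constants for subgroups of size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114

# Hypothetical measurements: 6 subgroups of 5 items each.
subgroups = [
    [5.02, 5.01, 4.98, 5.00, 4.99],
    [4.97, 5.03, 5.00, 5.01, 4.98],
    [5.00, 4.99, 5.02, 4.97, 5.01],
    [5.01, 5.00, 4.99, 5.02, 5.00],
    [4.98, 5.01, 5.00, 4.99, 5.03],
    [5.00, 5.02, 4.98, 5.01, 4.99],
]

xbars = [sum(s) / len(s) for s in subgroups]
ranges = [max(s) - min(s) for s in subgroups]
xbarbar = sum(xbars) / len(xbars)   # grand mean (centre line of the X-bar chart)
rbar = sum(ranges) / len(ranges)    # mean range (centre line of the R chart)

print(f"X-bar chart: CL={xbarbar:.3f}, "
      f"UCL={xbarbar + A2 * rbar:.3f}, LCL={xbarbar - A2 * rbar:.3f}")
print(f"R chart:     CL={rbar:.3f}, UCL={D4 * rbar:.3f}, LCL={D3 * rbar:.3f}")
```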

Q. 1. Point out the role and limitations of sampling. Explain critically any two probability sampling methods.
Ans. Already discussed.
Q. 2. Give a note on measures of central tendency together with their merits and demerits. Which is the best measure of central tendency and why?
Ans. One of the most important objectives of statistical analysis is to get one single value that describes the characteristics of the entire mass of data. Such a value is called the central value or an average. This single value is the point of location around which the individual values cluster, and it is therefore also called a measure of location. The different measures of central tendency are :
1) Arithmetic Mean 2) Median 3) Mode 4) Geometric Mean 5) Harmonic Mean
1. Arithmetic Mean :
Merits : The arithmetic mean is most widely used in practice because of the following reasons :
a) It is the simplest average to understand and the easiest to compute.
b) It is affected by the value of every item in the series.
c) It is defined by a rigid mathematical formula, with the result that everyone who computes the average gets the same answer.
d) The mean is typical in the sense that it is the centre of gravity, balancing the values on either side of it.
e) It is a calculated value, and not based on position in the series.
Demerits :
a) Since the value of the mean depends on each and every item of the series, extreme items, i.e. very small and very large items, unduly affect the value of the average.
b) In a distribution with open-end classes the value of the mean cannot be computed without making assumptions regarding the size of the class interval of the open-end classes.
c) The arithmetic mean is a good measure of central tendency only when the distribution of the variable is reasonably normal.
2. Median : The median is the middle value in a distribution.
Merits :
a) It is especially useful in the case of open-end classes, since only the position, and not the values, of the items must be known.
b) Extreme values do not affect the median as strongly as they do the mean. For example, the median of 10, 20, 30, 40 and 150 would be 30, whereas the mean is 50. With extreme observations the median is a more satisfactory measure of central tendency than the mean.
c) In markedly skewed distributions, such as income distributions or price distributions, where the mean is distorted, the median is especially useful.
d) It is the most appropriate average in dealing with qualitative data, i.e. where ranks are given.

e) The median actually indicates the value of the middle item in the distribution.
Demerits :
a) For calculating the median it is necessary to arrange the data; other averages do not need any arrangement.
b) Since it is a positional average, its value is not determined by each and every observation.
c) It is not capable of algebraic treatment.
d) The value of the median is affected more by sampling fluctuations than the value of the arithmetic mean.
3. Mode : The mode or modal value is that value in a series of observations which occurs with the greatest frequency.
Merits :
a) By definition the mode is the most typical or representative value of a distribution. It is the most frequently occurring value.
b) Like the median, the mode is not unduly affected by extreme values.
c) Its value can be determined in open-end distributions without ascertaining the class limits.
d) It can be used to describe qualitative phenomena.
e) The value of the mode can be determined graphically, whereas the value of the mean cannot be graphically ascertained.
4. Geometric Mean : The geometric mean is defined as the Nth root of the product of N items or values.
Merits :
a) It is based on each and every item of the series.
b) It is rigidly defined.
c) It is useful in averaging ratios and percentages and in determining rates of increase and decrease.
d) It gives less weight to large items and more to small ones than does the arithmetic average.
e) It is capable of algebraic manipulation.
Demerits :
a) It is difficult to understand.
b) It is difficult to compute and to interpret, and so has restricted application.
c) It cannot be computed when there are both negative and positive values in a series, or when one or more of the values are 0.
5. Harmonic Mean : The harmonic mean is based on the reciprocals of the numbers averaged. It is defined as the reciprocal of the arithmetic mean of the reciprocals of the individual observations.
Merits :
a) Its value is based on every item of the series.
b) It lends itself to algebraic manipulation.
c) In problems relating to time and rates it gives better results than other averages.
Demerits :
a) It is not easily understood.

b) It is difficult to compute.
c) It gives largest weight to the smallest items; this is generally not a desirable feature.
d) Its value cannot be computed when there are both positive and negative items in a series, or when one or more items are zero.
Best Measure of Central Tendency : One thing that is clear after describing the merits and demerits of the various types of averages is that no one average can be regarded as best for all circumstances. The following considerations influence the selection of an appropriate average :
a) The purpose which the average is designed to serve.
b) Whether the average will be used for further computations.
c) The type of data available; skewed or not skewed.
d) The typical value required in the particular problem.
However, apart from some specific cases where either the median, mode, geometric mean or harmonic mean is more appropriate, the arithmetic mean is the most popular and widely used average in practice. In the following cases the arithmetic mean should not be used and other measures of central tendency are preferred :
1) In highly skewed distributions, where either the median or the mode is used.
2) In distributions with open-end classes, where the median is generally the best average.
3) When the distribution is unevenly spread, the concentration being small or large at irregular points, the mode is preferred. The mode can also be used in problems where quantitative measurement is not possible.
4) The arithmetic mean should not be used to average ratios and rates of change; in such cases the geometric mean is more suitable.
5) In problems where values of a variable are compared with a constant quantity of another variable, i.e. rates, time, distance etc., the harmonic mean is more useful than the arithmetic mean.
Q3. What is meant by correlation? What is the significance of positive and negative correlation? Distinguish between simple, multiple and partial correlation.
Ans. In business we come across a large number of problems involving the use of two or more variables. If two quantities vary in such a way that movements in one are accompanied by movements in the other, these quantities are said to be correlated. The statistical tool with the help of which the relationship between two or more variables is studied is called correlation.
Significance of the study of correlation : The study of correlation is of immense use in practical life because of the following reasons :
1. Most variables show some kind of relationship; for example, there is a relationship between price and demand, income and expenditure etc. With the help of correlation analysis we can measure in one figure the degree of relationship existing between the variables.
2. Once it is established that two variables are closely related, we can estimate the value of one variable given the value of the other with the help of regression analysis.

3. In business, correlation enables the executive to estimate costs, rates, prices and other variables on the basis of some other series with which they may be functionally related.
Types of Correlation : Three of the most important types of correlation are :
1. Positive and Negative Correlation, 2. Simple, Partial and Multiple Correlation, and 3. Linear and Non-linear Correlation.
1. Positive and Negative Correlation : If both variables vary in the same direction, i.e. if as one variable increases the other on an average also increases, or if as one variable decreases the other on an average also decreases, correlation is said to be positive. If the variables vary in opposite directions, i.e. as one variable increases the other decreases or vice versa, correlation is said to be negative.
2. Simple, Partial and Multiple Correlation : When only two variables are studied, it is a problem of simple correlation. When three or more variables are studied simultaneously, it is a problem of multiple correlation. In partial correlation more than two variables are recognized, but only two are considered to be influencing each other, the effect of the other influencing variables being kept constant.
3. Linear and Non-Linear (Curvilinear) Correlation : The distinction is based upon the constancy of the ratio of change between the variables. If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear. Correlation is non-linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
Methods of Studying Correlation : The following are the important methods of finding out whether two variables are correlated or not :
1. Scatter Diagram Method, 2. Karl Pearson's coefficient of correlation, 3. Spearman's rank correlation coefficient, and 4. Method of least squares.
Q.4. State and explain Bayes' Theorem and bring out its importance in probability theory. What is conditional probability? Explain with the help of an example.
Ans. (a) is already discussed.
(b) Conditional Probability : Independent Events. If the probability of an event is subject to a restriction on the sample space, the probability is said to be conditional. Conditional probability is the probability of the occurrence of an event, say A, subject to the occurrence of a previous event, say B. We define the conditional probability of event A, given that B has occurred, in the case of A and B being independent events, as simply the probability of event A :
P (A/B) = P (A)
This is so because independent events are those whose probabilities are in no way affected by the occurrence of each other.
Conditional Probability : Dependent Events. The conditional probability of event A, given that event B has occurred, when A and B are dependent events, is the ratio of the probability of the elements common to both A and B to the probability of B :
P (A/B) = P (A and B) / P (B)

Example : A bag contains 5 white and 3 black balls. Two balls are drawn at random one after the other without replacement. Find the probability that both balls drawn are black.
Solution : The probability of drawing a black ball in the first attempt is
P (A) = 3/(5 + 3) = 3/8
The probability of drawing a second black ball, given that the first ball drawn is black, is
P (B/A) = 2/(5 + 2) = 2/7
The probability that both balls drawn are black is given by
P (AB) = P (A) × P (B/A) = 3/8 × 2/7 = 3/28
Q. 5. Name the different tests of consistency used in the selection of an appropriate index formula. Verify whether Fisher's Ideal Index formula satisfies such tests.
Ans. 5. Several formulae have been suggested for constructing index numbers, and the problem is that of selecting the most appropriate one in a given situation. The following tests are suggested for choosing an appropriate index :
1. Unit test 2. Time Reversal test 3. Factor Reversal test 4. Circular test
1. Unit Test : The unit test requires that the formula for constructing an index should be independent of the units in which prices and quantities are quoted.
2. Time Reversal Test : The time reversal test is a test to determine whether a given method will work both ways in time, forward and backward. It means that when the data for any two years are treated by the same method, but with the bases reversed, the two index numbers secured should be reciprocals of each other, so that their product is unity :
P01 × P10 = 1
where P01 is the index for time 1 with 0 as base and P10 is the index for time 0 with 1 as base. Fisher's method does satisfy this test. Fisher's formula is
P01 = √[(Σp1q0 / Σp0q0) × (Σp1q1 / Σp0q1)]
Changing the times, i.e. 0 to 1 and 1 to 0,
P10 = √[(Σp0q1 / Σp1q1) × (Σp0q0 / Σp1q0)]
P01 × P10 = √[(Σp1q0 / Σp0q0) × (Σp1q1 / Σp0q1) × (Σp0q1 / Σp1q1) × (Σp0q0 / Σp1q0)] = √1 = 1

Since P01 × P10 = 1, Fisher's formula satisfies the test.

3. Factor Reversal Test : According to this test, the product of the price index and the quantity index should be equal to the corresponding value index. The total value of a given commodity in a given year = price × quantity. The ratio of the total value in one year to the total value in the preceding year is Σp1q1 / Σp0q0, so the test requires
P01 × Q01 = Σp1q1 / Σp0q0
The factor reversal test is satisfied only by Fisher's Ideal Index :
P01 = √[(Σp1q0 / Σp0q0) × (Σp1q1 / Σp0q1)]
Changing p to q and q to p,
Q01 = √[(Σq1p0 / Σq0p0) × (Σq1p1 / Σq0p1)]
P01 × Q01 = √[(Σp1q0 / Σp0q0) × (Σp1q1 / Σp0q1) × (Σq1p0 / Σq0p0) × (Σq1p1 / Σq0p1)]
= √[(Σp1q1)² / (Σp0q0)²] = Σp1q1 / Σp0q0
4. Circular Test : If an index number is used not only for comparison but also to measure price changes over a period of years, it is desirable to shift the base. This test enables us to adjust the index values from period to period without referring each time to the original base. A test of this shiftability of base is called the circular test :
P01 × P12 × P20 = 1
This test is just an extension of the time reversal test. It requires that if an index is constructed for year a on base year b, and for year b on base year c, we should get the same result as if we calculated an index for year a on base year c directly, without going through b as an intermediary. Fisher's Ideal formula does not satisfy this test. Only the simple aggregative method and the fixed weight aggregative method satisfy it.
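The two tests can be verified numerically for Fisher's formula; the prices and quantities below are hypothetical.

```python
import math

# Hypothetical prices and quantities for two periods (0 = base, 1 = current).
p0, q0 = [10, 8, 5], [30, 15, 20]
p1, q1 = [12, 9, 6], [25, 18, 22]

def s(a, b):
    """Sum of products, i.e. 'sigma a*b'."""
    return sum(x * y for x, y in zip(a, b))

def fisher(pa, qa, pb, qb):
    """Fisher's Ideal Index for period b with period a as base."""
    laspeyres = s(pb, qa) / s(pa, qa)
    paasche = s(pb, qb) / s(pa, qb)
    return math.sqrt(laspeyres * paasche)

P01 = fisher(p0, q0, p1, q1)             # price index, base 0
P10 = fisher(p1, q1, p0, q0)             # price index with the base reversed
Q01 = fisher(q0, p0, q1, p1)             # quantity index (p and q swapped)

print(P01 * P10)                         # 1.0 -> time reversal test holds
print(P01 * Q01, s(p1, q1) / s(p0, q0))  # equal -> factor reversal test holds
```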
Q. 6. Explain the Central Limit Theorem. What is its significance? Explain the concept of the Null Hypothesis and Type I and Type II errors.
Ans. Already discussed.
Q. 7. Explain the different types of control charts popularly used in practice.
Ans. Introduction already discussed in Paper Code 2624-X, Q. 8. The different types of control charts can be divided into two groups :
1. Control charts for variables
2. Control charts for attributes.

1. Control Charts for Variables : Variables are those quality characteristics of a product which are measurable and can be expressed in specific units of measurement. This group includes the mean (X̄) chart and the R chart.
2. Control Charts for Attributes : Attributes are those product characteristics which are not amenable to measurement. They can only be identified by their presence in or absence from the product. This group includes the p chart and the c chart.
C Chart : The C-chart is designed to control the number of defects per unit. The control chart for C is used in situations where the opportunity for defects is large while the actual occurrence tends to be small. Such situations are described by the Poisson distribution, for example the number of air bubbles in a piece of glass. The central line of the control chart for C is C̄ (the mean number of defects per unit) and the control limits are :
U.C.L. = C̄ + 3√C̄
L.C.L. = C̄ − 3√C̄
The use of the C-chart is appropriate if the opportunities for a defect in each production unit are infinite but the probability of a defect at any point is very small and constant. A uniform sample size is highly desirable while using the C-chart. Where the sample size varies, particularly if the variation is large, the C-chart becomes difficult to read and the p-chart provides a better choice.
Control Chart for p (Fraction Defective) : The p-chart is designed to control the percentage or proportion of defectives per sample. The p-chart has two advantages over the c-chart :
1. Expressing the defectives as a percentage or fraction of production is more meaningful and easily understood.
2. Where the size of the sample varies from sample to sample, the p-chart is more straightforward.
The upper and lower control limits for the p-chart are :
U.C.L. = p̄ + 3√[p̄(1 − p̄)/n]
L.C.L. = p̄ − 3√[p̄(1 − p̄)/n]
where p̄ = average fraction defective = number of defectives / total number of units inspected.
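A minimal sketch of both sets of attribute-chart limits, on hypothetical inspection data (the lower limit is clamped at zero, since a negative count or fraction is meaningless):

```python
import math

# C chart: defects counted on 10 inspected units (hypothetical counts).
defects = [3, 5, 2, 4, 6, 3, 2, 5, 4, 3]
cbar = sum(defects) / len(defects)
print(f"C chart: CL={cbar:.2f}, UCL={cbar + 3 * math.sqrt(cbar):.2f}, "
      f"LCL={max(0.0, cbar - 3 * math.sqrt(cbar)):.2f}")

# p chart: 50 items inspected in each sample, defectives counted per sample.
n = 50
defectives = [2, 4, 3, 5, 2, 3, 4, 2, 3, 2]
pbar = sum(defectives) / (n * len(defectives))   # average fraction defective
se = math.sqrt(pbar * (1 - pbar) / n)
print(f"p chart: CL={pbar:.3f}, UCL={pbar + 3 * se:.3f}, "
      f"LCL={max(0.0, pbar - 3 * se):.3f}")
```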
Q. 8. Write notes on the following :
a) Schedule b) Non-parametric tests.
Ans. Already discussed in Paper Code 2624-X, Question 5.
Q. Discuss and illustrate the difference between descriptive and inferential statistics.
Ans. : Descriptive Statistics : Descriptive statistics is the branch of statistics that covers the many techniques used to summarize a set of data. In a sense, we are using a few numbers to describe the whole set. The techniques are commonly classified as :
1) Graphical description, in which we use graphs to summarize data.
2) Tabular description, in which we use tables to summarize data.
3) Summary statistics, in which we calculate certain values to summarize data.
The two kinds of measures that are used are :
1) Measures of central tendency, which show how the different units are similar; these include the arithmetic mean, median, mode, harmonic mean and geometric mean.
2) Measures of statistical variability, which show how the different units differ; these include the range, interquartile range, standard deviation, mean deviation etc.
Inferential Statistics : Inferential statistics, or statistical induction, comprises the use of statistics to make inferences concerning some unknown aspect (usually a parameter) of a population. Two schools of inferential statistics are frequentist inference, using maximum likelihood estimation, and Bayesian inference.

Binomial Distribution : The binomial distribution is also known as the Bernoulli distribution, after James Bernoulli. The binomial distribution is a probability distribution expressing the probability of one of two alternatives, i.e. success or failure. The Bernoulli model is developed under certain assumptions. These assumptions are :
1. An experiment is performed under the same conditions for a fixed number of trials, say n.
2. In each trial, there are only two possible outcomes of the experiment, success or failure.
3. The probability of a success, denoted by p, remains constant from trial to trial. The probability of a failure, denoted by q, is equal to (1 − p).
4. The trials are statistically independent, i.e. the outcomes of any trial or sequence of trials do not affect the outcomes of subsequent trials.
Properties of the Binomial Distribution :
1. The shape and location of the binomial distribution change as p changes for a given n, or as n changes for a given p.
2. As n increases with p fixed, the binomial distribution moves to the right; the mean of the binomial distribution increases as n increases. For larger n there are more possible outcomes of a binomial experiment, and the probability of any single outcome becomes smaller.
3. When p is small (say 0.1) the binomial distribution is skewed to the right.
4. As p increases (say to 0.3), the skewness is less noticeable.
5. When p = 0.5, the binomial distribution is symmetrical.
6. When p is larger than 0.5, the distribution is skewed to the left.
Importance of the Binomial Distribution : It is useful in describing an enormous variety of real-life events. For example, a quality control inspector wants to know the probability of obtaining defective light bulbs in a random sample of 10 bulbs if 10% of the bulbs are defective. Here n = 10, p = 0.1, q = 0.9.
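For this example the binomial probabilities P(X = k) = C(n, k) p^k q^(n−k) can be tabulated directly; a minimal sketch:

```python
from math import comb

n, p = 10, 0.1   # sample of 10 bulbs, 10% defective
q = 1 - p

# P(X = k) = C(n, k) * p^k * q^(n-k) for each possible number of defectives.
for k in range(4):
    prob = comb(n, k) * p**k * q**(n - k)
    print(f"P({k} defectives) = {prob:.4f}")
# P(0)=0.3487, P(1)=0.3874, P(2)=0.1937, P(3)=0.0574
```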

SAMPLING AND SAMPLING DISTRIBUTION
Introduction:
Population : A population is the collection of all the data points being studied. For example, if we are studying the annual incomes of all the people in India, then the population under study consists of data points representing the income of each and every person in India.
Sample : A sample is a part of a population. In the above example, a sample could be the annual incomes of all the people in Mumbai, or the annual incomes of all the people in India over 40 years of age, etc. Samples, being smaller in size than their population, are easier to study. Hence, if we want to draw some conclusions about a population, we can do so by studying a suitable sample of the population.
Advantages of Sampling : The sampling technique has the following advantages over the census method.
(i) Facilitating timely results : The sample method results in a considerable saving of time and labour. There is saving in time because (a) a sample usually takes less time to investigate than a complete enumeration of the population, and (b) the time required in editing, coding and tabulating sample data is much less.
(ii) More accurate results : The results obtained are generally more reliable than those obtained from a complete enumeration. Both the sample method and the census method are subject to certain errors. These errors arise because of factors such as poor planning, ineffective execution and lack of proper control over various activities. But the effect of these errors is less in the case of the sample method.
(iii) Less cost : The sample method is much more economical than a complete enumeration, since in sampling only a part of the population is considered and the expenses are less.
(iv) Destructive testing : Where, in the course of inspection, the units are destroyed or adversely affected, the sampling method must be employed. For example, to test the quality of explosives, crackers etc., sampling is used.
(v) Use in special cases : In many cases the census method may not be possible, for example when the population to be inspected/investigated is either infinite in number or otherwise constantly changing. In such cases the sampling method is used.
Purpose of Sampling : The basic objective of a sample study is to draw inferences about the population. Sampling is a tool which helps us to know the characteristics of the universe or population by examining only a small part of it.
Theoretical Basis of Sampling : There are two important principles on which the theory of sampling is based :
(i) Principle of Statistical Regularity
(ii) Principle of Inertia of Large Numbers.

Principle of Statistical Regularity : According to King, "The law of statistical regularity lays down that a moderately large number of items chosen at random from a large group are almost sure on the average to possess the characteristics of the large group." The law points out that if a sample is taken at random from a population, it is likely to possess almost the same characteristics as the population. The law depends on one point: the sample must be selected at random from the population. By random selection is meant a selection in which each and every item in the universe has an equal chance of being selected in the sample.
Principle of Inertia of Large Numbers : This law is derived from the law of statistical regularity. It states that, other things being equal, as the sample size increases, the results tend to be more reliable and accurate. This is based on the fact that large numbers are more stable than small ones, because when the numbers are large, the typical odd variations in one part of the universe in one direction are neutralized by variations in an equally big part of the universe in the other direction. For example, if a coin is tossed 1000 times, the chance of close to 50% heads (about 500 heads) and 50% tails would be very high.
Sampling and Non-Sampling Errors : In statistics, the difference between the true value of a population parameter and the estimated value from a sample statistic is called error. There are many causes for such deviations between the two results. The error in any statistical investigation may be broadly classified into sampling and non-sampling errors.
Sampling Errors : The results of a sample survey are bound to differ somewhat from the results of a census; such errors arise from the sampling technique used and the variability or heterogeneity of the population to be sampled. A measure of the sampling error is provided by the standard error of the estimate.
Non-Sampling Errors : Non-sampling errors are not attributed to chance and are a consequence of certain factors that are within human control. These arise in all surveys, whether a sample survey or a census survey. Such errors can arise due to a number of causes, such as defective methods of data collection and tabulation, faulty definitions, incomplete coverage of the population or sample etc. Non-sampling errors can be controlled by methods including the employment of qualified and trained staff, the use of sophisticated statistical techniques etc.
Sampling Techniques : There are two methods of selecting samples from a population: 1) non-random or judgment sampling, and 2) random or probability sampling. In probability sampling, all the items in the population have a chance of being chosen in the sample. In non-random or judgment sampling, personal knowledge and opinion are used to identify those items from the population that are to be included in the sample.

I) Random Sampling Methods :
1. Simple Random Sampling : Simple random sampling selects samples by methods that allow each possible sample to have an equal probability of being selected, and each item in the entire population to have an equal chance of being included in the sample. The selection of sample units may be with or without replacement. A random sample may be selected by (i) the lottery method or (ii) the use of random numbers.
i) Lottery Method : Under this method, every member or unit of the population is numbered or named on a separate slip of paper of identical size and shape. These slips are put in a bag and thoroughly shuffled, and then as many slips as there are units needed in the sample are drawn one by one, the slips being thoroughly shuffled after each draw.
ii) Use of a Table of Random Digits : The best way to ensure that we are employing random sampling is to use a table of random numbers. Random numbers are usually generated by some mechanism which ensures approximately equal frequencies for the numbers from 0 to 9, and also proper frequencies for combinations of numbers such as 00, 01, …, 99 etc. Several standard tables of random numbers are available; Tippett's table of random numbers is the most popularly used in practice. The first forty sets from Tippett's table are reproduced below.
2952 6641 3992 9792 7969 5911 3170 5624
4167 9524 1545 1396 7203 5356 1300 2693
2370 2183 3408 2762 3563 1089 6913 6591
0560 5246 1112 6107 6008 8125 4233 8776
2754 9143 1405 9025 7002 6111 8816 6446
2. Stratified Random Sampling : This method is recommended when the population consists of a set of heterogeneous groups. To use this method, we divide the population into relatively homogeneous groups, called strata. Either we select at random from each stratum a specified number of elements corresponding to the proportion of that stratum in the population as a whole, or we draw an equal number of elements from each stratum and give weight to the results according to the stratum's proportion of the total population.
3. Cluster Sampling : In cluster sampling, we divide the population into groups, or clusters, and then select a random sample of these clusters. It is assumed that these individual clusters are representative of the population as a whole. If the selection of a sample passes through more than two stages of sampling, such a sampling method is known as multistage sampling. Suppose we want to take a sample of 5,000 households from the State of Haryana. At the first stage, the state may be divided into a number of districts and a few districts

selected at random. At the second stage, each selected district may be subdivided into a number of villages and a few villages selected at random; at the third stage, a number of households may be selected from each of the villages selected at the second stage.
4. Systematic Sampling : A systematic sample is obtained by randomly selecting one element from the first k elements in the frame and then selecting every kth element thereafter. It is easier and less time-consuming to perform than simple random sampling, and systematic sampling can provide more information per sampling rupee.
II) Non-Random Sampling :
1. Judgment Sampling : In this method of sampling, the choice of sample items depends exclusively on the judgment of the investigator. For example, if a sample of ten students were to be selected from a class of sixty for analysing the spending habits of students, the investigator would select the ten students who, in his opinion, are representative of the class.
2. Quota Sampling : In this method, the investigator is told in advance the number of units to examine or enumerate from the stratum assigned to him. The sampling quotas may be fixed according to some specified characteristics such as income group, sex, occupation, religious affiliation etc. The choice of the particular units or individuals for investigation is left to the investigators themselves; the investigator usually applies his judgment in the choice of the sample.
SAMPLING DISTRIBUTION : Sampling theory is the study of the relations between a population and the samples drawn from that population, and it is applicable to random samples only. It is possible to estimate the true values of the population parameters (population mean, population standard deviation, population proportion etc.) by using sample statistics such as the sample mean, sample standard deviation, sample proportion etc., and to find the limits of accuracy of estimates based on samples. Sampling theory also helps us to determine whether the differences between two samples are actually due to chance variation or whether they are really significant. As the units (or observations) selected in two or more samples drawn from a population are not the same, the value of a statistic varies from sample to sample, but a parameter always remains constant (since all the units in the population always remain the same). This variation in the value of a statistic is called sampling fluctuation; a parameter has no fluctuation.
Let us consider a population of size N and let us draw all possible random samples of a given size n. We get C(N, n) = k samples of size n. For each of these k samples, we compute a statistic t (i.e. the sample mean, standard deviation, proportion of defectives etc.). The value of t may vary from sample to sample. Let t1, t2, …, tk be

the values of the statistic for the k samples. Each of these values occurs with a definite probability. Thus we can construct a table showing the set of values t1, t2, …, tk of t with their respective probabilities. This probability distribution of t is known as the sampling distribution of t. A statistic (sample mean, standard deviation, sample proportion of defectives etc.) always has a sampling distribution, but a parameter (population mean, standard deviation, population proportion etc.) has no sampling distribution.
Note : Sampling here is done without replacement, as we would like to sample n different items. If n is small with respect to N, sampling without replacement is practically the same as sampling with replacement.
Standard Error of Estimate : The standard error of a statistic (the standard error of the sample mean, or sample standard deviation (s.d.), or sample proportion of defectives etc.) is the standard deviation of the sampling distribution of that statistic. The standard error measures the variability arising from sampling error due to chance. The standard error is used as a tool in tests of hypotheses or tests of significance. It gives an idea of the reliability and precision of a sample, and it helps in determining the limits (confidence limits) within which the parameters are expected to lie. Formulae for the standard errors (S.E.) of some well-known statistics for random samples are given below (σ is the population standard deviation) :
a. Sample mean x̄ : σx̄ = σ/√n
b. Sample standard deviation s : σs = σ/√(2n)
c. Sample proportion p : σp = √(pq/n), q = 1 − p
d. Difference of two sample means x̄1 and x̄2 :
σ(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2)
where x̄1 and x̄2 are the means of two samples of sizes n1 and n2 drawn from two populations with standard deviations σ1 and σ2 respectively.
e. Difference of two proportions :
σ(p1 − p2) = √(pq/n1 + pq/n2)
where p1 and p2 are the proportions of two random samples of sizes n1 and n2 drawn from two populations, and p = (n1p1 + n2p2)/(n1 + n2), q = 1 − p.
f. For a finite population of size N, when a sample is drawn without replacement :
σx̄ = (σ/√n) × √[(N − n)/(N − 1)]
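The formulae above translate directly into code; a minimal sketch with illustrative (hypothetical) values:

```python
import math

def se_mean(sigma, n):
    """S.E. of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def se_sd(sigma, n):
    """S.E. of the sample standard deviation: sigma / sqrt(2n)."""
    return sigma / math.sqrt(2 * n)

def se_prop(p, n):
    """S.E. of a sample proportion: sqrt(pq/n)."""
    return math.sqrt(p * (1 - p) / n)

def se_mean_fpc(sigma, n, N):
    """S.E. of the mean for a finite population, sampled without replacement."""
    return (sigma / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))

# Illustrative values: sigma = 12, n = 100, N = 2000, p = 0.4.
print(se_mean(12, 100))            # 1.2
print(se_sd(12, 100))              # ~0.85
print(se_prop(0.4, 100))           # ~0.049
print(se_mean_fpc(12, 100, 2000))  # ~1.17 (finite population correction)
```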

Applications of Standard Error :
1. The standard error is used to test whether the difference between a sample statistic and the population parameter is significant or is due to sampling fluctuations.
2. The standard error is used to find the precision of the sample estimate of a population parameter. If a statistic t is used to estimate the parameter, then the precision of t = 1/(S.E. of t).
3. It is used to find the interval estimate of a population parameter.
Central Limit Theorem :
1. If x̄ is the mean of a random sample of size n drawn from a population having mean μ and s.d. σ, then the sampling distribution of the sample mean x̄ is approximately a normal distribution with mean μ and standard deviation equal to the standard error of x̄, provided the sample size n is sufficiently large.
2. If p' is the proportion of defectives in a random sample of size n drawn from a population having proportion of defectives p, then the sampling distribution of the sample proportion of defectives p' is approximately a normal distribution with mean p and standard deviation equal to the standard error of p', provided the sample size n is sufficiently large.
Usually a sample of size 30 or more is considered a large sample. However, the larger the value of n, the better is the approximation.
Other Sampling Distributions : It is seen that the sampling distributions of the mean and of the proportion of successes are normal. Apart from the normal distribution, there are certain other probability distributions that are useful in sampling theory. These distributions are :
1. Chi-square distribution
2. Student's t distribution
3. Snedecor's F distribution
