
Fundamentals of Data Analytics Assignment

Submitted To Assistant Professor J.Balaji


By Vivek Narayanan, PGDM Number: 13061

1/9/13

Statistics In Sports

Chapter 1 Introduction to Descriptive Statistics


Scales of Measurement
The four commonly used scales of measurement are nominal, ordinal, interval and ratio. The nominal scale is used for data that only identifies an attribute or category; it can be expressed with either a numeric code or a non-numeric label. The ordinal scale is used when the data can be placed in a meaningful order or rank. The interval scale is used for numeric data measured in fixed, equal units, so that differences between values are meaningful, but without a true zero point. Finally, the ratio scale is used for numeric data that also has a true zero, so that ratios between values are meaningful. Let us take an example related to sports to explain these scales. The table below lists the football statistics of 4 teams in World Cup history.

Team Name   Ranking   Number of World Cups Won   Number of Goals Scored in World Cup
Italy       6         4                          70
Germany     2         3                          65
Spain       1         1                          63
England     14        1                          55

Here the team names (Italy, Germany, Spain and England) are on the nominal scale. The rankings shown alongside these names are on the ordinal scale. The number of World Cups won and the number of goals scored are on the ratio scale, because they have a true zero and ratios between them are meaningful: for example, Italy's 70 goals are about 1.27 times England's 55. An interval scale would apply to a quantity such as the calendar year in which a team won the tournament, where differences between values are meaningful but ratios are not.


Percentiles and Quartiles


A percentile is the value below which a given percentage of the observations in a data set fall. Percentiles are used to see how much of the data lies below a certain point. Let us designate a percentile as $P_m$, where m represents the percentile we are finding; for example, for the tenth percentile, m would be 10. Let N be the total number of elements in the data set.

The term quartile is derived from the word quarter, which means one fourth of something. When you arrange a data set in increasing order from the lowest to the highest value and then divide it into four equal groups, the three values that separate the groups are the quartiles. The data below represents the highest earnings of footballers in 2013.

Name of the Player   Amount in Million Pounds
Lionel Messi         30
C. Ronaldo           25.7
Samuel Eto'o         20.5
Neymar               17.1
Wayne Rooney         15.4

Let us find the 40th percentile and the 3 quartiles of the world's top 5 footballer earnings.
40th percentile = 19.14 million pounds
First Quartile = 17.1 million pounds
Second Quartile = 20.5 million pounds
Third Quartile = 25.7 million pounds
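The figures above are consistent with locating each percentile by linear interpolation between the ranked values. The sketch below assumes that method (the source does not state which one was used); the function name `percentile` is illustrative only.

```python
def percentile(values, m):
    """Return the m-th percentile (0 <= m <= 100) by linear interpolation.

    Assumption: the position of the percentile in the sorted data is
    (m / 100) * (N - 1), counted from zero.
    """
    ordered = sorted(values)
    rank = (m / 100) * (len(ordered) - 1)  # fractional zero-based position
    lower = int(rank)                      # index of the value just below
    if lower + 1 >= len(ordered):
        return ordered[-1]
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])


# Earnings in million pounds, taken from the table in the text.
earnings = [30, 25.7, 20.5, 17.1, 15.4]

p40 = percentile(earnings, 40)
quartiles = [percentile(earnings, m) for m in (25, 50, 75)]

print("40th percentile:", round(p40, 2))
print("Quartiles:", quartiles)
```

Other percentile conventions exist and can give slightly different answers on a data set this small.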

Measures of Central Tendency, Variability, Skewness and Kurtosis


Measures of central tendency include the mean, median and mode. We can use the same table above to determine them.
The mean of the given data set is 21.74 million pounds.
The median is 20.5 million pounds.
The data has no mode, because every value occurs only once.
The standard deviation of the data is 6.065723 (sample standard deviation).
The skewness of the data set is 0.502247. The positive value indicates a longer right tail: the few highest earnings pull the mean (21.74) above the median (20.5).
The kurtosis is -1.55252. The negative value indicates a flatter distribution with lighter tails than the normal distribution.
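As a rough check, these measures can be recomputed with Python's standard library. The skewness and kurtosis lines below assume the bias-adjusted sample formulas commonly used by spreadsheet software; the source does not say which formulas were used, but these reproduce the reported figures.

```python
import statistics

# Earnings in million pounds, taken from the table in the text.
earnings = [30, 25.7, 20.5, 17.1, 15.4]
n = len(earnings)

mean = statistics.mean(earnings)
median = statistics.median(earnings)
stdev = statistics.stdev(earnings)  # sample standard deviation (divides by n - 1)

# Standardised deviations from the mean.
z = [(x - mean) / stdev for x in earnings]

# Assumed formulas: bias-adjusted sample skewness and excess kurtosis.
skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
kurtosis = (
    n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
)

print(round(mean, 2), median, round(stdev, 6))
print(round(skewness, 6), round(kurtosis, 5))
```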


Histogram and Frequency Polygon


The number of pins down in a game of bowling is given in the table below:

Pins Down   0   1   2   3   4   5   6   7    8    9   10
Frequency   2   1   2   0   2   4   9   11   13   8   8

The Histogram for the above data is displayed below:

[Figure: Histogram of pins down (horizontal axis) against frequency (vertical axis)]

The Frequency Polygon for the same data can be represented as follows:
[Figure: Frequency polygon of pins down (horizontal axis) against frequency (vertical axis)]


Methods of displaying data


Data can be displayed in the form of pie charts, bar charts, frequency polygons and ogives. The data below represents the highest earnings of footballers in 2013.

Name of the Player   Amount in Million Pounds
Lionel Messi         30
C. Ronaldo           25.7
Samuel Eto'o         20.5
Neymar               17.1
Wayne Rooney         15.4

[Figure: Bar chart representing players and their earnings (Messi, Ronaldo, Eto'o, Neymar, Rooney); vertical axis: salary in million pounds]

The following data shows the world's most popular sports by percentage.

Sport      Popularity Percentage
Football   51%
Cricket    28%
Tennis     15%
Others     6%


The above data can be represented in the form of a pie chart.

[Figure: Pie chart, Popularity of Sports (Football 51%, Cricket 28%, Tennis 15%, Others 6%)]

Sachin Tendulkar's scores in the last few matches can be seen in the table below.

Interval of Runs   Frequency   Cumulative Frequency
10 < n < 20        5           5
20 < n < 30        7           12
30 < n < 40        12          24
40 < n < 50        10          34
50 < n < 60        6           40
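The cumulative frequency column is a running total of the frequency column, and it is these running totals that an ogive plots. A minimal sketch of that calculation, with variable names chosen here for illustration:

```python
from itertools import accumulate

# Run intervals and frequencies taken from the table in the text.
intervals = ["10-20", "20-30", "30-40", "40-50", "50-60"]
frequencies = [5, 7, 12, 10, 6]

# Running total of the frequencies gives the cumulative frequency.
cumulative = list(accumulate(frequencies))

for interval, total in zip(intervals, cumulative):
    print(interval, total)
```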

The above data can be displayed with the help of an Ogive as seen below:

[Figure: Ogive of cumulative frequency (5, 12, 24, 34, 40) against the run intervals 10-20, 20-30, 30-40, 40-50 and 50-60]


Exploratory Data Analysis

Stem and Leaf Displays


Exploratory data analysis (EDA) is an approach to analysing data sets to summarize their main characteristics, often with visual methods. Here we show the stem and leaf display and the box plot. Example: The following are the scores made by a cricketer in a season.

23, 45, 53, 30, 4, 75, 24, 121, 55, 116, 73, 56, 34, 78, 64, 39

We can represent the above data in stem and leaf form as shown below:

Stem   Leaf
2      3 4
3      0 4 9
4      5
5      3 5 6
6      4
7      3 5 8

Outliers: 0/4, 11/6, 12/1 (the scores 4, 116 and 121)
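A stem and leaf display can be built by splitting each score into its tens part (the stem) and its units digit (the leaf). The sketch below does this for the scores in the example. As a simplification, it lists every stem in one table and does not separate out the outliers.

```python
from collections import defaultdict

# Scores taken from the example in the text.
scores = [23, 45, 53, 30, 4, 75, 24, 121, 55, 116, 73, 56, 34, 78, 64, 39]

# Map each stem (score // 10) to its sorted list of leaves (score % 10).
stems = defaultdict(list)
for score in sorted(scores):
    stems[score // 10].append(score % 10)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem:>2} | {leaves}")
```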

Another method of representing data is to summarize it in a Box and Whisker Plot, or Box Plot. This method uses the smallest value, the largest value, the median and the upper and lower quartile values. This is often referred to as a five-number summary. The scores of a batsman, in ascending order, are given below:

11, 12, 12, 13, 15, 15, 15, 16, 17, 20, 21, 21,
21, 22, 22, 22, 23, 24, 26, 27, 27, 27, 28, 29,
29, 30, 31, 32, 34, 35, 37, 41, 41, 42, 45, 47,
50, 52, 53, 56, 60, 62

The Box Plot can be represented by the following values:

Lower Whisker   11
Lower Hinge     21
Median          27
Upper Hinge     40
Upper Whisker   62
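The five values above can be reproduced with `statistics.quantiles` from Python's standard library. The sketch assumes the "inclusive" interpolation method, which matches the hinges quoted here; other quartile conventions may give a slightly different upper hinge.

```python
import statistics

# The 42 scores from the example in the text.
scores = [
    11, 12, 12, 13, 15, 15, 15, 16, 17, 20, 21, 21,
    21, 22, 22, 22, 23, 24, 26, 27, 27, 27, 28, 29,
    29, 30, 31, 32, 34, 35, 37, 41, 41, 42, 45, 47,
    50, 52, 53, 56, 60, 62,
]

# Three cut points that split the sorted data into four equal parts.
lower_hinge, median, upper_hinge = statistics.quantiles(
    scores, n=4, method="inclusive"
)

five_number_summary = (min(scores), lower_hinge, median, upper_hinge, max(scores))
print(five_number_summary)
```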


Chapter 2 Probability
Probability of Events
In probability theory, an event is a set of outcomes of an experiment (a subset of the sample space) to which a probability is assigned. A single outcome may be an element of many different events, and different events in an experiment are usually not equally likely, since they may include very different groups of outcomes.

Example: In a class of 36 learners in a boys' school, 20 play cricket, 26 play rugby and 4 do not play cricket or rugby. If a learner is chosen at random, calculate the probability that he:
1. Plays rugby and cricket
2. Plays cricket only
3. Does not play cricket or rugby
4. Plays cricket or rugby
5. Does not play rugby

Answer: $n(S) = 36$. Let event C = plays cricket and event R = plays rugby. These events are not mutually exclusive. Since 4 learners play neither sport, $n(C \cup R) = 36 - 4 = 32$, so $n(C \cap R) = 20 + 26 - 32 = 14$.
1. $P(R \cap C) = n(R \cap C)/n(S) = 14/36$. Hence the probability that he plays rugby and cricket = 7/18.
2. $P(\text{cricket only}) = n(\text{cricket only})/n(S) = (20 - 14)/36 = 6/36$. Hence the probability that he plays cricket only = 1/6.
3. The probability that he does not play cricket or rugby = 4/36 = 1/9.
4. $P(C \cup R) = P(C) + P(R) - P(C \cap R) = 20/36 + 26/36 - 14/36 = 32/36$. Hence the probability that he plays cricket or rugby = 8/9.
5. $P(R') = 1 - P(R) = 1 - 26/36 = 10/36$. Hence the probability that he does not play rugby = 5/18.
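The five answers follow from simple counting, which can be sketched with exact fractions as below. The variable names are chosen for illustration only.

```python
from fractions import Fraction

# Counts given in the example.
total, cricket, rugby, neither = 36, 20, 26, 4

either = total - neither          # learners who play at least one sport
both = cricket + rugby - either   # inclusion-exclusion: counted twice above
cricket_only = cricket - both

p_both = Fraction(both, total)
p_cricket_only = Fraction(cricket_only, total)
p_neither = Fraction(neither, total)
p_either = Fraction(either, total)
p_not_rugby = 1 - Fraction(rugby, total)

print(p_both, p_cricket_only, p_neither, p_either, p_not_rugby)
```

Using `Fraction` keeps the results in lowest terms, so they can be compared directly with the hand calculation.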


Mutually Exclusive Events


Two events are 'mutually exclusive' if they cannot occur at the same time. For mutually exclusive events A and B, the probability that either occurs is $P(A \cup B) = P(A) + P(B)$.

Example: In a class there are 50 students. Twenty students like playing cricket and ten students like playing football, and no student likes both. Find the probability that a randomly selected student likes playing cricket or football.

Answer: $P(C \cup F) = P(C) + P(F) = 20/50 + 10/50 = 3/5$. Hence the probability that a randomly selected student likes playing cricket or football is 3/5, or 60%.

Conditional Probability
In probability theory, a conditional probability is the probability that an event will occur, given that another event is known to occur or to have occurred. If the events are A and B respectively, this is said to be "the probability of A given B".

Example: At a middle school, 18% of all students play football and basketball and 32% of all students play football. What is the probability that a student plays basketball given that the student plays football?

Answer: P(Football and Basketball) = 18% and P(Football) = 32%, so $P(B \mid F) = P(F \cap B)/P(F) = 0.18/0.32 = 0.5625$, or about 56%.

Independence of Events
In probability theory, to say that two events are independent (alternatively statistically independent, marginally independent or absolutely independent) means that the occurrence of one does not affect the probability of the other.

Example: Russell is playing in a cricket match and a game of football at the weekend. The probability that his team will win the cricket match is 0.7, and the probability that it will win the football game is 0.9. What is the probability that his team will win both matches?

Answer: Assuming the two results are independent, the multiplication law gives $P(\text{win both}) = P(\text{win cricket}) \times P(\text{win football}) = 0.7 \times 0.9 = 0.63$.

Hence, the probability that his team will win both matches = 0.63


Bayes Theorem
In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule) is a result that is of importance in the mathematical manipulation of conditional probabilities. It is a result that derives from the more basic axioms of probability.

Example: The cricket DRS system is claimed to be around 95% accurate in giving a batsman out if the batsman really is out. Suppose the DRS also yields false positive (F+) results for just 1% of the bowler reviews, i.e. it gives a batsman 'out' when he is really 'not out', as the umpire originally said. If 10% of the batsmen subject to bowler reviews are actually out, what is the probability that a batsman is actually out given that the DRS overturns the umpire's decision to say he is out?

Solution: Let OUT be the event that the batsman reviewed is actually out (its complementary event is NOTOUT), and RED the event that the DRS gave him out. The desired probability is obtained using the Bayes formula:

$P(OUT \mid RED) = P(OUT \cap RED)/P(RED)$

Expanding the terms, we can write this as

$P(OUT \mid RED) = \frac{P(RED \mid OUT) \times P(OUT)}{P(RED \mid OUT) \times P(OUT) + P(RED \mid NOTOUT) \times P(NOTOUT)} = \frac{0.95 \times 0.1}{0.95 \times 0.1 + 0.01 \times 0.9} = \frac{0.095}{0.104} \approx 0.91$

So the probability that the batsman is actually out is about 91%.
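The same calculation can be sketched as a small function. The function and parameter names here are hypothetical, chosen only to mirror the terms of the Bayes formula.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Return P(OUT | RED) from P(OUT), P(RED | OUT) and P(RED | NOTOUT)."""
    # Total probability of the DRS giving 'out', over both cases.
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence


# Values taken from the example: 10% actually out, 95% accuracy, 1% false positives.
p_out_given_red = posterior(
    prior=0.10, true_positive_rate=0.95, false_positive_rate=0.01
)
print(f"{p_out_given_red:.1%}")
```

Changing `prior` shows how strongly the result depends on the share of reviewed batsmen who are actually out.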


Chapter 3 Random Variables


In probability and statistics, a random variable or stochastic variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). As opposed to other mathematical variables, a random variable conceptually does not have a single, fixed value (even if unknown); rather, it can take on a set of possible different values, each with an associated probability.

Binomial Distribution
The binomial probability formula is given by $P(r) = {}^{n}C_{r}\, p^{r} q^{n-r}$, where p represents the probability of success and q represents the probability of failure.

Example: The probability that a batsman scores a century in a cricket match is 1/3. Find the probability that, out of 4 matches, he scores a century (1) in exactly 3 matches and (2) in exactly one of the matches.

Solution: Here "success" means scoring a century, so p = 1/3. "Failure" means not scoring a century, so q = 1 - p = 1 - 1/3 = 2/3. The total number of matches is n = 4.

(1) We have to find the probability that he scores a century in exactly three matches, that is r = 3.
$P(3) = {}^{4}C_{3} (1/3)^{3} (2/3)^{4-3} = 4 \times (1/27) \times (2/3) = 8/81$
P(scoring a century in exactly 3 matches) = 8/81

(2) We have to find the probability that he scores a century in exactly one of the matches, that is r = 1.
$P(1) = {}^{4}C_{1} (1/3)^{1} (2/3)^{4-1} = 4 \times (1/3) \times (8/27) = 32/81$
P(scoring a century in exactly one of the matches) = 32/81
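A short sketch of the binomial formula with exact fractions, using n = 4 matches and p = 1/3 as stated in the example. The helper name `binomial_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, r, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


p = Fraction(1, 3)  # probability of a century in one match
n = 4               # number of matches

exactly_three = binomial_pmf(n, 3, p)
exactly_one = binomial_pmf(n, 1, p)

print(exactly_three, exactly_one)
```

As a sanity check, summing `binomial_pmf(n, r, p)` over r = 0 to n gives 1.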


Poisson Distribution
The Poisson probability formula is given by $P(x) = m^{x} e^{-m}/x!$, where m is the mean of the Poisson distribution.

Example: In football, goalkeepers score on average 4 goals every 2 years. Find the probability that in 2 years fewer than 3 goals will be scored by goalkeepers.

Solution: Since on average 4 goals are scored by goalkeepers in 2 years, m = 4. We have to find the probability that fewer than 3 goals are scored in 2 years, so x can take the values 0, 1 or 2. The required probability is

$P(x < 3) = P(x = 0) + P(x = 1) + P(x = 2) = \frac{4^{0} e^{-4}}{0!} + \frac{4^{1} e^{-4}}{1!} + \frac{4^{2} e^{-4}}{2!} = 13 e^{-4} \approx 0.2381$

Therefore the probability that in 2 years fewer than 3 goals will be scored by goalkeepers = 0.2381
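The sum of the three Poisson terms can be checked numerically with a minimal sketch like the one below. The helper name `poisson_pmf` is illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, m):
    """Probability of exactly x events when the mean number of events is m."""
    return m ** x * exp(-m) / factorial(x)


m = 4  # mean number of goalkeeper goals in 2 years, from the example

# P(x < 3) = P(0) + P(1) + P(2)
p_less_than_3 = sum(poisson_pmf(x, m) for x in range(3))
print(round(p_less_than_3, 4))
```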


Chapter 4 Normal Distribution


Transformation of Normal Random Variable
In probability theory, the normal (or Gaussian) distribution is a very commonly occurring continuous probability distribution: a function that gives the probability of a number in some context falling between any two real numbers. Example: The number of goals Manchester United score in a Barclays Premier League season is assumed to be normally distributed with a mean of 100 and a standard deviation of 15. Manchester United need 115 goals to set the record for the most goals in a single season.

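Each probability is found by transforming X to the standard normal variable $Z = (X - 100)/15$.

Probability that Manchester United will score fewer than 115 goals is $P(X < 115) = P(Z < 1) = 0.8413$

Probability that Manchester United will score more than 115 goals is $P(X > 115) = P(Z > 1) = 0.1587$

Probability that Manchester United will score between 70 and 120 goals is $P(70 < X < 120) = P(-2 < Z < 1.33) = 0.8860$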
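These three probabilities can be reproduced with `statistics.NormalDist` from Python's standard library, which evaluates the normal cumulative distribution function directly. The sketch assumes the mean of 100 and standard deviation of 15 given in the example.

```python
from statistics import NormalDist

# Goals per season, assumed normal with mean 100 and standard deviation 15.
goals = NormalDist(mu=100, sigma=15)

p_less_than_115 = goals.cdf(115)
p_more_than_115 = 1 - goals.cdf(115)
p_between_70_and_120 = goals.cdf(120) - goals.cdf(70)

print(round(p_less_than_115, 4))
print(round(p_more_than_115, 4))
print(round(p_between_70_and_120, 4))
```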
