
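1. Using prose, describe how the variables and observations are organised in each of the sample tables.
ANS:
table1: Each variable (country, year, cases, population) has its own column, and each row is one country-year observation. This is tidy data.
table2: Each observation is spread across two rows. The type column names the variable (cases or population) and the count column holds its value.
table3: cases and population are combined in a single rate column, stored as a string of the form cases/population.
table4a, table4b: The years (1999, 2000) are column names. table4a holds the cases and table4b holds the population.
In table2, table3, table4a and table4b the variables do not each have their own column, so the data is untidy.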
2. Compute the rate for table2, and table4a + table4b. You will need to perform four operations:
1. Extract the number of TB cases per country per year.
ANS: For table2, keep the rows where type is "cases".
cases2 <- table2 %>%
  filter(type == "cases") %>%
  group_by(country, year) %>%
  summarise(sum = sum(count))
2. Extract the matching population per country per year.
ANS: Keep the rows where type is "population", grouped the same way so the rows line up with cases2.
pop <- table2 %>%
  filter(type == "population") %>%
  group_by(country, year) %>%
  summarise(sum = sum(count))
3. Divide cases by population, and multiply by 10000.
tibble(country = cases2$country, year = cases2$year,
       rate = cases2$sum / pop$sum * 10000)
4. Store back in the appropriate place.
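For table4a and table4b, the same four steps could be sketched as follows. This is only a sketch: it assumes the tidyr sample tables and that both tables list the countries in the same order, and rate4 is a name introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Steps 1 and 2: cases live in table4a and population in table4b,
# with one column per year. Check that the rows line up (assumption).
stopifnot(identical(table4a$country, table4b$country))

# Steps 3 and 4: divide column by column and store the rates in a new
# table with the same layout (rate4 is a hypothetical name).
rate4 <- tibble(
  country = table4a$country,
  `1999`  = table4a$`1999` / table4b$`1999` * 10000,
  `2000`  = table4a$`2000` / table4b$`2000` * 10000
)
rate4
```

The year columns need backticks because 1999 and 2000 are not syntactic R names.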
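3. Which representation is easiest to work with? Which is hardest? Why?
ANS: Tidy data is the easiest to work with, because it gives a consistent way of working with the data: each variable is a column, so the rate is simply cases / population * 10000.
Untidy data is the hardest, because the observations are not laid out one per row, so the values have to be extracted and lined up before anything can be computed.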
4. Recreate the plot showing change in cases over time using table2 instead of table1.
What do you need to do first?
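ANS: First extract the rows of table2 where type is "cases", so that there is one count per country per year.
cases2 <- table2 %>%
  filter(type == "cases") %>%
  group_by(country, year) %>%
  summarise(sum = sum(count))
cases2

ggplot(cases2, aes(year, sum)) +
  geom_line(aes(group = country), colour = "grey50") +
  geom_point(aes(colour = country))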

Exercises

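1. Both spread() and gather() have a convert argument. What does it do?

ANS:
stocks %>%
  spread(year, return) %>%
  gather(`2015`, `2016`, key = "year", value = "return")
gather() turns column names into the values of a key column, so the result is narrower and longer. spread() turns the values of a column into column names, so the result is wider.
Column names are always character strings, so after this round trip year comes back as character. With convert = TRUE, gather() runs type.convert() on the key column (and spread() runs it on the new columns), so year is converted back to a number.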
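The effect of convert can be checked with a small round trip. This is a minimal sketch; stocks_demo is a hypothetical toy tibble made up for the illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data for illustration only.
stocks_demo <- tibble(
  year   = c(2015, 2015, 2016, 2016),
  half   = c(1, 2, 1, 2),
  return = c(1.88, 0.59, 0.92, 0.17)
)

wide <- stocks_demo %>% spread(year, return)

# Without convert, the gathered year column holds the old column names as text.
plain <- wide %>%
  gather(`2015`, `2016`, key = "year", value = "return")
class(plain$year)

# With convert = TRUE, type.convert() is run on the key column.
converted <- wide %>%
  gather(`2015`, `2016`, key = "year", value = "return", convert = TRUE)
class(converted$year)
```

Comparing the two class() results shows whether the type information lost by spread() has been recovered.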

1. Why does this code fail?


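ANS: 1999 and 2000 are not syntactic column names. Without backticks, gather() reads them as column positions 1999 and 2000, which do not exist. Quoting the names with backticks fixes the call:

table4a %>%
  gather(`1999`, `2000`, key = "year", value = "cases")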

2. Why does spreading this tibble fail? How could you add a new column to fix the problem?
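ANS: tribble() is not a mistake; it is the tibble function for writing a table row by row. Spreading fails when two rows share the same identifying columns and the same key, because spread() then has two values for a single cell and cannot choose between them.
The fix is to add a new column that numbers the repeated observations, so that every row is uniquely identified, and then spread the key and value columns:
people %>%
  group_by(name, key) %>%
  mutate(obs = row_number()) %>%
  spread(key, value)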
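A small example may make the duplicate-key problem concrete. The tibble below is hypothetical toy data (people_demo and the obs column are names introduced here), not the tibble from the exercise.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: "Ann" has two "age" rows,
# so name + key does not identify a row uniquely.
people_demo <- tribble(
  ~name, ~key,     ~value,
  "Ann", "age",    45,
  "Ann", "height", 186,
  "Ann", "age",    50,
  "Bob", "age",    37,
  "Bob", "height", 156
)

# Spreading directly stops with an error about duplicate keys.
direct <- try(spread(people_demo, key, value), silent = TRUE)

# Number the repeated observations within each name/key pair, then spread.
people_wide <- people_demo %>%
  group_by(name, key) %>%
  mutate(obs = row_number()) %>%
  ungroup() %>%
  spread(key, value)
people_wide
```

The second "Ann" row has no matching height, so that cell is NA in the spread result.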

3. Tidy the simple tibble below. Do you need to spread or gather it? What are the variables?

Exercises
1. What do the extra and fill arguments do in separate()? Experiment with the
various options for the following two toy datasets.

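# extra controls rows with too many pieces: "warn" (default), "drop" or "merge".
tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
  separate(x, c("one", "two", "three"), extra = "drop")

# fill controls rows with too few pieces: "warn" (default), "right" or "left".
tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"), fill = "right")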
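The same two toy datasets can be used to try the other options. A minimal sketch, where too_many, too_few, merged and left_filled are names introduced here for illustration:

```r
library(dplyr)
library(tidyr)

too_many <- tibble(x = c("a,b,c", "d,e,f,g", "h,i,j"))
too_few  <- tibble(x = c("a,b,c", "d,e", "f,g,i"))

# extra = "merge" splits at most three times, so the surplus piece
# stays attached to the last column instead of being dropped.
merged <- too_many %>%
  separate(x, c("one", "two", "three"), extra = "merge")

# fill = "left" pads short rows with NA on the left instead of the right.
left_filled <- too_few %>%
  separate(x, c("one", "two", "three"), fill = "left")

merged
left_filled
```

Leaving both arguments at their default, "warn", gives the same result as "drop" and "right" respectively, but with a warning.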

2.

12.5.1 Exercises
1. Compare and contrast the fill arguments to spread() and complete().
stocks <- tibble(
  year = c(2015, 2015, 2015, 2015, 2016, 2016, 2016),
  qtr = c(1, 2, 3, 4, 2, 3, 4),
  return = c(1.88, 0.59, 0.35, NA, 0.92, 0.17, 2.66)
)

stocks %>%
  spread(year, return)

stocks %>%
  complete(year, qtr)
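The difference between the two fill arguments is easiest to see side by side. In this sketch, the replacement value 0 is an arbitrary choice for illustration, and wide_filled and long_filled are names introduced here.

```r
library(dplyr)
library(tidyr)

stocks <- tibble(
  year   = c(2015, 2015, 2015, 2015, 2016, 2016, 2016),
  qtr    = c(1, 2, 3, 4, 2, 3, 4),
  return = c(1.88, 0.59, 0.35, NA, 0.92, 0.17, 2.66)
)

# spread(): fill is a single value used for every missing cell.
wide_filled <- stocks %>%
  spread(year, return, fill = 0)

# complete(): fill is a named list with one replacement value per column.
long_filled <- stocks %>%
  complete(year, qtr, fill = list(return = 0))

wide_filled
long_filled
```

So spread() applies one value everywhere, while complete() lets each column have its own replacement.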

2. stocks %>% fill(return)
