Training with R

PART I:

Creating and modifying data

1. Create a vector of the integers from 1 to 50 and call it x.


2. Use the seq function to create a vector of even numbers from 2 to 100 and call it y.
3. Find the length of each vector using the length function.
4. Add the vectors x and y element-wise.
5. Combine x and y into a single vector called z using the c function.
6. Sort the z vector from lowest to highest.
7. Take the natural logarithm of all the values in z and store the result in a variable called log.z.
8. Find the mean of log.z.
9. Count the number of values in log.z that are greater than 4 using the command:
sum(log.z > 4)
10. Output the values in log.z that are greater than 4 using the command: log.z[log.z > 4]
11. Combine the vectors x and y column-wise using the cbind function and give the new
object the name xy.mat.
12. Use the class function to determine which class your new object belongs to.
13. Find the element in the 10th row, 2nd column.
14. Use the t function to find the transpose of xy.mat and call it t.xy.mat.
15. Use %*% to multiply t.xy.mat and xy.mat (in that order) and call the output
xy.mat2.
16. Use the dim function to find the dimensions of xy.mat2.
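17. Try to find the inverse of xy.mat2 using the solve function. Because y is exactly
2 * x, the columns of xy.mat are collinear and xy.mat2 is singular, so expect solve to
report an error rather than return an inverse.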
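
The steps in Part I can be sketched as follows, as one possible way to check your own answers rather than the only solution. The names n.big and inv, and the try wrapper around solve, are additions for illustration and are not part of the exercise.

```r
# Steps 1-6: build the vectors, add them, combine them, and sort.
x <- 1:50
y <- seq(2, 100, by = 2)
length(x)                  # 50
length(y)                  # 50
x + y                      # element-wise sum
z <- sort(c(x, y))

# Steps 7-10: logarithms and logical indexing.
log.z <- log(z)
mean(log.z)
n.big <- sum(log.z > 4)    # each TRUE counts as 1
log.z[log.z > 4]

# Steps 11-16: matrix operations.
xy.mat <- cbind(x, y)
class(xy.mat)
xy.mat[10, 2]              # 10th row, 2nd column
t.xy.mat <- t(xy.mat)
xy.mat2 <- t.xy.mat %*% xy.mat
dim(xy.mat2)               # 2 2

# Step 17: y equals 2 * x, so xy.mat2 is singular and solve() stops
# with an error. try() captures the error so the script keeps running.
inv <- try(solve(xy.mat2), silent = TRUE)
inherits(inv, "try-error")
```

If you want a matrix that can actually be inverted, one option is to replace y with a vector that is not a multiple of x before building xy.mat.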

PART II:

Using data

1. Create or download a dataset in .csv format. Make sure that the first row contains the
variable names.
2. Load the dataset using the read.table function (for a .csv file, set header = TRUE and
sep = ","). Print the dataset to make sure it was imported properly.
3. Load the datasets package using the library function.

4. There is a dataset in this package called chickwts. Use the class function to determine
which class the object belongs to.
5. Use the names function to find the variable names.
6. Attach the dataset to the workspace so that we can continue to work with it.
7. Suppose we are only interested in chickens that were given soybean or sunflower feed.
Use the which function to print out the indices of the observations with feed
sunflower or soybean.
8. Create a new dataset called chickwts2 that is a subset of the chickwts data but
includes only sunflower or soybean feeds. Use the command: chickwts2 =
subset(chickwts, feed == "sunflower" | feed == "soybean")
9. Attach this new dataset to the workspace so that we can continue to work with it.
10. Conduct a one-sample t-test to see if the mean chicken weight is greater than 280. If
needed, consult the help page by typing ?t.test.
11. There is another dataset in this package called cars. Use the class function to determine
which class the object belongs to.
12. Use the names function to find the variable names.
13. Attach the dataset to the workspace so that we can continue to work with it.
14. In this dataset, speed is in miles per hour and distance (dist) is in feet. Suppose we
want to use yards instead of feet. Convert the distance from feet to yards and store it in
a new variable called dist.yards.
15. Add this new variable to the cars dataset by using the command: cars = data.frame(cars,
dist.yards)
16. Create a scatterplot of Y = distance and X = speed using the plot function.
17. Fit a linear regression model of Y = stopping distance in yards versus X = speed in mph
using the lm function.
18. Note the equation of the line. Create a function that takes a value of x as input and
outputs the fitted value of y. Call the function f.
19. Add the fitted line in the color red to your plot using the command: lines(0:30,
f(0:30), col = "red")
20. One last dataset that we will look at is called infert. Use the class function to determine
which class the object belongs to.
21. Use the names function to find the variable names.
22. Attach the dataset to the workspace so that we can continue to work with it.
23. Create a two-way table of education and induced. Call the table tab.

24. Perform a chi-square test of independence for the tab data using the chisq.test function.
25. There is an age variable in the dataset. Recode this variable into age groups: 24 or
younger, 25 to 29, 30 to 35, and 36 and over. There are many ways to do this, and one
such way is to use a for-loop coupled with an if-else statement:
for (i in 1:length(age)) {
  if (age[i] < 25) {age[i] = "24 or younger"}
  else if (age[i] < 30) {age[i] = "25 to 29"}
  else if (age[i] < 36) {age[i] = "30 to 35"}
  else {age[i] = "36 and over"}
}
Try this for yourself and verify that it works and that you understand what the code is
doing. Notice that this version of the code changes the age variable from a numeric
vector to a character vector: the first assignment coerces the whole vector, so the
remaining comparisons are made on strings.
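
Two of the steps in Part II can be sketched without attaching the data, as one possible approach among several. The first part recodes age with the cut function instead of the loop in step 25 and keeps the original numeric ages intact. The second part covers steps 14 to 19. The names age.group, fit, and b are illustrative choices, not part of the exercise.

```r
library(datasets)

# Step 25 without a loop: cut() bins a numeric vector into a factor.
# right = FALSE makes each interval closed on the left, e.g. [25, 30).
age.group <- cut(infert$age,
                 breaks = c(-Inf, 25, 30, 36, Inf),
                 labels = c("24 or younger", "25 to 29",
                            "30 to 35", "36 and over"),
                 right = FALSE)

# Cross-check the groups against the original numeric ages.
table(age.group)
tapply(infert$age, age.group, range)

# Steps 14-19: stopping distance in yards regressed on speed.
dist.yards <- cars$dist / 3           # 3 feet per yard
fit <- lm(dist.yards ~ cars$speed)
b <- unname(coef(fit))                # b[1] = intercept, b[2] = slope

# f returns the fitted value of y for a given x.
f <- function(x) b[1] + b[2] * x

plot(cars$speed, dist.yards)
lines(0:30, f(0:30), col = "red")
```

Reading the coefficients from coef(fit) avoids copying rounded numbers by hand, which is why f is built that way here.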

PART III:

Advanced exercises for functions, if-then statements, and for-loops

1. Write a function that takes a data frame as an argument and returns the mean of each
numeric column in the data frame. Test it on the iris dataset, preloaded in R.
2. Modify your function so that it returns a list, the first element of which is the means of
the numeric variables, and the second of which is the counts of the levels of each
categorical variable.
3. Write a function that outputs the string "positive" if the input is positive, "negative"
if the input is negative, and "zero" if the input is 0. Test it on a few values.
4. Use a for-loop to get the class of each column in the iris dataset.
5. Use a for-loop to calculate the mean of each numeric column in the iris dataset.
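
A minimal sketch of the Part III exercises follows, assuming the categorical variables are stored as factors (as Species is in iris). The function names summarise.df and sign.label are hypothetical; the exercises do not prescribe them.

```r
# Exercises 1-2: means of the numeric columns and level counts of
# the factor columns, returned together as a list.
summarise.df <- function(df) {
  is.num <- sapply(df, is.numeric)
  is.cat <- sapply(df, is.factor)
  list(means  = sapply(df[is.num], mean),
       counts = lapply(df[is.cat], table))
}

res <- summarise.df(iris)
res$means
res$counts$Species

# Exercise 3: label a single number by its sign.
sign.label <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

sign.label(3)
sign.label(-2)
sign.label(0)

# Exercises 4-5: loop over the columns, printing the class of each
# and the mean of the numeric ones.
for (nm in names(iris)) {
  col <- iris[[nm]]
  if (is.numeric(col)) {
    cat(nm, class(col), mean(col), "\n")
  } else {
    cat(nm, class(col), "\n")
  }
}
```

Note that sign.label expects a single value; for a whole vector, a vectorised alternative such as ifelse would be needed.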
