---
title: "R Intro For Probability"
author: "Original version by Prof Wagaman, slightly modified by Prof Leise"
output: pdf_document
---

> Instructions: This series of exercises is a warm-up designed to give you some familiarity working in R and RStudio. These exercises are prepared in [R Markdown](http://www.rstudio.com/ide/docs/authoring/using_markdown), a markup language that allows you to easily format your code for readability and reproducibility. For more information about formatting in R Markdown, check out [RStudio](http://www.rstudio.org). The benefit of using R Markdown is that it enables you to seamlessly combine R commands, the output from those commands, and written analysis into one document. Save the file in a convenient place and keep saving your work as you go!

#### Introduction to R

NOTE: R is a programming language - there are often MULTIPLE ways to accomplish a given task. You may see different code here than in the textbook. The textbook code is a good reference, but during the semester, don't forget your class notes and examples too!

Note that R is case-sensitive. That means capitalization and spelling count!

The # in a line of code is a commenting convention. R ignores anything after the # as a comment, but it will still show up in the line of code.

There will be roughly one R problem per week, plus, on occasion, regular textbook homework problems will require some R calculations. We will practice using R during class to help you learn how to use it.

### R is a calculator

When you type in commands in the console, they run. In the R Markdown document, you use chunks (which can be added via the top right menu, or using shortcuts) instead. You can run individual command lines or entire chunks (put your cursor on a line or highlight multiple lines then hit Run).
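
The earlier notes on case sensitivity and comments can be seen in a short chunk. This is only a sketch: the object names `total` and `Total` are made up for illustration and do not appear elsewhere in these exercises.

```{r}
total <- 10    # everything after the # is a comment and is not run
Total <- 25    # a different object: R treats total and Total as distinct names
total + Total  # adds the two distinct objects, giving 35
total == Total # FALSE, since 10 is not equal to 25
```
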
Within an R chunk, or even just in the console at the > prompt, you can perform basic calculations:

```{r}
x<-6 # assigns 6 to x
y<-8 # comments after "#" do not 'run'
z<-3
x+y
x-z
x*y
x*y/z
x^z # of course, you could always just do 6^3 to get the same result
```

### Examples of built-in functions

```{r}
x<-c(3,5,6,8) # creates a vector with 4 entries
# note that this overwrote the previously saved x value!
sum(x) # adds up the entries in x
mean(x) # finds the average value in x
sd(x) # finds the standard deviation of values in x
```

### Using R for Simulation

One of the primary reasons we will be using R and RStudio is to simulate situations to illustrate and reinforce probability concepts. Later, it will also be useful for working with probability distributions.

What can we simulate? Anything we can write code for. Don't panic -- we'll be starting off using code that is already set up for you (either from the text or provided by me), that you can modify as you learn. Again, you may have to make adjustments in the code as you work through problems. The good news is - you can always ask for help with the code!

We will also be learning to write functions. That includes learning a little about algorithms, pseudocode and function inputs. For the most part, we will assume that users of our functions are "smart" - meaning that if you expect an integer input to a function, you will get one, and you don't need to write code to guard against anything else. However, if you are interested in setting defaults, outputting errors for nonsensical inputs, etc., I can help with that.

To get started though, let's just see some simple R commands that could help us simulate.

### Generating Random Numbers

Suppose we want to generate a random number between 0 and 10 (integers only). There are many ways to do this. Both lines below will work. Try them.

```{r}
sample(0:10, 1, replace=TRUE)
round(runif(1, 0, 10), 0)
```

What is different about these commands?
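
One way to explore that question is to run each command many times and tabulate the results. The chunk below is a minimal sketch of that idea; the seed, the number of draws `n`, and the object names are arbitrary choices for illustration and are not part of the original exercise.

```{r}
set.seed(123)  # arbitrary seed so the tallies can be reproduced
n <- 10000     # number of draws from each command (an assumption)

from_sample <- sample(0:10, n, replace = TRUE)  # integers drawn directly
from_runif  <- round(runif(n, 0, 10), 0)        # continuous draws, then rounded

table(from_sample) / n  # proportion of draws landing on each integer
table(from_runif) / n   # compare the proportions for 0 and 10 with the rest
```

If the two commands behaved identically, every integer would appear in about 1/11 of the draws in both tables. In the rounded version, 0 and 10 should appear roughly half as often as the other values, since only draws below 0.5 or above 9.5 round to them.
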
The sample command is sampling from the integers 0 to 10, one value, and reporting it to you. The second command line is a combination of the round command (to get a whole number) and the runif command. runif is sampling 1 value from a continuous uniform distribution on 0 to 10, which is then rounded for you to the nearest integer. One consequence: the rounded version does not make all eleven integers equally likely, because only values below 0.5 round to 0 and only values above 9.5 round to 10, so 0 and 10 each appear about half as often as the other values.

What if you don't know what a command does, and there is no instruction about it? Help for R functions can be accessed by typing ?functionname at the console. For example, if I wanted help for the sample function, I would type ?sample in the console. These commands can cause issues when compiling if they are in an R chunk however (as they try to open an extra window), so comment them out if you are leaving them in your document (or just type them in the console and don't save them) like this:

```{r}
#?sample #because of the # in front of this line, this is a comment and won't execute
```

How would you adjust both these commands to get two random values instead of one? Try it!

> ANSWER:

```{r}

```

The sample command is more useful for us in this context, so let's work with it a little bit more. To simulate rolling a standard 6-sided die:

```{r}
sample(1:6, 1, replace=TRUE)
```

What if we decided to roll the die as many times as necessary until we got two different numbers? Basically, this is sampling from 1:6 twice but the two values obtained need to be distinct, so we're sampling without replacement:

```{r}
sample(1:6, 2, replace=FALSE)
```

Suppose you have given out 40 tickets for three prizes (identical prizes), and individuals cannot win multiple times. Write an example R command to distribute the prizes randomly among the 40 ticket holders, assuming tickets are labelled 1 through 40.

> ANSWER:

```{r}

```

### Setting Seeds

Let's generate two more random numbers - just from 1 to 10, using separate commands.

```{r}
sample(1:10, 1, replace=TRUE)
sample(1:10, 1, replace=TRUE)
```

Were your two randomly generated numbers the same?
(There is a chance that they will be.) Were they the same as those generated by your peers?

Generating random numbers is great except that we like to be able to reproduce our work. If we each use a sample command, we aren't assured of getting the same random number, unless we set the same *seed* for the random number generator. If you run the next code chunk (all of it), your randomly generated value should be 3.

```{r}
set.seed(1)
sample(1:10, 1, replace=TRUE)
```

How do I know the value should be 3? Well, I ran the code once with the seed set to 1 and obtained 3. As long as you set the same seed, you will get the same result. (R 3.6.0 changed the default algorithm behind sample, so a newer version of R may give you a value other than 3 here; what matters is that you get the same value every time you rerun the chunk on your machine.) Now, if you re-execute the sample command without the seed command right before it, you might get a different value.

The trick is to set your random seed before generating any random numbers. That way, you can tell someone else the seed and they can generate the same random numbers you did. It may sound odd to be able to generate the SAME random numbers as someone else, but this is a way to make your work reproducible.

You won't always need to set a seed, but if you find you are getting unexplainable results, and want to ask for help, try setting one so you can have someone else look at your work with the same results. It will also be required anytime we try to make a simulation *reproducible*.

Also, note that if you compile an R Markdown document involving random components, and didn't set a seed, text that you write based on your output may be slightly different than what is in your compiled file. That may be fine for some applications, but you often want to refer to your generated value, so setting a seed makes that easy.

To demonstrate how setting seeds works one more time, run the code chunk below.
```{r}
sample(1:10, 4, replace=TRUE) #4 numbers between 1 and 10 possibly repeating
set.seed(5000)
sample(1:10, 4, replace=TRUE) #5 5 2 10 when the author ran it; the same values every time you rerun with the seed line above
```

Always feel free to ask if you ever have questions about R or anything else in the course!
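
As a closing illustration of the seed discussion and of the earlier remark about writing functions with defaults, here is a minimal sketch. The helper `roll_die`, its default of 6 sides, and the seed 2024 are hypothetical choices made for this example; they are not part of the original exercises.

```{r}
# Hypothetical helper: roll a fair die with `sides` faces `n` times.
roll_die <- function(n, sides = 6) {
  sample(1:sides, n, replace = TRUE)
}

set.seed(2024)             # arbitrary seed chosen for illustration
first_run <- roll_die(5)   # five rolls of a standard 6-sided die

set.seed(2024)             # setting the same seed again restarts the generator
second_run <- roll_die(5)  # so these five rolls repeat the first five

identical(first_run, second_run)  # TRUE: same seed, same rolls
```

Calling `roll_die(5)` a third time without resetting the seed would continue the random stream and would typically give different rolls.
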