---
title: "Simulation demo"
author: "Original version by Prof Wagaman, slightly modified by Prof Leise"
output: pdf_document
---

### Voting Simulation

To demonstrate simulation in action, let's consider a voting scenario. For each answer here that requires running some of the code, you should justify your response with values generated from the simulation. Bear in mind that if you compile the document, you might get slightly different values, unless you set seeds.

SETUP: A voting precinct has only 2000 eligible voters. Suppose you are the consultant hired to watch poll numbers and relay confidence in a win to your candidate. Your candidate needs a simple majority vote to win the election (> 50 percent). Suppose 75 percent of the eligible voters prefer your candidate to his/her opponent and will vote in his/her favor. Now, suppose that only 200 of the 2000 voters actually end up voting in the election. We will also assume that each of the 2000 voters is equally likely to end up being one of the 200 who end up voting.

Based just on this information, what would you tell your candidate about his/her chances of winning the election?

> ANSWER:

We could tackle this problem and find an analytical solution. Or, we can get an empirical answer by simulating this situation using R. The code begins with what are called variable declarations, which store some needed values and create variables like *wins* that are currently filled with 0s but will be filled in as the simulation runs. Then we use a for-loop to run through lots of simulated trials and report the results.
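As a brief aside on the analytical route mentioned above: under the stated assumptions (200 voters drawn without replacement from 1500 supporters and 500 non-supporters), the number of votes for your candidate follows a hypergeometric distribution. The chunk below is a minimal sketch using R's built-in `phyper`; the variable names `supporters`, `others`, `turnout`, and `p_win` are chosen here for illustration and are not part of the lab code.

```{r}
# Assumed setup from the scenario: 2000 eligible voters, 75 percent supporters
supporters <- 1500   # eligible voters who favor your candidate (2000 * 0.75)
others     <- 500    # eligible voters who do not
turnout    <- 200    # voters who actually vote

# P(your candidate receives more than half of the votes cast)
# lower.tail = FALSE gives P(X > q) rather than P(X <= q)
p_win <- phyper(floor(turnout / 2), supporters, others, turnout,
                lower.tail = FALSE)
p_win
```

You can compare this value with the proportion of wins that the simulation reports. The simulation code described above comes next.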
```{r}
# Voting Simulation
n <- 2000     # set number of overall voters
pref <- 0.75  # set preference for your candidate

# create pool of voters, 1's favor your candidate
# (rep() truncates non-integer counts, so round the computed count first)
nfavor <- round(n * pref)
voters <- c(rep(0, n - nfavor), rep(1, nfavor))

samplesize <- 200  # number of voters who actually show up to vote
runs <- 1000       # set number of simulations you want to run

# declare a variable to save win/loss record of election
wins <- rep(0, runs)

# run election simulations
for (i in 1:runs) {  # start loop to run elections
  # sample the number of voters who show up from eligible pool
  votersub <- sample(voters, samplesize)
  # fraction of total votes your candidate receives
  fraction <- sum(votersub) / samplesize
  # records a win if candidate receives more than 50 percent of vote
  if (fraction > .5) {
    wins[i] <- 1
  }
}  # end of loop

# shows the number of wins (1s) and losses/ties (0s) among the set of simulations
table(wins)
```

The table that is output contains 1s and possibly 0s (indicating wins and losses/ties, respectively). The number underneath the 0 or 1 is the number of times the value occurred out of the number of runs you performed. Most likely, your first table will show a 1 with a 1000 under it. That means your candidate won 1000 times out of the 1000 runs of the simulation. That is an estimated 100 percent chance of winning the election!

You can re-run the simulation by running the second half of the code (reset the *wins* variable, then run the loop and the final summary), or just by re-running the whole chunk. You can also run more simulations at once by increasing the value of *runs*. There is no need to go crazy here; under 10000 is reasonable for simple simulations like this.

Run the simulation a few times. Do the results change? Do they confirm what you decided to tell your candidate about their chances of winning?

> ANSWER:

Now, suppose that only 55 percent of the eligible voters prefer your candidate. Your candidate believes they will win simply because 55 is greater than 50 percent.
Without using the simulation, what do you tell your candidate about their chances of winning the election in this setting?

> ANSWER:

Change *pref* in the simulation to 0.55. Run the simulation a few times (or use a number of runs like 5000). Should your candidate be worried about winning the election? Does what you see in the simulation agree with what you thought you should tell your candidate?

> ANSWER:

What if only 100 voters actually show up and vote in the election? Should your candidate be more or less worried about winning the election than before? Use the simulation to check your intuition (you will need to change *samplesize* to 100 instead of 200). Does your answer depend on whether the preference for your candidate is .55 or .75?

> ANSWER:

### Functions

We've been using functions in R that are provided for us. However, it is also possible to write your own functions in R. You will see them in the R code supplied by the textbook. Briefly, I'm going to show you what the voting simulation code above would look like as a function, and how you'd run it in that case. It may look cleaner to you this way. Run the entire code chunk below to create a function that will run the voting simulation.

```{r}
# Voting Simulation as a function
voteSim <- function(n, pref, samplesize, runs) {
  # create pool of voters, 1's favor your candidate
  # (rep() truncates non-integer counts, so round the computed count first)
  nfavor <- round(n * pref)
  voters <- c(rep(0, n - nfavor), rep(1, nfavor))
  # declare a variable to save win/loss record of election
  wins <- rep(0, runs)
  # start loop to run elections
  for (i in 1:runs) {
    # sample the number of voters who show up from eligible pool
    votersub <- sample(voters, samplesize)
    # fraction of vote your candidate receives
    fraction <- sum(votersub) / samplesize
    # records a win if candidate receives more than 50 percent of vote
    if (fraction > .5) {
      wins[i] <- 1
    }
  }  # end of loop
  table(wins)  # shows the number of wins (1s) and losses/ties (0s)
}  # end of function
```

It should show up as a function in your environment tab on the top right.
To run the simulation, you need to supply it with the four input values, and it will show you the table of wins.

```{r}
voteSim(2000, 0.55, 200, 100)  # n (number of voters), pref, samplesize, runs
```

This function is not particularly elegantly written, but it runs. If you put in values that won't work (for example, 1000 voters showing up out of 500 eligible voters), it stops with an error, and the resulting file will not compile. Really well-written functions will put in conditions to be checked before they try to run, and include defaults in case input values are nonsensical. However, that is not the case with most of the code we will be using. It will run, but it can also be "broken". Please ask for assistance if you have issues with code!

Why functions? Functions allow us to explore results of simulations in different settings (input values) easily. We will practice summarizing simulation results this semester, based on the code you have run.

#### Writing your own function

Functions in R have a basic framework. You set a function name and a list of inputs (which appear after "function" but before the opening bracket), then you perform a series of operations on the inputs, and set an output (the last item before the closing bracket). Let's look at a very simple function.

```{r}
addtwo <- function(a, b) {
  total <- a + b
  total
}
addtwo(3, 9)
```

What is the name of this function?

> ANSWER:

What are the inputs to this function?

> ANSWER:

What does the function do?

> ANSWER:

What is the output of the function?

> ANSWER:

Again, bearing in mind that this is a simple function, let's consider the pseudocode that would support you in building this function. According to Wikipedia's Pseudocode entry: "Pseudocode is an informal high-level description of the operating principle of a computer program or other algorithm. It uses the structural conventions of a normal programming language, but is intended for human reading rather than machine reading."
You've figured out that "addtwo" takes two numbers and outputs their sum. Here is example pseudocode that could help you write the associated function:

Determine 2 inputs.

Add the inputs.

Output the total.

This example is very simple and does not include much of the programming language. For example, it does not have any loops and there is no reference to a computer function. We will work up to more complicated functions as the semester progresses.

To wrap up this lab activity, create your own function that generates three random numbers from 0 to 10 and outputs the maximum of the three. Hint: Does this function need any inputs?

> ANSWER:

```{r}

```
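One last aside, returning to the earlier remark that well-written functions check their inputs before running and supply defaults. The chunk below is only a sketch of what that could look like for the voting simulation. The name `voteSimChecked`, the particular checks, the error messages, and the default values (taken from the SETUP scenario) are all assumptions made here for illustration, not code from the lab or the textbook.

```{r}
# Hypothetical variant of voteSim with default arguments and input checks
voteSimChecked <- function(n = 2000, pref = 0.75, samplesize = 200, runs = 1000) {
  # stop early, with a readable message, if the inputs cannot work
  if (pref < 0 || pref > 1) stop("pref must be between 0 and 1")
  if (samplesize > n) stop("samplesize cannot exceed n, the number of eligible voters")
  if (runs < 1) stop("runs must be at least 1")

  # create pool of voters, 1's favor your candidate
  nfavor <- round(n * pref)
  voters <- c(rep(0, n - nfavor), rep(1, nfavor))

  # run the elections and record wins, as in the simulation code
  wins <- rep(0, runs)
  for (i in 1:runs) {
    votersub <- sample(voters, samplesize)
    if (sum(votersub) / samplesize > 0.5) {
      wins[i] <- 1
    }
  }
  table(wins)
}

# valid inputs run as before
voteSimChecked(2000, 0.55, 200, 100)

# invalid inputs now give an informative error;
# wrapping the call in try() lets the rest of the chunk keep running
try(voteSimChecked(500, 0.55, 1000, 100))
```

With the defaults in place, calling `voteSimChecked()` with no arguments runs the original scenario of 2000 eligible voters, 75 percent support, and 200 voters over 1000 runs.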