
Worksheet 3: Models of Risky Choice
PS918 Psychological Models of Choice
Emmanouil Konstantinidis | University of Warwick
March 25, 2021

The main objective of this worksheet is to become more familiar with parameter estimation in risky choice (i.e., choices between risky gambles). We will cover different utility functions and choice rules.

Part 1

Load some R libraries that we will be using throughout:

library(ggplot2); library(ggpubr); library(plot3D)

We will start with a simpler model and then add more features (and parameters). For a refresher on these models, you can go back to the slides from Lecture 2 (and the associated readings).

We will work with the Technion Competition dataset EstimationSet.csv (for more details see Erev et al., 2010). This was the first modelling competition on risky choice. The dataset contains 1,200 rows. There are 60 participants in total, which means 20 data points/choices per participant.

dt <- read.csv("EstimationSet.csv", header = TRUE)

The data.frame has the following variables:

sub: Participant number
id: Choice id
trial: Order of choices
high: Highest outcome in the Risky option
phigh: Probability of the highest outcome in the Risky option
low: Lowest outcome in the Risky option (the probability of this outcome is 1 - phigh)
medium: The outcome in the Safe option (probability is 1)
risky.choice: 1 if the choice was Risky, 0 if the choice was Safe

Consider the first row in the dataset presented below. A participant is facing a choice between a risky prospect A = (-1.5, 0.92; -6.4, 0.08), which reads as a 92% chance to lose 1.5 points or an 8% chance to lose 6.4 points, and a sure-thing alternative B = (-1.8, 1), a certain loss of 1.8 points. All gambles in the dataset follow the same structure. The main variable of interest is risky.choice (see above for its description).

##   sub id trial high phigh   low medium risky.choice
## 1   1 10     1 -1.5  0.92  -6.4   -1.8            0
## 2   1 17     2 -9.7  0.10 -24.7  -23.8            1
## 3   1 24     3  9.2  0.05  -9.5   -7.5            0
## 4   1 53     4 25.7  0.10   8.1   11.5            0
## 5   1 47     5 24.3  0.04   9.7   10.6            0
## 6   1 51     6 13.1  0.94   3.8   12.8            1

Figure 1: A) Distribution of EV differences. B) EV difference as a function of risk taking behaviour.

Activity 1

Take some time to get a sense of the dataset. You can begin by calculating the Expected Value (EV) of each gamble and adding it to the dataset as a separate variable. First, you may want to see the distribution (i.e., histogram) of EV differences. Then, you could plot the choice proportions as a function of the EV difference. This is a good way to check that your data are coded correctly. Some things to consider: Is it good to have gambles closely matched on expected value? Are the proportions of choices following a sensible pattern? Figure 1 shows my results (but no code: you can try to generate it).
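Before computing the expected values, a quick check along these lines (base R only) confirms the structure described above, i.e., 60 participants with 20 choices each:

length(unique(dt$sub))   # number of participants (should be 60)
table(table(dt$sub))     # choices per participant (should all be 20)
summary(dt$phigh)        # probabilities should lie between 0 and 1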
dt$EV.Risky <- with(dt, phigh * high + (1 - phigh) * low)
dt$EV.Safe <- dt$medium
dt$EV.diff <- dt$EV.Risky - dt$EV.Safe

p1 <- ggplot(data = dt) +
  geom_histogram(aes(EV.diff), binwidth = 0.2, colour = "black", fill = "grey70") +
  theme_classic(base_size = 10) +
  labs(x = "EV Difference", y = "Frequency")

p2 <- ggplot(data = dt, aes(x = EV.diff, y = risky.choice)) +
  geom_point(size = 0.1) +
  geom_smooth() +
  theme_classic(base_size = 10) +
  labs(x = "EV Difference", y = "% of Risky Choice")

ggarrange(p1, p2, nrow = 1, widths = c(1, 1.4), labels = c("A)", "B)"),
          font.label = list(size = 10))

Let's now do some modelling. We will start with the Expected Utility model with a simple power-law utility function and a Fechner model of noise. This model will allow us to account for different attitudes towards risk (via the power parameter), but also accommodate the concept of constant error. This can be formally expressed as:

U(x) = x^α    (1)

and the risky option is chosen whenever

[p U(H) + (1 - p) U(L) - U(M)] + ε > 0,   ε ~ N(0, σ)    (2)

Figure 2 illustrates how the model works.

Figure 2: Simple Expected Utility model with Fechner noise, indicating the probability of selecting the Risky (and Safe) option, p(R).

In Figure 2 you can see a hypothetical normal distribution. The Fechner model of choice (and noise) assumes that stochastic (i.e., probabilistic) variation in one's risk preferences results from an error term applied additively (see Equation 2). Based on this specification, the risky option is chosen when p U(H) + (1 - p) U(L) - U(M) + ε > 0. In Figure 2 the probability of choosing the risky option is the area under the curve beyond/above the vertical line at value 0 (i.e., where there is no EU difference between the two options). Naturally, the probability of choosing the safe option is the area under the curve below 0, or 1 - p(R). This area under the probability density function (pdf) of the normal distribution is given by its cumulative distribution function, which we can evaluate in R via the function pnorm. The pnorm function gives us the probability that a value drawn from a normal distribution with mean μ and standard deviation σ falls below (or, with lower.tail = FALSE, above) a given point. In the case of the Fechner model, we can use the EU calculation (i.e., the difference in EU between the two options) as the μ parameter of that distribution.

For the purpose of this exercise, let's assume that α = 0.5 and σ = 1. First, we will define our fechner function. The function takes as inputs the numerical values of a given gamble (i.e., outcomes and probabilities) and the parameters α and σ. Within that function, we define the utility calculation. Finally, we use pnorm with a normal distribution whose mean μ is the utility difference and whose standard deviation σ represents the amount of noise.

fechner <- function(high, phigh, low, medium, alpha, sigma) {
  u <- function(x) { sign(x) * abs(x) ^ alpha }
  pR <- pnorm(q = 0,
              mean = phigh * u(high) + (1 - phigh) * u(low) - u(medium),
              sd = sigma,
              lower.tail = FALSE)
  return(pR)
}

Let's try it out. What would the result be for the following pair of gambles, A = (10, 0.5; 0, 0.5) and B = (5, 1)? Remember that α = 0.5 and σ = 1.

res1 <- fechner(high = 10, phigh = 0.5, low = 0, medium = 5, alpha = 0.5, sigma = 1)
res1

## [1] 0.2562567

So p(R) = 0.26. Why do you think this is the case? What particular effect is displayed here? What would happen if the lotteries were in the loss domain? (You can modify the code and use your own gambles/prospects.)
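As a hint for the loss-domain question, you could mirror the gambles above into losses, for example A = (0, 0.5; -10, 0.5) versus B = (-5, 1); these particular prospects and the object name res2 are only an illustration:

res2 <- fechner(high = 0, phigh = 0.5, low = -10, medium = -5, alpha = 0.5, sigma = 1)
res2

With α = 0.5 this should give p(R) of roughly 0.74: the same utility curvature that produces risk aversion for gains produces risk seeking for losses (the reflection effect).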
Let's apply this function to every row in the dataset.

dt$fechner <- with(dt, fechner(high = high, phigh = phigh, low = low,
                               medium = medium, alpha = 0.5, sigma = 1))
head(dt, n = 10)

##    sub id trial high phigh   low medium risky.choice EV.Risky EV.Safe EV.diff
## 1    1 10     1 -1.5  0.92  -6.4   -1.8            0   -1.892    -1.8  -0.092
## 2    1 17     2 -9.7  0.10 -24.7  -23.8            1  -23.200   -23.8   0.600
## 3    1 24     3  9.2  0.05  -9.5   -7.5            0   -8.565    -7.5  -1.065
## 4    1 53     4 25.7  0.10   8.1   11.5            0    9.860    11.5  -1.640
## 5    1 47     5 24.3  0.04   9.7   10.6            0   10.284    10.6  -0.316
## 6    1 51     6 13.1  0.94   3.8   12.8            1   12.542    12.8  -0.258
## 7    1 55     7 11.4  0.97   1.9   11.0            0   11.115    11.0   0.115
## 8    1 52     8  3.5  0.09   0.1    0.5            0    0.406     0.5  -0.094
## 9    1 42     9 17.9  0.92   7.2   17.1            0   17.044    17.1  -0.056
## 10   1  9    10 -5.7  0.95 -16.3   -6.1            0   -6.230    -6.1  -0.130
##      fechner
## 1  0.5049826
## 2  0.5375080
## 3  0.4849131
## 4  0.3734353
## 5  0.4726218
## 6  0.4766678
## 7  0.4999299
## 8  0.4009206
## 9  0.4887593
## 10 0.4999432

The fechner column indicates the probability that the risky option will be chosen given particular values of the model parameters. The model makes probabilistic (not deterministic) predictions about choosing the risky option.

Activity 2

Repeat the computation several times to see how p(R) changes as a function of risk aversion (i.e., different values of the α parameter). Try values that characterise a risk-neutral or a risk-seeking individual. Are the results sensible?

dt$Fechner.RN <- with(dt, fechner(high, phigh, low, medium, alpha = 1, sigma = 1))
dt$Fechner.RS <- with(dt, fechner(high, phigh, low, medium, alpha = 1.5, sigma = 1))

What is the likelihood of the data given our predictions (based on the assumption that α = 0.5 and σ = 1)? Next, our goal is to determine the values of α and σ that maximise the likelihood of the data. This is an example of a multidimensional optimisation problem. Let's define the likelihood function. The function takes all of our parameters and the gambles' features and values, but also participants' decisions (risky.choice), into consideration. We then sum the log values and return the negative sum, so that minimising this function maximises the likelihood.

ll_fechner <- function(pars, high, phigh, low, medium, risky.choice) {
  alpha <- pars[1]
  sigma <- pars[2]
  p.risky <- fechner(high, phigh, low, medium, alpha, sigma)
  probs <- ifelse(risky.choice == 1, p.risky, 1 - p.risky)
  if (any(probs == 0)) return(1e6)
  return(-sum(log(probs)))
}

Activity 3

Applying the ll_fechner function to the data will give us the (negative) log-likelihood value. Note that we do not distinguish between individuals just yet. Try different parameters for α and σ. Take 10 minutes and see what is the best result that you can obtain.

logRes <- with(dt, ll_fechner(pars = c(alpha = 0.5, sigma = 1),
                              high, phigh, low, medium, risky.choice))
logRes

## [1] 785.0779

This is the (negative) log-likelihood for a given set of parameters (α = 0.5 and σ = 1). What we need to do now is run an optimisation routine so that we can find the parameters that maximise the likelihood (model fitting). We will use the nlminb optimisation routine. Below we save the results of the optimisation in the object sol1. The task is identical to what we did when fitting the DDM in Worksheet 2.

sol1 <- with(dt, nlminb(c(alpha = .3, sigma = 1), ll_fechner,
                        high = high, phigh = phigh, low = low,
                        medium = medium, risky.choice = risky.choice))
sol1

## $par
##     alpha     sigma
## 0.4195855 0.5169065
##
## $objective
## [1] 777.1386
##
## $convergence
## [1] 0
##
## $iterations
## [1] 10
##
## $evaluations
##  function gradient
##        14       27
##
## $message
## [1] "relative convergence (4)"

The sol1 list contains the results.
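If you want to see at a glance which elements it contains, you can list their names:

names(sol1)   # "par" "objective" "convergence" "iterations" "evaluations" "message"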
The first element is par, which has two values: the first is the estimate for α and the second for σ. The element objective gives the (negative) log-likelihood value at those estimates. We can check whether this is correct by running the ll_fechner function with the parameter values in par. As you can see, we get the same result.

attempt1 <- with(dt, ll_fechner(sol1$par, high, phigh, low, medium, risky.choice))
attempt1

## [1] 777.1386

We could try different values by hand, but as long as our MLE algorithm did its job, we should never find a better solution. However, stopping here with the parameter estimation might be a mistake. We now have to evaluate the results of the model fit: we have to establish that our solution (i.e., the parameter estimates) is both unique and the best possible.

First, we compute the average choice proportions across all individuals. This tells us the mean choice proportion for the risky alternative in each choice problem. Next, we obtain the predictions of the model based on the parameter values that we have estimated, and plot the data against the model.

## Aggregate data
risk_data <- aggregate(risky.choice ~ id, data = dt, mean)

## Aggregate model
risk_fechner <- with(dt, fechner(high, phigh, low, medium,
                                 alpha = sol1$par["alpha"],
                                 sigma = sol1$par["sigma"]))
risk_fechner_id <- aggregate(risk_fechner, list(id = dt$id), mean)

## Combine in a single data.frame and plot observed vs. predicted proportions
## (a minimal version of the plotting code; the original call was partly lost,
##  so the aesthetics may differ from the original figure)
risk_all <- merge(risk_data, risk_fechner_id, by = "id")
ggplot(data = risk_all, aes(x = risky.choice, y = x)) +
  geom_point() +
  theme_classic() +
  scale_y_continuous(limits = c(0, 1)) +
  scale_x_continuous(limits = c(0, 1)) +
  labs(x = "Risky Choice: Data", y = "Risky Choice: Model")

On average, our model does not seem to perform that great. We can see a positive relationship, but also a lot of noise. We should investigate the source of the problem. We could be facing local minima, or perhaps the data are not good enough to constrain the parameters of our model.

Since we are dealing with a relatively simple model (just two parameters), we can explore the parameter space graphically. With more complex models this would be very hard; in such cases, a better solution is to perform a grid search. For now, let's explore the parameter space. We will begin by computing the log-likelihood for a grid of different combinations of parameter values and displaying the results in a 3-dimensional graph.

parameters <- expand.grid(alpha = seq(0, 2, .1), sigma = seq(0, 2, .1))
parameters$logll <- with(dt, apply(parameters, 1, ll_fechner,
                                   high = high, phigh = phigh, low = low,
                                   medium = medium, risky.choice = risky.choice))
parameters$logll[parameters$logll == 1e6] <- NA

par(mar = c(1, 1.5, 2, 2))
with(parameters, scatter3D(alpha, sigma, logll, theta = 20, phi = 10, bty = "g",
                           ticktype = "simple", pch = 20, cex = 2, cex.lab = 1.8,
                           xlab = "alpha", ylab = "sigma", zlab = "Log-Likelihood",
                           clab = "LL"))

Figure 4: Log-likelihood values for combinations of α and σ of the Fechner model

The results are troubling. We can see that the log-likelihood space is very flat in places. What this means is that there are many combinations of the two parameters that produce equally good fits.
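As a quick complementary check, you could also restart nlminb from several different starting values and see whether the solutions agree; with a flat likelihood surface, different starts can land on noticeably different parameter values with near-identical log-likelihoods. A minimal sketch (the particular starting values and the object names starts and refits are arbitrary):

starts <- expand.grid(alpha = c(0.2, 0.8, 1.5), sigma = c(0.2, 1, 2))
refits <- apply(starts, 1, function(s) {
  with(dt, nlminb(s, ll_fechner, high = high, phigh = phigh, low = low,
                  medium = medium, risky.choice = risky.choice))
})
# Compare estimates and objective values across starting points
t(sapply(refits, function(f) c(f$par, logll = f$objective)))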
We will now explore this in more detail. What happens if we vary α and keep σ constant (and vice versa)?

# Effect of varying alpha, with sigma fixed at its best-fitting value
parms_sigma <- as.data.frame(cbind(alpha = seq(0, 2, .01), sigma = sol1$par["sigma"]))
parms_sigma$logll <- with(dt, apply(parms_sigma, 1, ll_fechner,
                                    high = high, phigh = phigh, low = low,
                                    medium = medium, risky.choice = risky.choice))
parms_sigma$logll[parms_sigma$logll == 1e6] <- NA

# Effect of varying sigma, with alpha fixed at its best-fitting value
parms_alpha <- as.data.frame(cbind(alpha = sol1$par["alpha"], sigma = seq(0, 2, .01)))
parms_alpha$logll <- with(dt, apply(parms_alpha, 1, ll_fechner,
                                    high = high, phigh = phigh, low = low,
                                    medium = medium, risky.choice = risky.choice))
parms_alpha$logll[parms_alpha$logll == 1e6] <- NA

# Plots
p_sigma <- ggplot(data = parms_sigma, aes(x = alpha, y = logll)) +
  geom_point(colour = "maroon", shape = 21) +
  theme_classic() +
  theme(text = element_text(size = 14)) +
  labs(x = expression(alpha), y = "Log-Likelihood",
       title = bquote(sigma == .(round(sol1$par["sigma"], 2))))

p_alpha <- ggplot(data = parms_alpha, aes(x = sigma, y = logll)) +
  geom_point(colour = "maroon", shape = 21) +
  theme_classic() +
  theme(text = element_text(size = 14)) +
  labs(x = expression(sigma), y = "Log-Likelihood",
       title = bquote(alpha == .(round(sol1$par["alpha"], 2))))

ggarrange(p_sigma, p_alpha, nrow = 1)

Figure 5: Log-likelihood values for each parameter of the Fechner model while keeping the other constant (left: varying α with σ = 0.52; right: varying σ with α = 0.42)

Clearly, varying the parameter values makes hardly any difference to the log-likelihood. We can conclude that a) our best-fitting model does not fit the data very well, and b) the parameters are not well constrained by the data.

Part 2

There are many different ways in which Cumulative Prospect Theory (CPT) can be implemented. There are many parametric functions that one can choose for the probability weighting and value (utility) functions, and different assumptions about how a choice is made (like the Fechner model in Part 1). Stott (2006) catalogues variants of each of these components which, in combination, generate 256 different versions of CPT. As you would probably expect, each version will likely generate different results.
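To give a flavour of one such component, here is a sketch of a commonly used one-parameter probability weighting function (the form used by Tversky and Kahneman, 1992). It is shown purely for illustration, it is not the version we implement below, and the function name tk_weight and the example value of delta are arbitrary:

# One-parameter weighting function: w(p) = p^delta / (p^delta + (1 - p)^delta)^(1/delta)
tk_weight <- function(p, delta) {
  p^delta / (p^delta + (1 - p)^delta)^(1 / delta)
}

# delta < 1 gives the typical inverse-S shape: small probabilities are
# overweighted, moderate-to-large probabilities are underweighted
curve(tk_weight(x, delta = 0.61), from = 0, to = 1, xlab = "p", ylab = "w(p)")
abline(a = 0, b = 1, lty = 2)  # identity line for comparison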
Fitting CPT can be a challenging task. There are many parameters in the model, and parameter recovery is often not that great for the more complex versions (i.e., those with more parameters; see Nilsson, Rieskamp, & Wagenmakers, 2011). In order to determine how well a model can fit the data, it is important to identify whether the model has the ability to recover its parameters. This is an important property of any computational/cognitive model.

As we did with the DDM, we will investigate whether the model parameters can be recovered when we generate responses using a pre-determined set of parameter values. The recoverability of parameters also depends on the stimuli that are used to elicit people's choices. Imagine that we asked people to make choices between extremely risky options and a safe one. We might expect that most people would be perfectly happy to pick the safe option. But if everyone does, we could not possibly estimate people's risk-aversion parameter: we would find many parameter values offering us an equally good fit to the data. This is why we need to ensure that our stimuli are "diagnostic" and can capture realistic ranges of parameter values.

So far, we have explored a simple model. The Fechner model is quite special: thanks to its error structure, we could use pnorm (the cumulative distribution function of the Normal distribution) to calculate the probability of picking either the risky or the safe option. This will not be sufficient for CPT, however, so we need a more general way to calculate model predictions. Also, unlike the DDM worksheet, we do not have convenient packages (like rtdists) to generate model predictions (there are some R packages for CPT and other models of risky choice, but their validity has to be checked). We have to start by specifying the model and then generate these predictions ourselves.

For this worksheet, we will be using the logit choice rule, which transforms our values for each option into choice probabilities (not to be confused with an option's outcome probabilities). We will use this stochastic rule to obtain our p(C) (i.e., the probability of choosing any given option). The probability of choosing option A (over B) is given as:

p(A) = 1 / (1 + e^(-τ [V(A) - V(B)]))    (3)

Does this rule remind you of something? If so, try both functional forms with some hypothetical values and see whether you arrive at the same result.

Activity 4

Let's try to rewrite our Fechner model into a standard EU model with a power utility function and the logit choice rule. Let's also plot the results when we apply the new function to each row of our data. Below is the result (but no code).

logit <- function(high, phigh, low, medium, gamma, tau) {
  u <- function(x) { sign(x) * abs(x) ^ gamma }
  v.a <- phigh * u(high) + (1 - phigh) * u(low)
  v.b <- u(medium)
  v.diff <- v.a - v.b
  1 / (1 + exp(-tau * v.diff))
}

dt$Logit <- with(dt, logit(high, phigh, low, medium, gamma = 0.5, tau = 1))

ggplot(data = dt) +
  geom_histogram(aes(x = Logit), binwidth = 0.02) +
  theme_classic2() +
  ylab("Frequency")

fechner(10, .5, 0, 5, .5, 1)

## [1] 0.2562567

logit(10, .5, 0, 5, .5, 1)

## [1] 0.3418796

Activity 5

Apply the new logit function with the bias (or τ) parameter to the entire dataset. Evaluate the fit. Re-implement all of the code we covered in Part 1. Make sure that everything is in working order.

Earlier we mentioned grid search as a useful way of determining whether our model has local minima: basically, we would expect the results of our optimisation not to change as we vary the starting parameter values. We can also use a grid of parameter values to test parameter recoverability. To do this, we want to simulate data based on some assumed parameter values. We can then fit the model again to see whether our initial parameters are recovered.
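For that refitting step you will need a likelihood function for the logit model. A minimal sketch, directly analogous to ll_fechner from Part 1 (the name ll_logit is my own choice; adapt it as you see fit):

ll_logit <- function(pars, high, phigh, low, medium, risky.choice) {
  gamma <- pars[1]
  tau <- pars[2]
  p.risky <- logit(high, phigh, low, medium, gamma, tau)
  probs <- ifelse(risky.choice == 1, p.risky, 1 - p.risky)
  if (any(probs == 0)) return(1e6)
  return(-sum(log(probs)))
}

Once you have simulated responses (see Activity 6 below), you can pass them to nlminb with ll_logit in place of risky.choice and compare the recovered estimates with the generating parameter values.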
Activity 6

For the rest of this session, we will simulate some data to see whether our model's parameters can be recovered. Here is some code to get you started. We will first generate model predictions based on some assumed parameter values.

gamma.start <- 0.88
bias.start <- 1.1
dt$Logit <- with(dt, logit(high, phigh, low, medium, gamma = gamma.start, tau = bias.start))
head(dt)

##   sub id trial high phigh   low medium risky.choice EV.Risky EV.Safe EV.diff
## 1   1 10     1 -1.5  0.92  -6.4   -1.8            0   -1.892    -1.8  -0.092
## 2   1 17     2 -9.7  0.10 -24.7  -23.8            1  -23.200   -23.8   0.600
## 3   1 24     3  9.2  0.05  -9.5   -7.5            0   -8.565    -7.5  -1.065
## 4   1 53     4 25.7  0.10   8.1   11.5            0    9.860    11.5  -1.640
## 5   1 47     5 24.3  0.04   9.7   10.6            0   10.284    10.6  -0.316
## 6   1 51     6 13.1  0.94   3.8   12.8            1   12.542    12.8  -0.258
##     fechner Fechner.RN   Fechner.RS     Logit
## 1 0.5049826 0.46334902 2.841817e-01 0.4871304
## 2 0.5375080 0.72574688 9.954296e-01 0.6088608
## 3 0.4849131 0.14343796 2.025795e-09 0.3292695
## 4 0.3734353 0.05050258 8.848345e-08 0.2170825
## 5 0.4726218 0.37600125 2.365132e-01 0.4364530
## 6 0.4766678 0.39820345 2.173946e-01 0.4482567

Remember that the new column simply represents the predictions p(C) of the model. We can follow a simple rule in which we assign a single choice (1 or 0) based on p(C). The code below illustrates this, but you will need to run many iterations. There are two things you want to iterate over: combinations of parameter values, and multiple simulated datasets for any given pair of parameter values.

decision.generator <- function(probability) {
  # Draw a uniform random number and choose the risky option (1)
  # with probability equal to the model's prediction p(C)
  r.prob <- runif(1, min = 0, max = 1)
  choice <- ifelse(r.prob <= probability, 1, 0)
  return(choice)
}

dt$simulated.responses <- sapply(X = dt$Logit, FUN = decision.generator, simplify = TRUE)
head(dt)

##   sub id trial high phigh   low medium risky.choice EV.Risky EV.Safe EV.diff
## 1   1 10     1 -1.5  0.92  -6.4   -1.8            0   -1.892    -1.8  -0.092
## 2   1 17     2 -9.7  0.10 -24.7  -23.8            1  -23.200   -23.8   0.600
## 3   1 24     3  9.2  0.05  -9.5   -7.5            0   -8.565    -7.5  -1.065
## 4   1 53     4 25.7  0.10   8.1   11.5            0    9.860    11.5  -1.640
## 5   1 47     5 24.3  0.04   9.7   10.6            0   10.284    10.6  -0.316
## 6   1 51     6 13.1  0.94   3.8   12.8            1   12.542    12.8  -0.258
##     fechner Fechner.RN   Fechner.RS     Logit simulated.responses
## 1 0.5049826 0.46334902 2.841817e-01 0.4871304                   0
## 2 0.5375080 0.72574688 9.954296e-01 0.6088608                   0
## 3 0.4849131 0.14343796 2.025795e-09 0.3292695                   0
## 4 0.3734353 0.05050258 8.848345e-08 0.2170825                   0
## 5 0.4726218 0.37600125 2.365132e-01 0.4364530                   1
## 6 0.4766678 0.39820345 2.173946e-01 0.4482567                   0

References

Erev, I., Ert, E., Roth, A. E., Haruvy, E., Herzog, S. M., Hau, R., ... Lebiere, C. (2010). A choice prediction competition: Choices from experience and from description. Journal of Behavioral Decision Making, 23(1), 15-47. doi: 10.1002/bdm.683

Nilsson, H., Rieskamp, J., & Wagenmakers, E.-J. (2011). Hierarchical Bayesian parameter estimation for cumulative prospect theory. Journal of Mathematical Psychology, 55(1), 84-93. doi: 10.1016/j.jmp.2010.08.006

Stott, H. P. (2006). Cumulative prospect theory's functional menagerie. Journal of Risk and Uncertainty, 32(2), 101-130. doi: 10.1007/s11166-006-8289-6