Programming example: DATA5207

Lecture 3: Confounding factors and human behaviour
DATA5207: Data analysis in the social sciences
Dr Shaun Ratcliff

The world is not simple.

[Figure: Earnings by height. x-axis: Height (inches); y-axis: Annual earnings (1991 US$).]

Confounding factors.

[Figure: Earnings by height and gender. x-axis: Height (inches); y-axis: Annual earnings (1991 US$); colour: gender (Female, Male).]

This was made with just this code:

    library(ggplot2)
    library(scales)  # provides dollar for the axis labels

    ggplot(earnings.data, aes(y = earn, x = height, colour = gender, fill = gender)) +
      # overall trend line, ignoring gender
      geom_smooth(method = "lm", fullrange = TRUE, colour = "black", se = FALSE, aes(group = 1)) +
      # separate trend line for each gender
      geom_smooth(method = "lm", fullrange = TRUE) +
      geom_jitter(alpha = .5, width = 2, height = 1000) +
      scale_color_manual(values = c("red", "blue")) +
      labs(title = "Earnings by height and gender",
           y = "Annual earnings (1991 US$)",
           x = "Height (inches)") +
      scale_y_continuous(labels = dollar, breaks = c(0, 25000, 50000, 75000, 100000)) +
      coord_cartesian(ylim = c(0, 100000), xlim = c(58, 80)) +
      theme_bw()

Linear regression

y = α + βx + ε

Assuming y is your dependent variable, x is your substantive predictor and X is all your controls (other independent variables):

When you fit a regression, it shows you the estimated change in y when x changes by 1, holding X constant (at zero or their baseline category).

If some subset of X is a confounding factor for x, the coefficient on x will differ from the one you would get from a regression fitted without these controls.
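To make the last point concrete, here is a minimal simulated sketch in base R. The data and the names group, x, y, naive.model and adjusted.model are invented for illustration; they are not the lecture's earnings data. The binary variable group plays the role of a confounder because it shifts both x and y.

```r
# Simulated illustration only: invented data, not the lecture's earnings data.
set.seed(1)
n <- 1000

# group is a binary confounder: it shifts both x and y
group <- rbinom(n, size = 1, prob = 0.5)
x <- 5 * group + rnorm(n)            # x is higher on average when group == 1
y <- 1 * x + 10 * group + rnorm(n)   # the true effect of x on y is 1

# Without the control, the coefficient on x absorbs part of the group effect
naive.model <- lm(y ~ x)

# With the control, the coefficient on x is estimated holding group constant
adjusted.model <- lm(y ~ x + group)

coef(naive.model)["x"]
coef(adjusted.model)["x"]
```

Under these assumptions the coefficient on x from naive.model should come out well above 1, while the one from adjusted.model should sit close to the true value of 1.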
Fitting linear regression in R

    library(arm)  # provides display()

    earnings.model <- lm(earn ~ z.height + race2 + gender + z.age, data = earnings.data)
    display(earnings.model)

    ## lm(formula = earn ~ z.height + race2 + gender + z.age, data = earnings.data)
    ##               coef.est  coef.se
    ## (Intercept)   16444.11   758.48
    ## z.height       4557.52  1427.38
    ## race2Black    -2632.00  1742.56
    ## race2Other     1620.40  3590.42
    ## race2Hispanic -4072.34  2141.91
    ## genderMale    11240.25  1450.25
    ## z.age          4252.46  1137.31
    ## ---
    ## n = 1377, k = 7
    ## residual sd = 18354.27, R-Squared = 0.14

Regression as a tool

Why use regression

Three major uses, in academia and in the private and public sectors:

Controlling for confounding factors.
Smoothing.
Prediction.

The research project

In the labs.

Labs

Fitting linear regressions in R.
Reading regression output.
Standardising variables.
Plotting regression estimates.
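As a small preview of standardising variables, the sketch below z-scores two columns in base R. The data frame toy.data and its values are made up, and the slides do not say how z.height and z.age were built, so this is one plausible approach and not necessarily the one used in the labs. Some workflows divide by two standard deviations instead of one, for example.

```r
# Hypothetical toy data: column names mirror the lecture's model, values are made up.
toy.data <- data.frame(
  height = c(62, 65, 68, 70, 73),
  age = c(25, 34, 41, 52, 60)
)

# z-score by hand: subtract the mean, then divide by one standard deviation
toy.data$z.height <- (toy.data$height - mean(toy.data$height)) / sd(toy.data$height)

# scale() performs the same centring and scaling; as.numeric() drops its matrix attributes
toy.data$z.age <- as.numeric(scale(toy.data$age))

round(mean(toy.data$z.height), 10)  # centred on zero
sd(toy.data$z.height)               # one unit is now one standard deviation
```

With predictors on this scale, a coefficient is read as the estimated change in the outcome for a one standard deviation change in that predictor, which makes coefficients on differently measured variables easier to compare.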