MG107: Management and Analytics in the Age of Big Data
Lecture 9: Linear patterns and simple regression
Noam Yuchtman
July 20, 2023
Course overview
Part 5: Building regression models
Lectures 9-11
You want to build a model to predict and
explain outcomes from data you have
What’s the best model you can build?
Today: constructing and interpreting the OLS
regression line and inference in regression
Outline
Constructing the OLS regression line
Interpreting and using the OLS model
The importance of the residuals
Inference in regression models
A bit more on the big picture . . .
Why regression models?
The average effect of x on y is often important for
decision-making in a range of organizations
We usually don’t have data that allow for a
specific comparison or a specific estimate of a
parameter
A general model can provide information about
these comparisons and estimates in the absence
of the data we’d need to run the tests from earlier in
the course
Pick a job, I’ll give you a regression
Consultants: reg price competition
HR: reg performance interview_score
Finance: reg share_price quarter_sales
Education: reg test_scores spending
Fitting a line to data
Diamonds example
You have a sample of several hundred diamonds
Based on this evidence, you’d like to know:
1. What is the general relationship between price and
weight?
2. What is the predicted price of a diamond that weighs
0.4 carat?
3. How much more do diamonds that weigh 0.5 carat cost?
To answer those questions, you want to build a linear
model that links the weight of a diamond to price
Does a linear model make sense?
How do you build the model?
Scatterplot of price against weight
Linear association is clear (correlation, r = 0.66)
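To make the correlation coefficient concrete, here is a minimal Python sketch of how r could be computed by hand. The helper name `pearson_r` and the five data points are hypothetical toy values chosen for illustration; they are not the lecture's diamond sample, so the result will not match r = 0.66.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical toy data (NOT the lecture's diamonds): weight in carats, price in dollars
weights = [0.30, 0.35, 0.40, 0.45, 0.50]
prices = [600, 800, 900, 1200, 1300]

r = pearson_r(weights, prices)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 indicates a tight linear pattern; a value near 0 indicates little linear association.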
What makes a good model?
Many different lines can be drawn through the cloud of
data
You want to estimate a relationship between weight and
price that predicts price in the following way:
1. The prediction isn’t systematically wrong in one direction
2. Distance between prediction and actual price is small
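The two criteria above can be checked numerically for any candidate line. The sketch below is one way to do that, assuming the first criterion is measured by the average residual (actual minus predicted) and the second by the sum of squared residuals. The data and both candidate lines are hypothetical, and the function names are illustrative.

```python
def residuals(b0, b1, xs, ys):
    """Actual value minus the line's prediction, for each observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def mean_residual(res):
    """Criterion 1: close to zero means no systematic over- or under-prediction."""
    return sum(res) / len(res)

def sum_squared(res):
    """Criterion 2: smaller means predictions sit closer to actual prices."""
    return sum(e * e for e in res)

# Hypothetical toy data (NOT the lecture's diamonds)
weights = [0.30, 0.35, 0.40, 0.45, 0.50]
prices = [600, 800, 900, 1200, 1300]

# Two hypothetical candidate lines, each given as (b0, b1)
line_a = (-480.0, 3600.0)
line_b = (0.0, 2000.0)

res_a = residuals(*line_a, weights, prices)
res_b = residuals(*line_b, weights, prices)

for name, res in [("A", res_a), ("B", res_b)]:
    print(name, round(mean_residual(res), 2), round(sum_squared(res), 2))
```

On this toy data, line B under-predicts on average (its mean residual is positive) and has a much larger sum of squared residuals, so line A does better on both criteria.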
The Ordinary Least Squares regression
The most important data analysis tool there is!
Any linear model will look like the following:
Estimated Price = b0 + b1 × Weight
Why? Because this is just the equation for a line.
The question is how to choose b0 and b1 to make the model work well.
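Ordinary least squares answers this by choosing the b0 and b1 that minimize the sum of squared residuals. For a single predictor the solution has a closed form: b1 is the sum of cross-deviations divided by the sum of squared deviations in x, and b0 = mean(y) - b1 × mean(x). The Python sketch below applies this to hypothetical toy data (not the lecture's diamonds) and then uses the fitted line for the diamond questions. Reading question 3 as a comparison between 0.5 and 0.4 carat is an assumption made here for illustration.

```python
def ols_fit(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return b0, b1

# Hypothetical toy data (NOT the lecture's diamonds): weight in carats, price in dollars
weights = [0.30, 0.35, 0.40, 0.45, 0.50]
prices = [600, 800, 900, 1200, 1300]

b0, b1 = ols_fit(weights, prices)

# Question 2: predicted price of a 0.4-carat diamond
predicted_04 = b0 + b1 * 0.4

# Question 3 (assumed comparison: 0.5 vs 0.4 carat): predicted price difference
extra_for_tenth_carat = b1 * (0.5 - 0.4)

print(f"Estimated Price = {b0:.1f} + {b1:.1f} * Weight")
print(f"Predicted price at 0.4 carat: {predicted_04:.1f}")
print(f"Predicted difference, 0.5 vs 0.4 carat: {extra_for_tenth_carat:.1f}")
```

The slope b1 answers question 1 directly: it is the predicted change in price per one-carat increase in weight, so a 0.1-carat difference corresponds to b1 × 0.1.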