Programming Case: 1PIZZA TEAM

1PIZZA TEAM 1
Dr. Ramos
Wang, Wu 2
Introduction
The subject of our case is pizza: we analyzed the frozen pizza category as a whole to meet Pizza Y's needs.
This project investigates the scores that Consumers Union assigned to frozen pizzas.
The related factors are Calories, Fat content, and Type. A ‘standard slice’ was defined prior to
the study, and a ‘Cost’ variable is also included. We use regression analysis to explain the
Consumers Union score and to assess the regression model and its variables
quantitatively.
Analysis & Report
Classification of Data:
● The variable of interest is the score given by Consumers Union.
● The population is all frozen pizzas sold.
● The sample is the frozen pizzas rated by Consumers Union.
● The summary values for score, cost, calories, and type are all statistics, since they come from the sample.
● No parameters are presented in this case.
● Inferential statistics: the data from this sample (frozen pizzas rated by Consumers Union)
can be used to make predictions about the population (frozen pizzas sold).
Scientific Question: Which variables affect the Consumers Union score of frozen
pizza?
By analyzing our data, we can classify the variables into different categories.
● Nominal: Type.
● Quantitative: Score, cost, calories, and fat.
● All of the above are cross-sectional data.
● Primary: Calories, fat, cost, and type.
● Secondary: Score.
The data are relatively reliable, but because the scores are assigned by reviewers, the definition of a ‘standard slice’ could
be misleading.
The next step is to compute the sample statistics. For the full sample, the mean is 65.4688,
the standard deviation is 19.3890, the variance is 375.9345, the median is 68.5, and the mode is 67. Because
we have two different types, we divide the data into two parts, using a dummy variable to
distinguish pepperoni (figure 1) from cheese (figure 2), and calculate the following statistics.
For Pepperoni: the mean is 75.2857, the standard deviation is 15.1120, the variance is 228.3736, the median is
78, and there is no mode. As a result, the distribution for Pepperoni should be slightly left skewed,
because its mean is lower than its median, but roughly normal. For Cheese:
the mean is 57.8333, the standard deviation is 19.2300, the variance is 369.7941, the median is 59, and the mode is
39. As a result, the distribution of Cheese scores also tends to be left skewed, because its mean is
lower than its median. By the empirical rule, around 68% of the Cheese scores fall within one standard
deviation of the mean (38.60, 77.06), 95% within two (19.37, 96.29), and 99.7% within three (0.14, 115.52). Of the 32
frozen pizzas, 14 are pepperoni (14/32=43.75%). To find the probability that exactly
10 pizzas in a random sample of size n=25 are pepperoni, we use BINOM.DIST(10,25,0.4375,FALSE), which gives
about a 15% chance that exactly 10 of the 25 are pepperoni pizzas.
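The binomial probability and the empirical-rule intervals can also be reproduced outside Excel. The following is a minimal Python sketch, assuming the Cheese mean and standard deviation reported above; the function names are illustrative and not part of the original analysis.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials.

    This is the quantity Excel returns for BINOM.DIST(k, n, p, FALSE).
    """
    return comb(n, k) * p**k * (1 - p)**(n - k)

def empirical_rule(mean, sd):
    """Intervals of 1, 2, and 3 standard deviations around the mean."""
    return [(mean - k * sd, mean + k * sd) for k in (1, 2, 3)]

# 14 of the 32 rated pizzas are pepperoni.
p_pepperoni = 14 / 32
prob_exactly_10 = binom_pmf(10, 25, p_pepperoni)

# Cheese mean and standard deviation as reported in the text.
cheese_intervals = empirical_rule(57.8333, 19.2300)

print(f"P(X = 10) = {prob_exactly_10:.4f}")
for (low, high), share in zip(cheese_intervals, ("68%", "95%", "99.7%")):
    print(f"{share}: ({low:.2f}, {high:.2f})")
```

The empirical rule only approximates these shares when the scores are roughly bell-shaped, so the intervals should be read as a rough guide.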
The charts are displayed below:
Second, we need to make assumptions in order to construct a
confidence interval. There are four assumptions. One, the data must be a random sample from
a large population. Two, the observations in the sample must be independent of each other. Three, if
n is small, the population distribution must be approximately normal. Four, if n is large, the
population need not be approximately normal. We use the t-distribution because we know the sample mean,
sample size, and sample standard deviation, but the population mean and standard
deviation are unknown. We then compute the statistics needed to construct the
confidence interval: x-bar=65.4688, S=19.3890, and n=32 (so the CLT applies). With this
information, we can find the margin of error at confidence levels of
95% and 99% and calculate the confidence intervals. For 95% confidence,
ME=6.990 and t-critical=2.0395. For 99% confidence, ME=9.405 and t-critical=2.7440. So, the 95%
confidence interval is (58.4783, 72.4593), and the 99% confidence interval is (56.0635, 74.8740).
The value 72.5 falls inside the 99% confidence interval but outside the 95% confidence
interval, so at the 99% confidence level the sample is consistent with the claim that the average score is
72.5, but at the 95% confidence level the sample does not support this claim.
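The margin-of-error arithmetic can be checked with a few lines of Python. This is a sketch under stated assumptions: the t-critical values for 31 degrees of freedom are copied from the report rather than computed, and `t_interval` is an illustrative helper name.

```python
from math import sqrt

def t_interval(mean, sd, n, t_critical):
    """Return (margin_of_error, lower, upper) for a t-based confidence interval."""
    margin = t_critical * sd / sqrt(n)
    return margin, mean - margin, mean + margin

x_bar, s, n = 65.4688, 19.3890, 32

# Critical values for df = n - 1 = 31, taken from the report (not computed here).
for level, t_crit in ((0.95, 2.0395), (0.99, 2.7440)):
    me, lower, upper = t_interval(x_bar, s, n, t_crit)
    contains_claim = lower <= 72.5 <= upper
    print(f"{level:.0%}: ME={me:.3f}, CI=({lower:.4f}, {upper:.4f}), "
          f"contains 72.5: {contains_claim}")
```

Passing the critical value in as an argument keeps the sketch free of third-party dependencies; a statistics library could supply it instead.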
We also need to make assumptions in order to conduct a
hypothesis test. The assumption for the hypothesis test, a one-sample t-test in this case, is that the data are independently
sampled from a normal distribution. We then set up the hypotheses H0: mu<=59 and Ha: mu>59. From the data we get t-statistic=1.8873 and t-critical=1.8100. In addition,
the cumulative probability is 0.9657, so the right-tailed P-value=1-0.9657=0.0343. Since t-statistic > t-critical, we reject H0 and
conclude at the 0.04 significance level that the population mean is larger than 59.
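As a check on the test statistic and the right-tailed p-value, here is a minimal Python sketch using only the standard library. It assumes a one-sample, right-tailed t-test with 31 degrees of freedom and approximates the tail area by Simpson's rule; the helper names are illustrative. The value 0.9657 in the report appears to match the cumulative probability to the left of the t-statistic, so the right tail is its complement.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def right_tail_p(t_stat, df, steps=2000):
    """P(T > t_stat) for t_stat >= 0, using Simpson's rule on [0, t_stat].

    steps must be even for Simpson's rule.
    """
    h = t_stat / steps
    total = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the distribution lies above 0; subtract the area on [0, t_stat].
    return 0.5 - total * h / 3

x_bar, s, n, mu0 = 65.4688, 19.3890, 32, 59
t_stat = (x_bar - mu0) / (s / sqrt(n))
p_value = right_tail_p(t_stat, n - 1)

print(f"t = {t_stat:.4f}, right-tailed p = {p_value:.4f}")
print("reject H0 at alpha = 0.04:", p_value < 0.04)
```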
In this part, we build a Simple Linear Regression Model. The first step is to identify the
quantitative variables: score, calories, fat, and cost. We construct scatter plots to show
the relationship between Score (y) and each independent variable, as listed below.
Scatter Plot of Score (y axis) vs. Cost (x axis):
Scatter Plot of Score (y axis) vs. Calories (x axis):
Scatter Plot of Score (y axis) vs. Fat (x axis):
Then we calculate the sample correlation coefficient between Score and each independent variable. The correlation
coefficient of Score vs. Cost is 0.0473, of Score vs. Calories is 0.4362, and
of Score vs. Fat is 0.2765. Score and Calories therefore
have the strongest linear association. The general formula is Y-hat=b0+b1X. Applying simple
regression to Score (y) and Calories (x) at the 5% significance level, we get the output below.
From this table, we find that the slope equals 0.2457 and the equation is
Y-hat=-18.1416+0.2457X. The slope means that for each one-unit increase in the calories of a frozen pizza,
the Consumers Union score increases by 0.2457 on average. R square is 0.1903, which means
that 19.03% of the total variation in y can be explained by the independent variable x. The predicted
score of a 310-calorie pizza is about 58.03 (chart below). Also, from the table we can see that the
95% confidence interval for the average score of pizzas with 310 calories is
(49.4515, 66.6192). The 95% prediction interval for the score of an individual pizza with
310 calories is (20.8119, 95.2588).
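The point prediction and R square can be verified directly from the fitted coefficients. The following Python sketch assumes the intercept, slope, and correlation reported in the text; `predict_score` is an illustrative name.

```python
def predict_score(calories, intercept=-18.1416, slope=0.2457):
    """Predicted Consumers Union score from the simple linear regression."""
    return intercept + slope * calories

# In simple linear regression, R square equals the squared correlation.
r = 0.4362
r_squared = r ** 2

predicted = predict_score(310)
print(f"Predicted score at 310 calories: {predicted:.2f}")
print(f"R square from r: {r_squared:.4f}")
```

Both reported intervals are centered on this prediction; the prediction interval is wider because it also covers the scatter of individual pizzas around the regression line.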
To determine whether there is a linear relationship between y and x, we conduct a
hypothesis test with H0: beta1 is equal to 0 and Ha: beta1 is not equal to 0. For Calories,
p-value=0.0126, t-statistic=2.6554, and t-critical=1.8789. Since t-statistic > t-critical, we reject H0 and
conclude that there is a significant linear relationship between x and y.
At the 94% confidence level:
From the output above, the confidence interval for the population slope is (0.0648, 0.4266), which does
not include 0.
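The slope test can be cross-checked from the correlation alone, because in simple linear regression the slope t-statistic equals r*sqrt(n-2)/sqrt(1-r^2). The sketch below assumes the reported r, n, slope, and interval endpoints; the variable names are illustrative.

```python
from math import sqrt

r, n = 0.4362, 32
slope = 0.2457

# t-statistic for H0: beta1 = 0, derived from the sample correlation.
t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Standard error of the slope implied by the slope and its t-statistic.
se_slope = slope / t_stat

# Reported slope interval; its half-width divided by the standard error
# gives the t multiplier that was used to build it.
lower, upper = 0.0648, 0.4266
implied_t_multiplier = (upper - lower) / 2 / se_slope

print(f"t-statistic = {t_stat:.4f}")
print(f"SE(slope) = {se_slope:.4f}")
print(f"implied t multiplier = {implied_t_multiplier:.4f}")
```

The implied multiplier is worth comparing with the t-critical quoted for the test, since the interval and the test need not use the same significance level.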
To test the assumptions, we already saw from the scatter plot that the relationship between x and y
is linear. The residuals show no pattern, but they are not evenly distributed
around the zero line, so the normality assumption is violated. We therefore
conclude that the results from SLR are not reliable. The scatter charts are shown below.
Next, we develop a multiple regression model to predict the Score (y) using all
the other variables of interest. First, we identify the qualitative variables. The only qualitative
variable is type; we code pepperoni as dummy 0 and cheese as dummy 1.
R square=0.385, which means 38.5% of the total variation in the score can be explained by
model 1. Adjusted R square is 0.2941. Then we use the F-test, where H0 is
beta1=beta2=beta3=beta4=0 and Ha is that not all betas equal 0. From the ANOVA table of the multiple regression, we can see
that F-statistic=4.2291 and F-critical=F(0.08, 4, 27). Since F-statistic > F-critical, we
reject H0 and conclude that at a significance level of 0.08, the model is significant. At the 70%
confidence level, we then apply the t-test to the coefficients of the independent variables and see that
cost is not significant, while the other variables are significant.
Cost should therefore be eliminated, and we build model 2. From the model in Excel, we know
that R square is 0.3842, which means 38.42% of the total variation in the score can be explained by
model 2, about the same as for model 1. Adjusted R square is 0.3182, which is larger
than that of model 1. F-statistic=5.8232 > F-critical, so the overall regression model is
significant.
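Adjusted R square and the overall F-statistic both follow from R square, the sample size, and the number of predictors, so the two models can be compared with a short Python sketch. It assumes n=32, four predictors in model 1 (calories, fat, cost, type), and three in model 2 (cost removed). Because the R square values are rounded, the results may differ slightly from the Excel output.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R square for a model with k predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """Overall F-statistic for H0: all slope coefficients equal 0."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

n = 32
# (R square, number of predictors) as reported for each model.
models = {"model 1": (0.385, 4), "model 2": (0.3842, 3)}

results = {
    name: (adjusted_r2(r2, n, k), f_statistic(r2, n, k))
    for name, (r2, k) in models.items()
}

for name, (adj, f_stat) in results.items():
    print(f"{name}: adjusted R square = {adj:.4f}, F = {f_stat:.4f}")
```

Dropping a predictor that adds little raises adjusted R square because the penalty term (n-1)/(n-k-1) shrinks faster than R square falls.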
Model 2 is better than Model 1. For example, with calories of 351 and fat of 21 (and the type dummy at 0), Score=0.733*351-5.455*21-100.333=42.4. The
score would decrease by 13.0051 on average if the pizza is not pepperoni. From the graphs in Excel,
we can see that the assumptions of linearity, independence of errors, normality, and equal variance are
met, so the results are reliable.
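The model 2 prediction can be written as a small function. This is a sketch under assumptions: it takes 0.733 as the calories coefficient, -5.455 as the fat coefficient, -100.333 as the intercept, and -13.0051 as the coefficient of the type dummy (cheese = 1, pepperoni = 0), which is how the worked example and the sentence about non-pepperoni pizzas read together. The function name is illustrative.

```python
def predict_model2(calories, fat, is_cheese):
    """Predicted score from model 2 (calories, fat, and type dummy).

    is_cheese is the type dummy: 0 for pepperoni, 1 for cheese.
    Coefficients are assumed from the worked example in the text.
    """
    return -100.333 + 0.733 * calories - 5.455 * fat - 13.0051 * is_cheese

pepperoni = predict_model2(351, 21, 0)
cheese = predict_model2(351, 21, 1)

print(f"Pepperoni: {pepperoni:.1f}")
print(f"Cheese:    {cheese:.1f}")
print(f"Difference: {pepperoni - cheese:.4f}")
```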
Assumption Test Graphs:
Conclusion
We have analyzed frozen pizza, and most of the data appear accurate and reliable. The study
examined calorie count, fat content, and type (cheese or pepperoni). The conclusion
reached is that Pizza Y still has an advantage in the frozen pizza market.
From the above analysis, we can see that when there is one variable of interest and
many independent variables related to it, we can first check the
linearity between these variables using correlation coefficients. If linearity is satisfied, we can
use simple linear regression or multiple regression to build the prediction model. However, the
reliability of the model depends on the four assumptions we made: linearity,
independence of errors, normality, and equal variance. For this project, the simple linear
regression model is less reliable than the multiple regression model. In a multiple regression model, too
many independent variables make the model overly complex, so we can eliminate
unimportant variables; adjusted R square then rises at the cost of only a small decrease in R
square. In this model, we eliminated the variable ‘cost’ and obtained a better model.
Proof of collaboration