Programming Case Study: IMBA 214-50

IMBA 214-50 Project 2
Mountain Bike Sales
Instructor: Sudhir Thakur
By: Vasundhara Sharma

Introduction:

This analysis studies how various factors relate to and influence the sales of mountain bikes. Mountain bikes are purpose-built bikes generally used on mountain trails and unpaved surfaces. They are also commonly used on paved roads in cities and towns because they are easy to customize. In many major cities, bicycles are increasingly used in place of automobiles because of parking problems, traffic and congestion. These bikes are also a common form of exercise across all age groups and are often found in tourist areas. Many people are also turning to bike riding to reduce the carbon footprint of their automobile use.

Mountain bike sales were selected for analysis because, as per my understanding, bicycle usage is on an upward trend and population density is one of the major factors affecting bike sales. As part of this analysis, we will run multiple regression models to determine whether population density, along with other factors, has an impact on bike sales.

Literature Review:

As part of the study, we analyze the impact on Sales of factors such as Floor Space, Advertisement related to the bike, Population Density in the area, Competing Stores in and around the area, Pricing, and Brand of the bike (whether or not the brand influences its sales). According to the inc.com article “How the Humble Bicycle Spurred a Modern Lifestyle Industry”, the demand for bikes has increased over the years due to growth in health-related sport activities, a rise in disposable income, and greater accessibility through bike-sharing apps, among other reasons.
Urbanization has also played an important part, as people living in cities find bikes an affordable and convenient alternative for travelling on congested city roads. City planners and civic corporations also encourage the use of bikes because it helps manage traffic, ease parking problems and keep pollution levels in cities in check. The article also states that the brand of the bike may not be a deciding factor for consumers, since they look for variety and customization that may not be available in every brand. The same trend is described in the article “How the humble bicycle is making a comeback in US cities”, which reports an upward trend in cycle usage, especially in city areas, because of its convenience for commuting.

Problem Statement:

The main purpose of running these models is to check the impact on mountain bike sales of factors such as Price, the Floor Space that the bikes take up, Brand, Competing Stores and Competing Ads. The analysis also checks specifically whether sales of these bikes increase in areas with high population density, such as tourist or urban areas. The analysis is based on 30 observations collected from Canvas (as provided by the instructor). The null hypothesis states that no relation exists between Sales and the factors stated above; the alternate hypothesis states that a significant relation exists between Sales and those factors.

Methodology:

A data set of 30 observations has been taken from Canvas. This data set is analyzed in JMP by running regression models. Two models are generated in JMP. The first model has 1 dependent variable (Sales) and 5 independent variables (Floor Space, Competing Ads, Population Density, Competing Stores and Price). The second model adds a dummy variable (Brand) to the same dependent variable and 5 independent variables.
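The report fits both models in JMP. As a rough illustration of what such a fit does, the sketch below estimates a regression with a 0/1 dummy variable by ordinary least squares using only the Python standard library. Everything in it is an assumption made for illustration: the function names `solve_linear_system` and `fit_ols` are hypothetical, only two numeric predictors plus the dummy are used for brevity, and the toy rows are made-up numbers, not the 30 Canvas observations.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up illustration rows: (floor_space, price, brand_dummy). NOT the Canvas data.
toy_rows = [(10, 200, 0), (12, 250, 1), (15, 220, 0),
            (9, 300, 1), (20, 280, 0), (14, 260, 1)]
# Sales generated exactly from assumed coefficients so the fit can be checked.
true_coefs = [100.0, 5.0, -0.1, 40.0]
toy_sales = [true_coefs[0] + true_coefs[1] * fs + true_coefs[2] * price + true_coefs[3] * brand
             for fs, price, brand in toy_rows]

coefs = fit_ols(toy_rows, toy_sales)
print([round(c, 4) for c in coefs])  # intercept, floor space, price, brand dummy
```

Because the toy sales are generated without noise, the fitted coefficients should match the assumed ones up to rounding. The dummy's coefficient is read as the shift in Sales between the two brand categories with the other predictors held fixed.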
Below are the 2 regression models generated in JMP:

Model 1: Simple Regression Model Without Dummies

Number of observations the model is based on: 30
Dependent variable: Sales
Independent variables: Floor Space, Competing Ads, Population Density, Competing Stores, Price

Equation of the model:
Sales = 1103.2062 + 11.105865*(Floor Space) - 6.409642*(Competing Ads) + 0.0600383*(Pop Density per sq km) - 0.69955*(Competing Stores) - 0.145987*(Price)

R squared value: 0.808526
Root mean square error: 112.1223
Intercept value: 1103.2062

[Screenshots from JMP]

Summary table for the simple model
(Columns: Attribute of Model, Value, Interpretation, P-value, Interpretation of P-value)

Intercept
  Value: 1103.2062
  Interpretation: If all other factors are zero, the predicted sales of the bike equal 1103.2062.
  P-value: 0.0073
  Interpretation of P-value: Since P < 0.05, we reject the null hypothesis; the intercept is significantly different from zero.

R Squared
  Value: 80.8526%
  Interpretation: About 80.85% of the variability in Sales is explained by the independent variables in the model.

B1 (Slope of Floor Space)
  Value: 11.105865
  Interpretation: If floor space increases by 1 unit, with the other variables held constant, Sales increase by about 11.11.
  P-value: <.0001
  Interpretation of P-value: Since P < 0.05, we reject the null hypothesis in favor of the alternate; there is a significant relation between bike sales and floor space.

B2 (Slope of Competing Ads)
  Value: -6.409542
  Interpretation: If competing ads increase by 1 unit, with the other variables held constant, Sales decrease by 6.409542.
  P-value: 0.0912
  Interpretation of P-value: Since P > 0.05, we fail to reject the null hypothesis; there is no significant relation between competing ads and bike sales.

B3 (Population Density)
  Value: 0.0600383
  Interpretation: If population density increases by 1 unit, with the other variables held constant, Sales increase by 0.0600383.
  P-value: 0.0205
  Interpretation of P-value: Since P < 0.05, we reject the null hypothesis in favor of the alternate; there is a significant relation between population density and bike sales.

B4 (Competing Stores)
  Value: -0.69955
  Interpretation: If the number of competing stores increases by 1, with the other variables held constant, Sales decrease by 0.69955.
  P-value: 0.9510
  Interpretation of P-value: Since P > 0.05, we fail to reject the null hypothesis; there is no significant relation between Sales and competing stores.

B5 (Price)
  Value: -0.145987
  Interpretation: If price increases by 1 unit, with the other variables held constant, Sales decrease by 0.145987.
  P-value: 0.0968
  Interpretation of P-value: Since P = 0.0968 > 0.05, we fail to reject the null hypothesis; there is no significant relation between Price and Sales.

Model 2: Modified Model with Dummy Variable

Number of observations the model is based on: 30
Dependent variable: Sales
Independent variables: Floor Space, Competing Ads, Population Density, Competing Stores, Price
Dummy: Brand (0 = Not a brand, 1 = Expensive brand)

Equation of the model:
Sales = 1183.2146 + 10.930149*(Floor Space) - 7.078638*(Competing Ads) + 0.0544797*(Pop Density per sq km) - 4.233147*(Competing Stores) - 0.139374*(Price) + 45.885272*(Brand Dummy)

R squared value: 0.813147
Root mean square error: 113.1433
Intercept value: 1183.2146

Summary table for the modified model
(Columns: Attribute of Model, Value, Interpretation, P-value, Interpretation of P-value)

Intercept
  Value: 1183.2146
  Interpretation: If all other factors are zero, the predicted sales of the bike equal 1183.2146.
  P-value: 0.0063
  Interpretation of P-value: Since P < 0.05, we reject the null hypothesis; the intercept is significantly different from zero.

R Squared
  Value: 81.3147%
  Interpretation: About 81.31% of the variability in Sales is explained by the independent variables in the model.

B1 (Slope of Floor Space)
  Value: 10.930149
  Interpretation: If floor space increases by 1 unit, with the other variables held constant, Sales increase by 10.930149.
  P-value: <.0001
  Interpretation of P-value: Since P < 0.05, we reject the null hypothesis in favor of the alternate; there is a significant relation between bike sales and floor space.

B2 (Slope of Competing Ads)
  Value: -7.078638
  Interpretation: If competing ads increase by 1 unit, with the other variables held constant, Sales decrease by 7.078638.
  P-value: 0.0740
  Interpretation of P-value: Since P > 0.05, we fail to reject the null hypothesis; there is no significant relation between competing ads and bike sales.

B3 (Population Density)
  Value: 0.0544797
  Interpretation: If population density increases by 1 unit, with the other variables held constant, Sales increase by 0.0544797.
  P-value: 0.0436
  Interpretation of P-value: Since P < 0.05, we reject the null hypothesis in favor of the alternate; there is a significant relation between population density and bike sales.

B4 (Competing Stores)
  Value: -4.233147
  Interpretation: If the number of competing stores increases by 1, with the other variables held constant, Sales decrease by 4.233147.
  P-value: 0.7338
  Interpretation of P-value: Since P > 0.05, we fail to reject the null hypothesis; there is no significant relation between Sales and competing stores.

B5 (Price)
  Value: -0.139374
  Interpretation: If price increases by 1 unit, with the other variables held constant, Sales decrease by 0.139374.
  P-value: 0.1175
  Interpretation of P-value: Since P = 0.1175 > 0.05, we fail to reject the null hypothesis; there is no significant relation between Price and Sales.

B6 (Brand Dummy)
  Value: 45.885272
  Interpretation: With the other variables held constant, predicted Sales are 45.885272 higher for a bike from an expensive brand than for a bike with no brand.
  P-value: 0.4584
  Interpretation of P-value: Since P = 0.4584 > 0.05, we fail to reject the null hypothesis; there is no significant relation between Brand and bike sales.

Conclusion:

From the two models above, it can be seen that not all of the factors considered have a significant impact on the sales of mountain bikes.
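The decision rule applied in both summary tables (compare each p-value with 0.05) can be sketched in a few lines, using only the p-values, R squared values and sample size reported above. The adjusted R squared at the end is not part of the reported JMP output; it is a standard formula added here as an assumption-free way to compare models with different numbers of predictors. The names `significant_predictors` and `adjusted_r2` are illustrative.

```python
ALPHA = 0.05  # significance level used throughout the report

# p-values copied from the two summary tables.
# "<.0001" is stored as 0.0001, which is only an upper bound.
p_values = {
    "Model 1": {
        "Floor Space": 0.0001,
        "Competing Ads": 0.0912,
        "Population Density": 0.0205,
        "Competing Stores": 0.9510,
        "Price": 0.0968,
    },
    "Model 2": {
        "Floor Space": 0.0001,
        "Competing Ads": 0.0740,
        "Population Density": 0.0436,
        "Competing Stores": 0.7338,
        "Price": 0.1175,
        "Brand Dummy": 0.4584,
    },
}


def significant_predictors(pvals, alpha=ALPHA):
    """Names of predictors whose p-value is below alpha (null rejected)."""
    return [name for name, p in pvals.items() if p < alpha]


def adjusted_r2(r2, n, k):
    """Adjusted R squared for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


for model, pvals in p_values.items():
    print(model, "significant at 0.05:", significant_predictors(pvals))

# Reported R squared values, n = 30, with 5 and 6 predictors respectively.
print("Model 1 adjusted R2:", round(adjusted_r2(0.808526, 30, 5), 4))
print("Model 2 adjusted R2:", round(adjusted_r2(0.813147, 30, 6), 4))
```

With the reported values, only Floor Space and Population Density fall below 0.05 in either model, and the adjusted R squared drops slightly (roughly 0.769 to 0.764) once the brand dummy is added.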
For both models, with and without the dummy, floor space taken by the bikes and population density have P values below 0.05. For these factors, we reject the null hypothesis in favor of the alternate. Hence, we conclude that population density and floor space have a significant impact on bike sales. Competing Ads, Competing Stores and Price all have P values above 0.05, so we fail to reject the null hypothesis and conclude that none of these factors is significant for bike sales at the 5% level. The dummy variable added in the modified model also has a P value above 0.05 (P = 0.4584), so we fail to reject the null hypothesis and conclude that brand does not have a significant impact on bike sales.

Comparing the 2 models, the R squared value is slightly higher and closer to 1 when the brand dummy is added (0.813147 versus 0.808526). However, R squared cannot decrease when a predictor is added, and the root mean square error rises slightly (113.1433 versus 112.1223), so the dummy does not meaningfully improve the fit. A slight increase can also be seen in the intercept (1183.2146 versus 1103.2062) when the brand dummy is added to the model.

Further Analysis

1) Multicollinearity problem

The multicollinearity test below has been run on the modified model. Multicollinearity occurs when independent variables are correlated with each other, which is a cause for concern because it makes the estimated coefficients unreliable when we fit the model. Ideally, the independent variables in the model should not be strongly correlated with one another. The VIF, or variance inflation factor, is used to assess multicollinearity. The VIF measures how much the variance of an estimated regression coefficient increases because the independent variables are correlated. A high VIF signifies that the associated independent variable is highly collinear with the others. A VIF above 10 is usually taken to indicate multicollinearity. Since none of the VIF values of the independent variables exceeds 10, there is no sign of serious multicollinearity among the independent variables.
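As a small worked illustration of the VIF > 10 rule of thumb: the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 is the R squared obtained by regressing predictor j on the remaining predictors. The sketch below shows how that quantity grows. The R_j^2 inputs are hypothetical values chosen for illustration, not figures from the JMP output, and the function name `vif` is an assumption.

```python
def vif(r_squared_j):
    """Variance inflation factor from the R squared of predictor j
    regressed on all the other predictors."""
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("R squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)


# Hypothetical R_j^2 values, for illustration only.
for r2 in (0.0, 0.5, 0.8, 0.9):
    print(f"R_j^2 = {r2:.1f} -> VIF = {vif(r2):.2f}")
```

A predictor unrelated to the others has a VIF of 1, and the threshold of 10 corresponds to about 90% of that predictor's variance being explained by the other predictors.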
Below is the multicollinearity test on the simple regression model (without the dummy parameter), which also shows no sign of serious multicollinearity among the independent variables.

2) Autocorrelation

Autocorrelation is also referred to as lag correlation because it measures the relationship between a variable's current values and its past values. The Durbin-Watson test is used to determine whether autocorrelation exists in the data set. The null hypothesis states that there is no autocorrelation and that the residuals are independent. The alternative hypothesis states that the residuals are correlated. A Durbin-Watson value close to 2 signifies no autocorrelation, and a value below 2 points toward positive autocorrelation.

On Simple Model (without Dummy):

The Durbin-Watson value of 1.833, slightly below 2, points to slight positive autocorrelation (the autocorrelation value of 0.0507 is also positive). However, the P value in this case is 0.2648, which is not significant (P > 0.05). We therefore fail to reject the null hypothesis that there is no autocorrelation and treat the residuals as independent. There is no evidence of first-order positive autocorrelation.

On Modified Model (with Dummy):

The Durbin-Watson value of 1.833 points to slight positive autocorrelation. However, the P value in this case is 0.2648, which is not significant (P > 0.05). We therefore fail to reject the null hypothesis that there is no autocorrelation and treat the residuals as independent. There is no evidence of first-order positive autocorrelation.

References:

Kline, K. (2017, Feb). How the Humble Bicycle Spurred a Modern Lifestyle Industry. Retrieved from https://www.inc.com/kenny-kline/5-trends-that-paved-the-way-for-a-bicycle-industry-renaissance.html

How the humble bicycle is making a comeback in US cities. (2016, July). Retrieved from https://www.bbc.com/news/world-us-canada-36778953

Mountain Bike Market Size Worth $3,585 Million By 2026. Retrieved from https://www.polarismarketresearch.com/press-releases/global-mountain-bike-market

Data file: bike_sales_data_project2.xlsx