SPSS assignment
Question 1
Think about whether store size would have an impact on weekly department sales.
Form a hypothesis in your head or put it on paper. Now, run a linear regression
to predict Weekly_Sales using the size of the store as the predictor. What is
the R2 of the resulting model?
Hypothesis: Store size has a positive, significant influence on weekly
department sales.
For this analysis, weekly sales is predicted by the size of the store, and the
returned R2 = 0.059. Store size alone therefore explains about 5.9% of the
variation in weekly department sales. See details in SPV file.
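To make the R2 figure concrete, here is a minimal sketch of how a one-predictor regression and its R2 are computed. The function name and the toy numbers are assumptions for illustration only; they are not the Walmart data and not the SPSS output.

```python
def simple_linear_regression(x, y):
    """Return (intercept, slope, r_squared) for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With one predictor, R2 is the squared correlation between x and y.
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Toy values for illustration only; these are NOT the Walmart data.
toy_size = [1, 2, 3, 4, 5]
toy_sales = [2, 4, 5, 4, 5]
intercept, slope, r_squared = simple_linear_regression(toy_size, toy_sales)
print(round(intercept, 3), round(slope, 3), round(r_squared, 3))
```

An R2 of 0.059 read this way means the fitted line accounts for only a small share of the spread in Weekly_Sales.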
Question 2
Think about whether unemployment would have an impact on sales at Walmart. Form
a hypothesis in your head or write it down on a piece of paper. Now, run a
linear regression to predict Weekly_Sales using unemployment as the independent
variable. What is the R2 for the resulting model?
(Note the p-value for the test of R2. If this result surprises you, as it
should, think about why it is statistically significant.)
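Hypothesis: Unemployment has a negative influence on weekly sales.
For this analysis, weekly sales is predicted by unemployment, and the obtained
R2 = 0.001. See details in SPV file.
The p-value tests the null hypothesis that unemployment has no linear
relationship with weekly sales. In this case p < 0.05, so the null hypothesis is
rejected and we conclude that unemployment has a negative influence on weekly
sales. The result is statistically significant despite the very small R2 because
the sample is very large (N = 421,497), so even a weak relationship is detected.
Statistical significance here does not imply practical importance.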
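The link between a tiny R2 and a significant p-value can be checked with the overall F statistic, F = (R2 / k) / ((1 - R2) / (n - k - 1)). The sketch below plugs in the rounded R2 reported above and the N from the Descriptive Statistics table, so the F value is approximate. The function name and the constant names are assumptions for illustration; 3.84 is the approximate 5% critical value of F with 1 and very many degrees of freedom.

```python
def f_statistic(r_squared, n, k=1):
    """Overall F statistic for a regression with k predictors and n cases."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


R2 = 0.001        # reported R2 (rounded) for Weekly_Sales on Unemployment
N = 421497        # N from the Descriptive Statistics table
F_CRIT_05 = 3.84  # approximate 5% critical value of F(1, very large df)

f_value = f_statistic(R2, N)
print(f"F = {f_value:.1f}; significant at 5%: {f_value > F_CRIT_05}")
```

Because F grows in proportion to the number of cases, a sample of this size makes almost any nonzero relationship statistically significant.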
Question 3
What is the impact of a 1% increase in unemployment rate on weekly department
sales at Walmart?
Be sure to include the sign. Do not include units in your answer; only write
the number. A positive sign would indicate an increase, while a negative sign
would suggest a decrease.
A one-point (1%) increase in unemployment is associated with a change of
-314.946 in weekly department sales, as reported under “B” in the unstandardized
coefficients. See details in SPV file.
Question 4
What is the MAPE for this model?
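The value of approximately 18.8 shown in the regression coefficients table is a
Std. Error, that is, the standard error of a coefficient estimate. It is not
the MAPE. The MAPE (mean absolute percentage error) is the mean of
|actual - predicted| / |actual| x 100 across all observations and has to be
computed from the model's predicted values. See details in SPV file.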
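For reference, a minimal sketch of the standard MAPE calculation follows. The function name and toy numbers are assumptions for illustration, not values from the SPV file. Observations whose actual value is zero are skipped here, since the percentage error is undefined for them; that handling is a choice of this sketch, not something SPSS prescribes.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    # Percentage error is undefined when the actual value is zero, so skip those.
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Toy values for illustration only; these are NOT the Walmart data.
toy_actual = [100.0, 200.0, 400.0]
toy_predicted = [110.0, 180.0, 400.0]
toy_mape = mape(toy_actual, toy_predicted)
print(round(toy_mape, 2))
```

In SPSS the same quantity would be obtained by saving the predicted values from the regression and computing this ratio for each case.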
Question 5
There are a large number of independent variables here. Assuming you don’t have
a strong theoretical reason to include any particular variable in the model,
you would use a statistical approach in selecting variables to include in the
model. Run a stepwise variable selection method with the following independent
variables: Size, IsHoliday, Temperature, Fuel_Price, CPI, and Unemployment. How
many variables are included in the best model?
Six (6) variables are included, entered in this order: Size, CPI, Unemployment,
Temperature, IsHoliday, Fuel_Price (plus the Constant). Size entered first,
making it the strongest single predictor.
Question 6
Now, do the same as above but using the forward variable selection method. How
many variables are included in the best model?
Six (6) variables are included, in the same order as with the Stepwise
selection method. Predictors: (Constant), Size, CPI, Unemployment, Temperature,
IsHoliday, Fuel_Price.
Question 7
Repeat the above using the Backward method. How many variables are included in
the best model?
Six (6) variables. All requested variables were entered and none was removed.
Question 8
Now, run a model using the Enter method with the independent variables selected
based on the Stepwise variable selection method. What is the R2 of this model?
(Think of how you could explain this result to your boss, CEO of Walmart.)
All six variables were selected by the Stepwise method, so all six were entered
in the Enter method. The R2 = 0.061.
The implication (explanation to the CEO) is that about 6.1% of the total
variation in weekly department sales can be explained by the independent
variables. The remaining 93.9% is due to factors that are not in the model.
Question 9
What is the MAPE of this model?
(Think of what you would say to your boss, CEO of Walmart.)
Model 1: Unstandardized Coefficients

Predictor    | B        | Std. Error
(Constant)   | 8577.671 | 398.964
IsHoliday    | 1398.830 | 134.506
Size         | .091     | .001
Temperature  | 29.142   | 1.956
Fuel_Price   | -454.467 | 76.847
CPI          | -18.800  | .955
Unemployment | -264.532 | 19.517
Based on the above table, the value 38.96 is the sum of the six predictors'
standard errors divided by the number of predictors (233.782 / 6). This is an
average coefficient standard error, not the MAPE: each standard error is in the
units of its own coefficient and cannot be read as a percentage. The MAPE is
the mean of |actual - predicted| / |actual| x 100 and has to be computed from
the model's predicted value for each observation.
For the CEO, the MAPE would state how far, on average, the model's sales
forecasts are from actual sales in percentage terms. The small standard error
for Size (.001) means that its coefficient is estimated very precisely. It does
not mean that 99% of the influence of Size on Weekly Sales is correct.
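The coefficients table defines the fitted equation that produces the predicted values a MAPE would be built on. The sketch below evaluates that equation at the predictor means from the Descriptive Statistics table; since a least-squares fit passes through the means, the result should land close to the mean of Weekly_Sales (15984.03), with a small gap caused by the rounded coefficients. The names `INTERCEPT`, `COEFFICIENTS`, `MEANS` and `predict` are assumptions for illustration.

```python
# Unstandardized coefficients (B) copied from the coefficients table.
INTERCEPT = 8577.671
COEFFICIENTS = {
    "IsHoliday": 1398.830,
    "Size": 0.091,
    "Temperature": 29.142,
    "Fuel_Price": -454.467,
    "CPI": -18.800,
    "Unemployment": -264.532,
}

# Predictor means copied from the Descriptive Statistics table.
MEANS = {
    "IsHoliday": 0.07,
    "Size": 136728.08,
    "Temperature": 60.0902,
    "Fuel_Price": 3.36103,
    "CPI": 171.202288308,
    "Unemployment": 7.96021,
}


def predict(values):
    """Predicted Weekly_Sales for one observation given predictor values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)


predicted_at_means = predict(MEANS)
print(round(predicted_at_means, 2))
```

Applying `predict` to every row and comparing against actual Weekly_Sales is what yields the residuals for MAPE and RMSE.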
Question 10
What is the RMSE of this model?
(Again, think of how you would explain this result to your boss, CEO of
Walmart.)
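In SPSS, the RMSE of a regression model is reported as the Std. Error of the
Estimate in the Model Summary table. It is not the same as the standard
deviation of a variable. The standard deviations of the variables are contained
in the below table.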
Descriptive Statistics

Variable     | Mean          | Std. Deviation | N
Weekly_Sales | 15984.0260    | 22712.17621    | 421497
IsHoliday    | .07           | .256           | 421497
Size         | 136728.08     | 60980.976      | 421497
Temperature  | 60.0902       | 18.44753       | 421497
Fuel_Price   | 3.36103       | .458511        | 421497
CPI          | 171.202288308 | 39.1594400648  | 421497
Unemployment | 7.96021       | 1.863301       | 421497
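The implication is that the standard deviation describes how far values
typically lie from the mean, whereas the RMSE describes how far the model's
predictions typically lie from actual weekly sales, in the same units as sales.
For instance, the standard deviation of Size is 60981 because store sizes vary
widely around their mean of 136728.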
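The RMSE itself can be approximated from figures already reported here, using the identity Std. Error of the Estimate = SD(y) x sqrt((1 - R2)(n - 1) / (n - k - 1)). The sketch below uses the standard deviation of Weekly_Sales, the rounded R2 of 0.061 and N = 421497 with k = 6 predictors. Because R2 is rounded to three decimals, the result is an estimate, not the value SPSS prints. The function name is an assumption for illustration.

```python
import math


def approx_rmse(sd_y, r_squared, n, k):
    """Approximate Std. Error of the Estimate from SD(y), R2, n cases, k predictors."""
    return sd_y * math.sqrt((1 - r_squared) * (n - 1) / (n - k - 1))


SD_WEEKLY_SALES = 22712.17621  # Std. Deviation of Weekly_Sales
R2_FULL_MODEL = 0.061          # reported R2 (rounded) of the six-predictor model
N_CASES = 421497
K_PREDICTORS = 6

rmse_estimate = approx_rmse(SD_WEEKLY_SALES, R2_FULL_MODEL, N_CASES, K_PREDICTORS)
print(round(rmse_estimate))
```

The estimate comes out only slightly below the standard deviation of Weekly_Sales, which is another way of saying that the model improves little on simply predicting the mean.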
Question 11
Which of the following is the second most important predictor of weekly
department sales?
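CPI is the second most important predictor, after Size: it entered second in
the stepwise selection (Size, CPI, Unemployment, Temperature, IsHoliday,
Fuel_Price). IsHoliday has a correlation of 0.013 (1.3 percent), which is small.
A negative correlation does not make a predictor less important; importance
depends on the size of the relationship, not its sign.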
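As a cross-check on predictor importance, standardized coefficients can be approximated from the two tables above as beta = B x SD(predictor) / SD(Weekly_Sales) and ranked by absolute value. This is a sketch on the rounded table values, so the betas are approximate; the variable names are assumptions for illustration. On these values Size ranks first and CPI second.

```python
# Unstandardized coefficients (B) from the coefficients table.
B = {
    "IsHoliday": 1398.830,
    "Size": 0.091,
    "Temperature": 29.142,
    "Fuel_Price": -454.467,
    "CPI": -18.800,
    "Unemployment": -264.532,
}

# Standard deviations from the Descriptive Statistics table.
SD_X = {
    "IsHoliday": 0.256,
    "Size": 60980.976,
    "Temperature": 18.44753,
    "Fuel_Price": 0.458511,
    "CPI": 39.1594400648,
    "Unemployment": 1.863301,
}
SD_Y = 22712.17621  # Std. Deviation of Weekly_Sales

# Standardized coefficient: change in sales (in SDs) per one-SD change in x.
betas = {name: B[name] * SD_X[name] / SD_Y for name in B}

# Rank by absolute size; the sign only gives the direction of the effect.
ranking = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)
for name in ranking:
    print(f"{name}: {betas[name]:+.4f}")
```

Ranking by absolute value is the design choice that matters here: a predictor with a negative coefficient can still outrank one with a positive coefficient.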