import numpy as np
import pandas as pd
Impact of a minimum wage increase on the employment rate of fast food restaurants
# pull the data
dataset = pd.read_csv("datasets/njmin3.csv")
dataset.head()
| | NJ | POST_APRIL92 | NJ_POST_APRIL92 | fte | bk | kfc | roys | wendys | co_owned | centralj | southj | pa1 | pa2 | demp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 15.00 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 12.00 |
1 | 1 | 0 | 0 | 15.00 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 6.50 |
2 | 1 | 0 | 0 | 24.00 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | -1.00 |
3 | 1 | 0 | 0 | 19.25 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 2.25 |
4 | 1 | 0 | 0 | 21.50 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13.00 |
NJ
: whether the fast food restaurant is located in New Jersey (1) or Pennsylvania (0)
POST_APRIL92
: whether the observation was recorded after (1) or before (0) April 1992
NJ_POST_APRIL92
: product of NJ and POST_APRIL92
fte
: full time employment rate
Each row of the dataframe is one observation of fte at a fast food restaurant.
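A minimal sketch of how the interaction column relates to the two indicators. The `toy` frame below is made-up illustration data, not rows from njmin3.csv; only the column names are taken from the dataset.

```python
import pandas as pd

# Toy rows (NOT from njmin3.csv) covering all four state/period combinations.
toy = pd.DataFrame({"NJ": [0, 0, 1, 1], "POST_APRIL92": [0, 1, 0, 1]})

# The interaction term is the element-wise product of the two indicators,
# so it equals 1 only for New Jersey observations recorded after April 1992.
toy["NJ_POST_APRIL92"] = toy["NJ"] * toy["POST_APRIL92"]
print(toy)
```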
dataset.shape
(820, 14)
dataset.describe()
| | NJ | POST_APRIL92 | NJ_POST_APRIL92 | fte | bk | kfc | roys | wendys | co_owned | centralj | southj | pa1 | pa2 | demp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 820.000000 | 820.000000 | 820.000000 | 820.000000 | 820.000000 | 820.000000 | 820.000000 | 820.000000 | 820.000000 | 820.000000 | 820.000000 | 820.000000 | 820.000000 | 820.000000 |
mean | 0.807317 | 0.500000 | 0.403659 | 21.026511 | 0.417073 | 0.195122 | 0.241463 | 0.146341 | 0.343902 | 0.153659 | 0.226829 | 0.087805 | 0.104878 | -0.070443 |
std | 0.394647 | 0.500305 | 0.490930 | 9.271972 | 0.493376 | 0.396536 | 0.428232 | 0.353664 | 0.475299 | 0.360841 | 0.419037 | 0.283184 | 0.306583 | 8.725511 |
min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -41.500000 |
25% | 1.000000 | 0.000000 | 0.000000 | 15.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -3.500000 |
50% | 1.000000 | 0.500000 | 0.000000 | 20.500000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
75% | 1.000000 | 1.000000 | 1.000000 | 25.500000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4.000000 |
max | 1.000000 | 1.000000 | 1.000000 | 85.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 34.000000 |
dataset.isnull().sum()
NJ 0
POST_APRIL92 0
NJ_POST_APRIL92 0
fte 26
bk 0
kfc 0
roys 0
wendys 0
co_owned 0
centralj 0
southj 0
pa1 0
pa2 0
demp 52
dtype: int64
# replace missing values with the column mean
from sklearn.impute import SimpleImputer
missingvalues_imputer = SimpleImputer(missing_values = np.nan, strategy = 'mean')
missingvalues_imputer.fit(dataset[['fte', 'demp']])
dataset[['fte', 'demp']] = missingvalues_imputer.transform(dataset[['fte', 'demp']])
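To make the effect of mean imputation concrete, here is a small sketch on made-up values (the `toy` frame is hypothetical, not data from njmin3.csv). It swaps in pandas `fillna` with the column means, which should give the same result as `SimpleImputer(strategy='mean')` for numeric columns like these.

```python
import numpy as np
import pandas as pd

# Toy values (NOT from njmin3.csv) with one missing entry per column.
toy = pd.DataFrame({"fte": [10.0, np.nan, 30.0], "demp": [1.0, 3.0, np.nan]})

# DataFrame.mean() skips NaN, so each gap is filled with the mean
# of the observed values in its own column.
filled = toy.fillna(toy.mean())
print(filled)
```

Each missing cell takes the mean of its own column, so the column means are unchanged while the spread of the filled columns shrinks slightly.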
DiD with Aggregated Metrics
dataset.groupby(['NJ', 'POST_APRIL92'])['fte'].mean().reset_index()
| | NJ | POST_APRIL92 | fte |
---|---|---|---|
0 | 0 | 0 | 23.272823 |
1 | 0 | 1 | 21.162064 |
2 | 1 | 0 | 20.457145 |
3 | 1 | 1 | 21.027396 |
- (NJ fte after treatment) - (NJ fte before treatment) = 21.03 - 20.46 = 0.57
- (PENN fte after treatment) - (PENN fte before treatment) = 21.16 - 23.27 = -2.11
- DiD = 0.57 - (-2.11) = 0.57 + 2.11 = 2.68
The full time employment (fte) rate in New Jersey increased by 2.68 relative to Pennsylvania after the minimum wage increase.
In other words, assuming both states would otherwise have followed parallel trends, increasing the minimum wage had a positive impact on the employment rate of fast food restaurants in New Jersey.
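The arithmetic above can be reproduced in a few lines. This sketch hard-codes the four group means from the groupby output instead of reading the CSV; the dictionary keys are labels chosen here for readability.

```python
# Group means of fte copied from the groupby table
# (NJ = 0 is Pennsylvania, POST_APRIL92 = 0 is "before").
means = {
    ("PA", "before"): 23.272823,
    ("PA", "after"): 21.162064,
    ("NJ", "before"): 20.457145,
    ("NJ", "after"): 21.027396,
}

# Change over time within each state.
nj_change = means[("NJ", "after")] - means[("NJ", "before")]
pa_change = means[("PA", "after")] - means[("PA", "before")]

# Difference-in-differences: treated change minus control change.
did = nj_change - pa_change
print(round(nj_change, 2), round(pa_change, 2), round(did, 2))
# 0.57 -2.11 2.68
```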
DiD with Linear Regression
Let NJ be represented by G and POST_APRIL92 by T. The functional form of the linear regression is then:
\[fte(G,T) = \beta_0 + \beta_1 G + \beta_2 T + \beta_3 T G\]
\[DiD = [fte(1,1) - fte(1,0)] - [fte(0,1) - fte(0,0)]\]
\[DiD = [\beta_0 + \beta_1 + \beta_2 + \beta_3 - \beta_0 - \beta_1] - [\beta_0 + \beta_2 - \beta_0]\]
\[DiD = \beta_2 + \beta_3 - \beta_2 = \beta_3\]
\[DiD = \beta_3\]
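A quick numerical check of this derivation, sketched with plain NumPy least squares on made-up outcomes (the arrays below are hypothetical, not the njmin3 data): the fitted interaction coefficient should match the DiD computed from the four cell means.

```python
import numpy as np

# Toy outcomes (NOT from njmin3.csv): two observations per (G, T) cell.
G = np.array([0, 0, 0, 0, 1, 1, 1, 1])
T = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y = np.array([10.0, 12.0, 9.0, 11.0, 7.0, 9.0, 11.0, 13.0])

# Design matrix with columns: intercept, G, T, and the interaction T*G.
X = np.column_stack([np.ones_like(y), G, T, G * T])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def cell_mean(g, t):
    """Mean outcome for the observations with G == g and T == t."""
    return y[(G == g) & (T == t)].mean()

# DiD computed directly from the four cell means.
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(beta[3], did_means)
```

Because the model has one parameter per cell, the fitted values are exactly the cell means, which is why the two numbers agree up to floating-point error.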
X = dataset[['NJ', 'POST_APRIL92', 'NJ_POST_APRIL92']]
y = dataset['fte'].values
import statsmodels.api as sm
X = sm.add_constant(X)
model1 = sm.OLS(y, X).fit()
print(model1.summary(yname="FTE",
                     xname=("intercept", "New Jersey", "After April 1992", "New Jersey and after April 1992"),
                     title="Model 1: FTE ~ NJ + POST_APRIL92 + NJ_POST_APRIL92"))
Model 1: FTE ~ NJ + POST_APRIL92 + NJ_POST_APRIL92
==============================================================================
Dep. Variable: FTE R-squared: 0.007
Model: OLS Adj. R-squared: 0.004
Method: Least Squares F-statistic: 1.974
Date: Wed, 28 Dec 2022 Prob (F-statistic): 0.116
Time: 20:11:03 Log-Likelihood: -2986.2
No. Observations: 820 AIC: 5980.
Df Residuals: 816 BIC: 5999.
Df Model: 3
Covariance Type: nonrobust
===================================================================================================
coef std err t P>|t| [0.025 0.975]
---------------------------------------------------------------------------------------------------
intercept 23.2728 1.041 22.349 0.000 21.229 25.317
New Jersey -2.8157 1.159 -2.430 0.015 -5.091 -0.541
After April 1992 -2.1108 1.473 -1.433 0.152 -5.001 0.780
New Jersey and after April 1992 2.6810 1.639 1.636 0.102 -0.536 5.898
==============================================================================
Omnibus: 232.659 Durbin-Watson: 1.847
Prob(Omnibus): 0.000 Jarque-Bera (JB): 908.337
Skew: 1.289 Prob(JB): 5.72e-198
Kurtosis: 7.465 Cond. No. 11.4
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The coefficient on NJ_POST_APRIL92 (labelled "New Jersey and after April 1992") is 2.68, which equals the DiD value found by the aggregation method.
Its p-value of 0.102 means the estimate is not statistically significant at the 5% level.
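As a consistency check, the four group means can be rebuilt from the reported coefficients. The sketch below hard-codes the values printed in the Model 1 summary and the groupby table; the dictionary labels are chosen here for readability.

```python
# Coefficients copied from the Model 1 summary.
b0, b1, b2, b3 = 23.2728, -2.8157, -2.1108, 2.6810

# Fitted value for each (state, period) cell: b0 + b1*G + b2*T + b3*T*G.
fitted = {
    "PA before": b0,
    "PA after": b0 + b2,
    "NJ before": b0 + b1,
    "NJ after": b0 + b1 + b2 + b3,
}

# Group means copied from the groupby table.
observed = {
    "PA before": 23.272823,
    "PA after": 21.162064,
    "NJ before": 20.457145,
    "NJ after": 21.027396,
}

for cell in fitted:
    print(cell, round(fitted[cell], 2), round(observed[cell], 2))
```

The two columns agree up to the rounding of the printed coefficients, which is what the derivation predicts for a regression with only these three indicators.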