My Analysis of 2012 Outlook on Life Survey: Potential Moderator for ANOVA

In the first week of this course, we were asked to run an ANOVA on two variables in our dataset. I chose to test whether religiosity (an index I built from several survey questions about religion) differs by ethnicity. I found that black respondents were significantly more religious than respondents of other races. However, one variable that might confound racial comparisons is socioeconomic status. Therefore, I decided to test whether socioeconomic status moderates the relationship between ethnicity and religiosity.

Following the videos' instructions, I subdivided the 2012 Outlook on Life dataset by self-reported socioeconomic status (W1_P2) and ran an ANOVA on each subset. For those in the "poor" socioeconomic class, F(4, 253) = 2.6 and p = 0.04, which is significant, but only barely. For the "working" class, F(4, 750) = 8.7 and p < 0.001. For the "middle" class, F(4, 950) = 14.7 and p < 0.001. For the "upper-middle" class, F(4, 209) = 4.7 and p = 0.001. For the "upper" class, F(3, 21) = 6.1 and p = 0.004. Thus, I could still reject the null hypothesis in every class.
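The two degrees of freedom reported with each F statistic are k - 1 (between groups, where k is the number of ethnic groups present in the subset) and N - k (within groups, where N is the number of respondents in the subset). That is why most classes show 4 and the "upper" class, where one group is absent, shows 3. As a rough illustration of where these numbers come from, here is a minimal hand-rolled one-way ANOVA; the function name and the toy scores are made up for this example and are not survey data.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # variation of each observation around its own group mean
    ss_within = sum(
        (x - sum(g) / len(g)) ** 2 for g in groups for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# toy religiosity scores for three hypothetical groups
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova(scores)
print('F(%d, %d) = %.1f' % (df_b, df_w, f_stat))
```

In the actual analysis, statsmodels does this arithmetic; the fitted OLS result exposes the same quantities as fvalue, df_model, and df_resid.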

As before, because the explanatory variable has more than two categories, I ran the Tukey HSD post hoc test for each socioeconomic class. Among the "poor", the null hypothesis could not be rejected for any pairwise comparison. Among the "working" class, blacks and whites were significantly different (p=0.001), but no other pairwise comparison was significant. Among those in the "middle" class, blacks were significantly more religious than all other races (p=0.001 vs whites, p=0.007 vs Latinos, p=0.017 vs others, and p=0.036 vs mixed race), but the other races were not significantly different from each other. In the "upper-middle" class, blacks were again significantly more religious than whites (p=0.0025) but not significantly more religious than any other race, and the pairwise comparisons among the other races were not significant. The same pattern held in the "upper" class: blacks vs whites was significant (p=0.01) and all other comparisons were not.

If socioeconomic status were, in fact, a moderating variable for the relationship between ethnicity and religiosity, I would expect the comparisons within some classes to look noticeably different from the overall result. However, the same general pattern appeared in every class, with some expected variation. Thus, I conclude that socioeconomic status does not moderate this relationship.

This is the snippet of code I used to run these tests:

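# imports needed by this snippet; `data` is the survey DataFrame loaded earlier,
# with the RELIND index and ETH variable already constructed
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

# ANOVA of ethnicity vs religiosity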
model2 = smf.ols(formula='RELIND ~ C(ETH)', data=data).fit()
print (model2.summary())
#create sub4 with only RELIND and ETH to print means and s.d.'s
sub4 = data[['RELIND', 'ETH']].dropna()
print ('means for Religiosity by Race')
m3= sub4.groupby('ETH').mean()
print (m3)
print ('standard deviations for Religiosity by Race')
sd3 = sub4.groupby('ETH').std()
print (sd3)
#run post hoc Tukey test to find out which group was significantly different
mc2 = multi.MultiComparison(sub4['RELIND'], sub4['ETH'])
res2 = mc2.tukeyhsd()
print(res2.summary())

#create sub-dataframes based on socioeconomic class: 1=poor, 2=working, 3=middle, 4=upper-middle, 5=upper
sub5=data[(data['W1_P2']==1)]
sub5 = sub5[['RELIND', 'ETH', 'W1_P2']].dropna()
sub6=data[(data['W1_P2']==2)]
sub6 = sub6[['RELIND', 'ETH', 'W1_P2']].dropna()
sub7=data[(data['W1_P2']==3)]
sub7 = sub7[['RELIND', 'ETH', 'W1_P2']].dropna()
sub8=data[(data['W1_P2']==4)]
sub8 = sub8[['RELIND', 'ETH', 'W1_P2']].dropna()
sub9=data[(data['W1_P2']==5)]
sub9 = sub9[['RELIND', 'ETH', 'W1_P2']].dropna()

#run ANOVAs for each socioeconomic class
print ('association between ethnicity and religiosity for those in "Poor" socioeconomic class')
model3 = smf.ols(formula='RELIND ~ C(ETH)', data=sub5).fit()
print (model3.summary())
print ('association between ethnicity and religiosity for those in "Working" socioeconomic class')
model4 = smf.ols(formula='RELIND ~ C(ETH)', data=sub6).fit()
print (model4.summary())
print ('association between ethnicity and religiosity for those in "Middle" socioeconomic class')
model5 = smf.ols(formula='RELIND ~ C(ETH)', data=sub7).fit()
print (model5.summary())
print ('association between ethnicity and religiosity for those in "Upper-middle" socioeconomic class')
model6 = smf.ols(formula='RELIND ~ C(ETH)', data=sub8).fit()
print (model6.summary())
print ('association between ethnicity and religiosity for those in "Upper" socioeconomic class')
model7 = smf.ols(formula='RELIND ~ C(ETH)', data=sub9).fit()
print (model7.summary())

#print comparative means for each socioeconomic class
print ("means for religiosity by ethnicity for Poor")
m5= sub5.groupby('ETH').mean()
print (m5)
print ("means for religiosity by ethnicity for Working")
m6= sub6.groupby('ETH').mean()
print (m6)
print ("means for religiosity by ethnicity for Middle")
m7= sub7.groupby('ETH').mean()
print (m7)
print ("means for religiosity by ethnicity for Upper-middle")
m8= sub8.groupby('ETH').mean()
print (m8)
print ("means for religiosity by ethnicity for Upper")
m9= sub9.groupby('ETH').mean()
print (m9)

#run post hoc Tukey test for all classes to find out which group in that class was significantly different
mc3 = multi.MultiComparison(sub5['RELIND'], sub5['ETH']) #poor
res3 = mc3.tukeyhsd()
print('Tukey test of ethnicity and religiosity for "Poor" class')
print(res3.summary())
mc4 = multi.MultiComparison(sub6['RELIND'], sub6['ETH']) #working
res4 = mc4.tukeyhsd()
print('Tukey test of ethnicity and religiosity for "Working" class')
print(res4.summary())
mc5 = multi.MultiComparison(sub7['RELIND'], sub7['ETH']) #middle
res5 = mc5.tukeyhsd()
print('Tukey test of ethnicity and religiosity for "Middle" class')
print(res5.summary())
mc6 = multi.MultiComparison(sub8['RELIND'], sub8['ETH']) #upper-middle
res6 = mc6.tukeyhsd()
print('Tukey test of ethnicity and religiosity for "Upper-middle" class')
print(res6.summary())
mc7 = multi.MultiComparison(sub9['RELIND'], sub9['ETH']) #upper
res7 = mc7.tukeyhsd()
print('Tukey test of ethnicity and religiosity for "Upper" class')
print(res7.summary())
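The five near-identical blocks above could also be written as one loop over the class codes. This is only a minimal sketch of that idea: it runs on a synthetic DataFrame called demo with invented values and just two classes, and the names class_labels and results are introduced here for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

# synthetic stand-in for the survey (hypothetical values, not real results)
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame({
    'ETH': rng.integers(1, 4, size=n),      # three made-up ethnicity codes
    'W1_P2': rng.integers(1, 3, size=n),    # two made-up class codes
    'RELIND': rng.normal(10, 2, size=n),    # made-up religiosity index
})

class_labels = {1: 'Poor', 2: 'Working'}

results = {}
for code, label in class_labels.items():
    # keep only this class, and only the columns the tests need
    sub = demo.loc[demo['W1_P2'] == code, ['RELIND', 'ETH']].dropna()

    # one-way ANOVA within the class
    model = smf.ols(formula='RELIND ~ C(ETH)', data=sub).fit()
    print('%s: F(%d, %d) = %.2f, p = %.3f' % (
        label, model.df_model, model.df_resid, model.fvalue, model.f_pvalue))

    # group means, then the post hoc Tukey HSD test
    print(sub.groupby('ETH')['RELIND'].mean())
    tukey = multi.MultiComparison(sub['RELIND'], sub['ETH']).tukeyhsd()
    print(tukey.summary())

    results[label] = (model, tukey)
```

For the real analysis, class_labels would list all five W1_P2 codes and demo would be replaced by data.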

Monday, December 14, 2020