The segregation variable constitutes 33.4% of the missing data, followed by highschool_grd (21.2%), mental_care (5.79%), primary_care (4.91%), life_expectancy (2.28%), food_index (1.10%). Other variables with minimal missingness (<1%) include income_inequality, housing_burden, insuff_sleep, uninsured, unemployment, household_income.
2.5.1 Filter Complete Cases For Exposure dentist_prop
We will remove missing cases for the exposure dentist_prop and name the set chr24_si.
The variables with a correlation of 0.8 or more (or -0.8 or less) are:
poor_health with poor_physical, life_expectancy, diabetes, inactivity, and obesity
poor_physical with poor_mental, life_expectancy, diabetes, and inactivity
diabetes with inactivity
The variables that exhibited high correlation with the outcome smoking_prop will be removed from the analysis (poor_health, poor_physical, and diabetes).
The low dentist group counties have more proportions of uninsured populations (0.13 vs 0.09 ), lower college education (0.54 vs 0.67), lower life expectancy (74.9 vs 77.3), more rural areas (0.89 vs 0.38), moresmoking prevalence (18.8 vs 16.4) and lowerexcessive drinking (16.3 vs 18.8).
5 Propensity Score Assessment
5.1 Calculating and Evaluating PS Balance
To calculate the PS, we will build a logistic model with the binary exposure as the outcome and the covariates as the predictors.
While most of the variables have low variance inflation factors, there are a few with an extremely high values (>10), which presents serious multi-collinearity. These variables are hispanic, white, black, and native. We will remove these variables from our analysis.
5.1.3 Fixed Propensity Score
We will start by excluding the problematic variables from our final data set chr_fin to produce subset for outcome 1 analysis chr24_smoking.
5.1.5 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps00 + linps00 ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")
response
dentist_grp
min
Q1
median
Q3
max
mean
sd
n
missing
ps00
High
0.000
0.004
0.026
0.137
0.992
0.145
0.254
777
0
ps00
Low
0.014
0.828
0.936
0.982
1.000
0.855
0.195
777
0
linps00
High
−20.056
−5.575
−3.626
−1.842
4.880
−3.839
3.430
777
0
linps00
Low
−4.238
1.570
2.689
4.015
10.800
2.707
1.906
777
0
The propensity scores slightly improved after removing the variables with high collinearity values. However, the problem with their values being too close to 0 or 1 is visibly persistent.
5.1.5.1 Fixed Propensity Score - Part 2
We will attempt to observe the impact of each covariate on the outcome individually to rule out the variable(s) that impacted the PS values.
5.1.5.1.1poor_mental
Code
psmodel_pm <-glm(dentist_grp ~ poor_mental, family =binomial(), data= chr_fin1)model_parameters(psmodel_pm) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")
5.1.5.1.63 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_se + linps_se ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")
response
dentist_grp
min
Q1
median
Q3
max
mean
sd
n
missing
ps_se
High
0.165
0.355
0.445
0.530
0.886
0.448
0.135
777
0
ps_se
Low
0.145
0.432
0.565
0.679
0.886
0.552
0.166
777
0
linps_se
High
−1.618
−0.595
−0.222
0.119
2.055
−0.217
0.594
777
0
linps_se
Low
−1.776
−0.275
0.263
0.749
2.055
0.233
0.741
777
0
The segregation variable has two linear PS less than 0.01 and all PS values around 0.99. We might remove this variable.
5.1.6 PS After Covariates Removal
We will start by building a subset for the first outcome in our analysis chr_fin2 after removing the problematic covariates from the original chr_fin set.
df_stats(ps1 + linps1 ~ dentist_grp, data = chr_fin2) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")
response
dentist_grp
min
Q1
median
Q3
max
mean
sd
n
missing
ps1
High
0.002
0.129
0.257
0.463
0.993
0.320
0.240
777
0
ps1
Low
0.005
0.528
0.731
0.872
0.999
0.680
0.232
777
0
linps1
High
−6.437
−1.911
−1.063
−0.150
4.940
−0.987
1.448
777
0
linps1
Low
−5.230
0.112
1.000
1.923
6.709
1.032
1.435
777
0
The problematic values in the propensity score are persistent and we will need to investigate this using other methods. For now, we will proceed to the analysis phase.
6 Outcome 1 Analysis
6.1 The Blanace Before PS Adjustment
Code
ggplot(chr_fin2, aes(x = dentist_grp, y = ps1, color = dentist_grp)) +geom_boxplot() +geom_jitter(width =0.1) +guides(color ="none")
Code
ggplot(chr_fin2, aes(x = linps1, fill = dentist_grp)) +geom_density(alpha =0.3)
The absolute value of the standardized difference of the linear propensity score comparing the counties with high dentist proportions and low dentist proportions is 114.8%. This value is too far from the ideal value of 0% and above the maximum acceptable value of 50 %. We definitely do not pass Rubin’s rule 1, therefore, propensity score adjustment is needed.
The ratio of variances of the linear propensity score comparing the counties with high dentist proportions and low dentist proportions is 0.98. This value is very close to the desired value of 1.0. In this case, we pass Rubin’s Rule 2.
6.2 Unadjusted Outcome 1 smoking_prop
We will observe the the distribution of the outcome smoking_prop among the two exposure groups dentist_grp.
Code
df_stats(smoking_prop ~ dentist_grp, data = chr_fin2) |>gt() |>fmt_number(mean:sd, decimals =2) |>opt_stylize(style =5, color ="pink")
response
dentist_grp
min
Q1
median
Q3
max
mean
sd
n
missing
smoking_prop
High
6.821752
14.04568
15.86923
18.30110
37.02901
16.41
3.68
777
0
smoking_prop
Low
7.687424
15.68921
18.72868
21.37735
41.20176
18.78
3.76
777
0
The high dentist group has a lower mean of smoking proportions (16.4 vs 18.8).
6.2.1 Linear Regression Model for Outcome 1
We will build a linear regression model for outcome 1 smoking_prop and the exposure dentist_prop without including the covariates.
Code
unadj.out1 <-lm(smoking_prop ~ dentist_grp, data= chr_fin2)model_parameters(unadj.out1, ci =0.95) |>gt() |>fmt_number(columns =-df_error, decimals =3) |>opt_stylize(style =6, color ="pink")
Parameter
Coefficient
SE
CI
CI_low
CI_high
t
df_error
p
(Intercept)
16.412
0.133
0.950
16.150
16.673
123.071
1552
0.000
dentist_grpLow
2.364
0.189
0.950
1.994
2.734
12.533
1552
0.000
The unadjusted model estimates that low dentist proportions increases smoking prevalence in US counties by 2.36 percentage points (95% CI: 1.99, 2.73).
6.3 1:1 Matching With Replacement
The binary exposure has an equal number of observations in each group (n=777). Therefore, we will use 1:1 matching with replacement.
Code
X <- chr_fin2$linps1 Tr <-as.logical(chr_fin2$dentist_n)match1 <-Match(Tr=Tr, X=X, M =1, replace =TRUE, ties=FALSE)summary(match1)
Estimate... 0
SE......... 0
T-stat..... NaN
p.val...... NA
Original number of observations.............. 1554
Original number of treated obs............... 777
Matched number of observations............... 777
Matched number of observations (unweighted). 777
checking if matching worked:
Code
n_distinct(match1$index.treated)
[1] 777
Code
n_distinct(match1$index.control)
[1] 278
After 1:1 matching with replacement, the effective size was reduced to 272 matched counties from the treatment and control group.
As previously mentioned, the effective size after 1:1 matching with replacement is 278 counties. We lost 499 counties from the sample because they were unmatched.
Now we will build the love plot for the standardized differences:
Code
lplot1 <-love.plot(b, threshold = .1, size =3,var.order ="unadjusted", stars ="raw",title ="Standardized Differences and 1:1 Matching with Replacement")lplot1
All covariates have improved standardized difference after 1:1 matching adjustment, with the exception for social_score.
Plot of Variance Ratios
Code
lplot2 <-love.plot(b, stat ="v",threshold =1.25, size =3,var.order ="unadjusted",title ="Variance Ratios and 1:1 Matching with Replacement")lplot2 +theme_bw()
Most of the covariates displayed improved variance ratios with the exception of income_inequality, highschool_grad, insuff_sleep, and above_65.
The absolute value of the standardized difference of the linear propensity score before adjustment was 114.8%. The value after adjustment (52.76%) shows substantial improvement as it is close to the maximum value of 50%, but the value remains above 10% and is not close to 0%. In this case, we barely pass Rubin’s Rule 1.
The previous ratio of the variance of the linear propensity score was 0.98. This value also shows mild improvement where the value is closer to 1 and remains below 2. We also pass Rubin’s Rule 2 and can proceed to use the results of this PS adjustment to measure the estimates.
6.3.6 Estimating Effect of Exposure on Outcome After 1:1 Matching
We will use the ATT (Average Treatment Effect on the Treated) estimate to measure the dentist proportion effect on smoking prevalence in US Counties. First, we will need to create a data frame out of the matched sample:
After 1:1 matching with replacement, our model estimates the effect of low dentists proportions on smoking prevalence in US counties to increase by 0.78 percentage points (95% CI: 0.39, 1.16). Our confidence interval does not include 0, therefore, following up with a sensitivity analysis is a reasonable choice.
6.4 Use ATT Weighting Adjustment
We will start by creating a data frame of our data set.
Code
chrfin_df <- base::data.frame(chr_fin2)
Now, we will calculate the propensity score using the ps function from the twang package.
After ATT weight adjustment, the estimated impact of low dentist proportions on smoking prevalence in US counties is decrease of -0.43 percentage points (95% CI: -2.57, 1.71).
After Double Robust adjustment, the estimated impact of low dentist proportions on smoking prevalence in US counties is an increase of 0.72 percentage points (95% CI: -0.17, 1.61). These results are slightly different from the ATT Weighting results.
We can see that propensity score adjustment decreased the estimated affect of dentist proportions on smoking prevalence in US counties (from 2.36% to about 0.7-0.8%) . While two of the three adjustments present close estimates, the 1:1 matching with replacement adjustment displays the smallest confidence interval, the best balance, and largest effective size (278 after 1:1 matching compared to 163 after ATT weighting).
6.7 Sensitivity Analysis
The estimate produced from the 1:1 Matching with replacement has a statistically significant p-value. Therefore, we will need to perform a sensitivity analysis.
Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value
Unconfounded estimate .... NaN
Gamma Lower bound Upper bound
1.00 NaN NaN
1.25 NaN NaN
1.50 NaN NaN
1.75 NaN NaN
2.00 NaN NaN
2.25 NaN NaN
2.50 NaN NaN
2.75 NaN NaN
3.00 NaN NaN
3.25 NaN NaN
3.50 NaN NaN
3.75 NaN NaN
4.00 NaN NaN
4.25 NaN NaN
4.50 NaN NaN
4.75 NaN NaN
5.00 NaN NaN
Note: Gamma is Odds of Differential Assignment To
Treatment Due to Unobserved Factors
Our conclusions regarding the impact of low dentists proportions increasing smoking prevalence in US counties hold true in absence of biases in the study. This conclusion is maintained in the presence of a biases level between L= 1.00 and L= 1.25. This means that only small hidden bias is needed to change our conclusions.
7 Outcome 2 Analysis
7.1 Unadjusted Outcome 2 drinking_prop
We will observe the the distribution of the outcome drinking_prop among the two exposure groups dentist_grp.
Code
df_stats(drinking_prop ~ dentist_grp, data = chr_fin2) |>gt() |>fmt_number(mean:sd, decimals =2) |>opt_stylize(style =5, color ="pink")
response
dentist_grp
min
Q1
median
Q3
max
mean
sd
n
missing
drinking_prop
High
10.140012
16.79649
18.88142
20.74658
29.44012
18.83
3.24
777
0
drinking_prop
Low
9.265156
14.09065
16.03230
18.32060
27.81331
16.26
2.91
777
0
The low dentist group has a lower mean of excessive drinking prevalence (16.3 vs 18.8).
7.1.1 Linear Regression Model for Outcome 2
We will build a linear regression model for outcome 2 drinking_prop and the exposure dentist_grp without including the covariates.
Code
unadj.out2 <-lm(drinking_prop ~ dentist_grp, data= chr_fin2)model_parameters(unadj.out2, ci =0.95) |>gt() |>fmt_number(columns =-df_error, decimals =3) |>opt_stylize(style =6, color ="pink")
Parameter
Coefficient
SE
CI
CI_low
CI_high
t
df_error
p
(Intercept)
18.825
0.110
0.950
18.609
19.042
170.493
1552
0.000
dentist_grpLow
−2.563
0.156
0.950
−2.869
−2.256
−16.412
1552
0.000
The unadjusted model estimates that low dentist proportions decreases excessive drinking prevalence in US counties by 2.56 percentage points (95% CI: -2.87, -2.26).
7.2 1:1 Matching With Replacement
The binary exposure has an equal number of observations in each group (n=777). Therefore, we will use 1:1 matching with replacement.
Code
X <- chr_fin2$linps1 Tr <-as.logical(chr_fin2$dentist_n)match2 <-Match(Tr=Tr, X=X, M =1, replace =TRUE, ties=FALSE)summary(match2)
Estimate... 0
SE......... 0
T-stat..... NaN
p.val...... NA
Original number of observations.............. 1554
Original number of treated obs............... 777
Matched number of observations............... 777
Matched number of observations (unweighted). 777
checking if matching worked:
Code
n_distinct(match2$index.treated)
[1] 777
Code
n_distinct(match2$index.control)
[1] 269
After 1:1 matching with replacement, the effective size was reduced to 269 matched counties from the low dentist and high dentist groups.
As previously mentioned, the effective size after 1:1 matching with replacement is 269 counties. We lost 508 counties from the sample because they were unmatched.
Now we will build the love plot for the standardized differences:
Code
lplot5 <-love.plot(b2, threshold = .1, size =3,var.order ="unadjusted", stars ="raw",title ="Standardized Differences and 1:1 Matching with Replacement")lplot5
All covariates have improved standardized difference after 1:1 matching adjustment, with the exception for social_score.
Plot of Variance Ratios
Code
lplot6 <-love.plot(b2, stat ="v",threshold =1.25, size =3,var.order ="unadjusted",title ="Variance Ratios and 1:1 Matching with Replacement")lplot6 +theme_bw()
Most of the covariates displayed improved variance ratios with the exception of highschool_grad, insuff_sleep, unemployment, and above_65.
The absolute value of the standardized difference of the linear propensity score before adjustment was 114.8%. The value after adjustment (52.03%) shows substantial improvement, however the value is above 10% and far from to 0%. In this case, we barely pass Rubin’s Rule 1.
The previous ratio of the variance of the linear propensity score was 0.98. This value also shows mild improvement where the value is closer to 1 (1.03) and remains below 2. We also pass Rubin’s Rule 2 and can proceed to use the results of this PS adjustment to measure the estimates.
7.2.6 Estimating Effect of Exposure on Outcome After 1:1 Matching
We will use the ATT (Average Treatment Effect on the Treated) estimate to measure the dentist proportion effect on smoking prevalence in US Counties. First, we will need to create a data frame out of the matched sample:
After 1:1 matching with replacement, our model estimates the effect of low dentists proportions on excessive drinking prevalence in US counties to increase by 0.40 percentage points (95% CI: 0.16, 0.64). Our confidence interval does not include 0, therefore, following up with a sensitivity analysis is a reasonable choice.
7.3 Use ATT Weighting Adjustment
We will start by creating a data frame of our data set.
Code
chrfin_df2 <- base::data.frame(chr_fin2)
Now, we will calculate the propensity score using the ps function from the twang package.
After ATT weight adjustment, the estimated impact of low dentist proportions on excessive drinking prevalence in US counties is an increase of 1.15 percentage points (95% CI: 0.13, 2.17).
7.4 Use Double Robust Adjustment
We will use the previous double robust design chr.dr.design.
After Double Robust adjustment, the estimated impact of low dentist proportions on excessive drinking prevalence in US counties is a decrease of -0.11 percentage points (95% CI: -0.54, 0.33). These results are very different from the ATT Weighting results.
We can see that propensity score adjustment inversed the estimated affect of dentist proportions on excessive drinking prevalence in US counties (from -2.56% to 0.4-1.2%) . While two of the three adjustments present close estimates, the 1:1 matching with replacement adjustment displays the smallest confidence interval, the best balance, and largest effective size (273 after 1:1 matching compared to 163 after ATT weighting).
7.6 Sensitivity Analysis
The estimate produced from the 1:1 Matching with replacement has a statistically significant p-value. Therefore, we will need to perform a sensitivity analysis.
Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value
Unconfounded estimate .... NaN
Gamma Lower bound Upper bound
1.00 NaN NaN
1.25 NaN NaN
1.50 NaN NaN
1.75 NaN NaN
2.00 NaN NaN
2.25 NaN NaN
2.50 NaN NaN
2.75 NaN NaN
3.00 NaN NaN
3.25 NaN NaN
3.50 NaN NaN
3.75 NaN NaN
4.00 NaN NaN
4.25 NaN NaN
4.50 NaN NaN
4.75 NaN NaN
5.00 NaN NaN
Note: Gamma is Odds of Differential Assignment To
Treatment Due to Unobserved Factors
Our conclusions regarding the impact of low dentists proportions increasing excessive drinking prevalence in US counties hold true in absence of biases in the study. This conclusion is maintained in the presence of a biases level between L= 1.00 and L= 1.25. This means that only small hidden bias is needed to change our conclusions.
8 Discussion
This study found that counties with low dentist proportions are associated with a small increase in smoking prevalence and excessive drinking prevalence among adults (0.78% and 0.4% respectively). Although these estimates provided a meaningful size effect, the sensitivity analysis concluded that our conclusions may change in the presence of small hidden bias (gamma between 1.0 and 1.25). The weak findings may be attributed to the existing low number of dentists per 1000 population in US counties. Despite the attempt to dichotomize the dentist variable by using the first and third quartiles (0.2 and 0.6 per 1000 population), the difference was not large enough to detect observable impact. It is also important to note that increased presence of dentists does not spontaneously reduce negative oral habits (smoking and excessive drinking). The quality care and accessibility of services is an important factor in improving oral health and other related health behaviors.
This project emphasized the importance of using confounders to assess selection biases in observational studies. The difference in the exposure’s effect estimates after propensity score adjustment was remarkable. While our initial belief was that having more covariates would improve the analysis, this project corrected that misconception. In our analysis, multiple covariates were dropped due to high correlation, collinearity, or for disrupting the propensity score values. Selecting and evaluating appropriate covariates was the most critical part for the propensity score analysis. To improve my study in the future, we will need to explore more variables from the County Health Rankings (CHR) as we may have missed other useful variables during the initial selection phase. Also, using the Health Professional Shortage Area classification (HPSA) instead of the number of dentists per population can provide a better assessment of the impact of dentist quantities on oral-related health behaviors in US counties.
---title: "Dentists, Adult Smoking, and Excessive Drinking"subtitle: "A Cross-sectional Study from CHR 2018 - 2024"author: "Sarah Albalawi"date: last-modifiedformat: html: toc: true number-sections: true code-fold: show code-tools: true code-overflow: wrap embed-resources: true date-format: iso theme: default ---# R Setup {.unnumbered}```{r}#| message: false#| warning: falseknitr::opts_chunk$set(comment =NA) library(janitor)library(dplyr)library(readr)library(magrittr)library(Hmisc)library(knitr)library(naniar)library(easystats)library(tableone)library(mosaic)library(gt)library(Epi)library(Matching)library(arm)library(cobalt)library(broom)library(survival)library(ggplot2)library(mice)library(patchwork)library(twang)library(car)library(rbounds)library(tidyverse)theme_set(theme_bw()) ```### Loading CHR Data {.unnumbered}We will load two data sets from the County Health Ranking source: CHR from year 2024 and CHR from year 2018.```{r}chr24_raw <-read_csv("analytic_data2018_0.csv", skip =1, show_col_types =FALSE)dim(chr24_raw)``````{r}chr18_raw <-read_csv("analytic_data2024.csv", skip =1, show_col_types =FALSE)dim(chr18_raw)```# Data Management## Joining data setsWe will only use the outcomes `smoking` and `excessive drinking` from the 2024 CHR data set.```{r}chr24_var <- chr24_raw|> dplyr::select( fipscode, v009_rawvalue, v049_rawvalue)```The exposure `dentist proportion` and the covariates will be pulled from the 2018 CHR data set.```{r}chr18_var <- chr18_raw|> dplyr::select( fipscode, v002_rawvalue, v036_rawvalue, v042_rawvalue, v133_rawvalue, v085_rawvalue, v004_rawvalue, v088_rawvalue, v062_other_data_1, v069_rawvalue, v023_rawvalue, v044_rawvalue, v140_rawvalue, v147_rawvalue, v060_rawvalue, v143_rawvalue, v021_rawvalue, v141_rawvalue, v153_rawvalue, v154_rawvalue, v052_rawvalue, v053_rawvalue, v054_rawvalue, v055_rawvalue, v081_rawvalue, v056_rawvalue, v126_rawvalue, v057_rawvalue, v058_rawvalue, v070_rawvalue, v011_rawvalue, v063_rawvalue) ```Now we will joing the two table to produce our initial working data set `chr_1`.```{r}chr_1 <- chr18_var |>left_join(chr24_var, by="fipscode")dim(chr_1)```Our joint data set has 3,195 observations (counties) and 34 variables (health measures).## The Codebook {.unnumbered}The data file includes 3,195 observations (counties), on 34 variables (health measures).| Variable | Description | Notes | Year ||---:|----|----|----|| `fipscode` | 5-digit FIPS Code | | || `v088_rawvalue` | Number of dentists per 1000 population | **Binary Exposure**: High proportion vs Low proportion | 2018 || `v009_rawvalue` | Adult Smoking prevalence % | **Quantitative Outcome 1** | 2024 || `v049_rawvalue` | Excessive Drinking prevalence % | **Quantitative outcome 2** | 2024 || `v052_rawvalue` | \% Below 18 Years of Age proportion | *Covariate* | 2018 || `v053_rawvalue` | \% 65 and Older proportion | *Covariate* | 2018 || `v054_rawvalue` | \% Non-Hispanic Black proportion | *Covariate* | 2018 || `v055_rawvalue` | \% American Indian or Alaska Native proportion | *Covariate* | 2018 || `v081_rawvalue` | \% Asian proportion | *Covariate* | 2018 || `v056_rawvalue` | \% Hispanic proportion | *Covariate* | 2018 || `v126_rawvalue` | \% Non-Hispanic White proportion | *Covariate* | 2018 || `v057_rawvalue` | \% Female proportion | *Covariate* | 2018 || `v058_rawvalue` | \% Rural | *Covariate* | 2018 || `v002_rawvalue` | \% Poor or Fair Health | *Covariate* | 2018 || `v036_rawvalue` | Poor Physical Health Days | *Covariate* | 2018 || `v042_rawvalue` | Poor Mental Health Days | *Covariate* | 2018 || `v133_rawvalue` | Food Environment Index | *Covariate* | 2018 || `v085_rawvalue` | Uninsured population % | *Covariate* | 2018 || `v004_rawvalue` | Primary Care Physicians per population | *Covariate* | 2018 || `v062_other_data_1` | Ratio of population to mental health providers | *Covariate* | 2018 || `v021_rawvalue` | High School Completion % | *Covariate* | 2018 || `v069_rawvalue` | Some College % | *Covariate* | 2018 || `v023_rawvalue` | Unemployment % | *Covariate* | 2018 || `v044_rawvalue` | Income Inequality | *Covariate* | 2018 || `v140_rawvalue` | Social Associations | *Covariate* | 2018 || `v070_rawvalue` | Physical Inactivity days | *Covariate* | 2018 || `v011_rawvalue` | Adult Obesity prevalence % | *Covariate* | 2018 || `v060_rawvalue` | Diabetes Prevalence % | *Covariate* | 2018 || `v143_rawvalue` | Insufficient Sleep proportion % | *Covariate* | 2018 || `v141_rawvalue` | Residential Segregation - Black/White | *Covariate* | 2018 || `v147_rawvalue` | Life Expectancy (years) | *Covariate* | 2018 || `v063_rawvalue` | Median Household Income | *Covariate* | 2018 || `v153_rawvalue` | Home ownership | *Covariate* | 2018 || `v154_rawvalue` | Severe Housing Cost Burden | *Covariate* | 2018 |> Note: The analysis will include one exposure and two outcome variables.## Renaming the variables```{r}chr_1 <- chr_1 |>mutate(dentist_grp = v088_rawvalue*100,smoking_prop = v009_rawvalue*100,drinking_prop = v049_rawvalue*100)|>rename(below_18 = v052_rawvalue,above_65 = v053_rawvalue,black = v054_rawvalue,native = v055_rawvalue,asian = v081_rawvalue,hispanic = v056_rawvalue,white = v126_rawvalue,female = v057_rawvalue,rural = v058_rawvalue,poor_health = v002_rawvalue,poor_physical = v036_rawvalue,poor_mental = v042_rawvalue,food_index = v133_rawvalue,uninsured = v085_rawvalue,primary_care = v004_rawvalue,mental_care = v062_other_data_1,highschool_grad = v021_rawvalue,some_college = v069_rawvalue,unemployment = v023_rawvalue,income_inequality = v044_rawvalue,social_score = v140_rawvalue,inactivity = v070_rawvalue,obesity = v011_rawvalue,diabetes = v060_rawvalue, insuff_sleep = v143_rawvalue,segregation = v141_rawvalue, life_expectancy = v147_rawvalue,household_income = v063_rawvalue, home_ownership = v153_rawvalue,housing_burden = v154_rawvalue )|> dplyr::select(-v088_rawvalue, -v009_rawvalue , -v049_rawvalue)```Checking the new names:```{r}names(chr_1)```> The new names look reasonable and are free from spelling errors.## Categorizing Exposure `dentist_grp````{r}summary(chr_1$dentist_grp)```We will use the first quartile (0.028) and third quartile (0.065) as cut points to dichotomize the `dentist_grp` variable.```{r}chr_1 <- chr_1 |>mutate(dentist_grp =case_when( dentist_grp <=0.02760~"Low", dentist_grp >=0.06495~"High")) |>mutate(dentist_grp =factor(dentist_grp))chr_1 |>count(dentist_grp)``````{r}chr_1 <- chr_1 |>mutate(dentist_n =as.numeric(dentist_grp),dentist_n = dentist_n -1)chr_1 |>count(dentist_n)```We have 777 counties in each group.## Numeric Summaries Before Imputation```{r}describe(chr_1)```## Dealing with Missingness```{r}miss_var_summary(chr_1)```The `segregation` variable constitutes 33.4% of the missing data, followed by highschool_grd (21.2%), `mental_care` (5.79%), `primary_care` (4.91%), life_expectancy (2.28%), `food_index` (1.10%). Other variables with minimal missingness (\<1%) include `income_inequality`, `housing_burden`, `insuff_sleep`, `uninsured`, `unemployment`, `household_income`.### Filter Complete Cases For Exposure `dentist_prop`We will remove missing cases for the exposure `dentist_prop` and name the set `chr24_si`.```{r}chr_simp <- chr_1 |>filter(!is.na(dentist_grp))|>filter(!is.na(dentist_n))``````{r}n_miss(chr_simp$dentist_n)```The outcome variable has no missing values. Now we will deal with the missing values in the covariates.### Single Imputation for CovariatesWe will use the `mice` function from the `mice` package to perform single imputation for all variables in the `chr24_si` data set.```{r}chr_simp <-mice(chr_simp, m =1, seed =500, print =FALSE) |>complete() |>tibble()``````{r}miss_var_summary(chr_simp)```There are no missing observation in the semi-final working data set `chr_simp`.# Final Variable Selection## Check Correlation```{r}res <-correlation(chr_simp, method ="spearman")res```The variables with a correlation of 0.8 or more (or -0.8 or less) are:- **poor_health** with `poor_physical`, `life_expectancy`, `diabetes`, `inactivity`, and `obesity`- **poor_physical** with `poor_mental`, `life_expectancy`, `diabetes`, and `inactivity`- **diabetes** with `inactivity`> The variables that exhibited high correlation with the outcome `smoking_prop` will be removed from the analysis (`poor_health`, `poor_physical`, and `diabetes`).```{r}chr_fin <- chr_simp |> dplyr::select(-fipscode, -poor_health, -poor_physical, -inactivity)dim(chr_fin)```# Table 1setting up table 1:```{r}myVars <-c("poor_mental", "food_index","uninsured", "primary_care", "mental_care","highschool_grad", "some_college", "unemployment", "income_inequality","social_score", "life_expectancy", "insuff_sleep","home_ownership", "housing_burden","below_18","above_65", "black", "native", "asian", "hispanic","white", "female", "rural", "obesity", "household_income", "smoking_prop", "drinking_prop")t1 <- tableone::CreateTableOne(data = chr_fin, vars = myVars,strata =c("dentist_grp"))table_one_text <-print( t1,showAllLevels =TRUE,quote =FALSE,noSpaces =FALSE,printToggle =FALSE)knitr::kable( table_one_text,format ="html",caption ="Comparison by Dentist Proportion")```The low dentist group counties have more proportions of uninsured populations (0.13 vs 0.09 ), lower college education (0.54 vs 0.67), lower life expectancy (74.9 vs 77.3), more rural areas (0.89 vs 0.38), **more** `smoking prevalence` (18.8 vs 16.4) and **lower** `excessive drinking` (16.3 vs 18.8).# Propensity Score Assessment## Calculating and Evaluating PS BalanceTo calculate the PS, we will build a logistic model with the binary exposure as the outcome and the covariates as the predictors.```{r}psmodel0 <-glm(dentist_grp ~ poor_mental + food_index + uninsured + income_inequality + primary_care + mental_care + highschool_grad + some_college + unemployment + social_score + life_expectancy + insuff_sleep + segregation + home_ownership + housing_burden + below_18 + above_65 + black + native + asian + hispanic + white + female + rural + obesity + household_income, family =binomial(), data= chr_fin)model_parameters(psmodel0) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```### Saving Propesnsity Score```{r}chr_fin$ps0 <- psmodel0$fittedchr_fin$linps0 <- psmodel0$linear.predictors```### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps0 + linps0 ~ dentist_grp, data = chr_fin) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```> The propensity scores here are concerning. We will need to observe the value of inflation to observe potential collinearity between the covariates.The VIF:```{r}vif_model <-glm(dentist_grp ~ poor_mental + food_index + uninsured + income_inequality + primary_care + mental_care + highschool_grad + some_college + unemployment + social_score + life_expectancy + insuff_sleep + segregation + home_ownership + housing_burden + below_18 + above_65 + black + native + asian + hispanic + white + female + rural + obesity + household_income, family =binomial(), data= chr_fin)vif(vif_model)```While most of the variables have low variance inflation factors, there are a few with an extremely high values (\>10), which presents serious multi-collinearity. These variables are `hispanic`, `white`, `black`, and `native`. We will remove these variables from our analysis.### Fixed Propensity ScoreWe will start by excluding the problematic variables from our final data set `chr_fin` to produce subset for outcome 1 analysis `chr24_smoking`.```{r}chr_fin1 <- chr_fin|> dplyr::select(-hispanic, -white, -black, -native, -ps0, -linps0)```Next, we will build the new model:```{r}psmodel00 <-glm(dentist_grp ~ poor_mental + food_index + uninsured + income_inequality + primary_care + mental_care + highschool_grad + some_college + unemployment + social_score + life_expectancy + insuff_sleep + segregation + home_ownership + housing_burden + below_18 + above_65 + asian + female + rural + obesity + household_income, family =binomial(), data= chr_fin1)model_parameters(psmodel00) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps00 <- psmodel00$fittedchr_fin1$linps00 <- psmodel00$linear.predictors```### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps00 + linps00 ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```> The propensity scores slightly improved after removing the variables with high collinearity values. However, the problem with their values being too close to 0 or 1 is visibly persistent.#### Fixed Propensity Score - Part 2We will attempt to observe the impact of each covariate on the outcome individually to rule out the variable(s) that impacted the PS values.##### `poor_mental````{r}psmodel_pm <-glm(dentist_grp ~ poor_mental, family =binomial(), data= chr_fin1)model_parameters(psmodel_pm) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_pm <- psmodel_pm$fittedchr_fin1$linps_pm <- psmodel_pm$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_pm + linps_pm ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `poor_mental` variable looks has two linear PS less than 0.01 and above 1.##### `food_index````{r}psmodel_fi <-glm(dentist_grp ~ food_index, family =binomial(), data= chr_fin1)model_parameters(psmodel_fi) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_fi <- psmodel_fi$fittedchr_fin1$linps_fi <- psmodel_fi$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_fi + linps_fi ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `food_index` variable has two linear PS less than 0.01 and above 1. All PS values are close to 0.99 or above, therefore, it will be removed.##### `uninsured````{r}psmodel_un <-glm(dentist_grp ~ uninsured, family =binomial(), data= chr_fin1)model_parameters(psmodel_un) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_un <- psmodel_un$fittedchr_fin1$linps_un <- psmodel_un$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_un + linps_un ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `uninsured` variable has two linear PS less than 0.01 and above 1. All PS values are close to 0.99 or above, therefore, it will be removed.##### `income_inequality````{r}psmodel_ii <-glm(dentist_grp ~ income_inequality, family =binomial(), data= chr_fin1)model_parameters(psmodel_ii) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_ii <- psmodel_ii$fittedchr_fin1$linps_ii <- psmodel_ii$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_ii + linps_ii ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `income_inequality` has two linear propensity scores less than 0.01 and above 1.##### `primary_care````{r}psmodel_pc <-glm(dentist_grp ~ primary_care, family =binomial(), data= chr_fin1)model_parameters(psmodel_pc) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```This variable has an exploding coefficient.##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_pc <- psmodel_pc$fittedchr_fin1$linps_pc <- psmodel_pc$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_pc + linps_pc ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `primary_care` variable has all propensity scores as less than 0.01 and above 1. This variable will be removed from the analysis.##### `mental_care````{r}psmodel_mc <-glm(dentist_grp ~ mental_care, family =binomial(), data= chr_fin1)model_parameters(psmodel_mc) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_mc <- psmodel_mc$fittedchr_fin1$linps_mc <- psmodel_mc$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_mc + linps_mc ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `mental_care` variable has all ps values smaller than 0.01 and above 1. This variable will be removed from the analysis.##### `highschool_grad````{r}psmodel_hs <-glm(dentist_grp ~ highschool_grad, family =binomial(), data= chr_fin1)model_parameters(psmodel_hs) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_hs <- psmodel_hs$fittedchr_fin1$linps_hs <- psmodel_hs$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_hs + linps_hs ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `highschool_grad` variable has two PS value less than 0.01, but overall looks acceptable.##### `some_college````{r}psmodel_sc <-glm(dentist_grp ~ some_college, family =binomial(), data= chr_fin1)model_parameters(psmodel_sc) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_sc <- psmodel_sc$fittedchr_fin1$linps_sc <- psmodel_sc$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_sc + linps_sc ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `some_college` variable has two linear PS value less than 0.01 and 3 above 0.99. This variable will be removed.##### `unemployment````{r}psmodel_ue <-glm(dentist_grp ~ unemployment, family =binomial(), data= chr_fin1)model_parameters(psmodel_ue) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_ue <- psmodel_ue$fittedchr_fin1$linps_ue <- psmodel_ue$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_ue + linps_ue ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `unemployment` variable has two linear PS values less than 0.01, but overall looks acceptable.##### `social_score````{r}psmodel_ss <-glm(dentist_grp ~ social_score, family =binomial(), data= chr_fin1)model_parameters(psmodel_ss) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_ss <- psmodel_ss$fittedchr_fin1$linps_ss <- psmodel_ss$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_ss + linps_ss ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `social_score` has two linear PS values less than 0.01, but overall looks acceptable.##### `life_expectancy````{r}psmodel_le <-glm(dentist_grp ~ life_expectancy, family =binomial(), data= chr_fin1)model_parameters(psmodel_le) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_le <- psmodel_le$fittedchr_fin1$linps_le <- psmodel_le$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_le + linps_le ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `life_expectancy` variable has two linear PS less than 0.01 and above 1. The other PS values are also problematic, therefore, it will be removed.##### `insuff_sleep````{r}psmodel_is <-glm(dentist_grp ~ insuff_sleep, family =binomial(), data= chr_fin1)model_parameters(psmodel_is) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_is <- psmodel_is$fittedchr_fin1$linps_is <- psmodel_is$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_is + linps_is ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `insuff_sleep` variable has two linear PS less than 0.01 and above 1.##### `home_ownership````{r}psmodel_ho <-glm(dentist_grp ~ home_ownership, family =binomial(), data= chr_fin1)model_parameters(psmodel_ho) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_ho <- psmodel_ho$fittedchr_fin1$linps_ho <- psmodel_ho$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_ho + linps_ho ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `home_ownership` variable has all PS values in problematic levels. Therefore, it will be removed.##### `housing_burden````{r}psmodel_hb <-glm(dentist_grp ~ housing_burden, family =binomial(), data= chr_fin1)model_parameters(psmodel_hb) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_hb <- psmodel_hb$fittedchr_fin1$linps_hb <- psmodel_hb$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_hb + linps_hb ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `housing_burden` variable has two linear PS less than 0.01 and above 1. All PS values are close to 0.99 or above, therefore, it will be removed.##### `below_18````{r}psmodel_b18 <-glm(dentist_grp ~ below_18, family =binomial(), data= chr_fin1)model_parameters(psmodel_b18) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_b18 <- psmodel_b18$fittedchr_fin1$linps_b18 <- psmodel_b18$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_b18 + linps_b18 ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `below_18` variable has two linear PS values less than 0.01 (0.00). Otherwise it looks acceptable.##### `above_65````{r}psmodel_a65 <-glm(dentist_grp ~ above_65, family =binomial(), data= chr_fin1)model_parameters(psmodel_a65) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_a65 <- psmodel_a65$fittedchr_fin1$linps_a65 <- psmodel_a65$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_a65 + linps_a65 ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `above_65` variable has two linear PS less than 0.01 and above 1.##### `asian````{r}psmodel_as <-glm(dentist_grp ~ asian, family =binomial(), data= chr_fin1)model_parameters(psmodel_as) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_as <- psmodel_as$fittedchr_fin1$linps_as <- psmodel_as$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_as + linps_as ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `asian` variable has multiple PS values in problematic levels, therefore, it will be removed.##### `female````{r}psmodel_fm <-glm(dentist_grp ~ female, family =binomial(), data= chr_fin1)model_parameters(psmodel_fm) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_fm <- psmodel_fm$fittedchr_fin1$linps_fm <- psmodel_fm$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_fm + linps_fm ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `female` variable has two linear PS less than 0.01 all PS values at or above 0.99. Therefore, it will be removed.##### `rural````{r}psmodel_rl <-glm(dentist_grp ~ rural, family =binomial(), data= chr_fin1)model_parameters(psmodel_rl) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_rl <- psmodel_rl$fittedchr_fin1$linps_rl <- psmodel_rl$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_rl + linps_rl ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `rural` variable has two linear PS less than 0.01 and value above 1. We might remove this variable.##### `obesity````{r}psmodel_ob <-glm(dentist_grp ~ obesity, family =binomial(), data= chr_fin1)model_parameters(psmodel_ob) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_ob <- psmodel_ob$fittedchr_fin1$linps_ob <- psmodel_ob$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_ob + linps_ob ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `obesity` variable has two linear PS less than 0.01 and all PS values around 0.99. We will remove this variable.##### `segregation````{r}psmodel_se <-glm(dentist_grp ~ segregation, family =binomial(), data= chr_fin1)model_parameters(psmodel_se) |>gt() |>fmt_number(decimals =3) |>opt_stylize(style =1, color ="pink")```##### Saving Propesnsity ScoreNow we will save the new propensity scores:```{r}chr_fin1$ps_se <- psmodel_se$fittedchr_fin1$linps_se <- psmodel_se$linear.predictors```##### Comparing The PS Distribution of The Two Exposure Groups```{r}df_stats(ps_se + linps_se ~ dentist_grp, data = chr_fin1) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```The `segregation` variable has two linear PS less than 0.01 and all PS values around 0.99. We might remove this variable.### PS After Covariates RemovalWe will start by building a subset for the first outcome in our analysis `chr_fin2` after removing the problematic covariates from the original `chr_fin` set.```{r}chr_fin2 <- chr_fin|> dplyr::select( -hispanic, -white, -black, -native, -ps0, -linps0, -food_index, -uninsured, -primary_care, -mental_care, -some_college, -life_expectancy, -housing_burden, -home_ownership, -female, -rural, -asian, -obesity)```Now, we will build our `psmodel1`.```{r}psmodel1 <-glm(dentist_grp ~ poor_mental + unemployment + income_inequality + social_score + diabetes + insuff_sleep + highschool_grad + segregation + below_18 + above_65 ,family =binomial(), data= chr_fin2)model_parameters(psmodel1) |>gt() |> gt::fmt_number(decimals =3) |> gt::opt_stylize(style =1, color ="pink")``````{r}chr_fin2$ps1 <- psmodel1$fittedchr_fin2$linps1 <- psmodel1$linear.predictors``````{r}df_stats(ps1 + linps1 ~ dentist_grp, data = chr_fin2) |>gt() |>fmt_number(columns =c(-n, -missing), decimals =3) |>opt_stylize(style =2, color ="pink")```> The problematic values in the propensity score are persistent and we will need to investigate this using other methods. For now, we will proceed to the analysis phase.# Outcome 1 Analysis## The Blanace Before PS Adjustment```{r}ggplot(chr_fin2, aes(x = dentist_grp, y = ps1, color = dentist_grp)) +geom_boxplot() +geom_jitter(width =0.1) +guides(color ="none")``````{r}ggplot(chr_fin2, aes(x = linps1, fill = dentist_grp)) +geom_density(alpha =0.3)```> There is a good overlap between the two groups.### Rubin's Rule 1```{r}rubin1.unadj <-with(chr_fin2,abs(100*(mean(linps1[dentist_n==1])-mean(linps1[dentist_n==0]))/sd(linps1)))rubin1.unadj```The absolute value of the standardized difference of the linear propensity score comparing the counties with high dentist proportions and low dentist proportions is 114.8%. This value is too far from the ideal value of 0% and above the maximum acceptable value of 50 %. We definitely do not pass Rubin's rule 1, therefore, propensity score adjustment is needed.### Rubin's Rule 2```{r}rubin2.unadj <-with(chr_fin2, var(linps1[dentist_n==1])/var(linps1[dentist_n==0]))rubin2.unadj```The ratio of variances of the linear propensity score comparing the counties with high dentist proportions and low dentist proportions is 0.98. This value is very close to the desired value of 1.0. In this case, we pass Rubin's Rule 2.## Unadjusted Outcome 1 `smoking_prop`We will observe the the distribution of the outcome `smoking_prop` among the two exposure groups `dentist_grp`.```{r}df_stats(smoking_prop ~ dentist_grp, data = chr_fin2) |>gt() |>fmt_number(mean:sd, decimals =2) |>opt_stylize(style =5, color ="pink")```The high dentist group has a lower mean of smoking proportions (16.4 vs 18.8).### Linear Regression Model for Outcome 1We will build a linear regression model for outcome 1 `smoking_prop` and the exposure `dentist_prop` without including the covariates.```{r}unadj.out1 <-lm(smoking_prop ~ dentist_grp, data= chr_fin2)model_parameters(unadj.out1, ci =0.95) |>gt() |>fmt_number(columns =-df_error, decimals =3) |>opt_stylize(style =6, color ="pink")```> The unadjusted model estimates that low dentist proportions increases smoking prevalence in US counties by 2.36 percentage points (95% CI: 1.99, 2.73).## 1:1 Matching With ReplacementThe binary exposure has an equal number of observations in each group (n=777). Therefore, we will use 1:1 matching with replacement.```{r}X <- chr_fin2$linps1 Tr <-as.logical(chr_fin2$dentist_n)match1 <-Match(Tr=Tr, X=X, M =1, replace =TRUE, ties=FALSE)summary(match1)```checking if matching worked:```{r}n_distinct(match1$index.treated)``````{r}n_distinct(match1$index.control)```> After 1:1 matching with replacement, the effective size was reduced to 272 matched counties from the treatment and control group.### Balance Assessment```{r}set.seed(500)mb1 <-MatchBalance(dentist_n ~ poor_mental + unemployment + income_inequality + social_score + diabetes + insuff_sleep + highschool_grad + segregation + below_18 + above_65 + ps1 + linps1, data= chr_fin2, match.out = match1, nboots=500)```### Tabulating Standardized DifferencesWe will name the covariates used in our model.```{r}covnames <-c( "poor_mental", "unemployment", "income_inequality", "social_score", "diabetes", "insuff_sleep", "highschool_grad", "segregation","below_18", "above_65", "raw PS", "linear PS") ```Then we will extract the standardized differences.```{r}pre.szd <-NULL; post.szd <-NULLfor(i in1:length(covnames)) { pre.szd[i] <- mb1$BeforeMatching[[i]]$sdiff.pooled post.szd[i] <- mb1$AfterMatching[[i]]$sdiff.pooled}```Now, we can have our final table.```{r}match_szd <-tibble(covnames, pre.szd, post.szd, row.names=covnames)match_szd```We will use these results to build the love plot to better visualize the standardized differences.### Visualizing Standardized DifferencesWe will use the cobalt method to build plots for the standardized differences and the variance ratios.```{r}b <-bal.tab(match1, dentist_n ~ poor_mental + unemployment + income_inequality + social_score + diabetes + insuff_sleep + highschool_grad + segregation + below_18 + above_65 + ps1 + linps1, data= chr_fin2, stats =c("m", "v"), un =TRUE,quick =FALSE)b```> As previously mentioned, the effective size after 1:1 matching with replacement is 278 counties. We lost 499 counties from the sample because they were unmatched.Now we will build the love plot for the standardized differences:```{r}lplot1 <-love.plot(b, threshold = .1, size =3,var.order ="unadjusted", stars ="raw",title ="Standardized Differences and 1:1 Matching with Replacement")lplot1```> All covariates have improved standardized difference after 1:1 matching adjustment, with the exception for `social_score.`Plot of Variance Ratios```{r}lplot2 <-love.plot(b, stat ="v",threshold =1.25, size =3,var.order ="unadjusted",title ="Variance Ratios and 1:1 Matching with Replacement")lplot2 +theme_bw()```Most of the covariates displayed improved variance ratios with the exception of `income_inequality`, `highschool_grad`, `insuff_sleep`, and `above_65`.### Building New Data Frame with Matched Sample```{r}matches1 <-factor(rep(match1$index.treated, 2))chr.matchedsample1 <-cbind(matches1, chr_fin2[c(match1$index.control, match1$index.treated),])``````{r}chr.matchedsample1 |>count(dentist_grp)```Because the 1:1 matching method we used includes *replacement*, it is only natural to retain the original number of observations in the exposure groups.### Rubin's Rules After MatchingFirst, we will check Rubin's Rule 1:```{r}rubin1.match1 <-with(chr.matchedsample1,abs(100*(mean(linps1[dentist_n==1])-mean(linps1[dentist_n==0]))/sd(linps1)))rubin1.match1```> The absolute value of the standardized difference of the linear propensity score before adjustment was 114.8%. The value after adjustment (52.76%) shows substantial improvement as it is close to the maximum value of 50%, but the value remains above 10% and is not close to 0%. In this case, we barely pass Rubin's Rule 1.Next, we will check Rubin's Rule 2:```{r}rubin2.match1 <-with(chr.matchedsample1, var(linps1[dentist_n==1])/var(linps1[dentist_n==0]))rubin2.match1```> The previous ratio of the variance of the linear propensity score was 0.98. This value also shows mild improvement where the value is closer to 1 and remains below 2. We also pass Rubin's Rule 2 and can proceed to use the results of this PS adjustment to measure the estimates.### Estimating Effect of Exposure on Outcome After 1:1 MatchingWe will use the ATT (Average Treatment Effect on the Treated) estimate to measure the dentist proportion effect on smoking prevalence in US Counties. First, we will need to create a data frame out of the matched sample:```{r}matches1 <-factor(rep(match1$index.treated, 2))chr.matchedsamplef <-cbind(matches1, chr_fin2[c(match1$index.control, match1$index.treated),])```Now, we can produce an estimate using the matched sample from `chr_fin2`.```{r}chr.matchedsamplef$matches.f <-as.factor(chr.matchedsamplef$matches1)matched_mix.mod1 <-lmer(smoking_prop ~ dentist_n + (1| matches.f), REML =TRUE, data=chr.matchedsamplef)tidy_matched_mix.mod1 <- broom.mixed::tidy(matched_mix.mod1, conf.int =TRUE, conf.level =0.95) |>filter(term =="dentist_n")tidy_matched_mix.mod1 |>kable(digits =3)```> After 1:1 matching with replacement, our model estimates the effect of low dentists proportions on smoking prevalence in US counties to increase by 0.78 percentage points (95% CI: 0.39, 1.16). Our confidence interval does not include 0, therefore, following up with a sensitivity analysis is a reasonable choice.## Use ATT Weighting AdjustmentWe will start by creating a data frame of our data set.```{r}chrfin_df <- base::data.frame(chr_fin2)```Now, we will calculate the propensity score using the `ps` function from the `twang` package.```{r}ps.chr <- twang::ps(dentist_n ~ poor_mental + unemployment + income_inequality + social_score + diabetes + insuff_sleep + highschool_grad + segregation + below_18 + above_65 + ps1 + linps1,data = chrfin_df,n.trees =5000,interaction.depth =2,stop.method =c("es.mean"),estimand ="ATT",verbose =FALSE)```Checking the iteration:```{r}plot(ps.chr)```> We can see that we have run enough iterations to reach a stable estimate.Observing the effective sample size of the weighted estimates:```{r}summary(ps.chr)```> The effective size after ATT weighting is 163 counties.### Checking the balance```{r}plot(ps.chr, plots =2)``````{r}plot(ps.chr, plots =3)```> We can see that the weighted estimates have an absolute standard differences are below 0.5 and are closer to 0 compared to the unadjusted estimates.We can also assess the balance using the cobalt method.```{r}b2 <- cobalt::bal.tab(ps.chr, full.stop.method ="es.mean.att", stats =c("m", "v"), un =TRUE)b2```We will visualize the standardized differences using the Love plot.```{r}lplot3 <- cobalt::love.plot(b2, threshold = .1, size =3, stars ="raw",title ="Standardized Diffs and TWANG ATT weighting")lplot3```> All of the variables have observable improved standardized difference after TWANG ATT weighting adjustment.Visualizing the variance ratio:```{r}lplot4 <- cobalt::love.plot(b2, stat ="v",threshold =1.25, size =3, title ="Variance Ratios: TWANG ATT weighting")lplot4 +theme_bw()```Most of the variables show some improvement in the variance ration after adjustment, though it is not very satisfying.### Estimating The Causal Effect After ATT Weight AdjustmentFirst, we will build the weight reference `wts1` (which is the inverse propensity score) and add it to our data set `chr_out1`.```{r}chr_fin2$wts1 <-ifelse(chr_fin2$dentist_n==1, 1, chr_fin2$ps1/(1-chr_fin2$ps1))``````{r}chrwt1.design <- survey::svydesign(ids=~1, weights=~wts1, data= chr_fin2) adjout1.wt1 <- survey::svyglm(smoking_prop ~ dentist_n, design= chrwt1.design)wt_att_results1 <-tidy(adjout1.wt1, conf.int =TRUE) |>filter(term =="dentist_n")wt_att_results1```> After ATT weight adjustment, the estimated impact of low dentist proportions on smoking prevalence in US counties is decrease of -0.43 percentage points (95% CI: -2.57, 1.71).## Use Double Robust Adjustment```{r}chr.dr.design <- survey::svydesign(ids=~1, weights=~wts1, data= chr_fin2) dr.out1.wt1 <- survey::svyglm(smoking_prop ~ dentist_n + linps1, design= chr.dr.design)dr_att_out1 <-tidy(dr.out1.wt1, conf.int =TRUE) |>filter(term =="dentist_n")dr_att_out1```> After Double Robust adjustment, the estimated impact of low dentist proportions on smoking prevalence in US counties is an increase of 0.72 percentage points (95% CI: -0.17, 1.61). These results are slightly different from the ATT Weighting results.## Compare Results```{r}chr_results <-tibble(analysis =c("Unadjusted", "PS 1:1 Match", "TWANG ATT WT", "Double Robust"),estimate =c(2.36, 0.78, -0.43, 0.72),conf.low =c(1.99, 0.39, -2.57, -0.17),conf.high =c(2.73, 1.16, 1.71, 1.61))ggplot(chr_results, aes(x = analysis, y = estimate)) +geom_errorbar(aes(ymax = conf.high, ymin = conf.low), width =0.5) +geom_label(aes(label = estimate), size =5)```> We can see that propensity score adjustment decreased the estimated affect of dentist proportions on smoking prevalence in US counties (from 2.36% to about 0.7-0.8%) . While two of the three adjustments present close estimates, the 1:1 matching with replacement adjustment displays the smallest confidence interval, the best balance, and largest effective size (278 after 1:1 matching compared to 163 after ATT weighting).## Sensitivity AnalysisThe estimate produced from the 1:1 Matching with replacement has a statistically significant p-value. Therefore, we will need to perform a sensitivity analysis.```{r}Rc <- match1$mdata$Y[match1$mdata$Tr==0]Rt <- match1$mdata$Y[match1$mdata$Tr==1] psens(Rt, Rc, Gamma =5, GammaInc =0.25)```Our conclusions regarding the impact of low dentists proportions increasing smoking prevalence in US counties hold true in absence of biases in the study. This conclusion is maintained in the presence of a biases level between L= 1.00 and L= 1.25. This means that only *small hidden bias* is needed to change our conclusions.# Outcome 2 Analysis## Unadjusted Outcome 2 `drinking_prop`We will observe the the distribution of the outcome `drinking_prop` among the two exposure groups `dentist_grp`.```{r}df_stats(drinking_prop ~ dentist_grp, data = chr_fin2) |>gt() |>fmt_number(mean:sd, decimals =2) |>opt_stylize(style =5, color ="pink")```The low dentist group has a lower mean of excessive drinking prevalence (16.3 vs 18.8).### Linear Regression Model for Outcome 2We will build a linear regression model for outcome 2 `drinking_prop` and the exposure `dentist_grp` without including the covariates.```{r}unadj.out2 <-lm(drinking_prop ~ dentist_grp, data= chr_fin2)model_parameters(unadj.out2, ci =0.95) |>gt() |>fmt_number(columns =-df_error, decimals =3) |>opt_stylize(style =6, color ="pink")```> The unadjusted model estimates that low dentist proportions decreases excessive drinking prevalence in US counties by 2.56 percentage points (95% CI: -2.87, -2.26).## 1:1 Matching With ReplacementThe binary exposure has an equal number of observations in each group (n=777). Therefore, we will use 1:1 matching with replacement.```{r}X <- chr_fin2$linps1 Tr <-as.logical(chr_fin2$dentist_n)match2 <-Match(Tr=Tr, X=X, M =1, replace =TRUE, ties=FALSE)summary(match2)```checking if matching worked:```{r}n_distinct(match2$index.treated)``````{r}n_distinct(match2$index.control)```> After 1:1 matching with replacement, the effective size was reduced to 269 matched counties from the low dentist and high dentist groups.### Balance Assessment```{r}set.seed(500)mb2 <-MatchBalance(dentist_n ~ poor_mental + unemployment + income_inequality + social_score + diabetes + insuff_sleep + highschool_grad + segregation + below_18 + above_65 + ps1 + linps1, data= chr_fin2, match.out = match2, nboots=500)```### Tabulating Standardized DifferencesWe will name the covariates used in our model.```{r}covnames2 <-c( "poor_mental", "unemployment", "income_inequality", "social_score", "diabetes", "insuff_sleep", "highschool_grad", "segregation","below_18", "above_65", "raw PS", "linear PS") ```Then we will extract the standardized differences.```{r}pre.szd2 <-NULL; post.szd2 <-NULLfor(i in1:length(covnames2)) { pre.szd2[i] <- mb2$BeforeMatching[[i]]$sdiff.pooled post.szd2[i] <- mb2$AfterMatching[[i]]$sdiff.pooled}```Now, we can have our final table.```{r}match_szd2 <-tibble(covnames2, pre.szd2, post.szd2, row.names=covnames2)match_szd2```We will use these results to build the love plot to better visualize the standardized differences.### Visualizing Standardized DifferencesWe will use the cobalt method to build plots for the standardized differences and the variance ratios.```{r}b2 <-bal.tab(match2, dentist_n ~ poor_mental + unemployment + income_inequality + social_score + diabetes + insuff_sleep + highschool_grad + segregation + below_18 + above_65 + ps1 + linps1, data= chr_fin2, stats =c("m", "v"), un =TRUE,quick =FALSE)b2```> As previously mentioned, the effective size after 1:1 matching with replacement is 269 counties. We lost 508 counties from the sample because they were unmatched.Now we will build the love plot for the standardized differences:```{r}lplot5 <-love.plot(b2, threshold = .1, size =3,var.order ="unadjusted", stars ="raw",title ="Standardized Differences and 1:1 Matching with Replacement")lplot5```> All covariates have improved standardized difference after 1:1 matching adjustment, with the exception for `social_score.`Plot of Variance Ratios```{r}lplot6 <-love.plot(b2, stat ="v",threshold =1.25, size =3,var.order ="unadjusted",title ="Variance Ratios and 1:1 Matching with Replacement")lplot6 +theme_bw()```Most of the covariates displayed improved variance ratios with the exception of `highschool_grad`, `insuff_sleep`, `unemployment`, and `above_65`.### Building New Data Frame with Matched Sample```{r}matches2 <-factor(rep(match2$index.treated, 2))chr.matchedsample2 <-cbind(matches2, chr_fin2[c(match2$index.control, match2$index.treated),])``````{r}chr.matchedsample2 |>count(dentist_grp)```Because the 1:1 matching method we used includes *replacement*, it is only natural to retain the original number of observations in the exposure groups.### Rubin's Rules After MatchingFirst, we will check Rubin's Rule 1:```{r}rubin1.match2 <-with(chr.matchedsample2,abs(100*(mean(linps1[dentist_n==1])-mean(linps1[dentist_n==0]))/sd(linps1)))rubin1.match2```> The absolute value of the standardized difference of the linear propensity score before adjustment was 114.8%. The value after adjustment (52.03%) shows substantial improvement, however the value is above 10% and far from to 0%. In this case, we barely pass Rubin's Rule 1.Next, we will check Rubin's Rule 2:```{r}rubin2.match2 <-with(chr.matchedsample2, var(linps1[dentist_n==1])/var(linps1[dentist_n==0]))rubin2.match2```> The previous ratio of the variance of the linear propensity score was 0.98. This value also shows mild improvement where the value is closer to 1 (1.03) and remains below 2. We also pass Rubin's Rule 2 and can proceed to use the results of this PS adjustment to measure the estimates.### Estimating Effect of Exposure on Outcome After 1:1 MatchingWe will use the ATT (Average Treatment Effect on the Treated) estimate to measure the dentist proportion effect on smoking prevalence in US Counties. First, we will need to create a data frame out of the matched sample:```{r}matches2 <-factor(rep(match2$index.treated, 2))chr.matchedsamplef2 <-cbind(matches2, chr_fin2[c(match2$index.control, match2$index.treated),])```Now, we can produce an estimate using the matched sample from `chr_fin2`.```{r}chr.matchedsamplef2$matches2.f <-as.factor(chr.matchedsamplef2$matches2)matched_mix.mod2 <-lmer(drinking_prop ~ dentist_n + (1| matches2.f), REML =TRUE, data=chr.matchedsamplef2)tidy_matched_mix.mod2 <- broom.mixed::tidy(matched_mix.mod2, conf.int =TRUE, conf.level =0.95) |>filter(term =="dentist_n")tidy_matched_mix.mod2 |>kable(digits =3)```> After 1:1 matching with replacement, our model estimates the effect of low dentists proportions on excessive drinking prevalence in US counties to increase by 0.40 percentage points (95% CI: 0.16, 0.64). Our confidence interval does not include 0, therefore, following up with a sensitivity analysis is a reasonable choice.## Use ATT Weighting AdjustmentWe will start by creating a data frame of our data set.```{r}chrfin_df2 <- base::data.frame(chr_fin2)```Now, we will calculate the propensity score using the `ps` function from the `twang` package.```{r}ps.chr2 <- twang::ps(dentist_n ~ poor_mental + unemployment + income_inequality + social_score + diabetes + insuff_sleep + highschool_grad + segregation + below_18 + above_65 + ps1 + linps1,data = chrfin_df2,n.trees =5000,interaction.depth =2,stop.method =c("es.mean"),estimand ="ATT",verbose =FALSE)```Checking the iteration:```{r}plot(ps.chr2)```> We can see that we have run enough iterations to reach a stable estimate.Observing the effective sample size of the weighted estimates:```{r}summary(ps.chr2)```> The effective size after ATT weighting is 163 counties.### Checking the balance```{r}plot(ps.chr2, plots =2)``````{r}plot(ps.chr2, plots =3)```> We can see that the weighted estimates have an absolute standard differences are below 0.5 and are closer to 0 compared to the unadjusted estimates.We can also assess the balance using the cobalt method.```{r}b3 <- cobalt::bal.tab(ps.chr2, full.stop.method ="es.mean.att", stats =c("m", "v"), un =TRUE)b3```We will visualize the standardized differences using the Love plot.```{r}lplot7 <- cobalt::love.plot(b3, threshold = .1, size =3, stars ="raw",title ="Standardized Diffs and TWANG ATT weighting")lplot7```> All of the variables have observable improved standardized difference after TWANG ATT weighting adjustment.Visualizing the variance ratio:```{r}lplot8 <- cobalt::love.plot(b3, stat ="v",threshold =1.25, size =3, title ="Variance Ratios: TWANG ATT weighting")lplot8 +theme_bw()```Most of the variables show some improvement in the variance ration after adjustment, though it is not very satisfying.### Estimating The Causal Effect After ATT Weight AdjustmentWe will use the weight design previously created `wt1` and `chrwt1.design`.```{r}adjout2.wt1 <- survey::svyglm(drinking_prop ~ dentist_n, design= chrwt1.design)wt_att_results2 <-tidy(adjout2.wt1, conf.int =TRUE) |>filter(term =="dentist_n")wt_att_results2```> After ATT weight adjustment, the estimated impact of low dentist proportions on excessive drinking prevalence in US counties is an increase of 1.15 percentage points (95% CI: 0.13, 2.17).## Use Double Robust AdjustmentWe will use the previous double robust design `chr.dr.design`.```{r}dr.out2.wt1 <- survey::svyglm(drinking_prop ~ dentist_n + linps1, design= chr.dr.design)dr_att_out2 <-tidy(dr.out2.wt1, conf.int =TRUE) |>filter(term =="dentist_n")dr_att_out2```> After Double Robust adjustment, the estimated impact of low dentist proportions on excessive drinking prevalence in US counties is a decrease of -0.11 percentage points (95% CI: -0.54, 0.33). These results are very different from the ATT Weighting results.## Compare Results```{r}chr_results2 <-tibble(analysis =c("Unadjusted", "PS 1:1 Match", "TWANG ATT WT", "Double Robust"),estimate =c(-2.56, 0.40, 1.15, -0.11),conf.low =c(-2.87, 0.16, 0.13, -0.54),conf.high =c(-2.26, 0.64, 2.17, 0.33))ggplot(chr_results2, aes(x = analysis, y = estimate)) +geom_errorbar(aes(ymax = conf.high, ymin = conf.low), width =0.5) +geom_label(aes(label = estimate), size =5)```> We can see that propensity score adjustment inversed the estimated affect of dentist proportions on excessive drinking prevalence in US counties (from -2.56% to 0.4-1.2%) . While two of the three adjustments present close estimates, the 1:1 matching with replacement adjustment displays the smallest confidence interval, the best balance, and largest effective size (273 after 1:1 matching compared to 163 after ATT weighting).## Sensitivity AnalysisThe estimate produced from the 1:1 Matching with replacement has a statistically significant p-value. Therefore, we will need to perform a sensitivity analysis.```{r}Rc2 <- match2$mdata$Y[match2$mdata$Tr==0]Rt2 <- match2$mdata$Y[match2$mdata$Tr==1] psens(Rt2, Rc2, Gamma =5, GammaInc =0.25)```Our conclusions regarding the impact of low dentists proportions increasing excessive drinking prevalence in US counties hold true in absence of biases in the study. This conclusion is maintained in the presence of a biases level between L= 1.00 and L= 1.25. This means that only *small hidden bias* is needed to change our conclusions.# DiscussionThis study found that counties with low dentist proportions are associated with a small increase in smoking prevalence and excessive drinking prevalence among adults (0.78% and 0.4% respectively). Although these estimates provided a meaningful size effect, the sensitivity analysis concluded that our conclusions may change in the presence of small hidden bias (gamma between 1.0 and 1.25). The weak findings may be attributed to the existing low number of dentists per 1000 population in US counties. Despite the attempt to dichotomize the dentist variable by using the first and third quartiles (0.2 and 0.6 per 1000 population), the difference was not large enough to detect observable impact. It is also important to note that increased presence of dentists does not spontaneously reduce negative oral habits (smoking and excessive drinking). The quality care and accessibility of services is an important factor in improving oral health and other related health behaviors.This project emphasized the importance of using confounders to assess selection biases in observational studies. The difference in the exposure's effect estimates after propensity score adjustment was remarkable. While our initial belief was that having more covariates would improve the analysis, this project corrected that misconception. In our analysis, multiple covariates were dropped due to high correlation, collinearity, or for disrupting the propensity score values. Selecting and evaluating appropriate covariates was the most critical part for the propensity score analysis. To improve my study in the future, we will need to explore more variables from the County Health Rankings (CHR) as we may have missed other useful variables during the initial selection phase. Also, using the Health Professional Shortage Area classification (HPSA) instead of the number of dentists per population can provide a better assessment of the impact of dentist quantities on oral-related health behaviors in US counties.# Session Information```{r}sessionInfo()```
5.1.5.1.28
social_scoreCode