1  Dentists, Adult Smoking, and Excessive Drinking

A Cross-sectional Study from CHR 2018 - 2024

Author

Sarah Albalawi

Published

2025-10-28

R Setup

Code
knitr::opts_chunk$set(comment = NA) 

library(janitor)
library(dplyr)
library(readr)
library(magrittr)
library(Hmisc)
library(knitr)
library(naniar)
library(easystats)
library(tableone)
library(mosaic)
library(gt)
library(Epi)
library(Matching)
library(arm)
library(cobalt)
library(broom)
library(survival)
library(ggplot2)
library(mice)
library(patchwork)
library(twang)
library(car)
library(rbounds)
library(tidyverse)

theme_set(theme_bw()) 

Loading CHR Data

We will load two data sets from the County Health Ranking source: CHR from year 2024 and CHR from year 2018.

Code
chr24_raw <- read_csv("analytic_data2018_0.csv", skip = 1, show_col_types = FALSE)
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Code
dim(chr24_raw)
[1] 3194  508
Code
chr18_raw <- read_csv("analytic_data2024.csv", skip = 1, show_col_types = FALSE)

dim(chr18_raw)
[1] 3195  770

2 Data Management

2.1 Joining data sets

We will only use the outcomes smoking and excessive drinking from the 2024 CHR data set.

Code
chr24_var <- chr24_raw|> dplyr::select( 
    fipscode,
    v009_rawvalue,
    v049_rawvalue)

The exposure dentist proportion and the covariates will be pulled from the 2018 CHR data set.

Code
chr18_var <- chr18_raw|> dplyr::select( 
    fipscode,
    v002_rawvalue,
    v036_rawvalue,
    v042_rawvalue,
    v133_rawvalue,
    v085_rawvalue,
    v004_rawvalue,
    v088_rawvalue,
    v062_other_data_1,
    v069_rawvalue,
    v023_rawvalue,
    v044_rawvalue, 
    v140_rawvalue,
    v147_rawvalue,
    v060_rawvalue,
    v143_rawvalue,
    v021_rawvalue,
    v141_rawvalue,
    v153_rawvalue,
    v154_rawvalue,
    v052_rawvalue,
    v053_rawvalue, 
    v054_rawvalue, 
    v055_rawvalue,
    v081_rawvalue,
    v056_rawvalue,
    v126_rawvalue,
    v057_rawvalue,
    v058_rawvalue,
    v070_rawvalue,
    v011_rawvalue,
    v063_rawvalue) 

Now we will joing the two table to produce our initial working data set chr_1.

Code
chr_1 <- chr18_var |>
  left_join(chr24_var, by= "fipscode")

dim(chr_1)
[1] 3195   34

Our joint data set has 3,195 observations (counties) and 34 variables (health measures).

The Codebook

The data file includes 3,195 observations (counties), on 34 variables (health measures).

Variable Description Notes Year
fipscode 5-digit FIPS Code
v088_rawvalue Number of dentists per 1000 population Binary Exposure: High proportion vs Low proportion 2018
v009_rawvalue Adult Smoking prevalence % Quantitative Outcome 1 2024
v049_rawvalue Excessive Drinking prevalence % Quantitative outcome 2 2024
v052_rawvalue % Below 18 Years of Age proportion Covariate 2018
v053_rawvalue % 65 and Older proportion Covariate 2018
v054_rawvalue % Non-Hispanic Black proportion Covariate 2018
v055_rawvalue % American Indian or Alaska Native proportion Covariate 2018
v081_rawvalue % Asian proportion Covariate 2018
v056_rawvalue % Hispanic proportion Covariate 2018
v126_rawvalue % Non-Hispanic White proportion Covariate 2018
v057_rawvalue % Female proportion Covariate 2018
v058_rawvalue % Rural Covariate 2018
v002_rawvalue % Poor or Fair Health Covariate 2018
v036_rawvalue Poor Physical Health Days Covariate 2018
v042_rawvalue Poor Mental Health Days Covariate 2018
v133_rawvalue Food Environment Index Covariate 2018
v085_rawvalue Uninsured population % Covariate 2018
v004_rawvalue Primary Care Physicians per population Covariate 2018
v062_other_data_1 Ratio of population to mental health providers Covariate 2018
v021_rawvalue High School Completion % Covariate 2018
v069_rawvalue Some College % Covariate 2018
v023_rawvalue Unemployment % Covariate 2018
v044_rawvalue Income Inequality Covariate 2018
v140_rawvalue Social Associations Covariate 2018
v070_rawvalue Physical Inactivity days Covariate 2018
v011_rawvalue Adult Obesity prevalence % Covariate 2018
v060_rawvalue Diabetes Prevalence % Covariate 2018
v143_rawvalue Insufficient Sleep proportion % Covariate 2018
v141_rawvalue Residential Segregation - Black/White Covariate 2018
v147_rawvalue Life Expectancy (years) Covariate 2018
v063_rawvalue Median Household Income Covariate 2018
v153_rawvalue Home ownership Covariate 2018
v154_rawvalue Severe Housing Cost Burden Covariate 2018

Note: The analysis will include one exposure and two outcome variables.

2.2 Renaming the variables

Code
chr_1 <- chr_1 |>
  mutate(dentist_grp = v088_rawvalue*100,
         smoking_prop = v009_rawvalue*100,
         drinking_prop = v049_rawvalue*100)|>
  rename(below_18 = v052_rawvalue,
         above_65 = v053_rawvalue,
         black = v054_rawvalue,
         native = v055_rawvalue,
         asian = v081_rawvalue,
         hispanic = v056_rawvalue,
         white = v126_rawvalue,
         female = v057_rawvalue,
         rural = v058_rawvalue,
         poor_health = v002_rawvalue,
         poor_physical = v036_rawvalue,
         poor_mental = v042_rawvalue,
         food_index = v133_rawvalue,
         uninsured = v085_rawvalue,
         primary_care = v004_rawvalue,
         mental_care = v062_other_data_1,
         highschool_grad  = v021_rawvalue,
         some_college = v069_rawvalue,
         unemployment = v023_rawvalue,
         income_inequality = v044_rawvalue,
         social_score = v140_rawvalue,
         inactivity = v070_rawvalue,
         obesity = v011_rawvalue,
         diabetes = v060_rawvalue, 
         insuff_sleep = v143_rawvalue,
         segregation = v141_rawvalue, 
         life_expectancy = v147_rawvalue,
         household_income = v063_rawvalue, 
         home_ownership = v153_rawvalue,
         housing_burden = v154_rawvalue )|>
  dplyr::select(-v088_rawvalue, -v009_rawvalue , -v049_rawvalue)

Checking the new names:

Code
names(chr_1)
 [1] "fipscode"          "poor_health"       "poor_physical"    
 [4] "poor_mental"       "food_index"        "uninsured"        
 [7] "primary_care"      "mental_care"       "some_college"     
[10] "unemployment"      "income_inequality" "social_score"     
[13] "life_expectancy"   "diabetes"          "insuff_sleep"     
[16] "highschool_grad"   "segregation"       "home_ownership"   
[19] "housing_burden"    "below_18"          "above_65"         
[22] "black"             "native"            "asian"            
[25] "hispanic"          "white"             "female"           
[28] "rural"             "inactivity"        "obesity"          
[31] "household_income"  "dentist_grp"       "smoking_prop"     
[34] "drinking_prop"    

The new names look reasonable and are free from spelling errors.

2.3 Categorizing Exposure dentist_grp

Code
summary(chr_1$dentist_grp)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
0.00000 0.02760 0.04391 0.04867 0.06495 0.81642      87 

We will use the first quartile (0.028) and third quartile (0.065) as cut points to dichotomize the dentist_grp variable.

Code
chr_1 <- chr_1 |>
  mutate(dentist_grp = case_when(
    dentist_grp <= 0.02760 ~ "Low",
    dentist_grp >= 0.06495 ~ "High")) |>
  mutate(dentist_grp = factor(dentist_grp))

chr_1 |> count(dentist_grp)
# A tibble: 3 × 2
  dentist_grp     n
  <fct>       <int>
1 High          777
2 Low           777
3 <NA>         1641
Code
chr_1 <- chr_1 |>
  mutate(dentist_n = as.numeric(dentist_grp),
         dentist_n = dentist_n - 1)

chr_1 |> count(dentist_n)
# A tibble: 3 × 2
  dentist_n     n
      <dbl> <int>
1         0   777
2         1   777
3        NA  1641

We have 777 counties in each group.

2.4 Numeric Summaries Before Imputation

Code
describe(chr_1)
chr_1 

 35  Variables      3195  Observations
--------------------------------------------------------------------------------
fipscode 
       n  missing distinct 
    3195        0     3195 

lowest : 00000 01000 01001 01003 01005, highest: 56037 56039 56041 56043 56045
--------------------------------------------------------------------------------
poor_health 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0      237        1   0.1768    0.174  0.05052   0.1160 
     .10      .25      .50      .75      .90      .95 
  0.1254   0.1420   0.1680   0.2070   0.2380   0.2580 

lowest : 0.084 0.087 0.089 0.091 0.093, highest: 0.351 0.359 0.364 0.375 0.38 
--------------------------------------------------------------------------------
poor_physical 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0     3194        1    3.888    3.871    0.734    2.915 
     .10      .25      .50      .75      .90      .95 
   3.065    3.385    3.866    4.333    4.739    4.977 

lowest : 2.26179 2.33164 2.39068 2.3946  2.41979
highest: 6.08336 6.0868  6.1734  6.44954 6.83683
--------------------------------------------------------------------------------
poor_mental 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0     3194        1    5.212    5.219   0.6926    4.122 
     .10      .25      .50      .75      .90      .95 
   4.410    4.814    5.229    5.626    5.990    6.194 

lowest : 3.18554 3.21723 3.53009 3.54504 3.54712
highest: 7.07808 7.15042 7.17393 7.30487 7.38881
--------------------------------------------------------------------------------
food_index 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3160       35       82    0.999    7.543     7.65    1.291      5.3 
     .10      .25      .50      .75      .90      .95 
     6.1      6.9      7.7      8.4      8.9      9.2 

lowest : 0   0.9 1.3 1.4 1.6, highest: 9.6 9.7 9.8 9.9 10 
--------------------------------------------------------------------------------
uninsured 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3194        1     3190        1   0.1149   0.1099  0.05717  0.05144 
     .10      .25      .50      .75      .90      .95 
 0.05934  0.07462  0.10340  0.14548  0.18739  0.21667 

lowest : 0.0238974 0.0241492 0.024785  0.0268813 0.0279943
highest: 0.323534  0.324548  0.327475  0.387124  0.463235 
--------------------------------------------------------------------------------
primary_care 
        n   missing  distinct      Info      Mean   pMedian       Gmd       .05 
     3038       157      2942         1 0.0005389 0.0005031   0.00037 0.0001039 
      .10       .25       .50       .75       .90       .95 
0.0001689 0.0002974 0.0004740 0.0007176 0.0009665 0.0011567 

lowest : 0           3.43867e-05 3.93043e-05 3.98311e-05 4.38462e-05
highest: 0.00284638  0.00289715  0.00296695  0.00507614  0.00580527 
--------------------------------------------------------------------------------
mental_care 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3010      185     3001        1     1344    907.7     1536    162.8 
     .10      .25      .50      .75      .90      .95 
   220.2    368.0    692.9   1547.6   2999.5   4792.8 

lowest :  -931  -919  -884  -865  -849, highest: 15654 17550 19935 22593 23044
--------------------------------------------------------------------------------
some_college 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0     3185        1   0.5902   0.5918   0.1336   0.3936 
     .10      .25      .50      .75      .90      .95 
  0.4370   0.5110   0.5921   0.6753   0.7368   0.7746 

lowest : 0.117647 0.122152 0.208128 0.228007 0.236698
highest: 0.925572 0.926316 0.930521 0.992308 1       
--------------------------------------------------------------------------------
unemployment 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3194        1     3181        1  0.03592  0.03482  0.01286  0.02035 
     .10      .25      .50      .75      .90      .95 
 0.02277  0.02730  0.03415  0.04201  0.05064  0.05705 

lowest : 0.00572082 0.00859107 0.0131761  0.0133748  0.0134731 
highest: 0.116036   0.125495   0.129104   0.130016   0.146677  
--------------------------------------------------------------------------------
income_inequality 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3180       15     3179        1    4.553    4.481   0.8517    3.527 
     .10      .25      .50      .75      .90      .95 
   3.706    4.019    4.436    4.945    5.544    6.009 

lowest : 1.65164 2.26575 2.41043 2.53293 2.57027
highest: 8.82769 9.19356 9.30925 9.51399 10.5323
--------------------------------------------------------------------------------
social_score 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0     3013        1    11.28    10.94    6.154    2.174 
     .10      .25      .50      .75      .90      .95 
   5.182    7.892   10.767   14.073   17.901   21.240 

lowest : 0        0.556783 0.931619 1.75186  1.84139 
highest: 41.0034  41.9162  43.4783  48.2245  59.4354 
--------------------------------------------------------------------------------
life_expectancy 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3122       73     3121        1    75.78    75.82    3.768    70.54 
     .10      .25      .50      .75      .90      .95 
   71.63    73.61    75.88    78.02    79.81    81.00 

lowest : 55.919  60.1049 61.1734 61.3558 61.3605
highest: 89.3706 91.1526 92.7249 93.0018 98.9029
--------------------------------------------------------------------------------
diabetes 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0      138        1   0.1056   0.1035  0.02444    0.076 
     .10      .25      .50      .75      .90      .95 
   0.081    0.089    0.102    0.118    0.134    0.147 

lowest : 0.056 0.058 0.06  0.062 0.063, highest: 0.203 0.204 0.205 0.216 0.222
--------------------------------------------------------------------------------
insuff_sleep 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3192        3      256        1   0.3445    0.344  0.04103   0.2860 
     .10      .25      .50      .75      .90      .95 
  0.2980   0.3190   0.3440   0.3680   0.3929   0.4070 

lowest : 0.238 0.24  0.247 0.248 0.249, highest: 0.46  0.463 0.465 0.469 0.484
--------------------------------------------------------------------------------
highschool_grad 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    2517      678     1637        1   0.8834   0.8905  0.06992   0.7637 
     .10      .25      .50      .75      .90      .95 
  0.8003   0.8538   0.8942   0.9258   0.9550   0.9706 

lowest : 0.175    0.221452 0.336957 0.425    0.475   
highest: 0.979681 0.98     0.981393 0.989176 0.995   
--------------------------------------------------------------------------------
segregation 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    2128     1067     2126        1    49.61    49.95    19.09    20.70 
     .10      .25      .50      .75      .90      .95 
   26.69    38.31    50.88    61.29    71.35    76.25 

lowest : 1.05295 1.06989 1.59135 1.67271 1.96799
highest: 88.7365 88.9926 89.8186 89.9605 93.7638
--------------------------------------------------------------------------------
home_ownership 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0     3191        1   0.7261   0.7333  0.09183   0.5686 
     .10      .25      .50      .75      .90      .95 
  0.6194   0.6826   0.7388   0.7839   0.8194   0.8371 

lowest : 0        0.157895 0.199638 0.243232 0.276574
highest: 0.912844 0.923458 0.933333 0.933466 0.969858
--------------------------------------------------------------------------------
housing_burden 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3190        5     3179        1   0.1095   0.1072   0.0387  0.06053 
     .10      .25      .50      .75      .90      .95 
 0.07002  0.08552  0.10478  0.12855  0.15675  0.17516 

lowest : 0.00595238 0.00878477 0.00996678 0.0159657  0.0185874 
highest: 0.25323    0.255763   0.2584     0.281928   0.311692  
--------------------------------------------------------------------------------
below_18 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0     3193        1   0.2166   0.2164  0.03757   0.1615 
     .10      .25      .50      .75      .90      .95 
  0.1746   0.1966   0.2173   0.2353   0.2537   0.2696 

lowest : 0.0243902 0.0354663 0.0493957 0.0570924 0.0711527
highest: 0.369018  0.375724  0.381907  0.414208  0.420995 
--------------------------------------------------------------------------------
above_65 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0     3189        1   0.2046    0.202  0.05296   0.1324 
     .10      .25      .50      .75      .90      .95 
  0.1483   0.1734   0.2008   0.2304   0.2665   0.2920 

lowest : 0.0544279 0.0629911 0.0684948 0.0737527 0.0755564
highest: 0.414013  0.417121  0.424593  0.435067  0.574546 
--------------------------------------------------------------------------------
black 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0     3187        1  0.09037  0.04868   0.1235 0.004405 
     .10      .25      .50      .75      .90      .95 
0.005453 0.008516 0.023844 0.103048 0.292486 0.410786 

lowest : 0           0.000646134 0.000772201 0.000950119 0.000986193
highest: 0.787643    0.790084    0.821227    0.845068    0.851448   
--------------------------------------------------------------------------------
native 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0     3190        1  0.02449 0.009342  0.03517 0.002600 
     .10      .25      .50      .75      .90      .95 
0.003159 0.004443 0.007416 0.014790 0.036667 0.084503 

lowest : 0           0.000555247 0.000785376 0.00115182  0.0011554  
highest: 0.810833    0.844577    0.848004    0.895259    0.921116   
--------------------------------------------------------------------------------
asian 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0     3184        1  0.01755  0.01038     0.02 0.002817 
     .10      .25      .50      .75      .90      .95 
0.003649 0.005451 0.008366 0.016349 0.036610 0.061040 

lowest : 0           0.000563063 0.00061757  0.000723589 0.000817184
highest: 0.372693    0.385787    0.414258    0.425768    0.463802   
--------------------------------------------------------------------------------
hispanic 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0     3194        1   0.1045  0.06686   0.1171  0.01596 
     .10      .25      .50      .75      .90      .95 
 0.01953  0.02863  0.05158  0.11195  0.25909  0.40959 

lowest : 0.00662915 0.00732961 0.00762389 0.00768806 0.00772813
highest: 0.936973   0.948533   0.950321   0.952917   0.960641  
--------------------------------------------------------------------------------
white 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0     3193        1   0.7468   0.7698   0.2173   0.3256 
     .10      .25      .50      .75      .90      .95 
  0.4383   0.6286   0.8172   0.9121   0.9414   0.9520 

lowest : 0.0269389 0.0284045 0.0345667 0.03594   0.0400751
highest: 0.972168  0.972469  0.97272   0.9754    0.975858 
--------------------------------------------------------------------------------
female 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0     3192        1   0.4952   0.4982  0.02076   0.4596 
     .10      .25      .50      .75      .90      .95 
  0.4747   0.4899   0.4988   0.5065   0.5140   0.5201 

lowest : 0.215686 0.270395 0.287504 0.291292 0.301648
highest: 0.544438 0.546502 0.547111 0.548661 0.577624
--------------------------------------------------------------------------------
rural 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0     2066    0.959   0.6347    0.639   0.3791  0.05155 
     .10      .25      .50      .75      .90      .95 
 0.13787  0.34769  0.65675  1.00000  1.00000  1.00000 

lowest : 0           6.79046e-07 3.07633e-05 3.99956e-05 4.51164e-05
highest: 0.998927    0.998929    0.999886    0.99994     1          
--------------------------------------------------------------------------------
inactivity 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0      280        1    0.266    0.265  0.05883    0.184 
     .10      .25      .50      .75      .90      .95 
   0.200    0.230    0.262    0.300    0.333    0.357 

lowest : 0.12  0.121 0.122 0.123 0.13 , highest: 0.443 0.45  0.456 0.457 0.47 
--------------------------------------------------------------------------------
obesity 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3195        0      271        1   0.3729   0.3755  0.05027    0.287 
     .10      .25      .50      .75      .90      .95 
   0.314    0.350    0.376    0.402    0.427    0.439 

lowest : 0.174 0.185 0.187 0.197 0.201, highest: 0.509 0.51  0.512 0.52  0.525
--------------------------------------------------------------------------------
household_income 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3194        1     3101        1    63449    61837    17240    42604 
     .10      .25      .50      .75      .90      .95 
   45708    52610    60986    70897    83207    94785 

lowest :  28972  30659  30688  31972  32783, highest: 142513 143795 144632 150502 167605
--------------------------------------------------------------------------------
dentist_grp 
       n  missing distinct 
    1554     1641        2 
                    
Value      High  Low
Frequency   777  777
Proportion  0.5  0.5
--------------------------------------------------------------------------------
smoking_prop 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3192        3     3191        1    17.87    17.71    4.014    12.96 
     .10      .25      .50      .75      .90      .95 
   13.84    15.23    17.31    20.28    22.44    23.85 

lowest : 6.73543 6.82175 7.16927 7.68742 7.73646
highest: 37.029  38.3231 39.0797 41.2018 42.7541
--------------------------------------------------------------------------------
drinking_prop 
       n  missing distinct     Info     Mean  pMedian      Gmd      .05 
    3192        3     3191        1    17.43    17.39    3.677    12.29 
     .10      .25      .50      .75      .90      .95 
   13.14    15.10    17.40    19.68    21.47    22.92 

lowest : 9.26516 9.28781 9.40767 9.48465 9.52642
highest: 27.8133 28.0047 28.0822 28.1579 29.4401
--------------------------------------------------------------------------------
dentist_n 
       n  missing distinct     Info      Sum     Mean 
    1554     1641        2     0.75      777      0.5 

--------------------------------------------------------------------------------

2.5 Dealing with Missingness

Code
miss_var_summary(chr_1)
# A tibble: 35 × 3
   variable          n_miss pct_miss
   <chr>              <int>    <num>
 1 dentist_grp         1641   51.4  
 2 dentist_n           1641   51.4  
 3 segregation         1067   33.4  
 4 highschool_grad      678   21.2  
 5 mental_care          185    5.79 
 6 primary_care         157    4.91 
 7 life_expectancy       73    2.28 
 8 food_index            35    1.10 
 9 income_inequality     15    0.469
10 housing_burden         5    0.156
# ℹ 25 more rows

The segregation variable constitutes 33.4% of the missing data, followed by highschool_grd (21.2%), mental_care (5.79%), primary_care (4.91%), life_expectancy (2.28%), food_index (1.10%). Other variables with minimal missingness (<1%) include income_inequality, housing_burden, insuff_sleep, uninsured, unemployment, household_income.

2.5.1 Filter Complete Cases For Exposure dentist_prop

We will remove missing cases for the exposure dentist_prop and name the set chr24_si.

Code
chr_simp <- chr_1 |>
  filter(!is.na(dentist_grp))|>
  filter(!is.na(dentist_n))
Code
n_miss(chr_simp$dentist_n)
[1] 0

The outcome variable has no missing values. Now we will deal with the missing values in the covariates.

2.5.2 Single Imputation for Covariates

We will use the mice function from the mice package to perform single imputation for all variables in the chr24_si data set.

Code
chr_simp <- 
  mice(chr_simp, m = 1, seed = 500, print = FALSE) |>
  complete() |>
  tibble()
Warning: Number of logged events: 67
Code
miss_var_summary(chr_simp)
# A tibble: 35 × 3
   variable      n_miss pct_miss
   <chr>          <int>    <num>
 1 fipscode           0        0
 2 poor_health        0        0
 3 poor_physical      0        0
 4 poor_mental        0        0
 5 food_index         0        0
 6 uninsured          0        0
 7 primary_care       0        0
 8 mental_care        0        0
 9 some_college       0        0
10 unemployment       0        0
# ℹ 25 more rows

There are no missing observation in the semi-final working data set chr_simp.

3 Final Variable Selection

3.1 Check Correlation

Code
res <- correlation(chr_simp, method = "spearman")

res
# Correlation Matrix (spearman-method)

Parameter1        |        Parameter2 |       rho |         95% CI |        S |         p
-----------------------------------------------------------------------------------------
poor_health       |     poor_physical |      0.94 | [ 0.93,  0.95] | 3.77e+07 | < .001***
poor_health       |       poor_mental |      0.75 | [ 0.73,  0.77] | 1.54e+08 | < .001***
poor_health       |        food_index |     -0.68 | [-0.70, -0.65] | 1.05e+09 | < .001***
poor_health       |         uninsured |      0.54 | [ 0.51,  0.58] | 2.85e+08 | < .001***
poor_health       |      primary_care |     -0.40 | [-0.44, -0.36] | 8.75e+08 | < .001***
poor_health       |       mental_care |      0.37 | [ 0.33,  0.42] | 3.93e+08 | < .001***
poor_health       |      some_college |     -0.79 | [-0.81, -0.77] | 1.12e+09 | < .001***
poor_health       |      unemployment |      0.47 | [ 0.43,  0.51] | 3.29e+08 | < .001***
poor_health       | income_inequality |      0.45 | [ 0.41,  0.49] | 3.41e+08 | < .001***
poor_health       |      social_score |     -0.13 | [-0.18, -0.08] | 7.05e+08 | < .001***
poor_health       |   life_expectancy |     -0.81 | [-0.83, -0.79] | 1.13e+09 | < .001***
poor_health       |          diabetes |      0.94 | [ 0.93,  0.94] | 3.96e+07 | < .001***
poor_health       |      insuff_sleep |      0.79 | [ 0.77,  0.80] | 1.34e+08 | < .001***
poor_health       |   highschool_grad |      0.01 | [-0.04,  0.06] | 6.18e+08 | > .999   
poor_health       |       segregation |     -0.30 | [-0.35, -0.25] | 8.14e+08 | < .001***
poor_health       |    home_ownership |     -0.06 | [-0.11, -0.01] | 6.62e+08 | > .999   
poor_health       |    housing_burden |  1.45e-03 | [-0.05,  0.05] | 6.25e+08 | > .999   
poor_health       |          below_18 |      0.16 | [ 0.11,  0.21] | 5.25e+08 | < .001***
poor_health       |          above_65 |     -0.10 | [-0.15, -0.04] | 6.85e+08 | 0.021*   
poor_health       |             black |      0.30 | [ 0.25,  0.35] | 4.38e+08 | < .001***
poor_health       |            native | -7.28e-03 | [-0.06,  0.04] | 6.30e+08 | > .999   
poor_health       |             asian |     -0.36 | [-0.41, -0.32] | 8.53e+08 | < .001***
poor_health       |          hispanic |     -0.06 | [-0.11, -0.01] | 6.62e+08 | > .999   
poor_health       |             white |     -0.29 | [-0.34, -0.25] | 8.09e+08 | < .001***
poor_health       |            female |      0.05 | [ 0.00,  0.10] | 5.94e+08 | > .999   
poor_health       |             rural |      0.33 | [ 0.28,  0.37] | 4.21e+08 | < .001***
poor_health       |        inactivity |      0.88 | [ 0.87,  0.89] | 7.60e+07 | < .001***
poor_health       |           obesity |      0.69 | [ 0.66,  0.71] | 1.96e+08 | < .001***
poor_health       |  household_income |     -0.78 | [-0.80, -0.76] | 1.11e+09 | < .001***
poor_health       |      smoking_prop |      0.71 | [ 0.69,  0.74] | 1.80e+08 | < .001***
poor_health       |     drinking_prop |     -0.69 | [-0.72, -0.66] | 1.06e+09 | < .001***
poor_health       |         dentist_n |      0.44 | [ 0.39,  0.48] | 3.53e+08 | < .001***
poor_physical     |       poor_mental |      0.85 | [ 0.84,  0.87] | 9.12e+07 | < .001***
poor_physical     |        food_index |     -0.64 | [-0.67, -0.61] | 1.03e+09 | < .001***
poor_physical     |         uninsured |      0.46 | [ 0.42,  0.50] | 3.38e+08 | < .001***
poor_physical     |      primary_care |     -0.41 | [-0.45, -0.36] | 8.80e+08 | < .001***
poor_physical     |       mental_care |      0.35 | [ 0.31,  0.40] | 4.04e+08 | < .001***
poor_physical     |      some_college |     -0.78 | [-0.80, -0.76] | 1.11e+09 | < .001***
poor_physical     |      unemployment |      0.47 | [ 0.43,  0.51] | 3.32e+08 | < .001***
poor_physical     | income_inequality |      0.40 | [ 0.36,  0.44] | 3.75e+08 | < .001***
poor_physical     |      social_score |     -0.17 | [-0.22, -0.12] | 7.32e+08 | < .001***
poor_physical     |   life_expectancy |     -0.80 | [-0.82, -0.78] | 1.12e+09 | < .001***
poor_physical     |          diabetes |      0.82 | [ 0.81,  0.84] | 1.10e+08 | < .001***
poor_physical     |      insuff_sleep |      0.76 | [ 0.74,  0.78] | 1.50e+08 | < .001***
poor_physical     |   highschool_grad |      0.04 | [-0.01,  0.09] | 5.98e+08 | > .999   
poor_physical     |       segregation |     -0.25 | [-0.29, -0.20] | 7.79e+08 | < .001***
poor_physical     |    home_ownership |  9.49e-03 | [-0.04,  0.06] | 6.20e+08 | > .999   
poor_physical     |    housing_burden |     -0.03 | [-0.08,  0.02] | 6.45e+08 | > .999   
poor_physical     |          below_18 |      0.10 | [ 0.05,  0.15] | 5.63e+08 | 0.010*   
poor_physical     |          above_65 |     -0.04 | [-0.09,  0.01] | 6.50e+08 | > .999   
poor_physical     |             black |      0.18 | [ 0.13,  0.23] | 5.10e+08 | < .001***
poor_physical     |            native |     -0.04 | [-0.09,  0.02] | 6.48e+08 | > .999   
poor_physical     |             asian |     -0.40 | [-0.44, -0.35] | 8.74e+08 | < .001***
poor_physical     |          hispanic |     -0.16 | [-0.21, -0.11] | 7.24e+08 | < .001***
poor_physical     |             white |     -0.12 | [-0.17, -0.07] | 7.00e+08 | < .001***
poor_physical     |            female |      0.02 | [-0.03,  0.08] | 6.10e+08 | > .999   
poor_physical     |             rural |      0.34 | [ 0.30,  0.39] | 4.11e+08 | < .001***
poor_physical     |        inactivity |      0.80 | [ 0.78,  0.82] | 1.26e+08 | < .001***
poor_physical     |           obesity |      0.61 | [ 0.58,  0.65] | 2.41e+08 | < .001***
poor_physical     |  household_income |     -0.75 | [-0.77, -0.73] | 1.10e+09 | < .001***
poor_physical     |      smoking_prop |      0.74 | [ 0.72,  0.76] | 1.61e+08 | < .001***
poor_physical     |     drinking_prop |     -0.68 | [-0.70, -0.65] | 1.05e+09 | < .001***
poor_physical     |         dentist_n |      0.43 | [ 0.39,  0.47] | 3.55e+08 | < .001***
poor_mental       |        food_index |     -0.52 | [-0.56, -0.48] | 9.52e+08 | < .001***
poor_mental       |         uninsured |      0.26 | [ 0.21,  0.31] | 4.61e+08 | < .001***
poor_mental       |      primary_care |     -0.25 | [-0.29, -0.20] | 7.79e+08 | < .001***
poor_mental       |       mental_care |      0.18 | [ 0.13,  0.23] | 5.11e+08 | < .001***
poor_mental       |      some_college |     -0.59 | [-0.62, -0.55] | 9.94e+08 | < .001***
poor_mental       |      unemployment |      0.46 | [ 0.42,  0.50] | 3.36e+08 | < .001***
poor_mental       | income_inequality |      0.39 | [ 0.35,  0.44] | 3.79e+08 | < .001***
poor_mental       |      social_score |     -0.16 | [-0.21, -0.11] | 7.26e+08 | < .001***
poor_mental       |   life_expectancy |     -0.68 | [-0.71, -0.65] | 1.05e+09 | < .001***
poor_mental       |          diabetes |      0.65 | [ 0.61,  0.67] | 2.22e+08 | < .001***
poor_mental       |      insuff_sleep |      0.68 | [ 0.66,  0.71] | 1.98e+08 | < .001***
poor_mental       |   highschool_grad |  4.24e-03 | [-0.05,  0.06] | 6.23e+08 | > .999   
poor_mental       |       segregation |     -0.16 | [-0.21, -0.11] | 7.23e+08 | < .001***
poor_mental       |    home_ownership |     -0.03 | [-0.08,  0.02] | 6.44e+08 | > .999   
poor_mental       |    housing_burden |      0.08 | [ 0.03,  0.13] | 5.77e+08 | 0.251    
poor_mental       |          below_18 |     -0.03 | [-0.08,  0.02] | 6.44e+08 | > .999   
poor_mental       |          above_65 |     -0.03 | [-0.08,  0.02] | 6.46e+08 | > .999   
poor_mental       |             black |      0.19 | [ 0.14,  0.24] | 5.08e+08 | < .001***
poor_mental       |            native |     -0.09 | [-0.14, -0.04] | 6.81e+08 | 0.051    
poor_mental       |             asian |     -0.25 | [-0.29, -0.20] | 7.79e+08 | < .001***
poor_mental       |          hispanic |     -0.16 | [-0.21, -0.11] | 7.27e+08 | < .001***
poor_mental       |             white |     -0.05 | [-0.10,  0.00] | 6.57e+08 | > .999   
poor_mental       |            female |      0.18 | [ 0.13,  0.23] | 5.14e+08 | < .001***
poor_mental       |             rural |      0.17 | [ 0.12,  0.22] | 5.17e+08 | < .001***
poor_mental       |        inactivity |      0.56 | [ 0.53,  0.60] | 2.72e+08 | < .001***
poor_mental       |           obesity |      0.41 | [ 0.37,  0.45] | 3.68e+08 | < .001***
poor_mental       |  household_income |     -0.59 | [-0.62, -0.56] | 9.94e+08 | < .001***
poor_mental       |      smoking_prop |      0.63 | [ 0.60,  0.66] | 2.32e+08 | < .001***
poor_mental       |     drinking_prop |     -0.56 | [-0.59, -0.52] | 9.75e+08 | < .001***
poor_mental       |         dentist_n |      0.28 | [ 0.23,  0.32] | 4.52e+08 | < .001***
food_index        |         uninsured |     -0.49 | [-0.52, -0.45] | 9.29e+08 | < .001***
food_index        |      primary_care |      0.31 | [ 0.26,  0.35] | 4.34e+08 | < .001***
food_index        |       mental_care |     -0.12 | [-0.17, -0.07] | 7.02e+08 | < .001***
food_index        |      some_college |      0.49 | [ 0.45,  0.53] | 3.18e+08 | < .001***
food_index        |      unemployment |     -0.36 | [-0.40, -0.31] | 8.49e+08 | < .001***
food_index        | income_inequality |     -0.36 | [-0.40, -0.31] | 8.50e+08 | < .001***
food_index        |      social_score |      0.15 | [ 0.10,  0.20] | 5.31e+08 | < .001***
food_index        |   life_expectancy |      0.59 | [ 0.56,  0.62] | 2.57e+08 | < .001***
food_index        |          diabetes |     -0.63 | [-0.66, -0.60] | 1.02e+09 | < .001***
food_index        |      insuff_sleep |     -0.45 | [-0.49, -0.41] | 9.06e+08 | < .001***
food_index        |   highschool_grad |      0.07 | [ 0.02,  0.12] | 5.81e+08 | 0.490    
food_index        |       segregation |      0.12 | [ 0.07,  0.17] | 5.52e+08 | < .001***
food_index        |    home_ownership |      0.10 | [ 0.05,  0.15] | 5.65e+08 | 0.020*   
food_index        |    housing_burden |  2.06e-03 | [-0.05,  0.05] | 6.24e+08 | > .999   
food_index        |          below_18 |     -0.09 | [-0.15, -0.04] | 6.85e+08 | 0.024*   
food_index        |          above_65 |     -0.07 | [-0.12, -0.02] | 6.72e+08 | 0.359    
food_index        |             black |     -0.07 | [-0.12, -0.02] | 6.71e+08 | 0.391    
food_index        |            native |     -0.14 | [-0.19, -0.09] | 7.12e+08 | < .001***
food_index        |             asian |      0.34 | [ 0.29,  0.38] | 4.14e+08 | < .001***
food_index        |          hispanic |      0.03 | [-0.02,  0.08] | 6.08e+08 | > .999   
food_index        |             white |      0.17 | [ 0.11,  0.21] | 5.22e+08 | < .001***
food_index        |            female |      0.02 | [-0.03,  0.07] | 6.14e+08 | > .999   
food_index        |             rural |     -0.31 | [-0.36, -0.27] | 8.22e+08 | < .001***
food_index        |        inactivity |     -0.59 | [-0.62, -0.55] | 9.93e+08 | < .001***
food_index        |           obesity |     -0.49 | [-0.53, -0.45] | 9.32e+08 | < .001***
food_index        |  household_income |      0.70 | [ 0.68,  0.73] | 1.85e+08 | < .001***
food_index        |      smoking_prop |     -0.51 | [-0.54, -0.47] | 9.42e+08 | < .001***
food_index        |     drinking_prop |      0.49 | [ 0.45,  0.53] | 3.18e+08 | < .001***
food_index        |         dentist_n |     -0.30 | [-0.35, -0.25] | 8.14e+08 | < .001***
uninsured         |      primary_care |     -0.32 | [-0.36, -0.27] | 8.23e+08 | < .001***
uninsured         |       mental_care |      0.37 | [ 0.32,  0.41] | 3.97e+08 | < .001***
uninsured         |      some_college |     -0.51 | [-0.54, -0.47] | 9.42e+08 | < .001***
uninsured         |      unemployment |      0.01 | [-0.04,  0.06] | 6.19e+08 | > .999   
uninsured         | income_inequality |      0.14 | [ 0.09,  0.19] | 5.40e+08 | < .001***
uninsured         |      social_score |     -0.05 | [-0.10,  0.00] | 6.57e+08 | > .999   
uninsured         |   life_expectancy |     -0.40 | [-0.44, -0.36] | 8.77e+08 | < .001***
uninsured         |          diabetes |      0.54 | [ 0.50,  0.57] | 2.89e+08 | < .001***
uninsured         |      insuff_sleep |      0.27 | [ 0.22,  0.32] | 4.56e+08 | < .001***
uninsured         |   highschool_grad |      0.03 | [-0.02,  0.08] | 6.05e+08 | > .999   
uninsured         |       segregation |     -0.27 | [-0.31, -0.22] | 7.92e+08 | < .001***
uninsured         |    home_ownership |      0.02 | [-0.04,  0.07] | 6.16e+08 | > .999   
uninsured         |    housing_burden |     -0.13 | [-0.18, -0.08] | 7.05e+08 | < .001***
uninsured         |          below_18 |      0.25 | [ 0.20,  0.29] | 4.71e+08 | < .001***
uninsured         |          above_65 |     -0.03 | [-0.08,  0.02] | 6.42e+08 | > .999   
uninsured         |             black |      0.08 | [ 0.02,  0.13] | 5.78e+08 | 0.316    
uninsured         |            native |      0.37 | [ 0.33,  0.41] | 3.94e+08 | < .001***
uninsured         |             asian |     -0.24 | [-0.29, -0.19] | 7.75e+08 | < .001***
uninsured         |          hispanic |      0.26 | [ 0.22,  0.31] | 4.60e+08 | < .001***
uninsured         |             white |     -0.36 | [-0.40, -0.31] | 8.49e+08 | < .001***
uninsured         |            female |     -0.10 | [-0.15, -0.05] | 6.88e+08 | 0.010**  
uninsured         |             rural |      0.36 | [ 0.31,  0.40] | 4.00e+08 | < .001***
uninsured         |        inactivity |      0.46 | [ 0.42,  0.50] | 3.35e+08 | < .001***
uninsured         |           obesity |      0.31 | [ 0.27,  0.36] | 4.29e+08 | < .001***
uninsured         |  household_income |     -0.41 | [-0.45, -0.37] | 8.83e+08 | < .001***
uninsured         |      smoking_prop |      0.25 | [ 0.20,  0.29] | 4.71e+08 | < .001***
uninsured         |     drinking_prop |     -0.45 | [-0.49, -0.41] | 9.06e+08 | < .001***
uninsured         |         dentist_n |      0.39 | [ 0.34,  0.43] | 3.84e+08 | < .001***
primary_care      |       mental_care |     -0.50 | [-0.54, -0.46] | 9.38e+08 | < .001***
primary_care      |      some_college |      0.49 | [ 0.45,  0.52] | 3.22e+08 | < .001***
primary_care      |      unemployment |     -0.09 | [-0.14, -0.04] | 6.80e+08 | 0.065    
primary_care      | income_inequality |      0.05 | [ 0.00,  0.10] | 5.95e+08 | > .999   
primary_care      |      social_score |      0.14 | [ 0.09,  0.19] | 5.37e+08 | < .001***
primary_care      |   life_expectancy |      0.34 | [ 0.29,  0.38] | 4.13e+08 | < .001***
primary_care      |          diabetes |     -0.34 | [-0.38, -0.29] | 8.38e+08 | < .001***
primary_care      |      insuff_sleep |     -0.31 | [-0.35, -0.26] | 8.19e+08 | < .001***
primary_care      |   highschool_grad |     -0.17 | [-0.22, -0.12] | 7.30e+08 | < .001***
primary_care      |       segregation |      0.25 | [ 0.20,  0.29] | 4.71e+08 | < .001***
primary_care      |    home_ownership |     -0.39 | [-0.43, -0.35] | 8.70e+08 | < .001***
primary_care      |    housing_burden |      0.32 | [ 0.28,  0.37] | 4.24e+08 | < .001***
primary_care      |          below_18 |     -0.10 | [-0.15, -0.05] | 6.88e+08 | 0.011*   
primary_care      |          above_65 |     -0.17 | [-0.21, -0.11] | 7.29e+08 | < .001***
primary_care      |             black |      0.03 | [-0.02,  0.08] | 6.07e+08 | > .999   
primary_care      |            native |  6.63e-03 | [-0.04,  0.06] | 6.21e+08 | > .999   
primary_care      |             asian |      0.54 | [ 0.50,  0.57] | 2.88e+08 | < .001***
primary_care      |          hispanic |      0.17 | [ 0.12,  0.22] | 5.21e+08 | < .001***
primary_care      |             white |     -0.12 | [-0.17, -0.07] | 7.00e+08 | < .001***
primary_care      |            female |      0.28 | [ 0.23,  0.32] | 4.53e+08 | < .001***
primary_care      |             rural |     -0.55 | [-0.59, -0.51] | 9.70e+08 | < .001***
primary_care      |        inactivity |     -0.44 | [-0.48, -0.40] | 8.99e+08 | < .001***
primary_care      |           obesity |     -0.42 | [-0.46, -0.37] | 8.86e+08 | < .001***
primary_care      |  household_income |      0.37 | [ 0.33,  0.42] | 3.92e+08 | < .001***
primary_care      |      smoking_prop |     -0.31 | [-0.35, -0.26] | 8.17e+08 | < .001***
primary_care      |     drinking_prop |      0.32 | [ 0.28,  0.37] | 4.23e+08 | < .001***
primary_care      |         dentist_n |     -0.67 | [-0.69, -0.64] | 1.04e+09 | < .001***
mental_care       |      some_college |     -0.47 | [-0.50, -0.42] | 9.16e+08 | < .001***
mental_care       |      unemployment |     -0.02 | [-0.07,  0.03] | 6.40e+08 | > .999   
mental_care       | income_inequality |     -0.06 | [-0.11,  0.00] | 6.60e+08 | > .999   
mental_care       |      social_score |      0.10 | [ 0.05,  0.15] | 5.61e+08 | 0.008**  
mental_care       |   life_expectancy |     -0.31 | [-0.36, -0.27] | 8.21e+08 | < .001***
mental_care       |          diabetes |      0.34 | [ 0.30,  0.39] | 4.12e+08 | < .001***
mental_care       |      insuff_sleep |      0.32 | [ 0.28,  0.37] | 4.23e+08 | < .001***
mental_care       |   highschool_grad |      0.32 | [ 0.27,  0.36] | 4.27e+08 | < .001***
mental_care       |       segregation |     -0.34 | [-0.39, -0.30] | 8.40e+08 | < .001***
mental_care       |    home_ownership |      0.48 | [ 0.44,  0.52] | 3.27e+08 | < .001***
mental_care       |    housing_burden |     -0.42 | [-0.46, -0.37] | 8.86e+08 | < .001***
mental_care       |          below_18 |      0.12 | [ 0.07,  0.17] | 5.48e+08 | < .001***
mental_care       |          above_65 |      0.16 | [ 0.11,  0.21] | 5.24e+08 | < .001***
mental_care       |             black |  7.22e-03 | [-0.04,  0.06] | 6.21e+08 | > .999   
mental_care       |            native |     -0.11 | [-0.16, -0.06] | 6.95e+08 | 0.002**  
mental_care       |             asian |     -0.48 | [-0.52, -0.44] | 9.26e+08 | < .001***
mental_care       |          hispanic |     -0.21 | [-0.25, -0.16] | 7.54e+08 | < .001***
mental_care       |             white |      0.14 | [ 0.09,  0.19] | 5.37e+08 | < .001***
mental_care       |            female |     -0.17 | [-0.22, -0.12] | 7.29e+08 | < .001***
mental_care       |             rural |      0.55 | [ 0.51,  0.58] | 2.83e+08 | < .001***
mental_care       |        inactivity |      0.45 | [ 0.41,  0.49] | 3.43e+08 | < .001***
mental_care       |           obesity |      0.43 | [ 0.38,  0.47] | 3.59e+08 | < .001***
mental_care       |  household_income |     -0.32 | [-0.37, -0.28] | 8.28e+08 | < .001***
mental_care       |      smoking_prop |      0.25 | [ 0.20,  0.30] | 4.70e+08 | < .001***
mental_care       |     drinking_prop |     -0.34 | [-0.39, -0.30] | 8.39e+08 | < .001***
mental_care       |         dentist_n |      0.63 | [ 0.59,  0.66] | 2.34e+08 | < .001***
some_college      |      unemployment |     -0.35 | [-0.40, -0.31] | 8.47e+08 | < .001***
some_college      | income_inequality |     -0.22 | [-0.27, -0.18] | 7.66e+08 | < .001***
some_college      |      social_score |      0.09 | [ 0.04,  0.14] | 5.68e+08 | 0.038*   
some_college      |   life_expectancy |      0.66 | [ 0.63,  0.68] | 2.15e+08 | < .001***
some_college      |          diabetes |     -0.70 | [-0.72, -0.67] | 1.06e+09 | < .001***
some_college      |      insuff_sleep |     -0.62 | [-0.65, -0.58] | 1.01e+09 | < .001***
some_college      |   highschool_grad |     -0.05 | [-0.10,  0.00] | 6.58e+08 | > .999   
some_college      |       segregation |      0.24 | [ 0.19,  0.29] | 4.74e+08 | < .001***
some_college      |    home_ownership |     -0.18 | [-0.22, -0.13] | 7.35e+08 | < .001***
some_college      |    housing_burden |      0.17 | [ 0.12,  0.22] | 5.20e+08 | < .001***
some_college      |          below_18 |     -0.08 | [-0.13, -0.03] | 6.78e+08 | 0.119    
some_college      |          above_65 |     -0.08 | [-0.13, -0.03] | 6.73e+08 | 0.284    
some_college      |             black |     -0.11 | [-0.16, -0.06] | 6.94e+08 | 0.002**  
some_college      |            native |     -0.05 | [-0.10,  0.00] | 6.56e+08 | > .999   
some_college      |             asian |      0.46 | [ 0.42,  0.50] | 3.39e+08 | < .001***
some_college      |          hispanic |      0.10 | [ 0.05,  0.15] | 5.64e+08 | 0.015*   
some_college      |             white |      0.10 | [ 0.05,  0.15] | 5.65e+08 | 0.018*   
some_college      |            female |      0.16 | [ 0.11,  0.21] | 5.27e+08 | < .001***
some_college      |             rural |     -0.47 | [-0.51, -0.43] | 9.20e+08 | < .001***
some_college      |        inactivity |     -0.74 | [-0.76, -0.72] | 1.09e+09 | < .001***
some_college      |           obesity |     -0.58 | [-0.61, -0.54] | 9.88e+08 | < .001***
some_college      |  household_income |      0.66 | [ 0.63,  0.69] | 2.12e+08 | < .001***
some_college      |      smoking_prop |     -0.58 | [-0.62, -0.55] | 9.90e+08 | < .001***
some_college      |     drinking_prop |      0.56 | [ 0.52,  0.59] | 2.77e+08 | < .001***
some_college      |         dentist_n |     -0.52 | [-0.56, -0.48] | 9.52e+08 | < .001***
unemployment      | income_inequality |      0.36 | [ 0.32,  0.41] | 3.98e+08 | < .001***
unemployment      |      social_score |     -0.15 | [-0.20, -0.10] | 7.22e+08 | < .001***
unemployment      |   life_expectancy |     -0.39 | [-0.43, -0.34] | 8.68e+08 | < .001***
unemployment      |          diabetes |      0.41 | [ 0.37,  0.46] | 3.67e+08 | < .001***
unemployment      |      insuff_sleep |      0.45 | [ 0.41,  0.49] | 3.42e+08 | < .001***
unemployment      |   highschool_grad |     -0.18 | [-0.23, -0.13] | 7.40e+08 | < .001***
unemployment      |       segregation |     -0.02 | [-0.07,  0.03] | 6.39e+08 | > .999   
unemployment      |    home_ownership |     -0.08 | [-0.13, -0.03] | 6.74e+08 | 0.251    
unemployment      |    housing_burden |      0.23 | [ 0.18,  0.27] | 4.84e+08 | < .001***
unemployment      |          below_18 |     -0.19 | [-0.24, -0.14] | 7.43e+08 | < .001***
unemployment      |          above_65 |  1.51e-03 | [-0.05,  0.05] | 6.25e+08 | > .999   
unemployment      |             black |      0.23 | [ 0.18,  0.28] | 4.82e+08 | < .001***
unemployment      |            native |     -0.04 | [-0.09,  0.01] | 6.49e+08 | > .999   
unemployment      |             asian |     -0.02 | [-0.07,  0.03] | 6.37e+08 | > .999   
unemployment      |          hispanic |     -0.01 | [-0.06,  0.04] | 6.32e+08 | > .999   
unemployment      |             white |     -0.24 | [-0.29, -0.19] | 7.76e+08 | < .001***
unemployment      |            female |      0.12 | [ 0.07,  0.17] | 5.49e+08 | < .001***
unemployment      |             rural | -9.86e-03 | [-0.06,  0.04] | 6.32e+08 | > .999   
unemployment      |        inactivity |      0.29 | [ 0.24,  0.34] | 4.44e+08 | < .001***
unemployment      |           obesity |      0.18 | [ 0.13,  0.23] | 5.14e+08 | < .001***
unemployment      |  household_income |     -0.34 | [-0.39, -0.30] | 8.40e+08 | < .001***
unemployment      |      smoking_prop |      0.34 | [ 0.29,  0.38] | 4.14e+08 | < .001***
unemployment      |     drinking_prop |     -0.24 | [-0.29, -0.19] | 7.78e+08 | < .001***
unemployment      |         dentist_n |      0.04 | [-0.01,  0.09] | 6.00e+08 | > .999   
income_inequality |      social_score |     -0.06 | [-0.11, -0.01] | 6.63e+08 | > .999   
income_inequality |   life_expectancy |     -0.36 | [-0.40, -0.31] | 8.50e+08 | < .001***
income_inequality |          diabetes |      0.47 | [ 0.43,  0.51] | 3.31e+08 | < .001***
income_inequality |      insuff_sleep |      0.40 | [ 0.35,  0.44] | 3.76e+08 | < .001***
income_inequality |   highschool_grad |     -0.13 | [-0.18, -0.08] | 7.04e+08 | < .001***
income_inequality |       segregation |     -0.04 | [-0.09,  0.01] | 6.50e+08 | > .999   
income_inequality |    home_ownership |     -0.27 | [-0.31, -0.22] | 7.93e+08 | < .001***
income_inequality |    housing_burden |      0.42 | [ 0.37,  0.46] | 3.66e+08 | < .001***
income_inequality |          below_18 |     -0.15 | [-0.20, -0.10] | 7.18e+08 | < .001***
income_inequality |          above_65 |     -0.03 | [-0.08,  0.03] | 6.42e+08 | > .999   
income_inequality |             black |      0.31 | [ 0.27,  0.36] | 4.29e+08 | < .001***
income_inequality |            native |     -0.08 | [-0.13, -0.03] | 6.78e+08 | 0.113    
income_inequality |             asian |     -0.03 | [-0.09,  0.02] | 6.47e+08 | > .999   
income_inequality |          hispanic |     -0.05 | [-0.10,  0.00] | 6.57e+08 | > .999   
income_inequality |             white |     -0.26 | [-0.31, -0.22] | 7.90e+08 | < .001***
income_inequality |            female |      0.26 | [ 0.21,  0.30] | 4.64e+08 | < .001***
income_inequality |             rural |     -0.03 | [-0.08,  0.03] | 6.41e+08 | > .999   
income_inequality |        inactivity |      0.38 | [ 0.34,  0.42] | 3.87e+08 | < .001***
income_inequality |           obesity |      0.23 | [ 0.18,  0.28] | 4.81e+08 | < .001***
income_inequality |  household_income |     -0.41 | [-0.45, -0.37] | 8.83e+08 | < .001***
income_inequality |      smoking_prop |      0.34 | [ 0.29,  0.38] | 4.15e+08 | < .001***
income_inequality |     drinking_prop |     -0.26 | [-0.31, -0.21] | 7.88e+08 | < .001***
income_inequality |         dentist_n |      0.02 | [-0.03,  0.07] | 6.11e+08 | > .999   
social_score      |   life_expectancy |      0.02 | [-0.03,  0.07] | 6.11e+08 | > .999   
social_score      |          diabetes |     -0.11 | [-0.16, -0.06] | 6.94e+08 | 0.002**  
social_score      |      insuff_sleep |     -0.12 | [-0.17, -0.07] | 7.00e+08 | < .001***
social_score      |   highschool_grad |      0.04 | [-0.01,  0.09] | 5.98e+08 | > .999   
social_score      |       segregation |     -0.02 | [-0.07,  0.04] | 6.35e+08 | > .999   
social_score      |    home_ownership |      0.08 | [ 0.03,  0.13] | 5.73e+08 | 0.121    
social_score      |    housing_burden |     -0.19 | [-0.24, -0.14] | 7.43e+08 | < .001***
social_score      |          below_18 |     -0.01 | [-0.06,  0.04] | 6.32e+08 | > .999   
social_score      |          above_65 |      0.22 | [ 0.17,  0.27] | 4.88e+08 | < .001***
social_score      |             black |     -0.02 | [-0.07,  0.03] | 6.37e+08 | > .999   
social_score      |            native |     -0.08 | [-0.13, -0.03] | 6.74e+08 | 0.251    
social_score      |             asian |     -0.11 | [-0.16, -0.06] | 6.92e+08 | 0.004**  
social_score      |          hispanic |     -0.19 | [-0.24, -0.14] | 7.47e+08 | < .001***
social_score      |             white |      0.19 | [ 0.14,  0.24] | 5.05e+08 | < .001***
social_score      |            female |      0.09 | [ 0.04,  0.14] | 5.69e+08 | 0.046*   
social_score      |             rural |      0.05 | [ 0.00,  0.10] | 5.93e+08 | > .999   
social_score      |        inactivity |      0.01 | [-0.04,  0.06] | 6.18e+08 | > .999   
social_score      |           obesity |      0.14 | [ 0.09,  0.19] | 5.40e+08 | < .001***
social_score      |  household_income |     -0.03 | [-0.08,  0.02] | 6.44e+08 | > .999   
social_score      |      smoking_prop |     -0.04 | [-0.09,  0.01] | 6.51e+08 | > .999   
social_score      |     drinking_prop |      0.11 | [ 0.06,  0.16] | 5.54e+08 | < .001***
social_score      |         dentist_n |     -0.03 | [-0.09,  0.02] | 6.47e+08 | > .999   
life_expectancy   |          diabetes |     -0.78 | [-0.79, -0.75] | 1.11e+09 | < .001***
life_expectancy   |      insuff_sleep |     -0.70 | [-0.73, -0.68] | 1.06e+09 | < .001***
life_expectancy   |   highschool_grad |      0.03 | [-0.02,  0.08] | 6.08e+08 | > .999   
life_expectancy   |       segregation |      0.28 | [ 0.23,  0.33] | 4.51e+08 | < .001***
life_expectancy   |    home_ownership |      0.03 | [-0.02,  0.08] | 6.09e+08 | > .999   
life_expectancy   |    housing_burden |      0.07 | [ 0.02,  0.12] | 5.84e+08 | 0.834    
life_expectancy   |          below_18 |     -0.20 | [-0.24, -0.15] | 7.48e+08 | < .001***
life_expectancy   |          above_65 |      0.08 | [ 0.03,  0.13] | 5.77e+08 | 0.229    
life_expectancy   |             black |     -0.27 | [-0.31, -0.22] | 7.93e+08 | < .001***
life_expectancy   |            native |      0.08 | [ 0.03,  0.13] | 5.76e+08 | 0.205    
life_expectancy   |             asian |      0.38 | [ 0.34,  0.43] | 3.86e+08 | < .001***
life_expectancy   |          hispanic |      0.21 | [ 0.16,  0.26] | 4.95e+08 | < .001***
life_expectancy   |             white |      0.14 | [ 0.09,  0.19] | 5.39e+08 | < .001***
life_expectancy   |            female |     -0.14 | [-0.19, -0.09] | 7.14e+08 | < .001***
life_expectancy   |             rural |     -0.27 | [-0.32, -0.22] | 7.96e+08 | < .001***
life_expectancy   |        inactivity |     -0.75 | [-0.78, -0.73] | 1.10e+09 | < .001***
life_expectancy   |           obesity |     -0.65 | [-0.68, -0.62] | 1.03e+09 | < .001***
life_expectancy   |  household_income |      0.74 | [ 0.72,  0.76] | 1.62e+08 | < .001***
life_expectancy   |      smoking_prop |     -0.74 | [-0.76, -0.72] | 1.09e+09 | < .001***
life_expectancy   |     drinking_prop |      0.62 | [ 0.59,  0.65] | 2.37e+08 | < .001***
life_expectancy   |         dentist_n |     -0.36 | [-0.40, -0.31] | 8.50e+08 | < .001***
diabetes          |      insuff_sleep |      0.79 | [ 0.77,  0.81] | 1.30e+08 | < .001***
diabetes          |   highschool_grad |     -0.03 | [-0.08,  0.03] | 6.41e+08 | > .999   
diabetes          |       segregation |     -0.36 | [-0.40, -0.31] | 8.50e+08 | < .001***
diabetes          |    home_ownership |     -0.15 | [-0.20, -0.09] | 7.16e+08 | < .001***
diabetes          |    housing_burden |      0.07 | [ 0.02,  0.12] | 5.83e+08 | 0.678    
diabetes          |          below_18 |      0.19 | [ 0.14,  0.24] | 5.07e+08 | < .001***
diabetes          |          above_65 |     -0.19 | [-0.24, -0.14] | 7.43e+08 | < .001***
diabetes          |             black |      0.44 | [ 0.40,  0.48] | 3.51e+08 | < .001***
diabetes          |            native |     -0.04 | [-0.09,  0.01] | 6.50e+08 | > .999   
diabetes          |             asian |     -0.26 | [-0.31, -0.21] | 7.87e+08 | < .001***
diabetes          |          hispanic |      0.01 | [-0.04,  0.06] | 6.17e+08 | > .999   
diabetes          |             white |     -0.43 | [-0.47, -0.39] | 8.95e+08 | < .001***
diabetes          |            female |      0.14 | [ 0.09,  0.19] | 5.39e+08 | < .001***
diabetes          |             rural |      0.22 | [ 0.17,  0.27] | 4.86e+08 | < .001***
diabetes          |        inactivity |      0.86 | [ 0.84,  0.87] | 8.88e+07 | < .001***
diabetes          |           obesity |      0.70 | [ 0.67,  0.73] | 1.87e+08 | < .001***
diabetes          |  household_income |     -0.72 | [-0.74, -0.69] | 1.07e+09 | < .001***
diabetes          |      smoking_prop |      0.66 | [ 0.63,  0.69] | 2.14e+08 | < .001***
diabetes          |     drinking_prop |     -0.67 | [-0.70, -0.64] | 1.05e+09 | < .001***
diabetes          |         dentist_n |      0.38 | [ 0.34,  0.42] | 3.86e+08 | < .001***
insuff_sleep      |   highschool_grad |      0.08 | [ 0.03,  0.13] | 5.78e+08 | 0.267    
insuff_sleep      |       segregation |     -0.34 | [-0.39, -0.30] | 8.39e+08 | < .001***
insuff_sleep      |    home_ownership |     -0.04 | [-0.09,  0.01] | 6.52e+08 | > .999   
insuff_sleep      |    housing_burden |      0.07 | [ 0.02,  0.12] | 5.83e+08 | 0.678    
insuff_sleep      |          below_18 |      0.03 | [-0.02,  0.08] | 6.05e+08 | > .999   
insuff_sleep      |          above_65 |     -0.13 | [-0.18, -0.08] | 7.08e+08 | < .001***
insuff_sleep      |             black |      0.50 | [ 0.46,  0.54] | 3.13e+08 | < .001***
insuff_sleep      |            native |     -0.30 | [-0.34, -0.25] | 8.10e+08 | < .001***
insuff_sleep      |             asian |     -0.22 | [-0.27, -0.17] | 7.62e+08 | < .001***
insuff_sleep      |          hispanic |     -0.16 | [-0.21, -0.11] | 7.25e+08 | < .001***
insuff_sleep      |             white |     -0.24 | [-0.29, -0.20] | 7.79e+08 | < .001***
insuff_sleep      |            female |      0.19 | [ 0.14,  0.24] | 5.06e+08 | < .001***
insuff_sleep      |             rural |      0.13 | [ 0.08,  0.18] | 5.44e+08 | < .001***
insuff_sleep      |        inactivity |      0.71 | [ 0.68,  0.73] | 1.82e+08 | < .001***
insuff_sleep      |           obesity |      0.57 | [ 0.54,  0.61] | 2.67e+08 | < .001***
insuff_sleep      |  household_income |     -0.56 | [-0.60, -0.53] | 9.77e+08 | < .001***
insuff_sleep      |      smoking_prop |      0.66 | [ 0.63,  0.68] | 2.15e+08 | < .001***
insuff_sleep      |     drinking_prop |     -0.58 | [-0.61, -0.54] | 9.86e+08 | < .001***
insuff_sleep      |         dentist_n |      0.34 | [ 0.29,  0.39] | 4.12e+08 | < .001***
highschool_grad   |       segregation |     -0.18 | [-0.23, -0.13] | 7.38e+08 | < .001***
highschool_grad   |    home_ownership |      0.31 | [ 0.26,  0.35] | 4.33e+08 | < .001***
highschool_grad   |    housing_burden |     -0.28 | [-0.33, -0.24] | 8.03e+08 | < .001***
highschool_grad   |          below_18 |      0.06 | [ 0.01,  0.11] | 5.87e+08 | > .999   
highschool_grad   |          above_65 |      0.07 | [ 0.02,  0.12] | 5.81e+08 | 0.484    
highschool_grad   |             black |     -0.08 | [-0.13, -0.03] | 6.75e+08 | 0.184    
highschool_grad   |            native |     -0.21 | [-0.26, -0.16] | 7.59e+08 | < .001***
highschool_grad   |             asian |     -0.20 | [-0.25, -0.15] | 7.48e+08 | < .001***
highschool_grad   |          hispanic |     -0.12 | [-0.17, -0.07] | 6.99e+08 | < .001***
highschool_grad   |             white |      0.27 | [ 0.22,  0.32] | 4.57e+08 | < .001***
highschool_grad   |            female |     -0.08 | [-0.13, -0.03] | 6.74e+08 | 0.259    
highschool_grad   |             rural |      0.21 | [ 0.16,  0.26] | 4.92e+08 | < .001***
highschool_grad   |        inactivity |      0.11 | [ 0.06,  0.16] | 5.59e+08 | 0.004**  
highschool_grad   |           obesity |      0.09 | [ 0.04,  0.14] | 5.67e+08 | 0.032*   
highschool_grad   |  household_income |      0.02 | [-0.03,  0.07] | 6.11e+08 | > .999   
highschool_grad   |      smoking_prop |     -0.05 | [-0.10,  0.01] | 6.54e+08 | > .999   
highschool_grad   |     drinking_prop |     -0.07 | [-0.13, -0.02] | 6.72e+08 | 0.342    
highschool_grad   |         dentist_n |      0.26 | [ 0.21,  0.31] | 4.61e+08 | < .001***
segregation       |    home_ownership |     -0.01 | [-0.06,  0.04] | 6.34e+08 | > .999   
segregation       |    housing_burden |      0.12 | [ 0.07,  0.17] | 5.50e+08 | < .001***
segregation       |          below_18 |     -0.10 | [-0.15, -0.05] | 6.89e+08 | 0.008**  
segregation       |          above_65 |      0.12 | [ 0.07,  0.17] | 5.52e+08 | < .001***
segregation       |             black |     -0.37 | [-0.42, -0.33] | 8.60e+08 | < .001***
segregation       |            native |      0.12 | [ 0.06,  0.17] | 5.53e+08 | < .001***
segregation       |             asian |      0.09 | [ 0.04,  0.14] | 5.67e+08 | 0.029*   
segregation       |          hispanic |  2.81e-03 | [-0.05,  0.05] | 6.24e+08 | > .999   
segregation       |             white |      0.24 | [ 0.19,  0.28] | 4.77e+08 | < .001***
segregation       |            female |     -0.07 | [-0.12, -0.02] | 6.70e+08 | 0.510    
segregation       |             rural |     -0.18 | [-0.23, -0.13] | 7.36e+08 | < .001***
segregation       |        inactivity |     -0.30 | [-0.34, -0.25] | 8.12e+08 | < .001***
segregation       |           obesity |     -0.29 | [-0.34, -0.25] | 8.09e+08 | < .001***
segregation       |  household_income |      0.19 | [ 0.14,  0.24] | 5.07e+08 | < .001***
segregation       |      smoking_prop |     -0.18 | [-0.23, -0.13] | 7.38e+08 | < .001***
segregation       |     drinking_prop |      0.29 | [ 0.24,  0.34] | 4.44e+08 | < .001***
segregation       |         dentist_n |     -0.32 | [-0.37, -0.28] | 8.27e+08 | < .001***
home_ownership    |    housing_burden |     -0.55 | [-0.58, -0.51] | 9.68e+08 | < .001***
home_ownership    |          below_18 |     -0.10 | [-0.15, -0.05] | 6.88e+08 | 0.011*   
home_ownership    |          above_65 |      0.48 | [ 0.44,  0.52] | 3.22e+08 | < .001***
home_ownership    |             black |     -0.33 | [-0.38, -0.29] | 8.34e+08 | < .001***
home_ownership    |            native |     -0.10 | [-0.15, -0.05] | 6.87e+08 | 0.013*   
home_ownership    |             asian |     -0.53 | [-0.56, -0.49] | 9.55e+08 | < .001***
home_ownership    |          hispanic |     -0.37 | [-0.41, -0.32] | 8.55e+08 | < .001***
home_ownership    |             white |      0.50 | [ 0.47,  0.54] | 3.10e+08 | < .001***
home_ownership    |            female |     -0.24 | [-0.28, -0.19] | 7.73e+08 | < .001***
home_ownership    |             rural |      0.56 | [ 0.52,  0.59] | 2.78e+08 | < .001***
home_ownership    |        inactivity |      0.03 | [-0.03,  0.08] | 6.10e+08 | > .999   
home_ownership    |           obesity |      0.10 | [ 0.05,  0.15] | 5.64e+08 | 0.015*   
home_ownership    |  household_income |     -0.03 | [-0.08,  0.02] | 6.46e+08 | > .999   
home_ownership    |      smoking_prop |     -0.05 | [-0.10,  0.00] | 6.56e+08 | > .999   
home_ownership    |     drinking_prop |     -0.09 | [-0.14, -0.04] | 6.83e+08 | 0.033*   
home_ownership    |         dentist_n |      0.46 | [ 0.42,  0.50] | 3.37e+08 | < .001***
housing_burden    |          below_18 |     -0.20 | [-0.25, -0.15] | 7.53e+08 | < .001***
housing_burden    |          above_65 |     -0.21 | [-0.26, -0.17] | 7.60e+08 | < .001***
housing_burden    |             black |      0.35 | [ 0.30,  0.39] | 4.08e+08 | < .001***
housing_burden    |            native |     -0.03 | [-0.08,  0.02] | 6.46e+08 | > .999   
housing_burden    |             asian |      0.46 | [ 0.42,  0.50] | 3.36e+08 | < .001***
housing_burden    |          hispanic |      0.25 | [ 0.20,  0.30] | 4.67e+08 | < .001***
housing_burden    |             white |     -0.40 | [-0.44, -0.36] | 8.77e+08 | < .001***
housing_burden    |            female |      0.30 | [ 0.26,  0.35] | 4.35e+08 | < .001***
housing_burden    |             rural |     -0.52 | [-0.56, -0.48] | 9.52e+08 | < .001***
housing_burden    |        inactivity |     -0.14 | [-0.19, -0.09] | 7.11e+08 | < .001***
housing_burden    |           obesity |     -0.22 | [-0.27, -0.17] | 7.63e+08 | < .001***
housing_burden    |  household_income |      0.09 | [ 0.04,  0.14] | 5.70e+08 | 0.059    
housing_burden    |      smoking_prop |     -0.05 | [-0.10,  0.00] | 6.56e+08 | > .999   
housing_burden    |     drinking_prop |      0.07 | [ 0.01,  0.12] | 5.84e+08 | 0.874    
housing_burden    |         dentist_n |     -0.40 | [-0.44, -0.35] | 8.73e+08 | < .001***
below_18          |          above_65 |     -0.50 | [-0.54, -0.46] | 9.40e+08 | < .001***
below_18          |             black |     -0.06 | [-0.11, -0.01] | 6.65e+08 | > .999   
below_18          |            native |      0.18 | [ 0.13,  0.23] | 5.15e+08 | < .001***
below_18          |             asian |     -0.05 | [-0.10,  0.00] | 6.56e+08 | > .999   
below_18          |          hispanic |      0.21 | [ 0.16,  0.26] | 4.93e+08 | < .001***
below_18          |             white |     -0.17 | [-0.22, -0.12] | 7.30e+08 | < .001***
below_18          |            female |      0.01 | [-0.04,  0.06] | 6.18e+08 | > .999   
below_18          |             rural |     -0.02 | [-0.07,  0.03] | 6.38e+08 | > .999   
below_18          |        inactivity |      0.23 | [ 0.18,  0.27] | 4.85e+08 | < .001***
below_18          |           obesity |      0.27 | [ 0.22,  0.31] | 4.59e+08 | < .001***
below_18          |  household_income |      0.02 | [-0.03,  0.07] | 6.13e+08 | > .999   
below_18          |      smoking_prop |      0.06 | [ 0.01,  0.11] | 5.88e+08 | > .999   
below_18          |     drinking_prop |     -0.10 | [-0.15, -0.05] | 6.86e+08 | 0.016*   
below_18          |         dentist_n |      0.04 | [-0.02,  0.09] | 6.03e+08 | > .999   
above_65          |             black |     -0.39 | [-0.43, -0.34] | 8.66e+08 | < .001***
above_65          |            native |     -0.01 | [-0.06,  0.04] | 6.33e+08 | > .999   
above_65          |             asian |     -0.43 | [-0.47, -0.38] | 8.93e+08 | < .001***
above_65          |          hispanic |     -0.35 | [-0.39, -0.30] | 8.44e+08 | < .001***
above_65          |             white |      0.47 | [ 0.42,  0.50] | 3.34e+08 | < .001***
above_65          |            female |     -0.08 | [-0.13, -0.03] | 6.78e+08 | 0.113    
above_65          |             rural |      0.50 | [ 0.46,  0.53] | 3.16e+08 | < .001***
above_65          |        inactivity |     -0.04 | [-0.09,  0.01] | 6.50e+08 | > .999   
above_65          |           obesity |      0.02 | [-0.03,  0.07] | 6.12e+08 | > .999   
above_65          |  household_income |     -0.24 | [-0.29, -0.19] | 7.76e+08 | < .001***
above_65          |      smoking_prop |     -0.10 | [-0.15, -0.05] | 6.90e+08 | 0.007**  
above_65          |     drinking_prop |     -0.11 | [-0.16, -0.06] | 6.94e+08 | 0.002**  
above_65          |         dentist_n |      0.23 | [ 0.18,  0.28] | 4.83e+08 | < .001***
black             |            native |     -0.29 | [-0.34, -0.24] | 8.08e+08 | < .001***
black             |             asian |      0.33 | [ 0.28,  0.37] | 4.22e+08 | < .001***
black             |          hispanic |      0.15 | [ 0.10,  0.20] | 5.30e+08 | < .001***
black             |             white |     -0.60 | [-0.64, -0.57] | 1.00e+09 | < .001***
black             |            female |      0.38 | [ 0.34,  0.43] | 3.85e+08 | < .001***
black             |             rural |     -0.35 | [-0.39, -0.30] | 8.42e+08 | < .001***
black             |        inactivity |      0.25 | [ 0.20,  0.29] | 4.71e+08 | < .001***
black             |           obesity |      0.17 | [ 0.12,  0.22] | 5.21e+08 | < .001***
black             |  household_income |     -0.08 | [-0.13, -0.03] | 6.74e+08 | 0.251    
black             |      smoking_prop |      0.24 | [ 0.19,  0.29] | 4.74e+08 | < .001***
black             |     drinking_prop |     -0.18 | [-0.23, -0.13] | 7.37e+08 | < .001***
black             |         dentist_n |     -0.03 | [-0.08,  0.02] | 6.45e+08 | > .999   
native            |             asian |      0.08 | [ 0.02,  0.13] | 5.78e+08 | 0.296    
native            |          hispanic |      0.49 | [ 0.45,  0.53] | 3.18e+08 | < .001***
native            |             white |     -0.29 | [-0.34, -0.24] | 8.08e+08 | < .001***
native            |            female |     -0.32 | [-0.37, -0.28] | 8.29e+08 | < .001***
native            |             rural |      0.10 | [ 0.05,  0.15] | 5.64e+08 | 0.014*   
native            |        inactivity |     -0.12 | [-0.17, -0.07] | 7.04e+08 | < .001***
native            |           obesity |     -0.11 | [-0.16, -0.06] | 6.94e+08 | 0.002**  
native            |  household_income |      0.03 | [-0.02,  0.08] | 6.07e+08 | > .999   
native            |      smoking_prop |     -0.18 | [-0.23, -0.14] | 7.41e+08 | < .001***
native            |     drinking_prop |      0.10 | [ 0.05,  0.15] | 5.63e+08 | 0.010*   
native            |         dentist_n |     -0.08 | [-0.13, -0.03] | 6.78e+08 | 0.110    
asian             |          hispanic |      0.51 | [ 0.48,  0.55] | 3.04e+08 | < .001***
asian             |             white |     -0.46 | [-0.50, -0.42] | 9.14e+08 | < .001***
asian             |            female |      0.24 | [ 0.19,  0.28] | 4.78e+08 | < .001***
asian             |             rural |     -0.71 | [-0.73, -0.68] | 1.07e+09 | < .001***
asian             |        inactivity |     -0.47 | [-0.51, -0.43] | 9.19e+08 | < .001***
asian             |           obesity |     -0.50 | [-0.54, -0.46] | 9.41e+08 | < .001***
asian             |  household_income |      0.53 | [ 0.49,  0.57] | 2.92e+08 | < .001***
asian             |      smoking_prop |     -0.36 | [-0.40, -0.32] | 8.51e+08 | < .001***
asian             |     drinking_prop |      0.36 | [ 0.31,  0.40] | 4.00e+08 | < .001***
asian             |         dentist_n |     -0.59 | [-0.62, -0.55] | 9.92e+08 | < .001***
hispanic          |             white |     -0.65 | [-0.68, -0.62] | 1.03e+09 | < .001***
hispanic          |            female |     -0.03 | [-0.08,  0.02] | 6.44e+08 | > .999   
hispanic          |             rural |     -0.34 | [-0.39, -0.30] | 8.40e+08 | < .001***
hispanic          |        inactivity |     -0.20 | [-0.25, -0.15] | 7.51e+08 | < .001***
hispanic          |           obesity |     -0.30 | [-0.34, -0.25] | 8.10e+08 | < .001***
hispanic          |  household_income |      0.29 | [ 0.24,  0.34] | 4.44e+08 | < .001***
hispanic          |      smoking_prop |     -0.40 | [-0.45, -0.36] | 8.78e+08 | < .001***
hispanic          |     drinking_prop |      0.11 | [ 0.06,  0.16] | 5.59e+08 | 0.004**  
hispanic          |         dentist_n |     -0.25 | [-0.30, -0.20] | 7.82e+08 | < .001***
white             |            female |     -0.19 | [-0.24, -0.14] | 7.44e+08 | < .001***
white             |             rural |      0.32 | [ 0.27,  0.37] | 4.24e+08 | < .001***
white             |        inactivity |     -0.11 | [-0.16, -0.06] | 6.95e+08 | 0.002**  
white             |           obesity |     -0.02 | [-0.07,  0.03] | 6.40e+08 | > .999   
white             |  household_income | -5.93e-04 | [-0.05,  0.05] | 6.26e+08 | > .999   
white             |      smoking_prop |      0.03 | [-0.02,  0.08] | 6.08e+08 | > .999   
white             |     drinking_prop |      0.13 | [ 0.08,  0.18] | 5.42e+08 | < .001***
white             |         dentist_n |      0.17 | [ 0.12,  0.22] | 5.22e+08 | < .001***
female            |             rural |     -0.38 | [-0.43, -0.34] | 8.66e+08 | < .001***
female            |        inactivity |      0.03 | [-0.02,  0.08] | 6.09e+08 | > .999   
female            |           obesity | -9.56e-03 | [-0.06,  0.04] | 6.31e+08 | > .999   
female            |  household_income | -1.80e-03 | [-0.05,  0.05] | 6.27e+08 | > .999   
female            |      smoking_prop |      0.08 | [ 0.03,  0.13] | 5.75e+08 | 0.181    
female            |     drinking_prop |     -0.17 | [-0.21, -0.12] | 7.29e+08 | < .001***
female            |         dentist_n |     -0.23 | [-0.28, -0.18] | 7.69e+08 | < .001***
rural             |        inactivity |      0.38 | [ 0.34,  0.43] | 3.85e+08 | < .001***
rural             |           obesity |      0.39 | [ 0.35,  0.43] | 3.81e+08 | < .001***
rural             |  household_income |     -0.47 | [-0.51, -0.43] | 9.22e+08 | < .001***
rural             |      smoking_prop |      0.24 | [ 0.19,  0.28] | 4.78e+08 | < .001***
rural             |     drinking_prop |     -0.33 | [-0.38, -0.29] | 8.33e+08 | < .001***
rural             |         dentist_n |      0.66 | [ 0.63,  0.69] | 2.12e+08 | < .001***
inactivity        |           obesity |      0.79 | [ 0.77,  0.81] | 1.29e+08 | < .001***
inactivity        |  household_income |     -0.76 | [-0.78, -0.74] | 1.10e+09 | < .001***
inactivity        |      smoking_prop |      0.71 | [ 0.68,  0.73] | 1.83e+08 | < .001***
inactivity        |     drinking_prop |     -0.62 | [-0.65, -0.58] | 1.01e+09 | < .001***
inactivity        |         dentist_n |      0.50 | [ 0.46,  0.53] | 3.14e+08 | < .001***
obesity           |  household_income |     -0.66 | [-0.69, -0.63] | 1.04e+09 | < .001***
obesity           |      smoking_prop |      0.62 | [ 0.58,  0.65] | 2.40e+08 | < .001***
obesity           |     drinking_prop |     -0.49 | [-0.53, -0.45] | 9.35e+08 | < .001***
obesity           |         dentist_n |      0.45 | [ 0.40,  0.49] | 3.47e+08 | < .001***
household_income  |      smoking_prop |     -0.66 | [-0.69, -0.63] | 1.04e+09 | < .001***
household_income  |     drinking_prop |      0.61 | [ 0.58,  0.64] | 2.45e+08 | < .001***
household_income  |         dentist_n |     -0.42 | [-0.47, -0.38] | 8.91e+08 | < .001***
smoking_prop      |     drinking_prop |     -0.47 | [-0.51, -0.43] | 9.18e+08 | < .001***
smoking_prop      |         dentist_n |      0.32 | [ 0.27,  0.37] | 4.25e+08 | < .001***
drinking_prop     |         dentist_n |     -0.39 | [-0.43, -0.35] | 8.69e+08 | < .001***

p-value adjustment method: Holm (1979)
Observations: 1554

The variables with a correlation of 0.8 or more (or -0.8 or less) are:

  • poor_health with poor_physical, life_expectancy, diabetes, inactivity, and obesity

  • poor_physical with poor_mental, life_expectancy, diabetes, and inactivity

  • diabetes with inactivity

The variables that exhibited high correlation with the outcome smoking_prop will be removed from the analysis (poor_health, poor_physical, and diabetes).

Code
chr_fin <- chr_simp |>
  dplyr::select(-fipscode, -poor_health, -poor_physical, -inactivity)

dim(chr_fin)
[1] 1554   31

4 Table 1

setting up table 1:

Code
myVars  <- c("poor_mental", "food_index",
                   "uninsured", "primary_care", "mental_care",
                   "highschool_grad", "some_college", "unemployment", "income_inequality",
                   "social_score", "life_expectancy", "insuff_sleep",
                   "home_ownership", "housing_burden",
                   "below_18","above_65", "black", "native", "asian", "hispanic",
                   "white", "female", "rural", "obesity", "household_income", "smoking_prop", "drinking_prop")


t1 <- tableone::CreateTableOne(
          data = chr_fin, 
          vars = myVars,
          strata = c("dentist_grp"))

table_one_text <- print(
  t1,
  showAllLevels = TRUE,
  quote         = FALSE,
  noSpaces      = FALSE,
  printToggle   = FALSE  
)


knitr::kable(
  table_one_text,
  format   = "html",
  caption  = "Comparison by Dentist Proportion"
)
Comparison by Dentist Proportion
level High Low p test
n 777 777
poor_mental (mean (SD)) 5.02 (0.59) 5.35 (0.64) <0.001
food_index (mean (SD)) 7.90 (1.18) 7.19 (1.38) <0.001
uninsured (mean (SD)) 0.09 (0.04) 0.13 (0.06) <0.001
primary_care (mean (SD)) 0.00 (0.00) 0.00 (0.00) <0.001
mental_care (mean (SD)) 547.08 (975.87) 2390.97 (2778.69) <0.001
highschool_grad (mean (SD)) 0.86 (0.07) 0.89 (0.08) <0.001
some_college (mean (SD)) 0.67 (0.10) 0.54 (0.12) <0.001
unemployment (mean (SD)) 0.03 (0.01) 0.04 (0.01) 0.151
income_inequality (mean (SD)) 4.54 (0.74) 4.61 (0.95) 0.085
social_score (mean (SD)) 10.97 (5.14) 10.85 (7.36) 0.701
life_expectancy (mean (SD)) 77.26 (3.57) 74.98 (3.37) <0.001
insuff_sleep (mean (SD)) 0.33 (0.04) 0.35 (0.04) <0.001
home_ownership (mean (SD)) 0.68 (0.10) 0.76 (0.08) <0.001
housing_burden (mean (SD)) 0.13 (0.04) 0.10 (0.03) <0.001
below_18 (mean (SD)) 0.21 (0.03) 0.22 (0.04) 0.632
above_65 (mean (SD)) 0.20 (0.05) 0.22 (0.05) <0.001
black (mean (SD)) 0.08 (0.12) 0.10 (0.16) 0.004
native (mean (SD)) 0.04 (0.10) 0.02 (0.06) <0.001
asian (mean (SD)) 0.04 (0.05) 0.01 (0.01) <0.001
hispanic (mean (SD)) 0.11 (0.11) 0.10 (0.15) 0.036
white (mean (SD)) 0.72 (0.20) 0.76 (0.21) <0.001
female (mean (SD)) 0.50 (0.02) 0.49 (0.03) <0.001
rural (mean (SD)) 0.38 (0.33) 0.89 (0.19) <0.001
obesity (mean (SD)) 0.34 (0.05) 0.39 (0.04) <0.001
household_income (mean (SD)) 71992.92 (19356.32) 57529.52 (12919.78) <0.001
smoking_prop (mean (SD)) 16.41 (3.68) 18.78 (3.76) <0.001
drinking_prop (mean (SD)) 18.83 (3.24) 16.26 (2.91) <0.001

The low dentist group counties have more proportions of uninsured populations (0.13 vs 0.09 ), lower college education (0.54 vs 0.67), lower life expectancy (74.9 vs 77.3), more rural areas (0.89 vs 0.38), more smoking prevalence (18.8 vs 16.4) and lower excessive drinking (16.3 vs 18.8).

5 Propensity Score Assessment

5.1 Calculating and Evaluating PS Balance

To calculate the PS, we will build a logistic model with the binary exposure as the outcome and the covariates as the predictors.

Code
psmodel0 <- glm(dentist_grp ~  poor_mental + food_index + uninsured + income_inequality +
                  primary_care + mental_care + highschool_grad + some_college + unemployment + 
                  social_score + life_expectancy + insuff_sleep + segregation + home_ownership +
                  housing_burden + below_18 + above_65 + black + native + asian + hispanic +
                  white + female + rural + obesity + household_income, 
                family = binomial(), data= chr_fin)

model_parameters(psmodel0) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −0.289 7.033 0.950 −14.370 13.621 −0.041 Inf 0.967
poor_mental 0.229 0.283 0.950 −0.325 0.785 0.811 Inf 0.417
food_index −0.196 0.107 0.950 −0.409 0.012 −1.826 Inf 0.068
uninsured 12.491 3.018 0.950 6.653 18.494 4.139 Inf 0.000
income_inequality 0.143 0.161 0.950 −0.167 0.464 0.890 Inf 0.373
primary_care −3,235.569 350.427 0.950 −3,940.793 −2,564.907 −9.233 Inf 0.000
mental_care 0.000 0.000 0.950 0.000 0.000 2.513 Inf 0.012
highschool_grad −0.445 1.344 0.950 −3.212 2.094 −0.331 Inf 0.741
some_college −0.572 1.270 0.950 −3.064 1.920 −0.451 Inf 0.652
unemployment −18.456 10.540 0.950 −39.187 2.206 −1.751 Inf 0.080
social_score −0.018 0.015 0.950 −0.048 0.011 −1.215 Inf 0.224
life_expectancy −0.011 0.048 0.950 −0.104 0.085 −0.225 Inf 0.822
insuff_sleep 3.596 5.940 0.950 −8.048 15.264 0.605 Inf 0.545
segregation −0.022 0.007 0.950 −0.035 −0.009 −3.336 Inf 0.001
home_ownership 6.351 2.013 0.950 2.421 10.306 3.155 Inf 0.002
housing_burden −6.008 4.090 0.950 −14.047 2.009 −1.469 Inf 0.142
below_18 −14.298 4.826 0.950 −23.729 −4.787 −2.963 Inf 0.003
above_65 −14.052 3.834 0.950 −21.546 −6.495 −3.665 Inf 0.000
black −1.113 4.465 0.950 −10.265 8.398 −0.249 Inf 0.803
native −8.765 4.815 0.950 −18.599 1.319 −1.820 Inf 0.069
asian −26.237 15.047 0.950 −57.152 −2.854 −1.744 Inf 0.081
hispanic −2.042 4.303 0.950 −10.832 7.051 −0.474 Inf 0.635
white −1.641 4.644 0.950 −11.106 8.138 −0.353 Inf 0.724
female −1.140 5.360 0.950 −12.093 9.212 −0.213 Inf 0.832
rural 4.280 0.517 0.950 3.288 5.316 8.282 Inf 0.000
obesity 9.328 3.804 0.950 1.903 16.834 2.452 Inf 0.014
household_income 0.000 0.000 0.950 0.000 0.000 0.812 Inf 0.417

5.1.1 Saving Propesnsity Score

Code
chr_fin$ps0 <- psmodel0$fitted

chr_fin$linps0 <- psmodel0$linear.predictors

5.1.2 Comparing The PS Distribution of The Two Exposure Groups

Code
df_stats(ps0 + linps0 ~ dentist_grp, data = chr_fin) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps0 High 0.000 0.004 0.023 0.124 0.991 0.139 0.251 777 0
ps0 Low 0.014 0.829 0.943 0.984 1.000 0.861 0.193 777 0
linps0 High −19.766 −5.642 −3.767 −1.952 4.666 −3.895 3.356 777 0
linps0 Low −4.267 1.579 2.807 4.149 9.743 2.815 1.937 777 0

The propensity scores here are concerning. We will need to observe the value of inflation to observe potential collinearity between the covariates.

The VIF:

Code
vif_model <- glm(dentist_grp ~  poor_mental + food_index + uninsured + income_inequality +
                  primary_care + mental_care + highschool_grad + some_college + unemployment + 
                  social_score + life_expectancy + insuff_sleep + segregation + home_ownership +
                  housing_burden + below_18 + above_65 + black + native + asian + hispanic +
                  white + female + rural + obesity + household_income, 
                family = binomial(), data= chr_fin)

vif(vif_model)
      poor_mental        food_index         uninsured income_inequality 
         3.775083          2.288975          2.291229          2.039530 
     primary_care       mental_care   highschool_grad      some_college 
         1.129137          1.219127          1.369741          2.204048 
     unemployment      social_score   life_expectancy      insuff_sleep 
         1.920851          1.500884          3.128986          4.694362 
      segregation    home_ownership    housing_burden          below_18 
         1.337250          3.200085          2.076655          3.673811 
         above_65             black            native             asian 
         4.596523         27.541605         21.671785          2.526745 
         hispanic             white            female             rural 
        37.921934         87.584373          1.763807          2.417255 
          obesity  household_income 
         2.243732          3.681487 

While most of the variables have low variance inflation factors, there are a few with an extremely high values (>10), which presents serious multi-collinearity. These variables are hispanic, white, black, and native. We will remove these variables from our analysis.

5.1.3 Fixed Propensity Score

We will start by excluding the problematic variables from our final data set chr_fin to produce subset for outcome 1 analysis chr24_smoking.

Code
chr_fin1 <- chr_fin|>
  dplyr::select(-hispanic, -white, -black, -native, -ps0, -linps0)

Next, we will build the new model:

Code
psmodel00 <- glm(dentist_grp ~  poor_mental + food_index + uninsured + income_inequality +
                  primary_care + mental_care + highschool_grad + some_college + unemployment + 
                  social_score + life_expectancy + insuff_sleep + segregation + home_ownership +
                  housing_burden + below_18 + above_65 + asian +
                  female + rural + obesity + household_income,  
                family = binomial(), data= chr_fin1)

model_parameters(psmodel00) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −10.190 5.276 0.950 −20.584 0.116 −1.931 Inf 0.053
poor_mental 0.175 0.256 0.950 −0.327 0.679 0.683 Inf 0.495
food_index −0.128 0.100 0.950 −0.325 0.066 −1.282 Inf 0.200
uninsured 10.767 2.457 0.950 6.056 15.695 4.382 Inf 0.000
income_inequality −0.046 0.144 0.950 −0.327 0.240 −0.322 Inf 0.748
primary_care −3,322.832 344.040 0.950 −4,015.515 −2,664.968 −9.658 Inf 0.000
mental_care 0.000 0.000 0.950 0.000 0.000 2.891 Inf 0.004
highschool_grad 1.728 1.117 0.950 −0.492 3.896 1.547 Inf 0.122
some_college −0.886 1.213 0.950 −3.269 1.492 −0.730 Inf 0.465
unemployment −27.947 9.386 0.950 −46.423 −9.585 −2.978 Inf 0.003
social_score −0.010 0.015 0.950 −0.039 0.019 −0.700 Inf 0.484
life_expectancy 0.068 0.045 0.950 −0.019 0.158 1.516 Inf 0.129
insuff_sleep 10.463 5.100 0.950 0.488 20.507 2.052 Inf 0.040
segregation −0.024 0.006 0.950 −0.036 −0.012 −3.914 Inf 0.000
home_ownership 7.180 1.794 0.950 3.689 10.737 4.002 Inf 0.000
housing_burden −2.289 3.806 0.950 −9.742 5.197 −0.602 Inf 0.548
below_18 −17.897 4.356 0.950 −26.485 −9.390 −4.109 Inf 0.000
above_65 −13.396 3.645 0.950 −20.548 −6.244 −3.675 Inf 0.000
asian −26.238 12.986 0.950 −52.511 −5.981 −2.020 Inf 0.043
female −0.123 5.055 0.950 −10.218 9.945 −0.024 Inf 0.981
rural 3.693 0.456 0.950 2.814 4.602 8.104 Inf 0.000
obesity 8.389 3.681 0.950 1.204 15.652 2.279 Inf 0.023
household_income 0.000 0.000 0.950 0.000 0.000 −0.281 Inf 0.779

5.1.4 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps00 <- psmodel00$fitted

chr_fin1$linps00 <- psmodel00$linear.predictors

5.1.5 Comparing The PS Distribution of The Two Exposure Groups

Code
df_stats(ps00 + linps00 ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps00 High 0.000 0.004 0.026 0.137 0.992 0.145 0.254 777 0
ps00 Low 0.014 0.828 0.936 0.982 1.000 0.855 0.195 777 0
linps00 High −20.056 −5.575 −3.626 −1.842 4.880 −3.839 3.430 777 0
linps00 Low −4.238 1.570 2.689 4.015 10.800 2.707 1.906 777 0

The propensity scores slightly improved after removing the variables with high collinearity values. However, the problem with their values being too close to 0 or 1 is visibly persistent.

5.1.5.1 Fixed Propensity Score - Part 2

We will attempt to observe the impact of each covariate on the outcome individually to rule out the variable(s) that impacted the PS values.

5.1.5.1.1 poor_mental
Code
psmodel_pm <- glm(dentist_grp ~  poor_mental, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_pm) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −4.579 0.458 0.950 −5.487 −3.692 −10.004 Inf 0.000
poor_mental 0.883 0.088 0.950 0.713 1.057 10.077 Inf 0.000
5.1.5.1.2 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_pm <- psmodel_pm$fitted

chr_fin1$linps_pm <- psmodel_pm$linear.predictors
5.1.5.1.3 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_pm + linps_pm ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_pm High 0.146 0.376 0.463 0.552 0.785 0.465 0.123 777 0
ps_pm Low 0.150 0.448 0.549 0.632 0.875 0.535 0.132 777 0
linps_pm High −1.766 −0.505 −0.147 0.209 1.294 −0.151 0.523 777 0
linps_pm Low −1.738 −0.207 0.196 0.542 1.945 0.148 0.568 777 0

The poor_mental variable looks has two linear PS less than 0.01 and above 1.

5.1.5.1.4 food_index
Code
psmodel_fi <- glm(dentist_grp ~  food_index, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_fi) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) 3.505 0.360 0.950 2.812 4.225 9.727 Inf 0.000
food_index −0.462 0.047 0.950 −0.555 −0.372 −9.911 Inf 0.000
5.1.5.1.5 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_fi <- psmodel_fi$fitted

chr_fin1$linps_fi <- psmodel_fi$linear.predictors
5.1.5.1.6 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_fi + linps_fi ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_fi High 0.256 0.374 0.441 0.522 0.948 0.462 0.120 777 0
ps_fi Low 0.247 0.441 0.522 0.612 0.971 0.538 0.137 777 0
linps_fi High −1.068 −0.514 −0.237 0.086 2.904 −0.144 0.545 777 0
linps_fi Low −1.115 −0.237 0.086 0.456 3.505 0.182 0.638 777 0

The food_index variable has two linear PS less than 0.01 and above 1. All PS values are close to 0.99 or above, therefore, it will be removed.

5.1.5.1.7 uninsured
Code
psmodel_un <- glm(dentist_grp ~  uninsured, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_un) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −1.945 0.148 0.950 −2.240 −1.659 −13.136 Inf 0.000
uninsured 17.458 1.276 0.950 15.005 20.009 13.682 Inf 0.000
5.1.5.1.8 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_un <- psmodel_un$fitted

chr_fin1$linps_un <- psmodel_un$linear.predictors
5.1.5.1.9 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_un + linps_un ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_un High 0.178 0.303 0.382 0.530 0.944 0.427 0.159 777 0
ps_un Low 0.232 0.406 0.559 0.727 0.998 0.573 0.193 777 0
linps_un High −1.528 −0.834 −0.481 0.120 2.832 −0.303 0.716 777 0
linps_un Low −1.197 −0.382 0.238 0.978 6.142 0.399 0.997 777 0

The uninsured variable has two linear PS less than 0.01 and above 1. All PS values are close to 0.99 or above, therefore, it will be removed.

5.1.5.1.10 income_inequality
Code
psmodel_ii <- glm(dentist_grp ~  income_inequality, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_ii) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −0.471 0.279 0.950 −1.021 0.074 −1.689 Inf 0.091
income_inequality 0.103 0.060 0.950 −0.014 0.221 1.717 Inf 0.086
5.1.5.1.11 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_ii <- psmodel_ii$fitted

chr_fin1$linps_ii <- psmodel_ii$linear.predictors
5.1.5.1.12 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_ii + linps_ii ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_ii High 0.448 0.487 0.496 0.508 0.599 0.499 0.019 777 0
ps_ii Low 0.425 0.485 0.497 0.512 0.649 0.501 0.024 777 0
linps_ii High −0.210 −0.052 −0.016 0.034 0.401 −0.004 0.076 777 0
linps_ii Low −0.301 −0.059 −0.011 0.050 0.614 0.004 0.098 777 0

The income_inequality has two linear propensity scores less than 0.01 and above 1.

5.1.5.1.13 primary_care
Code
psmodel_pc <- glm(dentist_grp ~  primary_care, 
                family = binomial(), data= chr_fin1)
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Code
model_parameters(psmodel_pc) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) 3.190 0.167 0.950 2.872 3.525 19.152 Inf 0.000
primary_care −5,829.786 288.884 0.950 −6,412.009 −5,278.912 −20.180 Inf 0.000

This variable has an exploding coefficient.

5.1.5.1.14 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_pc <- psmodel_pc$fitted

chr_fin1$linps_pc <- psmodel_pc$linear.predictors
5.1.5.1.15 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_pc + linps_pc ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_pc High 0.000 0.064 0.195 0.445 0.960 0.276 0.252 777 0
ps_pc Low 0.001 0.627 0.815 0.901 0.960 0.724 0.237 777 0
linps_pc High −30.653 −2.687 −1.417 −0.222 3.190 −1.684 2.569 777 0
linps_pc Low −6.747 0.520 1.480 2.212 3.190 1.218 1.415 777 0

The primary_care variable has all propensity scores as less than 0.01 and above 1. This variable will be removed from the analysis.

5.1.5.1.16 mental_care
Code
psmodel_mc <- glm(dentist_grp ~  mental_care, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_mc) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −1.232 0.087 0.950 −1.404 −1.064 −14.208 Inf 0.000
mental_care 0.001 0.000 0.950 0.001 0.001 15.128 Inf 0.000
5.1.5.1.17 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_mc <- psmodel_mc$fitted

chr_fin1$linps_mc <- psmodel_mc$linear.predictors
5.1.5.1.18 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_mc + linps_mc ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_mc High 0.092 0.276 0.299 0.339 1.000 0.345 0.148 777 0
ps_mc Low 0.087 0.437 0.666 0.897 1.000 0.655 0.256 777 0
linps_mc High −2.293 −0.963 −0.850 −0.668 14.836 −0.576 1.171 777 0
linps_mc Low −2.349 −0.253 0.689 2.165 25.878 1.637 3.334 777 0

The mental_care variable has all ps values smaller than 0.01 and above 1. This variable will be removed from the analysis.

5.1.5.1.19 highschool_grad
Code
psmodel_hs <- glm(dentist_grp ~  highschool_grad, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_hs) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −4.943 0.719 0.950 −6.382 −3.564 −6.872 Inf 0.000
highschool_grad 5.612 0.812 0.950 4.053 7.235 6.910 Inf 0.000
5.1.5.1.20 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_hs <- psmodel_hs$fitted

chr_fin1$linps_hs <- psmodel_hs$linear.predictors
5.1.5.1.21 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_hs + linps_hs ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_hs High 0.019 0.429 0.491 0.545 0.637 0.479 0.092 777 0
ps_hs Low 0.019 0.492 0.540 0.573 0.647 0.521 0.090 777 0
linps_hs High −3.961 −0.288 −0.035 0.182 0.564 −0.093 0.411 777 0
linps_hs Low −3.961 −0.033 0.160 0.295 0.608 0.068 0.464 777 0

The highschool_grad variable has two PS value less than 0.01, but overall looks acceptable.

5.1.5.1.22 some_college
Code
psmodel_sc <- glm(dentist_grp ~  some_college, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_sc) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) 6.212 0.360 0.950 5.521 6.934 17.250 Inf 0.000
some_college −10.244 0.582 0.950 −11.410 −9.126 −17.587 Inf 0.000
5.1.5.1.23 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_sc <- psmodel_sc$fitted

chr_fin1$linps_sc <- psmodel_sc$linear.predictors
5.1.5.1.24 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_sc + linps_sc ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_sc High 0.035 0.218 0.322 0.490 0.983 0.366 0.200 777 0
ps_sc Low 0.017 0.476 0.682 0.823 0.993 0.634 0.234 777 0
linps_sc High −3.320 −1.276 −0.744 −0.040 4.080 −0.641 1.015 777 0
linps_sc Low −4.031 −0.095 0.762 1.535 5.007 0.697 1.269 777 0

The some_college variable has two linear PS value less than 0.01 and 3 above 0.99. This variable will be removed.

5.1.5.1.25 unemployment
Code
psmodel_ue <- glm(dentist_grp ~  unemployment, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_ue) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −0.221 0.162 0.950 −0.541 0.096 −1.364 Inf 0.173
unemployment 6.279 4.373 0.950 −2.273 14.890 1.436 Inf 0.151
5.1.5.1.26 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_ue <- psmodel_ue$fitted

chr_fin1$linps_ue <- psmodel_ue$linear.predictors
5.1.5.1.27 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_ue + linps_ue ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_ue High 0.466 0.486 0.497 0.509 0.595 0.499 0.018 777 0
ps_ue Low 0.454 0.487 0.498 0.510 0.614 0.501 0.019 777 0
linps_ue High −0.137 −0.055 −0.014 0.036 0.383 −0.003 0.072 777 0
linps_ue Low −0.186 −0.051 −0.008 0.041 0.466 0.003 0.075 777 0

The unemployment variable has two linear PS values less than 0.01, but overall looks acceptable.

5.1.5.1.28 social_score
Code
psmodel_ss <- glm(dentist_grp ~  social_score, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_ss) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) 0.034 0.101 0.950 −0.164 0.232 0.332 Inf 0.740
social_score −0.003 0.008 0.950 −0.019 0.013 −0.384 Inf 0.701
5.1.5.1.29 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_ss <- psmodel_ss$fitted

chr_fin1$linps_ss <- psmodel_ss$linear.predictors
5.1.5.1.30 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_ss + linps_ss ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_ss High 0.479 0.498 0.500 0.502 0.508 0.500 0.004 777 0
ps_ss Low 0.463 0.497 0.501 0.503 0.508 0.500 0.006 777 0
linps_ss High −0.083 −0.007 0.002 0.009 0.034 0.000 0.016 777 0
linps_ss Low −0.149 −0.011 0.002 0.014 0.034 0.000 0.023 777 0

The social_score has two linear PS values less than 0.01, but overall looks acceptable.

5.1.5.1.31 life_expectancy
Code
psmodel_le <- glm(dentist_grp ~  life_expectancy, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_le) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) 15.042 1.291 0.950 12.551 17.613 11.652 Inf 0.000
life_expectancy −0.198 0.017 0.950 −0.231 −0.165 −11.665 Inf 0.000
5.1.5.1.32 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_le <- psmodel_le$fitted

chr_fin1$linps_le <- psmodel_le$linear.predictors
5.1.5.1.33 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_le + linps_le ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_le High 0.034 0.344 0.433 0.532 0.982 0.447 0.149 777 0
ps_le Low 0.049 0.446 0.568 0.664 0.960 0.553 0.148 777 0
linps_le High −3.333 −0.644 −0.268 0.128 3.994 −0.222 0.705 777 0
linps_le Low −2.967 −0.215 0.273 0.680 3.167 0.227 0.665 777 0

The life_expectancy variable has two linear PS less than 0.01 and above 1. The other PS values are also problematic, therefore, it will be removed.

5.1.5.1.34 insuff_sleep
Code
psmodel_is <- glm(dentist_grp ~  insuff_sleep, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_is) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −6.637 0.536 0.950 −7.704 −5.600 −12.371 Inf 0.000
insuff_sleep 19.431 1.565 0.950 16.407 22.546 12.412 Inf 0.000
5.1.5.1.35 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_is <- psmodel_is$fitted

chr_fin1$linps_is <- psmodel_is$linear.predictors
5.1.5.1.36 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_is + linps_is ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_is High 0.118 0.334 0.425 0.541 0.880 0.444 0.154 777 0
ps_is Low 0.190 0.439 0.560 0.674 0.922 0.556 0.160 777 0
linps_is High −2.012 −0.691 −0.303 0.164 1.990 −0.239 0.685 777 0
linps_is Low −1.449 −0.244 0.242 0.727 2.476 0.254 0.719 777 0

The insuff_sleep variable has two linear PS less than 0.01 and above 1.

5.1.5.1.37 home_ownership
Code
psmodel_ho <- glm(dentist_grp ~  home_ownership, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_ho) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −8.754 0.592 0.950 −9.940 −7.618 −14.787 Inf 0.000
home_ownership 12.103 0.809 0.950 10.551 13.723 14.966 Inf 0.000
5.1.5.1.38 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_ho <- psmodel_ho$fitted

chr_fin1$linps_ho <- psmodel_ho$linear.predictors
5.1.5.1.39 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_ho + linps_ho ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_ho High 0.003 0.240 0.397 0.556 0.897 0.401 0.207 777 0
ps_ho Low 0.000 0.507 0.626 0.726 0.927 0.599 0.175 777 0
linps_ho High −5.810 −1.150 −0.418 0.227 2.169 −0.545 1.155 777 0
linps_ho Low −8.754 0.029 0.514 0.974 2.544 0.415 0.945 777 0

The home_ownership variable has all PS values in problematic levels. Therefore, it will be removed.

5.1.5.1.40 housing_burden
Code
psmodel_hb <- glm(dentist_grp ~  housing_burden, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_hb) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) 2.601 0.196 0.950 2.223 2.991 13.274 Inf 0.000
housing_burden −23.546 1.724 0.950 −26.990 −20.228 −13.659 Inf 0.000
5.1.5.1.41 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_hb <- psmodel_hb$fitted

chr_fin1$linps_hb <- psmodel_hb$linear.predictors
5.1.5.1.42 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_hb + linps_hb ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_hb High 0.030 0.296 0.428 0.562 0.890 0.427 0.181 777 0
ps_hb Low 0.017 0.476 0.598 0.690 0.921 0.573 0.167 777 0
linps_hb High −3.483 −0.868 −0.289 0.250 2.090 −0.358 0.878 777 0
linps_hb Low −4.037 −0.097 0.397 0.801 2.461 0.312 0.785 777 0

The housing_burden variable has two linear PS less than 0.01 and above 1. All PS values are close to 0.99 or above, therefore, it will be removed.

5.1.5.1.43 below_18
Code
psmodel_b18 <- glm(dentist_grp ~  below_18, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_b18) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −0.147 0.311 0.950 −0.758 0.463 −0.472 Inf 0.637
below_18 0.684 1.430 0.950 −2.118 3.492 0.479 Inf 0.632
5.1.5.1.44 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_b18 <- psmodel_b18$fitted

chr_fin1$linps_b18 <- psmodel_b18$linear.predictors
5.1.5.1.45 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_b18 + linps_b18 ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_b18 High 0.480 0.496 0.500 0.503 0.526 0.500 0.006 777 0
ps_b18 Low 0.467 0.497 0.500 0.503 0.525 0.500 0.006 777 0
linps_b18 High −0.081 −0.015 0.000 0.012 0.106 0.000 0.024 777 0
linps_b18 Low −0.130 −0.013 0.001 0.013 0.101 0.000 0.025 777 0

The below_18 variable has two linear PS values less than 0.01 (0.00). Otherwise it looks acceptable.

5.1.5.1.46 above_65
Code
psmodel_a65 <- glm(dentist_grp ~  above_65, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_a65) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −1.663 0.228 0.950 −2.115 −1.220 −7.286 Inf 0.000
above_65 8.110 1.088 0.950 6.001 10.268 7.454 Inf 0.000
5.1.5.1.47 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_a65 <- psmodel_a65$fitted

chr_fin1$linps_a65 <- psmodel_a65$linear.predictors
5.1.5.1.48 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_a65 + linps_a65 ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_a65 High 0.228 0.416 0.464 0.534 0.805 0.481 0.097 777 0
ps_a65 Low 0.264 0.462 0.511 0.576 0.866 0.519 0.093 777 0
linps_a65 High −1.222 −0.340 −0.146 0.136 1.418 −0.076 0.404 777 0
linps_a65 Low −1.025 −0.153 0.046 0.307 1.865 0.082 0.395 777 0

The above_65 variable has two linear PS less than 0.01 and above 1.

5.1.5.1.49 asian
Code
psmodel_as <- glm(dentist_grp ~  asian, 
                family = binomial(), data= chr_fin1)
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Code
model_parameters(psmodel_as) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) 1.692 0.108 0.950 1.484 1.908 15.654 Inf 0.000
asian −132.404 9.115 0.950 −150.928 −115.192 −14.526 Inf 0.000
5.1.5.1.50 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_as <- psmodel_as$fitted

chr_fin1$linps_as <- psmodel_as$linear.predictors
5.1.5.1.51 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_as + linps_as ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_as High 0.000 0.023 0.337 0.628 0.844 0.333 0.287 777 0
ps_as Low 0.000 0.639 0.711 0.764 0.844 0.667 0.150 777 0
linps_as High −54.682 −3.745 −0.678 0.524 1.692 −3.050 6.809 777 0
linps_as Low −10.225 0.571 0.903 1.175 1.692 0.691 0.919 777 0

The asian variable has multiple PS values in problematic levels, therefore, it will be removed.

5.1.5.1.52 female
Code
psmodel_fm <- glm(dentist_grp ~  female, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_fm) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) 11.219 1.477 0.950 8.411 14.199 7.598 Inf 0.000
female −22.627 2.969 0.950 −28.616 −16.976 −7.620 Inf 0.000
5.1.5.1.53 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_fm <- psmodel_fm$fitted

chr_fin1$linps_fm <- psmodel_fm$linear.predictors
5.1.5.1.54 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_fm + linps_fm ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_fm High 0.232 0.428 0.467 0.510 0.990 0.476 0.084 777 0
ps_fm Low 0.136 0.453 0.502 0.558 0.998 0.524 0.120 777 0
linps_fm High −1.195 −0.288 −0.133 0.038 4.628 −0.090 0.400 777 0
linps_fm Low −1.851 −0.187 0.010 0.234 6.339 0.135 0.649 777 0

The female variable has two linear PS less than 0.01 all PS values at or above 0.99. Therefore, it will be removed.

5.1.5.1.55 rural
Code
psmodel_rl <- glm(dentist_grp ~  rural, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_rl) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −3.521 0.185 0.950 −3.894 −3.170 −19.076 Inf 0.000
rural 5.291 0.240 0.950 4.834 5.773 22.093 Inf 0.000
5.1.5.1.56 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_rl <- psmodel_rl$fitted

chr_fin1$linps_rl <- psmodel_rl$linear.predictors
5.1.5.1.57 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_rl + linps_rl ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_rl High 0.029 0.049 0.118 0.366 0.854 0.262 0.292 777 0
ps_rl Low 0.029 0.658 0.854 0.854 0.854 0.738 0.202 777 0
linps_rl High −3.521 −2.958 −2.011 −0.552 1.770 −1.525 1.755 777 0
linps_rl Low −3.493 0.653 1.770 1.770 1.770 1.176 1.018 777 0

The rural variable has two linear PS less than 0.01 and value above 1. We might remove this variable.

5.1.5.1.58 obesity
Code
psmodel_ob <- glm(dentist_grp ~  obesity, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_ob) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −8.412 0.549 0.950 −9.512 −7.360 −15.327 Inf 0.000
obesity 22.823 1.472 0.950 20.000 25.773 15.504 Inf 0.000
5.1.5.1.59 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_ob <- psmodel_ob$fitted

chr_fin1$linps_ob <- psmodel_ob$linear.predictors
5.1.5.1.60 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_ob + linps_ob ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_ob High 0.012 0.220 0.390 0.565 0.944 0.399 0.221 777 0
ps_ob Low 0.041 0.491 0.609 0.734 0.973 0.601 0.174 777 0
linps_ob High −4.440 −1.268 −0.446 0.261 2.817 −0.542 1.175 777 0
linps_ob Low −3.162 −0.036 0.444 1.014 3.570 0.458 0.850 777 0

The obesity variable has two linear PS less than 0.01 and all PS values around 0.99. We will remove this variable.

5.1.5.1.61 segregation
Code
psmodel_se <- glm(dentist_grp ~  segregation, 
                family = binomial(), data= chr_fin1)

model_parameters(psmodel_se) |> 
  gt() |> fmt_number(decimals = 3) |> 
  opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) 2.099 0.185 0.950 1.742 2.468 11.339 Inf 0.000
segregation −0.041 0.003 0.950 −0.048 −0.035 −11.916 Inf 0.000
5.1.5.1.62 Saving Propesnsity Score

Now we will save the new propensity scores:

Code
chr_fin1$ps_se <- psmodel_se$fitted

chr_fin1$linps_se <- psmodel_se$linear.predictors
5.1.5.1.63 Comparing The PS Distribution of The Two Exposure Groups
Code
df_stats(ps_se + linps_se ~ dentist_grp, data = chr_fin1) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps_se High 0.165 0.355 0.445 0.530 0.886 0.448 0.135 777 0
ps_se Low 0.145 0.432 0.565 0.679 0.886 0.552 0.166 777 0
linps_se High −1.618 −0.595 −0.222 0.119 2.055 −0.217 0.594 777 0
linps_se Low −1.776 −0.275 0.263 0.749 2.055 0.233 0.741 777 0

The segregation variable has two linear PS less than 0.01 and all PS values around 0.99. We might remove this variable.

5.1.6 PS After Covariates Removal

We will start by building a subset for the first outcome in our analysis chr_fin2 after removing the problematic covariates from the original chr_fin set.

Code
chr_fin2 <- chr_fin|>
  dplyr::select( -hispanic, -white, -black, -native, -ps0, -linps0, -food_index, -uninsured, -primary_care, -mental_care, -some_college, -life_expectancy, -housing_burden, -home_ownership, -female, -rural, -asian, -obesity)

Now, we will build our psmodel1.

Code
psmodel1 <- glm(dentist_grp ~  poor_mental + unemployment  + income_inequality + social_score + 
                  diabetes  + insuff_sleep + highschool_grad + segregation +
                  below_18 + above_65 ,
                family = binomial(), data= chr_fin2)

model_parameters(psmodel1) |> 
  gt() |> gt::fmt_number(decimals = 3) |> 
  gt::opt_stylize(style = 1, color = "pink")
Parameter Coefficient SE CI CI_low CI_high z df_error p
(Intercept) −15.234 1.450 0.950 −18.122 −12.435 −10.508 Inf 0.000
poor_mental 0.415 0.146 0.950 0.130 0.703 2.839 Inf 0.005
unemployment −32.170 6.628 0.950 −45.231 −19.222 −4.854 Inf 0.000
income_inequality −0.568 0.098 0.950 −0.764 −0.378 −5.779 Inf 0.000
social_score −0.024 0.010 0.950 −0.045 −0.004 −2.364 Inf 0.018
diabetes 57.055 5.926 0.950 45.665 68.906 9.629 Inf 0.000
insuff_sleep 2.959 3.121 0.950 −3.167 9.078 0.948 Inf 0.343
highschool_grad 6.238 0.875 0.950 4.558 8.003 7.132 Inf 0.000
segregation −0.027 0.004 0.950 −0.036 −0.019 −6.547 Inf 0.000
below_18 7.428 2.417 0.950 2.711 12.196 3.073 Inf 0.002
above_65 21.644 1.817 0.950 18.152 25.280 11.912 Inf 0.000
Code
chr_fin2$ps1 <- psmodel1$fitted

chr_fin2$linps1 <- psmodel1$linear.predictors
Code
df_stats(ps1 + linps1 ~ dentist_grp, data = chr_fin2) |>
  gt() |> fmt_number(columns = c(-n, -missing), decimals = 3) |> 
  opt_stylize(style = 2, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
ps1 High 0.002 0.129 0.257 0.463 0.993 0.320 0.240 777 0
ps1 Low 0.005 0.528 0.731 0.872 0.999 0.680 0.232 777 0
linps1 High −6.437 −1.911 −1.063 −0.150 4.940 −0.987 1.448 777 0
linps1 Low −5.230 0.112 1.000 1.923 6.709 1.032 1.435 777 0

The problematic values in the propensity score are persistent and we will need to investigate this using other methods. For now, we will proceed to the analysis phase.

6 Outcome 1 Analysis

6.1 The Blanace Before PS Adjustment

Code
ggplot(chr_fin2, aes(x = dentist_grp, y = ps1, color = dentist_grp)) + 
    geom_boxplot() +
    geom_jitter(width = 0.1) + 
    guides(color = "none")

Code
ggplot(chr_fin2, aes(x = linps1, fill = dentist_grp)) +
    geom_density(alpha = 0.3)

There is a good overlap between the two groups.

6.1.1 Rubin’s Rule 1

Code
rubin1.unadj <- with(chr_fin2,
     abs(100*(mean(linps1[dentist_n==1])-mean(linps1[dentist_n==0]))/sd(linps1)))
rubin1.unadj
[1] 114.7657

The absolute value of the standardized difference of the linear propensity score comparing the counties with high dentist proportions and low dentist proportions is 114.8%. This value is too far from the ideal value of 0% and above the maximum acceptable value of 50 %. We definitely do not pass Rubin’s rule 1, therefore, propensity score adjustment is needed.

6.1.2 Rubin’s Rule 2

Code
rubin2.unadj <-with(chr_fin2, var(linps1[dentist_n==1])/var(linps1[dentist_n==0]))
rubin2.unadj
[1] 0.9818352

The ratio of variances of the linear propensity score comparing the counties with high dentist proportions and low dentist proportions is 0.98. This value is very close to the desired value of 1.0. In this case, we pass Rubin’s Rule 2.

6.2 Unadjusted Outcome 1 smoking_prop

We will observe the the distribution of the outcome smoking_prop among the two exposure groups dentist_grp.

Code
df_stats(smoking_prop ~ dentist_grp, data = chr_fin2) |>
  gt() |> fmt_number(mean:sd, decimals = 2) |>
  opt_stylize(style = 5, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
smoking_prop High 6.821752 14.04568 15.86923 18.30110 37.02901 16.41 3.68 777 0
smoking_prop Low 7.687424 15.68921 18.72868 21.37735 41.20176 18.78 3.76 777 0

The high dentist group has a lower mean of smoking proportions (16.4 vs 18.8).

6.2.1 Linear Regression Model for Outcome 1

We will build a linear regression model for outcome 1 smoking_prop and the exposure dentist_prop without including the covariates.

Code
unadj.out1 <- lm(smoking_prop ~ dentist_grp, data= chr_fin2)

model_parameters(unadj.out1, ci = 0.95) |> 
  gt() |> fmt_number(columns = -df_error, decimals = 3) |>
  opt_stylize(style = 6, color = "pink")
Parameter Coefficient SE CI CI_low CI_high t df_error p
(Intercept) 16.412 0.133 0.950 16.150 16.673 123.071 1552 0.000
dentist_grpLow 2.364 0.189 0.950 1.994 2.734 12.533 1552 0.000

The unadjusted model estimates that low dentist proportions increases smoking prevalence in US counties by 2.36 percentage points (95% CI: 1.99, 2.73).

6.3 1:1 Matching With Replacement

The binary exposure has an equal number of observations in each group (n=777). Therefore, we will use 1:1 matching with replacement.

Code
X <- chr_fin2$linps1 
Tr <- as.logical(chr_fin2$dentist_n)
match1 <- Match(Tr=Tr, X=X, M = 1, replace = TRUE, ties=FALSE)
summary(match1)

Estimate...  0 
SE.........  0 
T-stat.....  NaN 
p.val......  NA 

Original number of observations..............  1554 
Original number of treated obs...............  777 
Matched number of observations...............  777 
Matched number of observations  (unweighted).  777 

checking if matching worked:

Code
n_distinct(match1$index.treated)
[1] 777
Code
n_distinct(match1$index.control)
[1] 278

After 1:1 matching with replacement, the effective size was reduced to 272 matched counties from the treatment and control group.

6.3.1 Balance Assessment

Code
set.seed(500)
mb1 <- MatchBalance(dentist_n ~ poor_mental + unemployment  + income_inequality + social_score + 
                  diabetes  + insuff_sleep + highschool_grad + segregation +
                  below_18 + above_65 + ps1 + linps1, 
                  data= chr_fin2, 
                    match.out = match1, nboots=500)

***** (V1) poor_mental *****
                       Before Matching       After Matching
mean treatment........      5.354             5.354 
mean control..........     5.0152            5.2984 
std mean diff.........     52.617            8.6282 

mean raw eQQ diff.....    0.34041          0.082562 
med  raw eQQ diff.....    0.36298          0.073533 
max  raw eQQ diff.....    0.73693           0.73693 

mean eCDF diff........    0.16003          0.034772 
med  eCDF diff........    0.18919          0.027027 
max  eCDF diff........    0.24324           0.10296 

var ratio (Tr/Co).....       1.18            1.0251 
T-test p-value........ < 2.22e-16          0.068521 
KS Bootstrap p-value.. < 2.22e-16             0.002 
KS Naive p-value...... < 2.22e-16        0.00052946 
KS Statistic..........    0.24324           0.10296 


***** (V2) unemployment *****
                       Before Matching       After Matching
mean treatment........   0.035706          0.035706 
mean control..........   0.034857          0.035613 
std mean diff.........     7.1471           0.77874 

mean raw eQQ diff.....  0.0010747         0.0010116 
med  raw eQQ diff.....   0.001005        0.00071892 
max  raw eQQ diff.....   0.013165          0.018658 

mean eCDF diff........   0.025066          0.020112 
med  eCDF diff........   0.020592          0.015444 
max  eCDF diff........   0.059202          0.065637 

var ratio (Tr/Co).....     1.0862            1.0782 
T-test p-value........    0.15077            0.8748 
KS Bootstrap p-value..      0.146             0.072 
KS Naive p-value......    0.13128          0.070342 
KS Statistic..........   0.059202          0.065637 


***** (V3) income_inequality *****
                       Before Matching       After Matching
mean treatment........     4.6117            4.6117 
mean control..........     4.5373            4.6189 
std mean diff.........     7.7974          -0.75458 

mean raw eQQ diff.....    0.15959           0.12362 
med  raw eQQ diff.....    0.10178          0.062316 
max  raw eQQ diff.....     2.0677            2.5917 

mean eCDF diff........   0.037661          0.024458 
med  eCDF diff........   0.041184          0.024453 
max  eCDF diff........   0.073359          0.056628 

var ratio (Tr/Co).....     1.6836            1.6035 
T-test p-value........   0.085366            0.8694 
KS Bootstrap p-value..      0.032             0.182 
KS Naive p-value......   0.030552           0.16546 
KS Statistic..........   0.073359          0.056628 


***** (V4) social_score *****
                       Before Matching       After Matching
mean treatment........     10.845            10.845 
mean control..........     10.969            11.278 
std mean diff.........    -1.6808           -5.8744 

mean raw eQQ diff.....     1.7301           0.88592 
med  raw eQQ diff.....     1.2191           0.90882 
max  raw eQQ diff.....     21.371            21.456 

mean eCDF diff........   0.059115          0.046787 
med  eCDF diff........   0.052767           0.05148 
max  eCDF diff........    0.12999          0.095238 

var ratio (Tr/Co).....     2.0485            1.2144 
T-test p-value........    0.70099            0.2186 
KS Bootstrap p-value.. < 2.22e-16             0.004 
KS Naive p-value...... 3.9747e-06          0.001739 
KS Statistic..........    0.12999          0.095238 


***** (V5) diabetes *****
                       Before Matching       After Matching
mean treatment........    0.11284           0.11284 
mean control..........   0.095932           0.11175 
std mean diff.........     71.142            4.5818 

mean raw eQQ diff.....   0.016906         0.0019331 
med  raw eQQ diff.....      0.018             0.001 
max  raw eQQ diff.....      0.033             0.045 

mean eCDF diff........    0.13194          0.014661 
med  eCDF diff........   0.099743          0.010296 
max  eCDF diff........    0.34234          0.061776 

var ratio (Tr/Co).....     1.5867            1.0317 
T-test p-value........ < 2.22e-16           0.27851 
KS Bootstrap p-value.. < 2.22e-16             0.074 
KS Naive p-value...... < 2.22e-16           0.10308 
KS Statistic..........    0.34234          0.061776 


***** (V6) insuff_sleep *****
                       Before Matching       After Matching
mean treatment........    0.35465           0.35465 
mean control..........    0.32925           0.35599 
std mean diff.........     68.689           -3.6185 

mean raw eQQ diff.....     0.0254         0.0078744 
med  raw eQQ diff.....      0.027             0.006 
max  raw eQQ diff.....      0.032             0.029 

mean eCDF diff........    0.13895          0.042921 
med  eCDF diff........    0.11583          0.043115 
max  eCDF diff........     0.3166           0.10682 

var ratio (Tr/Co).....     1.1012           0.73316 
T-test p-value........ < 2.22e-16           0.47547 
KS Bootstrap p-value.. < 2.22e-16        < 2.22e-16 
KS Naive p-value...... < 2.22e-16        0.00028217 
KS Statistic..........     0.3166           0.10682 


***** (V7) highschool_grad *****
                       Before Matching       After Matching
mean treatment........    0.89305           0.89305 
mean control..........    0.86432           0.88698 
std mean diff.........      34.78            7.3467 

mean raw eQQ diff.....   0.033037          0.012742 
med  raw eQQ diff.....   0.029953              0.01 
max  raw eQQ diff.....    0.37636           0.37636 

mean eCDF diff........    0.15082          0.047224 
med  eCDF diff........    0.16345          0.045045 
max  eCDF diff........    0.26641           0.13256 

var ratio (Tr/Co).....     1.2754            1.4269 
T-test p-value........ 6.2084e-13           0.11513 
KS Bootstrap p-value.. < 2.22e-16        < 2.22e-16 
KS Naive p-value...... < 2.22e-16         2.351e-06 
KS Statistic..........    0.26641           0.13256 


***** (V8) segregation *****
                       Before Matching       After Matching
mean treatment........     45.162            45.162 
mean control..........     56.059            48.835 
std mean diff.........    -60.758           -20.479 

mean raw eQQ diff.....     10.918            4.6257 
med  raw eQQ diff.....     10.675            4.5646 
max  raw eQQ diff.....     17.966            9.7518 

mean eCDF diff........    0.18644          0.078972 
med  eCDF diff........    0.20721          0.066924 
max  eCDF diff........    0.33076            0.1982 

var ratio (Tr/Co).....     1.5551            1.2339 
T-test p-value........ < 2.22e-16        2.1495e-06 
KS Bootstrap p-value.. < 2.22e-16        < 2.22e-16 
KS Naive p-value...... < 2.22e-16        1.1099e-13 
KS Statistic..........    0.33076            0.1982 


***** (V9) below_18 *****
                       Before Matching       After Matching
mean treatment........    0.21521           0.21521 
mean control..........    0.21435           0.21543 
std mean diff.........     2.3763           -0.6046 

mean raw eQQ diff.....  0.0033666         0.0038355 
med  raw eQQ diff.....  0.0025852         0.0029644 
max  raw eQQ diff.....   0.071666          0.099059 

mean eCDF diff........   0.023328          0.025996 
med  eCDF diff........   0.023166          0.027027 
max  eCDF diff........   0.050193          0.068211 

var ratio (Tr/Co).....     1.0897           0.96816 
T-test p-value........    0.63248           0.90575 
KS Bootstrap p-value..       0.29             0.046 
KS Naive p-value......    0.28165          0.053823 
KS Statistic..........   0.050193          0.068211 


***** (V10) above_65 *****
                       Before Matching       After Matching
mean treatment........    0.21521           0.21521 
mean control..........    0.19577            0.2255 
std mean diff.........     39.892           -21.115 

mean raw eQQ diff.....   0.019441          0.016389 
med  raw eQQ diff.....    0.02113          0.010991 
max  raw eQQ diff.....   0.055136          0.060015 

mean eCDF diff........    0.13214          0.066599 
med  eCDF diff........    0.13385          0.061776 
max  eCDF diff........    0.24453           0.16216 

var ratio (Tr/Co).....    0.95543           0.58912 
T-test p-value........ 1.3767e-14         0.0002428 
KS Bootstrap p-value.. < 2.22e-16        < 2.22e-16 
KS Naive p-value...... < 2.22e-16        2.6751e-09 
KS Statistic..........    0.24453           0.16216 


***** (V11) ps1 *****
                       Before Matching       After Matching
mean treatment........    0.68035           0.68035 
mean control..........    0.31965           0.68039 
std mean diff.........     155.79          -0.01788 

mean raw eQQ diff.....    0.36069         0.0014057 
med  raw eQQ diff.....    0.40833        0.00093887 
max  raw eQQ diff.....    0.48292         0.0093827 

mean eCDF diff........    0.34969         0.0025874 
med  eCDF diff........    0.38867          0.001287 
max  eCDF diff........    0.56113          0.019305 

var ratio (Tr/Co).....    0.93354           0.99996 
T-test p-value........ < 2.22e-16           0.56947 
KS Bootstrap p-value.. < 2.22e-16                 1 
KS Naive p-value...... < 2.22e-16           0.99869 
KS Statistic..........    0.56113          0.019305 


***** (V12) linps1 *****
                       Before Matching       After Matching
mean treatment........     1.0321            1.0321 
mean control..........   -0.98732            1.0249 
std mean diff.........     140.76           0.50261 

mean raw eQQ diff.....     2.0194          0.021934 
med  raw eQQ diff.....     2.0486         0.0054214 
max  raw eQQ diff.....     2.7235            1.7689 

mean eCDF diff........    0.34969         0.0025874 
med  eCDF diff........    0.38867          0.001287 
max  eCDF diff........    0.56113          0.019305 

var ratio (Tr/Co).....    0.98184            1.0325 
T-test p-value........ < 2.22e-16          0.035311 
KS Bootstrap p-value.. < 2.22e-16                 1 
KS Naive p-value...... < 2.22e-16           0.99869 
KS Statistic..........    0.56113          0.019305 


Before Matching Minimum p.value: < 2.22e-16 
Variable Name(s): poor_mental social_score diabetes insuff_sleep highschool_grad segregation above_65 ps1 linps1  Number(s): 1 4 5 6 7 8 10 11 12 

After Matching Minimum p.value: < 2.22e-16 
Variable Name(s): insuff_sleep highschool_grad segregation above_65  Number(s): 6 7 8 10 

6.3.2 Tabulating Standardized Differences

We will name the covariates used in our model.

Code
covnames <- c( "poor_mental", "unemployment", "income_inequality", "social_score", 
                  "diabetes", "insuff_sleep", "highschool_grad", "segregation",
                  "below_18", "above_65", "raw PS", "linear PS") 

Then we will extract the standardized differences.

Code
pre.szd <- NULL; post.szd <- NULL
for(i in 1:length(covnames)) {
  pre.szd[i] <- mb1$BeforeMatching[[i]]$sdiff.pooled
  post.szd[i] <- mb1$AfterMatching[[i]]$sdiff.pooled
}

Now, we can have our final table.

Code
match_szd <- tibble(covnames, pre.szd, post.szd, row.names=covnames)
match_szd
# A tibble: 12 × 4
   covnames          pre.szd post.szd row.names        
   <chr>               <dbl>    <dbl> <chr>            
 1 poor_mental         54.7    8.63   poor_mental      
 2 unemployment         7.29   0.779  unemployment     
 3 income_inequality    8.73  -0.755  income_inequality
 4 social_score        -1.95  -5.87   social_score     
 5 diabetes            78.8    4.58   diabetes         
 6 insuff_sleep        70.3   -3.62   insuff_sleep     
 7 highschool_grad     36.8    7.35   highschool_grad  
 8 segregation        -67.0  -20.5    segregation      
 9 below_18             2.43  -0.605  below_18         
10 above_65            39.4  -21.1    above_65         
11 raw PS             153.    -0.0179 raw PS           
12 linear PS          140.     0.503  linear PS        

We will use these results to build the love plot to better visualize the standardized differences.

6.3.3 Visualizing Standardized Differences

We will use the cobalt method to build plots for the standardized differences and the variance ratios.

Code
b <- bal.tab(match1, 
             dentist_n ~ poor_mental + unemployment  + income_inequality + social_score + 
                  diabetes  + insuff_sleep + highschool_grad + segregation +
                  below_18 + above_65 + ps1 + linps1, 
             data= chr_fin2, stats = c("m", "v"), un = TRUE,
             quick = FALSE)
b
Balance Measures
                     Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
poor_mental       Contin.  0.5262     1.1800   0.0863      1.0184
unemployment      Contin.  0.0715     1.0862   0.0078      1.0711
income_inequality Contin.  0.0780     1.6836  -0.0075      1.5929
social_score      Contin. -0.0168     2.0485  -0.0587      1.2064
diabetes          Contin.  0.7114     1.5867   0.0458      1.0249
insuff_sleep      Contin.  0.6869     1.1012  -0.0362      0.7283
highschool_grad   Contin.  0.3478     1.2754   0.0735      1.4175
segregation       Contin. -0.6076     1.5551  -0.2048      1.2258
below_18          Contin.  0.0238     1.0897  -0.0060      0.9618
above_65          Contin.  0.3989     0.9554  -0.2111      0.5852
ps1               Contin.  1.5579     0.9335  -0.0002      0.9934
linps1            Contin.  1.4076     0.9818   0.0050      1.0257

Sample sizes
                     Control Treated
All                   777.       777
Matched (ESS)         127.23     777
Matched (Unweighted)  278.       777
Unmatched             499.         0

As previously mentioned, the effective size after 1:1 matching with replacement is 278 counties. We lost 499 counties from the sample because they were unmatched.

Now we will build the love plot for the standardized differences:

Code
lplot1 <- love.plot(b, threshold = .1, size = 3,
               var.order = "unadjusted", stars = "raw",
               title = "Standardized Differences and 1:1 Matching with Replacement")
lplot1

All covariates have improved standardized difference after 1:1 matching adjustment, with the exception for social_score.

Plot of Variance Ratios

Code
lplot2 <- love.plot(b, stat = "v",
               threshold = 1.25, size = 3,
               var.order = "unadjusted",
               title = "Variance Ratios and 1:1 Matching with Replacement")
lplot2 + theme_bw()

Most of the covariates displayed improved variance ratios with the exception of income_inequality, highschool_grad, insuff_sleep, and above_65.

6.3.4 Building New Data Frame with Matched Sample

Code
matches1 <- factor(rep(match1$index.treated, 2))
chr.matchedsample1 <- cbind(matches1, chr_fin2[c(match1$index.control, match1$index.treated),])
Code
chr.matchedsample1 |> count(dentist_grp)
  dentist_grp   n
1        High 777
2         Low 777

Because the 1:1 matching method we used includes replacement, it is only natural to retain the original number of observations in the exposure groups.

6.3.5 Rubin’s Rules After Matching

First, we will check Rubin’s Rule 1:

Code
rubin1.match1 <- with(chr.matchedsample1,
      abs(100*(mean(linps1[dentist_n==1])-mean(linps1[dentist_n==0]))/sd(linps1)))
rubin1.match1
[1] 0.5067778

The absolute value of the standardized difference of the linear propensity score before adjustment was 114.8%. The value after adjustment (52.76%) shows substantial improvement as it is close to the maximum value of 50%, but the value remains above 10% and is not close to 0%. In this case, we barely pass Rubin’s Rule 1.

Next, we will check Rubin’s Rule 2:

Code
rubin2.match1 <- with(chr.matchedsample1, var(linps1[dentist_n==1])/var(linps1[dentist_n==0]))
rubin2.match1
[1] 1.032521

The previous ratio of the variance of the linear propensity score was 0.98. This value also shows mild improvement where the value is closer to 1 and remains below 2. We also pass Rubin’s Rule 2 and can proceed to use the results of this PS adjustment to measure the estimates.

6.3.6 Estimating Effect of Exposure on Outcome After 1:1 Matching

We will use the ATT (Average Treatment Effect on the Treated) estimate to measure the dentist proportion effect on smoking prevalence in US Counties. First, we will need to create a data frame out of the matched sample:

Code
matches1 <- factor(rep(match1$index.treated, 2))
chr.matchedsamplef <- cbind(matches1, chr_fin2[c(match1$index.control, match1$index.treated),])

Now, we can produce an estimate using the matched sample from chr_fin2.

Code
chr.matchedsamplef$matches.f <- as.factor(chr.matchedsamplef$matches1)

matched_mix.mod1 <- lmer(smoking_prop ~ dentist_n + (1 | matches.f), REML = TRUE, data=chr.matchedsamplef)

tidy_matched_mix.mod1 <- broom.mixed::tidy(matched_mix.mod1, conf.int = TRUE, conf.level = 0.95) |> 
  filter(term == "dentist_n")

tidy_matched_mix.mod1 |> kable(digits = 3)
effect group term estimate std.error statistic conf.low conf.high
fixed NA dentist_n 0.776 0.195 3.985 0.394 1.158

After 1:1 matching with replacement, our model estimates the effect of low dentists proportions on smoking prevalence in US counties to increase by 0.78 percentage points (95% CI: 0.39, 1.16). Our confidence interval does not include 0, therefore, following up with a sensitivity analysis is a reasonable choice.

6.4 Use ATT Weighting Adjustment

We will start by creating a data frame of our data set.

Code
chrfin_df <- base::data.frame(chr_fin2)

Now, we will calculate the propensity score using the ps function from the twang package.

Code
ps.chr <- twang::ps(dentist_n ~ poor_mental + unemployment  + income_inequality + social_score + 
                  diabetes  + insuff_sleep + highschool_grad + segregation +
                  below_18 + above_65 + ps1 + linps1,
             data = chrfin_df,
             n.trees = 5000,
             interaction.depth = 2,
             stop.method = c("es.mean"),
             estimand = "ATT",
             verbose = FALSE)

Checking the iteration:

Code
plot(ps.chr)

We can see that we have run enough iterations to reach a stable estimate.

Observing the effective sample size of the weighted estimates:

Code
summary(ps.chr)
            n.treat n.ctrl ess.treat ess.ctrl    max.es   mean.es    max.ks
unw             777    777       777 777.0000 1.5578591 0.5361848 0.5611326
es.mean.ATT     777    777       777 163.8941 0.1794596 0.0667433 0.1205385
            max.ks.p    mean.ks iter
unw               NA 0.26490776   NA
es.mean.ATT       NA 0.07668971 1111

The effective size after ATT weighting is 163 counties.

6.4.1 Checking the balance

Code
plot(ps.chr, plots = 2)

Code
plot(ps.chr, plots = 3)

We can see that the weighted estimates have an absolute standard differences are below 0.5 and are closer to 0 compared to the unadjusted estimates.

We can also assess the balance using the cobalt method.

Code
b2 <- cobalt::bal.tab(ps.chr, full.stop.method = "es.mean.att", 
        stats = c("m", "v"), un = TRUE)

b2
Balance Measures
                      Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
prop.score        Distance  2.2674     0.7698   0.5561      0.6932
poor_mental        Contin.  0.5262     1.1800   0.0079      0.9499
unemployment       Contin.  0.0715     1.0862   0.0512      1.1563
income_inequality  Contin.  0.0780     1.6836   0.0068      1.3792
social_score       Contin. -0.0168     2.0485   0.0007      1.0241
diabetes           Contin.  0.7114     1.5867   0.1795      1.4289
insuff_sleep       Contin.  0.6869     1.1012   0.0630      0.9451
highschool_grad    Contin.  0.3478     1.2754  -0.0281      1.3991
segregation        Contin. -0.6076     1.5551  -0.0901      0.9325
below_18           Contin.  0.0238     1.0897  -0.0110      1.0725
above_65           Contin.  0.3989     0.9554  -0.0601      0.8466
ps1                Contin.  1.5579     0.9335   0.1684      0.8879
linps1             Contin.  1.4076     0.9818   0.1342      0.8905

Effective sample sizes
           Control Treated
Unadjusted  777.       777
Adjusted    163.89     777

We will visualize the standardized differences using the Love plot.

Code
lplot3 <- cobalt::love.plot(b2, 
               threshold = .1, size = 3, stars = "raw",
               title = "Standardized Diffs and TWANG ATT weighting")
lplot3

All of the variables have observable improved standardized difference after TWANG ATT weighting adjustment.

Visualizing the variance ratio:

Code
lplot4 <- cobalt::love.plot(b2, stat = "v",
               threshold = 1.25, size = 3, 
               title = "Variance Ratios: TWANG ATT weighting")
lplot4 + theme_bw()

Most of the variables show some improvement in the variance ration after adjustment, though it is not very satisfying.

6.4.2 Estimating The Causal Effect After ATT Weight Adjustment

First, we will build the weight reference wts1 (which is the inverse propensity score) and add it to our data set chr_out1.

Code
chr_fin2$wts1 <- ifelse(chr_fin2$dentist_n==1, 1, chr_fin2$ps1/(1-chr_fin2$ps1))
Code
chrwt1.design <- survey::svydesign(ids=~1, weights=~wts1, data= chr_fin2) 

adjout1.wt1 <- survey::svyglm(smoking_prop ~ dentist_n, design= chrwt1.design)

wt_att_results1 <- tidy(adjout1.wt1, conf.int = TRUE) |> filter(term == "dentist_n")

wt_att_results1
# A tibble: 1 × 7
  term      estimate std.error statistic p.value conf.low conf.high
  <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1 dentist_n   -0.434      1.09    -0.398   0.691    -2.57      1.71

After ATT weight adjustment, the estimated impact of low dentist proportions on smoking prevalence in US counties is decrease of -0.43 percentage points (95% CI: -2.57, 1.71).

6.5 Use Double Robust Adjustment

Code
chr.dr.design <- survey::svydesign(ids=~1, weights=~wts1, data= chr_fin2) 

dr.out1.wt1 <- survey::svyglm(smoking_prop ~ dentist_n + linps1, design= chr.dr.design)

dr_att_out1 <- tidy(dr.out1.wt1, conf.int = TRUE) |> filter(term == "dentist_n")

dr_att_out1
# A tibble: 1 × 7
  term      estimate std.error statistic p.value conf.low conf.high
  <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1 dentist_n    0.723     0.453      1.60   0.111   -0.166      1.61

After Double Robust adjustment, the estimated impact of low dentist proportions on smoking prevalence in US counties is an increase of 0.72 percentage points (95% CI: -0.17, 1.61). These results are slightly different from the ATT Weighting results.

6.6 Compare Results

Code
chr_results <- tibble(
    analysis = c("Unadjusted", "PS 1:1 Match", "TWANG ATT WT", "Double Robust"),
    estimate = c(2.36, 0.78, -0.43, 0.72),
    conf.low = c(1.99, 0.39, -2.57, -0.17),
    conf.high = c(2.73, 1.16, 1.71, 1.61))

ggplot(chr_results, aes(x = analysis, y = estimate)) +
    geom_errorbar(aes(ymax = conf.high, ymin = conf.low), width = 0.5) + 
    geom_label(aes(label = estimate), size = 5)

We can see that propensity score adjustment decreased the estimated affect of dentist proportions on smoking prevalence in US counties (from 2.36% to about 0.7-0.8%) . While two of the three adjustments present close estimates, the 1:1 matching with replacement adjustment displays the smallest confidence interval, the best balance, and largest effective size (278 after 1:1 matching compared to 163 after ATT weighting).

6.7 Sensitivity Analysis

The estimate produced from the 1:1 Matching with replacement has a statistically significant p-value. Therefore, we will need to perform a sensitivity analysis.

Code
Rc <- match1$mdata$Y[match1$mdata$Tr==0]
Rt <- match1$mdata$Y[match1$mdata$Tr==1] 


psens(Rt, Rc, Gamma = 5, GammaInc = 0.25)

 Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value 
 
Unconfounded estimate ....  NaN 

 Gamma Lower bound Upper bound
  1.00         NaN         NaN
  1.25         NaN         NaN
  1.50         NaN         NaN
  1.75         NaN         NaN
  2.00         NaN         NaN
  2.25         NaN         NaN
  2.50         NaN         NaN
  2.75         NaN         NaN
  3.00         NaN         NaN
  3.25         NaN         NaN
  3.50         NaN         NaN
  3.75         NaN         NaN
  4.00         NaN         NaN
  4.25         NaN         NaN
  4.50         NaN         NaN
  4.75         NaN         NaN
  5.00         NaN         NaN

 Note: Gamma is Odds of Differential Assignment To
 Treatment Due to Unobserved Factors 
 

Our conclusions regarding the impact of low dentists proportions increasing smoking prevalence in US counties hold true in absence of biases in the study. This conclusion is maintained in the presence of a biases level between L= 1.00 and L= 1.25. This means that only small hidden bias is needed to change our conclusions.

7 Outcome 2 Analysis

7.1 Unadjusted Outcome 2 drinking_prop

We will observe the the distribution of the outcome drinking_prop among the two exposure groups dentist_grp.

Code
df_stats(drinking_prop ~ dentist_grp, data = chr_fin2) |>
  gt() |> fmt_number(mean:sd, decimals = 2) |>
  opt_stylize(style = 5, color = "pink")
response dentist_grp min Q1 median Q3 max mean sd n missing
drinking_prop High 10.140012 16.79649 18.88142 20.74658 29.44012 18.83 3.24 777 0
drinking_prop Low 9.265156 14.09065 16.03230 18.32060 27.81331 16.26 2.91 777 0

The low dentist group has a lower mean of excessive drinking prevalence (16.3 vs 18.8).

7.1.1 Linear Regression Model for Outcome 2

We will build a linear regression model for outcome 2 drinking_prop and the exposure dentist_grp without including the covariates.

Code
unadj.out2 <- lm(drinking_prop ~ dentist_grp, data= chr_fin2)

model_parameters(unadj.out2, ci = 0.95) |> 
  gt() |> fmt_number(columns = -df_error, decimals = 3) |>
  opt_stylize(style = 6, color = "pink")
Parameter Coefficient SE CI CI_low CI_high t df_error p
(Intercept) 18.825 0.110 0.950 18.609 19.042 170.493 1552 0.000
dentist_grpLow −2.563 0.156 0.950 −2.869 −2.256 −16.412 1552 0.000

The unadjusted model estimates that low dentist proportions decreases excessive drinking prevalence in US counties by 2.56 percentage points (95% CI: -2.87, -2.26).

7.2 1:1 Matching With Replacement

The binary exposure has an equal number of observations in each group (n=777). Therefore, we will use 1:1 matching with replacement.

Code
X <- chr_fin2$linps1 
Tr <- as.logical(chr_fin2$dentist_n)
match2 <- Match(Tr=Tr, X=X, M = 1, replace = TRUE, ties=FALSE)
summary(match2)

Estimate...  0 
SE.........  0 
T-stat.....  NaN 
p.val......  NA 

Original number of observations..............  1554 
Original number of treated obs...............  777 
Matched number of observations...............  777 
Matched number of observations  (unweighted).  777 

checking if matching worked:

Code
n_distinct(match2$index.treated)
[1] 777
Code
n_distinct(match2$index.control)
[1] 269

After 1:1 matching with replacement, the effective size was reduced to 269 matched counties from the low dentist and high dentist groups.

7.2.1 Balance Assessment

Code
set.seed(500)
mb2 <- MatchBalance(dentist_n ~ poor_mental + unemployment  + income_inequality + social_score + 
                  diabetes  + insuff_sleep + highschool_grad + segregation +
                  below_18 + above_65 + ps1 + linps1, 
                  data= chr_fin2, 
                    match.out = match2, nboots=500)

***** (V1) poor_mental *****
                       Before Matching       After Matching
mean treatment........      5.354             5.354 
mean control..........     5.0152            5.3006 
std mean diff.........     52.617            8.2912 

mean raw eQQ diff.....    0.34041          0.078434 
med  raw eQQ diff.....    0.36298          0.068421 
max  raw eQQ diff.....    0.73693           0.73693 

mean eCDF diff........    0.16003          0.033382 
med  eCDF diff........    0.18919           0.02574 
max  eCDF diff........    0.24324           0.10296 

var ratio (Tr/Co).....       1.18            1.0018 
T-test p-value........ < 2.22e-16          0.081108 
KS Bootstrap p-value.. < 2.22e-16             0.002 
KS Naive p-value...... < 2.22e-16        0.00052946 
KS Statistic..........    0.24324           0.10296 


***** (V2) unemployment *****
                       Before Matching       After Matching
mean treatment........   0.035706          0.035706 
mean control..........   0.034857          0.035517 
std mean diff.........     7.1471            1.5883 

mean raw eQQ diff.....  0.0010747        0.00088258 
med  raw eQQ diff.....   0.001005        0.00061619 
max  raw eQQ diff.....   0.013165          0.018658 

mean eCDF diff........   0.025066          0.017423 
med  eCDF diff........   0.020592           0.01287 
max  eCDF diff........   0.059202          0.055341 

var ratio (Tr/Co).....     1.0862            1.1023 
T-test p-value........    0.15077            0.7428 
KS Bootstrap p-value..      0.146             0.178 
KS Naive p-value......    0.13128           0.18502 
KS Statistic..........   0.059202          0.055341 


***** (V3) income_inequality *****
                       Before Matching       After Matching
mean treatment........     4.6117            4.6117 
mean control..........     4.5373            4.6387 
std mean diff.........     7.7974           -2.8275 

mean raw eQQ diff.....    0.15959            0.1258 
med  raw eQQ diff.....    0.10178          0.079581 
max  raw eQQ diff.....     2.0677            2.5917 

mean eCDF diff........   0.037661          0.027845 
med  eCDF diff........   0.041184          0.028314 
max  eCDF diff........   0.073359          0.065637 

var ratio (Tr/Co).....     1.6836            1.5003 
T-test p-value........   0.085366           0.53569 
KS Bootstrap p-value..      0.032             0.078 
KS Naive p-value......   0.030552          0.070342 
KS Statistic..........   0.073359          0.065637 


***** (V4) social_score *****
                       Before Matching       After Matching
mean treatment........     10.845            10.845 
mean control..........     10.969            11.301 
std mean diff.........    -1.6808           -6.1888 

mean raw eQQ diff.....     1.7301           0.94784 
med  raw eQQ diff.....     1.2191            0.9656 
max  raw eQQ diff.....     21.371            21.456 

mean eCDF diff........   0.059115          0.049636 
med  eCDF diff........   0.052767          0.056628 
max  eCDF diff........    0.12999          0.099099 

var ratio (Tr/Co).....     2.0485            1.2142 
T-test p-value........    0.70099           0.19129 
KS Bootstrap p-value.. < 2.22e-16             0.002 
KS Naive p-value...... 3.9747e-06        0.00097071 
KS Statistic..........    0.12999          0.099099 


***** (V5) diabetes *****
                       Before Matching       After Matching
mean treatment........    0.11284           0.11284 
mean control..........   0.095932           0.11228 
std mean diff.........     71.142            2.3342 

mean raw eQQ diff.....   0.016906         0.0017748 
med  raw eQQ diff.....      0.018             0.001 
max  raw eQQ diff.....      0.033             0.045 

mean eCDF diff........    0.13194          0.013378 
med  eCDF diff........   0.099743          0.010296 
max  eCDF diff........    0.34234          0.048906 

var ratio (Tr/Co).....     1.5867            1.0398 
T-test p-value........ < 2.22e-16           0.57795 
KS Bootstrap p-value.. < 2.22e-16             0.238 
KS Naive p-value...... < 2.22e-16           0.31067 
KS Statistic..........    0.34234          0.048906 


***** (V6) insuff_sleep *****
                       Before Matching       After Matching
mean treatment........    0.35465           0.35465 
mean control..........    0.32925            0.3563 
std mean diff.........     68.689           -4.4658 

mean raw eQQ diff.....     0.0254          0.007618 
med  raw eQQ diff.....      0.027             0.006 
max  raw eQQ diff.....      0.032             0.028 

mean eCDF diff........    0.13895          0.041729 
med  eCDF diff........    0.11583          0.041184 
max  eCDF diff........     0.3166           0.10682 

var ratio (Tr/Co).....     1.1012           0.75614 
T-test p-value........ < 2.22e-16           0.36823 
KS Bootstrap p-value.. < 2.22e-16        < 2.22e-16 
KS Naive p-value...... < 2.22e-16        0.00028217 
KS Statistic..........     0.3166           0.10682 


***** (V7) highschool_grad *****
                       Before Matching       After Matching
mean treatment........    0.89305           0.89305 
mean control..........    0.86432           0.88934 
std mean diff.........      34.78               4.5 

mean raw eQQ diff.....   0.033037          0.012938 
med  raw eQQ diff.....   0.029953         0.0084777 
max  raw eQQ diff.....    0.37636           0.46344 

mean eCDF diff........    0.15082          0.041283 
med  eCDF diff........    0.16345          0.041184 
max  eCDF diff........    0.26641           0.11583 

var ratio (Tr/Co).....     1.2754            1.7386 
T-test p-value........ 6.2084e-13           0.31027 
KS Bootstrap p-value.. < 2.22e-16        < 2.22e-16 
KS Naive p-value...... < 2.22e-16        5.9379e-05 
KS Statistic..........    0.26641           0.11583 


***** (V8) segregation *****
                       Before Matching       After Matching
mean treatment........     45.162            45.162 
mean control..........     56.059            48.876 
std mean diff.........    -60.758           -20.706 

mean raw eQQ diff.....     10.918            4.6238 
med  raw eQQ diff.....     10.675            4.5646 
max  raw eQQ diff.....     17.966            9.8243 

mean eCDF diff........    0.18644          0.079899 
med  eCDF diff........    0.20721          0.070785 
max  eCDF diff........    0.33076           0.20463 

var ratio (Tr/Co).....     1.5551            1.2463 
T-test p-value........ < 2.22e-16        1.7141e-06 
KS Bootstrap p-value.. < 2.22e-16        < 2.22e-16 
KS Naive p-value...... < 2.22e-16        1.4809e-14 
KS Statistic..........    0.33076           0.20463 


***** (V9) below_18 *****
                       Before Matching       After Matching
mean treatment........    0.21521           0.21521 
mean control..........    0.21435           0.21636 
std mean diff.........     2.3763           -3.1678 

mean raw eQQ diff.....  0.0033666         0.0038545 
med  raw eQQ diff.....  0.0025852         0.0031975 
max  raw eQQ diff.....   0.071666          0.099059 

mean eCDF diff........   0.023328          0.028745 
med  eCDF diff........   0.023166          0.027027 
max  eCDF diff........   0.050193           0.07722 

var ratio (Tr/Co).....     1.0897           0.97856 
T-test p-value........    0.63248           0.53529 
KS Bootstrap p-value..       0.29              0.01 
KS Naive p-value......    0.28165          0.019447 
KS Statistic..........   0.050193           0.07722 


***** (V10) above_65 *****
                       Before Matching       After Matching
mean treatment........    0.21521           0.21521 
mean control..........    0.19577           0.22345 
std mean diff.........     39.892           -16.919 

mean raw eQQ diff.....   0.019441          0.015299 
med  raw eQQ diff.....    0.02113         0.0090949 
max  raw eQQ diff.....   0.055136          0.057585 

mean eCDF diff........    0.13214          0.062442 
med  eCDF diff........    0.13385          0.065637 
max  eCDF diff........    0.24453           0.15315 

var ratio (Tr/Co).....    0.95543           0.59933 
T-test p-value........ 1.3767e-14         0.0032109 
KS Bootstrap p-value.. < 2.22e-16        < 2.22e-16 
KS Naive p-value...... < 2.22e-16        2.4317e-08 
KS Statistic..........    0.24453           0.15315 


***** (V11) ps1 *****
                       Before Matching       After Matching
mean treatment........    0.68035           0.68035 
mean control..........    0.31965           0.68035 
std mean diff.........     155.79        -0.0011278 

mean raw eQQ diff.....    0.36069         0.0014012 
med  raw eQQ diff.....    0.40833        0.00091168 
max  raw eQQ diff.....    0.48292         0.0093827 

mean eCDF diff........    0.34969         0.0026134 
med  eCDF diff........    0.38867          0.001287 
max  eCDF diff........    0.56113          0.019305 

var ratio (Tr/Co).....    0.93354           0.99994 
T-test p-value........ < 2.22e-16           0.97131 
KS Bootstrap p-value.. < 2.22e-16                 1 
KS Naive p-value...... < 2.22e-16           0.99869 
KS Statistic..........    0.56113          0.019305 


***** (V12) linps1 *****
                       Before Matching       After Matching
mean treatment........     1.0321            1.0321 
mean control..........   -0.98732            1.0247 
std mean diff.........     140.76            0.5161 

mean raw eQQ diff.....     2.0194          0.021914 
med  raw eQQ diff.....     2.0486         0.0054645 
max  raw eQQ diff.....     2.7235            1.7689 

mean eCDF diff........    0.34969         0.0026134 
med  eCDF diff........    0.38867          0.001287 
max  eCDF diff........    0.56113          0.019305 

var ratio (Tr/Co).....    0.98184            1.0326 
T-test p-value........ < 2.22e-16          0.030659 
KS Bootstrap p-value.. < 2.22e-16                 1 
KS Naive p-value...... < 2.22e-16           0.99869 
KS Statistic..........    0.56113          0.019305 


Before Matching Minimum p.value: < 2.22e-16 
Variable Name(s): poor_mental social_score diabetes insuff_sleep highschool_grad segregation above_65 ps1 linps1  Number(s): 1 4 5 6 7 8 10 11 12 

After Matching Minimum p.value: < 2.22e-16 
Variable Name(s): insuff_sleep highschool_grad segregation above_65  Number(s): 6 7 8 10 

7.2.2 Tabulating Standardized Differences

We will name the covariates used in our model.

Code
covnames2 <- c( "poor_mental", "unemployment", "income_inequality", "social_score", 
                  "diabetes", "insuff_sleep", "highschool_grad", "segregation",
                  "below_18", "above_65", "raw PS", "linear PS") 

Then we will extract the standardized differences.

Code
pre.szd2 <- NULL; post.szd2 <- NULL
for(i in 1:length(covnames2)) {
  pre.szd2[i] <- mb2$BeforeMatching[[i]]$sdiff.pooled
  post.szd2[i] <- mb2$AfterMatching[[i]]$sdiff.pooled
}

Now, we can have our final table.

Code
match_szd2 <- tibble(covnames2, pre.szd2, post.szd2, row.names=covnames2)
match_szd2
# A tibble: 12 × 4
   covnames2         pre.szd2 post.szd2 row.names        
   <chr>                <dbl>     <dbl> <chr>            
 1 poor_mental          54.7    8.29    poor_mental      
 2 unemployment          7.29   1.59    unemployment     
 3 income_inequality     8.73  -2.83    income_inequality
 4 social_score         -1.95  -6.19    social_score     
 5 diabetes             78.8    2.33    diabetes         
 6 insuff_sleep         70.3   -4.47    insuff_sleep     
 7 highschool_grad      36.8    4.50    highschool_grad  
 8 segregation         -67.0  -20.7     segregation      
 9 below_18              2.43  -3.17    below_18         
10 above_65             39.4  -16.9     above_65         
11 raw PS              153.    -0.00113 raw PS           
12 linear PS           140.     0.516   linear PS        

We will use these results to build the love plot to better visualize the standardized differences.

7.2.3 Visualizing Standardized Differences

We will use the cobalt method to build plots for the standardized differences and the variance ratios.

Code
b2 <- bal.tab(match2, 
             dentist_n ~ poor_mental + unemployment  + income_inequality + social_score + 
                  diabetes  + insuff_sleep + highschool_grad + segregation +
                  below_18 + above_65 + ps1 + linps1, 
             data= chr_fin2, stats = c("m", "v"), un = TRUE,
             quick = FALSE)
b2
Balance Measures
                     Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
poor_mental       Contin.  0.5262     1.1800   0.0829      0.9952
unemployment      Contin.  0.0715     1.0862   0.0159      1.0950
income_inequality Contin.  0.0780     1.6836  -0.0283      1.4904
social_score      Contin. -0.0168     2.0485  -0.0619      1.2062
diabetes          Contin.  0.7114     1.5867   0.0233      1.0329
insuff_sleep      Contin.  0.6869     1.1012  -0.0447      0.7511
highschool_grad   Contin.  0.3478     1.2754   0.0450      1.7271
segregation       Contin. -0.6076     1.5551  -0.2071      1.2381
below_18          Contin.  0.0238     1.0897  -0.0317      0.9721
above_65          Contin.  0.3989     0.9554  -0.1692      0.5954
ps1               Contin.  1.5579     0.9335  -0.0000      0.9933
linps1            Contin.  1.4076     0.9818   0.0052      1.0257

Sample sizes
                     Control Treated
All                    777.      777
Matched (ESS)          126.7     777
Matched (Unweighted)   269.      777
Unmatched              508.        0

As previously mentioned, the effective size after 1:1 matching with replacement is 269 counties. We lost 508 counties from the sample because they were unmatched.

Now we will build the love plot for the standardized differences:

Code
lplot5 <- love.plot(b2, threshold = .1, size = 3,
               var.order = "unadjusted", stars = "raw",
               title = "Standardized Differences and 1:1 Matching with Replacement")
lplot5

All covariates have improved standardized difference after 1:1 matching adjustment, with the exception for social_score.

Plot of Variance Ratios

Code
lplot6 <- love.plot(b2, stat = "v",
               threshold = 1.25, size = 3,
               var.order = "unadjusted",
               title = "Variance Ratios and 1:1 Matching with Replacement")
lplot6 + theme_bw()

Most of the covariates displayed improved variance ratios with the exception of highschool_grad, insuff_sleep, unemployment, and above_65.

7.2.4 Building New Data Frame with Matched Sample

Code
matches2 <- factor(rep(match2$index.treated, 2))
chr.matchedsample2 <- cbind(matches2, chr_fin2[c(match2$index.control, match2$index.treated),])
Code
chr.matchedsample2 |> count(dentist_grp)
  dentist_grp   n
1        High 777
2         Low 777

Because the 1:1 matching method we used includes replacement, it is only natural to retain the original number of observations in the exposure groups.

7.2.5 Rubin’s Rules After Matching

First, we will check Rubin’s Rule 1:

Code
rubin1.match2 <- with(chr.matchedsample2,
      abs(100*(mean(linps1[dentist_n==1])-mean(linps1[dentist_n==0]))/sd(linps1)))
rubin1.match2
[1] 0.5203845

The absolute value of the standardized difference of the linear propensity score before adjustment was 114.8%. The value after adjustment (52.03%) shows substantial improvement, however the value is above 10% and far from to 0%. In this case, we barely pass Rubin’s Rule 1.

Next, we will check Rubin’s Rule 2:

Code
rubin2.match2 <- with(chr.matchedsample2, var(linps1[dentist_n==1])/var(linps1[dentist_n==0]))
rubin2.match2
[1] 1.032566

The previous ratio of the variance of the linear propensity score was 0.98. This value also shows mild improvement where the value is closer to 1 (1.03) and remains below 2. We also pass Rubin’s Rule 2 and can proceed to use the results of this PS adjustment to measure the estimates.

7.2.6 Estimating Effect of Exposure on Outcome After 1:1 Matching

We will use the ATT (Average Treatment Effect on the Treated) estimate to measure the dentist proportion effect on smoking prevalence in US Counties. First, we will need to create a data frame out of the matched sample:

Code
matches2 <- factor(rep(match2$index.treated, 2))
chr.matchedsamplef2 <- cbind(matches2, chr_fin2[c(match2$index.control, match2$index.treated),])

Now, we can produce an estimate using the matched sample from chr_fin2.

Code
chr.matchedsamplef2$matches2.f <- as.factor(chr.matchedsamplef2$matches2)

matched_mix.mod2 <- lmer(drinking_prop ~ dentist_n + (1 | matches2.f), REML = TRUE, data=chr.matchedsamplef2)

tidy_matched_mix.mod2 <- broom.mixed::tidy(matched_mix.mod2, conf.int = TRUE, conf.level = 0.95) |> 
  filter(term == "dentist_n")

tidy_matched_mix.mod2 |> kable(digits = 3)
effect group term estimate std.error statistic conf.low conf.high
fixed NA dentist_n 0.396 0.122 3.255 0.158 0.635

After 1:1 matching with replacement, our model estimates the effect of low dentists proportions on excessive drinking prevalence in US counties to increase by 0.40 percentage points (95% CI: 0.16, 0.64). Our confidence interval does not include 0, therefore, following up with a sensitivity analysis is a reasonable choice.

7.3 Use ATT Weighting Adjustment

We will start by creating a data frame of our data set.

Code
chrfin_df2 <- base::data.frame(chr_fin2)

Now, we will calculate the propensity score using the ps function from the twang package.

Code
ps.chr2 <- twang::ps(dentist_n ~ poor_mental + unemployment  + income_inequality + social_score + 
                  diabetes  + insuff_sleep + highschool_grad + segregation +
                  below_18 + above_65 + ps1 + linps1,
             data = chrfin_df2,
             n.trees = 5000,
             interaction.depth = 2,
             stop.method = c("es.mean"),
             estimand = "ATT",
             verbose = FALSE)

Checking the iteration:

Code
plot(ps.chr2)

We can see that we have run enough iterations to reach a stable estimate.

Observing the effective sample size of the weighted estimates:

Code
summary(ps.chr2)
            n.treat n.ctrl ess.treat ess.ctrl    max.es   mean.es    max.ks
unw             777    777       777 777.0000 1.5578591 0.5361848 0.5611326
es.mean.ATT     777    777       777 163.8941 0.1794596 0.0667433 0.1205385
            max.ks.p    mean.ks iter
unw               NA 0.26490776   NA
es.mean.ATT       NA 0.07668971 1111

The effective size after ATT weighting is 163 counties.

7.3.1 Checking the balance

Code
plot(ps.chr2, plots = 2)

Code
plot(ps.chr2, plots = 3)

We can see that the weighted estimates have an absolute standard differences are below 0.5 and are closer to 0 compared to the unadjusted estimates.

We can also assess the balance using the cobalt method.

Code
b3 <- cobalt::bal.tab(ps.chr2, full.stop.method = "es.mean.att", 
        stats = c("m", "v"), un = TRUE)

b3
Balance Measures
                      Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
prop.score        Distance  2.2674     0.7698   0.5561      0.6932
poor_mental        Contin.  0.5262     1.1800   0.0079      0.9499
unemployment       Contin.  0.0715     1.0862   0.0512      1.1563
income_inequality  Contin.  0.0780     1.6836   0.0068      1.3792
social_score       Contin. -0.0168     2.0485   0.0007      1.0241
diabetes           Contin.  0.7114     1.5867   0.1795      1.4289
insuff_sleep       Contin.  0.6869     1.1012   0.0630      0.9451
highschool_grad    Contin.  0.3478     1.2754  -0.0281      1.3991
segregation        Contin. -0.6076     1.5551  -0.0901      0.9325
below_18           Contin.  0.0238     1.0897  -0.0110      1.0725
above_65           Contin.  0.3989     0.9554  -0.0601      0.8466
ps1                Contin.  1.5579     0.9335   0.1684      0.8879
linps1             Contin.  1.4076     0.9818   0.1342      0.8905

Effective sample sizes
           Control Treated
Unadjusted  777.       777
Adjusted    163.89     777

We will visualize the standardized differences using the Love plot.

Code
lplot7 <- cobalt::love.plot(b3, 
               threshold = .1, size = 3, stars = "raw",
               title = "Standardized Diffs and TWANG ATT weighting")
lplot7

All of the variables have observable improved standardized difference after TWANG ATT weighting adjustment.

Visualizing the variance ratio:

Code
lplot8 <- cobalt::love.plot(b3, stat = "v",
               threshold = 1.25, size = 3, 
               title = "Variance Ratios: TWANG ATT weighting")
lplot8 + theme_bw()

Most of the variables show some improvement in the variance ration after adjustment, though it is not very satisfying.

7.3.2 Estimating The Causal Effect After ATT Weight Adjustment

We will use the weight design previously created wt1 and chrwt1.design.

Code
adjout2.wt1 <- survey::svyglm(drinking_prop ~ dentist_n, design= chrwt1.design)

wt_att_results2 <- tidy(adjout2.wt1, conf.int = TRUE) |> filter(term == "dentist_n")

wt_att_results2
# A tibble: 1 × 7
  term      estimate std.error statistic p.value conf.low conf.high
  <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1 dentist_n     1.15     0.520      2.20  0.0276    0.126      2.17

After ATT weight adjustment, the estimated impact of low dentist proportions on excessive drinking prevalence in US counties is an increase of 1.15 percentage points (95% CI: 0.13, 2.17).

7.4 Use Double Robust Adjustment

We will use the previous double robust design chr.dr.design.

Code
dr.out2.wt1 <- survey::svyglm(drinking_prop ~ dentist_n + linps1, design= chr.dr.design)

dr_att_out2 <- tidy(dr.out2.wt1, conf.int = TRUE) |> filter(term == "dentist_n")

dr_att_out2
# A tibble: 1 × 7
  term      estimate std.error statistic p.value conf.low conf.high
  <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1 dentist_n   -0.106     0.222    -0.477   0.633   -0.542     0.330

After Double Robust adjustment, the estimated impact of low dentist proportions on excessive drinking prevalence in US counties is a decrease of -0.11 percentage points (95% CI: -0.54, 0.33). These results are very different from the ATT Weighting results.

7.5 Compare Results

Code
chr_results2 <- tibble(
    analysis = c("Unadjusted", "PS 1:1 Match", "TWANG ATT WT", "Double Robust"),
    estimate = c(-2.56, 0.40, 1.15, -0.11),
    conf.low = c(-2.87, 0.16, 0.13, -0.54),
    conf.high = c(-2.26, 0.64, 2.17, 0.33))

ggplot(chr_results2, aes(x = analysis, y = estimate)) +
    geom_errorbar(aes(ymax = conf.high, ymin = conf.low), width = 0.5) + 
    geom_label(aes(label = estimate), size = 5)

We can see that propensity score adjustment inversed the estimated affect of dentist proportions on excessive drinking prevalence in US counties (from -2.56% to 0.4-1.2%) . While two of the three adjustments present close estimates, the 1:1 matching with replacement adjustment displays the smallest confidence interval, the best balance, and largest effective size (273 after 1:1 matching compared to 163 after ATT weighting).

7.6 Sensitivity Analysis

The estimate produced from the 1:1 Matching with replacement has a statistically significant p-value. Therefore, we will need to perform a sensitivity analysis.

Code
Rc2 <- match2$mdata$Y[match2$mdata$Tr==0]
Rt2 <- match2$mdata$Y[match2$mdata$Tr==1] 


psens(Rt2, Rc2, Gamma = 5, GammaInc = 0.25)

 Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value 
 
Unconfounded estimate ....  NaN 

 Gamma Lower bound Upper bound
  1.00         NaN         NaN
  1.25         NaN         NaN
  1.50         NaN         NaN
  1.75         NaN         NaN
  2.00         NaN         NaN
  2.25         NaN         NaN
  2.50         NaN         NaN
  2.75         NaN         NaN
  3.00         NaN         NaN
  3.25         NaN         NaN
  3.50         NaN         NaN
  3.75         NaN         NaN
  4.00         NaN         NaN
  4.25         NaN         NaN
  4.50         NaN         NaN
  4.75         NaN         NaN
  5.00         NaN         NaN

 Note: Gamma is Odds of Differential Assignment To
 Treatment Due to Unobserved Factors 
 

Our conclusions regarding the impact of low dentists proportions increasing excessive drinking prevalence in US counties hold true in absence of biases in the study. This conclusion is maintained in the presence of a biases level between L= 1.00 and L= 1.25. This means that only small hidden bias is needed to change our conclusions.

8 Discussion

This study found that counties with low dentist proportions are associated with a small increase in smoking prevalence and excessive drinking prevalence among adults (0.78% and 0.4% respectively). Although these estimates provided a meaningful size effect, the sensitivity analysis concluded that our conclusions may change in the presence of small hidden bias (gamma between 1.0 and 1.25). The weak findings may be attributed to the existing low number of dentists per 1000 population in US counties. Despite the attempt to dichotomize the dentist variable by using the first and third quartiles (0.2 and 0.6 per 1000 population), the difference was not large enough to detect observable impact. It is also important to note that increased presence of dentists does not spontaneously reduce negative oral habits (smoking and excessive drinking). The quality care and accessibility of services is an important factor in improving oral health and other related health behaviors.

This project emphasized the importance of using confounders to assess selection biases in observational studies. The difference in the exposure’s effect estimates after propensity score adjustment was remarkable. While our initial belief was that having more covariates would improve the analysis, this project corrected that misconception. In our analysis, multiple covariates were dropped due to high correlation, collinearity, or for disrupting the propensity score values. Selecting and evaluating appropriate covariates was the most critical part for the propensity score analysis. To improve my study in the future, we will need to explore more variables from the County Health Rankings (CHR) as we may have missed other useful variables during the initial selection phase. Also, using the Health Professional Shortage Area classification (HPSA) instead of the number of dentists per population can provide a better assessment of the impact of dentist quantities on oral-related health behaviors in US counties.

9 Session Information

Code
sessionInfo()
R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.9.4    forcats_1.0.1      stringr_1.5.2      purrr_1.1.0       
 [5] tidyr_1.3.1        tibble_3.3.0       tidyverse_2.0.0    rbounds_2.2       
 [9] car_3.1-3          carData_3.0-5      twang_2.6.1        patchwork_1.3.2   
[13] mice_3.18.0        survival_3.8-3     broom_1.0.10       cobalt_4.6.1      
[17] arm_1.14-4         lme4_1.1-37        Matching_4.10-15   MASS_7.3-65       
[21] Epi_2.61           gt_1.1.0           mosaic_1.9.2       mosaicData_0.20.4 
[25] ggformula_1.0.0    Matrix_1.7-3       ggplot2_4.0.0      lattice_0.22-7    
[29] tableone_0.13.2    see_0.12.0         report_0.6.1       parameters_0.28.2 
[33] performance_0.15.2 modelbased_0.13.0  insight_1.4.2      effectsize_1.0.1  
[37] datawizard_1.3.0   correlation_0.8.8  bayestestR_0.17.0  easystats_0.7.5   
[41] naniar_1.1.0       knitr_1.50         Hmisc_5.2-4        magrittr_2.0.4    
[45] readr_2.1.5        dplyr_1.1.4        janitor_2.2.1     

loaded via a namespace (and not attached):
  [1] splines_4.5.1           rpart_4.1.24            lifecycle_1.0.4        
  [4] Rdpack_2.6.4            globals_0.18.0          vroom_1.6.6            
  [7] backports_1.5.0         survey_4.4-8            sass_0.4.10            
 [10] rmarkdown_2.30          yaml_2.3.10             DBI_1.2.3              
 [13] minqa_1.2.8             RColorBrewer_1.1-3      abind_1.4-8            
 [16] nnet_7.3-20             gdtools_0.4.4           labelled_2.15.0        
 [19] gbm_2.2.2               listenv_0.9.1           MatrixModels_0.5-4     
 [22] parallelly_1.45.1       codetools_0.2-20        xml2_1.4.0             
 [25] tidyselect_1.2.1        shape_1.4.6.1           farver_2.1.2           
 [28] broom.mixed_0.2.9.6     base64enc_0.1-3         jsonlite_2.0.0         
 [31] e1071_1.7-16            mitml_0.4-5             Formula_1.2-5          
 [34] ggridges_0.5.7          iterators_1.0.14        systemfonts_1.3.1      
 [37] foreach_1.5.2           tools_4.5.1             cmprsk_2.2-12          
 [40] Rcpp_1.1.0              glue_1.8.0              gridExtra_2.3          
 [43] pan_1.9                 xfun_0.52               chk_0.10.0             
 [46] mgcv_1.9-3              withr_3.0.2             numDeriv_2016.8-1.1    
 [49] fastmap_1.2.0           mitools_2.4             latticeExtra_0.6-31    
 [52] boot_1.3-31             digest_0.6.37           timechange_0.3.0       
 [55] R6_2.6.1                visdat_0.6.0            colorspace_2.1-2       
 [58] jpeg_0.1-11             utf8_1.2.6              generics_0.1.4         
 [61] fontLiberation_0.1.0    data.table_1.17.8       class_7.3-23           
 [64] htmlwidgets_1.6.4       pkgconfig_2.0.3         gtable_0.3.6           
 [67] S7_0.2.0                furrr_0.3.1             htmltools_0.5.8.1      
 [70] fontBitstreamVera_0.1.1 scales_1.4.0            png_0.1-8              
 [73] reformulas_0.4.1        snakecase_0.11.1        rstudioapi_0.17.1      
 [76] tzdb_0.5.0              coda_0.19-4.1           checkmate_2.3.3        
 [79] nlme_3.1-168            nloptr_2.2.1            proxy_0.4-27           
 [82] zoo_1.8-14              parallel_4.5.1          foreign_0.8-90         
 [85] pillar_1.11.1           grid_4.5.1              vctrs_0.6.5            
 [88] jomo_2.7-6              xtable_1.8-4            cluster_2.1.8.1        
 [91] htmlTable_2.4.3         evaluate_1.0.5          cli_3.6.5              
 [94] etm_1.1.2               compiler_4.5.1          rlang_1.1.6            
 [97] crayon_1.5.3            labeling_0.4.3          interp_1.1-6           
[100] plyr_1.8.9              fs_1.6.6                ggiraph_0.9.2          
[103] stringi_1.8.7           deldir_2.0-4            glmnet_4.1-10          
[106] mosaicCore_0.9.5        fontquiver_0.2.1        hms_1.1.4              
[109] bit64_4.6.0-1           future_1.67.0           haven_2.5.5            
[112] rbibutils_2.3           bit_4.6.0               xgboost_1.7.11.1