The study uses data from the National Survey of Children’s Health (NSCH), year 2022. It is a survey based data set with the main subject children aged 0-17. The population includes children from all 50 states and the District of Columbia. The survey is a self-administered web-based and paper-based questionnaire. The respondent is a parent or a guardian, where one child from each household is selected to be the subject of the questionnaire.
This study focuses on children with hospitalization records. Medically compromised patients often face delay when receiving oral health care, which can negatively impact their oral and overall health status. The aim of this study is to assess whether being a child with frequent hospital visits, history of admission, and taking medication, impacts their access to preventive dental care.
5 Research Questions
The overarching purpose of this study is to investigate the access and oral health status of children with hospitalization history. Medically compromised children often face barriers to preventive dental care, which negatively impacts their health outcomes.
Research question 1: What are the odds of children with hospitalization records of receiving preventive dental care when other variables (age) are taken into account?
Hospital admission and ER visits were used to observe whether they present as barriers to preventive oral care access. Children with poor health status can often face difficulties accessing preventive dental care.
Research question 2: What are the odds of children with hospitalization records of experiencing chronic cavities, when other variables (ER visits and medication) are taken into account?
Hospital admission, ER visits, and medication use were added to observe potential associations. There has been dental reluctance in treating children with poor health status. Children with more hospital visits may face dental care neglect and consequently have higher odds of experiencing cavities
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `er_visits = fct_collapse(er_visits, `3` = c(" 3", "4"))`.
Caused by warning:
! Unknown levels in `f`: 3
Code
tabyl(nsch22_raw$er_visits)
nsch22_raw$er_visits n percent valid_percent
1 2511 0.837000000 0.83979933
2 372 0.124000000 0.12441472
3 107 0.035666667 0.03578595
<NA> 10 0.003333333 NA
Code
nsch22_raw <- nsch22_raw |>mutate(er_visits =fct_recode(er_visits,"None"="1","1 visit"="2","2 or more visits"="3"))tabyl(nsch22_raw$er_visits)
nsch22_raw$er_visits n percent valid_percent
None 2511 0.837000000 0.83979933
1 visit 372 0.124000000 0.12441472
2 or more visits 107 0.035666667 0.03578595
<NA> 10 0.003333333 NA
nsch22_raw$med_use n percent valid_percent
yes 556 0.1853333333 0.185457
no 2442 0.8140000000 0.814543
<NA> 2 0.0006666667 NA
6.2.4 Working with Categorical Outcomes
dentistvisit
Code
nsch22_raw <- nsch22_raw |>mutate(dentistvisit =fct_recode(dentistvisit,"No preventive visits"="1","1 visit"="2","2 or more visits"="3"), dentistvisit=relevel(dentistvisit, "No preventive visits", "1 visit", "2 or more visits"),dentistvisit=factor (dentistvisit, ordered =TRUE))is.ordered(nsch22_raw$dentistvisit)
[1] TRUE
Code
tabyl(nsch22_raw$dentistvisit)
nsch22_raw$dentistvisit n percent valid_percent
No preventive visits 86 0.02866667 0.03568465
1 visit 947 0.31566667 0.39294606
2 or more visits 1377 0.45900000 0.57136929
<NA> 590 0.19666667 NA
nsch22_raw$cavities n percent valid_percent
yes 282 0.094000000 0.09444072
no 2704 0.901333333 0.90555928
<NA> 14 0.004666667 NA
6.2.5 Arranging the Tibble
We will rename the cleaned sample as nsch22.
Code
nsch22 <- nsch22_rawdim(nsch22)
[1] 3000 7
Code
identical(nrow(nsch22), n_distinct(nsch22$hhid))
[1] TRUE
6.3 Print and Save The Final Tibble
Code
write_rds(nsch22,"nsch22.Rds")
7 Code Book and Description
Variable
Role
Type
Description
hhid
ID
-
Unique Household ID
dentistvisit
Outcome (multi-categorical regression)
3 Categories
The number of dental visits the child had during the past 12 months to seek preventive dental care: None, 1 visit, 2 or more visits
cavities
Outcome (logistic regression)
Binary
Child experience of chronic cavities during the past 12 Months (yes or no)
age
input
Quantitative
Child age in years
er_visits
input
3 Categories
The number of child emergency room visits (None, 1 visit, 2 or more visits)
hospitalstay
input
Binary
Child admitted to a hospital for at least one night during past 12 months (yes or no)
med_use
Outcome
Binary
Child need for prescribed medicine by a doctor (yes/ no)
8 Analyses
8.1 Analysis 1
8.1.1 Research Question for Analysis 1
What are the odds of children with hospitalization records of receiving preventive dental care when other variables (age) are taken into account?
Children who have experienced hospitalization might have a higher likelihood of receiving preventive dental care. This could be due to frequent ER visits and hospitalization often indicating a decline in the health status, which may lead to a more comprehensive approach to healthcare, encompassing preventive dental care. Furthermore, younger children may be more likely to receive preventive dental care due to parental involvement and regular check-ups, while older children might have more autonomy over their healthcare decisions and may be more proactive about preventive measures. We need to conduct a proper statistical analysis of relevant data to confirm those anticipations.
We plan to use an ordinal logistic regression model to analyze the likelihood of children with history of ER visits and hospital admission receiving preventive dental care. This model will consider three factors that might influence the outcome: age, history of hospitalization , and the number of emergency room visits We will treat the outcome variable, dentistvisit as an ordinal variable, acknowledging that it has a natural order (No preventive visits ,1 visit, 2 or more visits).
8.1.2 Analysis 1 Outcome
Our outcome of interest is dentistvisit. It indicates how many visits during the past 12 months a child has seen a dentist or other oral health care provider for preventive dental care(No preventive visits in the past 12 months, none, 1 visit, 2 or more visits).
As we have missing values in our predictors hospitalstay and er_visits, we conduct single imputation under the assumption that missing values are missing at random (MAR).
Code
set.seed(432)orderdata <- nsch22_model_one |>select(-hhid)nsch22_order_mice1 <-mice(orderdata, m =15, printFlag =FALSE)
# A tibble: 2,410 × 5
dentistvisit age er_visits hospitalstay hhid
<ord> <dbl> <fct> <fct> <chr>
1 2 or more visits 7 None no 22017610
2 1 visit 12 1 visit no 22075400
3 2 or more visits 12 None no 22103566
4 2 or more visits 18 None no 22317899
5 1 visit 4 None no 22270255
6 2 or more visits 14 None no 22308731
7 2 or more visits 17 None no 22304506
8 1 visit 14 None no 22287109
9 No preventive visits 12 None no 22020482
10 1 visit 3 1 visit no 22354728
# ℹ 2,400 more rows
The next step is to verify the single imputation with the multi-categorical variable er_visits:
er_visits n percent
None 2062 85.6%
1 visit 274 11.4%
2 or more visits 74 3.1%
We have the same percentages for each level in both data sets.
Code
describe(nsch22_order_si)
nsch22_order_si
5 Variables 2410 Observations
--------------------------------------------------------------------------------
dentistvisit
n missing distinct
2410 0 3
Value No preventive visits 1 visit 2 or more visits
Frequency 86 947 1377
Proportion 0.036 0.393 0.571
--------------------------------------------------------------------------------
age
n missing distinct Info Mean pMedian Gmd .05
2410 0 18 0.996 10.61 10.5 5.556 3
.10 .25 .50 .75 .90 .95
4 6 11 15 17 18
Value 1 2 3 4 5 6 7 8 9 10 11
Frequency 4 33 119 152 182 176 110 116 152 133 140
Proportion 0.002 0.014 0.049 0.063 0.076 0.073 0.046 0.048 0.063 0.055 0.058
Value 12 13 14 15 16 17 18
Frequency 131 141 146 147 175 172 181
Proportion 0.054 0.059 0.061 0.061 0.073 0.071 0.075
--------------------------------------------------------------------------------
er_visits
n missing distinct
2410 0 3
Value None 1 visit 2 or more visits
Frequency 2062 274 74
Proportion 0.856 0.114 0.031
--------------------------------------------------------------------------------
hospitalstay
n missing distinct
2410 0 2
Value yes no
Frequency 59 2351
Proportion 0.024 0.976
--------------------------------------------------------------------------------
hhid
n missing distinct
2410 0 2410
lowest : 22000115 22000164 22000276 22000358 22000381
highest: 22356672 22356767 22356784 22356882 22357318
--------------------------------------------------------------------------------
Here’s a breakdown of the observations from the describe analysis:
Hospital ER Visits: Most children (2062) had no hospital ER visits in the past year. A smaller portion had one visit (274), even fewer had 2 or more visits or (74).
Hospital Admissions: A large majority (2354) of the children had a prior hospitalization, while 59 had never been admitted.
8.1.6 Checking the Outcome (order,levels and distribution)
We will check if our outcome dentistvisit is an ordered factor.
Code
is.ordered(nsch22_order_si$dentistvisit)
[1] TRUE
We will check if the first level dentistvisit is No preventive visits
Code
nsch22_order_si |>tabyl(dentistvisit)
dentistvisit n percent
No preventive visits 86 0.03568465
1 visit 947 0.39294606
2 or more visits 1377 0.57136929
Bar Chart for dentistvisit
We will build a graphical summary of our outcome dentistvisit distribution.
The graph reveals how preventive dental care is distributed among children in our NSCH 22 sample. Notably, the majority of children received dental care, with two or more visits being the most common frequency. While one visit was also relatively frequent, only 4% of children in the survey reported never having had preventive dental care in the past year.
Code
ggplot(nsch22_order_si, aes(x = dentistvisit, y = age, fill = dentistvisit)) +geom_violin(trim =TRUE) +geom_boxplot(col ="white", width =0.2) +scale_fill_brewer(palette ="PuRd") +guides(fill ="none")
8.1.7 Partitioning the Data
After dealing with missing values, we will split the data into training and testing samples with resample function. We want 75% in the training sample and 25% in the testing sample.
We want to look at the distribution of subjects in both splits.
Code
dim(nsch22_order_train); dim(nsch22_order_test)
[1] 1807 5
[1] 603 5
Also, we want to check the breakdown of the distribution of preventive dental visits across different categories for both the training and test data sets.
Code
nsch22_order_train |>tabyl(dentistvisit)
dentistvisit n percent
No preventive visits 66 0.03652463
1 visit 719 0.39789707
2 or more visits 1022 0.56557831
Code
nsch22_order_test |>tabyl(dentistvisit)
dentistvisit n percent
No preventive visits 20 0.0331675
1 visit 228 0.3781095
2 or more visits 355 0.5887231
8.1.8 Scatterplot Matrix and Collinearity
We will evaluate collinearity between predictors via scatter plot matrix and variance inflation factors.
The predictors’ generalized variance inflation factors (GVIF) demonstrate a small level of collinearity. We therefore have no serious concerns with respect to collinearity.
8.1.9 Models Fiting
Fit proportional odds logistic regression model
Our model, denoted as m_polr, uses proportional odds logistic regression to predict dentistvisit. This prediction is made considering these factors: age, er visits, and hospitalstay, utilizing the polr function. Additionally, we provide a summary of the model.
Based on prior analysis, we confirm that our outcome variable is an ordered factor.
Call:
polr(formula = dentistvisit ~ age + er_visits + hospitalstay,
data = nsch22_order_train, Hess = TRUE)
Coefficients:
Value Std. Error t value
age 0.06507 0.009935 6.5489
er_visits1 visit -0.21842 0.148048 -1.4753
er_visits2 or more visits -0.29149 0.292856 -0.9953
hospitalstayno -0.72220 0.355314 -2.0326
Intercepts:
Value Std. Error t value
No preventive visits|1 visit -3.3821 0.3896 -8.6811
1 visit|2 or more visits -0.3268 0.3721 -0.8782
Residual Deviance: 2875.925
AIC: 2887.925
Our m_polr model includes two intercepts to accommodate the three levels of dentistvisit, along with six slopes corresponding to our four predictors.
Summarizing the Fit
Code
# model including covariatesglance(m_polr) |>gt() |>fmt_number(columns = logLik:deviance, decimals =3) |>tab_options(table.font.size =20)
edf
logLik
AIC
BIC
deviance
df.residual
nobs
6
−1,437.963
2,887.925
2,920.922
2,875.925
1801
1807
8.1.10 Interpreting the Model
We will exponentiate our model coefficients to facilitate interpretation.
If we compare two children, A and B, with the same values of the other predictors:
If child A is one year older than child B, then the model predicts that child A will have 1.067 times higher odds of having more preventive dental care visits with 95% CI:(1.047, 1.088).
The influence of hospital ER visits, whether occurring once or 2 times or more, compared to none, is estimated to have coefficients of approximately 0.8 and 0.75, respectively. This suggests that when holding other variables constant, each additional ER visit is associated with a slight decrease in the odds of attending dental appointments, by approximately 0.80 and 0.75, respectively. The 95% CI for these estimates are (0.60,1.076) for the first scenario and (0.421, 1.334) for the latter. Notably, these effects are deemed to be relatively small.
For children with no history of hospitalization, the odds ratio of 0.486 indicates that they have lower odds of preventive dental care visits with 95% CI:(0.234, 0.954), holding all other variables constant. This effect is considered to be relatively minor.
8.1.11 Validating Models
Using lrm for Proportional Odds Logistic Regression
We will build a model using lrm tool from the rms package
Code
d <-datadist(nsch22_order_train)options(datadist ="d")m_lrm <-lrm(dentistvisit~ age+ er_visits+ hospitalstay, data= nsch22_order_train, x = T, y = T)m_lrm
Logistic Regression Model
lrm(formula = dentistvisit ~ age + er_visits + hospitalstay,
data = nsch22_order_train, x = T, y = T)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 1807 LR chi2 51.06 R2 0.035 C 0.590
No preventive visits66 d.f. 4 R2(4,1807)0.026 Dxy 0.180
1 visit 719 Pr(> chi2) <0.0001 R2(4,1366.2)0.034 gamma 0.188
2 or more visits1022 Brier 0.238 tau-a 0.094
max |deriv| 2e-05
Coef S.E. Wald Z Pr(>|Z|)
y>=1 visit 3.3821 0.3896 8.68 <0.0001
y>=2 or more visits 0.3268 0.3721 0.88 0.3798
age 0.0651 0.0099 6.55 <0.0001
er_visits=1 visit -0.2184 0.1480 -1.48 0.1401
er_visits=2 or more visits -0.2915 0.2929 -1.00 0.3196
hospitalstay=no -0.7222 0.3553 -2.03 0.0421
The Nagelkerke R^2 value is suboptimal, measuring at 0.035. The C statistic of 0.590 suggests that the model fails in terms of predictive ability.
We validate the summary statistics with validate.
Code
set.seed(4322023)validate(m_lrm)
index.orig training test optimism index.corrected Lower Upper
Dxy 0.1800 0.1844 0.1777 0.0066 0.1734 0.1265 0.2255
R2 0.0347 0.0366 0.0328 0.0038 0.0310 0.0112 0.0474
Intercept 0.0000 0.0000 0.1398 -0.1398 0.1398 -1.0065 1.0738
Slope 1.0000 1.0000 0.9653 0.0347 0.9653 0.6833 1.3376
Emax 0.0000 0.0000 0.1018 -0.1018 0.1018 -0.0693 0.2659
D 0.0277 0.0293 0.0261 0.0032 0.0245 0.0082 0.0381
U -0.0011 -0.0011 -1.2757 1.2746 -1.2757 -1.2827 -1.2703
Q 0.0288 0.0304 1.3018 -1.2714 1.3002 1.2799 1.3180
B 0.0353 0.0364 0.0353 0.0011 0.0343 0.0269 0.0426
g 0.3891 0.3948 0.3755 0.0193 0.3698 0.2625 0.4699
gp 0.0938 0.0947 0.0130 0.0817 0.0120 -0.0136 0.0356
n
Dxy 40
R2 40
Intercept 40
Slope 40
Emax 40
D 40
U 40
Q 40
B 40
g 40
gp 40
We obtain a validated 𝑅^2 of 0.0310 , and a validated C statistic of 0.5 + (0.1735/2) = 0.586, which suggests a that the model fails in terms of predictive ability.
8.1.12 Effects Plot
Before proceeding with any predictions, we present a plot illustrating the effect size of our model. The summary function applied to the lrm fit showcases the effect size by comparing the 25th percentile to the 75th percentile.
Code
plot(summary(m_lrm))
Code
summary(m_lrm)
Effects Response : dentistvisit
Factor Low High Diff. Effect S.E. Lower 0.95
age 6 15 9 0.58561 0.089418 0.410350
Odds Ratio 6 15 9 1.79610 NA 1.507300
er_visits - 1 visit:None 1 2 NA -0.21842 0.148050 -0.508590
Odds Ratio 1 2 NA 0.80379 NA 0.601340
er_visits - 2 or more visits:None 1 3 NA -0.29147 0.292860 -0.865460
Odds Ratio 1 3 NA 0.74716 NA 0.420860
hospitalstay - yes:no 2 1 NA 0.72222 0.355320 0.025817
Odds Ratio 2 1 NA 2.05900 NA 1.026200
Upper 0.95
0.760860
2.140100
0.071751
1.074400
0.282520
1.326500
1.418600
4.131500
According to this model, when the age increases from 6 to 15 years old the odds of having more dentistvisit are estimated to be 1.8 times higher, with a 95% CI of (1.5, 2.1), while keeping all other variables constant. Additionally, the effect of being a 15-year-old child on the log odds of dentistvisit is estimated to be 0.58, with a SE of 0.1.
When the child’s number of ER visits increases from none to once in the past 12 months, this is associated with an estimated effect on the log odds of dentistvisit of -0.21, with a SE of 0.15. This translates to an odds ratio of 0.80, indicating that children with one ER visit have 80% lower odds of having preventive dental care compared to children with no ER visits, with 95% CI (0.60, 1.07), while keeping all other variables constant. The impact is relatively minor.
When the child’s number of ER visits increases from none to 2 times or more in the past 12 months, this is associated with an estimated effect on the log odds of dentist visits of -0.29, with a SE of 0.29 Alternatively, it is associated with an estimated odds ratio of 0.74, indicating lower odds of dentistvisits, with a 95% CI of (0.42, 1.32), while controlling for all other variables. The impact is relatively small.
Being admitted to the hospital for at least one night in the past 12 months is associated with an estimated effect on the log odds of dentistvisit of 0.72, with a standard error of 0.36. Alternatively, it is associated with a higher estimated odds ratio effect of 2 times for dentistvisit, with a 95% confidence interval of (1, 4.13). However, it’s important to note that the effect is very small.
8.1.13 Nomogram of Model
Code
fun.1<-function(x) 1-plogis(x)fun.3<-function(x) plogis(x - m_lrm$coef[1] + m_lrm$coef[2])plot(nomogram(m_lrm,fun=list('Prob Y = 1 (Low)'= fun.1, 'Prob Y = 3 (High)'= fun.3)))
8.1.14 Testing the Proportional Odds Assumption
We will apply the equivalent multinomial logit model to assess the proportional odds assumption. We will build this fit with the multinomial model. This model does not adhere to our assumption.
# weights: 18 (10 variable)
initial value 1985.192406
iter 10 value 1549.164657
iter 20 value 1430.936986
final value 1430.936934
converged
Code
m_multi
Call:
multinom(formula = dentistvisit ~ age + er_visits + hospitalstay,
data = nsch22_order_train)
Coefficients:
(Intercept) age er_visits1 visit
1 visit 2.691706 -0.05716066 0.01654727
2 or more visits 3.047038 0.01833096 -0.22362763
er_visits2 or more visits hospitalstayno
1 visit -1.073470 0.3349657
2 or more visits -1.107578 -0.4428863
Residual Deviance: 2861.874
AIC: 2881.874
We will calculate the log likelihood values for each of our models by comparing the log likelihoods of the multinomial model and the proportional odds logistic regression model. The multinomial logit model includes 2 intercepts and 12 slopes, while the proportional odds logit model includes 2 intercepts and 6 slopes. Therefore, the difference in parameters between the two models is 6.
As indicated by the p-value of 0.029, the proportional odds logistic regression model provides a less proper fit compared to the more complex multinomial logit model.
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: reciprocal condition number 1.2889e-28
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: There are other near singularities as well. 1.01
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
0.995
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
1.005
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
number 1.2889e-28
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : There are other near
singularities as well. 1.01
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: pseudoinverse used at 0.995
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: reciprocal condition number 1.2889e-28
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: There are other near singularities as well. 1.01
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
0.995
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
1.005
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
number 1.2889e-28
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : There are other near
singularities as well. 1.01
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: pseudoinverse used at 0.995
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: reciprocal condition number 1.2889e-28
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: There are other near singularities as well. 1.01
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
0.995
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
1.005
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
number 1.2889e-28
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : There are other near
singularities as well. 1.01
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: pseudoinverse used at 0.995
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: reciprocal condition number 1.2889e-28
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: There are other near singularities as well. 1.01
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
0.995
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
1.005
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
number 1.2889e-28
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : There are other near
singularities as well. 1.01
Code
par(mfrow=c(1,1))
8.1.15 AIC and BIC
Code
AIC(m_polr, m_multi)
df AIC
m_polr 6 2887.925
m_multi 10 2881.874
Code
BIC(m_polr,m_multi)
df BIC
m_polr 6 2920.922
m_multi 10 2936.868
m_multi demonstrates a slightly superior AIC. However, in terms of BIC, m_polroutperform m_multi. Based on the initial assumption that our outcome is an ordered factor, we will proceed with predictions using the polr model.
Note:
We attempted to perform traditional validation using a separate test set, but this approach proved unsuitable because our outcome variable dentistvisit is an ordered categorical factor. Traditional validation methods require numerical output for predictions, which is not possible with this type of outcome.
8.1.16 Cross Tabulation Approach
Predicted vs. Observed: How well does the model fit?
Next, we will apply our proportional odds logistic regression model, m_polr, to predict the frequency of preventive dental visits for each of the 603 subjects in our test sample nsch22_order_test.
Observed Values
Predicted No preventive visits 1 visit 2 or more visits Sum
No preventive visits 0 0 0 0
1 visit 8 49 38 95
2 or more visits 12 179 317 508
Sum 20 228 355 603
The polr model m_polr predicts the dentistvisit category correctly for (0+49+317)=366 out of 603 children in the sample, accounting for 60.6% of the children in the test sample.
Our model achieves the following accurate predictions:
0% correct for children who actually never had a preventive dental care visit.
51.6% correct for children who actually had one preventive dental care visit (49 out of 95).
62.4% correct for children who actually had two or more preventive dental care visits (317 out of 508).
Here, we’ll fit the appropriate m_polr and see what predictions it makes across our 1807 subjects in training sample nsch22_order_train.
Observed Values
Predicted No preventive visits 1 visit 2 or more visits Sum
No preventive visits 0 0 0 0
1 visit 13 160 106 279
2 or more visits 53 559 916 1528
Sum 66 719 1022 1807
The polr model m_polr predicts the dentistvisit category correctly for (0+160+916)= 1076 out of 1807 children in the sample, accounting for 60 % of the children in the training sample.
Our model achieves the following accurate predictions:
0% correct for children who actually never had a preventive dental care visit.
57.3% correct for children who actually had one preventive dental care visit (160 out of 279).
60 % correct for children who actually had two or more preventive dental care visits (916 out of 1528).
8.1.17 Making Predictions
We will utilize our model, m_polr, to predict the frequency of preventive dental visits for two hypothetical children: - Child 1: 8 years old, 4 or more ER visits, and hospitalized. - Child 2: 15 years old, 4 or more ER visits, and hospitalized.
Code
temp.dat <-data.frame(name =c("1", "2"), age =c(8,15), er_visits =c("2 or more visits", "2 or more visits"), hospitalstay =c("yes", "yes"))predict(m_polr, temp.dat, type ="p")
No preventive visits 1 visit 2 or more visits
1 0.02630921 0.3382051 0.6354857
2 0.01684620 0.2498815 0.7332723
Based on our model’s predictions m_polr, child 1 (8 years old) has a higher probability of having one preventive dental visit, while Child 2 (15 years old) has a higher probability of having two or more preventive dental visits.
8.1.18 Multiple Imputation Approach
The provided code use multiple imputation approach to account for the missing values in the nsch22_model_one data. It first generates multiple imputations using the aregImpute function and then fits a logistic regression model lrm to the imputed data using the fit.mult.imputefunction, yielding the output m_imp
Logistic Regression Model
fit.mult.impute(formula = dentistvisit ~ age + er_visits + hospitalstay,
fitter = lrm, xtrans = imp_fit22, data = nsch22_model_one,
pr = FALSE, fitargs = list(x = TRUE, y = TRUE))
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 2410 LR chi2 65.23 R2 0.033 C 0.590
No preventive visits86 d.f. 4 R2(4,2410)0.025 Dxy 0.180
1 visit 947 Pr(> chi2) <0.0001 R2(4,1814.1)0.033 gamma 0.188
2 or more visits1377 Brier 0.238 tau-a 0.093
max |deriv| 1e-05
Coef S.E. Wald Z Pr(>|Z|)
y>=1 visit 3.1839 0.3328 9.57 <0.0001
y>=2 or more visits 0.1296 0.3175 0.41 0.6830
age 0.0650 0.0086 7.53 <0.0001
er_visits=1 visit -0.1980 0.1293 -1.53 0.1258
er_visits=2 or more visits -0.2857 0.2538 -1.13 0.2603
hospitalstay=no -0.5017 0.3018 -1.66 0.0964
The The Nagelkerke R^2 value remains similar with that obtained from single imputation (0.035). Similarly, the C statistic of 0.591 almost matches the result from single imputation, indicating a lack of predictive ability in the model.
8.1.19 Comparing the Coeffiecnts (exponentiated)
Code
round_half_up(exp(m_lrm$coefficients),3)
y>=1 visit y>=2 or more visits
29.434 1.387
age er_visits=1 visit
1.067 0.804
er_visits=2 or more visits hospitalstay=no
0.747 0.486
Code
round_half_up(exp(m_imp$coefficients),3)
y>=1 visit y>=2 or more visits
24.141 1.138
age er_visits=1 visit
1.067 0.820
er_visits=2 or more visits hospitalstay=no
0.751 0.606
Comparing the lrm models obtained from single vs. multiple imputation reveals that the coefficients for age and er_visits (2 times or more category) are relatively similar. However, the coefficients for other variables show some differences between the two imputation methods.
8.2 Analysis 2
The outcome for this analysis is cavities. The predictors are er_visits, hospitalstay, and med_use.
We will start by creating a subset for the logistic analysis.
# A tibble: 3,000 × 5
hhid er_visits hospitalstay med_use cavities
<chr> <fct> <fct> <fct> <fct>
1 22017610 None no no no
2 22075400 1 visit no no yes
3 22103566 None no yes no
4 22317899 None no yes <NA>
5 22270255 None no no no
6 22308731 None no no no
7 22304506 None no no no
8 22287109 None no no no
9 22261632 None no no no
10 22020482 None no yes no
# ℹ 2,990 more rows
8.2.1 Prepare the binary outcome
We will observe the cavities outcome distribution in the study sample.
There are 2,391 subjects with complete cases out of the 3000 study sample subjects. 11% of the children guardians/ parents reported that they have experienced chronic cavities, while the remining 89% did not.
8.2.2 Missingness
First we will observe the missingness in the data set:
Code
miss_var_summary(nsch22_log) |>gt()
variable
n_miss
pct_miss
hospitalstay
14
0.467
cavities
14
0.467
er_visits
10
0.333
med_use
2
0.0667
hhid
0
0
The proportion of missing data in this sample is about 1.33%. The outcome cavities has the highest proportion of missing cases 0.47%, as well as the predictor hospitalstay followed by the other predictors er_visits and med_use (0.33 and 0.06). Overall, the proportion of missingness is minute. Due to the small proportion of missingness, we can assume that the data is missing at random (MAR).
For the sake of this project, we will replace the missing values through the multiple imputation method using aregImpute.
Logistic Regression Model
fit.mult.impute(formula = (cavities == "yes") ~ er_visits + hospitalstay +
med_use, fitter = lrm, xtrans = nsch22_imp, data = nsch22_raw,
pr = FALSE, fitargs = list(x = TRUE, y = TRUE))
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 3000 LR chi2 11.38 R2 0.008 C 0.548
FALSE 2718 d.f. 4 R2(4,3000)0.002 Dxy 0.097
TRUE 282 Pr(> chi2) 0.0227 R2(4,768.9)0.010 gamma 0.189
max |deriv| 6e-05 Brier 0.085 tau-a 0.016
Coef S.E. Wald Z Pr(>|Z|)
Intercept -2.3388 0.4543 -5.15 <0.0001
er_visits=1 visit -0.0737 0.1976 -0.37 0.7091
er_visits=2 or more visits -0.0377 0.3531 -0.11 0.9150
hospitalstay=no 0.4844 0.4483 1.08 0.2799
med_use=no -0.4938 0.1474 -3.35 0.0008
Code
round_half_up(exp(m2_ord$coefficients),3)
Intercept er_visits=1 visit
0.096 0.929
er_visits=2 or more visits hospitalstay=no
0.963 1.623
med_use=no
0.610
We will observe the size effects through this plot:
Code
plot(summary(m2_ord))
Our model predicts that children that have visited the ER in the hospital (1 visit and 2 or more visits) have lower odds of experiencing toothaches (0.929 and 0.963 respectively) compared to those with no history of ER visits. It also predicts that children that have been admitted to stay for at least one night in the hospital have lower odds (0.62) of experiencing chronic cavities compared to children with no history of hospital admission. However, children using medication have (1.61) the odds of experiencing chronic cavities. All of these assumptions are made while holding other factors constant.
All confidence intervals, with the exception of med_use, cross the value of 1. This indicates that the medication use variable is the only variable associate with the outcome and therefore contributes to the model predictability.
We will validate our model using the Bootstrap method.
Code
set.seed(432)validate(m2_ord, B =50)
index.orig training test optimism index.corrected Lower Upper
Dxy 0.0966 0.1091 0.0854 0.0238 0.0728 0.0247 0.1248
R2 0.0082 0.0120 0.0062 0.0058 0.0024 -0.0073 0.0124
Intercept 0.0000 0.0000 -0.5156 0.5156 -0.5156 -1.3299 1.1362
Slope 1.0000 1.0000 0.7703 0.2297 0.7703 0.3971 1.4355
Emax 0.0000 0.0000 0.2103 -0.2103 0.2103 -0.0414 0.5239
D 0.0035 0.0052 0.0025 0.0027 0.0008 -0.0038 0.0054
U -0.0007 -0.0007 0.0003 -0.0010 0.0003 -0.0011 0.0032
Q 0.0041 0.0059 0.0022 0.0037 0.0004 -0.0072 0.0057
B 0.0848 0.0847 0.0850 -0.0003 0.0851 0.0752 0.0925
g 0.1655 0.2112 0.1541 0.0571 0.1084 0.0093 0.2117
gp 0.0153 0.0187 0.0136 0.0051 0.0102 0.0019 0.0199
n
Dxy 50
R2 50
Intercept 50
Slope 50
Emax 50
D 50
U 50
Q 50
B 50
g 50
gp 50
The C statistic = 0.5 + (0.0705/2) = 0.53525 With a low AUC value, we can see that the model has a poor predictive performance. The Nagelkerke score of 0.002 further supports this conclusion regarding the model.
We will check whether collinearity is a problem.
Code
rms::vif(m2_ord)
er_visits=1 visit er_visits=2 or more visits
1.025313 1.083527
hospitalstay=no med_use=no
1.092496 1.024044
The value of inflation is not concerning for the presented predictors. Using an interaction term is not necessary in this case.
8.2.4 Bayesian logistic model
For the Bayesian method, the data source should be a data frame. With the proportion of missingness being small (1.33%), we can use a complete data set nsch22_cc.
We can also visualize the coefficients and credible intervals.
Code
plot(m2_bayes, prob =0.5, prob_outer =0.95)
To compare the ordinary logistic model results with the Bayesian logistic model’s, we will produce a table with the exponeniated coefficients. This will also help in interpreting the odds ratio.
Similar the regular logistic regression model, the Bayesian model predicts that children with a history of ER visits have the same odds of experiencing chronic cavities compared to children with no ER visit history (1.01 and 0.98). The same findings can be observed with the hospital overnight stay (1.04). Contrary to the regular logistic regression model, the medication use is predicted to have the same odds (0.97). All confidence intervals cross the value of one, indicating these variables have weak association with the outcome (chronic cavities). All of these assumptions were made while holding other variables constant.
Graphical Posterior Predictive Checks
Code
pp_check(m2_bayes, nreps =10)
Code
pp_check(m2_bayes, plotfun ="stat_2d", stat =c("mean", "sd"))
Note: in most cases the default test statistic 'mean' is too weak to detect anything of interest.
9 Conclusions and Discussion
9.1 Analysis One
What are the odds of children with hospitalization records of receiving preventive dental care when other variables (age) are taken into account?
We hypothesized that children who experienced frequent ER visits and hospitalization might be more likely to receive dental care, and that younger children might be more likely to receive frequent dental care visits compared to older children.
The proportional odds logistic regression model (POLR) revealed that age plays a role in predicting dental care visits. The model predicts that the likelihood of a child seeking preventive dental care increases by 1.067 times for each additional year of age. The impact of hospital ER visits was surprising. While we anticipated a positive association, the results showed a slight decrease in the odds of dental visits with each additional ER visit. However, for children with frequent hospital admission, the model suggested a higher likelihood of dental care than those without history of hospitalization visits. However, the small effect sizes across all these factors indicate their limited individual impact.
Our POLR predicting preventive dental visits with three levels (no preventive visits,1 visit,2 or more visits) using age, er_visits and hospitalstay has relatively poor or weak predictive ability. The validated 𝑅^ 2 of 0.0310, and a validated C of 0.5 + (0.1735/2) = 0.586 indicate that the model fails to fit our data well. However, it was able to correctly predict dental visit categories to some extent, particularly for children with one and two or more visits (51.6% and 62%, respectively).
Accounting for more sample size and incorporating additional factors beyond those currently included is crucial to enhancing the model’s performance. Socioeconomic status and access to healthcare are potential variables that could contribute to predicting preventive dental care among children in the US.
9.2 Analysis Two
What are the odds of children with hospitalization records of experiencing chronic cavities, when other variables (ER visits and medication) are taken into account?
In this study, we hypothesized that children with hospitalization history have higher odds of experiencing chronic cavities. The same assumption was made for children with a history of medication use. The ordinary logistic model predicted that children with ER visitation history have the same odds (0.929 and 0.963) of experiencing chronic cavities, compared to children without history. As hypothesized, children who use medication have higher odds (1.61) of chronic cavities. Children admitted to a hospital for at least one night had lower odds (0.62). In the ordinary logistic regression model, the medication use variable was the only variable with a confidence interval excluding the value of 1, indicating high association with the outcome.
The Bayes logistic regression produced the same results regarding the ER visits and hospital stay variables. However, the results were different for the medication use variable. The model predicted that children with 1 visit or 2 or more ER visits have the same odds (1.01 and 0.98) of children with no history of ER visits. The same findings can be observed with the hospital overnight stay (1.04) and medication use (0.97). All confidence intervals contain the value of one, indicating these variables have a weak association with the outcome (chronic cavities).
The Bayes regression model presents a better understanding of the findings. This AUC value of the model was very low (0.56) and the Nagelkerke R2 value (0.002) indicate weak predictive performance of the model. Therefore, the predictions produced from the model are not reliable. The predictors have a weak association with the outcome and as a result cannot contribute meaningfully to the model predictability.
Based on these findings, future studies regarding children with hospitalization records should use scores to evaluate the level of the health status. Objective methods evaluating oral health (such as the DMFT score) can provide reliable results compared to subjective measures (reported cavities experience).
10 References and Acknowledgments
10.1 References
The data set was pulled from the National Survey for Child Health year 2022. Data Source: https://www.childhealthdata.org/learn-about-the-nsch/nsch-codebooks
10.2 Acknowledgments
First and foremost, we would like to express our deepest gratitude to Professor Love for his excellent teaching, unwavering support, and the abundance of resources he provided throughout the semester. His patience and guidance were invaluable. We also extend our heartfelt appreciation to Alex, Miza, Monica, Lindsey, and all the TAs for their tireless assistance and willingness to address all our concerns throughout the semester, especially during the project period.
---title: "Preventive Dental Care and Hospitalization Trends Among Children"subtitle: "2022 National Survey of Children’s Health (NSCH) Data"author: "Sarah Albalawi & Walaa Alshaia"date: last-modifiedformat: html: toc: true number-sections: true code-fold: show code-tools: true code-overflow: wrap embed-resources: true date-format: iso theme: zephyr---## R Packages and Setup```{r}#| message: false#| warning: falseknitr::opts_chunk$set(comment =NA)library(bayestestR); library(bestglm)library(car); library(caret); library(cutpointr)library(Epi)library(GGally); library(glmnet); library(gt)library(insight)library(janitor)library(lme4)library(MASS); library(mice); library(mosaic)library(naniar); library(nnet)library(patchwork); library(pROC); library(pscl)library(quantreg)library(ROCR); library(rstanarm)library(survey); library(survival); library(survminer)library(haven)library(broom.mixed)library(conflicted)library(rms)library(tidymodels)library(tidyverse)conflicts_prefer(dplyr::filter, dplyr::select, dplyr::summarize, dplyr::count, base::mean, base::sum, base::max, car::vif, Matrix::update, rms::Predict)options(dplyr.summarise.inform =FALSE)theme_set(theme_bw())```# BackgroundThe study uses data from the National Survey of Children's Health (NSCH), year 2022. It is a survey based data set with the main subject children aged 0-17. The population includes children from all 50 states and the District of Columbia. The survey is a self-administered web-based and paper-based questionnaire. The respondent is a parent or a guardian, where one child from each household is selected to be the subject of the questionnaire.This study focuses on children with hospitalization records. Medically compromised patients often face delay when receiving oral health care, which can negatively impact their oral and overall health status. The aim of this study is to assess whether being a child with frequent hospital visits, history of admission, and taking medication, impacts their access to preventive dental care.# Research QuestionsThe overarching purpose of this study is to investigate the access and oral health status of children with hospitalization history. Medically compromised children often face barriers to preventive dental care, which negatively impacts their health outcomes.**Research question 1:** What are the odds of children with hospitalization records of receiving preventive dental care when other variables (age) are taken into account?Hospital admission and ER visits were used to observe whether they present as barriers to preventive oral care access. Children with poor health status can often face difficulties accessing preventive dental care.**Research question 2:** What are the odds of children with hospitalization records of experiencing chronic cavities, when other variables (ER visits and medication) are taken into account?Hospital admission, ER visits, and medication use were added to observe potential associations. There has been dental reluctance in treating children with poor health status. Children with more hospital visits may face dental care neglect and consequently have higher odds of experiencing cavities# Data Ingest and Management## Loading the Raw Data```{r}nsch_2022e_topical <-read_dta("nsch_2022e_topical.dta")dim(nsch_2022e_topical)```### Sampling the DataWe will select a random sample of 3000 subjects out of the 54,103 original study population.```{r}set.seed(432)nsch22_raw <- nsch_2022e_topical |>slice_sample(n =3000)```## Cleaning the Data### Selecting Variables We'll Useselect the variables```{r}nsch22_raw <- nsch22_raw|> dplyr::select(hhid, sc_age_years, hospitaler, hospitalstay, sc_k2q10, dentistvisit, cavities)|> dplyr::mutate(across(where(is.numeric), as_factor))|> dplyr::mutate(hhid =as.character(hhid))|> dplyr::mutate(sc_age_years =as.numeric(sc_age_years))nsch22_raw```### Changing Variable Names```{r}nsch22_raw <- nsch22_raw |>rename(age = sc_age_years,med_use = sc_k2q10,er_visits = hospitaler)nsch22_raw```### Working with Categorical Predictors**er_visits**```{r}nsch22_raw |>tabyl (er_visits) ```The fourth categories has 15 observations. We will need to collapse the variable into three categories instead of four.```{r}nsch22_raw <- nsch22_raw |>mutate (er_visits=fct_collapse(er_visits, "3"=c(" 3", "4")))tabyl(nsch22_raw$er_visits)``````{r}nsch22_raw <- nsch22_raw |>mutate(er_visits =fct_recode(er_visits,"None"="1","1 visit"="2","2 or more visits"="3"))tabyl(nsch22_raw$er_visits)```**hospitalstay**```{r}nsch22_raw<- nsch22_raw|>mutate(hospitalstay =fct_recode (hospitalstay, "yes"="1", "no"="2"))tabyl(nsch22_raw$hospitalstay)```**med_use**```{r}nsch22_raw<- nsch22_raw|>mutate(med_use =fct_recode (med_use, "yes"="1", "no"="2"))tabyl(nsch22_raw$med_use)```### Working with Categorical Outcomes**dentistvisit**```{r}nsch22_raw <- nsch22_raw |>mutate(dentistvisit =fct_recode(dentistvisit,"No preventive visits"="1","1 visit"="2","2 or more visits"="3"), dentistvisit=relevel(dentistvisit, "No preventive visits", "1 visit", "2 or more visits"),dentistvisit=factor (dentistvisit, ordered =TRUE))is.ordered(nsch22_raw$dentistvisit)``````{r}tabyl(nsch22_raw$dentistvisit)```**cavities**```{r}nsch22_raw<- nsch22_raw|>mutate(cavities =fct_recode (cavities, "yes"="1", "no"="2"))tabyl(nsch22_raw$cavities)```### Arranging the TibbleWe will rename the cleaned sample as nsch22.```{r}nsch22 <- nsch22_rawdim(nsch22)``````{r}identical(nrow(nsch22), n_distinct(nsch22$hhid))```## Print and Save The Final Tibble```{r}write_rds(nsch22,"nsch22.Rds")```# Code Book and Description| Variable | Role | Type | Description ||:-------------|:-------------|:-------------|:------------------------------|| **hhid** | ID | \- | Unique Household ID || **dentistvisit** | **Outcome (multi-categorical regression)** | 3 Categories | The number of dental visits the child had during the past 12 months to seek preventive dental care: None, 1 visit, 2 or more visits || **cavities** | **Outcome (logistic regression)** | Binary | Child experience of chronic cavities during the past 12 Months (yes or no) || **age** | input | Quantitative | Child age in years || **er_visits** | input | 3 Categories | The number of child emergency room visits (None, 1 visit, 2 or more visits) || **hospitalstay** | input | Binary | Child admitted to a hospital for at least one night during past 12 months (yes or no) || **med_use** | **Outcome** | Binary | Child need for prescribed medicine by a doctor (yes/ no) |# Analyses## Analysis 1### Research Question for Analysis 1What are the odds of children with hospitalization records of receiving preventive dental care when other variables (age) are taken into account?Children who have experienced hospitalization might have a higher likelihood of receiving preventive dental care. This could be due to frequent ER visits and hospitalization often indicating a decline in the health status, which may lead to a more comprehensive approach to healthcare, encompassing preventive dental care. Furthermore, younger children may be more likely to receive preventive dental care due to parental involvement and regular check-ups, while older children might have more autonomy over their healthcare decisions and may be more proactive about preventive measures. We need to conduct a proper statistical analysis of relevant data to confirm those anticipations.We plan to use an ordinal logistic regression model to analyze the likelihood of children with history of ER visits and hospital admission receiving preventive dental care. This model will consider three factors that might influence the outcome: age, history of hospitalization , and the number of emergency room visits We will treat the outcome variable, `dentistvisit` as an **ordinal variable**, acknowledging that it has a natural order (No preventive visits ,1 visit, 2 or more visits).### Analysis 1 OutcomeOur outcome of interest is `dentistvisit`. It indicates how many visits during the past 12 months a child has seen a dentist or other oral health care provider for preventive dental care(No preventive visits in the past 12 months, none, 1 visit, 2 or more visits).```{r}nsch22 |>filter(is.na(dentistvisit) ==FALSE) |>nrow()```There are 2410 rows with complete data on `dentistvisit.`### Analysis 1 PredictorsFor Analysis 1, we will use four variables to predict `dentistvisit`:- `age`: child age in years- `er_visits`: How many times did this child visit a hospital emergency room? (None, 1 time, 2‐3 times, 4 or more times)- `hospitalstay`: Whether the child was admitted to the hospital to stay for at least one night during past 12 months (yes or no)### MissingnessFirst, we will generate a tibble that includes our variables of interest.```{r}nsch22_model_one <- nsch22 |>select(hhid, dentistvisit, age, er_visits, hospitalstay)```Then we'll use `miss_var_summary` to see how many observations are missing for each variable.```{r}miss_var_summary(nsch22_model_one)```We have 590 missing observations on the outcome `dentistvisit`.```{r}n_case_complete(nsch22_model_one); pct_complete_case(nsch22_model_one)```- 2401 (80%) of the 3000 subjects in the nsch22_model_one are complete.- The remaining 599 or 20% observations have some missing data on predictors or the outcome.- We will assume these data are **missing at random (MAR)** and proceed with single imputation.- We will filter by complete cases for the outcome.```{r}nsch22_model_one <- nsch22_model_one |>filter(complete.cases(dentistvisit))```### Single Imputation ApproachAs we have missing values in our predictors `hospitalstay` and `er_visits`, we conduct single imputation under the assumption that missing values are missing at random (MAR).```{r}set.seed(432)orderdata <- nsch22_model_one |>select(-hhid)nsch22_order_mice1 <-mice(orderdata, m =15, printFlag =FALSE)``````{r}nsch22_order_si <-complete(nsch22_order_mice1) |>tibble()``````{r}nsch22_order_si$hhid <- nsch22_model_one$hhiddim(nsch22_order_si); n_miss(nsch22_order_si); nsch22_order_si ```- The next step is to verify the single imputation with the multi-categorical variable `er_visits`:```{r}nsch22_model_one |>tabyl (er_visits) |>adorn_pct_formatting()``````{r}nsch22_order_si |>tabyl (er_visits) |>adorn_pct_formatting()```We have the same percentages for each level in both data sets.```{r}describe(nsch22_order_si)```Here's a breakdown of the observations from the `describe` analysis:**Hospital ER Visits:** Most children (2062) had no hospital ER visits in the past year. A smaller portion had one visit (274), even fewer had 2 or more visits or (74).**Hospital Admissions:** A large majority (2354) of the children had a prior hospitalization, while 59 had never been admitted.### Checking the Outcome (order,levels and distribution)- We will check if our outcome `dentistvisit` is an ordered factor.```{r}is.ordered(nsch22_order_si$dentistvisit)```- We will check if the first level `dentistvisit` is No preventive visits```{r}nsch22_order_si |>tabyl(dentistvisit)```- Bar Chart for `dentistvisit`We will build a graphical summary of our outcome `dentistvisit` distribution.```{r}ggplot(nsch22_order_si, aes(x = dentistvisit, fill = dentistvisit)) +geom_bar(aes(y = (after_stat(count)/sum(after_stat(count))))) +geom_text(aes(y = (after_stat(count))/sum(after_stat(count)), label = scales::percent((after_stat(count)) /sum(after_stat(count)))),stat ="count", vjust =1.5, color ="black", size =5) +scale_y_continuous(labels = scales::percent) +scale_fill_brewer(palette ="PuRd") +guides(fill ="none") +labs(title ="Preventive Dental Care",subtitle ="NSCH 22",x ="Dentist Visits",y ="Percentage",caption ="Number of children: 2410")```The graph reveals how preventive dental care is distributed among children in our NSCH 22 sample. Notably, the majority of children received dental care, with two or more visits being the most common frequency. While one visit was also relatively frequent, only 4% of children in the survey reported never having had preventive dental care in the past year.```{r}ggplot(nsch22_order_si, aes(x = dentistvisit, y = age, fill = dentistvisit)) +geom_violin(trim =TRUE) +geom_boxplot(col ="white", width =0.2) +scale_fill_brewer(palette ="PuRd") +guides(fill ="none")```### Partitioning the DataAfter dealing with missing values, we will split the data into training and testing samples with `resample` function. We want 75% in the training sample and 25% in the testing sample.```{r}#| echo: trueset.seed(4322024)nsch22_o_splits <-initial_split(nsch22_order_si, prop =3/4)nsch22_order_train <-training(nsch22_o_splits)nsch22_order_test <-testing(nsch22_o_splits)```We want to look at the distribution of subjects in both splits.```{r}dim(nsch22_order_train); dim(nsch22_order_test)```Also, we want to check the breakdown of the distribution of preventive dental visits across different categories for both the training and test data sets.```{r}#| echo: truensch22_order_train |>tabyl(dentistvisit)nsch22_order_test |>tabyl(dentistvisit)```### Scatterplot Matrix and CollinearityWe will evaluate collinearity between predictors via scatter plot matrix and variance inflation factors.- Scatterplot Matrix:```{r}#| message: false#| warning: false#| fig-height: 8GGally::ggpairs(nsch22_order_train |>select (age, er_visits, hospitalstay, dentistvisit))```The correlation among the predictors is fairly modest, and the scatter plot matrix reveals no major concerns regarding collinearity among predictors.- Checking Variance Inflation Factors:```{r}vif(polr(dentistvisit~ age+ er_visits+ hospitalstay, data= nsch22_order_train, Hess=TRUE))```The predictors’ generalized variance inflation factors (GVIF) demonstrate a small level of collinearity. We therefore have no serious concerns with respect to collinearity.### Models Fiting- Fit proportional odds logistic regression modelOur model, denoted as `m_polr`, uses proportional odds logistic regression to predict `dentistvisit.` This prediction is made considering these factors: age, er visits, and hospitalstay, utilizing the `polr` function. Additionally, we provide a summary of the model.Based on prior analysis, we confirm that our outcome variable is an ordered factor.```{r}m_polr <-polr(dentistvisit~ age+ er_visits+ hospitalstay, data= nsch22_order_train, Hess=TRUE)summary(m_polr)```Our `m_polr` model includes two intercepts to accommodate the three levels of dentistvisit, along with six slopes corresponding to our four predictors.- Summarizing the Fit```{r}# model including covariatesglance(m_polr) |>gt() |>fmt_number(columns = logLik:deviance, decimals =3) |>tab_options(table.font.size =20)```### Interpreting the ModelWe will exponentiate our model coefficients to facilitate interpretation.```{r}tidy(m_polr, exponentiate =TRUE, conf.int =TRUE) |>filter(coef.type =="coefficient") |>gt() |>fmt_number(estimate:conf.high, decimals =3)```**Coefficients Interpretation:**If we compare two children, A and B, with the same values of the other predictors:- If child A is one year older than child B, then the model predicts that child A will have 1.067 times higher odds of having more preventive dental care visits with 95% CI:(1.047, 1.088).The influence of hospital ER visits, whether occurring once or 2 times or more, compared to none, is estimated to have coefficients of approximately 0.8 and 0.75, respectively. This suggests that when holding other variables constant, each additional ER visit is associated with a slight decrease in the odds of attending dental appointments, by approximately 0.80 and 0.75, respectively. The 95% CI for these estimates are (0.60,1.076) for the first scenario and (0.421, 1.334) for the latter. Notably, these effects are deemed to be relatively small.For children with no history of hospitalization, the odds ratio of 0.486 indicates that they have lower odds of preventive dental care visits with 95% CI:(0.234, 0.954), holding all other variables constant. This effect is considered to be relatively minor.### Validating Models- Using `lrm` for Proportional Odds Logistic RegressionWe will build a model using `lrm` tool from the `rms` package```{r}d <-datadist(nsch22_order_train)options(datadist ="d")m_lrm <-lrm(dentistvisit~ age+ er_visits+ hospitalstay, data= nsch22_order_train, x = T, y = T)m_lrm```The *Nagelkerke* R\^2 value is suboptimal, measuring at 0.035. The C statistic of 0.590 suggests that the model fails in terms of predictive ability.We validate the summary statistics with `validate.````{r}set.seed(4322023)validate(m_lrm)```We obtain a validated 𝑅\^2 of 0.0310 , and a validated C statistic of 0.5 + (0.1735/2) = 0.586, which suggests a that the model fails in terms of predictive ability.### Effects PlotBefore proceeding with any predictions, we present a plot illustrating the effect size of our model. The summary function applied to the `lrm` fit showcases the effect size by comparing the 25th percentile to the 75th percentile.```{r}plot(summary(m_lrm))``````{r}summary(m_lrm)```According to this model, when the age increases from 6 to 15 years old the odds of having more `dentistvisit` are estimated to be 1.8 times higher, with a 95% CI of (1.5, 2.1), while keeping all other variables constant. Additionally, the effect of being a 15-year-old child on the log odds of `dentistvisit` is estimated to be 0.58, with a SE of 0.1.When the child's number of ER visits increases from none to once in the past 12 months, this is associated with an estimated effect on the log odds of `dentistvisit` of -0.21, with a SE of 0.15. This translates to an odds ratio of 0.80, indicating that children with one ER visit have 80% lower odds of having preventive dental care compared to children with no ER visits, with 95% CI (0.60, 1.07), while keeping all other variables constant. The impact is relatively minor.When the child's number of ER visits increases from none to 2 times or more in the past 12 months, this is associated with an estimated effect on the log odds of dentist visits of -0.29, with a SE of 0.29 Alternatively, it is associated with an estimated odds ratio of 0.74, indicating lower odds of `dentistvisits`, with a 95% CI of (0.42, 1.32), while controlling for all other variables. The impact is relatively small.Being admitted to the hospital for at least one night in the past 12 months is associated with an estimated effect on the log odds of `dentistvisit` of 0.72, with a standard error of 0.36. Alternatively, it is associated with a higher estimated odds ratio effect of 2 times for `dentistvisit`, with a 95% confidence interval of (1, 4.13). However, it's important to note that the effect is very small.### Nomogram of Model```{r}fun.1<-function(x) 1-plogis(x)fun.3<-function(x) plogis(x - m_lrm$coef[1] + m_lrm$coef[2])plot(nomogram(m_lrm,fun=list('Prob Y = 1 (Low)'= fun.1, 'Prob Y = 3 (High)'= fun.3)))```### Testing the Proportional Odds AssumptionWe will apply the equivalent multinomial logit model to assess the proportional odds assumption. We will build this fit with the multinomial model. This model does not adhere to our assumption.```{r}m_multi <-multinom(dentistvisit~ age+ er_visits+ hospitalstay, data= nsch22_order_train)m_multi```We will calculate the log likelihood values for each of our models by comparing the log likelihoods of the multinomial model and the proportional odds logistic regression model. The multinomial logit model includes 2 intercepts and 12 slopes, while the proportional odds logit model includes 2 intercepts and 6 slopes. Therefore, the difference in parameters between the two models is 6.```{r}LL_1 <-logLik(m_polr)LL_1m <-logLik(m_multi)(G <--2* (LL_1[1] - LL_1m[1]))``````{r}pchisq(G, 6, lower.tail =FALSE)```As indicated by the p-value of 0.029, the proportional odds logistic regression model provides a less proper fit compared to the more complex multinomial logit model.```{r}par(mfrow =c(2,2))resid(m_lrm, 'score.binary', pl=TRUE)par(mfrow=c(1,1))```### AIC and BIC```{r}AIC(m_polr, m_multi)``````{r}BIC(m_polr,m_multi)````m_multi` demonstrates a slightly superior AIC. However, in terms of BIC, `m_polr`outperform `m_multi.` Based on the initial assumption that our outcome is an ordered factor, we will proceed with predictions using the polr model.**Note:**We attempted to perform traditional validation using a separate test set, but this approach proved unsuitable because our outcome variable `dentistvisit` is an ordered categorical factor. Traditional validation methods require numerical output for predictions, which is not possible with this type of outcome.### Cross Tabulation Approach- Predicted vs. Observed: How well does the model fit?Next, we will apply our proportional odds logistic regression model, `m_polr`, to predict the frequency of preventive dental visits for each of the 603 subjects in our test sample `nsch22_order_test.````{r}addmargins(table(predict(m_polr, newdata = nsch22_order_test), nsch22_order_test$dentistvisit,dnn =c("Predicted", "Observed Values")))```The polr model `m_polr` predicts the `dentistvisit` category correctly for (0+49+317)=366 out of 603 children in the sample, accounting for 60.6% of the children in the test sample.**Our model achieves the following accurate predictions:**- 0% correct for children who actually never had a preventive dental care visit.- 51.6% correct for children who actually had one preventive dental care visit (49 out of 95).- 62.4% correct for children who actually had two or more preventive dental care visits (317 out of 508).Here, we’ll fit the appropriate `m_polr` and see what predictions it makes across our 1807 subjects in training sample `nsch22_order_train`.```{r}addmargins(table(predict(m_polr), nsch22_order_train$dentistvisit,dnn =c("Predicted", "Observed Values")))```The polr model `m_polr` predicts the `dentistvisit` category correctly for (0+160+916)= 1076 out of 1807 children in the sample, accounting for 60 % of the children in the training sample.**Our model achieves the following accurate predictions:**- 0% correct for children who actually never had a preventive dental care visit.- 57.3% correct for children who actually had one preventive dental care visit (160 out of 279).- 60 % correct for children who actually had two or more preventive dental care visits (916 out of 1528).### Making PredictionsWe will utilize our model, `m_polr`, to predict the frequency of preventive dental visits for two hypothetical children: - Child 1: 8 years old, 4 or more ER visits, and hospitalized. - Child 2: 15 years old, 4 or more ER visits, and hospitalized.```{r}temp.dat <-data.frame(name =c("1", "2"), age =c(8,15), er_visits =c("2 or more visits", "2 or more visits"), hospitalstay =c("yes", "yes"))predict(m_polr, temp.dat, type ="p")```Based on our model's predictions `m_polr`, child 1 (8 years old) has a higher probability of having one preventive dental visit, while Child 2 (15 years old) has a higher probability of having two or more preventive dental visits.### Multiple Imputation ApproachThe provided code use multiple imputation approach to account for the missing values in the `nsch22_model_one` data. It first generates multiple imputations using the `aregImpute` function and then fits a logistic regression model `lrm` to the imputed data using the `fit.mult.imputefunction`, yielding the output m_imp```{r}#| warning: falseset.seed(4322024)d2 <-datadist(nsch22_model_one)options(datadist ="d2")imp_fit22 <-aregImpute(~ dentistvisit+age+ er_visits+ hospitalstay,nk =c(0,3), tlinear =TRUE, data = nsch22_model_one,B =10, n.impute =10, pr =FALSE)m_imp <-fit.mult.impute(dentistvisit~age + er_visits+ hospitalstay,fitter = lrm, xtrans = imp_fit22, pr =FALSE,data = nsch22_model_one, fitargs=list(x=TRUE,y=TRUE))m_imp```The The *Nagelkerke* R\^2 value remains similar with that obtained from single imputation (0.035). Similarly, the C statistic of 0.591 almost matches the result from single imputation, indicating a lack of predictive ability in the model.### Comparing the Coeffiecnts (exponentiated)```{r}round_half_up(exp(m_lrm$coefficients),3)``````{r}round_half_up(exp(m_imp$coefficients),3)```Comparing the `lrm` models obtained from single vs. multiple imputation reveals that the coefficients for `age` and `er_visits` (2 times or more category) are relatively similar. However, the coefficients for other variables show some differences between the two imputation methods.## Analysis 2The outcome for this analysis is `cavities`. The predictors are `er_visits`, `hospitalstay`, and `med_use`.We will start by creating a subset for the logistic analysis.```{r}nsch22_log <- nsch22|>select(hhid, er_visits, hospitalstay, med_use, cavities)nsch22_log```### Prepare the binary outcomeWe will observe the `cavities` outcome distribution in the study sample.```{r}nsch22_cc <- nsch22|>drop_na()``````{r}ggplot(nsch22_cc, aes(x = cavities, fill = cavities)) +geom_bar(aes(y = (after_stat(count)/sum(after_stat(count))))) +geom_text(aes(y = (after_stat(count))/sum(after_stat(count)), label = scales::percent((after_stat(count)) /sum(after_stat(count)))),stat ="count", vjust =1.5, color ="black", size =5) +scale_y_continuous(labels = scales::percent) +scale_fill_brewer(palette ="PuRd") +guides(fill ="none") +labs(title ="Chronic Cavities Experience",subtitle ="NSCH 22",x ="Chronic Cavities Experience",y ="Percentage",caption ="Number of children: 2391")```There are 2,391 subjects with complete cases out of the 3000 study sample subjects. 11% of the children guardians/ parents reported that they have experienced chronic cavities, while the remining 89% did not.### MissingnessFirst we will observe the missingness in the data set:```{r}miss_var_summary(nsch22_log) |>gt()```The proportion of missing data in this sample is about 1.33%. The outcome `cavities` has the highest proportion of missing cases 0.47%, as well as the predictor `hospitalstay` followed by the other predictors `er_visits` and `med_use` (0.33 and 0.06). Overall, the proportion of missingness is minute. Due to the small proportion of missingness, we can assume that the data is missing at random (MAR).For the sake of this project, we will replace the missing values through the multiple imputation method using `aregImpute`.```{r}set.seed(432)dd <-datadist(nsch22_raw)options(datadist ="dd")nsch22_imp <-aregImpute(~ cavities + hospitalstay + er_visits + med_use, nk =c(0, 3:5), tlinear =FALSE, data = nsch22_raw,B =10, n.impute =3, pr =FALSE)```### Ordinary logistic modelFirst, we will fit an ordinary logistic model `m2_ord`.```{r}m2_ord <-fit.mult.impute((cavities =="yes") ~ er_visits + hospitalstay + med_use,fitter = lrm, xtrans = nsch22_imp, data = nsch22_raw, fitargs =list(x =TRUE, y =TRUE), pr =FALSE)m2_ord``````{r}round_half_up(exp(m2_ord$coefficients),3)```We will observe the size effects through this plot:```{r}plot(summary(m2_ord))```Our model predicts that children that have visited the ER in the hospital (1 visit and 2 or more visits) have lower odds of experiencing toothaches (0.929 and 0.963 respectively) compared to those with no history of ER visits. It also predicts that children that have been admitted to stay for at least one night in the hospital have lower odds (0.62) of experiencing chronic cavities compared to children with no history of hospital admission. However, children using medication have (1.61) the odds of experiencing chronic cavities. All of these assumptions are made while holding other factors constant.All confidence intervals, with the exception of `med_use`, cross the value of 1. This indicates that the medication use variable is the only variable associate with the outcome and therefore contributes to the model predictability.We will validate our model using the Bootstrap method.```{r}set.seed(432)validate(m2_ord, B =50)```The C statistic = 0.5 + (0.0705/2) = 0.53525 With a low AUC value, we can see that the model has a poor predictive performance. The Nagelkerke score of 0.002 further supports this conclusion regarding the model.We will check whether collinearity is a problem.```{r}rms::vif(m2_ord)```The value of inflation is not concerning for the presented predictors. Using an interaction term is not necessary in this case.### Bayesian logistic modelFor the Bayesian method, the data source should be a data frame. With the proportion of missingness being small (1.33%), we can use a complete data set `nsch22_cc`.We will fit the Bayesian logistic model `m2_bayes`:```{r}set.seed(432)m2_bayes <-stan_glm((cavities =="yes") ~ er_visits + hospitalstay + med_use, data = nsch22_cc, refresh =0)summary(m2_bayes)```We will extract the posterior```{r}posteriors <-get_parameters(m2_bayes)head(posteriors)```A summary of the posterior distribution can be observed below.```{r}describe_posterior(posteriors, test =c("pd", "ROPE")) |>print_md(decimals =3)```A summary of the prior distribution can be also be produced.```{r}describe_prior(m2_bayes) |>print_html(decimals =3)```We can also visualize the coefficients and credible intervals.```{r}plot(m2_bayes, prob =0.5, prob_outer =0.95)```To compare the ordinary logistic model results with the Bayesian logistic model's, we will produce a table with the exponeniated coefficients. This will also help in interpreting the odds ratio.```{r}broom.mixed::tidy(m2_bayes, exponentiate =TRUE, conf.int =TRUE, conf.level =0.95) |>gt() |>fmt_number(decimals =3)```Similar the regular logistic regression model, the Bayesian model predicts that children with a history of ER visits have the same odds of experiencing chronic cavities compared to children with no ER visit history (1.01 and 0.98). The same findings can be observed with the hospital overnight stay (1.04). Contrary to the regular logistic regression model, the medication use is predicted to have the same odds (0.97). All confidence intervals cross the value of one, indicating these variables have weak association with the outcome (chronic cavities). All of these assumptions were made while holding other variables constant.Graphical Posterior Predictive Checks```{r}pp_check(m2_bayes, nreps =10)``````{r}pp_check(m2_bayes, plotfun ="stat_2d", stat =c("mean", "sd"))```# Conclusions and Discussion## Analysis OneWhat are the odds of children with hospitalization records of receiving preventive dental care when other variables (age) are taken into account?We hypothesized that children who experienced frequent ER visits and hospitalization might be more likely to receive dental care, and that younger children might be more likely to receive frequent dental care visits compared to older children.The proportional odds logistic regression model (POLR) revealed that age plays a role in predicting dental care visits. The model predicts that the likelihood of a child seeking preventive dental care increases by 1.067 times for each additional year of age. The impact of hospital ER visits was surprising. While we anticipated a positive association, the results showed a slight decrease in the odds of dental visits with each additional ER visit. However, for children with frequent hospital admission, the model suggested a higher likelihood of dental care than those without history of hospitalization visits. However, the small effect sizes across all these factors indicate their limited individual impact.Our POLR predicting preventive dental visits with three levels (no preventive visits,1 visit,2 or more visits) using age, er_visits and hospitalstay has relatively poor or weak predictive ability. The validated 𝑅\^ 2 of 0.0310, and a validated C of 0.5 + (0.1735/2) = 0.586 indicate that the model fails to fit our data well. However, it was able to correctly predict dental visit categories to some extent, particularly for children with one and two or more visits (51.6% and 62%, respectively).Accounting for more sample size and incorporating additional factors beyond those currently included is crucial to enhancing the model's performance. Socioeconomic status and access to healthcare are potential variables that could contribute to predicting preventive dental care among children in the US.## Analysis TwoWhat are the odds of children with hospitalization records of experiencing chronic cavities, when other variables (ER visits and medication) are taken into account?In this study, we hypothesized that children with hospitalization history have higher odds of experiencing chronic cavities. The same assumption was made for children with a history of medication use. The ordinary logistic model predicted that children with ER visitation history have the same odds (0.929 and 0.963) of experiencing chronic cavities, compared to children without history. As hypothesized, children who use medication have higher odds (1.61) of chronic cavities. Children admitted to a hospital for at least one night had lower odds (0.62). In the ordinary logistic regression model, the medication use variable was the only variable with a confidence interval excluding the value of 1, indicating high association with the outcome.The Bayes logistic regression produced the same results regarding the ER visits and hospital stay variables. However, the results were different for the medication use variable. The model predicted that children with 1 visit or 2 or more ER visits have the same odds (1.01 and 0.98) of children with no history of ER visits. The same findings can be observed with the hospital overnight stay (1.04) and medication use (0.97). All confidence intervals contain the value of one, indicating these variables have a weak association with the outcome (chronic cavities).The Bayes regression model presents a better understanding of the findings. This AUC value of the model was very low (0.56) and the Nagelkerke R2 value (0.002) indicate weak predictive performance of the model. Therefore, the predictions produced from the model are not reliable. The predictors have a weak association with the outcome and as a result cannot contribute meaningfully to the model predictability.Based on these findings, future studies regarding children with hospitalization records should use scores to evaluate the level of the health status. Objective methods evaluating oral health (such as the DMFT score) can provide reliable results compared to subjective measures (reported cavities experience).# References and Acknowledgments## ReferencesThe data set was pulled from the National Survey for Child Health year 2022. Data Source: https://www.childhealthdata.org/learn-about-the-nsch/nsch-codebooks## AcknowledgmentsFirst and foremost, we would like to express our deepest gratitude to Professor Love for his excellent teaching, unwavering support, and the abundance of resources he provided throughout the semester. His patience and guidance were invaluable. We also extend our heartfelt appreciation to Alex, Miza, Monica, Lindsey, and all the TAs for their tireless assistance and willingness to address all our concerns throughout the semester, especially during the project period.# Session Information```{r}xfun::session_info()```