2017 Youth Risk Behavior Survey: Analysis and Shiny App
The skills I demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
The Youth Risk Behavior Survey (YRBS) is a survey administered by the Centers for Disease Control and Prevention to monitor adolescent health-risk behaviors that may contribute to issues of major public health concern.
The survey was developed in the early 1990s and has been administered every two years to students at public, Catholic, and other private schools in the United States. It gives insight into the distribution of behaviors that contribute to injury and violence, sexual behaviors, alcohol and drug use, tobacco use, unhealthy dietary behaviors, and inadequate physical activity.
To monitor these behaviors, the YRBS uses a three-stage cluster sample design, stratified by racial/ethnic concentration and metropolitan statistical area status, to produce a representative sample of 9th through 12th grade students in the 50 states and the District of Columbia. From 1991 to 2017, the YRBS collected data from more than 3.8 million high school students through national, state, territorial, tribal, and local school-based surveys.
In 2017, schools were selected systematically with probability proportional to enrollment using a random start, resulting in 192 sampled schools. The overall response rates of the 2017 YRBS are shown in Figure 1 below:
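The selection rule described above is systematic sampling with probability proportional to size (PPS) from a random start, and it can be sketched in a few lines. This is a minimal illustration of a single sampling stage, not the CDC's sampling code. The function name `systematic_pps` and the enrollment figures are hypothetical.

```python
import random
from itertools import accumulate


def systematic_pps(sizes, n, start=None, seed=None):
    """Systematic PPS selection of n units (one sampling stage only).

    sizes: positive integer size measures, e.g. school enrollments (hypothetical).
    start: point in [0, interval); drawn at random when not supplied.
    Returns the indices of the selected units. A unit whose size exceeds the
    sampling interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n                      # sampling interval on the size scale
    if start is None:
        start = random.Random(seed).uniform(0, interval)
    cumulative = list(accumulate(sizes))      # running enrollment totals
    selected = []
    i = 0
    for k in range(n):
        point = start + k * interval          # start, start + interval, ...
        while cumulative[i] <= point:         # find the unit covering this point
            i += 1
        selected.append(i)
    return selected


# Four hypothetical schools; larger schools cover more of the size scale.
print(systematic_pps([100, 300, 200, 400], n=2, start=250))  # [1, 3]
```

Larger schools occupy a wider stretch of the cumulative enrollment scale. This is what makes each school's chance of selection proportional to its size.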
Purpose of Project
The safety and health of adolescents in the United States should always be a focus of public health. Destructive behaviors established in adolescence are usually carried into adulthood. This perpetuates the vicious cycle of health epidemics already plaguing the United States and creates new health problems that will burden both the individual and our healthcare system.
A perfect example of a new health issue in the spotlight today is the safety of vaping and electronic cigarette products. The rising number of lung injuries and deaths attributed to electronic smoking products is causing medical professionals to question the safety of these once "safe" alternatives to smoking. Curbing these behaviors in adolescence could help reduce the burden they may place on individuals and the health system in the future.
The purpose of this project is twofold. The first goal is to conduct a complete analysis of the 2017 YRBS data, using statistical testing to identify behaviors that are associated with one another. The second is to build a nearly complete interactive visualization of the data as a Shiny app in R. The outcomes of interest in this project are those related to weapons, dating violence, bullying, and drug use. Hopefully, the application can serve as an aid to adolescent health research.
Of the 14,765 students included in the 2017 YRBS data, about 51% were girls and 48.2% were boys. The median student was in 11th grade and 16 years old. The largest share of students identified as White (42.3%). Other races/ethnicities reported in the survey were African American (18.9%), Multiple Hispanic-Latino (14.2%), Hispanic-Latino (10.5%), Multiple non-Hispanic-Latino (5.5%), and Asian (4.4%). Roughly equal proportions reported being Native American/Alaskan Indian, being Hawaiian/Pacific Islander, or identifying as "None" (1.7%).
Features of Interest
Below is a table of the survey features this analysis focuses on. For each feature, the table lists the topic, the variable name used in the analysis, the associated survey question, and the question's wording.
Methods and Missing Values
All statistical tests and analyses were conducted in R using the "survey" package. This package accounts for the complex 3-stage cluster design of the YRBS through a "design" statement that includes the survey weights, strata, metropolitan statistical area status, and primary sampling units (PSUs).
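The survey weights are what turn raw answers into population estimates. Below is a minimal sketch of a weighted prevalence that uses made-up answers and weights. It reproduces only the point estimate. The strata and PSUs in the design statement matter for standard errors and p-values, and this sketch does not compute those. In R, the survey package's `svymean()` handles both.

```python
def weighted_prevalence(answers, weights):
    """Design-weighted share of 'Yes' answers (coded 1), ignoring missing (None)."""
    weighted_yes = 0.0
    weighted_total = 0.0
    for answer, weight in zip(answers, weights):
        if answer is None:        # skipped question: contributes nothing
            continue
        weighted_yes += weight * answer
        weighted_total += weight
    return weighted_yes / weighted_total


answers = [1, 0, 0, None]          # hypothetical: 1 = "Yes", 0 = "No"
weights = [2.0, 1.0, 1.0, 5.0]     # hypothetical survey weights
print(weighted_prevalence(answers, weights))  # 0.5, versus 1/3 unweighted
```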
Ordinal logistic regression (OLR) models were used for variables with more than 2 levels, and generalized linear models (GLMs) were used for binary features. The data included a large number of missing values. Question 45 ("During the past 30 days, what is the largest number of alcoholic drinks you had in a row?") had the largest share missing, at about 30%.
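This sketch assumes the OLR models are proportional-odds (ordinal logistic) models of the kind the survey package's `svyolr()` fits. Under that assumption, each coefficient shifts the log-odds of being above every cut point by the same amount. The sketch uses hypothetical cut points for a five-level "days carried" outcome. It shows that a coefficient equal to log(1.27) multiplies the odds of every "above level k" comparison by 1.27. This is how the odds ratios reported in the results should be read.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(cutpoints, eta):
    """Probabilities of each ordered category under a proportional-odds model.

    Uses logit P(Y <= k) = cutpoints[k] - eta, the parameterization of
    R's MASS::polr. cutpoints must be increasing; eta is the linear
    predictor (sum of coefficient * covariate) for one student.
    """
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)    # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs


def odds_above(probs, k):
    """Odds of landing above category k (0-based)."""
    p = sum(probs[k + 1:])
    return p / (1 - p)


cuts = [2.0, 2.5, 3.0, 3.5]                     # hypothetical cut points, 5 levels
base = category_probs(cuts, 0.0)                # reference student
bullied = category_probs(cuts, math.log(1.27))  # same student, coefficient = log(1.27)
print([round(odds_above(bullied, k) / odds_above(base, k), 3) for k in range(4)])
# [1.27, 1.27, 1.27, 1.27]
```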
For this analysis, missing values were imputed with the "MICE" package in R. The imputation ran 2 iterations to predict values and produced 2 imputed datasets. Both datasets were fit with the OLR and GLM models, and the results were then pooled.
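Pooling results across imputed datasets is usually done with Rubin's rules. The estimates are averaged, and the average within-imputation variance is combined with the between-imputation variance. Below is a minimal sketch for a single coefficient. The function name and the numbers are hypothetical. In R, the mice package's `pool()` (or mitools' `MIcombine()`) performs this step for supported models.

```python
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine one coefficient from m imputed datasets using Rubin's rules.

    estimates: coefficient estimated separately in each imputed dataset
    std_errors: matching standard errors
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = mean(estimates)                       # average estimate
    within = mean(se ** 2 for se in std_errors)    # average sampling variance
    between = variance(estimates)                  # spread across imputations
    total = within + (1 + 1 / m) * between         # Rubin's total variance
    return pooled, total ** 0.5


# Hypothetical log-odds estimates from m = 2 imputed datasets.
est, se = pool_rubin([0.20, 0.30], [0.10, 0.11])
print(round(est, 3), round(se, 4))  # 0.25 0.1362
```

With only 2 imputations, the between-imputation variance rests on just two values, so it is a rough estimate.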
Weapons: All Weapons Carrying
Out of the 14,765 students, 11,738 responded to question 12 of the survey, which asked: "During the past 30 days, on how many days did you carry a weapon such as a gun, knife, or club?" (response rate 79%, 20% missing). The variable included 5 levels measuring the number of days the student carried a weapon. Possible answers were:
- 0 days (10,027 reports)
- 1 day (349 reports)
- 2 or 3 days (398 reports)
- 4 to 5 days (166 reports)
- 6 or more days (798 reports)
Of the respondents, 85% reported carrying a weapon on 0 days, followed by those reporting carrying one on 6 or more days (6.8%). The weapon-carrying feature was put into an OLR model controlling for all variables in the survey except two groups: questions 59-99, which were not significant in the full model, and the other weapon and physical fighting variables. This model yielded some interesting results.
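The percentages above follow directly from the reported counts. Below is a quick check using the raw (unweighted) counts listed for question 12. Survey-weighted percentages from the design would generally differ somewhat.

```python
TOTAL_STUDENTS = 14_765

# Reported counts for question 12, in the order listed above
# ("0 days" first, "6 or more days" last).
weapon_days = [10_027, 349, 398, 166, 798]


def summarize(counts, total):
    """Respondent count, response rate, and each level's share of respondents."""
    answered = sum(counts)
    response_rate = answered / total
    shares = [c / answered for c in counts]   # shares of respondents, not of all students
    return answered, response_rate, shares


answered, rate, shares = summarize(weapon_days, TOTAL_STUDENTS)
print(answered, f"{rate:.1%}", f"{shares[0]:.1%}", f"{shares[-1]:.1%}")
# 11738 79.5% 85.4% 6.8%
```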
The most notable results related to race, age, and bullying:
- Race: students identifying as "White" had greater odds of reporting carrying a weapon on more than 0 days than the reference group, African Americans (odds ratio (OR): 2.52, p = 0.03).
- Age: 17-year-olds also had greater odds of reporting carrying a weapon on more than 0 days (OR: 3.12, p = 0.006).
- Bullying: students who reported being bullied on school property had 1.27 times the odds of carrying a weapon compared with those who reported not being bullied at school (p = 0.0006).
Question 42 was another feature associated with higher odds of carrying a weapon at almost all levels. It asks: "During the past 30 days, on how many days did you have at least one drink of alcohol?" Four levels of this drinking-days variable were significant when controlling for all other variables in the reduced model: "1 to 2 days", "3 to 5 days", "10 to 19 days", and "20 to 29 days", with odds ratios of 1.23, 2.26, 2.02, and 36.3, respectively.
Weapons: Gun Carrying
More students responded to the gun-carrying variable, with an overall response rate of 96% (14,195 respondents, 3.8% missing). Question 14 of the survey asked: "During the past 12 months, on how many days did you carry a gun? (Do not count the days when you carried a gun only for hunting or for sport, such as target shooting.)" The levels of the variable are similar to those of the previous outcome:
- 0 days (13,467 reports)
- 1 day (196 reports)
- 2 or 3 days (174 reports)
- 4 to 5 days (69 reports)
- 6 or more days (289 reports)
Again, "0 days" is the most frequent answer among respondents, with "6 or more days" the next most frequent. Using the same OLR model from the "survey" package, the reduced model included all variables related to demographics, driving, drinking and driving, feeling unsafe at school, bullying, depression, suicide, smoking, drinking, and marijuana and cocaine use.
Some notable results:
- Gender: males had 7.81 times the odds of females of reporting carrying a gun in the past 12 months (p < 0.0001).
- Feeling unsafe: students who reported feeling unsafe at school on 6 or more days had lower odds of carrying a gun than those who reported feeling safe (OR: 0.143, p = 0.0012).
- Bullying: students reporting being bullied at school had 1.16 times the odds of carrying a gun in the past 12 months (p = 0.045).
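Odds ratios act on odds rather than probabilities, so it helps to translate them. An odds ratio below 1, such as the 0.143 reported for feeling unsafe at school, means lower odds, and its log-odds coefficient is negative. An odds ratio of 7.81 does not mean a probability 7.81 times as large. Below is a minimal sketch that assumes a hypothetical 5% baseline probability for the reference group.

```python
import math


def apply_odds_ratio(p_reference, odds_ratio):
    """Probability implied for a comparison group, given the reference
    group's probability and an odds ratio (other covariates held fixed)."""
    odds = p_reference / (1 - p_reference)    # convert probability to odds
    new_odds = odds * odds_ratio              # odds ratios multiply odds
    return new_odds / (1 + new_odds)          # convert back to probability


baseline = 0.05                               # hypothetical reference probability
for ratio in (0.143, 1.16, 7.81):
    # Columns: odds ratio, log-odds coefficient, implied probability.
    print(ratio, round(math.log(ratio), 3), round(apply_odds_ratio(baseline, ratio), 3))
```

With a 5% baseline, an odds ratio of 7.81 corresponds to roughly 29%, not 39%. This is why "7.81 times the odds" is more accurate than "7.81 times more likely".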
Weapons: Carrying Any Weapon to School
Question 13 of the 2017 YRBS asked: "During the past 30 days, on how many days did you carry a weapon such as a gun, knife, or club on school property?" The question had a response rate of 98.3% (258 missing), with responses distributed as follows:
- 0 days (13,903 reports)
- 1 day (170 reports)
- 2 or 3 days (119 reports)
- 4 to 5 days (43 reports)
- 6 or more days (272 reports)
The reduced OLR model included variables related to demographics, driving behaviors, safety at school, bullying, depression, and suicide. In this model, the weapons-to-school variable was significantly associated with race and gender.
Compared with African Americans, five groups of students had significantly greater odds of reporting carrying a weapon to school on more than 0 days:
- American Indian/Alaskan Native (OR: 3.92)
- Hispanic/Latino (OR: 1.32)
- Multiple Hispanic/Latino (OR: 3.42)
- Native Hawaiian/Pacific Islander (OR: 15.1)
- White (OR: 1.195)

Gender was also significant, with males having 6.01 times the odds of females of reporting carrying a weapon to school.
Weapons: Injured with any Weapon on School Property
The last weapon feature analyzed is question 16 from the survey, which asked: "During the past 12 months, how many times has someone threatened or injured you with a weapon such as a gun, knife, or club on school property?" The question had a 99.5% response rate, with answers distributed as follows:
- 0 times (13,768 reports)
- 1 time (429 reports)
- 2 or 3 times (217 reports)
- 4 or 5 times (84 reports)
- 6 or 7 times (44 reports)
- 8 or 9 times (26 reports)
- 10 or 11 times (13 reports)
- 12 or more times (121 reports)
The reduced survey OLR model controlled for variables related to demographics, driving behaviors, safety, bullying, depression, and suicide. It produced the following results:
- Gender: males had 3.53 times the odds of reporting being injured with a weapon on school property (p = 0.006).
- Feeling unsafe: students feeling unsafe at school had much greater odds of reporting being injured with a weapon at school at every level, compared with those who felt safe.
- Bullying: students who reported being bullied electronically had greater odds of reporting being injured with a weapon at school (OR: 1.96, p = 0.003), as did those who reported being bullied at school (OR: 3.18, p < 0.0001).
- Suicide: students who had considered suicide had greater odds of reporting being injured with a weapon on school property (OR: 1.45, p = 0.017).
Dating Violence: Rape
Question 19 of the survey asked: "Have you ever been physically forced to have sexual intercourse when you did not want to?" The question had a response rate of 97.8%. The possible answers were "Yes," with 1,104 reports (7.6%), and "No," with 13,336 reports (92.4%).
This variable was put through the generalized linear model (GLM) offered by the "survey" package in R, against all variables except questions 59-66, 68-78, and 80-99. The variables related to safety, bullying, depression, considering suicide, and the drug MDMA were significant on at least 1 level. There were no significant differences by race, age, sex, or grade. Those who reported feeling unsafe on 4 to 5 days had 193.5 times the odds of answering "Yes" to question 19 (p = 0.0003).
Students who reported being bullied electronically (p = 0.03), feeling sad (p = 0.005), and having considered suicide (p < 0.0001) were more likely to answer "Yes" to question 19. One of the most notable results concerned students reporting heavy MDMA use (40 or more times in their lifetime). These students made up 0.57% of the 14,765 students. They had 8.85e+21 times the odds of reporting having been forced to have intercourse in their lifetime. An estimate this large most likely reflects how few students fall in this group, so it is better read as "much higher odds" than as a precise figure.
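One common reason for an odds ratio as extreme as 8.85e+21 is (quasi-)separation. When nearly every student in a small category gives the same answer, the likelihood keeps increasing as the coefficient grows. The fitted value is then driven toward infinity rather than settling at a finite estimate. This is offered as a possible explanation, not a diagnosis of the original model. The toy data below are made up purely to show the mechanism.

```python
import math


def log_likelihood(intercept, slope, data):
    """Log-likelihood of a logistic model with logit P(y = 1) = intercept + slope * x."""
    total = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# Made-up data: x = 1 marks a rare group in which every student answered "Yes".
data = [(1, 1)] * 3 + [(0, 1)] * 1 + [(0, 0)] * 9
intercept = math.log(1 / 9)   # best intercept for the x = 0 group (1 "Yes" in 10)

slopes = [1, 5, 10, 20]
path = [log_likelihood(intercept, b, data) for b in slopes]
for b, ll in zip(slopes, path):
    # The likelihood keeps rising as the slope (and exp(slope), the odds ratio) grows.
    print(f"slope={b:>2}  odds ratio={math.exp(b):.3g}  log-likelihood={ll:.6f}")
```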
Dating Violence: Forced to do Other Sexual Acts by Anyone
The next question relating to dating violence on the survey asked: "During the past 12 months, how many times did anyone force you to do sexual things that you did not want to? (Kissing, touching, or forced to have intercourse)." This question had an overall response rate of 95.8% with the distribution of answers as follows:
- 0 times (12,724 reports)
- 1 time (670 reports)
- 2 or 3 times (465 reports)
- 4 or 5 times (93 reports)
- 6 or more times (193 reports)
A survey OLR model controlled for demographic variables, driving variables, safety, bullying variables, depression, suicide, and sexual contact. It showed that two groups had greater odds of reporting being forced to do other sexual acts by anyone more than 0 times in the past 12 months, compared with the reference group, African Americans: American Indian/Alaskan Native students (OR: 1.93, p = 0.04) and Native Hawaiian/Pacific Islander students (OR: 2.05, p = 0.006).
Those reporting being bullied electronically (OR: 1.70, p = 0.0009) and feeling depressed (OR: 3.34, p < 0.0001) also had significantly greater odds of reporting being forced to do other sexual acts.
Dating Violence: Forced to do Other Sexual Acts by Boyfriend/Girlfriend
Survey question 21 asked: "During the past 12 months, how many times did someone you were dating or going out with force you to do sexual things that you do not want to do? (Kissing, touching, intercourse)." The response rate for this variable was 92.6% and the possible answers for this question were as follows:
- 0 times (8,603 respondents)
- 1 time (313 respondents)
- 2 or 3 times (165 respondents)
- 4 or 5 times (41 respondents)
- 6 or more times (109 respondents)
- I did not date anyone in the past 12 months (4,541 respondents)
The variable was put through a survey OLR model with demographic variables, driving variables, safety, bullying variables, depression, suicide, and sexual contact preferences.
Results of the model showed the following:
- Race: students identifying as "Asian" had 3.22 times the odds of reporting being forced to do sexual acts by a boyfriend/girlfriend (p < 0.0001).
- Suicide: students answering "Yes" to making a suicide plan in the past 12 months were also more likely to report being forced by a boyfriend or girlfriend (OR: 1.33, p = 0.025).
- Sexual contact: question 66 asks, "During your life, with whom have you had sexual contact?" Those answering "Never had sex" had 3.84 times the odds of reporting being forced to do sexual things by someone they were dating (p < 0.0001).
Dating Violence: Injured by Boyfriend/Girlfriend
The last dating violence question asked: "During the past 12 months, how many times did someone you were dating or going out with physically hurt you on purpose?" Of the 14,765 students who took the survey, 13,995 answered this question (94.8% response). The distribution of answers was as follows:
- 0 times (8,709 respondents)
- 1 time (345 respondents)
- 2 or 3 times (255 respondents)
- 4 or 5 times (91 respondents)
- 6 or more times (153 respondents)
- I did not date anyone in the past 12 months (4,532 respondents)
A reduced survey OLR model controlled for demographic variables, driving variables, safety, bullying variables, depression, suicide, and sexual contact preferences. It produced the following results:
- Race: students identifying as "Asian" had 2.64 times the odds of reporting being injured by a boyfriend/girlfriend in the past 12 months (p = 0.0015).
- Feeling unsafe: those feeling unsafe at school on 4 to 5 days had 3.08 times the odds of reporting being injured (p < 0.0001).
- Suicide: students who answered "Yes" to making a suicide plan had 1.18 times the odds of reporting being injured by a boyfriend or girlfriend (p = 0.03).
- Sexual contact: those reporting "Never had sex" on the sexual contact question had 4.75 times the odds of reporting being injured by a significant other (p < 0.0001).
Bullying: At School
The YRBS has 2 questions related to bullying. Question 23 monitors students being bullied on school property by asking: "During the past 12 months, have you ever been bullied on school property?" The question had a 98.3% response rate. The possible answers were "Yes," with 2,665 responses (18.3%), or "No," with 11,941 responses (81.7%).
The feature was tested using a survey GLM controlling for all variables except those that were not significant in the full model. The results showed the following:
- Gender: males had 1.83 times the odds of females of reporting being bullied at school (p = 0.0457).
- Electronic bullying: students being bullied electronically had 16.2 times the odds of reporting being bullied at school, compared with those who reported not being bullied electronically (p < 0.0001).
- Sadness: those who reported being sad had 1.90 times the odds of reporting being bullied at school, compared with those who did not (p < 0.0001).
Bullying: Electronically
The second question that monitors bullying, question 24, asks: "During the past 12 months, have you ever been electronically bullied? (Count texting, Instagram, Facebook, or other social media)." The possible answers are the same as for the previous variable. The overall response rate was 98.2%, and answers were distributed as follows:
- Yes: 2,113 responses (14.5%)
- No: 12,482 responses (86%)
The variable was tested using the same model as question 23 and generated similar results. One difference stood out: in contrast with bullying at school, males were less likely to report being bullied electronically (p = 0.036). Three groups were more likely to report being bullied electronically:
- students reporting being bullied at school (OR: 16.8, p < 0.0001)
- students reporting having attempted suicide 6 or more times in the past 12 months (OR: 161.5, p = 0.049)
- students abusing opioids (OR: 4.64, p = 0.039)
Drug Use
Of all students surveyed, only 1.79% reported using illicit drugs one or more times in their lifetime. Although marijuana use was part of the survey, it was excluded from this analysis because of how marijuana is currently regarded in larger society.
The highest use within this subset of students was of opioids taken without a prescription. Notably, 13.89% of students in this subset reported taking opioids one or more times in their lives, and 12% of those students reported using them 40 or more times. Opioids have the lowest rate of repeated use (12%). Heroin, by contrast, appears to be the most addictive drug among students: over 30% of those who used it reported using it 40 or more times. Methamphetamines are the next most addictive, with 25% of users reporting 40 or more uses. They are followed by MDMA at about 17% and inhalants at around 13%.
Gender is a significant variable in predicting the use of each drug. Girls used MDMA, methamphetamines, and heroin significantly more than boys, while boys used inhalants more than girls.
Most students who reported using drugs did not self-report gender or grade. They did, however, report age. The 12- and 13-year-olds reported the highest rates of use, over 50% within each of their age groups. This high reporting could be due to many factors. There is a high likelihood that many students experiment with drugs for the first time at a young age and therefore do not report first-time use at older ages. Younger students might also be too young to realize the significance of admitting to drug use.
Students also chose not to self-report race when reporting drug use. However, among students who did self-report race, a higher proportion of Native American students reported using drugs than reported never using them. Asian students had the lowest proportion of use.
This analysis of the 2017 YRBS produced many concerning revelations about the adolescent population in the United States. The most alarming is that high school students are actually reporting carrying weapons in and out of school, experiencing dating violence, being subjected to bullying both in and out of school, and using drugs. The reality, based on this data, is that the adolescent population in the United States is not immune to experiencing violence, mental health disorders, and addiction just because they are perceived as still being children.
All of these issues require attention and should be taken seriously. Some of these concerns cause hardship for individual high school students across many areas of their lives. Beyond that, many of these issues may result in serious injury, mental trauma, or even death.
These behaviors could be largely influenced by underlying factors not covered in the 2017 YRBS data, such as location, parental income, family relationships, and family health history. Even so, the findings of this analysis are useful for identifying what needs immediate attention and who may need the most attention regarding certain issues.
In the full models, the violence variables consistently remained significant alongside the weapons features. This shows that, in most cases, violent tendencies predict other violent behaviors. According to the full model, a student who gets into fights in and out of school may be at greater risk of carrying a weapon and of injuring someone else with a weapon, or vice versa.
The purpose of the reduced models was to identify factors other than violence that may influence the weapons variables. The reduced models show that these factors vary by question. However, a trend can be seen between the weapons variables and being bullied at school. Of the 4 features analyzed, 3 were significantly associated with being bullied at school: carrying any weapon, carrying a gun, and being injured with a weapon at school. Whether this relationship is causal is unclear and cannot be determined by this analysis.
Dating violence was an interesting subject to analyze. A concerning pattern appears between the sexual contact preference variable and the dating violence variables involving a boyfriend/girlfriend. In both models, students who reported "Never having sex" had higher odds of reporting violence from a significant other. All dating violence features were also significantly associated with either feeling sad or considering suicide. Again, it is hard to determine which comes first: dating violence or depression and suicidal ideation.
Bullying at school is an issue that has existed for ages. Today, however, students feel a constant need to stay connected, and social media is available on every device. This has given rise to electronic bullying, a new phenomenon that takes away a student's ability to escape the bullying they experience at school.
This is reflected in this analysis: the two bullying variables are significantly associated with each other. Both features were also significantly associated with variables concerning depression and suicide, which highlights the severity of the issue. Another interesting finding was how differently bullying at school and electronic bullying related to gender. Boys were more likely to report being bullied at school, while girls were more likely to report being bullied electronically.
Overall, student drug use follows the larger patterns of adult drug use. The opioid epidemic is clearly visible among students, as it is among adults, and heroin addiction is also evident among students. Finally, weight appears to be a small factor in drug use, as students in higher weight categories (overweight and obese) report higher usage.
While the YRBS is incredibly useful, it has some notable limitations. First, survey data are based on recall and rely on individuals to report accurately. Second, the YRBS does not cover many factors that could influence an adolescent's behavior. Familial relationships, family health history, parental income level, and location could all be important influencing factors with potential for bias.
The last limitation is the high frequency of missing values and the imputation of those values. Imputation can always introduce bias into results. However, the method used here predicted answers over multiple iterations and produced 2 imputed datasets, which is a much better alternative to imputing the mode of each question.
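A small illustration shows why imputing the mode is a weak baseline. Filling every gap with the same value piles extra answers onto one category and understates how much answers vary. The answers below are made up, and the sketch covers only the mode approach, not MICE itself.

```python
from statistics import mode, pvariance


def impute_mode(values):
    """Replace missing answers (None) with the most common observed answer."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in values]


# Hypothetical ordinal answers (0 = lowest level, 4 = highest); None = skipped.
answers = [0, 0, 0, 1, 2, 4, None, None, None]
observed = [v for v in answers if v is not None]
filled = impute_mode(answers)

print(round(pvariance(observed), 3), round(pvariance(filled), 3))  # 2.139 1.728
```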
Though the missing values may have been a limitation of this study, they also offer useful insight. They show who skips questions, which questions have the lowest response rates, and how effective the wording of certain questions is. The missing values in the 2017 YRBS should be studied to help improve future versions of the survey.