Does investing in education reduce STI risk in California?

Posted on Aug 16, 2023


This post analyzes education investment in California counties from 2003 to 2020 to determine whether increased education funding correlates with lower sexually transmitted infection (STI) rates.

More specifically, I'll be analyzing the following STIs: Chlamydia, Gonorrhea, and Syphilis. I chose these because they are curable and they account for more than $1.1 billion in direct medical care costs for the US government.

Although these STIs are treatable, if left untreated, they can lead to severe complications, including death. 

Please see my GitHub repo for details on the code.


Three distinct datasets were used, all obtained from the California Health and Human Services Open Data Portal:

I. Incidence of Gonorrhea, Syphilis, and Chlamydia cases among males and females in California counties (excluding San Francisco) from 2001 to 2020, along with the infection rate per 1,000 people.

II. County expenditure information across various sectors, including education, from 2003 to 2020.

III. County Federal Information Processing Standards (FIPS) codes.
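The three datasets have to be combined before any comparison is possible. The following is a minimal sketch of one way to do that in plain Python, not necessarily how the repo does it. Every field name (`county`, `year`, `rate_per_1000`, `education_spending`, `fips`) and every value is a placeholder assumption, not taken from the actual files.

```python
def join_datasets(sti_rows, spending_rows, fips_rows):
    """Attach a FIPS code and education spending to each STI row (inner join)."""
    fips_by_county = {r["county"]: r["fips"] for r in fips_rows}
    spending_by_key = {
        (r["county"], r["year"]): r["education_spending"] for r in spending_rows
    }
    joined = []
    for row in sti_rows:
        key = (row["county"], row["year"])
        # Skip rows with no match in the other tables. STI rows for 2001-2002
        # fall out here because the expenditure data starts in 2003.
        if key not in spending_by_key or row["county"] not in fips_by_county:
            continue
        joined.append({
            **row,
            "fips": fips_by_county[row["county"]],
            "education_spending": spending_by_key[key],
        })
    return joined


# Placeholder rows (hypothetical counties, codes, and numbers).
sti_rows = [
    {"county": "County A", "year": 2002, "disease": "Chlamydia", "rate_per_1000": 4.0},
    {"county": "County A", "year": 2003, "disease": "Chlamydia", "rate_per_1000": 4.5},
    {"county": "County B", "year": 2003, "disease": "Chlamydia", "rate_per_1000": 3.0},
]
spending_rows = [
    {"county": "County A", "year": 2003, "education_spending": 1000000.0},
    {"county": "County B", "year": 2003, "education_spending": 500000.0},
]
fips_rows = [
    {"county": "County A", "fips": "99001"},
    {"county": "County B", "fips": "99002"},
]

merged = join_datasets(sti_rows, spending_rows, fips_rows)
```

With pandas, the same join would typically be two `DataFrame.merge` calls, one on county and year and one on county.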

Charting cases

Growing cases 

As we can see, cases of all three diseases show an upward trend, and Chlamydia, which is highly infectious, is consistently the most prevalent over the years.

Over the span of 18 years, STI infections decreased only about 20% of the time. In 2020 the number of cases decreased, likely because people refrained from socializing and from coming into close contact with others during COVID-19.
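One way to arrive at a figure like this is to count how many year-over-year changes are negative. The sketch below assumes that reading of "the time"; the post does not spell out the exact definition, and the case counts are placeholders rather than the real data.

```python
def share_of_decreases(cases_by_year):
    """Return the fraction of year-over-year changes that are decreases."""
    years = sorted(cases_by_year)
    changes = [
        cases_by_year[later] - cases_by_year[earlier]
        for earlier, later in zip(years, years[1:])
    ]
    if not changes:
        raise ValueError("need at least two years of data")
    return sum(1 for change in changes if change < 0) / len(changes)


# Placeholder counts: four increases followed by one decrease.
toy_cases = {2015: 100, 2016: 110, 2017: 125, 2018: 140, 2019: 150, 2020: 130}
share = share_of_decreases(toy_cases)
```

Here `share` is the proportion of the five year-over-year comparisons in which cases fell.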

Rates by gender 

Women are at greater risk of these STIs.

Chlamydia itself isn’t fatal, but if it damages a woman's fallopian tubes, leading to an ectopic pregnancy, the fallopian tube can rupture, causing internal bleeding and death if not treated promptly.

STI by County 

Chlamydia has the highest rate, and as seen above, some counties have an infection rate higher than 1% for the female population.
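Because the dataset reports rates per 1,000 people, a rate above 1% corresponds to more than 10 cases per 1,000. A small sketch of that conversion and filter follows; the county names and rates are placeholders, and the function names are hypothetical.

```python
def per_1000_to_percent(rate_per_1000):
    """Convert a rate per 1,000 people into a percentage."""
    return rate_per_1000 / 10.0


def counties_above(rates_per_1000, threshold_percent=1.0):
    """Return the counties whose rate exceeds the threshold, sorted by name."""
    return sorted(
        county
        for county, rate in rates_per_1000.items()
        if per_1000_to_percent(rate) > threshold_percent
    )


# Placeholder female Chlamydia rates per 1,000 (not real data).
toy_rates = {"County A": 12.5, "County B": 7.0, "County C": 10.0}
flagged = counties_above(toy_rates)
```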

Investment in education 

Investment in education has risen considerably in recent years.

Conclusion: Investment in education vs. STI cases

The box plot above shows the correlation between education investment and STI rates among counties between 2003 and 2020. Despite increased investment in education over the years, STI spread in California remained unchecked.

Finally, the projected 2019 infection rates for each infection in different counties of California are shown. As seen above, the highlighted county has a high infection rate for all three infections. That county is none other than Los Angeles, a county with a soaring infection rate despite significant investments in education.
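To put a number on such a relationship, one option is a Pearson correlation coefficient between education spending and infection rate. The post does not say which statistic was used, so treat this as an illustrative sketch with placeholder values rather than the actual method.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values: spending and rate rising together.
toy_spending = [1.0, 2.0, 3.0, 4.0]
toy_sti_rates = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(toy_spending, toy_sti_rates)
```

A value near +1 means the two series rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear relationship. A positive value would match the post's observation that rates kept climbing while spending grew.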

As Leandro Mena, MD, MPH, Director of CDC’s Division of STD Prevention, observed in a 2023 release concerning the growing number of people infected with STIs: “The U.S. sexually transmitted infection (STI) epidemic shows no signs of slowing down. The reasons for the ongoing increases are complex – and so are the solutions.”

Future work

For future work I would seek out answers to these questions:

• Why do some counties have such a high infection rate compared to others?

• What specific types of investment in education can help prevent STIs?

• What specifically influences a person to practice safe sex?

About Author

Isaac Chammah

I have worked for 10 years in São Paulo, Brazil as the CFO of the largest wet wipe manufacturing company in Latin America. I was responsible for leading the due diligence process when the company was sold in...