Stack Overflow: Who are the people behind the posts?

Posted on Jul 29, 2018

If you have been coding for a while, you must have visited this website several times, possibly even thousands of times. We might be amazed that this website can help us get our code working over 90% of the time. In many cases, we may go so far as to say it is our only hope. After being saved again and again, are you curious who the people behind the posts are?

I am, so I chose the Stack Overflow 2018 Developer Survey results for analysis. The data set itself is huge, so I used only a small subset (~35,000 observations, ~25 attributes) for this visualization project.

My analysis of this data has three main parts: a general profile of the respondents, work-related information and life-related information. I will show them one by one. (Link to Shiny app: https://xzglovenk.shinyapps.io/SO_svy_fnl02/)

  • General profile
  1. Where are they from? From the bubble plots and the interactive map you can easily tell that the United States has the largest population of Stack Overflow (SO) users. North America, Western Europe and India have larger populations than all the other regions in the world. We did not see many other Asian countries represented. Language might be one important reason.
  2. Gender distribution. If the survey respondents are a good sample of the developer community (which I believe they are), what we see in this graph is a little shocking. We may need to make more effort in the future to convince women that coding is not that hard and boring.
  3. Educational background. It’s no surprise that those with bachelor’s degrees make up the largest fraction, and the share decreases all the way to PhD and professional degrees (JD, MD). One thing worth mentioning is that the "No degree" category in the graph actually means “have some college level education but no degree” according to its full description as it appears in the survey (I rephrased most of the descriptions so that they are not too long for the graph).
  4. Undergraduate Major. From this chart we can see that over 90% of the respondents come from majors that involve extensive coding training, such as computer science, information systems, web development and so on. This is not so surprising either. The reason I want to mention this is that most beginners like me have frustrating moments and turn to SO to find solutions. Sometimes I doubted myself, thinking, “Am I smart enough to be a qualified programmer, when these people know so much more than I do?” We do not need to be bothered by those thoughts, since most of the people on SO studied coding as their major, and so they are supposed to know it well! If we spend enough time on coding, we can grow to be as good. Just do not lose confidence at the beginning.
  5. Years of coding. People with 3~5 and 6~8 years of experience make up the largest fraction of the survey respondents. Considering that most of them have a formal undergraduate education in coding-heavy majors, those with 3~8 years of coding experience left school not that long ago. They are the fresh blood in the developer community -- young and ambitious. They are not throwing away their shot to make disruptive changes to the world.
  • Work related: After putting together a general idea of who those people are, we will take a look at some responses about their work.
    1. Working languages. In the bubble charts we can compare the languages they are using right now with the ones they want to learn and use next year. Web development languages are dominant in both charts. The biggest difference is that more people put Python on their wish list for next year. This is a strong indicator that more people are jumping into data science, and Python appears to be getting more and more popular in the job market.

      Languages developers are using now.

      Languages developers want to learn and use next year.

    2. Salaries. In this part I want to highlight a few factors that could influence developers' salaries.
      • Salary vs. gender. From this graph we can see that, in general, male developers earn more than female developers and underrepresented groups. We probably do not have an easy way to close this gap soon, but we need to keep this fact in mind and not take it for granted.
      • Salary vs. degree. From this graph we can see a general trend that salary increases as education advances from the bachelor’s to the master’s to the PhD level, indicating that higher degrees do help you earn more money.
      • Salary vs. company size. We can also see a general trend here: the bigger the company, the more its employees earn. Of course there are still a lot of outliers, and money is not the only factor we should consider in a job search. There are many more important things.
    3. Satisfied with their jobs? After looking at the salaries, let us continue our exploration of what factors bear on job satisfaction. To see the relationships, I use the Cohen-Friendly association plot (https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/assocplot.html). The rectangular cells in each row are positioned relative to a baseline indicating independence. The signed height of each cell is proportional to its Pearson residual, (observed - expected) / sqrt(expected), whose square is that cell's contribution to the chi^2 statistic. The taller the cell (regardless of the sign), the stronger the departure from independence. To make the graph less busy, I coded the satisfaction descriptions as numbers: "5" means "Very satisfied", "-5" means "Very dissatisfied", "3" and "1" denote "Moderately satisfied" and "Slightly satisfied", and so on.

      • Job satisfaction vs. gender. It seems that ladies tend to have stronger emotions towards their work. They are either very happy or very unhappy. Compared to ladies, gentlemen's feelings about their jobs are more predictable. We also noticed that the underrepresented groups tend to be unhappy about their jobs. There might be multiple reasons; hopefully workplace discrimination is not one of them.
      • Job satisfaction vs. salary range. We can easily see a pattern in this graph. The people who earn less than 100k/year tend to be unhappy while people who earn over 100k/year tend to be happy about their jobs. Yes, money is important.
      • Job satisfaction vs. company size. This plot reveals something interesting. People in smaller companies tend to be happier, while those who work in very large companies tend to be unhappy about their jobs. Probably they feel less of a sense of achievement and receive less attention in a group with more than 10,000 people around them. However, they might earn a little more money.
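
To make the association plots above more concrete, here is a minimal Python sketch of the quantity that R's assocplot draws for each cell: the Pearson residual, (observed - expected) / sqrt(expected), where "expected" is the count we would see if the two attributes were independent. The plots in this post were made in R; this is only an illustration of the arithmetic. The function name pearson_residuals and the toy counts are assumptions for the example, not survey data.

```python
from math import sqrt


def pearson_residuals(table):
    """Return (residuals, chi2) for a contingency table given as a list of rows.

    residuals[i][j] = (observed - expected) / sqrt(expected), where expected
    is the count implied by independence of the row and column attributes.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    residuals = []
    chi2 = 0.0
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            r = (observed - expected) / sqrt(expected)
            res_row.append(r)
            chi2 += r * r  # each squared residual is one cell's share of chi^2
        residuals.append(res_row)
    return residuals, chi2


# Toy counts (made up for illustration, NOT taken from the survey):
# rows are two hypothetical groups, columns are satisfaction codes 5 and -5.
toy_counts = [
    [30, 10],
    [20, 40],
]

residuals, chi2 = pearson_residuals(toy_counts)
for label, row in zip(("group A", "group B"), residuals):
    print(label, [round(r, 2) for r in row])
print("chi2 =", round(chi2, 2))
```

A positive residual means the cell holds more respondents than independence would predict (a bar above the baseline), and a negative one means fewer (a bar below it).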
  • Life related: Now let us take a quick look at their lifestyles. I chose three major attributes to see whether they are living a healthy life.
    1. Skip meals. I am happy to see that most of them take meals seriously, which is very good.
    2. Working hours. Most people work between 5 and 12 hours a day, but quite a number of people still work over 12 hours a day. Yes, a programmer's life can be tough.
    3. Exercise. Unfortunately, a lot of them rarely do any kind of exercise.

    From those three bar charts, we can see that a certain fraction of respondents has each of these bad habits. A natural question to ask is: is there an overlap among those people? For example, do the people who work long hours tend to skip meals or have no time to exercise? Can we confirm that? Again, let us look at the association plots between pairs of lifestyle attributes. Clearly, working long hours, not exercising and skipping meals are strongly associated. It seems that a small group of people are just collecting various bad habits. I used to be one of them myself, but I realized that it was having a bad effect on me. Hopefully we can make a change soon and try to live healthier.
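
The overlap question can also be checked with simple counting. The sketch below is a hedged Python illustration, not the code behind the plots in this post: it cross-tabulates two yes/no habits and compares how often people skip meals among those who work long hours versus those who do not. The field names long_hours and skips_meals, the helper functions and the records are all hypothetical and made up for the example.

```python
from collections import Counter


def crosstab(records, a, b):
    """Count respondents by the pair of boolean answers (records[a], records[b])."""
    return Counter((bool(r[a]), bool(r[b])) for r in records)


def conditional_rate(counts, given):
    """Share of respondents with habit b among those whose habit a equals `given`."""
    total = counts[(given, True)] + counts[(given, False)]
    return counts[(given, True)] / total if total else float("nan")


# Made-up respondents (NOT survey data): 4 work long hours, 8 do not.
records = (
    [{"long_hours": True, "skips_meals": True}] * 3
    + [{"long_hours": True, "skips_meals": False}] * 1
    + [{"long_hours": False, "skips_meals": True}] * 2
    + [{"long_hours": False, "skips_meals": False}] * 6
)

counts = crosstab(records, "long_hours", "skips_meals")
rate_long = conditional_rate(counts, True)
rate_other = conditional_rate(counts, False)
print("skip meals | long hours:  ", rate_long)
print("skip meals | normal hours:", rate_other)
```

If the two rates were about equal, the habits would look independent; a large gap between them is the same signal that shows up as tall bars in an association plot.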

  • In summary, after looking at the subset of the survey results, I think there are a few things to highlight:
    • The developer community is booming, though considerable gender gaps still exist.
    • The clear choice for what to learn is Python.
    • Consider company size when choosing jobs. Facebook and Google might not be the right choices for everyone.
    • Nothing is more important than health. Do not burn yourself out!

 

About Author

Zhenggang Xu

Zhenggang is currently a data science fellow at NYC Data Science Academy. He received his education in computational chemistry and worked in deep water exploration for a few years. He believes in numbers since computations have helped him...

