Python Survey 2017 Visualization with R and Shiny

Posted on Feb 4, 2019

Project GitHub | LinkedIn:   Niki   Moritz   Hao-Wei   Matthew   Oren

The skills demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Introduction

With the arrival of the Big Data era, data has become increasingly important. People who understand data well can gain an advantage for their business and a clearer view of their industry. As a result, data analysis tools such as Python, R, and SAS have become more and more popular. Since I have a background in computer science as well as applied mathematics and statistics, I decided to take a close look at the survey that JetBrains conducted among Python users in 2017. You can view my project via the link, and the code for the project is on GitHub.

Dataset

The data set comes from JetBrains, whose website provides reports on the Python community for both 2016 and 2017. Raw data is only available for 2017, which gave me the chance to analyze it in ways that differ from their report. With this data set, I hope to give my audience a better understanding of how Python is used around the world today. The data set includes answers from 10,000 respondents to the Python Developers Survey 2017. The survey asks 30 questions, ranging from whether respondents use Python as their main language to the industry they work in.

Project

After reviewing and cleaning the data set, I decided to build my project around six questions: How do respondents use Python? Which countries do Python users come from? What are their age ranges? What other languages do they use besides Python? What do they use Python for? Which industries do they work in?

As you can see in the graph below, 85.3% of the respondents who finished the survey use Python: 67.5% use it as their main language and 17.8% use it as a secondary language.
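The combined figure is just the sum of the two groups (67.5% + 17.8% = 85.3%). As a rough illustration of how such shares can be derived from raw answers, here is a minimal Python sketch. It is not the R code behind the Shiny app, and the answer labels and the toy sample are invented for illustration only.

```python
from collections import Counter


def usage_shares(answers):
    """Return the percentage of respondents per answer, rounded to one decimal."""
    if not answers:
        raise ValueError("no answers to summarize")
    counts = Counter(answers)
    total = len(answers)
    return {answer: round(100 * n / total, 1) for answer, n in counts.items()}


# Toy sample, NOT survey data: 20 invented responses with hypothetical labels.
sample = ["main"] * 13 + ["secondary"] * 4 + ["none"] * 3

shares = usage_shares(sample)
# Share of respondents who use Python at all (main or secondary language).
python_share = round(shares["main"] + shares["secondary"], 1)
print(shares, python_share)
```

Because each respondent gives exactly one answer to this question, the shares always add up to 100%.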

In the Country section, I took the number of Python users in the 12 countries with the most Python users and plotted them on a world map. The darker a country's fill color, the more Python users it has.
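Picking the top countries is a counting and ranking step. The sketch below shows one way to do it in Python with the standard library. It is an assumption about the general approach, not the project's actual R code; the function name `top_countries` and the toy country labels are hypothetical.

```python
from collections import Counter


def top_countries(countries, n=12):
    """Return the n most frequent countries as (country, count) pairs."""
    # Drop missing or blank answers before counting.
    cleaned = [c.strip() for c in countries if c and c.strip()]
    return Counter(cleaned).most_common(n)


# Toy answers, NOT survey data; "A", "B", "C" stand in for country names.
toy_answers = ["A", "B", "A", "C", "A", "B", "", None]

top_two = top_countries(toy_answers, n=2)
print(top_two)
```

The resulting counts are what a choropleth map would then translate into fill colors, with larger counts mapped to darker shades.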

In the Age section, I listed the age ranges of Python users, from under 17 to 60 or older. Although Python users span all ages, most of them are in their 20s and 30s, and the 21-29 range has the most Python users.

To see which other languages Python users work with, look at the Language section. Almost 50% of the developers also use JavaScript, and 49% use HTML as well.

That leads to the next question: what do people use Python for? Because respondents could select multiple answers to this question, the percentages add up to more than 100%. The most common answers are Data Analysis and Web Development (50% and 49%), followed by DevOps / System administration / Writing automation scripts, Programming of web parsers / scrapers / crawlers, etc.
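To see why the percentages of a multiple-answer question can exceed 100% in total, consider this small Python sketch. Each option is divided by the number of respondents, not by the number of selections. The function name and the toy responses are made up for illustration and are not taken from the survey or from the project's R code.

```python
from collections import Counter


def multi_select_shares(responses):
    """Percentage of respondents who picked each option (one decimal)."""
    if not responses:
        raise ValueError("no responses to summarize")
    total = len(responses)
    # set(chosen) makes sure a respondent counts at most once per option.
    counts = Counter(option for chosen in responses for option in set(chosen))
    return {option: round(100 * n / total, 1) for option, n in counts.items()}


# Toy responses, NOT survey data: four invented respondents.
toy_responses = [
    {"Data analysis", "Web development"},
    {"Data analysis"},
    {"Web development", "DevOps"},
    {"Data analysis", "Web development", "DevOps"},
]

purpose_shares = multi_select_shares(toy_responses)
total_share = sum(purpose_shares.values())
print(purpose_shares, total_share)
```

In this toy sample the shares sum to 200% because the four respondents made eight selections in total.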

As I mentioned at the beginning, the world has entered the era of Big Data, so the answer to which industries Python users work in is not surprising: Information Technology / Software Development accounts for about 25% of the 10,000 respondents.

Conclusion

In the future, I hope to find more data sets on other popular programming languages, such as JavaScript, Java, C, and C#, to continue this project and give people a better understanding of the programming languages in use around the world today.

About Author

Weixing Yang

Data scientist with a background in big data analytics and intensive programming. I am currently seeking a position within a creative and dynamic work environment that gives me the opportunity to contribute my abilities and skill set gained...