R vs. Python: Analysis Based on Data from Stack Overflow

Posted on Nov 7, 2016

Contributed by Chuan Hong. Chuan is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, which runs from September 26th to December 23rd, 2016. This post is based on her R Shiny class project.

Motivation

Both R and Python are widely used programming languages for data analysis, ranging from basic statistics and visualization to complex modeling. I was curious which is more popular, R or Python, so I created an app to visualize the popularity of R versus Python based on data from Stack Overflow over the past six years.

Data Source and Processing

The data came from two Kaggle datasets, R Questions from Stack Overflow and Python Questions from Stack Overflow. Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers. The table below shows the raw data from Kaggle. The data for each language is organized in three tables: Questions, Answers, and Tags. The datasets contain the full text of the Stack Overflow questions and answers tagged with the respective language, which makes them useful for natural language processing and community analysis.

var_raw

Next, since we only cared about the number of questions and answers for each language, we narrowed each table down to the following variables.

var_processing

To separate the "Tags" into "packages" and "topics", I also scraped a list of R packages from CRAN Packages By Name and a list of Python packages from PyPI Ranking.
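As a rough illustration of this step, a tag can be treated as a "package" when it appears in the scraped package list and as a "topic" otherwise. The sketch below is not the code behind the app; the function name `split_tags` and the sample lists are hypothetical, and it assumes a simple case-insensitive name match.

```python
def split_tags(tags, known_packages):
    """Split tags into (packages, topics) by membership in a known package list."""
    # Assumption: a tag counts as a package if its lowercased name matches
    # a lowercased package name exactly.
    known = {name.lower() for name in known_packages}
    packages, topics = [], []
    for tag in tags:
        if tag.lower() in known:
            packages.append(tag)
        else:
            topics.append(tag)
    return packages, topics


# Hypothetical samples; the real package list would come from the scraped CRAN index.
cran_packages = ["ggplot2", "shiny", "data.table", "dplyr"]
r_tags = ["ggplot2", "dataframe", "plot", "shiny", "regex"]

packages, topics = split_tags(r_tags, cran_packages)
print("packages:", packages)
print("topics:", topics)
```

An exact name match is the simplest rule; tags that share a name with a package but refer to a general concept would need manual review.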

Visualizations

  • Summary of R & Python

    First, let's look at the aggregated number of users and the number of questions/answers. The bar chart below shows that more users posted questions/answers about Python than about R. Similarly, the total number of Python questions/answers is greater than that of R.

    summary-of-r-python

  • Time Trend: R vs. Python

    The video below shows how to check the time trend of R or Python for different items. For example, we can see how the number of users posting questions and the number of questions about R or Python changed from 1/15/2000 to 9/15/2016. Both R and Python increased during the past six years, and the number of users asking questions about Python (n=9441) is greater than the number asking about R (n=2612).

    time-trends-of-r-versus-python


  • Topics: R versus Python

    Next, let's look at the findings about packages and topics. The most popular topics of R are dataframe, plot, loops, regex, and function.

    topics_r

    Based on the graph below, we can see the most popular topics of Python are python 2.7, python 3.x, list, dictionary, and tkinter.

    topics_pythons

  • Packages: R versus Python

    The most popular packages of R are ggplot2, shiny, data.table, dplyr, and list.

    packages_r

    The most popular packages of Python are django, pandas, numpy, matplotlib, and regex.

    packages_pythons
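The counts behind the summary and time-trend charts above (number of questions and number of distinct users asking them) could be computed along the following lines. This is a minimal sketch using only the Python standard library, not the app's actual code; the column names `OwnerUserId` and `CreationDate`, the ISO-style timestamp format, and the sample rows are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def monthly_activity(rows):
    """Return {"YYYY-MM": (n_questions, n_distinct_users)} from question rows."""
    questions = defaultdict(int)
    users = defaultdict(set)
    for row in rows:
        # Assumed timestamp format, e.g. 2016-08-03T09:15:00Z
        created = datetime.strptime(row["CreationDate"], "%Y-%m-%dT%H:%M:%SZ")
        month = created.strftime("%Y-%m")
        questions[month] += 1
        # Rows without a user id still count as questions, but not as users.
        if row["OwnerUserId"]:
            users[month].add(row["OwnerUserId"])
    return {m: (questions[m], len(users[m])) for m in sorted(questions)}


# Hypothetical sample rows standing in for a Questions table.
sample_csv = """Id,OwnerUserId,CreationDate
1,10,2016-08-03T09:15:00Z
2,10,2016-08-20T17:02:11Z
3,11,2016-09-01T08:00:00Z
4,,2016-09-15T12:30:45Z
"""

trend = monthly_activity(csv.DictReader(io.StringIO(sample_csv)))
for month, (n_questions, n_users) in trend.items():
    print(month, n_questions, n_users)
```

Running the same function on the R rows and the Python rows separately gives the two series that a trend chart would compare.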

About Author

Chuan Hong

Chuan Hong is a Ph.D. Candidate majoring in Public Health at the University of South Carolina. Her main research areas are environmental health sciences, with a focus on environmental epidemiology. By using a series of data collection, statistical...

Leave a Comment

Chuan Hong March 7, 2017
Hi Victor, thanks for liking my project. I used the googleVis library to make this "Motion Chart". Here's an introduction to the Motion Chart visualization: https://developers.google.com/chart/interactive/docs/gallery/motionchart. I hope this helps :)
Victor February 16, 2017
Hi Chuan, nice article. Thanks for the insights. What library did you use to generate the Time Trend R x Python in the video? Was it R Shiny? Thanks
