Topics from TED Talks

Posted on Feb 4, 2019

TED Conferences LLC posts its talks online for free distribution, and they have been watched billions of times worldwide since the site launched. The purpose of this project is to study how the topics of these videos have changed over time.

This is a web scraping project: all information was scraped from ted.com/talks using Selenium. More than 3,000 videos were scraped, and the following items were extracted from each one: the title, the description, the keywords (or related topics), the month and year in which the talk took place, the number of views, the number of transcript languages and the number of comments.
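The items extracted for each talk can be pictured as one record per video. The sketch below only illustrates that structure: the class name `TalkRecord`, the field names, the helper `parse_count` and the sample values are hypothetical and are not taken from the project's code. It also assumes that counts are displayed with thousands separators, which may not match the site.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TalkRecord:
    """One scraped talk; field names are hypothetical and mirror the items listed above."""
    title: str
    description: str
    keywords: List[str]
    month: int                  # month the talk took place (1-12)
    year: int                   # year the talk took place
    views: int
    transcript_languages: int
    comments: int


def parse_count(text: str) -> int:
    """Convert a displayed count such as '1,234,567' into an integer (assumed format)."""
    return int(text.replace(",", "").strip())


# Made-up sample values, for illustration only.
example = TalkRecord(
    title="An example talk",
    description="A placeholder description.",
    keywords=["technology", "science"],
    month=2,
    year=2019,
    views=parse_count("1,234,567"),
    transcript_languages=12,
    comments=34,
)
```

Keeping one record per talk, with the keywords stored as a list, makes the per-topic analysis easier because a talk can then be counted under each of its topics.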

You may find the code file here.

Data Analysis

Each video has several related keywords, and there are more than 400 different topics. The scatter plot below shows, for each topic, the number of videos against the average number of views.

Graph 1: Number of videos vs average of views for each topic
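The two quantities plotted in Graph 1 can be computed with a simple aggregation. The following is a minimal sketch, not the project's code: the function name `topic_stats` and the sample rows are made up, and it assumes a talk counts once toward each of its keywords.

```python
from collections import defaultdict

# Made-up sample rows: (keywords of the talk, number of views).
talks = [
    (["technology", "science"], 1_000_000),
    (["technology", "design"], 3_000_000),
    (["body language"], 9_000_000),
]


def topic_stats(rows):
    """Return {topic: (number_of_videos, average_views)}."""
    counts = defaultdict(int)
    total_views = defaultdict(int)
    for keywords, views in rows:
        for topic in set(keywords):  # count each topic once per talk
            counts[topic] += 1
            total_views[topic] += views
    return {t: (counts[t], total_views[t] / counts[t]) for t in counts}


stats = topic_stats(talks)
```

Each entry of `stats` corresponds to one point of the scatter plot: the number of videos on one axis and the average number of views on the other.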

It was a surprise to see in Graph 1 that the topics with the most videos posted are not the most watched ones. To study how the topics evolved over time, these two extremes were analyzed separately.

The graph below shows how the topics with more than 400 videos posted progressed over time.

Graph 2: Topics with more than 400 videos posted
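A time series like the one in Graph 2 can be built by counting talks per topic and per year, then keeping only the topics above a threshold. This is a minimal sketch under the assumption that the graph shows yearly counts; the function name `yearly_counts` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up sample rows: (year of the talk, keywords of the talk).
talks = [
    (2015, ["technology", "science"]),
    (2015, ["technology"]),
    (2016, ["society", "technology"]),
    (2016, ["society"]),
]


def yearly_counts(rows, min_videos):
    """Return {topic: {year: videos}} for topics with more than min_videos talks overall."""
    per_topic = defaultdict(Counter)
    for year, keywords in rows:
        for topic in set(keywords):  # count each topic once per talk
            per_topic[topic][year] += 1
    return {
        topic: dict(sorted(by_year.items()))
        for topic, by_year in per_topic.items()
        if sum(by_year.values()) > min_videos
    }


# The post uses a threshold of 400; 1 is used here only because the sample is tiny.
frequent = yearly_counts(talks, min_videos=1)
```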

The most posted topics are, in descending order: technology, science, culture, global issues, design, business and society. Technology was at the top of the most posted topics until 2016, when society had a significant increase in the number of videos. This might suggest a change in TED's repertoire.

The graph below shows how the topics with more than 4 million views evolved over time.

Graph 3: Topics with more than 4 million views

The most viewed topics are, in descending order: body language, introvert, mindfulness, success, time and evil. Success and time are two topics that may be strongly related. Overall, it is not possible to see a significant change over time in the most viewed topics.

Another factor studied is the relation among the number of views, the number of transcript languages and the number of comments. The data for Graph 4 covers the last 5 years, and the size of each circle represents the number of comments.

Graph 4: Relation among the number of views, the number of transcript languages and the number of comments

Graph 4 suggests that the number of views may increase when a video has more transcript languages, but it does not allow a clear conclusion about the influence of the number of comments.
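One way to put a number on the relations read from Graph 4 is a correlation coefficient. The post itself relies on the visual reading of the graph; the sketch below swaps in the Pearson correlation as an illustration only, and the function name `pearson` and the sample values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up sample values, for illustration only.
languages = [5, 10, 20, 40]
views = [100_000, 250_000, 400_000, 900_000]
comments = [30, 10, 50, 20]

r_languages = pearson(languages, views)
r_comments = pearson(comments, views)
```

A coefficient close to 1 or -1 indicates a strong linear relation and a value near 0 a weak one. A correlation alone does not show that more transcript languages cause more views, since popular talks may simply attract more translations.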

Future Work

For future work, studying the network among the topics and analyzing the central ones would help give a wider view of how the topics evolved.
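As a rough idea of what such a topic network could look like, the sketch below links two topics whenever they appear as keywords of the same talk, and uses the number of distinct neighbours as a simple measure of centrality. This is one possible approach and not something the project implemented; the function names and the sample keyword lists are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Made-up keyword lists, one per talk.
talk_keywords = [
    ["technology", "science", "design"],
    ["technology", "society"],
    ["science", "technology"],
]


def cooccurrence(keyword_lists):
    """Return {(topic_a, topic_b): number of talks tagged with both}, with topic_a < topic_b."""
    edges = defaultdict(int)
    for keywords in keyword_lists:
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return dict(edges)


def degree(edges):
    """Return {topic: number of distinct topics it co-occurs with}."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {topic: len(others) for topic, others in neighbours.items()}


edges = cooccurrence(talk_keywords)
central = degree(edges)
```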

Scraping more information about the comments (such as comment date, replies and helpful ratings) would make it possible to analyze how people interact with a video or topic; for example, whether there is more interaction on new videos than on older ones.

Conclusion

TED posts more talks about technology, but it seems that people are more interested in videos related to success and career. In addition, the number of transcript languages appears to have a positive effect on the number of views.

About Author

Stella Oliveira

Data scientist with a background in financial services and demonstrated experience managing data and deploying predictive models. Highly motivated to combine the ability to thrive in a fast-paced work environment with the fascination for generating insights from complex...