Topics from TED Talks

Posted on Feb 4, 2019

The company TED Conferences LLC posts talks online for free distribution, and they have been watched billions of times worldwide since the site was launched. The purpose of this project is to study how the topics of these videos have changed over time.

This is a web scraping project, and all information was scraped from the TED website using Selenium. More than 3,000 videos were scraped, and the items extracted from each one are: the title of the video, the description of the video, the keywords (or related topics), the month and year in which the talk took place, the number of views, the number of transcript languages and the number of comments.
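Below is a minimal sketch of the clean-up step that could follow the Selenium extraction. It assumes each field is first collected as a raw string (for example, the `.text` of a Selenium `WebElement`) and then converted to numbers. The raw formats ("1,234,567 views", "Feb 2019", "Available in 12 languages") and the names `TalkRecord`, `parse_count`, `parse_month_year` and `build_record` are hypothetical. They are not the project's actual code, and the real page text may differ.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: raw strings look like "1,234,567 views", "Feb 2019",
# "Available in 12 languages" and "34 comments"; the real page text may differ.
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


@dataclass
class TalkRecord:  # hypothetical container for one scraped talk
    title: str
    description: str
    keywords: list[str]
    month: int
    year: int
    views: int
    transcript_languages: int
    comments: int


def parse_count(text: str) -> int:
    """Extract the first number from text such as '1,234,567 views'."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return int(match.group().replace(",", ""))


def parse_month_year(text: str) -> tuple[int, int]:
    """Convert text such as 'Feb 2019' into (2, 2019)."""
    month_name, year = text.split()
    return MONTHS[month_name[:3]], int(year)


def build_record(raw: dict[str, str]) -> TalkRecord:
    """Build a TalkRecord from the raw strings collected for one talk.

    The keys of `raw` are hypothetical field names chosen for this sketch.
    """
    month, year = parse_month_year(raw["date"])
    return TalkRecord(
        title=raw["title"].strip(),
        description=raw["description"].strip(),
        # Keywords are assumed to arrive as one comma-separated string.
        keywords=[k.strip().lower() for k in raw["keywords"].split(",") if k.strip()],
        month=month,
        year=year,
        views=parse_count(raw["views"]),
        transcript_languages=parse_count(raw["languages"]),
        comments=parse_count(raw["comments"]),
    )
```

Lower-casing the keywords keeps variants such as "Technology" and "technology" from being counted as two different topics in the later analysis.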

You may find the code file here.

Data Analysis

Each video has several related keywords, and there are more than 400 distinct topics. The scatter plot below shows the number of videos against the average number of views for each topic.
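The per-topic numbers behind a plot such as Graph 1 can be computed in a few lines. This sketch assumes each video has already been reduced to a `(keywords, views)` pair. `topic_stats` is a hypothetical helper, not the project's code. A video with several keywords is counted once under each of its topics, which is one reasonable reading of how the topics were tallied.

```python
from collections import defaultdict


def topic_stats(videos):
    """Return {topic: (number_of_videos, average_views)}.

    `videos` is an iterable of (keywords, views) pairs; a video that lists
    several keywords is counted once under each of them.
    """
    counts = defaultdict(int)
    total_views = defaultdict(int)
    for keywords, views in videos:
        for topic in set(keywords):  # ignore a keyword repeated on one video
            counts[topic] += 1
            total_views[topic] += views
    return {t: (counts[t], total_views[t] / counts[t]) for t in counts}
```

Each `(count, average)` pair then becomes one point of the scatter plot, for example with matplotlib's `plt.scatter`.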

Graph 1: Number of videos vs. average number of views for each topic

Graph 1 shows a surprising result: the topics with the most videos posted are not the most watched ones. To study how the topics evolved over time, these two extremes were examined separately.

The graph below shows how the topics with more than 400 videos posted evolved over time.

Graph 2: Topics with more than 400 videos posted

The most posted topics are, in descending order: technology, science, culture, global issues, design, business and society. Technology led the most posted topics until 2016, when the number of videos about society increased significantly. This might suggest a change in TED's repertoire.
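One way to build the series for Graph 2 is sketched below. It assumes each video is a `(keywords, year)` pair and that the 400-video threshold applies to each topic's total over the whole period. The helper name `yearly_counts_for_popular_topics` is hypothetical.

```python
from collections import Counter, defaultdict


def yearly_counts_for_popular_topics(videos, min_videos=400):
    """Count videos per year for every topic with more than `min_videos` videos.

    `videos` is an iterable of (keywords, year) pairs.
    Returns {topic: {year: number_of_videos}}.
    """
    videos = list(videos)  # iterate twice: once for totals, once per year
    totals = Counter(topic for keywords, _ in videos for topic in set(keywords))
    popular = {t for t, n in totals.items() if n > min_videos}

    per_year = defaultdict(Counter)
    for keywords, year in videos:
        for topic in set(keywords) & popular:
            per_year[topic][year] += 1
    return {t: dict(c) for t, c in per_year.items()}
```

Plotting one line per topic from the returned dictionary gives a chart in the style of Graph 2.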

The graph below shows how the topics with more than 4 million views evolved over time.

Graph 3: Topics with more than 4 million views

The most viewed topics are, in descending order: body language, introvert, mindfulness, success, time and evil. Success and time may be closely related topics. Among these most viewed topics, no significant change is visible over time.
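A similar sketch for Graph 3 follows. The post does not state exactly what Graph 3 plots, so this makes two assumptions: that "more than 4 million views" refers to a topic's average views across its videos, and that the plotted series is the average views per year. Both the interpretation and the helper name `yearly_avg_views_for_top_topics` are assumptions.

```python
from collections import defaultdict


def yearly_avg_views_for_top_topics(videos, min_avg_views=4_000_000):
    """Average views per year for each topic whose overall average views
    exceed `min_avg_views`.

    `videos` is an iterable of (keywords, year, views) tuples.
    Returns {topic: {year: average_views}}.
    """
    videos = list(videos)
    count = defaultdict(int)
    total = defaultdict(int)
    for keywords, _, views in videos:
        for topic in set(keywords):
            count[topic] += 1
            total[topic] += views
    # ASSUMPTION: the threshold is on the topic's overall average views.
    top = {t for t in count if total[t] / count[t] > min_avg_views}

    year_count = defaultdict(lambda: defaultdict(int))
    year_total = defaultdict(lambda: defaultdict(int))
    for keywords, year, views in videos:
        for topic in set(keywords) & top:
            year_count[topic][year] += 1
            year_total[topic][year] += views
    return {
        t: {y: year_total[t][y] / year_count[t][y] for y in year_count[t]}
        for t in year_count
    }
```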

Another factor studied is the relationship among the number of views, the number of transcript languages and the number of comments. The data in Graph 4 covers the last 5 years, and the size of each circle represents the number of comments.

Graph 4: Relationship among the number of views, the number of transcript languages and the number of comments

Graph 4 suggests that the number of views may increase when a video has more transcript languages, but it does not support a clear conclusion about the influence of the number of comments.
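To put a number on what Graph 4 shows visually, one could compute Pearson correlation coefficients for the talks in the most recent five-year window. This is a hedged sketch: the `(year, views, languages, comments)` tuple layout, the `last_year` parameter and the helper names are assumptions. A positive coefficient would indicate an association only. It would not show that adding transcript languages causes more views.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def recent_correlations(videos, last_year, window=5):
    """Correlate views with transcript languages and with comments for talks
    from the years last_year - window + 1 through last_year.

    `videos` is an iterable of (year, views, languages, comments) tuples.
    """
    recent = [v for v in videos if last_year - window < v[0] <= last_year]
    views = [v[1] for v in recent]
    return {
        "languages": pearson([v[2] for v in recent], views),
        "comments": pearson([v[3] for v in recent], views),
    }
```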

Future Work

For future work, studying the network of topics and analyzing the most central ones would give a wider view of how the topics evolve.
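A possible starting point for that network analysis is sketched below. It assumes two topics are linked whenever they appear as keywords of the same video, and it uses degree centrality as the measure of how central a topic is. Other measures, such as betweenness or eigenvector centrality, could be swapped in. The helper names are hypothetical. If a library is preferred, NetworkX's `degree_centrality` computes the same normalized degree.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(keyword_lists):
    """Weighted undirected graph as nested dicts: an edge joins two topics
    that appear on the same video, weighted by how many videos they share.
    Topics that never share a video with another topic are not included."""
    graph = defaultdict(lambda: defaultdict(int))
    for keywords in keyword_lists:
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def degree_centrality(graph):
    """Number of distinct neighbours of each topic divided by (n - 1).
    Edge weights are ignored, as in NetworkX's degree_centrality."""
    n = len(graph)
    if n < 2:
        return {t: 0.0 for t in graph}
    return {t: len(nbrs) / (n - 1) for t, nbrs in graph.items()}
```

Tracking these centrality scores year by year, for example by building one graph per year, would be one way to follow how the central topics change over time.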

Scraping more information about comments (such as comment date, replies and helpful ratings) would make it possible to analyze how people interact with each video or topic, for example whether newer videos attract more interaction than older ones.

Conclusion

TED posts more talks about technology, but viewers seem more interested in videos related to success and career. In addition, the number of transcript languages appears to be positively associated with the number of views.

About Author

Stella Oliveira

Data scientist with a background in financial services and demonstrated experience managing data and deploying predictive models. Highly motivated to combine the ability to thrive in a fast-paced work environment with the fascination for generating insights from complex...