Topics from TED Talks
TED Conferences LLC posts talks online for free distribution, and they have been watched billions of times worldwide since the site was launched. The purpose of this project is to study how the topics of these videos have changed over time.
This is a web scraping project: all information was scraped from ted.com/talks using Selenium. More than 3,000 videos were scraped, and the following items were extracted from each one: title, description, keywords (or related topics), month and year of the talk, number of views, number of transcript languages, and number of comments.
You may find the code file here.
Each video has several related keywords, and there are more than 400 different topics. The scatter plot below shows, for each topic, the number of videos against the average number of views.
It was a surprise to see from Graph 1 that the topics with the most videos posted are not the most watched ones. To study the evolution of the topics over time, these two extremes were examined separately.
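The per-topic aggregation behind Graph 1 (number of videos and average views for each topic) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the record layout, the field names `topics` and `views`, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are assumptions,
# not the schema used in the original project.
talks = [
    {"title": "Talk A", "topics": ["technology", "science"], "views": 1_000_000},
    {"title": "Talk B", "topics": ["technology"], "views": 3_000_000},
    {"title": "Talk C", "topics": ["body language"], "views": 9_000_000},
]


def topic_stats(talks):
    """Return {topic: (number_of_videos, average_views)}."""
    counts = defaultdict(int)
    total_views = defaultdict(int)
    for talk in talks:
        # set() guards against a topic listed twice on the same talk.
        for topic in set(talk["topics"]):
            counts[topic] += 1
            total_views[topic] += talk["views"]
    return {t: (counts[t], total_views[t] / counts[t]) for t in counts}


stats = topic_stats(talks)
for topic, (n_videos, avg_views) in sorted(stats.items()):
    print(f"{topic}: {n_videos} videos, {avg_views:,.0f} average views")
```

Each talk contributes its view count to every topic it is tagged with, so a talk with several keywords is counted once per keyword.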
The following graph shows the progress over time of the topics with more than 400 videos posted.
The most posted topics are, in descending order: technology, science, culture, global issues, design, business and society. Technology was at the top of the most posted topics until 2016, when society had a significant increase in the number of videos. This might suggest a change in TED's repertoire.
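One way to obtain the yearly series for the most posted topics is to count videos per topic first, keep the topics above a threshold, and then count them by year. The sketch below assumes each record carries a `topics` list and a `year` field (both hypothetical names), and it uses a tiny threshold so that the made-up sample data produces a result; the analysis above used 400.

```python
from collections import Counter, defaultdict

# Hypothetical sample records; field names and values are assumptions.
talks = [
    {"topics": ["technology", "society"], "year": 2015},
    {"topics": ["technology"], "year": 2015},
    {"topics": ["society"], "year": 2016},
    {"topics": ["society", "design"], "year": 2016},
]


def yearly_counts(talks, min_videos):
    """Return {topic: {year: n_videos}} for topics with more than min_videos talks."""
    totals = Counter(topic for talk in talks for topic in set(talk["topics"]))
    frequent = {topic for topic, n in totals.items() if n > min_videos}

    per_year = defaultdict(Counter)
    for talk in talks:
        for topic in set(talk["topics"]):
            if topic in frequent:
                per_year[topic][talk["year"]] += 1
    # Sort years so each series is ready to plot in chronological order.
    return {topic: dict(sorted(c.items())) for topic, c in per_year.items()}


trend = yearly_counts(talks, min_videos=1)
for topic, series in sorted(trend.items()):
    print(topic, series)
```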
The following graph shows the evolution over time of the topics with more than 4 million views.
The most viewed topics are, in descending order: body language, introvert, mindfulness, success, time and evil. Success and time are two topics that can be strongly related. Overall, no significant change can be seen in the most viewed videos.
Another factor studied is the relationship among the number of views, the number of transcript languages and the number of comments. The data for Graph 4 covers the last 5 years, and the size of each circle represents the number of comments.
From Graph 4, it can be concluded that the number of views might increase when a video has more transcript languages, but no clear conclusion can be drawn about the influence of the number of comments.
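The visual impression from Graph 4 could be complemented with a simple numeric check such as a Pearson correlation coefficient. This is a suggestion rather than something done in the original analysis, and the numbers below are invented purely to make the sketch runnable. A positive coefficient would indicate an association between the two variables; it would not by itself show that adding transcript languages causes more views.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Invented sample values (not scraped data): transcript languages and
# views in millions for four hypothetical talks.
languages = [10, 20, 30, 40]
views_millions = [1.0, 2.5, 2.9, 4.2]

r = pearson(languages, views_millions)
print(f"correlation between languages and views: {r:.2f}")
```

The same function could be applied to the number of comments and the number of views to check whether the unclear picture in the graph is also reflected in a coefficient close to zero.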
For future work, studying the network among the topics and analyzing the central ones would help to give a wider view of the evolution of the topics.
Scraping more information about comments (such as comment date, replies and helpful ratings) would be interesting for analyzing how people interact with a video or topic; for example, whether there is more interaction on new videos than on older ones.
TED posts more talks about technology, but it seems that people are more interested in videos related to success/career. Besides that, the number of transcript languages has a positive effect on the number of views.