TED Talks: Trends in Global Issues, Science, and Technology
"Scraping TED Talks" is a longitudinal examination of trendiness of TED Talks on global issues, science, and technology. After extracting and transforming unstructured data from multimedia content, different methods and different measures of trendiness were used to inform analysis. Taken together, both methods reveal different sides of the story behind the numbers, as well as the evolution of trends. A composite measure of trendiness was constructed to gain a deeper understanding of the overall trending landscape.
Author: You-Sun Nam, Data Science Fellow
Quick Links: GitHub | Primary Data | Portfolio
Table of Contents
Note: Parentheses indicate estimated length in minutes ("min") or seconds ("sec")
- Executive Summary (2 min)
- Business Case (2 min)
- Methodology (1 min)
- Trends by Total Count vs Tag Frequency (Total: 8.5 min)
- Trends by Composite Measure (3 min)
- Future Updates (27 sec)
- Appendix (46 sec)
- Contact
Executive Summary
Length: 2 min
After scraping, transforming, and analyzing unstructured data from TED Talks in global issues/technology and science/technology, the following insights can be made:
- Unsurprisingly, technology informs the future orientation and practical application of TED Talk content when cross-referenced with global issues or science. Including the technology category has the same effect in both categories: it boosts long-term or future-oriented issues while downgrading past or short-term current issues.
- Measuring trends with and without controlling for TED Talk tag frequency tells different sides of the story. Without controlling for tag frequency, total count provides a broad, macro understanding of overall trends. Controlling for tag frequency (hereafter abbreviated as "tag frequency"), on the other hand, reveals rising trends within this broad context that were previously obscured by total count.
- Both methods (total count and tag frequency) may inform the evolution of trends. Total count identifies current, mainstream trends that are most likely industry-driven, whereas tag frequency identifies up-and-coming trends that have yet to be mainstreamed but are gaining popularity among the audience.
- Distinguishing between the different measures of trendiness (hits, audience engagement, and worldwide appeal) is irrelevant by total count, but relevant after controlling for tag frequency. Further research is needed to examine if and how specific trends in each measure identified by tag frequency are interrelated or manifest thematically.
Technical Notes
Demonstrated skills, language(s), and tools:
- Web scraping: Selenium with Python
- Data cleaning: R
- Data visualization: R
Business Case
Length: 2 min
Introduction
If a picture is worth a thousand words, then how much is a video worth? Outside of Excel spreadsheets and existing databases, there is an extraordinary amount of unstructured data to explore. Let's take a look at a sample TED Talk as an example:
Figure 2.1.1 A sample TED Talk
How many data points can you spot in the above screenshot? Here are some data points to start things off: title, speaker, summary, date, location, number of views, transcript, reading list, footnotes, number of comments, the type of comments...
Remember, these are data points from one video. Imagine all the data points from hundreds and thousands of related videos. What does all this data mean, individually and collectively?
Figure 2.1.2 A collection of TED Talks, filtered by category
More importantly, which data points are worth extracting? The answer to this question is: It depends on your research question.
Purpose
What are some trends in global issues, science, and technology that companies can tap into?
After scraping and transforming unstructured data from TED Talks on global issues, science, and technology, I conduct a longitudinal analysis of industry trends for global development organizations, government agencies, tech companies, and marketers. With scraped variables serving as a proxy for trends or a different aspect of trendiness, I demonstrate how different methods and different measures can provide a wide range of business insights.
With each scraped numerical variable representing a different aspect of trendiness, I then aggregate relevant variables to construct a multivariate indicator. This multivariate indicator clarifies and provides a deeper understanding of trends in global issues, science, and technology on a macro level.
Methodology
Length: 1 min
Scraped Variables
Within the scope of global issues, science, and technology, I scraped the following data points using Selenium with Python:
Textual data | Numerical data |
Tags | Number of views |
Title | Number of comments |
Speaker | Number of translations |
Summary | Year |
Transcript | |
Tags are the main focus of study, as a proxy for trends.
Number of views, number of comments, and number of translations are numerical variables measuring trendiness. Each numerical variable serves as a proxy indicator for an aspect of trendiness, which is detailed in the next subsection.
The first part of analysis involves examining tags by each numerical measure of trendiness. Two methods, total count and tag frequency, were applied to each measure.
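The scraped variables listed above can be pictured as one record per talk. The following is a minimal sketch in Python, not the author's actual schema: the class name `TalkRecord`, the field names, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TalkRecord:
    """One scraped TED Talk (hypothetical field names, for illustration)."""
    # Textual data
    title: str
    speaker: str
    summary: str
    transcript: str
    tags: List[str] = field(default_factory=list)
    # Numerical data
    views: int = 0
    comments: int = 0
    translations: int = 0
    year: int = 0


# Made-up example values, not real scraped data
example = TalkRecord(
    title="A hypothetical talk",
    speaker="Jane Doe",
    summary="Placeholder summary.",
    transcript="Placeholder transcript.",
    tags=["climate change", "future"],
    views=1000,
    comments=25,
    translations=12,
    year=2015,
)
```

Keeping tags as a list on each record makes it straightforward to aggregate any numerical measure by tag later on.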
Constructing Composite Measure of Trendiness
The numerical variables were aggregated to create a multivariate indicator of trendiness. The measures of trendiness are as follows, with each numerical variable as a proxy indicator for some aspect of trendiness:
- Number of views as hits
- Number of comments as audience engagement
- Number of translations as worldwide appeal
Each numerical measure was weighted equally.
The second part of the analysis involves examining tags by this composite measure of trendiness.
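The equal weighting described above could be sketched as follows. The write-up does not say how the three measures were put on a common scale before being combined, so the min-max rescaling here is an assumption, as are the function names and the made-up numbers.

```python
def min_max(values):
    """Rescale a list of numbers to the 0-1 range (constant lists map to 0.0)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(views, comments, translations):
    """Equal-weight mean of the three rescaled measures, one score per talk."""
    scaled = [min_max(views), min_max(comments), min_max(translations)]
    return [sum(parts) / len(parts) for parts in zip(*scaled)]


# Made-up numbers for three hypothetical talks
views = [1000, 5000, 3000]
comments = [10, 50, 30]
translations = [5, 25, 15]
scores = composite_scores(views, comments, translations)
```

Rescaling first keeps the number of views, which is typically orders of magnitude larger than the other two, from dominating the composite. Other choices, such as z-scores or ranks, would be equally defensible.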
Trends by Total Count vs Tag Frequency
Total Length: 8.5 min
There are multiple ways to measure trendiness by tags (as opposed to overall trendiness, which is what the previous measures indicate). In this project, we use two methods: first by total tag count, and second by tag frequency. Afterwards in the "Total Count vs Tag Frequency: Which is Right?" section, the different stories told by each method are analyzed and used to inform which method provides more value.
Total Count
Length: 2 min
The figures below illustrate the Top 20 trends in either category (Global Issues and Technology or Science and Technology) according to total number of views (hits), number of comments (audience engagement), and number of translations (worldwide appeal).
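As a rough illustration of the total count method, the sketch below sums one measure over every talk that carries a given tag and then ranks the tags. The original cleaning was done in R; this Python version, its function names, and its toy data are assumptions for illustration only.

```python
from collections import defaultdict


def total_by_tag(talks, measure):
    """Sum one numerical measure over every talk that carries each tag."""
    totals = defaultdict(int)
    for talk in talks:
        for tag in talk["tags"]:
            totals[tag] += talk[measure]
    return dict(totals)


def top_n(scores, n=20):
    """Return the n highest-scoring (tag, score) pairs, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up talks, not real scraped data
talks = [
    {"tags": ["future", "design"], "views": 100},
    {"tags": ["future"], "views": 300},
    {"tags": ["mars"], "views": 350},
]
view_totals = total_by_tag(talks, "views")
ranking = top_n(view_totals, n=2)
```

In this toy data, "future" ranks first simply because it appears on two talks, which is the kind of effect the tag frequency method is meant to control for.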
Global Issues and Technology
Note that aside from a few exceptions, the top trends remain consistent across all measures and are fairly interchangeable in ranking.
Figure 4.1.1.1 Top 20 Trends in Global Issues and Technology by Number of Views
Figure 4.1.1.2 Top 20 Trends in Global Issues and Technology by Number of Comments
Figure 4.1.1.3 Top 20 Trends in Global Issues and Technology by Number of Translations
For easy comparison, here are the top five trends across all three measures (total number of views, total number of comments, and total number of translations):
Rank | Total Views | Total Comments | Total Translations |
1 | Climate change | Business | Culture |
2 | Future | Culture | Business |
3 | Culture | Design | Design |
4 | Business | Politics | Future |
5 | Environment | Climate change | Climate change |
The trends seem to hold across each numerical measure of trendiness, with only minor differences in ranking. Let's see if the same pattern of consistency occurs with a different category of videos.
Science and Technology
Having carried out the same analysis with a different category (this time, science and technology), we can see the same pattern of consistency and ranking interchangeability occur.
Figure 4.1.2.1 Top 20 Trends in Science and Technology by Number of Views
Figure 4.1.2.2 Top 20 Trends in Science and Technology by Number of Comments
Figure 4.1.2.3 Top 20 Trends in Science and Technology by Number of Translations
For easy comparison, here are the top six trends across all three measures (total number of views, total number of comments, and total number of translations):
Rank | Total Views | Total Comments | Total Translations |
1 | Innovation | Innovation | Innovation |
2 | Future | Future | Future |
3 | Invention | Engineering | Biology |
4 | Engineering | Invention | Invention |
5 | Design | Biology | Medicine |
6 | Biology | Medicine | Design |
What can we conclude so far? Given the same pattern of consistency and ranking interchangeability in both categories, the top trends are likely driven by how broad and how frequently used the tags are, rather than by anything meaningful about the topics themselves.
Tag Frequency
Length: 2.5 min
This time, let's carry out the same analysis, but controlling for frequency of tags. This should also weed out the issue with broad, generalizable tags.
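One way to read "controlling for tag frequency" is to divide each tag's total by the number of talks that carry the tag, which matches the "Per Tag Count" wording in the figure captions. The sketch below assumes that reading; the function name and toy data are hypothetical.

```python
from collections import defaultdict


def per_tag_frequency(talks, measure):
    """Divide each tag's summed measure by the number of talks carrying the tag."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for talk in talks:
        for tag in talk["tags"]:
            totals[tag] += talk[measure]
            counts[tag] += 1
    return {tag: totals[tag] / counts[tag] for tag in totals}


# Made-up talks, not real scraped data
talks = [
    {"tags": ["future", "design"], "views": 100},
    {"tags": ["future"], "views": 300},
    {"tags": ["mars"], "views": 350},
]
views_per_tag = per_tag_frequency(talks, "views")
best_tag = max(views_per_tag, key=views_per_tag.get)
```

With this toy data, the frequently used tag "future" has the largest total (400 views) but averages only 200 views per talk, so the rarer tag "mars" moves to the top once frequency is controlled for.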
Global Issues and Technology
After controlling for tag frequency, note that the pattern of consistency and ranking interchangeability has pretty much disappeared.
Figure 4.2.1.1 Top 20 Trends in Global Issues and Technology by Number of Views Per Tag Count
Figure 4.2.1.2 Top 20 Trends in Global Issues and Technology by Number of Comments Per Tag Count
Figure 4.2.1.3 Top 20 Trends in Global Issues and Technology by Number of Translations Per Tag Count
For easy comparison, here are the top five trends across all three measures (number of views per tag frequency, number of comments per tag frequency, and number of translations per tag frequency):
Rank | Views Per Tag Frequency | Comments Per Tag Frequency | Translations Per Tag Frequency |
1 | Rocket Science | Iraq | Vaccines |
2 | Mars | Europe | Plastic |
3 | Industrial Design | Online video | Iraq |
4 | Life | Military | Interview |
5 | Religion | Demo | Library |
After controlling for tag frequency, number of views, number of comments, and number of translations no longer correspond to each other. Here we can see several interesting patterns that differ by hits, audience engagement, and worldwide appeal.
Science and Technology
After controlling for tag frequency for science and technology TED Talks, the pattern of consistency and ranking interchangeability is dramatically reduced.
Figure 4.2.2.1 Top 20 Trends in Science and Technology by Number of Views Per Tag Count
Figure 4.2.2.2 Top 20 Trends in Science and Technology by Number of Comments Per Tag Count
Figure 4.2.2.3 Top 20 Trends in Science and Technology by Number of Translations Per Tag Count
For easy comparison, here are the top five trends across all three measures (number of views per tag frequency, number of comments per tag frequency, and number of translations per tag frequency):
Rank | Views Per Tag Frequency | Comments Per Tag Frequency | Translations Per Tag Frequency |
1 | Manufacturing | Social media | Toy |
2 | Social media | Gaming | Personality |
3 | Gaming | Compassion | Language |
4 | Compassion | Body language | Introvert |
5 | Body language | Birds | Evolutionary Psychology |
After controlling for tag frequency, we can see number of views and number of comments tend to correspond for the top 5 trends, with more variation in later rankings. On the other hand, there is little relationship between number of translations and the other two measures, at least for the top five trends.
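The correspondence described above can be made concrete with a simple overlap score. The sketch below uses the Jaccard index on the top five tags from the table above; choosing this index is an assumption for illustration and not something the original analysis reports.

```python
def overlap(a, b):
    """Jaccard overlap of two rankings, ignoring order: shared tags / all tags."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Top five science and technology tags, copied from the table above
views = ["Manufacturing", "Social media", "Gaming", "Compassion", "Body language"]
comments = ["Social media", "Gaming", "Compassion", "Body language", "Birds"]
translations = ["Toy", "Personality", "Language", "Introvert", "Evolutionary Psychology"]

views_vs_comments = overlap(views, comments)          # 4 shared out of 6 distinct tags
views_vs_translations = overlap(views, translations)  # no shared tags
```

Views and comments share four of six distinct tags (about 0.67), while views and translations share none, which is consistent with the reading of the table given above.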
Total Count vs Tag Frequency: Which is Right?
Length: 3.5 min
Different methods (total count and tag frequency) tell different stories. Which story is "right"? Which method should be used to measure trendiness? The short answer is 'both,' in that both stories are "right" and both methods should be used to measure trendiness. So if both methods are right, how do we account for the different conclusions?
Total count paints a broad picture of trends in global issues, science, and technology. It is primarily useful for gaining an overall understanding that puts more specific trends in context. Tag frequency, on the other hand, gives us more meaningful insight into the rising trends obscured by total count.
Business Value
To illustrate the business value of using both methods to tell a different aspect of the story, let's compare the top five global issues and technology trends identified by total count...
Rank | Total Views | Total Comments | Total Translations |
1 | Climate change | Business | Culture |
2 | Future | Culture | Business |
3 | Culture | Design | Design |
4 | Business | Politics | Future |
5 | Environment | Climate change | Climate change |
...to the top five global issues and technology trends identified after controlling for tag frequency:
Rank | Views Per Tag Frequency | Comments Per Tag Frequency | Translations Per Tag Frequency |
1 | Rocket Science | Iraq | Vaccines |
2 | Mars | Europe | Plastic |
3 | Industrial Design | Online video | Iraq |
4 | Life | Military | Interview |
5 | Religion | Demo | Library |
Assume you're a marketing analyst at a multinational technology firm that is interested in expanding its presence in international affairs. Your task is to identify current and future trends that can be used to inform the direction of the firm's tech products and services, which would be used by the firm's clients to address global issues. Afterwards, you are to present your analysis to upper management. How will you reconcile the different results?
Distinguishing the Trend
If total count provides insight into the broad, overall trends and controlling for tag frequency uncovers trends obscured by this broader context, then here is how you might distinguish the trends identified by each method:
- Total count: Broadly speaking, the top five trending TED Talks in global issues and technology revolve around current societal problems that have continuity into the future. Environmental issues, such as climate change, are one prominent example. Measured by total count, these are current trends. Because we are not taking tag frequency into account, the popularity of these trends is influenced by sheer number. In turn, this suggests these trends are thoroughly mainstreamed and driven by the industry as a whole.
- Tag frequency: After controlling for tag frequency, we see that the top five trending TED Talks are far more specific in topic and scope. Given the different results across each numerical measure of trendiness, there appears to be little to no correspondence between hits, audience engagement, and worldwide appeal. If total count provides us insight into current trends, then tag frequency identifies up-and-coming trends. Because the popularity of these trends is not influenced by sheer number, these trends are most likely not yet mainstreamed and are driven by key, individual players.
- Business recommendation: First, improve and refine the current line of technological products and services for current, future-impacting issues such as climate change, but expect some market saturation. Second, the identified up-and-coming trends inform the direction R&D should take when designing and targeting future technological products and services. However, more research on the identified up-and-coming trends is needed beforehand.
Trends by Composite Measure
Length: 3 min
Recall from the "Methodology" section that a composite measure of trendiness was constructed by equally weighting each individual numerical measure. Each numerical measure served as a proxy for the following (a.k.a. what we wanted to measure):
- Number of views as hits
- Number of comments as audience engagement
- Number of translations as worldwide appeal
Going back to tag count, let's broaden our understanding of the overall context using this composite measure of trendiness.
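To make the top and bottom rankings concrete, here is a minimal sketch that averages a per-talk composite score by tag and then picks the highest and lowest tags. It assumes, as the figure captions in this section suggest, that the composite is averaged per tag; the key name `composite`, the function names, and the toy scores are hypothetical.

```python
from collections import defaultdict


def average_by_tag(talks, key="composite"):
    """Average a per-talk score over every talk that carries each tag."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for talk in talks:
        for tag in talk["tags"]:
            sums[tag] += talk[key]
            counts[tag] += 1
    return {tag: sums[tag] / counts[tag] for tag in sums}


def top_and_bottom(scores, n=10):
    """Return (top n, bottom n) tags, each ordered from highest to lowest score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], ranked[-n:]


# Made-up composite scores, not real results
talks = [
    {"tags": ["business", "design"], "composite": 0.75},
    {"tags": ["business"], "composite": 0.25},
    {"tags": ["library"], "composite": 0.125},
]
averages = average_by_tag(talks)
top, bottom = top_and_bottom(averages, n=1)
```

With fewer than 2n tags, the top and bottom lists would overlap, so a real analysis would want to check the number of distinct tags first.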
Global Issues and Technology
Below is a lollipop graph depicting the most trending and least trending TED Talks in global issues and technology:
Figure 5.1 Top 10 and Bottom 10 Trends in Global Issues and Technology by Average Composite Measure of Trendiness
At this point in the analysis, the top 10 trends shouldn't come as a surprise. From this graph, we can see that the most trending TED Talks focus on practical applications ("business," "design," "invention," "communication," "collaboration") to current problems that will continue to affect society in the future ("future," "climate change"/"environment," and "politics"/"culture"). In comparison to the bottom 10 trends, the top 10 trends are globally applicable and broad in scope.
The least trending TED Talks in global issues and technology, on the other hand, are less interconnected. Unsurprisingly, the bottom 10 trends are also narrower in scope than the top 10 trends. We also know from the first part of the analysis that tag frequency does not necessarily explain why these topics are unpopular. (Even after controlling for tag frequency, none of these topics appear in the top 20 trends.) Taken together, these points suggest that these trends are fairly niche, appealing to a minority of the TED audience that browses global issues and technology.
Science and Technology
Let's take a look at the most trending and least trending TED Talks in science and technology:
Figure 5.2 Top 10 and Bottom 10 Trends in Science and Technology by Average Composite Measure of Trendiness
The most trending TED Talks in science and technology focus on practical applications ("innovation," "invention," "engineering," "medicine," "design," "biotech") to biological issues ("biology," "health," "brain") with implications for the future ("future"). In comparison to the bottom 10 trends (such as "Middle East" and "South America"), the top 10 trends are international and broad in scope. These results are fairly similar in theme to the most trending talks in global issues and technology, most likely because of the influence of the 'technology' category.
The least trending TED Talks in science and technology, on the other hand, are less interconnected. Unsurprisingly, the bottom 10 trends are also narrower in scope than the top 10 trends. We also know from the first part of the analysis that tag frequency does not necessarily explain why these topics are unpopular. (Even after controlling for tag frequency, none of these topics appear in the top 20 trends.) Taken together, these points suggest that these trends are fairly niche, appealing to a minority of the TED audience that browses science and technology. They may, however, appeal to a different segment of the TED audience.
Future Updates
Length: 27 sec
- Replicate the project with a larger sample size, e.g. by including similar videos outside of TED Talks
- Examine the popularity of speakers as a variable. How do a speaker's popularity and reputation affect these measures of trendiness?
- Generate meaningful subcategories by analyzing textual data using Topic Modeling
- Analyze case studies: Conduct sentiment analysis using NLP on comments left on most popular TED Talks in global issues, science, and technology
Appendix
Length: 46 sec
The first and second parts of the analysis were longitudinal in nature, focused on identifying trends over time. A shorter longitudinal study or a cross-sectional analysis restricted to a specific year can also be conducted. A brief analysis was carried out to understand how time might affect the trendiness of TED Talks in global issues, science, and technology. Each numerical measure of trendiness displayed different trends in both global issues/technology and science/technology.
Exploratory data analysis suggests that more investigation is needed into dramatic spikes in specific years, especially the year(s) that overlap across each measure of trendiness. The speaker should also be taken into account when conducting cross-sectional analysis.
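A first pass at finding the years that stand out could look like the sketch below, which totals each measure by year and reports the peak year per measure. Treating a "spike" as the single highest yearly total is a simplifying assumption, and the function names and toy data are hypothetical.

```python
from collections import defaultdict

MEASURES = ("views", "comments", "translations")


def yearly_totals(talks, measure):
    """Sum one numerical measure over all talks from each year."""
    totals = defaultdict(int)
    for talk in talks:
        totals[talk["year"]] += talk[measure]
    return dict(totals)


def peak_year(totals):
    """Return the year with the highest total."""
    return max(totals, key=totals.get)


# Made-up talks, not real scraped data
talks = [
    {"year": 2010, "views": 100, "comments": 5, "translations": 10},
    {"year": 2011, "views": 400, "comments": 9, "translations": 30},
    {"year": 2011, "views": 200, "comments": 4, "translations": 20},
    {"year": 2012, "views": 300, "comments": 20, "translations": 15},
]
peaks = {m: peak_year(yearly_totals(talks, m)) for m in MEASURES}
```

Years that show up as the peak for more than one measure (2011 for views and translations in this toy data) would be natural candidates for a closer cross-sectional look.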
Contact
If you have any questions or comments, please feel free to reach out to me on LinkedIn or GitHub.