Analyses of YouTube Statistics in the U.S.A
"The joy of YouTube is that you can create content about anything you feel passionate about, however silly the subject matter."
Zoe Sugg
Introduction:
Whether you are trying to learn a new language, pick up a new skill, or showcase your personal talent, YouTube tends to be the first option that individuals gravitate towards. The possibility of going viral and being recognized from anywhere in the world is something I believe has been under-appreciated in this age of rapid technological advancement. YouTube has revolutionized the world ever since its emergence on the scene, empowering the "average Joe" by providing a platform for entrepreneurship and success in ways that would be improbable without its existence.
With that being said, becoming known globally in many cases can be attributed to a single video. If we look amongst those videos that have successfully gone viral, is there a common denominator amongst them?
Background Information:
As someone who frequently checks out what's new on YouTube, I was motivated to learn a bit more about trending YouTube videos. The data set I used was acquired from Kaggle and provides daily trending video statistics, with up to 200 trending videos per day. More than 40,000 observations were used in the analysis, with the features listed below:
- Video ID (unique identifier for each uploaded video)
- Trending_date
- Channel Title
- Category_id
- Video Description
- Publish Time
- Tags
- Number of Views, Likes, Dislikes & Comments
- Thumbnail link
- video_error_or_removed
- Ratings Disabled and/or Comments Disabled
I also did some additional feature engineering that I felt was necessary in analyzing the data. Some of these additional features include:
- Like Percentage (Proportion of likes)
- Trending Diff (number of days before trending)
- Category (used the category_id to derive a new column called "Category" that contains the category names)
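As a rough illustration of these derived features, here is a minimal Python sketch (the project itself was written in R, so this is not the original code). It assumes that Like Percentage means likes / (likes + dislikes), that `trending_date` is stored as `yy.dd.mm`, and that `publish_time` is an ISO 8601 timestamp. The small `CATEGORY_NAMES` lookup and the sample row are made up for the example.

```python
from datetime import datetime

# Assumed, partial category_id -> name lookup (illustrative only).
CATEGORY_NAMES = {10: "Music", 17: "Sports", 24: "Entertainment"}


def engineer_features(row):
    """Return a copy of `row` with the three derived columns added."""
    likes, dislikes = row["likes"], row["dislikes"]
    total_ratings = likes + dislikes
    # Assumption: "proportion of likes" = likes / (likes + dislikes).
    like_percentage = likes / total_ratings if total_ratings else None

    # Assumed date formats: "yy.dd.mm" and ISO 8601 with a trailing Z.
    trending = datetime.strptime(row["trending_date"], "%y.%d.%m").date()
    published = datetime.strptime(
        row["publish_time"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).date()

    return {
        **row,
        "like_percentage": like_percentage,
        # Whole days between upload and the trending date.
        "trending_diff": (trending - published).days,
        "category": CATEGORY_NAMES.get(row["category_id"], "Unknown"),
    }


# Made-up sample row, not taken from the data set.
sample_row = {
    "trending_date": "17.14.11",
    "publish_time": "2017-11-13T17:13:01.000Z",
    "category_id": 10,
    "likes": 900,
    "dislikes": 100,
}
features = engineer_features(sample_row)
```

Videos with ratings disabled have no likes or dislikes to compare, which is why the sketch returns `None` for them instead of dividing by zero.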
This project was built with Shiny, an R package for creating interactive web applications, which makes it easy for anyone to explore my findings. The source code is available in my GitHub repository.
The first thing I decided to look at was the distribution of views per category. It was not surprising to see that approximately 63% of the total views amongst trending videos were in the Music and Entertainment categories.
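The share of views per category can be computed along these lines. This is a hedged Python sketch with made-up numbers, not the project's R code; the `category` and `views` field names are assumptions.

```python
from collections import defaultdict


def view_share_by_category(rows):
    """Map each category to its fraction of the total views."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["category"]] += row["views"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: views / grand_total for category, views in totals.items()}


# Toy rows for illustration only; these are not figures from the data set.
toy_rows = [
    {"category": "Music", "views": 400},
    {"category": "Music", "views": 100},
    {"category": "Entertainment", "views": 300},
    {"category": "Sports", "views": 200},
]
shares = view_share_by_category(toy_rows)
```

Because the same video can appear on several trending days, summing raw rows counts its views more than once. Keeping only the latest row per video ID before aggregating is one way to avoid that, if it matters for the question being asked.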
I then decided to investigate the most frequently used words in YouTube video titles to see whether any keywords in a video's title were associated with success. I found that "official", "trailer", and "video" were the most frequently used words in video titles. I did this with natural language processing (NLP), removing stopwords and miscellaneous characters so that the relevant keywords stood out.
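The general idea of the title word count can be sketched as follows. This is a simplified Python illustration, not the R pipeline used in the project: the stopword list is a tiny hand-picked set (a real analysis would use a full list from an NLP library), and the titles are invented.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; an assumption, not the list used in the project.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on", "with", "is"}


def top_title_words(titles, n=3):
    """Count lowercase words across titles, skipping stopwords and punctuation."""
    counts = Counter()
    for title in titles:
        # Keep runs of letters only, which drops digits and punctuation.
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 1)
    return counts.most_common(n)


# Invented titles for illustration only.
toy_titles = [
    "Some Movie | Official Trailer (2018)",
    "Some Song (Official Video)",
    "The Making of a Trailer",
    "Official Music Video",
]
top_words = top_title_words(toy_titles, n=5)
```

Note that a count like this shows which words are common in trending titles, not that using them causes a video to trend.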
Looking more closely at the channels with the highest number of trending videos, I noticed that ESPN surpassed its contemporaries. This result was expected, as ESPN is a destination not just for people interested in sports but also for audiences interested in its contentious "Hot Takes" debates.
Lastly, I wanted to examine how videos trend over time. I figured that the number of days between a video's upload and its trending date would be a good way to see how trending videos are distributed over time (in days, in this case).
It was fascinating to see that the bulk of trending videos started trending between 2 and 7 days after being uploaded. What does this really mean? My intuition is that if you expect your video to be a one-hit wonder, you had better hope it starts trending within a week. It is important to note that a video can still be successful even if it does not trend within a week, because videos generally accumulate views over time and can still be very profitable.
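A distribution like this can be summarized with a simple count per day plus the fraction of videos that fall inside a window. The Python sketch below is illustrative only: the function names are hypothetical and the day counts are made up, not drawn from the data set.

```python
from collections import Counter


def trending_diff_distribution(diffs):
    """Number of videos per days-to-trend value, sorted by day."""
    return dict(sorted(Counter(diffs).items()))


def share_within(diffs, low=2, high=7):
    """Fraction of videos whose days-to-trend lies in [low, high]."""
    if not diffs:
        return 0.0
    return sum(low <= d <= high for d in diffs) / len(diffs)


# Made-up days-to-trend values for ten videos.
toy_diffs = [1, 2, 2, 3, 3, 4, 5, 7, 10, 30]
distribution = trending_diff_distribution(toy_diffs)
share_2_to_7 = share_within(toy_diffs)
```

The long tail (the 30-day value in the toy data) is the reason to look at the full distribution and not just the mean, which a few late-trending videos can pull upward.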
Conclusion and Future Work:
My findings can be summarized as follows:
- The Music and Entertainment categories dominate YouTube's trending videos, drawing more than 60% of views.
- ESPN has the highest number of trending videos.
- Trending videos generally start trending between 2 and 7 days after being uploaded.
- The words in "Official Music Video" are among the most popular keywords used in YouTube video titles.
To reach a final verdict on whether a true pattern exists amongst trending YouTube videos, this project will have to branch out into international YouTube statistics. One useful analysis would be to investigate which categories trend on other continents and to see whether a correlation exists between culture and YouTube popularity.
I would also look at the choice of tags used in these popular videos. Exploring tag names alongside frequently used title words could be useful, as tags are typically used for search engine optimization (SEO). It would be interesting to see what tag strategies the top videos on YouTube use.
The skills the author demonstrated here are taught in the NYC Data Science Academy's Data Science with Machine Learning bootcamp.