Hong Yang (Jason) Wang
Posted on Feb 18, 2019

YouTube is one of the websites I use every day. I generally look for pop music or comedy shows to watch during my free time. Some of my friends turn to YouTube for tutorial videos to solve their problems. The popularity of the platform has opened up the opportunity for those who broadcast on it to achieve both fame and fortune. That made me wonder: is it possible to predict which channels will prove successful? I came up with some questions: How do YouTube channels target their audience? What's the correlation between channel subscribers and video views? To find the answers, I used Scrapy to scrape data from Social Blade and analyzed the dataset.

You can see my project code and PowerPoint link below:


I targeted ten variables for each of the top 5,000 YouTube channels. You can see an example above, highlighted with a red line: username, uploaded videos, subscribers, video views, country, channel type, channel created date, subscriber rank, video view rank, and estimated yearly earnings.

However, I encountered two problems during and after the web scraping. One was that some links didn't connect directly to the right page like the one above, but only to a search results page, as in the example below. Some other links even redirected me to an unrelated page.
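One way to catch these bad links is to look at the URL the scraper actually ends up on after redirects and keep only those that look like a channel page. The sketch below is not the code I used in the project. The host name and the path prefixes are assumptions for illustration and would need to be checked against the real site layout.

```python
from urllib.parse import urlparse

# Assumed path prefixes for channel pages (hypothetical, adjust as needed).
CHANNEL_PREFIXES = ("/youtube/user/", "/youtube/channel/", "/youtube/c/")


def is_channel_page(final_url, expected_host="socialblade.com"):
    """Return True if the URL reached after redirects looks like a channel page."""
    parts = urlparse(final_url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # A redirect to another site counts as an unrelated page.
    if host != expected_host:
        return False
    # Search result pages and other paths do not match the channel prefixes.
    return parts.path.lower().startswith(CHANNEL_PREFIXES)
```

In a Scrapy callback, a check like this could be run on `response.url` before parsing, so that search result pages and unrelated pages are skipped or logged instead of producing broken rows.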

Another difficulty was that official YouTube channels do not show information such as uploaded videos, video views, country, channel type, view rank, and estimated yearly earnings. You can see one of the examples below.

I lost nearly one thousand observations during scraping: because of the unstable links, I could only scrape about four thousand observations. During my cleaning process, I removed the official YouTube channels from my dataset to keep outliers out of my analysis.
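The cleaning step can be sketched as a simple filter that drops any row missing the fields that official channels do not show. This is a minimal illustration, not the project code. The field names and the two sample rows are made up.

```python
# Hypothetical field names for the values that official channels leave blank.
REQUIRED_FIELDS = ("uploads", "video_views", "country", "channel_type")


def is_complete(row):
    """Return True when every required field is present and non-empty."""
    return all(row.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def drop_incomplete(rows):
    """Keep only the rows that have all required fields filled in."""
    return [row for row in rows if is_complete(row)]


# Made-up sample rows: one regular channel and one official channel.
rows = [
    {"username": "channel_a", "uploads": 120, "video_views": 5_000_000,
     "country": "US", "channel_type": "Music"},
    {"username": "official_topic", "uploads": None, "video_views": None,
     "country": "", "channel_type": ""},
]

cleaned = drop_incomplete(rows)
print([row["username"] for row in cleaned])
```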



The pie chart above shows that the lion's share of YouTube channels, 36.5 percent, are from the U.S.A., followed by Brazil with 11.3 percent and India with 9.97 percent.

The subscriber share differs slightly, as you can see from the following bar chart. Channels from the United States and India have the most subscribers. One reason is that India is the second most populous country in the world, and the United States is the third. China, the most populous country, is not represented because YouTube is not available there; the video-sharing website in use there is called Iqiyi.


The pie chart below shows the different types of channels on YouTube. The top three are entertainment with 26.8 percent, music with 20.8 percent, and games with 12.2 percent.

In the bar chart below, you can see that comedy, music, and games channels have the most subscribers on YouTube.


However, according to the bar chart below, music channels draw more video views than any other type of channel. Many factors contribute to their popularity, including the fact that people use YouTube to listen to music. YouTube also lets people listen to music on their phones while the screen is off, and even download music to listen to on their phones. This is a significant change in the industry, drawing music listeners away from competitors like Pandora and Spotify.

The following subscriber-view scatter plot shows that subscribers and video views are highly correlated. The more subscribers YouTubers have, the more views they tend to get.
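To put a number on "highly correlated", one option is the Pearson correlation coefficient, which runs from -1 to 1, with values near 1 meaning the two variables rise together. The sketch below computes it with the standard library only. The sample numbers are made up for illustration and are not taken from my dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up example values, not real channel data.
subscribers = [1_000_000, 2_000_000, 3_000_000, 4_000_000]
views = [210_000_000, 390_000_000, 620_000_000, 800_000_000]

r = pearson_r(subscribers, views)
print(round(r, 3))
```

The same function can be applied to uploaded videos versus views, or uploaded videos versus subscribers, to compare how strong each relationship is.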


The following two scatter plots show videos uploaded against views and videos uploaded against subscribers. Neither pair is highly correlated: the points are scattered with no clear pattern. This indicates that a YouTube channel with a lot of uploaded videos won't necessarily draw more views and subscribers. The key is to target your audience by giving them what they want to see. Those who can deliver on that gain views and subscribers.


Going through the data reveals some key correlations associated with popular YouTube channels. We can see how popularity divides geographically, with U.S. and Indian YouTube channels showing the most subscribers. Popularity by genre indicates that comedy and music channels have the most subscribers, though music as a category draws the most video views overall on YouTube. Another clear correlation is that the more subscribers a channel has, the more views it will get.

Future Work

In this project, I scraped estimated yearly earnings as one of my variables. However, I did not use it in my analysis. The reason is that the earnings calculation is complicated and differs for each YouTube channel. Many factors go into calculating the income of a YouTube channel, such as subscribers, CPC (cost per click), and CPM (cost per mille). For future work, I would like to look into each YouTube channel's earnings based on its subscribers, views, the average length of a video, and the average ad cost. To do so, I will need to scrape more information from other websites.
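As a starting point, CPM alone gives a rough range: earnings are roughly views divided by 1,000, times the CPM. The sketch below shows only that arithmetic. The CPM values are placeholders I picked for illustration, and the calculation ignores CPC, video length, and every other factor mentioned above, so it should not be read as how any site actually estimates earnings.

```python
def estimate_earnings(views, cpm_low, cpm_high):
    """Return a (low, high) earnings range from a view count and a CPM range.

    CPM is the cost per thousand views, so earnings = views / 1000 * CPM.
    """
    if views < 0 or cpm_low < 0 or cpm_high < cpm_low:
        raise ValueError("views and CPM must be non-negative, with low <= high")
    thousands_of_views = views / 1000
    return thousands_of_views * cpm_low, thousands_of_views * cpm_high


# Placeholder numbers: one million views at an assumed CPM of 1 to 4 dollars.
low, high = estimate_earnings(1_000_000, 1.0, 4.0)
print(low, high)
```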
