Is BTS really more popular in the United States than in South Korea?

Posted on Jul 28, 2019

Project GitHub | LinkedIn:   Niki   Moritz   Hao-Wei   Matthew   Oren


Introduction

 Some of you have heard of BTS already, and some of you may be hearing of it for the first time. I wasn't very interested in BTS myself before this project.

 I chose this topic because, one day, many media outlets suddenly began reporting that BTS had topped the Billboard charts, and after that the media talked about BTS more actively than ever before.

 So I assumed that BTS had become more popular in Korea since then. I wondered whether BTS was more popular in the U.S. than in Korea, and whether its popularity in Korea was driven by its popularity in the U.S. or by some other factor.

 

Who is BTS?

BTS background information

 

 Here is some background information on BTS.

 BTS was not popular right from its debut. It is a Korean boy band that gained ground through active promotion at home and abroad.

 

For more information about BTS, see the Wikipedia article:

https://en.wikipedia.org/wiki/BTS_(band)

 

 

Method of analysis

 Covering the period from BTS's debut to the present, I collected the Billboard 200, Hot 100, and Artist 100 charts from Billboard.com and compared them with each other and with the top 100 ranking from Melon.com.

 Billboard needs no introduction; Melon is a Korean music streaming service. Although Melon has recently been controversial because of alleged manipulation to raise the rankings of unknown artists, it still provides a popular music ranking.

 To answer the following questions, I set up two hypotheses and then tried to verify them with the data.

Can a leading chart and a following chart be identified?
Is there a connection between the data sets?

 

Scraping issues

 Before analyzing the data, I want to share the issues I ran into while scraping.

 At first I tried to crawl both sites with Scrapy alone, but in the end I used Scrapy for Billboard and Selenium for Melon.

 The reason is that, in Melon's case, an HTTP 406 error occurs when scraping the ranking page (https://www.melon.com/chart/search/).

HTTP 406 error

 This is Melon.com blocking crawlers.

 In some cases a firewall prevents scraping altogether, but here only certain pages are blocked from crawling, which makes the block easy to bypass.

 Because the block is not at the firewall level, it can be resolved by opening the parent page first to obtain a session, then moving to the ranking page and crawling it with Selenium.
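 The workaround described above (open the parent page first so the browser holds a session, then move to the ranking page) could look roughly like the sketch below. This is not the project's actual scraper: the function names, the fixed wait time, the choice of Chrome, and the use of the site's home page as the "parent page" are all assumptions made for illustration. Only the ranking URL comes from the post.

```python
import time

MELON_HOME = "https://www.melon.com/"                # parent page (assumed entry point)
MELON_CHART = "https://www.melon.com/chart/search/"  # ranking page named in the post


def fetch_chart_html(driver, wait_seconds=2.0):
    """Visit the parent page first, then the ranking page, and return its HTML.

    `driver` is anything with Selenium's WebDriver interface
    (a `get(url)` method and a `page_source` attribute).
    """
    driver.get(MELON_HOME)    # establish a session in the browser
    time.sleep(wait_seconds)  # crude fixed wait; hypothetical value
    driver.get(MELON_CHART)   # request the ranking page within the same session
    time.sleep(wait_seconds)
    return driver.page_source


def run_with_chrome():
    """Hypothetical entry point; needs Selenium and a Chrome installation."""
    # Imported lazily so the helper above has no hard dependency on Selenium.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        return fetch_chart_html(driver)
    finally:
        driver.quit()  # always close the browser, even on errors
```

 Keeping the navigation logic in `fetch_chart_html` and the browser setup in a separate function makes the page order easy to check without launching a real browser.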

 

Hypothesis 1

BTS first became popular in the U.S. and then became popular in Korea as a result.

 If the hypothesis is correct, the Billboard.com ranking will lead the Melon.com ranking. I therefore compared the rankings of BTS albums by date to see whether the Billboard.com ranking precedes the Melon.com ranking.
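 One simple way to make "precedes" concrete is to compare the first date on which an album enters the top N of each chart. The sketch below is only an illustration of that idea, not the method used for the graphs in this post. The function names and the top-10 threshold are assumptions, and the sample ranks are made-up placeholder values, not real chart data.

```python
from datetime import date


def first_date_in_top(chart, top_n):
    """Return the earliest date on which the rank is within the top N, or None."""
    hits = [day for day, rank in chart.items() if rank <= top_n]
    return min(hits) if hits else None


def lead_in_days(leader, follower, top_n=10):
    """Days by which `leader` reached the top N before `follower`.

    Positive: `leader` got there first. Negative: `follower` got there first.
    None: at least one chart never reached the top N.
    """
    first_leader = first_date_in_top(leader, top_n)
    first_follower = first_date_in_top(follower, top_n)
    if first_leader is None or first_follower is None:
        return None
    return (first_follower - first_leader).days


# Made-up placeholder ranks for one album (NOT real Billboard or Melon data).
billboard = {date(2000, 1, 8): 40, date(2000, 1, 15): 7, date(2000, 1, 22): 3}
melon = {date(2000, 1, 1): 9, date(2000, 1, 8): 2, date(2000, 1, 15): 1}

print(lead_in_days(billboard, melon))
```

 A negative result for `lead_in_days(billboard, melon)` would argue against the hypothesis for that album, although a difference in release dates between the two countries can produce the same pattern.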

 

 This is the Billboard 200 ranking of BTS albums by date. BTS debuted in 2013, but its albums only started reaching the upper part of the chart in late 2016.

 

 This is the Melon ranking of BTS by date. Melon ranks songs rather than albums, so I used the rank of each album's highest-ranked song as that album's rank.

 The graph looks more complicated because it also includes singles and remixes, not only official albums, but the overall trend in Korea is similar.
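 The album-rank assumption for Melon (an album's rank on a given date is the rank of its best-placed song) can be sketched as a small grouping step. The row layout, the function name, and the sample rows below are hypothetical and are not taken from the project's code or data.

```python
def album_rank_by_date(song_rows):
    """Collapse song-level ranks into one rank per (date, album).

    `song_rows` is an iterable of (chart_date, album, song, rank) tuples.
    The album's rank is the best (numerically lowest) rank among its songs.
    """
    best = {}
    for chart_date, album, _song, rank in song_rows:
        key = (chart_date, album)
        if key not in best or rank < best[key]:
            best[key] = rank
    return best


# Placeholder rows (NOT real Melon data), just to show the shape of the input.
rows = [
    ("week-1", "Album A", "Song 1", 12),
    ("week-1", "Album A", "Song 2", 4),
    ("week-1", "Album B", "Song 3", 30),
    ("week-2", "Album A", "Song 1", 9),
]

print(album_rank_by_date(rows))
```

 Taking the best song is only one possible proxy. An average over the album's charting songs would be another, and it would rank albums with many low-placed tracks differently.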

 

 This is the ranking of songs (Billboard Hot 100) over the same period. I see no strong connection between the song rankings and the album rankings.

 So I will exclude the Billboard Hot 100 chart from the subsequent analysis.

 

 These graphs zoom in on the most recent part of the two charts for comparison.

 For the previous album (Love Yourself: Answer), you can see that it ranked a little higher in the U.S. than in Korea. When it was released in Korea, the album started low and then rose sharply, whereas in the U.S. it was ranked No. 1 as soon as it was released.

 Usually, for artists who are very popular in Korea, an album tops the list as soon as it is released.

 For example, look at the following chart for a girl group called Twice: most of their albums are No. 1 as soon as they are released.

 

 But this doesn’t necessarily mean BTS is more popular in the U.S. than in Korea.
 Moreover, the hypothesis that the U.S. ranking leads the South Korean ranking is not supported. On the contrary, because of the release dates, the ranking in Korea moves ahead of the ranking in the United States.

 In the end, the two charts were similar, but the charts alone did not provide enough data to verify the first hypothesis.

 

Hypothesis 2

BTS's artist ranking on Billboard may have affected its album ranking.

 To verify the second hypothesis, I compared Billboard's artist ranking with the album ranking and looked for a correlation between the two.

 Let's start with the artist's ranking.

 Starting in late 2016, you can see the ranking begin to fluctuate and gradually rise. The key point is that it has stayed near the top since its sudden rise around April 2018.

 Let's compare the album ranking with the artist's ranking over the same period.

 Whenever an album topped the chart, the artist also topped the chart. Note that the artist's ranking has skyrocketed since the release of the album shown in orange in April 2018, and all subsequent albums have topped the list. That album, released in April 2018, is the group's third full-length album released in Japan, and the popularity of BTS appears to have risen by leaps and bounds since then.

 

 Here's an interesting fact.

 When a new album is released, the ranking of the previous album rises as well.

 

 We can see the same thing on the Melon chart.

 This suggests that the higher the artist's ranking, the more attention the older albums get when a new album is released.

 

Correlation between artist ranking and album ranking in Billboard

 This graph shows the relationship between album ranking and artist ranking. As expected, the higher the album's ranking, the higher the artist's ranking.

 These results suggest that the second hypothesis is reasonable. The artist's ranking appears to affect the album's ranking, and the album's ranking in turn sustains the artist's ranking. In short, there is a strong correlation between the artist's popularity and the album ranking.
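 To put a number on the relationship between the two rankings, a rank correlation is one natural choice because chart positions are ordinal. The sketch below computes Spearman's coefficient with the standard library only. The post does not say which statistic, if any, was computed for the graph, so this choice is an assumption, and the sample values are made-up placeholders rather than real chart positions.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # 1-based average position of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    return pearson(average_ranks(xs), average_ranks(ys))


# Placeholder weekly positions (NOT real Billboard data).
artist_rank = [50, 30, 12, 5, 1, 1]
album_rank = [120, 80, 20, 9, 2, 1]

print(round(spearman(artist_rank, album_rank), 3))
```

 A coefficient close to 1 would mean the two rankings move together, but it would not show which one drives the other.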

 

Conclusions

 The first hypothesis, that the popularity of BTS in the U.S. influenced its popularity in Korea, is hard to prove from the data collected so far. By a small margin, one could argue that BTS is more popular in the U.S. than in South Korea, but that would be a logical leap. In addition, the U.S. album ranking does not precede the Korean one, which may be due to the albums' release dates.

 As for the second hypothesis, the album ranking and the artist's ranking are closely related. A popular artist's album draws attention as soon as it is released, and that in turn feeds back into the artist's popularity.

 Overall, BTS became popular in the U.S. and in Korea at roughly the same time, but it is hard to work out from the data how the two influenced each other. However, the artist's popularity has skyrocketed since the third full-length album released in Japan in April 2018, and BTS has clearly gained more popularity in both the U.S. and South Korea since then.

 In the future, it would be better to focus on this period and analyze unstructured data, such as social media posts and newspaper articles about what happened in April 2018, to reach a more accurate analysis of BTS's popularity.

 

Thank you.
