Anime Exploration: Analyzing Different Elements of Anime

Posted on Feb 17, 2020
The skills I demoed here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

The Shiny app for this project can be accessed here and contains many interactive elements. The corresponding code can be found in the GitHub repository.


What comes to mind when you see the word anime? You may think of the popular anime movie Your Name or classics from Studio Ghibli such as Spirited Away and My Neighbor Totoro. Anime is an art form that originated in Japan.

Anime is particularly popular among teenagers from middle school to college. Some genres, however, are hugely popular across all age ranges. As part of the massive ACGN industry (short for Anime, Comics, Games, and Novels), anime is closely connected with manga, light novels, and games. For this project, I created a Shiny app with R to analyze different features of anime and how those features affect the popularity and ratings of particular animes.

The analysis in this article is somewhat long. If you are interested in the findings of the project, feel free to scroll down to the second to last section that summarizes the takeaways.

Methodology and Tools

This project uses the statistical software R. After obtaining the dataset, I used R packages including dplyr and tidyr to clean the data for later analysis. After data cleaning, the plots were created with the ggplot2 package. Lastly, Shiny-related packages (shiny, shinydashboard and shinyWidgets) were used to create the Shiny app, add interactive features and deploy it to the web.

A Snapshot of the Shiny App

If you would like to find out more, please access the app and try out the interactive features.


As with movies and TV shows, people rate animes. MyAnimeList is one of the biggest rating and ranking websites for animes and is sometimes called the "IMDB of animes". The dataset was obtained from Kaggle and is based on information collected from MyAnimeList. It has two major parts: information about the anime itself and statistics specific to MyAnimeList.

The first part contains the title, premiere date, genre, type of anime and so forth. The second part includes scores, audience size and other MyAnimeList data for each anime. After cleaning, the dataset contains 8265 animes with 16 features associated with each title.

Analysis of Different Features

At first glance, the average score of all animes in the dataset is 6.74 out of 10, just above what would count as a passing grade. The average audience per anime is 38,927, close to 40,000. More details on the relationships among the MyAnimeList statistics can be found in later subsections.

We are now ready to drill down into each factor and see how it affects the score and audience size of an anime. Because the dataset lacks budget and profit data, we will use audience size as the measure of popularity and of possible business benefit throughout the analysis.

Types of Animes

In the cleaned dataset, there are four types of anime: TV, OVA, movie, and music. TV refers to anime that is aired on television, mostly in Japan, and also on the internet. An OVA (original video animation) is bonus content for a TV anime, and each TV anime typically has one or more OVAs. An OVA serves as the connection between two seasons or simply as an addition to the regular TV anime content. TV and OVA together account for more than 80% of the animes in the dataset, in roughly equal shares. Movies make up 14% and music fills the remaining 2%.

Density Plots of Scores of the Anime VS. Different Types

As the music type is quite rare and is usually not a type that anime producers would consider, we omit it from the analysis. The score distributions for movies and TV are quite similar, as the two density curves overlap extensively. OVA, on the other hand, has an average below the other two types. The sample averages for TV and movie animes differ by only 0.01, and a statistical test does not reject the hypothesis that the two means are equal.
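The comparison of mean scores described above can be sketched in base R. This is only an illustration: the article does not say which test was used, so a Welch two-sample t-test is assumed here, and the data frame, the column names `type` and `score`, and all values are synthetic stand-ins rather than the real dataset.

```r
# Synthetic stand-in for the cleaned dataset (hypothetical columns and values).
set.seed(1)
anime <- data.frame(
  type  = factor(rep(c("TV", "Movie", "OVA"), each = 200)),
  score = c(rnorm(200, mean = 6.9, sd = 0.8),
            rnorm(200, mean = 6.9, sd = 0.8),
            rnorm(200, mean = 6.4, sd = 0.8))
)

# Keep only the two types being compared and drop the unused factor level.
tv_movie <- droplevels(subset(anime, type %in% c("TV", "Movie")))

# Welch two-sample t-test (does not assume equal variances).
# H0: the mean scores of TV and movie animes are equal.
test <- t.test(score ~ type, data = tv_movie)

print(test$estimate)  # sample mean of each type
print(test$p.value)   # a large p-value means H0 is not rejected
```

A large p-value here only means the data give no evidence of a difference in means; it does not prove that the two means are identical.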

Boxplots of Number of Audience VS. Types of Anime

Different Audience

The above plots show how audience size differs across the types of anime. By this measure, TV animes enjoy the greatest popularity among the three types and OVA is the least popular. Statistical tests indicate that audience size differs across the types of anime.

In a nutshell, if an anime producer wants to pick a type, the safest choice is TV anime because it enjoys the greatest popularity and its scores are not bad. Another practical reason is that anime movies take much longer to make, so many things, box office results for instance, become much more unpredictable.
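A test for audience differences across the three types could look like the following base R sketch. The specific test is an assumption: a Kruskal-Wallis rank test is used because audience counts tend to be heavily right-skewed. The column name `members` and the generated values are hypothetical.

```r
# Synthetic, right-skewed audience counts (hypothetical columns and values).
set.seed(2)
anime <- data.frame(
  type    = factor(rep(c("TV", "Movie", "OVA"), each = 150)),
  members = c(rlnorm(150, meanlog = 10.0, sdlog = 1),
              rlnorm(150, meanlog = 9.5,  sdlog = 1),
              rlnorm(150, meanlog = 9.0,  sdlog = 1))
)

# Median audience per type, a robust summary for skewed data.
medians <- tapply(anime$members, anime$type, median)
print(round(medians))

# Kruskal-Wallis rank sum test.
# H0: the audience distribution is the same for every type.
kw <- kruskal.test(members ~ type, data = anime)
print(kw$p.value)  # a small p-value suggests at least one type differs
```

A one-way ANOVA on the log of the audience size would be a reasonable alternative under the same assumptions.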

Sources of Animes

As mentioned in the background, anime is part of the ACGN industry. Manga and light novels are popular sources of anime. Besides these two, original animes, where anime companies write the plot and design the characters themselves, are also popular. These three sources account for nearly 80% of all animes, and we will focus on them.

Density Plots of Scores of Animes VS. Different Sources

The density plots above show that animes sourced from manga have the highest average score, while the average score of original animes is on the lower side. The summary statistics confirm this, and statistical tests indicate that the mean scores differ among animes based on these three sources.

Boxplot of Number of Audience VS. Different Anime Sources

The above plot depicts audience size for the different anime sources. Original animes have the lowest average popularity in terms of audience size, while the other two are quite similar. Though the mean audience size is almost the same for manga and light novels, the statistical test shows that the two differ significantly.

Inconsistent qualities

The low scores of original animes are likely due to inconsistent quality: the mangas and light novels chosen for adaptation have established popularity, while original stories do not. To summarize this subsection, an anime producer choosing a source would be advised to pick from manga and light novels.

Producing an original anime is a riskier choice unless the plot and setting are attractive enough. That being said, anime studios face a difficult decision, since they sometimes don't make much profit from animes sourced from manga or light novels, where the original content owner makes most of the money.

Different Ratings of the Anime

Similar to movies, animes are assigned ratings, ranging from G to Rx. When an anime airs in Japan, its rating may determine the time slot in which it can be shown on TV. The ratings used in Japan are more general, while the ones on MyAnimeList tend to be more specific. The following table provides summary statistics after removing outliers.

Summary Statistics With Respect To Ratings of Anime

We can observe that R-rated animes enjoy the highest popularity, and the boxplot confirms this.
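Summary statistics per rating with outliers removed can be sketched as follows. The article does not state how outliers were identified, so the common 1.5 × IQR rule is assumed here, applied to audience size within each rating. The helper `is_outlier`, the column names, and the data are hypothetical.

```r
# Synthetic stand-in data (hypothetical columns and values).
set.seed(3)
anime <- data.frame(
  rating  = rep(c("G", "PG-13", "R"), each = 100),
  score   = round(runif(300, min = 4, max = 9), 2),
  members = round(rlnorm(300, meanlog = 9.5, sdlog = 1.2)),
  stringsAsFactors = FALSE
)

# TRUE for values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Flag audience outliers within each rating group, then drop them.
# ave() returns 0/1 for a numeric input, hence the as.logical().
flagged <- as.logical(ave(anime$members, anime$rating, FUN = is_outlier))
trimmed <- anime[!flagged, ]

# Mean score and mean audience per rating, plus the remaining row count.
summary_by_rating <- aggregate(cbind(score, members) ~ rating,
                               data = trimmed, FUN = mean)
counts <- table(trimmed$rating)
summary_by_rating$n <- as.vector(counts[as.character(summary_by_rating$rating)])
print(summary_by_rating)
```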

Boxplot of Number of Audience Versus Different Ratings of Anime

The main reason is likely that some R-rated animes, Death Note and Attack on Titan for example, enjoy surprisingly high popularity around the world. In reality, though, PG-13 animes are still the mainstream, accounting for nearly half of all animes. A piece of advice to anime producers is that a PG-13 anime is a safe choice, since the audience size and scores for this rating are above average. Nevertheless, it is always worthwhile to try making an R-rated anime, which could attract a larger audience around the world and lead to greater economic benefits.

Anime Making Studios

There are many anime studios within the industry. Famous studios include Studio Ghibli (the studio that made Spirited Away), Sunrise (the Gundam series) and so forth. Many new studios appear each year. To compare anime studios, we will focus on the studios that make the most animes per year. If you are interested in comparing particular studios, you can select them in the Shiny app and compare their statistics.

Number of Animes Made Per Year of Top 10 Studios

The above barplot presents the average number of animes made per year by each studio. These big studios made at least one anime per season, and the top 5 studios made two to three per season. The barplot that follows shows the average score of the animes made by these studios.

Average Scores of Animes For Top 10 Studios

From the plot, we can see that the scores generally don't differ much, though some studios such as DLE have comparatively low scores.
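The per-studio figures in this subsection (average animes per year and average score) can be computed along the following lines. This is a minimal base R sketch on a tiny made-up table; the column names `studio`, `year`, and `score` are hypothetical, and years in which a studio released nothing are simply not counted in its average.

```r
# Tiny made-up table: one row per anime (hypothetical columns and values).
anime <- data.frame(
  studio = c("A", "A", "A", "A", "B", "B", "B", "C"),
  year   = c(2016, 2016, 2017, 2017, 2016, 2017, 2017, 2017),
  score  = c(7.1, 6.8, 7.4, 7.0, 6.2, 6.5, 6.9, 7.8),
  stringsAsFactors = FALSE
)

# Step 1: number of titles per studio per year.
per_year <- aggregate(score ~ studio + year, data = anime, FUN = length)
names(per_year)[names(per_year) == "score"] <- "n_titles"

# Step 2: average of those yearly counts for each studio.
avg_per_year <- aggregate(n_titles ~ studio, data = per_year, FUN = mean)

# Step 3: mean score for each studio.
avg_score <- aggregate(score ~ studio, data = anime, FUN = mean)

# Combine, rank by output, and keep the 10 most productive studios.
studio_summary <- merge(avg_per_year, avg_score, by = "studio")
studio_summary <- studio_summary[order(-studio_summary$n_titles), ]
top_studios <- head(studio_summary, 10)
print(top_studios)
```

The same grouping with `members` in place of `score` would give the average audience per studio.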

Higher than Average

Recall that the average score for the whole dataset is 6.74. The animes made by the big studios generally score above that average.

Average Number of Audiences of Animes Made By Top 10 Studios

The barplot above shows that the average audience differs considerably among these studios. The average audience across all animes in the dataset is close to 40,000, and 4 of the 10 studios in the plot fail to reach it. Audience size may not link directly to economic benefit (Sunrise, for example, profits considerably from selling Gundam models), but it is still an indicator that reminds a studio to maintain its brand value. In other words, a studio may not make more money simply by making more animes each year. The key is to pick good stories and make a great anime each time.

Anime Genres

As with other art forms, anime has many genres; this dataset contains 47 in total. Popular genres include comedy, action, adventure and so forth. Each of these popular genres sees around 30-60 productions per year, while the less popular genres see fewer than 10.

Choosing a popular genre, however, doesn't guarantee a profit or a higher score from the audience. The following two histograms show the average score and average audience for each of the top 10 genres. Comedy has the most productions per year of these 10 genres, while supernatural has the fewest.

Average Score of Animes From Top 10 Genres

From the above histogram, we can see that the average score is quite similar for these genres, with a difference of less than 0.5 for each pair.

Average Audience for the Animes from Top 10 Genres

The average audience differs considerably across genres, as the above histogram shows. The average audience of comedy animes is only half that of school animes. The difficulty here is that production costs differ by genre. As you might imagine, producing a comedy anime costs less than producing an action or supernatural anime. More data would be needed to perform a case-by-case study of each genre. At this point, making a comedy or adventure anime might look safer, but it is not recommended because the market for these genres may be saturated. It would be best for anime producers to make school, romance, or supernatural animes if good plots are available.

Other Exploratory Analysis

The Year When Anime is Made

For this subsection, we will look into the anime made each year.

Number of Animes Made Each Year

Average Score for Anime Made Each Year

Average Audience for Anime Made Each Year

From the above scatterplots, we could conclude that there are more and more animes being made each year. The average score, however, oscillates between different periods. The drop and boost in score within the 1990s might be related to the Otaku Incident that happened in Japan during that time and the appearance of the anime Evangelion.

After that, the score gradually went up until the most recent five years. The audience has gradually increased, which the rise of the Internet likely explains. The drop in average audience at the very end is likely due to a limitation of the dataset, which only covers anime through 2018.

Anime Durations

In this subsection, we will look into the duration of animes.

Count of Animes Made (Outliers Removed)

Average Score of Animes Made in Each Duration

In the first scatterplot, the removed outliers are the animes with a duration of 20-25 minutes, which is the most common anime duration. Besides that, we can see that there are many animes with a duration of less than 10 minutes. Also, the number of anime movies (animes with a duration of around 2 hours) and the number of long anime episodes (animes with a duration of around 50 minutes) are quite similar. In terms of score, however, anime movies on average receive higher scores.

Anime Opening/Ending Themes

Finally, let's take a look at the opening and ending songs of the anime. To make the following boxplots, I separated the anime dataset into two groups. The first group contains the animes that have an opening or ending theme sung by artists who sing a lot of anime songs, and the second group contains the rest. In the following plots, the top 10 artists are the 10 artists who have sung the most anime songs.

Average Score of Animes Cooperated With Top 10 Artists or Not

Average Audience of Animes Cooperated With Top 10 Artists or Not

From the first boxplot, the group of animes that cooperate with top artists has a lower average score than the other group. The number of audiences, on the other hand, is quite similar between the two groups. Thus, according to this dataset, there is no need to cooperate with top artists for making anime songs if the goal is to increase the popularity of animes.
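The grouping used in this subsection can be sketched in base R as follows. The table layout is an assumption (one row per anime and theme artist pair), the names `themes`, `scores`, `anime_id`, and `artist` are hypothetical, and the cutoff is set to 2 instead of 10 only because the toy data is tiny.

```r
# Made-up data: one row per (anime, theme artist) pair.
themes <- data.frame(
  anime_id = c(1, 1, 2, 3, 3, 4, 5, 6),
  artist   = c("X", "Y", "X", "Z", "X", "Y", "W", "V"),
  stringsAsFactors = FALSE
)
scores <- data.frame(
  anime_id = 1:6,
  score    = c(7.2, 6.5, 6.9, 7.0, 7.8, 7.4),
  members  = c(52000, 31000, 40000, 28000, 45000, 36000)
)

top_n <- 2  # the article uses the top 10 artists

# Rank artists by number of theme songs and keep the most prolific ones.
song_counts <- sort(table(themes$artist), decreasing = TRUE)
top_artists <- names(song_counts)[seq_len(min(top_n, length(song_counts)))]

# An anime belongs to the first group if any of its themes is by a top artist.
with_top <- unique(themes$anime_id[themes$artist %in% top_artists])
scores$top_artist <- scores$anime_id %in% with_top

# Compare average score and average audience between the two groups.
group_means <- aggregate(cbind(score, members) ~ top_artist,
                         data = scores, FUN = mean)
print(group_means)
```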

Conclusions and Takeaways

The takeaways of each section are summarized below. They may help people in the anime industry decide what kind of anime to make.

  • The safest type to pick is the TV anime type. Movie animes are worth trying given adequate time and budget.
  • Mangas and light novels are safer sources to choose when making an anime. Original animes are risky in terms of score and audience size.
  • PG-13 animes are the mainstream, and PG-13 is the recommended rating to pick. R-rated animes are more likely to become a great hit around the world and are worth trying.
  • Anime studios can differ a lot in average audience size. Making more animes each year may not help a studio make more money.
  • More data would be needed to determine whether producing animes in certain genres leads to a profit. The general advice for anime producers is to try making animes in the school, romance, and supernatural genres. Also, it would be good to avoid comedy/action animes as the market is saturated with these two genres.

Future Work

As for future work, I would like to investigate the dataset about users and see whether any interesting insights can be drawn from it. I would also like to look into the Japanese anime rating website, merge the two datasets in some way, and see whether any interesting observations can be made. Further, if possible, I would be interested in looking at budgets and profits and performing some data analysis around those metrics as well.

About Author

Hanbo Shao

Data Scientist with a strong quantitative background in mathematics and operations research. Detail-oriented, curious and highly motivated to apply data analysis and machine learning skills into solving real-life problems. A collaborative team player and loves to learn new...