USNews Rankings Analysis

Posted on Jul 29, 2019



Large-cap growth mutual funds attract some of the highest participation among financial instruments. These funds are professionally managed and generate profit primarily through the capital appreciation of growth stocks from more cyclical sectors, like Industrials, Consumer Discretionary, and Technology. Given the average size of large growth funds, they typically have lower expense ratios (the operating expenses charged to shareholders) than other fund categories. However, expense ratios can vary considerably between funds, depending on their size and investment strategy.


The amount an investor pays in expenses has a great impact on long-term total profit. Consider the following: imagine you invested $10,000 in a fund that returns 12% every year with an expense ratio of 1.25%. If the expense ratio is simply subtracted from each year’s return and the remainder compounds annually, your investment would be worth roughly $77,000 after 20 years. If that same fund had an expense ratio of only 0.50%, the investment would be worth roughly $88,000. As you can see, expense ratios are an important factor when comparing investment vehicles.
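The arithmetic behind a comparison like this can be sketched in a few lines. This is a minimal illustration that assumes the expense ratio is subtracted from the gross annual return and that returns compound once a year; the function name `final_value` is made up for this example, and the exact dollar figures shift if fees are modeled differently.

```python
def final_value(principal, gross_return, expense_ratio, years):
    """Value of an investment after annual compounding at the net-of-fee return.

    Assumption: the fee is modeled as a flat reduction of the yearly return.
    """
    net_return = gross_return - expense_ratio
    return principal * (1 + net_return) ** years


# Same fund, same 12% gross return, two different expense ratios.
high_fee = final_value(10_000, 0.12, 0.0125, 20)
low_fee = final_value(10_000, 0.12, 0.0050, 20)

print(f"1.25% expense ratio: ${high_fee:,.0f}")
print(f"0.50% expense ratio: ${low_fee:,.0f}")
print(f"Cost of the higher fee: ${low_fee - high_fee:,.0f}")
```

The useful part is the gap between the two results: three quarters of a percentage point per year compounds into a five-figure difference over two decades.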


Now that we’ve put the importance of expense ratios in perspective, it is worth understanding how fund performance and expenses come together to show which large growth funds offer the best value for investors. Accordingly, I wanted to examine the relationship between how large growth funds are ranked by the most esteemed investment research sites and the funds’ respective expense ratios.

Luckily, USNews provides a rating that aggregates the rankings from the most popular investment research sites: Morningstar, CFRA, Lipper, TheStreet, and Zacks. To analyze the relevant data, I web-scraped USNews to bring to light this relationship between large growth fund rankings and expense ratios.


Web-Scraping Intro


The data I scraped came from USNews’ Money section, which ranks the top 200+ of every type of investment vehicle. Because its rating is aggregated from multiple other sites, I found it to be the most valuable site to scrape: by nature of its aggregation, the rating offers a more all-encompassing assessment of each fund’s performance.

Conveniently, USNews formats their rankings on a single, interactive page that can all be viewed at once, and the landing page for each fund’s description can be accessed right from the rankings page. The rankings page lists each fund’s name, expense ratio, and assets under management; each fund’s landing page provides more specific details of the fund’s expenses, risks, performance over time, and more.

Web Scraping Discussed


I managed to scrape the data of all 226 ranked large growth funds and an additional 70 unranked funds. Given the interactivity of the USNews rankings page, I used the Selenium suite to load and extract all of the rankings data. After scraping the rankings page, I used the href links gathered there to loop through each fund’s landing page and extract additional data. The plan played out as follows:


1)      I scrolled down through the rankings page till the page’s pop-up showed. Once it showed, I stopped the scroll instruction, and clicked to close out the pop-up window.


2)      After closing the pop-up, I ran another while loop to click the ‘Load More’ button and load the rest of the rankings. Although there was a limited amount of content on the page, the ‘Load More’ button would still appear whether it was loading content or not; hence, I had to control my loop to click the button a fixed number of times.


3)      After loading all of the content, I scraped each fund’s name, expense ratio, total assets, and the corresponding href links for each of the fund’s landing pages.


4)      By grabbing all of the href links, I was able to use selenium to loop through my list of links to obtain each fund’s ticker symbol, expense profile, and risk profile.
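The four steps above can be sketched roughly as follows. This is not the original scraper: every CSS selector, field name, and function name here is a hypothetical placeholder, since the page markup is not shown in this post, and the number of ‘Load More’ clicks would have to be found by inspection. The functions accept any Selenium WebDriver and pass the locator strategy as the plain string "css selector", which is the value Selenium's `By.CSS_SELECTOR` holds.

```python
import time

# Every selector below is a hypothetical placeholder, not the real USNews markup.
CSS = "css selector"  # the locator string that Selenium's By.CSS_SELECTOR holds
POPUP_CLOSE = "button.popup-close"
LOAD_MORE = "button.load-more"
FUND_ROW = "li.fund-row"
ROW_FIELDS = {"name": "a.fund-name", "expense_ratio": ".expense-ratio",
              "total_assets": ".total-assets"}
PAGE_FIELDS = {"ticker": ".ticker", "expense_profile": ".expense-profile",
               "risk_profile": ".risk-profile"}


def close_popup(driver, max_scrolls=30, pause=1.0):
    """Step 1: scroll until the pop-up's close button exists, then click it."""
    for _ in range(max_scrolls):
        buttons = driver.find_elements(CSS, POPUP_CLOSE)
        if buttons:
            buttons[0].click()
            return True
        driver.execute_script("window.scrollBy(0, 1000);")
        time.sleep(pause)
    return False


def load_all_rows(driver, clicks, pause=1.0):
    """Step 2: click 'Load More' a fixed number of times, since the button
    never disappears and so cannot signal the end of the list."""
    for _ in range(clicks):
        driver.find_element(CSS, LOAD_MORE).click()
        time.sleep(pause)


def scrape_rankings(driver):
    """Step 3: read each row's fields plus the href of its landing page."""
    records = []
    for row in driver.find_elements(CSS, FUND_ROW):
        record = {key: row.find_element(CSS, sel).text
                  for key, sel in ROW_FIELDS.items()}
        link = row.find_element(CSS, ROW_FIELDS["name"])
        record["href"] = link.get_attribute("href")
        records.append(record)
    return records


def scrape_landing_pages(driver, hrefs, pause=1.0):
    """Step 4: visit each landing page; a field that is absent becomes None."""
    records = []
    for href in hrefs:
        driver.get(href)
        time.sleep(pause)
        record = {"href": href}
        for key, sel in PAGE_FIELDS.items():
            found = driver.find_elements(CSS, sel)
            record[key] = found[0].text if found else None
        records.append(record)
    return records
```

With Selenium installed, these would be called in order on a real driver (for example one created with `webdriver.Chrome()`) after opening the rankings page with `driver.get(...)`. Using `find_elements` for optional fields avoids an exception when a landing page lacks a keyword, which is the situation that later produces empty expense profiles.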


My Data Set


The total time it took me to scrape was just over an hour; the fact that only about 300 funds were assessed made the scraping process fairly manageable. Since I needed to scrape the href links before I could access each fund’s landing page, I had to merge two CSV files: one from the data I gathered on the rankings page, and the other from the data I gathered on the landing pages. Since I had scraped the fund name in both passes, I merged the two CSV files on the fund name. Including the fund name, I was left with 7 columns.
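A merge like this can be sketched with pandas. The column names (including `rank`) and the three placeholder funds are assumptions for illustration only, and small in-memory frames stand in for the two CSV files, which would normally be loaded with `pd.read_csv`.

```python
import pandas as pd

# Placeholder rows standing in for the rankings-page CSV.
rankings = pd.DataFrame({
    "rank": [1, 2, 3],
    "name": ["Fund A", "Fund B", "Fund C"],
    "expense_ratio": [0.65, 0.90, 1.40],
    "total_assets": ["$4.5B", "$300M", "$1.2B"],
})

# Placeholder rows standing in for the landing-page CSV (different row order).
landing = pd.DataFrame({
    "name": ["Fund C", "Fund A", "Fund B"],
    "ticker": ["CCCCX", "AAAAX", "BBBBX"],
    "expense_profile": ["Above Average", None, "Below Average"],
    "risk_profile": ["Below Average", "High", "High"],
})

# validate="one_to_one" raises if a fund name is duplicated in either file.
merged = rankings.merge(landing, on="name", how="left", validate="one_to_one")

# Names present in only one of the two files would signal a scraping gap.
unmatched = set(rankings["name"]) ^ set(landing["name"])

print(merged)
print("names missing from one file:", unmatched)
```

Merging on a free-text name works only if both scrapes capture the name identically, so checking for unmatched names before relying on the result is a cheap safeguard.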


Data Cleaning


Although the amount of data scraped was manageable, the cleaning process was long and detailed. I found the Pandas package to be the most suitable for the cleaning process. To start, the ‘total assets’ column I scraped came as a string, denoting dollar amounts both in billions (e.g. ‘$4.5B’) and millions (e.g. ‘$300M’). To deal with this, I first had to remove the ‘$’ by replacing it with an empty string.

Afterwards, to make the data consistent, I had to convert the rows denoted in billions to millions. I did this by first removing the ‘B’ and ‘M’ so the values could be converted to numbers. Then, considering that no fund had less than $100 million in total assets under management, I used a boolean mask to multiply every value below 99 by 1,000. This successfully converted all of my rows denoted in billions to millions.
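The same conversion can be sketched as below. Note that this is a variation rather than the exact procedure described above: it reads the ‘B’ or ‘M’ suffix directly instead of relying on the size threshold, so it does not depend on every fund holding at least $100 million. The function name `assets_to_millions` and the sample values are made up for this example.

```python
import pandas as pd


def assets_to_millions(series):
    """Convert strings like '$4.5B' or '$300M' to floats in millions of dollars."""
    cleaned = series.str.replace("$", "", regex=False).str.strip()
    number = cleaned.str[:-1].astype(float)      # numeric part, e.g. 4.5
    suffix = cleaned.str[-1].str.upper()         # unit letter, e.g. 'B'
    multiplier = suffix.map({"B": 1_000.0, "M": 1.0})  # unknown suffix -> NaN
    return number * multiplier


assets = pd.Series(["$4.5B", "$300M", "$12.75B", "$100M"])
millions = assets_to_millions(assets)
print(millions)
```

Any value with an unexpected suffix comes back as NaN, which makes malformed rows easy to spot before analysis.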



Reconciling my expense ratio column with my expense profile column also took considerable cleaning. Some of the landing pages did not include expense profile keywords, so for some funds I was left with empty values. To account for this, and considering that the expense profile was directly determined by the expense ratio, I determined the maximum and minimum expense ratio observed for each expense profile.

For instance, I found that funds whose expenses were considered ‘Below Average’ fell in the range between 0.70% and 1.10%, and funds whose expenses were considered ‘Above Average’ fell between 1.30% and 1.70%. I was able to fill in any missing expense profiles using these ranges.
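One way to express that fill-in step with pandas is sketched here. The six rows are toy values chosen to sit inside the two ranges quoted above, not scraped data, and the helper name `infer_profile` is hypothetical. Only the profiles that appear among the labelled rows can be inferred; a ratio that falls outside every observed range stays missing.

```python
import pandas as pd

# Toy values for illustration; the last two rows lack a scraped profile.
funds = pd.DataFrame({
    "expense_ratio": [0.75, 1.05, 1.35, 1.60, 0.90, 1.50],
    "expense_profile": ["Below Average", "Below Average",
                        "Above Average", "Above Average", None, None],
})

# Min and max expense ratio observed for each labelled profile.
bounds = (funds.dropna(subset=["expense_profile"])
               .groupby("expense_profile")["expense_ratio"]
               .agg(["min", "max"]))


def infer_profile(ratio):
    """Return the profile whose observed range contains `ratio`, else None."""
    for label, row in bounds.iterrows():
        if row["min"] <= ratio <= row["max"]:
            return label
    return None


# Fill only the rows where the profile was not scraped.
missing = funds["expense_profile"].isna()
funds.loc[missing, "expense_profile"] = (
    funds.loc[missing, "expense_ratio"].map(infer_profile)
)
print(funds)
```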


Data Analysis


One would assume that the best-performing large growth funds (those with the best rankings) would have the luxury of charging their investors a premium, mostly through management fees, which contribute the most to total expenses, to participate in their superior performance. Yet, as indicated by the data, the best-ranked funds actually charge less, on average, than the worst-ranked funds.



This data suggests a few things. One: the data seems to support the widely accepted view that funds, on average, can’t consistently outperform the indexes. Consequently, it is often in the best interest of investors to choose funds with lower expense ratios, since performance among funds will be more or less the same. Two: the data also seems to confirm the postulate that more turnover in positions doesn’t necessarily add to fund performance. Insofar as a fund can’t outperform the average fund in its category, additional turnover only adds to its expense ratio, which in turn hurts the fund’s overall performance profile.



To inspect the extent of the impact of a fund’s total assets on its expenses, I decided to graph the relationship between fund rank and total assets. Since better-ranked funds tend to have lower expense ratios and worse-ranked funds higher ones, we can use rank as a proxy for a fund’s expenses. One would presume that funds with fewer assets under management would usually need to charge higher expense ratios than those with more assets under management due to economies of scale.

Surprisingly though, based on this set of ranked funds, there isn’t a clear relationship between total assets under management and expense ratio. This lack of correlation suggests that the variance in expense ratios among funds is more a product of business model than of fund size. That is, an approach based on higher turnover or more position dispersion may be the dominant reason for the variance in expense ratios among funds.
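A relationship like this can also be checked numerically rather than only by eye. The sketch below computes a Spearman rank correlation as the Pearson correlation of the ranked values; the six rows are made-up numbers that only show the mechanics and say nothing about the scraped funds, and the column names are assumptions.

```python
import pandas as pd


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of the ranks."""
    return x.rank().corr(y.rank())


# Made-up values for illustration only.
funds = pd.DataFrame({
    "rank": [1, 2, 3, 4, 5, 6],
    "total_assets_m": [4500, 300, 12750, 800, 2200, 150],
    "expense_ratio": [0.60, 0.75, 0.95, 1.05, 1.30, 1.45],
})

rho_assets_rank = spearman(funds["total_assets_m"], funds["rank"])
rho_expense_rank = spearman(funds["expense_ratio"], funds["rank"])

print(f"assets vs rank:        {rho_assets_rank:+.2f}")
print(f"expense ratio vs rank: {rho_expense_rank:+.2f}")
```

A coefficient near zero would be consistent with no monotonic relationship, while values near +1 or -1 would indicate a strong one. A rank-based measure suits this data because total assets are heavily skewed by a few very large funds.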


To further inspect the characteristics of the top-ranked funds, I created a boxplot and a horizontal bar graph to compare each fund’s rank with its risk profile. Risk profile refers to a fund’s risk relative to the risk of the other funds in its category. For example, large growth funds that have an annual standard deviation of returns above 15% and a beta above 1.45 would be profiled as ‘High’ risk, whereas funds with a standard deviation below 12% and a beta below 1 would be profiled as ‘Below Average’ risk.


To my surprise, the data shows that the best-ranked funds assumed high risk whereas worse-ranked funds assumed low risk.


It is worth remembering that the rank data covers a 1-year period. This matters because the data suggests that the most risk-assuming funds may be ranked highest precisely because the risk they assume has let them capitalize most on the aggressive bull market, which would explain their superior performance.


Conversely, this raises the question of how well these top-ranked funds will perform during bear markets and more turbulent times. Will these funds still be ranked the best? Will the high risk be detrimental to them during turbulent times? Is the premium investors pay in expense ratios the price of the safety that less risky large growth funds provide? I would need to scrape data through multiple economic cycles to truly answer these questions.




From the data I gathered, the large growth funds with the lowest expense ratios are ranked higher than those with higher expense ratios. Thus, the performance of funds with higher expense ratios isn’t measurably better than that of funds with lower expense ratios. Looking at each fund’s total assets under management, we see that the variance in expense ratios among funds isn’t necessarily due to size, but more so to strategy: the amount of turnover a fund assumes and the dispersion of its positions.

Lastly, the data shows that the highest-ranked funds are those that have the highest risk profile. By assuming more risk, the highest-ranked funds capitalize on the momentum of the bull market while still limiting their expenses.


Further Considerations


1)      Scrape the USNews large growth fund rankings over an entire economic cycle, or multiple cycles.


2)      Gather the data of each fund’s turnover ratio to measure the correlation between turnover ratio and expense ratio.


3)      Aggregate the data of each fund family and compare the families to each other to see which family characteristics are associated with the strongest results.
