Visualizing Fortune 500 Companies by Fortune Mag.

Posted on Jul 29, 2019

Project GitHub | LinkedIn:   Niki   Moritz   Hao-Wei   Matthew   Oren


The Fortune 500 is a well-known list published by Fortune magazine each year. It ranks U.S.-based companies by revenue for their respective fiscal years. Using revenue as the main criterion essentially shows us which of these companies are the “largest”. The companies on the list come from a variety of industries, such as insurance, retail, energy, and automobiles. Aside from revenue, the Fortune 500 list includes company assets, profits, market value, number of employees, and a few other categories. While the list may seem to be more “for show” than a useful data source, there is still a lot of meaning to explore within these numbers.

Before I get to the analysis, it is important to discuss how I scraped the data from the website. For those who are not familiar with web scraping, I essentially wrote a function that “crawls” through the parts of the website that I want to extract and exports the information to a CSV file. In this case, I took the numbers from the table on the Fortune website. The function also loops through multiple URLs in order to get the same set of data for the years 2019, 2018 and 2017. I was forced to use the Python package “selenium” rather than the more efficient “scrapy” because the website has a ‘next’ button that is clickable but does not change the URL. Clicking that button through selenium took me to the next 100 rows for each year. Selenium locates each number in the table by its XPath. Because the XPaths within each column follow a regular pattern, I could iterate through the rows and then through the separate chunks of numbers in each row. After the CSV files were created, I used pandas to convert some of the columns to integers and floats so that the graphs would work.
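The scraping loop described above can be sketched roughly as follows. This is not the project's actual code: the XPath pattern in `cell_xpath`, the `next_xpath` selector, and the page, row, and column counts are all hypothetical placeholders that would have to be replaced with values read from the real page. The functions take any Selenium-style driver, so the sketch itself does not import selenium.

```python
import csv

XPATH = "xpath"  # same string value as selenium's By.XPATH


def cell_xpath(row, col):
    """Return a hypothetical XPath for the table cell at (row, col), 1-based."""
    return f"//table/tbody/tr[{row}]/td[{col}]"


def scrape_page(driver, n_rows, n_cols):
    """Read the currently displayed page of the table as rows of cell text."""
    return [
        [driver.find_element(XPATH, cell_xpath(r, c)).text
         for c in range(1, n_cols + 1)]
        for r in range(1, n_rows + 1)
    ]


def scrape_year(driver, url, n_pages=5, n_rows=100, n_cols=8,
                next_xpath="//button[@aria-label='next']"):
    """Open one year's URL and click 'next' between pages to collect all rows.

    The defaults (5 pages of 100 rows, 8 columns) and next_xpath are
    assumptions for illustration, not values taken from the real site.
    """
    driver.get(url)
    rows = []
    for page in range(n_pages):
        rows.extend(scrape_page(driver, n_rows, n_cols))
        if page < n_pages - 1:
            # The button changes the table contents but not the URL.
            driver.find_element(XPATH, next_xpath).click()
    return rows


def save_csv(rows, header, path):
    """Write the scraped rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

With selenium installed, a driver could be created with `webdriver.Chrome()` and passed to `scrape_year` once per year's URL. A real page would most likely also need a wait after each click so that the new rows have loaded before they are read.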

Here is the link to the code
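As a rough illustration of the type conversion step, here is a minimal pandas sketch. The column names, the sample rows, and the assumption that the scraped text contains characters such as `$`, `,` and `%` are all hypothetical; the real CSV files may look different.

```python
import pandas as pd


def clean_numeric(df, int_cols, float_cols):
    """Return a copy of df with the given text columns converted to numbers."""
    out = df.copy()
    for col in int_cols + float_cols:
        # Strip characters that often appear in scraped figures (assumed here).
        text = out[col].astype(str).str.replace(r"[$,%\s]", "", regex=True)
        # Anything that still cannot be parsed becomes NaN instead of raising.
        out[col] = pd.to_numeric(text, errors="coerce")
    for col in int_cols:
        out[col] = out[col].astype("Int64")  # nullable integer dtype
    return out


# Made-up rows standing in for one scraped CSV file
raw = pd.DataFrame({
    "company": ["Company A", "Company B"],
    "revenue": ["$1,234.5", "$987.0"],
    "employees": ["2,200,000", "-"],
})
clean = clean_numeric(raw, int_cols=["employees"], float_cols=["revenue"])
print(clean.dtypes)
```

Using `errors="coerce"` keeps rows with missing figures in the table as empty values rather than stopping the conversion, which is usually what a plotting library expects.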

Since the main goal of this list is to compare the size of these companies, I decided to take a deeper look at growth and how it relates to other aspects of business. Naturally, I compared revenue % change with profit to see whether there was a correlation between the expansion of a business and its profit. Using plotly also made the graph somewhat interactive: hovering over a point identifies the company it represents. It was interesting to see that the majority of businesses were expanding and profiting simultaneously. The graph also helps identify the direction a company is heading in with regard to expansion and profitability.
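A plot of this kind could be built along the following lines. This is only a sketch under stated assumptions: the function names and the sample values are made up, and the original graph may have been produced with a different plotly interface. The `direction` helper simply labels which quadrant of the graph a company falls in.

```python
def direction(revenue_change_pct, profit):
    """Label a company by whether it is expanding and whether it is profitable."""
    growth = "expanding" if revenue_change_pct > 0 else "not expanding"
    result = "profitable" if profit > 0 else "not profitable"
    return f"{growth}, {result}"


def growth_vs_profit_figure(names, revenue_change_pct, profit):
    """Build an interactive scatter plot; hovering a point shows the company."""
    # Imported here so that direction() can be used without plotly installed.
    import plotly.graph_objects as go

    fig = go.Figure(
        go.Scatter(
            x=revenue_change_pct,
            y=profit,
            mode="markers",
            text=names,  # displayed in the hover label of each point
        )
    )
    fig.update_layout(xaxis_title="Revenue % change", yaxis_title="Profit")
    return fig


# Made-up values for illustration only
names = ["Company A", "Company B", "Company C"]
rev_change = [12.0, -3.5, 4.1]
profits = [500.0, 120.0, -80.0]
for name, change, profit in zip(names, rev_change, profits):
    print(name, "->", direction(change, profit))
```

Calling `growth_vs_profit_figure(names, rev_change, profits).show()` would open the interactive version of the plot.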

I also looked at the effect that the number of employees has on profitability and revenue. Based on these graphs, the number of employees is more directly related to revenue than to profit. The graphs also convey which companies are getting the most out of their workforce on a per-employee basis: look at the direction of each point relative to the origin (the intersection of the x and y axes), since the slope of that line reflects the per-employee ratio.
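The per-employee comparison that the graphs show visually can also be computed directly. The sketch below uses made-up figures and hypothetical field names; the units are whatever the source table uses.

```python
def per_employee(companies):
    """Add per-employee ratios and rank by revenue per employee, highest first."""
    ranked = [
        {
            **c,
            "revenue_per_employee": c["revenue"] / c["employees"],
            "profit_per_employee": c["profit"] / c["employees"],
        }
        for c in companies
    ]
    return sorted(ranked, key=lambda c: c["revenue_per_employee"], reverse=True)


# Made-up companies for illustration only
companies = [
    {"company": "Company A", "revenue": 1000.0, "profit": 100.0, "employees": 500},
    {"company": "Company B", "revenue": 600.0, "profit": 90.0, "employees": 100},
]
for c in per_employee(companies):
    print(c["company"], c["revenue_per_employee"], c["profit_per_employee"])
```

In this toy example Company B has less total revenue than Company A but ranks first, which is the kind of difference the slope from the origin makes visible on the graph.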

The next topic that I covered is market value. For those who are interested in investing, this is a key figure to look at. Market value has a lot to do with total revenues and many other measurements of a company’s finances. There is also the important factor of perception. While profits and revenues might be similar for two companies, their market values can be drastically different based on each company’s outlook or the outlook of its industry. The following graphs show the relationship between revenue and market value as well as between assets and market value.

The final part of my analysis was to look at some of the categories to see if there are any aggregate trends over time. Using bar graphs, I compared the average revenues and average assets across the three years.
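The yearly averages behind bar graphs like these could be computed roughly as follows. This is a sketch with made-up numbers; it assumes one table per year with numeric columns named `revenue` and `assets`, which are hypothetical names.

```python
import pandas as pd


def yearly_averages(frames_by_year, columns=("revenue", "assets")):
    """Stack one DataFrame per year and average the chosen columns by year."""
    combined = pd.concat(
        [df.assign(year=year) for year, df in frames_by_year.items()],
        ignore_index=True,
    )
    return combined.groupby("year")[list(columns)].mean()


# Made-up numbers standing in for the three yearly CSV files
frames = {
    2017: pd.DataFrame({"revenue": [100.0, 200.0], "assets": [400.0, 600.0]}),
    2018: pd.DataFrame({"revenue": [120.0, 220.0], "assets": [450.0, 650.0]}),
    2019: pd.DataFrame({"revenue": [130.0, 250.0], "assets": [500.0, 700.0]}),
}
averages = yearly_averages(frames)
print(averages)
```

The resulting table has one row per year, so each column can be passed straight to a bar chart, for example with `averages["revenue"].plot.bar()` if matplotlib is installed.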

[Figures: Assets by Year; Revenues by Year]
