Visualizing Fortune 500 Companies by Fortune Mag.
The Fortune 500 is a well-known list published by Fortune magazine each year. It ranks U.S.-based companies by revenue for their respective fiscal years. Using revenue as the main criterion essentially shows us which of these companies are the “largest”. The companies on the list come from a variety of industries, such as insurance, retail, energy, and automobiles. Aside from revenue, the Fortune 500 list includes company assets, profits, market value, number of employees, and a few other categories. While this list may seem to be more “for show” than a source of useful data, there is still a lot of meaning to explore within these numbers.
Before I get to the analysis, it is important to discuss how I scraped the data from the website. For those who are not familiar with web scraping, I essentially wrote a function that “crawls” through the parts of the website I want to extract and exports the information to a CSV file. In this case, I took the numbers from the table found at the following URL (https://fortune.com/fortune500/2019/search/). The function also loops through multiple URLs in order to get the same set of data for 2019, 2018, and 2017. I had to use the Python package Selenium rather than the more efficient Scrapy because the site has a “next” button that is clickable but does not change the URL. By clicking that button through Selenium, I was able to get to the next 100 rows for each year. Selenium locates each number in the table by its XPath. Because the XPaths of each column follow a regular pattern, the function iterates through the rows and then through the separate chunks of numbers within each row. After the CSV files were created, I used pandas to convert some of the columns to integers and floats so that the graphs would work.
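The scraping loop described above can be sketched roughly as follows. This is a minimal illustration and not the code from the repository: the XPath pattern, the locator for the “next” button, the number of columns, and the function names are all hypothetical placeholders that would have to be replaced after inspecting the real page.

```python
import csv
import time

# Hypothetical locator: the real button must be found by inspecting the page.
NEXT_BUTTON_XPATH = "//button[@aria-label='Next']"


def cell_xpath(row: int, col: int) -> str:
    """Return the XPath of one table cell, assuming a regular row/column pattern."""
    # Hypothetical pattern; the actual site markup may look quite different.
    return f"//table/tbody/tr[{row}]/td[{col}]"


def scrape_year(driver, url, pages=5, rows_per_page=100, n_cols=8):
    """Collect the table rows for one year, clicking 'next' between pages.

    pages=5 assumes 500 companies at 100 rows per page; n_cols is a guess.
    """
    # Imported here so cell_xpath can be used without Selenium installed.
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    driver.get(url)
    records = []
    for page in range(pages):
        time.sleep(2)  # crude pause so the table can render; an explicit wait is more robust
        for row in range(1, rows_per_page + 1):
            try:
                records.append([
                    driver.find_element(By.XPATH, cell_xpath(row, col)).text
                    for col in range(1, n_cols + 1)
                ])
            except NoSuchElementException:
                break  # fewer rows than expected on this page
        if page < pages - 1:
            # The URL does not change, so paging has to happen through a click.
            driver.find_element(By.XPATH, NEXT_BUTTON_XPATH).click()
    return records


def write_csv(records, path):
    """Write the scraped rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(records)
```

With a browser driver available, a call such as `scrape_year(webdriver.Chrome(), "https://fortune.com/fortune500/2019/search/")` followed by `write_csv(...)` would produce one file per year. Building the XPath from a row index and a column index is what makes the nested loop possible.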
Here is the link to the code: https://github.com/jdsipala/seleniumProj
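The type-conversion step can be sketched like this. Scraped values arrive as text, so they have to be parsed before they can be plotted. The column names, the raw string format (dollar signs, thousands separators, percent signs), and the sample rows are assumptions made up for illustration, not values from the actual files.

```python
import pandas as pd


def to_number(series: pd.Series) -> pd.Series:
    """Strip '$', ',' and '%' from strings and parse what is left as numbers."""
    cleaned = series.astype(str).str.replace(r"[$,%]", "", regex=True).str.strip()
    # Anything that still is not numeric (for example a '-' placeholder) becomes NaN.
    return pd.to_numeric(cleaned, errors="coerce")


# Made-up rows in an assumed raw format, only to exercise the function.
raw = pd.DataFrame({
    "company": ["A", "B", "C"],
    "revenue": ["$1,200.5", "$950", "$730.25"],
    "rev_change": ["4.0%", "-2.5%", "-"],
    "employees": ["10,000", "2,500", "800"],
})

clean = raw.assign(
    revenue=to_number(raw["revenue"]),
    rev_change=to_number(raw["rev_change"]),
    employees=to_number(raw["employees"]).astype("Int64"),  # nullable integer type
)
print(clean.dtypes)
```

Using `errors="coerce"` turns unparseable cells into missing values instead of raising, which keeps one odd cell from blocking the whole column.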
Since the main goal of this list is to compare the size of these companies, I decided to take a deeper look at growth and how it relates to other aspects of business. Naturally, I compared the percent change in revenue with profit to see whether there was a correlation between the expansion of a business and its profit. Using Plotly also made the graph somewhat interactive: hovering over a point identifies which company it represents. It was interesting to see that the majority of businesses were expanding and profiting simultaneously. The graph also helps identify the direction a company is heading in with regard to expansion and profitability.
I also looked at the effect that the number of employees has on profitability and revenue. Based on these graphs, the number of employees has a more direct relationship with revenue than with profit. These graphs can also convey which companies are getting the most out of their workforce on a per-employee basis, if you look at the direction of each point relative to the origin, where the x and y axes intersect.
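The per-employee comparison can also be made explicit by dividing revenue and profit by headcount. The sketch below uses made-up numbers and hypothetical column names. If employees are plotted on one axis and revenue or profit on the other, these ratios correspond to the slope of the line from the origin to each point.

```python
import pandas as pd

# Made-up values; column names are hypothetical.
df = pd.DataFrame({
    "company": ["A", "B", "C"],
    "revenue": [1000.0, 500.0, 300.0],
    "profit": [100.0, 80.0, -30.0],
    "employees": [200, 50, 100],
})

# Divide by headcount and rank companies by profit per employee.
per_employee = df.assign(
    revenue_per_employee=df["revenue"] / df["employees"],
    profit_per_employee=df["profit"] / df["employees"],
).sort_values("profit_per_employee", ascending=False)

print(per_employee[["company", "revenue_per_employee", "profit_per_employee"]])
```

In this toy example the smallest employer by headcount, company B, ranks first on both ratios even though its total revenue is only half of company A's.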
The next topic that I covered is market value. For those who are interested in investing, this is a key figure to look at. Market value has a lot to do with total revenue and many other measures of a company’s finances. Perception is also an important factor. While profits and revenues might be similar for two companies, their market values can be drastically different depending on the company’s outlook or the outlook of its industry. The following graphs show the relationship between revenue and market value, as well as between assets and market value.
The final part of my analysis was to look at some of the categories to see whether there are any aggregate trends over time. Using a bar graph, I compared average revenues and average assets across the three years.
[Figures: Assets by Year; Revenues by Year]
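The yearly averages behind bar charts like these can be computed by stacking the per-year tables and grouping by year. This is a sketch with made-up numbers and hypothetical column names. In practice each frame would be read from the corresponding CSV file.

```python
import pandas as pd

# Made-up per-year tables standing in for the three scraped CSV files.
frames = {
    2017: pd.DataFrame({"revenue": [100.0, 200.0], "assets": [400.0, 600.0]}),
    2018: pd.DataFrame({"revenue": [120.0, 220.0], "assets": [450.0, 650.0]}),
    2019: pd.DataFrame({"revenue": [130.0, 250.0], "assets": [500.0, 700.0]}),
}

# Tag each table with its year, then stack them into one frame.
combined = pd.concat(
    [frame.assign(year=year) for year, frame in frames.items()],
    ignore_index=True,
)

# One row per year, holding the mean of each column.
yearly = combined.groupby("year")[["revenue", "assets"]].mean()
print(yearly)
```

The resulting `yearly` table has one row per year and can be handed directly to a bar-chart function, with one chart per column.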