Visualizing the annual US Federal Budget since 1962

Posted on Jul 24, 2015

To explore the R Shiny App from a new window, please click here.

Eszter D. Schoell, PhD

Published July 24, 2015

Politics is highly polarizing in the United States, and the federal budget is a major topic of debate. Large family gatherings with relatives from all over the US can therefore become very stressful if politics is brought up, especially in families spread across the North and South. The stress comes not only from arguing with relatives, but also from the fact that, at some point, the discussion no longer has an objective basis: the political party a person endorses tends to determine which secondary sources that person uses.

I therefore wanted to create interactive visualizations of the US federal budget directly from the primary source (the White House on GitHub). My goal was to provide a tool that people in heated arguments could use to judge for themselves what they are really arguing about. I am quite looking forward to using it on July 4th, 2016.

Using Shiny in R, I created an app with three tabs that display government income ('Receipts'), government expenditure ('Outlays') and what Congress approved to spend each year ('Budget Authority'). [For more detailed information, please see federal budget process and federal budget basics.] The proportional breakdown of the total monies for each year is displayed using a treemap. For example, in the header image at the beginning of the blog, you can see that in 2015 a little more than 1/3 of government income came from individual income taxes. By selecting different years, you can see how the proportions change. The same can be done on the 'Outlays' and 'Budget Authority' tabs.

To better visualize how the budget has changed over the years, the user can then select one or more specific cells from the treemap (Figure 1) and plot them on a timeline (Figure 2). In this example, individual income taxes have increased more steeply than the contribution from corporate income taxes. It would be interesting to also include the change in the size of the working population and in the number of corporations.


Figure 1


Figure 2

International Disaster Assistance is another interesting point. Looking at Figure 3, it would seem there has been a huge increase in the US contribution since 1962. However, absolute numbers are hard to interpret. It is easier to ask: Is international disaster assistance a larger part of the budget than other items? Figure 4 puts it in perspective and leads to the question: What is the Rail Industry Pension Fund and why is this not debated?


Figure 3


Figure 4

I now turn to some of the challenges I faced in developing this visualization; describing them helps clarify what is being visualized.

Challenge One: Working with treemaps.

a) How to present negative values?

In order to clearly show proportions, I used treemaps. Treemaps use area (size) and color to encode values. Since area cannot be negative, but budgets have negative values, I encoded the amount of money using its absolute value. I then used color to indicate the sign: green for positive and red for negative values, with darker shades indicating greater distance from zero. With this form of visualization, you can explore questions such as: How does the proportion of total money spent on the Department of Defense in 1962 compare to 2014? In 1962, it was about one fourth of the budget and in 2014, about one eighth. Note that the specific amount does not matter in this representation, only its proportion relative to the rest.
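The encoding described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the toy data, the column names (agency, amount) and the helper names are hypothetical, and the plotting function assumes the treemap package, whose type = "value" mode maps a signed numeric column onto a diverging palette centered on zero.

```r
# Hypothetical toy data with one negative amount. Names and numbers are
# made up for illustration and do not come from the budget data.
budget <- data.frame(
  agency = c("Defense", "Health", "Treasury", "Offsetting receipts"),
  amount = c(600, 1000, 500, -100),
  stringsAsFactors = FALSE
)

# Area cannot be negative, so the size is the absolute value;
# the signed amount is kept and used only for color.
prepare_for_treemap <- function(df) {
  df$size  <- abs(df$amount)
  df$share <- df$size / sum(df$size)  # proportion of the total area
  df
}

tm_data <- prepare_for_treemap(budget)

# Assumes the 'treemap' package is installed. With type = "value", vColor is
# mapped to a diverging palette (here red for negative, green for positive).
plot_budget_treemap <- function(df) {
  treemap::treemap(
    df,
    index   = "agency",
    vSize   = "size",
    vColor  = "amount",
    type    = "value",
    palette = "RdYlGn"
  )
}

# plot_budget_treemap(tm_data)  # uncomment to draw the treemap
```

Keeping the signed amount next to the absolute size means the same data frame drives both the area and the color, so the two encodings cannot drift apart.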

b) How to show nested-ness?

Each category can be further broken down into subcategories. For example, the money given to the 'Department of Health and Human Services' includes the Centers for Medicare and Medicaid Services, Administration for Children and Families, and the National Institutes of Health, among others. In addition, this is a way to visualize how the departments are split into mandatory and discretionary spending (Figure 5). To accomplish this, I used the multi-select option of the selectInput function and passed the selection to the treemap function.
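One way to wire a multi-select input to a treemap is sketched below. It is an illustration under assumptions rather than the app's code: the column names (department, bureau, spending_type), the input id levels and the helper aggregate_by_levels are hypothetical. The helper mimics what passing several columns as index to treemap does, namely summing amounts over each combination of the chosen levels.

```r
# Hypothetical toy data; names and numbers are made up for illustration.
budget <- data.frame(
  department    = c("HHS", "HHS", "HHS", "Defense"),
  bureau        = c("CMS", "NIH", "ACF", "Army"),
  spending_type = c("Mandatory", "Discretionary", "Mandatory", "Discretionary"),
  amount        = c(900, 30, 50, 150),
  stringsAsFactors = FALSE
)

# Sum amounts over every combination of the selected nesting levels.
aggregate_by_levels <- function(df, levels) {
  aggregate(df["amount"], by = df[levels], FUN = sum)
}

by_dept      <- aggregate_by_levels(budget, "department")
by_dept_type <- aggregate_by_levels(budget, c("department", "spending_type"))

# Shiny wiring (assumes the 'shiny' and 'treemap' packages are installed).
# The order in which levels are selected becomes the nesting order.
if (requireNamespace("shiny", quietly = TRUE)) {
  ui <- shiny::fluidPage(
    shiny::selectInput("levels", "Break down by:",
                       choices  = c("department", "bureau", "spending_type"),
                       selected = "department",
                       multiple = TRUE),
    shiny::plotOutput("tree")
  )
  server <- function(input, output) {
    output$tree <- shiny::renderPlot({
      shiny::req(input$levels)  # wait until at least one level is chosen
      treemap::treemap(budget, index = input$levels, vSize = "amount")
    })
  }
  # shiny::shinyApp(ui, server)  # uncomment to launch the app
}
```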


Figure 5

Challenge Two: Working with money over time.

a) How to adjust for inflation?

Looking at the treemaps for each year, it becomes obvious that proportions change over time. To visualize this change, a line graph of amount of money versus year seemed the simplest choice. The first obstacle was adjusting for inflation. I wrote a helper function to convert amounts to 2015 US dollars (using the US inflation rate per year); this adjustment can be applied by clicking a checkbox.

# Adjusting for inflation so that each year is shown in 2015 USD amounts.
i_rate = read.csv('data/inflation.csv', colClasses='numeric')
i_rate$add1 = i_rate$Inflation.Rate + 1
# Running product of the yearly factors (equivalent to cumprod(x)).
mult = function(x) {
  for(i in seq_len(length(x)-1)) {
    x[i+1] = x[i+1]*x[i]
  }
  return(x)
}
i_rate$adjuster = mult(i_rate$add1)
# Placeholder rows for the years after 2015, with an adjuster of 1 (no adjustment).
new = data.frame(c(2020,2019,2018,2017,2016),c(0,0,0,0,0), c(1,1,1,1,1), c(1,1,1,1,1))
names(new) = names(i_rate)
i_rate = rbind(new,i_rate)
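
To show how such a table might be applied to the amounts, here is a small sketch with made-up inflation rates. The column name Year, the helper to_2015_usd, the newest-first row order and the numbers are all assumptions for illustration; the real values come from data/inflation.csv.

```r
# Made-up rates, newest year first (assumed layout of the inflation table).
# Assumption: each year's rate is the price growth from that year to the next,
# and the 2015 row has a rate of 0 because amounts are expressed in 2015 USD.
i_rate <- data.frame(
  Year           = c(2015, 2014, 2013),
  Inflation.Rate = c(0, 0.02, 0.01)
)
i_rate$adjuster <- cumprod(i_rate$Inflation.Rate + 1)

# Hypothetical helper: look up each year's adjuster and scale the amount.
# 'adjust' plays the role of the checkbox; FALSE returns the raw amounts.
to_2015_usd <- function(amount, year, rates, adjust = TRUE) {
  if (!adjust) {
    return(amount)
  }
  amount * rates$adjuster[match(year, rates$Year)]
}

adjusted <- to_2015_usd(c(50, 100, 100), c(2015, 2014, 2013), i_rate)
```

With these toy rates, 100 dollars from 2013 are scaled by 1.02 * 1.01, while amounts from 2015 are left unchanged.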

b) How to present the user with a subset of options based on a previous choice?

The raw data has several columns that can be seen as deepening levels; these are the nested categories outlined in Challenge One (b). For the change in amount over time, I wanted to limit the options to those in the current treemap to improve clarity. I therefore used an observer. The observer, in combination with an if condition checking for the existence of a selection, limits the options of the graph to the parent category in the treemap.
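
The observer pattern described above might look roughly like this. It is a sketch under assumptions, not the app's code: the toy data, the input ids parent and series, and the helper children_of are hypothetical. It relies on the Shiny functions observe and updateSelectInput.

```r
# Hypothetical two-level toy data; names are made up for illustration.
budget <- data.frame(
  department = c("HHS", "HHS", "Defense", "Defense"),
  bureau     = c("CMS", "NIH", "Army", "Navy"),
  stringsAsFactors = FALSE
)

# Subcategories that belong to the currently selected parent category.
children_of <- function(df, parent) {
  sort(unique(df$bureau[df$department %in% parent]))
}

# Shiny wiring (assumes the 'shiny' package is installed).
if (requireNamespace("shiny", quietly = TRUE)) {
  ui <- shiny::fluidPage(
    shiny::selectInput("parent", "Treemap category:",
                       choices = unique(budget$department)),
    shiny::selectInput("series", "Plot over time:",
                       choices = character(0), multiple = TRUE)
  )
  server <- function(input, output, session) {
    # Re-runs whenever input$parent changes; the if condition skips the
    # update until a parent category has actually been selected.
    shiny::observe({
      if (!is.null(input$parent) && nzchar(input$parent)) {
        shiny::updateSelectInput(
          session, "series",
          choices = children_of(budget, input$parent)
        )
      }
    })
  }
  # shiny::shinyApp(ui, server)  # uncomment to launch the app
}
```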

c) How to be able to pick one or more categories to graph?

Another challenge was allowing several options to be graphed at once to look at relationships and thereby understand scale. For example, the categories 'International.Assistance.Programs' and 'Department.of.Labor' were equally funded in 1962. Graphing 'International.Assistance.Programs' alone shows an increase of 20% in US spending from 1962 to 2015. However, plotting 'International.Assistance.Programs' together with the 'Department.of.Labor' shows that, although both started at the same proportion of the budget in 1962, the 'Department.of.Labor' has had a much steeper increase in expenditure.
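
Plotting several selected categories on one set of axes can be sketched in base R as follows. The amounts are made up purely for illustration, and the column names and the helpers to_wide and plot_categories are hypothetical rather than taken from the app.

```r
# Made-up amounts: both categories start equal in 1962, then diverge.
outlays <- data.frame(
  year     = rep(c(1962, 1990, 2015), times = 2),
  category = rep(c("International.Assistance.Programs",
                   "Department.of.Labor"), each = 3),
  amount   = c(10, 11, 12, 10, 40, 80),
  stringsAsFactors = FALSE
)

# Reshape to a year-by-category matrix containing only the selected categories,
# with columns in the order they were selected.
to_wide <- function(df, categories) {
  df   <- df[df$category %in% categories, ]
  wide <- tapply(df$amount, list(df$year, df$category), sum)
  wide[, categories, drop = FALSE]
}

selected <- c("International.Assistance.Programs", "Department.of.Labor")
wide <- to_wide(outlays, selected)

# One line per selected category on shared axes, so scale can be compared.
plot_categories <- function(wide) {
  years <- as.numeric(rownames(wide))
  cols  <- seq_len(ncol(wide))
  matplot(years, wide, type = "l", lty = 1, col = cols,
          xlab = "Year", ylab = "Amount")
  legend("topleft", legend = colnames(wide), lty = 1, col = cols)
}

# plot_categories(wide)  # uncomment to draw the line graph
```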


Figure 6


Figure 7

Future improvements

1. Adding an option to output specific categories and dollar amounts for further analysis.

2. Being able to zoom into the treemap to see labels of areas that are too small for the whole plot.

3. Selecting one subcategory and scoping down so it becomes the new treemap.

4. Providing additional data for context: for example, adding the working population per year to the graph of 'Individual.Income.Taxes'.

Special thanks to the Office of Management and Budget for tracking and providing data.
