Web Scraping USAID for non-country-specific Recipients

Lakshmi Prabha Sudharsanom
Posted on Feb 17, 2018

Introduction

This post covers the second of the four projects required by the NYC Data Science Academy Data Science Bootcamp program: a web scraping project.

About USAID

USAID - United States Agency for International Development.

Who they are:

USAID is the lead U.S. Government agency that works to end extreme global poverty and enable resilient, democratic societies to realize their potential. Former President John F. Kennedy created USAID by executive order in 1961.

Assistance to Foreign Countries:

U.S. foreign assistance has always had the twofold purpose of furthering America's interests while improving lives in the developing world. USAID carries out U.S. foreign policy by promoting broad-scale human progress while it expands stable, free societies, creates markets and trade partners for the United States, and fosters good will abroad.

What they do:

Spending less than 1 percent of the total federal budget, USAID works in over 100 countries to:

Data Source

The foreign assistance data can be found on the Foreign Aid Explorer, where it can be viewed by recipient country, by sector, and by federal agency. The complete data, containing about 900,000 observations, is available for download from the website. However, as this is a web scraping project, I scraped the part of the data required for my analysis from the query page.

About my Web Scraping Project:

The actual content of the web page is loaded by a heavy JavaScript implementation: the HTML structure and the URL stay the same for every page, no matter how you navigate or filter. I therefore used Python with Selenium and Beautiful Soup to scrape it. The page took at least 45 seconds to load completely, so I set the initial idle time to 90 seconds. Within this time, I selected the required regions, set the transaction type to 'Disbursements' and applied the filters. This returned about 8,700 rows (observations). I set 100 rows per page, which resulted in 87 pages.

The code pauses for about 1 to 1.5 minutes on each page, to avoid overloading the server and getting banned from it. A "Next" button at the bottom of each page leads to the next page, and I wrote code to click it automatically, without manual intervention. One problem I encountered was that the "Next" button still worked on the last page, so the script would scrape the last page repeatedly if left unattended. To handle this, I changed the code so that it moves on to the next page only if the current page is not the last page; otherwise, it closes the browser.
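The last-page guard described above can be sketched independently of the browser driver. This is a minimal sketch, not the original script: `scrape_all_pages` and its three callback parameters are hypothetical names, and in a Selenium script the callbacks would wrap whatever driver calls read the table, detect the last page and click "Next".

```python
import time
from typing import Callable, List


def scrape_all_pages(
    read_rows: Callable[[], List[list]],
    is_last_page: Callable[[], bool],
    go_to_next_page: Callable[[], None],
    pause_seconds: float = 60.0,
    max_pages: int = 1000,
) -> List[list]:
    """Collect rows page by page and stop once the last page has been read.

    The three callbacks are hypothetical hooks: they stand in for the code
    that parses the current table, checks for the final page and clicks the
    "Next" button.
    """
    all_rows: List[list] = []
    for _ in range(max_pages):            # hard cap as a second safety net
        all_rows.extend(read_rows())      # scrape the page currently shown
        if is_last_page():                # "Next" may still be clickable here
            break
        go_to_next_page()
        time.sleep(pause_seconds)         # pause so the server is not overloaded
    return all_rows
```

Checking for the last page before clicking "Next", rather than relying on the button disappearing, is what prevents the final page from being collected more than once.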

The 87 pages were scraped continuously for about 1.5 hours, without manual intervention. Each column was collected in a list during scraping and, once scraping was over, I converted the lists into a data frame. Then I set the transaction type to 'Obligations' and used the same technique to scrape all 84 pages, which also took about 1.5 hours, and saved the result in another data frame. As mentioned before, the complete data has about 900,000 observations, which corresponds to about 9,000 pages at 100 rows per page. At roughly one minute per page, scraping all of it would have taken about 9,000 minutes, or a little over six full days. Because of this time constraint, I scraped only a part of the data.
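The time estimate follows from simple arithmetic, which can be checked as below. The figures come from the post; the one-minute pause is an assumption taken from the lower end of the 1 to 1.5 minute range.

```python
import math

total_rows = 900_000      # approximate size of the complete data set
rows_per_page = 100       # page size chosen on the query page
minutes_per_page = 1.0    # assumed: lower end of the 1 to 1.5 minute pause

pages = math.ceil(total_rows / rows_per_page)
total_minutes = pages * minutes_per_page
total_days = total_minutes / (60 * 24)

print(pages, total_minutes, total_days)  # 9000 9000.0 6.25
```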

Combining and Cleaning:

Before converting to CSV, I had to do some cleaning. I continued coding in Python. Before starting the cleaning, I concatenated the two data frames into a single data frame, including a new column, 'Transaction_type'. The resulting data frame had about 17,200 observations and eight columns.
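The combining step can be illustrated as follows. With pandas this would typically be a `pd.concat` of the two frames after adding the new column; the sketch below uses plain lists of dictionaries to show the same idea without dependencies. The function name `tag_and_combine` and the sample rows are hypothetical; only the 'Transaction_type' column name comes from the post.

```python
from typing import Dict, List

Row = Dict[str, str]


def tag_and_combine(disbursements: List[Row], obligations: List[Row]) -> List[Row]:
    """Return one table in which every row carries a 'Transaction_type' value."""
    combined: List[Row] = []
    for label, rows in (("Disbursements", disbursements), ("Obligations", obligations)):
        for row in rows:
            # Copy each row so the input tables are left unchanged.
            combined.append({**row, "Transaction_type": label})
    return combined


# Made-up sample rows, for illustration only.
disbursed = [{"Region": "Europe Region", "Current_amount": "$1,000"}]
obligated = [{"Region": "Europe Region", "Current_amount": "$1,500"}]
combined = tag_and_combine(disbursed, obligated)
```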

The 'Current_amount' column contained dollar symbols and commas. First, I removed these symbols and converted the column to numeric. There were also some negative amounts (de-obligations). Since de-obligations are stated to serve another purpose and are not to be counted in the total, I set all negative amounts to zero. With the cleaning done, I wrote the resulting data frame to a CSV file in my local directory.
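A minimal sketch of this cleaning step, using only the standard library rather than a data frame. The function names are hypothetical, and the assumption that negative amounts appear with a leading minus sign (for example '-$1,200') is mine, not something stated in the post.

```python
import csv
import io
from typing import Dict, List


def clean_amount(raw: str) -> float:
    """Strip '$' and ',' from an amount string and clamp negatives to zero.

    Assumes negatives carry a leading minus sign, e.g. '-$1,200'.
    """
    value = float(raw.replace("$", "").replace(",", "").strip())
    return max(value, 0.0)


def rows_to_csv(rows: List[Dict[str, object]], fieldnames: List[str]) -> str:
    """Serialize the cleaned rows to CSV text (write it to a file as needed)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample values, for illustration only.
raw_rows = [
    {"Region": "Europe Region", "Current_amount": "$1,234.50"},
    {"Region": "Europe Region", "Current_amount": "-$1,200"},
]
cleaned = [{**r, "Current_amount": clean_amount(r["Current_amount"])} for r in raw_rows]
csv_text = rows_to_csv(cleaned, ["Region", "Current_amount"])
```

Clamping with `max(value, 0.0)` keeps the de-obligation rows in the table while removing their effect on totals.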

As I wanted to eventually visualize the whole data in an interactive app, I decided to build the app on the scraped portion first for this project and update it later. I switched to R and Shiny Dashboard to visualize the data and develop the app.

Glossary

Activity:

An aid 'activity' can be a project, a program, cash transfer, delivery of goods, a training course, a research project, a debt relief operation, or a contribution to an international organization.

Financial Flow:

An aid 'financial flow' most commonly takes the form of an 'obligation' (a binding agreement, based on budgeted resources, which will result in outlays) or a 'disbursement' (an amount paid by federal agencies, by cash or cash equivalent, to settle government obligations).

Fiscal Year:

An accounting period of 365 days (366 in leap years), but not necessarily starting on January 1. The fiscal year of the United States Government begins on October 1 and ends on September 30 and is designated by the calendar year in which it ends for global programs.
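The fiscal-year rule can be expressed as a small helper. This is an illustrative sketch; the function name `us_fiscal_year` is hypothetical.

```python
from datetime import date


def us_fiscal_year(day: date) -> int:
    """Return the U.S. federal fiscal year that contains the given date."""
    # October to December belong to the fiscal year ending the next September,
    # because the fiscal year is named after the calendar year in which it ends.
    return day.year + 1 if day.month >= 10 else day.year


print(us_fiscal_year(date(2016, 10, 1)))  # 2017
print(us_fiscal_year(date(2017, 9, 30)))  # 2017
```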

Implementing Agency:

The agency which actually obligates and disburses the U.S. Foreign assistance, either directly or via an implementing partner entity. The implementing agency for a foreign aid activity may or may not differ from the appropriated (funding) agency.

Region:

This is the specific target beneficiary of U.S. Foreign assistance. A 'country' listed on the Foreign Aid Explorer website is either an entity named and classified as an "Independent State" by the U.S. Department of State, or a geographic regional recipient with the suffix "Region".

The Foreign Aid Explorer website uses the term 'region' in two different ways:

  1. To designate assistance to a non-country-specific recipient, and
  2. To summarize assistance by geographic area.

In the 'Data Query' section, foreign aid benefiting multiple countries may be assigned to either a 'Region' recipient or to 'World'.

I have employed the first case of 'region', i.e., non-country-specific recipients, for this project.

Literature Review

Wikipedia:

  • Foreign aid is generally unpopular with the public: a 2017 poll found that 57% favor a cut and 6% want increased aid.
  • Most Americans overestimate foreign aid as a share of the total federal budget.
  • In the past, less than 1% of the national budget went to foreign assistance.
  • As of fiscal year 2017, foreign aid between the U.S. State Department and USAID totaled $50.1 billion, or just over 1% of the budget.

This survey suggests that many people are not well aware of what USAID does. This app may play a small part in explaining what is actually happening.

When I read the term 'non-country-specific recipients', I had many questions running through my mind:

  • Who are the non-country-specific recipients?
  • Why is it important to allocate funds to them separately, even after allocating billions to independent countries?
  • What are the important activities?
  • What are the top sectors?
  • What are the top implementing agencies?
  • How are the funds distributed over the course of seventeen years?

These were my questions of interest. I scraped seventeen years of data, from 2001 to 2017. The website notes that the data for fiscal year 2017 is incomplete; however, the budget had already been obligated, so I included that year as well. To answer my first three questions, I started my analysis with the 'Activity' column, using the dplyr package in R.

Regions and Activities

The regions (non-country-specific recipients) considered for this project are:

  1. East Asia and Oceania Region
  2. Eastern Africa Region
  3. Europe Region
  4. Middle East Region
  5. North Africa Region
  6. South and Central Asia Region
  7. Southern Africa Region
  8. West Africa Region
  9. Western Hemisphere Region

Observations for Activities:

I mention below some of the common activities which I observed in all the regions:

Common Activities:

  • Countering Violent Extremism - Govt. and Civil Society Sector
  • Counter-terrorism Finance - Govt. and Civil Society Sector
  • International Narcotics & Law Enforcement: Combating Wildlife Trafficking
  • Counter-terrorism Anti-Terrorism Assistance- Govt. and Civil Society Sector
  • Department of Energy, Safeguards Engagement

But there are some highlight programs, which made me raise my eyebrows:

Highlights:

Eastern Africa:

Middle East:

  • Improving Water sanitation Project
  • Madame President Project

East Asia and Oceania:

  • BOOST Skill Training to Advance the Careers of Engineers and Scientists in South and East Asia - Other Multisector

North Africa:

  • Advancing Careers of Women in Science: Empowering Women in North Africa -  Government and Civil Society.

Similar kinds of programs are conducted in independent countries. As I understand it, funds are allocated separately so that such special programs can also be conducted in the specific regions. This gives an opportunity to people who might have missed such programs in independent countries.

I then turned to my last three questions, using R with the ggplot2, plotly and Shiny packages for analysis and visualization.

Visualization in Shiny

I did the visualization of the data in Shiny, which is an open source R package that provides an elegant and powerful web framework for building web applications using R. Shiny helps you turn your analyses into interactive web applications. Furthermore, I used Shiny Dashboard to create a management dashboard-like application.

Start

I grouped the aid into two categories, one by regions (non-country-specific recipients) and the other by sectors.

Start

Let us discuss them in detail.

Aid by Region Tab

This tab shows how much money (in USD) has been obligated and disbursed for each region from 2001 to 2017. Two select boxes are available, one to select the region and the other to select the year.

Aid By Region

 

First, line charts are used to display the disbursements for seventeen years from 2001 to 2017 to the selected region. The green dot represents the amount for the selected year. In this case, one more check box is available for users to interact with. If the check box is selected, the line graph for the obligations will also be displayed, which enables users to check if the amounts have been disbursed as per the plan (obligations) or not.

Second, bar charts display the top sectors in which more than $100,000 has been disbursed to the selected region. Three years are shown: the selected year, plus 2011 and 2016 for comparison. If no data is available for a particular year, the bar chart for that year is not displayed.
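The filtering behind these bar charts can be sketched as follows. The app itself was built in R, so this Python version only illustrates the logic and is not the original code: the function names, the record layout and the sample records are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (region, fiscal_year, sector, disbursed_usd)
Record = Tuple[str, int, str, float]


def top_sectors(
    records: Iterable[Record], region: str, year: int, threshold: float = 100_000.0
) -> List[Tuple[str, float]]:
    """Total disbursements per sector, keeping sectors above the threshold."""
    totals: Dict[str, float] = defaultdict(float)
    for rec_region, rec_year, sector, amount in records:
        if rec_region == region and rec_year == year:
            totals[sector] += amount
    ranked = [(sector, total) for sector, total in totals.items() if total > threshold]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def sector_panels(
    records: List[Record], region: str, selected_year: int,
    comparison_years: Tuple[int, ...] = (2011, 2016),
) -> Dict[int, List[Tuple[str, float]]]:
    """Rankings for the selected year plus the two fixed comparison years."""
    panels: Dict[int, List[Tuple[str, float]]] = {}
    for year in (selected_year, *comparison_years):
        ranking = top_sectors(records, region, year)
        if ranking:  # years with nothing above the threshold get no chart
            panels[year] = ranking
    return panels


# Made-up sample records, for illustration only.
sample = [
    ("West Africa Region", 2016, "Agriculture", 150_000.0),
    ("West Africa Region", 2016, "Agriculture", 50_000.0),
    ("West Africa Region", 2016, "Basic Health", 120_000.0),
    ("West Africa Region", 2016, "Banking", 5_000.0),
]
panels = sector_panels(sample, "West Africa Region", 2015)
```

The same pattern applies to the implementing-agency charts, with the agency taking the place of the sector.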

Top Sectors - West Africa

 

Again, bar charts display the top implementing agencies that have disbursed more than $100,000 to the selected region. These charts also cover three years, as discussed above.

Top Implementing Agencies - West Africa

 

 

Observations of Region-wise Disbursements in 2016:

Region-wise Toppers in 2016

| Region | Disbursements (in USD) | Top Sectors | Top Implementing Agencies |
|---|---|---|---|
| West Africa | 197 M | Agriculture, Basic Health, Maternal and Child Care | USAID, Dept. of Defense, Dept. of State |
| Eastern Africa | 147 M | Conflict, Agriculture, Environment | Dept. of State, USAID |
| Europe | 107 M | Emergency, Conflict, Government and Civil Society | Dept. of State, Dept. of Defense |
| South Africa | 25 M | Emergency, Environment, HIV/AIDS | USAID, Dept. of State |
| East Asia and Oceania | 16 M | Government and Civil Society | Dept. of State |
| Middle East | 6.8 M | Energy, Water Supply, Trade Policy | Dept. of Energy |
| North Africa | 2 M | Emergency, Government and Civil Society | Dept. of State |
| South and Central Asia | 1.8 M | Government and Civil Society, Energy, Trade Policy | Dept. of State, Dept. of Energy, Dept. of Commerce |
| Western Hemisphere | 1.6 M | Government and Civil Society, Conflict and Trade Policy | Dept. of State, Homeland Security |

 

Observations from Comparison of Region-wise Obligations and Disbursements Line Graphs for seventeen years:

  • The funds have been disbursed as planned (obligated) in almost all years to all the regions, except Europe, North Africa, South and Central Asia and Western Hemisphere.
  • In Europe, more funds have been disbursed than planned in most of the years, which is an interesting finding.
  • In the other exceptions (North Africa, South and Central Asia and Western Hemisphere), both graphs fluctuate a lot.

Line Graphs - Regions

 

Aid by Sector Tab

This tab shows how much money (in USD) has been obligated and disbursed in each sector from 2001 to 2017. Two select boxes are available here as well, one to select the sector and the other to select the year.

Aid by Sector

 

First, line charts display the disbursements in the selected sector for the seventeen years from 2001 to 2017. The amount for the selected year is highlighted. Here too, a check box is available to display the line graph for the obligations, so that users can check whether the amounts have been disbursed as planned (obligated) or not.

Second, bar charts display the top regions to which more than $100,000 has been disbursed in the selected sector. Three years are shown: the selected year, plus 2011 and 2016 for comparison. If no data is available for a particular year, the bar chart for that year is not displayed.

Top Regions - Agriculture

 

Again, bar charts show the top implementing agencies that have disbursed more than $100,000 in the selected sector. These charts also cover three years, as discussed above.

 

Observations of Sector-wise Disbursements:

Sector-wise Toppers in 2016

| Sector | Disbursements (in USD) | Top Regions |
|---|---|---|
| Conflict, Peace and Security | 90 M | East and West Africa |
| Emergency | 88 M | Europe |
| Agriculture | 68 M | West, East Africa |
| Basic Health | 56 M | West Africa |
| Govt. and Civil Society | 45 M | East Asia and Oceania, Europe, East Africa |
| Water supply | 45 M | West, South, East Africa and Middle East |
| Other Multi-sectors | 30 M | Europe |
| Environment | 26 M | West, East and South Africa |
| Trade | 22 M | West Africa and all |
| Maternal and Child care | 17 M | West and East Africa |
| HIV | 11 M | West, South and East Africa |
| Post-sec Education | 9 M | West, East and South Africa |
| Disaster management | 8 M | West and South Africa |
| Food | 8 M | East, West and South Africa |
| Admin | 6.8 M | East and South Africa |
| Energy | 5.4 M | Middle East, South and Central Asia and all |
| Business | 1.3 M | Europe, Middle East |
| Banking | 500,000 | West Africa |
| Communications | 3000 | West Africa |

 

 

Sector-wise Disbursements from 2001 to 2017:

The following sectors show an increasing trend:

  • Agriculture
  • Conflict, Peace and Security
  • Environmental
  • Govt. and Civil Society
  • Maternal and Child care
  • Other multi-sectors
  • Post-sec education
  • Trade policy
  • Water Supply

The following sectors display a decreasing trend:

  • Banking (peak in 2002)
  • Business (peak in 2004)
  • Food (peak in 2005)
  • Emergency (peak in 2005)
  • HIV/AIDS (peak in 2003)

An oscillating trend has been observed in other sectors.

Line Graphs - Sectors

 

Decreasing trends are sometimes good and increasing trends are sometimes alarming.

  • In the HIV/AIDS sector, obligations have also decreased, which may be good news: it suggests that the disease has been brought under control in the specific regions.
  • The increasing trend in the ‘Water supply’ sector is alarming. It points to the scarcity of water and the need to preserve water in the specific regions.
  • In the ‘Banking’ sector, obligations are almost always lower than disbursements, which is interesting.

“The purpose of foreign aid is to end the need for its existence” (Mark Green, USAID Administrator). I too hope for the best.

Conclusion

  • The Africa Regions receive funding in every sector.
  • The largest funding for the other regions is in:
    • South and Central Asia Region: Energy
    • Middle East Region: Energy, Water and Trade Policy
    • Europe Region: Emergency and Business
    • East Asia and Oceania: Government and Civil Society
    • Western Hemisphere: Conflict and Trade Policy
  • The sectors in which the Africa regions do not rank first are Business, Energy, Emergency Response and Other Multi-sectors.
  • The highest overall number of transactions was recorded in 2015. Major sectors received increased amounts during 2009-2016, i.e. during former President Obama's term; literature is available that explains how he improved many sectors, such as Agriculture.

Overall

This puts forth the following future directions:

Future Directions

The FY 2019 President’s Budget for the State Department and USAID is $39.3 billion.

A trend analysis could help predict whether, during President Trump's term:

  • the decrease in funds turns out to be a good sign,
  • the amounts disbursed to the regions increase or decrease in important sectors, and
  • any unexpected disbursements are made to new sectors, such as trade policy or business-related sectors.

The data set provided a great way to apply Selenium and Beautiful Soup. It was an unforgettable experience to work on this web scraping project. Thanks to NYC Data Science Academy for providing me a wonderful opportunity to work with Selenium, Beautiful Soup, Python, ggplot2, plotly and Shiny Dashboard.

Code and data can be found on GitHub, while the app itself is online at shinyapps.io.

 

About Author

Lakshmi Prabha Sudharsanom

Lakshmi Prabha Sudharsanom is a life-long learner. She was a post doc, funded by Department of Atomic Energy, India. She had been spending time in writing graph algorithms for social network applications and analyzing data using MS-Excel. When...