Web Scraping USAID for non-country-specific Recipients
This post is about the second of the four projects we are supposed to deliver at the NYC Data Science Academy Data Science Bootcamp program. This is a web scraping project.
USAID - United States Agency for International Development.
Who they are:
USAID is the lead U.S. Government agency that works to end extreme global poverty and enable resilient, democratic societies to realize their potential. President John F. Kennedy created USAID by executive order in 1961.
Assistance to Foreign Countries:
U.S. foreign assistance has always had the twofold purpose of furthering America's interests while improving lives in the developing world. USAID carries out U.S. foreign policy by promoting broad-scale human progress at the same time as it expands stable, free societies, creates markets and trade partners for the United States, and fosters good will abroad.
What they do:
Spending less than 1 percent of the total federal budget, USAID works in over 100 countries to:
- Promote broadly shared economic prosperity;
- Strengthen democracy and good governance;
- Protect human rights;
- Improve global health;
- Advance food security and agriculture;
- Improve environmental sustainability;
- Further education;
- Help societies prevent and recover from conflicts; and
- Provide humanitarian assistance in the wake of natural and man-made disasters.
The foreign assistance data can be found on the Foreign Aid Explorer, where it can be viewed by recipient country, by sector, and by federal agency. The complete data, containing about 900,000 observations, is available for download from the website. However, as this is a web scraping project, I scraped only the part of the data required for my analysis from the query page.
About my Web Scraping Project:
The eighty-seven pages were scraped continuously for 1.5 hours, without manual interruption. Each column was saved to its own list as scraping progressed, and once scraping was over, I converted the lists into a data frame. Then I set the transaction type to 'Obligations' and used the same technique to scrape all 84 pages, which also took about 1.5 hours. The result was saved in another data frame. As mentioned before, the complete data contains about 900,000 observations, which corresponds to about 9,000 pages. At roughly one minute per page, scraping everything would have taken about 9,000 minutes, or more than six full days. Because of this time constraint, I scraped only a part of the data.
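The column-per-list approach described above can be sketched roughly as follows. This is a minimal illustration and not the code used for the project: it parses small made-up HTML tables with the standard library's `html.parser` in place of Beautiful Soup so that it runs without extra installs, and the column names and sample rows are assumptions. In the project itself, Selenium would supply the page source of each result page.

```python
from html.parser import HTMLParser

# Hypothetical column names; the real query page may label them differently.
COLUMNS = ["Region", "Activity", "Fiscal_year", "Current_amount"]


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # skip header rows that contain only <th> cells
                self.rows.append(self._row)
            self._row = None


def scrape_pages(pages):
    """Append each page's cells to one list per column."""
    columns = {name: [] for name in COLUMNS}
    for html in pages:
        parser = TableRowParser()
        parser.feed(html)
        for row in parser.rows:
            for name, value in zip(COLUMNS, row):
                columns[name].append(value)
    return columns


# Two tiny made-up pages standing in for the source of each result page.
SAMPLE_PAGES = [
    "<table><tr><th>Region</th></tr>"
    "<tr><td>Europe Region</td><td>Training</td><td>2016</td><td>$1,200</td></tr></table>",
    "<table><tr><td>Middle East Region</td><td>Energy</td><td>2017</td><td>-$300</td></tr></table>",
]

columns = scrape_pages(SAMPLE_PAGES)
```

Keeping one list per column means the lists stay the same length as pages are added, so they can be turned into a data frame in a single step once scraping is finished.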
Combining and Cleaning:
Before converting to CSV, I had to do some cleaning, for which I continued coding in Python. First, I concatenated the two data frames into a single data frame, adding a new column, 'Transaction_type'. The resulting data frame had about 17,200 observations and eight columns.
The 'Current_amount' column contained dollar symbols and commas. I removed all the symbols from the column and converted it into a numeric column. There were also some negative amounts (de-obligations). Because it was stated that de-obligations are made for another purpose and should not be counted in the total, I set all the negative amounts to zero. With the cleaning done, I converted the resulting data frame into a CSV file and stored it in my local directory.
As I wanted to visualize the whole data in an interactive app, I decided to build the app with the scraped portion first for this project and then update it later on. I switched to R and Shiny Dashboard to visualize the data and develop the app.
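The combining and cleaning steps can be sketched as below. This is an illustration under stated assumptions rather than the original code: it uses plain Python records and the standard `csv` module instead of pandas data frames, the 'Disbursements' label for the first data set is inferred from the text, and the sample amounts (including the `-$300` format for a negative value) are made up.

```python
import csv
import io


def parse_amount(text):
    """Turn a string such as '$1,234.50' or '-$300' into a float."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned)


def combine_and_clean(disbursements, obligations):
    """Tag each record with its transaction type, then clean the amounts."""
    combined = []
    for label, records in (("Disbursements", disbursements),
                           ("Obligations", obligations)):
        for record in records:
            row = dict(record)                 # copy so the input is untouched
            row["Transaction_type"] = label
            amount = parse_amount(row["Current_amount"])
            row["Current_amount"] = max(amount, 0.0)  # de-obligations become zero
            combined.append(row)
    return combined


# Made-up sample records standing in for the two scraped data sets.
disbursed = [{"Region": "Europe Region", "Current_amount": "$1,200.50"}]
obligated = [{"Region": "Europe Region", "Current_amount": "-$300"}]

rows = combine_and_clean(disbursed, obligated)

# Write to an in-memory buffer here; a real run would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["Region", "Current_amount", "Transaction_type"]
)
writer.writeheader()
writer.writerows(rows)
```

Setting negative amounts to zero keeps the rows in the data set while removing their effect on totals, which matches the treatment described in the text.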
An aid 'activity' can be a project, a program, cash transfer, delivery of goods, a training course, a research project, a debt relief operation, or a contribution to an international organization.
An aid 'financial flow' most commonly takes the form of an 'obligation' (a binding agreement, based on budgeted resources, which will result in outlays) or a 'disbursement' (an amount paid by federal agencies, by cash or cash equivalent, to settle government obligations).
Fiscal year: An accounting period of 365 days (366 in leap years), but not necessarily starting on January 1. The fiscal year of the United States Government begins on October 1 and ends on September 30 and is designated by the calendar year in which it ends for global programs.
Implementing agency: The agency which actually obligates and disburses the U.S. foreign assistance, either directly or via an implementing partner entity. The implementing agency for a foreign aid activity may or may not differ from the appropriated (funding) agency.
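As a small worked example of the fiscal year rule above, the following sketch maps a calendar date to its U.S. Government fiscal year. The function name is hypothetical and is not part of the Foreign Aid Explorer data.

```python
from datetime import date


def us_fiscal_year(day):
    """Return the U.S. Government fiscal year that contains the given date."""
    # October through December belong to the fiscal year that is named
    # after the following calendar year, because that is when it ends.
    return day.year + 1 if day.month >= 10 else day.year


first_day_fy2017 = us_fiscal_year(date(2016, 10, 1))
last_day_fy2017 = us_fiscal_year(date(2017, 9, 30))
```

Both dates fall in fiscal year 2017, which runs from October 1, 2016 to September 30, 2017.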
Region: This is the specific target beneficiary of U.S. Foreign assistance. A 'country' listed on the Foreign Aid Explorer website is either an entity named and classified as an "Independent State" by the U.S. Department of State, or a geographic regional recipient with the suffix "Region".
The Foreign Aid Explorer website uses the term 'region' in two different ways:
- To designate assistance to a non-country-specific recipient and
- Summarize assistance by geographic area.
In the 'Data Query' section, foreign aid benefiting multiple countries may be assigned to either a 'Region' recipient or to 'World'.
I have employed the first case of 'region', i.e., non-country-specific recipients, for this project.
- Foreign aid is generally unpopular with the public: a 2017 poll found that 57% favor a cut and only 6% want increased aid.
- Most Americans overestimate foreign aid as a share of the total federal budget.
- In the past, less than 1% of the national budget went to foreign assistance.
- As of fiscal year 2017, foreign aid between the U.S. State Department and USAID totaled $50.1 billion, or just over 1% of the budget.
From this survey, it appears that many people are not well informed about USAID. This app may play a small part in explaining what is actually happening.
When I read the term, 'non-country-specific recipients', I had many questions running through my mind:
- Who are the non-country-specific recipients?
- Why is it so important to allocate funds separately, even after allocating billions of dollars to independent countries?
- What are the important activities?
- What are the top sectors?
- What are the top implementing agencies?
- How are the funds distributed over the course of seventeen years?
These were my questions of interest. I had scraped data from USAID for seventeen years, from 2001 to 2017. The website states that the data for fiscal year 2017 is incomplete. However, the budget for that year had been obligated, so I included it as well. To answer my first three questions, I started my analysis with the 'Activity' column, using the dplyr package in R.
Regions and Activities
The regions (non-country-specific recipients) considered for this project are:
- East Asia and Oceania Region
- Eastern Africa Region
- Europe Region
- Middle East Region
- North Africa Region
- South and Central Asia Region
- Southern Africa Region
- West Africa Region
- Western Hemisphere Region
Observations for Activities:
Below are some of the common activities that I observed in all the regions:
- Countering Violent Extremism - Govt. and Civil Society Sector
- Counter-terrorism Finance - Govt. and Civil Society Sector
- International Narcotics & Law Enforcement: Combating Wildlife Trafficking
- Counter-terrorism Anti-Terrorism Assistance - Govt. and Civil Society Sector
- Department of Energy, Safeguards Engagement
But there are some notable programs that caught my eye:
- Young African Leaders Initiative (YALI) Regional Leadership Center East Africa - Post-Secondary Education
- Improving Water sanitation Project
- Madame President Project
East Asia and Oceania:
- BOOST Skill Training to Advance the Careers of Engineers and Scientists in South and East Asia - Other Multisector
- Advancing Careers of Women in Science: Empowering Women in North Africa - Government and Civil Society.
Similar kinds of programs are conducted in independent countries. From this, I could understand that funds are allocated separately so that such special programs can also be conducted in the specific regions. This would be an opportunity for people who might have missed such programs in independent countries.
I then moved on to my last three questions. I used R with the packages ggplot2, plotly and Shiny for analysis and visualization.
Visualization in Shiny
I did the visualization of the data in Shiny, which is an open source R package that provides an elegant and powerful web framework for building web applications using R. Shiny helps you turn your analyses into interactive web applications. Furthermore, I used Shiny Dashboard to create a management dashboard-like application.
I grouped the aid into two categories, one by regions (non-country-specific recipients) and the other by sectors.
Let us discuss them in detail.
Aid by Region Tab
This tab shows how many U.S. dollars have been obligated and disbursed for each region from 2001 to 2017. Two select boxes are available, one to select the region and the other to select the year.
First, line charts are used to display the disbursements for seventeen years from 2001 to 2017 to the selected region. The green dot represents the amount for the selected year. In this case, one more check box is available for users to interact with. If the check box is selected, the line graph for the obligations will also be displayed, which enables users to check if the amounts have been disbursed as per the plan (obligations) or not.
Second, bar charts display the top sectors in which more than $100,000 has been disbursed to the selected region. Three years are shown: the selected year, plus 2011 and 2016 for comparison. If data is not available for a particular year, the bar chart for that year is not displayed.
Third, bar charts display the top implementing agencies that have disbursed more than $100,000 to the selected region. These charts also cover the three years described above.
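The selection behind these bar charts can be sketched as a group, sum, filter and sort step. The sketch below is written in Python for consistency with the scraping code, whereas the app itself was built in R. The field names, the sample records and the choice to apply the $100,000 cut-off to each group's summed total are all assumptions made for illustration.

```python
from collections import defaultdict

THRESHOLD = 100_000  # USD cut-off quoted in the text


def top_groups(records, region, year, group_key, threshold=THRESHOLD):
    """Sum disbursements per group and keep the groups above the threshold."""
    totals = defaultdict(float)
    for record in records:
        if (record["Region"] == region
                and record["Fiscal_year"] == year
                and record["Transaction_type"] == "Disbursements"):
            totals[record[group_key]] += record["Current_amount"]
    kept = [(name, total) for name, total in totals.items() if total > threshold]
    # Largest totals first, as in a "top sectors" bar chart.
    return sorted(kept, key=lambda item: item[1], reverse=True)


def make_record(transaction_type, sector, agency, amount):
    """Build one made-up record for the Europe Region in fiscal year 2016."""
    return {
        "Region": "Europe Region",
        "Fiscal_year": 2016,
        "Transaction_type": transaction_type,
        "Sector": sector,
        "Agency": agency,
        "Current_amount": amount,
    }


# Made-up sample data, not real figures from the Foreign Aid Explorer.
records = [
    make_record("Disbursements", "Emergency Response", "Dept. of State", 150_000.0),
    make_record("Disbursements", "Emergency Response", "Dept. of Defense", 250_000.0),
    make_record("Disbursements", "Energy", "Dept. of Energy", 50_000.0),
    make_record("Obligations", "Energy", "Dept. of Energy", 900_000.0),
]

top_sectors = top_groups(records, "Europe Region", 2016, "Sector")
top_agencies = top_groups(records, "Europe Region", 2016, "Agency")
```

Calling the same function with a different `group_key` covers both the sector chart and the agency chart, and an empty result for a year corresponds to the case where no chart is displayed.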
Observations of Region-wise Disbursements in 2016:
| Region | Disbursed (USD) | Top Sectors | Top Implementing Agencies |
|---|---|---|---|
| | 197 M | Agriculture, Basic Health, Maternal and Child Care | USAID, Dept. of Defense, Dept. of State |
| Eastern Africa | 147 M | Conflict, Agriculture, Environment | Dept. of State, |
| Europe | 107 M | Emergency, Conflict, Government and Civil Society | Dept. of State, Dept. of Defense |
| | 25 M | Emergency, Environment, HIV/AIDS | USAID, Dept. of State |
| East Asia and Oceania | | Government and Civil Society | Dept. of State |
| Middle East | 6.8 M | Energy, Water Supply, Trade Policy | Dept. of Energy |
| | 2 M | Emergency, Government and Civil Society | Dept. of State |
| South and Central Asia | 1.8 M | Government and Civil Society, Energy, Trade Policy | Dept. of State, Dept. of Energy, Dept. of Commerce |
| Western Hemisphere | 1.6 M | Government and Civil Society, Conflict and Trade Policy | Dept. of State, Homeland Security |
Observations from Comparison of Region-wise Obligations and Disbursements Line Graphs for seventeen years:
- The funds have been disbursed as planned (obligated) in almost all years to all the regions, except Europe, North Africa, South and Central Asia and Western Hemisphere.
- In Europe, more funds have been disbursed than planned in most of the years, which is an interesting finding.
- In other regions, both graphs are fluctuating a lot.
Aid by Sector Tab
This tab shows how many U.S. dollars have been obligated and disbursed in each sector from 2001 to 2017. Two select boxes are available here as well, one to select the sector and the other to select the year.
First, line charts are used to display the disbursements for seventeen years in the selected sector from 2001 to 2017. The amount for the selected year will be highlighted. In this case, one more check box is available to display the line graph for the obligations. The users can compare and check if the amounts have been disbursed as planned (obligations) or not.
Second, bar charts display the top regions to which more than $100,000 has been disbursed in the selected sector. Three years are shown: the selected year, plus 2011 and 2016 for comparison. If data is not available for a particular year, the bar chart for that year is not displayed.
Third, bar charts display the top implementing agencies that have disbursed more than $100,000 in the selected sector. These charts also cover the three years described above.
Observations of Sector-wise Disbursements:
| Sector | Disbursed (USD) | Top Regions |
|---|---|---|
| Conflict, Peace and Security | 90 M | East and West Africa |
| Agriculture | 68 M | West, East Africa |
| Basic Health | 56 M | West Africa |
| Govt. and Civil Society | | East Asia and Oceania, Europe, East Africa |
| | 45 M | West, South, East Africa and Middle East |
| Other Multi-sectors | 30 M | Europe |
| | | West, East and South Africa |
| Trade | 22 M | West Africa and all |
| Maternal and Child care | 17 M | West and East Africa |
| HIV | 11 M | West, South and East Africa |
| Post-sec Education | 9 M | West, East and South Africa |
| Disaster management | 8 M | West and South Africa |
| Food | 8 M | East, West and South Africa |
| Admin | 6.8 M | East and South Africa |
| Energy | 5.4 M | Middle East, South and Central Asia and all |
| Business | 1.3 M | Europe, Middle East |
Sector-wise Disbursements from 2001-2017:
The following sectors show an increasing trend:
- Conflict, Peace and Security
- Govt. and Civil Society
- Maternal and Child care
- Other multi-sectors
- Post-sec education
- Trade policy
- Water Supply
The following sectors display a decreasing trend:
- Banking (peak in 2002)
- Business (peak in 2004)
- Food (peak in 2005)
- Emergency (peak in 2005)
- HIV/AIDS (peak in 2003)
An oscillating trend has been observed in other sectors.
Decreasing trends are sometimes good and increasing trends are sometimes alarming.
- In the HIV/AIDS sector, obligations have also decreased, which may be good news that the disease has been brought under control in the specific regions.
- The increasing trend in the ‘Water supply’ sector is alarming. It may indicate a scarcity of water and the need to preserve water in the specific regions.
- In the ‘Banking’ sector, obligations are almost always less than the disbursements, which is interesting.
“The purpose of foreign aid is to end the need for its existence”– Mark Green, USAID Administrator. I too hope for the best.
- The Africa Regions receive funding in every sector.
- The maximum funds for
  - South and Central Asia Region – allocated in the Energy sector.
  - Middle East Region – in Energy, Water and Trade Policy.
  - Europe Region – in Emergency and Business.
  - East Asia and Oceania – in Government and Civil Society.
  - Western Hemisphere – in Conflict and Trade Policy.
- The sectors in which the Africa regions do not rank first are Business, Energy, Emergency Response and Other Multi-sectors.
- The highest overall number of transactions was recorded in 2015. Major sectors received increased amounts during 2009-2016, which coincides with former President Obama's term. Published literature is available that explains how he improved many sectors, such as Agriculture.
This suggests the following future directions:
The FY 2019 President’s Budget for the State Department and USAID is $39.3 billion.
The trend could be analyzed to predict whether, during President Trump’s term:
- The decrease in funds turns out to be a good sign,
- The disbursed amounts for the regions increase or decrease in important sectors,
- Any unexpected disbursements are made to new sectors, such as trade policy or business-related sectors.
The data set provided a great way to apply Selenium and Beautiful Soup. It was an unforgettable experience to work on this web scraping project. Thanks to NYC Data Science Academy for providing me a wonderful opportunity to work with Selenium, Beautiful Soup, Python, ggplot2, plotly and Shiny Dashboard.