Web Scraping USAID for non-country-specific Recipients
Introduction
This post is about the second of the four projects we are supposed to deliver at the NYC Data Science Academy Data Science Bootcamp program. This is a web scraping project.
About USAID
USAID - United States Agency for International Development.
Who they are:
USAID is the lead U.S. Government agency that works to end extreme global poverty and enable resilient, democratic societies to realize their potential. Former President John F. Kennedy created USAID by executive order in 1961.
Assistance to Foreign Countries:
U.S. foreign assistance has always had the twofold purpose of furthering America's interests while improving lives in the developing world. USAID carries out U.S. foreign policy by promoting broad-scale human progress at the same time it expands stable, free societies, creates markets and trade partners for the United States, and fosters good will abroad.
What they do:
Spending less than 1 percent of the total federal budget, USAID works in over 100 countries to:
- Promote broadly shared economic prosperity;
- Strengthen democracy and good governance;
- Protect human rights;
- Improve global health;
- Advance food security and agriculture;
- Improve environmental sustainability;
- Further education;
- Help societies prevent and recover from conflicts; and
- Provide humanitarian assistance in the wake of natural and man-made disasters.
Data Source
The foreign assistance data can be found on the Foreign Aid Explorer. We can view data by recipient country, by sector, and by federal agency. The complete data, containing about 900,000 observations, is available for download from the website. However, as this is a web scraping project, I scraped the part of the data required for my analysis from the query page.
About my Web Scraping Project:
The actual content of the web page is loaded by a heavy JavaScript implementation: the HTML structure and the URL stay the same for every page, no matter how you navigate or filter. I used Python with Selenium and Beautiful Soup to scrape the page. The page took at least 45 seconds to load completely, so I set the initial idle time to 90 seconds. Within this time, I selected the required regions, set the transaction type to 'Disbursements', and applied the filters. There were about 8,700 rows (observations). I set 100 rows per page, which resulted in 87 pages. The code pauses for about 1 to 1.5 minutes on each page to avoid overloading the server and getting banned. At the bottom of each page, a "Next" button leads to the next page, and I wrote code to click it automatically, without manual intervention. However, I ran into a problem: even on the last page, the "Next" button still worked, so the script would scrape the last page repeatedly if left unattended. To handle this, I changed my code so that it moves on to the next page only if the current page is not the last one; otherwise, it closes the browser.
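The paging logic described above can be sketched roughly as follows. This is only an illustration, not the project's actual script: the browser and parsing steps are replaced by two hypothetical callables (`read_rows` and `go_to_next_page`), and the last page is assumed to be detected by computing the page count from the total number of rows, which may differ from how the original code did it.

```python
import math
import random
import time


def scrape_all_pages(read_rows, go_to_next_page, total_rows, rows_per_page=100,
                     min_pause=60.0, max_pause=90.0, sleep=time.sleep):
    """Collect the rows of every result page and stop after the last one.

    read_rows and go_to_next_page are hypothetical callables; in a real
    script they would wrap the browser-automation and HTML-parsing calls.
    """
    total_pages = math.ceil(total_rows / rows_per_page)
    all_rows = []
    for page in range(1, total_pages + 1):
        all_rows.extend(read_rows())
        if page == total_pages:
            break  # last page: do not click "Next" again
        # Pause between pages so the server is not overloaded.
        sleep(random.uniform(min_pause, max_pause))
        go_to_next_page()
    return all_rows


class FakePager:
    """Stand-in for the browser whose "Next" button keeps working on the last page."""

    def __init__(self, pages):
        self.pages = pages
        self.index = 0
        self.clicks = 0

    def read_rows(self):
        return list(self.pages[self.index])

    def next_page(self):
        self.clicks += 1
        # Clicking "Next" on the last page simply stays on the last page.
        self.index = min(self.index + 1, len(self.pages) - 1)


# Tiny demonstration with made-up rows and the pause switched off.
demo = FakePager([["row 1", "row 2"], ["row 3", "row 4"], ["row 5"]])
rows = scrape_all_pages(demo.read_rows, demo.next_page, total_rows=5,
                        rows_per_page=2, sleep=lambda seconds: None)
print(len(rows), "rows scraped with", demo.clicks, "clicks on Next")
```

With 8,700 rows and 100 rows per page, the same calculation gives the 87 pages mentioned above, and "Next" is never clicked on the final page.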
The eighty-seven pages were scraped continuously for 1.5 hours, without manual intervention. Each column was saved as a list during scraping and, once scraping was over, I converted the lists into a data frame. Then, I set the transaction type to 'Obligations' and used the same technique to scrape all 84 pages, which also took about 1.5 hours. This was saved in another data frame. As mentioned before, the complete data has about 900,000 observations, which at 100 rows per page is about 9,000 pages. At roughly one minute per page, scraping all of it would have taken about 9,000 minutes, or more than six full days. Because of this time constraint, I scraped only a part of the data.
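The time estimate above is simple arithmetic, and a small sketch makes it easy to rerun with other settings. The function name and the default of one minute per page are assumptions chosen for illustration; the row counts come from the text.

```python
import math


def estimate_scrape_time(total_rows, rows_per_page=100, minutes_per_page=1.0):
    """Return (pages, minutes, hours, days) for scraping total_rows rows."""
    pages = math.ceil(total_rows / rows_per_page)
    minutes = pages * minutes_per_page
    return pages, minutes, minutes / 60, minutes / (60 * 24)


# Complete data set: about 900,000 observations.
print(estimate_scrape_time(900_000))  # (9000, 9000.0, 150.0, 6.25)

# Scraped portion for 'Disbursements': about 8,700 rows.
print(estimate_scrape_time(8_700))    # 87 pages, about 1.45 hours
```

At the lower end of the pause (one minute per page), the 87 pages take about 1.45 hours, which is in line with the 1.5 hours reported above.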
Combining and Cleaning:
Before converting to CSV, I had to do some cleaning. I continued coding in Python. Before starting the cleaning, I concatenated the two data frames into a single data frame, including a new column, 'Transaction_type'. The resulting data frame had about 17,200 observations and eight columns.
The 'Current_amount' column contained dollar symbols and commas. First, I removed all the symbols from the column and converted it into a numeric column. There were also some negative amounts (de-obligations). The source notes that de-obligations are made for another purpose and should not be counted in the totals, so I set all negative amounts to zero. With the cleaning done, I converted the resulting data frame into a CSV file and stored it in my local directory.
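The combining and cleaning steps can be sketched as below. This is a simplified illustration that uses only the Python standard library and plain dictionaries instead of the data frames used in the project. The 'Region' column, the sample values, and the exact text format of negative amounts (a leading minus sign or accounting-style parentheses) are assumptions; 'Current_amount' and 'Transaction_type' are the column names mentioned above.

```python
import csv
import io


def parse_amount(text):
    """Turn a string such as '$1,234' or '-$1,234' into a float."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    # Assumed format: accounting-style negatives such as '($1,234)'.
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = "-" + cleaned[1:-1]
    return float(cleaned)


def combine_and_clean(disbursements, obligations):
    """Stack both scrapes, label each row, and zero out negative amounts."""
    combined = []
    for label, rows in (("Disbursements", disbursements),
                        ("Obligations", obligations)):
        for row in rows:
            record = dict(row)
            record["Transaction_type"] = label
            amount = parse_amount(record["Current_amount"])
            record["Current_amount"] = max(amount, 0.0)  # de-obligations become 0
            combined.append(record)
    return combined


def to_csv_text(records):
    """Serialize the cleaned records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample rows, for illustration only.
disbursed = [{"Region": "Europe Region", "Current_amount": "$1,250,000"}]
obligated = [{"Region": "Europe Region", "Current_amount": "-$40,000"}]
cleaned = combine_and_clean(disbursed, obligated)
print(to_csv_text(cleaned))
```

Writing to a file instead of a string only requires passing an opened file object to `csv.DictWriter` in place of the `io.StringIO` buffer.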
As I wanted to visualize the whole data in an interactive app, I decided to first build the app for this project with the scraped portion and update it later on. I switched to R and Shiny Dashboard to visualize the data and develop the app.
Glossary
Activity:
An aid 'activity' can be a project, a program, cash transfer, delivery of goods, a training course, a research project, a debt relief operation, or a contribution to an international organization.
Financial Flow:
An aid 'financial flow' most commonly takes the form of an 'obligation' (a binding agreement, based on budgeted resources, which will result in outlays) or a 'disbursement' (an amount paid by federal agencies, by cash or cash equivalent, to settle government obligations).
Fiscal Year:
An accounting period of 365 days (366 in leap years), but not necessarily starting on January 1. The fiscal year of the United States Government begins on October 1 and ends on September 30 and is designated by the calendar year in which it ends for global programs.
Implementing Agency:
The agency which actually obligates and disburses the U.S. foreign assistance, either directly or via an implementing partner entity. The implementing agency for a foreign aid activity may or may not differ from the appropriated (funding) agency.
Region:
This is the specific target beneficiary of U.S. foreign assistance. A 'country' listed on the Foreign Aid Explorer website is either an entity named and classified as an "Independent State" by the U.S. Department of State, or a geographic regional recipient with the suffix "Region".
The Foreign Aid Explorer website uses the term 'region' in two different ways:
- To designate assistance to a non-country-specific recipient and
- To summarize assistance by geographic area.
In the 'Data Query' section, foreign aid benefiting multiple countries may be assigned to either a 'Region' recipient or to 'World'.
I have employed the first case of 'region', i.e., non-country-specific recipients, for this project.
Literature Review
- Foreign aid is generally unpopular with the general public: a 2017 poll found that 57% favored a cut, while only 6% wanted increased aid.
- Most Americans overestimate foreign aid as a share of the total federal budget.
- In the past, less than 1% of the national budget went to foreign assistance.
- As of fiscal year 2017, foreign aid between the U.S. State Department and USAID totaled $50.1 billion, or just over 1% of the budget.
These findings suggest that people are not well informed about USAID's work. This app may play a small part in explaining what is actually happening.
When I read the term 'non-country-specific recipients', I had many questions running through my mind:
- Who are the non-country-specific recipients?
- Why is it very important to allocate funds separately, even after allocating billions of funds to independent countries?
- What are the important activities?
- What are the top sectors?
- What are the top implementing agencies?
- How are the funds distributed over the course of seventeen years?
These were my questions of interest. I scraped USAID data for seventeen years, from 2001 to 2017. The website notes that the data for fiscal year 2017 is incomplete; however, the budget had already been obligated, so I included that year as well. To answer my first three questions, I started my analysis with the 'Activity' column, using the dplyr package in R.
Regions and Activities
The regions (non-country-specific recipients) considered for this project are:
- East Asia and Oceania Region
- Eastern Africa Region
- Europe Region
- Middle East Region
- North Africa Region
- South and Central Asia Region
- Southern Africa Region
- West Africa Region
- Western Hemisphere Region
Observations for Activities:
I mention below some of the common activities which I observed in all the regions:
Common Activities:
- Countering Violent Extremism - Govt. and Civil Society Sector
- Counter-terrorism Finance - Govt. and Civil Society Sector
- International Narcotics & Law Enforcement: Combating Wildlife Trafficking
- Counter-terrorism Anti-Terrorism Assistance- Govt. and Civil Society Sector
- Department of Energy, Safeguards Engagement
But some programs stood out and made me raise my eyebrows:
Highlights:
Eastern Africa:
- Young African Leaders Initiative (YALI) Regional Leadership Center East Africa - Post-Secondary Education
Middle East:
- Improving Water sanitation Project
- Madame President Project
East Asia and Oceania:
- BOOST Skill Training to Advance the Careers of Engineers and Scientists in South and East Asia - Other Multisector
North Africa:
- Advancing Careers of Women in Science: Empowering Women in North Africa - Government and Civil Society.
Similar kinds of programs are also conducted in independent countries. My understanding is that funds are allocated separately so that such special programs can be conducted again at the regional level. This gives an opportunity to people who might have missed such programs in the independent countries.
I then moved on to my last three questions. I used R and the packages ggplot2, plotly and Shiny for analysis and visualization.
Visualization in Shiny
I visualized the data in Shiny, an open source R package that provides an elegant and powerful framework for building web applications using R. Shiny helps you turn your analyses into interactive web applications. I also used Shiny Dashboard to create a management dashboard-like application.
Start
I grouped the aid into two categories, one by regions (non-country-specific recipients) and the other by sectors.
Let us discuss them in detail.
Aid by Region Tab
This tab shows how much (in USD) has been obligated and disbursed for each region from 2001 to 2017. Two select boxes are available, one to select the region and the other to select the year.
First, a line chart displays the disbursements to the selected region over the seventeen years from 2001 to 2017. The green dot represents the amount for the selected year. A check box is also available for users to interact with. If it is selected, the line graph for the obligations is displayed as well, which lets users check whether the amounts have been disbursed as planned (obligated) or not.
Second, bar charts display the top sectors, in which more than $100,000 has been disbursed to the selected region. Three years are shown: the selected year, plus 2011 and 2016 for comparison. If no data is available for a particular year, the bar chart for that year is not displayed.
Again, bar charts display the top implementing agencies, which have disbursed more than $100,000 to the selected region. Here too, the bar charts show the same three years, as discussed above.
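The filtering behind these bar charts can be illustrated with a short sketch. The app itself is written in R, so this Python version only mirrors the idea: sum the disbursements per sector (or per agency) for one region and year, keep the groups above $100,000, and skip years with no data. The column names 'Region', 'Fiscal_year' and 'Sector' and all sample rows are assumptions made for illustration.

```python
from collections import defaultdict


def top_groups(rows, group_key, region, year, threshold=100_000):
    """Total disbursements per group, largest first, keeping totals above threshold."""
    totals = defaultdict(float)
    for row in rows:
        if (row["Region"] == region and row["Fiscal_year"] == year
                and row["Transaction_type"] == "Disbursements"):
            totals[row[group_key]] += row["Current_amount"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(name, total) for name, total in ranked if total > threshold]


# Made-up rows for illustration only; these are not figures from the data set.
rows = [
    {"Region": "Europe Region", "Fiscal_year": 2016, "Sector": "Emergency",
     "Transaction_type": "Disbursements", "Current_amount": 250_000.0},
    {"Region": "Europe Region", "Fiscal_year": 2016, "Sector": "Emergency",
     "Transaction_type": "Disbursements", "Current_amount": 50_000.0},
    {"Region": "Europe Region", "Fiscal_year": 2016, "Sector": "Energy",
     "Transaction_type": "Disbursements", "Current_amount": 90_000.0},
    {"Region": "Europe Region", "Fiscal_year": 2016, "Sector": "Conflict",
     "Transaction_type": "Obligations", "Current_amount": 900_000.0},
    {"Region": "Europe Region", "Fiscal_year": 2011, "Sector": "Conflict",
     "Transaction_type": "Disbursements", "Current_amount": 120_000.0},
]

# The selected year plus the two comparison years; years without data are skipped.
for year in (2005, 2011, 2016):
    result = top_groups(rows, "Sector", "Europe Region", year)
    if result:
        print(year, result)
```

Passing a different `group_key`, for example a hypothetical 'Implementing_agency' column, would give the agency chart with the same function.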
Observations of Region-wise Disbursements in 2016:
Region | Disbursements (in USD) | Top Sectors | Top Implementing Agencies |
West Africa | 197 M | Agriculture, Basic Health, Maternal and Child Care | USAID, Dept. of Defense, Dept. of State |
Eastern Africa | 147 M | Conflict, Agriculture, Environment | Dept. of State, USAID |
Europe | 107 M | Emergency, Conflict, Government and Civil Society | Dept. of State, Dept. of Defense |
South Africa | 25 M | Emergency, Environment, HIV/AIDS | USAID, Dept. of State |
East Asia and Oceania | 16 M | Government and Civil Society | Dept. of State |
Middle East | 6.8 M | Energy, Water Supply, Trade Policy | Dept. of Energy |
North Africa | 2 M | Emergency, Government and Civil Society | Dept. of State |
South and Central Asia | 1.8 M | Government and Civil Society, Energy, Trade Policy | Dept. of State, Dept. of Energy, Dept. of Commerce |
Western Hemisphere | 1.6 M | Government and Civil Society, Conflict and Trade Policy | Dept. of State, Homeland Security |
Observations from Comparison of Region-wise Obligations and Disbursements Line Graphs for seventeen years:
- The funds have been disbursed as planned (obligated) in almost all years to all the regions, except Europe, North Africa, South and Central Asia and Western Hemisphere.
- In Europe, more funds have been disbursed than planned in most of the years, which is an interesting finding.
- In other regions, both graphs are fluctuating a lot.
Aid by Sector Tab
This tab shows how much (in USD) has been obligated and disbursed in each sector from 2001 to 2017. Two select boxes are available here as well, one to select the sector and the other to select the year.
First, a line chart displays the disbursements in the selected sector over the seventeen years from 2001 to 2017. The amount for the selected year is highlighted. A check box is also available to display the line graph for the obligations, so users can check whether the amounts have been disbursed as planned (obligated) or not.
Second, bar charts display the top regions, to which more than $100,000 has been disbursed in the selected sector. Three years are shown: the selected year, plus 2011 and 2016 for comparison. If no data is available for a particular year, the bar chart for that year is not displayed.
Again, bar charts show the top implementing agencies, which have disbursed more than $100,000 in the selected sector. The bar charts show the same three years, as discussed above.
Observations of Sector-wise Disbursements:
Sector | Disbursements (in USD) | Top Regions |
Conflict, Peace and Security | 90 M | East and West Africa |
Emergency | 88 M | Europe |
Agriculture | 68 M | West, East Africa |
Basic Health | 56 M | West Africa |
Govt. and Civil Society | 45 M | East Asia and Oceania, Europe, East Africa |
Water supply | 45 M | West, South, East Africa and Middle East |
Other Multi-sectors | 30 M | Europe |
Environment | 26 M | West, East and South Africa |
Trade | 22 M | West Africa and all |
Maternal and Child care | 17 M | West and East Africa |
HIV | 11 M | West, South and East Africa |
Post-sec Education | 9 M | West, East and South Africa |
Disaster management | 8 M | West and South Africa |
Food | 8 M | East, West and South Africa |
Admin | 6.8 M | East and South Africa |
Energy | 5.4 M | Middle East, South and Central Asia and all |
Business | 1.3 M | Europe, Middle East |
Banking | 500,000 | West Africa |
Communications | 3,000 | West Africa |
Sector-wise Disbursements from 2001-2017:
The following sectors show an increasing trend:
- Agriculture
- Conflict, Peace and Security
- Environmental
- Govt. and Civil Society
- Maternal and Child care
- Other multi-sectors
- Post-sec education
- Trade policy
- Water Supply
The following sectors display a decreasing trend:
- Banking (peak in 2002)
- Business (peak in 2004)
- Food (peak in 2005)
- Emergency (peak in 2005)
- HIV/AIDS (peak in 2003)
An oscillating trend has been observed in other sectors.
Decreasing trends are sometimes good and increasing trends are sometimes alarming.
- In the HIV/AIDS sector, obligations are also lower, which is good news: it suggests the disease has been brought under control in the specific regions.
- The increasing trend in the 'Water supply' sector is alarming. It points to water scarcity and the need to preserve water in the specific regions.
- In the 'Banking' sector, obligations are almost always less than the disbursements, which is interesting.
"The purpose of foreign aid is to end the need for its existence." - Mark Green, USAID Administrator. I too hope for the best.
Conclusion
- The Africa Regions receive funding in every sector.
- The largest share of funding goes to the following sectors:
  - South and Central Asia Region: Energy
  - Middle East Region: Energy, Water and Trade Policy
  - Europe Region: Emergency and Business
  - East Asia and Oceania: Government and Civil Society
  - Western Hemisphere: Conflict and Trade Policy
- The sectors in which the Africa regions do not rank first are Business, Energy, Emergency Response and Other Multi-sectors.
- The highest overall number of transactions was recorded in 2015. Major sectors received increased amounts during 2009-2016 (former President Obama's term, for which literature is available explaining how he improved many sectors, such as Agriculture).
This puts forth the following future directions:
Future Directions
The FY 2019 President's Budget for the State Department and USAID is $39.3 billion.
The trend can be analyzed to predict whether, during President Trump's term,
- The decrease in funds will turn out to be a good sign,
- The disbursed amounts for the regions will increase or decrease in important sectors,
- Any unexpected disbursements will be made to new sectors, such as trade policy or business-related sectors.
The data set provided a great way to apply Selenium and Beautiful Soup. It was an unforgettable experience to work on this web scraping project. Thanks to NYC Data Science Academy for providing me a wonderful opportunity to work with Selenium, Beautiful Soup, Python, ggplot2, plotly and Shiny Dashboard.
Code and data can be found on GitHub, while the app itself is online at shinyapps.io