Visualizing Alberta's Oil and Gas Activity

Kamal Sandhu
Posted on Mar 19, 2017

Alberta is one of the wealthiest and most resource-rich provinces of Canada. A large part of this wealth flows from its oil and gas industry. From its start in the early 20th century, Alberta has built a massive oil and gas infrastructure to tap into its reserves, which are the third largest in the world after Saudi Arabia and Venezuela. Alberta's commercial capital, Calgary, is home to more corporate head offices than any Canadian city other than Toronto.

Oil and gas investment in the province topped out at $32.7 billion in 2013. Keeping tabs on the oil patch numbers can be a difficult task even for the most zealous analysts, that is, if they don't have the right tools. I felt the same way working as an analyst at a firm with significant exposure to the oil and gas services sector. So, as my first project at NYCDSA, I decided to do a proof of concept for a tool for visualizing Alberta's oil and gas activity.

The main questions I was trying to answer through visualization were: How active is the local market? Which companies are drilling, and where? Who is applying for oil well licenses? What kinds of wells are they drilling, and for what products? Are they building pipelines too? From where, and for what products? Which license holders and drilling contractors are the busiest? How deep are they drilling? Has there been a marked shift in activity because of the oil and gas industry downturn?

Check out the app, explore, and find answers at ab.oilandgas.io.


Monthly trend for top five drilling contractors in Alberta.

Data Source

Heatmap of abandoned wells in Alberta generated from a shapefile.

The energy industry in Alberta is regulated by the Alberta Energy Regulator (AER). Most of the activities mentioned above require permits prior to the commencement of work, and companies must also notify the AER when work begins. For most activities, companies can log into their AER portal and make the required disclosures.

The AER collects this data and makes some of it available on its data portal. Although most of it is available free of cost, specific skills are required to slice and dice it and make meaning out of it.

The AER provides activity reports on drilling activity, pipeline construction, well licensing, abandoned wells, etc. Most of this information is published as text files broken down by daily activity, kind of like this. Information about abandoned wells is available in shapefile format. What is a shapefile? Go figure.

Methodology

The first step was to download the files and unzip them. Although the files can be downloaded and unzipped as a batch, the daily activity report for each category is contained in a single text file. I narrowed the project down to new wells drilled, new well licenses issued, and new pipeline approvals from January 2012 to December 2016. This meant the number of text files was 3 categories * 5 years * 365 days = 5,475, plus 6 more for the leap days in 2012 and 2016. The general outline of the code for reading and cleaning the data was as follows -

  1. Recursively read the names of the files contained in all the folders into a list
  2. Iterate through the list of file names and read each text file into memory
  3. Remove unnecessary headers and footers
  4. Get the column widths from the runs of dashes (---) at the top of each table, or hard-code them
  5. Go through every line, extract the text, and break it into columns based on those widths
  6. Save the text in a dataframe and perform cleaning operations like separating date and time, splitting the Dominion Land Survey fields, etc.
  7. Write the dataframe to a csv file when processing is done, and repeat for the next file
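As a rough sketch, the dash-row trick in steps 4 and 5 might look like this in R (the sample lines below are invented for illustration, not actual AER output):

```r
# Sketch of steps 4-5: infer column widths from the dash row, then split
# each fixed-width line. Sample lines are invented, not real AER output.
parse_dash_widths <- function(dash_line) {
  # Each run of dashes marks one column; its length is the column width
  runs <- regmatches(dash_line, gregexpr("-+", dash_line))[[1]]
  nchar(runs)
}

split_fixed_line <- function(line, widths) {
  # Columns are separated by a single space in this sketch
  starts <- cumsum(c(1, head(widths, -1) + 1))
  ends   <- starts + widths - 1
  trimws(substring(line, starts, ends))
}

widths <- parse_dash_widths("---------- ----- ---------")
fields <- split_fixed_line("ABC Corp   0042  CRUDE OIL", widths)
# fields: "ABC Corp" "0042" "CRUDE OIL"
```

The real files need the header/footer stripping from steps 3 and 6 on top of this, but the dash row does most of the work of locating column boundaries.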

Township Grid System of Alberta. Source.

AER publishes location data in DLS (Dominion Land Survey) format. Mapping using this system is not supported by any of the freely available mapping libraries in R. So, I had to write a short script to convert each DLS coordinate into an approximate lat/long location. You can read more about DLS here. This part of the code can be outlined as follows -

  1. Find the north-south distance of the point from the 49th parallel
  2. Find the east-west distance of the point from the nearest meridian to the east
  3. Use these distances to calculate the bearing of the location from the intersection of the 49th parallel and the nearest meridian to the east
  4. Calculate the latitude and longitude from the bearing and the straight-line distance to that intersection

This method correctly handles the inconsistencies arising from road allowances, the earth's circumference, and other nuances of DLS.
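To illustrate steps 3 and 4, here is a simplified spherical destination-point calculation, the same computation that geosphere::destPoint performs. The township/range offsets and the meridian longitude are illustrative, and road allowances are deliberately ignored in this sketch:

```r
# Simplified sketch of the bearing/distance calculation in steps 3-4.
# Township/range offsets and the meridian longitude are illustrative;
# road allowances are ignored here.
MILE <- 1609.34
R_EARTH <- 6371000  # mean earth radius in metres

# Spherical destination point: start at (lon, lat) and travel `dist` metres
# on `bearing` degrees clockwise from north (what geosphere::destPoint does)
dest_point <- function(lon, lat, bearing, dist, R = R_EARTH) {
  lat1 <- lat * pi / 180; lon1 <- lon * pi / 180; brg <- bearing * pi / 180
  d <- dist / R
  lat2 <- asin(sin(lat1) * cos(d) + cos(lat1) * sin(d) * cos(brg))
  lon2 <- lon1 + atan2(sin(brg) * sin(d) * cos(lat1),
                       cos(d) - sin(lat1) * sin(lat2))
  c(lon = lon2 * 180 / pi, lat = lat2 * 180 / pi)
}

# One township (north-south) or range (east-west) step is roughly six miles
dls_to_latlong <- function(twp, rge, mer_lon = -114) {  # -114: 5th meridian
  north <- (twp - 1) * 6 * MILE  # step 1: distance north of the 49th parallel
  west  <- (rge - 1) * 6 * MILE  # step 2: distance west of the meridian
  bearing <- (360 - atan2(west, north) * 180 / pi) %% 360   # step 3
  dest_point(mer_lon, 49, bearing, sqrt(north^2 + west^2))  # step 4
}
```

A production version would also resolve the section and legal subdivision within the township and correct for the road allowances mentioned above.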

Visualization of the data is usually simpler than the extraction and pre-processing. A simple outline of a typical analysis operation is as follows -

  1. Decide on the metric and group the data accordingly
  2. Mutate, filter, and apply operations on rows or on individual columns
  3. Reshape the data so that it is ideal for visualization
  4. Convert date information into time-series data (if required)
  5. Group the data as needed
  6. Visualize using the relevant tools (graphs or maps)
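The pipeline above can be sketched with dplyr on a toy activity table (the column names here are hypothetical, not the AER's):

```r
library(dplyr)

# Toy activity table; column names are hypothetical, not the AER's
wells <- data.frame(
  date       = as.Date(c("2016-01-04", "2016-01-18",
                         "2016-02-02", "2016-02-09")),
  contractor = c("Precision", "Precision", "Ensign", "Precision"),
  depth_m    = c(2100, 1850, 2400, 1900)
)

monthly <- wells %>%
  mutate(month = format(date, "%Y-%m")) %>%   # step 4: date -> time bucket
  group_by(month, contractor) %>%             # steps 1/5: grouping
  summarise(wells     = n(),                  # step 2: apply operations
            avg_depth = mean(depth_m),
            .groups   = "drop")
```

The resulting `monthly` table is shaped for a dygraphs trend line or for joining onto a leaflet map layer.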

The last step was to combine all the data processed in the previous parts into a clean Shiny app. Code for the project can be found here.
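A minimal skeleton of how such an app could be wired up is shown below; the input and output names are hypothetical, and the live app at ab.oilandgas.io uses shinydashboard, leaflet, and dygraphs rather than this bare layout:

```r
library(shiny)

# Hypothetical skeleton; the real app uses shinydashboard, leaflet, dygraphs
ui <- fluidPage(
  titlePanel("Alberta Oil and Gas Activity"),
  sidebarLayout(
    sidebarPanel(
      selectInput("category", "Activity type",
                  c("Wells drilled", "Well licenses", "Pipeline approvals"))
    ),
    mainPanel(plotOutput("trend"))
  )
)

server <- function(input, output, session) {
  output$trend <- renderPlot({
    # Placeholder: the real app plots the selected category's time series
    plot(1:10, type = "l", main = input$category)
  })
}

# shinyApp(ui, server)  # uncomment to launch locally
```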

Future

The project is currently parked on GitHub, and the Shiny app can be found at ab.oilandgas.io. It can be expanded to include other Canadian provinces and the US. Incorporating other regulators could enable companies to keep a finger on the pulse of the O&G industry. This would enable monitoring of expected expansion or contraction of the O&G industry in near real time and could help in predicting potential increases or decreases in oil supply. Further, formations with over- or under-investment, transportation (pipeline) bottlenecks, and areas that might need increased environmental watch could also be identified.


Tools Used

Data extraction and manipulation – R, RStudio, dplyr, tidyr, sp, geosphere, lubridate.

Mapping – leaflet.

Graphing – xts (time-series preprocessing), dygraphs.

Knitted together with Shiny and Shiny Dashboard. Hosted on Shiny Server with masked forwarding from GoDaddy.

About Author

Kamal Sandhu

Kamal Sandhu is a finance professional keenly interested in the potential of data science in combination with financial and management theory. He is working towards the Chartered Financial Analyst (CFA) program and the Financial Risk Manager (FRM) program....
