Web Scraping WhaleWisdom
In this project I tried to answer whether institutional investor behavior is correlated with stock returns for Anaplan (NYSE:PLAN). I scraped whalewisdom.com, a website that aggregates 13F filing data from institutional investment firms. A 13F filing is essentially a quarterly list of the equity holdings of a larger investment firm.
Here is a sample of the specific page that I was trying to web scrape:
The main problems that I ran into were:
- Data outputting to a csv file and being completely
- Having to manually scroll to get to the table with the rows I needed to scrape near the bottom of the page
- The data was in a table. I first tried to scrape the entire table at once, but I could not loop through the result, so I had to scrape the data from each individual row
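The row-by-row approach from the last bullet can be sketched as follows. This is a minimal illustration using only Python's standard library, not the code from the project: the HTML snippet, the column names, and the `RowParser` class are made up for the example, and the real WhaleWisdom markup will differ.

```python
import csv
import io
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text pieces of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical markup standing in for the holdings table.
SAMPLE_HTML = """
<table>
  <tr><th>Holder</th><th>Type</th><th>Shares</th></tr>
  <tr><td>Example Ventures</td><td>Venture Capital</td><td>1,000</td></tr>
  <tr><td>Sample Capital</td><td>Hedge Fund</td><td>250</td></tr>
</table>
"""

parser = RowParser()
parser.feed(SAMPLE_HTML)
header, *records = parser.rows

# Write one CSV line per scraped row; the csv module quotes values such as "1,000".
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(records)
csv_text = buffer.getvalue()
```

Handling one row at a time keeps every record as a plain list, which is easy to loop over and to hand to `csv.writer`.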
I eventually found a way to click out of the newsletter pop-up box. Here is a link to the gist of code that helped the most with this problem: https://gist.github.com/j-gonzal/7ddc4b3c6378b712560c9b043363dc63
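The general pattern for closing a pop-up and then scrolling to a table looks roughly like this. It is a minimal sketch that assumes a Selenium WebDriver; it is not the code from the gist, and the function names and any CSS selectors passed in are placeholders rather than the real ones on whalewisdom.com.

```python
# The string "css selector" is the value of Selenium's By.CSS_SELECTOR,
# so this sketch needs no Selenium import of its own.
CSS = "css selector"


def dismiss_popup(driver, close_selector):
    """Click the pop-up's close button if it is present; return True if clicked."""
    buttons = driver.find_elements(CSS, close_selector)  # empty list if absent
    if not buttons:
        return False
    buttons[0].click()
    return True


def scroll_to(driver, selector):
    """Scroll the first element matching `selector` into view and return it."""
    element = driver.find_element(CSS, selector)
    driver.execute_script("arguments[0].scrollIntoView();", element)
    return element
```

With a real browser session, these would be called after `driver.get(...)`, for example `dismiss_popup(driver, "button.close")` followed by `scroll_to(driver, "table")`, where both selectors are hypothetical.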
The data I was trying to scrape was an overview of the institutional investors that held Anaplan's stock. Here are some good examples:
I thought I would be able to understand the behavior of certain institutions from their type. For example, I expected venture capital funds to be long-term oriented, while firms like hedge funds would trade a lot more.
In the future I would like to extend this project and see whether there is a real correlation or trading signal to be discovered in this data. If not, I would like to build a shareholder profile showing the types of firms that hold the stock, so that average investors can see who they are investing alongside.
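A shareholder profile of this kind could start as a simple tally by firm type. The sketch below is only an illustration: the holder names, firm types, and share counts are invented placeholders, not scraped data, and `profile_by_type` is a hypothetical helper.

```python
from collections import defaultdict

# Invented placeholder rows: (holder, firm type, shares held).
holdings = [
    ("Example Ventures", "Venture Capital", 600),
    ("Sample Capital", "Hedge Fund", 150),
    ("Placeholder Partners", "Hedge Fund", 250),
]


def profile_by_type(rows):
    """Return each firm type's fraction of total institutional shares (0 to 1)."""
    totals = defaultdict(int)
    for _holder, firm_type, shares in rows:
        totals[firm_type] += shares
    grand_total = sum(totals.values())
    return {firm_type: shares / grand_total for firm_type, shares in totals.items()}


profile = profile_by_type(holdings)
```

Repeating the same tally for each quarter's 13F data would show how the mix of firm types shifts over time.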
Although I did manage to scrape the data successfully after the presentation, I had not found any meaningful correlation that could act as a signal for higher future investment returns by the time I presented the project. I still learned a lot from the exercise, though.
The best feedback I got was not to undertake such a pie-in-the-sky project while I was so new to data science. There are professional quants on Wall Street with a great deal of experience who are looking for the same kinds of trading signals that I was trying to find. I realize now that it was not a good idea for a first project.
The code, data, and PowerPoint that I used for the project presentation are all available on my GitHub: https://github.com/j-gonzal/web_scraping_project