Data Study on Insider Trading

Posted on Aug 21, 2016


Insider trading is often associated with the illegal activity of trading in shares of one's own company based on material non-public information. But insider trading is not always illegal. It is not illegal to own, or to buy and sell, shares of the company you work for, as long as the transactions are disclosed publicly in a timely manner and the information used to trade is publicly available. This project focuses on the legal element of insider trading and its potential impact on short-term stock prices.


Technical trading schools often tout the relationship between insider transactions and stock prices. Some banks include this information as one of several indicators in a composite score of the relative strength of a stock. I wanted to do my own analysis to see if this relationship holds up, especially over a short-term time frame (i.e. 1 to 5 days after an insider transaction).

If there is a relationship, then with some additional analysis, it could be converted to a trading strategy.

Some additional questions I wanted to address:
- Are there any differences across sectors when it comes to the aforementioned relationship between Insider transactions and price?
- Does time come into play at all, and are there seasonality factors that I should be taking into account?
- What machine learning methodology would perform the best in predicting price behavior after an insider transaction?

The data

To address my objectives, I web scraped and retrieved ~40,000 records of publicly disclosed insider transactions from the past year (August 1st, 2015 to August 5th, 2016) that involved a transaction size of 10,000 shares or more.

I matched the ~40,000 transactions to ticker information (through the quantmod package in R) and appended time-specific information for the 5 days following the date of each insider transaction. After discarding OTC (over-the-counter) stocks and any symbols that weren't listed on the NASDAQ, NYSE or AMEX exchanges, I was left with 28,769 records:
-- 13,202 insider stock purchases, and
-- 15,567 insider stock sales

Exploratory Data Analysis

The main outcome variable I wanted to look at was the percentage increase or decrease in price anywhere from one day to five days after an insider transaction.

The chart below shows us the average percentage change in price following an insider transaction. The price change per day has been taken cumulatively, i.e. the price change depicted for Day 5 is the difference in the closing price five days after the transaction vs. the closing price on the day of the transaction. Theoretically someone can trade off an insider trading signal much sooner than the close of market. The closing price on the day of the transaction is a much more conservative number for average traders like myself who would not be waiting at their computers for a signal, itching to make a trade. So all prices you will see in this blog are closing prices.
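As a minimal sketch of the cumulative calculation described above (written in Python rather than the R used for this project, with made-up closing prices and a hypothetical function name), the change for each day is measured against the transaction-day close:

```python
def cumulative_pct_changes(closes, horizon=5):
    """Percentage change of each later close vs. the transaction-day close.

    closes[0] is the closing price on the day of the insider transaction;
    closes[d] is the closing price d trading days later.
    """
    base = closes[0]
    last_day = min(horizon, len(closes) - 1)
    return [100.0 * (closes[d] - base) / base for d in range(1, last_day + 1)]


# Hypothetical closing prices, for illustration only (not the author's data)
closes = [10.00, 10.10, 10.05, 10.20, 10.30, 10.50]
changes = cumulative_pct_changes(closes)
print([round(c, 2) for c in changes])
```

The last value is the Day 5 figure: the close five days after the transaction compared with the close on the transaction day itself, not with the previous day's close.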

The chart shows that there appears to be a relationship between insider transactions and price changes. Following an insider sale (yellow part of the bar), we are seeing a decline in price, and following an insider purchase we are seeing an increase in price, which is exactly the behavior we want to see. Moreover the extent of change increases as the days go by. We also see an interesting difference between stocks priced less than $5 vs. those over $5. Stocks priced less than $5 are generally more volatile than higher priced stocks. Volatility can be good though for a trading strategy.

Volume is often referenced as a confirmation signal for identifying support behind a stock's price movement. The volume chart on the right confirms this difference between lower and higher priced stocks. More importantly, this chart is showing that following an insider transaction, on average, volume picks up, compared to the day of the transaction, and stays up for the next couple of days. We don't see any negative numbers in this case.

Past year Data trends

Looking at the entire set of transactions over the past year, the volume of insider trades seems to have spiked late last year, in November and December, and there seems to be a lull in summer.  The behavior in the spring and summer months doesn't seem to be atypical of the general market. Comparing the volume of insider transactions to the volume bars below the price chart of the general market, it seems that the November spike does not gel across the two sources of information. It may be useful to factor volume differences between Insider trading and general market trading in a trading strategy because it may be highlighting potential imbalances not apparent in the market.

Figure: Price Change by Month
The SPY (pictured in the chart above, on the right) is an exchange-traded fund (ETF) that tracks the S&P 500 index. It has been quite choppy all year, ending up a little higher than where it started out in August last year. Interestingly, the average percentage change in price following an insider transaction (right) follows the same pattern regardless of transaction side. For example, in February there was a large spike in the market, and the price change over the 5 days following an insider transaction (regardless of whether it was a purchase or a sale) was an increase in price. So overall market performance seems to have an impact, and it should be factored into a good trading strategy.

Differences by Sector

Looking at price changes by sector (below), we see that the Energy sector responds in the most favorable way after an insider buy side transaction. On average it has moved 4.88% following an insider transaction.

On the short side of things, the Energy sector again features as the leading mover (-1.6% on average) in the direction we would want the stock to move after insiders sell a portion of their shares. Health care comes next with a -1.2% move.
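A sector comparison like this one comes down to averaging the price change within each side and sector. The sketch below is one way to do that in Python (not the R code used for this project); the function name and the sample records are hypothetical.

```python
from collections import defaultdict


def mean_change_by_group(records):
    """Average % price change for each (side, sector) pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for side, sector, pct_change in records:
        sums[(side, sector)] += pct_change
        counts[(side, sector)] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical records: (side, sector, % price change after the transaction).
# These numbers are made up and are not the author's data.
records = [
    ("Buy", "Energy", 6.0),
    ("Buy", "Energy", 4.0),
    ("Sell", "Energy", -2.0),
    ("Sell", "Health Care", -1.0),
    ("Sell", "Health Care", -1.5),
]
averages = mean_change_by_group(records)
for (side, sector), avg in sorted(averages.items()):
    print(side, sector, round(avg, 2))
```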

Figure: Price change by sector, after an insider purchase (left) and after an insider sale (right)

Following that exploratory data analysis, the next thing I wanted to address was a choice of machine learning methodology that would appropriately model the overall relationship between insider transactions and price. I first attempted multiple linear regression, but the diagnostic plots for the model's assumptions suggested that the data violated normality and equality of variances pretty egregiously. Additionally, there seemed to be a fair number of high-leverage and high-residual points, as indicated by the plot on the bottom right of the chart grid below.


I did not want to remove outliers in this case, because a high volume stock for example, could actually be a good thing for a trading strategy.

So I moved on to a k-nearest neighbors (KNN) classification method, mainly because it places fewer assumptions on the data. As a first step, I included all predictor variables to see how the model would work. I wanted to use a mix of categorical and continuous variables, so I first converted the categorical variables to a 0-1 scale and normalized the continuous variables to percentiles between 0 and 1. This puts all the variables on a level playing field, so that no one variable overshadows the others in the model.
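The scaling step can be sketched as follows. This is a Python illustration rather than the R code used for this project, and it rests on two assumptions: that "percentile" means the share of observations less than or equal to a value, and that putting categorical variables on a 0-1 scale means one indicator column per level. The function names and sample values are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(values):
    """Scale values to (0, 1] as the share of observations <= each value."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def one_hot(categories):
    """Encode each category as a row of 0/1 indicators, one column per level."""
    levels = sorted(set(categories))
    rows = [[1.0 if c == level else 0.0 for level in levels] for c in categories]
    return levels, rows


# Hypothetical values, for illustration only
share_prices = [5.0, 1.0, 3.0, 100.0]
sides = ["Buy", "Sell", "Buy", "Sell"]

scaled_prices = percentile_rank(share_prices)  # [0.75, 0.25, 0.5, 1.0]
levels, side_rows = one_hot(sides)             # levels == ["Buy", "Sell"]
print(scaled_prices, levels, side_rows)
```

A rank-based scale like this keeps extreme values in the data set while stopping them from dominating the distance between two records, which fits the decision not to remove outliers.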

Model Details

Model Predictors

Side (“Buy”/“Sell”) | Sector | Share Price | # of Shares | Transaction Value | Remaining Shares Post Transaction | Exchange | Month | % Volume Change 1 Day after Transaction | Market Capitalization

Target Variable

% Price Change 5 days after Transaction (Categorized: <-1.5%|-1.5% to 1.5% |>1.5%)

Training Set

23,857 records (10,481 buy side | 13,376 sell side)

Prediction Set

3,945 records (1,949 buy side | 1,996 sell side) — Only records from August

Number of neighbors (k): 167 (approximately the square root of the number of records in the data set)
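To make the setup above concrete, here is a small, self-contained sketch of the three pieces: binning the 5-day price change into classes, picking k as the square root of the record count, and classifying by majority vote among the nearest neighbors. It is written in Python with hypothetical names and toy data, not the R code used for this project. Euclidean distance is an assumption, since the distance metric is not stated here. Note that 167 matches the square root of the training and prediction sets combined (23,857 + 3,945 = 27,802 records).

```python
import math
from collections import Counter


def label_change(pct_change, threshold=1.5):
    """Bin a 5-day % price change into the three target classes."""
    if pct_change < -threshold:
        return "down"
    if pct_change > threshold:
        return "up"
    return "flat"


def choose_k(n_records):
    """Rule of thumb: k is roughly the square root of the number of records."""
    return round(math.sqrt(n_records))


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training rows closest to the query.

    Uses Euclidean distance (an assumption). Ties in the vote go to the
    class seen first among the nearest neighbors.
    """
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy, already-scaled feature rows and hypothetical 5-day % changes
train_X = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [1.0, 0.9], [0.5, 0.5]]
train_y = [label_change(p) for p in [-3.0, -2.0, 4.0, 2.5, 0.3]]

prediction = knn_predict(train_X, train_y, query=[0.05, 0.05], k=3)
print(choose_k(23857 + 3945), prediction)
```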

The table shows the output of the model, with predicted values in the columns and true values in the rows. The diagonal represents the accurate predictions. At first glance, this model seems to perform okay, with 57% of the predicted drops in price being accurate. Note, however, that the data set itself contains a higher number of records indicating a drop in price (i.e. more sell-side transactions in general in the data).

Chi-Square Test

I ran a chi-square test of independence on the model's predictions, and the observed values could not be distinguished from the values expected given the distribution of the data. In other words, this model was not significantly better than simply guessing based on the distribution seen in the data.
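For readers who want to see the mechanics, the sketch below runs a chi-square test of independence on a 3x3 confusion matrix (true classes in the rows, predicted classes in the columns). It is a Python illustration with a made-up table, not the test output from this analysis, and applying the test to the confusion matrix is an assumption about the setup. The p-value helper uses the closed-form chi-square tail probability that holds for even degrees of freedom, which covers a 3x3 table (4 degrees of freedom).

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_p_value_even_dof(stat, dof):
    """Upper-tail probability of a chi-square statistic; even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form used here needs a positive, even dof")
    half = stat / 2.0
    term, series = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        series += term
    return math.exp(-half) * series


# Hypothetical confusion matrix (rows: true class, columns: predicted class)
confusion = [
    [30, 0, 0],
    [0, 30, 0],
    [0, 0, 30],
]
stat, dof = chi_square_independence(confusion)
p_value = chi2_p_value_even_dof(stat, dof)
print(stat, dof, p_value)
```

A small p-value means the predictions are related to the true classes. A large one means the table looks like what guessing from the class distribution alone would produce.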

I ran the model again, this time filtering for stocks priced below $5, for their greater volatility. I also changed the threshold of my price groups of interest to ±1% to be more inclusive for the 1st and 3rd groups, and removed the month variables from my predictor set since they did not add much explanatory power to the model.


The results are included in the table here and, while there are fewer predictions overall (because of the filtered data set), the model performs better. The chi-square test of independence shows a p-value that is not significant (0.13), so we retain the null hypothesis: the predicted vs. expected values cannot be shown to be dependent on each other.


I am happy to have been successful in modeling a crude relationship between insider transactions and stock prices in the short term, and in selecting a machine learning methodology that works as a good starting point. Clearly a lot of improvements can still be made to the model. For one, I would love to look at factoring overall market behavior (both price and volume) into the model. Historic average monthly market performance could be a powerful predictor that could be added into the mix.

I would also love to look a bit more closely at running models for specific sectors -- especially the Energy & Health care sectors, since they have shown a stronger relationship with insider transactions in this analysis.

Again, this is just the beginning of my analysis of such insider data, and as such I would very strongly advise anyone reading not to make any trades based on the information given here, but to use it as a learning tool to better understand the relationships that exist between insider transactions and stock prices.
