Shiny to Crime Forecasting Challenge by National Institute of Justice

Posted on Feb 6, 2017


I was reading the news and came across the article above: a 70-year-old woman in Portland was raped in broad daylight, and the rapist went back to mowing the lawn after committing this heinous crime.

Thomas, in and out of jail from 2008 to 2013, was on parole with minimum surveillance, as set by the judge. How is this surveillance level decided? By asking a set of 100+ questions, which many judicial systems in the United States use to assess the further risk a criminal poses to society.

Thomas was out, rated as low risk, when he committed the rape. The reason the analytical system got it wrong is simple: Thomas lied about his age in the questionnaire, giving it as 19 when he was 50, which lowered his assessed risk of committing another crime to low. I wanted to see whether there are more such patterns in crimes committed by the same person, and whether this analytical system works or is just leading us the wrong way.

I began gathering data; my first source was , and I also wanted to create a way to visualize the crimes in Portland and use it for further analysis of which areas could be put under heavier police patrol.

My data set finally looks like:


The one thing I was missing was coordinates. I found out that the police department in Portland uses a UTM coordinate system with 7-digit and 6-digit values. In the Google Earth toolkit menu there is a function that converts between latitude/longitude and Northing/Easting, so I resolved this issue by converting the coordinates to latitudes and longitudes.
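As a rough illustration of that conversion, here is a minimal sketch of the standard inverse transverse Mercator series. It assumes the values are UTM Easting/Northing in metres on the WGS84 ellipsoid, in the northern hemisphere, and that Portland falls in zone 10; none of these details are confirmed by the data set, and the function name `utm_to_latlon` is made up for this example. Higher-order terms of the series are dropped, so treat the result as approximate.

```python
import math

# WGS84 ellipsoid and UTM constants (assumed, not taken from the data set).
A = 6378137.0            # semi-major axis in metres
E2 = 0.00669438          # first eccentricity squared
K0 = 0.9996              # UTM scale factor on the central meridian
EP2 = E2 / (1 - E2)      # second eccentricity squared
E1 = (1 - math.sqrt(1 - E2)) / (1 + math.sqrt(1 - E2))


def utm_to_latlon(easting, northing, zone):
    """Convert northern-hemisphere UTM metres to (lat, lon) in degrees."""
    x = easting - 500000.0            # remove the false easting
    m = northing / K0                 # meridian arc length
    mu = m / (A * (1 - E2 / 4 - 3 * E2**2 / 64 - 5 * E2**3 / 256))

    # Footpoint latitude from the series in mu.
    phi = (mu
           + (3 * E1 / 2 - 27 * E1**3 / 32) * math.sin(2 * mu)
           + (21 * E1**2 / 16 - 55 * E1**4 / 32) * math.sin(4 * mu)
           + (151 * E1**3 / 96) * math.sin(6 * mu)
           + (1097 * E1**4 / 512) * math.sin(8 * mu))

    sin_phi, cos_phi, tan_phi = math.sin(phi), math.cos(phi), math.tan(phi)
    t = tan_phi ** 2
    c = EP2 * cos_phi ** 2
    w = 1 - E2 * sin_phi ** 2
    n = A / math.sqrt(w)              # prime vertical radius of curvature
    r = A * (1 - E2) / w ** 1.5       # meridian radius of curvature
    d = x / (n * K0)

    # Truncated series: terms beyond d^4 (lat) and d^3 (lon) are dropped.
    lat = phi - (n * tan_phi / r) * (
        d**2 / 2 - (5 + 3 * t + 10 * c - 4 * c**2 - 9 * EP2) * d**4 / 24)
    lon = (d - (1 + 2 * t + c) * d**3 / 6) / cos_phi

    central_meridian = (zone - 1) * 6 - 180 + 3
    return math.degrees(lat), math.degrees(lon) + central_meridian
```

A point on the central meridian at the equator, `utm_to_latlon(500000, 0, 10)`, maps to latitude 0 and longitude -123, which is a quick sanity check before converting a whole data set.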

What happened next was more painful. If you have been to Portland or seen it on a map, it looks like my plot: a river runs through the middle, and an H-shaped highway passes through the city, shown in grey.

The dots are all the crimes recorded in 2012 alone. When I began plotting them in Google Earth, the points appeared in Alberta, Canada rather than in Portland, USA. I believe the shift was introduced into the data set on purpose because of privacy concerns. I tried several fixes: initially I thought it was a zone shift, but that hypothesis failed, and I then tried several formulas until the Euclidean principle finally showed a positive result. I reduced each latitude and longitude by the recorded distance between Alberta and Portland. Result:
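One way to read that fix is as a constant offset subtracted from every point. The sketch below assumes the offset is estimated from a single reference location whose plotted and true positions are both known. The function names and the coordinates in the usage note are made up for illustration and are not the values used in the analysis.

```python
def offset_from_reference(plotted, actual):
    """Offset (dlat, dlon) between where a known place was plotted and where it is."""
    return (plotted[0] - actual[0], plotted[1] - actual[1])


def shift_points(points, offset):
    """Subtract the same (dlat, dlon) offset from every (lat, lon) point."""
    dlat, dlon = offset
    return [(lat - dlat, lon - dlon) for lat, lon in points]
```

For example, if a landmark known to sit at (45.5, -122.5) was plotted at (53.5, -113.5), the offset is (8.0, 9.0), and `shift_points` moves every other point by that same amount.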


The aim was to create an application where everyone can see and visualize which areas are more or less affected by crime, so I used R Shiny to build an app that displays the crimes; it is much more interactive than what I had achieved so far.


The application lets you move anywhere on the map, and the control panel lets you visualize crimes between specific dates. You can also choose which type of crime you would like to visualize. A link to the app is given here .
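The app itself is written in R Shiny and its code is not shown here. The filtering behind such a control panel can be sketched in Python as follows; the `Crime` record and its field names are assumptions made for this example, not the app's actual data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Crime:
    category: str      # e.g. "BURGLARY" (hypothetical label)
    occurred: date
    lat: float
    lon: float


def filter_crimes(crimes, start, end, categories=None):
    """Keep crimes within [start, end]; optionally restrict to given categories."""
    return [
        c for c in crimes
        if start <= c.occurred <= end
        and (categories is None or c.category in categories)
    ]
```

Passing `categories=None` returns every crime type in the date range, which mirrors a control panel with no crime type selected.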

By visualizing the crimes by date, I could see crimes repeating in the same location, and their patterns, over the years.


I could not show all the pictures, but when you check the app, the dates move all the way to November 2012 on the same spot, which shows a pattern of similar crimes at the same location. There are thousands of clusters like this in Portland. After getting my first result, I moved on to analyse further reasons why the police fail to detect such patterns.

I found that Portland has a total police force of 1,000 active on-duty cops and 200 reserves, plus 300 civilian agents. According to the Census Bureau, the population of Portland is 609,892.

So that makes 1 cop for every 510 people, whereas in NYC there is 1 cop for every 58 people. Another reason I found for the failing system is money: how long prisons can keep criminals locked up depends not just on the seriousness of the crime but also on the amount of money spent to keep them in detention.
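The Portland ratio can be checked with a short calculation using the figures quoted above. Which officers to count is an assumption: active plus reserve gives about 508 residents per officer, close to the 510 quoted, while active officers alone give about 610.

```python
def residents_per_officer(population, officers):
    """Number of residents for each police officer."""
    return population / officers


population = 609_892            # Portland population quoted above
active, reserve = 1_000, 200    # officer counts quoted above

print(round(residents_per_officer(population, active + reserve)))  # 508
print(round(residents_per_officer(population, active)))            # 610
```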

On average, $69 is spent on a single prisoner every day, so the cost of keeping someone in prison is very high; this adds up to $59 billion annually for all the states combined.

The failure of the current analytical system has also led to a rise in crime over the years. The most frequent crime types are mostly accidents and burglaries, which are not that serious compared to crimes in other states, but if people do not change their mindset about this situation, the numbers will keep growing.


This is crime in April 2012: there were about 1,000 accidents a month and about 1,800 burglary cases, while more serious crimes such as shooting and stabbing (cold) were as low as 10 to 20.

This is crime in August 2016:


The total rise can now be seen clearly. What caught my attention was the rise in burglaries and shootings: the number of shooting cases almost doubled to 50, and burglary cases rose to 2,300 a month from 1,800 in April 2012.
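To put the burglary figures in relative terms, here is a small sketch of the percentage change between the two monthly counts quoted above. The helper name `percent_change` is chosen for this example.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Burglary cases per month: April 2012 versus August 2016.
print(round(percent_change(1800, 2300), 1))  # 27.8
```

Note that the two months fall in different seasons, so part of this difference may reflect the time of year rather than a year-over-year rise.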

There are many seasonal patterns in the data: overall crime is highest in July every year and lowest in December and January. One reason is the weather, since crime slows down in the cold. Yet the number of shooting cases peaks in October for some reason, and burglaries are highest in July and August. Overall, things in Portland get worse in the second half of every year.
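A seasonal pattern like this can be found by counting records per calendar month. The sketch below assumes each crime record carries a date; the function names are made up for this example and this is not necessarily how the original analysis was done.

```python
from collections import Counter
from datetime import date


def crimes_per_month(dates):
    """Count records per calendar month (1 to 12), pooled across years."""
    return Counter(d.month for d in dates)


def peak_month(dates):
    """Calendar month with the most records."""
    counts = crimes_per_month(dates)
    return max(counts, key=counts.get)
```

Running the same count on one crime type at a time is how a month such as October for shootings would stand out from the overall July peak.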

The analysis left me with some conclusions, yet crime in Portland has not changed today. I decided to take a stand and looked for a way to contribute to this issue. The NIJ launched a Crime Forecasting Challenge. The goals of the challenge are given below.

  1. Encourage "nontraditional" crime forecasting researchers to compete against more "traditional" crime forecasting researchers.
  2. Compare available crime forecasting methods.
  3. Improve place-based crime forecasting.

My next step is to use the kNN algorithm to train on my data set and build a predictive model that can help the police department concentrate its resources in the regions most likely to be active crime regions on a given day. The model could remain efficient and accurate for up to 3 months, until the crime patterns begin to change again, at which point we would need a new data set to train a new model.
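For readers unfamiliar with kNN, here is a minimal pure-Python sketch of the idea: a new location gets the majority label of its k nearest training points. The features (latitude, longitude) and the "high"/"low" activity labels are assumptions for illustration; the planned model may use different features and a library implementation.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbours.

    `training` is a list of (features, label) pairs, where features is a
    tuple of numbers with the same length as `query`.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training points: (lat, lon) with an activity label.
training = [
    ((45.52, -122.68), "high"),
    ((45.53, -122.67), "high"),
    ((45.51, -122.69), "high"),
    ((45.60, -122.50), "low"),
    ((45.61, -122.51), "low"),
]
print(knn_predict(training, (45.525, -122.675)))
```

Plain Euclidean distance on degrees treats latitude and longitude as equally scaled, which is only a rough approximation at Portland's latitude; projected coordinates would be a safer input.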


About Author

Arjun Singh Yadav

Arjun received his Bachelor's degree in Mechatronics Engineering from SRM University in India. Soon afterwards, he competed in DARPA to build an autonomous vehicle to help the blind and disabled, where he used a Python-based algorithm to learn...