Agile Workflow Workshop I: Light-Weight Dashboard and Reporting Workflows with R

Posted on Apr 1, 2014

Many thanks go to Betterment, the sponsor of this event, for providing the food, space and drinks, and to Big Data Meetup, the co-host of this event!
-----------------------------------


Meetup Announcement:

Speakers:

Yuriy Goldman, Lead Engineer at Betterment.

Jon Mauney, Data and Behavioral Analyst at Betterment.

Outline:

R is quickly becoming a must-have tool in a data worker's tool-belt. You know its potency for statistical analysis and charting, but did you know that it can serve you well in a light-weight dashboarding and reporting function too?

In this hands-on session, Jon and Yuriy showed how R could be integrated into an existing agile development environment to enable a workflow by which any R beginner or expert could produce, deploy, and schedule R-driven reports and dashboards. No license fees and no proprietary software: just a tiny bit of DevOps elbow grease to create automation against readily available and familiar open source tools.

The walkthrough dealt with the stack currently humming at Betterment (R Studio, S3, Linux, MySQL, GitHub, Jenkins, Confluence, a shell script, Cron), but it's quite easy to swap in alternative components to fit your own environment. It is the workflow that matters.

-----------------------------------

Other Useful Links:

GitHub Repo: https://github.com/ygoldman/rwizflowy

1. RWizFlowy

Combines the nimbleness and expressive power of R with some engineering elbow grease to enable agile and collaborative authoring of Reports and Dashboards in R.

Setup doc for local development: https://docs.google.com/document/d/1AcbX8aH_UnYDHx8wU75xRuteSZdMpJIxLxl2TpH0K-c/edit?usp=sharing

Slides from the presentation: https://docs.google.com/a/betterment.com/presentation/d/1ekIdY-VpPeOWWC7VgD2aBsT9IU34I9K_98eD_zmuFAM/edit#slide=id.g1dd976d0a_059

2. The Workflow

1) Authoring. Data Analyst:

  • check out your project from GitHub
  • use familiar tooling on your workstation to produce static or interactive visualizations from a database
  • output the visualizations to a locally mounted network drive, such as an S3 bucket hosted in Amazon’s cloud
  • a web server exposes each visualization at a predictable URL
  • review the produced content in your favorite browser
  • iframe or src the visualization from a new page on your favorite wiki
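
As a rough illustration of the authoring steps above, the mapping from a report script to its output file and its predictable URL could be wrapped in a few shell helpers. This is only a sketch, not the project's actual script: the mount point `/mnt/reports`, the base URL `https://reports.example.com`, and the function names `report_output_path`, `report_url`, and `run_report` are all hypothetical, and the R script is assumed to take its output path as its first command-line argument.

```shell
#!/bin/sh
# Hypothetical defaults: point these at your own mounted drive and web server.
REPORTS_MOUNT="${REPORTS_MOUNT:-/mnt/reports}"
REPORTS_BASE_URL="${REPORTS_BASE_URL:-https://reports.example.com}"

# Map an R script path to the file it should write on the mounted drive.
report_output_path() {
    name=$(basename "$1" .R)
    printf '%s/%s.html\n' "$REPORTS_MOUNT" "$name"
}

# Map the same script to the URL a wiki page would iframe or src.
report_url() {
    name=$(basename "$1" .R)
    printf '%s/%s.html\n' "$REPORTS_BASE_URL" "$name"
}

# Run one report. Assumes the R script reads its output path from
# commandArgs(trailingOnly = TRUE)[1].
run_report() {
    Rscript "$1" "$(report_output_path "$1")"
}
```

With these defaults, `report_url reports/signups.R` prints `https://reports.example.com/signups.html`, which is the address you would open in the browser or embed in the wiki.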

2) Staging. Data Analyst:

  • add a runtime schedule to /etc/cron to designate how often you want the script to run in production
  • commit and push your code to GitHub. Ideally, open a Pull Request from your branch into a shared branch and get your script peer reviewed.
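
For readers new to cron, a schedule entry is five time fields followed by the command to run. The sketch below composes such a line; the helper name `cron_line` and the path `/opt/reports/signups.R` are hypothetical, and the exact file layout the project expects may differ.

```shell
#!/bin/sh
# Build one crontab line for a report script.
# Arguments: <schedule as five cron fields> <path to the R script>
cron_line() {
    printf '%s Rscript %s\n' "$1" "$2"
}

# Example: run a hypothetical signups report every day at 06:00.
cron_line "0 6 * * *" "/opt/reports/signups.R"
```

Cron runs jobs with a minimal environment, so on some servers the entry may need the full path to `Rscript` rather than the bare command name.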

3) Deployment. GitHub and Travis-CI:

  • once your code is merged into the master branch, Travis-CI checks out your code
  • based on the .travis.yml config in this project, Travis will set up a staging environment, run any tests, and pass or fail the build
  • if the build passes, Travis will git push to a remote repository configured on your EC2 instance
  • a post-receive hook on the EC2 instance's git repo will deploy the latest code, update cron, and set permissions
  • next time cron executes, your script will run against production data and update the visualizations in the S3 bucket (the S3 bucket is also mounted on the EC2 server)
  • reload your wiki and your visualizations are now updating over time
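
The post-receive step above might look roughly like the following. This is a sketch under stated assumptions rather than the hook shipped in the repository: the deploy directory `/opt/reports`, the schedule file `etc/cron`, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of a post-receive hook. All paths below are hypothetical.
DEPLOY_DIR="${DEPLOY_DIR:-/opt/reports}"
CRON_FILE="${CRON_FILE:-etc/cron}"

# Only pushes to master should trigger a deploy.
should_deploy() {
    [ "$1" = "refs/heads/master" ]
}

deploy() {
    # Check the pushed code out into the deploy directory.
    git --work-tree="$DEPLOY_DIR" checkout -f master
    # Install the schedule committed with the project.
    crontab "$DEPLOY_DIR/$CRON_FILE"
    # Let the owning user read and run the deployed scripts.
    chmod -R u+rx "$DEPLOY_DIR"
}

# git passes one "<old-sha> <new-sha> <ref>" line per updated ref on stdin.
handle_push() {
    while read -r _old _new ref; do
        if should_deploy "$ref"; then
            deploy
        fi
    done
}

# In an installed hook, the final line would simply be: handle_push
```

Keeping the ref check in its own function makes it easy to deploy from a different branch, or to exercise the hook logic without touching the server.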

3. The Stack

This project provides scaffolding for

  • G. GitHub
  • R. R-language
  • E. Engineering Elbow Grease to glue everything together
  • A. Amazon Web Services: EC2 server, S3 Bucket for network storage, RDS running MySQL
  • T. Travis-CI: for continuous integration and deployment via GitHub
