Agile Workflow Workshop I: Light-Weight Dashboard and Reporting Workflows with R
Many thanks go to Betterment, the sponsor of this event, for providing the food, space and drinks, and to Big Data Meetup, the co-host of this event!
Speakers: Yuriy Goldman, Lead Engineer at Betterment.
Jon Mauney, Data and Behavioral Analyst at Betterment.
R is quickly becoming a must-have tool in a data worker's tool-belt. You know its potency for statistical analysis and charting, but did you know that it can serve you well in a light-weight dashboarding and reporting function too?
In this hands-on session, Jon and Yuriy showed how R could be integrated into an existing agile development environment to enable a workflow by which any R beginner or expert could produce, deploy, and schedule R-driven reports and dashboards. No license fees and no proprietary software: just a tiny bit of DevOps elbow grease to create automation against readily available and familiar open-source tools.
The walkthrough dealt with the stack currently humming at Betterment (RStudio, S3, Linux, MySQL, GitHub, Jenkins, Confluence, a shell script, Cron), but it's quite easy to swap in alternative components to fit your own environment. It is the workflow that matters.
Other useful links:
GitHub repo: https://github.com/ygoldman/rwizflowy
Combines the nimbleness and expressive power of R with some engineering elbow grease to enable agile and collaborative authoring of Reports and Dashboards in R.
Setup doc for local development: https://docs.google.com/document/d/1AcbX8aH_UnYDHx8wU75xRuteSZdMpJIxLxl2TpH0K-c/edit?usp=sharing
2. The Workflow
1) Authoring. Data Analyst:
- check out your project from GitHub
- use familiar tooling on your workstation to produce static or interactive visualizations from a database
- visualizations are output to a locally mounted network drive such as an S3 bucket hosted in Amazon’s Cloud
- a web server enables web access to the visualization via a predictable URL
- review the produced content in your favorite browser
- a new page on a favorite wiki can then iframe or src the visualization
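The "predictable URL" idea in the authoring steps can be sketched as a pair of shell helpers. This is only an illustration: the mount point /mnt/s3-reports, the host reports.example.com, the report name and the function names are all hypothetical and do not come from the project.

```shell
#!/bin/sh
# Hypothetical defaults: neither the mount point nor the host is taken from the project.
REPORT_ROOT="${REPORT_ROOT:-/mnt/s3-reports}"
REPORT_BASE_URL="${REPORT_BASE_URL:-https://reports.example.com}"

# Print the path on the mounted drive where the report named $1 would be written.
report_path() {
    printf '%s/%s/index.html\n' "$REPORT_ROOT" "$1"
}

# Print the URL at which a web server in front of the same bucket would expose it.
report_url() {
    printf '%s/%s/index.html\n' "$REPORT_BASE_URL" "$1"
}
```

Because both helpers derive their result from the report name alone, an R script can write to "$(report_path weekly-signups)" while the wiki page embeds "$(report_url weekly-signups)", and the two stay in step.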
2) Staging. Data Analyst:
- add a runtime schedule to /etc/cron to designate how often you want the script to run in production
- commit and push your code to GitHub. Ideally, open a Pull Request from your branch into a shared branch and get your script peer reviewed.
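A runtime schedule of this kind might look like the line produced below. The sketch assumes the system-wide crontab layout (five schedule fields, then a user, then the command); the function name, the reports user and the script path are hypothetical, and the project's own cron file may be laid out differently.

```shell
#!/bin/sh
# Print one system-crontab line: schedule, user, then the command to run.
# All arguments are illustrative; none are taken from the project.
cron_line() {
    schedule="$1"
    user="$2"
    script="$3"
    printf '%s %s Rscript %s\n' "$schedule" "$user" "$script"
}

# Run a (hypothetical) report every day at 06:00 as the "reports" user.
cron_line '0 6 * * *' reports /opt/reports/weekly_signups.R
# prints: 0 6 * * * reports Rscript /opt/reports/weekly_signups.R
```

Cron runs jobs with a minimal PATH, so in practice the full path to Rscript may be needed in the command.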
3) Deployment. GitHub and Travis-CI:
- once your code is merged into the Master branch, Travis-CI checks out your code
- based on the .travis.yml config in this project, Travis will set up a staging environment, run any tests, and pass or fail the build
- if the build passes, Travis will git push to a remote repository configured on your EC2 instance
- a post-receive hook on the EC2 instance's git repo will deploy the latest code, update cron, and set permissions
- next time cron executes, your script will run against production data and update the visualizations in the S3 bucket (S3 bucket is also mounted on the EC2 server)
- reload your wiki and your visualizations are now updating over time
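The post-receive step described above could be sketched roughly as follows. This is not the hook shipped with the project: the directory /opt/reports, the cron target /etc/cron.d/reports, the cron source path inside the checkout and both function names are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical locations; adjust to your own server.
DEPLOY_DIR="${DEPLOY_DIR:-/opt/reports}"
CRON_TARGET="${CRON_TARGET:-/etc/cron.d/reports}"

# Git feeds a post-receive hook one "<old-rev> <new-rev> <ref>" line per updated ref.
# Succeed if any of those lines on stdin refers to the master branch.
pushed_master() {
    while read -r oldrev newrev refname; do
        [ "$refname" = "refs/heads/master" ] && return 0
    done
    return 1
}

# Deploy the latest code, update cron, and set permissions.
deploy() {
    git --work-tree="$DEPLOY_DIR" checkout -f master   # check out the pushed code
    cp "$DEPLOY_DIR/etc/cron/reports" "$CRON_TARGET"   # assumed cron file location
    chmod 644 "$CRON_TARGET"
    chmod +x "$DEPLOY_DIR"/*.sh
}
```

In an actual hooks/post-receive file, the final line would be something like `pushed_master && deploy`, so that pushes to other branches leave production untouched.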
This project provides scaffolding for:
- G. GitHub
- R. R-language
- E. Engineering Elbow Grease to glue everything together
- A. Amazon Web Services: EC2 server, S3 Bucket for network storage, RDS running MySQL
- T. Travis-CI: for continuous integration and deployment via GitHub