Data Science On The Go with Docker and Raspberry Pi
Docker
Note: you need an open SSH connection to the Raspberry Pi to complete the next steps.
Installing Docker on the Raspberry Pi couldn't be simpler. Just enter the following command:
curl -sSL https://get.docker.com | sh
Afterwards, you'll want to add your user to the docker group with the following command:
sudo usermod -aG docker pi
Alternatively, if you are logged in as a user other than the default user, 'pi', the following command adds the current user to the docker group instead:
sudo sh -c 'usermod -aG docker $SUDO_USER'
To have this change take effect, you'll need to exit the current ssh session and log back in.
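After logging back in, it can be handy to confirm that the group change actually took effect before moving on. The sketch below is one way to do that. The helper name `has_group` is hypothetical (it is not part of Docker), and it simply looks for a group name in a space-separated list such as the output of `id -nG`.

```shell
#!/bin/sh
# has_group GROUP_LIST GROUP  (hypothetical helper, for illustration only)
# Succeeds (exit status 0) when GROUP appears as a whole word in the
# space-separated GROUP_LIST, e.g. the output of `id -nG`.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Typical use on the Pi after logging back in (shown as a comment so the
# block has no side effects when sourced):
#   if has_group "$(id -nG)" docker; then
#       echo "docker group is active for $(id -un)"
#   else
#       echo "not in the docker group yet; log out and back in"
#   fi
```

If the check fails, the most likely cause is that the old ssh session is still in use.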
Let's verify that Docker installed correctly with the command:
docker info
Great! Everything looks okay with the docker installation!
Here is the Docker Hub page for the repository I created, edenbaus/raspberrypi-jupyterlab: https://hub.docker.com/r/edenbaus/raspberrypi-jupyterlab/
The build files for this docker image (Dockerfile, requirements.txt, requirements2.txt) can be found in my GitHub repository: https://github.com/edenbaus/DataSciencePi
Now, let's use docker to download the latest version of the docker image edenbaus/raspberrypi-jupyterlab with the following command:
docker pull edenbaus/raspberrypi-jupyterlab:latest
Note: due to the large number of third-party packages, the image is currently over 2 GB in size and is still growing as more packages are included. Please be patient while the image downloads.
Now that the latest version of the image has been downloaded to our Raspberry Pi, let's get it up and running!
docker run -it -p 9999:9999 edenbaus/raspberrypi-jupyterlab:latest
This command starts a container from the docker image and launches Jupyter Lab on port 9999. This is the port we pass to our web browser when connecting to Jupyter Lab. Afterwards, a root terminal prompt will load and display the output from the Jupyter Lab backend. It will take a few moments before the web interface is available.
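The -p flag maps a port on the Raspberry Pi (left of the colon) to a port inside the container (right of the colon). As a minimal sketch, assuming the container keeps listening on 9999 as described above, the hypothetical helper `run_cmd` below prints the docker run command for a different host port, which may help if 9999 is already taken on the Pi. The helper name and the variables are illustrative and not part of the image.

```shell
#!/bin/sh
IMAGE="edenbaus/raspberrypi-jupyterlab:latest"
CONTAINER_PORT=9999   # port Jupyter Lab uses inside the container, per the text above

# run_cmd [HOST_PORT]  (hypothetical helper)
# Prints, but does not execute, the docker run command that publishes
# the container's port on HOST_PORT of the Raspberry Pi (default 9999).
run_cmd() {
    host_port="${1:-9999}"
    printf 'docker run -it -p %s:%s %s\n' "$host_port" "$CONTAINER_PORT" "$IMAGE"
}
```

For example, `run_cmd 8888` prints a command that publishes the container on host port 8888, in which case the browser would connect to port 8888 rather than 9999.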
Then we can connect to the web interface for Jupyter Lab and start working in Python and R! Go to the URL corresponding to your Raspberry Pi's hostname (or IP address):
http://hostname:9999
or
http://ip:9999
e.g.:
http://raspberrypi.local:9999
http://192.168.2.16:9999
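Since the web interface takes a few moments to come up, a small polling loop can tell you when it is ready. The following is a sketch under stated assumptions: the helper names `jupyter_url` and `wait_for_jupyter` are hypothetical, the default of 24 tries at 5-second intervals is arbitrary, and curl is assumed to be installed on the connecting computer.

```shell
#!/bin/sh
# jupyter_url HOST [PORT]  (hypothetical helper)
# Prints the URL for Jupyter Lab on HOST, defaulting to port 9999.
jupyter_url() {
    printf 'http://%s:%s\n' "$1" "${2:-9999}"
}

# wait_for_jupyter HOST [PORT] [TRIES]  (hypothetical helper)
# Polls the URL with curl every 5 seconds until the server sends any
# HTTP response, or gives up after TRIES attempts (default 24).
wait_for_jupyter() {
    url=$(jupyter_url "$1" "${2:-9999}")
    tries="${3:-24}"
    while [ "$tries" -gt 0 ]; do
        if curl -s -o /dev/null --max-time 5 "$url"; then
            echo "Jupyter Lab is answering at $url"
            return 0
        fi
        tries=$((tries - 1))
        sleep 5
    done
    echo "No response from $url" >&2
    return 1
}
```

Calling `wait_for_jupyter raspberrypi.local` from the connecting computer would then block until the page responds or the attempts run out.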
Note: if you haven't added the Raspberry Pi's hostname and IP address to the connecting computer's /etc/hosts file (on Mac or Linux), you will need to include the .local suffix with your hostname, i.e. raspberrypi.local.
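For reference, a hosts file entry is just an IP address followed by a hostname. The sketch below uses a hypothetical helper, `hosts_entry`, to format such a line. The address and hostname in the usage comment are the examples from above and should be replaced with your own.

```shell
#!/bin/sh
# hosts_entry IP HOSTNAME  (hypothetical helper)
# Prints one hosts-file line: the IP address, a tab, then the hostname.
hosts_entry() {
    printf '%s\t%s\n' "$1" "$2"
}

# To append the line on the connecting computer (needs admin rights;
# shown as a comment so the block itself changes nothing):
#   hosts_entry 192.168.2.16 raspberrypi | sudo tee -a /etc/hosts
```

Keep in mind that a hosts entry goes stale if the Pi's IP address changes, so this fits best when the address is fixed.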
Above is Jupyter Lab running an R notebook with ggplot2, dplyr, and plyr. On the right is a Python 3 notebook running pandas, requests, numpy, and beautifulsoup. Each ggplot2 graph took about 2-4 seconds, and the Python web scraping routine took about 12 seconds. This is very respectable performance for a $5-$10 computer, although it is far from the quick processing times typical of most modern computers.
The terminal in the ssh connection to the Raspberry Pi will output information corresponding to the libraries loaded and running procedures in the corresponding Jupyter Lab web interface.
This docker image contains roughly 100 added libraries and packages for R and Python, and I am continuing to add more relevant packages with each revision to the image. The one notable omission for R is the shiny package.
Final Analysis
In terms of usability, the combination of R, Python, and SQL presents a very capable set of data science tools and some interesting use-case opportunities:
- Education - learn programming in R and Python on your tablet with a Raspberry Pi connected to the same Wi-Fi network.
- Development testing environment - Easily fire up multiple Raspberry Pi computers with the edenbaus/raspberrypi-jupyterlab docker image to test Python and R code in a standardized environment.
- Business - Analyze, manipulate, and visualize datasets with the Raspberry Pi through any networked desktop, laptop or tablet.
- Web scraping - Use Jupyter Lab to run a headless web scraper in Python or R without tying up resources on your main computer.
- Machine Learning - Train basic regression and classification models (CPU-intensive models will be very slow!)
- Run trained models on data captured through sensors or webcams attached to the Raspberry Pi.
- Calculator - Base R, and Python with NumPy, pandas, and scikit-learn, offer very powerful calculator replacements.
The other interesting aspect is the concept of a small, purpose-built computer that functions as a server and backend for a web interface that can be operated from any networked computer, laptop, tablet, or cell phone.
In the official Raspberry Pi Zero case, it is 1/4 the size of a US dollar bill. An optional mini HDMI to HDMI adapter and a USB receiver for a wireless keyboard are attached in the image above. Those accessories may be necessary to connect the Raspberry Pi Zero W directly to an HDMI display and keyboard to configure the network if you can't connect with ssh.
See the Raspberry Pi Zero in USB over Ethernet in action!
Thanks for reading!
Feel free to connect with me on LinkedIn if you have any questions or package requests for future builds.
Here is a full list of included packages:
Included packages & libraries | |
R | Python 3 |
ACD | Django |
BH | Exifread |
Basta | Flask |
BigRF | SQLAlchemy |
BinomTools | SymPy |
Bioconductor | Whoosh |
CBA | beautifulsoup4 |
CCP | bokeh |
CORElearn | bottleneck |
ClustEval | cython |
Comparison | dash-core-components==0.5.1 |
CoreLearn | dash-html-components==0.6.2 |
DAIM | dash-renderer==0.7.3 |
ElemStatLearn | dash==0.17.7 |
HURDAT | faker |
Hmisc | fedex |
IRdisplay | gensim |
ISLR | glmnet |
LDAvis | h5py |
LSMeans | html5lib |
LTSA | inflect |
Leaflet | ipython |
MASS | jupyter |
MNP | jupyterlab |
MissForest | keras |
MissMDA | ladon |
NLP | lxml |
Outliers | matplotlib |
PROC | milk |
RColorBrewer | networkx |
RCurl | nose |
RMYSQL | numexpr |
RMarkdown | openpyxl |
RMiner | pandas |
ROCR | pandas-datareader |
RWeka | patsy |
RankCluster | peewee |
Rcpp | pillow |
RcppEigen | plotly==2.0.11 |
RegTest | pmxbot |
SigClust | prettytable |
SnowballC | progressbar2 |
TTR | pycrypto |
TimeROC | pydot |
angstroms | pymc3 |
anomalyDetection | pymongo |
arules | python-dateutil |
bikedata | pytz |
broom | quandl |
car | redis |
caret | regex |
chunked | requests |
colorspace | reverb |
corrplot | scikit-learn |
crayon | scrapy |
data.table | seaborn |
datasuRus | selenium |
descr | statsmodels |
devtools | tswift |
doBy | uuid |
dplyr | xlrd |
dwapi | xlwt |
e1071 | |
earth | |
evaluate | |
extrafont | |
fAssets | |
features | |
forecast | |
foreign | |
ggmap | |
ggplot2 | |
glmnet | |
gmodels | |
googleVis | |
gtable | |
h2o | |
highr | |
igraph | |
ipred | |
jsonlite | |
kernlab | |
kknn | |
klaR | |
lattice | |
lubridate | |
magrittr | |
mboost | |
mcmc | |
mice | |
mlogit | |
moments | |
mtvnorm | |
muhaz | |
network | |
neurohcp | |
nnet | |
omsdata | |
parallel | |
parlitools | |
party | |
pdbZMQ | |
penalized | |
plyr | |
qcc | |
quantmod | |
rCharts | |
randomForest | |
rbokeh | |
repr | |
reshape2 | |
rpart | |
rpart.plot | |
sampleSelection | |
sandwich | |
sem | |
snda | |
sqldfforecast | |
statnet | |
stringr | |
survival | |
swirl | |
syuzhet | |
text2vec | |
tidytext | |
tm | |
topicmodels | |
tree | |
vcd | |
visNetwork | |
wordcloud | |
yaml | |
yhatr | |
zoo |