Steps toward recreating the Facebook IPO plot

Posted on Feb 12, 2015

NYC Data Science Academy Bootcamp 1: February–April 2015, Day 3

As data scientists, we are quite familiar with finding and mucking through data. We merge, split, clean, and analyze the data in order to draw our final conclusions. Our daily workflow can feel comfortingly logical and, at times, even cut and dried. However, every now and again, we are reminded of the art of our craft. As journalism enters the age of data, it is increasingly important to present data with visual impact. Resources like the New York Times present data in a visual and even interactive way, engaging the reader and enabling self-guided exploration.


Visualizing Friendships, by Paul Butler, shows which cities are connected by Facebook connections. "Each line might represent a friendship made while traveling, a family member abroad, or an old college friend pulled away by the various forces of life."


Anyone who uses R should be familiar with the graphic created by Paul Butler in 2010 and included in the original Facebook IPO back in 2012. This was brought up in one of the first classes of the NYC Data Science Academy bootcamp as an example of the prowess of ggplot2, a popular graphics package for R. This is the kind of graphic that would inspire anyone to learn more about the features of this powerful language and associated package. In particular, three things speak to me:

  • There was no mapping package: the sheer quantity of connections becomes an additional layer of information, as we can clearly see most of the continents of the world mapped out by cities connected through the popular social network.
  • It was done in R and ggplot2, without the aid of graphic design.
  • Great circle arcs provide an intuitive and evocative feeling of international travel (think of any major airline ad or even of the old-school Indiana Jones traveling montages).

I was challenged to reproduce the look and feel of this plot using ggplot2. There is a plethora of resources online that I made use of to do this, and probably many more that the reader can find if she wants to become more familiar with the capabilities of ggplot2. In particular, I drew heavily upon the tutorials by FlowingData and others. If you haven't heard of them, please go check them out! They are fantastic resources and were very helpful. The rest of this post will focus on some of the elements above and how to reproduce them using R.

Reproducing a map without a map

Clearly, the Facebook universe of connections is vast enough to produce the plot above. In fact, in his original post, Butler notes his decision to plot only unique pairs of cities connected by Facebook friends rather than every connection: "A big white blob appeared in the center of the map. Some of the outer edges of the blob vaguely resembled the continents, but it was clear that I had too much data to get interesting results just by drawing lines." Without timely access to the breadth of worldwide Facebook connections, I decided to look at US airport locations and trips originating from some of the nation's busiest airports to their domestic destinations. More detailed code for how I wrangled the airport data can be found at the link to the RPubs presentation from class.
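To illustrate the idea of keeping only unique pairs, here is a minimal sketch in base R. It is not Butler's code: the edges table and its city codes are hypothetical, and sorting each pair alphabetically is just one simple way to treat A-B and B-A as the same connection.

```r
# Hypothetical connection table; city codes are illustrative only
edges <- data.frame(
  from = c("NYC", "LON", "NYC", "SFO", "LON"),
  to   = c("LON", "NYC", "LON", "NYC", "PAR"),
  stringsAsFactors = FALSE
)

# Order each pair alphabetically so that A-B and B-A become identical rows
pair <- data.frame(
  a = pmin(edges$from, edges$to),
  b = pmax(edges$from, edges$to),
  stringsAsFactors = FALSE
)

# Drop repeated rows, leaving one row per unordered pair of cities
unique_pairs <- unique(pair)
print(unique_pairs)
```

The same step applies to the airport data: each origin and destination pair only needs to be drawn once.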

Great Circles

Having loaded the data into a table of start and end coordinates for each trip, I needed to calculate great circle arcs (the shortest paths between two points along the surface of a sphere) for each trip. For this, I used the geosphere package, which takes two points and a number of intermediate points as inputs and returns the coordinates that trace the great circle between those two points.
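As a minimal sketch of this step, assuming the geosphere function in question is gcIntermediate (the original call is not shown here) and using approximate, illustrative coordinates for two airports:

```r
library(geosphere)

# Approximate (longitude, latitude) pairs, for illustration only
jfk <- c(-73.78, 40.64)    # New York JFK
lax <- c(-118.41, 33.94)   # Los Angeles LAX

# n is the number of intermediate points along the great circle;
# addStartEnd = TRUE also returns the two endpoints (n + 2 rows in total)
arc <- gcIntermediate(jfk, lax, n = 50, addStartEnd = TRUE)

head(arc)  # a two-column matrix of longitudes and latitudes
```

A larger n gives a smoother arc at the cost of more points to plot.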


This had to be done for every unique pairing of an origin airport with one of its domestic destinations. I used a for loop and gave each arc a path ID so that ggplot2 could draw every path as its own group.

for (i in seq_len(nrow(trips))) {
  # ... build gcirc, the great circle trace for trip i ...
  gcirc$pathID <- i  # allowing for group plotting in ggplot
}
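A fuller sketch of this step might look like the following. The trips table, its column names (lon.o, lat.o, lon.d, lat.d), and the coordinates are all hypothetical, and the sketch swaps the for loop for lapply plus rbind to avoid growing a data frame inside a loop. The pathID idea is the same.

```r
library(geosphere)

# Hypothetical trips table: one row per origin-destination pair
trips <- data.frame(
  iata.origin = c("JFK", "JFK", "ORD"),
  lon.o = c(-73.78, -73.78, -87.90),
  lat.o = c(40.64, 40.64, 41.98),
  lon.d = c(-118.41, -80.29, -122.38),
  lat.d = c(33.94, 25.79, 37.62),
  stringsAsFactors = FALSE
)

# Build one data frame of arc points per trip
paths <- lapply(seq_len(nrow(trips)), function(i) {
  gcirc <- as.data.frame(gcIntermediate(
    c(trips$lon.o[i], trips$lat.o[i]),
    c(trips$lon.d[i], trips$lat.d[i]),
    n = 50, addStartEnd = TRUE
  ))
  gcirc$pathID <- i                          # one group per trip for ggplot2
  gcirc$iata.origin <- trips$iata.origin[i]  # kept for coloring by origin
  gcirc
})

# Stack all arcs into a single long table
arcs_df <- do.call(rbind, paths)
```

Each trip contributes 52 rows here (50 intermediate points plus both endpoints), all sharing one pathID.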

All that was left was to plot the results, coloring the paths and airport locations in a night-time theme using some ggplot2 options:

  theme(plot.margin = unit(c(-1, -1, -1, -1), "cm"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
# no data argument here, so the layer uses the data frame passed to ggplot()
arcs <- geom_path(aes(x = lon, y = lat, group = pathID, color = iata.origin),
                  alpha = 0.2, size = 0.5)
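Put together, a night-time plot of this kind could be sketched as follows. This is not the original plotting code: the make_arc helper, the two example routes, and the specific theme settings (black backgrounds, hidden axes and legend) are assumptions chosen for illustration. It also uses linewidth, which recent ggplot2 releases (3.4 and later) prefer over size for lines.

```r
library(ggplot2)
library(geosphere)

# Hypothetical helper: great circle points for one route, tagged for grouping
make_arc <- function(id, origin, from, to) {
  pts <- as.data.frame(gcIntermediate(from, to, n = 50, addStartEnd = TRUE))
  pts$pathID <- id
  pts$iata.origin <- origin
  pts
}

# Two illustrative routes with approximate coordinates
arcs_df <- rbind(
  make_arc(1, "JFK", c(-73.78, 40.64), c(-118.41, 33.94)),
  make_arc(2, "ORD", c(-87.90, 41.98), c(-80.29, 25.79))
)

# Translucent paths on a black canvas with grid, axes, and legend removed
p <- ggplot(arcs_df,
            aes(x = lon, y = lat, group = pathID, color = iata.origin)) +
  geom_path(alpha = 0.2, linewidth = 0.5) +
  theme(
    panel.background = element_rect(fill = "black"),
    plot.background  = element_rect(fill = "black"),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text        = element_blank(),
    axis.ticks       = element_blank(),
    axis.title       = element_blank(),
    legend.position  = "none"
  )
```

The low alpha matters most with many paths: overlapping arcs add up, so busy corridors glow brighter than rarely used ones.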



This is a small start towards the beautiful piece that Paul created. I wanted to get a little creative, so I decided to take advantage of the vector output feature from R and ggplot2. I opened up Adobe Illustrator and added a subtle glow effect to each path. I even changed some of the colors, which is something I could have easily done in ggplot2, of course.
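For readers who want to try the same hand-off, here is a minimal sketch of exporting a vector file with ggsave. The placeholder plot, file name, and dimensions are assumptions for illustration, not the settings used for the figures in this post.

```r
library(ggplot2)

# Placeholder plot standing in for the finished arc map
p <- ggplot(data.frame(lon = c(-73.78, -118.41), lat = c(40.64, 33.94)),
            aes(x = lon, y = lat)) +
  geom_path(color = "white") +
  theme(panel.background = element_rect(fill = "black"))

# PDF is a vector format, so each path remains an editable object
# when the file is opened in a vector graphics editor
out_file <- file.path(tempdir(), "arcs.pdf")
ggsave(out_file, plot = p, width = 10, height = 6)  # units default to inches
```

ggsave picks the graphics device from the file extension, so changing the name to end in .svg would request SVG output instead (which may require an additional package such as svglite).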


I don't show these plots to highlight any shortcomings of ggplot2. Rather, R output can be a beautiful thing in its own right (as Butler shows) or a great starting point for artistic effects that add impact.

Concluding Thoughts

I set out to see what would be required to achieve the neat effects in Paul Butler's Facebook plot and covered a few of the basic elements that, when combined with larger data sets, could certainly provide a similar look and feel. While working on this project, however, I discovered a really nice blog post, Improving R Data Visualization through Design. In it, the author provides several examples of raw R output and shows how his collaboration with a graphic designer enhanced the impact of the visual information without damaging the take-aways. While it is true that post-processing a raw plot outside of R compromises the reproducibility of the figure itself, I feel that the embellishments discussed in the article serve a higher purpose: making these informative visuals even more memorable.
