Hadoop Workshop I: configure your first Hadoop cluster on Amazon EC2

Vivian Zhang
Posted on Apr 8, 2014

I was so happy to get many upvotes!


Many thanks go to Conductor Inc. (Conductor makes the most widely used SEO platform, empowering enterprise marketers to take control of their search performance.)

Special thanks go to Caitlin Wilterdink, Jon Torodash, and Chris Lee (now Googler) for hosting us and giving us the wonderful space and assistance!


NYC Data Science Academy is offering two related courses:
RSVP Hadoop Beginner level classes
RSVP Hadoop Intermediate level classes

The Intermediate level week 1 slides:


More info about this event is on Meetup. We followed the Tutorial repo during this workshop.

Here is a link with info that will help Windows users connect to EC2 instances over SSH using PuTTY.
(Thanks to Mandy for the Windows PuTTY link.)

You can also learn by watching the videos.


Meetup announcement:
Speaker: Vivian Zhang, CTO and co-founder of SupStat Inc, organizer of NYC Open Data Meetup, Founder of NYC Data Science Academy. She teaches R and Hadoop.

Her data school hires the best working professionals to teach Python, D3.js and related Data Science skills. All the courses are designed to teach you employable skills. We teach the skills and toolkits in class and help students complete projects of their own choice. Students will showcase their projects in this meetup group at the end of their courses.

Outline:
In Hadoop workshop I and II, I will walk you through the steps to configure a Hadoop cluster on Amazon EC2 and run two simple map-reduce jobs on the cluster.
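
The announcement does not say which two jobs are run, so the following is only a minimal sketch of the map-reduce idea, assuming the classic word-count example. It runs locally in plain Python rather than on a Hadoop cluster, and the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, chosen here for illustration.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    On a real cluster the framework does this between map and reduce;
    here it is simulated with an in-memory dictionary.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Sample input invented for this sketch.
    print(word_count(["hadoop on ec2", "Hadoop cluster on EC2"]))
```

On a cluster, the map and reduce steps would run in parallel across nodes, with the shuffle handled by the framework; the single-process version above only shows how the data flows between the three stages.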

Preparation:
1. Sign up for an Amazon AWS account.
2. Get familiar with basic vi commands (if you don't know them, I can show you quickly. You are welcome to read more before coming.)
3. You don't need to know Java at this moment. If you know Java, you can program in Hadoop quickly in later workshops.

About Author

Vivian Zhang

Vivian is a data scientist who has been devoted to the analytics industry and the development and use of data technologies for several years. She obtained expertise in data analysis and data management as a Senior Analyst and...
