What we learned from our first NYC R programming class

Vivian Zhang
Posted on Jan 18, 2014


(The photo was from our first offering of R classes)

We are going to offer our Data Science by R (beginner level) course again in February. The goal of this class is to get students to the point where they are self-sufficient in R, proficient at analyzing data, and able to take these skills back to their full-time jobs. You may sign up at https://www.meetup.com/NYC-Data-Science-Academy/events/148820532/ or www.nycdatascience.com

We had a great first round of this course and are going to keep most of it the same this time around.

The feedback from our students made it clear that their favorite parts of the course were the class exercises and short practice problems. Small problems quickly reinforce learning and are an effective way to introduce new ideas. We demonstrated how to use R to solve real-life problems such as tracking the availability of New York City Citibikes and racks at each station (using a Postgres database API and a real-time XML file), evaluating the performance of the Knicks and other teams, generating local weather reports based on one's IP address, and scraping web pages through XPath or table structures. Our students also learn how to craft inspiring visuals using Shiny and rCharts. We also use Project Euler exercises.
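To give a flavor of the kind of short practice problem mentioned above, here is a minimal sketch of Project Euler Problem 1 (the sum of all multiples of 3 or 5 below 1000) in base R. Which Project Euler problems were actually used in class is not stated here, so treat this one as an illustrative assumption. It solves the problem twice, once with a loop and once with vectorized operations.

```r
# Project Euler Problem 1: sum of all multiples of 3 or 5 below 1000.
# (Illustrative choice; not necessarily a problem used in the class.)

# Loop version: test each number in turn.
total_loop <- 0
for (n in 1:999) {
  if (n %% 3 == 0 || n %% 5 == 0) {
    total_loop <- total_loop + n
  }
}

# Vectorized version: build a logical mask and sum the selected values.
n <- 1:999
total_vec <- sum(n[n %% 3 == 0 | n %% 5 == 0])

print(total_loop)  # 233168
print(total_vec)   # 233168
```

Both versions give the same answer. The vectorized form is shorter and is usually the more idiomatic way to write this in R.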

In the 20 hours of the first offering of the R course, each class covered one specific topic. On Days 1 and 2, we went over the programming basics of R: data objects (arrays, matrices, data frames, lists), functions, loops, if-else statements, and vectorized operations. On Day 3, we went over how to extract data from web pages, APIs, and database portals, and how to read Excel files. On Day 4, we focused on data manipulation: basic transformations (sorting and merging, summarizing, subsetting, and string manipulation), reshaping data, splitting and combining data, and data aggregation. On Day 5, we covered visualization with lattice and ggplot2: how to make maps, scatter plots, and matrix-related plots, and how to polish plots until they are publication-ready.
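As a rough illustration of the Day 4 data manipulation topics, the sketch below uses only base R and the built-in mtcars dataset. The dataset and the variable names (cars, four_cyl, mpg_by_cyl, cars_merged) are assumptions chosen for this example, not the actual class material.

```r
# mtcars ships with base R (32 cars, 11 variables).
cars <- mtcars
cars$model <- rownames(cars)

# Subsetting: keep only the 4-cylinder cars and three columns.
four_cyl <- cars[cars$cyl == 4, c("model", "mpg", "cyl")]

# Sorting: most fuel-efficient cars first.
four_cyl <- four_cyl[order(-four_cyl$mpg), ]

# Aggregation: mean mpg for each cylinder count.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = cars, FUN = mean)
names(mpg_by_cyl)[2] <- "mean_mpg"

# Merging: attach the group mean back onto each car.
cars_merged <- merge(cars, mpg_by_cyl, by = "cyl")

# String manipulation: take the make as the first word of the model name.
cars_merged$make <- sub(" .*", "", cars_merged$model)

head(cars_merged[, c("model", "make", "cyl", "mpg", "mean_mpg")])
```

Packages such as reshape2 or plyr offer more convenient tools for reshaping and split-apply-combine work, but the base R functions above are enough to practice the ideas.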

In our upcoming 35-hour course, we plan to make a few adjustments. First, we will give more in-class and homework exercises. As always, our goal is to solidify student understanding and retention of all key concepts and skills. Additionally, we will introduce more statistical analysis, including basic statistical testing, regression, and principal component analysis. If we get through the material early, we will cover decision trees, k-means clustering, and other mainstream machine learning/data mining techniques.
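For readers curious what the statistical topics look like in practice, here is a minimal sketch using base R and the built-in mtcars dataset. The dataset, the chosen variables, and the specific questions asked are assumptions made for illustration; the course may use different examples.

```r
# Basic statistical test: do automatic and manual cars differ in mpg?
# (am is 0 for automatic, 1 for manual.)
tt <- t.test(mpg ~ am, data = mtcars)

# Linear regression: mpg explained by weight and horsepower.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Principal component analysis on four numeric columns,
# scaled to unit variance so no single column dominates.
pca <- prcomp(mtcars[, c("mpg", "disp", "hp", "wt")], scale. = TRUE)

print(tt$p.value)
print(coef(fit))
summary(pca)
```

Each call returns an object (tt, fit, pca) that can be inspected further with functions such as summary() and str().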

After each course, we collect feedback from each student so that we can make improvements where they are needed. We encourage students to use our class dashboard, Piazza, to post questions, help answer questions, and share useful resources related to R programming. We use join.me to share our desktop screen with students and to invite students to share their solutions and interact with others. Students are always encouraged to ask questions, both general and specific.

The dream is to inspire our students to never stop learning.

If you are interested in learning more about data mining, we strongly recommend our Data Science by R (intermediate level) course, which runs from Mar 8th to April 15th: https://www.meetup.com/NYC-Data-Science-Academy/events/152015792/

Vivian encourages all students to join the NYC Open Data Meetup group as supplementary study. This group offers free workshops every Monday and Thursday. All the workshop materials (including videos, slides, source code, and attendee lists) are on its website, https://www.nycopendata.com. The topics include programming with R, Python, Tableau, Processing, node.js, D3.js, Angular, GitHub, location data query, iOS programming, Google Fusion, and Gephi, along with dedicated talks covering social media analytics, graph theory, social networks, big data visualization, census data processing, data science for social good, health care open data projects, an open data author panel, a young coder panel, a policy in practice talk series, a Kaggler talk series, a Citibike hack session, an NYC open data portal intro, data networks, and interactive and reproducible reporting.

Frequently asked questions:

--What is the philosophy of this class?

We hope to strike a balance between breadth and depth. We will cover a range of topics, but in each we will focus on intuition -- learning why R does certain things is important to writing robust R code that can be trusted.

--Is this class designed for beginners?

Yes, we hope this class will help beginners get started with R. In addition, we strive to make in-class and homework exercises challenging enough for those who have some programming background or some R experience. Every student is expected to read one or two introductory-level R books before the class starts. This pre-work is required.

--What if I know a little R, will this class be helpful?

Yes, if you know a little, day 1 will be smooth for you. However, you still have to work hard to learn the majority of the content.

--What if I know some other programming language, will I pick up R faster?

Yes, and we will give you extra-challenging in-class exercises and homework.

--What do you expect from a student? And how can I get the most out of this class?

This course is meant to be fast-paced. Students are expected to review the slides and to work on exercises between sessions. Students should feel comfortable working in groups and participating in class. We hope you can sign up for the class at least 2 weeks before it starts and do some pre-work, including reading and working through online R resources.

--How fast is your class? How much work do I need to do?

In order to cover both breadth and depth, this class will move at a fast pace. We will cover around 70 slides each session and ask that students work on exercises between sessions, perhaps even for material we weren't able to get to during the class. We give out the slides a week before each class, and you are expected to read them and try the code before the class.

--4 to 7 hours of class time is pretty long. How can you help me stay focused and productive?

To keep students engaged, the class will alternate between presenting slides and applying what was learned in exercises. The in-class exercises start with simple modifications of what was presented in the slides and build up to more creative tasks.
Students will be encouraged to work in groups, which gives them practice working in a team environment. The slides and exercises offer practice with both built-in R datasets and other commonly used datasets such as the World Development Indicators.

About Author

Vivian Zhang

Vivian Zhang is the founder of the NYC Data Science Academy and the NYC Open Data meetup. She earned her M.S. in Computer Science and Statistics and B.S. in Computer Science. She is ranked as one of the...