Data Science with R: Data Analysis
Each class is 35 hours of classroom guidance, with an optional three-week showcase project of the student's own choosing and an optional presentation of that project.
R009 offering: Nov 15th, 22nd, Dec 6th, 13th, 20th, 2014 (five Sat); we take a break over Thanksgiving weekend.
R010 offering: Jan 3rd, 10th, 17th, 24th, 31st, 2015 (five Sat)
R011 offering: Mar 14, 21, 28, April 4, 11, 2015 (five Sat)
When you sign up, please email firstname.lastname@example.org and tell us which session you are signing up for.
Time: 10:00 a.m. – 5:00 p.m. (7 hours)
Instructor: Jun Zhao (Data Analyst at SupStat Inc, Master's in Statistics)
Venue: 500 7th Ave, 17th Fl., New York, NY (close to Times Square) or through recorded sessions on YouTube
Room Name: Glass Door Room
You can contact email@example.com for corporate training or small group training opportunities.
What should I bring?
Be ready to learn. You need your laptop and a recent version of R installed (preferably 3.0+). We also recommend having the RStudio IDE installed, but it’s not required. Bring the R environment that makes you the most productive!
This intensive Data Science with R – Beginner Level course, offered by NYC Data Science Academy, is a five-week course that will introduce you to the wonderful world of R and give you an excellent understanding of the language, leaving you with a firm foundation to build upon.
Why R is important
R is a powerful, comprehensive, and dynamic programming language that, since its release in 1996, is on course to eclipse traditional statistical packages as the dominant interface in computational statistics, visualization, and data science. And another thing: it’s free! As an open-source platform, R has grown to become an incredibly flexible tool that can be applied to nearly every graphical and statistical problem. The community of R users continues to add new functionality to the language, and R is often the first statistical tool to provide support for new algorithms and cutting-edge methods in data science.
Frequently asked questions
1. Do I have to do the three-week project? Is it required for taking this class?
The project is optional. Students can choose to spend an extra three weeks with the teaching crew on a project of their own choosing. We are happy to offer assistance and arrange a presentation for students to demo their work.
2. Can I take the class online if I am not in NYC?
You can take it onsite or through recorded sessions on YouTube, and get timely assistance from the teaching crew via Google Hangouts or Skype.
3. If I have to miss a session, how can I make it up?
We record all of our classes and make them available to students right after each class. If you miss a class, you can also get extra help such as office hours or online support through Google Hangouts or Skype.
Project Demo Day and Certificates
The course moves from the rudimentary building blocks of programming to data manipulation and the use of advanced plotting packages, and concludes with a demonstration of a project of your choice on Project Demo Day. For Demo Day you will access and analyze real data, using the tools and skills taught throughout the course. Upon successful completion of the course, you will qualify for one of three certificates: Extraordinary Standing, Honorable Graduation, or Active Participation.
Certificates are awarded according to your understanding, skill, and participation.
1. Basic Programming Elements – 14 hours
Abstract: Students will learn the fundamental characteristics of the R language, and acquire essential programming skills to apply to future techniques in data handling, analysis, and visualization.
Case Study and Exercises: Use the R language to solve problems from Project Euler.
- What is R?
- Why R?
- How to get help
- R language resources
- Installing and using packages
- Data Objects: Vectors, Matrices, Data Frames, and Lists
- Local data import/export
- Control Statements
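As a taste of the kind of exercise this module points to, here is a minimal sketch of Project Euler Problem 1 (the sum of all multiples of 3 or 5 below 1000) in base R. The choice of problem and the helper name `sum_multiples` are illustrative assumptions, not part of the course material.

```r
# Hypothetical helper (not from the course material): sum of all natural
# numbers below `limit` that are divisible by 3 or 5.
sum_multiples <- function(limit) {
  n <- seq_len(limit - 1)              # vector 1, 2, ..., limit - 1
  sum(n[n %% 3 == 0 | n %% 5 == 0])    # logical indexing, then sum
}

sum_multiples(10)    # 3 + 5 + 6 + 9 = 23
sum_multiples(1000)  # 233168
```

The vectorized form avoids an explicit loop, which is usually the more idiomatic choice in R; the same result could also be reached with a `for` loop and an `if` statement to practice control statements.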
2. Primary Statistical Methods – 7 hours
Abstract: This session will cover the essential statistical methods used in data science, focusing on the fundamental building blocks on which more advanced predictive modeling hinges.
- Descriptive statistics
- Hypothesis testing
- Linear Regression
- Logistic Regression
- Introducing non-parametric statistics
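The topics above map onto a handful of base R functions. The sketch below is only an illustration: it assumes the built-in `mtcars` dataset, and the variable choices (`mpg`, `wt`, `am`) are ours rather than anything prescribed by the course.

```r
# mtcars ships with base R; using it here is an illustrative choice.
data(mtcars)

# Descriptive statistics for fuel efficiency (miles per gallon).
mpg_summary <- summary(mtcars$mpg)

# Hypothesis test: do automatic (am = 0) and manual (am = 1) cars
# differ in mean mpg? (Welch two-sample t-test by default.)
tt <- t.test(mpg ~ am, data = mtcars)

# Linear regression: mpg as a function of weight.
fit_lm <- lm(mpg ~ wt, data = mtcars)

# Logistic regression: probability of a manual transmission given weight.
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# Non-parametric alternative to the t-test (Wilcoxon rank-sum test).
# exact = FALSE requests the normal approximation, since mpg contains ties.
wt_test <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)

print(mpg_summary)
print(tt$p.value)
print(coef(fit_lm))
print(coef(fit_glm))
print(wt_test$p.value)
```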
3. Data Manipulation – 7 hours
Abstract: This session teaches how to manipulate data and use R for all kinds of data conversion and restructuring processes that are frequently encountered in the initial stages of data analysis. We will also cover string processing operations and advanced data capture such as web scraping, API usage, and external database connections.
Case Study and Exercise: Find a group on QQ (a widely used instant messaging tool) and solve a research problem based on text features.
- Data sorting
- Merging Data
- Remodeling Data
- String manipulation
- Dates and time stamps
- Web data capture
- API data sources
- Connecting to an external database
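A few of the operations listed above can be sketched in base R as follows. The toy data frames and column names are invented purely for illustration; the two dates are the first and last R009 class dates given earlier.

```r
# Toy data, invented purely for illustration.
students <- data.frame(id = c(3, 1, 2),
                       name = c("Cy", "Ann", "Bo"),
                       stringsAsFactors = FALSE)
scores <- data.frame(id = c(1, 2, 3),
                     score = c(90, 75, 82))

# Data sorting: reorder rows by id using order().
students_sorted <- students[order(students$id), ]

# Merging data: join the two tables on the shared "id" column.
merged <- merge(students_sorted, scores, by = "id")

# String manipulation: upper-case the names.
merged$name <- toupper(merged$name)

# Dates: parse ISO-formatted strings and compute a difference in days.
start <- as.Date("2014-11-15")
end   <- as.Date("2014-12-20")
days_between <- as.numeric(end - start)

print(merged)
print(days_between)
```

Web scraping, API access, and database connections depend on add-on packages and external services, so they are left out of this sketch.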
4. Data Visualization – 7 hours
Abstract: We will quickly cover basic plot types before introducing two advanced graphics packages (lattice and ggplot2). Using these two graphing systems, we will develop an understanding of the fundamental processes behind data visualization and of the options available to the data scientist to describe her data through clear and beautiful visualizations.
Case Study and Exercises: Reproduce famous graphics such as Hans Rosling’s Gapminder visualization.
- Point graphics
- Columnar graphics
- Line charts
- Pie charts
- Box Plots
- Scatter plots
- Visualizing multivariate data
- Matrix-based visualizations