Data Science with R: Intro to Data Analysis
Date: Sept. 27th (Sat); Oct. 4th (Sat), 11th (Sat), 19th (Sun), 25th (Sat); 2014 (4 Saturdays, 1 Sunday)
Time: 10:00 a.m. – 5:00 p.m. (7 hours)
Instructors: Charlie Redmon (Project Manager at SupStat Inc, Master's in Linguistics)
Vivian Zhang (CTO at SupStat Inc, Double Master's in Computer Science and Statistics)
Venue: 500 7th Ave, 17th Fl., New York, NY (close to Times Square)
Room Name: Glass Door Room
You can contact firstname.lastname@example.org for corporate training or small group training opportunities.
What should I bring?
Be ready to learn. You need your laptop and a recent version of R installed (preferably 3.0+). We also recommend having the RStudio IDE installed, but it’s not required. Bring the R environment that makes you the most productive!
This intensive Data Science with R – Beginner Level course, offered by NYC Data Science Academy, runs for five weeks. It will introduce you to the wonderful world of R and give you a solid understanding of the language and a firm foundation to build upon.
Why R is important
R is a powerful, comprehensive, and dynamic programming language that, since its release in 1996, has been on course to eclipse traditional statistical packages as the dominant interface in computational statistics, visualization, and data science. And another thing: it’s free! As an open-source platform, R has grown into an incredibly flexible tool that can be applied to nearly every graphical and statistical problem. The community of R users continues to add new functionality to the language, and R is often the first statistical tool to support new algorithms and cutting-edge methods in data science.
Project Demo Day and Certificates
The course moves from the rudimentary building blocks of programming, through data manipulation, to the use of advanced drawing packages, and concludes with a demonstration of a project of your choice on Project Demo Day. For Demo Day you will access and analyze real data, using the tools and skills taught throughout the course. Upon successful completion of the course, you will qualify for one of three certificates: Extraordinary Standing, Honorable Graduation, or Active Participation.
Certificates are awarded according to your understanding, skill, and participation.
1. Basic Programming Elements – 14 hours
Abstract: Students will learn the fundamental characteristics of the R language, and acquire essential programming skills to apply to future techniques in data handling, analysis, and visualization.
Case Study and Exercises: Use the R language to solve problems from Project Euler.
- What is R?
- Why R?
- How to get help
- R language resources
- Installing and using packages
- Data Objects: Vectors, Matrices, Data Frames, and Lists
- Local data import/export
- Control Statements
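As a taste of the case study, here is a minimal sketch of Project Euler Problem 1 (the sum of all natural numbers below 1000 that are multiples of 3 or 5), written once with vectors and once with control statements. The function names are illustrative assumptions, not part of the course materials.

```r
# Project Euler Problem 1: sum the multiples of 3 or 5 below a limit.
# Function names are hypothetical, chosen for illustration only.

# Vectorized version: build the vector, filter with a logical index, sum.
sum_multiples_vec <- function(limit) {
  n <- seq_len(limit - 1)
  sum(n[n %% 3 == 0 | n %% 5 == 0])
}

# Loop version: the same result using explicit control statements.
sum_multiples_loop <- function(limit) {
  total <- 0
  for (i in seq_len(limit - 1)) {
    if (i %% 3 == 0 || i %% 5 == 0) {
      total <- total + i
    }
  }
  total
}

sum_multiples_vec(1000)   # 233168
sum_multiples_loop(1000)  # 233168
```

The vectorized form is usually the more idiomatic choice in R; the loop is shown only to exercise `for` and `if`.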
2. Primary Statistical Methods – 7 hours
Abstract: This session will cover the essential statistical methods used in data science, focusing on the fundamental building blocks on which more advanced predictive modeling depends.
- Descriptive statistics
- Hypothesis testing
- Linear Regression
- Logistic Regression
- Introducing non-parametric statistics
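The topics above can be sketched in a few lines of base R. This example uses the built-in `mtcars` data set, which is an assumption made for illustration; the course itself may use different data.

```r
# mtcars ships with base R, so no packages or downloads are needed.
data(mtcars)

# Descriptive statistics for fuel economy (miles per gallon).
summary(mtcars$mpg)

# Hypothesis test: do automatic (am = 0) and manual (am = 1) cars differ in mpg?
mpg_test <- t.test(mpg ~ am, data = mtcars)
mpg_test$p.value

# Linear regression: mpg as a function of weight.
lin_fit <- lm(mpg ~ wt, data = mtcars)
coef(lin_fit)

# Logistic regression: probability of a manual transmission given weight.
log_fit <- glm(am ~ wt, data = mtcars, family = binomial)
coef(log_fit)

# Non-parametric alternative to the t-test above (Wilcoxon rank-sum test).
wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)
```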
3. Data Manipulation – 7 hours
Abstract: This session teaches how to manipulate data and use R for all kinds of data conversion and restructuring processes that are frequently encountered in the initial stages of data analysis. We will also cover string processing operations and advanced data capture such as web scraping, API usage, and external database connections.
Case Study and Exercise: Find a group on QQ (a widely used instant messaging tool in China) and solve a research problem based on the text features of its messages.
- Data sorting
- Merging Data
- Remodeling Data
- String manipulation
- Dates and time stamps
- Web data capture
- API data sources
- Connecting to an external database
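A minimal base R sketch of the sorting, merging, string, and date topics follows. The two small tables are made up for illustration and are not course data.

```r
# Two small, made-up tables for illustration.
students <- data.frame(
  id   = c(3, 1, 2),
  name = c("  carol ", "alice", "Bob"),
  stringsAsFactors = FALSE
)
sessions <- data.frame(
  id   = c(1, 2, 3),
  date = c("2014-09-27", "2014-10-04", "2014-10-19"),
  stringsAsFactors = FALSE
)

# Sorting: order rows by id.
students <- students[order(students$id), ]

# Merging: join the two tables on the shared id column.
roster <- merge(students, sessions, by = "id")

# String manipulation: strip surrounding whitespace, capitalize first letter.
clean <- gsub("^\\s+|\\s+$", "", roster$name)
roster$name <- paste0(toupper(substring(clean, 1, 1)), substring(clean, 2))

# Dates: parse the text into Date objects and extract the weekday name
# (the weekday label depends on the system locale).
roster$date <- as.Date(roster$date, format = "%Y-%m-%d")
roster$weekday <- weekdays(roster$date)

roster
```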
4. Data Visualization – 7 hours
Abstract: We will quickly cover basic plot types before introducing two advanced drawing packages, lattice and ggplot2. Working with both graphing schemes, students will develop an understanding of the fundamental processes behind data visualization and of the options available to the data scientist for describing her data through clear and beautiful visualizations.
Case Exercises: Reproducing famous graphics like Hans Rosling’s Gapminder visualization.
- Point graphics
- Columnar graphics
- Line charts
- Pie charts
- Box Plots
- Scatter plots
- Visualizing multivariate data
- Matrix-based visualizations
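To give a feel for the two packages, here is a minimal sketch that draws the same faceted scatter plot in lattice and, if it is installed, in ggplot2. The choice of the built-in `mtcars` data and the axis labels are assumptions made for illustration.

```r
library(lattice)  # distributed with standard R installations

# Scatter plot of mpg against weight, one panel per cylinder count.
p_lattice <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
                    xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
print(p_lattice)

# The same idea in ggplot2, run only if the package is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_gg <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    facet_wrap(~ cyl) +
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
  print(p_gg)
}
```

lattice describes the whole plot in a single formula-based call, while ggplot2 builds it up layer by layer; comparing the two is a useful way to see what each scheme makes easy.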