Each class is 35 hours of classroom guidance, with an optional three-week showcase project of the student's own choosing and an optional presentation of that project. This intensive class will introduce you to the wonderful world of R and provide you with an excellent understanding of the language, leaving you with a firm foundation to build upon.
The course moves from the basic building blocks of programming to data manipulation and the use of advanced drawing packages, and concludes with a demonstration of a project of your choice on Project Demo Day. For Demo Day you will access and analyze real data, using the tools and skills taught throughout the course.
Upon successful completion of the course, you will qualify for one of three certificates: Extraordinary Standing, Honorable Graduation, or Active Participation. Certificates are awarded according to your understanding, skill, and participation.
1. Basic Programming Elements – 14 hours
Students will learn the fundamental characteristics of the R language, and acquire essential programming skills to apply to future techniques in data handling, analysis, and visualization.
Case Study and Exercises:
Use the R language to solve problems from Project Euler.
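To give a flavor of this exercise, here is a minimal sketch of the first Project Euler problem (the sum of all multiples of 3 or 5 below a limit) in base R. The function name `sum_multiples` is our own choice for illustration and is not part of the course materials.

```r
# Project Euler, Problem 1: sum of all multiples of 3 or 5 below a limit.
# `sum_multiples` is a hypothetical helper name used only for this sketch.
sum_multiples <- function(limit) {
  n <- seq_len(limit - 1)              # 1, 2, ..., limit - 1
  sum(n[n %% 3 == 0 | n %% 5 == 0])    # keep multiples of 3 or 5, then add
}

sum_multiples(10)    # 3 + 5 + 6 + 9 = 23
sum_multiples(1000)  # 233168
```

The vectorized filter avoids an explicit loop, which is the idiomatic R approach for this kind of problem.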
2. Primary Statistical Methods – 7 hours
- What is R?
- Why R?
- How to get help
- R language resources
- Installing and using packages
- Data Objects: Vectors, Matrices, Data Frames, and Lists
- Local data import/export
- Control Statements
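As a rough preview of several topics in the list above, the sketch below builds each of the four data objects, uses a simple control statement, and round-trips a data frame through a local CSV file. All variable names and values are invented for illustration.

```r
# The four core data objects (all values here are made up).
v  <- c(2, 4, 6)                                   # vector
m  <- matrix(1:6, nrow = 2)                        # 2 x 3 matrix, filled by column
df <- data.frame(id = 1:3, score = c(90, 75, 60))  # data frame
l  <- list(numbers = v, table = df)                # list holding mixed objects

# A control statement: loop over the rows and branch on a condition.
grade <- character(nrow(df))
for (i in seq_len(nrow(df))) {
  if (df$score[i] >= 70) {
    grade[i] <- "pass"
  } else {
    grade[i] <- "fail"
  }
}
df$grade <- grade

# Local data export and import through a temporary CSV file.
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)
df2 <- read.csv(path)
```

In practice the loop could be replaced by `ifelse(df$score >= 70, "pass", "fail")`; the explicit form is shown only to illustrate `for` and `if`.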
This session will cover the essential statistical methods used in data science, focusing on the fundamental building blocks on which more advanced predictive modeling depends.
3. Data Manipulation – 7 hours
- Descriptive statistics
- Hypothesis testing
- Linear Regression
- Logistic Regression
- Introducing non-parametric statistics
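The methods listed above each map onto a function in base R. The sketch below runs them on `mtcars`, a dataset that ships with R; the choice of dataset and of the variables modeled is our assumption for illustration, not the course's own material.

```r
# mtcars ships with base R (datasets package).
desc <- summary(mtcars$mpg)                # descriptive statistics

# Hypothesis test: does mpg differ between automatic (am = 0)
# and manual (am = 1) cars? (Welch two-sample t-test by default.)
tt <- t.test(mpg ~ am, data = mtcars)

# Linear regression of fuel economy on weight.
fit_lm <- lm(mpg ~ wt, data = mtcars)

# Logistic regression: probability of a manual transmission given weight.
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# A non-parametric alternative to the t-test (Wilcoxon rank-sum test).
# exact = FALSE avoids a warning caused by tied values in mpg.
wt_test <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)
```

Calling `summary()` on `fit_lm` or `fit_glm` prints the coefficient tables and fit statistics.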
This session teaches how to manipulate data and use R for all kinds of data conversion and restructuring processes that are frequently encountered in the initial stages of data analysis. We will also cover string processing operations and advanced data capture such as web scraping, API usage, and external database connections.
Case Study and Exercise:
Find a QQ group (QQ is a widely used instant messaging tool) and solve a research problem based on features of its text.
4. Data Visualization – 7 hours
- Data sorting
- Merging Data
- Remodeling Data
- String manipulation
- Dates and time stamps
- Web data capture
- API data sources
- Connecting to an external database
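The first few topics in the list above can be sketched in base R as follows. The data frames and column names are invented for illustration, and web capture, API, and database access are left out because they depend on external services.

```r
# Invented example data.
students <- data.frame(id = c(3, 1, 2), name = c("chen", "ana", "bo"),
                       stringsAsFactors = FALSE)
scores <- data.frame(id = c(1, 2, 3), score = c(88, 92, 79),
                     date = c("2015-03-01", "2015-03-15", "2015-04-01"),
                     stringsAsFactors = FALSE)

# Sorting: reorder rows by id.
sorted <- students[order(students$id), ]

# Merging: join the two tables on their shared key.
merged <- merge(students, scores, by = "id")

# String manipulation: capitalize the first letter of each name.
merged$label <- paste0(toupper(substr(merged$name, 1, 1)),
                       substring(merged$name, 2))

# Dates: parse text into Date objects and compute elapsed days.
merged$date <- as.Date(merged$date)
merged$days_since_first <- as.numeric(merged$date - min(merged$date))

# Remodeling: reshape a wide table (one column per quarter) into long form.
wide <- data.frame(id = 1:2, q1 = c(5, 7), q2 = c(6, 8))
long <- reshape(wide, direction = "long",
                varying = c("q1", "q2"), v.names = "value",
                timevar = "quarter", times = c("q1", "q2"), idvar = "id")
```

After reshaping, `long` has one row per id and quarter, which is the layout most plotting and modeling functions expect.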
We will quickly cover basic plotting types before introducing two advanced drawing packages (lattice), using the two graphing schemes to develop an understanding of the fundamental processes behind data visualization and the various options available to the data scientist to describe her data through clear and beautiful visualizations.
Reproducing famous graphics like Hans Rosling’s Gapminder
- Point graphics
- Columnar graphics
- Line charts
- Pie charts
- Box Plots
- Scatter plots
- Visualizing multivariate data
- Matrix-based visualizations
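Several of the chart types listed above can be drawn in a few lines. The sketch below uses base graphics plus one lattice call, writing every plot to a temporary PDF; the datasets (`mtcars`, `pressure`) ship with R and were chosen by us for illustration, and we assume lattice is installed, as it is with standard R distributions.

```r
library(lattice)  # assumed available; bundled with standard R installs

out <- tempfile(fileext = ".pdf")
pdf(out)                                   # each plot goes on its own page

# Scatter plot of fuel economy against weight.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Line chart of vapor pressure against temperature.
plot(pressure, type = "l")

# Columnar (bar) chart and pie chart of cylinder counts.
barplot(table(mtcars$cyl), xlab = "Cylinders", ylab = "Number of cars")
pie(table(mtcars$cyl))

# Box plot of fuel economy by cylinder count.
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")

# Multivariate view: scatter plot matrix of three variables.
pairs(mtcars[, c("mpg", "wt", "hp")])

# lattice: one scatter panel per cylinder count.
# lattice plots are objects, so print() is needed to draw them in scripts.
print(xyplot(mpg ~ wt | factor(cyl), data = mtcars))

invisible(dev.off())                       # close the PDF device
```

In an interactive session the `pdf()` and `dev.off()` calls can be dropped so that the plots appear on screen instead.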