Data Science with R: Intro to Data Analysis
Dates: Sept. 28th and Oct. 5th, 12th, 19th, and 26th, 2014
Time: 10:00 a.m. – 5:00 p.m. (7 hours)
Instructor: Charlie Redmon (Project Manager at SupStat Inc., Master's in Linguistics from EFL University, Hyderabad)
Venue: 500 7th Ave, 17th Fl., New York, NY (close to Times Square)
For corporate training or small-group training opportunities, contact email@example.com.
What should I bring?
Be ready to learn! Bring your laptop with a recent version of R installed (preferably 3.0 or later). We also recommend installing the RStudio IDE, but it's not required. Bring the R environment that makes you the most productive!
This intensive Data Science with R – Beginner Level course, offered by NYC Data Science Academy, runs for five weeks. It introduces you to the wonderful world of R and gives you a solid understanding of the language, leaving you with a firm foundation to build upon.
Why R is important
R is a powerful, comprehensive, and dynamic programming language that, since its release in 1996, has been on course to eclipse traditional statistical packages as the dominant interface in computational statistics, visualization, and data science. And another thing: it's free! As an open-source platform, R has grown into an incredibly flexible tool that can be applied to nearly every graphical and statistical problem. The R community continues to add new functionality to the language, and R is often the first statistical tool to support new algorithms and cutting-edge methods in data science.
Project Demo Day and Certificates
The course moves from the basic building blocks of programming through data manipulation and advanced drawing packages. It concludes with Project Demo Day, where you will present a project of your choice. For Demo Day you will access and analyze real data using the tools and skills taught throughout the course. Upon successful completion of the course, you will qualify for one of three certificates: Extraordinary Standing, Honorable Graduation, or Active Participation.
Certificates are awarded according to your understanding, skill, and participation.
1. Basic Programming Elements: 14 hours
Abstract: Students will learn the fundamental characteristics of the R language and acquire the essential programming skills that later sessions on data handling, analysis, and visualization build on.
Case Study and Exercises: Use the R language to solve problems from Project Euler.
- What is R?
- Why R?
- How to get help
- R language resources
- Installing and using packages
- Data Objects: Vectors, Matrices, Data Frames, and Lists
- Control Statements
2. Getting Data: 7 hours
Abstract: This session covers the various ways R reads data, taking participants from the basics of reading local data to web crawling and connecting to databases through SQL.
Case Exercise: Web Crawling
- Local data
- Web data capture
- API data sources
- Connect to an external database
- Accessing local documentation
- Other data sources
- Data export
3. Data Manipulation: 7 hours
Abstract: This session teaches you how to manipulate data and how to use R for the many data conversion and restructuring tasks frequently encountered in the initial stages of data analysis. We will also cover string processing operations.
Case Study and Exercise: Find a QQ group (QQ is a widely used Chinese instant messaging service) and solve a research problem based on the text features of its messages.
- Data sorting
- Merging data
- Summarizing data
- Reshaping data
- String manipulation
- Dates and time stamps
4. Data Visualization: 7 hours
Abstract: We will quickly cover basic plot types before introducing two advanced drawing packages, lattice and ggplot2. Working with these two graphing schemes, you will develop an understanding of the fundamental processes behind data visualization and of the options available to data scientists for describing their data through clear and beautiful visualizations.
Case Exercises: Reproducing famous graphics like Hans Rosling’s Gapminder visualization.
- Point graphics
- Bar charts
- Line charts
- Pie charts
- Box plots
- Scatter plots
- Visualizing multivariate data
- Matrix-based visualizations