
We offer Data Science training for individuals and corporations.


Data Science with R: Data Analysis

Details

Instructor

Bryan Valentini, Founder of Kinisi Inc, Adjunct Instructor at NYC Data Science Academy.
Bryan is an entrepreneur at Kinisi, Inc, and a Carnegie Mellon graduate. His development experience comes from building large-scale, delay-tolerant systems, as well as from human-factors research designing rich, usable visualizations across various languages and platforms. When not working with software, he tinkers with small embedded hardware and teaches others about the Raspberry Pi computer. As a graduate of the NYC Data Science program, he enjoys tackling interesting data problems in weather, sports, and civic technology.

Syllabus

Each class consists of 35 hours of classroom instruction, plus an optional three-week showcase project of the student's own choosing and an optional presentation of that project. This intensive class will introduce you to the wonderful world of R and give you a solid understanding of the language, leaving you with a firm foundation to build upon.

The course progresses from the basic building blocks of programming to data manipulation and advanced plotting packages, and concludes with a demonstration of a project of your choice on Project Demo Day. For Demo Day you will access and analyze real data using the tools and skills taught throughout the course.

Upon successful completion of the course, you will qualify for one of three certificates: Extraordinary Standing, Honorable Graduation, or Active Participation. Certificates are awarded according to your understanding, skill, and participation.


1. Basic Programming Elements – 14 hours
Abstract: Students will learn the fundamental characteristics of the R language and acquire essential programming skills that they will apply to later techniques in data handling, analysis, and visualization.
Case Study and Exercises: Use R to solve problems from Project Euler.
Outline:
  • What is R?
  • Why R?
  • How to get help
  • R language resources
  • RStudio
  • Installing and using packages
  • Workspace
  • Data Objects: Vectors, Matrices, Data Frames, and Lists
  • Local data import/export
  • Functions
  • Control Statements
2. Primary Statistical Methods – 7 hours
Abstract: This session covers the essential statistical methods used in data science, focusing on the fundamental building blocks on which more advanced predictive modeling depends.
Outline:
  • Descriptive statistics
  • Hypothesis testing
  • Linear Regression
  • Logistic Regression
  • Introducing non-parametric statistics
3. Data Manipulation – 7 hours
Abstract: This session teaches how to use R for the data conversion and restructuring tasks frequently encountered in the initial stages of data analysis. We will also cover string processing and advanced data capture techniques such as web scraping, API usage, and connections to external databases.
Case Study and Exercises: Find a QQ group (QQ is a widely used Chinese instant-messaging service) and solve a research problem based on text features.
Outline:
  • Data sorting
  • Merging Data
  • Reshaping Data
  • String manipulation
  • Dates and time stamps
  • Web data capture
  • API data sources
  • Connecting to an external database
4. Data Visualization – 7 hours
Abstract: We will quickly cover basic plot types before introducing two advanced plotting packages, lattice and ggplot2. Working with both systems, you will develop an understanding of the fundamental processes behind data visualization and of the options available to a data scientist for describing data through clear, attractive graphics.
Case Study and Exercises: Reproduce famous graphics such as Hans Rosling’s Gapminder visualization.
Outline:
  • Histograms
  • Dot plots
  • Bar charts
  • Line charts
  • Pie charts
  • Box Plots
  • Scatter plots
  • Visualizing multivariate data
  • Matrix-based visualizations
  • Maps

Intended Audience and Prerequisites

Are you more interested in understanding your data than in mastering a programming language? Have you tried learning R from a book or website but become discouraged? If so, this is the course for you.

We assume that you’ve never programmed before (although some experience doesn’t hurt), and we teach you practical tools for analyzing your data. You won’t be a master programmer by the end of this course, but through immersion you will have learned the basics of R’s syntax and grammar and started building an effective R vocabulary for visualizing, transforming, and modeling data.

Recommended Book(s)

Data Manipulation with R

FAQ

1. Can I take the class online if I am not in NYC?
Yes. You can attend onsite or follow the recorded sessions on YouTube, and get timely help from the teaching staff via Google Hangouts or Skype.

2. If I have to miss a session, how can I make it up?
We record all of our classes and make the recordings available to students right after each class. If you miss a class, you can also get extra help through office hours or online support via Google Hangouts or Skype.

3. Do I have to do a project? Is it required for taking this class?
No, the project is optional. Students may choose to spend extra time with the teaching staff on a project of their own choosing. We are happy to offer assistance and arrange a presentation so students can demo their work.

4. Why is R important?
R is a powerful, comprehensive, and dynamic programming language that, since its public release in the mid-1990s, has been on course to eclipse traditional statistical packages as the dominant interface in computational statistics, visualization, and data science. And another thing: it’s free! As an open-source platform, R has grown into an incredibly flexible tool that can be applied to nearly every graphical and statistical problem. The R community continues to add new functionality to the language, and R is often the first statistical tool to support new algorithms and cutting-edge methods in data science.

Get Prepared Before Class

Bring Your Own Laptop
Each participant must bring their own laptop running Windows or OS X.

Install R and an IDE
The software used during this training program, R, is free and readily available for download. Please install R and the RStudio Desktop IDE before the class.

Fill Out a Survey
Please fill out the programming-background survey at bit.ly/nycdatasci. It helps us get to know you so we can meet your needs and adjust the class to students’ backgrounds.

Take Home Material
Attendees receive an electronic copy of the course materials and related R code at the conclusion of the workshop.

Schedule

  • Workshop starts at 10:00am
  • Lunch Break at 12:30pm – 1:15pm
  • Afternoon Coffee Break at 2:45pm – 3:00pm
  • End of the Workshop: 45:00pm
