This class is a comprehensive introduction to Python for data analysis and visualization. It targets people who have some basic knowledge of programming and want to take it to the next level. It introduces how to work with different data structures in Python and covers the most popular Python data analysis and visualization modules, including NumPy, SciPy, pandas, Matplotlib, and Seaborn. We use the IPython notebook to demonstrate the results of code and to change code interactively throughout the class.
If you have a good knowledge of basic data types (e.g. string, numeric) and data structures (e.g. list, tuple, dictionary), and are familiar with list comprehensions and for/while loops, you are good to go with the Python for Data Analysis and Visualization course. We will cover these basic Python programming topics in the course as well, but at a relatively fast pace.
Unit 1: Introduction to Python
Python is a high-level programming language. You will learn the basic syntax and data structures in Python. We demonstrate and run code within the IPython notebook, a tool that provides a robust and productive environment for interactive and exploratory computing.
- Introduction to the IPython notebook
- Basic objects in Python
- Variables and user-defined functions
- Control flow
- Data structures
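As a rough illustration of the topics listed above, the following sketch combines a user-defined function, a for loop, a list comprehension, and the three core data structures. The function name `summarize` and the sample values are made up for this example and are not taken from the course materials.

```python
def summarize(scores):
    """Return the count, mean, and maximum of a list of numbers."""
    total = 0
    for s in scores:          # control flow: a simple for loop
        total += s
    return {"count": len(scores), "mean": total / len(scores), "max": max(scores)}


scores = [72, 88, 95, 60]         # list: ordered and mutable
point = (3, 4)                    # tuple: ordered and immutable
ages = {"Ann": 31, "Bob": 27}     # dictionary: key-value pairs

# List comprehension: keep only the scores that are at least 70.
passed = [s for s in scores if s >= 70]

summary = summarize(scores)
print(passed)
print(summary)
```

Running the cell in a notebook prints the filtered list and the summary dictionary, which makes it easy to change the inputs and rerun.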
Unit 2: Explore Deeper with Python
Python is an object-oriented programming (OOP) language. Having some basic knowledge of OOP will help you understand how Python code works. More often than not, you will have to deal with data that is dirty and unstructured. You will learn several ways to clean your data, such as applying regular expressions.
- Introduction to object-oriented programming
- How to deal with files
- Running Python scripts
- Handling and processing strings
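To give a feel for how OOP and regular expressions come together when cleaning text, here is a minimal sketch. The class `Cleaner`, its default pattern, and the sample strings are hypothetical and chosen only for illustration; real cleaning rules depend on the data at hand.

```python
import re


class Cleaner:
    """Hypothetical helper that normalizes messy text fields."""

    def __init__(self, pattern=r"[^a-z0-9 ]+"):
        # Compile once; the pattern matches runs of characters that are
        # not lowercase letters, digits, or spaces.
        self.pattern = re.compile(pattern)

    def clean(self, text):
        text = text.strip().lower()
        text = self.pattern.sub(" ", text)        # replace unwanted characters
        return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace


raw = ["  New-York!! ", "san   francisco", "BOSTON\t(MA)"]
cleaner = Cleaner()
cleaned = [cleaner.clean(r) for r in raw]
print(cleaned)
```

Wrapping the rules in a class keeps the compiled pattern together with the method that uses it, so the same object can be reused across many strings.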
Unit 3: Scientific Computation Tools
Two modules for scientific computation make Python powerful for data analysis: NumPy and SciPy. NumPy is the fundamental package for scientific computing in Python. SciPy is an expanding collection of packages addressing scientific computing.
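A small sketch of how the two libraries are typically used together, assuming both are installed: NumPy handles the array arithmetic, and SciPy's `stats.linregress` fits a straight line. The data below is synthetic and made up for this example.

```python
import numpy as np
from scipy import stats

# Synthetic data: y is an exact linear function of x.
x = np.arange(5, dtype=float)   # array([0., 1., 2., 3., 4.])
y = 2.0 * x + 1.0

# Vectorized arithmetic: subtract the mean from every element at once.
centered = x - x.mean()

# Simple linear regression using SciPy.
fit = stats.linregress(x, y)

print(centered)
print(fit.slope, fit.intercept)
```

Note that no explicit loop is needed; operations on NumPy arrays apply element by element.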
Unit 4: Data Visualization
Python can also generate graphics easily using “Matplotlib” and “Seaborn”. Matplotlib is the most popular Python library for producing plots and other 2D data visualizations. Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing statistical graphics.
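As a minimal sketch of a 2D plot, the example below draws a labeled line chart with Matplotlib only. The data and labels are made up, and the `Agg` backend is selected here as an assumption so that the code also runs without a display; in a notebook this line is usually unnecessary because figures appear inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")  # one line with point markers
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line plot")
ax.legend()
```

Because Seaborn builds on Matplotlib, figures and axes created this way can generally be customized with the same methods when Seaborn is used for the drawing.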
Unit 5: Data Manipulation with Pandas
Pandas provides rich data structures and functions for working with structured data. The “DataFrame” object in Pandas is similar to the “data.frame” object in R. Pandas makes data manipulation (filter, select, group, aggregate, etc.) as easy as in R.
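The operations named above (filter, select, group, aggregate) can be sketched on a tiny DataFrame. The column names and values are invented for this example.

```python
import pandas as pd

# A small, made-up table of sales by city and year.
df = pd.DataFrame({
    "city": ["NYC", "NYC", "LA", "LA"],
    "year": [2014, 2015, 2014, 2015],
    "sales": [10, 20, 5, 15],
})

recent = df[df["year"] == 2015]              # filter rows with a boolean mask
subset = recent[["city", "sales"]]           # select columns by name
totals = df.groupby("city")["sales"].sum()   # group by city, then aggregate

print(subset)
print(totals)
```

The result of `groupby(...).sum()` is a Series indexed by city, so an individual total can be looked up with `totals["NYC"]`.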
After 20 hours of structured lectures, students are encouraged to work on an exploratory data analysis project based on their own interests. A project presentation demo will be arranged afterwards.
- Learn Python the Hard Way: http://learnpythonthehardway.org/
- Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
Preparation: How to set up the Python environment
[IMPORTANT] In the class we will use Python 3. If you are following this video to set up your Python environment, please make sure you download the Python 3.x version at the download step, which starts at 1 min 23 s in the video.