Data Comparison Between Portuguese Schools

Posted on Jul 19, 2016
Contributed by Shuo Zhang. She is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program taking place from July 5th to September 23rd, 2016. Please refer to the following link for the R code:


Data shows education is a key factor for achieving long-term economic growth. Determinants of students’ performance have been the subject of ongoing debate among educators, academics, and policy makers.

This study focuses on secondary education in Portugal. Over recent decades, the Portuguese education level has improved. In secondary schools, the core classes of Mathematics and Portuguese (the native language) are the most important, since they provide fundamental knowledge for success in the remaining school subjects (e.g. physics or history).

Data on student performance in Mathematics and Portuguese holds valuable information that parents and schools can use to make better decisions and improve student success. Modeling student performance is an important tool for educators, parents and students alike. It can help us better understand what drives performance and ultimately improve it.

Data Set description

This data set provides information about student achievement in two Portuguese secondary schools. The data attributes include student grades and demographic, social and school-related features. It was collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). The goal is to investigate the contributing factors associated with G3 (the final year grade).

The raw data contains 382 observations and 53 variables. The target variables are G3.x (the final year grade in Math) and G3.y (the final year grade in Portuguese). The contributing factors are presented in 3 categories: school-related (e.g. extra educational support from the school), student-related (e.g. past course performance, age, study time, desire to pursue higher education) and family-related (e.g. parents' status, quality of family relationships, parents' education and job). I analyzed most variables and listed the top contributing factors in the following illustration.

Data Visualization in R

Which school has better student performance?

The boxplots of the final year grade distribution show the difference in student performance by school. In this graph we can see that for the GP school the median final year grade for Math is C and the median final year grade for Portuguese is B, while for the MS school the median final year grade for Math is D and the median final year grade for Portuguese is C. We can conclude that the GP school has better student performance. In the following analysis, I will separate the plots by school.
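The median comparison behind these boxplots can be sketched in a few lines. The original analysis was done in R; this is a minimal Python illustration using only the standard library. The records are made-up values on an assumed 0-20 numeric scale, not the study's data, and `median_by_group` is a hypothetical helper. The mapping from numeric grades to the letter grades quoted above is not given in the post, so the sketch stays with numeric grades.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (school, final Math grade G3 on an assumed 0-20 scale).
# These values are invented for illustration only.
records = [
    ("GP", 12), ("GP", 15), ("GP", 10), ("GP", 14),
    ("MS", 9), ("MS", 11), ("MS", 8),
]

def median_by_group(rows):
    """Return {group: median of values} for an iterable of (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {group: median(values) for group, values in groups.items()}

medians = median_by_group(records)
print(medians)
```

With real data, the same grouping would be run once for the Math grade and once for the Portuguese grade.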

Does the current student performance have a correlation with the past?

The next question is whether G1 and G2 (the first- and second-period grades) are strongly related to G3. Let's take Math performance as an example.

A couple of interesting facts show up in these graphs. The data falls into two patterns: a cluster with a final grade of 0 (students who dropped the course) and a strong correlation between G3 and G2, G1 (students who did not drop the course). The analysis is therefore divided into two parts based on these patterns.

Students who did not drop a class:

The figure shows a linear relationship between the current grade and the past grades: the better a student did in the first and second periods, the higher the final year grade they tend to get.
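One way to quantify this linear relationship is to compute a Pearson correlation after setting aside the students whose final grade is 0. The sketch below is a Python stand-in (the post's own code is in R) with invented `(G1, G2, G3)` triples; `pearson` is a hypothetical helper written out by hand so that the example needs nothing beyond the standard library.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical (G1, G2, G3) grades; invented values, not the study's data.
grades = [
    (10, 11, 11), (14, 15, 15), (8, 9, 10), (12, 12, 13),
    (7, 0, 0), (9, 8, 0),  # final grade of 0: treated as dropped
]

# Keep only students who completed the course (G3 > 0).
completed = [g for g in grades if g[2] > 0]
g1, g2, g3 = zip(*completed)

r_g1 = pearson(g1, g3)  # correlation of first-period grade with final grade
r_g2 = pearson(g2, g3)  # correlation of second-period grade with final grade
print(round(r_g1, 3), round(r_g2, 3))
```

Filtering before computing the correlation matters: leaving the zero-grade cluster in would pull the coefficient down and hide the linear trend among students who finished the course.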

Students who dropped a class:

On closer inspection, the group with a grade of 0 most likely consists of students who dropped the course. Two facts from the previous graph support this. First, these students have G1 and/or G2 grades but a final grade of 0. Second, there are no G1 grades of 0, but there are G2 grades of 0.


The graph shows that the students who dropped at G3 had failing grades at both G1 and G2. Further investigation of the data shows that 13 students dropped at G2, 39 students dropped at G3, and all of the students who dropped at G2 also dropped at G3.
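Counts like these can be checked by collecting the students with a grade of 0 in each period and comparing the resulting sets. The following is a hedged Python sketch (not the author's R code) on invented rows; the student ids and grades are hypothetical, and treating a grade of 0 as "dropped" follows the interpretation used in this post.

```python
# Hypothetical rows: (student_id, G1, G2, G3). Invented for illustration.
students = [
    (1, 10, 11, 11),
    (2, 6, 0, 0),
    (3, 7, 5, 0),
    (4, 13, 14, 14),
    (5, 5, 0, 0),
]

# Students with a grade of 0 in each period (interpreted here as "dropped").
zero_g1 = {sid for sid, g1, g2, g3 in students if g1 == 0}
zero_g2 = {sid for sid, g1, g2, g3 in students if g2 == 0}
zero_g3 = {sid for sid, g1, g2, g3 in students if g3 == 0}

print(len(zero_g1), len(zero_g2), len(zero_g3))

# Set comparisons show how the groups overlap in this toy data.
g2_within_g3 = zero_g2 <= zero_g3  # every G2 zero is also a G3 zero
g3_within_g2 = zero_g3 <= zero_g2  # every G3 zero is also a G2 zero
print(g2_within_g3, g3_within_g2)
```

Comparing sets of student ids, rather than only the counts, is what shows whether one group is fully contained in the other.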

Is student performance affected by past class failure?


The graph shows that past class failures are associated with lower current performance; in short, successful students tend to have a history of success.

Does student performance change based on age?


From this graph, we can conclude that the age of the student is also a factor in the final year grade. The older the student is, the lower the final year grade they are likely to achieve.

Do students who want to pursue higher education do better at school?


In terms of study motivation, students with a desire to pursue higher education tend to achieve higher final year grades.

Is it true that the more time a student spends studying, the greater their chances of getting a higher grade?


The graph shows an association between study time and the final year grade; successful students tend to spend more time on coursework.

Does absence relate to student performance?


It is hard to see a clear relationship between the number of school absences and the final year grade from this plot alone. To make the pattern easier to read, I grouped the number of school absences into 4 categories: 0-9, 10-19, 20-29, and 30+.


The new graph shows that successful students tend to have fewer school absences.
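The grouping of absences into the four categories above can be sketched as follows. This is an illustrative Python version rather than the R code used for the post; `absence_bin` is a hypothetical helper and the `(absences, G3)` rows are invented values.

```python
from collections import defaultdict
from statistics import mean

def absence_bin(n_absences):
    """Map a non-negative absence count to one of: 0-9, 10-19, 20-29, 30+."""
    if n_absences < 0:
        raise ValueError("absence count cannot be negative")
    if n_absences >= 30:
        return "30+"
    low = (n_absences // 10) * 10
    return f"{low}-{low + 9}"

# Hypothetical (absences, final grade G3) rows; not the study's data.
rows = [(0, 14), (4, 12), (12, 11), (18, 9), (25, 8), (40, 7)]

# Collect final grades per absence category, then average each category.
grades_by_bin = defaultdict(list)
for absences, g3 in rows:
    grades_by_bin[absence_bin(absences)].append(g3)

mean_by_bin = {label: mean(values) for label, values in grades_by_bin.items()}
print(mean_by_bin)
```

Binning trades detail for readability: individual absence counts are sparse at the high end, so grouping them makes the trend across categories easier to compare.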

Do the parents' education and job influence student performance?

Let's take the mother's education and job as an example.


The boxplots on the left of the graph indicate that students with working mothers tend to have better course performance than those with stay-at-home mothers. Looking further, the boxplots on the right of the graph show that students whose mothers have a higher education level are more likely to succeed in their courses.

Furthermore, plotting the mother's job against education shows how the job distribution varies across education levels. Working mothers include a greater share with higher education levels, and mothers who work as teachers have the highest education level on average. In conclusion, students with a working, well-educated mother tend to be more successful.

What is the top consideration when choosing a school?


From the graph we can see that the first consideration is the quality of the school's courses and the second is whether the school is close to home.


I have visualized secondary school student grades in two core classes (Mathematics and Portuguese) using past school grades (first and second periods) along with demographic and school-related data. In conclusion, student achievement is strongly associated with previous performance. Other relevant factors also contribute to student performance, including school-related, student-related and family-related ones (e.g. the student's age, study time, desire to pursue higher education, and the parents' jobs and education). The conclusions are summarized below:

  • GP school has more successful students than MS.

  • The final year grade is strongly related to the first- and second-period grades. Students are more likely to drop a course if they've had bad initial grades in that course.

  • Successful students tend to be younger, have a history of success and a desire to continue on to higher education, be absent less often, and spend more time on coursework.

  • Successful students tend to have working and well-educated parents.

Future work

If more data were available on student performance from more schools in the same community and in more subjects (e.g. history), the analysis could be more accurate.

About Author

Shuo Zhang graduated from Columbia University with a Ph.D degree in Chemical Engineering and the focus of her academic research was to design a protocol to synthesize layer-by-layer polymer films on nano-surfaces, investigate dynamics and kinetics, construct quantitative...