# Business School Rankings: Are they really important?

Pursuing an MBA is a dream for many students and one of the most important career decisions a person can make. A full-time MBA has several benefits, but it remains an expensive proposition given the time and money the program demands. I wanted to analyze the full-time MBA program rankings of various business schools and see whether a school's rank or tier really matters in determining the return on investment. My analysis set out to answer the following questions:

- Does the business school's rank impact post-MBA compensation?
- How do international business schools compare with their U.S. counterparts in the overall quality of the program and faculty?
- Are the average GMAT scores and work experience of the incoming cohort the same across all tiers of business schools?

In addition to answering these questions, I wanted to build a machine learning model that predicts whether a given business school is a top-tier school based on the school's admission, faculty, and program-related data.

## Data Preparation & Pre-processing

The business school rankings data were scraped from The Economist website using the Beautiful Soup library in Python. The website lists rankings from 2011 to 2015, and clicking a school's name on the home page leads to further pages with details about the school, program, faculty, tuition cost, and recruitment.

Data extraction was somewhat time-consuming because multiple pages had to be navigated and consolidated. The final data set consisted of 500 observations across five years, with 29 variables.
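The link-following step described above can be sketched roughly as follows. The original scraper used Beautiful Soup; to stay dependency-free, this sketch swaps in Python's built-in `html.parser`. The HTML snippet, the `school-link` class name, and the URLs are all hypothetical and do not reflect the real page structure.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a rankings page; the real structure
# of The Economist site is not reproduced here.
SAMPLE_HTML = """
<table>
  <tr><td>1</td><td><a class="school-link" href="/schools/alpha">Alpha School</a></td></tr>
  <tr><td>2</td><td><a class="school-link" href="/schools/beta">Beta School</a></td></tr>
</table>
"""


class SchoolLinkParser(HTMLParser):
    """Collect (school name, detail-page href) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the anchor currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Only anchors carrying the (assumed) school-link class are of interest.
        if tag == "a" and attrs.get("class") == "school-link":
            self._href = attrs.get("href")

    def handle_data(self, data):
        # Text inside a matching anchor is taken as the school name.
        if self._href is not None and data.strip():
            self.links.append((data.strip(), self._href))

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


parser = SchoolLinkParser()
parser.feed(SAMPLE_HTML)
school_links = parser.links
print(school_links)
```

Each collected link would then be fetched in turn and its detail fields merged into one row per school and year.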

The field names were cleaned up by removing long phrases and underscores, and a new variable was added to reflect the world region of each school's location. The data also contained unwanted HTML tags and stray ASCII characters, which were removed, and the character encoding was converted to UTF-8 (Unicode Transformation Format, 8-bit) to make the text human readable.
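A minimal sketch of these cleaning steps, assuming each scraped row is a plain dictionary. The helper names, the sample record, and the country-to-region lookup are hypothetical; the mapping actually used in the project is not shown here.

```python
import html
import re

# Hypothetical country-to-region lookup, for illustration only.
REGION_BY_COUNTRY = {
    "United States": "North America",
    "France": "Europe",
    "India": "Asia",
    "Australia": "Australia",
}


def clean_field_name(name):
    """Replace underscores with spaces, collapse whitespace, and lowercase."""
    name = name.replace("_", " ")
    return re.sub(r"\s+", " ", name).strip().lower()


def clean_value(raw):
    """Remove HTML tags, decode HTML entities, and trim whitespace."""
    without_tags = re.sub(r"<[^>]+>", "", raw)
    return html.unescape(without_tags).strip()


def add_region(record):
    """Return a copy of the record with a 'region' field added."""
    region = REGION_BY_COUNTRY.get(record.get("country"), "Unknown")
    return {**record, "region": region}


# Made-up raw row as it might come out of the scraper.
raw_record = {"School_Name": "<b>Alpha &amp; Co</b> ", "Country": "France"}
cleaned = {clean_field_name(k): clean_value(v) for k, v in raw_record.items()}
cleaned = add_region(cleaned)
print(cleaned)
```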

## Exploratory Data Analysis

The box plot below shows a clear difference in the distribution of GMAT scores across the tiers. The average GMAT score for admission was highest for Tier 1 schools, followed by the other tiers in the order shown in the plot. The gaps in average GMAT score between Tiers 1 and 2 and between Tiers 3 and 4 are considerably larger than the gap between Tiers 2 and 3.
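A box plot summarizes each group by its quartiles. The sketch below computes that per-tier summary with the standard library, using made-up GMAT averages rather than the scraped data; only the tier labels come from the text.

```python
import statistics

# Made-up GMAT averages for illustration only; not the scraped data.
gmat_by_tier = {
    "Tier 1": [700, 710, 720, 730],
    "Tier 2": [660, 670, 680, 690],
}


def box_stats(values):
    """Return the five-number summary that a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


summary = {tier: box_stats(scores) for tier, scores in gmat_by_tier.items()}
for tier, stats in summary.items():
    print(tier, stats)
```

Comparing the medians and interquartile ranges across tiers gives the same reading as the plot: how far apart the tiers sit and how much they overlap.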

The distribution of average post-MBA compensation also declines gradually from Tier 1 to Tier 4 schools. Based on this plot, we can infer that a business school's tier tends to have a significant impact on post-graduation compensation. There are a few outliers in Tiers 3 and 4 (especially on the lower end) because of Asian schools that report relatively lower compensation. It also seems that those schools may not have factored in the purchasing power of their respective countries when reporting compensation in US dollars.

The distribution of average pre-MBA work experience is positively skewed, with a long right tail and a mean of 5.5 years. Based on further analysis, we can infer that students in Australia and Far East Asia tend to work longer before pursuing a full-time MBA than students in Europe, America, and South Asia.
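Positive skew can be checked numerically with the moment-based skewness coefficient, which is greater than zero when the right tail is longer. A small sketch, with made-up work-experience values that are not from the data set:

```python
def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Made-up average years of work experience, for illustration only.
work_experience_years = [3, 4, 4, 5, 5, 5, 6, 7, 10]
print(skewness(work_experience_years))  # positive for a right-skewed sample
```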

The average annual compensation is approximately normally distributed, with a mean annual salary of around USD 105,000 and no significant extreme values.

## Hypothesis Tests

Hypothesis tests were run to check whether GMAT score, student diversity, faculty quality, and program rating differ significantly across world regions. Using the Analysis of Variance (ANOVA) test, all of these variables except the MBA program rating were found to be statistically significant. This shows that students' perceptions of the overall MBA program at their respective schools are not vastly different, and the ratings across the world are largely in line with each other.
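For readers who want to see what a one-way ANOVA computes, here is a minimal sketch of the F statistic in plain Python. The original analysis was done in R; the function name and the sample groups are assumptions made for illustration. Turning F into a p-value requires the F distribution, which a library routine such as `scipy.stats.f_oneway` handles.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up GMAT averages for three regions, for illustration only.
groups = [[690, 700, 710], [700, 710, 720], [730, 740, 750]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```

A large F means the group means differ by more than the spread within groups would suggest, which is what "statistically significant" refers to here.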

## Predictive Modeling

The next step was to create a logistic regression model using the variables below. Logistic regression is a supervised machine learning algorithm that returns the probability of the response variable taking a particular value based on a combination of values taken by the predictors.

- Average GMAT score
- Post-MBA salary
- Percentage who received a job offer
- Percentage of graduates finding jobs through the school's career services
- Student rating of program
- Student rating of careers service
- Ratio of faculty to students
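To make the "probability from a combination of predictor values" idea concrete, here is a small sketch of the logistic function applied to a weighted sum of predictors. The intercept, coefficients, and feature names are hypothetical and are not the fitted values from this analysis.

```python
import math


def predict_probability(intercept, coefficients, features):
    """Logistic model: sigmoid of intercept plus weighted sum of predictors."""
    linear = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients and inputs, for illustration only.
coefficients = {"avg_gmat": 0.05, "pct_job_offer": 0.03}
school = {"avg_gmat": 700, "pct_job_offer": 90}
probability = predict_probability(-37.0, coefficients, school)
print(probability)
```

A school would be labeled top tier when this probability exceeds a chosen cutoff, commonly 0.5.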

### Model Building & Outcome

The response variable in this algorithm is typically binary. Since the goal of this analysis is to determine whether a business school is a top-tier school, logistic regression is a good candidate for this binary classification task.

Based on a review of the logistic regression model, the variables that were statistically significant (at the 5% significance level) are:

- Average GMAT score
- Post-MBA salary
- Percentage who received a job offer
- Percentage of graduates finding jobs through the school's career services

The statistical significance of these variables indicates that they play a key role in determining whether a school is a top-tier school. The results of the logistic regression can be interpreted as the odds of a school being a top-tier school, as explained in the table below.
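For readers unfamiliar with odds, the general conversion from a logistic regression coefficient to an odds ratio is simply exponentiation. The sketch below illustrates that conversion; the coefficient value is hypothetical and is not taken from the fitted model or its table.

```python
import math


def odds_ratio(coefficient, change=1.0):
    """Multiplicative change in the odds when a predictor rises by `change`."""
    return math.exp(coefficient * change)


# Hypothetical coefficient on average GMAT score, for illustration only.
hypothetical_gmat_coefficient = 0.05
ratio_per_10_points = odds_ratio(hypothetical_gmat_coefficient, change=10)
print(ratio_per_10_points)
```

An odds ratio above 1 means the odds of being top tier rise as the predictor increases, holding the other predictors fixed; a ratio below 1 means they fall.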

### Model Accuracy

The next step in the analysis was to check the accuracy of the model using the confusion matrix. This matrix tabulates the predictions against the actual outcomes for the training and test data sets, which helps determine the percentage of correct predictions.

In the training data set, 61 of the 68 observations were predicted correctly and 7 were misclassified. In the test data set, 29 of the 32 observations were predicted correctly, for an overall prediction accuracy of 90%. Although the model's predictions appear good on this data set, McFadden's pseudo R-squared was about 0.662. This statistic measures the improvement in log-likelihood over an intercept-only model and is only loosely analogous to the share of variability explained in linear regression. There is still some scope for improving the model so that it generalizes well to unseen data sets.
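The accuracy figures follow directly from the counts quoted above, as the short sketch below shows. It also includes the definition of McFadden's pseudo R-squared; the log-likelihood values passed to it are hypothetical, since the original model output is not reproduced here.

```python
def accuracy(correct, total):
    """Share of observations classified correctly."""
    return correct / total


# Counts taken from the text.
train_accuracy = accuracy(61, 68)
test_accuracy = accuracy(29, 32)
overall_accuracy = accuracy(61 + 29, 68 + 32)
print(round(train_accuracy, 3), round(test_accuracy, 3), overall_accuracy)


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - (model log-lik / null log-lik)."""
    return 1.0 - ll_model / ll_null


# Hypothetical log-likelihoods, for illustration only.
print(mcfadden_r2(ll_model=-40.0, ll_null=-100.0))
```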

The R code for the complete analysis as well as the Python code for web scraping can be reviewed at the GitHub repository:

https://github.com/nycdatasci/bootcamp006_project/tree/master/Project3-WebScraping/NandaRajarathinam

## Conclusion

The following insights were derived based on the predictive model and the exploratory data analysis performed:

- The rank or tier of a business school does seem to have an impact on an individual's annual compensation and, ultimately, the post-graduation return on investment.

- GMAT score, annual compensation upon graduation, the percentage of students who received a job offer, and the effectiveness of the school's career services are the four predictors that are statistically significant (at the 5% significance level) in determining whether a school is a top-tier business school.

- The average GMAT score of the cohort is lower in Europe than in the U.S. Also, Asia Pacific and Australian business schools tend to have lower geographical diversity than their counterparts in Europe and the U.S.

- There is no statistically significant difference in students' rating (perception) of the full-time MBA program across the various regions of the world, but there is a statistically significant difference in students' rating of the faculty across business schools.

- Outside the U.S., students tend to have more work experience before pursuing a full-time MBA than students at U.S.-based business schools.

- Based on the principal recruiters' information, consulting seems to be the industry of choice for full-time MBA graduates, followed by technology and financial services.