NYC School Performance, Poverty and Class Size Analysis
March 26, 2015
Considering the importance of education in an increasingly knowledge-based economy, I performed an exploratory data analysis of school performance in relation to various attributes that might influence it, with the following objectives:
- Understand what attributes actually influence school performance.
- Analyse whether our current affirmative action plans and college admission policies reflect that influence.
Scope, Variables and Datasets:
The analysis was restricted to:
- NYC public schools (comprising 32 school districts)
- School district size
- English language learners ratio
SAT score, covering Math, Reading and Writing, was used as the indicator of school performance.
The following datasets were used for the analysis.
- All source datasets were merged by District-id:School-id to create the master file.
- The dataset was centered and scaled because the features are measured on very different scales: for example, poverty ratio is a percentage, class size is in the tens and SAT scores are in the hundreds.
- The dataset was checked for near-zero-variance attributes using the nearZeroVar function so that they could be dropped from the feature set; there were none.
- The dataset was checked for highly correlated variables using the vif function so that they could be dropped from the feature set; there were none.
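
To make the three preprocessing checks above concrete, here is a minimal sketch in plain Python. It is not the code used for this analysis, which called the nearZeroVar and vif functions directly; it only illustrates the ideas behind them. The cutoffs (a frequency ratio of 95/5 and 10% unique values) are assumed to match nearZeroVar's documented defaults, the variance inflation factor is computed as the diagonal of the inverse correlation matrix, and the column names and numbers in the demo are hypothetical, not values from the master file.

```python
from collections import Counter
from statistics import mean, stdev


def standardize(column):
    """Center a numeric column to mean 0 and scale it to sample std 1."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


def near_zero_variance(column, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a column dominated by one value (cutoffs are assumed defaults)."""
    counts = Counter(column)
    top = counts.most_common(2)
    if len(top) == 1:
        return True  # only one distinct value: zero variance
    freq_ratio = top[0][1] / top[1][1]  # most common vs. second most common
    percent_unique = 100.0 * len(counts) / len(column)
    return freq_ratio > freq_cut and percent_unique < unique_cut


def correlation(x, y):
    """Pearson correlation computed from standardized values."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular, which here means
    the predictors are perfectly collinear.
    """
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Variance inflation factor per predictor.

    Uses the identity VIF_j = [R^-1]_jj, where R is the correlation
    matrix of the predictors.
    """
    names = list(columns)
    corr = [[correlation(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    features = {
        "poverty_ratio": [80.0, 65.0, 90.0, 40.0, 55.0, 72.0],
        "class_size": [28.0, 25.0, 30.0, 22.0, 26.0, 24.0],
        "ell_ratio": [12.0, 30.0, 8.0, 15.0, 22.0, 5.0],
    }
    for name, values in features.items():
        print(name, "near-zero variance:", near_zero_variance(values))
    print(vif(features))
```

A VIF close to 1 means a predictor is nearly uncorrelated with the others. Common rules of thumb treat values above roughly 5 or 10 as a sign of problematic collinearity, though the threshold used in this analysis is not stated.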
regsubsets was used for feature selection. The following 3 features, out of 8 in total, were picked by regsubsets as having some influence on SAT scores: