Scraping Google Scholar: Toward an Epistemological Dynamics of Machine Learning
When you want to become a data scientist, you will probably face the same feeling as everyone else. You start by asking Google, data scientists, mathematicians, and computer scientists; you read books, papers, articles, and blogs; you watch YouTube videos; you even ask your professor if you are in school. Unfortunately, you end up with thousands of pieces of information and many different answers, because each source has its own perspective and specialization.
Is there any way to investigate the kingdom of data science objectively?
The paradox is that each of the sources I just mentioned could integrate every area of data science into a fairly good, fully non-exclusive representation.
You can find maps that help you choose algorithms:
But you will always find something that doesn't fit into them.
I personally tried to build my own map, and even though it took me several months to draw, I still feel that it is not enough:
I was looking for something that could show me the entire ecosystem of data science, so that I could understand how its components interact and evolve together. I was looking for a place that could contain everything.
The answer was Google Scholar.
I decided to scrape Machine Learning related publications from Google Scholar, to measure and analyze their properties, and finally to try to get a general overview of what data science is.
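To give an idea of what "measuring the properties" of publications can look like, here is a minimal Python sketch. It is not the code behind this project: the records, the field names (`title`, `year`, `keyword`) and the function `publications_per_year` are all made up for illustration, and the sketch assumes the publications have already been collected into a list of dictionaries. It simply counts publications per keyword and per year.

```python
from collections import Counter, defaultdict

# Hypothetical records standing in for collected publications.
# These are placeholders for illustration, not real results.
records = [
    {"title": "Example paper A", "year": 2012, "keyword": "neural network"},
    {"title": "Example paper B", "year": 2012, "keyword": "neural network"},
    {"title": "Example paper C", "year": 2013, "keyword": "neural network"},
    {"title": "Example paper D", "year": 2012, "keyword": "random forest"},
]


def publications_per_year(records):
    """Count publications for each keyword, broken down by year."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec["keyword"]][rec["year"]] += 1
    return counts


counts = publications_per_year(records)

# Print one line per (keyword, year) pair, sorted for readability.
for keyword in sorted(counts):
    for year in sorted(counts[keyword]):
        print(keyword, year, counts[keyword][year])
```

A table like this is enough to plot how the volume of work on each topic changes over time, which is the kind of overview I was after.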
In this project, I didn't spend much time on different analyses, because the Shiny app will be integrated into a Leaflet map that I am currently building for the capstone project.
Different kinds of plots and analyses are available.
Here is the app: