Scraping Google Scholar: Toward Epistemological Dynamics of Machine Learning

Baptiste Mokas
Posted on Feb 18, 2020

When you want to become a data scientist, you may face the same feeling as everyone else. You start by asking Google, data scientists, mathematicians, and computer scientists; you read books, papers, articles, and blogs; you watch YouTube videos; you may even ask your professor if you are at school. Unfortunately, you end up with thousands of pieces of information and many different answers, because each source has its own perspective and specialization.

Is there any way to investigate the kingdom of data science objectively?

The paradox is that each of the sources I just mentioned could fit every area of data science into a fairly good, yet non-exclusive, representation.

You can find maps that help you choose algorithms:

But you will always find something that doesn't fit in them.

I personally tried to build my own map, and even though it took me a few months to draw, I still feel that it is not enough:

I was looking for something that could show me the entire ecosystem of data science, so that I could understand how its components interact and evolve together. I was looking for a place that could contain everything.

The answer was Google Scholar.

I decided to scrape machine-learning-related publications from Google Scholar to measure and analyze their properties, and ultimately to try to get a general overview of what data science is.
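
As a rough illustration of the kind of measurement this makes possible, here is a minimal Python sketch. It is not the code behind the app. It assumes the scraped publications are already available as a list of records with `title` and `year` fields; the sample records, the field names, and the `count_by_year` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical sample records; in practice these would come from the scraper.
records = [
    {"title": "Deep learning for image recognition", "year": 2016},
    {"title": "Random forest methods in genomics", "year": 2012},
    {"title": "A survey of deep learning", "year": 2018},
    {"title": "Support vector machines revisited", "year": 2012},
]


def count_by_year(records, keyword):
    """Count records per year whose title contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return Counter(r["year"] for r in records if keyword in r["title"].lower())


# Print how many titles mention the keyword in each year, in chronological order.
deep_learning_counts = count_by_year(records, "deep learning")
for year in sorted(deep_learning_counts):
    print(year, deep_learning_counts[year])
```

Counting keyword mentions per year is only one possible property to measure; the same records could be grouped by venue, author, or citation count if those fields were collected.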

In this project, I did not spend much time on different analyses, because the Shiny app will be integrated into a Leaflet map that I am currently building for the capstone project.

Different kinds of plots and analyses are available.

Here is the app:


