Data What Interested Us the Week of February 16, 2015
Project GitHub | LinkedIn: Niki Moritz Hao-Wei Matthew Oren
Big Data: Telling The Story Of Falling Oil Prices
Jiayu Peng, 2/20/15
The fall in oil prices is one of the most prominent topics in the news today, so any data enthusiast will be curious to see what big data can tell us about the oil market.
Recently, the Rebaie Analytics Group analyzed thousands of news articles that discussed oil and gas before and after the fall in oil prices, over a six-month time frame. They used data from GDELT, which monitors broadcast, print, and web news around the world.
Based on these data, the research group constructed a network diagram that shows the “communities” of conversation around “Oil & Gas”. Using data-mining techniques in networks, they identified significant influencers on the oil price, and the people with whom they are most closely connected. These results provide valuable insights and evidence for further interpretations.
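To make the idea of "influencers" in a conversation network concrete, here is a minimal sketch. It is not the Rebaie Analytics Group's method, which the article does not describe in detail. It assumes a simple co-mention network (two names are linked when they appear in the same article) and uses degree centrality as a stand-in for influence. The article list and every name in it are hypothetical placeholders.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: each entry lists the names mentioned together
# in one news article. These are placeholders, not GDELT records.
articles = [
    ["Analyst A", "Minister B", "Company C"],
    ["Minister B", "Company C"],
    ["Minister B", "Trader D"],
    ["Analyst A", "Minister B"],
]


def build_co_mention_graph(articles):
    """Return an adjacency map: name -> {neighbor: number of shared articles}."""
    graph = defaultdict(lambda: defaultdict(int))
    for names in articles:
        # Link every pair of distinct names that appear in the same article.
        for a, b in combinations(sorted(set(names)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


graph = build_co_mention_graph(articles)
centrality = degree_centrality(graph)

# Assumption: the most connected name is treated as the top "influencer".
top_influencer = max(centrality, key=centrality.get)

# The people most closely connected to that influencer, strongest link first.
closest_contacts = sorted(
    graph[top_influencer], key=graph[top_influencer].get, reverse=True
)

print(top_influencer, closest_contacts)
```

A real analysis would likely use a graph library and a community-detection step on top of this. The sketch only shows how co-mentions become a network and how a centrality score ranks its members.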
Reference:
Big Data: Telling the Story of Falling Prices
IBM, G.E. and Others Create Big Data Alliance
Jiayu Peng, 2/20/15
A key element of the big data business is getting what much of computer technology secretly craves: Normality.
A new big data alliance, named the "Open Data Platform", has formed around developing products based on a common core of Hadoop's key components. The members of this alliance, including GE, Hortonworks, IBM, Infosys, Pivotal, and SAS, announced a common set of standards for Hadoop.
Hadoop is perhaps the most widespread framework for distributing, managing, and processing big data. However, the technology has been somewhat difficult to use, and there are concerns that a growing number of Hadoop variants, even ones with only slight differences, could slow down the market. It is therefore beneficial that big companies have teamed up and agreed on common standards for Hadoop.
Reference:
IBM GE and Others Create Big Data Alliance
Tech Companies Unite Open Data Platform
Oracle's new products aim to combine big data from multiple sources
Jiayu Peng, 2/20/15
Oracle announced four new products on Thursday, targeting one of the core challenges in big data efforts: combining data from multiple sources.
Oracle Big Data Discovery, for example, is designed to serve as the "visual face of Hadoop" for business users. With an interface intended to offer an experience as familiar as shopping online, it lets users not just find and explore data from across multiple sources but also analyze it and share the results, all from a single tool.
Another new product is called "GoldenGate for Big Data", a Hadoop-based tool that allows users to stream real-time, unstructured data from heterogeneous transactional systems into big-data systems including Apache Hadoop, Apache Hive, Apache HBase and Apache Flume.
"Oracle gives customers an integrated platform that helps simplify access to all their data, discover new insights, predict outcomes in real time, and keep all their data governed and secure," said Neil Mendelson, vice president of big data at Oracle.
Reference:
Oracle Steps Up its Big Data Push with New Products
Internet of DNA: medicine’s next great advance
Jiayu Peng, 2/20/15
In January, programmers in Toronto began testing a system for trading genetic information with other hospitals. These facilities, in locations including Miami, Baltimore, and Cambridge, U.K., also treat children with so-called Mendelian disorders, which are caused by a rare mutation in a single gene. The system, called MatchMaker Exchange, represents something new: a way to automate the comparison of DNA from sick people around the world.
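As a rough illustration of what "automating the comparison" could look like, here is a minimal sketch. It is not the MatchMaker Exchange protocol or API. It assumes each hospital shares a small record per case (a candidate gene set and a list of observed symptoms) and that a match means two cases from different sites share a candidate gene. The `PatientCase` type, the `find_matches` function, the gene labels, and the case IDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientCase:
    """Hypothetical minimal record one hospital might share about a case."""
    case_id: str
    site: str
    candidate_genes: frozenset
    phenotypes: frozenset


def find_matches(query, cases):
    """Return cases from other sites that share a candidate gene with the query.

    Each result is (case_id, shared genes, number of shared phenotypes),
    ordered so that cases with more shared phenotypes come first.
    """
    matches = []
    for case in cases:
        if case.site == query.site:
            continue  # only compare against other hospitals
        shared_genes = query.candidate_genes & case.candidate_genes
        if not shared_genes:
            continue
        shared_phenotypes = query.phenotypes & case.phenotypes
        matches.append((case.case_id, sorted(shared_genes), len(shared_phenotypes)))
    return sorted(matches, key=lambda m: m[2], reverse=True)


# Placeholder data: gene names and case IDs are made up for illustration.
query = PatientCase("T-1", "Toronto", frozenset({"GENE1"}),
                    frozenset({"seizures", "hypotonia"}))
other_cases = [
    PatientCase("M-7", "Miami", frozenset({"GENE1"}),
                frozenset({"seizures", "hypotonia"})),
    PatientCase("B-3", "Baltimore", frozenset({"GENE2"}),
                frozenset({"seizures"})),
    PatientCase("C-9", "Cambridge", frozenset({"GENE1", "GENE3"}),
                frozenset({"hypotonia"})),
    PatientCase("T-2", "Toronto", frozenset({"GENE1"}),
                frozenset({"seizures"})),
]

matches = find_matches(query, other_cases)
print(matches)
```

Because Mendelian disorders trace back to a single gene, even one matching case at another hospital can be informative, which is why the sketch keeps any case with a shared candidate gene and only uses symptoms for ranking.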
Communication between DNA databases would clearly be beneficial. If a global network of millions of genomes were established, everyone's medical treatment could benefit from the experiences of millions of others. However, technical issues still prevent genomic data from being shared across the web. For example, there are no standard protocols, application programming interfaces (APIs), or file formats for DNA.
Fortunately, scientists are targeting these issues, and the MatchMaker Exchange system is a breakthrough. If successfully built, the Internet of DNA could be medicine’s next great advance.
Reference:
Internet of DNA