Another good piece on the importance of ‘big data’ and urgent need of people know how to work with them (Spotted by William). It has a nice video. too.
Steve O’Grady (@sogrady) , a developer-focused analyst from RedMonk, views large-scale data collection and aggregation as a problem that has largely been solved. The tools and techniques required for the Googles and Facebooks of the world to handle what he calls “datasets of extraordinary sizes” have matured. In O’Grady’s analysis, what hasn’t matured are methods for teasing meaning of this data that are accessible to “ordinary users.”
- O’Grady on the challenge of big data: ”Kevin Weil (@kevinweil) from Twitter put it pretty well, saying that it’s hard to ask the right question. One of the implications of that statement is that even if we had perfect access to perfect data, it’s very difficult to determine what you would want to ask, how you would want to ask it. More importantly, once you get that answer, what are the questions that derive from that?”
- O’Grady on the scarcity of data scientists: ”The difficulty for basically every business on the planet is that there just aren’t many of these people. This is, at present anyhow, a relatively rare skill set and therefore one that the market tends to place a pretty hefty premium on.”
- O’Grady on the reasons for using NoSQL: ”If you are going down the NoSQL route for the sake of going down the NoSQL route, that’s the wrong way to do things. You’re likely to end up with a solution that may not even improve things. It may actively harm your production process moving forward because you didn’t implement it for the right reasons in the first place.”
Seems to be exactly what we are doing here:
Yahoo! Labs in Barcelona (Spain) has several openings for postdoc positions in data mining, information retrieval, HCI, and visualization with emphasis in the areas of social-network analysis and topic modeling, within the context of EU-funded projects.
Yahoo! Labs is pioneering the new sciences underlying the web. As the center of scientific excellence for Yahoo!, Yahoo! Labs delivers both fundamental and applied scientific leadership through published research and new technologies powering the company’s products.
Selected candidates will be working together with our scientists to develop novel algorithms for processing and mining extremely large amounts of data coming from user action logs and social networks. Successful candidates should have demonstrated their ability to perform high-quality original research in the areas of data mining, social-network and social-media analysis, information-propagation analysis, social-influence analysis, and/or topic modeling. Candidates should also be proficient in developing proof-of-concept prototypes and experimenting with novel algorithms in a variety of platforms for processing very large data sets.
- Ph.D. degree in Computer Science;
- Proficiency in both written and spoken English;
- Strong problem-solving and analytical skills;
- Proven ability to perform high-quality original research in any of the areas above (data mining, information retrieval, HCI, visualization) both independently and as a member of a team;
- Experience in research and/or development of applications in the areas of social networks and/or social media and/or topic modeling;
- Proven ability to develop proof-of-concept prototypes for experimenting with novel algorithms;
- Experience in distributed processing of large data sets (Hadoop/Pig a plus)
- Proficiency in at least one of Python, Perl, C/C++, Java;
- Experience with UNIX systems;