NYU Data Science newsletter – July 13, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for July 13, 2015

GROUP CURATION: N/A

 
Data Science News



Dataphoric: Learn Data Science the Hard Way

Dataphoric


from June 27, 2015

So you want to be a Data Scientist? The good news is that there are tons of great resources out there to learn from. The bad? None is comprehensive, and choosing the best can be completely overwhelming. I created this list to help you stay focused on learning what’s important, the easiest way possible.

But it won’t be easy…

 

A dataset of every Reddit comment

Hacker News


from July 11, 2015

Someone has already loaded it into Google BigQuery – https://bigquery.cloud.google.com/table/fh-bigquery:reddit_comments.2015_05

Link to original Reddit thread – https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/
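Below is a minimal sketch of querying that table from Python. It assumes the table is still published at the address above, rewritten in standard-SQL form (`fh-bigquery.reddit_comments.2015_05`), and that it has a `subreddit` column. It also assumes the `google-cloud-bigquery` client library and working Google Cloud credentials. The query-building helper and the per-subreddit count are illustrative choices, not anything described in the original thread.

```python
# Table from the link above, written as project.dataset.table for standard SQL.
# Assumption: the table still exists and exposes a `subreddit` column.
TABLE = "fh-bigquery.reddit_comments.2015_05"


def top_subreddits_sql(table=TABLE, limit=10):
    """Build a standard-SQL query that counts comments per subreddit."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT subreddit, COUNT(*) AS n_comments\n"
        f"FROM `{table}`\n"
        "GROUP BY subreddit\n"
        "ORDER BY n_comments DESC\n"
        f"LIMIT {limit}"
    )


def run_query(sql):
    """Execute the query with google-cloud-bigquery (needs credentials and billing)."""
    # Imported lazily so the SQL builder above works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [(row["subreddit"], row["n_comments"]) for row in client.query(sql).result()]


if __name__ == "__main__":
    # Print the query only; call run_query() yourself once credentials are configured.
    print(top_subreddits_sql())
```

Keeping the SQL construction separate from execution lets you inspect or dry-run the query before it scans a large table, which matters because BigQuery bills by bytes processed.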

 

Preface to Python Data Science Handbook (Early Release)

O'Reilly Publishing, Jake VanderPlas


from July 06, 2015

This is a book about doing data science with Python, which immediately begs the question: what is data science? It’s a surprisingly hard definition to nail down, especially given how ubiquitous the term has become. Vocal critics have variously dismissed the term as a superfluous label – after all, what science doesn’t involve data? – or a simple buzzword which only exists to salt resumes and catch the eye of overzealous tech recruiters.

In my mind, these dismissals miss something important. Data science, despite its hype-laden veneer, is perhaps the best label we have for the cross-disciplinary set of skills that are becoming increasingly important in many applications across industry and academia.

 

Statistical Advice for A/B Testing

Insight Data Science


from July 09, 2015

A/B testing is awesome. It’s fun, it’s lucrative, and it’s an extremely visible and impactful way that you can create value as a data scientist. It’s both thrilling and deeply satisfying to see a change you proposed make a multi-million dollar difference. If only you could get paid on commission!

Before you write that email to your boss asking for a raise, though, it’s worth making sure that your A/B test evaluation process is correct. It would be… unfortunate if it turned out that your decision to color all your call to action buttons hot pink wasn’t worth the “mad stacks” that you claimed, and was in fact actively harmful. To avoid such embarrassments, you’d like to implement some sound statistical practices for evaluating your A/B tests.

Unfortunately, good statistical methods for A/B testing are more complicated than they might seem at first. (Check out the whitepaper "Most Winning A/B Test Results are Illusory" and Evan Miller's "How Not to Run An A/B Test" for some interesting examples.) Statistical errors are easy to make, and these mistakes can fatally bias your A/B testing program. So here are four recommendations for avoiding some common statistical difficulties, and for achieving a successful and sound A/B test evaluation plan. Happy testing!
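The excerpt does not list the article's four recommendations, so the following is only a generic sketch of one common fixed-horizon approach, not the article's method. It chooses a per-variant sample size in advance, which avoids the repeated "peeking" that Evan Miller's essay warns about, and then evaluates the finished test with a two-sided, pooled two-proportion z-test. The conversion counts and rates in the example are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_base, p_target, alpha=0.05, power=0.8):
    """Visitors needed per variant to detect p_base -> p_target (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_base - p_target) ** 2)


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both variants share one rate, estimated by pooling.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical plan: baseline 2.0% conversion, hoping to detect a lift to 2.5%.
    print("visitors per arm:", sample_size_per_arm(0.02, 0.025))
    # Evaluate once, after the planned sample has been collected.
    z, p = two_proportion_z_test(200, 10_000, 250, 10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

The key design choice is to fix the sample size before the test starts and look at the p-value only once. Checking repeatedly and stopping at the first "significant" result inflates the false-positive rate well beyond the nominal `alpha`.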

 

Oracle Data Capture Makes Utilities’ Metering Operations More Efficient

Environmental Leader


from July 08, 2015

Oracle Utilities has launched DataConnect, a data extraction feature for Oracle Utilities Customer Care and Billing and Oracle Utilities Meter Data Management that allows utilities to more easily leverage data across their systems, including those provided by third-party vendors.

This new tool exports customer and usage information for use in downstream applications, enabling utilities to derive greater value from their data and provide new offerings, such as conservation programs and audit tools, which require access to consistent and accurate data.

 

Systems biology: Network evolution hinges on history

Nature


from July 08, 2015

The effects of mutations in proteins can depend on the occurrence of previous mutations. It emerges that such historical contingency is also important during the evolution of gene regulatory networks.

 

European labs set sights on continent-wide computing cloud

Nature News & Comment


from July 08, 2015

From astronomy to genomics, scientists are increasingly storing and studying their data sets on shared remote ‘cloud’ computing servers, accessed through the Internet. Three of Europe’s biggest research labs now want to help academics by working with commercial firms to create a continent-wide cloud-computing portal — and they are hoping to get backing from the European Commission.

Many researchers find cloud computing to be more flexible and efficient than buying expensive hardware — they can rent servers from firms such as Amazon and Google when they need a burst of power for an intensive computation, for example (see Nature 522, 115–116; 2015). Despite the advantages, some academics are concerned about security and reliability when storing their data on outside servers, says Bob Jones, a computer scientist at CERN, Europe’s particle-physics lab near Geneva, Switzerland.

 

Cutting cost and power consumption for big data

MIT News


from July 10, 2015

… at the International Symposium on Computer Architecture in June, MIT researchers presented a new system that, for several common big-data applications, should make servers using flash memory as efficient as those using conventional RAM, while preserving their power and cost savings.

The researchers also presented experimental evidence showing that, if the servers executing a distributed computation have to go to disk for data even 5 percent of the time, their performance falls to a level that’s comparable with flash, anyway.

In other words, even without the researchers’ new techniques for accelerating data retrieval from flash memory, 40 servers with 10 terabytes’ worth of RAM couldn’t handle a 10.5-terabyte computation any better than 20 servers with 20 terabytes’ worth of flash memory, which would consume only a fraction as much power.
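The figures above can be checked with a little arithmetic. A 10.5-terabyte computation on clusters with 10 terabytes of RAM leaves 0.5 / 10.5 ≈ 4.8% of the data outside memory, which is right at the roughly 5 percent disk-access level the researchers describe. The sketch below assumes that accesses are spread evenly across the working set, so the spilled share of data approximates the share of accesses that miss RAM. That assumption is mine, not the article's. The latency parameters are placeholders for your own measurements, and no real hardware figures are implied.

```python
def spill_fraction(working_set_tb, ram_tb):
    """Share of the working set that cannot be held in RAM."""
    if working_set_tb <= 0 or ram_tb < 0:
        raise ValueError("working set must be positive and RAM non-negative")
    return max(0.0, (working_set_tb - ram_tb) / working_set_tb)


def mean_access_time(miss_fraction, fast_latency, slow_latency):
    """Expected latency per access when miss_fraction of accesses hit the slow tier.

    Assumes uniform access, so miss_fraction can be taken from spill_fraction().
    """
    return (1 - miss_fraction) * fast_latency + miss_fraction * slow_latency


if __name__ == "__main__":
    f = spill_fraction(10.5, 10.0)
    print(f"{f:.1%} of the data spills out of RAM")
```

Because the weighted average is dominated by `slow_latency` whenever the slow tier is orders of magnitude slower, even a miss fraction of a few percent can pull cluster-wide performance down toward that of the slower medium. This is the intuition behind the comparison with flash.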

 

How SAP plans to bring analytics to soccer

CIO


from July 07, 2015

SAP and City Football Group have announced a global, multi-year partnership that will deliver data analytics to every level of CFG and its international football clubs, from business operations to fan engagement to player and team performance.

 

Harnessing computer power to understand biology

Science Careers


from July 07, 2015

An ability to combine computation and experimentation to discover new insights about the regulation of gene expression and assembly of protein complexes recently won Sarah Teichmann a 2015 EMBO Gold Medal. The award recognizes outstanding scientific achievements from young researchers in Europe in the field of molecular biology. … Science Careers asked Teichmann how she gained her skills and abilities and what doors they opened to her. This interview has been edited for clarity and brevity.

 
