NYU Data Science newsletter – June 20, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for June 20, 2016


 
Data Science News



Tweet of the Week

Twitter, Pinboard


from June 20, 2016

 

Yewno Exits Stealth Operations with $10 Million Series A Financing to Introduce Hyperknowledge

PR Newswire, Yewno


from June 15, 2016

Yewno, a profoundly new approach to knowledge discovery aimed at enhancing human understanding, today announced it has exited its two-year stealth mode period, during which the Company achieved the completion of its technology infrastructure and initial content ingestion as well as closed a $10 million Series A funding round backed by Pacific Capital. Such investment is being used to accelerate Yewno’s go-to-market activities and continue to scale the business ahead of its public debut.

The company also revealed it has signed more than two dozen partners ranging from top research universities and publishing houses to content aggregators. Yewno’s ingested knowledge base is large and rapidly growing, comprised of nearly 50 million vetted, quality items, some dating back almost 200 years; the result is a unique knowledge discovery platform where concepts are searched and chained to resemble a human inferential cognitive process.

 

In The Future, We’ll Leave Software Bug Hunting to the Machines

VICE, Motherboard


from June 16, 2016

“Being able to protect the entire cyber attack surface that is present in our lives, and doing it entirely manually, is a herculean task,” says David Melski, the vice president of research at computer security software developer GrammaTech.

His company, along with professors from the University of Virginia, is one of seven finalists in DARPA’s Cyber Grand Challenge, a digital capture-the-flag competition sponsored by the US military’s defence research arm that will take place in August at DEF CON, the infamous Las Vegas hacker convention.

Each team has been tasked with designing a tool that can tear apart software, look for vulnerabilities, and then secure them against attack—while at the same time using the vulnerabilities they discover to attack the opposing teams.

 

Robots, swarming drones and ‘Iron Man’: Welcome to the new arms race

The Washington Post


from June 17, 2016

In his quest to transform the way the Pentagon wages war, Defense Secretary Ashton B. Carter has turned to Silicon Valley, hoping its experimental culture, innovation and sense of urgency would rub off on the rigid bureaucracy he runs.

Carter has made several trips to the Valley and appointed Eric Schmidt, the chairman of Google’s parent company, to an advisory board. And recently he sat down at the Pentagon with Elon Musk to see what suggestions the billionaire founder of Tesla and SpaceX might have to make the nation’s military more efficient and daring.

Musk’s answer? “Having an incentive structure that rewards innovation is extremely important,” he said in an interview after the meeting. “It’s economics 101. Whatever you reward will happen.”

The Pentagon finds itself in a new arms race, struggling to keep pace with forms of combat that are fought with bytes as well as bullets.

 

Developing the technical language behind creativity for Canvas

Facebook Code, Engineering Blog; Asad Awan and Rohan Kuruvilla


from June 17, 2016

Creative expression has evolved over time as media has progressed from print to radio, to television, to the internet. As a continuation of this trend, mobile devices today enable more opportunities for creative expression and interaction, powered by novel input gestures, high-resolution displays, convenient form factors, and ambient sensors. However, mobile devices still have several limitations and complexities — such as battery life, heterogeneity of device hardware, and network scarcity — that content creators must contend with. In addition, languages and tools for mobile content creation are still in their infancy, and many advertisers face challenges in telling stories to mobile audiences through engaging and interactive formats.

Our teams built Canvas as a step toward solving these problems.

 

Reweaving the web – A slew of startups is trying to decentralise the online world

The Economist


from June 18, 2016

… a new band of entrepreneurs and venture-capital firms is emerging with a mission to “re-decentralise” the internet. This is not the first time that new technology has pushed against the centralising forces of the internet. In the early 2000s “peer-to-peer” services such as Napster and Kazaa, for instance, allowed users to share music files rather than download them from a central server. But lawsuits from record labels and, in some cases, a failure to find ways to profit from these services meant these technologies ended up being limited to a few services, such as Skype, which offers free internet calling.

If decentralisation is now making a comeback, it is largely because of the rise of bitcoin, a crypto-currency, and its underlying technology, the blockchain. This is a globally distributed database, which is maintained not by a single actor, such as a bank, but collaboratively by many.

 

Galaxy-seeking robots

Gordon and Betty Moore Foundation


from June 15, 2016

Moore Foundation grantees at Lawrence Berkeley National Laboratory have developed a prototype system, designed as a test for a planned array of 5,000 galaxy-seeking robots.

Dubbed ProtoDESI, the scaled-down, 10-robot system will help scientists achieve the pinpoint accuracy needed to home in on millions of galaxies, quasars and stars with the Dark Energy Spectroscopic Instrument (DESI) planned for the Mayall Telescope at Kitt Peak National Observatory near Tucson, Ariz.

More data-related news from the foundations that support this newsletter:

  • The Conference Scene for Data-Driven Discovery (June 21, Medium, Moore Data, Carly Strasser)
  • Four foundations announce support for ASAPbio (June 20, ASAPbio)

    Using Computational Models to Design Better Games

    NYU Tandon School of Engineering


    from June 17, 2016

    The Banff International Research Station (BIRS) may be located in a picturesque resort area of Alberta, Canada, but that bucolic setting belies its core mission: to encourage collaborative, cross-disciplinary research with a focus on the mathematical sciences and their applications. Scientists and mathematicians from around the world vie to hold workshops at the notoriously selective facility, and in May of this year Assistant Professor of Computer Science Andy Nealen, who also co-directs the Game Innovation Lab, was chosen to present “Computational Modeling in Games.” The multi-day conference brought together academics well-versed in researching computational models and paired them with game designers, allowing for design problems to be approached using formal modeling techniques.

     

    Facebook’s Head of AI Wants to Teach Chatbots Common Sense

    WIRED, Business


    from June 16, 2016

    AIs still can’t really understand language, [Yann] LeCun explained.

    Making that happen means teaching computers to learn in much the same way humans do. LeCun points out that babies learn to associate words with objects by simply observing the world around them. It takes at least a couple years, but we humans are able to learn all this with relatively few examples, at least compared to the number of images that LeCun and company feed their computers. “So there’s something we’re missing about human and animal learning,” he says. That missing thing, LeCun explains, is what we might call common sense.

     

    Machines Beat Humans in Hedge Fund Quest to Time Market Bottom

    Bloomberg


    from June 15, 2016

    From the February low, quants tracked by Credit Suisse increased their exposure to stocks by 11 percentage points, while the measure for traditional hedge funds dropped more than 1 point. Over the same period, the S&P 500 rallied as much as 16 percent, boosting the quants’ returns for the year to as much as 8 percent, according to Connors. At the same time, the best of the hedge funds were flat.

    For some, it’s a blow to the cause of human judgment in the stock market. Inputs not deeply steeped in mathematical models, like industry assessments and economic views, caused managers to hold onto bearish bets on energy. Although they were forced to cover as energy prices rallied, shorts remained anywhere from 20 percent to 60 percent of shares available among funds tracked by Credit Suisse prime services.

     
    Events



    HCOMP 2016: Tutorial on Crowdsourced Data Processing: Industry and Academic Perspectives



    This tutorial will synthesize and summarize insights from our recently published book entitled “Crowdsourced Data Management: Industry and Academic Perspectives.” The aims of the book and the tutorial are to narrow the gap between academics and practitioners.

    Austin, TX. Sunday, October 30, before the Fourth AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2016)

     
    Deadlines



    RIT Hockey Analytics Conference

    Rochester, NY. A one-day workshop on September 10, 2016, showcasing state-of-the-art research in analytics for college and professional hockey.

    Deadline for submissions is Friday, July 15.

     

    Call for Questions: Help Shape the Open Data Research Agenda

    The 2016 Open Data Research Symposium (#ODRS16) is crowd-sourcing questions that, if answered, could radically increase our understanding of open data.

    Madrid, Spain. On October 5th, researchers from around the world will come together at the 2nd Open Data Research Symposium (ODRS), a pre-event to the International Open Data Conference.

     
    Tools & Resources



    How to approach selecting a license for data release

    Data Carpentry, Elizabeth Wickes


    from June 13, 2016

    A question recently came up on the SWC Discuss mailing list about how to select a license for publicly released data. Answering these questions without knowing the life story of the data in question can only be done in vast generalities, but there are some nearly universal issues that everyone should consider.

    No person will be able to tell you definitively which license is the best. Your selection will be determined by the kind of data you are using, community standards, and your personal preference. There are pros and cons to permissive versus restrictive licenses, which you will need to evaluate using your own preferences.

     

    Introduction to Machine Learning with Python 

    O'Reilly Media; authors: Andreas C. Mueller, Sarah Guido


    from June 16, 2016

    Introduction to Machine Learning with Python teaches you the basics of machine learning and provides a thorough hands-on understanding of the subject.

    You’ll learn important machine learning concepts and algorithms, when to use them, and how to use them. The book will cover a machine learning workflow: data preprocessing and working with data, training algorithms, evaluating results, and implementing those algorithms into a production-level system.

     

    Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations

    European Journal of Epidemiology


    from May 21, 2016

    Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so—and yet these misinterpretations dominate much of the scientific literature. In light of this problem, we provide definitions and a discussion of basic statistics that are more general and critical than typically found in traditional introductory expositions. Our goal is to provide a resource for instructors, researchers, and consumers of statistics whose knowledge of statistical theory and technique may be limited but who wish to avoid and spot misinterpretations. We emphasize how violation of often unstated analysis protocols (such as selecting analyses for presentation based on the P values they produce) can lead to small P values even if the declared test hypothesis is correct, and can lead to large P values even if that hypothesis is incorrect. We then provide an explanatory list of 25 misinterpretations of P values, confidence intervals, and power. We conclude with guidelines for improving statistical interpretation and reporting.
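The abstract's point about selecting analyses by their P values can be illustrated with a small simulation. This is a minimal sketch and is not taken from the paper: it assumes a two-sided z-test with known unit variance, ten independent analyses per study, and a test hypothesis (no difference between groups) that is true in every case. All function names and parameter values are illustrative choices.

```python
import math
import random
from statistics import fmean


def two_sided_p(sample_a, sample_b):
    """Two-sided z-test p-value for a difference in means.

    Assumes equal group sizes and a known variance of 1 in both groups.
    """
    n = len(sample_a)
    z = (fmean(sample_a) - fmean(sample_b)) / math.sqrt(2.0 / n)
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate(n_studies=2000, n_analyses=10, n_per_group=20, alpha=0.05, seed=0):
    """Return (honest_rate, selected_rate) of results with p < alpha.

    Every data set is drawn with no true effect, so each "hit" is a false
    positive. The honest rate reports one pre-specified analysis per study;
    the selected rate reports the smallest p-value among all analyses.
    """
    rng = random.Random(seed)
    honest_hits = 0
    selected_hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_analyses):
            a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            p_values.append(two_sided_p(a, b))
        honest_hits += p_values[0] < alpha
        selected_hits += min(p_values) < alpha
    return honest_hits / n_studies, selected_hits / n_studies


honest_rate, selected_rate = simulate()
print(f"pre-specified analysis: {honest_rate:.3f}")
print(f"best of ten analyses:   {selected_rate:.3f}")
```

Under these assumptions the pre-specified analysis should cross the 0.05 threshold in roughly 5 percent of studies, while the best-of-ten figure should sit near 1 - 0.95^10, or about 40 percent, even though the test hypothesis holds throughout.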

     

    Interactive Knowledge Extraction (IKE)

    Allen Institute for Artificial Intelligence


    from June 17, 2016

    Recent work on information extraction has suggested that fast, interactive tools can be highly effective; however, creating a usable system is challenging, and few publicly available tools exist. In this paper we present IKE, a new extraction tool that performs fast, interactive bootstrapping to develop high quality extraction patterns for targeted relations. Central to IKE is the notion that an extraction pattern can be treated as a search query over a corpus.
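The idea that an extraction pattern can be treated as a search query over a corpus can be sketched in a few lines. This is only a toy illustration under stated assumptions: it uses a plain regular expression rather than IKE's own query language, and the corpus, the "X such as Y" pattern, and the `run_query` helper are all hypothetical.

```python
import re

# Hypothetical three-sentence corpus; a real corpus would be indexed for speed.
corpus = [
    "Conductors such as copper carry current easily.",
    "Insulators such as rubber resist current.",
    "Copper is mined in Chile.",
]

# The extraction pattern doubles as the query: category "such as" instance.
pattern = re.compile(r"(\w+) such as (\w+)")


def run_query(query, sentences):
    """Run the pattern over every sentence and collect (category, instance) pairs."""
    hits = []
    for sentence in sentences:
        for match in query.finditer(sentence):
            hits.append((match.group(1).lower(), match.group(2).lower()))
    return hits


extractions = run_query(pattern, corpus)
print(extractions)
```

In an interactive setting, a user would inspect the returned pairs, adjust the pattern, and run the query again, which is the bootstrapping loop the abstract describes.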

     
    Careers



    Ask the CTO: Balancing development desire with product goals
     

    O'Reilly Media, Camille Fournier
     

    Accessibility, diversity, and software teams
     

    Andrew J. Ko, Bits and Behavior blog
     

    Scientific Programmer/Science Data Analyst/DevOps Engineer
     

    Association of Universities for Research in Astronomy
     
