Data Science newsletter – August 31, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for August 31, 2017

GROUP CURATION: N/A

 
 
Data Science News



AI, Algorithms and the Future of Finance: Companies Use Deep Learning to “Read the Tea Leaves” in Market Data

Newsweek, Ian Allison



Henri Waelbroeck, director of research at Portware, a machine learning-based trade execution system, says rather poetically that the system “reads the tea leaves” in market data to distinguish different sorts of orders and execute trades more efficiently.

Portware uses artificial intelligence to help traders select the best algorithm for particular market conditions, asset class, broker, venue, etc., interacting with the order flow and computing a mind-boggling array of variables in real time.


NERSC Scales Scientific Deep Learning to 15 Petaflops

HPC Wire, Rob Farber



A collaborative effort between Intel, NERSC and Stanford has delivered the first 15-petaflops deep learning software running on HPC platforms; according to the paper's authors, it is, to the best of their knowledge, currently the most scalable deep learning implementation in the world.


Making data centers more energy efficient

MIT News



Most modern websites store data in databases, and since database queries are relatively slow, most sites also maintain so-called cache servers, which store the results of common queries for faster access. A data center for a major web service such as Google or Facebook might have as many as 1,000 servers dedicated just to caching.

Cache servers generally use random-access memory (RAM), which is fast but expensive and power-hungry. This week, at the International Conference on Very Large Databases, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are presenting a new system for data center caching that instead uses flash memory, the kind of memory used in most smartphones.
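
As a rough sketch of the caching pattern described above (an illustrative toy, not CSAIL's system; the run_query callback and the capacity are hypothetical stand-ins): keep the results of common queries in fast memory, hit the database only on a miss, and evict the least recently used entry when full.

```python
# Minimal, hypothetical sketch of an LRU cache of query results.
from collections import OrderedDict

class QueryCache:
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.store = OrderedDict()  # query -> result, kept in recency order

    def get(self, query, run_query):
        if query in self.store:
            self.store.move_to_end(query)   # cache hit: refresh recency
            return self.store[query]        # no database round trip needed
        result = run_query(query)           # cache miss: ask the database
        self.store[query] = result
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used entry
        return result
```

The CSAIL system keeps this same lookup pattern but backs the store with flash rather than RAM, trading some speed for far lower cost and power draw.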


DeepL Translator

DeepL



In blind tests pitting DeepL Translator against the competition, translators prefer our results by a factor of 3:1.


Bioinformatics Core Competencies for Undergraduate Life Sciences Education

bioRxiv; Mark A. Pauley et al.



Bioinformatics is becoming increasingly central to research in the life sciences. However, despite its importance, bioinformatics skills and knowledge are not well integrated in undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing genomic research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of life sciences faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists, as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent’s degree of training, time since degree earned, and/or the Carnegie classification of the respondent’s institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate life sciences students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula.


Information management: Data domination

Nature Research, Gaia Donati and Chris Woolston



Software programming, algorithm development and other technological skills can give scientists an edge in their fields.

Karthik Ram had to reinvent himself in 2009, as have many other scientists in this data-driven age. When he started his postdoctoral work on how climate change affects elk in Yellowstone National Park in Wyoming, he thought of himself as an ecologist. But interpreting data from satellites and the tracking collars used to follow the animals pushed him to expand that mindset.

To make sense of the shifting ecosystem, he had to hone his programming and learn how to manage mountains of information — skills that have changed the way he views himself and his career. “I use the term ‘ecologist’ less and less often,” he says. “Now, I mainly call myself a data scientist.”


Tweets of Contention: How Social Media is Changing Political Conflict

Political Violence at a Glance, Thomas Zeitzoff



How is social media changing political conflict, and how should scholars and policymakers analyze its effects on conflict and contentious politics? In a recent article, I summarize the current state of research and provide a blueprint for understanding advances in communication technology and their effects on conflict, organized around the four points below.

1) Lowered Barriers of Communication


Microsoft and Amazon want Cortana and Alexa to talk to each other

Digital Trends, Kyle Wiggers



On Wednesday, August 30, Amazon and Microsoft announced a partnership that will see the two companies’ AI-powered assistants work together across devices, software, and core services.

In the coming months, you’ll be able to invoke Cortana on Alexa devices with the command, “Alexa, open Cortana,” and pull up Alexa on Windows 10 computers with, “Cortana, open Alexa.” But that’s just the start. Eventually, Alexa-powered Echo speakers, smartphone apps, and smartwatches will gain tight integration with Office 365, Outlook, Exchange, and other Microsoft productivity platforms.


Do We Need a Speedometer for Artificial Intelligence?

WIRED, Tom Simonite



Microsoft said last week that it had achieved a new record for the accuracy of software that transcribes speech. Its system missed just one in 20 words on a standard collection of phone call recordings—matching humans given the same challenge.

The result is the latest in a string of recent findings that some view as proof that advances in artificial intelligence are accelerating, threatening to upend the economy. Some software has proved itself better than people at recognizing objects such as cars or cats in images, and Google’s AlphaGo software has overpowered multiple Go champions—a feat that until recently was considered a decade or more away. Companies are eager to build on this progress; mentions of AI on corporate earnings calls have grown more or less exponentially.

Now some AI observers are trying to develop a more exact picture of how, and how fast, the technology is advancing. By measuring progress—or the lack of it—in different areas, they hope to pierce the fog of hype about AI. The projects aim to give researchers and policymakers a clearer-eyed view of which parts of the field are advancing most quickly and what responses that progress may require.
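
A note on the metric behind “missed just one in 20 words”: that is a word error rate (WER) of about 5 percent. WER is conventionally computed as word-level edit distance against a reference transcript; here is a minimal sketch (the example sentences are invented):

```python
# Minimal sketch: word error rate as Levenshtein distance over words,
# counting substitutions, insertions, and deletions against the reference.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                     # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                     # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[-1][-1] / len(ref)

# Hypothetical example: one substitution in six words, WER = 1/6 ≈ 0.17.
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```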

 
Events



USACM Panel on Algorithmic Transparency and Accountability

The Association for Computing Machinery US Public Policy Council



Washington, DC; Thursday, September 14, at 9 a.m.; National Press Club [free, registration required]

 
NYU Center for Data Science News



Which encoding mechanism is best for Chinese, English, Japanese, and Korean?

Medium, NYU Center for Data Science



CDS's founding director Yann LeCun and Ph.D. student Xiang Zhang produce the first systematic study of 473 encoding models for text classification on 14 multilingual data sets
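
As a rough illustration of what is being compared (a toy example, not the paper's pipeline): the same multilingual string can be presented to a classifier as UTF-8 bytes or as Unicode characters, a choice that trades vocabulary size against sequence length.

```python
# Toy illustration of two encoding choices compared in the study:
# byte-level vs. character-level representations of multilingual text.
text = "データ science 数据 데이터"   # Japanese, English, Chinese, Korean

byte_ids = list(text.encode("utf-8"))  # one id per UTF-8 byte (vocab <= 256)
char_ids = [ord(c) for c in text]      # one id per Unicode character

# CJK characters cost 3 UTF-8 bytes each, so byte-level models see
# noticeably longer sequences than character-level models do.
print(len(byte_ids), len(char_ids))    # 34 bytes vs. 18 characters
```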


Hire an NYU Ph.D. Student

NYU, Wilf Family Department of Politics



Introducing Maria Carreri, Pablo Fernandez-Vazquez, Tyler Kustra, Kevin Munger, Hande Mutlu-Eren, Renard Sexton, Gabor Simonovits, Sebastian Thieme, and Denis Stukal.

 
Tools & Resources



Notes on state of the art techniques for language modeling

fast.ai, Jeremy Howard



I cleverly trapped Smerity in a Twitter DM conversation while he was trapped on a train with nothing better to do than answer my dumb questions, and I managed to get a download of ~0.001% of what he knows about language modeling. It should be enough to keep me busy for a few months… The background of this conversation is that for “version 2” of our deep learning course at USF we’re curating and implementing in a consistent API the most important best practices in a range of deep learning applications, including computer vision, text, and recommendation systems. Unfortunately, for text applications the best practices are not really collected anywhere, hence the need for the Smerity-brain-dump.

I figured I’d make my notes on the conversation into a little blog post in case other people find this useful too.


Salesforce is using AI to democratize SQL so anyone can query databases in natural language

TechCrunch, John Mannes



SQL is about as easy as it gets in the world of programming, and yet its learning curve is still steep enough to prevent many people from interacting with relational databases. Salesforce’s AI research team took it upon itself to explore how machine learning might be able to open doors for those without knowledge of SQL.

Their recent paper, “Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning,” builds on the sequence-to-sequence models typically employed in machine translation. A reinforcement learning twist allowed the team to obtain promising results translating natural language database queries into SQL.
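
For orientation, a minimal sketch of the sequence-to-sequence skeleton such work builds on: encode the natural-language question, then decode SQL tokens one step at a time. This is an illustrative PyTorch toy with made-up vocabulary sizes and dimensions, not the paper's Seq2SQL model, which adds pointer-style copying and a reinforcement-learning reward tied to query execution.

```python
# Illustrative toy, not the authors' Seq2SQL code: encode the question,
# seed the decoder with the encoder's final state, decode SQL tokens.
import torch
import torch.nn as nn

class Seq2SeqSketch(nn.Module):
    def __init__(self, nl_vocab=10000, sql_vocab=500, dim=256):
        super().__init__()
        self.embed_nl = nn.Embedding(nl_vocab, dim)
        self.embed_sql = nn.Embedding(sql_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, sql_vocab)

    def forward(self, question_ids, sql_prefix_ids):
        # Encode the question; the final hidden state seeds the decoder.
        _, state = self.encoder(self.embed_nl(question_ids))
        dec_out, _ = self.decoder(self.embed_sql(sql_prefix_ids), state)
        return self.out(dec_out)  # logits over SQL tokens at each step

model = Seq2SeqSketch()
question = torch.randint(0, 10000, (1, 12))  # toy token ids
sql_prefix = torch.randint(0, 500, (1, 8))
logits = model(question, sql_prefix)         # shape (1, 8, 500)
```

In training, the decoder is fed the gold SQL prefix (teacher forcing); the paper's reinforcement-learning twist adds a reward based on executing the generated queries.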

 
Careers


Full-time positions outside academia

Program Assistant
Data-Pop Alliance; New York, NY

Analyst, Data Strategy and Analytics
Major League Soccer; New York, NY
