Data Science newsletter – November 29, 2016

Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for November 29, 2016

Data Science News



New Program in Technology Management at UC Santa Barbara | Organizations, Occupations and Work

University of California-Santa Barbara, Department of Engineering


from November 28, 2016

The job market for interdisciplinary Ph.D.s with training in technology and organizations is expected to be strong over the next decade. We anticipate that most graduates will pursue academic positions in Business Schools (where the study of technology innovation and entrepreneurship is growing), Schools of Communication (where the study of organization and technology is increasing swiftly), Schools of Information (which are being rapidly founded across the country), and Schools of Engineering (which are increasingly interested in the social aspects of technical work). Unlike the demand for Ph.D. tenure-track faculty in many other disciplines, the number of tenure-track openings in these schools exceeds the number of Ph.D.s awarded each year, and that gap is increasing.


Follow the funding in (data) science

What the people and organizations who write checks to support science are thinking:

  • The National Institutes of Health’s Office of Behavioral and Social Science Research released its first strategic plan in a decade, indicating it wants to see larger datasets and more agreement on the definitions of terms like ‘trust’, ‘resilience’, and ‘depression’ to foster the combination of small datasets with similar variables.
  • DARPA announced the Agile Teams (A-teams) program to “discover, test, and demonstrate predictive and generalizable mathematical methods to enable optimized design…for human-intelligent machine systems” and is looking for “expertise in mathematics, organizational theory, operations research, planning and scheduling, cognitive science, human factors, autonomy, and citizen science”.
  • In Japan, the Ministry of Economy, Trade and Industry will spend $173m to build “the world’s fastest” supercomputer [130 petaflops], according to Reuters. Japanese AI-reliant companies will be able to lease time on it when it’s ready.
  • Civis Analytics, the firm responsible for the Obama 2012 campaign’s data science, has raised $22m in a Series A round that it will use to pay for a new location in Chicago and hire a bunch of engineers and data scientists. Drive Capital led the round and will get a board seat for partner Chris Olsen, who joins Tim O’Reilly, Eric Schmidt and Civis CEO Dan Wagner as the fourth of four white men on the board.
  • In addition to the Alexa Fund ($100m) and the Alexa Prize ($2.5m), Amazon now has an accelerator program for conversational AI which will be housed on campus at the University of Washington.

    Disrupting the world of science publishing

    TechCrunch, Bérénice Magistretti


    from November 27, 2016

    ScienceMatters, a Swiss startup that launched in February, is trying to pave the way to a more democratized system by offering an open-source publishing platform to every scientist who wants to share his or her observations. “We are trying to publish the same way top science publications published 50 years ago,” explains Lawrence Rajendran, founder and CEO of ScienceMatters. “They used to publish exact observations, but now, competition for space is extremely high so there needs to be that wow factor.”

    In other words, scientists must not only present outstanding and unique results, but they also need to craft them into an appealing narrative that pleases the editors. Therefore, what drives scientists today is no longer the curiosity of discovering something new, but rather the pursuit of a high impact factor (essentially an indicator of how often a journal’s articles are cited).


    Retailer rewards programs get an overhaul with UTU

    MIT Sloan Management School, Newsroom


    from November 10, 2016

    Lighter wallets have benefits in the heavily crowded retail loyalty and rewards program business, according to Ameer Jumabhoy, MBA ’17, and his father, Asad Jumabhoy, SM ’86.

    The two recently launched a Singapore-based business, UTU, which will make loyalty and reward points programs available internationally, allowing consumers to use those points in real time anywhere. UTU, which translates to “rewards” in Maori, launches in Thailand this month and will continue its rollout through Asia in 2016. The company plans to make its way to Europe and the United States in 2017.


    Chill Out, No One Can Predict Earthquakes—Including Japan

    WIRED, Science, Marley Walker


    from November 23, 2016

    “Seismologists just don’t have enough information about the processes that are going on in Earth,” says Peggy Hellweg, Operations Manager at UC Berkeley’s Seismology Lab. “We know the stress is building up, but we don’t know the details of each particular location, and when the stress is going to be too much for it. That’s the problem with earthquake prediction.”


    What does it take to be a leader in precision medicine?

    STAT, Hillary Ross and Zachary Durst


    from November 25, 2016

    With strong support from Congress, President Obama has made precision medicine a focal point of the end of his term. Even with a change coming in the White House, precision medicine will remain a national priority because of its many potential benefits. The key to its success will be clinical implementation. As genetic screening tests become increasingly common, precision medicine could change the practice of medicine as we know it.

    Implementing precision medicine will require health systems to decide what precision medicine will look like for them. Some systems are taking a three-pronged approach and incorporating basic science discovery, testing in clinical trials, and clinical implementation of precision medicine across separate institutes or departments, while others are combining initiatives within related areas such as biomedical research informatics and genomics. Some progressive organizations are carving out entities for precision medicine with independent budgets, staff, and decision-making abilities. There is no standard approach.


    The Story of the Self-Destruction of Deutsche Bank

    Spiegel Online


    from October 28, 2016

    For most of its 146 years, Deutsche Bank was the embodiment of German values: reliable and safe. Now, the once-proud institution is facing the abyss. SPIEGEL tells the story of how Deutsche’s 1990s rush to join the world banking elite paved the way for its own downfall.


    Doing science with AI

  • The US Forest Service is using PCR to map every aquatic animal in the Western states from free-floating environmental DNA (eDNA) in the water.
  • Carl Vondrick and Antonio Torralba at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), together with former CSAIL postdoc Hamed Pirsiavash, have applied deep learning to generate videos from still images. Look for their paper at NIPS.
  • Radiologists at Boston Children’s Hospital are partnering with GE to develop a training set of pediatric brain imaging diagnoses that will pave the way for AI-assisted diagnostics.
  • Carnegie Mellon had a self-driving car named ALVINN that used neural networks back in the 1990s.
  • Climate science is notoriously difficult – lots of measurements (and measurement errors) on oh-so-many interacting facets. Andrew Majda at NYU Abu Dhabi and Nan Chen at NYU Washington Square have a new operational model that’s far better at handling Central Pacific El Niño events.
  • Dan Edwards and colleagues at the Naval Research Laboratory developed an algorithm that lets drones float on hot air thermals, similar to the way birds conserve energy while flying, allowing drones to stay aloft for 30 hours. Meanwhile, Apple received FAA approval to fly drones for data collection purposes, a key component in any potential effort to develop a self-driving car.
  • Finding great pizza in New York is now even easier with this NYC pizza map driven by Yelp data. I prefer Lucali in Carroll Gardens.
  • The image-processing algorithm Google uses to label web images has been trained to detect diabetic retinopathy as accurately as an ophthalmologist.
  • A new mapping algorithm developed by a team from the University of Tennessee and UC-Santa Barbara directs emergency responders to critical spots on the pre-disaster map that need to be re-mapped the most urgently after a disaster (e.g. to show road closures).
  • Yann LeCun and Joaquin Quinonero Candela who run Facebook’s AI Research (FAIR) department tell students to take math and then more math.
  • Betsy Mason reports on work by Garrett Nelson of Dartmouth College and Alasdair Rae of the University of Sheffield that defines U.S. economic mega-regions. The researchers’ analysis considered 4,000,000 American commuters’ travel flows (from Census Bureau data) and combined visual and algorithmic assessments to come up with the 50 continental U.S. mega-regions.

    Machine Learning in an Auction Environment

    JMLR, Patrick Hummel and R. Preston McAfee


    from November 25, 2016

    We consider a model of repeated online auctions in which an ad with an uncertain click-through rate faces a random distribution of competing bids in each auction and there is discounting of payoffs. We formulate the optimal solution to this explore/exploit problem as a dynamic programming problem and show that efficiency is maximized by making a bid for each advertiser equal to the advertiser’s expected value for the advertising opportunity plus a term proportional to the variance in this value divided by the number of impressions the advertiser has received thus far. We then use this result to illustrate that the value of incorporating active exploration in an auction environment is exceedingly small.
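    The bidding rule in the abstract can be sketched in a few lines. This is an illustrative reading of it, not code from the paper: the function name and the proportionality constant `c` are assumptions.

```python
# Sketch of the explore/exploit bidding rule described in the abstract:
# bid = expected value + a bonus proportional to the variance in that
# value divided by the number of impressions received so far.
# (Function name and constant c are illustrative, not from the paper.)

def exploration_bid(expected_value, value_variance, impressions_so_far, c=1.0):
    # Guard against a zero impression count for a brand-new advertiser.
    return expected_value + c * value_variance / max(impressions_so_far, 1)

# An ad with few impressions gets a large exploration bonus; an ad whose
# click-through rate is already well estimated bids close to its value.
new_ad = exploration_bid(expected_value=0.5, value_variance=0.2, impressions_so_far=2)
old_ad = exploration_bid(expected_value=0.5, value_variance=0.2, impressions_so_far=2000)
```

    As the impression count grows, the bonus term shrinks toward zero, which is consistent with the paper’s finding that the marginal value of active exploration is small.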


    With Shifts in National Mood Come Shifts in Words We Use, Study Suggests

    The New York Times, Steph Yin


    from November 27, 2016

    In the wake of the election, it’s clear American society is fractured. Negative emotions are running amok, and countless words of anger and frustration have been spilled. If you were to analyze this news outlet for the ratio of positive emotional words to negative ones, would you find a dip linked to the events of the past few weeks?

    It’s possible, suggests a study published last week in Proceedings of the National Academy of Sciences. Analyzing Google Books and The New York Times’s archives from the last 200 years, the researchers examined a curious phenomenon known as “positive linguistic bias,” which refers to people’s tendency to use more positive words than negative words. Though the bias is robust — and found consistently across cultures and languages — social scientists are at odds about what causes it.
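    The positivity measure behind such studies can be illustrated with a toy ratio of positive to negative word counts. The word lists below are hypothetical stand-ins for the much larger sentiment lexicons real analyses use:

```python
# Toy illustration of measuring "positive linguistic bias": count words
# from small positive and negative lexicons and take their ratio.
# (These word lists are illustrative, far smaller than real lexicons.)

POSITIVE = {"good", "happy", "love", "great", "hope"}
NEGATIVE = {"bad", "sad", "hate", "anger", "fear"}

def positivity_ratio(text):
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    # If no negative words appear, report an unbounded ratio.
    return pos / neg if neg else float("inf")

print(positivity_ratio("good news brings hope but also fear"))  # 2 positive, 1 negative
```

    A ratio above 1 over a corpus is the "positive linguistic bias" the study describes; tracking that ratio over time is how shifts in national mood could show up in the text.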


    Designing Agile Human-Machine Teams

    DARPA


    from December 07, 2016

    Modern military operations are dynamic and complex—requiring, for example, that infantry squads carry out their missions simultaneously in the 3-dimensional physical world, the cyber domain, and across the electromagnetic spectrum. As artificial intelligence becomes more advanced, the future of kinetic, cyber, and electronic warfare envisions humans and intelligent machines working together as a team. A challenge in designing human-machine systems, however, is determining how best to meld human cognitive strengths and the unique capabilities of smart machines to create intelligent teams adaptive to rapidly changing circumstances.

    To address this challenge, DARPA today announced the Agile Teams (A-Teams) program, which sets out to discover, test, and demonstrate predictive and generalizable mathematical methods to enable optimized design of agile hybrid teams. A-Teams seeks to fundamentally challenge the current paradigm of human-intelligent machine systems design by changing the focus from simply using machines for automation and substitution of human capacity to an integrated fabric enabling superior collective problem solving.


    Creating videos of the future

    MIT News


    from November 28, 2016

    Given a still image, CSAIL deep-learning system generates videos that predict what will happen next in a scene.


    Meet ALVINN, the self-driving car from 1989

    The Verge, Andrew J. Hawkins


    from November 27, 2016

    In 1989, the Berlin Wall began to fall, the World Wide Web made its debut, Madonna’s “Like a Prayer” topped the charts, and in Pittsburgh, a retrofitted Army ambulance called ALVINN was driving around Carnegie Mellon University without any human intervention.

    Self-driving cars may seem like a very recent technological phenomenon, but researchers and engineers have been building vehicles that can drive themselves for over three decades. Research on computer-controlled vehicles began at Carnegie Mellon in 1984, and production of the first vehicle, Navlab 1, began in 1986. ALVINN, which stands for Autonomous Land Vehicle In a Neural Network, was used as a test vehicle well into the 1990s.

    This proto-driverless vehicle came up recently in a Twitter discussion between two engineers: Oliver Cameron, who heads an open-source self-driving car project at Udacity, and Dean Pomerleau, a CMU professor who ran the self-driving car project that gave birth to ALVINN. Cameron tweeted a video shared by some of his students of a car steering itself autonomously using only a camera.


    Japan looks to create a superfast supercomputer for deep learning

    TechCrunch, Brian Heater


    from November 25, 2016

    Japan is reportedly eyeing a return to the top of the supercomputer ranks. The country’s Ministry of Economy, Trade and Industry plans to spend 19.5 billion yen ($173 million) on a new supercomputer, according to budget filings reported by Reuters, with aims of developing a machine capable of 130 petaflops.

    That number would put the country in the top spot, moving the AI Bridging Cloud Infrastructure ahead of Sunway TaihuLight, the supercomputer unveiled by China over the summer capable of 93 petaflops. The Ministry already has some fairly big plans for the previously unannounced supercomputer, utilizing its record-breaking speeds to help the country develop advances in AI technologies like deep learning.


    GE, Boston Children’s to create deep-learning tool for pediatric brain scans

    FierceBiotech


    from November 28, 2016

    GE Healthcare is teaming up with Boston Children’s Hospital to develop digital tools for the diagnosis and treatment of childhood diseases with an initial focus on diagnostic imaging for brain disease.

    In this first project, GE will contribute its software and cloud computing prowess, while Boston Children’s radiologists will pitch in their clinical knowledge. The goal will be to create a decision-support tool to help radiologists interpret pediatric MRI scans of the brain, according to a statement. This has the potential to save money, time and stress associated with misdiagnosis and unnecessary testing.


    Data Science Q&A with David Rosenberg, Office of the CTO

    Tech at Bloomberg blog


    from November 28, 2016

    Q: Tell us about your team…

    David Rosenberg: I’m on the data science team in the Office of the CTO. One of our primary responsibilities is to help set the technical direction of Bloomberg as it relates to data science. This involves coordination, consulting, running projects, and both internal and external communication.


    Supercomputing Conference: A Glimpse Into the Future of Mainstream Computing

    Forbes, Paul Teich


    from November 23, 2016

    Supercomputers used to be a market unto themselves. In their time, giants such as IBM, Control Data Corporation (CDC), Evans & Sutherland (E&S), Silicon Graphics and Cray Computing ruled the supercomputing market. But with Moore’s Law restricted to only scaling transistor count, scale-out has taken over a conference series that had epitomized scale-up. Because the nature of supercomputing has changed so much in the past two decades, the market has expanded into a much broader high-performance computing (HPC) market. At this year’s SC16 conference, SGI made news by being purchased by Hewlett Packard Enterprise (HPE), and the Cray booth looked a bit forlorn. For the past few years, compute silicon vendors have increased their influence on the supercomputing market. This year at SC16, most of the attention was on NVIDIA, Intel and Xilinx. Cloud infrastructure manufacturers, such as Dell EMC, HPE, IBM, Huawei, QCT, Inspur and Sugon, and storage vendors, such as DDN, Dell EMC, and HPE, were also in the spotlight.


    New Climate Model Captures Key Features of Central Pacific El Nino

    NYU Abu Dhabi, Press Room


    from November 22, 2016

    Scientists working at NYU Abu Dhabi have developed a sophisticated new theoretical model that may lead to timelier, more accurate forecasts of the central Pacific El Niño (CP El Niño), an important weather-maker.

    Any El Niño is a period of warmer-than-normal sea-surface temperature in the equatorial Pacific. These anomalies, and the related air pressure fluctuations, can have teleconnections – long-distance effects – on weather far away from the region. CP El Niño events can last for years, and have been “very frequent in the last 25 years,” said Andrew Majda, professor of mathematics and atmosphere/ocean science and principal investigator at NYUAD’s Center for Prototype Climate Modeling (CPCM).

    Majda and Nan Chen, postdoctoral researcher at NYU’s Courant Institute of Mathematical Sciences, recently published a scholarly paper explaining what they have learned about the “genuine puzzle” of how a CP El Niño occurs.


    Community college pathways to a four-year computer science degree: New Google Reports

    Mark Guzdial, Computing Education Blog


    from November 28, 2016

    My ECEP colleagues at the University of Massachusetts Amherst, Rick Adrion and Renee Fall, led a successful NSF alliance called CAITE. One of CAITE’s most successful strategies to improve diversity at university-level CS was to make it easier for students to transfer from community colleges. Community colleges are much more diverse.

    The latest reports from Google tell us more about the obstacles that CS students still face in moving from community colleges to bachelor’s degrees, and how to make it easier.


    Raising the Floor: Sharing What Works in Workplace Diversity, Equity, and Inclusion

    The White House, Megan Smith and Laura Weidman Powers


    from November 28, 2016

    We’ve compiled some of what we’ve heard and seen work in creating a more diverse, inclusive, and equitable science and technology workforce.


    Big Data Reshaping China Indicators

    Bloomberg


    from November 27, 2016

    It’s early days for Big Data in China, but economists think it could reshape the way financial markets view the world’s second biggest economy. Bloomberg’s Jeff Kearns reports on “Bloomberg Daybreak: Asia.”


    Gliding Algorithm Lets Drones Surf The Winds For Hours

    Popular Science, David Hambling


    from October 17, 2016

    Launch an unmanned SBXC sailplane into the air with a catapult and it glides lazily down again in about three minutes. But when the sailplane is equipped with software to find plumes of warm air rising from the ground, known as thermals, it can ride them upward and extend its flight to a record-breaking five hours.

    The software, known as ALOFT, for Autonomous Locator of Thermals, was developed by Dr. Dan Edwards and colleagues at the Naval Research Laboratory. The team tested and launched more than 20 flights with the software at the Phillips Army Airfield in Maryland last October. In total, those flights lasted more than 30 hours, proving that using algorithms to update data on thermals can help unmanned sailplanes fly much longer than previously possible.


    At WoHIT, a Finnish HIT Leader Details a Groundbreaking Nationwide Health Data Initiative

    Healthcare Informatics


    from November 27, 2016

    The Isaacus initiative is preparing to collect health status and health data from across all relevant databases in Finland—not only those from the provider sector of Finnish healthcare, but also from social welfare agencies and other sources—in order to improve the capabilities of researchers to uncover social welfare and health status patterns across Finnish society, and therefore support the development of new policies and programs to address issues such as chronic disease and the social determinants of health in that country.


    When A.I. Matures, It May Call Jürgen Schmidhuber ‘Dad’

    The New York Times, John Markoff


    from November 27, 2016

    But mention the name Jürgen Schmidhuber in an automated quinoa lunch spot frequented by coders in San Francisco, and you are likely to get blank stares.

    On a recent train ride to Zurich, Dr. Schmidhuber, an athletic 53-year-old who is co-director of the Dalle Molle Institute for Artificial Intelligence Research here, reflected on how he believed his early research was often overlooked or ignored. “It’s like much of the rest of society,” he said. “Sometimes it’s postfactual.”

    Dr. Schmidhuber’s complaints are well known within the fraternity of researchers who have turned what until a half-decade ago was an academic backwater into a multibillion-dollar industry. He has been accused of taking credit for other people’s research and even using multiple aliases on Wikipedia to make it look as if people are agreeing with his posts.


    Yoshua Bengio et Hugo Larochelle – La Presse+

    La Presse, Google Translate


    from November 28, 2016

    Thirty years ago, a young computer scientist at McGill University, Yoshua Bengio, chose to do his master’s degree on a subject considered esoteric: neural networks and the ability of machines to learn on their own. Two decades later, he would train a student who would in turn become a star in the field, Hugo Larochelle.


    Making Algorithms Fair: An Interview With Cynthia Dwork

    Quanta Magazine, Kevin Hartnett


    from November 23, 2016

    The computer scientist Cynthia Dwork takes abstract concepts like privacy and fairness and adapts them into machine code for the algorithmic age.

     
    Events



    GSA Presents: Grace Hopper Day Hackathon!



    Join coders from across the region as we come together in celebration of Grace Hopper’s birthday. Friday, December 9, at National Archives and Records Administration (NARA), 700 Pennsylvania Avenue, NW
     
    Deadlines



    JUMP Funding Opportunities – SRC

    The new JUMP Research Announcement is a request to the US academic community for proposals to help define the future of the microelectronics industry. There is no requirement to be from a participating SRC research university. A key goal of this research announcement is to merge the ideas from established innovators within the existing university networks of SRC and the defense community with those from new, rising institutions or researchers.
     
    Tools & Resources



    An Interactive Tutorial on Numerical Optimization

    Ben Frederickson


    from November 25, 2016

    Numerical Optimization is one of the central techniques in Machine Learning. For many problems it is hard to figure out the best solution directly, but it is relatively easy to set up a loss function that measures how good a solution is – and then find the parameters that minimize that function.

    I ended up writing a bunch of numerical optimization routines back when I was first trying to learn javascript. Since I had all this code lying around anyway, I thought that it might be fun to provide some interactive visualizations of how these algorithms work.
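    The loss-minimization idea above can be sketched with plain gradient descent on a toy quadratic. This is an illustration of the general technique, not code from the tutorial; the step size and iteration count are arbitrary choices:

```python
# Minimal illustration of loss minimization by gradient descent:
# the loss f(x) = (x - 3)^2 has its minimum at x = 3, and its
# gradient is f'(x) = 2 * (x - 3).

def loss_gradient(x):
    return 2 * (x - 3)

def gradient_descent(x0, step=0.1, iterations=100):
    # Repeatedly move against the gradient to reduce the loss.
    x = x0
    for _ in range(iterations):
        x -= step * loss_gradient(x)
    return x

print(gradient_descent(0.0))  # converges toward 3.0
```

    Real optimization routines (conjugate gradient, Nelder-Mead, and the like, which the interactive tutorial visualizes) are more elaborate, but they follow this same loop of evaluating the loss and updating the parameters.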


    OSMDeepOD – OSM and Deep Learning based Object Detection from Aerial Imagery (formerly known as “OSM-Crosswalk-Detection”)

    GitHub – geometalab


    from November 28, 2016

    OSMDeepOD is a project for object detection from aerial imagery using open data from OpenStreetMap (OSM). The project uses the open source software library TensorFlow with a retrained Inception V3 neural network.


    Write your own R package

    Stat 545


    from November 28, 2016

    The source of an R package is a highly structured set of files that reside in a dedicated directory on your computer. It can be beneficial to also make this directory an RStudio Project and a Git repository and, eventually, associate it with a GitHub remote. devtools ensures that your initial setup is correct and helps you keep it that way as your package evolves.


    [1611.08144] How I Stopped Worrying about the Twitter Archive at the Library of Congress and Learned to Build a Little One for Myself

    arXiv, Computer Science > Computers and Society; Daniel Gayo-Avello


    from November 24, 2016

    Twitter is among the commonest sources of data employed in social media research, mainly because of its convenient APIs for collecting tweets. However, most researchers do not have access to the expensive Firehose and Twitter Historical Archive, and they must rely on data collected with free APIs whose representativeness has been questioned. In 2010 the Library of Congress announced an agreement with Twitter to provide researchers access to the whole Twitter Archive. However, such a task proved to be daunting and, at the moment of this writing, no researcher has had the opportunity to access such materials. Still, there have been experiences proving that smaller searchable archives are feasible and, therefore, amenable for academics to build with relatively little resources. In this paper I describe my efforts to build one such archive, covering the first three years of Twitter (actually from March 2006 to July 2009) and containing 1.48 billion tweets. If you carefully follow my directions you may have your very own little Twitter Historical Archive and you may forget about paying for historical tweets. Please note that to achieve that you should be proficient in some programming language, knowledgeable about the Twitter APIs, and have some basic knowledge of ElasticSearch; moreover, you may very well get disappointed by the quality of the contents of the final dataset.
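    As a rough illustration of the preprocessing such an archive involves, here is a sketch that flattens a raw tweet object into a compact document one could index into Elasticsearch. The field names follow the classic Twitter REST API tweet format; the function and index names are illustrative assumptions, not taken from the paper:

```python
# Hypothetical sketch: flatten a raw tweet object (fields as in the
# classic Twitter REST API) into a small document ready for indexing.

def tweet_to_doc(tweet):
    return {
        "id": tweet["id_str"],
        "text": tweet["text"],
        "created_at": tweet["created_at"],
        "user": tweet["user"]["screen_name"],
    }

# With the official Elasticsearch Python client, the document could
# then be indexed, e.g.:
#   es.index(index="tweets", id=doc["id"], document=doc)

raw = {
    "id_str": "12345",
    "text": "just setting up my twttr",
    "created_at": "Tue Mar 21 20:50:14 +0000 2006",
    "user": {"screen_name": "jack"},
}
doc = tweet_to_doc(raw)
```

    Keeping only the fields you will actually query is what makes a 1.48-billion-tweet archive tractable on modest hardware.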


    What’s new in Swift 3

    O'Reilly Radar, Jonathon Manning, Paris Buttfield-Addison, Tim Nugent


    from November 22, 2016

    Swift has truly been released into the open: the conceptualization, discussion, and development of new language features, direction decisions, and changes to the features all take place on open mailing lists, with the wider community of Swift users. This is important, because it means that the direction of the language is in the hands of users and not the exclusive domain of a central planning group.

    In this report, we’re going to look at Swift 3, released in September 2016, and the first significant release from the open source Swift project. Specifically, we’re going to look at three facets of Swift 3 programming and the ecosystem around its use:


    The Image Data Resource: A Scalable Platform for Biological Image Data Access, Integration, and Dissemination | bioRxiv

    bioRxiv, Jason Swedlow et al.


    from November 24, 2016

    To extend the range of data types supported by community repositories, we have built a prototype Image Data Resource (IDR) that collects and integrates imaging data acquired using many different imaging modalities including high-content screening, super-resolution microscopy, time-lapse imaging and digital pathology, and links them in a single resource. IDR links experimental perturbations to public genetic or chemical databases, and cell and tissue phenotypes to controlled vocabularies expressed as ontologies. By integrating the phenotypic and genetic metadata from multiple studies, IDR makes it possible to reveal novel functional networks of genetic interactions linked to specific cell phenotypes.


    Reproducible research: Stripe’s approach to data science

    Stripe, Dan Frank


    from November 22, 2016

    When people talk about their data infrastructure, they tend to focus on the technologies: Hadoop, Scalding, Impala, and the like. However, we’ve found that just as important as the technologies themselves are the principles that guide their use. We’d like to share our experience with one such principle that we’ve found particularly useful: reproducibility.

    We’ll talk about our motivation for focusing on reproducibility, how we’re using Jupyter Notebooks as our core tool, and the workflow we’ve developed around Jupyter to operationalize our approach.

     
    Careers


    Postdocs

    Announcing the 2016 Moore/Sloan Data Science and WRF Innovation in Data Science Postdoctoral Fellowships



    University of Washington, eScience Institute; Seattle, WA

    Alan Turing Institute Research Fellowships



    The Alan Turing Institute; London, England
    Tenured and tenure track faculty positions

    (2) Assistant Professor – Financial Mathematics



    Ryerson University; Toronto, ON, Canada

    Faculty Positions in EECS



    University of Michigan; Ann Arbor, MI
    Full-time positions outside academia

    Sr. Data Analyst – Conversion



    Sprout; Chicago, IL
