Data Science newsletter – June 28, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for June 28, 2017


Data Science News

Artificial Intelligence: Experts from U.C. Berkeley, Google Talk Hype

Jonathan Vanian


Artificial intelligence has made great strides in the past few years, but it’s also generated much hype over its current capabilities.

That’s one takeaway from a Friday panel in San Francisco involving leading AI experts hosted by the Association for Computing Machinery for its 50th annual Turing Award for advancements in computer science.

Michael Jordan, a machine learning expert and computer science professor at University of California, Berkeley, said there is “way too much hype” regarding the capabilities of so-called chat bots. Many of these software programs use an AI technique called deep learning in which they are “trained” on massive amounts of conversation data so that they learn to interact with people.

How artificial intelligence can deliver real value to companies

McKinsey & Company, McKinsey Global Institute; Jacques Bughin, Eric Hazan, Sree Ramaswamy, Michael Chui, Tera Allas, Peter Dahlström, Nicolaus Henke, and Monica Trench


Companies new to the space can learn a great deal from early adopters who have invested billions into AI and are now beginning to reap a range of benefits.

Amazon Web Services creating the jobs Ontario needs, Premier says

IT World Canada, Brian Jackson


The office will be focused on projects such as Amazon Fulfillment Technologies, development of the voice-controlled intelligent assistant Alexa, and the suite of AWS infrastructure services. AWS also announced commitments to doing more business across the province:

  • Operating four fulfillment centres across the province, employing more than 1,600 people. It says the centre in Brampton will be among the most advanced in the world.
  • Establishing a software development office in Ottawa, with an exclusive focus on development of Alexa.
  • Working with University of Waterloo as one of four universities globally to receive an Alexa Fund Fellowship, to support the development of AI and voice-recognition technology.

    The data engineering ecosystem in 2017

    Insight Data Science, Zubair Saiyed


    Since we launched the Insight Data Engineering Fellows program in 2014, we’ve built relationships with over 75 teams in the data industry. We’ve discussed the latest challenges faced by engineers on top teams like Facebook, Airbnb, Slack, The New York Times, LinkedIn, Amazon, and Tesla. Additionally, our ever-growing alumni network now includes 150+ engineers and 750+ data scientists that regularly share their experience with the Insight community. Thanks to this strong community, we have a unique position to detect emerging patterns in the technologies used in the field.

    We’re constantly exploring ways to contribute this knowledge to the next generation of data engineers and the broader data community. We’ve developed a more interactive version of our Data Engineering Ecosystem Map. This iteration provides a streamlined view of the core components of data pipelines, while enabling deeper exploration of the complex world of distributed system technologies.

    Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog

    arXiv, Computer Science > Computation and Language; Satwik Kottur, José M.F. Moura, Stefan Lee, Dhruv Batra


    A number of recent works have proposed techniques for end-to-end learning of communication protocols among cooperative multi-agent populations, and have simultaneously found the emergence of grounded human-interpretable language in the protocols developed by the agents, all learned without any human supervision!
    In this paper, using a Task and Tell reference game between two agents as a testbed, we present a sequence of ‘negative’ results culminating in a ‘positive’ one — showing that while most agent-invented languages are effective (i.e. achieve near-perfect task rewards), they are decidedly not interpretable or compositional.
    In essence, we find that natural language does not emerge ‘naturally’, despite the semblance of ease of natural-language-emergence that one may gather from recent literature. We discuss how it is possible to coax the invented languages to become more and more human-like and compositional by increasing restrictions on how two agents may communicate.

    U.S. Military Sees Future in Neuromorphic Computing

    The Next Platform, Nicole Hemsoth


    The novel architectures story is still shaping out for 2017 when it comes to machine learning, hyperscale, supercomputing and other areas.

    From custom ASICs at Google, new uses for quantum machines, FPGAs finding new routes into wider application sets, advanced GPUs primed for deep learning, hybrid combinations of all of the above, it is clear there is serious exploration of non-CMOS devices. When the Department of Energy in the U.S. announced its mission to explore novel architectures, one of the clear candidates for investment appeared to be neuromorphic chips—efficient pattern matching devices that are in development at Stanford (NeuroGrid), The University of Manchester (Spinnaker), Intel, Qualcomm, and of course, IBM, which pioneered the neuromorphic space with its TrueNorth architecture.

    University Data Science News

    Carnegie Mellon University claims that they invented artificial intelligence in 1956 and have now launched an artificial intelligence center. I’m not going to weigh in on where AI was invented, but the ACM seems to think the term was coined in 1955 by John McCarthy (Dartmouth), Marvin Minsky (MIT), Claude Shannon (Bell Labs), and Nathaniel Rochester (IBM). What is worthwhile to point out is that an AI center is a little different from a data science center. Many universities have established master’s degrees and centers/institutes for data science. Data science is a much wider field than AI and it makes sense to me to keep a wider focus, both to encompass more computational methods across fields now and to remain robust to future developments outside of AI. In fact, it is not clear that CMU wants to establish a radically multi-disciplinary center. Instead, they appear to be doubling down on involvement from computer science and applied math. It will be interesting to see what kinds of research and products come out of CMU in comparison to schools that have taken a broader approach.

    Sort of good news in reproducibility: two key cancer studies have been replicated, with most of their experiments holding up, as part of the Reproducibility Project: Cancer Biology, whose results appear in the journal eLife. This follows an earlier batch of five replications that delivered mixed results. Part of the research involving mouse models was beyond the capacity of the contract lab to replicate, leading some academics to question the value of having this type of research replicated in the first place. No live mice, no experimental dice.

    If you are a scientist in Europe, it may not pay to work longer hours at the bench or in the classroom. Appearing as a co-author and getting in some face-time with higher-level administrators by participating in administrative work may net you a higher salary. (NB: This is not good advice for American scientists, for whom longer hours at the proverbial bench are more likely to be rewarded.)

    High five to MIT for important computational chemistry news this week. We so rarely feature chemistry and are happy to note that researchers are hard at work using machine learning to predict what the major product of chemical reactions will be. The eventual goal, according to Klavs Jensen, is to “be able to walk up to a system and say, ‘I want to make this molecule.’ The software will tell you the route you should make it from, and the machine will make it.” Expect to hear more computational chemistry news along the way.

    UC-Santa Cruz has opened a new data science research center, called Data, Discovery, and Decisions (D3), led by Lise Getoor, professor of Computer Science. A “central aim” of the new center is to “develop open-source tools,” possibly in conjunction with industry partners.

    David Scott Hunter and Tauhid Zaman from MIT Sloan can sort of predict startup successes. They extracted a training set from a database of 83,000 startups. Then they developed a model to predict successful exits and ended up with a 60 percent success rate, double the rate of the best VCs. The rest of the news about VCs this week suggests some are riddled with biases and far from being perfect rational actors, so I’m not surprised they got their behinds handed to them by a bot.

    That finding may have long-term implications for finance majors: will plum finance jobs be replaced by robots? If so, who gets to keep all those profits??

    Beyond Bananas: CMU Scientists Harness “Mind Reading” Technology to Decode Complex Thoughts

    Carnegie Mellon University, News


    Carnegie Mellon University scientists can now use brain activation patterns to identify complex thoughts, such as, “The witness shouted during the trial.”

    This latest research led by CMU’s Marcel Just builds on the pioneering use of machine learning algorithms with brain imaging technology to “mind read.” The findings indicate that the mind’s building blocks for constructing complex thoughts are formed by the brain’s various sub-systems and are not word-based. Published in Human Brain Mapping and funded by the Intelligence Advanced Research Projects Activity (IARPA), the study offers new evidence that the neural dimensions of concept representation are universal across people and languages.

    Airbus launches Skywise – aviation’s open data platform



    Airbus has launched a new aviation data platform in collaboration with Palantir Technologies – pioneers in big-data integration and advanced analytics. Skywise aims to become the single platform of reference used by all major aviation players to improve their operational performance and business results and to support their own digital transformation. Skywise is already improving industrial operations performance throughout Airbus’ industrial footprint and now allows the company to deliver enhanced aircraft and equipment designs, as well as better service and support offerings based on deeper in-service data insights.

    Skywise will provide all users with one single access point to their enriched data by bringing together aviation data from multiple sources across the industry into one secure cloud-based platform.

    Artificial Intelligence Poised to Ride a New Wave

    Communications of the ACM, Gary Anthes


    The path of AI over the ensuing 62 years has been anything but smooth. There were early successes in areas such as mathematical problem solving, natural language, and robotics. Some of the ideas that are central to modern AI, such as those behind neural networks, made conceptual advances early on. Yet funding for AI research, mostly from the U.S. government, ebbed and flowed, and when it ebbed the private sector did not take up the slack. Enthusiasm for AI waned when the grandiose promises by researchers failed to be met.

    Significant Impact Factor Gains for AI and Computer Sciences Journals

    IOS Press


    IOS Press announces significant Impact Factor growth in its journals in artificial intelligence, computer & communications sciences and mathematics, according to the 2016 Journal Citation Reports® (JCR) released by Clarivate Analytics this month.

    Cancer studies pass reproducibility test

    Science, ScienceInsider, Jocelyn Kaiser


    A high-profile project aiming to test reproducibility in cancer biology has released a second batch of results, and this time the news is good: Most of the experiments from two key cancer papers could be repeated.

    The latest replication studies, which appear today in eLife, come on top of five published in January that delivered a mixed message about whether high-impact cancer research can be reproduced. Taken together, however, results from the completed studies are “encouraging,” says Sean Morrison of the University of Texas Southwestern Medical Center in Dallas, an eLife editor. Overall, he adds, independent labs have now “reproduced substantial aspects” of the original experiments in four of five replication efforts that have produced clear results.

    We may never know how many Americans are being swept up in ‘foreign surveillance’

    The Outline, William Turton


    Section 702 is supposed to target non-citizens only, but a 2014 Washington Post analysis of National Security Agency documents provided by whistleblower Edward Snowden revealed that “nine of 10 account holders found in a large cache of intercepted conversations … were not the intended surveillance targets but were caught in a net the agency had cast for somebody else.”

    So, how many Americans are being “caught in a net” like this every year? We don’t know, and the intelligence community is resisting pleas by Congress to tell us. Last year, then Director of National Intelligence James Clapper said that his office was “looking at several options” for producing an estimate of how many Americans have been subjected to collection under 702, “none of which are optimal.” But today during a Senate Judiciary Committee hearing to consider reauthorizing 702, the lawyers for the Director of National Intelligence told Congress that providing even an estimate would be technically impossible.

    The secrets of a top salary in science

    Nature News & Comment, Editorial


    Michelle Pfeiffer once said that she acts for free — “but I demand a huge salary as compensation for all the annoyance of being a public personality.” Many scientists have a similar attitude. They enjoy the process and thrill of research and would probably do it for free. The monthly wage is there to make up for all the stuff that goes with it — the form-filling, meetings, bureaucracy and a thousand other distractions.

    Still, some scientists get more compensation than others. Why shouldn’t one of them be you? For although reports — including Nature’s own biennial survey of salaries and attitudes — indicate that times are tough, every researcher knows, or at least suspects they know, a colleague who earns significantly more than they do but seems less dedicated to science. What’s their secret?

    Computer system predicts products of chemical reactions

    MIT News


    Historically, determining the most efficient and cost-effective way to produce a given molecule has been as much art as science. But MIT researchers are trying to put this process on a more secure empirical footing, with a computer system that’s trained on thousands of examples of experimental reactions and that learns to predict what a reaction’s major products will be.

    The researchers’ work appears in the American Chemical Society’s journal Central Science. Like all machine-learning systems, theirs presents its results in terms of probabilities. In tests, the system was able to predict a reaction’s major product 72 percent of the time; 87 percent of the time, it ranked the major product among its three most likely results.
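
    The two figures quoted above are what is usually called top-1 and top-3 accuracy: how often the true answer is the single most probable candidate, and how often it appears among the three most probable. A minimal sketch of that calculation follows. The function name `top_k_accuracy` and the toy reaction data are hypothetical illustrations, not the MIT system or its results.

```python
from typing import Dict, List


def top_k_accuracy(predictions: List[Dict[str, float]],
                   truths: List[str], k: int) -> float:
    """Fraction of cases whose true label is among the k most probable candidates."""
    hits = 0
    for probs, truth in zip(predictions, truths):
        # Rank candidate labels from most to least probable.
        ranked = sorted(probs, key=probs.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(truths)


# Hypothetical toy data: each dict maps a candidate product to a model probability.
predictions = [
    {"A": 0.70, "B": 0.20, "C": 0.10},
    {"A": 0.50, "B": 0.30, "C": 0.15, "D": 0.05},
    {"A": 0.40, "B": 0.35, "C": 0.20, "D": 0.05},
    {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.40},
]
truths = ["A", "B", "D", "D"]  # the (made-up) observed major products

print(top_k_accuracy(predictions, truths, k=1))  # 0.5
print(top_k_accuracy(predictions, truths, k=3))  # 0.75
```

    Top-3 accuracy can never be lower than top-1 accuracy on the same data, which is why the 87 percent figure sits above the 72 percent one.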

    How Facebook Used Science And Empathy To Reach Two Billion Users

    Fast Company, Harry McCracken


    Facebook’s growth team has a methodical approach to understanding what new users want, and a surprisingly broad mandate.

    Wimbledon serves up a new take on journalism in the age of AI

    The Next Web, Matthew Hughes


    … Previously, the task of creating highlight packages and annotating photographs would be the responsibility of a human. But this year round, the job is placed in the hands of the Watson AI.

    Watson can generate highlight packages without any human input. It can watch a video feed, and identify the most pertinent parts of a match. This can be seen by players shaking hands, gesticulating in celebration, or something as simple as the levels of volume from the audience. If people are groaning, or screaming in elation, something important obviously just happened.

    Is the staggeringly profitable business of scientific publishing bad for science?

    The Guardian, Stephen Buranyi


    It is an industry like no other, with profit margins to rival Google – and it was created by one of Britain’s most notorious tycoons: Robert Maxwell.

    The Evidence-Based Revolution In Trading And Investing

    Forbes, Brett Steenbarger


    A revolution has been brewing in the financial world, and it is impacting everything from the day trading of individuals to the investments of large pension and sovereign wealth funds. Making money is no longer viewed simply as the result of the insightful decision making of market wizards. Rather, returns in markets are seen to follow from rigorous research. Investing and trading are becoming increasingly evidence-based. That has important implications for your market participation as an individual, and also for the investment of your retirement funds.

    As a clinical professor at a medical school in Upstate New York, I’ve seen this movie before. When I joined the full-time faculty in 1985, there was certainly medical research, but actual medical practice was deemed to be a function of the judgment and clinical intuition of the experienced practitioner. This was especially the case in my own department of psychiatry and behavioral sciences. Fast forward to 2017 and the third edition of a brief psychotherapy textbook I am co-editing exclusively features helping approaches that have been validated by extensive outcome studies. It is routine for medical practitioners to consult the Cochrane Database of Systematic Reviews to identify the treatments found to be most effective for particular medical problems before formulating a treatment plan. The ideal of the wise, experienced country doctor has been replaced by the ideal of medicine as a science.

    That story is playing out in the world of finance.

    How Shifting Winds Turn Tropical Storms into Hurricanes

    Eos, Emily Underwood


    Researchers present a novel method for analyzing how wind shear affects tropical cyclone strength and structure.


    Workshop series on Noisy User-generated Text (W-NUT)

    Copenhagen, Denmark, September 7, at EMNLP 2017. “Shared task: Novel and Emerging Entity Recognition” focuses on identifying unusual, previously-unseen entities in the context of emerging discussions. Deadline for system description papers is July 7.

    Applications open for the August 2017 NCBI-NLM Bioinformatics Hackathon

    From August 14th – 16th, the NCBI, with involvement from several NIH institutes, will host a Biomedical Data Science hackathon at the National Library of Medicine on the NIH campus. Deadline for applications is July 11.

    Positions for Exceptional Doctoral Students

    The Helsinki Doctoral Education Network in Information and Communications Technology (HICT) is a collaborative doctoral education network hosted jointly by Aalto University and the University of Helsinki. Deadline is July 30.

    Tools & Resources

    Neural Network Libraries



    “An open source software to make research, development and implementation of neural network more efficient.”

    Releasing the Dexterity Network (Dex-Net) 2.0 Dataset for Deep Grasping

    The Berkeley Artificial Intelligence Research Blog, Jeff Mahler


    “The Dexterity Network (Dex-Net) 2.0 is a project centered on using physics-based models of robust robot grasping to generate massive datasets of parallel-jaw grasps across thousands of 3D CAD object models. These datasets are used to train deep neural networks to plan grasps from point clouds on a physical robot that can lift and transport a wide variety of objects.”
