Data Science newsletter – December 18, 2017

Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for December 18, 2017

Data Science News





Algorithms are the New Drugs

Towards Data Science blog, Hugh Harvey



To understand where the future lies for the Artificial Intelligence (AI) and algorithm industry (aka “Big Tech”, “Big Algorithmia”, “Digital Health”), one needs only to look at how Pharma got quite so Big in the first place, including how new drugs are developed and reach the market, how medical practice is structured around the safe delivery and monitoring of drugs, and how doctors learn to understand drug mechanisms of action and side effects.

Therefore, I predict several ancillary industries erupting around algorithms, based on lessons learned from existing practice in Big Pharma.


Carnegie Mellon reveals inner workings of victorious AI

Carnegie Mellon University, News



Libratus, an artificial intelligence that defeated four top professional poker players in no-limit Texas Hold’em earlier this year, uses a three-pronged approach to master a game with more decision points than atoms in the universe, researchers at Carnegie Mellon University report.

In a paper published online today by the journal Science, Tuomas Sandholm, professor of computer science, and Noam Brown, a Ph.D. student in the Computer Science Department, detail how their AI achieved superhuman performance by breaking the game into computationally manageable parts and, based on its opponents’ game play, fixing potential weaknesses in its strategy during the competition.


Pioneer of Human-Machine Interaction Says AI Will Not Replace Workers

IEEE, The Institute, Ben Shneiderman



The notion that artificial intelligence will replace workers is false. Suggesting that a machine is better at finding a tumor than a radiologist, for example, diminishes the role of the doctor. A truer statement would be that machines can help radiologists make more accurate diagnoses, and that finding tumors is only a small part of what the doctors do in the treatment process.

As with microscopes, digital cameras, and 3D printers, technology helps people accomplish goals better than they could without it, allowing them to discover more and be more creative. This is also true for artificial intelligence. AI will be part of our tool set. The conversation around the technology should be less about the misleading images portrayed in Hollywood and science fiction literature, and more about human-centered design. Such design refers to how AI can help empower people—whether by amplifying their abilities or supporting their goals—without harming them.


Can You Use This Data Set to Find Serial Killers?

Slate, Andrew Gelman and Michael Maltz



We statisticians and social scientists are always trying to ensure that the data we collect or use are accurate, complete, and clean. We use data to estimate the effects of policies, and answering those questions requires data strong enough and clean enough to survive scrutiny. But data can also be used to ask questions, to look for interesting patterns, and one interesting thing about this type of endeavor is that it may not require the data to be quite as pristine.

This may explain part of why we were captivated by Alec Wilkinson’s recent New Yorker story on Thomas Hargrove, a retired reporter and current “homicide archivist” who “has the largest catalogue of killings in the country” that he analyses using an algorithm, “which he sometimes calls a serial-killer detector.” In his piece, Wilkinson recounts a story from 2010, when Hargrove uncovered a pattern of murders in Indiana that led to the discovery of a serial killer. Hargrove’s Murder Accountability Project is a fascinating example of citizen science that can help to motivate police departments to improve their work, and we applaud Hargrove’s efforts.

The story also inspired us to check out his data.


University Data Science News

Dick Startz, Professor of Economics at UC Santa Barbara, ran a new analysis of the pay gap between STEM academic jobs and STEM industry jobs. He found that one reason we have so few Black and Hispanic professors is that, collectively, they do better in industry than in higher ed (not a big surprise; everyone does), and that the gains from moving to industry are bigger for Black and Hispanic scientists than for everyone else (mostly whites and Asians; he doesn’t specifically break out indigenous peoples, so I assume they are in the “everyone else” bucket).

Swiss university Ecole Polytechnique Federale de Lausanne has announced (none too soon) a Center for Digital Trust. I would like to see Alteryx, Equifax, and Experian sign up.

Luke Oakden-Rayner, a radiologist and PhD candidate in medicine at the University of Adelaide who specializes in machine prediction, examined the ChestXray14 dataset, which is poised for use in predicting pulmonary diseases. He minces no words: “as it exists now, [ChestXray14] is not fit for training medical AI systems to do diagnostic work”. As a radiologist, he had the training to spot problems with label accuracy, with the medical meaning of the labels, and with the usefulness of the labels for image analysis. He noted that “many of the data users will be computer scientists without the clinical knowledge to discover any pitfalls”. This is *such* an important story. Oakden-Rayner is what we call a pi-shaped person: someone with deep expertise in radiology and now also in machine learning. He can truly see this from both the domain side and the computer science side. He’s rare. And his discovery is going to be rare, not so much because errors of this nature are rare (building a good dataset is tedious, difficult, and under-rewarded relative to the work it takes, so there will be many others with similar issues), but because pi-shaped people are so hard to find. Heck, most industries are struggling to find T-shaped data scientists. My answer: university centers for data science need to focus on recruiting and training people who are already domain experts. This is what our PhD programs should be about (and teaching students how to stand up for themselves scientifically is a superb complement). We need more people like Luke Oakden-Rayner.



Elsewhere in questionable medical data, George Washington Medical Center researchers published a paper that found flaws in ER patient experience reports.

The Andrew W. Mellon Foundation, a big funder of humanities and library scholarship, has gotten into the digital humanities with a $900,000 grant to the University of Cincinnati to support research on machine learning and data visualization in multiple humanities disciplines.

The University of Montana is launching a public opinion poll because so little data is available for studying Montanans. I happen to love Montana, so I’m excited.



Silvia Ferrari and her colleagues at Cornell and the Harvard Microrobotics Lab are refining tiny robots designed to behave like insects. They switched to neuromorphic computer chips, which process bursts (spikes) of electrical activity rather than binary 0s and 1s. These chips draw less power and can operate on a much smaller scale, which is important for robots modeled on insects. Very impressive.

Lehigh University is making a $3-million institutional investment in its Nano/Human Interface Presidential Engineering Research Initiative. The program, headed by materials scientist Martin Harmer, will span the scientific data pipeline, starting with the instrumentation and moving through the human-machine interfaces, all in the name of interpreting immense amounts of data.


NYU’s CUSP Teaches Students To Plan City Sustainability

Brooklyn magazine



After being housed in the MetroTech buildings in Downtown Brooklyn alongside NYU’s Tandon School of Engineering since its 2012 launch, NYU’s Center for Urban Science and Progress (CUSP) officially opens its doors at its new location at 370 Jay Street. As part of NYU’s physical expansion in Brooklyn, 370 Jay Street will not only be CUSP’s home but will also house other NYU programs, becoming a hub for digital technology, media arts, and outreach programs while also repurposing a long-vacant property.

Cities are “living laboratories,” and the Center focuses its research on problems that cities currently face. As part of this work, CUSP collaborates with various New York City agencies through its Urban Science Intensive (USI), a six-month applied urban analytics project that pairs CUSP graduate students with agency sponsors to address a critical urban issue or research problem.


Matthew J. Salganik on Bit by Bit: Social Research in the Digital Age

Princeton University Press



My dissertation research was about fads, something that social scientists have been studying for about as long as there have been social scientists. But because I happened to be in the right place at the right time, I had access to an incredibly powerful tool that my predecessors didn’t: the Internet. For my dissertation, rather than doing an experiment in a laboratory on campus—as many of my predecessors might have—we built a website where people could listen to and download new music. This website allowed us to run an experiment that just wasn’t possible in the past. In my book, I talk more about the scientific findings from that experiment, but while it was happening there was a specific moment that changed me and that directly led to this book. One morning, when I came into my basement office, I discovered that overnight about 100 people from Brazil had participated in my experiment. To me, this was completely shocking. At that time, I had friends running traditional lab experiments, and I knew how hard they had to work to have even 10 people participate. However, with my online experiment, 100 people participated while I was sleeping. Doing your research while you are sleeping might sound too good to be true, but it isn’t. Changes in technology—specifically the transition from the analog age to the digital age—mean that we can now collect and analyze social data in new ways. Bit by Bit is about doing social research in these new ways.


Data will change the world, and we must get its governance right

The Guardian, Daniel Zeichner



I welcome the report on data governance recently produced by The British Academy and the Royal Society.

The report covers the context in which data governance needs to be set, and explains the ways in which society is changing and how data policy must take into account the risk of “data-related controversy” and history’s warning tales of how the “widespread adoption of new technologies can increase public anxiety, or result in major public controversy”. It also sets out what it calls principles for data governance, which are “needed to visibly shape all forms of data governance and ensure trustworthiness and trust in the management and use of data as a whole”. Visibility is key: it allows accountability and legitimises any governance laws introduced.


Science is on a mission to map all 37 trillion cells in your body

Wired UK, Emma Bryce



The human body contains more than 37 trillion cells – and Sarah Teichmann wants to map them all. She’s the pioneer behind the Human Cell Atlas, an international bid to build a 3D map of all our cells. A catalogue of these foundational units of life would give researchers unparalleled power to understand and treat human disease – “laying a foundation for a new era of precision medicine”, says Teichmann, head of cellular genetics at the Sanger Institute in Cambridge, UK.


Extra Extra

Netflix released a list of its most-watched content, in case anyone is planning to consume mental junk food over the holidays.

Some year-end roundups discussing AI and machine learning:

  • Deep Learning Achievements Over the Past Year by Eduard Tyantov
  • 2017: DeepMind’s year in review from Google DeepMind
  • The biggest AI stories of 2017 by Steve Levine at Axios

Alexander Reben used AI to generate “new” episodes of the PBS show The Joy of Painting starring Bob Ross. Mr. Ross was not amused and sued Reben for copyright infringement. Oh, America, land of the free-to-litigate. Anyway, this brings us to a new legal frontier. If a human can legally learn from a painting tutorial (or book), can a software program legally do the same?


Same Title, Different Role: Commercial vs. Industrial Data Scientists

Dataconomy, Chad Stoecker



Data science is a critical component of business operations today. According to Forrester, organizations that are driven by data science research and insights are twice as likely to be market leaders within their industries. The proof is in the pudding: investment in data leads to big-time payoffs for companies.

While this data has become vital to organizations, data science strategies and practices differ dramatically between industrial and commercial organizations. From the amount and frequency of data inputs to the costs of experiments and models, data sets take on very different forms across these two sectors.


Why is Facebook admitting that social networks can be bad for our health?

Recode, Kurt Wagner



Facebook did something surprising: It admitted, in a corporate blog post yesterday, that using social media can leave people feeling crummy.

Facebook, of course, has built a multibillion-dollar business on the back of the world’s largest social network, which made the admission rather startling.

But there was a caveat to the claim: Facebook, which cited outside research throughout the report, says that using social media passively (described as “reading but not interacting with people”) makes users feel worse.


Netflix and Spotify Ask: Can Data Mining Make for Cute Ads?

The New York Times, Sapna Maheshwari



Last week, Netflix decided to have some holiday fun courtesy of its user data. So the streaming service took to Twitter to pose the question, “To the 53 people who’ve watched A Christmas Prince every day for the past 18 days: Who hurt you?”

The tweet was meant to be an entertaining jab at a cheesy holiday film that the company released last month. But while many saw the humor, others were creeped out by the specificity of the information, with some complaining that Netflix appeared to be using its data in a flip manner that mocked some customers.


A New Breed Drives the Deal-Making Frenzy at Computer Chip Firms

The New York Times, Don Clark



Matthew Murphy, a first-time chief executive, was barely a year into a turnaround effort at Marvell Semiconductor when he sold its board on a bold move: a $6 billion offer to purchase Cavium, another midsize maker of computer chips.

The offer, made in November, has since pushed Marvell’s shares up 18 percent, adding nearly $1.65 billion to a market valuation that had already risen nearly 80 percent during Mr. Murphy’s 16-month tenure. The 45-year-old chief executive has taken other big steps since succeeding the husband-and-wife team who led Marvell for 20 years, including trimming 1,700 jobs through layoffs and sales of businesses.

“Marvell was a strong company from a technology point of view, but they definitely didn’t have the financial discipline they needed,” said Mark Edelstone, Morgan Stanley’s global head of investment banking for the semiconductor industry. “It took a change in leadership to make that happen.”


Are we going too fast on driverless cars?

Science, Jeffrey Mervis



The automakers and high-tech companies spending billions of dollars on developing self-driving cars and trucks tout the idea that autonomous vehicles (AVs) will help create a safer, cleaner, and more mobile society. Politicians aren’t far behind in their enthusiasm for the new technology.

“This is probably the biggest thing to hit the auto industry since the first car came off the assembly line,” Senator Gary Peters (D–MI) told a cheering audience of researchers and executives at a recent computing conference in Washington, D.C. “It will not only completely revolutionize the way we get around, but [AVs] also have the potential to save hundreds of thousands of lives each year.”

Such predictions, however, turn out to be based on surprisingly little research. While developers amass data on the sensors and algorithms that allow cars to drive themselves, research on the social, economic, and environmental effects of AVs is sparse. Truly autonomous driving is still decades away, according to most transportation experts. And because it’s hard to study something that doesn’t yet exist, the void has been filled by speculation and by starkly contrasting visions of the future. “The current conversation … falls into what I call the utopian and dystopian views,” says Susan Shaheen, co-director of the Transportation Sustainability Research Center at the University of California (UC), Berkeley.

Events



Canonical Computation in Brains and Machines Symposium

NYU Global Institute for Advanced Study



New York, NY, March 16-18, 2018, at NYU. Registration opens soon.

Deadlines



Data Science for Social Good Summer Fellowship Chicago

Applications for fellows, mentors, and project partners for the 2018 programs are now open. Deadline to apply is January 31, 2018.

Tools & Resources



How I used machine learning to classify emails and turn them into insights (part 2).

Towards Data Science, Anthony Dm.



“After discovering the most popular terms and the most exciting emails due to clustering algorithms, I was looking for a manner to further group emails related to a specific keyword. For example, finding all the emails that are related to salary or expenses, Enron was involved in a scandal for some reason, right?”

“The first thing that came to mind to achieve this was cosine similarity. A common technique used to measure cohesion within clusters in the field of data mining.”
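
The post quoted above ranks emails against a keyword using cosine similarity, the cosine of the angle between two term vectors: cos(θ) = (a · b) / (|a| |b|). Here is a minimal sketch of that idea. It is an illustration, not the author's code, and it assumes plain bag-of-words term counts, whereas real pipelines often use TF-IDF weights instead. The function names `tokenize`, `cosine_similarity`, and `rank_by_keyword` and the toy emails are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors (Counters)."""
    # Counter returns 0 for missing terms, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares nothing with anything
    return dot / (norm_a * norm_b)


def rank_by_keyword(emails, query):
    """Return (score, email) pairs sorted from most to least similar to the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine_similarity(Counter(tokenize(e)), query_vec), e) for e in emails]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical stand-ins for email bodies; not taken from the Enron corpus.
    emails = [
        "Please review the salary adjustments before Friday.",
        "Lunch on Thursday? The usual place.",
        "Expenses report and salary figures attached.",
    ]
    for score, email in rank_by_keyword(emails, "salary expenses"):
        print(f"{score:.3f}  {email}")
```

Because term counts are never negative, scores fall between 0 (no shared terms) and 1 (identical term proportions). Emails scoring above a chosen cutoff could then be grouped under the keyword. In this sketch, choosing that cutoff is left to the analyst.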


Jupyter Tips and Tricks

YouTube, Google Cloud



The YouTube series AI Adventures is produced by Google Cloud and hosted by Yufeng Guo. It’s a solid source of tips to improve data science workflows.


How Docker Can Help You Become A More Effective Data Scientist

Towards Data Science blog, Hamel Husain



“I wrote this quick primer so you don’t have to parse all the information out there and instead can learn the things you need to know to quickly get started.”

Careers


Full-time positions outside academia

Coordinator, Digital Analytics

National Hockey League; New York, NY

Full-time, non-tenured academic positions

Bioinformatics Scientist

Harvard University, FAS Informatics Group; Cambridge, MA
