Data Science newsletter – January 29, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for January 29, 2018

GROUP CURATION: N/A

 
 
Data Science News



Is “Murder by Machine Learning” the New “Death by PowerPoint”?

Harvard Business Review, Michael Schrage


Software doesn’t always end up being the productivity panacea that it promises to be. As its victims know all too well, “death by PowerPoint,” the poor use of the presentation software, sucks the life and energy out of far too many meetings. And audit after enterprise audit reveals spreadsheets rife with errors and macro miscalculations. Email and chat facilitate similar dysfunction; inbox overload demonstrably hurts managerial performance and morale. No surprises here — this is sadly a global reality that we’re all too familiar with.

So what makes artificial intelligence/machine learning (AI/ML) champions confident that their technologies will be immune to comparably counterproductive outcomes? They shouldn’t be so sure. Digital empowerment all too frequently leads to organizational mismanagement and abuse. The enterprise history of personal productivity tools offers plenty of unhappy litanies of unintended consequences. For too many managers, the technology’s costs often rival its benefits.

It’s precisely because machine learning and artificial intelligence platforms are supposed to be “smart” that they pose uniquely challenging organizational risks. They are likelier to inspire false and/or misplaced confidence in their findings; to amplify or further entrench data-based biases; and to reinforce — or even exacerbate — the very human flaws of the people who deploy them.


Scoop: Trump team considers nationalizing 5G network

Axios; Jonathan Swan, David McCabe, Ina Fried, Kim Hart


Trump national security officials are considering an unprecedented federal takeover of a portion of the nation’s mobile network to guard against China, according to sensitive documents obtained by Axios.

Why it matters: We’ve got our hands on a PowerPoint deck and a memo — both produced by a senior National Security Council official — which were presented recently to senior officials at other agencies in the Trump administration.

The main points: The documents say America needs a centralized nationwide 5G network within three years. There’ll be a fierce debate inside the Trump administration — and an outcry from the industry — over the next 6-8 months over how such a network is built and paid for.


Summary of #AI in #Davos2018 #WEF2018.

Twitter, Fei-Fei Li


The snowy mountain town of Davos is filled with heated talks all week long. #AI is certainly a highly popular topic. Here is a brief summary after participating in more than a dozen onstage & countless number of offstage discussions. [thread]


At Harvard, developing software to spot misused images in science

Elsevier Connect, Lucy Goodchild van Hilten


When you’re submitting a manuscript to a journal, you might have to select the image for a figure from a folder with hundreds of different image files. It’s easy to make a mistake – the slip of a finger, a wrongly named file, a mistake in your notes. And that can lead to the wrong image ending up in your submission and, unless it’s spotted by the keen eyes of a reviewer or editor, in the published paper.

A review of 20,000 published papers revealed that nearly 4 percent of them contained inappropriately duplicated images – most often due to mistakes in data management, but some potentially resulting from misconduct. What if there were a way to detect image duplication and other manipulations so problems could be corrected prior to publication?

That’s what a team of data analysts and research integrity experts at Harvard Medical School are doing. With support from Elsevier, the team is working with other facilities at Harvard, including the Image and Data Analysis Core (IDAC), to develop a tool that can objectively indicate whether an image has been manipulated or is a duplication. That tool may ultimately feature in Elsevier’s editorial submission system.
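The excerpt doesn't say how the Harvard/Elsevier tool detects duplication. As a minimal sketch of one common technique for flagging re-used images, here is an average hash ("aHash") comparison, assuming grayscale images supplied as 2D lists of pixel intensities. The function names and the 5-bit threshold are illustrative choices, not the tool's method.

```python
from typing import List

Image = List[List[float]]  # grayscale pixel intensities, rows of equal length


def average_hash(img: Image, size: int = 8) -> int:
    """Downsample to a size x size grid by block averaging, then emit one bit
    per cell: 1 if the cell is brighter than the mean of all cells."""
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image is smaller than the hash grid")
    cells = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def likely_duplicates(a: Image, b: Image, max_bits: int = 5) -> bool:
    """Flag two images whose hashes differ in at most max_bits bits."""
    return hamming(average_hash(a), average_hash(b)) <= max_bits
```

A hash like this is unchanged by a uniform brightness shift, but it is not designed to catch flipped, rotated or cropped re-use without extra checks.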


AI-enabled Face-Swap Porn is on the Rise—and the Law Can’t Help You

WIRED, Culture, Emma Gray Ellis


To many vulnerable people on the internet, especially women, this looks a whole lot like the end times. “I share your sense of doom,” says Mary Anne Franks, who teaches First Amendment and technology law at the University of Miami Law School and also serves as the tech and legislative policy advisor for the Cyber Civil Rights Initiative. “I think it is going to be that bad.”

She should know. Franks helped write much of the US’s existing legislation that criminalizes nonconsensual porn—and it’s not going to help. It’s not that Franks and lawmakers weren’t thinking about the implications of manipulated images. It’s that the premise of any current legislation is that nonconsensual porn is a privacy violation. Face-swap porn may be deeply, personally humiliating for the people whose likeness is used, but it’s technically not a privacy issue. That’s because, unlike a nude photo filched from the cloud, this kind of material is bogus. You can’t sue someone for exposing the intimate details of your life when it’s not your life they’re exposing.


Emory looks to lead in education innovation

Emory University, News Center


David Kulp 20C felt somewhat anxious when he signed up for an introductory chemistry course at Emory last year.

Understandably so. His last taste of chemistry dated back to sophomore year of high school. On top of that, he spent a gap year away from the classroom.

Now he was putting his dream of a medical career on the line with a deep plunge into basic chemistry that, while not a true weed-out class, could still prove extremely abstract.

“I’m more of a big-picture guy,” he says.

Fortunately for him and his classmates, timing was good. They were the first students in Emory’s College of Arts and Sciences and Oxford College to experience a pilot program of the Department of Chemistry’s major overhaul of everything from introductory courses to capstone senior seminars. Called “Chemistry Unbound,” the program is one of the first from a major research university to completely rethink the way chemistry is taught.


How a computer program can prevent a terrorist attack

Colorado State University, Source


After a terrorist attack, the question always arises: Could this have been prevented?

The answer may lie at the intersection of data science and social science. A newly funded project among Colorado State University and Brandeis University researchers is aimed at creating a powerful, data-driven tool that can help law enforcement identify individuals headed toward violent extremism.

CSU’s Anura Jayasumana, professor of electrical and computer engineering with a joint appointment in computer science, and Brandeis’ Jytte Klausen have been awarded $731,000 over two years from the National Institute of Justice to develop a “dynamic risk assessment protocol” that can anticipate imminent risk of violence in individuals. The computational tool would monitor and screen for proven risk indicators of radicalization among large databases of people that would be impossible to comb through one by one.


Link found between genes in mosquitos and the spread of diseases in new MU research

University of Missouri, News Bureau


Researchers at the University of Missouri have found a link between genetic molecules in mosquitos and dengue fever. The results of the study, conducted by Susanta K. Behura and his colleagues, could lead to future breakthroughs in combating destructive tropical diseases like dengue fever, Zika virus and yellow fever. The researchers focused their efforts on a single species of mosquito, Aedes aegypti, a key player in the spread of such diseases in animals and humans around the world.

An estimated 2.5 billion people reside in places where it’s possible to contract dengue fever, according to the Centers for Disease Control and Prevention. While small outbreaks have occurred in the continental U.S., dengue fever has been particularly severe in Puerto Rico, where a 2007 epidemic affected approximately 10,000 people. According to the World Health Organization, “Of all disease-transmitting insects, the mosquito is the greatest menace.”

“Given the widespread global distribution of the Aedes aegypti mosquito, we estimate over 40 percent of the world’s population is at risk from dengue,” said Behura, assistant research professor in bioinformatics and computational biology in the Division of Animal Sciences. “Now that the study has been completed, the research could be used to precisely modify the genetic material of mosquitos, preventing them from spreading disease to humans.”


Few physicians are fluent in data. New program aims to fix that.

AMA Wire, Brendan Murphy


The influx of electronic data presents a new problem: People working in health care, on both the clinical and system levels, are not properly trained to interpret it. This problem is being addressed by a collaboration between the Regenstrief Institute, IUSM and IU’s Richard M. Fairbanks School of Public Health at Indiana University-Purdue University Indianapolis.

Their program aims to address the need by preparing graduate students and post-doctoral fellows, including physicians, to make sense of that data while working in a broad spectrum of entities across the health care industry.


Yale’s Most Popular Class Ever: Happiness

The New York Times, David Shimer


On Jan. 12, a few days after registration opened at Yale for Psyc 157, Psychology and the Good Life, roughly 300 people had signed up. Within three days, the figure had more than doubled. After three more days, about 1,200 students, or nearly one-fourth of Yale undergraduates, were enrolled.

The course, taught by Laurie Santos, 42, a psychology professor and the head of one of Yale’s residential colleges, tries to teach students how to lead a happier, more satisfying life in twice-weekly lectures.

“Students want to change, to be happier themselves, and to change the culture here on campus,” Dr. Santos said in an interview. “With one in four students at Yale taking it, if we see good habits, things like students showing more gratitude, procrastinating less, increasing social connections, we’re actually seeding change in the school’s culture.”


Ford Paves a Path From Big Automaker to Big Operating System

WIRED, Transportation, Aarian Marshall


In its 114-year history, Ford has been many kinds of automaker. A manufacturing innovator, a hawker of Mustang muscle, a pickup powerhouse. Now the company that helped put a car (or two) in every garage wants to be something else altogether: an operating system.

“With the power of AI and the rise of autonomous and connected vehicles, for the first time in a century, we have mobility technology that won’t just incrementally improve the old system but can completely disrupt it,” CEO Jim Hackett said in a keynote address at this year’s Consumer Electronics Show, trumpeting the pivot. “A total redesign of the surface transportation system with humans and community at the center.”

As Ford executives move to execute the plan, they unveiled yesterday a reorganization of the automaker’s young mobility business, with two acquisitions to help it along. It’s all in service of a new, very 21st century goal. Ford will put less effort into convincing people to plunk down their credit cards for personal cars (though that’s still important) and more into moving them from A to B, with a little Ford badge tacked onto whatever gets them there.


[1801.06700] A Deep Reinforcement Learning Chatbot (Short Version)

arXiv, Computer Science > Computation and Language; Yoshua Bengio et al.


We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including neural network and template-based models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than other systems. The results highlight the potential of coupling ensemble systems with deep reinforcement learning as a fruitful path for developing real-world, open-domain conversational agents.
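The abstract describes an ensemble of response models whose candidates are chosen by a selection policy trained with reinforcement learning. As a heavily simplified sketch of that idea, not MILABOT's actual model, here is a linear softmax policy over hypothetical per-candidate feature vectors, updated with the REINFORCE rule from a scalar user reward. The class name, the one-hot features and the toy reward rule are all illustrative assumptions.

```python
import math
import random
from typing import List, Sequence

Features = Sequence[float]


class LinearSoftmaxSelector:
    """Hypothetical response selector: scores each candidate with w . x and
    samples one candidate from softmax(scores)."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.lr = lr

    def probs(self, candidates: Sequence[Features]) -> List[float]:
        scores = [sum(wi * xi for wi, xi in zip(self.w, x)) for x in candidates]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def select(self, candidates: Sequence[Features], rng: random.Random) -> int:
        p = self.probs(candidates)
        return rng.choices(range(len(candidates)), weights=p, k=1)[0]

    def update(self, candidates: Sequence[Features], chosen: int,
               reward: float, baseline: float = 0.0) -> None:
        """REINFORCE: w += lr * (reward - baseline) * grad_w log pi(chosen).
        For a linear softmax policy that gradient is x_chosen - E_pi[x]."""
        p = self.probs(candidates)
        advantage = reward - baseline
        for f in range(len(self.w)):
            expected = sum(pk * x[f] for pk, x in zip(p, candidates))
            self.w[f] += self.lr * advantage * (candidates[chosen][f] - expected)


def simulate(turns: int = 2000, seed: int = 0) -> List[float]:
    """Toy run: three ensemble members, features are a one-hot model id, and the
    simulated user rewards only responses from member 2."""
    rng = random.Random(seed)
    selector = LinearSoftmaxSelector(n_features=3)
    candidates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for _ in range(turns):
        chosen = selector.select(candidates, rng)
        reward = 1.0 if chosen == 2 else 0.0
        selector.update(candidates, chosen, reward, baseline=0.5)
    return selector.probs(candidates)
```

In MILABOT the reward signal came from crowdsourced data and real-world user interactions; here it is a hard-coded rule, so the policy simply learns to prefer member 2.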


Artificial neurons compute faster than the human brain

Nature, News, Sara Reardon


Superconducting computing chips modelled after neurons can process information faster and more efficiently than the human brain. That achievement, described in Science Advances on 26 January, is a key benchmark in the development of advanced computing devices designed to mimic biological systems. And it could open the door to more natural machine-learning software, although many hurdles remain before it could be used commercially.

Artificial intelligence software has increasingly begun to imitate the brain. Algorithms such as Google’s automatic image-classification and language-learning programs use networks of artificial neurons to perform complex tasks. But because conventional computer hardware was not designed to run brain-like algorithms, these machine-learning tasks require orders of magnitude more computing power than the human brain does.

“There must be a better way to do this, because nature has figured out a better way to do this,” says Michael Schneider, a physicist at the US National Institute of Standards and Technology (NIST) in Boulder, Colorado, and a co-author of the study.


Internet-scanning U-M startup offers new approach to cybersecurity

University of Michigan, Michigan News


Rolling out what it’s calling a “street view for cyberspace,” Censys—a tech startup based on technology developed at the University of Michigan—has launched a commercially available version of its internet-wide scanning tool.

Based on technology developed in the lab of U-M computer science and engineering professor J. Alex Halderman, Censys continuously scans the internet, analyzing every publicly visible server and device. It uses the data that comes back to create a dynamic, searchable snapshot of the entire internet.

Censys is designed to be a cybersecurity defense tool for IT experts working to secure large networks, which are composed of a constantly changing array of devices ranging from servers to smartphones and internet-of-things devices.


The Follower Factory

The New York Times; Nicholas Confessore, Gabriel J.X. Dance, Richard Harris and Mark Hansen


The real Jessica Rychly is a Minnesota teenager with a broad smile and wavy hair. She likes reading and the rapper Post Malone. When she goes on Facebook or Twitter, she sometimes muses about being bored or trades jokes with friends. Occasionally, like many teenagers, she posts a duck-face selfie.

But on Twitter, there is a version of Jessica that none of her friends or family would recognize. While the two Jessicas share a name, photograph and whimsical bio — “I have issues” — the other Jessica promoted accounts hawking Canadian real estate investments, cryptocurrency and a radio station in Ghana. The fake Jessica followed or retweeted accounts using Arabic and Indonesian, languages the real Jessica does not speak. While she was a 17-year-old high school senior, her fake counterpart frequently promoted graphic pornography, retweeting accounts called Squirtamania and Porno Dan.

All these accounts belong to customers of an obscure American company named Devumi that has collected millions of dollars in a shadowy global marketplace for social media fraud.


Pfizer automated flow system screens 1,500 reactions per day

Chemical & Engineering News, Stu Borman


Optimizing chemical reactions and finding the best ways to scale them up to produce compounds for clinical testing and drug commercialization can be tedious, time-consuming, and expensive.

To overcome these limitations, in 2015 Merck & Co. researchers developed a microtiter plate-based system in which reagents were pipetted at nanoliter volumes and then analyzed with liquid chromatography/mass spectrometry (LC/MS). The system carried out and evaluated nanomole-scale catalytic cross-coupling reactions at a rate of more than 1,500 per day (Science 2015, DOI: 10.1126/science.1259203). But that system requires the use of nonvolatile solvents that won’t evaporate easily from the open wells of microtiter plates, and reactions can’t be heated.

Inspired by the Merck system and advances in flow chemistry, researchers at Pfizer aimed to build an even more flexible high-throughput system for chemical reaction discovery.


Extra Extra

Psychologists Terrie Moffitt and Avshalom Caspi study a cohort of about 1,000 New Zealanders who have been followed since 1972, with a retention rate of 95%. Their work is an argument for small, rich data and 100% confidentiality. They discovered how to detect teens at high risk for schizophrenia, which teenage boys are likely to continue to be violent as adults (only ~5%), and the later-life cognitive decline associated with teenage cannabis use. This is the long read of the month.

Second place long read: the WIRED Backchannel deep dive on AI at Amazon by Steven Levy.

In the AI DJ Project, an AI DJ collaborates with a human DJ to spin records and can even read the crowd.



DJ Patil is getting ready to finalize a code of ethics for data science… and so am I. We’ll be at the same event next Tuesday. Look for the final code in the coming days.

 
Events



UC Center for Business Analytics Presents Machine Learning Day

University of Cincinnati


Cincinnati, OH “The University of Cincinnati’s Center for Business Analytics’s Machine Learning Day event on Feb. 14 features three morning keynotes and six afternoon case studies and technical presentations.” [$$$]


d3.oakland(“Block Party”)

Bay Area d3 User Group


Oakland, CA Tuesday, February 13, starting at 6:30 p.m., GreenInfo Network (2201 Broadway, Suite 604). [rsvp required]

 
Deadlines



Membership – NumFOCUS

“NumFOCUS is a 501(c)(3) nonprofit in the United States. Your tax-deductible donation supports NumFOCUS in our mission to promote sustainable high-level programming languages, open code development, and reproducible scientific research.”

Invent the Future – Simon Fraser University

Burnaby, BC, Canada July 8-20. “Invent the Future is a two-week summer enrichment program for Grade 11 girls at Simon Fraser University’s Burnaby campus.” Deadline for applications is April 3.
 
Tools & Resources



DynaML

GitHub – transcendent-ai-labs


“DynaML is a Scala environment for conducting research and education in Machine Learning. DynaML comes packaged with a powerful library of classes for various predictive models and a Scala REPL where one can not only build custom models but also play around with data work-flows. It can also be used as an educational/research tool for data analysis.”


Khroma — The AI color tool for designers.

George Hastings


“TL;DR: Why painstakingly craft color palettes by hand when I can train an AI to learn how to infinitely replicate palettes I like? I did, and with Khroma anyone can.”
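The announcement doesn't describe the model behind Khroma. As a rough sketch of the idea in the quote, learning a personal color preference from examples and then generating palettes that match it, here is a k-nearest-neighbour preference score in RGB space used to rank randomly sampled palettes. The functions, the value of k and the sampling scheme are illustrative assumptions, not Khroma's method.

```python
import random
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def dist2(a: RGB, b: RGB) -> int:
    """Squared Euclidean distance in RGB space (a crude stand-in for perceptual distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def color_score(c: RGB, liked: Sequence[RGB], disliked: Sequence[RGB], k: int = 3) -> float:
    """Fraction of the k nearest labelled colors that the user liked."""
    labelled = [(dist2(c, x), 1) for x in liked] + [(dist2(c, x), 0) for x in disliked]
    labelled.sort(key=lambda t: t[0])
    nearest = labelled[:k]
    return sum(label for _, label in nearest) / len(nearest)


def suggest_palettes(liked: Sequence[RGB], disliked: Sequence[RGB], n_palettes: int = 3,
                     size: int = 4, n_samples: int = 500, seed: int = 0) -> List[List[RGB]]:
    """Sample random palettes and keep the ones whose colors score highest on average."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_samples):
        palette = [(rng.randrange(256), rng.randrange(256), rng.randrange(256))
                   for _ in range(size)]
        avg = sum(color_score(c, liked, disliked) for c in palette) / size
        scored.append((avg, palette))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [palette for _, palette in scored[:n_palettes]]
```

Sampling and filtering keeps the sketch short; a learned generator would be the more natural choice for producing palettes "infinitely".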


Elasticsearch ‘Learning to Rank’ Released, Bringing Open Source AI to Search Teams

PR Newswire, OpenSource Connections


“Search experts at OpenSource Connections, the Wikimedia Foundation, and Snagajob, deliver open source cognitive search capabilities to the Elasticsearch community. The open source Learning to Rank plugin allows organizations to control search relevance ranking with machine learning. The plugin is currently delivering search results at Wikipedia and Snagajob, providing significant search quality improvements over legacy solutions.”
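The release doesn't show how the plugin is configured, and the sketch below does not use its API. It only illustrates the general learning-to-rank idea the quote refers to: fit a scoring function over per-document features from graded relevance judgments, then check the ranking with a metric such as NDCG. The linear model, the pairwise logistic (RankNet-style) loss and the two features (a text-match score and a popularity signal) are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Doc = Tuple[List[float], int]  # (feature vector, graded relevance label)


def score(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def _sigmoid_neg(m: float) -> float:
    """1 / (1 + e^m), computed without overflow for large |m|."""
    if m >= 0:
        e = math.exp(-m)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(m))


def train_pairwise(queries: List[List[Doc]], n_features: int,
                   lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Gradient descent on the pairwise logistic loss log(1 + e^-(s_i - s_j)),
    summed over pairs in the same query where doc i is more relevant than doc j."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for docs in queries:
            for xi, yi in docs:
                for xj, yj in docs:
                    if yi <= yj:
                        continue
                    # the loss gradient w.r.t. w is -(1 / (1 + e^margin)) * (xi - xj)
                    step = lr * _sigmoid_neg(score(w, xi) - score(w, xj))
                    for f in range(n_features):
                        w[f] += step * (xi[f] - xj[f])
    return w


def rank(w: Sequence[float], docs: List[Doc]) -> List[Doc]:
    """Order one query's documents by learned score, best first."""
    return sorted(docs, key=lambda d: score(w, d[0]), reverse=True)


def ndcg(labels: Sequence[int], k: int) -> float:
    """NDCG@k for labels listed in ranked order (gain 2^rel - 1, log2 discount)."""
    def dcg(rels: Sequence[int]) -> float:
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0
```

Pairwise losses only care about the relative order of documents within a query, which matches how search results are consumed; an NDCG of 1.0 means the learned order agrees with the judgments.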


How to be a SciPy reviewer

Medium, Lorena A Barba


“SciPy has evolved its process to build the conference program in recent years, aiming for more transparency and trust. In 2017, we moved to double-open peer review: authors and reviewers are known to each other (i.e., reviews are signed). We made a tenacious outreach effort to enlist reviewers, we gave area chairs a clear charge to make decisions, and we developed the SciPy Chair and Reviewer Guideline.”

“The size of our reviewer pool went from 58 and 67 members in 2015 and 2016, to 165 in 2017 — and we expect even more volunteers this year. With 154 submissions received in 2017, the workload was about three submissions per reviewer. This year, we plan to increase the number of reviewers assigned per submission, aiming to get four meaningful reviews for each. Anticipating a larger reviewer pool, we think the workload will remain below 5 submissions per reviewer.”
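The per-reviewer figures follow from a simple load calculation. As a back-of-the-envelope sketch, assuming every submission receives the same number of reviews and the load is spread evenly, load = submissions × reviews per submission ÷ reviewers. The post doesn't state how many reviews each 2017 submission got; three is an inference that reproduces "about three submissions per reviewer", and holding 2018 submissions at 154 is purely hypothetical.

```python
import math


def reviews_per_reviewer(submissions: int, reviews_per_submission: int, reviewers: int) -> float:
    """Average number of submissions each reviewer handles, assuming an even spread."""
    return submissions * reviews_per_submission / reviewers


def min_reviewers(submissions: int, reviews_per_submission: int, max_load: float) -> int:
    """Smallest reviewer pool that keeps the average load strictly below max_load."""
    return math.floor(submissions * reviews_per_submission / max_load) + 1


# 2017: 154 submissions and 165 reviewers, assuming 3 reviews each -> 2.8 per reviewer
print(reviews_per_reviewer(154, 3, 165))
# Hypothetical 2018 scenario: 4 reviews per submission, submissions held at 154
print(reviews_per_reviewer(154, 4, 165))
print(min_reviewers(154, 4, 5))
```

Under these assumptions, even the 2017 pool would keep the load under five with four reviews per submission; only about 124 reviewers would be needed.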


[1801.08336] Big Data Visualization Tools

arXiv, Computer Science > Databases; Nikos Bikakis


“Data visualization is the presentation of data in a pictorial or graphical format, and a data visualization tool is the software that generates this presentation. Data visualization provides users with intuitive means to interactively explore and analyze data, enabling them to effectively identify interesting patterns, infer correlations and causalities, and supports sense-making activities.”


The UX of AI – Using Google Clips to understand how a human-centered design process elevates artificial intelligence

Google Design, Josh Lovejoy


As was the case with the mobile revolution, and the web before that, machine learning will cause us to rethink, restructure, and reconsider what’s possible in virtually every experience we build. In the Google UX community, we’ve started an effort called “human-centered machine learning” to help focus and guide that conversation. Using this lens, we look across products to see how machine learning (ML) can stay grounded in human needs while solving for them—in ways that are uniquely possible through ML. Our team at Google works across the company to bring UXers up to speed on core ML concepts, understand how to best integrate ML into the UX utility belt, and ensure we’re building ML and AI in inclusive ways.

Google Clips is an intelligent camera designed to capture candid moments of familiar people and pets. It uses completely on-device machine intelligence to learn to only focus on the people you spend time with, as well as to understand what makes for a beautiful and memorable photograph. Using Google Clips as a case study, we’ll walk through the core takeaways after three years of building the on-device models, industrial design, and user interface—including what it means in practice to take a human-centered approach to designing an AI-powered product.

 
Careers


Full-time positions outside academia

Program Specialist – Institute of Museum and Library Services



National Foundation on the Arts and the Humanities, Institute of Museum and Library Services; Washington, DC

Chief Operating Officer



Partnership on AI; San Francisco, CA

Internships and other temporary positions

Engineering Intern, Data Science



iHeartRadio; New York, NY
