Data Science newsletter – March 15, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for March 15, 2018


Data Science News

Researchers say use of artificial intelligence in medicine raises ethical questions

Stanford Medicine, News Center


Artificial intelligence is hard at work crunching health data to improve diagnostics and help doctors make better decisions for their patients. But researchers at the Stanford University School of Medicine say the furious pace of growth in the development of machine-learning tools calls for physicians and scientists to carefully examine the ethical risks of incorporating them into decision-making.

In a perspective piece published March 15 in The New England Journal of Medicine, the authors acknowledged the tremendous benefit that machine learning can have on patient health. But they cautioned that the full benefit of using this type of tool to make predictions and take alternative actions can’t be realized without careful consideration of the accompanying ethical pitfalls.

“Because of the many potential benefits, there’s a strong desire in society to have these tools piloted and implemented into health care,” said the lead author, Danton Char, MD, assistant professor of anesthesiology, perioperative and pain medicine. “But we have begun to notice, from implementations in non-health care areas, that there can be ethical problems with algorithmic learning when it’s deployed at a large scale.”

Healthy Nevada Study Combines Genomics, Population Health

HealthIT Analytics, Jennifer Bresnick


A unique population health study in Nevada is combining genomic information from thousands of citizens with socioeconomic, environmental, and clinical data to develop insights into how lifestyle interacts with genetics.

The Healthy Nevada Project, a collaboration between non-profit healthcare delivery system Renown Health and the Desert Research Institute (DRI), is currently entering its second data collection phase after a successful first effort to gather genetic information from local residents.

“We’ve taken our clinical data over ten years and combined it with the expertise at Desert Research Institute’s environmental data platform and the social determinants of health from our community and put it in a mega data warehouse so that we can understand the patterns of disease and illness in our community,” said Tony Slonim, MD, DrPH, Renown Health President and CEO.

A new kind of computer science major delves into how technology is reshaping society

Los Angeles Times, Rosanna Xia


Occidental established a computer science major this fall, one of numerous liberal arts colleges to do so in recent years. They’ve popped up at Reed College in Oregon and Whitman College in Washington state.

These schools better known for teaching history and philosophy are shaping their programs to draw on their strengths. They don’t just focus on the vocational or on abstract algorithms. As artificial intelligence and automation increasingly enter everyday life, their courses push students to examine how modern technology both changes and challenges society.

In Maine, Bates College started a multidisciplinary Digital and Computational Studies program, with aims including “to interrogate the values and assumptions of a digitized world” and “increase understanding of the power and limitations of computers in solving problems.”

How 11,000 Citizen Scientists Helped Unlock Decades of Biological Data

WTTW, Chicago Tonight, Alex Ruppenthal


Hidden inside storage cabinets at the Field Museum are thousands upon thousands of unexplored plant specimens that hold potential clues to key scientific questions, such as determining the speed and impact of climate change.

The problem, until recently, was that it would take years or even decades to study each specimen in the massive plant library.

“There’s so much information there that’s locked away,” said Matt Von Konrat, who oversees more than 3 million plant specimens as head of the Field’s botanical collections. “So we were trying to think of, ‘How can we accelerate this whole process of documentation and discovery?’”

What Von Konrat and his colleagues did was enlist the help of more than 11,000 people worldwide, ranging from kindergarten students in Chicago to college professors in Chile and New Zealand.

The 2018 Developer Survey Results are Live

Stack Overflow Blog, Rachel Ferrigno


Each year, we ask the developer community about everything from their favorite technologies to their job preferences. This year marks the eighth year we’ve published our Annual Developer Survey results—with the largest number of respondents yet. Over 100,000 developers took the 30-minute survey this past January. This year, we covered a few new topics ranging from artificial intelligence to ethics in coding.

Google Cloud chief scientist: ‘AI doesn’t belong to just a few tech giants in Silicon Valley’

VentureBeat, Khari Johnson


Silicon Valley may be behind much of the development of AI in the modern world, but it’s vital that everyone feel included in the technology, said Fei-Fei Li, Google Cloud chief scientist for AI.

“It’s time to bring AI together with social science, with humanities, to really study the profound impact of AI to our society, to our legal system, to our organizations, to our society to democracy, to education, to our ethics,” Li said. “Again I stress: AI doesn’t belong to just a few tech giants in Silicon Valley, and these few companies in Silicon Valley have a responsibility to harness AI for the good of everyone, but they also have the responsibility to work with everybody, recognize we don’t know it all, and to include everybody.

“This is a historical moment, and we have a tremendous opportunity and responsibility and to really think about how to remedy this problem.”

UT System pumped millions into its oil institute, then pulled the plug

Austin American-Statesman, Ralph K.M. Haurwitz


When a blue-ribbon task force recommended that the University of Texas System establish an energy institute with research and academic chops reminiscent of Sematech, the computer chip consortium that helped spark Austin’s high-tech expansion, the system’s Board of Regents rose to the challenge.

“This is a great initiative, without question, and one critical to the state and the university system,” then-Chairman Paul Foster said in August 2014 as the regents voted unanimously to establish the institute. Francisco Cigarroa, who was chancellor at the time, said it was “at least a 10-year plan,” adding: “We will get the very best person in the United States or the world to lead this effort.”

The UT System hired Jeffrey Spath, whose credentials include a doctorate in engineering, a stint as president of the Society of Petroleum Engineers and a 32-year career with oilfield services giant Schlumberger Ltd. that saw him take on assignments around the world and rise to vice president.

But the system’s sunny ambitions for what came to be known as the Texas Oil and Gas Institute soon wilted like a flower in a hailstorm, an American-Statesman investigation has found.

Delphia helps publishers create complex, AI-driven surveys

TechCrunch, Anthony Ha


You’re probably familiar with quizzes from online publishers like BuzzFeed. But what if a quiz could actually help you sort through tough decisions and complex topics, not just which Sex and the City character or Disney princess you most closely resemble?

That’s basically what a Y Combinator-backed startup called Delphia is promising. CEO Clifton van der Linden said the company works with publishers to create applications that help their readers make decisions.

Van der Linden is a Ph.D. candidate in the political science department at the University of Toronto, and he first built an application called Vote Compass, which tells users how their political views line up with election candidates. (In the United States, Vote Compass was released in partnership with Vox.)

Now, however, Delphia is working to bring a similar approach to non-political questions, like helping kids decide which college to attend, or helping adults figure out the workplace culture that would best fit them.

Peter Thiel’s Palantir Wins $876 Million U.S. Army Contract

Bloomberg Technology, Lizette Chapman


Billionaire investor Peter Thiel got a fresh victory in Washington. His data-mining startup, Palantir Technologies Inc., won a much-contested contract to provide software to the U.S. Army.

Palantir will work with Raytheon Co. to replace the troubled Distributed Common Ground System now in effect. They beat out seven other proposals for a decade-long, $876 million contract, according to the U.S. Defense Department. Terms of the partnership between Palantir and Raytheon weren’t disclosed, and the companies didn’t immediately respond to requests for comment.

New Orleans ends its relationship with tech firm Palantir, Landrieu’s office says

Jonathan Bullington and Emily Lane


New Orleans Mayor Mitch Landrieu’s office will not renew its agreement with Palantir Technologies, the data collection and analysis company whose use in the city’s crime-fighting efforts recently raised issues of privacy and transparency.

Landrieu’s office confirmed by email Wednesday (March 14) that the agreement would not be renewed. It expired Feb. 21, and had been extended three previous times since first drafted in 2012.

“This technology is no longer being utilized in day-to-day operations, and therefore, will not be renewed,” Landrieu press secretary Craig Belden said in an email.

Litigation Analysis AI Start-Up, Gavelytics, Closes $3.2m Funding Round

Artificial Lawyer


Artificial Lawyer recently featured the Californian company which ‘highlights the vast differences between judges by tracking how each judge tends to rule on over 100 types of motions’.

It also shows how quickly judges move through their cases and how frequently lawyers file peremptory challenges against them. In September 2017, Gavelytics launched its coverage of Los Angeles County Superior Court, the largest and most active state, civil trial court system in the US.

‘Based on my experience as a big firm lawyer and as the co-founder of LegalZoom, I recognized Gavelytics as a truly exciting business that solves a real problem faced by lawyers every day,’ key investor Lee said.

Transparency in authors’ contributions and responsibilities to promote integrity in scientific publication

Proceedings of the National Academy of Sciences; Marcia K. McNutt et al.


In keeping with the growing movement in scientific publishing toward transparency in data and methods, we propose changes to journal authorship policies and procedures to provide insight into which author is responsible for which contributions, better assurance that the list is complete, and clearly articulated standards to justify earning authorship credit. To accomplish these goals, we recommend that journals adopt common and transparent standards for authorship, outline responsibilities for corresponding authors, adopt the Contributor Roles Taxonomy (CRediT) methodology for attributing contributions, include this information in article metadata, and require authors to use the ORCID persistent digital identifier. Additionally, we recommend that universities and research institutions articulate expectations about author roles and responsibilities to provide a point of common understanding for discussion of authorship across research teams. Furthermore, we propose that funding agencies adopt the ORCID identifier and accept the CRediT taxonomy. We encourage scientific societies to further authorship transparency by signing on to these recommendations and promoting them through their meetings and publications programs.

Fintech’s New Big Player: JD Finance Adds $2B, Hits $26–30B Valuation

Medium, Synced


According to a China Securities Journal report, the China International Capital Corporation (CICC) and China’s largest food processor, manufacturer and trader China National Cereals, Oils and Foodstuffs Corporation (COFCO) are looking to invest US$2 billion (CNY 13 billion) in JD Finance, the fintech spinoff of online retailer

The funding will be applied to acquiring financial licenses through mergers and acquisitions, technology research and development, and further investments. The investing institutes will hold 10 billion JD Finance shares. The transaction is expected to be completed by end of April and will raise JD Finance’s market valuation to a colossal US$26 billion to $30 billion.

Microsoft reaches a historic milestone, using AI to match human performance in translating news from Chinese to English

Microsoft, The AI Blog, Allison Lin


A team of Microsoft researchers said Wednesday that they believe they have created the first machine translation system that can translate sentences of news articles from Chinese to English with the same quality and accuracy as a person.

Researchers in the company’s Asia and U.S. labs said that their system achieved human parity on a commonly used test set of news stories, called newstest2017, which was developed by a group of industry and academic partners and released at a research conference called WMT17 last fall. To ensure the results were both accurate and on par with what people would have done, the team hired external bilingual human evaluators, who compared Microsoft’s results to two independently produced human reference translations.

Xuedong Huang, a technical fellow in charge of Microsoft’s speech, natural language and machine translation efforts, called it a major milestone in one of the most challenging natural language processing tasks.

“Hitting human parity in a machine translation task is a dream that all of us have had,” Huang said. “We just didn’t realize we’d be able to hit it so soon.”

Researchers create algorithm to separate earthquakes from seismic noise

Harvard Gazette


Marine Denolle, assistant professor of Earth and planetary sciences at Harvard, is one of several co-authors of a study that used computer-learning algorithms to identify small earthquakes buried in seismic noise. Other authors are Thibaut Perol, who has doctoral and master’s degrees from the Harvard John A. Paulson School for Engineering and Applied Sciences and the Harvard Institute for Applied Computational Science, and Michaël Gharbi, a doctoral student at Massachusetts Institute of Technology. The study was published in the journal Science Advances.

While researchers hope the algorithm may one day allow for development of a system for real-time earthquake detection, the ability to track limited “micro-seismicity” should help scientists draw a clearer picture of a number of processes in the Earth.


Globus World 2018

The University of Chicago


Chicago, IL April 25-26. This University of Chicago conference “brings together researchers, systems administrators, developers and IT leaders from top computing centers, labs and universities around the U.S.” [$$$]


HILDA 2018 Call for Papers

Houston, TX Workshop on Human-In-the-Loop Data Analytics, June 10, co-located with SIGMOD 2018. Deadline for submissions is March 23.

Student applications open for Google Summer of Code 2018

Deadline to submit proposals is March 27.

2nd Blockchain@UBC Summer Open Source Blockchain Knowledge Building Initiative

Vancouver, BC May 14-27. “The goal of this initiative is to provide upper undergraduate and graduate students from any discipline at UBC with advanced, specialized training in blockchain and distributed ledger technologies. Students from other universities and industry partners may also apply.” Deadline for applications is April 12.

Beyond Online Data Workshop | ICWSM 2018

Stanford, CA June 25, precedes ICWSM 2018. “This ICWSM workshop on ‘Beyond online data’ aims to bring together social scientists and computer scientists to think about new ways to utilize data for addressing challenging social science questions.” Deadline for paper submissions is April 22.

Mozilla Research Grants 2018H1

Mozilla seeks proposals for research funding to support its mission: to ensure the internet is a global public resource, open and accessible to all. Deadline to submit proposals is May 1.

National Science Foundation – National Institutes of Health NSF-NIH Interagency Initiative: Smart and Connected Health

“The purpose of this interagency program solicitation is the development of technologies, analytics and models supporting next generation health and medical research through high-risk, high-reward advances in computer and information science, engineering and technology, behavior, cognition, robotics and imaging. Collaborations between academic, industry, and other organizations are strongly encouraged to establish better linkages between fundamental science, medicine and healthcare practice and technology development, deployment and use.” Deadline for proposals is May 22.

Moore-Sloan Data Science Environment News

5 Minutes with Cristina Savin

Medium, NYU Center for Data Science


Neuroscience, AI, and networking? As part of this month’s Women in Data Science series, we catch up with Cristina Savin, Assistant Professor of Neural Science and Data Science

Tools & Resources

Baidu Apollo Releases the World’s Largest Dataset for Self-Driving Cars

Future Car, Eric Walz


Baidu has announced the release of ‘Apollo Scape’, billed as the world’s largest open-source dataset for autonomous driving technology.

Welcome to the ITS JPO’s public access point for ITS data

USDOT's Intelligent Transportation Systems Joint Program Office


138 datasets and counting!

How Three Lines of Code and Windows Machine Learning Empower .NET Developers to Run AI Locally on Windows 10 Devices

Microsoft, Cortana Intelligence and Machine Learning Blog, Rosane Maffei Vallim and Wilson Lee


Artificial Intelligence (AI) with deep learning and machine learning algorithms is changing the way we solve a variety of problems, from manufacturing to biomedical industries. The applications that can benefit from the power of AI are endless.

With the Windows Machine Learning (Windows ML) API, as .NET developers, we can now leverage the ONNX models that have been trained by data scientists and use them to develop intelligent applications that run AI locally. In this blog post, we will give an overview of what Windows ML can do for you; show you how to use ONNX in your UWP application; and introduce you to the Windows Machine Learning Explorer sample application that generically bootstraps ML models to allow users to dynamically select different models within the same application.

Balanced Partitioning and Hierarchical Clustering at Scale

Google Research Blog, Hossein Bateni and Kevin Aydin


Solving large-scale optimization problems often starts with graph partitioning, which means partitioning the vertices of the graph into clusters to be processed on different machines. The need to make sure that clusters are of near equal size gives rise to the balanced graph partitioning problem. In simple terms, we need to partition the vertices of a given graph into k almost equal clusters, while we minimize the number of edges that are cut by the partition. This NP-hard problem is notoriously difficult in practice because the best approximation algorithms for small instances rely on semidefinite programming which is impractical for larger instances.

This post presents the distributed algorithm we developed which is more applicable to large instances. We introduced this balanced graph-partitioning algorithm in our WSDM 2016 paper, and have applied this approach to several applications within Google. Our more recent NIPS 2017 paper provides more details of the algorithm via a theoretical and empirical study.
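
To make the objective described above concrete, here is a minimal Python sketch of the two quantities a balanced partition trades off: the number of cut edges and the spread in cluster sizes. It is not the distributed algorithm from the post. The toy graph, the function names, and the `slack` parameter are all assumptions chosen for illustration.

```python
def cut_size(edges, assignment):
    """Count edges whose two endpoints are assigned to different clusters."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def cluster_sizes(assignment, k):
    """Return the number of vertices placed in each of the k clusters."""
    sizes = [0] * k
    for cluster in assignment.values():
        sizes[cluster] += 1
    return sizes


def is_balanced(assignment, k, slack=1):
    """Treat a partition as balanced if cluster sizes differ by at most `slack`.

    The `slack` threshold is a hypothetical stand-in for "almost equal".
    """
    sizes = cluster_sizes(assignment, k)
    return max(sizes) - min(sizes) <= slack


# Toy graph (assumed for illustration): two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Both assignments put three vertices in each cluster, but they cut
# very different numbers of edges.
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
bad = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}

if __name__ == "__main__":
    for name, assignment in (("good", good), ("bad", bad)):
        print(name, cluster_sizes(assignment, 2), cut_size(edges, assignment))
```

Both assignments are equally balanced, so the cut size alone separates them: a partitioner should prefer the one that keeps each triangle together.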

Top 5 Best Jupyter Notebook Extensions

KDnuggets, Eliot Andres


“Notebook extensions are plug-ins that you can easily add to your Jupyter notebooks. The best way to install them is to use Jupyter NbExtensions Configurator.”

HALP: High-Accuracy Low-Precision Training

Stanford DAWN; Chris De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Chris Aberger, Kunle Olukotun, and Chris Ré


Low-precision computation has been gaining a lot of traction in machine learning. Companies have even started developing new hardware architectures that natively support and accelerate low-precision operations including Microsoft’s Project Brainwave and Google’s TPU. Even though using low precision can have a lot of systems benefits, low-precision methods have been used primarily for inference—not for training. Previous low-precision training algorithms suffered from a fundamental tradeoff: when calculations use fewer bits, more round-off error is added, which limits training accuracy. According to conventional wisdom, this tradeoff limits practitioners’ ability to deploy low-precision training algorithms in their systems. But is this tradeoff really fundamental? Is it possible to design algorithms that use low precision without it limiting their accuracy?

It turns out that yes, it is sometimes possible to get high-accuracy solutions from low-precision training—and here we’ll describe a new variant of stochastic gradient descent (SGD) called high-accuracy low precision (HALP) that can do it. HALP can do better than previous algorithms because it reduces the two sources of noise that limit the accuracy of low-precision SGD: gradient variance and round-off error.
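
The round-off problem described above can be seen in a small Python sketch. This is not HALP itself. It is plain gradient descent on a one-dimensional quadratic, with weights rounded to a coarse fixed-point grid after every update. The objective, the learning rate, the grid step, and the round-to-nearest rule are all assumptions chosen for illustration.

```python
def quantize(x, step):
    """Round x to the nearest multiple of `step` (a uniform fixed-point grid)."""
    return round(x / step) * step


def gradient_descent(grad, w0, lr, iters, step=None):
    """Run gradient descent; if `step` is given, quantize the weight each update."""
    w = w0 if step is None else quantize(w0, step)
    for _ in range(iters):
        w = w - lr * grad(w)
        if step is not None:
            w = quantize(w, step)
    return w


def grad(w):
    """Gradient of the assumed objective f(w) = (w - 0.3)**2 / 2."""
    return w - 0.3


# Full precision converges toward the minimizer 0.3.
full = gradient_descent(grad, w0=1.0, lr=0.1, iters=200)

# With a grid step of 0.25, each update is smaller than half a step,
# so rounding to the nearest grid point undoes it every time.
low = gradient_descent(grad, w0=1.0, lr=0.1, iters=200, step=0.25)

if __name__ == "__main__":
    print("full precision:", full)
    print("low precision: ", low)
```

In this setup the low-precision run never leaves its starting point, because every update is rounded away. That is one simple way round-off can cap the accuracy of low-precision training.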


Full-time positions outside academia

Coordinator for Development of Econ-ARK Toolkit

NumFOCUS: Remote

Lead Data Scientist

RetailMeNot; Austin, TX

Internships and other temporary positions

NIPS/ICML/ICLR are looking for a full-time programmer to run the conferences’ submission/review processes

The Salk Institute, Computational Neurobiology Laboratory; La Jolla, CA

Full-time, non-tenured academic positions

Research Engineering/ Scientist Associate I (Exempt) – Data Science Specialist

University of Texas at Austin, Texas Advanced Computing Center; North Austin, TX
