Data Science newsletter – July 27, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for July 27, 2018

GROUP CURATION: N/A

 
 
Data Science News



NIH delays controversial clinical trials policy for some studies

Science, Jocelyn Kaiser



Basic brain and behavioral researchers will get more than a year to comply with a new U.S. policy that will treat many of their studies as clinical trials. The announcement from the National Institutes of Health (NIH) appears to defuse, for now, a yearlong controversy over whether basic research on humans should follow the same rules as studies testing drugs.

Although research groups had hoped NIH would drop its plans to tag basic studies with humans as trials, they say they’re relieved they get more time to prepare and give the agency input. “It’s a positive step forward,” says Paula Skedsvold, executive director of the Federation of Associations in Behavioral & Brain Sciences in Washington, D.C.

At issue is a recently revised definition of a clinical trial along with a set of rules in effect since January that are meant to increase the rigor and transparency of NIH-funded clinical trials.


How Alphabet, Amazon, Apple, and Microsoft are shaking up healthcare — and what it means for the future of the industry

Business Insider, Shelagh Dolan



New technologies including telehealth, wearables, mobile apps, and AI are facilitating a shift towards preventative medicine and value-based care, in turn reducing claims, improving benefits plans, lowering patients’ premiums, and increasing their lifetime value.

Now, four of the biggest tech companies are ramping up to go all-in on healthcare — but they’re each tackling a different part of the space.


AI, China Dominate Semiconductor Funding

EE Times, Nitin Dahad



China’s use of facial recognition technology for widespread state surveillance has been reported feverishly by many media outlets in recent months. The latest is the UK’s Financial Times, which reminds us of a 2015 official paper that articulated a vision of a national video surveillance network by 2020 that is omnipresent, always working and fully controllable.

The technology upon which it is based is reflected in the figures for investments in artificial intelligence (AI) and related tech companies, including semiconductor companies.

Take, for example, Beijing-based SenseTime, which raised more than $1.2 billion in the second quarter. Among other things, it provides AI-powered surveillance for the Chinese police. This week, the SoftBank Vision Fund was reported to be considering investing almost $1 billion more in SenseTime, which is already valued at over $4.5 billion, making it arguably the most valuable AI company globally.


Van Jones: AI jobs are a route out of poverty for urban youth

Fast Company, Sean Captain



The green jobs advocate is also promoting coding and artificial intelligence careers, teaming up with Google’s AI boss Fei-Fei Li to reach minorities.


How AI could transform the way we measure kids’ intelligence

Quartz, Jenny Anderson



There is a saying in education that you treasure what you measure. Going by the standardized tests that dominate schools in many countries around the world, we’re teaching children that we value only a very narrow definition of intelligence—the ability to solve word problems about train times, or identify the purpose of a World War I treaty on a multiple-choice test.

The truth is human intelligence is vast and complex. Yet it is measured—and valued—crassly. And in an age when artificial intelligence is capable of nailing IQ tests and mastering knowledge-based curricula, humans may be setting ourselves up to be outshone by technology.

“I think we are in danger of dumbing ourselves down,” says Rose Luckin, a professor of learning-centered design at University College London who has been studying artificial intelligence and learning for more than 25 years. Because we measure intelligence in very limited ways, “we are very impressed by the sort of intelligent behavior our technology can produce.”


ECP Announces New Co-Design Center to Focus on Exascale Machine Learning Technologies

Exascale Computing Project



Rapid growth in the amount of data and computational power is driving a revolution in machine learning (ML) and artificial intelligence (AI). Beyond the highly visible successes in machine-based natural language translation, these new ML technologies have profound implications for computational and experimental science and engineering and the exascale computing systems that DOE is deploying to support those disciplines.

To address these challenges, the ExaLearn co-design center will provide exascale ML software for use by ECP Applications projects, other ECP Co-Design Centers and DOE experimental facilities and leadership class computing facilities. The ExaLearn Co-Design Center will also collaborate with ECP PathForward vendors on the development of exascale ML software.


Helping computers perceive human emotions

MIT News



Personalized machine-learning models capture subtle variations in facial expressions to better gauge how we feel.


Who cares? New technology helps answer a big question about big data for educators

Purdue University, Research Foundation News



Purdue University researchers have developed a new way to track the effectiveness of online learning programs, speeches and businesses, and determine if anyone really cares enough to download the material.

The Purdue team developed an algorithm that sorts digital data and formats it into a 2D time and space heat map.

“Lots of times it can be difficult for someone such as an educator to determine if their online course or digital supplements to classroom lectures are being used by students,” said Dwight McKay, a senior data science engineer for Information Technology at Purdue (ITaP), who helped lead the research team. “We basically just need to know when the user came to the site and their IP address, a numeric designation that identifies a location on the internet. We can turn all of this big data and massive amounts of information into actionable numbers.”
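The pipeline described above can be sketched in a few lines: bucket each access record by hour of day and by a coarse location derived from the visitor’s IP address, then count hits per (time, place) cell. This is only an illustration of the general idea; the record format, the region labels, and the binning choices below are assumptions, not details from the Purdue work.

```python
# Illustrative sketch (not Purdue's actual algorithm): bin access
# records by hour of day and coarse location to get the counts
# behind a 2D time-and-space heat map.
from collections import Counter
from datetime import datetime

# (timestamp, region) pairs; in practice the region would be derived
# from the visitor's IP address via a geolocation lookup.
records = [
    ("2018-07-27 09:15", "US-IN"),
    ("2018-07-27 09:40", "US-IN"),
    ("2018-07-27 21:05", "DE"),
]

# Each (hour, region) key is one cell of the heat map.
counts = Counter(
    (datetime.strptime(ts, "%Y-%m-%d %H:%M").hour, region)
    for ts, region in records
)
```

Plotting `counts` as a matrix of hours against regions yields the kind of heat map the article describes, making it easy to see whether anyone is actually using the material and when.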


New Master’s Program in Data Analytics

Tufts University, Tufts Now



A new master’s degree program coming next fall reflects the university’s increasing focus on preparing students for rapidly growing career opportunities in data analysis.

The Master of Science in Data Analytics at the Graduate School of Arts and Science (GSAS) will span the sciences, arts, and humanities and give students the skills to drive strategic business decisions. Recruitment will open this fall, with enrollment beginning in fall 2019.


University of Pennsylvania’s School of Engineering launches online master’s in computer science

University of Pennsylvania, Penn Today



The University of Pennsylvania’s School of Engineering and Applied Science today announced its first completely online master’s degree, a Master of Computer and Information Technology. MCIT Online is designed to be affordable, at one-third the cost of an on-campus degree, and open to students with no previous background in computer science. Offered exclusively on Coursera, MCIT Online is ideal for students who wish to pursue a variety of high-potential career paths, ranging from software development to product management to data science, that require a strong technical foundation.


DeepMind Is Giving Cambridge Money To Hire Staff

Forbes, Sam Shead



DeepMind, a Google-owned lab that is at the front of the AI race, is gifting the University of Cambridge an undisclosed sum to appoint a “DeepMind Chair of Machine Learning”.

The donation comes after DeepMind and other tech companies have been criticised for poaching dozens of leading academics from top universities like Cambridge, luring them with big salaries that universities can’t compete with.


‘The discourse is unhinged’: how the media gets AI alarmingly wrong

The Guardian, Oscar Schwartz



According to [Zachary] Lipton, in recent years broader interest in topics like “machine learning” and “deep learning” has led to a deluge of this type of opportunistic journalism, which misrepresents research for the purpose of generating retweets and clicks – he calls it the “AI misinformation epidemic”. A growing number of researchers working in the field share Lipton’s frustration, and worry that the inaccurate and speculative stories about AI, like the Facebook story, will create unrealistic expectations for the field, which could ultimately threaten future progress and the responsible application of new technologies.

Exaggerated claims in the press about the intelligence of computers are not unique to our time, and in fact go back to the very origins of computing itself.


The World’s 1st Computer Algorithm, Written by Ada Lovelace, Sells for $125,000 at Auction

LiveScience, Brandon Specktor



Young Ada Lovelace was introduced to English society as the sole (legitimate) child of scalawag poet Lord Byron in 1815. More than 200 years later, she is remembered by many as the world’s first computer programmer.

On Monday (July 23), Lovelace’s scientific reputation got a boost when a rare first edition of one of her pioneering technical works — featuring an equation considered by some to be the world’s first computer algorithm — sold at auction for 95,000 pounds ($125,000) in the U.K. [Beyond Tesla: History’s Most Overlooked Scientists]


Accelerating GPU Betweenness Centrality

Communications of the ACM; Adam McLaughlin, David A. Bader



Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is Betweenness Centrality (BC), which has applications in community detection, power grid contingency analysis, and the study of the human brain. However, these analyses come with a high computational cost that prevents the examination of large graphs of interest.

Recently, the use of Graphics Processing Units (GPUs) has been promising for efficient processing of unstructured data sets. Prior GPU implementations of BC suffer from large local data structures and inefficient graph traversals that limit scalability and performance. Here we present a hybrid GPU implementation that provides good performance on graphs of arbitrary structure rather than just scale-free graphs as was done previously. Our methods achieve up to 13× speedup on high-diameter graphs and an average of 2.71× speedup overall compared to the best existing GPU algorithm. We also observe near linear speedup when running BC on 192 GPUs.
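For readers unfamiliar with the metric, the sequential baseline behind this line of work is Brandes’ algorithm, which GPU implementations of BC parallelize. The pure-Python sketch below computes exact BC for an unweighted, undirected graph; it is an illustration of the metric only, not the authors’ hybrid GPU code.

```python
# Brandes' algorithm for exact betweenness centrality on an
# unweighted, undirected graph. Illustration only; the paper's
# contribution is a hybrid GPU parallelization of this computation.
from collections import deque, defaultdict

def betweenness_centrality(adj):
    """adj: dict mapping each node to a list of its neighbors."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma)
        # and recording shortest-path predecessors.
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate pair dependencies in reverse
        # BFS order.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair is counted from both of its endpoints.
    return {v: c / 2.0 for v, c in bc.items()}

# Node 2 bridges the two triangles {0, 1, 2} and {2, 3, 4}, so it
# lies on every shortest path between the two sides.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}
bc = betweenness_centrality(adj)
```

The forward BFS from every source is exactly the kind of graph traversal whose irregularity makes GPU implementations hard, which is why graph structure (diameter, degree distribution) matters so much for the speedups reported above.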


ExPecto: New AI Predicts Biological Roles Of Genetic Variations

Evolving Science, Intelligent Machines



In the age of AI, this situation is improving rapidly. An example in this new realm of variation-monitoring is ExPecto, a machine learning (ML) system developed at Princeton, in conjunction with the Simons Foundation in New York.

ExPecto has been trained to assess sequences with variations in the context of what the gene(s) in that sequence normally do. The program then extrapolates a variation’s effect on the protein (or regulatory action) that the gene in question codes. Accordingly, ExPecto predicts what that effect could eventually have on a phenotype (the biological and physiological manifestation of a gene).

ExPecto was designed and trained based on previous work that found convincingly causal relationships between disease markers in the genome and actual conditions. All the genome-wide association studies, available to the Princeton/Simons team, were included, on four specific conditions. These immunological health states were chosen to inform ExPecto’s algorithms and ability to predict disease-specific variants. Empirically-validated models of variations associated with the four conditions were also integrated into the system.

 
Events



Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

KDD 2018



London, England, August 19-23. [$$$$]

 
Deadlines



Open Innovation Challenge | Nokia

“The Nokia Open Innovation Challenge, in partnership with NGP Capital, is looking for new innovative products and solutions within the Industrial IoT domain. It is time to submit your greatest ideas for technology and business models which could change the world. Together with Nokia Bell Labs experts and mentors you can make a difference and potentially grow your business to new levels.” Deadline for submissions is September 6.
 
Tools & Resources



Machine Learning in Google BigQuery

Google AI Blog, Umar Syed and Sergei Vassilvitskii



“We’re announcing BigQuery ML, a capability inside BigQuery that allows data scientists and analysts to build and deploy machine learning models on massive structured or semi-structured datasets. BigQuery ML is a set of simple SQL language extensions which enables users to utilize popular ML capabilities, performing predictive analytics like forecasting sales and creating customer segmentations right at the source, where they already store their data. BigQuery ML additionally sets smart defaults automatically and takes care of data transformation, leading to a seamless and easy to use experience with great results.”
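The “simple SQL language extensions” take the form of `CREATE MODEL` and `ML.PREDICT` statements. The snippet below shows their general shape as SQL strings; the dataset, table, and column names are hypothetical placeholders, not examples from the announcement.

```python
# General shape of BigQuery ML statements, held as SQL strings.
# Dataset, table, and column names are hypothetical placeholders.

# Train a logistic regression model directly over a table; the
# column aliased as `label` is the prediction target.
create_model = """
CREATE MODEL `mydataset.purchase_model`
OPTIONS(model_type='logistic_reg') AS
SELECT
  did_purchase AS label,
  country,
  pageviews
FROM `mydataset.visits`
"""

# Score new rows with the trained model, again as plain SQL.
predict = """
SELECT *
FROM ML.PREDICT(MODEL `mydataset.purchase_model`,
                TABLE `mydataset.new_visits`)
"""
```

In practice these statements would be submitted through the BigQuery console or a client library; the point of the feature is that training and prediction both run where the data already lives.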


Good small datasets

Vicki Boykis



“Normally, I’d use mtcars or iris, but I’ve been a bit tired of both lately, so I asked Twitter for suggestions.”

“I got a lot of good answers, so I thought I’d share them here for anyone else looking for datasets.”


Data management and sharing in neuroimaging: Practices and perceptions of MRI researchers

PLOS One; John A. Borghi, Ana E. Van Gulick



Neuroimaging methods such as magnetic resonance imaging (MRI) involve complex data collection and analysis protocols, which necessitate the establishment of good research data management (RDM). Despite efforts within the field to address issues related to rigor and reproducibility, information about the RDM-related practices and perceptions of neuroimaging researchers remains largely anecdotal. To inform such efforts, we conducted an online survey of active MRI researchers that covered a range of RDM-related topics. Survey questions addressed the type(s) of data collected, tools used for data storage, organization, and analysis, and the degree to which practices are defined and standardized within a research group. Our results demonstrate that neuroimaging data is acquired in multifarious forms, transformed and analyzed using a wide variety of software tools, and that RDM practices and perceptions vary considerably both within and between research groups, with trainees reporting less consistency than faculty. Ratings of the maturity of RDM practices from ad-hoc to refined were relatively high during the data collection and analysis phases of a project and significantly lower during the data sharing phase. Perceptions of emerging practices including open access publishing and preregistration were largely positive, but demonstrated little adoption into current practice.

 
Careers


Full-time positions outside academia

Senior Data Scientist



Institute for Environmental Analytics; Reading, England

Data Analyst / Software Developer



Institute for Environmental Analytics; Reading, England

Research Director



Geisinger Health System, Applied Behavioral Insights Team; Danville, Pa
