Data Science newsletter – January 22, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for January 22, 2018


 
 
Data Science News



Psychomotor function measured via online activity predicts motor vehicle fatality risk

Nature, NPJ Digital Medicine; Tim Althoff, Eric Horvitz & Ryen W. White



Impaired psychomotor performance severely increases the risk of fatal and non-fatal car accidents. However, we currently lack methods to continuously and non-intrusively monitor psychomotor performance. We show we can estimate psychomotor function at population scale from 16 billion observations of typing speeds during the input of web search queries. We show that these estimates exhibit diurnal variation with a substantial increase during typical sleep times, matching published accident risk rates. Further, we show that psychomotor impairment, as measured by keystroke timing, predicts motor vehicle fatality risk on a population level (Spearman ρ = 0.61; p ≪ 10^−10). The methods and results highlight a promising direction of harnessing ambient streams of data, such as patterns of interactions with devices, as large-scale sensors to continuously and non-intrusively monitor human psychomotor performance at population scale.
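For readers who want to see what a Spearman coefficient measures, here is a minimal sketch in plain Python. It ranks both series (ties get the average rank) and then computes the Pearson correlation of the ranks. The function names and the sample numbers are made up for illustration; this is not the authors' code or data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-hour values, invented for this example only.
typing_delay = [0.21, 0.25, 0.30, 0.24, 0.22]
fatality_rate = [1.0, 1.8, 2.9, 1.5, 1.2]
print(spearman_rho(typing_delay, fatality_rate))
```

Because only ranks matter, the coefficient captures any monotonic relationship, not just a linear one.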


Company Data Science News

Zoolingua, a company started by Professor Emeritus Con Slobodchikoff of Northern Arizona University, is translating animal communications into human speech. He studied prairie dogs, realized they make different warning vocalizations based on the type and size of nearby predators, and has since been developing a data science application that could turn your dog’s facial expressions and barks into human words. Why not start with babies? They seem frequently inscrutable and much more vocal than most dogs. Good idea for a start-up.

Edward Chenard thinks 80-90% of data scientists want to leave their jobs. He blames the “academic trap” wherein PhDs are hired to run teams but they don’t know how to: “run a P&L, manage a team, deal with people…or make market assessments.” He also decries the application of agile frameworks in the data science environment. Data science programs, take heed. Teach your students to address the full range of organizational demands.

Google, SAP, Dell EMC, and Microsoft are offering low-math suites of tools for machine learning to bring smaller companies to the AI gold rush. Data science is difficult and there aren’t enough well-trained grads yet, so expect to see lots of companies trying to bridge this gap with low-math strategies, which are roughly synonymous with low-code platforms for software development.



The Allen Institute for Artificial Intelligence is taking a more human approach to make it easier for small companies to access talented AI experts and their techniques. They are offering a CTO residency program so that “early-stage startups can access the same kind of talent that’s available to the big guys.”



Elsevier acquired Via Oncology, a University of Pittsburgh Medical Center Enterprises spinoff that “develops clinical pathways with payers and providers.” Elsevier is out there, they’ve got money, and they want to position themselves well in any emerging information space, including precision medicine and other health care applications.



Google is doubling its French workforce by opening Les Ateliers Numériques Google, focusing on AI.



Facebook has been doing some soul-searching since the election of Donald Trump. An ex-mentor to Mark Zuckerberg, Roger McNamee, summed up his estimation of what’s going on pointedly: “The problem with Facebook’s whole position is that the algorithm exists to maximize attention, and the best way to do that is to make people angry and afraid.” OK, so there may be a reason he is an ex-advisor. That statement has surface-level resonance but lacks complexity and nuance. I’m neither here nor there when it comes to what Facebook should do or be, but I think it is fair to note that we have never seen a company with so much power over the hearts and minds of 2 billion people. That’s not what Facebook wanted, but it is what they’ve got. They should be taking time and careful consideration before proceeding, as they appear to be doing.



Marc Benioff, CEO of Salesforce and Silicon Valley heavyweight, chimed in with his opinion on how to deal with Facebook: regulate it like we regulate tobacco companies. Not sure smoking and social media are synonymous, but that’s not the point. You know an industry is roiling when heavyweights try to wrap each other up in regulatory restraints.



Part of this soul-searching sees Yann LeCun stepping into a new role as Chief AI Scientist where he’ll focus on research, handing off the AI management reins to Facebook newbie Jerome Pesenti. LeCun will continue to teach at NYU. And Bored Yann LeCun “will continue to muse on the connections of all things, terrestrial and extra-, to deep neural networks” on Twitter.

NVidia’s GPUs are powering an AI watch that detects heart arrhythmias. Developed by AliveCor, a Mountain View start-up, the app “correlate[s] heart rate to a user’s current activity” to determine whether the rhythm is anomalous or not. This type of application is particularly important for those with congenital heart deformities and heart disease.



We all probably knew this was coming: Deutsche Börse has an index created by AI to identify high-performing AI companies/stocks. Want to invest? It’s called the AI Global Artificial Intelligence Index and is currently a pool of 200 stocks selected from a slate of 7000.



IBM is sending its AI platform Watson to the Grammys to gather fashion data about celebs on the red carpet. I have no idea why they are doing this, other than to garner attention for their (floundering?) AI product. In more interpretable news, the company had its first quarterly revenue growth in years.


Stanford scholars develop new algorithm to help resettle refugees and improve their integration

Stanford University, Stanford News



A new machine learning algorithm developed by Stanford researchers could help governments and resettlement agencies find the best places for refugees to relocate, depending on their particular skills and backgrounds.


Extra Extra

Cryptocurrency has caused a global shortage of graphics cards. Time to invest in NVidia! (I am not a financial advisor. Standard invest-at-your-own-risk disclosure applies.)

But don’t invest in the cloud, according to The Economist.



Carnegie Mellon University sent a cohort of faculty to Davos. Is this a thing? Since when are academics ritzy glitzy?

Wired has a podcast episode asking why Americans are dumb. They said it more nicely.


[1801.06122] Anatomy of an online misinformation network

arXiv, Computer Science > Social and Information Networks; Chengcheng Shao, Pik-Mai Hui, Lei Wang, Xinwen Jiang, Alessandro Flammini, Filippo Menczer, Giovanni Luca Ciampaglia



Massive amounts of fake news and conspiratorial content have spread over social media before and after the 2016 US Presidential Elections despite intense fact-checking efforts. How do the spread of misinformation and fact-checking compete? What are the structural and dynamic characteristics of the core of the misinformation diffusion network, and who are its main purveyors? How to reduce the overall amount of misinformation? To explore these questions we built Hoaxy, an open platform that enables large-scale, systematic studies of how misinformation and fact-checking spread and compete on Twitter. Hoaxy filters public tweets that include links to unverified claims or fact-checking articles. We perform k-core decomposition on a diffusion network obtained from two million retweets produced by several hundred thousand accounts over the six months before the election. As we move from the periphery to the core of the network, fact-checking nearly disappears, while social bots proliferate. The number of users in the main core reaches equilibrium around the time of the election, with limited churn and increasingly dense connections. We conclude by quantifying how effectively the network can be disrupted by penalizing the most central nodes. These findings provide a first look at the anatomy of a massive online misinformation diffusion network.
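The k-core decomposition mentioned in the abstract can be sketched in a few lines of plain Python. A node's core number is the largest k such that it survives in a subgraph where every node has at least k neighbors; the "main core" is the set of nodes with the highest core number. This sketch assumes an undirected graph stored as a dictionary of neighbor sets, which is a simplification of the directed retweet network the paper studies. The function names and the toy graph are illustrative and do not come from Hoaxy.

```python
def core_numbers(adjacency):
    """Map each node to its core number by peeling off the lowest-degree node."""
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Remove the node with the fewest remaining neighbors.
        node = min(remaining, key=degree.get)
        # The core level never decreases as peeling moves inward.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbor in adjacency[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


def main_core(adjacency):
    """Return the nodes whose core number is the maximum in the graph."""
    core = core_numbers(adjacency)
    top = max(core.values())
    return {node for node, value in core.items() if value == top}


# Toy graph: a triangle (a, b, c) with one peripheral account d attached to a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(core_numbers(toy))
print(main_core(toy))
```

This version rescans the remaining nodes on every step, so it is only suitable for small graphs; networks with millions of edges need a bucket-based implementation.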


University Data Science News

A group of 36 biologists and clinical researchers published a position piece at bioRxiv about the intersection of AI and biology, arguing that “deep learning has yet to revolutionize or definitively resolve any of these problems, but promising advances have been made on the prior state of the art.” They rightly note that “the limited amount of labeled data for training presents problems in some domains, as do legal and privacy constraints.” Still, they “foresee deep learning powering changes at both bench and bedside.” I see this as a yellow light – proceed with caution – for those in industry and academia chomping at the bit to rush into precision medicine like the Oklahoma boomers.

Tim Althoff (Stanford), Eric Horvitz (Microsoft), and Ryen White (Microsoft) just might have cracked the code on figuring out how to tell if someone is too impaired to drive. They found a strong monotonic correlation (Spearman coefficient = 0.61; p << 10 to the negative 10) between the speed and accuracy of typed search queries and fatalities. This means their model might lead to a driving fitness test agnostic of the cause of impairment: could be lack of sleep, alcohol, weed or other drugs, extreme anxiety, kids screaming in the backseat, etc. Stanford political scientists, meanwhile, have created a matching algorithm to place refugees in communities where they are most likely to succeed based on their skills and the economic characteristics of the receiving communities. They have a fairly wide range describing the potential accuracy of these predictions – 40 – 70% – which is not too surprising given the many human components in the model.

This week in The Passions and the p-values, human geneticist Pardis Sabeti argues that social psychology has acted on the replication debate with unproductive personal attacks, threatening to force promising researchers out of the field altogether. She cites the case of Amy Cuddy, who left the field after others failed to replicate her finding that power poses lead to increased performance in certain high stress situations. Cuddy was lambasted and ended up leaving the field. Later replications found more support for her work, but she was already gone. Sabeti advises academics to avoid the “insidious temptation to mistake being critical for being right, and to subordinate humility and decency to a gloating sense of ‘gotcha’.”

Antibiotic resistance is an apocalypse-level problem. We can all rejoice that a team of computer scientists in the US and Russia has developed an algorithm that identified ten times more antibiotics than all antibiotics discovered to date. We aren’t safe yet since no drugs have been developed, but there is hope.



Tsinghua University researchers have invented an optimized-for-AI chip called Thinker that can run on 8 AA batteries for an entire year. The beauty of its design is how it optimizes power usage based on the computational demands of the particular software with which it is currently paired. This is a big development. Also, China is now the largest producer (18.6%) of scientific articles, surpassing the US.



Aviv Regev and Sarah Teichmann are adding another biological atlas to the growing corpus of atlases, this time focusing on mapping every single cell type in the human body. Ah, to be a scientist in the age of data explosion.


Chief Data Officer vs. Chief Analytics Officer

ESRI, WhereNext magazine, Amen Ra Mashariki



Digital transformation provides cities with ever-increasing ways to capture data—from connected streetlights to CCTVs to traffic flow and collision information transmitted from smart roads. But many cities fall short of their digital potential because they focus on collecting and using that data in narrow, traditional ways.

A truly smart city goes beyond these, improving quality of life and the climate for business by analyzing data in service of specific challenges and using it to make good decisions in real time on a range of critical issues, from public safety to open data to economic investment. Probing deeply into big data and pinpointing solutions to important problems demand a chief analytics officer (CAO), rather than a chief data officer (CDO). Both are essential to a smart city or smart business, but they are quite different.


Why Are Data Science Leaders Running for the Exit?

DataScience.com, Edward Chenard



I’ve had several conversations recently with people I know in the data science space that always start out about business and then drift to the state of data science as a whole. One theme constantly comes up in these conversations: There are a lot of people currently running data science teams at large organizations and the vast majority of them — I believe we are talking about 80-90% — want to leave their jobs. Why is that? Within smaller organizations, the number isn’t as great. So what is going on in larger organizations that is causing such a mass exodus? Having worked in and with many large organizations in a data science leadership capacity, I have a few theories.


Oddball scientists, the rise of Chinese research, and other highlights from NSF’s new tome of essential science statistics

Science, Jeffrey Mervis



Scientists discover something new every day. But science policy trends can take decades to reveal themselves. That’s why the bottom line in the newest edition of an indispensable statistical tome from the National Science Foundation (NSF)—that China continues to close the gap with the United States in the international race for scientific supremacy—will sound very familiar to those who follow these trends.

“The U.S. global share of [science and technology] activities is declining as other nations—especially China—continue to rise,” NSF officials declared yesterday in rolling out the 2018 Science & Engineering Indicators, a massive biennial report that tracks scientific activity around the world. “The U.S. still leads by many measures,” adds Maria Zuber, chair of the National Science Board in Alexandria, Virginia, NSF’s oversight body, “but our lead is decreasing in certain areas that are important to the country.”


Inside Amazon Go, a Store of the Future

The New York Times, Nick Wingfield and Natasha Singer



The technology inside Amazon’s new convenience store, opening Monday in downtown Seattle, enables a shopping experience like no other — including no checkout lines.


A clutch of clinical decision support deals from UPMC spinoff Via Oncology to GE Healthcare

MedCity News, Stephanie Baum



Clinical decision support companies have become a hot acquisition target for organizations wishing to expand their tech toolboxes and help hospitals reduce the cost of care by avoiding repeated tasks and unnecessary procedures. Elsevier’s acquisition of Via Oncology, a spinoff of University of Pittsburgh Medical Center Enterprises that develops clinical pathways with payers and providers, marks the medical publisher’s latest deal in this health IT subsector.

Although the terms of the deal were not disclosed, Via Oncology employees will continue to work for the business in Pittsburgh, according to the press release.


Microsoft is discontinuing HealthVault Insights apps “research project”

MedCity News, Stephanie Baum



A Microsoft initiative that produced apps to offer analysis and user insights based on personal health records from HealthVault, has been shut down by the company. It is removing apps associated with HealthVault Insights from iOS, Android and Windows app stores.

A statement on Microsoft’s website offered no explanation for the move, except to say that HealthVault Insights was merely a research project launched last year that had come to an end.


CDC to cut back disease work in foreign countries: report

TheHill, Peter Sullivan



The Centers for Disease Control and Prevention is planning to significantly reduce its overseas work to fight disease due to coming funding cutbacks, according to an internal email reported by The Wall Street Journal.

Unless it gets new funding, the CDC will be cutting down its work against diseases from 49 countries to 10 countries starting in October 2019, the Journal reported.

 
Deadlines



We are expanding to a global DSSG network and will be running joint summer programs in Portugal, Charlotte (NC), and Chile in 2018-2019.

Applications for fellows, mentors, and project partners for the 2018 Data Science for Social Good programs (in Chicago, Charlotte, and Lisbon) are now open. Apply by January 31, 2018.

Help us find the young innovators who’ll shape the future

You have until February 1 to nominate someone for MIT Technology Review’s 2018 list of 35 Innovators Under 35.

Data through Design – Submissions

“As part of New York City’s Open Data Week, Data through Design is now accepting submissions for our exhibition March 3 – 10 at the NY Media Center” … All submissions should use at least one of NYC’s open datasets, available through Open Data Portal. Deadline for submissions is February 2.

NYU Information Law Institute Research Fellowships Available for 2018-19

The Information Law Institute at NYU is accepting applications for research fellowships to begin in Fall 2018. The initial fellowship term is for one year, with the possibility of renewal for a second year. Applicants should hold a law degree or a Ph.D. in a relevant area, and show interest and promise in conducting research and writing. We will give preference to applicants with demonstrated interest in ILI areas of focus, including privacy, algorithms and automated decision-making, and information/data ethics and politics.

JupyterCon call for speakers

New York, NY. The conference will run August 21-24. “If you have an intriguing case study, deep technical knowledge to share about Jupyter’s internals, a useful extension, or a provocative idea for new development, we’d love to hear from you.” Deadline for proposals is March 6. Also, 20% registration discount to JupyterCon with the code NYUDSC20. Even larger discount for faculty (50%) and full-time students (65%). Contact confreg@oreilly.com.
 
Tools & Resources



Convolutional neural networks for language tasks

O'Reilly Radar, Garrett Hoffman



When approaching problems with sequential data, such as natural language tasks, recurrent neural networks (RNNs) typically top the choices. While the temporal nature of RNNs is a natural fit for these problems with text data, convolutional neural networks (CNNs), which are tremendously successful when applied to vision tasks, have also demonstrated efficacy in this space.

In our LSTM tutorial, we took an in-depth look at how long short-term memory (LSTM) networks work and used TensorFlow to build a multi-layered LSTM network to model stock market sentiment from social media content. In this post, we will briefly discuss how CNNs are applied to text data while providing some sample TensorFlow code to build a CNN that can perform binary classification tasks similar to our stock market sentiment model.
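As a rough illustration of how a CNN reads text, the sketch below slides one filter over windows of consecutive word vectors, applies a ReLU, and keeps the strongest response (max-over-time pooling). It uses plain Python rather than TensorFlow, and the function name, the tiny embeddings, and the filter weights are all invented for this example; it is not the code from the linked post.

```python
def conv1d_max_pool(embeddings, filt, bias=0.0):
    """Apply one text-CNN filter: convolve over word windows, ReLU, max-pool.

    embeddings: list of word vectors, one per token in the sentence.
    filt: list of weight vectors, one per position in the window.
    """
    width = len(filt)
    if len(embeddings) < width:
        raise ValueError("sentence is shorter than the filter width")
    activations = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Dot product between the filter and the current window of words.
        total = bias + sum(
            weight * value
            for filt_row, vector in zip(filt, window)
            for weight, value in zip(filt_row, vector)
        )
        activations.append(max(0.0, total))  # ReLU
    # Keep only the strongest match anywhere in the sentence.
    return max(activations)


# Four tokens with 2-dimensional embeddings, and a filter spanning two tokens.
sentence = [[1, 0], [0, 1], [1, 1], [0, 0]]
bigram_filter = [[1, 0], [0, 1]]
print(conv1d_max_pool(sentence, bigram_filter))
```

A real model would learn many such filters of several widths and feed the pooled values into a classifier layer.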


The Ultimate Guide to JavaScript Frameworks

Javascript Report, John Hannah



Keeping up with JavaScript frameworks can be a challenge. There are a lot of them, and seemingly another one every month. How do you know which ones might be right for your project? What are their strengths and weaknesses? How do you get started?

That’s where this guide comes in. It’s a living document that is a reference for all known front end JavaScript frameworks (archived or deprecated projects are not included). In this case, the term “frameworks” is being used in a broad sense. It includes user interface (UI) libraries like React, as well as full frameworks like Angular.


Sharing basic machine learning ideas with kids

Mike Lawler



Two weeks ago I saw an interesting lecture from Gil Strang at MIT about the math behind machine learning. Sharing some of those ideas with kids has been on my mind ever since. Today I finally got around to it!

 
Careers


Full-time positions outside academia

Big Data Manager



Southern Poverty Law Center; Atlanta, GA
