Data Science newsletter – October 31, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for October 31, 2017


Data Science News

Assessing and Responding to the Growth of Computer Science Undergraduate Enrollments

The National Academies Press


The field of computer science (CS) is currently experiencing a surge in undergraduate degree production and course enrollments, which is straining program resources at many institutions and causing concern among faculty and administrators about how best to respond to the rapidly growing demand. There is also significant interest in what this growth will mean for the future of CS programs, the role of computer science in academic institutions, the field as a whole, and U.S. society more broadly.

Assessing and Responding to the Growth of Computer Science Undergraduate Enrollments seeks to provide a better understanding of the current trends in computing enrollments in the context of past trends. It examines drivers of the current enrollment surge, relationships between the surge and current and potential gains in diversity in the field, and the potential impacts of responses to the increased demand for computing in higher education, and it considers the likely effects of those responses on students, faculty, and institutions. This report provides recommendations for what institutions of higher education, government agencies, and the private sector can do to respond to the surge and plan for a strong and sustainable future for the field of CS in general, the health of the institutions of higher education, and the prosperity of the nation.

Qualcomm: 5G will power the push toward outcome-based care

MobiHealthNews, Dave Muoio


5G mobile networks may be years away, but that hasn’t stopped mobile health players from salivating over its potential — and perhaps for good reason. A newly released report from Qualcomm and the University of California, Berkeley suggests that the technology will become a “substantial enabler” of the future’s personalized health care ecosystem, and within healthcare alone will have a sales enabling effect of more than $1.1 trillion.

“This is [healthcare’s] network, they’re building this for us,” Qualcomm Life President Rick Valencia said during a recent keynote presentation at the Connected Health Conference in Boston. “This network is not just about what you think generally with new network technology — ‘It’s going to be faster, and I’ll be able to watch videos and real-time streaming wherever I am!’ That’s part of it, but also ultra-low latency and high reliability for mission-critical use cases. What more mission-critical use case can there be than healthcare?”

New “Ultrasound on a Chip” Tool Could Revolutionize Medical Imaging

IEEE Spectrum, Eliza Strickland


Jonathan Rothberg, an entrepreneur who prides himself on drastically disrupting the biomedical industry every so often, has typically big claims for his new product. The Butterfly iQ, a cheap handheld ultrasound tool with AI smarts tucked inside, will 1) revolutionize medical imaging in hospitals and clinics, 2) change the game in global health, and 3) eventually become a consumer product that will be as ubiquitous as the household thermometer, he says.

Today, Rothberg’s startup Butterfly Network unveiled the tool and announced its FDA clearance for 13 clinical applications, including cardiac scans, fetal and obstetric exams, and musculoskeletal checks. Rather than using a dedicated piece of hardware for the controls and image display, the iQ works with the user’s iPhone. The company says it will start shipping units in 2018 at an initial price of about $2,000.

CICS Robotics Researchers Plan Studies to Enhance Machines’ Autonomy

UMass Amherst, College of Information and Computer Sciences


Despite substantial progress in recent years, there are still “considerable barriers” to deploying fully autonomous systems such as self-driving cars or mobile service robots, say computer scientists Shlomo Zilberstein and Joydeep Biswas at the University of Massachusetts Amherst. They recently received a three-year, $700,000 grant from the National Science Foundation to develop more advanced autonomous systems that can learn from experience, recognize when they need human help and ask for it.

As Zilberstein explains, “Most robots in industry work in cages because it’s dangerous for people to be near them, but there is growing interest now in having robots in the home, robots that can help people. This kind of robot will have to be more autonomous than those on an assembly line because operating in the open world is a lot harder, where the environment is less predictable.”

He adds, “The basic assumption we make is that these systems will not be so independent that they will never need help. They will always need help to learn about their environment, and most important, to deal with the unexpected.”

UCI astronomer wins NASA Habitable Worlds grant to study surface reflectivity of exoplanets

University of California-Irvine, UCI News


Astronomers can learn a lot about planets outside our solar system, including whether they’re capable of supporting life, by studying their surface albedo, or reflectivity. Aomawa Shields, UCI Clare Boothe Luce Assistant Professor of Physics & Astronomy, recently won a grant from the NASA Habitable Worlds program to research this characteristic of planets orbiting relatively dim M-dwarfs, which make up about 70 percent of all the stars in our galaxy. “A multitude of factors go into how light is absorbed or reflected off planetary surfaces and how that interaction impacts climate and habitability,” said Shields, who joined UCI early this year. “My goal with this project is to develop a sophisticated treatment of surface albedo to provide a critical component for accurately modeling M-dwarf planets.”

Now is an exciting time to be in the exoplanet research field, she noted, as a number of powerful observatories – including the James Webb Space Telescope, the Transiting Exoplanet Survey Satellite and several large ground-based telescopes – are set to be deployed. Data from these new instruments will help determine what lies in the atmospheres, and perhaps on the surfaces, of these planets, whether it’s water, ice, salt or land. “While we examine the effects on climate and habitability of the albedos of these various surface types, we will also be incorporating data on the composition and absorption properties of the atmospheres of the exoplanets being discovered by these telescopes,” Shields said. “These are all factors that can play into whether or not a planet can support life.”

Imagining the “open” university: Sharing scholarship to improve research and education

PLOS Biology, Perspective; Erin C. McKiernan


Open scholarship, such as the sharing of articles, code, data, and educational resources, has the potential to improve university research and education as well as increase the impact universities can have beyond their own walls. To support this perspective, I present evidence from case studies, published literature, and personal experiences as a practicing open scholar. I describe some of the challenges inherent to practicing open scholarship and some of the tensions created by incompatibilities between institutional policies and personal practice. To address this, I propose several concrete actions universities could take to support open scholarship and outline ways in which such initiatives could benefit the public as well as institutions. Importantly, I do not think most of these actions would require new funding but rather a redistribution of existing funds and a rewriting of internal policies to better align with university missions of knowledge dissemination and societal impact.

Saguaro and other towering cacti have a scrambled history

University of California-Berkeley, Berkeley News


According to Noah Whiteman, a UC Berkeley associate professor of integrative biology who is a coauthor of a paper appearing this week in the journal Proceedings of the National Academy of Sciences, the cactus family tree and the giant cacti in particular – the giant saguaro, organ pipe, senita and cardón, also called the Mexican giant cactus – have been very difficult to trace. Found only in the Americas, cacti have adapted to a broad range of environments, with a current count of 1,438 species. Yet scientists disagree by a factor of 10 about how many genera of cacti these species represent.

This is in part because the same traits – succulence and a columnar form, for example – seem to have evolved separately in different lineages: what’s known as parallel evolution.

Toward Scalable Social Alt Text: Conversational Crowdsourcing as a Tool for Refining Vision-to-Language Technology for the Blind

Microsoft Research; Elliot Salisbury, Ece Kamar and Meredith Ringel Morris


The access of visually impaired users to imagery in social media is constrained by the availability of suitable alt text. It is unknown how imperfections in emerging tools for automatic caption generation may help or hinder blind users’ understanding of social media posts with embedded imagery. In this paper, we study how crowdsourcing can be used both for evaluating the value provided by existing automated approaches and for enabling workflows that provide scalable and useful alt text to blind users. Using real-time crowdsourcing, we designed experiences that varied the depth of interaction of the crowd in assisting visually impaired users at caption interpretation, and measured trade-offs in effectiveness, scalability, and reusability. We show that the shortcomings of existing AI image captioning systems frequently hinder a user’s understanding of an image they cannot see to a degree that even clarifying conversations with sighted assistants cannot correct. Our detailed analysis of the set of clarifying conversations collected from our studies led to the design of experiences that can effectively assist users in a scalable way without the need for real-time interaction. They also provide lessons and guidelines that human captioners and the designers of future iterations of AI captioning systems can use to improve labeling of social media imagery for blind users.

A tech slam dunk: helping students discover code with The Hidden Genius Project

Justin Steele


Today we’re announcing a $1 million grant to The Hidden Genius Project, an Oakland, CA-based organization that is working to increase the representation of black male youth in tech. By mentoring Black male youth in technology creation, entrepreneurship, and leadership skills, The Hidden Genius Project aims to shift perceptions of computer scientists and inspire the next generation of technologists. These funds will help the organization grow into new cities, train more staff, and expand their work to inspire more young people to pursue careers at the intersection of tech and their passions.

I first met The Hidden Genius Project when they were finalists and then winners in our 2015 Google Impact Challenge. Since our initial $500,000 grant, they’ve reached more than 1,700 Bay Area students through their 15-month intensive CS and entrepreneurship bootcamp program, as well as events and workshops exposing young black men to mentors, basic computer programming and various careers in tech, like sports analytics and video game design.

CMU, Pitt Brain Imaging Science Identifies Individuals With Suicidal Thoughts

Carnegie Mellon University, Dietrich College of Humanities and Social Sciences


Researchers led by Carnegie Mellon University’s Marcel Just and the University of Pittsburgh’s David Brent have developed an innovative and promising approach to identify suicidal individuals by analyzing the alterations in how their brains represent certain concepts, such as death, cruelty and trouble.

Suicidal risk is notoriously difficult to assess and predict, and suicide is the second-leading cause of death among young adults in the United States. Published in Nature Human Behaviour, the study offers a new approach to assessing psychiatric disorders.

“Our latest work is unique insofar as it identifies concept alterations that are associated with suicidal ideation and behavior, using machine-learning algorithms to assess the neural representation of specific concepts related to suicide. This gives us a window into the brain and mind, shedding light on how suicidal individuals think about suicide and emotion related concepts,” said Just, the D.O. Hebb University Professor of Psychology in CMU’s Dietrich College of Humanities and Social Sciences.

Salt Lake City robotics company says goodbye to military jobs to focus on helping workers and ‘saving lives’

The Salt Lake Tribune, Mike Gorrell


Instead of building up soldiers for warfare, Sarcos Robotics is focusing now on improving workplace safety and productivity.

“We have made a commitment, as a team, not to weaponize the robots we make,” Wolff told the writers from PC Magazine, ZDNet, Popular Mechanics, Oil & Gas Engineering, The Robot Report, research firms ABI and IDC, and The Salt Lake Tribune. “Saving lives is what we want to do.”

The PayPal Mafia of Self-Driving Cars Has Been at It a Decade

Bloomberg Technology, Dana Hull


A Pentagon-funded contest spawned many of today’s self-driving startups. A decade later, the competitors are struggling to perfect the technology.

eBay launches visual search tools that let you shop using photos from your phone or web

TechCrunch, Sarah Perez


eBay today is launching two new visual search tools that will allow online shoppers to use photos they snap, have saved on their phone, or even those they find while browsing the web or other social networking sites, in order to find matching products from eBay’s catalog. The tools, Image Search and Find it on eBay, leverage advancements in computer vision and deep learning, including the use of neural networks, the company notes.

Eight Researchers Funded for ‘High-Risk, High-Reward’ Projects

The Harvard Crimson, Ashley M. Cooper and Luke W. Vrotsos


Sherri Rose, an associate professor at Harvard Medical School, received a New Innovator Award for a proposal to analyze ten years of Mass. Medicaid data to better understand socioeconomic and geographic disparities in healthcare.

NIH’s New Innovator Award is specifically reserved for researchers early on in their careers.

Rose said that traditionally, grants fund a single, narrow research proposal, whereas this round of NIH grants “give a lot more flexibility,” allowing a scientist to conduct more multifaceted research. She also compared these grants to venture-capital funding, because they focus on an individual scientist’s potential.

Observing Life near the Ocean’s Surface with Satellites

Eos, Venetia Stuart


Ocean color (OC) satellites provide a global, long-term view of the oceans from space and greatly improve our understanding of ocean processes. These processes include the role of phytoplankton in marine ecosystems and the linkage between phytoplankton primary production, carbon fixation, and climate change. The satellite ocean color radiometry (OCR) data stream also supports a range of research and societal applications, including water quality monitoring; harmful algal bloom detection; management of marine resources, including fisheries and aquaculture; and climate and biogeochemical research.

Ocean color scientists and representatives from various space agencies, hosted by the International Ocean Colour Coordinating Group (IOCCG), met earlier this year to improve collaboration and advance OCR research. The IOCS-2017 meeting aimed to inform participants about cutting-edge research and agency mission plans and to get participants to collectively agree on requirements for sustained OC research and operations.

Carnegie Mellon Dean Of Computer Science On The Future Of AI

Forbes, Peter High


Andrew Moore’s career path at Carnegie Mellon has become emblematic of the way the University fosters its star talent. He became a tenured professor at Carnegie Mellon in 2000. In 2006, Moore joined Google, where he was responsible for building a new engineering office. As a vice president of engineering, Andrew was responsible for Google Shopping, the company’s retail segment. Moore returned to Carnegie Mellon in 2014 as the Dean of the Computer Science department. In that role and given his experience, Moore is among the most influential people in the fields of computer science and artificial intelligence. … In this interview, Moore offers insights into the evolving field of artificial intelligence, the factors likely to determine which companies will win or lose in this space, and what makes Carnegie Mellon specifically, and Pittsburgh more generally, a hot test bed for cutting-edge technology.

Parenting In The Age Of Alexa, Are Artificial Intelligence Devices Safe For Kids?

NPR, Shots blog, Michaeleen Doucleff


Earlier this month, the toy-giant Mattel announced it had pulled the plug on plans to sell an interactive gadget for children.

The device, called Aristotle, looked similar to a baby monitor with a camera. Critics called it creepy.

Powered by artificial intelligence, Aristotle could get to know your child — at least that was how the device was being pitched. [3:42]

AT&T Joins the Open-Source Artificial-Intelligence Arms Race

Wired, Business, Klint Finley


Big technology companies want to make it easier to use artificial intelligence to attack real-world problems. In recent years, companies including Google, Amazon, and Microsoft have released software frameworks designed to help developers build AI-powered applications.

These projects simplify the task, but it’s still a challenge to turn these frameworks into something useful. AT&T is hoping to change that with a new AI platform called Acumos, which it plans to reveal at a Dallas event Monday.

Wolters Kluwer Uses AI to Predict What Will Become U.S. Law

Artificial Lawyer


Global legal tech company and publisher, Wolters Kluwer Legal & Regulatory US, has released an AI-powered predictive analytics capability that can assess a proposed bill’s likelihood of passage by the US Congress, and whether it will become law.

The benefit to lawyers and corporate clients is that they get an advance notice of the chances of impending legislation succeeding and can then begin to make educated guesses as to what to do next. The company said that the public release of the ‘Federal Developments Knowledge Center’ technology follows a successful Beta test earlier in 2017.


Events

Applied Machine Learning Days



Lausanne, Switzerland January 27-30, 2018, at EPFL. [tickets available soon]

Connected Canada 4.0

Datacenter Dynamics


Toronto, Canada December 14 at Metro Toronto Convention Centre, produced by Datacenter Dynamics. “Canada’s most anticipated conference of digital infrastructure transformation” [free, for most]

International Conference on Health Policy Statistics

American Statistical Association


Charleston, SC January 10-12, 2018. [$$$]

Go North 2017

Google Canada


Toronto, Canada November 2. “GO NORTH will bring the brightest minds of Canada’s tech ecosystem together to unlock the potential of artificial intelligence. Through high profile speakers, thought provoking panels, immersive demos and interactive sessions, the day will explore Canada’s early leadership in AI, and how it can be used to accelerate growth across our economy.” [sold out]

Tools & Resources

A Visual Guide to Evolution Strategies | 大トロ


“In this post I explain how evolution strategies (ES) work with the aid of a few visual examples. I try to keep the equations light, and I provide links to original articles if the reader wishes to understand more details.”

Sentiment analysis methods for understanding large-scale texts: a case for using continuum-scored words and word shift graphs

EPJ Data Science; Andrew J Reagan, Christopher M Danforth, Brian Tivnan, Jake Ryland Williams and Peter Sheridan Dodds


“Given the growing assortment of sentiment-measuring instruments, it is imperative to understand which aspects of sentiment dictionaries contribute to both their classification accuracy and their ability to provide richer understanding of texts. Here, we perform detailed, quantitative tests and qualitative assessments of 6 dictionary-based methods applied to 4 different corpora, and briefly examine a further 20 methods. We show that while inappropriate for sentences, dictionary-based methods are generally robust in their classification accuracy for longer texts.” [full text]
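As a rough illustration of what a dictionary-based method with continuum-scored words looks like, the sketch below averages per-word scores over a text and then splits the difference between two texts into per-word contributions, the quantity a word shift graph plots. The lexicon, its scores, and the function names are made up for illustration and are not taken from the paper or from any published dictionary. It uses a simplified form of the word shift calculation.

```python
import re
from collections import Counter

# Hypothetical toy lexicon on a 1 (negative) to 9 (positive) scale.
# The scores are invented for this example.
TOY_LEXICON = {
    "happy": 8.0, "love": 8.5, "good": 7.5,
    "bad": 2.5, "sad": 2.0, "hate": 1.5,
}


def lexicon_counts(text, lexicon=TOY_LEXICON):
    """Count only the words in the text that the lexicon can score."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w in lexicon)


def average_score(text, lexicon=TOY_LEXICON):
    """Frequency-weighted mean score of the scored words, or None if there are none."""
    counts = lexicon_counts(text, lexicon)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(lexicon[w] * n for w, n in counts.items()) / total


def word_shift(reference, comparison, lexicon=TOY_LEXICON):
    """Per-word contributions to average_score(comparison) - average_score(reference)."""
    ref_counts = lexicon_counts(reference, lexicon)
    cmp_counts = lexicon_counts(comparison, lexicon)
    ref_total = sum(ref_counts.values())
    cmp_total = sum(cmp_counts.values())
    if ref_total == 0 or cmp_total == 0:
        raise ValueError("both texts need at least one scored word")
    ref_avg = average_score(reference, lexicon)
    shifts = {}
    for word in set(ref_counts) | set(cmp_counts):
        # Change in the word's relative frequency between the two texts.
        freq_change = cmp_counts[word] / cmp_total - ref_counts[word] / ref_total
        # Weight that change by how far the word sits from the reference average.
        shifts[word] = (lexicon[word] - ref_avg) * freq_change
    return shifts


reference = "Good day, happy people, bad weather."
comparison = "Sad news and bad weather, I hate it."
shifts = word_shift(reference, comparison)
for word, delta in sorted(shifts.items(), key=lambda kv: kv[1]):
    print(f"{word:>6} {delta:+.3f}")
```

The contributions sum to the difference between the two average scores, so ranking them shows which words drive the change. With only three scored words per text the averages are fragile, which fits the abstract's point that these methods suit longer texts better than single sentences.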

Introducing Vectorized UDFs for PySpark

The Databricks Blog, Li Jin


This blog post introduces the Vectorized UDFs feature in the upcoming Apache Spark 2.3 release that substantially improves the performance and usability of user-defined functions (UDFs) in Python.

Over the past few years, Python has become the default language for data scientists. Packages such as pandas, numpy, statsmodels, and scikit-learn have gained great adoption and become the mainstream toolkits. At the same time, Apache Spark has become the de facto standard in processing big data. To enable data scientists to leverage the value of big data, Spark added a Python API in version 0.7, with support for user-defined functions. These user-defined functions operate one-row-at-a-time, and thus suffer from high serialization and invocation overhead. As a result, many data pipelines define UDFs in Java and Scala, and then invoke them from Python.

Vectorized UDFs built on top of Apache Arrow bring you the best of both worlds—the ability to define low-overhead, high performance UDFs entirely in Python.
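To make the overhead argument concrete, here is a toy model in plain Python. It does not use Spark or Arrow, and every name in it is hypothetical. It only counts how many times a user function is invoked when rows are handed over one at a time versus in batches, which is the cost a vectorized UDF is meant to amortize. In Spark itself the batches would be columnar data handed to the Python function, not Python lists.

```python
class CallCounter:
    """Wrap a function and record how many times it is invoked."""

    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, arg):
        self.calls += 1
        return self.func(arg)


def plus_one_row(value):
    """Row-at-a-time style: receives one value per call."""
    return value + 1


def plus_one_batch(values):
    """Vectorized style: receives a whole batch of values per call."""
    return [v + 1 for v in values]


def apply_row_at_a_time(func, rows):
    # One invocation (and, in a real engine, one serialization round trip) per row.
    return [func(row) for row in rows]


def apply_in_batches(func, rows, batch_size):
    # One invocation per batch, so the per-call overhead is shared by batch_size rows.
    out = []
    for start in range(0, len(rows), batch_size):
        out.extend(func(rows[start:start + batch_size]))
    return out


rows = list(range(10))
row_udf = CallCounter(plus_one_row)
batch_udf = CallCounter(plus_one_batch)

row_result = apply_row_at_a_time(row_udf, rows)
batch_result = apply_in_batches(batch_udf, rows, batch_size=4)

print(row_udf.calls, batch_udf.calls)
```

Both paths return the same values. The row-at-a-time path makes one call per row, while the batched path makes one call per batch, so any fixed cost per call shrinks roughly in proportion to the batch size.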


Careers

Tenured and tenure track faculty positions

Biomedical Data Science

Stanford University; Palo Alto, CA

Sustainability Planning and Urban Informatics-Open rank

Northeastern University; Boston, MA

Dean, Faculty of Science

Carleton University; Ottawa, Canada

Postdoctoral Position

ISI Foundation; Torino, Italy

Full-time positions outside academia

Football Analytics Assistant

NFL; New York, NY
