Data Science newsletter – October 13, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for October 13, 2018

Data Science News



Waymo’s cars drive 10 million miles a day in a perilous virtual world

MIT Technology Review, Will Knight


You could argue that Waymo, the self-driving subsidiary of Alphabet, has the safest autonomous cars around. It’s certainly covered the most miles. But in recent years, serious accidents involving early systems from Uber and Tesla have eroded public trust in the nascent technology. To win it back, putting in the miles on real roads just isn’t enough.

So today Waymo not only announced that its vehicles have clocked more than 10 million miles since 2009. It also revealed that its software now drives the same distance inside a sprawling simulated version of the real world every 24 hours—the equivalent of 25,000 cars driving 24/7. Waymo has covered more than 6 billion virtual miles in total.
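The fleet equivalence quoted above can be checked with simple arithmetic. The sketch below is illustrative only: the function and variable names are assumptions, and the only inputs are the two figures from the article (10 million simulated miles per day and 25,000 cars driving around the clock).

```python
# Figures taken from the article excerpt.
SIMULATED_MILES_PER_DAY = 10_000_000
EQUIVALENT_FLEET_SIZE = 25_000
HOURS_PER_DAY = 24


def per_car_daily_miles(total_miles: float, cars: int) -> float:
    """Miles each car would need to cover per day to reach the total."""
    return total_miles / cars


def implied_average_speed_mph(total_miles: float, cars: int,
                              hours: float = HOURS_PER_DAY) -> float:
    """Average speed implied if every car drives for the full number of hours."""
    return per_car_daily_miles(total_miles, cars) / hours


miles_each = per_car_daily_miles(SIMULATED_MILES_PER_DAY, EQUIVALENT_FLEET_SIZE)
speed = implied_average_speed_mph(SIMULATED_MILES_PER_DAY, EQUIVALENT_FLEET_SIZE)

print(f"{miles_each:.0f} miles per car per day")
print(f"{speed:.2f} mph implied average speed")
```

Under these assumptions each car covers 400 miles a day, an implied average of under 17 mph. The article does not say how the 25,000-car figure was derived, so this is a consistency check rather than a reconstruction of Waymo's calculation.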

This virtual test track is incredibly important to Waymo’s efforts to demonstrate that its cars are safe, says Dmitri Dolgov, the firm’s CTO.


Prescience: Helping doctors predict the future

University of Washington, UW News


During surgery, anesthesiologists monitor and manage patients to make sure they are safe and breathing well. But these doctors can’t always predict when complications will arise.

Now researchers at the University of Washington have developed a new machine-learning system, called Prescience, which uses input from patient charts and standard operating room sensors to predict the likelihood that a patient will develop hypoxemia, a condition in which blood oxygen levels dip slightly below normal. Hypoxemia can lead to serious consequences, such as infections and abnormal heart behavior.

Prescience also provides real-world explanations behind its predictions.


Researchers look beyond BMI to predict obesity-related disease risk

Scripps Research Institute


Scientists at Scripps Research and collaborating corporate and academic partners have found a new way to use distinct molecular “signatures” from people with obesity to predict risk of developing diabetes and cardiovascular disease, an advance that could broaden the way doctors and scientists think about diagnosing and treating disease.

The research, led by Amalio Telenti, MD, PhD, professor of genomics at Scripps Research and previously a scientific leader at Human Longevity Inc. (HLI), shows that predictors of future diabetes and cardiovascular disease for a person with obesity can be found among their body’s metabolites, molecules that all of us produce as we live, breathe and eat.

Using cutting edge technologies, the scientists were able to assess the relationship between disease risk and the “metabolome,” a person’s collection of hundreds of metabolites, identifying specific signatures that predicted higher risk, according to results published in Cell Metabolism.


US military wants AI to discover new molecules for it

Chemistry World, Gege Lee


The Defence Advanced Research Projects Agency (Darpa) in the US has launched a new programme to develop new high-performance molecules for military purposes using artificial intelligence.

Accelerated Molecule Discovery (AMD) will increase the pace of discovery and optimisation of molecules, such as chemical warfare simulants and medicines, from years to mere months or weeks, according to programme manager Anne Fischer of Darpa’s Defense Sciences Office. Darpa says its AMD programme – announced last week – is necessary to ‘counter emerging threats’.


Guest Post: Galin Jones on criteria for promotion and tenure in (bio)statistics departments

Simply Statistics blog, Roger Peng and Galin Jones


I attended an ASA Chair’s meeting and spoke about ways we could support junior faculty in data science. After giving my talk, Galin Jones, Professor and Director of Statistics at University of Minnesota, and I had an interesting conversation about how they had changed their promotion criteria in response to a faculty candidate being unique. I asked him to write about his experience and he kindly contributed the following post.


Can the Wi-Fi chip in your phone help feed the world?

Bill Gates, Gates Notes blog


Here’s how FarmBeats works: The whole system is powered by solar panels. You place a small number of sensors—one every couple hundred meters, instead of 10 meters—in the ground. You then attach your smart phone with the camera facing down to either a drone (if you have money to spare) or a helium balloon (if you don’t).

You walk around the fields with the camera, creating an aerial map of the farm. Data from both the sensors and the phone are transmitted via TV white space to your computer, where an edge device stitches everything together into a data map.

The data generated by FarmBeats has been a game changer for Sean and Dancing Crow Farm. He can use up to 30 percent less water for irrigation and 44 percent less lime to control soil pH.


How Penn Medicine is redesigning its EHR for a new era of care

Healthcare IT News, Mike Miliard


No longer just billing systems, electronic health records have become indispensable to quality healthcare. With help from its in-house behavioral design team, Penn Medicine aims to revamp its IT systems for a fast-evolving future.


Opinion | Internet Hacking Is About to Get Much Worse

The New York Times, Opinion, Bruce Schneier


It’s no secret that computers are insecure. Stories like the recent Facebook hack, the Equifax hack and the hacking of government agencies are remarkable for how unremarkable they really are. They might make headlines for a few days, but they’re just the newsworthy tip of a very large iceberg.

The risks are about to get worse, because computers are being embedded into physical devices and will affect lives, not just our data. Security is not a problem the market will solve. The government needs to step in and regulate this increasingly dangerous space.


The next big restaurant chain may not own any kitchens

TechCrunch, Jonathan Shieber


Investors are pouring millions into the creation of a network of shared kitchens, storage facilities, and pickup counters that established chains and new food entrepreneurs can access to cut down on overhead and quickly spin up new concepts in fast food and casual dining.

Powering all of this is a food delivery market that could grow from $35 billion to a $365 billion industry by 2030, according to a report from UBS’s research group, the “Evidence Lab”.

“We’ve had conversations with the biggest and fastest growing restaurant brands in the country and even some of the casual brands,” said Jim Collins, a serial entrepreneur, restaurateur, and the chief executive of the food-service startup Kitchen United. “In every board room for every major restaurant brand in the country… the number one conversation surrounds the topic of how are we going to address [off-premise diners].”


Redistricting as an NP-Hard Problem

Medium, NYU Center for Data Science


Shrouded in the kind of ennui-inducing terminology only politicians can dream up, gerrymandering is the process of redrawing district boundaries with an eye toward political gain. The hand redrawing district lines holds immense power in deciding the outcomes of future elections. Gerrymandering is central to the game of political scheming because it heavily influences election results at the state level. The recent Supreme Court case, Gill v. Whitford, placed the traditional method of redrawing district lines for political gain under fire. Though the Supreme Court has not yet put forth a ruling regarding fair, bipartisan redistricting, research has capitalized on the unanswered but implicit question, “What is an appropriate metric to detect partisan gerrymandering?” In their recent publication, Moore-Sloan Data Science Fellow Soledad Villar, Richard Kueng of the Institute for Quantum Information and Matter at the California Institute of Technology, and Dustin G. Mixon of the Ohio State University’s Department of Mathematics sought to explore this question by determining whether it is computationally feasible to find ways to redistrict fairly and legally.


The DNA technique that caught the Golden State Killer is more powerful than we thought

The Verge, Russell Brandom


This April, police solved a decades-old mystery — the identity of the Golden State Killer — with a previously unused DNA technique. Searching for a sample match in existing databases turned up nothing, but a search through a public DNA database located 10 to 15 possible distant relatives, which let police narrow down a suspect list and ultimately gave them the lead they needed.

It was a new technique at the time, but after the high-profile success, that technique has proved to be one of the most powerful new tools in forensics. In the months since, groups like Parabon NanoLabs and the DNA Doe Project have identified at least 19 different cold case samples through this method, called familial DNA testing of public databases, providing crucial new leads for previously unsolvable cases.

Now, a pair of new discoveries could make that technique even more powerful. A paper published today in the journal Science finds that the same technique could span much further than contemporary labs realize, covering nearly the entire population from a relatively small base of samples. At the same time, researchers publishing in Cell have devised a way to extrapolate from incomplete samples, building out a broader picture of the genome than was originally tested. Taken together, those techniques would allow researchers to identify nearly anyone using only existing samples, a frighteningly powerful new tool for DNA forensics.


Alastair Mactaggart takes on Big Tech over privacy law

Fast Company, Marcus Baram


Alastair Mactaggart is out of breath. After rushing to meetings and testifying on Capitol Hill on Wednesday, the words are spilling out of him. The millionaire real estate developer from San Francisco, who spearheaded California’s new consumer privacy law, is in Washington, D.C., to join other privacy advocates who’ve been invited by lawmakers to describe what they want from a potential new federal privacy law. The issue has gained traction on both sides of the aisle in the wake of numerous high-profile privacy scandals, from Facebook user data being shared with Cambridge Analytica, to Google’s recent admission that it deliberately didn’t tell users about a privacy bug involving Google+.

But privacy advocates like Mactaggart are concerned that the bill he championed, which gives consumers sweeping control over their own data, and which took many months to become law in the face of opposition from Silicon Valley lobbyists, could be preempted by a weaker federal law.


New data science, computational biology departments span colleges

Cornell University, Cornell Chronicle


The university is launching two new multicollege departments – one in statistics and data science, and one in computational biology – to meet evolving research needs, encourage collaboration, and improve the quality of teaching and learning in these increasingly essential fields.

The plans follow a recent white paper from faculty in the existing Department of Biological Statistics and Computational Biology, which recommended splitting into two departments and expanding college participation to better align faculty interests and allow for critical departmental growth. The faculty plan calls for the restructuring of two existing departments – Statistical Science in Computing and Information Science (CIS) and Biological Statistics and Computational Biology in the College of Agriculture and Life Sciences (CALS) – both of which will become multicollege departments.


EPFL opens new research center for educational science

EPFL, News


EPFL has created a unique ecosystem of services, research labs and entrepreneurial programs in the field of education. And today it is adding another string to its bow with LEARN, a new center that will pool the efforts of the different stakeholders involved in educational research and in developing new teaching tools. “To keep pace with the remarkable advancements in digital technology, we need to update both what we teach and how we teach,” says Pierre Vandergheynst, EPFL’s Vice President for Education. “LEARN’s primary goal will therefore be to conduct translational research in educational science. That is, to try out new teaching methods, demonstrate their impact and turn them into new teaching practices for our campus and, where desirable, for the educational system as a whole.”

LEARN will aim to stimulate research and create new synergies in education – a field that has ramifications for our entire society. The center will be headed by Francesco Mondada, a professor at EPFL’s Robotic Systems Laboratory and the engineer behind the Thymio teaching robot. LEARN members include the Swiss EdTech Collider, a group of around 70 high-tech startups, and the Center for Digital Education, which recently unveiled the Learning Companion application that helps students “learn how to learn.”


Facebook’s head of health research wants doctors to have even more personal patient data

CNBC, Angelica LaVito


Facebook’s head of health research wants doctors to have even more personal data on patients to better predict illness and treat them: insight into their social life.

While researchers have found evidence that a person’s social life impacts their health more than almost any other major risk factor, the data researchers have when looking at large swaths of the U.S. population is scant on details beyond general demographics such as race, age and income, Dr. Freddy Abnousi, Facebook’s head of health research, said at a conference Wednesday.

“This research comes with limitations,” he said at the Manova Summit, a health-care conference in Minneapolis, Minnesota. “The problem is in the characteristics of the data itself.”

 
Events



Open Source Inclusion Sprint – Beginners Welcome!

NumFOCUS


New York, NY October 26, starting at 8:30 a.m., Two Sigma (101 6th Ave). “Join us at the Two Sigma offices in lovely SoHo NYC for a day of fun and learning. Core contributors from the JupyterLab project will be on-site to get you set up and help troubleshoot your code.” [$]


For the Public Good? Values and Accountability in AI and Data Science

Philosophy of Science Association


Seattle, WA November 1, starting at 7:30 p.m., Seattle Public Library (1000 4th Avenue). [free]

 
Tools & Resources



Using the Crossref REST API. Part 12 (with Europe PMC)

Crossref blog, Christine Cormack Wood and Michael Parkin


As part of our blog series highlighting some of the tools and services that use our API, we asked Michael Parkin—Data Scientist at the European Bioinformatics Institute—a few questions about how Europe PMC uses our metadata where preprints are concerned.


Longitudinal Employer-Household Dynamics

U.S. Census Bureau


Post-Secondary Employment Outcomes (PSEO) are experimental tabulations developed by the Longitudinal Employer-Household Dynamics (LEHD) program in collaboration with post-secondary institutions and state agencies. PSEO data provide earnings and employment outcomes for college and university graduates by degree level, degree major, and post-secondary institution. The current PSEO is released as a research data product in “beta” form.

The PSEO provide data on earnings and employment for recent graduates of partner colleges and universities. Earnings are available at the 25th, 50th, and 75th percentiles, one, five, and 10 years after graduation, by institution, degree level, degree field, and graduation cohort. A second set of tabulations, currently under development, will provide industry and location of employment for graduates. These statistics are generated by matching university transcript data with a national database of jobs, using state-of-the-art confidentiality protection mechanisms to protect the underlying data.
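To make the shape of these tabulations concrete, here is a minimal sketch of computing 25th, 50th, and 75th percentile earnings per institution and degree level. Everything in it is an assumption for illustration: the records are synthetic, the function name is hypothetical, and it does not model the confidentiality protections or the transcript-to-jobs matching that the Census Bureau describes.

```python
from collections import defaultdict
from statistics import quantiles

# Synthetic records invented for illustration:
# (institution, degree_level, annual_earnings_in_dollars)
records = [
    ("Univ A", "Bachelors", 30000), ("Univ A", "Bachelors", 35000),
    ("Univ A", "Bachelors", 40000), ("Univ A", "Bachelors", 45000),
    ("Univ A", "Bachelors", 50000), ("Univ A", "Bachelors", 55000),
    ("Univ A", "Bachelors", 60000),
    ("Univ B", "Masters", 50000), ("Univ B", "Masters", 60000),
    ("Univ B", "Masters", 70000),
]


def earnings_percentiles(rows):
    """Group earnings by (institution, degree level) and return quartile cut points."""
    groups = defaultdict(list)
    for institution, degree_level, earnings in rows:
        groups[(institution, degree_level)].append(earnings)

    table = {}
    for key, values in groups.items():
        if len(values) < 2:
            # statistics.quantiles needs at least two data points.
            continue
        p25, p50, p75 = quantiles(values, n=4)  # three quartile cut points
        table[key] = {"p25": p25, "p50": p50, "p75": p75}
    return table


table = earnings_percentiles(records)
for (institution, degree_level), stats in sorted(table.items()):
    print(institution, degree_level, stats)
```

The published tabulations also break results out by degree field, graduation cohort, and years since graduation; adding those dimensions to this sketch would only mean extending the grouping key.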

 
Careers


Tenured and tenure track faculty positions

Assistant Professor in Political Science



Vanderbilt University, Department of Political Science; Nashville, TN

Full-time, non-tenured academic positions

Data Visualization and Presentation Specialist



Columbia University, Mailman School of Public Health; New York, NY
