Data Science newsletter – December 20, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for December 20, 2017

 
 
Data Science News



How smart is today’s artificial intelligence?

Vox, Joss Fong


Not only are we a long way from general artificial intelligence, but there’s no clear path to that goal right now. Today’s slick AI applications are powered by a massive amount of custom engineering and data preparation in order to accomplish a single task. These algorithms are less an “artificial intelligence” than they are a powerful tool, or what computer scientist John Launchbury (formerly of DARPA) calls “spreadsheets on steroids.”

Check out the video above to learn about the strengths and limits of today’s AI.


Welcoming the Era of Deep Neuroevolution

Uber Engineering, Kenneth O. Stanley & Jeff Clune


In the field of deep learning, deep neural networks (DNNs) with many layers and millions of connections are now trained routinely through stochastic gradient descent (SGD). Many assume that the ability of SGD to efficiently compute gradients is essential to this capability. However, we are releasing a suite of five papers that support the emerging realization that neuroevolution, where neural networks are optimized through evolutionary algorithms, is also an effective method to train deep neural networks for reinforcement learning (RL) problems. Uber has a multitude of areas where machine learning can improve its operations, and developing a broad range of powerful learning approaches that includes neuroevolution will help us achieve our mission of developing safer and more reliable transportation solutions.
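To make the idea of optimizing a network through an evolutionary algorithm concrete, here is a minimal sketch. It is not the method from the Uber papers: the XOR task, the 2-2-1 network, and every name and parameter below (`evolve`, `sigma`, `elite_count`, and so on) are assumptions chosen for illustration. It uses a mutation-only genetic algorithm with truncation selection and elitism, and no gradients are computed anywhere.

```python
import math
import random

# XOR truth table: a tiny stand-in task (an assumption, not taken from the papers).
XOR_CASES = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_WEIGHTS = 9  # 4 hidden weights + 2 hidden biases + 2 output weights + 1 output bias


def forward(weights, x):
    """Run a 2-2-1 network with tanh hidden units and a sigmoid output."""
    h0 = math.tanh(weights[0] * x[0] + weights[1] * x[1] + weights[2])
    h1 = math.tanh(weights[3] * x[0] + weights[4] * x[1] + weights[5])
    z = weights[6] * h0 + weights[7] * h1 + weights[8]
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def loss(weights):
    """Mean squared error over the four XOR cases (lower is better)."""
    return sum((forward(weights, x) - y) ** 2 for x, y in XOR_CASES) / len(XOR_CASES)


def evolve(generations=100, population_size=50, elite_count=10, sigma=0.3, seed=0):
    """Mutation-only genetic algorithm with truncation selection and elitism."""
    rng = random.Random(seed)
    population = [
        [rng.gauss(0.0, 1.0) for _ in range(N_WEIGHTS)] for _ in range(population_size)
    ]
    history = []  # best loss seen at the start of each generation
    for _ in range(generations):
        population.sort(key=loss)
        history.append(loss(population[0]))
        elites = population[:elite_count]  # survivors are carried over unchanged
        children = [
            [w + rng.gauss(0.0, sigma) for w in rng.choice(elites)]  # Gaussian mutation
            for _ in range(population_size - elite_count)
        ]
        population = elites + children
    population.sort(key=loss)
    history.append(loss(population[0]))
    return population[0], history


if __name__ == "__main__":
    best, history = evolve()
    print("initial best loss:", history[0])
    print("final best loss:  ", history[-1])
```

Because the elites survive each generation unchanged, the best loss can never get worse from one generation to the next. That property is what makes such a simple loop usable as an optimizer.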


UCLA senior delivers digital health monitoring to fight disease in Cameroon

UCLA Newsroom


As Vikash Singh looks forward to 2018, he is also looking forward to seeing his education in action: specifically, how his background in medical research, artificial intelligence and machine learning (along with a $5,000 UCLA Global Citizens Fellowship award and some innovative thinking) may help save lives in Cameroon.

Doctors at the HSPC Polyclinic in Kumba, a city located in the country’s southwest region, will soon begin uploading patient information to a software application designed by Singh and a team of student programmers through Project DataReach, a company Singh launched in 2015 with funding provided by the Stamps Foundation Scholarship program. The student programmers, who attend various universities, include Singh’s roommate Matthew Khanzadeh, a fourth-year computer science major who served as a lead engineer on the Kumba project.

The application’s reporting and data visualization capabilities will help health care professionals at the clinic gain a more complete picture of the health of their community as it pertains to non-communicable diseases, including heart disease, diabetes, obesity, asthma and hypertension. This, in turn, can help them identify disease patterns, mitigate potential outbreaks, develop strategies for education and prevention, and determine who may be at risk for these health problems.


AI insights could help reduce injuries in construction industry

University of Waterloo, Waterloo News


Artificial intelligence (AI) is giving researchers at the University of Waterloo new insights to help reduce wear-and-tear injuries and boost the productivity of skilled construction workers.

Studies using motion sensors and AI software have revealed expert bricklayers use previously unidentified techniques to limit the loads on their joints, knowledge that can now be passed on to apprentices in training programs.

“The people in skilled trades learn or acquire a kind of physical wisdom that they can’t even articulate,” said Carl Haas, a professor of civil and environmental engineering at Waterloo. “It’s pretty amazing and pretty important.”


How The San Diego Zoo Is Using AI And Drones To Save Polar Bears

WildLabs.net


Few things are more important to polar bears than ice, which provides the Arctic carnivores with food, migration routes, and places to bear their young. But over the last four decades, there has been less and less Arctic sea ice available, and the summer period of ice melt has grown longer, making survival more difficult for animals who currently number only about 25,000 around the world.

To track the impact of changes to the Arctic ice, San Diego Zoo Global, the wildlife conservation group that operates the San Diego Zoo and Safari Park, is working with aerospace contractor Northrop Grumman Corp. to develop an autonomous hexacopter drone that will monitor ice formation and polar bear behavior at a level of detail that’s never been available before. This November, the group will bring the drone — largely hacked together from modified off-the-shelf products in Northrop’s internal makers’ space, the FabLab — to the edge of the Arctic Circle to study ice and bears for 10 days in temperatures as low as 30 degrees below zero Fahrenheit. The technology will map sea ice and monitor wildlife at a scale not previously possible, in an environment that would break most high-resolution monitoring equipment, but which is crucial to the survival of Arctic polar bear populations.


Scientists seek diagnostic tool for harmful algal blooms

The Ohio State University, Ohio State News


Harmful algal blooms in rivers and streams are neither well-understood nor easily predicted, and researchers at The Ohio State University are hoping to change that.

With a three-year $681,343 grant from the U.S. Environmental Protection Agency, a team of Ohio State scientists plans to develop a widely applicable system for assessing watershed health and determining when a crisis is looming.


Major Measurement Issues Found in Emergency Department Patient Experience Data

The George Washington University, School of Medicine & Health Sciences


There are major measurement issues in patient experience data collected from U.S. emergency departments, including high variability and limited construct validity, according to an analysis published by researchers at the George Washington University (GW) and US Acute Care Solutions.

Patient experience data is becoming increasingly important in healthcare. The data is incorporated into the U.S. Centers for Medicare and Medicaid Services public reporting and value-based purchasing models for inpatient hospital care and will be used in the implementation of the Medicare Access and CHIP Reauthorization Act, known as MACRA. The data is also used to judge physician performance and hospital performance, often driving managerial decisions such as compensation and employment, and how a hospital is perceived in the community.


University of Montana prepares to launch Big Sky Poll

Missoulian, Keila Szpaller


The University of Montana is launching a statewide Big Sky Poll in order to provide high-quality public opinion data.

The project of the UM Social Science Research Laboratory is offering students an opportunity to do work that’s meaningful outside the classroom, and it’s slated to offer information to the public starting in spring 2018 about Montanans’ views of politicians and key issues such as the economy.

Sara Rinfret, associate professor of political science at UM, said polling data from Montana is notoriously poor, and the Big Sky Poll aims to remedy the situation. The new Montana poll comes out of her experience with a statewide poll in Arizona.


2017 eScience Institute highlights

University of Washington, eScience Institute


It was a productive, full and fantastic year here at the eScience Institute. Students, faculty, staff, researchers and community members participated in dozens of events, workshops, seminars, poster sessions, working groups, networking opportunities, trainings, and office hours sessions to learn more about data science.

But before we ring in the new year, we thought we’d share ten impactful stories from 2017.


Auto-tuning data science: New research streamlines machine learning

MIT News, MIT Laboratory for Information and Decision Systems


The tremendous recent growth of data science — both as a discipline and an application — can be attributed, in part, to its robust problem-solving power: It can predict when credit card transactions are fraudulent, help business owners figure out when to send coupons in order to maximize customer response, or facilitate educational interventions by forecasting when a student is on the cusp of dropping out.

To get to these data-driven solutions, though, data scientists must shepherd their raw data through a complex series of steps, each one requiring many human-driven decisions. The last step in the process, deciding on a modeling technique, is particularly crucial. There are hundreds of techniques to choose from — from neural networks to support vector machines — and selecting the best one can mean millions of dollars of additional revenue, or the difference between spotting a flaw in critical medical devices and missing it.

In a paper called “ATM: A distributed, collaborative, scalable system for automated machine learning,” which was presented last week at the IEEE International Conference on Big Data, researchers from MIT and Michigan State University present a new system that automates the model selection step, even improving on human performance.
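The model selection step described above can be sketched in a few lines. This is not ATM's algorithm or API: it is a plain exhaustive search scored by cross-validation, and the toy dataset, the candidate models, and every function name below are hypothetical stand-ins chosen to keep the example self-contained.

```python
import random
from collections import Counter


def make_toy_data(n=60, seed=0):
    """Hypothetical 1-D dataset: label is 1 when x > 0.5, with some flipped labels."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:  # roughly 10% label noise
            y = 1 - y
        data.append((x, y))
    return data


def majority_model(train):
    """Baseline trainer: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def knn_model(k):
    """Return a trainer for a k-nearest-neighbour classifier on 1-D inputs."""
    def train_fn(train):
        def predict(x):
            nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
            return Counter(y for _, y in nearest).most_common(1)[0][0]
        return predict
    return train_fn


def cross_val_accuracy(trainer, data, folds=5):
    """Average held-out accuracy over `folds` contiguous splits."""
    fold_size = len(data) // folds
    scores = []
    for i in range(folds):
        test = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        predict = trainer(train)
        scores.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(scores) / folds


def select_model(candidates, data):
    """Score every candidate and return (best_name, scores_by_name)."""
    scores = {name: cross_val_accuracy(trainer, data) for name, trainer in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


CANDIDATES = {"majority": majority_model, "knn-1": knn_model(1), "knn-5": knn_model(5)}

if __name__ == "__main__":
    best, scores = select_model(CANDIDATES, make_toy_data())
    print("scores:", scores)
    print("selected:", best)
```

Scoring each candidate on held-out folds rather than on its own training data is the design choice that matters here. Without it, the most flexible model would tend to win regardless of how well it generalizes.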


Accenture predicts the top tech stories of CES 2018

VentureBeat, Dean Takahashi


[Greg] Roberts sees five major stories happening at this year’s show, which starts with press events on January 7.

“The first story is around the expansion and proliferation of artificial intelligence, the second is about 5G and how that enables the next generation of technology such as the Internet of Things, and the third is blockchain as an enabling technology for things like security,” said Roberts. “There will be a shift toward software, and a lot of attention around autonomous vehicles. Those are the stories that will be important.”


Home Economics: How Life in 123 Million American Households Was Exposed Online

UpGuard, Dan O'Sullivan


In another blow to consumer privacy, the UpGuard Cyber Risk Team can now reveal that a cloud-based data repository containing data from Alteryx, a California-based data analytics firm, was left publicly exposed, revealing massive amounts of sensitive personal information for 123 million American households. Exposed within the repository are massive data sets belonging to Alteryx partner Experian, the consumer credit reporting agency, as well as the US Census Bureau, providing data sets from both Experian and the 2010 US Census. While the Census data consists entirely of publicly accessible statistics and information, Experian’s ConsumerView marketing database, a product sold to other enterprises, contains a mix of public details and more sensitive data. Taken together, the exposed data reveals billions of personally identifying details and data points about virtually every American household.

From home addresses and contact information, to mortgage ownership and financial histories, to very specific analysis of purchasing behavior, the exposed data constitutes a remarkably invasive glimpse into the lives of American consumers. While, in the words of Experian, “protecting consumers is our top priority,” the accumulation of this data in “compliance with legal guidelines,” only to then see it left downloadable on the public internet, exposes affected consumers to large-scale misuse of their information – whether through spamming and unwanted direct marketing, organized fraud techniques like “phantom debt collection,” or through the use of personal details for identity theft and security verification.

While many consumers will likely be troubled by the ability of private corporations to legally collect and sell this data, ranging from publicly available information to sensitive financial details, this exposure highlights a number of growing forms of cyber risk with systemic implications.


Where’s the adoption? Shifting the Focus of Data Publishing in 2018

Medium, UC Curation Center, Daniella Lowenberg


At RDA10 in Montreal I gave a presentation on Dash in the Repository Platforms for Research Data IG session. The session was focused on backend technology and technology communities for repository platforms. I talked a bit about the Dash open source software and features but walked away thinking “How productive is it to discuss software systems to support research data at length? Is adoption based on technology?”

The answers are: not productive, and no.

Following RDA10, I spent months talking with as many researchers and institutions as possible to figure out how much researchers know about data publishing and what would incentivize them to make it a common practice.


A new approach for detecting planets in the Alpha Centauri system

Yale University, YaleNews


Yale astronomers have taken a fresh look at the nearby Alpha Centauri star system and found new ways to narrow the search for habitable planets there.

According to a study led by Professor Debra Fischer and graduate student Lily Zhao, there may be small, Earth-like planets in Alpha Centauri that have been overlooked. Meanwhile, the study ruled out the existence of a number of larger planets in the system that had popped up in previous models.

“The universe has told us the most common types of planets are small planets, and our study shows these are exactly the ones that are most likely to be orbiting Alpha Centauri A and B,” said Fischer, a leading expert on exoplanets who has devoted decades of research to the search for an Earth analog.


EPFL teams up with partner organizations to build “digital trust”

EPFL, News


EPFL President Martin Vetterli has today unveiled plans to create the EPFL Center for Digital Trust. Eight institutional and industrial partners are already willing to join. This new research platform will aim to be a center of excellence for IT security and personal data protection in the digital era.


Government Data Science News

Canada may vote to increase federal spending for science by double digits following a stupendous grassroots effort by academics from many disciplines. Go, Canada!

Bloomberg compared central banks across Asia to size up how they were using big data to manage economic policy. As the countries’ data fluency increases, they get more comfortable creating new indicators to inform decision-making.

A first-of-its-kind multispectral imaging spectrometer is aboard a new satellite orbiting Earth that the European Space Agency operates. The satellite, the Sentinel-5P, shows pollution levels in the atmosphere with “unprecedented granularity.”


Nine things I wish I had known the first time I came to NIPS

Medium, Jenn Wortman Vaughan


I’m thrilled to have the honor of kicking off WiML 2017, the 12th Annual Workshop for Women in Machine Learning. I’d like to start by congratulating the organizers who did all the hard work to make this event happen. Let’s give them a round of applause.

These opening remarks often start with a bunch of stats looking at the number of women in different roles within the machine learning community, and examining how these numbers have evolved over the years.

I thought about starting my talk that way, but I couldn’t bring myself to do it.


Andreessen Horowitz Backs Asimov’s Synthetic Biology CAD Software

Xconomy, Jeff Engel


The company uses software and robotics systems to design and test custom-made yeast and other microorganisms engineered to secrete products such as rose-scented oil that goes into perfumes; sweeteners for beverages; and industrial enzymes used in laundry detergent.

Other synthetic bio companies include Bolt Threads, which is engineering yeast that produces spider silk-like fibers for fabric and garments; Impossible Foods, which genetically engineers yeast to make a crucial ingredient in its meatless burger; and cell therapy firm Cell Design Labs, which was acquired this month by Gilead Sciences for up to $567 million.

 
Deadlines



Volunteer Science, Need for Cognition survey

Our researchers are curious about how often you seek out complex #problems. Take our Need for Cognition survey.

Submit a Carpentries Blog Post

“Use this form to upload your text. We will contact you to review it once it has been formatted.”

2018 NYC Gaia Sprint

“If you want to attend the 2018 NYC Gaia Sprint you must apply here. There are no invited participants; this meeting is apply-only.” The deadline for applications is February 5, 2018.
 
Tools & Resources



10 Things Everyone Should Know About Machine Learning

Hacker Noon, Daniel Tunkelang


As someone who often finds himself explaining machine learning to non-experts, I offer the following list as a public service announcement.

  • Machine learning means learning from data; AI is a buzzword. Machine learning lives up to the hype: there are an incredible number of problems that you can solve by providing the right training data to the right learning algorithms. Call it AI if that helps you sell it, but know that AI, at least as used outside of academia, is often a buzzword that can mean whatever people want it to mean.
  • Machine learning is about data and algorithms, but mostly data. There’s a lot of excitement about advances in machine learning algorithms, and particularly about deep learning. But data is the key ingredient that makes machine learning possible. You can have machine learning without sophisticated algorithms, but not without good data.
  • [8 more]


Introducing the ALE 0.6

Marc G. Bellemare


“We’re releasing version 0.6, which provides support for two new features: modes and difficulties. As it turns out, there are more buttons on the Atari 2600 console than the ALE lets you play with.”


Build it, hack it, share it: Lessons from MIT’s Civic Data Design Lab director Sarah Williams

Storybench, Paxtyn Merten


Build it, hack it, share it. This was the message that Sarah Williams, director of MIT’s Civic Data Design Lab and a professor of technology and urban planning, hoped to bring to a crowd of data enthusiasts during the final installment of the Northeastern University Visualization Consortium’s Fall 2017 speaker series.

The three-part phrase is her lab’s motto, which they employ in an effort to harness big data and develop tools to spur policy change. Williams shared how she and her team built, hacked and shared data in projects touching on semi-formal public transportation in Nairobi, Kenya and ghost cities across China.


Continuous Classification using Deep Neural Networks

Nick Strayer


Continuous classification (sometimes called sequence labeling) in the context of statistics and machine learning means training models to observe data over time, much as you would watch a movie, and classify the status of the generating system at any given point. Sometimes seeing the most recent data is all that is needed, but more interesting and challenging problems need the algorithm to make decisions about the current time while leveraging context from previous history.

This report is a brief run through past attempts at continuous classification and a deeper exploration of the current state-of-the-art methods.
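As a rough illustration of labeling every time step while carrying context forward, consider the sketch below. It is not a deep neural network and is not taken from the report: it keeps a single exponentially weighted running state, which plays a role loosely similar to the hidden state of a recurrent model. The function name, the `decay` and `threshold` parameters, and the "active"/"idle" labels are all hypothetical.

```python
def label_stream(readings, decay=0.8, threshold=0.5):
    """Emit one label per time step from an exponentially weighted history.

    `state` summarizes everything seen so far; older readings fade by a
    factor of `decay` at each step, so a label depends on recent context
    rather than on the current reading alone.
    """
    state = 0.0
    labels = []
    for value in readings:
        state = decay * state + (1.0 - decay) * value
        labels.append("active" if state > threshold else "idle")
    return labels


if __name__ == "__main__":
    # A single spike versus a sustained signal: only the latter changes the label.
    print(label_stream([0, 0, 1, 0, 0]))
    print(label_stream([1, 1, 1, 1, 1]))
```

A classifier that looked only at the current reading would label the isolated spike the same way as the sustained run. Carrying state across steps is what lets the two cases be told apart.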


[D] Freelance machine learners, how do you do it?

reddit.com/r/machinelearning


All freelancing folk in the subreddit, let us know:

  • Where do you find jobs?
  • How much do you charge?
  • What work do you get? What all do you deliver?
  • [41 comments]


What the SATs Taught Us about Finding the Perfect Fit

Stitch Fix Technology, Multithreaded blog, Patrick Foley and John McDonnell


“It’s a magical experience to walk into a suit shop, have the professional shopping assistant look you over and without taking a measurement say, ‘you’re probably a 38, let’s try this one,’ and pull out a perfect-fitting jacket. While this sort of experience has been impossible with traditional eCommerce, at Stitch Fix we’re making it a reality.”

     
Careers


Full-time positions outside academia

Data Scientist

The Alan Turing Institute; London, England

Internships and other temporary positions

The Alan Turing Institute Internship Programme 2018

The Alan Turing Institute; London, England

Postdocs

Postdoctoral position available in the Poldrack Lab

Stanford University, Department of Psychology; Palo Alto, CA
