Data Science newsletter – July 30, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for July 30, 2018

 
 
Data Science News



AI Object Recognition System Operates at Speed of Light

The Scientist Magazine®, Anna Azvolinsky


If you want an extremely fast image- or object-recognition system to detect moving items like a missile or cars on the road, a digital camera hooked up to a computer just won’t do, according to electrical engineer Aydogan Ozcan of the University of California, Los Angeles. So, using machine learning, optics tools, and 3-D printing, he and his colleagues have created a system that is more rapid, operates using light and, unlike computers, does not require a power source other than the initial light source and a simple detector. Their results are published today (July 26) in Science.

“This is a very innovative approach to construct a physical artificial neural network made of stacked layers of optical elements,” Demetri Psaltis, a professor of optics and electrical engineering at the École Polytechnique Fédérale de Lausanne in Switzerland, writes in an email to The Scientist.

What is novel here is not the deep-learning part, but the optical engineering and the ability to “make a cast” of the artificial neural network using 3-D printing, notes Olexa Bilaniuk, a graduate student in Roland Memisevic and Yoshua Bengio’s groups at the University of Montreal who studies machine learning and artificial neural networks. “Previous work to create such an optical network had either been theoretical, or had built much simpler and smaller systems,” he adds.


Russian Hackers Appear to Shift Focus to U.S. Power Grid

The New York Times, David E. Sanger


State-sponsored Russian hackers appear far more interested this year in demonstrating that they can disrupt the American electric utility grid than the midterm elections, according to United States intelligence officials and technology company executives.

Despite attempts to infiltrate the online accounts of two Senate Democrats up for re-election, intelligence officials said they have seen little activity by Russian military hackers aimed at either major American political figures or state voter registration systems.

By comparison, according to intelligence officials and executives of the companies that oversee the world’s computer networks, there is surprisingly far more effort directed at implanting malware in the electrical grid.


Penske Media Makes Strategic Investment in BuzzAngle Music

Variety


Penske Media Corporation (PMC) is making a strategic investment in BuzzAngle Music, the company announced today. Founded in 2013, and launched in 2016 by Border City Media founder Jim Lidestri, BuzzAngle provides a state-of-the-art music analytics service that has revolutionized the way businesses analyze music consumption and related trends. The system incorporates the daily sales, streaming and airplay activity of albums, songs and artists.

Penske Media plans to leverage some of its key brands, chief among them Rolling Stone, Variety, Deadline and IndieWire, to deepen the influence and expand exposure of the BuzzAngle charts in addition to creating custom charts for specific content verticals. The investment will also provide BuzzAngle a stronger financial foundation from which to grow.


Halfway Through 2018, Streaming’s Continued Growth Defies Mathematical Trends

Billboard, Ed Christman


In the first half of 2018, overall on-demand streaming increased 41.7 percent to reach 403.5 billion U.S. streams, according to Nielsen Music. That growth defies mathematical trends, which dictate that, as a base enlarges, it becomes harder to achieve a bigger percentage growth than in preceding time periods.

That increase is larger than the 36.3 percent bump in 2017 over the prior year’s 208.9 billion streams at the six-month mark. Also at midyear 2018, total U.S. album consumption units (album sales + track-equivalent albums + total stream-equivalent albums) reached 360.2 million, an 18.4 percent increase over 2017 that follows another: a 7.8 percent gain in album consumption units counted in the first half of 2016.
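The arithmetic behind these figures can be checked with a short sketch. It assumes both growth rates apply to first-half stream counts, starting from the 208.9 billion base quoted above; the function and variable names are illustrative and do not come from Billboard or Nielsen Music.

```python
def grow(base, pct):
    """Return base increased by pct percent."""
    return base * (1 + pct / 100)


h1_2016 = 208.9                 # billions of streams, first half of 2016 (quoted above)
h1_2017 = grow(h1_2016, 36.3)   # 36.3 percent bump reported for 2017
h1_2018 = grow(h1_2017, 41.7)   # 41.7 percent bump reported for 2018

# Absolute streams added in each year, to compare against the percentage growth.
added_2017 = h1_2017 - h1_2016
added_2018 = h1_2018 - h1_2017

print(round(h1_2017, 1), round(h1_2018, 1))        # 284.7 403.5
print(round(added_2017, 1), round(added_2018, 1))  # 75.8 118.7
```

Under these assumptions the two percentages compound to the reported 403.5 billion, and the larger percentage was achieved on a larger base, which is the point the article makes about the trend.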


The ethics of computer science: this researcher has a controversial proposal

Nature, Elizabeth Gibney


Nature talks to Brent Hecht, who says peer reviewers must ensure that researchers consider negative societal consequences of their work.


Development and Application of a Machine Learning Approach to Assess Short-term Mortality Risk Among Patients With Cancer Starting Chemotherapy

JAMA, The JAMA Network, Original Investigation; Aymen A. Elfiky, Maximilian J. Pany, Ravi B. Parikh et al


Question Can a machine learning algorithm applied to electronic health record data predict patients’ short-term risk of death at the time that they begin chemotherapy?

Findings In this cohort study of 26 946 patients with cancer starting 51 774 discrete chemotherapy regimens, those at high risk of 30-day mortality were accurately identified across palliative and curative chemotherapy regimens and many types and stages of cancer. The algorithm was more accurate than predictions based on randomized clinical trials or population-based registry data.

Meaning A machine learning algorithm accurately identified individuals at high risk of short-term mortality and may help to guide patient and physician decisions about chemotherapy initiation and advance care planning.


Is Code Free Speech?

PBS, NOVA Next, Allison Eck


On August 1, a chunk of code—an essay of 0s and 1s written in the language of computers—will become available to the public.

It might sound innocuous. Maybe even poetic. But download this code and feed it to a 3D printer, and you’ll be able to produce your very own deadly weapon. The information, soon available due to a landmark legal settlement that the United States Department of Justice made with plaintiff Cody Wilson two months ago, contains the computational language necessary to construct 3D-printed .38-caliber handguns and AR-15 assault rifles, among other lethal devices. Such weapons are untraceable; they don’t come with serial numbers, and no background check is required to make them.

Wilson, who founded the Texas-based nonprofit Defense Distributed, had filed a lawsuit against the U.S. government in 2015 on the grounds that, by prohibiting Wilson from posting his data, the State Department was violating not only his right to bear arms, but his right to free speech.


Congress Has a $95 Million Proposal to Study Tech’s Effect on Kids

WIRED, Science, Robbie Gonzalez


Like a lot of people, you probably spend a fair bit of time worrying about how much time you spend on your phone. Who doesn’t these days? But what really concerns you is the youth. What is all that swiping and snapping and gramming doing to their still-developing brains? Surely somebody’s studied this—the effect of all this screen time. So what have they found?

Well, to be honest: nothing conclusive. At least not yet.

On Thursday, Colorado senator Michael Bennet introduced legislation that would give the National Institutes of Health $95 million to investigate technology’s impact on infants, children, and adolescents. Called the Children and Media Research Advancement Act, or CAMRA for short, the bill would see that money distributed over the next five years to researchers studying how things like mobile devices, social media, and virtual reality affect the way kids think, grow, and socialize.


For Sale: Survey Data on Millions of High School Students

The New York Times, Natasha Singer


Three thousand high school students from across the United States recently trekked to a university sports arena here to attend an event with an impressive-sounding name: the Congress of Future Science and Technology Leaders. Many of their parents had spent $985 on tuition.

Months earlier, the teenagers had received letters, signed by a Nobel Prize-winning physicist, congratulating them on being nominated for “a highly selective national program honoring academically superior high school students.”

The students all had good grades. But many of them were selected for the event because they had once filled out surveys that they believed would help them learn about colleges and college scholarships.

Through their schools, many students in the audience had taken a college-planning questionnaire, called MyCollegeOptions. Others had taken surveys that came with the SAT or the PSAT, tests administered by the College Board. In filling out those surveys, the teenagers ended up signing away personal details that were later sold and shared with the future scientists event.


Toyota will have Level 4 self-driving cars at 2020 Tokyo Olympics

Motor Authority, Viknesh Vijayenthiran


Toyota will put on a technological show for the world when all eyes are on Tokyo for the 2020 Olympic Games.

This will include the demonstration of a Level 4 self-driving car to highlight Toyota’s goal of providing personal transport for everyone, including those that are too old to drive or suffer from poor vision.

Toyota says its self-driving cars will be in operation in defined areas located at the Tokyo Water Front City and Haneda areas of Tokyo. These will be fully self-driving cars requiring no intervention by a human driver.


Extremely detailed election map

FlowingData, The New York Times


The Upshot returns to 2016 election results mapped at the precinct level. Because you know, we all want to experience the data as many times as we can before 2020.

There’s an interesting twist though. You can randomly view “one-sided places”, where the area voted mostly the same, and “voter islands”, where the area is surrounded by opposite-voting precincts. These are made more compelling by the granular data and ease of interaction, largely made possible by Mapbox.


Company Data Science News

DeepMind is giving Cambridge University an undisclosed sum to hire the “DeepMind Chair of Machine Learning.” They’re throwing in funding for four master’s students and already fund several PhDs. The new chair is expected to work on facial recognition, a technology that requires some serious ethical oversight. Improving accuracy is not the only ethical hurdle, but inaccuracy certainly doesn’t lead to good outcomes. DeepMind and Cambridge leadership both seem pleased with the arrangement.



Apple is the world’s first trillion dollar company. How long until it buys Spotify? (See DVotW.)



Google Translate is turning language barriers into language speed bumps. They cranked through 143 billion words last year. My family uses this functionality to bridge our language barriers – occasionally with hilarious inconsistencies which I consider a feature, not a bug.

UPS has been using algorithms to predict truck routes, cutting out 100 million miles traveled per year. Now the company is expanding its use of predictive analytics to better match package volume to trucks and routes, which should further improve its logistical efficiency.



Toyota expects to have Level 4 self-driving cars at the 2020 Olympics in Tokyo. Level 4 self-drivers require no intervention from a human. They also have mini-trams called e-Pallets that will shuttle athletes around in groups. This car-sharing technology could inadvertently replace the rampant use of dating apps by athletes.



Uber has shuttered its self-driving truck division and decided to focus on its self-driving car division.



In a delightfully alliterative partnership, Waymo has partnered with Walmart to use self-driving cars to deliver groceries in the Arizona heat. The Waymo self-driving experiment is mostly being used for commuting to work and school, with trips to restaurants and bars coming in a close second. Unsurprisingly, trips to car repair shops clock in at number five.



Amazon has promised unwavering commitment to the US government and to its AI contracts with it. When asked if the company has drawn any “red lines” that it will not cross to satisfy a customer, an Amazon executive implied the company hadn’t considered any. However, the ACLU lambasted the company’s Rekognition facial recognition technology after it falsely matched 28 of the 535 current members of the U.S. Congress with mugshots of people who had been arrested. The false matches disproportionately involved darker skinned people.

Facebook discovered a social hacking campaign that was using the platform in an attempt to influence the outcome of the 2018 US midterm elections. The company deleted 32 false pages and profiles that had lured 290,000 people with ads. This is a relatively small ad campaign, considering it ran for over a year from March 2017 to May 2018. It is linked to Russian operatives whose tactics are getting more sophisticated and difficult to detect.


The Democratization of Data Science

Harvard Business Review, Jonathan Cornelissen


Relegating all data knowledge to a handful of people within a company is problematic on many levels. Data scientists find it frustrating because it’s hard for them to communicate their findings to colleagues who lack basic data literacy. Business stakeholders are unhappy because data requests take too long to fulfill and often fail to answer the original questions. In some cases, that’s because the questioner failed to explain the question properly to the data scientist.

Why would non–data scientists need to learn data science? That’s like asking why non-accountants should be expected to stay within budget.

These days every industry is drenched in data, and the organizations that succeed are those that most quickly make sense of their data in order to adapt to what’s coming. The best way to enable fast discovery and deeper insights is to disperse data science expertise across an organization.

Companies that want to compete in the age of data need to do three things: share data tools, spread data skills, and spread data responsibility.


Computer creativity: When AI turns its gaze to art

CBS News, Sunday Morning, David Pogue


Douglas Eck works at Google. He created the Magenta project, dedicated to using artificial intelligence, or AI, to make art and music.

Eck is using a form of AI called Machine Learning. His software analyzed thousands of sketches done by people; and now, it knows how to complete a drawing itself.

“Why don’t you start drawing some rain, and it’ll complete your thought for you,” Eck asked Pogue. “And there it goes. And it’s making it rain for you. The idea [is] that this model has figured out that once it sees a cloud, we’re gonna add rain.”

“And if I made little, teeny drops?”

“It’s gonna match the size of your drops.” And it did.


Tweet of the Week

Twitter, Pinboard


 
Events



TechCrunch Include Office Hours

TechCrunch


San Francisco, CA. “TechCrunch will partner with Greylock for Include Office Hours on September 4th from 2-4pm.”

 
Deadlines



Leveraging Digital Health Data for High Throughput Phenotyping @ PSB 2019

The Big Island, HI. Workshop is part of the Pacific Symposium on Biocomputing, January 3-7. Deadline for submissions is August 6.
 
Tools & Resources



The good, the bad, and the biased: Five ways visualizations can mislead (and how to fix them)

ACM Interactions magazine; Danielle Szafir


“This article reviews common visualization practices that may inhibit effective analysis, why these designs are problematic, and how to avoid them. The discussion illustrates a need to better understand how visualizations can support flexible and accurate data analysis while mitigating potential sources of bias.”


[1807.07706] Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model

arXiv, Computer Science > Machine Learning; Atilim Gunes Baydin, Lukas Heinrich, Wahid Bhimji, Bradley Gram-Hansen, Gilles Louppe, Lei Shao, Prabhat, Kyle Cranmer, Frank Wood


We present a novel framework that enables efficient probabilistic inference in large-scale scientific models by allowing the execution of existing domain-specific simulators as probabilistic programs, resulting in highly interpretable posterior inference. Our framework is general purpose and scalable, and is based on a cross-platform probabilistic execution protocol through which an inference engine can control simulators in a language-agnostic way. We demonstrate the technique in particle physics, on a scientifically accurate simulation of the tau lepton decay, which is a key ingredient in establishing the properties of the Higgs boson. High-energy physics has a rich set of simulators based on quantum field theory and the interaction of particles in matter. We show how to use probabilistic programming to perform Bayesian inference in these existing simulator codebases directly, in particular conditioning on observable outputs from a simulated particle detector to directly produce an interpretable posterior distribution over decay pathways. Inference efficiency is achieved via inference compilation where a deep recurrent neural network is trained to parameterize proposal distributions and control the stochastic simulator in a sequential importance sampling scheme, at a fraction of the computational cost of Markov chain Monte Carlo sampling.
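For readers unfamiliar with importance sampling, the idea of weighting draws from a proposal distribution can be sketched on a toy problem. This is not the paper's framework: it is plain self-normalized importance sampling in a single step, with a hand-picked Gaussian proposal standing in for the trained neural network, and a made-up two-line "simulator" (latent z drawn from N(0, 1), observation drawn from N(z, 0.5)). All names and numbers are assumptions chosen for illustration.

```python
import math
import random


def normal_logpdf(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def posterior_mean(x_obs, q_mean, q_std, n=50_000, seed=0):
    """Estimate E[z | x_obs] by self-normalized importance sampling.

    Toy model (an assumption, not from the paper):
        z ~ N(0, 1), x | z ~ N(z, 0.5)
    Proposal (hand-picked here; learned by a network in the paper):
        q(z) = N(q_mean, q_std)
    """
    rng = random.Random(seed)
    samples, log_weights = [], []
    for _ in range(n):
        z = rng.gauss(q_mean, q_std)
        log_w = (
            normal_logpdf(z, 0.0, 1.0)          # prior p(z)
            + normal_logpdf(x_obs, z, 0.5)      # likelihood p(x_obs | z)
            - normal_logpdf(z, q_mean, q_std)   # proposal q(z)
        )
        samples.append(z)
        log_weights.append(log_w)

    # Subtract the max log-weight before exponentiating for numerical stability.
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return sum(w * z for w, z in zip(weights, samples)) / total


estimate = posterior_mean(x_obs=1.0, q_mean=0.8, q_std=0.6)
print(estimate)
```

For this toy model the exact posterior mean is 0.8 times the observation, so the estimate for `x_obs=1.0` should land close to 0.8. A proposal that sits near the posterior, as this one does, keeps the weights well behaved; the paper's contribution is learning such proposals for real simulators.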


How to use rquery with Apache Spark on Databricks

Win-Vector, Databricks, Nina Zumel and John Mount


In this blog, we will introduce rquery, a powerful query tool that allows R users to implement powerful data transformations using Apache Spark on Databricks. rquery is based on Edgar F. Codd’s relational algebra, informed by our experiences using SQL and R packages such as dplyr at big data scale.


Getting the Most Out of Your GPU Cluster for Deep Learning: Part I

Determined AI, Jennifer Villa and Jonathan Ben-tzur


To maximize the value of your deep learning hardware, you’ll need to invest in software infrastructure. Setting up a cluster manager is an essential first step in this process, but it’s not the end of the story.

 
Careers


Full-time positions outside academia

Senior Software Engineer



Scroll; New York, NY, or Portland, OR
