Data Science newsletter – April 25, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for April 25, 2018


 
 
Data Science News



Facebook and the perils of personalized choice architecture

TechCrunch, Yafit Lev-Aretz



The Facebook-Cambridge Analytica mess, together with the many indications that preceded it, heralds a new type of choice architecture: personalized, uniquely tailored to your own individual preferences and optimized to influence your decision.

We are no longer in the familiar zone of choice architecture that equally applies to all. It is no longer about general weaknesses in human cognition. It is also not about biases that are endemic to human inferences. It is not about what makes humans human. It is about what makes you yourself.

When the information from various sources coalesces, the different segments of our personality come together to present a comprehensive picture of who we are. Personalized choice architecture is then applied to our datafied curated self to subconsciously nudge us to choose one course of action over another.


Machine speak: Left to their own devices, computers can figure it out

Oak Ridge National Laboratory



Using novel machine learning techniques, a research team from the Department of Energy’s Oak Ridge National Laboratory is teaching electronic devices how to speak for themselves.

Email, texting, image sharing and social media posts take place over networks that are reliable—most of the time. When certain connections are lost or disrupted by a physical or cyberattack, electronic devices immediately lose touch, and they are not equipped to find another method of communication.

Electronics are only as smart as their design. With proper training and intelligence, devices could be programmed to seek out available media, such as acoustics, optics or radio frequencies, and figure out how to transmit and receive messages on their own without being told beforehand.

For example, multiple response teams using two-way radio devices, or walkie-talkies, during an emergency are limited to a finite number of available frequencies. The same devices could be programmed and trained through machine learning and intelligence to seek alternate means to keep people connected, which is critical in life-and-death scenarios.


How merchants use Facebook to flood Amazon with fake reviews

The Washington Post, Elizabeth Dwoskin and Craig Timberg



On Amazon, customer comments can help a product surge in popularity. The online retail giant says that more than 99 percent of its reviews are legitimate because they are written by real shoppers who aren’t paid for them.

But a Washington Post examination found that for some popular product categories, such as Bluetooth headphones and speakers, the vast majority of reviews appear to violate Amazon’s prohibition on paid reviews. Such reviews have certain characteristics, such as repetitive wording that people probably cut and paste in.

Many of these fraudulent reviews originate on Facebook, where sellers seek shoppers on dozens of networks, including Amazon Review Club and Amazon Reviewers Group, to give glowing feedback in exchange for money or other compensation. The practice artificially inflates the ranking of thousands of products, experts say, misleading consumers.


The Myth of Science-Based Wildlife Management

The Tyee, Kyle A. Artelle



For years, British Columbia’s wildlife management practices, especially its wolf cull and grizzly bear hunt, have been controversial.

In 2015, then-Premier Christy Clark defended the province’s wildlife policies, stating they were grounded in sound science.

That, at least, was the claim. And not one unique to British Columbia.

In fact, hunting in Canada and the United States is guided by the North American Model of Wildlife Conservation, which states “science is the proper tool to discharge policy.” But does this claim hold up to scrutiny?


Why I’ve lost faith in p values

Steven Luck



Here’s the problem in a nutshell: If you run 1000 experiments over the course of your career, and you get a significant effect (p < .05) in 95 of those experiments, you might expect that 5% of these 95 significant effects would be false positives. However, as an example later in this post will show, the actual false positive rate may be 47%, even if you're not doing anything wrong (p-hacking, etc.). In other words, nearly half of your significant effects may be false positives, leading you to draw completely bogus conclusions that you are able to publish. On the other hand, your false positive rate might instead be 3%. Or 20%. And my false positive rate might be very different from your false positive rate, even though we are both using p < .05 as our criterion for significance (even if neither of us is engaged in p-hacking, etc.). In other words, p values do not actually tell you anything meaningful about the false positive rate. But isn’t this exactly what p values are supposed to tell us? Don’t they tell us the false positive rate? Not if you define “false positive rate” in a way that is actually useful. Here’s why.
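
The arithmetic behind figures like these can be sketched in a few lines of Python. The excerpt does not give the inputs, so the values below are assumptions chosen because they reproduce the quoted numbers: 10% of tested hypotheses are actually true, statistical power is 0.5, and the significance threshold is .05. The function name is hypothetical.

```python
def false_positive_share(n_experiments, share_true, power, alpha):
    """Return (number of significant results, share of them that are false positives).

    share_true: assumed fraction of experiments testing a real effect
    power:      assumed probability of detecting a real effect
    alpha:      significance threshold (e.g. 0.05)
    """
    n_true = n_experiments * share_true       # experiments with a real effect
    n_null = n_experiments - n_true           # experiments with no effect
    true_positives = n_true * power           # real effects that reach significance
    false_positives = n_null * alpha          # null effects that reach significance
    significant = true_positives + false_positives
    return significant, false_positives / significant


# Assumed inputs (not stated in the excerpt): 10% true effects, power 0.5.
significant, share = false_positive_share(1000, 0.10, 0.50, 0.05)
print(f"{significant:.0f} significant results, {share:.0%} of them false positives")

# Changing the assumed base rate or power changes the answer, with alpha fixed.
for share_true, power in [(0.5, 0.8), (0.25, 0.6)]:
    _, s = false_positive_share(1000, share_true, power, 0.05)
    print(f"share_true={share_true}, power={power}: {s:.1%} false positives")
```

Under these assumed inputs, 50 real effects and 45 null effects reach significance, which gives the 95 significant results and the roughly 47% share mentioned in the excerpt. The same p < .05 threshold yields very different shares once the assumed base rate or power changes.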


What you get with Spotify’s free mobile app is about to get a lot better

Quartz, Mike Murphy



The new version of the app has a pared-down design that’s meant to provide a far richer experience for free Spotify users. The company announced that it has over 90 million monthly active users on its free tier, and around 60% of the users that pay for the Premium service start out as free users. It’s the first overhaul of the company’s free mobile service since 2014, the company’s head of research, Gustav Söderström, said onstage.

Spotify’s head of product, Babar Zafar, said that the app will feature a data-saving mode for users who do not have unlimited or hefty mobile data plans. The app will show less content when users are on a slower connection, and will automatically store songs that you listen to a lot, meaning they won’t use any data to listen to the song, even when streaming on a data connection. In short: the more of your favorite songs you listen to repeatedly, the less data it will likely cost you.


How 5G will power innovations in VR and artificial intelligence

TechRepublic, Dan Patterson



Manish Vyas, president of business communication at Tech Mahindra, spoke with TechRepublic about the power of 5G to change the way we live and work, as well as some of the challenges to rolling it out.


Special program targeting vets for artificial intelligence, drone operator jobs

ABC 7 News (San Francisco), David Louie



Jennifer Malandra has eight years’ active service in the Navy and a Naval Academy education. Grant Keaton separated from the Air Force two weeks ago after nine years of active service and four years studying at the Air Force Academy. Both are in search of civilian jobs in the highly competitive tech industry. They and 23 others are wrapping up a 10-day program run by BreakLine, an educational program sponsored by a who’s who of Silicon Valley companies in search of bright candidates.

The focus today was artificial intelligence and drones. Both specialties are creating jobs as a result of automation and robotics. The group was visiting Kespry, a Menlo Park drone company that has hired six veterans from the BreakLine program with service in the Army, Marines and Navy. Its drones operate autonomously for specialized applications in the mining, construction and insurance industries.


How artificial intelligence is transforming the world

The Brookings Institution, Darrell M. West and John R. Allen



Artificial intelligence (AI) is a wide-ranging tool that enables people to rethink how we integrate information, analyze data, and use the resulting insights to improve decision making—and already it is transforming every walk of life. In this report, Darrell West and John Allen discuss AI’s application across a variety of sectors, address issues in its development, and offer recommendations for getting the most out of AI while still protecting important human values.


Techstars Alexa Firms Address Settings Where Voice Is ‘Most Natural’

Xconomy, Angela Shah



Aviel Ginzburg, who is managing director of the Techstars Alexa Accelerator, says he agrees. Ginzburg says the role allows him to get a sneak preview of how this emerging technology could affect how we live, work, and play. “It’s more than asking questions to control things in your home,” he says. “It’s saving time.”

The Alexa program was founded in 2017 and is part of the Alexa Fund created by Amazon (NASDAQ: AMZN) to invest in young companies innovating around voice technologies. In 2015, Amazon committed to invest up to $100 million through the Alexa Fund, then committed another $100 million the following year to take the fund global.

Techstars invests $20,000 in each company, with the potential for an additional investment of $100,000 in the form of a convertible note, in exchange for 6 percent equity. The accelerator is now looking to connect with entrepreneurs overseas with a London program with three startups and efforts to make the Alexa APIs available in France.


China is putting ID cards on smartphones

World Economic Forum, Adam Jezard



China’s millions of smartphone users are set to be even more wedded to their mobile devices thanks to rival bids by the country’s internet giants Alibaba and Tencent. The two companies are competing to host essential state-run ID services on their platforms.

Almost 60% of China’s population of nearly 1.4 billion citizens owns a smart mobile, according to data from eMarketer, while ID cards are issued to every citizen at the age of 16 by the Public Security Bureau.


A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results

Gary King



Inference is the process of using facts we know to learn about facts we do not know. A theory of inference gives assumptions necessary to get from the former to the latter, along with a definition for and summary of the resulting uncertainty. Any one theory of inference is neither right nor wrong, but merely an axiom that may or may not be useful. Each of the many diverse theories of inference can be valuable for certain applications. However, no existing theory of inference addresses the tendency to choose, from the range of plausible data analysis specifications consistent with prior evidence, those that inadvertently favor one’s own hypotheses. Since the biases from these choices are a growing concern across scientific fields, and in a sense the reason the scientific community was invented in the first place, we introduce a new theory of inference designed to address this critical problem. We derive “hacking intervals,” which are the range of a summary statistic one may obtain given a class of possible endogenous manipulations of the data. Hacking intervals require no appeal to hypothetical data sets drawn from imaginary superpopulations. A scientific result with a small hacking interval is more robust to researcher manipulation than one with a larger interval, and is often easier to interpret than a classical confidence interval. Some versions of hacking intervals turn out to be equivalent to classical confidence intervals, which means they may also provide a more intuitive and potentially more useful interpretation of classical confidence intervals.
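
To make the idea of "the range of a summary statistic one may obtain given a class of possible manipulations" concrete, here is a toy Python sketch. It is not the paper's construction, which the abstract does not spell out. It assumes one simple manipulation class (dropping at most k observations) and one summary statistic (the sample mean); the function name and data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def hacking_interval_drop_k(values, k):
    """Range of the sample mean over all ways of dropping at most k observations.

    Toy illustration only: brute-force enumeration, practical for small inputs.
    """
    if not 0 <= k < len(values):
        raise ValueError("k must be non-negative and smaller than the sample size")
    low = high = mean(values)                 # k = 0: no manipulation at all
    for n_dropped in range(1, k + 1):
        for idx in combinations(range(len(values)), n_dropped):
            removed = set(idx)
            kept = [v for i, v in enumerate(values) if i not in removed]
            m = mean(kept)
            low, high = min(low, m), max(high, m)
    return low, high


data = [2.0, 3.0, 5.0, 7.0, 11.0]             # hypothetical measurements
low, high = hacking_interval_drop_k(data, 1)
print(f"mean with all data: {mean(data)}")
print(f"range of the mean after dropping at most one point: [{low}, {high}]")
```

A narrow range would suggest the reported mean does not depend much on which observations an analyst chose to keep; a wide one would suggest the opposite.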


Exabytes in a Test Tube: The Case for DNA Data Storage

IEEE Spectrum; Olgica Milenkovic, Ryan Gabrys, Han Mao Kiah and S.M. Hossein Tabatabaei Yazdi



It was back in 2003 when some researchers, notably a group at the University of Arizona, became intrigued with the idea of using DNA to store data. But there were plenty of skeptics: Conventional mass-storage systems were doing the job cheaply and reliably. There was no compelling reason to seek out new options.

The situation has changed drastically over the last 15 years. We face an unprecedented data deluge in medicine, physics, astronomy, biology, and other sciences. The Sloan Digital Sky Survey, for example, produces about 73,000 gigabytes of data annually. At the European Organization for Nuclear Research (CERN), the Large Hadron Collider generates 50 million GB of data per year as it records the results of experiments involving, typically, 600 million particle collisions per second. These CERN results churn through a distributed computing grid comprising over 130,000 CPUs, 230 million GB of magnetic tape storage, and 300 million GB of online disk storage.
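
The gigabyte figures above are easier to compare in larger units. A minimal Python sketch of the conversion follows, assuming decimal (SI) units, so that 1 PB is one million GB; the excerpt does not say which convention the authors use, and the helper name is hypothetical.

```python
# Decimal (SI) units assumed: 1 TB = 1e3 GB, 1 PB = 1e6 GB, 1 EB = 1e9 GB.
GB_PER_UNIT = {"TB": 1e3, "PB": 1e6, "EB": 1e9}


def from_gb(gigabytes, unit):
    """Convert a size in gigabytes to the given larger unit."""
    return gigabytes / GB_PER_UNIT[unit]


# Figures quoted in the excerpt, in gigabytes.
figures_gb = {
    "Sloan Digital Sky Survey, per year": 73_000,
    "Large Hadron Collider, per year": 50e6,
    "CERN grid, magnetic tape": 230e6,
    "CERN grid, online disk": 300e6,
}

for name, gb in figures_gb.items():
    print(f"{name}: {from_gb(gb, 'TB'):,.0f} TB = {from_gb(gb, 'PB'):,.3f} PB")
```

On this convention the collider's yearly output is 50 PB, or a twentieth of the exabyte scale named in the headline.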


Amazon Has a Top-Secret Plan to Build Home Robots

Bloomberg Technology, Mark Gurman and Brad Stone



Ten years ago, Amazon introduced the Kindle and established the appeal of reading on a digital device. Four years ago, Jeff Bezos and company rolled out the Echo, prompting millions of people to start talking to a computer.

Now Amazon.com Inc. is working on another big bet: robots for the home.

The retail and cloud computing giant has embarked on an ambitious, top-secret plan to build a domestic robot, according to people familiar with the plans. Codenamed “Vesta,” after the Roman goddess of the hearth, home and family, the project is overseen by Gregg Zehr, who runs Amazon’s Lab126 hardware research and development division based in Sunnyvale, California. Lab126 is responsible for Amazon devices such as the Echo speakers, Fire TV set-top-boxes, Fire tablets and the ill-fated Fire Phone.


Harvard Teaching and Research Assistants Vote to Unionize

The Scientist Magazine®, Jim Daley



Harvard teaching and research assistants voted last week (April 18 and 19) to join the United Auto Workers (UAW) labor union. The vote was held at the National Labor Relations Board (NLRB) regional office in Boston. The final tally was 1,931 in favor of unionizing to 1,523 against, according to The Harvard Crimson.

Union supporters tell The Boston Globe that they are seeking better health care, wage stability, and a grievance resolution process. “We have been organizing for a long time,” Niharika Singh, a PhD student in public policy, says in a press release. “Winning our union today means we can finally start to make improvements in our working conditions.”

 
Events



Deep Dive Healthcare

Innovation Leader conferences



Boston, MA May 16-17. “This event will bring together senior executives from insurers, hospital systems, medical device and pharma companies, healthcare startups, and academic medical centers.” [$$$$]

 
Moore-Sloan Data Science Environment News



The State of Disinformation on Social Media – Center for Data Science

Medium, NYU Center for Data Science



CDS researchers contribute to new review of disinformation on online platforms

 
Tools & Resources



Xenografics

Maarten Lambrechts



“Xeno.graphics is a collection of unusual charts and maps, managed by Maarten Lambrechts. Its objective is to create a repository of novel, innovative and experimental visualizations to inspire you, to fight xenographphobia and popularize new chart types.”


The current state of the Stan ecosystem in R

Statistical Modeling, Causal Inference, and Social Science, Jonah Gabry



“Last week I posted here about the release of version 2.0.0 of the loo R package, but there have been a few other recent releases and updates worth mentioning. At the end of the post I also include some general thoughts on R package development with Stan and the growing number of Stan users who are releasing their own packages interfacing with rstan or one of our other packages.”


Running Keras models on iOS with CoreML

PyImageSearch, Adrian Rosebrock



Today, we’re going to take this trained Keras model and deploy it to an iPhone and iOS app using what Apple has dubbed “CoreML”, an easy-to-use machine learning framework for Apple applications.
