Data Science newsletter – June 22, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for June 22, 2018


Data Science News

FTC plans to reexamine how it polices tech companies

The Hill, Harper Neidig


The new head of the Federal Trade Commission (FTC) wants to review how the consumer protection and antitrust agency polices companies like major tech platforms, promising “vigorous enforcement” of Silicon Valley.

Joseph Simons, who was sworn in as FTC chairman last month, on Wednesday announced that he would convene a series of public hearings later this year to examine whether changes in the economy prompted by the rise of tech giants might necessitate changes in how regulators carry out enforcement.

“Beginning in the fall and continuing through January, the commission is going to hold a series of multi-day, multi-part public hearings to help us evaluate our policy and enforcement efforts to see if they need modification and if so what that might look like,” Simons, a Republican, told reporters Wednesday.

Dow Jones Industrial Average Drops General Electric

NPR, Christopher Dean Hopkins


When the Dow Jones industrial average was created more than 120 years ago, it included 12 companies. Nearly all of the names faded away long ago: United States Leather Co., Distilling & Cattle Feeding Co., Chicago Gas Light And Coke Co.

All but one: General Electric, which is finally being dropped from the now 30-company stock index in favor of the parent company of the Walgreens chain of pharmacies. The change, announced Tuesday, will take effect June 26.

Want to Understand What Ails the Modern Internet? Look at eBay

The New York Times Magazine, John Herrman


This basic approach to seller vetting has been adopted, and adapted, for much wider use, often with far less remove. In 2001, an eBay user might have given an eBay seller a low rating for an unsatisfactory packaging job; in 2018, an Uber driver might be dinged if a rider doesn’t like his personality or choice of music. Facebook recently began ranking publishers by trustworthiness, based on feedback from users. Upon discovering a problem, a platform company’s first instinct is to find a way to expose it to the wisdom of a market, or at least the will of a crowd.

EBay users’ participation in the site’s governance is best understood as large-scale offloading of labor, and is basically limited to snitching on one another; users have some recourse against one another and somewhat less against the site itself. That imbalance doesn’t bother us because eBay is eBay. “It’s a transactional platform; the transaction is very clear to both sides,” Hagiu says. “By and large, buyers and sellers are aligned on eBay.” A seller banned for counterfeit sunglasses hasn’t lost any rights, just his ability to use eBay. Crucially, to most ears, a scammer’s claims to the contrary would sound ridiculous.

Is There a Smarter Path to Artificial Intelligence? Some Experts Hope So

The New York Times, Steve Lohr


In recent conversations, online comments and a few lengthy essays, a growing number of A.I. experts are warning that the infatuation with deep learning may well breed myopia and overinvestment now — and disillusionment later.

“There is no real intelligence there,” said Michael I. Jordan, a professor at the University of California, Berkeley, and the author of an essay published in April intended to temper the lofty expectations surrounding A.I. “And I think that trusting these brute force algorithms too much is a faith misplaced.”

Microsoft buys AI start-up Bonsai

CNBC, Jordan Novet


Microsoft on Wednesday said it has acquired Bonsai, a small artificial intelligence start-up. Terms of the deal weren’t disclosed.

Microsoft has increasingly bet on AI and has sought to commercialize ideas its own researchers come up with, a strategy also employed by Amazon, Google and other big technology companies. By buying Bonsai, Microsoft gains a product that has picked up some business use to help with the push.

Bonsai’s software draws on a trendy approach called reinforcement learning, which involves training systems to yield better outcomes through trial and error. After experts highlight important areas in the data and train an AI model, it can be incorporated into corporate applications.

Artificial Intelligence: Heidrick & Struggles’ Newest Specialty Practice

PR Newswire, Heidrick & Struggles


Clients across all industry sectors will be served by a new specialty practice in Artificial Intelligence (AI) at Heidrick & Struggles (Nasdaq: HSII), a premier provider of executive search, leadership assessment and development, organization and team effectiveness, and culture shaping services globally.
Ryan Bulkoski to lead AI Specialty Practice at Heidrick & Struggles.

Machine learning and other advanced forms of AI can help companies in every sector move beyond process automation that has helped drive efficiency and growth. Integrating and optimizing adaptive changes made possible by AI can provide massive competitive advantage. But there is a critical shortage of leaders with the ability to apply a deep understanding of AI to completely rethink and transform an organization’s business model.

Sonos Taking on Amazon, Apple, Google as It’s About to Go Public

Variety, Janko Roettgers


Summer has been kind so far to Sonos: The Santa Barbara, Calif.-based smart-speaker maker introduced a major new product, the Beam soundbar, earlier this month. Also in June, Ikea showed off first designs of products it is developing jointly with Sonos. But the company will be put to the test in the coming weeks when it is widely expected to file for an IPO in hopes of turning what was once a scrappy start-up into a multibillion-dollar publicly traded company.

Sonos isn’t commenting on the subject, but CEO Patrick Spence is vocal about 2018 being a defining year for the company, which has been on the defensive against tech giants like Amazon, Apple and Google: “Looking back 10 years from now, this will be the point where we made the leap into the next phase of Sonos,” he says.

Ringing in that new phase is the Beam, which will begin selling next month for $399. The stylish, compact soundbar features Amazon’s Alexa smart assistant, allowing consumers to request songs, turn on their TV and even control their smart home with voice commands. Sonos is also adding support for Apple’s AirPlay 2 technology, making it possible to play audio from any iPhone app on the device. And later this year, the Beam will receive access to Google’s smart assistant with a software update.

Autonomous Vehicles Might Drive Cities to Financial Ruin

WIRED, Ideas, Susan Crawford


The problem, as speaker Nico Larco, director of the Urbanism Next Center at the University of Oregon, explained, is that many cities balance their budgets using money brought in by cars: gas taxes, vehicle registration fees, traffic tickets, and billions of dollars in parking revenue. But driverless cars don’t need these things: Many will be electric, will never get a ticket, and can circle the block endlessly rather than park. Because these sources account for somewhere between 15 and 50 percent of city transportation revenue in America, as autonomous vehicles become more common, huge deficits are ahead.

Cities know this: They’re beginning to look at fees that could be charged for accessing pickup and dropoff zones, taxes for empty seats, fees for parking fleets of cars, and other creative assessments that might make up the difference.

Towards personalised medicine: one type of data is not enough

European Molecular Biology Laboratory – European Bioinformatics Institute


EMBL researchers have designed a computational method to jointly analyse multiple types of molecular data from patients in order to identify molecular signatures that distinguish individuals. The method is called Multi-Omics Factor Analysis (MOFA), and was published in Molecular Systems Biology today. MOFA could be particularly useful for understanding cancer development, improving diagnosis and suggesting new directions for personalised treatment.

UMass Amherst Researchers Say Their Memristor Neural Network Can be Applied to Machine Learning

University of Massachusetts Amherst, News & Media Relations


A team of researchers headed by electrical and computer engineering Professors Qiangfei Xia and J. Joshua Yang at the University of Massachusetts Amherst says it has found a way to use sophisticated memristor neural networks to achieve machine learning where the network continuously adapts and updates its knowledge as it receives more data.

The findings are published in Nature Communications. Xia and Yang summarized the findings, saying, “This work proves that the memristor neural network is ready for machine-learning applications.”

The research team is headed by Xia and Yang and their graduate students from UMass Amherst, and also includes collaborators from Hewlett Packard Labs in Palo Alto, Calif., and the Air Force Research Laboratory, Information Directorate, Rome, N.Y.

‘In Ireland, we cannot produce data scientists fast enough’

Silicon Republic


Prof Andrew Parnell of Maynooth University is aiming to see the data from the mathematical trees to make predictions that could not only save time, but lives as well.

Trying to make sense of the reams of data available to the average researcher is challenging at the best of times, but new tools are constantly being developed to help them get to grips with it and possibly make major breakthroughs in the process.

One of those researchers is Andrew Parnell, a Hamilton professor at Maynooth University and the deputy director of machine learning and statistics at the Science Foundation Ireland-funded Insight Centre for Data Analytics.

Thomson Reuters Declares Billable Hour Dead; Joins Cost Prediction Battle

Artificial Lawyer


Global professional information and software company, Thomson Reuters, has declared that the billable hour – that revered emblem of the legal world – is now dead, at least as far as clients are concerned. It has also joined the battle to gain market share for legal matter cost prediction tech, following a wave of startups offering similar products in this area, by launching its own new service as part of its 3E platform.

Eric Ruud, managing director at Thomson Reuters Elite, the group behind the new cost prediction system, said in a company statement: ‘The true billable hour is a thing of the past.’

Developing a modern data workflow for living data

bioRxiv; Glenda M Yenni, Erica M Christensen, Ellen K Bledsoe, Sarah R Supp, Renata M Diaz, Ethan P White, S K Morgan Ernest


Data management and publication are core components of the research process. An emerging challenge that has received limited attention in biology is managing, working with, and providing access to data under continual active collection. “Living data” present unique challenges in quality assurance and control, data publication, archiving, and reproducibility. We developed a living data workflow for a long-term ecological study that addresses many of the challenges associated with managing this type of data. We do this by leveraging existing tools to: 1) perform quality assurance and control; 2) import, restructure, version, and archive data; 3) rapidly publish new data in ways that ensure appropriate credit to all contributors; and 4) automate most steps in the data pipeline to reduce the time and effort required by researchers. The workflow uses two tools from software development, version control and continuous integration, to create a modern data management system that automates the pipeline.
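The quality assurance step described in this abstract can be sketched as a small validation function that a continuous integration service runs on every new batch of data. This is only an illustrative sketch, not the authors' pipeline: the `Record` fields, the `check_records` name, the species codes, and the weight limit are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    # Hypothetical fields for one field observation; not taken from the paper.
    day: date
    plot: int
    species: str
    weight_g: float


def check_records(records, valid_plots, valid_species, max_weight_g=500.0):
    """Return (row_index, problem) tuples; an empty list means the batch passes."""
    problems = []
    for i, r in enumerate(records):
        if r.plot not in valid_plots:
            problems.append((i, f"unknown plot {r.plot}"))
        if r.species not in valid_species:
            problems.append((i, f"unknown species code {r.species!r}"))
        if not (0 < r.weight_g <= max_weight_g):
            problems.append((i, f"weight out of range: {r.weight_g}"))
    return problems


if __name__ == "__main__":
    # Made-up example batch: one clean row and two rows with entry errors.
    batch = [
        Record(date(2018, 6, 1), 3, "DM", 42.0),
        Record(date(2018, 6, 1), 99, "DM", 41.0),
        Record(date(2018, 6, 2), 3, "XX", -1.0),
    ]
    for row, problem in check_records(batch, valid_plots={1, 2, 3}, valid_species={"DM", "PP"}):
        print(f"row {row}: {problem}")
```

In a version-controlled setup, a check like this would typically run automatically on each proposed data update, so that a batch with problems is flagged before it is merged and archived.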

California Can Lead the Way in Open Access

Electronic Frontier Foundation, Elliot Harmon


There’s a bill in the California legislature that would be a huge win for open access to scientific research. The California Assembly recently passed A.B. 2192 unanimously. We hope to see it pass the Senate soon, and for other states to follow California’s lead in passing strong open access laws.

Under A.B. 2192, all peer-reviewed, scientific research funded by the state of California would be made available to the public no later than a year after publication. Under current law, research funded by the California Department of Public Health is covered by an open access law, but that provision is set to expire in 2020. A.B. 2192 would extend it indefinitely and expand it to cover research funded by any state agency.

A.B. 2192 is a huge step in the right direction. When scientific research is available only to people with access to expensive journal subscriptions or subscription-based academic databases, it puts those without institutional connections at a severe disadvantage.

Northwestern Mutual launching Milwaukee data science partnership

Milwaukee Journal Sentinel, Karen Herzog


If Milwaukee is to grow and keep young workers with skills to turn big data into insights that companies can use to make better decisions, the city needs to step up its game.

That’s why Northwestern Mutual Life Insurance, Marquette University and the University of Wisconsin-Milwaukee on Wednesday announced they are launching a Northwestern Mutual Data Science Institute.

It’s touted as both a cutting-edge partnership for the Midwest and a step toward advancing Milwaukee as a hub for technology, research, business and talent development — something Northwestern Mutual chairman and CEO John Schlifske has been talking about for months.


2018 ICME Summer Workshops – Fundamentals of Data Science

Stanford University


Stanford, CA. Daylong workshops running August 13-18 at Stanford University. [$$$]

Tools & Resources

Data Dictionary: a how to and best practices

Medium, Carl Anderson


A data dictionary is a list of key terms and metrics with definitions, a business glossary. While it sounds simple, almost trivial, its ability to align the business and remove confusion can be profound. In fact, a data dictionary is possibly one of the most valuable artifacts that a data team can deliver to the business.

Most businesses have at least one concept, term, or metric that is used or interpreted differently among teams. When this happens, confusion reigns. Decision makers may disagree about what the data show and what actions to take. Reports among teams might show different numbers for the same metric from the same data source due to inconsistent business logic. Teams may even argue about the correct definition and defend their turf, perhaps because their definition makes their numbers look better. This is not good for business.

Once you have a data dictionary, it is a document that all staff can reference to stay on the same page; it makes onboarding new staff easier, and the business intelligence (BI) team has crystal clear requirements for implementation of those metrics.
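One way to picture a data dictionary is as a small lookup structure that holds exactly one agreed definition per term. The sketch below is illustrative only and is not from the article: the `Term` fields, the `DataDictionary` class, and the "Active user" example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    name: str
    definition: str
    owner: str          # team responsible for keeping the definition current
    formula: str = ""   # optional business logic, written out in plain text


class DataDictionary:
    """Holds one definition per term, matched case-insensitively."""

    def __init__(self):
        self._terms = {}

    def add(self, term):
        key = term.name.strip().lower()
        if key in self._terms:
            # A second, competing definition is exactly the confusion to avoid.
            raise ValueError(f"'{term.name}' is already defined")
        self._terms[key] = term

    def lookup(self, name):
        return self._terms[name.strip().lower()]


if __name__ == "__main__":
    dd = DataDictionary()
    dd.add(Term(
        "Active user",
        "A user with at least one session in the last 30 days",
        owner="Analytics",
        formula="count(distinct user_id) where last_session >= today - 30",
    ))
    print(dd.lookup("active user").definition)
```

Rejecting a second definition for an existing term, rather than silently overwriting it, forces teams to reconcile their definitions instead of each keeping their own.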

Curating Artificial Intelligence: Immediately Computable Access

DATAVERSITY, Wolfram Research, Keith Foote


A new press release stated, “Wolfram Research, makers of Mathematica, Wolfram|Alpha, and the Wolfram Language, has officially launched the Wolfram Neural Net Repository. Providing immediate access to neural networks for Artificial Intelligence and machine learning applications, the repository further builds a ubiquitous layer of computational intelligence upon the Wolfram tech stack. These models are suitable for immediate evaluation, training, visualization, and transfer learning, for both experts and those new to applying AI to their work.”

“Building on the Wolfram Language Neural Net Framework introduced in 2014 with Version 10 of the Wolfram Language, the repository provides a uniform system for storing and deploying neural network models in an immediately computable form. The repository is built to be a global resource for neural net models, including those from the latest research papers, as well as ones trained or created at Wolfram Research.”

Sustainable data and metadata management at the BD2K-LINCS Data Coordination and Integration Center

Nature, Scientific Data; Stephan C. Schürer et al.


The NIH-funded LINCS Consortium is creating an extensive reference library of cell-based perturbation response signatures and sophisticated informatics tools incorporating a large number of perturbagens, model systems, and assays. To date, more than 350 datasets have been generated including transcriptomics, proteomics, epigenomics, cell phenotype and competitive binding profiling assays. The large volume and variety of data necessitate rigorous data standards and effective data management including modular data processing pipelines and end-user interfaces to facilitate accurate and reliable data exchange, curation, validation, standardization, aggregation, integration, and end user access. Deep metadata annotations and the use of qualified data standards enable integration with many external resources. Here we describe the end-to-end data processing and management at the DCIC to generate a high-quality and persistent product. Our data management and stewardship solutions enable a functioning Consortium and make LINCS a valuable scientific resource that aligns with big data initiatives such as the BD2K NIH Program and concords with emerging data science best practices including the findable, accessible, interoperable, and reusable (FAIR) principles.

synergy-core: Open source core of Synergy, the keyboard and mouse sharing tool

GitHub – symless


This is the open source core component of Synergy, a keyboard and mouse sharing tool.

The Challenge of Product/Data Fit

Hacker Noon, Daniel Shenfield


Machine learning and AI are becoming critical components of more and more products. Some applications, like image recognition and natural language understanding, seemed out of reach just a few years ago but are now reaching the mainstream.

But, these powerful tools also create new product development challenges.

Having worked on building machine learning based products at various companies, I started noticing a common theme: data creates a fit challenge.


Full-time positions outside academia

Senior Research Scientist

Surgo Foundation; Washington, DC

Data Scientist

Metropolitan Council; St. Paul, MN


Internships and other temporary positions

Full Stack Developer

Conservation Metrics; Santa Cruz, CA
