Data Science newsletter – June 21, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for June 21, 2017

GROUP CURATION: N/A

 
 
Data Science News



The fastest-growing area of machine learning science

Quid, Josh Miller, Carlos Folgar and Jess McCuan



At Quid we took a closer look at academic papers about machine learning — some 5,000 in total from U.S. universities — to see how the science around it is changing.

The data for the analysis comes from SCOPUS, the largest abstract and citation database of peer-reviewed academic literature from around the globe. In just a few minutes, Quid software sifted through the thousands of papers about machine learning and created a network where connections between nodes represent similarities in language between them.
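The excerpt does not describe how Quid measures "similarities in language", so here is a minimal sketch of the general idea, assuming plain bag-of-words vectors and cosine similarity. Papers become nodes, and an edge is drawn whenever two papers' word vectors are similar enough. The 0.3 threshold and the toy abstracts are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_network(docs: dict, threshold: float = 0.3) -> list:
    """Return (doc_a, doc_b, score) edges for every pair of documents whose
    cosine similarity is at least `threshold` (a hypothetical cut-off)."""
    vectors = {name: bag_of_words(text) for name, text in docs.items()}
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            edges.append((a, b, round(score, 3)))
    return edges


if __name__ == "__main__":
    # Hypothetical toy "abstracts", not data from the Quid analysis.
    papers = {
        "p1": "deep neural networks for image classification",
        "p2": "convolutional neural networks for image recognition",
        "p3": "bayesian inference for clinical trial design",
    }
    for edge in similarity_network(papers):
        print(edge)
```

A production system would likely weight terms (for example with TF-IDF) and drop stop words such as "for", which otherwise contribute spurious similarity between unrelated papers.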


On the Mind: What We Know About the Brain-Computer Interface

Paste Magazine, Carolyn Crist



Terms such as “artificial intelligence,” “Internet of Things” and “brain-computer interface” can worry even the most tech-savvy consumers. Silicon Valley pros, including celebrity-status entrepreneurs such as Elon Musk, see the possibility of our minds connecting with others as a reality. Researchers have made progress during the last 50 years, but how close are we, really?


Kepler Telescope Finds 10 Earth-like Habitable Planets

Edgy Labs, Zayan Guedim



Using the Kepler space telescope, NASA discovered ten new exoplanets that are located in the habitable zone of their respective stars.

The method used by space missions such as Kepler to detect exoplanets is planetary transit: the observation of the dip in a star’s brightness when a planet passes in front of its disc.

Although the method also works with ground-based telescopes, it suffers from atmospheric turbulence, which causes fluctuations in brightness that are generally larger than the dip caused by a planetary transit.
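To see why atmospheric noise matters, it helps to estimate how small a transit dip is. A common first approximation, ignoring limb darkening, puts the fractional drop in flux at (planet radius / star radius) squared. For an Earth-sized planet crossing a Sun-like star that comes to roughly 84 parts per million. The sketch below computes that depth and adds a deliberately naive dip detector; the detector and its inputs are illustrative assumptions, not Kepler's pipeline.

```python
import statistics

EARTH_RADIUS_KM = 6371.0
SUN_RADIUS_KM = 695700.0


def transit_depth(planet_radius_km: float, star_radius_km: float) -> float:
    """Fractional dip in stellar flux, approximated as (Rp / Rs) ** 2.

    Ignores limb darkening and assumes the whole planetary disc
    passes in front of the star."""
    return (planet_radius_km / star_radius_km) ** 2


def flag_dips(flux: list, depth: float) -> list:
    """Return indices where the flux sits at least `depth` below the median.

    A naive detector for illustration: real pipelines fold the light curve
    on trial orbital periods and model the noise explicitly."""
    baseline = statistics.median(flux)
    return [i for i, f in enumerate(flux) if baseline - f >= depth]


if __name__ == "__main__":
    depth = transit_depth(EARTH_RADIUS_KM, SUN_RADIUS_KM)
    print(f"Earth-Sun transit depth: {depth * 1e6:.0f} ppm")
```

If ground-based brightness fluctuations are larger than that depth, a detector like `flag_dips` either misses real transits or flags noise, which is the problem the excerpt describes.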


AI Could Test For Autism Before It Even Emerges—But It’s No Cure-All

WIRED, Science, Anna Vlasits



Artificial intelligence is ascendant in medicine—from AI eye doctors to chatbot therapists. As medical databases balloon in size and complexity, researchers are teaching computers to sift through and identify patterns, hinting at a future in which machine learning algorithms diagnose disease all on their own.

Sometimes, algorithms pick up on early signs of disease that humans wouldn’t even know to look for. Last week, researchers at the University of North Carolina and Washington University reported an AI that can identify autistic infants long before they present behavioral symptoms. It’s a thrilling opportunity: Early detection gives autism neuroscience a big leg up, as researchers try to understand what goes wrong during development. But now clinicians and researchers have to figure out what they’ll do with that information—is it just a research tool, or will they one day begin diagnosing and treating autism before symptoms start? Especially when it comes to infants, it won’t be easy to entrust medical care to a computer-generated guess.


Amazon’s New Customer

Stratechery by Ben Thompson



The First-And-Best Customer

Last year in The Amazon Tax I explained how the different parts of the company — like AWS and Prime — were on a conceptual level more similar than you might think, and that said concepts were rooted in the very structure of Amazon itself. The best example is AWS, which offered server functionality as “primitives”, giving maximum flexibility for developers to build on top of:

The “primitives” model modularized Amazon’s infrastructure, effectively transforming raw data center components into storage, computing, databases, etc. which could be used on an ad-hoc basis not only by Amazon’s internal teams but also outside developers.


Symphonic Social Science and the Future for Big Data Research

SAGE Connection – Insight, Sociology; Susan Halford, Mike Savage



Recent years have seen persistent tension between proponents of big data analytics, who use new forms of digital data to make computational and statistical claims about ‘the social’, and many sociologists sceptical about the value of big data, its associated methods and its claims to knowledge. This article is intended to make an original contribution to the debate on ‘big data’ and social research. It is grounded in our own reflections on ‘big data’ both as an empirical phenomenon and as an emergent field of practice in which claims to knowledge are made.


Watching cities grow: New algorithms for evaluating satellite data

Technical University of Munich



Three million measurement points in one square kilometer: Prof. Xiaoxiang Zhu and her team have set a world record in information retrieval from satellite data. Thanks to new algorithms, the researchers from the Technical University of Munich (TUM) succeeded in making four-dimensional point clouds of Berlin, Las Vegas, Paris and Washington, D.C. from image stacks of the TerraSAR-X radar satellite. Next the scientists plan to create four-dimensional models of all cities in the world.


Lessons of 2016 for U.S. Election Security

Princeton Center for Information Technology Policy, Freedom to Tinker blog, Ed Felten



The 2016 election was one of the most eventful in U.S. history. We will be debating its consequences for a long time. For those of us who pay attention to the security and reliability of elections, the 2016 election teaches some important lessons. I’ll review some of them in this post.

First, though, let’s review what has not changed. The level of election security varies considerably from place to place in the United States, depending on management, procedures, and of course technology choices. Places that rely on paperless voting systems, such as touchscreen voting machines that record votes directly in computer memories (so-called DREs), are at higher risk, because of the malleability of computer memory and the lack of an auditable record of the vote that was seen directly by the voter. Much better are systems such as precinct-count optical scan, in which the voter marks a paper ballot and feeds the ballot through an electronic scanner, and the ballot is collected in a ballot box as a record of the vote. The advantage of such a system is that a post-election audit that compares a random sample of paper ballots to the corresponding electronic records can verify with high confidence that the election results are consistent with what voters saw. Of course, you have to make the audit a routine post-election procedure.

Now, on to the lessons of 2016.
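The post-election audit Felten describes (compare a random sample of paper ballots with the corresponding electronic records) can be sketched in a few lines. This assumes, for simplicity, that every ballot carries an ID linking its paper and electronic records; the IDs, candidate names and fixed sample size are hypothetical, and real audits such as risk-limiting audits derive the sample size from the reported margin rather than fixing it in advance.

```python
import random


def audit_sample(paper: dict, electronic: dict, sample_size: int, seed: int) -> list:
    """Compare a random sample of paper ballots with their electronic records.

    Both dicts map a hypothetical ballot ID to the recorded choice.
    Returns the sampled IDs whose paper and electronic records disagree."""
    if paper.keys() != electronic.keys():
        raise ValueError("paper and electronic records cover different ballots")
    # A seeded generator lets observers re-run exactly the same draw.
    rng = random.Random(seed)
    sampled = rng.sample(sorted(paper), min(sample_size, len(paper)))
    return sorted(b for b in sampled if paper[b] != electronic[b])


if __name__ == "__main__":
    paper = {f"B{i:04d}": "Alice" if i % 3 else "Bob" for i in range(1000)}
    electronic = dict(paper)
    electronic["B0042"] = "Alice"  # simulate one altered electronic record
    # A 100-ballot sample may or may not catch the single altered record.
    print(audit_sample(paper, electronic, sample_size=100, seed=2016))
```

The usage example shows why audit size matters: a small sample can miss a lone altered record, so larger samples are needed when the reported margin is narrow.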


6 examples of data management, reporting and analytics in higher education

SAS Voices, Georgia Mariani



Example 1: Tips on the journey from data management to visual reporting

We’ve all heard the adage “garbage in, garbage out.” Users must be able to trust the data and view the reports as a reliable resource. Those concepts were key when The University of Tennessee and The University of Oklahoma began their journey from data management to visual reporting. They knew that dashboards, no matter how flashy, aren’t useful if the underlying data isn’t accurate. As such, they strongly recommend putting formal data governance in place before building a data warehouse and selecting a tool. In addition, they suggest seeking input from IT and reporting stakeholders; ideally, a full partnership would be most beneficial. Also, consider having a roadmap of the reporting/dashboard output.


Allen AI Joins Microsoft, Baidu to Help Empower Academic Searches

Bloomberg Technology, Dina Bass



Paul Allen’s artificial intelligence institute is putting together a coalition including Microsoft Corp., Google, Baidu Inc. and the Gates Foundation to share technology and ideas to help scientific researchers and academics find and take advantage of the latest discoveries and information.

The goal of the project, called Open Academic Search, is to aid researchers by having the companies, institutes and nonprofits involved make their AI and analysis tools open-source, or freely available to other groups to use and tweak. The project seeks to help researchers, doctors and professors use the latest discoveries amid a sea of new work and data that’s being created too rapidly for anyone to keep track of.


New York, New York: Hasso Plattner Institute Opens a New Branch

Hasso Plattner Institute, press release



From 19 to 21 June 2017, the Hasso Plattner Institute (HPI) will celebrate the opening of its new branch in New York. The network conference “Sharing Knowledge, Creating Visions” will take center stage on 20 June, when HPI officially presents itself in New York.

“Together with our partner, the SAP innovation platform Next-Gen, we would like to present the work and research fields of HPI to New York’s leading research institutes, universities and companies,” said HPI Director Professor Christoph Meinel. The New York branch will allow all research areas to intensify their cooperation with their American partners or to launch new projects.


Company Data Science News

Speaking of search features and Google, the company announced a new way to search for jobs. The search grabs jobs posted on “LinkedIn, Monster, WayUp, DirectEmployers, CareerBuilder and Facebook” as well as jobs listed on the companies’ home pages. In addition to jobs, the results include reviews of the company from Glassdoor and Indeed. Type in “jobs near me” at google.com and try it out.

Google announced a plan to build 300 modular housing units for employees in Mountain View because new employees cannot afford rent, even on Google salaries. We should all be wondering what people making less than Google salaries (i.e., most people) are supposed to do. Tents on the side of the highway?

Andrew Ng has announced the name of his new project: deeplearning.ai. Or, if you prefer (as Andrew does), deeplearniNgAI. I was surprised to find exactly no additional information at the website. When he informs his admirers and would-be investors what he’s up to, I shall comment upon it here.

Apple acquired a teeny start-up called Health Gorilla, ostensibly to apply machine learning to the health-related data collected by iPhones. It’s unclear what the overall objective is. iPhone users, it may be prudent to consider whether you want Apple involved in assessing how healthy you are.

Bosch, the German auto parts maker, is building a $1 billion manufacturing facility to make chips. Of course, the chips are for self-driving cars.

Tesla hired machine learning superstar Andrej Karpathy to lead the Autopilot group. As a pedestrian, I feel safer already.

Nissan is also working on a self-driving car with an interface panel that looks similar to Tesla’s Autopilot, according to The Verge.

DeepMind has an app for nephrologists (kidney doctors) that will allow them to pull out their phones to view “results of X-rays, scans and blood tests”. More importantly, the app uses machine learning (not sure which technique) to detect and flag suspected cases of acute kidney injury.

Jeff Bezos, founder and CEO of Amazon, also owns the Washington Post. Bezos offered advice for newspapers now that WaPo is profitable: “Advertising alone won’t support investigative journalism” and “When you are writing be riveting. Be right. And ask people to pay. They will pay.” This sounds like the business version of the positive banalities sports stars offer up in their post-game interviews. Of course, of course, of course, such sweet nothings.

Cool moovel mapping project that takes a novel approach and a snappy tone to analyze mobility in the city.

Zendrive is an auto-nanny that utilizes data gathered by your phone to help you be a better driver. It can “relay data not only about when drivers are looking at their phones, but also how fast they’re driving at any given moment, or if they make hard turns at intersections. It detects collisions, aggressive driving, distracted driving, and more, including whether you’ve recently stopped at a pub.” This is the type of tech I imagine will be integrated into self-driving vehicles and hybrid human/AI cars to keep us from killing ourselves and others.
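Zendrive does not publish its detection method, so the following is only a minimal sketch of how timestamped speed readings from a phone could be turned into the kinds of flags the quote mentions. It assumes GPS-style speed samples and uses hypothetical thresholds for hard braking and speeding; hard turns and phone handling would need motion-sensor data that this sketch does not model.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen for illustration only.
HARD_BRAKE_MPS2 = -3.0   # deceleration at or beyond this counts as hard braking
SPEED_LIMIT_MPS = 13.9   # roughly 50 km/h


@dataclass
class Sample:
    t: float      # seconds since the trip started
    speed: float  # metres per second, e.g. from the phone's GPS


def driving_events(samples: list) -> list:
    """Return time-sorted (t, label) flags for speeding and hard braking."""
    events = [(s.t, "speeding") for s in samples if s.speed > SPEED_LIMIT_MPS]
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        accel = (cur.speed - prev.speed) / dt  # finite-difference acceleration
        if accel <= HARD_BRAKE_MPS2:
            events.append((cur.t, "hard_brake"))
    return sorted(events)


if __name__ == "__main__":
    trip = [Sample(0.0, 12.0), Sample(1.0, 15.0), Sample(2.0, 10.0), Sample(3.0, 9.5)]
    for t, label in driving_events(trip):
        print(f"t={t:.0f}s {label}")
```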


Zangardi: Better Cybersecurity Needs Technology, Talent

SIGNAL Magazine, Sandra Jontz



The swiftly changing cyber domain demands a dynamic and dedicated partnership between the U.S. Defense Department and industry—a critical relationship for the development of both technologies and the work force needed to help the United States maintain a superior edge over adversaries, said John Zangardi, the department’s acting chief information officer.


Measuring biological dust in the wind

MIT News



A technique developed in the Cziczo Lab may be the most accurate way of distinguishing biological aerosols from mineral dust in the atmosphere and analyzing their contribution to cloud formation and climate change.


These undergrads built a Generative Adversarial Net that creates art

TechCrunch, John Mannes



I can’t tell you for sure if we have reached peak GAN, but there are far more people messing around with them than there were a year ago, and that’s a great thing. Two undergraduates at Williams College taught themselves introductory machine learning and read about 50 papers on the now almost mainstream Generative Adversarial Network (GAN) before embarking on a project with a professor to build one that could generate art — and within a year they had basically done it.

The two, Kenny Jones and Derrick Bonafilia, were computer science students who stumbled across Martin Arjovsky’s Wasserstein GAN (WGAN) earlier this year. WGAN, a more stable cousin of the basic GAN, turned out to be a great tool for both learning artistic style and generating new art.
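What makes WGAN "more stable" is mostly its objective. Instead of a discriminator that outputs a probability, WGAN trains a critic whose raw scores estimate a Wasserstein distance, and the original paper keeps the critic roughly Lipschitz by clipping its weights to a small box (c = 0.01 by default). The sketch below shows only those loss and clipping rules on plain lists of critic scores; it is not the students' code, and a real model would compute the scores with a neural network and an autodiff library.

```python
from statistics import fmean


def critic_loss(real_scores: list, fake_scores: list) -> float:
    """WGAN critic loss to minimise: mean D(fake) - mean D(real).

    Minimising this maximises the critic's estimate of the Wasserstein
    distance, mean D(real) - mean D(fake)."""
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores: list) -> float:
    """WGAN generator loss to minimise: -mean D(G(z))."""
    return -fmean(fake_scores)


def clip_weights(weights: list, c: float = 0.01) -> list:
    """Clip each critic weight into [-c, c], the original WGAN's crude way
    of keeping the critic (approximately) Lipschitz."""
    return [max(-c, min(c, w)) for w in weights]
```

Unlike the basic GAN losses, there is no sigmoid or logarithm here: the critic's scores are unbounded, which is what gives the generator useful gradients even when real and generated images are easy to tell apart.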


Tesla hires deep learning expert Andrej Karpathy to lead Autopilot vision

TechCrunch, Darrell Etherington



Tesla has hired deep learning and computer vision expert Andrej Karpathy in a key Autopilot role. Karpathy most recently held a role as a researcher at OpenAI, the artificial intelligence nonprofit backed by Elon Musk. He has an extensive background in AI-related fields, having completed a PhD at Stanford University in computer vision.

Karpathy also created one of the original, and most respected, deep learning courses taught at Stanford, and his dissertation work focused on creating a system by which a neural network could identify multiple discrete and specific items within an image, label them using natural language and report to a user. The dissertation also included developing a system that works in reverse, allowing for a model that can use descriptions from a user articulated in natural language (e.g., “white tennis shoes”) and find that object in a given image.


Russia’s Cyberwar on Ukraine Is a Blueprint For What’s to Come

WIRED, Security, Andy Greenberg



The Cyber-Cassandras said this would happen. For decades they warned that hackers would soon make the leap beyond purely digital mayhem and start to cause real, physical damage to the world. In 2009, when the NSA’s Stuxnet malware silently accelerated a few hundred Iranian nuclear centrifuges until they destroyed themselves, it seemed to offer a preview of this new era. “This has a whiff of August 1945,” Michael Hayden, former director of the NSA and the CIA, said in a speech. “Somebody just used a new weapon, and this weapon will not be put back in the box.”

Now, in Ukraine, the quintessential cyberwar scenario has come to life.


Google to Buy 300 Modular Apartments for Employees

Planetizen, Casey Brazeal



Mountain View is an expensive place to find housing, in part because of Google itself. The tech giant plans to do something about the housing crunch, for its employees at least.

 
Events



Fourth Global Machine Intelligence Summit

RE•WORK



Amsterdam, The Netherlands June 28-29. Co-located with Machine Intelligence in Autonomous Vehicles Summit, produced by RE•WORK. [$$$$]


Code Climate Summit 2017

Code Climate



New York, NY July 19 at Astor Center. One-day conference for leaders of engineering organizations who want to better themselves, their processes, and their teams. [$$$]


Join us for @Scale 2017!

@Scale



San Jose, CA August 31 at San Jose Convention Center. To request an invite, send us a message through the @Scale community page.

 
Deadlines



Help EFF Track the Progress of AI and Machine Learning

“There isn’t currently a good single place to find the state of the art on well-specified machine learning metrics, let alone the many problems in artificial intelligence that are still so hard that there are no good datasets and benchmarks to keep track of them yet. So we are trying to make one. Today, we’re launching the EFF AI Progress Measurement experiment, and encouraging machine learning researchers to give us feedback and contribute to the effort.”
 
Tools & Resources



MERRAclim, a high-resolution global dataset of remotely sensed bioclimatic variables for ecological modelling

Scientific Data; Greta C. Vega



“Here we introduce MERRAclim, a global set of satellite-based bioclimatic variables including Antarctica for the first time. MERRAclim consists of three datasets of 19 bioclimatic variables that have been built for each of the last three decades (1980s, 1990s and 2000s) using hourly data of 2 m temperature and specific humidity. We provide MERRAclim at three spatial resolutions (10 arc-minutes, 5 arc-minutes and 2.5 arc-minutes). These reanalysed data are comparable to widely used datasets based on ground station interpolations, but allow extending their geographical reach and SDM building in previously uncovered regions of the globe.”
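MERRAclim's three resolutions are angular, so their ground size depends on where you are. As a rough guide, assuming a spherical Earth with a 40,075 km circumference, one arc-minute spans about 1.86 km, so 10, 5 and 2.5 arc-minutes are roughly 18.6, 9.3 and 4.6 km north-south; the east-west width shrinks with the cosine of latitude. The helper below is a back-of-the-envelope sketch, not part of the dataset's tooling.

```python
import math

EARTH_CIRCUMFERENCE_KM = 40075.0  # equatorial value; spherical-Earth approximation


def cell_size_km(arc_minutes: float, latitude_deg: float = 0.0) -> tuple:
    """Approximate (north-south, east-west) size in km of a square grid
    cell that is `arc_minutes` wide, centred at `latitude_deg`."""
    km_per_arcmin = EARTH_CIRCUMFERENCE_KM / (360 * 60)  # 360 degrees x 60 arc-minutes
    north_south = arc_minutes * km_per_arcmin
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


if __name__ == "__main__":
    for res in (10, 5, 2.5):
        ns, ew = cell_size_km(res, latitude_deg=60)
        print(f"{res} arc-min at 60 deg latitude: {ns:.1f} km N-S x {ew:.1f} km E-W")
```

At high latitudes, including the newly covered Antarctica, cells become much narrower east-west than north-south, which is worth remembering when comparing grid cells across regions.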


[1706.04987] Variational Approaches for Auto-Encoding Generative Adversarial Networks

arXiv, Statistics > Machine Learning; Mihaela Rosca, Balaji Lakshminarayanan, David Warde-Farley, Shakir Mohamed



Auto-encoding generative adversarial networks (GANs) combine the standard GAN algorithm, which discriminates between real and model-generated data, with a reconstruction loss given by an auto-encoder. Such models aim to prevent mode collapse in the learned generative model by ensuring that it is grounded in all the available training data. In this paper, we develop a principle upon which auto-encoders can be combined with generative adversarial networks by exploiting the hierarchical structure of the generative model. The underlying principle shows that variational inference can be used as a basic tool for learning, but with the intractable likelihood replaced by a synthetic likelihood, and the unknown posterior distribution replaced by an implicit distribution; both synthetic likelihoods and implicit posterior distributions can be learned using discriminators. This allows us to develop a natural fusion of variational auto-encoders and generative adversarial networks, combining the best of both these methods. We describe a unified objective for optimization, discuss the constraints needed to guide learning, connect to the wide range of existing work, and use a battery of tests to systematically and quantitatively assess the performance of our method.
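The abstract's claim that discriminators can stand in for densities we cannot evaluate rests on a standard identity, often called the density ratio trick. The block below sketches that generic identity applied to the KL term of the variational lower bound; it is not the paper's full objective, which adds further reconstruction and adversarial terms.

```latex
% Evidence lower bound (ELBO) of a variational auto-encoder
\log p(x) \ge \mathbb{E}_{q(z \mid x)}\left[\log p(x \mid z)\right] - \mathrm{KL}\left(q(z \mid x) \,\|\, p(z)\right)

% A classifier D trained to tell samples of q (label 1) from samples of p (label 0),
% with equal class priors, is optimal at
D^{*}(z) = \frac{q(z)}{q(z) + p(z)}
\quad\Longrightarrow\quad
\frac{q(z)}{p(z)} = \frac{D^{*}(z)}{1 - D^{*}(z)}

% so the KL term can be estimated from samples alone, without evaluating q or p:
\mathrm{KL}\left(q \,\|\, p\right) = \mathbb{E}_{q(z)}\left[\log \frac{D^{*}(z)}{1 - D^{*}(z)}\right]
```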

 
Careers


Postdocs

Post-Doctoral Research Position to Study Data and Software Practices in Science



University of California-Los Angeles, Center for Knowledge Infrastructures; Los Angeles, CA

Full-time positions outside academia

Human Cell Atlas: Opportunities



European Bioinformatics Institute; Hinxton, England

Full-time, non-tenured academic positions

Privacy Director



Stanford University, Center for Internet and Society; Stanford, CA
