Data Science newsletter – July 12, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for July 12, 2017


 
 
Data Science News



Proximity boosts collaboration on MIT campus

MIT News



Want to boost collaboration among researchers? Even in an age of easy virtual communication, physical proximity increases collaborative activity among academic scholars, according to a new study examining a decade’s worth of MIT-based papers and patents.

In particular, the study finds that cross-disciplinary and interdepartmental collaboration is fueled by basic face-to-face interaction within shared spaces.

“If you work near someone, you’re more likely to have substantive conversations more frequently,” says Matthew Claudel, a doctoral student in MIT’s Department of Urban Studies and Planning (DUSP) and the MIT Lab for Innovation Science and Policy, and the lead author of a new paper detailing the findings.


NHS and Oxford University tap IoT company Drayson Technologies to develop digital health tools

MobiHealthNews, Heather Mack



London-based Drayson Technologies, a healthcare IoT company, is getting a high-profile gig and funding to help build out and commercialize digital health tools for the University of Oxford and Oxford University Hospitals NHS Foundation Trust.

Under a five-year Strategic Research Agreement (SRA), the three entities will join forces to develop digital health products based on research from Oxford Biomedical Research Center and funded by the National Institute for Health Research. The SRA comes on the heels of Drayson’s $19.3 million (15 million British pounds) Series C financing, which was led by Woodford Investments. The company has raised over $53 million (41 million pounds) to date.


Stanford Seminar – Unethical Algorithms of Massive Scale

YouTube, stanfordonline



Unethical Algorithms of Massive Scale: New Data, a New Discovery, a New Tracking System, and a New Organization …
Speaker: Robert Epstein, The Sunlight Society and The American Institute for Behavioral Research and Technology


Intel, While Pivoting to Artificial Intelligence, Tries to Protect Lead

The New York Times, Steve Lohr



The computers in modern data centers — the engine rooms of the digital economy — are powered mainly by Intel chips. They animate the computing clouds of the internet giants and corporate data centers worldwide.

But Intel is now facing new competitive forces that could pose a challenge to its data-center dominance and profitability.

In particular, the rise of artificial intelligence is creating demand for new computing hardware tailored to handle vast amounts of unruly data and complex machine-learning software — and Intel’s general-purpose chips are not yet tuned for the most demanding tasks. Instead, specialized chips are delivering better performance on artificial intelligence programs that identify images, recognize speech and translate languages.


Massive database of 182,000 leaves is helping predict plants’ family trees

Nature News & Comment, Heidi Ledford



The technique could be used on everything from flowers to cells to examine the factors that influence the shapes of plant parts.


The Google Brain Residency Program — One Year Later

Google Research Blog; Luke Metz and Yun Liu



This month marks the end of an incredibly successful year for our first class of the Google Brain Residency Program. This one-year program was created as an opportunity for individuals from diverse educational backgrounds and experiences to dive into research in machine learning and deep learning. Over the past year, the Residents familiarized themselves with the literature, designed and implemented experiments at Google scale, and engaged in cutting edge research in a wide variety of subjects ranging from theory to robotics to music generation.


Stanford computer scientists develop an algorithm that diagnoses heart arrhythmias with cardiologist-level accuracy

Stanford University, Stanford News



A new algorithm developed by Stanford computer scientists can sift through hours of heart rhythm data generated by some wearable monitors to find sometimes life-threatening irregular heartbeats, called arrhythmias. The algorithm, detailed in an arXiv paper, performs better than trained cardiologists, and has the added benefit of being able to sort through data from remote locations where people don’t have routine access to cardiologists.

“One of the big deals about this work, in my opinion, is not just that we do abnormality detection but that we do it with high accuracy across a large number of different types of abnormalities,” said Awni Hannun, a graduate student and co-lead author of the paper. “This is definitely something that you won’t find to this level of accuracy anywhere else.”


Zen and the Art of Multilingual eDiscovery

Slator, Mark Hjerpe



eDiscovery is tough in English. Adding foreign languages increases the difficulty and complexity by orders of magnitude. This paper offers guidance on three key challenges of multilingual eDiscovery: logistics, linguistics and culture. By understanding how these areas come into play on multilingual legal matters, global working groups can better equip themselves with the people, processes and tools necessary for successful execution.


MIT IoT and wearable project foretells the future of industrial safety

Network World, Steven Max Patterson



MIT built a fitting prototype that could, with further development, scale into a platform. A multidisciplinary team from the MIT Design Lab led by MIT Media Lab researcher Guillermo Bernal won best research paper at the Petra Conference last month for the team’s work applying IoT and wearables to industrial safety. The sophisticated and purpose-built prototype at the center of the research makes the paper “Safety++. Designing IoT and Wearable Systems for Industrial Safety through a User-Centered Design Approach” extremely tangible and predictive about how the IoT will unfold.

The energy industry was the context of the research. Testing was conducted at refineries owned by ENI S.p.A. The team began its research with existing literature, interviews with plant supervisors and workers, and a study of video recordings of the production area.


Ethics and Governance AI Fund funnels $7.6M to Harvard, MIT and independent research efforts

TechCrunch, Devin Coldewey



A $27 million fund aimed at applying artificial intelligence to the public interest has announced the first targets for its beneficence: $7.6 million will be split unequally between MIT’s Media Lab, Harvard’s Berkman Klein Center and seven smaller research efforts around the world.

The Ethics and Governance of Artificial Intelligence Fund was created by Reid Hoffman, Pierre Omidyar and the Knight Foundation back in January; the intention was to ensure that “social scientists, ethicists, philosophers, faith leaders, economists, lawyers and policymakers” have a say in how AI is developed and deployed.


Breakthrough Tool Predicts Properties of Theoretical Materials, Finds New Uses for Current Ones

University of North Carolina, Eshelman School of Pharmacy



Scientists at the University of North Carolina at Chapel Hill and Duke University have created the first general-purpose method for using machine learning to predict the properties of new metals, ceramics and other crystalline materials and to find new uses for existing materials, a discovery that could save countless hours wasted in the trial-and-error process of creating new and better materials.

Researchers led by Olexandr Isayev, Ph.D., and Alexander Tropsha, Ph.D., at the UNC Eshelman School of Pharmacy used data on approximately 60,000 unique materials from the National Institute of Standards and Technology’s Inorganic Crystal Structure Database to create a new methodology they call Properties Labeled Materials Fragments.

Using machine learning to analyze and model existing crystal structures, the PLMF method is able to predict the properties of new materials proposed by scientists and engineers. The tool was even able to fill in missing values for properties of materials in the NIST database that had never been tested experimentally.


Intel Cues New Xeon Chips for an AI Future

The New Stack, Agam Shah



Should software define how chips are designed? Or vice versa? Intel has struggled with that question for decades, but now is downplaying its CPUs and treating the data center like one large computer.

That was the underlying theme at Intel’s launch of new Xeon Scalable processors on Tuesday at an event in Brooklyn, New York. Intel called the new chips the “biggest data center advancement in a decade,” and a general purpose compute engine that would drive artificial intelligence, networking, storage and the cloud.


Qubole Harnesses Automation to Provide Data Self-Service

The New Stack, Susan Hall



Data delayed is data denied.

That was a slogan from the team that built out a self-service data platform at Facebook. Employees’ frustration at having to go through a data team to get the information they needed has been channeled into Qubole, a Santa Clara, Calif.-based startup that was generating buzz at ApacheCon recently for its focus on automation.


U.S. Department of Defense Has Enlisted Six Teams to Develop New Brain-Computer-Interface Technologies

Futurism, Karla Lant



DARPA’s quest for high-bandwidth brain-computer interfaces (BCIs) has a new partner in Paradromics, which will be leading one of the agency’s six BCI-development consortia. This is just one of the projects working to develop a breakthrough BCI right now.


Songs as Skills

Rethink Music, Benji Rogers



A song like any other digital asset becomes smart when you can identify from the file itself who owns it. When you can query the file to see what it is and how you can (should you choose to) commit commerce with it. A song becomes smart if you know who wrote it, not just who performed it, and who played on it, contributed to it, etc. This knowledge means that you can set rules into the file itself for how it can be used.

Songs as skills

The reason it’s so important that we get this next bit right is that a smart, or to use Amazon’s term “skilled,” song can talk directly to a machine, like an Alexa device, and can inform Alexa that yes it does indeed have the right to play the song (the permission), and that if it does so then there is a fee or a data exchange (obligation) due to all of those who are a party to the song in question.


Proscia wants to bridge technology gap in cancer research

MedCity News, Stephanie Baum



A digital pathology startup founded by technologists from Johns Hopkins, Moffitt Cancer Center in Tampa, the Ohio State University College of Medicine, and the University of Pittsburgh wants to shake up the world of pathology.

Several health IT companies have staked out niche parts of digital pathology and Proscia is one of them. Proscia has developed a set of tools with the intention of using technology to support institutions as they add digital pathology capabilities. The healthcare startup seeks to aid the digitization of pathology in the same way that radiology has been helped by the digitization, sharing and storing of X-ray images.


UN report: 50% of countries have no cybersecurity strategy in place

TechRepublic, Alison DeNisco



Only 38% of countries have a published cybersecurity strategy, and just 12% are in the process of developing one, according to a new report from the United Nations (UN). The Global Cybersecurity Index 2017, created by the UN’s International Telecommunication Union (ITU) and released Wednesday, urges more countries to consider national policies to combat cybercrime.


Evolving Government: A Chihuahua or a muffin? The real challenge of government machine learning

Fedscoop, Becky Fair



Often in government agencies, there are only a few experts who have the skills to understand nuanced data, or the data is too sensitive to outsource for support. A D.C.-based startup called Thresher aims to tackle this challenge. “Our clients are not trying to distinguish between a Chihuahua and a blueberry muffin,” noted Dr. Evann Smith, a senior data scientist at Thresher. “They need labels that teach computers the difference between nuanced concepts, such as news vs. propaganda in multiple Arabic dialects, nuclear waste vs. weapons discussions on social media, and hereditary vs. acquired myopathies based on doctors’ clinical notes. Getting these labels wrong can drastically change the outcomes of government data science efforts.”

 
Events



Focus | AI: The Ethical Impact of AI

NYU Future Labs, Verizon



New York, NY Wednesday, July 26, starting at 6:30 p.m., Verizon Venture Studio (119 W 24th St) [$]


WiNLP workshop – Women and Underrepresented Minorities in Natural Language Processing

Association for Computational Linguistics



Vancouver, Canada July 30. The first WiNLP workshop will be co-located with ACL 2017. [$$$]


Future of Storytelling Festival

Future of Storytelling



New York, NY October 6-9. The world’s leading immersive storytelling event, FoST Festival. [$$$]

 
Deadlines



Training Course on Social Media, Big Data and Digital Demography – IPC2017workshop

Cape Town, South Africa This training workshop is organized by the IUSSP Scientific Panel on Big Data and Population Processes as a preconference workshop at the International Population Conference. October 29-November 4. Deadline to apply is August 15.

 
Tools & Resources



Natural Language Generation | Creating Great Automated Content

Automated Insights, Adrian Atkinson



Truly fantastic automated content can be found at the confluence of rich data and great storytelling. In general, a more robust set of data variables will enable a more nuanced narrative. But more and better data can only complement story structure, not replace it.

So what do we mean by “story structure?” The world of automated writing is really not that different from its more manual predecessor.


Tutorial on Deep Learning for MIR

GitHub – keunwoochoi



1. Pitch detector with a dense layer
2. Chord recogniser with a convnet


When data gets big: Best practices for data preparation at scale

Trifacta, Athena Jiang



Today we work with data that has grown up in diversity, scale and complexity — this applies to not only data scientists and academic researchers, but also the rest of us. Business analysts across a spectrum of industries are asked to include larger volumes of data in their work, now pervasive due to diminishing costs of collection and storage. Answering real analytic questions that drive business value means adapting methodologies to the reality of the data at hand. For this, new data preparation tools are gaining adoption, helping business users bring their domain expertise to bear on bigger, thornier data challenges. Based on our experiences navigating these transitions, we’ll share some best practices for evolving data workflows to handle increasing data volumes.


Rise of the Data Product Manager

Medium, Trey Causey



“Working with data at the core of a product requires a level of understanding of data modeling, data infrastructure, and statistical and machine learning. It goes beyond understanding the results of experiments and reading dashboards — it requires a deep appreciation for what is possible and what will soon be possible by taking full advantage of the flow of data. If the traditional PM operates at the intersection of business, engineering, and user experience, the data PM must also have domain knowledge of data and data science.”


Introducing Apache Spark 2.2

The Databricks Blog, Michael Armbrust



This release marks a major milestone for Structured Streaming by marking it as production ready and removing the experimental tag. In this release, we also add support for arbitrary stateful operations in a stream, and Apache Kafka 0.10 support for both reading and writing using the streaming and batch APIs. In addition to extending new functionality to SparkR, MLlib, and GraphX, the release focuses on usability, stability, and refinement, resolving over 1100 tickets.


Human-Centered Machine Learning

Medium, Google Design, Josh Lovejoy and Jess Holbrook



“We’ve developed seven points to help designers navigate the new terrain of designing ML-driven products. Born out of our work with UX and AI teams at Google (and a healthy dose of trial and error), these points will help you put the user first, iterate quickly, and understand the unique opportunities ML creates.”


Respondent mode choice in a smartphone survey

Public Opinion Quarterly; Frederick G. Conrad, Michael F. Schober, Christopher Antoun, H. Yanna Yan, Andrew L. Hupp, Michael Johnston, Patrick Ehlen, Lucas Vickers, Chan Zhang



Now that people on mobile devices can easily choose their mode of communication (e.g., voice, text, video), survey designers can potentially allow respondents to answer questions in whatever mode they find momentarily convenient given their circumstances or that they chronically prefer. We conducted an experiment to explore how mode choice affects response quality, participation, and satisfaction in smartphone interviews. A total of 1,260 iPhone users were contacted on their iPhones by either a human or an automated interviewer via voice or SMS text. This created four modes: Human Voice, Human Text, Automated Voice, and Automated Text. In half of the initial contacts, respondents were required to choose their interview mode (which could be the contact mode); in the remaining half, the mode was simply assigned. Respondents who chose their interview modes provided more conscientious (fewer rounded and non-differentiated) answers, and they reported greater satisfaction with the interview. Although fewer respondents started the interview when given a choice of mode, a higher percentage of Mode Choice respondents who started the interview completed it. For certain mode transitions (e.g., from automated interview modes), there was no reduction in participation. The results demonstrate clear benefits and relatively few drawbacks resulting from mode choice, at least among these modes and with this sample of iPhone users, suggesting that further exploration of mode choice and the logistics of its implementation is warranted. [full text]

 
Careers


Full-time, non-tenured academic positions

New CIS Staff Position: Privacy Director



Stanford University, Stanford Law School, Center for Internet and Society; Stanford, CA

Full-time positions outside academia

UX Researcher, News



Facebook; Menlo Park, CA
