Data Science newsletter – December 15, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for December 15, 2017


 
 
Data Science News



Data Visualization of the Week

Twitter, Alec Tyson




Tweet of the Week

Twitter, HomoCarnula




How Machine Learning Can Help Identify Cyber Vulnerabilities

Harvard Business Review, Ravi Srinivasan



How do organizations not only monitor for suspicious activity coming from the outside world, but also look at the behaviors of their employees to determine security risks? As the adage goes, “to err is human” — people are going to make mistakes. So we need to find ways to better understand humans, and anticipate errors or behaviors that are out of character — not only to better protect against security risks, but also to better serve internal stakeholders.

There’s an emerging discipline in security focused around user behavior analytics that is showing promise in helping to address the threat from outside, while also providing insights needed to solve the people problem. It puts to use new technologies that leverage a combination of big data and machine learning, allowing security teams to get to know their employees better and to quickly identify when things may be happening that are out of the norm.


Gateses Pledge $15M for UW Computer Science Building Named for Them

Xconomy, Benjamin Romano



First their friends named the new University of Washington computer science building in their honor. Now Bill and Melinda Gates have contributed $15 million to complete the fund-raising drive for the project, which marks a construction milestone on Wednesday.

The building, which is expected to more than double the number of computer science majors the UW can accommodate when it is completed in 2019, was named for the Microsoft co-founder and his spouse after Microsoft and several current and former executives collectively contributed $30 million to fund its construction.


Resources for dealing with the Undergraduate CS Capacity Crisis: Guest Post from Eric Roberts

Mark Guzdial, Computing Education Blog



Eric Roberts emailed to SIGCSE-members a note with resources on the capacity crisis. He graciously agreed to let me share it here as a guest blog post. Thanks, Eric!


Five priorities for weather and climate research

Nature, Comment; Øystein Hov, Deon Terblanche, Gregory Carmichael, Sarah Jones, Paolo M. Ruti & Oksana Tarasova



The meteorological community must adapt. Two-thirds of national meteorological organizations need to strengthen their core infrastructure. Researchers need to adopt fresh approaches to collecting and disseminating environmental information. Communication, alliances and partnerships must be revamped to be interactive, informal and inclusive.

To begin this process, the WMO convened more than 100 experts in atmospheric, hydrological and related sciences in Geneva, Switzerland, in October. In a timely reminder of the threat of natural hazards, the meeting had to be relocated from Bali, Indonesia, because the Mount Agung volcano became active. The outcomes of the meeting have now been agreed by more than 50 countries and will be put to the World Meteorological Congress for ratification in 2019.


Analytics Lessons from the Sports Data Revolution

InformationWeek, Howard M. Cohen



A new MIT program helps executives engage in guided strategic thinking to develop an analytics program through best practices from the sports industry.


Harvard computer scientist recruited to UBC through federal research program

University of British Columbia, UBC News



A top computer scientist and entrepreneur from Harvard University will join UBC next September, thanks to the federal government’s Canada 150 Research Chairs Program.

Harvard professor Margo Seltzer will join UBC as the Canada 150 Research Chair in Computer Systems and the Cheriton Family Chair in Computer Science in September 2018, a position that comes with $1 million in federal funding per year for seven years. Seltzer is currently a computer science professor and director of the Center for Research on Computation and Society in the John A. Paulson School of Engineering and Applied Sciences at Harvard. As a researcher, Seltzer delves into problems that are fundamental to how computer systems work, with past research funded by prominent tech companies like Google, Microsoft, IBM, NetApp, and Oracle.


Stanford AI4ALL, formerly SAILORS, launches for summer 2018

Medium, AI4ALL



Stanford AI4ALL will run alongside 5 additional AI4ALL education programs around North America in summer 2018. Additional AI4ALL education program sites include: UC Berkeley, Carnegie Mellon University, Princeton University, Boston University, and Simon Fraser University in Burnaby, Canada.


Getting the World Quantum Ready

IBM, THINK Blog, Dario Gil



The world took a big leap forward toward quantum computing readiness, today. A dozen international organizations representing Fortune 500 companies, academia, and the government joined the newly minted IBM Q Network. Together, we are committed to exploring scientific and commercial applications of quantum computing, leveraging IBM’s recently-announced 20-qubit commercial system – and, soon, our next-generation 50 qubit processor.


Intel Sheds More Light On Benefits of Nervana Neural Network Processor

PC Perspective, Tim Verry



Intel is currently working with Facebook and hopes to have its deep learning products out early next year. The company may have axed Knights Hill, but it is far from giving up on this extremely lucrative market as it continues to push towards exascale computing and AI. Intel is pushing for a 100x increase in neural network performance by 2020 which is a tall order but Intel throwing its weight around in this ring is something that should give GPU makers pause as such an achievement could cut heavily into their GPGPU-powered entries into this market that is only just starting to heat up.


How the loss of net neutrality could change the internet

POLITICO, Margaret Harding McGill



Supporters of the repeal say it will free up internet providers to innovate, while opponents fear the online world will become more like cable TV.


The FCC just voted to repeal its net neutrality rules, in a sweeping act of deregulation

The Washington Post, Brian Fung



Federal regulators voted Thursday to allow Internet providers to speed up service for websites they favor — and block or slow down others — in a decision repealing landmark Obama-era regulations overseeing broadband companies such as AT&T and Verizon.

The move by the Federal Communications Commission to deregulate the telecom and cable industries was a prominent example of the policy shifts taking place in Washington under President Trump and a major setback for consumer groups, tech companies and Democrats who had lobbied heavily against the decision.


Pittsburgh dominating global artificial intelligence competition

TribLIVE, Aaron Aupperlee



Pittsburgh had more teams than any other city in the country move to the next round of the $5 million IBM Watson AI XPRIZE.

“AI can be applied to so many different fields,” Amir Banifatemi, prize lead for the competition, told the Tribune-Review. “The goal of the prize is to really showcase the hard problems that can be solved with artificial intelligence and can be achieved with collaboration between man and machine.”


Patenting intellectual property for artificial intelligence as complex as some AI code

Financial Post, Denise Deveau



Benjamin Alarie, co-founder and CEO of Blue J Legal, says the patent for its AI software for legal and accounting firms is filed and pending. But in an AI world, he’s the first to admit it is far from a simple process.

“Patents take three to five years to process. At the same time, technology is moving very, very quickly. If you wait too long to patent something that’s truly innovative, it’s likely someone else will file.”

Securing intellectual property (IP) can be a major hurdle for startups at the best of


Company Data Science News

Andrew Ng announced what he will be doing after leaving Baidu. His startup, Landing.ai, will help companies in manufacturing take advantage of machine learning and AI.



Spotify is considering an IPO without the typical team of (largely self-serving) bankers underwriting the process. Matt Levine, my hands-down favorite market watcher/writer, explains how this could work: “Spotify will go public by means of an opening auction on the New York Stock Exchange or Nasdaq: Early one morning, some Spotify shareholders will make indicative offers to sell their shares, and some bold investors will make indicative bids to buy them, and the exchange will publish some tentative price that seems like it will clear the supply and demand, and then other shareholders and buyers can come in and adjust the prices and quantities that they want, and eventually a clearing price will be reached, and the stock will open and trade normally.” This strategy may work better with a well-known company, so Spotify may be able to reap some rewards from brand awareness (e.g., the reward of not having to deal with the IPO roadshow and the bankers who set aside chunks of the initial offering for their favored investors, many of whom are only interested in the pop before they drop the stock).

IBM is hoping to fight its way into industry-level AI leadership with hardware, introducing a new Power9 chip that can either be packaged into third-party computers or accessed via the cloud. Reportedly, it makes processes on Chainer, TensorFlow and Caffe up to 4x faster. Interesting to see an older tech company following the lead of a younger one (Google).

“We now know who our new overlord is,” said chess researcher David Kramaley, the CEO of chess science website Chessable. It’s DeepMind. Now you know how those human Go players feel.



Hedge funds are worried about overfitting. Not sure that this is interesting, unless you want to go into finance and beat the hedge funds at their own game, in which case: may the feature engineering be ever in your favor.



Wes McKinney left the academic life to build and maintain the extremely popular pandas data science library for Python. Officially a full-time employee of Two Sigma, he exemplifies one sustainable version of open source success.



“Nearly 45 percent of the 74,000 statisticians in this country are between the ages of 25 and 34,” according to a new study based on Bureau of Labor Statistics data. In other words, millennials are dominating the field and getting paid well, too. Average pay is $71,000 but many make six figures.



Fei-Fei Li announced that Google is opening a new location in Beijing that will focus on Google Cloud and AI.


Farmers spot diseased crops faster with artificial intelligence

WQAD, CNN Money



After a year-long test of a variety of new technologies, Evergreen FS found artificial intelligence could identify trouble, such as fungus growth and water shortages, in corn and soybean crops weeks before the naked eye would ever realize it.

The tech, which comes from startup Ceres Imaging, offers farmers an AI analysis of photos taken from planes flying several thousand feet above fields. Previously, the technology was only available for orchards and vineyards.


Artificial Intelligence, NASA Data Used to Discover Eighth Planet Circling Distant Star

NASA



Our solar system now is tied for most number of planets around a single star, with the recent discovery of an eighth planet circling Kepler-90, a Sun-like star 2,545 light-years from Earth. The planet was discovered in data from NASA’s Kepler Space Telescope.

The newly-discovered Kepler-90i – a sizzling hot, rocky planet that orbits its star once every 14.4 days – was found using machine learning from Google. Machine learning is an approach to artificial intelligence in which computers “learn.” In this case, computers learned to identify planets by finding in Kepler data instances where the telescope recorded signals from planets beyond our solar system, known as exoplanets.


Why are we working so hard to open up science? A personal story.

Center for Open Science, Brian Nosek



In the Fall of 2011, Sarah Mackenzie, the maid of honor at my wedding, was diagnosed with a rare form of ovarian cancer. Sarah and her family were motivated to learn as much as they could about the disease to advocate for her care. They weren’t scientists, but they started searching the literature for relevant articles. One evening, Sarah called us, angry. Every time she found an article that might be relevant to understanding her disease, she ran into a paywall requiring $15-$40 to access it. Public money had paid for the research, yet she was barred from making any use of it. Luckily, she had us. Most people in Sarah’s position don’t have the luxury of friends at wealthy academic institutions with subscriptions to the literature.

During this time, I was pursuing an interest in the business models of scholarly communication. I wanted to understand the ways in which these models interfered with the dissemination of knowledge that could improve quality of life. Sarah’s experience illustrated one key part of the problem–the outcomes of research should be public goods, but the business models of publishing make them exclusive goods. Lack of access to published literature limits our ability to apply what we know to improving others’ quality of life. If doctors can’t access the literature, they can’t keep up with the latest innovations for care. If policy makers can’t access the literature, they can’t create evidence based policies. To advance solutions and cures, the outcomes of research must be open.

 
Deadlines



Excel team considering Python as scripting language: asking for feedback

“Microsoft is officially considering providing Python support in Excel (finally). If you are interested in this, please visit their UserVoice page and let them know what you think. Thank you!” https://excel.uservoice.com/forums/304921-excel-for-windows-desktop-application/suggestions/10549005-python-as-an-excel-scripting-language

SciPy 2018 General Conference Talk and Poster Presentations

Austin, TX. SciPy 2018, the seventeenth annual Scientific Computing with Python conference, will be held July 9-15, 2018. Deadline for submissions is February 9, 2018.
 
Tools & Resources



From Data Visualization to Interactive Data Analysis

Medium, Enrico Bertini



“Visualization projects with high visibility focus on two main purposes: inspiration and explanation. Visualization can however be used (and is actually used) to increase understanding of complex problems through data analysis. These projects are less visible but by no means less important.”


Bringing AI to your Workflow without Code: Stamplay adds Algorithmia

Algorithmia blog



“Stamplay can monitor and interact with hundreds of third-party services, providing an intuitive interface for building complex workflows (with loops, conditionals, and fault-tolerance) without ever writing a line of code — unless you want to. Algorithmia provides over 4,000 (and growing) microservices ranging from simple computational utilities to advanced deep-learning tools. In combination, these allow you to automate business and personal processes with the power of AI.”


Introduction to Random Forests

DataScience.com, inertia7, Raul Eulogio



“Random forests, also known as random decision forests, are a popular ensemble method that can be used to build predictive models for both classification and regression problems. Ensemble methods use multiple learning models to gain better predictive results — in the case of a random forest, the model creates an entire forest of random uncorrelated decision trees to arrive at the best possible answer.”

To demonstrate how this works in practice — specifically in a classification context — I’ll be walking you through an example using a famous data set from the University of California, Irvine (UCI) Machine Learning Repository.
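
As a rough illustration of the ensemble idea in the excerpt (this is not the linked article's example), the sketch below bags 25 one-split trees ("stumps"), each trained on a bootstrap sample and one randomly chosen feature, and classifies by majority vote. It is a simplified stand-in for a random forest: the names `fit_stump`, `fit_forest`, and `predict`, the synthetic two-feature data, and every parameter value are hypothetical, and a real forest would grow deeper trees, usually through a library.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Fit a one-split tree on a single feature.

    Returns (feature, threshold, below_label, above_label): predict
    above_label when row[feature] > threshold, otherwise below_label.
    """
    values = sorted(set(row[feature] for row in rows))
    best = None  # (errors, threshold, below_label, above_label)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        for below, above in ((0, 1), (1, 0)):
            errors = sum(
                (above if row[feature] > threshold else below) != y
                for row, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    if best is None:
        # Every sampled value is identical: fall back to the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return feature, values[0], majority, majority
    return feature, best[1], best[2], best[3]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(n_features)         # random feature choice
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest


def predict(forest, row):
    """Classify one row by majority vote over all stumps."""
    votes = Counter(
        above if row[feature] > threshold else below
        for feature, threshold, below, above in forest
    )
    return votes.most_common(1)[0][0]


# Hypothetical, cleanly separated toy data: class 0 near the origin,
# class 1 shifted up and to the right.
data_rng = random.Random(42)
rows = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(20)]
rows += [[data_rng.uniform(2, 3), data_rng.uniform(2, 3)] for _ in range(20)]
labels = [0] * 20 + [1] * 20

forest = fit_forest(rows, labels)
accuracy = sum(predict(forest, r) == y for r, y in zip(rows, labels)) / len(rows)
print(f"training accuracy: {accuracy:.2f}")
```

Replacing the stumps with deeper trees and sampling a subset of features at every split would bring this closer to the method the excerpt describes.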


Keyword spotting for Microcontrollers

GitHub – ARM-software



This repository consists of the TensorFlow models and training scripts used in the paper: Hello Edge: Keyword spotting on Microcontrollers. The scripts are adapted from TensorFlow examples and some are repeated here for the sake of making these scripts self-contained.


Automatic Speaker Recognition using Transfer Learning

Towards Data Science blog, Christopher Gill



“We sought to create a system able to quickly add user profiles and accurately identify their voices with very little training data, a few sentences at most! This learning from one to only a few samples is known as One Shot Learning. This article will outline the phases of our project in detail.”

 
Careers


Full-time positions outside academia

Data Engineer (crowdsourced, linguistic)



Allen Institute for Artificial Intelligence; Seattle, WA

Deputy Chief Information Officer



National Foundation on the Arts and the Humanities, Institute of Museum and Library Services; Washington, DC

Internships and other temporary positions

Boston Red Sox David Ortiz Fellowship



Boston Red Sox; Boston, MA
