Data Science newsletter – October 27, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for October 27, 2017


 
 
Data Science News



Amazon Deepens Ties With German AI Hub

Handelsblatt Global, Barbara Gillmann



Not to be outdone by Google’s latest advances in artificial intelligence, online retailing giant Amazon says it has partnered with the Max Planck Institute for Intelligent Systems, a leading AI research center in Germany.

According to details obtained by Handelsblatt, Amazon will pump nearly €2 million ($2.35 million) into the institute’s work on artificial intelligence and robotics. But the Seattle-based company is also going a step further: It plans to build its own research outfit in the southwest German city of Tübingen, where it will directly employ 100 scientists.


Caltech opens a drone lab, with big ideas to improve how robots work with humans

Los Angeles Times, Rosanna Xia



Caltech opened its new Center for Autonomous Systems and Technologies (CAST), where scientists hope to make these kinds of technological leaps.

The goal is to improve the ability of drones and robots to think and react independently. The more that happens, the more they will be able to help humans gather big data, respond to disasters and explore space, the deepest parts of the ocean and other unreachable corners of the world.


Company Data Science News

NuTonomy, a self-driving car company based in Boston, was acquired by automotive parts company Delphi. Delphi paid $450 million for the start-up that spun out of MIT in 2013.



Salaries for talented machine learning, NLP, and other deep data science people are reportedly hovering between $300,000 and $500,000. Of course, those salaries are only for top talent. The average data science salary is a healthy $150,000 or so, which is good, but not stratospheric. The challenge with such high salaries for those at the forefront of the field is that it can get hard to keep talented people working in academia. Next week I’ll have an article about brainpool.ai, a London-based start-up planning to launch in New York and Montreal that offers lucrative, flexible, project-based work to talented data scientists who may also be keeping their toes in the academic pool. Or not. Some of them make an income winning Kaggle competitions.
Stitch Fix is preparing for an IPO, which requires bankers to establish a valuation for the company. Were it to be considered a retailer, it might be valued at around 1.4 times estimated 2017 sales. But if it is considered an ecommerce company, its valuation would be much higher, at around 4.4 times 2017 sales. This is part of the reason why so many companies looking to IPO have an incentive to call themselves tech companies, even if they mostly sell consumer goods like razors, clothing, and mayonnaise.
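To make the gap between those two multiples concrete, here is a minimal arithmetic sketch. The sales figure is a hypothetical placeholder, not Stitch Fix's actual or estimated revenue; only the 1.4x and 4.4x multiples come from the item above.

```python
# Multiples quoted in the item above.
RETAIL_MULTIPLE = 1.4
ECOMMERCE_MULTIPLE = 4.4

# Hypothetical placeholder, NOT a real Stitch Fix figure.
hypothetical_sales = 1_000_000_000


def implied_valuation(sales, multiple):
    """Valuation implied by a simple price-to-sales multiple."""
    return sales * multiple


retail_value = implied_valuation(hypothetical_sales, RETAIL_MULTIPLE)
ecommerce_value = implied_valuation(hypothetical_sales, ECOMMERCE_MULTIPLE)

# The ratio does not depend on the sales figure chosen.
uplift = ECOMMERCE_MULTIPLE / RETAIL_MULTIPLE

print(f"Retail framing:    ${retail_value:,.0f}")
print(f"Ecommerce framing: ${ecommerce_value:,.0f}")
print(f"Uplift from relabeling: {uplift:.2f}x")
```

Whatever the real sales number turns out to be, the ecommerce label is worth roughly three times the retail label under these multiples.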



Terah Lyons, a former Obama administration staffer, has been hired to head up the Partnership on AI, a group backed by founding companies Facebook, Amazon, Google, Microsoft, and IBM. Immediately prior to this appointment, Lyons was a Mozilla Foundation fellow. Her work will continue to investigate the impact of AI on the economy and society.



Amazon launched a gift shop for handmade goods, which will be in direct competition with Etsy. Amazon is truly becoming the everything store, one sector at a time.

Amazon also deepened its partnership with the Max Planck Institute in Germany to work together on AI applications.

Oh, and apparently banks should worry that Amazon might be coming for their industry, too.

Boeing invested in a drone company called Near Earth Autonomy. It will be interesting to see what an aircraft manufacturer does with drone technology.



Western Digital, maker of the SanDisk product line, is selling high-capacity data storage cards (64-128 GB) with faster read (80 MB/s) and write (50 MB/s) speeds to meet new use cases in automotive, as well as in drones, surveillance cameras, and myriad industrial applications. That’s the update from the hardware front.

Microsoft stopped manufacturing the Kinect.

There’s an exciting new voice recognition start-up on the scene, R7 Speech Sciences, that promises to write an AI and engineering blog. Founders Brian McMahon and Delip Rao posted an overview of the technology on arXiv.



Jeff Dean of Google Brain talked about how hard it is to make deep learning work with only a few examples, a topic frequently discussed all over the place lately. Ian Goodfellow tackled the question of training machines with no examples.



For a refreshing perspective on ‘how dumb AI is’, we have this interview with Facebook AI Chief Yann LeCun. Yann is awesome and funny. Always listen to him.

Facebook may have cybersecurity similar to that of a college campus, even though some wish it had defense-contractor-level security lockdowns. Point being: be careful what you share, like, and click on, and never post information about friends, including pictures, without their consent.



Netflix is raising another $1.6 billion in debt to finance more original content production and otherwise support operational costs. Should be great viewing for all of its subscribers!



Robots will soon roam Walmart stores looking for errors.

Google Research announced OpenFermion, an open source chemistry package for quantum computers. This seems to be targeting what is currently a rather small group of experts, but that’s true of many nascent open source projects that go on to be widely adopted.


Extra Extra

Will Facebook create a chilling effect for new social media startups because it copies or buys whatever good ideas emerge outside the blue garden? Is this already happening?

The LA Dodgers and the Houston Astros will go head to head in a World Series between teams that rely heavily on statistics. They hire statisticians with PhDs from schools like Harvard, and it will be fun to see what other sports can learn from baseball. All that work allows the baseball analysts in the media to put these teams’ roster decisions under a microscope.

Just in time for Halloween, there’s a new dataset full of candy data. Seriously.


Exclusive: Microsoft Has Stopped Manufacturing The Kinect

Fast Company, Mark Wilson



Manufacturing of the Kinect has shut down. Originally created for the Xbox 360, Microsoft’s watershed depth camera and voice recognition microphone sold ~35 million units since its debut in 2010, but Microsoft will no longer produce it when retailers sell off their existing stock. The company will continue to support Kinect for customers on Xbox, but ongoing developer tools remain unclear. Microsoft shared the news with Co.Design in exclusive interviews with Alex Kipman, creator of the Kinect, and Matthew Lapsen, GM of Xbox Devices Marketing.


Introducing R7 Speech Sciences

R7 Speech Sciences



R7 Speech Sciences is an AI company focused on understanding spoken conversations. We believe in a voice-first future, and R7 was born out of our frustration with the existing voice products and their inability to capture nuances in conversations. By pushing the state-of-the-art in machine learning and speech science, harnessing massive datasets, and a strong team, we are building the next generation of voice products that go beyond just words. We are also introducing this blog which will be mostly science and engineering focused, other than the occasional product-related announcement.


CMU advanced manufacturing to anchor Mill 19 building at Hazelwood Green

TribLIVE, Aaron Aupperlee



Carnegie Mellon University will fill most of a new building slated for construction inside a massive abandoned steel mill in Hazelwood.

CMU announced Wednesday an agreement with the nonprofit Regional Industrial Development Corporation, owners of the iconic Mill 19 building in the Hazelwood Green development, to house the Advanced Robotics for Manufacturing Institute and the Manufacturing Futures Initiative.


Artificial Intelligence Can Be as Biased as Humans—But It Doesn’t Have to Be

NOVA Next, Christina Couch



In a brightly lit office, Joy Buolamwini sits down at her computer and slips on a Halloween mask to trick the machine into perceiving her as white.

For Buolamwini, a black PhD student at MIT’s Center for Civic Media, electronic racial deception is sometimes the most efficient way she can do her job. Buolamwini’s research focuses on facial analysis, a suite of technologies used in everything from auto-focusing smartphone cameras to advertisements to border security. But there’s a problem with many of these algorithms—they sometimes can’t detect Buolamwini or people who look like her.


Let your car tell you what it needs

MIT News



MIT team develops software that can tell if tires need air, spark plugs are bad, or air filter needs replacing.


Google Brain chief: Deep learning takes at least 100,000 examples

VentureBeat, Blair Hanley Frank



While the current class of deep learning techniques is helping fuel the AI wave, one of the frequently cited drawbacks is that they require a lot of data to work. But how much is enough data?

“I would say pretty much any business that has tens or hundreds of thousands of customer interactions has enough scale to start thinking about using these sorts of things,” Jeff Dean, a senior fellow at Google, said in an onstage interview at the VB Summit in Berkeley, California. “If you only have 10 examples of something, it’s going to be hard to make deep learning work. If you have 100,000 things you care about, records or whatever, that’s the kind of scale where you should really start thinking about these kinds of techniques.”


Is Dodgers v Astros the first all-computer World Series?

The Guardian, Les Carpenter



The future of the World Series does not wear Dodger blue or Astro orange. It won’t throw a strike, hit a home run or chase a line drive into the gap, though it can predict the probability of such things occurring with remarkable accuracy.

The future of the World Series lives not in the mortal realm, but in mainframes and clouds and flash drives and smartphones carried by men with pedigrees much loftier than half a lifetime in the worn fields of the minor leagues.

The World Series that starts Tuesday night in Los Angeles will feature two teams, in the Los Angeles Dodgers and Houston Astros, who use statistical analysis as their primary operational tool. While other Series teams have relied on analytics, including the Chicago Cubs and Cleveland Indians last year, never have there been two clubs who use it as much as the Dodgers and Astros.

Given their success, it’s hard to imagine more teams won’t try to imitate what they have done. The statistical revolution that has taken over baseball in recent years might be close to complete. If math has become a reliable driver to the World Series, then why won’t everyone fill their front offices with executives who can cull data to find trends?


Data science can improve aid distribution

Nature News & Comment, Editorial



One randomized controlled trial (RCT), reported in 2012 and involving 15,000 households in rural India, found no evidence of improved lung function in women in the first four years after they received a stove (see go.nature.com/2zjgwny).

The RCT suggests that these efforts might be revised. But as useful as RCTs are in development economics and global health, they have limits. Findings in one place might be wildly different in another. And in a crisis, first responders are typically too busy trying to provide shelter, health care and bare necessities to design and carry out a controlled set-up.

But humanitarian groups can still improve their efforts in the short and long term through evidence obtained with new technology. A Nature News Feature this week highlights software called the Dharma Platform, which enables workers on the front line of hurricanes, outbreaks or other crises to record, share and analyse useful data — for example, the spread of disease in rural villages. Dharma is being tested by Médecins Sans Frontières (or Doctors Without Borders), the World Health Organization and other groups combating crises in the Middle East. And it is just one of many new technologies that will make data faster to collect and easier to exchange.


Facebook’s security is like a “college campus,” but they face threats like a “defense contractor”

Boing Boing, Cory Doctorow



A leaked recording of Facebook security chief Alex Stamos (who refused to help with an illegal NSA spying program when he was CSO for Yahoo) has him describing the company’s IT culture as being “like a college campus, almost” while the company has the “threat profile of a Northrop Grumman or a Raytheon or another defense contractor.”

It’s an alarming revelation, given the sensitivity of the data Facebook holds on billions of internet users, including people who aren’t Facebook users but have their data recorded by Facebook through updates from their friends and Facebook cookies that are set and read on pages that have Facebook “Like” buttons or embeds.

Stamos says that the company’s IT culture is focused on giving “access to data and systems to engineers to make them ‘move fast'” which means that systems aren’t compartmentalized, so that an intrusion into one system can be leveraged to gain access to other systems.


Banks should fear the financial prowess of Amazon rather than fintech firms, says McKinsey’s global banking annual review

City A.M., Rebecca Smith



The global banking industry is failing to respond to a growing digital threat, that could also be harnessed to ramp up profits, according to McKinsey’s annual banking review.

And it’s not coming from the expected suspects of budding fintech firms, but the likes of Jeff Bezos’ ecommerce giant Amazon.

By tapping into the potential of new technology and robotic process automation, the cloud and apps, and developing “platform capabilities”, banks could secure an extra $350bn (£264bn) of profit over the next three to five years, the report said.


Computational Narrative Intelligence: Past, Present, and Future

Medium, Mark Riedl



Storytelling is an important part of how we, as humans, communicate, entertain, and teach each other. We tell stories dozens of times a day: around the dinner table to share experiences; through fables to teach values; through journalism to communicate important events, and in entertainment movies, novels, and computer games for fun. Stories also motivate people to learn, which is why they form the backbone of training scenarios and case studies at school or work.

Narrative intelligence is the ability to craft, tell, understand, and respond affectively to stories. Research in computational narrative intelligence seeks to instill narrative intelligence into computers. In doing so, the goal of developing computational narrative intelligence is to make computers better communicators, educators, entertainers, and more capable of relating to us by genuinely understanding our needs. Computational narrative intelligence is as much about human-computer interaction as it is about solving hard artificial intelligence problems.


Netflix will raise $1.6 billion in debt to fund new content and other ‘strategic transactions’

VentureBeat, Paul Sawers



Netflix has announced plans to raise another $1.6 billion in debt for “general corporate purposes” that may include new content, investments, and potential acquisitions, among other “strategic transactions,” according to a press release issued by the company.

The Los Gatos-based video streaming giant has made something of a habit of offering senior notes on a biannual basis as a means of funding new business initiatives. Indeed, exactly one year ago, the company announced a $1 billion debt offering, followed by another $1 billion six months ago. As with previous offerings, the notes will only be offered to “qualified institutional buyers” outside the U.S.

The news comes shortly after Netflix announced price increases for two of its three subscription tiers, which preceded its shares topping $200 for the first time.


Shift Technology raises another $28 million to prevent fraudulent insurance claims

TechCrunch, Romain Dillet



Shift Technology is raising a Series B round of $28 million from Accel and General Catalyst. Accel had already led the company’s previous round. Existing investors Elaia Partners and Iris Capital are also participating in today’s round.

And it seems like the startup has grown quite a lot since my last article. Shift Technology has now processed more than 100 million claims and revenue has grown by more than 800 percent.

Shift Technology uses artificial intelligence and huge data sets to detect patterns and flag fraudulent insurance claims. It is a software-as-a-service platform designed for insurance companies.


Data Visualization of the Week

The Washington Post, Xaquín G.V.




Tweet of the Week

Twitter, Shit Academics Say



 
Events



Data Services Research Day 2017 Registration

NYU Libraries



New York, NY “The Data Services Research Day at NYU Libraries is a showcase of geospatial and quantitative visualization in the NYU community. Students, faculty, and staff display maps and projects they’ve worked on and talk about the tools and data sources used to create them. Join us on the 5th floor of the Bobst Library on Wednesday, November 8, 2017 from 12:00 – 5:00 PM” [registration required]

 
Deadlines



NYC Media Lab, The Combine

“The Combine starts with an intensive, three month process that includes an entrepreneurship curriculum, an emphasis on market validation and customer development, and mentorship from executives at NYC Media Lab member companies.” Deadline to apply is November 19.
 
NYU Center for Data Science News



Using NLP to ask the right questions…about questions!

Medium, NYU Center for Data Science



CDS Deputy Director Arthur Spirling works with Cornell research team to create a framework for identifying the rhetorical function of different questions

 
Tools & Resources



Release SciPy 1.0.0

GitHub – scipy



“We are extremely pleased to announce the release of SciPy 1.0, 16 years after version 0.1 saw the light of day. It has been a long, productive journey to get here, and we anticipate many more exciting new features and releases in the future.”


SO MUCH CANDY DATA, SERIOUSLY

University of British Columbia, SCQ



“As promised, here is the candy hierarchy data for 2017.”


Speech Recognition Is Not Solved

Awni Hannun, Writing About Machine Learning blog



One of the most visible deficiencies in speech recognition is dealing with accents and background noise. The straightforward reason is that most of the training data consists of American accented English with high signal-to-noise ratios. For example, the Switchboard conversational training and test sets only have native English speakers (mostly American) with little background noise.

But more training data likely won’t solve this problem on its own. There are a lot of languages, many of which have a lot of dialects and accents. It’s not feasible to collect enough annotated data for all cases. Building a high quality speech recognizer just for American accented English needs upwards of 5 thousand hours of transcribed audio.
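For readers unfamiliar with the term, signal-to-noise ratio is usually reported in decibels as 10 * log10(signal power / noise power). The sketch below is an illustration only: the tone and noise samples are made-up toy data, and the helper names are hypothetical rather than anything from the blog post or the Switchboard corpus.

```python
import math


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


# Toy data (assumed for illustration): a unit-amplitude tone,
# a very quiet noise floor, and a loud one.
tone = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
quiet_noise = [0.01 * (-1) ** n for n in range(100)]
loud_noise = [0.5 * (-1) ** n for n in range(100)]

clean_db = snr_db(tone, quiet_noise)  # high SNR, like studio-quality audio
noisy_db = snr_db(tone, loud_noise)   # low SNR, like a crowded room

print(f"clean recording: {clean_db:.1f} dB")
print(f"noisy recording: {noisy_db:.1f} dB")
```

A recognizer trained mostly on recordings like the first case sees very few examples that resemble the second, which is the mismatch the excerpt describes.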

 
Careers


Tenured and tenure track faculty positions

Associate Professor – Information Access/HCI/Social Computing



University of Massachusetts-Amherst, College of Information & Computer Sciences; Amherst, MA

Postdocs

Postdoc opportunities in Multi-Agent Systems and Robotics



King Abdullah University of Science and Technology; Thuwal, Saudi Arabia

Postdoctoral Research Assistant in Robot Learning for Action and Interaction



University of Oxford, Department of Engineering Science; Oxford, England
