Data Science newsletter – April 4, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for April 4, 2017

 
 
Data Science News



A look at deep learning for science

O'Reilly Radar, Prabhat


I recently chatted with O’Reilly’s Jon Bruner on the Bots Podcast. We discussed the architecture of the LBNL supercomputing center, the push to integrate deep learning libraries within this architecture, and some compelling use cases for deep learning methods (such as object or pattern detection) that must scale for extremely large data sets. [audio, 26:21]


What AI can and cannot do today

Network World, Steven Max Patterson


Artificial intelligence has been called the next big thing. Neural science professor and AI entrepreneur Gary Marcus sets our expectations.


Faculty to Consider New Data Science Course Requirement

The Harvard Crimson, Joshua J. Florence and Mia C. Karr


Future Harvard students may be required to take a new category of course entitled “Thinking with Data” should the Faculty approve a motion that will first be discussed at Tuesday’s Faculty meeting.

“Thinking with Data” courses, according to a report from a subcommittee tasked with developing an outline of the requirement, could take several forms: they could be created from scratch or reshaped from existing introductory Statistics department classes. The courses would aim to teach students how to acquire and analyze data.


An AI Chat Solution That Transforms the Way Businesses Talk to Customers

Cornell Tech, News & Views


A recent survey conducted by BT found that 85 percent of people who’ve called a business that doesn’t answer won’t call back, and 75 percent won’t leave a voicemail. When confronted with the possibility of having to wait in line behind several other callers, most people — around 80 percent — hang up. For small businesses around the world, this adds up to billions of dollars lost all because these companies don’t have the bandwidth to manage every call and voicemail.

Enter Bowtie, a company founded by Cornell Tech alumni Ron Fisher, Johnson Cornell Tech MBA ’16, Mike Wang and Vivek Sudarsan, Masters of Computer Science ’16. Bowtie is a streamlined portal for businesses to interact with their customers through chat, without the frustration and hassle of relying on the analog service a telephone provides. Focusing on the beauty, wellness and fitness spaces, Bowtie helps businesses turn missed calls into bookings using an artificial intelligence (AI) assistant.


Spaces for debate: How important are blogs for the dissemination of research?

SAGE Connection – Insight


Over the past decade, blogging has become an influential platform for international debate and the expression of ideas. In celebration of this, the Duckie Awards, known more formally as the Online Achievement in International Studies (OAIS) Awards, highlight the biggest achievements in the field of international relations annually.

SAGE Publishing has been delighted to partner with and sponsor the Duckies since their launch in 2013, working together with the ISA to support our shared goal of enabling debates around public policy and the social sciences.


Amazon’s shopping spree at business schools

FT.com, Jonathan Moules


Amazon, together with Google and Apple, is now among the biggest recruiters from top-tier business school campuses around the world, rivalling traditional employers such as investment banks. Only consultancies hired more MBA graduates than Amazon at London Business School and Berkeley’s Haas School of Business in California last year. The online retailer was the number-one recruiter of MBAs at the University of Michigan’s Ross School of Business in 2016.


How three Michigan students turned a class project into $3 million

CNBC, Ashley McHugh-Chiappone


As University of Michigan students and friends, they struggled to find viable college housing. Cribspot was born out of an entrepreneurship class to streamline the rental process for university students.

Jones and his co-founders — Jason Okrasinski and Evan Dancer — launched Cribspot in the fall of 2012 in Ann Arbor, Michigan, each of them turning down other job offers that would have sent them to different areas of the country. “We wanted to build a tenant-facing brand that tenants really love,” said Jones.


This Computer Program Detects Cancer Earlier Than Ever—Without Surgery

VICE, Motherboard, Knvul Sheikh


Cancer is often found when someone starts to feel symptoms—pain, an abnormal growth, or maybe just fatigue. Now scientists have developed a computer program that could identify and locate cancer even before symptoms arise, opening the door for earlier screening and better treatment.

The program, called CancerLocator, detects tumor DNA in patient blood samples, and precisely pinpoints where the tumor is located in the body. In a small pilot study, it successfully diagnosed liver, lung, and breast cancer in 80 percent of cases, giving researchers hope that the program could eventually be used as part of regular health checks, eliminating the need for invasive biopsies.

Developed by researchers from the University of California, Los Angeles, and the University of Southern California, CancerLocator works by analyzing DNA that escapes into the bloodstream when cells die.


Baking Real-Time Analytics into the Bread Industry

RTInsights


Bakkersland, the largest baking company in the Netherlands, needed a better idea of day-to-day demand for its products so it could produce accordingly. The company, which has 300 trucks delivering 2 million fresh baked breads to 1,200 supermarkets every day, needed to reduce the amount of product left unsold on the shelves.


Northeastern officially opens new Interdisciplinary Science and Engineering Complex

Northeastern University, news @ Northeastern


Northeastern officially opened the Interdisciplinary Science and Engineering Complex, a facility that U.S. Sen. Elizabeth Warren called “a true temple of science” and that President Joseph E. Aoun declared “a piece of art” that will allow researchers, students, staff, and the community “to be at their best.”


UCLA Uses Chatbots As Radiology Consultants

Hospitals & Health Networks magazine


Interventional radiologists at the University of California at Los Angeles have harnessed artificial intelligence to create a virtual radiologist that provides clinical decision support to others.

“The idea was to create a tool for non-radiologist providers in the hospital to help them when they need an interventional radiologist. Some questions are complex, but some are routine and can be automated,” says Kevin Seals, M.D., a resident physician in radiology at UCLA and the programmer of the application. This is the first use of this type of AI to create a “chatbot” virtual consultant, he says.


MapD Raises $25M Series B to Drive Adoption of GPU-Powered Analytics

MapD


MapD Technologies has secured $25M in funding in a Series B round led by New Enterprise Associates (NEA) with participation from NVIDIA, Vanedge Capital, and Verizon Ventures.

This new investment will allow MapD to scale up its engineering, sales, and marketing teams to accelerate product development and deepen customer adoption. It will help us move faster toward our vision of making GPU-powered analytics ubiquitous in the enterprise.


17-year-old entrepreneur has Yelp in her crosshairs

The Next Web, Bryan Clark


After raising $280,000 in pre-seed funding, TagDat is off to a hot start. You’ve probably never heard of the recommendation app, as it launched just under a year ago and competes in the same space as heavy-hitters like Yelp and TripAdvisor. That could soon change.


Baidu’s AI team taught a virtual agent just like a human would their baby

TechCrunch, Darrell Etherington


Baidu’s artificial intelligence research team has achieved a significant milestone: teaching a virtual agent “living” in a 2D environment how to navigate its world using natural language commands, by first teaching it language through positive and negative reinforcement. The especially exciting thing, according to the scientists, is that the agent ended up developing a “zero-shot learning ability,” which essentially means that the AI agent developed a basic sense of grammar.

You probably don’t remember it from personal experience because it happened when you were a baby, but this is basically how parents teach their kids when very young. You show them images, repeat words and, eventually, with enough positive reinforcement, the kid can associate those words with those images and voilà — it knows the names of things.

Baidu’s big breakthrough, though, is that the agent within its system can apply commands it has learned to new situations.


Why this NYU Tandon prof is mapping out all the electricity used in the city

Technical.ly Brooklyn, Tyler Woods


“We have a treasure trove of information of 23,000 buildings across the city each year and it’s given us this incredible insight into how buildings are using energy and that can help us figure out which ones are performing well and which ones are performing poorly,” Kontokosta explained by phone last week.

That information comes out of a 2009 law passed by the city called Local Law 84, or the NYC Benchmarking Law. It requires large building owners to “enter their annual energy and water use in the U.S. Environmental Protection Agency’s (EPA) online tool, ENERGY STAR Portfolio Manager and use the tool to submit data to the City.”

 
Events



Applied Artificial Intelligence Conference 2017

Bootstrap Labs


San Francisco, CA By Bootstrap Labs, May 11, at Bespoke Events (Westfield San Francisco Centre, Level 4) [$$$]


Manuel Lima: The Book of Circles: Visualizing Spheres of Knowledge

92nd St Y


New York, NY Thursday, May 11, at 6:30 p.m., Lexington Avenue at 92nd St [$$]


New England Machine Learning Day 2017

Microsoft Research


Cambridge, MA The event will bring together local academics and researchers in machine learning and its applications. There will be a lively poster session during lunch. Submit posters at https://aka.ms/nemlposter by April 24.


Big Data Tech 2017

MinneAnalytics


Bloomington, MN June 7 at Normandale Partnership Center [$$$]


Learn about the Stern Center for Research Computing

NYU Stern School of Business


New York, NY Presentation Tuesday, April 11, at 3 p.m.

 
Deadlines



Topic Extraction Challenge

We invite you to contribute by submitting a paper on the comparison of topic extraction approaches and results. In particular, we encourage you to submit work-in-progress papers that present your solution to the topic extraction challenge using the Astro Data Set. The deadline for paper submissions for the challenge is April 10. A special session on the topic extraction challenge will be held at the upcoming ISSI conference, 16-20 October 2017 in Wuhan, China.

Fragile Families Challenge

The Fragile Families Challenge is physically housed in the Bendheim-Thoman Center for Research on Child Wellbeing at Princeton University. It is being organized by Matthew Salganik, Ian Lundberg, and Sara McLanahan. Apply now to participate.

2017 WiML Organizer Application

The Women in Machine Learning Organization (WiML) is seeking five self-identified women to serve as organizers for the 2017 WiML events. Deadline to apply is Monday, April 10.

New England Machine Learning Day 2017

Cambridge, MA The event will bring together local academics and researchers in machine learning and its applications. There will be a lively poster session during lunch. Submit posters at https://aka.ms/nemlposter by April 24.
 
Tools & Resources



Release 1.4.2-beta.1 · transcendent-ai-labs/DynaML · GitHub

GitHub – transcendent-ai-labs


DynaML is a Scala library/REPL for implementing and working with general machine learning models. Machine learning/AI applications make heavy use of entities such as graphs, vectors, and matrices, as well as classes of mathematical models that deal with three broad kinds of tasks: prediction, classification, and clustering.


How Machines Make Sense of Big Data: an Introduction to Clustering Algorithms

Medium, freeCodeCamp, Peter Gleeson


We humans take for granted how good we are at categorizing and making sense of large volumes of data pretty quickly. Whether it’s a paragraph of text, images on a screen, or a sequence of objects, humans are generally fairly efficient at making sense of whatever data the world throws at us.

Given that a key aspect of developing A.I. and Machine Learning is getting machines to quickly make sense of large sets of input data, what shortcuts are available?


Introducing PolYamoR, the two-way translator between Python and R

Dataiku, Florian Douetteau


After one year of intense development, we are proud to open source PolYamoR, the first forward and reverse-automated translation system for Python and R. PolYamoR is the first multi-lingual translation system that enables full transparency, no ambiguity, and manages all of the edge cases of complex programming. PolYamoR can translate plain Python into plain R and vice versa, leading to an unexpected new era of conversations between cultures.


Ten simple rules for responsible big data research

PLOS Computational Biology; Matthew Zook et al.


The use of big data research methods has grown tremendously over the past five years in both academia and industry. As the size and complexity of available datasets have grown, so too have the ethical questions raised by big data research. These questions become increasingly urgent as data and research agendas move well beyond those typical of the computational and natural sciences, to more directly address sensitive aspects of human behavior, interaction, and health. The tools of big data research are increasingly woven into our daily lives, including mining digital medical records for scientific and economic insights, mapping relationships via social media, capturing individuals’ speech and action via sensors, tracking movement across space, shaping police and security policy via “predictive policing,” and much more.

The beneficial possibilities for big data in science and industry are tempered by new challenges facing researchers that often lie outside their training and comfort zone. Social scientists now grapple with data structures and cloud computing, while computer scientists must contend with human subject protocols and institutional review boards (IRBs). While the connection between individual data points and actual human beings can appear quite abstract, the scope, scale, and complexity of many forms of big data create a rich ecosystem in which human participants and their communities are deeply embedded and susceptible to harm. This complexity challenges any normative set of rules and makes devising universal guidelines difficult.


Nesterov’s Punctuated Equilibrium

Ben Recht


Following the remarkable success of AlphaGo, there has been a groundswell of interest in reinforcement learning for games, robotics, parameter tuning, and even computer networking. In a landmark new paper by Salimans, Ho, Chen, and Sutskever from OpenAI, the authors show that a particular class of genetic algorithms (called Evolutionary Strategies) gives excellent performance on a variety of reinforcement learning benchmarks. As optimizers, the application of genetic algorithms raises red flags and usually causes us to close browser windows. But fear not! As we will explain, the particular algorithm deployed happens to be a core method in optimization, and the fact that this method is successful sheds light on the peculiarities of reinforcement learning more than it does about genetic algorithms in general.
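
As a rough illustration of the kind of method described in this excerpt, here is a minimal sketch of an evolution-strategy style update: perturb the parameters with Gaussian noise, measure how the objective changes, and step along the noise directions weighted by that change. This is not the code from the OpenAI paper or from the blog post. The function names, the quadratic test objective, the use of antithetic (mirrored) perturbation pairs, and every hyperparameter value are assumptions chosen only for illustration.

```python
import random


def evolution_strategy(f, theta, sigma=0.1, lr=0.05, population=50,
                       steps=200, seed=0):
    """Minimise f using a gradient estimate built from random perturbations.

    All defaults are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    theta = list(theta)
    n = len(theta)
    for _step in range(steps):
        grad = [0.0] * n
        for _member in range(population):
            # Draw one Gaussian direction per population member.
            eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
            # Antithetic pair: evaluate f on both sides of theta.
            plus = f([t + sigma * e for t, e in zip(theta, eps)])
            minus = f([t - sigma * e for t, e in zip(theta, eps)])
            # Finite-difference slope along eps, averaged over the population.
            scale = (plus - minus) / (2.0 * sigma * population)
            for i in range(n):
                grad[i] += scale * eps[i]
        # Plain gradient-descent step using the estimated gradient.
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def sphere(x):
    """Hypothetical test objective: sum of squares, minimised at the origin."""
    return sum(v * v for v in x)


start = [3.0, -2.0]
result = evolution_strategy(sphere, start)
print(sphere(start), sphere(result))
```

Seen this way, the update is a finite-difference gradient estimate along random directions followed by an ordinary descent step, which is one way to read the excerpt's remark that the algorithm is a core method in optimization.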


5 Data Tips for Machine Learning in Production

Making Dia blog, Ethan Rosenthal


1. You can’t measure that which you do not have

This may seem like an obvious statement akin to Wayne Gretzky’s (or Michael Scott’s) “You miss 100% of the shots you don’t take.” Just like that quote, it can be helpful to continually remind oneself that one first needs to record events before they can be measured.


Supervised learning is great — it’s data collection that’s broken

Explosion AI, Ines Montani & Matthew Honnibal


Short of Artificial General Intelligence, we’ll always need some way of specifying what we’re trying to compute. Labelled examples are a great way to do that, but the process is often tedious. However, the dissatisfaction with supervised learning is misplaced. Instead of waiting for the unsupervised messiah to arrive, we need to fix the way we’re collecting and reusing human knowledge.

 
Careers


Postdocs

Data Science Postdoctoral Scholar



University of California-Berkeley, School of Information; Berkeley, CA

Internships and other temporary positions

Machine Learning Intern (25942)



Dolby; San Francisco, CA
