Data Science newsletter – April 25, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for April 25, 2017

Data Science News



How Hedge Fund Billionaire James Simons Is Changing Math Education

Forbes, Michela Tindera


FORBES spoke with Simons to discuss his own passion for math education and philanthropy.

Why is it important for kids today to focus on learning math?

Simons: Today, very many things are based on mathematics. Your cell phone, for example, has all kinds of algorithms built into it, which depend on, in some cases, pretty sophisticated mathematics. You don’t know that, and the average cell phone user doesn’t know that, but mathematics and quantitative methods in general has become more and more important in the economy. Not everyone has to know a great deal of mathematics, but on the other hand if you want to be an engineer you have to know math. If you want to be a scientist, you have to know math. Even today if you want to be a doctor, you have to learn a fair amount of mathematics because things are just more quantified than they were.


Requiring Data Storage In-Country Raises Corporate Risks

Bloomberg BNA, Bloomberg Law: Privacy & Data Security, George Lynch


The uptick in international laws demanding companies store personal data within geographic borders forces multinationals to engage in complicated risk assessments, privacy professionals said April 20.

Governments often demand data localization under the guise of protecting privacy, but such laws are really enacted to ease government access to data or to boost local cloud service providers, Lothar Determann, a data privacy partner at Baker McKenzie LLP in Palo Alto, Calif., said during a panel at the International Association of Privacy Professionals Global Privacy Summit 2017 in Washington.


Bloomberg’s Bruno Dupire on the importance of understanding how machines learn

Tech at Bloomberg


From startups to hedge fund giants, the finance industry is rapidly pushing beyond the large statistical models the quantitative community used for years, creating new artificial intelligence systems that analyze massive amounts of data and improve themselves without human intervention.

Yet, for all this innovation, it’s still common to view these smart, autonomous systems as black boxes. The tendency is to throw a lot of information into the system and hope that something good will come out of it. But, in order to realize the full potential of machine learning in finance, it’s critical in these early days to understand what the system is doing, how it’s learning, and what data it’s using.

It’s this theme – understanding what machines understand – that Bruno Dupire, renowned quant finance researcher and head of Quantitative Research at Bloomberg, explored at the sold-out “Machine Learning in Finance Workshop 2017” at The Data Science Institute at Columbia University earlier today.


The long game towards understanding dialog

Facebook Research, Alexandre Lebrun, Antoine Bordes, Leon Bottou, Marco Baroni


Getting to a natural language dialogue state with a chatbot remains a challenge and will require a number of research breakthroughs. At FAIR we have chosen to tackle the problem from both ends: general AI and reasoning by machines through communication as well as conducting research grounded in current dialog systems, using lessons learned from exposing actual chatbots to people. Our strength lies in embracing the diversity that spans both approaches, from long-term, fundamental research like the CommAI initiative to shorter-term applied efforts such as FastText or Facebook M. Through these, combined with our team’s expertise across the AI spectrum, from deep learning for NLP to reinforcement learning, computer vision, and engineering, we hope to deliver significant natural language dialog advancements.


FAIR research being presented at ICLR 2017

Facebook Research


Facebook AI (FAIR) researchers and engineers are converging at the 5th annual International Conference on Learning Representations (ICLR) 2017 in Toulon, France this week. ICLR brings together the top artificial intelligence and machine learning experts to discuss how to best learn meaningful and useful representations of data for application areas such as vision, speech, audio and natural language processing.

At ICLR they will be sharing their latest research in AI through 18 conference and workshop track papers, including a subset of their work on dialog systems.


Cornell Team Has Early Prototype for IARPA Fingerprint Technology Challenge

MeriTalk, Eleanor Lamb


Amit Lal, an electrical and computer engineering professor at Cornell University, is working with a group of students to create a device that can scan the surfaces of all 10 fingers in five minutes.

His team, which consists of postdoctoral, graduate, and undergraduate students, is competing in the Intelligence Advanced Research Projects Activity’s Nail to Nail Fingerprint challenge. Although groups have until July to showcase their technology, Lal told MeriTalk that his group has already built a device.

“We have early tech, which is very promising for what the program wants. It’s been a great deal of fun. It’ll be very educational for the students,” Lal said. “We’re taking it as it goes. We may be a little ahead of schedule.”


The Race To Build An AI Chip For Everything Just Got Real

WIRED, Business, Cade Metz


Companies like Google, Facebook, and Microsoft can still run their neural networks on standard computer chips, known as CPUs. But since CPUs are designed as all-purpose processors, this is terribly inefficient. Neural networks can run faster and consume less power when paired with chips specifically designed to handle the massive array of mathematical calculations these AI systems require. Google says that in rolling out its TPU chip, it saved the cost of building about 15 extra data centers. Now, as companies like Google and Facebook push neural networks onto phones and VR headsets—so they can eliminate the delay that comes when shuttling images to distant data centers—they need AI chips that can run on personal devices, too. “There is a lot of headroom there for even more specialized chips that are even more efficient,” LeCun says.

In other words, the market for AI chips is potentially enormous. That’s why so many companies are jumping into the mix.


Scientific discovery game significantly speeds up neuroscience research process

University of Washington, UW Today


A new scientific discovery game is allowing video gamers to significantly speed up reconstructing the intricate architecture of brain cells, a fundamental task in 21st century brain science.

Mozak, a new game developed by the University of Washington Center for Game Science in partnership with the Allen Institute for Brain Science, enables citizen scientists to produce complete, three-dimensional reconstructions of neurons from different regions of the brain in animals and people. Figuring out the different shapes of nerve cells is a fundamental first step in analyzing how they assemble into the vast circuits that make up our brain.


Can conflict be predicted?

ETH Zurich


Modern data science techniques can also be useful in conflict research. However, in an essay published in the journal Science, Lars-Erik Cederman, Professor of International Conflict Research at ETH Zurich, suggests that certain expectations regarding the predictability of armed conflict are unrealistic. ETH News caught up with him for a chat.


“Mindless Eating,” or how to send an entire life of research into question

Ars Technica, Cathleen O'Grady


Brian Wansink didn’t mean to spark an investigative fury that revisited his entire life’s work. He meant to write a well-intentioned blog post encouraging PhD students to jump at research opportunities. But his blog post accidentally highlighted some questionable research practices that caused a group of data detectives to jump on the case.

Wansink attracted the attention because he’s a rockstar researcher—when someone’s work has had such astronomical impact, problems in their research are a big deal. His post also came at a time when his field, social sciences, is under increased scrutiny due to problems reproducing some of its key findings.

Wansink is probably regretting he ever started typing. Tim van der Zee, one of the scientists participating in the ongoing examination into Wansink’s past, keeps a running account of what’s turned up so far. “To the best of my knowledge,” van der Zee writes in a blog post most recently updated on April 6, “there are currently 42 publications from Wansink which are alleged to contain minor to very serious issues, which have been cited over 3,700 times, are published in over 25 different journals, and in eight books, spanning over 20 years of research.”


Lyrebird claims it can recreate any voice using just one minute of sample audio

The Verge, James Vincent


Artificial intelligence is making human speech as malleable and replicable as pixels. Today, a Canadian AI startup named Lyrebird unveiled its first product: a set of algorithms the company claims can clone anyone’s voice by listening to just a single minute of sample audio.

A few years ago this would have been impossible, but the analytic prowess of machine learning has proven to be a perfect fit for the idiosyncrasies of human speech. Using artificial intelligence, companies like Google have been able to create incredibly life-like synthesized voices, while Adobe has unveiled its own prototype software called Project VoCo that can edit human speech like Photoshop tweaks digital images.

But while Project VoCo requires at least 20 minutes of sample audio before it can mimic a voice, Lyrebird cuts this requirement down to just 60 seconds.


Nvidia selects 5 most-disruptive AI startups

VentureBeat, Dean Takahashi


Nvidia is on a quest to find the most disruptive artificial intelligence startups. This quest is part of a larger contest, dubbed Nvidia Inception, which is screening more than 600 entrants to cull the best AI startups in three big categories.

We wrote about the first four candidates for the hottest emerging startup on Friday. And now we’re focusing on the next five candidates in the category dubbed the “most disruptive” startups.

Jen-Hsun Huang, CEO of Nvidia, hosted a Shark Tank-style event this week as part of the search to find the best AI startups. Huang and a panel of judges listened to pitches from 14 AI startups across three categories. These were filtered from the more than 600 contestants who entered the Nvidia Inception contest. The winners will walk away with $1.5 million in cash at a dinner on May 10 at Nvidia’s GPU Technology Conference.


What five years of early career research funding buys the world

Andrew J. Ko, Bits and Behavior blog


Whenever I close out a grant, I like to reflect on what I achieved with the money. Well, to be clear, NSF likes me to do that too, in the form of a final report of project outcomes. And as it should be: the average American gave me a tenth of a penny to do some research. What did it buy them?

This particular grant was my CAREER award, granted in 2009. This grant is given out to a select few faculty each year who “have the potential to serve as academic role models in research and education and to lead advances in the mission of their department or organization.” Really though, it’s an award given out for important research by new faculty.


A robot-delivery startup helped write state laws that are locking out competition

recode, April Glaser


Two U.S. states — Virginia and Idaho — have now passed laws to allow delivery robots to operate statewide.

The new laws, both of which were passed this year, were written with the help of Starship Technologies, a delivery-robot company based in Estonia that was founded by Ahti Heinla and Janus Friis, two of the co-founders of Skype.

While Starship isn’t currently working in Virginia or Idaho, the company can now legally operate its robot in those states — without a person controlling it — on sidewalks and crosswalks. Starship’s 40-pound robots are designed to deliver things like meals, groceries and other on-demand goods. Similar legislation is now being proposed in Wisconsin.

But other robot-delivery companies might not be able to take advantage of the new laws.


Data management startup Rubrik is raising up to $200M on a $1B valuation

TechCrunch, Ingrid Lunden


Make way for another juggernaut amongst enterprise startups: Rubrik, a data backup company that only emerged from stealth in 2015, is in the process of raising between $150 million and $200 million on a valuation of $1 billion as the company enters a period of strong demand for its storage and data management products, according to sources.


What challenges and opportunities does AI pose for healthcare?

MedCity News, Kapila Ratnam


In the short term, AI has immediate applicability within the administration and operation of a hospital or healthcare system. Think in terms of supply chain and inventory management within a hospital. For example, AI can be used to ensure a surgical room is correctly stocked with appropriate inventory based on the series of surgeries scheduled to be performed in the room that day. Additionally, inventory could be automatically tracked with an algorithm that is able to assess a surgeon’s needs and send instructions to a robot that can pick up and deliver appropriate supplies to the surgical room at the beginning of a day. Another scenario would be an algorithm automatically assessing the cost of supplies from multiple vendors and immediately analyzing which vendor delivers the best product at the lowest cost and promptly ordering the appropriate inventory – all done without human intervention.


To Secure Knowledge: A Task Force of the Social Science Research Council

Social Science Research Council


The Social Science Research Council is launching “To Secure Knowledge,” a task force that is born from the organization’s essential obligations to scholarship, the infrastructure of social research, standards of inquiry and evidence, and the role rigorous understanding plays in public affairs.


Is It Time to Break Up Google?

The New York Times, Sunday Review, Jonathan Taplin


In just 10 years, the world’s five largest companies by market capitalization have all changed, save for one: Microsoft. Exxon Mobil, General Electric, Citigroup and Shell Oil are out and Apple, Alphabet (the parent company of Google), Amazon and Facebook have taken their place.

They’re all tech companies, and each dominates its corner of the industry: Google has an 88 percent market share in search advertising, Facebook (and its subsidiaries Instagram, WhatsApp and Messenger) owns 77 percent of mobile social traffic and Amazon has a 74 percent share in the e-book market. In classic economic terms, all three are monopolies.

We have been transported back to the early 20th century, when arguments about “the curse of bigness” were advanced by President Woodrow Wilson’s counselor, Louis Brandeis, before Wilson appointed him to the Supreme Court.


Goergen Institute for Data Science provides new opportunities for collaboration

University of Rochester, NewsCenter


In 2016, Timothy Dye, a professor of public health sciences at the University of Rochester, traveled with a team to Puerto Rico to help local medical personnel deal with a Zika epidemic. In the process, they interviewed residents on their attitudes and living conditions, ending up with a voluminous amount of data.

When they returned home, Dye approached Jiebo Luo, an associate professor of computer science at Rochester who specializes in machine learning, data mining, and biomedical informatics. “Our operation is involved in converting research findings into applications, and that means making sense of massive amounts of data,” says Dye. “We knew Jiebo Luo was the right person for the job.”

Dye and Luo are just two of the more than 40 faculty members from across the University whose research either relies on or furthers the new and fast-developing field of data science. To harness their strengths, and facilitate collaborations such as the one between Dye and Luo, the University launched the Goergen Institute for Data Science in 2016.

 
Events



PresQT: Preservation Quality Tool Workshop 1

The Center for Research Computing


South Bend, IN. The first of two PresQT workshops convenes user communities and tool providers engaged in data and software preservation to assess needs, look at the successful tools of today, and brainstorm about the data and software preservation tools of the future. May 1-2 [free, registration required]


Future Trends in Nuclear Physics Computing

Thomas Jefferson National Accelerator Facility


Newport News, VA. May 2-5 at Thomas Jefferson National Accelerator Facility [$$$]

 
Deadlines



Seth Bonder Summer Camp in Computational and Data Science

The Industrial and Operations Engineering Department at the University of Michigan is running a new camp for high school students this summer, from July 24 to July 28, 2017. The Seth Bonder Summer Camp in Computational and Data Science for Engineering is intended for high school students interested in engineering who have little or no background in data science and computer programming.

PyData Seattle 2017 Call for Proposals

Seattle, WA. The conference is July 5-7. The deadline for proposals is May 1.

2017-18 TIER Faculty Fellowships: Call for Applications

Fellows are typically faculty members in any of the social sciences who teach research-oriented classes on statistical methods and/or supervise undergraduate or graduate students conducting quantitative research. Target date for applications is May 15.
 
Tools & Resources



Research Data – Global Inter-calibrated Nighttime lights

Seto Lab


Global inter-calibrated nighttime lights (NTLs) have been generated from stable NTL annual composite product (version 4) using a novel “Ridgeline Sampling and Regression” method (Zhang et al., 2016). Before use, all images will need to be re-scaled by multiplying pixel values with a scaling factor of 0.01. For more information, refer to Zhang et al. (2016).
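The re-scaling step described above can be sketched in a few lines. This is only an illustration of the stated rule (multiply each stored pixel value by 0.01); the function name `rescale_ntl` and the sample tile are hypothetical, and reading the actual image files would need a raster library that is not shown here.

```python
# Scaling factor stated in the dataset description.
SCALE_FACTOR = 0.01


def rescale_ntl(pixels):
    """Return a new grid with every stored pixel value multiplied by 0.01.

    `pixels` is a list of rows, each row a list of stored numeric values.
    """
    return [[value * SCALE_FACTOR for value in row] for row in pixels]


# Hypothetical 2x3 tile of stored values, made up for illustration only.
stored_tile = [
    [0, 150, 6300],
    [25, 1000, 420],
]

rescaled_tile = rescale_ntl(stored_tile)
for row in rescaled_tile:
    print(row)
```

The input grid is left unchanged, so the stored and re-scaled values can be compared side by side.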


Forget Sexy Data, Get Boring Sorted First

LinkedIn, Dan Barnett


As someone who goes to a lot of data-related events, it’s usually not long before someone mentions the Harvard Business Review article on Data Science being the ‘Sexiest Job of the 21st Century’ (similarly, my other main interest is football, where it’s rarely more than a couple of minutes into a talk before the Moneyball klaxon goes off).

The range and depth of things that will be possible with data are huge but there’s always the risk that some focus too much on the sexy part and not the boring bits that underpin the whole process.

 
Careers


Postdocs

Postdoctoral Scholar in Computational Affective Neuroscience



Caltech, Division of the Humanities and Social Sciences; Pasadena, CA

Tenured and tenure track faculty positions

Chair Professor/Professor/Associate Professor/Assistant Professor (Computational Social Science for Communication Research)



City University of Hong Kong, Department of Media and Communication; Hong Kong
