Data Science newsletter – May 15, 2017

Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for May 15, 2017


 
 
Data Science News



How Collective Intelligence Will Drive The World’s Greatest Innovations

Heleo, Beth Comstock and Joi Ito



Beth Comstock is the Vice Chair of GE and is the leader of GE’s efforts to accelerate new growth. Every month, she leads a discussion with a game-changing author and thinker in GE’s Changemaker Book Club, streamed live on Facebook. Recently, she joined Joi Ito, director of the MIT Media Lab and co-author of Whiplash: How to Survive Our Faster Future, for a conversation on the effects of collective intelligence and AI on business leadership and why we shouldn’t have to ask for permission.


25 Examples of A.I. That Will Seem Normal in 2027

Inverse, Graham Templeton



In the last ten years, artificial intelligence has changed the world in subtle but sweeping ways, but it’s got nothing on the coming decade, if you look at what’s being developed today. Voice recognition on every smartphone was a simple proof of concept. Over the next 10 years, artificial intelligence will make more progress than in the fifty before it, combined. With countless quickly oncoming applications to business, government, and personal life, its influence will soon touch absolutely every aspect of our lives.

Here are 25 surprising ways that life and society will be forever changed by artificial intelligence over the coming decade.


What it’s really like to be a data scientist at an NYC tech company

Built In NYC, Taylor Majewski



We caught up with Illya Bomash, a managing data scientist at Knewton, to learn more about what it’s like on the company’s data science team.

What is the first thing you do when you come into the office every day?

Make an espresso, order lunch for the day and then try to get something big crossed off my to-do list. As a new parent, I get into the office earlier than I used to, and that morning time is a good chance to spend time on something that needs focus.


Using ‘Faked’ Data is Key to Allaying Big Data Privacy Concerns

SmartData Collective, Steve Jones



MIT is out of the blocks first once again with a technological development designed to fix some of the privacy issues associated with big data.

In a world where data analytics and machine learning are at the forefront of technological advancement, big data is becoming a necessary lynchpin of that process. However, most organisations do not have the internal expertise to deal with algorithm development and thus have to outsource their data analytics. This raises many concerns regarding the dissemination of sensitive information to outsiders.

The researchers at MIT have come up with a novel solution to these privacy issues. Their machine learning system can create “synthetic data” modelled on the original data set. The synthetic set contains no real data and can be distributed safely to outsiders for development and education purposes.

The synthetic data is a structural and statistical analogue of the original data set but does not contain any real information regarding the organisation.
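
To make "statistical analogue" concrete, here is a minimal sketch of generating synthetic rows from a numeric table. It is not the MIT system, whose internals are not described above. The function name `make_synthetic`, the sample records, and the choice to model every column as an independent Gaussian are all assumptions made for illustration.

```python
import random
import statistics

def make_synthetic(rows, n, seed=0):
    """Sample n synthetic rows that match each column's mean and std deviation.

    Assumption: every column is numeric and is modelled as an independent
    Gaussian, so correlations between columns are NOT preserved.
    """
    rng = random.Random(seed)
    columns = list(zip(*rows))  # transpose rows into columns
    params = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
    return [tuple(rng.gauss(mu, sigma) for mu, sigma in params) for _ in range(n)]

# Hypothetical "real" records: (age, annual spend)
real = [(34, 1200.0), (45, 1850.0), (29, 900.0), (52, 2100.0), (41, 1500.0)]
synthetic = make_synthetic(real, n=1000)
```

The synthetic rows follow the per-column statistics of the original table, but none of them is copied from it. A realistic generator would also have to preserve relationships between columns, which this sketch deliberately ignores.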


The Census Won’t Collect L.G.B.T. Data. That’s a Problem.

The New York Times, Praveen Fernandes



In my roughly 20 years working in the federal policy arena, few things have become clearer to me than the importance of data. If something is not counted, it is neither seen nor understood. For all intents and purposes, it does not exist.

That’s why the Trump administration’s decision not to collect data on the lives of lesbian, gay, bisexual and transgender Americans threatens these communities in ways that are both symbolic and practical.


Physics May Speed Up Solutions for Tough Computational Problems

University of Central Florida



A well-known computational problem seeks to find the most efficient route for a traveling salesman to visit clients in a number of cities. Seemingly simple, it’s actually surprisingly complex and much studied, with implications in fields as wide-ranging as manufacturing and air-traffic control.

Researchers from the University of Central Florida and Boston University have developed a novel approach to solve such difficult computational problems more quickly. As reported May 12 in Nature Communications, they’ve discovered a way of applying statistical mechanics, a branch of physics, to create more efficient algorithms that can run on traditional computers or a new type of quantum computational machine, said Professor Eduardo Mucciolo, chair of the Department of Physics in UCF’s College of Sciences.
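
For readers who want to see why the traveling salesman problem gets hard so quickly, the sketch below solves it by brute force: it tries every possible ordering of the cities. This is not the statistical-mechanics approach the researchers describe. The function names and the four-city example are hypothetical, chosen only to illustrate the problem.

```python
import math
from itertools import permutations

def tour_length(order, coords):
    """Total length of a round trip that visits the cities in the given order."""
    return sum(
        math.dist(coords[a], coords[b])
        for a, b in zip(order, order[1:] + order[:1])
    )

def brute_force_tsp(coords):
    """Try every ordering of the cities, keeping the first one fixed as the start."""
    cities = list(coords)
    start, rest = cities[0], cities[1:]
    candidates = ((start,) + p for p in permutations(rest))
    best = min(candidates, key=lambda order: tour_length(order, coords))
    return best, tour_length(best, coords)

# Hypothetical example: four cities on the corners of a unit square
coords = {"A": (0, 0), "B": (0, 1), "C": (1, 1), "D": (1, 0)}
best_order, best_length = brute_force_tsp(coords)
```

With n cities the loop examines (n - 1)! orderings, so four cities need only 6 checks while fifteen already need more than 87 billion. That growth is why faster methods attract so much research.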


Estimating the size of animal populations from camera trap surveys

German Centre for Integrative Biodiversity Research



Researchers develop new analytical methods that help them estimate the size of wild animal populations from a distance

Camera traps are a useful means for researchers to observe the behaviour of animal populations in the wild or to assess biodiversity levels of remote locations like the tropical rain forest. Researchers from the University of St Andrews, the Max Planck Institute for Evolutionary Anthropology and the German Centre for Integrative Biodiversity Research (iDiv) recently extended distance sampling analytical methods to accommodate data from camera traps. This new development allows abundances of multiple species to be estimated from camera trapping data collected over relatively short time intervals – information critical to effective wildlife management and conservation.


Government Data Science News

A hiring freeze at the CDC leaves 700 jobs unfilled including “dozens of budget analysts and public health policy analysts, scientists and advisers who provide key administrative support”.

Trump’s administration continues to make data difficult for researchers to access. Some of my colleagues worry that certain datasets, if deleted, may be impossible to recover. Climate data rescues continue to protect against outright loss of our collective climatological record, but there are many fields outside of climate science that have been less mobilized to preserve federal data most relevant to them. As a researcher and a citizen, it is terrifying to imagine how much knowledge we stand to lose if datasets are deleted. It’s maddening to think about how much time researchers are currently spending replicating and rehousing data that should be available from the federal government. Even if Trump doesn’t make all data disappear, the fear that he might (because we know he has) is enough to sap the time and optimism reserves of scientists.


Michael Eisen, UC-Berkeley evolutionary biologist and computational genomicist, has announced that he will run for a 2018 US Senate seat to represent California.

Yoshua Bengio gave a talk that Yann LeCun recommends.

LeCun also posted his reason for declining an invitation to speak in Saudi Arabia. It was the Facebook post of the week (we don’t have a real category for status-of-the-week, so it’s also the status-of-the-year):

Dear X,
Thank you very much for the invitation. I am sorry to have to decline it.
As an avowed atheist, I could be considered a terrorist, according to a 2014 public declaration by the Saudi Interior Ministry. The declaration defines terrorism as “calling for atheist thought in any form, or calling into question the fundamentals of the Islamic religion on which this country is based.”

I’m sure that there are plenty of closet atheists in Saudi Arabia, and they are probably doing just fine. But as a vocal advocate of atheism and rationalism, and as a humanist and a proponent of human rights (including the rights of women), I cannot accept an invitation from a country that sees me as the enemy.
Best regards,
— Yann LeCun

DARPA, under the direction of Hava Siegelmann, is supporting research to ensure artificially intelligent systems can learn from the past more comprehensively. Siegelmann explains that current systems are too closely associated with the particularities of training data and “susceptible to failures as they encounter the irregularities and unpredictability of real-world circumstances.” Her goal is to get beyond AI agents that are “just computational tools to help us solve problems” to a place where AI can be “responsive and adaptive”.

In Germany, driverless cars are now allowed to operate on public roadways as long as a human driver and a black box recorder are present and functioning at all times.

The FDA is setting up a digital health unit, which is logical, given the potential for cascading precision medicine and genomics applications of AI.


Microsoft’s Story Remix app Uses AI to Make Editing Videos Easy

AI Business



On Thursday Microsoft unveiled their new image recognition software which uses deep learning to help users edit their photos and videos on Windows 10 devices. Along with the company’s recent acquisition of two AI start-ups, this is more proof of how seriously Microsoft is taking AI.

Microsoft’s new Story Remix app will replace the previous Windows 10 Photos app, and it was showcased onstage by Microsoft executive Lorraine Bardeen.


Deep Speaker: an End-to-End System for Large-Scale Speaker Recognition

Baidu Research, Chao Li, Ajay Kannan and Zhenyao Zhu



Speaker recognition algorithms seek to determine the identity of a speaker from audio. Two common recognition tasks are verification (determining whether speakers are who they claim to be) and speaker identification (classifying the identity of an unknown voice among a set of speakers).

There are a variety of applications for this technology. For example, a voiceprint can be used to log in to a device. Speaker verification can also serve as an extra step of security for financial transactions. In addition, shared devices like smart home assistants can leverage this technology to personalize services based on the current user.
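
The difference between verification and identification can be sketched in a few lines. The example assumes each utterance has already been turned into a fixed-length embedding vector (how that happens is outside this sketch) and compares vectors with cosine similarity. The speaker names, the vectors, and the 0.8 threshold are made up for illustration and are not taken from the Baidu system.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(embedding, claimed, enrolled, threshold=0.8):
    """Verification: is this voice close enough to the claimed speaker?"""
    return cosine(embedding, enrolled[claimed]) >= threshold

def identify(embedding, enrolled):
    """Identification: which enrolled speaker is this voice closest to?"""
    return max(enrolled, key=lambda name: cosine(embedding, enrolled[name]))

# Hypothetical enrolled voiceprints and one new utterance
enrolled = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.5]}
utterance = [0.85, 0.15, 0.25]

accepted_as_alice = verify(utterance, "alice", enrolled)
accepted_as_bob = verify(utterance, "bob", enrolled)
closest_speaker = identify(utterance, enrolled)
```

Verification answers a yes/no question about one claimed identity, so it needs a threshold. Identification always picks the nearest of the enrolled speakers, so it needs the full set of voiceprints but no threshold.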


How Astronomers Study the Edge of the Universe

Business Insider, Darren Weaver, Jessica Orwig and Alana Kakoyiannis



As far as we know, the universe is not infinite; there’s actually a place where it ends. While astronomers have never actually seen the edge of the universe, they know it’s out there. Theoretical physicist and director of the Institute for Advanced Study at Princeton, Robbert Dijkgraaf, explains how scientists know there’s an edge of the universe. [video, pre-roll + 1:33]


Americans Divided on Whether Recent Science Protests Will Benefit Scientists’ Causes

Pew Research Center, Cary Funk and Lee Rainie



In the wake of last month’s marches for science and climate in Washington and around the country, Americans are divided in their support of the events’ goals and their sense of whether it will make a difference. In particular, a new Pew Research Center survey finds that most Democrats and younger adults are convinced that these public events will help the causes of scientists. By contrast, Republicans and older adults believe the marches will not raise public support for scientists, aid efforts to increase government funding of science, enhance the role of scientists in policy debates or lead to increased efforts to combat global climate change.

Overall, 44% of adults think the protests, marches and demonstrations will boost public support for science, while an equal share believe the protests will make no difference and 7% believe the demonstrations will actually hurt the cause. But on this and other questions about the marches there are consistent divides along political and generational lines. For example, 61% of Democrats and Democratic-leaning independents believe the marches will increase public support for science, while only 22% of Republicans and those who lean Republican say the same. Instead, 60% of these Republican backers think the protests will make no difference, compared with just 32% of Democratic partisans who think that.


U-M students create local business platform and App

University of Michigan News



Growing up in Los Angeles, Jake Fleshner and his brother liked to visit a local deli and then nearby stores before heading home. These local businesses often attracted a similar customer, so why weren’t they promoting each other to their customer bases?

That question eventually led to creating Spade America—an ad exchange that connects businesses that want to advertise with brick-and-mortars that want to monetize their physical location and foot traffic.

Jake Fleshner, now a University of Michigan sophomore studying sport management, connected with Logan Levin, a freshman at U-M’s Ross School of Business; Noah Witus, a School of Kinesiology sophomore; and Isaac Haft, a freshman in the College of Literature, Science, and the Arts, to commercialize the idea. Spade is meant to bring small businesses together that may share a customer base.


How One Startup Built Better Health Insurance With the Magic of Data

WIRED, Science, Megan Molteni



For 12.2 million Americans, signing up for health insurance in 2017 was a leap of faith: that Obamacare would make it through the year, that the health exchanges wouldn’t collapse, that premiums wouldn’t put their families on the street. For the 54,000 New Yorkers who used those exchanges to join Oscar—a millennial-beckoning insurance startup cofounded by Jared Kushner’s younger brother, Joshua—the 2017 enrollment period wasn’t just uncertain. It was, well, kind of bleak.

In July, Oscar’s members found out the company was halving its largest network, based in New York City, from 40,000 doctors to 20,000, from 77 hospitals to 31. That kind of forced breakup should have been a very tough sell.

But somehow, Oscar convinced almost all of those 50,000-plus members to stay. The pitch they made was simple, if not exactly intuitive: Choose better care and service over a thicker phonebook of doctors. They threw in perks like concierge care teams and direct scheduling from your smartphone to sweeten the deal. And it worked.


Why big-data analysis of police activity is inherently biased

The Conversation, William Isaac and Andi Dixon



Any attempt to curb the alarming rate of homicides in Chicago is laudable. But the city’s new effort seems to ignore evidence, including recent research from members of our policing study team at the Human Rights Data Analysis Group, that predictive policing tools reinforce, rather than reimagine, existing police practices. Their expanded use could lead to further targeting of communities or people of color.


Home Depot Deep Learning Competition At Georgia Tech

Machine Learning @ Georgia Tech



Teams of Georgia Tech students spent nearly 24 hours starting the evening of Friday, April 14th, racing to produce the best results on five challenging deep learning problems in a competition hosted by The Home Depot, Big O Theory Club, and The Agency. The problems included time series prediction, image generation, determining the gender of speakers, determining which department a product belongs to based on the product’s image, and determining which search engine would return the best results for different search terms. Fueled by cookies, coffee, and energy drinks, the teams produced remarkable results. Two teams completed all three challenges, and the best submissions used cutting-edge loss functions, fine-tuning techniques, data sampling algorithms, and time series analysis techniques to produce incredible results.

 
Events



Tackling machine learning complexity for data curation

ODX



Waterloo, ON, Canada. Ihab Ilyas, professor in the Cheriton School of Computer Science at the University of Waterloo and cofounder of Tamr, a startup focusing on large-scale data integration and cleaning, will be joining ODX for a talk on Tuesday, May 23, 2017 from 12:00 – 1:00pm at the Communitech Data Hub. [free]

 
Deadlines



ML Book Bundle Giveaway

One lucky winner in the U.S. or U.K. will receive all of the books that helped me get started with machine learning…and encouraged me to publish my latest book, Thoughtful Machine Learning with Python. Contest ends on May 21.

New Poll: What software you used for Analytics, Data Mining, Data Science, Machine Learning projects in the past 12 months?

Vote in KDnuggets 18th Annual Poll: What software you used for Analytics, Data Mining, Data Science, Machine Learning projects in the past 12 months? We will clean, analyze, visualize, and publish the results.

SOCIETY FOR JUDGMENT AND DECISION MAKING: CALL FOR ABSTRACTS

Vancouver, BC, Canada. The Society for Judgment and Decision Making (SJDM) annual conference will be held in Vancouver, British Columbia, November 10-13. The deadline for submissions is Monday, June 19.
 
Tools & Resources



The selfish scientist’s guide to preprint posting

Niko Kriegeskorte



The advantages of preprints for science and society are all well and good. However, we also need to think about ourselves. Does preprint posting mean that we give away our results to competitors, potentially suffering a personal cost for the common good? What is the selfish scientist’s best move to advance her personal impact and career? There is a risk of getting scooped. However, this risk can be reduced by not posting too early. It turns out that posting a preprint, in addition to publication in a journal, is advisable from a purely selfish perspective, because it brings the following benefits to the authors.


TriviaQA

Mandar Joshi, Eunsol Choi, Daniel Weld, Luke Zettlemoyer



“TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high quality distant supervision for answering the questions.”


A more useful infographic about free, legal easy Open Access

Jon Tennant, Fossils and Shit blog




Solving the “One Concurrency Control Does Not Fit All” Problem for OLTP Databases

Intel Science & Technology Center for Big Data, Dixin Tang and Aaron J. Elmore



In this post, we present a new transactional database system that adaptively changes data organization and concurrency control protocols in response to workload changes.

With the increasing memory sizes of modern servers, many OLTP databases can entirely reside in the main memory of a single machine. With the elimination of disk stalls and the increasing number of CPU cores available in a single server, however, concurrency control is now one of the major performance bottlenecks for OLTP database systems.

Recent efforts focus on optimizing concurrency control protocols to achieve higher-than-ever throughput. However, these protocols are typically optimized for specific workload characteristics, such as being read-heavy or highly skewed, and can suffer when faced with diverse workloads, a situation that can arise due to shifting or unknown access patterns. Therefore, the performance of a database using a single static concurrency control protocol may suffer in the presence of dynamic, unknown, or mixed workloads.


Proceedings of the Eleventh International Conference on Web and Social Media

AAAI



Montréal, Québec, Canada, May 15–18, 2017. Published by The AAAI Press, Palo Alto, California. [links to all papers]

 
Careers


Full-time positions outside academia

Deputy Director, Library Services



Institute of Museum and Library Services; Washington, DC
