Data Science newsletter – June 7, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for June 7, 2017

Data Science News

New eCornell data analytics courses help professionals excel

Cornell Chronicle

from June 05, 2017

Cornell’s SC Johnson College of Business and eCornell, the university’s online education department, are now offering professionals the opportunity to earn an executive certificate in data analytics that builds core fluency in data analysis and a foundation for further technical study.

“Data analysis requires professionals to be informed consumers of data. Technical knowledge is necessary, but it’s actually even more valuable to know which questions to ask, how to ask them, test them and translate them into business intelligence,” said Chris Anderson, associate professor in the SC Johnson College of Business and the certificate’s author. “Done well, data analysis provides a valid narrative business leaders can follow to make more successful strategic decisions.”

How GPUs and Deep Learning Are Helping Protect Endangered Species

NVIDIA Blog, Tony Kontzer

from June 05, 2017

Zoe Jewell and Sky Alibhai are co-founders of WildTrack, a nonprofit devoted to monitoring endangered species. In time for World Environment Day, they’ve launched a program called ConservationFIT, where FIT stands for “footprint identification technology.”

The program seeks to crowdsource photos of animal footprints, and then use those images to build algorithms that can identify the species, individual, sex and age-class of the animal who made them.

Flex Logix eFPGA IP chosen for Harvard 16nm deep learning IC

Electronics Weekly, David Manners

from June 05, 2017

Flex Logix, the eFPGA IP company, has won a slot in a deep learning IC being made on TSMC’s 16nmFFC process by Harvard’s John A. Paulson School of Engineering and Applied Sciences.

Flex Logix and the Harvard researchers have already completed the chip design and tape-out and are now going to fab.

Risk analytics enters its prime

McKinsey & Company, Rajdeep Dash, Andreas Kremer, Luis Nario, and Derek Waldron

from June 05, 2017

The recent dramatic increases in computing power have allowed banks to deploy advanced analytical techniques at an industrial scale. Machine-learning techniques, such as deep learning, random forest, and XGBoost, are now common at top risk-analytics departments. The new tools radically improve banks’ decision models. And techniques such as natural-language processing and geospatial analysis expand the database from which banks can derive insights.

These advances have allowed banks to automate more steps within currently manual processes—such as data capture and cleaning. With automation, straight-through processing of most transactions becomes possible, as well as the creation of reports in near real time. This means that risk teams can increasingly measure and mitigate risk more accurately and faster.

We use big data to sentence criminals. But can the algorithms really tell us what we need to know?

The Conversation, Melissa Hamilton

from June 05, 2017

Today, judges across the U.S. use risk assessment tools like COMPAS in sentencing decisions. In at least 10 states, these tools are a formal part of the sentencing process. Elsewhere, judges informally refer to them for guidance.

I have studied the legal and scientific bases for risk assessments. The more I investigate the tools, the more my caution about them grows.

The scientific reality is that these risk assessment tools cannot do what advocates claim. The algorithms cannot actually make predictions about future risk for the individual defendants being sentenced.

Before Silicon Valley, New Jersey Reigned As Nation’s Center Of Innovation

NPR, All Tech Considered, Laura Sydell

from June 05, 2017

Edison invented the modern lab — teams of people working together, sharing ideas and perfecting devices. In the century after Edison, New Jersey became the place to set up shop if you wanted to invent. On top of all the other assets, the state had lots of inexpensive land available.

That’s what made it attractive to Bell Labs. It spread out over several sites in the middle of the state and created many of the technologies that paved the way for 20th century inventions. One of its former facilities, which once held 6,000 engineers and researchers, is in the suburb of Holmdel.

Institutional researchers and librarians unite!

OCLC Research, Hanging Together blog

from June 06, 2017

Last week I had the opportunity to present a poster at the annual meeting of the Association for Institutional Research (AIR), the primary professional organization for US institutional research (or IR) professionals. The IR professionals I spoke with expressed frustration with the ability to collect high quality, reliable information about the research productivity at their institutions.

Long reads and ear candy

Nature has a smart weekly podcast aimed at scientists.

Jevin West and Carl Bergstrom, colleagues at UW-Seattle, launched a course designed to allow people to call bullshit on data science. They argue humans have evolved to detect lies and the lying liars who tell them, but we may struggle when those lying liars are, say, algorithms or bad training data. If you are at NYU and you like the sound of this, you could take my class, Data Rules, which is a historically rooted neoliberal critique of how we make truth claims using data. Not the same boat, but definitely the same stream of thought.

Some of us have wanted to call BS on all-male panels. Marilee Talkington brought a nearly all-male panel to a pin-drop quiet moment of contemplation with her simple, powerful strategy for ensuring the woman panelist was able to speak.

The OKCupid blog! It’s a must-read and great for teaching stats to students who do not inherently care about stats. This month OKCupid has a feature about age gaps in dating. Women over 35 are more likely to get responses when they message younger men. “When men message women, women tend to respond most often to men around their own ages. But when women message men, they’re actually more likely to get a response from younger men than they are from older ones.” Not all posts are great for undergraduate classrooms; the piece suggesting June is the hottest month for one-night stands covers a topic I’d rather avoid in class.

Liesl Yearsley, who sold her AI agent Cognea to IBM, reveals what people want from AI agents: “I always assumed we would want to keep some distance between ourselves and AI, but I found the opposite to be true. People are willing to form relationships with artificial agents, provided they are a sophisticated build, capable of complex personalization. We humans seem to want to maintain the illusion that the AI truly cares about us.”

Securing the cloud: Inside the high-tech, high-stakes race to keep the cloud safe, secure and empowering for all

Microsoft, Story Labs

from June 06, 2017

At any point in time on any day of the week, Microsoft’s cloud computing operations are under attack: The company detects a whopping 1.5 million attempts a day to compromise its systems.

Microsoft isn’t just fending off those attacks. It’s also learning from them.

All those foiled attacks, along with data about the hundreds of billions of emails and other pieces of information that flow to and from Microsoft’s cloud computing data centers, are constantly being fed into the company’s intelligent security graph.

Why aren’t Google and Facebook enriching our lives?

Philip Greenspun's Weblog

from June 05, 2017

Since I’m a Gmail user, let’s take Google as an example. Google knew that I was going to Moscow (itinerary emailed to my Gmail address). Google knew my schedule (Calendar). Google should know my various interests by now, from reading my Gmail messages and Docs content. Due to me being of such an advanced age that I still use email rather than text, Google definitely knows my real social network (the people with whom I correspond via email).

Why didn’t Google suggest to me a whole bunch of cultural events? People to meet? Groups to join?

Deep analysis of police body cam footage shows pattern of less respectful speech towards black people

TechCrunch, Devin Coldewey

from June 06, 2017

Police body cameras don’t just record shootouts and other crises — they’re on all day every day, giving a view in unprecedented detail into officers’ interactions with the public. If you can handle that volume of data, that is. Researchers at Stanford created a method to do just that, and soon found that officers in Oakland consistently used less respectful language when speaking with black people.

Now, this wasn’t some big, obvious difference. It took the analysis of half a million words spoken by Oakland police officers in thousands of interactions to surface — but that’s the strength of having this amount of data to draw from. Perhaps otherwise this difference in manner would be written off as imagined.

Apple’s WWDC Keynote Lays Out a Vision for the Future

The Atlantic, Alexis C. Madrigal

from June 05, 2017

Without once saying the words “artificial intelligence,” a stream of Apple executives described a vision of the near future in which Siri, the company’s AI avatar, stitches together the company’s many hardware products.

And they introduced a new—and widely anticipated—entry into their lineup: a $349 cylindrical voice-controlled speaker they call HomePod.

After a strangely dystopian video in which Apple’s apps go away and the world plunges into post-apocalyptic violence, Apple CEO Tim Cook led off the company’s keynote at its big gathering for coders, the Worldwide Developers Conference, in San Jose.

The WWDC keynote tends to be a place where Apple showcases all the little incremental “refinements” they are making to their software and hardware. This year, however, there was a thread that ran through many presentations: Siri.

Their code was used to hack Sony and create ‘WannaCry.’ Meet the ‘Lazarus Group,’ the armed robbers of the Internet

Los Angeles Times, Matt Pearce

from May 18, 2017

On Feb. 4, 2016, as employees left work to enjoy their weekends, the central bank of Bangladesh began firing off dozens of transfer orders to the Federal Reserve Bank of New York, asking to remove money from its accounts — almost $1 billion.

It was a heist. The robbers hadn’t walked in with guns or tunneled into a vault to get the money. They’d hijacked the bank’s computer systems to access an international financial network, SWIFT, which shunts around billions of dollars a day. The invisible thieves made off with $81 million before officials halted the geyser of cash.

The attack’s audacity, and the weaknesses it exposed, stunned bankers and financial regulators. Months later, cybersecurity researchers concluded that it was yet another notch in the belt of one of the most destructive hacker collectives on the Internet, the “Lazarus Group,” previously accused of being behind the devastating 2014 Sony Pictures Entertainment hack and other attacks — and working for North Korea.

Events

Free, Short-Course on Adaptive Enrichment Designs: Methods and Software

Michael Rosenblum

from June 13, 2017

Washington, DC. Tuesday, June 13. Instructor: Michael Rosenblum from Johns Hopkins Bloomberg School of Public Health. [free, registration required]

Tutorials Schedule | SciPy 2017

SciPy 2017

from July 10, 2017

Austin, TX. The SciPy Conference kicks off with two days of tutorials (July 10-11) that take place before the general conference. [$$$]

Deadlines

AI For Everyone Challenge

The “AI For Everyone” Challenge is a $1 million contest, organized by CrowdFlower, that is specifically aimed to help make a difference in our world – from non-profit ventures to scientific research. Deadline for submissions is June 15.

ICML 2017 RML Workshop

Sydney, Australia. Aug 11. Deadline for submissions is June 17.

Open Review of IRIDIUM Research Data Management Glossary is now invited and is open until 30 June 2017

The International Research Data Management (IRiDiuM) Glossary (http://www.codata.org/working-groups/standard-glossary-for-research-data-management-iridium) and (http://dictionary.casrai.org/Category:Research_Data_Domain) is intended as a practical reference for individuals and working groups concerned with the improvement of research data management, and as a meeting place for further discussion and development of terms.

NYU Center for Data Science News

Collecting data to investigate discrimination in the NYC Rental Market

NYU Center for Data Science

from June 05, 2017

In “Can the Government Deter Discrimination?” our Moore-Sloan postdoctoral researcher, Andrew Guess, along with Albert Fang (Yale) and Macartan Humphreys (Columbia), conducted a 20 month experiment from April 2012 to December 2013 in partnership with the NYC municipal government to explore the extent of racial discrimination in the city’s rental market.

The researchers began by scouring daily rental ads on Craigslist, the primary advertising platform for NYC rental listings, and randomly selected the ads where potential tenants were invited to inquire by phone.

10,000 New Yorkers. 2 Decades. A Data Trove About ‘Everything.’

The New York Times, Marc Santora

from June 04, 2017

Your phone, in all likelihood, knows more about you than your doctor. Your credit card company knows your likes and dislikes better than your closest friend. Google knows your thoughts, and even completes your sentences. Your telephone service provider knows where you are at all times. Facebook, for many, knows more than the rest combined.

But Paul W. Glimcher, a neuro-economist at New York University, looks at all that data and sees a “train wreck.”

For all of Silicon Valley’s cheerleading of “big data,” Mr. Glimcher said it had yet to be used to effectively solve some of society’s most vexing problems.

So he was intrigued when Miyoung Chun, the executive vice president for science programs at the Kavli Foundation and a leader of the Obama administration’s Brain Initiative, approached him five years ago about what the future of big data might look like.

Tools & Resources

DataScience.com Partners With RStudio to Bring Collaborative Data Science to Enterprise Customers Working With R

DataScience.com

from June 05, 2017

In keeping with its mission to empower data science teams with the tools they already love, enterprise data science platform provider DataScience.com has partnered with software company RStudio to bring the collaborative features of its platform to teams that work with the programming language R.

A Roadmap For Augur and What’s Next

Medium, Augur Project

from June 05, 2017

The Augur core dev team has been making progress towards our platform release. In the spirit of keeping everyone on the same page, we wanted to inform the community about what you can expect to see in the near future.

What will be in the initial Augur release?

The ability to create markets: binary (yes/no), scalar (a range of values, i.e. how many iPhones will Apple sell in Q4, how many inches of rain will fall in x location, etc.), and categorical (will it be Trump, Hillary, Johnson, or other?), the ability to trade, and the ability to resolve markets using reporting with both backstops implemented will be in the initial launch.
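For readers who think in code, the three market types named above can be pictured roughly as follows. This is only an illustrative sketch in Python, not Augur's actual data model or API: `MarketType`, `Market`, and `is_valid_outcome` are hypothetical names, and the example questions and ranges are made up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence, Union


class MarketType(Enum):
    BINARY = "binary"            # yes/no
    SCALAR = "scalar"            # a value inside a numeric range
    CATEGORICAL = "categorical"  # one choice from a fixed list


@dataclass(frozen=True)
class Market:
    """Hypothetical container for one prediction market."""
    question: str
    kind: MarketType
    low: Optional[float] = None        # used by scalar markets only
    high: Optional[float] = None       # used by scalar markets only
    categories: Sequence[str] = ()     # used by categorical markets only

    def is_valid_outcome(self, outcome: Union[bool, float, str]) -> bool:
        """Check that a reported outcome has the right shape for this market."""
        if self.kind is MarketType.BINARY:
            return isinstance(outcome, bool)
        if self.kind is MarketType.SCALAR:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(outcome, bool) or not isinstance(outcome, (int, float)):
                return False
            if self.low is None or self.high is None:
                return False
            return self.low <= outcome <= self.high
        return outcome in self.categories


rain = Market("Inches of rain in location x?", MarketType.SCALAR, low=0.0, high=60.0)
winner = Market("Who wins?", MarketType.CATEGORICAL,
                categories=("Trump", "Hillary", "Johnson", "other"))

print(rain.is_valid_outcome(12.5))        # True
print(rain.is_valid_outcome(75))          # False
print(winner.is_valid_outcome("other"))   # True
```

Trading and the reporting step that resolves markets are left out; the sketch only shows how the three types constrain what counts as an outcome.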

Jupyter Insights: Andreas Mueller

O'Reilly Radar

from June 06, 2017

Andreas Mueller, a lecturer at the Data Science Institute at Columbia University and co-author of Introduction to Machine Learning with Python, shares his thoughts on the current and future state of Jupyter.

NeuroNER

GitHub – Franck-Dernoncourt

from June 05, 2017

Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

Simone – A Distributed Simulation Service

Medium, Netflix TechBlog

from June 06, 2017

Simone is a service that enables configuration, deployment, and execution of simulations within arbitrary domains throughout the Netflix environment.

Databricks releases serverless platform for Apache Spark along with new library supporting deep learning

TechCrunch, John Mannes

from June 06, 2017

Today to kick off Spark Summit, Databricks announced a Serverless Platform for Apache Spark — welcome news for developers looking to reduce time spent on cluster management. The move to simplify developer experiences is set to be a major theme of the event overall. In addition to Serverless, the company also introduced Deep Learning Pipelines, a library that makes it easy to mix deep learning frameworks with Spark.

If you haven’t been following the latest developments in cloud-based data processing, Databricks is the commercial manifestation of the open-source Apache Spark project. The company’s engineers spend their days building tools to support the Spark ecosystem, like those being announced today.

Build More Intelligent Apps with Core ML

Apple Developer Documentation

from June 05, 2017

Now you can bring intelligent features, such as face tracking, text detection, language identification, and much more into your apps with the new foundational machine learning technology that is used across Apple products. Core ML delivers blazingly fast performance with easy integration of machine learning models.

Managing software teams in academia and industry

Medium, Bits and Behavior blog, Andy J. Ko

from June 05, 2017

“I’m not going to pretend to be an expert on software project management, in academia or industry. That said, I have spent a lot of time managing software teams. I managed a team of eight engineers at AnswerDash for two years. I’ve managed teams of researchers. And nearly every class I’ve taught has involved group work” … “So why is teamwork so hard?”

Python Imgstore

loopbio blog

from May 30, 2017

“Imgstore is a library that we built for efficient reading and writing of large video recordings. By splitting long recordings into chunks and implementing efficient indexing alongside, it allows frame accurate seeking and random frame access.”
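The chunk-plus-index idea in that description can be sketched in a few lines of Python. This is a toy illustration of the general technique, not imgstore's own API or file format: `ChunkIndex`, `from_chunk_lengths`, and `locate` are hypothetical names, and the chunk lengths are made up.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChunkIndex:
    """Maps a global frame number to (chunk number, offset inside that chunk)."""
    chunk_starts: List[int]  # global frame number of the first frame in each chunk
    total_frames: int

    @classmethod
    def from_chunk_lengths(cls, lengths: List[int]) -> "ChunkIndex":
        starts, running = [], 0
        for n in lengths:
            starts.append(running)
            running += n
        return cls(starts, running)

    def locate(self, frame: int) -> Tuple[int, int]:
        if not 0 <= frame < self.total_frames:
            raise IndexError(f"frame {frame} out of range")
        # Binary search for the last chunk whose first frame is <= frame.
        chunk = bisect_right(self.chunk_starts, frame) - 1
        return chunk, frame - self.chunk_starts[chunk]


# A recording split into three chunks of 500, 500 and 250 frames.
index = ChunkIndex.from_chunk_lengths([500, 500, 250])
print(index.locate(0))     # (0, 0)
print(index.locate(500))   # (1, 0)
print(index.locate(1249))  # (2, 249)
```

Because the lookup is a binary search over chunk start positions, seeking to an arbitrary frame only requires opening the one chunk that contains it, which is the property the quoted description attributes to splitting recordings into indexed chunks.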

Careers

Postdocs

Postdoctoral Research Associate in Global Synthesis of Large-Scale Land Acquisitions

University of Alabama, Department of Geography; Tuscaloosa, AL

Full-time positions outside academia

Data Visualization Engineer

Capital One; Washington, DC
