Data Science newsletter – January 16, 2017

Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for January 16, 2017


 
 
Data Science News



DARPA wants to create secure data-sharing tech

Computerworld, Sharon Gaudin



DARPA is kicking off a project to create a new way to enable U.S. troops working in remote areas around the world to securely send and receive sensitive information on their devices.

The Defense Advanced Research Projects Agency, the research arm of the Department of Defense, said it’s working on a project that would use software and networking technology to securely share information on unsecured commercial and military networks.


How Northeastern plans to reach equal male-female computer science enrollment by 2021

TechRepublic, Alison DeNisco



Carla Brodley, dean of the College of Computer and Information Science at Northeastern University, explains how to make computer science attractive to all students to fill workforce gaps.


Helping universities combat depression with mobile technology

The Conversation, Mi Zhang, David Mohr, Jingbo Meng



Using mobile technology for mental illness diagnosis and treatment is becoming a hot research topic nowadays because of the pervasiveness of mobile devices and their behavior-tracking capabilities. Building on others’ work, we have found a way to enhance counseling services with mobile technology and big data analytics. It can help students and clinicians alike, by offering a new tool for assessing depression that may shed increased light on a condition that is challenging to study.


University Data Science News

Northeastern University’s new dean for Computer and Information Science, Carla Brodley, is increasing joint majors, amplifying the CS minor, and offering two master’s degrees in CS (one for students with no CS experience) to boost the representation of women to 50 percent. Currently, women make up 16 percent of CS graduates, down from 30 percent at peak.

Gates Foundation asks its grantees to publish in open access journals. For some reason, the Nature News headline reads, “Gates Foundation research cannot be published in top journals.”

Carnegie Mellon University has a massive non-profit offshoot dedicated to robotics led by Gary Fedder. The Advanced Robotics Manufacturing Innovation Hub will live in a former steel plant. It received $80m from DoD plus $173m from a host of others. So far, “123 companies, 40 academic institutions, and 64 government and non-profit agencies” have signed on to be collaborators. Massive bet on the future.

Iztok Fister and Matjaz Perc found evidence of an ivory tower “citation cartel”. With a nod to the reputation-destroying false positive problem, they note, “We can only indicate that there is a high probability of citation cartel existence.”

Astrophysicist Matt Turk posted plans for astronomical software in 2017, writing in his new role as Chair of the Working Group for Astronomical Software in the American Astronomical Society.

Philip E. Bourne will lead the University of Virginia’s Data Science Institute starting in May, when he leaves his position as the associate director for data science at NIH, where he has been leading the Big Data to Knowledge Program.

Researchers at UCLA and Loyola Marymount University sequenced the DNA of over a hundred orders of restaurant and grocery store sushi. When you think you are eating halibut or red snapper? You probably aren’t. Tuna and salmon are far more likely to be properly labeled. The mislabeling can contribute to overfishing; in the case of halibut, 4 of 10 orders were a species of threatened flounder.

University of Southern California’s Translational Imaging Center developed a more powerful, faster, cheaper biological imaging tool that’s kind of like an internal black-light. The Hyper-Spectral Phasor (HySP) tags molecules with dye, then images tissue systems in 5D. So far, it works well in zebrafish models.

Cornell Tech is addressing the 40 percent of PhD-holders who cannot get a job at graduation by offering them space in an entrepreneurial incubator.

Russ Caflisch has been named Director of NYU’s Courant Institute of Mathematical Sciences. He will leave his current position at UCLA over the summer.

Alejandro Estrada, lead author of a study at the National Autonomous University of Mexico, finds that 60 percent of the world’s 504 primate species are sliding towards extinction due to farming, logging, and ranching.

NYU’s GovLab wants to build a Wikipedia-style platform for open data.


CMU Robotics center to focus on practical application

Pittsburgh Post-Gazette, Tracie Mauriello



Seventeen years after LTV Steel closed its Hazelwood operation, its site is being prepared for new life as a catalyst for the region’s — and the nation’s — next big industrial revolution.

A new nonprofit offshoot of Carnegie Mellon University is in negotiations with foundations that own the former industrial site known as Almono to make it the headquarters for a national effort to make robotics more affordable, adaptable and useful for manufacturers.

The Advanced Robotics Manufacturing Innovation Hub is being created with $80 million in start-up money from the Department of Defense and $173 million in commitments from partners including the Richard King Mellon Foundation. The enormous collaboration involves 123 companies, 40 academic institutions and 64 government and nonprofit agencies in 31 states.


How AI took center stage at CES 2017

VentureBeat, Sami Viitamaki, Havas



CES 2017 was set to be the biggest so far, with more exhibitors, products, and visitors than ever before in its 50 year history. But one signal sparked my interest above everything else: increasing focus on how technology brings value over technology per se.

It was hard to find anything not labeled “smart” or “intelligent” this year. While just a year ago AI seemed futuristic, people now expect it … and expect it to work straight out of the box. This is because we’ve reached sufficient maturity in what I call the 4Cs of useful AI: computing, connectivity, cognition, and convergence.


The Latest On The Cancer Moonshot Initiative

Stanford Medicine, Scope Blog, KQED Forum



Earlier this week, Vice President Joe Biden visited San Francisco and discussed the National Cancer Moonshot Initiative, a project aimed at accelerating cancer research and improving the prevention, detection and treatment of the disease.

Jan Liphardt, PhD, an associate professor of bioengineering and member of the Stanford Cancer Institute, is involved with the Moonshot project and was a guest on KQED’s Forum yesterday. During the hour-long show, which explored the promises and limitations of the initiative, Liphardt discussed why he got involved with the project (“I like a challenge,” he once said) and how the creation of CancerBase. [audio, 51:52]


Scientific ‘cartels’ band together to cite each others’ work

STAT, Ivan Oransky and Adam Marcus



They’re not the kind of gangs that smuggle drugs and murder people. But people looking closely at the scientific literature have discovered that a small number of scientists are part of a different kind of cartel — ones that band together to reference each other’s work, gaming the citation system to make their studies appear to be more important and worthy of attention.

These so-called citation cartels have been around for decades, as the publishing consultant Phil Davis has pointed out. Thomson Reuters, which until recently owned the Impact Factor for ranking journals, has even sanctioned periodicals for evidence of cartel behavior.


Unexpected Consequences of Self Driving Cars

Rodney Brooks



Many new technologies have unexpected impacts on the physical or social world in which we live.

When the first IMPs for the fledgling ARPANET were being built starting in 1969 at BBN in Cambridge, MA, I think it safe to say that no one foresaw the devastating impact that the networking technology being developed would have on journalism thirty to fifty years later. Craigslist replaced classified ads in newspapers and took a huge amount of their revenue away, and then Google provided a new service of search for things that one might buy and at the same time delivered ads for those things, taking away much of the rest of advertising revenue from print, radio, and TV, the homes of most traditional journalism. Besides loss of advertising cutting the income stream for journalism, and thus cutting the number of employed journalists, the new avenues for versions of journalism are making it more difficult for traditional print journalists to compete, as John Markoff recently talked about in announcing his retirement from the New York Times.

A way of sharing mainframe computer power between research universities ended up completely disrupting how we get our news, and perhaps even who we elected as President.

Where might new unexpected upendings of our lives be coming from?


“OK Facebook”—Why stop at assistants? Facebook has grander ambitions for modern AI

Ars Technica, Steven Max Patterson



Facebook’s machine learning pipeline—from research to production—is aimed at an AI future.


Company Data Science News

Robots at BlackRock’s quantitative hedge fund performed terribly in 2016, returning an overall loss.

Startup Blockstack received $4m in VC funding from Union Square Ventures to build “open-source software that will create a kind of parallel universe to the Web we know,” a universe where users get control over their data.

Microsoft acquired Montreal-based Maluuba, a natural language “understanding” platform based on deep learning methods advised by Yoshua Bengio. Elsewhere, The Economist ran an explainer on the state of hard natural language tasks like speech recognition, translation, and interpreting what language means. Oh, and a team at Carnegie Mellon launched DyNet, a dynamic neural network tool for NLP.

Wondering about cannabis research? A meta study run by the National Academies of Sciences, Engineering and Medicine summarized the state of our knowledge about pot consumption, concluding that we need much more research in this area. Use of medicinal marijuana does seem to relieve pain and reduce nausea for chemo patients.

Merck has teamed up with Palantir to develop drugs more quickly. This could lead to surging profits for both companies.

Google’s Francois Chollet (author of Keras) announced on reddit that TensorFlow will add the Keras library, making Keras the default TensorFlow API for now.

WhatsApp is still a secure messaging platform, contrary to a news report in the Guardian. Meitu, however, is an anime selfie app you should never download unless you want to let the Chinese government know what you’re up to.


Astronomical Software in 2017 and Beyond

Medium, Matthew Turk



At the end of 2016, I took on the role of Chair of the Working Group for Astronomical Software (WGAS) in the American Astronomical Society (AAS), attempting to fill in the unfillable shoes of Frossie Economou. I’m honored to be asked to take on this role, and I hope I live up to the expectations of the society and its members.

Below, I’ve outlined a few of the things I want to prioritize in the WGAS; they can be summarized as the problems of recognition and domain specificity. Having these discussions occur in the context of an AAS-recognized working group provides the opportunity to leverage the membership of AAS and to present findings to the AAS Council, where they can help shape policy. I see the WGAS as an instrument of social change in our professional society.


Udacity launches deep learning nanodegree foundation program

TechCrunch, John Mannes



Greater compute power and power efficiency have made deep learning algorithms ubiquitous in our world. Deep learning has found its way into self-driving cars, convenience stores and hospitals. Yet the fight for top talent in the space remains fierce and is a bottleneck for reaching new industries and solving tough challenges. To complement Udacity’s previous AI courses, the online education startup is partnering with YouTube star Siraj Raval for a new deep learning nanodegree foundation program that will be co-taught with Udacity’s Mat Leonard.

Foundation Programs are going to be a major focus for Udacity in the coming year. They fill a niche for students not quite ready for a full nanodegree program. The new course promises a lot in a short 17 weeks. With just 3-4 hours per week of time, the goal is to equip students with enough knowledge to know how to solve any task with deep learning. Raval explains it as a way of thinking.


UVA Names NIH Researcher as New Director of Data Science Institute

University of Virginia, UVA Today



Philip E. Bourne, a leading “big data” researcher and administrator at the National Institutes of Health, has been named the Stephenson Chair of Data Science and director of the University of Virginia’s Data Science Institute. He also will serve as a professor of biomedical engineering.

Bourne will come to UVA in May from his position as associate director for data science at NIH. There he has led the Big Data to Knowledge Program, which coordinates access to, and analysis of, biomedical research, making it internationally available to scientists and researchers. He also oversaw governance and strategic planning for data and knowledge management and established several data science training programs.


Google Disrupts Curating Via Artificial Intelligence

artnet News, Ben Davis



In artnet’s predictions for 2017, I wrote that it would be the year Artificial Intelligence would finally crack the problem of curating. That was meant as a joke—but it turns out that I was already behind the times. The whiz kids at Google’s non-profit cultural arm, Google Cultural Institute, have spent the year trying to imagine just that.

Head to the “Experiments” section of the Cultural Institute website and you will find a catalogue of how they have been attempting to apply “machine learning” to the question of organizing artworks. Perhaps the experiment that best showcases their particular brand of cultural gimcrackery is called “X Degrees of Separation.”

Google Cultural Institute has secured millions of high-quality images of artworks and artifacts from hundreds of partner museums around the world. “X Degrees” lets you pick any two images from this library. Its algorithm then conjures up a series of steps that connect the two images visually, using other artworks from the trove.


Established education providers v new contenders – Alternative providers of education must solve the problems of cost and credentials

The Economist



THE HYPE OVER MOOCs peaked in 2012. Salman Khan, an investment analyst who had begun teaching bite-sized lessons to his cousin in New Orleans over the internet and turned that activity into a wildly popular educational resource called the Khan Academy, was splashed on the cover of Forbes. Sebastian Thrun, the founder of another MOOC called Udacity, predicted in an interview in Wired magazine that within 50 years the number of universities would collapse to just ten worldwide. The New York Times declared it the year of the MOOC.

The sheer numbers of people flocking to some of the initial courses seemed to suggest that an entirely new model of open-access, free university education was within reach. Now MOOC sceptics are more numerous than believers. Although lots of people still sign up, drop-out rates are sky-high.


Fearing Trump Data Purge, Environmentalists Push to Get Records

Bloomberg Markets, Jennifer A Dlouhy



U.S. government scientists frantically copying climate data they fear will disappear under the Trump administration may get extra time to safeguard the information, courtesy of a novel legal bid by the Sierra Club.

The environmental group is turning to open records requests to protect the resources and keep them from being deleted or made inaccessible, beginning with information housed at the Environmental Protection Agency and the Department of Energy. On Thursday, the organization filed Freedom of Information Act requests asking those agencies to turn over a slew of records, including data on greenhouse gas emissions, traditional air pollution and power plants.


One Startup’s Vision to Reinvent the Web for Better Privacy

MIT Technology Review, Tom Simonite



Venture capitalist Albert Wenger has done well by investing in Web businesses—he was an early backer of Etsy and Tumblr. But at his urging, Union Square Ventures, where he is a partner, is backing a company founded on the principle that the Web needs a rethink.

“We’re living in a time period where the new incumbents like Amazon, Google, and Facebook have firmly established themselves, and are near monopolists in their markets,” says Wenger. “If we want a long-term, open playing field for innovation, we’re going to need new, decentralized infrastructure.”


How Drones Are Helping Scientists Study the Future of Whales

NBC News, Tom Costello and Bradleigh Miranda Chance



Researchers at Ocean Alliance, a nonprofit dedicated to conservation efforts, are using drones to capture breathtaking aerial images of whales, as well as genetic material in petri dishes.

Ocean Alliance created the “Snotbot,” a drone they designed to capture both images of the whales and live biological samples of the “snot” the animals exhale through their blow holes. The team mounted petri dishes to the drone’s base so that when the drone hovers above a breaching whale, the propellers create a vortex. This pushes the whale’s spray down into the sample trays.


Microsoft acquires deep learning startup Maluuba; AI pioneer Yoshua Bengio to have advisory role

Microsoft, The Official Microsoft Blog, Harry Shum



Today is an exciting day for the advancement of AI at Microsoft. We have agreed to acquire Maluuba, a Montreal-based company with one of the world’s most impressive deep learning research labs for natural language understanding. Maluuba’s expertise in deep learning and reinforcement learning for question-answering and decision-making systems will help us advance our strategy to democratize AI and to make it accessible and valuable to everyone — consumers, businesses and developers.

We’ve recently set new milestones for speech and image recognition using deep learning techniques, and with this acquisition we are, as Wayne Gretzky would say, skating to where the puck will be next — machine reading and writing.


Language: Finding a voice – Computers have got much better at translation, voice recognition and speech synthesis, says Lane Greene. But they still don’t understand the meaning of language

The Economist, Technology Quarterly



The improvements in the lab are now being applied to products in the real world. More and more cars are being fitted with voice-activated controls of various kinds; the vocabulary involved is limited (there are only so many things you might want to say to your car), which ensures high accuracy. Microphones—or often arrays of microphones with narrow fields of pick-up—are getting better at identifying the relevant speaker among a group.

Some problems remain. Children and elderly speakers, as well as people moving around in a room, are harder to understand. Background noise remains a big concern; if it is different from that in the training data, the software finds it harder to generalise from what it has learned. So Microsoft, for example, offers businesses a product called CRIS that lets users customise speech-recognition systems for the background noise, special vocabulary and other idiosyncrasies they will encounter in that particular environment. That could be useful anywhere from a noisy factory floor to a care home for the elderly.

But for a computer to know what a human has said is only a beginning. Proper interaction between the two, of the kind that comes up in almost every science-fiction story, calls for machines that can speak back.

 
Events



Craige Roberts Short Course on Indexicality



New York, NY Craige Roberts is giving a three-week course on Indexicality in the NYU Department of Linguistics starting January 24. If you wish to attend and are not affiliated with NYU, please notify Prof. Roberts in advance.

NYC Data Science Seminar Series — Olga Troyanskaya



New York, NY Thursday, January 26, at 6:00 PM, Facebook NYC (770 Broadway) [free, sold out but waiting list available]

PyData London CFP



London, England Conference is May 5-7. CFP closes: TBD

The 10th annual symposium for Women in Mathematics in Southern California, Feb 11, 2017



Los Angeles, CA University of Southern California [free, register by January 20]
 
Deadlines



Data Science Bowl 2017: Can you improve lung cancer detection?

There’s a $1 million prize from Kaggle and Booz Allen Hamilton. Deadline is April 21.

CSCW 2018

CSCW is moving to a fall schedule. To bridge the year-and-a-half gap between CSCW 2017 and CSCW 2018, we will have two submission deadlines in spring 2017 and spring 2018. The spring 2017 Online First deadline is Thursday April 27.
 
Tools & Resources



Stan JSS paper out: “Stan: A probabilistic programming language”

Statistical Modeling, Causal Inference, and Social Science blog, Bob Carpenter



As a surprise welcome to 2017, our paper on how the Stan language works along with an overview of how the MCMC and optimization algorithms work hit the stands this week.


Program better, for fun and for profit

Katherine Wood, Inattentional Coffee blog



Code quality is something I’m constantly working on. You can see the development in my scripts; I only recently started using dplyr and the rest of the tidyverse in R, and what a difference it’s made to the quality of my code. I cringe a little, looking back at my earliest scripts (and they’re a matter of public record, forever). Cringe is good, though. Cringe signals improvement, and wisdom gained.

I thought I’d share a few of the practices that were drilled into me during my CS education that have helped improve the style, quality, and readability of my code.

 
Careers


Full-time, non-tenured academic positions

Data Standards & Quality Manager, HUMAN Project



NYU, Institute for the Interdisciplinary Study of Decision Making; New York, NY

Tenured and tenure track faculty positions

Computer Science Tenure Track Assistant or Associate Professor



George Washington University; Washington, DC
