Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for May 30, 2017
Data Science News
Company Data Science News
Apple is reportedly developing chips dedicated to AI computation, joining a range of other firms like Nvidia, IBM, Qualcomm, Google, and Intel. Called Apple Neural Engine – not iAI – the chip is designed to address image and speech recognition tasks. The most interesting part of this corporate leak is that Apple “plans to offer developer access to the chip so third-party apps can also offload artificial intelligence-related tasks”.
John Deere has opened an office near Stanford to develop better precision agriculture. Having now written 100+ newsletters and read thousands of articles on the hype of AI, I think precision medicine and precision agriculture are the two areas most ripe for AI-to-commercialization success in the present and near-term future. Mary Meeker, Kleiner Perkins analyst, calls our present moment “a digital inflection point for medicine”.
Steve Ballmer has finally combined his love of NBA basketball with his tech-savvy chops and partnered with Second Spectrum. Developed by two former USC professors, Second Spectrum is the official optical tracker for the NBA. They now have a product that is an AI-driven digital overlay that will present data and animations over LA Clippers games. [Note: Ballmer owns the Clippers which he bought for $2bn from Shelly and Donald Sterling.]
Litesprite offers a video game designed to gather medically useful data on patients’ unique attitudes and capacities to achieve better treatment outcomes. Its first game, “Sinasprite”, gives customers like health-insurance companies, universities, and clinics data on “which coping mechanisms seem to work best for each patient” in order to treat mental health symptoms. Future games will be oriented toward better treatments for diabetes and cancer. I worry a little about the leaps being made from behavior in a game to adherence to medication schedules and so forth.
Giant tech companies are fighting over the technology in court. Start-ups around the world are racing to develop new versions of it. And engineers say it is essential to making autonomous cars safe.
The obscure object of desire: lidar.
“We believe it will be the basis for autonomous driving,” said Guillaume Devauchelle, who oversees innovation at Valeo, a major parts supplier to automakers.
NCWIT said the school’s outreach and recruiting efforts (including a CS camp for high school girls and a chapter of Girls Who Code), and work to identify and eliminate the effects of implicit bias in admissions were two big reasons why UIUC received the top prize. UIUC’s work to improve teaching, support collaborative learning and invest in student activities, as well as adding a series of “CS + X” multidisciplinary majors (such as CS+Anthropology and CS+Astronomy) which have further attracted female students to computer science, also played a role.
Long reads and ear candy
Psychology as a discipline is having a crisis. Because this is academia and things move slowly, this crisis has been going on since 2012 with a flame-up in 2015 when Brian Nosek tried replicating 100 published psych findings and could only reproduce 36%. What about now, in 2017, can we trust psych studies?
As someone who’s held the job title of Data Scientist, who teaches Data Science classes, and who is a Founding Faculty member of the Data Science Major and the new Data Science Institute, I take very seriously the idea that Data Science can and should be an independent, novel field of scientific inquiry. And that it will be a massively important one at that. My musings on this recently came to a head with the release of a White Paper I helped write for the UC San Diego Division of Social Sciences (detailed below).
The rise of artificial intelligence in recent years is grounded in the success of deep learning. Three major drivers caused the breakthrough of (deep) neural networks: the availability of huge amounts of training data, powerful computational infrastructure, and advances in academia. Thereby deep learning systems start to outperform not only classical methods, but also human benchmarks in various tasks like image classification or face recognition. This creates the potential for many disruptive new businesses leveraging deep learning to solve real-world problems.
At Berlin-based Merantix, we work on these new business cases in various industries (currently automotive, health, financial and advertising).
Anthony Goldbloom shares lessons learned from top performers in the Kaggle community and explores the types of machine-learning techniques typically used. [video, 4:07]
MACHINE-LEARNING is beginning to shake up finance. A subset of artificial intelligence (AI) that excels at finding patterns and making predictions, it used to be the preserve of technology firms. The financial industry has jumped on the bandwagon. To cite just a few examples, “heads of machine-learning” can be found at PwC, a consultancy and auditing firm, at JP Morgan Chase, a large bank, and at Man GLG, a hedge-fund manager. From 2019, anyone seeking to become a “chartered financial analyst”, a sought-after distinction in the industry, will need AI expertise to pass his exams.
Despite the scepticism of many, including, surprisingly, some “quant” hedge funds that specialise in algorithm-based trading, machine-learning is poised to have a big impact. Innovative fintech firms and a few nimble incumbents have started applying the technique to everything from fraud protection to finding new trading strategies—promising to up-end not just the humdrum drudgery of the back-office, but the more glamorous stuff up-front.
David Culler, professor of electrical engineering and computer science at UC-Berkeley, has been named interim dean of Berkeley’s new Division of Data Sciences. Some have asked: What is a division? Is it another Berkeley thing we’re all going to wish we had, like when DJ Khaled performed at the Berkeley stats graduation? … A “division” is a hybrid organizational form that is more inclusive than a department but more (methodologically) focused and smaller than a whole school. It is a thoughtful organizational attempt to locate data science within a recognizable bureaucratic entity that will promote (not stifle) university-wide participation in data science through curricula and research that draw on a core suite of computational methods in disciplinarily appropriate ways. While I can’t say yet if we’re all going to wish we had a division of data sciences, Berkeley’s strategy is definitely worth consideration. They have already been able to revamp their undergraduate core curriculum to ensure students from a huge range of majors have access to a) intro level coursework in data science, computer science, and statistics and b) ‘connector courses’ that tie the methods to a host of substantive domains.
Stanford professor Ilya Strebulaev and Will Gornall of the University of British Columbia recalculated the valuations of 100+ companies known as unicorns (startups valued at $1bn+) and showed many aren’t worth nearly as much as they claim. Why? Because math. Startups typically issue different classes of stock in each fundraising round, but their valuations are oversimplified by applying the price of the most recent round to all outstanding shares. *Every* company they looked at was overvalued, 53 lost their $1bn unicorn status, and 13 were overvalued by more than 100 percent. … “Some unicorns have made such generous promises to their preferred shareholders that their common shares are nearly worthless,” the two professors wrote. In my opinion, this is an example of two things: 1) lots of people cannot apply their math skillz and 2) the ethos of finance contains much magical thinking. The entire industry is obsessed with unicorns. According to Scottish myth, unicorns were ruthlessly hounded by clamoring hordes, simultaneously scapegoated for being the aberrant creatures they are and loved to death (i.e. abused, fatally) for their magical powers. Lesson: it’s clear that many in finance are not good at applying their history and culture skillz, either.
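The arithmetic behind that oversimplification can be sketched in a few lines of Python. This is only an illustration: the cap table, share counts, and per-share values below are invented, and the per-class values are assumed inputs, not the output of the option-pricing approach the two professors actually used.

```python
# Hypothetical cap table: (share class, shares outstanding, assumed value per share in USD).
# Every number here is invented for illustration.
CAP_TABLE = [
    ("common", 60_000_000, 4.00),
    ("series_a_preferred", 25_000_000, 7.00),
    ("series_b_preferred", 15_000_000, 10.00),  # most recent round
]

LATEST_ROUND_PRICE = 10.00  # price paid per share in the most recent round


def headline_valuation(cap_table, latest_price):
    """Headline method: apply the latest round's price to every outstanding share."""
    total_shares = sum(shares for _, shares, _ in cap_table)
    return total_shares * latest_price


def class_by_class_valuation(cap_table):
    """Value each share class at its own assumed per-share value and add them up."""
    return sum(shares * value for _, shares, value in cap_table)


headline = headline_valuation(CAP_TABLE, LATEST_ROUND_PRICE)
adjusted = class_by_class_valuation(CAP_TABLE)
overvaluation_pct = (headline - adjusted) / adjusted * 100

print(f"Headline valuation:  ${headline:,.0f}")
print(f"Class-by-class value: ${adjusted:,.0f}")
print(f"Overvaluation:        {overvaluation_pct:.1f}%")
```

With these made-up inputs the headline figure clears the $1bn unicorn bar while the class-by-class figure does not, which is the pattern the study describes.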
Harvard University rescinded offers of admission to at least 10 students who created a Facebook group to exchange violent, racist, sexist memes. The group was at one point called “Harvard memes for horny bourgeois teens”. Subject matter included memes suggesting sexual arousal caused by the torture of children, memes violently targeting minority groups, and numerous sexually explicit memes. There is no word on whether Harvard could or would admit students from the waitlist at this late date.
A Gallup poll found that just over half of American college graduates would choose a different major, degree type, or institution if they had a chance to do it all over again. I’m sure many institutions would happily admit these regretful souls to one of their many master’s programs.
University of Cambridge researchers have figured out how to tell if a sheep is in pain from its facial expression. Then they trained a computer-vision model to take over for the humans. That model now does about as well as humans – a 67% true positive rate – at identifying pained sheep. Identifying pain allows the sheep to be treated more quickly, reducing their suffering and potentially limiting the spread of infectious disease within a flock. This is one of my feel-good AI stories of the week. I left out the part where the authors note that humans don’t pay nearly enough attention to their sheep in the first place.
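For readers who want the metric spelled out: the true positive rate is the share of genuinely pained sheep that the classifier flags. Here is a minimal sketch; the function name and the toy labels are hypothetical and have nothing to do with the Cambridge data.

```python
def true_positive_rate(labels, predictions):
    """Return TP / (TP + FN): the fraction of actual positives that were detected.

    labels and predictions are equal-length sequences of 0/1 values,
    where 1 means "in pain".
    """
    pairs = list(zip(labels, predictions))
    true_positives = sum(1 for actual, predicted in pairs if actual and predicted)
    false_negatives = sum(1 for actual, predicted in pairs if actual and not predicted)
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        raise ValueError("no positive examples, so the rate is undefined")
    return true_positives / actual_positives


# Toy example: three sheep are actually in pain and the model catches two of them.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_predictions = [1, 1, 0, 0, 1, 0]
toy_rate = true_positive_rate(toy_labels, toy_predictions)
print(f"True positive rate: {toy_rate:.0%}")
```

Note that the false alarm on the fifth sheep does not affect this number. A true positive rate says nothing about false positives, so it is only half the picture of how well a detector works.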
Bradley Voytek, who is a founding member of the Data Science Institute at UC-San Diego, has called for a “Data Science for Social Good” program similar to the one at the University of Chicago to be part of their overall data science effort. I got an email from one of NYU’s MS in data science alums where he noted, “I’ve recently come to the realization that all of that time I spent learning better modeling techniques is kind of pointless if it doesn’t have an impact. (Duh, right?).” Data scientists, unite for good!
Computer science has a widespread cheating problem. Randy Katz, an EECS professor at UC-Berkeley, discovered 100 of his 700 students had violated the course policy on collaborating or copying code.
Cornell’s new campus on Roosevelt Island in New York is designed to be eco-friendly. How did they do it?
Fei-Fei Li is a leader in image recognition who is now heading up Google’s cloud computing team and keeping her toes in the faculty pool at Stanford. If you don’t know who she is or what she’s up to these days, get acquainted by reading this interview.
What’s my favorite type of billions? Billions of light years. Three billion light years away, two black holes collided and the Laser Interferometer Gravitational-Wave Observatory (LIGO) captured it. “The team has concluded that the gravitational wave was produced by the collision of two heavy, stellar–mass black holes, one estimated to be about 31 times, and the other 19 times, as massive as the sun.” There’s mesmerizing space art after the jump.
University of Washington researchers have developed a tool that reveals nefarious mobile phone surveillance. SeaGlass “detect[s] anomalies in the cellular network that may indicate the presence of [stingrays]” or devices that ‘appear’ to be legitimate phone towers but are actually packet sniffers. Researchers planted their devices in the trunks of ride-share vehicles to generate maximum coverage at minimal expense.
Duke University researchers have developed a slender wand that attaches to a laptop or mobile phone and could allow women to screen themselves for cervical cancer with the help of image recognition programs for spotting abnormalities.
Stanford engineering professor James Plummer has predicted master’s degree programs will disappear in the future, to be replaced by just-in-time education. I may have my own doubts about master’s programs, but none of those doubts involved their extinction. A social science analysis suggests the opposite: master’s degree programs will continue to proliferate unabated as our advanced capitalist society continues to encourage creeping credentialism, and ‘just-in-time’ degrees have not figured out how to offer the reputation asset (e.g. social and cultural capital) a university degree grants.
arXiv, Computer Science > Learning; Peter Bailis, Kunle Olukotun, Christopher Ré, Matei Zaharia
Despite incredible recent advances in machine learning, building machine learning applications remains prohibitively time-consuming and expensive for all but the best-trained, best-funded engineering organizations. This expense comes not from a need for new and improved statistical models but instead from a lack of systems and tools for supporting end-to-end machine learning application development, from data preparation and labeling to productionization and monitoring. In this document, we outline opportunities for infrastructure supporting usable, end-to-end machine learning applications in the context of the nascent DAWN (Data Analytics for What’s Next) project at Stanford.
That’s the motto of Measures for Justice, a nonprofit that just launched an online data portal to address unanswered questions about the American criminal justice system. Though the prison system is highly critiqued by activists, we know very little about what life looks like for the more than 2 million prisoners currently held in county prisons around the U.S.
The prevailing narrative goes that Silicon Valley’s technology companies, like Alphabet and IBM, will bring super-computing to medicine, thus transforming a $ trillion broken sector. Cha-ching. Computers will detect complex diagnoses in piles of medical images in overrun clinics, and that’s just the beginning.
But as digital health futurist Maneesh Juneja pointed out this week, there’s a difference between hype and hope. So CNBC interviewed a half-dozen doctors to find out what they think.
Whether you focus on human health or ecological health, adaptation to climate change or a humanitarian crisis, SNAPP announces its 2017 Request for Proposals to convene interdisciplinary teams, and invites you to be a part of the solution! Deadline to submit proposals is June 7.
At Model Lab, we believe robust simulation tools can help illuminate and inform the benefits and costs of transport-related service, policy, and infrastructure decisions. To understand how transportation interventions impact communities, the models we build need to adequately represent every person living in a community today and every person expected to be living there tomorrow. Our first step in creating a model system that achieves this goal is a toolkit we call Doppelgänger.
“We’re open-sourcing OpenAI Baselines, our internal effort to reproduce reinforcement learning algorithms with performance on par with published results.”
“In this tutorial, we implement the MNIST classifier using a simple neural network and visualize the training process using TensorBoard. In training phase, we plot the loss and accuracy functions through scalar_summary and visualize the training images through image_summary. In addition, we visualize the weight and gradient values of the parameters of the neural network using histogram_summary.”
“OpenVisConf 2017, like the years past, brought a mix of high quality talks from visualization practitioners and researchers. The talks are now online. All are worth a look, but there are a few I’d like to amplify.”
Over the past 2 years there has been a growing number of initiatives to address data management issues in scientific research. While this is no news in some fields like genetics, where input data as well as derived data (results) are shared in standardised digital formats, in other fields researchers have been mainly left alone to develop their own data structure and data management plan, i.e. “reinventing the wheel” multiple times. In Finland, the last round of Academy of Finland application was – for the first time – explicitly asking for a data management plan (DMP). To help the applicants, web tools such as https://www.dmptuuli.fi/ were developed to specifically follow Academy of Finland guidelines as well as other DMP templates such as ERC’s Horizon 2020, NIH, Wellcome Trust. DMPtuuli is based on https://dmponline.dcc.ac.uk/ delivered by http://www.dcc.ac.uk/. Another important resource in Europe is the EUDAT platform https://www.eudat.eu/ that goes beyond DMP, to also include data sharing, data preservation, data processing, meta-data indexing.