Data Science newsletter – August 8, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for August 8, 2017


 
 
Data Science News



University Data Science News

MIT Media Lab professor Iyad Rahwan and his student Bjarke Felbo have gotten one step closer to being able to detect sarcasm in (semi)natural language using DeepMoji. DeepMoji is, of course, an algorithm using deep neural nets to predict which of 64 emoji would be used with the text in a message. Then they used a training set of sarcastic tweets to predict sarcasm in unlabeled tweets. The neural net with the emoji training did better than the one without. DeepMoji, like most humans, did not understand Trump’s covfefe tweet. So that’s fine. But DeepMoji may detect other types of sarcasm so nuanced that people cannot detect it. That’s a weird kind of Type 2 error. 💩

Columbia University researchers are working on a hearing aid that will read brain waves to discover which source a listener in a crowded soundscape is trying to focus on in order to amplify just the sounds associated with the source of interest. Ageing is going to be so different for me than it will be for my grandparents. I’ll have safe mobility until I cannot move at all courtesy of self-driving cars and I’ll be able to listen clearly to my friends and family and nearby birds. Now, about financing retirement. Let’s work on that.

Speaking of financing, Columbia is offering a new master’s degree in data journalism that will set students back about $100,000. Is that kind of debt responsible for students entering a field that has 1) moderate salaries and 2) a shrinking demand for labor? Not everyone can work for BuzzFeed.

Funders like the Wellcome Trust, Bill and Melinda Gates Foundation, and now the European Commission are offering free-to-peruse publishing platforms for their grantees. This speeds up the publishing pipeline and may provide a halo of institutional credibility that may or may not adhere to, say, arxiv.org. Some disciplines have a lot of respect for arxiv publications, others do not. Publishing under the umbrella of a respected funder may solve some of the credibility/status/career advancement problem. It certainly avoids paying additional fees to OA journals or getting entangled in the for-profit publishing rat’s nest. This does not mean science is no longer a rat race, however. Get on that wheel and run after those grants, readers.

Georgia Institute of Technology had such success with its online CS master’s degree that it decided to try teaching CS undergrads online. Fifty-nine students took an online version of Intro to Computing that was simultaneously running as an in-person course for approximately 350 students. They found “no significant difference in grades or accumulated knowledge, based on test scores” and are planning to go big with online undergrad CS. I would like to know how students were selected into the online class. Unless it was random assignment, I don’t trust the results.



Pittsburgh is transitioning from being a steel town to being a robotics hub. Carnegie Mellon University is moving its Advanced Robotics Manufacturing Institute into a steel rolling mill formerly occupied by LTV Steel, a one-time manufacturing behemoth that closed in 2000. The development is called Almono (not sure why, sounds too much like Alamo to me). Two of the buildings will be powered by solar panels, which is thoughtfully forward-looking.

GT’s Ian Bogost, a professor in computational media, is thinking about what he does to prevent “the gross machismo of computing writ large.” He argues that “More women and minorities are needed in computing because the world would be better for their contributions—and because it might be much worse without them.” I wonder what types of cultural messages are communicated in online teaching scenarios? Does online teaching impact the culture of computing? If so, for which types of students, how, and to what degree?



Andrew Ng has a new deep learning class on Coursera which is probably awesome judging by how many people like his Machine Learning course. It’s cool that he took time to put this course together after leaving Baidu earlier this year.



Software Carpentry and Data Carpentry have formally merged into one organization. Tracy Teal is the Executive Director.



MIT, in partnership with the Boston University Law School, now offers a Technology and Cyberlaw Clinic to students who are either afraid their inventions might violate someone’s patent/law, wish to get smart contracts in place before starting a venture, or have already been sued for some kind of technology law violation. I’m conflicted about whether I should recommend that all technology-forward schools establish such a clinic or if I should note that MIT is a bunch of iconoclastic inventors who would, of course!, need such a clinic where others do not. (Full disclosure: I have a couple degrees from MIT.) Your opinions, dear readers, are always welcome.

Marta Zlatic at the Howard Hughes Medical Institute’s Janelia Research Campus has mapped the entire neurological circuitry of the fruit fly. For those of us who don’t use fly models (i.e., most of us), this may seem boring. But there’s a lot of complexity even in the fruit fly brain and it’s extremely useful from a scientific perspective to have a full model.

Monika Henzinger won the ACM’s Test of Time award (what a cool paper award – more disciplines should do this) for a paper written in 1998 entitled “Improved Algorithms for Topic Distillation in a Hyperlinked Environment.” In it she proposed “algorithms that initiate web search by topic rather than the precise wording of the given query,” which is still an active area of research.


As politicians become less civil, so does the internet

Vox, Rishab Nithyanand, Brian Schaffner, and Phillipa Gill



We examined every political comment posted on Reddit from January 2015 to January 2017, more than 39 million comments in total. Because it would be impossible to code millions of comments by hand, we instead developed an algorithm to accurately identify “offensive” comments. Among other things, the algorithm tagged posts as offensive when they used profanity, compared someone to Hitler or a pedophile, engaged in name calling, and so on.

We tested the accuracy of our algorithm on a data set of 14,500 tweets that were coded by humans, and our automated approach correctly classified 90 percent of the tweets based on whether or not human coders would have found them offensive. In testing, we have found that when the classifier does make errors, it is more likely to underestimate the presence of offensive speech. Thus, our results can be considered conservative estimates of the presence of offensive speech on Reddit.


Data Science vs. Data Engineering – How to Fund Discovery with Charles Martin

Max Mautner, The Accidental Engineer blog



Charles re-joined us for a conversation about how new discoveries in data science get funded.

In the absence of the National Science Foundation (NSF) and an increased focus in industry on “engineering” wins, Charles describes how he convinces companies to fund original data science research. [video, 35:28]


Company Data Science News

eBrevia is an artificial intelligence engine designed to do contract review for mergers and acquisitions, usually one of the highest-grossing areas in modern banking. It’s been adopted by Baker McKenzie, the third-largest law firm in the U.S. by revenue. This is just one of what will be a wave of text analysis tools rolling out across the legal field. Legal language is odd for humans, but should be excellent for machine reading because it is parsimonious, using nouns, verbs, and phrases with explicit, singular meaning.

Massachusetts Mutual Life Insurance Company has partnered with the University of Vermont’s Complex Systems Center to the tune of half a million bucks. Is it just me or does it make your skin crawl to think about data science in the service of insurance?

Google has a project called Magenta that trains on existing music to make new music. It’s decent at classical music (which is all that I heard, maybe it’s good at other genres, too). Not as good as the best human-composed music, but better than a lot of human-composed music. Expect to see much more in the creative AI space.



Google and MIT researchers just made it easier to live for the ‘gram. ‘Gram, in this case, is short for Instagram which is owned by Facebook, a Google competitor. Sort of. Anyway, the team has produced an intelligent photo retoucher that will automagically make your social media posts super like-able! Because…nobody cares about portraiture, we all want glossy glamour shots designed to make our followers feel inadequate. Have I ever mentioned that I teach a couple different data science classes where we get to ask “what could go wrong?” and “what kind of world do we want to live in?”? These classes give some meaning to my life.

Also about Instagram: according to research undertaken at Harvard, people who have a history of depression post pictures containing more blue pixels than people who do not. I cannot even imagine a causal arrow here, or at least the one in my mind looks like a tangled box of string. Let’s all just take a deep breath before we start diagnosing mental illness based on Instagram posts.

SmartAssist, an AI startup in Seattle that offers customer support processes (ever feel like you are talking to a chatbot? Get used to it.), just raised $5m to hire more humans – likely pale and male if the diversity reports from the tech industry are accurate – who will help develop a process likely to replace call center workers (who are more likely to be women and brown-skinned). The question should not be: will robots displace workers? It should be: which types of workers will be displaced by robots?

Tableau bought ClearGraph, a company that makes it easy to visualize data based on natural language queries. Microsoft BI already offers a similar product.

OpenAI, the heavily funded AI non-profit that is interested in preventing AI doom, gave an intellectual tour to a reporter from the New York Times.

Satellogic, a private satellite company planning to have 300 satellites in orbit by 2022, is offering its data to astronomers for free.

Palantir, one of the big-brother-est of data science companies, has gotten into policing. It would not have taken any kind of AI to predict this.

Google fired James Damore after he wrote a careful, thoughtful, 10-page essay on why women are inferior technologists. That is upsetting, but predictably so. What is even more upsetting is that precision medicine apparently works best for white people. This, too, may somehow be predictable. Numerically, there are more white people and, historically, white people have had better access to health care, so there is better white-people medical data. This is a totally unacceptable state of precision medical affairs! It can be corrected with better attention to data sources and better access to health care for racial minorities. It’s a case in point that if we use machine learning to model the world as it is, we are doomed to repeat the injustices of the past.



This has not been a fun newsletter to write.


Government Data Science News

Germany now has an Open Data Law, giving citizens access to government data if it was provided in electronic format originally. The law also established a central support agency for open data.



China uses a bunch of commentators on social media to suppress political dissent and amplify pro-regime messaging. We heard a preview of this research a year and a half ago in Text as Data from one of the authors, Meg Roberts (come to this seminar in the fall! It’s so good), and now you can read all about the Chinese techniques here.

DARPA has no problem coming up with cool names for their projects, though I quibble with the scale implied here. The Colosseum is a 20 x 30 foot server room at Johns Hopkins that emulates tens of thousands of possible interactions among wireless communication devices including phones, IoT gadgetry, and military radios. The simulator supports testing all sorts of RF-based products that need to work in dense urban environments such as self-driving cars.

Canada, just like the US, is facing crippling cuts to federal research budgets. But Trump does not equal Trudeau in so. many. ways.



Russia has all sorts of cyber tools aimed at democratic foreign governments and is actively seeking to destabilize countries like the US. No wonder the Russians apparently supported Trump but now want little to do with him, right?

The Federal Communications Commission done lost its mind, suggesting that maybe Americans don’t need high-speed internet access at home. We can make do with our slow, battery-depleted phones! Why would the FCC, responsible for communications, suggest that Americans may not need to use computers or televisions? Because the FCC is currently responsible for providing high-speed internet access to homes and has not been able to get it done since the Bush years. How are we supposed to be an innovation leader without broadband at home? Does Ajit Pai fail to realize the central role garages played in the invention of personal computing?


Cognitive hearing aid uses AI and brain waves to enhance voices

Digital Trends, Luke Dormehl



A new piece of technology coming out of Columbia University School of Engineering and Applied Science could make things even better, however — courtesy of a hearing aid that is designed to read brain activity to determine which voice a hearing aid user is most interested in listening to and then focusing in on it. The resulting “cognitive hearing aid” could be transformative in settings like crowded rooms in which multiple people are speaking at the same time.


From mandates to platforms: Have funders lost patience with publishers?

Copyright Clearance Center, Suzanne Kavanagh



The Wellcome Trust, Bill & Melinda Gates Foundation, and the European Commission have a lot in common. Not only do they fund science worldwide, but they now provide—or are developing—an open publishing platform to guarantee subscription-free access to the research they support. Have they finally lost patience with the pace of OA? Or, is it a natural evolution for scholarly communications?


How U.S. government statistics work, explained by the country’s Chief Statistician

The Washington Post, Henry Farrell



Nancy Potok is the chief statistician of the United States. I interviewed her last month about her role, and the challenges faced by the U.S. national statistics system. The interview has been lightly edited for style and flow.


UVM partners with MassMutual in groundbreaking data science initiative

Vermont Business Magazine, University of Vermont



Seeking to expand the applications of computational, social and data science, Massachusetts Mutual Life Insurance Company (MassMutual) announced today that it is providing the University of Vermont (UVM) $500,000 to fund an innovative pilot program within the university’s Vermont Complex Systems Center. It will create a new doctoral fellowship, support faculty research and hire a visual data artist-in-residence.


NSF Issues Awards to Advance a National Research Infrastructure for Neuroscience

CCC Blog, Helen Wright



The National Science Foundation (NSF) has made 17 Next Generation Networks for Neuroscience (NeuroNex) awards to aid the research community as it pursues one of its grandest challenges: understanding the brain.

These projects will support the development of innovative, accessible and shared capabilities and resources, as well as theoretical frameworks and computational modeling to advance neuroscience research. The overall goal of this activity is to establish a coherent national infrastructure to enhance our understanding of brain function across organizational levels and a diversity of species.


China is perfecting a new method for suppressing dissent on the internet

Vox, Sean Illing



The art of suppressing dissent has been perfected over the years by authoritarian governments. For most of human history, the solution was simple: force. Punish people severely enough when they step out of line and you deter potential protesters.

But in the age of the internet and “fake news,” there are easier ways to tame dissent.

A new study by Gary King of Harvard University, Jennifer Pan of Stanford University, and Margaret Roberts of the University of California San Diego suggests that China is the leading innovator on this front. Their paper, titled “How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, Not Engaged Argument,” shows how Beijing, with the help of a massive army of government-backed internet commentators, floods the web in China with pro-regime propaganda.


Georgia Tech plans extension of undergraduate online computer science course

Inside Higher Ed, Mark Lieberman



Georgia Institute of Technology’s online, MOOC-inspired master’s degree in computer science has many educators watching closely. This spring, the university tried a similar approach for undergraduates and found it so successful that it’s continuing along a path to shave off up to a year and a half of in-person instruction for students pursuing a bachelor’s degree.

Fifty-nine students enrolled in the experimental Intro to Computing online course this spring, while approximately 350 students took the course in person. The university found no significant difference in grades or accumulated knowledge, based on test scores, between students in the two course models, according to a report compiled by the online course’s instructor, David Joyner, a lecturer at Georgia Tech’s College of Computing who also teaches several courses in the university’s online computer science master’s program.

In fact, test scores were slightly higher on average for the online students, though not to a statistically significant degree.


What we learn when a machine ‘listens’ to Miles Davis

University of Rochester, NewsCenter



Altabef and Graham are at Rochester as participants in a new Research Experience for Undergraduates, or REU, program called Computational Methods for Understanding Music, Media, and Minds. REUs are designed at universities and funded by the National Science Foundation as part of a federal effort to make research accessible to undergraduate students from as many backgrounds as possible. Ten undergraduates are participating in this REU this summer, seven of whom are from other universities.

Computational Methods for Understanding Music, Media, and Minds, offered through the Goergen Institute for Data Science and the Kearns Center for Leadership and Diversity, capitalizes on some of the University’s core strengths—in music and data science research as well as in multidisciplinary research opportunities for undergraduates. Altabef and Graham, for example, are benefiting from working alongside two mentors from different, but related, disciplines. Darren Mueller is an assistant professor of musicology at the University’s renowned Eastman School of Music and an expert on cultural aspects of music. Stephen Roessner, a Grammy Award–winning lecturer in audio and music engineering, is an expert in recording techniques and analyzing sound quality.

 
Careers


Full-time positions outside academia

Research Associate



Phylagen; San Francisco, CA
