Data Science newsletter – November 20, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for November 20, 2017


 
 
Data Science News



Nixon Peabody: 3 shady things that digital health start-ups do

CNBC, Christina Farr


Health technology is a booming sector, with more than $3.5 billion in venture money flowing into the space in the first half of the year alone.

As often happens, when the money flows, the crooks follow. The health-tech space is filled with dodgy growth hacks and downright illegal practices.

I asked one of the top lawyers in the space, Nixon Peabody’s Jill Gordon, to detail the three dodgiest practices that she sees (and attempts to thwart) in digital health.


Reasoning About Data

Simply Statistics blog, Roger Peng


In my ongoing discussion in my mind about what makes for a good data analysis, one of the ideas that keeps coming back to me is this notion of being able to “reason about the data”. The idea here is that it’s important that a data analysis allow you to understand how the data, as opposed to other aspects of an analysis like assumptions or models, played a role in producing the outputs. I think for a given problem, some kinds of analysis do a better job of that than others.


Using social media big data to combat prescription drug crisis

EurekAlert! Science News, The Geisel School of Medicine at Dartmouth


Researchers at Dartmouth, Stanford University, and IBM Research conducted a critical review of existing literature to determine whether social media big data can be used to understand communication and behavioral patterns related to prescription drug abuse. Their study found that with proper research methods and attention to privacy and ethical issues, social media big data can reveal important information concerning drug abuse, such as user-reported side effects, drug cravings, emotional states, and risky behaviors.

Their work, “Scaling Up Prescription Drug Abuse and Addiction Research Through Social Media Big Data,” is reported in the Journal of Medical Internet Research.


Using Azure and AI to Explore the JFK Files

Microsoft, Cortana Intelligence and Machine Learning Blog, Corom Thompson


At the Microsoft Connect(); 2017 event, we created the demo web site* shown in Figure 1 below – this is a web application that uses the AzSearch.js library and is designed to give you interesting insights into this vast trove of information.


Tech Leaders Dismayed by Weaponization of Social Media

IEEE Spectrum, Tekla S. Perry


The tech industry can’t hide from the information war, particularly when its own creations are being weaponized.

That was the consensus of a panel at the Techonomy17 conference in Half Moon Bay, Calif., last week. The group assembled to discuss the meaning of authority in a networked, artificially intelligent world. The panelists quickly zoomed in on the manipulation of Facebook, Google, and other sites by Russians during the U.S. presidential election. They, as well as several other speakers at the conference, painted a dark picture of our current online world for at least the immediate future; they concluded that preventing such manipulation is not going to be easy.

“I spent my whole life working in civil liberties, and I didn’t see this coming,” said Marc Rotenberg, president of the Electronic Privacy Information Center. “Democratic institutions, the rule of law…are up for grabs. Anything can happen.”


Government Data Science News

Immigration & Customs Enforcement (ICE) is moving ahead with plans to use social media accounts to vet immigrants and travelers to the US, according to ProPublica, which had a reporter at a cybersecurity event in Arlington, VA last week. [We had advertised the event in this newsletter – did anyone get to attend? Want to share what you heard?] Given how dynamic the meaning underlying social content is, many scholars who use text analysis tools are worried this new social media dragnet will catch innocents, miss real threats, and subject far too many to invasive surveillance with little or no uptick in actual security. The Brennan Center for Justice has launched a website around ICE’s new vetting policies to help legal scholars understand the new social surveillance landscape.

Toronto will share its traffic data with Google’s Waze. Recall that Toronto is one of the Canadian cities with a very strong support system for artificial intelligence projects, going so far as to sign a deal with Google’s Sidewalk Labs to build a smart city from the ground up. This has vaulted Toronto into city-level civic tech leadership, putting a more top-down (tech-down) infrastructural plan in place. This is in contrast to cities like Chicago, San Francisco and New York that have pursued Open Gov and civic data sharing in a more grassroots way with meetup groups and an agency-to-agency approach. [University of British Columbia researchers are also working to understand how Twitter data streams indicate traffic patterns.] It will be instructive to compare these efforts with Toronto’s full-on smart city strategy.

The FCC is planning to repeal net neutrality rules. Chairman Ajit Pai believes that the current policy, enacted under Obama, “micromanages the internet”. The data science community is largely arrayed in favor of net neutrality (thus, in opposition to the proposed repeal) because repeal would give unchecked power to large oligopolistic internet service providers in determining the speed at which consumers can access content via the internet. There are allegations that bots have been leaving hundreds of thousands of pro-repeal messages on the website the FCC has made available for public comments. New York’s Attorney General Eric Schneiderman has opened an investigation.



The 2020 US Census has been hanging in limbo since the Census Bureau lost its director several months ago. (There are additional complications like budget shortfalls amidst demands to upgrade Census-taking digital infrastructure before that.) Now the Trump administration has an appointee for one of the leadership spots. It’s University of Texas at Dallas professor Thomas Brunell, a man with no government experience. Brunell is a political scientist who authored a book in 2008 entitled Redistricting and Representation: Why Competitive Elections are Bad for America and has frequently offered testimony in support of Republican plans to gerrymander congressional districts. Voting rights advocates and fans of democracy are alarmed. And that’s putting it mildly.



Some are calling to shutter clinicaltrials.gov because it’s not set up to inform patients about the relative quality of trials, while the NIH’s “All of Us” precision medicine project gathers steam. Is this the sign of a tipping point from randomized controlled trials to a more data-driven, sensor-based approach to medical research?

NASA Jet Propulsion Lab is pitting its AI against the best human drone racers.


A gymnastics coach rates Boston Dynamics’ flipping robot.

Slate, Lila Thulin


We wanted to know how impressive, so Slate asked a gymnastics coach to rate Atlas’ parkour skills. According to gymnastics coach Aryan Mazloum, Atlas’ backflip—a back salto, if you want the technical term—is not bad. “It’s pretty fantastic to be able to see a robot have the center of gravity and be able to not only just move, but literally flip and catch itself,” said Mazloum, a junior Olympic coach at Northern Virginia’s Capital Gymnastics National Training Center. (He’s also working toward a Ph.D. in informatics at George Mason University.)

The back salto, Mazloum explains, is “an intermediate skill” that coaches introduce in the fifth level of USA Gymnastics.


Tim Berners-Lee on the future of the web: ‘The system is failing’

The Guardian, Olivia Solon


Sir Tim Berners-Lee’s optimism about the future of the web is starting to wane in the face of a “nasty storm” of issues including the rollback of net neutrality protections, the proliferation of fake news, propaganda and the web’s increasing polarisation.

The inventor of the world wide web always maintained his creation was a reflection of humanity – the good, the bad and the ugly. But Berners-Lee’s vision for an “open platform that allows anyone to share information, access opportunities and collaborate across geographical boundaries” has been challenged by increasingly powerful digital gatekeepers whose algorithms can be weaponised by master manipulators.

“I’m still an optimist, but an optimist standing at the top of the hill with a nasty storm blowing in my face, hanging on to a fence,” said the British computer scientist.


Based on your morals, a debate with a computer to expose you to other points of view

FlowingData, MIT Media Lab


Collective Debate from the MIT Media Lab gauges your moral compass with a survey and then tries to “debate” with you about gender bias using counterpoints from the opposite side of the spectrum. The goal isn’t to be right. Instead, it’s to try to understand the other side. At the end, you see how you compare to others.


University Data Science News

Harvard University is being threatened with a lawsuit by the US Department of Justice over allegations that its admissions policies discriminate against white and Asian American applicants. The DoJ notes that Harvard has “pursued a strategy of delay and has not yet produced even a single document” and is compelling the school to hand over admissions records by December 1st. Seth Waxman, a lawyer for Harvard, noted in a letter that the DoJ’s request is “so outside ordinary practices that Harvard is obliged to clarify the authority and rationale for the Department’s decision” before submitting the requested documentation. Don’t expect this to be sorted out any time soon. Do expect many schools to reconsider their affirmative action policies, even absent a conclusive resolution of Harvard’s case, as a way to avoid costly legal battles.



Stanford University researchers in Andrew Ng’s group have developed an algorithm to diagnose pneumonia that outperforms radiologists. What’s remarkable is that the data set was released by the National Institutes of Health only recently – on September 26th – and the algorithm has already been developed and validated. “After about a month of continuous iteration, the algorithm outperformed the four individual Stanford radiologists in pneumonia diagnoses. This means that the diagnoses provided by CheXNet agreed with a majority vote of radiologists more often than those of the individual radiologists.” This is certainly something to be grateful for.



The March of Dimes is now supporting a new database full of research data on prematurity in infants. Marina Sirota leads the effort and points to the success of large open databases of cancer data that have accelerated medical breakthroughs there. [University of Michigan startup Genomenon has a business model based on cancer gene data.] Globally, premature birth is now the most common killer of children under 5.



The University of Colorado is organizing to improve how researchers share equipment and technical services. The initiative, led by Kathryn Ramirez-Aguila, wants to “spur a wave of culture change across campus laboratories.”

The Alan Turing Institute, the UK’s national institute for data science, has announced it will add four new universities to its network, joining the five founding members. The new universities are Leeds, Manchester, Newcastle and Queen Mary University of London.



If you use SciPy you probably want to read this post that gets into the technological tensions between stability and flexibility in code bases. On the one hand, stability may increase reproducibility (good for science), but decrease flexibility (bad for living code bases). I’ll let Konrad Hinsen, a scientist and developer, explain after the jump.



The American Statistical Association published findings from a new survey of 2016 graduates. The top line findings are that “nearly one-third of respondents double-majored, with economics and mathematics being the most common majors” and that “respondents who had jobs outnumbered those in graduate school almost two to one” with a median starting salary of $55,000.


DDN Provides University of Tennessee’s SimCenter with Big Data Storage to Support Machine Learning and Data Analytics

HPC Wire


DataDirect Networks (DDN) today announced that The University of Tennessee, Chattanooga (UTC) has selected DDN’s GS14KX parallel file system appliance with 1.1PB of storage to replace its aging big data storage system and to support a diversifying range of data-intensive research projects. The Center of Excellence in Applied Computational Science and Engineering (SimCenter) at UTC needed a big data storage solution that could scale easily to support growing research programs focused on computational fluid dynamics (CFD), machine learning, data analytics, smart cities and molecular biology. The DDN GS14KX is purpose built to address the comprehensive needs of HPC environments and manage huge data growth, enabling organizations, such as SimCenter, to scale their environments and take advantage of new, data-intensive research disciplines such as machine learning.


For Minorities in Science, Building Communities Matters

Undark magazine, Lesley Evans Ogden


Against this backdrop, the representation gap among tenured scientists employed at U.S. universities and four-year colleges becomes clear. Among male scientists of such status, nearly 80 percent are white, according to 2013 data. Asians make up another 12 percent of male scientists, while blacks, Hispanics — and indeed, all other backgrounds — account for less than 5 percent each. Most female scientists are also white — 81 percent — with Asians accounting for 9 percent of women scientists, and blacks and Hispanics accounting for 4 percent each.

It was with numbers like these in mind that Hill, along with Tang-Martinez and others, launched the Charles H. Turner Program in 2002. Named after one of the first African-American researchers in animal behavior, the program’s original aim was to increase participation at society meetings by undergraduates from visible minorities. But as explained by Tang-Martinez, who now serves as the society’s historian, the program today defines minorities more broadly, to include the LGBTQ community and those experiencing disability.

Initiatives like this one aim to support the careers of minority scientists — in part by getting them to the professional networking and educational events that are so often out of reach. The thinking behind such programs is that encouraging interest in the sciences is not enough, and that providing mentorship and ongoing career guidance is key to keeping minorities anchored in research fields — and the communities that grow up around them.


Amazon is Becoming the New Microsoft

Robert Cringely; I, Cringely blog


This column is about the slugfest to determine what company’s public cloud is most likely to prevail. I reckon it is Amazon’s and I’ll go further to claim that Amazon will shortly be the new Microsoft.

What I mean by The New Microsoft is that Amazon is starting to act a lot like the old Microsoft of the 1990s. You remember — the Bad Microsoft.


How algorithms are pushing the tech giants into the danger zone

The Guardian, Sara Wachter-Boettcher


The algorithms Facebook and other tech companies use to boost engagement – and increase profits – have led to spectacular failures of sensitivity and worse. How can we fight back?

 
Events



EGG2017 – The Non-Conforming Data Science Conference

Dataiku


New York, NY “Join us on November 30th in Manhattan, at the famous 230 Fifth venue rooftop!” [$$$]


Engaging with the Public: Why Scientists Need to Communicate Their Work

The New York Academy of Sciences


New York, NY November 30, starting at 6 p.m., New York Academy of Sciences (7 World Trade Center). [$$]


Modeling and Visualizing Science and Technology Developments

Sackler Colloquia of the National Academy of Sciences


Irvine, CA December 3-5. “This colloquium brings together researchers and practitioners from multiple disciplines to present, discuss, and advance computational models and visualizations of science and technology.” [$$$]

 
Deadlines



Apply for a Prize Grant to Host a Data Science for Good Event

Inviting community talent on Kaggle’s Datasets and Kernels platforms to achieve progress on non-profit organizations’ data initiatives.

Call for Abstracts | 2018 International Conference on Computational Social Science

Evanston, IL Conference is July 12-15, 2018. “We welcome submissions on any topic in the field of computational social science, including (a) work that advances methods and approaches in computational social science, (b) data-driven work that describes or explains social phenomena, and (c) theoretical work that generates new insights into computational social science research.” Deadline for submissions is February 4, 2018.
 
Tools & Resources



[D] Low entry barrier is destroying deep learning reputation

reddit.com/r/machinelearning


… My company recently assembled a team to try to do “deep learning” that consisted of two PhD statisticians, a ML PhD and a PhD engineer.

The statisticians worked their asses off trying to build clever approximations of distributions involved to better select features.

The ML PhD built a cutting-edge adversarial approach to the problem and wrote it up in tensorflow.

The engineer wrote “from sklearn.ensemble import random_forest” because that’s what she knew would probably do fine.

At the end of the day, the statisticians had wrong assumptions and the adversarial model didn’t have enough data to get anywhere, but the random forest kept on trucking.


Shannon Entropy, Information Gain, and Picking Balls from Buckets

Medium, Udacity, Luis Serrano


Entropy and Information Gain are super important in many areas of machine learning, in particular, in the training of Decision Trees. My goal is to really understand the concept of entropy, and I always try to explain complicated concepts using fun games, so that’s what I do in this post. Enjoy!
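
For readers who want to try the idea before clicking through, here is a minimal sketch of the two quantities the post names. It is not taken from Serrano's article: the function names `entropy` and `information_gain`, and the red/blue bucket used as data, are assumptions made up for illustration. Entropy here is the standard Shannon entropy in bits, and information gain is the parent's entropy minus the size-weighted entropy of the groups a split produces.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # Sum p * log2(1/p) over each class that actually occurs.
    return sum((c / total) * log2(total / c) for c in counts.values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical bucket: two red balls and two blue balls.
bucket = ["red", "red", "blue", "blue"]

# A split that separates the colors perfectly, and one that separates nothing.
perfect_split = [["red", "red"], ["blue", "blue"]]
useless_split = [["red", "blue"], ["red", "blue"]]

print(entropy(bucket))
print(information_gain(bucket, perfect_split))
print(information_gain(bucket, useless_split))
```

A decision tree learner would, roughly speaking, compare candidate splits by a score like this and prefer the one with the larger gain, which is why the perfect split above scores higher than the useless one.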


An On-device Deep Neural Network for Face Detection – Apple

Apple Machine Learning Journal


Apple started using deep learning for face detection in iOS 10. With the release of the Vision framework, developers can now use this technology and many other computer vision algorithms in their apps. We faced significant challenges in developing the framework so that we could preserve user privacy and run efficiently on-device. This article discusses these challenges and describes the face detection algorithm.

 
Careers


Internships and other temporary positions

Graduate applicants with skills in data science



University of British Columbia, Gambling Research Center; Vancouver, Canada

Full-time positions outside academia

Life Sciences Group Leader



Barcelona Supercomputing Center; Barcelona, Spain

Scientific Visualization Engineer



St. Jude Children’s Hospital; Memphis, TN
