Data Science newsletter – October 2, 2017

Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for October 2, 2017

Data Science News



University Data Science News

Steven Englehardt (Princeton Center for Information Technology Policy) looked into email tracking and found that the “privacy risks of email tracking extend far beyond senders knowing when emails are viewed. Opening an email can trigger requests to tens of third parties, and many of these requests contain your email address. This allows those third parties to track you across the web and connect your online activities to your email address, rather than just to a pseudonymous cookie.” With respect to privacy, this is important! The email address is one of the more robust, informative personal identifiers that is widely used and, apparently, readily available to big advertising firms.
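For readers who want to see what "requests contain your email address" can look like in practice, here is a minimal sketch. It is not Englehardt's measurement code. It assumes you have already collected the URLs an email client requested when a message was opened, and it assumes the address may appear either in plain text or as an MD5, SHA-1, or SHA-256 hex digest. The example address and tracker URLs are hypothetical.

```python
import hashlib
from urllib.parse import unquote


def email_tokens(email):
    """Return the normalized address plus hex digests of it.

    The choice of MD5, SHA-1 and SHA-256 is an assumption for this sketch.
    """
    normalized = email.strip().lower()
    tokens = {"plain": normalized}
    for name in ("md5", "sha1", "sha256"):
        digest = hashlib.new(name, normalized.encode("utf-8")).hexdigest()
        tokens[name] = digest
    return tokens


def find_email_leaks(email, request_urls):
    """List (url, encoding) pairs for requests that embed the address."""
    tokens = email_tokens(email)
    leaks = []
    for url in request_urls:
        # Undo percent-encoding so "%40" is compared as "@".
        decoded = unquote(url).lower()
        for encoding, token in tokens.items():
            if token in decoded:
                leaks.append((url, encoding))
    return leaks


# Hypothetical data: one plain-text leak, one hashed leak, one clean request.
address = "reader@example.com"
urls = [
    "https://tracker.example/pixel.gif?e=reader%40example.com",
    "https://ads.example/sync?uid=" + hashlib.md5(address.encode("utf-8")).hexdigest(),
    "https://cdn.example/logo.png",
]
leaks = find_email_leaks(address, urls)
for url, encoding in leaks:
    print(encoding, url)
```

A substring check like this only catches the encodings it is told about, so it would likely undercount leaks that use other transformations.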



The Carnegie Mellon University Department of Statistics is changing its name to the Department of Statistics and Data Science. Importantly, the name change doesn’t reflect a substantive change in what the department offers. Here’s what it sounds like when academics engage in what other industries would do in their marketing departments: “Because our approach to statistics encompasses the essence of what data science is and can be, we have decided to change the name of the department to the Department of Statistics and Data Science. We feel this name more accurately reflects the department’s position on the leading edge of research and education in a data-rich world,” stated Richard Scheines, the Dean of Dietrich, and Christopher Genovese, head of the Department of Statistics and Data Science, in a broadcast email. I am 100% accepting of the value of marketing and would have been fine with an explanation that the department would attract more students and research funding with the term “data science.” Even without the name change, student interest in statistics has skyrocketed: “the number of undergraduate statistics majors is four times the size of what it was in 2010, and twenty times the size of what it was in 2003.”



University of Toronto just opened a very impressive new lab that could unite “the booming field of machine learning with the nascent technology of quantum computing”. Should be hot if it works, but it’s a rigorous challenge.

Human-Computer Interaction researchers at Stanford have things to say about Data Science: something like, “be careful out there.” First Terry Winograd, next R. David Dixon, Jr.

University of Southern California’s Center for Body Computing is partnering with Lyft to study how the ride sharing service can better serve elderly people who are unable to drive their own cars to doctor’s appointments (and elsewhere). This is a kind, friendly, progressive dispatch from the often malignant world of ride sharing companies locked in fierce, at times illegal, competition.



University of Michigan has launched a new precision medicine initiative. The goals are broad, ambitious, and holistic starting with “a baseline of genomic and medical factors and incorporating data from sensors and wearables, weaving in social and environmental factors as well as behavior and lifestyle strategies”. The first project will look at America’s opioid crisis. They are in line with a similar, albeit more entrepreneurially focused, effort at Johns Hopkins.



Elsewhere at the University of Michigan a mobility startup incubator will help advance the school’s existing autonomous vehicle research TechLab.

Clemson University has the AI version of the field of dreams strategy: build a super computer and top researchers will come. So far, it’s working!



DeepMind announced it will form an ethics group, DeepMind Ethics and Society, that will be advised by Columbia University development professor Jeffrey Sachs, Oxford AI professor Nick Bostrom, and climate change campaigner Christiana Figueres. How can I get a seat at that table?



Carly Strasser and Eesha Khare have a new paper out that estimates the effects of mandating that research be published in publicly accessible ways. It’s great to have foundations sharing their thought process and publishing.



And journals are listening. This week the Journal of the American Medical Association announced a new journal: JAMA Network Open. “Our editorial goal is to publish the very best clinical research across all disciplines, serving the worldwide community of investigators and clinicians and meeting the evolving needs and requirements of authors and funders.”



Michael Cohen, the MIT Ph.D. and Simons Foundation fellow, died at age 25 of natural causes earlier this week. Colleagues called him a rising star and phenomenal talent. He will be deeply missed.


You better explain yourself, mister: DARPA’s mission to make an accountable AI

The Register, Dan Robinson


The field of AI has made great strides in the last several years, thanks to developments in machine learning algorithms and deep learning systems based on artificial neural networks (ANNs). Researchers have found that vast sets of example data are the way to train up such systems to produce the desired results, whether that is picking out a face from a photograph or recognising speech input.

But the resultant systems often turn out to operate as an inscrutable “black box” and even their developers find themselves unable to explain why it arrived at a particular decision. That may soon prove unacceptable in areas where an AI’s decisions could have an impact on people’s lives, such as employment, mortgage lending, or self-driving vehicles.

Because of this, a number of organisations as well as DARPA have started to take an interest in making AI systems more accountable, or at least able to explain themselves so that their decision-making processes can be tweaked if necessary.


Department of Statistics and Data Science name change reflects shift in the field

Carnegie Mellon University, The Tartan student newspaper, Nora Mattson


The name change from the Department of Statistics to the Department of Statistics and Data Science was marked by some interesting statistics of its own. Applicants to Dietrich thinking about studying statistics have doubled since 2013, and the number of undergraduate statistics majors is four times the size of what it was in 2010, and twenty times the size of what it was in 2003.

This change was prompted not by an abrupt shift in what the Statistics Department does, but rather was meant to reflect the shifting state of the field. “Because our approach to statistics encompasses the essence of what data science is and can be, we have decided to change the name of the department to the Department of Statistics and Data Science. We feel this name more accurately reflects the department’s position on the leading edge of research and education in a data-rich world,” stated Richard Scheines, the Dean of Dietrich, and Christopher Genovese, head of the Department of Statistics and Data Science in a Dietrich-wide email.

As “big data” becomes more and more central to the work of the Statistics Department, the work done by the department is also on the cutting edge of the technological atmosphere prevalent at Carnegie Mellon, as evidenced by the degree in Statistics and Machine Learning offered jointly by the Department of Statistics and the Department of Machine Learning in the School of Computer Science.


Data Science and AI in Film Production

Medium, Towards Data Science, Debajyoti (Deb) Ray


The root of pre-production is the Script Breakdown. The script is the idea, which needs to be turned into a physical product. This is the stage where all the elements that are necessary to turn the script into an actual physical manifestation are extracted.

Using Natural Language Processing (NLP) methods, End Cue’s Agile Producer platform automatically extracts elements in different categories, like the main characters (and which characters interact in a scene), the dialog for each character, the type of scene, props required, visual or audio effects, and we can estimate the length of the scene and the number of shots.

Off-the-shelf NLP entity-extraction methods won’t do, which is why we trained ours with a human-annotated dataset to learn which entities need to be physically actualized and which ones don’t.
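To make the idea of a script breakdown concrete, here is a small rule-based sketch. It is not End Cue's method, which the article says relies on models trained on human-annotated data. It assumes a conventionally formatted screenplay: scene headings like "INT. KITCHEN - NIGHT", character cues in capitals on their own line, and dialog on the lines that follow. The sample script and all names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed screenplay conventions; real scripts vary more than this.
SCENE_HEADING = re.compile(r"^(INT|EXT)\.\s+(.+?)\s+-\s+(\w+)$")
CHARACTER_CUE = re.compile(r"^[A-Z][A-Z .']+$")


def break_down(script_text):
    """Split a script into scenes with setting, location, time and dialog."""
    scenes = []
    current = None   # scene currently being filled
    speaker = None   # character whose dialog block we are inside
    for raw in script_text.splitlines():
        line = raw.strip()
        if not line:
            speaker = None  # a blank line ends a dialog block
            continue
        heading = SCENE_HEADING.match(line)
        if heading:
            current = {
                "setting": heading.group(1),
                "location": heading.group(2),
                "time": heading.group(3),
                "dialog": defaultdict(list),
            }
            scenes.append(current)
            speaker = None
        elif current is not None and CHARACTER_CUE.match(line):
            speaker = line
        elif current is not None and speaker is not None:
            current["dialog"][speaker].append(line)
        # Anything else is treated as an action line and skipped here.
    return scenes


SAMPLE = """
INT. KITCHEN - NIGHT

Anna pours coffee.

ANNA
Did you read the script?

BEN
Twice.

EXT. STREET - DAY

BEN
Taxi!
"""

scenes = break_down(SAMPLE)
for scene in scenes:
    print(scene["setting"], scene["location"], scene["time"], sorted(scene["dialog"]))
```

Rules like these cannot tell a prop from a passing mention in an action line, which is the kind of judgment the article says requires a trained model.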


Government Data Science News

Joel Clement, the former Interior Department executive, believes he was forced out of his policy position to a job in accounting (for which he is woefully under-qualified) as retaliation for statements about climate change. Clement had detailed how climate change impacts Native Alaskan communities. His boss, Secretary of the Interior Ryan Zinke, is reportedly aiming to trim 4,000 jobs from the department. Zinke upset many on his staff when he surmised, “I got 30 percent of the crew that’s not loyal to the flag”. Odd grammar aside, I find it alarming that this department is slated for such a substantial gutting just at the moment when we need more reports about the local impacts of climate change. Luckily, our friends at the Moore Foundation are funding ecological and sociological research into the Alaskan salmon population and the people who depend on it. Still, private foundations cannot begin to match the funding power of the federal government.

Betsy DeVos, Secretary of Education, faced protesters at Harvard University during a speech in which she called for funding to shift away from public schools to private charter schools. She also faced protesters at George Washington University who were dismayed by her recent reconfiguration of the way sexual harassment claims ought to be handled. I find active, non-violent protest to be a healthy sign of democracy in action. As a public school attendee and champion, I, too, would prefer to see a thoughtful strategy for improving the public school model. Our public schools should be shining examples of American exceptionalism at its best.



Gary Shiffman is a Navy veteran and former Customs and Border Protection chief of staff turned CEO of Giant Oak, a company that uses social media data to detect problems like child trafficking and (I think, but it’s hard to tell) radicalization. Forbes reports that Giant Oak has received several contracts with ICE that amount to $3 million. They are implying that the new legal OK to collect social media data from immigrants [including permanent residents and green card holders] will enable companies like Giant Oak to do what they have been doing, now with government purposes in mind. Previous Giant Oak clients include corporations trying to detect fraud and other inside threats or monitor negative media. Using social media to detect child trafficking, radicalization, and other threats to freedom is a little different – for one thing, many people see these efforts as public goods. Nobody wants another terrorist attack or to see children or adults trafficked for sex or slavery. Civil liberties groups are anxious, worrying that the unintended consequences of these types of social media tracking efforts may amplify or at least tech-wash biases against particular groups (e.g. Muslims and Arabs). Given our current Attorney General, it is healthy that these groups are worrying about the consequences of legally and technologically equipping government agencies with the rights and capacities to track individuals via social media.

The European Union Commission published new guidelines for online platforms “to increase the proactive prevention, detection and removal of illegal content inciting hatred, violence and terrorism”. There are three main pillars: platforms must have organizational apparatus in place, including “trusted flaggers” and points of contact, to speed detection of illegal content; they must remove content “as fast as possible”; and they must have mechanisms in place to prevent reappearance of illegal content.



Virginia is by some measures the leading state for data science employees, though it is not as well served by data science events as New York, California, and Massachusetts. I believe this may be because the university infrastructure in Northern Virginia is not nearly as robust as the university scenes in New York, Boston, and the Bay Area. Huge opportunity for schools in DC and UVa to step up.



Rob Joyce, the White House cyber coordinator, has called for an alternative to the social security number in the wake of the Equifax hack. I’m right there with you, Joyce, and appreciate a government official trying to be accountable in this way. High five.

Lawrence Berkeley National Lab has a state of the art earthquake simulation that can “capture the geology and physics of regional earthquakes” like the temblors produced by the Hayward Fault, which is located below the lab.


Two-thirds of Americans don’t bother seeking out science news

Ars Technica, John Timmer


For a host of issues, like vaccine safety, climate change, and GMO foods, public opinion is a poor match for our scientific standing. That dissonance has led a lot of people to ask how we could do better at getting scientific information out to the public. But the Pew Research Center decided to ask a related question that’s just as important: where’s the public getting its scientific information now?

The answer, disappointingly, is that most people in the US aren’t getting any scientific information at all. For the small number of people who do see significant scientific information, most of it comes from news outlets. Oddly, however, the public is not impressed with its primary source, as fewer than 30 percent of those polled think that news outlets actually get the science right.


Company Data Science News

An AI bot wrote an entire film script that might involve a love triangle, though it is reportedly too incoherent to decipher. It is, however, perhaps no more incoherent than a David Lynch film. It’s not clear to me that this movie represents an AI win for the creative process, but perhaps that’s the point.



Combine ad tech and precision medicine and you get precisely targeted ads for pharmaceuticals. All this has to be done without violating HIPAA, but in the US it is legal to target individuals with ads for prescription drugs, which is not the case in most other countries. “Ask your doctor…” if your Pandora listening history is causing you to receive pharma ads.



A co-founder of Groupon has a new precision medicine start-up, Tempus, that looks at cancer and just scored a $70 million round of Series C funding. Bonus points to anyone who hits reply and connects Groupon to cancer diagnosis and/or treatment because I’m certainly not seeing it.



Nvidia is basically having the best year of all the tech companies and is a clear winner in the AI space. They already dominate the autonomous driving market – a market that will continue to grow for decades to come – and they make basic hardware components likely to find their way into a range of current and future machine learning platforms.

Amazon apparently owes a bunch of back taxes to Luxembourg, according to the EU Commission, which is now headed by Jean-Claude Juncker, who was responsible for inking the deal that allowed the company to evade these taxes in the first place. Awkward!



In other Amazon news, the company acquired Body Labs, a startup working on “true-to-life 3D body models to support various b2b software applications — such as virtually trying on clothes or photorealistic avatars for gaming.”



Google has earbuds that translate conversations in real time. How fricking cool is that?? Will they possibly offer a small amount of research funding to an ethnographer to study the social impact of using the device? Because I would love to investigate. Cybersecurity watchdogs are already sounding alarms about the increased vulnerability associated with voice recognition software.



Google also launched a bunch of other new hardware: new Pixel phones, Google Home Mini and Max, an all-new Pixelbook, and an updated Daydream View headset.

In general, Google won this week’s DS news cycle. There was also an in-depth interview with Google CEO Sundar Pichai and a fruitful chip partnership with Intel.



And… you can now shop at Walmart using Google Home. This is an overt attempt to compete with Amazon and Alexa.

Facebook Research ran an interview with Joelle Pineau, new Director of the company’s Montreal research lab. And Wired took a critical look at how Facebook’s business model intersects with U.S. electoral democracy.


Press Release – Security Union: Commission steps up efforts to tackle illegal content online

European Commission


The Commission is presenting today guidelines and principles for online platforms to increase the proactive prevention, detection and removal of illegal content inciting hatred, violence and terrorism online.


Technology Can Win the Fight Against Gerrymandering

CityLab, Teresa Mathew


Increasingly sophisticated mapping platforms have made redistricting easier—but data could also be the key to untangling the problem.


Cracking Algae’s growth code could help us breed more productive food crops

Anthropocene magazine, Emma Bryce


Algae are phenomenal carbon fixers. In fact, they are responsible for one-third of the carbon fixation that occurs globally. They suck up vast quantities of carbon dioxide, convert it into sugars through photosynthesis, and grow in gigantic blooms over vast water bodies.

Now scientists have cracked the code behind algae’s ability to intensify photosynthesis—and think they can potentially apply it to staple crops such as rice and wheat to enhance their production, too.


Emails Show How An Ivy League Prof Tried To Do Damage Control For His Bogus Food Science

BuzzFeed News, Stephanie M. Lee


Brian Wansink of Cornell University publishes headline-friendly studies about food psychology and oversees a $22 million federally funded program that uses his research to promote “smarter lunchrooms” in nearly 30,000 schools. Emails obtained by BuzzFeed News show how he scrambled to spin allegations that dozens of his studies are all just bologna.


Using patient data for patients’ benefit

The BMJ, Editorial, Amitava Banerjee and David Mathew


The most patient centred approach to data usage is to make best use of available data in a way that patients and the public want and expect. However, there seems to be a mismatch between clinicians and patients in the perception of benefit. A change in culture, and a clear strategy of patient engagement is necessary to reflect the beliefs and values of patients without creating an excessive burden for patients or clinicians, and ultimately to ensure that data really does save lives.


To Meet Health Consumers’ Digital Demands, Think Netflix and Verizon

Health Populi blog, Jane Sarasohn-Kahn



EFF: Stupid patents are dragging down AI and machine learning

Ars Technica, Joe Mullin


Each month, the patent lawyers at the Electronic Frontier Foundation shine a spotlight on one particular patent they believe is a drag on innovation. This month, they’re looking at one of the fastest-growing sectors of technology: machine learning and artificial intelligence.

EFF lawyer Daniel Nazer has picked out an artificial intelligence patent belonging to Hampton Creek, a San Francisco food-tech company that markets products under the brand name “just.” US Patent No. 9,760,834 describes what the company calls its “machine-learning enabled discovery platform” and ways of discovering new ingredients.


Google, Facebook may have to reveal deepest secrets

POLITICO, Nancy Scola and Josh Meyer


The investigations into Russia’s role in the 2016 election are threatening to pry the lid off tech companies’ most prized possessions: the secret inner workings of their online platforms.

As the probes unfold into social media’s role in spreading misinformation, U.S. lawmakers are beginning to show an interest in the mechanics of everything from how Facebook weights news items to how Google ranks search results. The questions, which echo European regulators’ interventionist approach to technology, are a stark change for Silicon Valley companies accustomed to deference from U.S. officials on how they run their operations.

Rep. Adam Schiff of California, the top Democrat on the House Intelligence Committee, warned Sept. 24 about “the use of Facebook’s algorithms and the way it tends to potentially reinforce people’s informational bias.” He added, “This is a far broader issue than Russia, but one that we really need to know more about.”


Are Ad-Tech Companies Pitching Solutions to Problems That Don’t Exist? One Exec Thinks So

Adweek, Lauren Johnson


Digital advertising may be plagued with issues like fraud, brand safety and viewability, but some agency execs say a new crop of tech companies are increasingly trying to capitalize on those problems and solve problems that don’t exist.

During an Advertising Week panel on Thursday, MDC Media Partners and Assembly CEO Martin Cass spoke with execs from ad-tech firms OpenX and AdRoll about challenges in digital transparency and how they affect the agency-client relationship. Over the past year, a number of ad verification and brand-safety software companies have seen an uptick in business from marketers concerned about where their ads are running and how well they are performing.

“We must have come through our doors two or three new solutions a week for a problem that we didn’t know we had and then you discover it really isn’t much of a problem,” Cass said. “There’s an enormous lack of trust because maybe three or four years ago, clients were starting to say, ‘There’s something going on. I don’t know what it is.’ Now, they know what it is or they think they know what it is that is creating a larger degree of angst, and that angst is manifesting itself in clients’ enlargement of their fees.”


Felten joins panel aimed at modernizing labor market

Princeton University, School of Engineering and Applied Science


Princeton Computer Science Prof. Edward Felten will be joined by fellow Princeton professor Alan Krueger on a new task force aimed at transforming America’s labor market to a 21st century, skills-driven model.


New Survey Finds 21 Percent of Americans Report Personal Experience with Medical Errors

Business Wire, NORC at the University of Chicago


The vast majority of Americans are having positive experiences with the health care system, but 21 percent of adults report having personally experienced a medical error, according to a new national survey released today by the IHI/NPSF Lucian Leape Institute and NORC at the University of Chicago. The survey further finds that, when errors do occur, they often have lasting impact on the patient’s physical health, emotional health, financial well-being, or family relationships.

The nationwide survey of more than 2,500 adults was conducted by NORC from May 12-June 26, 2017. The survey expands on a 1997 survey conducted by the National Patient Safety Foundation (NPSF), which merged with the Institute for Healthcare Improvement (IHI) earlier this year.


You’ve Heard About Precision Medicine. Now Get Ready for Precision Drug Ads

KQED Future of You, STAT, Rebecca Robins


The drug industry spends billions each year to promote its medicines to the masses, blanketing popular TV shows and magazines with ads. Now, digital companies are increasingly trying to pry away a share of that money for ads that target specific patients, rather than broad demographics.

Targeted ads are nothing new in retail; anyone who uses the internet has had the eerie feeling that the ads popping up on page after page appear to be aimed directly at them. But drug makers have long steered clear of many such tools, for fear of violating patient privacy law.

That’s changing now.


New SAT scores sow confusion over how to tell a good result

The Washington Post, Nick Anderson


The perfect score of yore — 1600 — is back and just as impressive as ever. But many students could be forgiven these days for puzzling over whether their own SAT scores are good, great or merely okay.

The first national report on the revised SAT shows the confusion that results when a familiar device for sorting college-bound students is recalibrated and scores on the admission test suddenly look a bit better than they actually are.

 
Events



Machine Learning and the Market for Intelligence Conference

University of Toronto, Rotman School of Management


Toronto, Canada October 26. “A conference dedicated to exploring the commercial opportunities and socio-economic implications arising due to advances in machine intelligence” [$$$]


Data Science Roundtables – Smarter Health and Our Future

South Big Data Hub


Chapel Hill, NC, and Online Wednesday, October 18 starting at 12 noon. Organized by South Big Data Hub. [rsvp required]


Federal Trade Commission, Informational Injury Workshop

Federal Trade Commission


Washington, DC December 12. “The workshop will address questions such as how to best characterize these injuries, how to accurately measure such injuries and their prevalence, and what factors businesses and consumers consider when evaluating the tradeoffs between providing information and potentially increasing their exposure to injuries.” [free, registration available soon]

 
Deadlines



Call for Proposals: IRIS Researcher Awards

Up to $15,000 for dissertation awards and up to $30,000 for early career and established researcher awards will be disbursed to the recipient’s institution. Deadline for applications is November 1.

Marron Institute Seed Grants

The Marron Institute of Urban Management will offer grants of up to $50,000 over a two-year period to NYU faculty to provide seed funding for innovative scholarship and applied research projects that address pressing problems faced by cities and urban residents. Deadline for proposals is December 1.
 
Tools & Resources



A Fresh Approach to Technical Computing // Viral Shah & Stefan Karpinski, Julia Computing

YouTube, Data Driven NYC


Viral Shah and Stefan Karpinski, Founder/CEO and CTO at Julia Computing, spoke at Data Driven NYC on September 2. They discussed the creation of the Julia programming language and the products they’re developing to make Julia easy to use, deploy and scale.


Assessment of the need for separate test set and number of medical images necessary for deep learning: a sub-sampling study

bioRxiv; Ariel Rokem, Yue Wu, Aaron Y Lee


“We show that a moderate amount of data (data from approximately 1,800 patients) may be enough to reach close-to-maximal performance in the classification of AMD patients from OCT images. These results suggest that deep learning algorithms can be trained on moderate amounts of data, provided that images are relatively homogenous, and the effective number of parameters is sufficiently small. Furthermore, we demonstrate that in this application, cross-validation with a separate test set that is not used in any part of the training does not differ substantially from cross-validation with a validation data-set used to determine the optimal stopping point for training.”
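The general shape of a sub-sampling study can be sketched in a few lines: train on random subsets of increasing size and score each model on the same held-out data. This is only an illustration of that idea, not the authors' pipeline. It swaps the deep network and OCT images for synthetic one-feature data and a simple threshold classifier, and every function name, subset size, and repeat count below is an assumption made for the example.

```python
import random


def make_data(n, rng):
    """Synthetic two-class data: one noisy feature per example."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data


def fit_threshold(train):
    """Midpoint between the class means; a stand-in for a real model."""
    means = []
    for cls in (0, 1):
        xs = [x for x, y in train if y == cls]
        # Crude fallback if a tiny subset happens to miss one class.
        means.append(sum(xs) / len(xs) if xs else 0.0)
    return (means[0] + means[1]) / 2.0


def accuracy(threshold, data):
    """Fraction of examples on the correct side of the threshold."""
    hits = sum((x > threshold) == bool(y) for x, y in data)
    return hits / len(data)


def learning_curve(train, test, sizes, repeats, rng):
    """Mean held-out accuracy for each training-subset size."""
    curve = []
    for size in sizes:
        scores = []
        for _ in range(repeats):
            subset = rng.sample(train, size)  # sample without replacement
            scores.append(accuracy(fit_threshold(subset), test))
        curve.append((size, sum(scores) / repeats))
    return curve


rng = random.Random(0)  # fixed seed so runs are repeatable
train = make_data(2000, rng)
test = make_data(1000, rng)
curve = learning_curve(train, test, sizes=[10, 50, 250, 1000], repeats=20, rng=rng)
for size, acc in curve:
    print(size, round(acc, 3))
```

If accuracy stops improving well before the largest subset, that suggests the task needs only a moderate amount of data, which is the kind of claim the abstract makes. The sketch does not cover the paper's second question, whether a separate test set is needed in addition to a validation set.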


Chihuahua Or Muffin? Searching For The Best Computer Vision API

Business 2 Community, Mariya Yao


You’ve probably seen this internet meme demonstrating the alarming resemblance of chihuahuas and muffins. Everyone in the AI industry (including myself) loves putting the image in their presentations.

But, one question I haven’t seen anyone answer rigorously is: just how good IS modern AI at disambiguating between a chihuahua and a muffin? For your entertainment and education, I’ll be investigating this question today.


Introduction to package mapsapi

Michael Dorman


The mapsapi package provides an R interface to the Google Maps APIs.

 
Careers


Tenured and tenure track faculty positions

STS Data Studies – Assistant/Associate Professor



University of California-Davis; Davis, CA

Full-time, non-tenured academic positions

CIRES/Earth Lab Data-Intensive Course Developer/Instructor



University of Colorado; Boulder, CO

Program Directors (3)



NYU, Marron Institute of Urban Management; New York, NY

Internships and other temporary positions

Summer Intern, Quantitative Analysis



New York Yankees; Bronx, NY
