Data Science newsletter – July 18, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for July 18, 2018


 
 
Data Science News



Company Data Science News

Celmatix is collecting fertility data, partly through voluntary surveys, partly in collaboration with 23andMe in a panel study. Celmatix wants to find and strengthen what scientists currently know about the genetic and environmental contributors to infertility. This is critically important because the age at first parenthood is rapidly moving into women’s mid-30s.



Goldman Sachs got moderately better at predicting recent World Cup outcomes even though the predictions were still mostly wrong. They explained that “forecasts remain highly uncertain, even with the fanciest statistical techniques, simply because football is quite an unpredictable game.” Sooooo….um, I don’t know….how much should investors trust them to predict winners in the infinitely more complex financial markets?

Google has lost three top AI leaders lately: Google Cloud COO Diane Bryant resigned, Senior Director of Engineering Shahriar Rabii left for Facebook, and last April its AI and Search chief, John Giannandrea, moved to Apple. Is this normal churn or is there a larger pattern? With such a small N and no insight into the company, I have no idea.



Apple Music now has more subscribers than Spotify. Surprising. But that’s the power of established brands.



Spotify is rolling out a new feature – Spotify for Artists – to help artists get into playlist rotation.


University Data Science News

Another round in the rivalrous battle between MIT and Harvard revolves around two home security startups. Spoiler alert: Harvard wins. To find out why, you’ll have to read the article. One hint: Service innovation.



Amy Bruckman of Georgia Tech observed the sudden community-cide and subsequent reestablishment of a subreddit with 100,000 members. The founder removed admin privileges from all the other moderators and then shut the whole subreddit down. Was the founding admin technically able to do that? Sort of – reddit eventually reestablished the content. Should he be able to do that? Bruckman says probably not: “Once your creation is launched into the world, it’s no longer really ‘yours.’ It has a life of its own….like a golem.”



Florida State University chemists are using a neural network approach to predict which hypothetical crystal structures will be able to form stable chemical compounds.



And chemist Lee Cronin at the University of Glasgow created an AI system to predict organic synthesis based on a specific set of reagents. Ten percent of the reactions were used to generate a model that was then tested on the remaining 90 percent of the sample. The autonomous system had 86% accuracy, which is decent.



Another “renegade physicist” decided to leave physics for another “interesting” scientific field. This interview with Matt Thomson of Caltech describes how he moved from physics to computational biology. So many physicists end up in so many fields.

The FT has what appears to be an unintentionally satirical article by Tim Harford about economists “leaving their armchairs” and “using administrative data” as a kind of intellectual revolution. I would like to point out that using administrative data does not require leaving one’s physical armchair, nor does it necessarily require leaving one’s ideological soft, soft barcalounger. It is all too easy to find what we think we ought to find in any sufficiently large, complex dataset. In the end, Harford basically says economists are becoming sociologists because sociology has better methods. He wisely values his life, so he didn’t ever use the word sociology.

Springer, the publisher of top science journals like Nature, announced it would end its open access experimental partnership with the Gates Foundation. A paltry 26 papers had been released under Open Access licenses over the 18-month term. Gates paid Springer $100,000 for the first 12 months (and 16 papers), which worked out to $6250 per paper. Details are scarce. That’s predictable, given Springer’s attitude about sharing.



Climate change will eat the internet according to Paul Barford, a University of Wisconsin-Madison professor of Computer Science. He prepared a new analysis of sea levels and Internet-service-provider locations. Not looking good for Seattle, New York, and Miami.



The University of Pennsylvania will pour $50 million into 10 biotech startups near the university. The school is attempting to compete with powerhouse VCs and incubators in Silicon Valley and New York. I don’t understand this move. Fifty million is quite a small sum when stacked up against what VCs can offer, but a sizable amount for a school to divert from its core education and research mission.



Two computer scientists from Stanford make the point that social scientists and not-such-elite computer scientists have been making for several years: AI can be sexist and racist because too much attention is paid to models and not enough to the underlying bias in the datasets. Social scientists and not-such-elite computer scientists get to publish their thoughts on personal blogs and Medium. Stanford computer scientists publish in Nature. Snark aside, I’m glad to see the article.



The first data science master’s program based in Africa – in this case, Rwanda – is launching this year with support from Google and Facebook.



Microsoft and GE are forging a sizable partnership. Microsoft’s Azure cloud computing infrastructure will undergird GE’s internet-of-things devices.



Temple University’s Fox Business School falsified the data it sent to US News and World Report, which angrily/justifiably scrubbed the school from the rankings altogether. The law firm Jones Day conducted an investigation, which brought to light very bad, terrible practices. Moshe Porat, the school’s dean for 22 years, was 86’d earlier this month.


Inside Facebook, Twitter and Google’s AI battle over your social lives

CNET, Alfred Ng



From stamping out trolls to removing fake bot accounts, here’s how social networks are waging war using AI weapons.


Where Is the Home of America’s Robot Revolution?

CityLab, Richard Florida



Robots, it seems, are everywhere these days. They clean our floors, mow our lawns, make many of our industrial products, and are even being trained to give hugs and serve as pets. As robots increasingly become a part of our daily lives, a growing chorus of commentators warns that they may take away our jobs and further damage the once great Rust Belt cities that once powered the American economy and served as the backbone of the middle class.

But which cities and regions will house the robot revolution? Will the rise of robotics correct or reinforce America’s growing spatial inequality?

A new paper published in the journal Regional Studies dives into the geography of the robotic revolution to identify America’s leading robot metros. The study, by Nancey Leigh and Benjamin Kraft, uses both Census data and data from industry sources such as the International Federation of Robotics and the Robotics Industry Association to identify and map America’s leading centers in the robotics industry.


Mining Electronic Medical Records To Enhance Use Of Approved Drugs

NPR, Shots blog, Richard Harris



Researchers are looking to see if they can plow through the vast amount of data that’s gathered in those records, along with insurance billing information, to tease out the bits that could be useful in refining treatments and identifying new uses for drugs.

For instance, when the Food and Drug Administration approves a new drug, doctors often don’t know the optimal dose, the ideal length of treatment or who is most likely to benefit – or be harmed – by the medicine.

Especially when it comes to drugs fast-tracked through the process, “there are so many questions that remain when a drug is approved,” says Dr. Janet Woodcock, who heads the FDA’s drug-approval branch. “Many of the questions are not interesting to academics so they’re not done in the academic setting, and companies are not compelled to do them… so nobody does them. I’ve been raving about this for years!” she said Wednesday at a meeting on the subject convened by the Friends of Cancer Research, a nonprofit organization that builds collaborations between government, industry and advocacy groups.


Doing good data science

O'Reilly Radar; Mike Loukides, Hilary Mason and DJ Patil



This post is the first in a series on data ethics.

The hard thing about being an ethical data scientist isn’t understanding ethics. It’s the junction between ethical ideas and practice. It’s doing good data science.

There has been a lot of healthy discussion about data ethics lately. We want to be clear: that discussion is good, and necessary. But it’s also not the biggest problem we face. We already have good standards for data ethics. The ACM’s code of ethics, which dates back to 1993, is clear, concise, and surprisingly forward-thinking; 25 years later, it’s a great start for anyone thinking about ethics. The American Statistical Association has a good set of ethical guidelines for working with data. So, we’re not working in a vacuum.

And, while there are always exceptions, we believe that most people want to be fair.


FSU researchers use artificial intelligence to identify, predict new chemical compounds

Florida State University News



A team of Florida State University researchers is using artificial intelligence to identify which among hundreds of thousands of hypothetical crystal structures can result in the prediction of new chemical compounds.

It’s a move that could radically change how chemists identify, test and study potential new compounds for any variety of materials, medicines or other products.

“We were dealing with a lot of data on crystal structures and trying to figure out the best way to help identify new compounds,” said FSU Professor of Chemistry and Biochemistry Michael Shatruk. “The big breakthrough came when my graduate student said, ‘Let a machine figure it out.’”


Canada raises banner of AI to boost tech industry

ComputerWeekly, Brian McKenna



“Canada is having a moment. It is on the world stage,” said Aaron Rosland, a diplomat in the High Commission of Canada, in London, and the official Ontario government representative in the UK.

Indeed, Canada, he said, is a great place for even UK firms to secure access to the European Union, because of the Comprehensive Economic and Trade agreement (Ceta) set up in 2017. “There is a significant uptick in interest from UK firms because of Ceta, and because it is an entry point to the US, too,” he says.

Rosland was speaking to Computer Weekly at a recent AI summit in London, alongside Daniel Silverman, executive vice-president of Toronto Global, and Jordan Jacobs, a founder of the Vector Institute and co-head of Layer 6 AI at the TD Bank Group, where he is chief AI officer.


Does anyone “own” a community? Today’s drama on Kotaku in Action

Amy Bruckman, The Next Bison: Social Computing and Culture blog



Does anyone “own” a community? In the real world, if you own the physical building in which a group meets, you can unilaterally shut it down, closing the doors and refusing access. Online, if you run the server that an online community operates on, you can just shut it down. What about running a group on a commercial platform, like a subreddit?

Testing the limits of that, today the founder of the subreddit KotakuInAction (KiA), david-me, decided he had had enough. He didn’t like what the group had become, so he removed powers from all the other moderators and took the group “private” (blocking access to new members). He deleted the code that customized the group. He did this unilaterally.

Can he do that? KiA has nearly 100,000 subscribers. It was created and sustained by the contributions of its members and the work of its volunteer moderators. Can the founder just shut it down?

If david-me was running the server, the answer is yes—he could shut down the process and walk away. In this case, the server is hosted on a commercial platform, Reddit. The other moderators asked Reddit to restore the group, which they did. But do David’s feelings about this matter? We’ll see what the Reddit administrators decide. But the answer to that is probably no.

And that’s probably the right answer.


Ed Simcox named HHS Chief Technology Officer

The HHS IDEA Lab



Ed Simcox has been promoted to Chief Technology Officer (CTO) at HHS. Ed had been serving as Acting CTO and was previously the Deputy CTO.


The devil’s bargain for AI companies working in China

Axios, Kaveh Waddell



American tech companies and research institutions — involved in the development of artificial intelligence in both the U.S. and China — face elevated ethical questions as the two superpowers race for dominance in the field.

Why it matters: U.S. labs face the real possibility that collaborations with Chinese companies and universities will end up bolstering Beijing’s goal of dominating global civilian and military AI.


A Conversation with Matt Thomson

Caltech, Now@Caltech



How do cells make decisions to divide, to differentiate, or to work together? For former physicist Matt Thomson, these questions drew him out of physics and into the study of computational biology. Thomson recently joined the Caltech faculty as an assistant professor of computational biology. We sat down with him to discuss his work and how he made the leap from physics into the world of the living.


Extra Extra

HBO’s Silicon Valley featured an app called Not Hotdog (a real app) that predicted which food pictures were hotdogs. Not Hotdog has been nominated for a Primetime Emmy and you can read Tim Anglade’s nomination acceptance on – where else? – LinkedIn.

Ah, the frustrating inefficiency of health care in America: “for every 10 physicians providing care, almost seven additional people are engaged in billing-related activities.” Theoretically, data-driven health care could trim inefficiencies. But probably not as much as a single-payer system? 🙂

For the past year or so every week has been Shark Week along the California coast.


Google researchers create AI that maps the brain’s neurons

VentureBeat, Kyle Wiggers



Mapping the structure of biological networks in the nervous system — a field of study known as connectomics — is computationally intensive. The human brain contains around 86 billion neurons networked through 100 trillion synapses, and imaging a single cubic millimeter of tissue can generate more than 1,000 terabytes of data.

Luckily, artificial intelligence can help.

In a paper (High-Precision Automated Reconstruction of Neurons with Flood-Filling Networks) published in the journal Nature Methods, scientists at Google and the Max Planck Institute of Neurobiology demonstrated a recurrent neural network — a type of machine learning algorithm that’s often used in handwriting and speech recognition — tailor-made for connectomics analysis.


Data impel economists to leave their armchairs at last

FT.com, Tim Harford



https://www.ft.com/content/668a747a-8515-11e8-a29d-73e3d454535d

If Hollywood is to be believed, every mad scientist who ever lived has a laboratory full of bubbling flasks, flashing consoles and glowing orbs. Science writer Philip Ball — who has visited countless research labs — tells me that reality is not so very different: the gear may be more subdued, but the gear is always there.

Science depends on tools, often instruments to detect or measure that which was previously undetectable — think of Galileo’s telescope or Newton’s prisms. Nobel Prizes have often been awarded to the physicists who developed such tools: the cloud chamber (1927); the electron microscope (1986); and LIGO, the laser interferometer gravitational-wave observatory (2017).

What, then, of economics?

 
Events



Biocene 2018

Ohio Aerospace Institute



Cleveland, OH August 14-17. “Previously named the Biomimicry Summit, this is the 3rd annual bio-inspired symposium held in Cleveland, Ohio that attracts speakers and attendees from across the globe.” [$$$]


Chatbot Conference

Chatbot's Life



San Francisco, CA September 19. “The event features the top Bot entrepreneurs, the hottest startups, and biggest influencers in the Chatbot space.” [$$$]


Foundation or Backdrop? – the Role of Statisticians in Academic Data Science Initiatives

Twitter, Tim Zheng



Vancouver, BC, Canada August 2, “an invited #JSM2018 panel with Patrick Wolfe (@LifeAtPurdue), Jennifer Hill (@NYUDataScience), David Madigan (@DSI_Columbia), @eairoldi (@harvard_data) & @tylermccormick (@uwescience)”


The Japan Sports Analytics Conference

KPMG



Tokyo, Japan September 21, “the premier forum for industry professionals and executives interested in the role of data and analytics in the Japanese and global sports industry. Join attendees from across Japan and Asia-Pacific.” [$$$]


Women in Big Data at Michigan Symposium

University of Michigan, Michigan Institute for Data Science, MIDAS



Ann Arbor, MI November 12. “This day-long symposium will highlight women data science researchers at U-M, provide resources and support for women pursuing careers in data science, a poster session, lunch time round table discussions, a faculty panel, and ample time for networking.” [Registration required]

 
Deadlines



NDSS 2019 Call for Papers – NDSS Symposium

San Diego, CA February 24-27, 2019. “The Network and Distributed System Security Symposium is a top venue that fosters information exchange among researchers and practitioners of network and distributed system security. The target audience includes those interested in practical aspects of network and distributed system security, with a focus on actual system design and implementation.” Deadline for technical submissions is August 7.
 
Tools & Resources



Humane/AI

Roya Pakzad



Humane AI is a newsletter about human rights impacts of Artificial Intelligence.


44 tools and resources for social scientists

UsableInk, shariqueorg



“Over the years I’ve gotten great tips from colleagues and students about the tools that have helped them become more productive researchers. Below is a list of the 44 tools and resources that have changed how I do research.”


For E-Commerce Data Scientists: Lessons Learned Scraping 100 Billion Products Pages

THE SCRAPINGHUB BLOG, Ian Kerins



“In this article, the first of the series, we will give you an overview of the main challenges you will face scraping product data at scale and the lessons Scrapinghub has learned from scraping 100 billion product pages.”


Apache Arrow: A Cross-Language Development Platform for in Memory Data | SciPy 2018 | Wes McKinney

YouTube, Enthought



This talk discusses the Apache Arrow project and its uses for high performance analytics and system interoperability. I discuss the current development initiative and future roadmap as it relates to the Python data community.


Unfollowing Everybody

Anil Dash



How I did it … Years ago, Twitter used to have a command-line interface for performing bulk or automated actions on an account. They abandoned it after a while, so Erik Berlin created a new command-line tool for power users of Twitter, simply called “t”.

 
Careers


Postdocs

Post Doc Position



University of Pennsylvania, Department of Psychology; Philadelphia, PA

Interdisciplinary Post-Doc in Drones, GIS, and Remote Sensing



University of Central Florida, Department of Sociology; Orlando, FL
