Newsletter features journalism, research papers, events, tools/software, and jobs for July 18, 2018
GROUP CURATION: N/A
Data Science News
Company Data Science News
Celmatix is collecting fertility data, partly through voluntary surveys, partly in collaboration with 23andMe in a panel study. Celmatix wants to find and strengthen what scientists currently know about the genetic and environmental contributors to infertility. This is critically important because the age at first parenthood is rapidly moving into women’s mid-30s.
Goldman Sachs got moderately better at predicting recent World Cup outcomes even though the predictions were still mostly wrong. They explained that “forecasts remain highly uncertain, even with the fanciest statistical techniques, simply because football is quite an unpredictable game.” Sooooo….um, I don’t know….how much should investors trust them to predict winners in the infinitely more complex financial markets?
Google has lost three top AI leaders lately: Google Cloud COO Diane Bryant resigned, Senior Director of Engineering Shahriar Rabii left for Facebook, and last April its AI and Search chief, John Giannandrea, moved to Apple. Is this normal churn or is there a larger pattern? With such a small N and no insight into the company, I have no idea.
Apple Music now has more subscribers than Spotify. Surprising. But that’s the power of established brands.
Spotify is rolling out a new feature – Spotify for Artists – to help artists get into playlist rotation.
University Data Science News
Another round in the rivalrous battle between MIT and Harvard revolves around two home security startups. Spoiler alert: Harvard wins. To find out why, you’ll have to read the article. One hint: Service innovation.
Amy Bruckman of Georgia Tech observed the sudden community-cide and subsequent reestablishment of a subreddit with 100,000 members. The founder removed admin privileges from all the other moderators and then shut the whole subreddit down. Was the founding admin technically able to do that? Sort of – reddit eventually reestablished the content. Should he be able to do that? Bruckman says probably not: “Once your creation is launched into the world, it’s no longer really ‘yours.’ It has a life of its own….like a golem.”
And chemist Lee Cronin at the University of Glasgow created an AI system to predict organic synthesis based on a specific set of reagents. Ten percent of the reactions were used to generate a model that was then tested on the remaining 90 percent of the sample. The autonomous system had 86% accuracy, which is decent.
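That evaluation scheme is worth noting: training on only 10% of the labeled reactions and testing on the other 90% is the reverse of the usual 90/10 convention, and it makes the 86% figure more impressive. A minimal sketch of that split, using synthetic stand-in data and a toy nearest-centroid classifier (the features, labels, and model here are placeholders of my own, not Cronin’s):

```python
import random

random.seed(0)

# Synthetic stand-in data: each "reaction" is a 4-number feature vector
# plus a binary label (1 = reaction occurs). Not real reagent descriptors.
features = [[random.gauss(0, 1) for _ in range(4)] for _ in range(1000)]
labeled = [(x, 1 if x[0] + x[1] > 0 else 0) for x in features]

# Split as described in the item: 10% train, 90% held-out test.
random.shuffle(labeled)
cut = len(labeled) // 10
train, test = labeled[:cut], labeled[cut:]

# Placeholder model: classify by nearest class centroid.
def centroid(rows):
    return [sum(x[i] for x in rows) / len(rows) for i in range(4)]

c0 = centroid([x for x, y in train if y == 0])
c1 = centroid([x for x, y in train if y == 1])

def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def predict(x):
    return 0 if dist2(x, c0) < dist2(x, c1) else 1

accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(f"accuracy on the held-out 90%: {accuracy:.2f}")
```

The point of the sketch is only the split: the model sees 100 examples and is judged on 900, so a high score can’t come from memorizing most of the dataset.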
Another “renegade physicist” decided to leave physics for another “interesting” scientific field. This interview with Matt Thomson of Caltech describes how he moved from physics to computational biology. So many physicists end up in so many fields.
The FT has what appears to be an unintentionally satirical article by Tim Harford about economists “leaving their armchairs” and “using administrative data” as a kind of intellectual revolution. I would like to point out that using administrative data does not require leaving one’s physical armchair, nor does it necessarily require leaving one’s ideological soft, soft barcalounger. It is all too easy to find what we think we ought to find in any sufficiently large, complex dataset. In the end, Harford basically says economists are becoming sociologists because sociology has better methods. He wisely values his life, so he didn’t ever use the word sociology.
Springer, the publisher of top science journals like Nature, announced it would end its experimental open access partnership with the Gates Foundation. A paltry 26 papers had been released under Open Access licenses over the 18-month term. Gates paid Springer $100,000 for the first 12 months (and 16 papers), which worked out to $6,250 per paper. Details are scarce. That’s predictable, given Springer’s attitude about sharing.
Climate change will eat the internet according to Paul Barford, a University of Wisconsin-Madison professor of Computer Science. He prepared a new analysis of sea levels and Internet-service-provider locations. Not looking good for Seattle, New York, and Miami.
The University of Pennsylvania will pour $50 million into 10 biotech startups near the university, to support local startups. The school is attempting to compete with powerhouse VCs and incubators in Silicon Valley and New York. I don’t understand this move. Fifty million is quite a small sum when stacked up against what VCs can offer, but a sizable amount for a school to divert from its core education and research mission.
Microsoft and GE are forging a sizable partnership. Microsoft’s Azure cloud computing infrastructure will undergird GE’s internet-of-things devices.
Temple University’s Fox Business School falsified the data it sent to US News and World Report, which angrily/justifiably scrubbed the school from the rankings altogether. The law firm Jones Day conducted an investigation which brought to light very bad, terrible practices. Moshe Porat, the school’s dean for 22 years, was 86’d earlier this month.
Robots, it seems, are everywhere these days. They clean our floors, mow our lawns, make many of our industrial products, and are even being trained to give hugs and serve as pets. As robots increasingly become a part of our daily lives, a growing chorus of commentators warns that they may take away our jobs and further damage the once-great Rust Belt cities that powered the American economy and served as the backbone of the middle class.
But which cities and regions will house the robot revolution? Will the rise of robotics correct or reinforce America’s growing spatial inequality?
A new paper published in the journal Regional Studies dives into the geography of the robotic revolution to identify America’s leading robot metros. The study, by Nancey Leigh and Benjamin Kraft, uses both Census data and data from industry sources such as the International Federation of Robotics and the Robotics Industry Association to identify and map America’s leading centers in the robotics industry.
Researchers are looking to see if they can plow through the vast amount of data that’s gathered in those records, along with insurance billing information, to tease out the bits that could be useful in refining treatments and identifying new uses for drugs.
For instance, when the Food and Drug Administration approves a new drug, doctors often don’t know the optimal dose, the ideal length of treatment or who is most likely to benefit – or be harmed – by the medicine.
Especially when it comes to drugs fast-tracked through the process, “there are so many questions that remain when a drug is approved,” says Dr. Janet Woodcock, who heads the FDA’s drug-approval branch. “Many of the questions are not interesting to academics so they’re not done in the academic setting, and companies are not compelled to do them… so nobody does them. I’ve been raving about this for years!” she said Wednesday at a meeting on the subject convened by the Friends of Cancer Research, a nonprofit organization that builds collaborations between government, industry and advocacy groups.
from O'Reilly Radar; Mike Loukides, Hilary Mason and DJ Patil
This post is the first in a series on data ethics.
The hard thing about being an ethical data scientist isn’t understanding ethics. It’s the junction between ethical ideas and practice. It’s doing good data science.
There has been a lot of healthy discussion about data ethics lately. We want to be clear: that discussion is good, and necessary. But it’s also not the biggest problem we face. We already have good standards for data ethics. The ACM’s code of ethics, which dates back to 1993, is clear, concise, and surprisingly forward-thinking; 25 years later, it’s a great start for anyone thinking about ethics. The American Statistical Association has a good set of ethical guidelines for working with data. So, we’re not working in a vacuum.
And, while there are always exceptions, we believe that most people want to be fair.
A team of Florida State University researchers is using artificial intelligence to identify which among hundreds of thousands of hypothetical crystal structures can result in the prediction of new chemical compounds.
It’s a move that could radically change how chemists identify, test and study potential new compounds for any variety of materials, medicines or other products.
“We were dealing with a lot of data on crystal structures and trying to figure out the best way to help identify new compounds,” said FSU Professor of Chemistry and Biochemistry Michael Shatruk. “The big breakthrough came when my graduate student said, ‘Let a machine figure it out.’”
“Canada is having a moment. It is on the world stage,” said Aaron Rosland, a diplomat in the High Commission of Canada, in London, and the official Ontario government representative in the UK.
Indeed, Canada, he said, is a great place for even UK firms to secure access to the European Union, because of the Comprehensive Economic and Trade agreement (Ceta) set up in 2017. “There is a significant uptick in interest from UK firms because of Ceta, and because it is an entry point to the US, too,” he says.
Rosland was speaking to Computer Weekly at a recent AI summit in London, alongside Daniel Silverman, executive vice-president of Toronto Global, and Jordan Jacobs, a founder of the Vector Institute and co-head of Layer 6 AI at the TD Bank Group, where he is chief AI officer.
from Amy Bruckman, The Next Bison: Social Computing and Culture blog
Does anyone “own” a community? In the real world, if you own the physical building in which a group meets, you can unilaterally shut it down, closing the doors and refusing access. Online, if you run the server that an online community operates on, you can just shut it down. What about running a group on a commercial platform, like a subreddit?
Testing the limits of that, today the founder of the subreddit KotakuInAction (KiA), david-me, decided he had had enough. He didn’t like what the group had become, so he removed powers from all the other moderators and took the group “private” (blocking access to new members). He deleted the code that customized the group. He did this unilaterally.
Can he do that? KiA has nearly 100,000 subscribers. It was created and sustained by the contributions of its members and the work of its volunteer moderators. Can the founder just shut it down?
If david-me were running the server, the answer would be yes—he could shut down the process and walk away. In this case, the server is hosted on a commercial platform, Reddit. The other moderators asked Reddit to restore the group, which it did. But do david-me’s feelings about this matter? We’ll see what the Reddit administrators ultimately decide, but the answer is probably no.
American tech companies and research institutions — involved in the development of artificial intelligence in both the U.S. and China — face elevated ethical questions as the two superpowers race for dominance in the field.
Why it matters: U.S. labs face the real possibility that collaborations with Chinese companies and universities will end up bolstering Beijing’s goal of dominating global civilian and military AI.
How do cells make decisions to divide, to differentiate, or to work together? For former physicist Matt Thomson, these questions drew him out of physics and into the study of computational biology. Thomson recently joined the Caltech faculty as an assistant professor of computational biology. We sat down with him to discuss his work and how he made the leap from physics into the world of the living.
Extra Extra
HBO’s Silicon Valley featured an app called Not Hotdog (a real app) that predicted which food pictures were hotdogs. Not Hotdog has been nominated for a Primetime Emmy and you can read Tim Anglade’s nomination acceptance on – where else? – LinkedIn.
Ah, the frustrating inefficiency of health care in America: “for every 10 physicians providing care, almost seven additional people are engaged in billing-related activities.” Theoretically, data-driven health care could trim inefficiencies. But probably not as much as a single-payer system? 🙂
Mapping the structure of biological networks in the nervous system — a field of study known as connectomics — is computationally intensive. The human brain contains around 86 billion neurons networked through 100 trillion synapses, and imaging a single cubic millimeter of tissue can generate more than 1,000 terabytes of data.
Luckily, artificial intelligence can help.
In a paper (High-Precision Automated Reconstruction of Neurons with Flood-Filling Networks) published in the journal Nature Methods, scientists at Google and the Max Planck Institute of Neurobiology demonstrated a recurrent neural network — a type of machine learning algorithm that’s often used in handwriting and speech recognition — tailor-made for connectomics analysis.
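The “1,000 terabytes per cubic millimeter” figure checks out as a back-of-envelope calculation. The acquisition parameters below are my own assumptions (values typical of serial-section electron microscopy, not taken from the paper): 4 nm × 4 nm pixels, 40 nm section thickness, 1 byte per voxel.

```python
# Back-of-envelope estimate of imaging data per cubic millimeter of tissue.
# Assumed parameters (typical serial-section EM, not from the paper):
# 4 nm x 4 nm pixels, 40 nm sections, 1 byte per voxel.
nm_per_mm = 1_000_000
voxel_nm3 = 4 * 4 * 40                       # assumed voxel volume in nm^3
voxels_per_mm3 = nm_per_mm**3 / voxel_nm3    # voxels in one cubic millimeter
bytes_per_mm3 = voxels_per_mm3 * 1           # at 1 byte per voxel
terabytes = bytes_per_mm3 / 1e12
print(f"~{terabytes:,.0f} TB per mm^3")
```

Under those assumptions the estimate lands around 1,500 TB, consistent with the “more than 1,000 terabytes” quoted above; finer pixels or an extra imaging channel push it higher still.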
https://www.ft.com/content/668a747a-8515-11e8-a29d-73e3d454535d
If Hollywood is to be believed, every mad scientist who ever lived has a laboratory full of bubbling flasks, flashing consoles and glowing orbs. Science writer Philip Ball — who has visited countless research labs — tells me that reality is not so very different: the gear may be more subdued, but the gear is always there.
Science depends on tools, often instruments to detect or measure that which was previously undetectable — think of Galileo’s telescope or Newton’s prisms. Nobel Prizes have often been awarded to the physicists who developed such tools: the cloud chamber (1927); the electron microscope (1986); and LIGO, the laser interferometer gravitational-wave observatory (2017).
Cleveland, OH August 14-17. “Previously named the Biomimicry Summit, this is the 3rd annual bio-inspired symposium held in Cleveland, Ohio that attracts speakers and attendees from across the globe.” [$$$]
San Francisco, CA September 19. “The event features the top Bot entrepreneurs, the hottest startups, and biggest influencers in the Chatbot space.” [$$$]
Vancouver, BC, Canada August 2, “an invited #JSM2018 panel with Patrick Wolfe (@LifeAtPurdue), Jennifer Hill (@NYUDataScience), David Madigan (@DSI_Columbia), @eairoldi (@harvard_data) & @tylermccormick (@uwescience)”
Tokyo, Japan September 21, “the premier forum for industry professionals and executives interested in the role of data and analytics in the Japanese and global sports industry. Join attendees from across Japan and Asia-Pacific.” [$$$]
from University of Michigan, Michigan Institute for Data Science, MIDAS
Ann Arbor, MI November 12. “This day-long symposium will highlight women data science researchers at U-M, provide resources and support for women pursuing careers in data science, a poster session, lunch time round table discussions, a faculty panel, and ample time for networking.” [Registration required]
San Diego, CA February 24-27, 2019. “The Network and Distributed System Security Symposium is a top venue that fosters information exchange among researchers and practitioners of network and distributed system security. The target audience includes those interested in practical aspects of network and distributed system security, with a focus on actual system design and implementation.” Deadline for technical submissions is August 7.
“Over the years I’ve gotten great tips from colleagues and students about the tools that have helped them become more productive researchers. Below is a list of the 44 tools and resources that have changed how I do research.”
“In this article, the first of the series, we will give you an overview of the main challenges you will face scraping product data at scale and the lessons Scrapinghub has learned from scraping 100 billion product pages.”
This talk discusses the Apache Arrow project and its uses for high performance analytics and system interoperability. I discuss the current development initiative and future roadmap as it relates to the Python data community.
How I did it … Years ago, Twitter used to have a command-line interface for performing bulk or automated actions on an account. They abandoned it after a while, so Erik Berlin created a new command-line tool for power users of Twitter, simply called “t”.