Data Science newsletter – May 24, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for May 24, 2018

Data Science News

PwC unveils new data science Graduate Apprenticeship with Edinburgh and St Andrews Universities

SBNN, Scottish Business News Network

from May 24, 2018

Workers of the future will be able to get a degree in data science through a Graduate Apprenticeship in a first for Scotland.

Global accounting and business services giant PwC has joined forces with the Universities of Edinburgh and St Andrews to offer the new work-based learning route to a BSc data science degree.

Successful AI needs smart people behind it

Raconteur magazine, Oliver Pickup

from May 23, 2018

Consider that

, albeit in restricted conditions, and this is the first death. The latest National Safety Council statistics reveal that, on average, 110 people suffer non-autonomous traffic-related fatalities in America every day. Indeed, there were 40,100 such deaths last year and 6,000 of them were pedestrians. By allowing Artificial Intelligence, which can’t be drunk, distracted, tired or influenced by road rage and other emotions, to take the wheel, those figures could, in theory, be significantly dented.

“Human drivers are committing a holocaust, killing 1.2 million people every year on roads around the world and maiming another 50 million or so,” says Calum Chace, author of The Economic Singularity. “Road accidents are the most common form of death for people aged between 15 and 29. The sooner we can stop this carnage the better.”

The Uber incident serves to highlight the disparity between the current levels of expectation for AI and the reality of its relative maturity. For business leaders, education is essential to narrow that gap.

Arm Gives Glimpse of AI Core

EE Times, Rick Merritt

from May 22, 2018

Arm sketched the inner workings of its machine-learning core at a press and analyst event here. Engineers are nearly finished with RTL for the design with hopes of snagging a commitment within weeks for use in a premium smartphone in 2019 or later.

Analysts generally praised the architecture as a flexible but late response to a market that is already crowded with dozens of rivals. However, Arm still needs to show detailed performance and area numbers for the core, which may not see first silicon until next year.

The first core is aimed at premium smartphones that are already using AI accelerator blocks from startup DeePhi (Samsung Galaxy), Cambricon (Huawei Kirin), and in-house designs (iPhone). The good news for Arm is that it’s already getting some commercial traction for the neural-networking software for its cores, released as open source, that sits under frameworks such as TensorFlow.

University Data Science News

The Howard Hughes Medical Institute announced 19 new fellows, each of whom will receive $8m over 7 years to pursue their research. The HHMI fellowships are among the most prestigious research awards. Twenty-eight HHMI fellows have gone on to win Nobel prizes. Their work promises to advance human health and the sustainability of human life on this planet.

A new paper on the arXiv, Citation Count Analysis for Papers with Preprints, looks at the citation impact of publishing computer science pre-prints on the arXiv. The authors report, “papers submitted to arXiv before acceptance have, on average, 65% more citations in the following year compared to papers submitted after,” though it should be noted that they are only looking at papers eventually accepted at top-tier CS conferences. That is a serious limitation to making any broader comparison to CS paper citation counts overall or to conducting causal inference.

Speaking of careers in computer science (CS), a new paper looks at decades of CS PhDs’ careers to see how many stayed in academia, how many left, where they ended up, and who ended up coming back to the ivory tower. Very interesting and possibly maps out a career trajectory pattern that will be replicated by the much younger field of data science.

The University of Pittsburgh is opening a new center for modeling complex systems: The Modeling and Managing Complicated Systems Institute (MoMaCS). It will be in the School of Computing and Information and will address problems ranging from food insecurity to cancer treatment efficacy. Carnegie Mellon University tends to capture the most attention from the scholarly computer science community in Pittsburgh, but the University of Pittsburgh is strengthening its data science assets. Their director used the phrase “deliberately disruptive” to describe the purpose of the center, which made me grumpy. Can we all agree to stop using the word ‘disruptive’ as an indicator of technological advance?

Nate Silver gave the commencement address at Kenyon College. He won my heart by focusing on the critical thinking and moral judgment that make good data science successful. Speaking from personal experience, he noted “the more experience you have in working with different data sets, the more you’ll realize that the correct interpretation of the data is rarely obvious, and that the obvious-seeming interpretation isn’t always correct.” I like his advice to be cautious and questioning far better than the idea of “deliberate disruption.”

Amar Shah and Alex Kendall, both PhDs from the University of Cambridge, have joined forces in a new self-driving car company called Wayve. They plan to use unsupervised learning, at least from what I can tell, and are (remarkably) confident that they will be able to beat Google’s Waymo project and other more established self-driving car projects.

A big competitor for them is a new partnership between Apple and Volkswagen to jointly develop driverless vehicles. This comes after BMW and Mercedes-Benz turned down Apple’s partnership offer. The first project will be to introduce self-driving shuttles on Apple’s campus.

Helsinki University is offering a free online artificial intelligence course. So far, 24,000 people have signed up. Note that UC-Berkeley is offering a $357 Foundations of Data Science certificate. They don’t quite have as many enrollees as the entire nation of Finland, but they’re close. (Berkeley peeps, do you have up-to-date enrollment info for the edX data8 class?)

Physicist turned digital humanist Simon DeDeo, now a professor at Princeton, used machine reading to examine 40,000 transcripts of speeches given in Parliament during the French Revolution. He found that influential comments were more frequently made in committee than previously recognized. File this one under: physicists got skillz.

A consortium of big state schools, led by the University of Nebraska-Lincoln, is lobbying Washington to release more funding to keep major advances in agriculture trucking along. Apparently, research funding for agriculture is part of the hotly contested farm bill and has been shrinking for some time.

Ethan Gutman, lead author, and his colleagues at the National Center for Atmospheric Research used computer simulations of 22 previous hurricanes under projected late-21st-century climate conditions to show that we can expect stronger, slower, wetter hurricanes. Get that flood insurance while you still can, East/Gulf Coast readers. Here’s some dramatic coverage of the second 1,000-year flood to hit Ellicott City, MD in two years. No hurricane needed.

China’s city of Tianjin to set up $16-billion artificial intelligence fund

Reuters

from May 17, 2018

China’s northern port city of Tianjin announced plans on Thursday to set up funds worth 100 billion yuan ($16 billion) to support the artificial intelligence industry, official news agency Xinhua said.

China aims to become a world leader in artificial intelligence by 2025, taking on U.S. dominance in the sector amid heightened international tension over military applications of the technology.

Finland offers free online Artificial Intelligence course to anyone, anywhere

Yle Uutiset

from May 15, 2018

Helsinki University hopes that one percent of the Finnish population – some 54,000 people – will take the online course this year. So far 24,000 have signed up.

The Future of AI Depends on High-School Girls

The Atlantic, Lauren Smiley

from May 23, 2018

Artificial intelligence is considered the major driver of what’s known as the fourth industrial revolution (after the steam engine, electricity and mass production, and the digital eras), with major tech companies like Google, Facebook, Amazon, and Microsoft realigning around it. Algorithms are driving ever more real-world decisions: helping doctors detect cancer; suggesting who should be released from jail, interviewed for a job, or get a loan. While some high-profile technologists, such as Elon Musk, have expressed fears that AI could become an existential threat to humanity, others in the field have identified a more immediate concern: far from some God-like omniscience, AI can be as biased and fallible as the humans who build it. AI has already made embarrassing mistakes, like when Google Photos auto-tagged pictures of two black people as gorillas earlier this year because the algorithm, it seems, wasn’t good at correctly labeling some non-white faces. An Uber self-driving car killed a pedestrian in Arizona. While women were fighting for full sexual agency in the real world, mostly male roboticists were creating AI-enhanced mostly female sexbots. Bringing people like Stephanie Tena into artificial intelligence is not simply important for the tech industry; in a world increasingly driven by algorithms, it’s important for all of us.

A.I. for Healthcare Gets Even More Crowded as GV Invests in Owkin

Xconomy, David Holley

from May 23, 2018

Owkin announced its initial $11 million Series A round in January, led by Otium Venture with help from Cathay Innovation, Plug and Play, and NJF Capital.

The company is part of a massive wave of startups that are trying to organize and interpret the massive and growing pool of biomedical data—and it’s no surprise that big businesses are also looking for their share of the pie. IBM (NYSE: IBM) and GE (NYSE: GE) have been leaders in the healthcare data business, spending billions of dollars on partnerships and acquisitions, such as IBM’s buyout of Ann Arbor, MI-based Truven Health Analytics and GE’s collaborations with UC San Francisco’s Center for Digital Health Innovation, Boston Children’s Hospital, and others. And drug companies are investing in data analytics businesses; see, for example, Celgene’s work with GNS Healthcare on its software for predicting whether treatments will work for specific patients.

When the humanities meet big data

Christian Science Monitor, Eoin O'Carroll

from May 16, 2018

Being a voracious reader is a prerequisite for academics in the humanities, but even the most dedicated bookworm needs to eat, sleep, and socialize.

Not so for computers, which are known for being tireless, thorough, and very fast. And, when asked the right kinds of questions, these electronic speed-readers can grasp patterns that would otherwise lie beyond the reach of human scholars.

That’s exactly what happened when a team of researchers used machine-learning techniques to plow through transcripts of 40,000 speeches in a parliamentary assembly during the first two years of the French Revolution, according to a paper published in the Proceedings of the National Academy of Sciences last month. By quantifying the novelty of speech patterns and the extent to which those patterns were copied by subsequent speakers, the researchers illustrated how much of the important intellectual work of the revolution was initially carried out in committees, rather than in the whole assembly.
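To make "quantifying the novelty of speech patterns" a little more concrete, here is a minimal sketch of one way such a measure could work. It is not the researchers' actual pipeline: it assumes each speech can be reduced to a smoothed word-frequency distribution and that "novelty" and "copied by subsequent speakers" can be approximated with Kullback-Leibler divergence against the previous and next speech. The function names and the toy speeches are hypothetical.

```python
import math
from collections import Counter

def word_distribution(text, vocabulary, smoothing=1.0):
    """Smoothed word-frequency distribution of one speech over a shared vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) + smoothing * len(vocabulary)
    return {w: (counts[w] + smoothing) / total for w in vocabulary}

def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; both dicts share the same keys."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)

def novelty_and_transience(speeches, index):
    """Assumed definitions for illustration only:
    novelty    = how much speech `index` diverges from the speech before it,
    transience = how much it diverges from the speech after it
                 (low transience means later speakers kept the pattern)."""
    vocabulary = sorted({w for s in speeches for w in s.lower().split()})
    dists = [word_distribution(s, vocabulary) for s in speeches]
    novelty = kl_divergence(dists[index], dists[index - 1])
    transience = kl_divergence(dists[index], dists[index + 1])
    return novelty, transience

# Toy example: the second speech breaks with the first and is echoed by the third.
toy_speeches = [
    "bread taxes bread",
    "liberty rights liberty",
    "liberty rights liberty",
]
novelty, transience = novelty_and_transience(toy_speeches, 1)
print(f"novelty={novelty:.3f} bits, transience={transience:.3f} bits")
```

A speech that scores high on novelty and low on transience is one that introduced a new pattern which then stuck, which is the kind of influence the article describes.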

A focus on cost instead of value threatens future of personalized medicine

STAT, Edward Abrahams

from May 23, 2018

One size fits all never worked as a strategy for the fashion industry. Or the auto, furniture, and most other industries. It shouldn’t be the strategy for medicine either, but unfortunately more often than not it is. And it may stay that way if the focus on cost rather than value precludes the improved health care strategies made possible by personalized medicine.

According to the World Health Organization, an effective health system requires “reliable information on which to base decisions and policies.” The evolving field of personalized medicine delivers just that kind of information in the form of data about the unique biological characteristics of each patient. Doctors can often use that information to make medical decisions based on an understanding of how a patient will respond at a molecular level to a specific therapy. This helps ensure that therapies are prescribed only to those who will benefit from them, sparing side effects and expenses to those who will not.

Although the majority of patients say they aren’t yet aware of personalized medicine, they also say they don’t want to miss out on this opportunity.

Hurricanes: Stronger, slower, wetter in the future?

National Science Foundation

from May 21, 2018

Scientists have developed a detailed analysis of how 22 recent hurricanes would be different if they formed under the conditions predicted for the late 21st century.

While each storm’s transformation would be unique, on balance, the hurricanes would become a little stronger, a little slower-moving, and a lot wetter.

In one example, Hurricane Ike — which killed more than 100 people and devastated parts of the U.S. Gulf Coast in 2008 — could have 13 percent stronger winds, move 17 percent slower, and be 34 percent wetter if it formed in a future, warmer climate.

AI Is ‘Part of the Answer’ to Fake News, Facebook Scientist Says

Bloomberg Technology, Jeremy Kahn

from May 23, 2018

Artificial intelligence is helping Facebook Inc. tackle problems of extremist propaganda, fake accounts and hate speech, but is still not sophisticated enough to handle many of the most pressing issues facing the social network, the company’s leading AI researcher said Wednesday.

Yann LeCun, Facebook’s chief AI scientist and a pioneer in the development of deep learning, said that “AI is part of the answer, but only part,” of the solution to the issues facing the company.

It’s Never Too Early To Think About 6G

IEEE Spectrum, Michael Koziol

from May 22, 2018

The first thought that popped into your head as soon as you read this article’s headline was likely something along the lines of “Wait, I thought we were still waiting on 5G.” And that’s true: This is the year 5G deployment is finally picking up steam.

But that’s precisely why ComSenTer, a multi-university research effort into the fundamentals of what 6G might look like, is already turning its attention to the next next generation of wireless. 5G will utilize higher frequency spectrum than previous generations in order to improve data rates. Insomuch as anyone has an idea of what 6G might look like, it’s a good bet that it will take that same tack.

“It’s not clear what 6G will be,” says Sundeep Rangan, the director of NYU Wireless, one of the institutions participating in ComSenTer. “If it is the case that 6G or other communications systems can benefit from very, very high frequency transmissions, we need to start looking at that now.”

MIT’s Super-Efficient Dispatching Algorithm Minimizes a City’s Taxi Fleet

IEEE Spectrum, Philip E. Ross

from May 23, 2018

Researchers at MIT say they’ve found an efficient dispatching algorithm that can cut a city’s fleet of taxis by 30 percent.

They describe their work in a paper published today in Nature.

“New York would need 30 percent fewer vehicles if the taxi fleet, even with human drivers, is managed better,” Carlo Ratti, the director of MIT’s Senseable City Lab, tells IEEE Spectrum. That’s a big savings, both in taxis and in the space they take up on city streets. New York’s 14,000-odd taxis log some 500,000 trips a day.
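For a rough sense of scale, the figures quoted above can be worked through directly. This is back-of-the-envelope arithmetic only, not the MIT dispatching algorithm; it assumes the "14,000-odd" fleet is exactly 14,000 taxis and that the daily trip count stays the same after the cut.

```python
# Figures quoted in the article; the exact fleet size is an assumption.
FLEET_SIZE = 14_000        # "14,000-odd taxis"
TRIPS_PER_DAY = 500_000    # "some 500,000 trips a day"
REDUCTION_PERCENT = 30     # "30 percent fewer vehicles"

def reduced_fleet(fleet_size: int, reduction_percent: int) -> int:
    """Fleet size after cutting it by the given whole-number percentage."""
    return fleet_size * (100 - reduction_percent) // 100

def trips_per_vehicle(trips: int, fleet_size: int) -> float:
    """Average number of daily trips each vehicle must serve."""
    return trips / fleet_size

smaller_fleet = reduced_fleet(FLEET_SIZE, REDUCTION_PERCENT)
load_before = trips_per_vehicle(TRIPS_PER_DAY, FLEET_SIZE)
load_after = trips_per_vehicle(TRIPS_PER_DAY, smaller_fleet)

print(f"Fleet: {FLEET_SIZE} -> {smaller_fleet} taxis")
print(f"Trips per taxi per day: {load_before:.1f} -> {load_after:.1f}")
```

Under these assumptions the cut removes about 4,200 vehicles, and each remaining taxi would average roughly 51 trips a day instead of roughly 36.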

Study maps careers of CS PhDs using decades of data

University of Michigan, Computer Science & Engineering

from May 21, 2018

Of the many burning questions in the world of computing research, the one most dear to a student’s heart has typically been the least investigated: what happens after a PhD in computer science?

Prof. Danai Koutra and CSE PhD student Tara Safavi set out to provide the world’s first data-driven answer, analyzing several decades of post-PhD computing careers using a large new dataset rich with professional information. From their findings, they developed a career network model, called R3, that captures the dynamics of computing careers over time.

Events

Cornell Tech@Bloomberg: The Future of Food with Bowery Farming CEO Irving Fain

Tech at Bloomberg

from June 12, 2018

New York, NY June 12, starting at 5:30 p.m., Bloomberg L.P. (731 Lexington Ave). [registration required]

Toward Natural Language Query Processing for Bioinformatics Polystores

University of California-Berkeley, Berkeley Institute for Data Science

from July 19, 2018

Berkeley, CA July 19, starting at 3 p.m. BIDS Lecture Series Speaker: Kurt Stockinger from Zurich University of Applied Sciences. [free]

TV of Tomorrow

Funicular Media

from June 13, 2018

San Francisco, CA June 13-14 at Golden Gate Club. “TVOT consistently attracts a veritable “who’s who” of the interactive/advanced-TV, MVPD, OTT-TV, advertising, broadcasting/programming, data/measurement, streaming-infrastructure, consumer-electronics and social-video industries.” [$$$$]

Deadlines

Call for participation in rating the Fairwork Principles

The aim of the project is to certify three kinds of online platforms: location-specific platforms (e.g. Uber/Deliveroo), microwork platforms (e.g. Mturk), and macrowork or online freelancing platforms (e.g. Upwork/Freelancer). We now plan to draw up an annual ranking of platforms within these three types, as well as certifying individual platforms.

To continue the collaborative development of the project, we invite your input about fair principles and standards that you or your organisation believe should apply in the platform economy.

Call for Applications – CCC Leadership in Embedded Security Workshop

“The Cybersecurity Taskforce of the Computing Community Consortium will host a leadership workshop to envision the future of embedded security research on August 13th in Baltimore, Maryland.” Invitation only – deadline for 1-page position papers & application is June 1.

U.S. Department of Energy Request for Information on Machine Learning for Geothermal Energy and the Geosciences » CCC Blog

“The Geothermal Technologies Office (GTO), within the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy, announces an RFI seeking feedback from industry, academia, research laboratories, government agencies, and other stakeholders regarding research opportunities associated with applying machine learning techniques toward challenges in the geosciences that are relevant to geothermal energy.” Deadline for submissions is June 6.

Helium Grant

“Applications are accepted on a rolling basis. For 2018, at least one $1,000 grant is awarded every three months, for a total of three grants, according to the following schedule. Please keep these dates in mind when considering the timing of your project.” First deadline for submissions is June 30.

Tools & Resources

Advances in Semantic Textual Similarity

Google AI Blog, Yinfei Yang

from May 17, 2018

The recent rapid progress of neural network-based natural language understanding research, especially on learning semantic text representations, can enable truly novel products such as Smart Compose and Talk to Books. It can also help improve performance on a variety of natural language tasks which have limited amounts of training data, such as building strong text classifiers from as few as 100 labeled examples.

Below, we discuss two papers reporting recent progress on semantic representation research at Google, as well as two new models available for download on TensorFlow Hub that we hope developers will use to build new and exciting applications.
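As a small illustration of what "semantic textual similarity" means in practice, the sketch below scores sentence pairs by the cosine similarity of their embedding vectors. It does not use the TensorFlow Hub models mentioned in the post: the three-dimensional vectors are made-up stand-ins for real sentence embeddings, and the sentences and names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings; a real encoder would produce hundreds of dimensions.
toy_embeddings = {
    "How old are you?": [0.9, 0.1, 0.0],
    "What is your age?": [0.8, 0.2, 0.1],
    "The taxi is late.": [0.0, 0.2, 0.9],
}

query = "How old are you?"
scores = {
    sentence: cosine_similarity(toy_embeddings[query], vector)
    for sentence, vector in toy_embeddings.items()
    if sentence != query
}
best_match = max(scores, key=scores.get)
print(best_match, round(scores[best_match], 3))
```

With a real encoder, the same comparison would rank paraphrases above unrelated sentences even when they share few words, which is what makes the approach useful with little labeled data.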

What is surprisingly useful to know as a data scientist? : datascience

reddit.com/r/datascience

from May 23, 2018

Methods, skills, ephemera, etc. [65 comments as of May 24, 5 p.m.]

Finding the Right Fit: How to Scope a Data Project

Data-Smart City Solutions, Erica Pincus

from May 22, 2018

Too often, teams dive into a data project only to realize a few months later that they are either solving the wrong problem, or don’t have the data they need to reach an answer. Policymakers, researchers, and the media tend to place emphasis on how to execute data projects in order to produce results, but the prior phase—that of scoping the data project—is equally important for the project’s success.

Before implementing analytics techniques to pull insights from data, governments need to identify those places where data can make a difference. At the Civic Analytics Network’s inaugural Summit on Data-Smart Government, Lauren Haynes, former Associate Director of the Center for Data Science and Public Policy (DSaPP) at the University of Chicago, explored the concept of data scoping. In her session, “How to Scope Data Projects,” Haynes explored how cities can identify areas ripe for analytics in order to maximize the value of data.
