Data Science newsletter – November 8, 2019

Newsletter features journalism, research papers, events, tools/software, and jobs for November 8, 2019


 
 
Data Science News



The incentives are all wrong (causal inference edition)

Statistical Modeling, Causal Inference, and Social Science blog, Andrew Gelman



After I described this possible plan of analysis, the people talking with me agreed that it was reasonable, but they argued that such an analysis could never have been published in a top journal. They argued that the apparently clean causal identification of the regression discontinuity analysis made the result publishable in a way that a straightforward observational study would not be.

Maybe they’re right.

If so, that’s really frustrating. We’ve talked a lot about researchers’ incentives to find statistical significance, to hype their claims and not back down from error, etc., as well as flat-out ignorance, as in the above example, researchers naively thinking that some statistical trick can solve their data problems. But this latest thing is worse: the idea that a better analysis would have a lower chance of being published in a top journal, for the very reasons that make it better. Talk about counterfactuals and perverse incentives. How horrible.


Tackling Climate Change with Machine Learning

arXiv, Computer Science > Computers and Society; David Rolnick et al.



Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help. Here we describe how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by machine learning, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the machine learning community to join the global effort against climate change.


Berkeley Lab develops machine learning algorithm to predict traffic flow

The Daily Californian student newspaper, Marc Escobar



A machine learning algorithm is being developed to predict traffic flow and traffic patterns on California highways.

A team of Lawrence Berkeley National Laboratory computer scientists and the California Department of Transportation, or Caltrans, are collaborating to use high-performance computing and machine learning to help Caltrans make decisions when accidents occur.


Nebraska researchers partner with Library of Congress

University of Nebraska–Lincoln, Nebraska Today



How can the United States Library of Congress — one of the world’s largest repositories of information — bring its collections into the digital age?

It’s a question library leadership has been attempting to answer, and a collaboration between the Library of Congress and University of Nebraska–Lincoln scholars and students has laid a strong foundation for machine learning to play a role in future digital strategies.


Stanford scientists link Neanderthal extinction to human diseases

Stanford University, Stanford News



Employing mathematical models of disease transmission and gene flow, Greenbaum and an international team of collaborators demonstrated how the unique diseases harbored by Neanderthals and modern humans could have created an invisible disease barrier that discouraged forays into enemy territory. Within this narrow contact zone, which was centered in the Levant where first contact took place, Neanderthals and modern humans coexisted in an uneasy equilibrium that lasted tens of millennia.


Stanford researchers lay out first genetic history of Rome

Stanford University, Stanford News



In the last decade or so, an increasing number of studies have used DNA sampled from ancient skeletons to fill in important details of human history. Rome presented an interesting opportunity to use the same ancient DNA techniques to fill in details left out of the historical record. “The historical and archaeological records tell us a great deal about political history and contacts of different kinds with different places – trade and slavery, for example – but those records provide limited information about the genetic makeup of the population,” Pritchard said.

To find out what that makeup looked like, the Stanford team partnered with a host of European researchers, including senior authors Alfredo Coppa, a professor of physical anthropology at Sapienza University, and Ron Pinhasi, an associate professor of evolutionary anthropology at the University of Vienna, to gather 127 human DNA samples from 29 sites in and around Rome dating from between the Stone Age and medieval times.


UT Austin Launches Institute to Harness the Data Revolution

University of Texas at Austin, News From the College of Natural Sciences



Advances in machine learning are announced every day, but efforts to fundamentally rethink the core algorithms of AI are rare.

The University of Texas at Austin announced this week that it has received a three-year, $1.5 million National Science Foundation TRIPODS (Transdisciplinary Research in Principles of Data Science) award to establish a new institute on the foundations of data science. The institute will coordinate foundational research in AI and data science across several university departments, launch a large-scale workshop and signature seminar series, and provide seed funding for a number of graduate and post-doctoral fellowships in artificial intelligence and machine learning.


University of Miami to adopt new telecommunication technology, including 5G+

Miami Herald, Howard Cohen



The University of Miami is poised to become the first campus in the nation to adopt a new technology that could speed and expand computing time and mobile app tools, according to the school and the providing company.

AT&T plans to roll out 5G+ millimeter wave and Multi-access Edge Computing technology on the Coral Gables campus, with a target completion date in spring 2020.


Science must move with the times

Nature, Comment, Philip Ball



Research cannot fulfil its social contract and reach new horizons by advancing on the same footing into the future, argues Philip Ball in the last essay of a series on how the past 150 years have shaped today’s science system, to mark Nature’s anniversary.


For science’s sake, the government must approve FY20 spending bills

TheHill, Elizabeth Rogan



Once again, Congress is poised to approve a second continuing resolution (CR) to keep the U.S. government running when the current resolution expires Nov. 21. As some congressional leaders noted, it is unlikely Congress will reach agreement on any of the 12 fiscal year (FY) 2020 spending bills by the expiration date, raising serious concerns that funding will remain flat for government agencies into, or even through, 2020. The current CR keeps the U.S. government open but operating at FY2019 spending levels. Even more ominous is the threat of another government shutdown should negotiations between Congress and the White House again collapse. Either scenario presents serious consequences for scientific research if federal agencies delay the rollout of new programs such as the National Quantum Initiative (NQI), a coordinated multiagency effort to support research and training in quantum information science.

Without an FY2020 spending package, several federal agencies that support science risk the loss of funding increases approved by House or Senate appropriators – 10 percent for the Department of Energy’s Office of Science, 5 percent for the National Nuclear Security Administration’s Inertial Confinement Fusion Program, 6 percent for the National Institute of Standards and Technology, 7 percent for the National Science Foundation, and 6 percent for the National Institute of Biomedical Imaging and Bioengineering.


Twitter Topics: follow subjects automatically in the timeline

The Verge, Casey Newton



Recently, a friend told me he wanted to spend more time using Twitter, but he didn’t quite know how. His primary interest is comedy, he told me, and he hoped to find a way to see comedians’ best jokes on Twitter as they were posted. But when he followed comedians, he mostly saw a lot of self-promotion — tour dates, late-night appearances, and that sort of thing. No matter your personal interests, there are countless good and relevant tweets on Twitter. But where are they?

Topics, a new feature from Twitter that is starting to roll out this week, represents a significant effort to answer that question. You will be able to follow more than 300 “topics” across sports, entertainment, and gaming, just as you are currently able to follow individual accounts. In return, you’ll see tweets from accounts that you don’t follow that have credibility on these subjects.


‘Chronicle Is Dead and Google Killed It’

VICE, Motherboard, Lorenzo Franceschi-Bicchierai and Joseph Cox



“We want to 10x the speed and impact of security teams’ work by making it much easier, faster and more cost-effective for them to capture and analyze security signals that have previously been too difficult and expensive to find,” Gillett wrote in a blog post announcing Chronicle. “We know this mission is going to take years, but we’re committed to seeing it through.”

At the time it was unclear what Chronicle was going to be. But industry observers were excited for what they thought was going to be a significant disruptor in an industry that is full of relatively old technologies such as antivirus and firewalls, and is rife with products that offer solutions in search of a problem and outright snake oil.

Just a year and a half after the announcement, however, Chronicle has been folded back into Google, within its Cloud department—and some employees feel Chronicle management abandoned and betrayed the original vision. Chronicle’s CEO and chief security officer have already left and the chief technology officer is leaving later this month, Motherboard has learned. Others are eyeing the exit.


Gardner’s artificial intelligence bill advances in Senate committee

Colorado Politics, Michael Karlik



“Our bill will bring agencies, industry, and others to the table to discuss government adoption of artificial intelligence and emerging technologies,” Gardner said in a statement.

The AI in Government Act defines artificial intelligence as any type of computer programming that would enable the computer to carry out tasks of the sort that “would require intelligence if performed by a human.”

The bill would create an AI Center of Excellence within the General Services Administration, which would coordinate AI use in the public interest and house the government’s technical expertise.


Splatter master Jackson Pollock avoided “coiling” when creating his paintings

Ars Technica, Jennifer Ouellette



For the last few years, Roberto Zenit, a physicist with the National Autonomous University of Mexico and Brown University, has been studying the physics of fluids at work in novel painting techniques like those used by Pollock and Mexican muralist David A. Siqueiros. Pollock, for instance, early on employed a “flying filament” or “flying catenary” technique before he perfected his dripping methods. The paint forms various viscous filaments, which are thrown against a vertical canvas. Zenit and several colleagues were able to recreate the fluid action by mounting a paint-filled brush on a rapidly rotating mechanical arm.

Pollock’s dripping technique involved laying a canvas flat on the floor and then pouring paint on top of it. Sometimes he poured it directly from a can, sometimes he used a stick, knife, or brush, and sometimes he used a syringe. The artist usually “rhythmically” moved around the canvas as he worked. His style has long fascinated physicists, as seen in the controversy over whether Pollock’s paintings show evidence of fractal patterns.


Two former Twitter employees accused of spying for Saudi Arabia

CNN Business, David Shortell



Federal prosecutors on Wednesday accused two former Twitter employees of spying on behalf of Saudi Arabia.

Ali Alzabarah, a Saudi national, and Ahmad Abouammo, a US citizen, used their access at the social media giant to gather sensitive and nonpublic information on dissidents of the Saudi regime, the Justice Department alleged in a criminal complaint.

The case, unsealed in San Francisco federal court, underscores allegations the Saudi government tries to control anti-regime voices abroad. It also recalls a move reportedly directed by the country’s controversial leader to weaponize online platforms against critics.

 
Events



MIT Grand Hack NYC

MIT Hacking Medicine



New York, NY November 22-24. “This is the weekend to brainstorm and build innovative solutions with hundreds of like-minded engineers, clinicians, designers, developers and business people. Within our multi-theme event, there is sure to be a healthcare challenge for everyone!” [registration required]

 
Deadlines



Facebook Research: Content Governance request for proposals

“There is a need for expanding a diverse and independent body of research and advocacy work across these disciplines, in support of content governance for digital platforms. Thus, we are requesting proposals for academic research or civil society advocacy in the areas listed above. Proposals do not have to be limited to the Oversight Board. They should, however, support a specific research problem or advocacy issue related to online content governance.” Deadline for proposals is December 6.

ACM/IEEE International Conference on Human-Robot Interaction

Cambridge, England March 23-26, 2020. Deadline for submissions of Late Breaking Reports, alt.HRI papers, Video, Demonstrations, and Student Design Competition is December 10.
 
Tools & Resources



Google updates Teachable Machine so you can train an AI without code

The Next Web, Ivan Mehta



Machine learning and artificial intelligence are complex subjects and while you might see them being mentioned every day, you might not necessarily understand how they work.

Two years ago, Google launched a site called Teachable Machine, which let you train a simple model using your camera without any code. Now, it’s launching an updated version so you can train more advanced models.


Practical Compositional Fairness: Understanding Fairness in Multi-Task ML Systems

DeepAI, Xuezhi Wang, et al.



Most literature in fairness has focused on improving fairness with respect to one single model or one single objective. However, real-world machine learning systems are usually composed of many different components. Unfortunately, recent research has shown that even if each component is “fair,” the overall system can still be “unfair”. In this paper, we focus on how well fairness composes over multiple components in real systems. We consider two recently proposed fairness metrics for rankings: exposure and pairwise ranking accuracy gap. We provide theory that demonstrates a set of conditions under which fairness of individual models does compose. We then present an analytical framework for both understanding whether a system’s signals can achieve compositional fairness, and diagnosing which of these signals lowers the overall system’s end-to-end fairness the most. Despite previously bleak theoretical results, on multiple data-sets—including a large-scale real-world recommender system—we find that the overall system’s end-to-end fairness is largely achievable by improving fairness in individual components. [full text]
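
For readers unfamiliar with the second metric named in the abstract, here is a minimal sketch of a pairwise ranking accuracy gap. It is a simplified illustration, not the paper's implementation: the item fields (`group`, `label`, `score`), the function names, and the toy data are all assumptions made for this example. The idea is to measure, for each group, how often a preferred item from that group is scored above a non-preferred item, and then to compare the two groups.

```python
from itertools import permutations


def pairwise_accuracy(items, group):
    """Fraction of (preferred, non-preferred) pairs ranked correctly,
    counting only pairs whose preferred item belongs to `group`.

    Simplified illustration; field names are hypothetical.
    """
    correct = 0
    total = 0
    for a, b in permutations(items, 2):
        # `a` is the preferred item of the pair (higher label) and is in `group`.
        if a["group"] == group and a["label"] > b["label"]:
            total += 1
            if a["score"] > b["score"]:
                correct += 1
    return correct / total if total else float("nan")


def pairwise_accuracy_gap(items, group_a, group_b):
    """Difference in pairwise accuracy between two groups."""
    return pairwise_accuracy(items, group_a) - pairwise_accuracy(items, group_b)


# Toy data (made up): label 1 = preferred (e.g. clicked), score = model output.
items = [
    {"group": "A", "label": 1, "score": 0.9},
    {"group": "A", "label": 0, "score": 0.2},
    {"group": "B", "label": 1, "score": 0.4},
    {"group": "B", "label": 0, "score": 0.6},
]

gap = pairwise_accuracy_gap(items, "A", "B")
print(gap)
```

In this toy data the preferred item from group A is always ranked above non-preferred items, while the preferred item from group B is outranked by one of them, so the gap is positive. A gap near zero would suggest the ranker treats preferred items from both groups similarly under this simplified measure.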


Understanding UMAP

Google PAIR, Andy Coenen and Adam Pearce



Dimensionality reduction is a powerful tool for machine learning practitioners to visualize and understand large, high dimensional datasets. One of the most widely used techniques for visualization is t-SNE, but its performance suffers with large datasets and using it correctly can be challenging.

UMAP is a new technique by McInnes et al. that offers a number of advantages over t-SNE, most notably increased speed and better preservation of the data’s global structure. In this article, we’ll take a look at the theory behind UMAP in order to better understand how the algorithm works, how to use it effectively, and how its performance compares with t-SNE.

 
Careers


Full-time positions outside academia

Computer Scientist – Machine Learning



Argonne National Laboratory, Leadership Computing Facility; Lemont, IL
