Data Science newsletter – September 15, 2017

Newsletter features journalism, research papers, events, tools/software, and jobs for September 15, 2017

Data Science News



DSI Professor Develops Software for Automatic Machine Learning

Columbia University, Data Science Institute


The NSF is funding his project to develop automatic machine learning, which means users will not have to select an algorithm for their data. Mueller’s software will automatically select an algorithm for them. It’s not easy to pick the right algorithm, says Mueller, especially for lay people who can be baffled by the complex choice of data processing, knobs and settings.

“Researchers who develop new algorithms often don’t provide them in a way that is useful for a wide audience,” says Mueller, a lecturer at the Data Science Institute and a nationally known applied data scientist.


Feds ban use of Kaspersky software

The Boston Globe, Hiawatha Bray and Andy Rosen


The Trump administration Wednesday ordered all federal agencies to stop using software produced by Kaspersky Lab, dealing a major blow to the Russian computer security company that has been trying to make inroads into the US market from its local headquarters in Woburn.

In issuing the edict, acting Homeland Security Director Elaine Duke cited ties between Kaspersky officials and the Putin government, as well as a requirement that Russian companies cooperate with that country’s intelligence agencies. She said “malicious cyber actors” could use Kaspersky software to access government files.

“The risk that the Russian government, whether acting on its own or in collaboration with Kaspersky, could capitalize on access provided by Kaspersky products” imperils US national security, Duke said in a statement.


Expectations and Skills for Undergraduate Students Doing Research in Statistics and Data Science

Amstat News, Jo Hardin


As more statistics undergraduates participate in summer and capstone research projects, it becomes paramount for teachers to prepare them adequately for the experiences. Indeed, the most successful research students are those with both a strong understanding of foundational statistical methods and a fluent ability to wrangle data.

Undergraduate research has long been considered one of the strongest high-impact educational practices. However, there are myriad paths to successful undergraduate research, and there is scant information about how to link successful research practices to the undergraduate curriculum.


Why Math Is the Best Way to Make Sense of the World

Quanta Magazine, Ariel Bleicher


When Rebecca Goldin spoke to a recent class of incoming freshmen at George Mason University, she relayed a disheartening statistic: According to a recent study, 36 percent of college students don’t significantly improve in critical thinking during their four-year tenure. “These students had trouble distinguishing fact from opinion, and cause from correlation,” Goldin explained.

She went on to offer some advice: “Take more math and science than is required. And take it seriously.” Why? Because “I can think of no better tool than quantitative thinking to process the information that is thrown at me.” Take, for example, the study she had cited. At first glance, it might seem to suggest that a third of college graduates are lazy or ignorant, or that higher education is a waste. But if you look closer, Goldin told her bright-eyed audience, you’ll find a different message: “Turns out, this third of students isn’t taking any science.”

Goldin, a professor of mathematical sciences at George Mason, has made it her life’s work to improve quantitative literacy.


Intelligence Agencies Bank on AI, Social Media to Process Data

Edgy Labs, Chris Parbey


According to Dawn Meyerriecks, the CIA’s deputy director for science and technology, the Agency has 137 projects directly related to artificial intelligence (AI).

Most of these projects are being done collaboratively between the Agency and developers in Silicon Valley.


‘Pangeo’ Project Will Improve Access to Climate Data

Columbia University, Earth Institute


Researchers at Columbia University’s Lamont-Doherty Earth Observatory (LDEO) and their colleagues have just won a $1.2 million grant from the National Science Foundation. In collaboration with the National Center for Atmospheric Research (NCAR) and Anaconda—a private software company and leader in the emerging field of data science—the team will develop an innovative software called “Pangeo,” which aims to help climate scientists confront the challenges of Big Data.

“Pangeo: An Open Source Big Data Climate Science Platform” is a project designed to solve one of climate science’s most pressing challenges: accessing and utilizing the explosive growth in the size of climate datasets, which have become a bulky but indispensable tool for scientific inquiry in climate change research.


Berkeley RISECamp: Deploying Deep Distributed AI at Scale

IBM Data Science Experience


September 7 and 8, 2017 marked the first-ever RISECamp at UC Berkeley. RISECamp was a two-day bootcamp focused on sharing the work coming from the RISELab. The acronym RISE is explained on the Lab’s website:
Real-time Intelligence with Secure Execution

A big difference between RISELab and the notable AMPLab (where Apache Spark was born) is the shift in focus from batch processing to real-time decision making at scale. This transition signifies a trend in how academia and industry are approaching state-of-the-art machine learning systems. While batch is suitable for a wide variety of problems, real-time decision systems are capable of learning on the fly from new observations as they flow into the system. These types of systems also take a step closer to true AI systems, rather than applying a model trained offline on new observations.

In addition to the focus on building systems capable of making real-time decisions (leveraging Reinforcement Learning), the camp also addressed additional challenges facing many organizations that attempt to integrate advanced AI systems at scale.


Facebook Enabled Advertisers to Reach ‘Jew Haters’

ProPublica; Julia Angwin, Madeleine Varner and Ariana Tobin


Want to market Nazi memorabilia, or recruit marchers for a far-right rally? Facebook’s self-service ad-buying platform had the right audience for you.

Until this week, when we asked Facebook about it, the world’s largest social network enabled advertisers to direct their pitches to the news feeds of almost 2,300 people who expressed interest in the topics of “Jew hater,” “How to burn jews,” or, “History of ‘why jews ruin the world.’”

To test if these ad categories were real, we paid $30 to target those groups with three “promoted posts” — in which a ProPublica article or post was displayed in their news feeds. Facebook approved all three ads within 15 minutes.


[FoR&AI] The Seven Deadly Sins of Predicting the Future of AI

Rodney Brooks


The claims are ludicrous. [I try to maintain professional language, but sometimes…] For instance, it appears to say that we will go from 1 million grounds and maintenance workers in the US to only 50,000 in 10 to 20 years, because robots will take over those jobs. How many robots are currently operational in those jobs? ZERO. How many realistic demonstrations have there been of robots working in this arena? ZERO. Similar stories apply to all the other job categories in this diagram where it is suggested that there will be massive disruptions of 90%, and even as much as 97%, in jobs that currently require physical presence at some particular job site.

Mistaken predictions lead to fear of things that are not going to happen. Why are people making mistakes in predictions about Artificial Intelligence and robotics, so that Oren Etzioni, I, and others, need to spend time pushing back on them?

Below I outline seven ways of thinking that lead to mistaken predictions about robotics and Artificial Intelligence.


The rise of AI is sparking an international arms race

Vox, Sean Illing


“Artificial intelligence is the future not only of Russia but of all of mankind … Whoever becomes the leader in this sphere will become the ruler of the world.”

Russian President Vladimir Putin made this statement to a group of students two weeks ago. Shortly thereafter, Tesla’s Elon Musk, who has worried publicly about the hazards of artificial intelligence (AI) for years now, posted an ominous tweet in response to Putin’s remarks.

“China, Russia, soon all countries w/ strong computer science,” he wrote. “Competition for AI superiority at national level most likely cause of WW3 in my opinion.”


University-driven innovation comes to New York

Reuters


Why entrepreneurs from schools like Columbia and Cornell are starting companies in New York City.


How AI Is Transforming the Shopping Experience Based on the Images Consumers Look at Online

Adweek, Cortney Harding


Trunk Club, an apparel subscription service owned by Nordstrom, has increased Pinterest engagement by more than 100 percent in recent months by embedding artificial intelligence into its digital-images marketing. AI-image search is central to its ambitions of building a data-science model that drives highly relevant product offerings to consumers across the social web. More broadly, the effort speaks to how AI search is quickly becoming the retail sector’s next big digital shopping experience.

“We are trying to understand how one pair of jeans plays out against another pair that was released in another season,” explained Justin Hughes, vp of product development and design at Trunk Club. “We want to get really granular and understand what really works.”

Retailers are excited about AI because the latest smartphones have created a Snapchat-inspired social world where consumers are starting to see “the camera as the keyboard,” added Ron Palmeri, CEO of Layer, the tech company powering Trunk Club’s search capabilities.


Facebook and Google, Show Us Your Ad Data

Bloomberg View, Cathy O'Neil


The stores around us, informed by our digital profiles, magically transform to meet our needs and desires. We have the tailored internet experience.

This happy story has exceptions. Predatory industries use the same profiling technology to locate folks desperately in need of money or vulnerable to gambling pitches and sell them things they shouldn’t buy. But in large part, we’re in consumer nirvana.

The dark side of this revolution is that data collected for one use can be repurposed for another. Our marketing silos are information silos, too. Thanks to our tailored experiences, we barely share a common reality with people who disagree with us. They see different products, different political news, even different facts.


UR and Harris Corporation bring ‘Big Data’ center to Rochester

Innovation Trail, Juan Vazquez


The University of Rochester is partnering with Harris Corp. to create the Rochester Data Science Consortium that will be housed on the UR campus.

The joint effort will help companies and governments analyze large amounts of data — an industry known as “big data” — in order to spot trends that can help the companies, consumers and citizens.

“Data science analysis is a rapidly growing field, with all sectors using more data to make better decisions,” said Danny Wegman, the president and chairman of the Wegmans Family Charitable Foundation, which committed $20 million to the consortium. He is also the chair of UR’s Board of Trustees. “Higher education is a critical partner in helping industry to understand and analyze the mountains of data that will be processed to discern the right decisions from the wrong ones, and the University of Rochester is in position to lead the way.”

 
Events



Methods, with Guest Nate Silver

Caveat


New York, NY September 26, starting at 8 p.m., Caveat (21 Clinton Street). Methods is an interview show with professional fact-finders that gives the backstory on how they separate fact from fiction to understand the world around them. [$$]


The 12th Gateway Computing Environments Conference

Science Gateways Community Institute


Ann Arbor, MI October 23-25 at University of Michigan. “Science gateways allow science & engineering communities to access shared data, software, computing services, instruments, educational materials, and other resources specific to their disciplines.” [$$$]


Data, Ethics, and Decision-Making Lecture Series: “Using Data for Good: What does it mean?”

Columbia University


New York, NY Wednesday, September 20, starting at 4:30 p.m., Columbia University, Uris Hall (3022 Broadway). Speaker: Jeannette Wing. [free]


Obama Foundation Summit

Obama Foundation


Chicago, IL October 31-November 1. The Obama Foundation will welcome civic leaders from around Chicago, the U.S., and the world to join us for a two-day immersive event in Chicago. Apply to attend.

 
Deadlines



Call for Speakers – O’Reilly Artificial Intelligence Conference

New York, NY Conference is April 29-May 2, 2018. Deadline for speaker proposals is November 14.

IEEE Conference on Computational Intelligence and Games

Maastricht, The Netherlands Conference is September 11-14, 2018. Deadline for abstract submissions is March 1, 2018.
 
Tools & Resources



Online textbook on data visualization with the ggplot2 package

R-bloggers, Revolutions, David Smith


A new online textbook, Data Visualization for Social Science, will teach you everything you need to know about creating beautiful and elegant data visualizations using the ggplot2 package and the R language. The author Kieran Healy is a professor of Sociology at Duke University.

 
Careers


Full-time, non-tenured academic positions

BIDS Executive Director



University of California-Berkeley, Berkeley Institute for Data Science; Berkeley, CA

Tenured and tenure track faculty positions

Assistant Professor (3)



University of Wisconsin, Department of Statistics; Madison, WI
