Data Science newsletter – June 16, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for June 16, 2018

GROUP CURATION: N/A

 
 
Data Science News



What happens when the AI bubble bursts?

The Next Web, Tristan Greene



Today researchers like Ian Goodfellow and Yann LeCun push the envelope when it comes to deep learning techniques. But much of what they and their colleagues do now continues promising work from decades ago, work that was abandoned due to a lack of interest from researchers and funding from investors.

And it’s not just cutting-edge researchers who need to worry. In fact, they may initially be the safest. Google Chief Cloud Researcher Dr. Fei-Fei Li will probably find work in all but the coldest of AI winters, but the graduating class of 2023 might not be so lucky. In fact, researchers at universities could be the first to suffer: when the AI funding dries up, it’ll probably affect Stanford’s research department before Microsoft’s.

So how do we know if an AI winter is coming? The short answer: we don’t, so suck it up and sally forth. But the long answer is that we look at the factors that can cause one.


Major Study of Drinking Will Be Shut Down

The New York Times, Roni Caryn Rabin



An investigation at the National Institutes of Health concluded that the $100 million trial had been tainted by funding appeals to, and coordination with, the alcohol industry.


Michigan enacts toughest lead rules in US after Flint crisis

Associated Press, David Eggert



Michigan on Thursday began enforcing the nation’s strictest rules for lead in drinking water, a plan that eventually will result in replacing all 500,000 lead service pipes statewide in the wake of the contamination of Flint’s supply.

The lead and copper rules will drop the “action level” for lead from 15 parts per billion, the federal limit, to 12 in 2025. Underground lead service lines connecting water mains to houses and other buildings will be replaced by 2040, unless a utility can show regulators it will take longer under a broader plan to repair and replace its water infrastructure.

The rules also will prohibit the partial replacement of lead service pipes except for emergency repairs; require preliminary and final inventories of the lines and other components of a water supply by 2020 and 2025; and ensure samples are taken at the highest-risk sites and with methods designed to more accurately detect lead. Additional changes are designed to verify that corrosion control is working and better educate the public about lead in water.

The plan could cost $2.5 billion over decades, money that is expected to largely come from water customers.


Anaconda: Data Science Exiting Hadoop for the Cloud

datanami, George Leopold



Data scientists are embracing cloud-native frameworks as they move on from on-premises data infrastructure previously dominated by Hadoop, concludes a survey on the state of data science.

The shift is driven in part by the enterprise transition from merely managing big data to using machine learning and other connected data tools to glean insights in real time, according to the data science survey released this week by Python platform specialist Anaconda Inc. Cloud-native technologies such as application containers and the Kubernetes cluster orchestrator are growing at the expense of traditional big data technologies such as Hadoop and Apache Spark, the survey of more than 4,200 data scientists found.


Accenture Unveils Tool to Help Companies Ensure Their AI Is Fair

Bloomberg Technology, Jeremy Kahn



Accenture developed a software tool that does three things. First, it lets users define the data fields they consider sensitive, such as race, gender or age, and then see the extent to which these factors are correlated with other data fields. Race, for example, might be highly correlated with a person’s postcode, so to de-bias an algorithm it wouldn’t be enough to simply avoid considering race; postcode would also have to be de-biased.

[Rumman] Chowdhury, who showcased the tool publicly for the first time Tuesday at an AI conference in London, said Accenture uses a technique called mutual information that essentially eliminates the bias in algorithms.
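To make the correlation check described above concrete, here is a minimal sketch of computing empirical mutual information between a sensitive field and candidate proxy fields. It is not Accenture’s tool: the function name `mutual_information` and the toy records are hypothetical, and the sketch only measures how much one field reveals about another (in bits); it does not de-bias anything.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in bits, between two
    equal-length sequences of discrete values (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (px[x] * py[y])
        mi += p_xy * log2(count * n / (px[x] * py[y]))
    return mi


# Hypothetical toy records: one sensitive field and two candidate proxies.
race = ["A", "A", "A", "B", "B", "B"]
postcode = ["10001", "10001", "10001", "20002", "20002", "20002"]
favorite_color = ["red", "blue", "green", "red", "blue", "green"]

print(mutual_information(race, postcode))        # → 1.0 (postcode fully reveals race)
print(mutual_information(race, favorite_color))  # → 0.0 (no shared information)
```

A field with high mutual information against a sensitive attribute, like the postcode here, is the kind of proxy the article says would also need attention before an algorithm could be called de-biased.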


Bill Gates gives a book to every US student graduating in 2018

The Guardian, Alison Flood



The Microsoft mogul is presenting all 4 million students graduating from college this year with a free download of Hans Rosling’s Factfulness.


Friends of Friends Can Reveal Hidden Information about a Person

Scientific American, Andrea Anderson



People generally spend time with others who are like them, making it easy for data scientists to infer individuals’ attitudes or personality attributes by analyzing their online and real-world social networks. Researchers call this tendency to seek out like-minded people “homophily.” Think of the old adage “birds of a feather flock together,” says Johan Ugander, a management science and engineering researcher at Stanford University, who studies this topic.

But in a twist on the topic, Ugander and his graduate student Kristen M. Altenburger have found that some people are consistently drawn to those with certain dissimilar attributes. The researchers call the variation introduced by this phenomenon “monophily.” Scientists previously assumed that heterogeneity would make it harder to draw conclusions about people based on friend networks. But Ugander and Altenburger’s research demonstrates that monophily produces an effect whereby a person’s friends of friends are similar to them in ways that immediate friends may not be. This could make it easier than anticipated for scientists to infer personal characteristics that might otherwise remain hidden—and is one more way for data miners to trace personal information.
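The friends-of-friends effect can be illustrated with a toy majority-vote guess. This is a simplified sketch, not the researchers’ code or exact method: the graph, the labels and the helper names (`one_hop_vote`, `two_hop_vote`) are hypothetical, and the network is an extreme case in which everyone befriends only people with the other value of a binary attribute, so immediate friends mislead while friends of friends do not.

```python
from collections import Counter


def build_graph(edges):
    """Turn a list of undirected (a, b) friendships into an adjacency list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def majority_label(labels, nodes):
    """Most common known label among `nodes`; None if none are labeled."""
    counts = Counter(labels[n] for n in nodes if n in labels)
    return counts.most_common(1)[0][0] if counts else None


def one_hop_vote(graph, labels, node):
    # Guess a hidden attribute from immediate friends only.
    return majority_label(labels, graph[node])


def two_hop_vote(graph, labels, node):
    # Guess from friends of friends: every two-step walk from `node`,
    # skipping walks that lead straight back to `node` itself.
    fof = [f2 for f1 in graph[node] for f2 in graph[f1] if f2 != node]
    return majority_label(labels, fof)


# Hypothetical toy network where every friendship links "F" and "M" nodes.
edges = [("alice", "dan"), ("alice", "eric"), ("beth", "eric"),
         ("beth", "frank"), ("carol", "frank"), ("carol", "dan")]
graph = build_graph(edges)
# alice's label is hidden; the goal is to infer it.
labels = {"beth": "F", "carol": "F", "dan": "M", "eric": "M", "frank": "M"}

print(one_hop_vote(graph, labels, "alice"))  # → M (wrong)
print(two_hop_vote(graph, labels, "alice"))  # → F (right)
```

Real networks are far noisier than this, but the sketch shows why consistent preferences for dissimilar friends can make two-hop neighbors more informative than one-hop neighbors.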


CAES launches Certificate in Agricultural Data Science

University of Georgia, UGA Today



From remote moisture sensors that produce a real-time feed of soil conditions to drones that use optical data to spot plant disease, new streams of data will fuel the next green revolution.

Remote sensing technologies will offer farmers the ability to customize irrigation and fertilizer applications for areas that have unique characteristics within fields, which will reduce ecological impacts and costs. However, putting precision agriculture strategies into practice requires agricultural scientists who are equipped to interpret the data that these sensors generate.

In fall 2018, the University of Georgia College of Agricultural and Environmental Sciences will launch an Interdisciplinary Certificate in Agricultural Data Science to equip CAES graduate students with the data analysis expertise that they will need to capitalize on this big data revolution.


New Yale report recommends science priorities for the decade ahead

Yale University, YaleNews



A university committee has made strategic recommendations to bolster Yale’s position as a leading global research university and nurture scientific discoveries with the potential to improve the world. The report recommends bold priorities for investment and changes to organizational structure.

The University Science Strategy Committee (USSC) recently presented its findings to President Peter Salovey, Provost Ben Polak, and other university leaders. President Salovey has identified science as one of Yale’s top priorities.

“For centuries, Yale faculty members and students have advanced scientific research and education. We will invest strategically in science and engineering in order to continue to push the frontier of human knowledge and discovery,” said President Salovey.


How to respond to the many applications of artificial intelligence

The Brookings Institution, Jack Karsten



Recent advances in computing power, algorithms, and sensor technology have combined to rapidly expand the capabilities of artificial intelligence (AI). Applications exist not only in private sector industries like healthcare, finance, and retail, but in public sector settings like criminal justice and national security as well. Each of these areas presents questions about the responsible development and deployment of AI. To discuss these issues, Governance Studies hosted the ninth annual A. Alfred Taubman Forum on Public Policy at the Brookings Institution on June 12. The event featured three panels of AI experts drawn from Brookings fellows, academia, government, and industry. The panelists discussed the challenges and opportunities of AI in national security, economic, and public policy contexts.


Democracy From the Sidelines: How U.S. Politics Became a Spectator Sport

Behavioral Scientist, Evan Nesterak



There’s a puzzle here: Why are those of us in the U.S. more likely to be engaged and interested in national rather than local politics, even when we know our chance for impact is at the local level?

Dan Hopkins, associate professor of political science at the University of Pennsylvania, has a few ideas. In his latest book, The Increasingly United States: How and Why Political Behavior Nationalized, he explores how politics in the U.S. have largely become a spectator sport. That is, politics have become more nationalized—issues are more likely to be argued and adjudicated at the national rather than local level. This stands at odds with the historical intent of our framers, who envisioned that political power would favor states and municipalities.

I had the chance to speak to Dan about the evidence and reasons for nationalization, its consequences, and what it means for the future of U.S. politics.


Sexual harassment in science: 4 big takeaways from a huge new report

Vox, Julia Belluz



Science needs to reckon with the #MeToo moment, and it needs to do so immediately, says a new report from the prestigious National Academies of Sciences.

The scientific community hasn’t seen quite the gush of sexual harassment claims that Hollywood has. But allegations have emerged in fields as diverse as geology and climate change, anthropology, and biology, most recently with the resignation of cancer biologist Inder Verma from the Salk Institute for Biological Studies last week. The sweeping new review says these anecdotes represent a systemic problem in academia.

Based on decades of research on sexual harassment in medicine, engineering, and science, as well as qualitative interviews with harassment victims, the National Academies described how pervasive sexual harassment is not only putting women at risk but also harming their career prospects and driving them away from science.

“More rapid and sustained progress in closing the gender gap in science, engineering, and medicine is jeopardized by the persistence of sexual harassment and its adverse impact on women’s careers in our nation’s colleges and universities,” the report reads. Academic institutions are currently set up to protect themselves and not the victims — and that has to change.


Data Visualization of the Week

Twitter, Shah Selbe




University Data Science News

You’re going to want to watch this video compilation of deep fakes from SIGGRAPH, created by a group of European and American researchers. This is the future our technology has wrought. What are we going to do to combat the exponential increase in the potential for fake news and fake video to change minds, shape elections, and persuade people?



The National Academies of Sciences finds rampant gender harassment across scientific fields. Thirty to forty percent of graduate students and up to half of medical students (depending on the survey) report experiencing gender and/or sexual harassment. Many women hear comments designed to silence or demean them in the workplace. These comments come from men and women alike and are “intended to create a hostile environment toward women, to put them down.” Women subjected to them may then be far less likely to continue pursuing careers in these fields. If anyone reading this thinks women and LGBTQ people should just grow thicker skin, save that thought for yourself. What rational person would choose a hostile work environment when their talents and work ethic mean they could easily choose a different one? These are wonderful, talented people who are looking for a job into which to pour their massive intellect, commitment, and (hopefully) caring concern.

Physicists at MIT and Israel’s Technion have found a way to reduce the wavelength of light (photons) to bring it closer to the wavelength of matter (electrons). The work is at an early stage, but it could make it easier for light to interact with matter, leading to applications such as “more efficient solar cells…and new kinds of lasers and light-emitting diodes (LEDs).” Click through if you want to find out what a plasmon is.


NSF just awarded Alex Szalay of Johns Hopkins $1.8 million to build the Open Storage Network. This follows a $1 million seed grant from Schmidt Futures. The Open Storage Network will provide a place to store and share scientific data by “creating data transfer systems that can handle lots of data while matching the speed of a 100-gigabit network connection with a small number of nodes.” It will roll out over the next couple of years.



Chris Jermaine of Rice University announced PlinyCompute, a new computational system designed for “high-performance, big data codes.” At the SIGMOD conference, team member Jia Zou noted, “PlinyCompute was at least twice as fast and in some cases 50 times faster at implementing complex object manipulation and library-style computations as compared to Spark.”



MIT kicked off its inaugural four-course online MicroMasters program in Statistics and Data Science. So far, only six universities will accept the credits toward their degrees. The total cost is $1,350, well below the cost of a single semester at MIT. Completing the program earns participants an MITx credential for their résumés.



Stuart Geiger of UC Berkeley and his co-authors have a new report out detailing why data scientists in graduate and postdoc programs are having a hard time staying in academia. They found that it isn’t the large industry salaries pulling them out; it’s the lack of sustainable career paths pushing them out. The work associated with doing data science is valued less than mainstream publishing in their core domains, teaching, or even traditional academic service work. The report adds to a mounting pile of evidence on the casualization of academic labor, most visibly the large and creeping reliance on adjuncts to teach and graduate students to conduct research. Since postdocs are cheaper than grad students (who have started successful unionization campaigns at a growing number of schools), they are now doing a good deal of data science work in a kind of purgatorial state. Time spent as a postdoc typically runs from four to twelve (yes, twelve) years and covers the period when many adults have children, though the low-wage, itinerant conditions of postdoc positions often curtail that possibility.



Fordham School of Law researchers published a comprehensive, sobering report on the largely unregulated sale and use of student data in the commercial marketplace. I know what you’re all thinking: how is student data unregulated? We have FERPA! The truth is, “student lists are commercially available for purchase on the basis of ethnicity, affluence, religion, lifestyle, awkwardness and even a perceived or predicted need for family planning services.”



Cornell University increased the diversity of its incoming CS cohort by:
1. Actively recruiting outside of their typical rounds. They did extra legwork.
2. Adding two new questions to their application about “how well the students had done with the opportunities they had” and whether they brought “a unique perspective to their research area.”

Not only did they attract more applicants from underrepresented racial minorities, they ended up with a whole different pool of students altogether. Are you worried this strategy may have diluted the quality of applicants? Examine yourself for buried racist privilege, please. Their new recruiting strategy brought in more overall applications and made them more selective.



University of Georgia College of Agricultural and Environmental Sciences (CAES) has introduced the first certificate in *agricultural* data science. A survey of CAES graduate students found that “almost 90 percent were interested in the certificate program, and almost 50 percent” wanted to integrate data science into their degree. Not only is there a growing amount of sensor data from plant- and field-wearables, there is also an increasing stream of data from satellites and a more robotic set of tractors, combines, and other farm infrastructure. Data science is unlikely to be the next green revolution, but it will certainly play an increasingly strong role in feeding the planet.


 
Events



4th Annual International Conference on Computational Social Science

Northwestern University, Kellogg School of Management



Evanston, IL July 12-15. “Launched in 2015, this unique event has successfully convened an interdisciplinary cohort of scientists, who are poised to disrupt and transform business intelligence, operations and organization.” [$$$]


OII London Lecture: Does AI Have Gender?

University of Oxford, Oxford Internet Institute



London, England June 25, starting at 6:30 p.m., Pinsent Masons (30 Crown Place). Speaker: Gina Neff from Oxford Internet Institute. [free, registration required]

 
Deadlines



SSRC’s Anxieties of Democracy program

“Open call for proposals | @SSRCanxieties is accepting proposals for a February 2019 workshop focused on political mobilization, political identity, and the consequences of political participation.” Deadline is September 14.
 
Tools & Resources



Metacat: Making Big Data Discoverable and Meaningful at Netflix

Medium, Netflix Tech Blog, Ajoy Majumdar and Zhen Li



“Most large companies have numerous data sources with different data formats and large data volumes. These data stores are accessed and analyzed by many people throughout the enterprise. At Netflix, our data warehouse consists of a large number of data sets stored in Amazon S3 (via Hive), Druid, Elasticsearch, Redshift, Snowflake and MySQL. Our platform supports Spark, Presto, Pig, and Hive for consuming, processing and producing data sets. Given the diverse set of data sources, and to make sure our data platform can interoperate across these data sets as one “single” data warehouse, we built Metacat. In this blog, we will discuss our motivations in building Metacat, a metadata service to make data easy to discover, process and manage.”


Researchers have released the largest self-driving-car data set yet

MIT Technology Review, Erin Winick



“The BDD100K data set, made up of 100,000 videos recorded onboard autonomous cars, is now available for download from the University of California, Berkeley.”

 
Careers


Postdocs

Postdoctoral Associate – Politics:



NYU, Wilf Family Department of Politics, Center for Data Science; New York, NY

Full-time positions outside academia

Senior Data Scientist



Perform Sports Content & Media, Opta Sports; London, England
