Data Science newsletter – January 16, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for January 16, 2018

GROUP CURATION: N/A

 
 
Data Science News



Parsing the patents: CMU seeking clear answers on AI in workforce

Pittsburgh Post-Gazette, Daniel Moore



Hundreds of thousands of documents, each jammed with dense bricks of text and complicated drawings and a list of citations, appear every year at the U.S. Patent and Trademark Office — a steady stream of inventions from bright minds who hope their ideas reach far and wide.

To researchers at Carnegie Mellon University, the documents could be the key to anticipating how and where advances in artificial intelligence and machine learning will alter jobs across the country.

The CMU team is getting at an important question in a crowded area of research. The invasion of robots into the American workforce has been addressed in a tide of reports, with broad agreement among labor economists that virtually all jobs have become more computerized and perhaps half of all jobs are likely to be further automated.


New center headquartered at Carnegie Mellon University will build smarter networks to connect edge devices to the cloud

PR Newswire, Carnegie Mellon



Carnegie Mellon University will lead a $27.5 million Semiconductor Research Corporation (SRC) initiative to build more intelligence into computer networks.
The CONIX research center, directed by Carnegie Mellon’s Anthony Rowe, will rethink how we can integrate cloud and edge computing with networking to ensure that IoT applications can be hosted with performance, security, robustness, and privacy guarantees.

Researchers from six U.S. universities will collaborate in the CONIX Research Center headquartered at Carnegie Mellon. For the next five years, CONIX will create the architecture for networked computing that lies between edge devices and the cloud. The challenge is to build this substrate so that future applications that are crucial to IoT can be hosted with performance, security, robustness, and privacy guarantees.


Notre Dame to lead $26 million multi-university research center developing next-generation computing technologies

University of Notre Dame, Notre Dame News



A new $26 million center led by the University of Notre Dame will focus on conducting research that aims to increase the performance, efficiency and capabilities of future computing systems for both commercial and defense applications.

At the state level, the Indiana Economic Development Corporation (IEDC) has offered to provide funding for strategic equipment, pending final approval from the IEDC Board of Directors, to support execution of the program’s deliverables.

“We have assembled a group of globally recognized technical leaders in a wide range of areas — from materials science and device physics to circuit design and advanced packaging,” said Suman Datta, director of the Applications and Systems-driven Center for Energy-Efficient integrated Nano Technologies (ASCENT) and Frank M. Freimann Professor of Engineering at Notre Dame. “Working together, we look forward to developing the next generation of innovative device technologies.”


The Polymath: David Benjamin Is Expanding The Definition of Architecture

Architect magazine



Tell me about the relationship that you have with Autodesk, which acquired The Living in 2014. That’s a unique setup for a firm. How did it come about?

It was a hypothesis about collaboration of the future, on both of our parts: a hypothesis from Autodesk that an experimental design studio could be part of an R&D department in a big company, and a hypothesis from The Living that this connection could allow us to do more of what we were already doing in a high-powered way. Autodesk has a connection to a broader ecosystem of research on a variety of interesting topics with super-high-powered researchers. The Living works on commissions and applied research for outside clients like any firm, but to be connected to this bigger community of research in a big company that is thinking about the cities and design processes of the future in ways that are similar to us—and in some ways that are different than us—is valuable. Our mission is to create interesting designs in the built world, and Autodesk’s mission is to make software tools for people to design and build things. At first glance it seems pretty different, but we share a lot in our thinking about new materials, new workflows, and new ways to balance creativity with processes of computation.


New Center at University of Michigan Aims to Democratize the Design and Manufacturing of Next-Generation Systems

HPC Wire



As the computing industry struggles to maintain its historically rapid pace of innovation, a new, $32 million center based at the University of Michigan aims to streamline and democratize the design and manufacturing of next-generation computing systems.

The Center for Applications Driving Architectures, or ADA, will develop a transformative, “plug-and-play” ecosystem to encourage a flood of fresh ideas in computing frontiers such as autonomous control, robotics and machine-learning.


NIH support of mobile, imaging, pervasive sensing, social media and location tracking (MISST) research: laying the foundation to examine research ethics in the digital age

Nature, NPJ Digital Medicine; Camille Nebeker et al



Mobile Imaging, pervasive Sensing, Social media and location Tracking (MISST) tools used in research are raising new ethical challenges for scientists and the Institutional Review Boards (IRBs) charged with protecting human participants. Yet little guidance exists to inform the ethical design and the IRB’s regulatory review of MISST research. MISST tools and methods produce personal health data that is voluminous and granular and which may not be subject to policies like the Health Insurance Portability and Accountability Act (HIPAA). The NIH Research Portfolio Online Reporting Tools (RePORTER) database was used to identify the number, nature and scope of MISST-related studies supported by the NIH at three time points: 2005, 2010 and 2015. The goal was to (1) examine the extent to which the NIH is supporting this research and (2) identify how these tools are being used in research. The number of funded MISST research projects increased 384% from 2005 to 2015. Results revealed that while funding of MISST research is growing, it represented only about 1% of the total NIH budget in 2015. However, the number of institutes, agencies, and centers supporting MISST research increased by roughly 50%. Additionally, the scope of MISST research is diverse, ranging from use of social media to track disease transmission to personalized interventions delivered through mobile health applications. Given that MISST research represents about 1% of the NIH budget and is on an upward trajectory, support for research that can inform the ethical, legal and social issues associated with this research is critical. [full text]


University Data Science News

This week’s episode of The Passions and the p-values comes to us from a giant, collaboratively authored rebuttal to Brian Nosek et al.’s paper arguing that the significance threshold should be lowered from the conventional 0.05 to 0.005, thereby reducing false positives. Led by Daniël Lakens, “Justify Your Alpha” argues that authors should be allowed to publish using a range of thresholds from 0.1 to 0.001, as long as they can justify the threshold they chose. In most cases, small sample size limits the confidence researchers can have in results, and researchers working in certain fields or at schools with less funding may not be able to conduct the large-scale projects that an alpha of 0.005 requires. Lakens does not recommend having 100 co-authors: only 87 are listed because some disagreed with the paper’s final version. Never a shortage of friction on The Passions and the p-values.
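
To make the sample-size stakes concrete, here is a minimal sketch (using statsmodels, with an assumed medium effect size of d = 0.5 and 80% power, values chosen for illustration rather than taken from either paper) of how the required participants per group grow when the threshold drops from 0.05 to 0.005:

```python
# Minimal sketch: required sample size for a two-sided, two-sample t-test
# as the significance threshold shrinks. The effect size (d = 0.5) and
# power (0.8) are illustrative assumptions, not figures from the papers.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.05, 0.005):
    n = analysis.solve_power(effect_size=0.5, alpha=alpha, power=0.8,
                             alternative="two-sided")
    print(f"alpha = {alpha}: ~{n:.0f} participants per group")
```

Under these assumptions the required n per group grows by roughly two thirds, which is exactly the burden the Lakens group worries about for under-resourced labs.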



Citizen scientists using Kepler data on Zooniverse found a new system of five previously undiscovered planets. The new planets orbit their star in a resonance chain, where each planet takes about 50 percent longer to orbit the star than the next closer planet. The resonance chain pattern is part of the discovery.
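
For intuition, the period ladder implied by that pattern is just repeated multiplication by roughly 1.5 (a 3:2-style chain); the innermost period below is a made-up placeholder, not the measured value for the system:

```python
# Toy illustration of a resonance chain: each planet's orbital period is
# about 50% longer than the next planet in. The innermost period is an
# assumed placeholder, not a measured value.
inner_period_days = 2.0  # illustrative only
periods = [inner_period_days * 1.5 ** k for k in range(5)]
for i, period in enumerate(periods, start=1):
    print(f"planet {i}: ~{period:.1f}-day orbit")
```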



The eScience Institute at UW-Seattle received a five-year $750,000 grant from the Bill & Melinda Gates Foundation to improve the use of data analytics at organizations serving unhoused people in the greater Seattle area.



A number of studies have suggested that mentorship is important in academia. If you’re (self)-tasked with running a mentorship program, save yourself some groping about in the dark and read this roadmap for establishing an effective mentorship program.



The Alan Turing Institute will add two new universities to the five already in the network. The Universities of Birmingham and Exeter were selected for their strengths in data science. This networked approach to building data science environments is also evident in the Moore-Sloan Data Science Environment, which supplies funding for this newsletter.

Georgia Tech launched its Constellations Center for Equity in Computing to get more women, people of color, people at the lower end of the income distribution, and other underserved groups access to quality computing education at the K-12 level.



Janice Chan at Johns Hopkins is using new brain imaging technologies to visualize what it looks like when a human (or rodent) brain makes a memory. Science is too, too cool sometimes.



Speaking of brains, Harvard psychologist Roger Beaty and ten co-authors argue that the brains of highly creative people are able to simultaneously engage more “large-scale brain networks” than those of less creative people or, in other words, that “creative thinking ability is characterized by a distinct brain connectivity profile.” There were 163 participants, so wait for replication studies before treating this finding as fact. (See also: The Passions and the p-values.)



The University of Notre Dame will lead a $26m multi-university effort to tackle problems critical to advancing next-generation computing.



And the University of Michigan will be home to a new $32m Center for Applications Driving Architectures. Both this center and the Notre Dame center will receive funding from the Semiconductor Research Corporation (SRC) and include partnerships with the Defense Advanced Research Projects Agency (DARPA).



Carnegie Mellon University also announced a big $27.5m initiative “to connect computing systems and the cloud.” It will be directed by Anthony Rowe and is funded by the same sources as the previous two centers: the Semiconductor Research Corporation (SRC) and DARPA.

And…UVA is getting $26.5m to establish the Center for Research in Intelligent Storage and Processing in Memory, with funding from the Semiconductor Research Corporation (SRC). This center will work “to remove the separation between memories that store data and processors that operate on the data,” which is critical for speeding up data science applications.



With funding from the Chan Zuckerberg Initiative, the University of Massachusetts-Amherst will work “to create an intelligent and navigable map of scientific knowledge” using artificial intelligence. The grant is for $5.5m which, compared to the previous four announcements of grants 5-6x that amount, underscores the point that government funding can only be supplemented by, not replaced by, private foundations.



Quoting directly from an abstract I wish I would have written, “Mobile Imaging, pervasive Sensing, Social media and location Tracking (MISST) tools used in research are raising new ethical challenges for scientists and the Institutional Review Boards (IRBs) charged with protecting human participants.” They argue that the growth of research utilizing MISST techniques has outpaced the growth of ethical frameworks for assessing risks and benefits. Yep. That’s definitely what I’m seeing when I’m embodying my ethnographer role.

Cindy Harper, a veterinary geneticist at the University of Pretoria in South Africa, started building a database of rhino DNA in 2010. Since then, her database has been used to convict poachers, though some people argue that maintaining the database isn’t worth it: only 2% of the DNA has been linked to a criminal case. This is a *perfect* example of the missed connections that riddle data-driven decision making. Without political will and organizational cooperation among a variety of agencies, even the best data will go under-utilized, potentially at the cost of the great time and effort spent collecting and maintaining it. Raise your hand if you know what I mean.

Elsevier was in the newsletter for co-hosting an event at Harvard last week. This week it is in the newsletter for sponsoring a 5-year postdoc program at Oxford “to develop exceptional research talent in the field of mathematics and data science.” As far as I can tell, they are defining data science as synonymous with math, since only math PhDs are allowed to apply.



There’s a new academic search engine in town. Dimensions “connects publications to their related grants, funding agencies, patents and clinical trials.” It is run by the for-profit tech company Digital Science and is receiving optimistic reviews, though it is unlikely to unseat Google Scholar.



The Scripps Research Institute is set to receive tens of millions of dollars (TBD) from the Skaggs family to support its graduate education.


The first article to utilize the full power of the Seshat: Global History Databank has arrived!

Seshat



Philosophers, historians, and social scientists have proposed a multitude of different theories trying to explain the rise of huge complex human societies over the past few millennia. Was the primary driver the invention of agriculture, which seems to be the default explanation held by many archaeologists? Or was it private property and class oppression, as many Marxists believe? Warfare between tribes was a popular explanation a century ago, and has been recently revived under the rubric of cultural multilevel selection. Was it long-distance trade, the need for sophisticated information management, or something else?

One of the main goals of the Seshat project is to answer these sorts of Big Questions.


New Carnegie Mellon Dynamic Statistical Model Follows Gene Expressions Over Time

Carnegie Mellon University, Dietrich College of Humanities and Social Sciences



Researchers at Carnegie Mellon University have developed a new dynamic statistical model to visualize changing patterns in networks, including gene expression during developmental periods of the brain.

Published in the Proceedings of the National Academy of Sciences, the model now gives researchers a tool that extends past observing static networks at a single snapshot in time, which is hugely beneficial since network data are usually dynamic. The analysis of network data—or the study of relationships from a large-scale view—is an emerging field of statistics and data science.

“For any dataset with a dynamic component, people can now use this in a powerful way to find communities that persist and change over time,” said Kathryn Roeder, the UPMC Professor of Statistics and Life Sciences in the Dietrich College of Humanities and Social Sciences. “This will be very helpful in understanding how certain diseases and disorders progress. For example, we know that certain genes are responsible for autism and can use our model to give us insight into at what point the disorder begins developing.”


Nature still battles nurture in the haunting world of social genomics

Nature, Books and Arts



The latest turn of the helix is ‘sociogenomics’. This uses genome-wide association studies, high-speed sequencing, gene-editing tools such as CRISPR–Cas9 and baroquely calculated risk scores — often combined with social-science methods — to ‘understand’ the ‘roots’ of complex behaviour. In Social by Nature, sociologist Catherine Bliss anatomizes the field.

Bliss looks at the science, the professional social structures and the social context of these new developments. She seeks social explanations of why the nature–nurture binary persists in the face of DNA-sequence data that once promised to erase it. Sociogenomics has great biomedical potential, she believes; but the path towards that reward runs along a knife edge, with cliffs of eugenic risk on either side. It is a brilliant book — dense at times, but insightful and filled with illustrative anecdotes and case studies. It’s one you should read if you care about what drives academic research, scientific racism or genetic futurism.


Government Data Science News

Using NVIDIA computers, a team at the U.S. Department of Energy’s Oak Ridge National Lab has developed an algorithm that generates neural networks. “MENNDL, short for Multi-node Evolutionary Neural Networks for Deep Learning, evaluates, tests and recommends neural networks.” Right now, they’re using it to help detect neutrinos in physics data.
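
The article doesn’t spell out MENNDL’s internals, but the evaluate-test-recommend loop it describes is broadly evolutionary. As a rough, hypothetical sketch of that idea (not ORNL’s code, and with a stand-in fitness function where real network training would go):

```python
import random

# Toy evolutionary search over network hyperparameters: a loose
# illustration of an "evaluate, test, recommend" loop, not MENNDL itself.
def fitness(candidate):
    # Stand-in for training a network and scoring it on neutrino data.
    return -abs(candidate["layers"] - 6) - abs(candidate["width"] - 128) / 64

def mutate(candidate):
    child = dict(candidate)
    child["layers"] = max(1, child["layers"] + random.choice([-1, 0, 1]))
    child["width"] = max(8, child["width"] + random.choice([-32, 0, 32]))
    return child

population = [{"layers": random.randint(1, 12), "width": 64} for _ in range(8)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    survivors = population[:4]                      # keep the best candidates
    population = survivors + [mutate(random.choice(survivors)) for _ in range(4)]

print("best candidate found:", max(population, key=fitness))
```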

Ajit Pai and his FCC colleagues have drafted a report concluding that broadband is effectively reaching all Americans. Jessica Rosenworcel, writing the dissenting opinion, fumed, “it defies logic to conclude that broadband is being reasonably and timely deployed across this country when over 24 million Americans still lack access.” The winners here would be big telecoms, which don’t have to spend money trying to reach those in rural communities. The losers are people in rural communities, like the one where I grew up. Fight the good fight, Rosenworcel.



The Union of Concerned Scientists reports that the Trump Administration is less interested in science advising than any administration on record (though the records only go back to the 1990s). The size of the advisory panels is down by 14 percent, and they meet less frequently. It is unclear whether ignorance, indifference, or antagonism towards science and scientists is the causal factor.

A new Pew Research study found that half of all Americans think people don’t go into STEM because it is too hard. Though probably not for a “stable genius.” Wait. That question was not on the survey.

The Trump administration is highly interested in immigration, threatening to change the H-4 visas that allow the spouses of many H-1B visa holders to work in the US, and to prevent implementation of the “entrepreneur visa” that would allow people to move to the US to found a company.



NASA is partnering with Nissan and Uber on applications that support autonomous vehicles. The space agency has a long history of public-private partnerships (see NASA exec Alex MacDonald’s doctorate).

Microsoft will spend $34m to develop an AI R&D center in Taiwan. The company is working in partnership with the Taiwanese government and its network of national labs.



Enterra, an AI company that uses strong (human-like) reasoning, is run by Princeton professor Stephen DeAngelis and just might be able to detect and stop the propagation of fake news.



A new IARPA project will develop AI to detect and prevent terrorist attacks. It’s mostly focusing on video monitoring right now.



The What Works Cities program has met its goal of signing up 100 US cities to share data and strategies for data-driven action. Many cities face similar problems – increasing homelessness and decreasing trust in law enforcement are two – and this effort aims to accelerate shared learning. So far, there has been progress with respect to recruiting more women to police forces.



People love Google’s new feature that matches your selfie to a famous painting

Recode, Theodore Schleifer



Do you look like the Mona Lisa? Or maybe more of an American Gothic?

Social media is being flooded with Google’s opinions, at least, as part of a new feature that compares a user’s selfie with the company’s catalog of historical artworks, looking for the just-perfect doppelganger.

And the update to the Google Arts & Culture App (iOS and Android) has catapulted it to the most-downloaded free app on the App Store. It claimed the No. 1 spot in the U.S. on Saturday, according to the app metrics site AppAnnie.


Machines Best Humans in Stanford’s Grueling Reading Test

Discover Magazine, D-brief, Carl Engelking



The ability to read and understand a passage of text underpins the pursuit of knowledge, and was once a uniquely human cognitive activity. But 2018 marks the year that, by one measure, machines surpassed humans’ reading comprehension abilities.

Both Alibaba and Microsoft recently tested their respective artificial neural networks with The Stanford Question Answering Dataset (SQuAD), which is an arduous test of a machine’s natural language processing skills. It’s a dataset that consists of over 100,000 questions drawn from thousands of Wikipedia articles. Basically, it challenges algorithms to parse a passage of text and write answers to tricky questions.
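
For readers who haven’t seen SQuAD, each record pairs a passage with questions whose answers are spans of that passage. The sketch below shows that structure with an invented passage; the field names follow my reading of the dataset’s published JSON layout, and the content is made up for illustration:

```python
# Sketch of a SQuAD-style record (invented passage and question; field
# names reflect the dataset's published JSON layout as an assumption).
example = {
    "title": "Example_Article",
    "paragraphs": [{
        "context": "The Stanford Question Answering Dataset was released in 2016.",
        "qas": [{
            "id": "example-0001",
            "question": "When was the dataset released?",
            "answers": [{"text": "2016", "answer_start": 56}],
        }],
    }],
}

# Systems are scored on whether their predicted span matches the answer text.
context = example["paragraphs"][0]["context"]
answer = example["paragraphs"][0]["qas"][0]["answers"][0]
assert context[answer["answer_start"]:].startswith(answer["text"])
```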


AI researcher Monty Barlow teaches a computer to figure out American accents

TechCrunch, John Biggs



Monty Barlow works at Cambridge Consultants and his company’s recent work involves teaching a computer to discern between British and American accents. The project, which Barlow sees as part of a suite of machine-learning solutions that will grow to encompass “true” AI, appeared at CES last week.

Barlow said that AI revolutions tended to come and go, appearing on the horizon and then fizzling out just as quickly. Interestingly, however, he believes the latest technological advances and new systems will encourage systems that augment our own intelligence instead of creating an overarching vision of AI. I encourage you to check it out.


Breakthroughs in magnetism will change storage and computing

Network World, Patrick Nelson



Solid-state memory, hold your horses. Magnetic media isn’t dead yet. Neural networks and vast efficiencies are coming to help out the aging technology.


The Unique Neural Network of the Creative Brain

Pacific Standard, Tom Jacobs



“Creative thinking ability is characterized by a distinct brain connectivity profile,” writes a research team led by Harvard University psychologist Roger Beaty. “Highly creative people are characterized by the ability to simultaneously engage these large-scale brain networks.”

Beaty is referring to three specific neural systems: the executive network, which is engaged in complex mental tasks such as problem solving; the default mode, which is activated when we’re in a resting or ruminative state; and the salience network, which monitors incoming input, prioritizes it, and allows us to efficiently process it.


The Google Brain Team — Looking Back on 2017 (Part 2 of 2)

Google Research Blog; Jeff Dean



The Google Brain team works to advance the state of the art in artificial intelligence by research and systems engineering, as one part of the overall Google AI effort. In Part 1 of this blog post, we shared some of our work in 2017 related to our broader research, from designing new machine learning algorithms and techniques to understanding them, as well as sharing data, software, and hardware with the community. In this post, we’ll dive into the research we do in some specific domains such as healthcare, robotics, creativity, fairness and inclusion, as well as share a little more about us.


Funders should mandate open citations

Nature, World View, David Shotton



All publishers must make bibliographic references free to access, analyse and reuse, argues David Shotton.


Rhino poachers prosecuted using DNA database

Nature, News, Ewen Callaway



A genetic database that holds DNA from thousands of African rhinoceroses has secured the convictions of poachers and led to stiffer criminal sentences since its establishment eight years ago, researchers say. However, not all scientists are convinced the effort is worthwhile.

 
Events



Save the Date – February 22-23 Public Session on Reproducibility and Replicability in Science

National Academy of Sciences



Washington, DC – The National Academy of Sciences Committee on Reproducibility and Replicability in Science will hold its second meeting on February 22-23, 2018. A public session will be held on the afternoon of February 22. The session is tentatively scheduled to continue on the morning of February 23.


Inside Seattle’s startup scene: GeekWire and MOHAI partner on Jan. 22 event that explores entrepreneurship

GeekWire



Seattle, WA – “We’ve all heard that adage before. But by sharing stories and lessons learned — including a few well-earned battle scars — we can make the entrepreneurial journey a little less turbulent. That will be the goal of a special event hosted January 22 at the Museum of History & Industry, part of our partnership around The Seattle 10 program.”

 
Deadlines



SIGCHI Student Travel Grant

“The SIGCHI Student Travel Grant (SSTG) program is intended to enable students who lack other funding opportunities to attend SIGCHI sponsored or co-sponsored conferences. This travel grant is intended to support students whose intention is to *present* at a SIGCHI sponsored conference, not just attend.” Deadline for RecSys 2018, UIST 2018, ICMI 2018, ChiPlay 2018, CSCW 2018 is February 1.


Financial Inclusion Challenge

The Wall Street Journal’s Financial Inclusion Challenge, sponsored by MetLife Foundation, is seeking entries from for-profit and nonprofit enterprises whose products or services help to improve financial resilience, via innovative, scalable, sustainable and socially positive solutions. Deadline for entries is February 23.
 
Tools & Resources



The Art of Effective Visualization of Multi-dimensional Data

Towards Data Science, Dipanjan Sarkar



Descriptive Analytics is one of the core components of any analysis life-cycle pertaining to a data science project or even specific research. Data aggregation, summarization and visualization are some of the main pillars supporting this area of data analysis. From the days of traditional Business Intelligence to this age of Artificial Intelligence, data visualization has been a powerful tool, widely adopted by organizations owing to its effectiveness in abstracting out the right information and in understanding and interpreting results clearly and easily. However, dealing with multi-dimensional datasets, which typically have more than two attributes, starts causing problems, since our medium of data analysis and communication is typically restricted to two dimensions. In this article, we will explore some effective strategies for visualizing data in multiple dimensions (ranging from 1-D up to 6-D).
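
As a small taste of those strategies, here is a minimal seaborn sketch (on a synthetic dataset with invented column names) that packs four dimensions into a single scatter plot by mapping two to position, one to color, and one to marker size:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic 4-D dataset; the columns are invented for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=200),
    "y": rng.normal(size=200),
    "group": rng.choice(["A", "B", "C"], size=200),  # 3rd dimension -> hue
    "weight": rng.uniform(1, 10, size=200),          # 4th dimension -> size
})

# Two dimensions on the axes, a third as color, a fourth as marker size.
sns.scatterplot(data=df, x="x", y="y", hue="group", size="weight")
plt.show()
```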


Importance of transitions in interactive visualisations.

Balamurugan Soundararaj, Geoid blog



I am a very big fan of transitions when presenting complex data with interactive visualisations. I believe that transitions play an important role in building a continuous narrative in the audience’s mind when they are trying to understand a connected series of information. To be clear, I am talking specifically about the transition of data between the stages of a visualisation, not the transition of the canvas of the visualisation. Today let’s look at one of my favourite visualisations of all time – this brilliant interactive one by The New York Times, showcasing the budget proposal made by the US government in 2012.


SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks

arXiv, Computer Science > Distributed, Parallel, and Cluster Computing; Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska



Going deeper and wider in neural architectures improves accuracy, while the limited GPU DRAM places an undesired restriction on the network design domain. Deep Learning (DL) practitioners either need to change to less desirable network architectures or nontrivially dissect a network across multiple GPUs. These distract DL practitioners from concentrating on their original machine learning tasks. We present SuperNeurons: a dynamic GPU memory scheduling runtime to enable network training far beyond the GPU DRAM capacity. SuperNeurons features 3 memory optimizations, Liveness Analysis, Unified Tensor Pool, and Cost-Aware Recomputation; together they effectively reduce the network-wide peak memory usage down to the maximal memory usage among layers. We also address the performance issues in those memory-saving techniques. Given the limited GPU DRAM, SuperNeurons not only provisions the necessary memory for training, but also dynamically allocates memory for convolution workspaces to achieve high performance. Evaluations against Caffe, Torch, MXNet and TensorFlow demonstrate that SuperNeurons trains at least 3.2432× deeper networks than current frameworks with leading performance. In particular, SuperNeurons can train ResNet2500, which has 10^4 basic network layers, on a 12GB K40c.
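
The recomputation idea has a rough analogue in mainstream frameworks: trade extra compute for lower peak memory by re-running parts of the forward pass during backpropagation instead of caching every activation. The sketch below uses PyTorch’s built-in gradient checkpointing to illustrate that trade-off; it is not the SuperNeurons runtime, just the nearest off-the-shelf stand-in for its Cost-Aware Recomputation component:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Gradient checkpointing: only activations at segment boundaries are kept;
# everything else is recomputed during backward, cutting peak memory at the
# price of extra forward passes. Illustrative analogue, not SuperNeurons.
model = nn.Sequential(*[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
                        for _ in range(50)])
x = torch.randn(64, 1024, requires_grad=True)

out = checkpoint_sequential(model, 5, x)  # split the network into 5 segments
out.sum().backward()
```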


We need to build machine learning tools to augment machine learning engineers

O'Reilly Radar, Ben Lorica



In this post, I share slides and notes from a talk I gave in December 2017 at the Strata Data Conference in Singapore, offering suggestions to companies that are actively deploying products infused with machine learning capabilities. Over the past few years, the data community has focused on infrastructure and platforms for data collection, including robust pipelines and highly scalable storage systems for analytics. According to a recent LinkedIn report, the top two emerging jobs are “machine learning engineer” and “data scientist.” Companies are starting to staff up to put their data infrastructures to work, and machine learning is going to become more prevalent in the years to come.

 
Careers


Full-time positions outside academia

Software Engineer – Dynet Backend and Applications



Petuum; Pittsburgh, PA
