Data Science newsletter – December 2, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for December 2, 2018

Data Science News



New analysis of history-major data says the field is at a ‘new low.’ Can it be saved?

Inside Higher Ed, Colleen Flaherty


History has seen the steepest decline in majors of all disciplines since the 2008 recession, according to a new analysis published in the American Historical Association’s Perspectives on History.

“The drop in history’s share of undergraduate majors in the last decade has put us below the discipline’s previous low point in the 1980s,” reads the analysis, written by Benjamin M. Schmidt, an assistant professor of history at Northeastern University.

Some numbers: there were 34,642 history degrees conferred in 2008, according to federal data. In 2017, the most recent year for which data are available, there were 24,266. Between 2016 and 2017 alone, there was a 1,500 major drop-off. And even as overall university enrollments have grown, “history has seen its raw numbers erode heavily,” Schmidt wrote, especially since 2011-12.
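As a quick check on the scale of that decline, the two degree counts quoted above can be turned into an absolute and a relative drop. This is only a back-of-the-envelope sketch using the figures as reported in the article; the variable names are illustrative.

```python
# Degree counts as quoted in the article (federal data).
degrees_2008 = 34_642
degrees_2017 = 24_266

# Absolute and relative decline between the two years.
absolute_drop = degrees_2008 - degrees_2017
percent_drop = 100 * absolute_drop / degrees_2008

print(f"Absolute drop: {absolute_drop:,} degrees")
print(f"Relative drop: {percent_drop:.1f}%")
```

By this arithmetic, history conferred roughly 30 percent fewer degrees in 2017 than in 2008.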


Advancing FAIR Data in Earth, Space, and Environmental Science

EOS, Shelley Stall et al.


Research and scientific discovery are rooted in a rich, fluid ecosystem of shared information that includes data, publications, software, physical samples, and a myriad of other research products. A combination of technological advances and increasing pressures on global resources is prompting a major shift in how data and research products are shared and valued in the Earth, space, and environmental sciences (ESES). This shift is complicated by legacy systems of communication, incentives, and cultural norms. Open sharing [European Commission, 2016] of data and research products will mitigate many of these challenges and enable new frontiers of discovery. Toward this goal, scientific publishers, geoscience data repositories, funders, and other stakeholders recently met as part of the Enabling FAIR Data project, funded by the Laura and John Arnold Foundation through AGU. By leveraging the FAIR principles [Wilkinson et al., 2016]—findable, accessible, interoperable, and reusable—this emerging community is working together to ensure that data, physical samples, and software are treated as first-class research products to open new opportunities for ESES research.


Better medicine through machine learning: What’s real, and what’s artificial?

PLOS Speaking of Medicine blog; Suchi Saria, Atul Butte, and Aziz Sheikh


PLOS Medicine Machine Learning Special Issue Guest Editors Suchi Saria, Atul Butte, and Aziz Sheikh cut through the hyperbole with an accessible and accurate portrayal of the forefront of machine learning in clinical translation.


Using the Internet of Sound to transfer IoT data via speakers

Network World, Patrick Nelson


Some kinds of data should be encoded onto an inaudible, near-ultrasound layer placed on top of normal, audible sounds, a process that could rapidly scale Internet of Things (IoT) adoption, says startup Trillbit.

The company says that by doing that kind of overlay, any microphone and speaker is turned into a data-transfer device that then could be used for payments transfers, user authentication, and smart city applications such as digital locks.

“There is no common universal protocol to connect Internet of Things,” Trillbit explains on its website. “Speakers [and microphones], which are already in place, and all around us” could perform connections.


The US could regulate AI in the name of national security

Quartz, Dave Gershgorn & Max de Haldevang


Artificial intelligence technology has the capability to be the most impactful software advance in history and the US government has no idea how to properly regulate it.

The US does know that it doesn’t want other countries using its own AI against it. A new proposal published today (Nov. 19) by the Department of Commerce (pdf) lists wide areas of AI software that could potentially require a license to sell to certain countries. These categories are as broad as “computer vision” and “natural language processing.” It also lists military-specific products like adaptive camouflage and surveillance technology.


Salovey approves USSC recommendations

Yale Daily News, Serena Cho


As Yale identifies key projects to drive the next capital campaign, University President Peter Salovey announced Tuesday that Yale will invest in areas including integrative data science, quantum science and neuroscience, per recommendations by the University Science Strategy Committee’s report released in June.

In a university-wide email on Tuesday, Salovey said Yale will invest in facilities and infrastructure for the sciences, restructure funding for graduate students, improve diversity in the sciences and update equipment in core STEM facilities. In September, University administrators held three faculty town halls to gather feedback on the report’s recommendations. While many praised the interdisciplinary nature of the priorities identified in the report, some criticized the exclusion of certain areas of study such as the digital sciences.


The decline in U.S. life expectancy is unlike anything we’ve seen in a century

Popular Science, Sarah Chodosh


Tackling the various problems that contribute to the recent upticks in suicide, liver disease, and drug overdoses won’t be easy. They’re multifaceted issues—often complicated by economics—that we’ve only just begun to parse. But if we want Americans to lead long and healthy lives, it looks like we’re going to have to change what we’re doing. If these charts tell us anything, it’s that our current methods aren’t working.


Former MIT and Google researchers raise $5.3M for ‘artificial intelligence compiler’ startup Canotic

GeekWire, Taylor Soper


Canotic is a new artificial intelligence startup based in Seattle and Berlin that today announced a $5.3 million seed round led by Seattle-based PSL Ventures and London-based Mosaic Ventures, with participation from East Ventures and other angels.

What they do: Canotic helps enterprise companies apply AI and machine learning to unstructured datasets without the need for top data scientists. Much like a traditional software compiler automatically and quickly transforms computer code from one programming language to another without human intervention, Canotic does the same for ingesting and analyzing data with AI and ML. Its data programs can label complex unstructured data; create real-time production AI APIs; and much more.


Leading University of Washington professor joins AI2 as artificial intelligence talent wars continue

GeekWire, Clare McGrane


The Allen Institute for Artificial Intelligence (AI2), one of the world’s leading artificial intelligence research organizations, has hired yet another prominent researcher.

Dan Weld, an endowed professor at the University of Washington, is joining the institution as a senior researcher working on AI2’s academic search engine, Semantic Scholar. His addition is the latest in a series of high-profile AI moves in Seattle as organizations fight for talent in the growing field.

In an email to GeekWire, Weld said he will split his time between his position at UW and his new one at AI2. “Indeed, I see many great possibilities for collaboration between the two,” he said.


Old Lunar Data Gets New Life, With Help From Seismologists

IEEE Spectrum, Michael Dumiak


The NASA probe that touched down on Mars earlier this week is carrying a sensitive seismometer to measure the Red Planet’s rumblings. But as it looks to expand what we know about otherworldly seismology, researchers remain transfixed by data from a much older source: seismometers set on the moon nearly 50 years ago.

Apollo astronauts placed the first of these instruments in 1969 on the Sea of Tranquility, a large lunar plain formed by ancient volcanic eruptions. These seismometers last sent signals earthbound in 1977. Now Ceri Nunn, a seismologist at NASA’s Jet Propulsion Laboratory, is working with colleagues to collect and clean up the decades-old data.


RBS moves into Edinburgh Uni data science hub

The Scotsman, Scott Reid


Royal Bank of Scotland has pushed the button on a data innovation research unit at the University of Edinburgh, joining other industries, academics, and public sector partners in the heart of the city’s data science community.


Princeton and Microsoft collaborate to tackle fundamental challenges in microbiology

Princeton University, Office of Communications


In this project, Microsoft is helping Princeton to better understand the mechanisms of biofilm formation by providing advanced technology that will greatly extend the type of research analysis possible today. Biofilms, surface-associated communities of bacteria, are the leading cause of microbial infection worldwide and kill as many people as cancer does. They are also a leading cause of antibiotic resistance, a problem highlighted by the World Health Organization as “a global crisis that we cannot ignore.” Understanding how biofilms form could enable new strategies to disrupt them.

To support Princeton, a Microsoft team led by Dr. Andrew Phillips, head of the Biological Computation group at Microsoft Research, will be working closely with Bonnie Bassler, a global pioneer in microbiology who is the Squibb Professor in Molecular Biology and chair of the Department of Molecular Biology at Princeton and a Howard Hughes Medical Institute Investigator, and with Ned Wingreen, the Howard A. Prior Professor in the Life Sciences and professor of molecular biology and the Lewis-Sigler Institute for Integrative Genomics.


New Master’s in Data Science Prepares Students for Fastest Growing Field in U.S.

University of New Haven, University News


“There are two key features that set the TCoE’s data science program apart,” he said. “The first is the experiential approach of the University of New Haven. This means a heavy focus on active learning to build the needed professional skills, starting with the first class. Students will not only be learning the theory and methods, but also applying them to real problems immediately.”

The second key feature is that the content of the program is built around emerging machine learning and artificial intelligence methods. “As areas such as data science emerge, and the skills become highly sought-after, there is a tendency by some to repackage programs they already have under a more marketable name,” Dillon said. “Our program, however, is based on technologies such as machine learning and artificial intelligence, that hold the most promise for the future, in addition to being highly-valued today.”


Not just for bots: The changing career landscape in AI

Science, Alaina G. Levine


Wanted: Domain experts

In the dynamic world of AI, where data science, deep learning, and machine learning are being used to improve products and outputs in fields as far apart as astronomy, health care, transportation, security, and banking, scientists and engineers are finding that their skills are highly prized in multiple career tracks and sectors, says Maggie Johnson, vice president of Education and University Programs for Google, where she manages technical education for all Google staff. In a firm like Google, she notes, there are those with Ph.D.’s in highly technical AI-adjacent areas, such as computer science, statistics, mathematics, and data science. “They are defining the actual algorithms and what the models look like,” she says.


Why a Major AI Conference Is Headed to Ethiopia

Associations Now, Ernie Smith


The International Conference on Learning Representations, which brings together AI researchers from around the world, is heading to Africa in 2020. Recently, researchers of African descent have faced challenges getting travel visas to North America and Europe.

To many professionals around the world, Addis Ababa, Ethiopia, isn’t as well known as an international meetings destination as, say, Barcelona, London, or Beijing. But for the attendees whom the International Conference on Learning Representations (ICLR) aims to attract, it might be the best place to ensure that they’re able to show up.

 
Events



NSF-CBMS Conference and Software Day on Topological Methods in Machine Learning and Artificial Intelligence

Department of Mathematics, College of Charleston


Charleston, SC May 13-17, 2019, with workshops on May 18. [registration required]

 
Tools & Resources



How to Prove the ROI of Computer Vision Moderation

Clarifai, Natalie Fletcher


In this post, I’ve built a simple business scenario showing you just how beneficial investing in computer vision for moderation can be and to provide an example of how the ROI calculator can be used for your own needs. Let’s dive in!


How to Use Rails Active Job

Codeship blog, Leigh Halliday


You’re always striving to give your users a better experience when they use your website or application. One of the most important ways to achieve this is by giving them quick server response times. In this article, we’ll explore how to use Rails Active Job to enable us to do this through the use of a queuing system. You can also use queues to help normalize traffic spikes or load on the server, allowing work to be done when the server is less busy.

Active Job was first included in Rails 4.2 as a way to standardize the interface to a number of queueing options which already existed. The most common queues used within Rails applications are Sidekiq, Resque, and Delayed Job.

Active Job allows your Rails app to work with any one of them (as well as with other queues) through a single standard interface. For the full list of which backends you can use with Rails Active Job, refer to this page. It’s also important to see which features are supported by which queueing system; some don’t support delayed jobs for example.
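The queuing idea described above can be illustrated outside Rails. The sketch below is not Active Job or any of its backends; it is a minimal Python analogy using only the standard library, and the names `handle_signup` and `send_welcome_email` are hypothetical. It shows the core pattern: the request handler enqueues slow work and returns right away, while a background worker drains the queue.

```python
import queue
import threading

jobs = queue.Queue()   # in-memory stand-in for a real queue backend
results = []           # collects what the worker has finished

def worker():
    """Run queued jobs one at a time until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        func, args = job
        results.append(func(*args))
        jobs.task_done()

def send_welcome_email(user_id):
    # Hypothetical slow task; a real one would talk to a mail server.
    return f"welcome email sent to user {user_id}"

def handle_signup(user_id):
    # Enqueue the slow work and respond immediately.
    jobs.put((send_welcome_email, (user_id,)))
    return "202 Accepted"

thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = handle_signup(42)   # returns without waiting for the email
jobs.put(None)                 # tell the worker to stop
jobs.join()                    # wait until every queued item is processed
thread.join()

print(response)
print(results)
```

A production queue would persist jobs and retry failures, which is the part that backends such as those named above provide.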


TorchCraftAI Operational!

Facebook, Gabriel Synnaeve


Last week, the CherryPi team at FAIR released TorchCraftAI. TorchCraftAI includes a modular bot framework for StarCraft: Brood War, a reinforcement learning (RL) environment, and a distributed ML/RL training library (overview). This week, we retrained the Build Order Switch (BOS) model. You can get more details about BOS in our recent NeurIPS RL PO workshop accepted paper.


Finding Data to Index: Data found in Supporting Information files

Data Catalog Collaboration Project


This blog is part of a series on using the “has data avail” filter on PubMed Central (PMC) to identify a wide range of institutional datasets and what we at NYU learned about our institution’s data sharing practices from this exercise. To learn more about the background of this project and how we pulled the bibliographic data used, please refer to our first post.


Amazon SageMaker Ground Truth – Build Highly Accurate Datasets and Reduce Labeling Costs by up to 70%

Amazon, AWS News Blog


Today, most machine learning tasks use a technique called supervised learning: an algorithm learns patterns or behaviours from a labeled dataset. A labeled dataset contains data samples as well as the correct answer for each one of them, also known as ‘ground truth’. Depending on the problem at hand, one could use labeled images (“this is a dog”, “this is a cat”), labeled text (“this is spam”, “this isn’t”), etc.

Fortunately, developers and data scientists can now rely on a vast collection of off-the-shelf algorithms (as illustrated by the built-in algorithms in Amazon SageMaker) and of reference datasets. Deep learning has popularized image datasets such as MNIST, CIFAR-10 or ImageNet, and more are also available for tasks like machine translation or text classification. These reference datasets are extremely useful for beginners and experienced practitioners alike, but a lot of companies and organizations still need to train machine learning models on their own dataset: think about medical imaging, autonomous driving, etc.
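To make the idea of learning from a labeled dataset concrete, here is a minimal sketch in plain Python. It is not SageMaker code and does not use any AWS API; the tiny dataset, its feature values, and the nearest-centroid classifier are all assumptions chosen for illustration. Each sample pairs features with its ground-truth label, and the "training" step simply averages the features per label.

```python
from collections import defaultdict

# Hypothetical labeled dataset: ((weight_kg, height_cm), ground-truth label).
labeled = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((30.0, 60.0), "dog"),
]

def train(samples):
    """Compute one centroid (mean feature vector) per label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), label in samples:
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def predict(centroids, point):
    """Return the label whose centroid is closest to the point."""
    px, py = point
    return min(
        centroids,
        key=lambda label: (centroids[label][0] - px) ** 2
        + (centroids[label][1] - py) ** 2,
    )

centroids = train(labeled)
prediction = predict(centroids, (6.0, 24.0))
print(prediction)
```

The quality of the predictions depends entirely on the labels being correct, which is why building accurate ground truth matters.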


Deep Learning cheatsheets covering the content of Stanford’s CS 230 class.

Twitter, Reza Zadeh


Well done and clear by @shervinea.
