Data Science newsletter – February 6, 2020

Newsletter features journalism, research papers, events, tools/software, and jobs for February 6, 2020


 
 
Data Science News



What if we used behavioral science to power algorithms?

Impactually blog, Nurit Nobel



But despite sooooo much evidence that algorithms are great in making decisions, we still don’t trust them. Sure, that job candidate may have impressed the algorithm, but we need to meet them and make sure, because there is simply no way the algorithm has our intuition, our gut feel, our expertise. Except, well, it does. And it’s better. When we see a human make a mistake, like hiring a candidate that ends up performing poorly and leaving the company, we chalk it up to an error in judgement, but we’re most likely to still give that hiring manager another chance. But when we get proof that an algorithm has made a mistake, that’s it for us. We conclude that this is a useless algorithm and we resolve to never rely on it again. Even if, on average, that algorithm makes much better predictions than the manager.

This mistrust we have for algorithms can have devastating results. An AI algorithm was designed to guide the excavation works in Flint, Michigan to locate lead pipes which were contaminating the water and replace them with copper pipes.


How data scientists found a link between restaurant inspections and New York’s wild real-estate market

MarketWatch, William Watts



Publicly available data on inspections, crime stats help lead to better NYC apartment-price predictions, researchers find


BRAIN Initiative’s first director sets sights on clinical tools

Science, Kelly Servick



Since its launch in 2013, the Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative has doled out about $1.3 billion in grants to develop tools that map and manipulate the brain. Until now, it has operated with no formal director. But last week, the National Institutes of Health (NIH), which manages the initiative and is a key funder, announced that neurobiologist John Ngai would take the helm starting in March.

Ngai, whose lab at the University of California, Berkeley, focuses on the neural underpinnings of the sense of smell, has helped lead BRAIN-funded efforts to classify the brain’s dizzying array of cell types with RNA sequencing. Ngai told ScienceInsider about how the initiative is evolving and how he hopes to influence it.


MIT launches master’s in data, economics, and development policy, led by Nobel laureates

MIT News, Abdul Latif Jameel Poverty Action Lab



This week, the first cohort of 22 students begin classes in MIT’s new master’s program in Data, Economics, and Development Policy (DEDP). The graduate program was created jointly by MIT’s Department of Economics and the Abdul Latif Jameel Poverty Action Lab (J-PAL), a research center at MIT led by professors Abhijit Banerjee, Esther Duflo, and Benjamin Olken. Banerjee and Duflo are co-recipients of the 2019 Nobel Memorial Prize in Economics.

The 22 students beginning the master’s program this week hail from 14 countries around the world, including Brazil, India, Jordan, Lithuania, Mexico, Nigeria, the United States, and Zimbabwe.

The students are pioneers of a new approach to higher education: College degrees and standardized test scores are not required for admission. Instead, applicants prove their readiness through their performance in online MITx MicroMasters courses, completing weekly assignments and taking proctored final exams.


How Blackstone uses data scientists to win deals

eFinancialCareers, Sarah Butcher



Investment banks and hedge funds aren’t alone in incorporating data science into their business models. Private equity funds are also turning to data science, both to win deals in the first place and to help them manage portfolio companies after a purchase.

Speaking at today’s Alternative Investments Conference in London, Lionel Assant, head of European private equity at Blackstone, said the fund now has 14 data analytics professionals, “up from zero five years ago.”

Data analytics teams can be instrumental in enabling private equity funds to win over management teams and do deals, said Assant. He cited the example of Packers Sanitation Services Inc. (PSSI), an American company that cleans food factories, which Blackstone purchased in 2018. Blackstone’s analytics team “aggregated tens of thousands of pieces of data” on PSSI’s potential clients, said Assant. “The management team said it was awesome – our data team helped swing the deal.”


The personal web: Connecting information for better search and recommendation

Microsoft Research Blog, Paul Bennett and Adam Fourney



… generates more information: emails, files, receipts, photos—the list goes on. We have trails of digital information from each of our work projects, vacations, hobbies, and kids’ schools, including websites, files, and calendar appointments, not to mention contacts—the people we work, play, and live with in each of these spaces. We spend a significant amount of time figuring out when to share something with them, when to ask them for something, and where to find information they might have already sent us. Don’t you wish there were a way to automatically and effortlessly organize everything? To have search as simple as people’s associative memory—the minute you start looking at one item all of the related information you need is magically at your fingertips?

This is just the problem we set out to solve with University of Michigan PhD student Tara Safavi, our Microsoft product partners and research colleagues, and Safavi’s PhD adviser.


With Google’s Meena, are AI assistants about to get a lot smarter?

VentureBeat, Alan Nichol



Last year, we got to see coherent, multi-paragraph text generated by OpenAI’s GPT-2 model. This week, a new paper from Google AI showed that a chatbot based on a gigantic neural network and huge amounts of data can hold coherent conversations, maintaining context over multiple turns and conversing on just about any topic. The chatbot is called Meena, and it’s even able to invent jokes (see picture).

As someone who works in this field, I am impressed. (I’m a cofounder of Rasa, a company that offers an open source conversational AI framework that Meena may potentially compete with someday.) But what does this result mean for the future of AI assistants?


New Research on the Adtech Industry

Schneier on Security blog, Bruce Schneier



The Norwegian Consumer Council has published an extensive report about how the adtech industry violates consumer privacy. At the same time, it is filing three legal complaints against six companies in this space.


ArbiLex, A Harvard Law School Legal Tech Startup, Uses AI To Settle Arbitrations

Forbes, Frederick Daso



Legal disputes between two entities vary in complexity and scope. With criminal law, the outcome of these cases is seen as discrete (guilty, not guilty, acquittal, mistrial). In civil law, a discrete view of the consequences of a legal ruling is inappropriate. The size of a settlement in a civil case varies. With claims resolved through international arbitration, the outcomes are on the order of millions and billions of dollars. Lawyers arguing these cases rely on intuition, precedent, and incomplete information. Isabel Yang, 30, realized the qualitative nature of law represented an opportunity for technology to resolve legal disputes, creating ArbiLex as a result. ArbiLex is a data analytics startup for international arbitrations, leveraging artificial intelligence (AI) to help parties reach resolutions quickly and efficiently. The Cambridge, Massachusetts-based company was incubated out of the Harvard Innovation Lab.


Knowledge-guided analysis of “omics” data using the KnowEnG cloud platform

PLOS Biology; Saurabh Sinha et al.



We present Knowledge Engine for Genomics (KnowEnG), a free-to-use computational system for analysis of genomics data sets, designed to accelerate biomedical discovery. It includes tools for popular bioinformatics tasks such as gene prioritization, sample clustering, gene set analysis, and expression signature analysis. The system specializes in “knowledge-guided” data mining and machine learning algorithms, in which user-provided data are analyzed in light of prior information about genes, aggregated from numerous knowledge bases and encoded in a massive “Knowledge Network.” KnowEnG adheres to “FAIR” principles (findable, accessible, interoperable, and reuseable): its tools are easily portable to diverse computing environments, run on the cloud for scalable and cost-effective execution, and are interoperable with other computing platforms. The analysis tools are made available through multiple access modes, including a web portal with specialized visualization modules. We demonstrate the KnowEnG system’s potential value in democratization of advanced tools for the modern genomics era through several case studies that use its tools to recreate and expand upon the published analysis of cancer data sets.


Rice University Engineers Develop Deep Learning Computer That Accurately Forecasts Heat Waves, Cold Spells

HPC Wire



Rice University engineers have created a deep learning computer system that taught itself to accurately predict extreme weather events, like heat waves, up to five days in advance using minimal information about current weather conditions.


Researchers report progress on molecular data storage system

Brown University, News from Brown



A Brown University team has shown that they can store and retrieve more than 200 kilobytes of digital image files by encoding the data in mixtures of new custom libraries of small molecules.


How Florida State University is Boosting Student Success and Addressing Data Challenges

Cloudera Blog, Matt Spillar



For public universities, metrics such as retention rate and graduation rate are important indicators for standing out in the competitive landscape. These success metrics are paramount to bringing in more students, making them successful, and continuing to grow a strong alumni network.

One of the nation’s elite research universities, Florida State University (FSU) preserves, expands, and disseminates knowledge in the sciences, technology, arts, humanities, and professions, while embracing a philosophy of learning strongly rooted in the traditions of the liberal arts and critical thinking. Founded in 1851, it is located on the oldest continuous site of higher education in the state of Florida.

With the responsibility for Learning Management System (LMS) reporting on over 40,000 students and 3,000 instructors, the FSU Office of Distance Learning’s (ODL) limited data technology staff was looking for a way to address challenges and automate processes that had traditionally been manual.


The IRS Decided to Get Tough Against Microsoft. Microsoft Got Tougher.

ProPublica, Paul Kiel



For years, the company has moved billions in profits to Puerto Rico to avoid taxes. When the IRS pushed it to pay, Microsoft protested that the agency wasn’t being nice. Then it aggressively fought back in court, lobbied Congress and changed the law.


AI can now design cities. Should we let it?

Fast Company, Mark Wilson



FaceLift is a new AI system developed by Nokia Bell Labs Cambridge that allows scientists and urban planners to use a crowd’s aggregated sensibility to actually redesign the look of city streets. FaceLift AI can take any Google Street View scene and beautify it instantly—but at what cost?

To create FaceLift, 82,000 volunteers from 162 countries were tasked with rating 20,000 Google Street View images as beautiful or ugly. That data was pumped into an AI that then deconstructed people’s preferences by features in these scenes: It learned picnic areas, orchards, and plazas were considered beautiful, while viaducts and construction sites were not. (Who knew?!?) Then in a final step, a “generative” AI system was tasked with beautifying images of a street by editing it with newly generated imagery (much like how a deepfake is made).

 
Events



15th International Digital Curation Conference | RDA

Research Data Alliance



Dublin, Ireland February 17-20. “Collective Curation: the many hands that make data work” … “The programme will focus on community, the various stakeholders that play a role in ensuring digital objects are properly created, managed and shared.” [$$$]


RMDS Lab to Host Pasadena’s Largest AI and Data Science Conference on November 2nd and 3rd, 2020

IM DATA



Pasadena, CA November 2-3. “‘IM DATA’, the industry-leading conference focused on innovative and impactful data science hosted by Research Method and Data Science (RMDS) Lab, will be held on Nov. 2nd and 3rd, 2020. This will be the second time IM Data is hosted in Pasadena, home to the most cutting-edge research institutions such as NASA’s Jet Propulsion Laboratory, Caltech, and Amazon’s future Web Service Center for Quantum Computing.” [save the date]


Data Science Connect Brings Largest Data Science Conference in the Southeast to South Carolina – Hypepotamus

Data Science Connect



Greenville, SC July 16-17. “After seeing great success with data science conferences in Georgia and North Carolina, and becoming the largest data science organization in the Southeast, DSC is adding another city to its list. The first annual Data Science South Carolina (DSSC) Conference will showcase the wide variety of advanced data science applications across industry verticals and technical disciplines with the goal of sharing and inspiring new applications, algorithms and techniques.” [$$$]


Machine Learning School in Seville

BigML



Seville, Spain March 26-27. “BigML is bringing the second edition of our Machine Learning School to Seville. We will hold a two-day crash course ideal for business leaders, industry practitioners, advanced undergraduates, as well as graduate students, seeking a quick, practical, and hands-on introduction to Machine Learning to solve real-world problems.” [$$]


The Knowledge Graph Conference

Knowledge Graphs Conference, LLC



New York, NY May 6-7 at Columbia University. “Knowledge Graphs form an organized and curated set of facts that provide support for models to help understand the world. This conference gathers technology leaders, researchers, academics, vendors — and most important, practitioners who know the discipline.” [$$$]


Applied ML: A sports analytics & contract intelligence use case

Meetup, GDG Cloud Amsterdam



Amsterdam, The Netherlands February 18 starting at 5:30 p.m., Google Amsterdam (Claude Debussylaan 34).


Frontiers in BioImaging 2020

Royal Microscopical Society



London, England June 24-25. “This is an ideal meeting for both new and established researchers to engage with a broad range of imaging approaches and to make valuable contacts with leading groups in the field.” [registration required]

 
Tools & Resources



How Digital Design Drives User Behavior

Harvard Business Review, Shlomo Benartzi and Saurabh Bhargava



… A review of recent research provides clear evidence that many organizations are currently undervaluing the power of digital design and should invest more in behaviorally informed designs to help people make better choices. In many cases, even minor fixes can have a major impact, offering a return on investment that’s several times larger than the conventional use of financial incentives or marketing and education campaigns.

In our recent working paper, written with Lynn Conell-Price at the University of Pennsylvania, and Richard Mason at City, University of London, we collaborated with Voya Financial, a leading retirement service provider, to investigate how variation in the digital design of online enrollment interfaces could influence the initial contribution decisions of employees in 401(k) plans. The research involved more than 8,500 employees across a few hundred plans, who prior to being automatically enrolled in the plan, had visited a standardized online enrollment interface to either actively confirm their enrollment at the default rate, personalize their enrollment at a different rate, or decline enrollment altogether by selecting one of three horizontally arranged options. Our goal was to get employees to consider a deferral rate higher than the default, which is often too low to achieve financial security in retirement.


Learning to imitate human demonstrations via CycleGAN

AIhub, Laura Smith and Marvin Zhang



This work presents AVID, a method that enables robotic imitation learning from human videos through a strategy, similar to humans, of imagination and practice. Given human demonstration videos, AVID first translates these demonstrations into videos of the robot performing the task, by means of image-to-image translation. In order to translate human videos to robot videos directly at the pixel level, we use CycleGAN, a recently proposed model that can learn image-to-image translation between two domains using unpaired images from each domain.


Some Useful Probability Facts for Systems Programming

Simon Arneaud, The Art of Machinery blog



Probability problems come up a lot in systems programming, and I’m using that term loosely to mean everything from operating systems programming and networking, to building large online services, to creating virtual worlds like in games. Here’s a bunch of rough-and-ready probability rules of thumb that are deeply related and have many practical applications when designing systems.

 
Careers


Postdocs

Postdoctoral position offer



Paris Artificial Intelligence Research Institute; Paris, France
