Data Science newsletter – January 9, 2020

Newsletter features journalism, research papers, events, tools/software, and jobs for January 9, 2020


 
 
Data Science News



For all its limitations, the new California consumer privacy law is providing a wealth of information about companies’ data-harvesting practices.

Twitter, Natasha Singer



Take the new Saks Fifth Ave privacy policy saying that the store chain collects: “Suspected crime information”


The devices, software and other health tech headlines of CES 2020

MobiHealthNews



Over the course of the show, MobiHealthNews will be updating this list of new products and other announcements being unveiled this week in Las Vegas.


Can Social Media Inform Public Health Efforts?

The Scientist Magazine®, Emma Yasinski



On March 14, 2014, HealthMap, an online database created by researchers at Boston Children’s Hospital in 2006 to collect accounts of disease cases from various online sources, notified scientists of an article written in French about cases of a “strange fever” in Macenta, Guinea. Nine days later, the World Health Organization officially announced an Ebola outbreak in the area.

Although the outbreak was first identified by the HealthMap software when a news article was published online, other types of sources have proven invaluable for researchers to continue tracking the virus—most recently in the Democratic Republic of Congo. One key source is Twitter, Emily Cohn, who works on HealthMap at Boston Children’s Hospital, tells The Scientist in an email. Tweets containing the term “Ebola” or evidence of specific symptoms such as fever, joint or muscle aches, and coughing or vomiting blood are flagged by a machine learning algorithm and added to information drawn from online news outlets, official reports, and other sources to create a map of cases, a timeline, and a projection of predicted future cases in certain areas. “Social media collects information on location through geo-location,” Cohn says. “It is the most real-time of the data sources we work with.”

HealthMap researchers have also used social media data to create a global map of Zika and a US map of the flu, and they are not alone in their excitement about using data mined from social media and other online sources such as Google searches to track—and perhaps one day predict—disease outbreaks. Research in this area has demonstrated the value of user-generated data from the Internet to predict past outbreaks, and the hope is that social media posts and internet searches could one day help track the spread of diseases in real time, yielding clues about a pathogen’s migration faster than traditional surveillance systems.


Is Scholarly Refereeing Productive (at the Margin)?

NBER Working Papers; Aboozar Hadavand, Daniel S. Hamermesh, Wesley W. Wilson



In economics many articles are subjected to multiple rounds of refereeing at the same journal, which generates time costs of referees alone of at least $50 million. This process leads to remarkably longer publication lags than in other social sciences. We examine whether repeated refereeing produces any benefits, using an experiment at one journal that allows authors to submit under an accept/reject (fast-track or not) or the usual regime. We evaluate the scholarly impacts of articles by their subsequent citation histories, holding constant their sub-fields, authors’ demographics and prior citations, and other characteristics. There is no payoff to refereeing beyond the first round and no difference between accept/reject articles and others. This result holds accounting for authors’ selectivity into the two regimes, which we model formally to generate an empirical selection equation. This latter is used to provide instrumental estimates of the effect of each regime on scholarly impact.


Satellite Imagery Provides Insights into Holiday Supply Chain Management

Medium, Planet Stories, Planet



Raw materials are gathered in some of the world’s most remote places, transformed into goods in far-flung factories and delivered to your door through a sophisticated system of trucks, trains, planes and ships. Satellites are able to monitor many of these happenings, giving a comprehensive look at global economic activity.

Here are some examples of how Earth observation can provide unique insights into the processes that contribute to holiday cheer.


U.S. Census Embraces the Digital Age With Enterprise Data Lake

MeriTalk



Census leadership established the Census Enterprise Data Lake (EDL) initiative. The EDL provides big-data processing capability to fulfill petabyte-scale data management and analytics while satisfying security and privacy requirements and controlling costs. The initiative is transforming how the agency processes demographic and economic data using open-source technology and high-performance cloud infrastructure.

“The EDL will support the processing of big datasets quickly and easily with large, dynamically scalable compute and storage capabilities throughout the enterprise,” says Kevin Smith, chief information officer at the U.S. Census Bureau. “The data lake also provides a centralized repository to consolidate operational paradata, response data, and cost data from multiple modes of data collection. It provides a single place to analyze all operational data and make informed decisions during operations.”


Smart Homes Get Smarter: CES 2020 Brings Artificial Intelligence to Everything

NBC Boston



In a bustling Las Vegas conference hall, a handpicked crop of tech innovators showed off their wares to reporters ahead of 2020’s CES tech conference. The exhibitors at the media-only preview event, CES Unveiled, include many of this year’s CES Innovation Award winners — with products ranging from smart sinks and door locks to entirely new ways to charge electronics… [video, pre-roll + 3:11]


Patent Office May Change Artificial Intelligence Rules

The National Law Review



The number of patents for inventions based on artificial intelligence, machine learning and deep learning continues to grow rapidly. Some of these inventions relate to AI technology per se, and some relate to the use of AI in specific applications, including many in healthcare, financial services and blockchain, among other industries.

The USPTO has addressed various aspects of intellectual property issues with these technologies in various ways, including in an event it hosted entitled “Artificial Intelligence: Intellectual Property Policy Considerations (January 2019).” Due to some of the unique issues with these technologies, the USPTO is considering whether it should make any changes to how it handles examination of these applications. As part of this analysis, the USPTO issued a request for public comments on protection and examination of these inventions.


A.I. Predictions: Regulations, Cashierless Stores, and More Artificial Intelligence Predictions for 2020

Fortune, Jonathan Vanian



Most companies are still struggling to harness A.I. technologies like neural networks, or software that learns from data, according to surveys that Fortune covered last year. I believe this dilemma will continue in 2020, due in part to the lack of available A.I. talent that can build and maintain these complicated systems.

Despite the potential A.I. setbacks, businesses will continue spending big money on A.I. in the hope that their investments will eventually pay off. There’s just too much at stake (and too much hype) to ignore it.

Here are my predictions for what to expect in A.I. in 2020:

1. The regulators are coming


No laboratory needed: The Person Project mines social science data with secure online activities

Princeton University, News



Understanding the human psyche is complex, for ordinary people and scientists alike. Now, researchers at Princeton University have created a new tool for social scientists to study human psychology through a series of fun, thought-provoking activities.

Launched this month, the Person Project website and smartphone app (available for both iPhone and Android) are designed to encrypt and securely store data collected from users who participate in these activities, providing an online platform for researchers to use in academic studies.

“The goal of this project is to connect people to psychological research and give curious users the opportunity to learn a little about themselves while helping us advance social science,” said Stacey Sinclair, co-principal investigator and professor of psychology and public affairs at the Woodrow Wilson School of Public and International Affairs.


Anantha Shekhar Named Pitt’s Senior Vice Chancellor for Health Sciences and Medical School Dean

University of Pittsburgh News



Anantha Shekhar, MD, PhD, a nationally recognized educator, researcher and entrepreneur with major contributions in medicine and life sciences, has been named senior vice chancellor for the health sciences and John and Gertrude Petersen Dean of the School of Medicine at the University of Pittsburgh. His start date is set for June 2020.

Shekhar joins Pitt from Indiana University (IU), the nation’s largest medical school, where he holds a number of leadership roles. These are: executive associate dean for research affairs, distinguished professor, associate vice president for university clinical affairs and research, executive vice president of academic affairs for clinical research at IU Health, and founding director of the Indiana Clinical and Translational Sciences Institute, the only statewide institute of its kind.


Nano antennas for data transfer

University of Würzburg, News



For the first time, physicists from the University of Würzburg have successfully converted electrical signals into photons and radiated them in specific directions using a low-footprint optical antenna that is only 800 nanometres in size.


Google Has the Data, but Health AI Researchers Need Your Consent

Bloomberg Law, Jeannie Baumann



Google project managers called Camille Nebeker out of the blue, intrigued by her paper on using artificial intelligence for healthy aging. The tech giant sees a new market for its products, which could generate reams of new data for researchers studying AI in health—provided they can close the gap on consent.

The conversation with Nebeker, an associate professor at the University of California, San Diego’s medical school, has Google considering testing products beyond the 15-to-35-year-old consumer market it had previously targeted.

“And now they just want to go out to the retirement communities and start collecting data from residents to figure out how they can pitch their product to that demographic,” Nebeker said.


Austin-area universities see boom in online enrollment

Austin American-Statesman, Lara Korte



Nearly a quarter of UT students took an online class in 2018, up more than 80% from 2015. It’s a big leap for the school, but it’s not alone: across the country, hundreds of colleges are increasing their offering of online courses, encouraging students to take advantage of the convenience and flexibility that comes with “distance education.” Recently released federal data show 34.7% of all college students in the country took at least one online class in the fall of 2018.

“In 2015, we still only had a couple courses,” TenBarge said of the studio where professors can broadcast lectures, record podcasts and host virtual office hours. “Now, we’ve grown to providing over 5% of the university’s undergraduate credit hours.”


Startup building ‘living map’ for healthcare raises $50M

MedCity News, Elise Reuter



Komodo Health raised a $50 million series C round led by Andreessen Horowitz. It plans to use the influx of funds to build software for healthcare companies using its trove of data.

The startup has recorded de-identified clinical data from more than 320 million patients in its system. It brings in the data and transforms it to make it usable with everything else.

“It’s taken us years to pull enough of this together to get a deep and representative view of the U.S.,” Komodo Health CEO and Co-Founder Dr. Arif Nathoo said. “Why it’s so hard, why we haven’t seen the AI revolution in medicine, is because data is trapped in all of these silos.”

 
Events



Pinterest Labs Tech Talk – Percy Liang on Learning from Language

Pinterest Labs



Palo Alto, CA January 13 starting at 5:15 p.m., Pinterest (395 Page Mill Rd). [rsvp required]

 
Deadlines



14th Annual Machine Learning Symposium

New York, NY March 13 at the New York Academy of Sciences. Abstract submissions are invited for a poster session. The deadline for abstract submission is January 20.

L’Oréal USA For Women in Science

“L’Oréal USA partners with the American Association for the Advancement of Science (AAAS) to manage the program’s application and peer-review process. Each year, the program attracts talented applicants from diverse STEM fields, representing some of the nation’s leading academic institutions and laboratories.” Deadline to apply is January 31.

Please share your experience with the South Big Data Hub!

“Start the New Year off by sharing your experiences, needs, and perspectives with us! Spare a few minutes and help inform the future strategic South Hub programs, activities, & opportunities. It’s completely anonymous!”

NCI Systems Biology and Physical Sciences Summer Research Program

“Undergraduate students interested in solving complex problems in cancer through interdisciplinary research are encouraged to apply by February 7, 2020.”

CHI 2020 Workshop – Conversational Agents for Health and Wellbeing

Honolulu, HI April 26 at CHI Conference. “This one-day workshop at the CHI 2020 in Honolulu, Hawaii, USA invites HCI designers, researchers, healthcare professionals, health informatics researchers, and AI developers and researchers to contribute to the emerging area of conversational agents in healthcare.” Deadline for position papers is February 11.

2020 California Water Data Science Symposium Presenter Interest Form

“The Surface Water Ambient Monitoring Program (SWAMP), the California Water Quality Monitoring Council, the California Water Boards Data Center, and the San Francisco Estuary Institute (SFEI) are hosting the Fifth Annual California Water Boards Water Data Science Symposium on June 29-30, 2020 in Sacramento, CA. This free annual event aims to enhance how water quality monitoring generates meaningful data to inform water quality management decisions.” Deadline for submissions is February 29.

Statistical Partnerships Among Academe, Industry, and Government (SPAIG) Award Nomination Form

“The SPAIG Award was established in 2002 to recognize outstanding partnerships between academe, industry, and government organizations, as well as to promote new partnerships among these organizations. The award is sponsored by the SPAIG committee of the ASA and distinct from other ASA awards in that it recognizes outstanding collaborations between organizations, while recognizing key individual contributors.” Deadline for nominations is March 1.
 
Tools & Resources



Search Smarter by Dorking

Exposing the Invisible, The Kit 1.0 documentation, Gabi Sobliye



“A look at advanced internet searches with ‘Google dorking,’ how they work across different search engines, and how you can use the technique in your investigations. Mind the tips on protecting your privacy while searching, and safeguarding your personal information from those who might use this technique for malicious purposes.”
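
For readers new to the technique, a "dork" is an ordinary search query narrowed with operators. The sketch below is not taken from the linked guide; it composes a query string from a few widely used operators (site:, filetype:, intitle:). The function name build_dork and the example.org domain are hypothetical, and operator support may differ between search engines.

```python
def build_dork(terms, site=None, filetype=None, intitle=None):
    """Compose a search query string from plain terms plus optional operators."""
    # Wrap multi-word phrases in quotes so they are matched as exact phrases.
    parts = [f'"{t}"' if " " in t else t for t in terms]
    if site:
        parts.append(f"site:{site}")          # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict results to one file type
    if intitle:
        parts.append(f"intitle:{intitle}")    # require a word in the page title
    return " ".join(parts)


# Hypothetical example: PDF budget documents on a single domain.
query = build_dork(["annual report", "budget"], site="example.org", filetype="pdf")
print(query)  # "annual report" budget site:example.org filetype:pdf
```

The resulting string can be pasted into a search box as-is; keeping it as plain text makes it easy to retry the same query on another engine.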


Building a Successful Modern Data Analytics Platform in the Cloud

Medium, Software Engineering, ML-guy



“TL;DR, design the data platform with three layers, L1 with raw files data, L2 with optimized files data, and L3 with cache in mind. Ingest the data as it comes into L1, and transform each use-case independently into L2, and when a specific access pattern demands it, cache some of the data into a dedicated data store.”
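
The three-layer idea in that summary can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not the article's implementation: the class DataPlatform and its method names are hypothetical, and a real platform would use object storage and a dedicated data store rather than Python lists and dicts.

```python
from dataclasses import dataclass, field


@dataclass
class DataPlatform:
    """Toy three-layer platform; every name here is hypothetical."""
    l1_raw: list = field(default_factory=list)        # L1: records exactly as they arrive
    l2_optimized: dict = field(default_factory=dict)  # L2: one transformed dataset per use case
    l3_cache: dict = field(default_factory=dict)      # L3: keyed lookups for hot access patterns

    def ingest(self, record: dict) -> None:
        # Ingest into L1 without reshaping the record.
        self.l1_raw.append(record)

    def transform(self, use_case: str, fn) -> None:
        # Each use case builds its own L2 dataset from L1, independently of the others.
        self.l2_optimized[use_case] = [fn(r) for r in self.l1_raw]

    def cache(self, use_case: str, key_fn) -> None:
        # Only when an access pattern demands it: index an L2 dataset by key in L3.
        self.l3_cache[use_case] = {key_fn(row): row for row in self.l2_optimized[use_case]}


platform = DataPlatform()
platform.ingest({"user": "a", "amount": "10.5"})
platform.ingest({"user": "b", "amount": "3"})
platform.transform("billing", lambda r: {"user": r["user"], "amount": float(r["amount"])})
platform.cache("billing", lambda row: row["user"])
print(platform.l3_cache["billing"]["a"]["amount"])  # 10.5
```

Because L1 keeps the raw records untouched, a new use case can be added later by running another transform over the same data without re-ingesting anything.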


Thirteen tips for engaging with physicists, as told by a biologist

Nature, Career Column, Ken Kosik



My goal was simply to open a conversation and possibly a collaboration with physicists, not to become one. As a relatively small institution with a distinguished faculty in both the physical sciences and engineering, UCSB was, I felt, an ideal place to wade into this territory. Over the ensuing decade, this risky move has resulted in my sharing many graduate students and postdoctoral fellows with computer-science, engineering, physics and chemistry faculty members. These collaborations have greatly broadened the science in all my publications.

I devised a few simple rules to help the biologist in me to cross the divide between the life and physical sciences. In learning to talk to physicists, I discovered that I can communicate better with everyone and clarify for myself what I do and do not understand in my own field.

1. Understand what ‘I do not understand’ means
