Data Science newsletter – November 23, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for November 23, 2018


 
 
Data Science News



Germany’s €3B plan to become an AI powerhouse

POLITICO, Janosch Delcker



In the global race to dominate artificial intelligence, Europe’s industrial powerhouse is taking the gloves off.

On Thursday, Chancellor Angela Merkel will present her strategy on how Germany aims to beat the United States, China and other nations in the emerging second wave of artificial intelligence (AI) that uses industrial rather than consumer data to boost factories and supply chains around the world.


AI Is Compelling, But AI And Data Science Operations Must Improve

Forbes, CommunityVoice, Winnie Cheng



“Integration is the other major ‘front’ I see in the battle to make AI operational. How can conventional software development teams ensure their application code can correctly call into the data science code? If the code built around the machine learning model changes, and the application code is changed accordingly, how can the data science and development teams synchronize the release of each of their code changes? In reality, this is very hard.”


How Universities Are Leveraging H-1B Visas To Fuel Startup Hubs

Crunchbase News, Natasha Mascarenhas



As obtaining an H-1B visa becomes more difficult due to the Trump administration’s crackdown on immigration and work visas, universities around the country are tapping into a creative way to legally attract and retain foreign talent: the Global Entrepreneur in Residence Program (GEIR).

Started by the University of Massachusetts Venture Development Center in Boston, the program invites foreign entrepreneurs to work part-time at a school, either as a mentor or an adjunct professor. In return, entrepreneurs may get a cap-exempt H-1B visa that gives them the latitude to work on their own startups when not fulfilling university duties. The idea is that the foreign-born entrepreneur stays in the program for about two years, or until they secure a green card or another visa.

The program is looking for founders that want to start their own companies, according to William Brah, the founder and executive director of the UMass branch. In their case, the majority of startups born out of the program have been tech-focused. Some of the startups born out of this program include Ori Systems, Enigma, and Loggr.


Making sense of Microsoft’s approach to AI

ZDNet, Mary Jo Foley



Microsoft has a master plan for trying to get more customers to jump into the AI waters. Here’s an attempt to explain how containers, accelerators, and APIs all figure in.


Palantir may go public, but can it turn a profit?

Marketplace, Molly Wood



The data analytics company Palantir is reportedly considering going public. Palantir is the company co-founded by controversial Silicon Valley billionaire Peter Thiel, formerly of PayPal. It’s named after an all-seeing artifact in the “Lord of the Rings” trilogy. The company promises police departments, governments, even the IRS, that it can take in huge amounts of data and make artificial intelligence-informed guesses to help track down criminals and cheats, among other things. In a secret pilot program in New Orleans, Palantir tech even tried to predict when crime would happen or who might be a victim. But lately its huge $20 billion valuation is in doubt, and privacy activists are concerned about its tactics. Molly Wood talked about it with Mark Harris, a reporter who’s covered Palantir for Wired magazine. The following is an edited transcript of their conversation. [audio, 6:27]


Can artificial intelligence improve maps for land …

Thomson Reuters Foundation, Gregory Scruggs



In December 2016, environmental group Chesapeake Conservancy unveiled one of the largest, high-resolution land-cover maps made in the United States.

It analysed every square metre of satellite data in the 207 cities and counties that touch the watershed of the Chesapeake Bay on the U.S. eastern seaboard.

The bay, North America’s biggest estuary, has struggled to recover from overfishing and pollution, and the conservancy hopes the map will guide environmental restoration decisions like where to plant stormwater-absorbing trees.

Creating a 100,000-square-mile (259,000 square kilometres) digital map that defined land use – water, vegetation or concrete – at such a fine scale was “gruelling”, said project director Jeff Allenby.


Natural language generation for electronic health records

npj Digital Medicine, Scott H. Lee



One broad goal of biomedical informatics is to generate fully-synthetic, faithfully representative electronic health records (EHRs) to facilitate data sharing between healthcare providers and researchers and promote methodological research. A variety of methods exist for generating synthetic EHRs, but they are not capable of generating unstructured text, like emergency department (ED) chief complaints, history of present illness, or progress notes. Here, we use the encoder–decoder model, a deep learning algorithm that features in many contemporary machine translation systems, to generate synthetic chief complaints from discrete variables in EHRs, like age group, gender, and discharge diagnosis. After being trained end-to-end on authentic records, the model can generate realistic chief complaint text that appears to preserve the epidemiological information encoded in the original record-sentence pairs. As a side effect of the model’s optimization goal, these synthetic chief complaints are also free of relatively uncommon abbreviations and misspellings, and they include none of the personally identifiable information (PII) that was in the training data, suggesting that this model may be used to support the de-identification of text in EHRs. When combined with algorithms like generative adversarial networks (GANs), our model could be used to generate fully-synthetic EHRs, allowing healthcare providers to share faithful representations of multimodal medical data without compromising patient privacy. This is an important advance that we hope will facilitate the development of machine-learning methods for clinical decision support, disease surveillance, and other data-hungry applications in biomedical informatics.


Duke to Use mHealth Wearables to Help Freshmen Improve Health

mHealth Intelligence, Eric Wicklund



Duke University will be arming incoming freshmen with smartwatches next year in an mHealth program that aims to improve student health outcomes.

The WearDuke program, developed by Geoff Ginsburg, a professor of medicine, and Susanne Haga, an associate professor of medicine, both with Duke’s Center for Applied Genomics and Precision Medicine, will use the wearables to track new students’ sleep and activity.

“We will initially be focusing on sleep because sleep is very well documented (as something) college students don’t get enough of,” Haga said in a story supplied by the university. “And it’s important to health, mental well-being and academic performance.”


The SpiNNaker Supercomputer, Modeled After the Human Brain, Is Up and Running

SingularityHub, Edd Gent



The brain is the most complex machine in the known universe, but that complexity comes primarily from its architecture rather than the individual components that make it up. Its highly interconnected structure means that relatively simple messages exchanged between billions of individual neurons add up to carry out highly complex computations.

That’s the paradigm that has inspired the “Spiking Neural Network Architecture” (SpiNNaker) supercomputer at the University of Manchester in the UK. The project is the brainchild of Steve Furber, the designer of the original ARM processor. After a decade of development, a million-core version of the machine that will eventually be able to simulate up to a billion neurons was switched on earlier this month.


Public Attitudes Toward Computer Algorithms

Pew Research Center, Aaron Smith



Algorithms are all around us, utilizing massive stores of data and complex analytics to make decisions with often significant impacts on humans. They recommend books and movies for us to read and watch, surface news stories they think we might find relevant, estimate the likelihood that a tumor is cancerous and predict whether someone might be a criminal or a worthwhile credit risk. But despite the growing presence of algorithms in many aspects of daily life, a Pew Research Center survey of U.S. adults finds that the public is frequently skeptical of these tools when used in various real-life situations.

This skepticism spans several dimensions. At a broad level, 58% of Americans feel that computer programs will always reflect some level of human bias – although 40% think these programs can be designed in a way that is bias-free. And in various contexts, the public worries that these tools might violate privacy, fail to capture the nuance of complex situations, or simply put the people they are evaluating in an unfair situation. Public perceptions of algorithmic decision-making are also often highly contextual. The survey shows that otherwise similar technologies can be viewed with support or suspicion depending on the circumstances or on the tasks they are assigned to do.


This AI Startup Generates Legal Papers Without Lawyers, and Suggests a Ruling

Bloomberg Businessweek, Patrick Gillespie



[Ignacio] Raffa, a local startup founder, and his colleagues created Prometea in partnership with the DA’s office. The agency says the app has helped redirect staffers away from legal scut work and toward more complex cases, and that proofreaders rarely find errors when reviewing the computer-generated files. The app has attracted interest from the United Nations, the World Bank, and the Inter-American Development Bank in Washington, among others. “It can help legal systems around the world,” says Asha Aravindakshan, a Sloan Fellow at MIT who saw a demo of the app this summer. “Everyone has a backlog.”


Major AI conference is moving to Africa in 2020 due to visa issues

VentureBeat, Khari Johnson



While Bengio said the event will be held in Africa, Dean, Gebru, and Google Brain resident Sara Hooker specified that the conference will take place in Addis Ababa, the capital of Ethiopia.

Throughout much of the year, Black in AI is an organization that exists primarily online as a forum, Facebook group, and Twitter account that promotes opportunities and accomplishments of people of African descent in AI.

However, in recent weeks, as the group prepares to hold its largest gathering of the year at NeurIPS (formerly NIPS) in Montreal to discuss their latest work, members of the organizing committee say Canadian immigration officials have been tougher on applications from speakers and workshop attendees than last year, despite the fact that the conference was held in Long Beach, California in 2017 and African countries are included in the U.S. State Department’s travel ban.


Now that @LibCarpentry is officially a project of @thecarpentries, I thought I would update the ‘brief history’ timeline that I made last year when @datacarpentry and @swcarpentry were merging.

Twitter, Rayna Harris




Using Deep Learning for Mammography Assessment

Medium, NYU Center for Data Science



Currently, radiologists require years of training before they are qualified to isolate cancer indicators in x-ray images. The results of this type of analysis are part of the Breast Imaging-Reporting and Data System (BI-RADS). This system depends upon this human expertise to categorize scan results on a scale from 0 to 6. Under this scale, 0 is inconclusive, 1 is negative for cancer, and 2 represents a benign growth. Any increase beyond 2 indicates that the clinician has identified positive markers for cancer. Lakshmi Subramanian, CDS affiliate and Associate Professor of Computer Science, Ulzee An, NYU Master’s Student in Computer Science, and Khader Shameer of Northwell Health explored the application of deep learning to mammography in their recent publication. In their work, Subramanian and the research team broke down the process into “a classification task specializing in discriminating tissue expressions locally, then a full context heatmap regression model which guides the aggregation of local results.”

 
Events



2018 HITLAB Innovators Summit: New York City.

HITLAB



New York, NY November 27-28. “The 5th Annual HITLAB Innovators Summit: NYC brings together leaders in life sciences, medicine, technology, venture capital, and design to discuss digital solutions for pressing health issues.” [$$$]


CompleNet 2019

CompleNet 2019



Tarragona, Catalonia, Spain March 18-21, 2019. “The International Conference on Complex Networks (CompleNet) brings together researchers and practitioners from diverse disciplines working on areas related to complex networks.” [$$$]

 
Deadlines



Wellcome Trust partners with Sage Bionetworks to launch Wellcome Data Re-use Prizes

“We are excited to partner with the Wellcome Trust to demonstrate how the re-use of public datasets can lead to better scientific insights,” said Dr. Larsson Omberg, Vice President for Systems Biology at Sage Bionetworks. “In addition to judging the reproducibility of the methods, we are encouraging people to work together by awarding extra points for new collaborations.” Deadline for submissions is February 28, 2019.
 
Tools & Resources



Introducing the Priceonomics Data Content Platform

Priceonomics



We’re launching something new for people and companies that write about data, the Priceonomics Content Platform. With some conditions, you can publish the data-driven reports on Priceonomics and if the content is great, it will likely spread. We’ve been trying this feature with great customers like SeatGeek, CarMax and Craft and it has worked as expected.


Introducing the data jobs board: jobs for analysts, data scientists & biz ops

ProjectBI, Justin Butlion



A few weeks ago I launched my new project, The Data Jobs board by ProjectBI.

The goal behind the data jobs board is to make it easy for analysts, data engineers, data scientists and ops specialists to find jobs at awesome companies. The jobs board includes companies from Israel and the United States but is not limited to these geographies. The jobs board includes remote and in-office positions.


Statistical Thinking for the 21st Century (Open Textbook)

Russ Poldrack



“I ultimately came to feel that the students would be best served by a book that follows very closely to my lectures, so I started writing down my lectures into a set of computational notebooks that would ultimately become this book. The outline of this book follows roughly that of Field’s book, since the lectures were originally based in large part on the flow of that book, but the content is substantially different (and also much less fun and clever).”


Data from first two TESS Observation Sectors to be released in early December

NASA TESS, Knicole Colon



After a successful launch and several months of commissioning, the Transiting Exoplanet Survey Satellite (TESS) commenced science operations on July 25, 2018. TESS is currently collecting 2-minute cadence and 30-minute cadence full frame image science data from Observation Sector 5 in the Southern ecliptic hemisphere. Data from the first four Observation Sectors have been received on the ground and have been or are being processed through the pipeline. The TESS Science Office has been vetting planet candidates, and has begun issuing public Alerts on a select set of especially promising TESS candidates, that are also being followed up with ground-based telescopes to confirm.

It is anticipated that the full data set from Observation Sectors 1 and 2 will be released publicly via the Mikulski Archive for Space Telescopes (MAST) in early December 2018.

 
Careers


Internships and other temporary positions

Data Science Intern



iRobot; Bedford, MA

Postdocs

Postdoctoral Research Fellowships in Observational Research on PTSD and Related Syndromes



Harvard Medical School, Department of Health Care Policy; Boston, MA

Seeking Postdoctoral Candidates



Weill Cornell Medicine, Englander Institute for Precision Medicine; New York, NY
