Data Science newsletter – April 5, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for April 5, 2018


Data Science News

William Tang wins 2018 Global Impact Award to advance development of AI software to help create a ‘star on earth’

Princeton University, Princeton News


Physicist William Tang has won a highly competitive $100,000 Global Impact Award from NVIDIA Corp., the leading producer of graphics processing units (GPUs), for carrying out artificial intelligence (AI) computing. This award was one of two presented at the NVIDIA national GPU technology conference held March 26-29 in San Jose, California.

Tang, a principal research physicist at the U.S. Department of Energy’s (DOE) Princeton Plasma Physics Laboratory (PPPL), is a lecturer with the rank and title of professor in Princeton University’s Department of Astrophysical Sciences. He leads a team of scientists that is using modern NVIDIA GPUs to develop a form of AI machine learning called “deep learning” to predict and mitigate the onset of dangerous disruptive events capable of terminating fusion reactions in doughnut-shaped devices known as “tokamaks.”

The Hairy Problem With Drug Testing and Chemical Analysis

WIRED, Science, Ellen Airhart


Keri Hogan was about to become a police officer when she submitted a sample of her hair to the city of Boston for testing. The city, in turn, gave it to a company called Psychemedics, which washed the hair, dissolved it, and used gas chromatography and mass spectrometry—chemical analysis techniques—to check it twice over for evidence of cocaine. Hogan’s hair tested positive.

Boston police officers whose hair tests positive for drugs usually have two options: admit their substance abuse problems and agree to a stint in rehabilitation, or relinquish their position. But Hogan, who finished her police training prior to 2005, says that she has never used cocaine; when she sent her hair to a private company for more testing, it came back negative. Now, she and nine other black police officers are suing the city of Boston, saying that the practice of testing hair for drugs is discriminatory. Because of the chemistry behind the test, they say, it unfairly targets dark hair. The bench trial for the case began on March 12, and may have long-lasting consequences for the future of drug testing.

Q1 2018: Funding keeps climbing as digital health startups double down on validation

Rock Health, Megan Zweig and Denise Tran


On the heels of the biggest year in venture funding, the digital health space is starting off 2018 with a bang: record Q1 funding of $1.62B, three $100M+ mega-deals, and a massive exit. Compared to last year, the commotion from policy debates has largely settled and a path to regulatory clarity has emerged. On our end, we’ve launched a couple of new sections within our funding post—check out our deep investor analysis as well as an update on the sometimes elusive, always critical quest of every digital health company: validation.

Government Data Science News

The Australian Privacy Commissioner launched a formal investigation into Facebook’s improper data sharing of 87m accounts with Cambridge Analytica. Hefty fines and new regulations may follow.

Puerto Rico is planning to close its state-run statistical agency, the Puerto Rico Institute of Statistics, and outsource the work to a private contractor that will be chosen by the government. Local non-profits, the American Statistical Association, and Puerto Rican scientists have all sounded alarm bells about what this will mean for public access to data.

Canada’s 150 Research Chairs program, which recruits top foreigners to teach and conduct research at Canadian universities, has attracted many away from American universities. Canada has also been spending quite a bit on AI. I imagine some of my readers may be considering a move north. Let me know if that describes you. I would love to hear about the decision-making process.

Berkeley (the city, not the state university) is looking to prohibit any new contracts with database operators who are also working with the U.S. Federal Government or any other governmental entity to target immigrants and undocumented people. In particular, they mention IBM Inc., Palantir Technologies Inc., and LexisNexis.

DARPA is racing to figure out how to detect ‘deep fakes’: images and video that have been doctored but are released as if they are real. This is a key battleground in the war to protect the concept of truth.

The US Department of Justice is continuing to pressure Harvard into releasing more detailed data about its admissions processes, following a racial discrimination suit brought on behalf of Asian American students. The students feel they have to meet a higher standard to gain admission compared to their white, black, and Hispanic peers.

4 Things Universities Need to Know to Teach Data Science

EdTech, Meghan Bogardus Cortez


Technologies that collect data for analysis exist in pretty much every aspect of life — from smart thermometers in homes to wireless sensor networks.

Every industry is seeking ways to collect data in order to discover actionable insights, and demands are increasing in the workforce for people who know how to deal with this data. But how do higher education institutions prepare the next generation of workers to take on that challenge?

The National Academies of Sciences, Engineering and Medicine sought to answer that question in its new report, “Training Students to Extract Value from Big Data: Summary of a Workshop.”

“Advances in technology have made it easier to assemble and access large amounts of data,” reads the report. “Now, a key challenge is to develop the experts needed to draw reliable inferences from all that information.”

[P] The 2018 Stanford CS224n NLP course projects are now online. A lot of them are pretty impressive. (r/MachineLearning)


26 comments as of April 5

First Amendment Lawsuit Brought on Behalf of Academic Researchers Who Fear Prosecution Under the Computer Fraud and Abuse Act



A federal judge has rejected the government’s position in a lawsuit filed by a group of university professors and journalists who argue that a federal computer crimes law unconstitutionally criminalizes research aimed at uncovering whether online algorithms result in racial, gender, or other illegal discrimination in areas such as employment and housing. The district court’s decision late Friday allowed the case to proceed, although the judge did dismiss several of the claims, removing some of the plaintiffs from the lawsuit.

The case, brought by the American Civil Liberties Union in June 2016, challenges a section of the Computer Fraud and Abuse Act (CFAA) that the government argues makes it a crime to violate a website’s terms of service. Those terms, which are arbitrarily set by individual sites and can change at any time, often prohibit things like creating multiple “tester” accounts, providing inaccurate information to websites, or using automated methods to record publicly available data like search results and ads.

Those same practices are used by researchers to test whether sites are, for example, more likely to show higher interest rate loan ads to people of color or show higher paying jobs to men who search employment listings. Studies like these necessarily require researchers to create dummy online identities and record what content is served up to those identities.

Algorithms trace how stereotypes have changed

Futurity, Stanford University


Word embeddings—an algorithmic technique that can map relationships and associations between words—can measure changes in gender and ethnic stereotypes over the past century in the United States. … “Word embeddings can be used as a microscope to study historical changes in stereotypes in our society,” says paper coauthor James Zou, an assistant professor of biomedical data science at Stanford University. “Our prior research has shown that embeddings effectively capture existing stereotypes and that those biases can be systematically removed. But we think that, instead of removing those stereotypes, we can also use embeddings as a historical lens for quantitative, linguistic, and sociological analyses of biases.”
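To make the idea concrete, here is a minimal sketch of how an association between a word and two groups of words could be scored from embeddings. It is not the method from the Stanford paper: the vectors are made-up three-dimensional toy values (real embeddings are trained on large text corpora and have hundreds of dimensions), and the function name association_gap and the placeholder word "occupation_x" are hypothetical, chosen only for illustration.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association_gap(word_vec, group_a_vecs, group_b_vecs):
    """Similarity of a word to the mean of group A minus its similarity
    to the mean of group B. Positive means closer to group A.

    Illustrative score only; an assumption, not the paper's metric.
    """
    mean_a = np.mean(group_a_vecs, axis=0)
    mean_b = np.mean(group_b_vecs, axis=0)
    return cosine(word_vec, mean_a) - cosine(word_vec, mean_b)


# Toy, hand-written vectors (NOT real embeddings, NOT real data).
toy = {
    "she": np.array([1.0, 0.1, 0.0]),
    "her": np.array([0.9, 0.2, 0.0]),
    "he": np.array([0.1, 1.0, 0.0]),
    "him": np.array([0.2, 0.9, 0.0]),
    "occupation_x": np.array([0.8, 0.3, 0.5]),  # hypothetical neutral word
}

group_a = [toy["she"], toy["her"]]
group_b = [toy["he"], toy["him"]]

gap = association_gap(toy["occupation_x"], group_a, group_b)
print(f"association gap: {gap:+.3f}")
```

Computing a score like this on embeddings trained separately for each decade of text is one way such a measure could be tracked over time.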

Robotics Maker Moves from War Machines to Powering Up Workforce

IndustryWeek, Matt LaWell, John Hitch


Sarcos Robotics CEO Ben Wolff talks about how the company, which started the 21st century as a military contractor, could soon give general contractors, and any industrial worker, superpowers.

Can New Forensic Tech Win War On AI-Generated Fake Images?

Fast Company, Steven Melendez


As AI makes video manipulation easier, Defense Advanced Research Projects Agency scientists race to develop tools to detect what’s real and what’s not.

UCLA scientists merge statistics, biology to produce important new gene computational tool

UCLA Newsroom


The cells in our bodies express themselves in different ways. One cell might put a chunk of genetic code to work, while another cell ignores the same information entirely. Understanding why could spur new stem cell therapies, or lead to a more fundamental understanding of how organisms develop. But zeroing in on these cell-to-cell differences can be challenging.

Now, two UCLA researchers have come up with a computational tool that increases the reliability of measuring how strongly genes are expressed in an individual cell, even when the cell is barely reading certain genes. The research was published last month in the journal Nature Communications.

“The DNA sequence is the same in a brain cell, a liver cell and a heart cell,” said Jingyi “Jessica” Li, the study’s corresponding author and a UCLA assistant professor of statistics. “Why do those cells look so different? The key thing is gene expression.”

Facebook Says Data on Most of Its 2 Billion Users Is Vulnerable

Bloomberg Tech, Sarah Frier


Facebook Inc. said data on most of its 2 billion users could have been accessed improperly, giving fresh evidence of the ways the social-media giant failed to protect people’s privacy while generating billions of dollars in revenue from the information.

The company said it removed a feature that let users enter phone numbers or email addresses into Facebook’s search tool to find other people. That was being used by malicious actors to scrape public profile information, it said.

“Given the scale and sophistication of the activity we’ve seen, we believe most people on Facebook could have had their public profile scraped in this way,” the company said. “So we have now disabled this feature.”

A Data Science Tour of Duty

Caitlin Hudon


In September 2015, I was looking for a job in Austin and started interviewing at Yodle, a marketing tech company. Yodle was looking for someone with experience in data and predictive analytics, and I was looking for a company where I could learn how to code and work on new, interesting problems. It was a good match, but one thing that made this opportunity stand out was the way that my soon-to-be boss described what my time there would be like — a “tour of duty”. Tim was building a data science-y team and was testing out a management framework he had discovered called “The Alliance”. I was intrigued.

Norway picks autonomous shipping lane

Innovators magazine


Global maritime group Wilhelmsen and tech giant KONGSBERG have established a new autonomous shipping company.

Called Massterly, it will operate new ‘control centres’ to manage autonomous shipping in Norway and beyond.

“As a world-leading maritime nation, Norway has taken a position at the forefront in developing autonomous ships. Through the creation of the new company named Massterly, we take the next step on this journey by establishing infrastructure and services to design and operate vessels, as well as advanced logistics solutions associated with maritime autonomous operations. Massterly will reduce costs at all levels and be applicable to all companies that have a transport need”, says Thomas Wilhelmsen, Wilhelmsen group CEO.

Extra Extra

Paul Allen funded a team of researchers and engineers (Columbia University and UW-Seattle) to get underwater drones monitoring Antarctica to better understand climate change. Three of them have been deployed. They will be put into semi-hibernation mode 9 months of the year, with remote monthly wake-up calls to capture some data before hibernating again.

The news about Paul Allen, UW and Columbia was the tip of the proverbial iceberg. More high-quality longform about Antarctica:

  • Antarctica’s Ice Is Becoming Unhinged by Brian Kahn at Earther,
  • Inside Antarctica: the continent whose fate will affect millions by Pilita Clark at

    DataEDGE Conference

    University of California-Berkeley, School of Information


    Berkeley, CA April 24 in Banatao Auditorium. “The UC Berkeley School of Information’s DataEDGE conference will bring you up to speed quickly on the current state of the data revolution.” [$$$]

    MinneAnalytics Boston Community Launch event



    Boston, MA April 10. “Data science professionals in the Boston area are invited to meet our organizing team and share their ideas at a community launch event on Tuesday, April 10 at OptumLabs in Cambridge.” [free, registration required]

    Jane Street Tech Talks – OCaml All The Way Down

    Jane Street


    New York, NY April 18 starting at 5 p.m., Jane Street (250 Vesey Street). “Come and find out how we design and test our FPGAs.” [free, registration required]

    A Talk with Brian Nosek: Improving Openness and Reproducibility in Scholarly Communication

    University of California-San Diego, The Library


    La Jolla, CA April 19 at 2 p.m. [free]


    American Museum of Natural History – Brown Scholars program

    The Brown Scholars program is a new 3-year intensive focused on the intersection of computer science and science. Participants are NYC-area girls who have completed 1 or 2 years of high school. Deadline to apply is April 15.

    2018 U.S. ASPIRE Competition

    “In its eighth year running, the APEC Science Prize for Innovation, Research and Education (ASPIRE) is an annual award hosted by the Asia Pacific Economic Cooperation Policy Partnership for Science, Technology, and Innovation that recognizes young scientists who have demonstrated a commitment to excellence in scientific research and cross-border collaboration.” Deadline to apply for nomination is April 20.

    2018 Community Indicators Consortium Impact Summit: Community Indicators in Action

    Minneapolis, MN September 17-18. “We will showcase, and learn from, leading measuring and monitoring initiatives, impact, communications strategies and innovations for community change.” Deadline for proposals is April 20.

    CVPR 2018 WAD Video Segmentation Challenge

    Deadline for entries is June 4. $2500 first prize.

    Call for Abstracts: 2018 Rice University Data Science Conference

    Houston, TX October 8-9. Deadline for abstract submissions is August 3.

    Moore-Sloan Data Science Environment News

    Informing E-cigarette Public Health Policy with Data Science

    Medium, NYU Center for Data Science


    Dustin T. Duncan, Assistant Professor of Population Health, offers new data analysis of e-cigarette prevalence in U.S.

    Tools & Resources

    How Javascript and Machine Learning Can Actually Collaborate

    Open Data Science, Caspar Wylie


    Surprising article title, isn’t it? Based on past experiences in data science, you might not expect Javascript and Machine Learning to be in the same sentence. In this article, however, we’re going to look at how and why these two are starting to collaborate rather successfully. There are many hidden uses for a collaboration between Javascript and Machine Learning and we are only starting to see its potential.

    Machine Learning requires a lot of computing power when implementing a neural network model, an activity that languages like Javascript are simply not built for. Because of this limitation, you would not expect such implementations to work effectively in a web browser. This is where the collaboration between Javascript and Machine Learning proves common beliefs to be somewhat of a fallacy: Javascript and Machine Learning can work quite well together, when necessary, in developing more engaging and advanced web browser capabilities.

    One of the most obvious benefits of using web browsers is that they don’t require installation or intricate setup to use easily accessible web applications across the internet.



    Postdoc Position in Clinical Neuroscience

    Cleveland Clinic, Lerner Research Institute; Cleveland, OH

    Internships and other temporary positions

    Data Science Contractor

    Snapsheet; Chicago, IL

    Product/Industrial Design and Prototyping Internship

    PARC; Palo Alto, CA
