Google Online Security Blog; Marc Stevens (CWI Amsterdam), Elie Bursztein (Google), Pierre Karpman (CWI Amsterdam), Ange Albertini (Google), Yarik Markov (Google), Alex Petit Bianco (Google), Clement Baisse (Google)
Cryptographic hash functions like SHA-1 are a cryptographer’s Swiss Army knife. You’ll find that hashes play a role in browser security, managing code repositories, or even just detecting duplicate files in storage. Hash functions compress large amounts of data into a small message digest. As a cryptographic requirement for widespread use, finding two messages that lead to the same digest should be computationally infeasible. Over time, however, this requirement can fail due to attacks on the mathematical underpinnings of hash functions or to increases in computational power.
Today, more than 20 years after SHA-1 was first introduced, we are announcing the first practical technique for generating a collision.
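To make the idea of a message digest concrete, here is a minimal sketch using Python's standard `hashlib` module. It only shows that inputs of very different sizes map to a digest of the same fixed length, and what "collision" means as a predicate. The helper names `sha1_hex` and `is_collision` are illustrative assumptions, and nothing here reproduces the collision technique announced in the post.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as a 40-character hex string."""
    return hashlib.sha1(data).hexdigest()


def is_collision(m1: bytes, m2: bytes) -> bool:
    """Two *different* messages collide when their digests are equal."""
    return m1 != m2 and sha1_hex(m1) == sha1_hex(m2)


# A 3-byte message and a 1,000,000-byte message (hypothetical inputs).
short_message = b"abc"
long_message = b"a" * 1_000_000

short_digest = sha1_hex(short_message)
long_digest = sha1_hex(long_message)

# Both digests are 160 bits (40 hex characters), whatever the input size.
print(len(short_digest) * 4, short_digest)
print(len(long_digest) * 4, long_digest)

# Ordinary inputs do not collide; finding a pair that does is the hard part.
print(is_collision(b"abc", b"abd"))
```

Because the digest is much shorter than the data it summarizes, collisions must exist in principle; the security requirement is only that finding one is computationally infeasible.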
National Institutes of Health, Data@NIH blog, Patti Brennan
The amount and diversity of data generated by NIH-funded research programs continues to grow rapidly; safe, scalable storage solutions, new analytic approaches, and an adaptable workforce are urgently needed. As this unprecedented revolution in biomedical information unfolds and NIH looks to the future of data science, some pitfalls remain! We must ensure that researchers have the ability to make meaningful use of this increasingly massive biomedical data resource. It is timely and critical for NIH to identify and implement new strategies to improve data discoverability, utility, and sustainability, including moving many large data sets into the cloud and making them adherent to the FAIR Principles: Findable, Accessible, Interoperable, and Reusable. Success in meeting this challenge will require leveraging the findings from the Big Data to Knowledge (BD2K) program, and a major infusion of resources.
Over the next several months, I will be working with NIH’s 27 ICs to develop efficient strategies to improve data discoverability, utility, and sustainability for the biomedical research community. I will work with the Division of Program Coordination, Planning, and Strategic Initiatives to engage the various pilot projects that are identifying critical points of success for future data science efforts. As part of this effort, the second phase of NIH’s cornerstone data science initiative, the Big Data to Knowledge (BD2K) program, will include investments to accelerate progress in the development of these new strategies through a pilot program for an NIH data Commons.
arXiv, Physics > Chemical Physics; Felix A. Faber, Luke Hutchison, Bing Huang, Justin Gilmer, Samuel S. Schoenholz, George E. Dahl, Oriol Vinyals, Steven Kearnes, Patrick F. Riley, O. Anatole von Lilienfeld
We investigate the impact of choosing regressors and molecular representations for the construction of fast machine learning (ML) models of thirteen electronic ground-state properties of organic molecules. The performance of each regressor/representation/property combination is assessed with learning curves which report approximation errors as a function of training set size. Molecular structures and properties at hybrid density functional theory (DFT) level of theory used for training and testing come from the QM9 database [Ramakrishnan et al, Scientific Data 1 140022 (2014)] and include dipole moment, polarizability, HOMO/LUMO energies and gap, electronic spatial extent, zero point vibrational energy, enthalpies and free energies of atomization, heat capacity and the highest fundamental vibrational frequency. Various representations from the literature have been studied (Coulomb matrix, bag of bonds, BAML and ECFP4, molecular graphs (MG)), as well as newly developed distribution based variants including histograms of distances (HD), and angles (HDA/MARAD), and dihedrals (HDAD). Regressors include linear models (Bayesian ridge regression (BR) and linear regression with elastic net regularization (EN)), random forest (RF), kernel ridge regression (KRR) and two types of neural networks, graph convolutions (GC) and gated graph networks (GG). We present numerical evidence that ML model predictions for all properties can reach an approximation error to DFT which is on par with chemical accuracy. These findings indicate that ML models could be more accurate than DFT if explicitly electron correlated quantum (or experimental) data was provided.
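As an illustration of what a learning curve reports, the sketch below measures mean absolute error as a function of training set size. Everything in it is an assumption made for illustration: the target is a toy function (`math.sin`), not a QM9 property, and the regressor is a simple 1-nearest-neighbor lookup standing in for the regressors named in the abstract. It is not the authors' code or data.

```python
import math


def target(x: float) -> float:
    """Toy stand-in for a molecular property (assumption, not QM9 data)."""
    return math.sin(x)


def nearest_neighbor_predict(train_x, train_y, x):
    """Predict the label of the training point closest to x."""
    idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[idx]


def learning_curve(train_sizes, n_test=200):
    """Return (training set size, mean absolute error) pairs."""
    # Fixed held-out test points on [0, 2*pi].
    test_x = [2 * math.pi * (j + 0.5) / n_test for j in range(n_test)]
    test_y = [target(x) for x in test_x]
    curve = []
    for n in train_sizes:
        # Evenly spaced training points; requires n >= 2.
        train_x = [2 * math.pi * i / (n - 1) for i in range(n)]
        train_y = [target(x) for x in train_x]
        errors = [
            abs(nearest_neighbor_predict(train_x, train_y, x) - y)
            for x, y in zip(test_x, test_y)
        ]
        curve.append((n, sum(errors) / len(errors)))
    return curve


curve = learning_curve([4, 16, 64, 256])
for n, mae in curve:
    print(f"N={n:4d}  MAE={mae:.4f}")
```

Plotting error against training set size on log-log axes is the usual way to read such a curve: a steadily falling line suggests the model keeps improving as more training data is added.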
When Anthony Tran took over the District’s public health lab late last year, he had a feeling something was wrong with its testing for the Zika virus. He had just come from the public health lab in New York City, where technicians had been finding markers for Zika in the blood of arriving travelers almost every day. In the smaller, but still international, city of Washington, the same test was negative — every time.
Soon, U.S. health officials joined in Tran’s concern: Samples supplied by the federal government of the frightening, mosquito-borne virus that were tested in the lab as a control were appearing as if they contained no virus.
“I knew then that something was tremendously wrong,” Tran said late last week in an interview. He halted testing, and with help from analysts at the Centers for Disease Control and Prevention, traced the problem to a mistake that any high school chemistry student could understand.
Company Data Science News
Palantir is combining “information on a subject’s schooling, family relationships, employment information, phone records, immigration history, foreign exchange program status, personal connections, biometric traits, criminal records, and home and work addresses” to aid Immigration and Customs Enforcement (ICE) in immigration and deportation cases.
The W3C released standards for web annotations that spell out “the data model, protocol and vocabulary for annotations.” This move will likely change the storage model for data from web commenting, especially if browsers become the commenting host rather than the current model where comments live on individual websites or in a handful of commenting services.
As the tech side of health care heats up (keep watching precision medicine), executives at top ten tech companies are leaving their jobs to launch health tech startups.
Apple is continuing to beef up its AI recruiting ability by announcing it will open a new office in Seattle, which has become Silicon Valley North. Apple is also endowing a professorship in AI and machine learning at UDub.
Chipmaker Intel is competing with GPU maker Nvidia. The question for hardware makers: will specialized chips designed to accommodate machine learning and neural nets eventually move onto the CPU? Other once stand-alone silicon has. Hat tip to The Economist’s coverage of AI. It is routinely excellent.
Stanford Prof and Pinterest Chief Scientist Jure Leskovec announced the launch of Pinterest Labs, another company where we see top staff splitting their time between industry and academia.
NASA has selected proposals for the creation of two multi-disciplinary, university-led research institutes that will focus on the development of technologies critical to extending human presence deeper into our solar system.
The new Space Technology Research Institutes (STRIs) created under these proposals will bring together researchers from various disciplines and organizations to collaborate on the advancement of cutting-edge technologies in bio-manufacturing and space infrastructure, with the goal of creating and maximizing Earth-independent, self-sustaining exploration mission capabilities.
“NASA is establishing STRIs to research and exploit cutting-edge advances in technology with the potential for revolutionary impact on future aerospace capabilities,” said Steve Jurczyk, associate administrator for NASA’s Space Technology Mission Directorate in Washington.
I’ve always been curious about what makes someone “look” male or female, probably because I’m female but have never looked conventionally feminine. I was a tomboy as a child and remained one as an adult, and I’m also tall, with unruly hair that’s easiest to keep short. So strangers often assume that I’m male: in restaurants and on planes, I’m often addressed as “sir”.
People who know me well are usually surprised that anyone could think I was male. But I don’t find it that surprising — we don’t tend to really look closely at strangers, and just make broad assumptions about them based on their outlines. Children are often an exception — they will scrutinize me for a while and then ask their embarrassed parents, “is that a boy or a girl?”
Knowing that there has been huge progress in recent years in using machine learning to classify images, I got curious: could I train a model to classify photos of people according to their gender? What “rules” would it learn, for making the decision? And how would it classify me?
These technologies all have staying power. They will affect the economy and our politics, improve medicine, or influence our culture. Some are unfolding now; others will take a decade or more to develop. But you should know about all of them right now.
The Chronicle of Higher Education, Shannon Najmabadi
A Tufts University professor has a proposal to combat gerrymandering: give more geometry experts a day in court.
Moon Duchin is an associate professor of math and director of the Science, Technology and Society program at Tufts. She realized last year that some of her research about metric geometry could be applied to gerrymandering — the practice of manipulating the shape of electoral districts to benefit a specific party, which is widely seen as a major contributor to government dysfunction.
At first, she says, her plans were straightforward and research-oriented — “to put together a team to do some modeling and then maybe consult with state redistricting commissions.” But then she got more creative. “I became convinced that it’s probably more effective to try to help train a big new generation of expert witnesses who know the math side pretty well,” she says.
This week, 42,000 of my closest friends each walked an average of 5 miles per day through the Orlando Convention Center at the annual HIMSS conference. One journalist told me “It’s overwhelming. You do your best to look professional and wear comfy shoes!”
After 50 meetings, and 12 meals in 3 days, here are my impressions of the experience.
1. Wearables, while still relevant, have gone from the peak of the hype curve to the trough of disillusionment.
Institute for Systems Biology, Wellcome Trust Sanger Institute
The Wellcome Trust Sanger Institute’s Catalog of Somatic Mutations in Cancer (COSMIC) team announces a new agreement to provide their data to the U.S.-based Institute for Systems Biology (ISB).
COSMIC is an expert-curated cancer mutation database, and is the world’s largest and most comprehensive resource for exploring the impact of somatic mutations in human cancers.
With this agreement, ISB has embedded the COSMIC data within the ISB Cancer Genomics Cloud (CGC), which is a cloud-based platform that uses Google BigQuery technology to bring unprecedented computing power to researchers around the world.
When Stephanie Tilenius, a former senior executive at eBay and Google, decided to start a health-coaching app, many in her network were incredulous. “Everyone thought I was crazy,” she recalls. “Some people loved that I wanted to do something to help others, but a lot socially ostracized me.”
For many entrepreneurs, the health sector offers an enticing opportunity—with strings attached. It’s an estimated $3 trillion market and is still dominated by a cadre of traditional players. But many in the technology sector have shied away from the industry after witnessing many high-profile failures and realizing that change doesn’t happen quickly. “Silicon Valley operators and investors see that health care needs better technology,” explains veteran health IT consultant Ben Rooks. “But they learn quickly that health care isn’t about radical disruption; it’s about slow evolution.”
Despite the challenges, a small but growing group of former technologists from companies like Google and Twitter are in it for the long haul. In many cases, their motivations are deeply personal.
Intel announced in a press release Tuesday that UC Berkeley will join several major tech companies in forming the 5G Innovators Initiative, which is focused on bringing technology and academia together to advance fifth generation networks in the United States.
In collaboration with Intel, Ericsson, Honeywell and General Electric, the campus will work toward transforming network infrastructure in a variety of fields, including telecommunications, healthcare, finance and security. Fifth-generation, or 5G, mobile networks are a new industry standard that moves beyond communications to adapt to the growing needs of various industries, according to Nimish Radia, director of research at the Bay Area branch of Ericsson.
… How did political orientation compare to religious orientation in driving people’s interest in potential dates?
Religion matching is very important. Catholics want to date other Catholics. Jews want to date other Jews, and so on. That effect is actually quite a bit larger than the political effect, which is still reasonably significant.
Interestingly, disinterest in politics has an effect. People who aren’t interested in politics are not that excited about dating people who are really interested in politics. If you know people who are not interested in politics, then this strikes me as completely accurate.
Earlier this week my colleague shared some very cool news: A group of researchers here developed an experimental brain-controlled prosthesis that allows people with paralysis to type on a keyboard just by thinking about moving their hands. The scientists had a long journey reaching this point, and writer Elizabeth Svoboda shared some of it in an online piece.
This week was the HIMSS Annual Conference, the biggest week in Health IT. We’ve been covering some of the major digital health news from the conference right along, but today we’re bringing it all together in one wrap-up of the event.
We started off the conference with our own event (in partnership with the Personal Connected Health Alliance), the Digital and Personal Connected Health event, where we heard from a number of providers working on digital health projects. Read all about that event below, and then read on for a roundup of more news from the show.
Foundation Data Science News
Google.org is giving out $11.5m in grants to fight racial bias in policing with data science.
The Gates Foundation paid five designers to visualize international health data. Yeah, this is click bait in the sense that none of you are likely to make animations any time soon, but it’s substantively worth it. Saving lives!!
Cofounder of the Julia Language, Stefan Karpinski, explains why they decided to create a new language for scientific computing in the first place. Wasn’t Python enough?
Andreas Mueller of scikit-learn and Columbia University talks about using grant funding to improve women’s participation in open source communities.
Risa Lavizzo-Mourey is leaving the Robert Wood Johnson Foundation, where she served as president for fourteen years, to become a university professor at the University of Pennsylvania.
In many ways, the tech world’s AI arms race is really a fight for talent. Skilled engineers are in short supply, and Silicon Valley’s biggest companies are competing to nab the best minds from academia and rival firms. Which is why it makes sense that Apple has announced it’s expanding its offices in Seattle, where much of its AI and machine learning work is done.
Seattle is home not only to the University of Washington and its renowned computer science department, but also the Allen Institute for Artificial Intelligence. Microsoft and Amazon are headquartered nearby, and AI startups are finding a home in the region, too. Last August, Apple even bought a Seattle-based machine learning and artificial intelligence startup named Turi for an estimated $200 million, and the team is said to be moving into Apple’s offices at Two Union Square as part of the expansion.
A new discovery by researchers at the Fred Hutchinson Cancer Research Center in Seattle makes an important step in identifying which specific T cells within the diverse army of a person’s immune system are best suited to fight cancer.
The findings were published February 24 in Science Immunology.
“We found that the cells in each patient’s immune system that will ultimately have a clinical effect are incredibly rare,” said Dr. Aude Chapuis, lead author of the paper and a member of the Clinical Research Division at Fred Hutch. “Knowing what we’ve found, we can now refine the selection of the cells that we will ultimately use for adoptive T cell transfer, so that the cells persist and keep the tumors at bay longer in our patients.”
The University of Washington and the University of British Columbia are partnering on a new collaboration called the Cascadia Urban Analytics Cooperative (CUAC), which will connect researchers, students and public stakeholders working on urban issues across the region.
A $1 million donation from Microsoft will allow the two universities to work together using data science to research, innovate and discover sustainable solutions to civic problems in areas such as homelessness, health and transportation.
Voices of New York, created by Renee Blake and her students, wants to hear the voices of immigrant communities in New York City (NYC), and learn about the people behind them. The students have traveled throughout the boroughs of NYC to seek out neighborhoods and communities where ethnic cultures are thought to be flourishing.
After the CDS Academy Awards last week, we caught up with some of our student winners to find out more about their work.
Scooping up the prize for the project with the greatest social impact was Yiqiu Shen, Zemin Yu, and Xinsheng Zhang’s fascinating research on how data science can be used to combat terrorism.
A major challenge facing us today is how to quickly identify the groups responsible for terrorist attacks so that the relevant individuals can be apprehended. With roughly 3,290 unique terrorist groups around the globe, each with its own set of characteristics and motivations, law enforcement agencies require additional tools to help keep our country safe.
In analytics, nothing matters more than data quality. The practical way to control data quality is to do it at the point where the data is created. Cleaning up data downstream is expensive and not scalable, because data is a byproduct of business processes and operations like marketing, sales, plant operations, and so on. But controlling data quality at the point of creation requires a change in the behaviors of those creating the data and the IT tools they use.
Enter the chief data officer, or CDO. CEOs are increasingly adding the CDO role to their management teams to tackle the big business issues that come with data. Plenty of CDOs want to improve data quality, but motivating this change requires that CDOs create new organizational incentives and processes. Without the ability to do both, their efforts will fall flat.