Data Science newsletter – August 15, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for August 15, 2018

GROUP CURATION: N/A

 
 
Data Science News



Two Startups Use Processing in Flash Memory for AI at the Edge

IEEE Spectrum, Samuel K. Moore


from

Irvine, Calif.–based Syntiant thinks it can use embedded flash memory to greatly reduce the amount of power needed to perform deep-learning computations. Austin, Texas–based Mythic thinks it can use embedded flash memory to greatly reduce the amount of power needed to perform deep-learning computations. They both might be right.

A growing crowd of companies are hoping to deliver chips that accelerate otherwise onerous deep learning applications, and to some degree they all have similarities because “these are solutions that are created by the shape of the problem,” explains Mythic founder and CTO Dave Fick.

When executed in a CPU, that problem is shaped like a traffic jam of data. A neural network is made up of connections and “weights” that denote how strong those connections are, and having to move those weights around so they can be represented digitally in the right place and time is the major energy expenditure in doing deep learning today.
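
To make that traffic jam concrete, here is a quick back-of-the-envelope sketch. The layer size and 8-bit values are assumptions picked for illustration, not figures from Syntiant or Mythic; the point is simply that the weight matrix dwarfs the activations, which is why computing where the weights already sit (in flash) saves so much data movement.

    # Back-of-the-envelope sketch (illustrative numbers only, not from the
    # article): for one fully connected layer, the weight matrix is far
    # larger than the activations, so fetching weights from off-chip memory
    # dominates the data traffic of an inference pass.
    n_in, n_out = 4096, 4096      # hypothetical layer dimensions
    bytes_per_value = 1           # assume 8-bit quantized values

    weight_bytes = n_in * n_out * bytes_per_value
    activation_bytes = (n_in + n_out) * bytes_per_value

    print(f"weights moved:     {weight_bytes / 1e6:.1f} MB")
    print(f"activations moved: {activation_bytes / 1e3:.1f} KB")
    print(f"weights generate ~{weight_bytes // activation_bytes}x more traffic")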


Vanderbilt launches data science institute

Nashville Post


from

Vanderbilt University has established a Data Science Institute and tapped Andreas Berlind and Douglas Schmidt to co-direct it for three-year terms.

The new trans-institutional institute, the creation of which was one of several recommendations outlined in a May report issued by VU’s Data Science Visions Working Group, will facilitate and promote data-driven research across Vanderbilt’s schools and colleges, according to a release. An initial goal of the institute will be to design and implement a professional master’s-level degree program in data science and to add courses in data science for undergraduate students.


Rewarding research transparency

psyArXiv; Morton Ann Gernsbacher and Gernsbacher Lab


from

Cognitive scientists are increasingly enthusiastic about research transparency. However, their enthusiasm could be tempered if the research reward system fails to acknowledge and compensate these efforts. This article suggests ways to reward greater research transparency during academic job searches, academic promotion and tenure evaluations, and society and national award selections.


AI for cybersecurity is a hot new thing—and a dangerous gamble

MIT Technology Review, Martin Giles


from

Machine learning and artificial intelligence can help guard against cyberattacks, but hackers can foil security algorithms by targeting the data they train on and the warning flags they look for.


Tim Hwang’s FiscalNote is revolutionizing Washington lobbying with big data

MIT Technology Review, Andrew Zaleski


from

Sue Zoldak is a public relations expert with a fierce competitive streak. Her surname in Slovak, as she likes to point out, means “mercenary.” Her firm, the Zoldak Agency, uses targeted advertising and grassroots campaigning to help clients spur voters to press elected officials into voting yes or no on specific bills. While not strictly a lobbyist—she doesn’t communicate directly with lawmakers—Zoldak fits squarely into the influence-peddling milieu of Washington, DC, with 15 years’ experience on K Street, where lobbying firms are traditionally headquartered. Put simply, she’s a go-to person for companies and organizations determined to shape public policy.

Lately Zoldak’s been getting help from a new source—a data intelligence platform called FiscalNote, founded by a 26-year-old political whiz, Tim Hwang. For a current client in the health-care industry—which Zoldak declines to name—she’s tracking states that want to amend their “certificate of need” laws. These obscure laws, which were mandated by Congress in 1974, require health-care companies to prove to state regulators that a community needs their new hospital, nursing home, or rehab clinic. The initial idea was that a local market could support only so many health facilities. If there were too many, and one ended up with empty beds, it would raise prices to cover its fixed costs, overcharging patients.


A New Kind of Clinical Trial

Medium, doc.ai, Jeremy Howard


from

Radiology is just the tip of the iceberg — the opportunity is just as great in every part of diagnostics and treatment planning. The problem is that we don’t have the access to the data to allow us to take advantage of that opportunity. The data is spread over many places — your pharmacy has your medication records, your lab test provider has your labs records, your hospital records may be spread across multiple institutions and departments, your personal health data (such as from your wearables, meal logging, etc.) is on your phone, and so forth. And all this data is in different, incompatible formats, and much of it is not available to researchers due to regulation and privacy concerns.

Doc.ai has come up with a really clever solution: create data import pipelines for each of the major healthcare data sources (e.g. for the major pharmacies, labs providers, hospital networks, etc.) that can run on your phone. Allow you to run an app on your phone that can grab all of your medical data and consolidate it in a single place, in a single format. You can use this app to get “at a glance” information about your health, based on all this information. Then, if a researcher thinks that they might be able to use your data to help diagnose and treat other patients, they can ask your permission to share it with them, and compensate you for it.
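
A minimal sketch of that consolidation step, with hypothetical importer and field names (doc.ai’s actual pipelines and schema are not public in this excerpt): each source-specific importer maps its own record shape onto one shared format, and the app merges everything into a single timeline.

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class HealthRecord:
        source: str         # "pharmacy", "lab", "hospital", "wearable", ...
        kind: str           # "medication", "lab_result", "steps", ...
        value: str
        recorded_on: date

    def import_pharmacy(row: dict) -> HealthRecord:
        # Hypothetical pharmacy export format.
        return HealthRecord("pharmacy", "medication", row["drug_name"],
                            date.fromisoformat(row["fill_date"]))

    def import_lab(row: dict) -> HealthRecord:
        # Hypothetical lab-provider export format.
        return HealthRecord("lab", "lab_result",
                            f'{row["test"]}: {row["result"]}',
                            date.fromisoformat(row["collected"]))

    records = [
        import_pharmacy({"drug_name": "cetirizine", "fill_date": "2018-07-02"}),
        import_lab({"test": "IgE", "result": "210 kU/L", "collected": "2018-07-15"}),
    ]
    records.sort(key=lambda r: r.recorded_on)   # one place, one format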


How Freely Should Scientists Share Their Data?

Scientific American Blog Network, Observations, Daniel Barron


from

I was glued to my Twitter feed. This was the first time that I’d seen well-established academics, from behind a keyboard or smartphone’s digital courage, publicly shame their colleagues. This was the first time that I’d seen the principles of Open Science marshalled to harm someone’s career.

This made me question the ideals of Open Science: a highly-productive lab writes a grant to fund a series of studies and the development of new tools. They spend years collecting data and building the tools for these proposed studies. Then, they finish a portion of the project and begin to publish results. Should they be required to release their data to the community? If so, when? Who owns that data? And what business do journals have in enforcing data sharing?


First UI med school class builds bridges between health, tech, engineering

The News-Gazette (Champaign, IL), Julie Wurth


from

The path from college to medical school was a 10-year journey for Kenny Leung. For Kitan Akinosho, it’s been five years. For Elizabeth Woodburn, a matter of months.

The three students — a software engineer, data scientist and recent bioengineering graduate — are all part of the inaugural class of the Carle Illinois College of Medicine, drawn by its focus on engineering, an innovative curriculum and, in part, four years of free tuition.

The class of 32 is about to start its sixth week of classes at the medical school, the first to combine engineering and medicine from day one. That was the chief draw for all three students.


Google plans to spend $600 million expanding its SC data center

Post and Courier (Charleston, SC), Thad Moore


from

Search engine giant Google Inc. plans to invest $600 million to expand its massive data center outside Charleston, according to records filed with Berkeley County.


Hacking The Electric Grid Is Damned Hard

FiveThirtyEight, Maggie Koerth-Baker


from

The nightmare is easy enough to imagine. Nefarious baddies sit in a dark room, illuminated by the green glow of a computer screen. Meanwhile, technicians watch in horror from somewhere in the Midwest as they lose control of their electrical systems. And, suddenly, hundreds of thousands, even millions of Americans are plunged into darkness.

That scene was evoked in recent weeks as federal security experts at the Department of Homeland Security warned that state-sponsored hackers have targeted more than American elections — they’re after the electric grid, too. They’ve gotten “to the point where they could have thrown switches,” a DHS official told The Wall Street Journal. Both DHS and the FBI have linked these attacks to Russia — which was already pinned as the culprit in two attacks that shut down power to hundreds of thousands of people in Ukraine two Decembers in a row, in 2015 and 2016. It’s all very urgent — a high-risk crisis that must be solved immediately.

But, surprisingly, some electrical system experts are thinking about it in a different way. Cyberattacks on the grid are a real risk, they told me. But the worst-case scenarios we’re imagining aren’t that likely.


Elucid Labs uses AI in imaging device to detect skin cancer

University of Waterloo, Waterloo Engineering


from

A medical startup has developed cutting-edge imaging technology that has the potential to detect skin cancer without leaving a mark.

Elucid Labs uses artificial intelligence (AI) technology in a small noninvasive imaging device to analyze moles and other skin lesions for signs of skin cancer, helping dermatologists make faster, more accurate decisions and reducing the number of unnecessary biopsies.

The company, named one of the 20 most innovative technology companies for 2017 by the Canadian Innovation Exchange, was cofounded by Alexander Wong, a Waterloo systems design engineering professor, and Iman Khodadad and Farnoud Kazemzadeh, who both earned their engineering doctorates at Waterloo.

“Our vision is to eventually replace all biopsies whether they’re skin related or internal, having anything to do with cancer or any other disease,” says Kazemzadeh.


JAIC: Pentagon debuts artificial intelligence hub

Bulletin of the Atomic Scientists, Jade Leung, Sophie-Charlotte Fischer


from

In October 2016, the newly formed Defense Innovation Board released its first set of recommendations. (The board, an advisory body to senior leadership in the US Defense Department, contains representatives from the private sector, academia, and nonprofits.) One recommendation that stood out was the establishment of “a centralized, focused, well-resourced organization” within the Defense Department “to propel applied research in artificial intelligence (AI) and machine learning.”

Less than two years later, the Pentagon is already transforming this idea into reality. On June 27, Deputy Defense Secretary Patrick Shanahan issued a memorandum that formally established the Defense Department’s new Joint Artificial Intelligence Center (JAIC). According to the memo, JAIC’s overarching aim is to accelerate the delivery of AI-enabled capabilities, scale the impact of AI tools, and synchronize the department’s AI efforts. To this end, JAIC will guide the execution of so-called National Mission Initiatives—large-scale AI projects “designed to address groups of urgent, related challenges.” Moreover, the National Mission Initiatives—as well as the Defense Department’s adoption of cloud technologies—will be leveraged to enable rapid delivery of AI-enabled capabilities across the department. JAIC will also serve as a platform to improve collaboration on AI-related projects with internal as well as external parties, including private companies and academics.

It is notable that JAIC’s focus will also include ethics, humanitarian considerations, and both short- and long-term AI safety. These issues—according to Brendan McCord, head of machine learning at the Pentagon entity known as Defense Innovation Unit Experimental—will be reflected in the establishment of AI defense principles that will be developed with input from multiple stakeholders. This specific dimension of JAIC, though its parameters remain abstract for now, could play an important role in realizing the Pentagon’s AI ambitions.

Developing, institutionalizing, and communicating AI defense principles transparently could not only reduce the operational risks of AI-enabled systems but also increase


Researchers Comb Through Millions of Genetic Variants to Find Disease Risk

PBS, NOVA Next, Aparna Nathan


from

Your disease risk is hiding in your DNA, and doctors may soon be able to find it.

A study published today in Nature Genetics shows that it is possible to predict whether a person is at a high risk of developing a disease based on millions of changes across their genome. This genetic risk, quantified by a “polygenic score,” can be used by doctors to decide whether the patient needs preventative measures.

“It’s good to find people at high risk, but it’s even better to be able to do something about that risk,” said Sekar Kathiresan, director of the Center for Genomic Medicine at Massachusetts General Hospital, director of the Cardiovascular Disease Initiative at the Broad Institute, and professor of medicine at Harvard Medical School.
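
At its core, a polygenic score is a weighted sum: each variant’s allele count (0, 1, or 2 copies of the risk allele) multiplied by an effect-size weight estimated from earlier genome-wide association studies, summed over all variants. A toy calculation with made-up numbers (the scores in the paper use millions of variants):

    # Toy numbers, not values from the study: effect-size weights for a
    # handful of variants and one person's allele counts (0, 1 or 2 copies
    # of the risk allele at each variant).
    effect_sizes  = [0.12, -0.05, 0.30, 0.08, 0.02]
    allele_counts = [1,     2,    0,    2,    1]

    polygenic_score = sum(w * c for w, c in zip(effect_sizes, allele_counts))
    print(f"polygenic score: {polygenic_score:.2f}")   # 0.20

    # People are then ranked by score; those in the top few percent of the
    # distribution are flagged as being at high genetic risk.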


Ambitious ‘Human Cell Atlas’ Aims To Catalog Every Type Of Cell In The Body

NPR, KQED Future of You, Karen Weintraub


from

For the last two years [Aviv] Regev, a professor of biology at MIT, has been co-leading a massive international effort to get that answer. Called the Human Cell Atlas Consortium, the effort aims to account for and better understand every cell type and sub-type, and how they interact.

The Human Cell Atlas has received less attention than the $3 billion Human Genome Project, which was completed in 2003 after 15 years of work. But it’s equally ambitious.

Regev’s co-chair, Sarah Teichmann, uses Legos as a metaphor for their quest.

“It’s kind of like we’re trying to find out what are all the different colors of Lego building blocks that we have in our bodies,” Teichmann says. “We’re trying to find out how those building blocks — how those Lego parts — fit together in three dimensions within each tissue.”


Company Data Science News

Recently, the FBI asked Google to cough up the device IDs, plus the names, addresses, and Google activity, for all devices and owners who happened to be in the vicinity of at least two in a series of nine robberies in Portland, Maine. (Portland is a great place to visit. Don’t let these thefts deter you.) This comes at a time when Google is under fire for tracking device locations even after users thought they opted out of location tracking. In the Portland case, Google never produced the data. Local police were able to identify and apprehend the 38-year-old male suspect. If they hadn’t gotten their guy, would they still be pressuring Google? And would Google be forced to comply? Note: Wired has an explainer on how to turn off Google tracking for good.



Google and Amazon have added features to their voice assistants to reward their human overlords for saying please. This came in response to parent groups who were afraid the voice-activated assistants were reinforcing rude behavior. But there are still questions about whether saying “please” or “thank you” is actually sufficient emotional recognition and sensitivity training.



Ford Motor Company is preparing to use artificial intelligence to predict drivers’ moods – especially anger and inattentiveness – to prevent accidents and road rage. The idea of predicting moods from facial recognition is being taken up by a range of companies across industry. I’m a little concerned that the technology will see some races or genders as more angry or inattentive than others. I seriously hope the designers are testing specifically for equal, highly accurate prediction across all races, genders, and age groups. And what about those of us who have resting grumpy face?
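
Disaggregated evaluation is the boring but necessary version of that hope: compute accuracy separately for every group instead of reporting one overall number. A hedged sketch, with made-up labels and group names (nothing here is from Ford’s pipeline):

    from collections import defaultdict

    # Break accuracy out by demographic group so a single overall number
    # can't hide a model that misreads "angry" far more often for some
    # groups than others.
    def accuracy_by_group(y_true, y_pred, groups):
        correct, total = defaultdict(int), defaultdict(int)
        for truth, pred, group in zip(y_true, y_pred, groups):
            total[group] += 1
            correct[group] += int(truth == pred)
        return {g: correct[g] / total[g] for g in total}

    # Toy example with fabricated labels.
    y_true = ["angry", "calm", "calm", "angry", "calm", "angry"]
    y_pred = ["angry", "calm", "angry", "angry", "calm", "calm"]
    groups = ["A",     "A",    "B",    "B",     "B",    "A"]
    print(accuracy_by_group(y_true, y_pred, groups))   # both groups ~0.67 here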



Martin Giles of MIT Technology Review went to Black Hat, a big cybersecurity conference in Vegas. He came away worried that the cybersecurity industry may not understand supervised or unsupervised learning well enough to use it to find badly behaved humans. He wasn’t as blunt as I am, but automated decision makers are vulnerable when humans actively try to mess them up.



Doc.ai is a start-up that is competing with some serious heavyweights (Apple, IBM Watson) in the race to unite medical records and develop precision medicine models. Doc.ai works in two steps. First, individual users download the app, which has importers that reach out and capture data from their doctors, pharmacies, diagnostic labs, ongoing physical activity, and self-reported health issues (e.g. allergies). The second (optional) step is when users elect to participate in research studies. The first such study aims to see if random forest models can predict when people’s allergies will flare up.
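
For the curious, here is roughly what that modeling step could look like with scikit-learn, using simulated data; the features (pollen count, humidity, prior allergy history), the sample size, and the simulated labels are assumptions for illustration, not details from doc.ai’s study.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Simulate a small, hypothetical dataset of daily observations.
    rng = np.random.default_rng(0)
    n = 200
    X = np.column_stack([
        rng.uniform(0, 200, n),     # daily pollen count
        rng.uniform(20, 90, n),     # relative humidity (%)
        rng.integers(0, 2, n),      # self-reported prior allergy (0/1)
    ])
    # Simulated label: flare-ups happen on high-pollen days for people
    # with a prior allergy.
    y = ((X[:, 0] > 120) & (X[:, 2] == 1)).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")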
Elucid Labs is another data science start-up in the medical space, offering a ‘dermatologist in a box’. Basically, they’ve gotten fairly accurate predictive results by using image recognition models to identify which moles are cancerous.



Axon, the US’s largest manufacturer of body cameras, said in an earnings call that facial recognition technology is not yet accurate enough for police work. The company does not have a team actively developing the feature and does not recommend using its cameras for facial recognition. The ACLU is pissed that the City of Orlando is planning to use the cameras in conjunction with Amazon’s Rekognition technology for facial recognition in policing anyway. Oh, and body cams are very hackable.

Good news: there is less money going into ad tech. Bad news: it’s because Google and Facebook have created an oligopoly dominant enough to scare away all the venture capitalists.

In unrelated news, Google is expanding a giant data center in South Carolina by infusing an additional $600m.

Consumer Reports has developed an easier-to-understand set of standards that it is applying to peer-to-peer payment platforms like Apple Pay, Zelle, and Venmo.



Getty Images launched a new feature that will suggest which images should accompany news stories. I can only imagine the blue, inner glowing soft-edged shapes that it would suggest for this data science article. (Daniela, where is that Twitter feed of overused data science images?)



Exscientia, a startup using machine learning in the drug development process, reveals one population that stands to benefit from the widespread adoption of AI: research animals. The more accurate the simulated models get, the fewer animals will have to suffer through the testing procedure.


Largest body camera supplier in U.S. says facial recognition isn’t good enough yet for police work

Orlando Weekly, Monivette Cordeiro


from

Orlando city officials aren’t losing much sleep over their decision to experiment with Amazon’s Rekognition software as a real-time mass surveillance program. But Axon, the largest supplier of police body cameras in the U.S., says facial recognition technology is not good enough yet for law enforcement use.

In a quarterly earnings call Tuesday, Axon CEO Rick Smith told investors that the company does not have a timeline for launching facial recognition in its products because the software available does not meet necessary “accuracy thresholds.”

“We do not have a team actively developing it,” Smith said. “This is a technology we don’t believe that – sitting here today – the accuracy thresholds are right where they need to be to [make] operational decisions off of facial recognition.”

 
Deadlines



All Hands 2018 – West Big Data Innovation Hub

Boise, ID, September 20. Deadline for posters and demos is September 1. [$$]

2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics – NAACL-HLT 2019

Minneapolis, MN, June 2-7, 2019. Deadline for paper submissions is December 10.
 
Tools & Resources



Making sense of topic models

Medium, Pew Research Center, Decoded blog, Patrick van Kessel


from

In my first post about topic models, I discussed what topic models are, how they work and what their output looks like. The example I used trained a topic model on open-ended responses to a survey question about what makes life feel fulfilling and examined three topics in particular: Topics 4, 5 and 88.

By looking at the top words for each topic, we can clearly see that Topic 4 is related in some way to hobbies; Topic 5 seems to pertain to charity and giving back to society; and Topic 88 has something to do with spending time pursuing and contributing to more serious things, like work. However, we also notice some quirks, which raise an important question: How do we figure out what these topics are, exactly? As it turns out, that can be a very difficult question to answer.
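
For readers who want to try the general workflow themselves, here is a minimal scikit-learn sketch with a toy corpus (Pew’s actual pipeline, corpus, and topic numbers are not reproduced here): fit a topic model and print each topic’s top words, which is exactly the raw material you stare at when deciding what a topic means.

    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    # Toy open-ended responses standing in for the survey data.
    docs = [
        "spending time with family and friends",
        "volunteering and giving back to my community",
        "my work gives me a sense of purpose",
        "hobbies like gardening and reading keep me busy",
        "donating to charity and helping others",
        "a fulfilling career and meaningful work",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)

    # Print each topic's top five words by weight.
    words = vectorizer.get_feature_names_out()
    for i, weights in enumerate(lda.components_):
        top = [words[j] for j in weights.argsort()[::-1][:5]]
        print(f"Topic {i}: {', '.join(top)}")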

 
Careers


Full-time positions outside academia

Research Scientist, Experimentation



Facebook; Menlo Park, CA

Content Creator



Gladstone Institutes; San Francisco, CA

Full-time, non-tenured academic positions

Research Assistant – Technical



Baycrest Health Sciences, Rotman Research Institute; Toronto, ON, Canada

Project Managers (2)



University of Oxford, Future of Humanity Institute; Oxford, England
