University of Chicago, Medicine & Biological Sciences
Using health insurance claims data from more than 480,000 people in nearly 130,000 families, researchers at the University of Chicago have created a new classification of common diseases based on how often they occur among genetically-related individuals.
Researchers hope the work, published this week in Nature Genetics, will help physicians make better diagnoses and treat root causes instead of symptoms.
“Understanding genetic similarities between diseases may mean that drugs that are effective for one disease may be effective for another one,” said Andrey Rzhetsky, PhD, the Edna K. Papazian Professor of Medicine and Human Genetics at UChicago who was the paper’s senior author. “And for those diseases with a large environmental component, that means we can perhaps prevent them by changing the environment.”
Another ICML, another ML4MD workshop! This year's was the third Machine Learning for Music Discovery workshop, and, as expected, it featured many awesome talks. I'll briefly summarise who talked about what.
This article is about the smaller and newer companies out there. Let me preface this whole thing by saying that not all startups are the same. Everyone is a special flower that grows or wilts just so. Now, let's generalize. There exists a strange dynamic in the startup world where artificial intelligence is really hot, and yet expectations and reality are way out of whack when it comes to pricing. An enterprise client will typically need some quick solutions to deploy Apache Spark or Hadoop or whatever, just to get at their data in a timely manner. That's before even thinking about machine learning. For startups it is just as challenging.
Proceedings of the National Academy of Sciences; David M. Blei and Padhraic Smyth
Data science has attracted a lot of attention, promising to turn vast amounts of data into useful predictions and insights. In this article, we ask why scientists should care about data science. To answer, we discuss data science from three perspectives: statistical, computational, and human. Although each of the three is a critical component of data science, we argue that the effective combination of all three components is the essence of what data science is about.
On September 1, the Design Lab at UC San Diego will launch a new project to help teach incoming graduate students how to program in the era of big data. The project is funded by the National Science Foundation (NSF) Innovations in Graduate Education (IGE) program, and the Design Lab project is one of 10 new IGE grants awarded a total of $4.8 million to “pilot, test and validate innovative and potentially transformative ways to teach science, technology, engineering and mathematics (STEM).”
The UC San Diego team will receive approximately $500,000 over three years to develop a new data-science teaching approach via “Augmenting, Piloting and Scaling Computational Notebooks to Train New Graduate Researchers in Data-Centric Programming.”
Indiana University Bloomington, School of Informatics and Computing
John Walsh, an associate professor of information and library science at the School of Informatics and Computing, has been named interim co-director of the HathiTrust Research Center.
Walsh will replace Professor of Informatics and Computing Beth Plale, who will be taking a temporary assignment as science advisor for public access for the National Science Foundation. Plale was among the founders of the HTRC, which opened in 2011 to develop cutting-edge software tools and cyberinfrastructure to enable advanced computational access to the growing digital record of human knowledge in the HathiTrust Digital Library (HTDL).
“Studies show that users abandon video sessions if the quality is too low, leading to major losses in ad revenue for content providers,” says MIT Professor Mohammad Alizadeh. “Sites constantly have to be looking for new ways to innovate.”
Along those lines, Alizadeh and his team at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed “Pensieve,” an artificial intelligence (AI) system that uses machine learning to pick different algorithms depending on network conditions. In doing so, it has been shown to deliver a higher-quality streaming experience with less rebuffering than existing systems.
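To make the bitrate-adaptation problem concrete, here is a toy rule-based selector of the kind Pensieve learns to outperform. This is a minimal sketch, not Pensieve's actual learned policy (which the article says is trained with machine learning); the bitrate ladder and the buffer thresholds are invented for illustration.

```python
# Toy adaptive-bitrate (ABR) selector. Real players make this decision
# per video chunk; Pensieve replaces hand-tuned rules like these with a
# policy learned from data.

BITRATES_KBPS = [300, 750, 1200, 2850, 4300]  # hypothetical bitrate ladder

def pick_bitrate(throughput_kbps, buffer_s):
    """Pick the highest bitrate the network can sustain, backing off
    when the playback buffer is nearly empty to avoid rebuffering."""
    safety = 0.5 if buffer_s < 4 else 0.9  # be conservative on low buffer
    budget = throughput_kbps * safety
    candidates = [b for b in BITRATES_KBPS if b <= budget]
    return candidates[-1] if candidates else BITRATES_KBPS[0]

# With a healthy buffer the selector spends most of the estimated
# throughput; with a near-empty buffer it halves the budget.
print(pick_bitrate(5000, buffer_s=10))  # high quality
print(pick_bitrate(5000, buffer_s=2))   # backs off to avoid a stall
```

The point of the sketch is that fixed rules like `safety = 0.5` are tuned for some assumed network; a learned policy can adapt those trade-offs to observed conditions.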
Google is paying Apple billions of dollars to remain the default search engine on iPhones and iPads, Bernstein said in a note to investors on Monday.
The firm believes that Google will pay Apple about $3 billion this year, up from $1 billion just three years ago, and that Google's licensing fees make up a large portion of Apple's services business.
Chris Lattner, one of the key creators behind the Apple programming language Swift, is on the move again. After a short six-month stay at Tesla, which he joined last year from Apple to act as VP of Autopilot Software, Lattner announced on Twitter today that his next stop is Google Brain.
A U.S. federal judge on Monday ruled that Microsoft Corp’s (MSFT.O) LinkedIn unit cannot prevent a startup from accessing public profile data, in a test of how much control a social media site can wield over information its users have deemed to be public.
U.S. District Judge Edward Chen in San Francisco granted a preliminary injunction request brought by hiQ Labs, and ordered LinkedIn to remove within 24 hours any technology preventing hiQ from accessing public profiles.
This is a story very much of our times: development and deployment of better devices/sensors (in this case an iRhythm Zio) leads to collection of much larger data sets than have been available previously. Train state-of-the-art deep learning models on those data sets, and you get a system that outperforms human experts.
Shortly after [Nick] Lucius started with the city, however, he was picked for an inaugural program run by a Chicago data science company called Uptake. Dubbed beyond.uptake, the six-month fellowship included training sessions on methodologies, cybersecurity, machine learning, agile development, data visualization and more. Participants were also paired with three mentors from Uptake’s expert staff: a data scientist, an engineer, and a business or agile coach. They spent the next six months working on a project, checking in regularly with their mentors as they went.
Tom Schenk, Chicago’s chief data officer, said the program helped bridge the gap between the experience and technical knowledge Lucius brought to the work and the esoteric nature of the field.
Alabama Gov. Kay Ivey is investing $1.8 million into new digital tools and data collection efforts to stop a surge of traffic fatalities throughout the state.
Ivey awarded the funding Monday through three grants to the University of Alabama’s Center for Advanced Public Safety (CAPS), a group that has served the state through its collection of traffic data, its traffic survey software and its efforts to develop a state highway safety plan. Ivey said in a statement she hopes the additional funding will lower the state’s traffic fatalities, which jumped nearly 24 percent from 788 in 2015 to 975 in 2016.
For the future of the planet, there are few research subjects more important than the global supplies of food, water and energy. To comprehensively study, understand and inform policy around these complex systems, the next generation of researchers in the physical, social and biological sciences will need fluency with data analysis methods that traverse traditional academic boundaries.
A new interdisciplinary curriculum will train graduate students from geosciences, economics, computer science, public policy and other programs in computational and data science techniques critical for modern science. With a $3 million award from the National Science Foundation, the new research traineeship grant will combine expertise from across UChicago and Argonne National Laboratory in computing, statistics, social science, climate and agriculture.
A frequent question from students who are considering joining our immersive program is “Why does the immersive program teach data science in Python instead of R?” I have a lot of experience writing both Python and R, ranging from small one-off projects and analysis to large efforts involving many developers, and I have formulated some well-balanced opinions on Python vs. R. I do believe that Python is the correct choice for an educational program in Data Science.
Mobile sensors for workplace productivity? Activity trackers for meetings? It’s the future of mobile sensor technology and two University of Minnesota faculty are making it possible, ushering in a new era of productivity and efficiency in the workplace.
Deniz Ones, professor of psychology at the University's College of Liberal Arts, and Mustafa al'Absi, professor of behavioral medicine at the University's Medical School, join a six-university team that will use the $13.8 million in funding to develop and test mPerf, a system of mobile sensors and software that can be used to objectively assess everyday job performance. The project was funded by the Intelligence Advanced Research Projects Activity's (IARPA) Multimodal Objective Sensing to Assess Individuals with Context (MOSAIC) program.
Talk of artificial intelligence is everywhere. People marvel at the capacity of machines to translate any language and master any game. Others condemn the use of secret algorithms to sentence criminal defendants or recoil at the prospect of machines gunning for blue-, pink-, and white-collar jobs. Some worry aloud that artificial intelligence will be humankind's "final invention."
This essay, prepared in connection with UC Davis Law Review's 50th anniversary symposium, explains why AI is suddenly on everyone's mind and provides a roadmap to the major policy questions AI raises. The essay is designed to help policymakers, investors, technologists, scholars, and students understand the contemporary policy environment around AI at least well enough to initiate their own exploration.
In what appears to be the first successful hack of a software program using DNA, researchers say malware they incorporated into a genetic molecule allowed them to take control of a computer used to analyze it.
The biological malware was created by scientists at the University of Washington in Seattle, who call it the first “DNA-based exploit of a computer system.”
To carry out the hack, researchers led by Tadayoshi Kohno (see "Innovators Under 35, 2007") and Luis Ceze encoded malicious software in a short stretch of DNA they purchased online. They then used it to gain "full control" over a computer that tried to process the genetic data after it was read by a DNA sequencing machine.
Computer algorithms trained on the images of thousands of preserved plants have learned to automatically identify species that have been pressed, dried and mounted on herbarium sheets, researchers report.
The work, published in BMC Evolutionary Biology on 11 August, is the first attempt to use deep learning — an artificial-intelligence technique that teaches neural networks using large, complex data sets — to tackle the difficult taxonomic task of identifying species in natural-history collections.
It’s unlikely to be the last attempt, says palaeobotanist Peter Wilf of Pennsylvania State University in University Park. “This kind of work is the future; this is where we’re going in natural history.”
Atlanta, GA Tuesday, October 3. Our seventh annual symposium showcases two days of panels with leading international experts, presentations, and an exhibit of AEC research being conducted across the Georgia Tech community. [$$$]
Erlangen, Germany August 28-September 1. The EuroSciPy meeting is a cross-disciplinary gathering focused on the use and development of the Python language in scientific research. [$$$]
This edited collection explores the othering and discrimination propagated and encountered by individuals online and in social media contexts and cultures. It problematizes and analyses the dichotomy presented between real and virtual worlds (and spaces) by exploring the motivations behind certain offending and othering behaviours, and the impact this has on the targets of online abuse and hate speech. Deadline for interested contributors is August 31.
arXiv, Computer Science > Computation and Language; Justine Zhang, Arthur Spirling, Cristian Danescu-Niculescu-Mizil
Questions play a prominent role in social interactions, performing rhetorical functions that go beyond that of simple informational exchange. The surface form of a question can signal the intention and background of the person asking it, as well as the nature of their relation with the interlocutor. While the informational nature of questions has been extensively examined in the context of question-answering applications, their rhetorical aspects have been largely understudied.
In this work we introduce an unsupervised methodology for extracting surface motifs that recur in questions, and for grouping them according to their latent rhetorical role. By applying this framework to the setting of question sessions in the UK parliament, we show that the resulting typology encodes key aspects of the political discourse—such as the bifurcation in questioning behavior between government and opposition parties—and reveals new insights into the effects of a legislator’s tenure and political career ambitions.
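As a rough illustration of what "surface motifs that recur in questions" might mean, here is a toy sketch that counts recurring question openings. The example questions are invented, and the paper's actual method is unsupervised and far richer (it groups motifs by latent rhetorical role); this only shows the surface-pattern-counting starting point.

```python
from collections import Counter

# Invented parliamentary-style questions for illustration only.
questions = [
    "Will the minister confirm that funding will continue?",
    "Will the minister explain why the scheme was cut?",
    "Does the prime minister agree that the policy failed?",
    "Does the prime minister accept responsibility?",
    "Can the secretary of state tell us the timetable?",
]

def leading_motifs(qs, n=3, min_count=2):
    """Count the first n words of each question and keep openings
    that recur at least min_count times."""
    counts = Counter(tuple(q.lower().rstrip("?").split()[:n]) for q in qs)
    return {" ".join(motif): c for motif, c in counts.items() if c >= min_count}

motifs = leading_motifs(questions)
print(motifs)  # recurring openings like "will the minister"
```

Recurring openings such as "Will the minister..." are exactly the kind of surface form whose rhetorical role (challenge vs. friendly prompt, say) the paper's framework tries to infer without labels.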
A team of researchers has developed a new mechanism that uses machine-learning algorithms to distinguish between genuine and counterfeit versions of the same product.
New York, NY Monday, August 21, Bruno Gonçalves, Moore-Sloan Fellow at NYU’s Center for Data Science, will explain the intuition behind word embeddings and the word2vec family of algorithms. [rsvp required]
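For readers who cannot attend, the core intuition behind word2vec's skip-gram variant is that each word should predict the words near it. A minimal sketch of how those (center, context) training pairs are generated (this is only the data-preparation step, not the embedding training itself):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in word2vec's
    skip-gram model: each word is paired with every word within
    `window` positions of it."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("the cat sat on the mat".split(), window=1)
print(pairs[:3])
```

Training a small network to predict the context word from the center word over millions of such pairs is what yields the dense vectors in which similar words end up close together.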
SentEval is a library for evaluating the quality of sentence embeddings. We assess their generalization power by using them as features on a broad and diverse set of “transfer” tasks (more details here). Our goal is to ease the study and the development of general-purpose fixed-size sentence representations.
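The "use embeddings as features on transfer tasks" protocol can be sketched in a few lines. This is not SentEval's API; it just illustrates the evaluation idea with a linear probe on toy, randomly generated vectors standing in for sentence embeddings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for sentence embeddings: two classes whose vectors
# differ by a mean shift, so a linear classifier can separate them.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 16)),   # class 0 "embeddings"
               rng.normal(2.0, 1.0, (50, 16))])  # class 1 "embeddings"
y = np.array([0] * 50 + [1] * 50)

# Shuffle, then hold out 20 examples, as a transfer task would.
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

# Freeze the embeddings and train only a simple classifier on top:
# held-out accuracy then measures how much task-relevant information
# the fixed representations carry.
probe = LogisticRegression(max_iter=1000).fit(X[:80], y[:80])
accuracy = probe.score(X[80:], y[80:])
print(f"transfer accuracy: {accuracy:.2f}")
```

SentEval applies this pattern across a broad suite of real tasks, so a single embedding method gets a comparable score sheet rather than one cherry-picked result.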
The transient nature of the clusters should not be an issue or concern for data scientists; they should not have to worry about the execution environment or how to interact with it. Genie and the tools around it help make that possible.
We wanted to offer some great tips that will help your data science team be more successful. This has nothing to do with algorithms and models, and everything to do with how data specialists need to operate in a business.