“There’s no better place than Singapore to do a deep tech startup, particularly anything involving cryptography.” So says Brijesh Pande, founder and managing partner of the Tembusu ICT Fund, a Singapore-based software-focused venture capital fund. Admittedly, he has a vested interest in enticing entrepreneurs to come to the island nation, but he—along with two founders of companies in his portfolio, Lawrence Hughes of Sixscape and Ramond Looi of Vi Dimensions—makes a solid argument.
Here in Singapore, Pande says, “We have no requirement for a security back door. The fact that the NSA [National Security Agency] requires U.S. companies to provide a back door makes technology developed in the U.S. less trusted around the world.”
“The natural marriage of science and sport is only strengthening,” Goldman analyst Christopher Wolf wrote in a client note. “The insatiable thirst for a competitive edge, coupled with new technologies and advanced computing power (e.g., IBM Watson) is driving the evolution of sports from the analog to the digital era.”
The Association for Computing Machinery (ACM) just concluded a celebration of 50 years of the ACM A.M. Turing Award (commonly known as the “Nobel Prize of computing”) with a two-day conference in San Francisco. The conference brought together some of the brightest minds in computing to explore how computing has evolved and where the field is headed. Big data was the focus of a number of panels and discussions at the conference. The following is a discussion with Daphne Koller, Chief Computing Officer, Calico Labs; Adjunct Professor of Computer Science, Stanford University; and winner of the 2007 ACM-Infosys Foundation Award.
University Data Science News
UC Irvine rescinded admission to 499 incoming freshmen, explaining that they either had senioritis (bad senior year grades) or failed to submit their final transcripts. Students are appealing in droves, many claiming to have sent the reportedly missing transcripts. The school admits it is facing an overenrollment crunch and used “a harder line on the terms and conditions this year” to address it. This is what happens when the pressure to get a degree increases but the funding for expanding the physical plant and hiring tenure-track professors stagnates or decreases.
Another sign of creeping credentialization? Science ran a story this week that could have been titled “Your PhD is no good here.” A PhD is apparently not enough qualification to be a science writer or to work in a tech transfer office. But there is good news! You can keep the university system afloat and spend more years of your life infantilized in the classroom by getting another degree.
David Sontag and colleagues published a breakthrough paper in Nature that proposes a two-step, human-in-the-loop noisy-OR Bayesian gate model for emergency room diagnostics. The procedure should work on other medical data, too, but it was built and tested on ER data. Deriving robust, valid insight from medical records has been a hard problem. Sontag and colleagues now recommend “a two-step process…for the construction of the knowledge graph, in which a clinician reviews and rejects some of the edges suggested by the model. Using the results of the clinical evaluation, we can infer that if a filtering step were added to the pipeline, to achieve perfect precision with a corresponding recall of 60%, physicians would have to discard fewer than 2 out of 10 suggested edges. If this step were added to the pipeline, the resulting graph would have perfect precision and recall that would far exceed that of the Google health knowledge graph, making it an attractive candidate for real life applications.” This is big progress.
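The noisy-OR gate at the heart of such models is refreshingly simple: each disease that is present gets an independent chance to “turn on” a symptom, plus a small leak term for unexplained causes. A minimal sketch of that computation (disease names and probabilities are illustrative, not taken from the paper):

```python
# Noisy-OR gate:
#   P(symptom | diseases) = 1 - (1 - leak) * prod_i (1 - p_i)^d_i
# where d_i = 1 if disease i is present and p_i is the probability
# that disease i alone would produce the symptom.

def noisy_or(disease_present, activation_probs, leak=0.01):
    """Probability a symptom fires given binary disease indicators."""
    prob_all_fail = 1.0 - leak
    for d, p in zip(disease_present, activation_probs):
        if d:
            prob_all_fail *= 1.0 - p
    return 1.0 - prob_all_fail

# Fever given flu present (activation 0.9) and pneumonia absent (0.8):
p_fever = noisy_or([1, 0], [0.9, 0.8], leak=0.01)
```

With no diseases present, the symptom probability collapses to just the leak term, which is what makes the model's edges interpretable enough for a clinician to review one by one.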
Keep in mind that this result comes a decade or so after hospitals started digitizing their records. The time it has taken to get from the idea that hospitals could and should digitize health records to now, when we’re seeing robust demonstrations of good data science that keep human doctors in the loop, is an instructive case study in what applied data science will look like. (See this new pre-print on imputing electronic health record data.) It will take time. There is no one-size-fits-most ‘answer me this’ button that can simply be installed on a data set. It will take human experts: computer scientists, statisticians, domain experts, and (increasingly, as more people adopt the job title) data scientists.
Fei-Fei Li published ImageNet in 2009 on a poster at a Miami Beach conference venue. (Note to my home discipline: Why don’t we have conferences in desirable destinations? Are there no unionized hotels near gorgeous locales?) Quartz’s David Gershgorn explains how that poster became the key dataset in image recognition and how Li rocketed into simultaneous positions at Stanford and Google.
DataCamp, one of a growing number of for-profit data science training outfits, will compete with universities for students. DataCamp just raised $4m from investors and plans to offer 80 short courses. It’s a very different model from a master’s degree – short, affordable, and uncertified – and it will be interesting to watch.
Ohio State University has a new Translational Data Analytics Institute that is inviting input from industry to develop curriculum.
Nasdaq released four new data sets into its Nasdaq Analytics Hub, which it is already selling to buy-side analysts. One of them scores data from SEC filings, two others do something indiscernible with “AI and algorithmic multi-factor ensemble voting”, and a fourth agglomerates the trades of online retail investors. The first one makes sense. I withhold judgment on the last three.
Grab your popcorn and get ready for a statistical blockbuster. The grumblings about p-hacking have erupted into a full-blown public confrontation about the role and proper level of p-values for gold-standard science.
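The core worry behind p-hacking is easy to demonstrate: run enough tests on pure noise and p < 0.05 turns up by chance alone. A quick simulation (sample sizes and the |t| > 2.0 threshold are illustrative choices, roughly matching p < 0.05 for these group sizes):

```python
import random
import statistics

def two_sample_t(a, b):
    """Welch-style t statistic for two independent samples."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)

random.seed(0)
trials = 200
false_positives = 0
for _ in range(trials):
    # Both groups are drawn from the SAME distribution,
    # so any "significant" difference is pure noise.
    a = [random.gauss(0, 1) for _ in range(30)]
    b = [random.gauss(0, 1) for _ in range(30)]
    if abs(two_sample_t(a, b)) > 2.0:
        false_positives += 1

rate = false_positives / trials  # expect roughly 5% by chance
```

Run 200 experiments and publish only the handful that cross the threshold, and you have manufactured an “effect” out of nothing – which is precisely what the proposed tightening of the gold-standard p-value threshold is meant to discourage.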
NVIDIA CEO Jensen Huang lit up a meetup of elite deep learning researchers at CVPR by unveiling the NVIDIA Tesla V100, the company’s latest GPU, based on its Volta architecture, and presenting one to each of the 15 participants in the NVIDIA AI Labs program.
Movidius and Intel have put deep-learning on a stick with a tiny $79 USB device that makes bringing AI to hardware a snap.
In April of last year, Movidius showed off the first iteration of this device, which they then called the Fathom Neural Compute Stick. The company wasn’t able to get the product out as quickly as they had hoped because they were a little busy getting acquired by Intel.
What we do know so far about the second-gen HPU is that it will incorporate an accelerator for deep neural networks (DNNs). The deep learning accelerator is designed to work offline and use the HoloLens’ battery, which means it should be quite efficient, while still providing significant benefits to Microsoft’s machine learning code.
Once we’ve identified that reproducibility is a big problem, the question becomes: How do we tackle it? Part of the answer has to do with changing incentives for researchers. But there are plenty of things we in the research community can do right now in the course of our scientific work.
It might come as a surprise that archaeologists are at the forefront of finding ways to improve the situation. Our recent paper in Nature demonstrates a concrete three-pronged approach to improving the reproducibility of scientific findings.
Just as everyone gets settled in, Jay Emerson strides forward, directing attention to the middle of the classroom.

He is the director of graduate studies in the Department of Statistics and Data Science at Yale University, and he’s teaching a workshop on the aircraft carrier USS George H.W. Bush (CVN 77).

Using open discussions and “hands-on” practical exercises, the “Life in the Sea of Statistics and Data Science” workshop hopes to address real-life scenarios and give sailors the opportunity to learn something new.
Amazon is sending all kinds of signals that it’s interested in the health-care industry.
CNBC reported in May that the company was on the hunt for a general manager to lead a new pharmacy unit. Since then, it has brought on a slew of health experts to bolster its cloud offering, Amazon Web Services, and rallied the industry to build applications for its Alexa voice technology. Amazon has also been selling medical supplies online for some time.
That interest is making some players in the health-care industry nervous.
We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably to or better than state-of-the-art approaches while being much simpler to implement and tune. PPO has become the default reinforcement learning algorithm at OpenAI because of its ease of use and good performance.
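Much of PPO’s simplicity comes down to its clipped surrogate objective: the ratio between the new and old policy probabilities is clipped to [1 − ε, 1 + ε], so no single update can move the policy too far. A minimal pure-Python sketch of that objective (a pedagogical illustration, not OpenAI’s released implementation):

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean over min(r * A, clip(r, 1-eps, 1+eps) * A),
    where r = pi_new(a|s) / pi_old(a|s) per sampled action."""
    terms = []
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)
        clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Taking the min makes the objective pessimistic: large policy
        # moves can only hurt the objective, never help it, which keeps
        # gradient ascent close to the old policy.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)

# A probability ratio of 2.0 with a positive advantage is clipped to 1.2:
obj = ppo_clip_objective([math.log(2.0)], [0.0], [1.0])
```

In practice this objective is maximized with a stochastic gradient method over minibatches of rollout data; the clipping replaces the hard KL constraint of TRPO with something you can drop into a standard optimizer.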
I’d like to offer some advice on how to write better and more truthfully when you write articles about artificial intelligence. The reason I’m writing this is that there are a whole lot of very bad articles on AI (news articles and public interest articles) being published in newspapers and magazines. Some of them are utter nonsense, bordering on misinformation; others capture the gist of what goes on but are riddled with misunderstandings. No, I will not provide examples, but anyone working in AI and following the news can provide plenty. There are of course also many good articles about AI, but the good/bad ratio could certainly be improved.
arXiv, Computer Science > Learning; Patrice Y. Simard, Saleema Amershi, David M. Chickering, Alicia Edelman Pelton, Soroush Ghorashi, Christopher Meek, Gonzalo Ramos, Jina Suh, Johan Verwey, Mo Wang, John Wernsing
The current processes for building machine learning systems require practitioners with deep knowledge of machine learning. This significantly limits the number of machine learning systems that can be created and has led to a mismatch between the demand for machine learning systems and the ability for organizations to build them. We believe that in order to meet this growing demand for machine learning systems we must significantly increase the number of individuals that can teach machines. We postulate that we can achieve this goal by making the process of teaching machines easy, fast and above all, universally accessible.
While machine learning focuses on creating new algorithms and improving the accuracy of learners, the machine teaching discipline focuses on the efficacy of the teachers. Machine teaching as a discipline is a paradigm shift that follows and extends principles of software engineering and programming languages. We put a strong emphasis on the teacher and the teacher’s interaction with data, as well as crucial components such as techniques and design principles of interaction and visualization.
“MapD Vega is based on the open-source Vega specification developed by Jeffrey Heer and his group at the University of Washington. We’ve adapted the original specification to the MapD platform so you can use the power of SQL to investigate your data and quickly render it as a custom visualization.”
“TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high quality distant supervision for answering the questions.”
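The “triple” structure is easy to picture. A hypothetical record sketched in Python (field names and contents are illustrative, not TriviaQA’s actual schema) shows how the evidence documents supply distant supervision:

```python
# One hypothetical question-answer-evidence triple. Illustrative only:
# these field names and strings are NOT TriviaQA's real schema.
triple = {
    "question": "Which planet is known as the Red Planet?",
    "answer": "Mars",
    "evidence": [
        "Mars is often called the Red Planet because of its color.",
        "Iron oxide on the surface gives Mars its reddish appearance.",
    ],
}

# Distant supervision: treat an evidence document as a positive training
# example whenever it contains the answer string, with no human labeling
# of the exact answer span.
positives = [doc for doc in triple["evidence"] if triple["answer"] in doc]
```

That weak matching is what lets the dataset pair 95K questions with roughly six evidence documents each without annotating answer spans by hand.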