NYU Data Science Newsletter – August 6, 2015

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for August 6, 2015


 
Data Science News



Dat Goes Beta

U.S. Open Data


from July 29, 2015

After a long year of alpha testing, which started on August 19th, we are excited to announce the launch of a new, even-more-stable phase of dat. Beta starts now.

Let us know what you’re working on and how dat might work for you. We’ll even come to your lab to help you get up and running, implement features, and fix bugs in real time. That’s first-class service!

 

Public Health in the Precision-Medicine Era

New England Journal of Medicine


from August 06, 2015

That clinical medicine has contributed enormously to our ability to treat and cure sick people is beyond contention. But whether and to what extent medical care has transformed morbidity and mortality patterns at a population level and what contribution, if any, it has made to the well-being and life expectancy of the least-advantaged people have been matters of contention for more than a century. This debate has taken on renewed importance as the scientific leadership at the National Institutes of Health (NIH), National Academy of Medicine, and U.S. universities have taken up the challenge of personalized or precision medicine. It is a challenge given all the more salience by President Barack Obama’s announcement in his State of the Union address that his administration would seek to fund a major new initiative. Responding to the President’s words, Harold Varmus, director of the National Cancer Institute, and Francis Collins, director of the NIH, have written that “What is needed now is a broad research program to build the evidence base needed to guide clinical practice.”1

The enthusiasm for this initiative derives from the assumption that precision medicine will contribute to clinical practice and thereby advance the health of the public. We suggest, however, that this enthusiasm is premature.

 

Could venture capitalists use data analytics to improve success?

Fortune, Tech


from August 05, 2015

Ask anyone in venture capital about their business model and they will probably tell you it’s all about the “hits.” In the VC world, a hit is a startup that makes it big, returning many multiples of a venture fund’s initial investment. Hits are great for everyone—investors, entrepreneurs, job seekers—but the problem is they don’t happen very often. William Hambrecht, a legendary venture capitalist who made early investments in Apple, Genentech, and Google, says the odds of a big hit are about one in 10. “A few others will work out, and you’re going to lose in a lot,” he says.

But what if venture capital could boost its odds to 50-50, or even two out of three? With $48 billion in VC investment in 2014, such an improvement would prevent huge amounts of money from being lost on startups that never had much of a chance of surviving the harsh competitive environment. The challenge is to identify those likely laggards well before the market rejects their idea and, perhaps more importantly, to see the big hits before anyone else. Venture capital has long relied on subjective, intuitive methods of assessing startups, but that’s changing as more firms are bringing data science and consistency into their decision-making.

 

The New Science of Sentencing

The Marshall Project, Five Thirty Eight


from August 04, 2015

… Risk assessments have existed in various forms for a century, but over the past two decades, they have spread through the American justice system, driven by advances in social science. The tools try to predict recidivism — repeat offending or breaking the rules of probation or parole — using statistical probabilities based on factors such as age, employment history and prior criminal record. They are now used at some stage of the criminal justice process in nearly every state. Many court systems use the tools to guide decisions about which prisoners to release on parole, for example, and risk assessments are becoming increasingly popular as a way to help set bail for inmates awaiting trial.

But Pennsylvania is about to take a step most states have until now resisted for adult defendants: using risk assessment in sentencing itself. A state commission is putting the finishing touches on a plan that, if implemented as expected, could allow some offenders considered low risk to get shorter prison sentences than they would otherwise or avoid incarceration entirely. Those deemed high risk could spend more time behind bars.

Pennsylvania, which already uses risk assessment in other phases of its criminal justice system, is considering the approach in sentencing because it is struggling with an unwieldy and expensive corrections system. Pennsylvania has roughly 50,000 people in state custody, 2,000 more than it has permanent beds for. Thousands more are in local jails, and hundreds of thousands are on probation or parole. The state spends $2 billion a year on its corrections system — more than 7 percent of the total state budget, up from less than 2 percent 30 years ago. Yet recidivism rates remain high: 1 in 3 inmates is arrested again or reincarcerated within a year of being released.

States across the country are facing similar problems — Pennsylvania’s incarceration rate is almost exactly the national average — and many policymakers see risk assessment as an attractive solution. Moreover, the approach has bipartisan appeal: Among some conservatives, risk assessment appeals to the desire to spend tax dollars on locking up only those criminals who are truly dangerous to society. And some liberals hope a data-driven justice system will be less punitive overall and correct for the personal, often subconscious biases of police, judges and probation officers. In theory, using risk assessment tools could lead to both less incarceration and less crime.

There are more than 60 risk assessment tools in use across the U.S., and they vary widely. But in their simplest form, they are questionnaires — typically filled out by a jail staff member, probation officer or psychologist — that assign points to offenders based on anything from demographic factors to family background to criminal history. The resulting scores are based on statistical probabilities derived from previous offenders’ behavior. A low score designates an offender as “low risk” and could result in lower bail, less prison time or less restrictive probation or parole terms; a high score can lead to tougher sentences or tighter monitoring.
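In their simplest form, the arithmetic behind these tools is just a weighted checklist. The sketch below is a hypothetical illustration only: the factor names, point values, and score cutoffs are invented for the example, whereas real instruments derive them statistically from prior offenders' outcomes.

```python
# Hypothetical illustration of a points-based risk assessment questionnaire.
# Factor names, point values, and cutoffs are invented for this sketch; real
# instruments derive them from statistical analysis of prior offenders' behavior.

def risk_score(answers: dict) -> int:
    """Sum questionnaire points for one offender."""
    points = 0
    points += 3 if answers["age"] < 25 else 0       # younger offenders score higher
    points += 2 if not answers["employed"] else 0   # unstable employment history
    points += 2 * answers["prior_convictions"]      # each prior conviction adds points
    return points

def risk_bucket(score: int) -> str:
    """Map a raw score onto the low/medium/high labels used in decisions."""
    if score <= 3:
        return "low"
    if score <= 7:
        return "medium"
    return "high"

example = {"age": 22, "employed": False, "prior_convictions": 1}
score = risk_score(example)
print(score, risk_bucket(score))  # 7 medium
```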

The risk assessment trend is controversial. Critics have raised numerous questions: Is it fair to make decisions in an individual case based on what similar offenders have done in the past? Is it acceptable to use characteristics that might be associated with race or socioeconomic status, such as the criminal record of a person’s parents? And even if states can resolve such philosophical questions, there are also practical ones: What to do about unreliable data? Which of the many available tools — some of them licensed by for-profit companies — should policymakers choose?

 

Speeding Up Your Quest(s) For “R Stuff”

rud.is


from August 05, 2015

I use Google quite a bit when conjuring up R projects, whether it be in a lazy pursuit of a PDF vignette or to find a package or function to fit a niche need. Inevitably, I’ll do something like this (yeah, I’m still on a mapping kick) and the first (and best) results will come back with https://cran.r-project.org/-prefixed URLs. If all this works, what’s the problem? Well, the main CRAN site is, without mincing words, slow much of the time. The switch to https on it (and its mostly academic mirrors) has introduced noticeable delays.

Now, these aren’t productivity-crushing delays, but (a) why wait if you don’t have to; and, (b) why not spread the load to a whole server farm dedicated to ensuring fast delivery of content? I was going to write a Chrome extension specifically for this, but I kinda figured this was a solved problem, and it is!
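The underlying idea is just URL rewriting: send requests for the main CRAN host to a CDN-backed mirror instead. As a rough illustration of that idea (not the extension or the specific solution the post points to), here is a minimal Python sketch; the choice of cran.rstudio.com as the faster, CDN-backed mirror is an assumption.

```python
# Rough sketch of the idea: rewrite main-site CRAN URLs to a CDN-backed mirror
# so package pages and vignettes load faster. The mirror below (cran.rstudio.com)
# is an assumed choice; any fast mirror would do.
from urllib.parse import urlparse, urlunparse

FAST_MIRROR = "cran.rstudio.com"  # assumed CDN-backed mirror

def speed_up_cran_url(url: str) -> str:
    """Swap the slow main CRAN host for the faster mirror, leaving the path intact."""
    parts = urlparse(url)
    if parts.netloc == "cran.r-project.org":
        parts = parts._replace(netloc=FAST_MIRROR)
    return urlunparse(parts)

print(speed_up_cran_url(
    "https://cran.r-project.org/web/packages/ggplot2/index.html"
))
# https://cran.rstudio.com/web/packages/ggplot2/index.html
```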

 

The Future of Work: Working for the Machine

Pacific Standard


from August 04, 2015

… Imagine yourself in 10 years. You wake up, turn on your laptop from your digital co-working space, and log in to work. The work platform recalls your skills and abilities (say, video production), then matches you immediately to a team of other experts around the world. The system tells today’s team that a client wants help putting together a marketing plan for a new product. You’ve never met your teammates, but they all have stellar reputations, so you jump in using Skype, Dropbox, and an online team room. A week later, the platform has connected you with a different team to collaborate on a documentary. A year later, you’ve joined a digital organization that has hundreds of people but no physical headquarters—one that elastically brings in new skills as it needs to navigate the marketplace.

Our research group at Stanford’s Computer Science department has been building the tools and infrastructure for “flash teams” like these to arise from expert crowdsourcing marketplaces.

 

The simple way to make data science effective

Computerworld


from August 04, 2015

If you look at a data science job posting, chances are it will ask for experience with machine-learning techniques, statistical programming languages, NoSQL databases and maybe some visualization tools. If you look at the curriculum of the new data scientist “boot camps,” the material will be the same. But if you look at what a data scientist actually does, it’s cleaning and collecting data. This isn’t because data scientists are misguided. Quite the opposite, in fact. It’s because they know that cleaning and collecting data is actually the most important thing for successful data applications.

One great example is Google Translate. Nowadays, Google has some of the best scientists who study natural language translation. But when Google Translate launched, it was immediately as good as or better than many translation products that had been created over decades. And it wasn’t due to state-of-the-art algorithms. Rather, it was because Google had a corpus of data that was bigger than anyone else had access to — the entire Web that it had crawled to build Google search. While Google hires the best data scientists and talks a lot about its amazing algorithms, if you ask the researchers who work there, they will tell you that a lot of their success is due to a massive brute-force effort to make high-quality data available everywhere.

 

HDScores just launched an API for its trove of restaurant inspection data

Technical.ly Baltimore


from August 05, 2015

HDScores founder Matthew Eierman has this mantra about letting anyone tap into the data his startup has compiled about restaurant inspections: “We don’t care where people consume our data, as long as they consume our data.”

It fits, then, that the company is launching a public API so developers, open-data lovers and the rest of the public can get access. It will be free for up to 2,500 inspection reports a day; beyond that, you have to pay. The data is also available on Data.gov and Microsoft’s Azure Marketplace.

Since the project started three and a half years ago, the bootstrapped startup has added state and local health inspection reports from 615,000 restaurants (41 percent of existing restaurants, the company estimates) to its system, where it cleans the data and puts it into a consumable format.
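For developers, consuming such an API typically means paging through results while keeping an eye on the free daily quota. The sketch below is purely hypothetical: the endpoint URL, query parameters, and response fields are invented for illustration and are not HDScores’ actual API.

```python
# Hypothetical sketch of consuming a paginated inspection-report API while
# staying under a 2,500-report-per-day free tier. The endpoint URL, query
# parameters, and response fields are invented; HDScores' real API will differ.
import requests

API_URL = "https://api.example.com/inspections"  # placeholder, not HDScores' real endpoint
DAILY_FREE_QUOTA = 2_500

def fetch_reports(api_key: str, page_size: int = 100):
    """Yield inspection reports until the free daily quota would be exceeded."""
    fetched, page = 0, 1
    while fetched < DAILY_FREE_QUOTA:
        resp = requests.get(
            API_URL,
            params={"page": page, "per_page": page_size},
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=10,
        )
        resp.raise_for_status()
        reports = resp.json().get("reports", [])
        if not reports:
            break
        for report in reports[: DAILY_FREE_QUOTA - fetched]:
            yield report
        fetched += len(reports)
        page += 1
```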

 

UW workshop to explore Big Data solutions for science

UW Today


from August 04, 2015

At a University of Washington workshop this week, a hundred graduate students from around the country will explore a question that everyone is asking these days: What can data science do for me?

To land an invite to the Data Science 2015 workshop on Aug. 5–7, they were asked to identify a single challenge, big idea or solution that data science — the process of extracting knowledge and making discoveries from vast amounts of data — could advance in their scientific or engineering fields.

 

Civic Data: UChicago Data Science For Social Good Fellowship Hits Third Year

Chicago Inno


from August 05, 2015

Elissa Redmiles had all the background to excel in civic data, but before this summer she didn’t know it.

After receiving an undergraduate degree in computer science from the University of Maryland, she worked for a year in curriculum development aimed at encouraging more women to join the computer science field, then spent a year in marketing at IBM. After two years working in the tech field, but not honing her technical skills, she was searching for a way to combine her programming skills and her passion for making a difference. Then a friend posted a listing for the University of Chicago’s Data Science for Social Good (DSSG) summer program.

“When I saw ‘social good,’ I was like, that is awesome,” she said. “I never see people using technical skills to solve social problems.”

 
CDS News



Inside Facebook’s Quest for Software That Understands You

MIT Technology Review


from August 06, 2015

The first time Yann LeCun revolutionized artificial intelligence, it was a false dawn. It was 1995, and for almost a decade, the young Frenchman had been dedicated to what many computer scientists considered a bad idea: that crudely mimicking certain features of the brain was the best way to bring about intelligent machines. But LeCun had shown that this approach could produce something strikingly smart—and useful. Working at Bell Labs, he made software that roughly simulated neurons and learned to read handwritten text by looking at many different examples. Bell Labs’ corporate parent, AT&T, used it to sell the first machines capable of reading the handwriting on checks and written forms. To LeCun and a few fellow believers in artificial neural networks, it seemed to mark the beginning of an era in which machines could learn many other skills previously limited to humans. It wasn’t.
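The core idea LeCun demonstrated, a network that learns to read handwriting purely from labeled examples, is easy to reproduce in miniature today. The sketch below is a loose illustration using scikit-learn’s small 8x8 digit images and a plain fully connected network, not LeCun’s convolutional check-reading system.

```python
# Loose, modern illustration of learning to read handwritten digits from
# labeled examples. A tiny fully connected network on scikit-learn's 8x8
# digit images, not LeCun's convolutional check-reading system.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)                      # learn from labeled examples
print("test accuracy:", clf.score(X_test, y_test))
```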

 
