NYU Data Science Newsletter – August 4, 2016

NYU Data Science Newsletter features journalism, research papers, events, tools/software, and jobs for August 4, 2016


 
Data Science News



Do good with data

Periscopic, Kim Rees


from July 01, 2016

We think incrementally even when faced with massive disruption. Our brains need training to think in a new way. … we’re facing a paradigm shift in how we need to use data. We can no longer navel gaze at our own slice of data — we need to think about how our data touches other data. Just as my relationships, health, work, ambitions, income, travels, interactions, and adventures all influence each other, so too does the data collected in all of those areas.

 

Publishers’ Dilemma: Judge A Book By Its Data Or Trust The Editor’s Gut?

NPR, All Tech Considered


from August 02, 2016

A publishing house might give a first-time author a six-figure deal, only to see the book flop. It’s always been hard to predict what will sell. Now publishers are getting some help from data that tells them how readers read — and that makes some people nervous.

 

Universities are tracking their students. Is it clever or creepy?

The Guardian, Chris Jutting


from August 03, 2016

What could be more normal than heading to the university library, swiping your card and logging in to a computer?

Most students wouldn’t think twice about it. But what they may not realise is that this mundane series of events leaves a unique data pattern that can be recorded, logged and reviewed, in a practice known as “learning analytics”. And now data analysts are using this information to predict whether students will struggle with their courses, or drop out.

Also, in tracking + data:

  • [1607.07403] On the Ubiquity of Web Tracking: Insights from a Billion-Page Web Crawl (July 25, arXiv, Computer Science > Social and Information Networks; Sebastian Schelter, Jérôme Kunegis)
  • Publishers’ Dilemma: Judge A Book By Its Data Or Trust The Editor’s Gut? (August 02, NPR, All Tech Considered)

Inside Facebook’s hardware labs: Moving faster with more collaboration

Facebook Code, Engineering Blog; Spencer Burns and Mikal Greaves


from August 03, 2016

Hardware engineering traditionally happens behind closed doors, in isolated labs. We fell into this pattern ourselves early on at Facebook, as we opened individual hardware labs to support new teams. Some of our first labs — including one in a repurposed mail room, in our old Palo Alto headquarters — were built for our Infrastructure teams to prototype custom racks, servers, storage systems, and network switches for our data centers. As new, hardware-oriented teams like Connectivity Lab and Oculus started to form, we built additional labs for those teams to design, prototype, and test. Today we have hardware labs all over the world — from our Oculus facilities in the Seattle area to our Aquila hangar in the U.K. to our laser communications lab in Southern California — as well as a number of custom labs in our Menlo Park office that are used by the Oculus, Connectivity Lab, and Infrastructure teams.

     

2016 Ford-Mozilla Open Web Fellows

Mozilla Advocacy


from August 03, 2016

Ford-Mozilla Open Web Fellows are activists and technologists on the front lines of the open internet movement. They’re a diverse group: makers, artists, developers, and data scientists from across the globe. But Fellows all share a common belief: The world can be made a better place by leveraging the open internet.

     

Robots Are Getting A Sense Of Self-Doubt

Popular Science


from August 01, 2016

To gauge whether an action will fail or not, the key is prediction. As humans, we predict outcomes naturally, by imagining everything that could go wrong after making a decision. In the case of deciding a drone’s flight path, an algorithm would have to look at the video coming from the drone’s camera, and predict based on the objects in the picture if its next action will cause a crash. The drone would have to imagine the future.

This idea goes beyond object detection, a feature found in some drones now. That detection is simple: if there’s something in the way, don’t go there. But this framework would allow the A.I. to actually determine if the scenario could be detrimental to its primary mission.
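The article stays at this conceptual level. Purely as an illustration, a predict-before-act loop along those lines might look like the sketch below, where predict_crash_probability stands in for whatever learned model scores a camera frame and a candidate action; every name here is a hypothetical placeholder, not an API from the research described.

    CRASH_THRESHOLD = 0.1  # assumed tolerable probability of failure

    def choose_action(camera_frame, candidate_actions, predict_crash_probability):
        """Score each candidate action by its predicted crash risk and pick
        the safest one, refusing to act when every option looks too risky."""
        scored = [
            (predict_crash_probability(camera_frame, action), action)
            for action in candidate_actions
        ]
        risk, best_action = min(scored, key=lambda pair: pair[0])
        if risk > CRASH_THRESHOLD:
            return None  # "self-doubt": hover and wait rather than risk a crash
        return best_action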

     

Blockchain: Coming Sooner Than You Think

RTInsights


from August 02, 2016

As explained by R3, a blockchain alliance of 50 of the world’s top financial firms, banks often use decades-old software and separately maintained databases to register, track, and account for transactions. That introduces risk of insufficient liquidity and tedious regulatory reporting. With one ledger provided through the blockchain, transactions could be validated in near real-time, and regulatory audits would be much easier.

Benefits of blockchain in finance might also include complying with know-your-customer requirements through identity verification; easier cross-border payments; and freeing up capital with faster settlement.

     

IBM is one step closer to mimicking the human brain

Wired UK


from August 03, 2016

Scientists at IBM have claimed a computational breakthrough after imitating large populations of neurons for the first time. … For the first time, scientists have created their own “randomly spiking” artificial neurons that can store and process data.

     

Potentially Disruptive Digital Trends: Will There Be Any Big Data without Artificial Intelligence?

SmartData Collective, Ioana Sima


from August 03, 2016

The digital universe is expanding at a staggering rate. Right now, there are about 4.4 zettabytes of data on the internet. For reference, a zettabyte is equivalent to 10²¹ bytes (a trillion gigabytes).

According to IDC, by 2020 there will be approximately 44 zettabytes of data. The increase is not only staggering but is scheduled to take place in a relatively short amount of time. Traditional means of handling digital information (i.e. human experts and specialized software) will no longer be enough.

     

Moore Diversity

Medium, Moore Data, Carly Strasser


from August 03, 2016

Diversity has been on my mind a lot lately. It’s a vague phrase (like “sustainability” or “governance” or “best practices”), but it’s one that I keep finding myself chatting about with our grantees, fellow conference attendees, and other Moore Foundation personnel. I won’t attempt to cover the gamut of diversity-related topics in this post; however, I want to get a few of my more recent thoughts out. This blog post will focus on (primarily gender) diversity at conferences.

     
Events



Challenges and Advances on Big Data in Neuroimaging



“This conference is to bring together statisticians, clinicians, data scientists and graduate students in neuroimaging to exchange ideas of the current challenges and developments in statistical methods and applications in the research of big neuroimaging data.”

Cleveland, OH, Thursday-Friday, August 25-26. [$$$]

     
    Tools & Resources



    Probabilistic data structures in Python

    O'Reilly Media, Paco Nathan


    from August 01, 2016

    This tutorial is intended for a Python programmer who has some background working with big data, who now needs to learn how to apply probabilistic data structures for analytics with large-scale data and streaming applications, and especially for use cases that require both. [video, 27:31]

     

A Survey of Deep Learning Techniques Applied to Trading

Greg Harris


from May 30, 2016

Deep learning has been getting a lot of attention lately with breakthroughs in image classification and speech recognition. However, its application to finance doesn’t yet seem to be commonplace. This survey covers what I’ve found so far that is relevant to systematic trading.

     

Good Things Come to Those Who Weight

Datassist, Heather M. Krause


from August 02, 2016

Weighting is a series of calculations on your data, and it is easy to see what those calculations do. Let’s pretend we’re doing a survey of the US. We can go to the US Census to find the best available estimate of how many men and how many women are in the country. Now imagine it’s 50/50. These are the population parameters.

Our survey is of 28 women and 72 men, so our sample data does not represent the population across the strata of gender: we have over-sampled men and under-sampled women. To correct this imbalance, we create a new variable – a weight variable – with higher values for under-sampled groups. Our new variable does two things: it deflates the statistical importance of overrepresented groups and it inflates the statistical importance of underrepresented groups.

Every individual gets their own weight factor depending on their characteristics. Once your dataset has this new variable added to it, statistical programs can apply the weight variable to any analysis so that our final results are more accurate.
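To make the arithmetic concrete, here is a minimal Python sketch of that gender example. The 28/72 sample and 50/50 population figures come from the excerpt above; the data frame and column names are illustrative, not from the original post.

    import pandas as pd

    # Hypothetical survey of 100 respondents: 28 women, 72 men.
    survey = pd.DataFrame({"gender": ["F"] * 28 + ["M"] * 72})

    # Population parameters, e.g. from the US Census (assumed 50/50 here).
    population_share = {"F": 0.50, "M": 0.50}

    # Observed share of each stratum in the sample (F: 0.28, M: 0.72).
    sample_share = survey["gender"].value_counts(normalize=True)

    # Weight = population share / sample share for each respondent's stratum:
    # under-sampled groups get weights > 1, over-sampled groups get weights < 1.
    survey["weight"] = survey["gender"].map(
        lambda g: population_share[g] / sample_share[g]
    )

    print(survey.groupby("gender")["weight"].first())
    # F    1.785714   (women inflated:  0.50 / 0.28)
    # M    0.694444   (men deflated:    0.50 / 0.72)

Applied to any analysis, these weights make the 28 women count as half of the effective sample and the 72 men as the other half, matching the population parameters.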

     
