Data Science newsletter – February 28, 2018

Newsletter features journalism, research papers, events, tools/software, and jobs for February 28, 2018

 
 
Data Science News



Seasonal Patterns in the Amazon Explained

Brookhaven National Laboratory


“If we can improve our understanding of how much carbon dioxide (CO2) is absorbed by tropical regions, we can improve future climate projections,” said Jin Wu, a scientist at Brookhaven’s Environmental & Climate Sciences Department.

Satellite images are one of the most common tools scientists use to observe tropical forests, but the efficacy of the method has been a subject of debate. Some researchers have argued that seasonal changes in the “greenness” of tropical forests, as satellites have recently shown, could be misleading. Now, the collaboration led by Brookhaven has used field observations and computational models to help clear up the controversy. Their results, published on Feb. 6 in New Phytologist, also shed light on biological processes that have changed scientists’ understanding of seasonality in tropical forests.


Smart Cities Initiative Spurs “Gigabit Apps” for Next-Gen Internet

Xconomy, Bruce V. Bigelow


With the U.S. Ignite initiative, the NSF is providing modest incentives to encourage the development of “smart city” applications to help solve municipal problems—and perhaps create the kind of demand for gigabit infrastructure that private industry would need to justify the cost of building ultra high-speed networks. As U.S. Ignite executive director William Wallace noted in 2015, “Building a critical mass of communities with next-generation Internet capabilities will have ripple effects: if networks are fast, reliable and widely available, companies produce more capable applications to run on those networks, which in turn brings new users online and increases use among those who already subscribe to broadband services.”


A successful cancer researcher confronts a new challenge: getting elected to Congress

Science, ScienceInsider, Jeffrey Mervis


Starting this month, ScienceInsider will be following the 2018 U.S. elections, which have attracted unusual interest from the scientific community. Dozens of candidates with backgrounds in science, technology, engineering, and math are seeking election to Congress, and hundreds more are running for state and local offices. We will be profiling candidates and reporting on news from the campaign trail.

This story is the first in a three-part series about three Texas candidates with scientific backgrounds who are running for the U.S. House of Representatives as Democrats. The primary is 6 March.


Nobody Wants to Let Google Win the War for Maps All Over Again

Bloomberg Technology, Mark Bergen


Self-driving cars need painfully detailed data on every inch of street. Can automakers solve the problem without the reigning superpower of maps?


Can Cafe X, a $25,000 robot, make better coffee than a barista?

Curbed, Diana Budds


Would you trust a robot to make your coffee? Entrepreneur Henry Hu and the Silicon Valley design firm Ammunition think you should give it a shot.

Pulling a perfect espresso is an art form that makes artisanal baristas and coffee snobs swoon. But when Hu bought his daily brew, he noticed that most baristas were spending more time pushing buttons and moving cups around than actually giving the coffee and customers the TLC they deserved. He noticed that orders would take longer than expected, and sometimes customers would get the wrong drink. Meanwhile, baristas didn’t have time to thoroughly answer questions about their specialty coffees to inquiring minds.


Computational Social Science ≠ Computer Science + Social Data

Communications of the ACM, Viewpoint, Hannah Wallach


For me, shifting away from traditional machine learning and into this interdisciplinary space has meant that I have needed to think outside the algorithmic black boxes often associated with machine learning, focusing instead on the opportunities and challenges involved in developing and using machine learning methods to analyze real-world data about society.

This Viewpoint constitutes a reflection on these opportunities and challenges. I structure my discussion here around three points—goals, models, and data—before explaining how machine learning for social science therefore differs from machine learning for other applications.


Artificial Intelligence Aids in Cancer Diagnosis

Weill Cornell Medicine, Newsroom


In order to further improve the accuracy of cancer diagnosis, Dr. Elemento and colleagues at Weill Cornell Medicine and NewYork-Presbyterian developed an artificial intelligence computer program that analyzes pathology images and determines whether they are malignant and if so, what type of cancer is present. Their results were published Dec. 28 in EBioMedicine.

The researchers developed a convolutional neural network (CNN), a type of computer program loosely modeled on the human brain. “Just as a human would look at a lot of images and learn to distinguish certain characteristics of a cancer cell, our neural network did the same thing,” said co-senior author Dr. Iman Hajirasouliha, an assistant professor of physiology and biophysics at Weill Cornell Medicine. In order to “train” the CNN, the team exposed the computer program to thousands of pathology images of known lung, breast and bladder cancers.


Amazon Alexa’s head AI researcher has left for Google

Quartz, Dave Gershgorn


The researcher tasked with dreaming up new capabilities for Amazon’s Alexa is taking his talents to Google.

Ashwin Ram, former senior manager and lead for Alexa AI at Amazon, is now technical director of AI for Google Cloud, according to a post on LinkedIn. Talent in artificial intelligence is scarce, and companies are willing to pay millions to obtain new talent. Managers like Ram, who have been working on the technology since before the deep learning boom, are even more rare.


Founded by lab mates at Stanford University’s Artificial Intelligence Lab, Drive.ai paves the way for autonomous vehicles

YourStory, Vallabh Rao


There is an exorbitant number of automobile-related deaths each year, 94 percent of which are caused by human error. At the same time, cities are overcrowded and personally owned cars spend an average of 22 hours of each day unused. Self-driving vehicles will help solve these problems, and eliminate human-created risk from the roads to save a significant number of lives.

Drive.ai was set up with these statistics in mind. Using deep learning artificial intelligence, Drive.ai is uniquely positioned to equip vehicles with the best autonomous technology to create safer roads with smarter cars.

It was founded in Silicon Valley in 2016 by several former lab mates of Stanford University’s Artificial Intelligence Lab, including Brody Huval, Joel Pazhayampallil, Sameep Tandon, Tao Wang, and Will Song. Carol Reiley and Fred Rosenzweig are also co-founders. Sameep currently serves as CEO.


Barbara Engelhardt’s Statistical Search for Genomic Truths

Quanta Magazine, Jordana Cepelewicz


“We don’t have much ground truth in biology.” According to Barbara Engelhardt, a computer scientist at Princeton University, that’s just one of the many challenges that researchers face when trying to prime traditional machine-learning methods to analyze genomic data. Techniques in artificial intelligence and machine learning are dramatically altering the landscape of biological research, but Engelhardt doesn’t think those “black box” approaches are enough to provide the insights necessary for understanding, diagnosing and treating disease. Instead, she’s been developing new statistical tools that search for expected biological patterns to map out the genome’s real but elusive “ground truth.”

Engelhardt likens the effort to detective work, as it involves combing through constellations of genetic variation, and even discarded data, for hidden gems. In research published last October, for example, she used one of her models to determine how mutations relate to the regulation of genes on other chromosomes (referred to as distal genes) in 44 human tissues. Among other findings, the results pointed to a potential genetic target for thyroid cancer therapies. Her work has similarly linked mutations and gene expression to specific features found in pathology images.

The applications of Engelhardt’s research extend beyond genomic studies. She built a different kind of machine-learning model, for instance, that makes recommendations to doctors about when to remove their patients from a ventilator and allow them to breathe on their own.


A new data trove could teach computers to tell blind people what they need to know

MIT Technology Review, arXiv


Enter Danna Gurari at the University of Texas at Austin and a few colleagues, who today publish a database of 31,000 images along with questions and answers about them. At the same time, Gurari and co set the machine-vision community a challenge: to use their data set to train machines as effective assistants for this kind of real-world problem.

The data set comes from an existing app called VizWiz, developed by Jeff Bigham and colleagues at Carnegie Mellon University in Pittsburgh to assist blind people. Bigham is also a member of this research team.

Using the app, a blind person can take a photograph, record a question verbally, and then send both to a team of volunteer helpers who answer to the best of their ability.


New York is the tech sector’s official second city. And the boom is just beginning

Crain's New York Business, Greg David and Cara Eisenpress


New York has now hooked its economic future to the tech sector—the most dynamic aspect of the American economy—and is emerging as the second headquarters of the whole industry. The dividends have already been enormous. Tech firms have provided jobs for tens of thousands of millennial graduates from the nation’s best colleges. The companies are filling office space as other industries cut back. Amazon has New York on its HQ2 short list and announced in September that its $55 million Hudson Yards office would employ 2,000 by 2020. Companies announce new deals and venture capital rounds every day; last week Roche bought five-year-old health-tech firm Flatiron Health for $1.9 billion.

There are now 7,500 New York–based tech companies—defined as those whose principal business is tech and its applications—fueled by $38.4 billion of investment in the past five years and supported by accelerators, incubators, meet-ups and universities. The companies employ 120,000 people, 60% more than a decade ago, according to the state comptroller.

 
Events



NYC Open Data Week

NYC Open Data


New York, NY March 3-10. “NYC Open Data Week is a collaboration between NYC Open Data, BetaNYC and the dynamic NYC civic tech and data communities. We’re kicking off the week on International Open Data Day (March 3) with a day-long conference produced by BetaNYC called School of Data and wrapping up on Saturday (March 10) with an event on the future of mobility; Open Data Art, workshops for youth and showcases of new civic tools by Participatory Budgeting and others will be happening in between.”


McKinsey Analytics Online Hackathon - Recommendation Design

McKinsey Analytics, Analytics Vidhya


Online March 9-10. “You will have the opportunity to experience and overcome some of the challenges that leading global organizations face. The best participants will be short listed for interviews with us and additionally the winner will receive an all-expenses paid trip to an international analytics conference of your choice as a McKinsey guest.” [free, registration required]


Kaggle’s CareerCon 2018

Google, Airbnb, Slack


Online March 20-22. “A Digital Event All About Landing Your First Data Science Job” [free, registration required]


Artificial Intelligence @ Oxford

University of Oxford


Oxford, England March 27. “A unique opportunity for industry to see the state-of-the-art in artificial intelligence and machine learning at one of the world’s great universities, and meet Oxford’s AI experts one-to-one. This event will show you the reality of AI today: what is possible and where the technology is going.” [$$$, sold out, waiting list available]

 
Deadlines



The Pentagon Announces xView Detection Challenge

xView is one of the largest publicly available datasets of overhead imagery. It contains images from complex scenes around the world, annotated using bounding boxes. The DIUx xView 2018 Detection Challenge is focused on accelerating progress in four computer vision frontiers:

  • Reduce minimum resolution for detection
  • Improve learning efficiency
  • Enable discovery of more object classes
  • Improve detection of fine-grained classes

Pre-registration available. Competition closes in May.


    Generative Design in Minecraft Competition (GDMC)

    “The Settlement Generation Challenge is about writing an algorithm that can create a settlement for a given, unknown Minecraft map. The challenge is to produce an algorithm that is adaptive towards the provided map, creates a settlement that satisfies a range of functional requirements – but also looks good and evokes an interesting narrative. The goal is to basically produce an algorithm that can rival the state of the art of what humans can produce.” Deadline for submissions is June 30.
     
    NYU Center for Data Science News



    A Call to Action for Empiricism in Data Quality Research

    Medium, NYU Center for Data Science


    CDS Faculty Member Juliana Freire, Professor of Computer Science & Engineering and Data Science, proposes a new classification to enable data quality research


    This Summer, Choose From 2 Accelerators to Grow Your Startup

    NYU Entrepreneurship


    For the past five years, the Entrepreneurial Institute has offered our marquee program, the NYU Summer Launchpad accelerator, to graduating NYU startup teams. The program was the summer home for successful NYU startups including Brooklinen, Smart Vision Labs, Ephemeral Tattoos, ShopDrop, My Wellbeing, and many others. Collectively, Summer Launchpad teams have gone on to participate in renowned startup accelerators such as Techstars and AngelPad, have raised more than $21 million in angel and VC financing, built out their teams, shipped products and are generating more than $60 million in revenue!

    Building on the amazing track record of Summer Launchpad, plus what we learned from the incredibly successful 2-week J-Term Startup Sprint, this summer we’re offering NYU entrepreneurs two summer accelerator programs to choose from: A 2-3 week Summer Startup Sprint primarily for students graduating in 2019 or later, or the more immersive 9-week Summer Launchpad accelerator, primarily for graduating students.

     
    Tools & Resources



    How to build a deep learning model in 15 minutes

    Instacart tech blog, Montana Low


    As Instacart has grown, we’ve learned a few things the hard way. We’re open sourcing Lore, a framework to make machine learning approachable for Engineers and maintainable for Machine Learning Researchers.


    Increase Your Organization’s Data Agility with a Multi-Model Approach

    MarkLogic, O'Reilly Media


    “In this free O’Reilly eBook, Gaining Data Agility with Multi-Model Databases by consultant Joel Ruisi, you’ll learn how, with a multi-model database, your company can easily centralize, manage, and search all the data the IT system collects. The result is data agility: the ability to adapt to changing environments and serve users what they need when they need it.”


    A Comprehensive Guide to Connect R to Amazon Redshift

    Open Data Science, Pablo Seibelt


    Amazon Redshift is one of the hottest databases for data warehousing right now: it’s one of the most cost-effective solutions available and allows for integration with many popular BI tools. Unfortunately, driver compatibility is a little shakier, but there is a way to make it work very nicely with R!


    Preparing the Next Generation of GEOINT Practitioners

    Trajectory Magazine; Jerome Lenczowski, retired NGA; Robert McNair, Tara Mott, Nathan Spangler and Lisa Spuria


    GEOINT job roles and specialized skill sets have not kept pace with the changing technology environment and will continue to lag behind as data, collectors, tools, and technology expand. Remedying this situation demands an aggressive approach to preparing current and future GEOINT practitioners, which entails more than stating the problem and developing a few new training courses. Rather, the discipline needs a comprehensive growth campaign; one that markets to a wider audience, professes a broader understanding and global acceptance of GEOINT, prepares future practitioners to seek GEOINT careers and master complex GEOINT problems, encourages current practitioners to embrace and shape emerging capabilities, and engages industry to drive cutting-edge solutions that will transform the GEOINT discipline. Fundamental to this campaign is a common lexicon and a central body of knowledge. This common basis will underpin the future expansion of the discipline and ensure GEOINT practitioners receive the breadth of foundational and emerging skills required by the GEOINT Community.


    To solve problems caused by sitting, learn to squat

    Quartzy, Rosie Spinks


    Our failure to squat has biomechanical and physiological implications, but it also points to something bigger. In a world where we spend so much time in our heads, in the cloud, on our phones, the absence of squatting leaves us bereft of the grounding force that the posture has provided since our hominid ancestors first got up off the floor. In other words: If what we want is to be well, it might be time for us to get low.


    101 NumPy Exercises for Data Analysis (Python)

    Machine Learning Plus


    The goal of the NumPy exercises is to serve as a reference as well as to get you to apply NumPy beyond the basics. The questions come in four levels of difficulty, with L1 the easiest and L4 the hardest.


    A cybersecurity style guide

    Boing Boing, Cory Doctorow


    Information security firm Bishop Fox’s “Cybersecurity Style Guide” is 92 pages’ worth of usage notes from the confusing world of technical jargon, a combination of glossary, pronunciation guide and style manual (in the manner of the jargon file), and includes the notation that “cyber-” is an ill-advised prefix.
