Data Science newsletter – December 23, 2021

Newsletter features journalism, research papers and tools/software for December 23, 2021

 

The Most Fascinating Startups at Web Summit: To use or abuse the data, that is the question

Dataconomy, Vojna Ngjeqari


from

We met some fascinating startups advancing the human race by using data science. They all have three things in common:

  • They ask for your data, sometimes without clarity on how they’ll use it
  • They ask for permission to use your data
  • They ask for permission to share your data with third parties (for a greater good, or maybe just to make money)

    University of Chicago, City Colleges of Chicago join forces to increase diversity in science careers

    University of Chicago, Civic Engagement


    from

    City Colleges of Chicago (CCC) and the University of Chicago (UChicago) are joining forces at an institutional level to strengthen STEM education and career opportunities and create a more diverse field of professionals entering the sciences.

    A Memorandum of Understanding executed by the two institutions aims to ensure that innovation in the rapidly changing fields of science and technology benefits from a wide range of perspectives and experiences.

    A CCC degree program in data science is among the first initiatives being developed from the collaboration.


    A view from A.I.’s biggest conference

    Fortune, Jeremy Kahn


    from

    NeurIPS, one of the most important A.I. conferences for academic research, wrapped up today. As always, the conference—held virtually this year—provides a good bellwether on the state-of-the-art and where the field may be heading.

    First, the big picture: for the first time, the number of papers submitted for possible inclusion in the conference declined slightly, falling 3.5% to 9,122. Whether this indicates that the hype around A.I. is dissipating, or at least plateauing, or whether the drop is just a blip, is hard to say. At the same time, the number of papers selected for inclusion in the conference jumped more than 23% to 2,344, which is the most ever chosen. The acceptance rate, at 26%, was also slightly higher than in recent years.
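    Those percentages are easy to sanity-check from the raw counts the article gives (a minimal arithmetic check, using only the figures quoted above):

```python
# Figures quoted in the article for NeurIPS 2021.
submitted = 9_122  # total submissions, down 3.5% year over year
accepted = 2_344   # accepted papers, up more than 23% and the most ever

acceptance_rate = accepted / submitted
print(f"{acceptance_rate:.1%}")  # → 25.7%, consistent with the ~26% cited
```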

    The authorship of NeurIPS papers can serve as a kind of proxy for A.I. prowess. Judging by papers in which at least one of the authors had a corporate affiliation, Google was head and shoulders above its competitors—it had 177 papers, followed by Microsoft, with 116; DeepMind, with 81; and Facebook, with 78. Then it was a big step down to other tech giants: IBM had 36, one more than Amazon, while Nvidia had 20, and Apple just 10. (Remember though that Apple is notoriously secretive and until recently rarely allowed its R&D teams to publish academic papers.) The Chinese Internet giants, Alibaba and Tencent, had 20 and 19, respectively, while Baidu had 16.


    NYU researchers secure $200,000 grant to bring novel AI tool to support under-resourced newsrooms across the U.S.

    NYU, Tandon School of Engineering


    from

    Mona Sloane, faculty at NYU Tandon and Senior Research Scientist at the NYU Center for Responsible AI (R/AI), and Hilke Schellmann, professor of journalism at NYU’s Graduate School of Arts and Science, have been awarded a $200,000 grant from the Patrick J. McGovern Foundation to bring an innovative AI tool to under-resourced newsrooms to significantly scale up their investigative capacity and democratize access to FOIA records.

    The project will integrate the NYU-developed Gumshoe prototype — a natural language processing tool that identifies relevant and irrelevant sections in large text corpora — to help journalists effectively comb through thousands of Freedom of Information Act (FOIA) releases and other document sets. NYU will collaborate with MuckRock, an open source journalism platform used by tens of thousands of journalists across 4,000 newsrooms to help request, analyze, and publish public documents. The effort will unlock decades of valuable information, data and history contained in federal government records released under the Freedom of Information Act.
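    The Gumshoe prototype itself isn't described in detail here, but the triage idea — scoring sections of a large document set as relevant or irrelevant to an investigation — can be sketched with a toy keyword-overlap scorer. This is purely illustrative: the function names and threshold are invented, and the real tool uses trained NLP models rather than keyword counts.

```python
# Illustrative only: rank document sections by how many investigation
# keywords they contain, then split them into relevant/irrelevant piles.
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def relevance_score(section, keywords):
    """Fraction of a section's tokens that match the investigation's keywords."""
    tokens = tokenize(section)
    if not tokens:
        return 0.0
    hits = sum(n for word, n in Counter(tokens).items() if word in keywords)
    return hits / len(tokens)

def triage(sections, keywords, threshold=0.05):
    """Split sections into (relevant, irrelevant) by keyword density."""
    keywords = set(keywords)
    relevant, irrelevant = [], []
    for section in sections:
        (relevant if relevance_score(section, keywords) >= threshold
         else irrelevant).append(section)
    return relevant, irrelevant

sections = [
    "The contract was awarded without competitive bidding.",
    "Page intentionally left blank.",
]
relevant, irrelevant = triage(sections, {"contract", "bidding", "awarded"})
print(len(relevant), len(irrelevant))  # → 1 1
```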


    Surveys of scientists show women and young academics suffered most during pandemic and may face long-term career consequences

    The Conversation; Lesley Michalegko, Eric Welch, Mary K. Feeney, Timothy P. Johnson


    from

    We are a team of researchers who study the role of science and technology in society. We are also part of a collaborative, multi-university project, called SciOPS, that seeks to improve how scientists communicate with the public. As the pandemic wore on, researchers began telling us about the work stoppages, data losses and other hardships they were experiencing. We felt this was important information, so we conducted two surveys to understand how the pandemic was affecting researchers.

    The pandemic’s hardships in academia have been widespread and lasting, but our analyses revealed that female and early career scientists faced more negative impacts than other groups. These differences are likely aggravating already existing disparities and potentially altering career trajectories. The negative outcomes may last well beyond the end of the pandemic.


    CARTO raises $61M to lead the way in cloud native spatial analytics

    CARTO Blog, Luis Sanz


    from

    Today we are delighted to announce the closure of a $61 million Series C round led by New York-based global venture capital and private equity firm Insight Partners, with participation from the European Innovation Council (EIC) Fund, as well as our existing investors Accel, Salesforce Ventures, Hearst Ventures, Earlybird, and Kibo.

    Over the past few years we’ve seen rapid growth in a range of verticals, including Retail, CPG, Logistics, and Telecoms and we are excited to continue enabling users of modern analytics stacks to bring cloud native Location Intelligence to the enterprise (you can see some examples of how on our customer stories page, with examples from Renault, JLL, Bumble, and Mastercard).


    What Does It Mean for AI to Understand?

    Quanta Magazine, Melanie Mitchell


    from

    Remember IBM’s Watson, the AI Jeopardy! champion? A 2010 promotion proclaimed, “Watson understands natural language with all its ambiguity and complexity.” However, as we saw when Watson subsequently failed spectacularly in its quest to “revolutionize medicine with artificial intelligence,” a veneer of linguistic facility is not the same as actually comprehending human language.

    Natural language understanding has long been a major goal of AI research. At first, researchers tried to manually program everything a machine would need to make sense of news stories, fiction or anything else humans might write. This approach, as Watson showed, was futile — it’s impossible to write down all the unwritten facts, rules and assumptions required for understanding text. More recently, a new paradigm has been established: Instead of building in explicit knowledge, we let machines learn to understand language on their own, simply by ingesting vast amounts of written text and learning to predict words. The result is what researchers call a language model. When based on large neural networks, like OpenAI’s GPT-3, such models can generate uncannily humanlike prose (and poetry!) and seemingly perform sophisticated linguistic reasoning.
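    The training objective described above — "learning to predict words" — can be made concrete with a deliberately tiny sketch. A bigram counter is nothing like a large neural language model such as GPT-3, but it performs the same task: given a word, predict what most often comes next (all names here are illustrative):

```python
# Toy "language model": count which word follows which in a corpus,
# then predict the most frequent successor.
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count, for each word, how often each successor word follows it."""
    successors = defaultdict(Counter)
    words = corpus.lower().split()
    for current, nxt in zip(words, words[1:]):
        successors[current][nxt] += 1
    return successors

def predict_next(model, word):
    """Return the word most often seen after `word` in training, if any."""
    candidates = model.get(word.lower())
    return candidates.most_common(1)[0][0] if candidates else None

model = train_bigram_model("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # → cat ("cat" follows "the" twice, "mat" once)
```

Large language models replace the count table with billions of neural-network parameters and condition on long contexts rather than a single word, but the objective — predict the next token — is the same.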

    But has GPT-3 — trained on text from thousands of websites, books and encyclopedias — transcended Watson’s veneer? Does it really understand the language it generates and ostensibly reasons about?


    Senate faces pressure to quash FTC power in social spending bill

    POLITICO, Alexandra Levine


    from

    The U.S. Chamber of Commerce is ratcheting up its crusade against the FTC. Members have in recent weeks accused FTC Chair Lina Khan of abusing the agency’s power and unfairly attacking American businesses. Now the group is taking its concerns to the Senate.

    The chamber is firing off a letter to senators today to urge them to kill, as part of the Senate process known as reconciliation, portions of the Build Back Better Act, H.R. 5376 (117), that would grant the FTC more civil penalty authority. These provisions “would unfairly erode due process and would impose significant new costs on companies acting in good faith when serving consumers,” said the letter, which had almost 90 signatories, including the Interactive Advertising Bureau, the National Restaurant Association and Netchoice.


    ZSK embroidered sensors in the gym

    Innovation in Textiles blog


    from

    A prototype smart exercise machine was introduced at the recent Fitness Connected Expo held in Munich, Germany, by Krefeld-based ZSK Technical Embroidery Systems, a division of ZSK Stickmaschinen GmbH.

    R&D engineer Steliyan Vasilev explained that embroidery with conductive yarns was employed to produce different sensors which were then integrated into the cushions of the machine.


    How Did A.I. Art Evolve? Here’s a 5,000-Year Timeline of Artists Employing Artificial Intelligence, From the Ancient Inca to Modern-Day GANs

    ArtNet, Naomi Rea


    from

    While the fiction of A.I. art is pretty neat, the messy reality is that artists who work with computational systems have much more say in the outcomes than the term might suggest: they provide the inputs, guide the process, and filter the outputs. Artists have been attracted to using A.I. in their work for a variety of reasons; some are drawn to working with the most futuristic technologies, others use it as a way of integrating chance into their work, and others see potential for it to expand elements of their existing practices.

    Below we’ve outlined a timeline of a few of the key developments within the long history of A.I. art.


    UC San Diego Launches New Data Science Graduate Degree Programs

    University of California, San Diego; UC San Diego News Center


    from

    UC San Diego has launched new master’s and doctoral degree programs at the Halıcıoğlu Data Science Institute (HDSI), the university’s hub for all things data science. The new programs join UC San Diego’s popular undergraduate data science programs for an integrated slate of courses and degrees targeting individuals at all educational levels, including working professionals and industry practitioners.

    “The new degree programs are a reflection of the continued investment and academic leadership by UC San Diego in the growing area of data science,” said Chancellor Pradeep K. Khosla.


    The unvaccinated and their impact on the health care system

    The Hill, Sheldon H. Jacobson


    from

    Our health care system is under attack. Data shows that in some areas, the unvaccinated are overwhelming hospital resources. Nine hospital CEOs in Minnesota sounded the alarm in an ad imploring the unvaccinated to get vaccinated and help their systems remain viable. Health care workers are burnt out and demoralized.

    The net effect of overwhelming demand on health care services is that it is difficult, if not impossible, to seek care for health conditions that are important, but perhaps not urgent, which includes elective surgeries. By delaying such care, these important but not urgent issues later become urgent, effectively creating a future logjam of demand for health care services that are more expensive and more life threatening, and will take months to work through. The consequence of such a situation is that people will prematurely die from such delays, because others choose to not take advantage of the vaccines that are freely available and proven to reduce the burden on the healthcare system.


    Journals adopt AI to spot duplicated images in manuscripts

    Nature, Richard Van Noorden


    from

    Just before a study appears in any of ten journals published by the American Association for Cancer Research (AACR), it undergoes an unusual extra check. Since January 2021, the AACR has been using artificial intelligence (AI) software on all manuscripts it has provisionally accepted after peer review. The aim is to automatically alert editors to duplicated images, including those in which parts have been rotated, filtered, flipped or stretched.

    The AACR is an early adopter in what could become a trend. Hoping to avoid publishing papers with images that have been doctored — whether because of outright fraud or inappropriate attempts to beautify findings — many journals have hired people to manually scan submitted manuscripts for issues, often using software to help check what they find. But Nature has learnt that in the past year, at least four publishers have started automating the process by relying on AI software to spot duplications and partial duplications before manuscripts are published.
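    The hard part of this screening is that a duplicated image may have been rotated, flipped or stretched. A minimal sketch of orientation-invariant matching — illustrative only; production tools compare learned image features, not raw pixel grids — is to test a candidate against all eight rotations and reflections of a reference patch:

```python
# Flag two pixel grids as duplicates if one matches any of the other's
# eight dihedral transforms (4 rotations x optional mirror).

def rotate90(img):
    """Rotate a grid of pixel values 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def all_orientations(img):
    """Yield the 8 dihedral transforms of a grid."""
    for _ in range(4):
        img = rotate90(img)
        yield img
        yield [row[::-1] for row in img]  # horizontal mirror

def is_duplicate(img_a, img_b):
    return any(img_b == candidate for candidate in all_orientations(img_a))

original = [[1, 2],
            [3, 4]]
rotated = [[3, 1],
           [4, 2]]  # the same patch rotated 90 degrees
print(is_duplicate(original, rotated))  # → True
```

Handling stretching, cropping and filtering — as the software described above claims to — requires comparisons that are robust to those distortions, which is where the machine-learned features come in.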


    Build a registry of results that students can replicate

    Nature, World View, Yusuf A. Hannun


    from

    As the year comes to a close, it’s natural to consider how to improve in future. One step would be to explicitly restructure scientific publications to fulfil their functions as building blocks of knowledge. Past suggestions include requiring authors to include statements of generalizability or a numerical confidence level. Here I propose two new strategies.

    First, every published study should articulate specific testable conclusions.

    In my field — cancer biology — an overall conclusion might be that enzyme Y regulates cell migration in cancer. This could be built from a series of experimental results, each laid out in a quantitative way, with the relevant metrics.


    Tandon team wins $5 Million DARPA contract to develop AI-driven augmented reality assistant

    NYU, Tandon School of Engineering


    from

    The Visualization, Imaging and Data Analysis (VIDA) Center at the New York University Tandon School of Engineering is leading an NYU initiative to develop an artificial intelligence (AI) “virtual assistant” providing just-in-time visual and audio feedback to help with task execution.

    The project is part of a national effort involving eight other institutional teams, funded by the Defense Advanced Research Projects Agency (DARPA) Perceptually-enabled Task Guidance (PTG) program. With support of a $5 million DARPA contract, the NYU group, led by Claudio Silva, director of the VIDA Center and professor of computer science and engineering and data science at NYU Tandon and NYU CDS, aims to develop AI technologies to help people perform complex tasks while making these users more versatile by expanding their skillset — and more proficient by reducing their errors.


    Computer science bachelor programs get funding boost from Amazon

    The Seattle Times, Jenn Smith


    from

    If you earn a computer-science-related bachelor’s degree, odds are you won’t be wanting for a job in the coming years. Employment market estimates project a 2-3% annual growth rate of occupations in computer science and information technology in Washington state between now and 2028.

    Now, Seattle Colleges is working to expand the number of students who can get a degree in this hot field. North Seattle College President Chemene Crawford recently announced a new initiative “that will put a bachelor’s degree in computer science within reach for many more students both in Seattle and across the state.” The program, which still needs approval from the State Board for Community and Technical Colleges (SBCTC) and a regional accrediting agency for colleges, could begin as early as fall 2022.

    The initiative takes advantage of legislation passed in July that authorizes the state’s community and technical colleges to offer a four-year computer science degree. It also leverages public and private funding. Amazon will grant $1 million each to three higher education entities supporting this effort: the Seattle Colleges Foundation, SBCTC and Washington State Opportunity Scholarship program.


    AI powers autonomous materials discovery

    Cornell University, Cornell Chronicle


    from

    When a master chef develops a new cake recipe, she doesn’t try every conceivable combination of ingredients to see which one works best. The chef uses prior baking knowledge and basic principles to more efficiently search for that winning formula.

    Materials scientists use a similar method in searching for novel materials with unique properties in fields such as renewable energy and microelectronics. And a new artificial intelligence tool developed by Cornell researchers promises to rapidly explore and identify what it takes to “whip up” new materials.

    SARA (the Scientific Autonomous Reasoning Agent) integrates robotic materials synthesis and characterization, along with a hierarchy of artificial intelligence and active learning methods, to efficiently reveal the structure of complex processing phase diagrams, making materials discovery vastly quicker.


    $750K USDA grant establishes Tennessee Digital Agriculture Center at MTSU to enhance youth education

    Middle Tennessee State University, MTSU News


    from

    MTSU’s School of Agriculture seeks to develop the first Digital Agriculture Center in Tennessee after landing a three-year, nearly $750,000 grant from the U.S. Department of Agriculture and National Institute of Food and Agriculture.

    It will feature a series of linked student/non-formal educator-centered projects as well as community outreach events that will ultimately forge a strong Digital Agriculture/Data Science education program focused on youth across Tennessee.


    CWRU researchers developing technique that could identify fake artworks using artificial intelligence

    Cleveland.com, Steven Litt


    from

    Art forgers of the world, beware.

    A team of art historians and scientists at Case Western Reserve University has developed a computer technique that can identify with near certainty which artist made a particular painting based on tiny details of brush marks that can’t be controlled by the artist and aren’t visible to the naked eye.

    The method combines data from the precise, three-dimensional mapping of a painting’s surface with analysis through artificial intelligence — a computer system based on the human brain and nervous system that can learn to identify and compare patterns.


    Princeton Apologizes For Controversial ‘Secret Shopper’ Privacy Study

    MediaPost, DigitalNewsDaily, Wendy Davis


    from

    Princeton University has apologized for a “secret shopper” study in which researchers posed as consumers and demanded information about how website operators — including nonprofits and bloggers — complied with privacy laws.

    For the study, which was cut short last week, researchers emailed website operators questions about their compliance with either the California Consumer Privacy Act or Europe’s General Data Protection Regulation.

    The emails didn’t identify the sender as affiliated with Princeton, or specify that the information was being sought as part of a research project.


    Eek. This is really bad. First…equating slaves w language models in order to make a pt that a language model, like a slave, could go from being an object to a person

    Twitter, MMitchell


    from

    The article as a whole is written with similar uncomfortable leaps of logic that attribute to the reader a shared understanding w the author that isn’t there. Eg, a passage from Helen Keller “should give pause” because a language model could basically be like her.


    Oxford Invited an AI to Debate Its Own Ethics—What It Said Was Startling

    Singularity Hub, Alex Connock and Andrew Stephen


    from

    It’s natural, therefore, that we would include the ethics of AI in our postgraduate diploma in artificial intelligence for business at Oxford’s Saïd Business School. In its first year, we’ve done sessions on everything from the AI-driven automated stock trading systems in Singapore, to the limits of facial recognition in US policing.

    We recently finished the course with a debate at the celebrated Oxford Union, crucible of great debaters like William Gladstone, Robin Day, Benazir Bhutto, Denis Healey, and Tariq Ali. Along with the students, we allowed an actual AI to contribute.

    It was the Megatron Transformer, developed by the Applied Deep Research team at computer chip maker Nvidia, and based on earlier work by Google. Like many supervised learning tools, it is trained on real-world data—in this case, the whole of Wikipedia (in English), 63 million English news articles from 2016 to 2019, 38 gigabytes worth of Reddit discourse (which must be a pretty depressing read), and a huge number of creative commons sources.


    Just published: Do large language models understand us? https://link.medium.com/0S1dajuU2lb It’s sometimes claimed that ML is “just stats” and AI can’t “understand”.

    Twitter, Blaise Aguera


    from

    I’m arguing that LLMs have a great deal to teach us about language, understanding, intelligence, sociality, even personhood.


    Multi-million pound investment to keep Southampton at forefront of Computer Science and Artificial Intelligence

    University of Southampton, Electronics and Computer Science


    from

    The School’s plans include an additional £1.7m to increase staff by 24 academic posts over the next three years, £1.4m for a new research laboratory in robotics and AI, and a major capital investment in teaching facilities. The expansion will shape new research areas that change our world for the better and develop new teaching programmes that will be available by 2023/24.


    Science’s 2021 Breakthrough of the Year: AI brings protein structures to all

    YouTube, Science Magazine


    from

    Bounty of new structures will forever change biology and medicine [video, 5:48]


    White House AI Initiative Launches Public Research Support Tools

    Nextgov, Alexandra Kelley


    from

    A new section of resources intended for artificial intelligence researchers was launched last Friday by the National Artificial Intelligence Initiative Office, with a goal of providing easy access to data sets and testbed environments for AI application training.

    The AI Researchers Portal—a program within the White House Office of Science and Technology Policy—is the latest page added to the NAIIO website. Announcing its launch on Twitter, officials described it as “a central connection to many federally-supported resources for America’s AI research community.”

    Some of the available tools on the page include funding and grant information, datasets, computing resources, a research program directory and a testbed selection.


    Cal Poly Computer Science Department faces faculty shortage with growing student demand

    Cal Poly San Luis-Obispo, Mustang News, Madison Bellah


    from

    Enrolling 200 out of 6,000 applicants (roughly 3%) this past fall, the computer science major at Cal Poly is one of the most competitive programs to get into at the university. However, it’s also one of the most understaffed.

    The department is currently struggling to meet the demand of the increasing number of students in this department, according to department chair Chris Lupo.


    Everyone Is Talking About Data Science. Here’s How J.P. Morgan Is Putting It Into Practice.

    Institutional Investor, Julie Segal


    from

    Although quants like D.E. Shaw and Renaissance Technologies live and breathe data and advanced computing techniques, many traditional firms are still in the early stages of developing proprietary tools for their portfolio managers, who make decisions based on fundamental factors like the financial potential of a new product, the visionary qualities of a company’s management, or a company’s ability to survive a global pandemic. “You’ve either got quant or fundamental. What we’re trying to do is bring the two together,” West said. “We pride ourselves on our insights as fundamental analysts, but how do you actually observe that?”

    Managers are at different stages when it comes to working with and assessing the potential of artificial intelligence capabilities and data. As one example, Wellington Management’s 70-person Investment Science Group has been focused on applying data analytics to investment ideas and developing professional investors, which includes uncovering and mitigating the downside of their behavioral biases.


    The internet is tricking our brains

    NBC News, David Ingram


    from

    People who lean on a search engine such as Google may get the right answers but they can also end up with a wrong idea of how strong their own memory is, according to a study that Ward published in August. That’s because online search is so seamless and always available that people often don’t have the chance to experience their own failure to remember things, the study found.

    The findings are part of a wave of new research in recent years examining the intersection of the internet and human memory. The implications could be far-reaching, including for the spread of political misinformation, [Adrian] Ward said. He cited years of research into how people make decisions, showing that people who are overconfident in their knowledge become more entrenched in their views about politics and science and also can make questionable financial and medical decisions.

    “The larger effect is people thinking, ‘I am smart. I am responsible for this. I came up with this info,’” Ward said in an interview.

    SPONSORED CONTENT





    The eScience Institute’s Data Science for Social Good program is now accepting applications for student fellows and project leads for the 2021 summer session. Fellows will work with academic researchers, data scientists and public stakeholder groups on data-intensive research projects that will leverage data science approaches to address societal challenges in areas such as public policy, environmental impacts and more. Student applications are due 2/15 – learn more and apply here.

    DSSG is also soliciting project proposals from academic researchers, public agencies, nonprofit entities and industry who are looking for an opportunity to work closely with data science professionals and students on focused, collaborative projects to make better use of their data. Proposal submissions are due 2/22.

     


    Tools & Resources



    Part I: Simplifying Transformer Research with xFormers & Lightning | by Sean Narenthiran

    PyTorch Lightning Developer Blog, Sean Narenthiran


    from

    xFormers provides a simple composable solution to building Transformers, abstracting out optimized GPU kernels (that are enabled automatically for you), fused Transformer layers which go beyond what PyTorch can offer whilst remaining seamless for the user (just a torch.nn.Module). Paired with Lightning, this creates an extremely powerful tool to train models with high levels of flexibility.


    Announcing the Transactions on Machine Learning Research

    Medium, Hugo Larochelle


    from

    We’re happy to announce that we are founding a new journal, the Transactions on Machine Learning Research (TMLR). This journal is a sister journal of the existing, well-known Journal of Machine Learning Research (JMLR), along with the Proceedings of Machine Learning Research (PMLR) and JMLR Machine Learning Open Source Software (MLOSS). However it departs from JMLR in a few key ways, which we hope will complement our community’s publication needs. Notably, TMLR’s review process will be hosted by OpenReview, and therefore will be open and transparent to the community. Another differentiation from JMLR will be the use of double blind reviewing, the consequence being that the submission of previously published research, even with extension, will not be allowed. Finally, we intend to work hard on establishing a fast-turnaround review process, focusing in particular on shorter-form submissions that are common at machine learning conferences.


    New Datasets to Democratize Speech Recognition Technology

    Substack, The Gradient


    from

    Presenting The People’s Speech, a massive English-language dataset of audio transcriptions, and the Multilingual Spoken Words Corpus (MSWC), a 50-language, 6,000-hour dataset of individual words


    .@d3js_org’s @mbostock shares why his #DataVisualization tool won the 10-year Test-of-Time award from @IEEEorg.

    Twitter, Trifacta


    from

    He also goes deep on @observablehq and talks about all things visualization with @joe_hellerstein and @jeffrey_heer on #TheDataWranglers podcast


    Customizing GPT-3 for Your Application

    OpenAI, Blog


    from

    Developers can now fine-tune GPT-3 on their own data, creating a custom version tailored to their application. Customizing makes GPT-3 reliable for a wider variety of use cases and makes running the model cheaper and faster.

    You can use an existing dataset of virtually any shape and size, or incrementally add data based on user feedback. With fine-tuning, one API customer was able to increase correct outputs from 83% to 95%. By adding new data from their product each week, another reduced error rates by 50%.
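    At the time, fine-tuning data was supplied as a JSONL file of prompt/completion pairs, one JSON object per line. A sketch of preparing such a file (the example records and filename are invented for illustration):

```python
import json

# Hypothetical training examples: classify a support email's subject line.
# Each record is one JSON object with "prompt" and "completion" fields.
examples = [
    {"prompt": "Subject: Where is my order?\n\n###\n\n", "completion": " shipping"},
    {"prompt": "Subject: Card charged twice\n\n###\n\n", "completion": " billing"},
]

# Write one JSON object per line (the JSONL format the API expected).
with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# Each line is independently parseable JSON:
with open("train.jsonl") as f:
    print(sum(1 for line in f if json.loads(line)))  # → 2
```

The resulting file would then be uploaded through the fine-tunes endpoint (for example via the `openai` command-line tool) to start a training run.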


    [Feature Launch]: Save Notebooks to GitHub

    Kaggle, Product Features, Jim Plotts


    from

    Starting today, we’re rolling out a new feature that lets you save your notebooks to your GitHub repository (even private ones)!

    To enable this feature, go to File > Link to GitHub in the Notebook Editor. If you haven’t previously linked your Kaggle and GitHub accounts, you’ll be prompted to do that first.


    Careers


    Tenured and tenure-track faculty positions

    cluster recruitment of 2 named professors and 2 junior faculty in Artificial Intelligence and Society at @JohnsHopkins



    Johns Hopkins University; Baltimore, MD
