Facebook AI Research, together with Google’s DeepMind, University of Washington, and New York University, today introduced SuperGLUE, a series of benchmark tasks to measure the performance of modern, high performance language-understanding AI.
SuperGLUE was made on the premise that deep learning models for conversational AI have “hit a ceiling” and need greater challenges.
Considered state of the art in many regards in 2018, BERT’s performance has been surpassed by a number of models this year such as Microsoft’s MT-DNN, Google’s XLNet, and Facebook’s RoBERTa, all of which were are based in part on BERT and achieve performance above a human baseline average.
SuperGLUE is preceded by the General Language Understanding Evaluation (GLUE) benchmark for language understanding in April 2018 by researchers from NYU, University of Washington, and DeepMind.
GLUE assigns a model a numerical score based on performance on nine English sentence understanding tasks for NLU systems, such as the Stanford Sentiment Treebank (SST-2) for deriving sentiment from a data set of online movie reviews.
RoBERTa currently ranks first on GLUE’s numerical score leaderboard with state-of-the-art performance on 4 of 9 GLUE tasks.