LexGLUE's contribution to the state of the art is twofold. First, it combines and refines seven large document datasets into one easy-to-access corpus. Second, thanks to the authors' thoughtful packaging, the datasets can be integrated into Hugging Face's transformer library for training and evaluation with just a couple of lines of code.
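To give a sense of what "a couple of lines of code" might look like, here is a minimal sketch using Hugging Face's `datasets` library. The repository identifier `coastalcph/lex_glue` and the seven configuration names are assumptions based on the public dataset card and may differ between library versions; `load_lexglue_task` is a hypothetical helper written for this illustration, not part of LexGLUE or Hugging Face.

```python
from typing import Any

# Configuration names assumed from the LexGLUE dataset card on the
# Hugging Face Hub; check the card if a name is rejected.
LEXGLUE_TASKS = (
    "ecthr_a",
    "ecthr_b",
    "scotus",
    "eurlex",
    "ledgar",
    "unfair_tos",
    "case_hold",
)


def load_lexglue_task(task: str, repo_id: str = "coastalcph/lex_glue") -> Any:
    """Hypothetical helper: download one LexGLUE task as a DatasetDict.

    `repo_id` is an assumption about where the benchmark is hosted.
    """
    if task not in LEXGLUE_TASKS:
        raise ValueError(
            f"Unknown LexGLUE task {task!r}; expected one of {LEXGLUE_TASKS}"
        )
    # Imported lazily so the task list can be inspected without the
    # `datasets` package installed (requires `pip install datasets`).
    from datasets import load_dataset

    # The actual "couple of lines": one call returns the available splits.
    return load_dataset(repo_id, task)
```

Calling `load_lexglue_task("ledgar")` would then download that task's splits (network access required), after which they can be tokenized and passed to a trainer in the usual way.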
According to the authors:
By unifying and facilitating the access to a set of law-related datasets and tasks, we hope to attract not only more NLP experts, but also more interdisciplinary researchers (e.g., law doctoral students willing to take NLP courses). More broadly, we hope LexGLUE will speed up the adoption and transparent evaluation of new legal NLP methods and approaches in the commercial sector too. Indeed, there have been many commercial press releases in the legal-tech industry on high-performing systems, but almost no independent evaluation of the performance of machine learning and NLP-based tools. A standard publicly available benchmark would also allay concerns of undue influence in predictive models, including the use of metadata which the relevant law expressly disregards.
LexGLUE's seven constituent datasets contain over 100,000 training instances in total, primarily for multi-label or multi-class text classification tasks.
The seven included datasets and tasks are:
From the Stanford CodeX Website: Michael Bommarito is a former CodeX fellow. He is an Adjunct Professor of Law at Michigan State University and Head of Research at the ReInventLaw Laboratory. His research interests include natural language processing, machine learning, decision science, optimization, visualization, modeling, and policy, especially as applied to law and finance.
From Stanford's directory: Dirk Hartung is the founder and Executive Director of the Center for Legal Technology and Data Science at Bucerius Law School in Hamburg, Germany. He is the Co-Academic Director for the Bucerius Summer Program in Legal Technology and Operations and Bucerius Legal Technology Essentials. He develops the technology curriculum for this leading German law school. He is writing a PhD on digital lawyering under unauthorized practice of law regimes.
From Ion's personal website: I am Professor of Artificial Intelligence (AI) in the Department of Informatics of the Athens University of Economics and Business (AUEB), and head of AUEB's Natural Language Processing Group. I am also Scientific Advisor of the AI Centre of Excellence in Document Intelligence at NCSR "Demokritos", and Adjunct Researcher of the Institute for the Management of Information Systems (Digital Curation Unit) at the Research Centre "Athena".
From Abhik's personal website: I am a postdoctoral research associate at Universität Hamburg working under the supervision of Professor Chris Biemann. I am currently working on HILANO project which deals with the anonymization of sensitive data.
From his personal website: Research Interests include legal informatics, applied legal technology, law & economics, legal & regulatory complexity, artificial intelligence, artificial intelligence & law, machine learning & natural language processing, complex systems, network science, governance, financial regulation, financial technology, quantitative finance, quantitative modeling of litigation and jurisprudence, economics of the professions, blockchain & crypto infrastructure and the overall impact of information technology, analytics and automation on the future of society.
From his Sheffield profile page: Nikos Aletras is a Lecturer in Natural Language Processing (NLP) in the Computer Science Department at the University of Sheffield, co-affiliated with the Machine Learning (ML) group. Previously, he was a research scientist at Amazon (Core ML and Alexa) and a research associate at UCL, Department of Computer Science, Media Futures Group. He completed a PhD in NLP at the University of Sheffield. His research interests are in NLP, Machine Learning and Data Science. He develops text analysis methods to solve problems in other scientific areas such as (computational) social and legal science.
From Ilias' personal website: I am a post-doctoral researcher at the Department of Computer Science at University of Copenhagen (CoAStaL NLP Group). I recently received my Ph.D. from the Department of Informatics at Athens University of Economics and Business. My expertise is in Legal Natural Language Processing (LegalNLP), also known as Legal Intelligence. I have been a reviewer for ACL venues (ACL/EMNLP/NAACL 2020-2021) and reputable journals, such as AI & Law, PeerJ, ACM Computing Surveys, and Computer Speech & Language. I have also served and currently serve in the program committees of AI and NLP workshops targeting legal applications (AI4LEGAL, NLLP).