LEGAL-BERT

BERT-based machine learning model trained on hundreds of thousands of legal documents.


Tool Category:

NLP, Deep Learning

Summary

LEGAL-BERT was pretrained on a large corpus of legal documents using Google's original BERT code:

  1. 116,062 documents of EU legislation, publicly available from EURLEX (http://eur-lex.europa.eu), the repository of EU law run by the EU Publications Office.
  2. 61,826 documents of UK legislation, publicly available from the UK legislation portal (http://www.legislation.gov.uk).
  3. 19,867 cases from the European Court of Justice (ECJ), also available from EURLEX.
  4. 12,554 cases from HUDOC, the repository of the European Court of Human Rights (ECHR) (http://hudoc.echr.coe.int/eng).
  5. 164,141 cases from various courts across the USA, hosted in the Case Law Access Project portal (https://case.law).
  6. 76,366 US contracts from EDGAR, the database of the US Securities and Exchange Commission (SEC) (https://www.sec.gov/edgar.shtml).

The Hugging Face implementation of this model can be easily set up to predict missing words in a sequence of legal text. It also shows meaningful performance improvements on downstream tasks such as distinguishing contracts from non-contracts (binary classification) and multi-label legal text classification (e.g., classifying legal clauses by type).
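For a sense of how little setup is involved, here is a minimal fill-mask sketch using the transformers library; the model ID nlpaueb/legal-bert-base-uncased is the base LEGAL-BERT checkpoint on the Hugging Face Hub, and the example sentence is invented:

    from transformers import pipeline

    # Load a fill-mask pipeline with the base LEGAL-BERT checkpoint
    # (other LEGAL-BERT variants are also published on the Hub).
    fill_mask = pipeline("fill-mask", model="nlpaueb/legal-bert-base-uncased")

    # Predict the masked token in a snippet of legal text.
    for pred in fill_mask("This agreement shall be governed by the [MASK] of the State of Delaware."):
        print(f"{pred['token_str']:>12}  score={pred['score']:.3f}")

Each prediction carries the candidate token and its probability, so high-scoring legal completions such as "laws" should appear at the top of the list.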

Thanks to the magic of Hugging Face, this model should be accessible even to novice coders. Further training and fine-tuning, however, will require labeled training data and a basic understanding of the Hugging Face training workflow.
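As a rough sketch of what that fine-tuning might look like, the snippet below trains a binary contract/non-contract classifier with the transformers Trainer API. The CSV file names, column names, and hyperparameters are hypothetical placeholders, not part of the LEGAL-BERT release:

    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    MODEL_ID = "nlpaueb/legal-bert-base-uncased"  # assumed Hub checkpoint

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # num_labels=2 adds a fresh binary classification head on top of LEGAL-BERT.
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

    # Hypothetical dataset: CSV files with "text" and "label" columns.
    dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    dataset = dataset.map(tokenize, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="legal-bert-contracts", num_train_epochs=3),
        train_dataset=dataset["train"],
        eval_dataset=dataset["validation"],
        tokenizer=tokenizer,  # enables dynamic padding via the default collator
    )
    trainer.train()

The same pattern extends to multi-label clause classification by raising num_labels and setting the appropriate problem type.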

Stats

Open Source: Yes
Paid Support: No
API: via Hugging Face Inference API

License(s)

MIT

Tech Stack

Python, C++, CUDA