

A BERT-based machine learning model trained on hundreds of thousands of legal documents.

Tool Category:

NLP, Deep Learning


Legal-BERT was pretrained on a large corpus of legal documents using Google's original BERT code:

  1. 116,062 documents of EU legislation, publicly available from EURLEX, the repository of EU law run by the EU Publications Office.
  2. 61,826 documents of UK legislation, publicly available from the UK legislation portal.
  3. 19,867 cases from the European Court of Justice (ECJ), also available from EURLEX.
  4. 12,554 cases from HUDOC, the repository of the European Court of Human Rights (ECHR).
  5. 164,141 cases from various courts across the USA, hosted on the Case Law Access Project portal.
  6. 76,366 US contracts from EDGAR, the database of the US Securities and Exchange Commission (SEC).

The Hugging Face implementation of this model can be easily set up to predict missing words in a sequence of legal text. The model also shows meaningful performance improvements on tasks such as distinguishing contracts from non-contracts (binary classification) and multi-label legal text classification (e.g. classifying legal clauses by type).
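The masked-word prediction described above can be sketched with the Hugging Face `pipeline` API; this assumes the public `nlpaueb/legal-bert-base-uncased` checkpoint on the Hugging Face Hub, and the example sentence is an illustrative placeholder.

```python
# Minimal fill-mask sketch with Legal-BERT via the Hugging Face pipeline API.
from transformers import pipeline

# Load the publicly hosted Legal-BERT checkpoint (downloads on first run).
fill_mask = pipeline("fill-mask", model="nlpaueb/legal-bert-base-uncased")

# Ask the model to fill in the masked token in a legal sentence.
results = fill_mask("The parties agree to resolve any dispute by [MASK].")

# Each result carries the predicted token and its probability score.
for r in results:
    print(f"{r['token_str']}: {r['score']:.3f}")
```

The pipeline returns the top candidate tokens ranked by probability, so no further post-processing is needed for simple cloze-style probing.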

Thanks to the magic of Hugging Face, this model should be accessible even to novice coders. Further training and fine-tuning will, however, require training data and a basic understanding of the Hugging Face training workflow.


Open Source:Yes
Paid Support:No
API:via the Hugging Face Inference API



Tech Stack