We're biased on this one, but Open Contracts offers a smooth, modern tool for collaboratively labeling contracts for machine learning applications. Every other free or open source tool we know of converts your contract to text first and then has you label plain, unformatted text. Open Contracts, by contrast, was built from the ground up to work with PDFs. Because of this decision, we hope it will be well received even by non-technical users, who can browse and search a contract PDF just as they normally would. The goal is for end users to need no technical knowledge.
Why use Open Contracts? It not only lets you label contracts and store your labelled data in an open source format, it also lets you share your label sets and create new data sets from existing ones. Why constantly reinvent the wheel? Create a base data set or use an open source one (like the Atticus Project's), and then instantly fork it to build a bespoke data set for a specific application. Open Contracts puts you in control of what's most valuable: your legal data.
You can easily export annotated or unannotated documents. You can also export entire data sets, which can then be shared and loaded into other instances of Open Contracts.
Unlike other text labeling tools (that we know of), Open Contracts is designed to let you view and label native PDFs in a high-quality PDF viewer (Mozilla's excellent PDF.js). Non-technical users will have no problem using this tool.
Open Contracts makes it easy to create multiple collections of labelled documents from the same source material. Simply "fork" an existing data set and create a customized version using your existing data as a base. This is a great way to leverage public data sets like the Atticus Project's to create your own bespoke training data.
Open Contracts is built on Node.js and React. Its MIT-licensed front-end is smooth and easy to navigate for non-technical users. You can run plain-text searches, or search and filter by document label type or labelled text (e.g., quickly find all text labelled as "Indemnification Clause" in your data sets).