How Lexion reached high model quality on a brand new set of documents with its no-code platform
Lexion’s award-winning AI helps customers extract more than 60 smart fields from contracts, allowing legal, finance, procurement, sales, and other teams to understand and report on the language across entire contract portfolios in seconds.
But that’s not all. Our homegrown, state-of-the-art document understanding system allows us to quickly train new models for new documents. When UiPath approached us in the spring of 2022, we were well positioned to help them.
New documents? No problem
UiPath presented us with a scenario to put our system to the test. In a single week, Lexion would need to pull out key information from a small set of Collective Bargaining Agreements (~300 documents), including values like Employer Name, Agreement Start Date, and even complex related data points, like the number of hours of work required to earn 1 vacation day for part-time vs. full-time employees. The main challenge: Lexion had never seen these types of contracts before. What’s more, like many contracts Lexion handles today, the quality and format of the documents varied widely; many included handwriting, non-standard contract language, and more.
State-of-the-art, no-code stack for document understanding
No matter. The team devised a plan to develop 8 new models for our partners at UiPath within that single week. These models, or smart fields, use AI to identify and extract critical information from documents. Three parts of Lexion’s technical stack made this possible: advanced tools for scaled labeling, for taking advantage of visual cues, and more; a sophisticated model training platform that leverages large language models; and the Lexion interface. Together, they provided all of the tools we needed to get started.
Applying the power of scaled weak supervision
The first step was labeling the data. Adding new AI fields typically requires a huge number of labeled documents, a large team of annotators, and many days or weeks spent labeling document by document. However, by relying on scaled weak supervision (think of this as using rules to apply labels to many documents at once), we were able to label long-form documents much more efficiently, labeling essentially all the data with just a handful of rules per model. This process took just a few days.
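To make the idea of "rules that label many documents at once" concrete, here is a minimal sketch in Python. It is not Lexion's implementation: the field (an agreement start date), the three regex rules, the sample snippets, and the majority-vote combiner are all hypothetical choices made for illustration. Production weak supervision systems typically combine rule outputs with a learned model rather than a simple vote.

```python
import re
from collections import Counter
from typing import Callable, Optional

# A labeling function (rule) maps document text to a candidate value,
# or returns None to abstain. All rules below are hypothetical.
LabelingFunction = Callable[[str], Optional[str]]

DATE = r"([A-Z][a-z]+ \d{1,2}, \d{4})"  # e.g. "July 1, 2019"


def lf_effective_as_of(text: str) -> Optional[str]:
    m = re.search(r"effective as of " + DATE, text)
    return m.group(1) if m else None


def lf_commencing_on(text: str) -> Optional[str]:
    m = re.search(r"commencing on " + DATE, text)
    return m.group(1) if m else None


def lf_from_through(text: str) -> Optional[str]:
    m = re.search(r"from " + DATE + r" (?:to|through)", text)
    return m.group(1) if m else None


LFS = [lf_effective_as_of, lf_commencing_on, lf_from_through]


def weak_label(text: str, lfs: list[LabelingFunction]) -> Optional[str]:
    """Combine rule outputs by majority vote; None if every rule abstains."""
    votes = [v for lf in lfs if (v := lf(text)) is not None]
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]


# Toy snippets standing in for full agreements (invented for this example).
docs = {
    "cba_001": "This Agreement is effective as of July 1, 2019 and remains "
               "in force from July 1, 2019 through June 30, 2022.",
    "cba_002": "The term shall run, commencing on January 15, 2020, "
               "for three years.",
    "cba_003": "Handwritten amendment; no term stated.",
}

# One pass applies every rule to every document.
labels = {doc_id: weak_label(text, LFS) for doc_id, text in docs.items()}
print(labels)
```

Documents where every rule abstains (like `cba_003` here) stay unlabeled; in practice those would be candidates for another rule or for manual review.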
Using that labeled data within our model training platform, we were able to train multiple models rapidly, specifying model architectures and tuning hyperparameters. And finally, the Lexion interface let us customize the logic and view the source of the results to see exactly why the models made certain predictions. Here’s a look at how it all comes together:
Faster, more accurate, and less manual
When testing our models on our blind evaluation set, we saw strong results, especially given how quickly the work was done. Normally, it would take significantly more data, time, and human effort to achieve an accuracy above 80%. We saw an average accuracy of 85% after just a single week!
It’s not just about accuracy, but also ROI: shaving 85% off manual human review when dealing with a high volume of documents can easily translate into tens or hundreds of thousands of dollars saved each year, on top of the gains in speed.
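For readers who want to see how an "average accuracy" across several fields can be computed, the sketch below scores each field on a held-out set and then averages the per-field scores. The exact-match scoring rule, the field names used as keys, and the toy data are assumptions for illustration only; they are not Lexion's evaluation protocol or results.

```python
from statistics import mean


def field_accuracy(predictions: dict[str, str], gold: dict[str, str]) -> float:
    """Fraction of held-out documents where the prediction equals the gold value."""
    correct = sum(predictions.get(doc_id) == value for doc_id, value in gold.items())
    return correct / len(gold)


# Toy gold labels and predictions for two fields (invented values).
gold = {
    "employer_name": {"d1": "Acme", "d2": "Beta", "d3": "Gamma", "d4": "Delta"},
    "agreement_start_date": {"d1": "2019-07-01", "d2": "2020-01-15",
                             "d3": "2018-03-01", "d4": "2021-09-30"},
}
predicted = {
    "employer_name": {"d1": "Acme", "d2": "Beta", "d3": "Gamma Inc", "d4": "Delta"},
    "agreement_start_date": {"d1": "2019-07-01", "d2": "2020-01-15",
                             "d3": "2018-03-01", "d4": "2021-09-30"},
}

# Score each field separately, then take the unweighted mean across fields.
per_field = {name: field_accuracy(predicted[name], gold[name]) for name in gold}
average_accuracy = mean(per_field.values())
print(per_field, average_accuracy)
```

Averaging per field, rather than pooling all predictions, keeps one easy field from hiding a weak one.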
Want to better understand your documents?
Lexion’s state-of-the-art, homegrown AI and robust technical stack can adapt rapidly to new sets of documents and use cases with very little human input, helping companies like yours to gain insight and understanding across enormous document sets. For more details on our document intelligence work with UiPath, read our research paper in ACM SIGKDD 2022.
You can also see more on what makes Lexion’s AI unique here.
Have a unique set of documents you want help analyzing and understanding? Let us know.
Want to see Lexion in action? Schedule a demo today.