Recently, Judge Miller in the Northern District of Indiana approved the use of predictive coding in a large multidistrict litigation concerning certain hip implants manufactured by Biomet.
As the order notes, Biomet started with a universe of about 19.5 million documents and used keyword culling to trim that down to 3.9 million documents (about 1.5 terabytes of data). Removing duplicates brought that down to 2.5 million. Using statistical sampling, Biomet estimated, at a 99% confidence level, that 0.55% to 1.33% of the unselected documents (the ones the keyword culling screened out) were responsive, and that 1.37% to 2.47% of the original 19.5 million were responsive.
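To put those percentages in perspective, here is a rough back-of-the-envelope conversion into document counts. It is only a sketch under two assumptions: that "unselected" means the roughly 15.6 million documents the keyword culling screened out (19.5 million minus 3.9 million), and that the order's rounded totals are close enough for ballpark figures. The variable and function names are illustrative.

```python
# Back-of-the-envelope sketch: convert the sampling percentages reported in
# the Biomet order into approximate document counts.
# ASSUMPTION: "unselected" = documents screened out by keyword culling.

TOTAL_DOCS = 19_500_000     # original universe
KEYWORD_HITS = 3_900_000    # kept after keyword culling
DEDUPED = 2_500_000         # left after removing duplicates

UNSELECTED = TOTAL_DOCS - KEYWORD_HITS  # 15,600,000 under the assumption above

def docs_at_rate(population: int, basis_points: int) -> int:
    """Return population * (basis_points / 10,000), truncated to whole documents.

    Rates are given in basis points (hundredths of a percent, so 55 = 0.55%)
    to keep the arithmetic exact in integers.
    """
    return population * basis_points // 10_000

unselected_range = (docs_at_rate(UNSELECTED, 55), docs_at_rate(UNSELECTED, 133))
overall_range = (docs_at_rate(TOTAL_DOCS, 137), docs_at_rate(TOTAL_DOCS, 247))

print(f"Responsive docs outside the keyword set: {unselected_range[0]:,} to {unselected_range[1]:,}")
print(f"Responsive docs in the full collection:  {overall_range[0]:,} to {overall_range[1]:,}")
print(f"Share of keyword hits that were duplicates: {(KEYWORD_HITS - DEDUPED) / KEYWORD_HITS:.0%}")
```

On those assumptions, the intervals work out to roughly 85,800 to 207,480 responsive documents left outside the keyword set, and roughly 267,150 to 481,650 responsive documents in the full collection.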
Biomet then applied predictive coding to the remaining 2.5 million documents. The court described predictive coding this way:
Predictive coding has found many uses on the Internet. Under predictive coding, the software “learns” a user’s preferences or goals; as it learns, the software identifies with greater accuracy just which items the user wants, whether it be a song, a product, or a search topic. Biomet used a predictive coding service called Axelerate and eight contract attorneys to review a sampling of the 2.5 million documents. After one round of “find more like this” interaction between the attorneys and the software, the contract attorneys (together with other software recommended by Biomet’s e-discovery vendor) reviewed documents for relevancy, confidentiality, and privilege.
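The court's description is high level. As a toy illustration of the "find more like this" idea only, here is a minimal Python sketch that ranks unreviewed documents by bag-of-words cosine similarity to documents an attorney has tagged as relevant. This is not how Axelerate or any particular vendor works; commercial tools generally train statistical classifiers on attorneys' coding decisions. The function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    """Turn text into bag-of-words counts of lowercased ASCII-letter tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def find_more_like_this(seeds: list[str], candidates: dict[str, str]) -> list[tuple[str, float]]:
    """Rank candidates by their best similarity to any attorney-tagged seed, highest first."""
    seed_vecs = [vectorize(s) for s in seeds]
    scored = []
    for doc_id, text in candidates.items():
        vec = vectorize(text)
        best = max((cosine(vec, s) for s in seed_vecs), default=0.0)
        scored.append((doc_id, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical sample data: one document an attorney marked relevant,
# and three unreviewed documents to rank.
seeds = ["metal ion levels in hip implant revision"]
candidates = {
    "A": "revision surgery after hip implant metal wear",
    "B": "quarterly cafeteria menu",
    "C": "implant metal ion testing results",
}

ranking = find_more_like_this(seeds, candidates)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))  # A ranks first; B shares no terms and ranks last
```

In a review workflow, documents the attorneys confirm as relevant would be appended to `seeds` and the ranking rerun, a crude analogue of one round of the interaction the court describes.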
As of the date of the order, Biomet had incurred $1.07 million in e-discovery costs and was expected to incur up to $3.25 million. (Vendors: “Cha-ching!”)
Biomet asked the Plaintiffs’ Steering Committee to suggest additional search terms, but the Committee declined because it felt that the initial keyword culling had tainted the entire process. Instead, the Committee wanted Biomet to start over and apply predictive coding to the original universe of 19.5 million documents. Biomet objected, arguing that starting over would cost millions more in e-discovery expenses.
Judge Miller approved Biomet’s e-discovery procedures, stating:
The issue before me today isn’t whether predictive coding is a better way of doing things than keyword searching prior to predictive coding. I must decide whether Biomet’s procedure satisfies its discovery obligations and, if so, whether it must also do what the Steering Committee seeks. What Biomet has done complies fully with the requirements of Federal Rules of Civil Procedure 26(b) and 34(b)(2). I don’t see anything inconsistent with the Seventh Circuit Principles Relating to the Discovery of Electronically Stored Information. Principle 1.02 requires cooperation, but I don’t read it as requiring counsel from both sides to sit in adjoining seats while rummaging through millions of files that haven’t been reviewed for confidentiality or privilege.
It might well be that predictive coding, instead of a keyword search, at Stage Two of the process would unearth additional relevant documents. But it would cost Biomet a million, or millions, of dollars to test the Steering Committee’s theory that predictive coding would produce a significantly greater number of relevant documents. Even in light of the needs of the hundreds of plaintiffs in this case, the very large amount in controversy, the parties’ resources, the importance of the issues at stake, and the importance of this discovery in resolving the issues, I can’t find that the likely benefits of the discovery proposed by the Steering Committee equals or outweighs its additional burden on, and additional expense to, Biomet.
A couple of takeaways: First, keyword culling before applying predictive coding is an acceptable e-discovery practice, at least in the Northern District of Indiana.
Second, and more ominous, lawyers are actively and willingly helping machines learn how to replace them. I watched a lot of sci-fi movies as a younger man, and there’s only one way for this movie to end, and it’s not good. One day, you’re at your desk reviewing documents, and the next thing you know, there’s a flash of brilliant white li…….