Predictive Coding: When Is It Ideal To Use It?
What It Is
Blair and Lawler explain that predictive coding automates the document review process. That is, it uses technology to identify relevant documents in order to reduce the volume of data that needs to go through costly and time-consuming manual review. Currently, this automation is only partial. Some manual human review is still needed, and the mathematical algorithms powering predictive coding must be tested and verified through data sampling. Additionally, predictive coding is not, and cannot be, a substitute for judgment. For now, knowledgeable people control the technology and are the key to a successful outcome.
When To Use It
The decision to deploy predictive coding for a particular case should weigh several factors, such as data volume, collection method and the type of data being collected. Data volume is possibly the most important: because predictive coding algorithms operate better with more input, a case involving a large volume of data is essential. Blair and Lawler explain that if a case has a small number of documents (e.g., fewer than 10,000), the number of documents that must be reviewed manually in order to train and validate the algorithm may make up the majority of those collected, thereby negating any potential cost savings. By contrast, a matter involving a large volume of documents will maximize the benefits of predictive coding technology and will, in turn, make the review more efficient and less costly.
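The volume argument can be illustrated with a little arithmetic. The sketch below is only an illustration: the function name and the training and validation set sizes (3,000 and 1,500 documents) are hypothetical assumptions, not figures from Blair and Lawler. It shows how a fixed amount of manual review takes up most of a small collection but only a sliver of a large one.

```python
def manual_review_share(total_docs, training_docs, validation_docs):
    """Fraction of a collection that must be reviewed by hand
    to train and validate the algorithm (hypothetical helper)."""
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    # Manual review can never exceed the size of the collection itself.
    manual = min(total_docs, training_docs + validation_docs)
    return manual / total_docs


# Hypothetical set sizes, chosen only for illustration.
TRAINING_DOCS = 3_000
VALIDATION_DOCS = 1_500

for total in (8_000, 1_000_000):
    share = manual_review_share(total, TRAINING_DOCS, VALIDATION_DOCS)
    print(f"{total:>9,} documents: {share:.2%} reviewed manually")
```

Under these assumed figures, more than half of the 8,000-document collection is reviewed by hand, while the same effort covers well under 1 percent of the million-document collection.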
Furthermore, they note that whether a case suits predictive coding also depends on the document collection method used. Some organizations have sophisticated collection tools in-house that allow for targeted collection of potentially relevant information, while other organizations have limited collection capabilities that may result in the collection of larger wholesale data sets. According to Blair and Lawler, the former (targeted, rich collections) is less likely to yield substantial cost savings using predictive coding, while the latter is often ideal for it. Likewise, custodial self-collections, whether attorney-guided or self-directed, may not be good candidates for predictive coding. Many custodians organize their mailboxes and other documents in a manner that allows them to identify the specific information that may be responsive. These types of targeted collections normally result in smaller volumes of rich data being collected for review, which, in their opinion, is not best suited to a predictive coding workflow.
As for the type of data being collected for review, they point out that because predictive coding depends on machine learning, electronic text is essential. Such text is best found in documents that originated in electronic form rather than on paper. They explain that even if scanned hardcopy documents are put through optical character recognition (OCR), the OCR error rate can be high and the results may not be ideal for predictive coding. For electronic data, they say predictive coding works best across a universe of similar data types, such as emails and attachments, whereas image files and other non-text-based file types, such as CAD drawings and audio files, are not suitable for predictive coding.
Making It Make Sense
To make sense of the factors outlined above and put them to use, they advise seeking help from an experienced e-discovery lawyer or technologist who will assess these factors and advise on whether predictive coding is appropriate for a particular matter. They also recommend using a defensible predictive coding workflow, one that sets target levels of recall and precision. Precision is the fraction of the documents identified by the computer as responsive that are actually responsive. Recall is the fraction of all responsive documents in the collection that the computer identifies. Both metrics need to be tracked and reported regularly to all stakeholders (client, co-counsel and colleagues).
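A minimal sketch of how the two metrics might be computed for a validation sample follows. It treats precision as the share of computer-flagged documents that are truly responsive, and recall as the share of truly responsive documents that were flagged. The function name and the document IDs are hypothetical, and the sketch assumes human reviewers have already coded the sample.

```python
def precision_recall(predicted_responsive, actually_responsive):
    """Compute precision and recall from two sets of document IDs.

    predicted_responsive: documents the computer tagged as responsive.
    actually_responsive:  documents human reviewers found responsive.
    """
    true_positives = len(predicted_responsive & actually_responsive)
    # Guard against empty sets to avoid dividing by zero.
    precision = (true_positives / len(predicted_responsive)
                 if predicted_responsive else 0.0)
    recall = (true_positives / len(actually_responsive)
              if actually_responsive else 0.0)
    return precision, recall


# Hypothetical validation sample.
predicted = {"doc1", "doc2", "doc3", "doc4"}
actual = {"doc2", "doc3", "doc4", "doc5", "doc6", "doc7"}

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.75 recall=0.50"
```

In this made-up sample, three of the four flagged documents are responsive (precision 0.75), but only three of the six responsive documents were found (recall 0.50). A workflow can score well on one metric and poorly on the other, which is one reason to track and report both.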
To conclude, Blair and Lawler see predictive coding, when used correctly, as a valuable tool for sifting through large volumes of data. They also see it as having applications well beyond traditional document production if deployed properly. They correctly state that it can be used effectively to sift through a pile of documents received from an opposing party, to find possible issues as part of an internal audit, to prepare for trial or to dispose of expired records as part of an information governance program. Finally, it should be noted that predictive coding is the use of computing power to improve the knowledge and proficiency of those who know how to use it.
*The full article on predictive coding can be found in The Legal Intelligencer, Special Section, February 2014