You are involved in litigation and faced with a document review need, what now? Naturally you need to find attorneys to review these documents. To this end, depending on the volume of data at issue, many firms will either: (1) staff the document review with firm attorneys, or (2) work with a vendor to retain a review team comprised of contract attorneys. Irrespective of who conducts the needed review, the cost attendant to that review and the time to complete the review is often a concern.  Because a party to a litigation should not produce documents without reviewing them, predictive coding may be a particularly helpful option.

Simply put, predictive coding is the use of a computer system to help determine which documents are relevant to a particular legal proceeding.  The system makes this determination based upon “training” it receives from human input.  In fact, for a predictive coding system to make accurate decisions, the system needs direction from humans fluent in the intricacies of the lawsuit.  During this training phase, attorneys will review a seed set of documents and code those documents accordingly (i.e., responsive, privilege, tagging issues applicable). (FN*) At each step of this process, the computer system is being trained and educated. Refinements are made along the way and internalized by the system. Once trained, the computer will find and code (based on its training) the responsive documents far quicker (and often with far greater accuracy) than human reviewers. Specifically, the computer will build a model to identify documents that have a high probability of correct classification into categories pre-defined through the training /seed coding.

As with any review (entirely human or a combination of human and machine review), a validation process should be implemented.  Specifically, there should be a work flow created that provides for attorney reviewers to check the efficacy and accuracy of the model.   It is important here to determine what validation/QC process is best implemented.  For example, one can implement a statistical sampling of data where documents are selected at random and reviewed for accuracy.  This sort of validation would be reflective of the machine’s overall accuracy and reflective of the overall document population.  There is also, however, a more particularized sampling where a group of relevant documents are selected from the population and reviewed for accuracy.  This sort of validation would be more limited in that it would not allow the attorney running the review to form any conclusions about the entire document population.  (FN**)

Because of the ever-increasing volume of data and information, predictive coding is becoming a more attractive tool to incorporate into every document review to some degree, especially because no minimum data size is required to use predictive coding.  A document review that uses predictive coding coupled with a well-devised work flow will inevitably minimize review costs while maximizing efficiency during the review.


FN* Because the coding on these seed documents will impact the quality of the computer’s determinations, it is important the individuals coding the seed documents understand well the lawsuit and how the predicting coding system is to work.

FN**  And, if you are not comfortable allowing a computer to do that much work, other predictive coding options (e.g., other than allowing the system to extrapolate based upon seed sets) are available.  For example, prioritized review can be used whereby the system identifies and escalates important documents for review but keeps likely irrelevant documents in the queue.  Incorporating this option into your work flow allows attorneys to still lay eyes on all documents but provides for an efficient prioritization of documents that must be reviewed.