Traditional document review can be one of the most variable and expensive aspects of the discovery process. The good news is that there are many analytic tools available to help attorneys work smarter, thereby reducing discovery costs and allowing attorneys to focus sooner on the data most relevant to the litigation. And, while various vendors have “proprietary” tools with catchy names, these tools all seek to achieve the same result: smarter, more cost-effective review that is defensible and strategic.

Today’s blog post discusses one of those tools: predictive coding. The next few blog posts will focus on other tools, such as email threading, clustering, conceptual analytics, and keyword expansion.

Predictive Coding

Predictive coding is a machine learning process in which software takes the coding decisions that people make on a sample of documents (responsive or non-responsive) and applies what it learns to a much larger dataset, reducing the number of irrelevant and non-responsive documents that need to be reviewed manually. While each predictive coding algorithm varies in its actual methodology, at a very simplistic level the process involves the following steps:

  1.  Data most likely relevant to the litigation is collected, and traditional filtering and de-duplication are applied. Human reviewers then identify a representative cross-section of documents, known as a “seed set,” from the remaining (de-duplicated) population of documents that need to be reviewed. The number of documents in the seed set will vary, but the set should be sufficiently representative of the overall document population.
  2.  Attorneys most familiar with the substantive aspects of the litigation code each document in the seed set as responsive or non-responsive. Note that many predictive coding tools allow users to classify documents for multiple issues simultaneously (e.g., responsiveness and confidentiality). These coding decisions are then entered into the predictive coding software.
  3.  The predictive coding software analyzes the seed set and builds an internal model to predict the responsiveness of other documents in the broader population. After this step, it is critically important that the review team who coded the seed set spend time sampling the model’s results on additional documents, refining it by continually coding and inputting sample documents until the desired results are achieved. This “active learning” is important to achieve optimal results. Simply stated, active learning is an iterative process in which the seed set is repeatedly augmented with additional documents that the algorithm selects and a human reviewer manually codes. (This differs from “passive learning,” an iterative process that trains the machine on purely random document samples until optimal results are achieved.)
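To make the difference between active and passive learning concrete, here is a minimal Python sketch of how the next batch of documents for human coding might be chosen. It assumes the software exposes a probability of responsiveness for each document; the function names, document IDs, and scores are hypothetical, and “uncertainty sampling” (picking the documents the model is least sure about) is only one common active learning strategy, not necessarily what any particular vendor uses.

```python
import random

def select_active(scores, already_coded, batch_size):
    """Active learning via uncertainty sampling (one strategy among several):
    choose the uncoded documents whose probability is closest to 0.5."""
    candidates = [d for d in scores if d not in already_coded]
    candidates.sort(key=lambda d: abs(scores[d] - 0.5))
    return candidates[:batch_size]

def select_passive(scores, already_coded, batch_size, seed=None):
    """Passive learning: choose uncoded documents completely at random."""
    candidates = sorted(d for d in scores if d not in already_coded)
    rng = random.Random(seed)
    return rng.sample(candidates, min(batch_size, len(candidates)))

# Hypothetical model scores (probability of responsiveness) per document.
scores = {"DOC-001": 0.97, "DOC-002": 0.52, "DOC-003": 0.08,
          "DOC-004": 0.45, "DOC-005": 0.71}
coded = {"DOC-001"}  # already coded by a human reviewer
print(select_active(scores, coded, 2))  # ['DOC-002', 'DOC-004']
print(select_passive(scores, coded, 2, seed=7))
```

In each round, the reviewers would code the selected batch, those documents would be added to the seed set, and the model would be retrained before the next batch is chosen.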

Once the team is comfortable with the results, the software applies the refined model to the entire review set and codes all remaining documents as responsive or non-responsive.
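The overall train-then-apply workflow can be sketched in a few dozen lines of Python. This is an illustration only: it swaps in a simple Naive Bayes text classifier as a stand-in for the proprietary algorithms vendors actually use, and the function names, the tiny seed set, and the 0.5 responsiveness cutoff are all assumptions chosen for readability.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens only; commercial tools use much richer features.
    return re.findall(r"[a-z0-9']+", text.lower())

def train(seed_set):
    """seed_set: list of (text, is_responsive) pairs coded by attorneys.
    Builds a multinomial Naive Bayes model (word counts per class)."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in seed_set:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def probability_responsive(model, text):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        # Add-one (Laplace) smoothing keeps unseen words from zeroing a class.
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocab)))
        log_scores[label] = score
    # Turn the two log scores into a probability without overflow.
    top = max(log_scores.values())
    p_true = math.exp(log_scores[True] - top)
    p_false = math.exp(log_scores[False] - top)
    return p_true / (p_true + p_false)

def code_population(model, documents, threshold=0.5):
    """Apply the trained model to every remaining document in the review set."""
    return {doc_id: probability_responsive(model, text) >= threshold
            for doc_id, text in documents.items()}

# Hypothetical seed set coded by attorneys (True = responsive).
seed_set = [("pump contract breach delivery delay", True),
            ("pump warranty claim invoice", True),
            ("holiday party lunch menu", False),
            ("fantasy football picks", False)]
remaining = {"DOC-101": "late delivery of the pump under the contract",
             "DOC-102": "lunch menu for the party"}
model = train(seed_set)
print(code_population(model, remaining))  # {'DOC-101': True, 'DOC-102': False}
```

In practice, the cutoff is a judgment call: lowering it catches more responsive documents at the cost of sending more non-responsive ones to reviewers, and raising it does the reverse.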


You are involved in litigation and faced with a document review. What now? Naturally, you need to find attorneys to review the documents. Depending on the volume of data at issue, many firms will either (1) staff the document review with firm attorneys or (2) work with a vendor to retain a review team of contract attorneys. Whoever conducts the review, its cost and the time needed to complete it are often concerns. Because a party to litigation should not produce documents without reviewing them, predictive coding, which can reduce the number of documents that must be reviewed manually, may be a particularly helpful option.

Simply put, predictive coding is the use of a computer system to help determine which documents are relevant to a particular legal proceeding. The system makes this determination based on “training” it receives from human input. In fact, for a predictive coding system to make accurate decisions, it needs direction from people fluent in the intricacies of the lawsuit. During this training phase, attorneys review a seed set of documents and code those documents accordingly (e.g., for responsiveness and privilege, and with any applicable issue tags). (FN*) At each step of this process, the computer system is being trained. Refinements are made along the way and internalized by the system. Once trained, the computer will find and code (based on its training) the responsive documents far more quickly (and often with far greater accuracy) than human reviewers. Specifically, the computer builds a model that identifies documents with a high probability of belonging to the categories defined through the training (seed set) coding.

As with any review (entirely human or a combination of human and machine review), a validation process should be implemented. Specifically, there should be a workflow that allows attorney reviewers to check the efficacy and accuracy of the model. It is important here to decide which validation/QC process is best suited to the review. For example, one can implement statistical sampling, in which documents are selected at random and reviewed for accuracy. Because every document has an equal chance of being selected, this sort of validation reflects the machine’s overall accuracy across the entire document population. There is also, however, a more particularized sampling, in which a group of relevant documents is selected from the population and reviewed for accuracy. This sort of validation is more limited because it does not allow the attorney running the review to draw conclusions about the entire document population. (FN**)
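The statistical (random) sampling approach rests on standard proportion arithmetic. The Python sketch below, offered as an illustration rather than a validation protocol, computes how many randomly selected documents to review for a chosen margin of error, draws that sample, and estimates the model’s accuracy with a Wilson score interval. The 95% confidence level (z = 1.96), the ±2% default margin, and all names are assumptions; the sample-size formula also ignores the finite-population correction, so it slightly overstates the sample needed for small collections.

```python
import math
import random

def sample_size(margin=0.02, z=1.96, p=0.5):
    """Documents to review to estimate a proportion within +/- margin.
    z=1.96 corresponds to ~95% confidence; p=0.5 is the most conservative guess."""
    n = z * z * p * (1 - p) / (margin * margin)
    return math.ceil(round(n, 6))  # round() strips float noise before rounding up

def draw_validation_sample(doc_ids, n, seed=None):
    """Simple random sample: every document has an equal chance of selection."""
    ids = list(doc_ids)
    return random.Random(seed).sample(ids, min(n, len(ids)))

def accuracy_with_interval(machine_codes, attorney_codes, z=1.96):
    """Compare machine coding with attorney coding on the sampled documents.
    Returns (observed accuracy, Wilson interval low, Wilson interval high)."""
    n = len(attorney_codes)
    agree = sum(machine_codes[d] == attorney_codes[d] for d in attorney_codes)
    p_hat = agree / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return p_hat, center - half, center + half

print(sample_size())             # 2401
print(sample_size(margin=0.05))  # 385
```

With those defaults the formula calls for 2,401 documents whether the collection holds 50,000 or 5 million documents, which is why random sampling scales well to large reviews.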

Because of the ever-increasing volume of data and information, predictive coding is becoming a more attractive tool to incorporate into every document review to some degree, especially because no minimum data size is required to use it. A document review that couples predictive coding with a well-devised workflow can substantially reduce review costs while increasing efficiency.

 

FN* Because the coding on these seed documents will affect the quality of the computer’s determinations, it is important that the individuals coding the seed documents understand the lawsuit well and know how the predictive coding system works.

FN** And, if you are not comfortable allowing a computer to do that much work, other predictive coding options (i.e., options other than allowing the system to extrapolate from seed sets) are available. For example, prioritized review can be used, whereby the system identifies important documents and moves them to the front of the review queue while keeping likely irrelevant documents in the queue. Incorporating this option into your workflow lets attorneys still lay eyes on every document while efficiently prioritizing the documents that must be reviewed.
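Prioritized review amounts to reordering the queue rather than removing documents from it. A minimal Python sketch, assuming the system assigns each document a relevance score (the function names, IDs, and scores here are hypothetical):

```python
def prioritized_queue(scores):
    """Order every document by predicted relevance, highest first.
    Nothing is removed: likely irrelevant documents stay at the back."""
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

def next_batch(queue, reviewed, batch_size):
    """Return the next unreviewed documents, in priority order."""
    return [d for d in queue if d not in reviewed][:batch_size]

# Hypothetical relevance scores produced by the predictive coding system.
scores = {"DOC-A": 0.12, "DOC-B": 0.91, "DOC-C": 0.55, "DOC-D": 0.91}
queue = prioritized_queue(scores)
print(queue)                            # ['DOC-B', 'DOC-D', 'DOC-C', 'DOC-A']
print(next_batch(queue, {"DOC-B"}, 2))  # ['DOC-D', 'DOC-C']
```

Ties are broken by document ID so the order is deterministic; a real platform might also refresh the scores as reviewers’ coding comes in.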

In Hyles v. New York City et al. (Case No. 10-3119, 2016 U.S. Dist. LEXIS 100390 [S.D.N.Y. Aug. 1, 2016]), the plaintiff, an African-American female employed by the City of New York, was demoted. Specifically, she was replaced by a white male and moved to a different position with a lower salary. Ultimately, the plaintiff sued the City for discrimination and a hostile work environment under various federal statutes.

Discovery in the case was unnecessarily protracted for a number of reasons, including a temporary stay and attendant delays due to mediation, motion practice, and what the Court called a “lack of effort by counsel.” Eventually, a discovery conference was held before Magistrate Judge Andrew Peck after counsel for both parties jointly asked the Court to resolve various discovery disputes. As is relevant to this blog, the parties asked the Judge to determine the scope of electronic discovery regarding: (a) custodians, (b) the date range to be searched, and (c) the search methodology to be used. Regarding search methodology, the City sought to use keyword searches designed to identify potentially responsive materials. The plaintiff, on the other hand, asked the Court to compel the City to use a form of technology assisted review (“TAR”) to search for potentially responsive materials, asserting that TAR is the more cost-effective and efficient way to obtain discovery. The City, in opposition, argued that TAR was too costly and that, because the parties had failed to collaborate well in the past, they “would not be able to collaborate to develop the seed set for a TAR process.”

In response to the plaintiff’s argument that TAR would be the most efficient and cost-effective approach, Judge Peck agreed, stating that “the Court believes that for most cases today, TAR is the best and most efficient search tool,” finding it “superior” to keyword searching, and noting that “[t]he Court would have liked the City to use TAR in this case.” However, citing Sedona Conference Principle 6, Judge Peck held that “the responding party is best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own [ESI].”

Judge Peck noted that someday the law may reach the point where “it might be unreasonable for a party to decline to use TAR… [but,] [w]e are not there yet.” Hyles, supra, 2016 U.S. Dist. LEXIS 100390, at *9-*10. Therefore, the Court denied the plaintiff’s application to force the defendant to use predictive coding.

It is interesting to note the ever-growing trend among federal judges to embrace TAR as an effective way to contain costs and conduct an efficient discovery process. While the current state of the law allows the responding party to determine how best to identify potentially responsive data so that it can comply with its discovery obligations, I predict (no pun intended) that, faced with the potentially tremendous costs of e-discovery, more and more parties may soon turn to various TAR methodologies, if only as a means to control costs.

In today’s litigious world, discovery is costly and can be perilous. Compounding the risk, sanctions are imposed for discovery violations more often than for any other litigation error. Not surprisingly, avoidable discovery mistakes lead to client dissatisfaction. Below are ten critical tips to avoid discovery sanctions and remain compliant with discovery obligations.

  1. Implement Timely Litigation Holds. Be sure your legal hold is implemented as soon as litigation is reasonably anticipated. Be certain that your hold notice is sufficiently broad, is sent to the right custodians, receipt is acknowledged, and it is updated as needed.
  2. Conduct Key Custodian Interviews. A lawyer cannot rely on the hold notice alone. Rather, interviews should be conducted with key players, IT personnel, and anyone else with information relevant to the dispute or to the client’s network architecture. At a minimum, these interviews will confirm that auto-delete protocols have been suspended and will help identify all relevant information for preservation and collection.
  3. Be Proactive. In today’s technology-intensive world there are substantial quantities of ESI, so if you wait to receive a document demand before preserving and collecting documents, you may not have time to respond to it. Anticipate document demands so you can start the interview, identification, and collection process early. You will have a better handle on the documents (what does and does not exist) and on your client’s story, putting you in the best position to comply with discovery and meet discovery challenges.
  4. Honesty Is the Best Policy When Dealing with the Courts and Opposing Parties. Never make a factual representation about the status of preservation, collection, or production efforts without confirming the underlying facts with original sources. While a client will rarely mislead their lawyer intentionally, it is common for clients to have incomplete information or to operate under a misunderstanding of fact when information is communicated secondhand. Courts and opposing parties understand that mistakes can happen at various stages of the discovery process, but such issues must be addressed immediately and head-on. Usually the optimal strategy is full disclosure along with remedial measures.
  5. Always Budget. Obtain a realistic budget before proceeding with ESI collection, processing, and/or review. This is a costly area of litigation, and lawyers must manage client expectations. Update the budget as needed to accommodate changes attributable to collection volume or other factors.
  6. You Catch More Flies with Honey… Seek a cooperative approach, no matter how unpleasant or unreasonable opposing counsel may be. A cooperative approach to discovery will generally reduce disputes and expenses. Take the high road: assume that every email and letter you write to opposing counsel may end up in front of the judge, and keep a reasonable tone in all communications with opposing counsel. As one of our earlier blog posts showed (see Armstrong Pump, Inc. v. Hartman, No. 10-CV-446S, 2014 WL 6908867 (W.D.N.Y. Dec. 9, 2014)), judges have very little patience for uncooperative behavior during a lawsuit’s “search for the truth.”
  7. There’s No Longer Room for Boilerplate Discovery. Under FRCP 26(g)(1), every discovery request and response must be signed by at least one attorney of record, and by signing, you certify under Rule 26(g)(1)(B)(iii) that the request or response is proportional, meaning “neither unreasonable nor unduly burdensome or expensive, considering the needs of the case, prior discovery in the case, the amount in controversy, and the importance of the issues at stake in the action.” Rule 26(g)(3) goes on to state that “[i]f a certification violates this rule without substantial justification, the court, on motion or on its own, must impose an appropriate sanction on the signer, the party on whose behalf the signer was acting, or both.”
  8. Be Careful What You Wish For… Lest You Receive It in Return. Never send an adversary a discovery request that you or your client would be uncomfortable complying with if opposing counsel served a reciprocal request on you.
  9. Carefully Devised Search Terms Are Critically Important. Your legal team’s judgment is a good starting point for crafting search terms, but it is far from sufficient. Review a preliminary “hit-by-term” report from your ESI vendor so you can see which terms are too limiting or overbroad. During custodial interviews (see tip 2 above), ask about project code names and other unique search terms. Then sample, sample, sample! Sampling the documents, both the hits and the non-hits, can help refine search terms and validate the terms chosen.
  10. Wise Use of Technology Can Be a Litigator’s Best Friend. ESI processing, review (even with contract attorneys), and production are among the most costly elements of any litigation. When used efficiently and wisely, technology can significantly reduce those costs. Consider early data assessment, filtering, and predictive coding technology as appropriate for each matter.
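Tip 9’s hit-by-term report and sampling advice can be illustrated with a short Python sketch. It assumes simple case-insensitive, whole-word matching (real review platforms also support wildcards, stemming, and proximity operators), and the function names, documents, and terms are hypothetical. Besides total hits per term, it counts “unique” hits (documents that no other term retrieves), which helps show whether a term is pulling in volume on its own.

```python
import random
import re

def hit_by_term_report(documents, terms):
    """Count, for each search term, the documents it hits and the 'unique'
    hits that no other term retrieves. Matching is whole-word, case-insensitive."""
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE)
                for t in terms}
    hits = {t: {d for d, text in documents.items() if p.search(text)}
            for t, p in patterns.items()}
    report = {}
    for t in terms:
        other_hits = set().union(*(hits[o] for o in terms if o != t))
        report[t] = {"hits": len(hits[t]), "unique_hits": len(hits[t] - other_hits)}
    all_hits = set().union(*hits.values())
    return report, all_hits

def sample_hits_and_non_hits(documents, all_hits, k, seed=None):
    """Draw up to k random documents from the hits and from the non-hits."""
    rng = random.Random(seed)
    hit_ids = sorted(all_hits)
    non_hit_ids = sorted(set(documents) - all_hits)
    return (rng.sample(hit_ids, min(k, len(hit_ids))),
            rng.sample(non_hit_ids, min(k, len(non_hit_ids))))

# Hypothetical collection and search terms (e.g., a project code name).
documents = {"D1": "Project Falcon pump pricing",
             "D2": "Falcon delivery schedule",
             "D3": "lunch order for Friday",
             "D4": "pump warranty terms"}
terms = ["falcon", "pump"]
report, all_hits = hit_by_term_report(documents, terms)
print(report)  # {'falcon': {'hits': 2, 'unique_hits': 1}, 'pump': {'hits': 2, 'unique_hits': 1}}
hit_sample, non_hit_sample = sample_hits_and_non_hits(documents, all_hits, k=2, seed=3)
```

Reviewing the non-hit sample is what reveals terms that are too limiting: responsive documents turning up among the non-hits suggest that a term is missing or drawn too narrowly.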