On October 1, 2018, a new Rule of the Commercial Division Rules (specifically, a new subdivision of existing Rule 11-e) will go into effect.

Rule 11-e governs Responses and Objections to Document Requests.  The new subdivision, promulgated by administrative Order of Chief Administrative Judge Lawrence K. Marks, governs the use of technology-assisted review (“TAR”) in the discovery process. 

The new subdivision (f) states:

The parties are encouraged to use the most efficient means to review documents, including electronically stored information (“ESI”), that is consistent with the parties’ disclosure obligations under Article 31 of the CPLR and proportional to the needs of the case.  Such means may include technology-assisted review, including predictive coding, in appropriate cases…

In addition to implicitly recognizing the cost attendant to e-discovery, the rule promotes cooperation by encouraging parties in commercial cases “to confer, at the outset of discovery and as needed throughout the discovery period, about [TAR] mechanisms they intend to use in document review and production.”  And so, the new Commercial Division Rule appears to bring New York State Commercial Division expectations closer in line with those set forth in the Federal Rules, specifically Rule 26(f), which encourages litigants (with an eye toward proportionality) to discuss preservation and production of ESI.

Questions about technology assisted review?  Please contact kcole@farrellfritz.com.

Traditional document review can be one of the most variable and expensive aspects of the discovery process.  The good news is that there are numerous analytic tools available to empower attorneys to work smarter, thereby reducing discovery costs and allowing attorneys to focus sooner on the data most relevant to the litigation.   And, while various vendors have “proprietary” tools with catchy names, the tools available all seek to achieve the same result:  smarter, more cost-effective review in a way that is defensible and strategic.

Today’s blog post discusses one of those various tools – predictive coding.  The next few blog posts will focus on other tools such as email threading, clustering, conceptual analytics, and keyword expansion.

Predictive Coding

Predictive coding is a machine-learning process in which software takes search logic and coding decisions entered by people for the purpose of finding responsive documents, and applies them to much larger datasets to reduce the number of irrelevant and non-responsive documents that need to be reviewed manually.  While each predictive algorithm will vary in its actual methodology, the process at a very simplistic level involves the following steps:

  1.  Data most likely relevant to the litigation is collected, and traditional filtering and de-duplication are applied.  Then, human reviewers identify a representative cross-section of documents, known as a “seed set,” from the remaining (de-duplicated) population of documents that need to be reviewed.   The number of documents in that seed set will vary, but it should be sufficiently representative of the overall document population.
  2.  Attorneys most familiar with the substantive aspects of the litigation code each document in the seed set as responsive or non-responsive, as appropriate. Mind you, much of the predictive coding software available allows users to perform classification for multiple issues simultaneously (e.g., responsiveness and confidentiality).  These coding results are then input into the predictive coding software.
  3.  The predictive coding software analyzes the seed set and creates an internal algorithm to predict the responsiveness of other documents in the broader population.  After this step, it is critically important that the review team who coded the seed set spend time sampling the results of the algorithm on additional documents and refine the algorithm by continually coding and inputting sample documents until the desired results are achieved.  This “active learning” is important to achieving optimal results.  Simply stated, active learning is an iterative process whereby the seed set is repeatedly augmented with additional documents chosen by the algorithm and manually coded by a human reviewer. (This differs from “passive learning,” an iterative process that uses purely random document samples to train the machine until optimal results are achieved.)

Once the team is comfortable with the results being returned, the software applies the refined algorithm to the entire review set and codes all remaining documents as responsive or non-responsive.
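For the technically curious, here is a minimal sketch of what that train-sample-refine loop can look like in code, using scikit-learn’s TF-IDF features and logistic regression.  Commercial predictive coding platforms use proprietary algorithms, so this is only an illustration under stated assumptions; the toy documents and labels below are entirely hypothetical.

```python
# Minimal, illustrative sketch of a predictive coding training step.
# Commercial platforms use proprietary algorithms; the documents and
# coding decisions below are toy, hypothetical stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the de-duplicated review population.
documents = [
    "merger agreement draft with revised indemnification terms",
    "lunch order for the team offsite on friday",
    "board minutes discussing the proposed acquisition price",
    "fantasy football league standings week six",
    "due diligence memo on target company liabilities",
    "reminder to update parking pass before month end",
]

# Attorney coding on the seed set: 1 = responsive, 0 = non-responsive.
seed_idx = [0, 1, 2, 3]
seed_labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)

model = LogisticRegression()
model.fit(X[seed_idx], seed_labels)

# Score the uncoded documents; in an active-learning workflow, the
# documents the model is least certain about (probability near 0.5)
# would be routed back to human reviewers, coded, added to the seed
# set, and the model retrained until results are acceptable.
probs = model.predict_proba(X)[:, 1]
for i in range(len(documents)):
    if i not in seed_idx:
        print(f"doc {i}: P(responsive) = {probs[i]:.2f}")
```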

When one preserves and collects electronic data for a litigation, one typically casts a broad net.  This, in turn, can result in the preservation and collection of a significant volume of documents that are not relevant to the dispute at hand.  In an effort to identify the most likely relevant documents from the cache that has been broadly preserved and collected, lawyers tend to use search terms and keywords.  But, as anyone who has engaged in that process knows, due to the range of language used in everyday communications, even the most targeted search terms yield results that are not relevant (i.e., “false hits”). So how can a practitioner best gauge the overall effectiveness of their document collection and review process?

Enter PRECISION and RECALL — the two metrics that best assess effectiveness.

So what exactly are precision and recall?

Precision

Precision measures how many of the documents retrieved are actually relevant.  For example, a 75 percent precision rate means that 75 percent of the documents retrieved are relevant, while 25 percent of those documents have been misidentified as relevant.

Recall

Recall measures how many of the relevant documents in a collection have actually been found. For example, a 60 percent recall rate means that 60 percent of all relevant documents in a collection have been found, and 40 percent have been overlooked.
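In code, these two metrics reduce to simple ratios.  Here is a quick sketch using hypothetical review counts that match the percentages above:

```python
# Precision and recall from hypothetical review counts.
retrieved_relevant = 75    # retrieved documents that are actually relevant
retrieved_irrelevant = 25  # retrieved documents that are not relevant ("false hits")
missed_relevant = 50       # relevant documents the search failed to retrieve

precision = retrieved_relevant / (retrieved_relevant + retrieved_irrelevant)
recall = retrieved_relevant / (retrieved_relevant + missed_relevant)

print(f"precision = {precision:.0%}")  # 75% of what was retrieved is relevant
print(f"recall    = {recall:.0%}")     # 60% of all relevant documents were found
```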

It is relatively easy to achieve high recall with low precision if you collect robustly.  The downside is that you will also retrieve a lot of irrelevant information, which in turn will increase the cost of review.   Similarly, high precision with low recall is easy to achieve.  By keeping your keyword searches few and narrow, you will likely retrieve mostly relevant documents, and review costs will be contained because you will collect only relevant information. Many relevant documents, however, will be overlooked.

The ideal result is to achieve high recall with high precision.  But identifying only the necessary information and little else is a difficult task.  To maximize your chance of achieving high recall with high precision, consider using a combination of temporal limitations, search terms that are vetted with the individuals most familiar with the intricacies of the case and its underlying facts, and early analytics to assess the validity of the terms chosen.
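As a rough illustration of layering those filters, consider a sketch like the following; the dates, terms, and documents are all hypothetical, and in practice this metadata would come from your processing or hosting platform:

```python
# Hypothetical sketch: layering a date restriction on top of vetted
# search terms to tighten precision without sacrificing too much recall.
from datetime import date

# Each document is a (date, text) pair; toy data for illustration only.
corpus = [
    (date(2017, 3, 14), "pricing discussion for the acme supply contract"),
    (date(2014, 1, 2), "acme holiday party logistics"),
    (date(2017, 6, 30), "weekly cafeteria menu"),
]

window_start, window_end = date(2016, 1, 1), date(2018, 12, 31)
vetted_terms = {"acme", "pricing", "contract"}  # terms vetted with case counsel

hits = [
    (d, text) for d, text in corpus
    if window_start <= d <= window_end
    and any(term in text.lower() for term in vetted_terms)
]

for d, text in hits:
    print(d, "-", text)  # only in-window documents containing a vetted term
```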

 

You are involved in litigation and faced with a document review need; what now? Naturally, you need to find attorneys to review these documents. To this end, depending on the volume of data at issue, many firms will either: (1) staff the document review with firm attorneys, or (2) work with a vendor to retain a review team composed of contract attorneys. Irrespective of who conducts the needed review, the cost attendant to that review and the time to complete it are often concerns.  Because a party to a litigation should not produce documents without reviewing them, predictive coding may be a particularly helpful option.

Simply put, predictive coding is the use of a computer system to help determine which documents are relevant to a particular legal proceeding.  The system makes this determination based upon “training” it receives from human input.  In fact, for a predictive coding system to make accurate decisions, the system needs direction from humans fluent in the intricacies of the lawsuit.  During this training phase, attorneys review a seed set of documents and code those documents accordingly (e.g., responsive, privileged, applicable issue tags). (FN*) At each step of this process, the computer system is being trained and educated. Refinements are made along the way and internalized by the system. Once trained, the computer will find and code (based on its training) the responsive documents far more quickly (and often with far greater accuracy) than human reviewers. Specifically, the computer will build a model to identify documents that have a high probability of correct classification into categories pre-defined through the training/seed coding.

As with any review (entirely human or a combination of human and machine review), a validation process should be implemented.  Specifically, there should be a workflow created that provides for attorney reviewers to check the efficacy and accuracy of the model.   It is important here to determine what validation/QC process is best implemented.  For example, one can implement statistical sampling, where documents are selected at random and reviewed for accuracy.  This sort of validation reflects the machine’s overall accuracy across the overall document population.  There is also, however, a more particularized sampling, where a group of relevant documents is selected from the population and reviewed for accuracy.  This sort of validation is more limited in that it does not allow the attorney running the review to form any conclusions about the entire document population.  (FN**)
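As a simple illustration of the first (statistical sampling) approach, one might draw a random sample of the machine’s calls and compare them against attorney re-review.  All of the numbers and agreement rates below are hypothetical, purely to show the mechanics:

```python
# Hypothetical sketch of random-sampling validation: sample the machine's
# coding calls, have attorneys re-review the sample, and report agreement.
import random

random.seed(42)  # reproducible toy example

# machine_calls: doc_id -> machine's responsiveness call (True/False)
machine_calls = {doc_id: random.random() < 0.3 for doc_id in range(10_000)}

# Draw a random validation sample from the full population.
sample_ids = random.sample(list(machine_calls), k=400)

# In practice, attorneys re-review the sampled documents; here we simulate
# attorney calls that agree with the machine 95% of the time.
attorney_calls = {
    doc_id: machine_calls[doc_id] if random.random() < 0.95
    else not machine_calls[doc_id]
    for doc_id in sample_ids
}

agreement = sum(
    machine_calls[d] == attorney_calls[d] for d in sample_ids
) / len(sample_ids)
print(f"observed accuracy on sample: {agreement:.1%}")
```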

Because of the ever-increasing volume of data and information, predictive coding is becoming a more attractive tool to incorporate, to some degree, into every document review, especially because no minimum data size is required to use predictive coding.  A document review that uses predictive coding coupled with a well-devised workflow can significantly reduce review costs while maximizing efficiency during the review.

 

FN* Because the coding of these seed documents will impact the quality of the computer’s determinations, it is important that the individuals coding the seed documents understand both the lawsuit and how the predictive coding system works.

FN**  And, if you are not comfortable allowing a computer to do that much work, other predictive coding options (i.e., other than allowing the system to extrapolate based upon seed sets) are available.  For example, prioritized review can be used, whereby the system identifies and escalates important documents for review but keeps likely irrelevant documents in the queue.  Incorporating this option into your workflow allows attorneys to still lay eyes on all documents while providing an efficient prioritization of the documents that must be reviewed.

Today’s post draws upon countless other recent articles and blogs and their respective predictions regarding what’s in store for 2016 when it comes to e-discovery.  I have tried to synthesize below the steps that I believe every litigator should embrace for the coming year.

First, learn the new rules of civil procedure. The amended Federal Rules of Civil Procedure took effect in December 2015.  As you all likely know by now, the new rules emphasize cooperation and proportionality.  Specifically, the amendments require lawyers to better understand best practices for complying with and participating in their discovery obligations, especially in the “E” (i.e., electronic) world.  With the change in Rules, it is inevitable that federal decisions will begin to discuss and interpret these rules.  We, as lawyers, need to follow and digest those decisions and interpretations and make certain that our clients do what is necessary to comply with the new Rules and the decisional law on point.

Next, economize without jeopardizing defensibility. Any attorney responsible for a case that involves a large document collection/review/production component has inevitably heard complaints from clients about the cost of that component of litigation.  There are, however, ways to defensibly contain costs (i.e., limiting custodians, utilizing keyword searches, restricting time frames, utilizing contract attorneys for review, de-duplication, de-NISTing, early case assessment, data analytics…).  However, if 2015 taught us anything, it was that federal judges in our Circuit are embracing technology-assisted review.  Look no further than Magistrate Judge Peck’s decision in Rio Tinto PLC v. Vale S.A. (See Magistrate Judge Peck’s Recent Decision on the Use of Predictive Coding and the Cooperative Obligations Involved), where he endorses this advancement as one of the most efficient ways to review a large volume of data. Judge Peck commented that “it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it” and that “predictive coding [is] widely accepted for limiting e-discovery to relevant documents and effecting discovery of ESI without an undue burden.”  Consider technology-assisted review if you need to stay within your litigation budget on high-volume cases.

Third, Stay Abreast of Advances in Technology. As mentioned in past blog posts (See blog Will New York Follow California’s Lead), a number of state ethics opinions and rules now emphasize the need for lawyers to possess competence in technology.  Specifically, lawyers must demonstrate knowledge of techniques for handling electronically stored information in discovery. At least one federal court has cited California’s formal ethics opinion, suggesting attorneys “should be able to perform” various e-discovery tasks, including preserving, identifying, collecting, and producing data (either on their own or with the guidance of e-discovery specialists or counsel).  I suspect other courts are not far behind.  So…no time like the present to get comfortable with e-discovery demands and technology.

Fourth, Understand How Your Corporate Clients’ Employees Create and Store Data. I need look no further than my eleven-year-old to realize I don’t understand the latest devices and apps, or the vast amount of data he can create on those devices and apps.  Now, imagine that volume potential at the corporate level! We can no longer take comfort that we collected data from servers, laptops, and mobile devices.  Instead, your collection plan must identify any potentially relevant data that exists in atypical formats including, for example, social media (Snapchat, Facebook, Instagram, etc.), text messages, and the cloud.  Then, your plan must assess how to preserve this information and whether collection is necessary.

Fifth (and Definitely Not Finally), Everyone Should Think About Cybersecurity. With the Cybersecurity Information Sharing Act of 2015 signed into law in December, cybersecurity is no longer just an issue for one’s information technology team.  We, as attorneys, must prioritize efforts to make sure our corporate clients are preparing for a potential data breach and informing their employees of steps to take that may safeguard their data.

A little more than three years ago, federal Magistrate Judge Andrew J. Peck (SDNY) issued a seminal decision in Da Silva Moore v. Publicis Groupe & MSL Group, 11 Civ. 1279 (February 24, 2012).  In that ruling, Judge Peck sent a message that predictive coding and computer-assisted review is an appropriate tool that should be “seriously considered for use” in large-data-volume cases, and that attorneys “no longer have to worry about being the ‘first’ or ‘guinea pig’ for judicial acceptance of computer-assisted review.”    Judge Peck went on to encourage parties to cooperate with one another and to consider disclosing the initial “seed” sets of documents.  In doing so, he recognized that sharing of seed sets is often frowned upon by counselors who argue that these sets often contain information wholly unrelated to the action, much of which may be confidential or sensitive.  Specifically, Judge Peck stated: “This Court highly recommends that counsel in future cases be willing to at least discuss, if not agree to, such transparency [with seed sets] in the computer-assisted review process.”

Since Da Silva, many cases have successfully employed various forms of technology-assisted review (“TAR”) to limit the scope of documents actually reviewed by attorneys.  The upside of utilizing TAR is well embraced: it makes document review a more manageable and affordable task.  Moreover, courts routinely endorse TAR for document review.  See, e.g., Rio Tinto PLC v. Vale S.A., S.D.N.Y. No. 14 Civ. 3042 (RMB)(AJP) (March 3, 2015) (“the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it”).

In Rio Tinto, Judge Peck revisited his Da Silva decision. And, while most of Rio Tinto discusses the merits of transparency and cooperation in the development of seed sets, Judge Peck notes there is no definitive answer on the extent of transparency and cooperation required.   Citing his opinion in Da Silva and other cases, Judge Peck makes clear that he “generally believe[s] in cooperation” in connection with seed set development. Nevertheless, Judge Peck notes there is no absolute requirement of transparent cooperation.  Rather, “requesting parties can insure that training and review was done appropriately by other means, such as statistical estimation of recall at the conclusion of the review as well as by whether there are gaps in the production, and quality control review of samples from the documents categorized as non-responsive.” (emphasis added)

The decision goes on to emphasize that courts and litigants should not hold predictive coding to a so-called “higher standard” than keyword searches or linear review. Such a standard could very well dissuade counsel and clients from using predictive coding, which would be a step backward for discovery practice overall.