When you take on a document review project, various analytic tools can streamline the process and improve both efficiency and accuracy.  We’ve already discussed some of these tools (see the April 26 post on predictive coding and the May 16 post on email threading).  Today’s post focuses on another, interrelated tool: document clustering.

What is Document Clustering?

As you can imagine, how a cache of documents is organized for review can make a tremendous difference not only in the efficiency of the review, but also in its accuracy.  Clustering software examines the text in documents, determines which documents are related to each other, and groups them into clusters.  It is the electronic equivalent of putting your documents into labeled boxes so that things end up in the same box only if they belong together.  Similar documents can then be assigned to the same reviewer(s), which makes for a more efficient review because related documents are reviewed together.  Clustering organizes the documents according to the structure that arises naturally from their content, without query terms.  It labels each cluster with a set of keywords that tell you, the project lead, what the documents have in common at a conceptual level, so you can quickly identify the themes in your document set.  For example, if you are a litigator looking for information about a particular contract and a cluster is about the Company’s summer softball team, the documents in that cluster are very likely not relevant.  During review, you can, with a single mouse click, categorize or tag a single document, a cluster of documents, or a set of clusters containing a specific combination of keywords. *


*Certain clustering software has an automatic categorization capability, where all documents sufficiently similar to a set of documents can be categorized the same way, greatly reducing the amount of labor needed when new documents are added to a case.  It enables you to leverage the labor you’ve put into categorizing the earlier documents.
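
To make the clustering idea a bit more concrete, here is a minimal sketch in Python. It is an illustration only and not how any particular review platform works: the sample documents, the stop-word list, the 0.2 similarity threshold, and the function names are all assumptions made for this example. It groups documents by word overlap (Jaccard similarity) and labels each cluster with the words its documents share most often.

```python
import re

# Hypothetical stop-word list; real tools use far larger ones.
STOP_WORDS = {"the", "a", "an", "of", "to", "for", "and", "is", "on", "in", "at"}

def tokenize(text):
    """Lowercase the text and return its set of non-stop-words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

def jaccard(a, b):
    """Share of words two documents have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(docs, threshold=0.2):
    """Greedy single-pass clustering: put each document in the first cluster
    whose first document is similar enough, otherwise start a new cluster."""
    tokens = [tokenize(d) for d in docs]
    clusters = []  # each cluster is a list of document indexes
    for i, words in enumerate(tokens):
        for members in clusters:
            if jaccard(words, tokens[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def label(docs, members, top_n=3):
    """Label a cluster with the words found in the most of its documents
    (ties broken alphabetically so the result is repeatable)."""
    counts = {}
    for i in members:
        for w in tokenize(docs[i]):
            counts[w] = counts.get(w, 0) + 1
    return sorted(counts, key=lambda w: (-counts[w], w))[:top_n]

# Made-up sample documents.
docs = [
    "Draft supply contract for review",
    "Softball team practice on Friday",
    "Revised supply contract attached",
    "Softball team roster and practice schedule",
]

for members in cluster(docs):
    print(label(docs, members), [docs[i] for i in members])
```

With these four sample documents, the two contract emails land in one cluster and the two softball emails in another, which is the "labeled boxes" idea in miniature. Commercial tools rely on much more sophisticated conceptual analysis than simple word overlap.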

The April 26 blog post discussed predictive coding as one of many analytical tools available to empower attorneys to work smarter, thereby reducing discovery costs and allowing attorneys to focus sooner on the data most relevant to the litigation. Another tool in the litigator’s arsenal that can promote efficiency during document review is email threading.

According to The Radicati Group, the average employee sends 36 emails per day. That extrapolates to approximately 10,000 emails per employee per year.  And so, even in cases involving a limited number of custodians, the volume of emails at issue can be significant.

Email threading provides a number of benefits, including eliminating the need to review the same content multiple times and minimizing potential for inconsistent coding of emails.

So what exactly is threading?  It is a process by which emails are grouped together so they can be reviewed as a single coherent conversation.  For example, if I write John Smith an email, John’s reply will very likely include my original message at the bottom of the email chain.

When emails are collected for discovery purposes, both segments of the email chain are collected.  If my conversation with John continues, the chain may grow to many segments over the course of days or weeks.

In email threading, an algorithm compares and matches segments, resulting in emails from the same conversation being grouped together.  Then, the most inclusive email (i.e., the one with the most complete content) is promoted for review by the review team. Non-inclusive emails (i.e., those with text and attachments contained in another inclusive email) are suppressed.  By reviewing inclusive messages, rather than non-inclusive messages, the review team bypasses redundant content and limits the number of documents to review.
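
The inclusive/non-inclusive distinction can be sketched in a few lines of Python. This is a simplified illustration, not a vendor's algorithm: it assumes each email has already been parsed into its segments (oldest first) and that all emails passed in belong to the same conversation. The `Email` class, the field names, and the sample thread are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Email:
    doc_id: str
    segments: tuple                       # message bodies in the chain, oldest first
    attachments: frozenset = frozenset()  # attachment file names

def is_contained(inner, outer):
    """True if `outer` is a longer chain that begins with every segment of
    `inner` and also carries all of `inner`'s attachments."""
    n = len(inner.segments)
    return (
        len(outer.segments) > n
        and outer.segments[:n] == inner.segments
        and inner.attachments <= outer.attachments
    )

def split_inclusive(thread):
    """Split one conversation into inclusive IDs (promote for review)
    and non-inclusive IDs (suppress)."""
    inclusive, suppressed = [], []
    for email in thread:
        if any(is_contained(email, other) for other in thread):
            suppressed.append(email.doc_id)
        else:
            inclusive.append(email.doc_id)
    return inclusive, suppressed

# Made-up three-message conversation.
thread = [
    Email("E1", ("Can you send the contract?",)),
    Email("E2", ("Can you send the contract?", "Attached."),
          frozenset({"contract.pdf"})),
    Email("E3", ("Can you send the contract?", "Attached.", "Thanks, John.")),
]

print(split_inclusive(thread))
```

In this sample, E1 is suppressed because its entire content sits inside E2. E2 stays inclusive even though E3 repeats its text, because E2's attachment does not travel with the later reply. Exact duplicates are not handled here; that is normally a separate de-duplication step.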

Threading also allows the reviewer to see the full picture.  For example, if an email conversation has 16 segments, and those segments are spread among various reviewers, they would appear as separate messages with no particular order, allowing for the first segment to be reviewed 16 times (once on its own, then as a segment in the second message, and in the third message, and so on) by multiple different reviewers.   And, if the first three segments are non-responsive, what happens when the fourth segment is responsive?  Do you have to search for the first three segments?

In summary, email threading should be implemented in every case.  It makes review more efficient and more consistent, and it can streamline even the smallest review project.

When one preserves and collects electronic data for a litigation, one typically casts a broad net.  This, in turn, can result in the preservation and collection of a significant volume of documents that are not relevant to the dispute at hand.  In an effort to identify the most likely relevant documents from the cache that has been broadly preserved and collected, lawyers tend to use search terms and keywords.  But, as anyone who has engaged in that process knows, due to the range of language used in everyday communications, even the most targeted search terms yield results that are not relevant (i.e., “false hits”). So how can a practitioner best gauge the overall effectiveness of their document collection and review process?

Enter PRECISION and RECALL — the two metrics that best assess effectiveness.

So what exactly are precision and recall?


Precision measures how many of the documents retrieved are actually relevant.  For example, a 75 percent precision rate means that 75 percent of the documents retrieved are relevant, while 25 percent of those documents have been misidentified as relevant.


Recall measures how many of the relevant documents in a collection have actually been found. For example, a 60 percent recall rate means that 60 percent of all relevant documents in a collection have been found, and 40 percent have been overlooked.
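
Put as arithmetic, precision is the number of relevant documents retrieved divided by the total number retrieved, and recall is the number of relevant documents retrieved divided by the total number of relevant documents in the collection. The short Python sketch below reproduces the 75 percent and 60 percent figures used above; the document IDs are made up for illustration.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are actually relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Fraction of all relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

# Hypothetical collection: documents 1-10 are the truly relevant ones.
relevant = set(range(1, 11))
# The search returned 8 documents, 6 of them relevant (11 and 12 are false hits).
retrieved = {1, 2, 3, 4, 5, 6, 11, 12}

print(f"precision = {precision(retrieved, relevant):.0%}")  # 6 of 8 retrieved
print(f"recall    = {recall(retrieved, relevant):.0%}")     # 6 of 10 relevant
```

In practice, of course, the full set of relevant documents is not known in advance, so both figures are estimated from samples reviewed by attorneys.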

It is relatively easy to achieve high recall with low precision if you collect robustly.  The downside is that you will also retrieve a lot of irrelevant information, which in turn will increase the cost of review.  Similarly, high precision with low recall is easy to achieve.  By keeping your keyword searches few and narrow, you will likely retrieve mostly relevant documents, and review costs will be contained because you will collect little irrelevant information.  Many relevant documents, however, will be overlooked.
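
The trade-off is easy to see on a toy example. The Python sketch below runs a broad search (any term matches) and a narrow search (all terms must match) over six made-up documents whose relevance is assumed to be known. The corpus, the search terms, and the helper names are all hypothetical.

```python
# Hypothetical corpus: doc ID -> (text, is the document actually relevant?)
corpus = {
    1: ("Signed supply contract with Acme attached", True),
    2: ("Acme contract amendment, pricing terms", True),
    3: ("Please review the Acme agreement before Friday", True),
    4: ("Gym membership contract renewal", False),
    5: ("Acme softball game on Saturday", False),
    6: ("Lunch order for the team", False),
}

def search(terms, require_all=False):
    """Return IDs of documents containing any (or, if require_all, every) term."""
    match = all if require_all else any
    return {
        doc_id
        for doc_id, (text, _) in corpus.items()
        if match(term in text.lower() for term in terms)
    }

def evaluate(hits):
    """Return (precision, recall) for a set of retrieved document IDs."""
    relevant = {doc_id for doc_id, (_, is_rel) in corpus.items() if is_rel}
    found = hits & relevant
    p = len(found) / len(hits) if hits else 0.0
    r = len(found) / len(relevant) if relevant else 0.0
    return p, r

broad = search(["acme", "contract"])                     # either term
narrow = search(["acme", "contract"], require_all=True)  # both terms

print("broad :", sorted(broad), evaluate(broad))
print("narrow:", sorted(narrow), evaluate(narrow))
```

On this sample, the broad search finds every relevant document but also pulls in the gym contract and the softball email, while the narrow search returns only relevant documents but misses the one that says "agreement" instead of "contract."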

The ideal result is to achieve high recall with high precision.  But identifying only the necessary information and little else is a difficult task.  To maximize your chance of achieving high recall with high precision, consider using a combination of temporal limitations, search terms that are vetted with the individuals most familiar with the intricacies of the case and its underlying facts, and early analytics to assess the validity of the terms chosen.


You are involved in litigation and faced with a document review need.  What now?  Naturally, you need to find attorneys to review these documents.  To this end, depending on the volume of data at issue, many firms will either: (1) staff the document review with firm attorneys, or (2) work with a vendor to retain a review team comprised of contract attorneys.  Irrespective of who conducts the needed review, the cost of that review and the time needed to complete it are often a concern.  Because a party to a litigation should not produce documents without reviewing them, predictive coding may be a particularly helpful option.

Simply put, predictive coding is the use of a computer system to help determine which documents are relevant to a particular legal proceeding.  The system makes this determination based upon “training” it receives from human input.  In fact, for a predictive coding system to make accurate decisions, the system needs direction from humans fluent in the intricacies of the lawsuit.  During this training phase, attorneys will review a seed set of documents and code those documents accordingly (i.e., responsive, privileged, applicable issue tags). (FN*) At each step of this process, the computer system is being trained and educated. Refinements are made along the way and internalized by the system. Once trained, the computer will find and code (based on its training) the responsive documents far quicker (and often with far greater accuracy) than human reviewers. Specifically, the computer will build a model to identify documents that have a high probability of correct classification into the categories pre-defined through the training/seed coding.
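
For readers curious about what "training" on a seed set looks like, below is a deliberately tiny sketch in Python. It uses a naive Bayes word-count model, which is one common text classification technique swapped in here for illustration; it is not a description of what any commercial predictive coding product actually runs. The seed documents and function names are made up, and the sketch assumes the seed set contains at least one responsive and one non-responsive document.

```python
import math
import re
from collections import Counter

def words(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())

def train(seed_set):
    """Count word frequencies per label from attorney-coded seed documents."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, responsive in seed_set:
        word_counts[responsive].update(words(text))
        doc_counts[responsive] += 1
    return word_counts, doc_counts

def prob_responsive(model, text):
    """Naive Bayes estimate of the probability that a document is responsive."""
    word_counts, doc_counts = model
    vocab = set(word_counts[True]) | set(word_counts[False])
    log_score = {}
    for label in (True, False):
        total = sum(word_counts[label].values())
        # Start from the share of seed documents carrying this label.
        score = math.log(doc_counts[label] / sum(doc_counts.values()))
        for w in words(text):
            if w not in vocab:
                continue  # ignore words never seen in the seed set
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        log_score[label] = score
    # Convert the two log scores into a probability between 0 and 1.
    return 1 / (1 + math.exp(log_score[False] - log_score[True]))

# Hypothetical seed set: (document text, attorney's responsiveness call).
seed_set = [
    ("supply contract pricing terms", True),
    ("contract amendment signed by vendor", True),
    ("softball practice on friday", False),
    ("lunch menu for friday", False),
]
model = train(seed_set)

for doc in ["revised contract pricing", "softball lunch friday"]:
    print(f"{doc!r}: {prob_responsive(model, doc):.2f}")
```

The first unreviewed document scores above 50 percent and the second below it, because of the words each shares with the coded seed documents. This is also why the quality of the seed coding matters so much: the model can only echo the judgments it was shown.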

As with any review (entirely human or a combination of human and machine review), a validation process should be implemented.  Specifically, there should be a work flow created that provides for attorney reviewers to check the efficacy and accuracy of the model.  It is important here to determine which validation/QC process is best implemented.  For example, one can implement a statistical sampling of data where documents are selected at random and reviewed for accuracy.  This sort of validation would be reflective of the machine’s overall accuracy and reflective of the overall document population.  There is also, however, a more particularized sampling where a group of relevant documents is selected from the population and reviewed for accuracy.  This sort of validation would be more limited in that it would not allow the attorney running the review to form any conclusions about the entire document population.  (FN**)
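
As a rough illustration of the random-sampling approach, the Python sketch below draws a simple random sample for attorney QC and turns the QC result into an accuracy estimate with an approximate 95 percent margin of error (using the standard normal approximation). The population size, the sample size of 400, and the 372 "attorney agrees with the machine" count are hypothetical numbers chosen for the example, not recommendations.

```python
import math
import random

def sample_for_qc(doc_ids, sample_size, seed=0):
    """Draw a simple random sample of document IDs for attorney QC."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return rng.sample(doc_ids, sample_size)

def accuracy_estimate(num_correct, sample_size, z=1.96):
    """Return the sample accuracy and an approximate 95% margin of error
    (normal approximation to the binomial)."""
    p = num_correct / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, margin

# Hypothetical review population of 50,000 machine-coded documents.
population = list(range(1, 50_001))
sample = sample_for_qc(population, 400)

# Suppose attorneys agree with the machine's coding on 372 of the 400.
p, margin = accuracy_estimate(372, 400)
print(f"estimated accuracy: {p:.1%} +/- {margin:.1%}")
```

Because the sample is drawn at random from the whole population, the estimate speaks to the whole population. A hand-picked set of known relevant documents would not support that kind of conclusion.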

Because of the ever-increasing volume of data and information, predictive coding is becoming a more attractive tool to incorporate into every document review to some degree, especially because no minimum data size is required to use predictive coding.  A document review that uses predictive coding coupled with a well-devised work flow will inevitably minimize review costs while maximizing efficiency during the review.


FN* Because the coding on these seed documents will impact the quality of the computer’s determinations, it is important that the individuals coding the seed documents understand the lawsuit well and how the predictive coding system is to work.

FN**  And, if you are not comfortable allowing a computer to do that much work, other predictive coding options (e.g., other than allowing the system to extrapolate based upon seed sets) are available.  For example, prioritized review can be used whereby the system identifies and escalates important documents for review but keeps likely irrelevant documents in the queue.  Incorporating this option into your work flow allows attorneys to still lay eyes on all documents but provides for an efficient prioritization of documents that must be reviewed.

Last year we wrote about the Lola v. Skadden Arps case, in which contract attorney David Lola brought suit under the Fair Labor Standards Act (“FLSA”) for overtime pay.  (See When Do Contract Attorneys “Practice Law”? and What Exactly is The Practice of Law). On or around December 16, 2015, the Lola case was settled and, on December 22, Judge Richard Sullivan (SDNY) approved the settlement, which called for $75,000 to be paid to named plaintiff David Lola and two other contract attorneys placed at Skadden Arps.

The settlement, however, left unresolved the issue shared between the Lola case and the Henig v. Quinn Emanuel Urquhart & Sullivan (13-cv-1432) case, which had been pending before the Honorable Ronnie Abrams: whether some legal work – like document review – is so routine that it cannot be considered the practice of law.

The Henig suit stems from two months of work Mr. Henig did in 2012 as a temporary attorney.  William Henig – who received $35/hour for the work performed – sought overtime pay from Quinn Emanuel under the FLSA, saying he reviewed more than 13,000 documents to assess their relevance to a litigation and whether the documents were considered privileged or confidential.  Under the FLSA and New York Labor Law, law firms are exempt from paying overtime to licensed lawyers engaged in legal work who put in more than 40 hours a week.  Henig, however, argued – much like Lola before him – that he was not engaged in the practice of law as he was not required to exercise any legal judgment.  Rather, he was engaged only in the mundane task of “tagging” documents during a large scale document review.  More specifically, Henig claimed that after a PowerPoint orientation all he had to do was assess whether a document was responsive or not responsive based solely upon a chart provided to him by Quinn Emanuel.  Southern District Judge Abrams, however, granted summary judgment to Quinn Emanuel commenting that, “Not all [large scale document review projects are] law at its grandest but all of it is the practice of law.  Mr. Henig was engaged in that practice.”  She noted that part of Henig’s role in reviewing documents was to assess not only responsiveness to a given discovery demand, but to flag for further review a document that had any possibility of being privileged.  Judge Abrams also noted that the orientation presentation instructed the contract attorneys to look for interesting and hot documents that are “important” to the case and “documents that would be helpful in depositions or briefs should be flagged.”  “The presentation indeed uses language that anticipates the need for legal judgment, particularly with regard to privilege, which the presentation acknowledges is ‘tricky’ and ‘includes a lot of gray areas’.”

Notwithstanding persuasive positions during oral argument, Abrams dismissed the 2013 lawsuit seeking overtime pay from Quinn Emanuel, finding that the work of contract attorney William Henig, while perhaps a bit dull, qualifies as the practice of law.  In fact, the Judge stated, “plaintiff’s tagging history and his other descriptions of his role on….the project…confirm that his job involved more than the largely mindless task that would result from following the [Quinn Emanuel] instructions to the letter…In particular, plaintiff’s use of the deliberative process privilege and ‘key’ tags on certain documents…make clear that plaintiff’s work…involved the type of professional judgment necessary to be engaged in the practice of the law.”

Both lawsuits were closely followed by the industry and students alike.  Indeed, firms were interested in the outcomes as contract attorneys are increasingly used as a low-cost way to tackle massive document review obligations thanks to the ever-growing volume of electronically stored information. Moreover, young graduates were eager to see the outcome given that a ruling that document review is not the practice of law could result in law firms hiring anyone to do the work, making competition for these positions even more acute.

On July 23, 2015, the Second Circuit, in Lola v. Skadden, Arps, Slate, Meagher & Flom LLP, Tower Legal Staffing, Inc., revived a putative collective action (see our earlier blog posts dated March 11, 2015) brought by David Lola, a contract attorney, against Skadden Arps and Tower Legal Staffing, Inc., alleging violations of the overtime provisions of the Fair Labor Standards Act (“FLSA”).  The Second Circuit held that Lola adequately pled that document review may not necessarily constitute “practicing law” under North Carolina law.  The gravamen of Lola’s complaint was that he performed document review under such tight constraints that he exercised no legal judgment whatsoever and thus could not be considered to be “practicing law.”  Specifically, Lola alleged his document review was closely supervised and primarily consisted of:

  • looking at documents to see which search terms (pre-determined by Skadden attorneys) appeared in those documents;
  • categorizing those documents into categories pre-determined by Skadden attorneys; and
  • redacting documents based on specific protocols devised by Skadden attorneys.

Lola was paid $25 an hour and generally worked between 45 and 50 hours per week.  He was classified as exempt under the FLSA and therefore did not receive overtime pay.

Lola brought suit against Skadden and Tower Legal Staffing, Inc. as putative joint employers, on behalf of himself and similarly situated employees, alleging that he was misclassified as exempt under the FLSA and seeking overtime pay.  While attorneys generally qualify for the FLSA’s professional exemption, Lola alleged that he and other contract attorneys performing document review for Skadden were not engaged in the practice of law because they “performed document review under such tight constraints that [they] exercised no legal judgment whatsoever.”  The defendants moved to dismiss the complaint, arguing that  Lola, as an attorney, was exempt under the FLSA’s professional exemption.

The district court (Judge Sullivan, S.D.N.Y.) granted the defendants’ motion to dismiss finding, first, that the definition of “practice of law” is “primarily a matter of state concern,” and that because Lola resided at all relevant times in North Carolina, that state’s law should apply when analyzing whether he was practicing law under the FLSA.  The court then concluded that Lola was engaged in the practice of law under North Carolina law, and therefore an exempt employee under the FLSA.  Lola appealed the decision to the Second Circuit.

As a threshold matter, the Second Circuit agreed with the district court that North Carolina law should control the question of whether Lola was practicing law within the meaning of the FLSA’s professional exemption.  Constrained to accept the allegations in the complaint as true for purposes of the defendants’ motion to dismiss, however, the Court of Appeals disagreed with the district court’s conclusion that by undertaking the document review Lola was necessarily “practicing law” within the meaning of North Carolina law. Rather, because North Carolina defines the “practice of law” as requiring “at least a modicum of independent legal judgment” and a fair reading of the complaint in the light most favorable to Lola is that he provided services that a machine could have provided, Lola, on the facts alleged, could not be said to have been engaged in the practice of law within the meaning of the FLSA and therefore would not qualify for the professional exemption.  For this reason, the Court of Appeals vacated the judgment of the district court dismissing the complaint, and remanded the case for further proceedings.

A little more than three years ago, federal Magistrate Judge Andrew J. Peck (SDNY) issued a seminal decision in Da Silva Moore v. Publicis Groupe & MSL Group, 11 Civ. 1279 (February 24, 2012).  Indeed, in that ruling, Judge Peck sent a message that predictive coding and computer-assisted review are appropriate tools that should be “seriously considered for use” in large data-volume cases and attorneys “no longer have to worry about being the ‘first’ or ‘guinea pig’ for judicial acceptance of computer-assisted review.”  Judge Peck went on to encourage parties to cooperate with one another and to consider disclosing the initial “seed” sets of documents.  In doing so, he recognized that sharing of seed sets is often frowned upon by counselors who argue that these sets often contain information wholly unrelated to the action, much of which may be confidential or sensitive.  Specifically, Judge Peck stated: “This Court highly recommends that counsel in future cases be willing to at least discuss, if not agree to, such transparency [with seed sets] in the computer-assisted review process.”

Since Da Silva, many cases have successfully employed various forms of technology assisted review (“TAR”) to limit the scope of documents actually reviewed by attorneys.  It is well accepted that the upside of utilizing TAR is to make document review a more manageable and affordable task.  Moreover, courts routinely embrace TAR for document review.  See, e.g., Rio Tinto PLC v. Vale S.A., S.D.N.Y. No. 14 Civ. 3042 (RMB)(AJP) (March 3, 2015) (“the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it”).

In Rio Tinto, Judge Peck revisited his Da Silva decision. And, while most of Rio Tinto discusses the merits of transparency and cooperation in the development of seed sets, Judge Peck notes there is no definitive answer on the extent of transparency and cooperation required.  Citing to his opinion in Da Silva and other cases, Judge Peck makes clear that he “generally believe[s] in cooperation” in connection with seed set development. Nevertheless, Judge Peck notes there is no absolute requirement of transparent cooperation.  Rather, “requesting parties can insure that training and review was done appropriately by other means, such as statistical estimation of recall at the conclusion of the review as well as by whether there are gaps in the production, and quality control review of samples from the documents categorized as non-responsive.” (emphasis added)

The decision goes on to emphasize that courts and litigants should not hold predictive coding to a so-called “higher standard” than keyword searches or linear review. Such a standard could very well dissuade counsel and clients from using predictive coding, which would be a step backward for discovery practice overall.

In a case that helps clarify what discovery-specific activities constitute the practice of law, District Court Judge Richard Sullivan – a judge in the Southern District of New York – ruled that contract attorneys performing document review for a law firm are not entitled to overtime pay because they are engaged in legal work.

Specifically, the case involved a collective action initiated by contract attorney David Lola in July 2013 against law firm Skadden Arps Slate Meagher & Flom (“Skadden”) and Tower Legal Staffing (“Tower”) arising from work he performed for Tower over the course of 15 months as a contract attorney in North Carolina. Although Lola is a licensed attorney in California, he is not licensed to practice in North Carolina or the Northern District of Ohio, where litigation involving a Skadden client necessitated the review work.

Lola performed elementary review that consisted of identifying search terms appearing in documents, marking those documents for responsiveness, and, occasionally, redacting materials according to protocols Tower and Skadden provided. He earned $25 per hour working 45 to 55-hour weeks. His fellow contract attorneys received similar wages, with no increase in pay for hours worked in excess of 40 hours per week.  Lola claims the legal industry has for years been exploiting contract attorneys who conduct document review projects for extended hours at a time and without overtime compensation. Though Tower hired and paid the contractors working on the Skadden project, it was Skadden that oversaw the work and provided coding protocols and guidelines. Skadden also had the authority to terminate reviewers.

Skadden moved to dismiss the suit last October, arguing that, as a licensed attorney, Lola was exempted from overtime pay under the Fair Labor Standards Act (“FLSA”), and that he had failed to show that Skadden actually employed him.

Under the FLSA, the Department of Labor, which has the authority to exempt employees working “in a bona fide… professional capacity,” does not require employers to pay overtime to “holder[s] of a valid license or certificate permitting the practice of law… and is actually engaged in the practice thereof.”

Lola’s counsel argued that “When one’s job consists solely of searching keywords and categorizing those documents based on those keywords, it is absolutely not the practice of law,” adding, “It involves no legal analysis, judgment, discretion or advice, and can be performed by a non-lawyer.” Skadden argued that, though the tasks are not glamorous, review work represents a core attorney function on par with drafting pleadings and memoranda of law, and conducting legal research.  Skadden also emphasized that the North Carolina Bar acknowledges document review is legal work.

Calling upon professional and ethical codes of North Carolina, where the contract attorneys were conducting their document review, Judge Sullivan determined that document review rises to the level of legal practice — irrespective of its simplicity/complexity or the legal credentials of those performing it. The application of legal judgment, Judge Sullivan said, is not a prerequisite for an activity to be deemed “practice of law.”  Judge Sullivan reasoned that document review is a legal task, like double-checking citations while drafting a brief, that often requires little to no legal judgment.  Judge Sullivan continued, “Document review is the practice of law, regardless of who conducts it. The only difference between lawyers and non-lawyers is that the former can lawfully perform document review without supervision, while the latter cannot.”  Judge Sullivan’s ruling to dismiss Lola’s case weighs heavily on the many licensed lawyers who rely on document review projects as a way to make a living. For law firms, contract attorneys provide a reputable source of credentialed, cost-effective attorneys who spare the client from higher-priced associates, and spare those associates from a discovery obligation that many deem menial.

The question of what actually constitutes the practice of law has only been posed to two other district judges — in the Southern District of Texas in Oberc v. BP PLC  and in the Southern District of New York in Henig v. Quinn Emanuel Urquhart & Sullivan. In the Henig case, whose facts mirror the Skadden dispute, District Judge Ronnie Abrams has allowed discovery to determine whether the plaintiff in the case, William Henig, practiced law under the FLSA while working as a reviewer under the supervision of Quinn Emanuel.

The Department of Labor has given no guidance on what constitutes the practice of law, and David Lola’s appeal was argued to the Second Circuit only in January.  In the argument before the Court of Appeals, Skadden argued that both common sense and the FLSA contradicted Lola’s position that document review is not the practice of law.  Lola, in turn, argued that the lower court erred by applying the definition of practicing law in North Carolina – where he conducted the document review – and that, irrespective of that definition, he was not practicing law. He went on to argue for the adoption of a federal definition of “practice of law.”  Check back here for the Second Circuit’s decision when available.

Lola vs. Skadden Arps – Judge Sullivan 9-16-14 Opinion and Order