In 2016, Florida became the first state to mandate technology training for lawyers when it adopted a rule requiring lawyers to complete three hours of CLE every three years "in approved technology programs."  The requirement went into effect on January 1, 2017.  On April 20, 2018, the North Carolina State Bar Council approved a proposed amendment to lawyers' annual CLE requirements.  That amendment, if enacted, would mandate that one of the twelve hours of required annual CLE training be devoted to technology training, defined as a program, or a segment of a program, devoted to education on information technology (IT) or cybersecurity (see N.C. Gen. Stat. §143B-1320(a)(11)).

While there is no indication that New York will be next to impose such requirements, it may only be a matter of time until other states (including New York) follow Florida's and North Carolina's lead.  Indeed, in a world where emails, tweets, texts and instant messages are a routine part of life and of conducting business, lawyers should be expected to maintain a basic level of competence regarding technologies and electronically stored information.  Various Model Rules (see, e.g., ABA Model Rule 1.1, Comment 8)* and state opinions (see, e.g., New York County Lawyers' Association Professional Ethics Committee Formal Op. 749 [Feb. 21, 2017])** have already indicated that a lawyer's duty of competence includes technological competence.  And there is decisional law concluding that "professed technological incompetence is not an excuse for discovery misconduct."  James v. Nat'l Fin. LLC, No. CV 8931-VCL, 2014 WL 6845560 (Del. Ch. Dec. 5, 2014).

Because the electronic nature of today's world is here to stay, mandating regular training in technology and cybersecurity makes good sense.  Please contact me at kcole@farrellfritz.com if you are admitted to practice in New York and would like to be added to my technology CLE invitee list.

 

*American Bar Association Model Rule 1.1 ("Duty of Competence"), Comment 8: ". . . a lawyer should keep abreast of changes in the law and its practice, including the benefits and risks associated with relevant technology . . ."

**New York County Lawyers' Association Professional Ethics Committee Formal Op. 749 (Feb. 21, 2017), discussing the "ethical duty of technological competence with respect to the duty to protect a client's confidential information from cybersecurity risk and handling e-discovery" in a litigation or government investigation.

When tasked with a document review project, a litigator has various analytic tools available to streamline the process and improve both efficiency and accuracy.  We have already discussed some of these tools (see the April 26 post discussing predictive coding and the May 16 post discussing email threading).  Today's post focuses on another, interrelated tool: document clustering.

What is Document Clustering?

As you can imagine, the way a cache of documents is organized for review can make a tremendous difference in both the efficiency and the accuracy of the review.  Clustering software examines the text in documents, determines which documents are related to each other, and groups them into clusters.  Clustering performs the electronic equivalent of putting your documents into labeled boxes, so that things end up in the same box only if they belong together.  Because similar documents are grouped together and assigned to the same reviewer(s), related documents can be reviewed together, making the review more efficient.  Clustering organizes the documents according to the structure that arises naturally from their content, without query terms.  It also labels each cluster with a set of keywords, telling you, the project lead, what the documents have in common at a conceptual level and allowing you to quickly identify the themes of your document set.  For example, if you are a litigator looking for information about a particular contract and a cluster is about the company's summer softball team, the documents in that cluster are not relevant.  During review, you can, with a single mouse click, categorize or tag a single document, a cluster of documents, or a set of clusters containing a specific combination of keywords.*
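For the technically curious, here is a minimal sketch of the idea behind clustering, assuming Python with scikit-learn.  The sample documents, the cluster count, and the keyword labels are all hypothetical, and commercial clustering tools are far more sophisticated; the sketch only illustrates the grouping-and-labeling concept described above.

```python
# A minimal sketch of text clustering with keyword labels, assuming
# scikit-learn is installed. All documents and the cluster count are
# hypothetical illustrations, not any vendor's actual method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "softball team practice schedule for the company summer league",
    "company softball roster and game results",
    "draft supply contract with vendor, payment terms attached",
    "amendment to the supply contract extending delivery deadlines",
]

# Convert each document to a TF-IDF vector so textual similarity is measurable.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(docs)

# Group the documents into two clusters based on vector similarity.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(matrix)

# Label each cluster with its highest-weight terms -- the "keywords" that
# tell the project lead what the cluster is about at a conceptual level.
terms = vectorizer.get_feature_names_out()
for cluster in range(2):
    top = model.cluster_centers_[cluster].argsort()[::-1][:3]
    members = [i for i, label in enumerate(model.labels_) if label == cluster]
    print(f"Cluster {cluster}: keywords={[terms[i] for i in top]}, docs={members}")
```

Run against the four sample documents above, the softball messages fall into one "box" and the contract messages into another, with each box labeled by its dominant terms.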

 

*Some clustering software has an automatic categorization capability, whereby all documents sufficiently similar to a set of already-categorized documents can be categorized the same way, greatly reducing the labor needed when new documents are added to a case.  This enables you to leverage the work you have already put into categorizing the earlier documents.

The April 26 blog post discussed predictive coding as one of many analytical tools available to empower attorneys to work smarter, thereby reducing discovery costs and allowing attorneys to focus sooner on the data most relevant to the litigation. Another tool in the litigator’s arsenal that can promote efficiency during document review is email threading.

According to The Radicati Group, the average employee sends 36 emails per day. That extrapolates to approximately 10,000 emails per employee per year.  And so, even in cases involving a limited number of custodians, the volume of emails at issue can be significant.

Email threading provides a number of benefits, including eliminating the need to review the same content multiple times and minimizing the potential for inconsistent coding of emails.

So what exactly is threading?  It is a process by which emails are grouped together so they can be reviewed as a single coherent conversation.  For example, if I write John Smith an email, John’s reply will very likely include my original message at the bottom of the email chain.

When emails are collected for discovery purposes, both segments of the chain are collected.  And if my conversation with John continues, the email may accumulate many segments over the course of days or weeks.

In email threading, an algorithm compares and matches segments, resulting in emails from the same conversation being grouped together.  Then, the most inclusive email (i.e., the one with the most complete content) is promoted for review by the review team, while non-inclusive emails (i.e., those whose text and attachments are contained in another, inclusive email) are suppressed.  By reviewing inclusive messages rather than non-inclusive messages, the review team bypasses redundant content and limits the number of documents to review.
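To make the concept concrete, below is a minimal sketch, in Python, of the threading idea: group messages into a conversation by normalized subject line, then mark a message "inclusive" only if its content is not contained in another segment.  The sample emails are hypothetical, and commercial threading algorithms match segments far more robustly than this simple text-containment test.

```python
# A minimal sketch of email threading. All emails are hypothetical; real
# tools compare segments with far more robust matching than substring tests.
import re
from collections import defaultdict

emails = [
    {"subject": "Contract question", "body": "Can we extend the deadline?"},
    {"subject": "RE: Contract question",
     "body": "Yes, by two weeks.\n> Can we extend the deadline?"},
    {"subject": "RE: Contract question",
     "body": "Great, thanks.\n> Yes, by two weeks.\n> Can we extend the deadline?"},
]

def normalize(subject):
    # Strip reply/forward prefixes so all segments map to one conversation.
    return re.sub(r"^(re|fwd?):\s*", "", subject.strip(), flags=re.IGNORECASE)

threads = defaultdict(list)
for email in emails:
    threads[normalize(email["subject"]).lower()].append(email)

def flat(body):
    # Collapse quote markers and whitespace so segments compare cleanly.
    return re.sub(r"[>\s]+", " ", body).strip()

for topic, msgs in threads.items():
    for msg in msgs:
        others = [flat(m["body"]) for m in msgs if m is not msg]
        # Inclusive = no other message already contains this one's content.
        inclusive = not any(flat(msg["body"]) in other for other in others)
        print(f"{topic!r}: inclusive={inclusive}  {msg['body'][:30]!r}")
```

In this toy conversation, only the final reply is promoted as inclusive; the two earlier segments are suppressed because their full text already appears at the bottom of the final message.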

Threading also allows the reviewer to see the full picture.  For example, if an email conversation has 16 segments, and those segments are spread among various reviewers, they would appear as separate messages in no particular order, allowing the first segment to be reviewed 16 times (once on its own, then as a segment of the second message, then of the third, and so on) by several different reviewers.  And if the first three segments are non-responsive, what happens when the fourth segment is responsive?  Do you have to search for the first three segments?

In summary, email threading should be implemented in every case.  It makes review more efficient and more consistent, and it can streamline even the smallest review project.

Traditional document review can be one of the most variable and expensive aspects of the discovery process.  The good news is that there are numerous analytic tools available to empower attorneys to work smarter, thereby reducing discovery costs and allowing attorneys to focus sooner on the data most relevant to the litigation.  And, while various vendors have "proprietary" tools with catchy names, the available tools all seek to achieve the same result: smarter, more cost-effective review in a way that is defensible and strategic.

Today’s blog post discusses one of those various tools – predictive coding.  The next few blog posts will focus on other tools such as email threading, clustering, conceptual analytics, and keyword expansion.

Predictive Coding

Predictive coding is a machine-learning process that uses software to take search logic and coding decisions entered by people for the purpose of finding responsive documents, and to apply them to much larger datasets, reducing the number of irrelevant and non-responsive documents that must be reviewed manually.  While each predictive algorithm varies in its actual methodology, the process at a very simplistic level involves the following steps:

  1.  Data most likely relevant to the litigation is collected, and traditional filtering and de-duplication are applied.  Human reviewers then identify a representative cross-section of documents, known as a "seed set," from the remaining (de-duplicated) population of documents to be reviewed.  The number of documents in the seed set will vary, but it should be sufficiently representative of the overall document population.
  2.  Attorneys most familiar with the substantive aspects of the litigation code each document in the seed set as responsive or non-responsive, as appropriate.  (Much of the available predictive coding software allows users to classify for multiple issues simultaneously, e.g., responsiveness and confidentiality.)  These coding results are then input into the predictive coding software.
  3.  The predictive coding software analyzes the seed set and creates an internal algorithm to predict the responsiveness of other documents in the broader population.  After this step, it is critically important that the review team that coded the seed set spend time sampling the algorithm's results on additional documents, refining the algorithm by continually coding and inputting sample documents until the desired results are achieved.  This "active learning" is important to achieving optimal results.  Simply stated, active learning is an iterative process whereby the seed set is repeatedly augmented by additional documents chosen by the algorithm and manually coded by a human reviewer.  (This differs from "passive learning," an iterative process that uses entirely random document samples to train the machine until optimal results are achieved.)

Once the team is comfortable with the results being returned, the software applies the refined algorithm to the entire review set and codes all remaining documents as responsive or non-responsive.
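For illustration, here is a minimal sketch of that train-predict-refine loop, assuming Python with scikit-learn.  The seed documents, the labels, and the uncertainty-based selection step are hypothetical simplifications; commercial predictive coding platforms use their own proprietary models and far larger samples.

```python
# A minimal sketch of predictive coding with an active-learning step.
# All documents and labels are hypothetical illustrations.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

seed_docs = ["invoice for counterfeit headphones", "office picnic signup",
             "wire transfer for headphone order", "cafeteria menu update"]
seed_labels = [1, 0, 1, 0]  # 1 = responsive, 0 = non-responsive (human-coded)

unreviewed = ["payment received for headphone shipment",
              "parking garage closed on Friday",
              "quarterly headphone sales figures"]

vectorizer = TfidfVectorizer()
X_seed = vectorizer.fit_transform(seed_docs)
X_pool = vectorizer.transform(unreviewed)

# Train an initial model on the human-coded seed set.
model = LogisticRegression().fit(X_seed, seed_labels)

# Active-learning step: surface the document the model is least certain
# about, so a human reviewer can code it and augment the seed set.
probs = model.predict_proba(X_pool)[:, 1]
most_uncertain = int(np.argmin(np.abs(probs - 0.5)))
print("Review next:", unreviewed[most_uncertain],
      "p(responsive) =", round(float(probs[most_uncertain]), 2))

# Once the team is satisfied, apply the refined model to the full population.
print(dict(zip(unreviewed, model.predict(X_pool))))
```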


When faced with the task of collecting, processing, reviewing and producing digital data, law firms (and clients) often retain outside vendors to assist.  Depending on the vendor and the circumstances of the retention, a single vendor may be retained to handle the entire spectrum of client needs (i.e., from collection to production), or a series of vendors may be retained (e.g., one to perform a forensic collection and another to handle document review).  Before retaining any vendor, however, it is advisable to perform some minimal due diligence in an effort to minimize the potential that client data could be compromised.  Indeed, in today's age of digital data and increased efforts to ensure data privacy and protection, it is critically important that any vendor with access to a client's data be obligated to keep that data in an environment at least as secure as the one in which the organization and/or law firm maintained it.

Below is a suggested list of questions and topics to discuss with vendors before retaining them.  The list is by no means exhaustive.

  • Does the vendor have an incident response plan?
  • Does the vendor have any security certifications?  For example, ISO/IEC 27001, the information security standard published by the International Organization for Standardization ("ISO").
  • Does the vendor have cyber liability insurance? If so, is the insurance adequate?
  • Will the vendor permit security audits or provide a copy of the most recent security audit report?
  • Has the vendor suffered data security breaches/events?
  • What are the vendor's encryption practices?  And do those practices apply only to data at rest (i.e., data the vendor houses), or also to data in transit?

It is also advisable to include in the vendor agreement a provision requiring the vendor to notify you (or the client) of any data incidents within a set time frame.

In 2012, Klipsch Group Inc. ("Klipsch"), a manufacturer of sound equipment, filed a complaint against ePRO E-Commerce Ltd. ("ePRO"), alleging an ePRO subsidiary was selling counterfeit headphones.  Through discovery demands, Klipsch sought the production of information relevant to the sale of the allegedly infringing product, including emails and specific sales data.  Eventually, however, it became clear that ePRO was not engaging in a cooperative discovery process but instead was avoiding its discovery obligations.  For example, ePRO failed to implement an appropriate legal hold notice even after having been directed by the trial court to do so; limited vendor access to electronic data; failed to produce many responsive documents; and (as demonstrated by a forensic examination authorized by the Court) engaged in routine and systematic deletion of thousands of files and emails using data-wiping software long after the suit had commenced.

Because of the numerous and continuous discovery failures, Klipsch moved for sanctions and ultimately filed an ex parte motion seeking additional relief.  The District Court concluded that ePRO willfully spoliated evidence and it imposed various sanctions on ePRO including:

(1)  a jury instruction requiring the jury to find that ePRO destroyed relevant emails and related data;

(2)  a jury instruction permitting the jury to infer that the destroyed evidence would have been favorable to Klipsch; and

(3)  an award to Klipsch of its reasonable costs and fees, which the Court ultimately concluded totaled $2.7 million, an amount necessitated by ePRO's obstructionist behavior.

ePRO filed an interlocutory appeal, arguing that the District Court's $2.7 million sanction, in a case where damages were at most $20,000, was impermissibly punitive and grossly disproportionate.

In January, the Second Circuit upheld the District Court's sanction.  In doing so, the Circuit held that discovery sanctions should be commensurate with the costs occasioned by the sanctionable behavior, not with the value attributable to the alleged (or even proven) compensatory damages.  To allow otherwise would, according to the Circuit, force a litigant in a small-value dispute to be at risk of suffering blatant and egregious discovery misconduct.  And so, sanctions must be proportionate to the costs inflicted on a party – irrespective of total case value – by virtue of that party having to remediate its adversary's discovery misconduct.

Consistent with the theme of cooperative discovery, the Second Circuit noted that "the integrity of our civil litigation process requires that the parties . . . carry out their duties to maintain and disclose the relevant information in their possession in good faith."  Like the countless other cases I have blogged about since December 2015, this decision serves as another reminder that judges expect cooperation between the parties and their attorneys to achieve orderly and cost-effective discovery; indeed, it is a priority.  Had ePRO and its counsel simply cooperated with their adversary and engaged in good faith discovery, the outcome here would have been entirely different.*


* Cooperation among counsel is critically important and is the means to ensure compliance with Rule 1's mandate that the parties are responsible for securing the "just, speedy and inexpensive determination" of a civil litigation.  Indeed, the revised committee notes state that "[m]ost lawyers and parties cooperate to achieve these ends" and that "[e]ffective advocacy is consistent with – and indeed depends upon – cooperative and proportional use of procedure."

Imagine if a string of emojis, casually fired off in a text message (or in an Instagram or Facebook post) to a friend or colleague, could be used against you as evidence of workplace harassment?

Or if another combination of cartoon-like representations of emotions could be used as proof of defamation?

Or if inclusion of a face emoji with its tongue sticking out could preclude a reasonable reader from concluding the potentially defamatory statement was anything other than a joke?

Some disbelieving readers may think: never!  But not so fast.  In fact, there is a growing number of cases, both in the United States and elsewhere, in which emoji images have been entered as evidence, requiring the judge or the jury (as the case may be) to interpret what exactly was meant.  But therein lies the issue: what exactly does a given combination of emojis (small digital images or icons used to express an idea or emotion) in any given text or communication mean?  There is, after all, no fixed emotional resonance or clear dictionary definition for interpreting them.  So, while one colleague may interpret the smirking face with a beer mug as nothing more than an innocent invitation to grab a drink, another colleague may read a potentially sinister motive into the same message.  It is this very subjective nature of emojis, and the double meaning of some of them, that can cause issues in the workplace and elsewhere.

So what is a business to do?  Whether you love or loathe emojis, it is important to recognize that they are here to stay and are part of mainstream communication.  Regulating their use in the same way that other communications are regulated may be an advisable business practice.  For example, consider whether there should be rules governing emojis in office communications.  If so, review and update your employee handbook.

In Youngevity Int'l Corp. v. Smith (No. 16-cv-00704 [S.D. Cal. Dec. 21, 2017]), defendants sought an Order, pursuant to Federal Rules of Civil Procedure 26(g) and 37, requiring Plaintiffs to remediate an improper discovery production and to pay Defendants' costs for bringing the motion to compel and for reviewing various improper prior productions.  Specifically, in connection with the discovery of electronically stored information ("ESI"), Defendants proposed a three-step process by which: "(i) each side proposes a list of search terms for their own documents; (ii) each side offers any supplemental terms to be added to the other side's proposed list; and (iii) each side may review the total number of results generated by each term in the supplemented lists (i.e., a 'hit list' from our third-party vendors) and request that the other side omit any terms appearing to generate a disproportionate number of results."

Approximately one week later, Plaintiffs advised in writing that they were “amenable to the three step process described in your May 9 e-mail.”  The parties then exchanged lists of proposed search terms to be run through their own ESI and the ESI of their opponent.
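For readers curious what a "hit list" looks like in practice, below is a minimal sketch, in Python, of the underlying idea: for each proposed search term, count how many documents in the collection contain it.  The documents and terms are hypothetical; in practice, a third-party vendor generates this report from an indexed database.

```python
# A minimal sketch of generating a search-term "hit list". The terms and
# documents are hypothetical; vendors produce these reports from indexed data.
documents = [
    "The distributor resold the headphones below the agreed price.",
    "Lunch order for the marketing team.",
    "Counterfeit headphones were shipped to the distributor.",
]
search_terms = ["headphones", "distributor", "counterfeit", "lunch"]

# Count, for each term, the number of documents that "hit" on it.
hit_list = {
    term: sum(term.lower() in doc.lower() for doc in documents)
    for term in search_terms
}
for term, hits in sorted(hit_list.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {hits} documents")
```

A report like this is what lets each side spot, and ask the other to omit, terms that generate a disproportionate number of results.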

Pursuant to the agreed-to three-step process, Defendants provided their "hit list" to Plaintiffs.  Plaintiffs, however, never produced theirs.  Instead, Plaintiffs produced two large caches of documents, the first consisting of approximately 1.9 million pages and the second of approximately 2.3 million pages.  Upon receipt, it became clear to Defendants that the productions had been bulk-coded with a CONFIDENTIAL legend and, in some instances, also with an ATTORNEYS' EYES ONLY ("AEO") designation.  The produced materials also contained non-responsive documents.  A few months later, Plaintiffs advised that, due to a vendor error, they had inadvertently failed to produce an additional 700,000 documents.  Although the parties attempted to resolve their differences amicably, they were unsuccessful.

As a result, Defendants filed the instant motion to compel a proper production and for costs.

In granting Defendants' motion, Magistrate Judge Jill L. Burkhardt concluded that "the record indicates that Youngevity did not produce documents following the protocol to which the parties agreed."  Specifically, "Youngevity failed to produce its hit list … and instead produced every document that hit upon any proposed search term," thus conflating "a hit on the parties' proposed search terms with responsiveness."  Moreover, the Court observed that "the parties negotiated a stipulated protective order, which provides that only the 'most sensitive' information should be designated as AEO."  As a result, Judge Burkhardt gave Plaintiffs two options for correcting their discovery productions, each with specific deadlines:

“1) By December 26, 2017, provide its hit list to Defendant; by January 5, 2018, conclude the meet and confer process as to mutually acceptable search terms based upon the hit list results; by January 12, 2018, run the agreed upon search terms across Plaintiff’s data; by February 15, 2018, screen the resulting documents for responsiveness and privilege; and by February 16, 2018, produce responsive, non-privileged documents with only appropriate designations of “confidential” and “AEO” (said production to include that subset of the not-previously-produced 700,000 documents that are responsive and non-privileged); or

2) By December 26, 2017, provide the not-previously-produced 700,000 documents to Defendant without further review; pay the reasonable costs for Defendant to conduct a TAR of the 700,000 documents and the July 21, 2017 and August 22, 2017 productions for responsiveness; by January 24, 2018, designate only those qualifying documents as “confidential” or “AEO”; by that date, any documents not designated in compliance with this Order will be deemed de-designated.”

Judge Burkhardt also ordered Plaintiffs to pay for the reasonable expenses, including attorney’s fees, for bringing the motion and for the expenses incurred by Defendants “as a result of Youngevity’s failure to abide by the Stipulated Protective Order.”

Conclusion

This case is another reminder of what appears to be a well-embraced theme in federal discovery – cooperation.  The 2015 amendments made plain that cooperation between the parties and their attorneys during the litigation process to achieve orderly and cost-effective discovery is a priority.  Indeed, mutual knowledge of the relevant facts is essential to proper litigation, and therefore the process of obtaining those facts (i.e., discovery) should be a cooperative one.  Had counsel simply abided by the three-step process and the stipulated protective order they willingly entered into, there would have been no need to defend against (and foot the bill for) the motion to compel.

You are involved in litigation and faced with a document review need; what now?  Naturally, you need to find attorneys to review the documents.  To this end, depending on the volume of data at issue, many firms will either: (1) staff the document review with firm attorneys, or (2) work with a vendor to retain a review team comprised of contract attorneys.  Irrespective of who conducts the review, the cost of that review and the time needed to complete it are often concerns.  Because a party to a litigation should not produce documents without reviewing them, predictive coding may be a particularly helpful option.

Simply put, predictive coding is the use of a computer system to help determine which documents are relevant to a particular legal proceeding.  The system makes this determination based upon "training" it receives from human input.  In fact, for a predictive coding system to make accurate decisions, it needs direction from humans fluent in the intricacies of the lawsuit.  During this training phase, attorneys review a seed set of documents and code those documents accordingly (e.g., as responsive, privileged, or raising applicable issues). (FN*)  At each step of this process, the computer system is being trained and educated.  Refinements are made along the way and internalized by the system.  Once trained, the computer will find and code (based on its training) the responsive documents far more quickly, and often with far greater accuracy, than human reviewers.  Specifically, the computer builds a model to identify documents that have a high probability of correct classification into the categories pre-defined through the training/seed coding.

As with any review (whether entirely human or a combination of human and machine review), a validation process should be implemented.  Specifically, there should be a workflow that provides for attorney reviewers to check the efficacy and accuracy of the model.  It is important to determine which validation/QC process is best suited to the matter.  For example, one can implement statistical sampling, in which documents are selected at random and reviewed for accuracy; this sort of validation reflects the machine's overall accuracy across the entire document population.  There is also, however, a more particularized form of sampling, in which a group of relevant documents is selected from the population and reviewed for accuracy; this sort of validation is more limited, in that it does not allow the attorney running the review to draw conclusions about the entire document population. (FN**)
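To illustrate the statistical-sampling approach, below is a minimal sketch in Python.  The sample size and agreement count are hypothetical; a real validation protocol would be designed around the matter, often with input from the vendor or a statistician.

```python
# A minimal sketch of statistical-sampling validation for a machine-coded
# review: re-review a random sample by hand, then estimate the machine's
# overall accuracy with a simple 95% margin of error. The numbers below
# are hypothetical.
import math

sample_size = 400        # documents drawn at random from the coded population
human_agreements = 380   # sampled documents where human review confirmed the machine

accuracy = human_agreements / sample_size
# Normal-approximation 95% confidence interval for a proportion.
margin = 1.96 * math.sqrt(accuracy * (1 - accuracy) / sample_size)
print(f"Estimated accuracy: {accuracy:.1%} +/- {margin:.1%}")
# -> Estimated accuracy: 95.0% +/- 2.1%
```

Because the sample is random, the resulting estimate speaks to the entire document population; the more particularized sampling described above would not support that broader inference.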

Because of the ever-increasing volume of data and information, predictive coding is becoming a more attractive tool to incorporate, to some degree, into every document review, especially because no minimum data size is required to use it.  A document review that couples predictive coding with a well-devised workflow will minimize review costs while maximizing efficiency.

 

FN* Because the coding of these seed documents will impact the quality of the computer's determinations, it is important that the individuals coding the seed documents understand the lawsuit well, as well as how the predictive coding system is to work.

FN**  And, if you are not comfortable allowing a computer to do that much work, other predictive coding options (i.e., other than allowing the system to extrapolate from seed sets) are available.  For example, prioritized review can be used, whereby the system identifies and escalates important documents for review but keeps likely irrelevant documents in the queue.  Incorporating this option into your workflow still allows attorneys to lay eyes on all documents while providing an efficient prioritization of the documents that must be reviewed.

What do applications like Snapchat, Telegram, Wickr, Cover Me, Speak On, and Whisper have in common?  They are all self-destructing message ("SDM") applications.  What exactly does this mean, you ask?  SDM applications transmit information with end-to-end encryption, and messages self-destruct after a set period of time or after receipt and access by the intended recipient.  Consider Snapchat, for example.  Snapchat is one of the most popular social media platforms in the world; indeed, in 2016, Snapchat surpassed Facebook's number of video views per day.  Part of Snapchat's popularity derives from the fact that users can set timers for shared photos and videos to self-destruct once viewed, allowing users (typically of younger generations) to share photos without the risk of a photo going public.
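For the curious, below is a minimal sketch, in Python, of the self-destruct mechanic only; real SDM applications add end-to-end encryption and secure deletion, and the class name, message body, and timer value here are all hypothetical.

```python
# A minimal sketch of a self-destructing message: the message carries an
# expiry time and is destroyed once read or once the timer lapses. Real SDM
# apps layer end-to-end encryption on top; this shows only the expiry logic.
import time

class SelfDestructingMessage:
    def __init__(self, body, ttl_seconds):
        self.body = body
        self.expires_at = time.time() + ttl_seconds

    def read(self):
        # Deliver the body only if unexpired, then destroy it either way.
        body = self.body if time.time() < self.expires_at else None
        self.body = None  # destroyed after first access
        return body

msg = SelfDestructingMessage("meet at noon", ttl_seconds=10)
print(msg.read())   # "meet at noon" (first read, within the timer)
print(msg.read())   # None (already destroyed)
```

As the sketch suggests, once the message is destroyed there is nothing left to collect, which is precisely what makes preservation so difficult.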

Yet what happens when SDM technologies (which are evolving rapidly) are used in the corporate world?  How does one preserve potentially relevant information?  What are the risks versus the benefits of incorporating SDM technology into one's business?  These questions – and others – are ones litigators will likely grapple with in the coming months and years, given the rapid growth of SDM technology.**

While it is impossible to predict the future, I suspect it is only a matter of time until this issue becomes more of a focus in litigation and I look forward to reading decisions on point as the case law catches up to the technology.

** Consider, for example, the Waymo LLC v. Uber Technologies, Inc. lawsuit, in which allegations arose that one party was hiding information relevant to the lawsuit by transmitting it via SDM.  Consider further that the Department of Justice in December issued an enforcement policy strongly discouraging the use of messaging applications that do not store data in a way that allows for access during a subsequent investigation.  These recent lawsuits and policies make plain that SDM technology is being employed in corporate America.