Technology has revolutionized, among other things, the way people conduct business, store information and communicate with others.  Despite its many efficiencies and benefits, a downside of this “revolution” is the creation of countless files that may later be subject to review and potential production during litigation or investigation proceedings.  Indeed, even relatively small cases routinely involve the collection of tens or hundreds of thousands of documents and files.  This in turn makes for a costly and potentially complicated discovery process.  And so, it is critically important to identify, early in the litigation life-cycle, defensible ways to cull this data and isolate relevant material without sacrificing accuracy.

Although attorneys take different approaches to electronic discovery, I believe certain steps should be taken in every litigation involving electronically stored information, or ESI (which, let’s face it, is every litigation in today’s E-age).  In my opinion, among the most effective tools for reducing e-data is an early case assessment of the data collected.  More specifically, after the data collection is complete, one should review a file extension report with an eye toward eliminating file types that are not relevant.  Another report that can provide actionable insight for counsel is a search term report.  This report can illustrate which search terms produce “hits” likely to yield documents responsive to the litigation/investigation and which terms are more likely “misses.”  Revising search terms (often multiple times) based upon this report is highly recommended and a sound way to cull data.
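
To make the two reports concrete, here is a minimal Python sketch of what a file extension report and a search term report compute.  The function names, sample file names, and sample terms are hypothetical and chosen only for illustration; commercial review platforms generate these reports with far richer search syntax.

```python
import os
import re
from collections import Counter


def extension_report(filenames):
    """Count collected files by lowercase file extension."""
    return Counter(os.path.splitext(name)[1].lower() for name in filenames)


def search_term_report(documents, terms):
    """For each term, count the documents that contain it at least once."""
    report = {}
    for term in terms:
        # Whole-word, case-insensitive match (a simplifying assumption).
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        report[term] = sum(1 for text in documents.values() if pattern.search(text))
    return report


# Hypothetical sample data, for illustration only.
files = ["budget.xlsx", "notes.DOCX", "driver.dll", "setup.DLL", "memo.docx"]
docs = {
    "DOC-001": "The merger agreement was signed Friday.",
    "DOC-002": "Lunch on Friday?",
    "DOC-003": "Draft merger terms attached.",
}

print(extension_report(files))
print(search_term_report(docs, ["merger", "Friday", "indemnity"]))
```

A term that hits on nothing, or on nearly everything, is a natural candidate for revision before the next round of searching.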

Another step that should be implemented to minimize the data universe is deduplication (either within or across custodians).  This means that identical duplicates of documents (or near duplicates, should you opt to include them) will be eliminated from the data set.  If you opt to deduplicate within a custodian, then any identical duplicate in an individual’s data will be eliminated and only one copy of the document will be available for review and production.  If you opt to deduplicate across custodians, then, for example, only one copy of an email that appeared in three different custodians’ mailboxes will be available for review and production.  However, in the latter situation, the metadata will disclose that the document also existed in the other two custodians’ mailboxes.

A final tool to implement in any review is email threading.  Threading allows only the most inclusive versions of email chains to be included in the review, thereby reducing the attorney hours required to review documents.  For example, the attorney will review only the most inclusive email in a chain of ten, rather than each of the earlier emails leading up to that most inclusive version.
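
One simplified way to picture how threading identifies the most inclusive emails is sketched below in Python.  It assumes, purely for illustration, that each reply quotes the full text of the emails before it, so an email is treated as non-inclusive when its entire text appears inside another email.  The function name and sample messages are hypothetical, and actual threading tools rely on more robust signals than a plain text comparison.

```python
def most_inclusive(emails):
    """Return the ids of emails whose text is not wholly contained in another email.

    emails: dict mapping an email id to its full text, including quoted replies.
    """
    # Normalize whitespace and case so trivial differences do not block a match.
    normalized = {eid: " ".join(body.split()).lower() for eid, body in emails.items()}
    inclusive = []
    for eid, body in normalized.items():
        contained_elsewhere = any(
            other_id != eid and body != other_body and body in other_body
            for other_id, other_body in normalized.items()
        )
        if not contained_elsewhere:
            inclusive.append(eid)
    return inclusive


# Hypothetical thread: e1 -> e2 -> e3, plus a separate reply (e4) to e1.
thread = {
    "e1": "Can we meet Tuesday?",
    "e2": "Yes, 10am works. > Can we meet Tuesday?",
    "e3": "Great, see you then. > Yes, 10am works. > Can we meet Tuesday?",
    "e4": "I can't make it. > Can we meet Tuesday?",
}

print(most_inclusive(thread))
```

In this toy thread, only the last email of each branch needs review, because everything said in the earlier emails is quoted within one of them.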

There are ample other opportunities to introduce additional efficiencies into the review (clustering, bulk-coding, etc., to name a few), but it is advisable to work with an attorney or vendor to develop a defensible methodology and workflow to achieve the most efficient and effective discovery outcome for the client.

Have questions?  Please contact me at

In recent years, there has been a dramatic increase in the number of e-discovery vendors.  While having more vendor options to choose from may seem like a good thing, the surge in vendors can make it difficult to differentiate among them and to compare the relative strengths and weaknesses of each.  It is therefore critical that law firms and legal departments that seek to leverage data efficiently for purposes of litigation understand that selecting an e-discovery vendor is more than an isolated transaction and must be approached with some key considerations in mind.  Below are a few topics that are worth considering when choosing among vendors.

  1. Is data security a priority for the vendor?  Data security is a serious and recurring issue.  Think Marriott, Equifax, Anthem, Yahoo, EmblemHealth, Target … Therefore, before retaining a vendor, it is important to be confident that the vendor has robust security measures in place to ensure data access is controlled.  A lack of proper security measures exposes the data of the firm or legal department client to security vulnerabilities.  In addition to assessing what security credentials the vendor has (e.g., SOC 2 Type II), you should inquire about the vendor’s employee training and preventative efforts like intrusion detection, data encryption, cloud security controls and penetration testing.
  2. Is innovation a priority for the vendor?  Technology is constantly evolving and so, too, are the data sources that may be relevant to a litigation.  By way of illustration, consider text messages.  Five years ago, text messages were rarely a data source in litigation.  Today, much business is conducted via text message.  And so, it is critical that the vendor you retain is always innovating and regularly developing new capabilities to address the growing amounts of, and varying sources of, data.  At a minimum, you want a vendor whose solutions integrate seamlessly with modern data sources so that things like text messages are easy to collect, review and produce.
  3. Are efficiency and automation priorities for the vendor?  There is nothing more frustrating than a review platform that is slow or clunky.  When the task at hand is to review 50,000 emails, the time it takes to process the data (including to de-dupe, deNIST and OCR) is a relevant consideration.  All of the small delays along the way can easily add up to big delays, big costs, and big concerns.  Therefore, it is important to understand the vendor’s infrastructure.  For example, what sort of processes are in place to keep the time from data ingestion to production short?  Similarly, it is worth speaking with existing clients of the vendor to understand any issues encountered with the platform.  For example, how fast is the document-to-document load time?  What artificial intelligence is available to leverage (e.g., predictive coding, clustering)?  The solution should be efficient and user-friendly.

Legal professionals should do their homework before retaining an e-discovery vendor, as no two vendors are the same.  While there are many areas to explore before retention, the issues of security, innovation and efficiency are critical among them.  Asking thoughtful and difficult questions during the vetting process gives legal professionals a greater likelihood of a seamless engagement.



The issue of production format when dealing with ESI is often the subject of discussion and disagreement.  If possible, the parties to the litigation should agree at the outset to the production format.   In fact, a conversation about production format, metadata and redactions (among other things) should occur at the preliminary conference and/or the Rule 26 conference. However, this “meet and confer” process often gets short-changed or skipped entirely, leaving the producing parties to respond to unexpected and often costly production demands.  Irrespective of whether the parties agree upon a production format, it is important to understand the more common formats and their respective benefits/shortcomings.

1. Native File Production

A native production consists of electronically stored information in the format in which it is ordinarily maintained by the producing party.  The benefits of native file production include savings of money and time compared to other formats, which require conversion of the ESI into images and associated load files.  However, some files cannot be produced in native format because they require conversion in order to be reviewable (e.g., certain email formats and databases).  Additional drawbacks of a native production include the inability to brand individual pages (e.g., with a Bates stamp or confidentiality legend) and the challenges attendant to applying redactions.  Perhaps the most concerning aspect of a native production, however, is the producing party’s inability to control the metadata produced because the document is “live.”  Consider, for example, an Excel document.  A native production of it would necessarily include any hidden text, tracked changes, and comments.

2. TIFF Production

TIFF is an acronym for Tagged Image File Format.  It is a common graphic file format, and the extension related to this format is .tif.  In a TIFF production, all documents are converted from their native format to black and white, single-page .tif files.  It is as if a “picture” of the ESI is taken such that it appears to the end user in the same way one would view it on screen or if printed.  For each record, document-level text, an image (.opt) load file, and a metadata (.dat) load file are provided.  By producing the image with the accompanying extracted text and metadata in load files, the image is viewable and searchable in a review tool.**  Although converting native files to .tif involves a cost, the advantages of an image production include the ability to number, redact and mark documents as confidential, as well as the ability to control the metadata fields that are produced.  Imaged files also carry less risk of accidental alteration because they are not capable of being edited.  However, the costs attributable to, and the time involved in, converting the ESI to images may be viewed as a negative.
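
For readers curious how a review tool stitches single-page .tif images back into documents, the Python sketch below reads an image load file.  It assumes a comma-separated layout often seen in Opticon-style .opt files (image key, volume, image path, a “Y” flag marking the first page of each document, folder break, box break, page count).  The sample Bates numbers and paths are made up, and the layout of any actual production should be confirmed with the vendor.

```python
import csv
import io


def group_pages_by_document(opt_text):
    """Group single-page image paths into documents using the document-break flag."""
    documents = []
    for row in csv.reader(io.StringIO(opt_text)):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        image_key, _volume, image_path, doc_break = row[0], row[1], row[2], row[3]
        # A "Y" in the fourth column is assumed to mark the first page of a document.
        if doc_break.strip().upper() == "Y" or not documents:
            documents.append({"begin_bates": image_key, "pages": []})
        documents[-1]["pages"].append(image_path)
    return documents


# Hypothetical load file: a two-page document followed by a one-page document.
SAMPLE_OPT = """ABC000001,VOL001,IMAGES/001/ABC000001.tif,Y,,,2
ABC000002,VOL001,IMAGES/001/ABC000002.tif,,,,
ABC000003,VOL001,IMAGES/001/ABC000003.tif,Y,,,1
"""

for doc in group_pages_by_document(SAMPLE_OPT):
    print(doc["begin_bates"], len(doc["pages"]), "page(s)")
```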

3. Text/Searchable PDF Production

A searchable PDF production is effectively the same as a .tif production.  However, rather than simply exporting the converted images to a review tool, the images are converted to PDFs and then OCR’d* to make them searchable.  Parties often request PDFs if they plan on reviewing the production outside of a review tool.  However, even an OCR’d PDF can suffer from incomplete and imprecise search functionality.  And so, PDF productions are generally less desirable than .tif productions.

4. Paper Production

Paper documents are physical documents copied from other physical documents or printed from ESI.  Paper production is often the least expensive and shares many of the same advantages as .tif and .pdf productions.  For example, paper can be easily Bates stamped, redacted and branded.  However, a paper production can be laborious and inefficient when you are on the receiving end.  For example, paper cannot be searched or indexed electronically.  Rather, one is left to sort through, and manually organize, bankers’ boxes of documents.  And because paper has no searchable text or metadata associated with it, reducing ESI to a paper format may not meet the requirement, found in many discovery rules, of producing ESI in a reasonably usable form.

It should also be noted that document productions often include a combination of the above formats.  For example, the lion’s share of a production may be .tif files, while any Excel file in the production may be produced in native format so that it is more usable.  Similarly, databases may be produced in a native file format with any item needing redaction converted to an image.  Given the variables and the associated benefits and drawbacks, one should engage in a meaningful conversation with one’s adversary at the preliminary conference or Rule 26(f) conference to devise a production plan and chart a course that lays out what is being requested and the production expectations.

*OCR stands for Optical Character Recognition.  It is the process of converting images of printed pages into electronic text.  It is typically done so that a file is text-searchable.

**Reference to a “review tool” is meant to describe the database/repository where ESI documents are located for purposes of review and production.  These “tools” are necessary because it is impractical and inefficient to open each file on one’s computer in its own source application.  It is therefore necessary to load the ESI into an application that allows it to be reviewed, searched and analyzed.  Some companies that are frequently involved in litigation choose to purchase such applications for their own use, but many use applications hosted on their law firm’s or an e-discovery vendor’s systems.  Review tools usually require the ESI to be processed before loading.


I am often asked by clients and subscribers to the blog, What is E-discovery?  And so, this week’s post is intended to respond to that question.

E-discovery is the abbreviated term for electronic discovery and refers to the process in which electronic data (as compared to paper or object information) is sought, located, secured, reviewed and produced for use as evidence in a civil or criminal lawsuit.  Although the “E-discovery” nomenclature is far more common, one may also see this concept referred to as EDD, or electronic data discovery.  All types of electronic data can serve as evidence including, for example, text, images, calendar files, databases, spreadsheets, audio files, animation, Web sites, e-mails, voicemails and computer programs.  It is important to understand E-discovery and the various sources of data so that we, as attorneys, can efficiently process the information and construct legal arguments and defenses using this data and these documents.  No one is immune from the explosion in the amount of data being generated and its impact on the legal process.  Indeed, E-discovery is here to stay.  It is a process in which Fortune 500 companies, “Ma and Pa” shops, and individual parties to lawsuits will be required to participate.  And so, litigators and clients must better understand data, how it is stored, how it can be searched, how it can be reviewed, and how technology can be applied to the process to promote cost-effective ways to conduct document discovery and wade through the large volumes of data.  It is my hope that this blog, through its historical posts and those to come, will help provide subscribers with the information necessary to achieve this goal.


Whether we like it or not, a reality of today’s world is that important business is often conducted by text message.  And so, when it is time to issue a litigation hold notice, you must include an instruction to preserve text messages as well as the devices from which they are sent or received (e.g., smartphones).  Failure to do so can be a costly mistake, as the defendants learned in the Paisley Park case, a litigation in the District of Minnesota involving the Estate of the late musical artist known as Prince.

In Paisley Park Enters. v. Boxill, No. 0:17-cv-01212 (D. Minn., 3/5/19) (copy here: Prince_Discovery_Order), Magistrate Judge Tony N. Leung reminded us of the obligation to preserve electronically stored information (“ESI”) that is relevant to the lawsuit, including text messages.*

Simply stated, Plaintiffs claimed they were deprived of relevant discovery; defendants argued they did what was required by the law (i.e., preserve emails and computer data).  Defendants claimed ignorance that they had any obligation to preserve their text messages.

In reaching the merits of the spoliation motion filed by Plaintiffs, the Court concluded that Defendants’ failure was intentional and sanctions appropriate.  In reaching this conclusion the Court made a number of salient observations.

First, the Court observed that the executives, as principals of the corporate defendant, were the types of individuals likely to have relevant information.

Next, the Court observed that the text messages of the individual defendants were likely to contain relevant information because, as demonstrated by text messages secured by Rule 45 subpoena, the executives often discussed the very matters in the lawsuit by text message.  The Court therefore concluded that under the Federal Rules the parties were required to take reasonable steps to preserve ESI, including text messages (which are included in the standard, expansive term “documents”).**

Despite this obligation to take reasonable steps to preserve relevant information, the Court observed that the defendants failed entirely to do so.  Indeed, the Court identified a number of simple, basic failures:

  • the executives did not suspend the auto-delete functionality on their respective phones, a step that the Court observed “takes, at most, only a few minutes” to implement;
  • the executives did not put in place a litigation hold to ensure that they preserved text messages; and
  • the executives failed to use any of the “relatively simple options to ensure that their text messages were backed up to cloud storage”; these processes would have cost “little, particularly in comparison to the importance of the issues at stake and the amount in controversy here.”

The Court concluded that defendants’ failure to follow these simple steps was alone sufficient to show that defendants acted unreasonably.  However, as if the defendants’ absence of reasonable efforts were not enough, the evidence submitted demonstrated that the defendants each wiped and intentionally destroyed their phones after the lawsuit was commenced (and one executive wiped and discarded a second phone after the Court ordered the parties to preserve all relevant electronic information and after receipt of a letter advising of the need to produce text messages).

And so, having concluded both that the defendants failed to take reasonable steps to preserve relevant information and that they intended to destroy relevant ESI, the Court analyzed the prejudice caused to Plaintiffs.  Specifically, could the destroyed ESI be restored or replaced from any other source?  See Fed. R. Civ. P. 37(e).

Defendants argued there was no prejudice because Plaintiffs were able to secure from third parties some text messages sent to or received by the executive defendants.  The Court dismissed this argument and observed that these were “scattershot texts and [e-mails],” rather than “a complete record of defendants’ written communications from defendants.”  According to the Court, Plaintiffs were, for example, unable to recover text messages that the two individual defendants sent only to each other.  The Court therefore concluded that the missing text messages could not be replaced or restored from other sources, making it “impossible to determine precisely what the destroyed documents contained or how severely the unavailability of these documents might have prejudiced [Plaintiffs’] ability to prove the claims set forth in [their] Complaint.”  Telectron, Inc. v. Overhead Door Corp., 116 F.R.D. 107, 110 (S.D. Fla. 1987).  Therefore, the Court concluded sanctions were appropriate under Rule 37(e)(1).

Because the Court concluded that the executive defendants acted with the intent to deprive Plaintiffs of evidence, the Court ordered sanctions pursuant to each of Rules 37(b)(2)(C), 37(e)(1), and 37(e)(2) and directed the executive defendants to pay the reasonable expenses, including attorney’s fees and costs, that Plaintiffs incurred as a result of the defendants’ misconduct.  The Court further directed the defendants to pay into the Court a fine of $10,000.

While this case is an egregious example of discovery violations, the message to internalize is to include text messages (and other forms of messaging) in your hold notice.

*For those of you interested in the specifics of the lawsuit, the case involved the Estate of the late Prince Rogers Nelson (“Prince”) and the Estate’s interest in various songs created by Prince, including certain ones not released to the public.

**In rendering his decision to impose sanctions, Judge Leung provided a useful summary of the relevant law:

The Federal Rules of Civil Procedure require that parties take reasonable steps to preserve ESI that is relevant to litigation. Fed. R. Civ. P. 37(e). The Court may sanction a party for failure to do so, provided that the lost ESI cannot be restored or replaced through additional discovery. Id. Rule 37(e) makes two types of sanctions available to the Court. Under Rule 37(e)(1), if the adverse party has suffered prejudice from the spoliation of evidence, the Court may order whatever sanctions are necessary to cure the prejudice. But under Rule 37(e)(2), if the Court finds that the party “acted with the intent to deprive another party of the information’s use in the litigation,” the Court may order more severe sanctions, including a presumption that the lost information was unfavorable to the party or an instruction to the jury that it “may or must presume the information was unfavorable to the party.” The Court may also sanction a party for failing to obey a discovery order. Fed. R. Civ. P. 37(b). Sanctions available under Rule 37(b) include an order directing that certain designated facts be taken as established for purposes of the action, payment of reasonable expenses, and civil contempt of court.


Rule 1 of the Federal Rules of Civil Procedure calls upon courts and litigants to “secure the just, speedy, and inexpensive determination of every action and proceeding.” And so, it comes as no surprise that technology assisted review (“TAR”) is being widely embraced by the legal profession.

What is TAR?

TAR (also called predictive coding, computer assisted review, or supervised machine learning) is a document review process where humans work with software to train the software/computer to identify relevant documents.*  The goal of TAR is to effectively and efficiently categorize documents. The potential for significant savings in cost and time, without sacrificing quality, is what makes TAR so appealing.

The process is relatively straightforward.  First, electronic documents that have been preserved and collected are loaded onto a platform where the software builds an index, analyzing each document’s text as it does so.**  Next, a human reviewer who is knowledgeable about the issues and circumstances of the lawsuit reviews and codes/tags documents as “responsive” or “non-responsive.”  This “seed set” of documents is used to “train” the computer: the coding is ingested by the computer and used to draw inferences about documents that have not yet been reviewed.  The computer “learns” from the human reviewer’s designations which combinations of terms or other features occur in responsive documents, and then develops a model that it uses to predict the coding of the remaining documents.
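
The train-then-predict loop described above can be illustrated with a toy example.  The Python sketch below uses a simple naive Bayes text classifier, chosen here purely for illustration and not necessarily what any TAR product uses.  The seed documents, labels, and function names are hypothetical.

```python
import math
import re
from collections import Counter

LABELS = ("responsive", "non-responsive")


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(seed_set):
    """Build word counts per label from (text, label) pairs coded by a human reviewer.

    Assumes the seed set contains at least one document for each label.
    """
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in seed_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set().union(*word_counts.values())
    return word_counts, doc_counts, vocabulary


def predict(model, text):
    """Predict the label whose seed documents best match the words in the text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocabulary:
                # Add-one (Laplace) smoothing so unseen words do not zero out a label.
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical seed set coded by a reviewer.
seed_set = [
    ("Pricing agreement with the distributor for the new contract", "responsive"),
    ("Revised contract terms and pricing schedule attached", "responsive"),
    ("Lunch menu for the office party on Friday", "non-responsive"),
    ("Fantasy football league standings this week", "non-responsive"),
]
model = train(seed_set)
print(predict(model, "Please review the distributor contract pricing"))
print(predict(model, "Office party lunch on Friday"))
```

A seed set of four documents is far too small for real use; the point is only to show how reviewer coding becomes a model that scores documents no one has yet reviewed.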

However, TAR should not be applied blindly.  Rather, quality control and testing are essential to confirm the accuracy of decisions made by the software.  There are myriad ways to perform QC, including testing random samples of the predicted responsive set before production.  It should be noted that there is no standard measurement to validate the results of TAR.  Rather, validation is based upon reasonableness and proportionality considerations.
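
As one illustration of sample-based QC, the Python sketch below draws a random sample from the predicted responsive set and, once a reviewer has checked the sample, estimates the share that was truly responsive along with a rough margin of error.  The normal-approximation formula, the 95% z-value of 1.96, and the function names are assumptions chosen for this example; they are not a standard that any court or vendor prescribes.

```python
import math
import random


def sample_for_qc(doc_ids, sample_size, seed=0):
    """Draw a reproducible random sample of document ids for human QC review."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, min(sample_size, len(doc_ids)))


def estimate_precision(reviewer_calls, z=1.96):
    """Estimate the share of sampled documents the reviewer confirmed as responsive.

    reviewer_calls: list of booleans, True where the reviewer agreed with the software.
    Returns (estimate, margin_of_error) using a normal approximation.
    """
    n = len(reviewer_calls)
    if n == 0:
        raise ValueError("the QC sample is empty")
    p = sum(reviewer_calls) / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, margin


# Hypothetical example: 1,000 predicted-responsive documents, 100 sampled for QC,
# and the reviewer agrees with the software on 90 of them.
sample = sample_for_qc(list(range(1000)), 100)
precision, margin = estimate_precision([True] * 90 + [False] * 10)
print(len(sample), round(precision, 3), round(margin, 3))
```

Whether a given estimate and margin are good enough remains a judgment call about reasonableness and proportionality in the particular case.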

For those of you interested in learning more about TAR and best practices associated with TAR, consider reading the Technology Assisted Review Guidelines released by Duke Law School in February.

*The phrase “technology assisted review” can imply a meaning broader than used in this blog. For example, “TAR” could encompass non-predictive coding techniques such as “clustering.”

**Although software algorithms differ, most analyze the relationship between a document’s words, word order, characters, and repetitive text. This analysis, in turn, allows the software to compare one document to another.



De-duplication (“de-duping”) is the process of comparing electronic records based on their content and characteristics and removing duplicate records from the data set so that only one instance of an electronic record is produced when there are two or more identical copies.  De-duplicating a data set is a smart way to reduce volume and increase the efficiency of review.  There are three types of de-duplication: case, custodian, and production de-duplication.

Case de-duplication involves retaining only a single copy of each document per case, irrespective of custodian.  This is sometimes referred to as de-duplication across custodians.  For example, if an identical document resides with Mr. A, Ms. B and Miss C, only the first occurrence of the file (Mr. A’s) will be processed for review/production.  Assuming those same facts, if one were to apply custodian-level de-duplication (i.e., de-duplication within a custodian), the system would maintain one copy for each of Mr. A, Ms. B, and Miss C, or one copy per custodian.  Finally, if multiple copies of a document reside within the same production set, de-duplication at the production level ensures that only one of those documents is produced.
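
The difference between case-level and custodian-level de-duplication can be sketched in a few lines of Python.  The sketch assumes duplicates are identified by a SHA-256 hash of each file’s content, which is an illustrative choice; the record layout and function name are hypothetical, and processing tools may hash different fields (particularly for email).

```python
import hashlib


def deduplicate(records, scope="case"):
    """Keep the first occurrence of each identical document.

    records: list of dicts with "custodian" (str) and "content" (bytes).
    scope:   "case" de-duplicates across custodians; "custodian" within each custodian.
    """
    if scope not in ("case", "custodian"):
        raise ValueError("scope must be 'case' or 'custodian'")
    kept = {}
    for record in records:
        digest = hashlib.sha256(record["content"]).hexdigest()
        key = digest if scope == "case" else (record["custodian"], digest)
        if key not in kept:
            kept[key] = {
                "custodian": record["custodian"],
                "sha256": digest,
                "also_held_by": [],  # other custodians whose copies were suppressed
            }
        elif (record["custodian"] != kept[key]["custodian"]
              and record["custodian"] not in kept[key]["also_held_by"]):
            kept[key]["also_held_by"].append(record["custodian"])
    return list(kept.values())


# Hypothetical data: the same memo held by three custodians (twice by Mr. A).
records = [
    {"custodian": "Mr. A", "content": b"Q3 pricing memo"},
    {"custodian": "Ms. B", "content": b"Q3 pricing memo"},
    {"custodian": "Miss C", "content": b"Q3 pricing memo"},
    {"custodian": "Mr. A", "content": b"Q3 pricing memo"},
]

print(len(deduplicate(records, scope="case")))
print(len(deduplicate(records, scope="custodian")))
```

At the case level, the single surviving record keeps a note of the other custodians who held a copy, so that information is not lost when their duplicates are suppressed.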

De-duplication is an important step to implement because file systems can contain many copies of the same document.  For example, each time an email is sent, it typically creates two additional copies of the email and its attachments: one in the sender’s sent-items folder and one in the recipient’s inbox.  An email may also be sent to multiple recipients, thereby creating more copies.  To review each of these documents, code them consistently, and produce multiple copies of an identical document creates inefficiencies and avoidable costs.  And so, it is important to evaluate de-duplication efforts.


The Electronic Discovery Reference Model (EDRM) is a framework that outlines standards for the recovery and discovery of digital data.  An EDRM diagram created by Duke Law represents a conceptual view of the e-discovery process, which is not necessarily linear.  In fact, you may engage in some, but not all, of the steps identified in the diagram.  Or, you may engage in the steps in an order different from that outlined in the diagram.  The steps in the process include: information governance, identification, preservation, collection, processing, review, analysis, production, and presentation.

What exactly these steps entail is briefly set forth below:

  • Information governance – involves organizing and maintaining (or disposing of) your electronic data in such a way that risks and expenses are mitigated should a dispute, investigation or litigation arise (e.g., through data retention policies).
  • Identification – this step involves locating potential sources of ESI.
  • Preservation – simply stated, preservation means taking those steps necessary to ensure that potentially relevant ESI is protected from alteration or destruction.
  • Collection – involves gathering the potentially relevant ESI for purposes of reviewing, and possibly producing, it during the discovery process.
  • Processing – is a technical step in the process that involves converting the collected ESI to a format that can be reviewed and analyzed.
  • Review/Analysis – once the ESI is collected and processed, the data is promoted for review so that attorneys can evaluate the data for relevance and privilege.
  • Production – the provision of relevant, non-privileged ESI to your adversary (or the investigating body) during discovery.
  • Presentation – is the final step of displaying ESI to another (jury, judge, or expert) in the form of a demonstrative for the purpose of eliciting additional information or persuading an audience.

Experts in the field opine that the diagram is intended as a basis for discussion and analysis, not as a prescription for the only way to approach e-discovery.
