Chapman Law Review
SEARCHING FOR AN ANSWER: DEFENSIBLE E-DISCOVERY SEARCH TECHNIQUES IN THE ABSENCE OF JUDICIAL VOICE
Copyright (c) 2013 Chapman Law Review; Harrison M. Brown
“No longer can the time-honored cry of ‘fishing expedition’ serve to preclude a party from inquiring into the facts underlying his opponent’s case. Mutual knowledge of all the relevant facts gathered by both parties is essential to proper litigation.”
The past two decades have seen a widespread shift from physical information storage technologies to new digital information technologies, resulting in an exponential rise in the amount of information that is created, processed, and stored. This “inflationary dynamic” has caused written information to increase to never-before-seen levels, resulting in a new landscape that makes it prohibitively expensive, if not impossible, for litigation to carry on as it has until now.
“Today, most litigation includes electronically stored information (ESI) as a critical aspect of the discovery and production phase.” Because ESI is produced in such large quantities and the increase in ESI easily adds to the cost of review, manual or linear review has significantly decreased in e-discovery cases. In its place, attorneys have frequently used legacy search techniques, such as keyword searches, to filter data for producing responsive documents in discovery. These search methods, however, are not without their own problems and are increasingly coming under attack.
Instead, advanced automated search methods such as concept searching and predictive coding have emerged as efficient ways to comb ESI for responsive documents and are “more likely to produce the most comprehensive results.” Although progress has been made in recent years, many attorneys remain reluctant to move away from less reliable manual review and legacy search methods and embrace advanced search techniques; this is in part due to a lack of consensus on which particular technology should be used. While the bench is at times supportive of advanced search techniques, it has yet to expressly endorse one type.
Rather than wait for judicial approval of a particular kind of technology, which may not come, counsel should cooperate throughout the entire process of electronic discovery. Cooperating with opposing counsel in developing search protocols will help avoid disputes that may later arise about the appropriateness and sufficiency of search efforts taken by each party, which in turn will reduce discovery deficiencies. Developing and documenting a defensible search methodology prepares a party to defend the reasonableness of search protocols should a dispute arise and assures quality control in e-discovery.
Part I of this Note describes the modern information inflationary epoch and how traditional manual document review and production cannot keep pace with the demands inherent in this sea of change. Part II surveys institutional attempts to streamline e-discovery and investigates the efficacy of commonly used legacy search methodologies. Part III introduces two of the most promising alternative search techniques in practice today. Part IV examines recent case law and other authorities on whether e-discovery experts are needed to support a party’s search protocols. Lastly, Part V discusses steps parties can take to create defensible search protocols in light of the bench’s silence on its preferred search methodologies.
I. Manual Review–From Gold Standard to Obsolete
Although the way people communicated through written media remained unchanged for many years, the world has recently seen evolutionary changes in the way people write and communicate. This shift is primarily a result of the advent of the personal computer as well as the growth of interconnected global networks. Consequently, the total amount of written information has multiplied to previously unimaginable levels. This growth in volume has had a profound impact on litigation as “it places at severe risk the justice system’s ability to achieve the ‘just, speedy and inexpensive’ resolution of disputes, as contemplated by Rule 1 of the Federal Rules of Civil Procedure.” As such, manual review, once considered the “gold standard” of document review, is now infeasible and obsolete in an increasing number of cases.
A. Information Inflation
Information technology, simple and static for more than fifty centuries, has drastically changed in recent years as an evolution in writing resulted in information inflation. This is primarily attributable to the emergence of a “‘digital realm’ . . . created by an accretion of technological advances, each built on preceding advances.” These advances “include digitization; real time computing; the microprocessor; the personal computer, email; local and wide-area networks . . . the evolution of software . . . [and] the World Wide Web . . . .”
The past two decades have seen an exponential rise in the amount of information that is created, processed, and stored. “Computers have enabled the [large-scale] creation of  information . . . and unleashed an unprecedented deluge of data,” the results of which are staggering. In 2006 alone, the world “created, captured and replicated enough digital information to fill all of the books ever created in the world, 3 million times.” Society simply stores information in a profoundly different way than it did previously. Because of advances in technology and the integration of society into cyber-networks, the world has been forced to adapt to an ever-changing digital frontier.
In the legal world, the various types of discoverable materials in digital form are proliferating. ESI covers data similar to previous hard-copy documents, but also includes types that never existed in the pre-electronic world, such as e-mail messages. An estimated 247 billion e-mail messages were sent in 2009, a number expected to more than double by 2013. As of 2010, the average corporate worker sends and receives upwards of 110 e-mail messages per day. Other types of information now discoverable as ESI include “instant messaging, word processing with hyperlinks, integrated voice mail, . . . structured databases of all kinds, Web pages, blogs, and e-data in all conceivable forms.” With the types and volume of ESI continuing to expand to enormous levels, the use of manual review as a viable tool in litigation is seemingly in doubt.
B. Manual Review is Ill-Suited for Today’s Legal World
The traditional “discovery review process is poorly adapted to much of today’s litigation.” Manual review is being forced out of the litigation process as a result of time constraints and skyrocketing costs associated with the information inflation. With the amount of ESI in lawsuits expanding greatly, “[t]he cost of manual review . . . is prohibitive, often exceeding the damages at stake.” Moreover, large data sets often make it impossible to complete manual review in a timely manner. Lastly, the efficacy of manual review has been greatly called into question.
C. Manual Review Cannot Keep Up With the Demands of Modern Litigation
The huge volume of available ESI poses unique challenges–both in terms of cost and time to complete the review–which traditional document review simply cannot meet. Prior to the recent information inflation, complying with discovery requests evoked a familiar image of young attorneys wading through “mountains of boxes filled with dusty, poorly organized documents.” Confronted with such a task, the only practical action that could be taken was to read each document linearly, or in a serial fashion.
While the presence of hundreds of boxes of documents may have been concerning to young associates just a few years ago, today that same amount of data might be found on a single computer hard drive. Additionally, as the ability to create and store copious amounts of data rapidly increases, the cost to store that information falls. Consequently, “more individuals and companies are generating, receiving and storing more data, which means more information must be gathered, considered, reviewed and produced in litigation.” Whereas a small business may have once had a single file cabinet full of paper records, a typical small business today stores the digital equivalent of as many as 2,000 file cabinets.
Accordingly, manual review is becoming neither workable nor economically feasible. As the court remarked in Pension Committee v. Banc of America, we live in “an era where vast amounts of electronic information is available for review,” and therefore “discovery in certain cases has become increasingly complex and expensive.” E-discovery accounts for as much as 25% of the total cost of litigation, and the biggest single cost in the process is attorney review time of voluminous data. “[T]o the extent that a particular document is likely to be the object of a discovery request, it potentially can also represent a very real liability. The cost of collection, review and production often exceeds $2 per document–and corporations produce and store many billions of documents annually.” As such, it is not unusual for the cost of reviewing information to exceed the damages at stake, forcing companies to settle cases out of necessity, rather than based on the merits.
Moreover, large amounts of ESI make it impossible to meet the time constraints imposed in litigation. For example, it would take approximately fifty-four years to complete the review of a dispute with one billion e-mails, with one hundred reviewers working ten hours per day, seven days a week. Limiting review to just one percent of the total universe of documents would still take twenty-eight weeks to complete.
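The arithmetic behind these figures is easy to verify. The sketch below assumes a review rate of roughly fifty documents per reviewer-hour — a figure not stated in the text, but consistent with both numbers given above:

```python
# Back-of-the-envelope check of the review-time figures above.
# Assumption (not from the source): each reviewer reads ~50 documents per hour.
DOCS_PER_REVIEWER_HOUR = 50

def review_time_days(total_docs, reviewers=100, hours_per_day=10,
                     rate=DOCS_PER_REVIEWER_HOUR):
    """Days needed to review total_docs, working seven days a week."""
    docs_per_day = reviewers * hours_per_day * rate
    return total_docs / docs_per_day

full_review = review_time_days(1_000_000_000)   # one billion e-mails
one_percent = review_time_days(10_000_000)      # 1% of the collection

print(f"Full review: {full_review / 365:.1f} years")   # ~54.8 years
print(f"1% sample:   {one_percent / 7:.1f} weeks")     # ~28.6 weeks
```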
This scenario is increasingly becoming a reality, as seen recently in In re Fannie Mae Securities Litigation, where the U.S. Court of Appeals for the D.C. Circuit affirmed the district court’s order holding the Office of Federal Housing Enterprise Oversight (OFHEO)–the federal agency then charged with regulating Fannie Mae and Freddie Mac–in contempt for failing to comply with a discovery deadline to which it had agreed. In 2006, individual defendants who were former Fannie Mae executives subpoenaed thirty categories of documents from OFHEO, a nonparty to the litigation. In 2007, after OFHEO claimed that it had produced all the documents requested, the defendants conducted a Rule 30(b)(6) deposition of OFHEO and learned that the agency had failed to search all of its off-site records. When OFHEO then failed to produce additional documents, the individual defendants moved to hold it in contempt. After the contempt hearing began, the parties stipulated that OFHEO would continue to conduct searches and provide all responsive documents by January 2008.
The stipulated order required OFHEO to review approximately 660,000 documents. “OFHEO undertook extensive efforts to comply with the stipulated order, hiring [fifty] contract attorneys solely for that purpose. The total amount OFHEO spent on the individual defendants’ discovery requests eventually reached over $6 million, more than 9 percent of the agency’s entire annual budget.” Despite this, after moving for and receiving two extensions, OFHEO failed to meet the deadline. The district court granted the individual defendants’ renewed motions for contempt, finding that “OFHEO’s efforts at compliance were ‘not only legally insufficient, but too little too late.”’ The district court imposed sanctions on OFHEO, and the Court of Appeals upheld them.
Fannie Mae highlights the problem with manual review: parties using this method will have to commit time and resources that are simply not available. The volume and associated complexity in having to search through large amounts of ESI will only worsen as time goes on, and manual review is ill-equipped to confront the problem. As such, “automated search methods should be viewed as reasonable, valuable, and even necessary.”
II. The Myth of Manual Review as the Gold Standard in Discovery
Prior to the information inflation, manual review was long considered the “gold standard” in discovery. However, as discussed above, manual review is increasingly strained by the sheer amount of data typically generated and stored by almost every organization that uses computer technology. Even assuming, arguendo, that practitioners had the resources and time to undertake manual review of voluminous sets of ESI, studies demonstrate that manual review of large data sets is imprecise and fails to live up to its billing.
A widely cited study on the efficacy of manual review, conducted by David Blair and M.E. Maron in 1985, shows the problems inherent in the use of human language among the various persons who can be involved in a dispute, and how difficult it can be to take this into account in a search for informational records. The Blair and Maron study involved a manual review of about 40,000 documents spanning 350,000 pages of text captured in an IBM database to be used in a large corporate lawsuit. Attorneys collaborated with paralegal search specialists to find all of the relevant documents. The attorneys estimated that they had found 75.5% of the relevant documents; however, Blair and Maron’s more detailed analysis found that the actual recall value was 20.26%, meaning that the attorneys believed they were retrieving a far higher percentage of relevant documents than they actually were.
Blair and Maron found that the different parties in the case used different words in their search for relevant documents, depending on their point of view. For example, the attorneys representing “[t]hose who were personally involved in the event, and perhaps culpable, tended to refer to it euphemistically as, inter alia, an ‘unfortunate situation,’ or a ‘difficulty.”’ However, “[t]hose who discussed the event in a critical or accusatory way referred to it quite directly–as an ‘accident.”’
Blair and Maron also found that the efficacy of manual review is directly tied to the number of documents to be evaluated. Notably, they found that “the value of Recall decreases as the size of the database increases, or, from a different point of view, the amount of search effort required to obtain the same Recall level increases as the database increases, often at a faster rate than the increase in database size.” Thus, manual review is plagued not only by time and expense constraints, but becomes a less effective tool as document universes grow, making it ill-suited for modern litigation.
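The recall figure at the center of the Blair and Maron study is a simple ratio, and precision is its counterpart. A minimal sketch, using illustrative counts chosen only to reproduce the 20.26% figure:

```python
def recall(retrieved_relevant, total_relevant):
    """Fraction of all relevant documents the search actually found."""
    return retrieved_relevant / total_relevant

def precision(retrieved_relevant, total_retrieved):
    """Fraction of retrieved documents that are actually relevant."""
    return retrieved_relevant / total_retrieved

# Illustrative numbers only: if a collection holds 5,000 relevant documents
# and a search returns 2,000 documents, of which 1,013 are relevant:
print(f"recall:    {recall(1013, 5000):.2%}")     # 20.26%, the Blair-Maron figure
print(f"precision: {precision(1013, 2000):.2%}")  # 50.65%
```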
The information inflation we are experiencing as a result of the incorporation of technology and new tools in society presents new challenges to litigation. While technology may be the source of the problem in e-discovery, it also appears to be the best possible solution. As lawyers began to realize that manual review could not keep pace with the demands of e-discovery, the legal community began to collaborate and establish working models to assist in the discovery process. Attorneys also utilize tools like optical character recognition (OCR) technology to digitize paper documents in order to make use of search methodologies such as keyword and Boolean searches that increase efficiency and reduce costs.
A. Attempting to Bring Order to a Disorderly Problem
E-discovery experts and consultants George Socha and Tom Gelbmann co-founded the Electronic Discovery Reference Model (“EDRM”) in “May 2005 to address the lack of standards and guidelines in the e-discovery market.” Since then, “over 900 e-discovery experts, vendors, and end-users from more than 250 organizations have worked together to develop standards and frameworks for addressing e-discovery challenges.” By supplying guidelines, standards, whitepapers, research materials, webinars, news, data sheets and other items, the EDRM’s model has become “widely accepted and employed by most e-discovery specialists.”
EDRM’s nine-step flowchart is a conceptual, non-linear, and iterative model of the e-discovery process. The steps include: information management, identification, preservation, collection, processing, review, analysis, production, and presentation. Each step works toward the ultimate goal of translating an excessive volume of documents into relevant and usable material in litigation. The EDRM flowchart “illustrates how the volume of data decreases and the relevance increases as the work progresses.” The three steps of processing, reviewing, and analyzing are performed concurrently. While the initial steps of culling data and the final steps of incorporating those materials in a coherent way in litigation are important and present unique challenges in and of themselves, the middle three steps tend to be the areas in which the problems of the data explosion are most often felt and dealt with.
The goal of the processing step is to reduce the volume of ESI and convert it, if necessary, to forms more suitable for review and analysis. To achieve this, practitioners may “reduce the overall set of data collected by filtering out files that are duplicates or known to be irrelevant after further investigation.” Duplicate files are removed here, and “[f]iles that are probably not relevant because of factors such as date, type, or origin may also be excluded at this step, if they were not previously excluded” by technicians working during the first four steps. Hot files, or potentially adverse or embarrassing materials, may also be flagged at this stage, as they might have an immediate impact on litigation or make finding similar relevant materials among the remaining files easier.
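Duplicate removal at the processing step is commonly accomplished by comparing content hashes of each file. The sketch below illustrates the idea with standard cryptographic hashing; the file names and contents are hypothetical, and commercial processing tools implement far more elaborate near-duplicate detection:

```python
import hashlib

def dedupe(files):
    """Keep one copy of each distinct file, comparing by content hash."""
    seen, unique = set(), []
    for name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:       # first time we see this exact content
            seen.add(digest)
            unique.append(name)
    return unique

collection = [
    ("memo_v1.txt", b"Q3 projections attached."),
    ("memo_copy.txt", b"Q3 projections attached."),     # exact duplicate
    ("memo_v2.txt", b"Revised Q3 projections attached."),
]
print(dedupe(collection))  # ['memo_v1.txt', 'memo_v2.txt']
```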
In processing, practitioners are encouraged to “consider the relationships between the files or documents obtained to better understand what data has been collected and determine whether additional data extraction may be required.” The processing step is presented in a linear fashion: moving from assessing to preparing data, to selecting and normalizing, to validating output and exception handling, and then to preparing output and export of the data. Employment of this step is intended to be on an “iterative basis,” which means that practitioners often have to make changes to prior tasks and do them again.
The seventh step is the analysis stage. Once relevant materials have been identified, this is the stage where litigation teams attempt to make heads or tails of the information they have, hoping to make informed decisions about strategy and scope through reliable methods based on verified data. Here, litigators identify information such as “key issues, witnesses, specific vocabulary and jargon, and important individual documents.” Of course, “[t]his is a traditional legal step that competent trial lawyers are already qualified to perform.” Analysis becomes uniquely challenging when large quantities of ESI are produced.
Accordingly, when dealing with large quantities of ESI, the review step becomes the most important and the most difficult. The review step is the point where ESI collected in the previous stages is studied and sorted for use in the latter stages. Here, practitioners “review for relevance, confidentiality and privilege, and related activities such as redaction.”
Litigation is made more difficult today by the gigantic hurdles that must be overcome in the document review stage. As practitioners become consumed by this process and expend copious resources to identify usable materials, review becomes an end, rather than a means, to arrive at a legal solution that all parties can agree upon. However, the EDRM model provides a good starting point and has at least alleviated some of the problems caused by large data sets.
B. Legacy Search and Identification
Although the use of technology in the search and identification phase is not mandated by any court rules, technology is practically required to reduce the amount of manual effort, time, and expense involved in searching for and identifying potentially relevant ESI. Legacy search and identification techniques represent some of the first attempts to harness technology in order to manage large sets of ESI, and are the most widely used today.
C. Keyword Search Models
Keyword and Boolean search methods are widely used and vetted techniques for filtering data in order to produce responsive documents in discovery. This has much to do with the legal profession’s longtime familiarity with major internet legal retrieval services that allow for searches of databases containing statutes and case precedents. However, as recent cases and studies have shown, there are pitfalls to using this technique as it often fails to uncover a large portion of potentially relevant data.
A keyword search, in its simplest form, searches for documents possessing a specific term specified by a user. Keyword searches are most often used to identify documents that are either responsive or privileged, and for large-scale culling and filtering of documents. There are limitations with basic keyword searches, however, as they can fail to uncover variants of a word and will not find documents with typographical errors or misspelled words in either the document or query. To address some of the limitations of keyword searches, many databases allow for the use of “wildcards” that enable a user to search for different forms of a certain word.
Boolean searches add another dimension to keyword searches, allowing users to search for multiple keywords together, or exclusive of each other, or within a certain distance from each other. This method “allows multiple keywords or search terms to be linked together to improve the relevancy of the documents identified by this methodology.” Other operators include fuzzy searching, which can find misspelled terms, and stemming, which searches for variations of word endings.
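The mechanics of these operators can be illustrated with ordinary regular expressions. The three toy documents below echo the Blair and Maron vocabulary problem; real review platforms implement far richer query languages than this sketch:

```python
import re

docs = {
    1: "The accident occurred on the loading dock.",
    2: "An unfortunate situation arose at the dock.",
    3: "Accidental damage was reported near the dock.",
}

def keyword(term):
    """Exact-term keyword search (case-insensitive, whole word)."""
    return {i for i, t in docs.items() if re.search(rf"\b{term}\b", t, re.I)}

def wildcard(stem):
    """Trailing-wildcard search: matches any word beginning with `stem`."""
    return {i for i, t in docs.items() if re.search(rf"\b{stem}\w*", t, re.I)}

print(keyword("accident"))                         # {1} -- misses the variant in doc 3
print(wildcard("accident"))                        # {1, 3} -- still misses doc 2
print(keyword("accident") | keyword("situation"))  # Boolean OR: {1, 2}
```

Note how the exact keyword search misses both the morphological variant (“Accidental”) and the euphemism (“unfortunate situation”); the wildcard cures only the former.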
D. E-mail or Conversation Threading
The goal of e-mail or conversation threading is “to find and organize messages that should be grouped together based on reply and forwarding relationships.” Typically, an e-mail thread will link together a series of e-mail responses and/or forwards that are created from an original message. This technique may be useful if a particular topic is potentially relevant because responses or forwards of the original message may also contain relevant data. This method is limited, however, when a response to a message changes the subject heading or adds additional information.
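Threading engines typically rely on message headers such as In-Reply-To, which link a reply to its parent’s Message-ID. The sketch below assumes such headers are available (production systems also fall back on subject-line and content matching); the mailbox data is hypothetical:

```python
def build_threads(messages):
    """Group messages into threads by walking In-Reply-To links to the root."""
    parent = {m["id"]: m.get("in_reply_to") for m in messages}

    def root(mid):
        while parent.get(mid):       # climb until a message with no parent
            mid = parent[mid]
        return mid

    threads = {}
    for m in messages:
        threads.setdefault(root(m["id"]), []).append(m["id"])
    return threads

mailbox = [
    {"id": "a1", "in_reply_to": None},    # original message
    {"id": "b2", "in_reply_to": "a1"},    # reply
    {"id": "c3", "in_reply_to": "b2"},    # reply to the reply
    {"id": "d4", "in_reply_to": None},    # unrelated message
]
print(build_threads(mailbox))  # {'a1': ['a1', 'b2', 'c3'], 'd4': ['d4']}
```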
E. Shortcomings of Legacy Searches and the Need for Alternatives
Practitioners have adopted legacy search methodologies in earnest, particularly keyword and Boolean searches, to address the expanding universe of information and its associated problems. Courts accept the use of keyword searching to “define discovery parameters and resolve discovery disputes.” Despite the widespread use of these techniques, keyword searches, like manual review, can be surprisingly inaccurate.
As noted above, the Blair and Maron study revealed a significant disconnect between lawyers’ perceptions of their ability to ferret out relevant documents and their actual ability to do so. New research reaffirms the findings of Blair and Maron as applied to keyword searches. In one such study conducted by the Text Retrieval Conference (TREC), researchers found that Boolean keyword searches could locate only between 24% and 57% of the total number of relevant documents. Additionally, these searches produce many false positives, and it is not uncommon for a poorly chosen keyword to return more “junk” than responsive documents.
Not only are these search methodologies inaccurate, but the adversarial manner by which attorneys use them increases the likelihood that the search will fall short of its target. In an interesting analogy, the method by which most attorneys employ legacy search techniques is similar to the children’s game of “Go Fish.” When a party requests ESI, the responding party is entitled to privacy and does not have to grant unfettered access to its document database. Yet at the same time, the requesting party is able to make requests for production without revealing what it is that they are looking for. Absent cooperation, the requesting party guesses which keywords might produce evidence to support its case without having much, if any, knowledge of the responding party’s “cards,” or the terminology used by the responding party’s custodians. “This process involves as much chance as skill,” takes too long, produces a vast quantity of false positives, and misses many relevant documents.
III. Alternative Search Technologies
Cognizant of the fact that manual review is unworkable and that legacy search methodologies are broken, the Sedona Conference acknowledged that “[t]he legal profession is at a crossroads: the choice is between continuing to conduct discovery as it has ‘always been practiced’ . . . or, alternatively, embracing new ways of thinking in today’s digital world.” Indeed, lawyers are gradually beginning to use alternative forms of review with promising results. At the same time, studies demonstrate that these methods, such as concept searching and predictive coding, are able to achieve increasingly higher levels of recall and precision. Moreover, courts are beginning to take notice of the potential these new methods have to offer. While technology is the source of many of the problems with e-discovery today, technology also represents the solution.
A. Concept Searching
Concept searching allows users to “specify a concept and documents that describe that concept to be returned as the search results.” This technique examines the context in which a term appears and looks for similar terms or concepts–a method that is particularly useful in identifying “potentially relevant documents when a set of keywords are not known in advance.” When conducted in tandem with legacy search methods such as keyword and Boolean searches, the chance of finding relevant ESI greatly increases.
Concept searches have gained the attention of the courts as well, as seen in Disability Rights Council of Greater Washington v. Washington Metropolitan Transit Authority (WMATA), a case involving a claim by disabled persons that the WMATA violated the Americans with Disabilities Act and other federal laws. WMATA used an e-mail program that automatically deleted all non-archived e-mail messages every sixty days, and it failed to suspend the deletion program until more than two years after the original complaint was filed. Plaintiffs sought restoration and review of backup tapes to find relevant deleted messages, but WMATA objected, arguing that the backup tapes were not reasonably accessible. The court, however, found support for the plaintiffs’ request, determining that the benefit of production outweighed the burden to WMATA, and subsequently ordered the restoration and search of the backups according to a protocol that the parties were directed to negotiate. In doing so, the court suggested that the parties consider using concept searching as opposed to other methods.
B. Predictive Coding
Another automated search method that has recently gained attention is predictive coding, or computer-assisted coding. In predictive coding, a human reviewer first codes a sample set of documents, and the computer system then uses those coded documents in an iterative process to code additional documents across the full collection. This process merely accelerates the discovery process; it does not replace manual review by humans, but optimizes it.
The reviewing human typically codes a controlled sample group of documents based on a series of “yes” or “no” questions, such as whether each document is responsive, relevant, or privileged. “The system builds an ontology in the background as it learns from the expert and presents subsequent samples.” After running enough iterations, the system will have “sufficiently built the ontology to the point where it can ‘predict’ what the human will” pick out in the sample he or she is reviewing. Considering that manual review can be effective in small samples, predictive coding efficiently combines a human’s analytical assessments with the processing power of a computer.
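The learn-from-coded-samples loop can be reduced to a toy sketch. The per-word weighting scheme below is purely illustrative — commercial predictive-coding systems build far more sophisticated statistical models — but it captures the idea of the system generalizing from a human’s “yes”/“no” decisions:

```python
from collections import Counter

def tokens(text):
    return text.lower().split()

def train(coded):
    """Learn per-word relevance weights from human yes/no coding decisions."""
    rel, irr = Counter(), Counter()
    for text, relevant in coded:
        (rel if relevant else irr).update(tokens(text))
    vocab = set(rel) | set(irr)
    # add-one smoothing so unseen words score neutrally
    return {w: (rel[w] + 1) / (irr[w] + 1) for w in vocab}

def predict(weights, text):
    """Score a document; above 1.0 it looks more like the relevant samples."""
    score = 1.0
    for w in tokens(text):
        score *= weights.get(w, 1.0)
    return score > 1.0

# The human expert codes a small seed sample ...
seed = [
    ("visa request expedited for a fee", True),
    ("quarterly cafeteria menu update", False),
]
weights = train(seed)
# ... and the system predicts coding for the rest of the collection.
print(predict(weights, "expedited visa processing fee"))  # True
print(predict(weights, "new cafeteria menu"))             # False
```

In practice the loop repeats: the human codes further samples drawn from the machine’s predictions, the model retrains, and the iterations continue until the predictions stabilize.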
As mentioned above, the results of the TREC 2008 Legal Interactive Task study suggest that predictive coding may in fact be able to improve upon manual review and legacy search methods. One participant in the study employed predictive coding in response to a mock request to produce documents from a collection of 6,910,192 documents. By coding a smaller sample of documents and inputting them into the computer, the researchers examined only 7,992 documents, approximately 860 times fewer than would have been necessary to complete an exhaustive manual review. Still, the results compared favorably to the other search methods, as the researchers achieved recall rates ranging between 62.4% and 81.0%, far exceeding the 20.26% average recall rate in the Blair and Maron study.
IV. Recent Case Law on Reasonable Search Protocols
Courts have yet to embrace any of the new search technologies, instead only generally alluding to potential benefits they offer, but not going so far as to expressly endorse a particular method. In the meantime, lawyers must still meet their discovery obligations and defend the decisions they made when challenged on their selection of relevant materials. While keyword searching may be the most widely available and employed option, it is still quite possible to construct an inadequate keyword or Boolean search.
Until only recently, few cases offered guidance on the reasonableness of electronic searches in e-discovery. In 2006, a set of amendments to the Federal Rules of Civil Procedure, sometimes known as the “ESI Amendments,” took effect; however, these did not mention the use of electronic searches. Decisions regarding manual review offered only nominal guidance since they did not address the technological complexities of electronic searches.
Before 2007, the Sedona Conference’s Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery was the only significant resource concerning the reasonableness of e-discovery search methods. The commentary’s goal was to provide a guide on the “nature of the search and retrieval process.” However, while the commentary discusses keyword searches as a useful method of finding particular documents, it notes their shortcomings in certain contexts and does not suggest a particular alternative search method.
More recently, however, a few opinions have attempted to provide guidance on what methods constitute reasonable searches. United States v. O’Keefe, Equity Analytics, LLC v. Lundin, and Victor Stanley, Inc. v. Creative Pipe, Inc. all suggest that keyword searching may not be sufficient. Moreover, Victor Stanley goes on to trumpet alternative search methods in certain circumstances, but does not expressly endorse a preferred technique. Common throughout all three cases is the requirement that attorneys be prepared to defend their search methods if challenged, and that such preparation may involve the use of a technical expert, or at least someone with the qualifications needed to design and implement an effective search methodology.
A. United States v. O’Keefe
O’Keefe suggests that expert evidence may be required to evaluate the efficacy of a keyword search in identifying responsive documents. In O’Keefe, the court found a number of inadequacies in the government’s search for records and concluded that it had failed to comply with a discovery order. Despite this, the court rejected the defendants’ argument regarding the adequacy of the search terms used by the government, holding that the defendants would have to specifically contend that the search terms were insufficient in a separate motion to compel, based on evidence meeting the requirements of Rule 702 of the Federal Rules of Evidence.
Defendant O’Keefe, a Department of State employee, was indicted for allegedly receiving gifts from co-defendant Agrawal in return for expediting visa requests for employees of Agrawal’s company. Whether such requests were expedited routinely by various consulates without receipt of anything of value became an issue, and the court ordered the government to search both its hard copy and electronic files for responsive documents.
After receiving the government’s production, the defendants filed a motion to compel, protesting that the government had not met the judge’s order. The defendants expressed concern that the government had not had its employees search their own electronically stored information for documents, making it “impossible to identify the source or custodian of [each] document.” Moreover, they contended that the government had not revealed what steps it had taken to preserve documents.
The court concluded that the defendants’ concern over deficiencies in the government’s production of electronically stored information was “an insufficient premise for judicial action.” By analogy, Rule 37(e) of the Federal Rules of Civil Procedure provided that sanctions were inappropriate if loss of such information was the result of the “routine, good-faith operation of an electronic information system.” Thus, if the defendants intended to charge that the government destroyed evidence that should have been preserved, the claim would have to be based on direct evidence. It would not be enough to surmise that they should have received more than they did.
The court also held that any contention of the defendants that search terms used by the government were ineffective would have to be made in a motion to compel supported with expert testimony pursuant to Rule 702 of the Federal Rules of Evidence. The sufficiency of search terms was “a complicated question involving the interplay, at least, of the sciences of computer technology, statistics and linguistics.” The court also cited the Sedona Conference’s Best Practices Commentary and noted the limitations of keyword searches, but went on to explain that evaluating particular search methodologies is not easy:
[F]or lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread. This topic is clearly beyond the ken of a layman and requires that any such conclusion be based on evidence that, for example, meets the criteria of Rule 702 of the Federal Rules of Evidence.
B. Equity Analytics, LLC v. Lundin
Similarly, the Equity Analytics court suggests that Rule 702 expert evidence may be required to evaluate the methods employed to collect documents. The court in Equity Analytics stated that determining whether “a particular search methodology, such as keywords, will or will not be effective certainly” requires “knowledge beyond the ken of a lay person (and a lay lawyer) and requires expert testimony that meets the requirements of Rule 702 of the Federal Rules of Evidence.”
In Equity Analytics, the court was asked to resolve the dispute between the parties in their attempt to develop a search protocol for examination of the defendant’s computer. The plaintiff alleged that the defendant had gained illegal access to the plaintiff’s electronically stored information after the defendant was fired by the plaintiff. The defendant’s computer contained a wide range of materials, many having nothing to do with the lawsuit. The defendant opposed production of the data and proposed that only certain file types be searched but the plaintiff objected.
The court declined to determine whether the proposed search was adequate based on the arguments of the attorneys alone, instead requiring the plaintiff to submit an affidavit from an expert explaining why the narrow search proposed by the defendant was not enough. The court reasoned that such expert testimony would provide the information needed to best assess how to balance the plaintiff’s need for information with the privacy of the defendant.
These two cases suggest that expert evidence may be required to assess the searches, and that experts may be needed prior to and during litigation to design search techniques to ensure that the searches will be defensible.
C. Victor Stanley, Inc. v. Creative Pipe, Inc.
Like O’Keefe and Equity Analytics, Victor Stanley requires qualified persons to craft an effective search methodology, but it does not go so far as to require an expert. However, through discussion of electronic searches, the court offers a more practical standard for assessing search protocols.
The issue in Victor Stanley was whether the defendants had waived attorney-client privilege for documents that counsel had accidentally produced. The defendants had used keyword searches to identify non-privileged documents. One of the defendants, and two of the defendants’ lawyers, chose about seventy keywords for their expert to use in searching for protected documents in the defendants’ ESI through use of a search protocol agreed upon with the plaintiff. They did not, however, manually review any of the results of that search for privilege. In addition, the defendants did a manual privilege review of the titles of some documents that were reportedly not text-searchable. Despite the expert’s search for protected documents, the plaintiff alerted the defendants to documents in the production that appeared to be privileged or protected. The defendants sought the return of these documents, but the plaintiff countered that the defendant had waived privilege.
Since the case was decided before Rule 502 of the Federal Rules of Evidence was adopted, the court used a five-factor test from McCafferty’s, Inc. v. Bank of Glen Burnie to evaluate whether the defendant had waived privilege. The factors in McCafferty’s include: “(1) the reasonableness of the precautions taken to prevent inadvertent disclosure, (2) the number of inadvertent disclosures, (3) the extent of the disclosure, (4) any delay in measures taken to rectify the disclosure, and (5) overriding interests in justice.” Of particular note in Victor Stanley is the reasonableness factor, which is similar to the requirement in Rule 502(b)(2) of the Federal Rules of Evidence, which requires analysis of whether “the holder of the privilege or protection took reasonable steps to prevent disclosure” in assessing whether a disclosure results in waiver.
Victor Stanley went on to hold that the defendants had waived privilege, finding they had failed to meet their burden to establish that their search was satisfactory because of their failure to identify the keywords they used to conduct the searches, to explain why they chose the keywords, and to explain what type of search was done. The court spent considerable space discussing this latter failure, stating that “for the benefit of future cases,” parties should state the procedures they follow in the process of conducting searches, and the court then further provided a lengthy footnote summarizing search methodologies discussed in the Sedona Conference’s Best Practices Commentary.
One aspect of the defendants’ failure of proof was that they did not show how the defendants and their attorneys were qualified to design the search that they used and analyze the results of the search to assess its reliability, appropriateness, and implementation. The court observed that when it comes to keyword searches, the “proper selection and implementation obviously involves technical, if not scientific knowledge.” Victor Stanley does not go as far as O’Keefe and Equity Analytics in suggesting that the person who makes a search protocol must be an expert under Rule 702 of the Federal Rules of Evidence, but Victor Stanley nevertheless holds that for a contested search to withstand judicial scrutiny, a party must be able to justify the steps it undertook.
V. Moving Forward
E-discovery decisions should always be based on honoring the goal of Rule 1 of the Federal Rules of Civil Procedure: “the just, speedy, and inexpensive determination of every action and proceeding.” One way to accomplish this is by adopting advanced search methodologies. While advanced search techniques are becoming more ubiquitous, progress remains slow.
Litigators may accept simple keyword searching, yet be reluctant to use alternative search techniques. They may not be convinced that the chosen method would withstand a court challenge. They may perceive a risk that problem documents will not be found despite the additional effort, and an opposite risk that documents that would otherwise be picked up in a straight keyword search might be missed.
Compounding this problem, however, is the lack of express judicial approval for these search technologies. For example, to date, no reported case, federal or state, has ruled on the use of predictive coding. It is possible that many attorneys are reluctant to act as the proverbial “guinea pig[s],” waiting for official guidance on how to proceed in these types of searches first. Magistrate Judge Andrew Peck pondered this issue in a recent editorial, offering the following:
Perhaps they are looking for an opinion concluding that: “It is the opinion of this court that the use of predictive coding is a proper and acceptable means of conducting searches under the Federal Rules of Civil Procedure, and furthermore that the software provided for this purpose by [insert name of your favorite vendor] is the software of choice in this court.” If so, it will be a long wait.
Aside from possible breaches of judicial ethical rules, there are presumably various reasons for this. As the Sedona Conference’s Practice Point 3 states, “[t]he choice of a specific search and retrieval method will be highly dependent on the specific legal context in which it is to be employed.” Formal support for a particular search technique is an impractical one-size-fits-all approach that ignores variables that change from case to case, including how the search application was used, by whom, the type of case, alternatives that were or should have been considered, and cost.
This is not to say that the bench does not support the use of innovative search methodologies in discovery; in fact, the reality is quite the opposite. Judge Peck himself expresses support for judicial decisions critiquing keyword searches, particularly O’Keefe, Equity Analytics, and Victor Stanley.
In William A. Gross Construction Associates, Inc. v. American Manufacturers Mutual Insurance Co., Judge Peck notably issues “a wake-up call to the Bar in this District about the need for careful thought, quality control, testing, and cooperation with opposing counsel in designing search terms or ‘keywords’ to be used to produce emails or other electronically stored information.” The problem in William Gross was not the keyword technology that was used, but the failure of the parties to come to an agreement on a list of keywords. When the responding party deployed overbroad and imprecise keyword search terms to respond to a discovery request, Judge Peck bemoaned the case as “the latest example of lawyers designing keyword searches in the dark, by the seat of the pants, without adequate . . . discussion with those who wrote the emails.”
The court ordered a multi-step framework that the litigators must use when selecting a keyword search strategy. Judge Peck ordered that the attorneys “at a minimum must carefully craft the appropriate keywords, with input from the ESI’s custodians as to the words and abbreviations they use, and the proposed methodology must be quality control tested to assure accuracy in retrieval and elimination of ‘false positives.”’
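The quality-control testing Judge Peck ordered can be illustrated with a minimal sketch. The documents, keywords, and reviewer determinations below are entirely hypothetical, chosen only to show how a coded sample can surface false positives and missed responsive documents before a keyword list is deployed against a full collection:

```python
import re

# Hypothetical labeled sample: (document text, reviewer's responsiveness call).
# In practice the sample would be drawn from the actual collection and coded
# by attorneys with input from the ESI's custodians.
sample = [
    ("Please expedite the visa request for our engineer.", True),
    ("Lunch on Friday? The visa card bill arrived.",       False),
    ("Consulate confirmed the expedited appointment.",     True),
    ("Quarterly expense report attached.",                 False),
]

keywords = ["visa", "expedite", "consulate"]  # hypothetical proposed terms

def hits(text, terms):
    """True if any keyword appears as a whole-word prefix (case-insensitive)."""
    return any(re.search(rf"\b{re.escape(t)}\w*\b", text, re.I) for t in terms)

retrieved = [(text, label) for text, label in sample if hits(text, keywords)]
false_positives = sum(1 for _, label in retrieved if not label)
missed = sum(1 for text, label in sample if label and not hits(text, keywords))

print(f"retrieved={len(retrieved)}, "
      f"false positives={false_positives}, missed responsive={missed}")
```

A false-positive count that is high relative to the sample, or any missed responsive documents, would signal that the terms need another iteration before being run against the full collection.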
Judge Peck’s opinion in William Gross demonstrates how courts are developing factors to assess the reasonableness of a litigant’s search methodology on a case-by-case basis rather than assessing search methodologies individually and out of context. The two most important factors are “cooperation between opposing counsel and transparency in all aspects of preservation and production of ESI.”
VI. Cooperation is Key
Cooperation is being touted by an increasing number of courts as an effective way to reduce the costs and risks of e-discovery. As the court in Mancia v. Mayflower Textile Services Company explains, cooperation among counsel “will almost certainly result in having to produce less discovery, at lower cost . . . [and] will almost certainly result in getting helpful information more quickly” for the requesting parties.
Parties should attempt to cooperate with opposing counsel to agree on a discovery plan that sets forth specific protocols for identifying responsive and privileged documents. Courts are just as quick to reward parties that cooperate as they are to punish those that do not.
In terms of developing search protocols, a party’s failure to cooperate can have dramatic effects beyond driving up the cost of litigation and overburdening the justice system. For example, the court in William A. Gross did not willingly decide to order its own search protocol; instead, the court opined that the parties’ inability to agree put the court “in the uncomfortable position of having to craft a keyword search methodology for the parties, without adequate information from the parties.” Moreover, a court may even be motivated to shift discovery costs to uncooperative parties.
Perhaps the benefits of cooperating are best realized if parties are able to work together from the outset, as this may decrease the chance that a dispute about the search efforts taken by each party will later develop. With regard to search terms protocols, parties can further this goal by collaborating on which search process to use, the terms to be used in that process, and by agreeing to participate in an iterative process where successive searches can be modified and improved upon. On a more fundamental level, parties are encouraged to come to the table armed with knowledge of likely sources of ESI, its custodians, and understanding of the steps and costs required to access the ESI. A party’s own preparation in this area can help facilitate cooperation and smooth discovery.
VII. Quality Control of Defensible Search Protocols
Despite its strong support for advanced electronic search tools, the Sedona Conference notes that “[t]echnologically advanced tools, however, ‘cutting edge’ they may be, will not yield a successful outcome unless their use is driven by people who understand the circumstances and requirements of the case, as guided by thoughtful and well-defined methods, and unless their results are measured for accuracy.” This underscores the importance of strategically planning, documenting, and supervising the entire e-discovery process. Also, as the Sedona Conference makes clear, “parties should expect that their choice of search methodology will need to be explained . . . in subsequent legal contexts (including depositions, evidentiary proceedings, and trials).”
A party should be ready to place its discovery plan’s effectiveness on the line by including a method for testing and assessing the effectiveness of its search protocols, evaluating recall and precision rates by sampling purportedly nonresponsive documents, documents reviewed during the primary review phase, or both. If parties wish, they may also employ third-party professionals to sample the effectiveness of a set of search protocols. As discussed above, parties should consider retaining experts to develop, execute, and defend a protocol when appropriate.
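The recall and precision arithmetic behind such sampling can be sketched as follows. Every count and set size here is hypothetical, chosen only to show the calculation; in practice the sample sizes would be set with statistical advice from an expert:

```python
# Precision = responsive documents retrieved / all documents retrieved.
# Recall    = responsive documents retrieved / all responsive documents.

retrieved_sample_size = 400   # documents sampled from the retrieved set
retrieved_responsive = 320    # of those, coded responsive on manual review

discard_sample_size = 1000    # documents sampled from the "nonresponsive" pile
discard_responsive = 20       # responsive documents found hiding in the discards

precision = retrieved_responsive / retrieved_sample_size

# Estimate population totals by scaling the sample rates to the full sets.
retrieved_total, discard_total = 50_000, 200_000  # hypothetical set sizes
est_found = retrieved_total * (retrieved_responsive / retrieved_sample_size)
est_missed = discard_total * (discard_responsive / discard_sample_size)

recall = est_found / (est_found + est_missed)

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this hypothetical, a 2% responsiveness rate in the discard pile still translates into thousands of missed documents at scale, which is precisely why courts and the Sedona Conference urge parties to measure their results rather than assume them.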
Aside from enabling a party to more adequately defend itself should a dispute over discovery arise, such practices promote self-policing and quality control. In doing so, parties will more reliably know at the end of the discovery stage how accurate and complete their methods were, and will not be left to question whether they violated the duty to preserve, uncover, or disclose relevant evidence, or whether privileged or confidential information may have been inadvertently produced.
The message to be taken from O’Keefe, Equity Analytics, and Victor Stanley is clear: when a party decides to use a particular ESI search method, it needs to be aware of the intricacies of its own storage system and craft a discovery plan accordingly. Should an opposing party challenge the method selected, the discovery proponent should then expect to support its position with this information, perhaps with the assistance of discovery experts.
The information inflation shows no signs of slowing down. Parties to litigation have the choice of confronting this problem head on by embracing newer and more technologically advanced search methodologies, or proceeding at their own risk as they have before. In doing so, they face rising expenditures of time and money because their search and retrieval method is unlikely to be the most efficient or reliable possibility. Regardless of which method they choose to adopt, parties should unquestionably engage in cooperative efforts to arrive at agreeable search protocols, and develop and document thoughtful discovery plans from the ground up so as to best defend their own discovery practices and decisions.
Postscript/Update to E-Discovery Note
As discussed above, in October 2011, Magistrate Judge Andrew Peck suggested that despite a number of judicial opinions highly critical of keyword searching, one reason many attorneys have been slow to adopt new search technology is that they apparently “are waiting for a judicial decision approving of computer-assisted review. . . . If so, it will be a long wait.”
Interestingly, this “long wait” turned out to be just over four months. In February 2012, Judge Peck issued an opinion approving the use of predictive coding. In doing so, Judge Peck specifically noted that his opinion in Da Silva Moore v. Publicis Groupe “appears to be the first in which a Court has approved of the use of computer-assisted review.”
In his October 2011 editorial, Judge Peck set forth guidelines for handling discovery challenges to any proposed use of computer-assisted review that came before him. In this situation, Judge Peck stated that he would pay close attention to the process and results of the search:
[I]f the use of predictive coding is challenged in a case before me, I will want to know what was done and why that produced defensible results. I may be less interested in the science behind the ‘black box’ of the vendor’s software than in whether it produced responsive documents with reasonably high recall and high precision . . . . That may mean allowing the requesting party to see the documents that were used to train the computer-assisted coding system . . . . Proof of a valid ‘process,’ including quality control testing, also will be important.
Judge Peck’s opinion in Da Silva Moore closely mirrored the reasoning he set forth in the editorial. In Da Silva Moore, a Title VII action, the plaintiffs objected to defendant MSLGroup’s use of predictive coding “to cull down” over three million documents involved in discovery. The parties agreed to use computer-assisted review but disagreed over how it should be implemented; the plaintiffs claimed that MSL’s proposal, which included a number of rounds to test and refine the searches and review software and the sharing of seed documents and documents flagged as relevant or irrelevant, was not reliable or transparent.
Noting his earlier writings to the contrary, Judge Peck plainly held that “[t]his judicial opinion now recognizes that computer-assisted review is an acceptable way to search for relevant ESI . . . .” However, Judge Peck cautioned that this does not mean computer-assisted review should be used in all cases, or that the exact ESI protocol approved in Da Silva Moore will be appropriate in all future cases that utilize computer-assisted review. Rather, he noted “computer-assisted review is not a magic, Staples-Easy-Button, solution appropriate for all cases.” While admitting that it is “not perfect,” Judge Peck determined that computer-assisted review was better than the alternatives in the case at bar. Judge Peck further encouraged parties to “seriously consider [computer-assisted review] for use in large-data-volume cases where it may save the producing party (or both parties) significant amounts of legal fees in document review.”
With Da Silva Moore leading the way, a number of other courts have quickly followed suit and have begun to entertain predictive coding as a viable tool in discovery. In the plaintiffs’ challenge to Da Silva Moore, Judge Andrew Carter approved Judge Peck’s ruling and written order supporting computer-assisted review. Furthermore, a Virginia state court approved a computer-assisted review protocol proposed by the defendants in their protective order for purposes of processing and producing ESI. Yet another court criticized the shortcomings of keyword searches and endorsed predictive coding to “allow humans to teach computers what documents are and are not responsive to a particular FOIA or discovery request and . . . [to] significantly increase the effectiveness and efficiency of searches.” Finally, in discussing a scheduling order from the Delaware Court of Chancery, a judge even instructed the parties, without any outside cues, to adopt a predictive coding strategy or demonstrate good cause to avoid it.
In addition to merely lending judicial legitimacy to computer-assisted review, the trend affirms this Note’s emphasis on cooperation between parties and transparency in all aspects of preservation and production of ESI. Citing the Sedona Conference, Judge Peck reiterated in Da Silva Moore that “‘the best solution in the entire area of electronic discovery is cooperation among counsel.”’ One reason why Judge Peck ordered computer-assisted review protocols was that MSL’s “transparency allow[ed] the opposing counsel (and the Court) to be more comfortable with computer-assisted review, reducing fears about the so-called ‘black box’ of the technology.” In upholding Judge Peck’s order, Judge Carter further trumpeted cooperation and transparency as key ingredients in computer-assisted discovery, stating that since the “ESI protocol . . . builds in levels of participation by Plaintiffs,” the plaintiffs will have opportunity to shape the process and thus ensure it meets their needs. Furthermore, in resolving a dispute surrounding a party’s interrogatories and document requests, another court channeled the principles of cooperation of the Sedona Conference, urging counsel not to “confuse advocacy with adversarial conduct” in addressing discovery obligations. Additionally, the use of experts is cited as a valuable tool to evaluate the efficacy of a search protocol in furtherance of these efforts.
With support for technology-assisted review gaining momentum among the judiciary, parties can better position themselves to ride the coming wave by cooperating actively with opposing counsel, developing sensible discovery plans and being prepared to defend them, and sharing these protocols openly and transparently as appropriate.
Search Methodologies, Electronic Discovery Reference Model, http://www.edrm.net/resources/guides/edrm-search-guide/search-methodologies (last visited Nov. 27, 2011); see also infra Part II.D.
a combination of technologies and processes in which decisions pertaining to the responsiveness of records gathered or preserved for potential production purposes … are made by having reviewers examine a subset of the collection and having the decisions on those documents propagated to the rest of the collection without reviewers examining each record.
E-Discovery Institute Survey on Predictive Coding, Elec. Discovery Inst., 2 (Oct. 1, 2010), http://www.ediscoveryinstitute.org/pubs/PredictiveCodingSurvey.pdf; see also infra Part III.B.
[T]here appears to be a myth that manual review by humans of large amounts of information is as accurate and complete as possible–perhaps even perfect–and constitutes the gold standard by which all searches should be measured …. [However], the relative efficacy of that approach versus utilizing newly developed automated methods of review remains very much open to debate.
how will they be searched to reduce the electronically stored information to information that is potentially relevant? In this context, I bring to the parties’ attention recent scholarship that argues that concept searching, as opposed to keyword searching, is more efficient and more likely to produce the most comprehensive results.
(a) the expert’s scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue; (b) the testimony is based on sufficient facts or data; (c) the testimony is the product of reliable principles and methods; and (d) the expert has reliably applied the principles and methods to the facts of the case.);
see also Fed. R. Evid. 702 advisory committee’s note (2000) (noting that Rule 702 provides “general standards that the trial court must use to assess the reliability and helpfulness of proffered expert testimony.”).