Vendor Voice: Yes, Counselor, There Will Be Math

Knowledge of statistics is an essential skill in e-discovery.

Maureen O’Neill, Law Technology News

February 12, 2014, 12:01 AM

Editor's Note: This article was chosen in a blind competition by the Arizona State University-Arkfeld E-Discovery and Digital Evidence Conference.
There’s a tired old joke out there among lawyers, many of whom sputter and wave their arms in protest when asked to engage in anything involving math: “But I went to law school to avoid math!” For litigators engaged in discovery, however, math is no joke: to competently represent their clients, they must understand a few key statistical concepts. That knowledge is now an essential skill in e-discovery.

Why do litigators need to understand statistics?

• To improve the overall quality of discovery.
By using statistics to measure a discovery process, counsel can take steps to improve the quality of the process. For example, when evaluating search terms to cull a document collection, you may find that a proposed term is bringing in far too many “false positives” (i.e., poor “precision”). Rather than simply reject the proposed term with a subjective characterization of the results, use statistics to quantify the problem. Even better, try an alternative search term and test the new results statistically. If statistical sampling confirms that the new term reduces the number of irrelevant documents (better precision), but doesn’t miss too many relevant ones (i.e., the “recall” is good), you have improved the quality of the search process (and have solid support for a counter-proposal to the other side if necessary).
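As a rough illustration of the arithmetic behind these two measures, the short Python sketch below computes precision and recall from the counts in a reviewed sample. The function name and every count are hypothetical, chosen only to show a revised term giving up a little recall in exchange for much better precision. In practice, the count of missed relevant documents would have to come from sampling the documents the term did not hit.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from counts of reviewed sample documents.

    true_positives:  relevant documents the search term hit
    false_positives: irrelevant documents the search term hit
    false_negatives: relevant documents the search term missed
    """
    retrieved = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# Hypothetical sample counts, for illustration only.
original_term = precision_recall(true_positives=120, false_positives=480, false_negatives=30)
revised_term = precision_recall(true_positives=110, false_positives=90, false_negatives=40)

print("original term: precision %.2f, recall %.2f" % original_term)
print("revised term:  precision %.2f, recall %.2f" % revised_term)
```

With these made-up numbers, the revised term sends far fewer irrelevant documents to review while still finding most of the relevant ones, which is the kind of quantified comparison that can support a counter-proposal.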
As another example, counsel can take a simple measurement of the “richness” of a document population—i.e., the prevalence of responsive documents—and use that information to design a more tailored, effective workflow for document review. Once review is underway, sampling can be used to test the quality of the review decisions, and create protocols for remediation where quality falls below an acceptable threshold.
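A richness measurement of this kind can be sketched in a few lines of Python. The example assumes a simple random sample drawn from the collection and uses the Wilson score interval, which is one common way (not the only one) to put a 95 percent confidence range around a sampled proportion. The function name and the sample figures are hypothetical.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Approximate confidence interval for a proportion (Wilson score method).

    hits: number of responsive documents found in the sample
    n:    number of documents in the sample
    z:    z-score for the confidence level (1.96 corresponds to about 95 percent)
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical sample: 60 responsive documents found among 400 sampled.
low, high = wilson_interval(hits=60, n=400)
print("estimated richness: %.1f%% (95%% range %.1f%% to %.1f%%)"
      % (100 * 60 / 400, 100 * low, 100 * high))
```

The range, rather than the single point estimate, is what should drive workflow planning: a collection that might be 12 percent responsive calls for different staffing than one that might be 19 percent responsive.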
• To more effectively defend clients’ discovery efforts.
Let’s say you’re faced with a challenge to the completeness of a document production. Without statistics, you might say something like this:
“We believe that our client has produced all relevant documents. We’re confident that we arrived at a good set of search terms that found the documents we were looking for. We used an experienced contract review team to examine the documents that hit on the searches, and based on our spot-checks of their work, they made good decisions.”
But with the application of some statistical measurements, you could make this more objective, and more compelling, statement:
“We know with a confidence level of 95 percent that our client has produced at least 90 percent of the relevant documents, based on a statistically valid measurement of the efficacy of the keyword searches and the human decisions about the documents returned by the searches. We also tested the quality of the review by examining statistically valid samples of the team’s work, and we confirmed that their decisions were more than 95 percent accurate.”
A statistically valid, quantitative method to prove up the completeness of a document production, or defend some other aspect of document discovery, can be much more effective than a subjective, from-the-gut argument. Indeed, courts are increasingly directing parties to provide statistical evidence to support their contentions about discovery. Several courts have noted that the defensible use of keyword searches may require the presentation of statistical validation of those searches.[FOOTNOTE 1] And for litigants looking to use more advanced technological means of searching for relevant documents, including predictive coding, the presentation of statistical support undoubtedly will be required.[FOOTNOTE 2]
• To save clients time and money in discovery.
Statistical sampling can create efficiency gains by allowing us to examine a relatively small subset of a document collection and draw valid conclusions about the remainder of the collection. For example, what if your opponent insists that a particular custodian is likely to possess relevant documents and should be subject to discovery, but you disagree? Why not pull a statistically valid sample of the documents and review them? If the sample turns up little or no relevant content, you now have strong, objective ammunition to resist the discovery request, and you have spent relatively little time or effort.
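How small can such a sample be? The Python sketch below applies the standard textbook formula for the sample size needed to estimate a proportion, with an optional correction for a finite collection, and then computes how much relevant material could plausibly remain if the sample turns up nothing. The function names, the 10,000-document custodian and the 5 percent margin of error are all assumptions made for illustration; an actual protocol should be set with expert input.

```python
import math


def sample_size(margin, z=1.96, p=0.5, population=None):
    """Documents to sample to estimate a proportion within +/- margin.

    p=0.5 is the most conservative assumption about prevalence.
    If population is given, a finite population correction is applied.
    """
    n0 = (z * z * p * (1 - p)) / (margin * margin)
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


def zero_hit_upper_bound(n, confidence=0.95):
    """Upper bound on prevalence when a random sample of n has no relevant documents.

    Solves (1 - p) ** n = 1 - confidence for p, treating draws as independent.
    """
    return 1 - (1 - confidence) ** (1 / n)


# Hypothetical scenario: a custodian with 10,000 documents.
n_unlimited = sample_size(0.05)                     # very large collection
n_custodian = sample_size(0.05, population=10_000)  # this custodian only
bound = zero_hit_upper_bound(n_custodian)

print("sample needed (large collection):", n_unlimited)
print("sample needed (10,000 documents):", n_custodian)
print("if none are relevant, prevalence is below %.2f%% at 95%% confidence" % (100 * bound))
```

Under these assumptions, reviewing a few hundred documents rather than 10,000 is enough to support a statement that, with 95 percent confidence, fewer than 1 percent of the custodian's documents are relevant.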
As another example, statistically valid samples of documents can be used to streamline counsel’s quality checks of documents reviewed by a contract attorney team. The quality control review will include fewer documents than a more subjective, “spot-check” approach (which means less time and cost), and the conclusions drawn about the accuracy of the work will be far stronger.
Statistics can also bolster proportionality-based objections to discovery. In the custodian example above, what if the sample showed that the custodian did possess some relevant material, but nothing “hot,” and nothing non-duplicative? You can use the results of your statistical sampling to assert that the cost to collect, process, review and produce this custodian’s documents is not warranted when balanced against the modest gain achieved by producing a few more, uninteresting documents.
Statistical measurement can also create time and cost savings when used to improve keyword searches. By testing the results of keyword searches, the terms can be optimized to boost both precision and recall. This means that fewer documents in total are put through review, and of those documents reviewed, more of them are relevant.
Given the ever-larger volumes of electronic documents that must be collected, searched, reviewed and produced, today’s litigation environment demands the use of tools to optimize discovery. Statistics are one of those tools, and skilled litigators must know how to leverage them. To be sure, expert help often is necessary, and litigators are not expected to have degrees in mathematics. But a solid understanding of what can be measured and how the measurements can be applied to discovery is now an essential skill for e-discovery.
Attorney Maureen O’Neill is senior vice president, discovery strategy (west) at DiscoverReady, based in San Francisco. Email: maureen.oneill@discoverready.com.


FN 1. For example, in In re Seroquel Prods. Litig., 244 F.R.D. 650, 662 (M.D. Fla. 2007), the judge noted that “while key word searching is a recognized method to winnow relevant documents from large repositories … [c]ommon sense dictates that sampling and other quality assurance techniques must be employed to meet requirements of completeness.” See also, e.g., U.S. v. O’Keefe, 537 F. Supp. 2d 14, 24 (D.D.C. 2008) (“Whether search terms or ‘keywords’ will yield the information sought is a complicated question involving the interplay, at least, of the sciences of computer technology, statistics and linguistics.”); Victor Stanley v. Creative Pipe, Inc., 250 F.R.D. 251, 257 (D. Md. 2008) (“The only prudent way to test the reliability of the keyword search is to perform some appropriate sampling of the documents”).
FN 2. See Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (S.D.N.Y.) (M.J. Peck Order Feb. 24, 2012) (approving a protocol for the use of predictive coding where “[t]he accuracy of the search processes, both the systems’ functions and the attorney judgments to train the computer, will be tested and quality controlled by both judgmental and statistical sampling”); In re: Biomet M2a Magnum Hip Implant Products Liability Litigation (MDL), No. 3:12-MD-2391 (N.D. Ind. Apr. 18, 2013) (approving use of keyword searches combined with predictive coding, relying in part on statistical evidence regarding the efficacy of the methodology).

