The Days of the Bates Stamp Are Numbered
May 11, 2008
As a kind of strange lawyer-mid-life-crisis, I wrote my first law review article last year: HASH: The New Bates Stamp, 12 Journal of Technology Law & Policy 1 (June 2007). Following tradition, I tried to make the opening sentences as clever as possible:
For over one hundred years, complex litigation has relied upon the ubiquitous Bates stamp to try and maintain order and clarity in paper evidence by placing sequential numbers on documents. In today’s world of vast quantities of electronic documents, the days of the Bates stamp are numbered. Instead, the future belongs to a new technology, a computer-based mathematical process known as “hash.” (emphasis added)
Ok, maybe not so clever, but still, I was delighted to see an article this week entitled Bates Stamps’ Days May Be Numbered by Tom O’Connor in Law.com’s Legal Technology section. No big surprise here as I met Tom a few weeks ago, and we talked about hash. (I tend to do that, a lot.) I liked how Tom saw the conversion from Bates stamping to hash as symbolic of a paradigm shift, not only in e-discovery, but in the world at large. Tom and a few others, such as Craig Ball, see a significance in the move to hash beyond what I understood when I wrote the article. They also have a better grasp of how this fits with other e-discovery technologies and procedures to facilitate what Tom claims are huge savings in time and money. I gave Tom a copy of my article, as he had heard about it from Craig but not yet read it. (Yes, I usually keep an extra copy in my briefcase.)
I mentioned Tom’s ideas in a prior blog, e-Discovery at the Harvard Club in New York City, based on his presentation at the CLE. The article Tom has since written, Bates Stamps’ Days May Be Numbered, provides more meat for the bones, which I will attempt to summarize here and place into proper hash context.
For those not real clear on what hash is, and what it could possibly have to do with the 19th Century Bates stamp shown above, I suggest you read my law review article. But if the thought of reading a 44 page academic paper with 174 footnotes leaves you cold, I suggest you try my Hash Page summary instead, or my earlier blog on Hash. They will give you a pretty good idea of how hash is the mathematical foundation of e-discovery, not a corned beef dish, and why this math should render sequential numbering obsolete. There are also many interesting comments left on these blogs by experts in the field, including an esoteric argument I had with a few vendors concerning the legal efficacy of hash in ESI authentication. These short articles do not go into law-review-depth, but do lay a helpful predicate to understand what Tom is talking about.
Tom’s article begins by noting that most people doing e-discovery today still rely on Bates stamping. They scan and sequentially number ESI as if it were a piece of paper. Then he observes, as I did in my introduction, that this system will not work “in today’s world of vast quantities of electronic documents.”
But that process is simply not effective when dealing with terabytes of data. To address the sheer volume, many vendors are advocating a new way of working with electronic documents that can reduce costs as much as 65 percent by eliminating the need for text extraction and imaging in the processing phase. Beyond immediate cost savings, this approach also provides cheaper native file production, reducing imaging costs for production sets and saving up to 90 percent of the time needed to process documents. How? By not using Bates numbers on every page.
Later Tom explains that the alternative to Bates numbers is hash values. But first, he details how and why this conversion can save so much time and money:
Currently, to provide Bates numbering, many vendors generate TIFF images from native files and then Bates number those images. But this process complicates native file review and — at anywhere from eight to 20 cents per TIFF — adds considerable cost to the process. Typically, during processing, data is culled, de-duplicated; metadata and text are extracted; and then a TIFF file is created. An unavoidable consequence is that the relationship of the pages to other pages, or attachments, is broken — and then must be re-created for the review process. Page-oriented programs handle this by using a load file to tie everything together from the key of a page number. But most new software use a relational database that stores the data about a document in multiple tables. To load single page TIFFs into a relational database involves a substantial amount of additional and duplicative work in the data load process.
These steps are avoided by changing to an identification system based on hash values of entire ESI files (which Tom here calls “documents”) that eliminates the need for tracking of individual pages. Here is how Tom explains it, using a lot of e-discovery oriented tech-talk, which, if he is speaking, is usually tempered by a few laughs and war stories:
A document-based data model, rather than a page-based approach, eliminates the text extraction and image creation steps from the processing stage and cuts the cost of that process in half. Documents become available in the review platform much faster — as imaging often accounts for as much as 90 percent of the time to process. This enables early case assessment without any processing, by simply dragging and dropping a native file or a PST straight into the application — which cannot be achieved with the page-based batch process. Relational databases allow for one-to-many and many-to-many relationships and support advanced features and functions — as well as compatibility with external engines for tasks such as de-duping and concept searching. Applications that support these functions — such as software from Equivio, Recommind and Vivisimo Inc. — are all document-based and will not perform in the old page environment. Programs that use the document model can eliminate batch transfer. This process (See Diagram 1 below) increases data storage due to the need for data replication in the transfer process and is also prone to a high rate of human error. And elimination of the time that inventory (in this case, electronic data) is stationary will eliminate overall cost as well as reduce production time.

Tom’s diagram above shows the Bates stamp workflow model for the traditional TIFF image e-discovery process and review. This procedure treats ESI as if it were paper, and uses sequential numbering, instead of hash, to identify information. According to Tom, this traditional procedure requires a number of time-consuming and expensive batch transfer processes. He says these steps are unnecessary and can be eliminated in pure native review that relies on hash. The simplified “Bates-free” process is shown by Tom’s diagram below. In his words, this is “an easier, faster and more cost-effective e-discovery process.”

Tom concludes that:
A modern litigation support program must be able to review native documents that are not just paper equivalents, and directly enable review of any file that is in common use in business today. The future belongs to these new technologies, where native files are processed without the need to convert to TIFF and are identified by their unique hash algorithm. Attorneys and clients who focus on a document-based system will save time and money and can conduct native file review. In today’s world of vast quantities of electronic documents, the days of the Bates stamp are numbered.
I could not agree more, especially since, unlike the title, Tom now says the “days are numbered” and not “may be numbered.” I have no doubt about it, even though it may still take many years to get there. Old habits die hard, especially in the legal profession. Still, some day, Bates stamping will seem as quaint and antique as the original Bates numbering machine itself. The original shown above was invented in 1893. The first section of my law review article explains the history of this invention, and how Thomas Edison (shown right) purchased the patent from Edwin G. Bates. Then I go into the theory of hash and native ESI. I explain that hash is the digital fingerprint that identifies every electronic file, and reveals any change in the file. I also explain how hash is used in various e-discovery processes, and examine just about every legal decision ever written that mentions hash algorithms.
In case you have never seen a hash value before, here is an example: 4C37FC6257556E954E90755DEE5DB8CDA8D76710. There are many different types of hash formulas, but all produce lengthy alphanumeric hash values such as this. The two most popular are the SHA-1 hash algorithm, which creates a 40-place hash value (shown above), and the MD5 hash, which produces a 32-place value. Both are too long for a practical naming convention to replace a Bates stamp. So I propose that the value be truncated and only the first and last three places be used. Thus the above hash would be shortened to 4C3.710. I also propose that the # symbol stand for hash. (The # symbol is already commonly known as the hash mark in most of the world, but in many English speaking cultures, including the U.S., it is also called the number sign or the pound sign.) So I propose to abbreviate the above SHA-1 hash with #4C3.710. Some of the technical details of this naming protocol are addressed in the law review article. Others will have to be worked out with time and experience, and the adoption of more standards in the e-discovery industry.
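For readers who like to see such things spelled out, the truncation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the protocol in the law review article: the function names `sha1_hex` and `short_hash_mark` are hypothetical, invented here for the example, and the only library used is Python's standard `hashlib` module. The sketch computes a SHA-1 value for some bytes, shows that a one-character change produces a different value, and shortens a full hash to the first and last three places.

```python
import hashlib
import string


def sha1_hex(data: bytes) -> str:
    """Return the 40-place SHA-1 hash of the given bytes, in upper case."""
    return hashlib.sha1(data).hexdigest().upper()


def short_hash_mark(full_hash: str, prefix: str = "") -> str:
    """Shorten a full hash to its first and last three places.

    Hypothetical helper: it follows the "#4C3.710" style described in the
    text, with an optional label (such as a custodian name) in front.
    """
    value = full_hash.strip().upper()
    if len(value) < 6 or any(c not in string.hexdigits for c in value):
        raise ValueError("expected a hexadecimal hash of at least six places")
    return f"{prefix}#{value[:3]}.{value[-3:]}"


# The example hash value from the text above.
example = "4C37FC6257556E954E90755DEE5DB8CDA8D76710"
print(short_hash_mark(example))  # prints "#4C3.710"

# Any change to a file's contents, however small, changes its hash.
original = b"The days of the Bates stamp are numbered."
altered = b"The days of the Bates stamp are numbered!"
print(sha1_hex(original) == sha1_hex(altered))  # prints "False"

# A shortened mark for real content, with an assumed custodian label.
print(short_hash_mark(sha1_hex(original), prefix="Dr. Smith"))
```

In practice the bytes would come from reading the native file itself, and the full hash would still be kept on record, since the short form is only a convenient label and two different files could share the same six places.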
I conclude my article by imagining what a courtroom of the future might be like without the Bates stamp:
In countless courtrooms today, a mantra something like this is heard often: “I am handing the witness a document pre-marked as ‘Trial Exhibit 75’ and Bates stamped as ‘Dr. Smith 0573.’” In the future, the author expects something like this will be heard instead: “I am putting on screen for the witness to view an ESI file pre-marked as ‘Trial Exhibit 75’ and hash marked as ‘Dr. Smith Hash 4F7.C3B (Dr. Smith#4F7.C3B).’” The ESI file may still sometimes be converted to paper, in which case it could be handed to a witness, instead of put on a screen, but the same naming protocol would apply and it would bear a “hash mark” somewhere on the bottom: “Dr. Smith#4F7.C3B.”
Sorry, Mr. Bates, your one hundred-year-plus reign is over.
Posted by Ralph Losey
The Litigation Section of the American Bar Association has published an
Instead, the e-discovery lawyers who are on their own, or with consulting firms, are the specialists usually retained by law firms, both big and small, who lack attorneys with such arcane skills. As mentioned, they are usually called in to assist on projects after there is trouble of some kind. It is always challenging to bring in an outside attorney as an expert to assist in a case, but it is particularly difficult when it occurs after a problem develops. For one thing, how do you explain “the cleaner” to the client? No doubt it is the fault of the other side, or perhaps the judge. There can also be relationship issues when new attorneys from different firms work together for the first time. This is especially difficult when the trial attorney in charge has made a mistake and does not want to hear about it, nor understand the complexities involved. Yet, this is typically how and when most e-specialists get involved in litigation.
The truth is, without experience and occasional guidance, simple checklists alone can be counter-productive. They can easily be misunderstood and provide a false sense of confidence. Sometimes it pays to be a little worried and concerned. I am sure that is one of the lessons Qualcomm’s former lawyers have learned. Perhaps the great poet
The stately 19th Century Harvard Club this week hosted a cutting edge 21st Century conference on e-Discovery. It was organized by 
David pointed out that a law firm’s reputation for truth and honesty is key. If David thinks he is dealing with a lawyer who does not follow these fundamental precepts, then the FTC will naturally be much more demanding in its requests for information, and harsh in its treatment. Conversely, David is willing to negotiate and exercise leniency when an attorney is honest and forthcoming, and reveals the bad with the good. This attitude, in my experience, is also followed by most judges.
Most of the time, all of the relevant data needed for a case will be stored on the key players’ Enterprise, Local and Individual systems. Sometimes you may also need to look at Archives, depending on what you find in the more easily accessible stores, and how difficult it is to get at the ESI on Archives. Back-up tapes and Legacy Data are not usually needed. David explained that the FTC typically only requires that two daily backup tapes be preserved, just in case they want to look at them later, which they usually don’t. He noted with a chuckle that the FTC picks which two tapes to preserve, not the respondent, and they usually just pick two at random.
David also spoke of the serious risk of just relying on custodians for self-collection. They may print out, or transfer to a disk, but they are likely to do it in a way that messes up the metadata. He stated that metadata is only rarely needed for production, and depends on the case, but you should still try and preserve it as best you can. Still, the main reason you should not rely on custodians alone for collection is that they are “self-interested.” They may, for instance, want to avoid embarrassment and not produce certain very relevant emails that they wished they had not written. In his opinion, you can do the collection in-house, and do not have to hire an outside vendor, but you should use a qualified technician to go to the computers and collect the data, and not just rely on the custodians. As to forensic imaging, where outside experts are typically used, that is only rarely needed in special cases where there are indications of criminal conduct.
I agree with Tom wholeheartedly on these new native paradigm insights. Tom said that many object to going native because they think you need TIFF and Bates numbers in order to preserve authenticity and stay organized. Tom disagrees and thinks that the Bates stamp has been replaced conceptually by
I discovered two new articles this week on my favorite subject, indeed the name of this blog, e-discovery teams. The first is a cheerleader kind of easy read by Dale Buss of
An outside attorney on the team can help keep the games clean, and steer team members away from the kind of temptations that cost Qualcomm its patent, and its GC his job. Further, this kind of high-road team participation puts outside counsel in a strong position to protest any questionable calls made by the umpire.
Hide the ball is certainly not the game for an e-Discovery Team to play. Some people think that is what discovery is all about, and in the world of paper discovery, years ago, there was some truth to that. But not today, and certainly not in electronic discovery. It may be tempting to some, but if you play hide the ball in e-discovery, and get caught, you may not only lose the case, but you may lose your job, and maybe even your license. It is never worth it, just ask 

Shrink the ball is the game where the Team can save the company a lot of money. Thus, from a financial perspective, it is the most important game of all. In this culling step, you process the ESI to eliminate as much duplicate and irrelevant information as possible. Here good software and automated process are critical; so too is careful strategic thinking,
Here is where the big bucks come in: the cost to review the data for privileged, confidential, and irrelevant material. Still, most internal corporate e-Discovery Teams will not clean their own ball. They will hand it off to their caddy to do it for them, typically their outside legal counsel. A few of the more mature and well organized Teams have started to review their own data, and clean the ESI themselves. They have teams of contract attorneys they employ to do this work at reduced rates, and some even send the data to lawyers in India for review. But for most Teams, this is advanced play that they do not have the time or skill to attempt.
Now we come to the lawyerly game of aim the ball where the ESI is analyzed to see how it fits into the case at hand. Here lawyers and paralegals tag each file to an issue, typically using review software. They also make final decisions as to whether and how information is responsive to discovery requests, or otherwise must be produced (or not). The files are categorized and rated for importance. Is this email a smoking gun that could kill your case, or is it merely of marginal relevance to a secondary issue? You had better find this out, and fast, as to each computer file you are about to disclose to the other side. If your analysis of the information to be produced shows you have a strong case, you will approach the case far differently than if your analysis shows you will surely lose when all of the cards are put on the table.
The last game is the culmination of all the rest. The analysis game resulted in final decisions on which files are to be produced. Now you actually make the production. Throwing the ball is not really all that hard, so long as you enlist the aid of WORMs. No, not the creepy crawly kind, but the “Write Once, Read Many” times kind, such as optical discs, CDs or DVDs. The ESI on these media cannot be altered after it is written onto the discs, thus providing you, and the receiving party, with a certain amount of protection that the files will not be accidentally altered. WORMs help the parties maintain a permanent record of the ESI produced.
In the final paragraph of last week’s post Ralph made the following suggestion on a potential way to deal with an onerous pre-litigation hold demand: