Tech v. Law – a Plea for Mutual Respect

August 24, 2008

Ever wonder what the big tech companies moving into e-discovery really think of the field, or the people in it like you and me? Thanks to a recent article in the Wall Street Journal we now know. They think we are morons! Or at least one CEO of a high tech company does. So many mainline technology companies are now muscling into e-discovery that the Wall Street Journal ran a feature article on the subject. Tech Firms Pitch Tools For Sifting Legal Records, Wall Street Journal at B1 (August 22, 2008). For this article WSJ reporter Justin Scheck interviewed Michael Lynch, the CEO of a British software company, Autonomy, about its move into e-discovery. Mr. Lynch is quoted as saying e-discovery work “…is work that requires little brain-power or legal training.” (Please see Mr. Lynch’s comment below where he says he was misquoted, and that is not what he really thinks.) Mr. Lynch is the CEO of the high-tech company that paid $375 million last year to buy Zantaz. So it seems that e-discovery is the Rodney Dangerfield of the tech world – “can’t get no respect.”

Conflict Between The Two Professions

You could say this is just one man’s opinion, and the quote was probably taken out of context. Maybe, but I don’t think so. This comment demonstrates a real antipathy between Law and IT. It also illustrates a lack of understanding or appreciation as to what each side really does. For instance, Mr. Lynch is also quoted by the Journal (again, see his challenge of the quote in his comment below) as describing what he thinks discovery lawyers do, and why they need help from super-tech gurus like him:

“The old-fashioned way of doing this was having a lot of lawyers doing a lot of simple things,” he says. “You would literally have lawyers reading through things saying ‘there was chicken for lunch.’ You don’t need lawyers to know it’s a lunch menu.” 

The article then goes on to describe how a host of technology companies, ones that have recently discovered the profit potential of e-discovery, are going to save the day with their advanced software. They are going to save law firm clients from being bilked by greedy menu-reading lawyers. The fair and balanced Journal gives the so-called opposing view by stating that:

But big law firms, facing the loss of lucrative client fees, are crying foul. They question how much of the discovery process can be automated and how much money the tools will really save. They also claim companies could end up spending more to fix mistakes.

Obviously, both the Journal and big-tech company executives they interviewed are clueless as to the real world of e-discovery. 

This arrogance, misunderstanding, antipathy, and lack of respect is not a one-way street. For most of my career, the IT guy (and yes, it always used to be a guy) received about as much respect in a typical law firm as the copy machine repair man – not very much. Even when they were later hired as full-time law firm employees, techs were (and in some firms still are) considered rather dim-witted necessary evils, with lower status than secretaries, and nowhere near the status of a paralegal. Law and lawyers were professionals. IT techs were what? Did they even go to school? Why are we paying them so much?

The mutual lack of respect has, in my opinion, long characterized the relations between these two industries. I know this from firsthand experience going back to 1978, when I first became enthralled with computers in law school. When I started practice in 1980, computers were just beginning to be used by a few progressive law firms. I was the young associate who liked computers, and so I ended up handling all of the interface with the IBM technician. There was only one computer company in those early days: IBM. The big blue tech would come to your office to fix your mini-mainframe computer, then later your PCs, whenever there was a problem, which, after PCs came along in 1982, was pretty much all of the time. Eventually, I ended up doing most of the tech-work myself, and would just call IBM on the rare occasion I could not figure it out on my own. After all, as every tech knows, just checking to see if everything is plugged in, or hard-booting, will fix most of the problems a typical idiot user has; you know, users like lawyers with “little brain-power” who read menus for a living.

This Problem Loses Cases

So we have a real respect problem, and since none of us is Rodney Dangerfield, it is not at all funny. This antipathy leads to widespread misunderstandings and miscommunications between lawyers and computer technicians. This is just mildly annoying for most lawyers and techs, but for specialists in e-discovery it is a disaster. That is because e-discovery is a blend of the two professions. It can only work properly when lawyers and techs work together and cooperate. (I have dedicated a whole website to that proposition.) When this does not happen, the typical result is another disaster case. I will spend the rest of the blog going over a good example of this. Kevin Keithley v. The Home Store.com, Inc., 2008 U.S. Dist. LEXIS 61741, 2008 WL 3833384 (August 12, 2008). This is a case involving serious sanctions against defendants based in no small part upon the techs’ obvious lack of respect for the law and lawyers.

Kevin Keithley v. The Home Store.com, Inc. 

Kevin Keithley is a patent infringement case in San Francisco involving computer software and Internet websites. Defendants write code for and develop such well known websites as Realtor.com, Homebuilder.com, Homestore.com, and Move.com. Most of the key ESI custodians on the defendants’ side were software engineers and programmers of various types. Their disrespect of the law, lawyers, and the discovery process was obvious, so much so that the senior federal Magistrate Judge looking into their conduct, Elizabeth D. Laporte, said it was “among the most egregious this Court has seen.” Judge Laporte begins her opinion with this observation:

While the Court does not impose sanctions of any type lightly, and would prefer to see the resources of the Court directed to addressing the substantive issues of the case on the merits, rather than the collateral issue of sanctions for discovery abuse, this is the unusual case in which Defendants’ conduct warrants stiff monetary, as well as evidentiary, sanctions. See United Medical Supply Co. v. United States, 77 Fed. Cl. 257, 258-59 (Fed. Cl. 2007) (“Aside perhaps from perjury, no act serves to threaten the integrity of the judicial process more than the spoliation of evidence. Our adversarial process is designed to tolerate human failings – erring judges can be reversed, uncooperative counsel can be shepherded, and recalcitrant witnesses compelled to testify. But, when critical documents go missing, judges and litigants alike descend into a world of ad hocery and half measures-and our civil justice system suffers.”)

Judge Laporte then imposed sanctions of $320,000, plus a devastating adverse inference instruction. She considered entering judgment against the defendants outright as the plaintiffs requested, but recognized that the case involved miscommunications, disrespect, and negligence, not outright fraud. These are harsh sanctions nevertheless, and in my view, Judge Laporte correctly implemented the First Circuit quote she likes and avoided the “cardboard sword” to fight this ad hocery:

As aptly stated by the First Circuit, “the judge should take pains neither to use an elephant gun to slay a mouse nor to wield a cardboard sword if a dragon looms.” Anderson v. Beatrice Foods Co., 900 F.2d 388, 395 (1st Cir.), cert. denied, 498 U.S. 891 (1990). 

When Did The Duty To Preserve Begin?

The first interesting legal issue in this case is when the duty to preserve was triggered. The lawsuit was filed on October 1, 2003, so it definitely started at least by then. But plaintiffs argued it actually started on July 14, 1998, when plaintiffs wrote defendants requesting they license their patent. Judge Laporte did not buy that because the letter did not threaten litigation or even mention infringement. But she did find the duty was triggered on August 3, 2001, over two years before the suit was filed. She found it was triggered by a letter from plaintiffs to defendants stating that “we assume that Homestore.com wishes to litigate this matter. Unless we hear otherwise by close of business Tuesday, August 7, 2001, we will advance this matter accordingly.”

As Judge Laporte notes, this is all just an academic issue “because Defendants did not satisfy their duty to preserve even after this lawsuit was filed and recklessly allowed the destruction of some relevant source code as late as 2004.” For that reason we probably should not tax our “little brains” about it, but still, it’s slightly more interesting than whether “there was chicken for lunch.”

Judge Laporte explains the triggering law by first citing to A. Farber & Partners, Inc. v. Garber, 234 F.R.D. 186, 193 (C.D. Cal.2006) which held that “There is no doubt that a litigant has a duty to preserve evidence it knows or should know is relevant to imminent litigation.” She then clarifies the “imminence” requirement by referring to a quote from the holding of In re Napster Inc. Copyright Litigation, 462 F. Supp. 2d 1060, 1070 (N.D. Cal. 2006):

The court in A. Farber thus held imminence to be sufficient, rather than necessary, to trigger the duty to preserve documents. Furthermore, the court in A. Farber did not reach the issue of when, exactly, the duty attached. The duty to preserve documents attaches “when a party should have known that the evidence may be relevant to future litigation.” Zubulake v. UBS Warburg LLC, 220 F.R.D. 212, 216(S.D.N.Y.2003). See also National Ass’n of Radiation Survivors, 115 F.R.D. at 556-57. The future litigation must be “probable,” which has been held to mean “more than a possibility.” Hynix Semiconductor Inc. v. Rambus, Inc., 2006 WL 565893 at *21 (N.D. Cal.2006) (Whyte, J.). 

Law Is Not A Science

So it looks like Judge Laporte considers “imminent” to mean “probable” which means something more than possible. A very vague standard indeed, exactly the kind of thing that drives computer engineers crazy. I predict the preservation trigger date issue will always be decided on a case-by-case basis and no bright lines will ever appear. That is why the practice of law is an art, not a science, and the human element can never be replaced by technology.

Unlike computer code, the rules of law are malleable and there are always exceptions. This in turn is one of the key reasons the two cultures of Law and IT have such a hard time understanding one another. It is also the reason a few inexperienced engineer types are delusionary and arrogant enough to think that e-discovery can be “fixed” with the right software algorithms. It cannot, because law is not a science; it is far too complex and chaotic for that. Or if it is a science, it is more like Quantum Physics, where electrons are unpredictable and can be in two places at once, not the orderly world of Newtonian Science that most engineers live in.

Yes, there are many computer programs that can be used as effective tools in the pursuit of justice. We lawyers need to wake up to that fact. But so too do the technologists who think the right software alone will fix everything. The human element is key in Law, which is one reason that training is so important.

Where Are The Reports?

Getting back to the case, the defendants’ Chief Information Officer and Chief Technology Officer (very impressive titles!) testified that he was “instructed not to destroy any materials that might be relevant” to potential litigation. Unfortunately, none of the attorneys involved put those instructions in writing, or at least if they did, they could not find the hold notices five years later when plaintiffs moved for sanctions. (Yes, Law is slow, which is another thing IT cannot understand.)

The failure to put hold notices in writing is a rookie mistake, especially when notifying engineers. Always put the litigation hold notices in writing, usually email, confirm the receipt, send reminders, and keep a good record of everything. Then follow up, and ideally, collect what you need yourself, instead of just relying on self-collection. Also, a company should have written litigation hold policies that specify how documents are to be preserved for litigation. It is dangerous to implement this in complex litigation on an ad hoc basis. The lawyers here did not do that and so the door was left open for the IT personnel and other key custodians to completely ignore the requests from Legal. Here is Judge Laporte’s reaction:

The lack of a written document retention and litigation hold policy and procedures for its implementation, including timely reminders or even a single e-mail notice to relevant employees, exemplifies Defendants’ lackadaisical attitude with respect to discovery of these important documents.  See, e.g., In re NTL Securities Litigation, 244 F.R.D. 179, 198-99 (S.D. N.Y. 2007) (finding that the failure to have an adequate litigation hold in place and the failure to issue reminders to employees regarding the duty to preserve evidence was at least grossly negligent). The harm caused by the lack of a preservation policy was compounded by an egregious failure to diligently search for responsive documents in alternate locations until well after the eleventh hour, in the wake of the initial hearing on the motion for sanctions for spoliation. 

The plaintiffs’ motion to compel was based on many mistakes and failures to produce various categories of ESI requested. Judge Laporte’s lengthy opinion considers many of them. One that sheds light on our disrespect and miscommunication theme here concerns plaintiffs’ requests for production of “reports showing how the websites were used and the content of Defendants’ databases.” Defendants’ attorneys first took the position “that it would be impossible to retain all reports because of space limitations.” For that reason, Defendants said they could only produce report templates. Obviously defense counsel here was just repeating what IT told them.

The lawyers were told wrong. IT gave them this song and dance, I suppose, thinking that they could get away with it, that they could use a bit of double-talk about space limitations to avoid the time and trouble of actually searching for the reports. After all, lawyers are all computer illiterate. We can tell them anything and they will never know the difference. As a result of this all-too-common tactic by IT, the lawyers were made to look like liars when the plaintiffs’ attorneys did not take “no” for an answer. They kept pressing the issue, taking depositions, hiring IT experts of their own, filing motions to compel, all culminating in an evidentiary hearing on a motion for sanctions.

The next position the lawyers took on the requested reports, again obviously at the urging of IT techs behind the scenes, was that the program “does not generate many types of reports.” Then at the evidentiary hearing, where the engineers were obviously present and advising the lawyers on what to say, the poor defense counsel was questioned by an obviously frustrated Judge Laporte. Defense counsel did his best to respond to the judge, but was clearly in deep water, way over his head. It did not turn out well. Here is Judge Laporte’s description of what happened:

Then, at the March 18, 2008 hearing on the motions for sanctions, in response to the Court’s questioning, Defendants’ counsel told the Court that Defendants do not store reports, but only permit users to make ephemeral queries and do not store the responses.  In other words, Defendants did not keep any reports in the normal course of business, so nothing could have been lost or destroyed that should have been kept. Counsel concluded that:

Nothing’s been destroyed. Move doesn’t capture those reports that you are seeing; some other user does it. Just like you would, when you do a search on Google or Lexis. . . . We don’t get a copy of when a — when a Realtor runs a query such as those, a copy goes into some files at Move. It’s not been destroyed.

Mar. 18, 2008 Tr. at 26:10-20 (emphasis added). This representation to the Court was false.  

Ethics 101 – Thou Shalt Not Lie

Ouch, that hurts. That is not the kind of thing you ever want to read as a lawyer about yourself, that you made a false representation to the judge. This is not just a minor bad form error. It is a significant ethical violation: 

Model Rules of Professional Conduct, Advocate – Rule 3.3 – Candor Toward The Tribunal

(a) A lawyer shall not knowingly:

(1) make a false statement of fact or law to a tribunal or fail to correct a false statement of material fact or law previously made to the tribunal by the lawyer; . . .
(3) offer evidence that the lawyer knows to be false. . . .

If misrepresentations to the Court do not lead to outright Bar discipline, they will certainly ruin your reputation with the Bench. Once that is lost, if you are a litigator, you might as well pack your bags and go home. In trial work, reputation and credibility are everything.

Looks like the defense lawyer here was hung out to dry by his IT clients. The judge found his whole story to be false, a tale obviously fabricated by the IT witnesses who were “helping the lawyers” behind the scenes. Judge Laporte may have suspected as much during the hearing, but she found out for sure a few weeks later.

At the end of the hearing Judge Laporte told defendants that sanctions would be imposed against them, possibly including a final judgment. Then, just two weeks after this hearing and representation as to no-reports, the defendants in fact produced over 480,000 reports! No wonder Judge Laporte took the rare step of publicly chastising defense counsel in a written opinion.

Where Is The Source Code?

In a software patent case like this, the most important evidence is usually the source code. Naturally, this is exactly what the engineers here did not bother to properly preserve and produce. Again, the lawyers took the fall for it. Judge Laporte said they should have done a better job of notifying and reminding the software coders of their duty to keep old versions of the code. I disagree. In my view, sending more notices would have been about as effective as a cardboard sword against a dragon. Still, here is the way Judge Laporte saw it:

Defendants had a duty to notify and periodically remind technical personnel of Defendants’ preservation obligation and ensure that they took adequate steps to safeguard the data. At a minimum, Defendants were reckless in their conduct regarding the Development Computer.  Had Defendants imposed a proper litigation hold in this case, the evidence on the Development Computer, in particular, the log of changes to the websites’ source code, would have been preserved. Instead, evidence of prior versions of source code was destroyed. 

The facts of source code spoliation came out at the sanctions hearing, the one which ended so poorly for defendants as previously noted. Then, after losing the hearing, when the whole case is on the line, another IT miracle happens. Old versions of the source code suddenly begin manifesting. Defendants started producing source code like crazy, thinking, I suppose, that this way they could avoid sanctions, or at least prevent an outright loss of the case. Here is how it all appeared to Judge Laporte:

It appears that only after the Court held a hearing on the motion for sanctions and indicated that sanctions may be appropriate, and fifteen months after the Court’s express order to produce all versions of source code, did Defendants make any real effort to fulfill their discovery obligations to search for and gather source code.  

Here is the story the defendants came up with to explain the sudden, unexpected production of millions of lines of source code. A few days after the hearing one of the senior engineers:

[H]ad a resurgence of memory “some weeks ago” when he recalled that his work computer’s hard drive, which likely contained copies of pre-pour-over source code, had crashed at some unspecified time and that he had stored the crashed hard drive at his home.  See Declaration of Philip Dawley in Support of Defs.’ Supp. Memo. re: Spoliation Remedy at ¶ 18-20.  Engineers were able to reconstruct source code files from that hard drive.

Still more source code was found by simply asking one of the engineers in charge of the code project. What a brilliant idea! Funny they had never thought of that before. When the lawyers finally did talk to the engineer in charge of a key code-migration project, and she understood the company might be shut down for a patent violation, she remembered that she had made an archive copy on her own. She kept it on a DVD in a drawer in her cubicle at work. That is exactly the kind of thing techs do all the time (so do I), which is why these reclusive coders must be located and personally questioned when their ESI is first requested, not years later when a judge is ready to dismiss your case. 

The court reacted to this by saying it was “frankly shocked” that the engineer had not been questioned earlier and the code produced long ago. There were even more productions and source code findings after that, but the story grows redundant at this point, and I yearn for a good lunch menu to read. 

The “Better Late Than Never” Defense

Defendants responded somewhat apologetically, but basically said “no harm, no foul,” we have now produced the code, so there is no need for sanctions. The “better late than never” defense did keep the case from the ultimate sanction of a default judgment, but they did not escape the adverse inference and the monetary sanctions. Here is Judge Laporte’s response:

The fact that Defendants have flagrantly disregarded their discovery obligations with respect to reports and source code calls out for sanctions.  … Defendants engaged in reckless and egregious discovery misconduct as described above. … 

The facts — specifically that Defendants have no written document retention policy nor was there a specific litigation hold put in place, that at least some evidence was destroyed when the Development Computer failed, that Defendants made material misrepresentations to the Court and Plaintiffs regarding the existence of reports, and that Defendants have produced an avalanche of responsive documents and electronically stored information only after the Court informed the parties that sanctions were appropriate — show a level of reckless disregard for their discovery obligation and for candor and accuracy before the Court sufficient to warrant severe monetary and evidentiary sanctions.

Defendants’ reckless conduct not only warrants sanctions under Rule 37, which does not have a bad faith requirement, but also warrants sanctions under the Court’s inherent power. Specifically, Defendants’ pattern of deceptive conduct and malfeasance in connection with discovery and production of documents under this Court’s order and reckless and frivolous misrepresentations to the Court amounts to bad faith for purposes of sanctions under the Court’s inherent power. Defendants’ conduct was not inadvertent or beyond their control or merely negligent; to the contrary, Defendants did not even come close to making reasonable efforts to carry out their preservation and other discovery obligations and to determine that their representations to the Court and to opposing counsel were accurate. As a whole, Defendants’ discovery misconduct in this case was both reckless and frivolous. See, e.g., Fink, 239 F.3d at 994. … 

However, because there is no evidence that Defendants engaged in deliberate spoliation, and dismissal is the most extreme sanction and would go beyond what is necessary to cure the prejudice to Plaintiffs, the Court does not recommend terminating sanctions.

A Call For Mutual Respect

In my opinion, all of the problems in this case derived from Law/IT miscommunications and disrespect, not from malicious intent. In fact, I suspect, and many I know agree with this, that such “Who’s on First” miscommunications are at the root of most e-discovery sanction cases. (There are some notable exceptions – can you spell Qualcomm?) I have written about this before, and often speak of the problem. The lack of respect can certainly cause a lot of trouble in e-discovery. Even Rodney Dangerfield would have had a hard time making these sanctions funny. 

Information Technology and the Law are both honorable occupations. We must learn to work together to meet the challenges of e-discovery. This is a plea for mutual respect and cooperation. A little humor about the whole thing would not hurt either.

Chicken sandwiches anyone?

 


New Case where Police Use Hash to Catch a Perp and My Favored Truncated Hash Labeling System to ID the Evidence

August 17, 2008

Part of my discipline as an e-discovery specialist is to try to read (or at least skim) every published opinion on the subject. Lots of attorneys specializing in this area do that. But there is one other type of case I also read: every opinion that uses the word “hash.” No, I do not need help from Narcotics or Overeaters Anonymous. The kind of hash I am addicted to is purely algorithmic. This hash comes in many flavors, but the best known, and the ones usually employed in e-discovery, are called MD5 hash, SHA-1 hash, or the latest and greatest, SHA-2 hash.

As I explain in my blog Hash page, hash is the mathematical foundation of e-discovery and the most powerful tool of any forensic investigator. It reveals the unique mathematical fingerprint of every computer file that allows for perfect identification and authentication of electronic evidence.  I became fascinated with the powers of hash a few years ago, and ended up writing a lengthy law review article on the subject. HASH: The New Bates Stamp, 12 Journal of Technology Law & Policy 1 (June 2007). A few months ago I wrote a blog on the article called The Days of the Bates Stamp Are Numbered, talking about some of the more recent developments in this area of the law, especially the shift from Tiffing and linear flat file Bates stamping to native file hash marking.
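For readers who have never seen hash at work, here is a minimal sketch in Python, using the standard hashlib module, of how a file's contents are reduced to a fixed-length fingerprint. The function name fingerprint and the sample text are hypothetical, made up for illustration only; real forensic tools hash entire files from disk, but the principle is the same.

```python
import hashlib

def fingerprint(data: bytes, algorithm: str = "sha1") -> str:
    """Return the uppercase hexadecimal hash of the given bytes.

    `algorithm` is any name hashlib accepts, e.g. "md5", "sha1", "sha256".
    """
    return hashlib.new(algorithm, data).hexdigest().upper()

# Hypothetical sample content standing in for a real file.
original = b"There was chicken for lunch"
exact_copy = bytes(original)                 # a bit-for-bit copy
altered = b"There was chicken for lunch."    # one extra character

# A bit-for-bit copy always produces the same fingerprint as the original.
print(fingerprint(original) == fingerprint(exact_copy))

# Even a one-character change produces a different fingerprint.
print(fingerprint(original) == fingerprint(altered))

# A SHA-1 value is always forty hexadecimal places, whatever the file size.
print(len(fingerprint(original)))
```

The same comparison is what lets a hash value authenticate evidence: if the hash computed at collection still matches the hash computed at production, the file has not changed in between.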

In the process of researching the original law review article, I am pretty sure I read every legal opinion and legal article ever written that mentions hash. I also read a few scientific and cryptological articles, most of which I did not really understand. Having put that much time and effort into the subject, I try to keep up by reading every new legal opinion or article mentioning hash. That is why I have a standing search for all cases using the term, and automatically receive a copy of them by email as soon as they are published. I can be in the middle of dinner and my BlackBerry will buzz alerting me of a new hash case. Lest you think that’s a tad weird, I am willing to bet that there are a few other hash enthusiasts out there, Craig Ball comes to mind, who do the same thing. (See Craig Ball’s excellent article “In Praise of Hash” at pg. 52.)

Hash and Child Pornography

Most of the new hash cases I see have nothing to do with e-discovery per se. Instead, they are usually criminal law cases, typically cases involving one of the most disgusting of crimes, child pornography. Police have been using hash to catch perps in this area for years. Hash is an effective tool for this because it allows police to know if certain child pornography is located on a computer, usually videos or still photos, by looking to see if the hash values for these files are present. That is a bit of an over-simplification, but suffice it to say that there are lists of hash values that are known to be associated with computer files which are unquestionably child pornography. New York Attorney General Andrew Cuomo explained the process in a press release in June 2008 announcing a deal with major Internet providers to block major sources of child pornography: 

As part of the undercover investigation, the Attorney General’s office developed a new system for identifying online content that contains child pornography.  Every online picture has a unique “Hash Value” that, once identified and collected, can be used to digitally match the same image anywhere else it is distributed.  By building a library of the Hash Values for images identified as being child pornography, the Attorney General’s investigators were able to filter through tens of thousands of online files at a time, speedily identifying which Internet Service Providers were providing access to child pornography images.
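The matching process described in the press release can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Attorney General's actual system: the function names sha1_hex and find_matches, the file names, and the harmless placeholder contents are all hypothetical.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the uppercase hexadecimal SHA-1 value of the given bytes."""
    return hashlib.sha1(data).hexdigest().upper()

def find_matches(files: dict[str, bytes], known_hashes: set[str]) -> list[str]:
    """Return the names of files whose SHA-1 value appears in the known library."""
    return sorted(
        name for name, content in files.items()
        if sha1_hex(content) in known_hashes
    )

# Hypothetical library of hash values for previously identified files.
known_hashes = {sha1_hex(b"previously identified file")}

# Hypothetical files encountered during a search (placeholder contents).
files = {
    "a.bin": b"previously identified file",
    "renamed_copy.bin": b"previously identified file",
    "unrelated.bin": b"something else entirely",
}

print(find_matches(files, known_hashes))
```

Note that the renamed copy is still caught. The hash depends only on the contents of a file, not on its name or location, which is why a library of known values can be matched against files found anywhere.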

U.S. v. Warren

I recently received a new hash case alert from a district court in Missouri. U.S. v. Warren, 2008 WL 3010156 (E.D.Mo. July 24, 2008). A quick review showed it was yet another child porn case, so I did not think much about it. I just added it to my reading list for more careful study later, just in case there might be something special about it. When I got around to reading Warren yesterday, I was very pleasantly surprised, as this was indeed a special case.

Warren is a case considering and rejecting a motion to suppress evidence, namely computer video files of underage teens having sex. The motion to suppress was based on a series of hyper-technical challenges to the affidavit which the St. Louis police submitted to the judge to receive a search warrant of defendant’s computer. The affidavit explained how the police had searched the Internet for files “whose digital SHA-1 value was identical to that of a file known to contain child pornography.” They found a computer with an Internet Protocol address of 70 … 167 offering to share one such known file, and then subpoenaed AT&T to get the physical address of the subscriber with that IP address. The computer was located in Affton, Missouri.

The police detective’s affidavit explained how the hash values and offer to upload established “that a computer in Missouri was ‘offering to participate in the distribution of known child pornography.’” Based on this affidavit, the judge found probable cause to issue the search warrant of the computers located in Warren’s home. The police then went to his home, found no one there, forced entry, and seized his computer. Warren himself later came along, and, foolishly enough, voluntarily came to the police station, waived his right to counsel several times, and spoke at length to the police. The opinion includes extensive excerpts of the taped interview, which Warren later argued was made in violation of his right to legal counsel.

The defendant’s technical search warrant objections forced the court to delve into many of the characteristics and evidentiary properties of hash. For that reason alone, the case is useful to any practitioner trying to better understand the subject. But what is really special about the case, at least for me, is the system of hash file identification used by the court to identify the offending video tape at issue in this case. That video computer file was the key piece of evidence, the “smoking gun.”

Six-Place Hash Truncation Naming Protocol

The opinion by Magistrate Judge David D. Noce in Warren is unusual and special because it is the first case to use the truncated hash value labeling system I proposed in HASH: The New Bates Stamp. My article was not mentioned, and apparently Judge Noce was not aware of it. He used the six-place hash truncation system I proposed in my article because it was, in his words, “convenient” to do so, and because the detectives had used that system in their affidavits and testimony. I doubt the police detectives had read my law review article either, which makes their use of the abbreviation system all the more important. It shows that it is a natural and reasonable thing to do, although this is the first time it has been utilized or mentioned in a legal opinion.

So what is the six-place hash truncation system which I proposed that these Missouri officials are now in fact using? Before I can answer that, I have to go into a little more depth about hash and Bates stamps. HASH: The New Bates Stamp not only explains hash and its importance to e-discovery, it also argues for the legal profession and e-discovery industry to adopt a new type of electronic document naming protocol that uses hash values, instead of sequential numbering, to identify electronic evidence. I argue that the time has come for the legal profession to abandon Nineteenth Century Bates stamp paper mentality, and adopt Twenty-First Century ESI hash mentality. I proposed that sequential Bates stamps be replaced by non-linear, intrinsic hash values.

The hash values would not only identify ESI, they would authenticate it too, something the lowly Bates stamp could never do. But the problem with using hash values to identify ESI, instead of Bates stamps, is that hash values are too long and awkward for the human mind. Here is what a typical forty-place hexadecimal SHA-1 hash value looks like: 2B37BC6257556E954F90755DDE5DB8CDA8D76619.

Police detectives, lawyers and judges cannot go around describing computer files used as evidence with such long alphanumerics. It is too cumbersome a name to replace the Bates stamp. So my common sense proposal, which Judge Noce in Warren calls “convenient,” is to only use the first and last three places of the hash value, instead of all forty. So the hash value above becomes the much more manageable 2B3 … 619. That truncated hash value becomes a pretty good document name, and, in my opinion and that of many others, should replace the arbitrary Bates stamp.
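The truncation itself is simple enough to sketch in a few lines of Python. This is only an illustration of the naming idea described above, not a tool used by the court or the detectives; the function names are my own invention, and the only library call is the standard hashlib SHA-1 routine.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the forty-place hexadecimal SHA-1 value of some bytes."""
    return hashlib.sha1(data).hexdigest().upper()

def truncated_name(hash_value: str, places: int = 3) -> str:
    """Keep only the first and last `places` characters of a hash value."""
    value = hash_value.strip().upper()
    if len(value) < 2 * places:
        raise ValueError("hash value is too short to truncate")
    return f"{value[:places]} … {value[-places:]}"

# The sample SHA-1 value quoted in the text above.
full_value = "2B37BC6257556E954F90755DDE5DB8CDA8D76619"
print(truncated_name(full_value))  # 2B3 … 619
```

The full value still has to be kept somewhere, such as a load file or an affidavit, because the six-place name is only a shorthand and cannot by itself authenticate anything.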

Turns out that the detectives in Missouri were already following this six-place truncation protocol at the time my article was published in June 2007. Perhaps they and other law enforcement agencies have been using this system for years. I do not know for sure, although I doubt it has been a widespread practice. I have talked to many e-discovery forensic experts about the hash naming proposal over the past two years. Many of these experts did police work before going into e-discovery, and none ever mentioned having done this before. Also, it certainly does not appear in the legal literature on the subject, that is, until U.S. v. Warren.

Hexadecimal Values v. Base32 Number System

At first, I was disappointed to see that Judge Noce’s introduction of the truncated hash value naming protocol was flawed with two obvious technical errors. See if you can catch them:

The search turned up a list of files, including one with a 32-character alpha-numeric SHA1 designation of “H4V … UTI.” Fn4

FN4 - For convenience, in this opinion the SHA1 value set out in full in the search warrant affidavit will be referred to as “H4V … UTI.” The affidavit defined the term “SHA1” (also known as “SHA-1”) as being a mathematical algorithm that uses the Secure Hash Algorithm (SHA), developed by the National Institute of Standards and Technology (NIST), along with the National Security Agency (NSA) . . . Basically the SHA1 is an algorithm for computing a condensed representation of a message or data file like a fingerprint.

Warren at *1.

First of all, the SHA-1 hash generates a 40-character hexadecimal string, not 32-character. The other kind of hash, MD5 hash, is the one that uses a 32 character string, not SHA-1. For this reason, my first reaction was that the Judge, or police, mixed up the two different types of hash, and meant to say 40 characters, not 32.

But then there seemed to be yet another, even bigger mistake. The letters H, V, U, T and I should not have been in the hash value name. The values generated in e-discovery work to represent SHA-1 and MD5 hash are always hexadecimal. That is a numerical system with a base of 16. This is typically represented by the numbers 0–9 for the first ten values, and A, B, C, D, E, and F to represent the last six, for a total of sixteen. In other words, a hexadecimal value does not employ any letters after F. Yet, the so-called SHA-1 alphanumeric stated in the Warren opinion uses the letters H, V, U, T and I: “H4V … UTI.”

I thought the police or Judge Noce must have messed things up, but I also seemed to remember reading somewhere that there were other ways to express hash values, and anyway, I am always very careful before I tell a judge that he or she is wrong. So, doing a little online research, I learned that there are indeed other ways to display hash values using different binary-based number systems, typically the base 32 or base 64 number systems. Base32 is defined in IETF RFC 3548 as using the characters A-Z and 2-7, while Base64 is defined in IETF PEM RFC 1421 as using the characters A-Z, a-z, 0-9, / and +.
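The alphabet test I was doing in my head can be written out as a short Python sketch. The three character sets come straight from the definitions above; the function name and the decision to treat hexadecimal and Base32 as case-insensitive are my own assumptions for illustration.

```python
import string

# Character sets as described in the text above.
HEX_ALPHABET = set(string.digits + "ABCDEF")
BASE32_ALPHABET = set(string.ascii_uppercase + "234567")
BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/")

def possible_encodings(fragment: str) -> list:
    """List the number systems whose alphabet contains every character."""
    upper = set(fragment.upper())
    names = []
    if upper <= HEX_ALPHABET:
        names.append("hexadecimal")
    if upper <= BASE32_ALPHABET:
        names.append("Base32")
    if set(fragment) <= BASE64_ALPHABET:  # Base64 is case-sensitive
        names.append("Base64")
    return names

print(possible_encodings("H4VUTI"))  # ['Base32', 'Base64']
print(possible_encodings("2B3619"))  # ['hexadecimal', 'Base64']
```

A check like this can only rule a number system out. It cannot prove which one was actually used, since a short fragment may fit more than one alphabet.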

My Online Investigation of Base32 Hash Math Led to a Shocking Discovery

Coming back to the Warren opinion, the hash values “H4V … UTI” are not hexadecimal, but they could be either Base32 or Base64. At this point, I did a little more online research about Base32 hash, and quickly found that there are many websites where you can locate music and videos to download based on their hash values. Almost right away, by simply using Google, I located a site where you can find media to download based upon their SHA1 Base32 value. It then took less than a minute to find the web page where the Base32 SHA-1 hash values were listed that began with “H4V.” That is how all of the media on the site was listed, in order based upon the first three characters of their Base32 hash values.

There were 83 entries on the webpage whose hash values began with H4V. The site included listings of music and videos ranging from Beethoven’s Symphony No. 9 to a video of Lee Trevano’s Golf Instruction. One video listing which was 11.1 MB in size had a disturbing title that suggested it could contain the kind of porn referenced in Warren. It was dated May 29, 2003. I clicked on its hash value button and saw that the full SHA-1 hash value for this video was H4VIBLSKAZ477WRTKH7IURE6NXEDCUTI.

When I saw that hash value, it shook me up. The first and last three values exactly matched the hash described in Warren: H4V … UTI. My academic investigation of the mathematical properties of hash had led me right to the smoking gun in Warren! I knew from my article, and the research of Bill Speros described in footnote 168, that this match of the first and last three values meant there was a 98.6% probability that this was the exact same file referenced in Warren.  Mr. Warren was charged with a felony for distributing this same video. I think it is a crime to even have it on your computer.

I do not know for sure if it is the same file, since the Warren opinion nowhere states the full hash value, but in view of the description of this video, it is just too much of a coincidence for it not to be.  It was astonishing on many levels to see just how quickly you can find a file like this on the Internet, simply by knowing the first three hash numbers. 

It is probably not possible to actually download or view the file from this website. I do not really know for sure, since that would involve clicking on this file, which I was not about to do. But when I clicked on the link for Beethoven’s Symphony No. 9, a piece of media which I do not find morally reprehensible, it took me to another web page. This page had links to other computers where you may in fact have been able to download Beethoven’s music. (I did not try, recognizing that might be a copyright violation.) At that point, the referring website included a statement that it “ONLY HAS INFO ABOUT FILES, AND DOES NOT OFFER ANY FILES FOR DOWNLOAD.” Still, if any law enforcement agency wants to contact me for the full website address, including Cuomo’s group, I would be happy to provide it. It is really very easy to find, and so I assume the proper authorities are already well aware of this site and its hash values, or lack thereof. I am certainly no police officer, and even if I were, I would not have the stomach for this kind of investigative work. Reading the email of parties in civil suits is about as horrid as I can handle.

Judge Noce Was Right

This little investigation proved to me that Judge Noce and the St. Louis police were correct. There is a way of writing a SHA-1 hash that has 32 places, not 40, and it can use the whole alphabet, not just A-F.

The hash value H4V … UTI is indeed a correct first and last place truncation of a full SHA-1 hash value. But it is a SHA-1 hash that is expressed in Base32, not hexadecimal. Although the hash values used in e-discovery are almost always hexadecimal, the hash values used in “Peer-to-Peer” websites include a variety of different numerical systems, frequently including the Base32 system.
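The arithmetic behind the two lengths is worth spelling out. A SHA-1 value is 160 bits long. Hexadecimal carries four bits per character, so it needs 40 places; Base32 carries five bits per character, so it needs only 32. The Python sketch below shows the same digest written both ways, using the standard hashlib and base64 modules. The helper names and the sample input are my own, chosen only for illustration.

```python
import base64
import hashlib

def hex_to_base32(hex_value: str) -> str:
    """Re-express a hexadecimal hash value in Base32."""
    return base64.b32encode(bytes.fromhex(hex_value)).decode("ascii")

def base32_to_hex(b32_value: str) -> str:
    """Re-express a Base32 hash value in hexadecimal."""
    return base64.b32decode(b32_value).hex().upper()

# Any input will do; the digest is always 20 bytes (160 bits).
digest = hashlib.sha1(b"sample data").digest()
hex_form = digest.hex().upper()
b32_form = base64.b32encode(digest).decode("ascii")

print(len(hex_form), len(b32_form))  # 40 32
```

Nothing about the hash changes in the conversion. It is the same 160-bit number in a different notation, which is why a Base32 value can always be converted back to the hexadecimal form more familiar in e-discovery.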

In addition, in my brief investigation of the P2P sites, I learned that countless P2P-type websites now commonly use the first three places of hash values as a convenient shorthand naming system. For all I know, the “perps” may do so as well. As Judge Noce says, it is the convenient thing to do. So when will the e-discovery vendors start doing so too?


Adversarial Search, a “Perfect Barrier” to Cost Effective e-Discovery, and One Litigant’s “Aikido-like” Response

August 3, 2008

I came across a case recently where a defendant successfully employed an “Aikido-like” maneuver to prevail in an e-discovery fight. Perfect Barrier, LLC v. Woodsmart Solutions, Inc., 2008 WL 2230192 (N.D. Ind. May 27, 2008). Plaintiff’s counsel took an overaggressive approach to e-discovery which defense counsel completely turned around on him. Plaintiff’s counsel ended up losing a motion for sanctions and driving up his client’s e-discovery costs. This case demonstrates the essence of Aikido in action, and the dangers of trying to misuse e-discovery as a weapon. It has many other interesting points to it as well, including an argument over native production using hash values, versus flat TIFF file production using Bates stamps. This debate is at the core of my law review article, HASH: The New Bates Stamp, 12 Journal of Technology Law & Policy 1 (June 2007), and the practitioners here cite to the article as part of their arguments.

But first, a little about Aikido. It is a purely defensive martial art that redirects the force of the attacker back upon himself, instead of opposing it directly. The usual result is the attacker being thrown to the ground as shown in the photo above. For an amazing demonstration of Aikido by Steven Seagal, see this YouTube video, where he is first attacked by one black belt, then two, three, and then a whole “class action.” They all end up on the ground, with Seagal barely breaking a sweat. Many consider Aikido the purest and most elegant of all martial arts, and one of the most difficult to master. It is considered a “non-violent martial art” (an oxymoron, I know, much like “cooperative litigation”), whose primary message is peace and reconciliation. It is only used for defense, to thwart attacks, never for offense. It works by channeling the force of the attacker, not resisting or opposing it. As the well known founder of Aikido, Morihei Ueshiba, explained:

Nonresistance is one of the principles of aikido. Because there is no resistance, you have won before even starting. People whose minds are evil or who enjoy fighting are defeated without a fight.

These are sage words, but it is hard to imagine how they apply to litigation and e-discovery. That is where the Perfect Barrier case comes in. Perfect Barrier, LLC v. Woodsmart Solutions, Inc., 2008 WL 2230192 (N.D. Ind. May 27, 2008). This is a relatively small dollar value case against Woodsmart Solutions, a maker of blue coated lumber.

Plaintiff begins discovery with a direct attack, making an obviously overbroad request for production of emails. The request included a list of seventy-seven so-called “relevant search terms” that Plaintiff’s counsel created on their own. The list included the defendant’s name, “woodsmart,” and several other common words. At this point, most would actively resist the overbroad request. They would rise to the fight with either a motion for protective order, or perhaps their own counter-offensive request for emails using a keyword list that was just as broad. But that is not the Aikido way. Nor is it the way of defense counsel in this case, Stefan Stein, whom I do not know, but who, according to my research, is certainly not a martial artist, and may never even have heard of Aikido. He is, however, a very experienced intellectual property lawyer and obviously wise in the ways of litigation.

Defendant’s attorney responded to this attack, not by resisting it, but by pointing out the obvious, that the list of keywords was so broad that it would catch virtually every email on Defendant’s server. He offered to compromise and negotiate a new list of keywords and other search protocols with Plaintiff’s counsel, a list and procedures that would be more effective to the supposed common goal of ferreting out relevant emails. Preferring to fight, Plaintiff’s attorneys did not respond to the peace overture. They refused all negotiation, and instead insisted that Defendant carry out the search with the list they had written. Plaintiff did not care that the search terms would produce too many emails because they assumed that the burden of review, to screen for confidential and unresponsive ESI, would fall on the Defendant, the producing party. They saw this as a potential knock out punch, that might force Defendant to surrender on their terms, instead of incurring the tremendous review expense. By all appearances, they were misusing e-discovery as a weapon.
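The point about overbroad keywords is easy to demonstrate with a quick hit-rate test on a sample of messages. The Python sketch below is purely hypothetical: the sample messages, the function names, and the fifty percent threshold are all my own assumptions, and nothing in the opinion suggests the parties ran anything like it. It simply shows why a company’s own name, which appears in every signature block, makes a poor search term.

```python
def term_hit_rates(messages, terms):
    """Map each term to the fraction of messages containing it
    (simple case-insensitive substring match)."""
    lowered = [m.lower() for m in messages]
    total = len(lowered)
    rates = {}
    for term in terms:
        needle = term.lower()
        hits = sum(1 for text in lowered if needle in text)
        rates[term] = hits / total if total else 0.0
    return rates

def overbroad_terms(messages, terms, threshold=0.5):
    """Return the terms that hit at least `threshold` of the messages."""
    rates = term_hit_rates(messages, terms)
    return [t for t in terms if rates[t] >= threshold]

# Hypothetical sample: every message carries the company signature.
sample_messages = [
    "Lunch on Friday? -- Pat, WoodSmart Solutions",
    "Invoice attached. -- Lee, WoodSmart Solutions",
    "Warranty claim on the blue lumber shipment. -- Pat, WoodSmart Solutions",
    "Holiday schedule. -- HR, WoodSmart Solutions",
]

print(term_hit_rates(sample_messages, ["woodsmart", "warranty"]))
# {'woodsmart': 1.0, 'warranty': 0.25}
print(overbroad_terms(sample_messages, ["woodsmart", "warranty"]))
# ['woodsmart']
```

Numbers like these, drawn from a small sample, are the kind of thing the parties could have exchanged when negotiating a better list.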

Defense counsel at this point did not resist. They agreed, reluctantly, to the search terms, but then added a small twist: that the parties would make the production under a confidentiality order. Of course, this is standard in most business litigation, and so Plaintiff quickly agreed, sensing what they thought was weakness and capitulation on Defendant’s part. Little did they know that their hands had just been grabbed, much as Seagal did to his attackers in the video. Plaintiff’s attorneys underestimated their adversary, and just like the many Steven Seagal attackers, they would soon find themselves thrown for a loop.

The parties then entered into a confidentiality agreement, which became a stipulated discovery order. Too bad Plaintiff’s counsel did not read the stipulated discovery order more carefully. He was obviously too busy with the attack to read the fine print. Defendant then ran the search terms as agreed, and made production to Plaintiff’s counsel of all email containing these terms. The production was made in native format on DVD, and not too surprisingly, there were 75,000 pages of email in this small case. However, what was surprising was the confidentiality designation notice that accompanied the production. The notice designated all of the email produced as falling within the highest category of confidentiality, the so called “Attorneys Eyes Only” confidentiality. Under the stipulated order this meant that only Plaintiff’s counsel could look at these emails, and they could not show any of them to their client, or their expert, or anyone. They could not even use them in court, unless and until they first went through the procedures of the agreement to challenge the confidentiality designations.

This little twist by Defense counsel effectively threw Plaintiff to the ground. It shifted the enormous burden of review of these 75,000 emails from the responding party, here the Defendant, to the requesting party, here the Plaintiff. This is, in my view, a perfect legal example of the Aikido philosophy of defense by redirecting the attacker’s force upon himself.

In the words of Ueshiba, who liked to call Aikido the “Art of Peace”:

In the Art of Peace we never attack. An attack is proof that one is out of control. Never run away from any kind of challenge, but do not try to suppress or control an opponent unnaturally. Let attackers come any way they like and then blend with them. Never chase after opponents. Redirect each attack and get firmly behind it.

To see Ueshiba himself as an elder practicing Aikido, see this YouTube video. It looks fake, but it’s really not. These are young black belts trying their hardest to knock down an old man. By the way, I am not an Aikido practitioner. I have never even tried it. I am just an admirer of its philosophy and techniques. Still, I have studied and practiced other martial arts and earned a brown belt in one of them.

Back to Perfect Barrier, Plaintiff’s counsel responds to the designation of all 75,000 emails as “Attorney Eyes Only,” with a motion for sanctions. It reminds me of the attackers in the Seagal and Ueshiba videos who get up after a throw down and try again to attack. Plaintiff’s counsel argued quite truthfully in the motion that it was never his intention to allow Defendant to designate all emails as “Attorneys Eyes Only,” but only ones that qualified for that designation by virtue of the super confidential nature of the communication or attached document. Plaintiff claimed that all of these emails certainly did not qualify for that classification, and so Defendant violated the stipulated order and should now be sanctioned. This was a pretty credible attack, by a black belt of an attorney, upon a seemingly weak and vulnerable adversary. Here is a copy of Plaintiff’s Memorandum supporting the sanctions motion. It has several interesting exhibits attached to it concerning hash that I will discuss later.

Defendant’s Opposition Memorandum pointed out that nothing in the parties’ agreement prevented them from designating all email as “Attorneys Eyes Only” and argued that shifting the burden and costs of review here onto the requesting party was justified and permitted under the rules and case law. Here is the primary thrust of their argument:

Because Plaintiff is alone responsible for the impossibly large volume of documents recovered, it should bear the burden of having to review the documents for relevance. . . .

Requiring WoodSmart to review all 75,000 pages of emails for confidentiality would be unduly burdensome and would reward Plaintiff’s overly broad discovery requests.

Magistrate Judge Christopher A. Nuechterlein seemed to understand perfectly well what was going on here, that this was a case of the Plaintiff, Perfect Barrier, attacking unnecessarily with an overbroad request, and then getting exactly what they deserved. Here is what Judge Nuechterlein said in denying Plaintiff’s motion:

Nothing in the protective order prevents large categorical designations. If Perfect Barrier desired Woodsmart to be more selective in its use of the confidential designation, Perfect Barrier should have utilized more care in drafting the agreed protective order to include more particular language that is consistent with its position. As it stands, the language of the protective order simply requires a “category” designation. Therefore, this Court finds that Woodsmart followed and did not violate the protective order.

While Perfect Barrier may have a voluminous amount of discovery to parse through, it has no entity to blame except itself. Perfect Barrier provided the search terms to Woodsmart as part of its request for production of the email communications. Woodsmart produced every document that appeared with those search terms. In other words, Woodsmart provided Perfect Barrier with every possible document that Perfect Barrier requested. It was Perfect Barrier’s expansive request that produced such voluminous discovery. . . .

To be clear, Woodsmart has not withheld the emails from Perfect Barrier, it has, however, limited Perfect Barrier’s use of them by designating them “attorney eyes only.” If Perfect Barrier upon examination believes that certain emails were inappropriately characterized as “attorney eyes only,” they may challenge the designation, first with Woodsmart and, if that fails to resolve the dispute, then with the Court.

Judge Nuechterlein went on to warn the parties that he would not countenance any attempt by either side to try to shift the burden once again onto him. He said not to send large batches of email for him to review in camera. Instead, he expects Plaintiff’s counsel to sort through it all, and then get with defense counsel to resolve any disagreements as to what should be excluded, or not. If they insist on further adjudication of these e-discovery issues, Judge Nuechterlein threatened to send the whole thing to a Special Master, and force the parties to pay for it. Once again we see a court speak of possible reference of issues to a Special Master as a kind of threat to cajole agreement.

The Plaintiff attacked once again, and objected to Defendant’s production of the email in its original native format. Plaintiff wanted the whole thing to be redone and reproduced in flat, searchable TIFF images, no doubt so they could load it into their review software. There is a significant expense involved in converting native email into a searchable image format, and Plaintiff was trying to shift the expense onto Defendant.

The court rejected this attack also and held that since Plaintiff had failed to specify a form of production in its request for email, the production in original native format was permitted under Rule 34(b)(2)(E)(ii), which states:

[i]f a request does not specify a form for producing electronically stored information, a party must produce it in a form or forms in which it is ordinarily maintained or in a reasonably usable form or forms.

Native format is the form in which Defendant originally maintained the email, and so the production was perfectly proper. In any event, the court held that a native form production is also reasonably usable, so there are no grounds to complain. Here are Judge Nuechterlein’s words on this issue:

Perfect Barrier did not request that the emails be produced in a particular form, yet Perfect Barrier now asks this Court to force Woodsmart to produce the electronic emails as Static Images with a bates-number identifier. Woodsmart objects to this request because it would cost a substantial sum of money to convert the documents from the form in which the documents are normally kept, Native format, to Static Images.

Woodsmart has already produced the emails on a disc in Native format. Woodsmart maintains the email documents in such a format. Fed.R.Civ.P. 34 only requires Woodsmart to submit the emails in the format in which it keeps them, Native format, and nothing more. While it may be more convenient for Perfect Barrier to have the emails as Static Images, Fed.R.Civ.P. 34 does not provide that convenience is a basis for requiring electronic discovery to be produced in a different format than normally maintained. If Perfect Barrier wanted the emails as Static Images, it should have specified this request in its requests for production, which it did not do.

Furthermore, this Court finds that the emails produced on an electronic media such as disc is reasonably usable. Perfect Barrier can access, examine, and even print the communications. While Perfect Barrier may prefer to have them as Static Images, the burden to convert the emails to Static Images remains with Perfect Barrier. Woodsmart complied with Fed.R.Civ.P. 34(b)(2)(E) and is required to do nothing more.

The court once again uses a “gotcha” on the Plaintiff, saying you forgot to ask, so you are getting what you deserve. But in reality, even if Plaintiff had originally asked for production in Static Image form, as the court puts it, that would not necessarily oblige Defendant to comply. Defendant could still have objected, and insisted on native production, arguing that the alternative was simply an attempt to shift the requesting party’s own cost of processing upon the responding party. I have not seen a case on that yet, but I expect this will come up soon.

Finally, this case is very interesting to me because the parties’ memorandums underlying the order include as exhibits several emails and letters between the attorneys where they argue about the meaning of the law review article I wrote on Hash, and the need for Bates stamps, or not, on ESI productions. See Exhibits “D” and “E” to Plaintiff’s Memorandum in Support of Motion to Compel, and Exhibits “D” and “F” to Defendant’s Opposition Memorandum. I do not know any of the attorneys in this Indiana case, so I was surprised to stumble across this debate. 

My article on hash was used primarily to support the producing party’s position that the receiving party should bear the substantial costs of Tiffing and Bates stamping. But the receiving party relied on it too, claiming that the best practice advice in the article to include a load file with hash values was not followed. Since the production was probably just a few big PST files, that would have been easy to do, but would not really have addressed the receiving party’s concerns regarding the authenticity of individual emails within the PST. There are, however, other ways to address the authenticity issues which were not really explored by the parties. That was because their emails attached to the memorandums show that one side was just arguing, and not really trying to solve the problem. It was just a debate, not an e-discovery Art of Peace collaborative venture.
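For readers wondering what a load file with hash values might look like in practice, here is a minimal Python sketch. It is an assumption-laden illustration, not the format discussed by the attorneys in this case: the function names, the two-column CSV layout, and the choice of SHA-1 are all mine. It walks a production folder and records one hash value per file.

```python
import csv
import hashlib
from pathlib import Path

def sha1_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large PST files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()

def write_hash_manifest(production_dir, manifest_path):
    """Write a CSV listing each produced file and its SHA-1 value."""
    root = Path(production_dir)
    rows = [
        (str(path.relative_to(root)), sha1_of_file(path))
        for path in sorted(root.rglob("*"))
        if path.is_file()
    ]
    with open(manifest_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "sha1"])
        writer.writerows(rows)
    return rows

# Example call (hypothetical paths):
# write_hash_manifest("production_dvd", "hash_manifest.csv")
```

As noted above, a manifest like this authenticates each PST container as a whole. It says nothing about an individual email inside the container, which would need its own hash after extraction.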

Morihei Ueshiba (1883-1969)

I leave you to contemplate a few of the quotes of Sensei Ueshiba. The insights he gained into the resolution of physical combat have, for me at least, some cross-over value to resolution of legal disputes in today’s combative system of justice. As Perfect Barrier shows, his philosophy and techniques can sometimes succeed perfectly in the arena of e-discovery.

Be grateful even for hardship, setbacks, and bad people. Dealing with such obstacles is an essential part of training in Aikido.

Failure is the key to success; each mistake teaches us something.

If your opponent tries to pull you, let him pull. Don’t pull against him; pull in unison with him.

Opponents confront us continually, but actually there is no opponent there. Enter deeply into an attack and neutralize it as you draw that misdirected force into your own sphere.

Even the most powerful human being has a limited sphere of strength. Draw him outside of that sphere and into your own, and his strength will dissipate.

The real Art of Peace is not to sacrifice a single one of your warriors to defeat an enemy. Vanquish your foes by always keeping yourself in a safe and unassailable position; then no one will suffer any losses. The Way of a Warrior, the Art of Politics, is to stop trouble before it starts. It consists in defeating your adversaries spiritually by making them realize the folly of their actions. The Way of a Warrior is to establish harmony.

Never think of yourself as an all-knowing, perfected master; you must continue to train daily with your friends and students and progress together in Aikido.

The techniques of Aikido change constantly; every encounter is unique, and the appropriate response should emerge naturally. Today’s techniques will be different tomorrow. Do not get caught up with the form and appearance of a challenge. Aikido has no form – it is the study of the spirit.