<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Law Review Article Published on the Mathematics Underlying e-Discovery: &#8220;HASH: The New Bates Stamp&#8221;</title>
	<atom:link href="http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/feed/" rel="self" type="application/rss+xml" />
	<link>http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/</link>
	<description>A Team approach to electronic discovery combining the talents of Law and IT.  The views expressed in this blog are my own, and not necessarily those of my law firm or clients. Copyright Ralph Losey 2008. All Rights Reserved.</description>
	<pubDate>Wed, 20 Aug 2008 23:51:13 +0000</pubDate>
	<generator>http://wordpress.org/?v=MU</generator>
		<item>
		<title>By: The Days of the Bates Stamp Are Numbered &#171; e-Discovery Team</title>
		<link>http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-6358</link>
		<dc:creator>The Days of the Bates Stamp Are Numbered &#171; e-Discovery Team</dc:creator>
		<pubDate>Sun, 11 May 2008 13:49:40 +0000</pubDate>
		<guid isPermaLink="false">http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-6358</guid>
		<description>[...] with 174 footnotes leaves you cold, I suggest you try my Hash Page summary instead, or my earlier blog on Hash. They will give you a pretty good idea of how hash is the mathematical foundation of e-discovery, [...]</description>
		<content:encoded><![CDATA[<p>[...] with 174 footnotes leaves you cold, I suggest you try my Hash Page summary instead, or my earlier blog on Hash. They will give you a pretty good idea of how hash is the mathematical foundation of e-discovery, [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jacques Francoeur</title>
		<link>http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-4311</link>
		<dc:creator>Jacques Francoeur</dc:creator>
		<pubDate>Thu, 06 Dec 2007 20:40:20 +0000</pubDate>
		<guid isPermaLink="false">http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-4311</guid>
		<description>Hi Ralph, for ease of discussion my comments are referenced to your entry dated December 4, 2007 at 3:55 pm by paragraph number. 

Paragraph 2: I am not suggesting that all ESI created be time stamped. There are more significant record events than time of creation, such as time of contract execution, time of filing or time of corporate record declaration. These key business events have important subsequent time-based events, such as time of destruction based on a prescribed retention period.

Paragraph 3: I guess by “overkill” you mean high level of assurance. It does provide a much higher level of assurance but at a lower cost. This combined value is compelling as an alternative approach. Timestamping can be very useful in the e-Discovery process by providing a record-level method of irrefutably proving that the authenticity of each record identified as relevant has been preserved. It does so irrespective of where the record has been distributed, where it is stored or under whose control it has been. Therefore, it does not depend on a chain of custody involving parties one has very little control over. This is important when there are a number of individuals involved in the e-Discovery review and analysis process, including opposing counsel.

Paragraph 4: The use of a hash as a method of identifying a record is not what we are talking about here. A hash can be used to identify a record and demonstrate that an identical copy was made at a point in time. However, it does not prove the authenticity of the record. For this, one must reach back to demonstrate the record is what it purports to be at the time the “assertion” in question is being made. Not at the time it was identified as relevant in the e-Discovery process. 

Paragraph 5: How can one assert that a record is authentic (it is what it purports to be) when it was just tagged with a hash at time of collection? This is an over-assertion that is false. The only assertion that can be made is exactly what you say: “this is the same file that existed in the producing party’s computers when the collection was made.” – period! 

Paragraph 6: In general, the earlier a record is timestamped, the earliest being time of creation, the more confident one can be in the assertion of its authenticity. It is more valuable to associate the period of “persistent provable authenticity” from the time of a business-significant event such as contract execution or corporate record declaration. Again what is important in terms of authenticity is – from the time the assertion is made.  I will make the point again that if a person has control of the controls around the data and there is a motive to modify the record, they could do so without detection even with the presence of a hash. From this time forward, proving identical copies captured during the e-Discovery process will mean little to those affected by the modifications.

The “significance of this limitation” is that the assertion of authenticity is overstated and it does not address the core requirement of the Federal Rules of Evidence, Article IX (Rule 901(a)), which states – “The requirement of authentication or identification as a condition precedent to admissibility is satisfied by evidence sufficient to support a finding that the matter in question is what its proponent claims.” The best claim that can be made in your case is that the record did not change during the e-Discovery process. This adds little to demonstrating that the record is what it purports to be at the time the assertion was made.

I disagree with “burden lies on the objecting party to provide some evidence that the document is not genuine” – the burden to demonstrate the authenticity of the ESI being proffered is on the proponent of the information. I would refer to the recent American Express precedent (In re Vee Vinhnee, 336 B.R. 437 (9th Cir. BAP (Cal.) 2005)) – “The court declined to admit plaintiff’s computerized business records as inadequately authenticated …”. Vinhnee never challenged the authenticity of those records, nor did he even show up in court. 

I would also refer you to the Grimm decision (Lorraine v. Markel American Ins. Co., 241 F.R.D. 534 (D. Md. 2007)), which stated “… considering the significant costs associated with discovery … it makes little sense to go to all the bother and expense to get electronic information only to have it excluded from evidence … because the proponent cannot lay a sufficient foundation to get it admitted.”

Paragraph 9: The greater benefit of timestamping is less about detecting fraud, which is the exception, and more about the ability to prove “good” behavior. It is the 99.999% of people who are good who need to be protected by an effective method to refute claims of inappropriate behavior. I would refer you to what happened to Arthur Andersen. After the rash of fraudulent corporate behavior (e.g., Enron and options backdating) there is a strong need to be able to quickly and effectively prove good behavior.
We seem to have forgotten an important point. One cannot look at e-Discovery without the greater context of an organization’s governance responsibilities as they relate to the reliability of its corporate information. Regulations governing corporations, for example the Sarbanes-Oxley Act, specify the requirement to ensure the reliability of financial information systems and the integrity of financial records. This is where you start the process of ensuring the authenticity of records – to comply with your governance requirements. Then, if any of these records find themselves relevant to e-Discovery, one can instantly prove authenticity and compliance. 

Paragraph 10: It is in fact a risk based decision as to which methods an organization decides to adopt to ensure the authenticity of their corporate records. They have two choices, either by “inferred” approaches based on external system-level (perimeter) controls or by an “intrinsic” method based on a data-level control. The approach taken will then pre-determine how the organization can subsequently demonstrate the authenticity of their records in judicial or regulatory proceedings. The cost and complexity associated with the inferred authenticity approach is much higher, easily challenged and consequently the level of assurance of successfully demonstrating authenticity is lower. In other words, the risk of not meeting the burden of proof is higher. Is this risk real? The previously cited American Express precedent excluded their corporate records because they were unable to establish their authenticity to a level satisfactory to the Judge, even after several attempts to do so. They took a risk with “inferred” approaches and they lost. 

Paragraph 11: As previously stated, the need is to demonstrate authenticity of the record well before the e-Discovery process. Your presumption as it relates to e-Discovery is based on an effective “chain of custody” between multiple parties, some friendly and others not. Depending on the external controls and trusted parties is a risk. Some will be willing to take that risk. Putting a record-level control eliminates this risk. It is a risk and cost based decision. 

Paragraph 13: Again, I would respond that the need for demonstrating authenticity of a record in judicial and regulatory proceedings relates more to when the record “assertion” was made than to when it was tagged as relevant in the e-Discovery process. 

Regards,

Jacques R. Francoeur
ProofSpace</description>
		<content:encoded><![CDATA[<p>Hi Ralph, for ease of discussion my comments are referenced to your entry dated December 4, 2007 at 3:55 pm by paragraph number. </p>
<p>Paragraph 2: I am not suggesting that all ESI created be time stamped. There are more significant record events than time of creation, such as time of contract execution, time of filing or time of corporate record declaration. These key business events have important subsequent time-based events, such as time of destruction based on a prescribed retention period.</p>
<p>Paragraph 3: I guess by “overkill” you mean high level of assurance. It does provide a much higher level of assurance but at a lower cost. This combined value is compelling as an alternative approach. Timestamping can be very useful in the e-Discovery process by providing a record-level method of irrefutably proving that the authenticity of each record identified as relevant has been preserved. It does so irrespective of where the record has been distributed, where it is stored or under whose control it has been. Therefore, it does not depend on a chain of custody involving parties one has very little control over. This is important when there are a number of individuals involved in the e-Discovery review and analysis process, including opposing counsel.</p>
<p>Paragraph 4: The use of a hash as a method of identifying a record is not what we are talking about here. A hash can be used to identify a record and demonstrate that an identical copy was made at a point in time. However, it does not prove the authenticity of the record. For this, one must reach back to demonstrate the record is what it purports to be at the time the “assertion” in question is being made. Not at the time it was identified as relevant in the e-Discovery process. </p>
<p>Paragraph 5: How can one assert that a record is authentic (it is what it purports to be) when it was just tagged with a hash at time of collection? This is an over-assertion that is false. The only assertion that can be made is exactly what you say: “this is the same file that existed in the producing party’s computers when the collection was made.” – period! </p>
<p>Paragraph 6: In general, the earlier a record is timestamped, the earliest being time of creation, the more confident one can be in the assertion of its authenticity. It is more valuable to associate the period of “persistent provable authenticity” from the time of a business-significant event such as contract execution or corporate record declaration. Again what is important in terms of authenticity is – from the time the assertion is made.  I will make the point again that if a person has control of the controls around the data and there is a motive to modify the record, they could do so without detection even with the presence of a hash. From this time forward, proving identical copies captured during the e-Discovery process will mean little to those affected by the modifications.</p>
<p>The “significance of this limitation” is that the assertion of authenticity is overstated and it does not address the core requirement of the Federal Rules of Evidence, Article IX (Rule 901(a)), which states – “The requirement of authentication or identification as a condition precedent to admissibility is satisfied by evidence sufficient to support a finding that the matter in question is what its proponent claims.” The best claim that can be made in your case is that the record did not change during the e-Discovery process. This adds little to demonstrating that the record is what it purports to be at the time the assertion was made.</p>
<p>I disagree with “burden lies on the objecting party to provide some evidence that the document is not genuine” – the burden to demonstrate the authenticity of the ESI being proffered is on the proponent of the information. I would refer to the recent American Express precedent (In re Vee Vinhnee, 336 B.R. 437 (9th Cir. BAP (Cal.) 2005)) – “The court declined to admit plaintiff’s computerized business records as inadequately authenticated …”. Vinhnee never challenged the authenticity of those records, nor did he even show up in court. </p>
<p>I would also refer you to the Grimm decision (Lorraine v. Markel American Ins. Co., 241 F.R.D. 534 (D. Md. 2007)), which stated “… considering the significant costs associated with discovery … it makes little sense to go to all the bother and expense to get electronic information only to have it excluded from evidence … because the proponent cannot lay a sufficient foundation to get it admitted.”</p>
<p>Paragraph 9: The greater benefit of timestamping is less about detecting fraud, which is the exception, and more about the ability to prove “good” behavior. It is the 99.999% of people who are good who need to be protected by an effective method to refute claims of inappropriate behavior. I would refer you to what happened to Arthur Andersen. After the rash of fraudulent corporate behavior (e.g., Enron and options backdating) there is a strong need to be able to quickly and effectively prove good behavior.<br />
We seem to have forgotten an important point. One cannot look at e-Discovery without the greater context of an organization’s governance responsibilities as they relate to the reliability of its corporate information. Regulations governing corporations, for example the Sarbanes-Oxley Act, specify the requirement to ensure the reliability of financial information systems and the integrity of financial records. This is where you start the process of ensuring the authenticity of records – to comply with your governance requirements. Then, if any of these records find themselves relevant to e-Discovery, one can instantly prove authenticity and compliance. </p>
<p>Paragraph 10: It is in fact a risk based decision as to which methods an organization decides to adopt to ensure the authenticity of their corporate records. They have two choices, either by “inferred” approaches based on external system-level (perimeter) controls or by an “intrinsic” method based on a data-level control. The approach taken will then pre-determine how the organization can subsequently demonstrate the authenticity of their records in judicial or regulatory proceedings. The cost and complexity associated with the inferred authenticity approach is much higher, easily challenged and consequently the level of assurance of successfully demonstrating authenticity is lower. In other words, the risk of not meeting the burden of proof is higher. Is this risk real? The previously cited American Express precedent excluded their corporate records because they were unable to establish their authenticity to a level satisfactory to the Judge, even after several attempts to do so. They took a risk with “inferred” approaches and they lost. </p>
<p>Paragraph 11: As previously stated, the need is to demonstrate authenticity of the record well before the e-Discovery process. Your presumption as it relates to e-Discovery is based on an effective “chain of custody” between multiple parties, some friendly and others not. Depending on the external controls and trusted parties is a risk. Some will be willing to take that risk. Putting a record-level control eliminates this risk. It is a risk and cost based decision. </p>
<p>Paragraph 13: Again, I would respond that the need for demonstrating authenticity of a record in judicial and regulatory proceedings relates more to when the record “assertion” was made than to when it was tagged as relevant in the e-Discovery process. </p>
<p>Regards,</p>
<p>Jacques R. Francoeur<br />
ProofSpace</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steven W. Teppler</title>
		<link>http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-4284</link>
		<dc:creator>Steven W. Teppler</dc:creator>
		<pubDate>Wed, 05 Dec 2007 18:38:00 +0000</pubDate>
		<guid isPermaLink="false">http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-4284</guid>
		<description>Hello Ralph:   I am the founder of TimeCertain, one of the companies cited in your article.  I am also a practicing attorney who litigates digital evidence matters, co-Vice Chair of the ABA Information Security Committee, and co-author of an upcoming American Bar Association book entitled "Foundations of Digital Evidence" (exp. pub. Spring 2008).  One of the chapters for which I have chief responsibility deals with time and digital evidence.  Together with Hoyt Kesterson, II (one of the creators of the original X.500/509 standard), I also presented at the RSA 2006 Security Conference on issues relating to the SHA-1 forced-collision research on hashing functions that was published by a Chinese researcher in 2006.

Hash functions are useful, but they are not the panacea for all authentication ills. The utility of hash functions (and output) should not be overstated to claim what hashes do *not* provide.  Further, within the context of authenticating digital evidence, it is important to keep in mind *what* it is that one is authenticating.

The technical definition of a hash function is the manipulation of a variable-length input data string to produce a fixed-length output data string (typically shorter); that resulting output data string having two characteristics – (1) it is computationally infeasible to find two input strings that will produce the same output string, and (2) given an input string and the resultant output string, it is computationally infeasible to find another input string that will produce the same output string.

While this data "fingerprint" may (absent a forced-collision vulnerability) be considered free from unintentional manipulation, it is *not*, without more, free from intentional manipulation by one in control of computer environmental variables.  One of these variables is "time."  If one can change the time "known" by a computer generating relevant data, one can also change both time of data as well as data content at whim.  One could, for example, reset a computer clock to an earlier date and create a document that appeared to third parties to be authentic.

This becomes problematic when forensic imaging vendors claim that the performance of a hashing function on a drive image proves the authenticity of the underlying data.  It does not, and a hash of a drive image that contains a back- or forward-dated document will not prove the *true* date of that data blob's creation, or first instantiation.  A hash function alone, therefore, will not detect intentional manipulation by the data generator.

As for the drive image itself, the hash function will only prove that the drive image could not have been changed by anyone, again with the exception of the person who created the drive image and ran the hash function on it.  The hashing function alone may help to narrow challenges as to who might have altered data, but it does not prove that the data (whether in the form of a drive image, file, or other bit stream) *has not been altered*.

The addition of a digital signature to the hash of a drive image adds additional protection to digital evidence, but again, it does not prove that the data comprising the drive image was not altered or otherwise manipulated prior to the conduct of the hash-and-sign function.  What it will tend to prove is that the drive image itself could not have been changed by anyone except the person or persons who hold either the private key (in a PKI system) or the encryption key used to sign (or encrypt) that hash.  

Again, however, the same argument and challenge may be made.  Those in control of environmental variables (time, encryption key, etc.) could backdate data, re-run a hash-and-sign process on a data blob (including a drive image) and then offer it up as authentic, with little or no way to prevent authentication by traditional FRE 901 methods.  Two articles co-authored with Jeff Stapleton (the chair of the ANSI X9.95 standard described below), entitled "Digital Signatures are Not Enough" and "The Digital Signature Paradox" and published in 2005 and 2006 by the IETF Workshop and the ISSA Journal, discuss this in more detail.  

At best, what can be argued is that the binary data representing the drive image or files contained therein could not have been changed by anyone except those in control of the environmental variables.

Again, even if one hashes and digitally signs a drive image, it narrows the potential actors but does not eliminate the possibility of intentional manipulation.   It also does not prove any digital data file authenticity, and may result in the presumption of authenticity for manipulated data.

Time adds another layer of protection for what I call "provably persistent data integrity," that is, proving that digital data (evidence) is what it purports to be, "at the time that relevance attaches to it, and that it can be demonstrated that such data could not, and therefore was not, altered by anyone since that time." (Quotes are mine.)  Not at the time a drive was imaged, but at the time the relevance of the data attached to it, which means when it was created, transmitted, received, accessed, modified, etc.  For paper, this is generally either presumed, or adequate forensic tests exist to ascertain this.  For digital data, which consists of ordered sets of zeroes and ones, control over time by a data generator robs digital data of much of its authentication capability.  I am currently litigating a spoliation motion in Federal Court where three versions of digital information, varying in time and content, have been offered as "identical" and "original."  The entity here has control over the environmental control variable of time, and so this comes as no surprise.

One way (and there are others) of generating digital evidence with provably persistent data integrity is to bind a trusted time value (typically from NIST) to digital data (preferably hashed and signed) in a cryptographically robust fashion.  That is what trusted time stamping does, and details of the various methodologies (which are protocol-compliant) are set out in more detail in the ANSI X9.95 Trusted Time Stamping Standard published in 2005.  Generally, trusted time stamping creates a token (fingerprint) for digital data at the time of first instantiation, such that if the data blob or the token is thereafter altered, it is immediately detectable.

Of course, even a trusted time stamping schema properly deployed will be subject to the typical 901 authentication requirements, but once met (and they can be met either through 901(b)(4) or (b)(9)) the authenticity of the data content qua content will be extremely difficult to challenge.

So, this is my long way of saying that hashing is not enough for provably persistent data integrity. Digital signatures and hashing together are still insufficient.  Adding trusted time stamping can be enough, if deployed properly. 

As for your contention that not everything needs to be time-stamped, I agree, with a qualification.  That qualification is that one only need time stamp data which will or may be used as evidence in litigation or some other adjudicative proceeding.

Best,
Steven W. Teppler</description>
		<content:encoded><![CDATA[<p>Hello Ralph:   I am the founder of TimeCertain, one of the companies cited in your article.  I am also a practicing attorney who litigates digital evidence matters, co-Vice Chair of the ABA Information Security Committee, and co-author of an upcoming American Bar Association book entitled &#8220;Foundations of Digital Evidence&#8221; (exp. pub. Spring 2008).  One of the chapters for which I have chief responsibility deals with time and digital evidence.  Together with Hoyt Kesterson, II (one of the creators of the original X.500/509 standard), I also presented at the RSA 2006 Security Conference on issues relating to the SHA-1 forced-collision research on hashing functions that was published by a Chinese researcher in 2006.</p>
<p>Hash functions are useful, but they are not the panacea for all authentication ills. The utility of hash functions (and output) should not be overstated to claim what hashes do *not* provide.  Further, within the context of authenticating digital evidence, it is important to keep in mind *what* it is that one is authenticating.</p>
<p>The technical definition of a hash function is the manipulation of a variable-length input data string to produce a fixed-length output data string (typically shorter); that resulting output data string having two characteristics – (1) it is computationally infeasible to find two input strings that will produce the same output string, and (2) given an input string and the resultant output string, it is computationally infeasible to find another input string that will produce the same output string.</p>
<p>While this data &#8220;fingerprint&#8221; may (absent a forced-collision vulnerability) be considered free from unintentional manipulation, it is *not*, without more, free from intentional manipulation by one in control of computer environmental variables.  One of these variables is &#8220;time.&#8221;  If one can change the time &#8220;known&#8221; by a computer generating relevant data, one can also change both time of data as well as data content at whim.  One could, for example, reset a computer clock to an earlier date and create a document that appeared to third parties to be authentic.</p>
<p>This becomes problematic when forensic imaging vendors claim that the performance of a hashing function on a drive image proves the authenticity of the underlying data.  It does not, and a hash of a drive image that contains a back- or forward-dated document will not prove the *true* date of that data blob&#8217;s creation, or first instantiation.  A hash function alone, therefore, will not detect intentional manipulation by the data generator.</p>
<p>As for the drive image itself, the hash function will only prove that the drive image could not have been changed by anyone, again with the exception of the person who created the drive image and ran the hash function on it.  The hashing function alone may help to narrow challenges as to who might have altered data, but it does not prove that the data (whether in the form of a drive image, file, or other bit stream) *has not been altered*.</p>
<p>The addition of a digital signature to the hash of a drive image adds additional protection to digital evidence, but again, it does not prove that the data comprising the drive image was not altered or otherwise manipulated prior to the conduct of the hash-and-sign function.  What it will tend to prove is that the drive image itself could not have been changed by anyone except the person or persons who hold either the private key (in a PKI system) or the encryption key used to sign (or encrypt) that hash.  </p>
<p>Again, however, the same argument and challenge may be made.  Those in control of environmental variables (time, encryption key, etc.) could backdate data, re-run a hash-and-sign process on a data blob (including a drive image) and then offer it up as authentic, with little or no way to prevent authentication by traditional FRE 901 methods.  Two articles co-authored with Jeff Stapleton (the chair of the ANSI X9.95 standard described below), entitled &#8220;Digital Signatures are Not Enough&#8221; and &#8220;The Digital Signature Paradox&#8221; and published in 2005 and 2006 by the IETF Workshop and the ISSA Journal, discuss this in more detail.  </p>
<p>At best, what can be argued is that the binary data representing the drive image or files contained therein could not have been changed by anyone except those in control of the environmental variables.</p>
<p>Again, even if one hashes and digitally signs a drive image, it narrows the potential actors but does not eliminate the possibility of intentional manipulation.   It also does not prove any digital data file authenticity, and may result in the presumption of authenticity for manipulated data.</p>
<p>Time adds another layer of protection for what I call &#8220;provably persistent data integrity,&#8221; that is, proving that digital data (evidence) is what it purports to be, &#8220;at the time that relevance attaches to it, and that it can be demonstrated that such data could not, and therefore was not, altered by anyone since that time.&#8221; (Quotes are mine.)  Not at the time a drive was imaged, but at the time the relevance of the data attached to it, which means when it was created, transmitted, received, accessed, modified, etc.  For paper, this is generally either presumed, or adequate forensic tests exist to ascertain this.  For digital data, which consists of ordered sets of zeroes and ones, control over time by a data generator robs digital data of much of its authentication capability.  I am currently litigating a spoliation motion in Federal Court where three versions of digital information, varying in time and content, have been offered as &#8220;identical&#8221; and &#8220;original.&#8221;  The entity here has control over the environmental control variable of time, and so this comes as no surprise.</p>
<p>One way (and there are others) of generating digital evidence with provably persistent data integrity is to bind a trusted time value (typically from NIST) to digital data (preferably hashed and signed) in a cryptographically robust fashion.  That is what trusted time stamping does, and details of the various methodologies (which are protocol-compliant) are set out in more detail in the ANSI X9.95 Trusted Time Stamping Standard published in 2005.  Generally, trusted time stamping creates a token (fingerprint) for digital data at the time of first instantiation, such that if the data blob or the token is thereafter altered, it is immediately detectable.</p>
<p>Of course, even a properly deployed trusted time stamping schema will be subject to the typical Rule 901 authentication requirements, but once those are met (and they can be met through either 901(b)(4) or 901(b)(9)), the authenticity of the data content qua content will be extremely difficult to challenge.</p>
<p>So, this is my long way of saying that hashing is not enough for provably persistent data integrity. Digital signatures and hashing together are still insufficient. Adding trusted time stamping can be enough, if deployed properly.</p>
<p>As for your contention that not everything needs to be time-stamped, I agree, with a qualification. That qualification is that one need only time-stamp data that will or may be used as evidence in litigation or some other adjudicative proceeding.</p>
<p>Best,<br />
Steven W. Teppler</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ralph Losey</title>
		<link>http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-4260</link>
		<dc:creator>Ralph Losey</dc:creator>
		<pubDate>Tue, 04 Dec 2007 20:55:03 +0000</pubDate>
		<guid isPermaLink="false">http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-4260</guid>
		<description>Thank you for your comment, which, as you know, follows a lengthy discussion with your colleague, Paul Doyle. Paul is the owner of your company, Proof Space, and we had a lively phone discussion on this already. 

We agree on much, including the importance of hash, and the need to verify the authenticity of ESI, especially in an e-discovery context. I also understand some of the important benefits and additional security offered by your company's time-stamp digital signature based hash software products. You may recall I mentioned this type of application of hash in my law review article. Adding an encryption-based signature that proves the time of the creation of a file certainly has important benefits in some circumstances, especially where proof of time might be important, as with patent documents, or where there is a viable risk of attack and alteration of documents after creation. It looks like a very good system to ensure that a document has not been modified since it was created. 

But, as you know, we also disagree on the need for the routine employment of your software, or others like it, as part of the standard e-discovery process, or as part of a standard enterprise document management system. To me it seems like overkill to use it all of the time. 

If ESI is hashed at the time of original e-discovery collection, and proper forensic chain of custody procedures are then followed, then that proves the identity of the ESI at a later time. It proves it is a bona fide copy of the original file or document that was in the producing party's computer system at the time of collection.  That is sufficient to prove authenticity almost all of the time, and sufficient for the admission of ESI as evidence, barring other circumstances, including the circumstances you mention of fraud where the genuineness of the file is challenged. 

Granted, there are circumstances where proof of identity of ESI at the time of collection may not be sufficient. There may be evidence that a file was forged before collection, in which case the Hash at collection would only prove identity, that this is the same file that existed in the producing party's computers when the collection was made. It would not prove that it has not been altered between the time of original creation and time of collection. But still, the genuineness of a document, including an electronic document, is presumed by its location in business records and other evidence of routine business practices, unless there is some evidence of fabrication or alteration. Evidence of fraud could include testimony of the original creator of the file, who might say this was changed after I made it, or testimony from a third party who might say this is not the same file sent to me at that time. 

In other words, the hash value alone would not prove that the file had not been altered BEFORE collection. If challenged, you would have to provide testimony to support its presumed genuineness. Only your specialized hash/time software could prove it was the same file that was originally created, and even then, only if your software was used at the time of creation.  Put another way, Hash without a time stamp applied at document origination can only prove that the ESI was a genuine copy of the original ESI in the company's computers at the time of collection.  I agree with that point; my disagreement is only with the evaluation of the significance of this limitation, both practical and legal.

The same issue you mention also arises in the collection of paper documents. It is always possible that a business could have a forged document in its records. Still, the law presumes that a record found in a business is bona fide and authentic, so long as you can prove it is an exact copy of the original. It is not, however, a conclusive presumption, but the burden lies on the objecting party to provide some evidence that the document is not genuine, that it is a fraud. This in turn relies upon comparison with other documents, and witness testimony, especially the testimony of the person who created the file to begin with, and in some cases, forensic examination of the computer systems involved. 

True, if a company had used your software to time stamp all of its ESI, then falsification would be much more difficult, perhaps even impossible (but I never underestimate the creativity of criminals, and falsification of evidence is a crime). Still, even if your software was involved, a court would probably also want to hear the testimony of the person or persons who created the ESI, and see comparisons, where available, with any other documents that claim to be the original. I understand from Judge Grimm that it is theoretically possible for two copies of the same document to be considered authentic, and then offered to the jury to decide which one was genuine. Again, interesting theory, but even Judge Grimm could not think of any case where this had ever happened. I know it has never happened to me in my 28 years of litigation.

In sum, I remain unconvinced that, to prevent even the possibility of fraud, a company should time stamp and hash each document that it creates. This is not practical. Large companies create millions, if not billions, of ESI files every day. The routine employment of software such as that offered by your company is, in my view, unnecessary overkill as a general practice for all ESI. It adds a layer of time, expense and burden not required for 99% of most companies' ESI. 

Still, sometimes a business might want to so secure a file by use of your software, especially, as mentioned, for patent, or other time-dependent or sensitive records. So, don't get me wrong, I think you offer a valuable piece of software, one that ingeniously incorporates and builds upon the power of Hash. I just think you go too far in suggesting that it should always be employed, or else you risk making your electronic evidence inadmissible.

We also seem to agree that in e-discovery ESI should always be hashed at the time of collection. This establishes the key time for the bona fides of a copy. But you also seem to suggest that an electronic file, and its accompanying hash, could be changed after collection, and because of this possibility, your software should be used then too. Again, I disagree, but only because I do not consider this to be a realistic possibility if proper chain of custody is maintained. For the fraud you hypothesize to occur, there would have to be a break in chain of custody, and a criminal event during that break.  Again, this is possible, but it is, in my opinion, a very far-fetched situation. 

Once a collection has been made, the first copy is secured, and multiple copies are then typically distributed to various interested parties. It is far-fetched to think that someone could then surreptitiously gain access to all copies of the hashed collection set, and modify certain files and their hash. This would require knowledge of the location of all such data vaults, and then inside help to break into them. A criminal who attempts to fabricate evidence in this manner would have to bribe or defraud, at the very least, the e-discovery vendors, and the parties' attorneys. If attempted, it would almost certainly be detected. 

Again, for me the slight risk of such criminal activity does not justify the use of encrypted time stamping at the time of collection as part of a new industry standard. Hashing at the time of collection, plus secure chain of custody, is sufficient, and has been recognized as legally sufficient in many cases by courts all around the country, as my article shows. 

Does anyone else have a view on this issue?</description>
		<content:encoded><![CDATA[<p>Thank you for your comment, which, as you know, follows a lengthy discussion with your colleague, Paul Doyle. Paul is the owner of your company, Proof Space, and we had a lively phone discussion on this already. </p>
<p>We agree on much, including the importance of hash, and the need to verify the authenticity of ESI, especially in an e-discovery context. I also understand some of the important benefits and additional security offered by your company&#8217;s time-stamp digital signature based hash software products. You may recall I mentioned this type of application of hash in my law review article. Adding an encryption-based signature that proves the time of the creation of a file certainly has important benefits in some circumstances, especially where proof of time might be important, as with patent documents, or where there is a viable risk of attack and alteration of documents after creation. It looks like a very good system to ensure that a document has not been modified since it was created. </p>
<p>But, as you know, we also disagree on the need for the routine employment of your software, or others like it, as part of the standard e-discovery process, or as part of a standard enterprise document management system. To me it seems like overkill to use it all of the time. </p>
<p>If ESI is hashed at the time of original e-discovery collection, and proper forensic chain of custody procedures are then followed, then that proves the identity of the ESI at a later time. It proves it is a bona fide copy of the original file or document that was in the producing party&#8217;s computer system at the time of collection.  That is sufficient to prove authenticity almost all of the time, and sufficient for the admission of ESI as evidence, barring other circumstances, including the circumstances you mention of fraud where the genuineness of the file is challenged. </p>
<p>Granted, there are circumstances where proof of identity of ESI at the time of collection may not be sufficient. There may be evidence that a file was forged before collection, in which case the Hash at collection would only prove identity, that this is the same file that existed in the producing party&#8217;s computers when the collection was made. It would not prove that it has not been altered between the time of original creation and time of collection. But still, the genuineness of a document, including an electronic document, is presumed by its location in business records and other evidence of routine business practices, unless there is some evidence of fabrication or alteration. Evidence of fraud could include testimony of the original creator of the file, who might say this was changed after I made it, or testimony from a third party who might say this is not the same file sent to me at that time. </p>
<p>In other words, the hash value alone would not prove that the file had not been altered BEFORE collection. If challenged, you would have to provide testimony to support its presumed genuineness. Only your specialized hash/time software could prove it was the same file that was originally created, and even then, only if your software was used at the time of creation.  Put another way, Hash without a time stamp applied at document origination can only prove that the ESI was a genuine copy of the original ESI in the company&#8217;s computers at the time of collection.  I agree with that point; my disagreement is only with the evaluation of the significance of this limitation, both practical and legal.</p>
<p>The same issue you mention also arises in the collection of paper documents. It is always possible that a business could have a forged document in its records. Still, the law presumes that a record found in a business is bona fide and authentic, so long as you can prove it is an exact copy of the original. It is not, however, a conclusive presumption, but the burden lies on the objecting party to provide some evidence that the document is not genuine, that it is a fraud. This in turn relies upon comparison with other documents, and witness testimony, especially the testimony of the person who created the file to begin with, and in some cases, forensic examination of the computer systems involved. </p>
<p>True, if a company had used your software to time stamp all of its ESI, then falsification would be much more difficult, perhaps even impossible (but I never underestimate the creativity of criminals, and falsification of evidence is a crime). Still, even if your software was involved, a court would probably also want to hear the testimony of the person or persons who created the ESI, and see comparisons, where available, with any other documents that claim to be the original. I understand from Judge Grimm that it is theoretically possible for two copies of the same document to be considered authentic, and then offered to the jury to decide which one was genuine. Again, interesting theory, but even Judge Grimm could not think of any case where this had ever happened. I know it has never happened to me in my 28 years of litigation.</p>
<p>In sum, I remain unconvinced that, to prevent even the possibility of fraud, a company should time stamp and hash each document that it creates. This is not practical. Large companies create millions, if not billions, of ESI files every day. The routine employment of software such as that offered by your company is, in my view, unnecessary overkill as a general practice for all ESI. It adds a layer of time, expense and burden not required for 99% of most companies&#8217; ESI. </p>
<p>Still, sometimes a business might want to so secure a file by use of your software, especially, as mentioned, for patent, or other time-dependent or sensitive records. So, don&#8217;t get me wrong, I think you offer a valuable piece of software, one that ingeniously incorporates and builds upon the power of Hash. I just think you go too far in suggesting that it should always be employed, or else you risk making your electronic evidence inadmissible.</p>
<p>We also seem to agree that in e-discovery ESI should always be hashed at the time of collection. This establishes the key time for the bona fides of a copy. But you also seem to suggest that an electronic file, and its accompanying hash, could be changed after collection, and because of this possibility, your software should be used then too. Again, I disagree, but only because I do not consider this to be a realistic possibility if proper chain of custody is maintained. For the fraud you hypothesize to occur, there would have to be a break in chain of custody, and a criminal event during that break.  Again, this is possible, but it is, in my opinion, a very far-fetched situation. </p>
<p>Once a collection has been made, the first copy is secured, and multiple copies are then typically distributed to various interested parties. It is far-fetched to think that someone could then surreptitiously gain access to all copies of the hashed collection set, and modify certain files and their hash. This would require knowledge of the location of all such data vaults, and then inside help to break into them. A criminal who attempts to fabricate evidence in this manner would have to bribe or defraud, at the very least, the e-discovery vendors, and the parties&#8217; attorneys. If attempted, it would almost certainly be detected. </p>
<p>Again, for me the slight risk of such criminal activity does not justify the use of encrypted time stamping at the time of collection as part of a new industry standard. Hashing at the time of collection, plus secure chain of custody, is sufficient, and has been recognized as legally sufficient in many cases by courts all around the country, as my article shows. </p>
<p>Does anyone else have a view on this issue?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jremifrancoeur</title>
		<link>http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-4248</link>
		<dc:creator>jremifrancoeur</dc:creator>
		<pubDate>Tue, 04 Dec 2007 04:21:19 +0000</pubDate>
		<guid isPermaLink="false">http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-4248</guid>
		<description>Hi Ralph, first thank you for your time and effort to author content in the field of hashing as applied to the legal field and for providing a forum for discussion.
 
Hashing is indeed fundamental to the ability to determine and demonstrate what I like to call “intrinsic authenticity.” This is distinct from “inferred” approaches which presume authenticity based on the presence of effective external controls and trusted insiders. However, it is clear from recent events that insiders are not always trustworthy nor are controls always effective in keeping malicious individuals out. In fact, over 70% of security breaches occur as a result of insiders. 

My comment relates to the limitations of hashing alone. A hash value can certainly be generated and placed into the metadata (or associated with the original data), but with the same ease with which a hash can be generated and inserted in the metadata, it can be regenerated at any time and reinserted into the metadata. If this cannot be prevented or detected, then the incredible power of hashing can actually create a situation where falsified information that contains a hash is perceived to be irrefutably authentic.  

A hash is very useful in determining whether two records are identical or whether a record is unchanged. In fact, de-duplication is premised on the first use case, where the only question is – does a record produce the same hash as another record? If yes, they are identical. However, the second use case (is the record unchanged) requires more than just a test of integrity (i.e., hash comparison). Asking whether the record is unchanged raises the question – unchanged from when? 

Although a hash is a unique digital fingerprint of a set of data, alone it is “floating” or unanchored to a reference which is beyond the control of all, insiders and outsiders – time. That is, if one can circumvent the controls around the record, such as would be the case in criminal or organized crime hacking, or if one is in control of the controls around the data, such as would be the case for a System Administrator, or if an executive can enter into collusion with or coerce such an individual, one can easily go into the content or records management system, falsify the file, create a new fraudulent hash of the falsified file and then insert the fraudulent hash in the metadata of the falsified file.  When tested, a fresh hash of the falsified file will match (compare positively) the fraudulent hash in the metadata of the falsified file. This creates a false sense of integrity that will be very difficult to refute given the current understanding that if you have a hash it is presumed “… to ensure integrity.” 

Do not get me wrong, hashing is fundamental and a key step in the right direction, but it is “necessary but insufficient.” Once the hash is generated, the issue shifts to the “chain of custody” around the hash and when it was generated. Until the hash and file content are cryptographically bound to time, there is a realistic possibility that the above can and will occur, given that the necessary skill level is not very high.  

“Outside in” or perimeter-based approaches to preserving the chain-of-custody that depend on the system or trustworthy individuals are more complex, more costly and lower assurance than persistent data-level intrinsic approaches derived from cryptographic hash binding.  The technology to cryptographically bind a hash to time is well established and embodied in the American National Standards Institute (ANSI) X9.95-2005. 

Thoughts?</description>
		<content:encoded><![CDATA[<p>Hi Ralph, first thank you for your time and effort to author content in the field of hashing as applied to the legal field and for providing a forum for discussion.</p>
<p>Hashing is indeed fundamental to the ability to determine and demonstrate what I like to call “intrinsic authenticity.” This is distinct from “inferred” approaches which presume authenticity based on the presence of effective external controls and trusted insiders. However, it is clear from recent events that insiders are not always trustworthy nor are controls always effective in keeping malicious individuals out. In fact, over 70% of security breaches occur as a result of insiders. </p>
<p>My comment relates to the limitations of hashing alone. A hash value can certainly be generated and placed into the metadata (or associated with the original data), but with the same ease with which a hash can be generated and inserted in the metadata, it can be regenerated at any time and reinserted into the metadata. If this cannot be prevented or detected, then the incredible power of hashing can actually create a situation where falsified information that contains a hash is perceived to be irrefutably authentic.  </p>
<p>A hash is very useful in determining whether two records are identical or whether a record is unchanged. In fact, de-duplication is premised on the first use case, where the only question is – does a record produce the same hash as another record? If yes, they are identical. However, the second use case (is the record unchanged) requires more than just a test of integrity (i.e., hash comparison). Asking whether the record is unchanged raises the question – unchanged from when? </p>
<p>Although a hash is a unique digital fingerprint of a set of data, alone it is “floating” or unanchored to a reference which is beyond the control of all, insiders and outsiders – time. That is, if one can circumvent the controls around the record, such as would be the case in criminal or organized crime hacking, or if one is in control of the controls around the data, such as would be the case for a System Administrator, or if an executive can enter into collusion with or coerce such an individual, one can easily go into the content or records management system, falsify the file, create a new fraudulent hash of the falsified file and then insert the fraudulent hash in the metadata of the falsified file.  When tested, a fresh hash of the falsified file will match (compare positively) the fraudulent hash in the metadata of the falsified file. This creates a false sense of integrity that will be very difficult to refute given the current understanding that if you have a hash it is presumed “… to ensure integrity.” </p>
<p>Do not get me wrong, hashing is fundamental and a key step in the right direction, but it is “necessary but insufficient.” Once the hash is generated, the issue shifts to the “chain of custody” around the hash and when it was generated. Until the hash and file content are cryptographically bound to time, there is a realistic possibility that the above can and will occur, given that the necessary skill level is not very high.  </p>
<p>“Outside in” or perimeter-based approaches to preserving the chain-of-custody that depend on the system or trustworthy individuals are more complex, more costly and lower assurance than persistent data-level intrinsic approaches derived from cryptographic hash binding.  The technology to cryptographically bind a hash to time is well established and embodied in the American National Standards Institute (ANSI) X9.95-2005. </p>
<p>Thoughts?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Federal Judge Tries Experimental Method to Resolve a Major e-Discovery Dispute in a Non-Adversial Manner &#171; e-Discovery Team</title>
		<link>http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-3860</link>
		<dc:creator>Federal Judge Tries Experimental Method to Resolve a Major e-Discovery Dispute in a Non-Adversial Manner &#171; e-Discovery Team</dc:creator>
		<pubDate>Mon, 19 Nov 2007 04:13:42 +0000</pubDate>
		<guid isPermaLink="false">http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-3860</guid>
		<description>[...] reference to hash, is puzzling. Hash coding is a standard procedure for all competent e-discovery vendors and this [...]</description>
		<content:encoded><![CDATA[<p>[...] reference to hash, is puzzling. Hash coding is a standard procedure for all competent e-discovery vendors and this [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: steve devlin</title>
		<link>http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-2466</link>
		<dc:creator>steve devlin</dc:creator>
		<pubDate>Mon, 24 Sep 2007 19:42:26 +0000</pubDate>
		<guid isPermaLink="false">http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-2466</guid>
		<description>Great article! I run the lit support doc management for a government law office. We often get responsive productions of 50,000+ pages of docs on a DVD (tif &#38; txt &#38; Summation load files).  I'd like to save the hash values for the disks, folders, and their contents (disk/folder level tells me I have copied it completely; file level gives me individual doc integrity).  Implementing this is another story.  I cannot find hashing programs that do folders or entire disks.  Karen's Directory Printer does a file listing with MD5 and/or SHA.  Its format is the best I've seen for bulk, but is too slow for "big" projects.  I'm looking for a "commercial grade" tool capable of handling DVDs and HDDs.  Any suggestions?</description>
		<content:encoded><![CDATA[<p>Great article! I run the lit support doc management for a government law office. We often get responsive productions of 50,000+ pages of docs on a DVD (tif &amp; txt &amp; Summation load files).  I&#8217;d like to save the hash values for the disks, folders, and their contents (disk/folder level tells me I have copied it completely; file level gives me individual doc integrity).  Implementing this is another story.  I cannot find hashing programs that do folders or entire disks.  Karen&#8217;s Directory Printer does a file listing with MD5 and/or SHA.  Its format is the best I&#8217;ve seen for bulk, but is too slow for &#8220;big&#8221; projects.  I&#8217;m looking for a &#8220;commercial grade&#8221; tool capable of handling DVDs and HDDs.  Any suggestions?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ralph Losey</title>
		<link>http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-2280</link>
		<dc:creator>Ralph Losey</dc:creator>
		<pubDate>Tue, 18 Sep 2007 03:29:57 +0000</pubDate>
		<guid isPermaLink="false">http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-2280</guid>
		<description>Greg, that is an excellent issue you raise re the individual emails in a PST file, part of the larger problem of unpacking and identifying files in an archive.  I suppose a set of standards will need to be developed and followed by all vendors for uniformity of hash values; i.e. - how to go about saving an email from the pst to msg so that there is as little alteration as possible, and the same hash is reproduced. 

I do not have a particular suggestion at this time on a standard for this.  Does anyone else have any thoughts on this? Suggestions? Over time I am confident that people with greater technical expertise and experience than I have will be able to figure out good solutions to these and other problems.</description>
		<content:encoded><![CDATA[<p>Greg, that is an excellent issue you raise re the individual emails in a PST file, part of the larger problem of unpacking and identifying files in an archive.  I suppose a set of standards will need to be developed and followed by all vendors for uniformity of hash values; i.e. - how to go about saving an email from the pst to msg so that there is as little alteration as possible, and the same hash is reproduced. </p>
<p>I do not have a particular suggestion at this time on a standard for this.  Does anyone else have any thoughts on this? Suggestions? Over time I am confident that people with greater technical expertise and experience than I have will be able to figure out good solutions to these and other problems.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg</title>
		<link>http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-2257</link>
		<dc:creator>Greg</dc:creator>
		<pubDate>Mon, 17 Sep 2007 22:55:05 +0000</pubDate>
		<guid isPermaLink="false">http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-2257</guid>
		<description>First off, great job on the article, Ralph.  I do have a couple of questions / points in response to your paper.  You talk about hashing files to prove that they haven't changed. For certain types of electronic archives (take Microsoft Outlook PSTs as an example), a hash of the PST file (really a container) is useful in authenticating the PST itself, but not later on when you choose to produce a single email with attachments.  

Now, you could say that you should then hash the individual email and attachments...but the challenge here is what should you hash?  You could save the email as an MSG file and hash it...but unfortunately when you save an MSG file the bits and bytes are different each time - so the same document will have a different hash.  You could save a representation of the file such as RTF or HTML, but different vendors or different software versions could result in different outputs.

This can also be challenging if you must produce reviewed documents back in a PST format.  Creating a PST that is a subset will obviously alter its hash and then you also won't have a great way of referring to individual messages.  Any thoughts on how your ideas can be applied to archives, specifically email archive formats?

Lastly, for files that contain references to other files (say, HTML and images), changing any file names will break referential integrity.  This could also be the case for email archives where changing the file name of an attachment would require changing the content of the parent to keep the relationship intact.  Thoughts?</description>
		<content:encoded><![CDATA[<p>First off, great job on the article, Ralph.  I do have a couple of questions / points in response to your paper.  You talk about hashing files to prove that they haven&#8217;t changed. For certain types of electronic archives (take Microsoft Outlook PSTs as an example), a hash of the PST file (really a container) is useful in authenticating the PST itself, but not later on when you choose to produce a single email with attachments.  </p>
<p>Now, you could say that you should then hash the individual email and attachments, but the challenge here is: what should you hash? You could save the email as an MSG file and hash it, but unfortunately when you save an MSG file the bits and bytes are different each time, so the same document will have a different hash. You could save a representation of the file such as RTF or HTML, but different vendors or different software versions could produce different outputs, and therefore different hashes.</p>
<p>This can also be challenging if you must produce reviewed documents back in a PST format. Creating a PST that is a subset of the original will obviously yield a different hash, and then you also won&#8217;t have a good way of referring to individual messages. Any thoughts on how your ideas can be applied to archives, specifically email archive formats?</p>
<p>Lastly, for files that contain references to other files (say, HTML and images), changing any file names will break referential integrity. This could also be the case for email archives, where changing the file name of an attachment would require changing the content of the parent to keep the relationship intact. Thoughts?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Rhoden dreams while Ralph Losey cooks up some hash &#171; Post Process</title>
		<link>http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-2185</link>
		<dc:creator>Michael Rhoden dreams while Ralph Losey cooks up some hash &#171; Post Process</dc:creator>
		<pubDate>Wed, 12 Sep 2007 23:21:13 +0000</pubDate>
		<guid isPermaLink="false">http://ralphlosey.wordpress.com/2007/09/07/law-review-article-published-on-the-mathematics-underlying-e-discovery-hash-the-new-bates-stamp/#comment-2185</guid>
		<description>[...] post by Ralph Losey on how to abbreviate the hash code in order to create a relevant bates number fits nicely into this discussion, although implementing [...]</description>
		<content:encoded><![CDATA[<p>[...] post by Ralph Losey on how to abbreviate the hash code in order to create a relevant bates number fits nicely into this discussion, although implementing [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
