Documents are no longer static and unchanging. As the creation and distribution of information become more collaborative, dynamic, and social, and as application software evolves to support “mashups” that combine both content and functionality from various sources, traditional definitions of “documents,” their authorship, and their ownership are becoming obsolete.
Not only is it possible to massively duplicate documents without permission, it is also increasingly possible to modify these documents so that the original intention of the author can become lost. In addition, collaborative document authoring, unless carefully controlled, makes it difficult, if not impossible, to identify and track the authorship contributions of individual authors.
This author is skeptical of the ability of individual registration systems, built around concepts of static documents carried over from pre-Internet days, to solve the problem of ownership and identification when documents are constantly changing or are authored collaboratively by groups of temporarily involved authors.
What may be more appropriate, this author feels, is that for certain classes of digital documents, the documents themselves should incorporate information traditionally associated with registration systems, as well as information that records the changes and modifications made over time by the author (or authors). This meta-information, always physically associated with the source document, should always be available for processing in connection with any business transaction that might require authorship and ownership information.
Web 2.0 Document Authoring
In the world of “Web 2.0” it keeps getting easier to collaboratively collect and distribute information content and associated metadata. Readily available aggregations of XML feeds, along with richly functional, remotely hosted content management applications, are enabling people to combine information sources of all kinds in new, unique, and potentially very powerful ways. “Mashups” of applications and data drawn from multiple sources are beginning to appear online, made possible by the evolution of modern application architectures and data standards that facilitate interchangeability.
This is a far cry from the static publishing models of the past. The old focus was on creation of a fixed published object like a page, a book, or a magazine article with a specified set of one or more authors. With Web 2.0 we are now seeing how information -- and operations on information -- are becoming increasingly fluid, flexible, network oriented, and social.
One example is the constantly evolving online encyclopedia Wikipedia, with its array of online contributors. Similar wiki technology is being applied to more specialized applications such as the Peer to Patent Project, which may also take advantage of social networking techniques to improve the patent review process.
Despite this move towards acceptance of a fluid publishing model -- where it’s not always clear where information is stored and who has a hand in information creation or modification -- I believe that it is still important that we not destroy the integrity of the intellectual property we now find so easy to copy and manipulate.
Orienting Oneself on the Web
Usually I try to understand issues like these by relating them to my own experience. I’ll do that here.
Perhaps the concerns that follow are examples of an "old school" attitude, but I still like to know a few things when I fire up my computer and go online:
First, where am I? Am I working on my own machine, or on a remote machine (and remote content)? Are the tools I'm using located on my machine, or are they located elsewhere? I like to know this for good reasons:
- Transmission delays may impact speed and performance of the network operation I'm performing, especially if both the data and the application I’m working on are located remotely and have to be continually transferred to my machine for processing and display.
- If my connection with a remote server is lost, I want reassurance I can recover and/or continue working. Not knowing where I am with respect to my work complicates my recovery in the face of a network or remote hard drive failure.
Second, whose information am I working with? I like to know the provenance of the information -- its history, ownership, authorship, and credibility. Even simple "facts" don't make much sense without context. Key contextual details I want to know include:
- Who wrote this thing?
- Is this what was written?
- What kinds of changes and modifications has this object gone through?
This last issue takes me to a recent personal experience that drove home what can happen when, on the Web, content and format become separated, perhaps unintentionally. The result can be that the original intended meaning of a “document” is lost; I consider this example to be a metaphor for what is bound to happen as collaborative Web 2.0 technologies become more ingrained in day-to-day communications.
My Own Experience
I was researching the origins of searches to my own web site, All Kind Food. Using my site vendor’s search logs I tracked back one page hit to a referral from an online HTML document that had been generated by a reputable news research company that provides keyword-based tracking services and feeds to its customers. All of my original text from one of my blog articles was displayed by the HTML page that contained the source link, but with a few twists:
- All my navigational links to other web pages in my site had been deleted.
- My email address was gone.
- My copyright notice was gone.
- The formatting was incorrect and displayed a run-on between a sentence I wrote and a quotation (attributed) that I had included from another publication.
The result: a reader of this stripped-down version of my original text could (or would) misinterpret what I had said and what the quotation had said.
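How the republisher actually processed the page is not known. As a purely illustrative sketch, the following Python shows one plausible mechanism: a naive text extractor that discards every tag, so the boundary between an author's sentence and a block quotation disappears. The class name, function name, and sample markup are hypothetical and are not taken from the original article or from any vendor's software.

```python
from html.parser import HTMLParser


class NaiveTextExtractor(HTMLParser):
    """Keeps text nodes and silently drops every tag, as a careless scraper might."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Only the raw text is kept; paragraph and blockquote boundaries are lost.
        self.chunks.append(data)


def strip_markup(html):
    """Return the concatenated text of an HTML fragment with all structure removed."""
    parser = NaiveTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Hypothetical sample markup: an author's sentence followed by an attributed quotation.
SAMPLE_PAGE = (
    "<p>I disagree with this claim.</p>"
    "<blockquote>Organic food is always safer.</blockquote>"
)

stripped = strip_markup(SAMPLE_PAGE)
print(stripped)  # I disagree with this claim.Organic food is always safer.
```

Once the markup is gone, nothing in the output tells the reader which words belong to the author and which to the quoted source.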
My experience is just one example where failure to represent the original structure of a document results in a misrepresentation of what I actually wrote. Now, anyone who publishes on the web knows how easy it is to copy and redistribute digital information. Publishing on the Web is basically an act of faith. While a body of law and related enforcement mechanisms have evolved to “protect” the rights of the intellectual property’s owner, we all know how digital communications have fundamentally altered the economics of intellectual property distribution. And we’re also aware of the conflicts this causes and the strains it puts on legal concepts of copyright, fair use, and intellectual property ownership.
We Need to Develop Better Mechanisms
My concern is that, as “Web 2.0” becomes increasingly interactive, social, and embedded in daily life, we will need to develop better mechanisms for keeping track of what is the individual’s, what is the group’s, and what has changed.
Just as many people are concerned about digital copies being made of their intellectual property without their permission, we also need to respect and maintain the physical and intellectual integrity of individual works, even when those works are intended for social and interactive use and manipulation.
Now, back to my personal experience with “digital republishing.” I still haven’t heard back from this publisher, and its ‘bots and crawlers still visit my web site. But I know that at least one of my publications exists in cyberspace in a corrupted form. People reading that will not receive the intended message. Furthermore, indexing and feed services are probably crawling the erroneous document with the result that my corrupted document is being further indexed, re-distributed, retrieved, and displayed.
This Situation Can Get Serious
I’ve just described a situation where, through the normal operations of how information on the web can be copied, referenced, and retransmitted, both content and physical format can be changed in ways that can alter the author's original intended meaning.
Now, in the overall scheme of things, damage to the original meaning of an article published in an obscure blog like All Kind Food is probably not a big deal. But I can imagine many situations where inadvertent or unintended modifications to source documents could have significant and potentially disastrous impacts on safety and property. Examples are chemical and biological formulas, aircraft maintenance and repair instructions, and arrest records.
Sometimes modifications to the appearance of a source document happen intentionally. For example, we've come to accept the simplified view of a source document presented by some web-based feed readers when displaying an RSS, Atom, or RDF feed, and we link back to the original view if we feel the need. (I’m sure, also, that some readers are replacing actual access to the “full text” of an original with a full document feed.)
Savvy web authors know this and compose documents to take into account the likely alterations that formats will go through as data and metadata are transmitted through the web and displayed via various display systems.
Adjustments don't always work. Because of my eyesight, for example, I regularly increase the font size of text displayed by my Firefox web browser when I'm reading web pages. Not all pages scale correctly. Sometimes graphics overlay text, or vice versa. This is not the author’s fault, of course, but the author who fails to take into account what happens at the other end of the document distribution chain runs the risk of having his or her message muddled and corrupted.
Beyond the unintended alteration of a document’s meaning by the system, there is the question of ownership. If I write something that is likely to be changed by the distribution system or by the actions of collaborating authors, what do I “own”: the whole document, or just the part I wrote?
And what if I write a blog article that displays a copyright notice, and then come back repeatedly to change, update, and correct the original, and to add comments? Even if I can legitimately claim ownership of each different version of my document, do I have a way of knowing if people who read the original will know that I've made corrections or changes? And, if there is a danger that early drafts will be cached and downloaded, do I have any moral or legal responsibility for seeking out and communicating with downloaders to tell them that something has changed? (What if the changes I make have the effect of removing mistakes that might have caused personal injury?)
These are difficult questions that are raised by several factors, including:
- The malleable nature of documents and information when published on the web.
- The difficulty in assigning ownership and responsibility when multiple people are involved in creating and/or updating a document.
One possible approach to help address questions like these is to develop registration type systems that help identify and describe works and their owners (or creators). These registration systems exist outside the created works as bodies of metadata and, in theory, can be kept updated by synchronizing changes in the original documents with changes in the registration system. In theory, registration systems can serve as an authoritative repository of information about the document and its creators and can also be used to store and track various business transactions associated with the document, e.g., transfers of ownership, usage, sales, etc.
I am somewhat skeptical that registration systems can be used for tracking ownership in “Web 2.0” situations where documents are malleable and collaboratively developed. In particular, I am skeptical of systems that require a point-in-time registration and deposit of a document in order to maintain or express certain rights. It is my understanding that, under U.S. copyright law, works are protected the moment they are written. Making people jump through additional hoops requires a strong justification, especially if they are developing works that change frequently or that take collaborative effort to create and maintain.
This is the classic problem of registration systems that exist independent of the original work – keeping the two synchronized. In an online interactive collaborative environment like the web, I would argue that it is impossible to keep the two synchronized.
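The synchronization problem can be made concrete with a minimal sketch. It assumes a hypothetical external registry that stores a SHA-256 fingerprint of the text deposited at registration time; the Registry class, the "doc-001" identifier, and the owner name are all invented for illustration and do not describe any real registration service.

```python
import hashlib


def fingerprint(text):
    """Return a SHA-256 hex digest of the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Registry:
    """Hypothetical external registry mapping a registration ID to a deposited fingerprint."""

    def __init__(self):
        self._records = {}

    def register(self, reg_id, text, owner):
        # A point-in-time deposit: the registry knows only this version.
        self._records[reg_id] = {"owner": owner, "fingerprint": fingerprint(text)}

    def is_current(self, reg_id, text):
        """True only if the live text still matches what was deposited."""
        return self._records[reg_id]["fingerprint"] == fingerprint(text)


registry = Registry()
original = "Registration systems exist outside the created works."
registry.register("doc-001", original, owner="A. Author")

# The author corrects the live document but never re-registers it.
revised = original + " Correction added a day later."

print(registry.is_current("doc-001", original))  # True
print(registry.is_current("doc-001", revised))   # False
```

Every edit to the live document leaves the registry describing a version that no longer exists unless someone re-registers, and that is the burden a frequently changing, collaboratively edited work imposes.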
I'm NOT saying that registration systems don't have a purpose. If you want to get paid royalties or usage fees based on actions performed against a specific object you'll need a tracking mechanism of some sort. Such systems, however, should be designed to support specific transactions associated with works that may not stay fixed for very long (as opposed to static media such as books), and that’s a challenge unless the system is tied into the actual transaction system.
I was thinking about these issues when I ran across a new web site called "esbn.org" that appears to be a spinoff of a technology vendor, BookFob, that has developed an ebook publishing system that combines storage, reader, and delivery mechanisms so that each electronic copy of a document (e.g., a book) can be tied to the unique serial number of the device.
The "electronic standard book number" or "ESBN" (not to be confused with ISBN, the official numbering scheme used by publishers) appears to be a spinoff development that is being marketed as a solution for the identification needs of digital publishers. You register as a publisher or author with the site, create a description of the document you want to "register," and upload the file (sizes are limited during the site's beta stage). In return you get a registration number you can use for a variety of purposes and the promised availability of the system in the future to help you prove ownership.
The technology is slick, a Firefox extension is available, and esbn.org is being blogged about. It's an example of a technology-enabled entrepreneurial approach to solving certain types of licensing and distribution issues. (The lack of information on the ESBN web site [as of February 7, 2006] about the company itself, its funding, its business model, its management, its technology infrastructure, its storage capacity, its backup and security procedures, the numbering scheme itself, its standards committee makeup, and its existing customer base is of concern to me -- so I've registered to check it out.)
Unique Numbering Systems
I'm also undecided about the viability of unique document numbering systems in the context of the web since they can be stripped off if they're not embedded with the source document. If they are embedded using a watermarking or encryption system, two problems arise: (a) the authoring and updating process becomes more complicated (which negates some of the ease of the "wild and woolly" publishing environment the Web has become), and (b) serious pirates will be able to overcome them anyway, just as serious pirates have negated the effectiveness of audio CD DRM schemes. So registration systems by themselves can provide a false sense of security unless (as with the BookFob) they are tied directly to a secure physical storage device and reader.
Granted, I don’t have the same interests as a commercial publisher or professional author. I'm a consultant and project manager who uses the web to communicate with friends, colleagues, potential clients, and others who might share my interests. Models developed to support other types of transactions might not apply in my case, especially if those models have the impact of impeding, rather than promoting, information dissemination. I am skeptical of registration systems for the types of works that are being ushered in by the Web 2.0 evolution.
I do believe that one solution is for works to carry with them basic details describing their change history, authorship, and ownership. Granted, this type of solution increases overhead, both in the size of the document itself and in terms of the complexity of the document authoring system required for authoring and the maintenance of history and ownership information. Plus, this type of solution does not solve the problem of unauthorized duplication or of the possibility of miscommunication due to the document being “damaged” (as was the case with my own blog article). Even systems such as Adobe’s Acrobat only make it difficult, not impossible, to modify a document once it is composed for printing, and Acrobat document file structures are readily understood by a variety of tools, including Google’s reasonably-successful PDF-to-HTML conversion software.
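As a rough sketch of what "carrying" this information might look like, the Python below bundles a document body with its owner and a revision history, and serializes all of it as a single package. The class names, field names, and sample authors are assumptions made for illustration; this is not a proposed standard, and, as noted above, nothing here prevents someone from stripping or altering the metadata.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class Revision:
    author: str
    timestamp: str
    summary: str
    content_sha256: str


@dataclass
class SelfDescribingDocument:
    """Hypothetical document that travels with its own ownership and change history."""
    owner: str
    body: str = ""
    history: list = field(default_factory=list)

    def revise(self, author, new_body, summary):
        # Replace the body and record who changed it, when, and a hash of the result.
        self.body = new_body
        self.history.append(Revision(
            author=author,
            timestamp=datetime.now(timezone.utc).isoformat(),
            summary=summary,
            content_sha256=hashlib.sha256(new_body.encode("utf-8")).hexdigest(),
        ))

    def authors(self):
        """Return contributors in order of first contribution."""
        seen = []
        for rev in self.history:
            if rev.author not in seen:
                seen.append(rev.author)
        return seen

    def to_json(self):
        # Body, owner, and history are serialized together as one package.
        return json.dumps(asdict(self), indent=2)


doc = SelfDescribingDocument(owner="A. Author")
doc.revise("A. Author", "First draft.", "Initial version")
doc.revise("B. Collaborator", "First draft, with a correction.", "Fixed an error")

print(doc.authors())
package = json.loads(doc.to_json())
print(len(package["history"]))
```

The overhead is visible even in this toy: each revision adds a record to the package, and the authoring tool has to do the bookkeeping on every change.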
No, there’s just no way around the fact that any digital document format is vulnerable and that spending time and money to “lock up” documents against unauthorized change may be a waste of both.
One thing I do find encouraging is the increased availability of tools such as wikis and other collaborative authoring systems. These systems cannot operate without sophisticated internal change tracking systems, and they are increasingly being made available as remotely hosted services. The significance of this latter fact is that, freed from the confines of laptop and PC memory and storage requirements, collaborative authoring systems can be developed as “heavy duty” utilities that can offer more and better authoring, tracking, and security features than are possible with applications designed for the desktop.
It’s also important to point out that the perceived need for the types of solutions I’ve outlined above varies by individual and industry. As I’ve already found in my Web 2.0 Management Survey, certain industries may be less likely than others to employ Web 2.0 technologies for communicating with their customers.
Potentially more significant is that some authors in the “blogosphere” are fundamentally opposed to concepts of copyright and ownership. Finally, some types of communication (e.g., blogs that are maintained by corporations in order to engage their customers in meaningful but ephemeral dialog) may not be viewed as candidates for permanent storage and processing.
After all is said and done, though, ease of use and cost will be major adoption factors in any solution. Web users vote with their fingertips and pocketbooks.
Portions of this article are based on documents published earlier in the author’s own weblog, All Kind Food.