Introduction
Some say that P2P is the future of the Internet, others that
it's just a passing fad. In this article I will examine the
relationship between P2P and the Web, and how the two
technologies might benefit each other.
What is P2P?
Before going further, it'll help to define what I'm talking
about when I use the phrase "P2P". Several attempts have been
made at drawing the boundary lines as to what constitutes a
P2P application. One thing that has become immediately clear
is that a strict technical definition is not appropriate --
most P2P applications currently make use of a central server
at some point.
Having earnestly studied the marketing materials for many
applications, I've come to the opinion that the best
definition of P2P is: "P2P is whoever says they're P2P." And
it's quite likely that before that they were B2C, B2B and
A2A!
Seriously though, there's no doubt that there is a phenomenon
known as P2P, driven by the need to communicate and
collaborate, and with a strong element of decentralized
control. It's this that I take as my starting point for P2P,
and I doubt whether at this stage any stronger definition is
viable.
The purpose of this article is to examine this brave new world
of user-level network applications, and compare it against the
most popular user-level network application to date, the Web.
The Early Web
Some of the defining characteristics of the early Web shed
light on both why it was popular, and also on why P2P
applications are achieving popularity. The early Web was easy:
for the initial audience it was easy to read, easy to write
and easy to implement. The ready availability of clients for
multiple platforms, along with the "View Source" ethic,
encouraged the rapid take-up of the new technology.
Furthermore, the Web was a lot of fun! People revelled in the
extra ability to express themselves and share information.
Think here about the ease-of-use of instant messaging, or the
instant satisfaction factor of Napster. There's a short and
direct feedback loop, based on interaction with our fellow
network users.
As the Web Grew
As the Web grew in popularity and adoption, its
characteristics necessarily changed. It hit upon several
problems, connected with the increase in number of users, and
the level of expertise of those new users. Web-based
collaboration became less usable, network congestion became an
issue, and media outlets talked of the "World Wide Wait".
Whereas in the first instance you might well have classified
the Web as peer-to-peer in the sense that nearly everyone who
wielded a browser also published their own web pages, that
ratio changed too. Less expert users, on
sporadically-connected client-only platforms (mostly personal
computers), grew to be the majority. For those users there was
less control, less immediacy and little of the positive
feedback that Web early adopters drew. Sure, the Web was still
great with its large and diverse content, but the excitement
just wasn't the same.
One major source of disappointment for web users is the
ineffectiveness of search. The web's simple-minded and
free-spirited approach to markup was simultaneously the key to
widespread adoption and the death knell for structured
searching. Even the most advanced search engine technology of
today is remarkably basic, and the user still has to do a lot
of the walking. Two factors contribute to this information
noise: poor information quality, due to the lack of any
reputation management, and poor page classification.
So, with all these faults, is the Web in decline? No way. The
Web has established certain baselines that benefit everyone
and that ensure its place as a foundation of Internet
information systems. These can be divided into technical
factors and human factors. Technical factors include:
-
Universality. The ubiquity of HTML and
HTTP in implementation ensures that Web technology is a
basic level of communication that practically all platforms
support -- even some microwave ovens!
-
Internationalization. Although still
immature, the Web infrastructure is designed to support
international content on a platform-independent basis.
Standards like XML and Unicode are encouraging this, and
browser technology has increasingly good
internationalization support.
-
Openness. The Web is based on open
standards, with an increasing commitment to those standards
from vendors. This promotes participation from software
creators as well as the benefits of open scrutiny for the
protocols and technologies.
-
Integration. Largely due to the ubiquity
of Web software components, we're now seeing the Web
integrated into many other applications. A basic example is
the delivery of software updates over the web. Beyond that,
most if not all modern desktop environments feature Web
integration.
Human factors include:
-
Content. The breadth, depth and
extraordinary diversity of content on the Web means it is
an important cornerstone of information systems. Anything
that might replace it must exceed the Web's usefulness and
challenge for the accessibility crown. Notwithstanding my
comments about web authors being proportionally on the
decline, it's still sufficiently easy to publish on the Web
that its primacy for content is not yet challenged.
-
Understanding. It's important that users
of a system understand how it works, and what its
limitations are. Awareness of what the Web is, and what it
is not, is increasing, albeit slowly. URLs are sufficiently
commonplace that most people
have a basic understanding of the technology.
So What is the Web Anyway?
When defining the web, most people would probably still say
"Well, it's HTTP and HTML." That may have been true
originally, but the philosophy of the Web is a lot
broader. As seen by Tim Berners-Lee, the Web is in fact
constituted of anything that can be named with a uniform
resource identifier, URI. URIs include URLs as a subset, but
also include those things which have names but cannot be
retrieved directly via a protocol that forms part of that
name. The other foundation stone of the Web is of course
internationalization -- whose end, at least in the W3C's eyes,
is Unicode everywhere.
What does this expansion of the concept of the Web mean?
Anything with a URI is part of the Web. If we look at things
directly accessible, this might include:
-
USENET (news:)
-
Pages on a web server (http:)
-
Telephones (tel:)
-
Email (mailto:)
-
Freenet (freenet:)
What is immediately noticeable here is that some P2P
applications, given a URI naming scheme, are automatically
part of the Web's structure. It would be quite simple to
create URIs for instant messaging as well, for instance.
Ideally, we want P2P protocols to have some of the Web's other
characteristics mentioned above, and not just a URI
scheme.
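To make the point about URI schemes concrete, here is a minimal Python sketch of how software might route an identifier purely on its scheme. The handler names and example addresses are hypothetical, invented for illustration; only the scheme names come from the list above.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URI scheme to the kind of program that
# would handle it. The handler names are illustrative assumptions.
HANDLERS = {
    "news": "newsreader",
    "http": "web browser",
    "tel": "telephone dialler",
    "mailto": "mail client",
    "freenet": "local Freenet gateway",
}


def scheme_of(uri):
    """Return the lowercase scheme that prefixes a URI ('' if none)."""
    return urlsplit(uri).scheme


def handler_for(uri):
    """Pick a handler by scheme alone, without inspecting the rest."""
    return HANDLERS.get(scheme_of(uri), "unknown scheme")
```

Under this sketch, supporting a new P2P protocol is just a matter of registering one more scheme in the table; everything else that understands URIs can then refer to its resources.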
P2P Applications Scratch Itches
Now we've established where the Web is, let's look again at
some of its deficiencies. It's my opinion that several of the
popular P2P trends directly address gaps in current web
technology. These attributes of P2P apps include:
-
Ease of "publication". With file sharing
clients, publication is as simple as dragging and dropping.
While some web programs offer this, it either costs money or
results in unfortunate lock-in to certain systems. The fact
is that web publication is still harder than it should be,
perhaps primarily due to configuration issues.
-
Everyone's a server. Supported by the
increasing availability of always-on connectivity, P2P's
everyone-a-server mentality gets back to how things were
before the dialup explosion. It means, in effect, that
sharing and publication involve fewer configuration problems
and fewer inconvenient upload/synchronization issues.
-
Better user interfaces. Perhaps this one's
the killer point. Though there is definitely room for
improvement in P2P app UIs, dedicated interfaces that are
suited to the task make life a lot easier than contorted
web-forms interfaces. Many web-based communication and
collaboration solutions lose heavily because of user
interface issues.
-
Controlled communities. Dedicated P2P apps
create communities with a certain amount of control, which
is largely due to the use of a custom UI. This makes it a
lot easier to introduce more structure either
transparently, or with minimum fuss. For example, benefits
can be drawn in the metadata (and thus searching) arena,
and in cryptographic transport (e.g. Groove). These are areas
where on the Web it's difficult to persuade users their
extra effort is worth it.
P2P Needs to be Plugged in to the Web
The itches that P2P scratches are all well and good, but what's
the use of systems that aren't integrated into the ubiquitous
protocols and software systems that we all use today? In
today's environment, no technology can afford to stay an
island. Remember "push" technology? Rather than reinventing
from the ground up, P2P applications need to open up and
embrace the open platform of the Web.
One useful integration technique is proxying. An HTTP
gateway to another service can be run locally on a machine,
providing an HTML/HTTP-based user interface. This technique
has been used for Freenet, and for a telephony interface.
There's even a project for Mozilla, called Protozilla, to enable
Mozilla to natively understand new URI schemes.
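As a rough illustration of the proxying idea, the following Python sketch runs a small HTTP gateway on the local machine and renders items from another service as HTML. The back end here is a hypothetical in-memory stand-in; a real gateway would speak the other service's own protocol at that point, and none of the names below come from Freenet or any other actual project.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote

# Hypothetical stand-in for the non-HTTP service being gatewayed.
FAKE_PEER_STORE = {
    "welcome": "Hello from the peer network",
}


def lookup(key):
    """Fetch a document from the stand-in back end; None if absent."""
    return FAKE_PEER_STORE.get(key)


def render_page(key):
    """Return (status, html_body) for a requested key."""
    content = lookup(key)
    if content is None:
        body = "<html><body><p>No such key: %s</p></body></html>"
        return 404, body % html.escape(key)
    body = "<html><body><h1>%s</h1><p>%s</p></body></html>"
    return 200, body % (html.escape(key), html.escape(content))


class GatewayHandler(BaseHTTPRequestHandler):
    """Translate GET /<key> into a back-end lookup and an HTML page."""

    def do_GET(self):
        key = unquote(self.path.lstrip("/"))
        status, body = render_page(key)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def build_gateway(port=0):
    """Bind to localhost only; port 0 lets the OS choose a free port."""
    return HTTPServer(("127.0.0.1", port), GatewayHandler)
```

Calling `build_gateway(8080).serve_forever()` would, under these assumptions, let any ordinary browser on the same machine view `http://127.0.0.1:8080/welcome` without knowing anything about the service behind it.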
Many P2P apps already use the Web as a bootstrapping point,
underlining that integration is desirable. The alternative --
fragmentation -- is in the long term in neither the vendors'
nor the users' interest.
Scratching the Web's Itches
Proxying is just the first step. Web technologies now in rapid
development will help bring the missing pieces from P2P
to the Web. Let's examine some of these.
-
User Interface. It's long been recognized
that the Web's UI features are woeful. The W3C's XForms activity
is working on creating a new set of widgets and a new
browser/server interaction model. These should go a long
way to making the Web a more comfortable interface for
performing interactive tasks.
-
Ease of publishing. Although it's been
around a while, WebDAV
(HTTP extensions to support content authoring and
management) is still in the startup stage as a technology,
though it's great to see it in the latest versions of
Microsoft operating systems and in GNOME's Nautilus. Where it's
really lacking is on the ISP side, though folk like
FreeDrive, DriveSpace and XDrive are providing WebDAV
access. Such access needs to get nearer to the user. When
users sign up with an ISP they need to get a folder on
their desktop, which is their web site via WebDAV.
-
Shared computation. This is an area in
which web technologies are making leaps and bounds of late.
Both Sun's and Microsoft's future strategies appear to depend
on distributed computing over the Web, using HTTP and XML
as the foundation. RPC/messaging focused technologies such
as SOAP, WSDL, UDDI and friends all provide an underpinning
for web-distributed computation. Other, more niche
techniques like shared data spaces can also
underpin web-distributed computation. Technologies such as
RDF, essentially an interchange format for knowledge, are
important in this area.
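On the ease-of-publishing point in the list above: because WebDAV builds on HTTP, saving a page to a WebDAV-enabled site comes down to an HTTP PUT. The Python sketch below only constructs such a request and does not send it. The site address is a made-up example, and authentication, which a real ISP would certainly require, is deliberately left out.

```python
from urllib.request import Request


def build_publish_request(site_root, path, content):
    """Build (but do not send) an HTTP PUT that uploads one page.

    site_root and path are hypothetical; no authentication is added.
    """
    url = site_root.rstrip("/") + "/" + path.lstrip("/")
    return Request(
        url,
        data=content.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/html; charset=utf-8"},
    )
```

Passing the result to `urllib.request.urlopen` would perform the upload. A desktop folder backed by WebDAV is, in essence, a friendlier front end to requests of this kind.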
To me, one of the most important things about P2P that I want
to see brought back into the Web is giving the user back
decent access to content creation facilities. Browser vendors
have consistently failed to fulfill the original vision of the
writeable web, and the flow of hypertext creation has become
increasingly one-way. We've seen web applications like
Blogger and EditThisPage.com become successful in filling this
need, but there's no obvious business model there, and many of
these companies are struggling. The authoring, publishing and
interaction capability simply has to make its way back into
the user agent. (There's an argument to be made that the
Internet Service Provider should provide these types of
services.) Those of us who determinedly use the Web as a
research notebook, as a means of keeping in touch with
family and friends, and as a business tool know the value
of being able to create hypertext -- but we know how hard it
is too.
While the technologies I've mentioned above are to a certain
extent works-in-progress, there are also aspects of the Web
that are more established, which P2P itself will benefit from
building on.
-
Data interoperability. The rise of
interoperable data formats, due to the influence of XML,
enables applications and developers to "do more" with data.
We've seen some creative uses of other applications' data
due to open file formats: these things often "just happen."
Projects like Jabber
illustrate the potential of opening up applications and
protocols like this.
-
Decentralization. Although it is picked out as one of the
strong features of P2P, the Web too has always been a
decentralized information resource. So there's already a
philosophical compatibility between the Web and P2P,
meaning that such things as the adoption of URI addressing
schemes should be natural. It's hard to think of a good
reason not to try and integrate P2P architectures with the
Web.
Some P2P-ish Ideas for the Web
One advantage P2P applications have had is the opportunity to
make a fresh start with several ideas unencumbered by web
history. While this is both good and bad, there are some ideas
from P2P apps that I think the Web would definitely benefit
from.
The first of these that attracted me was search by example, as
employed by OpenCOLA. The idea here is that a simple way of
searching for quite a complex set of metadata matches is to
offer an example.
This is a lot simpler for the user than defining separately
each metadata facet and the relationships between them. As the
Web heads towards exposing a larger amount of metadata, useful
ways will be required to sift through this extra information.
We already know that users of search engines only want one
text-box and that's it -- how are they going to cope with the
many dimensions of investigation created by decent web
metadata? The search by example technique therefore seems a
promising way forward, although it will require some thought
in order to implement.
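One very simple reading of search by example can be sketched in Python as follows. This is not OpenCOLA's method, which is not described here; it is a naive illustration that scores each document by the fraction of the example's metadata facets it shares. The facet names and the `minimum` threshold are assumptions.

```python
def similarity(example, candidate):
    """Fraction of the example's metadata facets the candidate matches."""
    if not example:
        return 0.0
    shared = sum(
        1 for facet, value in example.items()
        if candidate.get(facet) == value
    )
    return shared / len(example)


def search_by_example(example, catalogue, minimum=0.5):
    """Return document names ranked by similarity to the example.

    catalogue maps a document name to its metadata dictionary.
    Documents scoring below `minimum` are dropped.
    """
    scored = [
        (similarity(example, meta), name)
        for name, meta in catalogue.items()
    ]
    hits = [(score, name) for score, name in scored if score >= minimum]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in hits]
```

The user never names a facet or spells out a relationship between facets: they point at one document they like, and its metadata becomes the query.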
The second aspect of P2P applications that the Web could
benefit from is a notification API. One of the most awkward
things about the web is the need to continually check pages to
see if they've changed. There's no general publish-subscribe
system to notify me of updates, or of the meeting of
particular search criteria. Particular applications do
implement this in a proprietary way -- but there is no
standard piece of web infrastructure that provides it.
Perhaps this is another area in which it's hard to find a
business model. This is a problem that needs to be solved by
both web protocol engineers and user agent creators. WebDAV
may be one way in which this problem can be attacked,
especially if client-side servers were in use. The ICE protocol has
already implemented notifications, but it has a specialized
application area.
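To show what a publish-subscribe notification interface might look like, as opposed to repeated polling, here is a minimal in-process Python sketch. The class and method names are hypothetical and correspond to no existing web standard; a real system would deliver notifications over the network rather than through local callbacks.

```python
from collections import defaultdict


class NotificationHub:
    """Toy publish-subscribe hub keyed by URI (hypothetical API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._versions = {}

    def subscribe(self, uri, callback):
        """Register callback(uri, content) for changes to one URI."""
        self._subscribers[uri].append(callback)

    def publish(self, uri, content):
        """Record new content and notify subscribers if it changed.

        Returns the number of subscribers that were notified.
        """
        if self._versions.get(uri) == content:
            return 0  # nothing changed, so nobody is disturbed
        self._versions[uri] = content
        for callback in self._subscribers[uri]:
            callback(uri, content)
        return len(self._subscribers[uri])
```

The design point is the inversion of responsibility: the publisher announces a change once, rather than every interested reader asking again and again whether anything is new.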
Conclusions
Peer-to-peer applications, free from legacy issues, have
started to fill needs where the web browser and web-based
applications are failing. In particular, communication and
collaboration are needs which the Web isn't meeting properly.
Ease-of-publication, configuration and search are advantages
of P2P applications. However, the Web's ubiquity and openness
contrast with the fragmented, control-seeking nature of many
P2P networks. If P2P-type apps aren't to die from
fragmentation, or to collapse into corporation-controlled
monopoly, then they need to embrace the technologies of the
Web. The Web already has many technologies that are a suitable
substrate for P2P applications, so there's no need to start
from scratch. On the other hand, Web software vendors -- and
Internet service providers -- need to pay more attention to
the desires of the user, and enable content creation,
collaboration and searching to reach new levels of usability.
"Just good enough" won't always be good enough.