As per Relevance of the word replication, we have this rfc below:











Network Working Group                                          I. Cooper
Request for Comments: 3040                                 Equinix, Inc.
Category: Informational                                         I. Melve
                                                                 UNINETT
                                                            G. Tomlinson
                                                          CacheFlow Inc.
                                                            January 2001


Internet Web Replication and Caching

Status of this Memo

This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2001). All Rights Reserved.



Abstract

This memo specifies standard terminology and the taxonomy of web
replication and caching infrastructure as deployed today. It
introduces standard concepts, and protocols used today within this
application domain. Currently deployed solutions employing these
technologies are presented to establish a standard taxonomy. Known
problems with caching proxies are covered in the document
"Known HTTP Proxy/Caching Problems", and are not part of this
document. This document presents open protocols and points to
published material for each protocol.

Table of Contents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 3
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3
2.1 Base Terms . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 First order derivative terms . . . . . . . . . . . . . . . 6
2.3 Second order derivatives . . . . . . . . . . . . . . . . . 7
2.4 Topological terms . . . . . . . . . . . . . . . . . . . . 7
2.5 Automatic use of proxies . . . . . . . . . . . . . . . . . 8
3. Distributed System Relationships . . . . . . . . . . . . . 9
3.1 Replication Relationships . . . . . . . . . . . . . . . . 9
3.1.1 Client to Replica . . . . . . . . . . . . . . . . . . . . 9
3.1.2 Inter-Replica . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Proxy Relationships . . . . . . . . . . . . . . . . . . . 10
3.2.1 Client to Non-Interception Proxy . . . . . . . . . . . . . 10



Cooper, et al. Informational [Page 1]

RFC 3040 Internet Web Replication & Caching Taxonomy January 2001


3.2.2 Client to Surrogate to Origin Server . . . . . . . . . . . 10
3.2.3 Inter-Proxy . . . . . . . . . . . . . . . . . . . . . . . 11
3.2.3.1 (Caching) Proxy Meshes . . . . . . . . . . . . . . . . . . 11
3.2.3.2 (Caching) Proxy Arrays . . . . . . . . . . . . . . . . . . 12
3.2.4 Network Element to Caching Proxy . . . . . . . . . . . . . 12
4. Replica Selection . . . . . . . . . . . . . . . . . . . . 13
4.1 Navigation Hyperlinks . . . . . . . . . . . . . . . . . . 13
4.2 Replica HTTP Redirection . . . . . . . . . . . . . . . . . 14
4.3 DNS Redirection . . . . . . . . . . . . . . . . . . . . . 14
5. Inter-Replica Communication . . . . . . . . . . . . . . . 15
5.1 Batch Driven Replication . . . . . . . . . . . . . . . . . 15
5.2 Demand Driven Replication . . . . . . . . . . . . . . . . 16
5.3 Synchronized Replication . . . . . . . . . . . . . . . . . 16
6. User Agent to Proxy Configuration . . . . . . . . . . . . 17
6.1 Manual Proxy Configuration . . . . . . . . . . . . . . . . 17
6.2 Proxy Auto Configuration (PAC) . . . . . . . . . . . . . . 17
6.3 Cache Array Routing Protocol (CARP) v1.0 . . . . . . . . . 18
6.4 Web Proxy Auto-Discovery Protocol (WPAD) . . . . . . . . . 18
7. Inter-Proxy Communication . . . . . . . . . . . . . . . . 19
7.1 Loosely coupled Inter-Proxy Communication . . . . . . . . 19
7.1.1 Internet Cache Protocol (ICP) . . . . . . . . . . . . . . 19
7.1.2 Hyper Text Caching Protocol . . . . . . . . . . . . . . . 20
7.1.3 Cache Digest . . . . . . . . . . . . . . . . . . . . . . . 21
7.1.4 Cache Pre-filling . . . . . . . . . . . . . . . . . . . . 22
7.2 Tightly Coupled Inter-Cache Communication . . . . . . . . 22
7.2.1 Cache Array Routing Protocol (CARP) v1.0 . . . . . . . . . 22
8. Network Element Communication . . . . . . . . . . . . . . 23
8.1 Web Cache Control Protocol (WCCP) . . . . . . . . . . . . 23
8.2 Network Element Control Protocol (NECP) . . . . . . . . . 24
8.3 SOCKS . . . . . . . . . . . . . . . . . . . . . . . . . . 25
9. Security Considerations . . . . . . . . . . . . . . . . . 25
9.1 Authentication . . . . . . . . . . . . . . . . . . . . . . 26
9.1.1 Man in the middle attacks . . . . . . . . . . . . . . . . 26
9.1.2 Trusted third party . . . . . . . . . . . . . . . . . . . 26
9.1.3 Authentication based on IP number . . . . . . . . . . . . 26
9.2 Privacy . . . . . . . . . . . . . . . . . . . . . . . . . 26
9.2.1 Trusted third party . . . . . . . . . . . . . . . . . . . 26
9.2.2 Logs and legal implications . . . . . . . . . . . . . . . 27
9.3 Service security . . . . . . . . . . . . . . . . . . . . . 27
9.3.1 Denial of service . . . . . . . . . . . . . . . . . . . . 27
9.3.2 Replay attack . . . . . . . . . . . . . . . . . . . . . . 27
9.3.3 Stupid configuration of proxies . . . . . . . . . . . . . 28
9.3.4 Copyrighted transient copies . . . . . . . . . . . . . . . 28
9.3.5 Application level access . . . . . . . . . . . . . . . . . 28
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 28
References . . . . . . . . . . . . . . . . . . . . . . . . 28
Authors' Addresses . . . . . . . . . . . . . . . . . . . . 31
Full Copyright Statement . . . . . . . . . . . . . . . . . 32





1. Introduction

Since its introduction in 1990, the World-Wide Web has evolved from a
simple client server model into a complex distributed architecture.
This evolution has been driven largely due to the scaling problems
associated with exponential growth. Distinct paradigms and solutions
have emerged to satisfy specific requirements. Two core
infrastructure components being employed to meet the demands of this
growth are replication and caching. In many cases, there is a need
for web caches and replicated services to be able to coexist.

This memo specifies standard terminology and the taxonomy of web
replication and caching infrastructure deployed in the Internet
today. The principal goal of this document is to establish a common
understanding and reference point of this application domain.

It is also expected that this document will be used in the creation
of a standard architectural framework for efficient, reliable, and
predictable service in a web which includes both replicas and caches.

Some of the protocols which this memo examines are specified only by
company technical white papers or work in progress documents. Such
references are included to demonstrate the existence of such
protocols, their experimental deployment in the Internet today, or to
aid the reader in their understanding of this technology area.

There are many protocols, both open and proprietary, employed in web
replication and caching today. A majority of the open protocols
include DNS [8], Cache Digests [21][10], CARP [14], HTTP [1], ICP
[2], PAC [12], SOCKS [7], WPAD [13], and WCCP [18][19]. These
protocols, and their use within the caching and replication
environments, are discussed below.

2. Terminology

The following terminology provides definitions of common terms used
within the web replication and caching community. Base terms are
taken, where possible, from the HTTP/1.1 specification [1] and are
included here for reference. First- and second-order derivatives are
constructed from these base terms to help define the relationships
that exist within this area.

Terms that are in common usage and which are contrary to definitions
in RFC 2616 and this document are highlighted.









2.1 Base Terms

The majority of these terms are taken as-is from RFC 2616 [1], and
are included here for reference.

client (taken from [1])
A program that establishes connections for the purpose of sending
requests.

server (taken from [1])
An application program that accepts connections in order to
service requests by sending back responses. Any given program may
be capable of being both a client and a server; our use of these
terms refers only to the role being performed by the program for a
particular connection, rather than to the program's capabilities
in general. Likewise, any server may act as an origin server,
proxy, gateway, or tunnel, switching behavior based on the nature
of each request.

proxy (taken from [1])
An intermediary program which acts as both a server and a client
for the purpose of making requests on behalf of other clients.
Requests are serviced internally or by passing them on, with
possible translation, to other servers. A proxy MUST implement
both the client and server requirements of this specification. A
"transparent proxy" is a proxy that does not modify the request or
response beyond what is required for proxy authentication and
identification. A "non-transparent proxy" is a proxy that
modifies the request or response in order to provide some added
service to the user agent, such as group annotation services,
media type transformation, protocol reduction, or anonymity
filtering. Except where either transparent or non-transparent
behavior is explicitly stated, the HTTP proxy requirements apply
to both types of proxies.

Note: The term "transparent proxy" refers to a semantically
transparent proxy as described in [1], not what is commonly
understood within the caching community. We recommend that the term
"transparent proxy" is always prefixed to avoid confusion (e.g.,
"network transparent proxy"). However, see definition of
"interception proxy" below.

The above condition requiring implementation of both the server and
client requirements of HTTP/1.1 is only appropriate for a
non-network transparent proxy.








cache (taken from [1])
A program's local store of response messages and the subsystem
that controls its message storage, retrieval, and deletion. A
cache stores cacheable responses in order to reduce the response
time and network bandwidth consumption on future, equivalent
requests. Any client or server may include a cache, though a
cache cannot be used by a server that is acting as a tunnel.

Note: The term "cache" used alone often is meant as "caching proxy".

Note: There are additional motivations for caching, for example
reducing server load (as a further means to reduce response time).

cacheable (taken from [1])
A response is cacheable if a cache is allowed to store a copy of
the response message for use in answering subsequent requests.
The rules for determining the cacheability of HTTP responses are
defined in section 13. Even if a resource is cacheable, there may
be additional constraints on whether a cache can use the cached
copy for a particular request.

gateway (taken from [1])
A server which acts as an intermediary for some other server.
Unlike a proxy, a gateway receives requests as if it were the
origin server for the requested resource; the requesting client
may not be aware that it is communicating with a gateway.

tunnel (taken from [1])
An intermediary program which is acting as a blind relay between
two connections. Once active, a tunnel is not considered a party
to the HTTP communication, though the tunnel may have been
initiated by an HTTP request. The tunnel ceases to exist when
both ends of the relayed connections are closed.

replication
"Creating and maintaining a duplicate copy of a database or file
system on a different computer, typically a server." - Free
Online Dictionary of Computing (FOLDOC)

inbound/outbound (taken from [1])
Inbound and outbound refer to the request and response paths for
messages: "inbound" means "traveling toward the origin server",
and "outbound" means "traveling toward the user agent".

network element
A network device that introduces multiple paths between source and
destination, transparent to HTTP.






2.2 First order derivative terms

The following terms are constructed taking the above base terms as
foundation.

origin server (taken from [1])
The server on which a given resource resides or is to be created.

user agent (taken from [1])
The client which initiates a request. These are often browsers,
editors, spiders (web-traversing robots), or other end user tools.

caching proxy
A proxy with a cache, acting as a server to clients, and a client
to servers.

Caching proxies are often referred to as "proxy caches" or simply
"caches". The term "proxy" is also frequently misused when
referring to caching proxies.

surrogate
A gateway co-located with an origin server, or at a different
point in the network, delegated the authority to operate on behalf
of, and typically working in close co-operation with, one or more
origin servers. Responses are typically delivered from an
internal cache.

Surrogates may derive cache entries from the origin server or from
another of the origin server's delegates. In some cases a
surrogate may tunnel such requests.

Where close co-operation between origin servers and surrogates
exists, this enables modifications of some protocol requirements,
including the Cache-Control directives in [1]. Such modifications
have yet to be fully specified.

Devices commonly known as "reverse proxies" and "(origin) server
accelerators" are both more properly defined as surrogates.

reverse proxy
See "surrogate".

server accelerator
See "surrogate".









2.3 Second order derivatives

The following terms further build on first order derivatives:

master origin server
An origin server on which the definitive version of a resource
resides.

replica origin server
An origin server holding a replica of a resource, but which may
act as an authoritative reference for client requests.

content consumer
The user or system that initiates inbound requests, through use of
a user agent.

browser
A special instance of a user agent that acts as a content
presentation device for content consumers.

2.4 Topological terms

The following definitions are added to describe caching device
topology:

user agent cache
The cache within the user agent program.

local caching proxy
The caching proxy to which a user agent connects.

intermediate caching proxy
Seen from the content consumer's view, all caches participating in
the caching mesh that are not the user agent's local caching
proxy.

cache server
A server to requests made by local and intermediate caching
proxies, but which does not act as a proxy.

cache array
A cluster of caching proxies, acting logically as one service and
partitioning the resource name space across the array. Also known
as "diffused array" or "cache cluster".









caching mesh
A loosely coupled set of co-operating proxy- and (optionally)
caching-servers, or clusters, acting independently but sharing
cacheable content between themselves using inter-cache
communication protocols.

2.5 Automatic use of proxies

Network administrators may wish to force or facilitate the use of
proxies by clients, enabling such configuration within the network
itself or within automatic systems in user agents, such that the
content consumer need not be aware of any such configuration issues.

The terms that describe such configurations are given below.

automatic user-agent proxy configuration
The technique of discovering the availability of one or more
proxies and the automated configuration of the user agent to use
them. The use of a proxy is transparent to the content consumer
but not to the user agent. The term "automatic proxy
configuration" is also used in this sense.

traffic interception
The process of using a network element to examine network traffic
to determine whether it should be redirected.

traffic redirection
Redirection of client requests from a network element performing
traffic interception to a proxy. Used to deploy (caching) proxies
without the need to manually reconfigure individual user agents,
or to force the use of a proxy where such use would not otherwise
occur.

interception proxy (a.k.a. "transparent proxy", "transparent cache")
The term "transparent proxy" has been used within the caching
community to describe proxies used with zero configuration within
the user agent. Such use is somewhat transparent to user agents.
Due to discrepancies with [1] (see definition of "proxy" above),
and objections to the use of the word "transparent", we introduce
the term "interception proxy" to describe proxies that receive
redirected traffic flows from network elements performing traffic
interception.

Interception proxies receive inbound traffic flows through the
process of traffic redirection. (Such proxies are deployed by
network administrators to facilitate or require the use of
appropriate services offered by the proxy). Problems associated
with the deployment of interception proxies are described in the
document "Known HTTP Proxy/Caching Problems" [23]. The use of
interception proxies requires zero configuration of the user agent,
which acts as though communicating directly with an origin server.

3. Distributed System Relationships

This section identifies the relationships that exist in a distributed
replication and caching environment. Having defined these
relationships, later sections describe the communication protocols
used in each relationship.

3.1 Replication Relationships

The following sections describe relationships between clients and
replicas and between replicas themselves.

3.1.1 Client to Replica

A client may communicate with one or more replica origin servers, as
well as with master origin servers. (In the absence of replica
servers the client interacts directly with the origin server as is
the normal case.)

------------------     -----------------     ------------------
| Replica Origin |     | Master Origin |     | Replica Origin |
|     Server     |     |    Server     |     |     Server     |
------------------     -----------------     ------------------
         \                     |                     /
          \                    |                    /
           -----------------------------------------
                               |             Client to
                       -----------------     Replica Server
                       |    Client     |
                       -----------------

Protocols used to enable the client to use one of the replicas can be
found in Section 4.

3.1.2 Inter-Replica

This is the relationship between master origin server(s) and replica
origin servers, to replicate data sets that are accessed by clients
in the relationship shown in Section 3.1.1.










------------------     -----------------     ------------------
| Replica Origin |-----| Master Origin |-----| Replica Origin |
|     Server     |     |    Server     |     |     Server     |
------------------     -----------------     ------------------

Protocols used in this relationship can be found in Section 5.

3.2 Proxy Relationships

There are a variety of ways in which (caching) proxies and cache
servers communicate with each other, and with user agents.

3.2.1 Client to Non-Interception Proxy

A client may communicate with zero or more proxies for some or all
requests. Where the result of communication results in no proxy
being used, the relationship is between client and (replica) origin
server (see Section 3.1.1).

-----------------     -----------------     -----------------
|     Local     |     |     Local     |     |     Local     |
|     Proxy     |     |     Proxy     |     |     Proxy     |
-----------------     -----------------     -----------------
        \                     |                     /
         \                    |                    /
          -----------------------------------------
                              |
                      -----------------
                      |    Client     |
                      -----------------

In addition, a user agent may interact with an additional server -
operated on behalf of a proxy for the purpose of automatic user agent
proxy configuration.

Schemes and protocols used in these relationships can be found in
Section 6.

3.2.2 Client to Surrogate to Origin Server

A client may communicate with zero or more surrogates for requests
intended for one or more origin servers. Where a surrogate is not
used, the client communicates directly with an origin server. Where
a surrogate is used the client communicates as if with an origin
server. The surrogate fulfills the request from its internal cache,
or acts as a gateway or tunnel to the origin server.







--------------     --------------     --------------
|   Origin   |     |   Origin   |     |   Origin   |
|   Server   |     |   Server   |     |   Server   |
--------------     --------------     --------------
               \         |         /
                \        |        /
                 -----------------
                 |   Surrogate   |
                 |               |
                 -----------------
                         |
                         |
                    ------------
                    |  Client  |
                    ------------

3.2.3 Inter-Proxy

Inter-Proxy relationships exist as meshes (loosely coupled) and
clusters (tightly coupled).

3.2.3.1 (Caching) Proxy Meshes

Within a loosely coupled mesh of (caching) proxies, communication can
happen at the same level between peers, and with one or more parents.

             ---------------------      ---------------------
  -----------|    Intermediate   |      |    Intermediate   |
  |          | Caching Proxy (D) |      | Caching Proxy (E) |
  |(peer)    ---------------------      ---------------------
--------------             | (parent)       / (parent)
|   Cache    |             |         ------/
| Server (C) |             |        /
--------------             |       /
 (peer) |            -----------------       ---------------------
        -------------| Local Caching |-------|    Intermediate   |
                     |   Proxy (A)   | (peer)| Caching Proxy (B) |
                     -----------------       ---------------------
                             |
                             |
                         ----------
                         | Client |
                         ----------

Client included for illustration purposes only








An inbound request may be routed to one of a number of intermediate
(caching) proxies based on a determination of whether that parent is
better suited to resolving the request.

For example, in the above figure, Cache Server C and Intermediate
Caching Proxy B are peers of the Local Caching Proxy A, and may only
be used when the resource requested by A already exists on either B
or C. Intermediate Caching Proxies D & E are parents of A, and it is
A's choice of which to use to resolve a particular request.

The relationship between A & B only makes sense in a caching
environment, while the relationships between A & D and A & E are also
appropriate where D or E are non-caching proxies.

Protocols used in these relationships can be found in Section 7.1.

3.2.3.2 (Caching) Proxy Arrays

Where a user agent may have a relationship with a proxy, it is
possible that it may instead have a relationship with an array of
proxies arranged in a tightly coupled mesh.

----------------------
---------------------- |
--------------------- | |
| (Caching) Proxy | |-----
| Array |----- ^ ^
--------------------- ^ ^ | |
^ ^ | |--- |
| |----- |
--------------------------

Protocols used in this relationship can be found in Section 7.2.

3.2.4 Network Element to Caching Proxy

A network element performing traffic interception may choose to
redirect requests from a client to a specific proxy within an array.
(It may also choose not to redirect the traffic, in which case the
relationship is between client and (replica) origin server, see
Section 3.1.1.)












-----------------     -----------------     -----------------
| Caching Proxy |     | Caching Proxy |     | Caching Proxy |
|     Array     |     |     Array     |     |     Array     |
-----------------     -----------------     -----------------
         \                    |                    /
          -----------------------------------------
                              |
                       --------------
                       |  Network   |
                       |  Element   |
                       --------------
                              |
                             ///
                              |
                        ------------
                        |  Client  |
                        ------------

The interception proxy may be directly in-line of the flow of traffic
- in which case the intercepting network element and interception
proxy form parts of the same hardware system - or may be out-of-path,
requiring the intercepting network element to redirect traffic to
another network segment. In this latter case, communication
protocols enable the intercepting network element to stop and start
redirecting traffic when the interception proxy becomes
(un)available. Details of these protocols can be found in Section 8.

4. Replica Selection

This section describes the schemes and protocols used in the
cooperation and communication between client and replica origin web
servers. The ideal situation is to discover an optimal replica
origin server for clients to communicate with. Optimality is a
policy based decision, often based upon proximity, but may be based
on other criteria such as load.

4.1 Navigation Hyperlinks

Best known reference
This memo

Description
The simplest of client to replica communication mechanisms. This
utilizes hyperlink URIs embedded in web pages that point to the
individual replica origin servers. The content consumer manually
selects the link of the replica origin server they wish to use.







Security
Relies on the protocol security associated with the appropriate
URI scheme.

Deployment
Probably the most commonly deployed client to replica
communication mechanism. Ubiquitous interoperability with humans.

Submitter
Document editors

4.2 Replica HTTP Redirection

Best known reference
This memo

Description
A simple and commonly used mechanism to connect clients with
replica origin servers is to use HTTP redirection. Clients are
redirected to an optimal replica origin server via the use of the
HTTP [1] protocol response codes, e.g., 302 "Found", or 307
"Temporary Redirect". A client establishes HTTP communication
with one of the replica origin servers. The initially contacted
replica origin server can then either choose to accept the service
or redirect the client again. Refer to section 10.3 in HTTP/1.1
[1] for information on HTTP response codes.

Security
Relies entirely upon HTTP security.

Deployment
Observed at a number of large web sites. Extent of usage in the
Internet is unknown.

Submitter
Document editors
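
The redirection step in Section 4.2 can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
replica host names, the choose_replica policy and the port number are
hypothetical and do not come from this memo, and the toy policy only
handles dotted IPv4 client addresses. A real site would choose a
replica by proximity or load.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical replica origin servers (assumption, not from the memo).
REPLICAS = ["http://replica1.example.com", "http://replica2.example.com"]


def choose_replica(client_ip: str) -> str:
    """Toy policy: spread IPv4 clients over replicas by the last octet."""
    last_octet = int(client_ip.rsplit(".", 1)[-1])
    return REPLICAS[last_octet % len(REPLICAS)]


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers every GET with a redirect to the chosen replica."""

    def do_GET(self) -> None:
        target = choose_replica(self.client_address[0]) + self.path
        self.send_response(302)  # "Found"; 307 "Temporary Redirect" also fits
        self.send_header("Location", target)
        self.end_headers()


def run(port: int = 8080) -> None:
    """Serve redirects on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling run() starts the redirecting server. The replica that receives
the redirected request may itself accept the request or redirect
again, as the description notes.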

4.3 DNS Redirection

Best known references

* RFC 1794 DNS Support for Load Balancing Proximity [8]

* This memo

Description
The Domain Name Service (DNS) provides a more sophisticated client
to replica communication mechanism. This is accomplished by DNS
servers that sort resolved IP addresses based upon quality of
service policies. When a client resolves the name of an origin
server, the enhanced DNS server sorts the available IP addresses
of the replica origin servers starting with the most optimal
replica and ending with the least optimal replica.

Security
Relies entirely upon DNS security, and other protocols that may be
used in determining the sort order.

Deployment
Observed at a number of large web sites and large ISP web hosted
services. Extent of usage in the Internet is unknown, but is
believed to be increasing.

Submitter
Document editors
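
The sorting behaviour in Section 4.3 can be illustrated with a short
Python sketch. It assumes that the policy has already been reduced to
one numeric cost per replica address (lower is better); the function
name, the cost table and the addresses are hypothetical, and how an
enhanced DNS server actually obtains such costs is outside this memo.

```python
from typing import Dict, List


def sort_replicas(addresses: List[str], cost: Dict[str, float]) -> List[str]:
    """Order replica addresses from most optimal to least optimal.

    Addresses with no known cost are placed last.
    """
    return sorted(addresses, key=lambda addr: cost.get(addr, float("inf")))
```

A client that tries the returned addresses in order therefore reaches
the replica the policy considers best first.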

5. Inter-Replica Communication

This section describes the cooperation and communication between
master- and replica- origin servers. Used in replicating data sets
between origin servers.

5.1 Batch Driven Replication

Best known reference
This memo

Description
The replica origin server to be updated initiates communication
with a master origin server. The communication is established at
intervals based upon queued transactions which are scheduled for
deferred processing. The scheduling mechanism policies vary, but
generally are re-occurring at a specified time. Once
communication is established, data sets are copied to the
initiating replica origin server.

Security
Relies upon the protocol being used to transfer the data set. FTP
[4] and RDIST are the most common protocols observed.

Deployment
Very common for synchronization of mirror sites in the Internet.

Submitter
Document editors






5.2 Demand Driven Replication

Best known reference
This memo

Description
Replica origin servers acquire content as needed due to client
demand. When a client requests a resource that is not in the data
set of the replica origin server/surrogate, an attempt is made to
resolve the request by acquiring the resource from the master
origin server, returning it to the requesting client.

Security
Relies upon the protocol being used to transfer the resources. FTP
[4], Gopher [5], HTTP [1] and ICP [2] are the most common
protocols observed.

Deployment
Observed at several large web sites. Extent of usage in the
Internet is unknown.

Submitter
Document editors
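
The fetch-on-miss behaviour in Section 5.2 can be sketched as follows.
The class and parameter names are hypothetical, and the transfer
protocol is abstracted into a callable; this is a minimal model of the
idea, not a description of any deployed system.

```python
from typing import Callable, Dict


class DemandDrivenReplica:
    """Toy replica that fills its data set only when a client asks."""

    def __init__(self, fetch_from_master: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_master
        self._data_set: Dict[str, bytes] = {}

    def get(self, resource: str) -> bytes:
        if resource not in self._data_set:
            # Miss: acquire the resource from the master origin server.
            self._data_set[resource] = self._fetch(resource)
        # Return the stored copy to the requesting client.
        return self._data_set[resource]
```

Only the first request for a resource reaches the master origin
server; later requests are answered from the replica's own data set.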

5.3 Synchronized Replication

Best known reference
This memo

Description
Replicated origin servers cooperate using synchronized strategies
and specialized replica protocols to keep the replica data sets
coherent. Synchronization strategies range from tightly coherent
(a few minutes) to loosely coherent (a few or more hours). Updates
occur between replicas based upon the synchronization time
constraints of the coherency model employed and are generally in
the form of deltas only.

Security
All of the known protocols utilize strong cryptographic key
exchange methods, which are either based upon the Kerberos shared
secret model or the public/private key RSA model.

Deployment
Observed at a few sites, primarily at university campuses.

Submitter
Document editors





Note
The editors are aware of at least two open source protocols - AFS
and CODA - as well as the proprietary NRS protocol from Novell.

6. User Agent to Proxy Configuration

This section describes the configuration, cooperation and
communication between user agents and proxies.

6.1 Manual Proxy Configuration

Best known reference
This memo

Description
Each user must configure her user agent by supplying information
pertaining to proxied protocols and local policies.

Security
The potential for doing wrong is high; each user individually sets
preferences.

Deployment
Widely deployed, used in all current browsers. Most browsers also
support additional options.

Submitter
Document editors

6.2 Proxy Auto Configuration (PAC)

Best known reference
"Navigator Proxy Auto-Config File Format" [12]

Description
A JavaScript script retrieved from a web server is executed for
each URL accessed to determine the appropriate proxy (if any) to
be used to access the resource. User agents must be configured to
request this script upon startup. There is no bootstrap
mechanism; manual configuration is necessary.

Despite manual configuration, the process of proxy configuration
is simplified by centralizing it within a script at a single
location.

Security
Common policy per organization possible but still requires initial
manual configuration. PAC is better than "manual proxy
configuration" since PAC administrators may update the proxy
configuration without further user intervention.

Interoperability of PAC files is not high, since different
browsers have slightly different interpretations of the same
script, possibly leading to undesired effects.

Deployment
Implemented in Netscape Navigator and Microsoft Internet Explorer

Submitter
Document editors
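
The per-URL decision made by a PAC script (Section 6.2) can be
modelled in Python. Real PAC files are JavaScript and define a
function FindProxyForURL(url, host) that returns strings such as
"DIRECT" or "PROXY host:port"; the Python function below only mirrors
that decision logic. The host names, the port and the policy itself
are hypothetical.

```python
from urllib.parse import urlsplit


def find_proxy_for_url(url: str, host: str) -> str:
    """Return a PAC-style answer for one request (illustrative policy)."""
    # Assumed local policy: internal hosts are contacted directly.
    if host == "localhost" or host.endswith(".internal.example"):
        return "DIRECT"
    # Assumed local policy: plain HTTP goes through one caching proxy,
    # falling back to a direct connection if the proxy is unreachable.
    if urlsplit(url).scheme == "http":
        return "PROXY proxy.example.com:3128; DIRECT"
    return "DIRECT"
```

Because the policy lives in one script at a single location, changing
it there changes the behaviour of every user agent that loads it.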

6.3 Cache Array Routing Protocol (CARP) v1.0

Best known references

* "Cache Array Routing Protocol" [14] (work in progress)

* "Cache Array Routing Protocol (CARP) v1.0 Specifications" [15]

* "Cache Array Routing Protocol and Microsoft Proxy Server 2.0"
[16]

Description
User agents may use CARP directly as a hash function based proxy
selection mechanism. They need to be configured with the location
of the cluster information.

Security
Security considerations are not covered in the specification works
in progress.

Deployment
Implemented in Microsoft Proxy Server, Squid. Implemented in user
agents via PAC scripts.

Submitter
Document editors
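
Hash function based proxy selection, as named in Section 6.3, can be
illustrated with a highest-score scheme. The sketch below is not the
hash function defined in the CARP documents [14][15]; it substitutes
SHA-256 from the Python standard library, and the function and member
names are hypothetical. It only shows the general idea that every
user agent, given the same member list, picks the same proxy for a
given URL without any per-request communication.

```python
import hashlib
from typing import List


def pick_proxy(url: str, members: List[str]) -> str:
    """Pick the array member with the highest combined hash score."""

    def score(member: str) -> int:
        # Combine member name and URL, then read 8 bytes as an integer.
        digest = hashlib.sha256((member + "|" + url).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(members, key=score)
```

With this kind of scheme, removing a member only remaps the URLs that
were assigned to that member; all other URLs keep their proxy.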

6.4 Web Proxy Auto-Discovery Protocol (WPAD)

Best known reference
"The Web Proxy Auto-Discovery Protocol" [13] (work in progress)

Description
WPAD uses a collection of pre-existing Internet resource discovery
mechanisms to perform web proxy auto-discovery.





The only goal of WPAD is to locate the PAC URL [12]. WPAD does
not specify which proxies will be used. WPAD supplies the PAC
URL, and the PAC script then operates as defined above to choose
proxies per resource request.

The WPAD protocol specifies the following:

* how to use each mechanism for the specific purpose of web proxy
auto-discovery

* the order in which the mechanisms should be performed

* the minimal set of mechanisms which must be attempted by a WPAD
compliant user agent

The resource discovery mechanisms utilized by WPAD are as follows:

* Dynamic Host Configuration Protocol (DHCP)

* Service Location Protocol (SLP)

* "Well Known Aliases" using DNS A records

* DNS SRV records

* "service: URLs" in DNS TXT records

Security
Relies upon DNS and HTTP security.

Deployment
Implemented in some user agents and caching proxy servers. More
than two independent implementations.

Submitter
Josh Cohen

7. Inter-Proxy Communication

7.1 Loosely coupled Inter-Proxy Communication

This section describes the cooperation and communication between
caching proxies.

7.1.1 Internet Cache Protocol (ICP)

Best known reference
RFC 2186 Internet Cache Protocol (ICP), version 2 [2]





Description:
ICP is used by proxies to query other (caching) proxies about web
resources, to see if the requested resource is present on the
other system.

ICP uses UDP. Since UDP is an uncorrected network transport
protocol, an estimate of network congestion and availability may
be calculated from ICP loss. This rudimentary loss measurement
provides, together with round trip times, a load balancing method
for caches.
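
To make the query exchange concrete, the sketch below packs an ICP
version 2 query datagram using the fixed 20-byte header layout from
RFC 2186 [2] (opcode, version, message length, request number, options,
option data, sender host address) followed by the query payload. The
function names, the zeroed sender address, and the default requester
address are assumptions made for illustration only.

```python
import socket
import struct

ICP_OP_QUERY = 1   # opcode for a query, per RFC 2186
ICP_VERSION = 2
HEADER = struct.Struct("!BBHIII4s")  # 20-byte fixed header, network order

def build_icp_query(url, request_number, requester_ip="0.0.0.0"):
    """Build an ICP v2 query datagram asking a peer about `url`."""
    # Query payload: 4-byte requester address, then a NUL-terminated URL.
    payload = socket.inet_aton(requester_ip) + url.encode("ascii") + b"\x00"
    length = HEADER.size + len(payload)
    # Options, option data and sender address are left as zero here.
    header = HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length,
                         request_number, 0, 0, b"\x00" * 4)
    return header + payload

def parse_icp_header(datagram):
    """Return (opcode, version, length, request_number) from a datagram."""
    opcode, version, length, reqnum, _opts, _optdata, _sender = \
        HEADER.unpack_from(datagram)
    return opcode, version, length, reqnum
```

A proxy would send such a datagram over UDP to each peer and match
replies by request number; a reply that never arrives is what feeds the
rudimentary loss measurement described above.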

Security:
See RFC 2187 [3]

ICP does not convey information about HTTP headers associated with
resources. HTTP headers may include access control and cache
directives. Since proxies ask for the availability of resources,
and subsequently retrieve them using HTTP, false cache hits may
occur (object present in cache, but not accessible to a sibling is
one example).

ICP suffers from all the security problems of UDP.

Deployment:
Widely deployed. Most current caching proxy implementations
support ICP in some form.

Submitter:
Document editors.

See also:
"Internet Cache Protocol Extension" [17] (work in progress).

7.1.2 Hyper Text Caching Protocol

Best known reference:
RFC 2756 Hyper Text Caching Protocol (HTCP/0.0) [9]

Description:
HTCP is a protocol for discovering HTTP caching proxies and cached
data, managing sets of HTTP caching proxies, and monitoring cache
activity.

HTCP requests include HTTP header material, while ICPv2 does not,
enabling HTCP replies to more accurately describe the behaviour
that would occur as a result of a subsequent HTTP request for the
same resource.




Security:
Optionally uses HMAC-MD5 [11] shared secret authentication.
Protocol is subject to attack if authentication is not used.
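
The following is a minimal sketch of shared secret authentication with
HMAC-MD5 as defined in RFC 2104 [11], using Python's standard hmac
module. It does not reproduce the HTCP wire format or state which
message fields HTCP covers with the signature; the function names and
the sample message are hypothetical.

```python
import hashlib
import hmac

def sign(message: bytes, secret: bytes) -> bytes:
    """Return the 16-byte HMAC-MD5 tag for `message` under `secret`."""
    return hmac.new(secret, message, hashlib.md5).digest()

def verify(message: bytes, secret: bytes, tag: bytes) -> bool:
    """Check a received tag using a constant-time comparison."""
    return hmac.compare_digest(sign(message, secret), tag)
```

A receiver that shares the secret recomputes the tag and rejects any
message whose tag does not match; without the secret, a forged or
altered message cannot carry a valid tag.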

Deployment:
HTCP is implemented in Squid and the "Web Gateway Interceptor".

Submitter:
Document editors.

7.1.3 Cache Digest

Best known references:

* "Cache Digest Specification - version 5" [21]

* "Summary Cache: A Scalable Wide-Area Web Cache Sharing
  Protocol" [10] (see note)

Description:
Cache Digests are a response to the problems of latency and
congestion associated with previous inter-cache communication
mechanisms such as the Internet Cache Protocol (ICP) [2] and the
Hyper Text Cache Protocol [9]. Unlike these protocols, Cache
Digests support peering between caching proxies and cache servers
without a request-response exchange taking place for each inbound
request. Instead, a summary of the contents in cache (the Digest)
is fetched by other systems that peer with it. Using Cache
Digests it is possible to determine with a relatively high degree
of accuracy whether a given resource is cached by a particular
system.

Cache Digests are both an exchange protocol and a data format.
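
A digest of this kind can be sketched as a Bloom filter, the structure
described in Summary Cache [10]: a bitmap in which each cached resource
sets a few bits, so a peer can test membership locally with occasional
false hits but no false misses. The class below is an illustration
only; the bitmap size, the key (MD5 over method plus URL) and the split
into four 32-bit words are assumptions, not the exact format of the
Cache Digest specification [21].

```python
import hashlib

class Digest:
    """A small Bloom-filter summary of a cache's contents."""

    def __init__(self, bits=8192):
        self.bits = bits
        self.bitmap = bytearray((bits + 7) // 8)

    def _positions(self, method, url):
        md5 = hashlib.md5((method + url).encode("utf-8")).digest()
        # Split the 128-bit MD5 into four 32-bit integers, one per hash.
        return [int.from_bytes(md5[i:i + 4], "big") % self.bits
                for i in range(0, 16, 4)]

    def add(self, method, url):
        """Record that (method, url) is present in the cache."""
        for pos in self._positions(method, url):
            self.bitmap[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, method, url):
        """True if the resource may be cached; False means it is not."""
        return all(self.bitmap[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(method, url))
```

A peer fetches the bitmap once and consults it for every inbound
request, which is how the per-request exchange of ICP or HTCP is
avoided.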

Security:
If the contents of a Digest are sensitive, they should be
protected. Any methods which would normally be applied to secure
an HTTP connection can be applied to Cache Digests.

A 'Trojan horse' attack is currently possible in a mesh: System
A can build a fake peer Digest for system B and serve it to B's
peers if requested. This way A can direct traffic toward/from B.
The impact of this problem is minimized by the 'pull' model of
transferring Cache Digests from one system to another.







Cache Digests provide knowledge about peer cache content on a URL
level. Hence, they do not dictate a particular level of policy
management and can be used to implement various policies on any
level (user, organization, etc.).

Deployment:
Cache Digests are supported in Squid.

Cache Meshes: NLANR Mesh; TF-CACHE Mesh (European academic
networks).


Submitter:
Alex Rousskov for [21], Pei Cao for [10].

Note: The technology of Summary Cache [10] is patent pending by the
University of Wisconsin-Madison.

7.1.4 Cache Pre-filling

Best known reference:
"Pre-filling a cache - A satellite overview" [20] (work in
progress).

Description:
Cache pre-filling is a push-caching implementation. It is
particularly well adapted to IP-multicast networks because it
allows preselected resources to be simultaneously inserted into
caches within the targeted multicast group. Different
implementations of cache pre-filling already exist, especially in
satellite contexts. However, there is still no standard for this
kind of push-caching and vendors propose solutions either based on
dedicated equipment or public domain caches extended with a
pre-filling module.

Security:
Relies on the inter-cache protocols being employed.

Deployment:
Observed in two commercial content distribution service providers.

Submitter:
Ivan Lovric

7.2 Tightly Coupled Inter-Cache Communication

7.2.1 Cache Array Routing Protocol (CARP) v1.0

Also see Section 6.3.



Best known references:

* "Cache Array Routing Protocol" [14] (work in progress)

* "Cache Array Routing Protocol (CARP) v1.0 Specifications" [15]

* "Cache Array Routing Protocol and Microsoft Proxy Server 2.0"
  [16]

Description:
CARP is a hashing function for dividing URL-space among a cluster
of proxies. Included in CARP is the definition of a Proxy Array
Membership Table, and ways to download this information.

A user agent which implements CARP v1.0 can allocate and
intelligently route requests for the URLs to any member of the
Proxy Array. Due to the resulting sorting of requests through
these proxies, duplication of cache contents is eliminated and
global cache hit rates may be improved.
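
The idea of dividing URL-space by hashing can be sketched as follows.
Each (proxy, URL) pair is given a score and the request goes to the
proxy with the highest score, so every client that knows the same
membership list picks the same proxy for a given URL. This is a
stand-in illustration: it uses SHA-1 rather than the hash function and
load factors defined in the CARP specifications [14][15], and the names
pick_proxy and _score are hypothetical.

```python
import hashlib

def _score(proxy: str, url: str) -> int:
    """Combine a proxy name and a URL into a comparable score."""
    digest = hashlib.sha1((proxy + "|" + url).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def pick_proxy(url: str, proxies):
    """Return the array member responsible for `url`."""
    return max(proxies, key=lambda p: _score(p, url))
```

Because each URL maps to exactly one member, no two members cache the
same resource, and removing a member only reassigns the URLs that
member was responsible for.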

Security:
Security considerations are not covered in the specification works
in progress.

Deployment:
Implemented in caching proxy servers. More than two independent
implementations.

Submitter:
Document editors.

8. Network Element Communication

This section describes the cooperation and communication between
proxies and network elements. Examples of such network elements
include routers and switches. Generally used for deploying
interception proxies and/or diffused arrays.

8.1 Web Cache Control Protocol (WCCP)

Best known references:
"Web Cache Control Protocol" [18][19] (work in progress).

Note: The name used for this protocol varies, sometimes referred
to as the "Web Cache Coordination Protocol", but frequently just
"WCCP" to avoid confusion.





Description:
WCCP V1 runs between a router functioning as a redirecting network
element and out-of-path interception proxies. The protocol allows
one or more proxies to register with a single router to receive
redirected traffic. It also allows one of the proxies, the
designated proxy, to dictate to the router how redirected traffic
is distributed across the array.

WCCP V2 additionally runs between multiple routers and the
proxies.

Security:
WCCP V1 has no security features.
WCCP V2 provides optional authentication of protocol packets.

Deployment:
Network elements: WCCP is deployed on a wide range of Cisco
routers.
Caching proxies: WCCP is deployed on a number of vendors' caching
proxies.

Submitter:
David Forster
Document editors.

8.2 Network Element Control Protocol (NECP)

Best known reference:
"NECP: The Network Element Control Protocol" [22] (work in
progress)

Description:
NECP provides methods for network elements to learn about server
capabilities, availability, and hints as to which flows can and
cannot be serviced. This allows network elements to perform load
balancing across a farm of servers, redirection to interception
proxies, and cut-through of flows that cannot be served by the
farm.

Security:
Optionally uses HMAC-SHA-1 [11] shared secret authentication along
with complex sequence numbers to provide moderately strong
security. Protocol is subject to attack if authentication is not
used.

Deployment:
Unknown at present; several network element and caching proxy
vendors have expressed intent to implement the protocol.



Submitter:
Gary Tomlinson

8.3 SOCKS

Best known reference:
RFC 1928 SOCKS Protocol Version 5 [7]

Description:
SOCKS is primarily used as a caching proxy to firewall protocol.
Although firewalls don't conform to the narrowly defined network
element definition above, they are an integral part of the network
infrastructure. When used in conjunction with a firewall, SOCKS
provides an authenticated tunnel between the caching proxy and the
firewall.
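
As a rough illustration of how a caching proxy would open such a
tunnel, the sketch below builds the first two client messages of SOCKS
version 5 as laid out in RFC 1928 [7]: the method selection message and
a CONNECT request using the domain name address type. The function
names are hypothetical, and the authentication sub-negotiation that
would follow when a method other than "no authentication required" is
selected is not shown.

```python
import struct

def socks5_greeting(methods=(0x00,)):
    """VER=5, NMETHODS, METHODS (0x00 = no authentication required)."""
    return bytes([0x05, len(methods), *methods])

def socks5_connect_request(host, port):
    """CONNECT request using the domain-name address type (ATYP=3)."""
    name = host.encode("ascii")
    if len(name) > 255:
        raise ValueError("host name too long for SOCKS5")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3, length-prefixed name, port
    return (bytes([0x05, 0x01, 0x00, 0x03, len(name)])
            + name + struct.pack("!H", port))
```

The proxy sends the greeting, waits for the server's chosen method,
completes any authentication, and then sends the CONNECT request for
the origin server it wants to reach through the firewall.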

Security:
An extensive framework provides for multiple authentication
methods. Currently, SSL, CHAP, DES, 3DES are known to be
available.

Deployment:
SOCKS is widely deployed in the Internet.

Submitter:
Document editors.

9. Security Considerations

This document provides a taxonomy for web caching and replication.
Recommended practice, architecture and protocols are not described in
detail.

By definition, replication and caching involve the copying of
resources. There are legal implications of making and keeping
transient or permanent copies; these are not covered here.

Information on security of each protocol referred to by this memo is
provided in the preceding sections, and in their accompanying
documentation. HTTP security is discussed in section 15 of RFC 2616
[1], the HTTP/1.1 specification, and to a lesser extent in RFC 1945
[6], the HTTP/1.0 specification. RFC 2616 contains security
considerations for HTTP proxies.








Caching proxies have the same security issues as other application
level proxies. Application level proxies are not covered in these
security considerations. IP number based authentication is
problematic when a proxy is involved in the communications. Details
are not discussed here.

9.1 Authentication

Requests for web resources, and responses to such requests, may be
directed to replicas and/or may flow through intermediate proxies.
The integrity of communication needs to be preserved to ensure
protection from both loss of access and from unintended change.

9.1.1 Man in the middle attacks

HTTP proxies are men-in-the-middle, the perfect place for a
man-in-the-middle attack. A discussion of this is found in section 15
of RFC 2616 [1].

9.1.2 Trusted third party

A proxy must either be trusted to act on behalf of the origin server
and/or client, or it must act as a tunnel. When presenting cached
objects to clients, the clients need to trust the caching proxy to
act on behalf of the origin server.

A replica may get accreditation from the origin server.

9.1.3 Authentication based on IP number

Authentication based on the client's IP number is problematic when
connecting through a proxy, since the authenticating device only has
access to the proxy's IP number. One (problematic) solution to this
is for the proxy to spoof the client's IP number for inbound
requests.

Authentication based on IP number assumes that the end-to-end
properties of the Internet are preserved. This is typically not the
case for environments containing interception proxies.

9.2 Privacy

9.2.1 Trusted third party

When using a replication service, one must trust both the replica
origin server and the replica selection system.





Redirection of traffic - either by automated replica selection
methods, or within proxies - may introduce third parties the end user
and/or origin server must trust. In the case of interception
proxies, such third parties are often unknown to both end points of
the communication. Unknown third parties may have security
implications.

Both proxies and replica selection services may have access to
aggregated access information. A proxy typically knows about
accesses by each client using it, information that is more sensitive
than the information held by a single origin server.

9.2.2 Logs and legal implications

Logs from proxies should be kept secure, since they provide
information about users and their patterns of behaviour. A proxy's
log is even more sensitive than a web server log, as every request
from the user population goes through the proxy. Logs from replica
origin servers may need to be amalgamated to get aggregated
statistics from a service, and transporting logs across borders may
have legal implications. Log handling is restricted by law in some
countries.

Requirements for object security and privacy are the same in a web
replication and caching system as they are in the Internet at large.
The only reliable solution is strong cryptography. End-to-end
encryption frequently makes resources uncacheable, as in the case of
SSL encrypted web sessions.

9.3 Service security

9.3.1 Denial of service

Any redirection of traffic is susceptible to denial of service
attacks at the redirect point, and both proxies and replica selection
services may redirect traffic.

By attacking a proxy, access to all servers may be denied for a large
set of clients.

It has been argued that introduction of an interception proxy is a
denial of service attack, since the end-to-end nature of the Internet
is destroyed without the content consumer's knowledge.

9.3.2 Replay attack

A caching proxy is by definition a replay attack.




9.3.3 Stupid configuration of proxies

It is quite easy to have a stupid configuration which will harm
service for content consumers. This is the most common security
problem with proxies.

9.3.4 Copyrighted transient copies

The legislative forces of the world are considering the question of
transient copies, like those kept in replication and caching systems,
being legal. The legal implications of replication and caching are
subject to local law.

Caching proxies need to preserve the protocol output, including
headers. Replication services need to preserve the source of the
objects.

9.3.5 Application level access

Caching proxies are application level components in the traffic flow
path, and may give intruders access to information that was
previously only available at the network level in a proxy-free world.
Some network level equipment may have required physical access to get
sensitive information. Introduction of application level components
may require additional system security.

10. Acknowledgements

The editors would like to thank the following for their assistance:
David Forster, Alex Rousskov, Josh Cohen, John Martin, John Dilley,
Ivan Lovric, Joe Touch, Henrik Nordstrom, Patrick McManus, Duane
Wessels, Wojtek Sylwestrzak, Ted Hardie, Misha Rabinovich, Larry
Masinter, Keith Moore, Roy Fielding, Patrik Faltstrom, Hilarie Orman,
Mark Nottingham and Oskar Batuner.



References

[1] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol --
HTTP/1.1", RFC 2616, June 1999.

[2] Wessels, D. and K. Claffy, "Internet Cache Protocol (ICP),
Version 2", RFC 2186, September 1997.

[3] Wessels, D. and K. Claffy, "Application of Internet Cache
Protocol (ICP), Version 2", RFC 2187, September 1997.





[4] Postel, J. and J. Reynolds, "File Transfer Protocol (FTP)", STD
9, RFC 959, October 1985.

[5] Anklesaria, F., McCahill, M., Lindner, P., Johnson, D., Torrey,
D. and B. Alberti, "The Internet Gopher Protocol", RFC 1436,
March 1993.

[6] Berners-Lee, T., Fielding, R. and H. Frystyk, "Hypertext
Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.

[7] Leech, M., Ganis, M., Lee, Y., Kuris, R., Koblas, D. and L.
Jones, "SOCKS Protocol Version 5", RFC 1928, March 1996.

[8] Brisco, T., "DNS Support for Load Balancing", RFC 1794, April
1995.

[9] Vixie, P. and D. Wessels, "Hyper Text Caching Protocol
(HTCP/0.0)", RFC 2756, January 2000.

[10] Fan, L., Cao, P., Almeida, J. and A. Broder, "Summary Cache: A
Scalable Wide-Area Web Cache Sharing Protocol", Proceedings of
ACM SIGCOMM'98 pp. 254-265, September 1998.

[11] Krawczyk, H., Bellare, M. and R. Canetti, "HMAC: Keyed-Hashing
for Message Authentication", RFC 2104, February 1997.

[12] Netscape, Inc., "Navigator Proxy Auto-Config File Format",
March 1996,
live.html>.

[13] Gauthier, P., Cohen, J., Dunsmuir, M. and C. Perkins, "The Web
Proxy Auto-Discovery Protocol", Work in Progress.

[14] Valloppillil, V. and K. Ross, "Cache Array Routing Protocol",
Work in Progress.

[15] Microsoft Corporation, "Cache Array Routing Protocol (CARP)
v1.0 Specifications, Technical Whitepaper", August 1999,
microsoft.com/Proxy/Guide/carpspec.asp>.

[16] Microsoft Corporation, "Cache Array Routing Protocol and
Microsoft Proxy Server 2.0, Technical White Paper",
1998,
microsoft.com/proxy/documents/CarpWP.exe>.

[17] Lovric, I., "Internet Cache Protocol Extension", Work in
Progress.



[18] Cieslak, M. and D. Forster, "Cisco Web Cache Coordination
Protocol V1.0", Work in Progress.

[19] Cieslak, M., Forster, D., Tiwana, G. and R. Wilson, "Cisco Web
Cache Coordination Protocol V2.0", Work in Progress.

[20] Goutard, C., Lovric, I. and E. Maschio-Esposito, "Pre-filling a
cache - A satellite overview", Work in Progress.

[21] Hamilton, M., Rousskov, A. and D. Wessels, "Cache Digest
specification - version 5", December 1998,
v5.txt>.

[22] Cerpa, A., Elson, J., Beheshti, H., Chankhunthod, A., Danzig,
P., Jalan, R., Neerdaels, C., Shroeder, T. and G. Tomlinson,
"NECP: The Network Element Control Protocol", Work in Progress.

[23] Cooper, I. and J. Dilley, "Known HTTP Proxy/Caching Problems",
Work in Progress.


Authors' Addresses

Ian Cooper
Equinix, Inc
2450 Bayshore
Mountain View, CA 94043


Phone: +1 650 316 6065
EMail: icooper@equinix.


Ingrid Melve

Tempeveien 22
Trondheim N-7465


Phone: +47 73 55 79 07
EMail: Ingrid.Melve@uninett.


Gary Tomlinson
CacheFlow Inc
12034 134th Ct. NE, Suite 201
Redmond, WA 98052


Phone: +1 425 820 3009
EMail: gary.tomlinson@cacheflow.

Full Copyright Statement

Copyright (C) The Internet Society (2001). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.



Acknowledgement

Funding for the RFC Editor function is currently provided by the
Internet Society.