4. Measuring the WWW.
Our strategy for viewing the morphogenetic process of a massive
neural
net may be applied to the WWW. That is indeed the main point of
this paper. But how to represent
the Web as a Net? There are clearly two necessary steps: to define
the nodes, and to measure the
connection strengths. For each of these steps there are many
possibilities. Here we describe only
one approach to each.
Nodes. The WWW is a tree consisting of domains, servers,
and pages. There are now tens of thousands of domains,
several servers in each domain, and many pages in each server.
Each domain
has a unique name (for example, vismath.org), each server has a
unique name (e.g., www.vismath.org) and IP address (e.g., 162.227.70.1),
and each page has a unique URL (e.g., http://www.vismath.org/index.html).
These are the main choices for nodes
of the WWW. For reasons of
size, mainly, let us regard domain names as the nodes of the Web.
We may further reduce the size
of the network to be visualized by considering only the suffixes
edu or org. Besides reducing to a
smaller number of nodes, we might anticipate that the domains in
the com class are relatively
sparsely connected, and thus less interesting from the mathematical
point of view.
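This reduction can be sketched in code: map each server hostname to its domain name, then keep only the edu and org suffixes. This is a minimal sketch; the hostnames and the two-label heuristic for extracting a domain name are illustrative assumptions, not part of the original text.

```python
def domain_of(hostname):
    """Map a server hostname to its domain name.

    Heuristic assumption: the domain name is the last two dot-separated
    labels (e.g., www.vismath.org -> vismath.org).
    """
    return ".".join(hostname.lower().split(".")[-2:])

def select_nodes(hostnames, suffixes=("edu", "org")):
    """Collect the distinct domains whose suffix is edu or org."""
    domains = {domain_of(h) for h in hostnames}
    return sorted(d for d in domains if d.rsplit(".", 1)[-1] in suffixes)

# Hypothetical server list; the com domain is filtered out.
servers = ["www.vismath.org", "math.berkeley.edu", "shop.example.com"]
print(select_nodes(servers))  # ['berkeley.edu', 'vismath.org']
```

The two-label heuristic fails for country-code suffixes such as ac.uk; a fuller implementation would consult a table of public suffixes.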
Connections. The interconnections of the WWW,
as a hypertext and hypermedia system, are
links. Links connect pages, but pages are secondary
to domains according to our choice above.
Thus, given two domains, that is, nodes, we must determine
all links from any page of the first
domain, to any page of the second domain. Then this simple
count should be normalized. That is,
regarding the number of all pages of all servers of the
first domain as a width, and all pages of all
servers of the second domain as a height, we obtain a
rectangle, the area of which (the product of
the two page counts) may be regarded as contributing to the
probability of a link. Thus, the connection
strength we are proposing here is the ratio of the
number of links to the product of the
width and the height. A more precise measure might take into
account the byte size of pages, or
equivalently, the total storage served by each domain.
However, this data is much more expensive
to obtain.
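The proposed connection strength can be stated compactly in code: the raw link count between two domains, divided by the area of the rectangle whose width and height are the two page counts. A minimal sketch follows; the function names and all numbers in the example are invented for illustration.

```python
def connection_strength(link_count, pages_i, pages_j):
    """Ratio of observed links to the rectangle area pages_i * pages_j."""
    return link_count / (pages_i * pages_j)

def strength_matrix(links, pages):
    """Normalized strengths for every ordered pair of domains.

    links: dict mapping (source, target) domain pairs to raw link counts.
    pages: dict mapping each domain to its total page count.
    """
    return {(a, b): n / (pages[a] * pages[b]) for (a, b), n in links.items()}

# Hypothetical data: 12 links from a 200-page domain to a 50-page domain.
print(connection_strength(12, 200, 50))  # 0.0012
```

Note that the measure is asymmetric in general (links from A to B need not equal links from B to A), so the resulting connection matrix is not symmetric.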
In any case, the data to construct the massive connection
matrix for the entire WWW is to be collected
by a Web crawler or robot, not just once, but repeatedly,
according to our larger plan. And
fortunately for this program, a number of Web crawlers are
already at work collecting links for
indices of the WWW. This is to be the basis for further work
in this project.