2008/05/14

The Labs.Com System Lab Clustering
Last update 2001/06/25
The Labs - Design & Functionality For The Net

UNIX Clustering - Massive Networking

  1. Introduction
  2. Distributed Computation
  3. Distributed File Systems
  4. Clustering Resources
  5. PeerToPeer (P2P)
Clustering
1. Introduction
Will I Dream? - 2001 Odyssey

Clustering machines is a way to increase performance: tasks are parallelized and distributed over a cluster of machines.
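As a minimal sketch of that idea, the following Python splits one task into chunks, hands each chunk to a worker and combines the partial results. It assumes a local thread pool as a stand-in for the machines of a cluster, and the names `split`, `partial_sum` and `distributed_sum` are hypothetical, not taken from any of the systems listed on this page. The point is the split / distribute / combine structure, not actual speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split(numbers, parts):
    """Cut a list into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(numbers), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks get one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(numbers[start:end])
        start = end
    return chunks


def partial_sum(chunk):
    """The work done by one 'node': sum the squares of its chunk."""
    return sum(x * x for x in chunk)


def distributed_sum(numbers, nodes=4):
    """Distribute chunks over `nodes` workers and combine the partial results."""
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        return sum(pool.map(partial_sum, split(numbers, nodes)))
```

On a real cluster the thread pool would be replaced by a message-passing layer such as PVM or MPI, but the shape of the program stays the same.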

2. Distributed Computation

PVM
PVM (Parallel Virtual Machine) is a software system that enables a collection of heterogeneous computers to be used as a coherent and flexible concurrent computational resource.
MPI
MPI is the standard for multicomputer and cluster message passing introduced by the Message-Passing Interface Forum in April 1994.
LAM/MPI
LAM (Local Area Multicomputer) is an MPI programming environment and development system for heterogeneous computers on a network. With LAM, a dedicated cluster or an existing network computing infrastructure can act as one parallel computer solving one problem.
GridWare by Sun
Solaris/Linux distributed computation
OpenDirectory: Parallel Computing
Google Web-Directory

3. Distributed File Systems

PVFS
Parallel Virtual File System (PVFS) Project for Linux
GFS
Global File System Project for Linux
AFS
IBM's open-source contribution (AIX, Linux, Solaris, NT)
ARLA
Free reimplementation of AFS (*BSD, Linux, Solaris, Darwin)
CODA File System
Another approach to a distributed file system (Linux, FreeBSD, NetBSD, Windows 95/NT)
WebOS
WebOS provides OS services to wide-area applications: resource discovery, a global namespace, remote process execution etc.
NFS
Linux NFS How-To (NFS runs on most UNIXes: see man nfs or man nfsd)
SGI: DCE/DFS
Distributed Computing Environment/Filesystem (SGI only)

4. Clustering Resources

BeoWulf.Org
The site to start with
BeoWulf Underground
Cluster news-site

5. PeerToPeer (P2P)

Introduction
 
P2P ("Peer to Peer") has become another hype. Napster, GNUtella and other approaches brought it exposure, yet there is more to it than freely trading MP3s and p0rn. P2P is certainly a better use of the Internet than more passive approaches such as the WWW, because it works with the inherent potential of the Net:
A node does not just passively download fancy webpages; each node becomes an active, fully sharing participant in a broad web or network that implements a higher functionality than any single node could provide.
In a way, P2P is a step toward an organic use of the Internet: nodes act like molecules that receive and transmit information within a body (the Internet).

Resources
 
OpenP2P.Com
Another O'Reilly site, highly recommended
JXTA.Org
Sun's effort to establish a P2P programming platform
FreenetProject.org
Anonymous and distributed storage-system (platform)
Espra.Net
Retrieval system using Freenet
Everything Over Freenet
Groundwork for Freenet

Future Applications
Few P2P applications have been developed so far, yet many are possible wherever the primary focus is massive distribution and no central control is required:
  • massive on-going computation, e.g. sharing program code that performs a computation and spreads its results to other nodes, which take them as input to compute further.
  • massive distribution of code, such as an operating system (e.g. a new Linux distribution); checksums are very important here, otherwise hacked versions spread.
  • ...
As mentioned, authentic data is crucial for serious P2P applications. This means data must be signed in order to be authentic; PGP or similar two-key (public-key) approaches are recommended.
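The checksum part of this can be sketched in a few lines of Python. This is only an illustration under assumptions: SHA-256 is chosen here as the digest (the text does not name one), the function names are hypothetical, and the published digest is assumed to reach the receiver over a channel it already trusts. A checksum alone shows that the data was not altered; it does not show who published it. That second step needs a signature (PGP or similar), which is not shown here.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def is_untampered(data: bytes, published_digest: str) -> bool:
    """Compare the digest of received data with the digest the author published."""
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(sha256_hex(data), published_digest.lower())
```

A node would call `is_untampered` on every received file and refuse to pass on anything that fails the check, so that a hacked copy stops spreading at the first honest node.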

One personal idea of mine was a small piece of server code that receives code and also executes it, the "P2P-Cell":

  • broad platform support (Linux, *BSD, WinXX)
  • running in a secured environment with limited resources (memory, disk space, ports, bandwidth), but otherwise able to do anything
This would allow not just data but also code to be distributed, especially code that runs on every platform. I really dislike Java, which just doesn't convince me, so it would not be Java itself but some kind of binary format that is platform independent and fast enough for real-life applications, e.g. an interpreter/compiler hybrid à la Java.

The result would be a massive distributed machine to which everybody could theoretically submit tasks, which are executed within the limits each node has set. Of course, malicious code could target a single node and spam it with requests or connections. To avoid this, several precautions would have to be taken:

  • every distributed code base must be authenticated; no anonymous code is executed (except on nodes that specifically allow it),
  • every piece of code must be tied to a user or user group; e.g. a group computing 10 billion digits of pi is responsible for the code it submits to the net,
  • users could then block a certain user or user group from connecting or from executing code.
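These precautions amount to an admission check that each node runs before executing anything. The Python below is a rough sketch of such a check, not a design from the original idea: `Submission` and `NodePolicy` are hypothetical names, and `signature_valid` is assumed to be the result of a signature verification done elsewhere.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Submission:
    code: str
    user: Optional[str]      # responsible user, None if anonymous
    group: Optional[str]     # responsible user group, if any
    signature_valid: bool    # assumed outcome of an external signature check


@dataclass
class NodePolicy:
    allow_anonymous: bool = False
    blocked_users: set = field(default_factory=set)
    blocked_groups: set = field(default_factory=set)

    def accepts(self, sub: Submission) -> bool:
        """Decide whether this node is willing to execute the submission."""
        # Code without a user or without a valid signature counts as anonymous.
        if sub.user is None or not sub.signature_valid:
            return self.allow_anonymous
        if sub.user in self.blocked_users:
            return False
        if sub.group is not None and sub.group in self.blocked_groups:
            return False
        return True
```

Each node keeps its own policy, so blocking stays a local decision and needs no central authority.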
In short, the following aspects would have to be considered for this "P2P-Cell":
  • signing code
  • authenticating users/nodes
  • a server code base able to restrict resources (memory, disk space, bandwidth, execution time)
  • the ability to get feedback on the distribution of data/code (e.g. knowing how many nodes actually execute the code or have received the data)
As a result, a basis for advanced P2P applications could be built without people having to upgrade the code base all the time.
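Of the aspects listed above, the execution-time limit is the easiest to illustrate. The following Python sketch runs received code in a separate interpreter process and kills it when a time budget is exceeded. The function name `run_limited` is hypothetical, and this is explicitly not a sandbox: memory, disk space, ports and bandwidth are not restricted here, and the code still runs with the privileges of the node's user.

```python
import subprocess
import sys


def run_limited(code: str, max_seconds: float):
    """Run `code` in a separate Python process, killing it after `max_seconds`.

    Returns a (status, stdout) pair where status is "ok", "error" or "timeout".
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=max_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return ("timeout", "")
    status = "ok" if result.returncode == 0 else "error"
    return (status, result.stdout)
```

A real node would combine such a limit with operating-system facilities for the other resources, which differ between Linux, *BSD and Windows.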

Reflection
 
With distributed file sharing, the predominant application of P2P as of 2001, the question of free sharing and the economy arises: how is the creator of freely shared content compensated for the work? Right now money, as the common denominator, is the system for measuring contribution to society, at least in theory, or at least it was meant to be. In a time when the stock market and belief, or better said expectations, rule the market, the solid values of the economy are virtualized, and P2P file sharing is the extreme case: one person shares a piece of art (picture, sound, movie) and it is potentially available to everybody without a central point of sharing, except for the first act of sharing.

Organic Life: No Single Point of Failure - or How Complexity is Preserved - How Redundancy is Used in Order to Preserve Uniqueness

When one looks at the way our DNA is shared, each cell contains the DNA like something most precious, yet the DNA is unique in itself. The redundancy is used so that uniqueness and integrity are maintained. Seen this way, P2P is just the ability of each molecule (node) to transmit or receive certain data; the data could itself be a program, for example.
Existing approaches such as FTP, WWW and email already cover the functionality the Net provides, yet P2P focuses on the equality of all nodes: each node is server and client at the same time. There is no central point of failure, which is very useful for preventing data loss; in other words, this is a way to store precious information.
P2P seems to me to speed up what the Internet already provided: we download content and programs, use them and share them further. P2P applications now speed up this cycle, and the cycle also determines what kind of data is preserved (which would be an interesting sociological question to look at).

Nature has implemented a complex strategy of recognition procedures to tell which cells contain malicious data (a virus) and which don't. Cancer and viral infections like AIDS show what happens when the cells themselves are manipulated and the overall integrity of the body is called into question: it can cause the death of the host. Now, the overall intention or occurrence of such an infection is part of the host's direction; that is, the cells reflect the intention of a higher order of will. In the same way one could reflect that a higher order of consciousness directs the parts to connect to each other so as to implement a functionality beyond the comprehension of the parts themselves. Just as cells organize in a higher order to compose organs and finally an entire human body, all machines connecting together, and enhanced to potentially connect to each other, implement a framework in which a higher order of intellectual potential, or consciousness, can be hosted . . .

                                                                                                                                   


All Rights Reserved - (C) 1997 - 2008 by The Labs.Com
