[Title] On Information Management [Original title] Information Management: A Proposal
[Author] Tim Berners-Lee
[Note] In March 1989, Tim Berners-Lee submitted a proposal at CERN for an information management system.
This proposal concerns the management of general information about accelerators and experiments at CERN. It discusses the problems of loss of information about complex evolving systems and derives a solution based on a distributed hypertext system.
Overview
Many of the discussions of the future at CERN and the LHC era end with the question - “Yes, but how will we ever keep track of such a large project?” This proposal provides an answer to such questions. Firstly, it discusses the problem of information access at CERN. Then, it introduces the idea of linked information systems, and compares them with less flexible ways of finding information.
It then summarises my short experience with non-linear text systems known as “hypertext”, describes what CERN needs from such a system, and what industry may provide. Finally, it suggests steps we should take to involve ourselves with hypertext now, so that individually and collectively we may understand what we are creating.
Losing Information at CERN
CERN is a wonderful organisation. It involves several thousand people, many of them very creative, all working toward common goals. Although they are nominally organised into a hierarchical management structure, this does not constrain the way people will communicate, and share information, equipment and software across groups.
The actual observed working structure of the organisation is a multiply connected “web” whose interconnections evolve with time. In this environment, a new person arriving, or someone taking on a new task, is normally given a few hints as to who would be useful people to talk to. Information about what facilities exist and how to find out about them travels in the corridor gossip and occasional newsletters, and the details about what is required to be done spread in a similar way. All things considered, the result is remarkably successful, despite occasional misunderstandings and duplicated effort.
A problem, however, is the high turnover of people. When two years is a typical length of stay, information is constantly being lost. The introduction of the new people demands a fair amount of their time and that of others before they have any idea of what goes on. The technical details of past projects are sometimes lost forever, or only recovered after a detective investigation in an emergency. Often, the information has been recorded; it just cannot be found.
If a CERN experiment were a static once-only development, all the information could be written in a big book. As it is, CERN is constantly changing as new ideas are produced, as new technology becomes available, and in order to get around unforeseen technical problems. When a change is necessary, it normally affects only a small part of the organisation. A local reason arises for changing a part of the experiment or detector. At this point, one has to dig around to find out what other parts and people will be affected. Keeping a book up to date becomes impractical, and the structure of the book needs to be constantly revised.
The sort of information we are discussing answers, for example, questions like
• Where is this module used?
• Who wrote this code? Where does he work?
• What documents exist about that concept?
• Which laboratories are included in that project?
• Which systems depend on this device?
• What documents refer to this one?
The problems of information loss may be particularly acute at CERN, but in this case (as in certain others), CERN is a model in miniature of the rest of the world in a few years' time. CERN meets now some problems which the rest of the world will have to face soon. In 10 years, there may be many commercial solutions to the problems above, while today we need something to allow us to continue [1].
Linked information systems
In providing a system for manipulating this sort of information, the hope would be to allow a pool of information to develop which could grow and evolve with the organisation and the projects it describes. For this to be possible,
the method of storage must not place its own restraints on the information.
This is why a "web" of notes with links (like references) between them is far more useful than a fixed hierarchical system. When describing a complex system, many people resort to diagrams with circles and arrows. Circles and arrows leave one free to describe the interrelationships between things in a way that tables, for example, do not. The system we need is like a diagram of circles and arrows, where circles and arrows can stand for anything.
We can call the circles nodes, and the arrows links. Suppose each node is like a small note, summary article, or comment. I'm not over concerned here with whether it has text or graphics or both. Ideally, it represents or describes one particular person or object. Examples of nodes can be
• People
• Software modules
• Groups of people
• Projects
• Concepts
• Documents
• Types of hardware
• Specific hardware objects
The arrows which link circle A to circle B can mean, for example, that A...
• depends on B
• is part of B
• made B
• refers to B
• uses B
• is an example of B
These circles and arrows, nodes and links [2], have different significance in various sorts of conventional diagrams.
The system must allow any sort of information to be entered. Another person must be able to find the information, sometimes without knowing what he is looking for.
In practice, it is useful for the system to be aware of the generic types of the links between items (dependences, for example), and the types of nodes (people, things, documents...) without imposing any limitations.
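A minimal sketch of such a web of typed nodes and typed links, written in Python purely for illustration. The class names `Node`, `Link` and `Web` and the sample entries are assumptions made here, not part of the proposal. Node and link types are plain strings, so the system can be aware of them without limiting which types may exist.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # generic type, e.g. "person", "module", "document"


@dataclass(frozen=True)
class Link:
    source: Node
    kind: str  # generic type, e.g. "depends on", "made", "refers to"
    target: Node


@dataclass
class Web:
    links: list = field(default_factory=list)

    def add(self, source, kind, target):
        self.links.append(Link(source, kind, target))

    def links_from(self, node, kind=None):
        """Targets of links leaving `node`, optionally of one link type only."""
        return [link.target for link in self.links
                if link.source == node and (kind is None or link.kind == kind)]

    def links_to(self, node, kind=None):
        """Sources of links arriving at `node`, optionally of one link type only."""
        return [link.source for link in self.links
                if link.target == node and (kind is None or link.kind == kind)]


# Hypothetical sample data.
alice = Node("Alice", "person")
rpc = Node("RPC library", "module")
guide = Node("RPC user guide", "document")
daq = Node("Data acquisition", "project")

web = Web()
web.add(alice, "made", rpc)
web.add(guide, "refers to", rpc)
web.add(daq, "uses", rpc)
```

With this data, `web.links_to(rpc, "made")` answers "Who wrote this code?" and `web.links_to(rpc)` answers "Where is this module used or mentioned?".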
The problem with trees
Many systems are organised hierarchically. The CERNDOC documentation system is an example, as is the Unix file system, and the VMS/HELP system. A tree has the practical advantage of giving every node a unique name. However, it does not allow the system to model the real world. For example, in a hierarchical HELP system such as VMS/HELP, one often gets to a leaf on a tree such as
HELP COMPILER SOURCE_FORMAT PRAGMAS DEFAULTS
only to find a reference to another leaf: "Please see
HELP COMPILER COMMAND OPTIONS DEFAULTS PRAGMAS"
and it is necessary to leave the system and re-enter it. What was needed was a link from one node to another, because in this case the information was not naturally organised into a tree.
Another example of a tree-structured system is the uucp News system (try 'rn' under Unix). This is a hierarchical system of discussions ("newsgroups") each containing articles contributed by many people. It is a very useful method of pooling expertise, but suffers from the inflexibility of a tree. Typically, a discussion under one newsgroup will develop into a different topic, at which point it ought to be in a different part of the tree. (See Fig 1).
Fig 1. An article in the UUCP News scheme.
The Subject field allows notes on the same topic to be linked together within a "newsgroup". The name of the newsgroup (alt.hypertext) is a hierarchical name. This particular note expresses a problem with the strict tree structure of the scheme: this discussion is related to several areas. Note that the "References", "From" and "Subject" fields can all be used to generate links.
The problem with keywords
Keywords are a common method of accessing data for which one does not have the exact coordinates. The usual problem with keywords, however, is that two people never choose the same keywords. The keywords then become useful only to people who already know the application well.
Practical keyword systems (such as that of VAX/NOTES for example) require keywords to be registered. This is already a step in the right direction.
A linked system takes this to the next logical step. Keywords can be nodes which stand for a concept. A keyword node is then no different from any other node. One can link documents, etc., to keywords. One can then find keywords by finding any node to which they are related. In this way, documents on similar topics are indirectly linked, through their key concepts.
A keyword search then becomes a search starting from a small number of named nodes, and finding nodes which are close to all of them.
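One way to read "close to all of them" is sketched below in Python. This is an illustration under stated assumptions, not a method given in the proposal: closeness is taken to be the number of links on the shortest path, links are followed in both directions, and the function names, the `max_distance` cut-off and the sample data are all hypothetical.

```python
from collections import deque


def distances(graph, start):
    """Breadth-first search: number of links from `start` to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def keyword_search(graph, keywords, max_distance=2):
    """Nodes within `max_distance` links of every keyword node, nearest first."""
    per_keyword = [distances(graph, k) for k in keywords]
    candidates = set(graph) - set(keywords)
    scored = []
    for node in candidates:
        if all(node in d and d[node] <= max_distance for d in per_keyword):
            scored.append((sum(d[node] for d in per_keyword), node))
    return [node for _, node in sorted(scored)]


def undirected(pairs):
    """Build an adjacency map in which every link can be followed both ways."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


# Hypothetical documents linked to keyword nodes.
graph = undirected([
    ("doc: RPC guide", "kw: networking"),
    ("doc: RPC guide", "kw: VMS"),
    ("doc: Ethernet notes", "kw: networking"),
    ("doc: VMS primer", "kw: VMS"),
])
```

Searching from the two keyword nodes with the default cut-off finds only the document linked to both; widening `max_distance` also brings in documents that are related to one keyword only indirectly.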
It was for these reasons that I first made a small linked information system, not realising that a term had already been coined for the idea: “hypertext”.
A solution: Hypertext
Personal Experience with Hypertext
In 1980, I wrote a program for keeping track of software with which I was involved in the PS control system. Called Enquire, it allowed one to store snippets of information, and to link related pieces together in any way. To find information, one progressed via the links from one sheet to another, rather like in the old computer game "adventure". I used this for my personal record of people and modules. It was similar to the application Hypercard produced more recently by Apple for the Macintosh. A difference was that Enquire, although lacking the fancy graphics, ran on a multiuser system, and allowed many people to access the same data.
Fig 2. A screen in an Enquire scheme
This example is basically a list, so the list of links is more important than the text on the node itself. Note that each link has a type ("includes" for example) and may also have a comment associated with it. (The bottom line is a menu bar.)
Soon after my re-arrival at CERN in the DD division, I found that the environment was similar to that in PS, and I missed Enquire. I therefore produced a version for the VMS, and have used it to keep track of projects, people, groups, experiments, software modules and hardware devices with which I have worked. I have found it personally very useful. I have made no effort to make it suitable for general consumption, but have found that a few people have successfully used it to browse through the projects and find out all sorts of things of their own accord.
Hot spots
Meanwhile, several programs have been made exploring these ideas, both commercially and academically. Most of them use "hot spots" in documents, like icons, or highlighted phrases, as sensitive areas. Touching a hot spot with a mouse brings up the relevant information, or expands the text on the screen to include it. Imagine, then, the references in this document, all being associated with the network address of the thing to which they referred, so that while reading this document you could skip to them with a click of the mouse.
"Hypertext" is a term coined in the 1960s by Ted Nelson [...], which has become popular for these systems, although it is used to embrace two different ideas. One idea (which is relevant to this problem) is the concept:
“Hypertext”: Human-readable information linked together in an unconstrained way.
The other idea, which is independent and largely a question of technology and time, is of multimedia documents which include graphics, speech and video. I will not discuss this latter aspect further here, although I will use the word "Hypermedia" to indicate that one is not bound to text.
It has been difficult to assess the effect of a large hypermedia system on an organisation, often because these systems never had seriously large-scale use. For this reason, we require that large amounts of existing information should be accessible using any new information management system.
CERN Requirements
To be a practical system in the CERN environment, there are a number of clear practical requirements.
Remote access across networks.
CERN is distributed, and access from remote machines is essential.
Heterogeneity
Access is required to the same data from different types of system (VM/CMS, Macintosh, VAX/VMS, Unix).
Non-Centralisation
Information systems start small and grow. They also start isolated and then merge. A new system must allow existing systems to be linked together without requiring any central control or coordination.
Access to existing data
If we provide access to existing databases as though they were in hypertext form, the system will get off the ground quicker. This is discussed further below.
Private links
One must be able to add one's own private links to and from public information. One must also be able to annotate links, as well as nodes, privately.
Bells and Whistles
Storage of ASCII text, and display on 24x80 screens, is in the short term sufficient, and essential. Addition of graphics would be an optional extra with very much less penetration for the moment.
Data analysis
An intriguing possibility, given a large hypertext database with typed links, is that it allows some degree of automatic analysis. It is possible to search, for example, for anomalies such as undocumented software or divisions which contain no people. It is possible to generate lists of people or devices for other purposes, such as mailing lists of people to be informed of changes.
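The kinds of query mentioned here can be sketched in a few lines of Python. This is only an illustration: the representation (a dictionary of node types and a list of `(source, link type, target)` triples), the link type names and the sample data are all assumptions, not something the proposal specifies.

```python
def undocumented_modules(nodes, links):
    """Software modules that no document 'describes'."""
    documented = {target for source, kind, target in links
                  if kind == "describes" and nodes.get(source) == "document"}
    return sorted(name for name, node_type in nodes.items()
                  if node_type == "module" and name not in documented)


def empty_divisions(nodes, links):
    """Divisions that no person is a 'member of'."""
    populated = {target for source, kind, target in links
                 if kind == "member of" and nodes.get(source) == "person"}
    return sorted(name for name, node_type in nodes.items()
                  if node_type == "division" and name not in populated)


def mailing_list(nodes, links, device):
    """People to inform of changes: anyone who 'uses' or 'maintains' the device."""
    return sorted(source for source, kind, target in links
                  if target == device and kind in ("uses", "maintains")
                  and nodes.get(source) == "person")


# Hypothetical sample data: node name -> node type, then typed links.
nodes = {
    "Alice": "person", "Bob": "person",
    "Division A": "division", "Division B": "division",
    "rpc": "module", "fitter": "module",
    "RPC guide": "document", "crate 7": "device",
}
links = [
    ("Alice", "member of", "Division A"),
    ("Bob", "member of", "Division A"),
    ("RPC guide", "describes", "rpc"),
    ("Alice", "maintains", "crate 7"),
    ("Bob", "uses", "crate 7"),
]
```

Each query is a filter over the typed links, which is what makes this sort of analysis cheap once the link types are known to the system.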
It is also possible to look at the topology of an organisation or a project, and draw conclusions about how it should be managed, and how it could evolve. This is particularly useful when the database becomes very large, and groups of projects, for example, so interwoven as to make it difficult to see the wood for the trees.
In a complex place like CERN, it's not always obvious how to divide people into groups. Imagine making a large three-dimensional model, with people represented by little spheres, and strings between people who have something in common at work.
Now imagine picking up the structure and shaking it, until you make some sense of the tangle: perhaps, you see tightly knit groups in some places, and in some places weak areas of communication spanned by only a few people. Perhaps a linked information system will allow us to see the real structure of the organisation in which we work.
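One simple reading of "weak areas of communication spanned by only a few people" is sketched below in Python: a person counts as spanning a weak area if removing them would split the web of working relationships into more separate groups. This interpretation, the function names and the sample data are assumptions made for illustration; the brute-force approach is adequate only for small webs.

```python
def build(ties):
    """Adjacency map for 'has something in common at work' ties (symmetric)."""
    graph = {}
    for a, b in ties:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def components(graph, removed=None):
    """Connected groups of people, ignoring the person `removed` (if any)."""
    unvisited = set(graph) - {removed}
    groups = []
    while unvisited:
        stack = [unvisited.pop()]
        group = set(stack)
        while stack:
            person = stack.pop()
            for other in graph[person]:
                if other != removed and other not in group:
                    group.add(other)
                    stack.append(other)
        unvisited -= group
        groups.append(group)
    return groups


def bridging_people(graph):
    """People whose removal would split the web into more groups."""
    baseline = len(components(graph))
    return sorted(p for p in graph
                  if len(components(graph, removed=p)) > baseline)


# Hypothetical web: two tightly knit trios joined only through Cat and Dan.
graph = build([
    ("Ann", "Bob"), ("Bob", "Cat"), ("Ann", "Cat"),
    ("Cat", "Dan"),
    ("Dan", "Eve"), ("Eve", "Fay"), ("Dan", "Fay"),
])
```

Here `bridging_people(graph)` picks out Cat and Dan, the only two people connecting the trios.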
Live links
The data to which a link (or a hot spot) refers may be very static, or it may be temporary. In many cases at CERN information about the state of systems is changing all the time. Hypertext allows documents to be linked into "live" data so that every time the link is followed, the information is retrieved. If one sacrifices portability, it is possible to make following a link fire up a special application, so that diagnostic programs, for example, could be linked directly into the maintenance guide.
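The difference between a static link and a live one can be sketched as follows. This is a minimal Python illustration under assumptions: the class names are hypothetical, and `read_magnet_current` is an invented stand-in for whatever query a real control system would offer.

```python
class StaticLink:
    """A link to stored text: following it always returns the same content."""

    def __init__(self, text):
        self.text = text

    def follow(self):
        return self.text


class LiveLink:
    """A link to a function: following it fetches the data afresh each time."""

    def __init__(self, fetch):
        self.fetch = fetch

    def follow(self):
        return self.fetch()


# Hypothetical stand-in for a query to a running system.
_readings = iter([4.2, 4.3, 4.1])


def read_magnet_current():
    return f"magnet current: {next(_readings)} A"


# A maintenance guide whose "status" link is live and whose "overview" is not.
guide = {
    "overview": StaticLink("How to maintain the magnet power supply."),
    "status": LiveLink(read_magnet_current),
}
```

Following `guide["status"]` twice gives two different readings, while `guide["overview"]` never changes. Both kinds of link present the same `follow()` operation to the reader's program.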
Non requirements
Discussions on Hypertext have sometimes tackled the problem of copyright enforcement and data security. These are of secondary importance at CERN, where information exchange is still more important than secrecy. Authorisation and accounting systems for hypertext could conceivably be designed which are very sophisticated, but they are not proposed here.
In cases where reference must be made to data which is in fact protected, existing file protection systems should be sufficient.
Specific Applications
The following are three examples of specific places in which the proposed system would be immediately useful. There are many others.
Development Project Documentation.
The Remote Procedure Call project has a skeleton description using Enquire. Although limited, it is very useful for recording who did what, where they are, what documents exist, etc. Also, one can keep track of users, and can easily append any extra little bits of information which come to hand and have nowhere else to be put. Cross-links to other projects, and to databases which contain information on people and documents would be very useful, and save duplication of information.
Document retrieval.
The CERNDOC system provides the mechanics of storing and printing documents. A linked system would allow one to browse through concepts, documents, systems and authors, also allowing references between documents to be stored. (Once a document had been found, the existing machinery could be invoked to print it or display it).
The "Personal Skills Inventory".
Personal skills and experience are just the sort of thing which need hypertext flexibility. People can be linked to projects they have worked on, which in turn can be linked to particular machines, programming languages, etc.
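Following links in two steps, from people to projects and from projects to machines and languages, can be sketched like this. The function names, the dictionary representation and the sample entries are assumptions made for illustration only.

```python
def skills_of(person, worked_on, involves):
    """Skills reachable from a person via project links, with the projects as evidence."""
    skills = {}
    for project in worked_on.get(person, []):
        for skill in involves.get(project, []):
            skills.setdefault(skill, []).append(project)
    return skills


def people_with(skill, worked_on, involves):
    """People linked to at least one project that involves `skill`."""
    return sorted(person for person in worked_on
                  if skill in skills_of(person, worked_on, involves))


# Hypothetical links: person -> projects, project -> machines and languages.
worked_on = {"Alice": ["Project X", "Project Y"], "Bob": ["Project Y"]}
involves = {"Project X": ["VAX/VMS", "C"], "Project Y": ["VAX/VMS", "Fortran"]}
```

Nobody has to declare a skill explicitly: `people_with("C", worked_on, involves)` is derived entirely from links that were recorded for other reasons.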
The State of the Art in Hypermedia
An increasing amount of work is being done into hypermedia research at universities and commercial research labs, and some commercial systems have resulted. There have been two conferences, Hypertext '87 and '88, and in Washington DC, the National Institute of Standards and Technology (NIST) hosted a workshop on standardisation in hypertext, a followup of which will occur during 1990.
The Communications of the ACM special issue on Hypertext contains many references to hypertext papers. A bibliography on hypertext is given in [NIST90], and a uucp newsgroup alt.hypertext exists. I do not, therefore, give a list here.
Browsing techniques
Much of the academic research is into the human interface side of browsing through a complex information space. Problems addressed are those of making navigation easy, and avoiding a feeling of being "lost in hyperspace". Whilst the results of the research are interesting, many users at CERN will be accessing the system using primitive terminals, and so advanced window styles are not so important for us now.
Interconnection or publication?
Most systems available today use a single database. This is accessed by many users by using a distributed file system. There are few products which take Ted Nelson's idea of a wide "docuverse" literally by allowing links between nodes in different databases. In order to do this, some standardisation would be necessary. However, at the standardisation workshop, the emphasis was on standardisation of the format for exchangeable media, not for networking. This is prompted by the strong push toward publishing of hypermedia information, for example on optical disk. There seems to be a general consensus about the abstract data model which a hypertext system should use.
Many systems have been put together with little or no regard for portability, unfortunately. Some others, although published, are proprietary software which is not for external release. However, there are several interesting projects and more are appearing all the time. Digital's "Compound Document Architecture" (CDA), for example, is a data model which may be extendible into a hypermedia model, and there are rumours that this is a way Digital would like to go.
Incentives and CALS
The US Department of Defence has given a big incentive to hypermedia research by, in effect, specifying hypermedia documentation for future procurement. This means that all manuals for parts for defence equipment must be provided in hypermedia form. The acronym CALS stands for “Computer-aided Acquisition and Logistic Support”.
There is also much support from the publishing industry, and from librarians whose job it is to organise information.
What will the system look like?
Let us see what components a hypertext system at CERN must have.
The only way in which sufficient flexibility can be incorporated is to separate the information storage software from the information display software, with a well defined interface between them. Given the requirement for network access, it is natural to let this clean interface coincide with the physical division between the user and the remote database machine [3].
This division also is important in order to allow the heterogeneity which is required at CERN (and would be a boon for the world in general).
Fig 2. A client/server model for a distributed hypertext system.
Therefore, an important phase in the design of the system is to define this interface. After that, the development of various forms of display program and of database server can proceed in parallel. This will have been done well if many different information sources, past, present and future, can be mapped onto the definition, and if many different human interface programs can be written over the years to take advantage of new technology and standards.
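The separation of storage from display can be sketched as one small interface with independent implementations on either side. The sketch below is in Python and is an assumption throughout: the proposal does not define the interface, so the name `HypertextServer`, the single `get_node` operation and its return shape are hypothetical, and the network transport between the two sides is left out.

```python
from abc import ABC, abstractmethod


class HypertextServer(ABC):
    """The interface every information source would offer to every browser."""

    @abstractmethod
    def get_node(self, name):
        """Return (text, links), where links is a list of (link_type, target_name)."""


class InMemoryServer(HypertextServer):
    """One possible server: nodes held in a dictionary."""

    def __init__(self, nodes):
        self.nodes = nodes

    def get_node(self, name):
        return self.nodes[name]


def render_for_terminal(server, name, width=80):
    """One possible browser: plain text for an 80-column terminal, links as a menu."""
    text, links = server.get_node(name)
    lines = [name.upper()[:width], text[:width]]
    for number, (link_type, target) in enumerate(links, start=1):
        lines.append(f"  {number}) {link_type}: {target}"[:width])
    return "\n".join(lines)


# Hypothetical sample node.
server = InMemoryServer({
    "Enquire": ("A program for linking notes.",
                [("made by", "Tim"), ("runs on", "VMS")]),
})
```

Because `render_for_terminal` depends only on `get_node`, a graphical browser or a different kind of server could be written later without changing the other side.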
Accessing Existing Data
The system must achieve a critical usefulness early on. Existing hypertext systems have had to justify themselves solely on new data. If, however, there was an existing base of data of personnel, for example, to which new data could be linked, the value of each new piece of data would be greater.
What is required is a gateway program which will map an existing structure onto the hypertext model, and allow limited (perhaps read-only) access to it. This takes the form of a hypertext server written to provide existing information in a form matching the standard interface. One would not imagine the server actually generating a hypertext database from an existing one: rather, it would generate a hypertext view of an existing database.
Fig 3. A hypertext gateway allows existing data to be seen in hypertext form by a hypertext browser.
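The distinction between generating a hypertext database and generating a hypertext view can be sketched as follows. In this Python illustration the personnel table, its field names and the `get_node` operation are all assumptions. The gateway stores no nodes of its own and computes each one from the existing data when asked.

```python
# Hypothetical existing data: a personnel table that knows nothing of hypertext.
PERSONNEL = [
    {"name": "Alice", "group": "Controls", "building": "31"},
    {"name": "Bob", "group": "Controls", "building": "513"},
]


class PersonnelGateway:
    """Read-only hypertext view of the table; no hypertext database is built or stored."""

    def __init__(self, rows):
        self.rows = rows

    def get_node(self, name):
        """Return (text, links) for a person or a group, computed at request time."""
        for row in self.rows:
            if row["name"] == name:
                text = f"{name}, building {row['building']}"
                return text, [("member of", row["group"])]
        members = [row["name"] for row in self.rows if row["group"] == name]
        if members:
            return f"Group {name}", [("includes", m) for m in members]
        raise KeyError(name)


gateway = PersonnelGateway(PERSONNEL)
```

Since nothing is copied, a row added to the table by its usual maintainers appears in the hypertext view on the next request.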
Some examples of systems which could be connected in this way are
uucp News: This is a Unix electronic conferencing system. A server for uucp news could make links between notes on the same subject, as well as showing the structure of the conferences.
VAX/Notes: This is Digital's electronic conferencing system. It has a fairly wide following in FermiLab, but much less in CERN. The topology of a conference is quite restricting.
CERNDOC: This is a document registration and distribution system running on CERN's VM machine. As well as documents, categories and projects, keywords and authors lend themselves to representation as hypertext nodes.
File systems: This would allow any file to be linked to from other hypertext documents.
The Telephone Book: Even this could be viewed as hypertext, with links between people and sections, sections and groups, people and floors of buildings, etc.
The unix manual: This is a large body of computer-readable text, currently organised in a flat way, but which also contains link information in a standard format ("See also..").
Databases: A generic tool could perhaps be made to allow any database which uses a commercial DBMS to be displayed as a hypertext view.
In some cases, writing these servers would mean unscrambling or obtaining details of the existing protocols and/or file formats. It may not be practical to provide the full functionality of the original system through hypertext. In general, it will be more important to allow read access to the general public: it may be that there is a limited number of people who are providing the information, and that they are content to use the existing facilities.
It is sometimes possible to enhance an existing storage system by coding hypertext information in, if one knows that a server will be generating a hypertext representation. In 'news' articles, for example, one could use (in the text) a standard format for a reference to another article. This would be picked out by the hypertext gateway and used to generate a link to that note. This sort of enhancement will allow greater integration between old and new systems.
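A gateway of this kind might pick links out of a news article roughly as sketched below, using Python's standard `email` and `re` modules. The in-text convention `[see article <id>]`, the link type names and the sample article (including its addresses and message IDs) are invented for illustration; any agreed format would serve.

```python
import re
from email import message_from_string

# Assumed in-text convention for citing another article.
IN_TEXT_REFERENCE = re.compile(r"\[see article (<[^<>\s]+>)\]")
MESSAGE_ID = re.compile(r"<[^<>\s]+>")


def links_from_article(raw_article):
    """Return (link_type, target) pairs found in an article's headers and body."""
    article = message_from_string(raw_article)
    links = []
    for message_id in MESSAGE_ID.findall(article.get("References", "")):
        links.append(("follows up", message_id))
    if article.get("From"):
        links.append(("written by", article["From"]))
    if article.get("Subject"):
        links.append(("is about", article["Subject"]))
    for message_id in IN_TEXT_REFERENCE.findall(article.get_payload()):
        links.append(("refers to", message_id))
    return links


# Hypothetical article; the addresses and message IDs are invented.
SAMPLE = """\
From: alice@example.org
Newsgroups: alt.hypertext
Subject: Trees versus webs
Message-ID: <102@example.org>
References: <100@example.org> <101@example.org>

This thread belongs in several groups [see article <55@example.org>].
"""
```

The header fields give links without any change to the existing system, while the in-text convention is the kind of small enhancement that authors could adopt once they know a gateway is reading their articles.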
There will always be a large number of information management systems - we get a lot of added usefulness from being able to cross-link them. However, we will lose out if we try to constrain them, as we will exclude systems and hamper the evolution of hypertext in general.
Conclusion
We should work toward a universal linked information system, in which generality and portability are more important than fancy graphics techniques and complex extra facilities.
The aim would be to allow a place to be found for any information or reference which one felt was important, and a way of finding it afterwards. The result should be sufficiently attractive to use that the information contained would grow past a critical threshold, so that the usefulness of the scheme would in turn encourage its increased use.
The passing of this threshold would be accelerated by allowing large existing databases to be linked together and with new ones.
A Practical Project
Here I suggest the practical steps to take in order to find a real solution at CERN. After a preliminary discussion of the requirements listed above, a survey of what is available from industry is obviously required. At this stage, we will be looking for systems which are future-proof:
• portable, or supported on many platforms,
• extendible to new data formats.
We may find that with a little adaptation, parts of the system we need can be combined from various sources: for example, a browser from one source with a database from another.
I imagine that two people for 6 to 12 months would be sufficient for this phase of the project.
A second phase would almost certainly involve some programming in order to set up a real system at CERN on many machines. An important part of this, discussed above, is the integration of a hypertext system with existing data, so as to provide a universal system, and to achieve critical usefulness at an early stage.
(... and yes, this would provide an excellent project with which to try our new object oriented programming techniques!)
Note:
[1] The same has been true, for example, of electronic mail gateways, document preparation, and heterogeneous distributed programming systems.
[2] Linked information systems have entities and relationships. There are, however, many differences between such a system and an "Entity Relationship" database system. For one thing, the information stored in a linked system is largely comment for human readers. For another, nodes do not have strict types which define exactly what relationships they may have. Nodes of similar type do not all have to be stored in the same place.
[3] A client/server split at this level also makes multi-access easier, in that a single server process can service many clients, avoiding the problems of simultaneous access to one database by many different users.