
Moving the Image: Visual Culture and the New Millennium

Michael Greenhalgh
Australian National University, Canberra and Christ Church, Oxford

Virtual Reality in Architecture: A VRML Model of Borobudur

Keywords: Virtual Reality, VRML models in teaching, Borobudur stupa (Java)

What is Virtual Reality?

Over recent years, as the web has become able to present ever-larger and ever-better still images, art historians among others have turned to programs able to construct a version of the world as we see it with our eyes: in three dimensions, and sometimes in stereo. Such programs can give our students some knowledge of context, and so mitigate the usual teaching method of projecting two flat images (the same size whatever the real dimensions of their subjects) in a darkened room. The Virtual Reality Modelling Language (VRML) holds out the possibility of using a computer to develop and elaborate such context: it is a language for constructing artificial worlds, with which we interact by directing the software to move, pan and zoom through the scene. Elements within the world can be hotspotted, so that a mouse-click opens a related HTML page or executes an action, giving access to still images, video and sound. Besides teaching art history to students who may not have visited the site being modelled, uses of VRML include other kinds of education, process control, and training (e.g. for surgery, piloting, firefighting). It is no substitute for actually visiting a site, but it might be a step up from the darkened-room approach to the discipline.
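As a minimal sketch of what a hotspot looks like in a VRML97 file, the Python function below writes out an `Anchor` node wrapping a thin textured box; clicking the box in a VRML browser loads the page named in the `url` field. The function names and the file names `relief_001.html` and `relief_001.jpg` are hypothetical, chosen for illustration, and a plain `Box` stands in for real geometry.

```python
def hotspot_anchor(url: str, description: str, texture: str,
                   width: float, height: float) -> str:
    """Return a VRML97 Anchor node wrapping a thin textured box.

    Clicking the box in a VRML browser loads `url` (e.g. an HTML page).
    """
    return (
        "Anchor {\n"
        f'  url "{url}"\n'
        f'  description "{description}"\n'
        "  children [\n"
        "    Shape {\n"
        "      appearance Appearance {\n"
        # The relief photograph is applied as a texture to the box face.
        f'        texture ImageTexture {{ url "{texture}" }}\n'
        "      }\n"
        # A thin box stands in for the wall surface carrying the relief.
        f"      geometry Box {{ size {width} {height} 0.05 }}\n"
        "    }\n"
        "  ]\n"
        "}\n"
    )


def scene(*nodes: str) -> str:
    """Prefix one or more nodes with the VRML97 file header."""
    return "#VRML V2.0 utf8\n" + "".join(nodes)


if __name__ == "__main__":
    print(scene(hotspot_anchor("relief_001.html", "Relief 1",
                               "relief_001.jpg", 2.0, 1.0)))
```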

The great advantages of VRML are the sense of "being there", of being able to move in and around the building or site; user control, since the user can drive the software at a convenient speed and in the desired directions; and additional "hooks" that provide a multi-layered, potentially rich learning environment. But there are concomitant disadvantages. VRML, whilst simple to build for models that employ only the most basic building-blocks such as the cube and the pyramid, is difficult and expensive to construct and display properly for anything intricate. Thus, whereas the smooth surfaces and simple forms of a modernist Le Corbusier villa are easy to build, the extravagant lines, swinging curves and intricate detail of the baroque facade of Blenheim are almost impossible to reproduce accurately. So difficult is accuracy to attain that there is a natural tendency to model, say, just one capital or base, and then to repeat it throughout the model. Hence accuracy is a decided problem in VRML.
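The "model once, repeat everywhere" tendency maps directly onto VRML97's `DEF`/`USE` mechanism, in which a node is named once and then referenced wherever it recurs. The sketch below (hypothetical function and node names; a `Box` stands in for a modelled capital) generates a row of identical capitals in this way. It also shows why the shortcut is both tempting and inaccurate: every `USE` is an exact copy of the one definition, so no individual variation survives.

```python
def repeated_capitals(spacing: float, count: int) -> str:
    """Emit VRML97 that defines one capital once (DEF) and reuses it (USE).

    Every copy after the first is a reference to the same geometry, which
    is why instancing is cheap but cannot show variation between capitals.
    """
    lines = ["#VRML V2.0 utf8"]
    for i in range(count):
        x = i * spacing
        if i == 0:
            # The single modelled capital; a plain Box stands in for it here.
            child = "DEF Capital Shape { geometry Box { size 1 0.5 1 } }"
        else:
            # Later capitals merely point back at the first definition.
            child = "USE Capital"
        lines.append(f"Transform {{ translation {x} 0 0 children [ {child} ] }}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(repeated_capitals(3.0, 4))
```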

Modern photogrammetry packages hold out the possibility of more straightforward modelling because, instead of building a model from geometry (CAD, computer-aided design), they begin with a series of photographs and fit the geometry to the angles and surfaces shown in them. Superficially at least, "working backwards" from photographs of the real world to a geometric construction that can be imported into a CAD package or into VRML should be much more lifelike than going the other way, via CAD: after all, the photographed subject is already clothed in its actual textures and hues and set in its true environment, and "all" the user has to do is to clothe the constructed 3D version with those real-life textures, usually from images which start out uncalibrated and are taken from arbitrary viewpoints. Indeed, the software typically promises photo-realistic models from uncalibrated photos. The goal is obviously for software to extract as much information automatically as possible, with as little work as possible left to the human being; and several packages proclaim increasingly high levels of such automation, including automating the task once the user has set down the basic parameters (model refinement, as Anthony Dick at Cambridge has called it). It is even rumoured that three-dimensional information can be extracted from stereoscopic photographs, but I have never seen this demonstrated. At present "totally automatic" model extraction needs large numbers of photographs and a lot of computing power. Could somebody point me at a model of a work of architecture (as distinct from a small model on a laboratory turntable) which uses such a technique?
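The geometry behind that rumour is, for an idealised stereo pair, simple; the hard part, which this sketch assumes away, is finding the same point in both photographs. Assuming two parallel cameras with the same focal length (in pixels), separated horizontally by a known baseline, depth equals focal length times baseline divided by disparity, where disparity is the horizontal shift of the point between the two images. The function below is a hypothetical illustration of that relation, not the method of any package mentioned here.

```python
def depth_from_disparity(x_left: float, x_right: float,
                         focal_px: float, baseline_m: float) -> float:
    """Depth (metres) of a point seen in an idealised, rectified stereo pair.

    Assumes both cameras are parallel, share the focal length focal_px
    (in pixels), and are separated horizontally by baseline_m metres.
    x_left and x_right are the point's x-coordinates (pixels) in each image.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("point must have positive disparity")
    return focal_px * baseline_m / disparity


if __name__ == "__main__":
    print(depth_from_disparity(520.0, 500.0, 1000.0, 0.5))   # 20 px disparity
    print(depth_from_disparity(501.0, 500.0, 1000.0, 0.05))  # 1 px disparity
```

With a 1000-pixel focal length and a 0.5 m baseline, a 20-pixel disparity gives 25 m. With a 0.05 m baseline the same point shows only 2 pixels of disparity, so a single-pixel matching error doubles the estimated depth to 50 m: closely spaced cameras magnify small errors.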

Unfortunately, however, starting from photographs brings snags that are not easy to overcome, and which one doubts software could solve either. They fall into several categories:

  1. the software needs the human being: think of the software as blind to the photograph, and the human being as the guide-dog who simply offers the software a collection of points in two dimensions, which the software can extrapolate into three dimensions. The more information the human being feeds in, the more accurate the finished model can be. We are apparently a long way from any software "recognizing" corners or even verticals (perhaps through the use of stereo pairs), let alone measuring distances, so sitting back for five years until the software can do the lot does not seem to be an option!
  2. orientation: all photo-modelling programs are very choosy about the relative angles of the photographs offered to them, and cannot extrapolate three-dimensional information if the views are incomplete or the camera positions too close together;
  3. fields of view: life would be simple were each target building on a podium, with a large field of view all round and convenient hills from which to take top-down shots (which is indeed how some VRML projects tend to be presented!). But most buildings are obscured (occlusion) by anything from street furniture to other buildings or trees. Photography from all angles is rarely possible, and the golden rule of photo-modelling programs, to take shots at 45 degrees, i.e. looking towards the corners, is sometimes very difficult to keep;
  4. more accurate detail, more work: using sections of the photograph to "clothe" the model is progressively more difficult the more detailed and twisty-curvy the object becomes. Forgetting the detail and going for the "grand picture" merely produces cardboard cut-outs which lack the lifelike grit of the real world;
  5. stage scenery or fakery? the usual result of VRML modelling is therefore either stage-scenery (because the back and perhaps side views are unreachable), or a fake, because the sides invisible to the camera are imitated from views which have been captured.
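The guide-dog arrangement of point 1 can be made concrete. Once a human has clicked the same corner in two photographs, and assuming the two cameras' 3×4 projection matrices are known (photo-modelling packages have to estimate these as well, which is part of their work), the 3D position follows from standard linear (DLT) triangulation. The sketch below uses NumPy; the function name is hypothetical, and it handles a single noise-free point rather than a whole model.

```python
import numpy as np


def triangulate(P1: np.ndarray, P2: np.ndarray,
                pt1: tuple[float, float],
                pt2: tuple[float, float]) -> np.ndarray:
    """Linear (DLT) triangulation of one point marked in two photographs.

    P1, P2 are 3x4 camera projection matrices; pt1, pt2 are the (x, y)
    positions a human has clicked for the same corner in each view.
    Returns the estimated (X, Y, Z) position.
    """
    x1, y1 = pt1
    x2, y2 = pt2
    # Each view contributes two linear equations in the homogeneous point.
    A = np.array([
        x1 * P1[2] - P1[0],
        y1 * P1[2] - P1[1],
        x2 * P2[2] - P2[0],
        y2 * P2[2] - P2[1],
    ])
    # Least-squares solution: the right singular vector belonging to the
    # smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    # Convert from homogeneous to ordinary coordinates.
    return X[:3] / X[3]
```

When the two camera positions are nearly the same, the equations contributed by the two views become nearly dependent, so small errors in the clicked positions produce large errors in depth; this is the orientation problem of point 2 seen from the software's side.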

So is VRML worth the effort? And are there any other software solutions to the problem of offering a sense of "being there" better than that obtainable with flat, two-dimensional photographs? An examination of a large and elaborate project - the 9th-century Buddhist stupa on the island of Java: Borobudur - will demonstrate the necessary approaches, and the advantages and drawbacks of the methodology.

Why Borobudur?

The selection of Borobudur as a suitable subject arose from several circumstances:

  1. This World Heritage monument is accessible, substantially complete, and the object of scholarly, religious and tourist interest; Borobudur is the largest man-made monument in this class near to Australia;
  2. The restoration campaign conducted by Theodoor van Erp generated publications containing large monochrome photographs of Borobudur and all its reliefs, including those of the Hidden Basement (the majority subsequently covered up again);
  3. The volcanic stone of Borobudur is dark-grey and porous (it was probably covered with plaster and then painted), so a project focusing on over 3,000 monochrome images is possible and reasonable: any colour now to be found on the monument is due to mosses and lichens, themselves the result of the tropical climate.
  4. There are few monuments which have been as comprehensively photographed as Borobudur (and long enough ago for the images to be out of copyright); and the full suite of photos of the Hidden Basement, not to mention the very complexity of the monument, provides a good target for VRML and the HTML extensions. For example, whilst no computer simulation can substitute for a visit to the monument itself, our VRML model provides an opportunity to examine the whole monument, or any of its details, at leisure, and also allows the user to call up relevant text, comparative monuments, etc. to fill out the study.

The construction of the VRML model began in 1998, using a CAD package. Packages allowing VRML construction from photographs were then in their infancy and, in any case, there were insufficient photographs of the stupa taken at suitable angles to allow such an approach. From a primitive CAD skeleton, the remainder of the model was built by hand by the programmer (Dr. A.J. Limaye, of the Australian National University's Supercomputer Group and Visualization Studio). We encountered the problems mentioned above: accuracy of modelling (traded off against construction time), and the great difficulty of modelling complicated architectural details and three-dimensional sculpture. Luckily, the bas-reliefs are the main feature of the stupa, so the armature of the various galleries could be modelled simply and used as "hanging space" for the reliefs, which were applied as textures to "clothe" the model. Although large, the structure of the stupa is essentially simple: an outward-facing "pyramid" without any internal rooms, whose galleries are all of essentially similar construction, each diminishing in length higher up the structure. Because the project was to be made available over the web, tours of the structure were provided at several photo resolutions, to suit different network speeds and also slow computers. In addition, the galleries were "quartered", and an individual tour provided for each quarter of each gallery, with a link allowing a smooth transition to the next quarter if desired. Again, because of the real possibility of getting lost in such a structure, two overlays were provided: one showing exactly where the current tour is, and the other allowing the user to dial up any of the available tours at any photo resolution.
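The arrangement described above (several texture resolutions, quartered galleries, and a link onward to the next quarter) amounts to a small index of tour segments. The Python sketch below shows one hypothetical way to organise such an index; the speed thresholds, the quarter names and their order, and the file-naming scheme are all invented for illustration and are not those of the Borobudur project.

```python
from dataclasses import dataclass

# Hypothetical resolution variants, richest first, with the minimum
# connection speed (kbit/s) each is assumed to need.
RESOLUTIONS = [("high", 512), ("medium", 128), ("low", 0)]
# Hypothetical order in which the quarters of each gallery are toured.
QUARTERS = ["east", "south", "west", "north"]


@dataclass(frozen=True)
class Tour:
    gallery: int
    quarter: str
    resolution: str

    def filename(self) -> str:
        # Hypothetical naming scheme for one VRML segment.
        return f"gallery{self.gallery}_{self.quarter}_{self.resolution}.wrl"


def pick_resolution(kbit_per_s: float) -> str:
    """Choose the richest resolution the connection is assumed to support."""
    for name, minimum in RESOLUTIONS:
        if kbit_per_s >= minimum:
            return name
    # Fall back to the lightest variant for any unexpected input.
    return RESOLUTIONS[-1][0]


def next_quarter(tour: Tour) -> Tour:
    """The 'continue' link: same gallery and resolution, next quarter round."""
    i = QUARTERS.index(tour.quarter)
    return Tour(tour.gallery, QUARTERS[(i + 1) % len(QUARTERS)], tour.resolution)
```

Keeping the resolution inside each `Tour` means the onward link never silently switches a slow connection to heavier textures.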

What are the advantages of such a project? The user can "walk" through the galleries and terraces, and examine each relief in detail, moving to and fro as desired. The Hidden Basement (uncovered by the Dutch, the reliefs photographed, and the majority then re-covered) can also be visited, offering one advantage of the VRML model over an actual visit to the site. Again, explanatory texts have been added to the reliefs, facilitating a kind of study which would be impossible on site. The disadvantages are evident from any attempt to examine in close-up the three-dimensional sculpture populating the niches which decorate each gallery: these have been modelled as textures in VRML, so that they are two-dimensional rather than three-dimensional. And because of time constraints, one sequence of niches has been duplicated to stand in for all sequences; that is, the modelling does not accurately reproduce what is to be seen on site.

As intimated above, the chances of VRML being able accurately and easily to model three-dimensional sculpture in the field - i.e. outside miniature laboratory conditions - are slim in the short term. So are there other ways of approximating reality which might be used by the busy art historian with insufficient time (and probably skill) for elaborate VRML construction?

Various Kinds of Virtual Reality

In fact, there are several ways in which a computer can display more of the world than is visible in one two-dimensional photograph. All of them are constructed by software in the computer, and most of them our predecessors in the 18th and 19th centuries were keen to implement in some mechanical fashion, because they likewise recognised the restricting nature of two dimensions, whether in paint or in photography. All the following are simpler to set up and execute than any VRML model:

  1. stitch overlapping photographs into a panorama, perhaps even a 360-degree one, with programs such as PhotoVista or Ulead Cool 360: precise registration is needed, and software of increasing sophistication is now available. Compare the full-size panoramas so popular in the 18th and 19th centuries;
  2. create stereo pairs which, viewed with glasses, enhance the impression of depth: again, precise registration of the two images is needed, and again software is available, but of course standard (anaglyph) stereo loses colour, and polarised alternatives are only at laboratory stage. The development of photography soon entailed experiments in stereo reconstruction;
  3. make either of the above one of a series of environments, with movement between them by a mouse-click, as if moving around the rooms of a house: QuickTime or Reality Studio can do this. Several elaborate 19th-century setups did indeed employ multiple, full-size panoramas.
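Precise registration is easier to reason about once each photograph is projected onto a cylinder. Assuming the camera turns about a vertical axis through its optical centre and the focal length in pixels is known, each pixel maps to an angle and a height, and turning the camera becomes a simple sideways shift, so registering two overlapping shots reduces to estimating one angle from points matched in both. The functions below are a hypothetical sketch of that standard cylindrical approach, not a description of how PhotoVista or Ulead Cool 360 work internally.

```python
import math


def to_cylinder(x: float, y: float, focal_px: float) -> tuple[float, float]:
    """Map a pixel in a flat photo onto an unrolled cylinder.

    x, y are measured in pixels from the image centre; focal_px is the
    focal length in pixels. Returns (angle in radians, height on a
    unit-radius cylinder). The height of a scene point is unchanged when
    the camera turns about its vertical axis.
    """
    theta = math.atan2(x, focal_px)
    h = y / math.hypot(x, focal_px)
    return theta, h


def estimate_rotation(
    matches: list[tuple[tuple[float, float], tuple[float, float]]],
    focal_px: float,
) -> float:
    """Average angular offset between points matched in two overlapping shots.

    Each match pairs a point's (x, y) in shot A with its (x, y) in shot B.
    """
    diffs = [
        to_cylinder(xa, ya, focal_px)[0] - to_cylinder(xb, yb, focal_px)[0]
        for (xa, ya), (xb, yb) in matches
    ]
    return sum(diffs) / len(diffs)
```

Once the angle between consecutive shots is known, each warped image is simply pasted at its angular offset; real stitchers also blend the overlapping strips.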

All approaches have their value and drawbacks. Panoramas and stereo images are simple to construct but, apart from the impression of breadth in the former and of depth in the latter, they still keep the viewer behind the barrier of the photo plane (although zooming in and out offers the user some autonomy). The concatenation of panoramas "in depth" is more like a real-life experience, but still the user cannot manipulate what is seen, and can only move around the circular panoramas and in and out of them, like some waltz figure. Getting lost is a real possibility, although location can be tracked on a plan, and a commentary can be provided through hotspots and linked web pages.
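To show how simple a stereo image is to construct, here is a minimal NumPy sketch of the red/cyan anaglyph, one common way of viewing a registered pair through coloured glasses. It also suggests why such stereo loses colour: the red information reaches only one eye, and the green and blue only the other. The function name is hypothetical, and the two images are assumed already registered to the same size.

```python
import numpy as np


def red_cyan_anaglyph(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Combine a registered stereo pair into one red/cyan anaglyph.

    left, right: H x W x 3 RGB arrays of identical shape.
    The red channel comes from the left-eye image, green and blue from the
    right-eye image; red/cyan glasses then route each image to one eye.
    """
    if left.shape != right.shape:
        raise ValueError("stereo pair must be registered to the same size")
    out = right.copy()
    # Replace only the red channel with the left-eye image's red channel.
    out[..., 0] = left[..., 0]
    return out
```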

The Classroom for Learning Art History in the New Millennium

If VRML is unlikely to become powerful and simple enough to be manipulated by art history lecturers (rather than by skilled programmers), then how are we to use computers to learn art history in this new millennium? The darkened room will probably survive, but it should contain more flexible technology, offering web multimedia, with browsers also used to access CD-ROMs; video and audio will certainly grow, because they are so simple to set up (if not to edit). Panoramic and 360-degree zoomable views, hotspotted imagemaps and stereo views make learning more cogent, especially as an increasing number of students will be learning from home, with the computer screen as their "window" on the darkened lecture room.

But it is precisely the "window" aspect of computer monitors that we sometimes need to banish in the interests of enhanced reality. Such Renaissance-like windows will survive in the small portable computers with which students will be equipped, allowing them to work with video, audio, etc., and to upload material from the web or from a bancomat-type machine. The classroom itself will probably survive, but not simply for technical reasons: rather, because people need people. In classrooms, lectures will be delivered using digital images pulled from the web and video-projected into the lecture theatre (as I have been doing for three years now). Similarly, students will be issued with CD-ROMs of course images and initial unit documentation, and the web will be used as the notice-board for all augmentations, changes and updates, thereby consigning paper to different uses. Within a few years, assuming that display technology similar to Thin Film Transistor (TFT) screens becomes cheap enough, we shall have very large images projected onto screens set at right-angles to one another, so that students and lecturers are inside any virtual world created, and probably wearing stereo glasses to enhance the three-dimensional effect (after all, the Walkman looked funny when introduced, and stereo glasses are but a small and cheap piece of additional technology). Conceivably, we may look forward to robot cameras at important sites, controlled from the lectern or by individual students. The idea of a web-controlled video camera is almost as old as the web, and was latched onto very early by astronomers; since video cameras are so small and so cheap, a few trial sites would be welcome. After all, several churches reportedly rent out their towers and spires for mobile phone equipment, so why not something similar for education?

At the administrative level, the web will also gain in importance. All lecturers will be armed with programs which check that essays have not been downloaded from the web; essays will naturally be submitted over the web, and seminar presentations mounted in advance as web pages. In such an image-rich subject, theses on paper should rapidly disappear, to be replaced by richly-illustrated theses presented on CD-ROM with multimedia where appropriate (NB I have a 1994 Geology PhD from Stanford submitted this way).

Conclusion: Requirements Should Lead, not Technology

A well-known problem with computers is that we tend to accept what their software and capabilities can offer us - which is natural - but then to compromise on our requirements - which is dangerous. In other words, we are quite happy to ditch perennial and unchanging requirements because the technology (which certainly will change) has dazzled us. In such circumstances, technology determines use, whereas the basic requirements of the discipline requiring the images/VRML should determine how the technology is to be deployed - if at all.

This (necessary?) inversion of roles (read: weak-kneed compromise) is seen in many VRML projects, including Borobudur. Full detailing of the architecture has been omitted; instead, one section has been modelled and then copied, so the result is no use to anyone wishing to study the monument's subtleties. The 3D sculpture has been ignored: it is simply too difficult to model. The quantities of data required have led us to chop the monument up into over 20 sections, and then to offer three different resolutions, so that there are effectively about 70 different VRML segments to cater for network and machine speeds. We could do nothing about the network speeds; but should we perhaps have used other, simpler technologies for the presentation of the bas-reliefs? We might, for example, have simply presented the structure as a series of photographs and, instead of spending the many months of programming time involved in constructing the VRML model, concentrated our efforts on the simpler technologies sketched above.

It may well be that computer scientists will soon develop some system whereby the software itself deduces the third dimension and builds the model (stereo has been suggested as a starting point). But until there is a light, handheld device which I can use in a church without getting thrown out (no chariots with multiple video cameras; no enormous turntables for laser-scanning Michelangelo's David, etc.), I suggest the following as the modest panoply for the art historian wishing to recreate context:

  1. VRML for a general (even exciting) overview, which shows the user the lie of the land, and the intricacies of the monument(s) under study;
  2. ordinary panoramic images for a simply-constructed "wide" view of the world;
  3. hotspotted or linked panoramas for more elaborate presentations, probably using the web;
  4. stereo pairs, which can be of any size, and which can offer a remarkably effective impression of the third dimension.

September 2000

Some useful links

  1. The Web 3D Consortium;
  2. The VRML Repository;
  3. Software to make 3D models from photographs: PhotoBuilder; PhotoModeler; ShapeCapture
  4. 3D models from measured drawings with photographs added: Borobudur;
  5. "Easy" technique: Reconstruction from uncalibrated photographs: the full paper is here;
  6. An archaeological site: Virtual Sagalassos;
  7. Stereoscopic photography: anaglyphic stereo imaging;
  8. Software for making panoramas: Ulead Cool 360; PhotoVista;
  9. Stitched panoramas: Piazza del Popolo or Castel Sant'Angelo;