Stable Storage

Stable Storage based on Mirroring or Replication

Author: Xavier Caron

The concept of Stable Storage has its origins in the realm of transactions and databases. A stable storage unit can be seen as an ideal storage medium that, given a set of failure assumptions, protects user data from corruption or loss. Such a storage unit offers two operations to the user, Write and Read, which can be used to store and retrieve user data to and from stable storage.

Stable storage guarantees atomicity of the write operation, e.g. either all data is written to the storage unit, or none at all, even in the case of a crash. So a write operation appears as indivisible. As a result, a read operation will always return consistent data. However, the user of a stable storage unit has to design the application in such a way that, after recovering from a crash failure, it can deal with the system either in the old or in the new state without knowing which one holds. In this short web presentation, we will introduce two forms of stable storage:

  • Stable storage based on a technique called mirroring. The integrity of the stored data must be guaranteed even in the presence of crash failures.
  • Stable storage based on replication. Using distributed programming, a mechanism is devised that allows to multi-cast the information that has to be made durable to a group of collaborating machines for storage.

[3] constructs a framework for providing persistence for Ada objects based on streams. It classifies storage devices in a class hierarchy according to essential properties, like volatility, stability, etc. An abstract root class "Storage" defines the common interface for all storage classes, including Read and Write operations. The storage hierarchy is then split into volatile and non-volatile storage. Data stored in non-volatile storage remain intact even when the program terminates. Among the different types of non-volatile storage, there is then the distinction between stable and non-stable storage. Finally, the mirrored storage and the replicated storage are subclasses of the stable storage class. Here is an UML diagram of that hierarchy:

Mirroring

In specialized literature, the "mirroring" technique, sometimes called "shadowing", often refers to duplication of data. For example, the Ralston Encyclopedia of Computer Science says: "Another recent trend is to duplicate data to enhance reliability. This technique, called mirroring or shadowing, allows systems to con-tinue operation in spite of media, controller, or channel failure. Sophisticated systems also take advantage of the extra I/O path to enhance throughput. On-line reconstruction ("remirroring") of a new second copy when one of the original two is lost is also common."

The main idea is to write data in two locations instead of one, in a sequential order. If one write operation fails, we assume that the other copy is in a consistent state. It may be the state that was valid before the write operation, or it may already be the new one. Of course, there must be a mechanism to determine which one of the two copies contains the valid data. For this purpose, a third storage unit called the log is used. It allows us to distinguish between the three possible situations depending on the moment of the crash:

  • The crash happens before or after the write operation, i.e. the log does not indicate any problem,
  • The crash happens while writing the first copy, or
  • The crash happens while writing the second copy.

Mirroring can be used for instance in a transactional system in order to keep uncorrupted a log table mapped on sequential files.

[1] is a complete paper on the subject. Its purpose is to describe the mirroring algorithm, and to present a state automaton (cf. next figure) covering all possible situations that can occur in the case of crash failures. Finally, an implementation in Ada 95 is presented.

Replication

Traditionally, the main idea of replication is to store copies of a same object on different sites. Here is what we can read about the subject in McDermid Software Engineer's Reference Book: "Replicated files or other objects are usually provided to ensure resilience to node failures. Replicated data should always appear consistent to the user, even though all the copies may not be identical. Generally, it should be possible to execute any operation on the data at any of the sites holding a copy."

In our thesis [2], the "sites" are storage units located on processing nodes. The challenge is to manage dynamically the evolution of the group of replicas since some of them can be suddenly unreachable or others can join the group.
With reliability, performance is an important feature. Therefore, operations made on the storage units must not be performed sequentially but broadcasting has to be used. Similarly, the user cannot and might not want to wait that all replicas have answered to an operation before performing the next one. A protocol has to be found to solve these problems.

Next figure shows the overall design of the replicated storage. For a complete description of the system, the interested reader may refer to chapter 4 of [2].

Other Related Work

  • The Object Persistence page: It describes the persistence framework in more detail.
  • The OPTIMA framework page: OPTIMA is an object-oriented framework that provides the necessary runtime support for open multithreaded transactions, a transaction model for concurrent object-oriented progamming. OPTIMA uses the stable storage implementation for storing the transactional log.

References

  1. Xavier Caron, Jörg Kienzle and Alfred Strohmeier, "Object-Oriented Stable Storage Based on Mirroring", 6th International Conference on Reliable Software Technologies - Ada-Europe'2001, Leuven, Belgium, May 14 - 18, 2001, LNCS (Lecture Notes in Computer Science), no. 2043, Springer Verlag, 2001, pp. 278-289, Also available as Technical Report (EPFL-DI No 00/340). abstract+file
  2. Xavier Caron, "Object-Oriented Stable Storage", M.S. Thesis, Ecole Polytechnique Fédérale de Lausanne, Département d'Informatique, 1015 Lausanne, Switzerland, 2000. abstract+file
  3. Jörg Kienzle, "Open Multithreaded Transactions: A Transaction Model for Concurrent Object-Oriented Programming", Ph.D. Thesis, no 2393, Swiss Federal Institute of Technology Lausanne, Software Engineering Laboratory, Computer Science Departement, EPFL, 1015 Lausanne, Switzerland, 2001.
EPFL | IC | LGL | Research
URL: http://lgl.epfl.ch/research/ongoing/stable_storage.html
Last modified 8/20/2001, X. Caron.