Phase III: Fault-tolerance Through Message-Logging

Due: 8:00am Tuesday, Oct. 30.

General Instructions. Students are required to work together in teams. You may work in the team you used for Phase II or you may form a new team. An assignment submitted on behalf of a "team" having fewer than 3 or more than 5 students will receive a grade of F. All members of the team are responsible for understanding the entire assignment.

This assignment builds on work that was done in Phase I. Feel free to use any team's solution to Phase I as the basis for your solution to Phase III.

No late assignments will be accepted.

Academic Integrity. Collaboration between groups is prohibited and will be treated as a violation of the University's academic integrity code.

Background: Disaster Recovery for Bank Branches

In the distributed banking system built for Phase I, account balances are stored in the volatile memory of processors --- not on secondary storage. If the processor at a branch crashes, then access to information about those balances is not only lost while the processor remains unavailable but can be lost forever. Neither the bank nor its customers are likely to be happy about such a loss of account information.

Use of a message logging protocol is one way that a branch server could recover information concerning that branch's accounts after a faulty processor has been repaired. Customers might not be happy about losing access to their accounts while "the computer is down" but at least there would be no irrevocable consequences of a processor failure.

In this phase of the cs514 project, you will program a message-logging protocol (of your choice) for the branch servers of the distributed bank implemented for phase I. This exercise should:

What to Build

For this phase, assume: Branch GUI and Branch Server. Extend the branch GUI from Phase I with a "button" that instigates the failure of the corresponding branch server. Activating this button should cause the branch server to simulate a failstop failure and recovery by doing the following.
  1. Send a message to each of neighbor, announcing that this branch server has failed.
  2. Cause the branch server application to terminate (perhaps by sending a suitable message to a modified version of that branch server).
  3. After some time has passed (say 30 seconds) since the branch server has terminated, cause a new copy of the branch server to start and run some "recovery code".

Message Logging Protocol Implementation. Also, write "recovery code" for the branch server. Execution of this code enables the branch server to obtain correct and current account balances before that branch server commences processing subsequent requests for the various operations it supports. Use message logs stored in the volatile memories of the other branch servers. (An industrial-strength implementation might also employ checkpoints to allow the truncation of message logs.)

As mentioned in class, descriptions of some message-logging protocols and citations to relevant literature can be found in:

Mootaz Elnozahy, Lorenzo Alvisi, Yi-Min Wang, and David B. Johnson. A survey of rollback-recovery protocols in message-passing systems. Technical Report. Available from http://www.cs.utexas.edu/users/lorenzo/papers/Pap6.ps or locally postscript or pdf. Because this paper was written using MicroSoft Word, windows ghostview (and not UNIX ghostview) must be used as the Helper Application for reading.

The scheme you implement must be one of the schemes that is discussed in this survey or in some other published paper on the subject.

Submission Procedure. Create a directory containing the files you wish us to grade. Call this directory xxxxx0, where xxxxx is the Cornell network id of the team member whose net id is alphabetically smallest of your team. Then copy this directory to the following folder:

\\Goose\courses\cs514-fall01\proj03.submit

Don't be disturbed by warnings informing you that the file cannot be accessed after it has been copied.

Should you wish to revise your submission after you have copied it to our folder, then simply correct the files and re-copy the entire directory---but this time use the name xxxxx1. Revisions to that should be named xxxxx2, and so on. We will grade only the largest-numbered file of a series. No late submissions will be accepted.

Your directory should contain the following files (at least):

TEAM which contains the names (and net-ids) for all team members. Also, for each team member give a 1 or 2 paragraph description of the tasks this team member performed and the number of hours this required.

README which contains

LOGIC which contains

TOPO should specify an interesting interconnection topology for a multi-branch bank that will be used to illustrate the operation of your system. You may restrict consideration to systems in which all links are bidirectional

TestPlan should describe the process and any tools (i.e. additional programs) you wrote in order to test your system. This file should also explain what tests you ran and why this was a reasonable set of tests to have run.

JAVA source file that contain the Java source needed to compile and test your system.

Grading. Your grade will be based on the following elements: