A Large-Scale Framework for Automatic Addition of Fault-Tolerance
The advent of multi-core processors
justifies the need for the tools that facilitate the design of parallel
programs and decrease development costs. Design automation is an
approach that has the potential to decrease development costs. More
importantly, design automation results in generating a program that is
correct by construction, thereby eliminating the need for its proof of
correctness. However, the exponential complexity of automatic design of
parallel programs is a major obstacle in front of the development of
such tools. One approach for decreasing the computational cost of
automated design is to exploit the processing power of
parallel/distributed platforms for design automation.
The focus of this work is on the development of a distributed framework
for automated
design of fault-tolerant parallel programs from their fault-intolerant
version. Specifically, we propose a divide-and-conquer
approach that takes an existing fault-intolerant program and partitions
the intolerant program into a set of subsets of its
instructions. Subsequently, each subset of instructions is
automatically analyzed and revised in isolation on a separate machine
in such a way that the entire program becomes fault-tolerant against a
specific type of faults. Based on this approach, thus far, we have
implemented a distributed
fault tolerance synthesizer (in C++) that utilizes the
processing power of parallel/distributed machines to add fault
tolerance to parallel/distributed programs. Our experiments with the
current implementation of our framework show very promising results.
PARallel program
DESigner (PARDES): A
Framework for Action-Level Compositional Design of the Behavioral
Evolution of Parallel Programs
Program behaviors evolve often due to incomplete specifications and
unanticipated requirements. Designing the behavioral evolution of
parallel programs is difficult in part due to inherent non-determinism
of such programs. In our
previous work, we have formulated the behavioral evolution of
finite-state parallel programs in terms of adding Linear Temporal Logic
(LTL) properties to existing programs, where we revise an existing
program in order to satisfy a new (safety or liveness) property while
preserving existing properties. We extend our work by
presenting a novel approach for decomposing the addition problem to the
level of program actions, called action-level addition.
In the proposed approach, we separately analyze and revise the actions
of an existing program in order to accommodate a new property while
preserving already existing properties. Our proposed approach has
important applications in distributed model checking and model-driven
development as well. Specifically, in model checking, we distribute the
actions of a program on network of machines where the actions are
analyzed separately so the entire analysis will result in verifying the
entire program for a specific LTL property. Our approach provides the
option to go one step beyond model checking by providing corrective
suggestions to developers. We have implemented our action-level method
in an extensible software framework, called PARallel program DESigner (PARDES),
that exploits the computational resources of distributed machines for
the addition of LTL properties. For more information on PARDES, please
click here.
Modeling, Analysis and Design of Software Fault-Tolerance
In this project, our goal is to develop a methodology for modeling faults and fault-tolerance in UML. Our methodology is aimed at bridging the gap between the theory of fault-tolerance and the development of fault-tolerant systems. Specifically, we are working on the integration of two research areas towards developing a roundtrip engineering framework for automated analysis and design of fault-tolerant (distributed) systems. First, we have previously developed techniques for (i) generating formal specifications from UML models; (ii) analyzing the generated formal specifications, and (iii) visualizing the results of the analysis in the UML models. Second, we have developed a theory of automated synthesis of fault-tolerance concerns supported by a software tool that automatically adds fault-tolerance concerns to existing formal models. The integration of these two areas will result in an integrated environment for modeling, analysis, and code generation for fault-tolerant systems.