
Computer Sciences (Algorithms)


The research project is under the co-supervision of:
Yves ROBERT (Computer Sciences, ENS Lyon), Yves.Robert@inria.fr
Wang Changbo (Computer Sciences, ECNU), cbwang@sei.ecnu.edu.cn

SUMMARY

This project addresses silent errors and non-functional errors, both of which pose a major threat to data-intensive applications running on large-scale platforms or scientific clouds.

Silent errors have become a major problem for large-scale distributed systems. Big data processing demands ever larger memory volumes and ever more computation, which increases the probability of corrupted bits in memory or incorrect CPU results. Such silent errors are a major threat to the accuracy and trustworthiness of big data applications. However, detecting silent errors is hard, and correcting them is even harder. This project aims to develop generic algorithms that achieve both detection and correction of silent errors by coupling verification mechanisms with checkpointing protocols. Application-specific techniques will also be investigated to decrease the detection and correction cost for dense and sparse numerical linear algebra.
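As an illustration of what coupling verification with checkpointing can look like, here is a minimal Python sketch. It is not the project's algorithm: the function names, the checksum invariant used as the verification mechanism, and the `faults` parameter for injecting a transient bit flip are all assumptions made for this example. The state is verified before each checkpoint is taken, so a checkpoint is never saved from corrupted data, and a failed verification triggers a rollback to the last verified checkpoint.

```python
import copy

def step(state):
    # Hypothetical computation: add 1 to every value and keep the checksum in sync.
    values = [v + 1 for v in state["values"]]
    return {"values": values, "checksum": state["checksum"] + len(values)}

def verify(state):
    # Hypothetical verification mechanism: the checksum must equal the sum of the values.
    return sum(state["values"]) == state["checksum"]

def flip_bit(state, index=0, bit=3):
    # Simulate a silent error: corrupt one value without touching the checksum.
    values = list(state["values"])
    values[index] ^= 1 << bit
    return {"values": values, "checksum": state["checksum"]}

def run_with_verified_checkpoints(state, n_steps, period, faults=()):
    """Run n_steps, verifying and then checkpointing every `period` steps.

    `faults` lists step indices after which a transient corruption is injected once.
    Returns the final state and the number of rollbacks performed.
    """
    checkpoint = (0, copy.deepcopy(state))
    pending = set(faults)
    rollbacks = 0
    i = 0
    while i < n_steps:
        state = step(state)
        i += 1
        if i in pending:
            pending.discard(i)  # transient: the fault does not recur on re-execution
            state = flip_bit(state)
        if i % period == 0 or i == n_steps:
            if verify(state):
                # Only verified states are saved as checkpoints.
                checkpoint = (i, copy.deepcopy(state))
            else:
                # Silent error detected: roll back to the last verified checkpoint.
                i, state = checkpoint[0], copy.deepcopy(checkpoint[1])
                rollbacks += 1
    return state, rollbacks

initial = {"values": [1, 2, 3], "checksum": 6}
final, rollbacks = run_with_verified_checkpoints(initial, n_steps=6, period=2, faults=[3])
print(final["values"], rollbacks)
```

In this sketch the error injected after step 3 goes unnoticed until the verification at step 4, which is why the work done since the checkpoint at step 2 has to be re-executed. The choice of `period` trades verification and checkpoint overhead against the amount of work lost per detected error.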

Non-functional errors are an important source of problems for scientific applications deployed on distributed cloud computing platforms. These applications are expressed as workflows, so cloud workflow systems are widely used as platform software (or middleware services) to facilitate the use of cloud services. The quality of a cloud workflow application is determined by the collective behavior of all the cloud software services that the application employs. Because every cloud service carries a certain amount of uncertainty, assessing the quality of a cloud workflow instance becomes a much more complex combinatorial problem. Non-functional errors, namely violations of service quality constraints, can significantly degrade the usability of big data applications.

Activity-point based checkpoint selection and time-point based checkpoint selection are the two major types of strategies in workflow temporal verification. Activity-point based checkpoint selection monitors the response time of each activity and is normally used to monitor a single large sequential process. In contrast, time-point based checkpoint selection monitors throughput and is often used to monitor a large batch of parallel processes. Both strategies need to be investigated in order to provide effective quality assurance for scientific cloud workflows.
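The difference between the two monitoring styles can be sketched in Python as follows. This is a deliberately simplified illustration, not one of the published checkpoint selection strategies: the function names, the `tolerance` and `min_throughput` parameters, and the fixed-interval windows are assumptions made for this example.

```python
def activity_point_violations(actual, expected, tolerance=0):
    """Activity-point style: check the response time of each activity in turn.

    Returns the indices of activities whose actual response time exceeds
    the expected one by more than `tolerance`.
    """
    return [
        i
        for i, (a, e) in enumerate(zip(actual, expected))
        if a > e + tolerance
    ]

def time_point_violations(completion_times, interval, min_throughput):
    """Time-point style: check throughput at regular time points.

    `completion_times` are assumed to be positive timestamps at which
    activities (possibly from many parallel processes) completed.
    Returns the time points t at which fewer than `min_throughput`
    activities completed in the window (t - interval, t].
    """
    if not completion_times:
        return []
    horizon = max(completion_times)
    violations = []
    t = interval
    while t - interval < horizon:
        done = sum(1 for c in completion_times if t - interval < c <= t)
        if done < min_throughput:
            violations.append(t)
        t += interval
    return violations

# One sequential process: the third activity (index 2) is too slow.
print(activity_point_violations([5, 7, 12, 6], [5, 6, 8, 6], tolerance=1))

# A batch of parallel processes: nothing completes between time 4 and time 8.
print(time_point_violations([1, 2, 3, 9, 10, 11, 12], interval=4, min_throughput=2))
```

The first function needs one check per activity, which suits a single sequential process. The second needs one check per time point regardless of how many processes are running, which is why a throughput-based view scales better to large parallel batches.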

 


 
 
Last update February 24, 2016
Ens de Lyon
15 parvis René Descartes - BP 7000 69342 Lyon Cedex 07 - FRANCE
Tel.: René Descartes site (headquarters): +33 (0) 4 37 37 60 00 / Jacques Monod site: +33 (0) 4 72 72 80 00