20th International Conference on Parallel Architectures and Compilation Techniques, PACT 2011, Galveston, TX, United States of America, 10-14 October 2011, pp. 199-200
Fault tolerance has become an essential concern for processor designers due to increasing transient and permanent fault rates. In this study, we propose SymptomTM, a symptom-based error detection technique that recovers from errors by leveraging the abort mechanism of Transactional Memory (TM). To the best of our knowledge, this is the first architectural fault-tolerance proposal using Hardware Transactional Memory (HTM). SymptomTM can recover from 86% of catastrophic failures caused by transient errors and 65% of those caused by permanent errors, with no performance overhead in error-free executions. © 2011 IEEE.
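The recovery idea in the abstract (detect a symptom, abort the enclosing transaction, re-execute) can be illustrated with a small software analogy. This is only a sketch under stated assumptions, not the hardware mechanism of SymptomTM: `SymptomDetected`, `run_as_transaction`, `max_retries`, and the dictionary-based state are all hypothetical names introduced here for illustration, and a deep copy stands in for the speculative buffering that an HTM would provide.

```python
import copy


class SymptomDetected(Exception):
    """Hypothetical stand-in for a fault symptom (for example, a fatal trap)."""


def run_as_transaction(state, body, max_retries=3):
    """Run body on a speculative copy of state; abort and retry on a symptom.

    Returns the number of aborts before a successful commit.
    Assumption: state is a dict, and a deep copy models the speculative
    buffering that hardware transactional memory would otherwise provide.
    """
    for aborts in range(max_retries + 1):
        working = copy.deepcopy(state)  # speculative copy; original untouched
        try:
            body(working)
        except SymptomDetected:
            continue  # abort: discard the speculative copy and re-execute
        state.clear()
        state.update(working)  # commit the speculative updates
        return aborts
    raise RuntimeError("symptom persisted after all retries")


# Simulated transient fault: the first execution fails, the retry succeeds.
faults = iter([True, False])


def increment(s):
    s["counter"] += 1
    if next(faults, False):
        raise SymptomDetected("simulated transient fault")


state = {"counter": 0}
aborts = run_as_transaction(state, increment)


# Simulated permanent fault: every execution raises a symptom.
def always_faulty(s):
    s["counter"] += 100
    raise SymptomDetected("simulated permanent fault")


permanent_state = {"counter": 0}
try:
    run_as_transaction(permanent_state, always_faulty)
    recovered = True
except RuntimeError:
    recovered = False
```

In this analogy a transient fault disappears on re-execution, so the retry commits cleanly, while a fault that recurs on every attempt exhausts the retries and leaves the committed state unchanged. The sketch does not model how symptoms are detected in hardware or why some permanent faults are recoverable in the original proposal.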