The term active defense is widely used in cybersecurity circles to refer to techniques that empower systems and operators not only to react to attacks but also to anticipate attacker actions and design for them. Though it is often portrayed as “hacking back,” many active defense techniques focus on resilience as a central design methodology.
Resilience within computer systems is generally taken to mean dependability and performance in the face of unanticipated changes.i In a cyber system or a cyber-physical system, our goal is to increase robustness while reducing brittleness. Figure XX shows an idealized response curve of both a resilient system and a non-resilient system to demonstrate these properties of robustness and brittleness.ii
When developing systems with resilience as a goal, we have many factors to consider. These factors are idealized in the resilience threshold, but they can generally be grouped under six overarching principles: adaptability, redundancy, fault tolerance, mitigation, survivability, and recoverability.iii
Adaptability is a system’s ability to change configuration or runtime parameters in response to an external event. This can be accomplished at a system level through pre-programmed execution paths or failure modes, through machine learning, or through operator intervention. It can take the form of on-demand scaling of resources, removing or adding diversity to a system, or reducing the attack surface in response to an external event. Systems and protocols should be able to change over time, adapt to new conditions as they develop, and be scalable in reaction to outside forces.iv
Redundancy in a cyber system means building multiple resources that serve the same function and can replace each other in the event of the loss of primary system resources.v Redundancy is a strong selling point for active defense strategies. In many active defense strategies, such as moving target defense (MTD), redundant channels are built into the design to satisfy the goals of diversity and variance. If these systems are designed with fallback to redundant channels as a default mitigation strategy, they can greatly increase overall system resilience.
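As a rough illustration of fallback to redundant channels, the sketch below tries each channel in order and returns the first successful reply. The channel functions, the `send_with_failover` helper, and the `AllChannelsFailed` exception are hypothetical names invented for this example; they do not come from any particular MTD product.

```python
from typing import Callable, Sequence


class AllChannelsFailed(Exception):
    """Raised when every redundant channel has failed."""


def send_with_failover(message: str, channels: Sequence[Callable[[str], str]]) -> str:
    """Try each redundant channel in order and return the first successful reply."""
    errors = []
    for channel in channels:
        try:
            return channel(message)
        except ConnectionError as exc:
            # Record the failure and fall back to the next redundant channel.
            errors.append(exc)
    raise AllChannelsFailed(f"{len(errors)} channel(s) failed")


# Hypothetical channels: the primary is down, the backup works.
def primary(message: str) -> str:
    raise ConnectionError("primary link unavailable")


def backup(message: str) -> str:
    return f"backup delivered: {message}"


result = send_with_failover("status", [primary, backup])
```

Because fallback is the default path rather than a special case, the caller never needs to know which channel carried the message.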
Fault tolerance is a means of achieving dependability by adapting to failure. In cyber system design, we generally divide this into three categories: hardware, software, and system fault tolerance. Hardware fault tolerance measures include redundant communications, replicated processors, additional memory, and redundant power/energy supplies. Software fault tolerance involves structuring the computer software to compensate for faults such as changes in program or data structures due to transients or design errors. This includes mechanisms such as exception handling and error checking. System fault tolerance means designing cyber systems that provide functions that compensate for failures in other system facilities that are not computer-based.vi
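A minimal sketch of software fault tolerance, assuming a record format made up for this example: each payload is stored with a CRC32 checksum (error checking), and a reader skips any copy that fails the check (exception handling). The helper names `pack`, `unpack`, and `read_tolerant` are hypothetical.

```python
import zlib


def pack(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC32 so corruption can be detected."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def unpack(record: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum does not match."""
    stored, payload = int.from_bytes(record[:4], "big"), record[4:]
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch")
    return payload


def read_tolerant(copies):
    """Return the first copy that passes the check, skipping corrupted ones."""
    for record in copies:
        try:
            return unpack(record)
        except ValueError:
            continue  # corrupted copy: try the next replica
    raise ValueError("no intact copy available")


good = pack(b"setpoint=42")
corrupted = good[:-1] + b"3"  # simulate a transient fault in the last byte
value = read_tolerant([corrupted, good])
```

Here the transient fault is detected rather than silently accepted, and the intact replica supplies the value.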
Mitigation is the system’s ability to respond to a failure or support a human in responding to that failure. When failure or compromise occurs, mitigation strategies involve building workflows or execution paths that allow a system or operator to respond to that failure.vii This may take the form of automatic system quarantining, activation of redundant channels, or even attack-back strategies.
Survivability is the ability to maintain operational goals, or to degrade them gracefully, when under attack.viii This means using the adaptability strategies described above to keep satisfying operational goals.
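One way to picture graceful degradation is priority-based load shedding: when capacity shrinks under attack, the system keeps its most critical functions and drops the rest. The sketch below is a toy model; the service names, priorities, and costs are invented for illustration.

```python
def degrade(services, capacity):
    """Keep the most critical services that fit in the remaining capacity.

    services: iterable of (name, priority, cost); lower priority = more critical.
    """
    kept = []
    for name, _priority, cost in sorted(services, key=lambda s: s[1]):
        if cost <= capacity:
            kept.append(name)
            capacity -= cost
    return kept


# Hypothetical services for a small control system.
SERVICES = [
    ("reporting", 3, 3),
    ("control_loop", 0, 4),
    ("historian", 2, 3),
    ("alarms", 1, 2),
]

normal = degrade(SERVICES, capacity=10)
under_attack = degrade(SERVICES, capacity=6)
```

With full capacity the system runs everything except reporting; with reduced capacity it still keeps the control loop and alarms running.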
Recoverability in a cyber system means designing strategies that restore operations quickly and effectively following a service disruption.ix This can be accomplished by redundant failovers, automatic restoration of backups, or dynamic insertion of new functionality to replace failed functionality.
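Automatic restoration of a backup can be sketched as a service that keeps a last-known-good snapshot and rolls back to it when a health check fails. The `RecoverableService` class and its toy health check are assumptions made for this example, not a description of any real product.

```python
import copy


class RecoverableService:
    """Hypothetical service that keeps a last-known-good snapshot of its state."""

    def __init__(self, config):
        self.config = dict(config)
        self._snapshot = copy.deepcopy(self.config)

    def checkpoint(self):
        """Record the current configuration as the last-known-good state."""
        self._snapshot = copy.deepcopy(self.config)

    def healthy(self):
        """Toy health check: the service needs a positive worker count."""
        return self.config.get("workers", 0) > 0

    def recover(self):
        """Restore the snapshot if the health check fails; report whether we did."""
        if self.healthy():
            return False
        self.config = copy.deepcopy(self._snapshot)
        return True


svc = RecoverableService({"workers": 4, "mode": "normal"})
svc.config["workers"] = 0  # simulate a service disruption
restored = svc.recover()
```

A real system would also need to decide when a state is good enough to checkpoint, which this sketch leaves to the caller.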
Active defense techniques should be designed with these six principles as goals or features. Though counterattack or hacking back may be a part of this strategy, designing for resilience is itself an active defense. Status quo cyber defenses such as firewalls, IDS, and system patching are static, brittle, and require operator intervention. Our next generation of defenses must encompass these resilience principles.
Active defense strategies like Argonne’s MORE MTD can increase an application’s resilience threshold by varying access paths or attack surfaces. In MORE MTD, such variance does not affect the brittleness of any individual operating system, but the application grows in overall resilience by residing on multiple platforms. We improve resilience by increasing uncertainty and by reacting to failure. For a security system, this means both attempting to prevent unanticipated execution of code and recovering sanely from any such execution. The goals are to reduce the likelihood of a successful exploit, reduce the impact of any successful exploit, and keep the application up and available both during rotation and in the event of a successful exploit.
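The rotation idea can be illustrated with a toy model. The sketch below is not Argonne’s MORE MTD implementation; the `RotatingPool` class, the platform names, and the compromise-reporting hook are all hypothetical. It only shows the two behaviors described above: moving the application between platforms at random, and reacting to a suspected exploit by rotating away from the affected platform.

```python
import random


class RotatingPool:
    """Toy rotation over platforms; not Argonne's MORE MTD implementation."""

    def __init__(self, platforms, rng=None):
        self.available = list(platforms)
        self.rng = rng if rng is not None else random.Random()
        self.active = self.rng.choice(self.available)

    def rotate(self):
        """Switch to a randomly chosen platform other than the active one."""
        candidates = [p for p in self.available if p != self.active]
        if candidates:
            self.active = self.rng.choice(candidates)
        return self.active

    def report_compromise(self, platform):
        """Drop a suspect platform and rotate away from it if it was active.

        The last remaining platform is kept so the application stays up.
        """
        if platform in self.available and len(self.available) > 1:
            self.available.remove(platform)
        if self.active == platform:
            self.rotate()


# Hypothetical platform names.
pool = RotatingPool(["linux-a", "bsd-b", "linux-c"], rng=random.Random(7))
before = pool.active
after = pool.rotate()
pool.report_compromise(after)
```

Keeping the last platform in service, even when it is suspect, is a design choice that favors availability; a stricter policy could take the application offline instead.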
This post was written by: Mike Thompson
i Meyer, J. F. (September 2009), Defining and Evaluating Resilience: A Performability Perspective, presentation at International Workshop on Performability Modeling of Computer and Communication Systems.
ii Rieger, C. G. (August 2010), Notional examples and benchmark aspects of a resilient control system, 3rd International Symposium on Resilient Control Systems, pp. 64–71.
iii Schneier, B. (June 2015), Incident Response Is Key to Resilience, http://www.telecomasia.net/content/incident-response-key-resilience, Accessed Dec 15, 2015.
iv Ricciulli, L. (March 1998), ANCORS: Adaptable Network Control and Reporting System, http://www.csl.sri.com/papers/sri-csl-98-01/sri-csl-98-01.pdf, Accessed Dec 15, 2015.
v Baker, J. (Jan 2004), Mapping the Risks: Assessing Homeland Security Implications of Publicly Available Geospatial Information, p. 38.
vii Mace, J. (August 2015), Resiliency Variance in Workflows with Choice, Software Engineering for Resilient Systems 7th Annual Workshop, pp. 126–137.
viii Cardenas, A. (June 2008), Secure Control: Towards Survivable Cyber-Physical Systems, 28th International Conference on Distributed Computing Systems, https://www.truststc.org/pubs/345/cardenas-SecureControl-v1.pdf, Accessed Dec 15, 2015.
ix Swanson, M. (June 2002), Contingency Planning Guide for Information Technology Systems, NIST Special Publication 800-34, http://ithandbook.ffiec.gov/media/22151/ex_nist_sp_800_34.pdf, Accessed Dec 15, 2015.