CS 410/510 - Software Engineering class notes

Reference: Sommerville, Software Engineering, 10th ed., Chapter 14

The big picture

The resilience of a system is a judgment of how well that system can maintain the continuity of its critical services in the presence of disruptive events, such as equipment failure and cyberattacks. This view encompasses three ideas:

Some of the services offered by a system are critical services whose failure could have serious human, social or economic effects.
Some events are disruptive and can affect the ability of a system to deliver its critical services.
Resilience is a judgment: there are no resilience metrics and resilience cannot be measured. The resilience of a system can only be assessed by experts, who can examine the system and its operational processes.

Resilience engineering places more emphasis on limiting the number of system failures that arise from external events such as operator errors or cyberattacks. Assumptions:

It is impossible to avoid system failures, so resilience engineering is concerned with limiting the costs of these failures and recovering from them.
Good reliability engineering practices have been used to minimize the number of technical faults in a system.

Four related resilience activities are involved in the detection of and recovery from system problems:

Recognition: The system or its operators should recognize early indications of system failure.
Resistance: If the symptoms of a problem or cyberattack are detected early, then resistance strategies may be used to reduce the probability that the system will fail.
Recovery: If a failure occurs, the recovery activity ensures that critical system services are restored quickly so that system users are not badly affected by the failure.
Reinstatement: In this final activity, all of the system services are restored and normal system operation can continue.
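The four activities (recognition, resistance, recovery, reinstatement) can be read as phases that a system moves through when a problem arises. The following is a minimal sketch of that reading as a small state machine. The event names such as "warning_detected" and the extra NORMAL phase are assumptions made for illustration and do not come from the notes.

```python
from enum import Enum


class Phase(Enum):
    NORMAL = "normal operation"
    RECOGNITION = "recognition"
    RESISTANCE = "resistance"
    RECOVERY = "recovery"
    REINSTATEMENT = "reinstatement"


# Hypothetical event names mapped to the phase transitions they cause.
TRANSITIONS = {
    (Phase.NORMAL, "warning_detected"): Phase.RECOGNITION,
    (Phase.RECOGNITION, "false_alarm"): Phase.NORMAL,
    (Phase.RECOGNITION, "threat_confirmed"): Phase.RESISTANCE,
    (Phase.RESISTANCE, "attack_repelled"): Phase.NORMAL,
    (Phase.RESISTANCE, "failure_occurred"): Phase.RECOVERY,
    (Phase.RECOVERY, "critical_services_restored"): Phase.REINSTATEMENT,
    (Phase.REINSTATEMENT, "all_services_restored"): Phase.NORMAL,
}


def next_phase(current, event):
    """Return the phase that follows `current` when `event` happens.

    Events that are not valid in the current phase leave it unchanged.
    """
    return TRANSITIONS.get((current, event), current)


def run(events):
    """Replay a sequence of events and return every phase visited."""
    phase = Phase.NORMAL
    visited = [phase]
    for event in events:
        phase = next_phase(phase, event)
        visited.append(phase)
    return visited
```

Note that resistance can end in two ways in this sketch: the attack is repelled and the system returns to normal operation, or a failure occurs and recovery begins.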

Cybersecurity

Cybercrime is the illegal use of networked systems and is one of the most serious problems facing society.

Cybersecurity is a broader topic than system security engineering.

Cybersecurity is a socio-technical issue covering all aspects of ensuring the protection of citizens, businesses, and critical infrastructures from threats that arise from their use of computers and the Internet.

Cybersecurity is concerned with all of an organization’s IT assets, from networks through to application systems.

Factors contributing to cybersecurity failure:

Organizational ignorance of the seriousness of the problem,
Poor design and lax application of security procedures,
Human carelessness,
Inappropriate trade-offs between usability and security.

Cybersecurity threats:

Threats to the confidentiality of assets: data is not damaged, but it is made available to people who should not have access to it.
Threats to the integrity of assets: systems or data are damaged in some way by a cyberattack.
Threats to the availability of assets: attempts to deny the use of assets by authorized users.

Examples of controls to protect the assets:

Authentication, where users of a system have to show that they are authorized to access the system.
Encryption, where data is algorithmically scrambled so that an unauthorized reader cannot access the information.
Firewalls, where incoming network packets are examined and then accepted or rejected according to a set of organizational rules.
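To make the firewall control concrete, here is a minimal sketch of first-match packet filtering using Python's standard ipaddress module. The Rule type, the decide function and the default-reject policy are assumptions made for this example, and a real firewall inspects far more than the source address and destination port.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str           # "accept" or "reject"
    source: str           # CIDR block the rule applies to, e.g. "10.0.0.0/8"
    port: Optional[int]   # destination port, or None to match any port


def decide(rules, src_ip, dst_port, default="reject"):
    """Return the action of the first rule that matches the packet.

    Packets that match no rule get the default action (assumed: reject).
    """
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        in_network = addr in ipaddress.ip_network(rule.source)
        port_matches = rule.port is None or rule.port == dst_port
        if in_network and port_matches:
            return rule.action
    return default
```

For example, with a rule list that rejects everything from 203.0.113.0/24 and then accepts port 443 from any address, a packet from 203.0.113.7 to port 443 is rejected because the first matching rule wins.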

Redundancy and diversity are valuable for cybersecurity resilience:

Copies of data and software should be maintained on separate computer systems (supports recovery and reinstatement).
Multi-stage diverse authentication can protect against password attacks (supports resistance).
Critical servers may be over-provisioned, i.e. they may be more powerful than is required to handle their expected load (supports resistance).
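As an illustration of how redundant copies support recovery, the sketch below keeps a known-good SHA-256 digest of some data and uses it to pick an undamaged copy from several storage locations. The function names and the location labels are hypothetical, and the sketch assumes the digest itself is stored somewhere an attacker cannot alter.

```python
import hashlib


def digest(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def recover(replicas, expected_digest):
    """Return (location, data) for the first replica matching the digest.

    `replicas` maps a location label to the bytes stored there.
    Returns None if every copy is damaged.
    """
    for location, data in replicas.items():
        if digest(data) == expected_digest:
            return location, data
    return None
```

If the primary copy has been tampered with but an offsite copy still matches the recorded digest, recover returns the offsite copy, which can then be used to restore the primary.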

Cyber resilience planning:

Asset classification: the organization’s hardware, software and human assets are examined and classified according to how essential they are to normal operations.
Threat identification: for each of the assets (or, at least, the critical and important assets), you should identify and classify threats to that asset.
Threat recognition: for each threat or, sometimes, asset/threat pair, you should identify how an attack based on that threat might be recognized.
Threat resistance: for each threat or asset/threat pair, you should identify possible resistance strategies. These may be either built into the system (technical strategies) or may rely on operational procedures.
Asset recovery: for each critical asset or asset/threat pair, you should work out how that asset could be recovered in the event of a successful cyberattack.
Asset reinstatement: this is a more general process of asset recovery where you define procedures to bring the system back into normal operation.
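One way to keep track of the planning steps above is a simple register of assets and their threats that can be checked for missing entries. The following sketch assumes a three-level criticality scale ("critical", "important", "routine") and free-text strategy fields; both are illustrative choices, not something the notes prescribe.

```python
from dataclasses import dataclass, field


@dataclass
class ThreatPlan:
    threat: str
    recognition: str = ""   # how an attack based on this threat is recognized
    resistance: str = ""    # technical or operational resistance strategy
    recovery: str = ""      # how the asset is recovered after an attack


@dataclass
class Asset:
    name: str
    criticality: str        # assumed scale: "critical", "important", "routine"
    threats: list = field(default_factory=list)


def planning_gaps(assets):
    """List (asset, threat, missing step) for critical and important assets."""
    gaps = []
    for asset in assets:
        if asset.criticality == "routine":
            continue
        for plan in asset.threats:
            for step in ("recognition", "resistance", "recovery"):
                if not getattr(plan, step):
                    gaps.append((asset.name, plan.threat, step))
    return gaps
```

Running planning_gaps over the register gives a list of asset/threat pairs that still lack a recognition, resistance or recovery strategy.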

Socio-technical resilience

Resilience engineering is concerned with adverse external events that can lead to system failure.

To design a resilient system, you have to think about socio-technical systems design and not focus exclusively on software.

Dealing with these events is often easier and more effective in the broader socio-technical system.

Four characteristics that reflect the resilience of an organization:

The ability to respond: Organizations have to be able to adapt their processes and procedures in response to risks. These risks may be anticipated risks or may be detected threats to the organization and its systems.
The ability to monitor: Organizations should monitor both their internal operations and their external environment for threats before they arise.
The ability to anticipate: A resilient organization should not simply focus on its current operations but should anticipate possible future events and changes that may affect its operations and resilience.
The ability to learn: Organizational resilience can be improved by learning from experience. It is particularly important to learn from successful responses to adverse events, such as the effective resistance of a cyberattack.

People inevitably make mistakes (human errors) that sometimes lead to serious system failures. There are two ways to consider human error:

The person approach. Errors are considered to be the responsibility of the individual, and ‘unsafe acts’ (such as an operator failing to engage a safety barrier) are a consequence of individual carelessness or reckless behavior.
The systems approach. The basic assumption is that people are fallible and will make mistakes. People make mistakes because they are under pressure from high workloads, because of poor training, or because of inappropriate system design.

Systems engineers should assume that human errors will occur during system operation.

To improve the resilience of a system, designers have to think about the defenses and barriers to human error that could be part of a system.

Should these barriers be built into the technical components of the system (technical barriers)? If not, they could be part of the processes, procedures and guidelines for using the system (socio-technical barriers).

Defensive layers have vulnerabilities: they are like slices of Swiss cheese, with holes in each layer corresponding to these vulnerabilities.

Vulnerabilities are dynamic: the ‘holes’ are not always in the same place, and the size of the holes may vary depending on the operating conditions.

System failures occur when the holes line up and all of the defenses fail.

Strategies to increase system resilience:

Reduce the probability of the occurrence of an external event that might trigger system failures.
Increase the number of defensive layers. The more layers you have in a system, the less likely it is that the holes will line up and a system failure will occur.
Design a system so that diverse types of barriers are included. The ‘holes’ will probably be in different places, so there is less chance of the holes lining up and failing to trap an error.
Minimize the number of latent conditions in a system. This means reducing the number and size of system ‘holes’.
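The effect of these strategies can be illustrated with a rough calculation. The sketch below assumes, as a simplification that the notes do not state, that each defensive layer fails independently with a fixed probability, so a system failure requires the triggering event and a hole in every layer at once. The function name and the numbers used with it are purely illustrative.

```python
import math


def failure_probability(event_probability, layer_hole_probabilities):
    """Probability that an external event passes through every layer.

    Assumes the layers fail independently of each other, which is the
    idealized case that diverse barriers aim to approximate.
    """
    return event_probability * math.prod(layer_hole_probabilities)
```

Under this assumption, each strategy lowers one factor of the product: a less likely triggering event reduces the first term, an extra layer adds another factor below one, and shrinking the holes reduces the per-layer probabilities. If barriers are not diverse, their failures are correlated and the true probability is higher than this product suggests.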

Resilient systems design

Designing systems for resilience involves two streams of work:

Identifying critical services and assets that allow a system to fulfill its primary purpose.
Designing system components that support problem recognition, resistance, recovery and reinstatement.

Survivable systems analysis

System understanding: for an existing or proposed system, review the goals of the system (sometimes called the mission objectives), the system requirements and the system architecture.
Critical service identification: the services that must always be maintained and the components that are required to maintain these services are identified.
Attack simulation: scenarios or use cases for possible attacks are identified along with the system components that would be affected by these attacks.
Survivability analysis: components that are both essential and compromisable by an attack are identified and survivability strategies based on resistance, recognition and recovery are identified.
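The last step of the analysis amounts to intersecting two sets: the components needed by critical services and the components that some attack scenario can reach. A minimal sketch follows, with the input shapes (dictionaries mapping services and attack scenarios to sets of component names) assumed for illustration.

```python
def essential_components(critical_services, service_components):
    """Union of the components required by each critical service."""
    needed = set()
    for service in critical_services:
        needed |= service_components[service]
    return needed


def compromisable_components(attack_scenarios):
    """Union of the components affected by any attack scenario."""
    affected = set()
    for components in attack_scenarios.values():
        affected |= components
    return affected


def survivability_targets(critical_services, service_components, attack_scenarios):
    """Components that are both essential and compromisable by an attack."""
    essential = essential_components(critical_services, service_components)
    return essential & compromisable_components(attack_scenarios)
```

The resulting set is where survivability strategies based on resistance, recognition and recovery should be concentrated first.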

Source: https://cs.ccsu.edu/~stan/classes/CS410/Notes16/14-ResilienceEngineering.html