Considerations for Planning a Multi-Platform Energy Utility System

A federated cloud-based multi-platform Power System is presented to meet the growing challenges confronting power system operators. It uses a fede-rated architecture to provide a group sourced increase in cyber security, in reducing the need for computing resource overcapacity, for sharing computing and power resources during emergencies, for minimizing energy costs, and for sharing information on threats and incident responses. In the face of nation-state and organized crime complex, multi-technology, coordinated attacks, a single organization stands an ever reducing chance of remaining safe. The proposed federated cloud preserves the economic efficiency advantages of marketplace of non-monopolistic organizations innovating to obtain competitive advantage with shared preparation, resources, information, and resiliency enabled by individual Power System cloud-based computing creating a fe-derated System. The paper applies earlier advances. This paper combines the results previously published in different publications and applies them to a single paradigmatic example of a power system consisting of a number of individual asset owners. It includes the architecture, model of energy, and resource sharing as well as a novel, self-learning, semantic-less breach detection system for detecting anomalous behavior in resource usage across the power system participants. The paper extends previous work published about fede-rated cloud. The simulations results provided to demonstrate the usefulness of the proposed system.


Introduction
Introduction Energy sector organizations, or market segments of the energy sector, that serve and process a large number of simultaneous users and large quantities of data require "cloud computing services" as part of their information technology process.These services enable convenient, on-demand network access to a shared pool of configurable computing resources.Cloud computing provides a rapidly growing share of industry-wide IT resources.IT related spending toward workload processing increased 32.8 percent in the year 2015, 29 percent in 2016 and 29 percent during 2017 1 .The Energy sector is no exception to this trend.Energy-based companies are using more IT to gather and process data for their business processes.The energy sector faces dynamic load patterns and generation and distribution processes that are complex and changeable at multiple rates.This, in turn, requires broadband, decentralized processing, digital storage and data management.
As an increasing fraction of computing services move to the cloud, there will be a proliferation of software characteristics, service models, and deployment options.Many organizations are moving towards hybrid cloud/hosted computing models.Energy sector organizations are most interested in availability, adaptability, and security.Availability refers to reliable service conditions that make energy related services available to the users it serves.Adaptability relates to the Vendor lock-in risk [1].Security covers the risk energy sector systems exposed while hosted in the cloud or the organization's premise.The security risk applies to the likelihood of service interruption caused by malicious intent.The single Cloud Service Provider (CSP) model provides a sub-optimal solution for the electric utility or electric power holding companies from adaptability, availability, and cyber security perspectives.This paper applies to Power Systems the non-specific results developed in earlier publications [2] [3].
In addition, upsets in generation and distribution can more likely exhibit instabilities if multiple clouds are interconnected via the internet but not coordinated to share critical information and resources to compensate for outages or unplanned loads.To exchange information, computing resources and enable load sharing, multi-cloud architectures become an attractive solution.However, a multi-cloud solution implemented by a single electric utility or electric power holding company won't provide the resilience and adaptability that multi-organization cloud services provide.Also, a single organization managing multiple cloud instances requires expensive adaptations to each CSP's tools and service constructs that may vary among different CSPs.A federated cloud allows multiple CSPs to scale and optimize cross-datacenter deployments and cross-regional deployments by suggesting a cross-cloud provider's resource sharing collaboration via a cloud aggregator.Furthermore, it removes the lock-in risk and minimizes the security risk by spreading the total risk among 1 Gartner Says Worldwide Cloud Infrastructure-as-a-Service Spending to Grow $246.8 billion in 2017.
A basic goal of an energy sector CSP organization is to maximize its market share among the energy service provider (ESP).Accommodating variable demands for computing resources requires an immense capacity, as it calls for providing for the maximum demand.In some cases, this drives CSP's to underutilize massive datacenter deployments.In other situations, the CSPs suffer over-utilization because of a misestimate of the market share, load, and reliability projections.These cases lead to suboptimal utilization and suboptimal revenue projection.
Cyber security is essential for successfully operating a business.ESPs, from government agencies to private companies, rely on information technology to perform critical and essential functions.All organizations face cyber risks today.
Varied threat actors can exploit vulnerabilities in software, hardware, or processes, leading to significant and occasionally catastrophic consequences for an enterprise, industry, region, or even an entire country.Cyber risks are increasing in number and scope and are constantly evolving.Every day, malicious actors attack, disrupt, or steal sensitive information assets or processes from public and private sector computer networks, causing significant damage to business and government.Though quantifying this problem precisely is impossible, experts have estimated that the aggregate economic impact from cyber-attacks may range from $300 billion to $1 trillion globally when totaling direct and indirect costs from actual money stolen, the value of intellectual property and business secrets stolen, the cost of downtime caused by malicious disruption, and recovery expenses for repairing or replacing damaged networks and equipment [4].
What has become increasingly clear is that preventing cyber-attacks is not possible.What needs to be done, after defending information resources, is managing risks and the impacts of successful cyber-attacks.This, in turn, requires organizations to mitigate as many vulnerabilities as possible, observe indicators of attack, deploy resources to mitigate impacts and pursue attackers, and adapt the system to deter future attacks.To make matters more critical and complex, attacks themselves have become more complex in multi-platform environments.Whereas older cyber-attacks originated and terminated on information systems, newer attacks originate from information systems, cyber physical systems, or physical security systems and terminate in various permutations and combinations of each type of inter-networked computing environment.In this paper we describe a specific use case of the polymorphic malware detection system described earlier [2] [3].
In the face of coordinated attacks from nation-states and organized crime complexes, a single organization stands little chance of remaining safe.Yet economically, the best performance is with market places of non-monopolistic organizations innovating to obtain competitive advantage.Thus the key challenge of the coming decade is to maintain competitive environments while pooling cyber defense resources and developing a capability to be more agile than at-Energy and Power Engineering tackers.That is the purpose of this paper and of earlier papers in our body of work [2] [8].A proposed solution to help meet these needs is the federation of cloud computing resources amongst organizations with common interests or which operate in a common market segment.The federation provides shared services for smoothing energy load variations, computing load variations, and security-based resource optimization.

Federation of Multiple Cloud Services
Cloud federation is an emerging new paradigm that interconnects the cloud computing environments of two or more CSPs for the purpose of load balancing traffic and accommodating spikes in demand.The approach allows numerous cloud service providers to use computing resources optimally [3].Also, it allows ESP to avoid the CSP lock-in risk and deliver service availability that can not be provided by a single CSP.No matter what the architecture, there is a need to ensure security and information assurance to users, to manage energy costs, and to share computing capacity to smooth loads and provide mutual backup.[3] specifies the selection principals ESPs needs.Below we name few: 1) Availability: Reliable service conditions that make its services available to the users it serves.
Reliability is defined by SLA and is measured by the allowed unavailability, aka, downtime.Such measurements are done with the help of independent third party service .e.g., Gartner's CloudHarmony2 .
2) Latency: Some of the SPs workloads are sensitive to network latency, which is defined by the time the service takes to respond to a user or other sub-system request.In the case of a service that runs in a geo-location, which is different from that of the end-user or other sub-systems, the network latency can impact the overall service performance.Therefore, SPs who run latency-sensitive workloads prefer to provide their service by maintaining optimal proximity to its end-users or sub-systems in which SPs' workloads interoperate with.
3) Adaptability: The Vendor lock-in risk is one of the core business risks that every enterprise, who wishes to offload its workloads to the cloud, faces.Current IT practices rely on common standards and protocols that allow organizations to switch components or elements in their IT operations.However, cloud computing disrupts most of these practices.Onboarding into a single CSP introduces a risk vector that locks the SP to use the CSP's platform, API's and tools.Adopting a single CSP requires an operational adaptation to CSP's methods.New needs on the SP side or changes in the CSP service terms might sub-optimize the operations of the SP.A Federated Cloud removes that risk vector by creating a CSP agnostic apparatus that allows the SP to adapt when the vendor lock-in plays a critical role in the migration decision.
Cloud Federation is an advantageous structure for aggregating cloud based services under a single umbrella.It is designed to share resources and responsibilities for the benefit of member cloud service providers.[3] [6] discusses the Energy and Power Engineering required framework of managing multiple cloud providers.Federation is useful not only for sharing resources amongst cloud service providers but also for providing enclaves for performing domain-specific missions such as management of electrical grids and supply chains [2].Some responsibilities of an effective federation include assuring that data transfers amongst the federation's CSPs are secure.The Federation will, above all, need to detect any anomalous behavior occurring in transactions and resource sharing.In addition to the growing number of security tools, there is a need to log and identify security issues requiring attention early on in the process.In particular, breach detection in inter-cloud data transfer and communications is a particularly serious security issue because of the possibility of an attacker potentially gaining access to more than one CSP federation member [2].
As an example of the benefits of federated cloud, we propose an energy-sector cloud federation that enables energy organizations' IT workloads to operate across numerous CSPs.The federation will enable an energy-based common dataset that can be used for breach detection and provide other benefits described below.
The federation is comprised of a multi-cloud sector-based enclave within a CSP network that interconnects with other enclaves hosted in other CSPs.The core idea is not bounded to any sector and is equally valid for any industry sector.This paper focuses on the energy sector and its computing needs.The cloud computing performance patterns analyzed by [3] [6].The analysis includes the federation behavior that might emerge out of the performance patterns.e.g., pricing, datacenter carbon footprint.
The reminder of the paper is organized as follows, Section 3 (Terminology) discusses the terminology used in this paper; Section 4 (Energy Sector Threat Landscape) summarizes the main threats the energy sector is currently facing; Section 5 (Why Energy-based Federated Cloud?) describes the solution federated cloud provides to the energy sector threats; Section 6 (Cyber Security Challenges in Federated Cloud) discusses the operational and Infrastructure security challenges of federated cloud; Section 7 (Semantic-less Breach Detection) discusses the tool we suggest for detecting the anomalous behavior of the systems that run and connect within the Cloud Federation; Section 8 (Evaluation) discusses the breach-detection tool prototyped and presents the prototype results; finally, Section 9 (Conclusions) summarizes the main results and the original research question discussed herein.

Terminology
The following section briefly defines the important terms used in this paper.
ESP (Energy-Service-Provider): A utility, grid operator or power plant, usually an organization with end-consumers who require processing of IT workloads.
We will use the terms Service Provider (SP) and Energy Service Provider (ESP) interchangeably throughout the paper.CSP (Cloud-Service-Provider): One of the System of Systems constituent that offers computing resources, digital storage and network bandwidth to its customers and the cloud federation to process its workloads.Also, it provides the software that provisions and manages cloud services.Public cloud.It offers computing resources, e.g.broadband network, computing, storage, and infrastructure applications over the public Internet.The organization, that chooses to run their workloads on the public cloud is considered as a cloud tenant.Public cloud providers adhere to generic service level agreements for service availability and security.
Private cloud: It can include public cloud offerings, excluding the multi-tenancy property.However, multi-tenancy can be implemented within the enterprise that operates the private cloud.Therefore, the implementation might be customized to adhere to specific enterprise needs.
Hybrid cloud: The hybrid cloud aggregates several public and private clouds to run heterogeneous workloads that might span across different geographical locations and enterprises.
IT Workload.These are the organization's IT needs to serve internal, external, and users' IT services and data.Cloud workloads are broadly of two types: online and offline.The former provides low-latency, read/write access to data.For example, a web user requests a web page to load online and serve within a fraction of a second.The latter provides batch-like computing for tasks that process the data offline, which is reported later to users by the systems servers; for example, the search results based on a pre-calculated index.Production offline workloads usually comprise mainly unstructured data sets, such as click stream, web graph, and sensors data.The service level objectives (SLO) for online jobs span a fraction of a second, and those for offline job goals span hours, days and, sometimes weeks.
OT (Operational Technology): These are the workloads generated by the cyber physical systems such as SCADA systems in the control rooms of power companies.They are systems that run on specific functionality that does not include an operating system even though they can communicate with the network.Unlike IT workloads, OT workloads runs on the organization premises and not in the cloud or other remote facility.Also, OT workloads omit operational logs that can be used by other OT workloads or pushed to other systems and turn into IT workloads.ICS (Industrial control system): general term for OT systems used in manufacturing for system that control energy, water, and other critical commodities.
Control Plane: This is the software that automatically controls the operations of software-based systems.It is a rule-based system that accepts signals from various system components and acts, based on a pre-defined policy.SLA (Service Level Agreement): This is an agreement between the Cloud-Service Provider and its customers, the Service-Providers.It often includes guaranteed levels of availability, network latency, and numerous other provisions.

Energy Sector Threat Landscape
The IT workloads and Operational Technology (OT) systems in the energy sector are different enough from those of other infrastructure sectors that a separate approach to forensics is justified.Cyber forensic technologies are advancing rapidly, especially for IT systems.Section 5 explores various challenges and solutions the cloud federation offers.
OT forensics, and specifically, energy sector cyber technologies, are not advancing as rapidly as IT systems and networks.A cyber-security threat-aware system that keeps tracks of the control system configuration and behavior and that shares security information with similar systems will reduce reaction time and increase the agility if resource sharing in crisis will enhance systematic resiliency against cyber threats.Section 7 introduces a method for enabling high level of security for combined IT and OT workloads.
The following sections will describe various threat actors and survey recent major cyber-attacks against the energy sector.The survey will be used to build a system model that implements an energy centric Cloud Federation model and proposes an approach for managing multi-threat cyber-attacks.

Threat Actors
Threat actors are motivated to launch attacks for a variety of reasons: 1) Espionage, including governments stealing information to gain geopolitical advantage and governments or government-sanctioned actors stealing information to give businesses competitive advantage.2) War, in which government actors cause severe destruction or disruption as part of an international conflict.3) Disruption, including attacks by non-government actors or governments pursuing hostile actions that cause serious damage but that do not rise to the level of war.4) Hacktivism, in which individuals or groups cause disruption or repetitional damage for ideological or political reasons.5) Crime, in which individuals, small groups, or large criminal organizations commit financial theft or steal information to sell in the dark web.Cyber-attacks against companies can happen in three primary ways.The first is through inside access, including insiders such as Edward Snowden or physical network access.A second way is remote penetration, such as through spear-phishing emails.The third method is by exploiting the IT/OT supply chain, as in the case of the malware dubbed "Zombie Zero", which was deployed against the shipping and logistics industry across the globe.
In the "Zombie Zero" episode, researchers discovered that hand-held terminal scanners used for inventory at ports were being shipped from their Chinese manufacturer with malware embedded alongside their OT operating system.The malware looked for financial information being processed by the hand-held scanner and exfiltrated that data.The IT supply chain path can also be used via software, without involvement by a hardware manufacturer 3 .Energy and Power Engineering Within these three methods, attackers use many techniques: stealing passwords and identifying information to compromise legitimate users' access; exploiting vulnerabilities in Internet-facing websites or applications; utilizing insiders; using backdoors in hardware; intercepting Internet communications through "man-in-the-middle" attacks; or entering a company's corporate or industrial control system networks through software patches and vendor updates.
In many campaigns and attacks, attackers combine techniques to gain entry undetected and to maintain persistent access and control within a target's networks.
As companies assess dynamic cyber risks today and for years to come, it is critical to note that cyber capabilities proliferate, giving previously less-sophisticated actors such as non-state groups access to more advanced attack skills and techniques: "Whatever a nation state uses today, organized crime will use tomorrow and hacktivists will use the day after that."In fact, the emerging threat space is characterized by sharing and cooperation between different cyber-attackers.Alexander Klimburg, a security researcher with Harvard's Kennedy School of Government and The Hague Centre for Strategic Studies, writes that "Cybercrime, cyberterrorism, and cyberwarfare share a common technological basis, tools, logistics and operational methods."This is particularly true of tools and techniques for delivering malware.For example, hackers who research "zero days", meaning vulnerabilities in code that have not yet been discovered or disclosed and therefore can be exploited before they are "patched", can sell their zero days on a black market for as much as $200,000 or, as in the case of hackers, "lending their zero-day hacks to the government for espionage purposes, then using them for crime later."

Canonical Examples of Cyber-Attacks against the Energy Sector
The energy sector has been an important target for cyber-attacks around the world.Intelligence from news media, various threat reports, U.S. Government organizations such as Industrial Controls Systems Computer Emergency Response Team (ICS-CERT), and other sources confirm the rising sophistication of malware and a growing interest not only against corporate networks but also against industrial control systems and energy critical infrastructure in particular.
Based on incidents reported to the Department of Homeland Security (DHS), the energy sector led all other sectors in 2014 in the number of reported cyber-attacks.Attacks reported outside the energy sector, for example in other critical infrastructure, are also of interest as some were targeted at ICS equipment manufacturers, showing increasing malicious activity in this space.The ICS vendor community may be a target for sophisticated threat actors for a va- One alarming trend in cyber threats against the energy sector is that they are moving toward disruption or even destruction, in addition to espionage and theft.This trend was recently confirmed in the cyber-attack against the Ukrainian electric grid [9].In the "Shamoon" attack, Saudi Arabian state-run oil giant Saudi Aramco came under a targeted and advanced attack from a hacktivist group known as the Cutting Sword of Justice, allegedly because of the company's role as a financial hub for the Saudi regime.The attackers used "wiper" malware which was also used in the Ukrainian attack, which renders computers useless, to disable 30,000 OT workstations and disrupt the internal operations and workstations of Saudi Aramco for days [10].Even more alarming, the ultimate target of the attack was not the corporate network but the ICS systems that control production and distribution of oil and gas.Abdullah al-Saadan, Aramco's vice president for corporate planning, stated that the attackers sought to "stop the flow of oil and gas to local and international markets."Fortunately, the malware contained an error and was unable to spread from the corporate network to the production ICS network.In addition, the energy sector has experienced several "cyber-physical" attacks in which cyber methods are used to cause "real-world" physical damage to ICS systems.Bloomberg News recently reported an attack on a major oil pipeline in Turkey in which alleged cyber-attackers caused an explosion.Also recently, the German Federal Office for Information Security (BSI) published information on an attack against a steel facility that caused "massive damage", though more details have not been forthcoming 5 .These incidents, as well as the alleged U.S.-Israel Stuxnet attack that disrupted Iranian nuclear operations, highlight the significant risk for the energy sector of cyber-attacks causing physical damage [11].
The threat of operational disruption is difficult to assess and prevent because the line between espionage and disruption is blurred.Although the majority of cyber intrusions in the energy sector to date can be characterized as espionage, the information being exfiltrated could easily be used to enable disruption or even sabotage at some time in the future.Espionage attacks also allow attackers to maintain a presence inside a corporate or production network that could be used to cause disruption or sabotage at a later time, for example in the event of an international conflict, as suggested by Admiral Michael Rogers, Director of the National Security Agency (NSA).In highlighting the risk of disruption or destruction, Eugene Kaspersky, the Moscow-born founder of security research company Kaspersky Labs, noted that there has been a dramatic surge in targeted attacks against power grids, banks, and transportation networks around the world and warned that groups targeting crucial infrastructure have "the capacity to inflict very visible damage.The worst terrorist attacks are not expected".

Why an Energy-Based Federated Cloud?
One of the main organizational failures in the enumerated attacks is the lack of cooperation amongst energy providers.The attacks' precursors and potential impact failed to provide timely warning and options for incident response throughout the existing tools, techniques and staff that were available to the energy organizations.Moreover, even if these data were available it lacked the attack context, and preparation, and thus the cyber attack detection was delayed, often resulting in significant damage to the organization.This paper describes how the federated cloud will provide sharing of security data, energy supply loads, information processing and forensic response amongst the Federation member organizations while enabling them to maintain their independence and corporate critical business information.This latter consideration is currently a main barrier for organizational cyber security collaboration.
Federated clouds represent a new paradigm that allows multiple cloud pro-viders (energy corporate private, public, or hybrid) to optimally utilize computing, energy, cyber security, incident response and forensic resources.It will also increase productivity by reducing operating costs.For example, in terms of energy use, which represents over 20% of cloud provider expenses, it does this by, 1) lowering the Federation members' data center's deployments per provider ratio, by sharing energy, and 2) scheduling available energy via aggregators and 3) employing, where appropriate, more renewable and carbon-free energies.The earlier paper provides a more in-depth model of combining renewable and baseline energy to save costs and reduce non-renewable energy consumption in cloud provider data centers [3].
The federated cloud utilizes a software containerized software instead of virtualized software.The goal of containerized software is to obtain resource density and and allocation elasticity.It uses Linux-based Containers orchestrated by a Kubernetes6 resource management system.The Kubernetes system governance the job scheduling and resource allocation [12].Furthermore, the proposed solution scales out and optimizes cross-datacenters and cross-regional deployments by computing and suggesting useful cross-cloud provider's collaboration via the Federation cloud aggregator.Furthermore, it suggests operating data centers employing maximum intermittent green energy sources [3] [5] [7].Yet all of these advantages will be for naught if federation services cannot be supplied securely in the face of growing sophistication and quantities of cyber security threats.The baseline metric is that the losses arising from cyber breaches are substantially less than the value of services provided.This issue is detailed in an earlier paper from our group [2].

Prototype Federated Cloud Architecture
The Cloud Federation architecture is comprised of multiple CSPs, a Cloud-Coordinator, and a Cloud-Broker system.These are defined below.
Cloud-Coordinator.Acts as an information registry that stores the CSPs dynamic pricing offers and demand patterns.Clouds Coordinators periodically update the CSPs availability and offering prices.Also, a Cloud-Coordinator will help integrate, where appropriate, more renewable and carbon-free energies [7].
Cloud-Broker.Manages the membership of the CSP constituents.Both CSPs and ESPs will use the Cloud-Broker to add new cloud federation members.Also, the Cloud-Broker will act on behalf of the individual ESP for resource allocation and provisioning requests.Cloud-Broker also provides a continuous ability to deploy SP software, configuration, and data to one or more member CSPs.In this manner it will provide a multi-utility power system with agility, resilience, capacity smoothing and financial efficiencies.

Cyber Security Challenges in Federated Cloud
The Cloud Federation has a global scale software and hardware infrastructure.Figure 1 describes the cyber security layers offered by the cloud federation.
The following paragraph briefly describes the security elements corresponding to each layer 7 .[2] extended the cyber security model and emphasizes the operational security employing our proposed novel breach detection methodology.
This was done since the operational security corresponds to the perimeter security of an enterprise system and the interface to the Federation members.It is readily configured to handle multi-technology and digital domain threats such as those to IT, IoT, and physical technology system.Also, it will suggest a systems for encryption of inter cloud provider micro-services communication, with emphasis on cross-CSPs (Power System Federation members) for tenants' workloads.

Infrastructure Security
The required baseline security level needed for cloud federation constituent's systems is referenced in Datacenter Premises.CSPs design and build their datacenters based on their expected computing capacities and service reliability defined by their SLA and the resulting redundancy levels of sub systems needed to ensure reliability [13] [14].The datacenter incorporates various components for physical security protection.These, in turn rely on networked alarm systems and indicators of power system malfunctions that are serious enough to require attention.Access to such facilities is governed by the CSP security operations.It uses technologies such as biometric identification, metal detection, metal detectors, and CCTV solutions [15].
Hardware Design.CSPs data centers run computing server machines fed by local power distribution units and connected to a local network that is connected to the edge of the wider network.The computing, digital storage, and networking equipment need to comply with standards that ensures the required audit and validation of the regulatory and industry security requirements [16] [17], e.g., hardware security chip [18].Machine Identity.This critical configuration repository confirms that any participating computing server in the cloud federation can be authenticated to its CSP machine pool through low-level management services [18] Secure Start-Up.Ensures that CSPs servers are booting the correct software stack.Securing underlining components such as Linux boot loaders, OS system images and BIOS by cryptographic signatures can prevent an already compromised server from being continuously compromised by an ephemeral malware or other non-malware attack vector.

Operational Security
Operational security comprises the business flows between the SP with the cloud federation and the CSP it uses for processing workloads.The following section briefly discusses the required cybersecurity measures needed for SP and CSP business scenarios in a cloud federation.
Cross-SP Access Management: SP workloads are manifested in two workload types, 1) short-lived workload.i.e., jobs that are terminated upon completion, and 2) long-lived workload.i.e. services.The former workload might require connectivity to external services during its processing.The latter might expose serving endpoints to other services, e.g., short-lived jobs might require persistent storage to write their job results hence connecting to BigTable8 storage server provisioned by other CSPs, which, in turn, require access management that uses credentials and certificates stored centrally within the Cloud Federation.SP Front End Service Discovery: Long-lived workloads might expose public facing endpoints for serving other workloads or end-user requests.SP front-end services require publishing endpoints to allow other workloads within or external to the cloud Federation to discover their public facing entry point and this requires service discovery capabilities.Service discovery endpoints, and the actual service endpoints, are prone to risks such as Denial of Service attacks or intrusions originated by an attacker.We argue that current solutions offered by individual CSP's are sub-optimal because the target scope of the intrusion doesn't limit propagation or allow an active incident detection and response strategy.For example, assuming an attack probability for a given CSP, running several CSPs reduces the attack impact by a factor of the number of CSPs.Section 7 formulates the risk function and shows how a cloud-federation minimizes those attack impacts by using the semantic-less breach detection system and show how the most serious impacts originate by crossing machine boundaries.
Secure Continuous Deployment: Continuous Deployment (CD) is the function that allows cloud-native applications to get updated through an automated pipeline that is initiated by a new or updated code submission that is compiled, tested through various quality gates until it is certified for deployment of the production systems and deployed seamlessly.Continuous deployment enables cloud applications to innovate faster and safer no matter what number of machines are in the service pool.A secure continuous deployment service requires secured SP code and a configuration repository that authenticates to the target computing resource regardless of the CSP network segmentation.Traditional network segmentation, or fire walling, is a secondary security mechanism that is used for ingress and egress filtering at various points in the local network segment to prevent IP spoofing [19] [20].
Authentication and Authorization.In a federated cloud architecture, deployed workloads might require, and benefit from, access to other services deployed by the federation.The canonical example will be an end user request service deployed in the Federation that triggers another micro-service within the SP architecture.Such cascading requests require multilayered authentication and authorization processes.i.e., a micro-service calls another micro-service and authenticates on behalf of the end user for audit trails supported by the end-user authentication token and the cascading micro-service tokens generated throughout the end user request.Figure 3  propriate.An estimation of the most significant attack patterns and prepared response patterns will go far to prevent catastrophic damage.
Such methods are sub-optimal in a federated cloud for two main reasons: 1) different data sets are owned by different organization departments that are not integrated physically, schematically, or semantically, 2) Lack of unification of both data sets as accomplished by fusion requires a complex transformation of both data sets semantics into a single data set.The above situation exacerbated when migrating the workload to the cloud as it introduces another orthogonal data set that contributes to complexity.The following sections propose a method for breach detection that collapses the three individual CSP silos into a cohesive semantic-less data set that will enhance the ability of the Cloud Federation services to detect breaches to an extent limited by available data and their investment in detection technology tools, i.e. allowing methods to the tenants to incorporate more data about their workload for more automatic detection.

Semantic-Less Breach Detection
As indicated in previous sections, this new cyber-security model proposed for the power system federated cloud assumes the network is breached.Thus a breach-detection system is needed to continuously attempt determining whether a workload is infected as well as the exploited risk type as enumerated above.
Since the workload presented by power systems is substantially different than that of on-demand data streaming we extend that work to show how it applies to the subject of this paper [2].Breach detection system effectiveness is influenced by a number of factors.For the sake of clarity, we focused on the human social factor and the emergent public cloud offering.The following section describes the important factors required for optimized breach detection.This mode of breach detection has to span the heterogeneous schema employed by various federation members CSPs.

The Control Systems Factor
Energy-based organizations such as power plants or utility services use supervisory control and data acquisition (SCADA) systems for managing the facility standard operations, e.g., power generation or power distribution for energy organizations.Those OT workloads are typically comprised of networked computers and data instrumented in graphical user interfaces.Its goal is a real-time control logic of the operational modules through field sensors and actuators.An attack on a facility OT Control Systems will induce unexpected commands to or behavior of the control systems that can result in unexpected system behavior.In cases like Stuxnet, the malware manipulated the feedback loop exacerbating the system [25].Generating independent system feedback through a separate set of field sensors and actuators into a shared repository with statistical learning algorithms would enable anomaly detection and provide the needed cohesive context

Cloud Federation Benefits to Power Systems
Public Cloud services exacerbate the organization's human factor risk by introducing an additional organizational entity that is often separated from the organization it serves.Public cloud operations are agnostic to its tenan's workload semantics by definition.CSPs configure their multi-tenancy to allow even businesses with conflict of interest to run their workload on the same platform.Such practices and policies do nothing to implement the cohesive view required for optimized malware detection and other Power System functions that provide value added through collaboration.
Workloads deployed in public cloud services are not limited to known machine boundaries as are traditional on-premise models offer.Although CSPs feature cyber security mechanisms that attempt mimicking the traditional computing workload hosting, workloads artifacts are under the CSP control.As such, the cloud client workloads might be compromised.Thus, there is a need for another cyber security dimension for the individual Power System ESP workload that overcomes the lack of control when running in the cloud.
A novel self-learning methodology is described that removes the need for Power System tenant information that streamlines semantic-less information flow from the various software stacks of the Cloud Federation.These include both independent Operational Technology (OT) field signals, tenant metrics, and control-plane metrics.Also, it streamlines the aggregation of useful training data for security incidents shared in collaborative platforms both inside and outside the Cloud Federation.The results obtained indicate that a Cloud Federation optimizes such collaboration and self-learning process.A system that implements such self-learning was prototyped and resulted in an initial up to 95% True-Positive rate with 96% True-Negative.
Workload data and usage patterns are critical process underpinning the ESP business success.Yet the leakage of some of the workload data and usage patterns impose a threat to the ESP business.This challenge represents a new threat of organizational espionage as well as attacks on the ESP service that can degrade ESP business continuity.Therefore, sharing semantics breaks the isolation between the two systems and might hold the hosting system accountable for security attacks in CSP or Cloud Federation platforms.Also, transforming every workload semantics into a coherent model that aggregates numerous ESP workloads requires a significant amount of investment.SPs will be reluctant to make such an investment, especially since it doesn't produce income.This method thus has a low likelihood of being implemented.Therefore, enabling a method that eliminates SP investment and business risks is a key for a modern breach detection system success.That is what is described here.Finally, a Cloud-Federation provides a centralized view of cross-CSP operations.Such centralized views allows SP workload deployment to different CSPs to gather a rich data set that will be available for malware identification, for predictive ana-lytics and for improved performance of the datacenters and IT systems of individual power systems.We suggest a method that captures computing resources usage and intra federation traffic and infers potential breach or disruption to proactively alerts CSP security stakeholders about suspicious cyber instances.

From Workload Semantic to Semantic-Less
According to [3] [7], the organization IT workloads composed of online and offline systems.The semantic-less detection will address the polymorphic malware case as its data stream are abstracted from computing activity.The semantic-less detection can be extended independently by generating additional critical signals that can later be inspected for anomalies intrinsically to the signal sets supports by the supply-chain or manufacturing equipment.
In our case, a tenant's workload in a federated cloud manifested by sandboxed software containers that are limited to not more than 1) namespace per tenant for isolation and 2) limited to a resources control groups (aka cgroup) 9 Control groups are the mechanism for limiting computing server host CPU, Memory, Disk I/O and Network I/O usage per namespace.That is the foundation of Linux Containers, which alludes to the existing methods of measurements of the metrics set, CPU, memory and I/O usage.We call this set the behavioral attributes set.Access to cgroup and namespace configuration and control is available on the host level i.e. the host OS that runs the multi-tenant workloads i.e. a control-plane component [2].

Data Collection
Both cyber security leaders and national agencies agree that addressing emerging cyber risks require sharing cyber attacks retrospects and their historical behavior, and discovered vulnerability reports as a foundation for collaboration, predictive time series analysis, risk quantification and risk allocations all leading to safer cyber services [26] [27].Incidents are often documented in unstructured reports that require a manual analysis to identify trends [28].
To assess whether or not a system was breached, it is required to establish malicious system behavior patterns and then decompose those patterns into generic computing system metrics that can later be classified as harmful or safe.The following section discusses the source datasets we chose to assess the initial malicious patterns and their detectability by our method.We continued by decomposing the data and removing the tenant semantics.That allows a generic pattern of malicious activity dataset that can be used as a training data for the supervised model.

Source Datasets
We extended the data sets used by [2] that used the National Vulnerability Database (NVD) [29] and the Vocabulary for Event Recording and Incident Sharing (VERIS) [30].Both datasets included thousands of reported incidents span-ning across various categories.In addition, the datasets used in this paper included online malware analysis sandboxes services such as VirusTotal 10 .Specially, with malware that is deployed in stages, Stage I is usually called the "dropper".The dropper is a relatively small piece of code that breaches the existing defenses and gains a foothold in the target system or network.The dropper would normally contact the command and control (C & C) server to download Stage II, which could: contain the malware that further infects the target, starts to exfiltrate data, or otherwise executes a malicious payload, which may or may not take immediate action depending on the environment.Some Stage I droppers are sandbox or virtual environment aware to the point that if they sense they are not in the actual target environment (i.e., some virtual environment), they may not take any further action.This can thwart any further analysis.Nonetheless, some information may possibly be gleaned through a reverse engineering of the Stage I dropper to determine the IP address of the C & C server.Asset owners could then be warned about this particular piece of information and block that specific IP address.
Our model focuses on 1) Unauthorized access attempts, 2) Suspicious Denial of Service, and 3) Data Stealing Malicious Code, including ransomware instances.We filtered the incidents that conform to the categories and performed a qualitative assessment of the identified breach impacts.Lastly, for simplicity, we applied an additional category that distinguishes the target component reported, service-based or client-based.We included only the service-based incidents.i.e., reported incidents that clearly targeted desktops and workstations were not included in defining tenant semantic structures.
We applied filters for training data accuracy.Filters for VERIS dataset included server workloads as indicated in Section 4.2.1, i.e., Authentication Server, Backup Server, Database Server, DHCP Server, Directory Server(LDAP, AD), Distributed control system, Domain Name Server, File Server, Mail Server, Mainframe Server, Web Application Server, and Virtual Machine Server [30].
Assets operating systems were filtered to Linux and Unix as such operating systems are more prevalent in servers than Windows, MacOSX, and mobile device operating systems.
The VERIS dataset includes incident actions.We filtered the action types that fit the paper focus workloads.i.e.Brute Force, Cache Poisoning, Cryptanalysis, Fuzzing, and HTTP Request Smuggling attacks.We excluded Buffer overflow cases as such attacks can be prevented in deterministic methods and common in Windows-based operating systems [31].The dataset size following the refinement is 5015 incidents from an original set of over 10,000 incidents.Table 1 summarizes the dataset we used for the training data.

Removing the Tenant Semantics
Our approach attempts to detect anomalies in both the control-plane and tenant activities that conform to suspicious patterns.In Section 7.4.1,we defined a categorical dataset that adheres to real incident data.This data applies to potential breaches for server-based workloads.We stipulate, for the purposes of this paper that such server-based workloads will obey similar suspicious patterns when deployed in the cloud.We transformed the categorical dataset into a multivariate time series data that can be used for supervised anomaly detection.The multivariate set is comprised of general operating system observations that do not include any workload semantics but could be used for contextual anomaly detection.The contextual attributes are used to determine the position of an instance on the entire series.We found that, based on collected incident data, the conversion of behavior patterns to multivariate time-series satisfies effective breach detection of any malware, conventional or polymorphic.
We gathered the operations reported in the incident reports (Table 1) and inferred about the operating system resources consumed during the malware lifespan.Table 2 depicts the relationship between the malware characteristics and operating system usage.Figure 4 describes a workload sample comprising of tele-signalization, tele-metering, tele-control and tele-regulation [32].It shows the common pattern of the operating system resource usage that will be used as multivariate time series data sequences.The Evaluation section describes in more details the nature of the data and how it translates into meaningful time series data.

Prediction Methodology
We used the data gathered in Table 2 for formulating the anomaly detection problem of polymorphic malware [33].The detection approach includes three distinct methods: 1) Detecting anomalous sequences in OS usages time series events, 2) Detecting anomalous subsequences within OS usages time series, and 3) Detecting anomalous OS usage events based on frequency.Let T denote a set of S training sequences based on OS usage generated by CSPs, SPs, and the Federation control plane.Also, S denotes a set of m test sequences generat- ed based on Table 2.We find the anomaly score ( ) q A S for each test sequence q S S ∈ , with respect to T .T mostly includes normal OS usage sequences, while S includes anomalous sequences.The semantic-less tool output produces a score for a scanned training sequence T using Regression, i.e., a forecast of the next observation in the time series, using the statistical model and the time series observed so far, and compares the forecasted observation with the actual observation to determine if an anomaly has occurred [34].For simplicity, our model uses Tensor Flow [33] for regression calculation.Our tool is not limited to that tool or the regression type.

Evaluation
We prototyped cloud federation system that mimics the properties analyzed in Section 2. The prototyped system includes the components that are depicted in The control-plane comprised of a Kubernetes API server and controller-manager.The controller coordinator component will need to allocate resources across several geographic regions to different cloud providers.The API server will run a new federation namespace dedicated for the experiment in a manner that such resources are provisioned under a single system.Since the single system may expose external IPs, it needs to be protected by an appropriate level of asynchronous encryption 14 .
For simplicity, we use a single cloud provider, Google Container Engine, as it provides a multi-zone production-grade compute orchestration system.The compute instances that process the user workloads are deployed as Docker containers that run Ubuntu 15 loaded with the Apache HTTP server.For simplicity, we avoided content distribution by embedding the tele-signaling simulator in the Docker image.We ran 52 Docker containers that span across the three regions and acted as WAMS edges.

Baseline and Execution
The baseline execution included data populations for power system tele-signalization.The data population was achieved by using Kubernetes Jmeter batch jobs.The loader jobs goal is to generate traffic that obeys the observed empirical patterns depicted in Figure 4.The system usage for both control-plane and ESP was measured through cAdvisor, a Kubernetes resource usage analyzer agent.The agent, from every node in a cluster, populates system usage data to Heapster, a cluster-wide aggregator of monitoring and event data 15 .
We labeled the system usage with semantic-less dimensions, as shown in tainer (%).The Heapster aggregated the data based on the labels that are later pushed to a centralized database, influxDB.The anomalous sequences q S S ∈ was injected as synthetic randomized system usage data to the influxDB using HTTP API.The date was tagged as well according to the three labels, CPU, network and disk usage.We used data in Figure 4(b), Figure 4(c) as a baseline sequence that was randomized using NumPy 16 .The randomization followed the malicious usage patterns described in Table 2 and resulted in data shown in

Analysis
The prototype included two core datasets, normal (Figure 4) and malicious (Figure 5(c)).The CPU usage in the normal dataset fit the normal tele-signaling patterns at the first half of the run.The second half required less CPU due to the caching mechanism applied in the Apache HTTP server that alleviates the need for the CPU when served through a cache.The Disk write pattern manifested similar content caching schema.The network egress ratio was not impacted by the caching schema.
The malicious dataset used the malware classification table (Table 2).Figure 5, shows a semantic-less behavior for ransomware malware that attempts to encrypt data while serving workload.Suspicious signals denoted by a star and o markers for CPU and disk write respectively.Based on the dataset classification, ransomware requires no network egress but requires the CPU for data encryption and writing back the encrypted payload to disk.Our prototype included similar patterns depicted in   [2].This result shows the potential for continual improvement of results in the field and less dependence on initial score than is needed with a non-learning, less adaptive system.

Conclusion
Power systems are facing cyber threats that are growing in quantity, complexity, and sophistication.Collaborative security in the form of a federated cloud will add coordination for security, for load sharing/redundancy and for improved asset utilization while maintaining organizational independence.This publication integrates and applies earlier results on cyber security, equipment design, energy efficiency, and load management to create a paradigm for next generation power systems.It improves security by employing well known effective ways that minimize the potentially crippling costs associated with providing security, resilience and capacity redundancy.Large scale complex technology such as that of power systems is already driving the emergence of disparate, distributed, large scale, multi-tenant environments such as the proposed cloud federation.These environments are redefining traditional perimeter boundaries along with traditional security practices.Defining and securing architectural configurations that maintain, secure and adapt Power system functionality asset boundaries is more challenging.This paper presents a federated cloud-based multi-platform power system utility that facilitates communication, load resilience under stress, energy cost management, and a proactive approach for detecting breaches, utilizing general system usage patterns that help to predict potential breach proactively from prior history and regular power system computing service provider workloads.This approach shares and minimizes upfront investments and provides savings that generate a return on investment within a year of deployment.Our proposed system of systems ultimately provides a way to meet power system regulatory service level requirements and maximizes system owner control.
We describe a progressive layers security model starting from the physical security of datacenters (part of the networked triad of physical, ICS and IT systems), progressing to the hardware and software that underlies the infrastructure.The model also incorporates the constraints and processes to support the cloud federation operational security.The following section describes the cloud federation cyber security design throughout the data processing life cycle of the cloud federation to enable secure communication with tenants (SP) and their customers or control plane communication including CSP, Cloud-Brokers, and Cloud-Coordinator.

Figure 2 .
It includes deployed facilities and computer systems managed by the individual CSPs or the Federation.The larger CSP's often exceed these baselines.

7Figure 1 .
Figure 1.Federation of Power System CSP's security design.Comprises of infrastructure security and operational security layers.

Figure 2 .
Figure 2. Proposed Federation of Power System CSP's, dedicated as s SoS includes cloud service providers and energy service providers.The Clouds-Control plane Clouds-Broker and Clouds-Coordinator.

Figure 3 .
Figure 3. Authentication and authorization in Cloud Federation Cross-SP model.

Y.
Biran et al.DOI: 10.4236/epe.2017.912046739 Energy and Power Engineering into the status of other individual, interconnected, and correlated facilities.

Figure 4 .
Figure 4. (a) shows a simulation data of a Phasor Measurement Units (PMU) recording device workload.(b) and (c) shows a normal system usage patterns, CPU, network and Disk usage respectively.

Figure 2 .
Figure 2.For the scope of the prototype, we enabled semantic-less metrics from both ESPs and CSPs to improve correlation efficiency.CSP data sharing limits the effectiveness of any cyber analytical technique and, in practice, will represent

Fig- ure 4
(c) graph.The Network egress was measured by thousands of transmitted packets (k-TX), Disk writes per second (k-write/sec) and CPU usage per con-

Figure 5 (
Figure 5(c).The execution required a TensorFlow session that looped through the dataset 2000 times, updated the model parameters and obtained the anomaly score ( ) q A Sfor each test sequence q S S ∈ , on T .The breach and anomaly 14 Simulation code and data retrieved from https://github.com/yahavb/green-content-delivery-network

Figure 5 .
Figure 5. Simulation of anomalies found in the PMU recording devices obtained after 2000 training epochs of the TensorFlow tool.It shows anomaly scores that exceed a threshold value.The threshold value depends on data quality, on interference, noise levels, and disturbances.Thus the threshold value is a dynamic system property.For the purpose of this demonstration various singular values of the threshold were chosen.
epochs.Scores that are closed to 10 indicate a potential breach.Negative scores indicate normal behavior.The scores shown in Figure 5 are the .The maximum-based aggregation was chosen because of the suspected anomaly nature.Specifically, extensive write and CPU operations plays an equal role in the potential for breach.Other anomalies might use different type of aggregations.According to the anomaly scores shown in the experiment, up to 95% were True-Positive of detected breaches and 96% of the detections were True-Negative.The advantage of more data and more specific data labeling is shown be the improved detection scores from earlier paper by 10%

Table 1 .
Summary of datasets used.
Table 2 with a similar approach as done for ransomware.Y. Biran et al.DOI: 10.4236/epe.2017.912046747 Energy and Power Engineering Our model yielded a series of anomaly scores ( ) Figure 5).The model attempted to detect anomalous subsequences within q S .e.g., net S for network, cpu S for CPU, read S , and write S for disk read and write transactions.We used a scoring based techniques for each of the observed sequences.The score range is between [ ]