Process Digital Twin and Its Application in Petrochemical Industry

Digital twin (DT) technology is drawing significant attention from academia, industry and government. However, people from different fields understand DT differently. In addition, most of the DT application scenarios discussed so far belong to discrete manufacturing and are not suitable for process manufacturing. The petrochemical industry is a typical process manufacturing industry with a multi-scale hierarchical and functional structure in space and time. This contribution focuses on the application of DT in the petrochemical industry, including: 1) a DT definition specific to the process industry; 2) the three key elements and the design of chemical DT; 3) the features and application scenarios of chemical DT from the perspectives of model precision, model scale and asset life cycle; 4) the Four P's maturity framework of chemical DT; and 5) prospects for the development of chemical DT.


Introduction
After decades of development, China's oil refining and ethylene production capacities rank second in the world. However, measured against the requirements of high-quality development, the industry still faces challenges, mainly the low utilization of resources and energy, overcapacity in oil refining, severe homogeneity of chemical products, and high environmental protection pressure [1] [2]. Driven by the accelerating energy transition, the national "carbon peak & carbon neutrality" strategy, increasingly stringent regulatory requirements and changing market demands, environmental, social and governance (ESG) performance has become a top priority. The petrochemical industry is devel[…] and reference architecture of SM and IM through bibliometric statistical analysis. Refining and petrochemical industries are typical process manufacturing industries, which use a common set of separation, mixing, and conversion technologies called "unit operations" to turn raw materials into valuable products. The essential characteristics of petrochemical manufacturing are as follows: the process is large in scale and complex in structure, is composed of multiple closely connected and interacting operational units, and has a multi-scale hierarchical and functional structure in space and time [5]. Viewed laterally, the process is a nonlinear, dynamically coupled set of related, heterogeneous physical and chemical processes, such as mass transfer, heat transfer, momentum transfer and reaction; viewed vertically, it is a nested, coupled system of these processes across time and space scales, and a petrochemical cyber-physical system (HCPS) consisting of material flow, energy flow, and information flow networks that integrate complex physical inputs and outputs [6] [7] [8].
The challenges lie in the heterogeneous nature of chemical reactions over multiple scales and of the multiphase flows encountered in separation, mixing, and reaction systems, again over multiple scales. Opportunities are provided by process technology innovations and by emerging high-fidelity computational and experimental techniques that make it possible to understand chemical processes and events at the molecular scale [9] [10]. One notable technology is the digital twin (DT). The IT research and advisory firm Gartner listed DT among its top ten technology trends for three consecutive years, from 2017 to 2019 [11], which helped spread the DT concept and attract attention from scholars, industrial sectors and standardization organizations. At the same time, thanks to new-generation information technologies such as the Internet of Things (IoT), big data, cloud computing and artificial intelligence (AI), implementing DT has gradually become feasible. Besides aerospace, where the DT concept originated, digital twins are also used in energy, cities, agriculture, shipbuilding, manufacturing, medical treatment, environmental protection and other sectors [12] [13] [14] [15]. In intelligent manufacturing especially, DT is considered an effective means of interactively integrating the information world and the physical world of manufacturing [16]. It has been found that asset integrity monitoring, project planning, and life cycle management are the key application areas of DT in the oil and gas (O & G) industry, while cyber security, lack of standardization, and uncertainty in scope and focus are the key challenges to DT deployment there [17]. This paper focuses primarily on the application of DT in the petrochemical industry from an operational perspective.
Journal of Software Engineering and Applications
The rest of this paper is organized as follows: Section 2 discusses the definition and understanding of DT, especially in the process industry. Section 3 discusses the three key elements and the design of chemical DT. Section 4 discusses the features and application scenarios of chemical DT from the perspectives of model precision, model scale and asset life cycle. Section 5 discusses the Four P's maturity framework of chemical DT. Finally, Section 6 outlines prospects for future development and the further work needed.

Digital Twin Definition
It is generally believed that the concept of the digital twin originated from a concept proposed in 2002 by Professor Michael Grieves of the University of Michigan for product life cycle management (PLM), which was called the "mirrored space model" [18]. Since then, many different definitions of DT have been proposed in the academic literature. However, although research and publications on DT are increasing rapidly, the DT concept is still rather fuzzy, and its boundary with related technologies is increasingly blurred [19]. The understanding and practice of digital twins are inseparable from specific objects, applications and needs [15] [20] [21]. Table 1 summarizes some of the most cited DT definitions by organizations and suppliers. These definitions show that the digital twin of an entity or process has three important components:
• A model of the entity or process;
• An evolving set of data relating to the entity or process;
• A means of dynamically updating or adjusting the model in accordance with the data.
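The three components listed above can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not a method taken from the cited definitions: a hypothetical heat-exchanger twin whose model is Q = UA * dT_lm, whose data are measured duties, and whose update mechanism is scalar recursive least squares with a forgetting factor, so that the UA estimate can follow slow drift such as fouling. The class name, parameter values and synthetic data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeatExchangerTwin:
    """Hypothetical minimal twin: model Q = UA * LMTD, with UA re-estimated from data."""
    ua: float = 0.0           # current UA estimate (kW/K); starts uninformed
    p: float = 1000.0         # estimate covariance; large value = weak prior
    forgetting: float = 0.98  # assumed lambda < 1 so older data are discounted

    def predict(self, lmtd: float) -> float:
        """Model component: predicted duty (kW) for a log-mean temperature difference (K)."""
        return self.ua * lmtd

    def update(self, lmtd: float, measured_duty: float) -> None:
        """Update component: one scalar recursive-least-squares step with forgetting."""
        gain = self.p * lmtd / (self.forgetting + lmtd * self.p * lmtd)
        self.ua += gain * (measured_duty - self.predict(lmtd))
        self.p = (self.p - gain * lmtd * self.p) / self.forgetting


# Data component: synthetic plant measurements generated with a true UA of 5 kW/K.
twin = HeatExchangerTwin()
for step in range(200):
    lmtd = 20.0 + (step % 5)           # operating point varies slightly
    twin.update(lmtd, 5.0 * lmtd)      # noiseless measured duty
print(round(twin.ua, 4))  # 5.0
```

With noisy data the forgetting factor trades tracking speed against noise sensitivity; values closer to 1 give smoother but slower estimates.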
The Digital Twin Consortium (DTC) has established a glossary of DT [22], in which the definition and elaboration of DT are relatively rigorous. According to the DTC, a digital twin is a virtual representation of real-world entities and processes, synchronized at a specified frequency and fidelity. Digital Twin Systems (DTS) transform business by accelerating holistic understanding, optimal decision-making, and effective action. DTs use real-time and historical data to represent the past and present and to simulate predicted futures. DTs are motivated by outcomes, tailored to use cases, powered by integration, built on data, guided by domain knowledge, and implemented in IT/OT systems. You cannot buy a digital twin solution per se, as a digital twin is more a methodology for integrating and modeling across multiple solutions.

Key Elements and Design of Chemical Digital Twin
Tao Fei et al. [21] proposed a five-component DT framework comprising physical entities, virtual entities, service systems, a DT data fusion module, and the connections/interactions among these four modules. This understanding of DT as an integrated system of both the digital model and the physical model can trace […]
L. B. Gao et al.
Table 1. Some of the most cited DT definitions by organizations and suppliers.
Digital Twin Consortium (DTC) [22]: A digital twin is a virtual representation of real-world entities and processes, synchronized at a specified frequency and fidelity.
Industrial Internet Consortium (IIC) [23]: A digital twin is a formal digital representation of some asset, process or system that captures attributes and behaviors of that entity suitable for communication, storage, interpretation or processing within a certain context.
A Digital Twin is a digital representation of a particular physical entity or a process with data connections that enable convergence between the physical and digital states at an appropriate rate of synchronization, and provides an integrated view throughout the lifecycle of the physical entity or the process that helps optimize the overall performance.
Industry 4.0 [25]: Definition 1: Virtual digital representation of physical assets. Note 1: In future, the digital twin will be a synonym for the asset administration shell if the development will continue as before. Note 2: In the context of Industrie 4.0, the term asset administration shell is preferred. Definition 2: Simulation model.
Siemens [26]: A digital twin is a virtual representation of a physical product or process, used to understand and predict the physical counterpart's performance characteristics. Digital twins are used throughout the product lifecycle to simulate, predict, and optimize the product and production system before investing in physical prototypes and assets.
General Electric (GE) [27]: Digital twins are software representations of assets and processes that are used to understand, predict, and optimize performance in order to achieve improved business outcomes. Digital twins consist of three components: a data model, a set of analytics or algorithms, and knowledge.
AspenTech [28]: Digital twin is an evolving digital profile of the historical, current and future behavior of a physical object or process that helps optimize business performance. It is based on models and real time data across multiple different dimensions, leading to actions in physical world such as a change in process operation, safety, maintenance and design.
KBC [29]: A digital twin works in the present, mirroring the actual device, system or process in simulated mode, but with full knowledge of its historical performance and accurate understanding of its future potential. The digital twin allows "what if?" and "what's best?" scenarios to be run automatically to determine available strategies that maximize profitability.
AVEVA [30]: Digital twin is broadly defined as a digital replica of a physical object or process. Its value comes from using living data to help understand the behavior of the system or "state-of-work" in the following ways: […]

[…] The Digital Twin Consortium believes that a digital twin has a corresponding physical twin, and that a digital twin considered together with its physical twin is an example of a cyber-physical system (CPS) [22]. The IIC holds that the core elements of digital twins include model, data and services [23]. Similarly, GE's digital twin consists of three components: a data model, a set of analytics or algorithms, and knowledge [27]. This contribution proposes a three-component structure for chemical DTs, consisting of models representing the physical asset (equipment, unit, or plant), data federation or continuously synchronized data transfer, and integration with related applications through application programming interfaces (APIs) for advanced data analysis, as shown in Figure 1 (adapted from [32]).

DT Model
A chemical process digital twin should contain the computational or analytical models required to describe, understand and predict the states and behavior of the chemical process, as well as models that prescribe actions based on business logic and plant operation objectives. These models may include first-principles models (integrated steady-state, hydraulic, and dynamic models) based on rigorous mathematical statements, expressed most simply as algebraic equations or, with increasing complexity, as ordinary differential equations (ODEs) for lumped-parameter systems, or as differential algebraic equations (DAEs) or partial differential equations (PDEs) for distributed-parameter systems [33]. They may also include data-driven models based on statistics, machine learning (ML) and artificial intelligence (AI) [34] [35] [36] [37], as well as 3D models and augmented reality (AR) models that help people understand the operational states or behavior of the plant.
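As a minimal illustration of a lumped-parameter first-principles model written as an ODE, the sketch below integrates the component balance of an isothermal continuous stirred-tank reactor (CSTR) with a first-order reaction, dC/dt = (F/V)(C_in - C) - kC, using a hand-written classical fourth-order Runge-Kutta step. The reactor, its parameter values and the function names are hypothetical, chosen only so the result can be checked against the analytical steady state C_ss = (F/V)C_in / (F/V + k).

```python
def cstr_rhs(c, flow, volume, c_in, k):
    """dC/dt for a hypothetical isothermal CSTR with first-order reaction A -> B."""
    return flow / volume * (c_in - c) - k * c


def simulate_cstr(c0, flow, volume, c_in, k, dt, n_steps):
    """Integrate the lumped-parameter ODE with the classical 4th-order Runge-Kutta scheme."""
    c = c0
    for _ in range(n_steps):
        k1 = cstr_rhs(c, flow, volume, c_in, k)
        k2 = cstr_rhs(c + 0.5 * dt * k1, flow, volume, c_in, k)
        k3 = cstr_rhs(c + 0.5 * dt * k2, flow, volume, c_in, k)
        k4 = cstr_rhs(c + dt * k3, flow, volume, c_in, k)
        c += dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return c


# Illustrative values: F = 1 m3/min, V = 10 m3, C_in = 2 kmol/m3, k = 0.1 1/min.
c_end = simulate_cstr(c0=0.0, flow=1.0, volume=10.0, c_in=2.0, k=0.1, dt=0.1, n_steps=1000)
c_steady = (1.0 / 10.0) * 2.0 / (1.0 / 10.0 + 0.1)  # analytical steady state
print(round(c_end, 6), round(c_steady, 6))
```

A distributed-parameter model (for example a tubular reactor) would add a spatial coordinate and turn this single ODE into a PDE or a large DAE system after discretization.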
In general, a model for a digital twin should be sufficiently physics-based, accurate and quick to run that decisions about the application can be made within the required timescale. These three criteria strongly affect which applications can most benefit from a digital twin, and also affect the ways in which physics-based models for digital twins differ from physics-based models for other purposes such as safety verification or performance modelling, where high accuracy may be more important than a short run time because the models are safety-critical but are run less frequently [38].
Because chemical reactions are heterogeneous over multiple scales, as are the multiphase flows encountered in separation, mixing, and reaction operations, mechanistic and data-driven (ML) models need to be combined into hybrid models that achieve higher fidelity than either first-principles modeling or AI could alone [33] [39] [40].
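One common pattern for hybrid modeling, shown here as a sketch rather than as the approach of [33] [39] [40], is a serial "residual" model: a first-principles energy balance provides the base prediction and a data-driven model learns the mismatch from plant data. The example assumes a heater with unmodeled heat losses, uses synthetic data, and substitutes a simple least-squares line for the ML component; all names and numbers are hypothetical.

```python
def mechanistic_outlet_temp(t_in_c, duty_kw, flow_kg_s, cp_kj_kg_k=4.18):
    """First-principles part: ideal energy balance T_out = T_in + Q / (m * cp)."""
    return t_in_c + duty_kw / (flow_kg_s * cp_kj_kg_k)


def fit_line(xs, ys):
    """Data-driven part (stand-in for ML): ordinary least squares for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return mean_y - b * mean_x, b


def hybrid_outlet_temp(t_in_c, duty_kw, flow_kg_s, a, b):
    """Hybrid model: mechanistic prediction plus the learned residual correction."""
    return mechanistic_outlet_temp(t_in_c, duty_kw, flow_kg_s) + a + b * duty_kw


# Synthetic "plant" data: heat losses the ideal balance ignores (assumed form).
duties = [100.0, 200.0, 300.0, 400.0]
plant = [mechanistic_outlet_temp(25.0, q, 2.0) - (0.5 + 0.002 * q) for q in duties]
residuals = [p - mechanistic_outlet_temp(25.0, q, 2.0) for p, q in zip(plant, duties)]
a, b = fit_line(duties, residuals)
print(round(a, 4), round(b, 6))  # -0.5 -0.002
```

Keeping the energy balance as the backbone means the hybrid model still extrapolates sensibly in duty, while the data-driven term only has to capture the comparatively small loss effect.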

DT Data
In an asset-intensive industry, such as petrochemical manufacturing, the digital twin needs to encompass the entire asset lifecycle and value chains from design and operations through maintenance and strategic business planning. The data should include both historical data and real-time data generated during design & engineering, operation and maintenance.
To achieve the desired accuracy, source data must be gathered in real time and validated and reconciled to ensure that all physical and chemical laws (such as mass and energy balances) are respected, and electronic noise and dynamic effects must be removed by filtering. Only in this way can data quality issues be identified and mitigated, and can the digital twin be trusted to reflect reality and relied on for the quality and accuracy of its predictions.
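The reconciliation and filtering steps can be sketched for the simplest case: a single linear mass balance around a splitter (F1 = F2 + F3) and a first-order exponential filter. The reconciliation shown is the classical weighted least-squares formulation solved with a Lagrange multiplier; the measurement values, variances and function names are assumptions for illustration, whereas a real plant involves many coupled, often nonlinear balances.

```python
def reconcile_linear_balance(measured, coeffs, variances):
    """Weighted least-squares reconciliation with one linear balance.

    Minimizes sum((x_i - y_i)**2 / var_i) subject to sum(a_i * x_i) == 0.
    Lagrange-multiplier solution: x_i = y_i - var_i * a_i * lam,
    with lam = sum(a_i * y_i) / sum(a_i**2 * var_i).
    """
    imbalance = sum(a * y for a, y in zip(coeffs, measured))
    lam = imbalance / sum(a * a * v for a, v in zip(coeffs, variances))
    return [y - v * a * lam for y, a, v in zip(measured, coeffs, variances)]


def first_order_filter(samples, alpha):
    """Exponential (first-order) low-pass filter to attenuate measurement noise."""
    state = samples[0]
    filtered = []
    for s in samples:
        state = alpha * s + (1.0 - alpha) * state
        filtered.append(state)
    return filtered


# Hypothetical splitter: F1 - F2 - F3 = 0, measured flows in t/h, equal variances.
flows = reconcile_linear_balance([100.0, 60.0, 45.0], [1.0, -1.0, -1.0], [1.0, 1.0, 1.0])
print([round(f, 3) for f in flows])  # [101.667, 58.333, 43.333]
print(first_order_filter([0.0, 10.0, 10.0], alpha=0.5))  # [0.0, 5.0, 7.5]
```

Measurements with larger variances are adjusted more: with variances [4.0, 1.0, 1.0], for example, the less trusted feed meter absorbs two thirds of the correction.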
Many aspects of using data in process modeling are well understood, from long experience in model validation and verification and in deriving boundary, initial and loading conditions from measured values. However, many data issues remain: some are connected with the volume and speed of data acquisition, some with reliability and uncertainty, some with dynamic model updating, and others with data sharing and exchange standards. All of these hinder the chemical process industry from applying data-driven technologies efficiently to its assets at large scale. Data integration needs to be improved across the whole asset life cycle, from process design through functional design and asset specification to the operation of the actual assets, by building an Industrial Internet of Things (IIoT) platform that harnesses data integration and enables mashup applications such as digital twins [41].
[…] and prescriptive (what should be done?). There are also different analytical technologies, such as graphical, statistical, ML/AI, and process simulation. The solution should be driven by the problem that needs solving, not by how much analytics can be thrown at the data in the hope that it will both find the problem and solve it. The desired outcome should determine the type of analytics sought and the fit-for-purpose analytics technology used.

DT Service
The digital twin allows "What if?" and "What's best?" scenarios to be run automatically to determine the available strategies that maximize profitability. Users can then review the recommended strategies and assess the impact of each. With the in-depth use of advanced analysis technologies such as ML/AI, the share of manual labor in manufacturing systems is gradually decreasing while the share of human knowledge and experience embedded in them is gradually increasing, so that these systems eventually evolve into fully autonomous systems [42].
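The "What if?" and "What's best?" loop can be sketched as scenario enumeration over a plant model followed by ranking of the feasible results. Everything in the example is hypothetical: the cracking-furnace margin model, its coefficients, the furnace duty limit and the candidate feed rates and coil outlet temperatures. A real DT would call a calibrated process model and would typically use an optimizer rather than a small grid.

```python
from itertools import product

FURNACE_DUTY_LIMIT = 6000.0  # hypothetical constraint, arbitrary units


def margin_per_hour(feed_t_h, coil_outlet_temp_c):
    """Hypothetical model: higher temperature raises yield but also energy cost."""
    yield_frac = 0.60 + 0.004 * (coil_outlet_temp_c - 800)
    product_value = 1000.0 * yield_frac          # $/t feed
    feed_cost = 400.0                            # $/t feed
    energy_cost = 1.5 * (coil_outlet_temp_c - 700)  # $/t feed
    return feed_t_h * (product_value - feed_cost - energy_cost)


def is_feasible(feed_t_h, coil_outlet_temp_c):
    """Hypothetical furnace duty constraint."""
    return feed_t_h * (coil_outlet_temp_c - 700) <= FURNACE_DUTY_LIMIT


def run_scenarios(feeds, temps):
    """'What if?': evaluate every combination; 'What's best?': rank the feasible ones."""
    results = []
    for feed, temp in product(feeds, temps):
        if is_feasible(feed, temp):
            results.append((margin_per_hour(feed, temp), feed, temp))
    results.sort(reverse=True)  # highest margin first
    return results


ranked = run_scenarios([40, 50, 60], [800, 825, 850])
best_margin, best_feed, best_temp = ranked[0]
print(best_feed, best_temp, round(best_margin))  # 40 850 7000
```

Returning the whole ranked list, not only the winner, matches the workflow in the text, where users review several recommended strategies before acting.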

Features and Application Scenarios of Process Digital Twin
Refining, petrochemical, and integrated refining-petrochemical processes, regardless of scale, can be handled by a series of technologies called unit operations. All unit operations can be decomposed into the three transfer processes (momentum, heat and mass transfer) and reaction, or combinations of them [5].

DT Model Precision
The spatial dimension of chemical process models can range from 0D to 3D, depending on the specific applications and needs of the DT. Some processes require detailed 3D flow and transfer models. Some applications are relatively macroscopic, and 2D modeling is sufficient. Some applications are relatively simple, do not need high fidelity, and can be fully served by 1D or 0D models.
The 0D model, also referred to as system-level simulation, describes a logical world with no spatial dimension, only time. At this stage, what needs to be solved is […] heat exchanger, if there is good reason to believe that the variation of the radial variables is much smaller than that of the axial variables, the heat transfer process can be modeled with a 1D model. The 2D model can reflect spatial relationships and flow characteristics to a certain extent while retaining high computational efficiency. The 3D model can reflect the actual state of the object more intuitively and comprehensively and can be used to study the actual changes in various physical processes, but 3D computation is much more complicated. For example, the physical structure of the olefin reactor determines the gas-liquid-solid three-phase reactions inside it, and the characteristics of the multiphase flow must be understood through 3D steady-state and dynamic simulation in order to optimize the equipment design.
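To make the 0D/1D distinction concrete, the sketch below models a tube heated at a constant wall temperature, where radial gradients are assumed negligible so that a 1D axial model applies. It compares an explicit axial marching scheme, the analytical solution of the same ODE, and a 0D well-mixed approximation. The geometry, heat transfer coefficient and flow rate are illustrative assumptions.

```python
import math


def outlet_temp_1d(t_in, t_wall, ua_per_m, m_cp, length_m, n_cells=1000):
    """1D model: march m*cp*dT/dz = (U*P)*(T_wall - T) along the axis (explicit Euler)."""
    dz = length_m / n_cells
    t = t_in
    for _ in range(n_cells):
        t += dz * ua_per_m * (t_wall - t) / m_cp
    return t


def outlet_temp_exact(t_in, t_wall, ua_per_m, m_cp, length_m):
    """Analytical solution of the same 1D ODE, used to check the discretization."""
    ntu = ua_per_m * length_m / m_cp
    return t_wall - (t_wall - t_in) * math.exp(-ntu)


def outlet_temp_0d(t_in, t_wall, ua_per_m, m_cp, length_m):
    """0D (well-mixed) model: the whole tube is assumed to be at the outlet temperature."""
    ntu = ua_per_m * length_m / m_cp
    return (t_in + ntu * t_wall) / (1.0 + ntu)


# Illustrative case: U = 500 W/m2K, 50 mm tube, 50 m long, 2 kg/s water (cp = 4180 J/kgK).
args = dict(t_in=20.0, t_wall=100.0, ua_per_m=500.0 * math.pi * 0.05,
            m_cp=2.0 * 4180.0, length_m=50.0)
t1d = outlet_temp_1d(**args)
t_exact = outlet_temp_exact(**args)
t0d = outlet_temp_0d(**args)
print(round(t1d, 2), round(t_exact, 2), round(t0d, 2))
```

With these numbers the 1D march agrees with the analytical outlet temperature to within about 0.01 K, while the 0D model underpredicts it by roughly 4 K, which shows what is lost when the axial profile is collapsed.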

Relationships of Chemical DT in Systems
The level of abstraction of a digital twin should be sufficient for the requirements of the use cases for which it is designed. As shown in Figure 3 (adapted from [5]), the relationships of chemical DTs in systems can generally be classified into four levels according to their key functionalities.
The equipment-level digital twins are oriented to core and high-value equipment, such as hyper-compressors (high economic cost of failure), large pumps and compressors (high cost of spare parts and maintenance), heat exchangers (impact on yield) and so forth, reflecting the current, future and historical performance of the equipment.
The unit-level digital twins are oriented to basic chemical unit operations, such as cracking, olefin reaction and distillation. These are high-value, high-return areas for digital twins, involving process, asset condition, control and optimization.
The plant-level digital twins provide a digital representation of a plant, several plants or the whole site. They may cover a subset of the systems involved. For example, energy optimization, refinery and bulk chemical production planning, and specialty chemical production scheduling are performed at this level.
The enterprise-level digital twin is an important emerging field. Such a model can quickly analyze an enterprise's profit opportunities and provide actionable information to the executive level. Examples include enterprise risk models, supply chain models, and multi-scale planning models that optimize the utilization of the network of plants, transportation and storage facilities, maximize profits and improve customer satisfaction.
The unit-level twin is constructed by integrating the equipment-level twins into a single functioning unit. These unit-level twins are in turn integrated to generate the plant-level twin, and so on. The equipment-level twin should possess accurate engineering, manufacturing and design data. The plant-level twin includes an accurate representation of the aggregated operation of all the equipment and operating units on which the system is built. The inclusion and integration relationships between the different DT levels follow the ISA106 standard [43].
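The bottom-up composition described above (equipment twins assembled into unit twins, unit twins into a plant twin) resembles the composite design pattern. The sketch below shows that structure with a single aggregated quantity, electrical load; the class names, equipment tags and power values are hypothetical, and it does not attempt to implement the ISA106 model itself.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class EquipmentTwin:
    """Lowest level: one asset with its own (measured or simulated) state."""
    name: str
    power_kw: float

    def total_power_kw(self) -> float:
        return self.power_kw


@dataclass
class CompositeTwin:
    """Unit-, plant- or enterprise-level twin assembled from lower-level twins."""
    name: str
    level: str
    children: List[Union["CompositeTwin", EquipmentTwin]] = field(default_factory=list)

    def total_power_kw(self) -> float:
        # Aggregated view: each level sums whatever its children report.
        return sum(child.total_power_kw() for child in self.children)

    def find(self, name: str):
        """Depth-first lookup of a lower-level twin by name (None if absent)."""
        for child in self.children:
            if child.name == name:
                return child
            if isinstance(child, CompositeTwin):
                hit = child.find(name)
                if hit is not None:
                    return hit
        return None


# Hypothetical site with two units.
distillation = CompositeTwin("Distillation unit", "unit",
                             [EquipmentTwin("P-101 reflux pump", 250.0),
                              EquipmentTwin("K-101 compressor", 1200.0)])
cracking = CompositeTwin("Cracking unit", "unit", [EquipmentTwin("F-201 ID fan", 300.0)])
plant = CompositeTwin("Site A", "plant", [distillation, cracking])
print(plant.total_power_kw())  # 1750.0
```

An enterprise-level twin would simply be another `CompositeTwin` whose children are plant-level twins.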
As people at different levels of the organization have different concerns, it is natural to have different views and expectations of DTs throughout the organization. As discussed above, the strategies, priorities, methods and enabling technologies used to create and deploy digital twins will be different at various levels. However, the DTs must converge towards a holistic and unified vision.

DT in Asset Life Cycle
The process industry is characterized by two value chains, the supply chain and the asset life cycle, which come together in production [44]. Successful digital twin adoption and application require a "shift to the left" in thinking: users need to think more holistically about the chemical processes, plants and products to be managed by the digital twin already at the early design stage (the design stage being at the left, or beginning, of the asset lifecycle).
The DT system aims to be an accurate representation of an asset over its full range of operation and its full lifecycle. It will optimize engineering and project execution, provide end-to-end value chain visibility, ensure a smooth handover to operation, improve operations and maintenance performance, and support sustainable operations in terms of health, safety, and environment (HSE). It is ideally created during the initial study that evaluates the feasibility of the asset, and it is used and further developed during the design, construction and commissioning of the asset. It facilitates the optimal design of the asset and the training of the staff who will operate it. It works in the present, mirroring the actual plant in simulated mode, but with full knowledge of its historical performance and an accurate understanding of its future potential. With the development and unification of technologies such as lifecycle process simulation [45] [46] and 3D CAD, this lifecycle concept of the chemical DT system can be realized through a unified engineering platform [47], such as AVEVA Unified Engineering [48], which integrates all process simulation and engineering (1D, 2D and 3D) data into a single asset-data-centric model in a cloud environment. Bi-directional information flow makes it possible to execute concurrent, multidisciplinary engineering with greater control over change across the entire project, reducing project risk while enhancing project efficiency and sustainability.
As depicted in Figure 4 (modified from [48]), unlike traditional approaches, which rely on experimentation, design experience, and heuristics, digital design employs a model-based systems engineering (MBSE) approach coupled closely with targeted experimentation [49]. Experimentation is used to support the construction of a high-fidelity predictive model, or DT, of the process rather than to directly establish the performance of the industrial-scale equipment. Once a model of sufficient accuracy is established, the process DT, rather than the experimental data, is used to optimize the process design and operation. The key to the approach is that the DT can be used to explore many aspects of the decision space for both design and operation, allowing a much more comprehensive, effective, and rapid exploration of the process design space than experimentation alone can achieve (refer to Table 2). It also allows technology risks to be quantified and addressed systematically.
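Exploring the design space on the DT instead of by experiment, and quantifying technology risk along the way, can be sketched with a small Monte Carlo study. Here the "DT" is only an assumed first-order plug-flow conversion model with an uncertain rate constant; candidate residence times are evaluated against an assumed 90 % conversion specification, and the smallest design that meets it with at least 95 % probability is selected. The distribution, specification, target and candidates are all illustrative.

```python
import math
import random


def conversion(k_per_min, tau_min):
    """Assumed model: first-order reaction in an ideal plug-flow reactor, X = 1 - exp(-k*tau)."""
    return 1.0 - math.exp(-k_per_min * tau_min)


def prob_meets_spec(tau_min, k_samples, spec):
    """Fraction of sampled rate constants for which conversion reaches the spec."""
    return sum(conversion(k, tau_min) >= spec for k in k_samples) / len(k_samples)


def explore_designs(taus, k_samples, spec=0.90, target=0.95):
    """Evaluate every candidate residence time on the model; return the smallest one
    that meets the target probability (or None) together with the full results table."""
    table = {tau: prob_meets_spec(tau, k_samples, spec) for tau in taus}
    ok = [tau for tau in sorted(taus) if table[tau] >= target]
    return (ok[0] if ok else None), table


rng = random.Random(42)  # fixed seed for reproducibility
# Assumed kinetic uncertainty: ln k ~ Normal(ln 0.5, 0.2), k in 1/min.
k_samples = [math.exp(rng.gauss(math.log(0.5), 0.2)) for _ in range(5000)]
best_tau, table = explore_designs([3.0, 4.0, 5.0, 6.0, 8.0], k_samples)
for tau, p in table.items():
    print(f"tau = {tau:4.1f} min  P(X >= 0.90) = {p:.3f}")
print("smallest robust design:", best_tau)
```

Reusing the same samples for every candidate (common random numbers) keeps the comparison between designs consistent, so the probabilities increase monotonically with residence time.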

Maturity of Chemical Digital Twin
Just as there is no unified definition of DT, there is no unified method for evaluating DT maturity. In the field of modeling and simulation (M & S), two indicators are generally used to evaluate whether a model is credible: fidelity and credibility (also known as confidence) [50]. However, the pursuit of high fidelity can bring unnecessary complexity, which reduces the reliability, computability, maintainability and other important qualities of the model.
Credibility is an index of the trustworthiness of a model with respect to the specific purpose and requirements of a simulation. A model that is valid for one requirement may not be suitable for another; that is, the same model may show different credibility for different simulation requirements. For example, a 3D CAD model of a plant may be of less value to a process engineer than a digital copy of the plant's operating conditions and of the way molecules behave and transform. Tao Fei et al. [51] proposed a six-level digital twin maturity framework that uses 19 factors to evaluate operational DT maturity.
From an industrial perspective, DT maturity can be broken down into the Four P's of DTs (refer to Figure 5):
Predictive: This is a type of pattern recognition and anomaly detection that leverages industrial big data, machine learning and process knowledge to create DTs of assets and processes, and then detects both deviations and matching patterns that give early warning of pending problems and inefficiencies, as well as of errors in the design process. The big data can come from a variety of sources, including sensors, data lakes, historians, calculated values, audio, video, […]
[…] combine both online and simulation technology that leverages machine learning to baseline performance through advanced pattern analysis, in order to ensure that the mathematical models accurately match operational reality. From there, deviations can be detected quickly so that early action can be taken to rectify the situation.
Prescriptive: Based on the issues detected in Predictive and Performance analytics, this provides root cause analysis, planning & decision-support, and probabilistic courses of action to best remedy and optimize a given situation.
Prognostics: Leveraging neural networks, deep learning, and reinforcement learning, this DT forecasts future events. It can be used in monitoring/control and scheduling optimization, and in determining how long an asset or process can continue to operate safely (after an anomaly has been detected) before failure or significant loss of functionality occurs. It can also provide risk-based insight into decisions such as whether an operation should attempt to run until the next planned maintenance outage.
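Two of the Four P's can be illustrated with deliberately simple stand-ins. The sketch below uses a statistical baseline (mean and standard deviation learned from normal operation) for Predictive anomaly detection, and a least-squares linear degradation trend extrapolated to a failure threshold for a Prognostics-style remaining-useful-life estimate. These are substitutes for the ML, neural-network and reinforcement-learning techniques named above, not implementations of them; the data, threshold and function names are hypothetical.

```python
import statistics


def detect_anomalies(values, baseline_len, threshold=4.0):
    """Predictive stand-in: flag samples far from a baseline learned on normal data."""
    baseline = values[:baseline_len]
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return [i for i in range(baseline_len, len(values))
            if abs(values[i] - mu) > threshold * sigma]


def remaining_useful_life(times, health, failure_threshold):
    """Prognostics stand-in: extrapolate a least-squares linear trend of a health
    indicator to the failure threshold. Returns the time left after the last
    observation, or None if the indicator is not degrading."""
    n = len(times)
    mean_t = sum(times) / n
    mean_h = sum(health) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_th = sum((t - mean_t) * (h - mean_h) for t, h in zip(times, health))
    slope = s_th / s_tt
    if slope >= 0:
        return None
    intercept = mean_h - slope * mean_t
    t_fail = (failure_threshold - intercept) / slope
    return max(0.0, t_fail - times[-1])


# Hypothetical bearing temperature (degC): 30 normal samples, then new readings.
readings = [100.0 + (i % 3 - 1) for i in range(30)] + [100.0, 101.0, 99.0, 106.0, 100.0]
print(detect_anomalies(readings, baseline_len=30))  # [33]

# Hypothetical health index falling 2 points per day; failure declared at 70.
days = list(range(11))
health_index = [100.0 - 2.0 * d for d in days]
print(remaining_useful_life(days, health_index, failure_threshold=70.0))  # 5.0
```

In a Prescriptive layer, outputs like these would feed root cause analysis and decisions such as whether to run to the next planned outage.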

Conclusions and Outlook
In recent years, although many process-industry solution suppliers (AVEVA, AspenTech, KBC, Siemens, etc.) have offered digital twin solutions, to a large extent they are re-packaging their established core technologies, which have been available on the market for a long time. For example, process modeling and simulation can be traced back to the 1970s. As a result, there are different interpretations of digital twins.
This makes it difficult to propose generic yet abstract architectures for chemical digital twins and to define their position in process industrial systems.
In this paper, we have taken a step forward by providing a specific definition, the core elements, the features and the application scenarios of DT in the petrochemical industry, and have also discussed the multi-scale features of chemical DT in space and time and its life cycle.
In the future, as digital twins are applied further in the chemical industry, some issues still need further work, mainly in three aspects:
1) Data governance. One of the biggest challenges in digital twin applications is the often-discussed problem of data silos. In the petrochemical industry, a huge variety of data is produced and has to be managed in several different software tools, databases and documents, and harmonized data structures are lacking. Data integration needs to be improved across the whole asset lifecycle, with a unified data standard and a single source of truth for data exchange and sharing.
2) Model alliance. While individual point DT solutions exist today, in the future there should be one multi-purpose DT that aligns the asset lifecycle and the value chain. One solution is a model alliance, which enables key master data and model components to be shared between different applications to maximize synergies throughout the organization, breaking down functional silos across engineering, manufacturing, supply chain and maintenance and streamlining application deployment and maintenance.
3) Cloud-based platform. The cloud is already the infrastructure of choice for most business applications. Where possible, cloud-based platforms should be exploited for hosting chemical DTs, for the following reasons:
• They enable the DT to subscribe to external data feeds that can enrich its resolution.
• They support and nourish agility with respect to the DT, allowing experimentation and rapid deployment of new solutions.
• They make solution updates trivial and significantly reduce infrastructure costs.