TITLE:
Unsupervised Methods to Classify Real Data from Offshore Wells
AUTHORS:
Antônio Orestes De Salvo Castro, Mayara De Jesus Rocha Santos, Fabiana Rodrigues Leta, Cláudio Benevenuto C. Lima, Gilson Brito Alves Lima
KEYWORDS:
Unsupervised Classification, Fuzzy C-Means, Control Chart
JOURNAL NAME:
American Journal of Operations Research, Vol.11 No.5, September 2, 2021
ABSTRACT: In the petroleum industry, sensor data and information are valuable. They
can help to detect, predict, and understand processes during oil production.
Offshore wells require more attention, since workovers, maintenance, and
interventions are more costly than in onshore wells. Coupling data-driven methods for
well-monitoring applications, two unsupervised classification methods, one
statistical and one machine learning-based, are proposed to detect anomalies in
well data. The novelty lies in applying a control chart with a window of three standard deviations to the
Permanent Downhole Gauge Pressure
sensor (P-PDG), and a fuzzy c-means algorithm to classify data from pressure
and temperature sensors in an offshore field. The main goal in structuring a
classified data set is to use it to train machine learning models to monitor and
manage petroleum production. Modeling applications for early fault detection
systems in offshore production, based on real-time data from production sensors,
require classified data sets. Labeling the data with two target classes, “normal” and “fault”, is a key step in training the
machine learning models. Therefore, this paper applies two methodologies to
classify a real-time data set to create a training data set divided into
“normal” and “fault” classes. This makes it possible
to visualize the abnormal events pointed out by the methodologies and
compare how sensitive each method is. In addition, a random forest application is proposed to test the performance of the
classified data sets from both methods. The results show that the control chart method presents higher sensitivity than
fuzzy c-means; however, the differences
between them are insignificant. The random forest displayed
sensitivity and specificity values of 99.91% and 100% for the data set classified
by the control chart method and 94.01% and 99.98% for the data set classified
by the fuzzy c-means algorithm.
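
The control chart labeling described in the abstract can be illustrated with a minimal sketch. It assumes the limits are set at the mean plus or minus three standard deviations of a reference ("in-control") pressure series. The abstract does not specify how the window is estimated, so the function name `control_chart_labels`, the `baseline` argument, and the synthetic pressure values below are all hypothetical and not taken from the paper.

```python
import numpy as np

def control_chart_labels(pressure, baseline, k=3.0):
    """Label samples as 'normal' or 'fault' with mean +/- k*sigma limits.

    Assumption: the limits come from a reference (in-control) series;
    the paper may estimate its window differently.
    """
    pressure = np.asarray(pressure, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    center = baseline.mean()
    sigma = baseline.std(ddof=1)          # sample standard deviation
    lower, upper = center - k * sigma, center + k * sigma
    out_of_control = (pressure < lower) | (pressure > upper)
    return np.where(out_of_control, "fault", "normal")

# Synthetic, illustrative P-PDG readings (arbitrary units, not real well data).
baseline = np.tile([199.0, 200.0, 201.0], 100)   # mean 200, sigma about 0.82
readings = np.array([200.2, 199.5, 215.0, 200.9, 185.0])
labels = control_chart_labels(readings, baseline)
print(labels.tolist())
```

With these synthetic values, only the readings far outside the three-sigma band (215.0 and 185.0) are labeled "fault". A labeled series of this kind is the sort of training target that a classifier such as a random forest could then be fitted on.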