Effects of Tuning Decision Trees in Random Forest Regression on Predicting Porosity of a Hydrocarbon Reservoir. Case Study: Volve Oil Field, North Sea

Abstract

Machine learning (ML) has emerged as a powerful tool in petroleum engineering for predicting reservoir properties such as porosity. Random forest regression (RFR) is one widely used ML technique. To optimize its performance, one of its hyperparameters, the number of trees in the forest (n_estimators), is tuned. The existing literature lacks in-depth studies on the influence of n_estimators on RFR models used for predicting porosity. In this study, the effects of n_estimators on RFR models for porosity prediction were investigated. Furthermore, the interaction of n_estimators with two other key hyperparameters, namely the number of features considered for the best split (max_features) and the minimum number of samples required at a leaf node (min_samples_leaf), was explored. The RFR models were developed using four input features, namely the resistivity log, neutron porosity log, gamma ray log and the corresponding depths, obtained from the Volve oil field. Calculated porosity was used as the target data. The methodology consisted of four approaches. In the first approach, only n_estimators was changed; in the second, n_estimators was changed along with max_features; in the third, n_estimators was changed along with min_samples_leaf; and in the final approach, all three hyperparameters were tuned. Models were evaluated using adjusted R² (adj. R²), root mean squared error and computational time. The results showed that the highest performance, an adj. R² value of 0.8505, was obtained when n_estimators was 81, max_features was 2 and min_samples_leaf was 1. In approach 2, increasing the upper limit of n_estimators from 10 to 100 improved test model performance by more than 1.60%, whereas increasing it from 100 to 1000 reduced performance by around 0.4%. Models developed by tuning n_estimators from 1 to 100 in intervals of 10 had healthy test model adj. R² values and lower computational times, making this the best n_estimators range and interval for predicting the porosity of the Volve oil field when both performance and computational time were taken into consideration. Further, it was concluded that tuning only n_estimators and max_features can significantly increase the performance of RFR models.
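
The two accuracy metrics named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the standard textbook definition of adjusted R², 1 - (1 - R²)(n - 1)/(n - p - 1), with n samples and p input features (p = 4 here, matching the four inputs described above). The function names and the porosity values are hypothetical and chosen only for demonstration.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2 under the standard definition (an assumption here):
    1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y_true)
    if n - n_features - 1 <= 0:
        raise ValueError("need more samples than n_features + 1")
    r2 = r2_score(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Made-up porosity fractions, for illustration only (not Volve data).
y_true = [0.10, 0.15, 0.20, 0.25, 0.30, 0.22, 0.18, 0.12]
y_pred = [0.12, 0.14, 0.19, 0.27, 0.28, 0.21, 0.20, 0.11]

print("adj. R2:", adjusted_r2(y_true, y_pred, n_features=4))
print("RMSE:   ", rmse(y_true, y_pred))
```

Because the adjustment penalises the number of input features, adjusted R² never exceeds plain R² for a given set of predictions, which makes it a more conservative score when models with different feature counts are compared.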

Article information

Article type
Paper
Submitted
15 May 2024
Accepted
06 Aug 2024
First published
08 Aug 2024
This article is Open Access
Creative Commons BY license

Energy Adv., 2024, Accepted Manuscript

K. Sandunil, Z. Bennour, H. Ben Mahmud and A. Giwelli, Energy Adv., 2024, Accepted Manuscript , DOI: 10.1039/D4YA00313F

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
