Abstract
Early response to marine pollution that may occur after marine accidents can prevent or minimize the risk of pollution. This research aims to develop a data-driven model to predict the occurrence of pollution following marine accidents. The model was developed with the Random Forest algorithm using 10,829 marine accidents that occurred in Canadian waters between 2004 and 2019, obtained from the Transportation Safety Board of Canada. Before model development, missing data were imputed with the Naive Bayes algorithm during the data preprocessing stage. A cost-sensitive method was also integrated into the model development phase to cope with the imbalanced dataset. The developed model achieved an accuracy of 93.54%. Moreover, the model predicted pollution occurrences, which are a small minority in the dataset, with satisfactory accuracy. The developed model was compared with eight different algorithms and achieved the highest performance, which supports its validity. The variables with the greatest effect on the model were, in order, accident type, ship type, area type, and season. These variables were analyzed in detail using Logistic Regression. The developed model is expected to serve as a decision support system for the relevant authorities to predict the occurrence of pollution after marine accidents.
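The abstract does not say which cost scheme was used. As a rough illustration only, one common way to make a classifier cost-sensitive on an imbalanced dataset is to weight each class inversely to its frequency (the heuristic behind scikit-learn's `class_weight="balanced"` option). The sketch below uses hypothetical toy labels, not the study's data, and the helper names `balanced_class_weights` and `weighted_error` are invented for this example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Share of the total class-weighted mass that is misclassified."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Hypothetical toy labels (1 = pollution, 0 = no pollution); NOT the study's data.
y_true = [0] * 18 + [1] * 2
weights = balanced_class_weights(y_true)

# A trivial model that always predicts "no pollution".
y_pred = [0] * 20

# Unweighted error hides the fact that every pollution case is missed.
plain_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

# Class-weighted error penalizes missed minority cases much more heavily.
cost_error = weighted_error(y_true, y_pred, weights)

print(weights, plain_error, cost_error)
```

With weights of this kind, a model that ignores the minority class no longer looks good on the training objective, which is the general motivation for cost-sensitive learning on imbalanced accident data.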
-
Scope
International
-
Type
Peer-reviewed
-
Index info
WOS.SCI
-
Language
English
-
Article Type
None
-
Keywords
Marine pollution; Oil spill; Marine accident analysis; Data mining; Machine learning; Random forest