Ⅰ. Introduction
The handling and administration of high-dose I-131 poses a risk of internal and external exposure for radiology technologists. The risk of internal exposure is lower today than it was in the past because of the increased stability of therapeutic capsule formulations of I-131 and the widespread adoption of the capsular form of I-131 for therapy [1, 2]. However, the risk of external exposure to localized areas such as the hands and eyes still exists, because the radioactivity of a dose must be measured directly with a dose calibrator whenever the calibration date differs from the date on which the dose is administered to the patient. Radiology technologists should wear protective gear when measuring high-dose I-131, as shown in Fig. 1.
The equivalent dose can be calculated using Equation 1, where WR represents the radiation weighting factor of the incident radiation R, and DT,R is the absorbed dose from radiation R averaged over the tissue or organ T. For example, when working for 10 seconds to measure 7400 MBq of I-131, the equivalent dose to a hand at a distance of 10 cm from the I-131 is 0.157 mSv (the gamma constant for I-131 is 7.647E-5 mSv/h per 1 MBq at 1 m).
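As a worked check of this example (a sketch assuming the quoted gamma constant, the inverse-square law, and Equation 1 in its standard ICRP form, HT = ΣR WR·DT,R):

\dot{H}(1\,\text{m}) = 7400 \times 7.647\times10^{-5} \approx 0.566\ \text{mSv/h}
\dot{H}(0.1\,\text{m}) = 0.566 \times (1.0/0.1)^{2} \approx 56.6\ \text{mSv/h}
H(10\ \text{s}) \approx 56.6 \times 10/3600 \approx 0.157\ \text{mSv}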
As the number of measurements increases, the exposure dose also increases, which places a great psychological burden on radiology technologists. In order to reduce the long-term radiation exposure of workers, we conducted a study employing AI technology. Machine learning is a subtype of AI in which algorithms learn from data without being explicitly programmed [3-5]. Machine learning is generally divided into supervised learning and unsupervised learning. Supervised learning algorithms learn from paired inputs and outputs presented to them by humans, finding patterns so that they can correctly predict output values for new data. In this study, we designed a supervised machine learning model by applying techniques used in big data analysis modeling, which is currently being used in the data science field. We aimed to train the model to predict the radioactivity of high-dose I-131 from the external dose rates measured around the I-131 in a shielded container.
Ⅱ. Materials and methods
A sequential depiction of our methodology is shown in Fig. 2. The external dose rates were measured at distances of 1 m, 0.3 m and 0.1 m from I-131 sources in shielded containers obtained from suppliers, with each dose rate taken as the highest value observed over a 1-minute time span (Fig. 3). Because the survey meter used for the external dose rate reports values in μSv per hour and its readings fluctuate, we adopted the method of taking the highest value measured over a 1-minute time span to obtain consistent measurements. A total of 868 sets of measurements were acquired, 358 at 3700 MBq (100 mCi) and 510 at 5550 MBq (150 mCi). The actual measured activities were significantly higher than the prescribed amounts (up to 10% higher) because of the standard practice of preparing doses with a margin to account for decay; therefore, 3700 MBq and 5550 MBq refer to the nominal prescribed activities, not the actual measured values. A RAM DA3-2000 Meter (Rotem Industries Ltd.) was used to measure the dose rates. We used only one type of instrument because we aimed to predict results from intuitive measurement values without applying correction factors.
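A minimal sketch of how one such measurement set might be organized in R (the column names and the two rows of values are our own illustration, not data from this study):

# Hypothetical layout of the data set: dose rates (uSv/h) measured at three
# distances from the shielded I-131, plus the nominal prescribed activity (MBq).
# The real data set contains 868 such rows.
measurements <- data.frame(
  rate_1m      = c(95, 150),      # dose rate at 1 m; illustrative values only
  rate_03m     = c(820, 1290),    # dose rate at 0.3 m
  rate_01m     = c(5100, 8200),   # dose rate at 0.1 m
  activity_MBq = c(3700, 5550)    # nominal prescribed activity
)
str(measurements)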
1. Exploratory data analysis
We employed exploratory data analysis (EDA), which was devised by Tukey JW in 1977 [6]. Using EDA, the characteristics of the total data set were identified; data preprocessing, such as identifying outliers and missing values, was performed; and the correlation between the explanatory variables and the response variable was examined with a linear regression model. Finally, we visualized the suitability of the data for modeling.
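A brief sketch of these EDA steps in R, assuming a data frame of all 868 measurement sets in the layout sketched above:

# Summary statistics, missing values, and a visual outlier check
summary(measurements)
colSums(is.na(measurements))                  # count missing values per column
boxplot(measurements[, c("rate_1m", "rate_03m", "rate_01m")],
        ylab = "Dose rate (uSv/h)")

# Correlations among the dose rates and with the response
cor(measurements)

# Linear regression of activity on the measured dose rates
fit <- lm(activity_MBq ~ rate_1m + rate_03m + rate_01m, data = measurements)
summary(fit)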
2. Machine learning
We used the hold-out method to generate the training and test sets for training the machine learning models. This is one of the simplest data resampling strategies: it randomly samples some data from the learning set for the test set, while the remaining data constitute the training set [7, 8]. In our study, the data set was split with 70% of the data in the training set and 30% in the test set. We created two models that employed simple algorithms and two models that employed ensemble algorithms. All supervised machine learning algorithms were implemented using the R programming language (version 4.2.0) in the Windows operating system environment.

Linear regression is one of the oldest and most widely used correlational techniques. The goal of the method is to fit a straight line to a set of data points using a series of coefficients multiplied by each input, like a weighting function, plus an intercept. Linear regression is easy to understand and quick to implement, even on larger data sets. The downside of this method is that it is inherently linear and does not always fit real-world data [9]. A decision tree (DT) algorithm constructs a tree by recursively dividing the data set into smaller subsets until each partition is clean and pure, with the splitting depending on the type of data [10, 11]. The DT algorithm is one of the most effective learning algorithms due to its ability to handle all types of data, ease of comprehension and simplicity [12]. The party package in R was used with a maximum tree depth of 3 for visualization of the DT (R Documentation: Package party version 1.3-10).

Among ensemble algorithms, bagging (bootstrap aggregation) is a technique for reducing the variance of an estimated prediction function. Bagging seems to work especially well for high-variance, low-bias procedures, such as trees. For regression, we simply fit the same regression tree many times to bootstrap-sampled versions of the training data and average the results. Similar to bagging, boosting is a committee method, although, unlike bagging, the committee of weak learners evolves over time and the members cast a weighted vote. Boosting appears to dominate bagging on most problems and has become the preferred choice. We chose to use random forest (RF), which is a substantial modification of bagging that builds a large collection of de-correlated trees and then averages them. On many problems the performance of RF is very similar to boosting, and RF is simpler to train and tune. As a consequence, RF is popular and is implemented in a variety of packages [13]. Extreme gradient boosting (XGBoost) is the fastest implementation of gradient boosting, a representative boosting algorithm. XGBoost can harness all the processing power of modern multicore computers and is feasible to train on large data sets [14]. XGBoost was chosen for comparison with RF.
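A condensed sketch of this workflow in R, continuing from the hypothetical data frame above (the randomForest and xgboost packages and all hyperparameters are our assumptions; only the 70/30 hold-out split, the fixed seed and the choice of algorithms come from the study; the party model is sketched later, under the Results):

library(randomForest)
library(xgboost)

set.seed(123)                                   # fix random sampling (illustrative seed)
idx   <- sample(nrow(measurements), size = round(0.7 * nrow(measurements)))
train <- measurements[idx, ]                    # 70% training set
test  <- measurements[-idx, ]                   # 30% test set

# Simple algorithm: linear regression
lm_fit <- lm(activity_MBq ~ ., data = train)

# Ensemble algorithm 1: random forest
rf_fit <- randomForest(activity_MBq ~ ., data = train)

# Ensemble algorithm 2: XGBoost (regression with squared-error loss)
x_cols  <- c("rate_1m", "rate_03m", "rate_01m")
dtrain  <- xgb.DMatrix(as.matrix(train[, x_cols]), label = train$activity_MBq)
xgb_fit <- xgb.train(params = list(objective = "reg:squarederror"),
                     data = dtrain, nrounds = 100)

pred_lm  <- predict(lm_fit,  test)
pred_rf  <- predict(rf_fit,  test)
pred_xgb <- predict(xgb_fit, xgb.DMatrix(as.matrix(test[, x_cols])))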
3. Evaluation
To evaluate the predictive accuracy of the models, the root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE) were calculated according to Equations 2-4, where yi is the actual radioactivity of the I-131 dose and ŷi is the radioactivity predicted by the model.
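For reference, the standard definitions of these metrics (our reconstruction; the paper's Equations 2-4 themselves are not reproduced here) are:

\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}},\quad
\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2},\quad
\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|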
R squared (R2), the coefficient of determination, was also used to evaluate the models. R2 is calculated according to Equation 5, where ȳ is the mean of the actual radioactivity values of the I-131 doses (yi). The ModelMetrics package in R was used to print the evaluation results. To evaluate the two best-performing models, we validated them with 11 newly measured sets.
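Equation 5 corresponds to the standard definition

\mathrm{R}^{2}=1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}}

and the metrics can be computed as in the sketch below, which continues the earlier hold-out sketch (the manual R2 line is our own, since we are not certain the package provides it):

library(ModelMetrics)
actual <- test$activity_MBq
rmse(actual, pred_rf)                          # root mean square error
mse(actual, pred_rf)                           # mean square error
mae(actual, pred_rf)                           # mean absolute error
1 - sum((actual - pred_rf)^2) / sum((actual - mean(actual))^2)   # R squared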
Ⅲ. Results
1. Exploratory data analysis
Fig. 4 is a boxplot of the total data set. There were no missing values, and no outliers were detected. It can be seen that the interquartile range (IQR) of the dose rate is smallest at 1 m, meaning that the dose rate fluctuates less at greater distances from the shielded I-131, in accordance with the inverse-square law. In the correlation analysis, the correlation coefficient between distance from the shielded I-131 and dose rate was high (≥ 0.9), and the correlation coefficients among the dose rates measured at different distances were also close to 1. We investigated multicollinearity by calculating the variance inflation factor (VIF) among the variables in the total data set. Since the VIF of the total data set was very high, we divided the data set into two groups, a 3700 MBq group and a 5550 MBq group, for statistical analysis (Table 1). The overall regression was statistically significant in both the 3700 MBq group (R2=0.98, F(3, 354)=5783.225, P<0.001) (Table 2) and the 5550 MBq group (R2=0.96, F(3, 506)=4494.678, P<0.001) (Table 3).
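A sketch of the VIF check in R, assuming the car package (the paper does not name the implementation) and the data layout sketched earlier:

library(car)                       # provides vif()

# VIF of the explanatory variables (dose rates) in the total data set;
# values far above 10 indicate severe multicollinearity.
fit_all <- lm(activity_MBq ~ rate_1m + rate_03m + rate_01m, data = measurements)
vif(fit_all)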
Fig. 5 is a visualization of the classification and regression trees (CART) algorithm implemented in the DT. The appearance of the 0.3 m variable in the first node at the top implies that it is most closely related to the radioactivity of I-131. It can be seen that the data set was suitable for the model, as it was split into 4 leaf nodes at 3700 MBq and 4 leaf nodes at 5550 MBq.
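A sketch of the DT fit and plot with the party package named in the methods (maxdepth = 3 as stated; everything else is illustrative), continuing the hold-out sketch above:

library(party)

# Conditional inference tree limited to a depth of 3 for readability
dt_fit <- ctree(activity_MBq ~ ., data = train,
                controls = ctree_control(maxdepth = 3))
plot(dt_fit)                       # tree diagram analogous to Fig. 5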
2. Evaluation of models
When evaluating the accuracy of the models, the smaller the RMSE, MSE and MAE, the better the performance of the model, and the closer R2 is to 1, the higher the explanatory power of the model. The random numbers used for random sampling were fixed by setting a seed. The model with the best performance was the RF model, with an RMSE of 8.894, MSE of 79.098 and MAE of 6.546. On the other hand, as expected, the performance of the models employing a simple algorithm was not good; in particular, the MSE of the linear regression model was 71901.870 (Table 4).
Ⅳ. Discussion
In this study, we created a data set consisting of external dose rates of high-dose I-131 measured at distances of 1 m, 0.3 m and 0.1 m, and used it to train supervised machine learning algorithms to predict the actual radioactivity of I-131 doses. The purpose of this study was not to compare and evaluate machine learning algorithms, but to find the optimal model for predicting the actual radioactivity. This means that the desired characteristics of candidate models, such as the type of data and linearity, were identified in advance, and we had specific suitable models in mind. The correlation and linear relationship found in the linear regression analysis support this. For modeling, the RF and XGBoost algorithms are already widely known to be suitable for regression and classification [15-17]. RF, which performed best in this study, has many advantages: it is fast in both model training and evaluation, is robust to outliers, can capture complex nonlinear associations, and has been shown to handle the challenges arising from small data sets [18-20]. RF performed very well even with a small data set in this study.

We now consider how much the concern about radiation exposure can be reduced. Several studies have been conducted on safely reducing the radiation exposure of workers. Among these, Lützen U et al. developed a shielded measurement method by creating calibration curves from measurements of shielded I-131 capsules with a dose calibrator, a well-type and a thyroid uptake probe; this method reduced the effective radiation dose by 94.9% [21]. In the introduction of this study, we calculated the equivalent dose to the hand when directly measuring the radioactivity of 7400 MBq of I-131. If our trained RF model were used, although there would be the disadvantage of increased working time due to the measurement of the external dose rates, an equivalent dose of about 0.00885 mSv would be expected, which represents a 94% reduction compared to the direct measurement method (Table 5). This use of a predictive model to reduce the equivalent dose is not limited to determining the radioactivity of high-dose I-131, but can be extended to predicting the radioactivity of other sources from measured external dose rates, and further research is expected.

The limitations of this study are as follows. First, overfitting could not be entirely excluded in the two ensemble models. In the training set, the RMSE was 8.894 for the RF model and 10.209 for the XGBoost model, but when the performance of the RF and XGBoost models was validated using 11 newly measured sets, the RMSE was 38.726 for the RF model and 42.670 for the XGBoost model (Table 6). According to Breiman L, a relatively large number of variables is required to obtain a near-optimal test set error [22]. This means that external dose rate values measured at various distances are required, and increasing the number of these variables would improve the performance of the model. The performance of the model could be improved if additional measurement distances were added in place of the measurement at 1 m, where the variable importance is 0 (Table 7), but this has the drawback of increasing the measurement time. These issues should be considered going forward. For therapeutic administrations of I-131, if more than a 10% variance from the prescribed dose or dosage range occurs, the referring physician and patient should be notified [23, 24].
The second limitation is that the model cannot predict when this variance of 10% or more will occur. When RF is employed for regression, it is unable to predict doses beyond the range seen in the training data set, and because the training data set contained no cases with an error of more than 10%, the model could not learn to predict such cases. We expect this problem could be addressed by employing a classification model such as a support vector machine (SVM), which is capable of learning to identify cases where there is a 10% or greater variance in dose.
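As an illustration of the kind of classifier we have in mind (a minimal sketch using the e1071 package; the data and the over_10pct label are entirely hypothetical and not part of this study):

library(e1071)

# Purely illustrative data: dose rates (uSv/h) and a label indicating whether
# the actual activity deviated from the prescribed activity by 10% or more.
toy <- data.frame(
  rate_03m   = c(800, 830, 1250, 1300, 900, 1350),
  rate_01m   = c(5000, 5200, 8100, 8400, 5600, 8700),
  over_10pct = factor(c("no", "no", "no", "no", "yes", "yes"))
)
svm_fit <- svm(over_10pct ~ rate_03m + rate_01m, data = toy,
               type = "C-classification", kernel = "radial")
predict(svm_fit, data.frame(rate_03m = 1340, rate_01m = 8600))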
Ⅴ. Conclusion
To predict the radioactivity of high-dose I-131 using a supervised machine learning model, we collected a data set containing external dose rates measured at various distances from shielded I-131. We found a correlation and a linear relationship between the external dose rate of high-dose I-131 and the actual radioactivity. Random forest achieved the best predictive capability, with an RMSE of 8.894, MSE of 79.098, and MAE of 6.546. Improving the model's performance in the future is expected to contribute to lowering exposure among radiology technologists.