Comparison of different dental age estimation methods with deep learning: Willems, Cameriere-European, London Atlas

Subjects

The study protocol was assessed and approved by the Clinical Ethics Committee of Marmara University with approval number 05.05.2023.720. The study was conducted in accordance with the principles of medical research involving human subjects stated in the Declaration of Helsinki. Written informed consent was obtained from the parents or legal guardians of all subjects in the study.

This study analyzed digital panoramic radiographs of children aged 5 to 16 years who were admitted to the Marmara University Faculty of Dentistry clinics for routine dental care between May 2023 and September 2023. Patients at Marmara University come from northwestern Turkey, living in Istanbul or the surrounding provinces. Based on the study of Tural [11], the required sample size was calculated with G*Power 3.1 software (Faul, Erdfelder, Lang, and Buchner, Düsseldorf, Germany) as 773, with 95% confidence (1-α), 95% test power (1-β), and an effect size of 0.130. Allowing for approximately 20% attrition, the study population was set at 930.

Selection criteria

Children were eligible for inclusion if they had panoramic radiographs taken between May 2023 and July 2024, were aged between 5 and 16 years at the time of radiographic imaging, had radiographs deemed diagnostically excellent (Grade 1) according to the United Kingdom National Radiological Protection Board [12], and did not have any systemic diseases or syndromes.

Children with hyperdontia, hypodontia, or any other dental anomaly, periapical or periodontal lesions, tooth loss for any reason, a history of orthodontic treatment, jawbone cysts or tumors, or a history of orofacial or dental trauma were excluded from the study.

Radiographic evaluation

All digital panoramic radiographs included in the study were acquired using a Morita device (VeraView IC5, J. Morita MFG. Corporation, Kyoto, Japan) with an exposure time of 8.8 s, a tube voltage of 60–70 kV, and a tube current of 7.5 mA at the Oral Diagnosis and Radiology Clinic of Marmara University Faculty of Dentistry. The diagnostic quality of the radiographs was assessed in accordance with the criteria of the United Kingdom National Radiological Protection Board [12]. According to these criteria, Grade 1 represents diagnostically excellent radiographs without any irradiation, positioning, or procedural errors, while Grade 2 radiographs may include some minor irradiation, positioning, or procedural errors. Grade 3 radiographs contain irradiation, positioning, and/or procedural errors to an extent that is deemed unacceptable for evaluation. For assessments to be valid, the criteria stipulate that at least 70% of the radiographs included in a study should be Grade 1, less than 20% should be Grade 2, and less than 10% should be Grade 3 [12]. In this study, only panoramic radiographs of Grade 1 diagnostic quality were considered for evaluation. The images were labeled and saved with the child’s date of birth and the date of the panoramic radiograph. The flowchart of patient enrollment and data analysis is shown in Fig. 1.

Fig. 1

The flowchart of patient enrollment and data analysis

Chronological age calculation

The chronological age of each patient was calculated in decimal years using the formula [(date of panoramic radiograph) - (official date of birth)] / 365.25 in Microsoft Excel software. Patient gender information was also recorded alongside chronological age.
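The formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the spreadsheet used in the study; the function name and the example dates are hypothetical.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # divisor stated in the text


def chronological_age(birth_date: date, radiograph_date: date) -> float:
    """Return the age in decimal years at the time of the radiograph."""
    if radiograph_date < birth_date:
        raise ValueError("radiograph date precedes date of birth")
    return (radiograph_date - birth_date).days / DAYS_PER_YEAR


# Hypothetical example dates, not taken from the study data.
example_age = chronological_age(date(2013, 3, 15), date(2023, 6, 1))
print(round(example_age, 2))
```

Dividing by 365.25 averages out leap years, so the result can differ by a day or so from an exact calendar-based age.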

Age estimation of radiographic methods using developing teeth

Dental age determination was performed using the Willems, Cameriere-European, and London Atlas methods.

Willems

The Willems method was developed by Willems et al. [5], using ANOVA, as an improvement on the Demirjian method, which tended to overestimate dental age relative to chronological age. It was initially applied to a white Belgian population. Age determination with the Willems method relies on the mineralization stages of the seven left mandibular permanent teeth. These stages are scored using the tables provided by Willems, and the sum of the scores directly gives the individual’s dental age [5].

Cameriere-European

The Cameriere-European formula was developed in 2007 by Cameriere and colleagues for dental age determination in children from Europe and surrounding countries. The method involves measuring root development and apical opening in the seven left mandibular teeth. The number of teeth with completed root development is denoted N0. For teeth with open apices, the distance between the inner surfaces of the apical openings was measured in single-rooted (Ai, i = 1, ..., 5) and multi-rooted (Ai, i = 6, 7) teeth. To minimize potential magnification and angulation errors, each distance was divided by the total tooth length (Li, i = 1, ..., 7) to give the normalized value Xi = Ai/Li. The sum of the normalized values, s (the sum of all Xi), is inserted together with X5 and N0 into the Cameriere-European formula to calculate the individual’s dental age. In the formula, the variable g is 1 for boys and 0 for girls, and the formula is expressed as “Age = 8.387 + (0.282 * g) - (1.692 * X5) + (0.835 * N0) - (0.116 * s) - (0.139 * s * N0)” [4, 13].
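As a minimal sketch, the quoted formula can be evaluated as follows. The coefficients are those given in the text; the function name, the input layout (seven values ordered i = 1 to 7), and the convention that a tooth with closed apices is recorded as Ai = 0 and counted in N0 are assumptions made for illustration.

```python
from typing import Sequence


def cameriere_european_age(
    apex_widths: Sequence[float],
    tooth_lengths: Sequence[float],
    is_boy: bool,
) -> float:
    """Dental age in years from the Cameriere-European formula quoted above.

    apex_widths[i] holds A_i (assumed 0 for a tooth with closed apices) and
    tooth_lengths[i] holds L_i, both for teeth i = 1..7 in the same order.
    """
    if len(apex_widths) != 7 or len(tooth_lengths) != 7:
        raise ValueError("expected measurements for seven teeth")
    x = [a / l for a, l in zip(apex_widths, tooth_lengths)]  # X_i = A_i / L_i
    n0 = sum(1 for xi in x if xi == 0)  # teeth with completed root development
    s = sum(x)  # sum of the normalized open-apex values
    g = 1 if is_boy else 0
    x5 = x[4]  # X_5, the fifth tooth in the series
    return (8.387 + 0.282 * g - 1.692 * x5 + 0.835 * n0
            - 0.116 * s - 0.139 * s * n0)
```

Because every Ai is divided by its own Li, the inputs can be in any unit (pixels or millimetres) as long as both measurements of a tooth use the same one.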

London Atlas

The London Atlas, developed in 2010, consists of a series of schematic images created for specific ages. It is evidence-based and available in multiple languages. The child’s panoramic radiograph is compared with the reference images in the atlas to select the best-matching image [1, 3].

For measurements, all radiographs included in the study were transferred to a computer-assisted measurement program (ImageJ version 1.49v, National Institutes of Health, Bethesda, Maryland, USA). All manual measurements were performed by two pediatric dentists with 7 and 9 years of experience.

Deep learning algorithm

The performance of deep learning methods, specifically Convolutional Neural Networks (CNNs), for dental age estimation was assessed. For the experiments, the dataset was split into training and test sets using the common 80%/20% scheme. The training data was further divided into training and validation sets for hyperparameter tuning and model architecture selection, with 90% of the data used for training and the remaining 10% for validation. Final estimates on the test data were reported. The experiment was repeated five times in the manner of cross-validation, so that predictions were obtained for the whole dataset and the results were comparable with those of the other examined methods.
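One way to realize such a scheme is sketched below: five disjoint test folds of about 20% each, with the remaining data split 90%/10% into training and validation sets. The text does not specify how the folds were drawn, so the shuffling, the seed, and the function name are assumptions.

```python
import random
from typing import Iterator, List, Tuple


def five_fold_splits(
    n_samples: int, n_folds: int = 5, val_fraction: float = 0.10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int], List[int]]]:
    """Yield (train, validation, test) index lists, one triple per fold."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    for k in range(n_folds):
        test = indices[k::n_folds]  # roughly 20% of the data when n_folds == 5
        test_set = set(test)
        rest = [i for i in indices if i not in test_set]
        rng.shuffle(rest)
        n_val = round(len(rest) * val_fraction)  # 10% of the non-test data
        yield rest[n_val:], rest[:n_val], test


# Each sample appears in exactly one test fold, so every radiograph
# receives one out-of-sample prediction.
for train, val, test in five_fold_splits(930):
    print(len(train), len(val), len(test))
```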

In the preparation of the dataset, X-ray images were paired with the chronological ages and gender information of the patients. The images were downsized to reduce the computational load of the training process. A resolution of 224 × 224 pixels was selected for the downsized images. In addition to the X-ray images, the gender information of each patient was incorporated as an additional feature to improve the accuracy of the estimation results (Fig. 2).

Fig. 2

CNN model architecture used for Deep Learning analysis

In this study, various architectures commonly used in deep learning, including ResNet, Xception, EfficientNet, and similar CNN models, were evaluated. Transfer learning was also examined to determine whether it improved the results. Transfer learning with a pre-trained model such as ResNet50 not only failed to improve the results but also slightly degraded performance, likely due to the limited number of training examples available for fine-tuning. As a result, a simpler ResNet model was used. The model begins with four convolutional layers. The first convolutional layer used 64 filters with a kernel size of 3 × 3, followed by a max pooling operation with a 2 × 2 pooling window. The same convolution and max pooling operations were then repeated three times, the only difference being a reduction in the number of filters to 32. These convolutions were followed by ten ResNet blocks. After flattening the output of the ResNet blocks, five feed-forward layers were used to obtain the chronological age prediction. All models were implemented using the TensorFlow library [14].

Statistical analyses

For the Willems, Cameriere-European, and London Atlas methods, 200 randomly selected radiographs were re-evaluated at two-week intervals, and intra- and inter-examiner agreement levels were assessed using Intraclass Correlation Coefficients (ICC). The distribution of age categories by gender was analyzed using the Chi-Square (χ²) test.

The Kolmogorov-Smirnov test was used to examine whether the ages were normally distributed. Gender differences in age were determined using the Independent Samples t-test. The Paired Samples t-test was used to test differences between chronological age and the predicted dental age obtained from the four methods. Relationships between predicted and chronological ages were preliminarily examined using Pearson Correlation analysis. The agreement between chronological age and dental age was investigated using the Intraclass Correlation Coefficient. The ICC was interpreted according to the categories suggested by Shrout and Fleiss [15]. Additionally, the performance of the deep learning algorithm in predicting chronological age was evaluated with various goodness of fit criteria (Mean Squared Error, AIC, etc.).

The chronological ages of the patients and the dental ages predicted by the four methods were compared for each age group using Paired Samples t-tests for normally distributed data and Wilcoxon Signed-Rank tests for non-normally distributed data. The mean differences between the chronological ages and the predicted ages were calculated for each age group. Additionally, mean absolute errors (MAE) were computed to determine the accuracy of the estimation methods. Analyses were performed using SPSS (Ver. 26.0) and R 4.2.2 software. The statistical significance level was set at 0.05.
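The two accuracy measures named above, the mean difference and the MAE, can be illustrated with a short sketch. The study's analyses were run in SPSS and R; the function name and the sign convention (estimated minus chronological, so a positive mean indicates overestimation) are assumptions, and the example values are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def error_summary(
    chronological: Sequence[float], estimated: Sequence[float]
) -> Tuple[float, float]:
    """Return (mean difference, mean absolute error) in years."""
    if len(chronological) != len(estimated) or not chronological:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumed sign convention: estimated age minus chronological age.
    diffs = [e - c for c, e in zip(chronological, estimated)]
    return mean(diffs), mean([abs(d) for d in diffs])


# Hypothetical ages in decimal years, not study data.
mean_diff, mae = error_summary([10.0, 12.0], [11.0, 11.5])
print(mean_diff, mae)
```

The mean difference shows the direction of bias, while the MAE measures the size of the error regardless of sign, so a method can have a mean difference near zero and still a large MAE.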
