Objective: We aimed to compare the inter- and intra-observer variabilities and diagnostic performance of PI-RADS v2 and Likert scoring in the evaluation of prostate cancer on multiparametric prostate MR imaging.
Material and Methods: MRI findings of 53 patients who had histopathologic diagnosis of prostate cancer and 51 patients who had one or more negative transrectal ultrasound guided prostate biopsy were evaluated retrospectively. The images were assessed by three independent observers blinded to histopathologic results of the patients. Intra-observer and inter-observer variabilities were evaluated by Cohen’s Kappa coefficient for all possible pairs of observers. ROC curve analysis was used to determine AUC to evaluate diagnostic performance of Likert and PI-RADS v2 scoring with a 1.5T MR in the diagnosis of prostate cancer.
Results: For PI-RADS v2 scores of 4 or higher, kappa values were κ=0.59-0.71 for intra-observer and κ=0.48-0.67 for inter-observer agreement. For Likert scores of 4 or higher, kappa values were κ=0.54-0.69 for intra-observer and κ=0.42-0.75 for inter-observer agreement. The AUC for detecting prostate cancer of Gleason score 6 or higher was 72% for PI-RADS v2 scores and 75.9% for Likert scores. The difference between the Likert and PI-RADS v2 ROC curves was not statistically significant.
Conclusion: PI-RADS v2 showed no significant superiority over Likert scoring in intra-observer or inter-observer reliability. The two scoring systems did not differ in diagnostic performance for prostate cancer.
Keywords: Prostate cancer; Magnetic Resonance Imaging; PI-RADS v2; Likert
INTRODUCTION
Multiparametric magnetic resonance imaging (mpMRI) is used to image the prostate gland and provides high sensitivity and specificity in the diagnosis of prostate cancer. In many radiology centers, mpMRI is commonly used for the diagnosis and preoperative staging of the disease (1-3).
The widespread use of mpMRI has raised several problems in image assessment. The variable appearance of prostate cancers on different sequences, the variable tissue characteristics of the prostate zones, transitional zone nodules that mimic prostate cancer, and the inability to detect low-grade cancer foci all create a need for objective criteria in the evaluation of mpMRI (1, 4). Likert scoring was used as a non-objective test based on the observers’ overall impression, but the need for objective criteria was debated at the ESUR meeting in 2010, and recommendations based on its conclusions were published (5, 6). In the following years, PI-RADS and its successor PI-RADS v2 were introduced as objective assessment criteria for mpMRI, and routine scoring of all mpMRI examinations was recommended (1). Despite the claim that PI-RADS v2 provides objective assessment, some authors have reported that PI-RADS v2 and Likert scoring have similar inter-observer reliability (7, 8). The aim of this study was to assess the reliability of Likert and PI-RADS v2 scoring in the evaluation of mpMRI.
MATERIALS AND METHODS
Patients who underwent mpMRI at our institution between January 2015 and January 2016 for suspected prostate cancer and who had a histopathologic diagnosis after the MRI were evaluated retrospectively. Ethics committee approval and written informed consent from all patients were obtained. The inclusion criteria were an mpMRI performed at our institution between January 2015 and January 2016 for suspected prostate cancer and a histopathologic diagnosis based on at least one systematic core biopsy or total prostatectomy. One patient whose prostate cancer was diagnosed histopathologically after transurethral resection of the prostate was also included. Patients who had undergone prostate surgery or received radiotherapy for prostate cancer before the MRI, and those whose MR images were inadequate for evaluation because of a missing sequence or severe artifacts, were excluded. One patient with extensive peripheral organ invasion that made the primary cancer difficult to identify was also excluded. Patient selection is summarized in Figure 1. In total, 104 patients were included in the study.
Imaging was performed with a 1.5 Tesla MRI system (General Electric Optima450w 1.5T, GE Medical Healthcare, US). A 16-channel phased-array body coil was used. The essential sequences consisted of T2-weighted, diffusion-weighted (DWI), and dynamic T1-weighted images. Detailed parameters are summarized in Table 1. Ktrans and Kep perfusion maps and ADC maps were generated on the MRI workstation (Advantage Workstation 4.6, GE Medical Systems, Milwaukee, WI). Dynamic contrast-enhanced (DCE) imaging was performed with 40 phases at intervals of 8-12 seconds. For DCE, intravenous gadobutrol (Gadovist®, BAYER) was given at a dose of 0.1 mmol/kg and an injection rate of 2 cc/sec, followed by a 20 ml saline bolus.
Demographic data, PSA, free PSA, and prostate volume of the patients were recorded. MR images were evaluated by three observers who had 12, 4, and 3 years of experience in abdominal magnetic resonance imaging. PSA, free PSA, and PSA ratio were available to the observers during the mpMRI assessment, but the observers were blinded to the histopathologic diagnosis of the patients.
PI-RADS v2 scoring was performed by the observers according to the current guideline (1). Likert scoring was based on the observers’ subjective overall impression of the images. Likert scores from 1 to 5 were used; a higher score indicated a higher likelihood of cancer according to the observer’s impression, which was not based on defined criteria.
In the first stage of evaluation, the three observers evaluated the mpMRI examinations independently of each other and scored all patients according to PI-RADS v2. After one week, they assessed the images again, blinded to the prior PI-RADS v2 scores, and assigned subjective Likert scores. To avoid recall bias, the reading order of patients was randomized for each evaluation round. After an interval of one month, they repeated the PI-RADS v2 and Likert scoring independently in the same way, blinded to the prior scores. In total, six PI-RADS v2 and six Likert scores were recorded for each patient. Scoring was based on the dominant lesion within the peripheral zone, central gland, and whole gland.
After all evaluations had been completed, final PI-RADS v2 and final Likert scores were defined by consensus of the three observers. These final scores were used to categorize patients so that inter-observer agreement could be compared across PI-RADS v2 groups. The final PI-RADS v2 and Likert scores were also used for ROC analysis. The study design is summarized in Figure 2.
Scoring was performed separately for the peripheral zone and the central gland, and the higher of the two scores was accepted as the overall score (Figure 3). Evaluating the zones separately also allowed the reliability of mpMRI to be assessed for the different anatomic zones of the prostate gland.
Descriptive data of the patients were expressed as mean, standard deviation (SD), minimum, and maximum. Frequencies and ratios were calculated for nominal variables. Inter-observer and intra-observer reliability were calculated with Cohen’s Kappa coefficient for PI-RADS v2 and Likert scores dichotomized at 4 or higher. ROC analysis was used to assess the diagnostic performance of mean PI-RADS v2 and Likert scores. Weighted kappa statistics were additionally used to evaluate the inter-observer agreement of PI-RADS v2 across tumor grades and groups. SPSS Statistics 21.0 (New York, United States) software was used for the statistical analysis.
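As an illustration of the agreement statistic described above, the following Python sketch computes Cohen’s kappa for every pair of observers after dichotomizing scores at 4 or higher. It is not the SPSS procedure used in this study; the function names and the example scores are hypothetical and serve only to show the calculation.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    categories = set(a) | set(b)
    # Observed agreement: fraction of cases that both raters scored identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement expected from each rater's marginal frequencies.
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if p_e == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (p_o - p_e) / (1 - p_e)


def dichotomize(scores, cutoff=4):
    """Map 1-5 scores to 1 (score >= cutoff) or 0 (score < cutoff)."""
    return [int(s >= cutoff) for s in scores]


# Hypothetical 1-5 scores from three observers for eight patients (not study data).
scores = {
    "observer_1": [5, 4, 2, 3, 4, 1, 5, 2],
    "observer_2": [4, 4, 2, 4, 3, 1, 5, 3],
    "observer_3": [5, 3, 1, 3, 4, 2, 4, 2],
}

# Kappa for all possible pairs of observers.
pairwise_kappa = {
    (r1, r2): cohen_kappa(dichotomize(scores[r1]), dichotomize(scores[r2]))
    for r1, r2 in combinations(scores, 2)
}

for pair, kappa in pairwise_kappa.items():
    print(pair, round(kappa, 3))
```

Intra-observer agreement can be computed with the same function by passing the first and second reading of a single observer instead of the readings of two different observers.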
RESULTS
The mean age of the 104 patients was 64.1±6.27 years. PSA data were available for 101 of the 104 patients and free PSA data for 65 patients. PSA and related data are shown in Table 2.
Prostate cancer was diagnosed in 53 patients (51%). The remaining 51 patients (49%) had benign histopathologic findings. The histopathologic diagnosis was based on core needle biopsy in 79 patients (76%), total prostatectomy in 24 patients (23%), and transurethral resection of the prostate in one patient, who was diagnosed with prostate cancer.
The kappa ranges and mean kappa values for intra-observer agreement of PI-RADS v2 and Likert scoring are shown in Table 3. For scores of 4 or higher, both PI-RADS v2 and Likert showed weak intra-observer agreement in the central gland and moderate intra-observer agreement in the peripheral zone and for the overall score.
The kappa ranges and mean kappa values for inter-observer agreement of PI-RADS v2 and Likert scoring are shown in Table 4. In the central gland, inter-observer agreement of both PI-RADS v2 and Likert scoring was minimal. Inter-observer agreement was higher in the peripheral zone, where it was moderate. However, agreement for the overall scores was weak.
To examine inter-observer agreement in more detail, we evaluated reliability in the following patient categories: benign, benign and low grade (Gleason 6), malignant (Gleason 6 and higher), and high grade (Gleason 7 or higher). Inter-observer agreement was higher in higher-grade tumors, whereas benign findings and low-grade prostate cancers reduced the reproducibility of PI-RADS v2 (Figure 4).
ROC analysis was performed using the final scores obtained by consensus of all observers for each patient. The ROC curves of PI-RADS v2 and Likert scoring are shown in Figure 5. The results of the ROC analysis are summarized in Table 5.
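For readers unfamiliar with the quantity reported by the ROC analysis, the AUC can be read as the probability that a randomly chosen cancer patient receives a higher score than a randomly chosen benign patient, with ties counted as one half. The Python sketch below computes the AUC in this way, together with sensitivity and specificity at a cut-off of 4 or higher. The scores and labels are invented for illustration and are not study data; the analysis in this study was performed in SPSS.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (cancer, benign) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, cutoff=4):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical consensus scores (1-5) and histopathology (1 = cancer, 0 = benign).
final_scores = [5, 4, 4, 3, 2, 3, 2, 1, 4, 1]
histopathology = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

auc = roc_auc(final_scores, histopathology)
sens, spec = sensitivity_specificity(final_scores, histopathology, cutoff=4)
print(f"AUC={auc:.2f}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Because the scores take only five values, ties between cancer and benign cases are common, which is why the half-credit for ties matters in this setting.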
DISCUSSION
Prostate cancer is very common in the male population, and mpMRI is widely used as a non-invasive method for diagnosis and active surveillance. However, cancer foci are not easy to visualize on mpMRI because of variable imaging findings and the complicated anatomic structure of the gland. PI-RADS v2 is used in the evaluation of mpMRI and is claimed to provide more objective data than the Likert scale.
In 2015, Muller et al. evaluated 101 cases with mpMRI and fusion imaging-guided biopsy. They found kappa values of 0.46 for PI-RADS and 0.55 for the non-objective test, indicating moderate inter-observer agreement for both scoring systems (9). In the 2013 study of Rosenkrantz et al., inter-observer agreement was higher than in our study. They also showed that experienced observers have higher inter-observer agreement. They found similar kappa values for Likert and PI-RADS in the peripheral zone; in the transitional zone, however, Likert scoring showed higher agreement, in contrast to our results (7). The study of Rosenkrantz et al. used the prior version of PI-RADS.
In 2014, Vache et al. evaluated 115 cases and found that Likert scoring had higher diagnostic performance, with a higher AUC for all observers. In our study, the AUC was also higher for Likert scoring (75.9%) than for PI-RADS v2 scoring (72%), but the difference was not statistically significant (p=0.557). In contrast to our study, the kappa values of Likert scoring were higher than those of PI-RADS in the study of Vache et al. (10).
In the 2016 multicenter study of Rosenkrantz et al. with six observers, PI-RADS v2 scores of 4 or higher were assessed. The kappa value was 0.593 for the peripheral zone and 0.509 for the central gland (11). We found kappa values of 0.561-0.741 in the peripheral zone and 0.244-0.508 in the central gland. The difference between the peripheral zone and the central gland is more evident in our study.
Several studies have also compared the diagnostic power of Likert and PI-RADS. In the study of Grey et al., the AUC of PI-RADS was 89%, which is higher than in our study. This might be due to the higher resolution of the 3 Tesla MRI system or the greater individual experience of the authors (12). Roethke et al. compared the diagnostic performance of Likert and PI-RADS v2 with a 3 Tesla MRI system and found that PI-RADS had higher sensitivity and specificity, in contrast to the study of Vache et al. (10). As opposed to their findings, we found no difference between Likert and PI-RADS v2 in diagnostic performance (13). In 2015, Renard-Penna et al. also found no difference between Likert and PI-RADS in diagnostic performance (8).
To our knowledge, intra-observer agreement has not previously been evaluated in the literature. This study showed that Likert and PI-RADS v2 have similar intra-observer agreement, which is higher in the peripheral zone than in the central gland.
We further evaluated the possible reasons for low agreement in PI-RADS v2 and found that agreement increases with tumor grade. Agreement is better for PI-RADS 5 lesions than for lower-category lesions. This finding indicates that agreement is higher in the group of patients who need aggressive treatment.
Our study has some limitations. First, we used a 1.5 Tesla MRI system without an endorectal coil. Even though the acquisition parameters were appropriate for prostate imaging, the field strength may have affected the results. Second, not all patients included in the study were diagnosed by total prostatectomy, the gold standard. Third, in the evaluation of inter-observer and intra-observer reliability, we dichotomized the PI-RADS v2 and Likert ratings into two groups, <4 and ≥4. Such dichotomization, with a cut-off value of ≥3 or ≥4, has been used to evaluate inter-reader reliability in prior studies (11, 14, 15). According to the ROC analysis results, the best cut-off for our study was ≥4. Alternative statistical approaches, such as Fleiss’ Kappa for multiple observers or weighted Kappa without dichotomization, were also considered; however, Cohen’s Kappa with dichotomization was preferred because prior studies used a similar approach. In addition, weighted Kappa was used in the subgroup analysis.
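The weighted kappa mentioned above, which uses the full 1 to 5 scale instead of a dichotomized score, can be sketched in Python as follows. Linear disagreement weights are assumed here; the weighting scheme used in the subgroup analysis is not stated in the text, so this choice, the function name, and the example ratings are all hypothetical.

```python
def weighted_kappa(a, b, categories=(1, 2, 3, 4, 5)):
    """Linearly weighted kappa for two raters' ordinal scores (assumed weighting)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Joint proportions of (rater A score, rater B score).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 for identical scores, 1 for the two extremes.
        return abs(i - j) / (k - 1)

    d_obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0:
        return 1.0  # both raters gave one and the same score to every case
    return 1 - d_obs / d_exp


# Hypothetical 1-5 scores from two observers (not study data).
reader_a = [5, 4, 2, 3, 4, 1, 5, 2]
reader_b = [4, 4, 2, 4, 3, 1, 5, 3]
print(round(weighted_kappa(reader_a, reader_b), 3))
```

With these weights, a disagreement between scores of 3 and 4 is penalized less than a disagreement between 1 and 5, whereas dichotomization at 4 or higher treats a 3-versus-4 split as complete disagreement and a 1-versus-3 split as complete agreement.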
In conclusion, PI-RADS v2 shows no evident superiority over Likert scoring in either diagnostic accuracy or inter- and intra-observer agreement.
Acknowledgement: None declared.
Conflict of interest: none.
Informed consent: Informed consent was obtained from all individual participants included in the study.
REFERENCES
1. Weinreb JC, Barentsz JO, Choyke PL, Cornud F, Haider MA, Macura KJ, et al. PI-RADS Prostate Imaging – Reporting and Data System: 2015, Version 2. Eur Urol 2016;69:16-40.
2. Lamb BW, Tan WS, Rehman A, Nessa A, Cohen D, O’Neil J, et al. Is Prebiopsy MRI Good Enough to Avoid Prostate Biopsy? A Cohort Study Over a 1-Year Period. Clin Genitourin Cancer 2015;13:512-7.
3. De Visschere PJ, Naesens L, Libbrecht L, Van Praet C, Lumen N, Fonteyne V, et al. What kind of prostate cancers do we miss on multiparametric magnetic resonance imaging? Eur Radiol 2016;26:1098-107.
4. Bomers JG, Barentsz JO. Standardization of multiparametric prostate MR imaging using PI-RADS. Biomed Res Int 2014;2014:431680.
5. Rosenkrantz AB, Kim S, Lim RP, Hindman N, Deng FM, Babb JS, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology 2013;269:482-92.
6. Dickinson L, Ahmed HU, Allen C, Barentsz JO, Carey B, Futterer JJ, et al. Magnetic resonance imaging for the detection, localisation, and characterisation of prostate cancer: recommendations from a European consensus meeting. Eur Urol 2011;59:477-94.
7. Rosenkrantz AB, Lim RP, Haghighi M, Somberg MB, Babb JS, Taneja SS. Comparison of interreader reproducibility of the prostate imaging reporting and data system and likert scales for evaluation of multiparametric prostate MRI. AJR Am J Roentgenol 2013;201:W612-8.
8. Renard-Penna R, Mozer P, Cornud F, Barry-Delongchamps N, Bruguiere E, Portalez D, et al. Prostate Imaging Reporting and Data System and Likert Scoring System: Multiparametric MR Imaging Validation Study to Screen Patients for Initial Biopsy. Radiology 2015;275:458-68.
9. Muller BG, Shih JH, Sankineni S, Marko J, Rais-Bahrami S, George AK, et al. Prostate Cancer: Interobserver Agreement and Accuracy with the Revised Prostate Imaging Reporting and Data System at Multiparametric MR Imaging. Radiology 2015;277:741-50.