Preclinical multimodality phantom design for quality assurance of tumor size measurement

Background
Evaluation of changes in tumor size from images acquired by ultrasound (US), computed tomography (CT) or magnetic resonance imaging (MRI) is a common measure of cancer chemotherapy efficacy. Tumor size measurement based on either the World Health Organization (WHO) criteria or the Response Evaluation Criteria in Solid Tumors (RECIST) is the only imaging biomarker for anti-cancer drug testing presently approved by the United States Food and Drug Administration (FDA). The aim of this paper was to design and test a quality assurance phantom capable of monitoring tumor size changes with multiple preclinical imaging scanners (US, CT and MRI) in order to facilitate preclinical anti-cancer drug testing.
Methods
Three phantoms (Gammex/UTHSCSA Mark 1, Gammex/UTHSCSA Mark 2 and the UTHSCSA multimodality tumor measurement phantom) containing tumor-simulating test objects were designed and constructed. All three phantoms were scanned with US, CT and MRI devices, and the size of the test objects was measured from the resulting images. RECIST, WHO and volume analyses were performed.
Results
The smaller size, simplified design and better test object CT contrast of the UTHSCSA multimodality tumor measurement phantom allowed it to be scanned in preclinical US, CT and MRI scanners, whereas the Mark 1 and Mark 2 phantoms could be scanned in only a limited number of preclinical scanners. For all imaging modalities, RECIST and WHO errors were reduced for the UTHSCSA multimodality tumor measurement phantom (≤ 1.69 ± 0.33%) compared with both the Mark 1 (≤ -7.56 ± 6.52%) and Mark 2 (≤ 5.66 ± 1.41%) phantoms. For the UTHSCSA multimodality tumor measurement phantom, measured tumor volumes were highly correlated with NIST traceable design volumes for US (R² = 1.000, p < 0.0001), CT (R² = 0.9999, p < 0.0001) and MRI (R² = 0.9998, p < 0.0001).
Conclusions
The UTHSCSA multimodality tumor measurement phantom described in this study can potentially be a useful quality assurance tool for verifying radiologic assessment of tumor size change during preclinical anti-cancer therapy testing with multiple imaging modalities.


Background
Highly consistent, reproducible and standardized response criteria are essential for evaluating the efficacy of new anti-cancer drugs in multicenter trials [1]. Tumor size measurement according to the World Health Organization (WHO) criteria or the Response Evaluation Criteria in Solid Tumors (RECIST) has been widely used and is the only imaging biomarker presently approved by the United States Food and Drug Administration (FDA) for drug testing, although functional imaging criteria such as the Positron Emission Tomography (PET) Response Criteria in Solid Tumors (PERCIST) complement anatomic methods in treatment response assessment by adding biological relevance and prognostic information [2-4]. The criteria require either two-dimensional (WHO criteria: sum of the products of the greatest perpendicular dimensions in the transverse plane over all target lesions) or one-dimensional (RECIST: sum of the single longest dimensions in the transverse plane for up to five lesions per organ and up to ten lesions per patient) tumor size measurements [1,4-12]. Three-dimensional radiologic assessment of tumor burden has also been performed using volumetric techniques [13,14].
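As a minimal sketch of the one- and two-dimensional summaries defined above, the Python below computes a RECIST sum of longest diameters and a WHO sum of perpendicular-diameter products. The lesion representation (a hypothetical `(longest, perpendicular)` tuple per lesion, grouped by organ) and the function names are illustrative assumptions; the lesion-count limits stated above (five per organ, ten per patient) are applied to RECIST only.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each target lesion is
# (longest diameter, greatest perpendicular diameter) in mm,
# both measured in the transverse plane, grouped by organ.
Lesion = Tuple[float, float]
LesionsByOrgan = Dict[str, List[Lesion]]

MAX_PER_ORGAN = 5   # RECIST limit on target lesions per organ (as stated above)
MAX_TOTAL = 10      # RECIST limit on target lesions per patient (as stated above)


def recist_sum(lesions: LesionsByOrgan) -> float:
    """One-dimensional RECIST measure: sum of longest diameters (mm)."""
    n_lesions = 0
    for organ, organ_lesions in lesions.items():
        if len(organ_lesions) > MAX_PER_ORGAN:
            raise ValueError(f"{organ}: more than {MAX_PER_ORGAN} target lesions")
        n_lesions += len(organ_lesions)
    if n_lesions > MAX_TOTAL:
        raise ValueError(f"more than {MAX_TOTAL} target lesions in total")
    return sum(longest for organ_lesions in lesions.values()
               for longest, _ in organ_lesions)


def who_sum(lesions: LesionsByOrgan) -> float:
    """Two-dimensional WHO measure: sum of longest x perpendicular products (mm^2)."""
    return sum(longest * perpendicular
               for organ_lesions in lesions.values()
               for longest, perpendicular in organ_lesions)


if __name__ == "__main__":
    # Hypothetical lesion measurements, for illustration only.
    example = {"liver": [(20.0, 15.0), (12.0, 10.0)], "lung": [(8.0, 6.0)]}
    print(recist_sum(example), who_sum(example))
```

Percent change between baseline and follow-up would then be computed on these sums, which is why the two criteria can disagree: the WHO product grows roughly with the square of a lesion's size.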
Several quality assurance (QA) phantoms for anatomic measurement have been developed for assessment with computed tomography (CT) and magnetic resonance imaging (MRI) [15-22]. However, the development of these phantoms has focused predominantly on clinical assessment. Meanwhile, in recent years, there has been an emphasis on improving preclinical anti-cancer drug testing by incorporating longitudinal imaging of tumor models with preclinical scanners specially designed for small rodents [23-25]. A tumor measurement QA phantom for preclinical studies in rodent models could be used to identify and correct biased tumor size measurements made with different imaging modalities in multiple laboratories or institutions [26]. In addition, verifying radiologic assessment of tumor size change with such a QA phantom would allow imaging protocols to be standardized before animal studies, potentially reducing the number of animals required, increasing study efficiency and decreasing cost.
This Technical Advance describes the evolution in design, construction and testing of a multimodality QA phantom for use with preclinical scanners. Initial design attempts modified commercial phantoms available for human testing. By using the results from these early versions, the UTHSCSA multimodality tumor measurement QA phantom was successfully constructed for further quality assurance testing of tumor size in rodent models.

Gammex/UTHSCSA Mark 1 phantom
In 2007, the Gammex 404 GS LE phantom was modified to construct the first-generation tumor measurement phantom, denoted the Gammex/UTHSCSA Mark 1 phantom (Figure 1a and 1b, Table 1). The phantom was composed of four sets of measurement calibration standards: A. Image Caliper: stainless steel wires at 2 mm vertical intervals and 3 mm horizontal intervals; B. Volume: two spheres with volumes of 179.59 and 523.59 mm³; C. Diameter: long cylinders with diameters from 5 to 10 mm in 0.5 mm increments; and D. Diameter Depth Dependence: 2 mm cylinders at depths from 2 to 15 mm.
US, CT and MR images of the phantom were obtained following the imaging protocols listed in Table 2 (Figure 1c-e). The size of each test object was measured three times independently. For US, visual measurements were made using the measurement tool in Vevo 770 v.2.2.3 software (Visualsonics Inc., Toronto, ON, Canada). For MRI, a full width at half maximum (FWHM) method was used in ImageJ software (Version 1.42q, National Institutes of Health, Bethesda, MD). RECIST and WHO analyses were performed according to their definitions. To assess volume accuracy, volumes were calculated with the equation V = (π/6)·a·b·c, where a, b and c are the diameters in three perpendicular dimensions [27].
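The volume equation above can be sketched as follows, together with a signed percent error relative to the design value. This is a minimal illustration, assuming errors are expressed as (measured - design)/design × 100 so that negative values indicate underestimation; the function names are hypothetical. As a consistency check, the Mark 1 design volumes of 179.59 and 523.59 mm³ match spheres of 7 and 10 mm diameter (the two Mark 1 test objects analyzed for volume) to within 0.01 mm³.

```python
import math


def ellipsoid_volume(a: float, b: float, c: float) -> float:
    """V = (pi/6)·a·b·c, with a, b, c the three perpendicular diameters (mm) -> mm^3."""
    return math.pi / 6.0 * a * b * c


def percent_error(measured: float, design: float) -> float:
    """Signed error relative to the design value, in percent (negative = underestimate)."""
    return (measured - design) / design * 100.0


if __name__ == "__main__":
    # Design volumes of the two Mark 1 spheres (7 mm and 10 mm diameter).
    for d in (7.0, 10.0):
        print(d, round(ellipsoid_volume(d, d, d), 2))
    # Hypothetical diameter readings (mm) for the 10 mm sphere.
    v = ellipsoid_volume(10.1, 9.9, 10.0)
    print(round(percent_error(v, ellipsoid_volume(10.0, 10.0, 10.0)), 3))
```

The same `percent_error` applies to RECIST and WHO sums by comparing the measured sum with the sum computed from the design diameters.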

Gammex/UTHSCSA Mark 2 phantom
After testing the Gammex/UTHSCSA Mark 1 phantom in multiple imaging modalities, the phantom was redesigned based on the maximum number of target lesions per organ (five) specified by RECIST and the dimensions necessary for use in preclinical scanners (Figure 2a and 2b, Table 1). The phantom consisted of five tumor-simulating test objects of different diameters serving as measurement calibration standards: A. Diameter: low contrast spheres (2, 4, 7, 10 and 14 mm) and B. Volume: the same low contrast spheres, with volumes from 4.2 to 1436.8 mm³. The test object sizes were chosen for the following reasons: 2 mm is the smallest tumor size in rodent models that can be readily palpated, and 14 mm is the maximum tumor size that can be tolerated without perturbing the physiology of the host animal.
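Because the five test objects are nominally perfect spheres, their design reference values follow directly from the diameters. The sketch below (hypothetical helper names) computes the design volumes, which span about 4.2 to 1436.8 mm³ as stated above, together with the design RECIST sum (sum of diameters) and WHO sum (sum of squared diameters), assuming all five spheres are treated as target lesions of a single "organ".

```python
import math

# Nominal sphere diameters (mm) of the five-object phantom design.
DIAMETERS_MM = (2.0, 4.0, 7.0, 10.0, 14.0)


def sphere_volume(d: float) -> float:
    """Volume (mm^3) of a sphere of diameter d (mm): (pi/6)·d^3."""
    return math.pi / 6.0 * d ** 3


def design_reference(diameters=DIAMETERS_MM):
    """Return design volumes, RECIST sum (mm) and WHO sum (mm^2) for ideal spheres."""
    volumes = [sphere_volume(d) for d in diameters]
    recist = sum(diameters)              # longest diameter of a sphere is d
    who = sum(d * d for d in diameters)  # both perpendicular diameters are d
    return volumes, recist, who


if __name__ == "__main__":
    volumes, recist, who = design_reference()
    for d, v in zip(DIAMETERS_MM, volumes):
        print(f"{d:4.0f} mm sphere: {v:8.1f} mm^3")
    print(f"design RECIST sum: {recist} mm, design WHO sum: {who} mm^2")
```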
US, CT and MR images were acquired (Figure 2c-e, Table 2), and the sizes of the test objects in the US and MR images were measured to calculate RECIST, WHO and volume as described for the Mark 1 phantom.

UTHSCSA multimodality tumor measurement phantom
A new multimodality tumor measurement phantom was constructed to improve on the contrast and geometry of the Gammex/UTHSCSA Mark 1 and 2 phantoms (Figure 3a and 3b, Table 1). The phantom had five test objects of the same sizes (2, 4, 7, 10 and 14 mm) as the Gammex/UTHSCSA Mark 2 phantom but was built with smaller dimensions (length × width × depth of 11.5 cm × 3.8 cm × 2.4 cm) so that it would fit in any preclinical scanner. The phantom was made in house from tissue-mimicking (TM) materials based on methods developed in Dr. Ernest L. Madsen's laboratory at the University of Wisconsin-Madison [28] (see Additional file 1: technical appendix with a detailed description of phantom construction; Additional file 2: Table S1 summarizing the phantom ingredients; Additional file 3: Figure S1 describing silicone mold preparation; Additional file 4: Figure S2 describing silicone mold procedures; and Additional file 5: Figure S3 describing phantom assembly procedures).
US, CT and MR images of the phantom were acquired (Figure 3c-e, Table 2). Test object sizes were measured from the US and MR images as described for the Mark 1 phantom and from the CT images with an FWHM method in ImageJ software, and RECIST, WHO and volume were then calculated. For CT and MRI, contrast (%) was calculated as C = (S_background - S_object)/S_background × 100, where S_background and S_object are the signal intensities of the background and the test object, respectively.
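The FWHM measurement and the contrast equation above can be illustrated with the following sketch. It assumes a 1-D intensity profile drawn across a single test object that is darker than its background (as observed for this phantom), estimates the background from the first and last few samples, and measures the width at the level halfway between background and the object minimum, interpolating linearly between samples. This is one common FWHM convention and not necessarily the exact ImageJ procedure used in this study; the function names and the `n_edge` parameter are hypothetical.

```python
from typing import Sequence


def fwhm_dark_object(profile: Sequence[float], pixel_mm: float, n_edge: int = 3) -> float:
    """Width (mm) of a dark object at half depth in a 1-D line profile.

    Assumes the first and last n_edge samples are background and the
    object forms a single contiguous dip in the profile.
    """
    n = len(profile)
    background = (sum(profile[:n_edge]) + sum(profile[-n_edge:])) / (2 * n_edge)
    i_min = min(range(n), key=lambda k: profile[k])
    if profile[i_min] >= background:
        raise ValueError("no dark object found in profile")
    half = (background + profile[i_min]) / 2.0

    # Left edge: step left until the profile rises back to the half level,
    # then interpolate between samples i (>= half) and i + 1 (< half).
    i = i_min
    while i > 0 and profile[i] < half:
        i -= 1
    if profile[i] < half:
        raise ValueError("left edge not found; profile too short")
    left = i + (profile[i] - half) / (profile[i] - profile[i + 1])

    # Right edge: same search to the right, interpolating between j - 1 and j.
    j = i_min
    while j < n - 1 and profile[j] < half:
        j += 1
    if profile[j] < half:
        raise ValueError("right edge not found; profile too short")
    right = (j - 1) + (half - profile[j - 1]) / (profile[j] - profile[j - 1])

    return (right - left) * pixel_mm


def contrast_percent(s_background: float, s_object: float) -> float:
    """C = (S_background - S_object) / S_background x 100."""
    return (s_background - s_object) / s_background * 100.0


if __name__ == "__main__":
    # Hypothetical profile (arbitrary units) sampled every 0.5 mm.
    profile = [100, 100, 100, 100, 50, 0, 0, 0, 50, 100, 100, 100, 100]
    print(fwhm_dark_object(profile, pixel_mm=0.5), contrast_percent(100.0, 75.0))
```

Interpolating between samples matters for the smallest objects, where a 2 mm sphere may span only a few pixels and rounding to whole pixels would dominate the error.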

Statistical analysis
Linear regression analysis of design volume (NIST traceable gold standard) as a function of measured test object volume was performed for the Gammex/UTHSCSA Mark 2 phantom and the UTHSCSA multimodality tumor measurement phantom using GraphPad Prism software (Version 5.01, GraphPad Software Inc., San Diego, CA). Statistical analysis of RECIST and WHO for all three phantoms, and of volume for the Gammex/UTHSCSA Mark 1 phantom, could not be performed because two data points were insufficient. A p-value < 0.05 was considered statistically significant.
In the US images, distortion of the spheres was evident because lateral resolution was degraded in regions far from the focal depth. Slight reverberation artifacts were observed.
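The linear regression described under Statistical analysis was performed in GraphPad Prism. As a minimal, dependency-free sketch of the same ordinary least-squares fit (not Prism's implementation), the code below returns the slope and intercept with their standard errors and R². It also makes concrete why two data points are insufficient: with n = 2 the line passes exactly through both points and no residual degrees of freedom (n - 2) remain for estimating the standard errors. A p-value would additionally require the t distribution (for example via `scipy.stats.linregress`), which is omitted here.

```python
import math
from typing import NamedTuple, Sequence


class LinearFit(NamedTuple):
    slope: float
    slope_se: float
    intercept: float
    intercept_se: float
    r_squared: float


def linear_fit(x: Sequence[float], y: Sequence[float]) -> LinearFit:
    """Ordinary least-squares fit of y = slope·x + intercept."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 3:
        # With n = 2 the residual degrees of freedom (n - 2) are zero.
        raise ValueError("at least three points are needed for standard errors")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance estimated with n - 2 degrees of freedom.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    residual_var = ss_res / (n - 2)
    return LinearFit(
        slope=slope,
        slope_se=math.sqrt(residual_var / sxx),
        intercept=intercept,
        intercept_se=math.sqrt(residual_var * (1.0 / n + mean_x ** 2 / sxx)),
        r_squared=1.0 - ss_res / ss_tot,
    )
```

For the phantom, `x` and `y` would hold the design and measured volumes of the five test objects (or the reverse, depending on which variable is treated as independent).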

Results
Multimodality images of the Gammex/UTHSCSA Mark 1, Gammex/UTHSCSA Mark 2 and UTHSCSA multimodality tumor measurement phantoms
In the Gammex/UTHSCSA Mark 2 phantom, the design was simplified based on the number of target lesions per organ (five) specified by RECIST, as shown in Figure 2. The UTHSCSA multimodality tumor measurement phantom had the same structure as the Mark 2 phantom, but its geometry and contrast were improved by reducing the size of the phantom and adding contrast agents to the TM materials, as displayed in Figure 3. Tumor-simulating test objects appeared darker than the background in the images from all three modalities, and the contrast between test objects and background (CT: 9.67%; MRI: 25.15%) was sufficient to distinguish the test objects and measure their size. Except for a small reverberation artifact close to the surface in the US images, no artifacts were evident for the UTHSCSA multimodality tumor measurement phantom.
Size measurement in the Gammex/UTHSCSA Mark 1, Gammex/UTHSCSA Mark 2 and UTHSCSA multimodality tumor measurement phantoms
RECIST, WHO and volume analyses for the two spheres in the US and MR images of the Gammex/UTHSCSA Mark 1 phantom are displayed in Table 3. For the Mark 1 phantom, RECIST errors were smaller for both US (1.73 ± 0.44%) and MRI (-2.65 ± 3.74%) than WHO errors (US, -4.75 ± 1.30%; MRI, -7.56 ± 6.52%), and MRI errors were larger than US errors for both RECIST and WHO. In the volume analysis, MRI errors were larger than US errors for both the 7 mm and 10 mm test objects. RECIST, WHO and volume analyses for CT were not performed because of inadequate CT contrast.
Table 4 shows RECIST, WHO and volume analyses for the five test objects in the US and MR images of the Gammex/UTHSCSA Mark 2 phantom. Measurements from CT images were not performed for the same reason as for the Mark 1 phantom (Figure 2d). For US, RECIST errors (5.66 ± 1.41%) were larger than WHO errors (-0.16 ± 1.32%). For MRI, RECIST and WHO errors were small, ranging from 0.39 ± 2.54% for RECIST to -2.05 ± 2.79% for WHO. Volumes calculated from US images had larger errors (range of -5.69 ± 1.59% to 7.29 ± 5.65%) for the smaller test objects (2, 4 and 7 mm), and the errors decreased for the 10 mm (3.99 ± 2.03%) and 14 mm (1.21 ± 0.66%) test objects. Volumes from MR images followed a similar pattern but had much larger errors (range of -21.81 ± 66.60% to 11.86 ± 21.62%) for the smaller test objects (2, 4 and 7 mm). For the Mark 2 phantom, tumor volume measured by US and MRI correlated with design volume (p < 0.0001; Table 4). The best-fit lines for US and MRI versus design volume were y = (1.014 ± 0.009)x - (0.152 ± 6.341) (R² = 0.9998; p < 0.0001) and y = (0.962 ± 0.011)x - (6.665 ± 7.357) (R² = 0.9996; p < 0.0001), respectively.
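Each test object was measured three times independently, and the errors above are reported as mean ± standard deviation. A minimal sketch of that summary follows, assuming the sample standard deviation (n - 1 denominator) was used, which the text does not state; the helper name and the readings are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def error_summary(measurements: Sequence[float], design: float) -> Tuple[float, float]:
    """Mean and sample SD of signed percent errors for repeated measurements."""
    errors = [(m - design) / design * 100.0 for m in measurements]
    return statistics.mean(errors), statistics.stdev(errors)


if __name__ == "__main__":
    # Hypothetical three repeated diameter readings (mm) of a 10 mm sphere.
    mean_err, sd_err = error_summary([10.2, 10.1, 10.3], 10.0)
    print(f"{mean_err:.2f} ± {sd_err:.2f}%")
```

With only three repeats, a single outlying reading inflates the SD considerably, which is one plausible reason the small MRI test objects show SDs larger than their means.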
RECIST, WHO and volume analyses for the UTHSCSA multimodality tumor measurement phantom are displayed in Table 5. Compared with the Mark 2 phantom, RECIST and WHO errors were reduced (range of -1.47 ± 0.25% to 1.69 ± 0.33%) for all three modalities. RECIST errors were smaller than WHO errors for every modality except CT. Volume errors were no larger in magnitude than -2.84 ± 2.49%, except for the 10 mm test object in MRI (-5.34 ± 0.76%) and the smallest (2 mm) test object, whose errors ranged from -18.30 ± 10.65% (CT) to 5.72 ± 0.60% (MRI). For this phantom, US-, CT- and MRI-measured tumor volumes also correlated with design volume (p < 0.0001; Table 5). The best-fit lines of US-, CT- and MRI-measured volume versus design (NIST traceable gold standard) volume were y = (0.980 ± 0.003)x + (2.277 ± 2.261) (R² = 1.000; p < 0.0001), y = (1.011 ± 0.004)x + (0.413 ± 3.052) (R² = 0.9999; p < 0.0001) and y = (0.977 ± 0.008)x - (1.013 ± 5.613) (R² = 0.9998; p < 0.0001), respectively. These results indicate that technical personnel using the phantom could quickly verify, by comparing the slope and intercept values from a simple regression analysis (Table 5), that data from all three modalities are acceptable over the entire range of sizes within error limits set by the study designer.
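The slope-and-intercept check described above could be automated along the following lines. The tolerances are hypothetical placeholders for limits chosen by the study designer, and the function name is illustrative; the example reuses the slopes and intercepts reported above for the UTHSCSA phantom.

```python
def fit_within_limits(slope: float, intercept: float,
                      slope_tol: float, intercept_tol: float) -> bool:
    """True if the fitted line is close enough to the identity line y = x.

    slope_tol: allowed |slope - 1| (e.g. 0.05 for +/-5 %);
    intercept_tol: allowed |intercept| in volume units (mm^3).
    """
    return abs(slope - 1.0) <= slope_tol and abs(intercept) <= intercept_tol


if __name__ == "__main__":
    # Slopes and intercepts reported for the UTHSCSA phantom (Table 5).
    fits = {"US": (0.980, 2.277), "CT": (1.011, 0.413), "MRI": (0.977, -1.013)}
    # Hypothetical acceptance limits: slope within +/-5 %, intercept within +/-5 mm^3.
    for modality, (slope, intercept) in fits.items():
        print(modality, fit_within_limits(slope, intercept, 0.05, 5.0))
```

A stricter variant could also require the confidence interval of the slope to include 1, using the standard errors reported alongside each fit.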

Discussion
Previous QA phantoms constructed for size measurement had various tumor shapes and focused predominantly on measuring test objects in CT and MR images using measurement protocols unique to their institutions [15-17]. This study focused on constructing a phantom with simple spherical test objects, designed around FDA-approved imaging biomarkers (WHO criteria and RECIST), for use with multiple preclinical imaging devices. As summarized in Table 1, the Gammex/UTHSCSA Mark 1 and Mark 2 phantoms were too large to fit into the bore of some preclinical CT and MR scanners. Because certain components of the Mark 1 phantom, such as the image caliper and depth dependence standards, were not required for QA of tumor size measurement, these features were removed from the RECIST-based Mark 2 phantom (Figure 2). The composite aluminum poly film on the Mark 1 phantom surface (Figures 1c and 2c) caused reverberation artifacts in the US images; these were corrected in later phantoms by using a thin composite polyethylene terephthalate/aluminum/linear low density polyethylene (PET/AL/LLDPE) film. In addition, test objects in the US images of the Mark 2 phantom did not appear as perfect spheres, in contrast to those in the MR images (Figure 2c). Spreading of the ultrasound beam in the region deeper than the focal depth distorted the spheres, overestimating their horizontal diameters and underestimating their diameters in depth. The contrast in the CT images of both phantoms was not sufficient for size measurements (Figures 1d and 2d).
In the UTHSCSA multimodality tumor measurement phantom, the size, distortion and contrast problems were solved for images acquired with all three modalities (Figure 3c-e). First, the overall dimensions of the phantom were reduced so that it fits within the bore of all preclinical scanners. Second, the centers of the test objects were placed above (shallower than) the focal depth, which is 10 mm for the 35 MHz transducer, to avoid distortion. Third, barium sulfate was added to provide pronounced CT contrast. As a result, test object measurements improved for the UTHSCSA multimodality tumor measurement phantom: for all imaging modalities, RECIST and WHO errors were reduced (≤ 1.69 ± 0.33%) compared with both the Mark 1 (≤ -7.56 ± 6.52%) and Mark 2 (≤ 5.66 ± 1.41%) phantoms.
RECIST values were more accurate than WHO values for the UTHSCSA multimodality tumor measurement phantom except for CT. This result is consistent with reports that WHO criteria carry a higher risk of measurement error and overestimation of response rates [9]. Volume calculations for the smallest test object (2 mm) in the UTHSCSA multimodality tumor measurement phantom had the largest errors, -15.34 ± 0.04% for US and -18.30 ± 10.65% for CT, and errors decreased for the larger test objects (≤ -2.84 ± 2.49%) except for the 10 mm sphere in MRI (-5.34 ± 0.76%) (Table 5). These results help explain why tumors 2 mm or smaller in preclinical and clinical tumor models cannot be measured with high accuracy.
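One way to see why the smallest spheres are hardest to measure is that sphere volume scales with the cube of diameter, V = (π/6)·d³, so a fixed absolute diameter error δ produces a relative volume error of ((d + δ)/d)³ - 1, or approximately 3δ/d to first order. The sketch below evaluates this for the five design diameters, assuming a hypothetical 0.1 mm diameter error (an illustrative value, not one measured in this study).

```python
DIAMETERS_MM = (2.0, 4.0, 7.0, 10.0, 14.0)


def relative_volume_error(d: float, delta: float) -> float:
    """Exact relative volume error of a sphere whose diameter d is off by delta."""
    return ((d + delta) / d) ** 3 - 1.0


if __name__ == "__main__":
    delta = 0.1  # hypothetical diameter error in mm
    for d in DIAMETERS_MM:
        exact = relative_volume_error(d, delta) * 100.0
        approx = 3.0 * delta / d * 100.0  # first-order estimate 3·delta/d
        print(f"{d:4.0f} mm: {exact:5.2f}% (first-order {approx:5.2f}%)")
```

Under this assumption the same 0.1 mm error costs the 2 mm sphere roughly seven times (the diameter ratio 14/2) the relative volume error of the 14 mm sphere, which is consistent in direction with the size dependence seen in Table 5.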