Semantic Segmentation of Nodules in Thyroid Ultrasound Images Using a Fully Convolutional Neural Network

Posted on Apr 9, 2019

Introduction

Complications in biopsies can lead to hemorrhaging, infection, and damage to nearby tissue or organs. To reduce the frequency of inaccurate readings and false positives that result in unnecessary biopsies, Koios Medical uses image analysis and artificial intelligence algorithms for the early detection and treatment of disease.

In collaboration with Koios Medical, the goal of this project is to perform semantic segmentation of nodules found in thyroid ultrasound images. Semantic segmentation is the process of associating each pixel in an image with a class label. For this project, each pixel is labeled as either nodule or non-nodule.

The Digital Database of Thyroid Ultrasound Images is an open-source database that contains 345 patient cases and 635 images with coordinate locations of nodules. A fully convolutional neural network known as U-Net is constructed in Python using Keras with a TensorFlow backend for fast and precise segmentation of nodules.

Data Pre-Processing

All images have associated XML files containing metadata about the image. A dictionary keyed by image ID is created to map the coordinates of each individual nodule, denoted as marks, to the associated image. A data frame is constructed from the dictionary and used to overlay the coordinates onto the associated images for spot checking. The figure below illustrates coordinates overlaid on an original image.
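The dictionary step can be sketched as follows. This is a minimal illustration that assumes a hypothetical XML layout in which each `<mark>` element holds an `<image>` index and an `<svg>` field containing a JSON list of annotations with `points`. The actual files in the database may be laid out differently, and `SAMPLE_XML` and `parse_marks` are illustrative names, not taken from the project code.

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample; the real XML schema may differ.
SAMPLE_XML = """
<case>
  <number>1</number>
  <mark>
    <image>1</image>
    <svg>[{"points": [{"x": 10, "y": 20}, {"x": 30, "y": 20}, {"x": 30, "y": 40}]}]</svg>
  </mark>
  <mark>
    <image>2</image>
    <svg>[{"points": [{"x": 5, "y": 5}, {"x": 15, "y": 5}, {"x": 15, "y": 25}]}]</svg>
  </mark>
</case>
"""

def parse_marks(xml_text):
    """Map an image id such as '1_2' to a list of nodule outlines."""
    root = ET.fromstring(xml_text.strip())
    case_number = root.findtext("number")
    marks = defaultdict(list)
    for mark in root.iter("mark"):
        image_id = f"{case_number}_{mark.findtext('image')}"
        svg_text = mark.findtext("svg")
        if not svg_text:
            continue  # image without an annotation
        # Each annotation is one nodule outline given as a list of points.
        for annotation in json.loads(svg_text):
            outline = [(p["x"], p["y"]) for p in annotation["points"]]
            marks[image_id].append(outline)
    return dict(marks)

marks_by_image = parse_marks(SAMPLE_XML)
```

Keeping a list of outlines per image ID allows one image to carry more than one nodule, and the resulting dictionary can be turned into a data frame for spot checking.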

A separate image is created with coordinates only. The coordinates are used to create a contour. The image is then binarized by setting pixels within the contour to 255 and all remaining pixels to zero. The newly constructed image is a mask used as a label for model training. The mask for the image above is shown below.
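The binarization step can be illustrated with a small NumPy-only sketch. It fills a closed outline using the even-odd rule, testing pixel centers; the helper name `polygon_mask` and the rectangular outline are hypothetical. In practice a library routine such as OpenCV's `cv2.fillPoly` would likely do the same job, and this is not necessarily how the project builds its masks.

```python
import numpy as np

def polygon_mask(points, height, width):
    """Return a uint8 mask: 255 inside the polygon, 0 elsewhere (even-odd rule)."""
    ys, xs = np.mgrid[0:height, 0:width]
    px = xs + 0.5  # test pixel centers rather than corners
    py = ys + 0.5
    inside = np.zeros((height, width), dtype=bool)
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        if y0 == y1:
            continue  # horizontal edges never cross a horizontal ray
        # Does a ray cast to the right from the pixel center cross this edge?
        straddles = (y0 > py) != (y1 > py)
        x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        inside ^= straddles & (px < x_cross)
    return np.where(inside, 255, 0).astype(np.uint8)

outline = [(2, 2), (8, 2), (8, 6), (2, 6)]  # hypothetical rectangular "nodule"
mask = polygon_mask(outline, height=10, width=12)
```

Each pixel toggles between outside and inside every time its ray crosses an edge, so pixels with an odd number of crossings end up set to 255.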

Neural Network Construction

The U-Net is a convolutional neural network that was developed for biomedical image segmentation at the Computer Science Department of the University of Freiburg, Germany. The network is based on the fully convolutional network and its architecture was modified and extended to work with fewer training images and to yield more precise segmentations.  The architecture is illustrated below.

Below is a summary of the modified architecture used. Training was performed with various numbers of filters, with and without batch normalization, with and without dropout, and with and without data augmentation.

Because the sample of ultrasound images is small, data augmentation is used to create an additional 1500 images. A series of random rotations within ±20 degrees and horizontal flips is performed. The same random transformation is applied to an image and its mask label so that the two stay aligned. A sample of an augmented image and its associated mask is shown below.
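The key point is that the image and its mask must receive identical random parameters. The following NumPy-only sketch illustrates this with a nearest-neighbor rotation and a horizontal flip. The helper names `rotate_nearest` and `augment_pair`, the 50% flip probability, and the zero fill at the borders are assumptions made for illustration; the project itself may rely on a library augmentation utility instead.

```python
import numpy as np

def rotate_nearest(array, angle_deg):
    """Rotate a 2-D array about its center with nearest-neighbor sampling and zero fill."""
    h, w = array.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find the source pixel.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(array)
    out[valid] = array[src_y[valid], src_x[valid]]
    return out

def augment_pair(image, mask, rng, max_angle=20.0):
    """Apply one random rotation and flip to an image and its mask together."""
    angle = rng.uniform(-max_angle, max_angle)  # drawn once, used for both
    flip = rng.random() < 0.5                   # assumed 50% flip probability
    image_aug = rotate_nearest(image, angle)
    mask_aug = rotate_nearest(mask, angle)
    if flip:
        image_aug = image_aug[:, ::-1]  # horizontal flip
        mask_aug = mask_aug[:, ::-1]
    return image_aug, mask_aug

rng = np.random.default_rng(0)
image = rng.integers(1, 256, size=(64, 64)).astype(np.uint8)
mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:40, 25:45] = 255
image_aug, mask_aug = augment_pair(image, mask, rng)
```

Nearest-neighbor sampling keeps the mask strictly binary after rotation. A smoother interpolation could be used for the image itself, as long as the angle and flip are shared.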

Results

The best results are achieved using data augmentation with batch normalization and without dropout. The best model had a mean Intersection over Union (IOU) of ~0.86 on the test set.

The binary cross-entropy loss continues to decrease through 30 epochs of training. Below are five random test set samples. The leftmost plot is the original ultrasound image with coordinate annotation. The middle left plot is the mask created from the annotations. The middle right plot is the predicted mask as determined from the trained neural network. The rightmost plot is the binarized prediction from the neural network. The associated IOU is found in the upper left corner of each image set.
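For reference, the IOU of a single prediction can be computed as in the sketch below. The 0.5 binarization threshold and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; they are not stated in this post.

```python
import numpy as np

def binarize(prob_map, threshold=0.5):
    """Turn per-pixel probabilities into a 0/1 mask (threshold is an assumption)."""
    return (prob_map >= threshold).astype(np.uint8)

def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(np.logical_and(pred, true).sum() / union)

true_mask = np.zeros((4, 4), dtype=np.uint8)
true_mask[1:3, 1:3] = 1        # 4 nodule pixels
prob_map = np.zeros((4, 4))
prob_map[1:3, 1:4] = 0.9       # 6 predicted pixels, 4 of which overlap the label
score = iou(binarize(prob_map), true_mask)  # 4 / 6
```

The mean IOU is then the average of this score over all test images.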

Future Work

The next stage is to explore an end-to-end function that performs both segmentation and classification, writing cropped nodule images to an output folder ordered by their probability of malignancy.

All data and code can be found at:
https://github.com/Joseph-C-Fritch/unet_segmentation


About Author

Joseph C. Fritch

Data Scientist and Control Systems Engineer with 5 years experience in the energy analysis and building automation space. Interests include machine learning and its applications in controlling dynamic systems.