A Comparative Data Analysis on Lab vs. Mined Diamonds

Posted on Feb 22, 2021
The skills I demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.

Thank you for taking the time to read my research! Please feel free to use the link below to explore my code on GitHub.




If I were to hand you two diamonds, one created in a lab and one mined from the earth, would you be able to distinguish one from the other? Would the type of diamond have any impact on your engagement?

Lab diamonds are man-made, created by mimicking the extreme heat and pressure under which diamonds form. They are identical to mined diamonds in every way, down to their chemical structure. Lab diamonds have a more attractive price point, and buyers can feel guilt-free about not contributing to the humanitarian and environmental concerns associated with the diamond mining industry.

According to a 2018 report by the Antwerp World Diamond Centre, this segment of the market is growing by 15% to 20% annually.


“... nearly 70% of millennials are considering buying a lab grown alternative.”

Credit: The MVI Marketing LLC


In light of the growing trend, I wanted to do a comparative analysis of lab vs. mined diamonds. The main goal was to provide trends and insights to inform prospective buyers of lab diamonds for engagement rings.

Going into this study, I assumed that lab diamonds would be cheaper, but I wanted to better understand the size of the price difference between lab and mined diamonds. I also wanted to study the relationships between diamond characteristics and their effects on price.

Data Preparation

Lab diamond data was gathered from the websites of two retailers, Clean Origin and MiaDonna, using Selenium to scrape over 10,000 diamonds.

For comparison, the data on 54,000 mined diamonds was sourced from the “diamonds” dataset in the ggplot2 package. All of the diamonds in this dataset were round-cut. For consistency, I filtered the lab diamonds from Clean Origin and MiaDonna to the same round cut. The dataset also included the price of the diamonds in USD, as well as the “Four C’s” of diamond analysis: Carat, Cut, Clarity, and Color.

Carat - weight of the diamond

While the average diamond used for engagement is between 1.08 and 1.2 carats, I set a broader range of 0.5 to 2.5 carats to cover a larger base of potential buyers.

Cut - quality of brilliance on a GIA rating scale

From best to worst: Ideal, Excellent, Very Good, Good, Fair, Poor

Clarity - assessment of imperfections

From best to worst: IF, VVS1, VVS2, VS1, VS2, SI1, SI2, I1

Color - hue of the diamond

From colorless to yellow: D, E, F, G, H, I, J, K
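The filters and rating scales above could be represented along the following lines. This is only a minimal Python sketch, not the code used for the study: the field names (`shape`, `carat`, `clarity`) and the sample rows are hypothetical, while the scale orderings are copied from the lists above.

```python
# Rating scales from best to worst, as listed above.
CUT_SCALE = ["Ideal", "Excellent", "Very Good", "Good", "Fair", "Poor"]
CLARITY_SCALE = ["IF", "VVS1", "VVS2", "VS1", "VS2", "SI1", "SI2", "I1"]
COLOR_SCALE = ["D", "E", "F", "G", "H", "I", "J", "K"]

MIN_CARAT, MAX_CARAT = 0.5, 2.5  # carat range used in the study


def rank(scale, grade):
    """Return 0 for the best grade; the number grows as quality drops."""
    return scale.index(grade)


def in_scope(row):
    """Keep round-cut stones whose weight falls inside the carat range."""
    return row["shape"].lower() == "round" and MIN_CARAT <= row["carat"] <= MAX_CARAT


# Hypothetical rows for illustration only (not taken from the scraped data).
rows = [
    {"shape": "Round", "carat": 1.10, "clarity": "VS1"},
    {"shape": "Oval", "carat": 1.00, "clarity": "IF"},
    {"shape": "Round", "carat": 0.30, "clarity": "SI2"},
    {"shape": "Round", "carat": 2.50, "clarity": "VVS2"},
]

# Filter first, then order the remaining stones from best to worst clarity.
kept = sorted(
    (r for r in rows if in_scope(r)),
    key=lambda r: rank(CLARITY_SCALE, r["clarity"]),
)
print([r["clarity"] for r in kept])
```

Treating each grade as a position on an ordered scale keeps later comparisons (for example, "VVS2 to VS1") in the right order instead of alphabetical order.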

One caveat to consider is that the year in which the diamonds dataset from ggplot2 was created was unavailable. Therefore, I am unable to account for any possible inflation and/or other variance in price compared to the lab diamond prices from this year.

Data Analysis

Going into my analysis, I assumed that the size of the diamond would have a significant effect on its price. Pearson’s correlation between carat and price confirmed a strong relationship, with coefficients of 0.89 for mined diamonds and 0.82 for lab diamonds. Notably, as carat increased, price rose at a faster rate for mined diamonds. This suggests that the larger the diamond you buy, the greater the perceived discount of the lab diamond.
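For readers who want to see what the correlation step involves, here is a small standard-library sketch of Pearson’s coefficient. The carat and price values are made up for illustration and will not reproduce the 0.89 and 0.82 figures from the real data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical carat weights and prices (USD), for illustration only.
carats = [0.5, 1.0, 1.5, 2.0]
prices = [1000, 3000, 6000, 10000]

r = pearson(carats, prices)
print(round(r, 3))
```

A coefficient near 1 means price rises almost in lockstep with carat. It does not say how steeply price rises, which is why the slope comparison between lab and mined diamonds is a separate observation.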


Because carat had such a strong influence on price, I did not want it to bias the analysis of how the diamond’s other characteristics affect price. To accomplish this, I created a new measure, price-per-carat, to track pricing across the cut, clarity, and color characteristics of the diamonds. In comparing price-per-carat, the lab diamonds’ mean and median were both around 50% lower than those of the mined diamonds. In other words, per carat, the lab diamonds in this dataset cost roughly half as much as the mined diamonds.
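The price-per-carat measure is simple to compute. The sketch below uses Python’s `statistics` module on a handful of invented (price, carat) pairs; the numbers are placeholders chosen for easy arithmetic, not results from the study.

```python
from statistics import mean, median


def price_per_carat(price, carat):
    """Normalize a diamond's price by its weight."""
    if carat <= 0:
        raise ValueError("carat must be positive")
    return price / carat


# Hypothetical (price in USD, carat) pairs, for illustration only.
lab = [(1300, 1.0), (2700, 1.5), (4000, 2.0)]
mined = [(2400, 1.0), (5400, 1.5), (8400, 2.0)]

lab_ppc = [price_per_carat(p, c) for p, c in lab]
mined_ppc = [price_per_carat(p, c) for p, c in mined]

# Relative discount of lab vs. mined, measured on the mean and the median.
mean_discount = 1 - mean(lab_ppc) / mean(mined_ppc)
median_discount = 1 - median(lab_ppc) / median(mined_ppc)
print(f"mean: {mean_discount:.0%}, median: {median_discount:.0%}")
```

Reporting both the mean and the median is a useful check: if a few very expensive stones were skewing the average, the two discounts would diverge.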


Data on Cut

Across the 5 measures of a diamond’s cut, the graph showed that cut did not have a significant effect on the price or price-per-carat of lab diamonds. The sharp price drop for lab diamonds with a “Fair” rating comes from a single observation, which is not enough to indicate a meaningful trend.


One possible explanation lies in the proportion of lab diamonds by cut rating: 92% of lab diamonds in the dataset had either Ideal or Excellent cut ratings. It is possible that the man-made process regularly produces better-cut diamonds, leaving little variation in cut and giving it a minimal impact on the price of the diamond.
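A proportion like this can be tallied with a simple counter. The following sketch uses a made-up list of ten cut grades, so the resulting share is illustrative and not the 92% reported above.

```python
from collections import Counter


def share_by_grade(grades):
    """Return each grade's share of the total number of observations."""
    if not grades:
        raise ValueError("no observations")
    counts = Counter(grades)
    total = len(grades)
    return {grade: count / total for grade, count in counts.items()}


# Hypothetical cut grades for ten lab diamonds, for illustration only.
cuts = ["Ideal"] * 6 + ["Excellent"] * 3 + ["Very Good"]

shares = share_by_grade(cuts)
top_two = shares.get("Ideal", 0) + shares.get("Excellent", 0)
print(f"Ideal or Excellent: {top_two:.0%}")
```

Checking the counts per grade also flags thin categories, such as a grade with a single observation, before reading too much into their average price.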

Data on Clarity

Across the measures of a diamond’s clarity, the trend was fairly consistent: as clarity worsens, the price-per-carat decreases. For mined diamonds, however, the size of each price drop depended on whether the step stayed within a clarity category or crossed into a new one (e.g., VVS1 to VVS2 had a smaller decline than VVS2 to VS1, and VS1 to VS2 had a smaller decline than VS2 to SI1). For lab diamonds, the price-per-carat decline was sharpest from IF to VVS1 and fairly consistent thereafter.
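One way to compare these step-by-step declines is to compute the relative drop in median price-per-carat between each pair of adjacent clarity grades. The sketch below assumes the medians have already been computed; the `median_ppc` values are invented for illustration.

```python
# Clarity grades from best to worst, as defined earlier in the post.
CLARITY_SCALE = ["IF", "VVS1", "VVS2", "VS1", "VS2", "SI1", "SI2", "I1"]


def step_declines(median_ppc):
    """Relative drop in price-per-carat between adjacent clarity grades."""
    present = [g for g in CLARITY_SCALE if g in median_ppc]
    steps = {}
    for better, worse in zip(present, present[1:]):
        drop = (median_ppc[better] - median_ppc[worse]) / median_ppc[better]
        steps[f"{better}->{worse}"] = drop
    return steps


# Hypothetical median price-per-carat (USD) by grade, for illustration only.
median_ppc = {"IF": 5000, "VVS1": 4000, "VVS2": 3800, "VS1": 3400}

steps = step_declines(median_ppc)
for label, drop in steps.items():
    print(f"{label}: {drop:.1%}")

sharpest = max(steps, key=steps.get)
```

Expressing each step as a percentage of the better grade’s price makes drops comparable even though the absolute prices differ between lab and mined diamonds.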


Data on Color

Across the measures of a diamond’s color, I had expected only a weak relationship with price. My assumption was that a diamond’s hue would not matter much, because a yellow tint is not very noticeable in a gold setting, where the warm tones complement any yellow hue in the diamond.

As the graph shows, the price of mined diamonds actually increased as the color worsened. My hypothesis was that the diamonds’ carat was driving this price increase.

As shown in the price-per-carat comparison, this new measure normalizes the data, minimizing carat’s influence on price. This graph confirmed my hypothesis, as the price-per-carat holds up relatively well across the D-J ratings for mined diamonds. For lab diamonds, the price-per-carat barely changes across the D, E, and F ratings before beginning a downtrend that accelerates for the I, J, and K ratings.
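The effect of carat on the color comparison can be shown with a toy example. In the invented data below, the stones with the worse color grade happen to be larger, so their average price is higher even though their price-per-carat is lower. The grades, weights, and prices are hypothetical and only demonstrate the mechanism.

```python
from statistics import mean

# Hypothetical (color, carat, price in USD) rows, for illustration only.
stones = [
    ("D", 0.6, 3000),
    ("D", 0.8, 4000),
    ("J", 1.5, 6750),
    ("J", 2.0, 9000),
]


def summarize(rows, color):
    """Mean price and mean price-per-carat for one color grade."""
    subset = [(carat, price) for c, carat, price in rows if c == color]
    if not subset:
        raise ValueError(f"no stones with color {color}")
    avg_price = mean(price for _, price in subset)
    avg_ppc = mean(price / carat for carat, price in subset)
    return avg_price, avg_ppc


d_price, d_ppc = summarize(stones, "D")
j_price, j_ppc = summarize(stones, "J")

print(f"D: mean price {d_price:.0f}, mean price-per-carat {d_ppc:.0f}")
print(f"J: mean price {j_price:.0f}, mean price-per-carat {j_ppc:.0f}")
```

Raw price ranks the J stones above the D stones, while price-per-carat reverses the order, which is the pattern that dividing by carat is meant to expose.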


From my comparative analysis, I can generalize:

  • On average, buyers could be paying 50% less for lab diamonds than for mined diamonds
  • Carat had the biggest impact on price, followed by clarity, for both types of diamonds
  • Color does not have a big impact on price among the higher D-F ratings, but the price decline accelerates as the color rating decreases
  • Cut does not have a significant impact on the price of lab diamonds, possibly due to the high availability of Ideal/Excellent cut diamonds

Future Work

I would like to scrape more websites of lab diamond retailers to increase the number of observations, as well as perform a comparative analysis on the inventory of lab diamonds between retailers.

About Author

David Kim

Prior to enrolling in the NYCDSA bootcamp, I worked as an Operations Development Manager for a multinational hospitality brand. I used my skills in data analysis to help gather insight on the business and translate my findings to...