A Comparative Analysis on Lab vs. Mined Diamonds

Posted on Feb 22, 2021

Thank you for taking the time to read my research! Please feel free to use the link below to explore my code on GitHub.




If I were to hand you two diamonds, one created in a lab and one mined from the earth, would you be able to distinguish one from the other? Would the type of diamond have any impact on your engagement?

Lab diamonds are man-made, created by mimicking the extreme heat and pressure under which diamonds form naturally. They are identical to mined diamonds in every way, down to their chemical structure. Lab diamonds also have a more attractive price point, and buyers can feel guilt-free knowing they are not contributing to the humanitarian and environmental concerns associated with the diamond mining industry.

According to a 2018 report by the Antwerp World Diamond Centre, this segment of the market is growing by 15% to 20% annually.

“... nearly 70% of millennials are considering buying a lab grown alternative.”

Credit: MVI Marketing LLC


In light of the growing trend, I wanted to do a comparative analysis of lab vs. mined diamonds. The main goal was to provide trends and insights to inform prospective buyers of lab diamonds for engagement rings.

Going into this study, I assumed lab diamonds would be cheaper, but I wanted to better understand the size of the price difference between lab and mined diamonds. I also wanted to study how each diamond characteristic relates to price.

Data Preparation

I used Selenium to scrape data on over 10,000 lab diamonds from the websites of two retailers, Clean Origin and MiaDonna.

For comparison, data on 54,000 mined diamonds was sourced from the “diamonds” dataset in the ggplot2 package. All of the diamonds in this dataset were round-cut. For consistency, I filtered the lab diamonds from Clean Origin and MiaDonna down to round-cut stones as well. The dataset also included the price of the diamonds in USD, as well as the “Four C’s” of diamond analysis: Carat, Cut, Clarity, and Color.

Carat - weight of the diamond

While the average diamond used for engagement rings is between 1.08 and 1.2 carats, I set a broader range of 0.5 to 2.5 carats so the analysis would be relevant to a larger base of potential buyers.

Cut - quality of brilliance on a GIA rating scale

From best to worst: Ideal, Excellent, Very Good, Good, Fair, Poor

Clarity - assessment of imperfections

From best to worst: IF, VVS1, VVS2, VS1, VS2, SI1, SI2, I1

Color - hue of the diamond

From colorless to yellow: D, E, F, G, H, I, J, K

One caveat to consider is that the year in which the ggplot2 diamonds dataset was compiled was unavailable. I am therefore unable to account for inflation or any other price changes between that dataset and the lab diamond prices from this year.
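The filtering and grade scales described above can be sketched in a few lines of Python. This is only an illustration, not the code from the original analysis: the field names (`shape`, `carat`, `clarity`, and so on), the helper functions, and the sample rows are all hypothetical.

```python
# Grade scales as listed in this post, ordered from best to worst.
CUT_SCALE = ["Ideal", "Excellent", "Very Good", "Good", "Fair", "Poor"]
CLARITY_SCALE = ["IF", "VVS1", "VVS2", "VS1", "VS2", "SI1", "SI2", "I1"]
COLOR_SCALE = ["D", "E", "F", "G", "H", "I", "J", "K"]


def keep_comparable(rows, min_carat=0.5, max_carat=2.5):
    """Keep round-cut diamonds whose carat falls inside the study range."""
    return [
        row for row in rows
        if row["shape"].strip().lower() == "round"
        and min_carat <= row["carat"] <= max_carat
    ]


def grade_rank(scale, grade):
    """Return 0 for the best grade on a scale, 1 for the next best, and so on."""
    return scale.index(grade)


# Made-up rows for illustration only; they are not taken from the scraped data.
scraped = [
    {"shape": "Round", "carat": 1.10, "clarity": "VS1", "price": 2400},
    {"shape": "Oval", "carat": 1.30, "clarity": "IF", "price": 3100},    # wrong shape
    {"shape": "round", "carat": 0.30, "clarity": "SI1", "price": 450},   # too small
    {"shape": "Round", "carat": 2.00, "clarity": "VVS2", "price": 6100},
]

comparable = keep_comparable(scraped)
# Order the remaining diamonds from best to worst clarity.
comparable.sort(key=lambda row: grade_rank(CLARITY_SCALE, row["clarity"]))
```

Treating the grades as ordered scales rather than plain labels is what lets later comparisons run "from best to worst" instead of alphabetically.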


Going into my analysis, I assumed that the size of the diamond would have a significant effect on its price. Using Pearson’s correlation, I was able to confirm a strong correlation between carat and price, with coefficients of 0.89 for mined diamonds and 0.82 for lab diamonds. It was noteworthy that as carat increased, price increased at a faster rate for mined diamonds. This suggests that the larger the diamond you buy, the greater the perceived discount of the lab diamond.
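For readers who want to see what these two measurements look like in code, here is a minimal sketch. It computes Pearson's correlation by hand and uses a least-squares slope as one possible way to compare how fast price rises with carat; the slope is my choice for illustration, not necessarily the method behind the graph. The carat and price values are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y over the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)


def ols_slope(xs, ys):
    """Least-squares slope: extra dollars of price per extra carat."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return co_variation / sum((x - mean_x) ** 2 for x in xs)


# Made-up carat/price pairs, for illustration only.
carats = [0.5, 1.0, 1.5, 2.0, 2.5]
mined_prices = [1500, 5000, 9500, 15000, 21000]
lab_prices = [800, 2400, 4300, 6500, 9000]

print("mined r:", round(pearson_r(carats, mined_prices), 3))
print("lab r:  ", round(pearson_r(carats, lab_prices), 3))
print("mined slope:", ols_slope(carats, mined_prices))
print("lab slope:  ", ols_slope(carats, lab_prices))
```

A steeper slope for mined diamonds means the dollar gap between the two types widens as carat grows, which is the pattern described above.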

Because carat had such a strong influence on price, I did not want it to bias the analysis of the diamond’s other characteristics. To control for it, I created a new measure, price-per-carat, to track pricing across the cut, clarity, and color of the diamonds. The mean and median price-per-carat of lab diamonds were both around 50% lower than those of mined diamonds. In other words, per carat, the lab diamonds in this dataset cost about half as much as the mined diamonds.
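The price-per-carat measure is simply price divided by carat. The sketch below shows the calculation and the mean/median comparison on a handful of made-up diamonds; the `price` and `carat` field names and the `relative_discount` helper are assumptions for illustration.

```python
from statistics import mean, median


def price_per_carat(rows):
    """Divide each diamond's price by its carat weight."""
    return [row["price"] / row["carat"] for row in rows]


def relative_discount(lab_value, mined_value):
    """Fraction by which the lab figure sits below the mined figure."""
    return 1 - lab_value / mined_value


# Made-up diamonds, for illustration only.
mined = [
    {"carat": 1.0, "price": 5000},
    {"carat": 2.0, "price": 12000},
    {"carat": 0.5, "price": 2000},
]
lab = [
    {"carat": 1.0, "price": 2500},
    {"carat": 2.0, "price": 6000},
    {"carat": 0.5, "price": 1000},
]

mined_ppc = price_per_carat(mined)
lab_ppc = price_per_carat(lab)

mean_discount = relative_discount(mean(lab_ppc), mean(mined_ppc))
median_discount = relative_discount(median(lab_ppc), median(mined_ppc))
print(f"mean discount: {mean_discount:.0%}, median discount: {median_discount:.0%}")
```

Reporting both the mean and the median is a useful check: if a few very expensive stones were skewing the mean, the two discounts would diverge.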


Across the 5 measures of a diamond’s cut, the graph showed that cut did not have a significant effect on the price or price-per-carat of lab diamonds. The sharp drop in price for lab diamonds at the “Fair” rating came from a single observation, which is not enough to indicate a meaningful trend.

One possible explanation lies in the proportion of lab diamonds by cut rating: 92% of lab diamonds in the dataset had either an Ideal or Excellent cut rating. It is possible that the man-made process regularly produces better-cut diamonds, which would reduce the variation in cut and leave it with minimal impact on the price of the diamond.
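Both checks discussed here, the share of diamonds per cut rating and the ratings with too few observations to trust, are easy to compute. The following is a rough sketch on made-up ratings; the `min_count` threshold is an arbitrary assumption rather than a value used in the analysis.

```python
from collections import Counter


def grade_shares(grades):
    """Map each grade to its share of all observations (shares sum to 1)."""
    counts = Counter(grades)
    total = sum(counts.values())
    return {grade: count / total for grade, count in counts.items()}


def thin_grades(grades, min_count):
    """List grades with fewer than min_count observations."""
    return [grade for grade, count in Counter(grades).items() if count < min_count]


# Made-up cut ratings, for illustration only.
lab_cuts = ["Ideal"] * 6 + ["Excellent"] * 3 + ["Very Good"]

shares = grade_shares(lab_cuts)
top_two = shares.get("Ideal", 0) + shares.get("Excellent", 0)
print(f"Ideal or Excellent: {top_two:.0%}")
print("too few observations:", thin_grades(lab_cuts, min_count=2))
```

Flagging thin categories before plotting helps avoid reading a trend into a single data point, as happened with the lone “Fair” lab diamond.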


Across the measures of a diamond’s clarity, the trend was fairly consistent: as clarity worsens, the price-per-carat decreases. For mined diamonds, however, the steepness of the decline depended on the clarity subcategory (e.g., the drop from VVS1 to VVS2 was smaller than the drop from VVS2 to VS1, and the drop from VS1 to VS2 was smaller than the drop from VS2 to SI1). For lab diamonds, the price-per-carat decline was sharpest from IF to VVS1 and fairly consistent thereafter.
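One way to quantify the step-to-step declines described above is to take the median price-per-carat for each clarity grade and compute the percentage change between neighboring grades. The sketch below assumes hypothetical `clarity`, `price`, and `carat` fields and uses made-up numbers; the median is my choice of summary and may differ from what the graph plots.

```python
from statistics import median

# Clarity scale from this post, ordered from best to worst.
CLARITY_SCALE = ["IF", "VVS1", "VVS2", "VS1", "VS2", "SI1", "SI2", "I1"]


def median_ppc_by_grade(rows, key, scale):
    """Median price-per-carat per grade, in scale order, skipping empty grades."""
    medians = {}
    for grade in scale:
        values = [row["price"] / row["carat"] for row in rows if row[key] == grade]
        if values:
            medians[grade] = median(values)
    return medians


def step_changes(medians):
    """Percentage change from each grade to the next one down the scale."""
    grades = list(medians)
    return {
        f"{better} -> {worse}": (medians[worse] - medians[better]) / medians[better] * 100
        for better, worse in zip(grades, grades[1:])
    }


# Made-up lab diamonds, for illustration only.
lab = [
    {"clarity": "VVS2", "carat": 1.0, "price": 2700},
    {"clarity": "IF", "carat": 1.0, "price": 4000},
    {"clarity": "VVS1", "carat": 1.0, "price": 3000},
]

medians = median_ppc_by_grade(lab, "clarity", CLARITY_SCALE)
for step, change in step_changes(medians).items():
    print(f"{step}: {change:.1f}%")
```

Comparing the size of each step, rather than only the overall direction, is what reveals patterns such as a sharper drop between IF and VVS1 than between later grades.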


Across the measures of a diamond’s color, I had expected only a weak relationship with price. My assumption was that a diamond’s hue would not matter much because a yellow tint would not be very noticeable on a gold ring, as the warm tones complement any yellow hue in the diamond.

As the graph shows, the price of mined diamonds actually increased as the color worsened. My hypothesis was that this increase was driven by the diamonds’ carat rather than by color itself.

As shown in the price-per-carat comparison, the new measure normalized the data, minimizing carat’s influence on price. This graph supported my hypothesis, as the price-per-carat of mined diamonds holds up relatively well across the D-J ratings. For lab diamonds, the price-per-carat barely changes across the D, E, and F ratings before beginning a downtrend that accelerates through the I, J, and K ratings.
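The effect described here, where raw price rises with worse color only because those diamonds are larger, can be reproduced with a toy example. The rows below are made up so that the worse color grade contains bigger stones; the field names and the `mean_by_color` helper are hypothetical.

```python
from statistics import mean


def mean_by_color(rows, measure):
    """Average a per-diamond measure within each color grade (D first)."""
    groups = {}
    for row in rows:
        groups.setdefault(row["color"], []).append(measure(row))
    # Alphabetical order happens to match the D-to-K color scale.
    return {color: mean(values) for color, values in sorted(groups.items())}


# Made-up mined diamonds in which the worse color grade has larger stones.
rows = [
    {"color": "D", "carat": 0.8, "price": 4000},
    {"color": "D", "carat": 1.0, "price": 5000},
    {"color": "J", "carat": 1.6, "price": 7200},
    {"color": "J", "carat": 2.0, "price": 9000},
]

mean_carat = mean_by_color(rows, lambda row: row["carat"])
mean_price = mean_by_color(rows, lambda row: row["price"])
mean_ppc = mean_by_color(rows, lambda row: row["price"] / row["carat"])

print("mean carat:          ", mean_carat)
print("mean price:          ", mean_price)
print("mean price-per-carat:", mean_ppc)
```

In this toy data the J diamonds cost more in total but less per carat than the D diamonds, which is the kind of reversal that dividing by carat is meant to expose.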


From my comparative analysis, I can generalize:

  • On average, buyers could be paying 50% less per carat for lab diamonds than for mined diamonds
  • Carat had the biggest impact on price, followed by clarity, for both types of diamonds
  • Color did not have a big impact on price across the higher D-F ratings, but the price decline accelerated as the color rating worsened
  • Cut did not have a significant impact on the price of lab diamonds, possibly due to the high availability of Ideal/Excellent cut diamonds

Future Work

I would like to scrape more websites of lab diamond retailers to increase the number of observations, as well as perform a comparative analysis on the inventory of lab diamonds between retailers.

About Author

David Kim

Prior to enrolling in the NYCDSA bootcamp, I worked as an Operations Development Manager for a multinational hospitality brand. I used my skills in data analysis to help gather insight on the business and translate my findings to...