A Comparative Data Analysis on Lab vs. Mined Diamonds
The skills I demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Thank you for taking the time to read my research! Please feel free to use the link below to explore my code on GitHub.
GitHub
Introduction
If I were to hand you two diamonds, one created in a lab and one mined from the earth, would you be able to tell them apart? Would the type of diamond affect your choice of engagement ring?
Lab diamonds are man-made, created by recreating the extreme heat and pressure under which natural diamonds form. They are identical to mined diamonds in every way, down to their chemical structure. Lab diamonds also come at a more attractive price point, and buyers can feel good knowing they are not contributing to the humanitarian and environmental concerns associated with the diamond mining industry.
According to a 2018 report by the Antwerp World Diamond Centre, this segment of the market is growing by 15% to 20% annually.
“... nearly 70% of millennials are considering buying a lab grown alternative.”
Credit: The MVI Marketing LLC
Goal
In light of the growing trend, I wanted to do a comparative analysis of lab vs. mined diamonds. The main goal was to provide trends and insights to inform prospective buyers of lab diamonds for engagement rings.
Going into this study, I assumed lab diamonds would be cheaper, but I wanted to better understand the size of the price difference between lab and mined diamonds. I also wanted to study how each diamond characteristic relates to the price of the diamond.
Data Preparation
Lab diamond data was gathered from the websites of two retailers, Clean Origin and MiaDonna, using Selenium to scrape over 10,000 diamonds.
For comparison, data on 54,000 mined diamonds was sourced from the “diamonds” dataset included in the ggplot2 R package. All of the diamonds in this dataset are round-cut, so for consistency I filtered the lab diamonds from Clean Origin and MiaDonna to round-cut stones as well. The data also included the price of each diamond in USD, as well as the “Four C’s” of diamond analysis: Carat, Cut, Clarity, and Color.
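To illustrate the consistency filter described above, here is a minimal Python sketch that merges scraped records from both retailers and keeps only round-cut stones. It assumes each scraped diamond was stored as a dict with a `shape` field; the field names, the `combine_round_cut` helper, and the sample records are hypothetical, not taken from the original scraper.

```python
# Hypothetical sketch: merge scraped lab-diamond records from two retailers
# and keep only round-cut stones, matching the ggplot2 mined-diamond data.
# Field names ("shape", "carat", "price") are assumptions, not the real schema.


def combine_round_cut(records_by_retailer):
    """Merge per-retailer record lists into one list of round-cut diamonds.

    Each kept record is copied and tagged with its retailer and origin so
    the two sources can still be told apart after merging.
    """
    combined = []
    for retailer, records in records_by_retailer.items():
        for rec in records:
            # Normalize case/whitespace so "Round" and " round " both match.
            if str(rec.get("shape", "")).strip().lower() == "round":
                combined.append({**rec, "retailer": retailer, "origin": "lab"})
    return combined


# Usage with made-up records:
scraped = {
    "Clean Origin": [
        {"shape": "Round", "carat": 1.01, "price": 2100},
        {"shape": "Oval", "carat": 1.20, "price": 2500},
    ],
    "MiaDonna": [
        {"shape": " round ", "carat": 0.75, "price": 1300},
    ],
}
lab_round = combine_round_cut(scraped)
print(len(lab_round))
```

Copying each record with `{**rec, ...}` leaves the raw scraped data untouched, which makes it easier to re-run the filter with different rules later.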
Carat - weight of the diamond
While the average diamond used for engagement rings is between 1.08 and 1.2 carats, I set a broader range of 0.5 to 2.5 carats to cover a larger base of potential buyers.
Cut - quality of brilliance on a GIA rating scale
From best to worst: Ideal, Excellent, Very Good, Good, Fair, Poor
Clarity - assessment of imperfections
From best to worst: IF, VVS1, VVS2, VS1, VS2, SI1, SI2, I1
Color - hue of the diamond
From colorless to yellow: D, E, F, G, H, I, J, K
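Because cut, clarity, and color are ordered scales rather than numbers, it helps to encode each grade by its rank before sorting or plotting. A minimal Python sketch, assuming grades are stored as the strings listed above; the helper names are hypothetical, and the carat filter uses the 0.5 to 2.5 carat range chosen for this study.

```python
# Ordered grade scales from the lists above, best grade first.
CUT_ORDER = ["Ideal", "Excellent", "Very Good", "Good", "Fair", "Poor"]
CLARITY_ORDER = ["IF", "VVS1", "VVS2", "VS1", "VS2", "SI1", "SI2", "I1"]
COLOR_ORDER = ["D", "E", "F", "G", "H", "I", "J", "K"]


def grade_rank(grade, order):
    """Return 0 for the best grade; larger numbers mean worse quality.

    Raises ValueError if the grade is not on the scale.
    """
    return order.index(grade)


def in_study_range(carat, low=0.5, high=2.5):
    """True if the carat weight falls in the study's 0.5-2.5 ct range (inclusive)."""
    return low <= carat <= high


# Usage: sort clarity grades from best to worst, and filter carat weights.
grades = ["SI1", "IF", "VS2", "VVS1"]
print(sorted(grades, key=CLARITY_ORDER.index))
print([c for c in [0.3, 0.5, 1.1, 2.5, 3.0] if in_study_range(c)])
```

Sorting by rank keeps plots in quality order instead of alphabetical order, where "VS1" would otherwise land before "VVS1".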
One caveat: the year in which the ggplot2 diamonds dataset was collected is not available. Therefore, I could not account for inflation or other changes in price relative to this year’s lab diamond prices.
Data Analysis
Going into my analysis, I assumed that the size of the diamond would have a significant effect on its price. Using Pearson’s correlation, I confirmed a strong positive correlation between carat and price, with coefficients of 0.89 for mined diamonds and 0.82 for lab diamonds. Notably, as carat increased, price rose faster for mined diamonds than for lab diamonds. This suggests that the larger the diamond you buy, the greater the savings of choosing a lab diamond.
Because carat had such a strong influence on price, I did not want it to bias the analysis of how the diamond’s other characteristics affect price. To control for it, I created a price-per-carat measure to track pricing across cut, clarity, and color. Comparing price-per-carat, the lab diamonds’ mean and median were both around 50% lower than those of the mined diamonds. In other words, within these datasets, lab diamonds cost about half as much per carat as mined diamonds.
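For readers who want to reproduce this kind of check, the sketch below computes Pearson’s r and a least-squares slope (dollars per additional carat) for each diamond type; a steeper slope corresponds to price rising faster with carat. It uses only the Python standard library, and the carat and price values are made-up toy numbers for illustration, not the study’s data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (price change per carat)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Toy data (NOT the study's data): same carats, different price curves.
carats = [0.5, 1.0, 1.5, 2.0]
mined_prices = [1500, 5000, 10000, 16000]
lab_prices = [800, 2200, 4000, 6000]

for label, prices in [("mined", mined_prices), ("lab", lab_prices)]:
    print(label, round(pearson_r(carats, prices), 3), round(ols_slope(carats, prices)))
```

The correlation says how tightly price tracks carat; the slope says how fast it climbs, which is the quantity behind the "faster rate" observation.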
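The price-per-carat normalization and the percentage comparison can be sketched as follows. This is a minimal illustration assuming each diamond is a dict with `price` (USD) and `carat` keys; the helper names and sample rows are hypothetical.

```python
from statistics import mean, median


def price_per_carat(rows):
    """Divide each diamond's price by its carat weight."""
    return [r["price"] / r["carat"] for r in rows]


def pct_lower(lab_value, mined_value):
    """Percent by which the lab figure is below the mined figure."""
    return 100 * (1 - lab_value / mined_value)


# Made-up rows for illustration only.
mined = [{"price": 5000, "carat": 1.0}, {"price": 9000, "carat": 1.5}]
lab = [{"price": 2500, "carat": 1.0}, {"price": 4500, "carat": 1.5}]

mined_ppc, lab_ppc = price_per_carat(mined), price_per_carat(lab)
print(f"mean: {pct_lower(mean(lab_ppc), mean(mined_ppc)):.1f}% lower")
print(f"median: {pct_lower(median(lab_ppc), median(mined_ppc)):.1f}% lower")
```

Reporting both the mean and the median guards against a few very expensive stones skewing the comparison.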
Data on cut
Across the 5 cut grades, the graph showed that cut did not have a significant effect on the price or price-per-carat of lab diamonds. The sharp price decline for the “Fair” lab diamond reflects a single observation, which is too few to indicate a meaningful trend.
One explanation lies in the distribution of lab diamonds across cut grades: 92% of lab diamonds in the dataset had either an Ideal or Excellent cut rating. It is possible that the manufacturing process regularly produces well-cut diamonds, which leaves little variation in cut and gives it minimal impact on the price of the diamond.
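A per-grade summary with observation counts makes both points above easy to check: it shows each grade’s share of the data and flags grades, like the single “Fair” lab diamond, with too few observations to support a trend. This is a minimal sketch; the default `min_n` threshold of 30 is an arbitrary assumption, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import median


def summarize_by_grade(rows, key, min_n=30):
    """Group diamonds by a grade column and report count, share, and
    median price-per-carat. Grades with fewer than min_n rows are flagged."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r["price"] / r["carat"])
    total = len(rows)
    return {
        grade: {
            "n": len(ppcs),
            "share": len(ppcs) / total,
            "median_ppc": median(ppcs),
            "too_few": len(ppcs) < min_n,
        }
        for grade, ppcs in groups.items()
    }


# Made-up lab rows: mostly Ideal/Excellent, one Fair.
lab = (
    [{"cut": "Ideal", "price": 3000, "carat": 1.0}] * 6
    + [{"cut": "Excellent", "price": 2900, "carat": 1.0}] * 3
    + [{"cut": "Fair", "price": 1500, "carat": 1.0}]
)
# min_n lowered to 2 only because this toy sample is tiny.
summary = summarize_by_grade(lab, "cut", min_n=2)
top_share = sum(summary[g]["share"] for g in ("Ideal", "Excellent") if g in summary)
print(f"Ideal/Excellent share: {top_share:.0%}")
print("Fair flagged:", summary["Fair"]["too_few"])
```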
Data on Clarity
Across the clarity grades, price-per-carat decreased fairly consistently as clarity worsened. For mined diamonds, however, the size of each drop depended on whether the step stayed within a clarity tier or crossed into the next one (e.g., VVS1 to VVS2 had a smaller decline than VVS2 to VS1, and VS1 to VS2 had a smaller decline than VS2 to SI1). For lab diamonds, the price-per-carat decline was sharpest from IF to VVS1, followed by a fairly steady decline thereafter.
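One way to quantify the step sizes described above is to compute the percentage drop in price-per-carat between each pair of adjacent clarity grades. A minimal sketch, assuming you already have a median (or mean) price-per-carat per grade; the input values here are made up.

```python
# Clarity scale from best to worst (as listed earlier in this post).
CLARITY_ORDER = ["IF", "VVS1", "VVS2", "VS1", "VS2", "SI1", "SI2", "I1"]


def stepwise_drops(ppc_by_grade, order=CLARITY_ORDER):
    """Percent drop in price-per-carat from each grade to the next worse one.

    Grades missing from ppc_by_grade are skipped, so each step compares the
    nearest grades that are actually present.
    """
    present = [g for g in order if g in ppc_by_grade]
    drops = {}
    for better, worse in zip(present, present[1:]):
        a, b = ppc_by_grade[better], ppc_by_grade[worse]
        drops[(better, worse)] = 100 * (a - b) / a
    return drops


# Made-up values for illustration only.
example = {"IF": 4000, "VVS1": 3000, "VVS2": 2850, "VS1": 2500}
for (better, worse), pct in stepwise_drops(example).items():
    print(f"{better} -> {worse}: {pct:.1f}% lower")
```

Expressing each step as a percentage, rather than a dollar amount, makes within-tier and cross-tier steps comparable even though prices fall along the scale.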
Data on Color
Across the color grades, I expected color to have only a weak relationship with price. My assumption was that a diamond’s hue would not matter much, because a yellow tint would not be very noticeable in a gold setting, whose warm tones complement any yellow in the diamond.
As the graph shows, the price of mined diamonds actually increased as color worsened. I hypothesized that this trend was driven by carat rather than by color itself, which would happen if lower-color mined diamonds in the dataset tend to be larger.
The price-per-carat comparison controls for carat’s influence on price, and this graph confirmed my hypothesis: price-per-carat holds up relatively well across the D-J ratings for mined diamonds. For lab diamonds, price-per-carat barely changes across the D, E, and F ratings, then begins a downtrend that accelerates for the I, J, and K ratings.
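The carat effect behind this hypothesis can be illustrated with a small made-up example: if lower-color diamonds in a sample are larger, their average price can exceed that of better-color diamonds even while their price-per-carat is lower. A minimal Python sketch; the rows and the `mean_by` helper are hypothetical.

```python
from statistics import mean


def mean_by(rows, key, value):
    """Average value(row) within each group defined by row[key]."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(value(r))
    return {k: mean(v) for k, v in groups.items()}


# Made-up rows: the J-color stones are larger than the D-color stones.
rows = [
    {"color": "D", "carat": 0.5, "price": 3000},
    {"color": "D", "carat": 0.7, "price": 4200},
    {"color": "J", "carat": 1.5, "price": 6000},
    {"color": "J", "carat": 2.0, "price": 8000},
]

avg_price = mean_by(rows, "color", lambda r: r["price"])
avg_ppc = mean_by(rows, "color", lambda r: r["price"] / r["carat"])
print("mean price:", avg_price)          # J looks more expensive...
print("mean price-per-carat:", avg_ppc)  # ...but is cheaper per carat
```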
Summary
From my comparative analysis, I can generalize:
- On average, buyers could pay about 50% less per carat for lab diamonds than for mined diamonds
- Carat had the biggest impact on price for both types of diamonds, followed by clarity
- Color does not have a big impact on price among the higher D-F ratings, but the price decline accelerates as the color rating worsens
- Cut does not have a significant impact on lab diamonds, possibly due to the high availability of Ideal/Excellent cut diamonds
Future Work
I would like to scrape more websites of lab diamond retailers to increase the number of observations, as well as perform a comparative analysis on the inventory of lab diamonds between retailers.