Building a Painting Style Classifier with TensorFlow
The objective of my project was to build a painting style classifier, an artificial neural network implemented in TensorFlow, that could distinguish 28 different painting styles:
- Abstract
- African
- Chinese
- Color Field
- Concept Art
- Contemporary
- Cubism
- Expressionism
- Fauvism
- Futurism
- Hyperrealism
- Impressionism
- Indian
- Islamic
- Japanese
- Korean
- Landscape
- Lyrical Abstraction
- Minimalism
- Modern Art
- Modernism
- Mural
- Photorealism
- Pop Art
- Pop Out
- Still Life
- Surrealism
- Veduta
I set out to collect 50,000 sample images of each style, or 1.4 million images in total. This required a search engine API, namely the Bing Cognitive Web Search API. Unfortunately, the API could not handle a workload of 50,000 samples per class, so I ended up with only ~1,000 images per class (1/50 of the goal). Additionally, because the images came from a search engine, they were drawn from a wide range of sources, which made it essentially impossible to assemble truly homogeneous samples for each style.
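The collection targets reduce to simple arithmetic over the class list. Here is a minimal sketch in plain Python; the names `STYLES`, `TARGET_PER_CLASS`, `COLLECTED_PER_CLASS`, and `label_index` are illustrative and do not come from the project's code:

```python
# Illustrative only: these names are assumptions, not the project's actual code.
STYLES = [
    "Abstract", "African", "Chinese", "Color Field", "Concept Art",
    "Contemporary", "Cubism", "Expressionism", "Fauvism", "Futurism",
    "Hyperrealism", "Impressionism", "Indian", "Islamic", "Japanese",
    "Korean", "Landscape", "Lyrical Abstraction", "Minimalism", "Modern Art",
    "Modernism", "Mural", "Photorealism", "Pop Art", "Pop Out",
    "Still Life", "Surrealism", "Veduta",
]

TARGET_PER_CLASS = 50_000     # original goal per style
COLLECTED_PER_CLASS = 1_000   # approximate count actually collected per style

# Map each style name to an integer label, as a classifier's output layer expects.
label_index = {style: i for i, style in enumerate(STYLES)}

target_total = len(STYLES) * TARGET_PER_CLASS           # 28 * 50,000
collected_total = len(STYLES) * COLLECTED_PER_CLASS     # roughly 28 * 1,000
fraction_of_goal = COLLECTED_PER_CLASS / TARGET_PER_CLASS

print(f"{len(STYLES)} classes, target {target_total:,} images, "
      f"collected about {collected_total:,} ({fraction_of_goal:.0%} of goal)")
```

With roughly 28,000 images in total, each class contributes only about 1,000 examples, which is the figure that matters most when judging how well a 28-way classifier can be trained.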
Fortunately, I had a 32-processor server with 208 GB of RAM and 8 NVIDIA K80 graphics cards at my disposal. Unfortunately, TensorFlow is still an evolving framework that is very difficult to optimize, and it has restrictions based on the OS you are using (e.g., on Windows, TensorFlow only works with Python 3.5.2).
With around 2 weeks for this massive undertaking, I looked for existing models I could leverage for my problem. I narrowed my base model down to VGG Net, which proved very effective on a similar problem, the ImageNet Challenge 2014. Its authors found that the variants with 16 and 19 weight layers were the most effective. The weights from this model have been used to create a Neural Artistic Style Transfer program that can transform any photo you give it into the style of your favorite painting. See a breakdown of the VGG Net model in the table below:
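The 16- and 19-layer variants can also be written down compactly in code. The sketch below uses plain Python, with no TensorFlow required, and lists the convolutional widths of the published VGG-16 and VGG-19 configurations, then counts weight layers and parameters. The helper names (`VGG_CONFIGS`, `count_weight_layers`, `count_parameters`) are illustrative, and swapping the 1,000-way ImageNet output for a 28-way output is an assumption about how the model might be adapted, not something taken from the project:

```python
# Channel widths of each 3x3 convolution; "M" marks a 2x2 max-pooling step.
VGG_CONFIGS = {
    "vgg16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "vgg19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}
FC_HIDDEN = [4096, 4096]  # two hidden fully connected layers before the output


def count_weight_layers(config):
    """Convolutional layers plus the three fully connected layers."""
    convs = sum(1 for v in config if v != "M")
    return convs + len(FC_HIDDEN) + 1


def count_parameters(config, num_classes, input_size=224, in_channels=3):
    """Total weights and biases for a given number of output classes."""
    params = 0
    channels, size = in_channels, input_size
    for v in config:
        if v == "M":
            size //= 2                            # pooling halves height and width
        else:
            params += 3 * 3 * channels * v + v    # 3x3 kernel weights plus biases
            channels = v
    features = channels * size * size             # flattened input to the first FC layer
    for width in FC_HIDDEN + [num_classes]:
        params += features * width + width
        features = width
    return params


for name, config in VGG_CONFIGS.items():
    print(name,
          count_weight_layers(config), "weight layers,",
          f"{count_parameters(config, num_classes=1000):,} params (ImageNet),",
          f"{count_parameters(config, num_classes=28):,} params (28 styles, hypothetical)")
```

Most of the parameters sit in the first fully connected layer, so changing the number of output classes barely affects the total model size.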
I used the TensorFlow Playground to approximate the time and complexity of training a model for my problem. I estimated that, even with perfect samples, my theoretical ceiling would be 78.5% at 5,000 epochs (see image below). I determined there would not be enough time to run and test multiple models, since I believed a single training run would take over 24 hours.
So I settled for demonstrating an application of the ImageNet-trained VGG-19 model: transforming a photograph of Columbia University into the style of an abstract painting. See (a) the original photo, (b) the abstract painting, and (c) the photo in the style of the painting.
(a)
(c)
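This post does not describe the internals of the style transfer program. A common formulation of neural style transfer (Gatys et al.) measures "style" as the correlations between feature channels of a VGG layer, collected in a Gram matrix, and optimizes the output image until its Gram matrices match those of the painting. The toy sketch below assumes that formulation and uses plain Python lists in place of real VGG-19 activations; `gram_matrix` and `style_loss` are illustrative names, not the program's API:

```python
def gram_matrix(features):
    """Channel-by-channel correlations.

    features: list of C channels, each a flat list of N spatial activations.
    Returns a C x C matrix, normalized by the number of positions N.
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def style_loss(generated, style):
    """Mean squared difference between the two Gram matrices."""
    g, s = gram_matrix(generated), gram_matrix(style)
    c = len(g)
    return sum((g[i][j] - s[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)


# Toy "activations": 2 channels, 4 spatial positions each (stand-ins for VGG features).
painting = [[1, 0, 1, 0], [0, 1, 0, 1]]
shuffled = [[0, 1, 0, 1], [1, 0, 1, 0]]   # same values at permuted positions
photo = [[1, 1, 1, 1], [1, 1, 1, 1]]

print(style_loss(painting, painting))   # identical features give zero loss
print(style_loss(shuffled, painting))   # layout changes, channel statistics do not
print(style_loss(photo, painting))      # different channel statistics give a positive loss
```

Because the Gram matrix discards where features occur and keeps only how they co-occur, it captures texture and color relationships rather than layout. Minimizing this loss over the pixels of an image through a deep network is an iterative optimization, which is consistent with the long runtimes described here.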
Image (c) took 36 hours to produce, which demonstrates how time-consuming it would be to build a model that not only performs essentially this transformation for 1.4 million images, but also feeds the results into an optimization problem that uses their common characteristics to estimate the most probable painting style of a new image. This is a problem that fascinates me, and I will continue to work on it.