NYC Data Science Academy| Blog
Bootcamps
Lifetime Job Support Available Financing Available
Bootcamps
Data Science with Machine Learning Flagship ๐Ÿ† Data Analytics Bootcamp Artificial Intelligence Bootcamp New Release ๐ŸŽ‰
Free Lesson
Intro to Data Science New Release ๐ŸŽ‰
Find Inspiration
Find Alumni with Similar Background
Job Outlook
Occupational Outlook Graduate Outcomes Must See ๐Ÿ”ฅ
Alumni
Success Stories Testimonials Alumni Directory Alumni Exclusive Study Program
Courses
View Bundled Courses
Financing Available
Bootcamp Prep Popular ๐Ÿ”ฅ Data Science Mastery Data Science Launchpad with Python View AI Courses Generative AI for Everyone New ๐ŸŽ‰ Generative AI for Finance New ๐ŸŽ‰ Generative AI for Marketing New ๐ŸŽ‰
Bundle Up
Learn More and Save More
Combination of data science courses.
View Data Science Courses
Beginner
Introductory Python
Intermediate
Data Science Python: Data Analysis and Visualization Popular ๐Ÿ”ฅ Data Science R: Data Analysis and Visualization
Advanced
Data Science Python: Machine Learning Popular ๐Ÿ”ฅ Data Science R: Machine Learning Designing and Implementing Production MLOps New ๐ŸŽ‰ Natural Language Processing for Production (NLP) New ๐ŸŽ‰
Find Inspiration
Get Course Recommendation Must Try ๐Ÿ’Ž An Ultimate Guide to Become a Data Scientist
For Companies
For Companies
Corporate Offerings Hiring Partners Candidate Portfolio Hire Our Graduates
Students Work
Students Work
All Posts Capstone Data Visualization Machine Learning Python Projects R Projects
Tutorials
About
About
About Us Accreditation Contact Us Join Us FAQ Webinars Subscription An Ultimate Guide to
Become a Data Scientist
    Login
NYC Data Science Acedemy
Bootcamps
Courses
Students Work
About
Bootcamps
Bootcamps
Data Science with Machine Learning Flagship
Data Analytics Bootcamp
Artificial Intelligence Bootcamp New Release ๐ŸŽ‰
Free Lessons
Intro to Data Science New Release ๐ŸŽ‰
Find Inspiration
Find Alumni with Similar Background
Job Outlook
Occupational Outlook
Graduate Outcomes Must See ๐Ÿ”ฅ
Alumni
Success Stories
Testimonials
Alumni Directory
Alumni Exclusive Study Program
Courses
Bundles
financing available
View All Bundles
Bootcamp Prep
Data Science Mastery
Data Science Launchpad with Python NEW!
View AI Courses
Generative AI for Everyone
Generative AI for Finance
Generative AI for Marketing
View Data Science Courses
View All Professional Development Courses
Beginner
Introductory Python
Intermediate
Python: Data Analysis and Visualization
R: Data Analysis and Visualization
Advanced
Python: Machine Learning
R: Machine Learning
Designing and Implementing Production MLOps
Natural Language Processing for Production (NLP)
For Companies
Corporate Offerings
Hiring Partners
Candidate Portfolio
Hire Our Graduates
Students Work
All Posts
Capstone
Data Visualization
Machine Learning
Python Projects
R Projects
About
Accreditation
About Us
Contact Us
Join Us
FAQ
Webinars
Subscription
An Ultimate Guide to Become a Data Scientist
Tutorials
Data Analytics
  • Learn Pandas
  • Learn NumPy
  • Learn SciPy
  • Learn Matplotlib
Machine Learning
  • Boosting
  • Random Forest
  • Linear Regression
  • Decision Tree
  • PCA
Interview by Companies
  • JPMC
  • Google
  • Facebook
Artificial Intelligence
  • Learn Generative AI
  • Learn ChatGPT-3.5
  • Learn ChatGPT-4
  • Learn Google Bard
Coding
  • Learn Python
  • Learn SQL
  • Learn MySQL
  • Learn NoSQL
  • Learn PySpark
  • Learn PyTorch
Interview Questions
  • Python Hard
  • R Easy
  • R Hard
  • SQL Easy
  • SQL Hard
  • Python Easy
Data Science Blog > Machine Learning > Use Neural Networks to Find the Best Words to Title Your eBook

Use Neural Networks to Find the Best Words to Title Your eBook

Bernard Ong
Posted on Aug 22, 2016

kindlesite2Introduction

The eBook business is thriving. The likes of Amazon Kindle, Apple iBookstore, and Google eBookstore all provide a robust variety of channels by which to publish any eBook on any subject you could think of.  Amazon generates an average of 1.07MM in eBook paid sales volume, which translates to about $5.8MM in revenue, every day.

A huge community of eBook followers exist due to its proven model to generate passive income for good writers. There are many great writers out there, but why is it sometimes difficult to generate the expected revenue in the market? The key to real success for capable, passionate writers is preparation and research. You have to know the market and current trends.

Maximizing Potential

It is not enough for writers come up with a topic they are interested in, start drafting and writing what they feel most passionate about, get published, then sit back and reap the benefits of their efforts. There are many websites that offer advice and recommendations on how to get into the eBook publishing business, but it boils down to good old fashioned market research and audience targeting to drive attention and trigger those sales.

Many tools and utilities are available to aid writers research their subjects of choice and understand keyword search volume. They also reveal how much competition exists for those subject areas you're writing about, and what the reading public is clamoring for. At what price point are customers willing to pay for the eBook?

When you think about the purchasing behavior of eBook readers, they would normally go to their favorite eBook stores, type in the relevant keywords, and start browsing through the hundreds, if not thousands of eBooks out there. If your eBook does not appear in the first or second page of the most relevant search results, your chances of being noticed are slim. Consumers will always try to go for the most relevant, most interesting, most popular, most highly rated, most favorably reviewed, and most inexpensive eBook with the best value they get their hands on. Sounds daunting right?

The eBook market is saturated with all the popular topics, so one needs to be more creative in their approach on the entire eBook publishing process. Do not start writing a word until an understanding of how the market will react is clearly achieved.

Business of eBooks

ebooks-businessThere is an interesting market psychology that drives the eBook market. The savviest writers are great marketers first, excellent wordsmiths second. If one's intent is to purely follow their passion and write something they truly care about with the prospect of generating income as a second thought, then no amount of market research will convince the author to sway from his or her prime directive.
However, if you're like most people, you would want to not only write something you really care about, but also try to maximize the potential of earning passive income while you're doing it. And why not? This mindset has been prevalent especially in this generation as more Baby Boomers are in or close to their retirement periods. Many feel compelled to follow their passion more, and at the same time find ways to supplement their retirement income. Writing and publishing eBooks has been one of these passive income generators that a lot of people vie for, but sadly, not many succeed.

eBook Fever

There's so much potential of passive income generation that eBooks offer that there are many models built around the business of eBooks itself. Websites and courses exist to teach, mentor, and guide eBook writers to how to go about the business of honing and excelling in the craft of research, marketing and writing rolled into one.

The most successful ones make it a point to create and publish eBooks once every two to three months. These savvy publishers go for volume, so it becomes a numbers game (to an extent). But it's not done blindly to crank out these eBooks. The electronic format and plethora of delivery channels make it really convenient and efficient to produce and publish eBooks at an unheard of rate that traditional book publishers have never seen before. So it boils down to research and more research.

So where does one start? Right out of the gate.

Open Sesame

When potential readers are hunting for the best eBook, there are cues that can be managed effectively to maximize their attention and garner a rise on a potential sale. Once search results are displayed, the book's title and cover are the little-known doorways to grabbing the reader's attention. Book design covers are there to speak visually to the consumer, the eBook title (and subtitle) are there to compete among the thousands of keywords and appeal and attract to the consumer mental and emotional state.

Let's focus on the how to choose a title for an eBook to increase the likelihood of being more visible, and appeal to the most anticipated result. We will focus on three main objectives: how to find the top words to title a Kindle eBook, and how to find the most similar words related to the subject. We will also have a discussion on how to use and combine two technical approaches to fulfill our objectives.

Titles and headlines change the way we think. What you read affects what you see, and what grabs your attention changes the way you feel. The hook, line, and sinker exist in all forms of media. Part of advertising agencies' expertise is coming up with all these catch phrases and headline grabbers. From news pieces, articles, blogs, news, printed and electronic books, to movies, marquees, and advertisements. These are just a sample of how titling can make a difference.

Word Counts and the Doc2Vec Neural Network

tech-approach

I targeted the Amazon Kindle site as the source for gathering information to do an in depth analysis on eBook inventory listing, ratings, reviews, and pricing. Code was written to dynamically web-scrape and scour the entire result set based on the key word search. From there, a process was developed to tokenize and clean up the titles into group of words which are categorized and counted to the proper rating for that specific eBook title. A total weighted score is calculated by getting the summation of the product of the rating score multiplied by the word count.

A second and complementary approach using Doc2Vec was used to analyze the entire eBook title listing. Doc2Vec was built based on the Word2Vec approach towards documents. The objective is to convert whole documents or in this case, bodies of text based on eBook titles into a digital representation in a multidimensional vector space. What that means is that any title can be represented as a vector point in a multidimensional space. This point on its own does not offer any value, but when a cluster of these points are created, a pattern starts to emerge and relationships between the words start to form. The magic and power of Doc2Vec is its ability to mathematically create context out of words, and inversely, also produce words given its context.

Doc2vec is based on a very thin two-layer neural network. Doc2Vec learns representations for words and labels simultaneously. It operates in a purely unsupervised mode and needs no labels other than an arbitrary unique ID per text example. You train it to find similar (using cosine similarity) words in context of each other based on the frequency of co-occurrences of words that are near each other. You can even pass entirely new text to the trained model and it can infer a compatible vector, and find the best and most similar words most likely to appear in context.

The approach is to create the weighted word frequency count to get the most relevant words from Amazon Kindle site and intersect that set of word list with the result from the Doc2Vec neural network result set. The intersected result set then uses the relevancy score from Amazon Kindle site to sort and produce the highly recommended list of words to use to form an eBook title that could increase the likelihood of being noticed and getting the sale. This serves two purposes. One, it ensures that the list relevancy remains high due to its consistent use of the Amazon Kindle inherent scoring mechanism. Second, the Doc2Vec neural network output of the most similar words that it learned from the entire title listings from Amazon Kindle produces possibly related search words. Utilizing these words will fortify the likelihood that the eBook title will appear on the first two to three pages of Amazon's search results.

Technical Implementation

tech-process

The initial process is all about the scraping strategy and approach. Python was the language and the BeautifulSoup library took care of the web-scraping. A fully object oriented approach was also implemented so that full modularity, reuse, componentization, and abstraction was achieved. The ReviewCorpus class did pre-processing on incoming titles. This class are four methods.

This class method removes the higher order non ascii characters. It is fairly normal at times to get these characters and it detracts from accurately processing text for a higher dimensionalized vector space.

These class methods use the Python nltk library functions to tokenize the titles, remove stop words, and take out all punctuation.

The next class method named .add then creates a dictionary that maps each title with a count against each rating category.

The AmazonKindle class scrapes the Amazon Kindle site dynamically. This code needs to be very robust as the search results from Amazon could go to a maximum of 400 pages, and each page could have from 15-20 title listing with metadata (titles, ratings, number of comments, prices).

The first method of the AmazonKindle class initializes the query string and scrapes the maximum page from the Amazon result page. This tells how deep the looping code needs to go to scrape through all the titles.

The buildURL method is then used to formulate the URL to scrape. Since Amazon generates this dynamically, I had to get the URL down to a reproducible format and protocol.

The retrieveSource method is the main module to collect the entire HTML byte stream. The header needs to be formulated to effect the scraping as a browser client as Amazon does not allow scraping bots on its site.

The main method is the processRecord method as it's designed to now process and scrape out the HTML content for the essential data and store the result in memory.

Lastly, the AmazonKindle class is designed as a class iterator to pass back a generator result set for further processing.

After the two classes above are created, a Python application is created to bring it all together. The main application will pass the parameters to the processing classes, clean all text up as it collects it, gathers all the metadata, stores it as vectors, then write it all out to CSV files for further processing by the Doc2Vec procedure.

The next step is to now vectorize the entire title list and train the neural network on the entire corpus. The Doc2Vec function is called with the right hyper-parameters. How the hyper-parameters are set is the most critical step in the vectorization, model build out, and training. The code is as follows.

An inference engine is called to get the most similar set of words from the vectorized cloud. The parameters need to be passed in manually for this version, but can be abstracted easily as appropriate.

Finally, the data sets are converted to a set format, then the set intersect function called to obtain the final recommended list of words that best be used for formulating the eBook title.

Here are some examples of the output using the weighted word frequency count and Doc2Vec neural network, tried with various keyword combinations, and final result sorted using Amazon's relevancy score.

example1

example2

example3

example4

Future Next Steps

It would be an interesting exercise to use the recommended bag of words to literally come up with final title recommendations. I believe that the use of a Deep Learning model using Theano Keras has this capability to formulate human readable titles using natural language generation.


by Bernard Ong, Data Argonaut
Email: bernard.ong@entense.com
Cell: 201-916-5241

About Author

Bernard Ong

Data Scientist with track record of driving innovative technology projects and programs to successful implementation. Blend application architecture and machine learning skills with domain knowledge to drive strategy and execution excellence. Background includes managing multimillion-dollar portfolios, turnaround initiatives,...
View all posts by Bernard Ong >

Related Articles

Capstone
Predicting the Unpredictable: Revolutionizing E-commerce Delivery with Machine Learning
Capstone
The Convenience Factor: How Grocery Stores Impact Property Values
Capstone
Acquisition Due Dilligence Automation for Smaller Firms
R Shiny
Forecasting NY State Tax Credits: R Shiny App for Businesses
Machine Learning
Pandemic Effects on the Ames Housing Market and Lifestyle

Leave a Comment

Cancel reply

You must be logged in to post a comment.

No comments found.

View Posts by Categories

All Posts 2399 posts
AI 7 posts
AI Agent 2 posts
AI-based hotel recommendation 1 posts
AIForGood 1 posts
Alumni 60 posts
Animated Maps 1 posts
APIs 41 posts
Artificial Intelligence 2 posts
Artificial Intelligence 2 posts
AWS 13 posts
Banking 1 posts
Big Data 50 posts
Branch Analysis 1 posts
Capstone 206 posts
Career Education 7 posts
CLIP 1 posts
Community 72 posts
Congestion Zone 1 posts
Content Recommendation 1 posts
Cosine SImilarity 1 posts
Data Analysis 5 posts
Data Engineering 1 posts
Data Engineering 3 posts
Data Science 7 posts
Data Science News and Sharing 73 posts
Data Visualization 324 posts
Events 5 posts
Featured 37 posts
Function calling 1 posts
FutureTech 1 posts
Generative AI 5 posts
Hadoop 13 posts
Image Classification 1 posts
Innovation 2 posts
Kmeans Cluster 1 posts
LLM 6 posts
Machine Learning 364 posts
Marketing 1 posts
Meetup 144 posts
MLOPs 1 posts
Model Deployment 1 posts
Nagamas69 1 posts
NLP 1 posts
OpenAI 5 posts
OpenNYC Data 1 posts
pySpark 1 posts
Python 16 posts
Python 458 posts
Python data analysis 4 posts
Python Shiny 2 posts
R 404 posts
R Data Analysis 1 posts
R Shiny 560 posts
R Visualization 445 posts
RAG 1 posts
RoBERTa 1 posts
semantic rearch 2 posts
Spark 17 posts
SQL 1 posts
Streamlit 2 posts
Student Works 1687 posts
Tableau 12 posts
TensorFlow 3 posts
Traffic 1 posts
User Preference Modeling 1 posts
Vector database 2 posts
Web Scraping 483 posts
wukong138 1 posts

Our Recent Popular Posts

AI 4 AI: ChatGPT Unifies My Blog Posts
by Vinod Chugani
Dec 18, 2022
Meet Your Machine Learning Mentors: Kyle Gallatin
by Vivian Zhang
Nov 4, 2020
NICU Admissions and CCHD: Predicting Based on Data Analysis
by Paul Lee, Aron Berke, Bee Kim, Bettina Meier and Ira Villar
Jan 7, 2020

View Posts by Tags

#python #trainwithnycdsa 2019 2020 Revenue 3-points agriculture air quality airbnb airline alcohol Alex Baransky algorithm alumni Alumni Interview Alumni Reviews Alumni Spotlight alumni story Alumnus ames dataset ames housing dataset apartment rent API Application artist aws bank loans beautiful soup Best Bootcamp Best Data Science 2019 Best Data Science Bootcamp Best Data Science Bootcamp 2020 Best Ranked Big Data Book Launch Book-Signing bootcamp Bootcamp Alumni Bootcamp Prep boston safety Bundles cake recipe California Cancer Research capstone car price Career Career Day ChatGPT citibike classic cars classpass clustering Coding Course Demo Course Report covid 19 credit credit card crime frequency crops D3.js data data analysis Data Analyst data analytics data for tripadvisor reviews data science Data Science Academy Data Science Bootcamp Data science jobs Data Science Reviews Data Scientist Data Scientist Jobs data visualization database Deep Learning Demo Day Discount disney dplyr drug data e-commerce economy employee employee burnout employer networking environment feature engineering Finance Financial Data Science fitness studio Flask flight delay football gbm Get Hired ggplot2 googleVis H20 Hadoop hallmark holiday movie happiness healthcare frauds higgs boson Hiring hiring partner events Hiring Partners hotels housing housing data housing predictions housing price hy-vee Income industry Industry Experts Injuries Instructor Blog Instructor Interview insurance italki Job Job Placement Jobs Jon Krohn JP Morgan Chase Kaggle Kickstarter las vegas airport lasso regression Lead Data Scienctist Lead Data Scientist leaflet league linear regression Logistic Regression machine learning Maps market matplotlib Medical Research Meet the team meetup methal health miami beach movie music Napoli NBA netflix Networking neural network Neural networks New Courses NHL nlp NYC NYC Data Science nyc data science academy NYC Open Data nyc property NYCDSA NYCDSA Alumni Online Online Bootcamp Online Training Open Data painter pandas Part-time performance phoenix pollutants Portfolio Development precision measurement prediction Prework Programming public safety PwC python Python Data Analysis python machine learning python scrapy python web scraping python webscraping Python Workshop R R Data Analysis R language R Programming R Shiny r studio R Visualization R Workshop R-bloggers random forest Ranking recommendation recommendation system regression Remote remote data science bootcamp Scrapy scrapy visualization seaborn seafood type Selenium sentiment analysis sentiment classification Shiny Shiny Dashboard Spark Special Special Summer Sports statistics streaming Student Interview Student Showcase SVM Switchup Tableau teachers team team performance TensorFlow Testimonial tf-idf Top Data Science Bootcamp Top manufacturing companies Transfers tweets twitter videos visualization wallstreet wallstreetbets web scraping Weekend Course What to expect whiskey whiskeyadvocate wildfire word cloud word2vec XGBoost yelp youtube trending ZORI

NYC Data Science Academy

NYC Data Science Academy teaches data science, trains companies and their employees to better profit from data, excels at big data project consulting, and connects trained Data Scientists to our industry.

NYC Data Science Academy is licensed by New York State Education Department.

Get detailed curriculum information about our
amazing bootcamp!

Please enter a valid email address
Sign up completed. Thank you!

Offerings

  • HOME
  • DATA SCIENCE BOOTCAMP
  • ONLINE DATA SCIENCE BOOTCAMP
  • Professional Development Courses
  • CORPORATE OFFERINGS
  • HIRING PARTNERS
  • About

  • About Us
  • Alumni
  • Blog
  • FAQ
  • Contact Us
  • Refund Policy
  • Join Us
  • SOCIAL MEDIA

    ยฉ 2025 NYC Data Science Academy
    All rights reserved. | Site Map
    Privacy Policy | Terms of Service
    Bootcamp Application