Anti-epileptic Drug Review Analysis
Introduction
Sometimes when people have a fever, a headache, or a bad cold, they reach for Tylenol. They might ask which symptom Tylenol works best on compared with other medicines. More generally, how do customers choose medications based on reviews? These questions can be answered by analyzing drug reviews. For this project, I am interested in analyzing drug reviews for epilepsy medications. The project presents exploratory data analysis, sentiment analysis, and an analysis of the relation between rating and polarity.
Background
Epilepsy is a group of neurological disorders characterized by epileptic seizures, and it is more common in older people. Epileptic seizures are episodes that can vary from brief and nearly undetectable to long periods of vigorous shaking. The cause of most cases of epilepsy is unknown. Epilepsy can have both genetic and acquired causes. Seizures are often brought on by factors such as stress, alcohol abuse, or a lack of sleep. Nearly 80% of cases occur in the developing world. Seizures are controllable with medication in about 70% of cases. How effective are the anti-epileptic drugs? Do they work differently for people with different symptoms? Is a drug more effective for a particular symptom? Pharmaceutical scientists might be interested in these questions so that they can use the answers to improve the drugs. Also, how do reviews help customers choose drugs? To answer questions like these, we need to analyze drug reviews.
Web Scraping
www.Drugs.com has drug review information. For this project we selected 4 anti-epileptic drugs (Gabapentin, Clonazepam, Topiramate, Pregabalin) for the review analysis.
- Gabapentin: It is used in adults to treat nerve pain
- Clonazepam: It is used to treat certain seizure disorders (including absence seizures or Lennox-Gastaut syndrome) in adults and children
- Topiramate: It helps to reduce seizure activity and prevent migraine headaches from occurring
- Pregabalin: It slows down impulses in the brain that cause seizures
Below is a view of the data from the site that we were interested in. Each customer review includes a condition, a review comment, and a rating. In the snapshot, the information to be collected is the condition (For Reflex Sympathetic Dystrophy Syndrome: ), the comment ("I was diagnosed with RSD on Tuesday. ... I like these pills so far!"), and the rating (10).
The pages were scraped with BeautifulSoup.
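As a rough illustration of this step, the sketch below parses a small inline HTML snippet with BeautifulSoup and pulls out the three fields. The markup, the class names (`review`, `condition`, `comment`, `rating`), and the function name `parse_reviews` are assumptions made for the example. The real page structure is not shown in this post and will differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one review on the page.
# The tag and class names are assumptions, not the site's real structure.
SAMPLE_HTML = """
<div class="review">
  <b class="condition">For Reflex Sympathetic Dystrophy Syndrome:</b>
  <span class="comment">"I like these pills so far!"</span>
  <span class="rating">10</span>
</div>
"""


def parse_reviews(html):
    """Return a list of dicts with condition, comment, and rating."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    # Each assumed review block holds one condition, one comment, one rating.
    for block in soup.find_all("div", class_="review"):
        condition = block.find(class_="condition").get_text(strip=True)
        comment = block.find(class_="comment").get_text(strip=True)
        rating = block.find(class_="rating").get_text(strip=True)
        reviews.append(
            {"condition": condition, "comment": comment, "rating": float(rating)}
        )
    return reviews


reviews = parse_reviews(SAMPLE_HTML)
print(reviews)
```

For a live page, the HTML string would come from an HTTP request, and the selectors would need to be adapted after inspecting the page source.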
Data Cleaning
The main item to be cleaned is the condition column. For a condition such as "For Reflex Sympathetic Dystrophy Syndrome:", the leading word 'For' and the trailing ':' are deleted. Also, if the condition is "Neurontin (gabapentin) for Postherpetic Neuralgia", the prefix "Neurontin (gabapentin) for " is deleted.
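The two cleaning rules can be sketched with regular expressions as follows. The function name `clean_condition` and the exact patterns are assumptions for illustration. The post does not show the code that was actually used.

```python
import re


def clean_condition(raw):
    """Strip 'For', the trailing colon, and any 'Brand (generic) for' prefix."""
    text = raw.strip()
    # Drop a leading "Brand (generic) for ", e.g. "Neurontin (gabapentin) for ".
    text = re.sub(r"^\w[\w\s-]*\([^)]*\)\s+for\s+", "", text)
    # Drop a leading "For " and a trailing ":".
    text = re.sub(r"^For\s+", "", text)
    return text.rstrip(":").strip()


print(clean_condition("For Reflex Sympathetic Dystrophy Syndrome:"))
# → Reflex Sympathetic Dystrophy Syndrome
print(clean_condition("Neurontin (gabapentin) for Postherpetic Neuralgia"))
# → Postherpetic Neuralgia
```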
Below is the head of the data frame.
Exploratory Data Analysis
Review Count By Condition
Gabapentin
For Gabapentin, the majority of the reviews are for Pain and Anxiety, followed by Fibromyalgia and Peripheral Neuropathy.
Clonazepam
The majority of the reviews are for Anxiety and Panic Disorder.
Topiramate
Migraine has more reviews than other conditions.
Pregabalin
Fibromyalgia and Generalized Anxiety Disorder have more reviews than the other conditions.
From this analysis we can see which conditions each drug is most often used for, judging by review counts.
Average Rating By Condition
Gabapentin
Cough has the highest average rating among the conditions.
Clonazepam
Temporomandibular Joint Disorder, Chronic Myofascial Pain, and Obsessive Compulsive Disorder have higher ratings than the other conditions.
Topiramate
Seizures has the highest rating, followed by Diabetic Peripheral Neuropathy and Tourette's Syndrome.
Pregabalin
Occipital Neuralgia has the highest rating, followed by Restless Legs Syndrome and Dercum's Disease.
From this review analysis, we found that Gabapentin is rated highest for cough, Clonazepam for Temporomandibular Joint Disorder, Chronic Myofascial Pain, and Obsessive Compulsive Disorder, Topiramate for seizures, and Pregabalin for Occipital Neuralgia. Pharmaceutical companies could look at the conditions with lower average ratings and do further analysis on the drug ingredients.
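The per-condition review counts and average ratings discussed above can be produced with a single pandas group-by. The sketch below is a minimal version under stated assumptions: the column names (`drug`, `condition`, `rating`) are hypothetical, and the rows are made up for illustration rather than taken from the scraped data.

```python
import pandas as pd

# Made-up rows for illustration only; not values from the scraped reviews.
df = pd.DataFrame(
    {
        "drug": ["Gabapentin", "Gabapentin", "Gabapentin", "Clonazepam", "Clonazepam"],
        "condition": ["Pain", "Pain", "Anxiety", "Anxiety", "Panic Disorder"],
        "rating": [8.0, 6.0, 9.0, 10.0, 7.0],
    }
)

# One row per (drug, condition) with the review count and the average rating.
summary = (
    df.groupby(["drug", "condition"])["rating"]
    .agg(review_count="count", avg_rating="mean")
    .reset_index()
    .sort_values(["drug", "avg_rating"], ascending=[True, False])
)
print(summary)

# Average rating per drug, across all of its conditions.
drug_avg = df.groupby("drug")["rating"].mean()
print(drug_avg)
```

Keeping the count next to the mean makes it easy to spot averages that rest on only a handful of reviews.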
Average Rating Comparison
Clonazepam has the highest average rating among the drugs and Topiramate has the lowest.
Sentiment Analysis
Review Polarity
Clonazepam has the best polarity (more positive and fewer negative comments) of the four drugs.
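The post does not say which tool produced the polarity scores. As a rough illustration of how a lexicon-based polarity score behaves, the sketch below averages per-word scores from a tiny hand-made lexicon. The lexicon, its values, and the function name `toy_polarity` are assumptions for the example, not the scorer behind the results reported here.

```python
import re

# Tiny hand-made lexicon; the words and scores are illustrative assumptions.
TOY_LEXICON = {
    "good": 0.7,
    "great": 0.8,
    "like": 0.5,
    "bad": -0.7,
    "miserable": -1.0,
    "awful": -1.0,
}


def toy_polarity(text):
    """Average the scores of words found in the lexicon; 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("I like these pills so far"))         # → 0.5
print(toy_polarity("I was miserable before this drug"))  # → -1.0
print(toy_polarity("I take it twice a day"))             # → 0.0
```

Because neutral words contribute nothing in a scheme like this, one strongly negative word can set the score for an entire review.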
Review Subjectivity
The review subjectivity scores range from 0 to 1, where 0 is the most objective and 1 is the most subjective. All four drugs show the same shape: the distribution is almost normal. A small number of reviews are more objective.
Relation Between Rating and Polarity
Do the review comments match up with the ratings? If a rating is higher, would you expect the comments to be more positive? We will look at the scatter plots to draw our conclusion.
It is surprising that we do not see a strong correlation between rating and review polarity, although higher ratings do correspond to higher positive scores. In the scatter plot of Gabapentin, there is a review with a rating of 10 that has a very negative polarity. The reason is that the review describes how ill the customer was before, using the word 'miserable', while the other words are neutral. The majority of the reviews for each rating have a polarity between -0.5 and 0.5. This is still quite normal, as reviewers talk about their illness in a negative way while describing how the drugs cured them in a positive way.
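The post judges the relationship from scatter plots alone. One way to put a number on it is the Pearson correlation coefficient, sketched below in plain Python. The helper name `pearson_r` is hypothetical, and the (rating, polarity) pairs are made up for illustration. They are not values from the reviews.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up (rating, polarity) pairs for illustration only.
ratings = [1, 3, 5, 8, 10, 10]
polarities = [-0.2, 0.1, 0.0, 0.3, 0.4, -0.6]

print(round(pearson_r(ratings, polarities), 3))
```

A value near 0 would match the impression of a weak relationship, while a value near 1 would indicate that polarity rises steadily with rating.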
Conclusion
- The scatter plots of rating against polarity do not show a clear positive correlation between them. On average, the polarity is always a little away from neutral, whether positive or negative. We do see that the polarity scores are higher for ratings above 7.
- In terms of subjectivity, a small group of people write more objective reviews. This makes the overall reviews less biased.
- Based on the sentiment analysis, customers should choose Clonazepam out of the 4 drugs in this case study because it has the highest positive score and the lowest negative score. In general, customers could compare more drugs using this technique.