It is quite common for us and our friends to experience insomnia. For some people it is a long-term condition that is difficult to cure, while for most of us it comes from daily unhappiness or unfinished work. When the clock reaches one or two in the morning and we are still awake, we desperately hope that something magical will appear and help us fall asleep. That is why we decided to do market research on sleep aids.
For convenience and efficiency, we chose the top 100 best sellers in medicinal sleep aids on Amazon as our data source and collected each product’s name, star rating, price, total number of reviews, and review content. With this analysis, we hope to offer some suggestions to people who cannot fall asleep late at night.
1. Data Scraping & Cleaning
First, we used Python 3.7 to scrape the rank, name, star rating and price of the top 100 products from the homepage. (The Amazon Best Sellers page updates once an hour, so there may be some differences between our scraped data and the real-time data.)
Second, to get more detail, we scraped data (including star rating, size, flavor and content) from the review pages. Each product had so many reviews that we selected only the top 10 reviews for each one.
From the homepage, we got the information mentioned above for 100 products, with one item link unavailable at that time.
From the review pages, we got 970 items in total, with 3 products missing reviews. Because different products from the same merchant share comments, we removed the repeated comments from the .csv file, which left 850 items.
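A minimal sketch of this de-duplication step. It assumes each scraped review is a dict with hypothetical `product` and `content` keys, and that two reviews count as duplicates when their text matches after lowercasing and collapsing whitespace; the exact rule used on the .csv file is not stated in the article.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())


def drop_duplicate_reviews(reviews):
    """Keep the first occurrence of each review text, preserving the original order.

    `reviews` is a list of dicts; the "content" key is an assumed field name.
    """
    seen = set()
    unique = []
    for review in reviews:
        key = normalize(review["content"])
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique


# Made-up rows for illustration only (not taken from the scraped data).
sample = [
    {"product": "A", "content": "Works great, fell asleep fast."},
    {"product": "B", "content": "Works great,  fell asleep fast."},
    {"product": "C", "content": "Did nothing for me."},
]
print(len(drop_duplicate_reviews(sample)))  # 2
```

Keeping the first occurrence means a shared comment stays attached to whichever product was scraped first, which is worth remembering when counting reviews per product.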
The scraped data came in various forms that needed to be reorganized, and some fields required a change of data type. For example, there were many useless characters (e.g. ‘[ ]’), and some numeric values were stored as character data and needed to be converted to a numeric type.
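The kind of cleaning described above can be sketched with two small helpers. The raw string formats in the example (a bracketed, quoted star rating and a price with a currency sign) are assumptions about what the scraper returned, not values copied from our files.

```python
import re


def strip_brackets(raw):
    """Remove list-style debris such as [ ], quotes and surrounding spaces."""
    return raw.strip().strip("[]").strip().strip("'\"").strip()


def to_number(raw):
    """Extract the first number from a string such as '$14.49' or '4.5 out of 5 stars'.

    Thousands separators are dropped. Returns None when the string has no digits.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


# Assumed raw formats, for illustration only.
print(to_number(strip_brackets("['4.5 out of 5 stars']")))  # 4.5
print(to_number("$14.49"))  # 14.49
print(to_number("3,239"))  # 3239.0
```

Returning `None` for strings without digits makes missing values explicit instead of silently turning them into zeros.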
2. Data Analysis
2.1 Homepage Data
After data cleaning, we produced several graphs representing the results.
The ratings of the top 100 best-selling products are shown above. On the X-axis, product names are placed in ranking order (from left to right, rank 1 to rank 100). The highest rating is 4.8/5 and the lowest is 3.2/5. The average rating is 4.6 out of 5. 39% of the products are rated below the average, and most of them sit in the lower half of the sales ranking. Many other criteria contribute to the ranking, and we did not observe any direct correlation between rating and rank.
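Summary figures of this kind (average, extremes, share of products below the average) can be computed with a small helper. This is a sketch under stated assumptions: the ratings in the example are made up, and "below the average" is taken to mean strictly below.

```python
from statistics import mean


def summarize(values):
    """Return the mean, the extremes, and the share of values strictly below the mean."""
    avg = mean(values)
    below = sum(1 for v in values if v < avg)
    return {
        "mean": avg,
        "min": min(values),
        "max": max(values),
        "share_below_mean": below / len(values),
    }


# Made-up ratings listed in rank order, for illustration only.
ratings = [4.8, 4.6, 4.5, 3.2, 4.7]
summary = summarize(ratings)
print(summary["min"], summary["max"], summary["share_below_mean"])
```

The same helper applies unchanged to the price and comment-count columns.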
The prices of the top 100 best-selling products are shown above. On the X-axis, product names are placed in ranking order (from left to right, rank 1 to rank 100). The average price is $14.49, and there is a large gap between the highest price ($65) and the lowest price ($2.44). We observed no direct correlation between price and ranking, but the prices of the top 15 products are all below the average, which suggests that the best-selling products are not the comparatively expensive ones.
The number of comments for each of the top 100 best-selling products is shown above. On the X-axis, product names are placed in ranking order (from left to right, rank 1 to rank 100). The average number of comments per product is 916. The most commented product has 3239 comments, while the least commented has only 65, which is a huge gap. How the number of comments affects the ranking or other dimensions of a product remains an open question: as the graph shows, several products with relatively few comments still rank in the top 20 or top 50.
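The relationships above were judged by eye from the charts. One way to put a number on them, which is not what was done in this analysis, is Spearman's rank correlation between sales rank and another column (rating, price, or number of comments). The sketch below uses only the standard library; the price values are made up.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up example: sales rank against price.
sales_rank = [1, 2, 3, 4, 5]
price = [9.99, 12.50, 7.00, 30.00, 15.00]
print(round(spearman(sales_rank, price), 3))
```

Because rank 1 is the best seller, a negative value between sales rank and rating would mean that higher-rated products tend to sit nearer the top; a value close to zero would support the "no direct correlation" reading.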
Apart from the products themselves, we can also look at the data from the suppliers' perspective.
The above graph illustrates the average rating of suppliers of the top 100 products of melatonin. The supplier Zahler gets the highest average rating of 4.8/5 and the supplier Klove gets the lowest average rating of 3.2/5. The average rating of all suppliers is 4.222.
In the above graph, the supplier Integrative Therapeutics has the highest average price, $44.7, which may indicate that this supplier focuses on the high-end market or targets customers who are willing to spend more on this kind of medical product. It has two products in the top 100, one of which has the highest price ($65) among all 100 products. Over 60% of the suppliers set their prices between $5 and $15.
The supplier/brand Natrol has the greatest number of comments and also the greatest market share among the best sellers (shown in the pie chart below). The supplier Zarbee’s Naturals is second in number of comments and also owns the number 1 best-selling product among the top 100.
Among the top 100 sellers, the supplier/brand Natrol has a dominant market share of 15%. The second-largest share, 5%, belongs to the supplier ZzzQuil, and several suppliers tie for third place.
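The supplier-level figures in this section come from grouping the product rows by supplier. A minimal sketch follows, assuming each product is a dict with hypothetical `supplier`, `rating` and `price` keys, and assuming that "market share" means a supplier's share of the listed products rather than of sales volume. The brand names and values are placeholders.

```python
from collections import defaultdict


def supplier_stats(products):
    """Group products by supplier; return average rating, average price and listing share."""
    groups = defaultdict(list)
    for product in products:
        groups[product["supplier"]].append(product)

    total = len(products)
    stats = {}
    for supplier, items in groups.items():
        stats[supplier] = {
            "avg_rating": sum(p["rating"] for p in items) / len(items),
            "avg_price": sum(p["price"] for p in items) / len(items),
            # Assumption: share = number of this supplier's products / all listed products.
            "share": len(items) / total,
        }
    return stats


# Placeholder rows, for illustration only.
products = [
    {"supplier": "BrandA", "rating": 4.5, "price": 10.0},
    {"supplier": "BrandA", "rating": 4.0, "price": 20.0},
    {"supplier": "BrandB", "rating": 3.5, "price": 5.0},
    {"supplier": "BrandC", "rating": 5.0, "price": 15.0},
]
stats = supplier_stats(products)
print(stats["BrandA"])
```

Note that a supplier-level average weights every supplier equally, so it will generally differ from the average taken over all products.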
2.2 Review Page Data
It is essential to collect the feedback that consumers gave on these top 100 sellers. On the one hand, it guides other consumers who are browsing the site for the most suitable product. On the other hand, merchants refer to these reviews to improve their products and adjust their selling strategy.
Using a dictionary and the matplotlib.pyplot module in Python 3.7, we counted the most frequently used words in consumers’ comments after excluding irrelevant words such as articles, pronouns, and prepositions.
The bar chart below shows the most frequent as well as relevant keywords in the reviews.
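A minimal sketch of the word count, using `collections.Counter` (a dictionary subclass). The stop-word list and the two sample reviews are illustrative assumptions; the full list of excluded words is not given in this article.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; the list actually used is not given here.
STOP_WORDS = {"a", "an", "the", "i", "it", "my", "me", "to", "of", "in", "for", "and", "this"}


def word_counts(reviews, stop_words=STOP_WORDS):
    """Count lowercase word occurrences across all reviews, skipping stop words."""
    counts = Counter()
    for text in reviews:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts


# Made-up reviews, for illustration only.
reviews = [
    "It helps me sleep through the night.",
    "I sleep better and wake up rested.",
]
top = word_counts(reviews).most_common(1)
print(top)  # [('sleep', 2)]
```

The words and counts returned by `most_common` can then be passed to `matplotlib.pyplot.bar` to draw a bar chart.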
However, single words only make sense when they are put into context. So we fed the review content into WriteWords, an online word frequency counter that can also count phrase frequency. We also went through the context manually in Excel because the program and the website were not intelligent enough.
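Phrase frequency can also be counted locally. The sketch below is a simple stand-in for that kind of count, not a description of how WriteWords works internally: it counts n-word phrases (n-grams) within each review. The sample reviews are made up.

```python
import re
from collections import Counter


def phrase_counts(reviews, n=2):
    """Count n-word phrases within each review; phrases never span two reviews."""
    counts = Counter()
    for text in reviews:
        # Only letters and apostrophes are kept, so digits and punctuation are skipped.
        words = re.findall(r"[a-z']+", text.lower())
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


# Made-up reviews, for illustration only.
reviews = [
    "I fall asleep fast",
    "Helps me fall asleep",
    "Fall asleep within minutes",
]
print(phrase_counts(reviews).most_common(1))  # [('fall asleep', 3)]
```

Stop words are kept here on purpose, since phrases such as "wake up" or "out of it" lose their meaning without them.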
After scanning the phrases and their context, we concluded that consumers focus on five aspects: effects and side effects, which together are the most significant, followed by product features, consumer features and consumers’ further action.
The most essential aspect is the effects, since this is a medical supplement. Most of these products had apparent positive effects on the duration, speed, quality and habits of sleep. Consumers reported that they could fall asleep about 20 to 60 minutes after taking a sleep aid and stay asleep for 6–8 hours. However, some consumers reported side effects such as difficulty waking up the next day, grogginess, and a gassy stomach.
Consumers care next about the product features, including flavor, size, form, and ingredients. Products with a berry flavor, 48–120 counts, a chewable solid form, and natural ingredients seem more likely to become best sellers.
Consumers come from different age groups. Surprisingly, 13 products, over 10% of the top 100, are designed for children aged 1.5 to 15 years. To keep their children asleep at night and awake at school, parents are willing to buy sleep aids for them.
As for further action, most consumers highly recommended the product they bought and would like to continue buying it.
3. Conclusion and Discussion
3.1 Conclusion
Compared with conventional sleeping pills, most of the top 100 best sellers in medicinal sleep aids on Amazon are more natural and healthier, with fewer side effects. However, even though the side effects are only slightly perceived, consumers should pay attention to such symptoms. Whether a sleep aid works depends on the individual.
The supplier/brand Zahler enjoys the best reputation in sleep aids, while Natrol has the greatest market share. Meanwhile, Natural Factors offers the most attractive price.
The most popular products share these features: fruit or candy flavors, 48–120 counts per pack, a chewable solid form, and natural ingredients.
Children are a rising group of sleep aid users. Of the 13 children-targeted products among the top 100 sellers, the supplier/brand Zarbee’s Naturals took 4, making it the most competitive one.
3.2 Discussion
It would be better to update our code to collect real-time data once an hour, so that we could build a dynamic graph and analyze which brands perform better.
When compiling statistics on the review content, we spent a lot of time checking the context by hand because we lacked the ability to apply machine learning, so the resulting data are approximate rather than exact.