Handfan Reviews#

Olele (image credit: https://olele.es/)

Feb, 2023

Web Scraping, Sentiment Analysis

Background#

My wife crafts hand fans and sells them online under the brand name Olelé. Her catchphrase is: Spanish hand fans with a different touch. She sells them mainly on Etsy and ships them worldwide.

She started in 2010 and has received many reviews since then. I suggested doing a little analysis on them. “I’d like that”, she said. Of course, she knew the majority of the reviews were very positive, so she had no problem showing them off.

She said: “Let me see if I can remember how to download the reviews in a file from my Etsy account”. “Don’t worry”, I replied, “I’m going to scrape them directly from the website!”

Web scraping the data#

On the Etsy website I saw that Olelé’s reviews were spread across 28 pages. I could reach each page by changing a query parameter in the URL, so I wrote a few lines of code to fetch the HTML pages automatically and extract the desired data from them: name, date, stars and review text.

# Imports
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Params
pages = 28 # Review pages present for Olele
wait_time = 180 # seconds
attempts = 3 # max attempts

# Init
page = 1
attempt = 0
names_dates = []
stars = []
texts = []

# Iterate through pages
while page <= pages:

    # Update page number in query parameters of the URL
    url = f"https://www.etsy.com/shop/Olele/reviews?ref=pagination&page={page}"
    
    # Request data
    r = requests.get(url)
    
    # Server answers 'Too many requests'
    if r.status_code == 429:
        attempt += 1
        # Give up once the maximum number of attempts is reached
        if attempt >= attempts:
            raise Exception(f"Server still answering '429' after {attempt} attempts")
        print(f"Waiting {wait_time}s in attempt {attempt}")
        time.sleep(wait_time) # Wait, then retry the same page
            
    # Something went wrong      
    elif r.status_code != requests.codes.ok: # Not 200
        raise Exception(f"requests.get(url) returns {r.status_code}")
        
    # Ok, go on!
    else:
        attempt = 0
        print(f"Scraping review page {page}...")
        
        # Get content
        html = r.content
        
        # Parse
        soup = BeautifulSoup(html, "html.parser") # Name the parser explicitly
        
        # Search for the data
        for review in soup.find_all('div', attrs={'class':'review-item'}):
            
            # Extract name and date
            name_date = review.find('p', attrs={'class':'shop2-review-attribution'})
            if name_date is None: # Didn't find anything
                names_dates.append('')
            else:
                name_date = name_date.get_text(strip=True)
                names_dates.append(name_date)
                
            # Extract stars
            star = review.find('span', attrs={'class':'screen-reader-only'})
            if star is None: # Didn't find anything
                stars.append('')
            else:
                star = star.get_text(strip=True)
                stars.append(star)

            # Extract review text
            text = review.find('p', attrs={'class':"prose wt-break-word wt-m-xs-0"})
            if text is None: # Didn't find anything
                texts.append('')
            else:
                text = text.get_text(strip=True)
                texts.append(text)
        
        page += 1
        if page > pages:
            print("Scraping finished!")

# Store listed reviews in a dataframe dropping duplicates
olele = pd.DataFrame({"name_date": names_dates, "stars": stars, "text": texts})\
                    .drop_duplicates()
olele

Scraping review page 1...
Scraping review page 2...
Scraping review page 3...
Scraping review page 4...
Scraping review page 5...
Scraping review page 6...
Scraping review page 7...
Scraping review page 8...
Scraping review page 9...
Scraping review page 10...
Scraping review page 11...
Scraping review page 12...
Scraping review page 13...
Scraping review page 14...
Scraping review page 15...
Scraping review page 16...
Scraping review page 17...
Scraping review page 18...
Scraping review page 19...
Scraping review page 20...
Scraping review page 21...
Scraping review page 22...
Scraping review page 23...
Scraping review page 24...
Waiting 180s in attempt 1
Scraping review page 25...
Scraping review page 26...
Scraping review page 27...
Scraping review page 28...
Scraping finished!

	name_date	stars	text
0	Carolinaon Nov 1, 2022	5 out of 5 stars	As usual the quality is sturdy yet elegant and...
1	Amy Annetteon Sep 3, 2022	5 out of 5 stars	I bought this fan to bring to tango and swing ...
2	Maryon Aug 31, 2022	5 out of 5 stars	LOVELY! I think this is my 4th fan? Complimen...
3	Eni Aon Aug 31, 2022	5 out of 5 stars	Very beautiful fan, great quality and the colo...
4	hredmondson Jun 11, 2022	5 out of 5 stars	This is the second fan I've bought for myself ...
...	...	...	...
272	Anonymous on Jun 21, 2010	5 out of 5 stars	Thank you!
273	Anonymous on Jun 18, 2010	5 out of 5 stars	Gorgeous! Love it! Just in time for the summer...
274	Anonymous on Jun 15, 2010	5 out of 5 stars	Really nice communication with Karmele Luqui. ...
275	Anonymous on Jun 12, 2010	5 out of 5 stars	A huge WOW when I just got them! The fans are ...
277	Anonymous on Jun 10, 2010	5 out of 5 stars	Just received this yesterday, and I am in love...

265 rows × 3 columns

Data processing#

The acquired data needed some processing before proceeding with the analysis:

Split name_date column into two different columns, name and date.
Convert date column from string to datetime data type.
Extract stars as an integer number from the string type stars column.
Rearrange the whole pandas dataframe: desired columns, names, order, index.
Finally, fill in missing (NaN) values in text with empty strings.
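
The parsing steps above can be sketched with two small helpers. This is a minimal sketch, not the notebook's actual code: the function names `split_name_date` and `parse_stars` are hypothetical, and the string formats are assumed from the scraped preview shown earlier (for example `Carolinaon Nov 1, 2022` and `5 out of 5 stars`).

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Assumed format: "<name>on <Mon> <day>, <year>", with or without a space before "on"
NAME_DATE_RE = re.compile(r"^(?P<name>.*?)\s*on (?P<date>[A-Za-z]{3} \d{1,2}, \d{4})$")
# Assumed format: "<n> out of 5 stars"
STARS_RE = re.compile(r"(\d) out of 5 stars")


def split_name_date(name_date: str) -> Tuple[str, Optional[date]]:
    """Split a scraped 'name + date' string into (name, date)."""
    match = NAME_DATE_RE.match(name_date)
    if match is None:  # Unexpected format: keep the raw text, no date
        return name_date, None
    # %b parses abbreviated English month names under the default C locale
    parsed = datetime.strptime(match["date"], "%b %d, %Y").date()
    return match["name"], parsed


def parse_stars(stars: str) -> Optional[int]:
    """Extract the star count as an integer, or None if the text does not match."""
    match = STARS_RE.search(stars)
    return int(match.group(1)) if match else None


print(split_name_date("Carolinaon Nov 1, 2022"))
print(parse_stars("5 out of 5 stars"))
```

With pandas, helpers like these could be applied column-wise, for example `olele["stars"].apply(parse_stars)`, before setting `date` as the index.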

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 265 entries, 2022-11-01 to 2010-06-10
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    265 non-null    object
 1   stars   265 non-null    int32 
 2   text    265 non-null    object
dtypes: int32(1), object(2)
memory usage: 7.2+ KB
None

olele

	name	stars	text
date
2022-11-01	Carolina	5	As usual the quality is sturdy yet elegant and...
2022-09-03	Amy Annette	5	I bought this fan to bring to tango and swing ...
2022-08-31	Mary	5	LOVELY! I think this is my 4th fan? Complimen...
2022-08-31	Eni A	5	Very beautiful fan, great quality and the colo...
2022-06-11	hredmonds	5	This is the second fan I've bought for myself ...
...	...	...	...
2010-06-21	Anonymous	5	Thank you!
2010-06-18	Anonymous	5	Gorgeous! Love it! Just in time for the summer...
2010-06-15	Anonymous	5	Really nice communication with Karmele Luqui. ...
2010-06-12	Anonymous	5	A huge WOW when I just got them! The fans are ...
2010-06-10	Anonymous	5	Just received this yesterday, and I am in love...

265 rows × 3 columns

The dataframe is now ready for analysis.

Data analysis#

Number of reviews per year#

../_images/7fe86a474556262ad49318f159da0a34106059ae2beb4880f9421a4b16696cf2.png

After some successful years, it looks like reviews have decreased lately.

Average number of reviews per month#

../_images/d5f84e714f20dbc7f1816de0d530832e8b530fdfbf1fa13333352e032b934df3.png

It shows a seasonality that matches the increased number of hand fans that are sold in the summer.
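
One way to compute the two aggregates plotted above, sketched with the standard library only. The helper names and the small sample of dates are illustrative assumptions (the dates come from the preview table, not the full dataset). The monthly average here divides each calendar month's total by the number of years spanned, which may differ from how the original chart was built.

```python
from collections import Counter
from datetime import date


def reviews_per_year(dates):
    """Count reviews for each calendar year."""
    return Counter(d.year for d in dates)


def average_reviews_per_month(dates):
    """Average reviews per calendar month across all years spanned by the data."""
    years_spanned = max(d.year for d in dates) - min(d.year for d in dates) + 1
    per_month = Counter(d.month for d in dates)
    return {month: per_month.get(month, 0) / years_spanned for month in range(1, 13)}


# Seven dates taken from the dataframe preview
sample_dates = [
    date(2022, 11, 1), date(2022, 9, 3), date(2022, 8, 31), date(2022, 8, 31),
    date(2022, 6, 11), date(2010, 6, 21), date(2010, 6, 18),
]
print(reviews_per_year(sample_dates))
print(average_reviews_per_month(sample_dates))
```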

Stars’ distribution#

../_images/0ca7587d5302dc18d0effa3307673dae96e36fed241abde8ff12f0b6ee54dabd.png

Overwhelming majority of 5-star reviews!

Top reviewers#

../_images/d2544f5e4628a03babe1dbe232c80f1575184d8dd5f8c4e98ba6ace177085083.png

According to my wife, most of these top reviewers are customers from the US.

Longest review#

Out of curiosity, let’s have a look at the longest review of all.

I was looking for the perfect fan for a man like me. Then I found Karmeles shop and this wonderfully beautiful accessory. And I was thrilled with him in a split of a second. Blue is my favorite color and the pattern immediately made me think of a night sky shining with thousands of stars. Bingo! Hit! It had to be! Karmele made the fan very clean and simply perfect. He is delivered in a very nice and fine dark blue leather pocket. So now I can keep him safely at all times. An absolutely perfect combination, in my opinion. I can't wait to use him on a sunny spring day. We had it yesterday, but with 5 degrees and cool wind... I am absolutely certain, that we will have warmer temperatures very soon. Now I wish Karmele continued success and that she and all her loved ones stay healthy. Kind regards, Friedrich

Sentiment analysis#

I am using a Python library called TextBlob to process the textual data and obtain:

Information about the emotion contained in the text.
The most common words.

Polarity and subjectivity#

We need reviews to include text, so we discard the ones without it.

235 'texted' reviews out of 265 were selected.

Now, I obtain polarity and subjectivity indexes for each review.

	name	stars	text	polarity	subjectivity
date
2022-11-01	Carolina	5	As usual the quality is sturdy yet elegant and...	0.406190	0.759405
2022-09-03	Amy Annette	5	I bought this fan to bring to tango and swing ...	0.435714	0.534127
2022-08-31	Mary	5	LOVELY! I think this is my 4th fan? Complimen...	0.625000	0.750000
2022-08-31	Eni A	5	Very beautiful fan, great quality and the colo...	0.637500	0.637500
2022-06-11	hredmonds	5	This is the second fan I've bought for myself ...	0.458333	0.416667

polarity index: from “-1” meaning negative emotion, to “1” positive emotion, with “0” being neutral.
subjectivity index: from “0” meaning objective assertion, to “1” being completely subjective.
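
A minimal sketch of how these two indexes can be obtained through TextBlob's `sentiment` property. The helper name `sentiment_scores` is hypothetical and the notebook's actual cell may differ. The import sits inside the function only so that the helper can be defined even where TextBlob is not installed.

```python
def sentiment_scores(text):
    """Return (polarity, subjectivity) for a text using TextBlob's default analyzer."""
    from textblob import TextBlob  # third-party: pip install textblob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity
```

Applied to the dataframe, this could look like `olele["text"].apply(sentiment_scores)`, with the resulting pairs split into the `polarity` and `subjectivity` columns.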

../_images/443e3dbb851a03c74a7399768f4a4c3cc68c94f8becbf77b03bafd54a7a8f442.png

The model calculates the polarity and subjectivity of each text from the words it contains, and this is the distribution of the results. We can see that the general mood matches the overwhelming majority of 5-star ratings given by the customers.

../_images/74db70548b5f4e0493f5931180cf6fd16f829d1e7ecd53c36aa56038424b88c9.png

There seems to be a pattern in the scatter plot shown above, a roughly linear relationship between polarity and subjectivity: the more positive the emotion is, the more subjective the algorithm judges the text to be.
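
One way to put a number on this apparent relationship is the Pearson correlation coefficient between the two indexes. The sketch below is an illustration rather than part of the original analysis: the `pearson` helper is hypothetical, and it is fed only the five rows shown in the preview table earlier, so its result says little about the full set of 235 reviews.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Polarity and subjectivity of the five previewed reviews
polarity = [0.406190, 0.435714, 0.625000, 0.637500, 0.458333]
subjectivity = [0.759405, 0.534127, 0.750000, 0.637500, 0.416667]
print(f"r = {pearson(polarity, subjectivity):.3f}")
```

A value close to 1 would support the linear pattern. A value near 0 would suggest that the impression comes mostly from a few extreme points.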

Bad reviews#

It is interesting to see how the model captures at least one of the bad reviews (“1”-star reviews), assigning it the lowest polarity of the set. Let’s take a look at it.

	name	stars	text	polarity	subjectivity
date
2019-11-19	Sednah	1	Kamele came thought for us and I'm very please...	0.59000	0.510000
2017-04-04	kim	1	I never received this fan I have been in touch...	-0.23375	0.621667

In fact, there are 2 reviews with 1 star. Oddly, one of them (the one by Sednah) has a positive polarity, so it is hidden among the other points in the last plot. Let’s read the texts to find out what is going on:

Kamele came thought for us and I'm very pleased to say that she has excellant customer service. The fans are attention grabbing in the best way.  This is my 3rd fan and I will order more when the time comes. It also makes a great holiday gift.

I never received this fan I have been in touch with the supplier and still waiting for it to be resolved either a refund or replacement, very disappointing as it was my  first buy from Etsy !

Clearly, the first one is a positive review (as the algorithm correctly interprets with the positive polarity it assigns), so the 1-star rating was a mistake made by the reviewer.

The second one is indeed a bad review, as both the algorithm and the reviewer indicate (according to my wife, the problem with the courier was finally solved).
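
Mismatches like the first one can also be flagged automatically by comparing the star rating with the sign of the polarity. A minimal sketch, assuming hypothetical thresholds (at most 2 stars with a polarity above 0, or at least 4 stars with a polarity below 0) and using only the two 1-star rows from the table above:

```python
def is_inconsistent(stars, polarity, low_stars=2, high_stars=4):
    """Flag a review whose star rating contradicts the sign of its polarity."""
    if stars <= low_stars and polarity > 0:
        return True
    if stars >= high_stars and polarity < 0:
        return True
    return False


# (name, stars, polarity) for the two 1-star reviews
one_star_reviews = [("Sednah", 1, 0.59), ("kim", 1, -0.23375)]
suspicious = [name for name, stars, pol in one_star_reviews if is_inconsistent(stars, pol)]
print(suspicious)  # prints ['Sednah']
```

The thresholds are a design choice: a stricter polarity cut-off would reduce false alarms from short, neutral texts.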

Objective reviews#

	name	stars	text	polarity	subjectivity
date
2019-05-22	Céline	5	It is sublime! Thank you	0.0	0.0
2019-02-04	福島圭子	5	I received it, thank you.	0.0	0.0
2017-08-01	adeline	5	A merveillosa réal	0.0	0.0
2015-08-11	Tanya	5	This fan is so well made. Thank you!	0.0	0.0
2010-07-01	Anonymous	5	Thank you!	0.0	0.0
2010-06-21	Anonymous	5	Thank you!	0.0	0.0

The texts are very short here (they all fit in the dataframe preview above). We can conclude that they do not contain enough words that the algorithm recognises as carrying sentiment, and that is why it assigns zero to both polarity and subjectivity, making them neutral.

Extreme reviews#

	name	stars	text	polarity	subjectivity
date
2019-12-13	Su	5	Very beautiful and so well made. Just what I w...	1.0	1.0
2019-02-17	Amy	5	This is a beautiful fan. Thank you!	1.0	1.0
2018-12-29	adeline	5	Perfect like always	1.0	1.0
2018-06-16	Carla	5	Absolutely a beautiful fan and can’t wait to u...	1.0	1.0
2017-10-14	jenniferpbowman1	5	Absolutely beautiful!	1.0	1.0
2017-10-14	jenniferpbowman1	5	Beautiful!	1.0	1.0
2016-08-13	Robin	5	Beautiful, thank you !	1.0	1.0
2016-07-06	Sommerlinde	5	The fan is beautiful, thank you!	1.0	1.0
2016-01-26	Catherine	5	Perfect, thank you. Very beautiful!	1.0	1.0
2014-01-11	Beatriz	5	Bellisimo! Perfect for spring and summer. Than...	1.0	1.0
2012-07-22	Anonymous	5	The fan is beautiful and well made!	1.0	1.0
2011-10-09	Anonymous	5	very beautiful and just as pictured	1.0	1.0
2010-08-05	Anonymous	5	Beautiful!	1.0	1.0
2010-06-29	Anonymous	5	They're beautiful! Thank you!	1.0	1.0

Very beautiful and so well made. Just what I wanted.
This is a beautiful fan. Thank you!
Perfect like always
Absolutely a beautiful fan and can’t wait to use it. Thank you!
Absolutely beautiful!
Beautiful!
Beautiful, thank you !
The fan is beautiful, thank you!
Perfect, thank you. Very beautiful!
Bellisimo! Perfect for spring and summer. Thank you!
The fan is beautiful and well made!
very beautiful and just as pictured
Beautiful!
They're beautiful!  Thank you!

Extreme sentiment values seem to come from short sentences loaded with words with a high sentiment charge, like “beautiful” or “perfect”.
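
To see which words carry the score, TextBlob also exposes a `sentiment_assessments` property whose `assessments` field lists the words its lexicon recognised, together with their individual polarity and subjectivity. The sketch below is hedged: the helper name `scored_words` is hypothetical, and the tuple layout (words, polarity, subjectivity, then further fields) is assumed from TextBlob's documentation, so only the first three fields are read.

```python
def scored_words(text):
    """List (words, polarity, subjectivity) for each chunk scored by TextBlob's lexicon."""
    from textblob import TextBlob  # third-party: pip install textblob

    assessments = TextBlob(text).sentiment_assessments.assessments
    # Each assessment starts with a list of words, then its polarity and subjectivity
    return [(" ".join(item[0]), item[1], item[2]) for item in assessments]
```

Calling `scored_words("Absolutely beautiful!")` should show the contribution of "beautiful", while an empty list would be consistent with the zero scores seen in the objective reviews.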

Frequency of words#

Word cloud#

Let’s make a word cloud to get an impression of the words that stand out. To do so we have to convert all the reviews into one single text.

../_images/be046a7592c482bea02ac7bf0144543d251b8b2d12da9b7c23e4bf423de3d4fd.png

Most frequent words#

Finally, I will look at the most common words and plot their frequency.

../_images/7709d229c6e4ad001ebb7c93f01832f81e120e576c770c79ba2475dc29443ae5.png

The prevalent word in reviews for Olelé is ‘beautiful’.
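
A minimal sketch of the counting step, using only the standard library instead of TextBlob. The tokenising regex, the tiny stop-word set and the helper name `most_frequent_words` are assumptions made for illustration, and the sample texts are a handful of the reviews quoted above, not the full dataset.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list
STOP_WORDS = {"the", "is", "a", "and", "so", "you", "this", "like", "as", "just"}


def most_frequent_words(texts, top_n=3):
    """Join all reviews into one text and count the remaining lowercase words."""
    full_text = " ".join(texts).lower()
    words = re.findall(r"[a-z']+", full_text)
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)


sample_reviews = [
    "This is a beautiful fan. Thank you!",
    "Perfect like always",
    "Absolutely beautiful!",
    "The fan is beautiful, thank you!",
    "Perfect, thank you. Very beautiful!",
]
print(most_frequent_words(sample_reviews))
```

Even on this small sample, "beautiful" comes out on top, in line with the full result.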

Conclusions#

In this project, customer reviews from an online shop were scraped from the web and analysed by volume and sentiment. It includes a brief foray into the NLP (Natural Language Processing) domain via TextBlob, a library that makes textual data processing really simple.

The reviews for Olelé on Etsy were great, but unfortunately this case was not useful for building a customized interpretative model (a possible next step in a project like this), because there were not enough bad reviews to train a machine learning algorithm.