Sentiment analysis with embeddings

Feb, 2024

OpenAI API Sentiment analysis

Background

Last year I worked on a project in which I scraped a restaurant's reviews from Google Maps and analyzed the sentiment they expressed. I used a basic Natural Language Processing technique, encoding words as numerical values, and then trained a simple machine learning model (logistic regression) to classify reviews as positive or negative. Now that OpenAI offers a text embedding model through its API, I revisit the same project to compare the results obtained with this new approach.

The data

I load the reviews that I previously saved in a CSV file.

	stars	text
0	4 estrellas	Excelente trato y buena relación precio-calida...
1	5 estrellas	Muy buen menú con mucha variedad y buen produc...
2	4 estrellas	Comida buenísima y buena atención, vistas mara...
3	5 estrellas	He comido el menú del día por 13 euros, y tien...
4	4 estrellas	Restaurante grande con unos menús muy ricos y ...

Data processing

The stars column holds strings (for example, "4 estrellas") and needs to be converted into integer values.
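A minimal sketch of this conversion, using toy rows rather than the real file. The helper name `stars_to_int` is hypothetical, and dropping reviews without text is only an assumption suggested by the summary (193 entries with an index running up to 202).

```python
import pandas as pd


def stars_to_int(stars: pd.Series) -> pd.Series:
    """Extract the leading number from strings such as '4 estrellas'."""
    digits = stars.str.extract(r"(\d+)", expand=False)
    return digits.astype("int32")


# Toy rows in the same shape as the scraped file (not the real data).
df = pd.DataFrame(
    {
        "stars": ["4 estrellas", "5 estrellas", "1 estrella", "3 estrellas"],
        "text": ["Excelente trato", "Muy buen menú", None, "El restaurante"],
    }
)

# Assumption: rows without review text are discarded before the analysis.
df = df.dropna(subset=["text"])
df["stars"] = stars_to_int(df["stars"])
```

The regular expression keeps only the digits, so it handles both the singular "estrella" and the plural "estrellas".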

<class 'pandas.core.frame.DataFrame'>
Index: 193 entries, 0 to 202
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   stars   193 non-null    int32 
 1   text    193 non-null    object
dtypes: int32(1), object(1)
memory usage: 3.8+ KB

Create review texts’ embeddings

I define a function that creates embeddings from a list of texts and use it for the reviews.
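A function of this kind could look like the sketch below. It is not the original notebook code: the names `get_embeddings` and `make_client`, the batch size, and the model name `text-embedding-3-small` are assumptions, since the post does not say which embedding model was used.

```python
from typing import Any, List, Sequence


def get_embeddings(
    texts: Sequence[str],
    client: Any,
    model: str = "text-embedding-3-small",  # assumed model name
    batch_size: int = 100,                   # assumed batch size
) -> List[List[float]]:
    """Return one embedding vector per input text, in input order."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        response = client.embeddings.create(input=batch, model=model)
        # Sort by index so the output order matches the input order.
        for item in sorted(response.data, key=lambda d: d.index):
            vectors.append(item.embedding)
    return vectors


def make_client() -> Any:
    """Create an OpenAI client; it reads OPENAI_API_KEY from the environment."""
    from openai import OpenAI  # imported lazily, only needed for real calls

    return OpenAI()


# Usage (needs network access and an API key):
# embeddings = get_embeddings(df["text"].tolist(), make_client())
```

Passing the client as an argument keeps the function easy to try out with a stand-in object before spending API calls.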

Sentiment analysis: stars (5 categories)

First, I classify the reviews into 5 categories corresponding to the star ratings that customers gave. This lets me compare the labels produced by OpenAI’s embedding model directly with the customers’ own ratings.
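The post does not show how labels are derived from the embeddings. One common approach, sketched here as an assumption, is zero-shot classification: embed a short description of each category with the same model, then assign each review the category whose embedding has the highest cosine similarity. The function name and the toy 2-D vectors are illustrative only.

```python
import numpy as np


def classify_by_similarity(review_vecs, label_vecs, labels):
    """Assign each review the label whose embedding is most cosine-similar."""
    reviews = np.asarray(review_vecs, dtype=float)
    anchors = np.asarray(label_vecs, dtype=float)
    # Normalise rows so that a dot product equals the cosine similarity.
    reviews = reviews / np.linalg.norm(reviews, axis=1, keepdims=True)
    anchors = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    similarity = reviews @ anchors.T  # shape: (n_reviews, n_labels)
    best = similarity.argmax(axis=1)
    return [labels[i] for i in best]


# Toy 2-D vectors standing in for real embeddings (hypothetical values).
toy_labels = [1, 5]
toy_label_vecs = [[1.0, 0.0], [0.0, 1.0]]
toy_review_vecs = [[0.9, 0.1], [0.2, 0.8]]
predicted = classify_by_similarity(toy_review_vecs, toy_label_vecs, toy_labels)
```

With real data, `label_vecs` would hold one embedding per star category and `review_vecs` one embedding per review.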

	stars	text	label	mismatch
0	4	Excelente trato y buena relación precio-calida...	5	1
1	5	Muy buen menú con mucha variedad y buen produc...	5	0
2	4	Comida buenísima y buena atención, vistas mara...	5	1
3	5	He comido el menú del día por 13 euros, y tien...	5	0
4	4	Restaurante grande con unos menús muy ricos y ...	5	1
...	...	...	...	...
188	5	(Traducido por Google) Gran lugar\r\n\r\n(Orig...	5	0
189	5	(Traducido por Google) Increíble vista\r\n\r\n...	5	0
190	4	(Traducido por Google) Hermoso menú y temperat...	5	1
193	5	(Traducido por Google) Gran lugar\r\n\r\n(Orig...	5	0
202	3	(Traducido por Google) El restaurante\r\n\r\n(...	5	2

193 rows × 4 columns

mismatch
0    88
1    70
2    23
3     6
4     6
Name: count, dtype: int64

Accuracy rate -> 0.46

The results indicate that the model matched the stars assigned by customers in only 88 out of 193 reviews, an accuracy rate of 46%.
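The mismatch column and the accuracy rate can be reproduced along these lines. The column names `stars`, `label`, and `mismatch` come from the table above; the rows are toy values, not the real reviews.

```python
import pandas as pd

# Toy ratings and predicted labels (not the real data).
df = pd.DataFrame({"stars": [4, 5, 4, 5, 3, 1], "label": [5, 5, 5, 5, 5, 5]})

# Absolute distance, in stars, between the customer's rating and the label.
df["mismatch"] = (df["stars"] - df["label"]).abs()

# How many reviews fall at each distance.
counts = df["mismatch"].value_counts().sort_index()

# Accuracy is the share of reviews with no mismatch at all.
accuracy = (df["mismatch"] == 0).mean()

# The figure reported in the post: 88 exact matches out of 193 reviews.
reported_accuracy = 88 / 193
```

Rounded to two decimals, `reported_accuracy` gives the 0.46 shown above.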

Out of curiosity, let’s take a look at the 6 reviews that deviated the most, to check why this might have happened.

	stars	text	label	mismatch
41	1	El sitio es espectacular. El servicio es inesi...	5	4
60	1	El precio un poco excesivo para todo su conjun...	5	4
106	1	El sitio esta de lujo, pero deja mucho que des...	5	4
146	1	Me gusta mucho	5	4
157	1	Cerveza caliente y camareros bordes, la combin...	5	4
187	1	(Traducido por Google) Nos sentamos a la mesa ...	5	4

--> {'stars': 1, 'text': 'El sitio es espectacular. El servicio es inesistente desde qué te toman la comanda, asta qué llega él pri. plato puedes hecharte una siesta y ya no te digo nada para él seg. plato etc.... Lo que no entiendo como sigue teniendo el mismo dueño la explotación en fin un autentico desastre. Y llevo 35 años comiendo fuera de casa, así que algo de esperiencia tendré.', 'label': 5, 'mismatch': 4}

--> {'stars': 1, 'text': 'El precio un poco excesivo para todo su conjunto, teniendo en cuenta sobre todo el trato del personal. La comida está buena, pero no tanto como para pagar 26€ por un menú que, a primera vista llamaba mucho la atención, pero después dejó mucho que desear.', 'label': 5, 'mismatch': 4}

--> {'stars': 1, 'text': 'El sitio esta de lujo, pero deja mucho que desear.\r\nCroquetas congeladas, bacalao sin acabar de frerir, la carrillera de la carta es redondo de ternera, ...\r\nSolo lo salvan las camareras, que estan atentas a todo y pidiendo disculpas.\r\nPara tomar una caña y un pintxo despues del monte vale, pero no para mas.', 'label': 5, 'mismatch': 4}

--> {'stars': 1, 'text': 'Me gusta mucho', 'label': 5, 'mismatch': 4}

--> {'stars': 1, 'text': 'Cerveza caliente y camareros bordes, la combinación perfecta para no volver', 'label': 5, 'mismatch': 4}

--> {'stars': 1, 'text': '(Traducido por Google) Nos sentamos a la mesa durante una hora y media hasta que trajeron la comida.\r\nHabíamos hecho una reserva el día anterior.\r\n\r\n(Original)\r\nOrdu ta erdi mahaian eserita zai egon ginen janaria ekarri arte.\r\nBezperan erreserba eginda genuen.', 'label': 5, 'mismatch': 4}

In the fourth review on this list ("Me gusta mucho"), the issue lies in the customer's own inconsistency: they assigned only one star although the text they wrote indicates satisfaction, so the model reasonably deduced that it deserved 5 stars.

In the rest of the reviews, the model misinterpreted the text, probably for reasons such as the use of positive words in a context that actually indicates dissatisfaction, poor writing that hinders comprehension, and texts written in languages other than English.

Sentiment analysis: good/bad (2 categories)

To compare this model more directly with the results of the original project, I asked it to classify the texts in a binary manner, that is, into good and bad reviews. As in the original project, to avoid labeling each text by hand, I considered a review good if it had been given 4 or 5 stars and bad if it had received 1, 2, or 3 stars.
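The star-based ground truth described here can be sketched as follows. The column name `stars_4_5` is taken from the table below; the rows are toy values.

```python
import pandas as pd

# Toy star ratings (not the real data).
df = pd.DataFrame({"stars": [4, 5, 3, 1, 2]})

# 1 = good review (4 or 5 stars), 0 = bad review (1, 2, or 3 stars).
df["stars_4_5"] = (df["stars"] >= 4).astype(int)
```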

	stars	text	label	stars_4_5
0	4	Excelente trato y buena relación precio-calida...	1	1
1	5	Muy buen menú con mucha variedad y buen produc...	1	1
2	4	Comida buenísima y buena atención, vistas mara...	1	1
3	5	He comido el menú del día por 13 euros, y tien...	1	1
4	4	Restaurante grande con unos menús muy ricos y ...	1	1
...	...	...	...	...
188	5	(Traducido por Google) Gran lugar\r\n\r\n(Orig...	1	1
189	5	(Traducido por Google) Increíble vista\r\n\r\n...	1	1
190	4	(Traducido por Google) Hermoso menú y temperat...	1	1
193	5	(Traducido por Google) Gran lugar\r\n\r\n(Orig...	1	1
202	3	(Traducido por Google) El restaurante\r\n\r\n(...	1	0

193 rows × 4 columns

Confusion matrix: 

 [[ 29  36]
 [  2 126]]

Report: 
                  precision    recall  f1-score   support

 bad reviews ->       0.94      0.45      0.60        65
good reviews ->       0.78      0.98      0.87       128

       accuracy                           0.80       193
      macro avg       0.86      0.72      0.74       193
   weighted avg       0.83      0.80      0.78       193

The precision with which it detects whether reviews are good or bad is quite high, especially for the bad ones: of the 31 reviews it labeled as bad, 29 really were bad and only 2 good reviews were misclassified as negative (precision 0.94). Overall accuracy is 0.80. However, this automatic classification would be most useful if it captured as many negative reviews as possible (to provide a prompt response to all dissatisfied customers), and on that measure the sensitivity (recall) of 0.45 is quite low: it captured 29 negative reviews but missed 36 that were also negative. For this specific case, the result is worse than in the original basic project I did last year, in which I obtained a recall of 0.7.
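The figures in the report follow directly from the confusion matrix, as this short sketch shows. Rows are the true classes and columns the predicted ones, in the order (bad, good), which is consistent with the support column (29 + 36 = 65 bad, 2 + 126 = 128 good). The helper name `per_class_metrics` is hypothetical.

```python
# Confusion matrix from the report: rows = true class, columns = predicted
# class, in the order (bad, good).
matrix = [[29, 36],
          [2, 126]]


def per_class_metrics(matrix, cls):
    """Return (precision, recall, f1) for the class at index `cls`."""
    true_positives = matrix[cls][cls]
    predicted_as_cls = sum(row[cls] for row in matrix)  # column total
    actually_cls = sum(matrix[cls])                     # row total
    precision = true_positives / predicted_as_cls
    recall = true_positives / actually_cls
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


bad_precision, bad_recall, bad_f1 = per_class_metrics(matrix, 0)
good_precision, good_recall, good_f1 = per_class_metrics(matrix, 1)

# Accuracy: correctly classified reviews over all reviews.
total = sum(sum(row) for row in matrix)
accuracy = (matrix[0][0] + matrix[1][1]) / total
```

For the bad class this gives a precision of 29/31 and a recall of 29/65, which is where the gap between 0.94 and 0.45 comes from.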

Visualizing

Simply to have a visual reference of what the model is doing, I reduce the vectors to 2 dimensions, plot their positions on a plane, and verify that the reviews of each type do cluster together.
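The post does not say which dimensionality-reduction method was used. As a stand-in, the sketch below projects the vectors onto their first two principal components (PCA) with NumPy; t-SNE would be another common choice. The function names are hypothetical, and the plotting helper assumes matplotlib is installed.

```python
import numpy as np


def project_to_2d(vectors):
    """Project vectors onto their first two principal components (PCA)."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)  # centre each dimension
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T


def plot_projection(points, labels):
    """Scatter the 2-D points, coloured by label (e.g. 0 = bad, 1 = good)."""
    import matplotlib.pyplot as plt  # imported lazily, only needed for plotting

    fig, ax = plt.subplots()
    ax.scatter(points[:, 0], points[:, 1], c=labels, cmap="coolwarm", s=15)
    ax.set_xlabel("component 1")
    ax.set_ylabel("component 2")
    return fig


# Random vectors standing in for real embeddings (hypothetical values).
rng = np.random.default_rng(0)
toy_vectors = rng.normal(size=(10, 5))
points = project_to_2d(toy_vectors)
```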

../_images/007ee6af901005cc627a35ddf14ace9e6585cf7c994ba5ef86190aec917e7204.png

Conclusion

Despite its ease of use and the impressive ability of OpenAI’s embedding model to assign a sentiment to each review, in this project it did not detect negative reviews better than the more basic machine learning techniques I used last year: recall on negative reviews was 0.45, against the 0.7 I obtained before.