Using agents to breathe life into NPCs using CrewAI — Initial Conversation Analysis
- Rifx.Online
- Roleplay , Programming , Machine Learning
- 19 Jan, 2025
Conversation Analysis using output from: Using Agents to Breathe life into NPCs using CrewAI
Analysis
- Simulation 1: Population of a Software Engineer, Computer Scientist, Computer Engineer
- Conclusions
- Supporting
Methods of Analysis
- Features extracted
- Splitting global_conversations.txt
- Sentiment, topics, lexical diversity, emotion
- Self-similarity
- Notebook
Background
Previously, I discussed why I am interested in simulating 2D societies in my articles Using Agents to Breathe Life into NPCs and Using Agents to Breathe life into NPCs using CrewAI. Let's analyze a brief conversation produced by the conversation-party simulator. Note that this is before any NPC-world interactions have been developed.
Population of a Software Engineer, Computer Scientist, Computer Engineer
We add NPCs whose bios are in neighboring fields of study. Their initial tasks should make it easy for them to converse with one another (a hypothetical sketch of how such an NPC could be defined in CrewAI follows the list below).
NPCs
Non-playable characters in the simulation
Carly Cummings — Computer Scientist, Initial Task: Apply bitwise operations
Katherine Jones — Computer Engineer, Initial Task: Create a half-adder
Ashley Brown — Electrical Engineer, Initial Task: Apply boolean logic
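For readers curious how one of these NPCs could be wired up, here is a minimal, hypothetical sketch using CrewAI's Agent and Task interface. The role, goal, backstory, and task text below are illustrative assumptions, not the simulator's actual definitions.
from crewai import Agent, Task

## Hypothetical sketch only: one NPC expressed as a CrewAI agent with a bio and an initial task
carly = Agent(
    role="Computer Scientist",
    goal="Chat with nearby NPCs about applying bitwise operations",
    backstory="Carly Cummings is a computer scientist with a love of low-level problem solving.",
)

carly_task = Task(
    description="Apply bitwise operations to a small problem and explain the approach in conversation.",
    expected_output="A short conversational explanation of the bitwise approach.",
    agent=carly,
)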
Conclusions
- The three NPCs stayed on topic; more specifically, they strayed very little from their starting topics.
- The range of word use was impressively diverse for any given line of conversation; however, there were echoes of the discussion in the responses, which is evident when comparing similarities between texts.
- Topic analysis revealed two conversations. It should be noted that this may be because two of the participants never interacted with each other. The set of bidirectional conversations is: Carly<->Katherine, Carly<-Ashley.
Supporting
Text
The conversation text can be found at the beginning of this notebook.
Sentiment Analysis:
Remember: -1 is negative, 0 is neutral, and +1 is positive.
Looking at the conversations analyzed, we can see some very neutral and some relatively positive exchanges occurring.
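As a quick, made-up illustration of that scale (these are not lines from the simulation), TextBlob's polarity can be probed directly; exact values depend on TextBlob's lexicon.
from textblob import TextBlob

## Illustrative sentences only; polarity runs from -1 (negative) through 0 (neutral) to +1 (positive)
print(TextBlob("Caching these queries was a terrible idea.").sentiment.polarity)    # negative
print(TextBlob("We store the results in a database.").sentiment.polarity)           # roughly 0
print(TextBlob("Great idea! That design is really efficient.").sentiment.polarity)  # positive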
Topics
Two topic groupings were identified:
Topic 0 includes: ‘queries’, ‘caching’, ‘query’, ‘optimizing’, ‘database’, ‘efficient’, ‘strategies’, ‘performance’, ‘retrieval’, ‘data’
Topic 1 includes: ‘enhance’, ‘like’, ‘algorithm’, ‘data’, ‘efficient’, ‘boolean’, ‘operations’, ‘design’, ‘cpu’, ‘logic’
Lexical diversity
Remember: this is the ratio of unique words to total words. In other words, as lexical diversity approaches 1 (maximum uniqueness), more words are unique, and as it approaches 0 (minimum uniqueness), more words are repeated.
We can see from this graph that there is mid to high uniqueness in word usage per conversation.
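As a concrete, made-up example of the type-token ratio used here:
import nltk
nltk.download('punkt')

## 12 tokens, 7 of them unique -> lexical diversity of about 0.58
words = nltk.word_tokenize("the cache speeds up the query and the query hits the cache")
print(len(set(words)) / len(words))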
Emotion Recognition
Emotion Recognition using Hugging Face Transformers: emotion_pipeline = pipeline('sentiment-analysis', model='bhadresh-savani/distilbert-base-uncased-emotion')
Surprise and Joy were the emotions recognized in the exchange.
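A minimal usage sketch of that pipeline on an invented line; the model's label set includes emotions such as joy and surprise, and the pipeline returns a list of {'label': ..., 'score': ...} dicts.
from transformers import pipeline

emotion_pipeline = pipeline('sentiment-analysis',
                            model='bhadresh-savani/distilbert-base-uncased-emotion')
## Example input is made up; the predicted label and score will vary
print(emotion_pipeline(["I didn't expect the half-adder to work on the first try!"]))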
Semantic Similarity
Remember: 1 is perfectly similar and -1 is perfectly dissimilar.
Clearly the conversations stayed on similar topics; for example, our most dissimilar cosine similarity is around 0.8, which is still fairly similar.
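As a toy illustration of the scale, here are two small vectors chosen so their cosine similarity comes out to exactly 0.8:
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([0.8, 0.6])  # unit-length vector at an angle to a
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # 0.8 -> fairly similar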
Methods of Analysis
Features extracted
- Sentiment Analysis: Uses TextBlob for a quick polarity assessment.
- Topic Modeling: Uses LDA to identify topics; ensure data preprocessing is done for better results (e.g., stopword removal, lemmatization).
- Lexical Diversity: Simple type-token ratio.
- Emotion Recognition: Uses a BERT-based model available on Hugging Face for recognizing emotions beyond simple sentiment.
- Semantic Similarity: A basic example leveraging BERT for embedding similarity, which indicates the context alignment between conversational turns.
Splitting global_conversations.txt
We have a pattern for who is talking to whom, so we split on those markers, discarding the speaker and listener prefixes.
import re

def split_conversation(raw_conversation):
    # Our conversations go something like "<person talking> (talking to <person>):..."
    # Let's split on that pattern
    lines = re.split(r'^.*?\(talking to.*?\)\:', raw_conversation, flags=re.MULTILINE)
    # remove empty lines and leading/trailing whitespace from each line
    return [line.strip() for line in lines if line.strip()]
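The analysis cells below assume a conversations list already exists. A hypothetical way to produce it from the raw log (the notebook may load the file differently; the filename here simply follows this section's title):
## Hypothetical usage of split_conversation
with open('global_conversations.txt') as f:
    raw_conversation = f.read()

conversations = split_conversation(raw_conversation)
print(f'{len(conversations)} conversation chunks extracted')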
Sentiment, Topics, and Emotions
import nltk
nltk.download('punkt_tab')
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('vader_lexicon')

import gensim
import numpy as np
import pandas as pd
from textblob import TextBlob
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from nltk.corpus import stopwords
import transformers
from transformers import pipeline, BertTokenizer, BertModel
from gensim import corpora, models
from collections import Counter
import networkx as nx

def sentiment_analysis(conversations):
    sentiments = []
    for conversation in conversations:
        blob = TextBlob(conversation)
        sentiments.append(blob.sentiment.polarity)
    return sentiments

def topic_modeling(conversations, num_topics=2):
    cv = CountVectorizer(stop_words='english')
    dtm = cv.fit_transform(conversations)
    lda = LatentDirichletAllocation(n_components=num_topics, random_state=0)
    lda.fit(dtm)
    topic_results = lda.transform(dtm)
    topic_words = {}
    for i, topic in enumerate(lda.components_):
        topic_words[f"Topic {i}"] = [cv.get_feature_names_out()[j]
                                     for j in topic.argsort()[-10:]]
    return topic_words

def lexical_diversity(conversations):
    diversities = []
    for conversation in conversations:
        words = nltk.word_tokenize(conversation)
        diversity = len(set(words)) / len(words)  # "set of words"/"num words"
        diversities.append(diversity)
    return diversities

## emotion recognition using Hugging Face Transformers
emotion_pipeline = pipeline('sentiment-analysis',
                            model='bhadresh-savani/distilbert-base-uncased-emotion')

def detect_emotions(conversation_texts):
    return emotion_pipeline(conversation_texts)

sentiments = sentiment_analysis(conversations)
topics = topic_modeling(conversations)
diversities = lexical_diversity(conversations)
emotions = detect_emotions(conversations)

print(f'Sentiments: {sentiments}')
print(f'Topics: {topics}')
print(f'Lexical Diversities: {diversities}')
print(f'Emotions: {emotions}')
Self-similarity
We use Hugging Face Transformers to initialize a BERT tokenizer and model, both pre-trained as 'bert-base-uncased', to compute semantic similarity among a given list of texts. The get_embeddings() function tokenizes a sentence, generates its embeddings using BERT, and averages the token vectors to produce a single embedding. The calculate_cosine_similarity() function computes the cosine similarity between two embedding vectors, quantifying their semantic similarity. The semantic_similarity() function processes all conversations, gets their embeddings, and populates a matrix with pairwise cosine similarities; when populating the matrix, it sets self-comparisons to 1.0. This matrix is the basis for the heatmap, where yellow means more similar and blue means less similar.
from transformers import BertTokenizer, BertModel
import torch
import numpy as np
import matplotlib.pyplot as plt

## init BERT tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

def get_embeddings(sentence):
    inputs = tokenizer(sentence, return_tensors='pt', truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze()

def calculate_cosine_similarity(embed1, embed2):
    cos_sim = np.dot(embed1, embed2) / (np.linalg.norm(embed1) * np.linalg.norm(embed2))
    return cos_sim

def semantic_similarity(conversations):
    embeddings = [get_embeddings(conversation).numpy() for conversation in conversations]
    similarities = np.zeros((len(conversations), len(conversations)))
    for i in range(len(conversations)):
        for j in range(len(conversations)):
            if i != j:
                similarities[i][j] = calculate_cosine_similarity(embeddings[i], embeddings[j])
            else:
                similarities[i][j] = 1.0  # similarity of a sentence with itself
    return similarities

## calculate semantic similarities for the entire conversation set
similarities = semantic_similarity(conversations)

## semantic similarity as a heatmap
plt.figure(figsize=(8, 6))
plt.imshow(similarities, cmap='viridis', interpolation='nearest')
plt.colorbar(label='Cosine Similarity')
plt.xticks(ticks=range(len(conversations)), labels=[f"Conv-{i+1}" for i in range(len(conversations))], rotation=45)
plt.yticks(ticks=range(len(conversations)), labels=[f"Conv-{i+1}" for i in range(len(conversations))])
plt.title('Semantic Similarity Heatmap Among Conversations')
plt.xlabel('Conversation Index')
plt.ylabel('Conversation Index')
plt.show()
Notebook
If you found this article insightful, please consider clapping for this piece — it not only supports the author but also helps others discover valuable insights. Additionally, don’t forget to subscribe for more articles that delve into innovative technologies and AI developments. Your engagement is greatly appreciated!