In this concluding part, we will build the Transformer model with positional encoding.
The Transformer model consists of several key components that work together to process and understand input sequences. Let's break down each component and its role:
The embedding layer converts input tokens into dense vectors of a fixed size. This is the first step in the model, where each word in the vocabulary is mapped to a corresponding dense vector.
embedding = Embedding(input_dim=vocab_size, output_dim=embedding_dim)(inputs)
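The embeddings then pass through a positional encoding layer, which injects token-order information that the attention mechanism alone cannot see. The PositionalEncoding layer was defined in the previous part of this series; as a refresher, here is a minimal sketch of the standard sinusoidal formulation (the exact layer from the previous part may differ in detail):
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Layer

class PositionalEncoding(Layer):
    # Sinusoidal positional encoding (Vaswani et al., 2017). A sketch that may
    # differ in detail from the layer defined earlier in this series.
    def __init__(self, max_length, embedding_dim):
        super().__init__()
        positions = np.arange(max_length)[:, np.newaxis]   # (max_length, 1)
        dims = np.arange(embedding_dim)[np.newaxis, :]     # (1, embedding_dim)
        angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / np.float32(embedding_dim))
        angles = positions * angle_rates                   # (max_length, embedding_dim)
        angles[:, 0::2] = np.sin(angles[:, 0::2])          # sine on even indices
        angles[:, 1::2] = np.cos(angles[:, 1::2])          # cosine on odd indices
        self.pos_encoding = tf.cast(angles[np.newaxis, ...], tf.float32)

    def call(self, inputs):
        # Add the fixed positional encodings to the token embeddings
        return inputs + self.pos_encoding[:, :tf.shape(inputs)[1], :]

positional_encoding = PositionalEncoding(max_length, embedding_dim)(embedding)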
The dropout layer is a regularization technique that prevents overfitting by randomly setting a fraction of input units to 0 at each update during training.
x = Dropout(0.3)(positional_encoding)
The multi-head attention mechanism allows the model to focus on different parts of the input sequence simultaneously. Because each head can capture a different aspect of the input, this mechanism is particularly useful for customer journey analysis and customer insights.
multi_head_attention = MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)
x = multi_head_attention(query=x, value=x, key=x)
Layer normalization normalizes the inputs across the features, stabilizing the training process and improving model performance.
x = LayerNormalization()(x)
The feed-forward network applies a series of transformations to the attended representations. It consists of two dense layers with a ReLU activation function in between, adding non-linearity and enabling the model to learn more complex representations.
ff_network = Dense(ff_dim, activation='relu', kernel_regularizer=l2(0.01))(x)
ff_network = Dense(embedding_dim, kernel_regularizer=l2(0.01))(ff_network)
x = x + ff_network
The global max pooling layer reduces the output of the Transformer block to a fixed size by selecting the maximum value over the time dimension. This layer captures the most important features, which is beneficial for customer analysis.
x = GlobalMaxPooling1D()(x)
The output layer consists of dense layers for the final prediction. For tasks such as sentiment analysis and behavior prediction, this layer outputs the final classifications.
outputs = Dense(num_classes, activation='softmax')(x)
Here's the complete code to build and train the Transformer model with positional encoding:
from tensorflow.keras.layers import (Input, Embedding, Dropout, MultiHeadAttention,
                                     LayerNormalization, Dense, GlobalMaxPooling1D)
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

def build_transformer_model_with_positional_encoding(max_length, vocab_size, embedding_dim, num_heads, ff_dim, num_classes):
    inputs = Input(shape=(max_length,))
    # Token embeddings enriched with positional information
    embedding = Embedding(input_dim=vocab_size, output_dim=embedding_dim)(inputs)
    positional_encoding = PositionalEncoding(max_length, embedding_dim)(embedding)
    x = Dropout(0.3)(positional_encoding)
    # Self-attention: queries, keys, and values all come from the same sequence
    multi_head_attention = MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)
    x = multi_head_attention(query=x, value=x, key=x)
    x = LayerNormalization()(x)
    # Position-wise feed-forward network with a residual connection
    ff_network = Dense(ff_dim, activation='relu', kernel_regularizer=l2(0.01))(x)
    ff_network = Dense(embedding_dim, kernel_regularizer=l2(0.01))(ff_network)
    x = x + ff_network
    x = LayerNormalization()(x)
    # Reduce the sequence to a fixed-size vector of the strongest features
    x = GlobalMaxPooling1D()(x)
    x = Dropout(0.2)(x)
    outputs = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=outputs)
    return model
# Model Parameters
vocab_size = len(word2vec_model.wv.key_to_index) + 1  # +1 reserves index 0 for padding
embedding_dim = vector_size  # dimensionality of the pretrained Word2Vec vectors
num_heads = 8
ff_dim = 256
num_classes = len(label_encoder.classes_)
# Build and compile the model
model_with_positional_encoding = build_transformer_model_with_positional_encoding(
max_length, vocab_size, embedding_dim, num_heads, ff_dim, num_classes
)
optimizer = Adam(learning_rate=0.001)
model_with_positional_encoding.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Model summary
model_with_positional_encoding.summary()
# Train the model with early stopping
early_stopping = EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True)
history = model_with_positional_encoding.fit(X_train, y_train, epochs=50, batch_size=256, validation_data=(X_val, y_val), callbacks=[early_stopping])
# Evaluate model
loss, accuracy = model_with_positional_encoding.evaluate(X_val, y_val, verbose=0)
print(f'Loss: {loss:.2f}')
print(f'Accuracy: {accuracy * 100:.2f}%')
Visualizing Model Performance
To better understand the model's performance, we can plot the training and validation loss, as well as the training and validation accuracy. These plots provide insights into how well the model is learning and whether it is overfitting or underfitting.
import matplotlib.pyplot as plt

# Plot training and validation loss
plt.figure(figsize=(12, 6))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.savefig('training_validation_loss.png')
plt.show()
# Plot training and validation accuracy
plt.figure(figsize=(12, 6))
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.savefig('training_validation_accuracy.png')
plt.show()
Training Accuracy: The blue line represents the training accuracy over epochs. The training accuracy starts at a lower value and improves significantly with each epoch, stabilizing around 0.99 towards the end.
Validation Accuracy: The orange line represents the validation accuracy over epochs. The validation accuracy improves rapidly initially, reaching around 0.90, but then it fluctuates slightly and stabilizes around 0.88.
Observations:
There is a significant gap between the training accuracy and validation accuracy starting from around epoch 10, indicating potential overfitting. The model fits the training data very well but does not generalize as effectively to the validation data.
Early stopping helps limit this, but the training accuracy continues to improve even after the validation accuracy has plateaued.
Training Loss: The blue line represents the training loss over epochs. The training loss decreases consistently and becomes almost flat, reaching near zero towards the end.
Validation Loss: The orange line represents the validation loss over epochs. The validation loss decreases rapidly initially, reaching a minimum value around epoch 10, but then starts to increase slightly and stabilizes.
The rapid decrease in training loss indicates that the model is learning effectively from the training data.
The increase in validation loss after epoch 10 suggests overfitting, where the model is fitting the noise in the training data rather than learning the underlying patterns that generalize to new data.
The point where the validation loss starts to increase while the training loss continues to decrease is a clear indication of overfitting.
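You can pinpoint the epoch where the validation loss bottoms out (the weights that early stopping restores) with a quick check on the history object returned by fit:
import numpy as np

best_epoch = int(np.argmin(history.history['val_loss']))  # 0-indexed
print(f"Best epoch: {best_epoch + 1}, validation loss: {history.history['val_loss'][best_epoch]:.3f}")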
The confusion matrix provides a detailed breakdown of the model’s performance by showing the number of correct and incorrect predictions for each class. Here's a detailed analysis:
True Positives (TP): The bottom-right cell (4276) indicates the number of instances where the model correctly predicted the positive class.
True Negatives (TN): The top-left cell (1115) shows the number of instances where the model correctly predicted the negative class.
False Positives (FP): The top-right cell (315) represents the number of instances where the model incorrectly predicted the positive class when it was actually negative.
False Negatives (FN): The bottom-left cell (293) indicates the number of instances where the model incorrectly predicted the negative class when it was actually positive.
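The confusion matrix itself can be generated from the validation predictions, for example with scikit-learn. This is a minimal sketch: the variable names mirror the earlier code, and y_pred is introduced here for illustration.
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Convert softmax outputs to predicted class indices
y_pred = np.argmax(model_with_positional_encoding.predict(X_val), axis=1)
cm = confusion_matrix(y_val, y_pred)
ConfusionMatrixDisplay(cm, display_labels=label_encoder.classes_).plot(cmap='Blues')
plt.savefig('confusion_matrix.png')
plt.show()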
From the confusion matrix, we can derive important performance metrics:
Accuracy: The overall accuracy of the model can be calculated as:
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (4276 + 1115) / 5999 ≈ 0.899
Precision: Precision for the positive class can be calculated as:
Precision = TP / (TP + FP) = 4276 / (4276 + 315) ≈ 0.931
Recall: Recall for the positive class can be calculated as:
Recall = TP / (TP + FN) = 4276 / (4276 + 293) ≈ 0.936
F1 Score: The F1 Score, which is the harmonic mean of precision and recall, can be calculated as:
F1 = 2 × (Precision × Recall) / (Precision + Recall) ≈ 0.934
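These metrics can also be computed directly with scikit-learn, assuming the y_pred array from the confusion matrix sketch above:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print(f"Accuracy:  {accuracy_score(y_val, y_pred):.3f}")
print(f"Precision: {precision_score(y_val, y_pred):.3f}")
print(f"Recall:    {recall_score(y_val, y_pred):.3f}")
print(f"F1 Score:  {f1_score(y_val, y_pred):.3f}")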
The high values for true positives (4276) and true negatives (1115) indicate that the model is performing well in correctly classifying both the positive and negative classes.
The values for false positives (315) and false negatives (293) are relatively low, but there is still room for improvement. Reducing these errors would further enhance the model’s performance.
The precision, recall, and F1 Score values are all high, suggesting a well-balanced model that performs consistently across different evaluation metrics.
Addressing False Positives and False Negatives: To reduce the false positives and false negatives, consider further tuning the model, exploring more advanced regularization techniques, or augmenting the dataset to improve generalization.
Continuous Monitoring: Regularly update the confusion matrix and related metrics to monitor the model's performance over time and ensure it remains robust across various data distributions.
By leveraging the detailed insights from the confusion matrix, you can continue to refine your model and achieve even greater accuracy and reliability in text classification tasks.