Evolving Email Security with Embedding Space and AI-powered Subject Analyzer

Perception Point has developed “Subject Analyzer,” an NLP model that classifies email subjects to maximize detection of threats and spam

In the modern workplace, email remains the primary communication channel for most organizations, making it the main target and entry point for threat actors. Perception Point has developed Subject Analyzer, a text analysis model that uses artificial intelligence (AI) and deep learning to maximize detection of threats and spam based on the email subject.

Subject Analyzer is a natural language processing (NLP) model developed in-house, designed to understand and classify email subjects with remarkable accuracy. This new model, like many of Perception Point’s AI-powered detection engines, is built upon advanced deep learning techniques. 

Understanding the Foundation: Deep Learning and NLP in Email Classification

Before we delve deeper into the workings of Subject Analyzer, let’s lay the groundwork by understanding its foundation. 

Deep learning, a subset of AI, takes inspiration from the human brain’s structure and functions. It uses algorithms called neural networks to analyze data, learn from it, and make decisions. Unlike traditional machine learning, which requires manual feature selection and is limited by the programmer’s foresight, deep learning autonomously discovers the features to be used for learning from the data itself. This ability enables deep learning models to handle vast datasets and recognize complex, nuanced patterns that are not immediately obvious to human analysts or traditional methods.

NLP, another AI branch, allows computers to understand, interpret, and generate human language. In the context of cybersecurity, NLP is crucial for analyzing the vast amounts of text within emails to detect malicious intent or spam. Together, deep learning and NLP equip the Subject Analyzer with the ability to sift through the subtleties of email subjects, identifying threats with precision that was previously unattainable.

By leveraging these advanced technologies, the Subject Analyzer doesn’t just look for known threat signatures; it understands the essence of the communication to identify threats that are new or cleverly disguised. This allows our detection models to learn from a large dataset of curated and labeled emails (‘clean’, ‘spam’, and ‘malicious’), identifying patterns and characteristics that differentiate between legitimate correspondence and malicious attempts. Through layers of neural networks, the Subject Analyzer discerns intricate details in the text of the email subject line, from the overall tone to specific word choices and sentence structures.
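As a rough illustration of the final classification step, the sketch below turns three raw scores into probabilities over the labels named above and picks the most likely one. It is not Perception Point's implementation: the softmax output layer, the function names, and the example scores are assumptions made only for illustration.

```python
import math

# Label names taken from the article; everything else here is hypothetical.
LABELS = ["clean", "spam", "malicious"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def verdict(logits):
    """Return the most probable label and the full probability list."""
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs


# Made-up scores standing in for the output of a trained network.
label, probs = verdict([0.2, 2.5, 0.9])
print(label, [round(p, 3) for p in probs])
```

In a real model the scores would come from the network's last layer after it has processed the subject line; here they are hard-coded.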

Figure 1: Neural Network Diagram for Email Subject Analysis

Subject Analyzer and the Role of Embedding Space

Embedding space is a fundamental concept in machine learning and is critical to understanding how the new model works. In essence, within these multidimensional spaces, words, phrases, and entire sentences are represented as vectors (sequences of numbers). This representation captures not just the superficial meaning of the text but also its deeper semantic and contextual significance. In email communication, for instance, terms related to financial transactions like “invoice” or “payment request” are distinguished from those often found in promotional materials such as “free” or “discount”.

When we introduce words like “quotation” and “invoice” to the model’s embedding space, they undergo transformations into dense vector representations that encapsulate their semantic and contextual meaning. These representations exist not as isolated vectors, but rather as points within a high-dimensional continuum. Within this continuum, the spatial proximity between vectors indicates their semantic similarity.
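One common way to measure this proximity is cosine similarity. The sketch below uses hand-picked three-dimensional toy vectors, not real embeddings from Subject Analyzer, to show how related words score higher than unrelated ones. The vector values and the choice of cosine similarity are assumptions for illustration.

```python
import math

# Toy vectors chosen by hand for illustration; real embeddings have
# hundreds of dimensions and are learned from data.
TOY_EMBEDDINGS = {
    "invoice": [0.9, 0.1, 0.0],
    "quotation": [0.8, 0.3, 0.1],
    "discount": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_business = cosine_similarity(
    TOY_EMBEDDINGS["invoice"], TOY_EMBEDDINGS["quotation"]
)
sim_promo = cosine_similarity(
    TOY_EMBEDDINGS["invoice"], TOY_EMBEDDINGS["discount"]
)
print(f"invoice vs quotation: {sim_business:.2f}")
print(f"invoice vs discount:  {sim_promo:.2f}")
```

With these toy values, “invoice” is far closer to “quotation” than to “discount”, mirroring the behavior described above.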

In this embedding space, words associated with similar contexts often cluster together. For instance, terms commonly encountered in financial transactions or business communications, such as “invoice” and “request for payment”, tend to reside in close proximity due to their semantic relatedness. Similarly, words like “quotation”, frequently used in sales or procurement contexts, may cluster with these terms, albeit possibly in a slightly distinct region within the embedding space.

Figure 2: Embedding Space Visualization – different words cluster in the embedding space

On the other hand, words like “free”, “discount”, and “click here”, known for their attention-grabbing qualities, may occupy another manifold (here, simply a collection of nearby points) within the embedding space. These words are often found in subject lines of promotional emails or marketing materials. While they may not be directly related to financial transactions or business communications like the other terms, they sit close to one another because they share a common purpose: capturing recipients’ attention or enticing them to act. Within this manifold, these words form clusters that reflect their shared characteristics and usage contexts, which helps identify attention-grabbing elements in emails.
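The clustering idea can be sketched with a nearest-centroid lookup: average the vectors in each group, then assign a new vector to the group whose center is closest. The two-dimensional points, the cluster names, and the nearest-centroid rule are all hypothetical and are not taken from Subject Analyzer.

```python
import math

# Hand-made 2-D points standing in for embedded terms in each region.
CLUSTERS = {
    "business": [[0.9, 0.1], [0.8, 0.2], [0.85, 0.3]],
    "promotional": [[0.1, 0.9], [0.2, 0.8], [0.15, 0.95]],
}


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(points)
    return [sum(p[i] for p in points) / count for i in range(len(points[0]))]


def nearest_cluster(vector, clusters):
    """Name of the cluster whose centroid is closest (Euclidean distance)."""
    return min(
        clusters,
        key=lambda name: math.dist(vector, centroid(clusters[name])),
    )


print(nearest_cluster([0.7, 0.25], CLUSTERS))
print(nearest_cluster([0.2, 0.7], CLUSTERS))
```

A production model would learn these regions implicitly during training rather than from fixed centroids; the lookup only makes the geometry concrete.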

Figure 3: Real-life examples of the new model’s verdict and scores

Analyzing the geometric relationships between these words in the embedding space enables the AI-powered Subject Analyzer to detect subtle cues of spam attempts, business email compromise (BEC), and other email-based threats, which might mimic legitimate business emails but carry malicious intent.

Scaling with Precision

One of the unique features of our in-house AI model lies in its scalability. Unlike off-the-shelf models that may struggle to adapt to the evolving cyber threat landscape, the Subject Analyzer is designed to scale seamlessly, ensuring robust performance even as the volume and complexity of malicious emails grow.

Precision and scalability are at the heart of every Perception Point security product. This ensures that organizations can maintain robust email security defenses and prevent threats while keeping legitimate communications flowing smoothly.

Continuous AI Innovation 

The new model follows Perception Point’s continuous rollout of AI innovations, such as its image-recognition-based engine to combat advanced QR code phishing (also known as “quishing”) attacks; its GenAI Decoder™, an LLM-based model for detecting social engineering attempts like BEC, impersonation, and phishing; and the GPThreat Hunter™, which autonomously resolves complex security cases with unprecedented accuracy and speed.

Article translated and made available by DigitalSkills Consulting, official distributor of Perception Point cybersecurity solutions. For more information: www.digitalskills.pt | [email protected] | 21 418 05 21

Original article on the vendor’s website: https://perception-point.io/blog/evolving-email-security-with-embedding-space-and-ai-powered-subject-analyzer/
