Synthetic Data: A New Frontier for Cyber Deception and Honeypots

Cyber Threat Intelligence

24 Dec 2025

deception, threat hunting, network intelligence, threat actor, honeypot, honeytrap

Investigating numerous incidents, Resecurity has developed a unique practice of using deception technologies for counterintelligence purposes. This may include solutions, tools, models, and methods that mimic legitimate enterprise environments to mislead potential threat actors and allow them to conduct malicious activities in a controlled manner. Many of these concepts originate from traditional honeypots, which enable network defenders to perform threat hunting passively—by deploying traps using misconfigured applications and network services, or dummy resources to log intruders.

With the rapid evolution of AI and ML, deception can be accelerated by using synthetic data: purposely generated data that has the patterns and characteristics of real-world data without containing actual proprietary information. In the context of threat hunting, previously breached data can be highly effective for designing deception models that appear extremely realistic and attract threat actors. For example, a purposely planted honeypot containing realistic-looking (but practically useless) records can motivate threat actors to attempt to steal it.

On November 21, 2025, Resecurity identified a threat actor attempting to conduct malicious activity targeting our resources. The actor was probing various public-facing services and applications. Prior to that, the actor targeted one of our employees who had no sensitive data or privileged access. Our DFIR team logged the threat actor at an early stage and documented the following Indicators of Attack (IOAs):

156.193.212.244 (Egypt)
102.41.112.148 (Egypt)
45.129.56.148 (Mullvad VPN)
185.253.118.70 (VPN)

Understanding that the actor was conducting reconnaissance, our team set up a honeytrap account. This led to a successful login by the threat actor to one of the emulated applications containing synthetic data. While the successful login could have enabled the actor to gain unauthorized access and commit a crime, it also provided us with strong proof of their activity. Both Office 365 and VPN accounts are highly effective for creating honeypot (honeytrap) accounts to detect, track, and analyze hacker activity. Such accounts are widely used in enterprise environments to detect unauthorized access attempts and gather threat intelligence. The most successful honeypot deployments use realistic, well-monitored decoy accounts that mimic high-value targets but are isolated from real assets. In addition, you can use honeytrap accounts for your own applications, deployed in an emulated environment that is isolated from production resources and closely monitored.

Such accounts could be planted via Dark Web marketplaces and forums, so potential attackers will find and use them. One such account ("Mark Kelly") has been frequently planted on a marketplace commonly used for purchasing compromised data, called Russian Marketplace.
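As a rough illustration of how a planted account can be monitored, the sketch below flags every authentication event that touches a decoy identity. The `AuthEvent` structure, the `honeytrap_alerts` helper, and the `mark.kelly` username format are assumptions made for this example, not a description of Resecurity's tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical decoy usernames; the exact login format is an assumption.
HONEYTRAP_ACCOUNTS = {"mark.kelly"}


@dataclass(frozen=True)
class AuthEvent:
    username: str
    source_ip: str
    success: bool
    timestamp: datetime


def honeytrap_alerts(events):
    """Return every auth event (successful or not) that targets a decoy account.

    No legitimate user should ever touch a honeytrap account, so any hit
    is worth logging with its source IP and timestamp.
    """
    return [e for e in events if e.username.lower() in HONEYTRAP_ACCOUNTS]


if __name__ == "__main__":
    sample = [
        AuthEvent("alice", "192.0.2.15", True,
                  datetime(2025, 11, 21, 9, 0, tzinfo=timezone.utc)),
        AuthEvent("Mark.Kelly", "203.0.113.5", True,
                  datetime(2025, 11, 21, 9, 5, tzinfo=timezone.utc)),
    ]
    for hit in honeytrap_alerts(sample):
        print(hit.timestamp.isoformat(), hit.username, hit.source_ip)
```

Both failed and successful attempts are kept on purpose, since even a failed login against a decoy account shows that the planted credentials are circulating.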

For synthetic data, we used two different datasets, over 28,000 records impersonating consumers and over 190,000 records of payment transactions, along with generated messages. Notably, in both cases, we utilized already known breached data available on the Dark Web and underground marketplaces (potentially containing PII), making the data even more realistic for threat actors. Such data is readily available from open sources and can be used as an important element for cyber deception, especially when the threat actor is advanced and may perform various checks to verify that the data is not completely fake. If the data fails such checks, the actor may change tactics or halt their planned actions entirely. In our scenario, our goal was to allow the threat actor to conduct activity and feed them with synthetic data to observe their attack path and infrastructure. This task did not involve the use of passwords or API credentials.

Example:

None of these accounts are our actual customers; they are email addresses collected from publicly available combo lists, email lists, and botnet data available on the Dark Web, along with generated addresses. Some of the records were duplicated multiple times. In fact, none of our products has a user count that large.

- Payment Information (Stripe Records)

To prepare this, we used specialized synthetic data generation tools (e.g., SDV, MOSTLY AI, Faker) to create realistic, schema-compliant Stripe transaction and customer data. Our goal was to reproduce exactly the same structure that the data would have according to Stripe’s official API schemas for customers, transactions, and subscriptions. In the official Stripe API, a transaction is typically represented as an object with fields such as:

id: Unique identifier for the transaction
amount: Amount of the transaction
currency: Currency code (e.g., USD)
created: Timestamp of when the transaction occurred
type: Type of transaction (charge, refund, payout, etc.)
status: Status of the transaction (succeeded, pending, failed, etc.)
customer: Reference to the customer object
metadata: Custom key-value pairs for additional information
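The field list above can be turned into a small generator. The following is a minimal sketch using only the Python standard library rather than SDV, MOSTLY AI, or Faker. The function names, the `txn_` and `ord_` ID prefixes, the amount range, and the `order_ref` metadata key are all illustrative assumptions and should be checked against Stripe's API reference before being relied on.

```python
import random
import string
import time

# Values taken from the field descriptions above.
TYPES = ["charge", "refund", "payout"]
STATUSES = ["succeeded", "pending", "failed"]
ONE_YEAR = 365 * 24 * 3600


def _token(rng, prefix, length=24):
    """Build a random alphanumeric identifier with the given prefix."""
    alphabet = string.ascii_letters + string.digits
    return prefix + "".join(rng.choice(alphabet) for _ in range(length))


def synthetic_transaction(rng, customer_id, now=None):
    """Return one fake transaction dict shaped like the field list above."""
    now = int(time.time()) if now is None else now
    return {
        "id": _token(rng, "txn_"),            # prefix is an assumption
        "amount": rng.randint(100, 50_000),   # smallest currency unit, e.g. cents
        "currency": "usd",
        "created": now - rng.randint(0, ONE_YEAR),  # Unix timestamp, past year
        "type": rng.choice(TYPES),
        "status": rng.choice(STATUSES),
        "customer": customer_id,
        "metadata": {"order_ref": _token(rng, "ord_", 10)},  # hypothetical key
    }


if __name__ == "__main__":
    rng = random.Random(2025)  # fixed seed so the dataset is reproducible
    for _ in range(3):
        print(synthetic_transaction(rng, "cus_EXAMPLE"))
```

Seeding the generator makes the planted dataset reproducible, which can help later when matching leaked samples back to the exact records that were served.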

- Faked Customer Records (Consumer Records)

username
email
firstname
lastname
organisation
date
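A consumer record with the six fields listed above could be generated along the following lines. This is a sketch under stated assumptions: the name and organisation pools, the `example.com` domain (reserved for documentation), and the date range are placeholders, whereas the dataset described in this article drew its email addresses from already public breach data.

```python
import random
from datetime import date, timedelta

# Placeholder pools; a real deployment would use far larger, more varied sources.
FIRST_NAMES = ["Alex", "Jordan", "Sam", "Taylor"]
LAST_NAMES = ["Reed", "Morgan", "Hayes", "Quinn"]
ORGANISATIONS = ["Acme Ltd", "Globex", "Initech"]


def synthetic_consumer(rng, start=date(2021, 1, 1), span_days=1000):
    """Return one fake consumer record with the six fields listed above."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    username = f"{first}.{last}{rng.randint(1, 999)}".lower()
    return {
        "username": username,
        "email": f"{username}@example.com",
        "firstname": first,
        "lastname": last,
        "organisation": rng.choice(ORGANISATIONS),
        "date": (start + timedelta(days=rng.randrange(span_days))).isoformat(),
    }


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(synthetic_consumer(rng))
```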

- Open-Source Messenger Application (such as Mattermost)

Depending on the level of deception, you can use non-sensitive data and chatter. In our case, we prepared an environment with chatter consisting of very outdated logs from 2023 to serve as a "shiny object," with 6 groups containing no sensitive communications, placed in a decommissioned system.

A combination of these datasets allows for mimicking a possible business application that involves consumers with financial transactions, which could be of interest to financially motivated threat actors.

The threat actor fell into our trap and began planning automation to dump the available data. It took some time, and on December 12, they resumed activity. It is possible that the threat actor was developing a custom scraper to facilitate data dumping. By that time, they used a large number of residential IP proxies to automate their activity, which helped our DFIR team gather substantial knowledge about their TTPs and the network infrastructure they used. This data is typically called "abuse data"—artifacts collected as a result of the threat actor abusing a specific application or service, or misusing it. Abuse data can also be used for early-stage threat detection when the same actor targets other enterprises, acting as Indicators of Compromise (IOCs). Sharing fresh abuse data can help network defenders hunt for threat actors operating on the same infrastructure more effectively.
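To make the idea of abuse data more concrete, here is a minimal sketch that reduces honeypot access logs to per-IP request counts with first-seen and last-seen timestamps. The log format (an ISO 8601 UTC timestamp, then the source IP, then the request) and the `summarize_abuse` helper are assumptions for illustration; real web server logs would need their own parser.

```python
def summarize_abuse(log_lines):
    """Map each source IP to its request count and first/last seen timestamps.

    Assumes each line starts with "<ISO-8601 UTC timestamp> <source IP>".
    Timestamps in one consistent ISO 8601 format sort correctly as strings.
    """
    summary = {}
    for line in log_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        ts, ip = parts[0], parts[1]
        entry = summary.setdefault(
            ip, {"count": 0, "first_seen": ts, "last_seen": ts}
        )
        entry["count"] += 1
        entry["first_seen"] = min(entry["first_seen"], ts)
        entry["last_seen"] = max(entry["last_seen"], ts)
    return summary


if __name__ == "__main__":
    # Addresses below come from documentation ranges, not from the incident.
    logs = [
        "2025-12-12T10:00:00Z 203.0.113.7 GET /api/customers",
        "2025-12-12T10:00:05Z 198.51.100.9 GET /api/customers",
        "2025-12-13T08:30:00Z 203.0.113.7 GET /api/transactions",
    ]
    for ip, info in summarize_abuse(logs).items():
        print(ip, info)
```

A summary of this shape, one row per source address with counts and time bounds, is a convenient form for sharing with ISPs or other defenders.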

Between December 12 and December 24, the threat actor made over 188,000 requests attempting to dump synthetic data. During this period, the Resecurity team documented the activity and collaborated with relevant law enforcement authorities and ISPs to share information about it. The attacker aimed to scrape the data using malicious automation.

Notably, the actor became quite busy and, at some point, disclosed his real IP addresses due to proxy connection failures, creating an OPSEC issue.

A similar issue occurred during new attempts, leading to another disclosure. In both cases, information about the attacker's hosts was reported to law enforcement.

Observing this activity, our team generated additional synthetic data of a different nature to give the actor more room for maneuvering. This led to the disclosure of other important details that confirmed his origin.

Processing a large dataset of synthetic data led to several OPSEC mistakes, resulting in the identification of the exact servers used by the attacker for automation—where he was using lists of residential IP proxies to spoof the source.

After acquiring a substantial number of residential proxies, we began blocking them, which limited the actor to a smaller number of possible hosts for proxifying the traffic. This led to the resurgence of the same IPs identified earlier.
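The blocking step described above can be sketched as a simple triage. Requests from known proxy addresses are dropped, and any remaining source that matches an earlier observation is flagged as resurfaced. The `triage_sources` helper and its inputs are hypothetical; the addresses in the usage example come from documentation ranges.

```python
def triage_sources(request_ips, blocked_proxies, previously_seen):
    """Drop blocked proxy IPs and flag sources that were observed earlier.

    Returns (allowed, resurfaced): `allowed` keeps request order, and
    `resurfaced` is the sorted set of allowed IPs already seen before.
    """
    allowed = [ip for ip in request_ips if ip not in blocked_proxies]
    resurfaced = sorted({ip for ip in allowed if ip in previously_seen})
    return allowed, resurfaced


if __name__ == "__main__":
    requests = ["192.0.2.10", "198.51.100.4", "203.0.113.7", "203.0.113.7"]
    blocked = {"192.0.2.10"}       # residential proxies already identified
    seen_before = {"203.0.113.7"}  # hosts logged during earlier activity
    allowed, resurfaced = triage_sources(requests, blocked, seen_before)
    print("allowed:", allowed)
    print("resurfaced:", resurfaced)
```

The design choice here is to shrink the pool of usable proxies rather than block the actor outright, so that the remaining traffic concentrates on fewer, more attributable hosts.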

Once the actor was located using available network intelligence and timestamps, a foreign law enforcement organization, a partner of Resecurity, issued a subpoena request regarding the threat actor.

The conclusion of this activity confirms that cyber deception using synthetic data can be highly effective, not only in threat intelligence gathering but also in investigative tasks. Depending on the jurisdiction, cybersecurity teams should ensure compliance with privacy laws and consult legal counsel before deploying such measures.

Update (from January 3, 2026):

Following our publication, the group called ShinyHunters, previously profiled by Resecurity, fell into a honeypot. In fact, we are dealing with their rebranded version, which calls itself "Scattered Lapsus$ Hunters," due to the alleged overlap between the threat actors ShinyHunters, Lapsus$, and Scattered Spider.

LAPSUS$, ShinyHunters, and Scattered Spider are linked to 'The Com,' a predominantly English-speaking cybercriminal ecosystem. This loosely organized network operates more as a cybercrime youth movement, encompassing a broad and constantly shifting range of actors, mainly teens. Some of the announcements of successful data breaches by these actors were published on the associated Telegram channel known as "The Comm Leaks". The FBI issued a Public Service Announcement (PSA) warning about the risks associated with joining such movements.

Our previous reports about them can be found at the following links:

Trinity of Chaos: The LAPSUS$, ShinyHunters, and Scattered Spider Alliance Embarks on Global Cybercrime Spree
https://www.resecurity.com/blog/article/trinity-of-chaos-the-lapsus-shinyhunters-and-scattered-spide...

ShinyHunters Launches Data Leak Site: Trinity of Chaos Announces New Ransomware Victims
https://www.resecurity.com/blog/article/shinyhunters-launches-data-leak-site-trinity-of-chaos-announ...

On Telegram, the group claims to have "compromised" Resecurity, not realizing they have fallen into a honeypot prepared for them. The group claimed that "they have gained full access to Resecurity systems," which is a clear overstatement, as the honeypot environment we prepared did not contain any sensitive information.

The screenshots shared by the threat actors relate to "[honeytrap].b.idp.resecurity.com" (a system emulated with compromised data from the Dark Web and not associated with any actual Resecurity customers) and the Mattermost application, which was provisioned for the honeytrap account "Mark Kelly" around November 2025 for this purpose.

The group admitted that Resecurity's efforts disrupted their operations. Our team used social engineering to acquire data from the group and tracked their activity.

Update (from January 4, 2026):

The actors removed the posting from their Telegram channel.

What threat actors did not realize:

The populated accounts contained records from non-existent domains such as "resecure.com" (a domain that does not exist and does not belong to the company) and non-existent accounts flagged as "developers" / "testers";

API keys, along with other "tokens," are hashed with bcrypt and belong to "dummy accounts" with duplicated records, which hold no value.

Non-actionable (useless) data from a few years ago, planted by our engineers and mixed with AI-generated content, enabled us to document their activities through the honeytrap account. The generated data was based on output from an OpenAI GPT assistant. The activity has been imaged and retained, including exact timestamps and network connections, which have been shared with law enforcement.

Why are honeytrap accounts effective? They enable defenders to simulate a realistic environment for advanced attackers and collect valuable information about their activities.

As a result of this exercise, we were able to identify the actor and link one of his active Gmail accounts to a US-based phone number and a Yahoo account. This account was registered by the actor during the observed honeytrap activity and was logged by Resecurity.

- jwh*****y433@gmail[.]com