Synthetic Data: A Game-Changer for Marketers or Just Another Fad?

By leveraging synthetic data, marketers can gain access to more granular data that is now harder to come by, enabling them to make more informed decisions.

By Romain Warlop On May 1, 2023

Generative AI first gained popularity in 2014 with the introduction of generative adversarial networks (GANs), which allowed researchers to create realistic images and videos. After several years of flying under the radar, the recent release of programs such as ChatGPT and DALL-E has thrust generative AI back into the spotlight. Synthetic data, in particular, is enjoying its day in the sun as marketers look for alternative solutions to fill the data void following Google’s decision to sunset third-party cookies.

What is synthetic data? In short, synthetic data is artificially generated data that reflects real-world data. Instead of drawing on real-world sources and consumer data, it uses computer algorithms and AI to emulate real-world consumer data. It can be a very useful tool for marketers because it provides access to larger data sets when granular data is hard to source, enabling them to share data sets, mitigate biases and run A/B tests for marketing campaigns. However, little is known about how privacy-compliant synthetic data actually is, and as marketers increasingly look to use it as a marketing tool, it’s important to understand the full picture.

Benefits of Synthetic Data

Traditionally, marketers have used synthetic data for creative generation, using AI to generate images with a specific set of characteristics that help designers create proprietary marketing materials. However, there are more use cases for synthetic data outside of the creative world. Marketers can use synthetic data to run what-if scenarios and forecast marketing outcomes before they actually happen.

For instance, using internal data together with open data, external studies, and business knowledge, marketers can create a synthetic data set of customer behavior in the market that replicates historical KPIs and mimics plausible scenarios, in order to test the outcome of a marketing strategy under different configurations.
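As a rough illustration of such a what-if test, the sketch below simulates a synthetic customer population whose conversion rate is calibrated to a historical KPI, then replays it under a few assumed campaign uplifts. The 4% historical rate, the uplift values and the function name `simulate_conversions` are hypothetical, chosen only for the example; a real model would be far richer.

```python
import random

def simulate_conversions(n_customers, base_rate, uplift=0.0, seed=0):
    """Simulate n_customers synthetic customers and return their conversion rate.

    base_rate is the historical conversion KPI to replicate; uplift is the
    assumed relative effect of the marketing strategy under test.
    """
    rng = random.Random(seed)
    rate = min(1.0, base_rate * (1.0 + uplift))
    conversions = sum(1 for _ in range(n_customers) if rng.random() < rate)
    return conversions / n_customers

# Hypothetical historical KPI: 4% of customers convert.
HISTORICAL_RATE = 0.04

# The baseline run should reproduce the historical KPI.
baseline = simulate_conversions(100_000, HISTORICAL_RATE)

# What-if scenarios: assumed relative uplifts of 0%, 10% and 25%.
scenarios = {
    uplift: simulate_conversions(100_000, HISTORICAL_RATE, uplift)
    for uplift in (0.0, 0.10, 0.25)
}

for uplift, rate in scenarios.items():
    print(f"uplift {uplift:.0%}: simulated conversion rate {rate:.2%}")
```

Checking first that the baseline matches the historical figure is what gives the scenario runs any credibility.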

By testing algorithms on synthetic data sets, marketers gain more control over their marketing strategies and can see whether their plans work at scale. For example, if marketers develop an algorithm to detect counterfeit products from images, they can test the accuracy of their model before using it with their actual data. They can generate millions of variations of counterfeit images using generative AI, then check whether the algorithm does in fact detect them all. If it does, they can deploy it. If not, they know they need to fix it before it goes live.
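The deploy-or-fix check described above might look like the following sketch. Everything in it is an assumption made for illustration: real counterfeit detection works on images, whereas here each synthetic counterfeit is reduced to a single made-up "distortion score", the detector is a simple threshold, and the 99% recall target is arbitrary.

```python
import random

def detector(distortion_score, threshold=0.5):
    """Hypothetical detector: flags an item as counterfeit above a score threshold."""
    return distortion_score > threshold

def synthetic_counterfeits(n, seed=0):
    """Generate n synthetic counterfeit examples as scores clipped to [0, 1]."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, rng.gauss(0.7, 0.15))) for _ in range(n)]

samples = synthetic_counterfeits(10_000)

# Recall: share of the synthetic counterfeits that the detector actually flags.
detected = sum(detector(score) for score in samples)
recall = detected / len(samples)

# Assumed deployment rule: ship only if at least 99% of counterfeits are caught.
ready_to_deploy = recall >= 0.99

print(f"recall on synthetic counterfeits: {recall:.1%}, deploy: {ready_to_deploy}")
```

In this toy setup the detector misses a noticeable share of the synthetic counterfeits, which is exactly the signal that it needs fixing before going live.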

Before expanding to new audience segments, marketers can also use synthetic data that mimics the behavior of the new segment and apply their own forecasting algorithms to help determine the segment's relevance. Synthetic data, despite being artificially created, mimics the real world, making it a valuable resource for marketers grappling with data scarcity in light of the deprecation of third-party cookies.

Marketers can share synthetic data with other companies to build audience groups with more insight than they would have from their own data alone, and so gain a better understanding of their consumers. For instance, a sister company that has both distinct and shared segments with the first company may share synthetic data to help the first company develop such a model. Another way to generate such synthetic data is to buy studies that give access to the aggregated behavior of consumers in the market, and then build a model that simulates consumer behavior and, on average, reproduces the same aggregated results. The more granular the studies, the more meaningful the generated synthetic data will be.
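To make the study-based approach concrete, here is a minimal sketch that generates individual synthetic customers from aggregated figures only, then checks that the averages of the generated data match the study. The segment names, shares and average spends in `STUDY` are invented for the example, and drawing spend from an exponential distribution is a simplifying assumption, not a recommendation.

```python
import random
from collections import defaultdict

# Hypothetical aggregated study: segment -> (market share, average monthly spend).
STUDY = {
    "students": (0.2, 15.0),
    "families": (0.5, 60.0),
    "retirees": (0.3, 35.0),
}

def generate_customers(n, study, seed=0):
    """Generate n synthetic (segment, spend) records consistent with the study."""
    rng = random.Random(seed)
    segments = list(study)
    weights = [study[s][0] for s in segments]
    customers = []
    for _ in range(n):
        segment = rng.choices(segments, weights=weights)[0]
        mean_spend = study[segment][1]
        # Assumption: spend is exponentially distributed around the study average.
        customers.append((segment, rng.expovariate(1.0 / mean_spend)))
    return customers

def average_spend_by_segment(customers):
    """Aggregate the synthetic records back to one average spend per segment."""
    totals = defaultdict(lambda: [0.0, 0])
    for segment, spend in customers:
        totals[segment][0] += spend
        totals[segment][1] += 1
    return {s: total / count for s, (total, count) in totals.items()}

customers = generate_customers(50_000, STUDY)
averages = average_spend_by_segment(customers)

for segment, (_, target) in STUDY.items():
    print(f"{segment}: study average {target:.2f}, synthetic average {averages[segment]:.2f}")
```

A study that only reports one average per segment leaves the shape of the distribution entirely to the modeler, which is why more granular studies yield more meaningful synthetic data.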

The Privacy Problem

While the use cases for synthetic data are clear, we can’t yet prove that it is privacy-safe and compliant. One key area of concern is that there may be ways to trick algorithms into releasing data. Because some generative models are trained on data scraped from the internet, the end result may share precise characteristics of genuinely proprietary information and can thus be seen as plagiarism.

Every company comes with its own data, and crossing internal data with open data, even synthetic data, may sometimes leak private information. For instance, if the synthetic data has been generated from company A’s CRM data, company B, which shares some clients with company A, may be able to find a way to map synthetic behavior to a real individual in its own CRM data. Moreover, asking a generative AI the right questions could end up leaking information that was never supposed to be shared, such as lines of code exchanged in private Twitter conversations between two developers at the same company.
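The company A and company B scenario can be sketched as a simple linkage check: company B joins the synthetic records to its own CRM on a few shared attributes, and any synthetic record that matches exactly one real customer is a re-identification risk. The records, field names and the choice of postcode plus birth year as matching attributes are all hypothetical.

```python
from collections import Counter

# Hypothetical synthetic records derived from company A's CRM.
synthetic_records = [
    {"postcode": "75011", "birth_year": 1985, "basket": "premium"},
    {"postcode": "69003", "birth_year": 1990, "basket": "discount"},
    {"postcode": "13001", "birth_year": 1970, "basket": "standard"},
]

# Hypothetical CRM held by company B.
crm_b = [
    {"customer_id": "B-001", "postcode": "75011", "birth_year": 1985},
    {"customer_id": "B-002", "postcode": "69003", "birth_year": 1990},
    {"customer_id": "B-003", "postcode": "69003", "birth_year": 1990},
]

# Attributes present in both data sets that can be used for matching.
QUASI_IDENTIFIERS = ("postcode", "birth_year")

def key(record):
    """Build the matching key from the shared attributes."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)

def risky_matches(synthetic, crm):
    """Return (customer_id, synthetic record) pairs that match exactly one CRM row."""
    crm_counts = Counter(key(r) for r in crm)
    crm_by_key = {key(r): r for r in crm}
    return [
        (crm_by_key[key(s)]["customer_id"], s)
        for s in synthetic
        if crm_counts[key(s)] == 1
    ]

matches = risky_matches(synthetic_records, crm_b)
for customer_id, record in matches:
    print(f"{customer_id} can be linked to synthetic behavior: {record['basket']}")
```

Here only the first synthetic record is risky: the second matches two customers, so it cannot be pinned to one person, and the third matches nobody in company B's data.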

Using Synthetic Data Moving Forward

Synthetic data can be a powerful tool for marketers if they know how to use it to their benefit and understand its limitations. Synthetic data and other forms of generative AI can be used to brainstorm ideas and ease the first steps of a project, such as drafting code or producing candidate images. But technology evolves quickly, and synthetic data, along with other forms of generative AI, is advancing at a rapid pace.

As these technologies advance, marketers should expect to see frameworks and regulations put into place, a trend that has already begun in Europe. Last month, Italy’s data protection agency announced that ChatGPT would be blocked in Italy over concerns that its training function violates GDPR. Yet, earlier this month, the agency announced that ChatGPT could continue to operate in Italy if it completed a series of steps that would make the app more transparent.

It’s also important to remember that synthetic data and generative AI are not a magical solution to the loss of third-party data. Marketers still don’t have clarity on where synthetic data comes from (unless it is created internally) or how accurate it is, so synthetic data should be one of many solutions used to mitigate the loss of data. We’re already seeing programs like ChatGPT churn out inaccurate information, which underlines the importance of testing synthetic data. It’s essential that all generated information be curated and validated by a human expert. Doing so helps ensure the generated data is not only useful, but also accurate for the task at hand and privacy-compliant.
