Aug 14, 2025

MasterClass' need for synthetic data

The need for synthetic data

At MasterClass, we’ve always relied on synthetic conversational data to post-train our OnCall models. Good synthetic data is hard to generate, chiefly because it’s hard to create diverse content. Despite our attempts to introduce diversity through prompting, we noticed that the synthetic user personas across hundreds of conversations all felt like the same person: always happy and grateful for whatever the synthetic assistant told them. Real users don’t behave that way in conversations.

Synthetic data pipelines also tend to grow complex, because the generation process offers seemingly endless ways to customize it.

Why Snowglobe clicked for us

"When we started using Snowglobe, the clearest difference we saw was how realistic the synthetic user personas felt compared to any synthetic data that we’d seen before. It was clear that this was a priority for their team, and it mattered to us." - Aman Gupta, Head of AI, MasterClass

The Snowglobe team has also done a great job of distilling conversation generation into modular components, such as simulation intents and customizable LLM judges that analyze and retry assistant turns. These components contain the complexity while still giving us all the flexibility we need. Combined with their rigorous approach to creating diverse user personas and conversational use cases, this makes it easy to go from an idea to the conversational dataset we need.

Including other stakeholders in the process

Snowglobe also provides visualizations and other tools for analyzing the generated data, which help us understand what it looks like. These are available to all stakeholders, not just engineers or data scientists, so engineers no longer have to analyze each dataset themselves and share the findings with the rest of the team. It’s immensely useful that anyone on the team can browse the data through the Snowglobe UI, or even kick off new generation runs based on what they find.

Experiments

As we work toward launching new OnCall Coach models, we need synthetic data covering the conversations our users are likely to want to have with them. We have switched entirely to Snowglobe to generate this data.

We are also using this opportunity to measure the impact of the switch. We are setting up experiments that train our models on data from other baselines and compare the results on our evaluations. We hope to share more from those experiments soon!

Ready to Raise the Bar for AI Reliability?

Explore how Snowglobe can enhance the safety and reliability of your AI solution.

  • Book time with one of our founders to request a demo and see Snowglobe in action

  • Try out Snowglobe now!

© Snowglobe 2025. All rights reserved.