Improving Trace Visibility and Reproducibility in Langfuse

November 11, 2025

Introduction

Over the last few weeks, I’ve been working on a set of improvements to how we capture and analyze traces in Langfuse, and I’m excited to share what has landed so far.

One of the most meaningful updates is that every agent run trace now includes a version tag. This may seem small, but it makes a big difference. It allows us (and you) to easily see which version of an agent produced a given trace, understand whether users are on the latest version, and track how things evolve across deployments. Alongside that, each trace now includes its LLM cost, and these costs automatically roll up into a dashboard so you can track spend across staging and production environments.
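
To make the version tagging concrete, here's a minimal sketch using the v2-style low-level Langfuse Python client (newer SDK versions expose a different API); the agent name, version string, model, and token counts are all placeholders:

```python
from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST

# Tag the whole run with the agent version so traces are filterable by release.
trace = langfuse.trace(
    name="agent-run",                       # placeholder agent name
    version="1.4.2",                        # placeholder version string
    metadata={"environment": "production"},
)

generation = trace.generation(name="plan-step", model="gpt-4o")
# ... call the LLM here ...
generation.end(
    output="...model response...",
    usage={"input": 512, "output": 128},    # token counts; Langfuse derives cost from model pricing
)

langfuse.flush()  # the client batches events; flush before the process exits
```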

Langfuse Cost Dashboard

How it works

I’ve also spent time on the dashboarding and replay experience. You can now create custom dashboards directly in the UI by adding widgets and writing queries. This gives you much more flexibility in understanding user behavior and agent execution patterns over time.

On the trace level, we now include both input and output data, which is crucial for reproducing what happened during a specific run. This is especially valuable during evaluation—you can replay a trace or re-run a particular agent event without guesswork.
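
As a sketch of what replay can look like with the v2-style Python client's `fetch_trace`; the trace ID is a placeholder, and `run_agent` stands in for your own agent entry point:

```python
from langfuse import Langfuse

langfuse = Langfuse()

def run_agent(agent_input):
    """Placeholder for your own agent entry point."""
    ...

# Fetch a past trace and re-run the agent with the exact same input.
past = langfuse.fetch_trace("trace-id-from-the-ui").data  # placeholder trace ID
replayed = run_agent(past.input)

print("original output:", past.output)
print("replayed output:", replayed)
```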

Right now, I’m working on expanding what we capture as part of the trace context. Soon, traces will be able to include things like the items below; a sketch of how this context might be attached follows the list.

  • The entire `.kernel` folder
  • Docker Compose files
  • SQL files
  • Request payloads
  • The raw log messages sent to the API

Langfuse trace UI with additional context
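
Here's one way such context could be bundled into trace metadata with the v2-style Python client; the `collect_run_context` helper and the file patterns are illustrative, not a built-in Langfuse feature:

```python
from pathlib import Path
from langfuse import Langfuse

langfuse = Langfuse()

def collect_run_context(workdir: str = ".") -> dict:
    """Gather reproducibility artifacts (illustrative helper, not a Langfuse API)."""
    root = Path(workdir)
    context: dict = {}
    # The entire .kernel folder, if present
    kernel = root / ".kernel"
    if kernel.is_dir():
        for f in kernel.rglob("*"):
            if f.is_file():
                context[str(f.relative_to(root))] = f.read_text(errors="replace")
    # Docker Compose and SQL files at the project root
    for pattern in ("docker-compose*.yml", "docker-compose*.yaml", "*.sql"):
        for f in root.glob(pattern):
            context[f.name] = f.read_text(errors="replace")
    return context

trace = langfuse.trace(
    name="agent-run",
    metadata={"run_context": collect_run_context()},  # stored alongside the trace
)
langfuse.flush()
```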

The goal is simple: make a trace a complete snapshot of what actually happened. This removes friction when debugging, evaluating, or comparing agent runs.

We also clarified what we mean when we talk about “input” and “output.” While we initially used the prompt and response directly, we now have the flexibility to choose which dynamic, meaningful context gets stored as input. The result is clearer, more useful traces, without logging unnecessary static data.
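
For example, rather than storing the fully rendered prompt (which is mostly static template text), one can record only the values interpolated into it; the template and field names below are made up for illustration:

```python
from langfuse import Langfuse

langfuse = Langfuse()

# Static template text adds noise to every trace; log only the dynamic parts.
PROMPT_TEMPLATE = "You are a support agent.\nTask: {task}\nCustomer context: {context}"

def traced_agent_call(task: str, context: str) -> None:
    prompt = PROMPT_TEMPLATE.format(task=task, context=context)
    trace = langfuse.trace(
        name="agent-run",
        input={"task": task, "context": context},  # dynamic, meaningful context only
    )
    # ... run the agent with `prompt` ...
    trace.update(output={"answer": "...agent result..."})

traced_agent_call(task="reset password", context="enterprise plan, SSO enabled")
langfuse.flush()
```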

Overall, these improvements are about making it easier to understand, monitor, and reproduce agent behavior so you can iterate on prompts, ultimately helping teams move faster with more confidence.
