OBSERVE.AI | 2025 INTERNSHIP
Building an internal preview to surface AI copilot gaps and create system trust.
MY ROLE
Product Design Intern
TEAMMATES
Nitin (PM)
Rahul (PM)
Albert (Design/Mentor)
Me!
TOOLS
Figma
Lovable
TIMELINE
Jul - Aug 2025
CONTEXT: REALTIME COPILOT
RealTime Copilot assists customer service agents through contextual recommendations during their calls.
Customer service agents face the daily challenge of delivering timely, accurate assistance that aligns with company policies and vast knowledge bases. Because they are the direct face of the company, agents go through rigorous training to ensure they provide excellent customer service. Yet every customer call is unique, making it impossible to anticipate and excel in every scenario.
RealTime Copilot is an AI-powered assistant that provides in-call guidance to agents through context-specific recommendations. Assistance is provided through several modules in a feed-like view throughout the call.
RealTime Copilot is composed of multiple different recommendation modules.
Some relevant modules for this case study (the top 3 modules above):
Customer Profile (green): Fetches customer data from Salesforce CRM to provide context for Copilot’s suggestions.
Knowledge Base (blue): The most frequently used module—summarizes relevant knowledge articles into actionable steps.
Call End Summary (purple): Creates a concise summary of call details for agents’ post-call tasks.
RealTime Copilot Demo
To put things into perspective, here is a quick demo I made of Copilot in action!
PROBLEM
Trust barriers prevented prospects from actually adopting RealTime Copilot into their company workflow
While prospects like Paycor recognized the value of RealTime Copilot during product demos, they were hesitant to follow through with deals because of a fundamental concern: trust.
Without a clear way to test Copilot, prospects weren't confident enough to deploy it at scale, risking negative customer experiences and a loss on their investment in the product.
SOLUTION PREVIEW
An Internal Preview tool to empower admins to test and refine RealTime Copilot before and after deployment
PRODUCT REQUIREMENT DOC OVERVIEW
While onboarding me onto this project, Rahul and Nitin (the two RealTime Copilot PMs) provided me with a PRD based on early conversations both internally and with relevant customers.
They wanted to ship a solution fast, so they divided the project into 6 phases and assigned me ownership of phase 1 (all I had time to explore during my short internship). The main difference between phases was the complexity of the input type (what the testable transcript was based on).
Project Phases
The 6 phases of the project each added a new unique input type.
With only a couple of weeks remaining in my internship, I focused on Phase 1 of the Preview tool. The phase 1 solution had to satisfy these 3 requirements:
Phase 1 Requirements
1
Quickly build the first input mechanism: roleplaying as both agent and customer
Before I started working on this project, the PMs had their own workaround for testing the Copilot. They had a minimalistic interface on the main Copilot configuration screen where they would manually type in "Agent: blah blah" and "Customer: blah blah" and generate a chat thread of recommendations.
While functional, it needed a polished, simpler UI before a public release. Since the engineering team had already built the underlying feature, the PMs wanted to leverage it and ship phase 1 soon.
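To make the workaround concrete, here is a hypothetical sketch of the "build your own transcript" parsing step: prefixed lines like "Agent: ..." / "Customer: ..." become structured turns a preview backend could feed to the Copilot. All names here are illustrative, not Observe.AI's actual implementation.

```python
def parse_transcript(raw: str) -> list[dict]:
    """Turn 'Agent:'/'Customer:'-prefixed lines into speaker/text turns."""
    turns = []
    for line in raw.strip().splitlines():
        # Split each line on the first colon: "Agent: hi" -> ("Agent", "hi")
        speaker, _, text = line.partition(":")
        speaker = speaker.strip().lower()
        # Keep only recognizable, non-empty turns
        if speaker in ("agent", "customer") and text.strip():
            turns.append({"speaker": speaker, "text": text.strip()})
    return turns

raw = """
Agent: Hi, thanks for calling! How can I help?
Customer: I was double-charged on my last invoice.
"""
turns = parse_transcript(raw)  # two structured turns, ready for the Copilot
```

The appeal of this input type for phase 1 is exactly its simplicity: no audio, no CRM data, just typed text that admins can edit and re-run in seconds.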
2
Simulate the real agent experience
The PMs wanted admins to not only see the Copilot recommendations, but feel what the agent would experience at that moment.
3
Showcase the correlation between a transcript instance and a Copilot recommendation
Because this was a testing environment, the goal was to be as transparent as possible about every decision and outcome. Therefore, it was a priority to make clear what prompted each Copilot recommendation to appear.
QUESTIONING THE PRD
"When admins identify an incorrect Copilot recommendation, how do they diagnose why it is incorrect and determine how to fix it?"
PRD GAP
Half the Solution: proposing a Copilot Diagnosis Tool to empower Admins to pinpoint specific issues and thus save Engineering Team resources
The current requirements only covered previewing the Copilot, which would leave admins in the dark about what was going on behind the scenes. I truly believe that AI should never be seen as a magical "black box". This gap was bad for two reasons:
Admin's problem
Viewed the Copilot AI as a magical "black box" → had no behind-the-scenes view of what was going on
Had no control over the system iterations done on their Copilot
Observe's Engineering Team's problem
No clear issue diagnostics → more time and resources spent solving each problem
Took precious time away from other high-impact Copilot projects
My idea: Copilot Diagnosis Tool
I envisioned adding a "Copilot Diagnosis Tool" that exposes how Copilot converts a dialogue into a system recommendation, laying out the mini-steps in between. Non-technical admins would be able to see and understand at which mini-step the output went wrong.
Identifying which "mini-step" the Copilot went wrong
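The idea above can be sketched as a simple data shape: each recommendation carries the ordered mini-steps Copilot took from dialogue to output, so a flagged issue points at one concrete step. The step names and fields below are hypothetical illustrations, not Observe.AI's actual pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class MiniStep:
    name: str            # e.g. "Detect intent", "Retrieve knowledge article"
    output: str          # what Copilot produced at this step
    flagged: bool = False
    note: str = ""       # admin's details for the Engineering team

@dataclass
class Diagnosis:
    dialogue: str                       # transcript line that triggered the recommendation
    recommendation: str                 # the final Copilot suggestion
    steps: list[MiniStep] = field(default_factory=list)

    def flag(self, index: int, note: str) -> None:
        """Mark the mini-step where the output first looked wrong."""
        self.steps[index].flagged = True
        self.steps[index].note = note

d = Diagnosis(
    dialogue="Customer: I was double-charged on my last invoice.",
    recommendation="Offer a loyalty discount",
    steps=[
        MiniStep("Detect intent", "billing_dispute"),
        MiniStep("Retrieve knowledge article", "Promotions FAQ"),
        MiniStep("Summarize into steps", "Offer a loyalty discount"),
    ],
)
# A non-technical admin spots that the retrieval step grabbed the wrong article
d.flag(1, "Should retrieve the refunds policy, not promotions.")
```

Framing the diagnosis this way means Engineering receives a pointer to one step, not a vague "the answer is wrong" ticket.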
PMs disagreed with prioritizing the Copilot Diagnosis Tool
When I brought up my concerns to the PMs, they agreed it was a good consideration, but said it was something they could explore later in Phase 4 due to the sheer amount of Copilot work going on during Q3.
While I understood where they were coming from, I believed we were letting our users down by not properly addressing their real needs. After talking with my manager Albert Wang, I added the Copilot Diagnosis Tool to my to-do list.
UPDATED PROJECT REQUIREMENTS
Design a Copilot Internal Preview Tool that doesn't just allow admins to preview Copilot suggestions, but to explore how and why these suggestions arise and solve issues accordingly
"Build your own transcript" → fast to build + admins can quickly test out scenarios with the Copilot
Simulate the real agent experience
Highlight how a Transcript Dialogue corresponds to a Copilot recommendation
+ Create a Copilot Diagnosis Tool to increase admin trust in the AI Copilot
PRODUCT STRATEGY
Conflicting Requirements: two different layouts
Two of the project requirements conflicted
Simulating the Real Agent Experience for the Admins
Option 1: Side-by-side

vs
Highlighting how a Transcript Dialogue corresponds to a Copilot Recommendation
Option 2: Integrated within Transcript

While ideating, I ran into a problem: two of the main project requirements conflicted with each other, and implementing one meant sacrificing the other.
I personally advocated for the latter option because I believed the primary goal of the Internal Preview was identifying errors and tweaking the system, which happens best through a direct connection between a transcript dialogue and its corresponding Copilot recommendation.
FEATURE #1: AGENT-CUSTOMER ROLEPLAY
"Build Your Own Transcript"
As mentioned earlier, Engineering had already built a makeshift way to preview Copilot recommendations by manually typing "Agent: blah blah" and "Customer: blah blah" before each corresponding message.
I was tasked to create an Agent-Customer Roleplay chat system for Phase 1
The final chat interface allowed Admins to type messages as both Customer and Agent
FEATURE #2: AI THINKING
If I had to summarize the goal of Internal Preview in one phrase, it would be:
"Keep the user in the loop to build trust."
Many AI companies were exploring this concept, so I surveyed how they approached it:
How are other AI models showing their reasoning?
ChatGPT
Displays thinking time but hides the mini-steps
What I liked:
Displaying how long it thought for
How compact the UI was
Claude
Openly breaks down each mini-step of the task
What I liked:
Each thinking step was organized with a clear icon and title
Synthesizing the UX patterns in ChatGPT and Claude, alongside the needs of our user, I came up with two areas to display AI thinking patterns:
Above each recommendation: compactly show how long the recommendation took to generate
Copilot Diagnosis tool: clearly showcase the thinking steps
Integrating into Internal Preview
Showing Copilot Thinking above every recommendation
VERSION 1
A way to simply preview Copilot recommendations based on the system configurations
Putting it all together, I created a working prototype for the Internal Preview V1 that followed the Product Requirement Doc.
An additional feature we included was a "Triggered Recommendations" bookmark (left sidebar).
After creating this mock, I explored what the Diagnosis Tool would look like.
FEATURE #3: ERROR DIAGNOSIS TOOL
I felt most passionate about this feature. Although the PMs didn't believe in prioritizing it at the moment, I strongly felt we had the duty to fully explore how to solve our users' problem.
The current process
The inefficient flow proposed by the PRD
A process that empowers Admins to diagnose and flag the issue
1
When Admins believe a recommendation is wrong, they can click into it to view more details (keeping the Error Diagnosis Tool out of the main Internal Preview screen)
2
Shows the mini-steps of how Copilot translated a Transcript Dialogue into a Recommendation Output
3
Non-technical Admins can spot at which mini-step the Copilot went wrong. They can flag that mini-step and provide additional details to make the fix easier for Engineering
Introducing the Error Diagnosis Tool
USER RESEARCH
I conducted 4 semi-structured interviews with Customer Admins and Observe's Engineering team. I mainly set out to validate the current designs for Internal Preview, gauge opinions on the direction of future phases, and see if our system would build confidence in the Copilot.
Validating the Error Diagnosis Tool
When I showed the interviewees the Internal Preview V1 designs, they felt it would be inefficient at solving the actual problem at hand: building trust through improving Copilot.
At this point, I was able to pull out my Error Diagnosis Tool designs, something I had planned to cover at the very end if we had time.
Aside from a few comments, the Customer Admins and Observe's Engineering Team validated that the Error Diagnosis Tool would allow them to better identify and solve Copilot configuration issues, building trust over time.
Pushing for the Error Diagnosis Tool to be included in Phase 1
After validating the Error Diagnosis Tool, I revisited my conversation with the PMs. Armed with real user evidence, I explained how nearly every interviewee saw this concept as essential to using the Internal Preview tool effectively.
While the PMs originally viewed it as a lower priority for later phases, the feedback revealed that what they saw as a "nice-to-have" was a necessity for users. I pushed for it to be included in Phase 1.
ITERATIONS
Issue #1: Admins wanted more than a "Flag Issue" button for the Error Diagnosis Tool
Admins could pinpoint which mini-step the Copilot went wrong at through the Error Diagnosis Tool, but they had no way to fix it. They didn't want to be hyper-reliant on the Observe Engineering team, and wanted the ability to make small configuration tweaks themselves instead of waiting for delayed Engineering support.
Iteration: Empowering Admins with a "Go to Configuration" Shortcut
I introduced a direct "Go to Configuration" button alongside the "Flag Issue" button, giving admins the autonomy to jump straight into Copilot's setup and make quick prompt adjustments.
Issue #2: Full-Screen Internal Preview design would hinder rapid Copilot testing and editing for Admins
By empowering admins to both identify errors and directly edit Copilot configurations, the existing full-screen Internal Preview became a bottleneck. Admins were forced to constantly switch between views, breaking their flow and slowing fast iteration.
Iteration: Half-screen Internal Preview enables simultaneous testing and configuration
I redesigned the preview as a split-screen view: half the screen for the Internal Preview, half for the Copilot configuration. This side-by-side setup eliminated extra clicks, supporting fluid testing and prompt adjustments on a single screen.
FINAL DESIGN
LESSONS LEARNED
Making AI products more human :)
1
Designing for Trust is Essential for the Success of AI Products
The biggest lesson I learned this summer (which also turned into a big personal passion) is designing experiences that demystify AI and return control back to the users. As technology rapidly advances and machines take on greater decision-making roles, building system trust becomes essential. I hope to explore this lesson in future experiences!
A Good Product Designer Relentlessly Advocates for the User
I constantly championed the Error Diagnosis Tool, something the PMs had deprioritized from the start. I learned that a designer's true impact comes from always advocating for the user, even when it means navigating conflicts with PMs, leadership, engineering, and anyone else.
Nitin (PM), Vache (CPO), Albert (Manager), and me!
The very cool sign that greeted me every morning
Last ever 0.5 selfie taken in office :/