OBSERVE.AI | 2025 INTERNSHIP
Building an internal preview to surface AI copilot gaps and create system trust.
MY ROLE
Product Design Intern
TEAMMATES
Nitin (PM)
Rahul (PM)
Albert (Design/Mentor)
Me!
TOOLS
Figma
Lovable
TIMELINE
Jul - Aug 2025
CONTEXT: REALTIME COPILOT
RealTime Copilot assists customer service agents through contextual recommendations during their calls.
Customer service agents face the daily challenge of delivering timely, accurate assistance that aligns with company policies and vast knowledge bases. Because they are the direct face of the company, agents go through rigorous training to ensure they provide excellent customer service. Yet every customer call is unique, making it impossible to anticipate and excel in every scenario.
RealTime Copilot is an AI-powered assistant that provides in-call guidance to agents through context-specific recommendations. Assistance is provided through several modules in a feed-like view throughout the call.
RealTime Copilot is composed of multiple different recommendation modules.
Some relevant modules for this case study (the top 3 modules above):
Customer Profile (green): Fetches customer data from Salesforce CRM to provide context for Copilot’s suggestions.
Knowledge Base (blue): The most frequently used module—summarizes relevant knowledge articles into actionable steps.
Call End Summary (purple): Creates a concise summary of call details for agents’ post-call tasks.
RealTime Copilot Demo
To put things into perspective, here is a quick demo I made of Copilot in action!
PROBLEM
Trust barriers prevented prospects from actually adopting RealTime Copilot into their company workflow
While prospects like Paycor recognized the value of RealTime Copilot during product demos, they were hesitant to follow through with deals because of a fundamental concern: trust.
Without a clear way to test Copilot, prospects weren't confident enough to deploy it at scale, risking negative customer experiences and a loss on their investment in the product.
SOLUTION PREVIEW
An Internal Preview tool to empower admins to test and refine RealTime Copilot before and after deployment
PRODUCT REQUIREMENT DOC OVERVIEW
While onboarding me onto this project, Rahul and Nitin (the two RealTime Copilot PMs) provided me with a PRD based on early conversations both internally and with relevant customers.
They wanted to ship a solution fast, so they divided the project into 6 phases and assigned me ownership of phase 1 (all I had time to explore during my short internship). The main difference between phases was the complexity of the input type (what the testable transcript was based on).
Project Phases
The 6 phases of the project each added a new unique input type.
With only a couple of weeks remaining in my internship, I focused on Phase 1 of the Preview tool. The phase 1 solution had to satisfy these 3 requirements:
Phase 1 Requirements
1
Quickly build the first input mechanism: roleplaying as both agent and customer
Before I started working on this project, the PMs had their own workaround for testing the Copilot. They had a minimalistic interface on the main Copilot configuration screen where they would manually type in "Agent: blah blah" and "Customer: blah blah" and generate a chat thread of recommendations.
While functional, it needed a polished, simpler UI before a public release. Since the engineering team had already built the underlying feature, the PMs wanted to leverage it and ship phase 1 soon.
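To make the workaround concrete, here is a hypothetical sketch of the "build your own transcript" parsing step: prefixed lines like "Agent: ..." / "Customer: ..." become structured turns a preview backend could feed to the Copilot. All names here are illustrative, not Observe.AI's actual implementation.

```python
def parse_transcript(raw: str) -> list[dict]:
    """Turn 'Agent:'/'Customer:'-prefixed lines into speaker/text turns."""
    turns = []
    for line in raw.strip().splitlines():
        # Split each line on the first colon: "Agent: hi" -> ("Agent", "hi")
        speaker, _, text = line.partition(":")
        speaker = speaker.strip().lower()
        # Keep only recognizable, non-empty turns
        if speaker in ("agent", "customer") and text.strip():
            turns.append({"speaker": speaker, "text": text.strip()})
    return turns

raw = """
Agent: Hi, thanks for calling! How can I help?
Customer: I was double-charged on my last invoice.
"""
turns = parse_transcript(raw)  # two structured turns, ready for the Copilot
```

The appeal of this input type for phase 1 is exactly its simplicity: no audio, no CRM data, just typed text that admins can edit and re-run in seconds.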
2
Simulate the real agent experience
The PMs wanted admins to not only see the Copilot recommendations, but feel what the agent would experience at that moment.
3
Showcase the correlation between a transcript instance and a Copilot recommendation
Because this was a testing environment, the goal was to be as transparent as possible about every decision and outcome. Therefore, it was a priority to make clear what prompted each Copilot recommendation to appear.
QUESTIONING THE PRD
"When admins identify an incorrect Copilot recommendation, how do they diagnose why it is incorrect and determine how to fix it?"
PRD GAP
Half the Solution: proposing a Copilot Diagnosis Tool to empower Admins to pinpoint specific issues and thus save Engineering Team resources
The current requirements only covered previewing the Copilot, which would leave admins in the dark about what was going on behind the scenes. I truly believe that AI should never be seen as a magical "black box". This gap was bad for two reasons:
Admin's problem
Viewed the Copilot AI as a magical "black box" → had no behind-the-scenes view of what was going on
Had no control over the system iterations done on their Copilot
Observe's Engineering Team's problem
No clear issue diagnostics → more time and resources spent solving each problem
Took precious time away from other high-impact Copilot projects
My idea: Copilot Diagnosis Tool
I envisioned adding a "Copilot Diagnosis Tool" that exposes how Copilot converts a dialogue into a system recommendation, laying out the mini-steps in between. Non-technical admins would be able to see and understand at which mini-step the output went wrong.
Identifying which "mini-step" the Copilot went wrong
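The idea above can be sketched as a simple data shape: each recommendation carries the ordered mini-steps Copilot took from dialogue to output, so a flagged issue points at one concrete step. The step names and fields below are hypothetical illustrations, not Observe.AI's actual pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class MiniStep:
    name: str            # e.g. "Detect intent", "Retrieve knowledge article"
    output: str          # what Copilot produced at this step
    flagged: bool = False
    note: str = ""       # admin's details for the Engineering team

@dataclass
class Diagnosis:
    dialogue: str                       # transcript line that triggered the recommendation
    recommendation: str                 # the final Copilot suggestion
    steps: list[MiniStep] = field(default_factory=list)

    def flag(self, index: int, note: str) -> None:
        """Mark the mini-step where the output first looked wrong."""
        self.steps[index].flagged = True
        self.steps[index].note = note

d = Diagnosis(
    dialogue="Customer: I was double-charged on my last invoice.",
    recommendation="Offer a loyalty discount",
    steps=[
        MiniStep("Detect intent", "billing_dispute"),
        MiniStep("Retrieve knowledge article", "Promotions FAQ"),
        MiniStep("Summarize into steps", "Offer a loyalty discount"),
    ],
)
# A non-technical admin spots that the retrieval step grabbed the wrong article
d.flag(1, "Should retrieve the refunds policy, not promotions.")
```

Framing the diagnosis this way means Engineering receives a pointer to one step, not a vague "the answer is wrong" ticket.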
PMs disagreed with prioritizing the Copilot Diagnosis Tool
When I brought up my concerns to the PMs, they agreed it was a good consideration, but said it was something they could explore later in Phase 4 due to the sheer amount of Copilot work going on during Q3.
While I understood where they were coming from, I believed we were letting our users down by not properly addressing their real needs. After talking with my manager Albert Wang, I added the Copilot Diagnosis Tool to my to-do list.
UPDATED PROJECT REQUIREMENTS
Design a Copilot Internal Preview Tool that doesn't just allow admins to preview Copilot suggestions, but to explore how and why these suggestions arise and solve issues accordingly
"Build your own transcript" → fast to build + admins can quickly test out scenarios with the Copilot
Simulate the real agent experience
Highlight how a Transcript Dialogue corresponds to a Copilot recommendation
+ Create a Copilot Diagnosis Tool to increase admin trust in the AI Copilot
PRODUCT STRATEGY
Conflicting Requirements: two different layouts
Two of the project requirements conflicted
Simulating the Real Agent Experience for the Admins
Option 1: Side-by-side

vs
Highlighting how a Transcript Dialogue corresponds to a Copilot Recommendation
Option 2: Integrated within Transcript

While ideating, I ran into a problem: two of the main project requirements conflicted with each other, and implementing one meant sacrificing the other.
I personally advocated for the latter option because I believed the primary goal of the Internal Preview was identifying errors and tweaking the system, which happens best through a direct connection between a transcript dialogue and its corresponding Copilot recommendation.
FEATURE #1: AGENT-CUSTOMER ROLEPLAY
"Build Your Own Transcript"
As mentioned earlier, Engineering had already built a makeshift way to preview Copilot recommendations by manually typing "Agent: blah blah" and "Customer: blah blah" before each corresponding message.
I was tasked to create an Agent-Customer Roleplay chat system for Phase 1
The final chat interface allowed Admins to type messages as both Customer and Agent
FEATURE #2: AI THINKING
If I had to summarize the goal of Internal Preview in one phrase, it would be:
"Keep the user in the loop to build trust."
Many AI companies were exploring this concept, so I surveyed how they approached it:
How are other AI models showing their reasoning?
ChatGPT
Displays thinking time but hides the mini-steps
What I liked:
Displaying how long it thought for
How compact the UI was
Claude
Openly breaks down each mini-step of the task
What I liked:
Each thinking step was organized with a clear icon and title
Synthesizing the UX patterns in ChatGPT and Claude, alongside the needs of our user, I came up with two areas to display AI thinking patterns:
Above each recommendation: compactly show how long the recommendation took to generate
Copilot Diagnosis tool: clearly showcase the thinking steps
Integrating into Internal Preview
Showing Copilot Thinking above every recommendation
VERSION 1
A way to simply preview Copilot recommendations based on the system configurations
Putting it all together, I created a working prototype for the Internal Preview V1 that followed the Product Requirement Doc.
An additional feature we included was a "Triggered Recommendations" bookmark (left sidebar).
After creating this mock, I explored what the Diagnosis Tool would look like.
FEATURE #3: ERROR DIAGNOSIS TOOL
I felt most passionate about this feature. Although the PMs didn't believe in prioritizing it at the moment, I strongly felt we had the duty to fully explore how to solve our users' problem.
The current process
The inefficient flow proposed by the PRD
A process that empowers Admins to diagnose and flag the issue
1
When Admins believe a recommendation is wrong, they can click into it to view more details (keeping the Error Diagnosis Tool out of the main Internal Preview screen)
2
Shows the mini-steps of how Copilot translated a Transcript Dialogue into a Recommendation Output
3
Non-technical Admins can spot at which mini-step the Copilot went wrong. They can flag that mini-step and provide additional details to make the fix easier for Engineering
Introducing the Error Diagnosis Tool
USER RESEARCH
I conducted 4 semi-structured interviews with Customer Admins and Observe's Engineering team. I mainly set out to validate the current designs for Internal Preview, gauge opinions on the direction of future phases, and see if our system would build confidence in the Copilot.
Validating the Error Diagnosis Tool
When I showed the interviewees the Internal Preview V1 designs, they felt it would be inefficient at solving the actual problem at hand: building trust through improving Copilot.
At this point, I was able to pull out my Error Diagnosis Tool designs, something I had planned to cover at the very end if we had time.
Aside from a few comments, the Customer Admins and Observe's Engineering Team validated that the Error Diagnosis Tool would allow them to better identify and solve Copilot configuration issues, building trust over time.
Pushing for the Error Diagnosis Tool to be included in Phase 1
After validating the Error Diagnosis Tool, I revisited my conversation with the PMs. Armed with real user evidence, I explained how nearly every interviewee saw this concept as essential to using the Internal Preview tool effectively.
While the PMs originally viewed it as a lower priority for later phases, the feedback revealed that what they saw as a "nice-to-have" was a necessity for users. I pushed for it to be included in Phase 1.
ITERATIONS
Issue #1: Admins wanted more than a "Flag Issue" button for the Error Diagnosis Tool
Admins could pinpoint which mini-step the Copilot went wrong at through the Error Diagnosis Tool, but they had no way to fix it. They didn't want to be hyper-reliant on the Observe Engineering team, and wanted the ability to make small configuration tweaks themselves instead of waiting for delayed Engineering support.
Iteration: Empowering Admins with a "Go to Configuration" Shortcut
I introduced a direct "Go to Configuration" button alongside the "Flag Issue" button, giving admins the autonomy to jump straight into Copilot's setup and make quick prompt adjustments.
Issue #2: Full-Screen Internal Preview design would hinder rapid Copilot testing and editing for Admins
By empowering admins to both identify errors and directly edit Copilot configurations, the existing full-screen Internal Preview became a bottleneck. Admins were forced to constantly switch between views, breaking their flow and slowing fast iteration.
Iteration: Half-screen Internal Preview enables simultaneous testing and configuration
I redesigned the preview as a split-screen view: half the screen for the Internal Preview, half for the Copilot configuration. This side-by-side setup eliminated extra clicks, supporting fluid testing and prompt adjustments on a single screen.
FINAL DESIGN
LESSONS LEARNED
Making AI products more human :)
1
Designing for Trust is Essential for the Success of AI Products
The biggest lesson I learned this summer (which also turned into a big personal passion) is designing experiences that demystify AI and return control back to the users. As technology rapidly advances and machines take on greater decision-making roles, building system trust becomes essential. I hope to explore this lesson in future experiences!
A Good Product Designer Relentlessly Advocates for the User
I constantly championed the Error Diagnosis Tool, something the PMs had deprioritized from the start. I learned that a designer's true impact comes from always advocating for the user, even when it means navigating conflicts with PMs, leadership, engineering, and anyone else.
Nitin (PM), Vache (CPO), Albert (Manager), and me!
The very cool sign that greeted me every morning
Last ever 0.5 selfie taken in office :/