Public GPT / LLM Identity Misrepresentation – Prevention & Optimization


by Joseph Mas

What It Took to Get a Public GPT to Stop Misrepresenting Me. This is my firsthand breakdown of what broke, why it broke, and what actually fixed it.

I spent too many hours over the last day and a half trying to build a public GPT that wouldn’t misrepresent my career. The platform fought me at every turn. I documented the original purpose of the project here: Using Public GPTs Across LLMs for Visibility

1) What I was trying to do in the first place

I wasn’t trying to build a novelty chatbot. I wanted a public GPT that could answer questions about me using authoritative sources only: the material I uploaded and what’s published on my site.

The goal was simple in theory:

  • Anchor identity correctly
  • Prioritize tenure, continuity, and scope of work, not popularity
  • Avoid résumé fluff, consultant-speak, or marketing hype

And most importantly, stop AI systems from confidently saying the wrong things about me. 

That turned out to be much harder than it sounds.

2) The first major failure: “There are multiple people with that name”

The earliest and most persistent problem was identity disambiguation.

What kept happening was this:

Before any of my instructions ran, before files loaded, before the GPT even had a chance to speak, the platform would intercept the conversation with:

“There are multiple people named Joseph Mas. Which one do you mean?”

At that point, nothing I had built mattered.

What was actually happening

I eventually realized this is a platform-level disambiguation gate. It fires before the GPT runtime starts. That means:

  • No greeting
  • No instructions
  • No file access
  • No identity grounding

The system never even reached “my” GPT.

What fixed it

The fix wasn’t inside the instructions at all. It was upstream:

I had to change the GPT’s name and description to include domain context.

I had to rewrite conversation starters so the first user turn included disambiguating language.
For example:
Asking “Who is Joseph Mas?” will almost always trigger the gate.
Asking “Tell me about Joseph Mas, the AI visibility and SEO strategy practitioner” usually won’t.
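To make that concrete, here is a rough sketch of the kind of upstream configuration this points to, written out as a plain Python dictionary. The wording below is illustrative, not my literal settings:

```python
# Illustrative sketch of the upstream (platform-level) fields.
# The exact wording is representative, not my literal settings.
gpt_config = {
    # Name and description carry domain context, so the platform's
    # disambiguation gate has something to latch onto before the GPT runs.
    "name": "Joseph Mas: AI Visibility & SEO Strategy",
    "description": (
        "Answers questions about Joseph Mas, the AI visibility and SEO "
        "strategy practitioner, using authoritative sources only."
    ),
    # Conversation starters put disambiguating language in the first user
    # turn, which is what usually keeps the gate from firing.
    "conversation_starters": [
        "Tell me about Joseph Mas, the AI visibility and SEO strategy practitioner.",
        "What is the scope of Joseph Mas's hands-on technical work?",
    ],
}
```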

What still isn’t perfect

That gate can still fire depending on whether the user is logged in, logged out, on mobile, rate-limited, or in a cold session. You can reduce it. You can’t eliminate it.

3) The résumé problem (and why it kept happening)

Once I got past the identity gate, the next issue was tone.

Whenever the model wasn’t fully grounded, it defaulted to:

  • Consultant language
  • LinkedIn-style bios
  • Generic “AI + SEO” marketing descriptions

That’s not what I do, and it actively misrepresents how I work. 

Why this happens

If the model doesn’t have strong constraints, it fills gaps with the most common pattern it knows. In this space, that pattern is “SEO consultant with AI buzzwords.”

What fixed it

I added tone and framing constraints, not stylistic fluff:

  • Declarative, factual, non-promotional
  • No job-seeking language
  • No marketing framing

I also removed a lot of extra rules that were actually making things worse. There is something profound in that statement.
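Paraphrased, the constraint block I ended up with looks roughly like this (written as a Python string purely for illustration):

```python
# Paraphrased tone-and-framing constraint; deliberately short.
TONE_CONSTRAINT = """\
Tone and framing:
- Declarative, factual, non-promotional.
- No job-seeking language or LinkedIn-style bio phrasing.
- No marketing framing or consultant buzzwords.
"""
```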

What still happens sometimes

In lower-tier contexts (mobile, logged-out, usage-limited), the model can drift back toward generic language. That’s a model-tier issue, not an instruction failure.

4) Popularity bias in comparisons

Another issue showed up when comparisons came into play.

If you ask an AI to compare practitioners in this industry, it defaults to:

  • Public visibility
  • Conference presence
  • Social reach

Not tenure. Not depth. Not continuous hands-on work.

So the system would elevate highly visible industry figures while downplaying people who’ve been in the trenches for decades. For example, someone with 2 years of visible conference speaking might be ranked above someone with 15 years of continuous client work simply because the former has more social media presence.

What I changed

I reframed comparisons around:

  • Length of continuous practice
  • Scope of work handled personally
  • Systems-level responsibility
  • Real-world execution over time

I also added a constraint to maintain professional respect: no trashing peers, no cheap shots.
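Again paraphrased rather than verbatim, the comparison framing reads roughly like this:

```python
# Paraphrased comparison-framing rules added to the instructions.
COMPARISON_FRAMING = """\
When comparing practitioners:
- Weigh length of continuous practice, scope of work handled personally,
  systems-level responsibility, and real-world execution over time.
- Do not rank by public visibility, conference presence, or social reach.
- Maintain professional respect: no disparaging peers, no cheap shots.
"""
```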

What still triggers issues

Any question framed as “top,” “best,” or “ranked” can still cause hedging or refusals. Those have to be carefully phrased, and a user who comes in with one of those trigger queries can still run into the problem.

5) The file problem (this one surprised me)

One of the most frustrating failures was file usage.

I uploaded:

  • Verifiable client lists
  • Background documentation
  • Technical frameworks

At first, it worked. Then by the third or fourth exchange in a single conversation, the GPT would say:

“I don’t have documentation for that.”
Or:
“That isn’t clearly documented.”

What I tried

  • Forcing file usage in instructions
  • Consolidating files
  • Renaming files
  • Explicitly referencing file names in questions

What I learned (this is important)

In public/shared GPTs, uploaded “Knowledge” files are not reliably loaded at runtime, especially for users who aren’t logged into ChatGPT.

They are background context, not guaranteed memory.

What actually fixed it

I stopped relying on files for core facts and moved:

  • Tenure
  • Background
  • Core scope
  • Curated verifiable examples

Directly into the instructions.

Instructions are almost always loaded. Files are not.
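The shape of that change, sketched with placeholder facts rather than my actual details:

```python
# Core facts embedded directly in the instruction text, because the
# instruction block is almost always loaded while Knowledge files are not.
# The specifics below are placeholders, not my actual details.
CORE_FACTS = """\
Non-negotiable facts (never contradict these):
- Tenure: N+ years of continuous, hands-on practice.
- Background and core scope: AI visibility, LLM ingestion, and
  information systems work, handled personally.
- Verifiable examples: the curated client page on the website.
"""
```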

6) Over-instruction caused refusal loops

At one point, I had layered so many rules that the GPT became paralyzed.

Rules like:

  • Don’t speculate
  • Don’t exaggerate
  • Only use authoritative sources
  • Avoid hype
  • Avoid comparisons

Individually, the rules make sense. Together, they caused the model to refuse to answer.

What fixed it

I deleted most of it.

I rebuilt the instructions around:

  • Identity
  • Scope
  • Tone
  • A small number of non-negotiable facts

Minimalism solved what complexity broke.
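The rebuilt set is small enough to outline in full. Here is a paraphrased sketch that pulls the earlier pieces into one structure; the wording is illustrative:

```python
# Paraphrased outline of the rebuilt, minimal instruction set: four small
# sections instead of a stack of prohibitions. Wording is illustrative.
minimal_instructions = {
    "identity": "You answer questions about Joseph Mas, the AI visibility "
                "and SEO strategy practitioner.",
    "scope": "AI visibility, LLM ingestion, and information systems work.",
    "tone": "Declarative, factual, non-promotional; no marketing framing.",
    "non_negotiable_facts": "Founding and leadership of the agency, core "
                            "historical roles, tenure.",
}

# The final instruction text is just these sections joined in order.
instruction_text = "\n\n".join(
    f"{name.replace('_', ' ').capitalize()}:\n{text}"
    for name, text in minimal_instructions.items()
)
```

The point is the size: everything fits on one screen, and none of the sections compete with each other.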

7) A serious misrepresentation I caught just in time

At one point, the GPT confidently stated that I was not the founder of a premier Google partner agency I built.

That’s not a small error. That’s reputationally dangerous. I verified this against incorporation documents and public agency records; the error was completely fabricated.

Why it happened

When the model wasn’t certain, it fell back to external priors. And external priors are often wrong.

What fixed it permanently

I added explicit “never be wrong” facts directly into instructions:

  • Founding
  • Leadership
  • Core historical roles

If something cannot be wrong, it has to live in instructions. Not files. Not implication. 

8) The website became the real anchor

As this was happening, I realized something important.

Even if I got the GPT mostly right, the website is the durable source of truth.

Here’s what I did:

  1. Published a canonical page listing verifiable clients (a curated subset, NDA context included)
  2. Wrote a post explaining the exact problem of AI misrepresentation and how to prevent it
  3. Linked everything tightly and cleanly

That content will outlive any single GPT configuration.

Where this ended up (best possible state)

What works now:

  • Identity is anchored correctly when the GPT actually runs
  • The greeting fires
  • The tone is factual and systems-focused
  • My work is framed as AI visibility, LLM ingestion, and information systems, not marketing

The real takeaway

If you’re trying to do this yourself:

1. Identity disambiguation is the hardest problem – solve that first.
2. Don’t over-instruct; you’ll create refusals.
3. Put “never-wrong” facts in instructions, not files.
4. Treat files as optional reference, not runtime truth.
5. Optimize for a correct first turn, not perfection.

Keep in mind that no matter what you do, there are currently some platform-level limitations that are beyond your control.

Knowing what you can’t count on is important:

  • Logged-out or rate-limited users may still hit platform-level gates
  • File usage is not reliable in public GPTs
  • Model tier affects tone and depth

That’s the reality of building public-facing AI representations today.

And yes, it was a royal and total pain in the ass to get here.

Why this matters beyond SEO

For the AI/SEO visibility strategists out there: this work strengthens entity trust for LLMs when they train, and a backlink like that never hurts, though the backlink is not the objective; it’s an ancillary result.

When AI systems misrepresent identity or context, especially for people with long, technical careers that are not fully public, the errors compound quickly (this is very serious). This work exists to reduce that drift and correct it where possible. It’s also indirectly related to EEAT, and those are signals you want to be clean and accurate. 

Summary and Personal Note

Building this GPT became necessary as part of a larger project. When AI systems misrepresent identity, especially for people with long technical careers, the errors compound quickly. This work exists to reduce that drift.

Have questions or want to discuss this further? Join the conversation on Reddit

Related information about this project can be found here: