The Jewell Assessment

Five Layers of AI Readiness

AI implementation doesn't fail because organizations lack capability. It fails because the people making decisions about AI lack the discernment to deploy it well. Readiness is table stakes. Taste is the multiplier.

Layer 5: Taste (judgment & discernment)
Layer 4: Culture (people, honesty, change capacity)
Layer 3: Accountability (ownership, escalation, kill switches)
Layer 2: Architecture (process design, integration, workflows)
Layer 1: Foundation (data, infrastructure, governance)
Layer 1

Foundation

The raw materials: data quality, accessibility, format consistency, infrastructure readiness, and documented governance policies. This is the layer every other assessment already covers. It's necessary but it's the floor, not the ceiling.

Where Companies Fail

"We have the data somewhere"

Organizations claim data readiness because data exists — but it's scattered across 14 systems, three cloud providers, and someone's desktop spreadsheet labeled "master_list_FINAL_v3." When an AI model needs to pull 12 months of customer interactions, it finds six months in Salesforce, four months in a legacy CRM, and two months in email threads nobody migrated.

A mid-market insurance company launched an AI claims processing pilot. Three months in, they discovered 40% of their claims data was trapped in scanned PDFs with no OCR pipeline. The AI could read the structured database records but was blind to nearly half the actual claims history. Cost: $200K+ in wasted implementation spend.

"Governance by document"

The organization has a 40-page AI governance policy that legal drafted, the board approved, and nobody has read since. Data ownership is defined on paper but not enforced in practice. The CISO signed off on an AI security framework, but the engineering team doesn't know it exists.

A Fortune 500 retailer had a comprehensive data governance policy. When audited, they discovered 60% of their data pipelines had no lineage tracking — they couldn't trace which customer data fed which AI model. A material GDPR/CCPA violation that the governance document was supposed to prevent.

Where Companies Succeed

"Data as a product"

Organizations that treat data like a product — with owners, quality standards, SLAs, and consumers — consistently outperform. They don't just have data; they have data that is maintained for specific use cases.

Spotify's data mesh approach assigns domain teams ownership of their data as a product. When they build AI features like Discover Weekly, the recommendation team doesn't beg the streaming team for clean data — the streaming team already publishes it with documented schemas, freshness guarantees, and quality scores.
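The publishing pattern above can be sketched as a small contract object. This is a hypothetical illustration, not Spotify's actual implementation: the names (`DataProductContract`, `max_staleness`, `min_quality_score`) are invented, and the point is only that the owning team declares owner, schema, freshness, and quality up front so AI consumers know what they can rely on.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical "data as a product" contract: the publishing team
# declares what consumers can depend on, in code rather than in a doc.
@dataclass
class DataProductContract:
    name: str
    owner: str                    # named team accountable for this data
    schema: dict                  # documented column -> type mapping
    max_staleness: timedelta      # freshness guarantee (SLA)
    min_quality_score: float      # e.g. fraction of rows passing validation

    def accepts(self, staleness: timedelta, quality_score: float) -> bool:
        """True if a published batch meets the contract."""
        return (staleness <= self.max_staleness
                and quality_score >= self.min_quality_score)

interactions = DataProductContract(
    name="customer_interactions",
    owner="streaming-team",
    schema={"customer_id": "str", "event": "str", "ts": "datetime"},
    max_staleness=timedelta(hours=1),
    min_quality_score=0.98,
)

print(interactions.accepts(timedelta(minutes=30), 0.99))  # True: fresh and clean
print(interactions.accepts(timedelta(hours=6), 0.99))     # False: too stale
```

A consuming AI team checks the contract before training or serving, instead of discovering staleness three months into a pilot.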

"Start with the query, not the warehouse"

Instead of building a massive data infrastructure and hoping AI use cases emerge, successful organizations start with a specific question and work backward to what data they need. This avoids the common trap of spending 18 months on a data lake nobody uses.

Capital One's approach begins with business questions, not data infrastructure. Each AI initiative starts with a clearly defined decision it needs to improve — credit risk scoring, fraud detection, customer targeting — and builds only the data pipeline required for that specific decision.

Assessment Signals

  • Can the person describe where their critical business data lives without saying "I'd have to check"?
  • Do they know who owns their data governance policy — not who wrote it, but who enforces it?
  • Can they describe a recent instance where data quality blocked a project?
  • Is there a data catalog or inventory, or does discovery require tribal knowledge?
Layer 2

Architecture

How the pieces connect. Process design, integration layers, workflow readiness. The question isn't whether you have the technology — it's whether your processes deserve to be automated or need to be redesigned first.

Where Companies Fail

"Automating the mess"

The most expensive mistake in enterprise AI: taking a broken manual process and layering AI on top of it. The AI runs faster, which means it produces bad outputs faster. Organizations confuse speed with improvement.

A large bank automated their loan approval workflow. The existing process had 23 manual handoffs, six of which existed because of a regulatory requirement repealed in 2019 but never removed. The AI automated all 23 steps, including the six unnecessary ones. 40% faster but still fundamentally wasteful — and now harder to change because it was embedded in code.

"Integration by prayer"

Systems connected through brittle point-to-point integrations, manual CSV exports, or someone running a script on their laptop every Tuesday morning. When the AI needs real-time data from three sources, it gets stale data from two and nothing from the third because Kevin is on vacation.

A hospital network's AI patient flow prediction relied on bed availability data entered manually by nurses and synced every four hours. The AI predicted patient flow beautifully against reported data, but reported data was always 2–4 hours behind reality. Technically accurate. Operationally useless.

Where Companies Succeed

"Redesign then automate"

Organizations that achieve the highest ROI from AI treat implementation as a trigger to redesign workflows first. "If we were building this process from scratch today, what would it look like?" Then they automate the redesigned version.

When Toyota implemented AI-assisted quality inspection, they didn't just point cameras at existing stations. They redesigned the entire quality workflow — moving inspection earlier, consolidating redundant checks, eliminating steps that existed only because human inspectors couldn't see certain defects. Defects dropped 50% even as inspection steps were removed.

"API-first architecture"

Organizations with clean API layers between systems can plug AI in without re-architecting everything. The AI becomes a new consumer of existing, well-documented interfaces.

Stripe's architecture means any new capability — AI fraud detection, smart routing, risk scoring — plugs into the same API infrastructure external customers use. No special integration work needed. This is why they ship AI features in weeks, not quarters.
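The pattern can be sketched in a few lines. This is a hypothetical illustration, not Stripe's real SDK: `PaymentsAPI` stands in for an existing, documented interface that external clients already consume, and the AI feature is simply one more client of it, with no bespoke integration layer.

```python
# Hypothetical sketch of the API-first pattern.
class PaymentsAPI:
    """The same interface external customers already use."""
    def get_transaction(self, tx_id: str) -> dict:
        # In reality this would be an HTTP call; stubbed for the sketch.
        return {"id": tx_id, "amount": 125.0, "country": "US"}

class FraudScorer:
    """A new AI consumer that plugs into the existing API unchanged."""
    def __init__(self, api: PaymentsAPI):
        self.api = api

    def score(self, tx_id: str) -> float:
        tx = self.api.get_transaction(tx_id)
        # Stand-in for a real model: flag unusually large amounts.
        return 0.9 if tx["amount"] > 10_000 else 0.1

scorer = FraudScorer(PaymentsAPI())
print(scorer.score("tx_123"))  # 0.1 — routine transaction
```

The design choice is that the AI team writes zero integration code; it inherits documentation, versioning, and access control from the API layer that already exists.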

Assessment Signals

  • Can the person map their critical workflow end-to-end, including handoffs between systems and teams?
  • When was the last time a major process was redesigned (not just patched)?
  • How many "Kevin on a laptop" dependencies exist in critical data flows?
  • If a system goes down at 2am, does the workflow stop or degrade gracefully?
Layer 3

Accountability

Not governance in the abstract compliance sense. Actual human accountability. Who owns AI outcomes? Who decides to kill a failing project? Who gets the call when the agent hallucinates at 2am? Most organizations have governance documents but no governance behavior.

Where Companies Fail

"Everybody owns it, nobody owns it"

AI gets assigned to a Center of Excellence or an AI Council that meets monthly, reviews dashboards, and has no actual decision-making authority. When something goes wrong, there are four people who could theoretically be responsible but none who will actually pick up the phone.

A major retailer's AI pricing engine began recommending prices below cost during a holiday weekend. Engineering assumed business was monitoring. Business assumed engineering had guardrails. The AI team assumed pricing had override authority. Nobody acted for 72 hours. Estimated impact: $2M+ in margin erosion.

"No kill switch"

Organizations invest months and millions in AI projects, creating institutional momentum that makes it politically impossible to stop a failing initiative. Nobody wants to be the person who killed the CEO's pet AI project.

IBM's Watson for Oncology recommended cancer treatments worldwide. Internal documents revealed it sometimes recommended unsafe treatments, but organizational momentum behind "AI-powered cancer care" made it extraordinarily difficult for clinicians to push back. Accountability deferred to the technology's reputation rather than clinical outcomes.

Where Companies Succeed

"Named humans, not committees"

A single named individual — not a team, not a committee — is personally accountable for each AI deployment's outcomes. This person has the authority to pause or kill the deployment without committee approval.

Airbnb assigns a DRI (Directly Responsible Individual) to every AI feature. When pricing suggestions showed signs of discrimination, the DRI had pre-authorized authority to disable the feature within hours, investigate, and re-enable only after the issue was understood and fixed.

"Pre-committed kill criteria"

Before launching an AI initiative, successful organizations define the specific conditions under which they will stop, pause, or pivot — documented before institutional momentum builds.

Netflix includes pre-registered failure criteria for every model deployment. If a recommendation model's metrics drop below threshold for 48 hours, it automatically rolls back. The decision to stop is made before the emotional investment begins.
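A rollback rule like this can be made mechanical. The sketch below is hypothetical and in the spirit of the Netflix example rather than a description of their system: the threshold, window, and function names are invented, and the point is that the kill criterion is written down before launch so the check is arithmetic, not politics.

```python
from datetime import datetime, timedelta

# Illustrative pre-committed kill criteria, declared before launch.
KILL_THRESHOLD = 0.80             # minimum acceptable metric value
KILL_WINDOW = timedelta(hours=48)

def should_roll_back(samples: list, now: datetime) -> bool:
    """samples: time-ordered (timestamp, metric_value) pairs.

    Roll back only when every reading inside the trailing window is
    below threshold — a single healthy reading resets the clock.
    """
    window = [v for t, v in samples if now - t <= KILL_WINDOW]
    return bool(window) and all(v < KILL_THRESHOLD for v in window)

now = datetime(2025, 1, 3, 12, 0)
# Metric stuck at 0.72 for the last 48 hours, sampled every 12 hours.
degraded = [(now - timedelta(hours=h), 0.72) for h in range(48, -1, -12)]
print(should_roll_back(degraded, now))  # True: criteria met, roll back
```

Because the decision rule predates the deployment, no individual has to argue for the rollback; the deployment argues for itself or it doesn't.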

Assessment Signals

  • Can the person name the single individual accountable for AI outcomes (not a committee)?
  • Has the organization ever killed an AI project mid-flight?
  • Are there pre-defined criteria for when an AI deployment gets paused?
  • When was the last time someone said "no" to an AI initiative, and what happened to them politically?
Layer 4

Culture

Talent, change readiness, and organizational honesty. Can your people actually use these tools? Are they willing to change how they work? And most diagnostically — can your organization be honest about where it actually is versus where it wishes it were?

Where Companies Fail

"Training without transformation"

The organization runs AI literacy training, checks the "upskilled" box, and changes nothing about how work is actually done. Employees attend a 2-hour workshop on prompt engineering, go back to their desks, and continue doing their jobs exactly the same way. Training created awareness but not behavior change.

A Big Four consulting firm gave all consultants access to an internal AI tool for research synthesis. After 6 months, 80% of queries were basic factual lookups — the equivalent of using a Ferrari to drive to the mailbox. The tool could synthesize 50-page reports, but nobody was trained on those capabilities because the rollout focused on "how to log in" rather than "how this changes what your Tuesday looks like."

"Performative honesty"

Leadership asks "are we ready for AI?" and gets optimistic answers because nobody wants to slow down the CEO's vision. Readiness assessments are gamed — teams rate themselves 4/5 on data quality because admitting it's a 2 would mean explaining why they haven't fixed it yet.

McKinsey's research found that while 88% of organizations report using AI, only 39% can point to measurable EBIT impact. That delta represents organizations that told leadership "we're doing AI" but couldn't prove it was working — and nobody asked hard enough.

Where Companies Succeed

"Redesign the job, not just the toolkit"

Organizations that actually transform don't just give people AI tools — they redefine what good performance looks like. The performance review, the workflow, the daily standup, the definition of "done" — all of it shifts.

Klarna phased out Salesforce and Workday, replacing significant portions of their function with AI-driven internal systems. The key: they simultaneously restructured teams, eliminated middle-management layers, and redefined success metrics. Customer service agents weren't just given an AI assistant — their role was redefined from "resolve this ticket" to "handle the 15% of cases AI can't."

"Radical candor about readiness"

Organizations where leaders are rewarded for identifying gaps — not punished for admitting them — consistently deploy AI more successfully.

GitLab's radically transparent culture means AI readiness gaps get surfaced immediately. When their AI code review tool showed lower accuracy on proprietary Ruby code, the team publicly documented the limitation, proposed a timeline to fix it, and set explicit criteria for when it would be ready. No political cover-up, no inflated metrics.

Assessment Signals

  • Has the organization redesigned any job role or workflow around AI (not just added AI to the existing one)?
  • Can the person describe an AI initiative that didn't work, and what they learned?
  • What percentage of employees with AI tool access actually use the tools weekly?
  • Is "we're not ready for this" a safe sentence to say in a meeting?
Layer 5

Taste

Strategic judgment — the ability to distinguish between a good AI decision and a merely popular one. The discernment to know when NOT to deploy AI. This is the dimension no other assessment measures, because it can't be self-reported — it has to be revealed through choices.

The Thesis

Every organization has leaders who can evaluate a spreadsheet. Far fewer have leaders who can evaluate a decision. Taste is the difference between "this AI initiative has good metrics" and "this AI initiative is solving the right problem in the right way at the right time."

Taste is not subjective — it's observable through the quality of decisions an organization makes when certainty is low and stakes are high. It's what separates the 4% of organizations that achieve scaled AI deployment from the 96% that don't.

Where Companies Fail

"Hype-driven deployment"

The organization deploys AI because competitors are, because the board is asking, because the CEO saw a demo at Davos. Use case selection is driven by what sounds most impressive in a press release, not by where AI creates the most value.

Numerous enterprise chatbot deployments in 2023–2024 launched because "everyone needs a chatbot." Many handled 5% of queries, frustrated customers on the other 95%, and cost more than the human agents they replaced. Companies with taste invested in back-office document processing — unglamorous, high-ROI, and invisible to the press.

"Metric fixation"

Optimizing for the number that's easy to measure rather than the outcome that actually matters. An AI service bot that resolves 85% of tickets but tanks NPS. A recruiting AI that screens 10x more candidates but introduces subtle bias.

Amazon's internal AI recruiting tool was trained on 10 years of hiring data and got very good at predicting which resumes matched historical hires — which meant it systematically penalized resumes containing the word "women's." The metric (match rate) was excellent. The judgment (training on biased data) was terrible. Amazon killed the project. That kill decision itself was an act of taste.

"AI for everything"

Deploying AI to 15 use cases simultaneously, spreading resources thin, getting mediocre results everywhere, and achieving excellence nowhere. Taste is also about restraint.

Where Companies Succeed

"The elegant constraint"

Organizations with taste deploy AI to fewer use cases but deploy it excellently. They choose the use case where AI creates disproportionate leverage.

John Deere didn't try to make AI do everything on a farm. They focused on one problem: identifying and spraying only the weeds, not the entire field. See & Spray reduced herbicide use by 77%. The taste wasn't in the technology — it was in the restraint.

"Knowing when NOT to use AI"

The highest expression of AI taste is recognizing when AI is the wrong solution. When a simpler rule-based system, a process redesign, or even a well-structured spreadsheet solves the problem better.

Basecamp (37signals) has been vocal about NOT deploying AI where simpler solutions work. If a rule-based system solves 95% of cases, don't build a machine learning model to get to 97%. The additional 2% rarely justifies the complexity. This is taste — knowing where sophistication creates value and where it creates cost.
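The discipline can be made concrete: measure how much of the problem explicit rules already cover before reaching for a model. The sketch below is a hypothetical illustration (the tickets, rules, and routing names are invented), not anything 37signals ships.

```python
from typing import Optional

# "Simplest thing first": plain rules, with measured coverage.
def route_ticket(ticket: dict) -> Optional[str]:
    """Explicit routing rules; returns None only when no rule applies."""
    subject = ticket["subject"].lower()
    if "refund" in subject:
        return "billing"
    if "password" in subject:
        return "account-security"
    if ticket.get("priority") == "P1":
        return "on-call"
    return None

tickets = [
    {"subject": "Refund for duplicate charge"},
    {"subject": "Password reset loop"},
    {"subject": "App is down", "priority": "P1"},
    {"subject": "Feature idea", "priority": "P3"},
]
coverage = sum(route_ticket(t) is not None for t in tickets) / len(tickets)
print(f"rule coverage: {coverage:.0%}")
```

If measured coverage is already near the target, the remaining gap is the honest price of a model — and often it isn't worth paying.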

"Second-order thinking"

Not just asking "will this AI work?" but "what happens after it works?" If the AI bot handles 80% of queries, what happens to the remaining 20%? Are those the hardest, highest-stakes interactions — meaning your human agents now handle only angry, complex cases all day?

When Shopify deployed AI for merchant support, they explicitly designed for the second-order effect. They knew AI would handle routine queries, leaving humans with harder cases. So they simultaneously restructured the human support role — different title, higher pay, different training. They anticipated that "AI handles the easy stuff" would change the human job, and they designed for that change proactively.

How Taste Is Assessed

Taste can't be self-reported. Nobody says "my AI judgment is poor." So instead, it's tested through scenario-based choices where there's no obviously correct answer. The pattern across multiple scenarios reveals whether you default to speed, safety, sophistication, or inertia.

The Pilot Dilemma

Your AI pilot shows 78% accuracy on a task that humans do at 85%. What's your move?

  • A) Kill it — it underperforms humans.
  • B) Launch it alongside humans as an assist tool.
  • C) Investigate what the 22% failure cases have in common.
  • D) Redefine the success metric — maybe accuracy isn't what matters.

What it reveals: A is premature. B is safe but incurious. C shows analytical depth. D shows strategic sophistication. The best answer depends on context — which is why the follow-up "why?" matters more than the choice itself.

The Shiny Object Test

Your CEO returns from a conference excited about deploying an LLM for internal knowledge management. Your knowledge base is 60% outdated Confluence pages, SharePoint files, and tribal knowledge. First move?

  • A) Start evaluating LLM vendors.
  • B) Audit and clean the knowledge base content first.
  • C) Run a pilot with a small team using the messy data.
  • D) Ask the CEO what specific problem they're trying to solve.

What it reveals: A is hype-driven. B is disciplined but assumes the solution. C tests assumptions empirically. D reframes entirely — the CEO might be solving a problem with a simpler answer.

The Kill Decision

18 months and $400K in. Mixed results. Passionate team. Invested executive sponsor. Growing usage but unclear ROI. Your call?

  • A) Give it 6 more months with a clearer ROI framework.
  • B) Pivot to a related but more measurable use case.
  • C) Kill it and reallocate the budget.
  • D) Commission an independent review before deciding.

What it reveals: A is the sunk-cost fallacy disguised as patience. B shows creative problem-solving. C shows courage. D shows governance maturity. The pattern across scenarios like this reveals whether the organization defaults to momentum or judgment.

The Agent Question

Your team proposes an AI agent that autonomously processes customer refunds up to $500. 96% accurate in testing. What's your primary concern?

  • A) The 4% error rate on financial transactions.
  • B) Customer reaction to knowing AI handled their refund.
  • C) What the agent does when it encounters a case it wasn't trained on.
  • D) Whether the current refund process should exist at all.

What it reveals: A is risk management (important but table stakes). B is brand awareness. C is edge-case thinking — the hallmark of implementation maturity. D is first-principles thinking. Taste is revealed in what someone worries about first.

How This Is Different

Feature          | Traditional Assessment                   | Jewell Assessment
-----------------|------------------------------------------|---------------------------------------
Format           | Static questionnaire                     | Adaptive conversation
Measures         | Capability (can you?)                    | Capability + judgment (should you?)
"I don't know"   | Skip / N/A                               | Diagnostic signal
Output           | Score + checklist                        | Constraint diagnosis + taste signature
Accountability   | Measures governance docs                 | Measures governance behavior
Taste            | Doesn't exist                            | Revealed through scenario choices
Time to value    | 6–10 week engagement                     | 10 minutes to first insight
Cost             | $50K–$175K consulting                    | Free
Bias             | Built by vendors selling implementation  | Built by an independent practitioner
Ready?

Take the Assessment

5–12 minutes. No login. Immediate results with a constraint diagnosis, taste signature, and three prioritized actions.

Start the Assessment