Inside Box’s AI Strategy: Aaron Levie on Agents, Interoperability, and the Future of Work

In this fast-paced and insight-packed episode, Box CEO Aaron Levie joins the pod to unpack how one of the most established enterprise content platforms is reinventing itself around AI. From building intelligent agents on top of 20 years of enterprise data, to navigating model interoperability and the coming age of agent-to-agent ecosystems, Aaron delivers a masterclass in what it really takes to go from SaaS to AI-native.
We cover everything from Box’s internal AI culture and their refusal to join the model arms race, to his vision for agent-based interoperability and the pitfalls of chaining probabilistic systems. Whether you’re building the next AI workflow tool or wondering how incumbents can stay relevant, this is one conversation you don’t want to miss.
“95% of enterprise data is underutilized. AI agents let us finally activate it.” — Aaron Levie
Tune in for sharp insights, candid strategy, and a look into the AI-powered future of enterprise software. PS: Also check out Box's origin story.
Part One: The State of AI in the Enterprise
🔁 From Hype to Adoption: Where Are We, Really?
Aaron frames the state of enterprise AI adoption using the “Crossing the Chasm” model:
“We’re in the ascent on the early pragmatist to pragmatist side.”
That means we’ve moved beyond early enthusiasts to practical, risk-managed implementations.
But—and this is key—not all AI use cases are at the same point. You have to think about AI adoption category by category:
Use Case | Adoption Stage |
---|---|
AI coding (e.g., GitHub Copilot) | ✅ Fully in pragmatist growth mode |
Document RAG (retrieval-augmented generation) | 🚀 Early pragmatist, climbing fast |
AI outbound sales reps | 🔬 Still early adopters only |
Multi-step agents with chain-of-thought | ⚠️ Experimental, error-prone |
Levie says:
“You can’t just say ‘AI’ as a monolith. You need to know what problem and what workflow you’re applying it to.”
🧪 Proof-of-Concept vs. Deployment: What's Actually Live?
Enterprise interest is high. Executives are leaning in, experimenting actively:
“Most companies I meet with are like, ‘How many use cases can I apply this to?’”
However, a reality check is also needed:
- Many deployments are still in pilot or proof-of-concept phase.
- The biggest blockers aren’t a lack of belief, but technical feasibility, data quality, and governance.
For instance:
“If you just deploy a really good agent across a 5,000-person company... they’ll start surfacing corporate secrets by accident because of bad permissions.”
This is why search, indexing, and data governance are still critical AI infrastructure problems.
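To make the permissions point concrete, here's a minimal sketch of retrieval that filters by the querying user's access rights before anything reaches a model. All class and function names here are hypothetical, not Box's API:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: set = field(default_factory=set)  # groups permitted to read this doc

def permission_filtered_search(query, index, user_groups):
    """Return only documents the querying user may see, ranked by relevance.

    The access check happens BEFORE ranking, so a badly-permissioned doc
    can never leak into the model's context window.
    """
    visible = [d for d in index if d.allowed_groups & user_groups]
    # Toy relevance: query-term overlap. A real system would use
    # semantic / vector search here.
    terms = set(query.lower().split())
    return sorted(visible,
                  key=lambda d: len(terms & set(d.text.lower().split())),
                  reverse=True)

index = [
    Document("d1", "Q3 revenue forecast and board notes", {"finance", "exec"}),
    Document("d2", "Employee handbook vacation policy", {"all-staff"}),
]

# A support rep asking about revenue gets nothing sensitive back.
results = permission_filtered_search("revenue forecast", index, {"all-staff"})
```

The point of the ordering is Levie's warning: if the filter runs after (or never), the agent happily surfaces the board notes.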
⚖️ Why AI ≠ Cloud (But It Rhymes)
Levie contrasts AI adoption with the last big transformation: cloud computing. With cloud, there were years of resistance from banks and enterprises.
With AI?
“There’s no philosophical resistance like with cloud. The energy is different—everyone wants in.”
Why the difference?
- ChatGPT gave employees and execs a personal “aha” moment.
- A new generation of employees expects AI tools as part of how they work.
- Top-down (CEO) and bottom-up (end-user) pressures are aligned—unlike in previous tech waves.
📊 Can You Charge More Just Because It's AI?
Levie is crystal clear on this:
“You won’t grow faster because there’s a magical AI premium on your software.”
There is no pricing multiplier just for having AI features. Instead, growth comes from expanding into new workflows and new TAM.
✅ You can charge more if:
- AI unlocks new use cases that weren’t previously software-driven.
- AI automates previously manual processes (e.g., contract review, sales prep).
❌ You cannot charge more if:
- You're just layering AI on top of existing features and expecting higher price points.
- You assume users will pay a premium “because it’s AI.”
“Wall Street got ahead of themselves for three months thinking you could just raise prices because it seems cool. That was ill-fated.”
🔁 Product & GTM Adjustments in the AI Era
AI doesn’t just change products—it shifts go-to-market motions and business models:
- Land-and-expand motions are supercharged with agents.
- You can enter more verticals without building full vertical SaaS apps (if agents are modular).
- Pricing may shift to usage-based or task-based in some contexts, but Box still monetizes primarily via seat-based pricing plus add-ons.
🌍 Why Open, Not Walled Garden, Wins
Levie repeatedly emphasizes openness and composability:
“We want to be the best place for companies to manage content—but we don’t need to own the interface. Just let the agents talk.”
That includes:
- Supporting multiple foundation models (OpenAI, Anthropic, Gemini, etc.)
- Offering API-first agent infrastructure for internal or external developers
- Enabling agent-to-agent interoperability across SaaS products
He compares this to the rise of REST APIs 20 years ago—agent-to-agent communication may follow the same pattern.
⚠️ The Real Bottlenecks
Even if enterprises are hyped, the following issues are still big friction points:
- Search quality and relevance: Bad search = bad AI.
- Governance and permissions: Especially in agent workflows.
- Agent chaining errors: Compounding probabilistic logic can go sideways fast.
- Lack of reviewable output: If users can’t validate, trust breaks.
“AI will only be as good as the data you feed it... and we still live in a world of messy search.”
Part Two: AI Strategy and Product at Box
Box’s AI strategy is a top-down, system-level orchestration of AI services, layered on top of two decades of enterprise content management. It's not about building foundational models, but about activating underutilized enterprise data using composable agents, powerful developer tools, and targeted interfaces.
Aaron Levie sums it up:
“What gets me excited about agents is the expansion of what people can do with software — solving use cases we just never ended up prioritizing before.”
🏗️ The Box AI Architecture: A Layered System
1. The Foundation: Decades of Enterprise Content
Box already powers content for over 100,000 organizations — this includes:
- Versioned documents
- Permission layers
- Governance and compliance frameworks
- In-browser preview tech
This existing infrastructure becomes the launchpad for AI workloads. For example, Box’s file viewer — originally designed to render documents — already extracted text and PDF previews, which is now reused for embedding generation and document parsing.
“That conversion engine we built to view Word docs in your browser? That’s now how we extract text for embeddings.”
2. AI Platform Layer
Box has built a robust AI platform that includes:
- Text Extraction: From Office files, PDFs, contracts, etc.
- Embedding Generation: Applied to selected corpora rather than universally, due to cost (Box stores hundreds of billions of files).
- Vector Storage: Enabling semantic search and retrieval.
- Model Abstraction Layer: Integrates multiple LLMs (OpenAI, Anthropic, Gemini); customers can plug in their own API keys.
- Agent Framework: Users can configure model instructions + tools = primitive agent setup.
“We don’t care if the customer uses the UI or just APIs. It’s all about making our platform AI-native and developer-ready.”
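The model abstraction layer described above can be sketched in a few lines: one interface, swappable providers, customer-supplied keys. The provider names and completion signature here are illustrative, not Box's actual SDK:

```python
# Sketch of a model abstraction layer: one interface, swappable providers.
# Provider names and the completion signature are illustrative, not Box's API.

class ModelRouter:
    def __init__(self):
        self._providers = {}

    def register(self, name, complete_fn):
        """complete_fn(prompt, api_key) -> str. Real adapters would wrap
        each vendor SDK (OpenAI, Anthropic, Gemini) behind this signature."""
        self._providers[name] = complete_fn

    def complete(self, provider, prompt, api_key=None):
        if provider not in self._providers:
            raise ValueError(f"unknown provider: {provider}")
        return self._providers[provider](prompt, api_key)

router = ModelRouter()
# Stub adapters stand in for real vendor calls.
router.register("stub-a", lambda prompt, key: f"[A] {prompt}")
router.register("stub-b", lambda prompt, key: f"[B] {prompt}")

# Customers "bring their own key" by passing it per call.
print(router.complete("stub-a", "Summarize this contract", api_key="sk-customer"))
```

Because the router owns nothing model-specific, swapping in a breakthrough model is a new adapter, not a rewrite — which is exactly the flexibility Levie is optimizing for.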
3. Box Hubs: Targeted, Scoped RAG Interfaces
Hubs are a UI and data construct where customers can query specific document sets (e.g., HR policies, earnings reports).
Why this approach works:
- Data scope is curated: Customers upload relevant docs.
- User queries are topic-bound: HR questions in the HR Hub.
- Reduces hallucination, increases precision.
“We’re cheating in two ways: the data is constrained and the intent is constrained. That eliminates 95% of the traditional RAG problems.”
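The "constrained data + constrained intent" idea can be sketched as a scoped retrieval step. Everything here (the Hub class, the prompt format) is hypothetical, but it shows why a Hub query never leaves the curated set:

```python
# Minimal sketch of a scoped ("Hub"-style) RAG query. The Hub class and
# prompt format are invented for illustration; the point is that retrieval
# never leaves the curated document set.

class Hub:
    def __init__(self, name, documents):
        self.name = name
        self.documents = documents  # curated by the customer, not org-wide

    def retrieve(self, question, k=2):
        # Toy relevance: term overlap stands in for vector search.
        terms = set(question.lower().split())
        scored = sorted(self.documents,
                        key=lambda d: len(terms & set(d.lower().split())),
                        reverse=True)
        return scored[:k]

def answer(hub, question, llm):
    context = "\n".join(hub.retrieve(question))
    # Both the data (hub docs) and the intent (topic-bound question)
    # are constrained, which is what keeps hallucination low.
    return llm(f"Context:\n{context}\n\nQuestion: {question}")

hr_hub = Hub("HR Policies", [
    "Vacation policy: 20 days per year, accrued monthly.",
    "Remote work policy: hybrid, three days in office.",
])
reply = answer(hr_hub, "how many vacation days per year", lambda p: p)
```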
4. Box AI Studio + API Access
- AI Studio: GUI for configuring custom agents (models, tools, prompts).
- API Suite: Enables external developers to:
  - Query Hubs
  - Run agents on documents
  - Pull summaries, extracted data, or filtered results
  - Integrate Box AI into their own apps
“You can say: ‘Here’s the hub ID, here’s the agent I want to use — now go.’”
This setup positions Box as a headless content AI infrastructure, not just a SaaS tool.
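As a rough illustration of the "here's the hub ID, here's the agent — now go" call, here is a hypothetical request builder. The endpoint path and field names are invented for illustration, not the real Box API:

```python
import json

# Hypothetical request builder for a "headless" Box AI call. Endpoint path,
# field names, and agent config shape are illustrative, not the real API.

def build_hub_query(hub_id, agent_id, question):
    return {
        "url": f"https://api.example.com/ai/hubs/{hub_id}/ask",
        "body": json.dumps({
            "agent": agent_id,          # which configured agent to run
            "prompt": question,         # the user's question
            "response_format": "text",  # summaries, extractions, etc.
        }),
    }

req = build_hub_query("hub-123", "contract-reviewer", "List renewal dates")
# The caller's own HTTP client sends this request — no Box frontend
# required, which is what "headless content AI infrastructure" means.
```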
🧠 Agent Use Cases: Not Just Replacing People — Enabling the Impossible
Levie emphasizes: AI’s real power is not just replacing manual labor, but unlocking work that previously wasn’t possible due to time, cost, or resource constraints.
Some examples:
- Legal: Parse and extract contract clauses at scale → inform renewal strategy
- Sales: Pull insights from sales decks and proposals → improve personalization
- M&A: Review thousands of diligence docs → detect risks and duplications
- Support: Query product manuals and technical docs → power self-service bots
“For 95% of enterprise data, there’s value we just never tapped into. Agents let us unlock that value.”
🤝 Open by Design: Box’s Multi-Model Philosophy
Box does not train its own LLMs, nor does it fine-tune models. Instead, it:
- Supports leading LLMs (OpenAI, Anthropic, Gemini, with Meta and xAI planned)
- Allows customers to plug in their own models via API keys
- Builds only lightweight scaffolding (like embedding improvements or prompt templates)
“We considered building a model for about 10 minutes... why would we want to enter that war?”
“If someone ships a breakthrough model, we want to adopt it — not be locked into something we built six months ago.”
Even fine-tuning is discouraged internally. Levie’s team focuses on model flexibility, not lock-in or over-customization.
🧪 Innovations: E-RAG and Future Direction
Box has already developed internal enhancements like E-RAG — an “Enhanced RAG” that improves entity extraction and retrieval precision for enterprise docs. These are infrastructure accelerators, not product silos.
“The goal is not to outdo the model companies. It’s to scaffold around the best models with the best content context.”
🔮 Vision: Enterprise Agent Ecosystem
In the long term, Box envisions a composable agent ecosystem, where:
- Agents in Box talk to agents in Salesforce, Workday, Snowflake, etc.
- Each system retains domain authority, but interoperates via shared agent protocols
- Developers choose between traditional APIs or agent-level handoffs
“It’s like REST APIs 20 years ago. We’re going to see the same explosion in agent-to-agent communication.”
✅ Strategic Takeaways
Area | Strategic Choice |
---|---|
Foundation Model Strategy | Stay out of model wars. Plug into best-in-class. |
Product Approach | Build orchestration, not algorithms. Enable use cases, not raw infrastructure. |
Customer Flexibility | API-first, model-agnostic, scoped AI interfaces. |
Innovation Focus | Enhanced retrieval, smart agents, practical workflows. |
Long-Term Vision | Agent-based ecosystem across the enterprise stack. |
Part Three: Building an AI Culture at Box
Box didn’t just add AI features — they went all in. The company undertook a full cultural shift, turning AI from a product initiative into a company-wide operating principle. This required mindset shifts, structural changes, and leadership alignment across every department.
“We’ve told everybody in the company that we want to use AI to be as productive as possible and aggressively use it across the business.”
🧭 Step 1: Founder-Led Conviction
Aaron Levie experienced his “ChatGPT moment” just like millions of others — but as a CEO and founder, it triggered immediate strategic action.
“The moment I realized I could copy-paste a document and ask questions... I thought, ‘This will change how we work forever.’”
Despite having had earlier access to tools like the GPT-3 playground, it wasn’t until the ChatGPT interface simplified the experience that the lightbulb went off. He jumped into “founder mode,” evangelizing the opportunity internally — but this wasn’t a solo act.
🔗 Leadership lock-in:
- CTO (from a Box-acquired company) aligned quickly and took ownership.
- CPO and engineering leads joined the charge.
- The initial pitch was modest — “just some API integrations” — but momentum quickly snowballed.
“We said it would be lightweight... now it’s the biggest team in the company.”
🛠️ Step 2: Organizational Rollout
Once the core AI team was formed, Box started to infuse AI thinking into the rest of the company, department by department.
How they did it:
- Dedicated AI team built foundational tooling (e.g., Hubs, Studio, model orchestration).
- Weekly internal all-hands demos showcased how employees across functions were using AI in their workflows.
- Encouraged bottom-up experimentation across product, marketing, support, legal, etc.
This wasn’t just a centralized R&D effort — it became everybody’s job.
“Every sales rep, every support rep, every engineer — everyone now plays a role in our AI strategy.”
🧠 Step 3: Normalize Everyday AI Usage
Rather than treat AI as a high-level strategic layer, Box embedded it into daily workflows:
Internal tools widely adopted:
- Box AI (their own product): Used by teams to summarize, query, and create content inside Box Notes and Hubs.
- GitHub Copilot: Became widespread among engineering teams.
- Claude: Integrated for various creative and research tasks.
- Cursor: Rolled out for VS Code users.
“By volume, Box AI might be our most used AI tool internally, maybe even more than Copilot.”
AI at Box wasn’t reserved for product managers or engineers — support teams, sales reps, and marketers were using it too.
🌱 Step 4: Cultivate AI Fluency Company-Wide
Tobi Lütke, Shopify’s CEO, posted a memo about mandating AI usage company-wide — Aaron praised it and said Box took a similar approach.
“Tobi’s memo hit all the right notes — we’re doing much of the same: internal demos, shared learnings, pushing people to experiment.”
Key aspects of Box’s internal AI culture:
- AI education is informal but constant — via demos, Slack sharing, and public wins.
- Permission to play — employees are encouraged to test, prompt, experiment.
- No AI priesthood — everyone, from entry-level to execs, is expected to build fluency.
⚖️ Cultural Tension: Founder Excitement vs. Organizational Skepticism
Levie is self-aware about the risks of overhyping trends. As a founder who regularly gets excited about new tech, he knew he had to prove this wasn’t “just another VR moment.”
“You have to do this filtering: is this just founder hype, or a company-level pivot?”
That’s why early traction, prototypes, and use cases were critical. He positioned AI as a “code red” opportunity — not a shiny new toy, but a shift in how the company would work, build, and sell going forward.
🔄 AI Work = Everyone’s Work
AI is now part of the fabric of Box’s internal strategy. Every department has a role to play in building, testing, or adopting AI:
Function | AI Role |
---|---|
Engineering | Build internal tools, integrate external models, test developer agents |
Product | Design UX around Hubs, agents, and enterprise workflows |
Sales | Use Hubs to extract customer insights, generate proposals |
Marketing | Leverage Box AI + Claude for content ideation and automation |
Support | Build answer bots from internal product documentation |
Legal/Compliance | Ensure data governance, permissions, and safe agent deployment |
This wasn't a departmental initiative. It was organizational transformation.
🧩 Summary: The Playbook for AI Culture at Box
Principle | How Box Made It Happen |
---|---|
Founder-led urgency | Levie and execs pushed hard after ChatGPT’s launch |
Central team, open APIs | Built platform components, but encouraged usage across org |
“Use AI aggressively” mandate | Company-wide permission and expectation to integrate AI |
Internal demos > top-down lectures | Weekly all-hands featured cross-functional AI wins |
Empowerment, not control | No gatekeeping — anyone can try tools, give feedback |
Phased scaling | Started lean, now Box’s AI org is the largest internal team |
Part Four: The Future of Enterprise Agents
Aaron Levie believes the future of enterprise AI isn’t just about better models — it’s about how systems talk to each other. The next big wave, in his view, is not just using AI in isolation, but embedding agents deeply into the enterprise software fabric so they can collaborate across tools, vendors, and data systems.
“We imagine a world where agents can run around and talk to each other. And Box is one of those agents.”
🧩 The Big Shift: From Monoliths to Modular Agent Ecosystems
Today, enterprises often build siloed AI features into their platforms — chatbots here, summaries there, smart filters elsewhere. But that model won’t scale. Levie believes we’re heading toward:
- Composable software systems, where AI agents act on behalf of users and apps.
- Horizontal data access, where content and knowledge live across tools (Box, Salesforce, Workday, etc.).
- Agent-to-agent workflows, where one app doesn’t own the entire workflow, but participates in it.
“We don’t need to own the interface — we just want to be the best place for content to be used in these workflows.”
🧠 Conceptual Framework: Agents as System-Level Actors
Levie sees agents as a natural abstraction on top of software APIs. Instead of writing brittle integrations, imagine this:
- A Box Agent handles content queries: “Give me all contracts signed in the past year.”
- A Salesforce Agent adds metadata: “These deals closed above $1M.”
- A Workday Agent tags employees involved in those deals.
Each of these agents communicates asynchronously, respects permissions, and works on scoped tasks.
He references the analogy of REST APIs in the early 2000s:
“It’s following the same curve we saw with APIs 20 years ago. Agents will become the new API surface.”
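The three-agent example above can be sketched as a chain of scoped handoffs, with each stub agent enriching a shared payload. Real agents would call their own system's APIs and enforce their own permissions; everything below is illustrative:

```python
# Stub version of the Box -> Salesforce -> Workday handoff described above.
# Each "agent" works on a scoped task and enriches the payload it receives.

def box_agent(payload):
    # "Give me all contracts signed in the past year."
    payload["contracts"] = [{"id": "c1", "customer": "Acme"},
                            {"id": "c2", "customer": "Globex"}]
    return payload

def salesforce_agent(payload):
    # "These deals closed above $1M." -- annotate, don't replace.
    deal_sizes = {"c1": 1_500_000, "c2": 400_000}
    for c in payload["contracts"]:
        c["above_1m"] = deal_sizes[c["id"]] > 1_000_000
    return payload

def workday_agent(payload):
    # Tag employees involved in the qualifying deals.
    payload["owners"] = {c["id"]: "account-team"
                         for c in payload["contracts"] if c["above_1m"]}
    return payload

result = workday_agent(salesforce_agent(box_agent({})))
```

Note that no single system owns the workflow: each agent adds only what its domain knows, which is the composability Levie is describing.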
🏗️ Enabling the Future: What Box Is Building
To prepare for this ecosystem, Box is doing two things:
1. Plug-and-Play Agent Infrastructure
- Every Box Hub can be accessed via API.
- You can query a Hub with an agent of your choice.
- Box AI doesn’t require its own frontend — you can build your own.
“We don’t care if the customer comes through our UI or just uses the agent in their own system.”
2. Agent Development APIs
- Box offers developer APIs that let third-party systems:
  - Trigger agents
  - Query content via Box Hubs
  - Integrate into workflows outside of Box (e.g., internal dashboards or vertical SaaS)
This enables composable automation — agents that can be assembled like LEGO blocks to complete work across platforms.
🛑 Interop Challenges: Where Things Break Today
While the vision is compelling, Levie is realistic about the technical and operational challenges:
- Search reliability: If an agent pulls the wrong data (due to poor retrieval), every step after is tainted.
- Permissions and access control: AI can surface sensitive data never meant to be discoverable.
- Agent chaining errors: With each handoff, probabilistic errors can compound — leading to hallucinated outputs.
- Lack of shared standards: There's no universal "language" or protocol for how agents should describe capabilities or hand off tasks.
These are non-trivial engineering challenges that need to be solved before agentic systems can scale reliably.
“The moment the AI finds the wrong thing, you're path-dependent on that error. Now everything downstream is wrong.”
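The compounding problem is easy to quantify: if each step in a chain is independently correct with probability p, the whole n-step chain is correct with probability p^n, so reliability falls off fast even when individual steps look good:

```python
# If each step in an agent chain is independently right with probability p,
# the whole chain is right with probability p**n.

def chain_reliability(p, n):
    return p ** n

# A 95%-accurate step looks fine in isolation...
print(round(chain_reliability(0.95, 1), 3))   # 0.95
# ...but ten chained handoffs drop below 60%.
print(round(chain_reliability(0.95, 10), 3))  # 0.599
```

This assumes independent errors; in practice a wrong retrieval early on makes every downstream step wrong, which is the path-dependence Levie describes.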
🔧 Agent Standards: The Role of MCP and Beyond
Levie calls out Anthropic’s MCP (Model Context Protocol) initiative as a potential catalyst:
“God bless Anthropic for putting MCP out early... someone needed to plant the flag.”
MCP proposes a standard protocol that agents could use to describe their capabilities, communicate context, and pass along tasks — like an API contract, but for LLM agents.
He’s optimistic, but also acknowledges:
- Competing frameworks will emerge.
- It may take 2–3 years before robust agent handoff becomes mainstream.
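As a purely illustrative sketch — field names invented, not taken from the MCP spec — an agent "capability card" plus a contract check on a handoff might look like this:

```python
import json

# What a machine-readable capability description might look like, loosely
# in the spirit of MCP-style tool definitions. Every field name here is
# illustrative and not taken from any published spec.

capability = {
    "name": "search_contracts",
    "description": "Find contracts in a Hub matching a natural-language query",
    "inputs": {
        "hub_id": {"type": "string"},
        "query": {"type": "string"},
    },
    "outputs": {
        "documents": {"type": "array", "items": {"type": "string"}},
    },
}

def validate_call(cap, args):
    """Reject a handoff whose arguments don't match the declared inputs --
    the agent-world analogue of an API contract check."""
    missing = set(cap["inputs"]) - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return True

validate_call(capability, {"hub_id": "hub-123", "query": "renewals"})
print(json.dumps(capability["inputs"], indent=2))
```

The value of a shared standard is exactly this: any agent can read the card, know what the other side accepts, and fail loudly instead of hallucinating through a malformed handoff.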
🛣️ Timeline and Adoption Curve
Levie sees the future playing out much like the evolution of cloud APIs:
Milestone | Equivalent in Agent Ecosystem |
---|---|
REST APIs become common | Agents define and expose capabilities |
API documentation becomes standard | Agents describe skills, inputs, outputs |
API calls become orchestrated | Agents chain workflows intelligently |
“Two years from now, I think we’ll feel totally comfortable that all our software can talk to each other agentically.”
⚖️ Developer Tradeoffs: API vs. Agent
As this new paradigm emerges, developers will face key architectural choices:
- Should I write direct API calls for precision and cost control?
- Or use agents to orchestrate tasks in a more flexible but fuzzy way?
Levie’s prediction:
“A new generation of vibe coders will think the world is all MCPs... but we’ll need to be thoughtful about when you use agents vs. raw APIs.”
He warns against over-agentifying tasks that would be simpler with deterministic APIs. A hybrid model — agent for intelligence, API for precision — will likely dominate.
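The hybrid model can be sketched as a tiny router: tasks with a deterministic answer go straight to an API call, while open-ended tasks go to an agent. Task names and handlers here are illustrative:

```python
# Sketch of the hybrid pattern: deterministic work goes straight to an API
# call; fuzzy, open-ended work is routed to an agent. The task taxonomy and
# handlers are illustrative.

DETERMINISTIC_TASKS = {"get_file", "list_folder", "update_metadata"}

def direct_api(task, args):
    return f"api:{task}({args})"    # precise, cheap, testable

def agent(task, args):
    return f"agent:{task}({args})"  # flexible, probabilistic, costlier

def route(task, args):
    handler = direct_api if task in DETERMINISTIC_TASKS else agent
    return handler(task, args)

print(route("get_file", "id=42"))                     # exact lookup -> API
print(route("summarize_quarter_risks", "hub=legal"))  # open-ended -> agent
```

The design choice is the one Levie predicts will dominate: agents for intelligence, raw APIs for precision, with an explicit boundary between the two.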
🧠 Strategic Positioning: Box’s Role in the Ecosystem
Box isn’t trying to be everything — it’s focused on being the best content agent in a network of interoperable systems:
- They won’t build vertical SaaS apps (e.g., contract management or legaltech).
- But they will expose content agents to help those tools access Box data reliably.
- They welcome deep integrations, even with “competing” AI stacks.
“There’s always going to be more developers outside your company than inside — don’t rely only on internal innovation.”
(Referencing Bill Joy’s quote about Sun Microsystems)
🧭 Summary: The Interoperable Agent Future
Pillar | Strategy |
---|---|
Architecture | Build agents on top of content and workflows — not UIs or monoliths |
Openness | Support third-party models, developer APIs, and plug-ins |
Vision | Become a modular agent in a larger enterprise AI ecosystem |
Interoperability | Bet on open protocols like MCP for agent-to-agent handoff |
Pragmatism | Use agents where appropriate, retain API-first options where better |