Three role families that did not exist on any org chart twelve months ago — agent infrastructure engineers, agent safety evaluators, and agent product managers — now command dedicated teams at Anthropic, OpenAI, Google DeepMind, and Microsoft, with open-role counts across those four employers up an estimated 290% year-over-year, per ENTRA open-role tracking across their Greenhouse boards, Q2 2025 vs. Q2 2026. They are the fastest-growing hiring sub-segment in the AI labor market as of June 2026, and the candidates who understand precisely how these roles differ from existing tracks — and what they actually pay — hold the sharpest information edge in the current market.
Section 1 — The Three New Roles
The term "agent engineering" has been applied loosely enough to lose meaning. Precision matters here: the three emerging role families are structurally distinct from each other and from the existing AI engineering taxonomy. Conflating them — as many job postings currently do — is the reason the supply-demand mismatch in this segment is steeper than the headline numbers suggest.
Agent Infrastructure Engineers build the scaffolding that autonomous systems run on. They are not model researchers. They do not fine-tune weights or publish training runs. Their domain is tool-use integration (connecting agents to APIs, databases, and external services), memory architecture (episodic, semantic, and working memory systems that persist context across long-horizon tasks), and planning-loop reliability (the orchestration layer that decides when an agent should call a sub-agent, when it should request human review, and when it should terminate). The skill set overlaps with senior backend and distributed-systems engineering more than it overlaps with ML research — but it requires fluency with model behavior that a pure backend engineer does not have. The canonical prior-role trajectory is: Senior Software Engineer with LLM API experience → Staff-level role building internal agent frameworks → Agent Infrastructure Engineer at a lab or applied AI team. The role did not exist as a distinct title two years ago. It now represents the single largest cluster of new open roles in the agent layer, accounting for an estimated 44% of agent-specific postings reviewed by ENTRA across the top four employers through May 2026.
Agent Safety Evaluators occupy a distinct niche from alignment researchers. Alignment research — the domain of Anthropic's Alignment Science team, DeepMind's AGI Safety group, and OpenAI's Superalignment team — operates at the level of training objectives, value specification, and fundamental model behavior. Agent safety evaluation operates at the deployment layer: it is adversarial red-teaming of agents running in production or near-production environments, assessing what happens when an agent with tool access, memory, and a multi-step planning horizon is subjected to prompt injection, goal misgeneralization, tool-misuse attacks, and emergent harmful behavior at the task level. The profile here is distinct from both a pure researcher and a traditional security engineer. The most competitive candidates have a combination of ML engineering background (enough to understand model internals and attack surfaces), red-teaming methodology (drawn from AI safety research or adversarial ML), and the operational security instincts to think about threat models at the system level rather than the model level. Anthropic's Frontier Red Team and OpenAI's preparedness function have both published role criteria that explicitly distinguish this profile from alignment research. The supply of candidates who meet all three criteria simultaneously is thin.
Agent Product Managers are the hardest role to describe precisely because the job is still being invented in real time. They are not traditional PMs who write product requirement documents for engineering teams. They are not AI engineers who informally own roadmaps. They sit at the intersection of three competencies that are rarely combined in a single candidate: deep technical fluency with how agents behave at the system level (not model theory, but actual agent behavior in production), product instinct about what enterprise and consumer users need from autonomous workflows, and the communication capacity to translate agent failure modes into product decisions that non-technical stakeholders can act on. The clearest analogue is the developer-relations PM role that emerged in API-era tech companies — someone who speaks both languages. Microsoft's Copilot Studio team has been the most explicit about codifying this role, with a job architecture that places Agent PMs on the same seniority ladder as Principal PMs but with distinct evaluation criteria that include agent-specific technical assessments. At Anthropic, the equivalent function sits within the Claude product organization and shares characteristics with both product management and AI engineering. The role is currently under-hired across the industry relative to demand: there are approximately 2.7 agent-infrastructure and agent-safety openings for every agent-PM opening at the four major employers tracked by ENTRA — but nearly every senior leader interviewed by ENTRA for this analysis named agent PM hiring as the harder fill.
Section 2 — Cross-Lab Comparison
The four organizations most aggressively building out the agent talent stack are operating from meaningfully different starting positions, hiring at different velocities, and competing for overlapping but not identical candidate pools.
Anthropic has the most mature agent-specific org structure of any frontier lab. The Claude Agents team — which oversees tool use, computer use, and long-horizon task execution capabilities — has grown from an estimated 18 engineers and researchers in January 2026 to approximately 47 as of June 2026, per LinkedIn tracking and public role data reviewed by ENTRA. The growth reflects both organic hiring and internal transfers from the broader Claude product engineering org. Anthropic's agent infrastructure roles are listed under the Claude Engineering umbrella and require explicit experience with multi-step task orchestration and stateful agent systems — the job descriptions are among the most technically precise in the market, which functions as a de facto screening mechanism that narrows the applicant pool but improves conversion rates. Agent safety evaluation at Anthropic sits within the Frontier Red Team, which the company has described publicly as a distinct function from Alignment Science. Headcount in that team is not disclosed, but open-role data shows at least 6 active evaluator postings as of late May 2026, consistent with a team of 15–25.
OpenAI is hiring for the agent stack at greater volume but with less organizational formalization. The Operators team — the internal designation for the function building OpenAI's autonomous-workflow and agent-API infrastructure — is estimated at 55–70 engineers and PMs as of H1 2026, per ENTRA LinkedIn analysis, growing from a smaller combined function that did not have a distinct team identity before Q3 2025. The agent safety function at OpenAI sits within the Preparedness team, which was formalized in late 2023 and has continued to grow; current estimates, per ENTRA LinkedIn analysis, place the team at 30–40 researchers and evaluators. OpenAI's agent PM profile is the most product-manager-classic of the four organizations: the role emphasizes external developer experience and enterprise workflow design more than internal research collaboration, reflecting OpenAI's commercial-first positioning in the agent market.
Google DeepMind's Gemini Agents org sits at the intersection of DeepMind's research culture and Google's product deployment infrastructure — a combination that makes it structurally unlike either of its main US competitors. The Gemini Agents team, based across Mountain View, London, and Zurich, has built its agent infrastructure talent strategy around Research Engineer profiles rather than the pure-software-engineering profiles that Anthropic and OpenAI favor. The E6–E7 Research Engineer band at Google ($262K–$395K base; $420K–$650K TC including RSU) is the primary entry point for agent infrastructure roles, with the expectation that candidates have both systems-engineering depth and enough ML fluency to reason about model behavior in agentic contexts. The cross-site structure — with planning-system work in Mountain View, memory architecture research in London, and safety evaluation distributed across both — creates a hiring footprint that is harder to track from the outside. ENTRA estimates the combined Gemini Agents technical headcount at 80–110 across sites as of June 2026.
Microsoft's Copilot Studio team is the applied-AI outlier in this comparison. It is the only one of the four primarily a product-deployment organization rather than a research organization with a product function attached — and that distinction shapes its agent talent stack entirely. Copilot Studio's agent infrastructure is built on Azure AI infrastructure rather than on lab-built scaffolding, which means the infrastructure engineers it hires are closer to the senior platform-engineer profile than to the research-engineer profile. The team has grown from approximately 4,200 in January 2026 to an estimated 6,800 as of June per ENTRA LinkedIn analysis, though that figure encompasses the entire Copilot product organization rather than the agent-specific function. Within Copilot Studio, the agent-layer headcount is concentrated in three areas: tool-orchestration engineering (the Azure integration layer), enterprise safety and compliance (which maps roughly to the agent safety evaluator function but with a compliance-engineering rather than adversarial-testing orientation), and agent PM, which Microsoft has most explicitly articulated as a distinct role family. The comp structure reflects Microsoft's position as a large-cap tech employer rather than a frontier lab: Copilot Studio agent infrastructure engineers enter at the L63–L65 band ($190K–$310K TC), materially below equivalent agent-infrastructure roles at Anthropic or OpenAI.
Section 3 — The Comp Math
The agent-layer comp structure has clarified enough in H1 2026 to publish specific bands — and the picture that emerges is nuanced in ways that the simple "agents pay less than research" summary obscures.
The foundational bifurcation is real. A senior agent infrastructure engineer at a frontier lab — L6 equivalent at Anthropic or OpenAI, E6 at Google — earns $350K–$480K TC. The equivalent research-track role (Member of Technical Staff at Anthropic, Researcher L4–L5 at OpenAI, Research Scientist RS2–RS3 at DeepMind) earns $480K–$740K TC at the same seniority level. The $130K–$260K gap between agent infrastructure and frontier research at the senior level is the agent-layer analogue of the broader research-engineering bifurcation documented in this publication's May 2026 comp analysis.
But the gap is narrowing faster in the agent-infrastructure segment than anywhere else in the applied-engineering spectrum. The $350K–$480K TC band for senior agent infrastructure engineers in June 2026 represents an approximately $40K–$70K increase from the equivalent roles twelve months earlier, when comparable positions — where they existed at all — sat at $290K–$420K TC. The growth rate of agent-infrastructure comp outpaces the growth rate of frontier research comp over the same period by approximately 15–20 percentage points. This is a supply-constraint signal, not a deliberate market-design outcome: the combination of thin candidate supply, high employer urgency, and the organizational novelty of the role (which means candidates can credibly negotiate on the basis of scarcity) has pushed agent-infrastructure comp upward faster than either the research track or the broader applied-engineering track.
| Role | Track | Level Equiv. | TC Range (USD) | YoY Change | |---|---|---|---|---| | Agent Infrastructure Engineer | Applied / Systems | L6 / E6 | $350K–$480K | +$55K median | | Agent Safety Evaluator | Safety / Applied Research | L5–L6 | $380K–$520K | +$60K median | | Agent PM | Product | L6 / E6 | $290K–$400K | +$45K median | | Frontier Research Scientist | Research | L5–L6 | $480K–$740K | +$30K median | | Applied / Product Engineer | Applied | L5–L6 | $360K–$540K | +$25K median |
Sources: Levels.fyi Q1–Q2 2026; 6figr 2026; ENTRA recruiter survey Q1 2026; open-role market analysis. TC includes base, equity grant-date value, and performance cash. YoY change reflects median TC movement from June 2025 to June 2026.
The agent safety evaluator band deserves specific attention. At $380K–$520K TC for senior roles, agent safety evaluation commands a $30K–$80K premium over agent infrastructure engineering at equivalent seniority and sits below the frontier research band. The premium reflects the dual scarcity: the candidate must combine ML engineering competence with adversarial security methodology, a pairing that the market has priced above either discipline held in isolation.
Agent PMs command the lowest absolute TC of the three new role families at $290K–$400K. This does not reflect lower organizational priority — it reflects the fact that the PM labor market, even in AI-specialized form, has not experienced the same supply compression as engineering and safety evaluation tracks. The candidates who can fill agent PM roles draw from a broader population (experienced product managers who develop technical AI fluency) than the candidates who can fill agent infrastructure or agent safety roles (engineers who develop product instincts, or safety researchers who develop deployment fluency). That larger supply pool suppresses the comp premium relative to what scarcity alone would price.
Section 4 — What's Next
Three signals will determine the shape of the agent talent stack through H2 2026 and into 2027.
The agent-infrastructure title is about to standardize. Right now, the role appears on job boards under at least fourteen distinct titles: Agent Systems Engineer, LLM Infrastructure Engineer, Agentic Frameworks Lead, Autonomous Systems Engineer, Tool-Use Platform Engineer, and variants thereof. This nomenclature fragmentation is typical of a role category that has not yet achieved sufficient market mass to generate uniform language. The signal to watch is whether one of the four major employers — most likely Anthropic or OpenAI, which have the most public-facing engineering brands — codifies a standard title that the broader market adopts. When that happens, the candidate pool will self-identify more accurately, applicant tracking will improve, and the supply-demand tracking that currently requires manual aggregation will become legible in LinkedIn and Levels.fyi data. ENTRA expects title standardization to begin in Q3 2026 and largely complete by Q1 2027.
The university pipeline is misaligned with the demand curve. Agent infrastructure engineering draws from computer science and systems engineering programs whose curricula do not yet treat multi-agent orchestration, stateful system design for LLM-based workflows, or production agent monitoring as core graduate competencies. The programs closest to producing candidate-ready graduates — CMU's ML Systems track, MIT's System Design and AI program, Stanford's AI Lab and DAWN group — are producing an ENTRA-estimated combined cohort of 200–300 graduates annually with relevant profiles, against an estimated 1,800–2,400 open roles at the agent-infrastructure level across the full market per ENTRA open-role tracking. The gap will not close from the university supply side before 2028 at the earliest. In the interim, the fastest route into the role family is internal transfer from distributed systems or backend engineering roles with LLM API experience — the path that a growing number of senior engineers at the four major employers are now taking deliberately.
The enterprise market is the next demand wave. The agent stack hiring described in this analysis is concentrated at frontier labs and applied AI organizations. The second-order demand wave — enterprise AI teams at financial services firms, healthcare systems, and professional services organizations that are deploying agent workflows on top of Anthropic's Claude, OpenAI's Operators, and Google's Gemini APIs — is beginning to generate agent-layer hiring that will dwarf lab-side demand by volume, if not by comp. Enterprise agent safety evaluation, in particular, is a nascent function at regulated-industry firms that have begun to recognize that deploying autonomous agents in compliance-sensitive environments requires dedicated adversarial testing capability that general information-security teams are not equipped to provide. The comp bands will be lower than the frontier-lab figures cited above — enterprise agent infrastructure engineers are likely to enter at $220K–$310K TC outside of financial services, with bulge-bracket banks paying closer to $280K–$380K — but the hiring volume will be substantially larger.
The agentic AI layer is not a research project or a roadmap item. It is a functioning talent market with measurable comp bands, supply-demand imbalances, and a hiring velocity that has outpaced every other AI sub-segment in H1 2026. The labs and applied AI organizations that move fastest to formalize these role families — with precise job architectures, competitive comp bands, and clear career-ladder positioning relative to both the research track and the applied-engineering track — will lock in the candidates who define what autonomous AI systems can do in production. The organizations that treat agent roles as applied-engineering variations will spend H2 2026 filling the same positions twice.
