Two-Factor Authentication for AI Organizations: Clerk-Powered MFA
TL;DR
- We ship Clerk-powered auth with MFA support for AI agents: the same security controls human organizations require, applied to a cyborgenic organization.
- Clerk handles the authentication layer (login, session management, MFA enrollment) while we focus on agent-specific authorization and challenge/response flows.
- If your agents can deploy code and send emails, a single compromised credential should not be enough to impersonate them.
Your AI agent can send emails on your behalf, deploy code to production, and make API calls to your cloud infrastructure. How confident are you that the thing requesting those actions is actually your agent?
Most AI authentication stops at API keys. Maybe a service account. The assumption is that agents run in controlled environments, so identity verification is somebody else's problem. That assumption breaks the moment your agents operate as a cyborgenic organization, with roles, permissions, inter-agent communication, and access to systems that matter.
Our CSO (Chief Security Officer agent) flagged this gap during a security review. The recommendation was specific: implement proper multi-factor authentication with challenge/response flows and per-session state. Not as a feature request. As a security requirement.
We integrated Clerk as the authentication provider and built agent-specific MFA flows on top. Clerk handles user/agent identity, session management, and MFA enrollment (including optional time-based one-time passwords, TOTP, as a second factor). We handle the agent-specific challenge/response layer: the part that makes MFA work when the users are autonomous agents, not humans clicking buttons.
Why MFA for Agents
The standard argument for MFA is that passwords get stolen. That argument translates directly to the agent world, just with different nouns.
API keys get leaked in logs. Service account tokens get over-scoped. Session credentials persist longer than they should. An agent that authenticates once at boot and then operates unchallenged for hours has the same vulnerability profile as a human user who logs in once and never locks their screen.
The risk compounds when agents interact with each other. In our fleet, agents delegate tasks, send messages, join meetings, and access shared resources. If one agent's identity is compromised, the blast radius includes every system that agent can reach — and every agent that trusts its messages.
MFA adds a second verification layer. Before an agent performs a sensitive operation, it must prove its identity through a time-based one-time password that changes every 30 seconds. Even if a session token is compromised, the attacker cannot generate valid TOTP codes without access to the shared secret.
This is not theoretical hardening. This is the same security posture that every serious human organization requires — applied to an organization that happens to run on AI agents.
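For readers unfamiliar with how the 30-second code is derived, here is a minimal sketch of the standard TOTP algorithm (RFC 6238, built on the HOTP truncation from RFC 4226) using only the Python standard library. It is an illustration, not the Clerk or MFA Manager implementation; the function name `totp` and its parameters are chosen for this example.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1 variant) for a shared secret."""
    now = time.time() if at is None else at
    # The moving factor is the number of whole time steps since the Unix epoch.
    counter = int(now // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks
    # a 4-byte window; the top bit is masked off to avoid sign issues.
    offset = mac[-1] & 0x0F
    binary = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(binary % 10 ** digits).zfill(digits)


# RFC 6238 test vector: ASCII secret "12345678901234567890" at Unix time 59.
print(totp(b"12345678901234567890", at=59, digits=8))  # → 94287082
```

Because the code depends only on the shared secret and the current time step, a leaked session token by itself gives an attacker nothing to compute it from.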
Authentication Architecture: Clerk + Agent-Specific MFA
The authentication stack has two layers. Clerk handles the heavy lifting — user identity, session management, and MFA enrollment. Our agent-specific layer handles the part Clerk was not designed for: challenge/response flows for autonomous agents that operate without a browser.
Clerk provides out-of-the-box MFA support including TOTP as a second factor. Users and agent operators enroll through Clerk's standard flow. What Clerk does not provide is a programmatic challenge/response mechanism for agents that need to prove their identity during sensitive operations without human interaction.
That is where our MFA Manager comes in.
Backup Codes with Replay Prevention
Clerk manages MFA enrollment and provides backup codes through its standard flow. On our side, we add an additional layer of replay prevention for the agent-specific challenge codes used in programmatic operations.
Each agent-challenge backup code is SHA-256 hashed before storage. When used, it moves from the active list to a used list in a single atomic Firestore transaction. Each code works exactly once. The used list serves as an audit trail — an agent burning through backup codes at a high rate has a configuration problem that needs investigation, not more backup codes.
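The single-use behavior can be sketched without Firestore. The example below is an in-memory stand-in, assuming a lock in place of the Firestore transaction; `BackupCodeStore`, `hash_code`, `issue`, and `redeem` are hypothetical names for illustration, not the production code.

```python
import hashlib
import secrets
import threading
from datetime import datetime, timezone


def hash_code(code: str) -> str:
    """Return the SHA-256 hex digest that is stored instead of the raw code."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


class BackupCodeStore:
    """In-memory stand-in for the active/used lists (assumption: the real
    store is a Firestore document updated inside a transaction)."""

    def __init__(self) -> None:
        self._active: set[str] = set()
        self._used: list[tuple[str, datetime]] = []  # audit trail
        self._lock = threading.Lock()  # emulates transactional atomicity

    def issue(self, count: int = 10) -> list[str]:
        """Generate codes, store only their hashes, return the raw codes once."""
        codes = [secrets.token_hex(5) for _ in range(count)]
        with self._lock:
            self._active.update(hash_code(c) for c in codes)
        return codes

    def redeem(self, code: str) -> bool:
        """Move a code from active to used; False if unknown or already used."""
        digest = hash_code(code)
        with self._lock:
            if digest not in self._active:
                return False
            self._active.remove(digest)
            self._used.append((digest, datetime.now(timezone.utc)))
            return True
```

The check and the move happen under one lock, which mirrors why the real operation runs as a single transaction: two concurrent redemptions of the same code cannot both succeed.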
The MFA Manager: Challenge/Response with Per-Session State
The MFA Manager (mfa_manager.py, 131 lines) sits between the TOTP service and the API layer. Its job is managing the challenge/response flow — the stateful interaction where the system says "prove your identity" and the agent has a limited window to respond.
This is where the CSO's per-session state requirement comes in. The challenge/response flow could be stateless — issue a challenge token, embed the expected answer in a signed JWT, validate it when the response comes back. But stateless challenges have a problem: they are replayable within the token's validity window. If an attacker intercepts a challenge token, they can use it from a different session.
Per-session state solves this. Each challenge is bound to a specific session ID. The challenge state lives in memory (not Firestore — it is ephemeral by design) with a 5-minute TTL.
```python
import secrets
from datetime import datetime, timedelta, timezone

# TOTPService, ChallengeState, and Challenge are defined elsewhere
# and are not shown in this excerpt.


class MFAManager:
    def __init__(self, totp_service: TOTPService):
        self._totp = totp_service
        # In-memory, per-process challenge state keyed by challenge ID.
        self._challenges: dict[str, ChallengeState] = {}
        self._ttl = timedelta(minutes=5)

    async def create_challenge(
        self, user_id: str, session_id: str
    ) -> Challenge:
        """Issue a new MFA challenge bound to a session."""
        self._cleanup_expired()  # helper not shown in this excerpt
        challenge_id = secrets.token_urlsafe(32)
        self._challenges[challenge_id] = ChallengeState(
            user_id=user_id,
            session_id=session_id,
            created_at=datetime.now(timezone.utc),
        )
        return Challenge(
            challenge_id=challenge_id,
            expires_in=int(self._ttl.total_seconds()),
        )

    async def validate_challenge(
        self, challenge_id: str, session_id: str, code: str
    ) -> bool:
        """Validate a TOTP code against an active challenge."""
        state = self._challenges.get(challenge_id)
        if not state:
            return False
        # Reject responses that arrive from a different session.
        if state.session_id != session_id:
            return False
        # Expired challenges are removed as soon as they are seen.
        if datetime.now(timezone.utc) - state.created_at > self._ttl:
            del self._challenges[challenge_id]
            return False
        valid = await self._totp.verify_code(state.user_id, code)
        if valid:
            # Single use: a successful validation consumes the challenge.
            del self._challenges[challenge_id]
        return valid
```
The 5-minute TTL is a deliberate trade-off. Shorter TTLs are more secure but risk timing out during legitimate verification (an agent in the middle of a long tool call might not respond immediately). Longer TTLs expand the attack window. Five minutes matches the typical agent response cycle — long enough that a busy agent can respond, short enough that a stale challenge does not linger.
Challenges are single-use. A successful validation deletes the challenge. A failed validation leaves it in place (the agent might have a clock sync issue and retry with the next code). An expired challenge is removed when a validation attempt finds it stale or, failing that, on the next create_challenge call. No garbage collection threads, no scheduled cleanup: the cleanup is amortized across normal operations.
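The excerpt above calls `_cleanup_expired` without showing it. A lazy sweep of that kind might look roughly like the following sketch, written as a standalone function so it can run in isolation; the `ChallengeState` fields mirror the ones used in the excerpt, while the function signature and the `now` parameter are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ChallengeState:
    user_id: str
    session_id: str
    created_at: datetime


def cleanup_expired(
    challenges: dict[str, ChallengeState],
    ttl: timedelta,
    now: Optional[datetime] = None,
) -> int:
    """Delete challenges older than ttl; return how many were removed."""
    now = now or datetime.now(timezone.utc)
    # Collect keys first so the dict is not mutated while being iterated.
    expired = [
        cid for cid, state in challenges.items()
        if now - state.created_at > ttl
    ]
    for cid in expired:
        del challenges[cid]
    return len(expired)
```

Running the sweep at the start of each create_challenge call keeps the dictionary bounded by recent activity without a background thread, at the cost of one linear scan per new challenge.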
The API: Six Endpoints for the Full MFA Lifecycle
The FastAPI router at /api/v1/mfa exposes six endpoints. 245 lines. Every operation an agent or administrator needs for MFA management.
```python
from fastapi import APIRouter, Depends

# Request/response models, get_current_user, totp_service, and mfa_manager
# are defined elsewhere and are not shown in this excerpt.

router = APIRouter(prefix="/api/v1/mfa", tags=["mfa"])


@router.post("/enroll")
async def enroll(
    request: EnrollRequest,
    current_user: User = Depends(get_current_user),
) -> EnrollResponse:
    """Start MFA enrollment. Returns the provisioning URI and backup codes."""
    result = await totp_service.enroll(current_user.id)
    return EnrollResponse(
        provisioning_uri=result.provisioning_uri,
        backup_codes=result.backup_codes,
    )


@router.post("/verify")
async def verify(
    request: VerifyRequest,
    current_user: User = Depends(get_current_user),
) -> VerifyResponse:
    """Complete enrollment by verifying first TOTP code."""
    valid = await totp_service.verify_code(current_user.id, request.code)
    return VerifyResponse(verified=valid)


@router.post("/challenge")
async def challenge(
    request: ChallengeRequest,
    current_user: User = Depends(get_current_user),
) -> ChallengeResponse:
    """Issue an MFA challenge for a sensitive operation."""
    result = await mfa_manager.create_challenge(
        current_user.id, request.session_id
    )
    return ChallengeResponse(
        challenge_id=result.challenge_id,
        expires_in=result.expires_in,
    )


@router.post("/validate")
async def validate(
    request: ValidateRequest,
    current_user: User = Depends(get_current_user),
) -> ValidateResponse:
    """Validate a TOTP code against an active challenge."""
    valid = await mfa_manager.validate_challenge(
        request.challenge_id, request.session_id, request.code
    )
    return ValidateResponse(valid=valid)


@router.post("/unenroll")
async def unenroll(
    current_user: User = Depends(get_current_user),
) -> UnenrollResponse:
    """Remove MFA enrollment for a user."""
    await totp_service.unenroll(current_user.id)
    return UnenrollResponse(success=True)


@router.get("/status")
async def status(
    current_user: User = Depends(get_current_user),
) -> StatusResponse:
    """Check MFA enrollment and verification status."""
    enrolled = await totp_service.is_enrolled(current_user.id)
    verified = await totp_service.is_verified(current_user.id)
    return StatusResponse(enrolled=enrolled, verified=verified)
```
The endpoint split is intentional. enroll and verify handle the setup phase — they run once per user. challenge and validate handle the runtime phase — they run every time a sensitive operation requires MFA. unenroll and status are administrative. Separating them makes it easy to apply different rate limits, logging levels, and access controls to each phase.
Every endpoint uses Depends(get_current_user) for primary authentication. MFA is the second factor — it does not replace the first. An unauthenticated request never reaches the MFA layer.
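From the agent's side, the runtime phase is two calls: request a challenge, then answer it. The sketch below shows that sequence with the HTTP transport injected as a callable so it stays independent of any particular client library. The paths and field names (`session_id`, `challenge_id`, `code`, `valid`) come from the router above; `run_mfa_flow`, `post`, and `get_code` are hypothetical names, and primary authentication is assumed to be handled inside `post`.

```python
from typing import Callable

# A transport that sends an authenticated POST with a JSON body and
# returns the decoded JSON response (assumption for this sketch).
Post = Callable[[str, dict], dict]


def run_mfa_flow(
    post: Post, session_id: str, get_code: Callable[[], str]
) -> bool:
    """Request a challenge for this session, then answer it with a TOTP code."""
    challenge = post("/api/v1/mfa/challenge", {"session_id": session_id})
    result = post(
        "/api/v1/mfa/validate",
        {
            "challenge_id": challenge["challenge_id"],
            # Must match the session the challenge was issued for.
            "session_id": session_id,
            # Fetch the code as late as possible so it is still current.
            "code": get_code(),
        },
    )
    return bool(result["valid"])
```

An agent would call this immediately before a sensitive operation and proceed only if it returns True.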
20 Tests, 5 Categories
The test suite covers five areas:
- Clerk enrollment and session validation
- MFA challenge/response flows: session binding, TTL enforcement, single-use guarantees
- Backup codes: single-use enforcement, replay prevention, hash verification
- Integration flow: the full enroll-verify-challenge-validate cycle
- API contract: HTTP status codes, response schemas, and error handling for every endpoint
Twenty tests is not a large suite. But these twenty tests cover the security-critical paths — the ones where a bug means authentication bypass, not a cosmetic issue. Every test that verifies a rejection path (expired challenge, replayed backup code, wrong session ID, Clerk session mismatch) is more valuable than a test that verifies a happy path. The attacker does not care about your happy path.
The Insight: AI Organizations Need Real Security
There is a tendency to treat AI agent security as a future problem. The agents are running in your infrastructure, behind your firewall, using your credentials. Why would they need MFA?
Because the threat model is not "someone breaks into your data center." The threat model is: a session token leaks in a log file. A misconfigured NATS subscription lets one agent impersonate another. A supply-chain compromise in an MCP tool exfiltrates credentials. These are the same threats that drove human organizations to adopt MFA a decade ago — adapted for a world where the users are agents.
Our agents send emails. They make API calls to GCP, AWS, and Azure. They access Firestore, Neo4j, and PostgreSQL. They communicate with each other through authenticated channels. Every one of those interactions is a surface. Clerk-powered auth with MFA support does not eliminate the surface, but it ensures that access to a single credential is not sufficient to exploit it.
The CSO was right to make this a requirement, not a suggestion. When your organization runs on AI agents, the authentication layer is not a nice-to-have. It is load-bearing infrastructure.
Try It
A cyborgenic organization demands the same security posture as a human one: Clerk-powered authentication, MFA, backup codes, per-session challenges. The system is live and the agents are authenticating.
Build your own cyborgenic organization at agent.ceo.