Your test suite is green. CI passed. The mock responses look exactly like what the BaaS documentation says they should look like. You ship.
Three weeks later, your transaction approval rate drops 4%. Your reconciliation job starts producing gaps. A specific edge case — high-value transfers with multi-party metadata — starts rejecting in production at a rate you can't explain from your test logs. You spend two days on it before you find it: the provider updated their validation schema six days before your deploy. Your mocks never knew.
This is the central problem with embedded finance integration testing. Most teams test their code. Almost none test their integrations. The distinction sounds subtle. In production, it's the difference between a smooth operation and a 3 AM incident.
Why Mocks Lie
Mocks are good for unit tests. They're dangerous for integration testing because they encode assumptions — and in embedded finance, the assumptions expire.
When you write a mock for a BaaS provider response, you're capturing the API's behavior at a point in time. The mock doesn't know that six months later, the provider added a required field to high-value transactions. It doesn't know that the error code for a declined transaction changed from INSUFFICIENT_FUNDS to DECLINED_INSUFFICIENT_FUNDS. It doesn't know that the approval SLA shrank from 8 hours to 4 hours and now your timeout logic is wrong.
The mock faithfully returns the response you told it to return. Your test passes. The real provider returns something different. Production fails.
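To make that concrete, here is a minimal sketch of the renamed error code scenario. The handler `handle_decline`, the `submit_transfer` method, and the response shape are all hypothetical, invented for illustration; only the two error code strings come from the example above.

```python
from unittest.mock import Mock


def handle_decline(response: dict) -> str:
    """Map a provider decline to an app-level action (hypothetical logic)."""
    if response.get("error_code") == "INSUFFICIENT_FUNDS":
        return "prompt_top_up"
    # Anything the code does not recognize falls through to a generic failure.
    return "generic_failure"


# The mock was written when the provider still used the old code,
# so it keeps returning exactly what it was told to return.
stale_provider = Mock()
stale_provider.submit_transfer.return_value = {
    "status": "declined",
    "error_code": "INSUFFICIENT_FUNDS",
}

# Assumed shape of what the live provider returns after the rename.
live_response = {
    "status": "declined",
    "error_code": "DECLINED_INSUFFICIENT_FUNDS",
}

mock_result = handle_decline(stale_provider.submit_transfer(amount=50_000))
live_result = handle_decline(live_response)

print(mock_result)  # the path the test suite exercises
print(live_result)  # the path production actually takes
```

The test built on the mock exercises the top-up path and passes. The same handler, given the renamed code, silently takes the generic failure path, and nothing in the test suite ever sees it.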
This isn't hypothetical. It's the default state of most embedded finance codebases. Mocks get written once, committed, and never updated — because they're not connected to the thing they're mocking. They accumulate drift the same way the integration itself does, just invisibly, inside your test suite.
The scariest part isn't that mocks lie. It's that they lie consistently. Every test passes. The CI is green. Confidence is high. And the first time you know the mock was wrong is when production breaks.
What You Actually Need to Test
Integration testing in embedded finance isn't about covering more code paths. It's about validating the contract between your system and your provider. That means four distinct layers, each of which requires a different testing approach.
Schema Drift
Your BaaS provider's API has a schema: field names, types, required/optional designations, allowed values. That schema is a contract. When it changes, your integration breaks — or worse, silently degrades.
Testing for schema drift means running your integration against the live schema on a continuous schedule, not just during development. The test should compare the requests your code sends and the responses it expects against what the provider's current OpenAPI spec (or equivalent) says they should be. When there's a mismatch, you know before production does.
Practically: this means your test suite needs to pull the provider spec on every run, not use a cached copy. Schema tests that run against a spec file you checked in three months ago are schema tests that lie.
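A drift check can be small. The sketch below assumes the provider publishes an OpenAPI-style schema object with `required` and `properties` keys, and that your test run has already fetched the current version; the schema is inlined here so the example runs on its own. The field names (`beneficiary_reference` and so on) are made up for illustration.

```python
from typing import Any

# OpenAPI primitive type names mapped to the Python types that satisfy them.
_OPENAPI_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def find_schema_drift(schema: dict[str, Any], payload: dict[str, Any]) -> list[str]:
    """Return mismatches between a payload and an OpenAPI-style schema object."""
    problems = []
    properties = schema.get("properties", {})

    # Required fields the payload does not carry.
    for name in schema.get("required", []):
        if name not in payload:
            problems.append(f"missing required field: {name}")

    # Fields whose type no longer matches the spec.
    for name, value in payload.items():
        type_name = properties.get(name, {}).get("type")
        expected = _OPENAPI_TYPES.get(type_name)
        if expected is None:
            continue  # field or type not described in the spec; skipped in this sketch
        # bool is a subclass of int in Python, so reject it explicitly.
        bool_mismatch = isinstance(value, bool) and type_name != "boolean"
        if bool_mismatch or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {type_name}")
    return problems


# Stand-in for the schema fetched from the provider on this run.
current_schema = {
    "required": ["amount", "currency", "beneficiary_reference"],
    "properties": {
        "amount": {"type": "integer"},
        "currency": {"type": "string"},
        "beneficiary_reference": {"type": "string"},
    },
}

# The request body your integration builds today.
request_body = {"amount": "250000", "currency": "USD"}

drift = find_schema_drift(current_schema, request_body)
print(drift)
```

A real check would also walk nested objects, enums, and response schemas. The point is where the schema comes from: fetched on each run, never read from a file committed months ago.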
Latency and Rate Limits
Your code assumes the KYC verification endpoint responds in under 2 seconds. Your timeout is set to 3 seconds. In staging, you've never seen it take longer than 800ms.
Then your provider migrates to a new infrastructure tier. P99 latency goes from 800ms to 4.5 seconds for a subset of request types. Your timeouts start firing. Your error handling logic wasn't tested for this case — the mock always returned in under 10ms. You now have a cascading failure pattern you've never seen before.
Rate limit behavior is worse. Most teams only discover rate limits under production load. Mocks have no concept of rate limits. The test suite runs 50 requests sequentially and everything passes. Production sends 3,000 requests in 30 seconds during a batch job and gets throttled. Your retry logic — which also wasn't tested — starts hammering the endpoint and makes it worse.
Integration tests that don't exercise latency variance and rate limit paths are tests with a significant blind spot. You need to test: what happens when the endpoint is slow? What happens when you hit a rate limit? What does your system do when the response takes 10x longer than expected?
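One way to make the rate limit path testable is to inject both the transport and the sleep function, so a test can script a throttled provider without waiting in real time. This is a sketch under assumptions: the `send` callable returning `(status, headers, body)` is a hypothetical interface, the provider is assumed to signal throttling with HTTP 429, and `Retry-After` is assumed to be given in seconds rather than as an HTTP date.

```python
import random
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], tuple[int, dict, dict]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, dict]:
    """Call send() and retry on HTTP 429 with capped, jittered backoff."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts; do not sleep before giving up
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            # Respect the provider's hint (assumed to be in seconds).
            delay = float(retry_after)
        else:
            # Full jitter: random delay up to the exponential ceiling.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(min(delay, max_delay))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")


# Scripted provider: throttled twice, then succeeds.
responses = iter([
    (429, {"Retry-After": "2"}, {}),
    (429, {}, {}),
    (200, {}, {"ok": True}),
])
slept: list[float] = []  # records delays instead of actually sleeping

status, body = call_with_backoff(lambda: next(responses), sleep=slept.append)
print(status, body, slept)
```

The same injection point works for latency: a fake `send` that raises your HTTP client's timeout exception lets you assert what your system does when the endpoint takes ten times longer than expected.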
Error Codes and Edge-Case Responses
BaaS providers have rich error taxonomies. A transaction can fail for dozens of reasons: insufficient funds, compliance hold, KYC not completed, fraud flag, provider maintenance window, sponsor bank timeout. Each has a different error code. Each should trigger different behavior in your application.
Most integration test suites mock exactly two cases: success and generic error. The real provider returns eighteen distinct error states. Your code handles two of them gracefully. The other sixteen produce undefined behavior — usually a generic 500 passed to the end user, sometimes a silent data inconsistency, occasionally a hung operation that requires manual remediation.
You can't test all error paths with mocks alone, because you don't know what error codes the provider will return until you're in production and something breaks. The right approach: run against a sandbox environment that actually returns real error codes, then assert your handling at each path. Cross-reference your error handling logic against the provider's documented error taxonomy. Anything in the taxonomy that isn't tested is a gap.
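The cross-reference step is a set difference. In the sketch below, the error code names are placeholders derived from the failure reasons listed earlier, not any provider's real taxonomy; in practice the documented set would come from the provider's error reference and the tested set from your test suite.

```python
def untested_error_codes(documented: set[str], tested: set[str]) -> list[str]:
    """Return documented provider error codes that have no test coverage."""
    return sorted(documented - tested)


# Placeholder names for the provider's documented taxonomy (assumed).
DOCUMENTED_CODES = {
    "DECLINED_INSUFFICIENT_FUNDS",
    "COMPLIANCE_HOLD",
    "KYC_INCOMPLETE",
    "FRAUD_FLAG",
    "PROVIDER_MAINTENANCE",
    "SPONSOR_BANK_TIMEOUT",
}

# Codes your sandbox tests actually assert handling for (assumed).
TESTED_CODES = {"DECLINED_INSUFFICIENT_FUNDS", "KYC_INCOMPLETE"}

gaps = untested_error_codes(DOCUMENTED_CODES, TESTED_CODES)
print(gaps)
```

Failing the build whenever `gaps` is non-empty turns "anything in the taxonomy that isn't tested is a gap" from a review guideline into an enforced check.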
Staging vs. Production Parity
This is where most embedded finance testing strategies fall apart completely.
BaaS providers give you a sandbox. You test in the sandbox. The sandbox behaves like production — most of the time. But sandbox environments are maintained separately from production. They don't always get the same updates. Provider engineering teams push changes to production first, then sandbox. Sometimes the sandbox is weeks behind.
The result: your sandbox tests pass because they're running against old behavior. Production has already moved. By the time your tests catch the difference, you've already shipped to production and the break is live.
Worse: sandbox environments often have different rate limits, different KYC requirements, and different data seeding than production. A KYC check that's instant in sandbox because the test account is pre-approved takes 4 hours in production because it requires real document verification. Your staging tests don't reflect this. Every staging-to-production release is a step into uncertainty.
Closing the staging-production parity gap requires more than better tests — it requires continuous validation against production behavior, not just pre-deploy testing against a sandbox. The question isn't "does this pass in staging?" It's "is my production integration behaving the same way it was yesterday?"
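One hedged way to ask "same as yesterday?" is to reduce a response to its structural shape (field paths and value types) and diff today's shape against a stored one. The helpers `shape_of` and `diff_shapes` and the sample responses are illustrative, not part of any provider SDK.

```python
from typing import Any


def shape_of(value: Any, prefix: str = "") -> dict[str, str]:
    """Flatten a JSON-like value into a {field_path: type_name} mapping."""
    if isinstance(value, dict):
        shape: dict[str, str] = {}
        for key, child in value.items():
            shape.update(shape_of(child, f"{prefix}.{key}" if prefix else key))
        return shape
    if isinstance(value, list):
        # Describe a list by its first element; an empty list has unknown items.
        if value:
            return shape_of(value[0], f"{prefix}[]")
        return {f"{prefix}[]": "unknown"}
    return {prefix: type(value).__name__}


def diff_shapes(before: Any, after: Any) -> list[str]:
    """List structural differences between two responses, ordered by path."""
    old, new = shape_of(before), shape_of(after)
    changes = []
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            changes.append(f"removed: {path}")
        elif path not in old:
            changes.append(f"added: {path}")
        elif old[path] != new[path]:
            changes.append(f"type changed: {path} ({old[path]} -> {new[path]})")
    return changes


# Stored from yesterday's read-only probe (sample data).
yesterday = {"id": "txn_1", "status": "approved", "amount": 1200,
             "limits": {"daily": 5000}}
# Returned by today's probe of the same endpoint (sample data).
today = {"id": "txn_1", "status": "approved", "amount": "1200.00",
         "limits": {"daily": 5000, "per_transfer": 2500}}

changes = diff_shapes(yesterday, today)
print(changes)
```

Diffing shapes rather than values keeps the check stable across normal data changes while still flagging a field that appeared, disappeared, or changed type.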
The Continuous Validation Model
Pre-deploy testing is necessary but not sufficient. Embedded finance providers change behavior continuously — not just when you deploy. A provider schema update, a rate limit adjustment, a new error code taxonomy, a latency change — none of these are triggered by your deployment cycle. They happen on the provider's timeline.
The teams with reliable embedded finance operations have shifted from a test-before-deploy model to a continuous validation model. The difference:
- Test-before-deploy: You test when you're about to ship. Testing is reactive to your release cycle. Between releases, you have no visibility into whether your integration is still behaving correctly.
- Continuous validation: Your integration is validated on a schedule — hourly, daily, or per-transaction. When provider behavior changes, you know within minutes, not weeks. The test suite runs against live provider behavior, not cached mocks or sandbox environments.
Continuous validation catches what pre-deploy testing misses: drift that happens between your releases. It also gives you a behavioral baseline — you know what "normal" looks like for your integration (latency, approval rates, error distribution), so when something changes, the deviation is immediately visible.
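A behavioral baseline can start as two numbers and two thresholds. The sketch below is illustrative only: the `Baseline` fields, the tolerance defaults, and the sample figures are assumptions chosen for the example, and the percentile uses the simple nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass
class Baseline:
    p99_latency_ms: float
    approval_rate: float  # fraction between 0 and 1


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_against_baseline(
    baseline: Baseline,
    latencies_ms: list[float],
    approved: int,
    total: int,
    latency_tolerance: float = 1.5,    # alert if p99 exceeds 1.5x baseline
    approval_tolerance: float = 0.02,  # alert if approvals drop over 2 points
) -> list[str]:
    """Compare the latest window of traffic against the stored baseline."""
    alerts = []
    current_p99 = percentile(latencies_ms, 99)
    if current_p99 > baseline.p99_latency_ms * latency_tolerance:
        alerts.append(
            f"p99 latency {current_p99:.0f}ms vs baseline {baseline.p99_latency_ms:.0f}ms"
        )
    current_rate = approved / total
    if baseline.approval_rate - current_rate > approval_tolerance:
        alerts.append(
            f"approval rate {current_rate:.1%} vs baseline {baseline.approval_rate:.1%}"
        )
    return alerts


baseline = Baseline(p99_latency_ms=800, approval_rate=0.94)

# Latest window: mostly fast responses, a slow tail, and fewer approvals.
window_latencies = [300.0] * 98 + [4500.0, 4600.0]
alerts = check_against_baseline(baseline, window_latencies, approved=900, total=1000)
for alert in alerts:
    print(alert)
```

Run on a schedule against recent production traffic, a check like this surfaces a slow tail or a falling approval rate within one window instead of weeks later. The same pattern extends to error code distribution.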
This is harder to build than a standard test suite. It requires treating your integration like infrastructure: monitoring it, alerting on deviations, and maintaining a continuous picture of whether the contract between your system and your provider is intact. Most teams don't have the tooling to do this. They're still running manual integration tests, which means they're always one provider update away from a production incident they couldn't see coming.
What to Build vs. What to Buy
If you're building a rigorous embedded finance integration testing strategy from scratch, the honest breakdown looks like this:
Build internally: Unit tests with mocks for your own business logic. These test your code, not your integration, and that's the right scope. Mocks belong here — for testing whether your transaction approval logic correctly handles a "KYC not complete" response, not for testing whether the provider actually returns that response correctly.
Build internally, carefully: Sandbox integration tests that run against the actual sandbox environment before every deploy. These should cover the happy path, the documented error codes, and the edge cases in your provider's API spec. The risk is sandbox drift — these tests give you a signal, not a guarantee.
Hard to build, high value: Continuous production monitoring that validates your integration against real provider behavior on an ongoing schedule. This is the layer most teams don't have. It requires handling auth against production providers, running non-destructive read operations to validate behavior, diffing responses against expected schemas, and alerting when deviations appear. Building this correctly takes significant engineering time — and needs to be maintained as providers evolve.
Is your integration testing strategy production-grade?
Most embedded finance teams have unit tests and pre-deploy checks. Very few have continuous validation against live provider behavior. Here's what's worth asking your team:
- When did you last update your BaaS provider mocks to match the current API spec?
- How would you know if your provider changed their error code taxonomy last week?
- Do your staging tests reflect actual production rate limits and latency characteristics?
- What's your monitoring strategy for integration drift between releases?
Conduit provides continuous contract testing, behavioral monitoring, and schema drift detection — built for embedded finance teams who need production-grade integration validation without building it themselves.
The teams ahead of the curve on embedded finance reliability aren't shipping less often or testing more carefully before each deploy. They've changed the fundamental model: from point-in-time testing to continuous validation. The integration is always being observed. Drift is caught when it happens, not when it breaks something.
Your code can be perfect. The integration can still fail. Testing only the code means you've covered half the problem.