CURP integration API — best practices for production deployment and error handling?

manuel_architect opened this thread · 2 replies

curp-api integration production error-handling

Question

manuel_architect Asker

We have selected a CURP integration API provider and now need to implement it properly in our microservices architecture. Our platform is a multi-tenant SaaS serving three different financial institutions in Mexico, each requiring CURP validation as part of their customer onboarding. Combined volume is approximately 5,000 CURP lookups per day with peaks of 200 concurrent requests during business hours.

I'm looking for advice on production-hardening a CURP integration API. Our first attempt was naive — direct synchronous calls with no caching, no circuit breaking, and a 30-second timeout. When the provider had a brief outage last week, our entire onboarding service became unresponsive because all threads were blocked waiting for CURP responses.

What I want to implement:

  • Circuit breaker pattern — if the CURP integration API fails X times in Y seconds, stop calling it and return a degraded response
  • Response caching — cache successful CURP lookups for 24-48 hours since the underlying data rarely changes
  • Async processing — for non-blocking use cases, queue the CURP lookup and process it asynchronously
  • Rate limit awareness — respect the provider's rate limits and distribute calls evenly using a token bucket
  • Multi-provider failover — if primary CURP integration API is down, failover to a secondary provider
  • Idempotent retries — retry transient failures without double-counting against our quota

Our tech stack is Java/Spring Boot with Redis for caching and RabbitMQ for async messaging. The CURP integration API provider uses bearer token auth with tokens that expire every 4 hours. We're deploying on Kubernetes in a Mexico City region data center.

Specific questions:

  1. What circuit breaker thresholds work well for CURP integration API calls? (failure count, time window, recovery probe interval)
  2. Is 24 hours too aggressive for cache TTL on CURP data? What if someone's CURP status changes?
  3. How do you handle the case where cached data shows "valid" but the CURP has since been revoked?
  4. Any recommendations for monitoring and alerting on CURP integration API health?

We want to be confident that our system degrades gracefully rather than cascading failures to other services. The CURP validation is critical but it shouldn't bring down the rest of the onboarding pipeline if the external dependency is temporarily unavailable.

Answers

sofia_sre

For CURP integration API resilience, here's what works well at our scale (similar volume to yours):

Circuit breaker config: We use Resilience4j with a sliding window of 10 calls. If 5+ fail (50% failure rate), circuit opens for 30 seconds. After that, we allow 3 probe requests — if 2 succeed, circuit closes. Timeout per request is 5 seconds — anything longer and the user experience is already degraded.
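To make the state transitions concrete, here is a minimal hand-rolled sketch of those thresholds in plain Java. It is not Resilience4j; in production you would set the equivalent options on its circuit breaker config (slidingWindowSize, failureRateThreshold, waitDurationInOpenState, permittedNumberOfCallsInHalfOpenState). The class name CurpCircuitBreaker and the injectable clock are assumptions made for illustration, and the 5-second per-request timeout is left to the HTTP client.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

/** Hypothetical count-based circuit breaker mirroring the thresholds above; not Resilience4j. */
class CurpCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private static final int WINDOW_SIZE = 10;       // sliding window of the last 10 calls
    private static final int FAILURES_TO_OPEN = 5;   // 5 of 10 failed = 50% failure rate
    private static final long OPEN_MILLIS = 30_000;  // stay open for 30 seconds
    private static final int PROBE_CALLS = 3;        // probes allowed while half-open
    private static final int PROBE_SUCCESSES = 2;    // successes needed to close again

    private final LongSupplier clock;                // injectable so tests can control time
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAt;
    private int probesStarted, probesDone, probeSuccesses;

    CurpCircuitBreaker(LongSupplier clockMillis) { this.clock = clockMillis; }

    /** Returns true if the caller may hit the provider right now. */
    synchronized boolean tryAcquire() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= OPEN_MILLIS) {
            state = State.HALF_OPEN;
            probesStarted = probesDone = probeSuccesses = 0;
        }
        if (state == State.CLOSED) return true;
        if (state == State.HALF_OPEN && probesStarted < PROBE_CALLS) {
            probesStarted++;
            return true;
        }
        return false; // open, or all probe slots already taken
    }

    /** Records the outcome of a call that tryAcquire() permitted. */
    synchronized void record(boolean success) {
        if (state == State.HALF_OPEN) {
            probesDone++;
            if (success) probeSuccesses++;
            if (probesDone == PROBE_CALLS) {
                if (probeSuccesses >= PROBE_SUCCESSES) {
                    state = State.CLOSED;
                    window.clear();
                } else {
                    open();
                }
            }
            return;
        }
        if (state == State.OPEN) return; // late result from a call started before opening
        window.addLast(!success);
        if (window.size() > WINDOW_SIZE) window.removeFirst();
        long failures = window.stream().filter(failed -> failed).count();
        if (window.size() == WINDOW_SIZE && failures >= FAILURES_TO_OPEN) open();
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        window.clear();
    }

    synchronized State state() { return state; }
}
```

Callers would wrap each provider call as: if tryAcquire() returns false, return the degraded response; otherwise make the call and pass the outcome to record(). Construct it with System::currentTimeMillis outside of tests.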

Caching: 24 hours is fine for a positive cache (CURP found and valid). For negative results (CURP not found), cache for only 1 hour since it might be a newly issued CURP that isn't in the provider's system yet. Key your cache on the CURP string itself — it's inherently unique.
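A small sketch of that TTL split, assuming a hypothetical CurpCachePolicy helper. The "curp:" key prefix and the trim/uppercase normalization are illustrative assumptions, not something the provider requires.

```java
import java.time.Duration;
import java.util.Locale;

/** Hypothetical helper: picks the cache key and TTL for a CURP lookup result. */
final class CurpCachePolicy {
    enum LookupResult { FOUND_VALID, NOT_FOUND }

    static final Duration POSITIVE_TTL = Duration.ofHours(24); // CURP found and valid
    static final Duration NEGATIVE_TTL = Duration.ofHours(1);  // may be newly issued

    private CurpCachePolicy() {}

    /** Keys on the CURP itself; normalized so " abc " and "ABC" share one entry. */
    static String cacheKey(String curp) {
        return "curp:" + curp.trim().toUpperCase(Locale.ROOT);
    }

    static Duration ttlFor(LookupResult result) {
        return result == LookupResult.FOUND_VALID ? POSITIVE_TTL : NEGATIVE_TTL;
    }
}
```

With Spring Data Redis, the key and TTL would typically go into a ValueOperations set call that takes a Duration timeout.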

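Stale-while-revalidate: If the cached entry has expired and the circuit is open, serve the stale data with a flag "stale": true so downstream services can decide whether to accept it. This only works if expired entries are still retrievable, so the store has to keep them longer than the freshness window. For lending, we accept stale-valid but reject stale-invalid.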
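One way to sketch the stale flag, assuming entries carry their own fetch timestamp so freshness is decided at read time rather than by eviction. StaleAwareCurpCache and its in-memory map are hypothetical stand-ins for a Redis-backed store, and the code uses records (Java 16+).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: serve expired entries with a stale flag while the circuit is open. */
class StaleAwareCurpCache {
    record Entry(boolean valid, Instant fetchedAt) {}
    record Answer(boolean valid, boolean stale) {}

    private static final Duration FRESH_FOR = Duration.ofHours(24);
    private final Map<String, Entry> store = new ConcurrentHashMap<>(); // stand-in for Redis

    void put(String curp, boolean valid, Instant fetchedAt) {
        store.put(curp, new Entry(valid, fetchedAt));
    }

    /** Empty means nothing usable is cached: call the provider or degrade. */
    Optional<Answer> get(String curp, Instant now, boolean circuitOpen) {
        Entry entry = store.get(curp);
        if (entry == null) return Optional.empty();
        boolean fresh = Duration.between(entry.fetchedAt(), now).compareTo(FRESH_FOR) < 0;
        if (fresh) return Optional.of(new Answer(entry.valid(), false));
        // Expired: only fall back to it when the provider cannot be reached.
        return circuitOpen
                ? Optional.of(new Answer(entry.valid(), true))
                : Optional.empty();
    }
}
```

Downstream services can then branch on the stale field of the Answer instead of guessing how old the data is.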

tomas_platform

On the provider selection side — look for CURP integration API providers on apipull.com API Hub that offer idempotency keys. This means you can safely retry any failed request by sending the same idempotency key and you won't get double-billed. Makes your retry logic much simpler.
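A rough sketch of retrying with one idempotency key, assuming the provider accepts such a key per request. How the key is transmitted (for example in a request header) depends on the provider; IdempotentRetry and TransientException are hypothetical names for this example.

```java
import java.util.UUID;
import java.util.function.Function;

/** Hypothetical retry helper: one idempotency key per logical lookup, reused on every attempt. */
final class IdempotentRetry {
    /** Marker for failures worth retrying (timeouts, 5xx); an assumption of this sketch. */
    static class TransientException extends RuntimeException {
        TransientException(String message) { super(message); }
    }

    private IdempotentRetry() {}

    /** Calls request with the same key until it succeeds or maxAttempts is reached. */
    static <T> T call(Function<String, T> request, int maxAttempts, long baseBackoffMillis) {
        String idempotencyKey = UUID.randomUUID().toString(); // generated once, not per attempt
        for (int attempt = 1; ; attempt++) {
            try {
                return request.apply(idempotencyKey);
            } catch (TransientException e) {
                if (attempt >= maxAttempts) throw e;
                sleep(baseBackoffMillis << (attempt - 1)); // exponential backoff: 1x, 2x, 4x, ...
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted during retry backoff", ie);
        }
    }
}
```

The request function is where the HTTP call would live; it receives the key and attaches it however the provider expects. Non-transient errors (4xx, malformed CURP) propagate immediately without a retry.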

For multi-provider failover, keep two providers configured but with different characteristics: primary should be fastest (low latency), secondary should be most reliable (highest uptime). This way your normal path is fast and your fallback is dependable even if slightly slower.
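The ordering can be sketched as a simple ordered fallback. CurpProvider and FailoverCurpClient are hypothetical types for illustration; a real setup would likely put a circuit breaker in front of each provider rather than rely on exceptions alone.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical provider abstraction; real clients would wrap each vendor's HTTP API. */
interface CurpProvider {
    String name();
    boolean isValid(String curp); // expected to throw a RuntimeException on timeouts or 5xx

    static CurpProvider of(String providerName, Predicate<String> check) {
        return new CurpProvider() {
            public String name() { return providerName; }
            public boolean isValid(String curp) { return check.test(curp); }
        };
    }
}

/** Tries providers in order: fastest first, most reliable as the fallback. */
final class FailoverCurpClient {
    record Outcome(boolean valid, String servedBy) {}

    private final List<CurpProvider> providersInOrder;

    FailoverCurpClient(List<CurpProvider> providersInOrder) {
        this.providersInOrder = List.copyOf(providersInOrder);
    }

    Outcome validate(String curp) {
        RuntimeException lastFailure = null;
        for (CurpProvider provider : providersInOrder) {
            try {
                return new Outcome(provider.isValid(curp), provider.name());
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and fall through to the next provider
            }
        }
        throw new IllegalStateException("all CURP providers failed", lastFailure);
    }
}
```

Recording servedBy in the outcome makes it easy to see in metrics how often traffic actually lands on the secondary.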

Monitoring-wise: track p50/p95/p99 latency, error rate, and cache hit ratio. Alert on error rate >5% sustained for 2 minutes. Also monitor your token refresh: if the auth token fails to refresh, you'll start getting 401s as soon as the current token expires.
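For the token refresh part, a minimal sketch that refreshes ahead of expiry and counts failed refreshes so they can be exported as a metric. BearerTokenCache, the 10-minute margin, and the supplier that fetches a token are assumptions for illustration; the code uses records (Java 16+).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical token cache: refreshes early and counts failed refreshes for alerting. */
final class BearerTokenCache {
    record Token(String value, Instant expiresAt) {}

    private static final Duration REFRESH_MARGIN = Duration.ofMinutes(10); // assumed margin

    private final Supplier<Token> fetchToken; // wraps the provider's auth call
    private Token current;
    private int refreshFailures;

    BearerTokenCache(Supplier<Token> fetchToken) { this.fetchToken = fetchToken; }

    synchronized String get(Instant now) {
        boolean dueForRefresh = current == null
                || !now.isBefore(current.expiresAt().minus(REFRESH_MARGIN));
        if (dueForRefresh) {
            try {
                current = fetchToken.get();
            } catch (RuntimeException e) {
                refreshFailures++; // export this counter and alert on it
                boolean stillValid = current != null && now.isBefore(current.expiresAt());
                if (!stillValid) throw e; // nothing usable left
                // Otherwise keep the old token until it actually expires.
            }
        }
        return current.value();
    }

    synchronized int refreshFailures() { return refreshFailures; }
}
```

Because the refresh happens inside the margin, a failing auth endpoint shows up in the counter several minutes before the first 401 would.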
