Real-time data architecture: Kafka or REST API — the right call for LATAM

When Kafka is the answer.
When REST API still wins.
Why most "real-time" projects in LATAM are actually pseudo-real-time.

Sergei Filatov · Founder · data-metrics.pro · May 26, 2026
14 min read

60-second summary

In August 2025, a Lima retailer lost USD 68,000 during a single Cyber Wow event. Not because of marketing — because of architecture. The engineering lead picked REST API to handle 8,000 transactions per second at peak hour. By day three the pipeline collapsed, the finance dashboard showed zeros, and the team spent two hours without knowing if they had sold anything. Kafka and three weeks of refactoring saved them. A single conversation with a senior architect before launch would have saved them too.

Most "real-time" projects in LATAM today are actually pseudo-real-time: REST APIs polled every 30 seconds while data engineers call it streaming. That works in 80% of cases and breaks in the remaining 20%: when the regulator demands synchronous validation, when a retailer processes 12,000 orders per minute on Black Friday, when fraud detection has to fire within 200 milliseconds rather than 30 seconds.

I have reviewed more than 40 architectures across LATAM in five years — Estée Lauder with 12 brands, Leroy Merlin with dynamic pricing, Dodo Pizza with kitchen computer vision, NLMK with a B2B portal for 50,000 partners. The pattern is consistent: companies pick Kafka when they need REST and REST when they need Kafka. Both mistakes are expensive — the first burns people and infrastructure, the second burns revenue and regulatory fines.

This is a technical breakdown without marketing fluff: where Kafka is actually needed, where it's overkill, where REST API is still the right call in 2026, and why the LATAM regulatory landscape (SUNAT, SAT CFDI 4.0, SRI online scheme, DIAN CUFE, ARCA CAE) fundamentally changes the answer.

  • Kafka: for high-throughput streaming, event sourcing, decoupling across 4 or more teams, and replayability. Minimum useful setup: 3 brokers + Schema Registry + Kafka Connect. TCO starts at USD 2,500/month on AWS MSK for a production-grade setup
  • REST/gRPC API: for synchronous transactions with request-response semantics, for regulator integrations (SUNAT, SAT, SRI, DIAN, ARCA all run on HTTP), and for cross-team contracts with a clear SLA
  • Not Kafka if you have fewer than 1,000 events/sec and no replay requirement — it's overengineering
  • Not REST if you need decoupling, broadcast, or more than 10 consumers per event — synchronous fan-out destroys latency
  • Hybrid wins more often than the pure choice: REST at the edge for regulators and partners, Kafka inside for domain events, CDC (Debezium) to sync databases
  • The LATAM factor: real-time e-invoicing requires sub-5-second HTTP validation against SAT/SRI/DGI. That is REST territory. Everything after the regulator call — domain events, analytics, fraud — is Kafka

Why real-time stopped being optional in LATAM

Three years ago, real-time architecture in LATAM was about being "progressive." In 2026 it is a regulatory requirement for half the industry and a competitive edge for the other half.

Real-time e-invoicing is the new normal. Ecuador SRI has run an online scheme since 2014: electronic vouchers are validated within hours for regular taxpayers and effectively in real time for special contributors, with authorization returning in seconds. Mexico SAT, on CFDI 4.0 (mandatory since 2023), requires synchronous PAC validation with a UUID returned in under 5 seconds. Uruguay DGI tightened real-time validation with CFE 25.1 (Resolución 198/024 DGI). Peru SUNAT launched SIRE in 2024 — the Sistema Integrado de Registros Electrónicos, replacing PLE — which runs as a near-real-time system with mandatory submission on the 8th of each month. Argentina's ARCA (formerly AFIP) requires CAE (Código de Autorización Electrónico) in real time for electronic invoices, with a 10-second ceiling for approval.

If your Odoo or ERP integrates with a tax authority, you already have a real-time API integration — whether you like it or not. More background in the Odoo Peru guide and in DIAN electronic invoicing 2026.

Retail and fintech raised the pressure. MercadoLibre processes peaks of more than 25,000 orders per minute during Hot Sale. NuBank reported 110+ million customers in Brazil, Mexico, and Colombia in 2025 — over 200,000 transactions per second at peak hour. Rappi operates across 9 LATAM countries with tens of thousands of concurrent pricing and restaurant-availability queries. These companies physically cannot run on REST polling — they built event-streaming platforms (Kafka, AWS Kinesis, GCP Pub/Sub) 5 to 7 years ago.

Mining, agritech, and smart cities piled on IoT load. Codelco in Chile, Antamina in Peru, Vale in Brazil equip trucks and heavy machinery with thousands of sensors reporting telemetry at 1–10 Hz. That adds up to 50,000+ events/sec on a mid-size mining site. REST API cannot handle that load, with or without pooling and retry.

Smart cities in Mexico City, Bogotá, and Buenos Aires — traffic sensors, smart parking, environmental monitoring — all run on streaming pipelines, because batch processing kills actionability.

Real-time architecture in 2026 is not a fashion choice. It is a specific skill set without which a company either breaks compliance or loses market speed. For broader context on LATAM data infrastructure, see why LATAM SMBs leak USD 500B a year.

Kafka and REST sit at different stack layers

The first critical mistake in most "Kafka vs API" writeups is comparing different layers of the stack.

Kafka is a distributed log. Not a database, not a message queue, not an API. It is an append-only log of immutable events, split into partitions, replicated across brokers, available for read by many consumers in parallel with independent offsets. The producer writes to a topic; the consumer reads from any position — from the start of history, from the current moment, or from an arbitrary offset. Retention is configurable: 7 days by default, weeks or "forever" based on your storage budget. Full details live in the official Apache Kafka documentation.
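The log model described above can be illustrated with a toy in-memory version. This is only a sketch of the idea (one append-only list, one independent offset per consumer), not how Kafka is implemented; the class `ToyLog`, its methods, and the consumer names are hypothetical and exist only for this example.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for one topic partition: an append-only list of events."""

    def __init__(self):
        self._events = []                 # history is only ever appended to
        self._offsets = defaultdict(int)  # consumer name -> next offset to read

    def append(self, event):
        """Producer side: append an event and return the offset it was stored at."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, consumer, max_events=10):
        """Consumer side: read from this consumer's own offset and advance it."""
        start = self._offsets[consumer]
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer, offset):
        """Rewind or fast-forward one consumer without affecting the others."""
        self._offsets[consumer] = offset


log = ToyLog()
for price in (10, 11, 12):
    log.append({"sku": "A1", "price": price})

analytics_first = log.poll("analytics")           # reads all three events
alerts_first = log.poll("alerts", max_events=1)   # reads only the first one
log.seek("analytics", 0)                          # replay from the start of history
analytics_replay = log.poll("analytics")
```

Each consumer moves at its own pace, and rewinding one of them does not disturb the other. That is the property a queue that deletes messages on delivery cannot offer.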

Kafka latency in a production-grade setup:

  • p50 produce latency: 2–5 ms (single data center, without acks=all)
  • p99 end-to-end (producer → consumer): 10–30 ms
  • Throughput: 100,000 msg/sec on a modest 3-broker cluster, millions on a tuned setup
  • Storage cost: ~USD 0.10/GB/month on AWS EBS gp3 — keeping a full year is cheap for most use cases

REST API is a request-response protocol over HTTP. Synchronous, blocking, point-to-point. The client sends a request, waits for the response, uses the response. The breakdown: HTTP server, application layer (Express, FastAPI, Spring), business logic, usually a synchronous database write, response returned.

REST API latency in a production-grade setup:

  • Intra-AZ (single data center): 5–50 ms on a simple endpoint
  • Cross-region (PE → US East): 80–200 ms p50, more than 500 ms p99 with TLS handshake
  • With full business logic and DB query: 100–500 ms typical

The fundamental difference is not performance. It is the semantics of coupling.

REST: producer and consumer know each other. The producer calls the consumer's endpoint. If the consumer is down, the producer has to handle the error, retry, and implement a circuit breaker. If there are many consumers, the producer calls each of them. That is tight coupling.

Kafka: producer and consumer don't know about each other. The producer writes to a topic. There can be 0, 1, or 100 consumers — the producer doesn't know. If a consumer crashes, the producer doesn't notice. That is loose coupling with a buffer between the two sides. The concept is laid out in Martin Kleppmann's "Designing Data-Intensive Applications" — dataintensive.net.

This difference is fundamental. Everything else (latency, throughput, complexity) is downstream from it.
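A back-of-the-envelope sketch of what that coupling costs the producer. The 40 ms per downstream call and the 12 consumers are assumed numbers for illustration; the 5 ms append is the upper end of the produce latency quoted earlier. Both function names are hypothetical.

```python
def sync_fanout_latency_ms(consumer_latencies_ms):
    """Producer-side latency when it calls every consumer one after another."""
    return sum(consumer_latencies_ms)


def log_append_latency_ms(append_ms, consumer_latencies_ms):
    """Producer-side latency when it only appends to a log; consumers read later."""
    return append_ms  # does not depend on how many consumers exist


consumers = [40] * 12   # twelve downstream services, 40 ms each (assumed)
rest_ms = sync_fanout_latency_ms(consumers)       # 12 * 40 = 480
kafka_ms = log_append_latency_ms(5, consumers)    # stays at 5
```

Calling the consumers in parallel would bring the REST figure down to the slowest single call rather than the sum, but the producer would still block on that call and still own the error handling for every consumer.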

gRPC is still request-response, but with a binary protocol, HTTP/2 multiplexing, and protobuf schemas. Latency is lower than REST (no JSON parsing), but the semantics are identical: synchronous coupling. gRPC streaming gets closer to Kafka, but without persistence or multi-consumer fan-out.

GraphQL is a different syntax for request-response, not a different pattern. Architecturally it is REST with composable schemas. It relates to real-time through Subscriptions (usually WebSocket), but that is a separate layer.

Webhooks are REST in reverse: you tell the regulator or partner you want events, and they call your endpoint. They solve fan-out for 1 to 10 consumers, but if you have 1,000, that is Kafka territory.

Server-Sent Events (SSE) and WebSockets are for streaming into the browser. They are not a Kafka alternative on the backend — they are the transport between backend and frontend. Common pattern: backend reads from Kafka and pushes to the browser via SSE.

When people say "Kafka vs API," they usually mean "event streaming vs request-response." That is not an either/or choice. Real architectures use both: REST for cross-team contracts, Kafka for internal event flow, and CDC (Change Data Capture via Debezium) to sync database changes into Kafka.

Five real-world scenarios — what works, what doesn't

Five situations that show up in every LATAM architecture audit. What to pick and why.

#1. Integration with SUNAT, SAT, SRI, or DIAN for e-invoicing

This is REST. Period. The regulator demands synchronous HTTP validation with a signed XML returned in under 5 seconds. Kafka is useless here — the regulator does not have a Kafka consumer, and you do not have streaming integration with the government. What you can do: drop the invoice_emitted event into Kafka after the REST call to SAT/PAC succeeds, so 12 other services (analytics, customer notification, accounting reconciliation) react asynchronously. The primary call to the regulator stays REST.

Don't put Kafka between your ERP and SAT/SUNAT. I have seen teams try to buffer the regulator behind a Kafka middleware layer: it worked fine until the first SAT outage, when retry logic built a 4-hour queue and the billing team had to manually reconcile 11,000 vouchers. The regulator edge is synchronous REST, period. Kafka comes in after confirmation, never before.

#2. Pricing intelligence for retail

Say you manage 12 beauty brands, monitor 8 marketplaces, and need to move prices dynamically. This is Kafka territory, but with REST at the edges. Web scrapers write competitor_price_changed events into a topic at 5,000–20,000 events/minute throughput. The pricing engine consumes, calculates the optimal price through an ML model, writes to price_recommendation. Several consumers pick it up: one pushes to e-commerce via the marketplace's REST API, another notifies merchandisers, a third writes to the data warehouse. If you have 1 marketplace and 50 SKUs, Kafka is overkill — REST polling every hour works.

#3. Fraud detection in fintech

NuBank, Belvo, Bitso: they all build fraud detection on event streaming. Transactions are evaluated in 200–500 ms, otherwise UX suffers. Kafka + stream processing (Flink or ksqlDB) is the standard here. REST destroys latency: a model call, a rules-engine call, and a graph database call for relationships are sequential HTTP calls that add up to about 400 ms even on the best infrastructure. If you handle 50 transactions a day, who cares that the fraud check takes 5 seconds? REST.

#4. Real-time IoT for mining or agritech

A mine with 500 trucks, 50 excavators, and 200 sensors per unit produces 50,000 events/sec, and that is operational telemetry only, before production data. REST cannot physically handle that: the TCP connection pool is exhausted. Kafka (or Kinesis or GCP Pub/Sub) is required. Producers write to topics partitioned by equipment_id; consumers are predictive maintenance models, dashboards, and alerting. If you have 5 sensors on 1 farm, write to Postgres via REST and don't over-engineer it.
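Partitioning by equipment_id matters because all readings from one machine land in the same partition and therefore stay in order. A minimal sketch of key-based partitioning, assuming 12 partitions and using CRC32 purely as a stand-in hash (Kafka clients use their own hash function); the equipment IDs and readings are made up.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 here, as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


readings = [
    ("truck-017", 81.5),
    ("excavator-03", 64.0),
    ("truck-017", 82.1),
    ("truck-017", 83.4),
]

# Route each reading to its partition; per-key order is preserved inside it.
partitions = {}
for equipment_id, temperature in readings:
    p = partition_for(equipment_id, num_partitions=12)
    partitions.setdefault(p, []).append((equipment_id, temperature))
```

A predictive maintenance consumer assigned to one partition then sees every reading of a given truck in the order it was produced, without coordinating with other consumers.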

#5. B2B portal with partner notifications

NLMK launched a B2B portal for 50,000 partners: order status, inventory updates, document workflow. Inside: event-driven architecture with domain events (order_created, invoice_sent, delivery_scheduled) flowing through Kafka. Outside: REST API for each partner, because partners are external integrators with their own systems and they are not going to connect to your Kafka. Hybrid: Kafka inside, REST at the edge. That is the default model for modern enterprise platforms.

One simple rule for LATAM data architecture: the boundary between organizations is REST. The boundary between services inside one organization is Kafka (or async). The boundary with the regulator is always REST. The boundary with the database via CDC is Kafka.

Five mistakes I find in every other audit

Patterns that repeat across companies in Lima, Bogotá, Mexico City, and Buenos Aires. Each one costs between USD 10,000 and USD 500,000 a year in wasted engineering or lost revenue.

#1. Kafka for every integration

The team reads "Designing Data-Intensive Applications," falls in love with event streaming, and puts Kafka between everything. Result: a simple GET /user/{id} goes through 3 topics with response pairing, latency jumps from 50 ms to 800 ms, and engineers debug it for months. In a recent audit (anonymized, fintech in Lima) an 8-person team spent 40% of its time on Kafka operations — that is USD 480,000/year of lost feature capacity.

Fix: Kafka only where there is real decoupling value: 3 or more consumers, replayability, throughput above 5,000 events/sec, or an event sourcing requirement.

#2. REST for streaming

The opposite extreme. The team wants a "real-time" dashboard and polls an endpoint once a second. At 100 users that is 100 req/sec; at 10,000 it is 10k req/sec and the backend goes down. A retailer in Bogotá lost USD 34,000 in a Cyber Days week because the dashboard polled every 500 ms and ate the entire API quota.

Fix: for real-time UI, use WebSocket or SSE on the frontend with a Kafka consumer + push on the backend.
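The polling arithmetic is worth writing down before anyone ships a "real-time" dashboard. A small sketch; the function name is hypothetical and the 10,000-user figure for the 500 ms case is an assumption, not a number from the Bogotá incident.

```python
def polling_requests_per_sec(users: int, interval_seconds: float) -> float:
    """Steady-state request rate when every user polls at a fixed interval."""
    return users / interval_seconds


once_a_second_small = polling_requests_per_sec(100, 1.0)      # 100 users
once_a_second_large = polling_requests_per_sec(10_000, 1.0)   # 10,000 users
every_500_ms = polling_requests_per_sec(10_000, 0.5)          # user count assumed
```

Load grows with the number of open dashboards, not with the number of changes in the data. With push over WebSocket or SSE the backend sends one message per actual update instead.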

#3. No Schema Registry

Kafka without Schema Registry is a time bomb. The producer writes JSON with no contract, changes a field a week later, the downstream consumer crashes in production on a Saturday. At one e-commerce client in Mexico this turned into 6 hours of reporting downtime, USD 12,000 in lost merchandiser decisions, and three weeks of forensics to find out who changed the schema and when.

Fix: Confluent Schema Registry or Apicurio from day one. Avro or Protobuf as the schema format, JSON Schema as a minimum. Schema evolution rules (backward, forward) configured and enforced in CI.
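What a backward-compatibility gate checks can be sketched in a few lines. This is a deliberately simplified model, assuming schemas are plain dicts of field name to type and optional default; it ignores type promotions and nested records, and it is not the Schema Registry API. The schemas `v1`, `v2_ok`, and `v2_bad` are invented for the example.

```python
def backward_compatible(old_fields: dict, new_fields: dict) -> list:
    """Return the problems that stop a reader on the new schema from decoding
    data written with the old schema (an empty list means compatible)."""
    problems = []
    for name, spec in new_fields.items():
        if name not in old_fields:
            # Old records lack this field, so the reader needs a default.
            if "default" not in spec:
                problems.append(f"added field '{name}' has no default")
        elif old_fields[name]["type"] != spec["type"]:
            problems.append(
                f"field '{name}' changed type "
                f"{old_fields[name]['type']} -> {spec['type']}"
            )
    # Fields removed in the new schema are fine: the reader just ignores them.
    return problems


v1 = {"order_id": {"type": "string"}, "amount": {"type": "double"}}
v2_ok = {**v1, "currency": {"type": "string", "default": "PEN"}}
v2_bad = {
    "order_id": {"type": "long"},
    "amount": {"type": "double"},
    "channel": {"type": "string"},
}

ok_problems = backward_compatible(v1, v2_ok)
bad_problems = backward_compatible(v1, v2_bad)
```

In CI, a non-empty list would fail the build before the producer change reaches production, which is the point of enforcing evolution rules rather than documenting them.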

#4. One topic for everything

Teams create an events topic, throw everything into it, and then wonder why 200 consumers filter 99% of the messages. Anti-pattern that kills scalability.

Fix: one topic per domain event type, partition key by entity ID. Compaction for state changes, retention by compliance requirement (in finance, 7 years minimum — see PLE/SIRE requirements in Peru).

#5. Ignoring idempotency

REST API with no idempotency keys by default: double transactions on retry, duplicates in the database, regulatory violations. Kafka producers without enable.idempotence=true: duplicates in the topic, broken exactly-once semantics. At one PSP (Payment Service Provider) in Argentina I found 0.3% double-charges over six months — more than USD 200,000 in customer chargebacks and operational losses.

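Fix: idempotency keys on every mutating endpoint (UUID v4 in the Idempotency-Key header); on the Kafka side, enable.idempotence=true and acks=all on the producer plus min.insync.replicas=2 on the topic for duplicate-free, durable writes. End-to-end exactly-once additionally requires Kafka transactions.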
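On the REST side, the idempotency-key mechanism looks roughly like this. A minimal in-memory sketch: `PaymentService` and its fields are hypothetical, and a real service would keep the key store in a database with an atomic check-and-insert and an expiry policy.

```python
import uuid


class PaymentService:
    """Toy mutating endpoint that deduplicates on the Idempotency-Key header."""

    def __init__(self):
        self.charges = []      # side effects actually performed
        self._responses = {}   # idempotency key -> response already returned

    def charge(self, idempotency_key: str, customer: str, amount: float) -> dict:
        if idempotency_key in self._responses:
            # A retry of a request we already processed: replay the answer.
            return self._responses[idempotency_key]
        self.charges.append((customer, amount))
        response = {"status": "charged", "charge_no": len(self.charges)}
        self._responses[idempotency_key] = response
        return response


service = PaymentService()
key = str(uuid.uuid4())                          # client generates one key per operation
first = service.charge(key, "cust-42", 150.0)
retry = service.charge(key, "cust-42", 150.0)    # e.g. after a client timeout
```

The retry returns the original response and the customer is charged once. The client must reuse the same key for the same logical operation and generate a new one for each new operation.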

Anonymous case: 12-brand LATAM retailer

Portfolio client (anonymized by contract, publicly described in Forbes materials as Estée Lauder Travel Retail): retailer with 12 brands, 8 marketplaces, and a need to move prices dynamically and launch promos while competitors were moving in real time.

Starting state: REST API between the ERP and each marketplace, batch updates every 6 hours, marketing ROAS at 1.5x — because prices were 6 hours stale and ad spend went toward products competitors had already undercut.

What we shipped:

  1. Stood up a Kafka cluster (Confluent Cloud, 3 brokers, ~USD 3,200/month — cheaper than self-managed at the start)
  2. Web scrapers (Scrapy + Playwright) write competitor_price_change events at ~12,000 events/hour throughput
  3. Pricing engine in Python + LightGBM consumes, computes the recommendation, writes to the price_recommendation topic
  4. REST bridges to each marketplace API take the recommendations and make synchronous update calls (still REST because Mercado Libre, Linio, and Falabella don't expose Kafka endpoints)
  5. Schema Registry with Avro, backward-compatibility rules, CI gates

Six-month result:

Metric | Before | After
Marketing ROAS | 1.5x | 4.2x (+180%)
Price update frequency | every 6 hours | 2–8 seconds
Promo fraud detected | 0 | USD 1.2M across 11 brands
Operational cost | USD 1,200/month (REST batch) | USD 9,000/month (Kafka + Schema Registry + ops)
Annual upside (estimated) | - | USD 4.2M

One detail matters: REST was not removed. Between Kafka and the marketplaces, REST bridges stayed — they are external integrations. Hybrid. That is the norm for LATAM enterprise, and the backbone of every transformation case we publish.

Five questions to answer before you pick an architecture

Before telling the team "let's drop in Kafka" or "let's put REST API on everything," answer these 5 questions:

  1. Throughput per event type: events/sec — did you measure it, or are you guessing?
  2. Consumer count per event: 1, 5, or 50?
  3. Replay requirement: do you need to replay history for the last month or year?
  4. Regulator integrations: SUNAT, SAT, SRI, DIAN, ARCA — they all have REST endpoints, period
  5. Engineering capacity: is there someone on the team who'll get up at 3 a.m. to fix Kafka offset issues?

If three or more answers are "I don't know," you need an audit before architectural decisions, not after.

Rule of thumb: if your pipeline processes fewer than 1,000 events/sec and nobody asked for history replay, the total cost of running Kafka (broker management, schema registry, monitoring, SREs) will exceed the benefit. Start with REST + simple queues (RabbitMQ, SQS) and migrate to Kafka when volume and consumer count justify it. Migrating from REST to Kafka is weeks of work; migrating away from badly-scaled Kafka takes quarters.
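The thresholds scattered through this article can be collected into one small decision helper. Treat it as a conversation starter, not a verdict: the function name, the parameter names, and the order in which the rules are applied are assumptions of this sketch, and it leaves out the team-capacity question.

```python
def recommend(events_per_sec, consumers_per_event, needs_replay, crosses_org_boundary):
    """Encode the article's rules of thumb; thresholds come from the text above."""
    # Regulators and partners speak HTTP, so any cross-org boundary is REST.
    edge = "REST" if crosses_org_boundary else "no external edge"

    if events_per_sec < 1_000 and not needs_replay:
        inside = "REST + simple queue (RabbitMQ, SQS)"
    elif needs_replay or consumers_per_event >= 3 or events_per_sec > 5_000:
        inside = "Kafka"
    else:
        inside = "measure first"   # grey zone between the two thresholds
    return {"edge": edge, "inside": inside}


startup = recommend(50 / 3600, 1, False, False)    # 50 events per hour
mining_site = recommend(50_000, 3, True, False)
small_erp = recommend(200, 1, False, True)         # low volume, tax authority edge
grey_zone = recommend(2_000, 1, False, False)
```

If you cannot fill in the arguments with measured values, that is the "I don't know" case from the five questions, and the audit comes first.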

The verdict: "Kafka vs API" is not an ideological argument. It is an engineering decision across 5 parameters — throughput, consumers, replay, regulatory constraints, team capacity. LATAM adds another layer: tax authorities require REST, real-time e-invoicing is a mandatory REST boundary, Kafka lives inside the organization.

If you are picking architecture for a new real-time product in LATAM in 2026, the default is hybrid: REST at the edge (regulators, partners, frontend over WebSocket or SSE), Kafka inside (domain events, analytics, audit log), CDC between database and Kafka. Pure REST works in projects below 1,000 events/sec with no replay. Pure Kafka works on pure data platforms with no external integrations, which barely exist in LATAM.

The most expensive mistake is following the trend. I have seen 5-person startups with a USD 5,000/month Kafka cluster carrying 50 events per hour. And I have seen enterprises still running batch ETL in 2026 and losing real money on stale data.

If you have an Odoo or ERP that has outgrown simple CRUD and you are thinking about real-time analytics, real-time fraud detection, or real-time pricing, request a 30-minute audit. We'll tell you honestly whether you need Kafka or not.

Frequently asked questions

What's cheaper in production — Kafka or REST API?

At low volumes (under 1k events/sec) REST is at least 5x cheaper — no Kafka infrastructure, no operations. Above 10k events/sec, especially with multiple consumers, Kafka becomes cheaper because the cost scales sublinearly with consumer count.

The inflection point sits around 3–5k events/sec and 3 or more consumers.

Can I use AWS Kinesis instead of Kafka?

Yes, and it is often the right call for AWS-native teams. Kinesis Data Streams is simpler in operations (managed) but more expensive at high volume (10+ shards), less flexible in schema management, and locks you in to AWS.

For LATAM teams on AWS, Kinesis is fine to start; migrating to MSK or Confluent is justified when throughput grows or you need a serious Schema Registry.

Confluent Cloud or AWS MSK — which one in LATAM?

Confluent Cloud is more expensive (~2x) but dramatically simpler in operations — managed Schema Registry, ksqlDB, and Connect cluster are included. MSK is cheaper but demands 1 or 2 dedicated SREs.

For SMBs in Lima, Bogotá, or Buenos Aires without senior Kafka expertise — Confluent Cloud for the first 2 years, then re-evaluate self-managed.

Is RabbitMQ enough instead of Kafka?

For message queuing, yes. For event streaming with replay, many consumers, and history retention, no. RabbitMQ is good for task queues (Celery jobs, background workers) but bad for event sourcing.

How do SUNAT and SAT affect architecture?

These regulators demand synchronous REST integration with a signed XML returned in under 5 seconds. You can't fool them with Kafka. Any company operating in Peru, Mexico, Ecuador, Uruguay, Colombia, or Argentina ends up with a REST pipeline into the tax authority.

Kafka sits behind that pipeline, serving internal domain events.

Real-time = streaming = Kafka. Are they synonyms?

No. Real-time is about latency requirements. Streaming is about the data model (continuous, unbounded). Kafka is one specific tool. Real-time can be done over REST (if volumes are low); streaming can be done on Pulsar, Kinesis, or Pub/Sub.

What do I do if I already have a REST API monolith with 5,000 endpoints?

Don't migrate everything to Kafka at once — that's the path to disaster. Strangler pattern: new domain events go to Kafka; existing REST endpoints stay and get pulled out into event-driven flows as business cases emerge.

In practice 30–40% of endpoints stay REST forever, because they sit on a cross-org boundary or because request-response semantics are the right fit.

How many people does it take to run Kafka in production?

Kafka on AWS MSK or self-managed on EC2 with a small cluster (3 brokers): 1 part-time SRE when things are stable, 1–2 SREs during incidents or migrations. Medium cluster (10+ brokers, 100+ topics): minimum 2 dedicated SREs.

With Confluent Cloud or other managed services you can start at 0.5 SRE, but the managed-service price tag replaces the headcount.

What's the biggest architectural risk in LATAM in 2026?

The biggest risk isn't picking the wrong tool — it's building on a synchronous architecture that doesn't survive the regulator's peak hour. When SUNAT, SAT, or DIAN change an endpoint or get hammered at month-end close, pure REST systems with naive retry logic cascade-fail.

The safe pattern: synchronous REST to the regulator with a short timeout + outbox pattern + Kafka behind it for reprocessing. You're covered against regulator downtime and against your own traffic spikes.
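A minimal sketch of the outbox part of that pattern, using SQLite as a stand-in for the application database. The table layout, function names, invoice number, and authorization value are all hypothetical; the synchronous regulator call itself is left out, and `publish` stands for whatever Kafka producer call your stack uses.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoices (id TEXT PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, "
           "topic TEXT, payload TEXT, published INTEGER DEFAULT 0)")


def record_authorized_invoice(invoice_id: str, authorization: str) -> None:
    """After the synchronous regulator call succeeds, store the invoice and its
    event in ONE local transaction, so neither can exist without the other."""
    with db:  # commits on success, rolls back on exception
        db.execute("INSERT INTO invoices VALUES (?, ?)", (invoice_id, "authorized"))
        db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                   ("invoice_emitted",
                    json.dumps({"invoice_id": invoice_id, "auth": authorization})))


def relay_outbox(publish) -> int:
    """Background relay: publish pending rows in order, mark each one as sent."""
    rows = db.execute("SELECT seq, topic, payload FROM outbox "
                      "WHERE published = 0 ORDER BY seq").fetchall()
    for seq, topic, payload in rows:
        publish(topic, payload)   # raises if the broker is unavailable
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)


sent = []
record_authorized_invoice("F001-000123", "hypothetical-auth-code")
relayed = relay_outbox(lambda topic, payload: sent.append((topic, payload)))
```

If the broker is down, the rows simply stay pending and the next relay run picks them up. Delivery is at-least-once: a crash between publishing and marking a row can produce a duplicate, so downstream consumers should deduplicate on invoice_id.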