from Hacker News

Ask HN: How do you ensure that a distributed app is working as expected?

by aprao on 9/21/22, 5:25 PM with 0 comments

Imagine a very simple user creation flow in an online marketplace:

- Service A (user service) receives the request, creates a user object, and sends an async request to services B and C (e.g. via Kafka)

- Service B (notification service) receives the request and sends an email to the newly created user

- Service C (referral service) receives the request and credits some funds to the referrer
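To make the flow concrete, here is a minimal Python sketch of it. It assumes an in-memory publish/subscribe bus standing in for Kafka that delivers events synchronously. The class names, the `user_created` topic, the `UserCreated` fields, and the credit amount are all hypothetical, and nothing here uses a real Kafka client API.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class UserCreated:
    """Event published by service A (hypothetical schema)."""
    user_id: str
    email: str
    referrer_id: Optional[str]


class InMemoryBus:
    """Stand-in for Kafka: hands each event to every subscriber of its topic.

    Delivery is synchronous and in subscription order, unlike a real broker.
    """

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[UserCreated], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: UserCreated) -> None:
        for handler in self._subscribers[topic]:
            handler(event)


class UserService:  # service A
    def __init__(self, bus: InMemoryBus) -> None:
        self.bus = bus
        self.users: dict = {}

    def create_user(self, email: str, referrer_id: Optional[str] = None) -> str:
        user_id = str(uuid.uuid4())
        self.users[user_id] = email
        # A does not know who consumes this event; B and C subscribe on their own.
        self.bus.publish("user_created", UserCreated(user_id, email, referrer_id))
        return user_id


class NotificationService:  # service B
    def __init__(self, bus: InMemoryBus) -> None:
        self.sent: list = []
        bus.subscribe("user_created", self.on_user_created)

    def on_user_created(self, event: UserCreated) -> None:
        # A real service would call an email provider here; we only record it.
        self.sent.append(event.email)


class ReferralService:  # service C
    REFERRAL_CREDIT = 10  # hypothetical amount

    def __init__(self, bus: InMemoryBus) -> None:
        self.balances = defaultdict(int)
        bus.subscribe("user_created", self.on_user_created)

    def on_user_created(self, event: UserCreated) -> None:
        if event.referrer_id is not None:
            self.balances[event.referrer_id] += self.REFERRAL_CREDIT


if __name__ == "__main__":
    bus = InMemoryBus()
    users = UserService(bus)
    notifications = NotificationService(bus)
    referrals = ReferralService(bus)
    users.create_user("new@example.com", referrer_id="u-42")
    print(notifications.sent)          # ['new@example.com']
    print(referrals.balances["u-42"])  # 10
```

Note that `UserService` never names B or C: the flow exists only in the `subscribe` calls, which is the implicitness the question is about. Because `InMemoryBus` is synchronous, the sketch also hides the timing and ordering issues a real asynchronous broker introduces.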

While this design might be laid out correctly in a design doc, it is only implicitly defined in code, because it emerges from the messages the services send to each other. How would you:

- Ensure that the services talk to each other in the correct order when implementing the user creation flow (integration tests might not suffice here, since they generally test a very narrow set of paths)?

- Define and enforce SLO guarantees between services in production?

- Determine which service is to blame when the flow breaks down?
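One possible approach, sketched minimally: assume service A generates a correlation ID, passes it along in the Kafka message, and every service logs an event carrying that ID when it finishes its step (distributed tracing tools such as OpenTelemetry collect similar data as spans). The expected flow can then be written down as data and checked against what actually happened. The `Event` record, the service names, and the latency budgets in `EXPECTED_HOPS` are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    correlation_id: str  # generated by service A, copied into every downstream message
    service: str
    timestamp: float     # seconds; when the service finished handling the request


# Expected flow as data: (upstream, downstream) -> latency budget in seconds.
# The names and budgets are placeholders, not recommendations.
EXPECTED_HOPS = {
    ("user-service", "notification-service"): 5.0,
    ("user-service", "referral-service"): 30.0,
}


def check_flow(events, hops=EXPECTED_HOPS):
    """Return the problems found in the events of a single correlation ID."""
    first_seen = {}
    for e in sorted(events, key=lambda e: e.timestamp):
        first_seen.setdefault(e.service, e.timestamp)  # earliest event per service

    problems = []
    for (upstream, downstream), budget in hops.items():
        if upstream not in first_seen:
            problems.append(f"{upstream}: no event recorded")
        elif downstream not in first_seen:
            problems.append(f"{downstream}: never handled event from {upstream}")
        else:
            lag = first_seen[downstream] - first_seen[upstream]
            if lag < 0:
                problems.append(f"{downstream}: ran before {upstream}")
            elif lag > budget:
                problems.append(
                    f"{downstream}: took {lag:.1f}s after {upstream}, budget {budget:.1f}s"
                )
    return list(dict.fromkeys(problems))  # drop duplicates, keep order


def check_all(events):
    """Group events by correlation ID and report only the flows that broke."""
    by_id = defaultdict(list)
    for e in events:
        by_id[e.correlation_id].append(e)
    report = {}
    for cid, evs in by_id.items():
        problems = check_flow(evs)
        if problems:
            report[cid] = problems
    return report
```

Run over a window of production logs, `check_all` reports per request which hop is missing, out of order, or over budget, which addresses the question of blame. Aggregating the lags over many requests, rather than flagging each one, is one way to express the budgets as an SLO. This only checks what was observed in production, so it complements integration tests rather than replacing them.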