from Hacker News

Ask HN: Is there a proper design pattern for this situation?

by holden_nelson on 12/28/21, 4:33 AM with 3 comments

I'm building a simple crud app in Django which is basically enhanced reporting for commerce related software. So I pull information on behalf of a user from another software's API, do a bunch of aggregation/calculations on the data, and then display a report to the user.

I'm having a hard time figuring out how to make this all happen quickly. While I'm certain there are optimizations that can be made in my Python code, I'd like to find some way to be able to make multiple calculations on reasonably large sets of objects without making the user wait like 10 seconds.

The problem is that the data can change in a given moment. If the user makes changes to their data in the other software, there is no way for me to know that without hitting the API again for that info. So I don't see a way that I can cache information or pre-fetch it without making a ridiculous number of requests to the API.

In a perfect world the API would have webhooks I could use to watch for data changes but that's not an option.

Is my best bet to just optimize the factors I control to the best of my ability and hope my users can live with it?

  • by delbronski on 1/3/22, 2:44 PM

    If this is an internal tool, then a 10-second wait is really not so bad for a report like this. I've run into the same issue before, where calling external APIs to get the most up-to-date data can take up to 30 seconds. I would just keep it simple and let the user manually update the report by clicking a button. I would also show a timestamp of when it was last updated so they can see whether the information is current.
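    A minimal sketch of the refresh-button-plus-timestamp idea, as a plain in-process cache (in a real Django app you'd use django.core.cache instead; `ReportCache`, `fake_fetch`, and the 600-second TTL are all made up for illustration):

```python
import time

class ReportCache:
    """Tiny in-process cache keeping a last-updated timestamp per user.
    A real Django app would use django.core.cache instead."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # user_id -> (fetched_at, report)

    def get_report(self, user_id, fetch, force_refresh=False):
        now = time.time()
        entry = self._store.get(user_id)
        if force_refresh or entry is None or now - entry[0] > self.ttl:
            # Slow path: hit the external API and recompute the report.
            entry = (now, fetch(user_id))
            self._store[user_id] = entry
        return entry  # (updated_at, report) -- show updated_at in the UI

# Usage: the third call simulates the user clicking the refresh button.
calls = []
def fake_fetch(uid):
    calls.append(uid)
    return {"total": 42}

cache = ReportCache(ttl_seconds=600)
ts1, report1 = cache.get_report("u1", fake_fetch)
ts2, report2 = cache.get_report("u1", fake_fetch)                      # cached
ts3, report3 = cache.get_report("u1", fake_fetch, force_refresh=True)  # button
assert len(calls) == 2  # only the first call and the forced refresh hit the API
```

    The timestamp travels with the cached entry, so the template can render "last updated at ..." next to the report for free.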
  • by Southland on 12/28/21, 4:53 AM

    To me, the first optimizations I’d look at are the following:

    - Figure out how stale a report can be while still being useful. If the report's timeframe is in days, you can simply show the last full day. Even if it needs to be near real-time, using a small TTL and avoiding recalculation on every refresh will help. Personally, I think users are forgiving of some lag between actions and reporting.

    "I'd like to find some way to be able to make multiple calculations on reasonably large sets of objects without making the user wait like 10 seconds."

    - If you can’t get it fast enough, avoid doing the calculations in Python over large objects that always have to be loaded into memory. Moving the raw data into a database better suited to the aggregations/calculations you need could reduce the time your users wait.
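    A sketch of that last point using Python's built-in sqlite3: load the raw API rows once, then let SQL do the GROUP BY instead of looping in Python (the table and column names here are made up for illustration):

```python
import sqlite3

# Imagine these rows came back from the external commerce API.
rows = [
    ("widget", 3, 2.50),
    ("widget", 1, 2.50),
    ("gadget", 2, 10.00),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (sku TEXT, qty INTEGER, price REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# The aggregation happens in the database engine, not in a Python loop.
report = con.execute(
    "SELECT sku, SUM(qty) AS units, SUM(qty * price) AS revenue "
    "FROM orders GROUP BY sku ORDER BY sku"
).fetchall()
# report == [('gadget', 2, 20.0), ('widget', 4, 10.0)]
```

    The same idea scales up: with Postgres behind Django, `QuerySet.aggregate()`/`annotate()` push the SUMs and GROUP BYs into the database the same way.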

  • by NicoJuicy on 12/28/21, 5:47 AM

    Cache the external data in a document DB and update it when required. Only call the external API when your user is online.

    If aggregating the data is resource-intensive, let them click a button to request updated data.

    If it's a SaaS, make the internal polling interval dependent on their subscription.
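    A minimal sketch of subscription-dependent polling: each plan gets its own refresh interval, and the external API is only re-hit when a logged-in user asks and their cached copy has aged out (plan names and intervals are invented for illustration):

```python
import time

# Hypothetical plans: cheaper tiers tolerate staler data.
POLL_INTERVAL = {"free": 3600, "pro": 600, "enterprise": 60}  # seconds

_store = {}  # user_id -> {"fetched_at": ts, "data": ...}; stands in for a document DB

def get_data(user_id, plan, fetch, now=None):
    """Return cached data, refreshing only if the plan's interval has elapsed."""
    now = time.time() if now is None else now
    doc = _store.get(user_id)
    if doc is None or now - doc["fetched_at"] >= POLL_INTERVAL[plan]:
        doc = {"fetched_at": now, "data": fetch(user_id)}
        _store[user_id] = doc
    return doc["data"]

# Usage: a free-tier user only triggers two API calls across three requests.
calls = []
def fetch(uid):
    calls.append(uid)
    return {"orders": 10}

get_data("u1", "free", fetch, now=0)     # first request hits the API
get_data("u1", "free", fetch, now=1800)  # free plan: still within the hour, cached
get_data("u1", "free", fetch, now=3600)  # interval elapsed -> refresh
assert len(calls) == 2
```

    Because the check only runs when a user makes a request, offline users generate no API traffic at all, which matches the "only call the external API when your user is online" point above.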