by arkenflame on 9/6/15, 3:38 PM with 35 comments
by makecheck on 9/6/15, 6:16 PM
This is one of the things that drives me crazy about some of Apple's technologies. For instance, somebody at Apple decided long ago that all application HTML help pages should be cached. The "off switch" for this cache remains a bit of black magic but it's something like "rm -Rf com.apple.help.DamnedNearEverything" followed by "killall UnnecessaryBackgroundHelpProcess" every damned time you modify a page or else the help system might show you an older version of the content that you just "changed".
by ggreer on 9/6/15, 9:18 PM
...
#include <stdio.h>

FILE *fp = fopen("example.txt", "r");
if (fp == NULL) return 1;          /* no file, nothing to read */
char dest;
if (fread(&dest, 1, 1, fp) == 1)   /* read exactly one byte */
    putchar(dest);
fclose(fp);
...
Think of how many caches likely contain the first byte of example.txt. There's the internal cache on the hard disk or SSD. There's the OS's filesystem cache in RAM. There's your copy (dest) in RAM, and also in L3, L2, and L1 cache. (These aren't inclusive on modern Intel CPUs. I'm just talking about likelihood.) Implementing your own software RAM cache puts you well into diminishing returns. The increased complexity simply isn't worth it.
by sirgawain33 on 9/6/15, 8:25 PM
I've seen many devs jump to caching before investing time in understanding what is really causing performance problems (I was one of them for a time, of course). Modern web stacks can scream without any caching at all.
Years ago, a talk by Rasmus Lerdorf really opened my eyes to this idea. [1] He takes a vanilla PHP app (Wordpress, I think) and dramatically increases its throughput by identifying and tweaking a few performance bottlenecks like slow SSL connections. One of the best lines: "Real Performance is Architecture Driven"
[1] I think it was a variation of this one: https://vimeo.com/13768954
by gabbo on 9/6/15, 9:36 PM
By dropping a cache into an existing system, you're weakening consistency in the name of performance. At best, your strongly-consistent system has started taking on eventually-consistent properties (but maybe not even eventual depending on how you invalidate/expire what's in your cache). Eventual consistency can help you scale, but reasoning about it is really hard.
In some sense caching as described by OP is a tool to implement CAP theorem tradeoffs, and Eric Brewer described the reality of trading off the C (consistency) for A/P (availability/partition-tolerance) better than I ever could:
Another aspect of CAP confusion is the hidden cost of
forfeiting consistency, which is the need to know the
system’s invariants. The subtle beauty of a consistent
system is that the invariants tend to hold even when the
designer does not know what they are. Consequently, a
wide range of reasonable invariants will work just fine.
Conversely, when designers choose A, which requires
restoring invariants after a partition, they must be
explicit about all the invariants, which is both
challenging and prone to error. At the core, this is the
same concurrent updates problem that makes multithreading
harder than sequential programming.
by markbnj on 9/6/15, 4:16 PM
With that in mind, I do think most of the pitfalls listed here can be avoided with well-understood tools and techniques. There's no real need to be running your cache in-process with your GC'd implementation language. Cache refilling can be a complex challenge for large scale sites, but I expect that a majority of systems can live with slower responses while the cache refills organically from traffic.
The points about testing and reproducible behavior are dead on - no equivocation needed there. As always keeping it as simple as possible should be a priority.
by armon on 9/6/15, 10:09 PM
That said, caching is absolutely critical to almost every piece of software ever. Even if explicit caching isn't used, a wide variety of caches are likely still being depended upon, including CPU caches (L1, L2, L3), OS filesystem caching, DNS caching, ARP caching, etc.
Caching certainly adds complexity, but it's also one of the best patterns for solving a wide range of performance problems. I would recommend developers spend more time learning and understanding the complexities so that they can use caching correctly rather than applying it as a premature optimization.
by zkhalique on 9/6/15, 8:32 PM
I think that, if a cache is combined with a push indicating a change, then it's basically a local "eventually consistent replica" which catches up as soon as there is a connection to the source of truth.
Seriously, many times you are READING data which changes rarely (read: every X minutes / hours / days). So, in the meantime, every code path that will need access to the data may as well look in the local snapshot first.
The question about consistency is an interesting one. The client's view of the authoritative server state may be slightly out of date, when the user issues a request. If certain events happened in the meantime that affect the user's view, then the action can just be kicked back to the user, to be resolved. But 90%+ of the time, the view depends on 10 things that "change rarely", so a cache is a great improvement.
Related issues involve batching / throttling / waiting for already-sent requests to complete.
PS: That was quick. I posted this and literally 10 seconds later it got a downvote.
by chubot on 9/6/15, 7:37 PM
A cache can still be useful to reduce load and increase capacity... but latency behavior becomes more complex.
by velox_io on 9/7/15, 10:51 PM
The problem is that implementing caching is a bit of a canary in a coal mine. If there are problems with the architecture, then trying to add caching into the mix will make things much more difficult.
I wouldn't say adding caching upfront to parts you know will be heavily read (or at least adding hooks to make it easier to implement later) is a waste of time or "Premature Optimisation". The 80-20 rule is alive and well; just use your judgement.
by 0xcde4c3db on 9/6/15, 8:33 PM
I wonder how many sleepless nights have been caused by combining the two.
by contingencies on 9/6/15, 10:07 PM
by patsplat on 9/7/15, 2:17 PM
by amelius on 9/6/15, 10:22 PM