by li4ick on 6/24/22, 12:53 AM with 1 comments
by dekhn on 6/24/22, 1:12 AM
After this document was written, Hadoop became popular outside Google but had no end of problems. Eventually, most systems were replaced with more advanced ones; for example, MapReduce was replaced by Flume (Java/C++/whatever). And oftentimes, people run jobs in these systems against storage systems that have indexing.
Most importantly, there was no RDBMS that could build the Google index at the time, and Google only succeeded because they could build large indices fast. It literally was a technology that made or broke the company. (I was hired around 2008 to help run a system that was mission critical and did run on an RDBMS, but it played a very different role from MapReduce. I also worked on one of the hairiest MapReduces, used to do something it really was not well designed for: large-scale machine learning.)
Note that DeWitt is the guy that the DeWitt clause was written for. And Stonebraker invented modern RDBMSs. Why they chose this hill to die on (and there was a whole saga that happened after this paper was written) is mystifying to me.