from Hacker News

Firefly – Full Text Search Engine for Dropbox

by human_afterall on 4/12/15, 6:11 PM with 21 comments

  • by eiopa on 4/12/15, 9:12 PM

    I would've loved to hear more details about why you built your own. For example, you mention that Elasticsearch wasn't deployed at your scale and there was some talk about machine footprint, but the post doesn't explain how your solution compares to something like ES.

    Did ES just not scale when you tried it? Is your solution better/faster? If so, by how much and on what workloads?

    Contrast this with something like RocksDB. They just show you the numbers: http://rocksdb.org/

  • by dapz on 4/13/15, 6:41 AM

    "Firstly, we expect some users to have a large number of documents in their Dropbox, making it non-trivial to update their corresponding index “instantly”."

    If the alternative is maintaining a single index, won't updating it take at least as long as updating the per-user indexes? Naively, the former sounds like updating a single, gigantic binary search tree; the latter seems like updating a hashmap of UserId/BST pairs.

    "Secondly, this approach requires the system to maintain as many indices as there are users with each stored in a separate file. With over 300 million users, keeping track of so many indices in production would be an operational nightmare."

    ...Why?

    Anyway, the point about shared documents is probably enough to make per-user indexing a bad idea, but I don't understand the reasons they provided above.
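
    A minimal sketch of the two layouts compared in this comment, assuming toy in-memory dictionaries stand in for real on-disk indexes. `SingleIndex` and `PerUserIndex` are hypothetical names for illustration, not anything described in the Firefly article.

```python
from collections import defaultdict

# Hypothetical layouts for illustration only; not Firefly's actual design.


class SingleIndex:
    """One shared inverted index: token -> set of (user_id, doc_id)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, user_id, doc_id, text):
        for token in text.lower().split():
            self.postings[token].add((user_id, doc_id))

    def search(self, user_id, token):
        # The shared posting set must be filtered down to one user's documents.
        matches = self.postings.get(token.lower(), ())
        return sorted(d for u, d in matches if u == user_id)


class PerUserIndex:
    """A map of user_id -> that user's own inverted index (token -> doc ids)."""

    def __init__(self):
        self.indexes = defaultdict(lambda: defaultdict(set))

    def add(self, user_id, doc_id, text):
        user_index = self.indexes[user_id]
        for token in text.lower().split():
            user_index[token].add(doc_id)

    def search(self, user_id, token):
        # Only this user's index is consulted; unknown users match nothing.
        user_index = self.indexes.get(user_id, {})
        return sorted(user_index.get(token.lower(), ()))


single, per_user = SingleIndex(), PerUserIndex()
for index in (single, per_user):
    index.add("alice", "a1", "quarterly report draft")
    index.add("bob", "b1", "report on search engines")

print(single.search("alice", "report"))    # ['a1']
print(per_user.search("alice", "report"))  # ['a1']
```

    In this toy version an update touches only the posting sets for the tokens in the changed document under either layout. The visible difference is that the shared index filters by user at query time, while the per-user layout multiplies the number of separate structures to keep track of.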

  • by vskr on 4/12/15, 11:17 PM

    Is the LevelDB linked in this article related to the LevelDB developed at Google (https://github.com/google/leveldb)?

  • by majke on 4/12/15, 9:24 PM

    Aren't documents on Dropbox supposed to be encrypted?

  • by tomglindmeier on 4/12/15, 9:58 PM

    I can't help it, but I just don't want Dropbox or anybody else to read all my files.

  • by cdnsteve on 4/13/15, 11:39 AM

    RabbitMQ, interesting. I was just reading up on NSQ and it seems like a good alternative.

    Any details on the tech Firefly was coded in? Go, C++, Java?

  • by georgehm on 4/12/15, 11:27 PM

    Can it do substring search? Unfortunately, Firefly is only available to business customers.

  • by tuyguntn on 4/12/15, 11:13 PM

    Today. We have indexed all of your documents, so you can easily search inside them, even though you have created a good directory structure and named your files accordingly.

    Tomorrow. Hmm, we have your data, lots of data. We wanted to know what interests our users, so we decided to analyse it and find people with common interests.

    Day after tomorrow. Miss Rice challenged us: "Can you find terrorist users using all of the documents you have indexed and analysed?"

    Future. Hey user, your first name is strange, your documents contain some strange characters, and you are uploading data from a country our political leaders have problems with. Are you a terrorist?

  • by escaped_hn on 4/13/15, 5:18 PM

    If Dropbox is now letting you search and index your files, then they've been doing it for months.