from Hacker News

Google Cloud Storage FUSE

by mvolfik on 5/2/23, 8:46 AM with 108 comments

  • by ofek on 5/2/23, 4:37 PM

    I do appreciate that Google is now officially supporting gcsfuse, because it genuinely is a great project. However, their Kubernetes CSI driver seems to have in large part copied code from the one that a co-maintainer and I have been working on for years:

    - https://github.com/GoogleCloudPlatform/gcs-fuse-csi-driver

    - https://github.com/ofek/csi-gcs

    Here is the initial commit: https://github.com/GoogleCloudPlatform/gcs-fuse-csi-driver/c...

    Notice, for example, not just the code but also the associated files. The Dockerfile blatantly copies the one from my repo, down to the dual license I chose because I was very into Rust at the time. Or take a look at the deployment examples, which use Kustomize; I like it, but it is very uncommon, and most Kubernetes projects provide Helm charts instead.

    They were most certainly aware of the project, because Google reached out to discuss potential collaboration but then never followed up: https://imgur.com/a/KDuf9mj

  • by MontyCarloHall on 5/2/23, 12:22 PM

    I’ve experimented with using gcsfuse and its AWS equivalent, s3fs-fuse, in production. At best, they are suited to niche applications; at worst, they are merely nice toys. The issue is that every file system operation is fundamentally an HTTP request, so latency is several orders of magnitude higher than for the equivalent disk operation.

    For certain applications that consistently read limited subsets of the filesystem, this can be mitigated somewhat by the disk cache, but for applications that would thrash the cache, cloud buckets are simply not a good storage backend if you desire disk-like access.

    What I would really like to see is a two-tier cache system: most recently accessed files are cached to RAM, with less recently accessed files spilling over to a disk-backed cache. That would open up a world of additional applications whose useful cache size exceeds practical RAM amounts.
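    The two-tier scheme proposed here could be sketched roughly as follows. This is a toy illustration, not gcsfuse or rclone code; the class name, RAM limit, and spill directory are invented for the example:

```python
import os
import tempfile
from collections import OrderedDict

class TwoTierCache:
    """Toy two-tier cache: hot entries in RAM, LRU evictions spill to disk."""

    def __init__(self, ram_limit, disk_dir):
        self.ram_limit = ram_limit   # max number of entries held in RAM
        self.ram = OrderedDict()     # key -> bytes, maintained in LRU order
        self.disk_dir = disk_dir     # spill directory for the cold tier

    def _disk_path(self, key):
        return os.path.join(self.disk_dir, key)

    def put(self, key, data):
        self.ram[key] = data
        self.ram.move_to_end(key)    # mark as most recently used
        while len(self.ram) > self.ram_limit:
            # Evict the least recently used entry to the disk tier.
            cold_key, cold_data = self.ram.popitem(last=False)
            with open(self._disk_path(cold_key), "wb") as f:
                f.write(cold_data)

    def get(self, key):
        if key in self.ram:          # RAM hit: refresh recency
            self.ram.move_to_end(key)
            return self.ram[key]
        path = self._disk_path(key)
        if os.path.exists(path):     # disk hit: promote back into RAM
            with open(path, "rb") as f:
                data = f.read()
            os.remove(path)
            self.put(key, data)
            return data
        return None                  # full miss: an HTTP fetch would go here
```

    A real implementation would also bound the disk tier and fall through to a Cloud Storage request on a full miss, but the spill-and-promote mechanics are the core of the idea.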

  • by nickcw on 5/2/23, 3:21 PM

    As the author of rclone I thought I'd have a quick look through the docs to see what this is about.

    From reading the docs, it looks very similar to `rclone mount` with `--vfs-cache-mode off` (the default). The limitations are almost identical.

    * Metadata: Cloud Storage FUSE does not transfer object metadata when uploading files to Cloud Storage, with the exception of mtime and symlink targets. This means that you cannot set object metadata when you upload files using Cloud Storage FUSE. If you need to preserve object metadata, consider uploading files using gsutil, the JSON API, or the Google Cloud console.

    * Concurrency: Cloud Storage FUSE does not provide concurrency control for multiple writes to the same file. When multiple writes try to replace a file, the last write wins and all previous writes are lost. There is no merging, version control, or user notification of the subsequent overwrite.

    * Linking: Cloud Storage FUSE does not support hard links.

    * File locking and file patching: Cloud Storage FUSE does not support file locking or file patching. As such, you should not store version control system repositories in Cloud Storage FUSE mount points, as version control systems rely on file locking and patching. Additionally, you should not use Cloud Storage FUSE as a filer replacement.

    * Semantics: Semantics in Cloud Storage FUSE are different from semantics in a traditional file system. For example, metadata like last access time are not supported, and some metadata operations like directory renaming are not atomic. For a list of differences between Cloud Storage FUSE semantics and traditional file system semantics, see Semantics in the Cloud Storage FUSE GitHub documentation.

    * Overwriting in the middle of a file: Cloud Storage FUSE does not support overwriting in the middle of a file. Only sequential writes are supported.

    * Access: Authorization for files is governed by Cloud Storage permissions. POSIX-style access control does not work.

    However, rclone has `--vfs-cache-mode writes`, which caches file writes to disk first to allow overwriting in the middle of a file, and `--vfs-cache-mode full`, which caches all objects on an LRU basis. Both make the file system a whole lot more POSIX-compatible, and most applications will run under `--vfs-cache-mode writes` that won't under `--vfs-cache-mode off`.

    And of course rclone supports s3/azureblob/b2/r2/sftp/webdav/etc/etc also...

    I don't think it is possible to adapt something with cloud storage semantics to a file system without caching to disk, unless you are willing to give up the 1:1 mapping between files seen in the mount and objects in the cloud storage.

  • by milesward on 5/2/23, 4:10 PM

    Please, listen to me: use this only in extremely limited cases where performance, stability, and cost efficiency are not paramount. An object store is not a file system no matter how hard you bludgeon it.

  • by rippercushions on 5/2/23, 12:02 PM

    Is this the same gcsfuse that's been around for years, only now with official Google support?

    https://github.com/GoogleCloudPlatform/gcsfuse

  • by askvictor on 5/2/23, 12:43 PM

    Now for official Google Drive support on Linux...

  • by jijji on 5/2/23, 2:03 PM

    I've been using rclone [0] to do the same under Linux for years; how is this different?

    [0] https://rclone.org

  • by ISL on 5/2/23, 1:12 PM

    Can this be used to mount Drive under Linux?

  • by retrocryptid on 5/2/23, 3:34 PM

    This has been a thing for a while; I remember using it (or something like it) several years ago. While it's great for random files you might want to place in the G-Cloud, what I really wanted was to access my Google Docs content from the Linux command line. And you can, it's just that the files are in non-obvious, undocumented, frequently changing formats that will only ever be usable with Google Docs.

    But if you're using the google cloud like you might use Box.Net or DropBox, it seems fine for light usage.

  • by manigandham on 5/2/23, 12:47 PM

    Object storage is a higher-level abstraction than block-storage. FUSE and similar tech can do the job for basic requirements like read-only access by legacy applications but rarely works well for other scenarios.

    A more complex layer like https://objectivefs.com/ (based on the S3 API) would be more useful, although I would've expected the cloud providers to scale their own block-store/SANs backed with object-stores by now.

  • by remram on 5/2/23, 1:33 PM

    See also: JuiceFS: https://juicefs.com/

    Adds a DBMS or key-value store for metadata, which makes the filesystem much faster and POSIX-compliant (small overwrites don't have to replace a full object in the GCS/S3 backend).

    Almost certainly a better solution if you want to turn your object storage into a mountable filesystem, with the (big) caveat that you can't access the files directly in the bucket (they are not stored transparently).
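    The tradeoff described here can be illustrated with a toy chunked layout. This is not JuiceFS's actual format; the chunk size, names, and in-memory "store" are invented. The point is that when a metadata map records which immutable chunk object backs each file offset, a small overwrite creates one new chunk object and updates the map, instead of re-uploading the whole file. It also shows why the bucket contents stop being transparent: the bucket holds chunk objects, not your files.

```python
CHUNK = 4  # toy chunk size in bytes; real systems use MiB-sized chunks

class ChunkedFile:
    """Toy chunked layout: a metadata map from chunk index to object key."""

    def __init__(self, store):
        self.store = store   # dict standing in for the object store (bucket)
        self.meta = {}       # chunk index -> object key (the "DBMS" part)
        self.next_id = 0

    def write(self, offset, data):
        rewritten = 0
        while data:
            idx, off = divmod(offset, CHUNK)
            # Read-modify-write only the affected chunk, not the whole file.
            old = self.store.get(self.meta.get(idx), b"\0" * CHUNK)
            take = data[:CHUNK - off]
            new = old[:off] + take + old[off + len(take):]
            key = f"chunk-{self.next_id}"  # new immutable chunk object
            self.next_id += 1
            self.store[key] = new
            self.meta[idx] = key           # a metadata update, not a data copy
            rewritten += 1
            data, offset = data[len(take):], offset + len(take)
        return rewritten                   # number of chunk objects touched

    def read(self, offset, size):
        out = b""
        while size > 0:
            idx, off = divmod(offset, CHUNK)
            chunk = self.store.get(self.meta.get(idx), b"\0" * CHUNK)
            take = chunk[off:off + size]
            out += take
            offset, size = offset + len(take), size - len(take)
        return out
```

    With a plain gcsfuse/s3fs-style 1:1 mapping, the same two-byte overwrite would re-upload the entire object.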

  • by jefftk on 5/2/23, 12:18 PM

    "Cloud Storage FUSE does not support overwriting in the middle of a file. Only sequential writes are supported."

    This seems like a big limitation?

  • by iamjk on 5/2/23, 3:12 PM

    I mean, I get why everyone wants everything to be FUSE-compatible, but some things just aren't meant to be done.

  • by goodpoint on 5/2/23, 12:54 PM

    FUSE is really not suitable for this.

  • by trollied on 5/2/23, 12:10 PM

    Be aware that this is not free:

    "Cloud Storage FUSE is available free of charge, but the storage, metadata, and network I/O it generates to and from Cloud Storage are charged like any other Cloud Storage interface. In other words, all data transfer and operations performed by Cloud Storage FUSE map to Cloud Storage transfers and operations, and are charged accordingly."