[BUG] Why text-generation-inference using s3 with fluid is slower than s3 with s3-fuse (first access) #4425

Open
hualongfeng opened this issue Dec 4, 2024 · 1 comment
Labels: bug (Something isn't working)

What is your environment (Kubernetes version, Fluid version, etc.)?

~/ai/opea/chatqna# kubectl get node
NAME               STATUS   ROLES           AGE    VERSION
icelake-server-2   Ready    control-plane   19d    v1.29.9
opea-dev10         Ready    <none>          5d2h   v1.29.9
~/ai/opea/chatqna# kubectl version
Client Version: v1.29.9
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.9
~/ai/opea/chatqna# helm list
NAME 	NAMESPACE	REVISION	UPDATED                                	STATUS  	CHART      	APP VERSION  
fluid	default  	1       	2024-11-15 13:53:02.904415378 +0800 CST	deployed	fluid-1.0.3	1.0.3-ccdf3a9
root@icelake-server-2:~/ai/opea/chatqna# sudo lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.5 LTS
Release:	22.04
Codename:	jammy
root@opea-dev10:~# sudo lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.4 LTS
Release:	22.04
Codename:	jammy

Describe the bug
I built a Ceph cluster for S3 storage on one server, and a K8s node accesses that S3 storage through Fluid and through s3fs-fuse. Timings for the first access:

Phase      s3 w/ Fluid (no dataload), first access   s3 w/ s3-fuse, first access
download   6.105737                                  8.808546
shard      390.42                                    56.09
others     12.60                                     16.15

(All times in seconds.)

First access through Fluid is slower than through s3fs-fuse; the gap is almost entirely in the shard-loading phase (390.42 s vs. 56.09 s).

Note: echo 3 > /proc/sys/vm/drop_caches was run on the K8s node before every test, so each run starts with a cold page cache.
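
For concreteness, a minimal sketch of the per-test procedure implied above (the pod name and YAML file match the reproduction steps below; the phase timings are presumably read from the pod's startup logs):

# on the k8s node, as root:
$ sync && echo 3 > /proc/sys/vm/drop_caches    # start each run with a cold page cache
$ kubectl delete pod text-generation-pod --ignore-not-found
$ kubectl apply -f pod_for_text-generation-inference.yaml
$ kubectl logs -f text-generation-pod          # read the download/shard timings here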

What you expect to happen:

I would expect the two tests to take roughly the same amount of time.

How to reproduce it
Build a Ceph S3 environment (reference: https://github.com/ceph/ceph):

$ git clone https://github.com/ceph/ceph.git
$ cd ceph
$ git submodule update --init --recursive --progress
$ sudo apt install curl
$ sudo ./install-deps.sh
$ sudo apt install python3-routes
$ ./do_cmake.sh
$ cd build
$ ninja -j32
$ cat start_ceph.sh
  MON=1 OSD=4 MDS=0 MGR=1 RGW=1 ../src/vstart.sh -n --bluestore -X \
        -o "osd_pool_default_pg_autoscale_mode=off" \
        -o "osd pool default size = 2" \
        -o "osd_pool_default_min_size = 2" \
        -o "mon_allow_pool_size_one = true" \
        -o "bluestore_block_wal_path = \"\"" \
        -o "bluestore_block_db_path = \"\"" \
        -o "bluestore_bluefs = true" \
        -o "bluestore_block_create = false" \
        -o "bluestore_block_db_create = false" \
        -o "bluestore_block_wal_create = false" \
        -o "bluestore_block_wal_separate = false" \
        -o "osd_op_num_shards = 32" \
        -o "osd_op_num_threads_per_shard = 2" \
        -o "osd_memory_target = 32G" \
        -o "rbd cache = false" \
        -o "ms_async_op_threads = 3" \
        --bluestore-devs /dev/nvme4n1,/dev/nvme5n1,/dev/nvme6n1,/dev/nvme7n1
$ ./start_ceph.sh
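Optionally, confirm the vstart cluster came up before moving on (a sketch; vstart places the client binaries under the build directory):

$ ./bin/ceph -s          # overall cluster health and daemon counts
$ ./bin/ceph osd tree    # all four OSDs should show up/in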
$ sudo apt install s3cmd
$ cat <<EOF | sudo tee ~/.s3cfg
[default]
access_key = 0555b35654ad1656d804
host_base = 192.168.0.62:8000
host_bucket = no.way.in.hell
secret_key = h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q==
use_https = False
EOF
$ s3cmd mb s3://opea-models                            # create the s3 bucket
$ s3cmd sync --follow-symlinks ./* s3://opea-models/   # upload the model data to s3
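A quick sanity check that the model data landed in the bucket (sketch):

$ s3cmd ls s3://opea-models/    # list uploaded objects/prefixes
$ s3cmd du s3://opea-models/    # total size stored in the bucket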
1. Using S3 with Fluid
root@icelake-server-2:~/ai/opea/chatqna# cat fluid_opea_s3_only_read_chat_7b.yaml 
---
apiVersion: v1
kind: Secret
metadata:
  name: s3-secret
  namespace: default
type: Opaque
data:
  AWS_ACCESS_KEY_ID: "MDU1NWIzNTY1NGFkMTY1NmQ4MDQ="  # echo -n '{access_key}' | base64
  AWS_SECRET_ACCESS_KEY: "aDdHaHh1QkxUcmxoVlV5eFNQVUtVVjhyLzJFSTRuZ3FKeEQ3aUJkQllMaHdsdU4zMEphVDNRPT0="   # echo -n '{secret_key}' | base64
---

---
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: opea-models
spec:
  mounts:
    - mountPoint: s3://opea-models/models--Intel--neural-chat-7b-v3-3
      name: models--Intel--neural-chat-7b-v3-3
      options:
        alluxio.underfs.s3.endpoint: http://192.168.0.62:8000
        alluxio.underfs.s3.disable.dns.buckets: "true"
        alluxio.underfs.s3.inherit.acl: "false"
      encryptOptions:
      - name: aws.accessKeyId
        valueFrom:
          secretKeyRef:
            name: s3-secret
            key: AWS_ACCESS_KEY_ID
      - name: aws.secretKey
        valueFrom:
          secretKeyRef:
            name: s3-secret
            key: AWS_SECRET_ACCESS_KEY
  accessModes:
    - ReadOnlyMany
---
---
# runtime
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: opea-models
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 50Gi
        high: "0.95"
        low: "0.7"
---
root@icelake-server-2:~/ai/opea/chatqna# cat pod_for_text-generation-inference.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: text-generation-pod
spec:
  containers:
    - name: text-generation-container
      image: ghcr.io/huggingface/text-generation-inference:2.2.0
      args: ["--model-id", "Intel/neural-chat-7b-v3-3"]
      ports:
        - name: http
          containerPort: 8080
          protocol: TCP
      resources:
        limits:
          nvidia.com/gpu: 4
      volumeMounts:
        - mountPath: /data
          name: model-volume
      securityContext:
        capabilities:
          add: ["SYS_ADMIN"]
  volumes:
    - name: model-volume
      persistentVolumeClaim:
        claimName: opea-models
  restartPolicy: Never

root@icelake-server-2:~/ai/opea/chatqna#  kubectl apply -f fluid_opea_s3_only_read_chat_7b.yaml
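Before starting the pod, it is worth waiting until Fluid reports the dataset as Bound and has created the matching PVC (a sketch; the resource names come from the YAML above):

root@icelake-server-2:~/ai/opea/chatqna# kubectl get dataset opea-models   # PHASE should be Bound
root@icelake-server-2:~/ai/opea/chatqna# kubectl get pvc opea-models       # the claim the pod mounts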
root@icelake-server-2:~/ai/opea/chatqna# kubectl apply -f pod_for_text-generation-inference.yaml 
root@icelake-server-2:~/ai/opea/chatqna# kubectl logs text-generation-pod
2. Using S3 with s3fs-fuse (reference: https://github.com/s3fs-fuse/s3fs-fuse)
root@icelake-server-2:~/ai/opea/chatqna# echo ACCESS_KEY_ID:SECRET_ACCESS_KEY > ${HOME}/.passwd-s3fs
root@icelake-server-2:~/ai/opea/chatqna# chmod 600 ${HOME}/.passwd-s3fs
root@icelake-server-2:~/ai/opea/chatqna# s3fs opea-models-no-blobs ./s3-mount -o passwd_file=~/.passwd-s3fs -o url=http://192.168.0.62:8000 -o no_check_certificate -o nonempty -o use_path_request_style
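Before applying the pod spec, verify the FUSE mount (a sketch; note that the pod below reads the model via hostPath /mnt/s3-mount, so the bucket must be mounted at that path on the node running the pod):

root@icelake-server-2:~/ai/opea/chatqna# mount | grep s3fs    # confirm the s3fs mount is active
root@icelake-server-2:~/ai/opea/chatqna# ls ./s3-mount        # model files should be visible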
root@icelake-server-2:~/ai/opea/chatqna# cat pod_for_text-generation-inference.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: text-generation-pod
spec:
  containers:
    - name: text-generation-container
      image: ghcr.io/huggingface/text-generation-inference:2.2.0
      args: ["--model-id", "Intel/neural-chat-7b-v3-3"]
      ports:
        - name: http
          containerPort: 8080
          protocol: TCP
      resources:
        limits:
          nvidia.com/gpu: 4
      volumeMounts:
        - mountPath: /data
          name: model-volume
      securityContext:
        capabilities:
          add: ["SYS_ADMIN"]
  volumes:
    - name: model-volume
      hostPath:
        path: /mnt/s3-mount
        type: Directory
  restartPolicy: Never
root@icelake-server-2:~/ai/opea/chatqna# kubectl apply -f pod_for_text-generation-inference.yaml 
root@icelake-server-2:~/ai/opea/chatqna# kubectl logs text-generation-pod
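
The per-phase timings in the table above were presumably extracted from these startup logs; a hypothetical filter for the relevant lines:

$ kubectl logs text-generation-pod | grep -iE 'download|shard|ready'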

Additional Information

hualongfeng added the bug label on Dec 4, 2024
cheyang (Collaborator) commented on Dec 17, 2024:

I suspect it has something to do with your hardware configuration. If possible, please drop me an email; my address is [email protected].
