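JuiceFS: Metadata Explained

Introduction

JuiceFS is a high-performance distributed file system designed for cloud-native environments. Its key characteristics are:

  • Data storage and metadata storage are decoupled, so it can work with a variety of data and metadata storage engines.

  • The storage backend can connect directly to many kinds of object storage, which makes it easier to use and a better fit for the move toward cloud services.

For the technical architecture, see: https://juicefs.com/docs/zh/community/architecture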

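JuiceFS Deployment

Deployment Plan

  • This article uses MySQL as the metadata storage engine.

  • Because the focus is on how metadata is organized, an S3-compatible object store serves as the data storage engine.

Deployment Steps

  • Install MySQL and create a database named juicefs.

  • Install JuiceFS.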

[root@k8s-master /data/juicefs]# tar -zxf juicefs-1.0.3-linux-amd64.tar.gz
[root@k8s-master /data/juicefs]# sudo install juicefs /usr/local/bin/
  • Format the volume, using MySQL as the metadata storage engine and the S3 bucket jfs as the data storage backend.

[root@k8s-master /data/juicefs]# juicefs format --storage s3 --bucket http://100.99.50.95/jfs --access-key xxxxxxxx --secret-key xxxxxxxx "mysql://root:@(127.0.0.1:3306)/juicefs" myjfs
2023/01/11 14:07:06.680579 juicefs[3629024] <INFO>: Meta address: mysql://root:****@(127.0.0.1:3306)/juicefs [interface.go:402]
2023/01/11 14:07:06.681384 juicefs[3629024] <WARNING>: the database does not support read-only transaction [sql.go:686]
2023/01/11 14:07:06.681977 juicefs[3629024] <INFO>: Data use s3://jfs/myjfs/ [format.go:429]
2023/01/11 14:07:06.938532 juicefs[3629024] <INFO>: Volume is formatted as {
  "Name": "myjfs",
  "UUID": "3f584ef7-0fa1-4f75-813e-3763b73977ee",
  "Storage": "s3",
  "Bucket": "http://100.99.50.95/jfs",
  "AccessKey": "xxxxxxxx",
  "SecretKey": "xxxxxxxx",
  "BlockSize": 4096,
  "Compression": "none",
  "KeyEncrypted": true,
  "TrashDays": 1,
  "MetaVersion": 1
} [format.go:466]
  • Mount the file system myjfs at /data/juicefs/myjfs.

[root@k8s-master /data/juicefs]# juicefs mount "mysql://root:@(127.0.0.1:3306)/juicefs" /data/juicefs/myjfs/
2023/01/11 14:08:15.368521 juicefs[3630386] <INFO>: Meta address: mysql://root:****@(127.0.0.1:3306)/juicefs [interface.go:402]
2023/01/11 14:08:15.369229 juicefs[3630386] <WARNING>: the database does not support read-only transaction [sql.go:686]
2023/01/11 14:08:15.370647 juicefs[3630386] <INFO>: Data use s3://jfs/myjfs/ [mount.go:428]
2023/01/11 14:08:15.370959 juicefs[3630386] <INFO>: Disk cache (/var/jfsCache/3f584ef7-0fa1-4f75-813e-3763b73977ee/): capacity (102400 MB), free ratio (10%), max pending pages (15) [disk_cache.go:94]
2023/01/11 14:08:15.381783 juicefs[3630386] <INFO>: Create session 1 OK with version: 1.0.3+2022-12-27.e4bf15a [base.go:275]
2023/01/11 14:08:15.382575 juicefs[3630386] <INFO>: Prometheus metrics listening on 127.0.0.1:9567 [mount.go:161]
2023/01/11 14:08:15.382653 juicefs[3630386] <INFO>: Mounting volume myjfs at /data/juicefs/myjfs/ ... [mount_unix.go:181]
2023/01/11 14:08:15.872197 juicefs[3630386] <INFO>: OK, myjfs is ready at /data/juicefs/myjfs/ [mount_unix.go:45]

JuiceFS Metadata Analysis

Write the following data into the mount directory.

[root@k8s-master /data/juicefs/myjfs]# tree -h
.
├── [4.0K]  dir0
│   └── [1.0M]  file1
└── [256M]  file1

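edge: finding an inode by name

To locate the file myjfs/dir0/file1, first look up dir0 under the root (parent 1) in the table below, which gives inode 3; then look up the file1 entry whose parent is 3, which gives inode 4.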

MariaDB [juicefs]> select * from jfs_edge;
+----+--------+-------+-------+------+
| id | parent | name  | inode | type |
+----+--------+-------+-------+------+
|  1 |      1 | file1 |     2 |    1 |
|  2 |      1 | dir0  |     3 |    2 |
|  3 |      3 | file1 |     4 |    1 |
+----+--------+-------+-------+------+
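
The lookup described above can be sketched in a few lines of Python. This is only an illustration of walking the edge table one path component at a time; the dictionary is a hand-copied version of the jfs_edge rows above, and the names EDGES, ROOT_INODE, and resolve are made up for this example rather than taken from JuiceFS.

```python
# Rows copied from the jfs_edge output above: (parent, name) -> (inode, type).
EDGES = {
    (1, "file1"): (2, 1),
    (1, "dir0"): (3, 2),
    (3, "file1"): (4, 1),
}

ROOT_INODE = 1  # in the table above, top-level entries have parent 1


def resolve(path: str) -> int:
    """Walk the edge table one path component at a time and return the inode."""
    inode = ROOT_INODE
    for name in path.strip("/").split("/"):
        if not name:
            continue  # an empty path resolves to the root itself
        inode, _type = EDGES[(inode, name)]
    return inode


print(resolve("/dir0/file1"))
print(resolve("/file1"))
```

Each path component costs one lookup keyed by (parent, name), which is why the edge table stores the parent inode alongside the name.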

inode: file and directory attributes

MariaDB [juicefs]> select * from jfs_node;
+---------------------+------+-------+------+-----+-----+------------------+------------------+------------------+-------+-----------+------+--------+
| inode               | type | flags | mode | uid | gid | atime            | mtime            | ctime            | nlink | length    | rdev | parent |
+---------------------+------+-------+------+-----+-----+------------------+------------------+------------------+-------+-----------+------+--------+
|                   1 |    2 |     0 |  511 |   0 |   0 | 1673417226937074 | 1673417630041311 | 1673417630041311 |     3 |      4096 |    0 |      1 |
|                   2 |    1 |     0 |  420 |   0 |   0 | 1673417333270140 | 1673417336289864 | 1673417336289864 |     1 | 268435456 |    0 |      1 |
|                   3 |    2 |     0 |  493 |   0 |   0 | 1673417630041311 | 1673417700029898 | 1673417700029898 |     2 |      4096 |    0 |      1 |
|                   4 |    1 |     0 |  420 |   0 |   0 | 1673417700029898 | 1673417700121148 | 1673417700121148 |     1 |   1048576 |    0 |      3 |
| 9223372032828243968 |    2 |     0 |  365 |   0 |   0 | 1673417226937074 | 1673417226937074 | 1673417226937074 |     2 |      4096 |    0 |      1 |
+---------------------+------+-------+------+-----+-----+------------------+------------------+------------------+-------+-----------+------+--------+
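
The mode and time columns above are easier to read once decoded. The sketch below assumes that mode is the permission bits stored as a decimal number and that the timestamps are microseconds since the Unix epoch; both assumptions fit the values shown (511 is 0777, and the first atime matches the time of the format log in UTC). The type names are inferred from the edge table, where dir0 has type 2. The last row's inode, 9223372032828243968, equals 0x7FFFFFFF10000000, which appears to be the reserved trash directory (the volume was formatted with TrashDays set to 1).

```python
from datetime import datetime, timezone

# (inode, type, mode, atime, length) copied from the jfs_node output above.
ROWS = [
    (1, 2, 511, 1673417226937074, 4096),
    (2, 1, 420, 1673417333270140, 268435456),
    (3, 2, 493, 1673417630041311, 4096),
    (4, 1, 420, 1673417700029898, 1048576),
]

TYPE_NAMES = {1: "file", 2: "dir"}  # inferred: dir0 has type 2 in jfs_edge


def describe(row) -> str:
    """Render one jfs_node row with octal mode and a readable UTC atime."""
    inode, typ, mode, atime_us, length = row
    # Assumption: timestamps are microseconds since the Unix epoch.
    atime = datetime.fromtimestamp(atime_us // 1_000_000, tz=timezone.utc)
    return (
        f"inode={inode} {TYPE_NAMES[typ]} mode={mode:04o} "
        f"atime={atime:%Y-%m-%d %H:%M:%S} length={length}"
    )


for row in ROWS:
    print(describe(row))
```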

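chunk: mapping a chunk to its inode and slices

  • Inode 2 is the file myjfs/file1. It is 256 MiB, so it consists of four 64 MiB chunks.

  • Inode 4 is the file myjfs/dir0/file1. It is 1 MiB, so it consists of a single chunk.

The slices column is a blob, so its contents do not print in the output below.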

MariaDB [juicefs]> select * from jfs_chunk;
+----+-------+------+--------------------------+
| id | inode | indx | slices                   |
+----+-------+------+--------------------------+
|  1 |     2 |    0 |                       |
|  2 |     2 |    1 |                       |
|  3 |     2 |    2 |                       |
|  4 |     2 |    3 |                       |
|  5 |     4 |    0 |                       |
+----+-------+------+--------------------------+
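
One way to look inside the blob is to select it as hex, for example with SELECT inode, indx, HEX(slices) FROM jfs_chunk; and then decode it offline. The sketch below assumes each slice is a 24-byte big-endian record of pos (4 bytes), id (8 bytes), size (4 bytes), off (4 bytes), and len (4 bytes); that layout is an assumption here and should be checked against the JuiceFS version in use. The sample blob is built by hand to resemble the 1 MiB file (slice id 6); it was not read from the database.

```python
import struct

# Assumed record layout: pos(u32) id(u64) size(u32) off(u32) len(u32), big-endian.
SLICE = struct.Struct(">IQIII")


def decode_slices(blob: bytes):
    """Split a slices blob into 24-byte records and unpack each one."""
    if len(blob) % SLICE.size:
        raise ValueError("blob length is not a multiple of the record size")
    fields = ("pos", "id", "size", "off", "len")
    return [dict(zip(fields, rec)) for rec in SLICE.iter_unpack(blob)]


# Hand-built sample, NOT read from the database: one 1 MiB slice with id 6.
sample = SLICE.pack(0, 6, 1048576, 0, 1048576)
print(sample.hex())
print(decode_slices(sample))
```

To decode a real row, pass bytes.fromhex() of the HEX(slices) value to decode_slices.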

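block: the objects that hold file data in object storage

The following command shows how a file is stored in object storage.

For example, myjfs/dir0/file1 is stored in the bucket jfs under the object name myjfs/chunks/0/0/6_0_1048576.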

[root@k8s-master /data/juicefs/myjfs]# juicefs info dir0/file1 
dir0/file1 :
  inode: 4
  files: 1
   dirs: 0
 length: 1.00 MiB (1048576 Bytes)
   size: 1.00 MiB (1048576 Bytes)
   path: /dir0/file1
objects:
+------------+------------------------------+---------+--------+---------+
| chunkIndex |          objectName          |   size  | offset |  length |
+------------+------------------------------+---------+--------+---------+
|          0 | myjfs/chunks/0/0/6_0_1048576 | 1048576 |      0 | 1048576 |
+------------+------------------------------+---------+--------+---------+

Metadata Encoding Analysis

inode

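The inode ID is an increasing integer. In the JuiceFS source it is allocated in batches from a global counter (nextInode) kept in the metadata engine, the same mechanism used for slice IDs, rather than by a MySQL AUTO_INCREMENT column.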

Data object name

As the source code below shows, the name is built from the slice id, the block index, and the block size.

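// key returns the object name of block indx within this slice.
func (s *rSlice) key(indx int) string {
    if s.store.conf.HashPrefix {
        // Hash-prefix layout: the first directory level is the low byte of the slice id in hex.
        return fmt.Sprintf("chunks/%02X/%v/%v_%v_%v", s.id%256, s.id/1000/1000, s.id, indx, s.blockSize(indx))
    }
    // Default layout: two directory levels derived from the slice id, then <slice id>_<block index>_<block size>.
    return fmt.Sprintf("chunks/%v/%v/%v_%v_%v", s.id/1000/1000, s.id/1000, s.id, indx, s.blockSize(indx))
}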
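
To check the naming rule against the juicefs info output, the same formatting can be reproduced in Python. This is a direct transcription of the two format strings above, not JuiceFS code; object_key is a name chosen for this example, and the volume name (myjfs/) that prefixes the object name in the info output is left out.

```python
def object_key(slice_id: int, indx: int, block_size: int, hash_prefix: bool = False) -> str:
    """Build the object name for one block, mirroring the Go format strings."""
    if hash_prefix:
        # Low byte of the slice id as two hex digits, then slice_id / 1000 / 1000.
        return (
            f"chunks/{slice_id % 256:02X}/{slice_id // 1000 // 1000}/"
            f"{slice_id}_{indx}_{block_size}"
        )
    # Default layout: slice_id / 1000 / 1000, then slice_id / 1000.
    return (
        f"chunks/{slice_id // 1000 // 1000}/{slice_id // 1000}/"
        f"{slice_id}_{indx}_{block_size}"
    )


print(object_key(6, 0, 1048576))           # chunks/0/0/6_0_1048576
print(object_key(1234567, 0, 4194304))     # chunks/1/1234/1234567_0_4194304
```

The first call reproduces the object name shown for dir0/file1 (slice id 6, block 0, 1048576 bytes).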

slice id

Slice IDs are taken from a global ID pool. The source is as follows:

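// NewSlice hands out the next unused slice id through *id.
func (m *baseMeta) NewSlice(ctx Context, id *uint64) syscall.Errno {
    m.freeMu.Lock()
    defer m.freeMu.Unlock()
    // The local range is exhausted: reserve a new batch from the shared "nextChunk" counter.
    if m.freeSlices.next >= m.freeSlices.maxid {
        v, err := m.en.incrCounter("nextChunk", sliceIdBatch)
        if err != nil {
            return errno(err)
        }
        // The reserved range is [v-sliceIdBatch, v).
        m.freeSlices.next = uint64(v) - sliceIdBatch
        m.freeSlices.maxid = uint64(v)
    }
    *id = m.freeSlices.next
    m.freeSlices.next++
    return 0
}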
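
The effect of reserving IDs in batches is easiest to see with two clients sharing one counter. The following is a minimal simulation of the logic in NewSlice, not JuiceFS code: Counter is a hypothetical stand-in for the counter in the metadata engine, the initial value of 1 and the batch size of 4 are assumptions chosen to keep the output short, and incr is assumed to return the value after the increment (which is what the v - sliceIdBatch arithmetic in the Go code implies).

```python
import threading


class Counter:
    """Hypothetical stand-in for the shared counter in the metadata engine."""

    def __init__(self, value: int = 1):
        self._value = value
        self._lock = threading.Lock()

    def incr(self, delta: int) -> int:
        # Atomically add delta and return the new value.
        with self._lock:
            self._value += delta
            return self._value


class SliceIdAllocator:
    """Per-client allocator that reserves IDs from the counter in batches."""

    def __init__(self, counter: Counter, batch: int):
        self._counter = counter
        self._batch = batch
        self._next = 0
        self._maxid = 0
        self._lock = threading.Lock()

    def new_slice(self) -> int:
        with self._lock:
            if self._next >= self._maxid:
                # Local range exhausted: reserve [v - batch, v) from the counter.
                v = self._counter.incr(self._batch)
                self._next = v - self._batch
                self._maxid = v
            sid = self._next
            self._next += 1
            return sid


shared = Counter(1)  # assumed initial counter value
client_a = SliceIdAllocator(shared, batch=4)
client_b = SliceIdAllocator(shared, batch=4)
print([client_a.new_slice() for _ in range(3)])
print([client_b.new_slice() for _ in range(2)])
print([client_a.new_slice() for _ in range(2)])
```

Client A reserves 1 to 4, client B reserves 5 to 8, and A's next batch starts at 9. Each client touches the shared counter only once per batch, at the cost of IDs that are unique but not contiguous across clients.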

block index

A chunk is 64 MiB and each block is 4 MiB, so a full chunk is split into 16 blocks whose indexes increase from 0 to 15.
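
The arithmetic can be written out as a short sketch. It assumes the 64 MiB chunk size stated above and a 4 MiB block size (the "BlockSize": 4096 in the format output, presumably in KiB), and it assumes each chunk was written as a single slice, as in this test. The function names are made up for this example.

```python
CHUNK_SIZE = 64 << 20  # 64 MiB per chunk, as stated in the text
BLOCK_SIZE = 4 << 20   # 4 MiB per block (assumed from "BlockSize": 4096)


def chunk_count(file_len: int) -> int:
    """Number of chunks needed for a file of file_len bytes (ceiling division)."""
    return -(-file_len // CHUNK_SIZE)


def block_sizes(slice_len: int, block_size: int = BLOCK_SIZE):
    """Sizes of the blocks of one slice; only the last block may be short."""
    full, rest = divmod(slice_len, block_size)
    return [block_size] * full + ([rest] if rest else [])


print(chunk_count(268435456))        # the 256 MiB file
print(len(block_sizes(CHUNK_SIZE)))  # blocks in one full chunk
print(block_sizes(1048576))          # the 1 MiB file
```

A 1 MiB slice yields a single block of 1048576 bytes, which is why the object for dir0/file1 ends in _0_1048576 rather than _0_4194304.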

Putting this together, the data objects of myjfs/file1 are listed below.

[root@k8s-master /data/juicefs/myjfs]# juicefs info file1 
file1 :
  inode: 2
  files: 1
   dirs: 0
 length: 256.00 MiB (268435456 Bytes)
   size: 256.00 MiB (268435456 Bytes)
   path: /file1
objects:
+------------+-------------------------------+---------+--------+---------+
| chunkIndex |           objectName          |   size  | offset |  length |
+------------+-------------------------------+---------+--------+---------+
|          0 | myjfs/chunks/0/0/1_0_4194304  | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_1_4194304  | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_2_4194304  | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_3_4194304  | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_4_4194304  | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_5_4194304  | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_6_4194304  | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_7_4194304  | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_8_4194304  | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_9_4194304  | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_10_4194304 | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_11_4194304 | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_12_4194304 | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_13_4194304 | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_14_4194304 | 4194304 |      0 | 4194304 |
|          0 | myjfs/chunks/0/0/1_15_4194304 | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_0_4194304  | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_1_4194304  | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_2_4194304  | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_3_4194304  | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_4_4194304  | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_5_4194304  | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_6_4194304  | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_7_4194304  | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_8_4194304  | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_9_4194304  | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_10_4194304 | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_11_4194304 | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_12_4194304 | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_13_4194304 | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_14_4194304 | 4194304 |      0 | 4194304 |
|          1 | myjfs/chunks/0/0/3_15_4194304 | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_0_4194304  | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_1_4194304  | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_2_4194304  | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_3_4194304  | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_4_4194304  | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_5_4194304  | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_6_4194304  | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_7_4194304  | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_8_4194304  | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_9_4194304  | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_10_4194304 | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_11_4194304 | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_12_4194304 | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_13_4194304 | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_14_4194304 | 4194304 |      0 | 4194304 |
|          2 | myjfs/chunks/0/0/4_15_4194304 | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_0_4194304  | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_1_4194304  | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_2_4194304  | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_3_4194304  | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_4_4194304  | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_5_4194304  | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_6_4194304  | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_7_4194304  | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_8_4194304  | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_9_4194304  | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_10_4194304 | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_11_4194304 | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_12_4194304 | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_13_4194304 | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_14_4194304 | 4194304 |      0 | 4194304 |
|          3 | myjfs/chunks/0/0/5_15_4194304 | 4194304 |      0 | 4194304 |
+------------+-------------------------------+---------+--------+---------+

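Appendix: the juicefs info command

func info(ctx *cli.Context) error {
    ...
    // Get the inode of the file
    inode, err = utils.GetFileInode(d)
    ...
    // Open the internal control file (.control)
    f := openController(d)
    ...
    // Write the request into the control file
    _, err = f.Write(wb.Bytes())
    ...
    // Based on the inode written above, read back the chunk, slice, and block information.
    // This call includes a reader.Read(sizeBuf[n:]) that reads the response.
    err = resp.Decode(f)
    ...
    // Print the result
}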

References

https://juicefs.com/docs/zh/community/installation

https://blog.csdn.net/easonwx/article/details/128635853?spm=1001.2014.3001.5501
