### Managing Ceph Placement Groups (Bash)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

Commands for managing Placement Groups (PGs) in Ceph, which are crucial for data distribution and recovery. `ceph pg stat` shows PG status, `ceph pg repair` attempts to fix inconsistencies, and `ceph pg scrub` verifies data integrity.

```Bash
ceph pg stat
```

```Bash
ceph pg repair
```

```Bash
ceph pg scrub
```

--------------------------------

### Monitoring Ceph Cluster Performance
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

These commands monitor Ceph cluster performance, covering I/O statistics, OSD performance metrics, and identification of slow requests. The PG distribution analysis helps in understanding data placement.

```bash
# View cluster I/O statistics
ceph iostat
watch "ceph iostat"

# View OSD performance statistics
ceph osd perf

# View slow requests
ceph daemon osd.X dump_slow_requests
ceph daemon osd.X dump_historic_slow_ops

# Analyze PG distribution
ceph pg dump | grep ^pg | awk '{print $1,$15}' | sort
```

--------------------------------

### Modifying RGW User Properties - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

This command modifies an existing RGW user's properties, such as the display name, for the user identified by the given UID.

```bash
radosgw-admin user modify --uid= --display-name="New Name"
```

--------------------------------

### Viewing RGW User Information - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

These commands list all RGW users or display detailed information for a specific user identified by their UID.
```bash
radosgw-admin user list
radosgw-admin user info --uid=
```

--------------------------------

### Creating RGW Users - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

These commands create new users in Ceph RGW (RADOS Gateway). Users are identified by a unique UID and can have an optional display name and email address.

```bash
radosgw-admin user create --uid= --display-name=""
radosgw-admin user create --uid=testuser --display-name="Test User" --email=test@example.com
```

--------------------------------

### Generating RGW Swift Keys - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

This command creates a Swift-compatible secret key for a specified RGW user, enabling access via the OpenStack Swift API.

--------------------------------

### Committing Changes to Local Repository (Git)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/CONTRIBUTING.md

This command stages all modified and deleted files (`-a`) and commits them to the local repository with the provided message (`-m`). It creates a snapshot of your changes.

```Shell
git commit -am 'Add some feature'
```

--------------------------------

### Handling Ceph MDS Capability Grant Messages (C)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This C function, `handle_cap_grant`, processes capability GRANT messages received from the Ceph Metadata Server (MDS). It manages client file access permissions, updates inode metadata (size, mode, uid, gid, nlink), handles cache consistency (invalidation, writeback), and processes extended attributes. The function requires `s_mutex` and `i_ceph_lock` to be held by the caller, which it then releases. It incorporates logic for encrypted files, stale capabilities, and capability migration to ensure data integrity and consistency.

```C
/*
 * Handle a cap GRANT message from the MDS.
 * (Note that a GRANT may actually be a revocation if it specifies a
 * smaller cap set.)
 *
 * Main architectural components:
 * 1. Capability management system
 *    - Handles capability grant/revoke messages sent by the MDS
 *    - Maintains the client's file-access permission state
 *
 * 2. Lock management
 *    - i_ceph_lock: protects the inode's Ceph-specific fields
 *    - snap_rwsem: snapshot read/write semaphore
 *    - s_mutex: session mutex
 *
 * 3. Cache coherency
 *    - Handles cache invalidation and writeback
 *    - Manages the consistency state of file data
 *
 * 4. File attribute management
 *    - Updates file size, mode, timestamps and other metadata
 *    - Special handling for encrypted files
 *
 * caller holds s_mutex and i_ceph_lock, we drop both.
 */
static void handle_cap_grant(struct inode *inode,
			     struct ceph_mds_session *session,
			     struct ceph_cap *cap,
			     struct ceph_mds_caps *grant,
			     struct ceph_buffer *xattr_buf,
			     struct cap_extra_info *extra_info)
	__releases(ci->i_ceph_lock)
	__releases(session->s_mdsc->snap_rwsem)
{
	struct ceph_client *cl = ceph_inode_to_client(inode);
	struct ceph_inode_info *ci = ceph_inode(inode);

	/*
	 * Workflow overview:
	 * 1. Parse the capability grant message from the MDS
	 * 2. Handle the capability state change (grant/revoke/no change)
	 * 3. Update file metadata and attributes
	 * 4. Perform cache-coherency operations
	 * 5. Trigger follow-up actions (writeback, invalidation, etc.)
	 */
	int seq = le32_to_cpu(grant->seq);
	int newcaps = le32_to_cpu(grant->caps);
	int used, wanted, dirty;
	u64 size = le64_to_cpu(grant->size);
	u64 max_size = le64_to_cpu(grant->max_size);

	/* status flags controlling follow-up actions */
	unsigned char check_caps = 0;
	bool was_stale = cap->cap_gen < atomic_read(&session->s_cap_gen);
	bool wake = false;		/* wake up waiting threads? */
	bool writeback = false;		/* need to write back data? */
	bool queue_trunc = false;	/* need to truncate the file? */
	bool queue_invalidate = false;	/* need to invalidate the cache? */
	bool deleted_inode = false;	/* has the inode been deleted? */
	bool fill_inline = false;	/* fill in inline data? */
	bool revoke_wait = false;	/* wait for revocation to complete? */
	int flags = 0;

	/*
	 * Encrypted file size handling (Ceph's fscrypt support):
	 * if the file is encrypted and has content, use fscrypt_file_size
	 */
	if (IS_ENCRYPTED(inode) && size)
		size = extra_info->fscrypt_file_size;

	/* debug output - record the capability change */
	doutc(cl, "%p %llx.%llx cap %p mds%d seq %d %s\n", inode,
	      ceph_vinop(inode), cap, session->s_mds, seq,
	      ceph_cap_string(newcaps));
	doutc(cl, " size %llu max_size %llu, i_size %llu\n", size,
	      max_size, i_size_read(inode));

	/*
	 * Cache invalidation - the core of coherency management:
	 * when the CACHE capability is being revoked and there are no
	 * dirty buffers, try to invalidate the page cache
	 */
	if (S_ISREG(inode->i_mode) &&			/* regular files only */
	    ((cap->issued & ~newcaps) & CEPH_CAP_FILE_CACHE) &&
	    (newcaps & CEPH_CAP_FILE_LAZYIO) == 0 &&
	    !(ci->i_wrbuffer_ref || ci->i_wb_ref)) {
		if (try_nonblocking_invalidate(inode)) {
			/* pages are locked; invalidate later in a separate thread */
			if (ci->i_rdcache_revoking != ci->i_rdcache_gen) {
				queue_invalidate = true;
				ci->i_rdcache_revoking = ci->i_rdcache_gen;
			}
		}
	}

	/* stale capability: reset to the bare PIN capability */
	if (was_stale)
		cap->issued = cap->implemented = CEPH_CAP_PIN;

	/*
	 * Capability migration - handle cap transfer between MDSes,
	 * making sure messages are processed in the right order
	 */
	if (ceph_seq_cmp(seq, cap->seq) <= 0) {
		WARN_ON(cap != ci->i_auth_cap);
		WARN_ON(cap->cap_id != le64_to_cpu(grant->cap_id));
		seq = cap->seq;
		newcaps |= cap->issued;
	}

	/* update the capability state */
	cap->cap_gen = atomic_read(&session->s_cap_gen);
	cap->seq = seq;

	/* check side effects of the capability grant */
	__check_cap_issue(ci, cap, newcaps);

	/* bump the inode version - used for cache coherency */
	inode_set_max_iversion_raw(inode, extra_info->change_attr);

	/*
	 * File attribute updates - AUTH_SHARED allows reading basic attributes
	 */
	if ((newcaps & CEPH_CAP_AUTH_SHARED) &&
	    (extra_info->issued & CEPH_CAP_AUTH_EXCL) == 0) {
		umode_t mode = le32_to_cpu(grant->mode);

		/* did the inode type change? */
		if (inode_wrong_type(inode, mode))
			pr_warn_once("inode type changed! (ino %llx.%llx is 0%o, mds says 0%o)\n",
				     ceph_vinop(inode), inode->i_mode, mode);
		else
			inode->i_mode = mode;

		/* update the user id and group id */
		inode->i_uid = make_kuid(&init_user_ns, le32_to_cpu(grant->uid));
		inode->i_gid = make_kgid(&init_user_ns, le32_to_cpu(grant->gid));
		ci->i_btime = extra_info->btime;
		doutc(cl, "%p %llx.%llx mode 0%o uid.gid %d.%d\n", inode,
		      ceph_vinop(inode), inode->i_mode,
		      from_kuid(&init_user_ns, inode->i_uid),
		      from_kgid(&init_user_ns, inode->i_gid));
#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
		/* fscrypt auth check - guard against unauthorized changes */
		if (ci->fscrypt_auth_len != extra_info->fscrypt_auth_len ||
		    memcmp(ci->fscrypt_auth, extra_info->fscrypt_auth,
			   ci->fscrypt_auth_len))
			pr_warn_ratelimited_client(cl,
				"cap grant attempt to change fscrypt_auth on non-I_NEW inode (old len %d new len %d)\n",
				ci->fscrypt_auth_len,
				extra_info->fscrypt_auth_len);
#endif
	}

	/*
	 * Link count update - LINK_SHARED allows reading link information
	 */
	if ((newcaps & CEPH_CAP_LINK_SHARED) &&
	    (extra_info->issued & CEPH_CAP_LINK_EXCL) == 0) {
		set_nlink(inode, le32_to_cpu(grant->nlink));
		if (inode->i_nlink == 0)
			deleted_inode = true;	/* mark as deleted */
	}

	/*
	 * Extended attribute update - process xattr data
	 */
	if ((extra_info->issued & CEPH_CAP_XATTR_EXCL) == 0 &&
	    grant->xattr_len) {
		int len = le32_to_cpu(grant->xattr_len);
		u64 version = le64_to_cpu(grant->xattr_version);

		if (version > ci->i_xattrs.version) {
```

--------------------------------

### Creating RGW Subusers - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

This command creates a subuser associated with a main RGW user, providing a way to manage access credentials for applications or services under a primary user.

```bash
radosgw-admin subuser create --uid= --subuser= --access=full
```

--------------------------------

### Deleting RGW Users - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

This command removes an RGW user and all associated data, identified by their UID.
```bash
radosgw-admin user rm --uid=
```

--------------------------------

### Determining Need for Capability Check in Ceph (C/C++)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This snippet determines whether a capability check is required, typically after an import operation or if the inode's state was stale. It sets `check_caps` to 1 if the inode was stale or the grant operation is an import, and there are wanted capabilities that are neither currently wanted by the MDS nor newly granted. This ensures that the system re-evaluates its capability needs under these conditions.

```C
if ((was_stale || le32_to_cpu(grant->op) == CEPH_CAP_OP_IMPORT) &&
    (wanted & ~(cap->mds_wanted | newcaps))) {
	check_caps = 1;
}
```

--------------------------------

### Troubleshooting Ceph OSD Issues
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

This collection of commands assists in diagnosing and resolving issues with Ceph OSDs. It includes commands for retrieving detailed OSD information, viewing logs, initiating data scrubbing for integrity checks, listing associated PGs, and performing in-depth performance diagnostics.

```bash
# View detailed OSD information
ceph osd find X
ceph osd metadata X

# View OSD logs
journalctl -u ceph-osd@X -f
journalctl -u ceph-osd@X --since "1 hour ago"

# Repair an OSD (scrub)
ceph osd scrub X
ceph osd deep-scrub X

# View an OSD's PGs
ceph pg ls-by-osd X

# OSD performance diagnostics
ceph daemon osd.X perf dump
ceph daemon osd.X dump_ops_in_flight
ceph daemon osd.X dump_blocked_ops
```

--------------------------------

### Incrementing Write Buffer Reference Count for Dirty Folio (C)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This function is called when a folio (page) becomes dirty. It increments `i_wrbuffer_ref` by one for each dirty folio. If it's the first dirty page, it acquires an additional inode reference (`ihold`).
It also updates snapshot-specific dirty page counts, either for the head snapshot (`i_wrbuffer_ref_head`) or a pending `ceph_cap_snap`.

```C
static bool ceph_dirty_folio(struct address_space *mapping, struct folio *folio)
{
	struct inode *inode = mapping->host;
	struct ceph_inode_info *ci = ceph_inode(inode);

	spin_lock(&ci->i_ceph_lock);
	// key: each folio adds one reference
	if (ci->i_wrbuffer_ref == 0)
		ihold(inode);		// take an inode reference on the first dirty page
	++ci->i_wrbuffer_ref;		// +1 for every dirty page

	// account the dirty page against the right snapshot context
	if (__ceph_have_pending_cap_snap(ci)) {
		// a cap snapshot is pending: bump its dirty-page count
		struct ceph_cap_snap *capsnap =
			list_last_entry(&ci->i_cap_snaps,
					struct ceph_cap_snap, ci_item);
		capsnap->dirty_pages++;		// snapshot dirty-page count +1
	} else {
		// otherwise bump the head snapshot's dirty-page count
		++ci->i_wrbuffer_ref_head;	// head dirty-page count +1
	}
	spin_unlock(&ci->i_ceph_lock);

	/* (the excerpt elides the rest of the function: upstream attaches
	 * the snap context to the folio and returns the dirty status) */
}
```

--------------------------------

### Creating RGW S3 Access Keys - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

This command generates new S3 access keys for a specified RGW user, enabling programmatic access to RGW buckets and objects.

```bash
radosgw-admin key create --uid= --key-type=s3
```

--------------------------------

### Ceph Address Space Operations Definition (C)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This snippet defines `ceph_aops`, a static constant structure of type `address_space_operations`. It registers Ceph's custom implementations for file system operations like `writepages`, `write_begin`, and `write_end`, which are crucial for integrating with the kernel's page cache and writeback mechanisms.

```C
// in Ceph's address_space_operations
static const struct address_space_operations ceph_aops = {
	.writepages = ceph_writepages,
	.write_begin = ceph_write_begin,
	.write_end = ceph_write_end,
	// ...
};
```

--------------------------------

### Managing Ceph Managers (Bash)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

Commands for managing Ceph Managers, which provide additional services like the dashboard and metrics. `ceph mgr module enable/disable` controls manager modules, and `ceph mgr stat` shows manager status, useful for feature management and monitoring.

```Bash
ceph mgr module enable
```

```Bash
ceph mgr module disable
```

```Bash
ceph mgr stat
```

--------------------------------

### Pushing Local Branch to Remote Repository (Git)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/CONTRIBUTING.md

This command uploads the local `feature/your-feature-name` branch and its commits to the `origin` remote repository. This makes your changes available for a pull request.

```Shell
git push origin feature/your-feature-name
```

--------------------------------

### Analyzing Inode Capability Status in Ceph (C/C++)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This code block analyzes and logs the current state of an inode's capabilities. It retrieves the 'wanted', 'used', and 'dirty' capability flags using the helper functions `__ceph_caps_wanted`, `__ceph_caps_used`, and `__ceph_caps_dirty`. The `doutc` macro then prints these states in a human-readable format, aiding in debugging and understanding the inode's capability lifecycle.

```C
wanted = __ceph_caps_wanted(ci);
used = __ceph_caps_used(ci);
dirty = __ceph_caps_dirty(ci);
doutc(cl, " my wanted = %s, used = %s, dirty %s\n",
      ceph_cap_string(wanted), ceph_cap_string(used),
      ceph_cap_string(dirty));
```

--------------------------------

### Managing Ceph CRUSH Map (Bash)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

Commands for managing the CRUSH map, which dictates data placement.
`ceph osd crush tree` visualizes the hierarchy, `crushtool` is a utility for map manipulation, and `ceph osd crush rule` manages data placement rules, essential for data distribution and fault domain awareness.

```Bash
ceph osd crush tree
```

```Bash
crushtool
```

```Bash
ceph osd crush rule
```

--------------------------------

### Managing File Layout and Size in Ceph (C/C++)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This section handles core file metadata management, specifically file layout and size, when any read or write file capabilities (`CEPH_CAP_ANY_FILE_RD | CEPH_CAP_ANY_FILE_WR`) are granted. It updates the file's layout, including the storage pool and namespace. If the pool or namespace changes, it clears the pool permission cache. Finally, it processes file size updates and truncation, potentially queuing a truncation operation.

```C
if (newcaps & (CEPH_CAP_ANY_FILE_RD | CEPH_CAP_ANY_FILE_WR)) {
	/* the file layout may have changed */
	s64 old_pool = ci->i_layout.pool_id;
	struct ceph_string *old_ns;

	ceph_file_layout_from_legacy(&ci->i_layout, &grant->layout);
	old_ns = rcu_dereference_protected(ci->i_layout.pool_ns,
					   lockdep_is_held(&ci->i_ceph_lock));
	rcu_assign_pointer(ci->i_layout.pool_ns, extra_info->pool_ns);

	/* if the storage pool changed, clear the pool-permission cache */
	if (ci->i_layout.pool_id != old_pool ||
	    extra_info->pool_ns != old_ns)
		ci->i_ceph_flags &= ~CEPH_I_POOL_PERM;
	extra_info->pool_ns = old_ns;

	/* handle file size and truncation */
	queue_trunc = ceph_fill_file_size(inode, extra_info->issued,
					  le32_to_cpu(grant->truncate_seq),
					  le64_to_cpu(grant->truncate_size),
					  size);
}
```

--------------------------------

### Managing Ceph RGW (Bash)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

Commands for managing Ceph RADOS Gateway (RGW), which provides S3/Swift object storage. `radosgw-admin user create` manages RGW users, and `radosgw-admin bucket` manages buckets, essential for object storage administration.
```Bash
radosgw-admin user create
```

```Bash
radosgw-admin bucket
```

--------------------------------

### Managing CephFS Directory Quotas - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

These commands use `setfattr` to set file and byte quotas on a CephFS directory, limiting the number of files and total bytes it can contain. `getfattr` retrieves existing quota settings.

```bash
setfattr -n ceph.quota.max_files -v 10000 /mnt/cephfs/dir
setfattr -n ceph.quota.max_bytes -v 1000000000 /mnt/cephfs/dir
getfattr -n ceph.quota.max_files /mnt/cephfs/dir
```

--------------------------------

### Managing Write Buffer and Snapshot References on Completion (C)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This function is crucial for managing write buffer and snapshot-related reference counts after write completion. It decrements `i_wrbuffer_ref` by the number of pages (`nr`). It also handles snapshot-specific dirty page counts (`dirty_pages` for `ceph_cap_snap` or `i_wrbuffer_ref_head`). When `i_wrbuffer_ref` reaches zero, `ceph_check_caps` is called to notify the MDS, and the inode's reference is released. If `i_wb_ref` becomes zero, waiting threads are woken up.
```C
void ceph_put_wrbuffer_cap_refs(struct ceph_inode_info *ci, int nr,
				struct ceph_snap_context *snapc)
{
	struct inode *inode = &ci->netfs.inode;
	bool last = false;
	bool wake_ci = false;

	spin_lock(&ci->i_ceph_lock);

	// drop the overall write-buffer reference count
	ci->i_wrbuffer_ref -= nr;
	if (ci->i_wrbuffer_ref == 0) {
		last = true;	// mark this as the last reference
	}

	// handle snapshot-related reference counts
	if (ci->i_head_snapc == snapc) {
		ci->i_wrbuffer_ref_head -= nr;
	} else {
		// handle the capsnap's reference count
		struct ceph_cap_snap *capsnap;
		list_for_each_entry(capsnap, &ci->i_cap_snaps, ci_item) {
			if (capsnap->context == snapc) {
				capsnap->dirty_pages -= nr;
				break;
			}
		}
	}

	// drop the overall writeback reference count
	if (ci->i_wb_ref && (--ci->i_wb_ref == 0)) {
		wake_ci = true;
	}

	spin_unlock(&ci->i_ceph_lock);

	// key: once all write references are released, trigger a capability check
	if (last) {
		ceph_check_caps(ci, 0);	// tell the MDS the caps can be released
	}

	if (wake_ci)
		wake_up_all(&ci->i_cap_wq);

	// if this was the last reference, drop the inode reference
	if (last)
		iput(inode);
}
```

--------------------------------

### Performing Ceph Backup and Recovery (Bash)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

Commands and methods for Ceph backup and recovery. `ceph mon getmap` retrieves the monitor map, `ceph auth export` exports authentication keys, and general data export strategies are used for disaster recovery and migration purposes.

```Bash
ceph mon getmap
```

```Bash
ceph auth export
```

--------------------------------

### Enabling Multiple CephFS File Systems - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

This command sets a global flag to enable the creation and management of multiple independent CephFS file systems within the same Ceph cluster.
```bash
ceph fs flag set enable_multiple true
```

--------------------------------

### Deleting RGW S3 Access Keys - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

This command removes a specific S3 access key associated with an RGW user, identified by the user's UID and the access key itself.

```bash
radosgw-admin key rm --uid= --key-type=s3 --access-key=
```

--------------------------------

### Processing Inode Work in Ceph (C)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This static function serves as the handler for inode work items. It retrieves the `ceph_inode_info` from the `work_struct` and checks for specific work bits. If `CEPH_I_WORK_WRITEBACK` is set, it initiates a writeback operation using `filemap_fdatawrite`. After processing, it decrements the inode's reference count.

```C
static void ceph_inode_work(struct work_struct *work)
{
	struct ceph_inode_info *ci = container_of(work, struct ceph_inode_info,
						  i_work);
	struct inode *inode = &ci->netfs.inode;
	struct ceph_client *cl = ceph_inode_to_client(inode); /* for doutc();
							 elided in the original excerpt */

	if (test_and_clear_bit(CEPH_I_WORK_WRITEBACK, &ci->i_work_mask)) {
		doutc(cl, "writeback %p %llx.%llx\n", inode, ceph_vinop(inode));
		filemap_fdatawrite(&inode->i_data);	// this is the key step!
	}

	// handle other work types ...

	iput(inode);	// drop the reference count
}
```

--------------------------------

### Updating Directory Statistics in Ceph (C/C++)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This snippet updates the file and subdirectory counts for a directory inode. It proceeds only if the `CEPH_CAP_FILE_SHARED` capability is active and the directory statistics in `extra_info` are valid. This keeps directory metadata, such as the number of contained files and subdirectories, up to date.
```C
if ((newcaps & CEPH_CAP_FILE_SHARED) && extra_info->dirstat_valid) {
	ci->i_files = extra_info->nfiles;
	ci->i_subdirs = extra_info->nsubdirs;
}
```

--------------------------------

### Creating a New Feature Branch (Git)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/CONTRIBUTING.md

This command creates a new branch named `feature/your-feature-name` and immediately switches to it. It's the first step in isolating your changes for a new feature or bug fix.

```Shell
git checkout -b feature/your-feature-name
```

--------------------------------

### Analyzing Ceph Performance (Bash)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

Commands for analyzing Ceph cluster performance. `ceph osd perf` shows OSD performance metrics, `rbd perf image iostat` provides RBD image I/O statistics, and `cephfs-top` monitors CephFS activity, aiding in performance testing and bottleneck identification.

```Bash
ceph osd perf
```

```Bash
rbd perf image iostat
```

```Bash
cephfs-top
```

--------------------------------

### Configuring Ceph Dashboard Manager Module
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

This snippet demonstrates how to enable and configure the Ceph Dashboard module. It includes steps for creating a self-signed certificate, setting up user authentication, assigning roles, and configuring the dashboard's network address and port.
```bash
# Dashboard module
ceph mgr module enable dashboard
ceph dashboard create-self-signed-cert
ceph dashboard ac-user-create -i
ceph dashboard ac-role-add-scope-perms admin read-write pool
ceph config set mgr mgr/dashboard/server_addr 0.0.0.0
ceph config set mgr mgr/dashboard/server_port 8443
```

--------------------------------

### Core Capability State Transition Logic in Ceph (C/C++)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This is the central logic for handling changes in inode capabilities. It distinguishes between three scenarios: capability revocation, no change, and capability grant. For revocations, it determines whether writeback or invalidation is needed and sets `check_caps` accordingly. For grants, it updates the issued and implemented capabilities and may trigger a wake-up. It also includes a debug assertion to ensure consistency.

```C
if (cap->issued & ~newcaps) {
	/* capability revocation */
	int revoking = cap->issued & ~newcaps;

	doutc(cl, "revocation: %s -> %s (revoking %s)\n",
	      ceph_cap_string(cap->issued), ceph_cap_string(newcaps),
	      ceph_cap_string(revoking));

	if (S_ISREG(inode->i_mode) &&
	    (revoking & used & CEPH_CAP_FILE_BUFFER)) {
		writeback = true;	/* start writeback, ack later */
		revoke_wait = true;
	} else if (queue_invalidate &&
		   revoking == CEPH_CAP_FILE_CACHE &&
		   (newcaps & CEPH_CAP_FILE_LAZYIO) == 0) {
		revoke_wait = true;	/* wait for invalidation to finish */
	} else if (cap == ci->i_auth_cap) {
		check_caps = 1;		/* check the auth cap only */
	} else {
		check_caps = 2;		/* check all caps */
	}

	/* if any new caps were added, try to wake up waiters */
	if (~cap->issued & newcaps)
		wake = true;

	cap->issued = newcaps;
	cap->implemented |= newcaps;
} else if (cap->issued == newcaps) {
	/* no capability change */
	doutc(cl, "caps unchanged: %s -> %s\n",
	      ceph_cap_string(cap->issued), ceph_cap_string(newcaps));
} else {
	/* capability grant */
	doutc(cl, "grant: %s -> %s\n", ceph_cap_string(cap->issued),
	      ceph_cap_string(newcaps));

	/* check whether another MDS is revoking the newly granted caps */
	if (cap == ci->i_auth_cap &&
	    __ceph_caps_revoking_other(ci, cap, newcaps))
		check_caps = 2;

	cap->issued = newcaps;
	cap->implemented |= newcaps;	/* add bits only, to avoid clobbering
					 * caps still being revoked */
	wake = true;
}
BUG_ON(cap->issued & ~cap->implemented);
```

--------------------------------

### Managing Ceph Authentication (Bash)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

These commands are used for managing authentication keys and permissions within the Ceph cluster. They allow listing, creating, and deleting authentication entries, which is vital for securing the cluster and controlling access.

```Bash
ceph auth list
```

```Bash
ceph auth create
```

```Bash
ceph auth del
```

--------------------------------

### Special Handling for Capability Revocation Messages in Ceph (C/C++)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This snippet provides special handling for explicit capability revocation messages from the MDS. If a revocation operation is received and no `revoke_wait` is pending, it clears the MDS's wanted capabilities, forces a flush of capabilities, and sets `check_caps` to trigger a re-evaluation of the inode's capabilities. This ensures a prompt response to MDS-initiated revocations. (The source truncates this block mid-identifier; the flag name and closing statements below are a reconstruction based on the description above.)

```C
if (!revoke_wait && le32_to_cpu(grant->op) == CEPH_CAP_OP_REVOKE) {
	cap->mds_wanted = 0;
	flags |= CHECK_CAPS_FLUSH;	/* force a capability flush (reconstructed) */
	check_caps = 1;			/* re-evaluate this inode's caps (reconstructed) */
}
```

--------------------------------

### Monitoring Ceph Cluster Status (Overall)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

These commands provide a comprehensive overview of the Ceph cluster's current state, including its health, storage usage, and version information. They are essential for daily monitoring and initial fault diagnosis.
```bash
# View cluster status (most common)
ceph status
ceph -s

# View cluster health
ceph health
ceph health detail

# Watch cluster status in real time
ceph -w

# View cluster storage usage
ceph df
ceph df detail

# View cluster version info
ceph version
ceph versions
```

--------------------------------

### Mounting CephFS using Kernel Client - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

These commands demonstrate how to mount a CephFS file system using the kernel client. Authentication can be done either by providing the secret key directly or by specifying a secret file.

```bash
mount -t ceph :/ /mnt/cephfs -o name=admin,secret=
mount -t ceph :/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
```

--------------------------------

### Monitoring Ceph I/O Performance (Bash)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

These commands help in monitoring the I/O performance of the Ceph cluster. `ceph iostat` provides detailed I/O statistics (version dependent), while `ceph -w` and `ceph status` offer real-time activity and overall status relevant to I/O.

```Bash
ceph iostat
```

```Bash
ceph -w
```

```Bash
ceph status
```

--------------------------------

### Updating File Timestamps on Read Access in Ceph (C/C++)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This block updates the modification, access, and change timestamps (mtime, atime, ctime) of a file's inode if the `CEPH_CAP_ANY_RD` (any read) capability is granted. It decodes the timestamps from the grant message and applies them to the inode via `ceph_fill_file_time`, taking the time-warp sequence into account for consistency.
```C
if (newcaps & CEPH_CAP_ANY_RD) {
	struct timespec64 mtime, atime, ctime;

	ceph_decode_timespec64(&mtime, &grant->mtime);
	ceph_decode_timespec64(&atime, &grant->atime);
	ceph_decode_timespec64(&ctime, &grant->ctime);
	ceph_fill_file_time(inode, extra_info->issued,
			    le32_to_cpu(grant->time_warp_seq),
			    &ctime, &mtime, &atime);
}
```

--------------------------------

### Managing Ceph RBD Images (Bash)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

Commands for managing Ceph's RADOS Block Device (RBD) images. They allow creating and removing block devices, creating snapshots for data recovery, and mapping/unmapping RBD images to hosts for use as block storage.

```Bash
rbd create
```

```Bash
rbd rm
```

```Bash
rbd snap create
```

```Bash
rbd map
```

```Bash
rbd unmap
```

--------------------------------

### Using Ceph Specialized Tools (Bash)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

These are specialized tools for advanced Ceph operations. `ceph-objectstore-tool` and `ceph-bluestore-tool` are used for low-level data recovery and deep diagnostics, typically in severe fault scenarios.

```Bash
ceph-objectstore-tool
```

```Bash
ceph-bluestore-tool
```

--------------------------------

### Mounting CephFS using ceph-fuse - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

These commands show how to mount CephFS using the FUSE-based client, ceph-fuse. The `allow_other` option permits non-root users to access the mounted file system.

```bash
ceph-fuse /mnt/cephfs
ceph-fuse /mnt/cephfs -o allow_other
```

--------------------------------

### Managing Ceph Configuration (Bash)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

These commands are used for managing Ceph cluster configuration parameters.
`ceph config set/get` allows dynamic modification and retrieval of settings, while `ceph tell` sends commands to specific daemons, useful for parameter tuning and troubleshooting.

```Bash
ceph config set
```

```Bash
ceph config get
```

```Bash
ceph tell
```

--------------------------------

### Viewing CephFS File System Status - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

These commands are used to list existing CephFS file systems, check the detailed status of a specific file system, and retrieve its configuration.

```bash
ceph fs ls
ceph fs status
ceph fs get
```

--------------------------------

### Managing Ceph Storage Pools (Bash)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

These commands are used for creating, deleting, and configuring Ceph storage pools. They are fundamental for storage planning, setting replication or erasure coding, and managing quotas within the cluster.

```Bash
ceph osd pool create
```

```Bash
ceph osd pool delete
```

```Bash
ceph osd pool set
```

--------------------------------

### Viewing CephFS Client Connections - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

These commands are used to inspect active client connections and sessions on a specific Ceph Metadata Server (MDS) daemon, identified by its ID.

```bash
ceph daemon mds. client ls
ceph daemon mds. session ls
```

--------------------------------

### Performing Basic Ceph Manager Operations
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

These commands provide essential operations for Ceph Managers, including checking their status, enabling or disabling specific Manager modules (like the Dashboard), listing available modules, and forcing a failover to another Manager instance.
```bash
# View Manager status
ceph mgr stat
ceph mgr dump

# Enable/disable Manager modules
ceph mgr module enable
ceph mgr module disable

# List available modules
ceph mgr module ls

# Fail over the Manager
ceph mgr fail
```

--------------------------------

### Managing Ceph MDS Daemons - Bash
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

These commands provide tools for managing Metadata Server (MDS) daemons. They allow checking MDS status, dumping MDS information, failing a specific MDS, and marking an MDS as repaired.

```bash
ceph mds stat
ceph mds dump
ceph mds fail
ceph mds repaired
```

--------------------------------

### Performing Ceph Monitor Maintenance
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

This set of commands facilitates maintenance tasks for Ceph Monitors, such as starting/stopping the service, compacting the Monitor database for performance, checking time synchronization across Monitors, and recovering a Monitor from a failed state using its monmap.

```bash
# Start/stop a Monitor
systemctl start ceph-mon@
systemctl stop ceph-mon@

# Compact the Monitor database
ceph tell mon. compact

# Check Monitor time synchronization
ceph time-sync-status

# Monitor failure recovery
ceph-mon --extract-monmap /tmp/monmap --mon-data /var/lib/ceph/mon/ceph-
```

--------------------------------

### Updating Inode Extended Attributes (Xattrs) in Ceph (C/C++)
Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This snippet updates the extended attributes (xattrs) associated with an inode in Ceph. It first logs the new xattr version, then releases any existing xattr blob before acquiring the new one from `xattr_buf`. It also updates the xattr version and invalidates cached ACLs and security contexts to ensure consistency.
```C
doutc(cl, " got new xattrs v%llu on %p %llx.%llx len %d\n",
      version, inode, ceph_vinop(inode), len);
if (ci->i_xattrs.blob)
	ceph_buffer_put(ci->i_xattrs.blob);
ci->i_xattrs.blob = ceph_buffer_get(xattr_buf);
ci->i_xattrs.version = version;
ceph_forget_all_cached_acls(inode);
ceph_security_invalidate_secctx(inode);
```

--------------------------------

### Handling Maximum File Size for Authorized MDS in Ceph (C/C++)

Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This snippet manages the maximum file size for an inode, applicable only when the current capability (`cap`) is the authoritative one (`ci->i_auth_cap`) and write capabilities (`CEPH_CAP_ANY_FILE_WR`) are granted. If the `max_size` received differs from the current `i_max_size`, it updates the inode's maximum size. If the new `max_size` meets or exceeds `i_wanted_max_size`, it resets the wanted and requested max sizes and signals a wake-up.

```C
if (ci->i_auth_cap == cap && (newcaps & CEPH_CAP_ANY_FILE_WR)) {
	if (max_size != ci->i_max_size) {
		doutc(cl, "max_size %lld -> %llu\n",
		      ci->i_max_size, max_size);
		ci->i_max_size = max_size;
		if (max_size >= ci->i_wanted_max_size) {
			ci->i_wanted_max_size = 0;  /* reset */
			ci->i_requested_max_size = 0;
		}
		wake = true;
	}
}
```

--------------------------------

### Mounting CephFS Subdirectory - Bash

Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

This command mounts a specific subdirectory within a CephFS file system, rather than the entire root, using the kernel client.

```bash
mount -t ceph <mon_host>:/subdir /mnt/subdir -o name=admin
```

--------------------------------

### Queueing Inode Work in Ceph (C)

Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/draft/Kcephfs-Caps.md

This function queues a specific work item for an inode into Ceph's inode work queue.
It sets a work bit on the inode's mask, increments the inode's reference count, and attempts to queue the work. If queueing fails (meaning the work is already in the queue), the reference count is immediately dropped to prevent a leak.

```C
void ceph_queue_inode_work(struct inode *inode, int work_bit)
{
	struct ceph_fs_client *fsc = ceph_inode_to_fs_client(inode);
	struct ceph_inode_info *ci = ceph_inode(inode);

	set_bit(work_bit, &ci->i_work_mask);	/* set the work bit */
	ihold(inode);				/* take a reference */
	if (queue_work(fsc->inode_wq, &ci->i_work)) {
		/* successfully queued */
	} else {
		/* already queued; drop the reference */
		iput(inode);
	}
}
```

--------------------------------

### Monitoring Ceph Cluster Health (Bash)

Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

These commands cover routine monitoring of the Ceph cluster's overall status, health, disk space usage, and real-time events. They are essential for daily operations and initial fault diagnosis.

```Bash
ceph -s
```

```Bash
ceph health
```

```Bash
ceph df
```

```Bash
ceph -w
```

--------------------------------

### Performing Ceph OSD Maintenance Operations

Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

These commands facilitate advanced OSD maintenance, including the complete and safe removal of an OSD, replacing a failed OSD, and reweighting OSDs to optimize data distribution and balance storage utilization across the cluster.
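The effect of `ceph osd reweight-by-utilization` can be illustrated with a simplified sketch (not Ceph's exact algorithm): OSDs whose utilization exceeds the cluster average by a chosen oversubscription ratio get their reweight scaled down toward that average.

```python
def reweight_by_utilization(utilizations, threshold=1.2):
    """Simplified illustration, not Ceph's real implementation.

    utilizations: {osd_id: fraction_of_capacity_used}
    threshold: oversubscription ratio relative to the cluster average
    """
    avg = sum(utilizations.values()) / len(utilizations)
    new_weights = {}
    for osd, util in utilizations.items():
        if util > avg * threshold:
            # Shrink the reweight in proportion to the overshoot.
            new_weights[osd] = round(avg / util, 4)
        else:
            new_weights[osd] = 1.0
    return new_weights

# OSD 2 is well above the 0.65 average, so only it gets scaled down.
print(reweight_by_utilization({0: 0.50, 1: 0.55, 2: 0.90}))
```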
```bash
# Safely remove an OSD (full procedure)
ceph osd out X                              # mark out; data migration begins
ceph osd safe-to-destroy X                  # check whether removal is safe
ceph osd destroy X --yes-i-really-mean-it   # destroy the OSD
ceph osd crush remove osd.X                 # remove from the CRUSH map
ceph auth del osd.X                         # delete its auth credentials
ceph osd rm X                               # remove from the cluster

# Replace a failed OSD
ceph osd destroy X --yes-i-really-mean-it

# Reweight OSDs
ceph osd reweight X 0.8            # temporary weight adjustment
ceph osd crush reweight osd.X 2.0  # permanent weight adjustment
ceph osd reweight-by-utilization   # adjust weights automatically by utilization
```

--------------------------------

### Performing Basic Ceph OSD Operations

Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

These commands cover fundamental OSD operations: checking status and usage, starting and stopping OSD services, marking OSDs `out` or `in` to trigger data rebalancing, and marking an OSD `down` (a running OSD daemon reports itself back up).

```bash
# View OSD status
ceph osd stat
ceph osd dump
ceph osd tree

# View OSD usage
ceph osd df
ceph osd df tree

# Start/stop an OSD
systemctl start ceph-osd@X
systemctl stop ceph-osd@X
systemctl restart ceph-osd@X

# Mark an OSD out/in
ceph osd out X
ceph osd in X

# Mark an OSD down (it reports itself back up if the daemon is still running)
ceph osd down X
```

--------------------------------

### Troubleshooting Ceph Issues (Bash)

Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

Commands and methods for troubleshooting Ceph cluster issues. `journalctl` inspects system logs, `ceph daemon dump` provides daemon-specific diagnostic information, and general log analysis supports problem diagnosis and root-cause identification.
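Much of this log analysis reduces to filtering and counting. A minimal sketch that tallies slow-request messages per OSD, using hypothetical sample log lines (real message formats vary between Ceph releases):

```python
import re
from collections import Counter

# Hypothetical sample log lines; real formats differ by release.
log = """\
2024-01-10 10:00:01 osd.3 ... slow request osd_op(...)
2024-01-10 10:00:02 osd.7 ... slow request osd_op(...)
2024-01-10 10:00:03 osd.3 ... slow request osd_op(...)
2024-01-10 10:00:04 osd.1 ... heartbeat_check: ok
"""

slow_by_osd = Counter()
for line in log.splitlines():
    if "slow request" in line:
        m = re.search(r"(osd\.\d+)", line)
        if m:
            slow_by_osd[m.group(1)] += 1

print(slow_by_osd.most_common())
```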
```Bash
journalctl
```

```Bash
ceph daemon dump
```

--------------------------------

### Performing Basic Ceph Monitor Operations

Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/zh-cn/实用指南:Ceph常用工具汇总.md

These commands handle fundamental Monitor management: checking status, inspecting quorum state, adding and removing Monitors, and examining the Monitor map for cluster configuration.

```bash
# View Monitor status
ceph mon stat
ceph mon dump

# View Monitor quorum status
ceph quorum_status

# Add a Monitor
ceph mon add <name> <ip[:port]>

# Remove a Monitor
ceph mon remove <name>

# Inspect the Monitor map
ceph mon getmap -o /tmp/monmap
monmaptool --print /tmp/monmap
```

--------------------------------

### Managing CephFS (Bash)

Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Operation-Skills/README.md

Commands for managing the Ceph File System (CephFS). `ceph fs status` shows file system health, `ceph mds stat` checks Metadata Server (MDS) status, `ceph fs dump` provides file system details, and `ceph mds fail` can be used for MDS recovery, which is crucial for file system operations.

```Bash
ceph fs status
```

```Bash
ceph mds stat
```

```Bash
ceph fs dump
```

```Bash
ceph mds fail
```

--------------------------------

### Using AWS S3 Connector with PyTorch

Source: https://github.com/wuhongsong/ceph-deep-dive/blob/main/Distributed-Storage/en/From.deepseek-3FS.to.AI.Storage.md

This snippet builds a training dataset and data loader with the `s3torchconnector` library for PyTorch. It connects to an S3 bucket, specifies the region, and applies transformations to the data, enabling efficient data loading from AWS S3 for AI training workloads.
```Python
import os

from s3torchconnector import S3MapDataset
from torch.utils.data import DataLoader

# Build the training dataset on top of the AWS S3 Connector
uri = 's3://mnist/train'
aws_region = os.environ['AWS_REGION']
# MNISTTransform is defined elsewhere in the original example
train_dataset = S3MapDataset.from_prefix(uri, region=aws_region,
                                         transform=MNISTTransform(transform))

# Training data loading
train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)
```
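`S3MapDataset` works with `DataLoader` because it implements PyTorch's map-style dataset protocol: `__len__` plus `__getitem__`. A torch-free sketch of that protocol, using hypothetical in-memory samples in place of S3 objects:

```python
class InMemoryMapDataset:
    """Minimal map-style dataset: the same __len__/__getitem__
    protocol that S3MapDataset exposes to DataLoader."""

    def __init__(self, samples, transform=None):
        self._samples = samples        # hypothetical stand-in for S3 objects
        self._transform = transform

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, idx):
        item = self._samples[idx]
        return self._transform(item) if self._transform else item

ds = InMemoryMapDataset([b"obj-0", b"obj-1"], transform=lambda b: b.decode())
print(len(ds), ds[1])  # -> 2 obj-1
```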