RHEL AI: the ilab model train command fails with an "Unable to find" error for the skills_train_msgs file

Solution Verified

Issue

  • RHEL AI: the ilab model train command fails with an "Unable to find" error when --data-path points to a skills_train_msgs file:
$ ilab -v -v model train --pipeline simple --data-path /var/home/instruct/.local/share/instructlab/skills_train_msgs_2025-02-12T09_50_37.jsonl --device cuda 
INFO 2025-02-12 10:10:00,989 numexpr.utils:148: Note: NumExpr detected 48 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2025-02-12 10:10:00,989 numexpr.utils:161: NumExpr defaulting to 16 threads.
INFO 2025-02-12 10:10:02,098 datasets:59: PyTorch version 2.4.1 available.
LINUX_TRAIN.PY: NUM EPOCHS IS:  8
LINUX_TRAIN.PY: TRAIN FILE IS:  /var/home/instruct/.local/share/instructlab/skills_train_msgs_2025-02-12T09_50_37.jsonl
LINUX_TRAIN.PY: TEST FILE IS:  /var/home/instruct/.local/share/instructlab/skills_train_msgs_2025-02-12T09_50_37.jsonl
LINUX_TRAIN.PY: Using device 'cuda:0'
  NVidia CUDA version: 12.4
  AMD ROCm HIP version: n/a
  cuda:0 is 'NVIDIA L40S' (44.1 GiB of 44.5 GiB free, capability: 8.9)
  cuda:1 is 'NVIDIA L40S' (44.1 GiB of 44.5 GiB free, capability: 8.9)
  cuda:2 is 'NVIDIA L40S' (44.1 GiB of 44.5 GiB free, capability: 8.9)
  cuda:3 is 'NVIDIA L40S' (44.1 GiB of 44.5 GiB free, capability: 8.9)
LINUX_TRAIN.PY: LOADING DATASETS
Unable to find '/var/home/instruct/.local/share/instructlab/skills_train_msgs_2025-02-12T09_50_37.jsonl/train_gen.jsonl
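
The last log line shows the trainer looking for train_gen.jsonl underneath the value passed to --data-path, which in this case is itself a .jsonl file. The sketch below only illustrates that path resolution. It assumes, based solely on the error message above, that the file name train_gen.jsonl is appended to the --data-path value. The helper name diagnose_data_path is hypothetical and is not part of ilab.

```python
from pathlib import Path


def diagnose_data_path(data_path: str, expected_name: str = "train_gen.jsonl") -> str:
    """Hypothetical helper, not part of ilab.

    Describes how a --data-path value resolves if the trainer appends
    `expected_name` to it (an assumption inferred from the error message).
    """
    path = Path(data_path)
    candidate = path / expected_name  # the path the log reports as missing

    if path.is_file():
        # A regular file cannot contain another file, so the lookup must fail.
        return f"{path} is a file, so {candidate} cannot exist"
    if candidate.is_file():
        return f"found {candidate}"
    return f"{candidate} not found"


if __name__ == "__main__":
    # Path taken verbatim from the failing command in this article.
    print(diagnose_data_path(
        "/var/home/instruct/.local/share/instructlab/"
        "skills_train_msgs_2025-02-12T09_50_37.jsonl"
    ))
```

Running the check against the path used in the failing command makes it easy to see whether the value is a file or a directory before starting a training run.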

Environment

  • Red Hat Enterprise Linux AI 1.3
  • Red Hat Enterprise Linux AI 1.4
  • NVIDIA GPUs
