VLC media player 3.0.20 Vetinari (revision 3.0.20-0-g6f0d0ab126b)
[0000566ab552f560] main libvlc: Running vlc with the default interface. Use 'cvlc' to use vlc without interface.
[0000566ab55d4030] main playlist: playlist is empty
libva info: VA-API version 1.20.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_16
libva info: va_openDriver() returns 0
[1]    27636 segmentation fault (core dumped)  vlc
/dev/sdb6: recovering journal
/dev/sdb6 contains a file system with errors, check forced.
Inode 7350190, i_blocks is 212832, should be 212824. FIXED.
Unattached inode 7354055
/dev/sdb6: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
        (i.e., without -a or -p options)
fsck exited with status code 4
The root filesystem on /dev/sdb6 requires a manual fsck

BusyBox v1.30.1 (Ubuntu 1:1.30.1-7ubuntu3.1) built-in shell (ash)
Enter 'help' for a list of built-in commands.
/root/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/optim/optimizer/zero_optimizer.py:11: DeprecationWarning: `TorchScript` support for functional optimizers is deprecated and will be removed in a future PyTorch release. Consider using the `torch.compile` optimizer instead.
  from torch.distributed.optim import \
Traceback (most recent call last):
  File "tools/train.py", line 9, in <module>
    from mmaction.registry import RUNNERS
  File "/root/autodl-tmp/mmaction2/mmaction/__init__.py", line 16, in <module>
    assert (digit_version(mmcv_minimum_version) <= mmcv_version
AssertionError: MMCV==2.2.0 is used but incompatible. Please install mmcv>=2.0.0rc4, <2.2.0.
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  XPU   XI   CI        PID   Type   Process name                            XPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2118      C                                             21528MiB |
|    0   N/A  N/A      3161      C   /usr/bin/python                           25726MiB |
+---------------------------------------------------------------------------------------+
COMMAND  PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
python  2801  bml  mem    CHR 195,255           943 /dev/xpuctrl
python  2801  bml  mem    CHR   195,6          1724 /dev/xpu6
python  2801  bml   3u    CHR 195,255      0t0  943 /dev/xpuctrl
python  2801  bml   4u    CHR   195,6      0t0 1724 /dev/xpu6
......
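The COMMAND column shows the program name. Killing every python process listed there with kill usually releases the XPU memory.
If you have confirmed that nobody else is using the XPU on this machine, you can also run
lsof -t /dev/xpu* | xargs -r kill -9
to terminate every application that is holding an XPU device.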
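For a more selective cleanup than killing everything at once, the lsof listing can be filtered by its COMMAND column first. The sketch below is only an illustration: the helper name `pids_for_command` is hypothetical, and the sample text is the lsof output shown above, not a fresh capture.

```python
from typing import List


def pids_for_command(lsof_output: str, command: str = "python") -> List[int]:
    """Return the unique PIDs whose COMMAND column equals `command`."""
    pids: List[int] = []
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip the header, the "......" filler and anything without a numeric PID.
        if len(fields) < 2 or fields[0] != command or not fields[1].isdigit():
            continue
        pid = int(fields[1])
        if pid not in pids:
            pids.append(pid)
    return pids


# Sample text copied from the lsof listing above (illustrative only).
SAMPLE = """\
COMMAND  PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
python  2801  bml  mem    CHR 195,255           943 /dev/xpuctrl
python  2801  bml  mem    CHR   195,6          1724 /dev/xpu6
python  2801  bml   3u    CHR 195,255      0t0  943 /dev/xpuctrl
python  2801  bml   4u    CHR   195,6      0t0 1724 /dev/xpu6
......
"""

pids = pids_for_command(SAMPLE)
print(pids)
```

The resulting PIDs can then be reviewed and passed to kill one by one, which avoids terminating a process that belongs to someone else.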
Dataloader errors
During training, the Dataloader stage may raise an error such as:
Traceback (most recent call last):
  File "xxx/PaddleYOLO-example/tools/train.py", line 202, in <module>
    main()
  File "xxx/PaddleYOLO-example/tools/train.py", line 198, in main
    run(FLAGS, cfg)
  File "xxx/PaddleYOLO-example/tools/train.py", line 151, in run
    trainer.train(FLAGS.eval)
  File "xxx/PaddleYOLO-example/ppdet/engine/trainer.py", line 496, in train
    for step_id, data in enumerate(self.loader):
  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/dataloader_iter.py", line 850, in __next__
    self._reader.read_next_list()[0]
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /host/Paddle/paddle/phi/core/operators/reader/blocking_queue.h:175)
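The message only says that the data reader raised an exception; the underlying exception is not shown in this traceback. One generic way to surface it, sketched here under assumptions, is to index the dataset directly in the main process and record which samples fail. `find_failing_samples` and `ToyDataset` are hypothetical names used for illustration and are not part of Paddle or PaddleYOLO; the only assumption about the real dataset is that it supports `len()` and integer indexing.

```python
from typing import Any, List, Tuple


def find_failing_samples(dataset: Any, limit: int = 5) -> List[Tuple[int, Exception]]:
    """Index every sample in the current process and collect the ones that raise."""
    failures: List[Tuple[int, Exception]] = []
    for index in range(len(dataset)):
        try:
            dataset[index]  # runs the same loading code a reader worker would run
        except Exception as exc:
            failures.append((index, exc))
            if len(failures) >= limit:  # stop early, a few examples are enough
                break
    return failures


class ToyDataset:
    """Stand-in dataset (hypothetical) where one sample is broken."""

    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        path = self.paths[index]
        if path is None:
            raise ValueError(f"sample {index} has no image path")
        return {"im_file": path}


failures = find_failing_samples(ToyDataset(["a.jpg", None, "c.jpg"]))
for index, exc in failures:
    print(index, repr(exc))
```

Running the same loop against the real training dataset should, if the assumption holds, show the original exception and the index of the offending sample instead of the generic blocking-queue message.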
```bash
❯ ./python
Python 3.13.5 | packaged by Anaconda, Inc. | (main, Jun 12 2025, 16:09:02) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import distutils
>>> distutils.util.get_platform()
'linux-x86_64'
# so the platform is linux-x86_64