How does torch.compile in PyTorch speed up vLLM large-model inference?
huafeng.:
Corrections are welcome if anything is wrong.
Installing the TensorRT-LLM inference framework from source on Ubuntu (super detailed)
lckj2009:
Blogger, it already fails at the git lfs pull step:
batch response: batch error: auth batch error: LFS only supported repository in paid or trial enterprise.
(the same line is printed several more times)
error: failed to fetch some objects from 'https://gitee.com/mirrors/tensorrt-llm.git/info/lfs'
vLLM v1 source-code walkthrough: the overall flow (detailed debug)
OmO_xx:
Thanks a lot. I also found that VS Code can debug each of the spawned processes separately, but you have to set breakpoints yourself for it to stop inside those child processes.
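If graphical breakpoints are inconvenient to place in a child process, one possible alternative is a programmatic attach via debugpy, the backend the VS Code Python debugger uses. The sketch below is illustrative only; the helper name and port are not part of vLLM or the original comment, and it assumes debugpy is installed.

    # Hypothetical helper, not from vLLM: call this early in a spawned worker /
    # EngineCore process so the VS Code debugger can attach and stop there.
    import debugpy

    def attach_debugger_in_child(port: int = 5678) -> None:
        # Each child process needs its own port; 5678 is just a placeholder.
        debugpy.listen(("127.0.0.1", port))
        debugpy.wait_for_client()   # block until VS Code attaches to this process
        debugpy.breakpoint()        # pause here once a client is attached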
vLLM v1 source-code walkthrough: the overall flow (detailed debug)
dongbo910220:
May I ask what your debug environment is? WSL?
vLLM v1 source-code walkthrough: the overall flow (detailed debug)
lily2248531287:
In vllm/v1/engine/llm_engine.py, I took the statement

    self.engine_core = EngineCoreClient.make_client(
        multiprocess_mode=multiprocess_mode,
        asyncio_mode=False,
        vllm_config=vllm_config,
        executor_class=executor_class,
        log_stats=False,  # FIXME: implement
    )

and changed multiprocess_mode=multiprocess_mode to multiprocess_mode=False. That way an EngineCore object is constructed directly in the current process, and you can step through the debugger all the way into forward.
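With the engine core kept in-process by that edit (or an equivalent switch in your vLLM version), a minimal driver like the sketch below can then be launched under the VS Code or pdb debugger and stepped from the engine all the way into the model's forward pass. The model name and prompt are placeholders, not from the original comment, and details may differ between vLLM versions.

    # Hypothetical debugging driver, assuming the multiprocess_mode=False edit
    # above so that EngineCore lives in this same process.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # placeholder small model
    params = SamplingParams(max_tokens=16)
    outputs = llm.generate(["Hello, my name is"], params)
    print(outputs[0].outputs[0].text)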