Initially, we did not use NVLink to avoid extra costs and maintain stability, as HFReduce was sufficient for training requirements at that time. However, as the demand for LLMs increased, we added NVLink specifically for LLM training purposes. The decision to install NVLink should be based on actual needs due to its potential drawbacks.
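The biggest question: what are the strengths and weaknesses of this approach in terms of scalability and transportability? Because I did not fully understand it, my original question is still unanswered: if you switch GPUs, change the model architecture, or greatly increase the model size, does this architecture have to be torn down and rebuilt from scratch?
Is this case-by-case optimization, or a framework that is scalable and transportable not only in concept but also in toolset?

Author: xiejin77 Time: 2025-2-8 11:13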