Quantized Deployment Options for Large Language Models
# Model inference on CPU
https://github.com/abetlen/llama-cpp-python/
# A GGML-format .bin file must be generated first, then quantized
# Two quantization schemes
GPTQ vs GGML
# Quantization
https://blog.csdn.net/god_zzZ/article/details/130328307
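The post above covers the mechanics; conceptually, GGML-style quantization works block-wise: each small block of weights shares one floating-point scale, and each weight is stored as a 4-bit integer. A minimal pure-Python sketch of that idea (illustrative only — not the actual GGML q4_0 kernel, which packs two 4-bit values per byte):

```python
def quantize_q4(xs, block_size=32):
    """Simplified symmetric 4-bit block quantization: one float scale per
    block of weights, each weight rounded to an integer in [-8, 7].
    Mirrors the idea of GGML's q4_0, not its exact packed layout."""
    assert len(xs) % block_size == 0
    quants, scales = [], []
    for i in range(0, len(xs), block_size):
        block = xs[i:i + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / 7.0 if amax else 1.0  # largest value maps to +/-7
        scales.append(scale)
        quants.append([max(-8, min(7, round(v / scale))) for v in block])
    return quants, scales

def dequantize_q4(quants, scales):
    """Recover approximate float weights from 4-bit ints and block scales."""
    return [q * scale for qs, scale in zip(quants, scales) for q in qs]

weights = [0.1 * i - 1.6 for i in range(32)]   # toy weight vector
q, s = quantize_q4(weights)
restored = dequantize_q4(q, s)
```

The round-trip error per weight is bounded by half a quantization step (half the block's scale), which is why 4-bit models stay usable while shrinking roughly 4x versus fp16.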
# llama.cpp conversion
# Colab conversion
https://colab.research.google.com/drive/1FnFkyKhrnS7s-2lDDeous-AutdI_SkAd?usp=sharing#scrollTo=gw2xpYC0RcQC
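The flow the Colab notebook automates can be sketched locally as follows (model paths are assumptions, and the llama.cpp helper script names have changed across versions — check its README for the current ones):

```shell
# Build llama.cpp and quantize a model to 4-bit GGML (sketch; paths assumed)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
# convert the original (e.g. Hugging Face) weights to an fp16 GGML .bin file
python convert.py models/7B/
# quantize fp16 -> 4-bit (q4_0); the output .bin is what CPU inference loads
./quantize models/7B/ggml-model-f16.bin models/7B/ggml-model-q4_0.bin q4_0
```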
# Python bindings
https://github.com/abetlen/llama-cpp-python/
https://github.com/ggerganov/llama.cpp
https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/
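With a quantized .bin file in hand, the Python bindings above can serve it over HTTP; a minimal sketch (model path is an assumption — see the llama-cpp-python README):

```shell
# Install the bindings with the optional server extra
pip install "llama-cpp-python[server]"
# Launch an OpenAI-compatible HTTP server (listens on localhost:8000 by default)
python -m llama_cpp.server --model models/7B/ggml-model-q4_0.bin
```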
# vLLM conversion
vLLM: https://github.com/vllm-project/vllm
https://www.atyun.com/56675.html
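Unlike llama.cpp, vLLM serves Hugging Face checkpoints on GPU; a minimal sketch of its OpenAI-compatible server (the model name is an assumption, and a CUDA GPU is required):

```shell
pip install vllm
# Start vLLM's OpenAI-compatible API server (port 8000 by default)
python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
```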
# API options
# Supports the OpenAI API (https://chat.lmsys.org/)
https://github.com/lm-sys/FastChat
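FastChat exposes the OpenAI-style `/v1/chat/completions` endpoint; the request body it expects can be sketched as follows (host, port, and model name are assumptions — they depend on how the FastChat controller and workers are launched):

```python
import json

# Assumed local FastChat deployment (see the FastChat README for the
# controller / worker / api_server launch commands).
API_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "vicuna-7b-v1.5",  # must match the model a worker is serving
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
}
body = json.dumps(payload)
# In practice: POST `body` to API_URL with Content-Type: application/json,
# e.g. via urllib.request or an OpenAI client pointed at this base URL.
```

Because the endpoint mimics OpenAI's, existing OpenAI client code can switch to a self-hosted model by changing only the base URL.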
Last updated: 2023-09-07 23:06:40