Leaderboard of Joint Inference Learning with Cloud Edge Collaborative Inference For LLM Scenario
| rank | algorithm | Accuracy | Edge Ratio | Time to First Token | Throughput | Internal Token Latency | Cloud Prompt Tokens | Cloud Completion Tokens | Edge Prompt Tokens | Edge Completion Tokens | paradigm | hard_example_mining | edgemodel-model | edgemodel-backend | cloudmodel-model | time | url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | query-routing | 84.22 | 87.62 | 0.347 | 179.28 | 0.006 | 1560307 | 20339 | 10695142 | 30104 | jointinference | OracleRouter | Qwen/Qwen2.5-7B-Instruct | vllm | gpt-4o-mini | 2024-10-28 16:58:30 | ./workspace-mmlu/benchmarkingjob/query-routing/b8eb2606-950a-11ef-8cbc-c97e05df5d14 |
| 2 | query-routing | 82.75 | 77.55 | 0.316 | 216.72 | 0.005 | 2727792 | 18177 | 9470276 | 291364 | jointinference | OracleRouter | Qwen/Qwen2.5-3B-Instruct | vllm | gpt-4o-mini | 2024-10-28 16:58:19 | ./workspace-mmlu/benchmarkingjob/query-routing/b8eb2605-950a-11ef-8cbc-c97e05df5d14 |
| 3 | query-routing | 82.22 | 76.12 | 0.256 | 320.39 | 0.003 | 2978026 | 23254 | 9209538 | 29126 | jointinference | OracleRouter | Qwen/Qwen2.5-1.5B-Instruct | vllm | gpt-4o-mini | 2024-10-28 16:58:09 | ./workspace-mmlu/benchmarkingjob/query-routing/b8eb2604-950a-11ef-8cbc-c97e05df5d14 |
| 4 | query-routing | 75.99 | 0.0 | 0.691 | 698.83 | 0.001 | 11739216 | 79115 | 0 | 0 | jointinference | CloudOnly | Qwen/Qwen2.5-1.5B-Instruct | vllm | gpt-4o-mini | 2024-10-28 16:57:43 | ./workspace-mmlu/benchmarkingjob/query-routing/abe4062e-950a-11ef-8cbc-c97e05df5d14 |
| 5 | query-routing | 71.84 | 100.0 | 0.301 | 164.34 | 0.006 | 0 | 0 | 12335559 | 34817 | jointinference | EdgeOnly | Qwen/Qwen2.5-7B-Instruct | vllm | gpt-4o-mini | 2024-10-28 16:57:30 | ./workspace-mmlu/benchmarkingjob/query-routing/9b726328-950a-11ef-8cbc-c97e05df5d14 |
| 6 | query-routing | 60.3 | 100.0 | 0.206 | 176.71 | 0.006 | 0 | 0 | 12335559 | 397386 | jointinference | EdgeOnly | Qwen/Qwen2.5-3B-Instruct | vllm | gpt-4o-mini | 2024-10-28 16:57:23 | ./workspace-mmlu/benchmarkingjob/query-routing/9b726327-950a-11ef-8cbc-c97e05df5d14 |
| 7 | query-routing | 58.35 | 100.0 | 0.123 | 271.81 | 0.004 | 0 | 0 | 12335559 | 38982 | jointinference | EdgeOnly | Qwen/Qwen2.5-1.5B-Instruct | vllm | gpt-4o-mini | 2024-10-28 16:57:16 | ./workspace-mmlu/benchmarkingjob/query-routing/9b726326-950a-11ef-8cbc-c97e05df5d14 |
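
One way to read the leaderboard is to compare each OracleRouter run against the EdgeOnly run that uses the same edge model. The sketch below does that arithmetic for accuracy gain and for the share of prompt tokens that went to the cloud. The numbers are copied from the rows above; the names `ROWS` and `summarize` are hypothetical helpers introduced here for illustration and are not part of the benchmark code.

```python
# Values copied from the leaderboard rows above, keyed by edge model.
# Each entry: (EdgeOnly accuracy, OracleRouter accuracy,
#              OracleRouter cloud prompt tokens, OracleRouter edge prompt tokens)
ROWS = {
    "Qwen/Qwen2.5-7B-Instruct": (71.84, 84.22, 1560307, 10695142),
    "Qwen/Qwen2.5-3B-Instruct": (60.3, 82.75, 2727792, 9470276),
    "Qwen/Qwen2.5-1.5B-Instruct": (58.35, 82.22, 2978026, 9209538),
}


def summarize(rows):
    """Return {model: (accuracy gain in points, cloud share of prompt tokens in %)}."""
    summary = {}
    for model, (edge_acc, routed_acc, cloud_tok, edge_tok) in rows.items():
        # Accuracy gain of the routed run over the edge-only run.
        gain = round(routed_acc - edge_acc, 2)
        # Fraction of all prompt tokens in the routed run that the cloud model saw.
        cloud_share = round(100 * cloud_tok / (cloud_tok + edge_tok), 2)
        summary[model] = (gain, cloud_share)
    return summary


if __name__ == "__main__":
    for model, (gain, share) in summarize(ROWS).items():
        print(f"{model}: +{gain} accuracy points, {share}% of prompt tokens sent to cloud")
```

Note that the token-based cloud share is not the same quantity as 100 minus the Edge Ratio column, so the two figures need not match exactly.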