How to config algorithm

Lets take the example of cloud-edge-collaborative-inference-for-llm scenario and understand how algorithm developer is able to test his/her own targeted algorithm and configs the algorithm using the following configuration.

The configuration of algorithm

Model Configuration

The models are configured in examples/cloud-edge-collaborative-inference-for-llm/testalgorithms/query-routing/test_queryrouting.yaml.

In the configuration file, there are two models available for configuration: EdgeModel and CloudModel.

EdgeModel Configuration

The EdgeModel is the model that will be deployed on your local machine, supporting huggingface and vllm as serving backends.

For EdgeModel, the open parameters are:

Parameter Name

Type

Description

Defalut

model

str

model name

Qwen/Qwen2-1.5B-Instruct

backend

str

model serving framework

huggingface

temperature

float

What sampling temperature to use, between 0 and 2

0.8

top_p

float

nucleus sampling parameter

0.8

max_tokens

int

The maximum number of tokens that can be generated in the chat completion

512

repetition_penalty

float

The parameter for repetition penalty

1.05

tensor_parallel_size

int

The size of tensor parallelism (Used for vLLM)

1

gpu_memory_utilization

float

The percentage of GPU memory utilization (Used for vLLM)

0.9

CloudModel Configuration

The CloudModel represents the model on cloud, it will call LLM API via OpenAI API format.

For CloudModel, the open parameters are:

Parameter Name

Type

Description

Defalut

model

str

model name

gpt-4o-mini

temperature

float

What sampling temperature to use, between 0 and 2

0.8

top_p

float

nucleus sampling parameter

0.8

max_tokens

int

The maximum number of tokens that can be generated in the chat completion

512

repetition_penalty

float

The parameter for repetition penalty

1.05

Router Configuration

Router is a component that routes the query to the edge or cloud model. The router is configured by hard_example_mining in examples/cloud-edge-collaborative-inference-for-llm/testrouters/query-routing/test_queryrouting.yaml.

Currently, supported routers include:

Router Type

Description

Parameters

EdgeOnly

Route all queries to the edge model.

CloudOnly

Route all queries to the cloud model.

OracleRouter

Optimal Router

BERTRouter

Use a BERT classifier to route the query to the edge or cloud model.

model, threshold

RandomRouter

Route the query to the edge or cloud model randomly.

threshold

You can modify the router parameter in test_queryrouting.yaml to select the router you want to use.

For BERT router, you can use routellm/bert or routellm/bert_mmlu_augmented or your own BERT model.

Data Processor Configuration

The Data Processor allows you to customize your own data format after the dataset gets loaded.

Currently, supported routers include:

Data Processor

Description

Parameters

OracleRouterDatasetProcessor

Expose gold label to OracleRouter

Show example

# test_queryrouting.yaml
algorithm:
  # paradigm name; string type;
  paradigm_type: "jointinference"

  # algorithm module configuration in the paradigm; list type;
  modules:
    # kind of algorithm module; string type;
    - type: "dataset_processor"
      # name of custom dataset processor; string type;
      name: "OracleRouterDatasetProcessor"
      # the url address of custom dataset processor; string type;
      url: "./examples/cloud-edge-collaborative-inference-for-llm/testalgorithms/query-routing/data_processor.py"

    - type: "edgemodel"
      # name of edge model module; string type;
      name: "EdgeModel"
      # the url address of edge model module; string type;
      url: "./examples/cloud-edge-collaborative-inference-for-llm/testalgorithms/query-routing/edge_model.py"

      hyperparameters:
      # name of the hyperparameter; string type;
        - model:
            values:
              - "Qwen/Qwen2.5-1.5B-Instruct"
              - "Qwen/Qwen2.5-3B-Instruct"
              - "Qwen/Qwen2.5-7B-Instruct"
        - backend:
            # backend; string type;
            # currently the options of value are as follows:
            #  1> "huggingface": transformers backend;
            #  2> "vllm": vLLM backend;
            #  3> "api": OpenAI API backend;
            values:
              - "vllm"
        - temperature:
            # What sampling temperature to use, between 0 and 2; float type;
            # For reproducable results, the temperature should be set to 0;
            values:
              - 0
        - top_p:
            # nucleus sampling parameter; float type;
            values:
              - 0.8
        -  max_tokens:
            # The maximum number of tokens that can be generated in the chat completion; int type;
            values:
              - 512
        -  repetition_penalty:
            # The parameter for repetition penalty; float type;
            values:
              - 1.05
        -  tensor_parallel_size:
            # The size of tensor parallelism (Used for vLLM)
            values:
              - 4
        -  gpu_memory_utilization:
            # The percentage of GPU memory utilization (Used for vLLM)
            values:
              - 0.9
        -  use_cache:
            # Whether to use reponse cache; boolean type;
            values:
              - true

    - type: "cloudmodel"
      # name of python module; string type;
      name: "CloudModel"
      # the url address of python module; string type;
      url: "./examples/cloud-edge-collaborative-inference-for-llm/testalgorithms/query-routing/cloud_model.py"

      hyperparameters:
        # name of the hyperparameter; string type;
        - model:
            values:
              - "gpt-4o-mini"
        - temperature:
            values:
              - 0
        - top_p:
            values:
              - 0.8
        -  max_tokens:
            values:
              - 512
        -  repetition_penalty:
            values:
              - 1.05
        -  use_cache:
            values:
              - true

    - type: "hard_example_mining"
      # name of Router module; string type;
      # BERTRouter, EdgeOnly, CloudOnly, RandomRouter, OracleRouter
      name: "EdgeOnly"
      # the url address of python module; string type;
      url: "./examples/cloud-edge-collaborative-inference-for-llm/testalgorithms/query-routing/hard_sample_mining.py"