How to config algorithm

Lets take the example of cloud-edge-collaborative-inference-for-llm scenario and understand how algorithm developer is able to test his/her own targeted algorithm and configs the algorithm using the following configuration.

The configuration of algorithm

Model Configuration

The models are configured in examples/cloud-edge-collaborative-inference-for-llm/testalgorithms/query-routing/test_queryrouting.yaml.

In the configuration file, there are two models available for configuration: EdgeModel and CloudModel.

EdgeModel Configuration

The EdgeModel is the model that will be deployed on your local machine, supporting huggingface and vllm as serving backends.

For EdgeModel, the open parameters are:

Parameter Name	Type	Description	Defalut
model	str	model name	Qwen/Qwen2-1.5B-Instruct
backend	str	model serving framework	huggingface
temperature	float	What sampling temperature to use, between 0 and 2	0.8
top_p	float	nucleus sampling parameter	0.8
max_tokens	int	The maximum number of tokens that can be generated in the chat completion	512
repetition_penalty	float	The parameter for repetition penalty	1.05
tensor_parallel_size	int	The size of tensor parallelism (Used for vLLM)	1
gpu_memory_utilization	float	The percentage of GPU memory utilization (Used for vLLM)	0.9

CloudModel Configuration

The CloudModel represents the model on cloud, it will call LLM API via OpenAI API format.

For CloudModel, the open parameters are:

Parameter Name	Type	Description	Defalut
model	str	model name	gpt-4o-mini
temperature	float	What sampling temperature to use, between 0 and 2	0.8
top_p	float	nucleus sampling parameter	0.8
max_tokens	int	The maximum number of tokens that can be generated in the chat completion	512
repetition_penalty	float	The parameter for repetition penalty	1.05

Router Configuration

Router is a component that routes the query to the edge or cloud model. The router is configured by hard_example_mining in examples/cloud-edge-collaborative-inference-for-llm/testrouters/query-routing/test_queryrouting.yaml.

Currently, supported routers include:

Router Type	Description	Parameters
EdgeOnly	Route all queries to the edge model.
CloudOnly	Route all queries to the cloud model.
OracleRouter	Optimal Router
BERTRouter	Use a BERT classifier to route the query to the edge or cloud model.	model, threshold
RandomRouter	Route the query to the edge or cloud model randomly.	threshold

You can modify the router parameter in test_queryrouting.yaml to select the router you want to use.

For BERT router, you can use routellm/bert or routellm/bert_mmlu_augmented or your own BERT model.

Data Processor Configuration

The Data Processor allows you to customize your own data format after the dataset gets loaded.

Currently, supported routers include:

Data Processor	Description	Parameters
OracleRouterDatasetProcessor	Expose `gold` label to OracleRouter

Show example

# test_queryrouting.yaml
algorithm:
  # paradigm name; string type;
  paradigm_type: "jointinference"

  # algorithm module configuration in the paradigm; list type;
  modules:
    # kind of algorithm module; string type;
    - type: "dataset_processor"
      # name of custom dataset processor; string type;
      name: "OracleRouterDatasetProcessor"
      # the url address of custom dataset processor; string type;
      url: "./examples/cloud-edge-collaborative-inference-for-llm/testalgorithms/query-routing/data_processor.py"

    - type: "edgemodel"
      # name of edge model module; string type;
      name: "EdgeModel"
      # the url address of edge model module; string type;
      url: "./examples/cloud-edge-collaborative-inference-for-llm/testalgorithms/query-routing/edge_model.py"

      hyperparameters:
      # name of the hyperparameter; string type;
        - model:
            values:
              - "Qwen/Qwen2.5-1.5B-Instruct"
              - "Qwen/Qwen2.5-3B-Instruct"
              - "Qwen/Qwen2.5-7B-Instruct"
        - backend:
            # backend; string type;
            # currently the options of value are as follows:
            #  1> "huggingface": transformers backend;
            #  2> "vllm": vLLM backend;
            #  3> "api": OpenAI API backend;
            values:
              - "vllm"
        - temperature:
            # What sampling temperature to use, between 0 and 2; float type;
            # For reproducable results, the temperature should be set to 0;
            values:
              - 0
        - top_p:
            # nucleus sampling parameter; float type;
            values:
              - 0.8
        -  max_tokens:
            # The maximum number of tokens that can be generated in the chat completion; int type;
            values:
              - 512
        -  repetition_penalty:
            # The parameter for repetition penalty; float type;
            values:
              - 1.05
        -  tensor_parallel_size:
            # The size of tensor parallelism (Used for vLLM)
            values:
              - 4
        -  gpu_memory_utilization:
            # The percentage of GPU memory utilization (Used for vLLM)
            values:
              - 0.9
        -  use_cache:
            # Whether to use reponse cache; boolean type;
            values:
              - true

    - type: "cloudmodel"
      # name of python module; string type;
      name: "CloudModel"
      # the url address of python module; string type;
      url: "./examples/cloud-edge-collaborative-inference-for-llm/testalgorithms/query-routing/cloud_model.py"

      hyperparameters:
        # name of the hyperparameter; string type;
        - model:
            values:
              - "gpt-4o-mini"
        - temperature:
            values:
              - 0
        - top_p:
            values:
              - 0.8
        -  max_tokens:
            values:
              - 512
        -  repetition_penalty:
            values:
              - 1.05
        -  use_cache:
            values:
              - true

    - type: "hard_example_mining"
      # name of Router module; string type;
      # BERTRouter, EdgeOnly, CloudOnly, RandomRouter, OracleRouter
      name: "EdgeOnly"
      # the url address of python module; string type;
      url: "./examples/cloud-edge-collaborative-inference-for-llm/testalgorithms/query-routing/hard_sample_mining.py"