Qwen3-Next-80B-A3B-Instruct model serving fails with ValueError: The number of actual input tensors: 17 is not equal to the number of dynamic shape tensors: 16.

+ export vLLM_MODEL_BACKEND=MindFormers
+ vLLM_MODEL_BACKEND=MindFormers
+ export MS_ENABLE_TRACE_MEMORY=off
+ MS_ENABLE_TRACE_MEMORY=off
+ python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model /model --trust_remote_code --tensor_parallel_size=8 --max-num-seqs=192 --max_model_len=32768 --max-num-batched-tokens=16384 --block-size=32 --gpu-memory-utilization=0.9
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
INFO 09-15 07:29:13 [importing.py:17] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 09-15 07:29:13 [importing.py:29] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
  warn(
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
INFO 09-15 07:29:14 [__init__.py:248] No platform detected, vLLM is running on UnspecifiedPlatform
WARNING 09-15 07:29:16 [_custom_ops.py:22] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 09-15 07:29:18 vllm-mindspore[logger.py:75] The config of vllm-mindspore logger has been updated successfully.
INFO 09-15 07:29:18 vllm-mindspore[utils.py:195] environment variable "vLLM_MODEL_BACKEND" is MindFormers
WARNING 09-15 07:29:18 vllm-mindspore[utils.py:198] "vLLM_MODEL_BACKEND" will be removed, please use "VLLM_MS_MODEL_BACKEND"
INFO 09-15 07:29:18 vllm-mindspore[utils.py:201] environment variable "VLLM_MS_MODEL_BACKEND" is None
INFO:datasets:Disabling PyTorch because USE_TF is set
INFO:datasets:Disabling Tensorflow because USE_TORCH is set
WARNING 09-15 07:29:20 vllm-mindspore[registry.py:51] Error when importing MindSpore ONE: No module named 'mindone'
INFO 09-15 07:29:20 vllm-mindspore[utils.py:322] Run with Mindformers backend!
2025-09-15 07:29:21,594 - mindformers/home/work/easyedge/llm/output/log[/usr/lib/python3.10/warnings.py:109] - WARNING - RuntimeWarning: 'vllm_mindspore.entrypoints.__main__' found in sys.modules after import of package 'vllm_mindspore.entrypoints', but prior to execution of 'vllm_mindspore.entrypoints.__main__'; this may result in unpredictable behaviour
WARNING 09-15 07:29:27 [env_override.py:17] NCCL_CUMEM_ENABLE is set to 0, skipping override. This may increase memory overhead with cudagraph+allreduce: https://github.com/NVIDIA/nccl/issues/1234
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
INFO 09-15 07:29:28 [importing.py:17] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 09-15 07:29:28 [importing.py:29] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
/usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:14: UserWarning: Failed to load image Python extension: 'not support import any ops for now.'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  return self._float_to_str(self.smallest_subnormal)
WARNING 09-15 07:30:00 vllm-mindspore[config.py:262] Casting bfloat16 to BFloat16.
INFO 09-15 07:30:00 [config.py:1946] Defaulting to use mp for distributed inference
INFO 09-15 07:30:00 [config.py:1980] Disabled the custom all-reduce kernel because it is not supported on current platform.
INFO 09-15 07:30:00 [config.py:2195] Chunked prefill is enabled with max_num_batched_tokens=16384.
INFO 09-15 07:30:01 [core.py:455] Waiting for init message from front-end.
INFO 09-15 07:30:01 [core.py:70] Initializing a V1 LLM engine (v0.9.2.dev0+gb6553be1b.d20250715) with config: model='/model', speculative_config=None, tokenizer='/model', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=BFloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=8, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/model, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":1,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":[],"use_inductor":true,"compile_sizes":[20],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
INFO 09-15 07:30:01 vllm-mindspore[shm_broadcast.py:36] Entering mindspore shm_broadcast
INFO 09-15 07:30:01 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1, 2, 3, 4, 5, 6, 7], buffer_handle=(8, 16777216, 10, 'psm_0a2e8213'), local_subscribe_addr='ipc:///tmp/edc446eb-ecd0-4605-adff-809d9e1652ae', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-15 07:30:03 vllm-mindspore[worker.py:148] bind process 1072 in rank 1 to cpu: [180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191]
WARNING 09-15 07:30:03 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0xffff1f208c10>
(VllmWorker rank=1 pid=1072) INFO 09-15 07:30:03 vllm-mindspore[shm_broadcast.py:36] Entering mindspore shm_broadcast
(VllmWorker rank=1 pid=1072) INFO 09-15 07:30:03 vllm-mindspore[shm_broadcast.py:36] Entering mindspore shm_broadcast
(VllmWorker rank=1 pid=1072) INFO 09-15 07:30:03 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_3320abdf'), local_subscribe_addr='ipc:///tmp/057e47c0-60ed-4c4a-9509-cdd5aab547fc', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-15 07:30:03 vllm-mindspore[worker.py:148] bind process 1069 in rank 0 to cpu: [168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179]
INFO 09-15 07:30:03 vllm-mindspore[worker.py:148] bind process 1075 in rank 2 to cpu: [120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131]
WARNING 09-15 07:30:03 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0xffff1f208af0>
(VllmWorker rank=0 pid=1069) INFO 09-15 07:30:03 vllm-mindspore[shm_broadcast.py:36] Entering mindspore shm_broadcast
WARNING 09-15 07:30:03 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0xffff1f20bcd0>
(VllmWorker rank=2 pid=1075) INFO 09-15 07:30:03 vllm-mindspore[shm_broadcast.py:36] Entering mindspore shm_broadcast
(VllmWorker rank=2 pid=1075) INFO 09-15 07:30:03 vllm-mindspore[shm_broadcast.py:36] Entering mindspore shm_broadcast
(VllmWorker rank=0 pid=1069) INFO 09-15 07:30:03 vllm-mindspore[shm_broadcast.py:36] Entering mindspore shm_broadcast
(VllmWorker rank=2 pid=1075) INFO 09-15 07:30:03 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_c53a845c'), local_subscribe_addr='ipc:///tmp/527d32b7-f625-419e-8c0a-67bcec63eeed', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorker rank=0 pid=1069) INFO 09-15 07:30:03 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_2dd7d633'), local_subscribe_addr='ipc:///tmp/2cd29b69-1f1a-4f52-9712-9abd90c891e4', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-15 07:30:03 vllm-mindspore[worker.py:148] bind process 1078 in rank 3 to cpu: [132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143]
INFO 09-15 07:30:03 vllm-mindspore[worker.py:148] bind process 1084 in rank 5 to cpu: [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]
WARNING 09-15 07:30:04 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0xffff1f20bee0>
(VllmWorker rank=3 pid=1078) INFO 09-15 07:30:04 vllm-mindspore[shm_broadcast.py:36] Entering mindspore shm_broadcast
(VllmWorker rank=3 pid=1078) INFO 09-15 07:30:04 vllm-mindspore[shm_broadcast.py:36] Entering mindspore shm_broadcast
(VllmWorker rank=3 pid=1078) INFO 09-15 07:30:04 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_79f7bfd7'), local_subscribe_addr='ipc:///tmp/a91816c2-3d1c-4f9f-a0f0-ecda7f7feef6', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 09-15 07:30:04 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0xffff1f208130>
(VllmWorker rank=5 pid=1084) INFO 09-15 07:30:04 vllm-mindspore[shm_broadcast.py:36] Entering mindspore shm_broadcast
(VllmWorker rank=5 pid=1084) INFO 09-15 07:30:04 vllm-mindspore[shm_broadcast.py:36] Entering mindspore shm_broadcast
(VllmWorker rank=5 pid=1084) INFO 09-15 07:30:04 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_b1f7503e'), local_subscribe_addr='ipc:///tmp/da945510-a81c-464d-b183-836c2b610d73', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-15 07:30:04 vllm-mindspore[worker.py:148] bind process 1087 in rank 6 to cpu: [72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83]
WARNING 09-15 07:30:04 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0xffff1f2091e0>
(VllmWorker rank=6 pid=1087) INFO 09-15 07:30:04 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_d79289ac'), local_subscribe_addr='ipc:///tmp/e9003014-cfb9-4786-bc33-255ae1a886cb', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-15 07:30:04 vllm-mindspore[worker.py:148] bind process 1090 in rank 7 to cpu: [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
WARNING 09-15 07:30:04 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0xffff1f2093f0>
(VllmWorker rank=7 pid=1090) INFO 09-15 07:30:04 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_afef9fd9'), local_subscribe_addr='ipc:///tmp/3b9c0a4d-8413-432b-8bea-d15f5d7bb898', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 09-15 07:30:04 vllm-mindspore[worker.py:148] bind process 1081 in rank 4 to cpu: [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
WARNING 09-15 07:30:04 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0xffff1f208460>
(VllmWorker rank=4 pid=1081) INFO 09-15 07:30:04 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_f7a8f872'), local_subscribe_addr='ipc:///tmp/b9944f44-2e0c-4bd0-9493-759a79987cea', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorker rank=0 pid=1069) INFO 09-15 07:30:41 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1, 2, 3, 4, 5, 6, 7], buffer_handle=(7, 4194304, 6, 'psm_7cdf1ca7'), local_subscribe_addr='ipc:///tmp/649ecf4a-947a-4849-a8b2-7f3ecfcd8b66', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorker rank=6 pid=1087) INFO 09-15 07:30:41 vllm-mindspore[shm_broadcast.py:36] Entering mindspore shm_broadcast
(VllmWorker rank=6 pid=1087) INFO 09-15 07:30:41 [parallel_state.py:1065] rank 6 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 6, EP rank 6
(VllmWorker rank=1 pid=1072) INFO 09-15 07:30:41 [parallel_state.py:1065] rank 1 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
(VllmWorker rank=7 pid=1090) INFO 09-15 07:30:41 [parallel_state.py:1065] rank 7 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 7, EP rank 7
(VllmWorker rank=0 pid=1069) INFO 09-15 07:30:41 [parallel_state.py:1065] rank 0 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(VllmWorker rank=2 pid=1075) INFO 09-15 07:30:41 [parallel_state.py:1065] rank 2 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 2, EP rank 2
(VllmWorker rank=3 pid=1078) INFO 09-15 07:30:41 [parallel_state.py:1065] rank 3 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 3, EP rank 3
(VllmWorker rank=5 pid=1084) INFO 09-15 07:30:41 [parallel_state.py:1065] rank 5 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 5, EP rank 5
(VllmWorker rank=4 pid=1081) INFO 09-15 07:30:41 [parallel_state.py:1065] rank 4 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 4, EP rank 4
(VllmWorker rank=4 pid=1081) INFO 09-15 07:30:42 vllm-mindspore[config.py:356] The generated MindFormers config: {'run_mode': 'predict', 'use_legacy': False, 'load_ckpt_format': 'safetensors', 'auto_trans_ckpt': True, 'parallel_mode': 'STAND_ALONE', 'parallel_config': {'model_parallel': 8, 'pipeline_stage': 1, 'data_parallel': 1, 'vocab_emb_dp': False}, 'model': {'model_config': {'compute_dtype': mindspore.bfloat16, 'max_position_embeddings': 32768, 'block_size': 32, 'layernorm_compute_dtype': 'bfloat16', 'rotary_dtype': 'bfloat16', 'params_dtype': 'bfloat16', 'router_dense_type': 'bfloat16', 'pre_process': True, 'post_process': True, 'offset': 0}}}
(VllmWorker rank=4 pid=1081) {'parallel_config': None, 'pet_config': None, 'context_parallel_algo': 'colossalai_cp', 'is_dynamic': False, 'compute_dtype': mindspore.bfloat16, 'layernorm_compute_dtype': mindspore.bfloat16, 'rotary_dtype': mindspore.bfloat16, 'rotary_cos_format': 'rotate_half', 'bias_swiglu_fusion': False, 'mla_qkv_concat': True, 'use_contiguous_weight_layout_attention': False, 'use_interleaved_weight_layout_mlp': True, 'normalization': 'RMSNorm', 'fused_norm': True, 'add_bias_linear': False, 'add_mlp_fc1_bias_linear': None, 'add_mlp_fc2_bias_linear': None, 'gated_linear_unit': True, 'use_flash_attention': True, 'attention_pre_tokens': None, 'attention_next_tokens': None, 'query_chunk': True, 'rotary_seq_len_interpolation_factor': None, 'rope_scaling': None, 'use_rope_scaling': False, 'input_layout': 'BNSD', 'sparse_mode': 0, 'use_alibi_mask': False, 'use_attn_mask_compression': False, 'use_eod_attn_mask_compression': False, 'use_attention_mask': True, 'use_ring_attention': False, 'fp16_lm_cross_entropy': False, 'untie_embeddings_and_output_weights': True, 'hidden_act': 'silu', 'mask_func_type': 'attn_mask_fill', 'comp_comm_parallel': False, 'comp_comm_parallel_degree': 2, 'norm_topk_prob': True, 'use_fused_ops_topkrouter': False, 'use_shared_expert_gating': False, 'topk_method': 'greedy', 'npu_nums_per_device': 8, 'use_pad_tokens': True, 'callback_moe_droprate': False, 'moe_init_method_std': 0.01, 'first_k_dense_replace': None, 'moe_router_enable_expert_bias': False, 'moe_router_force_expert_balance': False, 'moe_router_score_function': 'softmax', 'moe_router_fusion': False, 'moe_shared_expert_gated': True, 'use_eod_reset': False, 'hidden_dropout': 0.0, 'residual_dtype': None, 'print_separate_loss': True, 'vocab_size': 151936, 'seq_length': 4096, 'pad_token_id': 0, 'ignore_token_id': -100, 'max_position_embeddings': 32768, 'sandwich_norm': False, 'tie_word_embeddings': False, 'block_size': 32, 'num_blocks': 1024, 'parallel_decoding_params': None, 
'pre_process': True, 'post_process': True, 'dispatch_global_max_bs': 0, 'attn_reduce_scatter': False, 'attn_allgather': False, 'attn_allreduce': True, 'ffn_reduce_scatter': False, 'ffn_allgather': False, 'ffn_allreduce': True, 'use_alltoall': False, 'use_fused_mla': False, 'quantization_config': None, 'disable_lazy_inline': False, 'data_parallel_size': 1, 'tensor_model_parallel_size': 8, 'pipeline_model_parallel_size': 1, 'virtual_pipeline_model_parallel_size': None, 'sequence_parallel': False, 'context_parallel_size': 1, 'hierarchical_context_parallel_sizes': 1, 'expert_model_parallel_size': 1, 'expert_tensor_parallel_size': None, 'micro_batch_num': 1, 'seq_split_num': 1, 'gradient_aggregation_group': 4, 'offset': 0, 'ulysses_degree_in_cp': 1, 'vocab_emb_dp': False, 'fp16': False, 'bf16': False, 'params_dtype': mindspore.bfloat16, 'finalize_model_grads_func': None, 'grad_scale_func': None, 'grad_sync_func': None, 'param_sync_func': None, 'num_microbatches_with_partial_activation_checkpoints': None, 'cpu_offloading': False, 'cpu_offloading_num_layers': None, 'cpu_offloading_weights': False, 'op_swap': None, 'default_prefetch': 1, 'num_layers': 48, 'mtp_num_layers': 0, 'mtp_loss_scaling_factor': None, 'hidden_size': 2048, 'num_attention_heads': 16, 'softmax_scale': None, 'num_query_groups': 2, 'ffn_hidden_size': 5120, 'kv_channels': 256, 'attention_dropout': 0.0, 'fp32_residual_connection': False, 'apply_residual_connection_post_layernorm': False, 'layernorm_epsilon': 1e-06, 'layernorm_zero_centered_gamma': False, 'add_qkv_bias': False, 'activation_func': 'gelu', 'num_moe_experts': 512, 'rotary_interleaved': False, 'calculate_per_token_loss': False, 'multi_latent_attention': False, 'position_embedding_type': 'rope', 'rotary_base': 10000000, 'partial_rotary_factor': 0.25, 'qk_layernorm': False, 'linear_conv_kernel_dim': 4, 'linear_expand_v': 0, 'linear_key_head_dim': 128, 'linear_num_key_heads': 16, 'linear_num_value_heads': 32, 'linear_value_head_dim': 128, 
'full_attention_interval': 4, 'init_method': <function init_method_normal.<locals>.init_ at 0xfffef038c0d0>, 'output_layer_init_method': <function scaled_init_method_normal.<locals>.init_ at 0xfffef038c160>, 'init_method_std': 0.02, 'init_model_with_meta_device': False, 'apply_query_key_layer_scaling': False, 'attention_softmax_in_fp32': True, 'softmax_compute_dtype': mindspore.float32, 'disable_bf16_reduced_precision_matmul': False, 'bias_activation_fusion': False, 'masked_softmax_fusion': False, 'persist_layer_norm': False, 'memory_efficient_layer_norm': False, 'bias_dropout_fusion': False, 'apply_rope_fusion': False, 'recompute': False, 'select_recompute': False, 'parallel_optimizer_comm_recompute': False, 'select_comm_recompute': False, 'mp_comm_recompute': True, 'recompute_slice_activation': False, 'select_recompute_exclude': False, 'select_comm_recompute_exclude': False, 'moe_shared_expert_intermediate_size': 512, 'moe_shared_expert_overlap': False, 'moe_layer_freq': 1, 'moe_ffn_hidden_size': 512, 'moe_router_load_balancing_type': 'aux_loss', 'moe_router_topk': 10, 'moe_router_num_groups': None, 'moe_router_group_topk': None, 'moe_router_pre_softmax': False, 'moe_router_topk_scaling_factor': None, 'moe_router_dtype': mindspore.bfloat16, 'moe_router_bias_update_rate': 0.001, 'moe_grouped_gemm': False, 'moe_aux_loss_coeff': 0.0, 'moe_z_loss_coeff': None, 'moe_input_jitter_eps': None, 'group_wise_a2a': False, 'moe_token_dispatcher_type': 'alltoall', 'moe_enable_deepep': False, 'moe_per_layer_logging': False, 'moe_expert_capacity_factor': None, 'moe_pad_expert_input_to_capacity': False, 'moe_token_drop_policy': 'probs', 'moe_permute_fusion': False, 'moe_apply_probs_on_input': False, 'shared_expert_num': 1, 'enable_expert_relocation': False, 'expert_relocation_initial_iteration': 20, 'expert_relocation_freq': 50, 'print_expert_load': False, 'cp_comm_type': 'all_gather'}
(VllmWorker rank=1 pid=1072) 2025-09-15 07:30:47,508 - mindformers/home/work/easyedge/llm/output/log[mindformers/models/model_config_utils.py:165] - WARNING - | use_sliding_window     | Useless                                     |

...
go': 'colossalai_cp', 'is_dynamic': False, 'compute_dtype': mindspore.bfloat16, 'layernorm_compute_dtype': mindspore.bfloat16, 'rotary_dtype': mindspore.bfloat16, 'rotary_cos_format': 'rotate_half', 'bias_swiglu_fusion': False, 'mla_qkv_concat': True, 'use_contiguous_weight_layout_attention': False, 'use_interleaved_weight_layout_mlp': True, 'normalization': 'RMSNorm', 'fused_norm': True, 'add_bias_linear': False, 'add_mlp_fc1_bias_linear': None, 'add_mlp_fc2_bias_linear': None, 'gated_linear_unit': True, 'use_flash_attention': True, 'attention_pre_tokens': None, 'attention_next_tokens': None, 'query_chunk': True, 'rotary_seq_len_interpolation_factor': None, 'rope_scaling': None, 'use_rope_scaling': False, 'input_layout': 'BNSD', 'sparse_mode': 0, 'use_alibi_mask': False, 'use_attn_mask_compression': False, 'use_eod_attn_mask_compression': False, 'use_attention_mask': True, 'use_ring_attention': False, 'fp16_lm_cross_entropy': False, 'untie_embeddings_and_output_weights': True, 'hidden_act': 'silu', 'mask_func_type': 'attn_mask_fill', 'comp_comm_parallel': False, 'comp_comm_parallel_degree': 2, 'norm_topk_prob': True, 'use_fused_ops_topkrouter': False, 'use_shared_expert_gating': False, 'topk_method': 'greedy', 'npu_nums_per_device': 8, 'use_pad_tokens': True, 'callback_moe_droprate': False, 'moe_init_method_std': 0.01, 'first_k_dense_replace': None, 'moe_router_enable_expert_bias': False, 'moe_router_force_expert_balance': False, 'moe_router_score_function': 'softmax', 'moe_router_fusion': False, 'moe_shared_expert_gated': True, 'use_eod_reset': False, 'hidden_dropout': 0.0, 'residual_dtype': None, 'print_separate_loss': True, 'vocab_size': 151936, 'seq_length': 4096, 'pad_token_id': 0, 'ignore_token_id': -100, 'max_position_embeddings': 32768, 'sandwich_norm': False, 'tie_word_embeddings': False, 'block_size': 32, 'num_blocks': 1024, 'parallel_decoding_params': None, 'pre_process': True, 'post_process': True, 'dispatch_global_max_bs': 0, 'attn_reduce_scatter': 
False, 'attn_allgather': False, 'attn_allreduce': True, 'ffn_reduce_scatter': False, 'ffn_allgather': False, 'ffn_allreduce': True, 'use_alltoall': False, 'use_fused_mla': False, 'quantization_config': None, 'disable_lazy_inline': False, 'data_parallel_size': 1, 'tensor_model_parallel_size': 8, 'pipeline_model_parallel_size': 1, 'virtual_pipeline_model_parallel_size': None, 'sequence_parallel': False, 'context_parallel_size': 1, 'hierarchical_context_parallel_sizes': 1, 'expert_model_parallel_size': 1, 'expert_tensor_parallel_size': None, 'micro_batch_num': 1, 'seq_split_num': 1, 'gradient_aggregation_group': 4, 'offset': 0, 'ulysses_degree_in_cp': 1, 'vocab_emb_dp': False, 'fp16': False, 'bf16': False, 'params_dtype': mindspore.bfloat16, 'finalize_model_grads_func': None, 'grad_scale_func': None, 'grad_sync_func': None, 'param_sync_func': None, 'num_microbatches_with_partial_activation_checkpoints': None, 'cpu_offloading': False, 'cpu_offloading_num_layers': None, 'cpu_offloading_weights': False, 'op_swap': None, 'default_prefetch': 1, 'num_layers': 48, 'mtp_num_layers': 0, 'mtp_loss_scaling_factor': None, 'hidden_size': 2048, 'num_attention_heads': 16, 'softmax_scale': None, 'num_query_groups': 2, 'ffn_hidden_size': 5120, 'kv_channels': 256, 'attention_dropout': 0.0, 'fp32_residual_connection': False, 'apply_residual_connection_post_layernorm': False, 'layernorm_epsilon': 1e-06, 'layernorm_zero_centered_gamma': False, 'add_qkv_bias': False, 'activation_func': 'gelu', 'num_moe_experts': 512, 'rotary_interleaved': False, 'calculate_per_token_loss': False, 'multi_latent_attention': False, 'position_embedding_type': 'rope', 'rotary_base': 10000000, 'partial_rotary_factor': 0.25, 'qk_layernorm': False, 'linear_conv_kernel_dim': 4, 'linear_expand_v': 0, 'linear_key_head_dim': 128, 'linear_num_key_heads': 16, 'linear_num_value_heads': 32, 'linear_value_head_dim': 128, 'full_attention_interval': 4, 'init_method': <function init_method_normal.<locals>.init_ at 
0xfffef038c0d0>, 'output_layer_init_method': <function scaled_init_method_normal.<locals>.init_ at 0xfffef038c160>, 'init_method_std': 0.02, 'init_model_with_meta_device': False, 'apply_query_key_layer_scaling': False, 'attention_softmax_in_fp32': True, 'softmax_compute_dtype': mindspore.float32, 'disable_bf16_reduced_precision_matmul': False, 'bias_activation_fusion': False, 'masked_softmax_fusion': False, 'persist_layer_norm': False, 'memory_efficient_layer_norm': False, 'bias_dropout_fusion': False, 'apply_rope_fusion': False, 'recompute': False, 'select_recompute': False, 'parallel_optimizer_comm_recompute': False, 'select_comm_recompute': False, 'mp_comm_recompute': True, 'recompute_slice_activation': False, 'select_recompute_exclude': False, 'select_comm_recompute_exclude': False, 'moe_shared_expert_intermediate_size': 512, 'moe_shared_expert_overlap': False, 'moe_layer_freq': 1, 'moe_ffn_hidden_size': 512, 'moe_router_load_balancing_type': 'aux_loss', 'moe_router_topk': 10, 'moe_router_num_groups': None, 'moe_router_group_topk': None, 'moe_router_pre_softmax': False, 'moe_router_topk_scaling_factor': None, 'moe_router_dtype': mindspore.bfloat16, 'moe_router_bias_update_rate': 0.001, 'moe_grouped_gemm': False, 'moe_aux_loss_coeff': 0.0, 'moe_z_loss_coeff': None, 'moe_input_jitter_eps': None, 'group_wise_a2a': False, 'moe_token_dispatcher_type': 'alltoall', 'moe_enable_deepep': False, 'moe_per_layer_logging': False, 'moe_expert_capacity_factor': None, 'moe_pad_expert_input_to_capacity': False, 'moe_token_drop_policy': 'probs', 'moe_permute_fusion': False, 'moe_apply_probs_on_input': False, 'shared_expert_num': 1, 'enable_expert_relocation': False, 'expert_relocation_initial_iteration': 20, 'expert_relocation_freq': 50, 'print_expert_load': False, 'cp_comm_type': 'all_gather'}
INFO 09-15 07:35:49 vllm-mindspore[shm_broadcast.py:36] Entering mindspore shm_broadcast
Loading safetensors checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 41/41 [03:05<00:00,  4.54s/it]
(VllmWorker rank=1 pid=1072) 2025-09-15 07:36:30,371 - mindformers/home/work/easyedge/llm/output/log[mindformers/parallel_core/inference/base_models/gpt/gpt_model.py:473] - WARNING - These parameters are not loaded in the network: set()
(VllmWorker rank=1 pid=1072) INFO 09-15 07:36:30 [default_loader.py:272] Loading weights took 3971725.01 seconds
(VllmWorker rank=1 pid=1072) INFO 09-15 07:36:31 [gpu_model_runner.py:1624] Model loading took 36.8734 GiB and 348.161880 seconds
INFO 09-15 07:36:31 vllm-mindspore[shm_broadcast.py:36] Entering mindspore shm_broadcast
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527] WorkerProc hit an exception.
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527] WorkerProc hit an exception.
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527] Traceback (most recent call last):
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527] Traceback (most recent call last):
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     output = func(*args, **kwargs)
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/msadapter/utils/_contextlib.py", line 117, in decorate_context
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     output = func(*args, **kwargs)
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     return func(*args, **kwargs)
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/msadapter/utils/_contextlib.py", line 117, in decorate_context
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_worker.py", line 205, in determine_available_memory
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     return func(*args, **kwargs)
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     self.model_runner.profile_run()
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_worker.py", line 205, in determine_available_memory
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2012, in profile_run
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     self.model_runner.profile_run()
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     hidden_states = self._dummy_run(self.max_num_tokens)
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2012, in profile_run
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/msadapter/utils/_contextlib.py", line 117, in decorate_context
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     hidden_states = self._dummy_run(self.max_num_tokens)
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     return func(*args, **kwargs)
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/msadapter/utils/_contextlib.py", line 117, in decorate_context
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1847, in _dummy_run
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     return func(*args, **kwargs)
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     outputs = model(
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1847, in _dummy_run
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/vllm_mindspore/model_executor/models/model_base.py", line 244, in __call__
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     outputs = model(
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     return self.forward(input_ids,
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/vllm_mindspore/model_executor/models/model_base.py", line 244, in __call__
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/vllm_mindspore/model_executor/models/mf_models/mindformers.py", line 376, in forward
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     return self.forward(input_ids,
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     hidden_states = self.network(**model_inputs)
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/vllm_mindspore/model_executor/models/mf_models/mindformers.py", line 376, in forward
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/nn/cell.py", line 1373, in __call__
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     hidden_states = self.network(**model_inputs)
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     return self._complex_call(*args, **kwargs)
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/nn/cell.py", line 1373, in __call__
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/nn/cell.py", line 1383, in _complex_call
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     return self._complex_call(*args, **kwargs)
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     output = self.construct(*args, **kwargs)
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/nn/cell.py", line 1383, in _complex_call
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindformers/models/utils.py", line 167, in decorator
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     output = self.construct(*args, **kwargs)
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     return ms.jit(func, jit_level='O0', infer_boost='on')(*args, **kwargs)
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindformers/models/utils.py", line 167, in decorator
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 1117, in staging_specialize
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     return ms.jit(func, jit_level='O0', infer_boost='on')(*args, **kwargs)
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     out = jit_executor(*args, **kwargs)
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 1117, in staging_specialize
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 187, in wrapper
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     out = jit_executor(*args, **kwargs)
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     results = fn(*arg, **kwargs)
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 187, in wrapper
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 656, in __call__
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     results = fn(*arg, **kwargs)
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     predict, res = self._predict(*args, **kwargs)
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 656, in __call__
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 639, in _predict
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     predict, res = self._predict(*args, **kwargs)
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     raise err
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 639, in _predict
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 636, in _predict
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     raise err
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     predict_phase = self.compile(self.fn.__name__, *args_list, **kwargs)
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 636, in _predict
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 693, in compile
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     predict_phase = self.compile(self.fn.__name__, *args_list, **kwargs)
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     compile_args = self._generate_compile_args(args)
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 693, in compile
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 891, in _generate_compile_args
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     compile_args = self._generate_compile_args(args)
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     return self._generate_compile_args_by_set_inputs(args_list)
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 891, in _generate_compile_args
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 848, in _generate_compile_args_by_set_inputs
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     return self._generate_compile_args_by_set_inputs(args_list)
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     raise ValueError(f"The number of actual input tensors: {len(args_list)} is not equal to the number of "
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]   File "/usr/local/lib/python3.10/dist-packages/mindspore/common/api.py", line 848, in _generate_compile_args_by_set_inputs
(VllmWorker rank=2 pid=1075) ERROR 09-15 07:36:31 [multiproc_executor.py:527] ValueError: The number of actual input tensors: 17 is not equal to the number of dynamic shape tensors: 16.
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527]     raise ValueError(f"The number of actual input tensors: {len(args_list)} is not equal to the number of "
(VllmWorker rank=3 pid=1078) ERROR 09-15 07:36:31 [multiproc_executor.py:527] ValueError: The number of actual input tensors: 17 is not equal to the number of dynamic shape tensors: 16.
(VllmWorker rank=1 pid=1072) ERROR 09-15 07:36:31 [multiproc_executor.py:527]

MindSpore/vllm-mindspore
