# Github_cpufp **Repository Path**: PanYizhe/Github_cpufp ## Basic Information - **Project Name**: Github_cpufp - **Description**: https://github.com/pigirons/cpufp.git - **Primary Language**: Unknown - **License**: GPL-3.0 - **Default Branch**: master - **Homepage**: https://github.com/pigirons/cpufp.git - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 1 - **Created**: 2025-03-31 - **Last Updated**: 2025-04-11 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # cpufp This is a cpu tool for benchmarking the peak performance of floating-points and AI ISAs. It can automatically sense the local SIMD|DSA ISAs while compiling. ## Support OS and ISA | Arch |Linux| MacOS| Windows| |:--------------|:---:|:----:|:------:| | arm64 | yes | no | no | | e2k | yes | no | no | | loongarch64 | yes | no | no | | riscv64 | yes | no | no | | x86-64 | yes | no | no | ## Support x86-64 SIMD|DSA ISA |Arch|ISA|Feature|Data Type|Description| | ------------ | ------------ | ------------ | ------------ | ------------ | |SIMD|SSE|Vector|fp32|Before Sandy Bridge| |SIMD|SSE2|Vector|fp64|Before Sandy Bridge| |SIMD|AVX|Vector|fp32/fp64|From Sandy Bridge| |SIMD|FMA|Vector|fp32/fp64|From Haswell/Zen| |SIMD|AVX512f|Vector|fp32/fp64|From Skylake X/Zen4| |SIMD|AVX512_VNNI|Vector|int8/int16|From IceLake| |SIMD|AVX_VNNI|Vector|int8/int16|From Alder Lake| |SIMD|AVX512_FP16|Vector|fp16|From Intel Sapphire Rapids| |SIMD|AVX512_BF16|Vector|bf16|From AMD Zen4| |SIMD|AVX_VNNI_INT8|Vector|int8|Unknown| |DSA|AMX_INT8|Matrix|int8|From Intel Sapphire Rapids| |DSA|AMX_BF16|Matrix|bf16|From Intel Sapphire Rapids| ## Support arm64 SIMD ISA |Arch|ISA|Feature|Data Type|Description| | ------------ | ------------ | ------------ | ------------ | ------------ | |SIMD|asimd|Vector|fp32/fp64|From Cortex-A57/A53| |SIMD|asimd_hp|Vector|fp16|From Cortex-A75/A55| |SIMD|asimd_dp|Vector|int8|From Cortex-A75/A55| |SIMD|bf16|Matrix|bf16|From Cortex-X2/A710/A510| |SIMD|i8mm|Matrix|int8|From Cortex-X2/A710/A510| ## Support riscv64 VECTOR ISA |Arch|ISA|Feature|Data Type|Description| | ------------ | ------------ | ------------ | ------------ | ------------ | |SIMD|V|Vector|fp16/fp32/fp64|From RISC-V "V" vector extension. Version 1.0| |DSA|ime|Matrix|int8|From SpacemiT-X60| NOTE: ime is a SpacemiT custom vendor extension. ## Support loongarch64 ISA |Arch|ISA|Feature|Data Type|Description| | ------------ | ------------ | ------------ | ------------ | ------------ | |SIMD|LASX|Vector|fp32/fp64|From Loongson 3A5000| |SIMD|LSX|Vector|fp32/fp64|From Loongson 3A5000| |Scalar|FP|Scalar|fp32/fp64|From Loongson 3A5000| ## Support e2k ISA | Arch | ISA |Feature| Vector Width | Data Type |Description |:-----|:------|:-----:|:------------:|----------:|:---------- | SIMD | v6 | Vector| 128 | fp32/fp64 | FMA | SIMD | v5 | Vector| 128 | fp32/fp64 | Combined operations |Scalar| v1-v4 | Scalar| | fp64 | Combined operations | SIMD | v1-v4 | Vector| 64 | fp32 | Combined operations ### Combined operations E2K has support for instructions that perform two independant operations. It is like FMA, but with additional rounding as these operations is independant. #### Example `fmul_addd` ``` fmul_addd src1, src2, src3, dst ``` ##### Description Multiply double-precision (64-bit) floating-point values from `src1` and `src2`, and add the intermediate result to value from `src3`. Store the result in `dst`. ##### Operation ``` dst[63:0] := src3[63:0] + src1[63:0] * src2[63:0] ``` ##### Latency and Throughput | Architecture | Latency | Throughput (CPI) | ALC |:--------------|:-------:|:----------------:|:---: | elbrus-v4 | 8 | 0.16 | `012345` | elbrus-v1 | 8 | 0.25 | `01-34-` * ALC (Arithmetic Logic Complex/Channel) is an execution port for RISC-like instructions ## How to build build x64 version: `./build_x64.sh` build arm64 version: `./build_arm64.sh` build riscv64 version: `./build_riscv64.sh` build loongarch64 version: `./build_loongarch64.sh` build e2k version: `./build_e2k.sh` clean: `./clean.sh` ## How to benchmark `./cpufp --thread_pool=[xxx] --idle_time=yyy` --thread_pool: [xxx] is the list of cpu thread to benchmarking, from setting affinities. Please reference the result of lstopo command. For example, [0,3,5-8,13-15]. --idle_time: the interval time(sec) between any two adjacent benchmarks, default is 0. ## Benchmark results [x86-64 cpufp benchmark results](benchmark_result/x64.md) [arm64 cpufp benchmark results](benchmark_result/arm64.md) [riscv64 cpufp benchmark results](benchmark_result/riscv64.md) [loongarch64 cpufp benchmark results](benchmark_result/loongarch64.md) [e2k cpufp benchmark results](benchmark_result/e2k.md) ## Todo list Add armv9(SVE, SVE2 & SME) Supports.