# IntentRecognitionModule

**Repository Path**: cjp_scut/intent-recognition-module

## Basic Information

- **Project Name**: IntentRecognitionModule
- **Description**: Integration of the intent recognition modules.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: feat/interface_integration
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2022-11-14
- **Last Updated**: 2023-09-21

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Project Documentation

## Notes

1. `./metaData` stores metadata. This folder is not pushed to git, so each developer must set it up locally.
2. `./module` contains the submodule code.
3. `./utils` contains shared utility code.

## Submodules

### Dimension Recognition Module

#### Module Input and Output

Input:

```json
"今年9月合同金额的同期比"
```

Output fields:

```json
{
  "question": "今年9月合同金额的同期比",
  "like": [],
  "sel": [
    {
      "member_amount": 0,
      "name": "年",
      "timeLevel": "year",
      "id": "AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.02de0a83c07e6c06b5f6dc8d09f9fce7-C_DATE_Year-LEVEL-1648634671682",
      "type": "STRING"
    },
    {
      "member_amount": 0,
      "name": "月",
      "timeLevel": "month",
      "id": "AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.02de0a83c07e6c06b5f6dc8d09f9fce7-C_DATE_Month-LEVEL-1648634671682",
      "type": "STRING"
    },
    {
      "member_amount": 0,
      "name": "合同金额",
      "timeLevel": "",
      "id": "AUGMENTED_DATASET_MEASURE.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_METRICS.I8a8ae5ca0178549554951b9501785d983aaa005e",
      "type": "DOUBLE"
    }
  ]
}
```

### Intent Recognition Module

#### Module Input and Output

Input fields:

```json
{
  "question": "今年9月合同金额的同期比",
  "sel": [
    {
      "member_amount": 1,
      "name": "月",
      "timeLevel": "month",
      "id": "AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.02de0a83c07e6c06b5f6dc8d09f9fce7-C_DATE_Month-LEVEL-1648634671682",
      "type": "STRING"
    },
    {
      "member_amount": 0,
      "name": "合同金额",
      "timeLevel": "",
      "id": "AUGMENTED_DATASET_MEASURE.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_METRICS.I8a8ae5ca0178549554951b9501785d983aaa005e",
      "type": "DOUBLE"
    }
  ]
}
```

Output fields:

```json
{
  "question": "今年9月合同金额的同期比",
  "agg": [
    "",
    "SUM",
    "SUM",
    "SUM",
    "SUM"
  ],
  "measure": [
    "AUGMENTED_DATASET_MEASURE.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_METRICS.I8a8ae5ca0178549554951b9501785d983aaa005e"
  ],
  "limit": [],
  "sel": [
    {
      "member_amount": 1,
      "name": "月",
      "timeLevel": "month",
      "id": "AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.02de0a83c07e6c06b5f6dc8d09f9fce7-C_DATE_Month-LEVEL-1648634671682",
      "type": "STRING"
    },
    {
      "member_amount": 0,
      "name": "合同金额",
      "timeLevel": "",
      "id": "AUGMENTED_DATASET_MEASURE.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_METRICS.I8a8ae5ca0178549554951b9501785d983aaa005e",
      "type": "DOUBLE"
    }
  ],
  "mdx": [
    {
      "function_name": "同比",
      "id": "合同金额同比",
      "name": "合同金额同比",
      "type": "MEASURE",
      "member_amount": 0,
      "param": {
        "time_id": "AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.02de0a83c07e6c06b5f6dc8d09f9fce7-C_DATE_Month-LEVEL-1648634671682",
        "measure_id": "AUGMENTED_DATASET_MEASURE.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_METRICS.I8a8ae5ca0178549554951b9501785d983aaa005e"
      }
    }
  ]
}
```

### Condition Classification Module

#### Module Input and Output

Input fields:

```json
{
  "question": "去年合同金额大于100万的分部",
  "sel": [
    {
      "member_amount": 1,
      "name": "年",
      "timeLevel": "year",
      "id": "AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.02de0a83c07e6c06b5f6dc8d09f9fce7-C_DATE_Year-LEVEL-1648634671682",
      "type": "STRING"
    },
    {
      "member_amount": 23,
      "name": "销售分部",
      "timeLevel": "",
      "id": "AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.2b4ff0f606441075f12c1be466e3642e-LEVEL-1659584076544",
      "type": "STRING"
    },
    {
      "member_amount": 0,
      "name": "合同金额",
      "timeLevel": "",
      "id": "AUGMENTED_DATASET_MEASURE.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_METRICS.I8a8ae5ca0178549554951b9501785d983aaa005e",
      "type": "DOUBLE"
    }
  ],
  "agg": [
    "",
    "",
    "SUM"
  ],
  "limit": [],
  "measure": [
    "AUGMENTED_DATASET_MEASURE.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_METRICS.I8a8ae5ca0178549554951b9501785d983aaa005e"
  ]
}
```
Output fields:

```json
{
  "question": "去年合同金额大于100万的分部",
  "sel": [
    {
      "member_amount": 1,
      "name": "年",
      "timeLevel": "year",
      "id": "AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.02de0a83c07e6c06b5f6dc8d09f9fce7-C_DATE_Year-LEVEL-1648634671682",
      "type": "STRING"
    },
    {
      "member_amount": 23,
      "name": "销售分部",
      "timeLevel": "",
      "id": "AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.2b4ff0f606441075f12c1be466e3642e-LEVEL-1659584076544",
      "type": "STRING"
    },
    {
      "member_amount": 0,
      "name": "合同金额",
      "timeLevel": "",
      "id": "AUGMENTED_DATASET_MEASURE.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_METRICS.I8a8ae5ca0178549554951b9501785d983aaa005e",
      "type": "DOUBLE"
    }
  ],
  "agg": [
    "",
    "",
    "SUM"
  ],
  "conds": [
    [
      "AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.02de0a83c07e6c06b5f6dc8d09f9fce7-C_DATE_Year-LEVEL-1648634671682",
      "==",
      "2021"
    ]
  ],
  "group_by": [
    "AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.2b4ff0f606441075f12c1be466e3642e-LEVEL-1659584076544",
    "AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.02de0a83c07e6c06b5f6dc8d09f9fce7-C_DATE_Year-LEVEL-1648634671682"
  ],
  "order_by": [
    [
      "AUGMENTED_DATASET_MEASURE.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_METRICS.I8a8ae5ca0178549554951b9501785d983aaa005e",
      "DESC",
      "COL",
      "NAME",
      "合同金额"
    ]
  ],
  "cond_conn_op": "",
  "having": [
    [
      "AUGMENTED_DATASET_MEASURE.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_METRICS.I8a8ae5ca0178549554951b9501785d983aaa005e",
      "SUM",
      ">",
      1000000.0
    ]
  ],
  "limit": [],
  "measure": [
    "AUGMENTED_DATASET_MEASURE.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_METRICS.I8a8ae5ca0178549554951b9501785d983aaa005e"
  ]
}
```

## Frontend-Backend Interaction Test Interface

Endpoint: `/getSQLs`

Method: `POST`

**Frontend:**

1. Normal question input:

```json
{
  "status": 200,
  "data": {
    "question": "广州分部的合同金额?"
  }
}
```

2. User selects dimensions:

```json
{
  "status": 300,
  "data": {
    "question": "华南大区、广州茂川科技有限公司今年的合同金额?",
    "dimensions": {
      "华南大区": "一级部门",
      "广州茂川科技有限公司": "客户名称"
    }
  }
}
```

**Backend:**

1. Normal flow:

```json
{
  "status": 200,
  "data": [
    [
      {
        "key": "question",
        "content": "广州分部的合同金额"
      },
      {
        "key": "sel",
        "content": "[{'member_amount': 0, 'name': '年', 'timeLevel': 'year', 'id': 'AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.02de0a83c07e6c06b5f6dc8d09f9fce7-C_DATE_Year-LEVEL-1648634671682', 'type': 'STRING'}, {'member_amount': 23, 'name': '销售分部', 'timeLevel': '', 'id': 'AUGMENTED_DATASET_FIELD.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.2b4ff0f606441075f12c1be466e3642e', 'type': 'STRING'}, {'member_amount': 0, 'name': '合同金额', 'timeLevel': '', 'id': 'AUGMENTED_DATASET_MEASURE.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_METRICS.I8a8ae5ca0178549554951b9501785d983aaa005e', 'type': 'DOUBLE'}]"
      },
      {
        "key": "agg",
        "content": "['', '', 'SUM']"
      },
      {
        "key": "measure",
        "content": "['AUGMENTED_DATASET_MEASURE.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_METRICS.I8a8ae5ca0178549554951b9501785d983aaa005e']"
      },
      {
        "key": "limit",
        "content": "[]"
      },
      {
        "key": "conds",
        "content": "[['AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.02de0a83c07e6c06b5f6dc8d09f9fce7-C_DATE_Year-LEVEL-1648634671682', '==', '2022'], ['AUGMENTED_DATASET_FIELD.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.2b4ff0f606441075f12c1be466e3642e', '==', '广州分部']]"
      },
      {
        "key": "cond_conn_op",
        "content": "and"
      },
      {
        "key": "having",
        "content": "[]"
      },
      {
        "key": "group_by",
        "content": "['AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.02de0a83c07e6c06b5f6dc8d09f9fce7-C_DATE_Year-LEVEL-1648634671682', 'AUGMENTED_DATASET_FIELD.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.2b4ff0f606441075f12c1be466e3642e']"
      },
      {
        "key": "order_by",
        "content": "[['AUGMENTED_DATASET_LEVEL.I8a8ae5ca0178549554951b9501785cefe3f00058.MT_DIMENSION_FIELD.02de0a83c07e6c06b5f6dc8d09f9fce7-C_DATE_Year-LEVEL-1648634671682', 'ASC', 'COL', 'NAME', '年']]"
      }
    ]
  ]
}
```

2. User selection required:

```json
{
  "status": 300,
  "data": [
    {
      "ambiguity_dim": "华南大区",
      "candidates": [
        "销售区域",
        "二级部门"
      ]
    },
    {
      "ambiguity_dim": "广州茂川科技有限公司",
      "candidates": [
        "渠道名称",
        "客户名称"
      ]
    }
  ]
}
```

## Open Issues

3. Condition classification module:

| No. | Issue type | Description | Status |
|------|------------|-------------|--------|
| 3-1 | Dimension splitting | Dimension splitting is currently applied only to conds; other fields are not split. | Not done |
| 3-2 | Time dimension recognition | Quarters (quarter) are not yet recognized. | Done |
| 3-3 | group_by recognition | The group_by rules are not yet complete. | Not done |
| 3-4 | like fuzzy query | This is really a coordinate-sentence problem. Example: “广州和北京的合同金额”. If both “广州” and “北京” are resolved to “销售分部”, conds shows only one like. | Done |
| 3-5 | having measure-literal matching | Example: “广州分部合同金额大于10万的合同个数”. “大于10万” is assigned to both “合同金额” and “合同个数”. | Done |
| 3-6 | Merging time dimensions | (year: 2022; quarter: 2022Q1; month: 2022-03; day: 2021-07-01) | Merging for “日” (day) not yet done |
| 3-7 | Recognizing “周” (week) | No “周” dimension actually exists; the dates of the specific week are put into a list and expressed as "in []". | Done |
| 3-8 | Unifying time expressions | E.g. whether “前x天” means a period of time or one specific day. | Not done |
| 3-9 | Supplementary fixes for the day time dimension | The “12日” in “2018年5月12日” must be recognized as “the 12th of this month this year”, and “12日” is then extracted for merging. | Done |
| 3-10 | Time span recognition | Time units involved: year, month, quarter, day. Patterns: (今年1到/至5月, 1月~5月, 2018年到2022年, 去年5月到今年1月, 2018年1季度到2019年Q2季度). The last two are harder, and the 思迈特 (Smartbi) system does not implement them either. | Not done |

- [x] For 3-4, handle like in code: find all dimension members matching “%广州%” and “%北京%”, put them into a list, then turn like into "in [广州分部, 北京分部]".
- [x] For the questions below (which include 3-5), first locate where the problem arises, then add it to the issue list:
  - [x] 广州分部*金额*大于十万的*合同数量*: 合同金额 and 合同个数 are both measures, so it is unclear which one “大于十万” should match (having measure-literal matching).
  - [x] 广州分部合同金额大于十万的数量: no handling needed for now; this is a recognition task on the agg side.
  - [x] 广州分部金额大于十万的合同数: no handling needed for now; this is a recognition task for other modules.
  - [x] 广州分部金额大于十万的合同个数: 合同金额 and 合同个数 are both measures, so it is unclear which one “大于十万” should match (having measure-literal matching).

***having measure-literal matching***, proposed approach:

Recognize operators and numeric literals in two separate passes. Once both are recognized, pair them by their distance in the question. For example, with question="广州分部金额大于十万的合同个数", after “大于” and “十万” are recognized they are concatenated if they are adjacent. All concatenated results are put into a list, and in the measure-processing stage the measures are matched one by one against the elements of that list.

***Merging time dimensions***, proposed approach:

The pairs that actually need merging are (year, quarter), (year, month), (year, day) and (week, day). After merging, the results are sorted and the smallest time unit is placed into conds (year > quarter > month > week > day). Note that in a coordinate sentence the coordinated dimensions must not be compared with each other during sorting. The implementation plan is to modify the existing code so that regex recognition also stores each match's span position, and the later merge step decides what to combine based on the relative distance between spans.

Comparison format:
```python
{
    "key": "周",
    "id": "1213",
    "opt": "in",
    "value": [
        {
            "span": (0, 2),
            "literal": ["2022-12-12", "2022-12-13"]
        },
        {
            "span": (3, 6),
            "literal": "2022-12-10"
        }
    ]
}
```

Time dimension tests:

1. “去年第三季度和今年第二季度广州分部的合同金额”: recognized successfully as `['in', ['2022Q2', '2021Q3']]`.
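The span-based merging idea from the time-dimension notes can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the module's actual code: the names `YEAR_OFFSETS`, `QUARTERS`, `find_mentions` and `merge_year_quarter` are assumptions, the vocabulary is limited to a few relative years and “第N季度” quarters, and each quarter is simply paired with the year mention whose span starts closest to it.

```python
import re

# Hypothetical lookup tables; the real module's vocabulary is not shown in this README.
YEAR_OFFSETS = {"今年": 0, "去年": -1, "前年": -2}
QUARTERS = {"一": 1, "二": 2, "三": 3, "四": 4}

YEAR_RE = re.compile("|".join(YEAR_OFFSETS))
QUARTER_RE = re.compile(r"第([一二三四])季度")


def find_mentions(question, this_year):
    """Return (years, quarters); each entry keeps the regex span and the parsed value."""
    years = [
        {"span": m.span(), "value": this_year + YEAR_OFFSETS[m.group()]}
        for m in YEAR_RE.finditer(question)
    ]
    quarters = [
        {"span": m.span(), "value": QUARTERS[m.group(1)]}
        for m in QUARTER_RE.finditer(question)
    ]
    return years, quarters


def merge_year_quarter(question, this_year):
    """Attach each quarter to the nearest year mention and emit 'YYYYQn' literals."""
    years, quarters = find_mentions(question, this_year)
    merged = []
    for q in quarters:
        if years:
            # Distance is measured between the start offsets of the two spans.
            nearest = min(years, key=lambda y: abs(y["span"][0] - q["span"][0]))
            year = nearest["value"]
        else:
            # Assumption: a quarter with no year mention refers to the current year.
            year = this_year
        merged.append(f"{year}Q{q['value']}")
    return merged


literals = merge_year_quarter("去年第三季度和今年第二季度广州分部的合同金额", this_year=2022)
print(literals)
```

With `this_year=2022`, the test question above yields `2021Q3` and `2022Q2`, listed in order of appearance in the question. Because each quarter keeps its own nearest year, the two coordinated time expressions are never compared against each other, which matches the note about coordinate sentences.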