# ml_start

**Repository Path**: tonyw/ml_start

## Basic Information

- **Project Name**: ml_start
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2017-01-21
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Machine Learning Notes and Code

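Dependencies:

1. Python 3.5
2. numpy: vector operations
3. sklearn: the PCA library is used for dimensionality reduction in KNN, and k-means clustering is used to sample the data
4. joblib: parallel computation to speed up PCA training
5. matplotlib: plots of the logistic regression results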

# Main Contents

1. Parse the MNIST images into matrices and predict handwritten digits
2. KNN recognition
3. Logistic regression
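
As a rough illustration of the first item, the sketch below parses image data into a matrix with one flattened image per row. It assumes the standard MNIST IDX layout (a big-endian header with magic number 2051, followed by unsigned-byte pixels); the function name `parse_idx_images` is hypothetical and is not taken from convert_original_data.py, which may work differently.

```python
import struct

import numpy as np


def parse_idx_images(raw: bytes) -> np.ndarray:
    """Parse IDX image bytes into a (count, rows * cols) uint8 matrix."""
    # Header: magic number, image count, rows, cols (big-endian uint32 each).
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError("not an IDX image file: magic=%d" % magic)
    pixels = np.frombuffer(raw, dtype=np.uint8, count=count * rows * cols, offset=16)
    # One flattened image per row.
    return pixels.reshape(count, rows * cols)


# Tiny synthetic buffer standing in for a real MNIST file: two 2x2 images.
demo = struct.pack(">IIII", 2051, 2, 2, 2) + bytes(range(8))
images = parse_idx_images(demo)
print(images.shape)
```

With a real file, the bytes would come from reading the (decompressed) image file in binary mode.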

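# Code Reading Guide

## Data Preprocessing
1. Install the dependencies with pip
2. Run python convert_original_data.py
3. Run python tain_pca.py
   1. The PCA-transformed images are stored in pca_train_matrix_{i}_{dim} files, where dim is the reduced dimensionality. The default is 30, which retains roughly 90% of the feature information
4. Run python kmeans.py to reduce the amount of data for quicker verification. This lowers the classifier's accuracy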
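
The PCA and k-means steps above can be sketched with scikit-learn as follows. This is a minimal sketch on random data, not the repository's scripts: the helper names are hypothetical, and replacing each class by a few cluster centers is only one assumed way of "sampling with k-means". The retained-variance fraction depends entirely on the data, so the roughly 90% figure applies to MNIST, not to this toy input.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def reduce_with_pca(x, dim=30):
    """Project x onto its first `dim` principal components."""
    pca = PCA(n_components=dim)
    reduced = pca.fit_transform(x)
    # Fraction of total variance kept by the selected components.
    retained = float(pca.explained_variance_ratio_.sum())
    return reduced, retained


def sample_with_kmeans(x, labels, per_class=5, seed=0):
    """Assumed sampling scheme: keep `per_class` cluster centers per label."""
    centers, center_labels = [], []
    for c in np.unique(labels):
        km = KMeans(n_clusters=per_class, n_init=10, random_state=seed)
        km.fit(x[labels == c])
        centers.append(km.cluster_centers_)
        center_labels.extend([int(c)] * per_class)
    return np.vstack(centers), np.array(center_labels)


# Random stand-in for flattened images: 200 samples, 64 features, 3 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 64))
labels = np.arange(200) % 3

reduced, retained = reduce_with_pca(x, dim=30)
small_x, small_y = sample_with_kmeans(reduced, labels, per_class=5)
print(reduced.shape, small_x.shape)
```

Fewer points per class make a later nearest-neighbour search faster, which matches the trade-off noted above: quicker checks in exchange for some accuracy.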

## KNN Classification

1. Run python test_knn.py to evaluate the classifier on the test data and store the predicted labels in the lables directory
2. For a TensorFlow version, run python test_knn_tensorflow_gpu.py. Install tensorflow with pip first
3. Run python persent.py to print the classification accuracy
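
For readers new to KNN, here is a small NumPy sketch of the idea behind these steps: classify each test vector by majority vote among its k nearest training vectors, then compute the accuracy. The function names and the toy data are illustrative assumptions and do not reproduce test_knn.py or persent.py.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=3):
    """Label each test row by majority vote among its k nearest training rows."""
    # Squared Euclidean distances, shape (n_test, n_train).
    diffs = test_x[:, None, :] - train_x[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Indices of the k closest training rows for each test row.
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


def accuracy(predicted, truth):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(predicted == truth))


# Toy data: two well-separated groups of 2-D points.
train_x = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
train_y = np.array([0, 0, 0, 1, 1, 1])
test_x = np.array([[0.2, 0.1], [5.5, 5.2]])
test_y = np.array([0, 1])

predicted = knn_predict(train_x, train_y, test_x, k=3)
print(predicted, accuracy(predicted, test_y))
```

The full distance matrix costs memory proportional to the number of test rows times the number of training rows, which is one reason to reduce dimensionality and sample the training set first.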

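## Linear Logistic Classification
1. Run python logistic.py to train the linear logistic classifier model
2. Run python test_logistic.py to test the classifier's performance
   1. Logistic classification cannot use the PCA data, because PCA leaves the vectors inseparable
   2. At classification time, the binary model is converted into a multi-class model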
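
The README does not say how the binary model is turned into a multi-class one. One common scheme is one-vs-rest: train one binary logistic classifier per class and pick the class with the highest score. The sketch below assumes that scheme, with plain batch gradient descent and hypothetical function names; logistic.py may do something different.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(x):
    """Append a constant 1 column so the last weight acts as the bias."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


def train_binary(x, y, lr=0.1, epochs=1000):
    """Binary logistic regression trained with batch gradient descent."""
    xb = add_bias(x)
    w = np.zeros(xb.shape[1])
    for _ in range(epochs):
        # Gradient of the mean cross-entropy loss.
        grad = xb.T @ (sigmoid(xb @ w) - y) / len(y)
        w -= lr * grad
    return w


def train_one_vs_rest(x, labels, n_classes):
    """One binary classifier per class: class c against everything else."""
    return np.stack([train_binary(x, (labels == c).astype(float))
                     for c in range(n_classes)])


def predict(weights, x):
    """Pick the class whose binary classifier gives the highest probability."""
    return np.argmax(sigmoid(add_bias(x) @ weights.T), axis=1)


# Toy data: three well-separated groups of 2-D points.
train_x = np.array([[0, 0], [0, 1], [1, 0],
                    [5, 0], [6, 0], [5, 1],
                    [0, 5], [0, 6], [1, 5]], dtype=float)
train_y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

weights = train_one_vs_rest(train_x, train_y, n_classes=3)
new_points = np.array([[0.5, 0.5], [5.5, 0.5], [0.5, 5.5]])
print(predict(weights, new_points))
```

One-vs-rest keeps each model a plain binary classifier, so the training code for the two-class case can be reused unchanged for the ten MNIST digits.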