A Collection of Commonly Used Utility Functions

August 17, 2019, 07:32:37


Summary: A handful of commonly used utility functions, ready to be reused quickly when needed.


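Introduction

This post collects a few utility functions, such as computing accuracy and reading a file in several passes by skipping rows, for my own later use.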

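Reading and Checking Files

Checking image files (image size and number of channels)

Sometimes we first want to check whether every image in a dataset is RGB, or whether every image is grayscale. The code below walks a directory and deletes any image that cannot be opened or that is grayscale.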

import os

from PIL import Image
from tqdm import tqdm


# Collect the full path of every file in a directory
def get_img_path(path):
    ls = [os.path.join(path, x) for x in os.listdir(path)]
    return [x for x in ls if os.path.isfile(x)]


# Walk every image under path; report and delete any that is unreadable or grayscale
def Preprocess(path):
    imagePaths = get_img_path(path)  # collect the image paths
    for imagePath in tqdm(imagePaths):
        try:
            # Image.open raises on an unreadable file; it never returns None
            with Image.open(imagePath) as imageTest:
                mode = imageTest.mode
        except OSError:
            print('Image error: {}'.format(imagePath))
            os.remove(imagePath)
            print('Deleted!')
            continue
        if mode == 'L':  # 'L' is single-channel grayscale
            print('Image channel error: {}'.format(imagePath))
            os.remove(imagePath)
            print('Deleted!')
    print('Finish!')


if __name__ == "__main__":
    Preprocess('./datasets/monet2photo/trainB')
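The heading above also mentions image size, which the script does not report. A minimal sketch of reading both size and channel count with Pillow follows; the helper name `describe_image` is hypothetical and not part of the original script.

```python
from PIL import Image


def describe_image(image_path):
    """Return (width, height, number_of_channels) for one image file."""
    with Image.open(image_path) as img:
        width, height = img.size        # Pillow reports size as (width, height)
        channels = len(img.getbands())  # ('R', 'G', 'B') -> 3, ('L',) -> 1
    return width, height, channels
```

Calling `describe_image` on each path returned by `get_img_path` would let you list the odd images before deciding whether to delete them.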

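Reading CSV files

When a file is large, we can read it in several passes instead of all at once: each call skips the rows that were already read and then reads a fixed number of rows.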

import pandas as pd

df = pd.read_csv(csv_path,
                 header=None,
                 nrows=base,        # number of rows to read in this pass
                 skiprows=skiprow)  # number of rows to skip from the start of the file
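The call above reads a single block. One way to cover the whole file is to repeat it while advancing `skiprows`; the sketch below assumes a headerless CSV, and the generator name `read_csv_in_chunks` and its parameter `rows_per_chunk` are hypothetical.

```python
import pandas as pd


def read_csv_in_chunks(csv_path, rows_per_chunk):
    """Yield successive DataFrames of at most rows_per_chunk rows each."""
    skipped = 0
    while True:
        try:
            df = pd.read_csv(csv_path, header=None,
                             nrows=rows_per_chunk, skiprows=skipped)
        except pd.errors.EmptyDataError:
            break  # every row has already been skipped
        if df.empty:
            break
        yield df
        if len(df) < rows_per_chunk:
            break  # a short block means the end of the file was reached
        skipped += len(df)
```

Each pass rescans the skipped rows, so for very large files the built-in `chunksize` argument of `pd.read_csv`, which returns an iterator over blocks, is likely the cheaper option.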

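Computing Evaluation Metrics

Computing accuracy

When target is a NumPy array. Here target is one-hot encoded and logit already holds the predicted class indices.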

def accuracy(target, logit):
    ''' Obtain accuracy for training round '''
    target = target.argmax(axis=1)  # convert from one-hot encoding to class indices
    corrects = (logit == target).sum()
    accuracy = 100.0 * corrects / len(logit)
    return accuracy
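The function above compares `logit` directly with class indices, so it expects predictions that have already been reduced to labels. If the model instead outputs one score per class, an extra `argmax` is needed first. A small worked sketch, with the hypothetical name `accuracy_from_scores` and made-up sample values:

```python
import numpy as np


def accuracy_from_scores(target_onehot, scores):
    """Accuracy in percent from one-hot targets and per-class scores."""
    target = target_onehot.argmax(axis=1)  # one-hot -> class indices
    predicted = scores.argmax(axis=1)      # per-class scores -> predicted class
    return 100.0 * (predicted == target).sum() / len(target)


# Four samples, three classes; the third sample is misclassified
target_onehot = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
scores = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.9, 0.2, 0.4],
                   [0.1, 0.8, 0.3]])
print(accuracy_from_scores(target_onehot, scores))  # 75.0
```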

Computing accuracy with PyTorch.

import torch

def accuracy(target, logit):
    ''' Obtain accuracy for training round '''
    target = torch.max(target, 1)[1]  # convert from one-hot encoding to class indices
    corrects = (logit == target).sum()  # logit must already hold predicted class indices
    accuracy = 100.0 * corrects / len(logit)  # a 0-dim tensor; call .item() for a Python number
    return accuracy

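Computing AUC

For more on computing AUC, see the article: Plotting ROC Curves and Computing AUC

AUC can be computed directly with the implementation in sklearn. Reference: [sklearn] Performance Metrics: the AUC Value (from sklearn.metrics import roc_auc_curve)

Below is a simple example. Note that the predictions passed in can be either probabilities or class labels.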

### Ground truth and predictions
import numpy as np
y_test = np.array([0,0,1,1])
y_pred1 = np.array([0.3,0.2,0.25,0.7])
y_pred2 = np.array([0,0,1,0])
### AUC as the performance metric
from sklearn.metrics import roc_auc_score
# predictions are probabilities
auc_score1 = roc_auc_score(y_test,y_pred1)
print(auc_score1)  # 0.75
# predictions are class labels
auc_score2 = roc_auc_score(y_test,y_pred2)
print(auc_score2)  # 0.75
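To see where these scores come from, AUC can be read as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The sketch below computes that directly with NumPy; `pairwise_auc` is a hypothetical helper for illustration, not sklearn's implementation, and it assumes binary labels 0 and 1.

```python
import numpy as np


def pairwise_auc(y_true, y_score):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive score minus every negative score
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))


y_test = np.array([0, 0, 1, 1])
print(pairwise_auc(y_test, [0.3, 0.2, 0.25, 0.7]))  # 0.75
print(pairwise_auc(y_test, [0, 0, 1, 0]))           # 0.75
```

With probabilities, three of the four pairs are ordered correctly. With class labels, two pairs are correct and two are ties, which again gives 3 out of 4.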


Published by 王茂南 on August 17, 2019, 07:32:37
When reposting, please keep the link to this article: https://mathpretty.com/10980.html
