Commit cbc4c641 by 20200116044

LogisticRegression-ML homework.ipynb

parent bc33f884
{
"cells": [
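{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Logistic Regression\n",
"This assignment practices using logistic regression to classify text data. By completing it, you will learn: 1. how to call logistic regression for classification; 2. how to classify text data; 3. how to evaluate model performance."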
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```Do not create a separate file; write everything in this notebook (after each TODO). Do not rename the existing functions (you may define new functions as needed).```"
]
},
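{
"cell_type": "markdown",
"metadata": {},
"source": [
"Logistic regression, also called logistic regression analysis, is a generalized linear regression model widely used in data mining, automated disease diagnosis, economic forecasting, and similar fields. A typical use is to explore the risk factors behind a disease and to predict the probability of the disease from those factors. Take gastric cancer as an example: select two groups of people, one with gastric cancer and one without; the two groups are expected to differ in physical signs and lifestyle. The dependent variable is whether a person has gastric cancer, with value \"yes\" or \"no\", and the independent variables can be many, such as age, sex, dietary habits, and Helicobacter pylori infection. Independent variables may be continuous or categorical. Logistic regression analysis then yields a weight for each independent variable, which gives a rough picture of which factors are risk factors for gastric cancer. The same weights can also be used to predict how likely a person is to develop the cancer given their risk factors."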
]
},
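{
"cell_type": "markdown",
"metadata": {},
"source": [
"This project uses the following tools:\n",
"- ```sklearn```. Installation instructions: http://scikit-learn.org/stable/install.html sklearn provides a wide range of machine learning algorithms and data processing tools, including the bag-of-words model used in this project. \n",
"- ```pandas```, a data processing library: https://pandas.pydata.org/pandas-docs/stable/\n",
"- ```matplotlib```, a plotting library for drawing charts; this assignment uses it to visualize the model evaluation metrics: www.matplotlib.org"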
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Reading the file\n",
"Read in the text data and explore it."
]
},
{
"cell_type": "code",
"execution_count": 132,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of spam messages: label 747\n",
"content 747\n",
"dtype: int64\n",
"Number of ham messages: label 4825\n",
"content 4825\n",
"dtype: int64\n"
]
}
],
"source": [
"# Import the libraries we need\n",
"import pandas as pd\n",
"# Read the spam data and count the spam and ham messages\n",
"## TODO: use pandas' read_csv() function to read the spam CSV file\n",
"smsDir = './SMSSpamCollection.csv'\n",
"df = pd.read_csv(smsDir)\n",
"\n",
"# Explore the data\n",
"#print(df.head())\n",
"# DataFrame.count() returns the non-null count of every column, so each print shows one row per column\n",
"print(\"Number of spam messages: %s\" % df[df['label']=='spam'].count())\n",
"print(\"Number of ham messages: %s\" % df[df['label']=='ham'].count())\n",
"#print(df['content'])\n",
"#df[df['label']=='ham'] = '1'\n",
"#df[df['label']=='spam'] = '0'"
]
},
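{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Preparing the training data\n",
"Split the data into training data, test data, training labels, and test labels, and convert the text into numeric features.\n",
"The data for this task is a spam classification set with two columns: the first column is the label (ham for non-spam, spam for spam) and the second holds the message to classify, which is English text."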
]
},
{
"cell_type": "code",
"execution_count": 133,
"metadata": {},
"outputs": [],
"source": [
"# Import train_test_split (splits data into training and test sets) and cross_val_score (cross-validated scoring) from sklearn\n",
"from sklearn.model_selection import train_test_split,cross_val_score\n",
"\n",
"# Convert the raw CSV columns to Unicode string arrays\n",
"y = df['label'].values.astype('U')\n",
"x = df['content'].values.astype('U')\n",
"## TODO: use train_test_split() to split the data into training and test sets\n",
"X_train_raw,X_test_raw,y_train,y_test = train_test_split(x, y)"
]
},
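{
"cell_type": "markdown",
"metadata": {},
"source": [
"TF-IDF is a statistical measure of how important a word is to one document within a collection or corpus. A word's importance increases in proportion to the number of times it appears in the document, but is offset by how frequently the word occurs across the corpus. Variants of TF-IDF weighting are often used by search engines to score or rank how relevant a document is to a user query. Beyond TF-IDF, web search engines also use ranking methods based on link analysis to decide the order in which documents appear in search results.\n",
"For details, see Baidu Baike: https://baike.baidu.com/item/tf-idf/8816134?fr=aladdin"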
]
},
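{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of the weighting described above, one common textbook form of TF-IDF is given below. The symbols $t$, $d$, $N$, $\\mathrm{tf}$ and $\\mathrm{df}$ are notation introduced here for illustration and do not come from the assignment; the exact formula also varies between implementations (scikit-learn's `TfidfVectorizer`, for example, smooths the IDF term and L2-normalizes each document vector by default).\n",
"\n",
"$$\\mathrm{tfidf}(t, d) = \\mathrm{tf}(t, d) \\cdot \\mathrm{idf}(t), \\qquad \\mathrm{idf}(t) = \\log\\frac{N}{\\mathrm{df}(t)}$$\n",
"\n",
"Here $\\mathrm{tf}(t, d)$ is the number of times term $t$ occurs in document $d$, $N$ is the number of documents in the corpus, and $\\mathrm{df}(t)$ is the number of documents that contain $t$. Under this form a word that appears in every document has $\\mathrm{idf}(t) = \\log 1 = 0$ and contributes nothing, while a word confined to a few documents is weighted up."
]
},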
{
"cell_type": "code",
"execution_count": 134,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['ham' 'ham' 'ham' ... 'ham' 'spam' 'ham']\n"
]
}
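],
"source": [
"# Import the TF-IDF text feature extractor from sklearn\n",
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"\n",
"# Models cannot operate on raw text directly, so the text must be converted to numbers\n",
"## TODO: use TfidfVectorizer from sklearn.feature_extraction.text to convert the text into TF-IDF features\n",
"vectorizer = TfidfVectorizer()\n",
"# Learn the vocabulary and IDF weights from the training text only, then reuse them on the test text\n",
"X_train = vectorizer.fit_transform(X_train_raw)\n",
"X_test = vectorizer.transform(X_test_raw)\n",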
"\n",
"print(y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Training the model"
]
},
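{
"cell_type": "markdown",
"metadata": {},
"source": [
"Logistic regression is a generalized linear model, so it has much in common with multiple linear regression. Both models are built on the same linear form w'x + b, where w and b are the parameters to be estimated; they differ in the dependent variable. Multiple linear regression uses w'x + b directly as the dependent variable, y = w'x + b, whereas logistic regression passes w'x + b through a function L to obtain a latent value p = L(w'x + b), and then decides the value of the dependent variable by comparing p with 1 - p. When L is the logistic function, the model is logistic regression; choosing a different L gives a different model (for example, the standard normal CDF gives probit regression).\n",
"The dependent variable in logistic regression can be binary or multi-class. The binary case is more common and easier to interpret; multi-class problems can be handled with softmax. In practice, binary logistic regression is the most widely used form."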
]
},
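{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the role of the function L concrete, the logistic function and the resulting decision rule can be written out as follows. This is the standard formulation, added here as an illustration; the symbol $z$ is shorthand introduced for the linear term w'x + b and is not part of the assignment text.\n",
"\n",
"$$p = L(z) = \\frac{1}{1 + e^{-z}}, \\qquad z = w^\\top x + b$$\n",
"\n",
"$$p > 1 - p \\iff p > \\tfrac{1}{2} \\iff z > 0$$\n",
"\n",
"So comparing $p$ with $1 - p$ amounts to checking the sign of the linear term. For example, $z = 0$ gives $p = 0.5$, and $z = 2$ gives $p = 1 / (1 + e^{-2}) \\approx 0.88$, which would be assigned to the positive class."
]
},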
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Train on X_train and y_train, then test on X_test and y_test."
]
},
{
"cell_type": "code",
"execution_count": 135,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Predicted: ham, message: Hi hope u get this txt~journey hasnt been gdnow about 50 mins late I think.\n",
"Predicted: ham, message: Super msg da:)nalla timing.\n",
"Predicted: ham, message: Eat at old airport road... But now 630 oredi... Got a lot of pple...\n",
"Predicted: ham, message: Some are lasting as much as 2 hours. You might get lucky.\n",
"Predicted: ham, message: K..k.:)congratulation ..\n"
]
}
],
"source": [
"# Import the logistic regression estimator from sklearn\n",
"# (the public path; the private sklearn.linear_model.logistic module was removed in newer releases)\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"LR = LogisticRegression()\n",
"## TODO: train the model with LR.fit(); the first argument is the training feature data and the second is the training labels\n",
"LR.fit(X_train, y_train)\n",
"## TODO: predict with LR.predict(); the argument is the feature data to predict on\n",
"predictions = LR.predict(X_test)\n",
"# Print the first few predictions\n",
"for i, prediction in enumerate(predictions[:5]):\n",
"    print(\"Predicted: %s, message: %s\" % (prediction, X_test_raw[i]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Evaluating the model\n",
"Once the model is trained, measure its performance with binary classification metrics and the ROC curve."
]
},
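{
"cell_type": "code",
"execution_count": 136,
"metadata": {},
"outputs": [],
"source": [
"# Import matplotlib for plotting\n",
"import matplotlib\n",
"# Use the SimHei font so Chinese characters in figure text render correctly (assumes SimHei is installed)\n",
"matplotlib.rcParams['font.sans-serif']=[u'simHei']\n",
"# Keep the minus sign rendering correctly with this font\n",
"matplotlib.rcParams['axes.unicode_minus']=False"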
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.1 Confusion matrix"
]
},
{
"cell_type": "code",
"execution_count": 137,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAQsAAAD0CAYAAACM5gMqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAFVRJREFUeJzt3XuQHWWZx/HvL0PCJUEIO1wFjMHgiqsBjJAoYMISAWFRKFhRZEtZCl0tlb25Iljuuiwi660KFjBUQBYtIKKgKEjES23AICaCoKKiLnFBUUJiQlAumXn2j7fHHE5m5rw9OZ0+PfP7VHXNOW/3dL9zMvPkeS/dryICM7NOJtVdATNrBgcLM8viYGFmWRwszCyLg4WZZXGwMLMsDhY9RNLeJY6dI2n/4nW/pAsl7Ve8XyjpK5Kmthw/XdJ1krbdgvrNKK7TN9ZzWHNtU3cF7Dk+JemPwFnAw8ADbfsPBqZHxNPAi4BLJL0duAX4IfDfRTCYCrwuIp5s+d6jgGeK70XSdOCXwC9ajpkK/ANwJ/AYcD9wIHBQRNwP/CXwwogY6OLPbA3hYNFDIuJkSf8IzAbWAR9rO+QS4Jni2OskPQz8EXg+sD8wAFwAHA4skXRJRFxdfO9pwKGSfkn6d/9n4MmImNNeD0nbAb8BjgO+ALxE0hLghcBPJa0ABOwIHBkRD3frM7De5WDRIyQdApwMXBgRayQF8FTbYRERIWknYP+IuEPSZOCjwHJSprBHRPy9pD2BGcW5X0YKJq8BFgELi/N9fITqbCy+vh64JiKWSHoA+BTwXuB5EfGdLf+pJ66jF0yNx9fkJWgr73v6tog4puIqdeRg0TtWAicAt0t6BbATcH7bMf1FM2M/4FpJ3wK+BhwBzAUOAB6W9Obi+G0lfQb4CnAGMAgQEc8CSNpN0r3FsdOA+yPixJbrzQRWStqGlOWcA7yquL6DxRZYvWaA796W10U1ec9f9FdcnSwOFj2i6Ac4T9Iniuzh6IhYCSBpJrAuIh4vDv++pJcDR0fEjcAXJP07EMBHWk57f0SskvRaYB4pAM2UdBFwO/C7iDiwuMZRwOlt1TofuIOU4SwgZSW7pMN1BPDbiHhDtz+LiSEYiMG6K1GKR0N6z67F12sAJE0B3gfMk3Rka9YA3FccczJwLnAzsHexvZNNzY3HgO+ROk0fAb5J6pMYVURsAH5KCkKvIAWcDwOfi4h5DhRjF8AgkbX1CmcWPUTSNODmohnSL+kO4EHgSeBfgEdJ/RMAbwCOl3Q/8CbSaMhJFB2gwL7AEoCIuAe4R9IBwO7APRHx26IZsqKlCkvb6nMsMIeUUfw4IgYkte6fDAx6dGRsBmlWZuFg0Vv+CVgcEU9I+lVEHKb013kxsCgirilGKgBOBK4mDX1eAlwHvHVoZELSc/o7JE0C3grcA3y9aEb8brjRkBY7kIZS5wIfl7QR6Cf1hRwFTCEFsVu78LNPKEEw0LDHQzhY9AhJs4C3AQdI2gXYTdIngd2AtcBvi0OvknQpaXj0lJZ5E5OAL0pqzSw+2HKJj5CGY08BLgMuGqYOk4HJpCyZiPhCsetLwH8Ux5wJvCgi3t+Nn3si66UmRg4Hi94xA/jPiHhS0gLSnIl7gOuBM4HLJT0C7EXKIq4dChSF3YBjIuLhorlxMfADAElvBN4CHBoRg5LOLo4/oGjqDJlEmldxxSj13LbYbAsEMNCwYCE/Kav3FFnC5LZgsCXnE7BnRPy6G+fbGiTtDtwQEYfXXZcqzJ49JW67JW9EdM+9f7OyQ3Nxq/BoSEmSFktaLum8qq4REYPdChTF+aJhgWI6qT9maqdjm2wwc+sVDhYlSDoJ6IuIeaT5CrPqrtM4NQC8EVhfd0WqEgQDmVuvcJ9FOfMphiNJw4yHkYY2rYsiYj1A6zDtuBMw0DtxIIszi3KmkiY1AawhzVkwKy1NympWM8SZRTkbgO2L19NwsL
UxEwM0K3PyL3s5K0lND0i3kT9UX1WsyQIYjLytVzizKOcmYJmkvYBjSTMbrSIRMb/uOlQlgGca9n91s2pbs6LjbT5wF7AgItbVWyNrssFQ1tYrnFmUFBFr2TQiYjYmaQZn7wSCHA4WZjUIxEDDEvtm1bZHSDqr7jqMdxPhM25aM8TBYmzG/S9yDxjXn/FQMyRn6xVuhpjVQgxEs/6vrj1Y9O/SFzP2mVx3NUrZ9/nbMGf2dj00Aj66n923Q91VKG07duB52qUxnzHAE6xdHRG7dj4yZRbP0qy1mmoPFjP2mczdt+1TdzXGtaP3OrDuKkwIt8cNq3KPjXBmYWaZBnuoPyKHg4VZDVIHpzMLM+vIzRAzy5BuUXewMLMOAvFMeDTEzDIMuhliZp24g9PMsgRioIfu+8jhYGFWk6Z1cDartmbjRAQMxKSsLYek3SUtK15PlnSzpDslnVGmbDQOFma1EIOZW8czbb4o07uBlRHxauBkSTuWKBuRg4VZDQJ4JrbJ2jK0L8o0n01Pc/sfYE6JshG5z8KsBkGpB9v0S1rR8n5RRCz607k2X5RpuPVtcstG5GBhVpMSQ6erSy6MPLS+zTrS+jYbSpSNyM0QsxqkdUMmZW1jMNz6NrllI3JmYVaLSh+ZdzVwi6TDgQOA75KaGzllI3JmYVaDKjKLoUWZImIVsBC4EzgqIgZyy0Y7vzMLs5pU+TDeiPg1bevb5JaNxMHCrAYR4tnBZv35Nau2ZuNEep6F7w0xs478pCwzy5A6OJ1ZmFkGP8/CzDoqOd27JzhYmNWkac+zcLAwq0EEPDvoYGFmHaRmiIOFmWWocgZnFRwszGrgoVMzy+RmiJll8nRvM+soPd3bwcLMOgjExkGvdWpmGdwMMbOOPBpiZtk8GmJmnYVvJDOzDH5Slpllc2ZhZh0FsNF3nZpZJ018+E1loU3SYknLJZ1X1TXMmmwQZW29opJgIekkoC8i5gEzJc2q4jpmjRWpzyJn6xVVZRbz2bTK0VI2Lb4KgKSzJK2QtOKxx0ddMc1sXBqalOVgAVNJi64CrAF2b90ZEYsiYk5EzNn1z5o1P96sW5oWLKrq4NwAbF+8noYXYDZ7jkAMNGw0pKrarmRT02M28FBF1zFrrKZ1cFaVWdwELJO0F3AsMLei65g1UkT3JmVJmg58DtgNWBkRb5e0GDgA+GpEnF8ct1lZGZVkFhGxntTJeRewICLWVXEdsyaLUNaW4XTgcxExB9hR0vtoG43sxghlZZOyImItm0ZEzOw5SnVe9kta0fJ+UUQsann/OPAXknYG9gHWsflo5EHDlD1YpsaewWlWk8ysAWB1kTWM5A7gOOA9wAPAFJ47Gnkwm49QHly2vs3qjjUbJ7o8z+JDwDsi4sPAT4A3s/lo5BaPUDpYmNWheGBvzpZhOvAySX3AocCFbD4aucUjlG6GmNUgKNUM6eQjwFXAC4DlwCfZfDQyhikrxcHCrBbdm50ZEXcDL33O2aX5wELgoqHRyOHKynCwMKtJRJXn3nw0cktHKB0szGrSxWbIVuFgYVaDCAcLM8vUS3eU5nCwMKvJ4KCDhZl1EGTf99EzHCzMalLhYEglHCzM6uAOTjPL1rDUwsHCrCbOLMwsS5UzOKvgYGFWgwiIhj2w18HCrCbOLMwsj4OFmXXmSVlmlsuZhZl15ElZZpbNmYWZZXFmYWZZGpZZjDorRNIkSVNH2ffX1VTLbJwLUmaRs/WITpnFDOBkSd8jrU3QSqQ1Fr1EodkYjLdJWRuBAeCDwDJgd+AI4PukdRIb9uOa9ZCG/fWMGCwkbQOcD+wI7Al8FZgFvBi4G7gTeMVWqKPZ+NRDTYwcne5kWQY803ZctH01s7ICNJi39YoRM4uI2ChpKbATsCtwMWlh1T2L7c3A77ZGJc3Gn97qvMzRqc9iX+DeiPhY+w5Jk0hNEzMbi4bl5qP1WWwLfAB4StKRwxwyCXikqoqZjXvjJVhExNPAsZJmAhcALwfOBh
4vDhGwbeU1NBuvxkuwGBIRvwROlXQy8KuI+En11TIb54YmZXWRpEuBWyPiZkmLgQOAr0bE+cX+zcrKyH6uV0TcEBE/kfTqlso5szAbI0XelnUu6XBgjyJQnAT0RcQ8YKakWcOVla1vx2Ah6UFJK1qKLijKTwQ+VPaCZlaIzK0DSZOBK4CHJL0emM+mmdVLgcNGKCsl50ayhyJiYcv7JyX1AecAx5W9YLsHfziNY2e9uvOBNmaTZu9bdxUmhnvLHZ6bNQD9bf9hL4qIRS3v/wb4MXAR8G7gXcDiYt8a4GBgKpsGJIbKSskJFiHppaR7Q35WlL0F+FJEPFb2gmZWyO+zWB0Rc0bZfxApgDwq6bPAq0hzogCmkVoQG4YpK2XEb5A0WdKbSNO9XwKcAlwCvJJ0j8gny17MzAq5TZC87OPnwMzi9RzSDaBDzYzZwEPAymHKShkts+gHFgIbI+IGSS+PiPdKuhXYGXgPcGHZC5pZoXtDp4uBKyWdCkwm9U98WdJewLHA3OJqy9rKShkxs4iI30TEGaRJWYcA20k6HlBEfAA4XtJuZS9oZkm3RkMi4omIOCUijoiIeRGxihQw7gIWRMS6iFjfXla2vjntliD1VXyGdD/I0K0ti4FTy17QzArda4ZsfuqItRGxJCIeHa2sjJxg8QLS3afrgX8ndY4A3EbqyzCzkjSe7jodEhEvbn0v6SJJZ0TElZLeW13VzMa5ht112ukZnPOKfoo/iYivAKdJ2hn4dJWVMxvXKmyGVKFTZjEJ6JP0A+Bp0s1jQWqavA34VrXVMxu/SkzK6gmd+iyGfpw1pGdX/B74BnAfsD/w2eqqZjbOjbPM4q+A/2PzqkdE/F2VFTMb10rcJNYrRpvBOYk0n/yEoaK2/cOuJ2JmmRqWWYw2KWsQuB64bKio5auAyyX1V1s9s/GraUOnuTeTPI80RXRHYAHpqVmfBt5RUb3MrMd06rPoA6a03/Em6ZsRcUfx9CwzG4seamLk6BQs7qStr6JwBUBEnN31GplNBA3s4Bw1WETEwAjl11ZTHbMJZDwFCzOrkIOFmXUixlkzxMwqEr01LJrDwcKsLs4szCyLg4WZ5XCfhZnlcbAws4567CaxHA4WZjXxaIiZZXGfhZnlcbAws47cZ2FmOcTwt3P3MgcLs7o4szCzHO7gNLM8Hjo1s44a+KSs3Af2mlm3dXEpAEm7S7qneL1Y0nJJ57Xs36ysLAcLs5oo8rZMHwO2l3QS0BcR84CZkmYNVzaW+jpYmNUlP7Pol7SiZTur9TSSjgSeBB4F5gNLil1LgcNGKCvNfRZmNSmRNaxuX47jT+eQpgAfBE4EbiKtIvhIsXsNcPAIZaU5WJjVoXszON8PXBoRv5cEsAHYvtg3jdR6GK6sNDdDzGogurZ84VHAuyR9GziQtJj5UDNjNvAQsHKYstKcWZjVpQuZRUQcMfS6CBgnAMsk7UVacnRucaX2stIqyyyKoZxlVZ3frOkUkbXlioj5EbGe1KF5F7AgItYNVzaW+laSWUiaDlxN6lgxs3YV3nUaEWvZNPoxYllZVWUWA8AbgfUVnd+s8bo8z6JylWQWRdpD0Tu7mWKc+CyA7eTkwyaoHgoEOWrp4IyIRcAigJ36+hv2kZl1Ry9lDTk8GmJWBy9faGbZnFlsEhHzqzy/WVN5FXUzy1diDkUvcLAwq4kzCzPrzEsBmFkuj4aYWRYHCzPrLHAHp5nlcQenmeVxsDCzTjwpy8zyRLjPwszyeDTEzLK4GWJmnQUw2Kxo4WBhVpdmxQoHC7O6uBliZnk8GmJmOZxZmFlHCpA7OM0si+dZmFmOMksT9gIHC7M6+ElZZpbH94aYWaamjYZUtTCymXUydOdpp60DSTtJulXSUkk3SpoiabGk5ZLOazlus7IyHCzM6hCggcjaMpwGfCIiXgs8CpwK9EXEPGCmpFmSTmovK1tlN0PM6pLfDOmXtKLl/aJicfF0mohLW/btCrwF+FTxfilwGH
AQsKSt7MEy1XWwMKtJiaHT1RExp+P5pHnAdOAh4JGieA1wMDB1mLJS3Awxq0uX+iwAJO0CXAycAWwAti92TSP9nQ9XVoqDhVkdgjSDM2frQNIU4PPAORGxClhJamYAzCZlGsOVleJmiFkNRHRzBuffkpoV50o6F7gKOF3SXsCxwFxSeFrWVlaKg4VZXboULCLiMuCy1jJJXwYWAhdFxLqibH57WRkOFmZ1CCBvWHRsp49Yy6bRjxHLynCwMKuJbyQzszwOFmbWmW8kM7McXkXdzLL5SVlmlsMdnGbWWQADzUotHCzMauEOztLWDz6+eumGq1fVXY+S+oHVdVci2711V2BMmvUZJy8odbSDRTkRsWvddShL0oqcW4Zt7CbEZ+xgYWYdeRV1M8sTEO7gnAgWdT6kepImAwMR6bdO0jak0fupEfHECN8zE1hb3FSEpO0i4qmW8xERz26N+nfQE59xZRo4GuKH34xB6/MPtyZJh0v6uqSbJT1Ceo7BlyQ9Lukm4CbgVcDtkuZL+rykz0i6XtJBxWnOID2PcchNkl4jaQbwNuBKSTMk7VcEn1rU9RlvVV18UtbW4MyiQSJimaSPAscAV0bEjcDlkm6LiDcMHSfpdaRnMQ4A55Ke/twvaSnwHYq5g5L2A54GtgVOAV5ZvD6Z9LvxX8CwGYp1QQ8FghzOLJrnD8ChEXGjpLmS7gZWSbpc0n2S5gKHRMTPi+MvB3YGngWeaTvXBcADwO3A60gZx58DxwPfG6kpY92QmVX0UEBxZtEgkk4Dzkov9W3ga8AtpIexLgf2Bn4EfFHSULAYANYPc65TSM9i/N+IGJQ0FTi92H0cKTOxqgQw6D4Lq861wHzg98DdwK+L8j0oJjAV2cAJpAeyCpgMbCxet/oRcHbL++2BFxXbblVU3to4s7CqtIx6AJxDekjrTGAf4FdsCgivB/YnBYkdSf0OQ4Fj6Fw/lrRDy+n3BM4sXu8BfL2qn8MKPRQIcjhYNFREDEj6A7AKOILUUblc0iTgPaROygOBk4AXAleQMsnDhj8jq0mjKQCHVFh1A4ggBgbqrkUpboY0jFJaIYCI+BEpc/gGcE3x9UzSiMcTwIeBfwWeAt4B/JTUgTn0WypgkqQ+YB1wR7H9rLhW39b4mSaswcjbeoQziwYpFpP5DnBt8Yd8CSngvxPYAbieFByWkPod/i0iHpZ0AamZsTvwfVJ/B6Rh0n5SJ+ljxfcOeSXp9+O6Sn+oiaxhzRBFwypsm0h6fkQ80vJ+B+DpiGhWfjsB7dTXH/OmnZB17G3rr1rZCzfVObNosNZAUbz/Q111sTFo2H/UDhZmNYmGzbNwsDCrRW/NocjhYGFWhwAaNnTqYGFWgwCih4ZFczhYmNUh/PAbM8vUtMzC8yzMaiDpa6QJcTlWR8QxVdYnh4OFmWXxvSFmlsXBwsyyOFiYWRYHCzPL4mBhZlkcLMwsi4OFmWVxsDCzLA4WZpbl/wGCT5CMagtNrgAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 288x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Binary classification metrics\n",
"from sklearn.metrics import confusion_matrix\n",
"import matplotlib.pyplot as plt\n",
"# Compute the confusion matrix of predictions against y_test\n",
"## TODO: compute the confusion matrix with confusion_matrix and display it with matplotlib\n",
"# Store the result under a different name so the imported function is not shadowed\n",
"cm = confusion_matrix(y_test, predictions)\n",
"\n",
"# Add figure annotations\n",
"plt.matshow(cm)\n",
"plt.title(\"Confusion matrix\")\n",
"plt.colorbar()\n",
"plt.ylabel(\"True label\")\n",
"plt.xlabel(\"Predicted label\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.2 Precision, recall, and F1-score"
]
},
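{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the three metrics in this section can be written in terms of confusion-matrix counts. This is the standard definition, included as an illustration; the symbols $TP$, $FP$ and $FN$ (true positives, false positives, false negatives) are notation introduced here and assume that one class, for instance spam, is treated as the positive class.\n",
"\n",
"$$\\mathrm{precision} = \\frac{TP}{TP + FP}, \\qquad \\mathrm{recall} = \\frac{TP}{TP + FN}, \\qquad F_1 = \\frac{2 \\cdot \\mathrm{precision} \\cdot \\mathrm{recall}}{\\mathrm{precision} + \\mathrm{recall}}$$\n",
"\n",
"As a quick check against the classification report in this section, the spam row lists precision 0.99 and recall 0.81, which gives $F_1 = 2 \\cdot 0.99 \\cdot 0.81 / (0.99 + 0.81) \\approx 0.89$, in line with the reported f1-score."
]
},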
{
"cell_type": "code",
"execution_count": 138,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" ham 0.97 1.00 0.98 1190\n",
" spam 0.99 0.81 0.89 203\n",
"\n",
"avg / total 0.97 0.97 0.97 1393\n",
"\n"
]
}
],
"source": [
"# Compute precision, recall, and f1-score automatically\n",
"from sklearn.metrics import classification_report\n",
"print(classification_report(y_test,predictions))"
]
},
{
"cell_type": "code",
"execution_count": 139,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Precision per fold: [1. 0.98780488 0.97297297 1. 1. ]\n",
"Recall per fold: [0.59633028 0.74311927 0.66055046 0.64220183 0.61111111]\n",
"F1 per fold: [0.74712644 0.84816754 0.78688525 0.78212291 0.75862069]\n"
]
}
],
"source": [
"## TODO: compute precision, recall, and f1-score manually\n",
"# Encode the string labels as integers (ham -> 0, spam -> 1) so that the\n",
"# 'precision', 'recall' and 'f1' scorers treat spam as the positive class\n",
"from sklearn.preprocessing import LabelEncoder\n",
"label_int = LabelEncoder()\n",
"y_train = label_int.fit_transform(y_train)\n",
"# Reuse the encoder fitted on the training labels instead of refitting it on the test labels\n",
"y_test = label_int.transform(y_test)\n",
"# Precision on each of the 5 cross-validation folds of the training set\n",
"precision = cross_val_score(LR, X_train, y_train, cv = 5, scoring='precision')\n",
"print(\"Precision per fold:\", precision)\n",
"# Recall on each fold\n",
"recall = cross_val_score(LR, X_train, y_train, cv = 5, scoring=\"recall\")\n",
"print(\"Recall per fold:\", recall)\n",
"# F1 score on each fold\n",
"f1 = cross_val_score(LR, X_train, y_train, cv = 5, scoring=\"f1\")\n",
"print(\"F1 per fold:\", f1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.3 Plotting the ROC curve"
]
},
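{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two axes of a ROC curve can be stated explicitly. These are the standard definitions, added as an illustration; $TP$, $FP$, $TN$ and $FN$ are confusion-matrix counts at one decision threshold, and the notation is introduced here rather than taken from the assignment.\n",
"\n",
"$$\\mathrm{TPR} = \\frac{TP}{TP + FN}, \\qquad \\mathrm{FPR} = \\frac{FP}{FP + TN}$$\n",
"\n",
"Sweeping the threshold on the predicted probability from high to low traces the curve from $(0, 0)$ to $(1, 1)$. The area under it (AUC) is 0.5 for a classifier on the diagonal, which is no better than chance, and 1.0 for one that ranks every positive example above every negative one."
]
},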
{
"cell_type": "code",
"execution_count": 141,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAETCAYAAADd6corAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzt3Xm8leP+//HXp6TdsFGOmWM+oSjKEEkTJRFOyFhEwjEcx3jM9OvQIcd8ihwcMlPGikhFhh0VEscQ8lUaEe3dsD+/P65728tuT+32ve611n4/H4/92Gvd61prfdZd+/6s67ru63ObuyMiIpKqXtIBiIhI5lFyEBGRNSg5iIjIGpQcRERkDUoOIiKyBiUHkRiYma3r882sSW3FU/a1M/G1JLMoOcg6MbPGZrapmW1vZm3MrJeZXWBm/zGzDaM261fw3FPMbFcz62pmFf5fNLPPzOxPVcQx1cw2iW5faWYfmtnE6OfDctrnm9nbZrZFdP8NM+teyesfaGYvlNl2rpldXcFTBpnZ8MpiruB9VplZU2BH4PNK2vUws91S7vcxs4Mq249Ru+OAc9c2rkocYWaX1OLrSYZQcpAaMbNhZvYD8B7wGHA7cCrQDtgceAdoFjV/1Mz6lvMyXYFdgIHR81NfP8/M6kd3i4CVZtbAzOqZ2b/MbO8yr7UC+CXl9uXu3sndO0X3yzoccHf/3sxaAX8EXqvkIxcCP0WxrRcdhH+N4jIza5QS+xbAVcC9ZnaxmX1pZrPNbI6Z/S9KdvPMbMty3me5uy+LPnNReYFE730LkJey+QCgDXCdmV1cwfP+CJwJ/NvM+pvZcjObH/3cXNILMLPDzexrM1toZoNTnr9dlIR/MLMRZtYAeA7Yz8z2rWTfSRZScpAacfcLgQuAe9y9C9AIeMndrwV2AvZ19zlR83OAC8ysqZmtl/IyhYADA4CJZd7iH8B7ZlZA+BY9BpgG/JNwYF8AYGbbmNlRQHPgcDPbNHr+EDObYmZTgNXlfIS/AvdEt/8O5ANTzawg+vnAzLpE7/EqsB5QHLU/AigArow+23vA81Hb5tHtG4Fp7v5Pd9/B3Xch9ATOcPc/ufvm7v5/5cRVEmtlq1NPAp4BCs2sa7TtF8L+vBGYb2b7lPO864GL3L0kWY5x982AlkBPQi+gGfBf4HigBXCCmfWI2j9E+CKwOWF/n+phFe35wHWVxCtZaL2qm4hU6B3gVuBOYCiwiZkdSOgN/HZwcvd5ZrY/cAZwtpmtjB7aCjgEWATkmdlzJQcud/9ryfPNbDpwJNCKkBw6u/s30cPbEg5kmwJ9gE/cfWgUT7nMrBehh/ODmR0Q3b4c2Nbdr4za1I9+7wC0JgzF7GVmN7j7VcAzZtYf2Nzdb0x5+bbAZ8CrwNtm1tHdy+0BpMSzhJA4ioCmUUJbH9g8up0H/NHdNzWznYGzCAfz+4CZZrYxsH+0P48FFgONgXdT3qNh9PneL/v+7r4wSoAtCMeET9z9reh5jwG9zGw2sAdh3xeb2VPAUcAId//WzH41s63c/bvKPqtkDyUHqREzm0o0bBQdOEpsThjGed/MFhMSx+nAN+7eHxie8hoXAavc/V8p27YCxgE/U/otujmhZ9EYmAeMisblz3X3ycAUM5tIGNa6wsweip5zOqF38D2wO7AJUB8YFr0OhOT2N2Djsh/R3VdFvYehwOuEntLVZvY48KfotczMLoze40F3HxYdaMcRhqlWmdkW7v59JbtzJXCMu88xs6Xu3sHMtgamRLd3At6I2vYAtgAmA/OBJ4B+wKfAZHe/Jhp2amtmG7v7ouh5OwCflPfmUXLpRBgK+xPwdcrD3wC9CIn5C3cv+TcZC8xIafcOYR8rOeQIJQepqe3cfYvKGpjZ9+4+ysxeA56Mto0jfAMtJBx0d0p9TvTNs1XUth6wJ/Av4E3g6pQhkdT3MaAB8P+APxCGRfYmDAPNB24GBk
f3z4hea5Po6T3cfXHUCzg96lUY4cB3CqEHsALYMtre1t2PK/P+s929dcqm84CDCQfyLYGpwDbRY8PN7Fdgtbu3i7YVU7WSYaa73f2OKEFdQzh4vwCcCBwd9YSaRtuvJPTKICTyJWVes7eZzYv2xS2E+YMr+f1cxwrCkOFGwLLfgnFfCixNabeE0jkmyQFKDlJTl5tZG+BFwgGk5ICyPjAp6iWcmNJ+NYC7d4++5fclHJSOKu/Fzawx4dvyNoReRDegm5nlAe+6+2lRu16E+YnNCRPbRcB2hIN7T8JE9zLCt/5iYCRh7PzRKJ7FKW97X8mwUoqvgb2As6PX+9bMjo5ul9jOzGYCN0XvNQj4NRp+KeT3B9sz3X1iOR/5STOrbFiphJvZZYRv6EWE+YK+wCxgX8J8yADCfMdnKc9bxJq9ozGEIblZwCvu7lHiSn2/hkQT79FtAKJJ/K7uflu06Q/AnHI+l2QpJQepEXd/ACCaEB1F+KbuhMnZkVGb3539Ew0FFaZsmgXsYWb1S4YrLJz+ehphjH8DoE/qwdTMOhG+/ZeYChwIjCYcyJ8FNiR8y28BdAd+JBxsh7v7vYSJ3Eo/n5mt5+6rote+gTCs9KK7X2bh1NzH3H1m1HY24UyheoRe0fGsOcFelaqGlUpebzPCBPCnUVx3uvtfojjOJsxHDIhiSPUlsGvZN40Swp2EifVXonbHpjTZFviKMCeyfcr2NoQzpEqSQ3vCHIjkCJ2tJOvE3WcTvoHeRRiaeCOaByirGTCBMHEK4dv4joQhni4W1hE0Jswr7EI4zXUKYRim5AyiAsKcxW9n8rj7omiIo0QPoD9hmOkTwvDKQ+7eLkoMlTndzKZHE+AzzKyZu09y94MIQzclVhFOz22REkexu69y9/fdfXoV71Nj7v59NJx3GGE+5U/w2xDc14SzsDoBM8s8byXwiZl1KOdlHwQOtHCq6+tACzPrEM1FHEc4++oDYKmZnRqdwnpS1JZoktzd/Yfa/rySHCUHWSdmlk/4FnsCYTjlA1vz/P3tCd9a7wUuIqwpOJ4wfPFvwqmkFwFHRwe/M939q+i5Z0YH9nbRGP2ZhF5ByfubmW1HmIBeTfjW+yRhMnpRyeua2bPRUEiJeqmvE7nP3dtEPy3dPXWM3krau3sxYV7BLZyaW97f0XrREFj9ch7DzNa30tN6jTCsNIXSYaVnKB1Weizled3M7DtgUvRZl5rZ7oTE+zVhHmA/wsT5aWXe9hrgH9G/2W+idRUPAYOiRHsS8Ajh3/Vhd38l+syHEXol30bvdV/0Ge4Arijvc0r20rCS1IiZXUs4wK8mnLnSjjD3cCrwl+iAPTGavJ0LHOnuL5nZNGBmylkvmNkZhDH81OEiCGPcw83s55Rt+cBvK56jYZE7gJcJcwp7A73c/avoG+5K4FDC5HLqGUMNCENNqfcbVPKRG5Iy5u7uE6JvzAuA+8tp3wC4kHB67dKo1wNQsthsfeBa4OnodY9JWRfyO9Gw0lvR3YnA3iVrJMysN6FXc4G7Pxvt97GEifA9U1/H3eeX/Lu5+wjggZTH/ppy+wV+31Mq2f458Lueh5kdBjzq7musQpfsZq4rwUkNmNlGhDNufq6kTb3oG2dVr2XAhmWGh7KCmZkn/EdkZuunnsUVzYls4O4LEwxLspySg4iIrCG2OQcz28zMypuYLHm8gZk9b2ZvljM2KiIiCYolOVioz/IgUFnJ4XMJ52IfAPQpO0kmIiLJiWtCejXhFLgxlbTpBFwW3Z5EmNB8PbWBmQ0kLGyiSZMmbXfZZZdaDzQbLFgAixdX3W5dLIvWvjZtGu/7iEh6bFo0lw1XLuADihe6+yZVP+P3YkkO7l5S2riyZk0orcOymLC4p+zrjABGALRr184LCgrKNondiBEwalTa3/Z3pk0Lvw86KN73OeEEGDgw3vcQkRh9/DEUFkLbtuEb39Kl2DbbfF31E9eU5Kmsywg1W34k1IJZVnnzeFR18H8jKn
cW94G5MgcdpAO3iFRixQq48UYYPBj23RcmTw7DAOswFJBkcphGOGf6KUJJ5LfTHcCIEXDmmeF2RQd/HZhFJKO99x4MGAAffgjHHw+33Vb1c6ohLckhKnu8m7vfmbL5QeAlC/X/dyOU/I1V2V5CSa9g+HAd/EUkC73xBnTpAltsAc89B4cfXmsvneg6h6jMQgdgnLv/WFnb6sw51GSISL0CEck6S5ZAs2awahUMHQrnnAMbblhuUzObllIevtqyZhFc2eRQXiKozvyAkoGIZK0ff4RLLoHRo8Pk8x/+UOVTapocsra20qhRIRmkJgLND4hIznrhBRg0CL7/Hi68EBo3jvXtsi45lPQYShLDxIlJRyQiEqOiIjj1VHj0UWjVCp55BvbZp+rnraOsK9k9ahRMn17aSxARyWnrrw/FxXDddWHRUxoSA2RhzwGgTRv1GEQkh82dC3/9KwwZAjvvHHoNVVy9sLZlVc9hxIjSSWcRkZxTXBwOdC1bwosvwowZYXuaEwNkWXIoOTtJw0kiknM+/xy6dg0rc9u2DYva+vRJLJysG1Y66CCdjSQiOejuu+H99+Hee8OK5wR6C6myqucgIpJTPvwQStZvXX89zJoFp5+eeGIAJQcRkfQrKoJrroG99goTzxCK5G21VbJxpVByEBFJp7ffDknh+uuhb9+w2jkDZd2cg4hI1po4MRTK22qrcDZSz55JR1Qh9RxEROK2aFH4feCB8I9/hLpIGZwYIIuSw4IFWuMgIllm6VI44wzYdVdYuBDq14dLL4UNNkg6siplTXIouYay1jiISFZ47rmwmO3++0NtpCZNko5orWTVnIPWOIhIxisqgn794PHHYY89YMwYaLfWFbMTlzU9BxGRrNCwIdSrBzfcENYwZGFiACUHEZF19803cPTR8Nln4f4jj8CVV0KDBsnGtQ6UHEREaqq4GO65J8wtjBsHH30UtmfACud1peQgIlITn30GnTrB2WdD+/bh9NSjj046qlqTVRPSIiIZY8SIUBvp/vuhf/+c6C2kUs9BRKS6ZsyA994Lt6+7LhTKO/XUnEsMoOQgIlK1oiK46qpw5tHf/ha2NWkCW2yRbFwxUnIQEanM1Kmw554weHBYhfvss0lHlBaacxARqUhJobxttoGXX4YePZKOKG3UcxARKWvhwvD7wAPhppvCKap1KDGAkoOISKklS+C000KhvAULQqG8iy+G/PykI0s7JQcREYBnnoHddoOHHgqVVOtgQkilOQcRqduKiuDEE+Hpp6FNG3jppTABXcep5yAidVvDhpCXB0OGwLvvKjFElBxEpO75+mvo3Rs+/TTc/+9/4fLLs7pQXm1TchCRuqO4GO68MxTKmzAhrHCGnFzhvK6UHESkbvj0U+jYEc49Fzp0CIXyjjoq6agyliakRaRuuPfe0FN44AE45RT1FqqgnoOI5K4PPoB33gm3Swrl9eunxFANsSUHMxtpZlPN7MoKHm9mZi+ZWYGZDY8rDhGpgwoLwwTz3nvDJZeEbU2awOabJxtXFoklOZjZ0UB9d28P7GBmO5fT7GTgEXdvB+SbWaUXWl22LIZARST3TJkCrVvDjTeG4aPRo5OOKCvF1XPoBDwR3R4PdCinzSKglZltBGwDfFu2gZkNjHoWBRAKIoqIVOj110M9pBUrYPz4cCGeZs2SjiorxZUcmgDfRbcXA5uV02YKsC1wHvBJ1O533H2Eu7dz93ZNm8LAgTFFKyLZbcGC8LtjR7jllnCFtoMPTjamLBdXclgGNIpuN63gfa4BBrn79cBs4NSYYhGRXLV4cZhg3m230kJ5F14ITZsmHVnWiys5TKN0KKk1MKecNs2A3c2sPrAv4DHFIiK56KmnQvXUUaNg0CDYYIOkI8opcSWH0cDJZjYMOBb42MwGl2nzD2AE8CPQHHg0plhEJJcUFcGf/wzHHANbbw0FBXDDDaFGktSaWBbBuftPZtYJOBgY6u7zgBll2rwLtIzj/UUkhzVsCI
0bh7OR/vY3WE9reeMQ2zoHd1/i7k9EiUFEpOa++gp69YLZs8P9hx6CSy9VYoiRVkiLSOZavRpuuw1atYJJk0qrqGqFc+yUHEQkM82aFdYsXHABHHRQKJTXu3fSUdUZ6pOJSGb6z3/gs8/g4YfDClj1FtLK3LPjDNL8/Hb+888FSYchInGaNg1WroT99oNffw11czbdNOmospqZTYvKFK0VDSuJSPKWLw8TzPvuG35DOCNJiSExSg4ikqxJk0KhvKFD4dRTYcyYpCMSNOcgIkl6/XXo0gW23x5efRW6dk06Iomo5yAi6Td/fvjdsSMMGxYK5SkxZBQlBxFJn4UL4aSTQqG8H34IhfL++tdwIR7JKEoOIhI/d3j88ZAUnngCzj0XNtww6aikEppzEJF4FRXBscfCc8+Fy3aOHAm77550VFIF9RxEJF4NG4arsd18M0ydqsSQJZQcRKT2ffkl9OxZWijvgQdCBdX69RMNS6pPyUFEas/q1XDrraFQ3ptvhvIXkpWUHESkdnz0Eey/f7hMZ9euoVDeEUckHZXUkCakRaR2PPhgGE4aNQr69lWhvCynwnsiUnPvvhuGktq3D4XyfvkFNtkk6agkhQrviUj6/PorXHRRSAp//3vY1rixEkMOUXIQkbXz+uvhdNRbboEzzoDRo5OOSGKgOQcRqb7XXguTzTvuGJJEp05JRyQxUc9BRKr2/ffhd6dO4ZrOM2cqMeQ4JQcRqdiCBeESna1ahUJ59erBeeeF+QXJaUoOIrIm93BK6q67wlNPwfnnw0YbJR2VpJHmHETk94qKoE8feOGFcNnOkSOhZcuko5I0U89BRH6vYcNwSuqwYaEEhhJDnaTkICLwv/9B9+4wa1a4f//94SI8KpRXZyk5iNRlq1bBP/8Je+wB77wDX3yRdESSITTnIFJXzZwJAwZAQQH07g133w1bbpl0VJIhlBxE6qpHHoFvvgmX7ezTR4Xy5HdUeE+kLnnnnVAob//9Q32k5cth442TjkpipMJ7IlKxX34J11lo3x6uuCJsa9xYiUEqpOQgkusmTAiF8m69Fc46C8aMSToiyQKacxDJZRMmQLdusPPO8MYb0LFj0hFJllDPQSQXffdd+N25M9xxB8yYocQga0XJQSSXzJ8Pxx4bhpHmzw+F8v7yF2jUKOnIJMvElhzMbKSZTTWzK6tod7eZHR5XHCJ1gjv897+w225hTuGii6B586SjkiwWS3Iws6OB+u7eHtjBzHauoN2BwObu/nwccYjUCYWFcNhhcMop0KIFTJ8eLt3ZoEHSkUkWi6vn0Al4Iro9HuhQtoGZNQDuBeaYWe/yXsTMBppZgZkVrFy5MqZQRbJcXl5Y2Xz77TB5ciizLbKO4koOTYBoRozFwGbltDkFmAUMBfYxs3PLNnD3Ee7ezt3bNdC3IJFSn30GBx9cWijvvvvg3HNVKE9qTVzJYRlQMgPWtIL32RMY4e7zgIeBzjHFIpI7Vq2Cm24KhfIKCuCrr5KOSHJUXMlhGqVDSa2BOeW0+RzYIbrdDvg6plhEcsOMGeHiO5ddBj17hl7DYYclHZXkqLgWwY0GJpvZlsChQF8zG+zuqWcujQTuN7O+QAOgT0yxiOSGUaPC+oWnnoI//znpaCTHxVZ4z8yaAQcDk6Kho3WiwntSJ731VjhN9YADQpG85ct1iqqslYwrvOfuS9z9idpIDCJ1zrJlcN550KEDXHVV2NaokRKDpI1WSItkmvHjoVUruPNOOOccFcqTRKjwnkgmmTAhXMu5RQuYNCn0HEQSoJ6DSCaYOzf87tw5XK5z+nQlBklUlcnBzBqWub+emZ0WX0gidci8eeESnXvsUVoo76yzwqpnkQRVmhzMrD4wycyus6A/8DfgqHQEJ5Kz3OHBB0OhvBdegEsu0WSzZJRKk4O7rwaWA18ARxJWNT8KrIo/NJEcVVgIhx4K/ftDy5Zhcdtll6lQnmSU6sw5OKFO0ktAM+DmaJuI1EReHvzxj+
FspDfeCJPPIhmmqmGl4wiJYBvgMWA4sD6wlZkda2YnxB+iSA6YPRu6di0tlDdiRDhNtZ7OCZHMVNX/zM2APxJqIO0MnAnkA3nAFsDWsUYnku1WroQhQ6B1a/jgA5gzJ+mIRKql0nUO7n67mR0FfAn8QqiHdD7wo7vflob4RLLX++/DgAHhtNRjjgnXct6svOr1IpmnOn3aesACoB/QHTg91ohEcsUTT4RTVZ95JtxWYpAsUtWcw3qE6zLsA3xFuLrb/6P0Wg0ikmrKlPADcM01YY7hKJ35LdmnqmGlVYTEUGK6mV0KqF6wSKqff4bLL4e77oIuXUIZjEaNwo9IFlrrUyXc/Sd3/08cwYhkpbFjQ6G8u++G889XoTzJCZX2HMzsTaAQKC77ENAQON/d348pNpHM9+qrYUHbrrvCm29C+/ZJRyRSK6qqyrrC3buW94CZvaLEIHWSeyiUt802YQjp3/8Oq50bNqzyqSLZoqphJQcws1vN7LXoZ1Qa4hLJTN9/Hy7RmVoo78wzlRgk51T3eg6t3L0LgJm9FmM8IpnJHf7zH7jwQigqguuvh403TjoqkdjoYj8iVSkshMMPD/MLHTvCfffBzjsnHZVIrKqbHL4zs+ej28vjCkYkI+XlwY47huGkgQNVD0nqhGolB3fvX97m2g1FJIPMmgVnnx3WLbRsGSadReqQqpLDttEcQ9nrN9QHtjazNu4+PZ7QRBKwYgXcdBMMHgz5+fDttyE5iNQxVa2Q3jFdgYgkrqAgFMqbORP69oXbboNNN006KpFEVLUI7mpgHvAt8K67LzKzYyntSTRw98djjlEkPZ5+GhYuDCucjzgi6WhEElXVzNoxhCGkjsBMM9sWuBHYHrgJ2Cre8ERi9sYbMHlyuH3NNfDxx0oMIlQ95zDP3e8BMLP6hGQw191vMbNe7j4s9ghF4vDTT3DppWGiuWvXcJpqXl74EZHqrZAGcPdL3P2tmOMRid+LL4ZJ5hEjwqI2FcoTWUNVyaGNmf3TzPZKSzQicXv1VejVCzbcEN56C265BZo0SToqkYxTVXL4HzAF+IeZ3V7mMa1zkOzgDl9/HW536QL33hsu4bnvvsnGJZLBqkoOy919DNCDsK7hVGBXM3sf+JOZfRB7hCLr4rvv4MgjoU2bcMnOevXg9NNh/fWTjkwko1U4IW1m9YCPANzdzewMQi9ic3dfnab4RGrGPdRAuugiWLkSbrgBNtkk6ahEskaFycHdi4ELUu4vMrPTlBgk4xUWwmGHwWuvQadOYRhpp52Sjkokq6xVVVZ3nxpXICK1Ji8PWrSA444LQ0gqlCey1vRXI7nho49COe0PPwz3775bFVRF1kFsfzlmNtLMpprZlVW020wT21JjK1bAddfBXnvBJ5/A//1f0hGJ5IRYkoOZHQ3Ud/f2wA5mVtmVUW4GGsURh+S4d9+Ftm3h2mvhmGNCme3u3ZOOSiQnxNVz6AQ8Ed0eD3Qor5GZdQF+IRT3K+/xgWZWYGYFK1eujCNOyWajR8OSJfD88/DIIzobSaQWxZUcmgDfRbcXA5uVbWBm6wNXAZdV9CLuPsLd27l7uwYNGsQSqGSZ118PxfIArr46FMrr1SvZmERyUFzJYRmlQ0VNK3ify4C73X1pTDFILvnxRzjzzLDCefDgsC0vL5TBEJFaF1dymEbpUFJrYE45bboB55jZREINp/tiikWy3fPPw267hUVtF1+sQnkiabBW6xzWwmhgspltCRwK9DWzwe7+25lL7t6x5LaZTXT302OKRbLZ+PHh+gq77x6SQrt2SUckUieYezz188ysGXAwMMndy51wXhv5+e38558L1j0wyXzuMGcObL89FBfDAw/ASSepHpJIDZjZNHdf629Vsa1zcPcl7v5EbSQGqUPmzg09hT33LC2Ud9ppSgwiaablo5IZioth+PAwt/Daa2Htgk5NFUlMXHMOItVXWAiHHgoTJ4ZLdo4YATvskHRUInWakoMkxx3Mwimpu+
0GJ54IAwaEbSKSKA0rSTJmzoQOHUoL5d11V6igqsQgkhGUHCS9iorCyua2beHzz8Oks4hkHA0rSfq8/XYYNpo1C04+GW69FTbeOOmoRKQcSg6SPi+8AD//DC+9FCagRSRjxbYIrrZpEVyWmjAB1lsPDjoonJW0YgVssEHSUYnUGRm3CE7quKVLwwRzt24wZEjYlpenxCCSJZQcpPaNGRNOTX3gAbjssnDdBRHJKppzkNo1fjwceSS0bh2qqbZtm3REIlID6jnIunOHL74It7t1Cz2G995TYhDJYkoOsm6++QZ69gyJoKRQXr9+oCv3iWQ1JQepmeLisKq5ZUuYPBluuEGF8kRyiOYcZO0VFsIhh4SkcPDBoVDedtslHZWI1CIlB6m+1EJ5bdqE6yz066d6SCI5SMNKUj0zZsD++4eCeQC33w79+ysxiOQoJQepXGEhXHlluHbzV1/BDz8kHZGIpIGGlaRib70VCuXNnh2Gj4YNg+bNk45KRNJAyUEq9tJL8OuvMHYsdO+edDQikkYqvCe/N358WKPQuXMYUlq5EvLzk45KRGpIhfdk3SxeDKeeGnoIN94YtuXlKTGI1FFKDgJPPx0K5f33v3DFFaFwnojUaZpzqOvGjYM+fWDPPcPcQps2SUckIhlAPYe6yD1cvxnCCueHHoJ331ViEJHfKDnUNXPmQI8evy+Ud/LJ4WptIiIRJYe6orgY7rgDWrUK6xeGDIFNN006KhHJUPq6WBcsXx6Gj958M5yNNHw4bLtt0lGJSAZTcshlJYXyGjUK5S8GDgxDSKqHJCJV0LBSrnr/fdhnn1AwD+Bf/4JTTlFiEJFqUXLINcuXw2WXhcQwdy4sWpR0RCKShZQccsnkyeF01JtuCuW0Z82CLl2SjkpEspDmHHLJK6/AihXhd7duSUcjIllMhfey3csvQ8OGoYdQVASrVkGTJklHJSIZQoX36ppFi8IEc8+eMHRo2NawoRKDiNSK2JKDmY00s6lmdmUFj29oZi+b2Xgze9bM1o8rlpziDk8+GQrlPfooXHWVCuWJSK2LJTmY2dFAfXdvD+xgZjuX0+xEYJi7HwLMA3rEEUvOGT8ejj0WttkGCgrg+utDj0FEpBbF1XPoBDwR3R5Bu9OSAAANtElEQVQPdCjbwN3vdvdXorubAGtcnNjMBppZgZkVrFy5MqZQs4A7fPZZuH3IIfDww/D229C6dbJxiUjOiis5NAG+i24vBjarqKGZtQeaufvbZR9z9xHu3s7d2zVo0CCeSDPdl1+G0hft2sH334dFbCeeqEJ5IhKruJLDMqBRdLtpRe9jZs2BO4DTYooje61eHVY17757KKc9dChsVmGOFRGpVXF9/ZxGGEp6G2gNfFq2QTQB/SRwubt/HVMc2Wn58nBq6ttvw2GHwT33hDkGEZE0iavnMBo42cyGAccCH5vZ4DJtBgB7AVeY2UQzOy6mWLJHyZqTRo2gfXt45BF4/nklBhFJu9gWwZlZM+BgYJK7z1vX18v5RXAFBTBoENx3n67IJiK1JuMWwbn7End/ojYSQ05bvhwuuQT23TdMOC9ZknREIiJaIZ2oN96APfaAf/4TBgwIhfI6d046KhERFd5L1IQJ4fKdEyaoeqqIZBQV3ku3F18MK5q7dVOhPBGJXcbNOUgZCxaExWu9esGwYWGbCuWJSIZScoibOzz2WCiU9+STcO21MHp00lGJiFRKcw5xGzcOjj8+XLZz5Eho1SrpiEREqqSeQxzcYfbscLt791Ba+623lBhEJGsoOdS2L76Arl1DT2HevFAor29fqF8/6chERKpNyaG2rF4dJpp33x2mTYNbblGhPBHJWppzqA3Ll4fFa++8A4cfHgrlbbVV0lGJiNSYeg7rIrVQXocOYW5hzBglBhHJekoONfXOO7DnnjB9erh/881hbsEs2bhERGqBksPa+vVX+NvfYP/9YdEi+PHHpCMSEal1Sg5r4/XXw4TzsGFw5pnw8cdw0EFJRy
UiUus0Ib023ngD6tWDiROVFEQkp6nwXlWeey5MOB98MKxYEQrlNW6c/jhEstTKlSuZO3cuhYWFSYeS0/Ly8th6661p0KDB77bXtPCeeg4V+eEHOO88ePxx6NkzJIf11w8/IlJtc+fOJT8/n+222w7TCRuxcHcWLVrE3Llz2X777WvlNTXnUJZ7uHbzbrvBM8/A9dfDs88mHZVI1iosLGTjjTdWYoiRmbHxxhvXau9MPYeyxo2Dk06C/fYL13Nu2TLpiESynhJD/Gp7H6vnAOFqbLNmhdvdu4ehpClTlBhEpM5Scvjf/0Lpi/32Ky2Ud+yxKpQnInVa3U0Oq1bB0KGwxx4wYwb8618qlCeSoxYvXkx+fv5vY/L9+/dnypQpAFx77bU8/PDDrF69moEDB3LggQfSr18/iouLY4nlk08+oXfv3pW2Wbp0KR07duSAAw7g5ZdfrnBbnOrmnMPy5dCxIxQUwJFHwl13wZZbJh2VSM674ILSijO1pU2b8N2uMq+88gqFhYVMmjSJQw45pNw2jz/+OEVFRUyePJlLL72U0aNHc/TRR6/Rrnfv3vyYUhnhhBNOYODAgdWK9YsvvuDiiy9m2bJllba7+uqrOe200zj55JPp1q0bPXr0KHdbnHM5dSs5uIdho0aNwlDSJZdAnz6qhySS48aOHcs555zD2LFjK0wO48aN47DDDgPguOOO45dffim33ZgxY2ocR35+Pk8//TTdu3evtN2kSZMYMmQI9evXp0WLFsyZM6fcbbV12mp56k5ymDoVBg2CBx4IBfOGDk06IpE6p6pv+HGZOnUqU6ZMoWvXrhW2mT9/Ps2bNwdgr732iiWOTTfdtFrt1ltvPZo2bQpA8+bNmT9/frnblBzWxS+/wBVXwO23w9Zbw88/Jx2RiKTRzJkzWbhwIX369GHOnDl8++23awzHmBkbbLDBb8M9o0ePZtmyZZx00klrvN66DCtVV/2UE2KWLVtGcXFxudvilNsT0hMmhOs233YbnH12KJTXsWPSUYlIGo0bN46///3vTJw4kfPOO49x48ax2Wab8eWXXwLw5Zdfsvnmm3PAAQfwyiuvAGGOYqONNir39caMGcPEiRN/+6ntxADQsmVLCgpCuaAZM2aw7bbblrstTrmdHCZPDuUuJk2CO++E/PykIxKRNBs3bhxdunQBoEuXLowdO5ZBgwYxfPhwDjzwQAoLC+ncuTMDBw5k8eLFdOjQgZ9++omePXumJb5Ro0bx1FNP/W7bWWedxYABAxg4cCD5+flstdVW5W6LU+4V3hs9OhTGO+SQUChv9eowAS0iifjkk0/Yddddkw4j63z++edMnz6dww8/nIYNG1a4LVV5+1qF9+bPh3PPhSefhF69QnJQkTyRjODuKqGxlnbaaSd22mmnKreVqO0v+tk/rOQODz0Eu+4aymsPGRIK5olIRsjLy2PRokW1fvCSUiVVWfPy8mrtNbO/5zB2LPTrFy7bOXIk7LJL0hGJSIqtt96auXPnsmDBgqRDyWkl13OoLdmZHEoK5bVqBT16wFNPwVFHhau0iUhGadCgQazn40s8su9o+umn4RKd7duXFsr785+VGEREalFsR1QzG2lmU83synVp81tbHG68EVq3DusV7rxThfJERGISS3Iws6OB+u7eHtjBzHauSZtU2xXOhssvD2cizZoV5hl09oOISCzimnPoBDwR3R4PdAD+t7ZtzGwgULL8sMjgI55+Gp5+OoaQs8ofgIVJB5EhtC9KaV+U0r4o1aImT4orOTQBvotuLwbKq2JVZRt3HwGMADCzgpos5MhF2heltC9KaV+U0r4oZWbVWD28prjmHJYBJcuSm1bwPtVpIyIiCYjrgDyNMEwE0BqYU8M2IiKSgLiGlUYDk81sS+BQoK+ZDXb3Kytps18VrzkinlCzkvZFKe2LUtoXpbQvStVoX8RWeM/MmgEHA5PcfV5N24iISPplTVVWERFJH00Ci4jIGpQcRERkDRmXHG
q77EY2q+pzmtmGZvaymY03s2fNLGcvYFHdf3Mz28zMPkhXXElYi31xt5kdnq64klCNv5FmZvaSmRWY2fB0x5dO0f/9yZU83sDMnjezN83stKpeL6OSQxxlN7JVNT/nicAwdz8EmAf0SGeM6bKW/+Y3U7p+JudUd1+Y2YHA5u7+fFoDTKNq7ouTgUeiBXH5ZpaTC+Oik3seJCwursi5wDR3PwDoY2aVXjc5o5ID5ZfUqEmbXNCJKj6nu9/t7q9EdzcBfkhPaGnXiWr8m5tZF+AXQqLMVZ2oYl+YWQPgXmCOmfVOX2hp14mq/18sAlqZ2UbANsC36Qkt7VYDxwE/VdKmE6X7axJQaaLMtORQtqRGeWVXq9MmF1T7c5pZe6CZu7+djsASUOW+iIbUrgIuS2NcSajO/4tTgFnAUGAfMzs3TbGlW3X2xRRgW+A84JOoXc5x95/c/ccqmq3VsTPTkoPKbpSq1uc0s+bAHUCVY4hZrDr74jLgbndfmraoklGdfbEnMCJaO/Qw0DlNsaVbdfbFNcAgd78emA2cmqbYMtFaHTsz7cCqshulqvyc0bflJ4HL3f3r9IWWdtX5N+8GnGNmE4E2ZnZfekJLu+rsi8+BHaLb7YBc/b9RnX3RDNjdzOoD+wJ1eWHX2h073T1jfoANgBnAMEIXsDUwuIo2GyYdd4L74ixgCTAx+jku6biT2hdl2k9MOuaE/1/kE740TAKmAlslHXeC+2If4GPCt+ZXgKZJxx3zPpkY/e4C/KXMY9tG++I24D3CZH6Fr5VxK6RVdqNUXfmc1aF9UUr7opT2xdqJatl1AMZ5FXMUGZccREQkeZk25yAiIhlAyUFERNag5CB1mpntXeb+ema21n8XZla/qhWnItlEyUHqrOhgfouZjTWzV81sNnAG8JqZzYva/GBmE81siZm1j9ql/mwTvdz2lF7vfEsz2yB6fIMy79kg5fb5ZtY35X5cF98SWWv6zyh12eHA48De7t7fzJ5y93uAe8xsdNTmPeBI4FlgFVDg7pcBRGspSv6GCqNtWxJW474YbVtR5j3fMrPlQDHwR+AbMxsEGNDYzDq6+/J4Pq5I9Sk5SF12MjAS2NTMCoCNzOwqwqmRO0VtnLB46h3KX0DlUfmSIwjrC4YCzYG2hPPunzOzJkBnd18BjAXeBf4A7Ah8Saj3sxWwoxKDZAolB6mTzKwt0CK6+4O794y2b0A4wK9Oad6ZsLq0IrsA84EtgPeBw9z9YDN7Aejj7oUpbYcA2xHqQDUhrGY+DugJXLtun0qk9mjOQeqqDYG7ott/iOYdriOsKn6cUIenxNNA/+h235L5BuDQaNsKSpPHc8CtZmYlTy5znY1dgH7ApcAHhBLj9xISUL/a+Wgi6049B6mT3P21qGghwMJozqE+YS7gYVK+xbv7rOix5sBjZeYccPdHzGzr6PYcM+sEjAPaEJLFKqCnmW0InAlcAdxHSFAXASuBK4HjzWz9aPhJJFFKDiKh5/AqYfL5YcI8xAMlD5rZ6YRv/EXVeTF3fwB4oJxhpT8TzmoaGd3fhDBP8R2h3k0DYCEhcYgkSslB6jIjTDIvdPf+AGa2C7A7pReFqUcYanoK+BNwgpntFz3WArgx5bUsGk6q5+6/zVmUrJtw9/uB+1O2XwDMc/fHYvl0IutAyUHqsgapv6Pr6h5BKHN9ezSU1NDdP4seX0C4NsBL0f1DCVVxARpGP3sQ1k6siraPJiSYWwhDTakaor9ByVAqvCcSMbOG7l4U3a5H+PtYXcXTRHKSkoOIiKxBp7KKiMgalBxERGQNSg4iIrIGJQcREVnD/wcT7lyERlMKUgAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
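],
"source": [
"from sklearn.metrics import roc_curve,auc\n",
"## Plot the ROC curve\n",
"# Use the model's predict_proba function to output predicted probabilities\n",
"predictions_pro = LR.predict_proba(X_test)\n",
"# roc_curve returns the false positive rate, the true positive rate (recall), and the thresholds;\n",
"# column 1 of predict_proba is the probability of the second class (spam)\n",
"false_positive_rate, recall, thresholds = roc_curve(y_test, predictions_pro[:,1])\n",
"\n",
"roc_auc = auc(false_positive_rate, recall)\n",
"plt.title(\"Receiver operating characteristic (ROC) curve\")\n",
"plt.plot(false_positive_rate, recall, 'b', label='AUC = %0.2f' % roc_auc)\n",
"plt.legend(loc='lower right')\n",
"# Dashed diagonal: the chance-level reference line\n",
"plt.plot([0,1],[0,1],'r--')\n",
"plt.xlim([0.0, 1.0])\n",
"plt.ylim([0.0, 1.0])\n",
"plt.xlabel('False positive rate')\n",
"plt.ylabel('Recall (true positive rate)')\n",
"plt.show()"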
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}