Commit c6e8b5f4 by 20200519029

0621

parents 3c46eefb 6eb34d5e
| Jun 14 (Sun) 10:30AM | (Live - Lecture4) <br> Convex Optimization (2) |Duality, KKT conditions, Primal-Dual of SVM|[Lecture4](http://47.94.6.102/NLP7/course-info/blob/master/%E8%AF%BE%E4%BB%B6/0614%E5%87%B8%E4%BC%98%E5%8C%96%EF%BC%882%EF%BC%89.pptx)|||
| Jun 14 (Sun) 8:00PM | (Live - Discussion) <br> Inventory Optimization with Stochastic Programming ||[Slides](http://47.94.6.102/NLP7/course-info/blob/master/%E8%AF%BE%E4%BB%B6/0614%20Inventory%20Optimization%20with%20Stochastic%20Programming%5B%E9%98%BF%E5%8B%87%5D.pptx)|||
| PART 1 NLP Fundamentals |
| Jun 21 (Sun, Father's Day) 10:30AM | (Live - Lecture5) <br>Text Representation|Tokenization<br>Spelling correction<br>Stop-word filtering<br>Word normalization<br>Bag-of-words model<br>Text similarity computation<br>Word vectors<br>Sentence vectors<br>Language models|[Lecture5](http://47.94.6.102/NLP7/course-info/blob/master/%E8%AF%BE%E4%BB%B6/0621%E6%96%87%E6%9C%AC%E8%A1%A8%E7%A4%BA.pptx)|||[project1](http://47.94.6.102/NLP7/course-info/blob/master/%E8%AF%BE%E4%BB%B6/project1/Project1%E9%A1%B9%E7%9B%AE.zip) <br><br> Due: Jul 1 (Wed)<br>23:59PM Beijing time, <br>upload to gitlab|
| Jun 21 (Sun) 5:30PM | (Live - Discussion) <br>Survey of text similarity computation techniques| Short text<br>Long text |[Slides](http://47.94.6.102/NLP7/course-info/blob/master/%E8%AF%BE%E4%BB%B6/0621%E6%96%87%E6%9C%AC%E7%9B%B8%E4%BC%BC%E5%BA%A6%E8%AE%A1%E7%AE%97%E6%8A%80%E6%9C%AF.pptx)||||
| Jun 21 (Sun) 8:00PM | (Live - Discussion) Introduction to search engine technology|Vector-space model, inverted index, PageRank, etc.||||
| TBD | (Live - Discussion) Word vectors in practice: how to use GloVe and BERT word vectors in your own project? ||||||
| TBD | (Live - Discussion) Building a question-answering system: full pipeline, similarity matching, ranking, text preprocessing, etc.||||
Required packages:
pip install scikit-learn
pip install pandas
pip install xgboost
pip install lightgbm
pip install catboost
Preprocessing follows https://www.jianshu.com/p/3efc09ef4369
The regression tasks and some of the experiments have not yet been tested on this dataset; only the uncommented experiments are known to run successfully.
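The script below benchmarks voting, averaging, bagging, boosting, and stacking ensembles on the Loan Prediction data. As a minimal, self-contained sketch of the stacking idea it uses (out-of-fold base-model predictions become the features of a logistic-regression meta-model), here is the same scheme on synthetic data rather than this dataset; the data shapes and base models are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# synthetic stand-in for the loan data
X, y = make_classification(n_samples=600, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

base_models = [DecisionTreeClassifier(random_state=1), KNeighborsClassifier()]

# out-of-fold predictions on the train set become meta-features,
# so the meta-model never sees a base model's fit on its own inputs
meta_tr = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=10) for m in base_models
])
# base models are refit on the full train set to predict the test set
meta_te = np.column_stack([
    m.fit(X_tr, y_tr).predict(X_te) for m in base_models
])

meta = LogisticRegression(random_state=1).fit(meta_tr, y_tr)
acc = accuracy_score(y_te, meta.predict(meta_te))
print('stacking accuracy: {:.3f}'.format(acc))
```

`cross_val_predict` handles the fold bookkeeping that the `Stacking` helper in the script does by hand.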
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import numpy as np
import pandas as pd
import time
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn import tree
# import matplotlib as plt
from sklearn.metrics import accuracy_score
def load_loan_pred(path_train):
    # reading the dataset
    df = pd.read_csv(path_train)
    print(df.head())
    print(df.describe())
    print(df.isnull().sum())
    # filling missing values
    # pre-process according to https://www.jianshu.com/p/3efc09ef4369
    df['Gender'].fillna(df['Gender'].value_counts().idxmax(), inplace=True)
    df['Married'].fillna(df['Married'].value_counts().idxmax(), inplace=True)
    df['Dependents'].fillna(df['Dependents'].value_counts().idxmax(), inplace=True)
    df['Self_Employed'].fillna(df['Self_Employed'].value_counts().idxmax(), inplace=True)
    df['LoanAmount'].fillna(df['LoanAmount'].mean(skipna=True), inplace=True)
    df['Loan_Amount_Term'].fillna(df['Loan_Amount_Term'].value_counts().idxmax(), inplace=True)
    df['Credit_History'].fillna(df['Credit_History'].value_counts().idxmax(), inplace=True)
    df['LoanAmount_log'] = np.log(df['LoanAmount'])
    df['TotalIncome'] = df['ApplicantIncome'] + df['CoapplicantIncome']
    df['TotalIncome_log'] = np.log(df['TotalIncome'])
    # df['LoanAmount_log'].hist(bins=20)
    # print(df.isnull().sum())
    # encode categorical columns as integers
    from sklearn.preprocessing import LabelEncoder
    var_mod = ['Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'Property_Area', 'Loan_Status']
    le = LabelEncoder()
    for i in var_mod:
        df[i] = le.fit_transform(df[i])
    fields = ['Credit_History', 'Education', 'Married', 'Self_Employed', 'Property_Area', 'Loan_Status']
    df = df[fields]
    print(df.head())
    # split dataset into train and test
    train, test = train_test_split(df, test_size=0.3, random_state=0)
    x_train = train.drop('Loan_Status', axis=1)
    y_train = train['Loan_Status']
    x_test = test.drop('Loan_Status', axis=1)
    y_test = test['Loan_Status']
    # create dummies
    x_train = pd.get_dummies(x_train)
    x_test = pd.get_dummies(x_test)
    return x_train, y_train, x_test, y_test
def exp_voting(x_train, y_train, x_test, y_test):
    # hard voting over a logistic regression and a decision tree
    model1 = LogisticRegression(random_state=1)
    model2 = DecisionTreeClassifier(random_state=1)
    model = VotingClassifier(estimators=[('lr', model1), ('dt', model2)], voting='hard')
    model.fit(x_train, y_train)
    print(model.score(x_test, y_test))

def exp_avg(x_train, y_train, x_test):
    # simple average of the predicted class probabilities
    model1 = DecisionTreeClassifier()
    model2 = KNeighborsClassifier()
    model3 = LogisticRegression()
    model1.fit(x_train, y_train)
    model2.fit(x_train, y_train)
    model3.fit(x_train, y_train)
    pred1 = model1.predict_proba(x_test)
    pred2 = model2.predict_proba(x_test)
    pred3 = model3.predict_proba(x_test)
    finalpred = (pred1 + pred2 + pred3) / 3
    return finalpred

def exp_avg_weighted(x_train, y_train, x_test):
    # weighted average of the predicted class probabilities
    model1 = DecisionTreeClassifier()
    model2 = KNeighborsClassifier()
    model3 = LogisticRegression()
    model1.fit(x_train, y_train)
    model2.fit(x_train, y_train)
    model3.fit(x_train, y_train)
    pred1 = model1.predict_proba(x_test)
    pred2 = model2.predict_proba(x_test)
    pred3 = model3.predict_proba(x_test)
    finalpred = pred1 * 0.3 + pred2 * 0.3 + pred3 * 0.4
    return finalpred

def Stacking(model, train, y, test, n_fold):
    # out-of-fold predictions on the train set, fold-averaged predictions on the test set
    folds = StratifiedKFold(n_splits=n_fold, shuffle=True, random_state=1)
    train_pred = np.zeros(train.shape[0])
    test_pred = np.zeros(test.shape[0])
    for train_indices, val_indices in folds.split(train, y.values):
        x_tr, x_val = train.iloc[train_indices], train.iloc[val_indices]
        y_tr = y.iloc[train_indices]
        model.fit(X=x_tr, y=y_tr)
        # write out-of-fold predictions back in the original row order
        train_pred[val_indices] = model.predict(x_val)
        test_pred += model.predict(test)
    test_pred /= n_fold
    return test_pred.reshape(-1, 1), train_pred
def exp_stacking(x_train, y_train, x_test, y_test):
    model1 = DecisionTreeClassifier(random_state=1)
    test_pred1, train_pred1 = Stacking(model=model1, n_fold=10, train=x_train, test=x_test, y=y_train)
    train_pred1 = pd.DataFrame(train_pred1)
    test_pred1 = pd.DataFrame(test_pred1)
    model2 = KNeighborsClassifier()
    test_pred2, train_pred2 = Stacking(model=model2, n_fold=10, train=x_train, test=x_test, y=y_train)
    train_pred2 = pd.DataFrame(train_pred2)
    test_pred2 = pd.DataFrame(test_pred2)
    # base-model predictions become the features of a logistic-regression meta-model
    df = pd.concat([train_pred1, train_pred2], axis=1)
    df_test = pd.concat([test_pred1, test_pred2], axis=1)
    model = LogisticRegression(random_state=1)
    model.fit(df, y_train)
    score = model.score(df_test, y_test)
    print(score)

def exp_bagging(x_train, y_train, x_test, y_test):
    model = BaggingClassifier(tree.DecisionTreeClassifier(random_state=1))
    model.fit(x_train, y_train)
    score = model.score(x_test, y_test)
    print(score)

def exp_bagging_regress(x_train, y_train, x_test, y_test):
    from sklearn.ensemble import BaggingRegressor
    model = BaggingRegressor(tree.DecisionTreeRegressor(random_state=1))
    model.fit(x_train, y_train)
    print(model.score(x_test, y_test))

def exp_adaboost(x_train, y_train, x_test, y_test):
    from sklearn.ensemble import AdaBoostClassifier
    model = AdaBoostClassifier(random_state=1)
    model.fit(x_train, y_train)
    score = model.score(x_test, y_test)
    print(score)

def exp_adaboost_regress(x_train, y_train, x_test, y_test):
    from sklearn.ensemble import AdaBoostRegressor
    model = AdaBoostRegressor()
    model.fit(x_train, y_train)
    print(model.score(x_test, y_test))

def exp_gbm(x_train, y_train, x_test, y_test):
    from sklearn.ensemble import GradientBoostingClassifier
    model = GradientBoostingClassifier(learning_rate=0.01, random_state=1)
    model.fit(x_train, y_train)
    score = model.score(x_test, y_test)
    print(score)

def exp_gbm_regress(x_train, y_train, x_test, y_test):
    from sklearn.ensemble import GradientBoostingRegressor
    model = GradientBoostingRegressor()
    model.fit(x_train, y_train)
    print(model.score(x_test, y_test))
def exp_xgboost(x_train, y_train, x_test, y_test):
    import xgboost as xgb
    model = xgb.XGBClassifier(random_state=1, learning_rate=0.01)
    model.fit(x_train, y_train)
    score = model.score(x_test, y_test)
    print(score)

def exp_xgboost_regress(x_train, y_train, x_test, y_test):
    import xgboost as xgb
    model = xgb.XGBRegressor()
    model.fit(x_train, y_train)
    print(model.score(x_test, y_test))

def exp_lgb(x_train, y_train, x_test, y_test):
    import lightgbm as lgb
    train_data = lgb.Dataset(x_train, label=y_train)
    # define parameters
    params = {'learning_rate': 0.001}
    model = lgb.train(params, train_data, 100)
    y_pred = model.predict(x_test)
    # threshold the predicted probabilities at 0.5
    y_pred = (y_pred >= 0.5).astype(int)
    score = accuracy_score(y_test, y_pred)
    print(score)

def exp_lgb_regress(x_train, y_train, x_test, y_test):
    import lightgbm as lgb
    train_data = lgb.Dataset(x_train, label=y_train)
    params = {'learning_rate': 0.001}
    model = lgb.train(params, train_data, 100)
    y_pred = model.predict(x_test)
    from sklearn.metrics import mean_squared_error
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
    print(rmse)

def exp_catboost(x_train, y_train, x_test, y_test):
    from catboost import CatBoostClassifier
    model = CatBoostClassifier()
    # categorical_features_indices = np.where(df.dtypes != float)[0]
    model.fit(x_train, y_train, cat_features=[0, 1, 2, 3, 4], eval_set=(x_test, y_test))
    score = model.score(x_test, y_test)
    print(score)

def exp_catboost_regress(x_train, y_train, x_test, y_test):
    from catboost import CatBoostRegressor
    model = CatBoostRegressor()
    # categorical_features_indices = np.where(df.dtypes != float)[0]
    # x_train has only five columns, so the valid categorical indices are 0-4
    model.fit(x_train, y_train, cat_features=[0, 1, 2, 3, 4], eval_set=(x_test, y_test))
    print(model.score(x_test, y_test))
def main():
    path_train = 'train_loan_pred.csv'
    x_train, y_train, x_test, y_test = load_loan_pred(path_train)
    # voting/averaging
    # exp_voting(x_train, y_train, x_test, y_test)
    # exp_avg(x_train, y_train, x_test)
    # exp_avg_weighted(x_train, y_train, x_test)
    # stacking
    # exp_stacking(x_train, y_train, x_test, y_test)
    # bagging
    exp_bagging(x_train, y_train, x_test, y_test)
    # boosting
    exp_adaboost(x_train, y_train, x_test, y_test)
    exp_gbm(x_train, y_train, x_test, y_test)
    exp_xgboost(x_train, y_train, x_test, y_test)
    # exp_lgb(x_train, y_train, x_test, y_test)
    # exp_catboost(x_train, y_train, x_test, y_test)
    # bagging/boosting regression
    # exp_bagging_regress(x_train, y_train, x_test, y_test)
    # exp_adaboost_regress(x_train, y_train, x_test, y_test)
    # exp_gbm_regress(x_train, y_train, x_test, y_test)
    # exp_xgboost_regress(x_train, y_train, x_test, y_test)
    # exp_lgb_regress(x_train, y_train, x_test, y_test)
    # exp_catboost_regress(x_train, y_train, x_test, y_test)

if __name__ == '__main__':
    t_start = time.time()
    main()
    t_end = time.time()
    t_all = t_end - t_start
    print('whole time: {:.2f} min'.format(t_all / 60.))
Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area
LP001015,Male,Yes,0,Graduate,No,5720,0,110,360,1,Urban
LP001022,Male,Yes,1,Graduate,No,3076,1500,126,360,1,Urban
LP001031,Male,Yes,2,Graduate,No,5000,1800,208,360,1,Urban
LP001035,Male,Yes,2,Graduate,No,2340,2546,100,360,,Urban
LP001051,Male,No,0,Not Graduate,No,3276,0,78,360,1,Urban
LP001054,Male,Yes,0,Not Graduate,Yes,2165,3422,152,360,1,Urban
LP001055,Female,No,1,Not Graduate,No,2226,0,59,360,1,Semiurban
LP001056,Male,Yes,2,Not Graduate,No,3881,0,147,360,0,Rural
LP001059,Male,Yes,2,Graduate,,13633,0,280,240,1,Urban
LP001067,Male,No,0,Not Graduate,No,2400,2400,123,360,1,Semiurban
LP001078,Male,No,0,Not Graduate,No,3091,0,90,360,1,Urban
LP001082,Male,Yes,1,Graduate,,2185,1516,162,360,1,Semiurban
LP001083,Male,No,3+,Graduate,No,4166,0,40,180,,Urban
LP001094,Male,Yes,2,Graduate,,12173,0,166,360,0,Semiurban
LP001096,Female,No,0,Graduate,No,4666,0,124,360,1,Semiurban
LP001099,Male,No,1,Graduate,No,5667,0,131,360,1,Urban
LP001105,Male,Yes,2,Graduate,No,4583,2916,200,360,1,Urban
LP001107,Male,Yes,3+,Graduate,No,3786,333,126,360,1,Semiurban
LP001108,Male,Yes,0,Graduate,No,9226,7916,300,360,1,Urban
LP001115,Male,No,0,Graduate,No,1300,3470,100,180,1,Semiurban
LP001121,Male,Yes,1,Not Graduate,No,1888,1620,48,360,1,Urban
LP001124,Female,No,3+,Not Graduate,No,2083,0,28,180,1,Urban
LP001128,,No,0,Graduate,No,3909,0,101,360,1,Urban
LP001135,Female,No,0,Not Graduate,No,3765,0,125,360,1,Urban
LP001149,Male,Yes,0,Graduate,No,5400,4380,290,360,1,Urban
LP001153,Male,No,0,Graduate,No,0,24000,148,360,0,Rural
LP001163,Male,Yes,2,Graduate,No,4363,1250,140,360,,Urban
LP001169,Male,Yes,0,Graduate,No,7500,3750,275,360,1,Urban
LP001174,Male,Yes,0,Graduate,No,3772,833,57,360,,Semiurban
LP001176,Male,No,0,Graduate,No,2942,2382,125,180,1,Urban
LP001177,Female,No,0,Not Graduate,No,2478,0,75,360,1,Semiurban
LP001183,Male,Yes,2,Graduate,No,6250,820,192,360,1,Urban
LP001185,Male,No,0,Graduate,No,3268,1683,152,360,1,Semiurban
LP001187,Male,Yes,0,Graduate,No,2783,2708,158,360,1,Urban
LP001190,Male,Yes,0,Graduate,No,2740,1541,101,360,1,Urban
LP001203,Male,No,0,Graduate,No,3150,0,176,360,0,Semiurban
LP001208,Male,Yes,2,Graduate,,7350,4029,185,180,1,Urban
LP001210,Male,Yes,0,Graduate,Yes,2267,2792,90,360,1,Urban
LP001211,Male,No,0,Graduate,Yes,5833,0,116,360,1,Urban
LP001219,Male,No,0,Graduate,No,3643,1963,138,360,1,Urban
LP001220,Male,Yes,0,Graduate,No,5629,818,100,360,1,Urban
LP001221,Female,No,0,Graduate,No,3644,0,110,360,1,Urban
LP001226,Male,Yes,0,Not Graduate,No,1750,2024,90,360,1,Semiurban
LP001230,Male,No,0,Graduate,No,6500,2600,200,360,1,Semiurban
LP001231,Female,No,0,Graduate,No,3666,0,84,360,1,Urban
LP001232,Male,Yes,0,Graduate,No,4260,3900,185,,,Urban
LP001237,Male,Yes,,Not Graduate,No,4163,1475,162,360,1,Urban
LP001242,Male,No,0,Not Graduate,No,2356,1902,108,360,1,Semiurban
LP001268,Male,No,0,Graduate,No,6792,3338,187,,1,Urban
LP001270,Male,Yes,3+,Not Graduate,Yes,8000,250,187,360,1,Semiurban
LP001284,Male,Yes,1,Graduate,No,2419,1707,124,360,1,Urban
LP001287,,Yes,3+,Not Graduate,No,3500,833,120,360,1,Semiurban
LP001291,Male,Yes,1,Graduate,No,3500,3077,160,360,1,Semiurban
LP001298,Male,Yes,2,Graduate,No,4116,1000,30,180,1,Urban
LP001312,Male,Yes,0,Not Graduate,Yes,5293,0,92,360,1,Urban
LP001313,Male,No,0,Graduate,No,2750,0,130,360,0,Urban
LP001317,Female,No,0,Not Graduate,No,4402,0,130,360,1,Rural
LP001321,Male,Yes,2,Graduate,No,3613,3539,134,180,1,Semiurban
LP001323,Female,Yes,2,Graduate,No,2779,3664,176,360,0,Semiurban
LP001324,Male,Yes,3+,Graduate,No,4720,0,90,180,1,Semiurban
LP001332,Male,Yes,0,Not Graduate,No,2415,1721,110,360,1,Semiurban
LP001335,Male,Yes,0,Graduate,Yes,7016,292,125,360,1,Urban
LP001338,Female,No,2,Graduate,No,4968,0,189,360,1,Semiurban
LP001347,Female,No,0,Graduate,No,2101,1500,108,360,0,Rural
LP001348,Male,Yes,3+,Not Graduate,No,4490,0,125,360,1,Urban
LP001351,Male,Yes,0,Graduate,No,2917,3583,138,360,1,Semiurban
LP001352,Male,Yes,0,Not Graduate,No,4700,0,135,360,0,Semiurban
LP001358,Male,Yes,0,Graduate,No,3445,0,130,360,0,Semiurban
LP001359,Male,Yes,0,Graduate,No,7666,0,187,360,1,Semiurban
LP001361,Male,Yes,0,Graduate,No,2458,5105,188,360,0,Rural
LP001366,Female,No,,Graduate,No,3250,0,95,360,1,Semiurban
LP001368,Male,No,0,Graduate,No,4463,0,65,360,1,Semiurban
LP001375,Male,Yes,1,Graduate,,4083,1775,139,60,1,Urban
LP001380,Male,Yes,0,Graduate,Yes,3900,2094,232,360,1,Rural
LP001386,Male,Yes,0,Not Graduate,No,4750,3583,144,360,1,Semiurban
LP001400,Male,No,0,Graduate,No,3583,3435,155,360,1,Urban
LP001407,Male,Yes,0,Graduate,No,3189,2367,186,360,1,Urban
LP001413,Male,No,0,Graduate,Yes,6356,0,50,360,1,Rural
LP001415,Male,Yes,1,Graduate,No,3413,4053,,360,1,Semiurban
LP001419,Female,Yes,0,Graduate,No,7950,0,185,360,1,Urban
LP001420,Male,Yes,3+,Graduate,No,3829,1103,163,360,0,Urban
LP001428,Male,Yes,3+,Graduate,No,72529,0,360,360,1,Urban
LP001445,Male,Yes,2,Not Graduate,No,4136,0,149,480,0,Rural
LP001446,Male,Yes,0,Graduate,No,8449,0,257,360,1,Rural
LP001450,Male,Yes,0,Graduate,No,4456,0,131,180,0,Semiurban
LP001452,Male,Yes,2,Graduate,No,4635,8000,102,180,1,Rural
LP001455,Male,Yes,0,Graduate,No,3571,1917,135,360,1,Urban
LP001466,Male,No,0,Graduate,No,3066,0,95,360,1,Semiurban
LP001471,Male,No,2,Not Graduate,No,3235,2015,77,360,1,Semiurban
LP001472,Female,No,0,Graduate,,5058,0,200,360,1,Rural
LP001475,Male,Yes,0,Graduate,Yes,3188,2286,130,360,,Rural
LP001483,Male,Yes,3+,Graduate,No,13518,0,390,360,1,Rural
LP001486,Male,Yes,1,Graduate,No,4364,2500,185,360,1,Semiurban
LP001490,Male,Yes,2,Not Graduate,No,4766,1646,100,360,1,Semiurban
LP001496,Male,Yes,1,Graduate,No,4609,2333,123,360,0,Semiurban
LP001499,Female,Yes,3+,Graduate,No,6260,0,110,360,1,Semiurban
LP001500,Male,Yes,1,Graduate,No,3333,4200,256,360,1,Urban
LP001501,Male,Yes,0,Graduate,No,3500,3250,140,360,1,Semiurban
LP001517,Male,Yes,3+,Graduate,No,9719,0,61,360,1,Urban
LP001527,Male,Yes,3+,Graduate,No,6835,0,188,360,,Semiurban
LP001534,Male,No,0,Graduate,No,4452,0,131,360,1,Rural
LP001542,Female,Yes,0,Graduate,No,2262,0,,480,0,Semiurban
LP001547,Male,Yes,1,Graduate,No,3901,0,116,360,1,Urban
LP001548,Male,Yes,2,Not Graduate,No,2687,0,50,180,1,Rural
LP001558,Male,No,0,Graduate,No,2243,2233,107,360,,Semiurban
LP001561,Female,Yes,0,Graduate,No,3417,1287,200,360,1,Semiurban
LP001563,,No,0,Graduate,No,1596,1760,119,360,0,Urban
LP001567,Male,Yes,3+,Graduate,No,4513,0,120,360,1,Rural
LP001568,Male,Yes,0,Graduate,No,4500,0,140,360,1,Semiurban
LP001573,Male,Yes,0,Not Graduate,No,4523,1350,165,360,1,Urban
LP001584,Female,No,0,Graduate,Yes,4742,0,108,360,1,Semiurban
LP001587,Male,Yes,,Graduate,No,4082,0,93,360,1,Semiurban
LP001589,Female,No,0,Graduate,No,3417,0,102,360,1,Urban
LP001591,Female,Yes,2,Graduate,No,2922,3396,122,360,1,Semiurban
LP001599,Male,Yes,0,Graduate,No,4167,4754,160,360,1,Rural
LP001601,Male,No,3+,Graduate,No,4243,4123,157,360,,Semiurban
LP001607,Female,No,0,Not Graduate,No,0,1760,180,360,1,Semiurban
LP001611,Male,Yes,1,Graduate,No,1516,2900,80,,0,Rural
LP001613,Female,No,0,Graduate,No,1762,2666,104,360,0,Urban
LP001622,Male,Yes,2,Graduate,No,724,3510,213,360,0,Rural
LP001627,Male,No,0,Graduate,No,3125,0,65,360,1,Urban
LP001650,Male,Yes,0,Graduate,No,2333,3803,146,360,1,Rural
LP001651,Male,Yes,3+,Graduate,No,3350,1560,135,360,1,Urban
LP001652,Male,No,0,Graduate,No,2500,6414,187,360,0,Rural
LP001655,Female,No,0,Graduate,No,12500,0,300,360,0,Urban
LP001660,Male,No,0,Graduate,No,4667,0,120,360,1,Semiurban
LP001662,Male,No,0,Graduate,No,6500,0,71,360,0,Urban
LP001663,Male,Yes,2,Graduate,No,7500,0,225,360,1,Urban
LP001667,Male,No,0,Graduate,No,3073,0,70,180,1,Urban
LP001695,Male,Yes,1,Not Graduate,No,3321,2088,70,,1,Semiurban
LP001703,Male,Yes,0,Graduate,No,3333,1270,124,360,1,Urban
LP001718,Male,No,0,Graduate,No,3391,0,132,360,1,Rural
LP001728,Male,Yes,1,Graduate,Yes,3343,1517,105,360,1,Rural
LP001735,Female,No,1,Graduate,No,3620,0,90,360,1,Urban
LP001737,Male,No,0,Graduate,No,4000,0,83,84,1,Urban
LP001739,Male,Yes,0,Graduate,No,4258,0,125,360,1,Urban
LP001742,Male,Yes,2,Graduate,No,4500,0,147,360,1,Rural
LP001757,Male,Yes,1,Graduate,No,2014,2925,120,360,1,Rural
LP001769,,No,,Graduate,No,3333,1250,110,360,1,Semiurban
LP001771,Female,No,3+,Graduate,No,4083,0,103,360,,Semiurban
LP001785,Male,No,0,Graduate,No,4727,0,150,360,0,Rural
LP001787,Male,Yes,3+,Graduate,No,3089,2999,100,240,1,Rural
LP001789,Male,Yes,3+,Not Graduate,,6794,528,139,360,0,Urban
LP001791,Male,Yes,0,Graduate,Yes,32000,0,550,360,,Semiurban
LP001794,Male,Yes,2,Graduate,Yes,10890,0,260,12,1,Rural
LP001797,Female,No,0,Graduate,No,12941,0,150,300,1,Urban
LP001815,Male,No,0,Not Graduate,No,3276,0,90,360,1,Semiurban
LP001817,Male,No,0,Not Graduate,Yes,8703,0,199,360,0,Rural
LP001818,Male,Yes,1,Graduate,No,4742,717,139,360,1,Semiurban
LP001822,Male,No,0,Graduate,No,5900,0,150,360,1,Urban
LP001827,Male,No,0,Graduate,No,3071,4309,180,360,1,Urban
LP001831,Male,Yes,0,Graduate,No,2783,1456,113,360,1,Urban
LP001842,Male,No,0,Graduate,No,5000,0,148,360,1,Rural
LP001853,Male,Yes,1,Not Graduate,No,2463,2360,117,360,0,Urban
LP001855,Male,Yes,2,Graduate,No,4855,0,72,360,1,Rural
LP001857,Male,No,0,Not Graduate,Yes,1599,2474,125,300,1,Semiurban
LP001862,Male,Yes,2,Graduate,Yes,4246,4246,214,360,1,Urban
LP001867,Male,Yes,0,Graduate,No,4333,2291,133,350,1,Rural
LP001878,Male,No,1,Graduate,No,5823,2529,187,360,1,Semiurban
LP001881,Male,Yes,0,Not Graduate,No,7895,0,143,360,1,Rural
LP001886,Male,No,0,Graduate,No,4150,4256,209,360,1,Rural
LP001906,Male,No,0,Graduate,,2964,0,84,360,0,Semiurban
LP001909,Male,No,0,Graduate,No,5583,0,116,360,1,Urban
LP001911,Female,No,0,Graduate,No,2708,0,65,360,1,Rural
LP001921,Male,No,1,Graduate,No,3180,2370,80,240,,Rural
LP001923,Male,No,0,Not Graduate,No,2268,0,170,360,0,Semiurban
LP001933,Male,No,2,Not Graduate,No,1141,2017,120,360,0,Urban
LP001943,Male,Yes,0,Graduate,No,3042,3167,135,360,1,Urban
LP001950,Female,Yes,3+,Graduate,,1750,2935,94,360,0,Semiurban
LP001959,Female,Yes,1,Graduate,No,3564,0,79,360,1,Rural
LP001961,Female,No,0,Graduate,No,3958,0,110,360,1,Rural
LP001973,Male,Yes,2,Not Graduate,No,4483,0,130,360,1,Rural
LP001975,Male,Yes,0,Graduate,No,5225,0,143,360,1,Rural
LP001979,Male,No,0,Graduate,No,3017,2845,159,180,0,Urban
LP001995,Male,Yes,0,Not Graduate,No,2431,1820,110,360,0,Rural
LP001999,Male,Yes,2,Graduate,,4912,4614,160,360,1,Rural
LP002007,Male,Yes,2,Not Graduate,No,2500,3333,131,360,1,Urban
LP002009,Female,No,0,Graduate,No,2918,0,65,360,,Rural
LP002016,Male,Yes,2,Graduate,No,5128,0,143,360,1,Rural
LP002017,Male,Yes,3+,Graduate,No,15312,0,187,360,,Urban
LP002018,Male,Yes,2,Graduate,No,3958,2632,160,360,1,Semiurban
LP002027,Male,Yes,0,Graduate,No,4334,2945,165,360,1,Semiurban
LP002028,Male,Yes,2,Graduate,No,4358,0,110,360,1,Urban
LP002042,Female,Yes,1,Graduate,No,4000,3917,173,360,1,Rural
LP002045,Male,Yes,3+,Graduate,No,10166,750,150,,1,Urban
LP002046,Male,Yes,0,Not Graduate,No,4483,0,135,360,,Semiurban
LP002047,Male,Yes,2,Not Graduate,No,4521,1184,150,360,1,Semiurban
LP002056,Male,Yes,2,Graduate,No,9167,0,235,360,1,Semiurban
LP002057,Male,Yes,0,Not Graduate,No,13083,0,,360,1,Rural
LP002059,Male,Yes,2,Graduate,No,7874,3967,336,360,1,Rural
LP002062,Female,Yes,1,Graduate,No,4333,0,132,84,1,Rural
LP002064,Male,No,0,Graduate,No,4083,0,96,360,1,Urban
LP002069,Male,Yes,2,Not Graduate,,3785,2912,180,360,0,Rural
LP002070,Male,Yes,3+,Not Graduate,No,2654,1998,128,360,0,Rural
LP002077,Male,Yes,1,Graduate,No,10000,2690,412,360,1,Semiurban
LP002083,Male,No,0,Graduate,Yes,5833,0,116,360,1,Urban
LP002090,Male,Yes,1,Graduate,No,4796,0,114,360,0,Semiurban
LP002096,Male,Yes,0,Not Graduate,No,2000,1600,115,360,1,Rural
LP002099,Male,Yes,2,Graduate,No,2540,700,104,360,0,Urban
LP002102,Male,Yes,0,Graduate,Yes,1900,1442,88,360,1,Rural
LP002105,Male,Yes,0,Graduate,Yes,8706,0,108,480,1,Rural
LP002107,Male,Yes,3+,Not Graduate,No,2855,542,90,360,1,Urban
LP002111,Male,Yes,,Graduate,No,3016,1300,100,360,,Urban
LP002117,Female,Yes,0,Graduate,No,3159,2374,108,360,1,Semiurban
LP002118,Female,No,0,Graduate,No,1937,1152,78,360,1,Semiurban
LP002123,Male,Yes,0,Graduate,No,2613,2417,123,360,1,Semiurban
LP002125,Male,Yes,1,Graduate,No,4960,2600,187,360,1,Semiurban
LP002148,Male,Yes,1,Graduate,No,3074,1083,146,360,1,Semiurban
LP002152,Female,No,0,Graduate,No,4213,0,80,360,1,Urban
LP002165,,No,1,Not Graduate,No,2038,4027,100,360,1,Rural
LP002167,Female,No,0,Graduate,No,2362,0,55,360,1,Urban
LP002168,Male,No,0,Graduate,No,5333,2400,200,360,0,Rural
LP002172,Male,Yes,3+,Graduate,Yes,5384,0,150,360,1,Semiurban
LP002176,Male,No,0,Graduate,No,5708,0,150,360,1,Rural
LP002183,Male,Yes,0,Not Graduate,No,3754,3719,118,,1,Rural
LP002184,Male,Yes,0,Not Graduate,No,2914,2130,150,300,1,Urban
LP002186,Male,Yes,0,Not Graduate,No,2747,2458,118,36,1,Semiurban
LP002192,Male,Yes,0,Graduate,No,7830,2183,212,360,1,Rural
LP002195,Male,Yes,1,Graduate,Yes,3507,3148,212,360,1,Rural
LP002208,Male,Yes,1,Graduate,No,3747,2139,125,360,1,Urban
LP002212,Male,Yes,0,Graduate,No,2166,2166,108,360,,Urban
LP002240,Male,Yes,0,Not Graduate,No,3500,2168,149,360,1,Rural
LP002245,Male,Yes,2,Not Graduate,No,2896,0,80,480,1,Urban
LP002253,Female,No,1,Graduate,No,5062,0,152,300,1,Rural
LP002256,Female,No,2,Graduate,Yes,5184,0,187,360,0,Semiurban
LP002257,Female,No,0,Graduate,No,2545,0,74,360,1,Urban
LP002264,Male,Yes,0,Graduate,No,2553,1768,102,360,1,Urban
LP002270,Male,Yes,1,Graduate,No,3436,3809,100,360,1,Rural
LP002279,Male,No,0,Graduate,No,2412,2755,130,360,1,Rural
LP002286,Male,Yes,3+,Not Graduate,No,5180,0,125,360,0,Urban
LP002294,Male,No,0,Graduate,No,14911,14507,130,360,1,Semiurban
LP002298,,No,0,Graduate,Yes,2860,2988,138,360,1,Urban
LP002306,Male,Yes,0,Graduate,No,1173,1594,28,180,1,Rural
LP002310,Female,No,1,Graduate,No,7600,0,92,360,1,Semiurban
LP002311,Female,Yes,0,Graduate,No,2157,1788,104,360,1,Urban
LP002316,Male,No,0,Graduate,No,2231,2774,176,360,0,Urban
LP002321,Female,No,0,Graduate,No,2274,5211,117,360,0,Semiurban
LP002325,Male,Yes,2,Not Graduate,No,6166,13983,102,360,1,Rural
LP002326,Male,Yes,2,Not Graduate,No,2513,1110,107,360,1,Semiurban
LP002329,Male,No,0,Graduate,No,4333,0,66,480,1,Urban
LP002333,Male,No,0,Not Graduate,No,3844,0,105,360,1,Urban
LP002339,Male,Yes,0,Graduate,No,3887,1517,105,360,0,Semiurban
LP002344,Male,Yes,0,Graduate,No,3510,828,105,360,1,Semiurban
LP002346,Male,Yes,0,Graduate,,2539,1704,125,360,0,Rural
LP002354,Female,No,0,Not Graduate,No,2107,0,64,360,1,Semiurban
LP002355,,Yes,0,Graduate,No,3186,3145,150,180,0,Semiurban
LP002358,Male,Yes,2,Graduate,Yes,5000,2166,150,360,1,Urban
LP002360,Male,Yes,,Graduate,No,10000,0,,360,1,Urban
LP002375,Male,Yes,0,Not Graduate,Yes,3943,0,64,360,1,Semiurban
LP002376,Male,No,0,Graduate,No,2925,0,40,180,1,Rural
LP002383,Male,Yes,3+,Graduate,No,3242,437,142,480,0,Urban
LP002385,Male,Yes,,Graduate,No,3863,0,70,300,1,Semiurban
LP002389,Female,No,1,Graduate,No,4028,0,131,360,1,Semiurban
LP002394,Male,Yes,2,Graduate,No,4010,1025,120,360,1,Urban
LP002397,Female,Yes,1,Graduate,No,3719,1585,114,360,1,Urban
LP002399,Male,No,0,Graduate,,2858,0,123,360,0,Rural
LP002400,Female,Yes,0,Graduate,No,3833,0,92,360,1,Rural
LP002402,Male,Yes,0,Graduate,No,3333,4288,160,360,1,Urban
LP002412,Male,Yes,0,Graduate,No,3007,3725,151,360,1,Rural
LP002415,Female,No,1,Graduate,,1850,4583,81,360,,Rural
LP002417,Male,Yes,3+,Not Graduate,No,2792,2619,171,360,1,Semiurban
LP002420,Male,Yes,0,Graduate,No,2982,1550,110,360,1,Semiurban
LP002425,Male,No,0,Graduate,No,3417,738,100,360,,Rural
LP002433,Male,Yes,1,Graduate,No,18840,0,234,360,1,Rural
LP002440,Male,Yes,2,Graduate,No,2995,1120,184,360,1,Rural
LP002441,Male,No,,Graduate,No,3579,3308,138,360,,Semiurban
LP002442,Female,Yes,1,Not Graduate,No,3835,1400,112,480,0,Urban
LP002445,Female,No,1,Not Graduate,No,3854,3575,117,360,1,Rural
LP002450,Male,Yes,2,Graduate,No,5833,750,49,360,0,Rural
LP002471,Male,No,0,Graduate,No,3508,0,99,360,1,Rural
LP002476,Female,Yes,3+,Not Graduate,No,1635,2444,99,360,1,Urban
LP002482,Female,No,0,Graduate,Yes,3333,3916,212,360,1,Rural
LP002485,Male,No,1,Graduate,No,24797,0,240,360,1,Semiurban
LP002495,Male,Yes,2,Graduate,No,5667,440,130,360,0,Semiurban
LP002496,Female,No,0,Graduate,No,3500,0,94,360,0,Semiurban
LP002523,Male,Yes,3+,Graduate,No,2773,1497,108,360,1,Semiurban
LP002542,Male,Yes,0,Graduate,,6500,0,144,360,1,Urban
LP002550,Female,No,0,Graduate,No,5769,0,110,180,1,Semiurban
LP002551,Male,Yes,3+,Not Graduate,,3634,910,176,360,0,Semiurban
LP002553,,No,0,Graduate,No,29167,0,185,360,1,Semiurban
LP002554,Male,No,0,Graduate,No,2166,2057,122,360,1,Semiurban
LP002561,Male,Yes,0,Graduate,No,5000,0,126,360,1,Rural
LP002566,Female,No,0,Graduate,No,5530,0,135,360,,Urban
LP002568,Male,No,0,Not Graduate,No,9000,0,122,360,1,Rural
LP002570,Female,Yes,2,Graduate,No,10000,11666,460,360,1,Urban
LP002572,Male,Yes,1,Graduate,,8750,0,297,360,1,Urban
LP002581,Male,Yes,0,Not Graduate,No,2157,2730,140,360,,Rural
LP002584,Male,No,0,Graduate,,1972,4347,106,360,1,Rural
LP002592,Male,No,0,Graduate,No,4983,0,141,360,1,Urban
LP002593,Male,Yes,1,Graduate,No,8333,4000,,360,1,Urban
LP002599,Male,Yes,0,Graduate,No,3667,2000,170,360,1,Semiurban
LP002604,Male,Yes,2,Graduate,No,3166,2833,145,360,1,Urban
LP002605,Male,No,0,Not Graduate,No,3271,0,90,360,1,Rural
LP002609,Female,Yes,0,Graduate,No,2241,2000,88,360,0,Urban
LP002610,Male,Yes,1,Not Graduate,,1792,2565,128,360,1,Urban
LP002612,Female,Yes,0,Graduate,No,2666,0,84,480,1,Semiurban
LP002614,,No,0,Graduate,No,6478,0,108,360,1,Semiurban
LP002630,Male,No,0,Not Graduate,,3808,0,83,360,1,Rural
LP002635,Female,Yes,2,Not Graduate,No,3729,0,117,360,1,Semiurban
LP002639,Male,Yes,2,Graduate,No,4120,0,128,360,1,Rural
LP002644,Male,Yes,1,Graduate,Yes,7500,0,75,360,1,Urban
LP002651,Male,Yes,1,Graduate,,6300,0,125,360,0,Urban
LP002654,Female,No,,Graduate,Yes,14987,0,177,360,1,Rural
LP002657,,Yes,1,Not Graduate,Yes,570,2125,68,360,1,Rural
LP002711,Male,Yes,0,Graduate,No,2600,700,96,360,1,Semiurban
LP002712,Male,No,2,Not Graduate,No,2733,1083,180,360,,Semiurban
LP002721,Male,Yes,2,Graduate,Yes,7500,0,183,360,1,Rural
LP002735,Male,Yes,2,Not Graduate,No,3859,0,121,360,1,Rural
LP002744,Male,Yes,1,Graduate,No,6825,0,162,360,1,Rural
LP002745,Male,Yes,0,Graduate,No,3708,4700,132,360,1,Semiurban
LP002746,Male,No,0,Graduate,No,5314,0,147,360,1,Urban
LP002747,Female,No,3+,Graduate,No,2366,5272,153,360,0,Rural
LP002754,Male,No,,Graduate,No,2066,2108,104,84,1,Urban
LP002759,Male,Yes,2,Graduate,No,5000,0,149,360,1,Rural
LP002760,Female,No,0,Graduate,No,3767,0,134,300,1,Urban
LP002766,Female,Yes,0,Graduate,No,7859,879,165,180,1,Semiurban
LP002769,Female,Yes,0,Graduate,No,4283,0,120,360,1,Rural
LP002774,Male,Yes,0,Not Graduate,No,1700,2900,67,360,0,Urban
LP002775,,No,0,Not Graduate,No,4768,0,125,360,1,Rural
LP002781,Male,No,0,Graduate,No,3083,2738,120,360,1,Urban
LP002782,Male,Yes,1,Graduate,No,2667,1542,148,360,1,Rural
LP002786,Female,Yes,0,Not Graduate,No,1647,1762,181,360,1,Urban
LP002790,Male,Yes,3+,Graduate,No,3400,0,80,120,1,Urban
LP002791,Male,No,1,Graduate,,16000,5000,40,360,1,Semiurban
LP002793,Male,Yes,0,Graduate,No,5333,0,90,360,1,Rural
LP002802,Male,No,0,Graduate,No,2875,2416,95,6,0,Semiurban
LP002803,Male,Yes,1,Not Graduate,,2600,618,122,360,1,Semiurban
LP002805,Male,Yes,2,Graduate,No,5041,700,150,360,1,Urban
LP002806,Male,Yes,3+,Graduate,Yes,6958,1411,150,360,1,Rural
LP002816,Male,Yes,1,Graduate,No,3500,1658,104,360,,Semiurban
LP002823,Male,Yes,0,Graduate,No,5509,0,143,360,1,Rural
LP002825,Male,Yes,3+,Graduate,No,9699,0,300,360,1,Urban
LP002826,Female,Yes,1,Not Graduate,No,3621,2717,171,360,1,Urban
LP002843,Female,Yes,0,Graduate,No,4709,0,113,360,1,Semiurban
LP002849,Male,Yes,0,Graduate,No,1516,1951,35,360,1,Semiurban
LP002850,Male,No,2,Graduate,No,2400,0,46,360,1,Urban
LP002853,Female,No,0,Not Graduate,No,3015,2000,145,360,,Urban
LP002856,Male,Yes,0,Graduate,No,2292,1558,119,360,1,Urban
LP002857,Male,Yes,1,Graduate,Yes,2360,3355,87,240,1,Rural
LP002858,Female,No,0,Graduate,No,4333,2333,162,360,0,Rural
LP002860,Male,Yes,0,Graduate,Yes,2623,4831,122,180,1,Semiurban
LP002867,Male,No,0,Graduate,Yes,3972,4275,187,360,1,Rural
LP002869,Male,Yes,3+,Not Graduate,No,3522,0,81,180,1,Rural
LP002870,Male,Yes,1,Graduate,No,4700,0,80,360,1,Urban
LP002876,Male,No,0,Graduate,No,6858,0,176,360,1,Rural
LP002878,Male,Yes,3+,Graduate,No,8334,0,260,360,1,Urban
LP002879,Male,Yes,0,Graduate,No,3391,1966,133,360,0,Rural
LP002885,Male,No,0,Not Graduate,No,2868,0,70,360,1,Urban
LP002890,Male,Yes,2,Not Graduate,No,3418,1380,135,360,1,Urban
LP002891,Male,Yes,0,Graduate,Yes,2500,296,137,300,1,Rural
LP002899,Male,Yes,2,Graduate,No,8667,0,254,360,1,Rural
LP002901,Male,No,0,Graduate,No,2283,15000,106,360,,Rural
LP002907,Male,Yes,0,Graduate,No,5817,910,109,360,1,Urban
LP002920,Male,Yes,0,Graduate,No,5119,3769,120,360,1,Rural
LP002921,Male,Yes,3+,Not Graduate,No,5316,187,158,180,0,Semiurban
LP002932,Male,Yes,3+,Graduate,No,7603,1213,197,360,1,Urban
LP002935,Male,Yes,1,Graduate,No,3791,1936,85,360,1,Urban
LP002952,Male,No,0,Graduate,No,2500,0,60,360,1,Urban
LP002954,Male,Yes,2,Not Graduate,No,3132,0,76,360,,Rural
LP002962,Male,No,0,Graduate,No,4000,2667,152,360,1,Semiurban
LP002965,Female,Yes,0,Graduate,No,8550,4255,96,360,,Urban
LP002969,Male,Yes,1,Graduate,No,2269,2167,99,360,1,Semiurban
LP002971,Male,Yes,3+,Not Graduate,Yes,4009,1777,113,360,1,Urban
LP002975,Male,Yes,0,Graduate,No,4158,709,115,360,1,Urban
LP002980,Male,No,0,Graduate,No,3250,1993,126,360,,Semiurban
LP002986,Male,Yes,0,Graduate,No,5000,2393,158,360,1,Rural
LP002989,Male,No,0,Graduate,Yes,9200,0,98,180,1,Rural
Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0,,360,1,Urban,Y
LP001003,Male,Yes,1,Graduate,No,4583,1508,128,360,1,Rural,N
LP001005,Male,Yes,0,Graduate,Yes,3000,0,66,360,1,Urban,Y
LP001006,Male,Yes,0,Not Graduate,No,2583,2358,120,360,1,Urban,Y
LP001008,Male,No,0,Graduate,No,6000,0,141,360,1,Urban,Y
LP001011,Male,Yes,2,Graduate,Yes,5417,4196,267,360,1,Urban,Y
LP001013,Male,Yes,0,Not Graduate,No,2333,1516,95,360,1,Urban,Y
LP001014,Male,Yes,3+,Graduate,No,3036,2504,158,360,0,Semiurban,N
LP001018,Male,Yes,2,Graduate,No,4006,1526,168,360,1,Urban,Y
LP001020,Male,Yes,1,Graduate,No,12841,10968,349,360,1,Semiurban,N
LP001024,Male,Yes,2,Graduate,No,3200,700,70,360,1,Urban,Y
LP001027,Male,Yes,2,Graduate,,2500,1840,109,360,1,Urban,Y
LP001028,Male,Yes,2,Graduate,No,3073,8106,200,360,1,Urban,Y
LP001029,Male,No,0,Graduate,No,1853,2840,114,360,1,Rural,N
LP001030,Male,Yes,2,Graduate,No,1299,1086,17,120,1,Urban,Y
LP001032,Male,No,0,Graduate,No,4950,0,125,360,1,Urban,Y
LP001034,Male,No,1,Not Graduate,No,3596,0,100,240,,Urban,Y
LP001036,Female,No,0,Graduate,No,3510,0,76,360,0,Urban,N
LP001038,Male,Yes,0,Not Graduate,No,4887,0,133,360,1,Rural,N
LP001041,Male,Yes,0,Graduate,,2600,3500,115,,1,Urban,Y
LP001043,Male,Yes,0,Not Graduate,No,7660,0,104,360,0,Urban,N
LP001046,Male,Yes,1,Graduate,No,5955,5625,315,360,1,Urban,Y
LP001047,Male,Yes,0,Not Graduate,No,2600,1911,116,360,0,Semiurban,N
LP001050,,Yes,2,Not Graduate,No,3365,1917,112,360,0,Rural,N
LP001052,Male,Yes,1,Graduate,,3717,2925,151,360,,Semiurban,N
LP001066,Male,Yes,0,Graduate,Yes,9560,0,191,360,1,Semiurban,Y
LP001068,Male,Yes,0,Graduate,No,2799,2253,122,360,1,Semiurban,Y
LP001073,Male,Yes,2,Not Graduate,No,4226,1040,110,360,1,Urban,Y
LP001086,Male,No,0,Not Graduate,No,1442,0,35,360,1,Urban,N
LP001087,Female,No,2,Graduate,,3750,2083,120,360,1,Semiurban,Y
LP001091,Male,Yes,1,Graduate,,4166,3369,201,360,,Urban,N
LP001095,Male,No,0,Graduate,No,3167,0,74,360,1,Urban,N
LP001097,Male,No,1,Graduate,Yes,4692,0,106,360,1,Rural,N
LP001098,Male,Yes,0,Graduate,No,3500,1667,114,360,1,Semiurban,Y
LP001100,Male,No,3+,Graduate,No,12500,3000,320,360,1,Rural,N
LP001106,Male,Yes,0,Graduate,No,2275,2067,,360,1,Urban,Y
LP001109,Male,Yes,0,Graduate,No,1828,1330,100,,0,Urban,N
LP001112,Female,Yes,0,Graduate,No,3667,1459,144,360,1,Semiurban,Y
LP001114,Male,No,0,Graduate,No,4166,7210,184,360,1,Urban,Y
LP001116,Male,No,0,Not Graduate,No,3748,1668,110,360,1,Semiurban,Y
LP001119,Male,No,0,Graduate,No,3600,0,80,360,1,Urban,N
LP001120,Male,No,0,Graduate,No,1800,1213,47,360,1,Urban,Y
LP001123,Male,Yes,0,Graduate,No,2400,0,75,360,,Urban,Y
LP001131,Male,Yes,0,Graduate,No,3941,2336,134,360,1,Semiurban,Y
LP001136,Male,Yes,0,Not Graduate,Yes,4695,0,96,,1,Urban,Y
LP001137,Female,No,0,Graduate,No,3410,0,88,,1,Urban,Y
LP001138,Male,Yes,1,Graduate,No,5649,0,44,360,1,Urban,Y
LP001144,Male,Yes,0,Graduate,No,5821,0,144,360,1,Urban,Y
LP001146,Female,Yes,0,Graduate,No,2645,3440,120,360,0,Urban,N
LP001151,Female,No,0,Graduate,No,4000,2275,144,360,1,Semiurban,Y
LP001155,Female,Yes,0,Not Graduate,No,1928,1644,100,360,1,Semiurban,Y
LP001157,Female,No,0,Graduate,No,3086,0,120,360,1,Semiurban,Y
LP001164,Female,No,0,Graduate,No,4230,0,112,360,1,Semiurban,N
LP001179,Male,Yes,2,Graduate,No,4616,0,134,360,1,Urban,N
LP001186,Female,Yes,1,Graduate,Yes,11500,0,286,360,0,Urban,N
LP001194,Male,Yes,2,Graduate,No,2708,1167,97,360,1,Semiurban,Y
LP001195,Male,Yes,0,Graduate,No,2132,1591,96,360,1,Semiurban,Y
LP001197,Male,Yes,0,Graduate,No,3366,2200,135,360,1,Rural,N
LP001198,Male,Yes,1,Graduate,No,8080,2250,180,360,1,Urban,Y
LP001199,Male,Yes,2,Not Graduate,No,3357,2859,144,360,1,Urban,Y
LP001205,Male,Yes,0,Graduate,No,2500,3796,120,360,1,Urban,Y
LP001206,Male,Yes,3+,Graduate,No,3029,0,99,360,1,Urban,Y
LP001207,Male,Yes,0,Not Graduate,Yes,2609,3449,165,180,0,Rural,N
LP001213,Male,Yes,1,Graduate,No,4945,0,,360,0,Rural,N
LP001222,Female,No,0,Graduate,No,4166,0,116,360,0,Semiurban,N
LP001225,Male,Yes,0,Graduate,No,5726,4595,258,360,1,Semiurban,N
LP001228,Male,No,0,Not Graduate,No,3200,2254,126,180,0,Urban,N
LP001233,Male,Yes,1,Graduate,No,10750,0,312,360,1,Urban,Y
LP001238,Male,Yes,3+,Not Graduate,Yes,7100,0,125,60,1,Urban,Y
LP001241,Female,No,0,Graduate,No,4300,0,136,360,0,Semiurban,N
LP001243,Male,Yes,0,Graduate,No,3208,3066,172,360,1,Urban,Y
LP001245,Male,Yes,2,Not Graduate,Yes,1875,1875,97,360,1,Semiurban,Y
LP001248,Male,No,0,Graduate,No,3500,0,81,300,1,Semiurban,Y
LP001250,Male,Yes,3+,Not Graduate,No,4755,0,95,,0,Semiurban,N
LP001253,Male,Yes,3+,Graduate,Yes,5266,1774,187,360,1,Semiurban,Y
LP001255,Male,No,0,Graduate,No,3750,0,113,480,1,Urban,N
LP001256,Male,No,0,Graduate,No,3750,4750,176,360,1,Urban,N
LP001259,Male,Yes,1,Graduate,Yes,1000,3022,110,360,1,Urban,N
LP001263,Male,Yes,3+,Graduate,No,3167,4000,180,300,0,Semiurban,N
LP001264,Male,Yes,3+,Not Graduate,Yes,3333,2166,130,360,,Semiurban,Y
LP001265,Female,No,0,Graduate,No,3846,0,111,360,1,Semiurban,Y
LP001266,Male,Yes,1,Graduate,Yes,2395,0,,360,1,Semiurban,Y
LP001267,Female,Yes,2,Graduate,No,1378,1881,167,360,1,Urban,N
LP001273,Male,Yes,0,Graduate,No,6000,2250,265,360,,Semiurban,N
LP001275,Male,Yes,1,Graduate,No,3988,0,50,240,1,Urban,Y
LP001279,Male,No,0,Graduate,No,2366,2531,136,360,1,Semiurban,Y
LP001280,Male,Yes,2,Not Graduate,No,3333,2000,99,360,,Semiurban,Y
LP001282,Male,Yes,0,Graduate,No,2500,2118,104,360,1,Semiurban,Y
LP001289,Male,No,0,Graduate,No,8566,0,210,360,1,Urban,Y
LP001310,Male,Yes,0,Graduate,No,5695,4167,175,360,1,Semiurban,Y
LP001316,Male,Yes,0,Graduate,No,2958,2900,131,360,1,Semiurban,Y
LP001318,Male,Yes,2,Graduate,No,6250,5654,188,180,1,Semiurban,Y
LP001319,Male,Yes,2,Not Graduate,No,3273,1820,81,360,1,Urban,Y
LP001322,Male,No,0,Graduate,No,4133,0,122,360,1,Semiurban,Y
LP001325,Male,No,0,Not Graduate,No,3620,0,25,120,1,Semiurban,Y
LP001326,Male,No,0,Graduate,,6782,0,,360,,Urban,N
LP001327,Female,Yes,0,Graduate,No,2484,2302,137,360,1,Semiurban,Y
LP001333,Male,Yes,0,Graduate,No,1977,997,50,360,1,Semiurban,Y
LP001334,Male,Yes,0,Not Graduate,No,4188,0,115,180,1,Semiurban,Y
LP001343,Male,Yes,0,Graduate,No,1759,3541,131,360,1,Semiurban,Y
LP001345,Male,Yes,2,Not Graduate,No,4288,3263,133,180,1,Urban,Y
LP001349,Male,No,0,Graduate,No,4843,3806,151,360,1,Semiurban,Y
LP001350,Male,Yes,,Graduate,No,13650,0,,360,1,Urban,Y
LP001356,Male,Yes,0,Graduate,No,4652,3583,,360,1,Semiurban,Y
LP001357,Male,,,Graduate,No,3816,754,160,360,1,Urban,Y
LP001367,Male,Yes,1,Graduate,No,3052,1030,100,360,1,Urban,Y
LP001369,Male,Yes,2,Graduate,No,11417,1126,225,360,1,Urban,Y
LP001370,Male,No,0,Not Graduate,,7333,0,120,360,1,Rural,N
LP001379,Male,Yes,2,Graduate,No,3800,3600,216,360,0,Urban,N
LP001384,Male,Yes,3+,Not Graduate,No,2071,754,94,480,1,Semiurban,Y
LP001385,Male,No,0,Graduate,No,5316,0,136,360,1,Urban,Y
LP001387,Female,Yes,0,Graduate,,2929,2333,139,360,1,Semiurban,Y
LP001391,Male,Yes,0,Not Graduate,No,3572,4114,152,,0,Rural,N
LP001392,Female,No,1,Graduate,Yes,7451,0,,360,1,Semiurban,Y
LP001398,Male,No,0,Graduate,,5050,0,118,360,1,Semiurban,Y
LP001401,Male,Yes,1,Graduate,No,14583,0,185,180,1,Rural,Y
LP001404,Female,Yes,0,Graduate,No,3167,2283,154,360,1,Semiurban,Y
LP001405,Male,Yes,1,Graduate,No,2214,1398,85,360,,Urban,Y
LP001421,Male,Yes,0,Graduate,No,5568,2142,175,360,1,Rural,N
LP001422,Female,No,0,Graduate,No,10408,0,259,360,1,Urban,Y
LP001426,Male,Yes,,Graduate,No,5667,2667,180,360,1,Rural,Y
LP001430,Female,No,0,Graduate,No,4166,0,44,360,1,Semiurban,Y
LP001431,Female,No,0,Graduate,No,2137,8980,137,360,0,Semiurban,Y
LP001432,Male,Yes,2,Graduate,No,2957,0,81,360,1,Semiurban,Y
LP001439,Male,Yes,0,Not Graduate,No,4300,2014,194,360,1,Rural,Y
LP001443,Female,No,0,Graduate,No,3692,0,93,360,,Rural,Y
LP001448,,Yes,3+,Graduate,No,23803,0,370,360,1,Rural,Y
LP001449,Male,No,0,Graduate,No,3865,1640,,360,1,Rural,Y
LP001451,Male,Yes,1,Graduate,Yes,10513,3850,160,180,0,Urban,N
LP001465,Male,Yes,0,Graduate,No,6080,2569,182,360,,Rural,N
LP001469,Male,No,0,Graduate,Yes,20166,0,650,480,,Urban,Y
LP001473,Male,No,0,Graduate,No,2014,1929,74,360,1,Urban,Y
LP001478,Male,No,0,Graduate,No,2718,0,70,360,1,Semiurban,Y
LP001482,Male,Yes,0,Graduate,Yes,3459,0,25,120,1,Semiurban,Y
LP001487,Male,No,0,Graduate,No,4895,0,102,360,1,Semiurban,Y
LP001488,Male,Yes,3+,Graduate,No,4000,7750,290,360,1,Semiurban,N
LP001489,Female,Yes,0,Graduate,No,4583,0,84,360,1,Rural,N
LP001491,Male,Yes,2,Graduate,Yes,3316,3500,88,360,1,Urban,Y
LP001492,Male,No,0,Graduate,No,14999,0,242,360,0,Semiurban,N
LP001493,Male,Yes,2,Not Graduate,No,4200,1430,129,360,1,Rural,N
LP001497,Male,Yes,2,Graduate,No,5042,2083,185,360,1,Rural,N
LP001498,Male,No,0,Graduate,No,5417,0,168,360,1,Urban,Y
LP001504,Male,No,0,Graduate,Yes,6950,0,175,180,1,Semiurban,Y
LP001507,Male,Yes,0,Graduate,No,2698,2034,122,360,1,Semiurban,Y
LP001508,Male,Yes,2,Graduate,No,11757,0,187,180,1,Urban,Y
LP001514,Female,Yes,0,Graduate,No,2330,4486,100,360,1,Semiurban,Y
LP001516,Female,Yes,2,Graduate,No,14866,0,70,360,1,Urban,Y
LP001518,Male,Yes,1,Graduate,No,1538,1425,30,360,1,Urban,Y
LP001519,Female,No,0,Graduate,No,10000,1666,225,360,1,Rural,N
LP001520,Male,Yes,0,Graduate,No,4860,830,125,360,1,Semiurban,Y
LP001528,Male,No,0,Graduate,No,6277,0,118,360,0,Rural,N
LP001529,Male,Yes,0,Graduate,Yes,2577,3750,152,360,1,Rural,Y
LP001531,Male,No,0,Graduate,No,9166,0,244,360,1,Urban,N
LP001532,Male,Yes,2,Not Graduate,No,2281,0,113,360,1,Rural,N
LP001535,Male,No,0,Graduate,No,3254,0,50,360,1,Urban,Y
LP001536,Male,Yes,3+,Graduate,No,39999,0,600,180,0,Semiurban,Y
LP001541,Male,Yes,1,Graduate,No,6000,0,160,360,,Rural,Y
LP001543,Male,Yes,1,Graduate,No,9538,0,187,360,1,Urban,Y
LP001546,Male,No,0,Graduate,,2980,2083,120,360,1,Rural,Y
LP001552,Male,Yes,0,Graduate,No,4583,5625,255,360,1,Semiurban,Y
LP001560,Male,Yes,0,Not Graduate,No,1863,1041,98,360,1,Semiurban,Y
LP001562,Male,Yes,0,Graduate,No,7933,0,275,360,1,Urban,N
LP001565,Male,Yes,1,Graduate,No,3089,1280,121,360,0,Semiurban,N
LP001570,Male,Yes,2,Graduate,No,4167,1447,158,360,1,Rural,Y
LP001572,Male,Yes,0,Graduate,No,9323,0,75,180,1,Urban,Y
LP001574,Male,Yes,0,Graduate,No,3707,3166,182,,1,Rural,Y
LP001577,Female,Yes,0,Graduate,No,4583,0,112,360,1,Rural,N
LP001578,Male,Yes,0,Graduate,No,2439,3333,129,360,1,Rural,Y
LP001579,Male,No,0,Graduate,No,2237,0,63,480,0,Semiurban,N
LP001580,Male,Yes,2,Graduate,No,8000,0,200,360,1,Semiurban,Y
LP001581,Male,Yes,0,Not Graduate,,1820,1769,95,360,1,Rural,Y
LP001585,,Yes,3+,Graduate,No,51763,0,700,300,1,Urban,Y
LP001586,Male,Yes,3+,Not Graduate,No,3522,0,81,180,1,Rural,N
LP001594,Male,Yes,0,Graduate,No,5708,5625,187,360,1,Semiurban,Y
LP001603,Male,Yes,0,Not Graduate,Yes,4344,736,87,360,1,Semiurban,N
LP001606,Male,Yes,0,Graduate,No,3497,1964,116,360,1,Rural,Y
LP001608,Male,Yes,2,Graduate,No,2045,1619,101,360,1,Rural,Y
LP001610,Male,Yes,3+,Graduate,No,5516,11300,495,360,0,Semiurban,N
LP001616,Male,Yes,1,Graduate,No,3750,0,116,360,1,Semiurban,Y
LP001630,Male,No,0,Not Graduate,No,2333,1451,102,480,0,Urban,N
LP001633,Male,Yes,1,Graduate,No,6400,7250,180,360,0,Urban,N
LP001634,Male,No,0,Graduate,No,1916,5063,67,360,,Rural,N
LP001636,Male,Yes,0,Graduate,No,4600,0,73,180,1,Semiurban,Y
LP001637,Male,Yes,1,Graduate,No,33846,0,260,360,1,Semiurban,N
LP001639,Female,Yes,0,Graduate,No,3625,0,108,360,1,Semiurban,Y
LP001640,Male,Yes,0,Graduate,Yes,39147,4750,120,360,1,Semiurban,Y
LP001641,Male,Yes,1,Graduate,Yes,2178,0,66,300,0,Rural,N
LP001643,Male,Yes,0,Graduate,No,2383,2138,58,360,,Rural,Y
LP001644,,Yes,0,Graduate,Yes,674,5296,168,360,1,Rural,Y
LP001647,Male,Yes,0,Graduate,No,9328,0,188,180,1,Rural,Y
LP001653,Male,No,0,Not Graduate,No,4885,0,48,360,1,Rural,Y
LP001656,Male,No,0,Graduate,No,12000,0,164,360,1,Semiurban,N
LP001657,Male,Yes,0,Not Graduate,No,6033,0,160,360,1,Urban,N
LP001658,Male,No,0,Graduate,No,3858,0,76,360,1,Semiurban,Y
LP001664,Male,No,0,Graduate,No,4191,0,120,360,1,Rural,Y
LP001665,Male,Yes,1,Graduate,No,3125,2583,170,360,1,Semiurban,N
LP001666,Male,No,0,Graduate,No,8333,3750,187,360,1,Rural,Y
LP001669,Female,No,0,Not Graduate,No,1907,2365,120,,1,Urban,Y
LP001671,Female,Yes,0,Graduate,No,3416,2816,113,360,,Semiurban,Y
LP001673,Male,No,0,Graduate,Yes,11000,0,83,360,1,Urban,N
LP001674,Male,Yes,1,Not Graduate,No,2600,2500,90,360,1,Semiurban,Y
LP001677,Male,No,2,Graduate,No,4923,0,166,360,0,Semiurban,Y
LP001682,Male,Yes,3+,Not Graduate,No,3992,0,,180,1,Urban,N
LP001688,Male,Yes,1,Not Graduate,No,3500,1083,135,360,1,Urban,Y
LP001691,Male,Yes,2,Not Graduate,No,3917,0,124,360,1,Semiurban,Y
LP001692,Female,No,0,Not Graduate,No,4408,0,120,360,1,Semiurban,Y
LP001693,Female,No,0,Graduate,No,3244,0,80,360,1,Urban,Y
LP001698,Male,No,0,Not Graduate,No,3975,2531,55,360,1,Rural,Y
LP001699,Male,No,0,Graduate,No,2479,0,59,360,1,Urban,Y
LP001702,Male,No,0,Graduate,No,3418,0,127,360,1,Semiurban,N
LP001708,Female,No,0,Graduate,No,10000,0,214,360,1,Semiurban,N
LP001711,Male,Yes,3+,Graduate,No,3430,1250,128,360,0,Semiurban,N
LP001713,Male,Yes,1,Graduate,Yes,7787,0,240,360,1,Urban,Y
LP001715,Male,Yes,3+,Not Graduate,Yes,5703,0,130,360,1,Rural,Y
LP001716,Male,Yes,0,Graduate,No,3173,3021,137,360,1,Urban,Y
LP001720,Male,Yes,3+,Not Graduate,No,3850,983,100,360,1,Semiurban,Y
LP001722,Male,Yes,0,Graduate,No,150,1800,135,360,1,Rural,N
LP001726,Male,Yes,0,Graduate,No,3727,1775,131,360,1,Semiurban,Y
LP001732,Male,Yes,2,Graduate,,5000,0,72,360,0,Semiurban,N
LP001734,Female,Yes,2,Graduate,No,4283,2383,127,360,,Semiurban,Y
LP001736,Male,Yes,0,Graduate,No,2221,0,60,360,0,Urban,N
LP001743,Male,Yes,2,Graduate,No,4009,1717,116,360,1,Semiurban,Y
LP001744,Male,No,0,Graduate,No,2971,2791,144,360,1,Semiurban,Y
LP001749,Male,Yes,0,Graduate,No,7578,1010,175,,1,Semiurban,Y
LP001750,Male,Yes,0,Graduate,No,6250,0,128,360,1,Semiurban,Y
LP001751,Male,Yes,0,Graduate,No,3250,0,170,360,1,Rural,N
LP001754,Male,Yes,,Not Graduate,Yes,4735,0,138,360,1,Urban,N
LP001758,Male,Yes,2,Graduate,No,6250,1695,210,360,1,Semiurban,Y
LP001760,Male,,,Graduate,No,4758,0,158,480,1,Semiurban,Y
LP001761,Male,No,0,Graduate,Yes,6400,0,200,360,1,Rural,Y
LP001765,Male,Yes,1,Graduate,No,2491,2054,104,360,1,Semiurban,Y
LP001768,Male,Yes,0,Graduate,,3716,0,42,180,1,Rural,Y
LP001770,Male,No,0,Not Graduate,No,3189,2598,120,,1,Rural,Y
LP001776,Female,No,0,Graduate,No,8333,0,280,360,1,Semiurban,Y
LP001778,Male,Yes,1,Graduate,No,3155,1779,140,360,1,Semiurban,Y
LP001784,Male,Yes,1,Graduate,No,5500,1260,170,360,1,Rural,Y
LP001786,Male,Yes,0,Graduate,,5746,0,255,360,,Urban,N
LP001788,Female,No,0,Graduate,Yes,3463,0,122,360,,Urban,Y
LP001790,Female,No,1,Graduate,No,3812,0,112,360,1,Rural,Y
LP001792,Male,Yes,1,Graduate,No,3315,0,96,360,1,Semiurban,Y
LP001798,Male,Yes,2,Graduate,No,5819,5000,120,360,1,Rural,Y
LP001800,Male,Yes,1,Not Graduate,No,2510,1983,140,180,1,Urban,N
LP001806,Male,No,0,Graduate,No,2965,5701,155,60,1,Urban,Y
LP001807,Male,Yes,2,Graduate,Yes,6250,1300,108,360,1,Rural,Y
LP001811,Male,Yes,0,Not Graduate,No,3406,4417,123,360,1,Semiurban,Y
LP001813,Male,No,0,Graduate,Yes,6050,4333,120,180,1,Urban,N
LP001814,Male,Yes,2,Graduate,No,9703,0,112,360,1,Urban,Y
LP001819,Male,Yes,1,Not Graduate,No,6608,0,137,180,1,Urban,Y
LP001824,Male,Yes,1,Graduate,No,2882,1843,123,480,1,Semiurban,Y
LP001825,Male,Yes,0,Graduate,No,1809,1868,90,360,1,Urban,Y
LP001835,Male,Yes,0,Not Graduate,No,1668,3890,201,360,0,Semiurban,N
LP001836,Female,No,2,Graduate,No,3427,0,138,360,1,Urban,N
LP001841,Male,No,0,Not Graduate,Yes,2583,2167,104,360,1,Rural,Y
LP001843,Male,Yes,1,Not Graduate,No,2661,7101,279,180,1,Semiurban,Y
LP001844,Male,No,0,Graduate,Yes,16250,0,192,360,0,Urban,N
LP001846,Female,No,3+,Graduate,No,3083,0,255,360,1,Rural,Y
LP001849,Male,No,0,Not Graduate,No,6045,0,115,360,0,Rural,N
LP001854,Male,Yes,3+,Graduate,No,5250,0,94,360,1,Urban,N
LP001859,Male,Yes,0,Graduate,No,14683,2100,304,360,1,Rural,N
LP001864,Male,Yes,3+,Not Graduate,No,4931,0,128,360,,Semiurban,N
LP001865,Male,Yes,1,Graduate,No,6083,4250,330,360,,Urban,Y
LP001868,Male,No,0,Graduate,No,2060,2209,134,360,1,Semiurban,Y
LP001870,Female,No,1,Graduate,No,3481,0,155,36,1,Semiurban,N
LP001871,Female,No,0,Graduate,No,7200,0,120,360,1,Rural,Y
LP001872,Male,No,0,Graduate,Yes,5166,0,128,360,1,Semiurban,Y
LP001875,Male,No,0,Graduate,No,4095,3447,151,360,1,Rural,Y
LP001877,Male,Yes,2,Graduate,No,4708,1387,150,360,1,Semiurban,Y
LP001882,Male,Yes,3+,Graduate,No,4333,1811,160,360,0,Urban,Y
LP001883,Female,No,0,Graduate,,3418,0,135,360,1,Rural,N
LP001884,Female,No,1,Graduate,No,2876,1560,90,360,1,Urban,Y
LP001888,Female,No,0,Graduate,No,3237,0,30,360,1,Urban,Y
LP001891,Male,Yes,0,Graduate,No,11146,0,136,360,1,Urban,Y
LP001892,Male,No,0,Graduate,No,2833,1857,126,360,1,Rural,Y
LP001894,Male,Yes,0,Graduate,No,2620,2223,150,360,1,Semiurban,Y
LP001896,Male,Yes,2,Graduate,No,3900,0,90,360,1,Semiurban,Y
LP001900,Male,Yes,1,Graduate,No,2750,1842,115,360,1,Semiurban,Y
LP001903,Male,Yes,0,Graduate,No,3993,3274,207,360,1,Semiurban,Y
LP001904,Male,Yes,0,Graduate,No,3103,1300,80,360,1,Urban,Y
LP001907,Male,Yes,0,Graduate,No,14583,0,436,360,1,Semiurban,Y
LP001908,Female,Yes,0,Not Graduate,No,4100,0,124,360,,Rural,Y
LP001910,Male,No,1,Not Graduate,Yes,4053,2426,158,360,0,Urban,N
LP001914,Male,Yes,0,Graduate,No,3927,800,112,360,1,Semiurban,Y
LP001915,Male,Yes,2,Graduate,No,2301,985.7999878,78,180,1,Urban,Y
LP001917,Female,No,0,Graduate,No,1811,1666,54,360,1,Urban,Y
LP001922,Male,Yes,0,Graduate,No,20667,0,,360,1,Rural,N
LP001924,Male,No,0,Graduate,No,3158,3053,89,360,1,Rural,Y
LP001925,Female,No,0,Graduate,Yes,2600,1717,99,300,1,Semiurban,N
LP001926,Male,Yes,0,Graduate,No,3704,2000,120,360,1,Rural,Y
LP001931,Female,No,0,Graduate,No,4124,0,115,360,1,Semiurban,Y
LP001935,Male,No,0,Graduate,No,9508,0,187,360,1,Rural,Y
LP001936,Male,Yes,0,Graduate,No,3075,2416,139,360,1,Rural,Y
LP001938,Male,Yes,2,Graduate,No,4400,0,127,360,0,Semiurban,N
LP001940,Male,Yes,2,Graduate,No,3153,1560,134,360,1,Urban,Y
LP001945,Female,No,,Graduate,No,5417,0,143,480,0,Urban,N
LP001947,Male,Yes,0,Graduate,No,2383,3334,172,360,1,Semiurban,Y
LP001949,Male,Yes,3+,Graduate,,4416,1250,110,360,1,Urban,Y
LP001953,Male,Yes,1,Graduate,No,6875,0,200,360,1,Semiurban,Y
LP001954,Female,Yes,1,Graduate,No,4666,0,135,360,1,Urban,Y
LP001955,Female,No,0,Graduate,No,5000,2541,151,480,1,Rural,N
LP001963,Male,Yes,1,Graduate,No,2014,2925,113,360,1,Urban,N
LP001964,Male,Yes,0,Not Graduate,No,1800,2934,93,360,0,Urban,N
LP001972,Male,Yes,,Not Graduate,No,2875,1750,105,360,1,Semiurban,Y
LP001974,Female,No,0,Graduate,No,5000,0,132,360,1,Rural,Y
LP001977,Male,Yes,1,Graduate,No,1625,1803,96,360,1,Urban,Y
LP001978,Male,No,0,Graduate,No,4000,2500,140,360,1,Rural,Y
LP001990,Male,No,0,Not Graduate,No,2000,0,,360,1,Urban,N
LP001993,Female,No,0,Graduate,No,3762,1666,135,360,1,Rural,Y
LP001994,Female,No,0,Graduate,No,2400,1863,104,360,0,Urban,N
LP001996,Male,No,0,Graduate,No,20233,0,480,360,1,Rural,N
LP001998,Male,Yes,2,Not Graduate,No,7667,0,185,360,,Rural,Y
LP002002,Female,No,0,Graduate,No,2917,0,84,360,1,Semiurban,Y
LP002004,Male,No,0,Not Graduate,No,2927,2405,111,360,1,Semiurban,Y
LP002006,Female,No,0,Graduate,No,2507,0,56,360,1,Rural,Y
LP002008,Male,Yes,2,Graduate,Yes,5746,0,144,84,,Rural,Y
LP002024,,Yes,0,Graduate,No,2473,1843,159,360,1,Rural,N
LP002031,Male,Yes,1,Not Graduate,No,3399,1640,111,180,1,Urban,Y
LP002035,Male,Yes,2,Graduate,No,3717,0,120,360,1,Semiurban,Y
LP002036,Male,Yes,0,Graduate,No,2058,2134,88,360,,Urban,Y
LP002043,Female,No,1,Graduate,No,3541,0,112,360,,Semiurban,Y
LP002050,Male,Yes,1,Graduate,Yes,10000,0,155,360,1,Rural,N
LP002051,Male,Yes,0,Graduate,No,2400,2167,115,360,1,Semiurban,Y
LP002053,Male,Yes,3+,Graduate,No,4342,189,124,360,1,Semiurban,Y
LP002054,Male,Yes,2,Not Graduate,No,3601,1590,,360,1,Rural,Y
LP002055,Female,No,0,Graduate,No,3166,2985,132,360,,Rural,Y
LP002065,Male,Yes,3+,Graduate,No,15000,0,300,360,1,Rural,Y
LP002067,Male,Yes,1,Graduate,Yes,8666,4983,376,360,0,Rural,N
LP002068,Male,No,0,Graduate,No,4917,0,130,360,0,Rural,Y
LP002082,Male,Yes,0,Graduate,Yes,5818,2160,184,360,1,Semiurban,Y
LP002086,Female,Yes,0,Graduate,No,4333,2451,110,360,1,Urban,N
LP002087,Female,No,0,Graduate,No,2500,0,67,360,1,Urban,Y
LP002097,Male,No,1,Graduate,No,4384,1793,117,360,1,Urban,Y
LP002098,Male,No,0,Graduate,No,2935,0,98,360,1,Semiurban,Y
LP002100,Male,No,,Graduate,No,2833,0,71,360,1,Urban,Y
LP002101,Male,Yes,0,Graduate,,63337,0,490,180,1,Urban,Y
LP002103,,Yes,1,Graduate,Yes,9833,1833,182,180,1,Urban,Y
LP002106,Male,Yes,,Graduate,Yes,5503,4490,70,,1,Semiurban,Y
LP002110,Male,Yes,1,Graduate,,5250,688,160,360,1,Rural,Y
LP002112,Male,Yes,2,Graduate,Yes,2500,4600,176,360,1,Rural,Y
LP002113,Female,No,3+,Not Graduate,No,1830,0,,360,0,Urban,N
LP002114,Female,No,0,Graduate,No,4160,0,71,360,1,Semiurban,Y
LP002115,Male,Yes,3+,Not Graduate,No,2647,1587,173,360,1,Rural,N
LP002116,Female,No,0,Graduate,No,2378,0,46,360,1,Rural,N
LP002119,Male,Yes,1,Not Graduate,No,4554,1229,158,360,1,Urban,Y
LP002126,Male,Yes,3+,Not Graduate,No,3173,0,74,360,1,Semiurban,Y
LP002128,Male,Yes,2,Graduate,,2583,2330,125,360,1,Rural,Y
LP002129,Male,Yes,0,Graduate,No,2499,2458,160,360,1,Semiurban,Y
LP002130,Male,Yes,,Not Graduate,No,3523,3230,152,360,0,Rural,N
LP002131,Male,Yes,2,Not Graduate,No,3083,2168,126,360,1,Urban,Y
LP002137,Male,Yes,0,Graduate,No,6333,4583,259,360,,Semiurban,Y
LP002138,Male,Yes,0,Graduate,No,2625,6250,187,360,1,Rural,Y
LP002139,Male,Yes,0,Graduate,No,9083,0,228,360,1,Semiurban,Y
LP002140,Male,No,0,Graduate,No,8750,4167,308,360,1,Rural,N
LP002141,Male,Yes,3+,Graduate,No,2666,2083,95,360,1,Rural,Y
LP002142,Female,Yes,0,Graduate,Yes,5500,0,105,360,0,Rural,N
LP002143,Female,Yes,0,Graduate,No,2423,505,130,360,1,Semiurban,Y
LP002144,Female,No,,Graduate,No,3813,0,116,180,1,Urban,Y
LP002149,Male,Yes,2,Graduate,No,8333,3167,165,360,1,Rural,Y
LP002151,Male,Yes,1,Graduate,No,3875,0,67,360,1,Urban,N
LP002158,Male,Yes,0,Not Graduate,No,3000,1666,100,480,0,Urban,N
LP002160,Male,Yes,3+,Graduate,No,5167,3167,200,360,1,Semiurban,Y
LP002161,Female,No,1,Graduate,No,4723,0,81,360,1,Semiurban,N
LP002170,Male,Yes,2,Graduate,No,5000,3667,236,360,1,Semiurban,Y
LP002175,Male,Yes,0,Graduate,No,4750,2333,130,360,1,Urban,Y
LP002178,Male,Yes,0,Graduate,No,3013,3033,95,300,,Urban,Y
LP002180,Male,No,0,Graduate,Yes,6822,0,141,360,1,Rural,Y
LP002181,Male,No,0,Not Graduate,No,6216,0,133,360,1,Rural,N
LP002187,Male,No,0,Graduate,No,2500,0,96,480,1,Semiurban,N
LP002188,Male,No,0,Graduate,No,5124,0,124,,0,Rural,N
LP002190,Male,Yes,1,Graduate,No,6325,0,175,360,1,Semiurban,Y
LP002191,Male,Yes,0,Graduate,No,19730,5266,570,360,1,Rural,N
LP002194,Female,No,0,Graduate,Yes,15759,0,55,360,1,Semiurban,Y
LP002197,Male,Yes,2,Graduate,No,5185,0,155,360,1,Semiurban,Y
LP002201,Male,Yes,2,Graduate,Yes,9323,7873,380,300,1,Rural,Y
LP002205,Male,No,1,Graduate,No,3062,1987,111,180,0,Urban,N
LP002209,Female,No,0,Graduate,,2764,1459,110,360,1,Urban,Y
LP002211,Male,Yes,0,Graduate,No,4817,923,120,180,1,Urban,Y
LP002219,Male,Yes,3+,Graduate,No,8750,4996,130,360,1,Rural,Y
LP002223,Male,Yes,0,Graduate,No,4310,0,130,360,,Semiurban,Y
LP002224,Male,No,0,Graduate,No,3069,0,71,480,1,Urban,N
LP002225,Male,Yes,2,Graduate,No,5391,0,130,360,1,Urban,Y
LP002226,Male,Yes,0,Graduate,,3333,2500,128,360,1,Semiurban,Y
LP002229,Male,No,0,Graduate,No,5941,4232,296,360,1,Semiurban,Y
LP002231,Female,No,0,Graduate,No,6000,0,156,360,1,Urban,Y
LP002234,Male,No,0,Graduate,Yes,7167,0,128,360,1,Urban,Y
LP002236,Male,Yes,2,Graduate,No,4566,0,100,360,1,Urban,N
LP002237,Male,No,1,Graduate,,3667,0,113,180,1,Urban,Y
LP002239,Male,No,0,Not Graduate,No,2346,1600,132,360,1,Semiurban,Y
LP002243,Male,Yes,0,Not Graduate,No,3010,3136,,360,0,Urban,N
LP002244,Male,Yes,0,Graduate,No,2333,2417,136,360,1,Urban,Y
LP002250,Male,Yes,0,Graduate,No,5488,0,125,360,1,Rural,Y
LP002255,Male,No,3+,Graduate,No,9167,0,185,360,1,Rural,Y
LP002262,Male,Yes,3+,Graduate,No,9504,0,275,360,1,Rural,Y
LP002263,Male,Yes,0,Graduate,No,2583,2115,120,360,,Urban,Y
LP002265,Male,Yes,2,Not Graduate,No,1993,1625,113,180,1,Semiurban,Y
LP002266,Male,Yes,2,Graduate,No,3100,1400,113,360,1,Urban,Y
LP002272,Male,Yes,2,Graduate,No,3276,484,135,360,,Semiurban,Y
LP002277,Female,No,0,Graduate,No,3180,0,71,360,0,Urban,N
LP002281,Male,Yes,0,Graduate,No,3033,1459,95,360,1,Urban,Y
LP002284,Male,No,0,Not Graduate,No,3902,1666,109,360,1,Rural,Y
LP002287,Female,No,0,Graduate,No,1500,1800,103,360,0,Semiurban,N
LP002288,Male,Yes,2,Not Graduate,No,2889,0,45,180,0,Urban,N
LP002296,Male,No,0,Not Graduate,No,2755,0,65,300,1,Rural,N
LP002297,Male,No,0,Graduate,No,2500,20000,103,360,1,Semiurban,Y
LP002300,Female,No,0,Not Graduate,No,1963,0,53,360,1,Semiurban,Y
LP002301,Female,No,0,Graduate,Yes,7441,0,194,360,1,Rural,N
LP002305,Female,No,0,Graduate,No,4547,0,115,360,1,Semiurban,Y
LP002308,Male,Yes,0,Not Graduate,No,2167,2400,115,360,1,Urban,Y
LP002314,Female,No,0,Not Graduate,No,2213,0,66,360,1,Rural,Y
LP002315,Male,Yes,1,Graduate,No,8300,0,152,300,0,Semiurban,N
LP002317,Male,Yes,3+,Graduate,No,81000,0,360,360,0,Rural,N
LP002318,Female,No,1,Not Graduate,Yes,3867,0,62,360,1,Semiurban,N
LP002319,Male,Yes,0,Graduate,,6256,0,160,360,,Urban,Y
LP002328,Male,Yes,0,Not Graduate,No,6096,0,218,360,0,Rural,N
LP002332,Male,Yes,0,Not Graduate,No,2253,2033,110,360,1,Rural,Y
LP002335,Female,Yes,0,Not Graduate,No,2149,3237,178,360,0,Semiurban,N
LP002337,Female,No,0,Graduate,No,2995,0,60,360,1,Urban,Y
LP002341,Female,No,1,Graduate,No,2600,0,160,360,1,Urban,N
LP002342,Male,Yes,2,Graduate,Yes,1600,20000,239,360,1,Urban,N
LP002345,Male,Yes,0,Graduate,No,1025,2773,112,360,1,Rural,Y
LP002347,Male,Yes,0,Graduate,No,3246,1417,138,360,1,Semiurban,Y
LP002348,Male,Yes,0,Graduate,No,5829,0,138,360,1,Rural,Y
LP002357,Female,No,0,Not Graduate,No,2720,0,80,,0,Urban,N
LP002361,Male,Yes,0,Graduate,No,1820,1719,100,360,1,Urban,Y
LP002362,Male,Yes,1,Graduate,No,7250,1667,110,,0,Urban,N
LP002364,Male,Yes,0,Graduate,No,14880,0,96,360,1,Semiurban,Y
LP002366,Male,Yes,0,Graduate,No,2666,4300,121,360,1,Rural,Y
LP002367,Female,No,1,Not Graduate,No,4606,0,81,360,1,Rural,N
LP002368,Male,Yes,2,Graduate,No,5935,0,133,360,1,Semiurban,Y
LP002369,Male,Yes,0,Graduate,No,2920,16.12000084,87,360,1,Rural,Y
LP002370,Male,No,0,Not Graduate,No,2717,0,60,180,1,Urban,Y
LP002377,Female,No,1,Graduate,Yes,8624,0,150,360,1,Semiurban,Y
LP002379,Male,No,0,Graduate,No,6500,0,105,360,0,Rural,N
LP002386,Male,No,0,Graduate,,12876,0,405,360,1,Semiurban,Y
LP002387,Male,Yes,0,Graduate,No,2425,2340,143,360,1,Semiurban,Y
LP002390,Male,No,0,Graduate,No,3750,0,100,360,1,Urban,Y
LP002393,Female,,,Graduate,No,10047,0,,240,1,Semiurban,Y
LP002398,Male,No,0,Graduate,No,1926,1851,50,360,1,Semiurban,Y
LP002401,Male,Yes,0,Graduate,No,2213,1125,,360,1,Urban,Y
LP002403,Male,No,0,Graduate,Yes,10416,0,187,360,0,Urban,N
LP002407,Female,Yes,0,Not Graduate,Yes,7142,0,138,360,1,Rural,Y
LP002408,Male,No,0,Graduate,No,3660,5064,187,360,1,Semiurban,Y
LP002409,Male,Yes,0,Graduate,No,7901,1833,180,360,1,Rural,Y
LP002418,Male,No,3+,Not Graduate,No,4707,1993,148,360,1,Semiurban,Y
LP002422,Male,No,1,Graduate,No,37719,0,152,360,1,Semiurban,Y
LP002424,Male,Yes,0,Graduate,No,7333,8333,175,300,,Rural,Y
LP002429,Male,Yes,1,Graduate,Yes,3466,1210,130,360,1,Rural,Y
LP002434,Male,Yes,2,Not Graduate,No,4652,0,110,360,1,Rural,Y
LP002435,Male,Yes,0,Graduate,,3539,1376,55,360,1,Rural,N
LP002443,Male,Yes,2,Graduate,No,3340,1710,150,360,0,Rural,N
LP002444,Male,No,1,Not Graduate,Yes,2769,1542,190,360,,Semiurban,N
LP002446,Male,Yes,2,Not Graduate,No,2309,1255,125,360,0,Rural,N
LP002447,Male,Yes,2,Not Graduate,No,1958,1456,60,300,,Urban,Y
LP002448,Male,Yes,0,Graduate,No,3948,1733,149,360,0,Rural,N
LP002449,Male,Yes,0,Graduate,No,2483,2466,90,180,0,Rural,Y
LP002453,Male,No,0,Graduate,Yes,7085,0,84,360,1,Semiurban,Y
LP002455,Male,Yes,2,Graduate,No,3859,0,96,360,1,Semiurban,Y
LP002459,Male,Yes,0,Graduate,No,4301,0,118,360,1,Urban,Y
LP002467,Male,Yes,0,Graduate,No,3708,2569,173,360,1,Urban,N
LP002472,Male,No,2,Graduate,No,4354,0,136,360,1,Rural,Y
LP002473,Male,Yes,0,Graduate,No,8334,0,160,360,1,Semiurban,N
LP002478,,Yes,0,Graduate,Yes,2083,4083,160,360,,Semiurban,Y
LP002484,Male,Yes,3+,Graduate,No,7740,0,128,180,1,Urban,Y
LP002487,Male,Yes,0,Graduate,No,3015,2188,153,360,1,Rural,Y
LP002489,Female,No,1,Not Graduate,,5191,0,132,360,1,Semiurban,Y
LP002493,Male,No,0,Graduate,No,4166,0,98,360,0,Semiurban,N
LP002494,Male,No,0,Graduate,No,6000,0,140,360,1,Rural,Y
LP002500,Male,Yes,3+,Not Graduate,No,2947,1664,70,180,0,Urban,N
LP002501,,Yes,0,Graduate,No,16692,0,110,360,1,Semiurban,Y
LP002502,Female,Yes,2,Not Graduate,,210,2917,98,360,1,Semiurban,Y
LP002505,Male,Yes,0,Graduate,No,4333,2451,110,360,1,Urban,N
LP002515,Male,Yes,1,Graduate,Yes,3450,2079,162,360,1,Semiurban,Y
LP002517,Male,Yes,1,Not Graduate,No,2653,1500,113,180,0,Rural,N
LP002519,Male,Yes,3+,Graduate,No,4691,0,100,360,1,Semiurban,Y
LP002522,Female,No,0,Graduate,Yes,2500,0,93,360,,Urban,Y
LP002524,Male,No,2,Graduate,No,5532,4648,162,360,1,Rural,Y
LP002527,Male,Yes,2,Graduate,Yes,16525,1014,150,360,1,Rural,Y
LP002529,Male,Yes,2,Graduate,No,6700,1750,230,300,1,Semiurban,Y
LP002530,,Yes,2,Graduate,No,2873,1872,132,360,0,Semiurban,N
LP002531,Male,Yes,1,Graduate,Yes,16667,2250,86,360,1,Semiurban,Y
LP002533,Male,Yes,2,Graduate,No,2947,1603,,360,1,Urban,N
LP002534,Female,No,0,Not Graduate,No,4350,0,154,360,1,Rural,Y
LP002536,Male,Yes,3+,Not Graduate,No,3095,0,113,360,1,Rural,Y
LP002537,Male,Yes,0,Graduate,No,2083,3150,128,360,1,Semiurban,Y
LP002541,Male,Yes,0,Graduate,No,10833,0,234,360,1,Semiurban,Y
LP002543,Male,Yes,2,Graduate,No,8333,0,246,360,1,Semiurban,Y
LP002544,Male,Yes,1,Not Graduate,No,1958,2436,131,360,1,Rural,Y
LP002545,Male,No,2,Graduate,No,3547,0,80,360,0,Rural,N
LP002547,Male,Yes,1,Graduate,No,18333,0,500,360,1,Urban,N
LP002555,Male,Yes,2,Graduate,Yes,4583,2083,160,360,1,Semiurban,Y
LP002556,Male,No,0,Graduate,No,2435,0,75,360,1,Urban,N
LP002560,Male,No,0,Not Graduate,No,2699,2785,96,360,,Semiurban,Y
LP002562,Male,Yes,1,Not Graduate,No,5333,1131,186,360,,Urban,Y
LP002571,Male,No,0,Not Graduate,No,3691,0,110,360,1,Rural,Y
LP002582,Female,No,0,Not Graduate,Yes,17263,0,225,360,1,Semiurban,Y
LP002585,Male,Yes,0,Graduate,No,3597,2157,119,360,0,Rural,N
LP002586,Female,Yes,1,Graduate,No,3326,913,105,84,1,Semiurban,Y
LP002587,Male,Yes,0,Not Graduate,No,2600,1700,107,360,1,Rural,Y
LP002588,Male,Yes,0,Graduate,No,4625,2857,111,12,,Urban,Y
LP002600,Male,Yes,1,Graduate,Yes,2895,0,95,360,1,Semiurban,Y
LP002602,Male,No,0,Graduate,No,6283,4416,209,360,0,Rural,N
LP002603,Female,No,0,Graduate,No,645,3683,113,480,1,Rural,Y
LP002606,Female,No,0,Graduate,No,3159,0,100,360,1,Semiurban,Y
LP002615,Male,Yes,2,Graduate,No,4865,5624,208,360,1,Semiurban,Y
LP002618,Male,Yes,1,Not Graduate,No,4050,5302,138,360,,Rural,N
LP002619,Male,Yes,0,Not Graduate,No,3814,1483,124,300,1,Semiurban,Y
LP002622,Male,Yes,2,Graduate,No,3510,4416,243,360,1,Rural,Y
LP002624,Male,Yes,0,Graduate,No,20833,6667,480,360,,Urban,Y
LP002625,,No,0,Graduate,No,3583,0,96,360,1,Urban,N
LP002626,Male,Yes,0,Graduate,Yes,2479,3013,188,360,1,Urban,Y
LP002634,Female,No,1,Graduate,No,13262,0,40,360,1,Urban,Y
LP002637,Male,No,0,Not Graduate,No,3598,1287,100,360,1,Rural,N
LP002640,Male,Yes,1,Graduate,No,6065,2004,250,360,1,Semiurban,Y
LP002643,Male,Yes,2,Graduate,No,3283,2035,148,360,1,Urban,Y
LP002648,Male,Yes,0,Graduate,No,2130,6666,70,180,1,Semiurban,N
LP002652,Male,No,0,Graduate,No,5815,3666,311,360,1,Rural,N
LP002659,Male,Yes,3+,Graduate,No,3466,3428,150,360,1,Rural,Y
LP002670,Female,Yes,2,Graduate,No,2031,1632,113,480,1,Semiurban,Y
LP002682,Male,Yes,,Not Graduate,No,3074,1800,123,360,0,Semiurban,N
LP002683,Male,No,0,Graduate,No,4683,1915,185,360,1,Semiurban,N
LP002684,Female,No,0,Not Graduate,No,3400,0,95,360,1,Rural,N
LP002689,Male,Yes,2,Not Graduate,No,2192,1742,45,360,1,Semiurban,Y
LP002690,Male,No,0,Graduate,No,2500,0,55,360,1,Semiurban,Y
LP002692,Male,Yes,3+,Graduate,Yes,5677,1424,100,360,1,Rural,Y
LP002693,Male,Yes,2,Graduate,Yes,7948,7166,480,360,1,Rural,Y
LP002697,Male,No,0,Graduate,No,4680,2087,,360,1,Semiurban,N
LP002699,Male,Yes,2,Graduate,Yes,17500,0,400,360,1,Rural,Y
LP002705,Male,Yes,0,Graduate,No,3775,0,110,360,1,Semiurban,Y
LP002706,Male,Yes,1,Not Graduate,No,5285,1430,161,360,0,Semiurban,Y
LP002714,Male,No,1,Not Graduate,No,2679,1302,94,360,1,Semiurban,Y
LP002716,Male,No,0,Not Graduate,No,6783,0,130,360,1,Semiurban,Y
LP002717,Male,Yes,0,Graduate,No,1025,5500,216,360,,Rural,Y
LP002720,Male,Yes,3+,Graduate,No,4281,0,100,360,1,Urban,Y
LP002723,Male,No,2,Graduate,No,3588,0,110,360,0,Rural,N
LP002729,Male,No,1,Graduate,No,11250,0,196,360,,Semiurban,N
LP002731,Female,No,0,Not Graduate,Yes,18165,0,125,360,1,Urban,Y
LP002732,Male,No,0,Not Graduate,,2550,2042,126,360,1,Rural,Y
LP002734,Male,Yes,0,Graduate,No,6133,3906,324,360,1,Urban,Y
LP002738,Male,No,2,Graduate,No,3617,0,107,360,1,Semiurban,Y
LP002739,Male,Yes,0,Not Graduate,No,2917,536,66,360,1,Rural,N
LP002740,Male,Yes,3+,Graduate,No,6417,0,157,180,1,Rural,Y
LP002741,Female,Yes,1,Graduate,No,4608,2845,140,180,1,Semiurban,Y
LP002743,Female,No,0,Graduate,No,2138,0,99,360,0,Semiurban,N
LP002753,Female,No,1,Graduate,,3652,0,95,360,1,Semiurban,Y
LP002755,Male,Yes,1,Not Graduate,No,2239,2524,128,360,1,Urban,Y
LP002757,Female,Yes,0,Not Graduate,No,3017,663,102,360,,Semiurban,Y
LP002767,Male,Yes,0,Graduate,No,2768,1950,155,360,1,Rural,Y
LP002768,Male,No,0,Not Graduate,No,3358,0,80,36,1,Semiurban,N
LP002772,Male,No,0,Graduate,No,2526,1783,145,360,1,Rural,Y
LP002776,Female,No,0,Graduate,No,5000,0,103,360,0,Semiurban,N
LP002777,Male,Yes,0,Graduate,No,2785,2016,110,360,1,Rural,Y
LP002778,Male,Yes,2,Graduate,Yes,6633,0,,360,0,Rural,N
LP002784,Male,Yes,1,Not Graduate,No,2492,2375,,360,1,Rural,Y
LP002785,Male,Yes,1,Graduate,No,3333,3250,158,360,1,Urban,Y
LP002788,Male,Yes,0,Not Graduate,No,2454,2333,181,360,0,Urban,N
LP002789,Male,Yes,0,Graduate,No,3593,4266,132,180,0,Rural,N
LP002792,Male,Yes,1,Graduate,No,5468,1032,26,360,1,Semiurban,Y
LP002794,Female,No,0,Graduate,No,2667,1625,84,360,,Urban,Y
LP002795,Male,Yes,3+,Graduate,Yes,10139,0,260,360,1,Semiurban,Y
LP002798,Male,Yes,0,Graduate,No,3887,2669,162,360,1,Semiurban,Y
LP002804,Female,Yes,0,Graduate,No,4180,2306,182,360,1,Semiurban,Y
LP002807,Male,Yes,2,Not Graduate,No,3675,242,108,360,1,Semiurban,Y
LP002813,Female,Yes,1,Graduate,Yes,19484,0,600,360,1,Semiurban,Y
LP002820,Male,Yes,0,Graduate,No,5923,2054,211,360,1,Rural,Y
LP002821,Male,No,0,Not Graduate,Yes,5800,0,132,360,1,Semiurban,Y
LP002832,Male,Yes,2,Graduate,No,8799,0,258,360,0,Urban,N
LP002833,Male,Yes,0,Not Graduate,No,4467,0,120,360,,Rural,Y
LP002836,Male,No,0,Graduate,No,3333,0,70,360,1,Urban,Y
LP002837,Male,Yes,3+,Graduate,No,3400,2500,123,360,0,Rural,N
LP002840,Female,No,0,Graduate,No,2378,0,9,360,1,Urban,N
LP002841,Male,Yes,0,Graduate,No,3166,2064,104,360,0,Urban,N
LP002842,Male,Yes,1,Graduate,No,3417,1750,186,360,1,Urban,Y
LP002847,Male,Yes,,Graduate,No,5116,1451,165,360,0,Urban,N
LP002855,Male,Yes,2,Graduate,No,16666,0,275,360,1,Urban,Y
LP002862,Male,Yes,2,Not Graduate,No,6125,1625,187,480,1,Semiurban,N
LP002863,Male,Yes,3+,Graduate,No,6406,0,150,360,1,Semiurban,N
LP002868,Male,Yes,2,Graduate,No,3159,461,108,84,1,Urban,Y
LP002872,,Yes,0,Graduate,No,3087,2210,136,360,0,Semiurban,N
LP002874,Male,No,0,Graduate,No,3229,2739,110,360,1,Urban,Y
LP002877,Male,Yes,1,Graduate,No,1782,2232,107,360,1,Rural,Y
LP002888,Male,No,0,Graduate,,3182,2917,161,360,1,Urban,Y
LP002892,Male,Yes,2,Graduate,No,6540,0,205,360,1,Semiurban,Y
LP002893,Male,No,0,Graduate,No,1836,33837,90,360,1,Urban,N
LP002894,Female,Yes,0,Graduate,No,3166,0,36,360,1,Semiurban,Y
LP002898,Male,Yes,1,Graduate,No,1880,0,61,360,,Rural,N
LP002911,Male,Yes,1,Graduate,No,2787,1917,146,360,0,Rural,N
LP002912,Male,Yes,1,Graduate,No,4283,3000,172,84,1,Rural,N
LP002916,Male,Yes,0,Graduate,No,2297,1522,104,360,1,Urban,Y
LP002917,Female,No,0,Not Graduate,No,2165,0,70,360,1,Semiurban,Y
LP002925,,No,0,Graduate,No,4750,0,94,360,1,Semiurban,Y
LP002926,Male,Yes,2,Graduate,Yes,2726,0,106,360,0,Semiurban,N
LP002928,Male,Yes,0,Graduate,No,3000,3416,56,180,1,Semiurban,Y
LP002931,Male,Yes,2,Graduate,Yes,6000,0,205,240,1,Semiurban,N
LP002933,,No,3+,Graduate,Yes,9357,0,292,360,1,Semiurban,Y
LP002936,Male,Yes,0,Graduate,No,3859,3300,142,180,1,Rural,Y
LP002938,Male,Yes,0,Graduate,Yes,16120,0,260,360,1,Urban,Y
LP002940,Male,No,0,Not Graduate,No,3833,0,110,360,1,Rural,Y
LP002941,Male,Yes,2,Not Graduate,Yes,6383,1000,187,360,1,Rural,N
LP002943,Male,No,,Graduate,No,2987,0,88,360,0,Semiurban,N
LP002945,Male,Yes,0,Graduate,Yes,9963,0,180,360,1,Rural,Y
LP002948,Male,Yes,2,Graduate,No,5780,0,192,360,1,Urban,Y
LP002949,Female,No,3+,Graduate,,416,41667,350,180,,Urban,N
LP002950,Male,Yes,0,Not Graduate,,2894,2792,155,360,1,Rural,Y
LP002953,Male,Yes,3+,Graduate,No,5703,0,128,360,1,Urban,Y
LP002958,Male,No,0,Graduate,No,3676,4301,172,360,1,Rural,Y
LP002959,Female,Yes,1,Graduate,No,12000,0,496,360,1,Semiurban,Y
LP002960,Male,Yes,0,Not Graduate,No,2400,3800,,180,1,Urban,N
LP002961,Male,Yes,1,Graduate,No,3400,2500,173,360,1,Semiurban,Y
LP002964,Male,Yes,2,Not Graduate,No,3987,1411,157,360,1,Rural,Y
LP002974,Male,Yes,0,Graduate,No,3232,1950,108,360,1,Rural,Y
LP002978,Female,No,0,Graduate,No,2900,0,71,360,1,Rural,Y
LP002979,Male,Yes,3+,Graduate,No,4106,0,40,180,1,Rural,Y
LP002983,Male,Yes,1,Graduate,No,8072,240,253,360,1,Urban,Y
LP002984,Male,Yes,2,Graduate,No,7583,0,187,360,1,Urban,Y
LP002990,Female,No,0,Graduate,Yes,4583,0,133,360,0,Semiurban,N
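The loan records above contain blank fields (for example, `Credit_History` is empty in row LP002560 and `LoanAmount` in others), and the sklearn-style models listed in the install notes need those gaps filled before training. A minimal pandas imputation sketch; the column names and the `impute` helper are assumptions of this sketch, since the CSV header row is outside this excerpt:

```python
import io

import pandas as pd

# Assumed column names -- the header row is not visible in this excerpt,
# so these follow the usual layout of the fields shown above.
COLS = ["Loan_ID", "Gender", "Married", "Dependents", "Education",
        "Self_Employed", "ApplicantIncome", "CoapplicantIncome", "LoanAmount",
        "Loan_Amount_Term", "Credit_History", "Property_Area", "Loan_Status"]

def impute(df):
    """Fill numeric gaps with the column median, categorical gaps with the mode."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        elif df[col].isna().any():
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    return df

# Two rows copied from the data above; the second is missing Credit_History.
raw = io.StringIO(
    "LP002536,Male,Yes,3+,Not Graduate,No,3095,0,113,360,1,Rural,Y\n"
    "LP002560,Male,No,0,Not Graduate,No,2699,2785,96,360,,Semiurban,Y\n"
)
df = impute(pd.read_csv(raw, names=COLS))
print(df["Credit_History"].tolist())  # prints [1.0, 1.0]
```

Median/mode imputation is only a baseline; the tree libraries from the install list (xgboost, lightgbm, catboost) can also handle missing values natively.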
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Graph Traversal"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"\n",
"def BFS(graph, s):\n",
"    queue = [s]  # FIFO queue of vertices still to expand\n",
"    seen = {s}   # vertices already discovered, so each is visited only once\n",
"    while len(queue) > 0:\n",
"        vertex = queue.pop(0)\n",
"        nodes = graph[vertex]\n",
"        for w in nodes:\n",
"            if w not in seen:\n",
"                queue.append(w)\n",
"                seen.add(w)\n",
"    print(seen)\n",
"\n",
"\n",
"graph = {\n",
"    \"A\": [\"B\", \"C\"],\n",
"    \"B\": [\"A\", \"C\", \"D\"],\n",
"    \"C\": [\"A\", \"B\", \"E\", \"D\"],\n",
"    \"D\": [\"B\", \"C\", \"E\", \"F\"],\n",
"    \"F\": [\"D\"],\n",
"    \"E\": [\"C\", \"D\"],\n",
"}\n",
"\n",
"BFS(graph, \"F\")\n",
"\n",
"# Queue-based level-order traversal of a binary tree, for comparison:\n",
"# def breadth_travel(root):\n",
"#     if root is None:\n",
"#         return\n",
"#     queue = [root]\n",
"#     while queue:\n",
"#         node = queue.pop(0)\n",
"#         print(node.elem)\n",
"#         if node.lchild is not None:\n",
"#             queue.append(node.lchild)\n",
"#         if node.rchild is not None:\n",
"#             queue.append(node.rchild)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dijkstra with a Heap"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import heapq as hp\n",
"import math\n",
"\n",
"graph = {\n",
"\n",
" \"A\": {\"B\": 5, \"C\": 1},\n",
" \"B\": {\"A\": 5, \"C\": 2, \"D\": 1},\n",
" \"C\": {\"A\": 1, \"B\": 2, \"E\": 8, \"D\": 4},\n",
" \"D\": {\"B\": 1, \"C\": 4, \"E\": 3, \"F\": 6},\n",
" \"F\": {\"D\": 6},\n",
" \"E\": {\"C\": 8, \"D\": 3},\n",
"}\n",
"\n",
"\n",
"def init_distance(graph, s):\n",
" distance = {s: 0}\n",
" for key in graph:\n",
" if key != s:\n",
" distance[key] = math.inf\n",
" return distance\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def dijkstra(graph, s):\n",
"    pqueue = []\n",
"    hp.heappush(pqueue, (0, s))  # priority queue of (distance, vertex) pairs\n",
"    seen = set()                 # vertices whose shortest distance is settled\n",
"    distance = init_distance(graph, s)\n",
"    print(distance)\n",
"    while len(pqueue) > 0:\n",
"        dist, node = hp.heappop(pqueue)\n",
"        if node in seen:\n",
"            continue  # stale heap entry; this vertex was already settled\n",
"        seen.add(node)\n",
"        print(\"seen: \", seen)\n",
"        nodes = graph[node].keys()  # neighbours of the settled vertex\n",
"        print(\"nodes: \", nodes)\n",
"        # relax every edge leaving node\n",
"        for w in nodes:\n",
"            if dist + graph[node][w] < distance[w]:\n",
"                hp.heappush(pqueue, (dist + graph[node][w], w))\n",
"                distance[w] = dist + graph[node][w]\n",
"                print(f\"change distance for {w}: \", distance)\n",
"    return distance\n",
"\n",
"\n",
"d = dijkstra(graph, \"A\")\n",
"print(d)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dijkstra over an Adjacency Matrix"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Inf = float('inf')\n",
"Adjacent = [[0, 5, 1, Inf, Inf, Inf],\n",
" [5, 0, 2, 1, Inf, Inf],\n",
" [1, 2, 0, 4, 8, Inf],\n",
" [Inf, 1, 4, 0, 3, 6],\n",
" [Inf, Inf, 8, 3, 0, Inf],\n",
" [Inf, Inf, Inf, 6, Inf, 0]]\n",
"Src, Dst, N = 0, 5, 6\n",
"\n",
"\n",
"# Array-based Dijkstra over an adjacency matrix, O(n^2)\n",
"def dijkstra(adj, src, dst, n):\n",
"    dist = [Inf] * n\n",
"    dist[src] = 0\n",
"    book = [0] * n  # book[v] == 1 once v's shortest distance is settled\n",
"    u = src\n",
"    for _ in range(n - 1):  # settle at most n-1 further vertices\n",
"        book[u] = 1\n",
"        # relax every edge leaving u\n",
"        for v in range(n):\n",
"            w = adj[u][v]\n",
"            if w == Inf:  # no edge between u and v\n",
"                continue\n",
"            if not book[v] and dist[u] + w < dist[v]:\n",
"                dist[v] = dist[u] + w\n",
"        # the next vertex to settle is the closest unsettled one\n",
"        next_u, minVal = None, Inf\n",
"        for v in range(n):\n",
"            if not book[v] and dist[v] < minVal:\n",
"                next_u, minVal = v, dist[v]\n",
"        if next_u is None:  # remaining vertices are unreachable\n",
"            break\n",
"        u = next_u\n",
"    print(dist)\n",
"    return dist[dst]\n",
"\n",
"\n",
"dijkstra(Adjacent, Src, Dst, N)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Simulated Annealing"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from __future__ import division\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import math\n",
" \n",
"# define the objective function\n",
"def aimFunction(x):\n",
"    y = x**3 - 60*x**2 - 4*x + 6\n",
"    return y\n",
"\n",
"# evaluate it on a grid over [0, 100)\n",
"x = [i/10 for i in range(1000)]\n",
"y = [aimFunction(v) for v in x]\n",
"\n",
"plt.plot(x, y)\n",
"plt.show()\n",
"\n",
"idx = y.index(min(y))  # grid index of the smallest objective value\n",
"print('grid minimum: x =', x[idx], ', y =', min(y))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"T = 1000    # initial temperature\n",
"Tmin = 10   # stopping temperature\n",
"x = np.random.uniform(low=0, high=100)  # initial solution\n",
"k = 50      # inner-loop iterations at each temperature\n",
"y = 0       # objective value of the current solution\n",
"t = 0       # cooling step counter\n",
"while T >= Tmin:\n",
"    for i in range(k):\n",
"        y = aimFunction(x)\n",
"        # propose a new x in a temperature-scaled neighborhood of x\n",
"        xNew = x + np.random.uniform(low=-0.055, high=0.055) * T\n",
"        if 0 <= xNew <= 100:\n",
"            yNew = aimFunction(xNew)\n",
"            if yNew - y < 0:\n",
"                x = xNew  # always accept an improvement\n",
"            else:\n",
"                # Metropolis criterion: accept a worse move with probability p\n",
"                p = math.exp(-(yNew - y) / T)\n",
"                r = np.random.uniform(low=0, high=1)\n",
"                if r < p:\n",
"                    x = xNew\n",
"    t += 1\n",
"    T = 1000 / (1 + t)  # cooling schedule\n",
"\n",
"print(x, aimFunction(x))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
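The simulated-annealing cells in the notebook above depend on numpy and matplotlib; the same Metropolis loop can be sketched with only the standard library, keeping the notebook's objective f(x) = x^3 - 60*x^2 - 4*x + 6 on [0, 100] and its 1000/(1+t) cooling schedule. This is an illustrative sketch (the `aim_function` name and the fixed seed are choices of this sketch), not the course's reference solution:

```python
import math
import random

random.seed(0)  # fixed seed for a reproducible run

def aim_function(x):
    """Notebook objective; unimodal on [0, 100], minimum near x = 40 where f(40) = -32154."""
    return x ** 3 - 60 * x ** 2 - 4 * x + 6

T, Tmin = 1000.0, 10.0        # initial and stopping temperature
x = random.uniform(0, 100)    # initial solution
t = 0                         # cooling step counter
while T >= Tmin:
    for _ in range(50):       # inner iterations at each temperature
        y = aim_function(x)
        # propose a neighbour; the step size shrinks with the temperature
        x_new = x + random.uniform(-0.055, 0.055) * T
        if 0 <= x_new <= 100:
            y_new = aim_function(x_new)
            # accept improvements always, worse moves with Metropolis probability
            if y_new < y or random.random() < math.exp(-(y_new - y) / T):
                x = x_new
    t += 1
    T = 1000 / (1 + t)        # cooling schedule from the notebook

print(round(x, 1), round(aim_function(x)))
```

Because the objective is unimodal on [0, 100], the run settles close to the grid minimum (x ≈ 40, f ≈ -32154) found analytically in the notebook.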
{
@@ -9,17 +9,9 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'B', 'C', 'D', 'E', 'A', 'F'}\n"
]
}
],
"outputs": [],
"source": [
"\n",
"def BFS(graph, s):\n",
@@ -116,7 +108,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"<module 'heapq' from '/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/heapq.py'>\n",
"<module 'heapq' from 'F:\\\\Installpath\\\\Anaconda3\\\\lib\\\\heapq.py'>\n",
"{'A': 0, 'B': inf, 'C': inf, 'D': inf, 'F': inf, 'E': inf}\n",
"seen: {'A'}\n",
"nodes: dict_keys(['B', 'C'])\n",
@@ -130,19 +122,19 @@
"seen: {'A', 'B', 'C'}\n",
"nodes: dict_keys(['A', 'C', 'D'])\n",
"change distance for D: {'A': 0, 'B': 3, 'C': 1, 'D': 4, 'F': inf, 'E': 9}\n",
"seen: {'D', 'A', 'B', 'C'}\n",
"seen: {'A', 'D', 'B', 'C'}\n",
"nodes: dict_keys(['B', 'C', 'E', 'F'])\n",
"change distance for E: {'A': 0, 'B': 3, 'C': 1, 'D': 4, 'F': inf, 'E': 7}\n",
"change distance for F: {'A': 0, 'B': 3, 'C': 1, 'D': 4, 'F': 10, 'E': 7}\n",
"seen: {'D', 'A', 'B', 'C'}\n",
"seen: {'A', 'D', 'B', 'C'}\n",
"nodes: dict_keys(['A', 'C', 'D'])\n",
"seen: {'D', 'A', 'B', 'C'}\n",
"seen: {'A', 'D', 'B', 'C'}\n",
"nodes: dict_keys(['B', 'C', 'E', 'F'])\n",
"seen: {'B', 'D', 'C', 'E', 'A'}\n",
"seen: {'D', 'E', 'A', 'C', 'B'}\n",
"nodes: dict_keys(['C', 'D'])\n",
"seen: {'B', 'D', 'C', 'E', 'A'}\n",
"seen: {'D', 'E', 'A', 'C', 'B'}\n",
"nodes: dict_keys(['C', 'D'])\n",
"seen: {'B', 'D', 'C', 'E', 'A', 'F'}\n",
"seen: {'D', 'E', 'A', 'C', 'F', 'B'}\n",
"nodes: dict_keys(['D'])\n",
"{'A': 0, 'B': 3, 'C': 1, 'D': 4, 'F': 10, 'E': 7}\n"
]
@@ -153,14 +145,14 @@
" pqueue = []\n",
" hp.heappush(pqueue, (0, s)) #\n",
" print(hp)\n",
"# seen = set()\n",
" seen = set()\n",
" distance = init_distance(graph, s)\n",
" print(distance)\n",
" while len(pqueue) > 0:\n",
" pair = hp.heappop(pqueue)\n",
" dist = pair[0] # \n",
" node = pair[1] #\n",
"# seen.add(node)\n",
" seen.add(node)\n",
" print(\"seen: \", seen)\n",
" nodes = graph[node].keys() # \n",
" print(\"nodes: \", nodes)\n",
@@ -255,30 +247,9 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "(base64 PNG data elided: matplotlib line plot of the objective function)",
GJIEtWl9EaCHK9zwesD1JIiIhEiGDQ8dQ7Ozm1IJtpYwbmN6yPRCEhIhIh3izZy859zb6f9tqZQkJEJEI89fZORmQkMX/GaL9LOUQhISISAcr2N/P6pj1cfWo+yQnxfpdziEJCRCQCPL1yFwDXnDawv2F9JAoJERGftbR3sHhlKedNG0Xu0FS/yzmMQkJExGdL1+5mX1MbN55R4HcpH6OQEBHxkXOOx97awZRRmZw+yd/7NHXniCFhZilmttLM1prZejP7Z699gpm9a2YlZvaMmSV57cnedIk3v6DTZ93ptW82sws7tc/32krM7I5O7d32ISISK97ZXsPGinpuPLPA9/s0dac3exKtwLnOuZOBU4D5ZjYP+Dlwt3NuMrAfuMlb/iZgv9d+t7ccZjYduBo4EZgPPGBm8WYWD9wPXARMB67xlqWHPkREYsJjb31EdloiC2f5f5+m7hwxJFxIozeZ6D0ccC6wxGt/AljovV7gTePNP89C8bgAWOyca3XOfQSUAHO9R4lzbrtzrg1YDCzw3hOuDxGRqLdrXzOvbtzDtaflk5IYOae9dtarYxLeX/zvA1XAq8A2oNY5F/AWKQMOxmAuUArgza8Dhndu7/KecO3De+ija323mFmxmRVXV1f3ZpVERHz3xNs7iDfj+nkFfpcSVq9CwjnX4Zw7Bcgj9Jf/1H6t6ig55x5yzhU554pycnL8LkdE5IgaWwM8u6qUi08aw+isFL/LCeuozm5yztUCK4DTgaFmluDNygPKvdflwDgAb34WsK9ze5f3hGvf10MfIiJRbUlxKQ2tAW48s8DvUnrUm7ObcsxsqPc6FTgf2EgoLK7wFlsEvOi9XupN481/3TnnvParvbOfJgCFwEpgFVDoncmUROjg9lLvPeH6EBGJWsGg4/G/7mBW/lBm5Wf7XU6PEo68CGOAJ7yzkOKAZ51zL5nZBmCxmf0LsAZ41Fv+UeApMysBagh96eOcW29mzwIbgABwq3OuA8DMvgYsA+KB3znn1nufdXuYPkREotaKzVXs2NfMty6Y4ncpR2ShP9hjR1FRkSsuLva7DBGRsL7wyDtsq2rizdvPITE+Mq5pNrPVzrmiru2RUZ2IyCDxYXkdb5XsY9EZBRETED2J/ApFRGLIw29uJz0pnmsj7G6v4SgkREQGSNn+Zl5aV8E1c/PJSk30u5xeUUiIiAyQ3/1lBwZ86awJfpfSawoJEZEBUNfczuJVu/jMyWMZG2G/GdEThYSIyAD4/cqdNLd1cPPZE/0u5agoJERE+llroIPH3trB2YUjmD52iN/lHBWFhIhIP3txzW6qG1r5yicm+V3KUVNIiIj0o2DQ8dCb25k+ZghnTo68X547EoWEiEg/emNLFSVVjXzlkxMj8pfnjkQhISLSjx58Yxu5Q1O5+KQxfpdyTBQSIiL9ZOVHNazasZ9bPjExKm7B0Z3orFpEJAr8+4oSRmQkcdWp4468cIRSSIiI9IN1ZbX8eUs1N501MWJ/v7o3FBIiIv3ggRXbGJKSwHXzouNGfuEoJERE+tjWPQ28sr6SL55RQGZKdNzILxyFhIhIH3vgjW2kJcVz45nRcyO/cBQSIiJ9aNe+Zpau3c21c/PJTk/yu5zjppAQEelD//HnbcSbcfMnoutGfuEoJERE+khlXQtLisu4oiiPUUNS/C6nTygkRET6yEN/3k6Hc/xdFN7ILxyFhIhIH6iqb+H37+7kc7NyyR+e5nc5fUYhISLSBx54YxuBoOPr5072u5Q+pZAQETlOlXUt/OfKXVw+O5fxw9P9LqdPKSRERI7Tg2+UEAw6vn5uod+l9DmFhIjIcaioO8DTK0u5Yk4e44bFzrGIgxQSIiLH4cE3thF0jlvPia1jEQcpJEREjtHu2gMsXlnK54vGxeReBCgkRESO2QNvlOBwfC3GzmjqTCEhInIMymsP8MyqUq4sGkfu0FS/y+k3CgkRkWNw32tbMYyvxuixiIMUEiIiR6mkqpHnVpdy3b
zxMb0XAQoJEZGj9m+vbiY1MZ5bz4mdezSFo5AQETkK68pq+eMHlXz57IkMz0j2u5x+p5AQETkKv1y2mWHpSXz57Oj/1bneUEiIiPTSWyV7eXPrXr76qUlR/9vVvaWQEBHpBeccv1i2mbFZKVw3b7zf5QyYI4aEmY0zsxVmtsHM1pvZN7z2YWb2qplt9Z6zvXYzs/vMrMTM1pnZ7E6ftchbfquZLerUPsfMPvDec5+ZWU99iIgMtGXr97C2tJbbzj+BlMR4v8sZML3ZkwgA33bOTQfmAbea2XTgDmC5c64QWO5NA1wEFHqPW4AHIfSFD9wFnAbMBe7q9KX/IHBzp/fN99rD9SEiMmACHUF+9b+bmZSTzmWzcv0uZ0AdMSSccxXOufe81w3ARiAXWAA84S32BLDQe70AeNKFvAMMNbMxwIXAq865GufcfuBVYL43b4hz7h3nnAOe7PJZ3fUhIjJglqwuo6Sqke9cMIWE+ME1Sn9Ua2tmBcAs4F1glHOuwptVCYzyXucCpZ3eVua19dRe1k07PfTRta5bzKzYzIqrq6uPZpVERHrU1Brg169uYc74bObPGO13OQOu1yFhZhnA88Btzrn6zvO8PQDXx7Udpqc+nHMPOeeKnHNFOTk5/VmGiAwyv/3TNqobWvnBJdPwDpcOKr0KCTNLJBQQv3fOveA17/GGivCeq7z2cmBcp7fneW09ted1095THyIi/a6i7gAPvbmdS2eOYXb+4DxvpjdnNxnwKLDROfdvnWYtBQ6eobQIeLFT+w3eWU7zgDpvyGgZcIGZZXsHrC8Alnnz6s1sntfXDV0+q7s+RET63a+WbSEYhNvnT/W7FN8k9GKZM4HrgQ/M7H2v7fvAz4BnzewmYCdwpTfvj8DFQAnQDNwI4JyrMbMfA6u85X7knKvxXn8VeBxIBV72HvTQh4hIv/qwvI4X1pRxy9kTY/YHhXrDQkP9saOoqMgVFxf7XYaIRDHnHNc+/C6bKut547vnkJUa+1dXm9lq51xR1/bBdS6XiEgvLN9Yxdvb93Hbp08YFAHRE4WEiEgn7R1BfvLyRibmpHPtafl+l+M7hYSISCdP/HUH26ub+P5F00gcZBfOdUf/BUREPNUNrdz72lY+eUIO500b6Xc5EUEhISLi+cUrm2gJdPCPn5k+KC+c645CQkQEWLNrP8+tLuNLZ05gUk6G3+VEDIWEiAx6waDjn5auZ2RmMl8/r9DvciKKQkJEBr3nVpeytqyOOy+eSkZyb64xHjwUEiIyqNUdaOcXr2ymaHw2C08ZXL8V0RuKTBEZ1O55bQs1zW088dm5OljdDe1JiMigtWF3PU++vZNr5+YzIzfL73IikkJCRAaljqDj+//1AdlpiXz3wil+lxOxFBIiMij957s7eb+0ln+4ZDpD05L8LidiKSREZNCpqm/hF69s5qzJI1hwyli/y4loCgkRGXR+9NIGWjuC/HjhDB2sPgKFhIgMKm9sruKldRV87ZzJTBiR7nc5EU8hISKDxoG2Dn744odMzEnnK5+c6Hc5UUHXSYjIoHHv8q2U1hzg6ZvnkZwQ73c5UUF7EiIyKKwrq+XhN7dzZVEep08a7nc5UUMhISIxry0Q5HtL1jEiI4kfXDLd73KiioabRCTm3b+ihE2VDTy6qGjQ/2b10dKehIjEtI0V9dy/ooSFp4zlvGmj/C4n6igkRCRmBTqCfHfJWoamJXLXZ070u5yopOEmEYlZD725nQ/L63ngC7PJTtetN46F9iREJCZt2dPAPa9t5eKTRnPxSWP8LidqKSREJOa0BYLctvh9MpMT+OfPzvC7nKim4SYRiTn3Lt/Chop6Hrp+DjmZyX6XE9W0JyEiMaV4Rw0PvrGNq4rGccGJo/0uJ+opJEQkZjS2BvjWs2vJzU7lh5/RRXN9QcNNIhIz/uWlDZTtb+bZr5xORrK+3vqC9iREJCa8umEPi1eV8nefnERRwTC/y4kZCgkRiXqVdS3c/vw6po8Zwm
2fPsHvcmKKQkJEolpH0HHbM2toae/gN9fOIilBX2t9SYN2IhLV7l9Rwjvba/jV509mUk6G3+XEHEWuiEStd7fv457XtvC5WblcPjvX73JikkJCRKLS/qY2vrH4fcYPT+fHC2dgZn6XFJM03CQiUcc5x3eXrKWmqY0XFp2h01370RH3JMzsd2ZWZWYfdmobZmavmtlW7znbazczu8/MSsxsnZnN7vSeRd7yW81sUaf2OWb2gfee+8z7cyBcHyIij7z5Ea9trOLOi6cyIzfL73JiWm+Gmx4H5ndpuwNY7pwrBJZ70wAXAYXe4xbgQQh94QN3AacBc4G7On3pPwjc3Ol984/Qh4gMYm9v28fPXtnERTNG88UzCvwuJ+YdMSScc38Garo0LwCe8F4/ASzs1P6kC3kHGGpmY4ALgVedczXOuf3Aq8B8b94Q59w7zjkHPNnls7rrQ0QGqcq6Fr7+9HsUDE/jl58/WcchBsCxHrge5Zyr8F5XAgd/EzAXKO20XJnX1lN7WTftPfXxMWZ2i5kVm1lxdXX1MayOiES6tkCQW//zPZrbOvjt9XN0HGKAHPfZTd4egOuDWo65D+fcQ865IudcUU5OTn+WIiI++ckfN7J6535+ccVMJo/M9LucQeNYQ2KPN1SE91zltZcD4zotl+e19dSe1017T32IyCDzhzXlPP7XHXz5rAlcOnOs3+UMKscaEkuBg2coLQJe7NR+g3eW0zygzhsyWgZcYGbZ3gHrC4Bl3rx6M5vnndV0Q5fP6q4PERlE1pXVcvvz65hbMIzbL5rqdzmDzhEH9czsaeBTwAgzKyN0ltLPgGfN7CZgJ3Clt/gfgYuBEqAZuBHAOVdjZj8GVnnL/cg5d/Bg+FcJnUGVCrzsPeihDxEZJPbUt3Dzk8WMyEjmgetmkxiv638HmoWG+2NHUVGRKy4u9rsMETlOLe0dXPXbt9la1cjz/+8Mpo0Z4ndJMc3MVjvnirq26/QAEYk4zjm+t2Qd68rr+O11cxQQPtK+m4hEnAfe2MbStbv5zgVT9DvVPlNIiEhEefmDCn65bDMLTxnLVz81ye9yBj2FhIhEjOIdNdz2zPvMzh/Kzy6fqSuqI4BCQkQiQklVI19+spjcoak8suhUUhLj/S5JUEiISASoamjhi4+tJCHOePzGuQxLT/K7JPHo7CYR8VVja4AvPb6KmqY2Ft8yj/zhaX6XJJ0oJETEN22BILf+/j02VjTwyA1FzMwb6ndJ0oWGm0TEFx1BxzefeZ8/banmJ5+bwTlTR/pdknRDISEiAy4YdNz5wjr+54MKfnDxNK46Nd/vkiQMhYSIDCjnHD/+nw08W1zG359XyM2fmOh3SdIDhYSIDKi7X9vKY2/t4EtnTuCbny70uxw5AoWEiAyY//jTNu5bvpWrisbxw0un6WK5KKCzm0RkQNy/ooRfLtvMZ04ey08uO0kBESUUEiLS736zfCu/fnULC04Zy68/fzLxcQqIaKGQEJF+dc9rW7jnta1cNiuXXyogoo5CQkT6hXOOu1/byn3Lt3LFnDx+fvlMBUQUUkiISJ8LBh3/+seNPPqXj7iyKI+fXTaTOAVEVFJIiEifau8IcvuSdbywppwvnlHAP146XQERxRQSItJnWto7uPX377F8UxXfPv8EvnbuZJ3FFOUUEiLSJ+oOtHPzE8Ws2lnDjxfO4Pp54/0uSfqAQkJEjlt57QFuenwV26ob+c01s7h05li/S5I+opAQkeOytrSWm54opjXQwWNfnMtZhSP8Lkn6kEJCRI7ZKx9WcNsz7zMiI5mnbz6NwlGZfpckfUwhISJHzTnHw29u56cvb+LkvKE8fEMROZnJfpcl/UAhISJH5UBbBz/4rw94YU05l5w0hl9feTIpifF+lyX9RCEhIr1WWtPMV55azcbKem77dCF/f26hroGIcQoJEemVN7dW8/Wn19ARdDy6qIhzp47yuyQZAAoJEelRMOh48E/b+PX/bqZwZCa/vX4OBS
PS/S5LBohCQkTCqmpo4VvPrOUvJXu5dOYYfn75TNKT9bUxmGhri0i33thcxbefXUtTW4CfXnYSV586TrfYGIQUEiJymLZAkF8u28TDb37E1NGZLL5mnq5/GMQUEiJyyIfldXznubVsqmzg+nnj+cEl03R66yCnkBAR2gJB/n1FCQ+sKGFYehKPLirivGk6e0kUEiKD3vrddXznuXVsrKjnslm53PWZE8lKS/S7LIkQCgmRQaqxNcA9r27hsb/uYFh6Eg/fUMT507X3IIdTSIgMMs45Xv6wkh/99wYq61u4Zu44bp8/laFpSX6XJhEo4kPCzOYD9wLxwCPOuZ/5XJJI1PpobxP/tHQ9f9pSzbQxQ7j/C7OZMz7b77IkgkV0SJhZPHA/cD5QBqwys6XOuQ3+ViYSXfY3tXHv8q38/3d2kpwQxw8vnc6i08eTEB/nd2kS4SI6JIC5QIlzbjuAmS0GFgB9HhKP/uUjNlfWEx8XR2K8kRAXR0K8kRBnJCfEk54cT0ZyAhkpCaQnJ5CZHHrOTktiWHoSSQn6n00iT0t7B0++vYPfvF5CU2uAq07N55vnFzIyM8Xv0iRKRHpI5AKlnabLgNO6LmRmtwC3AOTn5x9TRxsr6vnL1r0Ego5AMEigo/OzO+L7M1MSGJ6exPCMZO85iZGZKYwdmsKYrNRDz7qlgQyE9o4g//VeOfe9vpWy/Qf41JQc7rxoGlNG66I4OTox8Y3lnHsIeAigqKjoyN/o3fjV508OOy/QEaSprYPG1gBNrQEaWwM0toSea5vb2dfYyr6mttCjsZVdNc28t2s/+5racF2qGZKScCg0xg9PZ/zwNAq857zsNO2RyHE5GA6/WbGV0poDnJSbxU8vO4mzC3P8Lk2iVKSHRDkwrtN0ntc2oBLi48hKjSMr9ejOHW8LBNlT30JFXQsVdQfYXfu35921B1j5UQ1NbR2Hlo8zyM1OPRQaBcPTmTwyg8kjMxiblar79ktYLe0d/GFNOQ+8sY1dNc2clJvFPy06kXOnjtT9luS4RHpIrAIKzWwCoXC4GrjW35J6LykhjnHD0hg3LK3b+c459jW1sXNfEzv2Noee94Wel76/m/qWwKFl05LiQ4GRk8HkURkUjsxk8sgM8oelEa/wGLRqmtp46u2dPPXODvY2tjEjdwiP3FDEedMUDtI3IjoknHMBM/sasIzQKbC/c86t97msPmNmjMhIZkRGMnPGD/vY/JqmNkqqGtla1cDWPY1sq27kr9v28cKav+1MJSXEMXFEOlNGZzJ19BCmjslk6uhMRg9J0ZdEDNtUWc+Tb+/k+dVltAaCnDMlh5vPnsjpk4Zru0ufMtd10DzKFRUVueLiYr/L6Ff1Le1sq2pka1VjKET2NLC5soHddS2HlslKTWTq6EymjRniBUgmU0ZnkpYU0X8XSA+a2wK8tK6Cp1fuYs2uWpIS4vjcKbl8+ewJukurHDczW+2cK+rarm+MKDQkJZFZ+dnMyj/8Iqi65nY272lgU2U9GytCz88Vlx467mEG44elMXV0KDimjQntfeQPS9PxjgjlnOO9Xfv5w5rd/GFNOQ2tASblpPMPl0zj8tl5ZKfrKmnpXwqJGJKVlsjcCcOYO+FvQ1fBoKNs/wE2VtazuTIUHJsqGli2ofLQmVepifGcMCqDKaMzmTJ6yKG9jhEZyT6tyeDmnGNjRQNL1+7mv9fuprz2AMkJcVxy0hiuOS2fovHZGlKSAaPhpkHqQFsHW6sa2FTRcChANlc2sK+p7dAyIzKSmDI6kxNGHRyuGsIJozI0ZNUPAh1B1pTW8trGPby2YQ/bqpuIjzPOLhzBZ08ey/nTR5GZojuzSv/RcJMcJjUpnpl5Q5mZN/Sw9uqG1lBg7Glgsxcei1eWcqD9b0NW+cPSOgVH6LlgeLpu8XCUqhpaeHvbPt7YXM2KzVXUNreTEGecNnEYN545gYtPGsMwDSeJzxQScpiczGRyMpM5q3DEobZg0LGrptkLjo
ZDw1bLN+7h4MXoSfFxTMxJZ1JOBhNz0kOPEaHX+gs4pLqhlVU7anh72z7+um0v26qbAMhOS+TcKSM5b9oozj5hBEP030siiIab5Ji1tHdQUtXIFi88tuxpYPveJkprmul8J5ORmclecGQwcUQ6E0akM25YGnnZqTE7dFV3oJ315XWsLatjbWkt68pqD519lpYUz6kFwzh90nBOnzicE8cO0V6Y+E7DTdLnUhLjmZGbxYzcrMPaWwMd7NrXzLbqJrbvbWR7dRPbqxv54wcV1Da3H7bs8PQk8rzAyMtOZVx26PXorBRGZqaQnZYYsQdpO4KOPfUtlO0/cOh6ltApyY1U1v/tdOTxw9OYUzCML+VlMSs/m5l5WSQqFCRKKCSkzyUnxFM4KrPbc/drmtr4aG8TZfubKdt/4NDz+vI6/nd9Je0dh+/ZJsYbORnJ5AxJYWRmMiO94bCs1ESyUhMZkpJIVpr3nJpIZkoCKYnxx3QVunOOlvYgzW0Bmts6Qvflamqlpqnt0KOqoZWy/c2U1x6gorblsJs/pibGUzgqgzMmD2fyyAxOHJvFzNwsnaYqUU0hIQNqWHro1urd/dBNMOjY0xD6y7yyroXqhlaqGlqpagi93rWvmeIdNezvsjfSndAt3uNISogjOSGe5MQ4DHCAcxB07tApwK2BDprbOjjQ3vGxGzJ2/cwRGcnkZqcyOz+b3Jmp5GankpedxqScdN1fS2KSQkIiRlycMSYrlTFZqT0uF+gI0tASoO5AO/Ut7aHnA6HphpZ2WgNBWgMdtLYHaQ0EaQsEaQmEAsAM4swwAAPDSE6MIy0xntSk0CMtMZ60pASGpiUeCrXh6ckMSU2I2KEvkf6ikJCokxAfR3Z6koZxRAaAjp6JiEhYCgkREQlLISEiImEpJEREJCyFhIiIhKWQEBGRsBQSIiISlkJCRETCirm7wJpZNbDzGN8+Atjbh+VEA63z4KB1HhyOZ53HO+dyujbGXEgcDzMr7u5WubFM6zw4aJ0Hh/5YZw03iYhIWAoJEREJSyFxuIf8LsAHWufBQes8OPT5OuuYhIiIhKU9CRERCUshISIiYSkkPGY238w2m1mJmd3hdz19zczGmdkKM9tgZuvN7Bte+zAze/S0dBIAAANtSURBVNXMtnrPH/9d0ShnZvFmtsbMXvKmJ5jZu962fsbMYurXi8xsqJktMbNNZrbRzE6P9e1sZt/0/l1/aGZPm1lKrG1nM/udmVWZ2Yed2rrdrhZyn7fu68xs9rH2q5Ag9CUC3A9cBEwHrjGz6f5W1ecCwLedc9OBecCt3jreASx3zhUCy73pWPMNYGOn6Z8DdzvnJgP7gZt8qar/3Au84pybCpxMaN1jdjubWS7w90CRc24GEA9cText58eB+V3awm3Xi4BC73EL8OCxdqqQCJkLlDjntjvn2oDFwAKfa+pTzrkK59x73usGQl8cuYTW8wlvsSeAhf5U2D/MLA+4BHjEmzbgXGCJt0hMrbOZZQGfAB4FcM61OedqifHtTOinmFPNLAFIAyqIse3snPszUNOlOdx2XQA86ULeAYaa2Zhj6VchEZILlHaaLvPaYpKZFQCzgHeBUc65Cm9WJTDKp7L6yz3A94CgNz0cqHXOBbzpWNvWE4Bq4DFviO0RM0snhrezc64c+BWwi1A41AGrie3tfFC47dpn32kKiUHGzDKA54HbnHP1nee50PnQMXNOtJldClQ551b7XcsASgBmAw8652YBTXQZWorB7ZxN6C/nCcBYIJ2PD8vEvP7argqJkHJgXKfpPK8tpphZIqGA+L1z7gWvec/B3VDvucqv+vrBmcBnzWwHoSHEcwmN1w/1hiUg9rZ1GVDmnHvXm15CKDRieTt/GvjIOVftnGsHXiC07WN5Ox8Ubrv22XeaQiJkFVDonQ2RROig11Kfa+pT3lj8o8BG59y/dZq1FFjkvV4EvDjQtfUX59ydzrk851wBoW36unPuC8AK4ApvsVhb50qg1M
ymeE3nARuI4e1MaJhpnpmlef/OD65zzG7nTsJt16XADd5ZTvOAuk7DUkdFV1x7zOxiQuPX8cDvnHP/6nNJfcrMzgLeBD7gb+Pz3yd0XOJZIJ/QLdavdM51PTgW9czsU8B3nHOXmtlEQnsWw4A1wHXOuVY/6+tLZnYKoQP1ScB24EZCfxDG7HY2s38GriJ0Ft8a4MuExuBjZjub2dPApwjdDnwPcBfwB7rZrl5Y/juhYbdm4EbnXPEx9auQEBGRcDTcJCIiYSkkREQkLIWEiIiEpZAQEZGwFBIiIhKWQkJERMJSSIiISFj/B5UhF9RDA2I4AAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"最小值 400\n",
"最优值 40.0 -32154.0\n"
]
}
],
"outputs": [],
"source": [
"from __future__ import division\n",
"import numpy as np\n",
@@ -303,17 +274,9 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"39.69856448101894 -32147.369845045607\n"
]
}
],
"outputs": [],
"source": [
"T=1000 #initiate temperature\n",
"Tmin=10 #minimum value of terperature\n",
@@ -368,7 +331,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
"version": "3.7.0"
}
},
"nbformat": 4,
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Building a Simple QA System"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of this project is to build a simple retrieval-based question-answering system. Retrieval is the most classic approach, and also one of the most effective. \n",
"\n",
"```Do not create a separate file: write everything in this notebook, and do not rename any of the existing functions (you may define new functions of your own as needed)```\n",
"\n",
"```Estimated completion time```: 5-10 hours"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### A Retrieval-Based QA System\n",
"The data the QA system needs is already provided, and every question can be matched to an answer, so each sample can be viewed as a ``<question, answer>`` pair. The core of the system: when a user enters a question, first find the most similar question already stored in the library, then simply return its stored answer (in practice you could also extract entities or keywords from it). A simple example:\n",
"\n",
"Suppose the library already contains the following <question, answer> pairs:\n",
"- <\"What business is 贪心学院 mainly in?\", \"They mainly work on AI education\">\n",
"- <\"Which companies in China work on AI education?\", \"贪心学院\">\n",
"- <\"What is the relationship between AI and machine learning?\", \"Machine learning is really a subfield of AI; many AI applications are built on machine-learning techniques\">\n",
"- <\"What is the core programming language of AI?\", \"Python\">\n",
"- .....\n",
"\n",
"Suppose a user now types the question \"What does 贪心学院 do?\". The system first matches it against the questions already stored in the library; here the closest match is clearly \"What business is 贪心学院 mainly in?\", so once that question is located we simply return its answer, \"They mainly work on AI education\". The core problem therefore reduces to computing the similarity between two questions (queries)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Tasks in This Project\n",
"A QA system looks simple, but it involves quite a few pieces. In short, the modules we are going to build are:\n",
"\n",
"- Text loading: read the ```(question, answer)``` pairs from the data files\n",
"- Text preprocessing: cleaning the text matters, and includes ```stop-word filtering``` and similar steps\n",
"- Text representation: how to represent a sentence is the central question here, involving ```tf-idf```, ```Glove``` and ```BERT Embedding```\n",
"- Similarity matching: the core of a retrieval-based system is computing the ```similarity``` between texts, then returning the answers of the most similar stored questions\n",
"- Inverted index: to speed up search, we build an ```inverted index``` mapping every word to the texts it appears in\n",
"- Semantic matching: a plain inverted index misses words that are close in meaning but not identical, so we pre-compute ```similar words``` and use them at search time\n",
"- Spelling correction: user input is not guaranteed to be correct, so we first check it, silently fix any misspellings, and search the library with the corrected query\n",
"- Document ranking: the final ordering of the returned results depends on the ```cosine similarity``` between documents as well as on the words matched in the inverted index\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Data Needed for the Project:\n",
"1. ```dev-v2.0.json```: contains the question-answer pairs, stored as JSON; you need to write a parser to extract the questions and answers. \n",
"2. ```glove.6B```: download this from https://nlp.stanford.edu/projects/glove/ and use the d=200 word vectors\n",
"3. ```spell-errors.txt```: used to build the spelling-correction module. The first column of each line is the correct word, and the words listed after it are common misspellings of it. Note that no probabilities p(error|correct) are given, so treat every listed error as ```equally likely```\n",
"4. ```vocab.txt```: a list of tens of thousands of common English words, used to check whether a word is misspelled\n",
"5. ```testdata.txt```: a small collection of test data for your spell corrector; it is only used to test your own program."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this project you will use the following tools:\n",
"- ```sklearn```. For installation see http://scikit-learn.org/stable/install.html. sklearn provides many machine-learning algorithms and data-processing utilities, including the bag-of-words model this project needs. \n",
"- ```jieba```, for word segmentation. For usage see https://github.com/fxsjy/jieba\n",
"- ```bert embedding```: https://github.com/imgarylai/bert-embedding\n",
"- ```nltk```: https://www.nltk.org/index.html"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "### Part 1: Processing the training data: reading and preprocessing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "- ```Reading the text```: load the data from ```dev-v2.0.json``` and store it in a list\n",
    "- ```Preprocessing```: apply stop-word filtering and other text cleaning to the questions\n",
    "- ```Visualization```: run some visual analysis on the given data to understand it better"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### Section 1.1: Reading the text\n",
    "Read the given data into two lists, ```qlist``` and ```alist```, where ```qlist``` holds the questions and ```alist``` the corresponding answers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "def read_corpus():\n",
    "    \"\"\"\n",
    "    Read the given corpus and write the questions and answers into qlist and alist respectively. Do not apply any string processing here (that happens in Section 1.3)\n",
    "    qlist = [\"question 1\", \"question 2\", \"question 3\" ....]\n",
    "    alist = [\"answer 1\", \"answer 2\", \"answer 3\" ....]\n",
    "    Make sure every question lines up with its answer (same index)\n",
    "    \"\"\"\n",
    "    # TODO: code to be completed ...\n",
    "    \n",
    "    \n",
    "    \n",
    "    assert len(qlist) == len(alist)  # make sure the lengths match\n",
    "    return qlist, alist"
]
},
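},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
The `read_corpus` stub above could be fleshed out as in this minimal sketch, which assumes `dev-v2.0.json` follows the standard SQuAD 2.0 layout (`data → paragraphs → qas`, with unanswerable questions flagged by `is_impossible`); the inline sample below stands in for the real file:

```python
import json

def read_corpus(raw_json):
    """Parse SQuAD-2.0-style JSON into parallel question/answer lists."""
    qlist, alist = [], []
    data = json.loads(raw_json)
    for article in data["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                # SQuAD 2.0 marks unanswerable questions; skip them
                if qa.get("is_impossible") or not qa["answers"]:
                    continue
                qlist.append(qa["question"])
                alist.append(qa["answers"][0]["text"])
    assert len(qlist) == len(alist)  # keep questions and answers aligned
    return qlist, alist

# toy sample mimicking the dev-v2.0.json structure
sample = json.dumps({"data": [{"paragraphs": [{"qas": [
    {"question": "When did Beyonce start becoming popular?",
     "is_impossible": False,
     "answers": [{"text": "in the late 1990s"}]}]}]}]})
qlist, alist = read_corpus(sample)
```

When reading the real file, `json.loads(raw_json)` would become `json.load(open('dev-v2.0.json'))`.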
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### 1.2 Understanding the data (visualization / statistics)\n",
    "Understanding the data is the first step of any AI work; we need an intuitive feel for it. Here, compute some simple statistics:\n",
    "\n",
    "- the total number of words appearing in ```qlist```\n",
    "- a ```histogram``` plot of the word frequencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "# TODO: count the total number of words in qlist, and the number of unique words.\n",
    "# Simple tokenization is enough: for English, split on whitespace; no other filtering is needed\n",
    "\n",
    "print (word_total)"
]
},
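A possible sketch of the counting step, using `collections.Counter` on whitespace-split tokens; the two-question `qlist` here is a toy stand-in for the parsed corpus:

```python
from collections import Counter

def word_stats(qlist):
    # split on whitespace only, as the exercise asks (no other filtering)
    counts = Counter(w for q in qlist for w in q.split())
    word_total = sum(counts.values())          # total tokens
    return word_total, len(counts), counts     # total, unique, per-word counts

qlist = ["What is AI ?", "What is ML ?"]
word_total, unique_words, counts = word_stats(qlist)
```

The `counts` object also directly yields the frequency-of-frequencies data needed for the histogram in the next cell.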
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "# TODO: count how many words occur exactly 1 time, 2 times, 3 times, ... in qlist, then plot it. The x-axis is the number of occurrences (1, 2, 3, ...), the y-axis is the number of words.\n",
    "# From left to right: number of words occurring once, twice, three times, ... \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "# TODO: what do you observe in the plot above? Its shape resembles a very famous function; can you name the law? \n",
    "# hint: [XXX]'s law\n",
    "# \n",
    "# "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
    "#### 1.3 Text preprocessing\n",
    "This part handles text cleaning. Some methods you can use:\n",
    "\n",
    "- 1. Stop-word filtering (search online for an \"english stop words list\" — many pages provide one — or use the list bundled with NLTK) \n",
    "- 2. Lowercasing: a basic step \n",
    "- 3. Removing useless symbols: e.g. runs of exclamation marks !!!, or strange tokens.\n",
    "- 4. Removing very rare words: e.g. words occurring fewer than 10 or 20 times (think about how to choose the threshold)\n",
    "- 5. Handling numbers: after tokenization some tokens are pure numbers such as 44 or 415; treat all of them as a single token, which we can call \"#number\"\n",
    "- 6. Lemmatization: do not use stemming here, because stemming may not produce valid words.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "# TODO: preprocess the text. Pick suitable methods from the list above and apply them to qlist (not necessarily in that order, and not necessarily all of them)\n",
    "\n",
    "qlist =   # the updated question list"
]
},
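One way the preprocessing could look — a sketch only, with a five-word stand-in stop-word list and an illustrative `min_freq` threshold; a real run would use a full stop-word list (e.g. NLTK's) and a threshold chosen from the frequency plot above:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "of"}   # toy stand-in for a real stop-word list

def preprocess(qlist, min_freq=2):
    # keep alphabetic tokens and digit runs; lowercase everything
    docs = [[w.lower() for w in re.findall(r"[A-Za-z]+|\d+", q)] for q in qlist]
    # map every pure-digit token to the single token '#number'
    docs = [["#number" if w.isdigit() else w for w in d] for d in docs]
    # drop stop words
    docs = [[w for w in d if w not in STOPWORDS] for d in docs]
    # drop rare words (frequency below min_freq across the whole corpus)
    freq = Counter(w for d in docs for w in d)
    docs = [[w for w in d if freq[w] >= min_freq] for d in docs]
    return [" ".join(d) for d in docs]

qlist = preprocess(["What is AI in 2020 ?", "What is ML in 1998 ?"])
```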
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "### Part 2: Text representation\n",
    "Once the necessary text processing is done, we need a way to represent the text. There are several options:\n",
    "\n",
    "- 1. a ```tf-idf vector```\n",
    "- 2. embedding techniques such as ```word2vec``` and ```bert embedding```\n",
    "\n",
    "Below we extract these three representations (tf-idf, GloVe, BERT) and compare them. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### 2.1 Representing text with tf-idf\n",
    "Convert each question string in ```qlist``` into a ```tf-idf``` vector and store the results in a matrix ```X``` of size ``N * D``, where ``N`` is the number of questions (samples) and\n",
    "``D`` is the size of the vocabulary"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "# TODO \n",
    "vectorizer =   # define a tf-idf vectorizer\n",
    "\n",
    "X_tfidf =   # store the result in the X matrix"
]
},
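The intended tool here is sklearn's `TfidfVectorizer`; the dependency-free sketch below computes the same kind of `N * D` matrix by hand (using the plain `tf * log(N/df)` weighting, which differs slightly from sklearn's smoothed variant) just to make the contents of `X` concrete:

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    tokenized = [d.split() for d in docs]
    vocab = sorted({w for d in tokenized for w in d})
    # document frequency: in how many documents each word appears
    df = Counter(w for d in tokenized for w in set(d))
    n = len(tokenized)
    X = []
    for d in tokenized:
        tf = Counter(d)
        # row = term frequency * inverse document frequency, one column per vocab word
        X.append([tf[w] / len(d) * math.log(n / df[w]) for w in vocab])
    return X, vocab

X, vocab = tfidf_matrix(["ai course", "ai research"])
```

Note how a word appearing in every document ("ai") gets weight 0: its idf is log(N/N) = 0.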
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### 2.2 word2vec + average pooling\n",
    "Download the word vectors from https://nlp.stanford.edu/projects/glove/ (``glove.6B.zip``) and use the ``d=200`` vectors (200 dimensions). If the site is slow from China, mirrors on domestic servers can be found via Baidu. Once each word's vector is available, a sentence vector can be built from them; we do this with ``average pooling``. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "# TODO build sentence vectors from the Glove vectors\n",
    "emb =   # a D*H matrix, where D is the vocabulary size and H the word-vector dimension; it holds the given\n",
    "        # word vectors and needs to be read from the text file\n",
    " \n",
    "X_w2v =   # once emb is initialized, build a vector for each sentence via average pooling\n"
]
},
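A sketch of the GloVe loading and average pooling, assuming each line of the GloVe file is a word followed by its vector components; the two 3-dimensional vectors below stand in for the real 200-dimensional file:

```python
def load_glove(lines):
    # each line: word followed by its vector components, space-separated
    emb = {}
    for line in lines:
        parts = line.split()
        emb[parts[0]] = [float(x) for x in parts[1:]]
    return emb

def sentence_vector(sentence, emb, dim=3):
    vecs = [emb[w] for w in sentence.split() if w in emb]
    if not vecs:                 # no known words: fall back to a zero vector
        return [0.0] * dim
    # average pooling: element-wise mean of the word vectors
    return [sum(col) / len(vecs) for col in zip(*vecs)]

emb = load_glove(["ai 1 0 1", "course 0 2 1"])
v = sentence_vector("ai course", emb)
```

With the real file, `lines` would come from `open('glove.6B.200d.txt', encoding='utf-8')` and `dim` would be 200.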
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### 2.3 BERT + average pooling\n",
    "BERT, currently very popular, learns context-aware embeddings and achieves strong results on many problems. Here we do no training at all and directly use pre-trained BERT embeddings; how to train BERT is covered in later chapters. A pre-trained model for extracting per-word vectors is available at https://github.com/imgarylai/bert-embedding ; please use ```bert_12_768_12```. Any other reasonable source of word vectors is fine too. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "# TODO compute BERT-based sentence vectors\n",
    "\n",
    "X_bert =   # one row per sentence; the number of columns is the embedding size of a sentence. "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
    "### Part 3: Similarity matching and search\n",
    "In this part we compute the similarity between each user input and every question in the knowledge base to find the most similar one. Done naively this is expensive, so we combine it with an inverted index to retrieve the most similar questions and hence the answers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### 3.1 tf-idf + cosine similarity\n",
    "Using the ``tf-idf`` vectors computed above, we can directly compute the similarity between the user's new question and every stored question, then return the answer of the most similar one. The complexity is ``O(N)``, where ``N`` is the number of stored questions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "def get_top_results_tfidf_noindex(query):\n",
    "    # TODO: to be implemented\n",
    "    \"\"\"\n",
    "    Given the user's question query, return the TOP 5 most likely questions. Steps:\n",
    "    1. Preprocess query (with the methods above), then convert it to a tf-idf vector (using the vectorizer above)\n",
    "    2. Compute its similarity to every stored question\n",
    "    3. Return the answers of the top-5 most similar questions\n",
    "    \"\"\"\n",
    "    \n",
    "    top_idxs = []  # indices (into qlist) of the most similar questions \n",
    "    # hint: use a priority queue to find the top results. Think about why this works. \n",
    "    \n",
    "    return [alist[i] for i in top_idxs]  # the answers of the most similar questions, as the TOP-5 answers "
]
},
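The priority-queue hint can be realized with `heapq.nlargest`, which keeps only a size-k heap instead of sorting all N similarities — a sketch on toy 2-d vectors:

```python
import heapq
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, X, k=5):
    sims = [(cosine(query_vec, row), i) for i, row in enumerate(X)]
    # a size-k heap avoids fully sorting all N similarities
    return [i for _, i in heapq.nlargest(k, sims)]

X = [[1, 0], [0, 1], [1, 1]]       # stand-in for the tf-idf rows
idxs = top_k([1, 0], X, k=2)
```

This is why the hint works: maintaining a k-element heap costs O(N log k) rather than the O(N log N) of a full sort.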
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "# TODO: write a few test cases and print the results\n",
    "print (get_top_results_tfidf_noindex(\"\"))\n",
    "print (get_top_results_tfidf_noindex(\"\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "You will notice the routine above is slow — and rightly so: it loops over every question in the library. To optimize it we use a data structure called an ```inverted index```, which maps each word to the documents containing it, so documents containing a given word can be found very quickly. In this QA system we first use the inverted index to find documents sharing at least one word with the query, then compute cosine similarity only on those, which greatly reduces the ```time complexity```."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### 3.2 Building the inverted index\n",
    "Building an inverted index is straightforward: loop over all the words once and record, for each word, the IDs of the documents it appears in, stored as a list. We can use a ```hash_map```-like structure, e.g. ``inverted_index = {}``, mapping each keyword to the positions of the documents containing it. A keyword search then first finds the documents containing the keywords (say, at least one of them), and similarity is computed only on those candidates."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "# TODO build the inverted index\n",
    "inverted_idx = {}  # a simple inverted index as a map; one pass over qlist is enough"
]
},
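A minimal sketch of the index construction: one pass over a (toy, already-preprocessed) `qlist`, mapping each word to the set of question IDs containing it:

```python
from collections import defaultdict

def build_inverted_index(qlist):
    inverted_idx = defaultdict(set)
    for doc_id, question in enumerate(qlist):
        for word in question.split():
            # record that this question contains this word
            inverted_idx[word].add(doc_id)
    return inverted_idx

inverted_idx = build_inverted_index(["ai course", "ml course", "ai research"])
```

Sets (rather than lists) make the later union of candidate documents cheap and duplicate-free.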
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### 3.3 Semantic similarity\n",
    "One issue remains: semantic similarity. Two words such as car and auto look different but mean roughly the same thing. A plain inverted index ignores such relations, so a query containing ``car`` would never retrieve the documents that only contain auto. We want to capture this information too. The fix is simple: pre-compute the similarity relations — for ``car``, find the words closest to it in meaning, say the top 10, and mark them as its ``related words``. We then build a ``map`` of ``related words``, so that ``related_words['car']`` returns the TOP-10 words closest in meaning to ``car``. \n",
    "\n",
    "How is ``related_words`` built? We again use the ``Glove`` vectors and compute pairwise (cosine) similarities; for each word we then store its top-10 nearest words in ``related_words``. This must happen offline, since the computation is heavy: ``O(V*V)``, where V is the vocabulary size. \n",
    "\n",
    "Put the code for this computation in ``related.py`` and save the result to ``related_words.txt``. At run time we simply read the file instead of recomputing, so in this notebook we just load the precomputed results. When submitting, include both ``related.py`` and ``related_words.txt`` so that no recomputation is needed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "# TODO read the semantically related words\n",
    "def get_related_words(file):\n",
    "    \n",
    "    return related_words\n",
    "\n",
    "related_words = get_related_words('related_words.txt')  # keep the file in the project root; do not change this path."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### 3.4 Searching with the inverted index\n",
    "Here we first use the inverted index to collect a batch of candidate questions, then do the precise matching with cosine similarity, which saves a lot of time. The search has two steps:\n",
    "\n",
    "- Use the inverted index to extract all candidate questions. First apply the necessary preprocessing (tokenization etc.) to the new question; then, for each of its words, fetch the top-10 related words from ``related_words`` and retrieve from the inverted index every document matching any of these words. This part can live inside the functions below or outside them.\n",
    "- Compute cosine similarity against these documents only, then rank and return the best answers.\n",
    "\n",
    "Feel free to define helper functions to reduce repeated code"
]
},
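The first step — candidate collection — might look like this sketch, where `inverted_idx` and `related_words` are hypothetical toy inputs in the shapes described above:

```python
def get_candidates(query, inverted_idx, related_words):
    cand = set()
    for w in query.split():
        # expand each query word with its pre-computed related words
        for term in [w] + related_words.get(w, []):
            cand |= inverted_idx.get(term, set())
    return cand

inverted_idx = {"car": {0}, "auto": {1, 2}}
related_words = {"car": ["auto"]}
cands = get_candidates("car", inverted_idx, related_words)
```

Cosine similarity then only needs to run over `cands` instead of the whole library.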
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "def get_top_results_tfidf(query):\n",
    "    \"\"\"\n",
    "    Given the user's question query, return the TOP 5 most likely questions. Steps:\n",
    "    1. Use the inverted index to select candidates (using related_words). \n",
    "    2. Compute the similarity between the candidates and the input question\n",
    "    3. Return the answers of the top-5 most similar questions\n",
    "    \"\"\"\n",
    "    \n",
    "    top_idxs = []  # indices (into qlist) of the most similar questions \n",
    "    # hint: use a priority queue to find the top results. Think about why this works. \n",
    "    \n",
    "    return [alist[i] for i in top_idxs]  # the answers of the most similar questions, as the TOP-5 answers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "def get_top_results_w2v(query):\n",
    "    \"\"\"\n",
    "    Given the user's question query, return the TOP 5 most likely questions. Steps:\n",
    "    1. Use the inverted index to select candidates (using related_words). \n",
    "    2. Compute the similarity between the candidates and the input question\n",
    "    3. Return the answers of the top-5 most similar questions\n",
    "    \"\"\"\n",
    "    \n",
    "    top_idxs = []  # indices (into qlist) of the most similar questions \n",
    "    # hint: use a priority queue to find the top results. Think about why this works. \n",
    "    \n",
    "    return [alist[i] for i in top_idxs]  # the answers of the most similar questions, as the TOP-5 answers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "def get_top_results_bert(query):\n",
    "    \"\"\"\n",
    "    Given the user's question query, return the TOP 5 most likely questions. Steps:\n",
    "    1. Use the inverted index to select candidates (using related_words). \n",
    "    2. Compute the similarity between the candidates and the input question\n",
    "    3. Return the answers of the top-5 most similar questions\n",
    "    \"\"\"\n",
    "    \n",
    "    top_idxs = []  # indices (into qlist) of the most similar questions \n",
    "    # hint: use a priority queue to find the top results. Think about why this works. \n",
    "    \n",
    "    return [alist[i] for i in top_idxs]  # the answers of the most similar questions, as the TOP-5 answers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "# TODO: write a few test cases and print the results\n",
"\n",
"test_query1 = \"\"\n",
"test_query2 = \"\"\n",
"\n",
"print (get_top_results_tfidf(test_query1))\n",
"print (get_top_results_w2v(test_query1))\n",
"print (get_top_results_bert(test_query1))\n",
"\n",
"print (get_top_results_tfidf(test_query2))\n",
"print (get_top_results_w2v(test_query2))\n",
"print (get_top_results_bert(test_query2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "### Part 4: Spelling correction\n",
    "Users cannot be expected to always type correctly; their input may contain misspelled words. The backend should catch these mistakes promptly, correct them, and then match the corrected query against the library. Here we implement a simple spell corrector that fixes wrong words automatically.\n",
    "\n",
    "The method is the one covered in the course: the noisy channel model. Recall its form:\n",
    "\n",
    "$c^* = \\text{argmax}_{c\\in candidates} ~~p(c|s) = \\text{argmax}_{c\\in candidates} ~~p(s|c)p(c)$\n",
    "\n",
    "Here ```candidates``` is the candidate set for the misspelled word, which we can assume is generated via edit distance (e.g. all valid words at distance 1 or 2 from the current word, where a valid word is one that appears in the dictionary). ```c``` is the correct word and ```s``` is the user's misspelling. The goal is to find the correct spelling ``c`` in ``candidates`` that maximizes the probability above. \n",
    "\n",
    "$p(s|c)$ can be estimated from historical data: for a correct word $c$, what fraction of people wrote it as misspelling 1, misspelling 2, ... This data comes from ``spell-errors.txt``. The file does not record these probabilities, so we use a uniform probability. This term is also called the channel probability.\n",
    "\n",
    "$p(c)$ is the language-model term: if we replace the wrong $s$ with $c$, how fluent is the resulting sentence? In this project we evaluate it with a bigram model. For example, given two candidates $c_1, c_2$, we compute the language-model probability for each. Since we use a ``bigram`` model, we need two probabilities per candidate: the ``bigram`` with the previous word and the one with the next word. Concretely:\n",
    "\n",
    "Given ``We are go to school tomorrow``, suppose we want to replace ``go`` with its correct form, and the candidate set contains two words, ``going`` and ``went``. We then compute\n",
    "$p(going|are)p(to|going)$ and $p(went|are)p(to|went)$, use these as the $p(c)$ terms, and combine them with the ``channel probability`` to obtain the final score.\n",
    "\n",
    "How are bigram probabilities such as $p(going|are)$ computed? By training a language model! Training one requires text data — where do we find it? In this project we use the ``reuters`` corpus bundled with ``nltk``. If you have the resources, feel free to try larger corpora; the end goal is simply to estimate the ``bigram`` probabilities. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### 4.1 Training a language model\n",
    "Here we train a language model on the ``reuters`` data bundled with ``nltk``, using ``add-one smoothing``"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from nltk.corpus import reuters\n",
"\n",
    "# load the corpus data\n",
    "categories = reuters.categories()\n",
    "corpus = reuters.sents(categories=categories)\n",
    "\n",
    "# loop over the corpus and build the bigram probabilities. bigram[word1][word2]: probability that word2 follows word1. \n",
"\n",
"\n"
]
},
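The bigram model with add-one smoothing can be sketched as below, shown on a one-sentence toy corpus instead of the `reuters` sentences; `prob(w1, w2)` returns the smoothed probability of `w2` following `w1`:

```python
from collections import Counter

def train_bigram(corpus):
    # corpus: list of tokenized sentences
    unigram, bigram = Counter(), Counter()
    vocab = {w for sent in corpus for w in sent}
    for sent in corpus:
        for w in sent:
            unigram[w] += 1
        for w1, w2 in zip(sent, sent[1:]):
            bigram[(w1, w2)] += 1
    V = len(vocab)
    def prob(w1, w2):
        # add-one (Laplace) smoothing: unseen bigrams get a small nonzero probability
        return (bigram[(w1, w2)] + 1) / (unigram[w1] + V)
    return prob

prob = train_bigram([["we", "are", "going", "to", "school"]])
p = prob("are", "going")
```

With the real data, the `corpus` argument would simply be `reuters.sents(categories=categories)`.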
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### 4.2 Building the channel probabilities\n",
    "Build the ``channel probability`` from ``spell-errors.txt``, where $channel[c][s]$ is the probability that the correct word $c$ is misspelled as $s$. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "# TODO build the channel probability \n",
    "channel = {}\n",
    "\n",
    "for line in open('spell-errors.txt'):\n",
    "    pass  # TODO\n",
    "\n",
    "# TODO\n",
    "\n",
    "print(channel) "
]
},
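Assuming each line of `spell-errors.txt` has the form `correct: wrong1, wrong2, ...`, the parsing could be sketched as follows, with probability spread uniformly over each word's misspellings (as the project statement prescribes); the two inline lines stand in for the real file:

```python
def build_channel(lines):
    channel = {}
    for line in lines:
        correct, wrongs = line.split(":")
        wrongs = [w.strip() for w in wrongs.split(",")]
        # no counts are given, so spread probability uniformly over the misspellings
        channel[correct.strip()] = {w: 1.0 / len(wrongs) for w in wrongs}
    return channel

channel = build_channel(["raining: rainning, raning", "there: their"])
```

With the real file, `lines` would come from `open('spell-errors.txt')`.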
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### 4.3 Generating the candidate set for a misspelled word\n",
    "Given a misspelled word, first generate all candidates at edit distance 1 or 2 from it. This code was covered in the course, so feel free to refer back to it. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "def generate_candidates(word):\n",
    "    \"\"\"Generate the words at edit distance 1 or 2 from the misspelled word,\n",
    "    filtered through the dictionary so that only correctly spelled words remain.\"\"\"\n",
    "    # TODO\n",
    "    \n"
]
},
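A sketch in the spirit of the classic edit-distance candidate generator: `edits1` produces every string one edit away (deletes, transposes, replaces, inserts), and `generate_candidates` takes the union of distance-1 and distance-2 strings and filters it against a vocabulary (a three-word toy set here, standing in for `vocab.txt`):

```python
import string

def edits1(word):
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def generate_candidates(word, vocab):
    # edit distance 1, then 2, filtered so only dictionary words remain
    one = edits1(word)
    two = {e2 for e1 in one for e2 in edits1(e1)}
    return (one | two) & vocab

vocab = {"word", "world", "ward"}
cands = generate_candidates("wrod", vocab)
```

Filtering against the vocabulary keeps the candidate set small even though the raw distance-2 set is large.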
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### 4.4 Correcting a given input\n",
    "\n",
    "Given an input ``query``, correct any misspelled words in it. The implementation can be kept simple: tokenize ``query``, look each token up in the vocabulary, and treat any token not found as a misspelling! For each misspelled token, use the ``channel`` and ``bigram`` probabilities to pick the best candidate."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
    "def spell_corrector(line):\n",
    "    # 1. tokenize line into tokens\n",
    "    # 2. for each token, check whether it exists in the vocabulary; if not, it is a misspelling and must be corrected. \n",
    "    #    Correction uses the noisy channel model described above to find the best replacement. \n",
    "    \n",
    "    return newline  # the corrected result; if the input has no errors, newline = line\n"
]
},
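The per-word correction step inside `spell_corrector` can be sketched as below: each candidate `c` is scored by `log p(s|c) + log p(c|prev) + log p(next|c)`, combining the channel probability with the two bigram terms; the `channel` and `bigram` tables here are hypothetical toy values:

```python
import math

def correct_word(wrong, prev_w, next_w, channel, bigram_prob, candidates):
    # score each candidate c by log p(s|c) + log p(c|prev) + log p(next|c)
    best, best_score = wrong, float("-inf")
    for c in candidates:
        p_channel = channel.get(c, {}).get(wrong, 1e-10)  # tiny floor for unseen pairs
        score = (math.log(p_channel)
                 + math.log(bigram_prob(prev_w, c))
                 + math.log(bigram_prob(c, next_w)))
        if score > best_score:
            best, best_score = c, score
    return best

# toy tables for the "We are go to school" example
channel = {"going": {"go": 1.0}, "went": {"go": 1.0}}
bigram = {("are", "going"): 0.4, ("going", "to"): 0.5,
          ("are", "went"): 0.1, ("went", "to"): 0.2}
prob = lambda w1, w2: bigram.get((w1, w2), 1e-6)
best = correct_word("go", "are", "to", channel, prob, ["going", "went"])
```

Working in log space avoids underflow when the probabilities are small.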
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### 4.5 Auto-correcting user input with the spell corrector\n",
    "Take the user's ``query``, apply the necessary processing to turn it into tokens, then check each token for validity; any invalid token goes through the correction procedure above. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "test_query1 = \"\"  # misspelled\n",
    "test_query2 = \"\"  # misspelled\n",
    "\n",
    "test_query1 = spell_corrector(test_query1)\n",
    "test_query2 = spell_corrector(test_query2)\n",
"\n",
"print (get_top_results_tfidf(test_query1))\n",
"print (get_top_results_w2v(test_query1))\n",
"print (get_top_results_bert(test_query1))\n",
"\n",
"print (get_top_results_tfidf(test_query2))\n",
"print (get_top_results_w2v(test_query2))\n",
"print (get_top_results_bert(test_query2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "### Appendix \n",
    "In this project we built a simple QA system. Many extensions are possible:\n",
    "- Here we used cosine similarity between text vectors as the only criterion. We could also weight matches by keyword overlap: the more similar a word is to a related word, the higher the similarity and the larger the weight it should get. \n",
    "- Besides deriving ``related words`` from word vectors, one could use a predefined thesaurus, though that requires substantial manual effort. \n",
    "- We returned the stored answer directly. Ideally the answer would be tailored to the question type: for \"What will the temperature in Beijing be tomorrow?\", the real answer is a specific temperature (an entity), so further extraction on top of the stored answer is needed — a topic related to information extraction. \n",
    "- For sentence vectors we only used ``average pooling``; there are other classic methods that learn a sentence vector directly.\n",
    "- Short-text similarity remains a challenging problem in both industry and academia. Here we used as many synonyms as possible to improve the system, but other methods are worth trying, such as WMD (Word Mover's Distance) or incorporating parsing-related techniques. "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
    "Good luck! "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}