{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 搭建倒排表\n",
    "倒排表的作用是让搜索更加快速,是搜索引擎中常用的技术。根据课程中所讲的方法,你需要完成这部分的代码。 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from tqdm import tqdm\n",
    "import numpy as np\n",
    "import pickle\n",
    "from gensim.models import KeyedVectors  # 词向量用来比较俩俩之间相似度"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 读取数据: 导入在preprocessor.ipynb中生成的data/question_answer_pares.pkl文件,并将其保存在变量QApares中\n",
    "with open('data/question_answer_pares.pkl','rb') as f:\n",
    "    QApares = pickle.load(f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>question</th>\n",
       "      <th>answer</th>\n",
       "      <th>question_after_preprocessing</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>买二份有没有少点呀</td>\n",
       "      <td>亲亲真的不好意思我们已经是优惠价了呢小本生意请亲谅解</td>\n",
       "      <td>[买, 二份, 有没有, 少点]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>那就等你们处理喽</td>\n",
       "      <td>好的亲退了</td>\n",
       "      <td>[处理]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>那我不喜欢</td>\n",
       "      <td>颜色的话一般茶刀茶针和二合一的话都是红木檀和黑木檀哦</td>\n",
       "      <td>[喜欢]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>不是免运费</td>\n",
       "      <td>本店茶具订单满99包邮除宁夏青海内蒙古海南新疆西藏满39包邮</td>\n",
       "      <td>[免, 运费]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>好吃吗</td>\n",
       "      <td>好吃的</td>\n",
       "      <td>[好吃]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    question                          answer question_after_preprocessing\n",
       "0  买二份有没有少点呀      亲亲真的不好意思我们已经是优惠价了呢小本生意请亲谅解             [买, 二份, 有没有, 少点]\n",
       "1   那就等你们处理喽                           好的亲退了                         [处理]\n",
       "2      那我不喜欢      颜色的话一般茶刀茶针和二合一的话都是红木檀和黑木檀哦                         [喜欢]\n",
       "3      不是免运费  本店茶具订单满99包邮除宁夏青海内蒙古海南新疆西藏满39包邮                      [免, 运费]\n",
       "4        好吃吗                             好吃的                         [好吃]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "QApares.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```TODO1``` 构造一个倒排表,不需要考虑单词的相似度"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 构建一个倒排表,有关倒排表的详细内容参考实验手册\n",
    "# 为了能够快速检索,倒排表应用哈希表来存储。python中字典内部便是用哈希表来存储的,所以这里我们直接将倒排表保存在字典中\n",
    "# 注意:在这里不需要考虑单词之间的相似度。\n",
    "inverted_list = {}\n",
    "for index,sentence in enumerate(QApares.question_after_preprocessing):\n",
    "    ### 你需要完成的代码\n",
    "    for word in sentence:\n",
    "        if word in inverted_list:\n",
    "            inverted_list[word].add(index)\n",
    "        else:\n",
    "            inverted_list[word] = set()\n",
    "            inverted_list[word].add(index)\n",
    "    \n",
    "    ### 你需要完成的代码结束"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{5,\n",
       " 65541,\n",
       " 32776,\n",
       " 17,\n",
       " 18,\n",
       " 65554,\n",
       " 29,\n",
       " 65566,\n",
       " 32800,\n",
       " 32803,\n",
       " 98339,\n",
       " 32810,\n",
       " 98346,\n",
       " 32818,\n",
       " 55,\n",
       " 98366,\n",
       " 64,\n",
       " 65604,\n",
       " 65611,\n",
       " 32850,\n",
       " 98387,\n",
       " 98398,\n",
       " 65631,\n",
       " 102,\n",
       " 65639,\n",
       " 65640,\n",
       " 65646,\n",
       " 98415,\n",
       " 98416,\n",
       " 118,\n",
       " 122,\n",
       " 65659,\n",
       " 125,\n",
       " 32894,\n",
       " 133,\n",
       " 65669,\n",
       " 65670,\n",
       " 65671,\n",
       " 142,\n",
       " 65679,\n",
       " 32912,\n",
       " 98451,\n",
       " 150,\n",
       " 151,\n",
       " 32929,\n",
       " 65708,\n",
       " 98484,\n",
       " 98489,\n",
       " 187,\n",
       " 32957,\n",
       " 200,\n",
       " 32973,\n",
       " 65742,\n",
       " 98518,\n",
       " 65755,\n",
       " 220,\n",
       " 223,\n",
       " 65764,\n",
       " 65783,\n",
       " 33017,\n",
       " 65786,\n",
       " 254,\n",
       " 65790,\n",
       " 261,\n",
       " 65798,\n",
       " 65810,\n",
       " 275,\n",
       " 98586,\n",
       " 65833,\n",
       " 33068,\n",
       " 65838,\n",
       " 33073,\n",
       " 65843,\n",
       " 65844,\n",
       " 310,\n",
       " 65852,\n",
       " 318,\n",
       " 65862,\n",
       " 65863,\n",
       " 344,\n",
       " 65883,\n",
       " 65885,\n",
       " 350,\n",
       " 33120,\n",
       " 364,\n",
       " 98668,\n",
       " 65903,\n",
       " 33140,\n",
       " 98678,\n",
       " 65913,\n",
       " 33149,\n",
       " 33158,\n",
       " 33163,\n",
       " 33168,\n",
       " 401,\n",
       " 98709,\n",
       " 98715,\n",
       " 33189,\n",
       " 33191,\n",
       " 65960,\n",
       " 33193,\n",
       " 98740,\n",
       " 33210,\n",
       " 98751,\n",
       " 98754,\n",
       " 33220,\n",
       " 453,\n",
       " 98763,\n",
       " 461,\n",
       " 469,\n",
       " 33244,\n",
       " 98794,\n",
       " 495,\n",
       " 98803,\n",
       " 33273,\n",
       " 33276,\n",
       " 66046,\n",
       " 33290,\n",
       " 530,\n",
       " 33300,\n",
       " 66069,\n",
       " 66077,\n",
       " 542,\n",
       " 543,\n",
       " 66081,\n",
       " 66091,\n",
       " 556,\n",
       " 66093,\n",
       " 66094,\n",
       " 98860,\n",
       " 66097,\n",
       " 98877,\n",
       " 66111,\n",
       " 33346,\n",
       " 579,\n",
       " 33347,\n",
       " 588,\n",
       " 66152,\n",
       " 66153,\n",
       " 98932,\n",
       " 66166,\n",
       " 639,\n",
       " 640,\n",
       " 642,\n",
       " 98948,\n",
       " 66185,\n",
       " 651,\n",
       " 33422,\n",
       " 98962,\n",
       " 33427,\n",
       " 98970,\n",
       " 66207,\n",
       " 673,\n",
       " 33441,\n",
       " 33446,\n",
       " 66221,\n",
       " 687,\n",
       " 66224,\n",
       " 33459,\n",
       " 694,\n",
       " 33464,\n",
       " 33466,\n",
       " 33468,\n",
       " 99005,\n",
       " 66238,\n",
       " 99007,\n",
       " 710,\n",
       " 99017,\n",
       " 718,\n",
       " 99044,\n",
       " 743,\n",
       " 33513,\n",
       " 749,\n",
       " 33523,\n",
       " 759,\n",
       " 66299,\n",
       " 99068,\n",
       " 33535,\n",
       " 769,\n",
       " 33539,\n",
       " 66307,\n",
       " 66309,\n",
       " 99084,\n",
       " 66321,\n",
       " 99090,\n",
       " 66323,\n",
       " 99098,\n",
       " 66334,\n",
       " 800,\n",
       " 66337,\n",
       " 33576,\n",
       " 99120,\n",
       " 99122,\n",
       " 33592,\n",
       " 33593,\n",
       " 33600,\n",
       " 33604,\n",
       " 99140,\n",
       " 841,\n",
       " 845,\n",
       " 99149,\n",
       " 66389,\n",
       " 854,\n",
       " 33624,\n",
       " 33625,\n",
       " 858,\n",
       " 66395,\n",
       " 99163,\n",
       " 33629,\n",
       " 99173,\n",
       " 33638,\n",
       " 874,\n",
       " 66411,\n",
       " 878,\n",
       " 33647,\n",
       " 66415,\n",
       " 66417,\n",
       " 33650,\n",
       " 66418,\n",
       " 99182,\n",
       " 891,\n",
       " 33662,\n",
       " 66436,\n",
       " 901,\n",
       " 99208,\n",
       " 66441,\n",
       " 33674,\n",
       " 33675,\n",
       " 33682,\n",
       " 916,\n",
       " 66452,\n",
       " 33691,\n",
       " 33692,\n",
       " 66461,\n",
       " 66463,\n",
       " 33697,\n",
       " 99234,\n",
       " 931,\n",
       " 938,\n",
       " 944,\n",
       " 66480,\n",
       " 33716,\n",
       " 99254,\n",
       " 959,\n",
       " 33728,\n",
       " 99267,\n",
       " 33736,\n",
       " 33741,\n",
       " 978,\n",
       " 33750,\n",
       " 992,\n",
       " 33765,\n",
       " 99301,\n",
       " 1000,\n",
       " 1005,\n",
       " 99309,\n",
       " 33780,\n",
       " 99318,\n",
       " 1017,\n",
       " 33787,\n",
       " 66557,\n",
       " 1024,\n",
       " 1027,\n",
       " 1034,\n",
       " 1036,\n",
       " 66577,\n",
       " 66580,\n",
       " 99351,\n",
       " 99360,\n",
       " 1057,\n",
       " 33825,\n",
       " 99363,\n",
       " 33831,\n",
       " 66600,\n",
       " 33837,\n",
       " 66607,\n",
       " 99376,\n",
       " 33841,\n",
       " 99380,\n",
       " 66613,\n",
       " 33849,\n",
       " 66622,\n",
       " 1087,\n",
       " 1088,\n",
       " 99391,\n",
       " 1095,\n",
       " 1098,\n",
       " 66637,\n",
       " 1102,\n",
       " 99405,\n",
       " 99410,\n",
       " 99413,\n",
       " 66646,\n",
       " 99419,\n",
       " 66652,\n",
       " 1118,\n",
       " 99431,\n",
       " 1129,\n",
       " 66670,\n",
       " 33912,\n",
       " 33914,\n",
       " 66685,\n",
       " 33927,\n",
       " 99467,\n",
       " 99470,\n",
       " 99472,\n",
       " 66705,\n",
       " 66708,\n",
       " 66709,\n",
       " 99484,\n",
       " 99485,\n",
       " 99487,\n",
       " 66721,\n",
       " 33954,\n",
       " 1187,\n",
       " 99495,\n",
       " 66728,\n",
       " 66732,\n",
       " 66734,\n",
       " 99510,\n",
       " 99517,\n",
       " 66751,\n",
       " 1223,\n",
       " 33992,\n",
       " 99528,\n",
       " 99537,\n",
       " 66770,\n",
       " 99539,\n",
       " 1237,\n",
       " 1242,\n",
       " 66779,\n",
       " 66780,\n",
       " 1247,\n",
       " 99555,\n",
       " 34024,\n",
       " 1265,\n",
       " 34034,\n",
       " 1270,\n",
       " 34042,\n",
       " 1277,\n",
       " 66814,\n",
       " 99583,\n",
       " 34064,\n",
       " 1297,\n",
       " 66834,\n",
       " 66844,\n",
       " 1312,\n",
       " 66849,\n",
       " 1326,\n",
       " 66866,\n",
       " 34099,\n",
       " 99637,\n",
       " 1334,\n",
       " 66871,\n",
       " 66877,\n",
       " 1342,\n",
       " 1345,\n",
       " 99651,\n",
       " 66887,\n",
       " 1352,\n",
       " 1354,\n",
       " 66895,\n",
       " 99671,\n",
       " 66909,\n",
       " 99681,\n",
       " 1392,\n",
       " 66929,\n",
       " 34162,\n",
       " 1401,\n",
       " 1402,\n",
       " 34169,\n",
       " 99709,\n",
       " 1407,\n",
       " 66944,\n",
       " 66945,\n",
       " 1412,\n",
       " 1415,\n",
       " 34184,\n",
       " 1420,\n",
       " 99729,\n",
       " 1426,\n",
       " 34196,\n",
       " 66971,\n",
       " 1436,\n",
       " 66974,\n",
       " 1443,\n",
       " 34217,\n",
       " 66986,\n",
       " 1452,\n",
       " 66993,\n",
       " 1459,\n",
       " 34227,\n",
       " 66996,\n",
       " 34258,\n",
       " 99797,\n",
       " 99803,\n",
       " 1508,\n",
       " 99814,\n",
       " 1512,\n",
       " 1515,\n",
       " 67053,\n",
       " 99821,\n",
       " 99824,\n",
       " 99825,\n",
       " 34303,\n",
       " 67071,\n",
       " 34307,\n",
       " 67077,\n",
       " 67079,\n",
       " 99854,\n",
       " 67092,\n",
       " 67094,\n",
       " 99862,\n",
       " 1565,\n",
       " 34333,\n",
       " 1567,\n",
       " 34338,\n",
       " 1579,\n",
       " 1585,\n",
       " 67127,\n",
       " 99897,\n",
       " 1603,\n",
       " 1604,\n",
       " 67147,\n",
       " 67152,\n",
       " 34388,\n",
       " 67156,\n",
       " 99924,\n",
       " 1623,\n",
       " 34395,\n",
       " 99931,\n",
       " 1635,\n",
       " 99946,\n",
       " 67180,\n",
       " 99949,\n",
       " 99954,\n",
       " 34419,\n",
       " 99957,\n",
       " 67197,\n",
       " 34434,\n",
       " 99970,\n",
       " 34440,\n",
       " 1673,\n",
       " 1675,\n",
       " 1676,\n",
       " 34443,\n",
       " 67212,\n",
       " 67216,\n",
       " 34457,\n",
       " 1691,\n",
       " 1692,\n",
       " 1697,\n",
       " 34473,\n",
       " 67241,\n",
       " 34480,\n",
       " 67250,\n",
       " 67269,\n",
       " 1743,\n",
       " 67284,\n",
       " 34528,\n",
       " 1775,\n",
       " 34553,\n",
       " 67323,\n",
       " 1798,\n",
       " 34572,\n",
       " 1805,\n",
       " 34576,\n",
       " 67344,\n",
       " 67350,\n",
       " 1820,\n",
       " 1821,\n",
       " 34590,\n",
       " 1825,\n",
       " 34606,\n",
       " 1839,\n",
       " 67375,\n",
       " 67379,\n",
       " 1852,\n",
       " 1861,\n",
       " 1866,\n",
       " 67402,\n",
       " 1874,\n",
       " 67416,\n",
       " 67431,\n",
       " 1904,\n",
       " 67441,\n",
       " 34687,\n",
       " 1920,\n",
       " 34689,\n",
       " 34701,\n",
       " 34708,\n",
       " 34710,\n",
       " 67492,\n",
       " 67493,\n",
       " 1970,\n",
       " 34744,\n",
       " 34753,\n",
       " 67531,\n",
       " 2004,\n",
       " 2008,\n",
       " 2009,\n",
       " 2023,\n",
       " 67560,\n",
       " 34793,\n",
       " 67562,\n",
       " 34795,\n",
       " 2037,\n",
       " 34807,\n",
       " 2046,\n",
       " 34819,\n",
       " 2052,\n",
       " 34827,\n",
       " 2068,\n",
       " 67613,\n",
       " 34848,\n",
       " 67626,\n",
       " 67628,\n",
       " 34864,\n",
       " 2099,\n",
       " 67636,\n",
       " 34869,\n",
       " 67641,\n",
       " 2110,\n",
       " 67649,\n",
       " 2117,\n",
       " 67655,\n",
       " 67666,\n",
       " 67669,\n",
       " 67680,\n",
       " 34914,\n",
       " 67682,\n",
       " 2151,\n",
       " 34926,\n",
       " 2160,\n",
       " 2161,\n",
       " 34929,\n",
       " 67699,\n",
       " 2170,\n",
       " 2178,\n",
       " 34949,\n",
       " 67718,\n",
       " 2187,\n",
       " 2193,\n",
       " 67730,\n",
       " 67735,\n",
       " 2216,\n",
       " 2218,\n",
       " 2230,\n",
       " 34998,\n",
       " 2235,\n",
       " 67771,\n",
       " 2244,\n",
       " 2247,\n",
       " 67791,\n",
       " 2263,\n",
       " 67800,\n",
       " 2266,\n",
       " 35037,\n",
       " 67810,\n",
       " 67816,\n",
       " 35049,\n",
       " 35055,\n",
       " 67824,\n",
       " 67825,\n",
       " 35059,\n",
       " 35063,\n",
       " 2296,\n",
       " 35067,\n",
       " 2323,\n",
       " 2326,\n",
       " 2330,\n",
       " 67867,\n",
       " 2335,\n",
       " 67886,\n",
       " 2365,\n",
       " 2370,\n",
       " 2372,\n",
       " 2377,\n",
       " 67915,\n",
       " 2380,\n",
       " 2392,\n",
       " 67928,\n",
       " 67934,\n",
       " 35174,\n",
       " 35180,\n",
       " 35188,\n",
       " 67958,\n",
       " 35195,\n",
       " 67968,\n",
       " 35202,\n",
       " 2441,\n",
       " 35211,\n",
       " 67981,\n",
       " 35215,\n",
       " 67984,\n",
       " 35219,\n",
       " 35220,\n",
       " 35228,\n",
       " 2465,\n",
       " 2483,\n",
       " 2484,\n",
       " 35256,\n",
       " 68025,\n",
       " 2510,\n",
       " 35280,\n",
       " 35281,\n",
       " 68066,\n",
       " 2532,\n",
       " 35310,\n",
       " 68084,\n",
       " 2553,\n",
       " 68089,\n",
       " 68097,\n",
       " 2566,\n",
       " 35351,\n",
       " 35358,\n",
       " 2594,\n",
       " 2607,\n",
       " 35378,\n",
       " 2612,\n",
       " 68151,\n",
       " 35385,\n",
       " 2620,\n",
       " 35394,\n",
       " 35395,\n",
       " 68163,\n",
       " 68164,\n",
       " 68169,\n",
       " 35415,\n",
       " 2653,\n",
       " 68199,\n",
       " 68200,\n",
       " 68209,\n",
       " 2677,\n",
       " 68215,\n",
       " 35458,\n",
       " 35459,\n",
       " 2692,\n",
       " 35466,\n",
       " 35472,\n",
       " 2705,\n",
       " 35473,\n",
       " 35474,\n",
       " 68240,\n",
       " 2710,\n",
       " 2712,\n",
       " 35481,\n",
       " 2715,\n",
       " 2721,\n",
       " 68257,\n",
       " 68264,\n",
       " 68265,\n",
       " 35503,\n",
       " 2744,\n",
       " 68290,\n",
       " 2761,\n",
       " 68306,\n",
       " 35539,\n",
       " 35547,\n",
       " 35549,\n",
       " 2786,\n",
       " 35557,\n",
       " 35559,\n",
       " 68329,\n",
       " 68332,\n",
       " 68334,\n",
       " 68337,\n",
       " 35570,\n",
       " 2804,\n",
       " 68343,\n",
       " 35587,\n",
       " 2824,\n",
       " 35603,\n",
       " 2838,\n",
       " 68375,\n",
       " 35613,\n",
       " 2853,\n",
       " 35622,\n",
       " 35634,\n",
       " 2868,\n",
       " 35636,\n",
       " 68408,\n",
       " 68419,\n",
       " 35654,\n",
       " 2887,\n",
       " 68440,\n",
       " 68445,\n",
       " 2916,\n",
       " 35689,\n",
       " 68457,\n",
       " 35697,\n",
       " 35729,\n",
       " 2968,\n",
       " 68506,\n",
       " 35740,\n",
       " 68512,\n",
       " 2981,\n",
       " 35758,\n",
       " 2991,\n",
       " 35759,\n",
       " 2996,\n",
       " 3000,\n",
       " 68537,\n",
       " 3009,\n",
       " 68549,\n",
       " 68556,\n",
       " 35792,\n",
       " 35793,\n",
       " 68562,\n",
       " 68568,\n",
       " 35804,\n",
       " 35809,\n",
       " 35810,\n",
       " 68578,\n",
       " 3050,\n",
       " 3053,\n",
       " 68590,\n",
       " 3061,\n",
       " 3067,\n",
       " 35835,\n",
       " 35843,\n",
       " 3076,\n",
       " 68613,\n",
       " 35846,\n",
       " 3081,\n",
       " 35855,\n",
       " 68629,\n",
       " 3094,\n",
       " 35863,\n",
       " 35867,\n",
       " 35888,\n",
       " 35899,\n",
       " 35915,\n",
       " 68683,\n",
       " 68685,\n",
       " 3151,\n",
       " 68687,\n",
       " 68690,\n",
       " 68697,\n",
       " 35932,\n",
       " 68705,\n",
       " 3171,\n",
       " 68708,\n",
       " 68711,\n",
       " 68721,\n",
       " 3187,\n",
       " 35974,\n",
       " 68744,\n",
       " 3210,\n",
       " 3211,\n",
       " 35983,\n",
       " 68761,\n",
       " 36002,\n",
       " 3236,\n",
       " 68774,\n",
       " 3240,\n",
       " 68790,\n",
       " 3256,\n",
       " 68792,\n",
       " 3265,\n",
       " 3269,\n",
       " 3271,\n",
       " 68809,\n",
       " 3275,\n",
       " 36052,\n",
       " 3285,\n",
       " 3286,\n",
       " 68823,\n",
       " 3288,\n",
       " 3291,\n",
       " 3293,\n",
       " 3303,\n",
       " 68855,\n",
       " 68865,\n",
       " 3333,\n",
       " 36107,\n",
       " 3356,\n",
       " 3364,\n",
       " 3375,\n",
       " 68914,\n",
       " 68930,\n",
       " 68948,\n",
       " 3414,\n",
       " 36192,\n",
       " 36197,\n",
       " 36201,\n",
       " 3434,\n",
       " 36211,\n",
       " 36214,\n",
       " 3455,\n",
       " 36226,\n",
       " 69014,\n",
       " 69015,\n",
       " 3482,\n",
       " 3483,\n",
       " 69022,\n",
       " 3496,\n",
       " 3500,\n",
       " 36273,\n",
       " 36277,\n",
       " 3526,\n",
       " 3527,\n",
       " 36294,\n",
       " 36300,\n",
       " 3536,\n",
       " 3539,\n",
       " 36308,\n",
       " 3542,\n",
       " 3543,\n",
       " 69082,\n",
       " 36320,\n",
       " 36321,\n",
       " 36326,\n",
       " 3560,\n",
       " 3572,\n",
       " 36340,\n",
       " 36341,\n",
       " 36346,\n",
       " 69124,\n",
       " 3590,\n",
       " 36370,\n",
       " 3610,\n",
       " 3613,\n",
       " 69158,\n",
       " 3627,\n",
       " 3636,\n",
       " 36404,\n",
       " 36416,\n",
       " 36419,\n",
       " 3652,\n",
       " 3655,\n",
       " 3657,\n",
       " 3669,\n",
       " 3673,\n",
       " 36444,\n",
       " 36446,\n",
       " 3680,\n",
       " 69220,\n",
       " 69221,\n",
       " 69231,\n",
       " 36469,\n",
       " 69244,\n",
       " 3712,\n",
       " 3713,\n",
       " 36481,\n",
       " 69251,\n",
       " 36488,\n",
       " 69264,\n",
       " 36501,\n",
       " 36509,\n",
       " 36510,\n",
       " 3744,\n",
       " 69282,\n",
       " 69287,\n",
       " 69294,\n",
       " 36527,\n",
       " 36528,\n",
       " 3764,\n",
       " 3785,\n",
       " 3791,\n",
       " 69338,\n",
       " 36572,\n",
       " 3807,\n",
       " 69344,\n",
       " 3815,\n",
       " 69365,\n",
       " 36598,\n",
       " 3832,\n",
       " 3837,\n",
       " 36619,\n",
       " 69405,\n",
       " 3882,\n",
       " 3883,\n",
       " 36650,\n",
       " 69426,\n",
       " 36687,\n",
       " 3928,\n",
       " 3931,\n",
       " 69474,\n",
       " 36707,\n",
       " 69477,\n",
       " 36710,\n",
       " 3945,\n",
       " 36723,\n",
       " 3960,\n",
       " 69512,\n",
       " 3980,\n",
       " 69516,\n",
       " 69518,\n",
       " 69519,\n",
       " 69523,\n",
       " 36756,\n",
       " 36762,\n",
       " 36765,\n",
       " 36771,\n",
       " 36778,\n",
       " 4026,\n",
       " 69563,\n",
       " 69569,\n",
       " 69578,\n",
       " 69596,\n",
       " 36829,\n",
       " 4069,\n",
       " 36838,\n",
       " 69607,\n",
       " 36841,\n",
       " 4074,\n",
       " 36845,\n",
       " 69617,\n",
       " 4093,\n",
       " 36871,\n",
       " 36874,\n",
       " 36878,\n",
       " 36880,\n",
       " 36888,\n",
       " 36890,\n",
       " 4124,\n",
       " 69662,\n",
       " 4127,\n",
       " 36898,\n",
       " 4133,\n",
       " 36916,\n",
       " 36917,\n",
       " 36924,\n",
       " 69692,\n",
       " 69695,\n",
       " 4161,\n",
       " 4162,\n",
       " 4165,\n",
       " 69707,\n",
       " 4175,\n",
       " 36948,\n",
       " 4182,\n",
       " 36958,\n",
       " 69732,\n",
       " 36974,\n",
       " 36978,\n",
       " 69748,\n",
       " 4220,\n",
       " 36995,\n",
       " 69766,\n",
       " 36999,\n",
       " 69767,\n",
       " 69770,\n",
       " 4238,\n",
       " 37007,\n",
       " 69783,\n",
       " 4250,\n",
       " 69800,\n",
       " 69802,\n",
       " 4267,\n",
       " 37036,\n",
       " 37037,\n",
       " 37039,\n",
       " 4278,\n",
       " 37049,\n",
       " 4282,\n",
       " 4287,\n",
       " 4290,\n",
       " 4291,\n",
       " 37064,\n",
       " 4301,\n",
       " 37072,\n",
       " 37073,\n",
       " 69847,\n",
       " 4312,\n",
       " 4321,\n",
       " 4323,\n",
       " 4324,\n",
       " 4332,\n",
       " 4336,\n",
       " 69874,\n",
       " 69880,\n",
       " 37114,\n",
       " 37120,\n",
       " 4354,\n",
       " 4360,\n",
       " 69898,\n",
       " 37131,\n",
       " 69903,\n",
       " 37150,\n",
       " 4387,\n",
       " 69923,\n",
       " 69928,\n",
       " 37166,\n",
       " 69934,\n",
       " 69944,\n",
       " 37177,\n",
       " 4425,\n",
       " 37193,\n",
       " 4429,\n",
       " 37216,\n",
       " 4463,\n",
       " 37234,\n",
       " 37235,\n",
       " 70002,\n",
       " 4492,\n",
       " 4493,\n",
       " 4500,\n",
       " 70040,\n",
       " ...}"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inverted_list[\"发货\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3832"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(inverted_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "#d ata/retrieve/sgns.zhihu.word是从https://github.com/Embedding/Chinese-Word-Vectors下载到的预训练好的中文词向量文件\n",
    "#使 用KeyedVectors.load_word2vec_format()函数加载预训练好的词向量文件\n",
    "model = KeyedVectors.load_word2vec_format('data/retrieve/sgns.zhihu.word')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_similar_by_word(word,topk):\n",
    "    '''\n",
    "        返回与一个单词word相似度最高的topk个单词所组成的单词列表\n",
    "        出参:\n",
    "            word_list:与word相似度最高的topk个单词所组成的单词列表。格式为[单词1,单词2,单词3,单词4,单词5]\n",
    "    '''\n",
    "    similar_words = model.similar_by_word(word,topk)\n",
    "    word_list = [word[0] for word in similar_words]\n",
    "    return word_list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['昨天', '现在', '今天下午', '明天', '今日']"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "get_similar_by_word(\"今天\",5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```TODO2``` 构造一个新的倒排表,考虑单词之间的语义相似度"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 3832/3832 [00:44<00:00, 85.74it/s] "
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "OOV_count: 832\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "# TODO:\n",
    "# 构造一个新的倒排表,并将结果保存在字典inverted_list_new中\n",
    "# 新的倒排表键为word,值为老倒排表[word]、老倒排表[单词1]、老倒排表[单词2]、老倒排表[单词3]、老倒排表[单词4]的并集\n",
    "# 即新倒排表保存了包含单词word或包含与单词word最相近的5个单词中的某一个的问题的index\n",
    "inverted_list_new = {}\n",
    "OOV_count = 0\n",
    "for word in tqdm(inverted_list):\n",
    "    ### 你需要完成的部分\n",
    "    try:\n",
    "        top_4_words = get_similar_by_word(word,4)\n",
    "        inverted_list_new[word] = set()\n",
    "        inverted_list_new[word] = inverted_list_new[word].union(inverted_list[word])\n",
    "        for t_word in top_4_words:\n",
    "            if t_word in inverted_list:\n",
    "                inverted_list_new[word] = inverted_list_new[word].union(inverted_list[t_word])\n",
    "    except Exception as e:\n",
    "        OOV_count += 1\n",
    "print(\"OOV_count:\",OOV_count)\n",
    "    ### 你需要完成的代码结束\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{81920,\n",
       " 16386,\n",
       " 5,\n",
       " 65541,\n",
       " 81927,\n",
       " 32776,\n",
       " 81930,\n",
       " 81935,\n",
       " 17,\n",
       " 18,\n",
       " 65554,\n",
       " 16401,\n",
       " 81947,\n",
       " 98331,\n",
       " 29,\n",
       " 65566,\n",
       " 32800,\n",
       " 81953,\n",
       " 32803,\n",
       " 98339,\n",
       " 81959,\n",
       " 32810,\n",
       " 98346,\n",
       " 49194,\n",
       " 32818,\n",
       " 16435,\n",
       " 55,\n",
       " 49209,\n",
       " 98366,\n",
       " 64,\n",
       " 49219,\n",
       " 65604,\n",
       " 81988,\n",
       " 16458,\n",
       " 65611,\n",
       " 81995,\n",
       " 81998,\n",
       " 16463,\n",
       " 16464,\n",
       " 49233,\n",
       " 32850,\n",
       " 98387,\n",
       " 49234,\n",
       " 82004,\n",
       " 86,\n",
       " 81999,\n",
       " 98386,\n",
       " 16475,\n",
       " 32859,\n",
       " 49245,\n",
       " 98398,\n",
       " 65631,\n",
       " 82015,\n",
       " 65630,\n",
       " 102,\n",
       " 65639,\n",
       " 65640,\n",
       " 49259,\n",
       " 65646,\n",
       " 98415,\n",
       " 98416,\n",
       " 49263,\n",
       " 16495,\n",
       " 49267,\n",
       " 16500,\n",
       " 82035,\n",
       " 118,\n",
       " 16503,\n",
       " 65650,\n",
       " 65656,\n",
       " 122,\n",
       " 65659,\n",
       " 49275,\n",
       " 125,\n",
       " 32894,\n",
       " 65660,\n",
       " 133,\n",
       " 65669,\n",
       " 65670,\n",
       " 65671,\n",
       " 32902,\n",
       " 139,\n",
       " 49293,\n",
       " 142,\n",
       " 65679,\n",
       " 32912,\n",
       " 98451,\n",
       " 150,\n",
       " 151,\n",
       " 32929,\n",
       " 49318,\n",
       " 49320,\n",
       " 65708,\n",
       " 82092,\n",
       " 82093,\n",
       " 65711,\n",
       " 98484,\n",
       " 98489,\n",
       " 49337,\n",
       " 187,\n",
       " 49340,\n",
       " 32957,\n",
       " 200,\n",
       " 16588,\n",
       " 32973,\n",
       " 65742,\n",
       " 16589,\n",
       " 98518,\n",
       " 16598,\n",
       " 49366,\n",
       " 82137,\n",
       " 65755,\n",
       " 220,\n",
       " 223,\n",
       " 65764,\n",
       " 82149,\n",
       " 82155,\n",
       " 16621,\n",
       " 49396,\n",
       " 65783,\n",
       " 33017,\n",
       " 65786,\n",
       " 254,\n",
       " 65790,\n",
       " 82176,\n",
       " 261,\n",
       " 65798,\n",
       " 65806,\n",
       " 65810,\n",
       " 275,\n",
       " 279,\n",
       " 98586,\n",
       " 82208,\n",
       " 49442,\n",
       " 65833,\n",
       " 33068,\n",
       " 82220,\n",
       " 65838,\n",
       " 82221,\n",
       " 82224,\n",
       " 33073,\n",
       " 65843,\n",
       " 65844,\n",
       " 16691,\n",
       " 310,\n",
       " 16696,\n",
       " 82234,\n",
       " 65852,\n",
       " 49468,\n",
       " 318,\n",
       " 49471,\n",
       " 49472,\n",
       " 16706,\n",
       " 49475,\n",
       " 65862,\n",
       " 65863,\n",
       " 16712,\n",
       " 82248,\n",
       " 65866,\n",
       " 49483,\n",
       " 49493,\n",
       " 16726,\n",
       " 344,\n",
       " 16730,\n",
       " 65883,\n",
       " 82268,\n",
       " 65885,\n",
       " 350,\n",
       " 33120,\n",
       " 16745,\n",
       " 364,\n",
       " 98668,\n",
       " 65903,\n",
       " 33140,\n",
       " 98678,\n",
       " 65913,\n",
       " 33149,\n",
       " 49535,\n",
       " 33158,\n",
       " 49544,\n",
       " 82313,\n",
       " 33163,\n",
       " 16783,\n",
       " 33168,\n",
       " 401,\n",
       " 82322,\n",
       " 98709,\n",
       " 49558,\n",
       " 98715,\n",
       " 16796,\n",
       " 49565,\n",
       " 82333,\n",
       " 82334,\n",
       " 33189,\n",
       " 33191,\n",
       " 65960,\n",
       " 33193,\n",
       " 49579,\n",
       " 16812,\n",
       " 98740,\n",
       " 65973,\n",
       " 16822,\n",
       " 82359,\n",
       " 16825,\n",
       " 33210,\n",
       " 16827,\n",
       " 82365,\n",
       " 98751,\n",
       " 16832,\n",
       " 98754,\n",
       " 33220,\n",
       " 453,\n",
       " 49604,\n",
       " 98763,\n",
       " 461,\n",
       " 469,\n",
       " 49623,\n",
       " 16856,\n",
       " 33244,\n",
       " 49629,\n",
       " 16867,\n",
       " 16869,\n",
       " 98794,\n",
       " 82410,\n",
       " 82412,\n",
       " 495,\n",
       " 16882,\n",
       " 98803,\n",
       " 49651,\n",
       " 49656,\n",
       " 33273,\n",
       " 16889,\n",
       " 49658,\n",
       " 33276,\n",
       " 82426,\n",
       " 66046,\n",
       " 507,\n",
       " 16895,\n",
       " 49669,\n",
       " 82437,\n",
       " 33290,\n",
       " 49674,\n",
       " 16911,\n",
       " 530,\n",
       " 33300,\n",
       " 66069,\n",
       " 16918,\n",
       " 16922,\n",
       " 66077,\n",
       " 542,\n",
       " 543,\n",
       " 82463,\n",
       " 66081,\n",
       " 16932,\n",
       " 66091,\n",
       " 556,\n",
       " 66093,\n",
       " 66094,\n",
       " 98860,\n",
       " 82475,\n",
       " 66097,\n",
       " 49709,\n",
       " 16942,\n",
       " 49715,\n",
       " 49716,\n",
       " 66106,\n",
       " 98877,\n",
       " 66111,\n",
       " 49729,\n",
       " 33346,\n",
       " 579,\n",
       " 33347,\n",
       " 82499,\n",
       " 49735,\n",
       " 16967,\n",
       " 49739,\n",
       " 588,\n",
       " 16983,\n",
       " 82522,\n",
       " 16991,\n",
       " 82528,\n",
       " 49761,\n",
       " 49762,\n",
       " 66151,\n",
       " 66152,\n",
       " 66153,\n",
       " 17003,\n",
       " 49777,\n",
       " 98932,\n",
       " 17012,\n",
       " 66166,\n",
       " 17020,\n",
       " 17021,\n",
       " 639,\n",
       " 640,\n",
       " 49793,\n",
       " 642,\n",
       " 17026,\n",
       " 98948,\n",
       " 17027,\n",
       " 49796,\n",
       " 82563,\n",
       " 17030,\n",
       " 66185,\n",
       " 82566,\n",
       " 651,\n",
       " 33422,\n",
       " 82576,\n",
       " 98962,\n",
       " 33427,\n",
       " 17043,\n",
       " 49812,\n",
       " 82584,\n",
       " 98970,\n",
       " 17050,\n",
       " 17052,\n",
       " 66207,\n",
       " 82592,\n",
       " 673,\n",
       " 33441,\n",
       " 17057,\n",
       " 17061,\n",
       " 33446,\n",
       " 49831,\n",
       " 66221,\n",
       " 82605,\n",
       " 687,\n",
       " 66224,\n",
       " 49841,\n",
       " 33453,\n",
       " 33459,\n",
       " 17076,\n",
       " 66228,\n",
       " 694,\n",
       " 33464,\n",
       " 33466,\n",
       " 33468,\n",
       " 99005,\n",
       " 66238,\n",
       " 99007,\n",
       " 99006,\n",
       " 82628,\n",
       " 710,\n",
       " 82630,\n",
       " 82632,\n",
       " 99017,\n",
       " 711,\n",
       " 718,\n",
       " 82639,\n",
       " 49872,\n",
       " 82645,\n",
       " 82652,\n",
       " 49889,\n",
       " 82658,\n",
       " 99044,\n",
       " 743,\n",
       " 17127,\n",
       " 33513,\n",
       " 49897,\n",
       " 17132,\n",
       " 749,\n",
       " 49901,\n",
       " 49903,\n",
       " 82674,\n",
       " 33523,\n",
       " 17142,\n",
       " 759,\n",
       " 17144,\n",
       " 33527,\n",
       " 66299,\n",
       " 99068,\n",
       " 33535,\n",
       " 17152,\n",
       " 769,\n",
       " 33539,\n",
       " 66307,\n",
       " 66309,\n",
       " 82694,\n",
       " 17162,\n",
       " 99084,\n",
       " 49932,\n",
       " 17167,\n",
       " 49935,\n",
       " 66321,\n",
       " 99090,\n",
       " 66323,\n",
       " 783,\n",
       " 82713,\n",
       " 99098,\n",
       " 66333,\n",
       " 66334,\n",
       " 82719,\n",
       " 800,\n",
       " 66337,\n",
       " 17185,\n",
       " 49954,\n",
       " 17190,\n",
       " 33576,\n",
       " 49962,\n",
       " 17196,\n",
       " 99120,\n",
       " 99122,\n",
       " 33587,\n",
       " 49972,\n",
       " 49974,\n",
       " 17207,\n",
       " 33592,\n",
       " 33593,\n",
       " 33600,\n",
       " 33604,\n",
       " 99140,\n",
       " 82757,\n",
       " 841,\n",
       " 49994,\n",
       " 845,\n",
       " 99149,\n",
       " 17235,\n",
       " 82771,\n",
       " 66389,\n",
       " 854,\n",
       " 82775,\n",
       " 33624,\n",
       " 33625,\n",
       " 858,\n",
       " 66395,\n",
       " 99163,\n",
       " 33629,\n",
       " 17245,\n",
       " 82782,\n",
       " 17249,\n",
       " 17251,\n",
       " 82787,\n",
       " 99173,\n",
       " 33638,\n",
       " 17255,\n",
       " 874,\n",
       " 66411,\n",
       " 82794,\n",
       " 17259,\n",
       " 878,\n",
       " 33647,\n",
       " 66415,\n",
       " 66417,\n",
       " 33650,\n",
       " 66418,\n",
       " 99182,\n",
       " 82798,\n",
       " 33652,\n",
       " 891,\n",
       " 17275,\n",
       " 82812,\n",
       " 33662,\n",
       " 82813,\n",
       " 17283,\n",
       " 66436,\n",
       " 901,\n",
       " 99208,\n",
       " 66441,\n",
       " 33674,\n",
       " 33675,\n",
       " 17291,\n",
       " 82829,\n",
       " 33682,\n",
       " 916,\n",
       " 66452,\n",
       " 17304,\n",
       " 33688,\n",
       " 33691,\n",
       " 33692,\n",
       " 66461,\n",
       " 66463,\n",
       " 82847,\n",
       " 33697,\n",
       " 99234,\n",
       " 931,\n",
       " 82848,\n",
       " 17317,\n",
       " 938,\n",
       " 17324,\n",
       " 82863,\n",
       " 944,\n",
       " 66480,\n",
       " 33716,\n",
       " 99254,\n",
       " 82878,\n",
       " 959,\n",
       " 33728,\n",
       " 99267,\n",
       " 33736,\n",
       " 33741,\n",
       " 17359,\n",
       " 978,\n",
       " 17364,\n",
       " 33750,\n",
       " 82905,\n",
       " 17371,\n",
       " 992,\n",
       " 17378,\n",
       " 33765,\n",
       " 99301,\n",
       " 1000,\n",
       " 50152,\n",
       " 50155,\n",
       " 1005,\n",
       " 99309,\n",
       " 50158,\n",
       " 82928,\n",
       " 66542,\n",
       " 33780,\n",
       " 99318,\n",
       " 1017,\n",
       " 33787,\n",
       " 66557,\n",
       " 1024,\n",
       " 17408,\n",
       " 17409,\n",
       " 1027,\n",
       " 50179,\n",
       " 82949,\n",
       " 82950,\n",
       " 1034,\n",
       " 50187,\n",
       " 1036,\n",
       " 50191,\n",
       " 66577,\n",
       " 66580,\n",
       " 99351,\n",
       " 82969,\n",
       " 82970,\n",
       " 17437,\n",
       " 99360,\n",
       " 1057,\n",
       " 33825,\n",
       " 99363,\n",
       " 82977,\n",
       " 82979,\n",
       " 33831,\n",
       " 66600,\n",
       " 17447,\n",
       " 17448,\n",
       " 33837,\n",
       " 50221,\n",
       " 66607,\n",
       " 99376,\n",
       " 33841,\n",
       " 82993,\n",
       " 99380,\n",
       " 66613,\n",
       " 82996,\n",
       " 82998,\n",
       " 33849,\n",
       " 83003,\n",
       " 66622,\n",
       " 1087,\n",
       " 1088,\n",
       " 99391,\n",
       " 17473,\n",
       " 83011,\n",
       " 50245,\n",
       " 1095,\n",
       " 1098,\n",
       " 66637,\n",
       " 1102,\n",
       " 99405,\n",
       " 50256,\n",
       " 99410,\n",
       " 66644,\n",
       " 99413,\n",
       " 66646,\n",
       " 99419,\n",
       " 66652,\n",
       " 83037,\n",
       " 1118,\n",
       " 83040,\n",
       " 99431,\n",
       " 66663,\n",
       " 1129,\n",
       " 17516,\n",
       " 83052,\n",
       " 66670,\n",
       " 33900,\n",
       " 50290,\n",
       " 17524,\n",
       " 33912,\n",
       " 83065,\n",
       " 33914,\n",
       " 66685,\n",
       " 83071,\n",
       " 17537,\n",
       " 50309,\n",
       " 33927,\n",
       " 50314,\n",
       " 99467,\n",
       " 99470,\n",
       " 50318,\n",
       " 99472,\n",
       " 66705,\n",
       " 83088,\n",
       " 66708,\n",
       " 66709,\n",
       " 50327,\n",
       " 99484,\n",
       " 99485,\n",
       " 83101,\n",
       " 99487,\n",
       " 66721,\n",
       " 33954,\n",
       " 1187,\n",
       " 83105,\n",
       " 50337,\n",
       " 50341,\n",
       " 99495,\n",
       " 66728,\n",
       " 50345,\n",
       " 50346,\n",
       " 17578,\n",
       " 66732,\n",
       " 83116,\n",
       " 66734,\n",
       " 50351,\n",
       " 50356,\n",
       " 99510,\n",
       " 17591,\n",
       " 50363,\n",
       " 99517,\n",
       " 66751,\n",
       " 17599,\n",
       " 1223,\n",
       " 33992,\n",
       " 99528,\n",
       " 17607,\n",
       " 50376,\n",
       " 50379,\n",
       " 17616,\n",
       " 99537,\n",
       " 66770,\n",
       " 99539,\n",
       " 1237,\n",
       " 1242,\n",
       " 66779,\n",
       " 66780,\n",
       " 17628,\n",
       " 1247,\n",
       " 17631,\n",
       " 83168,\n",
       " 50402,\n",
       " 99555,\n",
       " 50405,\n",
       " 34024,\n",
       " 50409,\n",
       " 50411,\n",
       " 1265,\n",
       " 34034,\n",
       " 83186,\n",
       " 17651,\n",
       " 50419,\n",
       " 1270,\n",
       " 34042,\n",
       " 99580,\n",
       " 1277,\n",
       " 66814,\n",
       " 99583,\n",
       " 83200,\n",
       " 17671,\n",
       " 50441,\n",
       " 17677,\n",
       " 34064,\n",
       " 1297,\n",
       " 66834,\n",
       " 50448,\n",
       " 50449,\n",
       " 50456,\n",
       " 66844,\n",
       " 83229,\n",
       " 83231,\n",
       " 1312,\n",
       " 66849,\n",
       " 50464,\n",
       " 17698,\n",
       " 17699,\n",
       " 83236,\n",
       " 17702,\n",
       " 1326,\n",
       " 99632,\n",
       " 17713,\n",
       " 66866,\n",
       " 34099,\n",
       " 83252,\n",
       " 99637,\n",
       " 1334,\n",
       " 66871,\n",
       " 83258,\n",
       " 66877,\n",
       " 1342,\n",
       " 50493,\n",
       " 17728,\n",
       " 1345,\n",
       " 99651,\n",
       " 66887,\n",
       " 1352,\n",
       " 50503,\n",
       " 1354,\n",
       " 17737,\n",
       " 50508,\n",
       " 66895,\n",
       " 99671,\n",
       " 66909,\n",
       " 34143,\n",
       " 99681,\n",
       " 83300,\n",
       " 17770,\n",
       " 1392,\n",
       " 66929,\n",
       " 34162,\n",
       " 17777,\n",
       " 1401,\n",
       " 1402,\n",
       " 34169,\n",
       " 50554,\n",
       " 99709,\n",
       " 50557,\n",
       " 1407,\n",
       " 66944,\n",
       " 66945,\n",
       " 17788,\n",
       " 1412,\n",
       " 1415,\n",
       " 34184,\n",
       " 1420,\n",
       " 99729,\n",
       " 1426,\n",
       " 34196,\n",
       " 83351,\n",
       " 66971,\n",
       " 1436,\n",
       " 83357,\n",
       " 66974,\n",
       " 17821,\n",
       " 1443,\n",
       " 83363,\n",
       " 50598,\n",
       " 34217,\n",
       " 66986,\n",
       " 83369,\n",
       " 1452,\n",
       " 66993,\n",
       " 50609,\n",
       " 1459,\n",
       " 34227,\n",
       " 66996,\n",
       " 83385,\n",
       " 50618,\n",
       " 50626,\n",
       " 50627,\n",
       " 83406,\n",
       " 83407,\n",
       " 34258,\n",
       " 50643,\n",
       " 83412,\n",
       " 99797,\n",
       " 50647,\n",
       " 99803,\n",
       " 50652,\n",
       " 50654,\n",
       " 1508,\n",
       " 50661,\n",
       " 99814,\n",
       " 1512,\n",
       " 1515,\n",
       " 50667,\n",
       " 67053,\n",
       " 99821,\n",
       " 17901,\n",
       " 99824,\n",
       " 99825,\n",
       " 83438,\n",
       " 50672,\n",
       " 83441,\n",
       " 50677,\n",
       " 17910,\n",
       " 83446,\n",
       " 83447,\n",
       " 83448,\n",
       " 34292,\n",
       " 17916,\n",
       " 17918,\n",
       " 34303,\n",
       " 67071,\n",
       " 34307,\n",
       " 83460,\n",
       " 67077,\n",
       " 1540,\n",
       " 67079,\n",
       " 34310,\n",
       " 99854,\n",
       " 83473,\n",
       " 67092,\n",
       " 67094,\n",
       " 99862,\n",
       " 67100,\n",
       " 1565,\n",
       " 34333,\n",
       " 1567,\n",
       " 34338,\n",
       " 99877,\n",
       " 1579,\n",
       " 83502,\n",
       " 83504,\n",
       " 1585,\n",
       " 67127,\n",
       " 99897,\n",
       " 50753,\n",
       " 83521,\n",
       " 1603,\n",
       " 1604,\n",
       " 17991,\n",
       " 17992,\n",
       " 50760,\n",
       " 83527,\n",
       " 67147,\n",
       " 17996,\n",
       " 67152,\n",
       " 34388,\n",
       " 67156,\n",
       " 99924,\n",
       " 1623,\n",
       " 34395,\n",
       " 99931,\n",
       " 18016,\n",
       " 1635,\n",
       " 99946,\n",
       " 67180,\n",
       " 99949,\n",
       " 99954,\n",
       " 34419,\n",
       " 83571,\n",
       " 99957,\n",
       " 83573,\n",
       " 83572,\n",
       " 18043,\n",
       " 50811,\n",
       " 67197,\n",
       " 34434,\n",
       " 99970,\n",
       " 18053,\n",
       " 83590,\n",
       " 34440,\n",
       " 1673,\n",
       " 50826,\n",
       " 1675,\n",
       " 1676,\n",
       " 34443,\n",
       " 67212,\n",
       " 67216,\n",
       " 18065,\n",
       " 18066,\n",
       " 18068,\n",
       " 18069,\n",
       " 50838,\n",
       " 50839,\n",
       " 34457,\n",
       " 50841,\n",
       " 1691,\n",
       " 1692,\n",
       " 83615,\n",
       " 1697,\n",
       " 50855,\n",
       " 34473,\n",
       " 67241,\n",
       " 50861,\n",
       " 34480,\n",
       " 67250,\n",
       " 18104,\n",
       " 18108,\n",
       " 83645,\n",
       " 18112,\n",
       " 18116,\n",
       " 67269,\n",
       " 1743,\n",
       " 83667,\n",
       " 67284,\n",
       " 50904,\n",
       " 83674,\n",
       " 50910,\n",
       " 34528,\n",
       " 34529,\n",
       " 18146,\n",
       " 50917,\n",
       " 83688,\n",
       " 50923,\n",
       " 1775,\n",
       " 18160,\n",
       " 18168,\n",
       " 34553,\n",
       " 67322,\n",
       " 67323,\n",
       " 18171,\n",
       " 50939,\n",
       " 67327,\n",
       " 50944,\n",
       " 50947,\n",
       " 1798,\n",
       " 83718,\n",
       " 34572,\n",
       " 1805,\n",
       " 83724,\n",
       " 34576,\n",
       " 67344,\n",
       " 83730,\n",
       " 67350,\n",
       " 1820,\n",
       " 1821,\n",
       " 34590,\n",
       " 50972,\n",
       " 1825,\n",
       " 50986,\n",
       " 50988,\n",
       " 34606,\n",
       " 1839,\n",
       " 67375,\n",
       " 50990,\n",
       " 50993,\n",
       " 67379,\n",
       " 83767,\n",
       " 51000,\n",
       " 1852,\n",
       " 34620,\n",
       " 83774,\n",
       " 18239,\n",
       " 1861,\n",
       " 18245,\n",
       " 1866,\n",
       " 67402,\n",
       " 1874,\n",
       " 83795,\n",
       " 83799,\n",
       " 67416,\n",
       " 18270,\n",
       " 51039,\n",
       " 83807,\n",
       " 83810,\n",
       " 51043,\n",
       " 51046,\n",
       " 67431,\n",
       " 51047,\n",
       " 1904,\n",
       " 67441,\n",
       " 18289,\n",
       " 34687,\n",
       " 1920,\n",
       " 34689,\n",
       " 51071,\n",
       " 51074,\n",
       " 1919,\n",
       " 83846,\n",
       " 18314,\n",
       " 51084,\n",
       " 34701,\n",
       " 18319,\n",
       " 34708,\n",
       " 34710,\n",
       " 18326,\n",
       " 51095,\n",
       " 67492,\n",
       " 67493,\n",
       " 51108,\n",
       " 1959,\n",
       " 34725,\n",
       " 18348,\n",
       " 67500,\n",
       " 83887,\n",
       " 1970,\n",
       " 83895,\n",
       " 34744,\n",
       " 51128,\n",
       " 83898,\n",
       " 83902,\n",
       " 34753,\n",
       " 83906,\n",
       " 51140,\n",
       " 18373,\n",
       " 67531,\n",
       " 18384,\n",
       " 2004,\n",
       " 51159,\n",
       " 2008,\n",
       " 2009,\n",
       " 18397,\n",
       " 18401,\n",
       " 83940,\n",
       " 2023,\n",
       " 67560,\n",
       " 34793,\n",
       " 67562,\n",
       " 34795,\n",
       " 51175,\n",
       " 83951,\n",
       " 51184,\n",
       " 83953,\n",
       " 51185,\n",
       " ...}"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inverted_list_new[\"发货\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 将新的倒排表保存在文件data/retrieve/invertedList.pkl中\n",
    "with open('data/retrieve/invertedList.pkl','wb') as f:\n",
    "    pickle.dump(inverted_list_new,f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "以下为测试,完成上述过程之后,可以运行以下的代码来测试准确性。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "#这一格的内容是从preprocessor.ipynb中粘贴而来,包含了数据预处理的几个关键函数\n",
    "import emoji\n",
    "import re\n",
    "import jieba\n",
    "def clean(content):\n",
    "    content = emoji.demojize(content)\n",
    "    content = re.sub('<.*>','',content)\n",
    "    return content\n",
    "#这一函数是用于对句子进行分词,在preprocessor.ipynb中由于数据是已经分好词的,所以我们并没有进行这一步骤,但是对于一个新的问句,这一步是必不可少的\n",
    "def question_cut(content):\n",
    "    return list(jieba.cut(content))\n",
    "def strip(wordList):\n",
    "    return [word.strip() for word in wordList if word.strip()!='']\n",
    "with open(\"data/stopWord.json\",\"r\") as f:\n",
    "    stopWords = f.read().split(\"\\n\")\n",
    "def rm_stop_word(wordList):\n",
    "    return [word for word in wordList if word not in stopWords]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 从data/retrieve/invertedList.pkl加载倒排表并将其保存在变量invertedList中\n",
    "with open('data/retrieve/invertedList.pkl','rb') as f:\n",
    "    invertedList = pickle.load(f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_retrieve_result(sentence):\n",
    "    '''\n",
    "        输入一个句子sentence,根据倒排表进行快速检索,返回与该句子较相近的一些候选问题的index\n",
    "        候选问题由包含该句子中任一单词或包含与该句子中任一单词意思相近的单词的问题索引组成\n",
    "    '''\n",
    "    sentence = clean(sentence)\n",
    "    sentence = question_cut(sentence)\n",
    "    sentence = strip(sentence)\n",
    "    sentence = rm_stop_word(sentence)\n",
    "    candidate = set()\n",
    "    for word in sentence:\n",
    "        if word in invertedList:\n",
    "            candidate = candidate | invertedList[word]\n",
    "    return candidate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{81920,\n",
       " 16386,\n",
       " 65541,\n",
       " 5,\n",
       " 81927,\n",
       " 32776,\n",
       " 81930,\n",
       " 81935,\n",
       " 17,\n",
       " 18,\n",
       " 65554,\n",
       " 16401,\n",
       " 98331,\n",
       " 81947,\n",
       " 29,\n",
       " 65566,\n",
       " 32800,\n",
       " 81953,\n",
       " 32803,\n",
       " 98339,\n",
       " 81959,\n",
       " 32810,\n",
       " 98346,\n",
       " 49194,\n",
       " 32818,\n",
       " 16435,\n",
       " 55,\n",
       " 49209,\n",
       " 98366,\n",
       " 64,\n",
       " 49219,\n",
       " 65604,\n",
       " 81988,\n",
       " 16458,\n",
       " 65611,\n",
       " 81995,\n",
       " 81998,\n",
       " 16463,\n",
       " 16464,\n",
       " 49233,\n",
       " 32850,\n",
       " 98387,\n",
       " 98386,\n",
       " 49234,\n",
       " 86,\n",
       " 81999,\n",
       " 82004,\n",
       " 32859,\n",
       " 16475,\n",
       " 49245,\n",
       " 98398,\n",
       " 65631,\n",
       " 65630,\n",
       " 82015,\n",
       " 102,\n",
       " 65639,\n",
       " 65640,\n",
       " 49259,\n",
       " 65646,\n",
       " 98415,\n",
       " 98416,\n",
       " 49263,\n",
       " 65650,\n",
       " 16495,\n",
       " 49267,\n",
       " 16500,\n",
       " 118,\n",
       " 82035,\n",
       " 65656,\n",
       " 16503,\n",
       " 122,\n",
       " 65659,\n",
       " 65660,\n",
       " 125,\n",
       " 32894,\n",
       " 49275,\n",
       " 133,\n",
       " 65669,\n",
       " 65670,\n",
       " 65671,\n",
       " 32902,\n",
       " 139,\n",
       " 49293,\n",
       " 142,\n",
       " 65679,\n",
       " 32912,\n",
       " 98451,\n",
       " 150,\n",
       " 151,\n",
       " 32929,\n",
       " 49318,\n",
       " 49320,\n",
       " 65708,\n",
       " 82092,\n",
       " 82093,\n",
       " 65711,\n",
       " 98484,\n",
       " 98489,\n",
       " 49337,\n",
       " 187,\n",
       " 49340,\n",
       " 32957,\n",
       " 200,\n",
       " 16588,\n",
       " 32973,\n",
       " 65742,\n",
       " 16589,\n",
       " 98518,\n",
       " 16598,\n",
       " 49366,\n",
       " 82137,\n",
       " 65755,\n",
       " 220,\n",
       " 223,\n",
       " 65764,\n",
       " 82149,\n",
       " 82155,\n",
       " 16621,\n",
       " 49396,\n",
       " 65783,\n",
       " 33017,\n",
       " 65786,\n",
       " 254,\n",
       " 65790,\n",
       " 82176,\n",
       " 261,\n",
       " 65798,\n",
       " 65806,\n",
       " 65810,\n",
       " 275,\n",
       " 279,\n",
       " 98586,\n",
       " 82208,\n",
       " 49442,\n",
       " 65833,\n",
       " 33068,\n",
       " 82220,\n",
       " 65838,\n",
       " 82221,\n",
       " 82224,\n",
       " 33073,\n",
       " 65843,\n",
       " 65844,\n",
       " 16691,\n",
       " 310,\n",
       " 16696,\n",
       " 82234,\n",
       " 65852,\n",
       " 49468,\n",
       " 318,\n",
       " 49471,\n",
       " 49472,\n",
       " 16706,\n",
       " 49475,\n",
       " 65862,\n",
       " 65863,\n",
       " 16712,\n",
       " 82248,\n",
       " 65866,\n",
       " 49483,\n",
       " 49493,\n",
       " 16726,\n",
       " 344,\n",
       " 16730,\n",
       " 65883,\n",
       " 82268,\n",
       " 65885,\n",
       " 350,\n",
       " 33120,\n",
       " 16745,\n",
       " 364,\n",
       " 98668,\n",
       " 65903,\n",
       " 33140,\n",
       " 98678,\n",
       " 65913,\n",
       " 33149,\n",
       " 49535,\n",
       " 33158,\n",
       " 49544,\n",
       " 82313,\n",
       " 33163,\n",
       " 16783,\n",
       " 33168,\n",
       " 401,\n",
       " 82322,\n",
       " 98709,\n",
       " 49558,\n",
       " 98715,\n",
       " 16796,\n",
       " 49565,\n",
       " 82333,\n",
       " 82334,\n",
       " 33189,\n",
       " 33191,\n",
       " 65960,\n",
       " 33193,\n",
       " 49579,\n",
       " 16812,\n",
       " 98740,\n",
       " 65973,\n",
       " 16822,\n",
       " 82359,\n",
       " 16825,\n",
       " 33210,\n",
       " 16827,\n",
       " 82365,\n",
       " 98751,\n",
       " 16832,\n",
       " 98754,\n",
       " 33220,\n",
       " 453,\n",
       " 49604,\n",
       " 98763,\n",
       " 461,\n",
       " 469,\n",
       " 49623,\n",
       " 16856,\n",
       " 33244,\n",
       " 49629,\n",
       " 16867,\n",
       " 16869,\n",
       " 98794,\n",
       " 82410,\n",
       " 82412,\n",
       " 495,\n",
       " 16882,\n",
       " 98803,\n",
       " 49651,\n",
       " 49656,\n",
       " 33273,\n",
       " 16889,\n",
       " 507,\n",
       " 33276,\n",
       " 82426,\n",
       " 66046,\n",
       " 49658,\n",
       " 16895,\n",
       " 49669,\n",
       " 82437,\n",
       " 33290,\n",
       " 49674,\n",
       " 16911,\n",
       " 530,\n",
       " 33300,\n",
       " 66069,\n",
       " 16918,\n",
       " 16922,\n",
       " 66077,\n",
       " 542,\n",
       " 543,\n",
       " 82463,\n",
       " 66081,\n",
       " 16932,\n",
       " 66091,\n",
       " 556,\n",
       " 66093,\n",
       " 66094,\n",
       " 98860,\n",
       " 33324,\n",
       " 66097,\n",
       " 82475,\n",
       " 49709,\n",
       " 16942,\n",
       " 49715,\n",
       " 49716,\n",
       " 66106,\n",
       " 98877,\n",
       " 66111,\n",
       " 49729,\n",
       " 33346,\n",
       " 579,\n",
       " 33347,\n",
       " 82499,\n",
       " 49735,\n",
       " 16967,\n",
       " 49739,\n",
       " 588,\n",
       " 16983,\n",
       " 82522,\n",
       " 16991,\n",
       " 82528,\n",
       " 49761,\n",
       " 49762,\n",
       " 66151,\n",
       " 66152,\n",
       " 66153,\n",
       " 17003,\n",
       " 49777,\n",
       " 98932,\n",
       " 17012,\n",
       " 66166,\n",
       " 17020,\n",
       " 17021,\n",
       " 639,\n",
       " 640,\n",
       " 49793,\n",
       " 642,\n",
       " 17026,\n",
       " 98948,\n",
       " 17027,\n",
       " 82563,\n",
       " 49796,\n",
       " 17030,\n",
       " 66185,\n",
       " 82566,\n",
       " 651,\n",
       " 33422,\n",
       " 82576,\n",
       " 98962,\n",
       " 33427,\n",
       " 17043,\n",
       " 49812,\n",
       " 82584,\n",
       " 98970,\n",
       " 17050,\n",
       " 17052,\n",
       " 66207,\n",
       " 82592,\n",
       " 673,\n",
       " 33441,\n",
       " 17057,\n",
       " 17061,\n",
       " 33446,\n",
       " 49831,\n",
       " 66221,\n",
       " 33453,\n",
       " 687,\n",
       " 66224,\n",
       " 82605,\n",
       " 49841,\n",
       " 33459,\n",
       " 66228,\n",
       " 17076,\n",
       " 694,\n",
       " 33464,\n",
       " 33466,\n",
       " 33468,\n",
       " 99005,\n",
       " 66238,\n",
       " 99007,\n",
       " 99006,\n",
       " 82628,\n",
       " 710,\n",
       " 711,\n",
       " 82630,\n",
       " 99017,\n",
       " 82632,\n",
       " 718,\n",
       " 82639,\n",
       " 49872,\n",
       " 82645,\n",
       " 82652,\n",
       " 49889,\n",
       " 82658,\n",
       " 99044,\n",
       " 743,\n",
       " 17127,\n",
       " 33513,\n",
       " 49897,\n",
       " 17132,\n",
       " 749,\n",
       " 49901,\n",
       " 49903,\n",
       " 82674,\n",
       " 33523,\n",
       " 17142,\n",
       " 759,\n",
       " 33527,\n",
       " 17144,\n",
       " 66299,\n",
       " 99068,\n",
       " 33535,\n",
       " 17152,\n",
       " 769,\n",
       " 33539,\n",
       " 66307,\n",
       " 66309,\n",
       " 82694,\n",
       " 17162,\n",
       " 99084,\n",
       " 49932,\n",
       " 783,\n",
       " 17167,\n",
       " 66321,\n",
       " 99090,\n",
       " 66323,\n",
       " 49935,\n",
       " 82713,\n",
       " 99098,\n",
       " 66333,\n",
       " 66334,\n",
       " 82719,\n",
       " 800,\n",
       " 66337,\n",
       " 17185,\n",
       " 49954,\n",
       " 17190,\n",
       " 33576,\n",
       " 49962,\n",
       " 17196,\n",
       " 99120,\n",
       " 99122,\n",
       " 33587,\n",
       " 49972,\n",
       " 49974,\n",
       " 17207,\n",
       " 33592,\n",
       " 33593,\n",
       " 33600,\n",
       " 33604,\n",
       " 99140,\n",
       " 82757,\n",
       " 841,\n",
       " 49994,\n",
       " 845,\n",
       " 99149,\n",
       " 17235,\n",
       " 82771,\n",
       " 66389,\n",
       " 854,\n",
       " 82775,\n",
       " 33624,\n",
       " 33625,\n",
       " 858,\n",
       " 66395,\n",
       " 99163,\n",
       " 33629,\n",
       " 17245,\n",
       " 82782,\n",
       " 17249,\n",
       " 17251,\n",
       " 82787,\n",
       " 99173,\n",
       " 33638,\n",
       " 17255,\n",
       " 874,\n",
       " 66411,\n",
       " 82794,\n",
       " 17259,\n",
       " 878,\n",
       " 33647,\n",
       " 66415,\n",
       " 66417,\n",
       " 33650,\n",
       " 66418,\n",
       " 99182,\n",
       " 33652,\n",
       " 82798,\n",
       " 891,\n",
       " 17275,\n",
       " 82812,\n",
       " 33662,\n",
       " 82813,\n",
       " 17283,\n",
       " 66436,\n",
       " 901,\n",
       " 99208,\n",
       " 66441,\n",
       " 33674,\n",
       " 33675,\n",
       " 17291,\n",
       " 82829,\n",
       " 33682,\n",
       " 916,\n",
       " 66452,\n",
       " 33688,\n",
       " 17304,\n",
       " 33691,\n",
       " 33692,\n",
       " 66461,\n",
       " 66463,\n",
       " 82847,\n",
       " 33697,\n",
       " 99234,\n",
       " 931,\n",
       " 82848,\n",
       " 17317,\n",
       " 938,\n",
       " 17324,\n",
       " 82863,\n",
       " 944,\n",
       " 66480,\n",
       " 33716,\n",
       " 99254,\n",
       " 82878,\n",
       " 959,\n",
       " 33728,\n",
       " 99267,\n",
       " 33736,\n",
       " 33741,\n",
       " 17359,\n",
       " 978,\n",
       " 17364,\n",
       " 33750,\n",
       " 82905,\n",
       " 17371,\n",
       " 992,\n",
       " 17378,\n",
       " 33765,\n",
       " 99301,\n",
       " 1000,\n",
       " 50152,\n",
       " 50155,\n",
       " 1005,\n",
       " 99309,\n",
       " 66542,\n",
       " 50158,\n",
       " 82928,\n",
       " 33780,\n",
       " 99318,\n",
       " 1017,\n",
       " 33787,\n",
       " 66557,\n",
       " 1024,\n",
       " 17408,\n",
       " 17409,\n",
       " 1027,\n",
       " 50179,\n",
       " 82949,\n",
       " 82950,\n",
       " 1034,\n",
       " 50187,\n",
       " 1036,\n",
       " 50191,\n",
       " 66577,\n",
       " 66580,\n",
       " 99351,\n",
       " 82969,\n",
       " 82970,\n",
       " 17437,\n",
       " 99360,\n",
       " 1057,\n",
       " 33825,\n",
       " 99363,\n",
       " 82977,\n",
       " 82979,\n",
       " 33831,\n",
       " 66600,\n",
       " 17447,\n",
       " 17448,\n",
       " 33837,\n",
       " 50221,\n",
       " 66607,\n",
       " 99376,\n",
       " 33841,\n",
       " 82993,\n",
       " 99380,\n",
       " 66613,\n",
       " 82996,\n",
       " 82998,\n",
       " 33849,\n",
       " 83003,\n",
       " 66622,\n",
       " 1087,\n",
       " 1088,\n",
       " 99391,\n",
       " 17473,\n",
       " 83011,\n",
       " 50245,\n",
       " 1095,\n",
       " 1098,\n",
       " 66637,\n",
       " 1102,\n",
       " 99405,\n",
       " 50256,\n",
       " 99410,\n",
       " 66644,\n",
       " 99413,\n",
       " 66646,\n",
       " 99419,\n",
       " 66652,\n",
       " 83037,\n",
       " 1118,\n",
       " 83040,\n",
       " 99431,\n",
       " 66663,\n",
       " 1129,\n",
       " 33900,\n",
       " 17516,\n",
       " 66670,\n",
       " 83052,\n",
       " 50290,\n",
       " 17524,\n",
       " 33912,\n",
       " 83065,\n",
       " 33914,\n",
       " 66685,\n",
       " 83071,\n",
       " 17537,\n",
       " 50309,\n",
       " 33927,\n",
       " 50314,\n",
       " 99467,\n",
       " 99470,\n",
       " 50318,\n",
       " 99472,\n",
       " 66705,\n",
       " 83088,\n",
       " 66708,\n",
       " 66709,\n",
       " 50327,\n",
       " 99484,\n",
       " 99485,\n",
       " 83101,\n",
       " 99487,\n",
       " 66721,\n",
       " 33954,\n",
       " 1187,\n",
       " 83105,\n",
       " 50337,\n",
       " 50341,\n",
       " 99495,\n",
       " 66728,\n",
       " 50345,\n",
       " 50346,\n",
       " 17578,\n",
       " 66732,\n",
       " 83116,\n",
       " 66734,\n",
       " 50351,\n",
       " 50356,\n",
       " 99510,\n",
       " 17591,\n",
       " 50363,\n",
       " 99517,\n",
       " 66751,\n",
       " 17599,\n",
       " 1223,\n",
       " 33992,\n",
       " 99528,\n",
       " 17607,\n",
       " 50376,\n",
       " 50379,\n",
       " 17616,\n",
       " 99537,\n",
       " 66770,\n",
       " 99539,\n",
       " 1237,\n",
       " 1242,\n",
       " 66779,\n",
       " 66780,\n",
       " 17628,\n",
       " 1247,\n",
       " 17631,\n",
       " 83168,\n",
       " 50402,\n",
       " 99555,\n",
       " 50405,\n",
       " 34024,\n",
       " 50409,\n",
       " 50411,\n",
       " 1265,\n",
       " 34034,\n",
       " 83186,\n",
       " 17651,\n",
       " 50419,\n",
       " 1270,\n",
       " 34042,\n",
       " 99580,\n",
       " 1277,\n",
       " 66814,\n",
       " 99583,\n",
       " 83200,\n",
       " 17671,\n",
       " 50441,\n",
       " 17677,\n",
       " 34064,\n",
       " 1297,\n",
       " 66834,\n",
       " 50448,\n",
       " 50449,\n",
       " 50456,\n",
       " 66844,\n",
       " 83229,\n",
       " 83231,\n",
       " 1312,\n",
       " 66849,\n",
       " 50464,\n",
       " 17698,\n",
       " 17699,\n",
       " 83236,\n",
       " 17702,\n",
       " 1326,\n",
       " 99632,\n",
       " 17713,\n",
       " 66866,\n",
       " 34099,\n",
       " 83252,\n",
       " 99637,\n",
       " 1334,\n",
       " 66871,\n",
       " 83258,\n",
       " 66877,\n",
       " 1342,\n",
       " 50493,\n",
       " 17728,\n",
       " 1345,\n",
       " 99651,\n",
       " 66887,\n",
       " 1352,\n",
       " 50503,\n",
       " 1354,\n",
       " 17737,\n",
       " 50508,\n",
       " 66895,\n",
       " 99671,\n",
       " 66909,\n",
       " 34143,\n",
       " 99681,\n",
       " 83300,\n",
       " 17770,\n",
       " 1392,\n",
       " 66929,\n",
       " 34162,\n",
       " 17777,\n",
       " 1401,\n",
       " 1402,\n",
       " 34169,\n",
       " 50554,\n",
       " 99709,\n",
       " 17788,\n",
       " 1407,\n",
       " 66944,\n",
       " 66945,\n",
       " 50557,\n",
       " 1412,\n",
       " 1415,\n",
       " 34184,\n",
       " 1420,\n",
       " 99729,\n",
       " 1426,\n",
       " 34196,\n",
       " 83351,\n",
       " 66971,\n",
       " 1436,\n",
       " 83357,\n",
       " 66974,\n",
       " 17821,\n",
       " 1443,\n",
       " 83363,\n",
       " 50598,\n",
       " 34217,\n",
       " 66986,\n",
       " 83369,\n",
       " 1452,\n",
       " 66993,\n",
       " 50609,\n",
       " 1459,\n",
       " 34227,\n",
       " 66996,\n",
       " 83385,\n",
       " 50618,\n",
       " 50626,\n",
       " 50627,\n",
       " 83406,\n",
       " 83407,\n",
       " 34258,\n",
       " 50643,\n",
       " 83412,\n",
       " 99797,\n",
       " 50647,\n",
       " 99803,\n",
       " 50652,\n",
       " 50654,\n",
       " 1508,\n",
       " 50661,\n",
       " 99814,\n",
       " 1512,\n",
       " 1515,\n",
       " 50667,\n",
       " 67053,\n",
       " 99821,\n",
       " 17901,\n",
       " 99824,\n",
       " 99825,\n",
       " 83438,\n",
       " 50672,\n",
       " 34292,\n",
       " 83441,\n",
       " 50677,\n",
       " 17910,\n",
       " 83446,\n",
       " 83447,\n",
       " 83448,\n",
       " 17916,\n",
       " 17918,\n",
       " 34303,\n",
       " 67071,\n",
       " 34307,\n",
       " 1540,\n",
       " 67077,\n",
       " 34310,\n",
       " 67079,\n",
       " 83460,\n",
       " 99854,\n",
       " 83473,\n",
       " 67092,\n",
       " 67094,\n",
       " 99862,\n",
       " 67100,\n",
       " 1565,\n",
       " 34333,\n",
       " 1567,\n",
       " 34338,\n",
       " 99877,\n",
       " 1579,\n",
       " 83502,\n",
       " 83504,\n",
       " 1585,\n",
       " 67127,\n",
       " 99897,\n",
       " 50753,\n",
       " 83521,\n",
       " 1603,\n",
       " 1604,\n",
       " 17991,\n",
       " 17992,\n",
       " 50760,\n",
       " 83527,\n",
       " 67147,\n",
       " 17996,\n",
       " 67152,\n",
       " 34388,\n",
       " 67156,\n",
       " 99924,\n",
       " 1623,\n",
       " 34395,\n",
       " 99931,\n",
       " 18016,\n",
       " 1635,\n",
       " 99946,\n",
       " 67180,\n",
       " 99949,\n",
       " 99954,\n",
       " 34419,\n",
       " 83571,\n",
       " 99957,\n",
       " 83572,\n",
       " 83573,\n",
       " 18043,\n",
       " 50811,\n",
       " 67197,\n",
       " 34434,\n",
       " 99970,\n",
       " 18053,\n",
       " 83590,\n",
       " 34440,\n",
       " 1673,\n",
       " 50826,\n",
       " 1675,\n",
       " 1676,\n",
       " 34443,\n",
       " 67212,\n",
       " 67216,\n",
       " 18065,\n",
       " 18066,\n",
       " 18068,\n",
       " 18069,\n",
       " 50838,\n",
       " 50839,\n",
       " 34457,\n",
       " 50841,\n",
       " 1691,\n",
       " 1692,\n",
       " 83615,\n",
       " 1697,\n",
       " 50855,\n",
       " 34473,\n",
       " 67241,\n",
       " 50861,\n",
       " 34480,\n",
       " 67250,\n",
       " 18104,\n",
       " 18108,\n",
       " 83645,\n",
       " 18112,\n",
       " 18116,\n",
       " 67269,\n",
       " 1743,\n",
       " 83667,\n",
       " 67284,\n",
       " 50904,\n",
       " 83674,\n",
       " 50910,\n",
       " 34528,\n",
       " 34529,\n",
       " 18146,\n",
       " 50917,\n",
       " 83688,\n",
       " 50923,\n",
       " 1775,\n",
       " 18160,\n",
       " 18168,\n",
       " 34553,\n",
       " 67322,\n",
       " 67323,\n",
       " 18171,\n",
       " 50939,\n",
       " 67327,\n",
       " 50944,\n",
       " 50947,\n",
       " 1798,\n",
       " 83718,\n",
       " 34572,\n",
       " 1805,\n",
       " 83724,\n",
       " 34576,\n",
       " 67344,\n",
       " 83730,\n",
       " 67350,\n",
       " 1820,\n",
       " 1821,\n",
       " 34590,\n",
       " 50972,\n",
       " 1825,\n",
       " 50986,\n",
       " 50988,\n",
       " 34606,\n",
       " 1839,\n",
       " 67375,\n",
       " 50990,\n",
       " 50993,\n",
       " 67379,\n",
       " 83767,\n",
       " 51000,\n",
       " 1852,\n",
       " 34620,\n",
       " 83774,\n",
       " 18239,\n",
       " 1861,\n",
       " 18245,\n",
       " 1866,\n",
       " 67402,\n",
       " 1874,\n",
       " 83795,\n",
       " 83799,\n",
       " 67416,\n",
       " 18270,\n",
       " 51039,\n",
       " 83807,\n",
       " 83810,\n",
       " 51043,\n",
       " 51046,\n",
       " 67431,\n",
       " 51047,\n",
       " 1904,\n",
       " 67441,\n",
       " 18289,\n",
       " 34687,\n",
       " 1920,\n",
       " 34689,\n",
       " 1919,\n",
       " 51071,\n",
       " 51074,\n",
       " 83846,\n",
       " 18314,\n",
       " 51084,\n",
       " 34701,\n",
       " 18319,\n",
       " 34708,\n",
       " 34710,\n",
       " 18326,\n",
       " 51095,\n",
       " 67492,\n",
       " 67493,\n",
       " 34725,\n",
       " 1959,\n",
       " 51108,\n",
       " 67500,\n",
       " 18348,\n",
       " 83887,\n",
       " 1970,\n",
       " 83895,\n",
       " 34744,\n",
       " 51128,\n",
       " 83898,\n",
       " 83902,\n",
       " 34753,\n",
       " 83906,\n",
       " 51140,\n",
       " 18373,\n",
       " 67531,\n",
       " 18384,\n",
       " 2004,\n",
       " 51159,\n",
       " 2008,\n",
       " 2009,\n",
       " 18397,\n",
       " 18401,\n",
       " 83940,\n",
       " 2023,\n",
       " 67560,\n",
       " 34793,\n",
       " 67562,\n",
       " 34795,\n",
       " 51175,\n",
       " 83951,\n",
       " 51184,\n",
       " 83953,\n",
       " ...}"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "get_retrieve_result('什么时候发货')  # Query means 'when will it ship'; returns the matching document IDs via the inverted index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:greedyaiqa] *",
   "language": "python",
   "name": "conda-env-greedyaiqa-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}