ml2_MiniAssignments · Commit dd74ebf8 · authored Mar 31, 2020 by 20200318029 · 20200331
Showing 1 changed file with 541 additions and 0 deletions
homework1/MiniHomework#1.ipynb · 0 → 100644 (+541, -0)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Optimization in Machine Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 1 (30 points)\n",
"Suppose we have training data $D=\\{(\\mathbf{x}_1,y_1),...,(\\mathbf{x}_n,y_n)\\}$, where each $(\\mathbf{x}_i,y_i)$ is a sample, $\\mathbf{x}_i$ is the feature vector with $\\mathbf{x}_i\\in \\mathcal{R}^D$, and $y_i$ is the sample's label, taking the value $0$ or $1$. In logistic regression the model parameters are $(\\mathbf{w},b)$; vectors are written in bold. To simplify the derivations below, we can absorb $b$ into $\\mathbf{w}$: the parameter becomes $w=(w_0, w_1, .., w_D)$, i.e. with an extra leading term $w_0$ that plays the role of $b$, and each $x_i$ is correspondingly rewritten as $x_i = [1, x_i]$, with a leading $1$. A moment's thought should show why this is valid.\n",
"\n",
"Answer the questions below, using the LaTeX support built into Markdown."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### (a) ```Write down the objective function of logistic regression```\n",
"Write out the objective function, i.e. the quantity we need to \"minimize\" (also called the loss function); regularization is not required. Express it in minimization form, and convert the $\\prod_{}^{}$ into $\\log \\sum_{}^{}$ form.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\mathcal{L} (\\mathbf{w}) = - \n",
"\\sum_{i = 1}^{n} \\left[\n",
" y_{i} \\log \\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i}) +\n",
" (1 - y_{i}) \\log [1 - \\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i})]\n",
"\\right]$$"
]
},
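{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick numeric sanity check of the objective above (added for illustration; the tiny dataset and helper names below are made up): evaluate $\\mathcal{L}(\\mathbf{w})$ directly from the formula."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: evaluate the logistic-regression loss on a tiny synthetic dataset.\n",
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"def logistic_loss(w, X, y):\n",
"    # L(w) = -sum_i [ y_i log s_i + (1 - y_i) log(1 - s_i) ], with s_i = sigma(w^T x_i)\n",
"    s = sigmoid(X @ w)\n",
"    return -np.sum(y * np.log(s) + (1 - y) * np.log(1 - s))\n",
"\n",
"X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]])  # leading 1 absorbs the bias b\n",
"y = np.array([1.0, 0.0, 1.0])\n",
"print(logistic_loss(np.zeros(2), X, y))  # at w = 0 every s_i = 0.5, so the loss is 3 log 2"
]
},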
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### (b) ```Derive the first-order gradient with respect to w```\n",
"To perform gradient descent we need to differentiate with respect to the parameter $w$; compute the gradient of $L(w)$ with respect to $w$:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\begin{aligned}\n",
"\\frac{\\partial \\mathcal{L} (\\mathbf{w})}{\\partial \\mathbf{w}}\n",
"& = - \\frac{\n",
" \\partial \\sum_{i = 1}^{n} \\left[\n",
" y_{i} \\log \\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i}) + \n",
" (1 - y_{i}) \\log [1 - \\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i})]\n",
" \\right]\n",
"}\n",
"{\n",
" \\partial \\mathbf{w}\n",
"} \\\\\n",
"& = - \\sum_{i = 1}^{n} \\left[\n",
" y_{i} \\frac{\n",
" \\partial \\log \\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i})\n",
" }\n",
" {\n",
" \\partial \\mathbf{w}\n",
" } + \n",
" (1 - y_{i}) \\frac{\n",
" \\partial \\log [1 - \\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i})]\n",
" }\n",
" {\n",
" \\partial \\mathbf{w}\n",
" }\n",
"\\right] \\\\\n",
"& = - \\sum_{i = 1}^{n} \\left[\n",
" y_{i} \\left( 1 - \\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i} ) \\right) \\mathbf{x}_{i} - \n",
" (1 - y_{i}) \\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i} ) \\mathbf{x}_{i}\n",
"\\right] \\\\\n",
"& = \\sum_{i = 1}^{n} \\left[\n",
" \\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i}) - y_{i}\n",
"\\right] \\mathbf{x}_{i} \\\\\n",
"\\end{aligned}$$"
]
},
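{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check (added for illustration, not required by the assignment), the closed-form gradient $\\sum_{i} [\\sigma(\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i}) - y_{i}] \\mathbf{x}_{i}$ can be compared against a central finite difference of the loss."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: verify the gradient formula against central finite differences.\n",
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"def logistic_loss(w, X, y):\n",
"    s = sigmoid(X @ w)\n",
"    return -np.sum(y * np.log(s) + (1 - y) * np.log(1 - s))\n",
"\n",
"def logistic_grad(w, X, y):\n",
"    # gradient: sum_i (sigma(w^T x_i) - y_i) x_i, written as X^T (sigma(X w) - y)\n",
"    return X.T @ (sigmoid(X @ w) - y)\n",
"\n",
"rng = np.random.default_rng(0)\n",
"X = rng.normal(size=(5, 3))\n",
"y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])\n",
"w = rng.normal(size=3)\n",
"eps = 1e-5\n",
"num = np.array([(logistic_loss(w + eps * e, X, y) - logistic_loss(w - eps * e, X, y)) / (2 * eps) for e in np.eye(3)])\n",
"print(np.max(np.abs(num - logistic_grad(w, X, y))))  # should be tiny"
]
},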
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### (c) ```Derive the second-order derivative with respect to w```\n",
"Starting from the result above, differentiate with respect to $w$ once more. This step calls for a bit of linear algebra. References: the Matrix Cookbook, https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf, and the Hessian matrix."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\begin{aligned}\n",
"\\frac{\\partial^{2} \\mathcal{L}(\\mathbf{w})}{\\partial^{2} \\mathbf{w}}\n",
"& = \\frac{\n",
" \\partial \\sum_{i = 1}^{n} \\left[\n",
" \\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i}) - y_{i}\n",
" \\right] \\mathbf{x}_{i}\n",
"}\n",
"{\n",
" \\partial \\mathbf{w}\n",
"} \\\\\n",
"& = \\sum_{i = 1}^{n} \\frac{\n",
" \\partial \\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i})\n",
"}\n",
"{\n",
" \\partial \\mathbf{w}\n",
"} \\mathbf{x}_{i}^{\\text{T}} \\\\\n",
"& = \\sum_{i = 1}^{n}\n",
"\\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i})\n",
"\\left( 1 - \\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i}) \\right)\n",
"\\mathbf{x}_{i} \\mathbf{x}_{i}^{\\text{T}} \\\\\n",
"\\end{aligned}$$"
]
},
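{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Hessian formula above can likewise be checked numerically (an illustrative sketch; the data is synthetic): compare $\\sum_{i} \\sigma_{i} (1 - \\sigma_{i}) \\mathbf{x}_{i} \\mathbf{x}_{i}^{\\text{T}}$ against finite differences of the gradient."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: check the Hessian sum_i s_i (1 - s_i) x_i x_i^T via finite differences of the gradient.\n",
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"def logistic_grad(w, X, y):\n",
"    return X.T @ (sigmoid(X @ w) - y)\n",
"\n",
"def logistic_hess(w, X):\n",
"    # note the Hessian does not depend on y\n",
"    s = sigmoid(X @ w)\n",
"    return (X * (s * (1 - s))[:, None]).T @ X\n",
"\n",
"rng = np.random.default_rng(0)\n",
"X = rng.normal(size=(6, 3))\n",
"y = (rng.uniform(size=6) < 0.5).astype(float)\n",
"w = rng.normal(size=3)\n",
"eps = 1e-5\n",
"H_num = np.column_stack([(logistic_grad(w + eps * e, X, y) - logistic_grad(w - eps * e, X, y)) / (2 * eps) for e in np.eye(3)])\n",
"print(np.max(np.abs(H_num - logistic_hess(w, X))))  # should be tiny"
]
},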
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### (d) ```Prove that the logistic-regression objective is convex```\n",
"Try to prove that the logistic-regression objective is a convex function. If a function is convex, any local optimum is also a global optimum, so whenever we find an optimum by, say, stochastic gradient descent, we can be sure it is the global one. There are many ways to prove convexity; here we use one based on the second derivative being nonnegative. For example, given $f(x)=x^2-3x+3$, differentiating twice gives $f''(x)=2 > 0$, so the function is convex. The same idea carries over to functions of several variables: it suffices to show that the second derivative is positive semidefinite. The result of part (c) is a matrix. To show that this matrix (call it $H$) is positive semidefinite, we must show that $v^{T}Hv \\geq 0$ for every nonzero vector $v\\in \\mathcal{R}^{D+1}$.\n",
"Write out the detailed derivation:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"// derivation\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{v}^{\\text{T}} \\frac{\\partial^{2} \\mathcal{L}(\\mathbf{w})}{\\partial^{2} \\mathbf{w}} \\mathbf{v}\n",
"& = \\sum_{i = 1}^{n}\n",
"\\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i})\n",
"\\left( 1 - \\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i}) \\right)\n",
"\\mathbf{v}^{\\text{T}} \\mathbf{x}_{i} \\mathbf{x}_{i}^{\\text{T}} \\mathbf{v}\\\\\n",
"\\end{aligned}$$\n",
"\n",
"Since $0 \\lt \\sigma (\\mathbf{w}^{\\text{T}} \\mathbf{x}_{i}) \\lt 1$, it suffices to show that $\\mathbf{v}^{\\text{T}} \\mathbf{x}_{i} \\mathbf{x}_{i}^{\\text{T}} \\mathbf{v} \\geq 0$:\n",
"\n",
"$$\\mathbf{v}^{\\text{T}} \\mathbf{x}_{i} \\mathbf{x}_{i}^{\\text{T}} \\mathbf{v}\n",
"= \\left( \\mathbf{v}^{\\text{T}} \\mathbf{x}_{i} \\right)^{2}\n",
"\\geq 0\n",
"\\Rightarrow\n",
"\\mathbf{v}^{\\text{T}} \\frac{\\partial^{2} \\mathcal{L}(\\mathbf{w})}{\\partial^{2} \\mathbf{w}} \\mathbf{v} \\geq 0\n",
"\\Rightarrow\n",
"\\frac{\\partial^{2} \\mathcal{L}(\\mathbf{w})}{\\partial^{2} \\mathbf{w}}\n",
"\\succeq 0\n",
"$$\n",
"\n",
"Hence $\\frac{\\partial^{2} \\mathcal{L}(\\mathbf{w})}{\\partial^{2} \\mathbf{w}}$ is positive semidefinite, and the logistic-regression objective is convex, as required."
]
},
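{
"cell_type": "markdown",
"metadata": {},
"source": [
"The positive-semidefiniteness argument can also be illustrated numerically (an illustrative sketch on synthetic data): the smallest eigenvalue of $H$ is nonnegative, and $\\mathbf{v}^{\\text{T}} H \\mathbf{v} \\geq 0$ for a random $\\mathbf{v}$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: numerically confirm that H = sum_i s_i (1 - s_i) x_i x_i^T is PSD.\n",
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"rng = np.random.default_rng(0)\n",
"X = rng.normal(size=(10, 4))\n",
"w = rng.normal(size=4)\n",
"s = sigmoid(X @ w)\n",
"H = (X * (s * (1 - s))[:, None]).T @ X\n",
"print(np.min(np.linalg.eigvalsh(H)))  # smallest eigenvalue, nonnegative up to roundoff\n",
"v = rng.normal(size=4)\n",
"print(v @ H @ v)  # v^T H v, nonnegative"
]
},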
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 2 (20 points)\n",
"Prove that the p-norm is a convex function. The p-norm is defined as:\n",
"$||x||_p=(\\sum_{i=1}^{n}|x_i|^p)^{1/p}$\n",
"\n",
"hint: Minkowski's Inequality"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"// TODO: your proof\n",
"\n",
"Minkowski's inequality: if $1 \\leq p \\lt \\infty$, then for all $\\mathbf{x}, \\mathbf{y} \\in \\mathcal{V}_{F}$, we have\n",
"\n",
"$$\\begin{aligned}\n",
"\\left\\| \\mathbf{x} + \\mathbf{y} \\right\\|_{p} \\leq \\left\\| \\mathbf{x} \\right\\|_{p} + \\left\\| \\mathbf{y} \\right\\|_{p}\n",
"\\end{aligned}$$\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Given two arbitrary points $\\mathbf{x}$ and $\\mathbf{y}$, and $0 \\leq \\alpha \\leq 1$, let $\\mathbf{z} = \\alpha \\mathbf{x} + (1 - \\alpha)\\mathbf{y}$. By Minkowski's inequality and absolute homogeneity ($\\| c \\mathbf{x} \\|_{p} = |c| \\, \\| \\mathbf{x} \\|_{p}$),\n",
"\n",
"$$\\begin{aligned}\n",
"\\| \\mathbf{z} \\|_{p} =\n",
"\\| \\alpha \\mathbf{x} + (1 - \\alpha) \\mathbf{y} \\|_{p} \\leq\n",
"\\| \\alpha \\mathbf{x} \\|_{p} + \\| (1 - \\alpha) \\mathbf{y} \\|_{p} =\n",
"\\alpha \\| \\mathbf{x} \\|_{p} + (1 - \\alpha) \\| \\mathbf{y} \\|_{p}\n",
"\\end{aligned}$$\n",
"\n",
"i.e. the p-norm is convex"
]
},
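{
"cell_type": "markdown",
"metadata": {},
"source": [
"The convexity inequality above can be spot-checked numerically (an illustrative sketch using `np.linalg.norm` on random vectors and several values of $p$)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: check ||a x + (1-a) y||_p <= a ||x||_p + (1-a) ||y||_p on random inputs.\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(1)\n",
"ok = True\n",
"for p in (1, 1.5, 2, 3):\n",
"    for _ in range(1000):\n",
"        x, y = rng.normal(size=4), rng.normal(size=4)\n",
"        a = rng.uniform()\n",
"        lhs = np.linalg.norm(a * x + (1 - a) * y, ord=p)\n",
"        rhs = a * np.linalg.norm(x, ord=p) + (1 - a) * np.linalg.norm(y, ord=p)\n",
"        ok = ok and lhs <= rhs + 1e-9\n",
"print(ok)"
]
},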
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 3 (20 points)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the first lecture we introduced a technique for testing convexity: second-order convexity. When a function f(x) is twice differentiable at every point, we have the following property: f(x) is convex if and only if the second derivative of f(x) is a PSD (positive semidefinite) matrix. Try to prove this result below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"// proof of second-order convexity\n",
"\n",
"1. sufficiency, to prove $\\nabla^{2} f \\succeq 0 \\Rightarrow f(\\mathbf{y}) \\geq f(\\mathbf{x}) + {\\nabla f(\\mathbf{x})}^{\\text{T}} (\\mathbf{y} - \\mathbf{x})$\n",
"\n",
"using the Taylor expansion at $\\mathbf{x}$ with the Lagrange remainder, let $\\mathbf{y} = \\mathbf{x} + \\Delta \\mathbf{x}$ and $0 \\lt \\theta \\lt 1$,\n",
"\n",
"$$\\begin{aligned}\n",
"f(\\mathbf{y})\n",
"& = f(\\mathbf{x} + \\Delta \\mathbf{x}) \\\\\n",
"& = f(\\mathbf{x}) + {\\nabla f(\\mathbf{x})}^{\\text{T}} \\Delta \\mathbf{x}\n",
"+ \\frac{1}{2} {\\Delta \\mathbf{x}}^{\\text{T}} {\\nabla^{2} f(\\mathbf{x} + \\theta \\Delta \\mathbf{x})} \\Delta \\mathbf{x} \\\\\n",
"& \\because \\nabla^{2} f(\\mathbf{z}) \\succeq 0 \\text{ for every } \\mathbf{z} \\\\\n",
"& \\therefore {\\Delta \\mathbf{x}}^{\\text{T}} {\\nabla^{2} f(\\mathbf{x} + \\theta \\Delta \\mathbf{x})} \\Delta \\mathbf{x} \\geq 0 \\\\\n",
"f(\\mathbf{y})\n",
"& \\geq f(\\mathbf{x}) + {\\nabla f(\\mathbf{x})}^{\\text{T}} \\Delta \\mathbf{x}\n",
"\\end{aligned}$$\n",
"\n",
"2. necessity, to prove $f(\\mathbf{y}) \\geq f(\\mathbf{x}) + {\\nabla f(\\mathbf{x})}^{\\text{T}} (\\mathbf{y} - \\mathbf{x}) \\Rightarrow \\nabla^{2} f(\\mathbf{x}) \\succeq 0$\n",
"\n",
"for any direction $\\mathbf{v}$ and $t \\gt 0$, apply the first-order inequality at $\\mathbf{y} = \\mathbf{x} + t \\mathbf{v}$ and at $\\mathbf{y} = \\mathbf{x} - t \\mathbf{v}$:\n",
"\n",
"$$\\begin{aligned}\n",
"f(\\mathbf{x} + t \\mathbf{v}) & \\geq f(\\mathbf{x}) + t {\\nabla f(\\mathbf{x})}^{\\text{T}} \\mathbf{v} \\\\\n",
"f(\\mathbf{x} - t \\mathbf{v}) & \\geq f(\\mathbf{x}) - t {\\nabla f(\\mathbf{x})}^{\\text{T}} \\mathbf{v}\n",
"\\end{aligned}$$\n",
"\n",
"adding the two inequalities gives $f(\\mathbf{x} + t \\mathbf{v}) + f(\\mathbf{x} - t \\mathbf{v}) - 2 f(\\mathbf{x}) \\geq 0$, hence\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{v}^{\\text{T}} \\nabla^{2} f(\\mathbf{x}) \\mathbf{v}\n",
"& = \\lim_{t \\rightarrow 0} \\frac{\n",
"    f(\\mathbf{x} + t \\mathbf{v}) + f(\\mathbf{x} - t \\mathbf{v}) - 2 f(\\mathbf{x})\n",
"}{\n",
"    t^{2}\n",
"} \\geq 0 \\\\\n",
"\\end{aligned}$$\n",
"\n",
"i.e. $\\nabla^{2} f(\\mathbf{x}) \\succeq 0$"
]
},
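{
"cell_type": "markdown",
"metadata": {},
"source": [
"The necessity argument can be illustrated numerically (an illustrative sketch): for a convex function, the symmetric second difference $f(\\mathbf{x} + t \\mathbf{v}) + f(\\mathbf{x} - t \\mathbf{v}) - 2 f(\\mathbf{x})$ is nonnegative. Here $f$ is taken to be log-sum-exp, a standard convex function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: symmetric second differences of a convex function are nonnegative.\n",
"import numpy as np\n",
"\n",
"def f(x):\n",
"    # log-sum-exp, a convex function of x\n",
"    return np.log(np.sum(np.exp(x)))\n",
"\n",
"rng = np.random.default_rng(2)\n",
"vals = []\n",
"for _ in range(1000):\n",
"    x, v = rng.normal(size=3), rng.normal(size=3)\n",
"    t = rng.uniform(0.01, 1.0)\n",
"    vals.append(f(x + t * v) + f(x - t * v) - 2 * f(x))\n",
"print(min(vals))  # nonnegative up to roundoff"
]
},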
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 4 (30 points)\n",
"In class we covered the transportation problem. Restating it: two cities, Beijing and Shanghai, hold 300 and 500 items of clothing respectively, while three cities, numbered 1, 2, and 3, need 200, 300, and 250 items respectively. The clothing must be shipped from Beijing and Shanghai to cities 1, 2, and 3. We assume shipping each item incurs a cost, namely:\n",
"- Beijing -> 1: 5\n",
"- Beijing -> 2: 6\n",
"- Beijing -> 3: 4\n",
"- Shanghai -> 1: 6\n",
"- Shanghai -> 2: 3\n",
"- Shanghai -> 3: 7\n",
"\n",
"The values above are unit costs.\n",
"\n",
"Question: we want to minimize the total cost."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```(a)``` Write out the linear programming formulation, using the standard form; matrix and vector notation is recommended."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"// your formulation\n",
"\n",
"$\\mathbf{x} = (x_{1}, x_{2}, \\dots, x_{6})$, where $x_{1}, x_{2}, x_{3}$ are the amounts shipped from Beijing to cities 1, 2, 3 and $x_{4}, x_{5}, x_{6}$ those from Shanghai; $\\mathbf{w} = (5, 6, 4, 6, 3, 7)$\n",
"\n",
"$$\\begin{aligned}\n",
"& \\min_{\\mathbf{x}} \\mathbf{w}^{\\text{T}} \\mathbf{x} \\\\\n",
"& s.t. \n",
"\\begin{cases}\n",
" x_{1} + x_{2} + x_{3} \\leq 300 \\\\\n",
" x_{4} + x_{5} + x_{6} \\leq 500 \\\\\n",
"    x_{1} + x_{4} = 200 \\\\\n",
"    x_{2} + x_{5} = 300 \\\\\n",
"    x_{3} + x_{6} = 250 \\\\\n",
" x_{i} \\geq 0\n",
"\\end{cases}\n",
"\\end{aligned}$$\n",
"\n",
"i.e.\n",
"\n",
"$$\\begin{aligned}\n",
"& \\min_{\\mathbf{x}} \\mathbf{w}^{\\text{T}} \\mathbf{x} \\\\\n",
"& s.t. \n",
"\\begin{cases}\n",
" \\mathbf{A}_{\\text{ub}} \\mathbf{x} - \\mathbf{b}_{\\text{ub}} \\leq 0 \\\\\n",
" \\mathbf{A}_{\\text{eq}} \\mathbf{x} - \\mathbf{b}_{\\text{eq}} = 0 \\\\\n",
" - \\mathbf{x} \\leq 0\n",
"\\end{cases}\n",
"\\end{aligned}$$\n",
"\n",
"with\n",
"\n",
"$$\\mathbf{A}_{\\text{ub}} = \\begin{bmatrix}\n",
"    1 & 1 & 1 & 0 & 0 & 0 \\\\\n",
"    0 & 0 & 0 & 1 & 1 & 1\n",
"\\end{bmatrix}, \\quad\n",
"\\mathbf{b}_{\\text{ub}} = \\begin{bmatrix} 300 \\\\ 500 \\end{bmatrix}, \\quad\n",
"\\mathbf{A}_{\\text{eq}} = \\begin{bmatrix}\n",
"    1 & 0 & 0 & 1 & 0 & 0 \\\\\n",
"    0 & 1 & 0 & 0 & 1 & 0 \\\\\n",
"    0 & 0 & 1 & 0 & 0 & 1\n",
"\\end{bmatrix}, \\quad\n",
"\\mathbf{b}_{\\text{eq}} = \\begin{bmatrix} 200 \\\\ 300 \\\\ 250 \\end{bmatrix}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```(b)``` Solve for the optimum with an LP solver. References:\n",
"https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linprog.html\n",
" or: http://cvxopt.org/"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"# your implementation\n",
"\n",
"from scipy.optimize import linprog\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cost: 3049.999996 with x = [5.00000000e+01 3.80939619e-08 2.49999999e+02 1.50000000e+02\n",
" 2.99999999e+02 1.15679510e-07]\n"
]
}
],
"source": [
"w = np.array([5, 6, 4, 6, 3, 7])\n",
"\n",
"A_ub = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]])\n",
"b_ub = np.array([300, 500])\n",
"\n",
"A_eq = np.array([[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 1, 0], [0, 0, 1, 0, 0, 1]])\n",
"b_eq = np.array([200, 300, 250])\n",
"\n",
"ret = linprog(c=w, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)\n",
"\n",
"if ret.success:\n",
"    x_opt = ret.x\n",
"    cost = ret.fun\n",
"    print(\"cost: {:f} with x = {}\".format(cost, x_opt))\n",
"else:\n",
"    print(\"linprog failed: {}\".format(ret.message))"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cost: 3050 with x = [50, 0, 250, 150, 300, 0]\n"
]
}
],
"source": [
"# rounding to integer\n",
"\n",
"x_opt = [int(round(num)) for num in x_opt]\n",
"cost = int(np.sum(w * np.array(x_opt)))\n",
"print(\"cost: {:d} with x = {}\".format(cost, x_opt))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```(c)```: Try to convert the LP above into its dual formulation, and write out the dual form."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"// your formulation\n",
"\n",
"* primal problem:\n",
"\n",
"$$\\begin{aligned}\n",
"& \\min_{\\mathbf{x}} \\mathbf{w}^{\\text{T}} \\mathbf{x} \\\\\n",
"& s.t. \n",
"\\begin{cases}\n",
" \\mathbf{A}_{\\text{ub}} \\mathbf{x} - \\mathbf{b}_{\\text{ub}} \\leq 0 \\\\\n",
" \\mathbf{A}_{\\text{eq}} \\mathbf{x} - \\mathbf{b}_{\\text{eq}} = 0 \\\\\n",
" - \\mathbf{x} \\leq 0\n",
"\\end{cases}\n",
"\\end{aligned}$$\n",
"\n",
"The Lagrangian is\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathcal{L}(\\mathbf{x}, \\mathbf{\\alpha}, \\mathbf{\\beta}, \\mathbf{\\gamma})\n",
"& = \\mathbf{w}^{\\text{T}} \\mathbf{x}\n",
"+ \\mathbf{\\alpha}^{\\text{T}} (\\mathbf{A}_{\\text{ub}} \\mathbf{x} - \\mathbf{b}_{\\text{ub}})\n",
"+ \\mathbf{\\beta}^{\\text{T}} (\\mathbf{A}_{\\text{eq}} \\mathbf{x} - \\mathbf{b}_{\\text{eq}})\n",
"- \\mathbf{\\gamma}^{\\text{T}} \\mathbf{x}\n",
"\\end{aligned}$$\n",
"\n",
"with\n",
"\n",
"$$\\mathbf{\\alpha} \\geq 0, \\mathbf{\\gamma} \\geq 0$$\n",
"\n",
"define\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathcal{P}(\\mathbf{x})\n",
"& = \\sup_{\\mathbf{\\alpha} \\geq 0, \\mathbf{\\beta}, \\mathbf{\\gamma} \\geq 0} \\mathcal{L}(\\mathbf{x}, \\mathbf{\\alpha}, \\mathbf{\\beta}, \\mathbf{\\gamma}) \\\\\n",
"& = \\begin{cases}\n",
" \\mathbf{w}^{\\text{T}} \\mathbf{x}, & \\text{if } \\mathbf{x} {\\text{ is feasible}} \\\\\n",
" + \\infty, & \\text{otherwise} \\\\\n",
"\\end{cases}\n",
"\\end{aligned}$$\n",
"\n",
"therefore\n",
"\n",
"$$\\begin{aligned}\n",
"\\min_{\\mathbf{x}} \\mathbf{w}^{\\text{T}} \\mathbf{x}\n",
"& \\Leftrightarrow \\inf_{\\mathbf{x}} \\mathcal{P}(\\mathbf{x})\n",
"= \\inf_{\\mathbf{x}} \\sup_{\\mathbf{\\alpha} \\geq 0, \\mathbf{\\beta}, \\mathbf{\\gamma} \\geq 0} \\mathcal{L}(\\mathbf{x}, \\mathbf{\\alpha}, \\mathbf{\\beta}, \\mathbf{\\gamma})\n",
"\\end{aligned}$$\n",
"\n",
"* dual problem:\n",
"\n",
"define\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathcal{D}(\\mathbf{\\alpha}, \\mathbf{\\beta}, \\mathbf{\\gamma})\n",
"& = \\inf_{\\mathbf{x}} \\mathcal{L}(\\mathbf{x}, \\mathbf{\\alpha}, \\mathbf{\\beta}, \\mathbf{\\gamma}) \\\\\n",
"& = \\inf_{\\mathbf{x}} \\left[\n",
"(\\mathbf{w}^{\\text{T}} + \\mathbf{\\alpha}^{\\text{T}} \\mathbf{A}_{\\text{ub}} + \\mathbf{\\beta}^{\\text{T}} \\mathbf{A}_{\\text{eq}} - \\mathbf{\\gamma}^{\\text{T}}) \\mathbf{x}\n",
"- \\mathbf{\\alpha}^{\\text{T}} \\mathbf{b}_{\\text{ub}}\n",
"- \\mathbf{\\beta}^{\\text{T}} \\mathbf{b}_{\\text{eq}} \\right] \\\\\n",
"& = \\begin{cases}\n",
" - \\mathbf{\\alpha}^{\\text{T}} \\mathbf{b}_{\\text{ub}}\n",
" - \\mathbf{\\beta}^{\\text{T}} \\mathbf{b}_{\\text{eq}},\n",
" & \\text{if } \\mathbf{w}^{\\text{T}} + \\mathbf{\\alpha}^{\\text{T}} \\mathbf{A}_{\\text{ub}} + \\mathbf{\\beta}^{\\text{T}} \\mathbf{A}_{\\text{eq}} - \\mathbf{\\gamma}^{\\text{T}} = 0 \\\\\n",
" - \\infty, & \\text{otherwise} \\\\\n",
"\\end{cases}\n",
"\\end{aligned}$$\n",
"\n",
"therefore, the dual problem\n",
"\n",
"$$\\begin{aligned}\n",
"\\sup_{\\mathbf{\\alpha} \\geq 0, \\mathbf{\\beta}, \\mathbf{\\gamma} \\geq 0} \\mathcal{D}(\\mathbf{\\alpha}, \\mathbf{\\beta}, \\mathbf{\\gamma})\n",
"& = \\sup_{\\mathbf{\\alpha} \\geq 0, \\mathbf{\\beta}, \\mathbf{\\gamma} \\geq 0} \\inf_{\\mathbf{x}} \\mathcal{L}(\\mathbf{x}, \\mathbf{\\alpha}, \\mathbf{\\beta}, \\mathbf{\\gamma})\n",
"\\end{aligned}$$\n",
"\n",
"can be written as\n",
"\n",
"$$\\begin{aligned}\n",
"& \\max_{\\mathbf{\\alpha}, \\mathbf{\\beta}}(- \\mathbf{\\alpha}^{\\text{T}} \\mathbf{b}_{\\text{ub}} - \\mathbf{\\beta}^{\\text{T}} \\mathbf{b}_{\\text{eq}}) \\\\\n",
"& s.t. \n",
"\\begin{cases}\n",
" \\mathbf{w}^{\\text{T}} + \\mathbf{\\alpha}^{\\text{T}} \\mathbf{A}_{\\text{ub}} + \\mathbf{\\beta}^{\\text{T}} \\mathbf{A}_{\\text{eq}} \\geq 0 \\\\\n",
" \\mathbf{\\alpha} \\geq 0 \\\\\n",
"\\end{cases}\n",
"\\end{aligned}$$\n",
"\n",
"i.e.\n",
"\n",
"$$\\begin{aligned}\n",
"& \\min_{\\mathbf{\\alpha}, \\mathbf{\\beta}}(\n",
" \\mathbf{b}_{\\text{ub}}^{\\text{T}} \\mathbf{\\alpha}\n",
" + \\mathbf{b}_{\\text{eq}}^{\\text{T}} \\mathbf{\\beta}\n",
") \\\\\n",
"& s.t. \n",
"\\begin{cases}\n",
" \\mathbf{w}^{\\text{T}} + \\mathbf{\\alpha}^{\\text{T}} \\mathbf{A}_{\\text{ub}} + \\mathbf{\\beta}^{\\text{T}} \\mathbf{A}_{\\text{eq}} \\geq 0 \\\\\n",
" \\mathbf{\\alpha} \\geq 0 \\\\\n",
"\\end{cases}\n",
"\\end{aligned}$$"
]
}
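,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a check (added for illustration), the dual above can also be solved numerically with `scipy.optimize.linprog`; by strong duality its optimum should match the primal optimum found in part (b)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: solve the dual LP and check strong duality.\n",
"# Variables z = [alpha (2), beta (3)]; minimize b_ub^T alpha + b_eq^T beta\n",
"# subject to -A_ub^T alpha - A_eq^T beta <= w, alpha >= 0, beta free.\n",
"import numpy as np\n",
"from scipy.optimize import linprog\n",
"\n",
"w = np.array([5, 6, 4, 6, 3, 7])\n",
"A_ub = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]])\n",
"b_ub = np.array([300, 500])\n",
"A_eq = np.array([[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 1, 0], [0, 0, 1, 0, 0, 1]])\n",
"b_eq = np.array([200, 300, 250])\n",
"\n",
"c_dual = np.concatenate([b_ub, b_eq])\n",
"A_dual = -np.hstack([A_ub.T, A_eq.T])  # 6 constraints (one per route), 5 dual variables\n",
"bounds = [(0, None)] * 2 + [(None, None)] * 3  # alpha >= 0, beta free\n",
"ret = linprog(c=c_dual, A_ub=A_dual, b_ub=w, bounds=bounds)\n",
"print(-ret.fun)  # dual optimum; equals the primal optimum 3050 by strong duality"
]
}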
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}