Skip to content
Projects
Groups
Snippets
Help
This project
Loading...
Sign in / Register
Toggle navigation
M
ml2_MiniAssignments
Overview
Overview
Details
Activity
Cycle Analytics
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Issues
0
Issues
0
List
Board
Labels
Milestones
Merge Requests
0
Merge Requests
0
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Charts
Wiki
Wiki
Snippets
Snippets
Members
Collapse sidebar
Close sidebar
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
20200318029
ml2_MiniAssignments
Commits
d7870fd6
Commit
d7870fd6
authored
May 28, 2020
by
20200318029
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
homework3
parent
92fbff0e
Hide whitespace changes
Inline
Side-by-side
Showing
3 changed files
with
208 additions
and
0 deletions
+208
-0
minihomework3/PMM.ipynb
+178
-0
minihomework3/readme.md
+6
-0
minihomework3/提交方法.txt
+24
-0
No files found.
minihomework3/PMM.ipynb
0 → 100644
View file @
d7870fd6
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 泊松混合模型"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 泊松分布\n",
"\n",
"泊松分布(Poisson distribution)的概率分布函数为:\n",
"\n",
"$$P(X = k) = \\frac{\\lambda^{k}}{k !} e^{- \\lambda}, k = 0, 1, \\cdots \\tag {1}$$\n",
"\n",
"泊松分布的参数$\\lambda$表示单位时间(单位面积)内随机事件的平均发生次数。泊松分布适合于描述单位时间内随机事件发生的次数。\n",
"\n",
"泊松分布的期望$\\lambda$:,方差:$\\lambda$,特征函数:$\\phi(t) = \\exp(\\lambda (e^{i t} - 1))$\n",
"\n",
"### 泊松分布与二项分布\n",
"\n",
"当二项分布的$n$很大而$p$很小时,二项分布可近似为泊松分布,其中$\\lambda = np$。通常当$n \\geq 20, p \\leq 0.05$时,就可以用泊松分布近似。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 泊松混合模型\n",
"\n",
"泊松混合模型(Poisson mixture model)的概率密度函数为:\n",
"\n",
"$$p(y; \\theta) = \\sum_{k}^{K} \\alpha_{k} \\frac{\\lambda^{k}}{k !} e^{- \\lambda} \\tag {2}$$\n",
"\n",
"其中,$\\alpha_{k} \\geq 0$,$\\sum_{k}^{K} \\alpha_{k} = 1$。\n",
"\n",
"给定由$K$个Poisson分布生成的$N$条独立数据构成数据集$D = {y_{j} | j = 1, 2, \\cdots, N}$,估计泊松混合模型的参数$\\theta_{k} = (\\alpha_{k}, \\lambda_{k} | k = 1, 2, \\cdots, K)$。\n",
"\n",
"1. 隐变量\n",
"\n",
"观测数据$y_{j}$已知,反映观测数据$y_{j}$来自第$k$个分模型的数据未知,以隐变量$\\gamma_{k}$表示:\n",
"\n",
"$$\\gamma_{jk} = \\begin{cases}\n",
"1, & \\text{第}j\\text{个观测来自第}k\\text{个分模型} \\\\\n",
"0, & 否则\n",
"\\end{cases}\n",
"j = 1, 2, \\cdots, N; k = 1, 2, \\cdots, K \\tag {3}$$\n",
"\n",
"未观测数据$\\gamma_{jk}$为`0-1`随机变量,则完全数据为\n",
"\n",
"$$(y_{j}, \\gamma_{j1}, \\cdots, \\gamma_{jK}), j = 1, 2, \\cdots, N$$\n",
"\n",
"完全数据的似然函数:\n",
"\n",
"$$\\begin{aligned}\n",
"P(y, \\gamma; \\theta) & = \\prod_{j = 1}^{N} P(y_{j}, \\gamma_{j1}, \\cdots, \\gamma_{jK}; \\theta) \\\\\n",
"& = \\prod_{j = 1}^{N} \\prod_{k = 1}^{K} [\\alpha_{k} P(y_{j}; \\theta_{k})]^{\\gamma_{jk}} \\\\\n",
"& = \\prod_{k = 1}^{K} \\alpha_{k}^{n_{k}} \\prod_{j = 1}^{N} [P(y_{j}; \\theta_{k})]^{\\gamma_{jk}} \\\\\n",
"& = \\prod_{k = 1}^{K} \\alpha_{k}^{n_{k}} \\prod_{j = 1}^{N} \\left[\n",
" \\frac{\\lambda_{k}^{y_{j}}}{y_{j} !} e^{- \\lambda_{k}}\n",
"\\right]^{\\gamma_{jk}}\n",
"\\end{aligned}$$\n",
"\n",
"其中,$n_{k} = \\sum_{j = 1}^{N} \\gamma_{jk}$,$N = \\sum_{k = 1}^{K} n_{k}$。完全数据的对数似然函数为\n",
"\n",
"$$\\begin{aligned}\n",
"\\log P(y, \\gamma; \\theta) = \\sum_{k = 1}^{K} \\left( n_{k} \\log \\alpha_{k} + \\sum_{j = 1}^{N} \\gamma_{jk} \\left[\n",
" y_{j} \\log \\lambda_{k} - \\log y_{j}! - \\lambda_{k}\n",
"\\right] \\right) \\\\\n",
"\\end{aligned} \\tag {4}$$\n",
"\n",
"2. E步,$Q$函数\n",
"\n",
"$$\\begin{aligned}\n",
"Q(\\theta, \\theta^{(i)})\n",
"& = \\text{E}_{\\gamma} \\left[ \\log P(y, \\gamma; \\theta) | y; \\theta^{(i)} \\right] \\\\\n",
"& = \\sum_{k = 1}^{K} \\left(\n",
" \\sum_{j = 1}^{N} \\text{E}[\\gamma_{jk} | y_{j}; \\theta^{(i)}] \\log \\alpha_{k} + \\sum_{j = 1}^{N} \\text{E}[\\gamma_{jk} | y_{j}; \\theta^{(i)}] \\left[\n",
" y_{j} \\log \\lambda_{k} - \\log y_{j}! - \\lambda_{k}\n",
"\\right] \\right) \\\\\n",
"\\end{aligned}$$\n",
"\n",
"令$\\hat{\\gamma}_{jk} = \\text{E}[\\gamma_{jk} | y_{j}; \\theta^{(i)}]$,表示当前模型参数下第$j$个观测数据来自第$k$个分模型的概率,称为分模型$k$对观测数据$y_{j}$的响应度。\n",
"\n",
"$$\\begin{aligned}\n",
"\\hat{\\gamma}_{jk} & = \\text{E}[\\gamma_{jk} | y_{j}; \\theta^{(i)}] \\\\\n",
"& = \\frac{\n",
" P(y_{j} | \\gamma_{jk} = 1; \\theta^{(i)}) P(\\gamma_{jk} = 1; \\theta^{(i)})\n",
"}{\n",
" \\sum_{k = 1}^{K} P(y_{j} | \\gamma_{jk} = 1; \\theta^{(i)}) P(\\gamma_{jk} = 1; \\theta^{(i)})\n",
"} \\\\\n",
"& = \\frac{\n",
" \\frac{\\alpha_{k} \\lambda_{k}^{y_{j}}}{y_{j} !} e^{- \\lambda_{k}}\n",
"}{\n",
" \\sum_{k = 1}^{K} \\frac{\\alpha_{k} \\lambda_{k}^{y_{j}}}{y_{j} !} e^{- \\lambda_{k}}\n",
"} \\\\\n",
"\\end{aligned} \\tag {5}$$\n",
"\n",
"则$n_{k} = \\sum_{j = 1}^{N} E[\\hat{\\gamma}_{jk}]$\n",
"\n",
"$$\\begin{aligned}\n",
"Q(\\theta, \\theta^{(i)}) = \\sum_{k = 1}^{K} \\left(\n",
" \\sum_{j = 1}^{N} \\hat{\\gamma}_{jk} \\log \\alpha_{k} + \\sum_{j = 1}^{N} \\hat{\\gamma}_{jk} \\left[\n",
" y_{j} \\log \\lambda_{k} - \\log y_{j}! - \\lambda_{k}\n",
"\\right] \\right) \\\\\n",
"\\end{aligned} \\tag {6}$$\n",
"\n",
"3. M步\n",
"\n",
"求使函数$Q(\\theta, \\theta^{(i)})$取极大值时的参数$\\theta^{(i + 1)} = (\\hat{\\lambda}_{k}, \\hat{\\alpha}_{k} | k = 1, 2, \\cdots, K)$:\n",
"\n",
"$$\\begin{aligned}\n",
"\\theta^{(i + 1)} & = \\argmax_{\\theta} Q(\\theta, \\theta^{(i)}) \\\\\n",
"& \\text{s.t.} \\sum_{k = 1}^{K} \\alpha_{k} = 1, \\ \\alpha_{k} \\geq 1\n",
"\\end{aligned}$$\n",
"\n",
"通过拉格朗日乘数法可得:\n",
"\n",
"$$\\hat{\\lambda}_{k} = \\frac{\\sum_{j = 1}^{N} \\hat{\\gamma}_{jk} y_{j}}{\\sum_{j = 1}^{N} \\hat{\\gamma}_{jk}} \\tag {7}$$\n",
"\n",
"$$\\hat{\\alpha}_{k} = \\frac{n_k}{N} = \\frac{\\sum_{j = 1}^{N} \\hat{\\gamma}_{jk}}{N} \\tag {8}$$\n",
"\n",
"### 泊松混合模型EM参数估计\n",
"\n",
"输入:观测数据$y_{1}, y_{2}, \\cdots,y_{N}$,泊松混合模型;\n",
"\n",
"输出:泊松混合模型参数。\n",
"\n",
"(1)参数初始化\n",
"\n",
"(2)E步:依据当前模型参数,计算分模型$k$对观测数据$y_{j}$的响应度\n",
"\n",
"$$\\begin{aligned}\n",
"\\hat{\\gamma}_{jk} = \\frac{\n",
" \\frac{\\alpha_{k} \\lambda_{k}^{y_{j}}}{y_{j} !} e^{- \\lambda_{k}}\n",
"}{\n",
" \\sum_{k = 1}^{K} \\frac{\\alpha_{k} \\lambda_{k}^{y_{j}}}{y_{j} !} e^{- \\lambda_{k}}\n",
"} \\\\\n",
"\\end{aligned}$$\n",
"\n",
"(3)M步:计算新一轮迭代的模型参数\n",
"\n",
"$$\\begin{aligned}\n",
"\\hat{\\lambda}_{k} & = \\frac{\\sum_{j = 1}^{N} \\hat{\\gamma}_{jk} y_{j}}{\\sum_{j = 1}^{N} \\hat{\\gamma}_{jk}} \\\\\n",
"\\hat{\\alpha}_{k} & = \\frac{n_k}{N} = \\frac{\\sum_{j = 1}^{N} \\hat{\\gamma}_{jk}}{N} \\\\\n",
"k & = 1, 2, \\cdots, K\n",
"\\end{aligned}$$\n",
"\n",
"(4)重复(2)、(3),直至收敛(对数似然函数值不再有明显的变化)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
minihomework3/readme.md
0 → 100644
View file @
d7870fd6
老师应大家的要求,为大家布置了这个小练习,这个是咱们第二个小练习,Jerry老师建议大家一周之内完成,所以请大家抓紧时间哦~,不过因为这个是一次小作业,不会计入最后的项目统计
<br/><br/>
要求如下:
<br/><br/>
作业截至时间:5月29日23:59,具体答案将会在截止的周末进行讲解并公布答案
<br/><br/>
----更加具体的要求,请看相关文档
<br/><br/><br/>
结果:
<br/>
1.
完成即可
minihomework3/提交方法.txt
0 → 100644
View file @
d7870fd6
无论Homework还是Project,都是需要通过gitlab提交的。
无论Homework还是Project,都是需要通过gitlab提交的。
视频版教程:
链接:https://pan.baidu.com/s/1HxNRvrPwOBT4xCzU1iSfwg
提取码:toov
文字版教程:
一、git基本操作
1、克隆仓库
git clone url(git仓库的地址)
2、添加还未被追踪的文件文件
git add 文件名称或者是. (.代表了所有的文件)
3、添加本次提交的注释信息
git commit -m "这里是注释信息"
4、推到远程
git push
5、在已有仓库中拉取远程的代码
git pull
二、添加ssh-key
1、生成ssh-key
命令:ssh-keygen -t rsa -C "你的gitlab的邮箱"
2、到个人的家目录下复制公钥内容
- cd ~/.ssh
- cat id_rsa.pub
复制内容到gitlab的ssh keys中(点击右上角头像--》settings--》ssh-keys--》粘贴内容到keys中,title名字任意)
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment