Today we finally implement the long-awaited softmax regression. Think of it as logistic regression's big brother: the output simply becomes a probability distribution over multiple classes instead of two. In this article I will grind through the math step by step.
Softmax function
The softmax function maps an n-dimensional input to a K-dimensional vector whose components sum to one:

\[\phi: \mathbb{R}^n \rightarrow \mathbb{R}^K \textbf{ s.t. } \sum_{i=1}^{K} \phi_i = 1\]
For softmax regression, component k evaluated at example x^{(i)} is

\[\phi_k(x^{(i)}) = \dfrac{\exp({\theta^{(k)}}^T x^{(i)})}{\sum_{j=1}^{K}\exp({\theta^{(j)}}^T x^{(i)})}\]
Multiplying the numerator and denominator by any constant C > 0 leaves the softmax unchanged:

\[\dfrac{\exp({\theta^{(k)}}^T x^{(i)})}{\sum_{j=1}^{K}\exp({\theta^{(j)}}^T x^{(i)})} = \dfrac{C\exp({\theta^{(k)}}^T x^{(i)})}{C\sum_{j=1}^{K}\exp({\theta^{(j)}}^T x^{(i)})} = \dfrac{\exp({\theta^{(k)}}^T x^{(i)} + \log C)}{\sum_{j=1}^{K}\exp({\theta^{(j)}}^T x^{(i)} + \log C)}\]
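This invariance underlies the standard numerically stable implementation: choosing \(\log C = -\max_j {\theta^{(j)}}^T x\) shifts the largest exponent to zero so nothing overflows. A minimal NumPy sketch (the function name is my own, not from the article):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis.

    Subtracting the max (i.e. choosing log C = -max z) leaves the
    result unchanged by the identity above, but prevents overflow.
    """
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

# exp(1000) would overflow on its own; the shifted version is fine
print(softmax(np.array([1000.0, 1000.0])))  # [0.5 0.5]
```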
The cost function is the negative log-likelihood:

\[J(\theta) = - \left[ \sum_{i=1}^{N}\sum_{k=1}^{K} \mathbb{1}\{y^{(i)} = k\} \log \dfrac{\exp({\theta^{(k)}}^T x^{(i)})}{\sum_{j=1}^{K}\exp({\theta^{(j)}}^T x^{(i)})} \right]\]
where each class probability is a softmax component, e.g. for class 1:

\[P(y=1|x;\theta) = \phi_1(x;\theta) = \dfrac{\exp({\theta^{(1)}}^T x)}{\sum_{k=1}^{K}\exp({\theta^{(k)}}^T x)}\]
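As a sketch of how J(θ) might be computed (the names and array layout are my assumptions, not the article's code), with Θ holding one θ^{(k)} per row:

```python
import numpy as np

def cross_entropy_cost(Theta, X, y):
    """J(theta) = -sum_i log P(y_i | x_i; theta).

    Theta: (K, n) parameters, X: (N, n) inputs, y: (N,) labels in 0..K-1.
    """
    logits = X @ Theta.T                          # (N, K): theta^(k)^T x^(i)
    logits -= logits.max(axis=1, keepdims=True)   # the log C stability shift
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].sum()

# with all-zero parameters every class gets probability 1/K,
# so the cost is N * log(K)
print(cross_entropy_cost(np.zeros((3, 2)), np.ones((4, 2)), np.array([0, 1, 2, 0])))  # 4 * log(3)
```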
Taking the gradient with respect to θ^{(k)} (renaming the inner summation index to k' so it does not clash with the gradient's k):

\[\nabla_{\theta^{(k)}} J(\theta) = - \nabla_{\theta^{(k)}} \left[ \sum_{i=1}^{N}\sum_{k'=1}^{K} \mathbb{1}\{y^{(i)} = k'\} \log \dfrac{\exp({\theta^{(k')}}^T x^{(i)})}{\sum_{j=1}^{K}\exp({\theta^{(j)}}^T x^{(i)})} \right]\]

Expanding the logarithm:

\[= - \nabla_{\theta^{(k)}} \sum_{i=1}^{N}\sum_{k'=1}^{K} \mathbb{1}\{y^{(i)} = k'\} \left[ {\theta^{(k')}}^T x^{(i)} - \log \sum_{j=1}^{K}\exp({\theta^{(j)}}^T x^{(i)}) \right]\]

Only the k' = k term of the linear part depends on θ^{(k)}, while the log-sum term contributes for every i because \(\sum_{k'=1}^{K} \mathbb{1}\{y^{(i)} = k'\} = 1\):

\[= - \sum_{i=1}^{N} \left[ \mathbb{1}\{y^{(i)} = k\}\, x^{(i)} - P(y^{(i)}=k|x^{(i)};\theta)\, x^{(i)} \right]\]

\[= - \sum_{i=1}^{N} x^{(i)} \left[ \mathbb{1}\{y^{(i)} = k\} - P(y^{(i)}=k|x^{(i)};\theta) \right]\]
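The final expression vectorizes nicely: with a one-hot matrix Y for the indicator and a matrix P of predicted probabilities, the gradients for all K classes at once are -(Y - P)^T X. A sketch plus a few gradient-descent steps on toy data (the function names, learning rate, and toy problem are all my assumptions):

```python
import numpy as np

def softmax_grad(Theta, X, y, K):
    """Gradient of J w.r.t. every theta^(k), stacked as a (K, n) matrix:
    row k is -sum_i x^(i) [1{y_i = k} - P(y_i = k | x_i; theta)]."""
    logits = X @ Theta.T
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)        # (N, K) class probabilities
    Y = np.eye(K)[y]                         # one-hot rows encode 1{y_i = k}
    return -(Y - P).T @ X                    # (K, n)

# a few plain gradient-descent steps on a separable toy problem
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)                # class = sign of the first feature
Theta = np.zeros((2, 2))
for _ in range(200):
    Theta -= 0.01 * softmax_grad(Theta, X, y, 2)
```

Each step moves θ^{(k)} toward the x^{(i)} of class k and away from the rest, exactly the sign pattern of the last equation.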
References
- https://en.wikipedia.org/wiki/Softmax_function
- https://www.kdnuggets.com/2016/07/softmax-regression-related-logistic-regression.html
- http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/
- https://math.stackexchange.com/questions/1428344/what-is-the-derivation-of-the-derivative-of-softmax-regression-or-multinomial-l
- https://houxianxu.github.io/2015/04/23/logistic-softmax-regression/