(some ideas)
the label sequence is y; the target is to maximize p(y|x)
x → emission_graph → y
for a single y, there are many corresponding paths in emission_graph.
every path passes through some nodes.
some nodes belong to only a few paths.
others belong to many paths.
the more paths pass through a node, the stronger the relationship between that node and y.
so occupation probability represents the strength of the relation between a node and y.
optimizing y means optimizing all valid paths that can generate it.
in a graph, the total score is the sum of the probabilities of all valid paths.
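
A minimal sketch of the sum over valid paths, in plain Python. It assumes a CTC-style mapping from paths to labels (merge repeats, then drop a blank symbol), which these notes do not state; the emission values, `collapse`, and `path_prob` are made up for illustration.

```python
from itertools import product

# Hypothetical emission probabilities: 3 frames, 2 symbols.
# Symbol 0 is an assumed "blank", symbol 1 is the label "a".
emissions = [
    [0.6, 0.4],
    [0.3, 0.7],
    [0.8, 0.2],
]

def collapse(path):
    """Assumed CTC-style rule: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != 0:
            out.append(s)
        prev = s
    return out

def path_prob(path):
    """Probability of one path: product of its per-frame emissions."""
    p = 1.0
    for t, s in enumerate(path):
        p *= emissions[t][s]
    return p

target = [1]  # the label sequence y = "a"

# All frame-level paths that collapse to y.
valid_paths = [p for p in product(range(2), repeat=3) if collapse(p) == target]

# p(y|x) is the sum over every valid path.
total = sum(path_prob(p) for p in valid_paths)
```

with these numbers, six of the eight possible paths are valid and they sum to 0.832. brute-force enumeration only works for toy sizes; it is here to make the sum explicit.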
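analogy: let s = a + b (a plain sum, like the total score over two paths)
the grad of s w.r.t. a is 1, and the grad w.r.t. b is 1
the more paths pass through a node, the more 1s it accumulates (occupation probability).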
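
A small sketch of this counting idea, under assumptions: the node names, probabilities, and path list are hypothetical, and a path score is taken to be the product of its node probabilities. Each path contributes once to every node it crosses; when paths have different scores, that contribution is weighted by the path's share of the total, which gives the occupation probability. The finite-difference helper `log_grad` suggests this is the same as the gradient of log(total) with respect to log(node probability).

```python
# Hypothetical node probabilities in a tiny emission graph.
node_prob = {"n1": 0.5, "n2": 0.4, "n3": 0.9, "n4": 0.2}

# Hypothetical valid paths for one label sequence y, as tuples of nodes.
paths = [("n1", "n3"), ("n2", "n3"), ("n2", "n4")]

def path_score(path, probs):
    """Score of one path: product of the probabilities of its nodes."""
    score = 1.0
    for n in path:
        score *= probs[n]
    return score

def total_score(probs):
    """Total score: sum over all valid paths."""
    return sum(path_score(p, probs) for p in paths)

total = total_score(node_prob)

# Unweighted view: how many valid paths cross each node.
path_count = {n: sum(n in p for p in paths) for n in node_prob}

# Weighted view: share of the total score carried by paths through each node.
occupation = {
    n: sum(path_score(p, node_prob) for p in paths if n in p) / total
    for n in node_prob
}

def log_grad(n, eps=1e-6):
    """Finite-difference estimate of d log(total) / d log(node_prob[n])."""
    bumped = dict(node_prob)
    bumped[n] *= 1.0 + eps
    return (total_score(bumped) - total) / (total * eps)

for n in node_prob:
    print(n, path_count[n], round(occupation[n], 4), round(log_grad(n), 4))
```

note that n2 lies on two paths but still has a lower occupation than n3, because its paths carry less of the total score.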