(some ideas)

The label sequence is y; the target is to maximize p(y|x).

x → emission_graph → y

For a single y, there are many corresponding paths in emission_graph.

Every path passes through some nodes.

Some nodes belong to only a few paths.

Others belong to many paths.

The more paths pass through a node, the stronger the relationship between that node and y.
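
A small sketch of the counting idea above. The notes do not say how paths map to y, so this assumes a CTC-style collapse rule (merge repeats, drop blanks); the two-symbol vocabulary, the 3 frames, and all names here are hypothetical.

```python
from collections import Counter
from itertools import product

BLANK, A = 0, 1   # hypothetical 2-symbol vocabulary
T = 3             # hypothetical number of frames
target = [A]      # the label sequence y

def collapse(path):
    """CTC-style collapse (an assumption): merge repeats, then drop blanks."""
    merged = [s for i, s in enumerate(path) if i == 0 or s != path[i - 1]]
    return [s for s in merged if s != BLANK]

# Every frame-level path that generates y.
valid_paths = [p for p in product([BLANK, A], repeat=T) if collapse(p) == target]

# For each node (frame, symbol), count how many valid paths pass through it.
paths_per_node = Counter((t, s) for p in valid_paths for t, s in enumerate(p))

for node in sorted(paths_per_node):
    print(node, paths_per_node[node])
```

In this toy case the node (frame 1, symbol A) lies on more valid paths than the node (frame 1, blank), so it is more strongly tied to y.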

So the occupation probability represents the strength of the relationship between a node and y.

Optimizing y means optimizing all valid paths that can generate it.

In a graph, the total score is the sum of the probabilities of all valid paths.

y = a + b

The gradient with respect to a is 1, and the gradient with respect to b is 1.

The more paths pass through a node, the more 1s that node receives (this is the occupation probability).
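
A rough numerical check of this idea. When each path is weighted by its probability instead of counting as exactly 1, a node's share of the total score should match the gradient of log(total score) with respect to that node's log-score. The sketch assumes a CTC-style collapse rule and made-up per-frame probabilities; neither comes from the notes, and the gradient is estimated by finite differences rather than by any particular library.

```python
import math
from itertools import product

BLANK, A = 0, 1   # hypothetical 2-symbol vocabulary
target = [A]      # the label sequence y

# Hypothetical log-probabilities; rows are frames, columns are symbols.
log_probs = [
    [math.log(0.6), math.log(0.4)],
    [math.log(0.3), math.log(0.7)],
    [math.log(0.5), math.log(0.5)],
]

def collapse(path):
    """CTC-style collapse (an assumption): merge repeats, then drop blanks."""
    merged = [s for i, s in enumerate(path) if i == 0 or s != path[i - 1]]
    return [s for s in merged if s != BLANK]

valid_paths = [p for p in product([BLANK, A], repeat=len(log_probs))
               if collapse(p) == target]

def path_prob(lp, path):
    """Probability of one path: product of the node probabilities along it."""
    return math.exp(sum(lp[t][s] for t, s in enumerate(path)))

def total_score(lp):
    """Total score: sum of the probabilities of all valid paths."""
    return sum(path_prob(lp, p) for p in valid_paths)

def occupation(lp, node):
    """Share of the total score carried by valid paths through `node`."""
    t, s = node
    through = sum(path_prob(lp, p) for p in valid_paths if p[t] == s)
    return through / total_score(lp)

def grad_log_total(lp, node, eps=1e-6):
    """Central finite-difference gradient of log(total) w.r.t. one log-score."""
    t, s = node
    up = [row[:] for row in lp]
    down = [row[:] for row in lp]
    up[t][s] += eps
    down[t][s] -= eps
    return (math.log(total_score(up)) - math.log(total_score(down))) / (2 * eps)

for t in range(len(log_probs)):
    for s in (BLANK, A):
        node = (t, s)
        print(node, round(occupation(log_probs, node), 4),
              round(grad_log_total(log_probs, node), 4))
```

The two printed columns should agree for every node, and the occupation values within a frame should sum to 1, since every valid path passes through exactly one node per frame.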