The label sequence is y, and the goal is to maximize p(y|x).

x → emission_graph → y

For a single y, there are many corresponding paths in the emission graph.

Every path passes through some nodes.

Some nodes belong to only a few paths.

Others belong to many paths.

The more paths pass through a node, the stronger the relationship between that node and y.

So the occupation probability of a node represents how strongly that node is related to y.

Optimizing y means optimizing all the valid paths that can generate it.

In the graph, the total score is the sum of the probabilities of all valid paths.
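The sum over valid paths can be made concrete with a brute-force sketch. Everything here is an assumption made for illustration: the three-frame emission table, the blank symbol, and the CTC-style rule (merge repeats, then drop blanks) used to decide which paths are valid for a given y. Real toolkits use dynamic programming over the graph rather than enumerating paths.

```python
from itertools import product

BLANK = "-"  # hypothetical blank symbol

# Hypothetical emission probabilities: one dict per frame, one entry per symbol.
emissions = [
    {"a": 0.6, "b": 0.1, BLANK: 0.3},
    {"a": 0.2, "b": 0.5, BLANK: 0.3},
    {"a": 0.1, "b": 0.7, BLANK: 0.2},
]


def collapse(path):
    """Merge repeated symbols, then drop blanks (CTC-style rule, assumed here)."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)


def path_probability(path):
    """Product of the per-frame probabilities along one path."""
    prob = 1.0
    for frame, sym in zip(emissions, path):
        prob *= frame[sym]
    return prob


def valid_paths(target):
    """All frame-level paths that collapse to the target label sequence."""
    symbols = list(emissions[0])
    return [p for p in product(symbols, repeat=len(emissions))
            if collapse(p) == target]


def total_score(target):
    """p(y|x) as the sum of the probabilities of all valid paths."""
    return sum(path_probability(p) for p in valid_paths(target))


paths = valid_paths("ab")
score = total_score("ab")
for p in paths:
    print("".join(p), round(path_probability(p), 4))
print("total:", round(score, 4))
```

With this table, five different paths collapse to "ab", and the score of y is their combined probability rather than the probability of any single path.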

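As an analogy, take a plain sum (here y is just a scalar, not the label sequence):

y = a + b

The gradient of y with respect to a is 1, and the gradient with respect to b is 1.

In the same way, each valid path passes a gradient of 1 to the nodes it crosses, so the more paths cross a node, the more 1s that node collects (its occupation probability).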
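The link between path counts, gradients, and occupation can be checked numerically on a toy graph. This is a sketch under stated assumptions: the node names, their probabilities, and the three valid paths are made up, each node appears at most once per path, and occupation is taken to mean the share of the total score carried by paths that cross the node. Under those assumptions it equals the gradient of the log total score with respect to the log of the node's probability, which the finite-difference helper approximates.

```python
import math

# Hypothetical toy graph: each valid path is the tuple of nodes it crosses.
node_prob = {"n1": 0.5, "n2": 0.4, "n3": 0.6, "n4": 0.5}
valid_paths = [
    ("n1", "n2"),
    ("n1", "n3"),
    ("n4", "n3"),
]


def path_prob(path, probs):
    """Product of node probabilities along one path."""
    p = 1.0
    for node in path:
        p *= probs[node]
    return p


def total_score(probs):
    """Sum of the probabilities of all valid paths."""
    return sum(path_prob(path, probs) for path in valid_paths)


def paths_through(node):
    """How many valid paths cross the node (one unit of gradient per path)."""
    return sum(1 for path in valid_paths if node in path)


def occupation(node, probs):
    """Share of the total score carried by paths that cross the node."""
    through = sum(path_prob(p, probs) for p in valid_paths if node in p)
    return through / total_score(probs)


def grad_log_score(node, probs, eps=1e-6):
    """Finite-difference estimate of d log(total) / d log(p_node)."""
    bumped = dict(probs)
    bumped[node] = probs[node] * (1 + eps)
    delta = math.log(total_score(bumped)) - math.log(total_score(probs))
    return delta / math.log(1 + eps)


for node in node_prob:
    print(node, paths_through(node),
          round(occupation(node, node_prob), 4),
          round(grad_log_score(node, node_prob), 4))
```

Nodes n1 and n3 each sit on two of the three paths and end up with the largest occupation values, while n2 and n4 sit on one path each and get smaller ones.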

What does GTN do?

Just as PyTorch provides a framework for automatic differentiation with tensors, GTN provides such a framework for WFSTs.

Let’s first recall how PyTorch works. With PyTorch, the programmer concentrates only on the forward pass, during which all tensors are created, manipulated, and combined with each other. Usually a scalar loss is produced at the end of the forward pass. Then just call

loss.backward()  # PyTorch runs automatic differentiation
                 # over all tensors in its dynamic computation graph

Replace “tensor” with “FSA/FST” in the process above, and you have the idea of what GTN does.
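The forward-then-backward pattern can be mimicked in a few lines of plain Python. The `Value` class below is a hypothetical stand-in written for this note; it is not how PyTorch or GTN is implemented. It only shows the mechanism: the forward pass records which values produced each result, and `backward()` walks that record in reverse, accumulating gradients.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical stand-in for a tensor)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # values this one was computed from
        self._local_grads = local_grads  # d(self) / d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Build a topological order of the recorded graph.
        order, seen = [], set()

        def visit(v):
            if id(v) in seen:
                return
            seen.add(id(v))
            for parent in v._parents:
                visit(parent)
            order.append(v)

        visit(self)
        # Propagate gradients from the output back to the inputs.
        self.grad = 1.0
        for v in reversed(order):
            for parent, local in zip(v._parents, v._local_grads):
                parent.grad += local * v.grad


# y = a + b: both inputs receive a gradient of 1.
a = Value(2.0)
b = Value(3.0)
y = a + b
y.backward()
print(a.grad, b.grad)

# A value used on two routes collects gradient from both of them.
u = Value(2.0)
w = Value(4.0)
z = u * w + u
z.backward()
print(u.grad, w.grad)
```

In the second example `u` is reached along two routes from `z`, so its gradient is the sum of both contributions, which is the same accumulation a node receives from every path that crosses it.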

Through the separation of graphs from operations on graphs, this…

Liyong Guo

speech recognition / voiceprint / gender classification
