Facebook GTN usage examples (draft)

Liyong Guo
Nov 19, 2020

What does GTN do?

Just as PyTorch provides a framework for automatic differentiation with tensors, GTN provides such a framework for WFSTs.

Let’s first recall how PyTorch works. With PyTorch, the programmer only needs to write the forward pass, during which all tensors are created, manipulated, and combined with each other. Usually the forward pass ends by producing a scalar loss. Then a single call does the rest:

loss.backward()  # PyTorch runs automatic differentiation
                 # over all tensors in its dynamic computation graph

Replace “tensor” with “FSA/FST” in the process above, and you have the idea of what GTN does.

Through the separation of graphs from operations on graphs, this framework enables the exploration of new structured loss functions which in turn eases the encoding of prior knowledge into learning algorithms.

GTN provides methods to define FST graphs and operators to manipulate them. The programmer can focus on how the FST graphs interact, and GTN then handles the backward pass with a single line:

import gtn
g1 = gtn.Graph()
g2 = gtn.Graph()
# ... add some nodes and arcs to the graph ...
# Compute a function of the graphs:
intersection = gtn.intersect(g1, g2)
score = gtn.forward_score(intersection)
# Calculate gradients:
gtn.backward(score)

Usage example: build CTC loss with GTN

The following example builds a CTC loss. Open the link below to run the example step by step.

https://colab.research.google.com/drive/1w23Ko69x5qMqvZkAsVRUYE11L3oCtoxT?authuser=1#scrollTo=s8-6q9w-EJyH

Note that the gradient of the emission graph is:

0 4 
0 1 0 0 0.2
0 1 1 1 0.8
1 2 0 0 0.6
1 2 1 1 0.4
2 3 0 0 0.6
2 3 1 1 0.4
3 4 0 0 0.2
3 4 1 1 0.8

Why these values?

The label graph is shown below. It encodes the label sequence AA.

[Figure: label graph]

The emission graph is shown below. Every path through it has 4 symbols.

[Figure: emission graph]

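In the emission graph there are 2*2*2*2 = 16 paths. Each path collapses to a label sequence under the CTC rule (merge consecutive repeated symbols, then remove blanks):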

blank blank blank blank 》》 nothing
blank blank blank A 》》 A
blank blank A blank 》》 A
blank blank A A 》》 A
blank A blank blank 》》 A
blank A blank A 》》 AA (valid)
blank A A blank 》》 A
blank A A A 》》 A
A blank blank blank 》》 A
A blank blank A 》》 AA (valid)
A blank A blank 》》 AA (valid)
A blank A A 》》 AA (valid)
A A blank blank 》》 A
A A blank A 》》 AA (valid)
A A A blank 》》 A
A A A A 》》 A
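
The table above can be reproduced with a few lines of plain Python. This is only a sketch of the collapse rule, not GTN code: the names `ctc_collapse`, `all_paths`, and `valid_paths` are made up for illustration, and the symbols are plain strings rather than arc labels.

```python
from itertools import groupby, product

BLANK = "blank"

def ctc_collapse(path):
    """Merge consecutive repeated symbols, then drop blanks."""
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(s for s in merged if s != BLANK)

# All 2**4 = 16 paths through a 4-frame graph with two symbols per frame.
all_paths = list(product([BLANK, "A"], repeat=4))

# Keep only the paths that collapse to the label sequence AA.
valid_paths = [p for p in all_paths if ctc_collapse(p) == "AA"]

for path in all_paths:
    collapsed = ctc_collapse(path) or "nothing"
    marker = " (valid)" if path in valid_paths else ""
    print(" ".join(path), ">>", collapsed + marker)
```

Because `product` iterates blank before A in every frame, the rows come out in the same order as the table.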

Take all the valid paths:

blank A blank A 》》 AA (valid)
A blank blank A 》》 AA (valid)
A blank A blank 》》 AA (valid)
A blank A A 》》 AA (valid)
A A blank A 》》 AA (valid)

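In the first column there is one blank and four A’s, so the occupation probability of blank is 1 / (1 + 4) = 0.2, while that of A is 4 / (1 + 4) = 0.8. These are the weights 0.2 and 0.8 on the two arcs from node 0 to node 1 in the gradient above (label 0 is blank, label 1 is A). Counting paths this way treats the five valid paths as equally likely. The other three columns work the same way.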
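
The same counting can be done for all four columns at once. The sketch below is plain Python, not GTN, and it assumes the five valid paths are equally likely (for example, all arc weights equal). The names `ctc_collapse`, `valid_paths`, and `occupation` are illustrative.

```python
from itertools import groupby, product

BLANK, A = "blank", "A"

def ctc_collapse(path):
    """Merge consecutive repeated symbols, then drop blanks."""
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(s for s in merged if s != BLANK)

# The five 4-frame paths that collapse to AA.
valid_paths = [p for p in product([BLANK, A], repeat=4) if ctc_collapse(p) == "AA"]

# For each frame (column), the fraction of valid paths that pass through
# blank and through A. Assumes every valid path carries the same weight.
occupation = []
for frame in zip(*valid_paths):
    total = len(frame)
    occupation.append((frame.count(BLANK) / total, frame.count(A) / total))

for t, (p_blank, p_a) in enumerate(occupation):
    print(f"frame {t}: blank={p_blank:.1f} A={p_a:.1f}")
```

Each `(blank, A)` pair corresponds to the two arcs between node `t` and node `t + 1` in the gradient listing.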

As for why the occupation probability is identical to the gradient, we will discuss that in the next post.
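
Without explaining why, the claim can at least be checked numerically. The sketch below makes two assumptions: the score is the log-sum-exp of the path scores over the valid paths (which is what a forward score computes), and every arc weight starts at zero. It uses plain Python with finite differences instead of GTN, and `forward_score` and `numerical_grad` here are hypothetical helpers, not the GTN functions.

```python
import math
from itertools import groupby, product

BLANK, A = 0, 1  # arc labels: 0 is blank, 1 is A

def ctc_collapse(path):
    """Merge consecutive repeated labels, then drop blanks."""
    merged = [label for label, _ in groupby(path)]
    return tuple(label for label in merged if label != BLANK)

# The five 4-frame paths that collapse to the label sequence (A, A).
VALID = [p for p in product([BLANK, A], repeat=4) if ctc_collapse(p) == (A, A)]

def forward_score(weights):
    """Log-sum-exp of path scores over VALID; weights[t][label] is an arc weight."""
    return math.log(sum(
        math.exp(sum(weights[t][label] for t, label in enumerate(path)))
        for path in VALID
    ))

def numerical_grad(weights, eps=1e-6):
    """Central finite-difference gradient of forward_score w.r.t. each arc weight."""
    grad = [[0.0, 0.0] for _ in weights]
    for t in range(len(weights)):
        for label in (BLANK, A):
            plus = [row[:] for row in weights]
            minus = [row[:] for row in weights]
            plus[t][label] += eps
            minus[t][label] -= eps
            grad[t][label] = (forward_score(plus) - forward_score(minus)) / (2 * eps)
    return grad

weights = [[0.0, 0.0] for _ in range(4)]  # assumed: all arc weights are zero
grad = numerical_grad(weights)
for t, (g_blank, g_a) in enumerate(grad):
    print(f"{t} {t + 1} blank: {g_blank:.3f}  A: {g_a:.3f}")
```

Under these assumptions the printed values can be compared directly against the gradient listing and the path counts.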

Reference:

(paper) Awni Hannun, Vineel Pratap, Jacob Kahn, Wei-Ning Hsu. Differentiable Weighted Finite-State Transducers. 2020.
