Facebook GTN usage examples (draft)
What does GTN do?
Let's first recall how PyTorch works. With PyTorch, the programmer concentrates only on the forward pass, during which all tensors are created, manipulated, and combined with each other. Usually, a scalar loss is produced at the end of the forward pass. Then just do
loss.backward()  # PyTorch runs automatic differentiation
                 # over all tensors in its dynamic computation graph
Replace “tensor” with “FSA/FST” in the process above, and you get the idea of what GTN does.
GTN provides methods to define FST graphs and operators to manipulate them. Programmers can focus on how the FST graphs interact with each other. GTN then handles the backward pass with a single line:
import gtn
g1 = gtn.Graph()
g2 = gtn.Graph()
# ... add some nodes and arcs to the graph ...
# Compute a function of the graphs:
intersection = gtn.intersect(g1, g2)
score = gtn.forward_score(intersection)
# Calculate gradients:
gtn.backward(score)
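Usage example: build CTC loss with GTN
The following example builds a CTC loss with GTN. Click the following link to run the example step by step.
Note that the gradient of the emission graph is given below. Each arc line reads: source node, destination node, input label, output label, gradient. Label 0 is blank and label 1 is A.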
0 4
0 1 0 0 0.2
0 1 1 1 0.8
1 2 0 0 0.6
1 2 1 1 0.4
2 3 0 0 0.6
2 3 1 1 0.4
3 4 0 0 0.2
3 4 1 1 0.8
Why these values?
The label graph represents the label sequence AA.
The emission graph has 4 time steps, so each path through it has 4 characters.
In the emission graph, there are 2*2*2*2 = 16 paths. Each one collapses as follows (merge repeated characters, then drop blanks):
blank blank blank blank -> nothing
blank blank blank A -> A
blank blank A blank -> A
blank blank A A -> A
blank A blank blank -> A
blank A blank A -> AA (valid)
blank A A blank -> A
blank A A A -> A
A blank blank blank -> A
A blank blank A -> AA (valid)
A blank A blank -> AA (valid)
A blank A A -> AA (valid)
A A blank blank -> A
A A blank A -> AA (valid)
A A A blank -> A
A A A A -> A
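The list above can be reproduced with a few lines of plain Python. This is only a sketch that does not use GTN: the helper name `ctc_collapse` and the string symbols are assumptions made for illustration, and the collapse rule used is the usual CTC one (merge consecutive repeats, then drop blanks).

```python
from itertools import product

BLANK = "blank"

def ctc_collapse(path):
    """Merge consecutive repeated symbols, then drop blanks."""
    merged = [sym for i, sym in enumerate(path) if i == 0 or sym != path[i - 1]]
    return "".join(sym for sym in merged if sym != BLANK)

# All 2*2*2*2 = 16 paths over {blank, A} with 4 time steps.
all_paths = list(product([BLANK, "A"], repeat=4))

# A path is valid when it collapses to the label sequence AA.
valid_paths = [p for p in all_paths if ctc_collapse(p) == "AA"]

for p in all_paths:
    print(" ".join(p), "->", ctc_collapse(p) or "nothing")
print(len(valid_paths), "valid paths")
```

Enumerating every path is only practical for a toy case like this one, since the number of paths grows exponentially with the number of time steps.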
Take all valid paths:
blank A blank A -> AA (valid)
A blank blank A -> AA (valid)
A blank A blank -> AA (valid)
A blank A A -> AA (valid)
A A blank A -> AA (valid)
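In the first column, there is one blank and four As, so blank's occupation probability is 1 / (1 + 4) = 0.2, while A's occupation probability is 4 / (1 + 4) = 0.8. The same count gives 0.6 and 0.4 for the second and third columns, and 0.2 and 0.8 for the fourth, which matches the gradient values.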
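The column counting can be sketched in plain Python as well. The function name `occupation_probabilities` is hypothetical, and the sketch assumes every valid path carries the same weight, which is what counting paths implicitly does.

```python
from collections import Counter

# The five valid paths, one symbol per time step.
valid_paths = [
    ("blank", "A", "blank", "A"),
    ("A", "blank", "blank", "A"),
    ("A", "blank", "A", "blank"),
    ("A", "blank", "A", "A"),
    ("A", "A", "blank", "A"),
]

def occupation_probabilities(paths):
    """For each column (time step), the fraction of paths that use each symbol."""
    result = []
    for column in zip(*paths):  # zip(*paths) yields one tuple per time step
        counts = Counter(column)
        result.append({sym: counts[sym] / len(paths) for sym in ("blank", "A")})
    return result

occupation = occupation_probabilities(valid_paths)
for t, probs in enumerate(occupation):
    print("column", t + 1, probs)
```

If the paths had different scores, each path would have to be weighted by its normalized score instead of being counted once.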
As for why the occupation probability is identical to the gradient, we will discuss that in the next blog post.
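Until then, the claim can at least be checked numerically without GTN. The sketch below rests on two assumptions that are not stated above: the forward score is the log of the sum of exp(path score) over the valid paths, and every emission weight in the example is 0. All names (`forward_score`, `numerical_grad`, `weights`) are local to this sketch and are not GTN calls.

```python
import math
from itertools import product

T = 4            # number of time steps in the emission graph
LABELS = (0, 1)  # label 0 = blank, label 1 = A

def collapses_to_aa(path):
    """CTC rule: merge consecutive repeats, drop blanks, compare with AA."""
    merged = [s for i, s in enumerate(path) if i == 0 or s != path[i - 1]]
    return [s for s in merged if s != 0] == [1, 1]

VALID = [p for p in product(LABELS, repeat=T) if collapses_to_aa(p)]

def forward_score(w):
    """Log of the summed exp(path score) over valid paths; w[t][label] is an arc weight."""
    return math.log(sum(math.exp(sum(w[t][s] for t, s in enumerate(p))) for p in VALID))

def numerical_grad(w, eps=1e-6):
    """Central finite differences of forward_score w.r.t. every arc weight."""
    grad = [[0.0] * len(LABELS) for _ in range(T)]
    for t in range(T):
        for s in LABELS:
            up = [row[:] for row in w]
            down = [row[:] for row in w]
            up[t][s] += eps
            down[t][s] -= eps
            grad[t][s] = (forward_score(up) - forward_score(down)) / (2 * eps)
    return grad

weights = [[0.0, 0.0] for _ in range(T)]  # assumption: every emission weight is 0
grad = numerical_grad(weights)
for t, row in enumerate(grad):
    print(t, [round(g, 4) for g in row])
```

Under these assumptions, each printed row should agree with the blank and A occupation probabilities of the corresponding column. Finite differences are used here only as an independent check; they are not how an automatic differentiation framework computes gradients.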