Using PyTorch to train DeepSpeech and optimize it.

So, for the last six months, I have been optimizing the FFTs that feed DeepSpeech, building a model of the iPhone FFT's DSP to improve the accuracy of the deep-learning recognition, because the closer you get to the input stream, the better DeepSpeech's recognition level is.

So, I have modeled the DSP so I can reprocess DeepSpeech's large data set, and I added a few hundred hours of real audio from the iPhone recording subsystems. That gave me a huge boost in recognition compared to the default sets.
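The post doesn't show the DSP model itself, but the front end it feeds is a standard FFT-based spectrogram. Here is a minimal sketch of such a front end in NumPy; the frame and hop sizes are illustrative assumptions, and a device-specific DSP model would replace or post-process these spectra:

```python
import numpy as np

def stft_frontend(signal, sample_rate=16000, frame_ms=20, hop_ms=10):
    """FFT front end: frame the signal, window it, take magnitude spectra.

    A device-specific DSP model (like the iPhone one described above)
    would sit on top of this; the parameters here are illustrative.
    """
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)       # hop between frames
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    frames = np.stack([
        signal[i * hop_len : i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])
    # Real FFT per frame -> magnitude spectrogram, shape (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))

# Smoke test: one second of a 440 Hz tone at 16 kHz
t = np.arange(16000) / 16000.0
spec = stft_frontend(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (99, 161): 99 frames, 161 frequency bins
```

With a 20 ms frame at 16 kHz the bin resolution is 50 Hz, so the 440 Hz tone shows up as a peak around bin 9 in every frame.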

Yesterday, out of curiosity, and because of some extra features PyTorch has compared to TensorFlow, I looked at PyTorch and its port of DeepSpeech. I plugged in the FFT model and incorporated my workload into PyTorch.

Interestingly, it was pretty straightforward to make it work. I had no surprises on the Python side: a few libraries with different versions here and there, but nothing much to worry about.

Now, here is what is interesting: there are some pretty good tools from Apple to bridge from PyTorch to CoreML. For the moment, I use a modified version of TensorFlow Lite on iPhone; I have added the missing feedback features of TensorFlow to the Lite version, but this does not yet use the full power of the Neural Engine of the A12 Bionic. So, if the bridge between PyTorch and CoreML enables a full direct conversion, I should see some serious performance gains.

This is the flow to follow: PyTorch → ONNX → CoreML

I could not find the list of features supported by those conversions, so one of the best ways to learn about it is to run the test. It should be about a day of work to get the whole pipeline in place, get a good idea of what is missing in terms of feature set, and then learn how much rework is needed to reach the full feature list required. (Very often, mobile versions of convolution systems only support the top 200 features of a full training/scoring feature set, and most innovative neural nets use more than this limited set.)

So, I know what I am doing this week: I really want DeepSpeech singing on this A12 Bionic Neural Engine.