Custom TensorFlow models on ML Kit: Understanding Input and Output

Posted on Sep 7, 2018

You can find this example in my repo deep-learning under android-mlkit-sample: https://github.com/miquelbeltran/deep-learning/tree/master/android-mlkit-sample

Before attempting this tutorial, I recommend that you get familiar with Firebase and take a look at the codelab: Identify objects in images using custom machine learning models with ML Kit for Firebase.

The problem with the above examples is that they were a bit too complex for me. To better understand how everything works, I created a simpler example.

In this example, you will create a model from scratch, and you will get a better understanding of how the inputs and outputs are shaped.

My TensorFlow model can be found here: MatMul Jupyter Notebook.

This simple model just multiplies an input by a local variable and returns the output. The idea behind this model is to experiment with different input and output sizes. Feel free to add more operations in between once you have it running.

The model creation is also available here:

If you want to understand what this TensorFlow code is doing:

The key takeaway so far is that we have defined the input shape of our model, and we also know its output shape. We are passing a 1x1 matrix and we are getting back another 1x1 matrix. I encourage you to change this shape to make the model more complex!
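To make the shapes concrete, here is a minimal plain-Python sketch of the kind of matrix multiplication described above. It does not use TensorFlow and is not the notebook's code: the `matmul` helper and the weight value 2.0 are assumptions made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Every row of `a` must have as many entries as `b` has rows.
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

# Hypothetical 1x1 weight standing in for the model's local variable.
weights = [[2.0]]

# A 1x1 input produces a 1x1 output.
output = matmul([[3.0]], weights)
print(output)  # [[6.0]]

# Changing the shapes: a 1x3 input times a 3x2 weight gives a 1x2 output.
wide_output = matmul([[1.0, 2.0, 3.0]], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(wide_output)  # [[4.0, 5.0]]
```

The second call shows what happens when you change the shapes: the output shape follows from the input shape and the weight shape, which is exactly what has to be mirrored later on the Android side.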

Help! I can’t figure out my model shapes!

tf.shape to the rescue! TensorFlow's tf.shape function helps you print each tensor's shape. This is especially useful if you are having problems matching the output shape of your model to your ML Kit implementation.
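If you want to reason about shapes without running TensorFlow at all, the same idea can be sketched in plain Python. The `shape_of` helper below is a hypothetical stand-in for illustration, not a TensorFlow function; it assumes regularly nested lists and reports their dimensions in the order a tensor shape is usually written.

```python
def shape_of(value):
    """Return the dimensions of a regularly nested list, outermost first."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        if not value:
            break  # an empty list has no inner element to descend into
        value = value[0]
    return dims

print(shape_of([[1.0]]))            # [1, 1]
print(shape_of([[1.0, 2.0, 3.0]]))  # [1, 3]
print(shape_of([[0.0] * 1000]))     # [1, 1000]
```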

The next step is in Android Studio. I recommend opening the project at https://github.com/miquelbeltran/deep-learning/tree/master/android-mlkit-sample in Android Studio; I will then guide you through the code:

In this part of the code, I specify the input and output shape for the model.

Our inputDims describe a 1x1 matrix, so we define them as intArrayOf(1, 1).

If we wanted to process an image, our inputDims would most likely be intArrayOf(32, 32, 3) for a 32 x 32 pixel image. And what is the 3? It represents the three color channels: you are most likely passing an RGB image, and each channel is represented separately.

Sometimes you will see intArrayOf(1, 32, 32, 3) for images too. That’s because in TensorFlow you are not limited to a single data input when training; instead, you can pass a batch of many images together to train your model. However, when running our trained models, we only need a single input, which is why we have a 1 there (a batch size of one).
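The difference between the two image shapes can be sketched with nested Python lists. This is only an illustration: the all-zero image is made-up placeholder data, and the dimensions are read off by hand with `len`.

```python
# A hypothetical 32x32 RGB image: rows x columns x 3 color channels, all zeros.
image = [[[0.0, 0.0, 0.0] for _ in range(32)] for _ in range(32)]
image_shape = [len(image), len(image[0]), len(image[0][0])]
print(image_shape)  # [32, 32, 3]

# Wrapping the image in one more list adds a leading batch dimension of one.
batch = [image]
batch_shape = [len(batch), len(batch[0]), len(batch[0][0]), len(batch[0][0][0])]
print(batch_shape)  # [1, 32, 32, 3]
```

The pixel data is identical in both cases; only the extra outer level changes, which is why the two shapes are easy to mix up.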

The output dimensions are defined by the output of the model. That sounds obvious, but it helps us verify whether we are doing something wrong in our model.

In most classification models, you will obtain an array or a matrix as long as the number of categories. For example, a model trained on the 1000-category ImageNet classification dataset would typically return an array of size 1000, or a 1x1000 matrix, with the probabilities for each category.
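As a rough sketch of how such an output is usually read, the snippet below picks the most likely category from a 1xN matrix of probabilities. The four scores and the label list are invented for the example; a real model would have its own labels and many more categories.

```python
# Hypothetical 1x4 output of a four-category classifier (made-up scores).
output = [[0.05, 0.80, 0.10, 0.05]]
labels = ["cat", "dog", "bird", "fish"]  # made-up label list

# Drop the batch dimension to get one probability per category.
probabilities = output[0]

# The predicted category is the index with the highest probability.
best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
print(labels[best_index], probabilities[best_index])  # dog 0.8
```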

Secondly, we also specify the data type. In our case it is FLOAT32, since we defined our inp placeholder with dtype=tf.float32, but other types are possible here, such as byte or int.

Next, we need to understand how to pass inputs and read outputs:
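One way to see why the data type matters is to look at how much memory the same shape needs for different element types. The sketch below is an assumption-laden illustration: the `ELEMENT_FORMATS` table and the `buffer_size` helper are hypothetical names, and the sizes come from Python's `struct` module rather than from ML Kit.

```python
import struct
from functools import reduce
from operator import mul

# struct format codes for a few element types (names chosen for this example).
ELEMENT_FORMATS = {"float32": "<f", "int32": "<i", "byte": "<B"}

def buffer_size(dims, dtype):
    """Return the number of bytes needed for a tensor of the given shape and type."""
    element_count = reduce(mul, dims, 1)
    return element_count * struct.calcsize(ELEMENT_FORMATS[dtype])

print(buffer_size([1, 1], "float32"))          # 4
print(buffer_size([1, 32, 32, 3], "float32"))  # 12288
print(buffer_size([1, 32, 32, 3], "byte"))     # 3072
```

The same 32 x 32 RGB image takes four times as much space as floats as it does as bytes, so the declared type and the data you actually pass have to agree.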

In our example, we have a 1x1 matrix of floats, and we can build that with arrayOf(floatArrayOf(x)). If your data has a different shape, you will have to figure out how to create a data buffer with the required shape, which is probably the most complicated part of this task.
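The shaping step itself can be sketched independently of Android. The hypothetical `reshape` helper below arranges a flat list of values into nested lists for a given set of dimensions; it is plain Python for illustration, not part of ML Kit or the sample project.

```python
def reshape(flat, dims):
    """Arrange a flat list of values into nested lists with the given dimensions."""
    if not dims or any(d <= 0 for d in dims):
        raise ValueError("dims must be a non-empty list of positive sizes")
    expected = 1
    for d in dims:
        expected *= d
    if len(flat) != expected:
        raise ValueError(f"need {expected} values for shape {dims}, got {len(flat)}")
    if len(dims) == 1:
        return list(flat)
    # Each entry of the outermost dimension receives an equal slice of the values.
    step = expected // dims[0]
    return [reshape(flat[i * step:(i + 1) * step], dims[1:]) for i in range(dims[0])]

print(reshape([5.0], [1, 1]))  # [[5.0]]
print(reshape([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1, 2, 3]))
# [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]
```

The first call produces the same nesting as arrayOf(floatArrayOf(x)): one outer array holding one inner array with a single float.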

As you can see, we are specifying the data shape in two places: in the setInputFormat call and again when shaping our data. The good news is that if you make a mistake defining the input shape, ML Kit will complain and let you know, so you can fix your data shape.
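The kind of check involved can be sketched as a comparison between the declared dimensions and the data actually passed. This is a plain-Python illustration under my own assumptions, with a hypothetical `shape_matches` helper; it does not reproduce ML Kit's actual validation or error messages.

```python
def shape_matches(data, dims):
    """Return True if the nested list `data` has exactly the dimensions in `dims`."""
    if not dims:
        # No dimensions left: we must have reached a single value, not a list.
        return not isinstance(data, list)
    if not isinstance(data, list) or len(data) != dims[0]:
        return False
    return all(shape_matches(item, dims[1:]) for item in data)

declared_dims = [1, 1]  # what we declared for the 1x1 model input

print(shape_matches([[5.0]], declared_dims))       # True
print(shape_matches([[5.0, 6.0]], declared_dims))  # False: 1x2 data, 1x1 declared
print(shape_matches([5.0], declared_dims))         # False: one dimension is missing
```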

Similarly, we need to call task.result.getOutput to obtain the output data. We have to specify the type of this data. In our case, it is a float[][], or in Kotlin an Array<FloatArray>, and as expected, the output value is at position [0][0].

I think it is important to understand how the input and output shapes and data types of our models affect our ML Kit integration. In the above example, you have seen how the configuration of the input and output data needs to be adapted to our TensorFlow model.

This will help you build more complex models, export them to ML Kit, and use them on your mobile applications. I encourage you to modify my sample and to experiment with different input and output shapes.
