Realtime Object and Face Detection in Android using Tensorflow Object Detection API

Computer Science has seen many advancements over the years. One such advancement is AI, and within AI, image recognition is making waves. To keep up with this tech, our AI team worked on a small image recognition project — read on to find out what we built.

Karthik Kamalakannan

Founder and CEO

Artificial Intelligence is one of the breakthrough technologies in computer science. Image recognition has long been a challenging task for machines; in this century, abundant computing resources and intelligent algorithms have made it much easier, but the capability was still limited to specially configured machines. That changed with the release of TensorFlow Lite on Nov 14th, 2017, which made it easy to develop and deploy TensorFlow models on mobile and embedded devices. In this blog we provide the steps to develop an Android application that can detect custom objects using the TensorFlow Object Detection API.


Requirements

Before You Get Started

Since the project involves a lot of work with Python code, libraries and APIs, it's good practice to work in a Python virtual environment. We use pip to install Python packages, so make sure you have it installed.

PIP installation:
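A typical way to install pip on a Debian/Ubuntu system (a sketch; use the installer appropriate for your OS):

$ sudo apt-get install python-pip
$ pip --version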

Python Virtual Environment installation:
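One common way to install virtualenv and create an environment (the environment name venv here is my choice for illustration):

$ sudo pip install virtualenv
$ virtualenv venv
$ source venv/bin/activate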

If you completed all the steps mentioned above, you should see output like this in your terminal or command prompt:

Python Virtual Environment
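In case the image doesn't load: the activated environment typically prefixes your prompt with its name, roughly like this (the exact prompt depends on your shell and OS):

(venv) user@machine:~/project$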

To know more about Python virtual environments, feel free to visit this link.

Step 1 - Collect Data:

In this project, we are going to work with custom images, so I'm collecting images of Steve Jobs and Elon Musk. After collecting all the images, annotate (draw a box around) the object you want to detect in each image using LabelImg, and save both the .jpeg and the .xml file in the image folder. LabelImg installation is sketched below.
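If you don't have LabelImg yet, one common way to get it is through pip (an assumption on my part; you can also build it from its GitHub repository):

$ pip install labelImg
$ labelImg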

Collecting Data

After drawing a bounding box around an object, we get an .xml file, called the annotation file, which specifies the region the classifier should focus on. It looks like the example below. The bounding box notation is xmin, ymin, xmax, ymax.

<annotation>
        <folder>steve_jobs</folder>
        <filename>14_steve-jobs.jpg</filename>
        <path>"path of the image"</path>
        <source>
                <database>Unknown</database>
        </source>
        <size>
                <width>1024</width>
                <height>768</height>
                <depth>3</depth>
        </size>
        <segmented>0</segmented>
        <object>
                <name>steve jobs</name>
                <pose>Unspecified</pose>
                <truncated>0</truncated>
                <difficult>0</difficult>
                <bndbox>
                        <xmin>270</xmin>
                        <ymin>9</ymin>
                        <xmax>753</xmax>
                        <ymax>642</ymax>
                </bndbox>
        </object>
</annotation>

We have successfully collected and annotated the required data. Now we have to convert the .xml files into a single .csv file, because the conversion pipeline goes .jpeg > .xml > .csv > .record. The code to convert the xml files is provided below.

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET


def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    for directory in ['train', 'test']:
        image_path = os.path.join(os.getcwd(), 'images/{}'.format(directory))
        xml_df = xml_to_csv(image_path)
        # Storing the csv file into the data directory.
        xml_df.to_csv('data/{}.csv'.format(directory), index=None)
        print('Successfully converted xml to csv.')

main()
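To run the conversion (assuming you save this script as xml_to_csv.py — the filename is my choice — with your annotated images in images/train and images/test; the script writes its csv output into a data directory, described next):

$ python xml_to_csv.py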

To save the data files, create a data directory in your project folder; this keeps things organized, though you can save them wherever you like. When you create the data directory, create an empty train.csv and test.csv inside it. The next step is to convert the csv files to tfrecord files, because TensorFlow provides many functions for data in the tfrecord format. The code to convert .csv to .record (or .tfrecord) is given below.

import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS

def class_text_to_int(row_label):
    # If you are working with your own classes, change the labels here.
    if row_label == 'steve jobs':
        return 1
    elif row_label == 'elon musk':
        return 2
    else:
        return None

def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), 'images')
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))

if __name__ == '__main__':
    tf.app.run()

That's great, we have completed the data processing! To finish this step, you need two files in your data folder, train.record and test.record, as its final output.
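To produce those records, run the script above once per split (assuming you saved it as generate_tfrecord.py — a filename I'm choosing for illustration):

$ python generate_tfrecord.py --csv_input=data/train.csv --output_path=data/train.record
$ python generate_tfrecord.py --csv_input=data/test.csv --output_path=data/test.record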

Step 2 - Creating and Training your model:

Creating a convolutional net from scratch, feeding it your data, and training it is a tedious process, and we are unlikely to achieve good accuracy with a net we build on our own. So we are going to use a pre-trained Google model called ssd_mobilenet_v1_coco, and use its model and config file in our project.

Link to download the files:

Create a training folder in your project and move the config file into it. Then create another file in the training folder called object_label.pbtxt, which defines the labels of the classes when testing the model; the name values here must match the class strings used in class_text_to_int above. The object_label file should contain:

item
{
  id: 1
  name: 'steve jobs'
}
item
{
  id: 2
  name: 'elon musk'
}

When you extract the ssd_mobilenet archive, you get all the pre-trained model files. Now we have to make some changes to the config file. In the config below I have already made these changes; if you want to adapt it for your own images, follow this step.

ssd {
  num_classes: 2
  box_coder {
    faster_rcnn_box_coder {
      y_scale: 10.0
      x_scale: 10.0
      height_scale: 5.0
      width_scale: 5.0
    }
  }
}
train_config: {
  batch_size: 15 #Change the Batch size
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.001
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 300 steps for this
  # example (the original pets config used 200K steps, which was empirically
  # found to be sufficient for the pets dataset). This effectively bypasses
  # the learning rate schedule (the learning rate will never decay).
  # Remove the below line to train indefinitely.
  num_steps: 300 # Number of steps to train
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record" #path of our train record
  }
  label_map_path: "training/object_label.pbtxt"
}
 
eval_config: {
  num_examples: 2000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}
 
eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/test.record" #path of our test record
  }
  label_map_path: "training/object_label.pbtxt"
  shuffle: false
  num_readers: 1
}

Great! Now we have made all the configuration for the project. Next we have to download the TensorFlow Object Detection API, as we need only its object detection models. I have downloaded it and made it available at this link. Now extract the models zip file and store it in your project folder.

Run:

$ python setup.py install

Now you get all the required properties installed to run the API in your system.

Tensorflow Object Detection API

Note: You should be in your Python virtual environment while executing these commands.

$ cd "PATH TO THE MODELS FOLDER"
$ sudo apt-get install protobuf-compiler python-pil python-lxml
$ sudo pip install pillow
$ sudo pip install lxml
$ sudo pip install jupyter
$ sudo pip install matplotlib
$ protoc object_detection/protos/*.proto --python_out=.
$ export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

$ python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config

where --logtostderr tells the script to write log output to stderr while training.
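While training runs, you can optionally watch the loss curves with TensorBoard (assuming TensorBoard came with your TensorFlow install):

$ tensorboard --logdir=training/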

Step 3 - Testing and Exporting the model:

Now we have created a model that detects Steve or Elon in an image, but we haven't seen its output yet. Here comes the testing. Before testing, we should export an inference graph.

python export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path training/ssd_mobilenet_v1_pets.config \
    --trained_checkpoint_prefix training/model.ckpt-10856 \
    --output_directory steve_elon

where,

--trained_checkpoint_prefix is the latest checkpoint in the training folder

--output_directory defines the directory where the inference graph should be saved
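To find the latest checkpoint number for --trained_checkpoint_prefix, list the checkpoint files and pick the highest step count (the 10856 above is specific to my training run):

$ ls training/ | grep model.ckpt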

Cool, at last we have created a working model. Now our job is to deploy it in an Android app.

Last Step - Deploying in Android:

Explanations:

A few important pointers that we should know:

Procedures:

# Uncomment and update the paths in these entries to build the Android demo.
#android_sdk_repository(
#    name = "androidsdk",
#    api_level = 23,
#    build_tools_version = "25.0.1",
#    # Replace with path to Android SDK on your system
#    path = "<PATH_TO_SDK>",
#)
#
#android_ndk_repository(
#    name = "androidndk",
#    path = "<PATH_TO_NDK>",
#    api_level = 14)

Example: For SDK,

android_sdk_repository(
    name = "androidsdk",
    api_level = 23,
    build_tools_version = "27.0.1",
    # Replace with path to Android SDK on your system
    path = "/Users/robinreni/Android/SDK",
)

For NDK,

android_ndk_repository(
    name = "androidndk",
    path = "/Users/robinreni/Downloads/android-ndk-r15/",
    api_level = 15,
)

Detection results: Unknown, Steve Jobs, Elon Musk

$ bazel build -c opt //tensorflow/contrib/android:libtensorflow_inference.so \
    --crosstool_top=//external:android/crosstool \
    --host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
    --cpu=armeabi-v7a

The built library will be at bazel-bin/tensorflow/contrib/android/libtensorflow_inference.so. Move the libtensorflow_inference.so file to the temp_folder.

$ bazel build //tensorflow/contrib/android:android_tensorflow_inference_java

The built jar will be at bazel-bin/tensorflow/contrib/android/libandroid_tensorflow_inference_java.jar. Move the libandroid_tensorflow_inference_java.jar file to the temp_folder.

Now the .so file is built for your project. Let's change some configurations.
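As a sketch of what those configuration changes usually involve (the destination paths are assumptions based on the standard TensorFlow Android demo layout, and your_app is a placeholder for your project):

$ cp temp_folder/libtensorflow_inference.so your_app/app/src/main/jniLibs/armeabi-v7a/
$ cp temp_folder/libandroid_tensorflow_inference_java.jar your_app/app/libs/
$ cp steve_elon/frozen_inference_graph.pb your_app/app/src/main/assets/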

FINAL OUTPUT:

https://youtu.be/dhGU4acR3m0

Finally, you should get a result like this. If not, check your code or your model.

As soon as this feature hits production, start developing this cool stuff for your mobile. If you're stuck at any point and need help, comment in the section below and we'll get back to you. Happy Coding! 🙂