2012 marked a breakthrough for Artificial Intelligence (AI), in particular for the field of Machine Learning (ML), which deals with algorithms that improve automatically by learning from data. Back then, Geoffrey Hinton and his team at the University of Toronto managed to design and efficiently implement an artificial neural network on a set of graphics cards of the kind normally used to run the latest and greatest video games. For the first time, a neural network of this kind was fast enough to be trained on millions of images. As a result, their method clearly beat the state of the art in object classification, almost halving the error rate of the previous best result. It was the beginning of the success story of Deep Learning, a branch of Machine Learning exploring Deep Neural Networks (DNNs). Thanks to their deep architecture with many stacked layers of artificial neurons, DNNs can learn from far more data than was previously possible with other Machine Learning methods.
From the Cloud to the Edge
At Captural, we use Deep Learning for Computer Vision in order to build the next-generation photo book app. Computer Vision is the discipline that deals with how computers can gain high-level understanding from digital images or videos. It is one of the areas that has profited, and still profits, most from the Deep Learning revolution. Many techniques such as face, object and scene recognition as well as image segmentation have improved greatly in accuracy and execution speed over the past few years. In addition, exciting new applications have emerged, such as transferring the style of one image to another or synthesizing photo-realistic images.
At first, Deep Learning algorithms were mainly deployed to the cloud, because of the large amount of compute resources they typically require. However, nowadays many smartphones and tablets come with dedicated hardware for Machine Learning. In addition, iOS and Android both offer powerful software development kits that are optimized for these ML chips and greatly simplify mobile ML. Although most models are still trained in the cloud or on powerful desktop machines, once trained, the aforementioned advancements make it easy to run model inference on mobile devices. The practice of bringing the computation closer to where its results are needed is called Edge Computing or Edge ML for this particular case.
Why Choose ML at the Edge
We are working on powerful ML capabilities for the Captural photo book app. Features that are already in the app or will soon become available include:
- Smart search filters that allow you to find photos based on their contents (objects, scenes, faces, places).
- Smart content-aware cropping in order to create beautiful image compositions.
- Auto-detection of photo book themes given the photos a user selected.
- Identification of similar and duplicate images.
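To give a flavor of how the last feature can work entirely on-device, here is a minimal sketch of a difference hash (dHash), a common technique for finding near-duplicate images. Each image is first downscaled to a tiny grayscale grid (the downscaling step is omitted here); the hash then encodes which of each pair of neighboring pixels is brighter, and images whose hashes differ in only a few bits are likely duplicates. This is a generic illustration of the technique, not necessarily the exact algorithm used in the Captural app.

```python
def dhash(gray):
    """Difference hash of a small grayscale grid (rows of pixel values).

    For an n x (n+1) grid this yields an (n*n)-bit integer: one bit per
    left/right neighbor comparison.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Toy example: two slightly different 8x9 "images" and one unrelated one.
img_a = [[(r * 9 + c) % 256 for c in range(9)] for r in range(8)]
img_b = [row[:] for row in img_a]
img_b[0][0] += 3  # small perturbation: still a near-duplicate
img_c = [[(255 - r * c) % 256 for c in range(9)] for r in range(8)]

h_a, h_b, h_c = dhash(img_a), dhash(img_b), dhash(img_c)
print(hamming(h_a, h_b))  # small distance: likely duplicates
print(hamming(h_a, h_c))  # large distance: different images
```

In practice one would downscale with a proper image library, and a small Hamming-distance threshold (e.g. a handful of bits) decides whether two photos count as duplicates.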
Executing inference for the required ML models directly on-device leads to lower latency than running it in the cloud, because no data has to be sent back and forth. Furthermore, our app becomes more reliable, since most of the user experience works without internet connectivity. Future features and use cases we are excited about will also profit from the very low latency. Some of them, for example Augmented Reality, are simply not feasible with inference running in the cloud.
In addition to low latency and offline usage, there is another advantage of Edge ML we believe is crucial: privacy. If user data never leaves the device, it cannot be leaked or misused on a server. Let’s look at this in more detail.
The Privacy Threat
For big tech companies, which make most of their revenue selling advertising space on the internet, personal data is very valuable: it allows them to learn a lot about their users’ preferences and interests. Although companies such as Google and Facebook have improved their privacy policies and nowadays inform users more transparently about which data is collected and how it is used, what exactly happens inside the big data machinery that processes all this information is complex and mostly hidden from the public eye. Even if customers agree with how big tech companies use their data, wherever massive amounts of data are collected and stored, there is always a risk that some of it will be lost or, even worse, fall into the wrong hands. These companies do apply very high security standards to protect customer data. Nevertheless, on many occasions, missing or failing internal processes and human errors have led to massive data breaches, the most striking one perhaps being the scandal involving Facebook and Cambridge Analytica.
Photos are very sensitive data, because they often capture private moments and contain a lot of personal information. For example, it is very easy to extract the metadata embedded in the photos we take on our smartphones. This metadata often includes the GPS coordinates of the place where a photo was taken and almost always the exact point in time. With powerful ML algorithms and huge server farms dedicated to big data analysis, it is possible to recognize who is in a picture, which environment people are in and which activity they are performing. With sentiment analysis algorithms, it is possible to predict the mood a person was in when the picture was taken. Similarly, sentiments can be extracted from voice recordings by analyzing the tone and phrasing of speech. From videos, even respiratory patterns and heart rate can be extracted. By analyzing all of this personal data over time, it becomes possible to predict whether someone is happy, depressed or anxious, to track their state of mental health, and much more.
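To illustrate how little effort it takes to turn photo metadata into a location: EXIF stores GPS coordinates as degree/minute/second values plus a hemisphere reference, and converting them to the decimal degrees you can paste into any map service is a one-liner. The sketch below assumes the values have already been read from a photo's EXIF GPS tags (e.g. with an image library); the coordinates shown are made up for illustration.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds plus hemisphere
    reference ('N', 'S', 'E', 'W') to signed decimal degrees."""
    decimal = degrees + minutes / 60 + seconds / 3600
    return -decimal if ref in ("S", "W") else decimal


# Hypothetical values as they might appear in a photo's EXIF GPS tags.
lat = dms_to_decimal(47, 22, 40.8, "N")
lon = dms_to_decimal(8, 32, 31.2, "E")
print(round(lat, 3), round(lon, 3))  # prints: 47.378 8.542
```

Combined with the timestamp that almost every photo carries, a handful of such records is enough to reconstruct where someone lives, works and travels.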
Captural Respects Your Privacy
At Captural, we are convinced that users should have full control over their personal data. It must be very easy for them to understand exactly which data is used by our apps as well as why and how it is processed. Let’s look at how we handle data privacy for our photo book app in more detail:
Since today’s powerful smartphones make it possible to run all our ML algorithms on-device, we see no reason to analyze user photos in the cloud. We can provide a great, individual user experience based on AI insights gained directly on the device. We only store the data in the cloud that is necessary for printing and fulfilling orders. Moreover, we apply strict security practices and delete all photos automatically after the product is shipped. Removing the photos from our servers is very much in our own interest, since it helps us avoid the risk of a data breach. When photos are removed from our servers, the photo book projects are of course still available in the app, and another printed copy can be ordered anytime.
If a user decides to use our voice memo feature to enrich a photo book with audible memories that can be shared with friends and family, it is also necessary to store the corresponding audio clips and image thumbnails in the cloud. We offer users a simple option to delete this data on our end at their convenience. Last but not least, our goal is not to monetize user data. We never share any user data with third parties. We use ML solely to create a great product and user experience.
Our models are not trained on our customers’ photos, but on data that was collected separately and explicitly for this purpose. For evaluation of the models, we use our own photo collections. In addition, to tailor the app experience even more to users’ individual needs, it would be possible to enrich and fine-tune an ML model developed in our lab with a user’s own pictures directly on their device. This is an interesting topic we might explore in the future!
Give it a Try
In summary, Captural makes use of powerful Deep Learning and Computer Vision algorithms that run directly on your device and help you find and edit your best pictures. We make it very quick and easy for you to create beautiful photo books, while keeping your personal photos private and safe. Try it out for yourself by downloading the Captural app from the iOS App Store and let us know what you think.
Author: Roman Frigg, Principal Machine Learning & Computer Vision Engineer at Captural