David Imel / Android Authority
Smartphone chipsets have come a long way since the early days of Android. While the vast majority of budget phones were woefully underpowered just a few years ago, today’s mid-range smartphones perform as well as flagship products from one or two years ago.
Now that the average smartphone is more than capable of handling general day-to-day tasks, chipmakers and developers have set bigger goals for themselves. Seen from that angle, it is clear why ancillary technologies such as artificial intelligence and machine learning (ML) are now taking center stage. But what does on-device machine learning actually mean, especially for end users like you and me?
In the past, machine learning tasks required sending data to the cloud for processing. This approach has many drawbacks, ranging from slow response times to privacy concerns and bandwidth limitations. However, modern smartphones can generate predictions completely offline thanks to advancements in chip design and ML research.
To understand the implications of this breakthrough, let’s explore how machine learning has changed the way we use our smartphones every day.
The Birth of Machine Learning on the Device: Improved Photography and Text Predictions
Jimmy Westenberg / Android Authority
The mid-2010s saw an industry-wide race to improve camera image quality year after year. This, in turn, proved to be a key stimulus for the adoption of machine learning. Manufacturers realized that the technology could help bridge the gap between smartphones and dedicated cameras, even though the former had far more modest hardware.
To that end, almost all of the big tech companies set about improving the efficiency of their chips at machine learning-related tasks. By 2017, Qualcomm, Google, Apple, and Huawei had all released SoCs or smartphones with dedicated machine learning accelerators. In the years since, smartphone cameras have improved across the board, particularly in terms of dynamic range, noise reduction, and low-light photography.
More recently, manufacturers such as Samsung and Xiaomi have found new use cases for the technology. Samsung’s Single Take feature, for example, uses machine learning to automatically create a high-quality album from a single 15-second video clip. Xiaomi’s use of the technology, meanwhile, has progressed from simply detecting objects in the camera app to replacing the entire sky if you so desire.
By 2017, almost all of the big tech companies had started improving the efficiency of their chips at machine learning-related tasks.
Many Android OEMs now also use machine learning on the device to automatically tag faces and objects in your smartphone gallery. This is a feature that was previously only offered by cloud-based services such as Google Photos.
Of course, machine learning on smartphones goes far beyond just taking pictures. It’s safe to say that text-related applications of ML have been around just as long, if not longer.
SwiftKey may have been the first to use a neural network for better keyboard predictions, as early as 2015. The company claimed it had trained its model on millions of sentences to better understand the relationships between different words.
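To illustrate the basic idea of learning word relationships from a corpus, here is a toy next-word predictor. SwiftKey’s actual model is a neural network trained on millions of sentences; the bigram frequency counter and example sentences below are invented stand-ins, a minimal sketch rather than anything resembling the real product.

```python
from collections import Counter, defaultdict

class BigramPredictor:
    """Toy next-word predictor: counts which word follows which
    in a training corpus, then suggests the most frequent ones."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, sentences):
        for sentence in sentences:
            words = sentence.lower().split()
            for prev, nxt in zip(words, words[1:]):
                self.counts[prev][nxt] += 1

    def predict(self, word, k=3):
        # Return up to k most frequent follow-up words seen in training.
        return [w for w, _ in self.counts[word.lower()].most_common(k)]

predictor = BigramPredictor()
predictor.train([
    "see you later today",
    "see you tomorrow morning",
    "see you later tonight",
])
print(predictor.predict("you"))  # most likely words after "you"
```

A real keyboard model also weighs longer context and the user’s own vocabulary, which is exactly where neural networks outperform simple frequency counts like this one.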
Another hallmark feature emerged a few years later when Android Wear 2.0 (now Wear OS) gained the ability to predict relevant responses to incoming chat messages. Google later dubbed the feature Smart Reply and brought it to the platform at large with Android 10. You probably take this feature for granted whenever you reply to a message from your phone’s notification shade.
Voice and AR: harder nuts to crack
While on-device machine learning has matured in text prediction and photography, speech recognition and computer vision are two areas that are still seeing significant and impressive improvements every few months.
Take Google’s instant camera translation feature, for example, which overlays a real-time translation of foreign text right onto your live camera feed. While the results aren’t as accurate as those of its online counterpart, the feature is more than usable for travelers on a limited data plan.
High-fidelity body tracking is another futuristic-sounding augmented reality feature that can be achieved with powerful on-device machine learning. Imagine the LG G8’s Air Motion gestures, but infinitely smarter and applied to bigger applications such as workout tracking and sign language interpretation.
When it comes to speech, voice recognition and dictation have been around for well over a decade at this point. However, it wasn’t until 2019 that smartphones could perform them completely offline. For a quick demo of this, check out Google’s Recorder app, which leverages on-device machine learning to automatically transcribe speech in real time. The transcript is stored as editable text and is also searchable, a boon for journalists and students alike.
The same technology also powers Live Caption, an Android 10 (and later) feature that automatically generates closed captions for any media playing on your phone. Besides serving as an accessibility feature, it can come in handy if you’re trying to decipher an audio clip’s contents in a noisy environment.
While these are certainly cool features on their own, there are a number of ways they can evolve in the future as well. Improved speech recognition, for example, could allow faster interactions with virtual assistants, even for those with unusual accents. Although Google Assistant has the ability to process voice commands on the device, this feature is unfortunately exclusive to the Pixel range. Still, it offers a glimpse into the future of this technology.
Personalization: the next frontier for machine learning on the device?
The vast majority of machine learning applications today rely on pre-trained models, which are generated in advance on powerful hardware. Running inference against such a pre-trained model, such as generating a contextual Smart Reply on Android, takes only a few milliseconds.
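To give a sense of why inference is so cheap compared to training, here is a minimal sketch. The single-neuron “model” and its weights are invented for illustration and stand in for a real pre-trained network shipped with an app; prediction is just a short forward pass through fixed numbers.

```python
import math
import time

# Pre-trained weights, baked into the app (illustrative values only).
WEIGHTS = [0.8, -0.3, 0.5]
BIAS = 0.1

def predict(features):
    """Single-neuron forward pass with a sigmoid activation.
    No training happens here: the weights are fixed, so inference
    is just a handful of multiplications and additions."""
    z = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

start = time.perf_counter()
score = predict([1.0, 0.5, 2.0])
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"score={score:.3f} in {elapsed_ms:.4f} ms")
```

A production model runs millions of these operations per prediction, but the principle scales: with the weights already computed, dedicated accelerators can finish the whole pass in milliseconds.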
At present, a single model is trained by the developer and distributed to every phone that needs it. This one-size-fits-all approach, however, does not account for individual users’ preferences. Nor can it be fed new data collected over time. As a result, most models are relatively static, receiving updates only every now and then.
Addressing these issues requires moving the model training process from the cloud to individual smartphones, a tall order given the disparity in performance between the two platforms. Still, it would allow a keyboard app, for example, to tailor its predictions specifically to your typing style. Going further, it might even take other contextual clues into account, such as your relationships with the people you chat with.
Currently, Google’s Gboard uses a mix of on-device and cloud-based training (called federated learning) to improve the quality of predictions for all users. However, this hybrid approach has its limits. For example, Gboard predicts your likely next word rather than entire sentences based on your individual habits and past conversations.
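The core idea behind federated learning, averaging locally computed model updates rather than collecting raw user data, can be sketched in a few lines. The toy model here (a plain list of weights) and its update rule are illustrative assumptions only; Google’s actual implementation adds client sampling, secure aggregation, compression, and much more.

```python
def local_update(weights, user_data, lr=0.1):
    """Each phone nudges the shared weights toward its own data.
    The raw data never leaves the device; only the updated
    weights are sent back."""
    return [w + lr * (x - w) for w, x in zip(weights, user_data)]

def federated_average(global_weights, client_datasets):
    """The server averages the locally updated weights (never the
    raw user data) to produce the next global model."""
    updates = [local_update(global_weights, data) for data in client_datasets]
    n = len(updates)
    return [sum(ws) / n for ws in zip(*updates)]

global_model = [0.0, 0.0]
# Per-device data: in a real deployment this would stay on each phone.
clients = [[1.0, 2.0], [3.0, 4.0], [2.0, 0.0]]
global_model = federated_average(global_model, clients)
print(global_model)
```

The trade-off Gboard runs into follows directly from this structure: the averaged global model improves for everyone, but no single user’s habits dominate it, which is why per-user sentence-level personalization still needs training that stays entirely on the device.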
A still unrealized idea that SwiftKey had been considering for its keyboard since 2015
This kind of individualized training must absolutely happen on-device, as the privacy implications of sending sensitive user data (like keystrokes) to the cloud would be dire. Apple acknowledged as much when it announced Core ML 3 in 2019, which allowed developers to retrain existing models with new data for the first time. Even then, the bulk of the model must initially be trained on powerful hardware.
On Android, this kind of iterative model retraining is best represented by the adaptive brightness feature. Since Android Pie, Google has used machine learning to “observe the interactions a user makes with the screen brightness slider” and retrain a model tailored to each individual’s preferences.
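A hypothetical sketch of that feedback loop: a tiny linear model maps ambient light to a brightness level, and every manual slider correction performs one small step of on-device retraining. All names and numbers below are invented for illustration; Android’s real model is far more sophisticated than a single linear fit.

```python
class BrightnessModel:
    """Toy personalized brightness model, retrained on-device from
    the user's own slider corrections."""

    def __init__(self, slope=0.5, bias=10.0, lr=0.01):
        self.slope, self.bias, self.lr = slope, bias, lr

    def predict(self, ambient_lux):
        """Suggested brightness (arbitrary units) for a light level."""
        return self.slope * ambient_lux + self.bias

    def observe_correction(self, ambient_lux, chosen_brightness):
        """The user moved the slider: nudge the model toward their
        choice with one gradient-descent step on the squared error."""
        error = self.predict(ambient_lux) - chosen_brightness
        self.slope -= self.lr * error * ambient_lux / 100.0  # scaled step
        self.bias -= self.lr * error

model = BrightnessModel()
# The user repeatedly dims the screen at ~50 lux; the model adapts.
for _ in range(200):
    model.observe_correction(50.0, 20.0)
print(round(model.predict(50.0), 1))  # converges toward the user's 20
```

This mirrors why the feature fits on a phone: each update is a few arithmetic operations on sensor data that never leaves the device, which sidesteps both the compute and the privacy problems of cloud training.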
On-device training will continue to evolve in new and exciting ways.
With this feature enabled, Google claims a noticeable improvement in Android’s ability to predict the correct screen brightness within just a week of normal smartphone interaction. I didn’t realize how well this feature worked until I migrated from a Galaxy Note 8 with adaptive brightness to the new LG Wing, which surprisingly includes only the old “auto” brightness logic.
It’s fairly clear why on-device training has so far been limited to a few simple use cases. Besides the obvious compute, battery, and power constraints on smartphones, there aren’t many training techniques or algorithms designed for the purpose.
While that reality won’t change overnight, there are several reasons to be optimistic about the next decade of mobile ML. With tech giants and developers alike focused on ways to improve the user experience and privacy, on-device training will continue to evolve in new and exciting ways. Maybe then we can finally consider our phones smart in every sense of the word.