Meet Spotter, the Physical Therapist AI

Problem

When I started lifting weights at the start of this year and chatting with other amateur gym-goers, a pattern emerged. Many people spoke about injuring themselves because of bad posture. One friend took a month to recover because he arched his back slightly during a deadlift. It's not only lifting, of course. Form also matters in yoga, treadmill running, and plenty of other activities.

Why do people hurt themselves? There are at least two reasons:

  1. Education: Exercisers don't know proper form.
  2. Awareness: Exercisers lose focus. It's 6:30am. They haven't had breakfast. Their minds are somewhere else.

Solution

I made Spotter to mitigate the awareness problem. It's a virtual assistant that offers tips for proper exercise posture.

Screen capture:

Using your device's camera, you or your trainer demonstrate "good" and "bad" form for a particular movement. Spotter will tell you "Be careful" when your posture seems incorrect or dangerous.

The web app is optimized for privacy and offline use. Spotter doesn't transmit video or audio. All the processing happens on your phone or laptop. Because the advice is based on examples you provide, it works with most body types and exercises.

You can try it here, but keep in mind a few disclaimers. Spotter was a long weekend side project. Please use it at your own risk. It is not intended to provide medical advice or replace a human physical therapist.

Challenges

Two technical challenges of this project might interest you.

1. Creating a serverless, offline web app

Starting out, I wanted Spotter to have the responsive feel of a native app. Also, for privacy reasons (which I'll explore shortly), I knew I wasn't going to do any heavy lifting on the server side. A serverless architecture seemed like a good fit for both goals.

After developing a single-page app (SPA) with Webpack and React, I deployed both a static index.html file and the Webpack "bundle" to S3. I used a service worker to cache the static assets so that you can access them offline.
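To make the caching idea concrete, here is a minimal sketch of a cache-first lookup, one common strategy for serving static assets offline. The post doesn't say which strategy Spotter actually uses, so treat this as an assumption. The `AssetCache` interface, `MemoryCache` class, and `cacheFirst` function are hypothetical names for illustration; `AssetCache` loosely mirrors the async `match`/`put` shape of the browser Cache API but is keyed by plain URL strings so the example runs anywhere.

```typescript
// Hypothetical stand-in for the browser Cache API (async match/put),
// keyed by URL string so the sketch is self-contained.
interface AssetCache<T> {
  match(url: string): Promise<T | undefined>;
  put(url: string, value: T): Promise<void>;
}

type Fetcher<T> = (url: string) => Promise<T>;

// Cache-first: answer from the cache when possible; otherwise go to the
// network once and remember the result for later (possibly offline) visits.
async function cacheFirst<T>(
  url: string,
  cache: AssetCache<T>,
  fetchFromNetwork: Fetcher<T>
): Promise<T> {
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return cached;
  }
  const fresh = await fetchFromNetwork(url);
  await cache.put(url, fresh);
  return fresh;
}

// In-memory cache used only to demonstrate the logic outside a browser.
class MemoryCache<T> implements AssetCache<T> {
  private store = new Map<string, T>();

  async match(url: string): Promise<T | undefined> {
    return this.store.get(url);
  }

  async put(url: string, value: T): Promise<void> {
    this.store.set(url, value);
  }
}
```

In a real service worker, the same logic would sit inside a `fetch` event handler, with the browser's own cache storage in place of `MemoryCache`.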

This serverless, offline design allows for near-instant loading, and serving files from S3 is both fast and inexpensive. At the moment I don't need to integrate with any APIs. If the need arises, I'll probably use Lambda.

2. Efficiently classifying images in a mobile browser

For a gentle introduction to machine learning in the browser, I recommend checking out Google's Teachable Machine. The project shows how easy it is to define "classes" (for example, Spotter's "correct" and "incorrect" postures) and classify webcam images.

I'll try to explain how the classification works at a high level. (If you're a real data scientist, feel free to jump in and correct me.) The Teachable Machine example uses deeplearn.js: a hardware-accelerated machine learning library for the web.

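More specifically, it uses a K-nearest neighbors (KNN) algorithm with transfer learning. Transfer learning here means feeding images to a pre-trained neural net and reusing what it has already learned, rather than training a new network. The KNN step then compares each new image against your examples. This makes it feasible to classify images in a browser, whereas training from scratch would make your computer burst into tears.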
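As a rough illustration of the KNN half of that idea, here is a small sketch that classifies a feature vector by majority vote among its k closest labeled examples. It assumes the pre-trained network has already turned each image into a numeric feature vector; that step is not shown. The `Example` type and the `squaredDistance` and `classifyKnn` functions are hypothetical names, and the distance metric (squared Euclidean) is my choice for simplicity, not necessarily what deeplearn.js uses.

```typescript
// A labeled training example. `features` is assumed to be the output of a
// pre-trained network for one image (the transfer-learning step, not shown).
type Example = { label: string; features: number[] };

// Squared Euclidean distance between two equal-length vectors.
function squaredDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("feature vectors must have equal length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Return the most common label among the k examples closest to `query`.
// Ties go to the label that appears first among the nearest neighbors.
function classifyKnn(examples: Example[], query: number[], k: number): string {
  if (examples.length === 0) {
    throw new Error("no training examples");
  }
  const nearest = examples
    .map((ex) => ({ label: ex.label, dist: squaredDistance(ex.features, query) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, Math.min(k, examples.length));

  const votes = new Map<string, number>();
  nearest.forEach((n) => votes.set(n.label, (votes.get(n.label) || 0) + 1));

  let best = nearest[0].label;
  let bestVotes = 0;
  votes.forEach((count, label) => {
    if (count > bestVotes) {
      best = label;
      bestVotes = count;
    }
  });
  return best;
}
```

Because KNN only stores examples and compares distances, adding a new "correct" or "incorrect" example requires no retraining of the network itself, which is what keeps this approach workable in a browser.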

The Teachable Machine example code couples view, model and presentation logic in the interest of keeping things simple. My first step was to abstract model-related code into a standalone class:

Integrating the ImageClassifier with the React app was straightforward. Check out the TrainingSession component to see how I'm calling it.

The tricky bit was optimizing for mobile. While the classifier doesn't need to transmit video or audio, it asynchronously downloads weights of the pre-trained model. Meanwhile, it uses a hefty chunk of GPU memory. These issues delayed Time to Interactive (TTI) by 23 seconds over slow 3G.

What gives?

Initially, I used this algorithm for feeding data to the KNN:

  1. Request an animation frame in the browser
  2. Extract pixel data from a video element
  3. Push the training data to the classifier
  4. Back to Step 1

There are two problems with this approach:

  • It waits for the classifier to download weights before it starts accepting "example" images. Over slow 3G, the "Training AI" progress bar advanced at a snail's pace.
  • Adding training data to the classifier is expensive. Doing this on each animation frame drags down the main thread.

I improved responsiveness by tweaking the algorithm a bit:

  1. Start initializing the classifier in the background. Don't wait for it to finish.
  2. On each animation frame, extract pixel data from the video and push it to a pixelData stack.
  3. Whenever the classifier is done loading, begin processing the stack. Pop n items from pixelData every k seconds and pass them to KNNImageClassifier.addImage.

With this approach, there's no loading from the user's perspective. She can complete both training steps while the classifier does its magic in the background.
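The buffering scheme described above can be sketched roughly as follows. This is not Spotter's actual code: `TrainingBuffer` and `TrainableClassifier` are hypothetical names, and the interface only assumes the classifier exposes an `addImage(frame, classIndex)` method. Frames are collected immediately, and a timer later drains them in small batches once the classifier is ready.

```typescript
// Hypothetical stand-in for the real classifier; only addImage is assumed.
interface TrainableClassifier<Frame> {
  addImage(frame: Frame, classIndex: number): void;
}

interface PendingExample<Frame> {
  frame: Frame;
  classIndex: number;
}

class TrainingBuffer<Frame> {
  private pending: PendingExample<Frame>[] = [];
  private classifier: TrainableClassifier<Frame> | null = null;

  // Called on each animation frame. Cheap: it never waits for the classifier.
  push(frame: Frame, classIndex: number): void {
    this.pending.push({ frame, classIndex });
  }

  // Called once the classifier has finished loading in the background.
  attach(classifier: TrainableClassifier<Frame>): void {
    this.classifier = classifier;
  }

  // Called on a timer. Pops up to n buffered examples (most recent first,
  // since this is a stack) and hands them to the classifier.
  // Returns how many examples were processed.
  drain(n: number): number {
    if (this.classifier === null) {
      return 0; // Still loading; keep buffering.
    }
    let processed = 0;
    while (processed < n && this.pending.length > 0) {
      const example = this.pending.pop()!;
      this.classifier.addImage(example.frame, example.classIndex);
      processed++;
    }
    return processed;
  }

  size(): number {
    return this.pending.length;
  }
}

// Possible wiring in a browser (n and k are tuning parameters):
//   setInterval(() => buffer.drain(n), k * 1000);
```

Keeping `push` trivial is the point of the design: the per-frame work stays small, and the expensive `addImage` calls are spread out over time instead of competing with rendering on every frame.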

Summary

Using Spotter along with a human trainer might help some exercisers focus on proper form. Open-source libraries like deeplearn.js make it surprisingly easy to classify images on the web, though performance can be tricky. A serverless architecture with an offline classifier can be a good solution when privacy is paramount.

Want to contribute? Head over to GitHub. I have lots of ideas for improvement, and I'd love to hear yours.

To reiterate the disclaimers above: Spotter was a long weekend side project. Please use it at your own risk. It is not intended to provide medical advice or replace a human physical therapist or trainer.

Cody Romano

Product engineer at Airbnb. This is my personal website for side projects & ramblings.
