We have been working with ExecuTorch to try to run ML models on the phone itself. This has been a bit challenging, as it means deviating from our single .wasm file method. Instead, we need to upload and run bundles containing:

  • WASM file
  • model.pte
  • manifest.json
  • labels.json (if we want bounding boxes to show class and not just ID)
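As a rough illustration of the bundle layout above, a pre-upload sanity check could look something like the sketch below. The helper names (`bundle_problems`, `has_class_names`) are hypothetical, and since the list does not name the WASM file, the check simply accepts any file ending in `.wasm`.

```python
# Files the list above treats as mandatory (besides the WASM file itself).
REQUIRED_FILES = ("model.pte", "manifest.json")


def bundle_problems(filenames):
    """Return a list of problems with a bundle's file names (empty if OK).

    Hypothetical helper: it only looks at names, not at file contents.
    """
    names = set(filenames)
    problems = []
    # The WASM file name is not specified, so accept any *.wasm file.
    if not any(name.endswith(".wasm") for name in names):
        problems.append("no .wasm file")
    for required in REQUIRED_FILES:
        if required not in names:
            problems.append(f"missing {required}")
    return problems


def has_class_names(filenames):
    """labels.json is optional; without it boxes can only show class IDs."""
    return "labels.json" in set(filenames)
```

A bundle without labels.json would still pass `bundle_problems`, but `has_class_names` would report that only IDs are available.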

For this, we are specifically using an SSDLite-MobileNetV3-Large checkpoint from torchvision. This is the smallest object detection model torchvision offers, and it yields decent results if we lower the confidence threshold.
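To make the confidence-threshold point concrete, here is a minimal sketch of the post-processing step in plain Python. Everything in it is an assumption for illustration: the detection dictionaries, the `filter_detections` name, the 0.3 default threshold, and the idea that labels.json maps class IDs (as strings) to names. None of these are taken from our actual pipeline.

```python
def filter_detections(detections, threshold=0.3, labels=None):
    """Keep detections scoring at least `threshold` and attach a display name.

    detections: list of dicts with "class_id", "score" and "box" keys
                (an assumed format, not the real model output layout).
    labels:     optional dict mapping str(class_id) -> class name, as one
                might load from labels.json; falls back to the raw ID.
    """
    labels = labels or {}
    kept = []
    for det in detections:
        if det["score"] < threshold:
            continue  # below the confidence threshold, drop it
        class_id = str(det["class_id"])
        # Copy the detection and add a human-readable name (or the ID).
        kept.append({**det, "name": labels.get(class_id, class_id)})
    return kept
```

Lowering `threshold` lets more low-scoring boxes through, which trades extra false positives for fewer missed objects.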

Notably, we are using the XNNPACK backend rather than the CoreML backend, which would theoretically be faster on an iPhone. This is because CoreML does not support dynamic graphs: the model backbone itself works fine, but it can’t do NMS (non-maximum suppression) to suppress duplicate bounding boxes. However, Thomason did some engineering to optimize the image passing and inference pipelines, so we don’t see much of a noticeable drop in framerate.
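For readers unfamiliar with NMS, the sketch below shows a simplified greedy, class-agnostic version in plain Python. It is not the implementation used in the model (torchvision has its own), and the function names and the 0.5 IoU default are illustrative assumptions. It does show why the step is awkward for a static graph: the number of boxes kept depends on the input.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Visit boxes from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box
        # that has already been kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

The length of the returned list varies from frame to frame, which is the data-dependent output shape that a fixed-shape graph cannot express directly.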