"130,000 reasons why data science can help clean up San Francisco" was a freelance consulting project I did for Rubbish, a San Francisco startup. Rubbish is building a smart trash grabber and companion phone application. As you use the grabber to pick up litter off the curb, it uses your phone to automatically photograph what you've picked up. You then classify the pickup into one of several categories, and the data point gets filed away (with a timestamp and a GPS coordinate) in the Rubbish database. So you're not only cleaning up the street, but also generating actionable point cloud data in the process.
To validate the usefulness of this data, I did an extensive analysis of Rubbish's Polk Street survey zone, a three-block sector on a commercial strip in San Francisco where the group had been doing regular pickups. I had a year's worth of data (130,000 points in all) to work with.
Executing on this project required some tricky geospatial join operations. I built a small special-purpose Python package called streetmapper to help me through this process. I wrote a blog post discussing these problems in detail: "Bringing together building, block, street, and point data".
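To give a rough feel for what a geospatial join looks like, here is a minimal sketch that assigns pickup points to the block polygon containing them. This is not streetmapper's API and not the method used on the project; the function names, block ids, and coordinates are all hypothetical toy values in arbitrary planar units, and the point-in-polygon test is a plain ray-casting check written with the standard library only.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices in order; the last vertex
    is implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_points_to_blocks(points, blocks):
    """Return one block id (or None if no block matches) per point."""
    result = []
    for x, y in points:
        match = None
        for block_id, polygon in blocks.items():
            if point_in_polygon(x, y, polygon):
                match = block_id
                break
        result.append(match)
    return result


# Hypothetical toy data, not real Rubbish data.
blocks = {
    "block_a": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "block_b": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
pickups = [(0.5, 0.5), (1.5, 0.25), (0.2, 0.9), (3.0, 3.0)]

assignments = join_points_to_blocks(pickups, blocks)
# Count pickups per block, skipping points that fell outside every block.
counts = Counter(a for a in assignments if a is not None)
```

A brute-force loop like this is fine for a toy example, but with tens of thousands of points you would normally reach for a spatial index or a library routine such as GeoPandas' `sjoin` instead.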
Overall I learned a ton of interesting stuff working on this engagement. The resulting blog post made the front page of Hacker News.