The whole point of this project was that I could watch TV while working on my laptop without having to constantly check if we're on commercial or not. There's not much more reason than that. The only sports it can classify are basketball and football, as well as commercials.
The classifier is built with Python using torch to read the images and train the model. The frontend is React and Next.js with Electron, which is just one big overlay that sits on your screen with a border displaying green (sport is on) or red (commercial).
Project code: https://github.com/marker6275/sports-classifier
The classifier uses a Convolutional Neural Network (CNN) to classify the sport from a single frame. Initially, I trained with 10 epochs and a learning rate of 0.001, which worked quite well. I tried to squeeze out higher accuracy by scaling these parameters up, but the returns really weren't worth an overnight training session.
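The actual architecture lives in the repo, but a minimal torch sketch of a small CNN with the three output classes (basketball, football, commercial) and the initial hyperparameters (10 epochs, lr 0.001) might look like this. The layer sizes and input resolution here are assumptions for illustration, not the real network:

```python
import torch
import torch.nn as nn

class SportsCNN(nn.Module):
    """Small CNN for 3-way classification: basketball, football, commercial.
    Layer sizes are illustrative, not the project's actual architecture."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = SportsCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # initial learning rate
criterion = nn.CrossEntropyLoss()

# one training step on a dummy batch; in practice this sits inside a
# 10-epoch loop over a DataLoader of captured frames
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 3, (4,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```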
For the final product, I set up a webcam and had the program read a frame every second and classify it. It would then display the result on the screen and send it to the frontend via a websocket connection. Since this could all just run locally, it didn't need to be hosted on a server. Additionally, every 5 seconds, the program would save a screenshot to be used later.
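The loop's cadence can be sketched like this. The four callables (`capture_frame`, `classify`, `send_result`, `save_screenshot`) are hypothetical stand-ins for the real webcam, model, websocket, and disk code; only the once-per-second / every-fifth-second structure reflects the description above:

```python
import time

def run_capture_loop(capture_frame, classify, send_result, save_screenshot,
                     ticks: int, sleep=time.sleep):
    """Classify one webcam frame per second; every 5th second, also save a
    screenshot for later use as training data."""
    results = []
    for tick in range(ticks):
        frame = capture_frame()
        label = classify(frame)
        send_result(label)     # pushed to the Electron frontend over a websocket
        results.append(label)
        if tick % 5 == 0:      # every 5 seconds, keep a copy of the frame
            save_screenshot(frame)
        sleep(1)
    return results
```

Passing `sleep` in as a parameter keeps the loop testable without actually waiting in real time.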
I captured every image myself. Initially, I would screen record games on my laptop, using those to train the model. However, there were two flaws with this method.
To address the image background noise issue, I set up a webcam pointed at my TV and took images, as described. But I also used a YOLO classifier to detect the TV in the frame. So, every five seconds, the program would check where the TV was and adjust the screenshot frame accordingly. This way, the images would capture only the TV, with minimal background noise. This actually worked quite well.
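The crop step is simple once the detector has produced a bounding box. Assuming the detector returns an `(x1, y1, x2, y2)` box in pixel coordinates (the format itself is an assumption, since YOLO variants differ), the crop can be done with plain array slicing:

```python
import numpy as np

def crop_to_tv(frame: np.ndarray, box: tuple) -> np.ndarray:
    """Crop a frame to the TV's bounding box (x1, y1, x2, y2), as returned by
    an object detector such as YOLO. Coordinates are clamped to the frame so a
    slightly out-of-bounds detection can't crash the capture loop."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    return frame[y1:y2, x1:x2]
```

In the pipeline described above, the detector only runs every five seconds, so the same box would be reused for the frames in between.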
Since the only two sports I spent time watching were football and basketball, I only trained the model on those two sports. There were plenty of full games on YouTube, so I just recorded those to train with. For commercials, again, there are plenty of commercial compilations on YouTube. My live test was on an NFL game, so the data points are heavier for football and commercials (as you can see below).
The frontend is just one big border around your screen. I built it with Electron since it's pretty lightweight and easy to use (I don't think I used the full power of Electron). This overlay lets you use your laptop normally and doesn't get in the way of anything. It changes color depending on the classification result: green for sport and red for commercial.
On an unrelated note, I think the idea of overlays is super cool and useful. The only places you currently see them are in interview cheating apps and AI notetakers (think Cluely). But I think there's lots of potential for how we can use them.
The model reached pretty high accuracy: around 98% training accuracy and 97% validation accuracy. Commercials were more of an "if it doesn't fit a sport, it's a commercial" situation. Regardless, I was impressed with how well it worked.
One issue I kept running into was false positives for commercials. Determining when the game was on was the easier case: even as humans, we can visually tell when a court or field is on screen versus something else. But my training data had a bit of noise, since broadcasts include shots like close-ups of players' faces, the broadcast booth, and transition screens. These show up in both NBA and NFL footage, and the model would tend to classify them as commercials.
To fix this, I just set a delay before the program updates the classification: it needs 5 straight commercial classifications before switching to commercial, but only 2 straight classifications of a specific sport before switching sports. I gave the sports more leeway because the frontend only makes a binary sport-vs.-commercial decision, which leaves more room for error between the two sports.
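This asymmetric debounce can be captured in a few lines. The class and threshold values below mirror the rule described above (5 in a row for commercial, 2 in a row for a sport); the class name and structure are my own sketch, not the repo's code:

```python
class Debouncer:
    """Only flip the displayed state after enough consecutive identical
    classifications: 5 in a row for 'commercial', 2 in a row for a sport.
    This filters out one-off misclassifications of faces and booth shots."""
    THRESHOLDS = {"commercial": 5, "basketball": 2, "football": 2}

    def __init__(self, initial: str = "commercial"):
        self.state = initial
        self._candidate = initial
        self._streak = 0

    def update(self, label: str) -> str:
        # extend or reset the streak of identical labels
        if label == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = label, 1
        # only commit the new state once the streak clears its threshold
        if label != self.state and self._streak >= self.THRESHOLDS[label]:
            self.state = label
        return self.state
```

With this in place, four stray "commercial" readings during a game don't flip the overlay; the fifth consecutive one does.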
This was pretty cool for me, since it's my first ML project that wasn't a homework assignment. I thought the idea was genuinely fun and could actually showcase my skills. It was also satisfying to mostly understand what was happening; this wasn't an "I'm just gonna copy a tutorial and call it a day" project.