How birdSense Works

Read about how I built a AI powered bird classification system for my backyard. Built using a Raspberry Pi, Swin transformer model, aws cloud infrastructure, and this Next.js web application.

Table of Contents

Project Introduction

I decided I wanted to build BirdSense to help my dad identify all the different specicies of birds that visit our bird feeders in our backyard. I also wanted to build something with machine learning and computer vision and those things interst me and I wanted to learn more about them. what begain as simple curioisity quickly evolved into this fully fleged project, invloving reading research papers, training my own neural network, using a raspberry pi and through that learning about linux, and desinigng a website. I had no idea how challanging but also how rewarding it would be. I learned a emense amount about machine learning, computer vision, system architecture, and cloud infasturture. The end result is more than anything I could have imagined I could have built. The model achieves roughly 92% accuracy on over 200 species of birds.

Project Goals

  • Real-time bird species identification using ML
  • Scalable cloud architecture
  • user friendly web application to display bird classifications
BirdSense Home Page

BirdSense Home page

System Architecture

The birdSense architecture implements a hybrid edge-cloud model, optimizing for both real-time performance and scalable processing. The system uses a raspberry pi, AWS Lambda for serverless functions, MongoDB Atlas for data storage, and Next.js for a responsive web experience.

Edge Processing Layer

  • • Raspberry Pi with camera module
  • • YOlO11 for detection
  • • SwinT model for classification
  • • norfair object tracking
  • • upload via API

Cloud Processing Layer

  • • AWS Lambda serverless functions
  • • MongoDB Atlas
  • • Authentication
System Architecture Diagram

Data flow from edge device through AWS Lambda to MongoDB Atlas and Next.js frontend

Technologies Used

Raspberry Pi 5PythonSwin TransformerYOLO11AWS LambdaS3 bucketMongoDB AtlasNext.jsTypeScriptOpenCVPyTorch

Hardware Setup

The edge computing component uses a Raspberry Pi 5 as the primary processing unit. A rasbperry pi camera modeule 3 is used for the camrea. Designed 3D printed housing for raspberry pi and mount for camera (waiting on 3d-printer access to print). The hardware setup focuses on reliability, performance, and weather resistance.

Hardware Components

Raspberry Pi 5 (16GB RAM)

memory for ML operations

Raspberry pi Camera module 3

12MP sensor with autofocus

Enclosure

housing and camera mount with thermal management

2.4 Ghz WiFi Connectivity

connected to internet even in backyard to allow uploading to backend

Edge Device Hardware Setup

Rapberry Pi with camera In housing

Machine Learning Pipeline

To balance performance and accuract, BirdSense deploys a two-stage pipeline. First YOLOV11 detects and birds in the camera frame and draw a bounding box around them, then the cropped image is fed into the Swin transformer model for classification.

At first I wanted to use a dataset provided by Cornell lab of ornithology to train my classification model. I tried to use their NABirds dataset, this contained classification images for 400 species of birds, and was hierarchical. I quickly realized that that I was way above my head and that I needed a simplier dataset, without a hierarchical structure. I then discovered the dataset CUB-200, a much simplier dataset with 200 species of birds and over 10000 images which I ended up using for training. I split the dataset into 50% training and 50% validation according to metadata provided by the dataset.

To figure out how to train my model I read several research papers that utilized the CUB-200 dataset. I found the paper: Fine-grained Visual Classification with High-temperature Refinement and Background Suppression interesting and decided to use it as a model for my training pipeline. I initialized the SwinT backbone using the ImageNet pretrained weights. The image size was 384x384, I used an effective batch size of 32, 80 training epochs and a LR scheduler with a linear warmup followed by cosine annealing to a LR of zero. I used augmentation of random resize crop, random erasing, and the built in random augmentation method with pyTorch. the validation accuracy was logged into TensorBoard at each epoch and the best model checkpoint was saved whenever accuracy improved. The classification head combined the pre-trained backbone with a BSHead that computes top k-pooling and composite loss to achieve background suppression as specified in the paper. The model was trained using PyTorch on my Desktop computer with a 3070 Super, it took 3 whole days to run all 80 epochs. In the end the classifier achieved a top accuracy of 92% on the validation set.

Model Pipeline

1
Detect birds in frame (YOLOv11)
2
crop by bounding box to isolate bird
3
Classify isolated bird image with SwinT + BShead
4
Upload classification via lambda and display
SwinT training

SwinT training

Model Performance Metrics

92%
Top-1 Accuracy
200
Species Classes

Cloud Infrastructure

I decided to use AWS and MongoDB Atlas for cloud infrastructure because of their scability, free to use tier, and because I wanter to learn how to use them. the cloud infrastructure handles my backend API and datastorage needs. I have a lambda functions that generates presigned URLs for loading and uploading images from my s3 bucket, and I have lambda functions that handle uploading and loading classification data to a from my MonogDB database. I also have a lambda function that handles my authentication. This setup is easy to maintain, extremely scalable, free, and extremely reliable.

AWS Lambda Functions

  • • Generate S3 pre-signed URLs for the Pi to upload cropped bird images
  • • Receive JSON payloads from the Pi containing species, confidence, timestamp, and S3 key
  • • Write images into S3 via the signed URL; hand off metadata to MongoDB Atlas

MongoDB Atlas

  • • Persist each classification record (species, confidence score, timestamp, S3 image key)
  • • User session and device management
  • • Support real-time queries for the Next.js front-end
  • • Index on timestamp and species to power analytics and history lookups