smart-agriculture-iot-platform@spiio:~/case-study
Agriculture Technology / IoT · 10 months · 2022-2023

Smart Agriculture IoT Platform

@ Spiio — Senior Software Engineer

Processing millions of sensor readings daily to help farmers and facilities optimize plant health through data-driven insights

50M+ Daily Readings
99.5% Data Accuracy
30% Water Savings

$ cat PROBLEM.md

Raw Sensor Data Wasn't Actionable

Spiio's wireless sensors measured soil moisture, light, and temperature across thousands of plants in commercial facilities. But the raw data flood was overwhelming — facilities received millions of readings with no clear guidance on what actions to take. The existing system couldn't scale and had frequent data gaps.

Key Challenges:

  • 🔴 Sensors generated 50M+ readings daily but most were noise without context
  • 🔴 Legacy batch processing created 30-minute delays in alerts
  • 🔴 No data validation — faulty sensors polluted the dataset undetected
  • 🔴 Time-series queries on PostgreSQL were becoming unsustainably slow

$ cat SOLUTION.md

Real-Time Streaming Pipeline with Intelligent Alerting

We rebuilt the entire data pipeline using stream processing principles, implementing real-time anomaly detection and actionable recommendations.

Technical Approach:

1. Kafka-Based Streaming Architecture

Replaced batch processing with real-time streaming. Sensor readings flow through Kafka topics, enabling sub-second processing and multiple parallel consumers.

2. TimescaleDB for Time-Series

Migrated from vanilla PostgreSQL to TimescaleDB, achieving a 100x query performance improvement for time-range aggregations.

3. ML-Powered Anomaly Detection

Trained models to distinguish sensor malfunction from actual plant stress. Faulty sensors are automatically flagged before their readings corrupt analytics.

4. Actionable Alert System

Instead of raw threshold alerts, the system generates specific recommendations, such as "Zone 3 needs watering within 4 hours based on soil moisture trend" (sketched below).
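
A minimal sketch of how a trend-based recommendation like that could be computed; the MoistureReading shape, the 25% dry threshold, and the two-point slope estimate are illustrative assumptions, not the production logic:

from dataclasses import dataclass

@dataclass
class MoistureReading:
    zone: str
    moisture_pct: float
    minutes_ago: float

def hours_until_dry(readings, dry_threshold=25.0):
    # Estimate when the moisture trend crosses the dry threshold.
    # Slope in percentage points per hour, from the two endpoints.
    newest, oldest = readings[0], readings[-1]
    hours_span = (oldest.minutes_ago - newest.minutes_ago) / 60
    slope = (newest.moisture_pct - oldest.moisture_pct) / hours_span
    if slope >= 0:
        return None  # moisture stable or rising: no recommendation
    return (newest.moisture_pct - dry_threshold) / -slope

readings = [MoistureReading("zone-3", 31.0, 0),
            MoistureReading("zone-3", 37.0, 240)]
eta = hours_until_dry(readings)
if eta is not None:
    print(f"Zone 3 needs watering within {eta:.0f} hours "
          "based on soil moisture trend")

Given a zone drifting from 37% to 31% moisture over four hours, this prints the exact recommendation quoted above.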

$ cat tech-stack.json

🚀 Core Technologies

Apache Kafka

Event streaming and message broker

Why: Handles 50M+ daily events with exactly-once semantics and replay capability

TimescaleDB

Time-series data storage

Why: PostgreSQL-compatible with hypertables, automatic partitioning, and 100x faster time-range queries

Python / Faust

Stream processing applications

Why: Python-native Kafka Streams alternative, integrates with ML ecosystem

🔧 Supporting Technologies

InfluxDB · scikit-learn · FastAPI

☁️ Infrastructure

AWS (MSK, RDS, EC2) · Docker / Kubernetes · Terraform

$ cat ARCHITECTURE.md

The system processes sensor data through a streaming pipeline:

Sensors → MQTT Gateway → Kafka → Stream Processors → TimescaleDB
              ↓              ↓           ↓              ↓
          Validation     Topics      Anomaly       Analytics
                        (raw,       Detection      Dashboard
                        enriched)

System Components:

MQTT Gateway

Receives sensor data via MQTT, validates it, and publishes it to Kafka (see the sketch after this component list)

Stream Processors

Faust workers for enrichment, aggregation, and anomaly detection

Alert Engine

Generates actionable recommendations based on patterns

Analytics API

FastAPI service exposing aggregated insights to dashboards
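
As referenced in the MQTT Gateway entry, a minimal sketch of the MQTT-to-Kafka bridge pattern, using a paho-mqtt 1.x-style client and confluent-kafka; the broker addresses, subscription pattern, and drop-on-parse-failure behavior are illustrative assumptions:

import json
import paho.mqtt.client as mqtt
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_message(client, userdata, msg):
    # Parse the MQTT payload and forward valid readings to Kafka
    try:
        reading = json.loads(msg.payload)
    except ValueError:
        return  # malformed payload: drop (or route to a dead-letter topic)
    producer.produce("sensor-raw", key=str(reading.get("sensor_id")),
                     value=json.dumps(reading))
    producer.poll(0)  # serve delivery callbacks

client = mqtt.Client()
client.on_message = on_message
client.connect("mqtt-broker.local", 1883)
client.subscribe("sensors/+/readings")
client.loop_forever()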

$ man implementation-details

Designing the Streaming Pipeline

The core challenge was handling 50M+ daily readings while maintaining real-time responsiveness:

Kafka Topic Design:

  • sensor-raw — Unvalidated sensor readings (high volume)
  • sensor-validated — Passed validation, enriched with metadata
  • sensor-alerts — Threshold violations and anomalies
  • sensor-recommendations — Actionable insights for facilities
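
A sketch of how these four topics might be provisioned through Kafka's admin API; the partition counts and retention settings are illustrative assumptions, sized only to reflect the relative volume of each topic:

from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

topics = [
    # High-volume firehose: many partitions, short retention
    NewTopic("sensor-raw", num_partitions=24,
             config={"retention.ms": str(24 * 3600 * 1000)}),
    NewTopic("sensor-validated", num_partitions=12,
             config={"retention.ms": str(7 * 24 * 3600 * 1000)}),
    # Low-volume, high-value streams: fewer partitions, default retention
    NewTopic("sensor-alerts", num_partitions=3),
    NewTopic("sensor-recommendations", num_partitions=3),
]

for topic, future in admin.create_topics(topics).items():
    future.result()  # raises if creation failed
    print(f"created {topic}")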

Stream Processing Stages:

  1. Validation — Schema validation, range checks, sensor health
  2. Enrichment — Add plant type, zone info, historical context
  3. Aggregation — 5-minute, hourly, and daily rollups
  4. Analysis — Trend detection, anomaly scoring, predictions
import faust

# App and topic wiring; the broker address is illustrative
app = faust.App('sensor-pipeline', broker='kafka://localhost:9092')
sensor_raw_topic = app.topic('sensor-raw')
sensor_invalid = app.topic('sensor-invalid')
sensor_validated = app.topic('sensor-validated')

@app.agent(sensor_raw_topic)
async def process_reading(readings):
    async for reading in readings:
        # Drop readings that fail schema/range/sensor-health checks
        if not validate_reading(reading):
            await sensor_invalid.send(value=reading)
            continue

        # Enrich with plant/zone metadata
        enriched = await enrich_reading(reading)

        # Score for anomalies before publishing downstream
        anomaly_score = await anomaly_detector.score(enriched)
        enriched.anomaly_score = anomaly_score

        await sensor_validated.send(value=enriched)
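
The snippet above leaves validate_reading, enrich_reading, and anomaly_detector undefined. One plausible shape for the detector, using scikit-learn's IsolationForest over simple per-reading features, is sketched below; the feature layout and contamination rate are assumptions, and the feature extraction from an enriched record is elided:

import numpy as np
from sklearn.ensemble import IsolationForest

class AnomalyDetector:
    """Flags readings that look like sensor faults rather than plant stress."""

    def __init__(self, history: np.ndarray):
        # history: one row per past reading, e.g.
        # [moisture, temperature, light, delta_from_previous_reading]
        self.model = IsolationForest(contamination=0.01, random_state=42)
        self.model.fit(history)

    async def score(self, features: np.ndarray) -> float:
        # score_samples is higher for normal points, so negate it:
        # larger returned values mean more anomalous readings
        return float(-self.model.score_samples(features.reshape(1, -1))[0])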

TimescaleDB Optimization

Moving from PostgreSQL to TimescaleDB required careful planning:

Hypertable Design:

  • Partitioned by time (1 week chunks) and sensor_id
  • Compression enabled for data older than 7 days
  • Retention policy: raw data 90 days, aggregates 2 years
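
A sketch of that hypertable setup, driven from Python with psycopg2; the table and column names match the aggregate query below, the chunk interval and policy windows restate the bullets above, and the database DSN and space-partition count are illustrative assumptions:

import psycopg2

ddl = [
    # Hypertable partitioned by time (1-week chunks) and sensor_id
    """SELECT create_hypertable('sensor_readings', 'timestamp',
         partitioning_column => 'sensor_id', number_partitions => 8,
         chunk_time_interval => INTERVAL '1 week');""",
    # Compress chunks once they are older than 7 days
    """ALTER TABLE sensor_readings SET (timescaledb.compress,
         timescaledb.compress_segmentby = 'sensor_id');""",
    "SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');",
    # Drop raw data after 90 days; aggregates are retained separately
    "SELECT add_retention_policy('sensor_readings', INTERVAL '90 days');",
]

with psycopg2.connect("dbname=spiio") as conn, conn.cursor() as cur:
    for statement in ddl:
        cur.execute(statement)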

Continuous Aggregates:

CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 hour', timestamp) AS bucket,
  sensor_id,
  AVG(moisture) AS avg_moisture,
  MIN(moisture) AS min_moisture,
  MAX(moisture) AS max_moisture,
  COUNT(*) AS reading_count
FROM sensor_readings
GROUP BY bucket, sensor_id;

Query Performance:

  • 7-day range query: 45s → 120ms
  • 30-day aggregation: 3min → 800ms
  • Dashboard load: 5s → 200ms
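
To illustrate the dashboard path, a minimal FastAPI endpoint reading from the sensor_hourly aggregate defined above; the route shape, DSN, and asyncpg wiring are assumptions:

import asyncpg
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def startup():
    app.state.pool = await asyncpg.create_pool(dsn="postgresql://localhost/spiio")

@app.get("/sensors/{sensor_id}/hourly")
async def hourly_moisture(sensor_id: int, days: int = 7):
    # Hits the pre-computed continuous aggregate, not the raw hypertable,
    # which is what keeps dashboard queries in the low hundreds of ms
    rows = await app.state.pool.fetch(
        """SELECT bucket, avg_moisture, min_moisture, max_moisture
           FROM sensor_hourly
           WHERE sensor_id = $1 AND bucket > now() - make_interval(days => $2)
           ORDER BY bucket""",
        sensor_id, days)
    return [dict(r) for r in rows]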

$ echo $RESULTS

Real-Time Insights from 50M+ Daily Readings

  • 50M+ daily readings processed with sub-second latency
  • 100x query performance after the TimescaleDB migration
  • 30% water savings for clients through precision irrigation recommendations
  • 99.5% data accuracy after anomaly detection was implemented

Additional Outcomes:

  • Facilities reduced water usage by 30% through data-driven irrigation scheduling
  • Faulty sensors detected within minutes, not days
  • Plant health issues identified 24 hours earlier through trend analysis

$ cat LESSONS_LEARNED.md

IoT Data Quality is Everything

We spent 30% of development time on data validation. Sensors fail in unexpected ways — cold causes drift, water damages circuits. Robust validation saved downstream headaches.

TimescaleDB Continuous Aggregates are Magic

Pre-computing common aggregations (hourly averages, daily min/max) reduced API response times from seconds to milliseconds for dashboard queries.

Domain Expertise Trumps Generic ML

Off-the-shelf anomaly detection failed. Working with agronomists to understand plant biology led to much more effective alerting thresholds.

$ cat README.md

Want Similar Results?

Let's discuss how I can help solve your engineering challenges.