Data Engineer - GCP/BigQuery
We Make Change
Software Engineering, Data Science
London, UK
Volunteer for a startup bringing financial inclusion to sub-Saharan African farmers! 🌾📱
Hiveonline enables unbanked smallholder farmers in sub-Saharan Africa to access credit, markets, and financial services through mobile technology to build lasting economic resilience.
Millions of smallholder farmers in sub-Saharan Africa form the backbone of the agricultural economy yet remain entirely unbanked, lacking access to credit, formal markets, or digital financial records. This economic isolation severely limits their ability to invest in their land, grow their businesses, or escape subsistence poverty. Traditional digital banking solutions fail to reach these rural communities because of systemic barriers: low digital literacy, unreliable internet connectivity, and low personal phone ownership, a challenge that disproportionately excludes women farmers.
Hiveonline addresses this financial exclusion by deploying a suite of mobile-based digital tools—including myCoop.online, VSLA.online, and e-Vouchers—built on a collaborative B2B model that partners with NGOs, financial institutions, and agribusinesses. By leveraging blockchain technology, the platform generates secure, transparent, and tamper-proof financial records that establish a reliable credit history for rural workers. Crucially, Hiveonline uses a community-based infrastructure, allowing farmers to safely access credit, critical agricultural inputs, and new markets even if they do not own a personal smartphone. This inclusive design bypasses traditional technology barriers, directly increasing farmer incomes and empowering underserved communities to thrive in the modern digital economy.
Role (Volunteer, unpaid): [To add]
Role Description
Key Responsibilities & Estimated Effort
As our Volunteer Data Engineer, you will work closely with our core engineering team to lay the foundation of our new analytics pipeline. We have scoped this project into distinct phases:
- Infrastructure Provisioning (~5 Hours): Set up Google Cloud Pub/Sub topics and BigQuery datasets to handle real-time event ingestion.
- Medallion Architecture Build (~25 Hours):
  - Configure a "Smart Bronze" BigQuery layer to capture raw, unstructured JSON event payloads.
  - Develop Silver layer transformations (using SQL/dbt) to parse JSON trees and enforce strict data types.
  - Build Gold layer data contracts (Fact and Dimension tables) designed specifically for BI consumption.
- Historical Data Migration / ETL (~35 Hours): Develop and execute an idempotent backfill script (Python/Node) to extract legacy relational data. You will orchestrate a multi-pass migration sequence (Groups ➔ Agents ➔ Users ➔ Transactions/Engagements) to safely populate the new event stream without violating relational dependencies.
- Backend Event Integration (~20 Hours): Collaborate with our backend developers to modify existing application functions, ensuring core domain events are correctly formatted as JSON and emitted to our Pub/Sub event bus.
- BI Integration (~15 Hours): Connect and optimise the Gold layer tables for consumption in Apache Superset, ensuring Row-Level Security is properly implemented for multi-tenant client reporting.
Time Commitment
- Total Estimated Hours: 100 hours.
- Suggested Schedule: 8–12 hours per week over an 8 to 12-week period (flexible based on your availability).
Ideal Skills & Experience
- Strong experience with Google Cloud Platform, specifically BigQuery and Pub/Sub.
- Proficiency in modern data warehousing patterns, specifically Medallion Architecture (Bronze, Silver, Gold).
- Advanced SQL skills (experience with dbt is a major plus).
- Experience writing robust ETL/migration scripts with state management (watermarking/cursors) for safe, restartable data extraction.
- Experience connecting and optimising data for modern BI tools (Apache Superset preferred).
- Backend development experience (Node.js/Python/etc.) to assist with emitting API events.
What We Provide
You will not be starting from scratch. We have fully documented Gold Data Contracts, established JSON event schemas, a clear architectural roadmap, and a defined sequence strategy for the historical data migration. You will have a clear mandate, supportive stakeholders, and the opportunity to build a modern, high-performance data stack that creates real-world social impact.
Time Commitment: Volunteer 7-9 hours per week for 3-5 months remotely 💻
If you want to make change happen, apply to volunteer with Hiveonline now!