At one point in my career, I faced a challenging architectural dilemma.
I needed to design a centralized data warehouse that combined IoT GPS data with ERP data inside Databricks. The platform was planned to serve two very different audiences. The first was external users who would consume SaaS analytical products. The second was internal stakeholders who required a churn management dashboard that combined IoT GPS, subscription, and after-sales data stored in ERP systems.
The main challenge here was cost ownership. Databricks is extremely powerful for modern enterprise analytical data products, but it comes with compute costs. The business expectation was clear: Databricks usage should primarily be funded by external SaaS customers, while internal analytics usage should not significantly increase platform costs.
The strategy I implemented was to separate the SQL compute layer while keeping the data layer centralized in Databricks Delta Lake. External SaaS analytical products continued to run on the Databricks SQL compute engine, ensuring high performance, scalability, and premium analytical capabilities for paying customers.
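To give a feel for the external side, here is a minimal sketch of how a SaaS analytical product might query a Databricks SQL warehouse with the databricks-sql-connector Python package. The hostname, HTTP path, token, and table names are placeholders for illustration, not the actual production setup.

```python
# Hypothetical sketch: an external SaaS product querying a Databricks SQL warehouse.
# Hostname, HTTP path, token, and table names are placeholders.
from databricks import sql  # pip install databricks-sql-connector

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123def456",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        # This query runs on Databricks SQL compute, so its cost is
        # attributable to the external SaaS product that issued it.
        cursor.execute("""
            SELECT device_id, avg(speed_kmh) AS avg_speed
            FROM gold.iot_gps_daily
            GROUP BY device_id
            LIMIT 100
        """)
        for row in cursor.fetchall():
            print(row)
```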
For internal analytics workloads, I introduced Trino as an alternative SQL engine. By leveraging Databricks Delta Lake’s open table format, Trino could query the same centralized data without requiring additional Databricks compute resources.
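On the internal side, the idea is that Trino's Delta Lake connector reads the same tables directly from cloud storage through a shared metastore, so no Databricks SQL warehouse is involved. Below is a minimal sketch using the trino Python client; the host, catalog, schema, and table names are assumptions for illustration.

```python
# Hypothetical sketch: the internal churn dashboard querying the shared Delta tables
# through Trino's Delta Lake connector (e.g. a catalog named "delta" configured with
# connector.name=delta_lake and pointed at the shared metastore and object storage).
# Host, catalog, schema, and table names are placeholders.
import trino  # pip install trino

conn = trino.dbapi.connect(
    host="trino.internal.example.com",
    port=8080,
    user="churn_dashboard",
    catalog="delta",
    schema="gold",
)
cursor = conn.cursor()
# Join IoT GPS activity with subscription and after-sales data from the ERP,
# all read from the same Delta Lake tables, without consuming Databricks compute.
cursor.execute("""
    SELECT s.customer_id,
           s.plan,
           count(g.event_id)  AS gps_events,
           count(a.ticket_id) AS aftersales_tickets
    FROM subscriptions s
    LEFT JOIN iot_gps_events g ON g.customer_id = s.customer_id
    LEFT JOIN aftersales_tickets a ON a.customer_id = s.customer_id
    GROUP BY s.customer_id, s.plan
""")
rows = cursor.fetchall()
```

Because both engines read the same Delta tables, the internal dashboard and the external SaaS products stay consistent, while their compute costs land on different bills.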
This architecture allowed us to maintain a single source of truth in Databricks Delta Lake while optimizing cost allocation between internal and external users. In the end, whatever solution we deliver needs to align with business strategy and continue to create long-term business value.
Originally published at: https://medium.com/@angga.faizul05/a-real-world-approach-to-splitting-analytics-workloads-between-databricks-and-trino-627b76a19123
