Series Week 15/52 - OCI ExaCS, DB System, On Prem — Managed Service Complexity Handled

#oci #oracle #nabhaas #thoughtleadership

Abhilash Kumar Bhattaram

Treating Oracle Cloud Infrastructure (OCI) simply as an off-site extension of a local data center is the primary precursor to operational failure. In many migrations, databases are "lifted and shifted" into cloud services without a fundamental grasp of how those services operate internally. Teams often treat OCI DB Systems as standard on-premises VMs and view Exadata Cloud Service (ExaCS) merely as a larger, more powerful server. In these scenarios, redundancy, scalability, and resource limits are assumed to be inherent rather than engineered.

While the platform may appear stable initially, business growth inevitably exposes architectural flaws. Load patterns shift, session spikes occur, and storage costs rise steadily because capacity was provisioned up front rather than designed to scale on demand. These issues frequently manifest as cascading operational failures: feedback loops in which a small spike in errors or latency triggers a reduction in capacity, which makes the original problem worse and eventually requires manual intervention to recover.

Ultimately, what is often mislabeled as a "cloud technology failure" is actually a consequence of architectural ignorance. This article explores the recurring pitfalls encountered when cloud internals are ignored, demonstrating that a deep understanding of OCI database architectures is the only way to build predictable platforms and end the cycle of constant firefighting.

1. Ground Zero: Where Challenges Start

+--------------------------------------------------------------------------------------+
| Ground Zero: Where Challenges Start                                                  |
|--------------------------------------------------------------------------------------|
| - OCI DB Systems sized like on-prem VMs                                              |
| - ExaCS deployed but used as “big VM + storage”                                      |
| - Same architecture copied across Prod, UAT, DR                                      |
| - SLAs defined without knowing service-level constraints                             |
|                                                                                      |
| Repeated Pitfalls Seen in the Field:                                                 |
| • DB System chosen for high-concurrency apps → SESSION/PROCESS limits hit            |
| • ExaCS used but ASM redundancy not understood (HIGH vs EXTERNAL)                    |
| • Storage over-provisioned due to double redundancy assumptions                      |
| • Connection storms during sales / month-end not modeled                             |
| • Patching timelines copied without understanding rolling vs non-rolling             |
|                                                                                      |
| >> These don’t fail on Day 1 — they fail on Day 180                                  |
+--------------------------------------------------------------------------------------+
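
Two of the pitfalls above come down to simple arithmetic. The sketch below is illustrative only. The mirror-copy counts reflect standard ASM behavior (EXTERNAL keeps one copy, NORMAL two, HIGH three), but the 10 TB requirement, the pool sizes, and the PROCESSES value of 1000 are hypothetical numbers, and the calculation ignores the free space ASM reserves for rebalancing.

```python
# Mirror copies ASM keeps per redundancy level
# (EXTERNAL relies on the storage layer for protection).
ASM_COPIES = {"EXTERNAL": 1, "NORMAL": 2, "HIGH": 3}


def raw_tb_required(usable_tb, redundancy):
    """Raw capacity needed for a usable target (simplified: no rebalance reserve)."""
    return usable_tb * ASM_COPIES[redundancy]


def peak_connections(app_servers, pool_max_per_server):
    """Worst-case connections if every pool fills at once (a connection storm)."""
    return app_servers * pool_max_per_server


USABLE_TB = 10           # hypothetical usable-capacity requirement
PROCESSES_LIMIT = 1000   # hypothetical PROCESSES setting on the database

correct_raw = raw_tb_required(USABLE_TB, "HIGH")
# One reading of the "double redundancy" pitfall (an assumption, not the
# author's definition): mirroring is counted a second time on top of what
# ASM already does, so the estimate doubles.
double_counted_raw = correct_raw * 2

# Hypothetical application tier: 12 servers, each with a pool maximum of 100.
storm = peak_connections(app_servers=12, pool_max_per_server=100)
verdict = "exceeds limit" if storm > PROCESSES_LIMIT else "fits"

print(f"HIGH redundancy raw need: {correct_raw} TB; double-counted: {double_counted_raw} TB")
print(f"Storm peak {storm} vs PROCESSES {PROCESSES_LIMIT}: {verdict}")
```

Neither number would show up in a Day 1 smoke test. Both surface only when the storage bill is reviewed or when every connection pool fills at the same moment.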

2. Underneath Ground Zero: Finding the Real Problem

+--------------------------------------------------------------------------------------+
| Underneath Ground Zero: Finding the Real Problem                                     |
|--------------------------------------------------------------------------------------|
| - Architecture decisions made before workload is understood                          |
| - OCI services treated as infrastructure, not platforms                              |
| - Application compatibility ignored during DB version choices                        |
| - Business growth assumed to be linear                                               |
|                                                                                      |
| Patterns Covered Earlier in This Series:                                             |
| • SLAs defined without MTTR realism (Week 4)                                         |
| • Patch behavior differs across environments (Week 10)                               |
| • Non-prod testing doesn’t reflect prod load (Week 11)                               |
| • Databases multiplied instead of consolidated (Week 8)                              |
| • Global support teams unaware of local workload rhythm (Week 6)                     |
|                                                                                      |
| Root Cause:                                                                          |
| • Teams know Oracle DB — but not OCI DB services                                     |
| • Cloud internals discovered only during incidents                                   |
|                                                                                      |
| >> Most “cloud problems” are actually design problems                                |
+--------------------------------------------------------------------------------------+
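
The "growth assumed to be linear" point can be made concrete with a small projection. This is a sketch under stated assumptions: the baseline of 400 peak sessions, the limit of 1000, the +50 sessions per month linear plan, and the 12% compound monthly growth are all hypothetical figures chosen so that both curves start out almost identical.

```python
def months_until_limit(baseline, limit, grow, max_months=120):
    """Return the first month in which the projected peak reaches the limit."""
    for month in range(max_months + 1):
        if grow(baseline, month) >= limit:
            return month
    return None  # limit not reached within the planning horizon


def linear(baseline, month, step=50):
    """Linear plan: a fixed number of extra peak sessions every month."""
    return baseline + step * month


def compound(baseline, month, rate=0.12):
    """Compound growth: the peak grows by a fixed percentage every month."""
    return baseline * (1 + rate) ** month


BASELINE_PEAK = 400    # hypothetical peak sessions today
SESSION_LIMIT = 1000   # hypothetical ceiling of the chosen service shape

linear_month = months_until_limit(BASELINE_PEAK, SESSION_LIMIT, linear)
compound_month = months_until_limit(BASELINE_PEAK, SESSION_LIMIT, compound)

print(f"Linear plan reaches the limit in month {linear_month}")
print(f"Compound growth reaches the limit in month {compound_month}")
```

In the first month the two models differ by only two sessions, which is why the linear assumption looks safe early on. The gap opens later, well after the architecture has been signed off.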

3. Working Upwards: From Understanding to Solution


+--------------------------------------------------------------------------------------+
| Working Upwards: From Understanding to Solution                                      |
|--------------------------------------------------------------------------------------|
| - Start from business rhythm (sales, month-end, reporting)                           |
| - Translate rhythm into workload patterns                                            |
| - Choose DB System vs ExaCS intentionally                                            |
| - Align ASM redundancy with service architecture                                     |
| - Design for peak concurrency, not average load                                      |
| - Validate assumptions using historical AWR, not guesses                             |
|                                                                                      |
| Practical Design Shifts:                                                             |
| • High concurrency + elastic growth → ExaCS                                          |
| • Predictable, steady workloads → DB System                                          |
| • Multiple apps → consolidate with isolation strategy                                |
| • Compliance-driven systems → architecture-first design                              |
|                                                                                      |
| CTO Outcome:                                                                         |
| • No redesign during scale                                                           |
| • Predictable RTO/RPO                                                                |
| • Controlled cloud costs                                                             |
| • Fewer “why is this happening now?” moments                                         |
|                                                                                      |
| >> Cloud predictability starts at design — not at incident response                  |
+--------------------------------------------------------------------------------------+
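
"Design for peak concurrency, not average load" can be sketched as follows. The hourly session counts are made-up values standing in for figures taken from historical AWR data, and the 30% headroom is an arbitrary assumption, not a recommendation.

```python
import math
from statistics import mean

# Hypothetical hourly session counts for one month-end day (index = hour of day).
HOURLY_SESSIONS = [
    120, 110, 100, 95, 90, 100, 150, 220, 310, 380, 400, 390,
    370, 360, 380, 420, 450, 900, 1400, 600, 300, 220, 160, 130,
]


def target_capacity(observed_sessions, headroom_pct=30):
    """Sessions to design for: the observed value plus percentage headroom."""
    return math.ceil(observed_sessions * (100 + headroom_pct) / 100)


average_target = target_capacity(mean(HOURLY_SESSIONS))
peak_target = target_capacity(max(HOURLY_SESSIONS))

# Hours in which real demand would exceed a platform sized on the average.
shortfall_hours = [
    hour for hour, sessions in enumerate(HOURLY_SESSIONS)
    if sessions > average_target
]

print(f"Sized on average: {average_target} sessions")
print(f"Sized on peak:    {peak_target} sessions")
print(f"Hours over the average-based size: {shortfall_hours}")
```

With these sample values, a platform sized on the daily average falls short during the evening batch window, which is exactly when the business rhythm matters most.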

How Nabhaas helps you

If you’ve made it this far, you already sense there’s a better way — in fact, you have a way ahead.

If you’d like Nabhaas to assist in your journey, remember — TAB is just one piece. Our Managed Delivery Service ensures your Oracle operations run smoothly between patch cycles, maintaining predictability and control across your environments.

TAB - Whitepaper, download here

Managed Delivery Services - Whitepaper, download here