Cloud Portable Tech


What the Data Says

No single provider wins everything, and that’s the point

LKE wins on generation efficiency — the metric users feel longest.

TPOT p50 is 37.9 ms on LKE vs 54.2 ms on EKS and 56.6 ms on GKE — 30–33% faster token generation, sustained across 3,500 requests. This carries through to throughput (17.70 tokens/sec vs 16.22 and 15.30) and total duration (155s vs 172s and 181s). In a streaming interface where the user watches tokens flow for 10–15 seconds, TPOT determines whether a response feels like it's flowing or buffering.
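To make the metric concrete: TPOT is the median gap between consecutive token arrivals, deliberately excluding the wait for the first token. A minimal sketch, using hypothetical timestamps rather than the benchmark data:

```python
# TPOT = median inter-token gap. Timestamps are hypothetical
# illustrative values, not measurements from the scorecard.
import statistics

def tpot_ms(arrival_ms):
    """Median gap (ms) between consecutive streamed tokens.
    Excludes TTFT: only generation pacing after the first token."""
    gaps = [b - a for a, b in zip(arrival_ms, arrival_ms[1:])]
    return statistics.median(gaps)

# First token at 1,500 ms, then steady ~38 ms gaps.
print(tpot_ms([1500, 1538, 1576, 1614, 1652]))  # 38.0
```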

EKS is fastest to first token — but the gap narrows under pressure.

EKS delivers the first token at p50 in 1,518 ms, 3x faster than LKE's 4,556 ms. At p95, GKE leads (6,391 ms) and the providers tighten up. The tradeoff: EKS gets to the first token fastest, but LKE generates the rest of the response 30% faster. Despite trailing on TTFT, LKE finishes the typical request first (latency p50: 14,245 ms vs 15,023 EKS and 15,968 GKE).
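The tradeoff can be made concrete with a back-of-envelope model, total ≈ TTFT + tokens × TPOT, using the p50 values above. It is a simplification (it ignores retrieval time and tail effects), but it shows where the crossover sits:

```python
# Linear latency model: total ~= TTFT + tokens * TPOT,
# using the p50 scorecard values from this section.
def total_ms(ttft_ms, tpot_ms, tokens):
    return ttft_ms + tokens * tpot_ms

# Response length at which LKE's faster generation overtakes
# EKS's faster first token:
crossover = (4556 - 1518) / (54.2 - 37.9)
print(round(crossover))  # 186 tokens

# For a ~260-token response the model lands in the same ballpark
# as the measured p50 latencies:
print(total_ms(4556, 37.9, 260))  # ~14,410 ms on LKE
print(total_ms(1518, 54.2, 260))  # ~15,610 ms on EKS
```

Any response longer than roughly 190 tokens favors LKE end to end, which is why it wins total latency despite losing TTFT.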

Reliability: all three are production-grade, but not equal:

  • LKE: 3,498/3,500.

  • GKE: 3,495/3,500.

  • EKS: 3,475/3,500 — 25 failures across 7 runs, the most of any provider.
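Expressed as success rates, the counts above work out to:

```python
# Success rates implied by the completion counts above.
total = 3500
for name, ok in {"LKE": 3498, "GKE": 3495, "EKS": 3475}.items():
    print(f"{name}: {ok / total:.3%}")
# LKE: 99.943%, GKE: 99.857%, EKS: 99.286%
```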

East–West is a platform signal, not a RAG signal:

EKS leads pod-to-pod throughput at 4.87 Gbps (5.2x LKE). GKE shows zero median retransmits. But this RAG workload generates minimal internal traffic — embeddings are < 2 KB, Qdrant payloads are small, and the bottleneck is GPU inference, not network bandwidth. EKS's 5x throughput advantage doesn't show up in the North-South results, which is exactly what you'd expect for single-GPU inference with lightweight retrieval.
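A quick calculation shows why bandwidth is a non-factor here: shipping a 2 KB embedding pod-to-pod takes microseconds on either network, against a per-token generation budget of roughly 38 ms:

```python
# Time to move a 2 KB payload over the measured pod-to-pod links,
# compared with one ~38 ms TPOT budget.
def transfer_us(payload_bytes, gbps):
    return payload_bytes * 8 / (gbps * 1e9) * 1e6  # microseconds

print(round(transfer_us(2048, 4.87), 1))        # EKS: 3.4 us
print(round(transfer_us(2048, 4.87 / 5.2), 1))  # LKE: 17.5 us
# Both are three to four orders of magnitude below a token's budget,
# so the 5x bandwidth edge never reaches the North-South numbers.
```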

Cost is the multiplier:

  • LKE runs at $433/mo.

  • EKS at $768 (+77%).

  • GKE at $807 (+86%).

    LKE isn't trading cost for performance — it wins 5 of 8 North-South metrics while costing 44–46% less. Across a fleet of 10 clusters, the annual gap is $40K–$45K before reserved pricing.
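The fleet arithmetic behind that figure:

```python
# Annual cost gap at list price, scaled to a 10-cluster fleet.
lke, eks, gke = 433, 768, 807   # $/month per cluster
clusters, months = 10, 12
print((eks - lke) * clusters * months)  # 40200 -> ~$40K/yr vs EKS
print((gke - lke) * clusters * months)  # 44880 -> ~$45K/yr vs GKE
```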

Reproduce the Results

Don’t take my word for it

The entire stack — application, infrastructure, observability, and benchmarks — is public. You can deploy the same system to your own clusters and run the same scorecard.

The high-level flow is three steps:

```bash
git clone https://github.com/jgdynamite10/rag-ray-haystack
```

1. Provision infrastructure (Terraform)

```bash
cd infra/terraform/akamai-lke && terraform init && terraform apply
cd infra/terraform/aws-eks    && terraform init && terraform apply
cd infra/terraform/gcp-gke    && terraform init && terraform apply
```

2. Deploy the stack (Helm via deploy script)

```bash
./scripts/deploy.sh --provider akamai-lke --env dev
./scripts/deploy.sh --provider aws-eks    --env dev
./scripts/deploy.sh --provider gcp-gke    --env dev
```

3. Run the benchmark

```bash
./scripts/benchmark/run_ns.sh akamai-lke --url http://<frontend-ip>/api/query/stream
./scripts/benchmark/run_ns.sh aws-eks    --url http://<frontend-ip>/api/query/stream
./scripts/benchmark/run_ns.sh gcp-gke    --url http://<frontend-ip>/api/query/stream
```

Each step has prerequisites (cloud credentials, container images, KubeRay operator) documented in the repo. The commands above are the conceptual flow — the detailed walkthrough lives in:

  • docs/DEPLOYMENT.md — step-by-step for all three providers

  • docs/BENCHMARKING.md — methodology, pre-flight checklist, and run procedures

Results land as JSON in benchmarks/ with the same measurement contract, the same client, and the same event sequence — on your infrastructure, from your location.
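Once the files land, a few lines are enough to aggregate them across runs. Note that the "requests" and "ttft_ms" field names below are assumptions for illustration; check the actual schema emitted by run_ns.sh:

```python
# Sketch: median TTFT across every JSON result file in a directory.
# Field names ("requests", "ttft_ms") are assumed, not confirmed
# against the repo's real output schema.
import json
import statistics
from pathlib import Path

def p50_ttft(results_dir="benchmarks"):
    """Median TTFT (ms) pooled across all result files."""
    ttfts = []
    for path in Path(results_dir).glob("*.json"):
        run = json.loads(path.read_text())
        ttfts += [r["ttft_ms"] for r in run.get("requests", [])]
    return statistics.median(ttfts) if ttfts else None
```

The same pattern extends to TPOT, throughput, and failure counts, since every provider's results share one measurement contract.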

Conclusion

The chain holds

This project started with a simple question:

For the same RAG workload, what do you actually get across cloud providers — and what does it cost?

The answer isn't simple, and that's the point. No provider swept every metric. EKS is fastest to first token. GKE has the tightest TTFT tail. LKE generates tokens 30% faster and finishes the workload first — at 44–46% lower cost. Each provider has a profile, and which one matters depends on what you're optimizing for.

But the deeper finding isn't about any single provider. It's that these differences are measurable, repeatable, and defensible — if you control the environment, define the measurement contract, and run enough iterations to separate signal from noise. Most GenAI benchmarks don't do that. Most comparisons are one-time demos on one cloud with one set of conditions, and the numbers don't survive scrutiny.

The anchor gets the credit, but the chain does the work. For GenAI moving into production, the chain is infrastructure parity, measurement discipline, and cost transparency. Get those right, and the decisions that follow — which provider, which GPU, which region, how much to budget — become engineering conversations backed by evidence, not vendor pitches backed by hope.

The repo is public. The methodology is documented. The scorecard is reproducible. Run it yourself.

Copyright © 2024 - CloudPortableTech