Learn the basics in depth

I often receive questions from aspiring data engineers. Some are fresh grads, others are switching from software or analytics roles. And the same question comes up again and again: what tool should I learn in depth? I understand why people keep asking. The tech world moves extremely fast. Every few months, there is a new framework, a new orchestration tool, a shiny new feature on a cloud platform, or an article filled with buzzwords that makes you feel like you are already behind. The pressure to keep up is real. But here is something I have learned over the years, and I want to say it clearly: focus on learning the basics in depth. Tools will change; 0 and 1 will not. ...

May 22, 2025 · 4 min

How to start a successful Data Warehouse project

Any organization aiming to leverage the power of data-driven decision-making stands to benefit greatly from a successful Data Warehouse project. A well-designed Data Warehouse not only centralizes your data but also ensures that it is reliable, scalable, maintainable, and usable by stakeholders. Over the past few months, my team and I launched a new Data Warehouse project in production. Starting from scratch is always a valuable chance to gain new insights and expertise. I would like to share our experiences from this project in the hope that they will be as beneficial to others as they have been to us. ...

Aug 11, 2024 · 8 min

What is a reliable Data System?

In today’s data-driven world, information is gold, and the systems that store and manage it serve as crucial infrastructure. I have seen people talk a lot about terms like “distributed computing”, “scalability”… but one fundamental characteristic is often overlooked: reliability. Without it, scalability, maintainability, flexibility, anything-bility are meaningless, like a beautiful castle built on sand. What is Reliability? Everyone has their own intuition about what is reliable. A piggy bank is reliable because it consistently holds your money and accurately reflects what you’ve deposited. You trust that when you put a coin in, it will be there later, and the total will reflect your savings. And when you want to make a withdrawal, you can get your money immediately. A calculator is reliable because it consistently produces accurate results based on your input. You trust that regardless of who uses it, 2 + 2 will always equal 4. And the result should appear instantly on the screen. Different systems have different reliability requirements. In general, we can define reliability as follows: ...

Feb 16, 2024 · 4 min