From SRE ⇔ NASA

Site Reliability Engineering (SRE) is a discipline that applies aspects of software engineering to operations, with the goal of creating scalable and reliable software systems. SRE teams are responsible for the reliability, performance, scalability, and monitoring of software systems. The SRE culture originated in the early 2000s […]

Prometheus and Thanos: A Symbiotic Relationship

In my journey with monitoring and alerting tools, I’ve come to deeply appreciate Prometheus. Its real-time monitoring capability feels like having a pulse on your systems. But, just like any good story, our hero, Prometheus, has its Achilles’ heel. I remember the first time I loaded it with a ton […]

Best efforts to OSS

As an engineer or systems architect, you must keep up to date with the latest trends, technologies, and best practices in your field. One of the most effective ways to do that is by writing technical write-ups and sharing your knowledge with the community. However, sometimes it’s easier said than done, and there might […]

How I brought down production

I figure every SRE / Systems / DevOps / Infrastructure Engineer brings down a production system at some point or another. So did I, and on a Monday morning at 11:00 AM, no less. I think this might actually be the biggest highlight of my career so far. I’ve solved countless problems […]

AWS: DynamoDB + API Gateway – The Correct Way

I went the hard way so you don’t have to. This is not a tutorial but more of a post that lets you skip all the steps needed to fetch and clean data with DynamoDB + API Gateway. Before I say anything, I want to first thank AWS for creating this, and second, scrutinize them to not […]