Building an Uptime Monitor: API Monitoring Architecture - Part 1
Last time, I built a configurable open-source rate limiter. It got a hugely positive response from the community, along with several pieces of feedback on how I could improve the implementation. I really liked those suggestions, so this time I will share my progress, thought process, and implementation details while developing a new project.
This time I have decided to develop an open-source uptime monitor. I plan to provide the following features:
1. Website Monitoring
2. API Monitoring
3. Cron Job Monitoring
4. Heartbeat Monitoring
5. SSL Monitoring
6. PING & Port Monitoring
7. Customizable Frequency
8. Dashboard
9. Status Pages
10. Email, Slack, Telegram, Discord, and WhatsApp Alerts
11. WebHooks
12. Scheduled Emails
13. Maintenance Window
If this goes well, I will also try to convert it into a micro-SaaS product. But let's not focus on the money right now. Instead, let's focus on developing this.
Website / API Monitoring
The first part of this implementation is monitoring website and API uptime. Below is how I am planning to implement it.
I am saving user-provided API details in an SQL database and also in Redis. SQL will be used as persistent storage, and Redis will be used for faster data access, as I have to access this data every 30, 50, 120, or 300 seconds as per user request.
If a user updates their API details, I'll make sure to update Redis as well.
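A minimal sketch of that write path, assuming a write-through approach where SQL is committed first and the cache is refreshed afterwards. To keep it self-contained, an in-memory SQLite database stands in for the SQL database and a plain dict stands in for Redis; the `MonitorStore` class, the `monitors` table, and the `monitor:<id>` key format are all hypothetical names for illustration.

```python
import json
import sqlite3


class MonitorStore:
    """Write-through store: SQL is the source of truth, the cache mirrors it."""

    def __init__(self, db, cache):
        self.db = db        # sqlite3 connection standing in for the SQL database
        self.cache = cache  # plain dict standing in for Redis
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS monitors ("
            "id TEXT PRIMARY KEY, url TEXT NOT NULL, interval_s INTEGER NOT NULL)"
        )

    def save(self, monitor_id, url, interval_s):
        # Persist first; the cache is only refreshed after the commit succeeds.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO monitors (id, url, interval_s) "
                "VALUES (?, ?, ?)",
                (monitor_id, url, interval_s),
            )
        self.cache[f"monitor:{monitor_id}"] = json.dumps(
            {"url": url, "interval_s": interval_s}
        )

    def get(self, monitor_id):
        # Reads are served from the cache only, as the checker would do.
        raw = self.cache.get(f"monitor:{monitor_id}")
        return json.loads(raw) if raw is not None else None


store = MonitorStore(sqlite3.connect(":memory:"), {})
store.save("m1", "https://example.com/health", 30)
store.save("m1", "https://example.com/health", 120)  # user edits the interval
print(store.get("m1"))
```

Committing to SQL before touching the cache means a failed write never leaves Redis holding details that were not persisted.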
I will have a worker thread that will fetch this data from Redis at regular intervals and send it to Kafka.
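One way the worker could decide which monitors to hand off on each tick is a min-heap ordered by next run time. This is only a sketch: `DueScheduler` is a hypothetical name, timestamps are passed in as plain numbers, and the returned list stands in for the messages that would be published to Kafka.

```python
import heapq


class DueScheduler:
    """Tracks when each monitor is next due, using a min-heap of run times."""

    def __init__(self):
        self._heap = []  # entries are (next_run_at, monitor_id, interval_s)

    def add(self, monitor_id, interval_s, now):
        heapq.heappush(self._heap, (now + interval_s, monitor_id, interval_s))

    def pop_due(self, now):
        """Return ids of monitors due at `now` and schedule their next run."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap))
        # Reschedule after the loop so a monitor is returned at most once per tick.
        for _, monitor_id, interval_s in due:
            heapq.heappush(self._heap, (now + interval_s, monitor_id, interval_s))
        return [monitor_id for _, monitor_id, _ in due]


scheduler = DueScheduler()
scheduler.add("api-a", 30, now=0)
scheduler.add("api-b", 120, now=0)
for tick in (29, 30, 60, 120):
    print(tick, scheduler.pop_due(tick))
```

Because only the head of the heap is inspected, a tick with nothing due costs almost nothing, which matters when most monitors run on the longer intervals.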
After this, an uptime service that consumes from Kafka will check whether the API is up or down.
If the API is up, then nothing needs to be done.
But if an API is down, then I will increment its fail counter that will be stored somewhere in Redis.
If the fail counter is equal to the maximum fail count limit specified by the user, I will notify them.
I will notify at most once every 15 minutes by default, and the user can configure this interval. I don't want to notify the user every 30 seconds because I don't think that is necessary.
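The failure counting and notification throttling described above can be sketched as follows. Two assumptions go beyond the plan as written: the counter tracks consecutive failures and is reset when a check succeeds, and the threshold test uses "at least" rather than "equal to" so that repeat alerts can fire after the cooldown. `FailureTracker` is a hypothetical name, and plain dicts stand in for the Redis keys.

```python
class FailureTracker:
    """Decides when a failing monitor should trigger a notification."""

    def __init__(self, max_fails, cooldown_s=900):
        self.max_fails = max_fails    # user-specified failure threshold
        self.cooldown_s = cooldown_s  # minimum gap between alerts (15 min default)
        self.fails = {}               # monitor_id -> consecutive failure count
        self.last_alert = {}          # monitor_id -> time of the last alert

    def record(self, monitor_id, is_up, now):
        """Record one check result; return True if an alert should be sent."""
        if is_up:
            # Assumption: a successful check clears the failure state.
            self.fails.pop(monitor_id, None)
            self.last_alert.pop(monitor_id, None)
            return False
        count = self.fails.get(monitor_id, 0) + 1
        self.fails[monitor_id] = count
        if count < self.max_fails:
            return False
        last = self.last_alert.get(monitor_id)
        if last is not None and now - last < self.cooldown_s:
            return False  # still inside the cooldown window
        self.last_alert[monitor_id] = now
        return True


tracker = FailureTracker(max_fails=3)
for t in (0, 30, 60, 90, 960):
    print(t, tracker.record("api-a", is_up=False, now=t))
```

With real Redis, the same idea would typically map onto an atomic counter per monitor plus a key with a time-to-live for the cooldown, so that several uptime workers can share the state.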
My Questions:
What can I improve in this architecture?
Do I need to store success results from the uptime monitoring service?
Instead of Kafka, should I use Redis Pub/Sub?
Should I notify the user every 15 minutes, or use an exponential backoff strategy?
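To make the last question concrete, here is a rough sketch of what an exponential strategy could look like next to the fixed 15-minute interval. The function name, the doubling factor, and the 6-hour cap are illustrative assumptions, not decisions made for this project.

```python
def alert_delay_s(alerts_sent, base_s=900, factor=2, cap_s=6 * 3600):
    """Seconds to wait before the next alert, given how many were already sent."""
    # The delay grows by `factor` after each alert but never exceeds the cap.
    return min(base_s * factor ** alerts_sent, cap_s)


# Gaps after the 1st, 2nd, ... alert during one continuous outage.
print([alert_delay_s(n) // 60 for n in range(6)])  # minutes between alerts
```

A fixed interval keeps reminding the user at the same rate for the whole outage, while the exponential version sends fewer messages the longer the outage lasts.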
After getting feedback on this post and implementing this part, I will write another post showcasing the implementation. If you want to get notified, you can subscribe to this newsletter, which is completely free, and I will make sure the post reaches your mailbox as soon as I publish it.