How to Fix a GraphQL Endpoint Down Issue
By Aradhna
Your GraphQL endpoint is down, your client apps are throwing errors, and the pressure is on. Before you start randomly restarting services, it helps to work through the problem methodically. This guide covers the most common reasons a GraphQL endpoint goes down and how to fix each one — plus how to catch these issues before your users do.
Why Is My GraphQL Endpoint Down?
"Down" can mean several different things in GraphQL land:
- The server process has crashed entirely
- The server is running but returning HTTP 5xx errors
- The server responds but GraphQL returns errors in the `errors` array (HTTP 200)
- A network or infrastructure layer is blocking requests
The first step is to determine which of these you are dealing with, because the fix is completely different in each case.
Step 1 — Check the raw HTTP response
Run a quick curl against your endpoint:
```bash
curl -v -X POST https://api.yourapp.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{ __typename }"}'
```
| HTTP Status | Likely culprit |
|---|---|
| Connection refused / timeout | Server process down, firewall, or DNS failure |
| 502 / 503 | Reverse proxy or load balancer can't reach your app server |
| 500 | Application-level exception (schema error, resolver crash) |
| 200 with `errors` array | GraphQL-layer error: schema, resolver, or auth |
| 401 / 403 | Authentication or authorisation misconfiguration |
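The mapping in the table can also be written down as a small triage helper. This is a minimal Python sketch rather than a definitive implementation: the function name `classify_response` and the category strings are illustrative assumptions, and `status=None` stands in for a connection that was refused or timed out.

```python
import json
from typing import Optional


def classify_response(status: Optional[int], body: str = "") -> str:
    """Map a raw HTTP result onto the triage categories from the table.

    status is None when the connection was refused or timed out.
    """
    if status is None:
        return "server process down, firewall, or DNS failure"
    if status in (502, 503):
        return "reverse proxy or load balancer cannot reach the app server"
    if status == 500:
        return "application-level exception"
    if status in (401, 403):
        return "authentication or authorisation misconfiguration"
    if status == 200:
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            return "unexpected non-JSON body"
        # A non-empty "errors" list means the GraphQL layer itself failed.
        if isinstance(payload, dict) and payload.get("errors"):
            return "GraphQL-layer error"
        return "healthy"
    return "unclassified"
```

Feeding it the status and body from the curl output gives a first guess at which section below to jump to.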
Common Causes and Fixes
1. The Server Process Has Crashed
If you get a connection refused error, your app server (Node, Python, Go — whatever backs your GraphQL layer) has stopped.
Fix:
- Check your process manager logs: `pm2 logs`, `journalctl -u your-service`, or your container runtime logs.
- Look for out-of-memory kills (`OOMKilled` as the container's termination reason in the Kubernetes pod status) or unhandled promise rejections in Node.
- Restart the process, then address the root cause before it happens again.
2. Reverse Proxy or Load Balancer Misconfiguration
A 502 or 503 usually means the proxy or load balancer is up and reachable from the internet but cannot reach the GraphQL server behind it.
Fix:
- Verify the upstream address in your Nginx/Caddy/ALB config matches where your app is actually listening.
- Check that the app is bound to `0.0.0.0` and not just `127.0.0.1` if the proxy runs on a separate host.
- Confirm health-check paths are returning 200 — many load balancers will remove a node from rotation if the health check fails.
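One way to test the first two points is to attempt a plain TCP connection to the upstream address from the proxy host itself. The sketch below assumes Python is available on that host; `can_connect` is a hypothetical helper name, and the host and port you pass are whatever your proxy config lists as the upstream.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False
```

If this returns `False` from the proxy host but `True` on the app server against `127.0.0.1`, the app is probably bound to the loopback interface only.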
3. Schema or Resolver Errors Causing 500s
A bad deployment — say, a schema change that references a type that no longer exists — can cause every request to throw a 500.
Fix:
- Inspect application logs immediately after the failing request.
- Roll back the deployment if the schema change introduced the breakage.
- Use a schema registry or SDL validation in CI so broken schemas are caught before they reach production.
4. Database or Downstream Service Unavailable
GraphQL resolvers typically fan out to databases, microservices, or third-party APIs. If any critical dependency is down, resolvers will throw, and your endpoint will appear broken.
Fix:
- Identify which resolver is failing by examining the `path` field in the GraphQL error response.
- Check the health of the downstream service independently.
- Implement timeouts and fallback values in resolvers so a single unhealthy dependency doesn't take down the entire graph.
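The timeout-and-fallback idea in the last point might look like the following in an async Python resolver. This is a sketch under stated assumptions: `with_timeout`, `slow_reviews_service`, and `resolve_reviews` are hypothetical names, and the 0.05 second budget is only there to make the example finish quickly.

```python
import asyncio
from typing import Any, Awaitable


async def with_timeout(call: Awaitable[Any], timeout_s: float, fallback: Any) -> Any:
    """Await a downstream call, returning fallback if it is slow or raises."""
    try:
        return await asyncio.wait_for(call, timeout=timeout_s)
    except Exception:
        # Timeouts and downstream failures both land here.
        return fallback


async def slow_reviews_service() -> list:
    # Hypothetical dependency that takes longer than the resolver allows.
    await asyncio.sleep(1.0)
    return ["review"]


async def resolve_reviews() -> list:
    # Hypothetical resolver: degrade to an empty list instead of failing the query.
    return await with_timeout(slow_reviews_service(), timeout_s=0.05, fallback=[])
```

Because the fallback hides the failure from the client, it is worth logging the swallowed exception so the unhealthy dependency still shows up somewhere.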
5. SSL Certificate Expired
Clients (and curl, unless you pass `-k`) will refuse to connect to an endpoint whose TLS certificate has expired. From the user's perspective, the GraphQL endpoint is simply "down".
Fix:
- Renew the certificate immediately.
- Set up automated renewal (Let's Encrypt + certbot, AWS ACM auto-renewal, etc.).
- Monitor certificate expiry proactively — Uptrue's SSL monitoring alerts you days before a cert expires so this never catches you off guard.
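For a quick manual check of how long a certificate has left, Python's standard `ssl` module is enough. The helper names `days_until_expiry` and `fetch_not_after` below are illustrative assumptions, not part of any tool mentioned in this guide.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between now and a certificate's notAfter timestamp."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read notAfter from the certificate served by host.

    An already expired certificate fails verification during the handshake
    and raises ssl.SSLCertVerificationError, which is itself the signal.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Calling `days_until_expiry(fetch_not_after("api.yourapp.com"))` from a scheduled job and alerting below some threshold (say 14 days, an arbitrary choice) is one simple way to apply it.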
6. DNS Resolution Failure
If the hostname no longer resolves, no client will reach your endpoint regardless of how healthy the server is.
Fix:
- Test resolution: `dig api.yourapp.com` or `nslookup api.yourapp.com`.
- Check for accidental record deletion in your DNS provider dashboard.
- Uptrue's DNS monitoring tracks your records continuously and notifies you the moment a change or failure is detected.
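The same resolution test can be scripted if you want to run it from several machines. This is a small sketch using only the standard library; `resolve_host` is a hypothetical helper name, and it uses the system resolver rather than querying a specific DNS server the way `dig` can.

```python
import socket
from typing import List


def resolve_host(hostname: str) -> List[str]:
    """Return the sorted unique IP addresses for hostname, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

An empty list for `api.yourapp.com` points at DNS rather than the server.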
Mid-Page Check: Are You Monitoring Your GraphQL Endpoint?
Fixing a GraphQL endpoint down incident reactively is painful. Setting up a monitor takes two minutes and means you find out first — not from a customer tweet.
Start monitoring your GraphQL endpoint free → Uptrue
Uptrue sends an HTTP POST with a lightweight introspection or `__typename` query on your chosen interval, alerts you the moment it fails, and tracks response-time trends so you can spot degradation before it becomes an outage.
Debugging GraphQL-Specific Errors (HTTP 200, But Still Broken)
One of GraphQL's quirks is that a failing query often returns HTTP 200 with an `errors` array in the body. Standard uptime monitors that only check for a non-5xx status will miss this entirely.
To catch these:
- Use a monitoring tool that can inspect the response body and assert that `errors` is absent (or empty).
- Uptrue's API monitoring lets you write body assertions, for example checking that the JSON response contains `"data"` and does not contain `"errors"`, giving you real GraphQL-aware uptime data rather than a false green.
You can test your endpoint right now with Uptrue's free HTTP checker tool to see exactly what headers and body your GraphQL server is returning.
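To make the body assertion concrete, here is a rough Python sketch of a GraphQL-aware probe. It is not how any particular monitoring product implements its checks; `body_is_healthy` and `probe` are hypothetical names, and the rule "non-null `data`, no non-empty `errors`" is one reasonable reading of the assertion described above.

```python
import json
import urllib.request


def body_is_healthy(body: str) -> bool:
    """True if the body is JSON with non-null "data" and no non-empty "errors"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("data") is not None and not payload.get("errors")


def probe(url: str, timeout: float = 10.0) -> bool:
    """POST a { __typename } query and apply the body assertion."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"query": "{ __typename }"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return body_is_healthy(response.read().decode("utf-8"))
    except OSError:
        # URLError and HTTPError (4xx/5xx responses) are OSError subclasses.
        return False
```

A check like `probe("https://api.yourapp.com/graphql")` returns `False` for an HTTP 200 response that carries errors, which is the case a status-only monitor reports as green.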
How to Prevent a GraphQL Endpoint Going Down Again
Reactive fixes are necessary, but prevention is better. Here is a short checklist:
- Deploy schema changes behind feature flags — validate in staging before production traffic hits the new schema.
- Set memory and CPU limits on your containers and restart policies to recover from crashes automatically.
- Add resolver-level error boundaries so one failing data source doesn't cascade.
- Monitor SSL, DNS, and HTTP continuously — not just after something breaks.
- Track response time baselines — a sudden spike in p95 latency often precedes a full outage.
- Alert on GraphQL error rates, not just HTTP status codes.
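As an illustration of the response-time baseline item, the following sketch compares the p95 of recent samples against a baseline. The function names and the default factor of 2.0 are assumptions made for the example; a sensible threshold depends on your own traffic.

```python
import statistics
from typing import Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """95th percentile of response times (needs at least two samples)."""
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples_ms, n=20, method="inclusive")[-1]


def latency_spiked(
    baseline_ms: Sequence[float],
    recent_ms: Sequence[float],
    factor: float = 2.0,
) -> bool:
    """True if the recent p95 exceeds the baseline p95 by the given factor."""
    return p95(recent_ms) > factor * p95(baseline_ms)
```

Running this over a rolling window of probe timings gives an early warning that does not depend on any request actually failing.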
Conclusion
A GraphQL endpoint down incident almost always traces back to one of a handful of root causes: a crashed process, a misconfigured proxy, a schema or resolver error, a dead dependency, an expired SSL cert, or a DNS failure. Work through the HTTP response first to narrow down the category, then apply the relevant fix.
The bigger win is getting proper monitoring in place so these issues surface in Slack or PagerDuty before your users report them. Between uptime checks, SSL expiry alerts, DNS monitoring, and response-body assertions, Uptrue covers the full surface area of a GraphQL API in production.
Set up your GraphQL endpoint monitor on Uptrue — free to start