Proposal
Hey.
Could you possibly document best practices for exporters for the case where collection of their metrics fails only partially?
There is this:
Failed scrapes
There are currently two patterns for failed scrapes where the application you’re talking to doesn’t respond or has other problems.
The first is to return a 5xx error.
The second is to have a myexporter_up, e.g. haproxy_up, variable that has a value of 0 or 1 depending on whether the scrape worked.
The latter is better where there’s still some useful metrics you can get even with a failed scrape, such as the HAProxy exporter providing process stats. The former is a tad easier for users to deal with, as up works in the usual way, although you can’t distinguish between the exporter being down and the application being down.
But it’s IMO rather incomplete.
I mean, it's clear that if everything fails one might return a 5xx. But for the partial case, <exporter>_up alone doesn't really seem to be enough.
In my example I write an exporter which collects part of its metrics via some SSH interface and the other via some REST interface.
If only one of them fails, the metrics from the other are still valuable.
And <exporter>_up would only indicate the whole exporter being up/down. So one would need a more granular scheme, e.g. in my case <exporter>_ssh_based_metrics_up and <exporter>_rest_based_metrics_up.
But of course that would also be rather useless for users of those metrics, because they don’t necessarily know which metrics are SSH-based and which are REST-based.
In some cases it might be possible to use a special value for a metric to indicate that it’s "invalid"/down, but in general that seems like a rather bad idea to me.
Similarly, it doesn’t seem feasible to add a <foo>_up for every metric named <foo>.
Or is the recommendation to simply leave out those metrics that couldn’t be collected (i.e. not print them at all)?
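To make the options above a bit more concrete, here is a rough sketch of one possible combination: a single "up" metric with a label per data source (instead of one metric name per source), plus simply omitting the metrics of a source that failed. Everything in it is made up for illustration: the metric names (myexporter_collector_up, myexporter_disk_used_bytes, myexporter_requests_total), the "collector" label and the two collect_* functions are hypothetical, and it writes the text exposition format by hand rather than using a client library. I'm not claiming this is the recommended way.

```python
from typing import Callable, Dict, List


# Hypothetical sub-collectors: each returns {metric_name: value} or raises.
def collect_ssh() -> Dict[str, float]:
    # Would normally return e.g. {"myexporter_disk_used_bytes": ...};
    # here it simulates the SSH interface being unreachable.
    raise ConnectionError("ssh backend unreachable")


def collect_rest() -> Dict[str, float]:
    return {"myexporter_requests_total": 42.0}


def render(collectors: Dict[str, Callable[[], Dict[str, float]]]) -> str:
    """Render one scrape: a per-source up line, plus only the metrics
    of those sources that could actually be collected."""
    up_lines: List[str] = []
    sample_lines: List[str] = []
    for name, collect in collectors.items():
        try:
            samples = collect()
            ok = 1
        except Exception:
            samples = {}  # failed source: its metrics are left out entirely
            ok = 0
        up_lines.append(f'myexporter_collector_up{{collector="{name}"}} {ok}')
        for metric, value in sorted(samples.items()):
            sample_lines.append(f"{metric} {value}")
    # Keep all lines of the same metric name together.
    return "\n".join(up_lines + sample_lines) + "\n"


output = render({"ssh": collect_ssh, "rest": collect_rest})
print(output, end="")
# myexporter_collector_up{collector="ssh"} 0
# myexporter_collector_up{collector="rest"} 1
# myexporter_requests_total 42.0
```

This still has the problem mentioned above: a user who only looks at an individual metric can't tell from it which source it belongs to, unless that is documented somewhere or encoded in the metric name.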
Well, I'm not claiming that I know the best way to handle these cases, so it would be nice if some current best practices could be documented.
Thanks,
Chris.