by kkoppenhaver on 11/16/23, 5:29 PM with 75 comments
by CSMastermind on 11/16/23, 8:03 PM
It sounds like they were in the same place as a lot of companies: they don't have a single pane of glass for observability. One of the main benefits, if not the main benefit, I've gotten out of Datadog is having everything in Datadog, so that it's all connected and I can easily jump from a trace to logs, for instance.
One of the terrible mistakes I see companies make with this tooling is fragmenting like this. Everyone has their own preferred tool, and ultimately the collective experience is significantly worse than the sum of its parts.
by tapoxi on 11/16/23, 6:10 PM
The collector (which processes and ships metrics) can be installed in K8S through Helm or an operator, and we just added a variable to our charts so the agent can be pointed at the collector. The collector speaks OTLP, which is the fancy combined metrics/traces/logs protocol the OTEL SDKs/agents use, but it also speaks Prometheus, Zipkin, etc., to give you an easy migration path. We currently ship to Datadog as well as an internal service, with the end goal of gradually migrating off Datadog.
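As a rough illustration of "pointing the agent at the collector" with a single variable, here is a minimal Python sketch. It assumes the variable is the standard `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable and that the transport is OTLP over HTTP; the comment above does not say which variable or transport their charts actually use. The helper name `otlp_http_url` and the `otel-collector` hostname are hypothetical, and real OTel SDKs do this resolution internally.

```python
import os

# Default OTLP/HTTP endpoint when nothing is configured.
DEFAULT_OTLP_HTTP_ENDPOINT = "http://localhost:4318"

# OTLP carries all three signals; over HTTP each one gets its own path.
SIGNALS = ("traces", "metrics", "logs")


def otlp_http_url(signal, env=None):
    """Return the OTLP/HTTP URL for one signal (hypothetical helper).

    The base endpoint comes from OTEL_EXPORTER_OTLP_ENDPOINT, which is
    the kind of value a Helm chart variable could inject into a pod.
    """
    if signal not in SIGNALS:
        raise ValueError(f"unknown signal: {signal!r}")
    env = os.environ if env is None else env
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_OTLP_HTTP_ENDPOINT)
    # Per-signal paths are appended to the base endpoint.
    return base.rstrip("/") + "/v1/" + signal
```

With the variable set to something like `http://otel-collector:4318`, every signal goes to the collector, and the collector's own configuration decides whether it is forwarded to Datadog, an internal service, or both. The application never needs to know about the backend.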
by MajimasEyepatch on 11/16/23, 6:07 PM
by Jedd on 11/17/23, 12:33 AM
Partly this lets us easily re-route & duplicate telemetry, partly it means changes to backend products in the future won't be a big disruption.
For metrics we're a mostly telegraf->prometheus->grafana mimir shop - telegraf because it's rock solid and feature-rich, prometheus because there's no real competition in that tier, and mimir because of scale & self-host options.
Our scale problem means most online pricing calculators generate overflow errors.
Our non-security log destination preference is Loki - for similar reasons to Mimir - though a SIEM it definitely is not.
Tracing to a vendor, but looking to bring that back to grafana Tempo. Product maturity is a long way off commercial APM offerings, but it feels like the feature-set is about 70% there and converging rapidly. Off-the-shelf tracing products have an appealingly low cost of entry, which only briefly defers lock-in & pricing shocks.
by nevon on 11/16/23, 9:04 PM
by nullify88 on 11/17/23, 7:17 AM
Luckily the prometheus exporters have a switch to enable this behaviour, but there's talk of removing this functionality because it breaks the spec. If you were to use the OpenTelemetry protocol into something like Mimir, you don't have the option of enabling that behaviour unless you use prometheus remote write.
Our developers aren't fans of that.
https://opentelemetry.io/docs/specs/otel/compatibility/prome...
by roskilli on 11/16/23, 7:43 PM
Have encountered this a lot from teams attempting to use the metrics SDK.
Are you open to commenting on specifics here, and also on what kind of shim you had to put in front of the SDK? It would be great to keep gathering feedback so that we as a community have a good idea of what remains before it's possible to use the SDK in anger for real-world production use cases. Just wiring up the setup in your app used to be fairly painful, but that has gotten somewhat better over the last 12-24 months. I'd also love to hear what is currently causing compatibility issues with the metric types themselves when using the SDK, such that a shim is required, and what the shim is doing to achieve compatibility.
by caust1c on 11/16/23, 6:03 PM
Congrats too! As I understand it from stories I've heard from others, migrating to OTel is no easy undertaking.
by throwaway084t95 on 11/16/23, 10:25 PM
by tsamba on 11/16/23, 7:42 PM
by shoelessone on 11/16/23, 10:31 PM
In theory you can send telemetry data with OTel to CloudWatch, but I've struggled to connect the dots with the front end application (e.g. React/Next.js).
by jon-wood on 11/17/23, 10:11 AM
This is endemic now. Doesn't matter what someone is writing about there'll be some pointless stock photo taking up half the page. There'll probably be some more throughout the page. Stop it please.
by k__ on 11/16/23, 6:19 PM