WiseWeasel 6 years ago

Today's uninteresting log noise is tomorrow's critical data.

I've been loving Kibana for filtering and reporting on log data in flexible and insightful ways, including automatically generated charts for certain data sources.

  • sebcat 6 years ago

    Yes. Using loglevels/priorities, facilities and identities in a sane way, your logs would already be classified.

    Let's say there's a service failure and I want to know what the service did prior to the failure. I wouldn't want a classifier to filter the logs in that case, so that use case is out of the picture. What use cases other than filtering are there for this? Maybe as a way to provide feedback to developers to fix the log messages, as in "this thing that we log all the time can be determined to never affect the process of troubleshooting our services, and the classifier thinks it's noise, so we'll remove it".

peterevans 6 years ago

It would be neat if MachineBox could sense whether log noise would be useful in other contexts--e.g., as a metric that can be graphed. Or whether your logging is lacking something that might be useful, or just lacking signal at all (hey, user, your logs are just noise!).

bpchaps 6 years ago

One of the ways that I do this (assuming you have access to unix utilities) is to do:

  cat output.log | tr -d '0-9' | sort | uniq -c | sort -n

This is a fairly useful way of removing relatively useless information such as timestamps and line numbers when you're looking for rare or unique events. The alternative, I think, is to do a bunch of awk or sed magic, which isn't really fun for anybody. It's especially useful in a time crunch when there's an ongoing outage.

  • _ZeD_ 6 years ago

    Honestly, I find it really fun to do "a bunch of awk or sed magic".

    • bpchaps 6 years ago

      Except for us weirdos :p

  • apotheosis 6 years ago

    It would be nice if you told us what this command does for you.

    • bpchaps 6 years ago

      It removes numbers from a log file, sorts it, groups and counts unique lines, then sorts numerically by the count of each unique line.

      But don't take my word for it. Try it yourself!
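
For anyone without the unix utilities at hand, the steps described above can be sketched in Python. This is only a rough equivalent under my own assumptions: the function name `rare_lines` and the sample lines are made up for illustration, and ties are ordered by line text rather than by the locale rules `sort` would use.

```python
import re
from collections import Counter


def rare_lines(lines):
    """Strip digits, count identical lines, return (count, line) pairs, rarest first."""
    # Deleting every digit collapses lines that differ only in timestamps or ids.
    stripped = (re.sub(r"[0-9]", "", line.rstrip("\n")) for line in lines)
    counts = Counter(stripped)
    # Sorting the (count, text) tuples puts the rarest lines first,
    # roughly like `sort -n` applied to the output of `uniq -c`.
    return sorted((n, text) for text, n in counts.items())


if __name__ == "__main__":
    # Hypothetical sample input; in practice, pass an open log file instead.
    sample = [
        "2018-01-01 12:00:01 request 17 ok",
        "2018-01-01 12:00:02 request 18 ok",
        "2018-01-01 12:00:03 disk 3 failed",
    ]
    for count, text in rare_lines(sample):
        print(count, text)
```

The rare lines end up at the top of the output, which is usually where the interesting events are.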

lopmotr 6 years ago

Is it possible to make an ML algorithm which has only "noise" data for training and then identifies abnormalities? It seems like people do that easily, and it would be ideal for an application like this where you might not have much training data on all the "not noise" types of examples.

Another application would be a security camera that detects unusual events without having to train it on actual burglars.

  • slashcom 6 years ago

    This is a subfield called anomaly detection.
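
To make the idea concrete, here is a deliberately crude sketch of training on noise only: it memorizes the "shapes" of known-noise lines and flags any line whose shape it has never seen. Real anomaly detection methods are far more sophisticated; the class name `NoiseOnlyDetector`, the `template` helper, and the digit-masking heuristic are all assumptions made for this example.

```python
import re


def template(line):
    """Reduce a log line to a rough template by masking runs of digits."""
    return re.sub(r"\d+", "<n>", line.strip())


class NoiseOnlyDetector:
    """Learns templates from known-noise lines and flags anything unseen."""

    def __init__(self):
        self.known = set()

    def fit(self, noise_lines):
        # Training data contains only examples of "noise"; no anomalies needed.
        for line in noise_lines:
            self.known.add(template(line))
        return self

    def is_anomaly(self, line):
        # A line is anomalous if its template never appeared in the noise set.
        return template(line) not in self.known


if __name__ == "__main__":
    detector = NoiseOnlyDetector().fit([
        "GET /health 200 in 3ms",
        "GET /health 200 in 12ms",
    ])
    print(detector.is_anomaly("GET /health 200 in 7ms"))
    print(detector.is_anomaly("disk 2 failed"))
```

The obvious weakness is that every new but harmless message is also flagged, so something like this would need a way to grow the noise set over time.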

kthielen 6 years ago

Maybe an easier way to go is to record it in structured form up front (it’s already structured in the original application source anyway). This makes it much easier to record efficiently (so you can record more data) and also much easier to query efficiently, so that, e.g., you can invest time in machine learning on logical data instead of having to mess around with text.

That’s what we do here anyway, it’s worked well for us:

https://github.com/Morgan-Stanley/hobbes/blob/master/README....
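
As a generic illustration of recording structured data up front and querying it by field, here is a small sketch using one JSON object per line. This is not how the linked project stores data; the JSON-lines format, the function names, and the field names are assumptions picked only to keep the example short.

```python
import json
import time


def make_record(event, **fields):
    """Build one structured log record as a dict (field names are illustrative)."""
    record = {"ts": time.time(), "event": event}
    record.update(fields)
    return record


def write_record(stream, record):
    # One JSON object per line keeps the log easy to append to and to parse back.
    stream.write(json.dumps(record, sort_keys=True) + "\n")


def query(lines, **criteria):
    """Return parsed records whose fields equal every given criterion."""
    records = (json.loads(line) for line in lines if line.strip())
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    import io

    log = io.StringIO()
    write_record(log, make_record("order_filled", symbol="ABC", qty=100))
    write_record(log, make_record("order_rejected", symbol="XYZ", qty=5))
    print(query(log.getvalue().splitlines(), event="order_rejected"))
```

The point is that a query becomes a comparison on named fields, with no regular expressions over free text.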

foo101 6 years ago

Wouldn't false negatives (a critical log being muted) be a major concern when using machine learning in this domain?

What if I never see a critical log because the trained model decided that it is unimportant? How is such a situation generally solved in the industry?

  • sannee 6 years ago

    I have limited experience, but I think that usually you would take this into account when building your loss function and heavily penalize false negatives during training.
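
A minimal sketch of what penalizing false negatives in the loss can look like, assuming a binary classifier that outputs the probability a line is critical. The function name `weighted_log_loss` and the default weight of 10 are arbitrary choices for illustration, not recommended values.

```python
import math


def weighted_log_loss(y_true, p_pred, fn_weight=10.0, eps=1e-12):
    """Mean binary cross-entropy where label 1 means "critical" and 0 means "noise".

    Errors on critical lines (potential false negatives) are scaled by fn_weight.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            # Predicting a low probability for a critical line is costly.
            total += -fn_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    # Same confidence, opposite mistakes: a missed critical line vs. a false alarm.
    print(weighted_log_loss([1], [0.1]))
    print(weighted_log_loss([0], [0.9]))
```

With this weighting, a model trained to minimize the loss is pushed toward letting some noise through before it mutes a critical line.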

arbie 6 years ago

It would be nice if logfile analysis tools (including ELK) supported logs that were multiple lines per message. Does anyone know of such tools?

vinchuco 6 years ago

I really wanted this to be about real time sound editing and not about log data.

dumpValve 6 years ago

I have never read useful log output that wasn't generated by code that I had written.

Other people's logs are mostly noxious CPU exhaust.