Good ideas need a bird's-eye view of the landscape of existing work, and papers are the best way to get it. Bookmarking a few notes for future reference.
Paper #1 - Log-based software monitoring: a systematic mapping study
Key Notes
- The Lifecycle of Log
- Possible components would be Elasticsearch, Logstash, and Kibana
- Kibana provides an interface for visualization, query, and exploration of log data
- LOGGING - (1) empirical studies on logging practices, (2) requirements for application logs, and (3) implementation of log statements
- LOG INFRASTRUCTURE - (1) log parsing, and (2) log storage
- LOG ANALYSIS - (1) anomaly detection, (2) security and privacy, (3) root cause analysis, (4) failure prediction, (5) quality assurance, (6) model inference and invariant mining, (7) reliability and dependability, and (8) log platforms
- Log Parsing - group log messages by the “textual similarity” between them.
- Each log is converted to a binary vector, with each element representing whether the log contains that keyword
- Transformer - TEMPLATE2VEC (as an alternative to WORD2VEC) to represent templates extracted from logs, and LSTMs to learn common sequences of log events
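The keyword-based binary encoding noted above can be sketched as follows. This is a minimal illustration, not the method of any specific paper: the log lines, the keyword vocabulary, and the helper names `tokenize` and `binary_vectors` are all hypothetical.

```python
import re

def tokenize(message):
    """Lowercase a log message and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", message.lower()))

def binary_vectors(messages, keywords):
    """Return one 0/1 vector per message; element i is 1 if keywords[i] occurs."""
    vectors = []
    for message in messages:
        tokens = tokenize(message)
        vectors.append([1 if keyword in tokens else 0 for keyword in keywords])
    return vectors

# Hypothetical log lines and keyword vocabulary, for illustration only.
logs = [
    "Connection timeout while reading block 42",
    "Block 42 replicated successfully",
    "Disk error: write failed on node 7",
]
keywords = ["timeout", "error", "failed", "block"]

vectors = binary_vectors(logs, keywords)
for line, vector in zip(logs, vectors):
    print(vector, line)
```

In practice the keyword vocabulary would probably come from the parsed log templates rather than a hand-written list.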
Root Cause Analysis
- By correlating log messages and resource consumption, their approach builds relationships between changes in resource consumption and application events.
- They propose a technique based on the correlation of console logs and resource usage information to link jobs with anomalous behavior and erroneous nodes.
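As a rough sketch of the correlation idea above, one could align per-minute counts of error log lines with a resource metric and compute a Pearson correlation. The sample values and the names `error_logs_per_minute` and `cpu_percent` are made up for illustration; the papers summarized here may use quite different techniques.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

# Hypothetical per-minute samples taken over the same time window.
error_logs_per_minute = [0, 1, 0, 5, 9, 2]
cpu_percent = [20, 25, 22, 70, 95, 30]

r = pearson(error_logs_per_minute, cpu_percent)
print(f"correlation between error volume and CPU usage: {r:.2f}")
```

A high correlation only flags a candidate relationship between an application event and a resource change; it does not by itself establish the root cause.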
Failure Prediction
- Utilize system logs to predict failures by mining recurring event sequences that are correlated
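One simple way to explore this idea is to count how often each short event sequence is immediately followed by a failure. The sketch below is an assumption-laden toy: the event names, the `FAIL` marker, and the function `mine_precursors` are hypothetical and not taken from the paper.

```python
from collections import Counter

def mine_precursors(events, failure="FAIL", n=2):
    """For each n-gram of non-failure events, return the fraction of its
    occurrences that are immediately followed by a failure event."""
    seen = Counter()
    before_failure = Counter()
    for i in range(len(events) - n):
        gram = tuple(events[i:i + n])
        if failure in gram:
            continue  # only score sequences made of ordinary events
        seen[gram] += 1
        if events[i + n] == failure:
            before_failure[gram] += 1
    return {gram: before_failure[gram] / seen[gram] for gram in seen}

# Hypothetical event stream, for illustration only.
events = ["A", "B", "FAIL", "A", "C", "A", "B", "FAIL", "C", "A", "B", "C"]
scores = mine_precursors(events)
for gram, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(gram, round(score, 2))
```

Sequences with a high score and enough occurrences would be candidates for an early-warning rule.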
Paper #2 - Multi-Source Anomaly Detection in Distributed IT Systems
Key Notes
- Three categories (modalities) of data: metrics, application logs, and distributed traces
- Word frequencies and metrics derived from the logs (e.g., TF-IDF)
- Decompose the trace into its building blocks, the events/spans, and predict the next span in the sequence
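To make the TF-IDF note concrete, here is a small sketch that weights the words in a handful of log messages. It uses one common variant (term frequency normalized by message length, and idf = ln(N / df)); the paper may use a different weighting, and the log messages are invented.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical log messages, for illustration only.
messages = ["user login ok", "user login failed", "disk failure detected"]
weights = tf_idf(messages)
for message, weight in zip(messages, weights):
    print(message, {term: round(value, 3) for term, value in weight.items()})
```

Words that appear in few messages (such as "ok") get a higher weight than words shared across many messages (such as "user").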
Paper #3 - LogBERT: Log Anomaly Detection via BERT
Key Notes
- LogBERT leverages the Transformer encoder to model log sequences and is trained by novel self-supervised tasks to capture the patterns of normal sequences.
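One of the self-supervised tasks described in the LogBERT paper is masked log key prediction. The sketch below shows only the data-preparation step for such a task (hiding some log keys and remembering the originals as targets), not the Transformer model itself. The mask token, the mask ratio, and the function `mask_log_keys` are assumptions for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical mask token

def mask_log_keys(sequence, mask_ratio=0.15, seed=0):
    """Replace a random subset of log keys with MASK.

    Returns the masked sequence and a {position: original_key} dict that a
    model would be trained to predict.
    """
    if not sequence:
        return [], {}
    rng = random.Random(seed)
    n_mask = max(1, round(len(sequence) * mask_ratio))
    positions = sorted(rng.sample(range(len(sequence)), n_mask))
    masked = list(sequence)
    targets = {}
    for position in positions:
        targets[position] = masked[position]
        masked[position] = MASK
    return masked, targets

# Hypothetical sequence of log keys (template IDs), for illustration only.
sequence = ["E1", "E2", "E3", "E2", "E4", "E1", "E5", "E2", "E3", "E6"]
masked, targets = mask_log_keys(sequence, mask_ratio=0.3)
print(masked)
print(targets)
```

A model trained only on normal sequences should recover the masked keys well, so poor predictions on a new sequence can be treated as a sign of anomaly.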
Baselines
- Principal Component Analysis (PCA) [19]. PCA builds a counting matrix based on the frequency of log key sequences and then reduces the original counting matrix into a low-dimensional space to detect anomalous sequences.
- One-Class SVM (OCSVM) [14]. One-Class SVM is a well-known one-class classification model and is widely used for log anomaly detection [5,16] by only observing the normal data.
- IsolationForest (iForest) [7]. Isolation forest is an unsupervised learning algorithm for anomaly detection by representing features as tree structures.
- LogCluster [6]. LogCluster is a clustering based approach, where the anomalous log sequences are detected by comparing with the existing clusters.
- DeepLog [2]. DeepLog is a state-of-the-art log anomaly detection approach. It adopts a recurrent neural network to capture patterns of normal log sequences and then identifies anomalous log sequences based on the performance of log key predictions.
- LogAnomaly [23]. LogAnomaly is a deep learning-based anomaly detection approach that is able to detect sequential and quantitative log anomalies.
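The prediction-based idea behind DeepLog can be illustrated with a much simpler stand-in. The sketch below swaps the recurrent neural network for plain next-key frequency counts learned from normal sequences, and flags a sequence when an observed key is not among the top-k most frequent successors. It is not DeepLog itself; the log keys and function names are hypothetical.

```python
from collections import Counter, defaultdict

def fit_next_key_counts(normal_sequences):
    """Count, for each log key, which keys follow it in normal sequences."""
    counts = defaultdict(Counter)
    for sequence in normal_sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return counts

def is_anomalous(sequence, counts, top_k=2):
    """Flag the sequence if any transition falls outside the top-k successors."""
    for current, following in zip(sequence, sequence[1:]):
        successors = counts.get(current, Counter())
        candidates = [key for key, _ in successors.most_common(top_k)]
        if following not in candidates:
            return True
    return False

# Hypothetical normal sequences of log keys, for illustration only.
normal = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "read", "close"],
]
counts = fit_next_key_counts(normal)
print(is_anomalous(["open", "read", "close"], counts))   # seen pattern
print(is_anomalous(["open", "close"], counts))           # unseen transition
```

A frequency table only looks one step back, whereas a recurrent model can condition on a longer history, which is the main reason the paper's baseline uses a neural network.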
Paper #4 - A Survey on Automated Log Analysis for Reliability Engineering
- Log event sequence: A sequence of log events recording system’s activities.
- Log event count vector: A feature vector recording the number of occurrences of each log event.
- The query for selected values vs Bulk Upload of Data
- Usage patterns segmented for weekday/weekend / Trading hours
- Usage patterns across different time zones
- Usage patterns across different sections of applications
- Number of ad-hoc queries
- Restrict bulk upload to certain timezones / non-peak hours
- Two-stage commit - Upload and commit at a later stage
- Limit users' app access during peak hours (5 calls)
- Limit users' app access during non-peak hours (10 calls)
- Refer to replicated data where a stop-gap delay of 5 hours is acceptable
- Pagination of results
- Cache/reuse of results
- Identify maximum reported errors
- Patterns of errors over a weekday
- User login activities and queries
- User value - Application usage vs Revenue
- User Action predictions
- Take top 100 users, Plot the sequence of usage and see common flow/patterns
- What blocking happens between:
- Page load query vs Search query
- Search query vs Data upload query
- Data upload vs Report download query
- Measure potential data conflicts that cause issues
- Understand problem statement
- Understand data sources
- Understand data access / permissions
- Frame NLP / Data / User level details
- Initial Analysis Scope
- Application Understanding
- Connects / Feedback
- User based - Create APIs / Read APIs / Update APIs - Simple / Bulk / Delete APIs - Single / Bulk
- Do we track user ID, number of calls, and average time?
- Nature of transactions - Realtime vs Reporting vs Bulk inserts vs Bulk Updates
- API calls across day by time
- % mix of workflows and the common tables they map to / access, with a time dimension added to reveal patterns
- A,B,C @ Time T1
- A,B at Time T2
- Detecting Anomalies in Software Execution Logs with Siamese Network
- How to Predict the Next Best Action to Progress Sales with TensorFlow 2
- Markov Chains in Python: Beginner Tutorial
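One of the analysis ideas listed above, segmenting usage by weekday/weekend and by trading hours, can be sketched as follows. The peak window (09:00 to 16:59), the timestamps, and the function names are assumptions for illustration; real trading hours and time zones would need to come from the application.

```python
from collections import Counter
from datetime import datetime

PEAK_HOURS = range(9, 17)  # assumption: 09:00-16:59 counts as peak hours

def segment(timestamp):
    """Classify a datetime as (weekday|weekend, peak|off-peak)."""
    day_type = "weekend" if timestamp.weekday() >= 5 else "weekday"
    period = "peak" if timestamp.hour in PEAK_HOURS else "off-peak"
    return day_type, period

def usage_by_segment(call_times):
    """Count API calls per segment from ISO 8601 timestamp strings."""
    return Counter(segment(datetime.fromisoformat(t)) for t in call_times)

# Hypothetical API call timestamps, for illustration only.
call_times = [
    "2024-03-04T10:15:00",  # Monday, peak
    "2024-03-04T20:30:00",  # Monday, off-peak
    "2024-03-05T09:00:00",  # Tuesday, peak
    "2024-03-09T11:00:00",  # Saturday, peak
]
usage = usage_by_segment(call_times)
for key, count in sorted(usage.items()):
    print(key, count)
```

The same grouping could be extended with a user ID or an application section to compare patterns across users and features.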