ML Configuration
Netdata's Machine Learning capabilities are enabled by default if the Database mode is set to db = dbengine.
Enabling or Disabling Machine Learning
To enable or disable Machine Learning capabilities on a node:
- Edit netdata.conf.
- In the [ml] section:
  - Set enabled to yes to enable ML.
  - Set enabled to no to disable ML.
  - Leave it at the default auto to enable ML only when Database mode is set to dbengine.
- Restart Netdata.
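The decision described by these steps fits in a few lines. The Python sketch below is a hypothetical helper, ml_is_enabled, that encodes only the rules stated above. It is not Netdata's implementation, and it assumes that yes and no apply regardless of the database mode.

```python
def ml_is_enabled(setting: str, db_mode: str) -> bool:
    """Hypothetical helper mirroring the documented rules for [ml] enabled."""
    setting = setting.strip().lower()
    if setting == "yes":
        return True  # explicitly enabled
    if setting == "no":
        return False  # explicitly disabled
    if setting == "auto":
        # auto: ML runs only when the database mode is dbengine
        return db_mode.strip().lower() == "dbengine"
    raise ValueError(f"unexpected value for 'enabled': {setting!r}")


print(ml_is_enabled("auto", "dbengine"))
print(ml_is_enabled("auto", "ram"))
```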
Available Configuration Parameters
Below is a list of all available configuration parameters and their default values:
[ml]
# enabled = auto
# maximum num samples to train = 21600
# minimum num samples to train = 900
# train every = 3h
# number of models per dimension = 18
# dbengine anomaly rate every = 30
# num samples to diff = 1
# num samples to smooth = 3
# num samples to lag = 5
# random sampling ratio = 0.2
# maximum number of k-means iterations = 1000
# dimension anomaly score threshold = 0.99
# host anomaly rate threshold = 1.0
# anomaly detection grouping method = average
# anomaly detection grouping duration = 5m
# hosts to skip from training = !*
# charts to skip from training = netdata.*
# dimension anomaly rate suppression window = 15m
# dimension anomaly rate suppression threshold = 450
# delete models older than = 7d
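Every line in the listing above is commented out, so each parameter keeps its default until you uncomment and change it. A minimal Python sketch of that override behavior, using the standard configparser module, follows. ML_DEFAULTS (a subset of the listing), load_ml_section, and parse_duration are hypothetical names, and the s/m/h/d duration suffixes are an assumption based on values such as 3h, 5m, and 7d. Real netdata.conf files may contain syntax that configparser does not accept.

```python
import configparser

# Subset of the defaults from the [ml] listing, kept as strings like the file.
ML_DEFAULTS = {
    "enabled": "auto",
    "maximum num samples to train": "21600",
    "minimum num samples to train": "900",
    "train every": "3h",
    "number of models per dimension": "18",
    "anomaly detection grouping duration": "5m",
    "delete models older than": "7d",
}

# Assumed duration suffixes, in seconds.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Convert '3h', '5m', '7d' or a bare number of seconds into seconds."""
    text = text.strip()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def load_ml_section(conf_text):
    """Start from the defaults and apply any uncommented [ml] lines."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)  # lines starting with '#' are ignored as comments
    settings = dict(ML_DEFAULTS)
    if parser.has_section("ml"):
        settings.update(parser["ml"])
    return settings


conf = """
[ml]
# enabled = auto
enabled = yes
train every = 6h
"""
settings = load_ml_section(conf)
print(settings["enabled"], parse_duration(settings["train every"]))
```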
Configuration Examples
If you want to run ML on a parent instead of at the edge, the examples below illustrate various configurations.
This example assumes three child nodes streaming to one parent node. It shows different ways to configure ML:
- Running ML on the parent for some or all children.
- Running ML on the children themselves.
- A mixed approach.
# Parent will run ML for itself and Child 1 & 2, but skip Child 0.
# Child 0 and Child 1 will run ML independently.
# Child 2 will rely on the parent for ML and will not run it itself.
# Parent configuration
[ml]
enabled = yes
hosts to skip from training = child-0-ml-enabled
# Child 0 configuration
[ml]
enabled = yes
# Child 1 configuration
[ml]
enabled = yes
# Child 2 configuration
[ml]
enabled = no
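To make the example concrete, the sketch below models which node trains models for which host. The Node class, training_plan, and the hostnames parent, child-1, and child-2 are hypothetical (only child-0-ml-enabled comes from the example), and hosts to skip from training is treated as a set of exact hostnames rather than a full pattern.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    ml_enabled: bool
    children: list = field(default_factory=list)
    hosts_to_skip: set = field(default_factory=set)  # exact hostnames only in this sketch


def training_plan(nodes):
    """Return {node that runs ML: [hosts it trains models for]}."""
    plan = {}
    for node in nodes:
        if not node.ml_enabled:
            continue  # enabled = no: this node trains nothing, not even for itself
        candidates = [node.name] + node.children
        plan[node.name] = [h for h in candidates if h not in node.hosts_to_skip]
    return plan


nodes = [
    Node("parent", True, ["child-0-ml-enabled", "child-1", "child-2"], {"child-0-ml-enabled"}),
    Node("child-0-ml-enabled", True),
    Node("child-1", True),
    Node("child-2", False),
]
print(training_plan(nodes))
```

With these settings the parent trains for itself, Child 1, and Child 2; Child 0 and Child 1 also train their own models; Child 2 trains nothing locally.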
Parameter Descriptions (Min/Max Values)
General Settings
- enabled: Controls whether ML is enabled. Set it to yes to enable ML, no to disable it, or auto to let Netdata decide based on the database mode.
- maximum num samples to train (3600-86400): Defines the maximum training period. At the default collection interval of one second, the default of 21600 trains on the last 6 hours of data.
- minimum num samples to train (900-21600): The minimum amount of data needed to train a model. With the default of 900, training is skipped if fewer than 900 samples (15 minutes of data at one sample per second) are available.
- train every (3h-6h): Determines how often models are retrained. The default of 3h means retraining occurs every three hours. Training is staggered to distribute system load.
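The relation between sample counts and time is simple arithmetic: samples × collection interval. The hypothetical Python helpers below assume one sample per dimension every update_every seconds (1 by default) and show how the minimum and maximum limits combine when a training round starts.

```python
def samples_to_seconds(samples, update_every=1):
    """Each stored sample covers one collection interval of update_every seconds."""
    return samples * update_every


def training_window(available_samples, minimum=900, maximum=21600):
    """Return how many recent samples to train on, or 0 when training is skipped."""
    if available_samples < minimum:
        return 0  # not enough history yet: skip this training round
    return min(available_samples, maximum)  # never train on more than the maximum


print(samples_to_seconds(21600) / 3600)  # hours covered by the default maximum
print(training_window(600), training_window(50000))
```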
Model Behavior
- number of models per dimension (1-168): Specifies how many trained models per dimension are used for anomaly detection. With the default of 18 and retraining every 3 hours, models trained over roughly the last 54 hours are considered.
- dbengine anomaly rate every (30-900): Defines how frequently Netdata aggregates anomaly bits into a single chart.
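The ~54 hour figure is number of models per dimension × train every (18 × 3h). A hypothetical per-dimension store that keeps only the newest models can be sketched with a bounded deque. This illustrates the arithmetic and is not Netdata's actual data structure.

```python
from collections import deque


def model_history_hours(models_per_dimension=18, train_every_hours=3):
    """Each retained model is one retraining cycle older than the next."""
    return models_per_dimension * train_every_hours


class DimensionModels:
    """Hypothetical store that keeps only the newest N models of one dimension."""

    def __init__(self, models_per_dimension=18):
        self.models = deque(maxlen=models_per_dimension)

    def add(self, model):
        # once the deque is full, appending silently drops the oldest model
        self.models.append(model)


store = DimensionModels()
for i in range(25):
    store.add(f"model-{i}")
print(model_history_hours(), len(store.models), store.models[0])
```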
Feature Processing
- num samples to diff (0-1): Determines whether ML operates on raw data (0) or on differences between consecutive samples (1). Using differences helps detect anomalies in cyclical patterns.
- num samples to smooth (0-5): Controls data smoothing. The default of 3 averages the last three values to reduce noise.
- num samples to lag (0-5): Defines how many past values are included in the feature vector. The default of 5 helps the model detect patterns over time.
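The three feature-processing steps can be sketched as small list transformations. This Python sketch is a hypothetical reimplementation, not Netdata's code: it assumes the steps run in the order listed (diff, then smooth, then lag) and that each feature vector holds the current value plus num samples to lag previous values.

```python
def diff(values, n):
    """n == 0 keeps raw values; n == 1 takes differences between consecutive samples."""
    if n == 0:
        return list(values)
    return [values[i] - values[i - n] for i in range(n, len(values))]


def smooth(values, n):
    """Trailing moving average over n samples; n <= 1 leaves the series unchanged."""
    if n <= 1:
        return list(values)
    return [sum(values[i - n + 1:i + 1]) / n for i in range(n - 1, len(values))]


def lag(values, n):
    """Each feature vector is [n previous values..., current value]."""
    return [values[i - n:i + 1] for i in range(n, len(values))]


def build_features(raw, n_diff=1, n_smooth=3, n_lag=5):
    return lag(smooth(diff(raw, n_diff), n_smooth), n_lag)


raw = [10, 12, 11, 15, 14, 18, 17, 21, 20, 24, 23, 27]
features = build_features(raw)
print(len(features), len(features[0]))
```

With the defaults, each feature vector has six values and the pipeline consumes the first 1 + 2 + 5 = 8 samples before producing its first vector.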
Training Efficiency
- random sampling ratio (0.2-1.0): Controls the fraction of data used for training. The default of 0.2 means 20% of the available data is used, reducing system load.
- maximum number of k-means iterations: Limits the number of iterations during k-means clustering. Leave it at the default in most cases.
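The sketch below shows what these two settings control: a random subsample of the feature vectors, followed by Lloyd's k-means, which stops after maximum number of k-means iterations rounds or earlier if the centers stop moving. It is a plain Python illustration; the choice of two clusters and the helper names are assumptions, not a description of Netdata's internal clustering library.

```python
import random


def subsample(points, ratio, rng):
    """Randomly keep about `ratio` of the training points (at least one)."""
    count = max(1, int(len(points) * ratio))
    return rng.sample(points, count)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, max_iter=1000, rng=None):
    """Lloyd's k-means; stops when centers stop moving or after max_iter rounds."""
    rng = rng or random.Random(0)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # update step: move each center to its cluster mean (keep it if the cluster is empty)
        new_centers = [
            [sum(col) / len(cluster) for col in zip(*cluster)] if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged before hitting max_iter
        centers = new_centers
    return centers
```

In use, training would look like kmeans(subsample(features, 0.2, rng), k=2, max_iter=1000, rng=rng): a lower ratio means fewer distance computations per iteration, and the iteration cap bounds the worst-case cost when the centers converge slowly.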
Anomaly Detection Sensitivity
- dimension anomaly score threshold (0.01-5.00): Sets the anomaly score threshold for flagging a value as anomalous. The default of 0.99 flags values that fall in the top 1% of anomaly scores relative to the training data.
- host anomaly rate threshold (0.1-10.0): Defines the percentage of dimensions that must be anomalous for the host to be considered anomalous. The default of 1.0 means more than 1% of dimensions must be anomalous.
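One way to read the two thresholds is sketched below. The sketch assumes a dimension's anomaly score is its distance to the nearest cluster center, rescaled so that the smallest and largest distances seen during training map to 0 and 1, which is why thresholds above 1.0 are possible. The function names are hypothetical.

```python
def anomaly_score(distance, train_min, train_max):
    """Rescale a distance against the range of distances seen during training."""
    if train_max == train_min:
        return 0.0  # degenerate training data: treat everything as normal
    return (distance - train_min) / (train_max - train_min)


def is_anomalous(distance, train_min, train_max, threshold=0.99):
    """Dimension-level check against dimension anomaly score threshold."""
    return anomaly_score(distance, train_min, train_max) >= threshold


def host_is_anomalous(anomaly_bits, threshold_pct=1.0):
    """Host-level check: more than threshold_pct percent of dimensions are anomalous."""
    rate_pct = 100.0 * sum(anomaly_bits) / len(anomaly_bits)
    return rate_pct > threshold_pct


print(is_anomalous(9.95, 0, 10), host_is_anomalous([1] * 3 + [0] * 197))
```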
Anomaly Detection Grouping
- anomaly detection grouping method: Defines the method used to calculate the node-level anomaly rate. The default is average.
- anomaly detection grouping duration (1m-15m): Determines the time window for calculating anomaly rates. The default of 5m calculates anomaly rates over a 5-minute rolling window.
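A rolling average over the grouping window can be sketched with a bounded deque. This assumes the node-level anomaly rate is sampled once every update_every seconds and that the average method is a plain arithmetic mean; RollingAnomalyRate is a hypothetical name.

```python
from collections import deque


class RollingAnomalyRate:
    """Hypothetical rolling average of the node-level anomaly rate."""

    def __init__(self, window_seconds=300, update_every=1):
        # 5m window / 1s interval = 300 slots; older values fall off automatically
        self.values = deque(maxlen=window_seconds // update_every)

    def add(self, rate_pct):
        """Record the latest rate (in percent) and return the window average."""
        self.values.append(rate_pct)
        return sum(self.values) / len(self.values)


rolling = RollingAnomalyRate(window_seconds=4)
results = [rolling.add(v) for v in [0, 0, 4, 4, 8]]
print(results)
```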
Skipping Hosts and Charts
- hosts to skip from training: Excludes specific child hosts from training. The default of !* means no hosts are skipped.
- charts to skip from training: Excludes charts from anomaly detection. By default (netdata.*), Netdata's own charts are excluded to prevent false anomalies caused by normal dashboard activity.
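Both skip settings take Netdata simple patterns. The sketch below approximates their commonly documented semantics: space-separated patterns, * as the only wildcard, a leading ! for negation, and the first matching pattern deciding the result. It is an illustrative reimplementation under those assumptions, not Netdata's matcher.

```python
import re


def _compile(glob):
    """Turn a glob where '*' is the only wildcard into an anchored regex."""
    return re.compile("^" + ".*".join(re.escape(part) for part in glob.split("*")) + "$")


def simple_pattern_matches(patterns, name):
    """True if `name` should be skipped according to a space-separated pattern list."""
    for token in patterns.split():
        negative = token.startswith("!")
        glob = token[1:] if negative else token
        if _compile(glob).match(name):
            return not negative  # the first matching pattern decides
    return False  # no pattern matched: not skipped


print(simple_pattern_matches("!*", "child-0-ml-enabled"))
print(simple_pattern_matches("netdata.*", "netdata.server_cpu"))
```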
Model Retention
- delete models older than (1d-7d): Defines how long old models are stored. The default of 7d removes unused models after seven days.
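Retention can be sketched as a filter on model timestamps. The (model_id, last_updated) tuple layout and prune_models are hypothetical; the sketch only shows how a 7d cutoff is applied.

```python
DAY = 24 * 60 * 60  # seconds


def prune_models(models, now, max_age=7 * DAY):
    """Keep models whose last-updated timestamp falls inside the retention window.

    models: list of (model_id, last_updated_unix_ts) tuples (hypothetical layout).
    """
    cutoff = now - max_age
    return [(model_id, ts) for model_id, ts in models if ts >= cutoff]


now = 10 * DAY
models = [("a", 9 * DAY), ("b", 2 * DAY), ("c", 3 * DAY)]
print(prune_models(models, now))
```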