2. ML Experiments

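import matplotlib.pyplot as plt
from ai4water.experiments import MLRegressionExperiments
from utils import make_data

# Use Times New Roman in all figures; matplotlib falls back to its
# default font (with a "findfont" warning) if it is not installed.
plt.rcParams["font.family"] = "Times New Roman"

# Load the data with the categorical features one-hot encoded ("ohe").
data, _, _ = make_data(encoding="ohe")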

print(data.shape)
(1514, 75)
data.head()
Adsorption Time (min) Pyrolysis Temperature Pyrolysis Time (min) Initial Concentration Solution pH Adsorbent Loading Volume (L) Adsorption Temperature Surface Area Pore Volume Adsorbent_0 Adsorbent_1 Adsorbent_2 Adsorbent_3 Adsorbent_4 Adsorbent_5 Adsorbent_6 Adsorbent_7 Adsorbent_8 Adsorbent_9 Adsorbent_10 Adsorbent_11 Adsorbent_12 Adsorbent_13 Adsorbent_14 Adsorbent_15 Adsorbent_16 Adsorbent_17 Adsorbent_18 Adsorbent_19 Adsorbent_20 Adsorbent_21 Adsorbent_22 Adsorbent_23 Adsorbent_24 Adsorbent_25 Adsorbent_26 Adsorbent_27 Adsorbent_28 Adsorbent_29 Adsorbent_30 Adsorbent_31 Adsorbent_32 Adsorbent_33 Adsorbent_34 Adsorbent_35 Adsorbent_36 Adsorbent_37 Adsorbent_38 Adsorbent_39 Adsorbent_40 Adsorbent_41 Adsorbent_42 Adsorbent_43 Adsorbent_44 Adsorbent_45 Adsorbent_46 Adsorbent_47 Dye_0 Dye_1 Dye_2 Dye_3 Dye_4 Dye_5 Dye_6 Dye_7 Dye_8 Dye_9 Dye_10 Dye_11 Dye_12 Dye_13 Dye_14 Dye_15 Adsorption
0 0.0 25 0.0 200.0 2.8 10.0 1.0 25.0 2.75 0.0001 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0000
1 5.0 25 0.0 200.0 2.8 10.0 1.0 25.0 2.75 0.0001 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 6.0000
2 10.0 25 0.0 200.0 2.8 10.0 1.0 25.0 2.75 0.0001 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 11.4310
3 15.0 25 0.0 200.0 2.8 10.0 1.0 25.0 2.75 0.0001 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 18.3542
4 20.0 25 0.0 200.0 2.8 10.0 1.0 25.0 2.75 0.0001 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 19.2639
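
The `Adsorbent_0` to `Adsorbent_47` and `Dye_0` to `Dye_15` columns look like the result of one-hot encoding two categorical features. The sketch below illustrates that idea in plain Python. The helper `one_hot` and the category labels are hypothetical, made up for illustration, and are not part of `utils` or ai4water.

```python
def one_hot(values, prefix):
    """Return (column_names, rows) with one 0/1 column per distinct value."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    names = [f"{prefix}_{i}" for i in range(len(categories))]
    rows = []
    for value in values:
        row = [0.0] * len(categories)
        row[index[value]] = 1.0  # exactly one indicator is set per sample
        rows.append(row)
    return names, rows


# Made-up category labels, for illustration only.
dyes = ["dye_a", "dye_b", "dye_a", "dye_c"]
names, rows = one_hot(dyes, prefix="Dye")
print(names)
for row in rows:
    print(row)

# Column count of the data shown above: 10 numeric inputs,
# 48 adsorbent indicators, 16 dye indicators and 1 target.
n_columns = 10 + 48 + 16 + 1
print(n_columns)
```

The column count agrees with the second entry of `data.shape`.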


Initialize the experiment

comparisons = MLRegressionExperiments(
    input_features=data.columns.tolist()[0:-1],
    output_features=data.columns.tolist()[-1:],
    split_random=True,
    seed=1575,
    verbosity=0,
    show=False
)

Fit (train) the selected models

comparisons.fit(
    data=data,
    run_type="dry_run",
    include=['XGBRegressor',
             'AdaBoostRegressor', 'LinearSVR',
             'BaggingRegressor', 'DecisionTreeRegressor',
             'HistGradientBoostingRegressor',
             'ExtraTreesRegressor', 'ExtraTreeRegressor',
             'LinearRegression', 'KNeighborsRegressor']
)
***** Training *****
input_x shape:  (847, 74)
target shape:  (847, 1)
***** Validation *****
input_x shape:  (212, 74)
target shape:  (212, 1)
***** Test *****
input_x shape:  (455, 74)
target shape:  (455, 1)
running  XGBRegressor model
findfont: Font family ['Times New Roman'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Times New Roman'] not found. Falling back to DejaVu Sans.
divide by zero encountered in true_divide
divide by zero encountered in log
invalid value encountered in log
divide by zero encountered in true_divide
divide by zero encountered in log
invalid value encountered in log
running  AdaBoostRegressor model
divide by zero encountered in true_divide
divide by zero encountered in log
divide by zero encountered in true_divide
divide by zero encountered in log
running  LinearSVR model
Liblinear failed to converge, increase the number of iterations.
divide by zero encountered in true_divide
divide by zero encountered in log
invalid value encountered in log
divide by zero encountered in true_divide
divide by zero encountered in log
invalid value encountered in log
running  BaggingRegressor model
divide by zero encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in log
divide by zero encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in log
running  DecisionTreeRegressor model
invalid value encountered in true_divide
divide by zero encountered in log
divide by zero encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in log
running  HistGradientBoostingRegressor model
divide by zero encountered in true_divide
divide by zero encountered in log
invalid value encountered in log
divide by zero encountered in true_divide
divide by zero encountered in log
invalid value encountered in log
running  ExtraTreesRegressor model
divide by zero encountered in true_divide
divide by zero encountered in log
divide by zero encountered in true_divide
divide by zero encountered in log
running  ExtraTreeRegressor model
divide by zero encountered in true_divide
divide by zero encountered in log
divide by zero encountered in true_divide
divide by zero encountered in log
running  LinearRegression model
divide by zero encountered in true_divide
divide by zero encountered in log
invalid value encountered in log
divide by zero encountered in true_divide
divide by zero encountered in log
invalid value encountered in log
running  KNeighborsRegressor model
divide by zero encountered in true_divide
divide by zero encountered in log
divide by zero encountered in true_divide
divide by zero encountered in log
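
The training, validation and test sizes printed above (847, 212 and 455) are consistent with holding out 30% of the 1514 rows for testing and then 20% of the remainder for validation, rounding each held-out size up. These fractions are an assumption inferred from the printed shapes; they are not stated in the script. A quick check of the arithmetic:

```python
import math

n_samples = 1514      # rows in `data`
test_fraction = 0.3   # assumed, inferred from the printed shapes
val_fraction = 0.2    # assumed, applied to the non-test remainder

n_test = math.ceil(n_samples * test_fraction)
n_remaining = n_samples - n_test
n_val = math.ceil(n_remaining * val_fraction)
n_train = n_remaining - n_val

print(n_train, n_val, n_test)
```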

Compare R2

_ = comparisons.compare_errors(
    'r2',
    data=data)
plt.tight_layout()
plt.show()
Figure: R2 of each model (panels: Train, Test)
***** Training *****
input_x shape:  (847, 74)
target shape:  (847, 1)
***** Validation *****
input_x shape:  (212, 74)
target shape:  (212, 1)
***** Test *****
input_x shape:  (455, 74)
target shape:  (455, 1)
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in true_divide
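
The `divide by zero encountered in true_divide` and `... in log` messages are NumPy runtime warnings. A plausible cause, stated here as an assumption, is that some of the error metrics computed for each model divide by the target or take its logarithm, and the target contains zeros (the first `Adsorption` value in the table above is 0.0000). The sketch below is not ai4water's implementation; it shows one way a relative error can skip zero targets instead of dividing by them. The predictions are made up.

```python
def mean_absolute_percentage_error(true, predicted):
    """MAPE over the samples whose true value is non-zero.

    Returns (mape_in_percent, number_of_skipped_samples).
    """
    terms = []
    skipped = 0
    for t, p in zip(true, predicted):
        if t == 0:
            skipped += 1  # dividing by t would be a division by zero
            continue
        terms.append(abs((t - p) / t))
    if not terms:
        raise ValueError("all true values are zero")
    return 100.0 * sum(terms) / len(terms), skipped


# First target values from the table above; predictions are made up.
true = [0.0, 6.0, 11.431, 18.3542, 19.2639]
predicted = [0.5, 5.0, 12.0, 18.0, 20.0]
mape, skipped = mean_absolute_percentage_error(true, predicted)
print(round(mape, 2), skipped)
```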

Compare MSE

_ = comparisons.compare_errors(
    'mse',
    data=data,
    cutoff_val=1e7,
    cutoff_type="less"
)
plt.tight_layout()
plt.show()
Figure: MSE of each model (panels: Train, Test)
***** Training *****
input_x shape:  (847, 74)
target shape:  (847, 1)
***** Validation *****
input_x shape:  (212, 74)
target shape:  (212, 1)
***** Test *****
input_x shape:  (455, 74)
target shape:  (455, 1)
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in true_divide
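
The `cutoff_val` and `cutoff_type` arguments in the call above restrict the comparison to models whose error passes a threshold. The sketch below shows the idea in plain Python with made-up predictions. `filter_by_cutoff` is a hypothetical helper, and whether the library uses a strict or non-strict comparison is an assumption.

```python
def mse(true, predicted):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(true, predicted)) / len(true)


def filter_by_cutoff(scores, cutoff_val, cutoff_type):
    """Keep models scoring below ("less") or above ("greater") the cutoff.

    Strict comparison is an assumption made for this sketch.
    """
    if cutoff_type == "less":
        return {m: s for m, s in scores.items() if s < cutoff_val}
    if cutoff_type == "greater":
        return {m: s for m, s in scores.items() if s > cutoff_val}
    raise ValueError(f"unknown cutoff_type: {cutoff_type!r}")


# Made-up targets and predictions, for illustration only.
true = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8],
    "model_b": [10000.0, 2.0, 3.0, 4.0],  # one wildly wrong prediction
}
scores = {name: mse(true, pred) for name, pred in predictions.items()}
kept = filter_by_cutoff(scores, cutoff_val=1e7, cutoff_type="less")
print(sorted(kept))
```

Filtering this way keeps a single diverging model from stretching the axis of the comparison plot.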

Compare R2 score, keeping only models above a cutoff

best_models = comparisons.compare_errors(
    'r2_score',
    cutoff_type='greater',
    cutoff_val=0.01,
    data=data
)
plt.tight_layout()
plt.show()
Figure: R2 score of each model above the cutoff (panels: Train, Test)
***** Training *****
input_x shape:  (847, 74)
target shape:  (847, 1)
***** Validation *****
input_x shape:  (212, 74)
target shape:  (212, 1)
***** Test *****
input_x shape:  (455, 74)
target shape:  (455, 1)
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in true_divide
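
The script uses two different metric names, `'r2'` and `'r2_score'`. One common distinction, assumed here rather than confirmed by the source, is that the first is the squared Pearson correlation and the second is the coefficient of determination. The two can differ sharply, and only the second can be negative, which would make a `'greater'` cutoff useful. A plain-Python sketch with made-up data:

```python
from statistics import mean


def r2_score(true, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    t_mean = mean(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, predicted))
    ss_tot = sum((t - t_mean) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


def pearson_r_squared(true, predicted):
    """Squared Pearson correlation between true and predicted values."""
    t_mean, p_mean = mean(true), mean(predicted)
    cov = sum((t - t_mean) * (p - p_mean) for t, p in zip(true, predicted))
    var_t = sum((t - t_mean) ** 2 for t in true)
    var_p = sum((p - p_mean) ** 2 for p in predicted)
    return cov ** 2 / (var_t * var_p)


# Made-up data: predictions follow the trend but carry a constant bias.
true = [1.0, 2.0, 3.0, 4.0]
biased = [t + 10.0 for t in true]

print(pearson_r_squared(true, biased))  # correlation ignores the bias
print(r2_score(true, biased))           # the bias is penalized heavily
```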

Taylor plot

comparisons.taylor_plot(data=data)
Figure: Taylor plot (panels: Train, Test)
***** Training *****
input_x shape:  (847, 74)
target shape:  (847, 1)
***** Validation *****
input_x shape:  (212, 74)
target shape:  (212, 1)
***** Test *****
input_x shape:  (455, 74)
target shape:  (455, 1)
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in true_divide

<Figure size 500x800 with 2 Axes>
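
A Taylor diagram summarizes each model with three statistics: the standard deviation of its predictions, their correlation with the observations, and the centered root-mean-square difference. These are linked by a law-of-cosines identity, which is what allows all three to be read from a single point. The sketch below checks that identity in plain Python. The predictions are made up, and this is not the code ai4water uses to draw the figure.

```python
import math
from statistics import mean


def taylor_statistics(reference, model):
    """Return (std_ref, std_model, correlation, centered_rms_difference)."""
    n = len(reference)
    r_mean, m_mean = mean(reference), mean(model)
    r_dev = [r - r_mean for r in reference]
    m_dev = [m - m_mean for m in model]
    std_r = math.sqrt(sum(d * d for d in r_dev) / n)
    std_m = math.sqrt(sum(d * d for d in m_dev) / n)
    corr = sum(a * b for a, b in zip(r_dev, m_dev)) / (n * std_r * std_m)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(r_dev, m_dev)) / n)
    return std_r, std_m, corr, crmsd


# First target values from the table above; predictions are made up.
reference = [0.0, 6.0, 11.431, 18.3542, 19.2639]
model = [1.0, 5.0, 12.0, 17.0, 21.0]
std_r, std_m, corr, crmsd = taylor_statistics(reference, model)

# Identity: crmsd^2 = std_r^2 + std_m^2 - 2 * std_r * std_m * corr
lhs = crmsd ** 2
rhs = std_r ** 2 + std_m ** 2 - 2 * std_r * std_m * corr
print(math.isclose(lhs, rhs))
```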

Empirical distribution function (EDF) plots

comparisons.compare_edf_plots(
    data=data,
    exclude=["SGDRegressor", "KernelRidge", "PoissonRegressor"])

plt.tight_layout()
plt.show()
Figure: empirical distribution function plots of the compared models
***** Training *****
input_x shape:  (847, 74)
target shape:  (847, 1)
***** Validation *****
input_x shape:  (212, 74)
target shape:  (212, 1)
***** Test *****
input_x shape:  (455, 74)
target shape:  (455, 1)
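
An empirical distribution function gives, for each value x, the fraction of samples that are less than or equal to x. Assuming the plot above is drawn from the absolute prediction errors of each model (an assumption, not confirmed by the script), the curve for one model could be computed as in this plain-Python sketch. The predictions are made up.

```python
def empirical_distribution(values):
    """Return the sorted values and their cumulative probabilities i/n."""
    xs = sorted(values)
    n = len(xs)
    probs = [(i + 1) / n for i in range(n)]
    return xs, probs


# First target values from the table above; predictions are made up.
true = [0.0, 6.0, 11.431, 18.3542, 19.2639]
predicted = [0.5, 5.0, 12.0, 18.0, 20.0]
abs_errors = [abs(t - p) for t, p in zip(true, predicted)]

xs, probs = empirical_distribution(abs_errors)
for x, prob in zip(xs, probs):
    print(f"{x:.4f} {prob:.2f}")
```

A curve that rises quickly toward 1 indicates that most errors are small.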

Regression plots

_ = comparisons.compare_regression_plots(data=data, figsize=(12, 14))
Figure: regression plots of the compared models
***** Training *****
input_x shape:  (847, 74)
target shape:  (847, 1)
***** Validation *****
input_x shape:  (212, 74)
target shape:  (212, 1)
***** Test *****
input_x shape:  (455, 74)
target shape:  (455, 1)

Residual plots

_ = comparisons.compare_residual_plots(data=data, figsize=(12, 14))
Figure: residual plots of the compared models
***** Training *****
input_x shape:  (847, 74)
target shape:  (847, 1)
***** Validation *****
input_x shape:  (212, 74)
target shape:  (212, 1)
***** Test *****
input_x shape:  (455, 74)
target shape:  (455, 1)

Total running time of the script: (1 minutes 56.864 seconds)
