Initial FeO prediction

Start by importing MagmaPEC and MagmaPandas (or regular Pandas)

[1]:

import MagmaPEC as mpc
import MagmaPandas as mp

Import your melt or whole-rock data

[2]:

wholerock_file = "./data/wholerock.csv"

wholerock = mp.read_melt(wholerock_file, index_col=["name"])
wholerock.head()

[2]:

	SiO2	TiO2	Al2O3	MnO	MgO	CaO	Na2O	K2O	P2O5	Cr2O3	FeO	total
name
PI032	46.962818	3.447381	16.650124	0.174618	7.004558	9.400741	3.131253	1.290536	0.635346	0.009630	11.350806	100.057812
PI041	46.292709	3.553973	17.191874	0.170671	6.221699	9.443416	2.701110	1.076079	0.610387	0.007262	10.986336	98.255517
PI053	47.515736	3.084096	15.182294	0.169948	8.250577	9.640357	2.298223	1.013332	0.448063	0.009225	10.955438	98.567288
PI054	47.524357	3.369594	16.456587	0.174671	6.569622	9.216309	2.653994	1.367337	0.648534	0.029540	10.953660	98.964204
PI055	46.452168	3.287018	16.039806	0.169084	7.396060	9.469899	2.035204	1.072211	0.539792	0.011676	10.896147	97.369065

Create the FeOi prediction object and initialise it with your data. Make sure to remove the FeO column from the melt compositions used to predict FeOi.

[3]:

x = wholerock.drop(columns=["FeO"])
FeOi_predict = mpc.FeOi_prediction(x=x, FeO=wholerock["FeO"])

Select the columns in x that you do not want to use to predict initial FeO contents. Here we do not want to use minor elements and the totals column:

[4]:

do_not_use = ["MnO", "P2O5", "Cr2O3", "total"]

The calculate_model_fits method calculates best-fit multiple linear regressions for all possible combinations of elements in x. For each new regression, the element whose removal results in the lowest regression F-test p-value is removed from the dataset.

It returns a dataframe with fitted coefficients and misfit statistics (RMSE, cross-validated RMSE and R2). NaN values for coefficients indicate that this element has been removed from the calibration dataset and is not used in the regression.

[5]:

model_fits = FeOi_predict.calculate_model_fits(exclude=do_not_use)
model_fits

[5]:

	intercept	SiO2	TiO2	Al2O3	MgO	CaO	Na2O	K2O	RMSE	CV-RMSE	deltaRMSE	r2
6	24.162353	-0.184450	1.085716	-0.337465	NaN	-0.292231	0.344686	-0.579578	0.151416	0.278102	0.126686	0.930966
5	21.878478	-0.144730	1.179186	-0.293183	NaN	-0.296750	NaN	-0.300763	0.165387	0.273553	0.108166	0.917640
4	22.442147	-0.168107	1.114827	-0.289476	NaN	-0.260280	NaN	NaN	0.171032	0.232349	0.061317	0.911921
3	11.585125	NaN	1.463365	-0.230722	NaN	-0.169262	NaN	NaN	0.206434	0.237533	0.031100	0.871685
2	9.213242	NaN	1.640962	-0.217524	NaN	NaN	NaN	NaN	0.253769	0.294748	0.040978	0.806092
1	5.357205	NaN	1.725906	NaN	NaN	NaN	NaN	NaN	0.319430	0.355670	0.036240	0.692766

RMSE and R2 are both calculated on the entire calibration dataset and indicate how good the model is at predicting FeO. For RMSE lower values are better, while for R2 values close to 1 are best. To check for overfitting, cross-validated RMSE’s (CV-RMSE) are also calculated, where large differences between RMSE and CV-RMSE (deltaRMSE) can indicate overfitting issues. As long as RMSE and R2 values are acceptable, the model with the smallest deltaRMSE should be selected. Here, models 1, 2 and 3 have similar RMSE, deltaRMSE and R2 values and any of these models would work well for predicting melt FeO contents. In models 4, 5 and 6, deltaRMSE has increased and overfitting might be an issue here.

Next we use the results for the previous step to select our model. Here we use model 3, where TiO2, Al2O3 and CaO are used as predictors. In the select_predictor method, you pass the index of your preferred model to the idx parameter. It shows a plot of the regression and calibration data. Be mindful of the FeO range, and make sure that you do not extrapolate (too much) beyond this range during PEC corrections.

[6]:

FeOi_predict.select_predictors(idx=3)

We can check if the right predictors are used with the predictors attribute

[7]:

FeOi_predict.predictors

[7]:

array(['TiO2', 'Al2O3', 'CaO'], dtype=object)

Coefficients of the linear regression were automatically calculated by the select_predictors method and we can access these with the coefficients, intercept_error and errors attributes.

Fitted coefficients:

[8]:

FeOi_predict.coefficients

[8]:

intercept    11.585125
TiO2          1.463365
Al2O3        -0.230722
CaO          -0.169262
dtype: float64

and their errors as standard deviations:

[9]:

FeOi_predict.errors

[9]:

intercept    1.203255
TiO2         0.164396
Al2O3        0.048417
CaO          0.049363
dtype: float64

The model attribute stores the selected model as a callable function, which we can use to predict melt FeO contents. The function accepts a pandas DataFrame or MagmaFrame with melt major element compositions in oxide wt. % as input. Let’s try that out on the whole-rock data:

[11]:

model = FeOi_predict.model
model(wholerock)

[11]:

name
PI032    11.197161
PI041    11.220927
PI053    10.963646
PI054    11.159202
PI055    11.091599
PI056    10.944143
PI057    10.590489
PI058    10.757390
PI060    11.154730
PI063    11.988362
PI065    11.165823
PI066    11.055243
PI067    11.146931
PI068    11.240555
PI069    10.846550
PI070    10.381338
PI071    10.801455
PI074    10.470945
PI077    10.007513
PI078     9.718330
PI079     9.230219
PI081    10.813904
PI084    10.438283
PI094    10.511939
PI097    10.183464
PI111    10.866402
PI114    10.701429
dtype: float32

The random_sample_coefficients method randomly samples fitted coefficients within their errors and calculates the matching x-intercept value. This method is used internally in the Monte Carlo PEC correction model to propagate FeOi prediction errors.

[10]:

FeOi_predict.random_sample_coefficients(n=5)

[10]:

	TiO2	Al2O3	CaO	intercept
0	1.555664	-0.186890	-0.129272	10.195312
1	1.704102	-0.263428	-0.229004	11.937500
2	1.424805	-0.151489	-0.275146	11.398438
3	1.440430	-0.228882	-0.127197	11.226562
4	1.457031	-0.237793	-0.176147	11.789062