{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Initial FeO prediction"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Start by importing MagmaPEC and MagmaPandas (or regular Pandas)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import MagmaPEC as mpc\n",
"import MagmaPandas as mp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Import your melt or whole-rock data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" SiO2 | \n",
" TiO2 | \n",
" Al2O3 | \n",
" MnO | \n",
" MgO | \n",
" CaO | \n",
" Na2O | \n",
" K2O | \n",
" P2O5 | \n",
" Cr2O3 | \n",
" FeO | \n",
" total | \n",
"
\n",
" \n",
" | name | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | PI032 | \n",
" 46.962818 | \n",
" 3.447381 | \n",
" 16.650124 | \n",
" 0.174618 | \n",
" 7.004558 | \n",
" 9.400741 | \n",
" 3.131253 | \n",
" 1.290536 | \n",
" 0.635346 | \n",
" 0.009630 | \n",
" 11.350806 | \n",
" 100.057812 | \n",
"
\n",
" \n",
" | PI041 | \n",
" 46.292709 | \n",
" 3.553973 | \n",
" 17.191874 | \n",
" 0.170671 | \n",
" 6.221699 | \n",
" 9.443416 | \n",
" 2.701110 | \n",
" 1.076079 | \n",
" 0.610387 | \n",
" 0.007262 | \n",
" 10.986336 | \n",
" 98.255517 | \n",
"
\n",
" \n",
" | PI053 | \n",
" 47.515736 | \n",
" 3.084096 | \n",
" 15.182294 | \n",
" 0.169948 | \n",
" 8.250577 | \n",
" 9.640357 | \n",
" 2.298223 | \n",
" 1.013332 | \n",
" 0.448063 | \n",
" 0.009225 | \n",
" 10.955438 | \n",
" 98.567288 | \n",
"
\n",
" \n",
" | PI054 | \n",
" 47.524357 | \n",
" 3.369594 | \n",
" 16.456587 | \n",
" 0.174671 | \n",
" 6.569622 | \n",
" 9.216309 | \n",
" 2.653994 | \n",
" 1.367337 | \n",
" 0.648534 | \n",
" 0.029540 | \n",
" 10.953660 | \n",
" 98.964204 | \n",
"
\n",
" \n",
" | PI055 | \n",
" 46.452168 | \n",
" 3.287018 | \n",
" 16.039806 | \n",
" 0.169084 | \n",
" 7.396060 | \n",
" 9.469899 | \n",
" 2.035204 | \n",
" 1.072211 | \n",
" 0.539792 | \n",
" 0.011676 | \n",
" 10.896147 | \n",
" 97.369065 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" SiO2 TiO2 Al2O3 MnO MgO CaO Na2O \\\n",
"name \n",
"PI032 46.962818 3.447381 16.650124 0.174618 7.004558 9.400741 3.131253 \n",
"PI041 46.292709 3.553973 17.191874 0.170671 6.221699 9.443416 2.701110 \n",
"PI053 47.515736 3.084096 15.182294 0.169948 8.250577 9.640357 2.298223 \n",
"PI054 47.524357 3.369594 16.456587 0.174671 6.569622 9.216309 2.653994 \n",
"PI055 46.452168 3.287018 16.039806 0.169084 7.396060 9.469899 2.035204 \n",
"\n",
" K2O P2O5 Cr2O3 FeO total \n",
"name \n",
"PI032 1.290536 0.635346 0.009630 11.350806 100.057812 \n",
"PI041 1.076079 0.610387 0.007262 10.986336 98.255517 \n",
"PI053 1.013332 0.448063 0.009225 10.955438 98.567288 \n",
"PI054 1.367337 0.648534 0.029540 10.953660 98.964204 \n",
"PI055 1.072211 0.539792 0.011676 10.896147 97.369065 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wholerock_file = \"./data/wholerock.csv\"\n",
"\n",
"wholerock = mp.read_melt(wholerock_file, index_col=[\"name\"])\n",
"wholerock.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create the FeOi prediction object and initialise it with your data. Make sure to remove the FeO column from the melt compositions used to predict FeOi. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"x = wholerock.drop(columns=[\"FeO\"])\n",
"FeOi_predict = mpc.FeOi_prediction(x=x, FeO=wholerock[\"FeO\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select the columns in *x* that you do not want to use to predict initial FeO contents. Here we do not want to use minor elements and the totals column:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"do_not_use = [\"MnO\", \"P2O5\", \"Cr2O3\", \"total\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The *calculate_model_fits* method calculates best-fit multiple linear regressions for all possible combinations of elements in *x*. For each new regression, the element whose removal results in the lowest regression F-test p-value is removed from the dataset.\n",
"\n",
"It returns a dataframe with fitted coefficients and misfit statistics (RMSE, cross-validated RMSE and R2). *NaN* values for coefficients indicate that this element has been removed from the calibration dataset and is not used in the regression."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" intercept | \n",
" SiO2 | \n",
" TiO2 | \n",
" Al2O3 | \n",
" MgO | \n",
" CaO | \n",
" Na2O | \n",
" K2O | \n",
" RMSE | \n",
" CV-RMSE | \n",
" deltaRMSE | \n",
" r2 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 6 | \n",
" 24.162353 | \n",
" -0.184450 | \n",
" 1.085716 | \n",
" -0.337465 | \n",
" NaN | \n",
" -0.292231 | \n",
" 0.344686 | \n",
" -0.579578 | \n",
" 0.151416 | \n",
" 0.278102 | \n",
" 0.126686 | \n",
" 0.930966 | \n",
"
\n",
" \n",
" | 5 | \n",
" 21.878478 | \n",
" -0.144730 | \n",
" 1.179186 | \n",
" -0.293183 | \n",
" NaN | \n",
" -0.296750 | \n",
" NaN | \n",
" -0.300763 | \n",
" 0.165387 | \n",
" 0.273553 | \n",
" 0.108166 | \n",
" 0.917640 | \n",
"
\n",
" \n",
" | 4 | \n",
" 22.442147 | \n",
" -0.168107 | \n",
" 1.114827 | \n",
" -0.289476 | \n",
" NaN | \n",
" -0.260280 | \n",
" NaN | \n",
" NaN | \n",
" 0.171032 | \n",
" 0.232349 | \n",
" 0.061317 | \n",
" 0.911921 | \n",
"
\n",
" \n",
" | 3 | \n",
" 11.585125 | \n",
" NaN | \n",
" 1.463365 | \n",
" -0.230722 | \n",
" NaN | \n",
" -0.169262 | \n",
" NaN | \n",
" NaN | \n",
" 0.206434 | \n",
" 0.237533 | \n",
" 0.031100 | \n",
" 0.871685 | \n",
"
\n",
" \n",
" | 2 | \n",
" 9.213242 | \n",
" NaN | \n",
" 1.640962 | \n",
" -0.217524 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 0.253769 | \n",
" 0.294748 | \n",
" 0.040978 | \n",
" 0.806092 | \n",
"
\n",
" \n",
" | 1 | \n",
" 5.357205 | \n",
" NaN | \n",
" 1.725906 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 0.319430 | \n",
" 0.355670 | \n",
" 0.036240 | \n",
" 0.692766 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" intercept SiO2 TiO2 Al2O3 MgO CaO Na2O K2O \\\n",
"6 24.162353 -0.184450 1.085716 -0.337465 NaN -0.292231 0.344686 -0.579578 \n",
"5 21.878478 -0.144730 1.179186 -0.293183 NaN -0.296750 NaN -0.300763 \n",
"4 22.442147 -0.168107 1.114827 -0.289476 NaN -0.260280 NaN NaN \n",
"3 11.585125 NaN 1.463365 -0.230722 NaN -0.169262 NaN NaN \n",
"2 9.213242 NaN 1.640962 -0.217524 NaN NaN NaN NaN \n",
"1 5.357205 NaN 1.725906 NaN NaN NaN NaN NaN \n",
"\n",
" RMSE CV-RMSE deltaRMSE r2 \n",
"6 0.151416 0.278102 0.126686 0.930966 \n",
"5 0.165387 0.273553 0.108166 0.917640 \n",
"4 0.171032 0.232349 0.061317 0.911921 \n",
"3 0.206434 0.237533 0.031100 0.871685 \n",
"2 0.253769 0.294748 0.040978 0.806092 \n",
"1 0.319430 0.355670 0.036240 0.692766 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_fits = FeOi_predict.calculate_model_fits(exclude=do_not_use)\n",
"model_fits"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"RMSE and R2 are both calculated on the entire calibration dataset and indicate how good the model is at predicting FeO. For RMSE lower values are better, while for R2 values close to 1 are best. To check for overfitting, cross-validated RMSE's (CV-RMSE) are also calculated, where large differences between RMSE and CV-RMSE (deltaRMSE) can indicate overfitting issues. As long as RMSE and R2 values are acceptable, the model with the smallest deltaRMSE should be selected. Here, models 1, 2 and 3 have similar RMSE, deltaRMSE and R2 values and any of these models would work well for predicting melt FeO contents. In models 4, 5 and 6, deltaRMSE has increased and overfitting might be an issue here."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we use the results for the previous step to select our model. Here we use model 3, where TiO2, Al2O3 and CaO are used as predictors. In the *select_predictor* method, you pass the index of your preferred model to the *idx* parameter."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"FeOi_predict.select_predictors(idx=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can check if the right predictors are used with the *predictors* attribute"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['TiO2', 'Al2O3', 'CaO'], dtype=object)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"FeOi_predict.predictors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Coefficients of the linear regression were automatically calculated by the *select_predictors* method and we can access these with the *coefficients*, *intercept_error* and *errors* attributes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fitted coefficients:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"intercept 11.585125\n",
"TiO2 1.463365\n",
"Al2O3 -0.230722\n",
"CaO -0.169262\n",
"dtype: float64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"FeOi_predict.coefficients"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"and their errors as standard deviations:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"intercept 1.203255\n",
"TiO2 0.164396\n",
"Al2O3 0.048417\n",
"CaO 0.049363\n",
"dtype: float64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"FeOi_predict.errors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The *model* attribute stores the selected model as a callable function, which we can use to predict melt FeO contents. The function accepts a pandas DataFrame or MagmaFrame with melt major element compositions in oxide wt. % as input. Let's try that out on the whole-rock data:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"name\n",
"PI032 11.197161\n",
"PI041 11.220927\n",
"PI053 10.963646\n",
"PI054 11.159202\n",
"PI055 11.091599\n",
"PI056 10.944143\n",
"PI057 10.590489\n",
"PI058 10.757390\n",
"PI060 11.154730\n",
"PI063 11.988362\n",
"PI065 11.165823\n",
"PI066 11.055243\n",
"PI067 11.146931\n",
"PI068 11.240555\n",
"PI069 10.846550\n",
"PI070 10.381338\n",
"PI071 10.801455\n",
"PI074 10.470945\n",
"PI077 10.007513\n",
"PI078 9.718330\n",
"PI079 9.230219\n",
"PI081 10.813904\n",
"PI084 10.438283\n",
"PI094 10.511939\n",
"PI097 10.183464\n",
"PI111 10.866402\n",
"PI114 10.701429\n",
"dtype: float32"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = FeOi_predict.model\n",
"model(wholerock)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The *random_sample_coefficients* method randomly samples fitted coefficients within their errors and calculates the matching x-intercept value. This method is used internally in the Monte Carlo PEC correction model to propagate FeOi prediction errors."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" TiO2 | \n",
" Al2O3 | \n",
" CaO | \n",
" intercept | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 1.555664 | \n",
" -0.186890 | \n",
" -0.129272 | \n",
" 10.195312 | \n",
"
\n",
" \n",
" | 1 | \n",
" 1.704102 | \n",
" -0.263428 | \n",
" -0.229004 | \n",
" 11.937500 | \n",
"
\n",
" \n",
" | 2 | \n",
" 1.424805 | \n",
" -0.151489 | \n",
" -0.275146 | \n",
" 11.398438 | \n",
"
\n",
" \n",
" | 3 | \n",
" 1.440430 | \n",
" -0.228882 | \n",
" -0.127197 | \n",
" 11.226562 | \n",
"
\n",
" \n",
" | 4 | \n",
" 1.457031 | \n",
" -0.237793 | \n",
" -0.176147 | \n",
" 11.789062 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" TiO2 Al2O3 CaO intercept\n",
"0 1.555664 -0.186890 -0.129272 10.195312\n",
"1 1.704102 -0.263428 -0.229004 11.937500\n",
"2 1.424805 -0.151489 -0.275146 11.398438\n",
"3 1.440430 -0.228882 -0.127197 11.226562\n",
"4 1.457031 -0.237793 -0.176147 11.789062"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"FeOi_predict.random_sample_coefficients(n=5)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "py310",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}