1.1 -Basics of geological modeling with GemPy

Importing GemPy

import gempy as gp

# Importing auxiliary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os

# Setting options
np.random.seed(1515)
pd.set_option('precision', 2)

Importing and creating a set of input data

The data used for the construction of a model in GemPy is stored in Python objects. The main data classes are:

-  Surface_points
-  Orientations
-  Grid
-  Surfaces
-  Series
-  Additional data
-  Faults

We will see each of this class in further detail in the future.

Most of data can also be generated from raw data that comes in the form of CSV-files (CSV = comma-separated values). Such files might be attained by exporting model data from a different program such as GeoModeller or by simply creating it in spreadsheet software such as Microsoft Excel or LibreOffice Calc.

In this tutorial, all input data is created by importing such CSV-files. These exemplary files can be found in the input_data folder in the root folder of GemPy. The data comprises \(x\)-, \(y\)- and \(z\)-positional values for all surface points and orientation measurements. For the latter, poles, azimuth and polarity are additionally included. Surface points are furthermore assigned a formation. This might be a lithological unit such as “Sandstone” or a structural feature such as “Main Fault”. It is decisive to remember that, in GemPy, interface position points mark the bottom of a layer. If such points are needed to resemble a top of a formation (e.g. when modeling an intrusion), this can be achieved by defining a respectively inverted orientation measurement.

As we generate our Data from CSV-files, we also have to define our model’s real extent in \(x\), \(y\) and \(z\), as well as declare a desired resolution for each axis. This resolution will in turn determine the number of voxels used during modeling. Here, we rely on a medium resolution of 50x50x50, amounting to 125,000 voxels. The model extent should be chosen in a way that it contains all relevant data in a representative space. As our model voxels are not cubes, but prisms, the resolution can take a different shape than the extent. We don’t recommend going much higher than 100 cells in every direction (1,000,000 voxels), as higher resolutions will become increasingly expensive to compute.

geo_model = gp.create_model('Tutorial_ch1_1_Basics')
data_path = 'https://raw.githubusercontent.com/cgre-aachen/gempy_data/master/'
# Importing the data from CSV-files and setting extent and resolution
gp.init_data(geo_model, [0, 2000., 0, 2000., 0, 750.], [50, 50, 50],
             path_o=data_path + "/data/input_data/getting_started/"
                                "simple_fault_model_orientations.csv",
             path_i=data_path + "/data/input_data/getting_started/"
                                "simple_fault_model_points.csv",
             default_values=True)

Out:

Active grids: ['regular']

Tutorial_ch1_1_Basics  2020-09-15 11:59
surface series order_surfaces color id
0 Shale Default series 1 #015482 1
1 Sandstone_1 Default series 2 #9f0052 2
2 Siltstone Default series 3 #ffbe00 3
3 Sandstone_2 Default series 4 #728f02 4
4 Main_Fault Default series 5 #443988 5
5 basement Basement 1 #ff3f20 6


The input data can then be listed using the command get_data. Note that the order of formations and respective allocation to series is still completely arbitrary. We will fix this in the following.

gp.get_data(geo_model, 'surface_points').head()
X Y Z smooth surface
0 1000 50 450.00 2.00e-06 Shale
1 1000 150 433.33 2.00e-06 Shale
2 1000 300 433.33 2.00e-06 Shale
3 1000 500 466.67 2.00e-06 Shale
4 1000 1000 533.33 2.00e-06 Shale


gp.get_data(geo_model, 'orientations').head()
X Y Z G_x G_y G_z smooth surface
0 1000 1000 300 0.32 1.00e-12 0.95 0.01 Shale
1 400 1000 420 0.32 1.00e-12 0.95 0.01 Sandstone_2
2 500 1000 300 -0.95 1.00e-12 0.32 0.01 Main_Fault


Declaring the sequential order of geological formations

  • TODO update this section

We want our geological units to appear in the correct order relative to age. Such order might for example be given by a depositional sequence of stratigraphy, unconformities due to erosion or other lithological genesis events such as igneous intrusions. A similar age-related order is to be declared for the faults in our model. In GemPy, the function set_series is used to assign formations to different sequential series via declaration in a Python dictionary.

Defining the correct order of series is vital to the construction of the model! If you are using Python >3.6, the age-related order will already be defined by the order of key entries, i.e. the first entry is the youngest series, the last one the oldest. For older versions of Python, you will have to specify the correct order as a separate list attribute “order_series” (see cell below).

You can assign several surfaces to one series. The order of the units within such as series is only relevant for the color code, thus we recommend to be consistent. You can define this order via another attribute “order_formations/ ” or by using the specific command set_order_formations. (If the order of the pile differs from the final result the color of the interfaces and input data will be different.)

Every fault is treated as an independent series and have to be at set at the top of the pile. The relative order between the distinct faults defines the tectonic relation between them (first entry is the youngest).

In a model with simple sequential stratigraphy, all layer formations can be assigned to one single series without a problem. All unit boundaries and their order would then be given by interface points. However, to model more complex lithostratigraphical relations and interactions, the definition of separate series becomes important. For example, you would need to declare a “newer” series to model an unconformity or an intrusion that disturbs older stratigraphy.

By default we create a simple sequence inferred by the data:

Our example model comprises four main layers (plus an underlying basement that is automatically generated by GemPy) and one main normal fault displacing those layers. Assuming a simple stratigraphy where each younger unit was deposited onto the underlying older one, we can assign these layer formations to one series called “Strat_Series”. For the fault, we declare a respective “Fault_Series” as the first key entry in the set_series dictionary. We could give any other names to these series, the formations however have to be referred to as named in the input data.

surface series order_surfaces color id
0 Shale Default series 1 #015482 1
1 Sandstone_1 Default series 2 #9f0052 2
2 Siltstone Default series 3 #ffbe00 3
3 Sandstone_2 Default series 4 #728f02 4
4 Main_Fault Default series 5 #443988 5
5 basement Basement 1 #ff3f20 6


gp.map_stack_to_surfaces(geo_model,
                         {"Fault_Series": 'Main_Fault',
                          "Strat_Series": ('Sandstone_2', 'Siltstone',
                                           'Shale', 'Sandstone_1', 'basement')},
                         remove_unused_series=True)
surface series order_surfaces color id
4 Main_Fault Fault_Series 1 #443988 1
0 Shale Strat_Series 1 #015482 2
1 Sandstone_1 Strat_Series 2 #9f0052 3
2 Siltstone Strat_Series 3 #ffbe00 4
3 Sandstone_2 Strat_Series 4 #728f02 5
5 basement Strat_Series 5 #ff3f20 6


surface series order_surfaces color id
4 Main_Fault Fault_Series 1 #443988 1
0 Shale Strat_Series 1 #015482 2
1 Sandstone_1 Strat_Series 2 #9f0052 3
2 Siltstone Strat_Series 3 #ffbe00 4
3 Sandstone_2 Strat_Series 4 #728f02 5
5 basement Strat_Series 5 #ff3f20 6


order_series BottomRelation isActive isFault isFinite
Fault_Series 1 Erosion False False False
Strat_Series 2 Erosion False False False


geo_model.set_is_fault(['Fault_Series'])

Out:

Fault colors changed. If you do not like this behavior, set change_color to False.
order_series BottomRelation isActive isFault isFinite
Fault_Series 1 Fault True True False
Strat_Series 2 Erosion True False False


Fault_Series Strat_Series
Fault_Series False True
Strat_Series False False


order_series BottomRelation isActive isFault isFinite
Fault_Series 1 Fault True True False
Strat_Series 2 Erosion True False False


Fault_Series Strat_Series
Fault_Series False True
Strat_Series False False


Returning information from our input data

Our model input data, here named “geo_model”, contains all the information that is essential for the construction of our model. You can access different types of information by using gp.get_data or simply by accessiong the atrribues.

We can, for example, return the coordinates of our modeling grid via:

Out:

Grid Object. Values:
array([[  20. ,   20. ,    7.5],
       [  20. ,   20. ,   22.5],
       [  20. ,   20. ,   37.5],
       ...,
       [1980. , 1980. ,  712.5],
       [1980. , 1980. ,  727.5],
       [1980. , 1980. ,  742.5]])

As mentioned before, GemPy’s core algorithm is based on interpolation of two types of data: - surface_points and - orientation measurements

(if you want to know more on how this this interpolation algorithm works, checkout our paper: https://www.geosci-model-dev.net/12/1/2019/gmd-12-1-2019.pdf).

We introduced the function get\_data above. You can also specify which kind of data you want to call, by declaring the string attribute “dtype” to be either 'surface_points' (interfaces) or 'orientations'.

Interfaces Dataframe:

gp.get_data(geo_model, 'surface_points').head()
X Y Z smooth surface
52 700 1000 300.0 2.00e-06 Main_Fault
53 600 1000 200.0 2.00e-06 Main_Fault
54 500 1000 100.0 2.00e-06 Main_Fault
55 1000 1000 600.0 2.00e-06 Main_Fault
56 1100 1000 700.0 2.00e-06 Main_Fault


Orientations Dataframe:

gp.get_data(geo_model, 'orientations')
X Y Z G_x G_y G_z smooth surface
2 500 1000 300 -0.95 1.00e-12 0.32 0.01 Main_Fault
0 1000 1000 300 0.32 1.00e-12 0.95 0.01 Shale
1 400 1000 420 0.32 1.00e-12 0.95 0.01 Sandstone_2


Notice that now all surfaces have been assigned to a series and are displayed in the correct order (from young to old).

Visualizing input data

We can also visualize our input data. This might for example be useful to check if all points and measurements are defined the way we want them to. Using the function plot_data, we attain a 2D projection of our data points onto a plane of chosen direction (we can choose this attribute to be either \(x\), \(y\) or \(z\)).

plot = gp.plot_2d(geo_model, show_lith=False, show_boundaries=False)
plt.show()
Cell Number: mid Direction: y

Using plot_data_3D, we can also visualize this data in 3D. Note that direct 3D visualization in GemPy requires the Visualization Toolkit (VTK) to be installed.

All 3D plots in GemPy are interactive. This means that we can drag and drop any data point and measurement. The perpendicular axis views in VTK are particularly useful to move points solely on a desired 2D plane. Any changes will then be stored permanently in the “InputData” dataframe. If we want to reset our data points, we will then need to reload our original input data.

Executing the cell below will open a new window with a 3D interactive plot of our data.

gpv = gp.plot_3d(geo_model, image=False, plotter_type='basic')
ch1 1 basics

Model generation

Once we have made sure that we have defined all our primary information as desired in our object DataManagement.InputData (named geo_data in these tutorials), we can continue with the next step towards creating our geological model: preparing the input data for interpolation.

This is done by generating an InterpolatorData object (named interp_data in these tutorials) from our InputData object via the following function:

gp.set_interpolator(geo_model,
                    compile_theano=True,
                    theano_optimizer='fast_compile',
                    )

Out:

Setting kriging parameters to their default values.
Compiling theano function...
Level of Optimization:  fast_compile
Device:  cpu
Precision:  float64
Number of faults:  1
Compilation Done!
Kriging values:
                   values
range            2.9e+03
$C_o$              2e+05
drift equations   [3, 3]

<gempy.core.interpolator.InterpolatorModel object at 0x7f8572ca04d0>

This function rescales the extent and coordinates of the original data (and store it in the attribute geo_data_res which behaves as a usual InputData object) and adds mathematical parameters that are needed for conducting the interpolation. The computation of this step may take a while, as it also compiles a theano function which is required for the model computation. However, should this not be needed, we can skip it by declaring compile_theano = False in the function.

Furthermore, this preparation process includes an assignment of numbers to each formation. Note that GemPy’s always creates a default basement formation as the last formation number. Afterwards, numbers are allocated from youngest to oldest as defined by the sequence of series and formations. On the property formations on our interpolation data, we can find out which number has been assigned to which formation:

The parameters used for the interpolation can be returned using the function get_kriging_parameters. These are generated automatically from the original data, but can be changed if needed. However, users should be careful doing so, if they do not fully understand their significance.

gp.get_data(geo_model, 'kriging')
values
range 2.9e+03
$C_o$ 2e+05
drift equations [3, 3]


At this point, we have all we need to compute our full model via compute_model. By default, this will return two separate solutions in the form of arrays. The first gives information on the lithological formations, the second on the fault network in the model. These arrays consist of two subarrays as entries each:

  1. Lithology block model solution:

    • Entry [0]: This array shows what kind of lithological formation is found in each voxel, as indicated by a respective formation_number.

    • Entry [1]: Potential field array that represents the orientation of lithological units and layers in the block model.

  2. Fault network block model solution:

    • Entry [0]: Array in which all fault-separated areas of the model are represented by a distinct number contained in each voxel.

    • Entry [1]: Potential field array related to the fault network in the block model.

Below, we illustrate these different model solutions and how they can be used.

Out:

Lithology ids
  [6. 6. 6. ... 2. 2. 2.]

Out:

Lithology ids
  [6. 6. 6. ... 2. 2. 2.]

Direct model visualization in GemPy

Model solutions can be easily visualized in 2D sections in GemPy directly. Let’s take a look at our lithology block:

gp.plot_2d(geo_model, show_data=True)
plt.show()
Cell Number: mid Direction: y

With cell_number=25 and remembering that we defined our resolution to be 50 cells in each direction, we have chosen a section going through the middle of our block. We have moved 25 cells in direction='y', the plot thus depicts a plane parallel to the \(x\)- and \(y\)-axes. Setting plot_data=True, we could plot original data together with the results. Changing the values for cell_number and direction, we can move through our 3D block model and explore it by looking at different 2D planes.

We can do the same with out lithological scalar-field solution:

gp.plot_2d(geo_model, show_data=False, show_scalar=True, show_lith=False)
plt.show()
Cell Number: mid Direction: y
gp.plot_2d(geo_model, series_n=1, show_data=False, show_scalar=True, show_lith=False)
plt.show()
Cell Number: mid Direction: y

This illustrates well the fold-related deformation of the stratigraphy, as well as the way the layers are influenced by the fault.

The fault network modeling solutions can be visualized in the same way:

Out:

array([[0.03075848, 0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.77174354, 0.72471042, 0.80357372, 0.83598092]])
gp.plot_2d(geo_model, show_block=True, show_lith=False)
plt.show()
Cell Number: mid Direction: y
gp.plot_2d(geo_model, series_n=1, show_block=True, show_lith=False)
plt.show()
Cell Number: mid Direction: y

Marching cubes and vtk visualization

In addition to 2D sections we can extract surfaces to visualize in 3D renderers. Surfaces can be visualized as 3D triangle complexes in VTK (see function plot_surfaces_3D below). To create these triangles, we need to extract respective vertices and simplices from the potential fields of lithologies and faults. This process is automatized in GemPy with the function get_surface.

ver, sim = gp.get_surfaces(geo_model)
gpv = gp.plot_3d(geo_model, image=False, plotter_type='basic')
ch1 1 basics

Using the rescaled interpolation data, we can also run our 3D VTK visualization in an interactive mode which allows us to alter and update our model in real time. Similarly to the interactive 3D visualization of our input data, the changes are permanently saved (in the InterpolationInput.dataframe object). Additionally, the resulting changes in the geological models are re-computed in real time.

Adding topography

geo_model.set_topography(d_z=(350, 750))

Out:

Active grids: ['regular' 'topography']

Grid Object. Values:
array([[  20.        ,   20.        ,    7.5       ],
       [  20.        ,   20.        ,   22.5       ],
       [  20.        ,   20.        ,   37.5       ],
       ...,
       [2000.        , 1918.36734694,  423.48951452],
       [2000.        , 1959.18367347,  430.25455308],
       [2000.        , 2000.        ,  431.07163663]])
gp.compute_model(geo_model)
gp.plot_2d(geo_model, show_topography=True)
plt.show()


# sphinx_gallery_thumbnail_number = 9
gpv = gp.plot_3d(geo_model, plotter_type='basic', show_topography=True, show_surfaces=True,
                 show_lith=True,
                 image=False)
ch1 1 basics Cell Number: mid Direction: y

Out:

/home/travis/build/cgre-aachen/gempy/gempy/core/solution.py:128: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  self.geological_map = np.array([values[0][:, l0: l1], values[4][:, l0: l1].astype(float)])

Compute at a given location

This is done by modifying the grid to a custom grid and recomputing. Notice that the results are given as grid + surfaces_points_ref + surface_points_rest locations

x_i = np.array([[3, 5, 6]])
sol = gp.compute_model(geo_model, at=x_i)

Out:

Active grids: ['custom']
/home/travis/build/cgre-aachen/gempy/gempy/core/solution.py:124: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  self.custom = np.array([values[0][:, l0: l1], values[4][:, l0: l1].astype(float)])

Therefore if we just want the value at x_i:

Out:

array([array([[6.]]), array([[0.18630133],
       [0.63163565]])], dtype=object)

This return the id, and the scalar field values for each series

Save the model

GemPy uses Python [pickle] for fast storing temporary objects (https://docs.python.org/3/library/pickle.html). However, module version consistency is required. For loading a pickle into GemPy, you have to make sure that you are using the same version of pickle and dependent modules (e.g.: Pandas, NumPy) as were used when the data was originally stored.

For long term-safer storage we can export the pandas.DataFrames to csv by using:

gp.save_model(geo_model)

Out:

True

Total running time of the script: ( 0 minutes 24.790 seconds)

Gallery generated by Sphinx-Gallery