1.2: Data Structure and Manipulation

# Importing GemPy
import gempy as gp
import gempy

# Importing auxiliary libraries
import numpy as np
import pandas as pd
pd.set_option('display.precision', 2)

Series

Series is the object that contains the properties associated with each independent scalar field. Right now it is simply the order of the series (which is inferred from the index order). In the future, the unconformity relation or perhaps the type of interpolator may be added.

The Series and Faults classes are quite entangled, since faults are a view of series.

order_series BottomRelation isActive
Default series 1 Erosion False


We can modify the series by using set_series_index:

series.set_series_index(['foo', 'foo2', 'foo5', 'foo7'])
series
order_series BottomRelation isActive
foo 1 Erosion False
foo2 2 Erosion False
foo5 3 Erosion False
foo7 4 Erosion False


The index of the series is a pandas categorical index. This provides quite handy backend functionality (see pandas.Categorical).

Out:

CategoricalIndex(['foo', 'foo2', 'foo5', 'foo7'], categories=['foo', 'foo2', 'foo5', 'foo7'], ordered=False, dtype='category')
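As a rough illustration of what the categorical backend provides, the following pandas-only sketch builds an index like the one above and then adds, renames and removes categories. It does not call GemPy, and the variable names are illustrative only.

```python
import pandas as pd

# A categorical index similar to the one held by the Series object
idx = pd.CategoricalIndex(['foo', 'foo2', 'foo5', 'foo7'])

# Categories can be extended, renamed and removed as separate operations
extended = idx.add_categories(['foo3'])
renamed = idx.rename_categories({'foo': 'boo'})
reduced = extended.remove_categories(['foo3'])

print(list(extended.categories))
print(list(renamed))
print(list(reduced.categories))
```

Each call returns a new index, so the original `idx` is left untouched.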

To add a new series:

order_series BottomRelation isActive
foo 1 Erosion False
foo2 2 Erosion False
foo5 3 Erosion False
foo7 4 Erosion False
foo3 5 Erosion False


Delete series:

order_series BottomRelation isActive
foo 1 Erosion False
foo2 2 Erosion False
foo5 3 Erosion False
foo7 4 Erosion False


Rename series:

series.rename_series({'foo': 'boo'})
series
order_series BottomRelation isActive
boo 1 Erosion False
foo2 2 Erosion False
foo5 3 Erosion False
foo7 4 Erosion False


Reorder series:

series.reorder_series(['foo2', 'boo', 'foo7', 'foo5'])
series
order_series BottomRelation isActive
foo2 1 Erosion False
boo 2 Erosion False
foo7 3 Erosion False
foo5 4 Erosion False
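Since the order of the series is inferred from the index order, reordering amounts to rearranging the rows and renumbering them. The sketch below mimics this with a plain pandas DataFrame; it is an assumption about the general idea, not GemPy's actual implementation.

```python
import numpy as np
import pandas as pd

names = ['boo', 'foo2', 'foo5', 'foo7']
series_df = pd.DataFrame(
    {'order_series': np.arange(1, len(names) + 1), 'BottomRelation': 'Erosion'},
    index=pd.CategoricalIndex(names, categories=names),
)

new_order = ['foo2', 'boo', 'foo7', 'foo5']

# Rearrange the rows, then renumber the order from the new row positions
series_df = series_df.loc[new_order].copy()
series_df['order_series'] = np.arange(1, len(series_df) + 1)

print(series_df)
```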


Faults

The faults df characterizes which mathematical series behave as faults and whether those faults are finite or infinite. Both DataFrames are updated automatically as we modify the series object linked to the fault object (linked by passing it when the Series object is created).

isFault isFinite
foo2 False False
boo False False
foo7 False False
foo5 False False


Finally, we have the faults relations df, which captures which mathematical series a given fault offsets, in order to reproduce complex faulting networks:

foo2 boo foo7 foo5
foo2 False False False False
boo False False False False
foo7 False False False False
foo5 False False False False


We can use set_is_fault to choose which of our series are faults:

isFault isFinite
foo2 False False
boo True False
foo7 False False
foo5 False False


The fault relations are set in a similar way:

fr = np.zeros((4, 4))
fr[2, 2] = True
fr[1, 2] = True

faults.set_fault_relation(fr)
foo2 boo foo7 foo5
foo2 False False False False
boo False False True False
foo7 False False False False
foo5 False False False False
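To make the layout of the relation matrix easier to read, here is a NumPy and pandas sketch that labels the rows and columns with the series names. It assumes, based on the table above, that a row is the fault and a column is the series it offsets; the names `relations` and `offset_by_boo` are illustrative.

```python
import numpy as np
import pandas as pd

series_names = ['foo2', 'boo', 'foo7', 'foo5']

# Row = the fault series, column = the series it offsets (assumed layout)
fr = np.zeros((len(series_names), len(series_names)), dtype=bool)
fr[series_names.index('boo'), series_names.index('foo7')] = True

relations = pd.DataFrame(fr, index=series_names, columns=series_names)

# List the series offset by the fault 'boo'
offset_by_boo = relations.columns[relations.loc['boo'].to_numpy()].tolist()
print(offset_by_boo)
```

Indexing by name rather than by position avoids having to remember which integer corresponds to which series.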


Now, if we change the series df and update, the series already defined will conserve their values, while the new ones will be set to False:

order_series BottomRelation isActive
foo2 1 Erosion False
boo 2 Erosion False
foo7 3 Erosion False
foo5 4 Erosion False
foo20 5 Erosion False


isFault isFinite
foo2 False False
boo True False
foo7 False False
foo5 False False
foo20 False False


foo2 boo foo7 foo5 foo20
foo2 False False False False False
boo False False True False False
foo7 False False False False False
foo5 False False False False False
foo20 False False False False False
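The behaviour described here (existing rows keep their values, new rows default to False) can be reproduced in plain pandas with `reindex` and a fill value. This is only a sketch of the idea on a stand-in DataFrame, not the code GemPy runs.

```python
import pandas as pd

old = ['foo2', 'boo', 'foo7', 'foo5']
faults_df = pd.DataFrame({'isFault': [False, True, False, False],
                          'isFinite': False}, index=old)

new = old + ['foo20']

# Existing rows keep their values; the new row is filled with False
updated = faults_df.reindex(new, fill_value=False)
print(updated)
```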


When we add new series, the values switch to NaN. We need to be careful not to have any NaNs in the DataFrames, or errors will be raised down the line.

isFault isFinite
foo2 False False
boo True False
foo7 False False
foo5 False False
foo20 False False


foo2 boo foo7 foo5 foo20
foo2 False False False False False
boo False False False False False
foo7 False False False False False
foo5 False False False False False
foo20 False False False False False
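A quick way to guard against stray NaNs is to test for them explicitly before continuing. The following pandas sketch shows how a NaN appears when a row is added without a fill value, and one possible way to clean it up; the DataFrame is a stand-in, not a GemPy object.

```python
import pandas as pd

faults_df = pd.DataFrame({'isFault': [False, True]}, index=['foo2', 'boo'])

# Reindexing without a fill value leaves the new row as NaN
with_nan = faults_df.reindex(['foo2', 'boo', 'foo20'])
has_nan = bool(with_nan.isna().any().any())

# Replace the missing flags explicitly and restore the boolean dtype
cleaned = with_nan.fillna(False).astype(bool)
print(has_nan, cleaned['isFault'].tolist())
```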


Surfaces

The df surfaces contains three properties. id refers to the order of the surfaces in the sequential pile, i.e. the strict order of computation. values, on the other hand, is the final value that each voxel will have after discretization. This may be useful, for example, when we want to map a specific geophysical property (such as density) to a given unit. By default both are the same, since the value is arbitrary when discretizing lithological units.
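The difference between id and values can be pictured as a lookup table applied to a discretized block. The sketch below uses NumPy only; the ids and densities are made-up numbers chosen for illustration.

```python
import numpy as np

# Hypothetical discretized block: each voxel holds a surface id (1-based)
lith_ids = np.array([1, 1, 2, 3, 3, 4])

# Hypothetical density per surface id, in g/cm^3 (made-up values)
density_by_id = np.array([2.61, 2.92, 3.10, 2.40])

# Look up the property of each voxel from its id
density_block = density_by_id[lith_ids - 1]
print(density_block)
```

The ids fix the order of computation, while the looked-up values are free to carry any physical meaning.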

From an empty df

The Surfaces class needs to have an associated Series object. This limits the names of the series, since they are a pandas.Categorical.

We can set any number of formations by passing a list with the names. By default they all take the name of a single series (foo20 in the output below).

surfaces.set_surfaces_names(['foo', 'foo2', 'foo5'])
surface series order_surfaces color id
0 foo foo20 1 #015482 1
1 foo2 foo20 2 #9f0052 2
2 foo5 foo20 3 #ffbe00 3


order_series BottomRelation isActive
foo2 1 Erosion False
boo 2 Erosion False
foo7 3 Erosion False
foo5 4 Erosion False
foo20 5 Erosion False


We can add new formations:

surface series order_surfaces color id
0 foo foo20 1 #015482 1
1 foo2 foo20 2 #9f0052 2
2 foo5 foo20 3 #ffbe00 3
3 feeeee foo20 4 #728f02 4


The column formation is also a pandas.Categorical. This will be important for the Data classes (surface_points and Orientations).

surfaces.df['surface']

Out:

0       foo
1      foo2
2      foo5
3    feeeee
Name: surface, dtype: object
surface series order_surfaces color id
0 foo foo20 1 #015482 1
1 foo2 foo20 2 #9f0052 2
2 foo5 foo20 3 #ffbe00 3
3 feeeee foo20 4 #728f02 4


Set values

To set the values, we use the following method:

surface series order_surfaces color id value_0
0 foo foo20 1 #015482 1 2
1 foo2 foo20 2 #9f0052 2 2
2 foo5 foo20 3 #ffbe00 3 2
3 feeeee foo20 4 #728f02 4 5


Set values with a given name:

We can give specific names to the properties (e.g. density):

surfaces.add_surfaces_values([[2, 2, 2, 6], [2, 2, 1, 8]], ['val_foo', 'val2_foo'])
surface series order_surfaces color id value_0 val_foo val2_foo
0 foo foo20 1 #015482 1 2 2 2
1 foo2 foo20 2 #9f0052 2 2 2 2
2 foo5 foo20 3 #ffbe00 3 2 2 1
3 feeeee foo20 4 #728f02 4 5 6 8
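The shape of the arguments (one list of values per property, plus one name per list) can be illustrated with a small pandas helper. `add_values` is a hypothetical function written for this sketch; it is not part of GemPy.

```python
import pandas as pd

surfaces_df = pd.DataFrame({'surface': ['foo', 'foo2', 'foo5', 'feeeee'],
                            'id': [1, 2, 3, 4]})


def add_values(df, values, names):
    """Return a copy of df with one new column per property (hypothetical helper)."""
    out = df.copy()
    for name, column in zip(names, values):
        # Each property needs exactly one value per surface
        if len(column) != len(out):
            raise ValueError(f'{name}: expected {len(out)} values, got {len(column)}')
        out[name] = column
    return out


with_values = add_values(surfaces_df, [[2, 2, 2, 6], [2, 2, 1, 8]],
                         ['val_foo', 'val2_foo'])
print(with_values)
```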


Delete formation values

To delete a full property:

surfaces.delete_surface_values(['val_foo', 'value_0'])

Out:

True

One of the formations must be set to be the basement:

surface series order_surfaces color id val2_foo
0 foo foo20 1 #015482 1 2
1 foo2 foo20 2 #9f0052 2 2
2 foo5 foo20 3 #ffbe00 3 1
3 feeeee foo20 4 #728f02 4 8


Set formation values

We can also use set_surfaces_values instead of adding. This deletes the previous properties and adds the new ones:

surfaces.set_surfaces_values([[2, 2, 2, 6], [2, 2, 1, 8]], ['val_foo', 'val2_foo'])
surfaces
surface series order_surfaces color id val_foo val2_foo
0 foo foo20 1 #015482 1 2 2
1 foo2 foo20 2 #9f0052 2 2 2
2 foo5 foo20 3 #ffbe00 3 2 1
3 feeeee foo20 4 #728f02 4 6 8


The last property is the corresponding series that each formation belongs to. series and formation are pandas categories. To get an overview of what this means, check https://pandas.pydata.org/pandas-docs/stable/categorical.html.

surfaces.df['series']

Out:

0    foo20
1    foo20
2    foo20
3    foo20
Name: series, dtype: category
Categories (5, object): ['foo2', 'boo', 'foo7', 'foo5', 'foo20']
surfaces.df['surface']

Out:

0       foo
1      foo2
2      foo5
3    feeeee
Name: surface, dtype: object

Map series to formation

To map a series to a formation, we pass a dict:

surface series order_surfaces color id val_foo val2_foo
0 foo foo20 1 #015482 1 2 2
1 foo2 foo20 2 #9f0052 2 2 2
2 foo5 foo20 3 #ffbe00 3 2 1
3 feeeee foo20 4 #728f02 4 6 8


order_series BottomRelation isActive
foo2 1 Erosion False
boo 2 Erosion False
foo7 3 Erosion False
foo5 4 Erosion False
foo20 5 Erosion False


If a series does not exist in the Series object, we raise a warning and set those formations to NaN:

d = {"foo7": 'foo', "booX": ('foo2', 'foo5', 'fee')}
surface series order_surfaces color id val_foo val2_foo
0 foo foo7 1 #015482 1 2 2
1 foo2 foo20 1 #9f0052 2 2 2
2 foo5 foo20 2 #ffbe00 3 2 1
3 feeeee foo20 3 #728f02 4 6 8
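The role of the categories in rejecting unknown series names can be seen with pandas alone. The sketch below inverts a dict of the same shape and casts the result to the known categories; it shows the plain pandas behaviour, which may differ in detail from what GemPy does internally.

```python
import pandas as pd

series_names = ['foo2', 'boo', 'foo7', 'foo5', 'foo20']
surfaces = ['foo', 'foo2', 'foo5', 'feeeee']
mapping = {'foo7': 'foo', 'booX': ('foo2', 'foo5', 'fee')}

# Invert {series: surfaces} into {surface: series}
surface_to_series = {}
for series_name, members in mapping.items():
    if isinstance(members, str):
        members = (members,)
    for surface in members:
        surface_to_series[surface] = series_name

mapped = pd.Series(surfaces).map(surface_to_series)

# Casting to the known categories turns unknown series names into NaN
as_category = pd.Series(pd.Categorical(mapped, categories=series_names))
print(as_category)
```

Here 'booX' is not a known series, so every surface mapped to it ends up as NaN, as does the surface that was not mentioned in the dict.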


surfaces.map_series({"foo7": 'foo', "boo": ('foo2', 'foo5', 'fee')})
surface series order_surfaces color id val_foo val2_foo
1 foo2 boo 1 #9f0052 1 2 2
2 foo5 boo 2 #ffbe00 2 2 1
0 foo foo7 1 #015482 3 2 2
3 feeeee foo20 1 #728f02 4 6 8


An advantage of categories is that they are ordered, so now we can tidy the df by series and formation.
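Sorting by an ordered categorical follows the category order rather than the alphabet. A minimal pandas sketch, using a stand-in DataFrame with the same surfaces and series as above:

```python
import pandas as pd

df = pd.DataFrame({'surface': ['foo', 'foo2', 'foo5', 'feeeee'],
                   'series': ['foo7', 'boo', 'boo', 'foo20']})

# Category order follows the series pile rather than the alphabet
pile = ['foo2', 'boo', 'foo7', 'foo5', 'foo20']
df['series'] = pd.Categorical(df['series'], categories=pile, ordered=True)

# A stable sort keeps the relative order of surfaces within one series
tidy = df.sort_values('series', kind='mergesort')
print(tidy)
```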

Modify surface name

surfaces.rename_surfaces({'foo2': 'lala'})
surface series order_surfaces color id val_foo val2_foo
1 lala boo 1 #9f0052 1 2 2
2 foo5 boo 2 #ffbe00 2 2 1
0 foo foo7 1 #015482 3 2 2
3 feeeee foo20 1 #728f02 4 6 8


surfaces.df.loc[2, 'val_foo'] = 22
surface series order_surfaces color id val_foo val2_foo
1 lala boo 1 #9f0052 1 2 2
2 foo5 boo 2 #ffbe00 2 22 1
0 foo foo7 1 #015482 3 2 2
3 feeeee foo20 1 #728f02 4 6 8


Modify surface color

The surfaces DataFrame also contains a column for the color in which the surfaces are displayed. To change the color, call

surfaces.colors.change_colors()

Out:

Click to select new colors.
VBox(children=(ColorPicker(value='#015482', description='foo'), ColorPicker(value='#9f0052', description='foo2'), ColorPicker(value='#ffbe00', description='foo5'), ColorPicker(value='#728f02', description='feeeee')))

This allows you to change the colors interactively. If you already know which colors you want to use, you can also update them with a dictionary mapping the surface name to a hex color string:

new_colors = {'foo': '#ff8000', 'foo5': '#4741be'}
surfaces.colors.change_colors(new_colors)

Data

surface_points

These two DataFrames (df from now on) contain the individual information of each point at an interface or orientation. Some properties of these tables are mapped from the other dfs.

X Y Z smooth surface


surface_points.set_surface_points(pd.DataFrame(np.random.rand(6, 3)),
                                  ['foo', 'foo5', 'lala', 'foo5', 'lala', 'feeeee'])
X Y Z smooth surface
0 0.39 0.41 0.82 2.00e-06 foo
1 0.69 0.89 0.06 2.00e-06 foo5
2 0.53 0.55 0.03 2.00e-06 lala
3 0.02 0.23 0.55 2.00e-06 foo5
4 0.95 0.83 0.94 2.00e-06 lala
5 0.45 0.66 0.41 2.00e-06 feeeee


surface_points.map_data_from_surfaces(surfaces, 'series')
surface_points
X Y Z smooth surface
0 0.39 0.41 0.82 2.00e-06 foo
1 0.69 0.89 0.06 2.00e-06 foo5
2 0.53 0.55 0.03 2.00e-06 lala
3 0.02 0.23 0.55 2.00e-06 foo5
4 0.95 0.83 0.94 2.00e-06 lala
5 0.45 0.66 0.41 2.00e-06 feeeee


surface_points.map_data_from_surfaces(surfaces, 'id')
surface_points
X Y Z smooth surface
0 0.39 0.41 0.82 2.00e-06 foo
1 0.69 0.89 0.06 2.00e-06 foo5
2 0.53 0.55 0.03 2.00e-06 lala
3 0.02 0.23 0.55 2.00e-06 foo5
4 0.95 0.83 0.94 2.00e-06 lala
5 0.45 0.66 0.41 2.00e-06 feeeee


order_series BottomRelation isActive
foo2 1 Erosion False
boo 2 Erosion False
foo7 3 Erosion False
foo5 4 Erosion False
foo20 5 Erosion False


surface_points.map_data_from_series(series, 'order_series')
surface_points
X Y Z smooth surface
0 0.39 0.41 0.82 2.00e-06 foo
1 0.69 0.89 0.06 2.00e-06 foo5
2 0.53 0.55 0.03 2.00e-06 lala
3 0.02 0.23 0.55 2.00e-06 foo5
4 0.95 0.83 0.94 2.00e-06 lala
5 0.45 0.66 0.41 2.00e-06 feeeee


X Y Z smooth surface
2 0.53 0.55 0.03 2.00e-06 lala
4 0.95 0.83 0.94 2.00e-06 lala
1 0.69 0.89 0.06 2.00e-06 foo5
3 0.02 0.23 0.55 2.00e-06 foo5
0 0.39 0.41 0.82 2.00e-06 foo
5 0.45 0.66 0.41 2.00e-06 feeeee


isFault isFinite
foo2 False False
boo True False
foo7 False False
foo5 False False
foo20 False False


Set values passing pole vectors:

orientations.set_orientations(np.random.rand(6, 3) * 10,
                              np.random.rand(6, 3),
                              surface=['foo', 'foo5', 'lala', 'foo5',
                                       'lala', 'feeeee'])
X Y Z G_x G_y G_z smooth surface
0 8.48 3.08 8.24 0.40 0.06 0.37 0.01 foo
1 6.87 0.41 8.03 0.84 0.27 0.31 0.01 foo5
2 8.21 3.42 1.50 0.89 0.56 0.38 0.01 lala
3 9.39 9.91 0.13 0.72 0.84 0.38 0.01 foo5
4 2.07 8.72 3.94 0.36 0.70 0.93 0.01 lala
5 2.60 0.10 4.96 1.00 0.75 0.97 0.01 feeeee


Set values passing orientation data: azimuth, dip, pole (dip direction)

orientations.set_orientations(np.random.rand(6, 3) * 10,
                              orientation=np.random.rand(6, 3) * 20,
                              surface=['foo', 'foo5', 'lala', 'foo5',
                                       'lala', 'feeeee'])
X Y Z G_x G_y G_z smooth surface
0 8.33 8.79 5.68 0.43 4.03 12.22 0.01 foo
1 8.25 6.22 5.13 0.02 0.06 7.72 0.01 foo5
2 3.37 6.46 0.18 0.69 5.60 16.39 0.01 lala
3 3.82 1.27 8.42 0.15 4.21 13.70 0.01 foo5
4 5.79 9.10 0.89 0.54 6.43 18.63 0.01 lala
5 2.46 3.97 8.81 0.02 0.33 4.14 0.01 feeeee
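For readers unfamiliar with the relation between orientation angles and pole vectors, here is a NumPy sketch of a common conversion. It assumes azimuth is measured clockwise from north (the Y axis), dip is measured from the horizontal, and polarity is +1 or -1; the function name is hypothetical and the convention is an assumption, not something stated in this tutorial.

```python
import numpy as np


def pole_from_orientation(azimuth, dip, polarity=1.0):
    """Convert azimuth and dip (degrees) into a unit pole vector (assumed convention)."""
    az = np.deg2rad(azimuth)
    d = np.deg2rad(dip)
    g_x = np.sin(d) * np.sin(az) * polarity
    g_y = np.sin(d) * np.cos(az) * polarity
    g_z = np.cos(d) * polarity
    return np.stack([g_x, g_y, g_z], axis=-1)


horizontal = pole_from_orientation(0.0, 0.0)      # flat layer: pole points up
dipping_east = pole_from_orientation(90.0, 30.0)  # dips 30 degrees toward east
print(horizontal, dipping_east)
```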


Mapping data from the other df

orientations.map_data_from_surfaces(surfaces, 'series')
orientations
X Y Z G_x G_y G_z smooth surface
0 8.33 8.79 5.68 0.43 4.03 12.22 0.01 foo
1 8.25 6.22 5.13 0.02 0.06 7.72 0.01 foo5
2 3.37 6.46 0.18 0.69 5.60 16.39 0.01 lala
3 3.82 1.27 8.42 0.15 4.21 13.70 0.01 foo5
4 5.79 9.10 0.89 0.54 6.43 18.63 0.01 lala
5 2.46 3.97 8.81 0.02 0.33 4.14 0.01 feeeee


orientations.map_data_from_surfaces(surfaces, 'id')
orientations
X Y Z G_x G_y G_z smooth surface
0 8.33 8.79 5.68 0.43 4.03 12.22 0.01 foo
1 8.25 6.22 5.13 0.02 0.06 7.72 0.01 foo5
2 3.37 6.46 0.18 0.69 5.60 16.39 0.01 lala
3 3.82 1.27 8.42 0.15 4.21 13.70 0.01 foo5
4 5.79 9.10 0.89 0.54 6.43 18.63 0.01 lala
5 2.46 3.97 8.81 0.02 0.33 4.14 0.01 feeeee


orientations.map_data_from_series(series, 'order_series')
orientations
X Y Z G_x G_y G_z smooth surface
0 8.33 8.79 5.68 0.43 4.03 12.22 0.01 foo
1 8.25 6.22 5.13 0.02 0.06 7.72 0.01 foo5
2 3.37 6.46 0.18 0.69 5.60 16.39 0.01 lala
3 3.82 1.27 8.42 0.15 4.21 13.70 0.01 foo5
4 5.79 9.10 0.89 0.54 6.43 18.63 0.01 lala
5 2.46 3.97 8.81 0.02 0.33 4.14 0.01 feeeee


Grid

grid = gp.Grid()
grid.create_regular_grid([0, 10, 0, 10, 0, 10], [50, 50, 50])

Out:

<gempy.core.grid_modules.grid_types.RegularGrid object at 0x7fcc46d33ee0>

Out:

array([[0.1, 0.1, 0.1],
       [0.1, 0.1, 0.3],
       [0.1, 0.1, 0.5],
       ...,
       [9.9, 9.9, 9.5],
       [9.9, 9.9, 9.7],
       [9.9, 9.9, 9.9]])
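The array above is consistent with the centers of the grid cells: with an extent of 0 to 10 and 50 cells per axis, each cell is 0.2 wide and the first center lies at 0.1. The NumPy sketch below reproduces those coordinates; `regular_grid_centers` is a hypothetical helper, not the GemPy implementation.

```python
import numpy as np


def regular_grid_centers(extent, resolution):
    """Return the (n, 3) cell-center coordinates of a regular grid."""
    axes = []
    for i, n in enumerate(resolution):
        lo, hi = extent[2 * i], extent[2 * i + 1]
        step = (hi - lo) / n
        # Centers sit half a cell inside each boundary
        axes.append(np.linspace(lo + step / 2, hi - step / 2, n))
    x, y, z = np.meshgrid(*axes, indexing='ij')
    return np.column_stack([x.ravel(), y.ravel(), z.ravel()])


centers = regular_grid_centers([0, 10, 0, 10, 0, 10], [50, 50, 50])
print(centers.shape)
print(centers[:2])
```

With `indexing='ij'` and the default row-major ravel, Z varies fastest, which matches the ordering of the rows shown above.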

Rescaling Data

rescaling = gempy.core.data_modules.geometric_data.ScalingSystem(
    surface_points, orientations, grid)
X Y Z smooth surface
2 0.53 0.55 0.03 2.00e-06 lala
4 0.95 0.83 0.94 2.00e-06 lala
1 0.69 0.89 0.06 2.00e-06 foo5
3 0.02 0.23 0.55 2.00e-06 foo5
0 0.39 0.41 0.82 2.00e-06 foo
5 0.45 0.66 0.41 2.00e-06 feeeee


X Y Z G_x G_y G_z smooth surface
0 8.33 8.79 5.68 0.43 4.03 12.22 0.01 foo
1 8.25 6.22 5.13 0.02 0.06 7.72 0.01 foo5
2 3.37 6.46 0.18 0.69 5.60 16.39 0.01 lala
3 3.82 1.27 8.42 0.15 4.21 13.70 0.01 foo5
4 5.79 9.10 0.89 0.54 6.43 18.63 0.01 lala
5 2.46 3.97 8.81 0.02 0.33 4.14 0.01 feeeee
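The rescaling factor and centers reported under Additional Data (17.73 and roughly [4.17, 4.66, 4.42]) are consistent with taking twice the largest axis span and the midpoint of the bounding box of all points. The sketch below checks this on the rounded coordinates from the two tables above. The function names and the 0.5 offset used in `rescale` are assumptions made for illustration.

```python
import numpy as np


def rescaling_parameters(*coordinate_sets):
    """Return (factor, centers) for the stacked XYZ coordinates."""
    xyz = np.vstack(coordinate_sets)
    lo, hi = xyz.min(axis=0), xyz.max(axis=0)
    factor = 2 * np.max(hi - lo)  # twice the largest axis span
    centers = (hi + lo) / 2       # midpoint of the bounding box
    return factor, centers


def rescale(xyz, factor, centers, offset=0.5):
    """Shift to the box center and scale; `offset` is an assumed constant."""
    return (np.asarray(xyz) - centers) / factor + offset


# Coordinates copied (rounded) from the two tables above
surface_xyz = np.array([[0.53, 0.55, 0.03], [0.95, 0.83, 0.94], [0.69, 0.89, 0.06],
                        [0.02, 0.23, 0.55], [0.39, 0.41, 0.82], [0.45, 0.66, 0.41]])
orient_xyz = np.array([[8.33, 8.79, 5.68], [8.25, 6.22, 5.13], [3.37, 6.46, 0.18],
                       [3.82, 1.27, 8.42], [5.79, 9.10, 0.89], [2.46, 3.97, 8.81]])

factor, centers = rescaling_parameters(surface_xyz, orient_xyz)
rescaled = rescale(surface_xyz, factor, centers)
print(round(float(factor), 2), centers)
```

Because the inputs are rounded to two decimals, the factor comes out as 17.74 rather than the 17.73 reported by GemPy.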


Additional Data

values
Structure isLith True
isFault True
number faults 1
number surfaces 4
number series 5
number surfaces per series [0, 2, 1, 0, 1]
len surfaces surface_points [2, 2, 1, 1]
len series surface_points [0, 4, 1, 0, 1]
len series orientations [0, 4, 1, 0, 1]
Options dtype float64
output geology
theano_optimizer fast_compile
device cpu
verbosity None
Kriging range 17.32
$C_o$ 7.14
drift equations [3, 3, 3, 3, 3]
Rescaling rescaling factor 17.73
centers [4.1742456039270355, 4.664036206917114, 4.423674404271726]


values
isLith True
isFault True
number faults 1
number surfaces 4
number series 5
number surfaces per series [0, 2, 1, 0, 1]
len surfaces surface_points [2, 2, 1, 1]
len series surface_points [0, 4, 1, 0, 1]
len series orientations [0, 4, 1, 0, 1]


values
dtype float64
output geology
theano_optimizer fast_compile
device cpu
verbosity None


dtype output theano_optimizer device verbosity
values float64 geology fast_compile cpu None


Out:

dtype               category
output              category
theano_optimizer    category
device              category
verbosity             object
dtype: object
values
range 17.32
$C_o$ 7.14
drift equations [3, 3, 3, 3, 3]
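The range and $C_o$ shown here match what one gets from the grid extent used earlier ([0, 10, 0, 10, 0, 10]): the diagonal of the model box, and that diagonal squared divided by 14 and by 3. The relation below is inferred from these numbers and should be read as an assumption rather than a documented formula.

```python
import math

extent = [0, 10, 0, 10, 0, 10]

# Diagonal of the model box
kriging_range = math.sqrt(sum((extent[i + 1] - extent[i]) ** 2 for i in (0, 2, 4)))

# Assumed default relation between the range and C_o
c_o = kriging_range ** 2 / 14 / 3

print(round(kriging_range, 2), round(c_o, 2))
```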


ad.rescaling_data
values
rescaling factor 17.73
centers [4.1742456039270355, 4.664036206917114, 4.423674404271726]


Interpolator

faults.df['isFault'].values

Out:

array([False,  True, False, False, False])

Out:

Compiling theano function...
Level of Optimization:  fast_compile
Device:  cpu
Precision:  float64
Number of faults:  1
Compilation Done!

<theano.compile.function.types.Function object at 0x7fcc3c8187f0>

Out:

len sereies i [0 2]
len sereies o [0 4]
len sereies w [ 0 17]
n surfaces per series [0 2]
n universal eq [3]
is finite [0 0 0 0 0]
is erosion [0]
is onlap [0]
