Big data, small data and foxes in seismic. How I am at least going to quit picking first breaks.


‘Big data’, a term that emerged a few years ago, is still a buzzword today: but a rather vague one. In other fields, big data means datasets that are too large or complex to analyse, process, or visualise with conventional approaches. Often these big data are produced by digital interactions over the internet, by connected devices, by sensors embedded in objects. But the term also designates the predictive analytics, machine learning and cognitive computing developed to extract information from these data. The emerging technology and philosophy are going to change many disciplines and industries: the O&G industry is cited as one of the main actors and clients of the big data revolution.

Seismic data are big data

On the size part of the big data buzz, with the volume-velocity-variety definition, there’s not much novelty in the seismic industry: we have been using big data for a while. The data size of 3D seismic surveys has grown larger and larger, passing the petabyte mark years ago: this certainly requires hardware that you can’t find at the supermarket, specific software and architecture, and a sound working method.

But our relationship with data size is a bit different: the data we process to see inside the Earth are not a by‑product of internet interactions, nor the tracks of the clickstream of millions of geophysicists on LinkedIn. Most often, the size of our data is still limited by the cost and time of acquisition; and the sampling is normally designed with care. Geophysicists normally know the difference between data and information well, and don’t get too excited by data alone.

In the current difficult moment, the industry is even reconsidering the value of old 2D lines: the big wave of bigger seismic data is probably not going to spread worldwide right now.


How and why could we get larger datasets?

Data size has grown over time: high channel-count acquisition systems allow deploying larger and larger numbers of sensors. And advances in processing allow separating the wavefields generated by sources activated simultaneously. This is like listening to two songs at the same time; or like recognising, in the image below, the Dahshur pyramid and the Fitz Roy, and being able to separate them.


When you really separate seismic records, you have to accept a bit of leakage: some residual interference appears in the two images as noise, as you can see below in a very small-scale surface-wave experiment.


You could argue that this residual noise is not a problem, because in return simultaneous shooting can reduce the ambient noise: if a fox, while checking the spread, walks close to a geophone that is recording five simultaneous shots, the fox’s noise will be divided amongst the five separated records. Unfortunately this dilution of the ambient noise is not so straightforward: it works only with incoherent noise, which could increase with the larger operations of a larger crew.


But, anyway, the evolving technology can give us many more traces: and many more terabytes. With more traces, we can explore larger areas, acquiring larger surveys. Or we can get a higher data density, assuming that higher data density implies better data, from which we can extract more information.

Often data density implies some redundancy: we need to make sure that the connection between redundancy and information is causal. Of course we prefer to have more traces: but do we prefer more traces if they are noisier traces? The debate around the compromise between data quality and data size is open, also because we evaluate our survey design in terms of the cost and value of the information. Survey design is more than the science of sampling the wavefield: it is perhaps the science of finding the best compromise in sampling the wavefield and the noise that goes with it.


Designing the redundancy

Things are not trivial, since data density can be increased in many different ways, not all equally useful. Imagine I want to image the pattern engraved on a brass coffee table: I can simply take a photo of it, but some other objects lie on its surface, covering and hiding my target.


The random nature of the dice is exemplary: here the dice of course represent random noise, as they are not correlated with the data. A simple way to see the target better is to repeat the experiment, rolling the dice again. The noise, if incoherent, will not repeat itself exactly: unless it is deterministically reproducible, and therefore coherent.


Looking at the different images, one sees that what is not visible in one image is visible in another, and one guesses that the dice noise can be processed out. There are lots of very clever ways of removing this noise: the good old stack is still a key ingredient in many processing steps.

By summing just three of the raw photos you get the stacked image below: which is already pretty good, for a 3-fold section!


The dice are only removed to some extent; they become partially transparent: the term noise attenuation is more honest than noise removal. But if you have 30 images, and you process them to remove the noise (here, with something more than a stack, removing what is non-stationary over the acquired sample of images), you can attenuate the noise very well, and remove almost all the dice.
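The stacking experiment is easy to reproduce numerically. Below is a minimal sketch (a hypothetical 1D ‘pattern’ stands in for the coffee table; all names are my own): the mean of n noisy copies attenuates incoherent noise roughly as 1/√n, and a median stack rejects what is non-stationary across the sample better than a mean does.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
pattern = np.sin(np.linspace(0, 4 * np.pi, 500))    # the engraved pattern

def noisy_photo():
    # each "photo" is the pattern plus a fresh realisation of incoherent noise
    return pattern + rng.normal(0.0, 1.0, pattern.size)

# RMS of the leftover noise, measured against the clean pattern
residual = lambda img: np.sqrt(np.mean((img - pattern) ** 2))

single   = noisy_photo()
stack3   = np.mean([noisy_photo() for _ in range(3)], axis=0)    # 3-fold stack
stack30  = np.mean([noisy_photo() for _ in range(30)], axis=0)   # 30-fold stack
# a median stack discards outliers that are non-stationary across the images
median30 = np.median([noisy_photo() for _ in range(30)], axis=0)
```

The residuals come out close to the 1/√n prediction (1/√3 ≈ 0.58, 1/√30 ≈ 0.18 of the single-photo noise).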


The point is that here you don’t need to take 1000 images and have a 1000-fold redundancy. Taking 1000 photos from the same point of view would be expensive, and would miss the biggest problem of the photo: its short depth of field. It is only partially in focus, sharp only in a small central region.

So if I cannot get a bigger depth of field, I’d rather rotate the focus ring and take multiple photos (like multiple offsets and azimuths). In the image below are the near-, mid- and far-focused photos, in which you see the coffee table better (with a dice noise that doesn’t move).
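The multi-focus idea can also be sketched in code. Assuming, purely for illustration, that we measure local sharpness with an absolute Laplacian, we can composite the differently-focused photos by taking, at every pixel, the value from the image that is sharpest there (function names and the focus measure are my own, not from any imaging package):

```python
import numpy as np

def local_sharpness(img):
    # absolute Laplacian as a crude per-pixel focus measure
    lap = np.zeros_like(img)
    lap[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1]
                       + img[1:-1, :-2] + img[1:-1, 2:]
                       - 4.0 * img[1:-1, 1:-1])
    return np.abs(lap)

def focus_stack(images):
    # at each pixel, keep the value from the sharpest image
    sharp = np.stack([local_sharpness(im) for im in images])
    best = np.argmax(sharp, axis=0)
    rows, cols = np.indices(best.shape)
    return np.stack(images)[best, rows, cols]
```

This is the photographic analogue of combining near, mid and far offsets: each subset sees a different part of the target well, and the composite keeps the best-resolved part of each.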


When you have lots of incoherent noise, you need the redundancy (the fold) in every subset, to remove the dice from the different views.

If we had systems that could acquire many more data, we would still need to design the sampling. As in statistics, we can talk about sampling error, because we don’t look at the whole population (the continuous wavefield), but also about sampling bias, because we select a sample.

It is obvious that a larger sample is not necessarily a better sample (is it obvious?).

The famous case of the 1936 American presidential election, with candidates Roosevelt and Landon, is emblematic. The Literary Digest ran a very large and very expensive poll: with more than 2 million replies to 10 million questionnaires, it predicted the wrong result (mainly because of non-response bias). At the same time George Gallup, with a well-designed sample 50 times smaller, correctly predicted Roosevelt’s victory. And with a sample of 3,000 people he also predicted that the Literary Digest would mis-predict the result.

We can make similar and expensive mistakes in survey design if we think that fold solves everything: the idea that n-fold stacking gives a square-root-of-n improvement in signal-to-noise ratio can be misleading.
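A quick numerical check of why the rule can mislead (a sketch on synthetic records, all names my own): the √n improvement holds when every record carries a fresh, incoherent noise realisation, and fails completely when the same coherent noise rides on every record.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n, nsamp = 25, 2000
signal = np.ones(nsamp)

# incoherent case: a fresh noise realisation on every record
incoh_stack = np.mean(
    [signal + rng.normal(0.0, 1.0, nsamp) for _ in range(n)], axis=0)

# coherent case: the same interfering event recorded on all n records
coherent_noise = rng.normal(0.0, 1.0, nsamp)
coh_stack = np.mean([signal + coherent_noise for _ in range(n)], axis=0)

noise_rms = lambda x: np.sqrt(np.mean((x - signal) ** 2))
# incoherent noise is attenuated by ~sqrt(25) = 5;
# the coherent noise survives the stack completely untouched
```

Fold buys you nothing against noise that repeats itself shot after shot: that has to be modelled or filtered, not averaged away.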

The problem is that the so-called random noise is seldom random: in many cases, with modern high-fold data, the incoherent noise is not the biggest problem. When our models are able to account for all the coherent seismic events that we generate, the fox noise and its attenuation will become really important.



Model-free and theory-free seismic

Another aspect of big data is the trend towards theory-free models: with enough data, with a gigantic data sample, the numbers speak for themselves, some people advocate. The old statistical sampling will become obsolete, and we will be allowed to confuse correlation and causation. The results of elections will be predicted from the analysis of millions of tweets and posts: not only by estimating the sentiment of political tweets, but maybe by the number of kitties posted.

In 2008, the end of the scientific method and the ‘End of Theory’ were predicted.

Our situation is different, and it’s difficult to imagine that we can do without physical models: but we can very well imagine, in the future, a dataset transformed into a subsurface image by a single step of full-waveform inversion. Truly full, and able to predict all the events that are today considered both signal and coherent noise.

Today, besides the current FWIs, with all their strengths and limitations, we already have some routine methods able to magically estimate a part of the wavefield just by correlating and convolving lots of traces. And this is a bit like Google automatically translating Russian into Italian.

In processing there are still lots of labour-intensive steps in which the complex and very non-linear relationships between the data and the information require lots of manual work; not only velocity model building. Some simple tasks, like picking first breaks, can have a big impact on the information eventually extracted from the data.

Picking first breaks is like extracting the skyline from your data: finding, at each location, the highest point.
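The skyline analogy translates almost directly into code. Here is a toy sketch (the function name and threshold are my own, not from any picking package): treat the image as a matrix with one column per location, and take the first pixel per column that exceeds an amplitude threshold.

```python
import numpy as np

def pick_skyline(image, threshold=0.5):
    """For each column (location), return the row index of the topmost
    pixel exceeding the threshold: the 'first break' of that trace."""
    mask = image > threshold
    picks = mask.argmax(axis=0)        # index of the first True per column
    picks[~mask.any(axis=0)] = -1      # no pick found in this column
    return picks
```

On a clean image this is all you need; on a hazy one, any noise spike above the threshold picks the wrong row, and that is exactly where the trouble starts.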


It can be easy on a clear day: it is obvious what is above the city and what is below.


The problem with picking is that Shanghai can be hazy, and as gloomy as the London of the opening pages of Heart of Darkness. And if you try to pick the skyline from a noisy Shanghai photo, it’s difficult to get very far.


In cases like this, most current automatic methods still struggle to reach manual quality. The progress in machine intelligence driven by big data will hopefully help us. The fact that Google recently open-sourced TensorFlow is very good news: a great library for machine intelligence can seriously help the progress of many fields, including ours.
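For the record, a classic automatic approach (one of those that struggle in the hazy case) is an energy-ratio picker. A minimal STA/LTA sketch, with window lengths and threshold chosen arbitrarily for illustration rather than taken from any production tool:

```python
import numpy as np

def sta_lta_pick(trace, sta=10, lta=100, threshold=5.0):
    """Pick the first sample where the short-term average energy (just
    ahead) exceeds `threshold` times the long-term average (just behind)."""
    energy = trace.astype(float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(energy)))
    for i in range(lta, trace.size - sta):
        sta_avg = (csum[i + sta] - csum[i]) / sta      # window [i, i + sta)
        lta_avg = (csum[i] - csum[i - lta]) / lta      # window [i - lta, i)
        if lta_avg > 0.0 and sta_avg / lta_avg > threshold:
            return i
    return -1   # no pick found
```

On a quiet trace with a clear onset this lands within a few samples of the true break; bury the onset in noise and the energy ratio never cleanly separates, which is why the manual picker still wins.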

One day we’ll image the subsurface using all the recorded energy, maybe even including the fox noise. In the meantime, I’d rather spend my time exploring what we can do with TensorFlow than hand-picking the first breaks of another million shot gathers.


