How Data Preparation Can Help Extract the Ultimate Value From Data

Isabel Caballero Doménech, Operations Scientist at Aurora Technology B.V. for the European Space Agency, reveals the advanced data preparation efforts commercial organisations can learn from.

Isabel gives her advice on:

  • Improving data preparation efforts

  • Structuring the data team to enable data-driven work

  • Breaking the data silos

Data lessons from outer space – Isabel shares the basic requirements for extracting value from data and the crucial role data preparation plays in any successful data-driven strategy. 

Throughout her career, data preparation has been the key first step, whether for machine learning and artificial intelligence or simply for querying. Most modern businesses hold hugely complex data that is not usable without appropriate preparation.

Data preparation covers a range of tasks: finding the required data, understanding its format, cleaning it, reconciling different sources, and anonymizing it.
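To make these tasks concrete, here is a minimal sketch in Python of three of them: cleaning, reconciling two sources, and anonymizing. It is an illustration only, not a description of the tooling Isabel's team uses. The source names (`crm_rows`, `billing_rows`), the field names, and the salt are all hypothetical. Note that a salted hash is, strictly speaking, pseudonymization, so real anonymization usually needs more than this.

```python
import hashlib


def clean(record):
    """Trim whitespace, treat empty strings as missing, lowercase the email."""
    cleaned = {}
    for field, value in record.items():
        if isinstance(value, str):
            value = value.strip()
            if value == "":
                value = None
        cleaned[field] = value
    if cleaned.get("email"):
        cleaned["email"] = cleaned["email"].lower()
    return cleaned


def reconcile(primary, secondary, key="email"):
    """Merge two sources on a shared key; primary wins, secondary fills gaps."""
    merged = {}
    # Secondary is processed first so that primary values overwrite it.
    for record in secondary + primary:
        identifier = record.get(key)
        if identifier is None:
            continue
        target = merged.setdefault(identifier, {})
        for field, value in record.items():
            if value is not None:
                target[field] = value
    return list(merged.values())


def anonymise(record, key="email", salt="example-salt"):
    """Replace the identifying field with a truncated salted SHA-256 digest."""
    out = dict(record)
    identifier = out.pop(key)
    digest = hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()
    out["subject_id"] = digest[:12]
    return out


def prepare(crm_rows, billing_rows):
    """Run the three steps in order: clean, reconcile, anonymise."""
    crm = [clean(row) for row in crm_rows]
    billing = [clean(row) for row in billing_rows]
    return [anonymise(row) for row in reconcile(crm, billing)]


# Hypothetical sample data: the same customer appears in both sources,
# with inconsistent formatting and complementary gaps.
crm_rows = [{"email": " Ana@Example.com ", "country": "ES", "plan": ""}]
billing_rows = [{"email": "ana@example.com", "country": "", "plan": "pro"}]

prepared = prepare(crm_rows, billing_rows)
print(prepared)
```

The two input rows collapse into a single record that keeps the country from one source and the plan from the other, with the email replaced by an opaque identifier. Keeping each step as a separate function means the same preparation can be rerun on new data, which supports the reproducibility point made later in the article.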

Data science typically begins with identifying the business question to be addressed, before the appropriate data to answer that question is identified.  Then the preparation process must be undertaken before any subsequent modeling and deployment can be performed. 

Preparing the data

The problem is that when organizations recruit data scientists, they typically expect them to perform highly sophisticated modeling and mathematical tasks, but in reality, the bulk of their time is spent performing data preparation, accessing the right data from across the business, and developing tools to process data.

While data preparation is certainly not a sexy task, it’s nonetheless a vital one. Doing it well not only increases the productivity of the data science teams but reduces the time to market of any solutions they create. The increased hygiene of the data also assists with reproducibility and will help to bring the data to a wider audience by making it easier to work with. This will be vital in the development of any kind of data-driven culture.

Isabel works on a satellite that observes the universe for gamma rays and X-rays. It generates a huge amount of data, which is transmitted to ground stations on Earth, where it’s processed by operations centres in Germany and Spain. The data is then sent to a data centre in Geneva, where engineers collect and prepare it while also developing tools to help the scientists analyse and model it. It’s a clear segmentation of roles, so advanced data scientists are not used to perform data preparation tasks.

The nature of the work means that months can pass with few noteworthy events being captured, and then many are recorded in a short space of time. This then requires rapid action and coordination with a range of bodies.

The approach used by Isabel and her team is accepted best practice across a range of space-related bodies, with a dedicated team of engineers existing to collect and prepare the data for space scientists to analyse. The data is also archived to allow for historic analysis to be performed, with mission data existing for over 30 years.

Data-driven work

A truly data-driven culture requires the efficient exploration and analysis of data, and this is only possible if the organisation invests in data preparation first.

Isabel highlights the evolving role of the data scientist in producing such a culture. Now, advanced data teams require not only data scientists but also Data Engineers, DevOps, and Machine Learning and AI Ops. The important thing, however, is that each of these professions is able to work with pre-prepared data.

Among the short-term actions organisations can take towards achieving this kind of culture, Isabel recommends:

  • Listen to the needs of the data team in terms of the data and infrastructure they need
  • Support and facilitate the work the data team is trying to do
  • Adapt to the rapidly changing technology landscape

Because of the relatively unglamorous nature of data preparation, it can be easy to underestimate its importance in the wider data work undertaken by an organisation, but this would be a huge mistake to make. Organisations should invest in the data grunt work and listen to the needs of their data team to help provide them with the tools they need to do their job.

Group discussion

  • What is your business doing currently, and what can it do better in terms of data preparation? 
  • How can the data science team’s needs be heard better and how can these be facilitated? 

The delegates said that a feedback loop between data management and data analytics is crucial for keeping the data correct in the master system, which in turn reduces the need to cleanse it. They also suggested that bringing people from the business into the data teams can help, as they have the domain knowledge to understand what good data really is. These “BusDevOps” teams help to better prepare the data.

They also thought that the needs of data science could be better heard through the role of the translator that Bill introduced earlier in the Virtual Masterclass. This person, or group of people, can act as the bridge between the data teams and the rest of the business.

THE AUTHOR

Laura Bineviciute

Head of Community and Engagement