The data century – both a grey zone and a new frontier

Rachael Bolton

The 21st century will be looked back on as the data century. Data is the new frontier in so many ways. As a species, we are now producing around 1.145 trillion megabytes of data every single day.

Everything we do creates some kind of data trail. As individual fragments, these tiny data points might not mean all that much. Each fragment alone doesn’t necessarily identify us or paint much of a picture – but together, these elements could be used to paint a detailed timeline of our movements and habits.

As individuals, we’ve got used to having our browsing habits and app choices used to build a picture of our behaviour, and we worry about the impact on our privacy. In the commercial world, we open bank accounts; we take out loans; we start businesses; we contract with each other. These activities can also be combined to build a whole picture that exposes things we may have tried to keep secret.

To those in the business of investigation, the potential uses for recombinant data must be obvious. The question is: where do we find the expertise needed to make sense of all the billions of dancing data particles?

Dr Jason Shepherd is Director of Analysis and Innovation at Thomson Reuters Special Services International (TRSSI), which – in partnership with Refinitiv, an LSEG (London Stock Exchange Group) business – has developed an open-source intelligence service that helps law enforcement and other investigative entities leverage the power of publicly available data sets.

He describes data as “hairy with information” – that is, data is the physical record, and from it we can extrapolate not only what it was intended to record, but information about who recorded it, how, why, when and where.

An example is a public record such as a driving licence: it may be used to confirm your right to drive, but it also reveals your age, name, address, and driving record. “When we look at a dataset, we’re asking not only ‘what was it intended to record?’, but ‘what information can I extract from it?’”

Dr Shepherd explained: “Information is also promiscuous. It ends up in multiple datasets, and it ends up spread across datasets. When we talk about open-source data, we’re often asking ‘what data can I lay my hands on?’, but what I’m really asking is ‘where can I access the information I’m looking for?’, as it is going to be recorded in many different data sets.”

Government, law enforcement and intelligence agencies are able to access covert, legally protected information about a suspect or person of interest through application of a warrant, for example. These information collection activities give the investigating agency access to information that will never be available to the average member of the public.

However, the sheer volume of data now produced in public view or in a commercially collected and freely accessible form far outweighs what intelligence services could collect on their own – and Dr Shepherd said this shift in dynamic has been particularly challenging for those working in the industry.

“This staggering volume of data is quite frightening for people who work in secret intelligence.

“Fifteen years ago, the data pulled in from the outside world was used as collateral that validated – or at least made sense of – the intelligence gathered by analysts. Now, this model has completely flipped. The challenge is less about finding new secret information, and more about using it to help make sense of the information encoded in the vast amount of data already available.”

The current international political climate is only intensifying the need for expertise in this area. According to Dr Shepherd, “economic competition between nations is becoming more explicit, and it’s re-emerging as a domain of statecraft”.

In order to face up to the threats of the rapidly shifting geopolitical environment, western governments need to make decisions about regulatory policy, trade policy, and policing political influence. To inform those decisions they need information, and the ability to understand that information; as well as the motivation to recognise the competition they’re facing.

He observed that while the efforts of intelligence communities have been focused primarily on counterterrorism and the pursuit of individuals hiding in a population for the last two decades, the threats we now face are thematic, systemic and strategic.

Traditional intelligence agencies are equipped to use secret techniques and special legal status to acquire unique information, but they are not configured to make sense of vast amounts of easily accessible information – which is what we’re up against.

This is where the case for reliable and trustworthy public-private partnerships comes into the equation. The commercial world has developed many of the information gathering, data engineering, and analytic capabilities needed to inform governments in their decision making.

“For governments to process and analyse data on the scale the commercial world does is just not feasible. Data and analytics are core capabilities for many companies such as TRSSI and Refinitiv, and it does not make sense for a government to try to duplicate what the private sector has been doing very successfully for many years,” says Dr Shepherd.

The effort involved in developing commercial databases is significant. The resources required to collect, collate, deduplicate, resolve and maintain a consolidated commercial database are beyond the capability of many government departments.

For example, the Refinitiv World-Check database is updated twice daily and has 470 researchers dedicated to its upkeep (most of whom are bilingual at a minimum). They cover every country in the world, consolidating global lists including sanctions, law enforcement, regulatory enforcement, thousands of news feeds and a myriad of other sources. This is just one data source that goes into the TRSSI graph.

Concluded Dr Shepherd, “Governments around the world have the opportunity to partner with the private sector when it comes to understanding large amounts of data. Bringing in pre-analysed open-source data is a far more productive way to utilise the valuable resources that are available to government, rather than trying to do it all in-house.

“Overlaying that data with the confidential data governments hold is what will really unlock value. At the same time, governments themselves can help generate more accurate, clean and up-to-date data by enforcing good regulatory frameworks.”

This sponsored article was produced in partnership with Refinitiv.

Do you know more? Contact James Riley via Email.
