Making Open-Source Information Work for Your Company
Data that is available from open sources – social media, websites, real-time tracking
applications and satellite imagery, to name but a few – is ubiquitous and is increasingly being turned into actionable knowledge.
As the volume and diversity of data sources continue to increase at an exponential
rate, harnessing open-source information becomes an increasingly valuable process across a multitude of sectors and industries. The highest-profile applications of open-source intelligence are in the national security, law enforcement and human rights domains. A notable example is the international impact of Bellingcat, the online investigation organisation, with its exhaustive and successful investigations of unique global events, including the shooting down of Malaysia Airlines Flight 17 and the identification of suspects in the Skripal poisoning.
The insurance industry is driven by data and continuously strives to achieve more timely and accurate knowledge on which to make decisions. For example, the digital footprints of individuals can become important information in the fight against fraud. Automatically integrating data from social media and news websites, such as photographs of extreme weather events, and fusing it with other data sources, like hyper-local weather observations, can greatly assist insurers in reducing time-consuming manual activities and generating internal efficiency. This can
reduce costs by enabling the quick identification of false claims while increasing customer satisfaction by speeding up claims approval. The fusion of information also offers opportunities for dynamic customer engagement and the real-time mitigation of risk. Over longer periods of time, new perceptions, reasoning and learning can be attained to accelerate the pace of organisational change.
What are the principles for achieving the successful exploitation of open-source information, and how can organisations identify and coherently pursue its use? Can technology and processes used in one sector readily translate across to another? What are the key lessons learned for maximising the use of open-source information?
Tim Ripley is a journalist and defence analyst who has reported on international security and military matters for numerous media organisations including The Sunday Times, The Scotsman, Jane’s Defence Weekly and Jane’s Intelligence Review. His most recent books are ‘Operation Aleppo: Russia’s War in Syria’ and ‘Little Green Men: Putin’s Wars since 2014’. Both are distinguished by the substantial use of meticulously researched open-source
information. Balkerne spoke to Tim Ripley to examine the application of open-source information techniques to identify critical information for a range of organisations.
Balkerne: Over the past five years, do you think there is a demonstrable growth in the
use of open-source information for commercial objectives?
Tim Ripley: Without a doubt, and in parallel we have also seen a growth in the scale of data
being created. This growth is going to continue and if anything, the tempo will accelerate. However, growth in volume is not being accompanied by a parallel growth in quality, particularly with the rise of agenda-based misinformation. I think the segmentation of the market demand has become more apparent. By that, I mean there is a growing realisation of what commercially can be achieved through the use of open-source information and the accompanying need for specialised domain knowledge, including regional geographic segmentation. For the organisations and individuals that have identified and harnessed this
information, I have no doubt that substantial commercial benefit is being achieved.
Balkerne: What are the major challenges and opportunities presented by pursuing the use
of open-source data?
Tim Ripley: The major challenge is very obviously the dramatic rise of misinformation in
pursuit of explicit agendas, and sometimes the overwhelming difficulty of satisfactorily undoing the damage this misinformation causes. Technology has enabled a big change and we have moved on from the era of clumsy North Korean photoshopped efforts. A growing challenge is the maturity of the technology used to create misinformation, such as artificial-intelligence-enabled deepfakes. The most credible examples of their use that I can think of have been for political objectives, but no doubt their use will expand. Maybe in a few
years we’ll see very sophisticated false news of a CEO of a major company announcing significant commercial activity in an emerging market as a means of share price manipulation.
The opportunities for the use of open-source intelligence are enormous. Focus is
needed to carefully understand and define the product or service that meets customer requirements. I think where customers can explain in detail why they want certain knowledge, it is much more advantageous than simply stating what information is required. Practitioners of open-source information understand the value of different sources and take more of a fusion approach, which involves far more cross-referencing. I believe the results of these activities can be leveraged far beyond the value of individual pieces of information.
Balkerne: What do you think has been the most successful use of open-source information
in recent years and how were these successes achieved?
Tim Ripley: Bellingcat and their investigations into the downing of the Malaysia Airlines
Flight 17 over Ukraine and the follow-up to the Skripal poisoning in Salisbury. In both cases, the thoroughness of the investigations was such that each pushback against the findings ultimately only added to the credibility of the results. If anyone is interested in finding out more about Bellingcat’s work, I recommend the book ‘We Are Bellingcat: An Intelligence Agency for the People’. Although I am highlighting the work of a
very well organised and diligent entity, I can think of many examples of the successful use of open-source information, even by informal groups. For example, picking through various Reddit forums, I was fascinated to see in August 2020 how an individual created an informal community to verify that the chairman of a FTSE 250 company had violated market regulations by purchasing shares in the company only a few days before the publication of annual results. The group came to a collective decision that this information was actionable through a short-term financial position, with vindication achieved when the company’s results were published days later.
Balkerne: What examples are there of the emergence of open-source data protocols and
software tools to ensure that data can be shared and used across an industry?
Tim Ripley: The simplest and most attractive for my industry is Creative Commons 4.0, which
can be summarised as allowing the copying, redistribution and adaptation of material, including for commercial purposes, and I have found this very beneficial for the use of imagery. Every online site or application will have user agreements which, although often overly verbose, make clear what permissions apply to information posted publicly, such as Tweets or Facebook posts. On the computing side, the Open Data Protocol (OData) for consuming RESTful APIs has been extremely helpful to me, and I have certainly seen the growth
of resources using this protocol. Just as much as open-source protocols, I would champion the curation of open-source resources and software tools. A fantastic example is the work of Arno Reuser and his comprehensive open-source intelligence resource discovery toolkit.
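To make the OData mention concrete: an OData service is queried through a plain URL with system query options such as $filter, $select and $top. The sketch below simply builds such a URL; the endpoint, entity set name and field names are hypothetical, not from any service Tim mentions.

```python
from urllib.parse import urlencode

def build_odata_query(base_url, entity, filter_expr=None, select=None, top=None):
    """Build an OData query URL using the $filter, $select and $top
    system query options. Returns the bare collection URL if no
    options are given."""
    options = {}
    if filter_expr:
        options["$filter"] = filter_expr
    if select:
        options["$select"] = ",".join(select)
    if top is not None:
        options["$top"] = str(top)
    query = urlencode(options)  # note: '$' is percent-encoded as %24
    return f"{base_url}/{entity}" + (f"?{query}" if query else "")

# Hypothetical incident feed exposed via OData:
url = build_odata_query(
    "https://example.com/odata", "Incidents",
    filter_expr="Severity gt 3 and Region eq 'Essex'",
    select=["Id", "Severity", "Reported"],
    top=50,
)
```

The resulting URL can then be fetched with any HTTP client; OData servers accept the percent-encoded form of the option names.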
Balkerne: What is the most important lesson you have learnt from using open-source information?
Tim Ripley: Context is everything, and by that I mean there is a need for absolute clarity about the purposes for which information is being gathered. Purposes often change as new knowledge is attained or external factors have an impact. That leads to re-visiting what I call the information cycle. For my cycle, I use six steps: requirements; planning and selecting; collection; processing and exploitation; analysis and production; and finally publication. Being curious has many benefits for working with open-source information but can also bring challenges. Given the huge number of potential sources of information, it is all too easy to get drawn down the ‘rabbit hole’ of a deep dive on a particular event or theme. I’ve learnt from experience, before committing to a deep dive, to force myself to complete my own checklist to formally review the decision to commit my most precious resource, which is time. I try to be clear in my own
mind the qualitative and quantitative outcomes I am seeking from any deep dive. Linked to this is the lesson of the need to be ruthless and cut losses where necessary, but also sometimes to commit further when successes start to flow.
Balkerne: Are there any technology tools you consider to be fundamental to exploiting open-source information?
Tim Ripley: Due to what I do, my requirements are very much focused on text and imagery.
With imagery there is a continuous need for chronolocation and geolocation – verifying
when and where an image was created. The app SunCalc is well known for being helpful with the task of identifying the time and date by reviewing the position of shadows and the sun. It can also help with geolocation when the sun or significant shadows appear in an image. However, I’ve often had to combine this with manual satellite imagery analysis and ground-level imagery analysis to satisfy myself that geolocation results are correct, so Google Earth is an obvious favourite and is hugely user-friendly and intuitive. Google Street View is a good starting point to
orientate to a particular location. Even if satellite and street view imagery can only be snapshots in time, they give an idea of the normal pattern of life. Text content can be more challenging simply because of the number of sources, and in my line of work political fact-checking is important, especially when an international crisis is developing or a significant event has occurred. Politifact is useful for reviewing the US political environment. There is a need to use tools to automatically extract and analyse content, and I have used publicly available
tools, generally built on Python, and experimented with the Elasticsearch tool.
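The shadow-based chronolocation that tools like SunCalc automate rests on simple solar geometry: from a date, time and location you can compute the sun’s elevation, and from elevation and a measured shadow length you can recover an object’s height (or vice versa). The sketch below uses the standard NOAA low-accuracy approximation; the London coordinates in the example are my own illustration, not something Tim cites.

```python
import math
from datetime import datetime, timezone

def solar_elevation(lat_deg, lon_deg, when_utc):
    """Approximate solar elevation angle in degrees, via the NOAA
    low-accuracy formulas: fractional year -> solar declination and
    equation of time -> hour angle -> elevation."""
    n = when_utc.timetuple().tm_yday
    hour = when_utc.hour + when_utc.minute / 60
    g = 2 * math.pi / 365 * (n - 1 + (hour - 12) / 24)  # fractional year (radians)
    decl = (0.006918 - 0.399912 * math.cos(g) + 0.070257 * math.sin(g)
            - 0.006758 * math.cos(2 * g) + 0.000907 * math.sin(2 * g)
            - 0.002697 * math.cos(3 * g) + 0.00148 * math.sin(3 * g))
    eqtime = 229.18 * (0.000075 + 0.001868 * math.cos(g) - 0.032077 * math.sin(g)
                       - 0.014615 * math.cos(2 * g) - 0.040849 * math.sin(2 * g))
    # True solar time in minutes (input is UTC, so no timezone correction).
    tst = hour * 60 + eqtime + 4 * lon_deg
    ha = math.radians(tst / 4 - 180)  # hour angle
    lat = math.radians(lat_deg)
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(ha))
    return math.degrees(math.asin(sin_elev))

def height_from_shadow(shadow_m, elevation_deg):
    """Object height from its shadow length: h = s * tan(elevation)."""
    return shadow_m * math.tan(math.radians(elevation_deg))

# London around midsummer noon UTC: the sun sits at roughly 60 degrees.
elev = solar_elevation(51.5, -0.1, datetime(2021, 6, 21, 12, 0, tzinfo=timezone.utc))
```

In practice an investigator works the relationship backwards: given a known object height and a measured shadow in an image, candidate times of day can be narrowed down by searching for elevations that reproduce the observed ratio.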
Balkerne: From your experience of using open-source information, what opportunities are
presented to the insurance industry?
Tim Ripley: I’m sure everyone can think of news articles covering examples of individual
fraud being identified from information which, very often, the individuals perpetrating the fraud have themselves posted. For example, individuals attempting to claim for severe whiplash injuries but then posting images on social media such as Facebook or Instagram of an intense gym workout. More seriously, I think that Surowiecki’s book ‘The Wisdom of Crowds’, although over 15 years old, remains very relevant, and public posting on core platforms such as Twitter and Facebook can provide some remarkable opportunities for attaining both insight and knowledge. As events occur, members of the public will always post information about their
experiences – what they have seen, or indeed what they think. There is also a growing amount of information from what I call verifiable sources, such as police forces and fire services. This can all be hugely valuable information, especially if imagery is provided. Direct examples I can think of are during storm events, with the location, content and frequency of posts enabling insurance stakeholders to quickly analyse and estimate the likely
scale, type and location of losses. Undoubtedly, there is also the ability to use this information to mitigate potential losses. After major loss events, I would expect open-source information to be fundamental to the claims process and, of course, if that can be done at speed and scale through technology, it brings competitive advantages. For high-profile events, there will always be a place for the meticulous structuring of information to piece together in forensic detail everything about an event.