2025-08-26 8 minute read

Intelligence As A Business Function

Six weeks in, and this week I wanted to share how to find datasets, and how by systematically retrieving those datasets and writing a bit of code you can build a tool capable of generating actionable intelligence (for you or your business).

This isn't going to be written in a technical way, as almost all technical writing is very dull. Instead, it'll be a whistlestop tour of interesting approaches and findings so that by the end you hopefully begin to see how, in this world, 1 + 1 can equal 5.

Business Motivation

Almost every part of a business in the modern world exists because it contributes towards achieving some unique objective, and retrieving and analysing datasets is no different. As an example, my first business motivation is simple: businesses need customers, and nobody I know has bought anything from me (yet). This means that my customers must (probably) be people I don't already know. The question then is: who are they? Where are they? Would they be interested in subscribing? (just kidding... unless 👀)

The first step in this process is to sit and have a think about what's available on the internet. Which things are publicly accessible, and which organisations, governments, or hobbyists might already be gathering data which could help answer this 'who' and 'where'? A trick from a book I read recently is to define a 'customer archetype', which is just a short (and changeable) description of who exactly your customers are. In my case, this is going to be early stage asset management companies and tech consultancies, although ideally I could cast a slightly wider net if needed.

Now this is where it gets interesting: what we're looking to achieve is what my cryptography professor described as the creative moment. That is, the moment where you combine two pieces of information to unlock a new way to solve a problem. An example from student life might be: problem = what's to eat for dinner, information = haven't been shopping recently, no fresh vegetables available, answer = cereal for dinner*. This creative moment of realising that cereal is a viable option depends on the knowledge that you have milk, you have cereal, and that the two can be combined. In other words, with more information come more creative moments.

In the data world, we need to find the equivalent facts and then write some code to capture them so that in student terms we can make a bowl of cereal automatically, every day, for ever.

Quick note: I'm aware that this is somewhat backwards compared to the approach in most business textbooks, which is to find a person with a problem, solve it for them, and then find another person. My current approach is a little different, in that I know that every hedge fund, bank, family office, and other asset management company (and tech consultancy) needs reliable and discreetly collected datasets for whatever business motivations they are pursuing. My goal is therefore to find ones which are open to partnership, since I can adapt technically to any data acquisition requirements, but don't want to drift into the business of something like mobile app development.

Sourcing Datasets

When it comes to finding data sources (cereal), it's really about knowing which businesses depend on which datasets. For example, if you're trading equities (stocks and shares), you'll need price data about those equities. This is going to be available wherever those equities are traded, and probably with a hefty price tag depending on how much and how frequently you want it.

So back to the 'who' and 'where' problem: which industries rely on data about companies and employees? The recruitment industry stands out, since platforms like Indeed, CV Library, and of course LinkedIn contain profiles for most people and companies. However, platforms like this are typically pretty well defended, in that you might need to do the technical equivalent of wearing a trilby and trench coat each time you ask for some data, otherwise you might get blocked. Defences like this are never technically insurmountable, but it's a question of taking several hours/days to get something reliable set up, versus looking elsewhere for a source that's a bit more meaty.

The other obvious place where company and employee information is held is by the central government. In the UK that means Companies House, and unlike their peers in the business world, most government websites which aren't classed as critical national infrastructure are essentially wide open. In the case of Companies House specifically, this is actually by design, since this openness enables an entire genre of businesses to exist which use this data for science, strategy, and profit. With a viable source now identified, the next step is to go about getting their data in a recurring, reliable way.

Systematic Retrieval

As any Game of Thrones enjoyer would know, it's usually not enough to just drink and know things. Instead, you'll need to actually do something in order to make any kind of progress. Here, that means setting up servers, data storage infrastructure, sneaky secret tech things like proxy chains, counter-fingerprinting measures, and more.

I could insert a shameless plug here about how the experts over at OJS Group do this as a full time job, but this is an ad-free newsletter. Instead, let's assume that we've got all of that set up already, and can reliably retrieve data from Companies House and other places. With a bit of testing, we also know that the data is updated every day, so we can set our system up to retrieve it every morning before breakfast.

Creating Something More

Having identified at least one useful source, and set up a way of reliably retrieving it (plus any updates), the final step is to magically turn thing one and thing two into thing five. To do this, we need to introduce a second piece of information (the milk!) which can be combined with the companies data to generate some action we can take.

To get this right, we can go back to the original question of 'who' and 'where'. Data from Companies House gives us the 'who' every morning: we can see which companies were incorporated, who their directors are, and some other bits and bobs. We also have a 'registered office address' for both the companies and directors, but no way of turning that data into usable information. We also don't have any contact information for the directors, which is sort of the whole point.

To remedy both of these, we can use the postcodes in the addresses to get the latitude and longitude coordinates of the offices and directors (usually the same). This lets us plot them on a map, which then means we can see something like 'a new company was formed in Wimbledon, London, and the directors are X, Y, Z'. That's already pretty useful, but we can do better by adding the 'nature of business' to the display, which will let us home in on baby hedge funds, tech companies, crypto trading bros, and AI slop factories - really whatever we want.
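For the curious, here is a minimal sketch of that postcode-plus-nature-of-business step. It is not my actual pipeline. The Company record, the tiny POSTCODE_COORDS table (approximate, illustrative values keyed on the first half of the postcode) and the choice of SIC codes in TARGET_SIC_CODES are all assumptions made up for the example; a real version would use a full postcode dataset and whichever codes match your own customer archetype.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical lookup: outward postcode -> approximate (lat, lon).
# Illustrative values only; a real system would load a full postcode dataset.
POSTCODE_COORDS = {
    "SW19": (51.42, -0.21),  # roughly Wimbledon
    "EC2A": (51.52, -0.08),  # roughly Shoreditch
}

# Assumed 'nature of business' (SIC) codes of interest for this example.
TARGET_SIC_CODES = {"66300", "62020"}


@dataclass
class Company:
    """Hypothetical record shape for a newly incorporated company."""
    name: str
    postcode: str
    sic_codes: list[str]


def outward_code(postcode: str) -> str:
    """Return the outward part of a UK postcode, e.g. 'sw19 7ne' -> 'SW19'."""
    compact = postcode.replace(" ", "").upper()
    return compact[:-3]  # the inward part is always the last three characters


def locate(company: Company) -> tuple[float, float] | None:
    """Look up rough coordinates for a company, or None if unknown."""
    return POSTCODE_COORDS.get(outward_code(company.postcode))


def is_target(company: Company) -> bool:
    """True if any of the company's SIC codes is one we care about."""
    return any(code in TARGET_SIC_CODES for code in company.sic_codes)


def map_points(companies: list[Company]) -> list[dict]:
    """Keep target companies we can place on a map, with their coordinates."""
    points = []
    for company in companies:
        coords = locate(company)
        if is_target(company) and coords is not None:
            points.append(
                {"name": company.name, "lat": coords[0], "lon": coords[1]}
            )
    return points
```

The output of map_points is just a list of names and coordinates, which is the shape most mapping libraries want, so the plotting itself can be swapped out freely.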

This is even more useful, but for the final point we need an action to take. For me, this would be most useful as a daily email which said something like 'Here are the baby hedge funds and asset management firms which were born yesterday, and here are links to google/linkedin/reddit searches of their founding directors'. I can then click these links or store them as part of another system, to then 'reach out' and 'sync up' with them (😷), or just send them branded flowers or cake or something.
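Again for the curious, a rough sketch of how the search links and the email body could be put together. The URL patterns and query parameter names are assumptions based on how those sites' search pages look in a browser, and the function names and subject line are invented for the example. Nothing here actually sends an email; it only builds the message.

```python
from email.message import EmailMessage
from urllib.parse import urlencode

# Assumed search endpoints and their query parameter names.
SEARCH_SITES = {
    "google": ("https://www.google.com/search", "q"),
    "linkedin": ("https://www.linkedin.com/search/results/all/", "keywords"),
    "reddit": ("https://www.reddit.com/search/", "q"),
}


def search_links(director: str, company: str) -> dict[str, str]:
    """Build one search URL per site for a director at a given company."""
    query = f'"{director}" "{company}"'  # quoted for exact-phrase matching
    return {
        site: f"{base}?{urlencode({param: query})}"
        for site, (base, param) in SEARCH_SITES.items()
    }


def build_digest(rows: list[tuple[str, list[str]]]) -> str:
    """Turn (company, directors) pairs into a plain-text email body."""
    lines = ["Companies incorporated yesterday:", ""]
    for company, directors in rows:
        lines.append(company)
        for director in directors:
            lines.append(f"  {director}")
            for site, url in search_links(director, company).items():
                lines.append(f"    {site}: {url}")
        lines.append("")
    return "\n".join(lines)


def as_email(body: str, sender: str, recipient: str) -> EmailMessage:
    """Wrap the digest in an EmailMessage, ready to hand to an SMTP client."""
    message = EmailMessage()
    message["Subject"] = "Daily new company digest"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(body)
    return message
```

Keeping the link building separate from the email wrapping means the same links could just as easily be stored in another system instead of being emailed.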

Anyway, this entire process plays out multiple times and in many different ways across the finance world, and it's been really interesting building out something that's more of an intelligence/surveillance tool than a more standard data pipeline. It's been doubly interesting in that the systems and tools which I'm building are implicitly useful for my business, and are therefore self-validating. I'm working with others already to provide useful things for their businesses, so if it sounds like something you'd be interested in please get in touch! I hope this has been interesting, and that you can start to see how intelligence as a business function makes sense, since without it, I'd probably be doing something like endlessly scrolling LinkedIn fishing for leads, or worse, talking to people in person.

That's all for this week. Next week I'll be sharing another intelligence tool I'm working on to do with European land prices, but I haven't finished it yet - lots to do!

Thanks and all the best,

Oliver

* I encourage you to do this at least once a year to remember how it all started, preferably not for a major anniversary or holiday, but that's up to you.
