Oscar Kilo Ltd (OK) provides data analysis services and tools. These include:
- Data Screening and Classification Services
- Data Mining
- Statistical Analysis, AI
- Software development
- Database services
- Geographical Information Systems
OK was formed to deploy some unique and proven techniques that had been developed for our adaptive real-time classification system, DETECT. The principles behind DETECT were originally developed to address the problem of credit-card fraud detection.
Credit-card fraud detection is an example of a ‘difficult’ classification problem. While the cost to banks and credit-card companies is high, the actual incidence of fraud is very low: less than 0.1% of transactions. There are therefore very few exemplars on which to base predictions. There are many technical difficulties associated with such rare-event classification problems, some of which are described later.
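To make the difficulty concrete, the short Python sketch below works through the base-rate arithmetic for a screen applied to rare events. Only the 0.1% fraud rate comes from the text above; the detection and false-alarm rates are hypothetical figures chosen for illustration, not measurements of DETECT. Even a screen that catches 90% of fraud and wrongly flags only 1% of legal transactions produces roughly eleven false alarms for every genuine fraud it finds.

```python
def screening_outcome(n_transactions, fraud_rate, detection_rate, false_alarm_rate):
    """Expected counts when a screen is applied to n_transactions.

    fraud_rate:       fraction of transactions that are fraudulent
    detection_rate:   fraction of fraudulent transactions the screen flags
    false_alarm_rate: fraction of legal transactions the screen flags
    """
    n_fraud = n_transactions * fraud_rate
    n_legal = n_transactions - n_fraud
    caught = n_fraud * detection_rate
    false_alarms = n_legal * false_alarm_rate
    flagged = caught + false_alarms
    # Precision: share of flagged transactions that really are fraud.
    precision = caught / flagged if flagged else 0.0
    return {"flagged": flagged, "caught": caught,
            "false_alarms": false_alarms, "precision": precision}


# Hypothetical screen: catches 90% of fraud, flags 1% of legal transactions.
result = screening_outcome(1_000_000, 0.001, 0.90, 0.01)
print(f"flagged={result['flagged']:.0f} caught={result['caught']:.0f} "
      f"precision={result['precision']:.1%}")
# flagged=10890 caught=900 precision=8.3%
```

Because legal transactions outnumber fraud by about a thousand to one, the false-alarm rate dominates the result, which is why keeping false alarms down matters as much as catching fraud.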
OK’s background in classification problems extends beyond rare-event classifications to other areas like document indexing.
Data Screening and Classification Services
The colloquial example of a hard search problem is 'finding a needle in a haystack'; however, it is even harder to find a particular grass type in a haystack. The needle is very different from the hay and can be easily identified. Suppose, however, that the hay contains hundreds of different species of grass, and that the task is to identify a particular rare species of grass that is subtly different from all the others. To make things even more difficult, you have only ever seen a few thousand examples of the special grass, and these are all that classifications can be based on.
This contrived example illustrates a common type of problem. A typical real-life example is credit-card fraud detection: here the data consists of millions of transactions with so many features that no two are exactly the same. Most are perfectly legal. Only a tiny percentage are known to be fraudulent, and there is almost as much variation amongst these as there is amongst the legal transactions.
Data like this is so sparse and inconsistent that spotting future fraud with a system that bases each decision on the sample data seems an impossible task. However, while there simply is not enough information to make black-and-white decisions, we can perform powerful screening, in real time, to flag which transactions should be investigated further by human experts, who would otherwise be overwhelmed by the sheer quantity of data. The name of the game is to flag likely fraud without ‘crying wolf’ too often.
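One simple way to express 'flag likely fraud without crying wolf' is to rank records by a risk score and pass on only as many as the investigators can handle. The sketch below assumes each record already carries a numeric score from some model; the function name `select_for_review`, the `capacity` limit and the `min_score` floor are hypothetical, and this is a generic top-k screen rather than a description of how DETECT works.

```python
import heapq
from typing import Iterable, List, Tuple


def select_for_review(scored: Iterable[Tuple[str, float]], capacity: int,
                      min_score: float = 0.0) -> List[str]:
    """Return the IDs of at most `capacity` records, highest score first.

    Hypothetical top-k screen, not the DETECT algorithm. Records scoring
    below `min_score` are never flagged, so a quiet day produces fewer
    alerts instead of filling the queue with weak cases.
    """
    eligible = ((score, rec_id) for rec_id, score in scored if score >= min_score)
    # nlargest keeps a bounded heap instead of sorting the whole stream.
    return [rec_id for _, rec_id in heapq.nlargest(capacity, eligible)]


# Hypothetical scores for five transactions.
scores = [("t1", 0.02), ("t2", 0.91), ("t3", 0.40), ("t4", 0.75), ("t5", 0.05)]
print(select_for_review(scores, capacity=2, min_score=0.3))  # ['t2', 't4']
print(select_for_review(scores, capacity=5, min_score=0.3))  # ['t2', 't4', 't3']
```

Tying the number of alerts to investigator capacity, with a floor on the score, is one way to trade missed fraud against false alarms; in practice both limits would need tuning on historical data.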
We specialise in building systems that classify large numbers of events or other kinds of data records or documents.
Such data typically has the following characteristics:
- Each record has many different attributes (fields).
- There are logical relationships between records, such as time order.
- There is a large range of possible variation between records.
- Sparse occurrence of the kind of record we are interested in identifying (a fraction of a percent).
- Insufficient data to make positive IDs from the information contained in the records alone.
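The time-order relationship in the list above often matters because a record is best judged against the history of the same account. As a hypothetical illustration (not OK's method), the sketch below compares each transaction's amount with the mean of that account's earlier amounts; the function name `amount_vs_history` and the choice of feature are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def amount_vs_history(
    transactions: List[Tuple[str, int, float]],
) -> List[Optional[float]]:
    """For each (account, timestamp, amount) record, return the amount divided
    by the mean of that account's earlier amounts.

    Returns None where there is no usable history (the account's first record,
    or a history whose mean is zero). Hypothetical feature, not OK's method.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    ratios: List[Optional[float]] = [None] * len(transactions)
    # Visit records in time order so each ratio uses only earlier records.
    order = sorted(range(len(transactions)), key=lambda i: transactions[i][1])
    for i in order:
        account, _, amount = transactions[i]
        if counts[account]:
            mean = totals[account] / counts[account]
            if mean > 0:
                ratios[i] = amount / mean
        totals[account] += amount
        counts[account] += 1
    return ratios


txns = [("A", 1, 20.0), ("A", 2, 30.0), ("B", 1, 50.0), ("A", 3, 500.0)]
print(amount_vs_history(txns))  # [None, 1.5, None, 20.0]
```

Account A's third transaction scores 20 against its own history, even though an amount of 500 might be unremarkable for many other accounts.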
Some typical applications are:
- Credit Card Fraud Detection: detecting fraudulent transactions in real-time.
- Insurance Fraud Detection: assessing which claims to investigate.
- Delinquent Spending on Store Cards: spotting indicators of behaviour likely to lead to runaway spending.
- Mobile phone fraud: identifying calls from stolen or cloned phones or other network misuse.
- Medical: occasionally patients have adverse reactions to commonly used drugs, and it may be possible to identify patients who would benefit from closer monitoring. A system like ours, which can be constantly re-calibrated, might pick up on geographical or other changing risk factors that might otherwise be discovered only later or missed altogether.
- Shipping container security: identifying which containers to inspect from bills of lading data.
- Tax fraud: selecting which tax returns to investigate. There may be many analogous applications in central and local government as more and more business (both C2G and B2G) is conducted online. The E-government agenda is that all transactions will be conducted online within 18 months.
- Police and Security Services: these may have access to many of the data sources from other application areas but look at them from another perspective: deciding which organisations or individuals to investigate further by combining the data with previously known criminal or terrorist activity.
- Sales, Marketing and Business Intelligence: modern call-centre operations accumulate large quantities of data, and there may be applications in deciding which calls to follow up on the basis of data from previous successful sales. The ability to keep up with changing trends might give our system the edge over traditional techniques.
All of these examples require a very different kind of search from the one at which Google (and OK’s Octopus) is adept, where large numbers of documents are searched for a very specific feature set (keywords). It is much closer to the problem of filtering email for junk on the basis of examples of junk and non-junk messages, except that junk email is not a rare event and email is largely 'unstructured' data.
Contact us:
If you feel we can help your business please contact us to discuss your requirements.