Noisy data has long been a challenge for fraud detection because it is difficult to separate the underlying signal from the noise, which in turn makes anomalies difficult to detect.
As a solution, we propose the use of low-rank matrix approximation.
Low-rank matrix approximation helps recover the underlying matrix as it was prior to noise interference. It also aims to find the most relevant and most consistent matrix, and it helps diagnose collinearity.
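Let $A \in \mathbb{R}^{m \times n}$.
Suppose $A = BC^T$, with $B \in \mathbb{R}^{m \times r}$ and $C \in \mathbb{R}^{n \times r}$.
Then the number of elements in $A$ is $mn$,
while the number of elements needed to store the factors $B$ and $C$ of $BC^T$ is $mr + nr$.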
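The storage comparison above can be checked numerically. The following is a minimal sketch using NumPy; the sizes `m`, `n`, `r` and the random factors are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 1000, 200, 5          # hypothetical sizes, not from the text

B = rng.standard_normal((m, r))  # B in R^{m x r}
C = rng.standard_normal((n, r))  # C in R^{n x r}
A = B @ C.T                      # A = B C^T, shape (m, n), rank at most r

dense_entries = A.size               # m * n
factored_entries = B.size + C.size   # m * r + n * r
print(dense_entries, factored_entries)  # 200000 6000
```

When $r$ is much smaller than $m$ and $n$, storing the two factors takes far fewer entries than storing $A$ itself.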
Because $A \in \mathbb{R}^{m \times n}$ is often large, there is a need to look for a matrix $A_k \in \mathbb{R}^{m \times n}$ of rank at most $k$, where $k \le r$.
To this end, the research in question proposes measuring the approximation error by the Frobenius norm of $A - A_k$:
$$\|A - A_k\|_F = \min_{\tilde{A}} \left\{ \|A - \tilde{A}\|_F : \tilde{A} \in \mathbb{R}^{m \times n},\ \operatorname{rank}(\tilde{A}) \le k \right\}$$
This takes the form of an optimization problem. The next steps depend on whether or not the problem is convex.
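To make the objective concrete, the sketch below evaluates the Frobenius-norm error for one candidate approximation. The helper name `frobenius_error` and the small example matrices are assumptions introduced here for illustration only.

```python
import numpy as np

def frobenius_error(A: np.ndarray, A_approx: np.ndarray) -> float:
    """Frobenius norm of the residual A - A_approx."""
    return float(np.linalg.norm(A - A_approx, ord="fro"))

# Hypothetical 2 x 2 example.
A = np.array([[3.0, 0.0],
              [0.0, 4.0]])
candidate = np.array([[0.0, 0.0],
                      [0.0, 4.0]])    # a rank-1 candidate

print(frobenius_error(A, candidate))  # 3.0
```

The optimization problem asks for the candidate of rank at most $k$ that makes this value as small as possible.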
Since the rank constraint makes this a non-convex problem, it is proposed that we use singular value decomposition (SVD), which solves it in closed form. This gives:
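$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$$
where $\sigma_1 \ge \sigma_2 \ge \dots$ are the singular values of $A$, and $u_i$ and $v_i$ are the corresponding left and right singular vectors.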
So that $\|A - A_k\|_F \le \|A - \tilde{A}\|_F$
for any $\tilde{A} \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(\tilde{A}) \le k$.
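The truncated SVD and its optimality property can be illustrated with a short NumPy sketch. The helper `truncated_svd`, the matrix sizes, the noise level, and the random competitor are all assumptions made for this example; it is not the authors' implementation.

```python
import numpy as np

def truncated_svd(A: np.ndarray, k: int) -> np.ndarray:
    """Return A_k, the sum of the k leading terms sigma_i * u_i * v_i^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the first k left singular vectors by their singular values.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
m, n, r, k = 60, 40, 5, 3   # hypothetical sizes

# Synthetic rank-r matrix plus small Gaussian noise (illustrative data only).
clean = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
A = clean + 0.01 * rng.standard_normal((m, n))

A_k = truncated_svd(A, k)
err_k = np.linalg.norm(A - A_k, ord="fro")

# The error equals the root of the sum of the discarded squared singular values.
s = np.linalg.svd(A, compute_uv=False)
tail = np.sqrt(np.sum(s[k:] ** 2))

# A random competitor of rank at most k should not achieve a smaller error.
A_tilde = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
err_tilde = np.linalg.norm(A - A_tilde, ord="fro")

print(np.isclose(err_k, tail), err_k <= err_tilde)
```

Comparing against a single random competitor is only a sanity check, not a proof; the inequality for every matrix of rank at most $k$ is guaranteed by the Eckart-Young theorem.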