Opinion Fraud Detection: Uncovering Fake Identities and Misleading Reviews

Introduction

In this case, it is proposed that the use of text-based statistical analysis is a fit approach. We will scrap data from Twitter and Facebook by API open authorization and form a corpus of words. Further, we plan to visualize and analyze the data as per LNRE models. Here, the applicability of LNRE models will be proved by proving two equations- one on vocabulary size and the other on tokens - V (N) and the other on frequency class of N samples - V (m, N).

Mathematical Framework

If the pdf of a Gamma (Ф, s) distribution of a variable p is defined as :

This proves equation no. 2

Apart from that, the concerned research proposes to test the lexical diversity of such collected corpora. To test the lexical diversity, the following indices are proposed:

Codes:

Available on request

Page updated

Google Sites

Report abuse