We're comparing several thousand texts to calculate their similarity, based on Jaccard distance. The number of texts we compare can go up to 100,000. Each text has a reference number.
At the end of this comparison process, we get (n*(n-1))/2 similarity values, one per pair of texts, where n is the number of texts we compared.
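To make the scale of the comparison step concrete, here is a minimal Python sketch of pairwise Jaccard similarity. It assumes each text is reduced to its set of lowercase, whitespace-separated words; the posting does not say how the sets are built, so that tokenization, the function names, and the sample texts are illustrative assumptions only.

```python
from itertools import combinations
from typing import Dict, Tuple


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity of two texts over their sets of lowercase words."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # assumed convention: two empty texts count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def pairwise_similarities(texts: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Similarity for every unordered pair of reference numbers."""
    return {
        (i, j): jaccard_similarity(texts[i], texts[j])
        for i, j in combinations(sorted(texts), 2)
    }


# Hypothetical sample data, keyed by reference number.
texts = {
    "A1": "the cat sat on the mat",
    "A2": "the cat sat on the sofa",
    "A3": "stock prices fell sharply",
}
pairs = pairwise_similarities(texts)
n = len(texts)
assert len(pairs) == n * (n - 1) // 2  # 3 pairs for 3 texts

# For n = 100,000 the same formula gives roughly 5 billion pairs.
pairs_at_full_scale = 100_000 * 99_999 // 2  # 4_999_950_000
```

At the full 100,000 texts a plain dictionary of all pairs is unlikely to fit comfortably in memory, so a real implementation would probably need to stream or aggregate the values rather than store them all.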
We now want to extract the least similar texts after this comparison work. We want to have 2 options: extract the x least similar texts, or extract all the texts with a maximum similarity of n %.
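One possible reading of these two options, sketched in Python: score each text by its highest similarity to any other text, then either take the x lowest-scoring texts or keep every text whose score is at or below the threshold. This is only an assumption about what "least similar" means here. A stricter reading, where every pair inside the extracted set must stay under the threshold, is a harder combinatorial problem (it amounts to finding a large independent set in a graph) and is not what this sketch does. The function names and sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def max_similarity_per_text(pairs: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """For each reference number, the highest similarity to any other text."""
    best: Dict[str, float] = {}
    for (i, j), s in pairs.items():
        best[i] = max(best.get(i, 0.0), s)
        best[j] = max(best.get(j, 0.0), s)
    return best


def least_similar(best: Dict[str, float], x: int) -> List[str]:
    """Option 1: the x texts with the lowest maximum similarity."""
    # Ties are broken by reference number so the result is deterministic.
    return sorted(best, key=lambda ref: (best[ref], ref))[:x]


def below_threshold(best: Dict[str, float], max_percent: float) -> List[str]:
    """Option 2: all texts whose maximum similarity is at most max_percent %."""
    return sorted(ref for ref, s in best.items() if s * 100 <= max_percent)


# Hypothetical pairwise similarities for four texts (values in [0, 1]).
pairs = {
    ("A1", "A2"): 0.80, ("A1", "A3"): 0.10, ("A1", "A4"): 0.05,
    ("A2", "A3"): 0.25, ("A2", "A4"): 0.00, ("A3", "A4"): 0.30,
}
best = max_similarity_per_text(pairs)
option_1 = least_similar(best, 2)
option_2 = below_threshold(best, 50)
```

With this sample data both options select A3 and A4, since A1 and A2 are 80 % similar to each other.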
We also want to generate a table giving the following information: the number of texts we can extract with a maximum similarity ratio of x %, with x going from 0 to 100 in increments of 1.
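As a rough sketch of how such a table could be produced, the Python below assumes each text has already been given a single score in [0, 1], namely its highest similarity to any other text. That scoring rule is an assumption, not something the posting specifies, and the function name and sample scores are hypothetical. Sorting the scores once and using binary search keeps the 101 lookups cheap even for 100,000 texts.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def extraction_table(best: Dict[str, float]) -> List[Tuple[int, int]]:
    """Rows of (x, count): how many texts have a maximum similarity <= x %."""
    scores = sorted(best.values())
    # bisect_right returns the number of scores that are <= x / 100.
    return [(x, bisect_right(scores, x / 100)) for x in range(101)]


# Hypothetical per-text maximum similarities, keyed by reference number.
best = {"A1": 0.80, "A2": 0.80, "A3": 0.25, "A4": 0.0}
table = extraction_table(best)

for x, count in table[:3]:
    print(f"{x:3d} %  {count}")
```

The counts never decrease as x grows, so the table can also be read backwards to find the smallest threshold that yields a desired number of texts.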
To address this part, we need to hire a mathematician. The calculation/algorithm will then be implemented by our developer.
Hello, respected employer! I have experience in mathematics and programming. I have also completed a task that used text similarity measures (such as Levenshtein distance) for semantic image search. Best regards, Dmitry S.