Develop Text Classification and/or Clustering Algorithms in Python

$250-750 USD

Completed
Posted almost 8 years ago

Paid on delivery
We require assistance with the following tasks. Please contact us directly to describe how you would solve them. Russian language skills may be necessary.

1) Task: Develop or employ a text-classification algorithm in Python or R that classifies items as one of several thousand 10-digit product codes, using a descriptive text field of roughly 300 characters in UTF-8 (Russian / Cyrillic).
Description: We have a database of several million textual descriptions of products that have been entered by humans. Each entry is connected to a 10-digit product code, but the same product code can be used for multiple differing textual entries. We require a text-classification algorithm that probabilistically classifies a document and that can then be applied to another dataset (see task 2). This task requires tokenizing, stemming, and removing stop words, so you may need to know Russian or to use available NLTK packages. Similarly, several different algorithms may need to be used to improve precision.
Output: Python scripts/algorithm(s) classifying documents into 10-digit product codes that can be used in task 2.

2) Task: Use the classification algorithm in (1) to classify textual entries in a second dataset.
Description: Once the clean list has been created, employ a machine learning algorithm to assign the 10-digit codes to a target dataset of over 60 million textual product descriptions in UTF-8 (Russian / Cyrillic). Not all entries will have sufficient information to be classified ('leftovers'), and these should be marked as such. For example, an entry could be marked as a leftover if no classification has a probability above some threshold. Also, the dataset in (1) only contains examples of a subset of the items in the second dataset, but we will be able to estimate which items these are.
Output: The second dataset of 60 million entries, matched to 10-digit product codes.
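The classify-then-threshold idea in tasks 1 and 2 could be sketched roughly as below. This is only an illustration under several assumptions, not the client's or any bidder's method: it uses a small hand-written multinomial naive Bayes model, a tiny made-up stop-word list, made-up training rows, and invented product codes (`0000000001`, `0000000002`). Stemming is left out; NLTK's Snowball stemmer, which supports Russian, is one option for that step. The names `tokenize`, `NaiveBayes`, `classify`, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption); a real run needs a full Russian list.
STOP_WORDS = {"и", "в", "на", "для", "из", "с"}

def tokenize(text):
    """Lowercase, split into word tokens (Cyrillic included), drop stop words."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, codes):
        self.word_counts = defaultdict(Counter)  # code -> token counts
        self.doc_counts = Counter(codes)         # code -> number of documents
        self.vocab = set()
        for text, code in zip(texts, codes):
            tokens = tokenize(text)
            self.word_counts[code].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, text):
        """Return {code: probability}, normalized over all known codes."""
        tokens = tokenize(text)
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for code, n in self.doc_counts.items():
            total = sum(self.word_counts[code].values())
            score = math.log(n / n_docs)  # class prior
            for t in tokens:
                score += math.log((self.word_counts[code][t] + 1) / (total + vocab_size))
            log_scores[code] = score
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

def classify(model, text, threshold=0.6):
    """Return (code, probability); code is None when the entry is a 'leftover'."""
    probs = model.predict_proba(text)
    code, p = max(probs.items(), key=lambda kv: kv[1])
    return (code if p >= threshold else None), p

# Made-up training rows with invented 10-digit codes.
texts = [
    "молоко коровье пастеризованное",
    "молоко сухое цельное",
    "гвозди стальные строительные",
    "гвозди оцинкованные кровельные",
]
codes = ["0000000001", "0000000001", "0000000002", "0000000002"]
model = NaiveBayes().fit(texts, codes)

print(classify(model, "молоко пастеризованное"))  # confident match
print(classify(model, "товар без описания"))      # no known words: leftover
```

At the scale described (millions of training rows, 60 million target rows, thousands of codes), a sparse-matrix library implementation would be the more realistic choice; the sketch only shows how a probability threshold separates classified entries from leftovers.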

3) Task: For the 'leftovers' of (2), develop or employ a text clustering algorithm that groups entries into k subclasses.
Description: We will provide you with a higher-level grouping variable for the 'leftovers' and a number k that designates how many clusters we need within each grouping. Your task will be to use a text clustering algorithm to create k clusters within each of the higher-level groups for the 'leftovers'.
Output: A unique variable designating cluster membership for each item in the 'leftovers' (those without 10-digit product codes from step 2).
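One possible reading of task 3 is to run a separate clustering pass inside each higher-level group and to combine the group name with the cluster index into a single membership label. The sketch below assumes bag-of-words vectors, cosine similarity, and a small spherical k-means with deterministic farthest-first seeding; none of these choices comes from the posting. The group names, sample rows, and function names (`vectorize`, `kmeans`, `cluster_leftovers`) are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def vectorize(text):
    """Bag-of-words vector as a dict, scaled to unit length."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}

def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def mean_vector(vectors):
    """Sum the vectors and rescale the result to unit length."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}

def kmeans(vectors, k, iterations=20):
    """Spherical k-means; returns one cluster index per input vector."""
    k = min(k, len(vectors))
    # Farthest-first seeding: start from the first vector, then repeatedly
    # add the vector that is least similar to the seeds chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(far)
    labels = None
    for _ in range(iterations):
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels

def cluster_leftovers(rows, k):
    """rows: list of (group, text). Returns one 'group:cluster' label per row."""
    by_group = defaultdict(list)
    for i, (group, _) in enumerate(rows):
        by_group[group].append(i)
    result = [None] * len(rows)
    for group, idxs in by_group.items():
        labels = kmeans([vectorize(rows[i][1]) for i in idxs], k)
        for i, lab in zip(idxs, labels):
            result[i] = f"{group}:{lab}"
    return result

# Made-up leftovers with a made-up higher-level grouping variable.
rows = [
    ("food", "молоко сухое"),
    ("food", "молоко цельное"),
    ("food", "хлеб ржаной"),
    ("food", "хлеб белый"),
    ("tools", "гвозди стальные"),
    ("tools", "молоток стальной"),
]
for (group, text), label in zip(rows, cluster_leftovers(rows, k=2)):
    print(label, text)
```

Prefixing the cluster index with the group name keeps the membership variable unique across groups, since cluster indices restart at 0 inside every group.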
Project ID: 10479220

About the project

18 proposals
Remote project
Active 8 yrs ago

Awarded to:
Hello! My name is Andrey. I'm a physicist from Russia with experience in machine learning, and I know how to apply ML methods in practice. For example, I developed a predictive algorithm for sports betting; you can find more information about that job in my Upwork profile. I also have experience in text classification: at work I developed a model for classifying Wikipedia articles. You can also find me on kaggle.com, where my nickname is gradiente. First of all, though, I need to estimate the amount of work and see text samples for classification. I hope to work with you on this project!
$611 USD in 20 days
5.0 (6 reviews)
5.5
18 freelancers are bidding on average $1,329 USD for this job
Hi there. I would love to be part of this project, as it seems very interesting. I am a data scientist with experience applying data mining algorithms to large amounts of data for prediction and description. I do not know Russian, but I do have experience using existing packages to preprocess data. I would do all tasks in Python. Hope to hear back from you soon. Thanks, Daniel
$526 USD in 10 days
4.9 (101 reviews)
7.8
We are a group of Data Scientists based in Bangalore. Our core areas of expertise are big data and machine learning.
$10,000 USD in 40 days
4.9 (9 reviews)
6.4
I am a computer science professional with a PhD degree and excellent skills in Python and a number of other languages. I've done many projects involving Clustering or Classification. I'm also a fluent Russian speaker. Please see reviews on my profile. It would be my pleasure to do your project. Here is another large project in which I had to process a large volume of texts in Russian using Python: https://www.freelancer.com/projects/Python/Data-Extraction-from-Word-documents/
$1,000 USD in 10 days
5.0 (59 reviews)
6.3
I am very interested in your project and have experience in this field. If you work with me, the project will succeed. I am ready to start working with you now. Phon.
$736 USD in 10 days
4.9 (25 reviews)
5.8
Dear Client, Greetings from Flowgica Technologies. We have experience with these skills and with similar projects, so I look forward to discussing the work and moving ahead. Please check our Freelancer portfolio at https://www.freelancer.com/u/mmadi.html?page=portfolio I am ready to work with you and await your response. Thanks & regards, Mmadi
$600 USD in 10 days
5.0 (1 review)
4.0
My name is Mike and I'm from the UK. I work with individual clients and also provide outsourcing services for a number of UK- and USA-based agencies. Your project sounds interesting to me, and I have the skills and experience required to complete it. I can show you some examples of my work. Please contact me to discuss your project.
$555 USD in 10 days
5.0 (1 review)
3.2
I have gone through your requirements. We have done similar work before. I look forward to your earliest reply so that we can discuss the project.
$555 USD in 10 days
0.0 (0 reviews)
0.0
Hello, I understand the initial scope of this project, but I would like to discuss the job further in order to prepare the final concept. After a full discussion by call or chat, I will prepare the following for you: a technical project proposal, a flow chart for the project, and an execution plan (a step-by-step procedure explaining how and when we will execute each task).
$773 USD in 20 days
0.0 (0 reviews)
0.0
I currently work part time, using R on a daily basis. I have practical experience with R programming and with classification algorithms, text mining, clustering, and machine learning. I am also a student of Economics and Econometrics in Prague.
$1,666 USD in 10 days
0.0 (0 reviews)
0.0
A proposal has not yet been provided
$1,111 USD in 21 days
0.0 (0 reviews)
0.0

About the client

Washington, United States
5.0
6
Payment method verified
Member since Jan 6, 2016
