I need a tool developed that can take the following INPUT:
1-5 columns of Excel or CSV text data containing consumer-provided brand names. The data will contain noise such as multiple spellings (e.g. "Adidas" and "Addiddas"), non-answers (e.g. "dont know"), and blanks.
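As a rough illustration of the input step, here is a minimal Python sketch that reads up to five CSV columns and drops blanks and non-answers. The `NON_ANSWERS` set and the `read_responses` name are assumptions made for this example, and header rows and Excel files are not handled.

```python
import csv
import io

# Assumed list of non-answers; extend it to match the real survey data.
NON_ANSWERS = {"dont know", "don't know", "dk", "n/a", "na", "none", "nothing"}

def read_responses(handle, max_columns=5):
    """Collect non-blank, non-junk cells from the first max_columns columns."""
    responses = []
    for row in csv.reader(handle):
        for cell in row[:max_columns]:
            text = " ".join(cell.split())  # trim and collapse whitespace
            if text and text.lower() not in NON_ANSWERS:
                responses.append(text)
    return responses

sample = io.StringIO("Adidas,dont know,\nAddiddas,,Budwieser\n , com cast,N/A\n")
print(read_responses(sample))
# prints ['Adidas', 'Addiddas', 'Budwieser', 'com cast']
```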
I need the tool to provide the following OUTPUT:
1. Cleaned, homogenized brand lists with frequencies, so that the output becomes a list in this form:
BRAND NAME (PROPER): FREQUENCY
Adidas:17 - including Adidas, Aiddiddas, Addidas (matching should work within the scope of what Google would autocomplete, or other solutions you may have)
Budweiser:12 - including Budwieser, etc.
Comcast:5 - including com cast, comccast, etc.
The output can be in HTML, PHP, or Excel; the format doesn't matter.
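One possible approach to the cleaning and counting step, sketched in Python. It uses the standard library's `difflib.get_close_matches` as a stand-in for autocomplete-style matching, which is an assumption and not the same thing as what Google does. The `BRANDS` list, the `tally` name, and the `0.75` cutoff are all placeholders that would need tuning against real data.

```python
from collections import Counter
from difflib import get_close_matches

BRANDS = ["Adidas", "Budweiser", "Comcast"]  # stand-in for the delivered brand list

def _key(text):
    """Lowercase and drop non-alphanumerics so 'com cast' compares as 'comcast'."""
    return "".join(ch for ch in text.lower() if ch.isalnum())

def tally(responses, brands=BRANDS, cutoff=0.75):
    """Return (counts per proper brand name, responses that matched nothing)."""
    by_key = {_key(b): b for b in brands}
    counts, unmatched = Counter(), []
    for response in responses:
        # Best single match above the similarity cutoff, if any.
        hit = get_close_matches(_key(response), list(by_key), n=1, cutoff=cutoff)
        if hit:
            counts[by_key[hit[0]]] += 1
        else:
            unmatched.append(response)
    return counts, unmatched

counts, unmatched = tally(
    ["Adidas", "Aiddiddas", "Addidas", "Budwieser", "com cast", "comccast", "Pepsi"]
)
for brand, n in counts.most_common():
    print(f"{brand}:{n}")
print("Unmatched:", unmatched)
```

Returning the unmatched responses separately lets the user review them and decide whether they are new brands or noise.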
The tool needs to be delivered with a brand list of the most common North American brands, and the user must be able to add to this dictionary.
The tool needs to be responsive to user input, so that additional groupings can be forced in the output. For example, it may not group "Coors" and "Coors Light" together by default, but it should do so if the user requests it.
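A minimal sketch of how forced groupings could work, assuming the tool already has per-brand counts. The `apply_forced_groups` function and the shape of the `forced` mapping are hypothetical, and the counts in the example are made up for illustration.

```python
from collections import Counter

def apply_forced_groups(counts, forced):
    """Fold counts of child brands into the parent the user chose.

    forced maps a brand to the brand it should be reported under,
    e.g. {"Coors Light": "Coors"}.
    """
    merged = Counter()
    for brand, n in counts.items():
        merged[forced.get(brand, brand)] += n
    return merged

counts = Counter({"Coors": 4, "Coors Light": 3, "Adidas": 17})
print(apply_forced_groups(counts, {"Coors Light": "Coors"}))
# prints Counter({'Adidas': 17, 'Coors': 7})
```

Keeping the forced groupings in a separate mapping means the default matching stays untouched and the user can undo a grouping by removing one entry.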
It needs to be able to learn new languages and brand lists easily in the future.
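One simple way to keep the dictionary extendable is to store each brand list as a plain text file, one brand per line, so that a new language or market is just another file. The sketch below assumes that layout; the `load_brands` and `add_brand` names and the `brands_fr.txt` file name are hypothetical.

```python
from pathlib import Path

def load_brands(path):
    """Read one brand per line, skipping blank lines. A missing file is an empty list."""
    p = Path(path)
    if not p.exists():
        return []
    lines = p.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]

def add_brand(path, brand):
    """Add a brand unless it is blank or already listed (case-insensitive)."""
    brand = brand.strip()
    brands = load_brands(path)
    if not brand or brand.lower() in {b.lower() for b in brands}:
        return False
    brands.append(brand)
    # Rewrite the whole file so every entry ends with a newline.
    Path(path).write_text("\n".join(brands) + "\n", encoding="utf-8")
    return True
```

For example, `add_brand("brands_fr.txt", "Decathlon")` would create the file if needed and return `True` the first time, then `False` on a repeat call.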