ICE8 Description - MadhuriGumma/Python-Programming GitHub Wiki
In ICE 8 I tried to understand how Markov chains work. A Markov chain is a probabilistic model that transitions from one state to another, with the transition probabilities learned from training data, so it needs a corpus of training data to be fed to it.
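As a minimal sketch of the idea (not the wiki's code; the names and helper here are hypothetical), a character-level Markov chain just counts which character follows each key of length `order`:

```python
from collections import defaultdict

# Hypothetical helper: count, for each key of length `order`,
# how often each character follows it in the training words.
def build_transitions(words, order=1):
    transitions = defaultdict(lambda: defaultdict(int))
    for w in words:
        for i in range(len(w) - order):
            transitions[w[i:i + order]][w[i + order]] += 1
    return transitions

counts = build_transitions(["anna", "ann"], order=1)
# counts["a"]["n"] == 2  ("a" was followed by "n" once in each word)
# counts["n"]["a"] == 1  (only in "anna")
```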
In our case we are generating names for two genders, boys and girls, with the help of a Markov chain. For this we have fed commonly used names into two files, namesBoys.txt and namesGirls.txt. Though the Markov chain helps in generating names, the results are often nonsense: it blindly learns patterns from the data it is fed. But sometimes it can produce very unique and good names.
To generate baby names we first build a map, which takes order and gender as parameters. Once we know the gender, we choose which file to read the input data from:
```python
def buildMap(order, gender):
    listOfNames = []
    if gender.lstrip() == 'b':
        fileName = "namesBoys.txt"
    elif gender.lstrip() == 'g':
        fileName = "namesGirls.txt"
    else:
        exit("Please only enter b or g")
```
Based on the order, the map looks for a pattern it can use, i.e. which character usually (most frequently) follows a given key. This gives a familiarity factor for the names and builds an occuranceMap: for every key, a currentMap counts the characters that follow it. The process continues until all the names have been read from the file and every character of every name has been checked for the existence of a key, i.e. until the pattern of occurrence in the names is captured.
```python
    occuranceMap = {}
    for n in listOfNames:
        # Loop through characters within a name to find the keys and their following chars
        for index in range(0, len(n) - order):
            key = n[index:order + index]
            nextChar = n[index + order:index + order + 1]
            # Check for existence of the key
            if key in occuranceMap:
                currentMap = occuranceMap[key]
                if nextChar in currentMap:
                    currentMap[nextChar] += 1
                else:
                    currentMap[nextChar] = 1
                occuranceMap[key] = currentMap
            else:
                newEntry = {}
                newEntry[nextChar] = 1
                occuranceMap[key] = newEntry
    return occuranceMap
```
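To see what this loop produces, here is the same counting logic run standalone on a tiny in-memory list (hypothetical names, no file needed) with order = 2:

```python
listOfNames = ["emma", "emily"]  # hypothetical training data
order = 2
occuranceMap = {}
for n in listOfNames:
    for index in range(0, len(n) - order):
        key = n[index:order + index]
        nextChar = n[index + order:index + order + 1]
        occuranceMap.setdefault(key, {})
        occuranceMap[key][nextChar] = occuranceMap[key].get(nextChar, 0) + 1

# "em" was followed by "m" (in "emma") and by "i" (in "emily")
print(occuranceMap)  # {'em': {'m': 1, 'i': 1}, 'mm': {'a': 1}, 'mi': {'l': 1}, 'il': {'y': 1}}
```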
Then, once a key is selected as the starting characters of a name (a, b, c, and so on), we look up which characters can come next:
```python
def getCharsForKey(key, map):
    chars = map[key]
    return chars
```
After that it forms a list containing each candidate character repeated as many times as it occurred, and picks one entry at random. Characters with a higher frequency therefore have a higher chance of being chosen, and if more than one character has the same frequency of occurrence, they get equal shares of the list.
```python
def generateNextChar(order, name, map):
    chars = getCharsForKey(name[len(name) - order:len(name)], map)
    listOfChars = []
    charsMap = list(chars.items())
    for (key, value) in charsMap:
        for i in range(0, value):
            listOfChars.append(key)
    randomIndex = random.randint(0, len(listOfChars) - 1)
    selectedLetter = listOfChars[randomIndex]
    return selectedLetter
```
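The frequency-to-probability trick can be seen in isolation: expanding a (hypothetical) frequency map into a flat list makes a uniform random pick behave like a weighted one:

```python
import random

chars = {"a": 3, "e": 1}  # hypothetical frequencies for one key
listOfChars = []
for key, value in chars.items():
    listOfChars.extend([key] * value)
# listOfChars == ["a", "a", "a", "e"], so "a" is picked with probability 3/4

random.seed(0)
picks = [listOfChars[random.randint(0, len(listOfChars) - 1)] for _ in range(1000)]
# roughly three quarters of the picks will be "a"
```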
Finally, names are generated with the help of all the functions above, appending characters until the length of the name reaches maxLength + order:
```python
def generateNewNames(minLength, maxLength, order, count, gender):
    print("")
    print("")
    print("Here are generated names. Hope you like them! :)")
    # occuranceMap is a map of maps. Its keys are short character sequences based
    # on the Markov order, and its values are maps from each following character
    # to the number of times it occurs
    occuranceMap = buildMap(order, gender)
    c = 0
    while (c < count):
        name = "" * order
        while len(name) < maxLength + order:
            char = generateNextChar(order, name, occuranceMap)
            if char == '':
                break
            name += char
        # Remove some unwanted characters
        generatedName = name.replace('\n', '').replace('\r', '')
        # We only accept names whose length is bigger than minLength
        if (len(generatedName[order:]) > minLength):
            print(generatedName[order:])
            c += 1
```
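The whole pipeline can also be sketched end-to-end without file I/O. This is a simplified, hypothetical variant (not the wiki's exact code): it seeds the name with the opening characters of a random training name and uses a newline terminator to decide when a generated name may end:

```python
import random

def generate_name(names, order, max_length, rng):
    # Build the transition table; "\n" marks where a name ended in training.
    table = {}
    for n in names:
        n = n + "\n"
        for i in range(len(n) - order):
            table.setdefault(n[i:i + order], []).append(n[i + order])
    # Start from the opening `order` characters of a random training name.
    name = rng.choice(names)[:order]
    while len(name) < max_length:
        nxt = rng.choice(table.get(name[-order:], ["\n"]))
        if nxt == "\n":
            break
        name += nxt
    return name

rng = random.Random(42)
for _ in range(3):
    print(generate_name(["emma", "emily", "ella", "eleanor"], 2, 8, rng))
```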
This process uses the state transitions of a Markov chain, together with the inputvalidator and random packages, to validate the input data and generate new names.
For the given input, the output is:
I think the name "ardou" is the best name it has given; it falls outside the learned pattern and far from the most probable sequence of occurrence.