Prob 0022 ‐ Names Scores - maccergit/Euler GitHub Wiki
This is straightforward - but includes file processing, sorting, and string handling. The first thing to do when tackling a problem like this is to take a look at the input file to see what format it has. Again, the name does not quite match what was given in the problem description - it is "0022_names.txt". This is not a big deal, but worth noting - we could rename it, but I like to leave it alone if possible. The other thing we see is that the file is comma-separated values, with each value surrounded by double-quotes, and all on a single line. Also, the names are not in alphabetical order. To solve this, we need to read in the names (without any extra characters), sort them, and then compute each name's score (its alphabetical value multiplied by its position in the sorted list) and sum the scores.
01
Python can easily read the entire file into memory as one long string - don't forget to strip off any leading/trailing whitespace (like a newline at the end of the file). Splitting on comma characters gives us a list of names, but each name is still surrounded by double-quotes - slicing off the first and last characters inside a generator expression takes care of that. Python's built-in "sorted" function then gives us a sorted list of names.
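As a minimal sketch of the steps above (not necessarily the code used for the timings), the parsing could look like this. The function names "parse_names" and "load_names" are made up for illustration, and the default path assumes the file sits in the current directory.

```python
def parse_names(text):
    """Turn the raw file contents into a sorted list of bare names."""
    # strip() drops the trailing newline; each field looks like "MARY",
    # so the [1:-1] slice removes the surrounding double-quotes.
    return sorted(field[1:-1] for field in text.strip().split(","))


def load_names(path="0022_names.txt"):
    """Read the whole file as one string and parse it (path is an assumption)."""
    with open(path) as f:
        return parse_names(f.read())


# Small in-memory sample in the same format as the input file.
print(parse_names('"MARY","PATRICIA","LINDA"\n'))  # ['LINDA', 'MARY', 'PATRICIA']
```

Keeping the parsing separate from the file access makes it easy to try the logic on a short string before pointing it at the real file.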
With that out of the way, we now turn our attention to getting the value of each name. Since we need letter values, we note that most code pages do not have 'A' = 1, 'B' = 2, and so on (there is only one that I know of that has A-Z in the first positions of the code page). However, they do tend to have A-Z in sequence, so that the value for 'Z' minus the value for 'A' is 25. This works for ASCII and the encodings built on it (Latin-1, UTF-8, Unicode code points in general); it also helps that the letters used here are all upper-case. EBCDIC is a notable exception: its letters sit in three separate runs (A-I, J-R, S-Z), so simply subtracting the code for 'A' would not work there. Once we have a function that takes a string of letters and computes the corresponding value for the string, we are almost done. It's again easy to use a generator expression to compute the required sum. Since this problem is a fixed size, we just get the timing results in a table (we pass in a limit of 0 simply because the timing API wants one - but the limit is ignored here) - taking the average of 100 runs is done in under a second, as the average run is only ~7 milliseconds.
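The scoring step might be sketched as follows. The names "name_value" and "total_score" are hypothetical, and the code assumes every name consists only of upper-case A-Z, as in the input file.

```python
def name_value(name):
    """Sum of letter positions: 'A' -> 1, 'B' -> 2, ..., 'Z' -> 26."""
    # Subtracting (ord('A') - 1) maps 'A' to 1; this relies on A-Z being
    # contiguous, which is true for Python's Unicode strings.
    offset = ord("A") - 1
    return sum(ord(ch) - offset for ch in name)


def total_score(sorted_names):
    """Sum of (1-based position * name value) over an already sorted list."""
    return sum(
        position * name_value(name)
        for position, name in enumerate(sorted_names, start=1)
    )


print(name_value("COLIN"))      # 3 + 15 + 12 + 9 + 14 = 53
print(total_score(["A", "B"]))  # 1*1 + 2*2 = 5
```

"enumerate" with "start=1" supplies the 1-based position directly, so there is no separate counter to keep in step with the list.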
02
Since comma-separated value files are commonly used (as CSV files - the name is sometimes expanded as "character-separated values", as the separator character is often TAB, or "|", or ";", or whatever...), we might expect Python to have built-in support for them - and it does, in the standard "csv" module. Its reader knows how to parse CSV files (defaulting to a "," separator, as this is the most common use), including handling quotes. It can optionally skip whitespace after a separator (the "skipinitialspace" option), and "csv.DictReader" can use the first line of a CSV file as a header line. Of course, once the data is read in, the rest of the program is the same. Since I don't include the reading of the file as part of the test time, the timing is unchanged - it was just a fun exercise in seeing if we can simplify the CSV parsing.
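A rough sketch of the "csv" approach, assuming the whole file is a single record as described earlier. "parse_names_csv" is a name invented for this example, and an in-memory "io.StringIO" stands in for the real file.

```python
import csv
import io


def parse_names_csv(lines):
    """Parse a one-record CSV source into a sorted list of names."""
    # csv.reader accepts any iterable of strings (an open file works too)
    # and removes the surrounding double-quotes for us.
    reader = csv.reader(lines)
    row = next(reader)  # all names are on the first (and only) line
    return sorted(row)


# In-memory stand-in for the input file; with a real file, the csv docs
# recommend opening it with newline="".
sample = io.StringIO('"MARY","PATRICIA","LINDA"\n')
print(parse_names_csv(sample))  # ['LINDA', 'MARY', 'PATRICIA']
```

Compared with the manual version, there is no strip, split, or slice to get right - the reader handles the quoting rules.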