Programming for Everybody: Assignment 07.2 Files - edorlando07/datasciencecoursera GitHub Wiki

### Python Data Structures

7.2 Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:

X-DSPAM-Confidence: 0.8475

Count these lines, extract the floating point value from each one, compute the average of those values, and produce output as shown below. Do not use the sum() function or a variable named sum in your solution. You can download the sample data at http://www.pythonlearn.com/code/mbox-short.txt.

When testing, enter mbox-short.txt as the file name.
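
The extraction step for a single line can be sketched as follows. This is a minimal Python 3 illustration, not part of the assignment text; the helper name `parse_confidence` and the constant `PREFIX` are hypothetical names chosen for this example.

```python
PREFIX = "X-DSPAM-Confidence:"

def parse_confidence(line):
    """Return the float that follows the prefix, or None if the line does not match."""
    line = line.strip()
    if not line.startswith(PREFIX):
        return None
    # Everything after the prefix is the number; float() tolerates the leading space.
    return float(line[len(PREFIX):])

print(parse_confidence("X-DSPAM-Confidence: 0.8475"))   # 0.8475
print(parse_confidence("X-DSPAM-Probability: 0.0000"))  # None
```

Slicing from the end of the prefix avoids hard-coding how many digits the value has.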

A sample of the file structure is listed below:

Received: (from apache@localhost)
by nakamura.uits.iupui.edu (8.12.11.20060308/8.12.11/Submit) id m05ECIaH010327
for [email protected]; Sat, 5 Jan 2008 09:12:18 -0500
Date: Sat, 5 Jan 2008 09:12:18 -0500
X-Authentication-Warning: nakamura.uits.iupui.edu: apache set sender to [email protected] using -f
To: [email protected]
From: [email protected]
Subject: [sakai] svn commit: r39772 - content/branches/sakai_2-5-x/content-impl/impl/src/java/org/sakaiproject/content/impl
X-Content-Type-Outer-Envelope: text/plain; charset=UTF-8
X-Content-Type-Message-Body: text/plain; charset=UTF-8
Content-Type: text/plain; charset=UTF-8
X-DSPAM-Result: Innocent
X-DSPAM-Processed: Sat Jan  5 09:14:16 2008
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000

Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772

Author: [email protected]
Date: 2008-01-05 09:12:07 -0500 (Sat, 05 Jan 2008)
New Revision: 39772

Modified:
content/branches/sakai_2-5-x/content-impl/impl/src/java/org/sakaiproject/content/impl/ContentServiceSqlOracle.java
content/branches/sakai_2-5-x/content-impl/impl/src/java/org/sakaiproject/content/impl/DbContentService.java
Log:
SAK-12501 merge to 2-5-x: r39622, r39624:5, r39632:3 (resolve conflict from differing linebreaks for r39622)

----------------------
This automatic notification message was sent by Sakai Collab (https://collab.sakaiproject.org/portal) from the Source site.
You can modify how you receive notifications at My Workspace > Preferences.

The solution code, written for Python 2, is listed below:

fname = raw_input("Enter file name: ")
fh = open(fname)

count = 0    # number of X-DSPAM-Confidence lines seen so far
total = 0.0  # running total of the confidence values

for line in fh:
    line = line.rstrip()
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    atpos = line.find(':')
    # Take everything after the colon; float() ignores the leading space
    value = float(line[atpos + 1:])
    count = count + 1
    total = total + value

fh.close()

average = total / count

print "Average spam confidence: " + str(average)

The output for the code listed above is the following:

Enter file name: mbox-short.txt
Average spam confidence: 0.750718518519
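
For readers on Python 3, where `raw_input` and the `print` statement are not available, the same approach can be sketched as a function. This is a minimal illustration under stated assumptions rather than the original solution: the function name `average_confidence` and the in-memory `sample` lines are hypothetical, and returning `None` when no line matches is a design choice added here to avoid dividing by zero.

```python
def average_confidence(lines):
    """Average the X-DSPAM-Confidence values found in an iterable of lines."""
    count = 0
    total = 0.0
    for line in lines:
        line = line.rstrip()
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
        # Keep the text after the first colon and convert it to a float.
        value = float(line[line.find(":") + 1:])
        count += 1
        total += value
    if count == 0:
        return None  # no matching lines, so there is no average to report
    return total / count


# Illustrative in-memory lines standing in for a file such as mbox-short.txt.
sample = [
    "X-DSPAM-Result: Innocent",
    "X-DSPAM-Confidence: 0.75",
    "X-DSPAM-Probability: 0.0000",
    "X-DSPAM-Confidence: 0.25",
]
print("Average spam confidence:", average_confidence(sample))  # 0.5
```

Because the function accepts any iterable of lines, an open file can be passed directly, for example `with open(fname) as fh: average_confidence(fh)`. Python 3 prints floats with more digits than Python 2's `str()`, so the printed average for the real file may show more decimal places than the output above.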