1.10: Set - nealtran1905/PythonForResearch GitHub Wiki

Sets are unordered collections of distinct hashable objects. But what does it mean for an object to be hashable? That's a more technical topic, and we will not go into details here. In practice, what that means is you can use sets for immutable objects like numbers and strings, but not for mutable objects like lists and dictionaries. There are two types of sets. One type is called just "a set" (set in Python), and the other is called "a frozen set" (frozenset in Python). The difference between these two is that a frozen set cannot be changed once it has been created. In other words, it's immutable. In contrast, your usual, normal set is mutable.
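To make the hashable versus unhashable distinction concrete, here is a minimal sketch. The names values, list_rejected, frozen, and frozen_has_add are made up for illustration and do not come from the lecture.

```python
# Numbers and strings are immutable (hashable), so they can be set members.
values = set([1, 2.5, "a"])

# A list is mutable and therefore unhashable; trying to add one raises TypeError.
try:
    values.add([1, 2])
    list_rejected = False
except TypeError:
    list_rejected = True

# A frozenset cannot be changed after creation, so it has no add method.
frozen = frozenset([1, 2, 3])
frozen_has_add = hasattr(frozen, "add")

print(list_rejected)   # True
print(frozen_has_add)  # False
```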

You can think of a set as an unordered collection of objects.

One of the key ideas about sets is that they cannot be indexed.

So the objects inside sets don't have locations.
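As a small sketch of what "no locations" means in practice: asking for the element at position 0 fails, while membership tests still work. The name s is hypothetical and used only for this illustration.

```python
s = set([1, 2, 4])

# Sets do not support indexing, so s[0] raises TypeError.
try:
    s[0]
    indexing_failed = False
except TypeError:
    indexing_failed = True

print(indexing_failed)  # True

# Instead of asking where an object is, you ask whether it is in the set.
print(2 in s)  # True
print(3 in s)  # False
```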

Another key feature about sets is that the elements can never be duplicated.

So if you have a given element or object in your set, say the number 3, and you try adding that number to the set again, nothing happens. This means that all of the objects inside a set are always going to be unique or distinct. Python sets are especially useful for keeping track of distinct objects and doing mathematical set operations like unions, intersections, and set differences.

Let's next experiment with using sets. Let me start by creating an empty set. I'm going to create a set that I'm going to call ids. The idea is that this would contain the distinct ids in my study or my data set. I can create an empty set by just typing set, followed by a pair of parentheses. (Note that set is a built-in function, not a keyword.) In this case, I would have created a set called ids, and it would be empty. It would have no objects in it.

Let's say that I want to do something a little different. I'd like to create a set that has a few members in it. In this case, the syntax is very similar. I type set, followed by parentheses, and inside the parentheses I insert a list. Let's say that the ids of our subjects are the following: 1, 2, 4, 6, 7, 8, and 9. This is my initial set. If I want to ask how many members I have in this set, I can use the len function, and Python tells me that I have seven objects in this set.

Let's say I wanted to add one more id to this set. Let's call that id number 10. So I would type ids.add(10), and I am adding an object with id number 10 to my set. If I type ids, Python tells me that these are the current members of the set, and id number 10 has been added to it. If I try adding, let's say, number 2, which I already have in my set, and then I ask what the members of the set are now, you'll see that nothing has happened. This is one of the key features of sets. In other words, if you already have an object in the set and you try adding that same object again, nothing happens.
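The steps just described can be sketched as follows. The variable names empty_count, initial_count, after_new, and after_duplicate are added here only to record the intermediate sizes.

```python
# Create an empty set of ids.
ids = set()
empty_count = len(ids)

# Create a set with a few members by passing a list to set().
ids = set([1, 2, 4, 6, 7, 8, 9])
initial_count = len(ids)

# Adding a new id grows the set by one.
ids.add(10)
after_new = len(ids)

# Adding an id that is already present changes nothing.
ids.add(2)
after_duplicate = len(ids)

print(empty_count, initial_count, after_new, after_duplicate)  # 0 7 8 8
print(sorted(ids))  # [1, 2, 4, 6, 7, 8, 9, 10]
```

The sorted call is used only to print the members in a predictable order, since a set itself has no defined ordering.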
We can remove members or objects from sets using the pop method. In that case, Python removes an arbitrary member of the set and returns it to you. So I can run this a couple of times. If I look at the contents of my set, I can see now that I have five objects remaining in my set.

Let me redefine my ids set. Let's say that it consists of individuals with ids ranging from 0 to 9. I can look at the contents and this looks correct. Imagine that some of these individuals are male and the others are female. So I'm going to construct a set that I'm going to call males. Since it's a set, I build it from a list. And let's say that these are the ids of the males.

A very useful property of sets is that we can use them for mathematical set operations. I can now use the set males to define a new set that I'm going to call females. So I'm going to define females as all of the ids minus males. If I ask Python what the type of females is, Python tells me it's a set. I can look at the contents of that set. I can also look at the contents of my males set, and I see that these two are distinct. There are other ways to perform set operations in Python.
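A minimal sketch of pop and of the set difference follows. The specific male ids (the odd numbers) are an assumption chosen for illustration, and removed and size_after_pop are hypothetical helper names.

```python
# pop removes an arbitrary member and returns it.
ids = set([1, 2, 4, 6, 7, 8, 9, 10])
removed = ids.pop()
size_after_pop = len(ids)
print(size_after_pop)  # 7
print(removed in ids)  # False

# Redefine ids as the individuals 0 through 9.
ids = set(range(10))

# Assumed for illustration: the odd ids belong to the males.
males = set([1, 3, 5, 7, 9])

# Set difference: everyone in ids who is not in males.
females = ids - males

print(type(females))              # <class 'set'>
print(sorted(females))            # [0, 2, 4, 6, 8]
print(males.isdisjoint(females))  # True
```

Because pop picks an arbitrary member, the example checks only the size of the set and that the returned value is gone, not which value was removed.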

For example, I can perform the set union operation in a very handy way.

males = set([1,3,5,7,9])

females = set([2,4,6,8])

everyone = males | females

everyone

{1, 2, 3, 4, 5, 6, 7, 8, 9}

Let's say that I want to create a set which I'm going to call everyone, and everyone consists of all of the males and all of the females. The shorthand operator for a set union in Python is the vertical bar, |. Again, if I look at the contents of the set everyone, I can see that all of the set members are there.
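As one example of the "other ways" to perform set operations, the built-in union method gives the same result as the vertical bar. This is a small sketch; the name everyone_method is hypothetical.

```python
males = set([1, 3, 5, 7, 9])
females = set([2, 4, 6, 8])

# Union with the | operator.
everyone = males | females

# The same union written with the union method.
everyone_method = males.union(females)

print(everyone == everyone_method)  # True
print(sorted(everyone))             # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```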

Finally, I can take an intersection of two sets using the ampersand operator, &.

everyone & set([1,2,3])

{1, 2, 3}

Let's say I want to take everyone and intersect it with another set. I can define another set, which in this case consists of the ids 1, 2, and 3. Then I can ask Python to return everybody who is in the intersection of these two sets, one set containing members 1, 2, and 3 and the other one containing everybody. In this case, the answer is the set that consists of the ids 1, 2, and 3. Note that the intersection returns a new set and everyone itself is left unchanged.

As a simple application of sets, let's use sets to count the number of unique letters in a word. So let me first define my word of interest. Let's go with something a little more complicated, something like antidisestablishmentarianism. I spelt that right. What I can do next is construct a set, so I just say set. I construct that from my string, which is called "word", and I'm going to call the result "letters". To find out how many unique letters I have in this word, I just ask Python to return the length of the letters object, which is 12. So in this case, we were able to use the set object to simply count the number of unique letters in a string.
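Both steps can be sketched in a few lines. The names word and letters follow the description above, while common is a hypothetical name for the intersection result.

```python
everyone = set([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Intersection: members that appear in both sets.
common = everyone & set([1, 2, 3])
print(sorted(common))  # [1, 2, 3]
print(len(everyone))   # 9, the original set is unchanged

# Building a set from a string keeps one copy of each character.
word = "antidisestablishmentarianism"
letters = set(word)
print(len(letters))  # 12
```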