Skip to content

The joy of sets in C#

September 22, 2012

.NET 4 introduced a new interface ISet<T>.  What is this and why might we want to use it?

First, a set is defined as a “data structure that can store certain values, without any particular order, and no repeated values”.  So it’s like a collection, but with uniqueness of its members enforced.  In fact, in .NET, ISet<T> implements ICollection<T>, so we can use a set wherever we are currently using a collection, if uniqueness is what we need.  And, more often than not, it turns out that a set is the more appropriate data structure to use in the real world.

For instance, imagine a call centre application that allows the staff to look up a customer details based on surname.  The API method behind this function could be declared to return a ICollection<Customer>, but as it wouldn’t be helpful to show the same customer twice in the search results, it makes more sense to instead return a set, which describes more accurately the semantics of the API.

True, collections or lists are sometimes appropriate, but in the general case, sets are usually what you want.

Implementations in .NET

.NET gives us two implementations of ISet<T> – the HashSet and the SortedSet.

HashSet<T>

The basic implementation of a set.  The class uses a hash table internally to keep track of its items, allowing it to quickly determine if the set already contains an item.  It does this by calling GetHashCode() on the object and searching for this code in the hash table.  If the lookup succeeds then the object is considered to be part of the set.

How we define the semantics of uniqueness is down to the business rules.  For instance, in our Customer example above, we may want to consider two different instances of a Customer object equal if they both have the same CustomerNumber, as they must therefore represent the same customer.  How could we do this?  We make sure customers with the same customer number return the same value for GetHashCode() by overriding the method.  Here’s an example:

public class Customer
{
    public string CustomerNo { get; private set; }
    public string FullName { get; set; }
    // and other properties...

    public Customer(string customerNo)
    {
        CustomerNo = customerNo;
    }

    public override int GetHashCode()
    {
        // use hash code of customer number
        return CustomerNo.GetHashCode();
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        return obj.GetType() == GetType()
            && String.Equals(CustomerNo, ((Customer) obj).CustomerNo);
    }
}

Some points of interest:

  • We’ve just delegated hashing responsibility to the String class here.  Much better to trust the .NET framework than to write our own function (good hash functions can be tricky).
  • Because we’ve overridden GetHashCode(), we must also override the Equals() method.  If two objects are considered Equal, then they must return the same hash code, and vice versa.
  • We should ensure that the value(s) used to calculate our hash code will not change during the life of the object.  For instance, if an object is added to a hash table with one hash code, and a property on which this hash code depends later changes, then we might not be able to read it back out again as its hash code will have also changed.  For this reason, we declared the CustomerNo property with a private setter.

SortedSet<T>

The SortedSet<T> enforces uniqueness, but also guarantees a consistent sort order.  This comes at the price of performance.  While probably still perfectly fine for most real world tasks, it uses a tree structure internally instead of a hash table, so for large sets certain operations could be slower.

Note that IComparable<T> must also be implemented for items added to the set, as this is how the SortedSet determines both if the item already exists in the set, and the order in which the items are sorted.  The set’s order is not, as you might expect, the order in which the items were added!

For instance, by default ints are sorted in ascending order, and strings will be returned alphabetically.  Here’s an example demonstrating this for strings:

var stringSet = new SortedSet<string> 
{
    "first item",
    "second item",
    "last item"
};

foreach(var str in stringSet)
{
    Debug.WriteLine(str);
}
// output:
// first item
// last item
// second item

See how the last item was returned before the second item?

If your type T does not implement IComparable<T>, or if it does but you want to define a custom sort order, then there are constructors that allow you to supply your own IComparer<T>.

Working with ISet<T>

Once of the nice things about the .NET implementation of a set is its Add() method.  Because a set must contain unique values, you might think that this method would throw an exception should we try to add the same value twice.  Instead, the method just returns a bool indicating whether the add was successful.  In the case of a duplicate item, it does nothing except return false.  This saves us having to litter our code with try/catches.  Here’s Add() in action:

var set = new HashSet<int> { 1, 2, 3 };
Debug.WriteLine(set.Count);
set.Add(1);
set.Add(1);
Debug.WriteLine(set.Add(1));
Debug.WriteLine(set.Count);
// outputs:
// 3
// False
// 3

The upshot of this philosophy is that we can easily convert a non-unique collection to a set just using the constructor:

var ints = new[] { 1, 1, 2, 2, 3, 3, 3, 3 };
var set = new HashSet<int>(ints);
Debug.WriteLine(set.Count);
Debug.WriteLine(String.Join(",", set));
// outputs:
// 3
// 1,2,3

There’s also a bunch of useful methods provided for us to calculate set equality, subsets, supersets, unions, etc.  Here an example of SetEquals():

var set1 = new HashSet<int> { 1, 2, 3 };
var set2 = new HashSet<int> { 3, 2, 1 };
Debug.WriteLine(set1.SetEquals(set2));
// outputs:
// True
Advertisements

From → Programming

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: