Archive

Archive for December, 2010

Database Sharding

December 2, 2010 1 comment

Database Sharding is not a new concept but has been historically difficult to implement. Society and business alike have an insatiable appetite for growing amounts of data and as such, there is a need for larger databases than ever before.

The onset of Microsoft’s Azure presents interesting possibilities to scale in massive proportion. Utilizing Azure tables, massive scale is possible. Azure tables however is not SQL based and present a different paradigm than most development shops are familiar with. SQL Azure essentially gives us the option of having Microsoft SQL-Server in the cloud but is currently limited in database size.

How then would one implement a solution that accounts for large amounts of relational data to scale in similar fashion to Azure Tables Storage? One method would be to shard the database or in other words, breakup the database into smaller chunks and store them in as many databases as makes sense.

The details of the implementation can sound a bit daunting but as with most well architected patterns, it isn’t quite as complicated as it may first seem. This blog will introduce one such pattern that was used in a social networking application. The application was built for ultimate scalability utilizing Frameworks, Windows Azure Web Roles, Azure Worker Roles, Azure Table Storage and SQL Azure. 

Aside: An interesting by-product of this approach, especially when using Frameworks, is that you can easily mix and match relational data with Azure Tables. Frameworks can provide persistence ignorance. Both storage mechanisms have their strengths, so why not take advantage of them both within the same application?

The basic premise is that a database schema is used for a functional purpose such as person information or blogging data. Within each function and logical schema, there is any number of physical shards (or partitions if you will).

When a user logs on, we ‘fan out’ to all Person shards to find their primary key and partition id. Once we have keys, the PartitionId will point us to the database in which that person’s data lives. We then perform normal SQL operations.

Frameworks code generates the data access layer and abstracts the sharding patterns. Since the details of data access are left to code generation, the developer just needs to call the DAOs in an intelligent manner. The code below shows a PersonManger which will call the data access layer. A front-end might first call the PersonManager with a user name and password to retrieve a Person object. The Person will now have a PartitionId that points us to the shard in which their data lives. The pattern continues as blogs are kept in another shard while posts of the blog in yet another shard.

public class PersonManager : IPersonManager
{
     // Store the Data Access objects injected by the IOC
     // so that we can use them for the life of the object.
     private IPersonDao _PersonDao;
     private IBlogDao _BlogDao;
     private IPostDao _PostDao;

     // Use IOC to inject the needed concrete implementations...
     public PersonManager(IPersonDao personDao, IBlogDao blogDao, IPostDao postDao)
     {
          _PersonDao = personDao;
          _BlogDao = blogDao;
          _PostDao = postDao;
     }

     // This is a pass through method as we only want the BusinessRules layer
     // to interact with Data Access Layer.
     public Person GetByUserNameAndPassword(string username, string password)
     {
          // The generated code will 'Fan out' to all the database in
          // an asyncronous manner.
          return _PersonDao.GetByUserNameAndPassword(username, password);
     }

     // Now that we have the person, we can retrieve their blog
     public Person GetBlog(IPerson person)
     {
          // Fan out is not needed because we know the id of the
          // partition in other words, shard
          return _BlogDao.GetByPersonId(person.PartitionId, person.Id);
     }

     // Post are kept yet in another shard...
     public IList<Post> GetPostList(IBlog blog, string[] tagArray)
     {
          // Again, fanning is not needed
          return _PostDao.GetByBlogAndTags(blog.PostPartitionId, blogId, tagArray);
     }

} 

This has been an introduction into Database sharding using Frameworks. In the next part we will deep dive into the data access layer and look at the generated code.

%d bloggers like this: