LINQ and Geocoding

August 30, 2007

There’s a lot you can do with LINQ and the new C# goodies that are coming up in .NET 3.0. In previous posts I’ve talked about LinqToSql, vars and Lambdas, and using LINQ to query things other than a DB. Today I want to try to do something a little more applicable and business-oriented to show just how far you can push LINQ and LinqToSQL. I’m putting on my geek hat today, cowboy size…

Geocoding Basics
Mapping applications are everywhere, and having “locational awareness” is paramount when running customer-related applications. Gathering stats about a user’s IP can tell you a lot about their physical location, which in turn can tell you a lot about the market you’re serving.

The core of geocoding is setting an address or location to a point on the globe using Latitude and Longitude. This is pretty easily done using some free services out there like geocoder.us and Google (you need a free Google Maps API key to use this API). Between the two, geocoder.us is much more accurate in terms of finding addresses, but Google is about 100 times faster if you have a lot of data.

Geocoder.us is really easy to use:

        public static Coordinates GetLatAndLong(string address) {
            Coordinates result = null;
            //using RPC.Geocoder - this is a free service

            //do the web request and get the reply
            string reply = SubSonic.Sugar.Web.ReadWebPage("http://rpc.geocoder.us/service/csv?address=" + address);

            //the reply is in CSV Format, like this:
            //38.898748,-77.037684,1600 Pennsylvania Ave NW,Washington,DC,20502
            if(!String.IsNullOrEmpty(reply)) {
                try
                {
                    result = new Coordinates();
                    string[] resultCSV = reply.Split(',');
                    if (resultCSV.Length > 0)
                    {
                        result.Latitude = resultCSV[0];
                        result.Longitude = resultCSV[1];
                        result.Address = resultCSV[2];
                        result.City = resultCSV[3];
                        result.State = resultCSV[4];
                        result.Zip = resultCSV[5];
                    }
                }
                catch
                {

                }
            }
            return result;

        }

I’m using SubSonic to grab the web page, but the code’s really simple and if you need it, leave me a comment and I’ll post it.

Preparing Your Data
So I have an old database of about 3000 US hotels here on my box that I’m going to use for a demo. They don’t have any Latitude/Longitude data - so I’ll need to add that in. The table is really basic - just name and address info. To this I’m going to add two fields of NULLABLE REAL, called Latitude and Longitude.

Next, I kick up a LinqToSql class and drag my Hotels table on. I want to make a minor adjustment to the model - I don’t like working with Sytem.Float, so instead I set the model to use System.Double for lat/long instead:

dbcontext 

Now I just need to run a loop on all the hotels in the table (where longitude is null), and then set the values accordingly:

        static void GeoCodeData()
        {
            //create the context
            Geo.DataContext db = new Geo.DataContext();

            //load up the hotels that we don't have any geo data for
            var result = from hotels in db.Hotels
                         where hotels.Longitude==null
                         select hotels;

            //counters for output
            int totalHotels = result.Count();
            int currentHotel = 1;

            foreach (var hotel in result)
            {
                //our result object
                Geo.Coordinates coord;

                //build a usable address from the record
                string fullAddress=hotel.Street+", "+hotel.City+","+hotel.State+" "+hotel.Zip;

                //notify what's going on
                Console.WriteLine("(" + currentHotel.ToString() + " of " + totalHotels.ToString() + " - " + fullAddress);

                //look it up
                coord = Geo.GeoCodeService.GetLatAndLong(fullAddress);
                if (coord != null)
                {
                    //set the values
                    hotel.Latitude = float.Parse(coord.Latitude);
                    hotel.Longitude =float.Parse(coord.Longitude);
                    Console.WriteLine("Found it! **************************");

                    //save the data as we go - want to save each record
                    //in case of trouble, like network outage or something
                    db.SubmitChanges();
                }
                else
                {
                    //can't find the address
                    Console.WriteLine("XXXXXXXXXXX No luck XXXXXXXXXXXXXXXX");
                }
                currentHotel++;
            }

            //save it
            Console.WriteLine("Done!");
            Console.ReadLine();
        }

This is going to take a while since I have 3000 records. If you have a lot - use Google’s API, it’s a lot faster. I have the code that you can download here to use with Google if you like.

Assuming that all worked out, let’s GEEK OUT!

Running a Proximity Query
If you liked math in school, you’re going to love this next part. Let’s create a business case so we can write some code for our newly-geocoded data:

The application needs to locate hotels for people within a specified distance of a given location. For instance, if we’re traveling to La Mesa, CA to learn more about Jon Galaxy and Submarines, we might need a place to stay. The application should return all hotels within x miles of La Mesa.

There’s a lot of math involved with translating distances between two sets of coordinates on a sphere. The X and Y of it all gets skewed by the fact that the earth is round, and you have to run up some very in-depth trigonometry to get this right.

There is a formula you can use, and it’s called the Haversine Formula:

Now I forgot long ago how to read this notation, but a quick Google or two led me to this post on (appropriately named) “Ask Dr. Math” (note the initial “presumption” - I love math geeks):

Presuming a spherical Earth with radius R (see below), and that the
locations of the two points in spherical coordinates (longitude and
latitude) are lon1,lat1 and lon2,lat2, then the Haversine Formula
(from R. W. Sinnott, “Virtues of the Haversine,” Sky and Telescope,
vol. 68, no. 2, 1984, p. 159):

  dlon = lon2 - lon1
  dlat = lat2 - lat1
  a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2
  c = 2 * atan2(sqrt(a), sqrt(1-a))
  d = R * c

[where R (the Earth's Radius)] is 360 degrees times 60 minutes/degree times 1.852 km/minute = 40003.2 km. The implied radius is the circumference divided by 2 pi: R = 6367 km = 3956 mi

Bleh. Well, after some tinkering I was able to write up a nice method that ended up being about 180 lines long - for one equation! All to calculate the distance between two points on a sphere!

I decided to try and flex Lambda expressions for what they’re good for, and was able to write the above (with some minor tweaks to avoid a multi-step function) into this (the Lambda is at the bottom):

        const Double EARTH_RADIUS_IN_MILES = 3956.0;
        //helper method to make reading the lambda a bit easier
        static double ToRadian(double val)
        {
            return val * (Math.PI / 180);
        }
        //helper method for converting Radians, making the lamda easier to read
        static double DiffRadian(double val1, double val2)
        {
            return ToRadian(val2) - ToRadian(val1);
        } 

        /// <summary>
        /// Function to evaluate the distance between two points on the Earth
        /// </summary>
        Func<double, double, double, double, double> CalcDistance = (lat1, lon1, lat2, lon2) =>
        EARTH_RADIUS_IN_MILES * 2 *
        (
            Math.Asin(
                Math.Min(1,
                    Math.Sqrt(
                        (
                            Math.Pow(Math.Sin((DiffRadian(lat1, lat2)) / 2.0), 2.0) +
                            Math.Cos(ToRadian(lat1)) * Math.Cos(ToRadian(lat2)) *
                            Math.Pow(Math.Sin((DiffRadian(lon1, lon2)) / 2.0), 2.0)
                        )
                   )
               )
           )
         )
        ;

This bit of code illustrates three things about Lambda expressions:

  1. Their roots. They come from computational languages and they’re all about Math, and it shows here!
  2. They are pretty damn powerful - reducing my 180 lines of code to 14 - and I could have made it one if I wanted to
  3. They’re ugly. I wouldn’t want anyone to try and understand this. At the same time - that’s precisely what made the “regular” function so long - I tried to be as clear as I could with all the math and ended up creating a monster. I spose if you’re into math, reading a Lambda expression is pretty easy - so maybe this is preferred?

Now the good news here is that you can put this into a Scalar function if you want to put it into your DB. It’s up to you and your business to determine maintainability - there is a way to do this (I’ll show you below) without stuffing logic into your DB - heck you can even port the solution to MySQL if you want!

Anyway - here’s the SQL Function (which I grabbed from here):

CREATE FUNCTION [dbo].[DistanceBetween] (@Lat1 as real,
                @Long1 as real, @Lat2 as real, @Long2 as real)
RETURNS real
AS
BEGIN

DECLARE @dLat1InRad as float(53);
SET @dLat1InRad = @Lat1 * (PI()/180.0);
DECLARE @dLong1InRad as float(53);
SET @dLong1InRad = @Long1 * (PI()/180.0);
DECLARE @dLat2InRad as float(53);
SET @dLat2InRad = @Lat2 * (PI()/180.0);
DECLARE @dLong2InRad as float(53);
SET @dLong2InRad = @Long2 * (PI()/180.0);

DECLARE @dLongitude as float(53);
SET @dLongitude = @dLong2InRad - @dLong1InRad;
DECLARE @dLatitude as float(53);
SET @dLatitude = @dLat2InRad - @dLat1InRad;
/* Intermediate result a. */
DECLARE @a as float(53);
SET @a = SQUARE (SIN (@dLatitude / 2.0)) + COS (@dLat1InRad)
                 * COS (@dLat2InRad)
                 * SQUARE(SIN (@dLongitude / 2.0));
/* Intermediate result c (great circle distance in Radians). */
DECLARE @c as real;
SET @c = 2.0 * ATN2 (SQRT (@a), SQRT (1.0 - @a));
DECLARE @kEarthRadius as real;
/* SET kEarthRadius = 3956.0 miles */
SET @kEarthRadius = 6376.5;        /* kms */

DECLARE @dDistance as real;
SET @dDistance = @kEarthRadius * @c;
return (@dDistance);
END

If you visit the site with the link above, please note that the code he has there is incorrect (the C# stuff). I don’t have the math chops to know why, but I do know that I had to translate Dr. Math’s equation into a Lambda to make it work properly.

Putting It Together
OK so now we have the code - let’s use it! The first thing I can do, if my boss is a DBA and I’m in a SQL Server shop is to crank up an SP that uses the function above:

CREATE PROCEDURE FindNearestHotel
(
    @sourceLat real,
    @sourceLong real,
    @mileage int
)
AS
BEGIN
        SELECT     Name, Address, Latitude, Longitude,
        dbo.DistanceBetween(
            @sourceLat,
            @sourceLong,
            Hotels.Latitude,
            Hotels.Longitude) as distance
        FROM         Hotels
        WHERE dbo.DistanceBetween(
            @sourceLat,
            @sourceLong,
            Hotels.Latitude,
            Hotels.Longitude)<@mileage
END

If I run this (using a test Lat/Long) it works nicely:

 

sptest

Exploding LINQ
When I wrote up that Lambda I remember giggling to myself, thinking “there’s just no way LINQ is going to know what to do with this!”. Was I right? Let’s see…

The first thing I did was to create a partial class and extend the Hotel class to have a method called “CalculateDistanceFrom()” - doing it this way will allow me to use the method as a condition in a LINQ where statement. So I added in this to my partial, referencing the Lambda function (and helpers) above:

        public double CalculateDistanceFrom(double fromLat, double fromLon)
        {
            return CalcDistance(this.Latitude,this.Longitude, fromLat, fromLon);
        }

Next, I worked up the following LINQ statement:

 

        Geo.DataContext db = new Geo.DataContext();
        var result = from hotel in db.Hotels
                    where hotel.CalculateDistanceFrom(zip.Latitude, zip.Longitude) < 100
                    select hotel;

It reads nice doesn’t it? And the survey says….

KABOOOM!

LINQ didn’t like this at all - but I have to applaud it for trying! The error I got was “Cannot translate this to SQL” and that makes perfect sense, since there’s some whacky gymnastics in there. Hmmm - looks like I’ve maxxed out LINQ! Or did I…

I remember seeing some great querying examples from Anders and others, so I decided to do some caching of the data, and querying the cache instead of LinqToSql (db.Hotels). This is a nice alternative because it decouples my query from my DB! The List<Hotel> could have come from any DB provider (even, *gasp*, MySQL).

To make this work, I created a static List<Hotel> to hold ALL of the hotels in the DB in memory, and I then created a “singleton” access point to get this data:

        //cached data set of hotels
        static List<Geo.Data.Hotel> hotels;

        //singleton instance... sort of
        static List<Geo.Data.Hotel> GetHotels()
        {
            if (hotels == null)
            {
                Geo.DataContext db = new Geo.DataContext();
                hotels = db.Hotels.Where(hotel => hotel.Longitude != null).ToList<Geo.Data.Hotel>();
            }
            return hotels;
        }

Now, I can rewrite the query pretty easily:

        var result = from hotel in GetHotels()
                     where hotel.CalculateDistanceFrom(zip.Latitude, zip.Longitude) < 100
                     select hotel;

        foreach (var h in result)
        {
            Console.WriteLine(h.Name);
        }
        Console.ReadLine();

And it worked perfectly :), which really isn’t a surprise. What is nice about this is that, once again, we see how nice the syntax is to read, rather than looping over foreach… loops.

Summary
So once again we’re reminded that there is translation going on here between you, LINQ, and your DB. I’d say I stretched my comfort zone trying to write that Lambda up - but it was pretty dang fun - especially seeing it work!

The question here is - does it belong in code, or the DB? What do you think? It is a data operation after all… isn’t it? Or is this more business logic? You tell me!

Dork Fights

August 30, 2007

I just got my rear handed to me today. I tried to make fun of Jon Galloway for being a dork, and I got dork-served:

jon_special

Don’t mess with Jon. He has a Rock Star Wife, has hung out with Sonic Youth,  and won the Showcase Showdown on the Price is Right. He can be scarey if pushed…

The C# Makeover

August 29, 2007

This post was supposed to be all about Lambda Expressions and how you can use these to strip some serious lines of code from your application. As I started working with some examples it became clear to me that Scott Guthrie already said pretty much all that needs to be said about that.

What’s more interesting to me is what’s happening to the C# language feature set! On one hand it looks like things are getting a lot cleaner and more fun (Extensions, the new Property declaration syntactic sugar, etc) but I have to tell ya - I’m starting to get confused a bit these new kids - vars and Lambda expressions, and what they’re doing here.

You might be thinking “what’s so confusing about a var or a Lambda - dude that’s CS 101!” and I’d say you’re right. Why are they here in my statically-typed C#?

As far as technical things go - It’s easy to understand how to use vars and Lambdas, it’s harder to understand when or why to use them.

So let’s get this out of the way up front: C# is a typed language; always has been, always will be. Microsoft is introducing some new keywords and abilities to C# with the 3.0 version that will allow it to be a little “looser” and more “dynamic” - but it’s important to also understand that

vars and Lambdas were introduced to facilitate LINQ. You may now continue with your normally scheduled programming…

Given that - I think it’s very important to understand what these things are doing “under the hood” so you can understand when to use them.

Vars Are Not Ducks
There are some who’ve compared vars to Ruby’s (and other dynamic languages) ability to do type-interpretation on the fly (duck-typing), and that’s not the case at all with vars. In fact I’ll go so far as to say that it’s not even close.

Duck Typing (and dynamic-language programming in general) is a different approach and mindset altogether. I’ll get to that later - for now I want to go into how vars work.

Vars need to know what’s coming in order to type themselves. You can’t fool Mother Compiler - you can only hold her off for a bit. In this case a var is “late-typed” (or latent-typed) based on what you assign it to:

var myBadHabit = DateTime.Now;

If you compile this statement then look at the IL, you’ll see:

.locals init ([0] DateTime myBadHabit)

When using LINQ, a var is “late-typed” for you to be the return set (the “select” statement) so that you can use a typed result set for the query return. So in the following LINQ query, “result” is our Anonymous Type that’s typed as … <AnonymousType>:

Northwind.NorthwindContext db = new NorthwindContext();
var result =
      from products in db.Products
      select new {products.ProductID, products.ProductName};

 

If I open up reflector, The IL looks like this (it’s kind of fun to see LINQ’s Expression Tree here and all the code that’s saved by the sugary LINQ syntax):

ParameterExpression CS$0$0000;
NorthwindContext db = new NorthwindContext();
var result =
db.Products.Select(
        Expression.Lambda(Expression.New((ConstructorInfo)
        methodof(<>f__AnonymousType0<int, string>..ctor, <>f__AnonymousType0<int, string>), 

        new Expression[]
        { Expression.Property(CS$0$0000 = Expression.Parameter(typeof(Product), "products"),
        (MethodInfo) methodof(Product.get_ProductID)),
        Expression.Property(CS$0$0000,
        (MethodInfo) methodof(Product.get_ProductName)) }, 

        new MethodInfo[] { (MethodInfo)
        methodof(<>f__AnonymousType0<int, string>.get_ProductID, <>f__AnonymousType0<int, string>),
        (MethodInfo) methodof(<>f__AnonymousType0<int, string>.get_ProductName, <>f__AnonymousType0<int, string&gt ;) }), 

        new ParameterExpression[] { CS$0$0000 }));

 

Of particular note here is <>f__AnonymousType0<int, string> - this is my return type (since I’m saying select new ProductID and ProductName). Moreover, looking around Reflector a bit I can see that this class is declared as part of my project (as well as a note saying this was compiler-generated). I went ahead and took a look at this Anonymous Type and here’s what it decompiles too (note the use of the constructor, as well as the methods hidden from the debugger - wonder why they do that?):

[CompilerGenerated, DebuggerDisplay(@"\{ ProductID = {ProductID}, ProductName = {ProductName} }", Type="<Anonymous Type>")]
internal sealed class <>f__AnonymousType0<<ProductID>j__TPar, <ProductName>j__TPar>
{
    // Fields
    [DebuggerBrowsable(DebuggerBrowsableState.Never)]
    private readonly <ProductID>j__TPar <ProductID>i__Field;
    [DebuggerBrowsable(DebuggerBrowsableState.Never)]
    private readonly <ProductName>j__TPar <ProductName>i__Field;

    // Methods
    [DebuggerHidden]
    public <>f__AnonymousType0(<ProductID>j__TPar ProductID, <ProductName>j__TPar ProductName)
    {
        this.<ProductID>i__Field = ProductID;
        this.<ProductName>i__Field = ProductName;
    }

    [DebuggerHidden]
    public override bool Equals(object value)
    {
        var type = value as <>f__AnonymousType0<<ProductID>j__TPar, <ProductName>j__TPar>;
        return (((type != null) && EqualityComparer<<ProductID>j__TPar>.Default.Equals(this.<ProductID>i__Field, type.<ProductID>i__Field)) && EqualityComparer<<ProductName>j__TPar>.Default.Equals(this.<ProductName>i__Field, type.<ProductName>i__Field));
    }

    [DebuggerHidden]
    public override int GetHashCode()
    {
        int num = 0x6d97624d;
        num = (-1521134295 * num) + EqualityComparer<<ProductID>j__TPar>.Default.GetHashCode(this.<ProductID>i__Field);
        return ((-1521134295 * num) + EqualityComparer<<ProductName>j__TPar>.Default.GetHashCode(this.<ProductName>i__Field));
    }

    [DebuggerHidden]
    public override string ToString()
    {
        StringBuilder builder = new StringBuilder();
        builder.Append("{ ProductID = ");
        builder.Append(this.<ProductID>i__Field);
        builder.Append(", ProductName = ");
        builder.Append(this.<ProductName>i__Field);
        builder.Append(" }");
        return builder.ToString();
    }

    // Properties
    public <ProductID>j__TPar ProductID
    {
        get
        {
            return this.<ProductID>i__Field;
        }
    }

    public <ProductName>j__TPar ProductName
    {
        get
        {
            return this.<ProductName>i__Field;
        }
    }
}

The upshot here is that vars generate some serious code - all for good reason when using LINQ:

But NOT for a good reason if you’re being lazy - which is the point of this whole post.

If you find yourself using “var” anywhere that’s not within a LINQ statement, it’s probably not a good idea.

It’s not duck-typing, and the compiler doesn’t “infer” anything - you tell it exactly what you need and it will make it for you (which is kind of groovy).

I think it’s worth going further here, showing explicitly what duck-typing is and why some people love it. I’ll be using Ruby (surprise) for these examples.

Ruby, Quack For Me
Phil Haack wrote up a nice post about some “duck-typey” feature in C#, and how they can benefit you. While I disagree that these things are anything but “late” or “coerced” typing - I totally agree with the spirit of the article:

The problem for me is [that] this is a lot more code to maintain just to get around the constraints caused by static typing. Is all this additional code worth the headache?

When C# or Java developers talk about dynamic languages, they’re usually dismissed as “silly” or “hippy” languages that slow down a processor and bring a non-threaded web server to it’s knees. This is all changing as servers get bigger, and popular web servers (like Apache and IIS) implement threaded CGI support. In fact IIS 7 is stating pretty clearly that it will run PHP and Ruby up to 5 times faster than other platforms, simply because of it’s native threading/pooling model (using FastCGI).

So don’t be surprised to find yourself coding in Ruby someday (if you’re not already). It takes a while to get used to not having types, but pretty soon you’ll get the hang of the language, and in fact come to realize that:

Duck-typing is all about what an object does, not what it is - The Cuato Rule

To some that might read like a license to be sloppy. To a Ruby-head they see Java/C# as a reason to be verbose and “codey”.

In the spirit of Detente, and getting to know each other, let’s take Phil’s example of a Duck/Rabbit and put it in terms of code:

Suppose we have a method that takes in a duck instance, and another method that takes in a rabbit instance. In a dynamically typed language that supports duck typing, I can pass in my object to the first method as long as my object supports the methods and properties of duck in use by that method. Likewise, I can pass my object into the second method as long as it supports the methods and properties of rabbit called by the second method. Is my object a duck or is it a rabbit? Like the above image, it’s neither and it’s both

A Ruby-head might point out that methods don’t expect types, they expect abilities - which is sort of splitting hairs on semantics. But it’s a valid point since that is the mindset of using a language like Ruby - it’s all about what an object can do - because by God if it can, it will!

To do this I am going to use a “Mixin” in Ruby - which is sort of like an interface in C#, and sort of like an INCLUDE in C++. Essentially a Mixin is a module that allows objects to share functionality. In Ruby, a Mixin is considered an “Adjective” to the classes “Noun” (it defines an ability, or “flavor” of the class) - so in our case I’m going to create a Mixin called “Chatty” that will define a method called “greet” which will be shared among the classes:

module Chatty
    def greet(other_animal)
        puts "It's "+other_animal.class.name+" season! Not "+self.class.name+" season!"
    end
end

This method is simply looking at the class names of the objects involved, then outputting something to the screen. Easy enough - now I need to create the classes to implement the Mixin:

class Duck
    include Chatty
end

class Rabbit
    include Chatty
end

Not coincidentally, you need the “include” keyword in Ruby if you’re going to use a Mixin - now each of these classes implements the “greet” method. You can overwrite this method if you want by simply writing a new one with the same name.

Finally, let’s write out the method, remembering that each animal involved has to be able to “greet” the other:

def converse(animal, other_animal)
    if(animal.respond_to?(:greet) && other_animal.respond_to?(:greet))
        puts animal.greet(other_animal)
    else
        puts "I don't speak "+other_animal.class.name+"-ish"
    end
end

Notice in line 2 that I make a check to see if the animal involved can use “greet”, and if not I output something appropriate. We don’t care what “Type” of animals are involved here - only that they can greet each other. Now if I run this code:

bugs = Rabbit.new
daffy = Duck.new

converse(bugs, daffy)

The response is:

“It’s Duck season! Not Rabbit season!”

This is exactly what Phil was talking about in terms of “I can pass in my object to the first method as long as my object supports the methods and properties of duck in use by that method” however I think it’s very, very important to look at this in terms of function - not types. No methods expect a type - just the abilities of the type.

Compiler Freedom
Not having a compiler look over your shoulder for type errors can be a little scary. But dynamic language guys will tell you that you “just need to test more”. That’s a worth while idea since complete testing is always a good idea.

One of the really neat features of Rails is the way they can interpret methods that don’t exist. To illustrate, suppose I have a database full of Ducks and Rabbits, and I want to run a query on the Ducks:

ducks=Duck.find(:all)

This will run a query which returns every duck. If I wanted to add some conditions, I could do that as well:

ducks=Duck.find(:all, :conditions => [:name => 'quackers', :age =>10])

And that works OK, but Rails also offers this little gem:

ducks=Duck.find_by_name_and_age('quackers', 10)

This last bit of code implements the “Missing#Method” exception catcher since this method doesn’t exist - I made it up! It then parses the statement into SQL syntax! This works nicely since there’s no compiler, and moreover - look at how that reads!

There’s a price for this kind of thing, of course - I’ll “duck” that and leave it up to you and your needs. If you’re not running Twitter, something like Rails can be a good thing for you and your team.

Summary
I really dig that the DLR is going to be folded into .NET and we’ll be getting access to all kinds of nice dynamic languages. I’ll find it very interesting to see how the community reacts to it’s presence. So far .NET has been all about static typing and, let’s face it, some verbose code structures - it’s nice to see this move towards brevity with things like the new property declarations and Lambda expressions.

What do you think of all this?

I’ve been working lately with LINQ and LinqToSql in an effort to see how SubSonic can “play nice” with the emerging framework. In my previous posts I discussed the DataContext in LinqToSql and what it means from an architectural point of view. I also discussed ideas for abstracting the framework a bit.

In this post I want to go a little deeper with LINQ as a language, and see what’s under the hood of the technology, and get to know the players a bit.

Dramatis Personae
There are 3 major elements to understanding a LINQ query, and it pays to take a second to get to the know them VERY well, since understanding them will allow you to work with the language much easier (because LINQ is very much a language in and of itself). These guys are:

  1. The new var keyword
  2. System.Data.Linq.Table<> and
  3. The Lambda Expression

Consider this LINQ query as an example:

            Northwind.NorthwindContext db = new NorthwindContext(myConnectionString);
            var result =
                  from products in db.Products
                  select products;

I already dived into the DataContext - this is the thing that works up a connection to your DB, and then keeps track of all the changes you make during your context “session”. Let’s take a look at the other stuff though…

Fashionably Late Typing With VAR
This is a new construct for C# and you can think of this a “Late-typed”. A lot of people (self included) thought of this as “variant” - sort of a “bucket type” - but that’s not accurate. All that var does is let you defer typing until you know what it is you’re working with. It’s not the same as the Object - it’s meant to be worked with as a typed value. Consider this:

    var myDateVar=DateTime.Now;

    object myDateOb=DateTime.Now;

These are both valid assignments, but if you try to work with myDateOb as an object…

    myDate=myDateOb.AddDays(4);

… you’ll get a compiler error because it’s an object, not a DateTime object.

If you do the same with myDateVar:

    myDate=myDateVar.AddDays(4);

Everything works out fine.

Why do this? Because of LINQ! There’s a gap when working with a query language like LINQ - what do I do with my results? Using something like a Reader is nice - but that doesn’t allow you to use LINQ on non-DB queries (like querying over an array of strings let’s say). DataTables are nice too but they don’t have any kind of type-referencing either.

By using a var, you can return a result set and work directly with it as a newly-typed object. So if we rewrite the query above, we can iterate over the result set using the same var keyword:

var_query

There are all kinds of great applications of late-typing that allow you to work in a “duck-typing” way with a degree of freedom from compiler pain. Duck-typing (working with compiler-inferred types) is more a style of programming than it is a technology. I won’t go into that here, but there should be some recognition that introducing this “loosey goosey” way of programming into a pretty strict language set (C#) can cause some headaches. This, more than anything, is a good reason to step out and learn a new language on the weekend (like Ruby!) that does this kind of thing so you can see for yourself the goods and bads from a learned perspective (which is always important).

System.Data.Linq.Table<>
When working with a LINQ statement, you’ll be running queries against a representative “table” that derives from the type System.Data.Linq.Table<>. The type that fits itself in that Generic Bikini <> is defined by your from clause (and you can have more than one). So in our example query, this would be DataContext.Products. Since we used LinqToSql to query the database, LINQ knows that when we say “from db.Products” we’re talking about a System.Data.Linq.Table<Product>. In fact

the var result set of a LINQ query is a System.Data.Linq.Table<whatever your select statement is>. This is KEY to understanding the way LINQ works.

There are some key methods in this class, and they are:

  • Single() - returns a single record according to the passed-in Lambda expression (more below)
  • Where() - returns a subset of records according to a Where()
  • Count() - returns the count of the rows in the System.Data.Linq.Table<>
  • ToList<>() - sends the results to a typed list
  • ToArray<>() - sends the results to a typed array

What’s even more important with this class is that it’s full of query methods that allow you to do things like

  • Aggregate()
  • Sum()
  • GroupBy()
  • Any()
  • Average()
  • Contains()

and so on. This might seem confusing - why in the world would you want to run a query then Aggregate (or GroupBy etc) on the result set? The answer is that System.Data.Linq.Table<> not only holds the result, it also constructs a LINQ query! So you can use these methods easily with LINQ:

        Northwind.NorthwindContext db = new NorthwindContext();
        decimal? result =db.Products.Sum(p => p.UnitPrice);

Notice here that I didn’t need to use a var (though I could have if I wanted). Since db.Products is a System.Data.Linq.Table<Product> - I can just query it and it goes and gets the data. Many of these methods return IEnumerable or IQueryable objects (which you can think of as System.Data.Linq.Table<> constructs), which you can append onto your query as needed. So if you wanted to use Where() instead of the build in where clause, you could:

        Northwind.NorthwindContext db = new NorthwindContext();
        var result =
              from products in db.Products.Where(p => p.CategoryID==5)
              select products;

I’ll go more into these aggregate and advanced queries down below. First we need to figure out just what this Lambda thing is we’re working with.

Method Shorthand With Lambda
Let’s jump right to it - Lambda expressions are methods in and of themselves, and the whole idea to them is that they don’t need to be typed - they infer their types. A lot of people call them Anonymous Methods which is partially true - they’re actually anonymous method degate shorthand (huh?).

The syntax for a Lambda expression is: parameter(s) => function.

No types are needed - they are inferred from the parameters (Left Side) of the Lambda by their use in the Right Side

To fully get this, let’s write a simple method to return a square of a number, and then translate that to a Lambda:

        decimal GetSquare(decimal number){
            decimal result=number*number;
            return result;
        }

That’s not that much code is it? It could be written nicely in Lambda form however like this:

x => x * x;

In this Lambda we’re saying “take this parameter, x, and shove it through this function x * x”. This statement is key to understanding why Lambdas are a big deal to LINQ:

Lambdas can be parsed into a Left/Right expression (Binary) for use in LINQ Expression Trees

Understanding this concept, you can see just how powerful the LINQ query language can be by embedding methods and expressions withing methods and expressions.

Another note on Lambdas - you might see the term “Func” all over the intellisence prompter when running up queries - this is an “Anonymous Delegate” declaration - basically a definition holder for a Lambda expression. So to fully write out the Lambda expression above in 3.5 code, we’d use:

    Func<decimal, decimal> = x => x*x;

Then to use it, you’d write:

    decimal root=f1(4);

Now we’re getting to the meat of it all. When you’re working with a LINQ query and you specify a “Where” (for example), you’re asked to pass in a “Func<Type,bool>” - this is an Anonymous Delegate that gets defined at runtime by you. In other words, you get to specify to a pretty detailed level what you want to see. So consider this:

    Northwind.NorthwindContext db = new NorthwindContext();
    List<Northwind.Data.Product> result=db.Products    .Where(p => p.ProductID >20 && p.CategoryID ==5 && p.Deleted ==false).ToList<Product>();

In this expression we’re passing in the parameter Product, and then a boolean expression - which is precisely the “Func” signature you see in intellisense: “Func<Product, boolean>”. You can see how this type of “anonymous expression generation” can come in very handy!

This expression can be shorthanded by using the where clause:

    var result =
    from products in db.Products
    where products.ProductID > 20 && products.CategoryID == 5 && products.Deleted == false
    select products;

In fact you can now see that LINQ is essentially two things: a massive expression parser and a set of Extension Methods. The where clause above is simply an extension method for creating the same Lambda statement above it.

Bugs In The Butter
In working up all the examples for this post I decided I was going to do some cool rollups and joins. Turns out that it *seems* to be broken. I’ve communicated what I found to the MS team and they’re looking into it.

To illustrate, let’s take a look at a simple join statement on a rollup query:

    var result =
          from products in db.Products
          join orderItems in db.Order_Details on products.ProductID equals orderItems.ProductID into joined
          select new {products.ProductName, OrderTotal=joined.Sum(t => t.UnitPrice*t.Quantity)};

In this query, I’m joining the Order Details table (shows renamed properly with an underscore :p ) into a simple table called “joined”. Simple enough so far. Next I’m returning a result set with the name of the product and a total for all orders on it. Notice how Sum() takes a Lambda? Remembering from discussion above you can dissect this Lambda by now (I hope) and see that it can be transalated to “Func(Type, decimal)”.

Our left side (the parameter) is the type Product, and the right side is our decimal: UnitPrice * Quantity. The important thing here is the right side: it’s type “decimal”, not nullable type “decimal?”. That should be OK right?

Well it’s not unfortunately. Running up the debugger you can see the SQL that’s created for this query:

SELECT [t0].[ProductName], (
    SELECT SUM([t1].[UnitPrice] * (CONVERT(Decimal(29,4),[t1].[Quantity])))
    FROM [dbo].[Order Details] AS [t1]
    WHERE [t0].[ProductID] = [t1].[ProductID]
    ) AS [value]
FROM [dbo].[Products] AS [t0]

A nested SELECT? Dude where’s my JOIN? Before I get my crank on - note the intelligent CONVERT. Since Quantity is an int, the multiplication with UnitPrice in SQL Server would have resulted in an int result - not what we want at all. LINQ is smart enough to convert this for us so we don’t need to stress on it :).

But it’s this conversion that’s also causing us a problem - but in the code this time. Since LINQ is using a nested SELECT here without a JOIN, it’s effectively doing a LEFT OUTER - returning ALL products and then rolling up stats. Shouldn’t “join” be an INNER by default since that’s what it is in SQL Server? What if I have some products that don’t have orders?

The answer: Runtime Crash.

To see this more clearly, add a record to your Products table and then run the above LINQ query - the error you get is

“The null value cannot be assigned to a member with type System.Decimal which is a non-nullable value type.”

Hmmm - two things are wrong with this. Why isn’t it a nullable type? There could be situations where I’m Summing and I have null data (like right now)!

Also - why is LINQ doing a nested SELECT? It might not be a bug - but it’s something to be aware of.

Sum()ming It All Up
Hopefully understanding the major players with LINQ will help you to see how you can use LINQ more effectively, and also when NOT to use LINQ. In the tests I’ve done, I’ve found that writing up more advanced summing/statistical queries is much easier in SQL though I’m sure that there might be some exceptions.

Also, the error I received reminds me that LINQ, above all, is a translation tool and at some point my mind isn’t going to work like those in Redmond (for better or for worse, til death do us part). Sometimes it’s better to just use SQL (view or SP), which is something I say a lot with SubSonic :).

Would love to hear your thoughts, and if you’re an MS/LINQ person, I’d love to hear your take on the exception above.

Why don’t you rip with me, the maniac psycho
Cuz when I pull out my jammy, get ready cuz it might go
Blauh!

In my last post on LinqToSql, I covered the DataContext and likened it to the Great Gazoo and Phoebe Cates - possibly the worst analogies in any form of tech writing, ever. But that’s what you’ve come to expect so I won’t disappoint. This aint no Peanut Butter Jelly Time or Hamster Dance ya know…

In this post I’m going to take it a step further and show you how you can polish the chrome on LinqToSql so that you can cleanly separate your What from your How. I’m also going to go a little bit into how SubSonic fits into the whole “LINQ thang”.

Don’t call it a comeback
For the last week I’ve been beating on LinqToSql because, quite frankly, I want to know just where SubSonic fits in. If the ORM tool that ships with .NET 3.x does what we do, there’s no reason to duplicate. SubSonic’s all about improving YOUR experience, and we can just as easily wrap our loving arms around L2S just as easily as any other data access system.

The challenge here isn’t the technology - it’s how it’s supposed to be used. In other words - I have a problem with LINQ and how easy it is to stick data access code where it shouldn’t go. It’s twice as tempting with LinqToSql. When first created, your model practically BEGS you to let it run loose in your code behind, batting it’s eyes innocently at you as it sticks the fake ID in its back pocket, ready to crawl out of your application come midnight.

We’ve tried to make SubSonic as malleable as we can so that you, as the architect, can use its goodness wherever you please. LinqToSql sort of does this, but it’s not straightforward from the get-go.

I been here for years, Rockin’ my peers
ORM’s been around a long time, and it bears mentioning right up front that

This is ORM, the Microsoft Way

I’ll explain more as I go along - but keep that thought in your mind. The best explanations involve lots of code, so let’s jump right in.

The first thing you want to do is create a separate project. This will make sure that the protection patters we use below can be enforced. So right-click, add a class library project, remove Class1.cs (why do they put that in there?), and then right-click your project, selecting “Add Linq to SQL classes”. It’s a good idea to name it something relevant - like NorthwindSchema or something.

When finished, you get a nice Class Diagram that lets you get your ORM on:

classdiag

The mapper let’s you do all kinds of things - including set inheritance, relationships, naming, and so on.

When you first “map stuff out”, you’re ready right away to do some coding. By default your DataContext class is the same name as the L2S file you created - I named mine NorthwindContext.

To grab some data, all you need to do is create a context to work in, and then ask for the data:

NorthwindContext db=new NorthwindContext();
Gridview1.DataSource=db.Products;
Gridview1.DataBind();

Man, that’s easy! It’s actually too easy - and will tempt you into some bad habits; namely writing the code above.

Puttin’ suckers in fear
Every project is “small, quick and simple” at some point - even for the prototype. That’s not a license to be sloppy, and often the compromises made to run out a quick prototype never get refactored out like they should (guilty as charged). While I completely agree with the Premature Optimization theory, it’s still important to make sure that your skeleton approach is sound so you don’t build crap on a pile of crap, wrapped crappily in crap coating.

Things to pay attention to up front:

  • Abstract the Database and how you call your DB routines
  • Put your business logic in a central place
  • Keep the DB away from your UI

I’ll keep the focus of this post on the Database and suggest that if you want to keep things on the simple side, go with something like a data-broker pattern, where all of your calls for data involve a static class returning data object (aka “models” or “entities” in the ORM world). I’ll be using this model for the rest of the post.

Makin’ the tears rain down like a monsoon, Listen to the bass go boom
Now that we have our model, we need to crank things down a bit. So the first thing I’m going to do is to change the namespaces of the generated code, and also reset the inheritance model. To do this, right-click in the diagram space and select “Properties” to bring up the Property Sheet, then get to settin’:

props

Things to notice here:

  • I renamed the Context to something meaningful and I like the pattern [DBName]Context
  • I set the Context namespace to “Northwind” and the entities to “Northwind.Data” - this will help a lot when coding and keep stuff “tidy”.
  • I locked down the Inheritance to “sealed” so no data objects can be extended outside the project
  • I locked down access to the Context to internal so that it can’t be used outside the project

The next thing I did was to create my broker class in the same project. This is a static class that hands out data as needed by the application. I called mine “NorthwindService”.

From an external application (a test application for instance), your DB interface class now looks like this:
visibility

Nice and tight! There’s no hint of LinqToSql anywhere - which is how it should be. The Data{} namespace simply exposes the entities, but no DB logic. Just how we want it.

Explosion, overpowerin’; Over the competition, I’m towerin’
LinqToSql exposes groups of entities using generics. So if you want to get a group of records to iterate over programmatically (in other words you’re not DataBinding) you’ll need to do a ToList<Type>():

NorthwindContext db=new NorthwindContext();
List<Product> prods=db.Products.Where(p => p.CategoryID==5).ToList<Product>();

This is OK most of the time, but it’s a bit of a pain to work with Lists directly - especially if you want to extend them. An example of extending List<> might be for sorting, or overriding the index method. You also might want to add some business logic to a List<> - say OrderItems.AdjustQuantity() for instance.

Before I work with the model, I’m going to abstract the List<> by entity - in other words I’m going to create Collection classes for the classes that need it (I probably don’t need collection classes for all my tables). For this post that means the product table. Being nice and tidy, I’ll create a folder called “Collections” and in it, I’ll add a new class called “ProductCollection”:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Northwind.Data;

namespace Northwind.Data {
    public class ProductCollection:List<Product> {

    }
}

Doing it this way locks things down nice and tight, and we’ve effectively erased LINQ’s footsteps from our application :).

Don’t ever compare me to the rest; They’ll all get sliced and diced
In my last LinqToSql post I mentioned there was some weirdness with the way Update() works (Rick goes into it a lot here), and how the DataContext was attached to the model at the hip. This is important so I’ll go into that now.

Up above, in this post, I mentioned this is ORM done Microsoft’s way. That’s a very important thing to consider when working with the DataContext and updating data.

Built into LinqToSql is automatic change tracking, with Optimistic Concurrency checks turned on. What this means is that if you pull a record from the DB, work on it, and someone else comes and updates the same record in the DB, your changes are likely to be invalid (or else piss someone else off extremely).

This all works nicely in the DataContext, but I don’t want that thing lurking all over my application - I want to do things in one place. This is where ORM has a shortfall - when you try to go outside it’s rules it doesn’t like you anymore. LinqToSql let’s you work outside it’s context, but if you want to keep playing with it you need to be a good developer and consider concurrency issues in your model. There are two ways to do this:

  • enable version-tracking with a timestamp field in your tables (usually a very good idea) or
  • tell LinqToSql to take a hike and leave you alone

In some ways I don’t like it when technology dictates design changes, but at the same time a timestamp is a good idea for all tables with important data and in this case I have to say I’m grateful that I’ve been made to think about. So I’m going to add a “version” field to the Products table (type timestamp) to play ball.

But I’m also going to tell LinqToSql to take a hike, and disable change tracking on the Employees table since they are the only ones (in my business case) who can update their data. I don’t think it’s necessary to keep adding timestamp fields to all my tables, so I get my crank going and have at the designer:

  • Open up the dbml file and find the Employees table
  • Shift-click all the columns (it has to be ALL of them - not just some of them. Not sure why this is a column setting because it doesn’t work without setting for all columns) and then set “Update Check” to “Never”:

updatecheck

This is a nice feature since it makes you think through these scenarios.

You can’t gain or maintain; Unless you say my name
Now let’s add some code and run some tests to make sure it all works as we want to it to. To the NorthwindService class I’ll add these methods to expose data to my application - you can use whatever class structure you want for this, for now I just put it into one class:

    public static class NorthwindService {

        //static single-point context builder
        static Northwind.NorthwindContext db {
            get
            {
                return new NorthwindContext();
            }
         }

        #region Products
        public static Product ProductGet(int id){
            return db.Products.Single(p => p.ProductID==id);
        }
        public static ProductCollection ProductsByCategory(int categoryID){
            List<Product> prods=db.Products.Where(p => p.CategoryID==categoryID).ToList<Product>();
            return prods as ProductCollection;
        }
        public static int ProductSave(Product p){
            NorthwindContext context=db;
            bool isNew=p.Version==null;

            if(isNew){
                context.Products.Add(p);
            }else{
                context.Products.Attach(p, true);
            }
            context.SubmitChanges();
            return p.ProductID;
        }
        #endregion

        #region Employees
        public static Employee EmployeeGet(int id) {
            return db.Employees.Single(e => e.EmployeeID == id);
        }
        public static int EmployeeSave(Employee e) {
            NorthwindContext context = db;
            bool isNew = e.EmployeeID == 0;

            if (isNew) {
                context.Employees.Add(e);
            } else {
                context.Employees.Attach(e, true);
            }
            context.SubmitChanges();
            return e.EmployeeID;
        }
        #endregion

    }

Notice the ProductSave() method - by using a timestamp field called “version”, I know whether it’s been newly created. I could test the primary key value as well, but this type of thing doesn’t always hold. The Customers table, for instance, uses a char(5) value for customerID.

Since I have that timestamp field for Products, I can run the attach() method and Linq wont’ complain. Also, since I told Linq to cheese off with regards to checking the Employees table, EmployeeSave also works, but I still have an issue. The check of the Primary Key is not really a good way to gaurantee that the passed in class is new. Currently this type of state checking isn’t available in LinqToSql (that I’ve seen).

To get around this I can create a set of partial classes for all of my objects and hook into their events to set some properties - but that’s Fudge (note the capital F). The other thing I can do is to set something that LinqToSql should have set for me in the first place - the Read Only property for the primary key (it should be set to true - it’s not by default):

idcheck

Much better - now we can reliably check EmployeeID (if it’s 0, it’s new) - there’s no set accessor any more so people can’t set this value.

I’m gonna knock you out; Mama said knock you out
You can plainly see where Microsoft doesn’t want to lay tread on your turf, and they’ve always been that way (for better or worse). SubSonic, on the other hand, is all about convention and I can see how we can wrap LinqToSql nicely, generating things (and resetting settings) to wrap your schema.

We could also generate up a set of partial classes for each Linq entity and throw in some state checks that are a little more explicit (like IsNew), and also do an audit of the schema to make sure your plan will play ball with Linq.

You can see that this means moving from helper to more of an architectural standard, which in some ways is OK with me. I’d love to hear your thoughts on this!

Technorati Tags: , , ,

…are the things you say when screaming at your monitor when young ears are in the next room. Yes, it’s silly. What’s even more silly is the smelly mess on the floor that I call Apache 2.2.

I do recognize that this here free thing “powers half the internet”. I also realize that George Bush is the president. Some things you are powerless over even if the constitution says otherwise.

offtopic alert offtopic alert offtopic alert offtopic alert offtopic alert offtopic alert offtopic alert

Zack Owens in cracking his knuckles… ready to fire off a flame at Rob’s off-topic Bush/Apache reference… but Rob ducks! Swooshes and sliiiiiides over the to corner, putting on his best smile: “I’m only here to talk about Apache maaannnnn. Just chiiiiilllll…”

LinqUx and Other Bad Ideas
So today I had the bright idea of doing something clever with LINQ, Mono, Apache and Rails (eLMAR if you will) and so the first step was to install Ubuntu on Virtual Server 2005. That was ridiculously simple believe it or not - I had expected to find myself in a big fat quagmire with lots of casualties and no ability to answer for the loss of lives along the way. I’ll let Dick Cheney explain more about what I mean ….

Zack - dude I didn’t mean that. I’ll stop here - I promise. You have to admit that video is a little on the money though...

So my virtual server is all kicked up and happy, and I’m tootling away crampily on the command line trying to remember HTF to use emacs to edit that damned httpd.config file when it hits me:

I’m out of coffee, I’m hungry, tired and thirsty and I have no idea WHY I’m trying to figure out what this thing is saying to me, and why I’m being tortured like this… emacs should be named Guantan-

(stop… Rob…. don’t do it… please no more political references…). I apologize. I am simply a vessel for my muse, and slavishly bark at it’s siren’s call…

Anyway… being unable to give up, I finally remember my C- commands and off I go (with some help from Google) and edit up my config file. Within an hour or two I have Rails (almost) running nicely on a Virtual Server using Apache/Ubuntu, which itself is a Guest OS on my box at MaximumASP (thanks Phillip for the tips btw).

Server Bunnies
Lovely feeling this - I have a whole new (virtual) box, and it’s Ubuntu Server running on Win2003 Server! Yes Yes Yes I’ve made lots of VMs here on my dev box, but for some reason the thought never occurred to me to rev a few up on a co-lo. My server is a fertile thing! If I copy the VM, I can rev up a second instance if I like - slick!

The sad part is that I couldn’t get anything to work because Apache just wouldn’t play ball. It’s been a long time (and happy at that) since I’ve had to work the command line and edit stuff fingers-only, and I suck at it. So as I’m typing away streams of “monkey words” (as my daughter calls them - not swear words mind you, “proxy” swear words) come flying out of my mouth, violating my prime directive (don’t sound like an Ape).

This leads me to conclude that, as much as people rip MS apart for some IIS’s downfalls - the thing is really easy to use and is pretty damn stable. I wish I could get it to hand off proxy requests without IsapiRewrite … but oh well.

Moral of the story: give IIS a hug tomorrow.

So I’m really close - I just can’t get the proxy bits to route to Mongrel properly. If anyone out there has a nice *NIX image they want to share with MySQL/Rails etc. configured, lemme know! I have a fun post in the queue here and I’d love for it to stop smellin up my office floor…

Lots to read lately on LINQ and Linq2Sql, and for the last couple of days I’ve been tits-deep in code, trying to figure out neat and fun ways to clamp LINQ to my Database’s privates while not causing offense…

Let’s start this post again. WTF Rob…

So, over the last week I’ve been duct-taping LINQ onto Northwind and my little tester application. I know that might not sound so pretty but it’s the best way I can put it since I’m not trying to code with it - I’m trying to understand it and break it (not hatin… i promise).

The presence of LINQ, while all around very positive, makes me question my commitment to my database and my application. To play with LINQ I need to change my rules, need to work from a “different perspective” as it were. It’s not a bad thing - Rails made me do this and good things have happened because of it (I think - well it was good for me anyway… isn’t that what’s important? :p).

I’ve always been your typical, faithful geek that doesn’t stray too far from the barn - but then LINQ comes along, wearing all that leather and smelling enticingly of chocolate and burnt…

Crap. Let’s just get to the code… enough analogies already so I’ll just get to the point:

LINQ Confuses Me

Sure It’s Cute- But Can It Dance?
The first thing I did, cause as you know I’m sorta performance-minded, was to make sure my DB could party with LINQ and hold it’s own. The last thing I need is my DB crawling into work with bags under it’s eyes, tired and sore from the night before. So I kicked up a console app and decided to crank out some perf tests. The tests were simple and involved me rolling over the Products table 1000 times (using LinqToSql), and loading it to a List<>:

DateTime dEnd = DateTime.Now;
DateTime dStart = DateTime.Now;

List<Product> prods = null;
for (int i = 0; i < 1000; i++) {
    Northwind.NorthwindDB db = new NorthwindDB(GetConnString());

    Table<Product> tbl = db.Products;
    prods = tbl.ToList<Product>();
}
dEnd = DateTime.Now;
TimeSpan ts = SubSonic.Sugar.Dates.Diff(dEnd, dStart);
int ms = ts.Milliseconds;
WriteOut("result: " + ms.ToString() + " ms for " + prods.Count + " records");

For fun, I then loaded up the same thing in SubSonic:

for (int i = 0; i < 1000; i++) {
    new ProductCollection.Load();
}

To my surprise LINQ was within 50ms of SubSonic - which is pretty groovy! Looping 1000 times using SubSonic came in at~600ms (an average), and LINQ came in between 600 and 700 ms - as far as I’m concerned that’s a tie.

I was talking to ScottGu the other day and he mentioned that you can improve even more on performance by using “CompiledQuery”, so I gave it a shot and added it to the test. The trick here is you declare the query in advance, then use it to pull back the result set:

var resultComp = CompiledQuery.Compile((NorthwindDB ndb) => from p in ndb.Products
                                                            select p);
for (int i = 0; i < 1000; i++) {
    Northwind.NorthwindDB db = new NorthwindDB(GetConnString());
    prods = resultComp.Invoke(db).ToList<Product>();
}

There was a small improvement but it was negligible in this case - mainly because the query I’m doing is pretty basic. If it was more complicated I’m sure this would help.

Also - I can improve even more on the Linq2Sql query by creating the instance of the DB context outside of the loop - but that’s not quite representative of load really. Or is it?

This is where the confusion begins. What do I do with this context thinger anyway?

I’m Sorry, I Can’t Hear You, There’s a DataContext In My Ear
So I shoot an email off to ScottGu and ask him if he has a minute to let me in on the DataContext secret. I’ll summarize here and say that the DataContext is the Secret Sauce behind LinqToSql - sort of like Ranch Dressing for your fries. Sure fries taste good, but there’s something special about a slopping them around in a bunch of Ranch Dressing isn’t there. Come on… I know it was the 80’s… but you still sneak some Country Fresh Ranch from the market… I know Atwood does. It’s why he’s got the Twitter Jitters.

Anyway - he explains to me that the idea behind the context is to have an persistence medium (my words) between the app and the DB.

For instance, let’s say you create an SP that returns all products for a given category (ProductsByCategory) - the same thing Scott showed on his blog today. One of the cool things you can do with Linq2Sql is change the return type so that you return a Product for this SP - super slick.

So let’s call this SP and bring back all products for CategoryID 1:

NorthwindDB db = new NorthwindDB();
//get products from SP
List<Product> product1 = db.ProductsByCategory(1).ToList<Product>();

For fun, let’s bring back another result set for all products:

//get the products from L2S
List<Product> product2 = db.Products.ToList<Product>();

So now we have two List<Product> collections to play with. Here’s where the “persistence” bits come in - if I change one of the objects in product1, product2 will also be changed under the covers:

//grab the first product from L2S
Product p1 = product1.Single<Product>(prod => prod.ProductID==1);
Product p2 = product2.Single<Product>(prod2 => prod2.ProductID == 1);
//reset the name
p1.ProductName = "Keoki Gold Lager";
Assert.IsTrue(p1.ProductName == p2.ProductName);

… And this test passes nicely. Pretty slick stuff!TheGreatGazoo

There’s a lot more this way, and I know Scott has a post coming on it in the next week or so, so I’ll let him go deeper into it. Just know there’s a lot going on with the DataContext in the background.

So in running up this code, I couldn’t help but feel like the DataContext was sort of … insinuating itself into my application’s architecture. Sort of like the Great Gazoo used to follow Fred and Barney around. Fred was always confused by the whole Gazoo thing wasn’t he? I mean - was he a friend? Or was he annoying? 

Well I’m not afraid to admit that I like Ranch Dressing, and I don’t mind flying green martian guys either for that matter - so let’s see if we can buddy up and party like Rock Stars.

Molly Ringwald and Co-Dependent Relationships
I think I was the only kid in Junior High that didn’t have a crush on Molly Ringwald - even after that whole “lipstick scene” in the Breakfast Club. Cute, but sort of clunky and sad. Now Phoebe Cates. THAT’s a scene I don’t need to explain to anyone. In fact I don’t even need to use her name - it’s just “The Scene” as far as most guys go.

So here we are, with our new friend the DataContext, wondering if we should try and hold hands or just be friends. Initially I thought of it as a sort of Molly Ringwald - “cute… but…”. So I created a Singleton out of it and decided that my entire application would work from one context:

NorthwindDB _context;
public NorthwindDB DBContext {
    get {
        if (_context == null) {
            lock (this) {
                _context = new NorthwindDB();
            }
        }
        return _context;
    }
}

The problem here is that the DataContext keeps track of what’s going on with all of your objects, and that leads to some serious overhead. Also, if you’re working from a static class that serves up business logic (i.e. a Service class or BLL) then you’re going to really screw things up since the DataContext stores and tracks state per table (as shown above) - so if a user changes some data, that change will ripple out to your entire application. Usually not good.

OK so now we know that the DataContext needs to be instanced up and used with certain rules. It’s not as easy as Molly - it’s more high maintenance -  like Phoebe.

For example - let’s create some CRUD methods for a Service class (Rick Strahl has a great post about this, in a lot more detail): inserting and updating a Product:

//create a new product
Product p = new Product();
p.ProductName = "Phils Heinous Chai Tea";
p.CategoryID = 1;
p.UnitPrice = 100;
p.UnitsInStock = 0;
p.SupplierID = 2;
p.CreatedOn = DateTime.Now;
p.ModifiedOn = DateTime.Now;

Service.InsertProduct(p);
public static void InsertProduct(Product p) {
    //instance up the Context
    NorthwindDB db = new NorthwindDB();
    //attach the product to the context
    db.Products.Add(p);
    //save it down
    db.SubmitChanges();

}

This works exactly as you would expect - so no problems here. But what about updating? This is a little muddier:

//get product
Product p = db.Products.Single<Product>(prod => prod.ProductName == "Phils Heinous Chai Tea");
//Phil says people don't know what heinous means
p.ProductName = "Phil's Horrible Chai";
UpdateProduct(p);

To get the update to happen, you need to create a context and use Attach() - but this isn’t working at the moment (Rick has more on his blog). I’ve been told this might be a bug - I’d like to assume that it is. Here’s how the code should look:

public static void UpdateProduct(Product p) {
   //load the context
    NorthwindDB db = new NorthwindDB();
    //this is a bug - I think. Or is it? Phoebe?
    db.Products.Attach(p,true);
    db.SubmitChanges();
}

You can check to see if Phoebe’s been listening to you at all if you’d like, by taking a look at the the ChangeSet - inserting the following lines of code right before db.SubmitChanges():

//so did the change happen? Is our product queued?
ChangeSet changes = db.GetChangeSet();
int mods = changes.ModifiedEntities.Count;
int adds = changes.AddedEntities.Count;

It turns out that, in fact, that this doesn’t work at this time. I think it will get fixed - we shall see.

LINQ Drops Its Top…
So up to this point we’ve been looking at the DataContext and how to leverage it into an application. It’s mighty pretty stuff as it saunters casually through the sprinklers towards us, but that doesn’t mean that it won’t be screaming at us for leaving our empty beer bottles in the shower just six months from now.

Normally a degree of separation is wanted when you talk to your DB - this is true for most “client-server” apps out there today, which is about 90% i’d say. In other words, the app asks for data, get’s it, and says “buh bye” until it needs to hand some data back. There can be issues with this (concurrency violations for one) and people have spent a lot of time working on solutions to locking and concurrency issues.

The interesting thing is that once you get to know LinqToSql, you see that it’s an ORM model that persists a sort of “state” of your domain - pushing the changes to the DB only when you tell it to. In the past we’ve been told to make more calls to the DB to keep things running happily - this is something different, even for ORM and using it might take some getting used to - and also call for doing some things differently. 

For instance - the DataContext can exist in the same way, perhaps, as a User Session does in a web application. In fact you might even be able to store it in the user’s Profile (I haven’t tried this but… well you never know!).

…And We Find Gazoo Dressed In Phoebe’s Clothes
So the DataContext seems to be this bucket - or perhaps a floating green martian that helps us to work with our DB. We’re not really sure where it lives, nor what it needs to eat (Ranch Dressing?) - but it has its rules and if we can play by them, it could be a good thing:

You whistle a tune and here comes the DataContext, our Great Gazoo:

“What can I do for you sir?”
“You can get all the products from the db, update the view stats, and then save down this log for me. But wait a minute cause this user might want to one of the products to their basket”
“OK - Just let me know. I’ll hang out here in the Session I spose. “
“Sounds good - we should find a better home for you but for now that’ll work.”
“OK the user wants to add these products. Can you get the sales tax info, shipping, and then check out - can you handle this for me?”
“Sure, I’m transactional you know.”
“Perfect - submit your changes now and then we’ll see about that new home…”

(I seriously resisted using the Tron/Bit thing here I’ll have you know…)

The thing here is that you’re now making architectural decisions based on the concept of “state”, where the web is a stateless sort of thing. But that’s so 2002 - haven’t things sort of changed now with Ajax and all that jazzy snazzy kid stuff? In a way it’s almost refreshing to abstract the database from this “thing that holds data” to more of a gateway. This is a new and different sort of ORM and I think there are some creative ways to work with it.

Ranch Dressing for The New Millennium
So here we are - in the age of client server, slowly retracing our steps back into a “Thick Client” world. Consider that AJAX, Silverlight, and Flash have ushered in the “Thickness Renaissance”. Before Google brought Ajax out of the bathroom closet, Microsoft had used the XMLHTTP thinger to do some fancy stuff “waybackwhen” and was basically lambasted for it only to have Google club it over the head with it’s own technology - done right.

Is the DataContext doing the same? Why yes - it is. Is it bad? I don’t know yet - I haven’t broken it enough, to know. I do know that it’s high-maintenance, powerful stuff and it’s demanding that I change some old habits.

It’s been a long time since I put Ranch Dressing on anything - but you never know, it might just taste good again.

It’s late in the day, my IPA that’s been fermenting now for 10 days has a wonderful sour/bandaid taste (the yeast was crap evidently), and a hurricane is making it’s way slowly towards us. Not really conducive to work so I thought I’d muck around with Rails a bit and make y’all a screencast that “goes beyond” the sticky sweet stuff and shows you how easy it is to extend and build out your application.

Specifically - Rails comes with a console application that will wrap your website and allow you to work with your models and controllers directly through a console. Very groovy. It also has a very cool plugin structure that allows you to go get code that other people have written and install it in your application right quick - I’ll show you how to do that too.

All good stuff and a fun way to kill 20 minutes :). Hope you enjoy it and let me know if you have any questions!

You can watch it here…

If you want to see the other Rails webcasts I’ve done, you can watch them here:

All about Rails

Rails Testing

Why Am I Doing This?
I find Rails inspirational. It’s a treat to play with and to work in and I think everyone should learn a new language or platform every year as part of “rounding out” your experience!

I get eye-strain headaches sometimes, and it really sucks. A lot of this has to do with the black-on-white default settings that VS uses. The good news is that you can changes this :).

A lot has been written about IDEs and their colors, and none (to me) is more impressive than TextMate’s Vibrant Ink:

I really like the Monaco font, and the black background cuts down on the overall glare. Jon Lam recently created the Vibrant Ink theme and released it on his blog, but he also included his full settings which for some (me included) were a little much. I tweaked his theme a bit by adding in the Monaco font (pictured above) and raising the pure black a few shades so the contrast isn’t as drastic:

rob_vibrant_ink

I’ve attached below the theme I use, plus two others I really like - MidTones (you can see more here) and Jeff Atwood’s super groovy settings (codenamed Pruno).

In addition I’ve included the Monaco font (pictured above) - to install just drag it into your System/Fonts folder.

Download the theme here.

We’ve had an avalanche of questions regarding many to many relationships on our forums, and so I thought it would be a great subject for the next in our series of webcasts!

You can view it here.

This webcast is about 11 minutes long and shows you the generated methods we create for you, the schema we expect, and also how to use the Scaffold and ManyManyList to work with the data.