Linq to SqlXML

I’m hacking along on my Document DB emulator ontop of Sql Server XML columns.

I have some decent Linq support in place now.
The following query:

var query = from order in orders
            //must have status shipped
            where order.Status >= OrderStatus.Shipped      
            //must contain foo or bar products
            where order.OrderDetails.Any(d => d.ProductNo == "Foo" || d.ProductNo == "Bar")
            //must have an order total > 100
            where order.OrderDetails.Sum(d => d.ItemPrice * d.Quantity) > 100 
            select order;

will yield the following Sql + XQuery to the Sql Server:

select *
from documents
where CollectionName = 'Order'  and 
--must have an order total > 100
          for $A in OrderDetails[1]/object/state 
                return ($A/ItemPrice[1] * $A/Quantity[1])) > xs:decimal(100))]') = 1) and 
--must contain foo or bar products
(documentdata.exist('/object/state[OrderDetails[1]/object/state[((ProductNo[1] = "Foo") or 
 (ProductNo[1] = "Bar"))]]') = 1) and 
--must have status shipped
(documentdata.exist('/object/state[(Status[1] >= xs:int(2))]') = 1)

Building a Document DB ontop of Sql Server

I’ve started to build a Document DB emulator ontop of Sql Server XML columns.
Sql Server XML columns can store schema free xml documents, pretty much like RavenDB or MongoDB stores schema free Json/Bson documents.

XML Columns can be indexed and queried using XPath queries.

So I decided to build an abstraction layer ontop of this in order to achieve similair ease of use.
I’ve built a serializer/deserializer that deals with my own XML structure for documents (state + metadata) and also an early Linq provider for querying.

Executing the following code:

var ctx = new DocumentContext("main");
var customers = ctx.GetCollection<Customer>().AsQueryable();

var query = from customer in customers
            where customer.Address.City == "abc" && customer.Name == "Acme Inc5"
            orderby customer.Name
            select customer;

var result = query.ToList();
foreach (var item in result)

Will yield the following SQL + XPath query:

select *

from documents

where CollectionName = 'Customer' and
   ((documentdata.exist('/object/state/Address/object/state/City/text()[. = "abc"]') = 1) and
    (documentdata.exist('/object/state/Name/text()[. = "Acme Inc5"]') = 1))

order by documentdata.value('((/object/state/Name)[1])','nvarchar(MAX)')

The result of the query will be returned to the client and then deserialized into the correct .NET type.

.NET disassembled Part 2: for loops

.NET Disassembled
Part 1: integers in .NET
Part 2: This post

This is my second post on .NET performance, in this series of posts I will try to show how .NET code performs compared to pure C++ code.
Today I will show how a C# for loop compares to a Win32 C++ for loop.

First we need some sample code to compare, so here it is:

The C# code:

The Win32 C++ code:

I hope we can agree that the above code is fairly similair, we loop “i” from 0 to 999 and call the function “Foo” while doing so.

I will begin with the disassembly for the C++ code, just so we can see how optimized it is:

What are we seeing here?
The “i” variable have been replaced with a native register (esi) which is set to 1000.
The Foo function have been inlined in the for loop.
The loop guard have been optimized to “esi — ; if esi >0 goto loopbody” since this is more efficient in native code (dec,jne)
So the C++ code is clearly very optimized.

So how does the C# counterpart hold up against the optimized C++ code?
[Edit] big thanks to Omer for pointing out how to disassemble the optimized .NET code.

Just like the C++ version, the variable “i” have been replaced with a native register, “esi”
The Foo function call have been inlined (address 6-18), do note that calling console.writeline is not the same as calling cout in win32, so the code will differ.
So the optimized .NET code is pretty much equal to the win32 C++ version.

Hope this can crush some of the myths floating around…


.NET disassembled Part 1: Integers in .NET

Today I attended one of Juval Löwy’s sessions on our inhouse conference.

He argued that every object in .NET is a COM object behind the scenes, even Int32.
Fair enough, Int32 is a type and it does have a Guid.
The argument was an attempt to show that .NET is slow and that everything is pushed around as a COM object internally, even the integers…
(This was an up ramp for his “every object should be a WCF service”, to show that performance doesn’t matter, much..)

This is however completely wrong..
.NET have specific IL op codes in order to deal with primitives, and those opcodes will translate to pure machine code.

Here is a C# code snippet:

Here is the same code disassembled into x86

The value hex 7B (dec 123) is moved directly into the memory slot for the local variable.
No COM objects involved, no nothing, just normal machinecode, as long as you treat it as an int that is..
You can not make this faster or better even if you resort to pure ASM.