The Daily Parker

Politics, Weather, Photography, and the Dog

W is for while (and other iterators)

We're in the home stretch. It's day 23 of the Blogging A-to-Z challenge and it's time to loop-the-loop.

C# has a number of ways to iterate over a collection of things, and a base interface that lets you know you can use an iterator.

The simplest way to loop is to use while, which keeps looping as long as its condition is true:

var n = 1;
while (n < 6)
{
	Console.WriteLine($"n = {n}");
	n++;
}
Console.WriteLine("Done");

while is similar to do:

var n = 1;
do
{
	Console.WriteLine($"n = {n}");
	n++;
} while (n < 6);
Console.WriteLine("Done");

The main difference is that the do loop tests its condition after the body, so it always executes at least once; the while loop tests first and may not execute at all.
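The difference only shows up when the condition is false on the first pass. Here's a minimal sketch; the LoopDemo class and its method names are mine, invented for illustration:

```csharp
using System;

public static class LoopDemo
{
	// Counts how many times a while body runs, starting n at the given value.
	public static int CountWhile(int start)
	{
		var n = start;
		var runs = 0;
		while (n < 6)
		{
			runs++;
			n++;
		}
		return runs;
	}

	// The same loop written with do: the body runs before the first test.
	public static int CountDo(int start)
	{
		var n = start;
		var runs = 0;
		do
		{
			runs++;
			n++;
		} while (n < 6);
		return runs;
	}

	public static void Main()
	{
		Console.WriteLine(CountWhile(10)); // 0
		Console.WriteLine(CountDo(10));    // 1
	}
}
```

Start both at 1 and they each run five times; start them at 10 and only the do loop runs at all.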

The next level up is the for loop:

for (var n = 1; n < 6; n++)
{
	Console.WriteLine($"n = {n}");
}
Console.WriteLine("Done");

Similar, no?

Then there is foreach, which iterates over a set of things. This requires a bit more explanation.

The base interface IEnumerable and its generic equivalent IEnumerable<T> each expose a single method, GetEnumerator (returning an IEnumerator or an IEnumerator<T>, respectively), that foreach uses to go through all of the items in the collection. Generally, anything in the BCL that holds a set of objects implements IEnumerable: System.Array, System.Collections.ICollection, System.Collections.Generic.List<T>...and many, many others. Each of these types lets you walk through the set of objects the thing contains:

var things = new[] { 1, 2, 3, 4, 5 }; // array of int, or int[]
foreach(var it in things)
{
	Console.WriteLine(it);
}
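Under the hood, foreach is roughly shorthand for asking the collection for an enumerator and stepping through it. This is a simplified sketch of that expansion, not the exact code the compiler emits; EnumeratorDemo and SumByHand are names I made up:

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
	// Walks any IEnumerable<int> by hand, the way foreach does for you.
	public static int SumByHand(IEnumerable<int> things)
	{
		var total = 0;
		using (IEnumerator<int> e = things.GetEnumerator())
		{
			// MoveNext returns false once the items run out
			while (e.MoveNext())
			{
				total += e.Current;
			}
		}
		return total;
	}

	public static void Main()
	{
		var things = new[] { 1, 2, 3, 4, 5 };
		Console.WriteLine(SumByHand(things)); // 15
	}
}
```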

foreach will iterate over all the things in the order they were added to the array. But it also works with LINQ to give you even more power:

var things = new List<int> {1, 2, 3, 4, 5};
foreach (var it in things.Where(p => p % 2 == 0))
{
	Console.WriteLine(it);
}

Three guesses what that snippet does.

These keywords and structures are so fundamental to C#, I recommend reading up on them.

V is for var

For my second attempt at this post (after a BSOD), here (on time yet!) is day 22 of the Blogging A-to-Z challenge.

Today's topic: the var keyword, which has sparked more religious wars since it emerged in 2007 than almost every other language improvement in the C# universe.

Before C# 3.0, the language required you to declare every variable explicitly, like so:

using System;
using InnerDrive.Framework.Financial;

Int32 x = 123; // same as int x = 123;
Money m = 123;

Starting with C# 3.0, you could do this instead:

var i = 123;
var m = new Money(123);

As long as you give the compiler enough information to infer the variable type, it will let you stop caring about the type. (The reason the Money declaration works in the first example is that the Money struct converts implicitly from numeric types, so the compiler knows what you want from the assignment. In the second example, you still have to declare a new Money, but the compiler can take it from there.)

Some people really can't stand not knowing what types their variables are. Others can't figure it out and make basic errors. Both groups of people need to relax and think it through.

Variables should convey meaning, not technology. I really don't care whether m is an integer, a decimal, or a Money, as long as I can use it to make the calculations I need. Where var gets people into trouble is when they forget that the compiler can't infer type from the contents of your skull, only the code you write. Which is why this is one of my favorite interview problems:

var x = 1;
var y = 3;
var z = x / y;

// What is the value of z?

The compiler infers that x and y are integers, so when it divides them it comes up with...zero. Because 1/3 is less than 1, and .NET truncates fractions when doing integer math.

In this case you need to do one of four things:

  • Explicitly declare x to be a floating-point type
  • Explicitly declare y to be a floating-point type
  • Explicitly declare the value on line 1 to be a floating-point value
  • Explicitly declare the value on line 2 to be a floating-point value
// Solution 1:

double x = 1;
int y = 3;
var z = x / y;

// z = 0.333...

// Solution 3:

var x = 1f;
var y = 3;
var z = x / y;

// z == 0.333333343

(I'll leave it as an exercise for the reader why the last line is wrong. Hint: .NET has three floating-point types, and they all do math differently.)

Declaring z to be a floating-point type won't help. Trust me on this.
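If you'd rather not trust me, here's a minimal sketch of why. The division happens between two integers before the result ever reaches z, so the declared type of z arrives too late. DivisionDemo and its method names are my own inventions:

```csharp
using System;

public static class DivisionDemo
{
	// Declaring z as double: x / y is still integer division.
	public static double DeclaredDouble()
	{
		var x = 1;
		var y = 3;
		double z = x / y; // 1 / 3 == 0 as integers, then 0 widens to 0.0
		return z;
	}

	// Converting one operand first makes the division itself floating-point.
	public static double CastFirst()
	{
		var x = 1;
		var y = 3;
		var z = (double)x / y;
		return z;
	}

	public static void Main()
	{
		Console.WriteLine(DeclaredDouble());   // 0
		Console.WriteLine(CastFirst() > 0.33); // True
	}
}
```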

The other common reason for using an explicit declaration is when you want to specify which interface to use on a class. This is less common, but still useful. For example, System.String implements both IEnumerable and IEnumerable<char>, which behave differently. Imagine an API that accepts both versions and you want to specify the older, non-generic version:

var s = "The lazy fox jumped over the quick dog.";
System.Collections.IEnumerable e = s;

SomeOldMethod(e);

Again, that's an unusual situation and not the best code snippet, but you can see why this might be a thing. The compiler won't infer that you want to use the older, non-generic IEnumerable implementation on String under most circumstances. This forces the issue. (So does using the as keyword.)

In future posts I may come back to this, especially if I find a good example of when to use an explicit declaration in C# 7.

S is for String

Day 19 of the Blogging A-to-Z challenge was Saturday, but Apollo After Hours drained me more or less completely for the weekend.

So this morning, let's pretend it's still Saturday for just a moment, and consider one of the oddest classes in the .NET Base Class Library (BCL): System.String.

A string is just a sequence of zero or more characters. A character could be anything: a letter, a number, a random two-byte value, what have you. System.String holds the sequence for you and gives you some tools to work with it, like Compare, Format, Join, Split, and StartsWith. Under the hood, the class holds the string as an array of char values.

Even though System.String is a class and not a struct, it behaves much more like the latter than the former. Strings are immutable: once you create a string, any changes you make to it create a new string instance. Also, although strings always live on the heap like any other reference type, the runtime interns string literals so that identical literals share a single instance, which has consequences for memory management and performance. But unlike structs, strings can be null. This is valid code:

var s = "Hello, world";
string t = null;
var u = t + s;

Strings can also be zero-length or all whitespace, which is why the class has a very useful method String.IsNullOrWhiteSpace().
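To make the immutability and the null-versus-empty-versus-whitespace cases concrete, here's a small sketch. The StringDemo class and its Classify helper are hypothetical, made up for this example:

```csharp
using System;

public static class StringDemo
{
	// Hypothetical helper: sorts a string into the cases described above.
	public static string Classify(string value)
	{
		if (value == null) return "null";
		if (value.Length == 0) return "empty";
		if (string.IsNullOrWhiteSpace(value)) return "whitespace";
		return "text";
	}

	public static void Main()
	{
		var s = "Hello, world";
		var shout = s.ToUpperInvariant(); // a new instance; s itself never changes
		Console.WriteLine(s);     // Hello, world
		Console.WriteLine(shout); // HELLO, WORLD

		string t = null;
		Console.WriteLine(t + s == s); // True: concatenation treats null as empty

		Console.WriteLine(Classify(t));     // null
		Console.WriteLine(Classify(""));    // empty
		Console.WriteLine(Classify("   ")); // whitespace
		Console.WriteLine(Classify(s));     // text
	}
}
```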

The blog C# in Depth has a good description of strings that's worth reading. Jon Skeet also takes on string memory management in one of his longer posts.

T is for Type

Now that I've caught up, day 20 of the Blogging A-to-Z challenge is just a few hours late. (The rest of the week should be back to noon UTC/7 am Chicago time.)

Today's topic: Types.

Everything in .NET is a type, even System.Type, which governs their metadata. Types exist in a hierarchy called the Common Type System (CTS). Distilled, there are two kinds of types: value types and reference types. I alluded to this distinction earlier today when discussing strings, which are reference types (classes) that behave a lot like value types (structs).

The principal distinction is that a value type holds its data directly wherever it is declared (on the stack, for a local variable), while a reference type keeps its data on the heap. This means that all of the data in a value type is contained in one place. The CLR sets aside memory for the entire object and moves the whole thing around as a unit. Naturally, this means value types tend to be small: numbers, characters, booleans, that sort of thing.

Reference types also partially live on the stack but only as a pointer to the heap where they keep their main data. The .NET memory manager can move reference types and their data independently as needed to handle different situations.

One of the consequences of this distinction is that when you pass a value type to a method, you're passing the entire thing; but when you pass a reference type to a method, you're only passing its pointer. This can give new .NET developers terrific headaches:

public void ChangeStuff()
{
	var i = 12345; // i is a System.Int32
	var p = new MyClass { Name = "Hubert", Count = i };
	
	ChangeStuff(i, p);
	
	Debug.Assert(12345 == i); // Succeeds!
	Debug.Assert(54321 == p.Count); // Succeeds!
	Debug.Assert("Hubert" == p.Name); // Fails!
}

private void ChangeStuff(int i, MyClass myClass)
{
	i = 54321; // The i in this method is not the i in the calling method
	myClass.Name = "Humbert"; // But myClass is the same instance as p
	myClass.Count = i;
}

When the CLR passes i to the second ChangeStuff method, it creates a copy of i on the stack and passes in that copy. Then on line 15, we overwrite that copy with a new value; the i in the calling method never changes.

But lines 16 and 17 aren't creating copies of the MyClass instance; they're using the pointer to the instance created on line 4, so that any operations on myClass work on the very same instance.

I recommend new developers read the MSDN article on Types I referenced above. Also read up on boxing and unboxing and anonymous types.

R is for Reflection

OK, I lied. I managed to find 15 minutes to bring you day 18 of the Blogging A-to-Z challenge, in which I'll discuss one of the coolest features of the .NET ecosystem: reflection.

Reflection gives .NET code the ability to inspect and use any other .NET code, full stop. If you think about it, the runtime has to have this ability just to function. But any code can use tools in the System.Reflection namespace. This lets you do some pretty cool stuff.

Here's a (necessarily brief) example, from the Inner Drive Extensible Architecture. This method, in the TypeInfo class, uses reflection to describe all the methods in a Type:

using System;
using System.Collections.Generic;
using System.Data;
using System.Diagnostics;
using System.Globalization;
using System.Linq;
using System.Reflection;
using System.Text;

public static IDictionary<string, MethodInfo> ReflectMethods(object target)
{
	if (null == target) return new Dictionary<string, MethodInfo>();

	var thisType = target.GetType();
	var methods = thisType.GetMethods(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.Static);

	var returnList = new Dictionary<string, MethodInfo>();
	foreach (var thisMethod in methods.OrderBy(m => m.Name))
	{
		var parameters = thisMethod.GetParameters();
		var keyBuilder = new StringBuilder(thisMethod.Name.Length + (parameters.Length * 16));
		keyBuilder.Append(thisMethod.Name);
		foreach (var t in parameters)
		{
			keyBuilder.Append(";P=");
			keyBuilder.Append(t.ParameterType.ToString());
		}
		keyBuilder.Append(";R=");
		keyBuilder.Append(thisMethod.ReturnType.ToString());
		keyBuilder.Append(";F=");
		keyBuilder.Append(thisMethod.Attributes.ToString());

		try 
		{
			returnList.Add(keyBuilder.ToString(), thisMethod);
		}
		catch (ArgumentException aex) 
		{
			Trace.WriteLine(string.Format(CultureInfo.InvariantCulture, "> Exception on {0}: {1}", keyBuilder, aex.Message));
		}
		catch (Exception ex)
		{
			Trace.WriteLine(string.Format(CultureInfo.InvariantCulture, "> Exception on {0}: {1}", keyBuilder, ex.Message));
			throw;
		}
	}

	return returnList;
}

The System.Reflection.Assembly class also allows you to create and activate instances of any type, use its methods, read its properties, and send it events.
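As a rough illustration of that last point, this sketch creates a StringBuilder by name, calls one of its methods, and reads one of its properties, all through reflection. It isn't from the Inner Drive code; ReflectionDemo and BuildViaReflection are names I invented:

```csharp
using System;
using System.Reflection;
using System.Text;

public static class ReflectionDemo
{
	// Builds a string without ever calling StringBuilder members directly.
	public static string BuildViaReflection(string text)
	{
		// Find the assembly that defines StringBuilder, then create one by name
		Assembly assembly = typeof(StringBuilder).Assembly;
		object instance = assembly.CreateInstance("System.Text.StringBuilder");
		Type type = instance.GetType();

		// Look up Append(string) and invoke it on the instance
		MethodInfo append = type.GetMethod("Append", new[] { typeof(string) });
		append.Invoke(instance, new object[] { text });

		// Read the Length property
		PropertyInfo length = type.GetProperty("Length");
		var count = (int)length.GetValue(instance);

		return instance.ToString() + ":" + count;
	}

	public static void Main()
	{
		Console.WriteLine(BuildViaReflection("abc")); // abc:3
	}
}
```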

Reflection is hugely powerful and hugely useful. And very, very cool.

Q is for Querying

Posting day 17 of the Blogging A-to-Z challenge just a little late because of stuff (see next post). Apologies.

Today's topic is querying, which .NET makes relatively easy through the magic of LINQ. Last week I showed how LINQ works when dealing with in-memory collections of things. In combination with Entity Framework, or another object-relational mapper (ORM), LINQ makes getting data out of your database a ton easier.

When querying a database in a .NET application, you will generally need a database connection object, a database command object, and a data reader. Here's a simple example using SQL Server:

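using System;
using System.Data.SqlClient;

public void DirectQueryExample(string connectionString)
{
	// Connection, command, and reader all hold resources, so dispose all three
	using (var conn = new SqlConnection(connectionString))
	using (var command = new SqlCommand("SELECT * FROM LookupData", conn))
	{
		conn.Open();
		using (var reader = command.ExecuteReader())
		{
			// Read() advances one row at a time and returns false at the end
			while (reader.Read())
			{
				Console.WriteLine(reader[0]);
			}
		}
	}
}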

(Let's skip for now how bad it is to execute raw SQL from your application.)

With Entity Framework (or another ORM), the ORM generates classes that represent tables or views in your database. Imagine you have an animals table, represented by an animal class in your data project. Finding them in your database might now look like this:

public IEnumerable<Animal> OrmQueryExample(string species)
{
	var result = new List<Animal>();
	using (var db = Orm.Context)
	{
		var dtos = db.Animals.Where(p => p.Species == species);
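		// Select projects each DTO to a domain object; AsEnumerable does the mapping in memory
		result.AddRange(dtos.AsEnumerable().Select(MapDtoToDomainObject));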
	}

	return result;
}

private Animal MapDtoToDomainObject(AnimalDto animalDto)
{
	// Code elided
}

That looks a little different, no? Instead of opening a connection to the database and executing a query, we use a database context that Entity Framework supplies. We then execute a LINQ query directly on the Animals table representation with a predicate. The ORM handles constructing and executing the query, and returns an IQueryable<T> of its self-generated Animal data transfer object (DTO). When we get that collection back, we map the fields on the DTO to the domain object we want, and return an IEnumerable<T> of our own domain object back to the caller. If the query comes up empty, we return an empty list. (Here's a decent Stack Overflow post on the difference between the two collection types.)

These are naive examples, of course, and there are many other ways of using EF. For example, for field mapping we might use a package like AutoMapper instead of rolling our own field-level mapping. I encourage you to read up on EF and related technologies.

P is for Polymorphism

We're now past the half-way point, 16 days into the Blogging A-to-Z challenge. Time to go back to object-oriented design fundamentals.

OO design has four basic concepts: abstraction, encapsulation, inheritance, and polymorphism.

All four have specific meanings. Today we'll just look at polymorphism (from Greek: "poly" meaning many and "morph" meaning shape).

Essentially, polymorphism means using the same identifiers in different ways. Let's take a contrived but common example: animals.

Imagine you have a class representing any animal (see under "abstraction"). Animals can move. So:

public abstract class Animal
{
	public abstract void Move();
}

Notice that the Move method has no implementation, since animal species have many different ways of moving.

Now imagine two concrete animal classes:

public class Dog : Animal
{
	public override void Move() 
	{
		// Walk like a quadruped
	}
}

public class Guppy : Animal
{
	public override void Move() 
	{
		// Swim like a fish
	}
}

Guppies and dogs both move around just fine in their own environments, and dogs can move around in the littoral areas of a guppy's environment as well. So both animals have a Move method.

In this way, the Move method is polymorphic. A caller doesn't need to know anything about guppies or dogs in order to get them to move. And the implementations of the Move method will be completely different:

public void MoveAll(IEnumerable<Animal> animals)
{
	// IEnumerable<T> has no built-in ForEach method, so use a plain foreach
	foreach (var animal in animals)
	{
		animal.Move();
	}
}

That method doesn't care what the list contains. It moves them all the same.

Now imagine this class:

public class Electron : Lepton
{
	public override void Move() 
	{
		// Move like an electron, however that works
	}
}

Electrons move too. The implementation of Electron.Move() differs from Dog.Move() or Guppy.Move() so vastly that no one really knows how electrons do it. But if you call Electron.Move(), you expect the thing to move.

I've only given examples of subtyping and duck typing today, so it's worth reading more about polymorphism in general. Also, as you recall from my discussion of interfaces, you probably would also define an interface like IMovable to express that your class can move, rather than relying on the abstract classes and inheritance. (Program to interfaces, not to implementations!)
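Here's a rough sketch of what that interface-based version might look like. IMovable comes from the paragraph above; everything else is assumed for illustration. In particular, Move returns a string here only so the output is visible, which is not the signature used earlier in this post:

```csharp
using System;
using System.Collections.Generic;

public interface IMovable
{
	string Move();
}

public class Dog : IMovable
{
	public string Move() { return "walks"; }
}

public class Guppy : IMovable
{
	public string Move() { return "swims"; }
}

// No shared base class with the animals, yet it can still move
public class Electron : IMovable
{
	public string Move() { return "does whatever electrons do"; }
}

public static class MoveDemo
{
	// The caller knows only that each item can move
	public static List<string> MoveAll(IEnumerable<IMovable> movers)
	{
		var log = new List<string>();
		foreach (var mover in movers)
		{
			log.Add(mover.Move());
		}
		return log;
	}

	public static void Main()
	{
		var movers = new IMovable[] { new Dog(), new Guppy(), new Electron() };
		foreach (var line in MoveAll(movers))
		{
			Console.WriteLine(line);
		}
	}
}
```

With an interface, Electron doesn't need to pretend it's an Animal just to share a MoveAll method with dogs and guppies.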

N is for Namespace

Day 14 of the Blogging A-to-Z challenge brings us to namespaces.

Simply put, a namespace puts logical scope around a group of types. In .NET and in other languages, types typically belong to namespaces two or three levels down.

Look at the sample code for this series. You'll notice that all of the types have a scope around them something like this:

namespace InnerDrive.Application.Module
{
}

(In some languages it's customary to use the complete domain name of the organization creating the code as part of the namespace and to use alternate letter cases. If I were writing Java, for example, that would look like com.inner_drive.application.module, since Java package names can't contain hyphens.)

Every type defined in the namespace belongs to only that namespace. If I defined a type in the example namespace above called Foo, the fully-qualified type name would be InnerDrive.Application.Module.Foo. Because using FQTNs requires a lot of typing and makes code harder to read, C# gives you the using directive:

using InnerDrive.Application.Module;

namespace InnerDrive.Application.OtherModule
{
	public class Bar
	{
		public void Initialize() 
		{
			// var foo = new InnerDrive.Application.Module.Foo() is not required
			var foo = new Foo();
			foo.Start();
		}
	}
}

Also, that Bar class belongs only to the InnerDrive.Application.OtherModule namespace, so another developer could create another Bar class in her own namespace without needing to worry about mine.
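To see two Bar classes coexisting, consider this sketch. The SomeoneElse.Tools namespace, the Describe methods, and the MyBar alias are all invented for the example:

```csharp
using System;
// A using alias gives one of the two Bars a short, unambiguous name
using MyBar = InnerDrive.Application.OtherModule.Bar;

namespace InnerDrive.Application.OtherModule
{
	public class Bar
	{
		public string Describe() { return "OtherModule.Bar"; }
	}
}

namespace SomeoneElse.Tools
{
	public class Bar
	{
		public string Describe() { return "Tools.Bar"; }
	}
}

public static class NamespaceDemo
{
	public static void Main()
	{
		var mine = new MyBar();
		var hers = new SomeoneElse.Tools.Bar(); // fully-qualified name

		Console.WriteLine(mine.Describe());        // OtherModule.Bar
		Console.WriteLine(hers.Describe());        // Tools.Bar
		Console.WriteLine(typeof(MyBar).FullName); // InnerDrive.Application.OtherModule.Bar
	}
}
```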

M is for Method

Alphabetical order doesn't actually put topics in the best sequence for learning, so we've had to wait until Day 13 of the Blogging A-to-Z challenge to talk about one of the most basic parts of an object-oriented program: methods.

A method takes a message from an object and does something with it. It's the behavior part of the behavior-plus-data pairing that orients your objects in the OO universe.

In .NET, even though you define fields, events, properties, and methods on your classes, under the hood the CLR sees only fields and methods. Properties and events are basically special flavors of methods that C# syntax makes easier to understand for humans. (See Monday's post.)

Take this simple C# snippet:

public string Name { get; internal set; }

The compiled code for that property will look almost the same as the compiled code for this pair of methods with a backing field:

public string get_Name()
{
	return _name;
}

internal void set_Name(string value)
{
	_name = value;
}

private string _name;
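You can check this yourself with reflection. In this sketch, Person and AccessorDemo are made-up names; the point is only that the compiler really does emit get_Name and set_Name methods for the property:

```csharp
using System;
using System.Reflection;

public class Person
{
	public string Name { get; internal set; }
}

public static class AccessorDemo
{
	// Returns true if Person has an instance method with the given name.
	public static bool HasMethod(string methodName)
	{
		// The setter is internal, so NonPublic members must be included
		var flags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
		return typeof(Person).GetMethod(methodName, flags) != null;
	}

	public static void Main()
	{
		Console.WriteLine(HasMethod("get_Name")); // True
		Console.WriteLine(HasMethod("set_Name")); // True
		Console.WriteLine(HasMethod("Rename"));   // False
	}
}
```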

In fact, the method pair should look very familiar to Java developers, since that language hasn't really kept up with the times, you know? (Java developers would call the simplified version "syntactic sugar," which is what people call things that make life simpler when their salaries depend on it being complicated. It's essentially every argument a Rails developer has with her .NET counterpart until the first time she needs to decouple the database from the front end. That's when the .NET guy shows her a coding horror from the VB3 era with a mournful warning not to let this happen to her. But I digress.)

To sum up: Methods change the data or behavior of an object, but C# prefers that you use properties to change data, events to express behaviors to external consumers, and methods to ask the object to do something.

The A-to-Z challenge is off tomorrow, but it will return next week with a basic tool of organizing your software, a basic tool of performance testing, a basic principle of OO design, and three other posts I haven't thought about yet.

L is for LINQ

Day 12 of the Blogging A-to-Z challenge will introduce you to LINQ, another way .NET makes your life easier.

LINQ stands for Language INtegrated Query, which Microsoft describes as follows:

Traditionally, queries against data are expressed as simple strings without type checking at compile time or IntelliSense support. Furthermore, you have to learn a different query language for each type of data source: SQL databases, XML documents, various Web services, and so on. With LINQ, a query is a first-class language construct, just like classes, methods, events.

LINQ does a lot of things, so let me show just a small example. Before LINQ, if you wanted to loop through a collection and filter for specific characteristics, you'd have to do something like this:

public static ICollection<Room> ForEachLooping(IEnumerable<Room> rooms, string filter)
{
	var result = new List<Room>();
	foreach (var item in rooms)
	{
		if (filter == item.Name) result.Add(item);
	}

	return result;
}

Here's the LINQ version; see if you can spot the difference:

public static ICollection<Room> LinqLooping(IEnumerable<Room> rooms, string filter)
{
	return rooms.Where(p => p.Name == filter).ToList();
}

LINQ adds a whole set of extension methods to the IEnumerable<T> interface, including Average, Sum, OrderBy, Join...basically, everything you can do with a SQL statement, you can do with a LINQ statement.

In fact, there's an alternate syntax that's even more SQL-like:

public static ICollection<Room> SqlishLinq(IEnumerable<Room> rooms, string filter)
{
	return
		(from r in rooms
		where r.Name == filter
		select r)
	.ToList();
}

Note that LINQ naturally operates on and returns IEnumerable<T>, not ICollection<T>, so I invoked the .ToList() method for easier testing. In practice you would usually want to return IEnumerable<T> so that you can easily chain methods that use LINQ, as LINQ doesn't evaluate the query chain until you try to use one of its results. Calling ToList() forces the query to run immediately.
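Deferred evaluation is easiest to see when the source changes between building the query and using it. A minimal sketch, with DeferredDemo and Run as names I made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
	// Returns { count from the deferred query, count from the ToList snapshot }.
	public static int[] Run()
	{
		var numbers = new List<int> { 1, 2, 3 };

		// Nothing is evaluated here; evens is just a description of the query
		IEnumerable<int> evens = numbers.Where(n => n % 2 == 0);

		// ToList runs the query right now and stores the results
		List<int> snapshot = numbers.Where(n => n % 2 == 0).ToList();

		numbers.Add(4);

		// evens runs only now, so it sees the 4; the snapshot does not
		return new[] { evens.Count(), snapshot.Count };
	}

	public static void Main()
	{
		var counts = Run();
		Console.WriteLine(counts[0]); // 2
		Console.WriteLine(counts[1]); // 1
	}
}
```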

LINQ is super-powerful and super-handy in too many cases to enumerate* in this short post. But if you use ReSharper (see Tuesday's post), you will learn it super-quickly.

(* See what I did there?)