Introduction
In this article Codesta interviews Peter Hallam, the development lead for the C# compiler. With questions ranging from why Microsoft invested in the creation of the new technologies, to discussing future features, the goal of the article is to get an inside look at Microsoft's newest programming language, C#, and the associated virtual machine, the Common Language Runtime (CLR).
A Look at Why
Codesta: Why did Microsoft create a new language?
Peter Hallam:
When we were looking for languages to target the CLR we did not find any existing ones which satisfied our core design goals that are now the foundation of the C# language. We felt that by designing C# we could add significant value to our customers over any existing language.
We regularly ask our customers whether they would prefer for Microsoft to continue to improve programmer productivity through innovation or if they would prefer that we limited our tools to accepted standards. Our customers have consistently returned with the desire that Microsoft continue to innovate while providing a solid migration story.
The existing C# language design is our first step in improving programmer productivity through language design. We have exciting plans for the future of C# which will deliver even more benefits to our customers.
Codesta: Why build a virtual machine if you can guarantee your platform?
Peter Hallam:
Virtual machines allow you to do some things which can be very difficult without them. Things like accurate garbage collection and code access security are dramatically easier to implement on top of a strongly typed virtual machine. Also, RAD environments, which blur the line between design time and runtime, benefit significantly from virtual machine technology.
Microsoft has employed virtual machine technology in our products for a long time. The entire VB product line has been built on a virtual machine which has evolved since its inception.
One thing that Java did was bring garbage collection (automatic memory management) into the mainstream of programming. Before Java, garbage collecting programming systems were limited to a small fraction of the programming community. Java really proved to the programming community at large that garbage collection was mature enough for real world programming.
When COM was being designed (over a decade ago) there was serious discussion on whether it should be garbage collection based. Eventually the decision was made that the programming community wasn't ready for garbage collection.
The main argument against virtual machines has been performance, both memory usage and execution speed. Recent improvements in computing hardware have made virtual machine performance quite reasonable.
With garbage collection firmly entrenched in the mainstream development community, and networked computing driving security concerns, you will see more and more mainstream programming targeting virtual machines.
Discussing Design Goals
Codesta: What were some of the major design goals behind C#?
Peter Hallam:
C# is an object oriented, type safe, garbage collected general purpose language that was designed for high productivity. It inherits most of its syntax from the C/C++ family of languages. C++ is an incredibly powerful language which enables programming in many different styles, but this power comes at a cost of complexity. One of the goals of C# is to provide a simpler programming model than C++. This comes at a cost of some power, but the designers felt that this tradeoff was well worth it.
A significant trend in programming in recent years has been component oriented programming. COM has been the basis of Windows based component programming. C# has taken the core component ideas from COM, including interfaces, methods, properties, and events, and turned them into first class language features. This represented a key design goal for C#. Here's a quick example showing some of the important component features of the C# language:
using System;

// delegate types are like strongly typed
// function pointer types in C. They are
// a key part of C#'s eventing model
delegate void OnClickHandler(Button b);

class Button {
    private string text;

    // Text is a read/write property
    // (Invalidate() belongs to the part of the class elided below)
    public string Text {
        get { return this.text; }
        set { this.text = value; this.Invalidate(); }
    }

    // OnClick is an event
    public event OnClickHandler OnClick;

    // ClickIt fires the OnClick event
    // which invokes all subscribers to the event
    public void ClickIt() {
        if (this.OnClick != null)
            this.OnClick(this);
    }

    // more stuff goes here ...
}

class Demo {
    static void Main() {
        Button b = new Button();

        // properties are referenced just like fields
        b.Text = "Demo Button";

        // subscribe to an event using +=
        b.OnClick += new OnClickHandler(OnClickCallback);

        // OnClickCallback will be called when the
        // event is fired
        b.ClickIt();
    }

    static void OnClickCallback(Button b) {
        Console.WriteLine("{0} was clicked", b.Text);
    }
}
The type system was a major design area for the C# language. There have been a couple of approaches to type system design in OO languages.
In pure OO languages like Smalltalk, everything is an object. This has a tremendous benefit in the programming model as it is easy to write very general code which can handle any kind of data. It does have a couple of drawbacks however. First, primitive types like 'int' or 'boolean' can take on a null value which is counterintuitive. And you also pay a performance penalty when operating on primitive types because all data must be allocated on the heap.
The common alternative to this is to split the type system and treat primitive types specially. This is the approach taken by C++ and Java. Primitive types are allocated inline rather than on the heap. This gets better performance for primitive types but loses the generality that you get from being able to treat all data as objects. Java has the additional restriction that user defined types must always be heap allocated. User defined types can never be allocated inline, which would be very useful for small types like points in a graphics application.
In C# predefined types like 'int' are allocated inline but they can be implicitly converted to an object as well. Primitive types get the performance of being allocated inline, but they can also be used very naturally in general purpose code. User defined types can be specified to have reference semantics, meaning they are always allocated on the heap, or value semantics, meaning they are treated like primitive types and are allocated inline unless they are converted to object. C#'s type system really hits the sweet spot for performance, user model, and extensibility.
Here's a quick example:
using System;

// types declared using the 'class' keyword are
// reference types. Customer instances will be
// allocated on the heap
class Customer {
    ...
}

// Point is declared using the 'struct' keyword
// so it will have value semantics and will
// be allocated inline
struct Point {
    public int x, y;

    // even though Point is a value type it can
    // override virtual methods defined on
    // the object base type
    public override string ToString() {
        return string.Format("({0}, {1})", x, y);
    }
}

class TypeDemo {
    // this method will print any value
    static void PrintIt(object o) {
        Console.WriteLine(o.ToString());
    }

    static void Main() {
        // this will allocate a new Customer
        // object on the heap
        Customer cust = new Customer();

        // Point is a value type so it will
        // be allocated inline. There is no
        // heap allocation here
        Point p;
        p.x = 5;
        p.y = 10;

        // reference types are objects
        PrintIt(cust);

        // strings are objects
        PrintIt("hello world!");

        // but integers are objects too
        PrintIt(5);

        // and user defined value types are objects
        PrintIt(p);
    }
}
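The practical difference between reference semantics and value semantics shows up on assignment and on conversion to object. The following is a minimal sketch of that difference, not code from the interview: RefPoint, ValPoint and SemanticsDemo are hypothetical names introduced only for illustration.

```csharp
using System;

// Hypothetical types for illustration only.
class RefPoint { public int x, y; }   // reference semantics
struct ValPoint { public int x, y; }  // value semantics

static class SemanticsDemo {
    // Assigning a class instance copies the reference, so a change
    // made through the second variable is visible through the first.
    public static int MutateCopyOfClass() {
        RefPoint a = new RefPoint();
        a.x = 1;
        RefPoint b = a;
        b.x = 99;
        return a.x;
    }

    // Assigning a struct copies the whole value, so the original
    // is unaffected by changes to the copy.
    public static int MutateCopyOfStruct() {
        ValPoint a = new ValPoint();
        a.x = 1;
        ValPoint b = a;
        b.x = 99;
        return a.x;
    }

    // Converting a struct to object (boxing) copies the value to the
    // heap; later changes to the original do not affect the boxed copy.
    public static int MutateAfterBoxing() {
        ValPoint a = new ValPoint();
        a.x = 1;
        object boxed = a;
        a.x = 99;
        return ((ValPoint)boxed).x;
    }

    static void Main() {
        Console.WriteLine(MutateCopyOfClass());   // prints 99
        Console.WriteLine(MutateCopyOfStruct());  // prints 1
        Console.WriteLine(MutateAfterBoxing());   // prints 1
    }
}
```

The boxing case is the price of the unified type system: a value type only pays for a heap allocation at the point where it is actually treated as an object.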
The last major design goal I'd like to talk about is C#'s versioning features. Releasing new versions of software components which are compatible with previous versions is a very difficult problem. This problem is largely ignored by most programming languages.
Existing OO languages like C++ and Java suffer from a problem which has been called the fragile base class problem. The fragile base class problem is that when adding new features to a base class it is very easy to break existing derived classes. When adding a new virtual method to a base class, existing methods with the same name in derived classes will automatically override the new method.
If the semantics of the new method don't match the existing method in the derived class, which it almost certainly won't, then trouble ensues. The problem occurs because in C++ and Java the user cannot specify their intent with respect to overriding and so overriding happens silently by default.
In C# the user must explicitly specify their intent when a method in a derived class has the same name and signature as a method in a base class. Unless the user explicitly states their intention to override, the derived method will not override the base method.
Suppose you ship version 1 of a component that looks like this:
public class BaseComponent {
}
And a client of your component writes a derived class like this:
public class ClientComponent : BaseComponent {
    public virtual void Foo() { ... }
}
Now you decide that you want to add a Foo() method to your BaseComponent in version 2. You would add it like this:
public class BaseComponent {
    public virtual void Foo() { ... }
}
Now in C++ or Java your ClientComponent's Foo() method would now override your new method. Calls to your new Foo() method would get dispatched to the ClientComponent's Foo() method. This is almost certainly going to lead to problems. The base version of Foo() is almost certainly not going to have the same semantics as the derived version.
In C# the derived Foo() in ClientComponent will not override the new Foo() in the base class. ClientComponent.Foo() will get its own vtable slot distinct from the vtable slot for BaseComponent.Foo(). The C# compiler will emit a warning that ClientComponent.Foo() will hide BaseComponent.Foo(). The user can get rid of the warning by explicitly stating their intent using the 'new' or 'override' keyword. The 'new' keyword indicates that the user intends the derived method to hide the base method. If however the user decides that the derived Foo() should override the base method they add the 'override' keyword like this:
public class ClientComponent : BaseComponent {
    public override void Foo() { ... }
}
This demonstrates a common design theme in C#, making the intent explicit in situations which can easily lead to subtle bugs.
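The difference between hiding with 'new' and overriding with 'override' is easiest to see when Foo() is called through a reference typed as the base class. The sketch below is an illustration under stated assumptions: Foo() returns a string (in the text it returns void) so the dispatch result can be observed, and HidingClient, OverridingClient and VersioningDemo are hypothetical names.

```csharp
using System;

class BaseComponent {
    // The method added in "version 2" of the component.
    public virtual string Foo() { return "BaseComponent.Foo"; }
}

// 'new' states that this Foo() is unrelated to the base method:
// it hides BaseComponent.Foo() and gets its own vtable slot.
class HidingClient : BaseComponent {
    public new virtual string Foo() { return "HidingClient.Foo"; }
}

// 'override' states that this Foo() replaces the base method.
class OverridingClient : BaseComponent {
    public override string Foo() { return "OverridingClient.Foo"; }
}

static class VersioningDemo {
    // Calls Foo() through a BaseComponent reference, the way code
    // inside the component library itself would.
    public static string CallThroughBase(BaseComponent c) {
        return c.Foo();
    }

    static void Main() {
        // Hidden method is not reached through the base type.
        Console.WriteLine(CallThroughBase(new HidingClient()));     // BaseComponent.Foo
        // Overriding method is reached through the base type.
        Console.WriteLine(CallThroughBase(new OverridingClient())); // OverridingClient.Foo
        // Through its own type, the hiding class still sees its Foo().
        Console.WriteLine(new HidingClient().Foo());                // HidingClient.Foo
    }
}
```

With 'new', calls made by the base component keep the semantics its author intended, while existing callers of the client class keep theirs.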
C# also takes advantage of the strong version aware binding features of the CLR. Together with requiring the user to specify their intent when overriding methods these features result in a great versioning story. Changes to the private implementation of a type cannot break existing derived classes of a type. Adding virtual methods to a type will not change the behaviour of existing derived classes.
There are a lot of other great features in C# but that's an overview of some of the key design points in the language.
Codesta: And the CLR, what were some of its design goals?
Peter Hallam:
While sharing some of the goals of C#, such as a unified type system, the CLR was designed to be a high performance platform for executing code. The Microsoft Intermediate Language (MSIL), the CLR's instruction set, was designed to be compiled to the native platform at install time or 'just-in-time' (i.e. during execution) rather than being interpreted.
One of the primary goals of the CLR was to support multiple languages. This is reflected in the type system, object model and Intermediate Language (IL) which are general enough to support a broad range of languages. For the first time your C++ class can be a super class to a VB class. This has tremendous benefits for teams that include developers of varying backgrounds and skills.
At this moment there are over 20 languages targeting the CLR. Microsoft has C++, C#, VB, J#, and JScript. Outside of Microsoft you will find Cobol, Eiffel, Scheme, and many others.
Security was also quite important, though perhaps I am not the best to describe it. In general, the CLR has a rich security model to allow trusted code to safely interact with untrusted code.
Experience with DLL's and general application installation also played a role in designing the CLR. In particular the CLR was designed to allow easy deployment of applications, and to ensure that existing deployed systems are robust when new versions of existing components are installed. The CLR allows running multiple versions of the same component side by side.
A Few Deeper Details
Codesta: With Microsoft's push for increased security, how was the CLR affected?
Peter Hallam:
Making it easier for our customers to write secure code is one of the core goals of the CLR. Running code on a secure virtual machine allows the barrier between untrusted and trusted computing to exist inside a single process. Trusted components can leverage the services of untrusted components within the same process. The CLR enables you to write secure components in this manner and evolves the programming model forward with respect to security.
Writing secure code is extremely difficult however, and there are no silver bullets. The only way to write secure code is to think hard about security through the entire design and development process. Secure code can be written using most programming environments. The CLR makes it easier for programmers to deliver secure solutions.
Codesta: C#/CLR has 2 kinds of code, safe and unsafe. What is it trying to provide and how did this affect the virtual machine?
Peter Hallam:
For C# the terms are safe and unsafe. The CLR uses the terms verifiable and unverifiable.
When running verifiable code the CLR can enforce security policies; the CLR can prevent verifiable code from doing things that it doesn't have permission to do. When running potentially malicious code, code that was downloaded from the internet for example, the CLR will only run verifiable code, and will ensure that the untrusted code doesn't access anything that it doesn't have permission to access.
The use of standard C style pointers creates unverifiable code. The CLR supports C style pointers natively. Once you've got a C style pointer you can read or write to any byte of memory in the process, so the runtime cannot enforce security policy. Actually it could but the performance penalty would make it impractical.
Unverifiable code is useful for interoperating with existing non-CLR code (existing C DLLs and COM components). It can also be useful when dealing with existing binary formats found in things like disk files and low level network protocols. This way you don't need to write custom marshalling code in C++ to access legacy components and binary formats.
Any unverifiable code must be fully trusted for the CLR to run it. Unverifiable code is often used to write a secure API on top of an existing legacy component. Many of the .NET Framework libraries are written entirely in C# using unsafe features to access the underlying platform.
The term unsafe has always bothered me. Unsafe code in C# really means more power to access the machine at a lower level without resorting to a lower level language like C. It is certainly possible to write safe secure applications in unsafe C# in the same way that it is possible to write secure applications in C. The difference is that in safe C# the strongly typed nature of the language and the security features of the runtime make it significantly easier to write secure code.
When using unsafe C#, as with standard C code, much more of a burden is placed on the coder to write safe secure code. The tradeoff between safe and unsafe code is really productivity versus power. It is not a tradeoff of safety. The C# language designers wanted to discourage the use of C style pointers and so the best keyword they found was the unfortunately named 'unsafe'.
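As an illustration of what unsafe C# looks like, here is a minimal sketch that walks a byte buffer with a C style pointer. UnsafeDemo and SumBytes are hypothetical names, and the example assumes the code is compiled with unsafe code enabled (the /unsafe compiler option).

```csharp
using System;

static class UnsafeDemo {
    // Sums the bytes of a buffer by walking it with a raw pointer.
    // The 'unsafe' modifier is required to use pointer types.
    public static unsafe int SumBytes(byte[] data) {
        int sum = 0;
        // 'fixed' pins the array so the garbage collector cannot
        // move it while a raw pointer refers to it.
        fixed (byte* start = data) {
            byte* p = start;
            for (int i = 0; i < data.Length; i++) {
                sum += *p;  // read through the pointer
                p++;        // pointer arithmetic, as in C
            }
        }
        return sum;
    }

    static void Main() {
        Console.WriteLine(SumBytes(new byte[] { 1, 2, 3, 4 })); // prints 10
    }
}
```

Nothing checks the pointer arithmetic inside the fixed block, so keeping the loop within the bounds of the array is entirely up to the programmer.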
Codesta: In the early days of Java you created an experimental Just-In-Time (JIT) compiler/virtual machine. How would you compare the Java and CLR virtual machines?
Peter Hallam:
The CLR design is much more mature than the JVM. The JVM made some design choices which have proven to be fairly limiting in the real world.
The JVM byte code was originally designed to be interpreted even though most implementations of the JVM now compile to the native machine for better performance. MSIL was designed to be compiled up front so it is more compiler friendly.
The JVM was designed with only one language in mind so it is missing many useful constructs which makes targeting the JVM painful for other languages. This includes the lack of user defined value types, delegates (type safe function pointers), byref types as well as pointer types.
The JVM's class file format and deployment model have a number of shortcomings. The class file format makes some sense if you are deploying a single class at a time, but in the real world that rarely happens. Typically you deploy a library of interdependent classes which you've tested together.
In addition, the JVM has no story for executing multiple versions of a class library in the same environment. The CLR's side by side features and deployment model are an enormous win.
Finally, JNI, the JVM's interoperability story, is extremely weak. To interoperate with existing code you must always write a custom marshaling layer in another language like C/C++.
In C# and the CLR most legacy code can be accessed directly from C# code using the built-in platform invoke and COM interop features of the CLR and the unsafe features (C style pointers) in C#. This lowers the bar to interoperating with legacy systems. Often you don't need to go to another language to access underlying platform features when programming in C# or any other language which targets the CLR.
High level constructs provided by strongly typed object oriented languages are great for building complex systems but ultimately it all comes down to bits. Java and the JVM really make it difficult to access the underlying computing platform.
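To make the platform invoke point concrete, here is a minimal sketch that calls the Win32 function GetCurrentProcessId in kernel32.dll directly from C#, with no marshaling layer written in another language. InteropDemo, IsWindows and NativeProcessId are hypothetical names, and the platform guard is an addition for illustration so the sketch does nothing harmful on other systems.

```csharp
using System;
using System.Runtime.InteropServices;

static class InteropDemo {
    // Platform invoke declaration: the CLR locates kernel32.dll and
    // marshals the call; no C/C++ glue code is needed.
    [DllImport("kernel32.dll")]
    static extern uint GetCurrentProcessId();

    public static bool IsWindows() {
        return Environment.OSVersion.Platform == PlatformID.Win32NT;
    }

    // Returns the process id reported by Win32, or null when the
    // program is not running on Windows.
    public static uint? NativeProcessId() {
        if (!IsWindows())
            return null;
        return GetCurrentProcessId();
    }

    static void Main() {
        uint? pid = NativeProcessId();
        if (pid.HasValue)
            Console.WriteLine("Win32 process id: {0}", pid.Value);
        else
            Console.WriteLine("Not running on Windows; native call skipped");
    }
}
```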
A Broad Look at the Features
Codesta: Microsoft has often been criticized for its lack of innovation. What features of C# and the CLR are truly innovative?
Peter Hallam:
The list here is pretty long so I'll stick to a few highlights:
- the unified type system
- the deployment and versioning support
- supporting multiple languages
It is also true that many of the innovations in C# and the CLR are evolutions of technologies and designs which have been around for many years. Garbage collection is a good example. Garbage collection, including the features for getting good performance such as incremental and concurrent garbage collection, has been around for several decades. The garbage collector in the CLR, which was built upon research from the Computer Science community, is truly world class.
Codesta: Where did some of the C#/CLR language features get their inspiration from?
Peter Hallam:
C# and the CLR draw heavily from existing computing practice and theory. C# inherits most of its features from the C and C++ family of languages. Garbage collection comes from the dawn of computing (aka before I was born). Many of the component features (properties, delegates and events) come from the VB and COM world. The unified type system sprang from a desire to get the ease of use of fully object oriented type systems from languages like Smalltalk and combine it with the performance benefits of type systems from languages like C and Pascal. The versioning features are pretty much new to C#.
Codesta: Are there any future features in C# or the CLR you can talk about?
Peter Hallam:
Anders Hejlsberg, the chief designer of C#, announced some of the plans for C# at OOPSLA 2002 in Seattle. These included generics (similar to C++ templates), anonymous methods (unnamed code blocks encapsulated in a delegate) and iterators (a mechanism for traversing collections).
Taking a closer look at generics, we've had some really smart guys in our research department working on a design for both C# and the CLR and they've done some amazing work. By adding generics to the CLR as well as to the language we really hit the sweet spot in all dimensions - execution performance, type soundness, type identity, MSIL size, and native code size.
Here's a quick example of what generics will look like:
class Stack<T> {
    T[] buffer;
    int count;

    public void Push(T value) { buffer[count++] = value; }
    public T Pop() { return buffer[--count]; }

    // more code goes here ...
}

class GenericDemo {
    static void Main() {
        // generic types can be instantiated
        // with reference types like string ...
        Stack<string> strStack = new Stack<string>();
        strStack.Push("hello");

        // ... and also with value types like int
        Stack<int> intStack = new Stack<int>();
        intStack.Push(5);
    }
}
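The other two announced features, anonymous methods and iterators, can be sketched in the same way. The example below uses the syntax that later shipped in C# 2.0, which may differ in detail from what was presented at OOPSLA 2002. Transformer, FeatureDemo, CountTo and SumOfDoubles are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

delegate int Transformer(int value);

static class FeatureDemo {
    // Iterator: each 'yield return' hands one element to the caller's
    // foreach loop; the compiler generates the enumerator class.
    public static IEnumerable<int> CountTo(int limit) {
        for (int i = 1; i <= limit; i++)
            yield return i;
    }

    public static int SumOfDoubles(int limit) {
        // Anonymous method: an unnamed code block wrapped in a
        // delegate, with no separately declared method.
        Transformer twice = delegate(int value) { return value * 2; };

        int sum = 0;
        foreach (int n in CountTo(limit))
            sum += twice(n);
        return sum;
    }

    static void Main() {
        Console.WriteLine(SumOfDoubles(3)); // 2 + 4 + 6, prints 12
    }
}
```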
Some Miscellaneous Questions
Codesta: In their current incarnations, would you say C# or the CLR are more suitable for front-end development or back-end development?
Peter Hallam:
Yes and Yes.
VB has been, and continues to be, Microsoft's premier front-end development tool. VB .NET, targeting the CLR, is the next step in Microsoft's RAD client side tools. C# .NET leverages much of the VB .NET design time tools and libraries so it has a great client side development story as well. A lot of great things like this fall out because the CLR and the .NET Framework libraries are designed to be multi language.
On the server side we've got ASP .NET, web services, database integration ... everything you want to write server side apps. We've gotten a ton of positive feedback on the runtime performance and programmer productivity of ASP .NET.
Codesta: Is there the possibility of seeing a product, such as Office, or maybe parts of future versions of Windows implemented in C# and/or making use of the CLR?
Peter Hallam:
Actually some parts of Visual Studio .NET are already written in C#. Microsoft has made a huge bet on the CLR being a key part of our future platforms and you can expect to see that reflected in everything we do going forward.
Codesta: In general, what kind of industry support have you seen for C# and the CLR?
Peter Hallam:
I'm probably not the right guy to ask about this, but the few times I have talked with customers I've received very positive responses. There are C# development tools being produced by other vendors; this is a great indication of industry support behind C# and the CLR. We've also had a lot of interest in the CLR from academia with the CLR being used as the target of many research compilers.
Codesta: What is your view of projects like Mono and your general opinion of bringing C# and the CLR to non-Microsoft platforms?
Peter Hallam:
I think the Mono project is a great validation of our direction with C# and the CLR. It's great to see an independent implementation of C# and the CLR. I look forward to seeing what they produce.
C# and the Common Language Infrastructure (the CLI, a subset of the full Microsoft CLR) have been accepted as international standards by ECMA. Microsoft has also released Rotor, a shared source implementation of C# and the CLI which targets Windows XP, FreeBSD, and more recently Mac OS X 10.2.
Codesta: Final question: why use C# to build your next application or back-end system?
Peter Hallam:
Productivity. Simple as that.
C#'s goal is to be the most productive way to deliver your solution whether it's a rich client or server.
Conclusion
Codesta would like to thank Peter Hallam for his time and his willingness to answer these questions. C# and the CLR are important Microsoft technologies and the insight is much appreciated.
If you have any questions or comments regarding this article, please do not hesitate to e-mail comments@codesta.com!
Reference Material
Peter Hallam's Biography
With over 10 years of experience, Peter Hallam is the development lead on the C# compiler and a member of the C# language design team. He has worked at Microsoft for 7 years, spending the last 4 years on the C# compiler. In previous roles at Microsoft he worked on Visual Basic for Applications, OLE Automation, Windows CE, and 64-bit Windows XP. Before joining Microsoft he was a software engineer at Iris Power Engineering and CitiBank. Peter Hallam graduated from the University of Waterloo in 1994 with a Bachelor of Mathematics.
Reference Links
Microsoft Links
- Microsoft Visual C#
  The home page for Microsoft's Visual C# Development Environment.
- Introduction to C#
  A Microsoft article that provides an introduction to C#.
- C# Language Specification
  Microsoft's language specification for C#.
- .NET Framework FAQ
  A Microsoft Frequently Asked Question (FAQ) list covering the .NET framework, including the CLR.
- Microsoft's Shared Source CLI
  Microsoft's shared source version of the ECMA compliant CLI implementation for Windows XP, FreeBSD and Mac OS X 10.2.
- GotDotNet
  A Microsoft web site dedicated to the .NET framework and technologies.
- New C# Language Features
  This page and its associated links take a look at new C# features recently announced at OOPSLA 2002 in Seattle.
ECMA Links
- Standard ECMA-334: C# Language Specification
  The December 2001 ECMA C# Language Specification Standard.
- Standard ECMA-335: Common Language Infrastructure (CLI)
  The December 2001 ECMA Common Language Infrastructure Standard.
Additional Links
- Mono Homepage
  The site dedicated to an open source implementation of C#, the CLR, and the .NET framework.
