Wednesday, January 29, 2014

Difference Between Serialization And Marshaling

Serialization

Serialization is the process of converting a data structure or object into a sequence of bits so that it can be stored in a file or a memory buffer, or transmitted across a network connection, to be "resurrected" later in the same or another computer environment. This sequence of bits can be in any format the user chooses, but it is usually XML or binary.
Serialization comes in many forms in the .NET Framework: it can be observed in ADO.NET, Web services, WCF services, Remoting, and elsewhere.
For example, calling the WriteXml() method of a DataSet serializes that DataSet into an XML file.


ds.WriteXml("data.xml");


And if we have a structure like this:


public struct User
{
   public int id;
   public string name;
}


we can get results like the following if we serialize a collection of that structure into XML:


<Users>
   <User>
       <id>12</id>
       <name>Mark</name>
   </User>
   <User>
       <id>13</id>
       <name>Charles</name>
   </User>
   <User>
       <id>14</id>
       <name>John</name>
   </User>
</Users>
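One way to produce XML of this shape is the framework's XmlSerializer. The following is only a minimal sketch: the UserXmlDemo class and its ToXml helper are hypothetical names, and the "Users" root element is an assumption (by default XmlSerializer would name the root ArrayOfUser). The real output also carries an XML declaration and namespace attributes on the root element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public struct User
{
    public int id;
    public string name;
}

// Hypothetical helper class, for illustration only.
public static class UserXmlDemo
{
    // Serializes a list of User values to an XML string.
    // The root element name "Users" is an assumption made for this sketch.
    public static string ToXml(List<User> users)
    {
        var serializer = new XmlSerializer(typeof(List<User>), new XmlRootAttribute("Users"));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, users);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        var users = new List<User>
        {
            new User { id = 12, name = "Mark" },
            new User { id = 13, name = "Charles" },
            new User { id = 14, name = "John" }
        };
        Console.WriteLine(ToXml(users));
    }
}
```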


Serialization can be observed in Web and WCF services too. The request and the parameter information for a function are serialized into XML, and when the function returns, the response and the returned data are serialized into XML as well. You don't actually have to think about this XML data; the framework handles it for you.
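To get a feel for what the framework does behind the scenes, here is a rough sketch using DataContractSerializer, the serializer WCF uses by default. The GetUserResponse type and the ServiceMessageDemo class are hypothetical and exist only for this illustration; a real service wraps such data in a larger message envelope.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical response type, for illustration only.
[DataContract]
public class GetUserResponse
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
}

public static class ServiceMessageDemo
{
    // Turns the response object into XML text, roughly as a service would before sending it.
    public static string Serialize(GetUserResponse response)
    {
        var serializer = new DataContractSerializer(typeof(GetUserResponse));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, response);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    // Rebuilds the object from the XML text, as the receiving side would.
    public static GetUserResponse Deserialize(string xml)
    {
        var serializer = new DataContractSerializer(typeof(GetUserResponse));
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            return (GetUserResponse)serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        string xml = Serialize(new GetUserResponse { Id = 12, Name = "Mark" });
        Console.WriteLine(xml);
        Console.WriteLine(Deserialize(xml).Name);
    }
}
```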
In the same vein, when it comes to Remoting, the sender and the recipient must agree on the same form of serialized data (SOAP XML or binary). That is, when you send some data, the framework serializes it for you before sending it to the target process. When the target process receives this data, it turns it back (deserializes it) into its original form to be able to handle it.
Thus, the process of converting data structures and objects into a series of bits is called Serialization. The reverse of this process, converting these bits back into the original data structures and objects, is called Deserialization.
Therefore, the following ADO.NET code deserializes the XML file:


DataSet ds = new DataSet();
ds.ReadXml("data.xml");
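The two DataSet calls can be combined into a full round trip. The sketch below is only an illustration: it writes to an in-memory string instead of data.xml, the DataSetRoundTrip class and the sample table are made up, and the schema is written alongside the data so that the column types survive the trip.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class DataSetRoundTrip
{
    // Builds a small sample DataSet (made-up data).
    public static DataSet BuildSample()
    {
        var ds = new DataSet("Users");
        DataTable table = ds.Tables.Add("User");
        table.Columns.Add("id", typeof(int));
        table.Columns.Add("name", typeof(string));
        table.Rows.Add(12, "Mark");
        table.Rows.Add(13, "Charles");
        return ds;
    }

    // Serializes the DataSet to XML text, then deserializes it into a new DataSet.
    public static DataSet RoundTrip(DataSet source)
    {
        using (var writer = new StringWriter())
        {
            source.WriteXml(writer, XmlWriteMode.WriteSchema);   // serialization
            var copy = new DataSet();
            using (var reader = new StringReader(writer.ToString()))
            {
                copy.ReadXml(reader, XmlReadMode.ReadSchema);    // deserialization
            }
            return copy;
        }
    }

    public static void Main()
    {
        DataSet copy = RoundTrip(BuildSample());
        foreach (DataRow row in copy.Tables["User"].Rows)
        {
            Console.WriteLine("{0}: {1}", row["id"], row["name"]);
        }
    }
}
```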


And when your application receives a response from the server or from another process, the framework deserializes that XML data for you.
So why is XML often preferred over binary serialization? Because XML is text-based. It can therefore be passed easily from one process to another or over a network connection, and firewalls usually let it through, since it typically travels over HTTP.

Marshaling

Marshaling is the process of converting managed data types to unmanaged data types. There are big differences between the managed and unmanaged environments. One of those differences is that the data types of one environment are not available (and not acceptable) in the other.
For example, you can't call a function like SetWindowText(), which sets the text of a given window, with a System.String directly, because this function accepts an LPCTSTR and not a System.String. In addition, you can't directly interpret (handle) the return type of the same function, BOOL, because your managed environment (C#, in the context of this article) doesn't have a BOOL; it has a System.Boolean instead.
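In practice, you declare the unmanaged function with DllImport and let the interop marshaler map the types. The sketch below shows one common way to declare SetWindowText(); the NativeMethods and StringMarshalingDemo class names are hypothetical. Because the function itself exists only on Windows, the demo does not call it. Instead, it uses the Marshal class to copy a string into unmanaged memory and read it back, which is roughly the kind of work the marshaler may do for a string parameter.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeMethods
{
    // Native signature: BOOL SetWindowText(HWND hWnd, LPCTSTR lpString);
    // System.String is marshaled as LPCTSTR, and BOOL comes back as System.Boolean.
    [DllImport("user32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool SetWindowText(IntPtr hWnd, string lpString);
}

// Hypothetical helper class, for illustration only.
public static class StringMarshalingDemo
{
    // Copies the managed string into unmanaged memory as null-terminated UTF-16,
    // then reads it back into a new managed string.
    public static string RoundTripThroughUnmanaged(string text)
    {
        IntPtr buffer = Marshal.StringToHGlobalUni(text);
        try
        {
            return Marshal.PtrToStringUni(buffer);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripThroughUnmanaged("Hello"));
    }
}
```

On Windows, a call would look like NativeMethods.SetWindowText(handle, "Hello"), where handle is a window handle obtained elsewhere.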
To be able to interact with the other environment, you usually need to change not the format of the data but only the name of its type.
For example, a System.String is a series of characters, and an LPCTSTR is a series of characters too! Why not just change the name of the data type and pass it to the other environment?
Consider the following situation. You have a System.String that contains the value "Hello":


System.String str = "Hello";


The same data can be represented in an array of System.Char too, like the following line:


System.Char[] ch = str.ToCharArray();


So, what is the difference between that System.String variable and that System.Char array? As far as the character data goes, nothing. Both contain the same data, and that data is laid out the same way in both variables. That's what Marshaling means.
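A quick way to check this claim is to compare the raw UTF-16 bytes of the two variables. This is only a sketch; the StringVersusCharArray class and its method are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class StringVersusCharArray
{
    // Returns true when the string and its char array encode to the same UTF-16 bytes.
    public static bool SameCharacterData(string text)
    {
        char[] chars = text.ToCharArray();
        byte[] fromString = Encoding.Unicode.GetBytes(text);
        byte[] fromChars = Encoding.Unicode.GetBytes(chars);
        return fromString.SequenceEqual(fromChars);
    }

    public static void Main()
    {
        Console.WriteLine(SameCharacterData("Hello"));
    }
}
```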
So what is the difference between Serialization and Marshaling?
C# has a System.Int32 and the Windows API has an INT, and both refer to a 32-bit signed integer (on 32-bit and 64-bit Windows alike). When you marshal a System.Int32 to an INT, you just change its type name; you don't change its contents or lay them out differently (usually). When you serialize a System.Int32, you convert it to another form (XML, for instance), so its representation is completely changed.
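The contrast can be sketched in a few lines. The Int32Demo class below is hypothetical: one method writes the integer into unmanaged memory and reads it back (the same four bytes, only the type name differs on the other side), and the other serializes it with XmlSerializer, which turns it into text.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Xml.Serialization;

// Hypothetical helper class, for illustration only.
public static class Int32Demo
{
    // Writes the value into unmanaged memory and reads it back: still a 4-byte integer.
    public static int MarshalRoundTrip(int value)
    {
        IntPtr buffer = Marshal.AllocHGlobal(sizeof(int));
        try
        {
            Marshal.WriteInt32(buffer, value);
            return Marshal.ReadInt32(buffer);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }

    // Serializes the value to XML: the integer becomes text inside an <int> element.
    public static string SerializeToXml(int value)
    {
        var serializer = new XmlSerializer(typeof(int));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(MarshalRoundTrip(12));
        Console.WriteLine(SerializeToXml(12));
    }
}
```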

Summary

To sum up, Marshaling is a very general term used to describe transformations of memory. In theory, it is more general than Serialization. In Python, for instance, the terms Marshaling and Serialization are used interchangeably: there, Marshaling is Serialization and Serialization is Marshaling, with no difference between them. In general computing terminology, there is a subtle difference between Marshaling and Serialization (check the Wikipedia definition).
So what is that System.MarshalByRefObject class, and why was that name specifically chosen? The System.MarshalByRefObject class allows objects to be passed by reference rather than by value in applications that use Remoting.
Personally, I like to think that the Microsoft .NET Framework team was being very precise when they named that class "MarshalByRefObject", respecting that subtle difference between Serialization and Marshaling. Or maybe the name was derived from Python, dunno!
After all, we should keep in mind that in .NET terminology there is a big difference between Serialization and Marshaling: Marshaling usually refers to Interop Marshaling, while in .NET Remoting it refers to the serialization process.
By the way, an often-repeated but unsourced claim says that Marshalling is so named because it was first studied in 1962 by Edward Waite Marshall, then with the General Electric corporation.
