2. www.tothenew.com
Serialization - Basic Concepts
➢Serialization is the encoding of objects, and the objects reachable from them, into a stream of bytes.
➢The concept is by no means unique to Java, but this presentation covers Java's serialization.
➢It is the basis for Java's built-in object persistence.
➢Handles versioning with the use of serialVersionUID.
➢Implementing the marker interface Serializable makes a class serializable.
➢Transient and static fields are not serialized.
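A minimal sketch of the points above, assuming a hypothetical User class (not part of the original slides): it implements Serializable, declares a serialVersionUID, and marks one field transient so that it is left out of the byte stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableSketch {
    // Hypothetical class used only for illustration.
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;                 // serialized
        transient String password;   // transient: skipped by default serialization

        User(String name, String password) {
            this.name = name;
            this.password = password;
        }
    }

    // Encode an object graph into a byte array.
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    // Decode an object graph from a byte array.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        User copy = (User) fromBytes(toBytes(new User("alice", "secret")));
        System.out.println(copy.name);      // the non-transient field survives
        System.out.println(copy.password);  // the transient field comes back as null
    }
}
```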
Serialization - Advantages
➢Provides ways to hook into the serialization process
○ by providing implementations of readObject() and writeObject()
○ by providing implementations of readExternal() and writeExternal()
➢When you want to serialize just part of the class
○ provide readResolve() and writeReplace() methods that substitute the object actually written to, and read back from, the stream.
➢Object validation
○ provide an implementation of validateObject() from the ObjectInputValidation interface; once registered through ObjectInputStream.registerValidation(), it is called automatically when the object graph has been deserialized
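The readObject()/writeObject() hooks can be sketched as follows. Account and its fields are hypothetical names, the string reversal is only a placeholder for a real transformation (it is not encryption), and the validity check is done inline in readObject() rather than through ObjectInputValidation to keep the sketch short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class HookSketch {
    // Hypothetical class used only for illustration.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        String owner;
        transient String pin;   // excluded from default serialization, written by hand below

        Account(String owner, String pin) {
            this.owner = owner;
            this.pin = pin;
        }

        // Called by ObjectOutputStream instead of the default field writer.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();   // writes the non-transient fields
            out.writeUTF(new StringBuilder(pin).reverse().toString());   // placeholder transform
        }

        // Called by ObjectInputStream; must read in the same order as writeObject wrote.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            pin = new StringBuilder(in.readUTF()).reverse().toString();
            if (owner == null || owner.isEmpty()) {
                throw new InvalidObjectException("owner missing");
            }
        }
    }

    static Account roundTrip(Account a) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(a);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (Account) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account copy = roundTrip(new Account("bob", "1234"));
        System.out.println(copy.owner + " " + copy.pin);
    }
}
```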
Serialization - Problem
➢Slow Processing
○ Serialization discovers which fields to write/read through reflection and type introspection, which is usually slow.
○ Serialization writes extra data to the stream.
○ You can offset part of the cost of serialization by having application objects implement java.io.Externalizable, but marshalling the class descriptor still adds significant overhead. To avoid that cost as well, call readExternal and writeExternal on the objects directly: for example, call obj.writeExternal(stream) rather than stream.writeObject(obj).
➢No proper Handling of fields
○ Default serialization skips transient and static fields; persisting them requires hand-written readObject() and writeObject() code.
○ When the default handling is inefficient, use the Externalizable interface instead of Serializable.
○ This way you need to write readExternal() and writeExternal(), which is a lot more work for simple serialization.
➢Not Secure
○ Because the format is fully documented, it's possible to read the contents of the serialized stream without the class being available.
➢No proper version handling; even serialVersionUID does not help much. Without it, the class cannot tolerate changes (the computed default UID changes whenever the class does), and with it, changing the version breaks compatibility with previously serialized data.
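The Externalizable advice under "Slow Processing" can be sketched like this. Point is a hypothetical class; the sketch compares stream.writeObject(obj), which also writes the class descriptor, with a direct obj.writeExternal(stream) call, which writes only the field data. The direct form assumes the reader already knows which class to instantiate.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class ExternalizableSketch {
    // Hypothetical value class used only for illustration.
    public static class Point implements Externalizable {
        int x;
        int y;

        public Point() { }   // Externalizable requires a public no-arg constructor

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            x = in.readInt();
            y = in.readInt();
        }
    }

    // Regular path: the stream writes the class descriptor plus the data.
    static byte[] viaWriteObject(Point p) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(p);
        }
        return buf.toByteArray();
    }

    // Direct path: only the stream header and the two ints are written.
    static byte[] viaDirectCall(Point p) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            p.writeExternal(out);
        }
        return buf.toByteArray();
    }

    // The reader must create the instance itself before calling readExternal.
    static Point readDirect(byte[] bytes) throws IOException {
        Point p = new Point();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            p.readExternal(in);
        }
        return p;
    }

    public static void main(String[] args) throws Exception {
        Point p = new Point(3, 4);
        System.out.println("writeObject bytes:   " + viaWriteObject(p).length);
        System.out.println("writeExternal bytes: " + viaDirectCall(p).length);
        Point back = readDirect(viaDirectCall(p));
        System.out.println(back.x + "," + back.y);
    }
}
```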
Protocol Buffer - Basic Concepts
➢Library for Serializing Messages
➢Protocol Buffer is a Serialization format with an interface description language developed by Google
➢Write a .proto file describing the structure of the data (the message format) and run it through the protocol compiler to generate Java classes
➢Each class has accessors for the fields defined
➢Methods for parsing and serializing the data in a compact and very fast format
➢Protocol buffers are Strongly typed
➢Handles Versioning Automatically
➢Generates classes for C++, Java, and Python
○ More languages are supported in external repositories (C#, Erlang, etc.)
➢Each generated class represents a single message
➢protoc generates code that depends on private APIs in libprotobuf.jar, which often change between versions, so use the same library version in Maven as the compiler installed on the system.
.proto File
➢Defines a message format/class
➢Simple syntax for defining messages
➢Each field in a message class must be identified by a numeric index
➢Each field has a name, a type, and a descriptor (for example, whether it is required or not)
➢Messages can import other .proto files and nest or embed other messages; there is no inheritance between messages
Sample .proto File
package java;
option java_package="com.shashi.protoc.generated";
option java_outer_classname="AddressBookProtos";
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }
  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }
  repeated PhoneNumber number = 4;
}
message AddressBook {
  repeated Person person = 1;
}
import Command
➢Simply imports another .proto file
➢Allows for separating different message classes into different files
➢The imported file should be in the same directory
○ it can be in another directory, in which case you have to pass an additional argument (-I / --proto_path) to the protoc compiler
package Command
➢Declares the namespace for the messages in the file
➢package abc.def would mean, in generated C++ code,
namespace abc {
  namespace def {
    . . .
  }
}
➢For Java, package has the same significance as in the Java language, unless option java_package overrides it.
message Command
➢Encloses a message class
➢The keyword "message" is followed by the name of the message, which becomes its Java class name
➢Message classes are encapsulated
enum Command
➢The keyword enum is followed by the name of the enumeration
➢Zero-based enumeration
➢Produces an actual Java enumeration
➢Simply defines an enumeration type; it does not create a field in the message for that enumeration
Fields
➢Fields are members of the message class
➢Convention is [descriptor] type name = index
➢index is 1-based
➢indexes 1-15 are encoded in a single byte (index plus wire type), while 16-2047 take two bytes, so save 1-15 for the most frequently used fields
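Why low indexes are cheaper can be sketched with a few lines of arithmetic. In the protobuf wire format each field is prefixed by a key, (index << 3) | wire type, written as a base-128 varint. The helper names below are hypothetical and this is not the protobuf library itself, only a sketch of the size calculation.

```java
public class TagSizeSketch {
    // Field key as written on the wire: index in the upper bits, wire type in the low 3 bits.
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    // Number of bytes a non-negative value needs as a base-128 varint (7 payload bits per byte).
    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    public static void main(String[] args) {
        // Wire type 0 (varint) is used here; the result is the same for every wire type.
        for (int n : new int[] {1, 15, 16, 2047, 2048}) {
            System.out.println("index " + n + " -> " + varintSize(tag(n, 0)) + " byte(s) of key");
        }
    }
}
```

Indexes 1 through 15 fit in one key byte, 16 through 2047 need two, and 2048 onward need three or more.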
Descriptor
➢Describes the field
➢Required means that the field must be set before the message can be written
➢Optional means that the field does not have to be set before writing
➢Repeated means that the field is a collection (dynamic array) of another type
○ For historical reasons, repeated fields of scalar numeric types aren't encoded as efficiently as they could be. New code should use the special option [packed=true] to get a more efficient encoding. The option applies only to scalar numeric types, not to message types such as Person.
message Samples {
  // hypothetical message: packed requires a repeated scalar numeric field
  repeated int32 value = 1 [packed = true];
}
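The saving from [packed=true] can be sketched by encoding the same numbers both ways by hand. This is a simplified encoder with hypothetical method names, not the protobuf library, and it assumes small non-negative values. Unpacked, every element carries its own key; packed, one key and one length prefix cover all elements.

```java
import java.io.ByteArrayOutputStream;

public class PackedSketch {
    // Base-128 varint for non-negative ints: 7 payload bits per byte, high bit = "more follows".
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Unpacked: key (wire type 0) + value, repeated for every element.
    static byte[] unpacked(int fieldNumber, int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            writeVarint(out, (fieldNumber << 3) | 0);
            writeVarint(out, v);
        }
        return out.toByteArray();
    }

    // Packed: one key (wire type 2, length-delimited) + byte length + all values back to back.
    static byte[] packed(int fieldNumber, int[] values) {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        for (int v : values) {
            writeVarint(payload, v);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (fieldNumber << 3) | 2);
        writeVarint(out, payload.size());
        out.write(payload.toByteArray(), 0, payload.size());
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5};
        System.out.println("unpacked: " + unpacked(4, values).length + " bytes");
        System.out.println("packed:   " + packed(4, values).length + " bytes");
    }
}
```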
Types
➢The Expected type of the field
➢There is a range of integer and string types
➢Can be name of an enumeration
➢Can be a name of another Message class
Class Generation
➢Use the protoc compiler
➢protoc -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/addressbook.proto
➢Use your classes via aggregation
○ DO NOT inherit from your message class
Advantage / Disadvantages
➢Advantages:
○ If you add new fields to the structure, old programs that don't know about those fields will simply ignore them.
○ If you remove a field, old programs will just assume the default value for the deleted field.
➢Disadvantages
○ Cannot remove required fields once added. You have to plan the schema in advance.
■ It is suggested to add only optional fields and make only fields such as id required.
○ Just a way to encode data, not an RPC mechanism
■ it is designed to be used with any RPC implementation
○ Not for unstructured text
○ Not great if your first priority is human readability (not good for debugging)
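The two advantages above (unknown fields are ignored, missing fields fall back to defaults) can be sketched with a hand-written reader. This is a simplified decoder with hypothetical names, not the generated protobuf code; it assumes an "old" program that knows only field 1 (name) and field 2 (id), and small varint values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class UnknownFieldSketch {
    // What the hypothetical old program knows about a Person.
    static class OldPerson {
        String name = "";   // default when the field is absent
        int id = 0;         // default when the field is absent
    }

    static class Reader {
        final byte[] buf;
        int pos = 0;

        Reader(byte[] buf) { this.buf = buf; }

        boolean done() { return pos >= buf.length; }

        // Base-128 varint; assumes the value fits in an int.
        int varint() {
            int result = 0;
            int shift = 0;
            while (true) {
                int b = buf[pos++] & 0xFF;
                result |= (b & 0x7F) << shift;
                if ((b & 0x80) == 0) return result;
                shift += 7;
            }
        }

        byte[] bytes(int n) {
            byte[] out = Arrays.copyOfRange(buf, pos, pos + n);
            pos += n;
            return out;
        }
    }

    static OldPerson parse(byte[] data) {
        OldPerson p = new OldPerson();
        Reader r = new Reader(data);
        while (!r.done()) {
            int key = r.varint();
            int field = key >>> 3;
            int wireType = key & 7;
            if (field == 1 && wireType == 2) {
                p.name = new String(r.bytes(r.varint()), StandardCharsets.UTF_8);
            } else if (field == 2 && wireType == 0) {
                p.id = r.varint();
            } else {
                // Unknown field: the wire type alone says how many bytes to skip.
                switch (wireType) {
                    case 0: r.varint(); break;                       // varint
                    case 1: r.pos += 8; break;                       // 64-bit
                    case 2: int len = r.varint(); r.pos += len; break; // length-delimited
                    case 5: r.pos += 4; break;                       // 32-bit
                    default: throw new IllegalArgumentException("unsupported wire type " + wireType);
                }
            }
        }
        return p;
    }

    public static void main(String[] args) {
        // Written by a newer program: name = "Al", id = 7, plus field 3 (email = "a@b").
        byte[] newer = {0x0A, 0x02, 'A', 'l', 0x10, 0x07, 0x1A, 0x03, 'a', '@', 'b'};
        OldPerson p = parse(newer);
        System.out.println(p.name + " " + p.id);
    }
}
```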
Alternatives
➢Apache Avro :
○ Essentially ProtoBuf with an RPC facility; it is a data serialization and RPC framework used in Apache Hadoop
○ Dynamic typing: no code generation required, only a schema in JSON format
■ Can optionally use Avro IDL.
○ No static data types, which facilitates generic data-processing systems
➢Apache Thrift:
○ A code generation engine with an IDL and a binary communication protocol, developed at Facebook
○ Facilitates calling between different language platforms
○ Instead of writing a load of boilerplate code to serialize and transport your objects and invoke remote methods, you can get right down to business.