Monday, February 19, 2018

Scalability Techniques


Scalability is the capability of a system to handle an increasing workload without degrading its qualities (e.g. performance, availability).

Scalability can be considered inversely proportional to the increase in the time that individual work units take to complete as the workload grows.

Some techniques that improve scalability are:

  1. Caching: reduces the overhead of fetching resources (application cache, object cache, CDN)
  2. Asynchronous Communication: reduces the impact of problems affecting the called service. It decouples systems and preserves the autonomy of one system from another
  3. Stateless Application: allows introducing architectural components (e.g. a load balancer) that route requests to any instance of a stateless application
  4. Data Proximity: reduces the overhead of fetching remote data by designing components so that data is close to the client application
  5. Pooling: reduces the overhead of creating expensive resources by pooling them. It also allows tuning the capacity of important resources in order to protect the overall system qualities
  6. Parallelization/Splitting: decreases the time taken to complete a unit of work by splitting it, in a distributed or local manner
  7. Coarse Grained API: making interfaces more coarse-grained reduces the overhead of handling/consuming many requests to complete a service operation
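
As an illustration of the first technique, here is a minimal caching sketch in Python. The function `fetch_resource` and the `CALLS` counter are hypothetical names introduced only for this example; a real application would wrap a slow database or network call instead.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the slow path actually runs


@lru_cache(maxsize=128)
def fetch_resource(resource_id: str) -> str:
    """Hypothetical expensive lookup; the body only runs on a cache miss."""
    CALLS["count"] += 1
    return f"payload-for-{resource_id}"


first = fetch_resource("user:42")   # cache miss: the body runs
second = fetch_resource("user:42")  # cache hit: served from memory
```

The second call returns the cached value, so the overhead of fetching the resource is paid only once per key.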

Friday, February 16, 2018

Spring Cloud Config Overview

Spring Cloud Config

  • It has no concept of peers
  • It can be used with several types of backend:
    • Git backend
    • File System backend (with the usual techniques against hard disk failures for high availability)
    • Vault backend (tool for securely accessing secrets)
    • JDBC backend
  • Configuration values are read at client startup; to refresh them, Spring Cloud offers the refresh endpoint, which reloads the properties
It is possible to implement runtime notification of configuration changes:
  1. The Config Server subscribes to notifications of Git changes.
  2. The Config Server notifies every Config Client of the configuration changes via the Spring Cloud Bus
  3. Config Clients and the Config Server communicate with each other using the Spring Cloud Bus
  4. Config Clients call the refresh endpoint that Spring Actuator exposes in order to reload the beans annotated with @RefreshScope


Scalability Axis: The Scale Cube

Scalability can be obtained along different axes:


  • AXIS X (Scale by cloning): scalability is obtained by cloning the whole system (services and data)
    • Applications are cloned:
      • if they are stateless, applications sit behind a load balancer that distributes requests among the clones
      • if they are stateful, applications need to preserve state in a different way
    • Data are cloned: data synchronization is required to keep all nodes updated with the latest information. This synchronization is performed by the master via asynchronous operations.

  • AXIS Y (Scale by splitting different things): scalability is obtained by dividing the entire system into different functional subsystems that can scale independently
    • The entire workload is split among microservices each having a distinct functionality
    • It relies on the decomposition of applications/systems into services
    • Each request for a service is assigned to the responsible node for that service

  • AXIS Z (Scale by splitting similar things): scalability is obtained by data partitioning

    • Each node/server is responsible for only a subset of the data
    • There are components whose responsibility is to route requests to the appropriate node/server
    • Databases usually use z-axis scaling (e.g. Cassandra sharding)
    • An advantage of z-axis scaling is reduced memory usage per node; the drawback is more complexity
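
The routing component of z-axis scaling can be sketched as follows. This is a simplified illustration, not how any particular database does it: the node names are hypothetical, and a plain "hash modulo number of nodes" scheme is assumed (real systems often use consistent hashing or range partitioning instead).

```python
import zlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def node_for(key: str, nodes=NODES) -> str:
    """Pick the node responsible for a key using a stable hash."""
    digest = zlib.crc32(key.encode("utf-8"))
    return nodes[digest % len(nodes)]
```

Because the hash is stable, every request for the same key is routed to the same node, which therefore only needs to hold its own subset of the data.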



Wednesday, February 14, 2018

DDD components

In DDD there are four components to take into consideration:

Entity

An entity is something identifiable and addressable through a unique identifier. It represents a specific object in the domain that may evolve and change over time. Changes need to be tracked and maintained, so entities must be searchable and distinct from other entities.

Value Object

A value object is important for the attributes it carries. It does not refer to a particular object that the domain is naturally composed of.
A value object has a set of properties that capture a specific aspect of the domain.
A value object has to be "side-effect free": it must not be modified, because every aspect it captures has to stay immutable.

Repository

A repository takes care of every operation that retrieves entities from their storage (database, in-memory location, file system). Its responsibility is also to store entities in one or more storages, depending on your needs.

Service

A service takes care of domain aspects that are out of the scope of an entity. It usually collaborates with a repository to fetch entities and uses those entities to perform actions that create value for the domain.
At design time a service can be identified when an operation involves two or more entities.
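
The four components can be sketched together in a few lines of Python. The banking domain and all names (`Money`, `Account`, `AccountRepository`, `TransferService`) are assumptions made up for this illustration; the repository keeps entities in memory only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """Value object: defined by its attributes and immutable."""
    amount: int
    currency: str


@dataclass
class Account:
    """Entity: identified by account_id; its balance changes over time."""
    account_id: str
    balance: Money


class AccountRepository:
    """Repository: stores and retrieves entities (in memory here)."""

    def __init__(self):
        self._accounts = {}

    def save(self, account: Account) -> None:
        self._accounts[account.account_id] = account

    def find(self, account_id: str) -> Account:
        return self._accounts[account_id]


class TransferService:
    """Service: an operation that involves two entities."""

    def __init__(self, repository: AccountRepository):
        self._repository = repository

    def transfer(self, source_id: str, target_id: str, amount: int) -> None:
        source = self._repository.find(source_id)
        target = self._repository.find(target_id)
        if source.balance.currency != target.balance.currency:
            raise ValueError("currency mismatch")
        if source.balance.amount < amount:
            raise ValueError("insufficient funds")
        # Value objects are replaced, never mutated in place.
        source.balance = Money(source.balance.amount - amount, source.balance.currency)
        target.balance = Money(target.balance.amount + amount, target.balance.currency)
        self._repository.save(source)
        self._repository.save(target)
```

The transfer lives in a service because it spans two `Account` entities, which matches the design-time rule of thumb above.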

Further details in new posts.

Monday, February 12, 2018

NoSQL Datamodel

Key Value

A Key Value store is similar to a document-based store, except that it is more like a hash table where the key is unique and the value is not necessarily a JSON structure. Furthermore, a Key Value store accepts different types from value to value.
General use cases are:
  • Session management
  • Profile management
Key Value stores are valuable when dealing with heterogeneous data that can be identified by a unique key.
Examples of key-value stores:
  • Redis
  • Riak
  • RocksDB

Document Oriented

A Document Oriented store can be considered an extension of a key-value store, where the key can be specified by the client or provided by the database. Document Oriented databases generally use value types such as XML, JSON and BSON. Documents offer richer indexing and search capabilities and, by nesting documents, they overcome limitations of columnar databases.
Document databases work well when:
  • Managing lists of variable attributes, such as products
  • Tracking variable types of metadata

Examples of document-oriented stores:
  • MongoDB

Column Oriented

A column-oriented database stores each column contiguously. In this way the values of a column are stored next to each other, so equal values are neighbours and can be compressed.
Column databases work well when:
  • Read operations affect many rows but few columns
  • Compression is needed

General use cases are:
  • Event logging
  • Analytics
Examples of column-based stores:
  • Apache HBase

Graph Based

Graph databases work well when:
  • Data is highly connected

A relational database uses foreign keys to build references between tables; when a query needs to navigate a relationship, join operations are executed at runtime, with increasing cost and time.
Graph databases use the concept of a node that directly contains a list of relationship records pointing to other nodes. Wherever an RDBMS needs a join, a graph DB accesses the connected nodes directly, which eliminates the need for expensive search and match computations.
In practice, JOIN tables are transformed into data relationships.
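
The difference between the two access patterns can be sketched with plain Python data structures. This is only an analogy, assuming a toy "friends" data set invented for the example; it does not reflect the storage engine of any specific database.

```python
# Relational style: relationships live in a separate join table.
people = {1: "Ann", 2: "Bob", 3: "Eve"}
friendships = [(1, 2), (1, 3), (2, 3)]  # (person_id, friend_id) rows


def friends_via_join(person_id):
    """Scan the whole join table to find the matching rows."""
    return sorted(people[f] for p, f in friendships if p == person_id)


# Graph style: each node directly holds its relationships.
graph = {
    "Ann": ["Bob", "Eve"],
    "Bob": ["Eve"],
    "Eve": [],
}


def friends_via_graph(name):
    """Follow the relationships stored on the node itself."""
    return sorted(graph[name])
```

Both functions return the same friends, but the first has to search the relationship table on every call, while the second reads the relationships straight from the node.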

References

  • https://www.mongodb.com/scale/nosql-database-comparison
  • https://blog.pandorafms.org/nosql-databases-the-definitive-guide/

Saturday, February 10, 2018

Software Quality Requirements



Here are some quality measures that should be considered in the development process of a software product:

Rigidity

Software is affected by rigidity if the design is difficult to change. The development team has serious problems changing a previous decision, so most choices are influenced by past decisions.

Fragility

The design is easy to break. Even if unit tests help in preserving the current behaviour, every future action is aimed at keeping certain components sheltered from changes.

Immobility

The design is difficult to reuse, and developers prefer to wrap functionality instead of using it directly. This increases the amount of code for a given feature and paves the way for the bad habit of working around solutions.

Viscosity

It is difficult to do the right thing: even when the design and responsibilities are clear and documented, the software is closed to modification.

Needless complexity

Design is a balance between openness and closure to change. TDD helps in the development of new features, and avoiding overdesign leads to simplicity.

Opacity

The software has unclear targets, design and aims. This confusion leads to wasted time for the development team, especially when documentation is poor.


These quality requirements should be monitored, and corrective actions, whether quick or slow, must be planned regardless of cost. There is no return on investment for software affected by these problems: a lack of love in producing the software is reflected in the trust and love of its customers.

Friday, February 9, 2018

Eureka (Netflix Registry)


Eureka is a Registry offered by Netflix that is based on a REST API. Eureka consists of two components: 
  • Eureka Server: a running instance of Eureka;
  • Eureka Client: a framework part of an application.

Eureka Client 

The Eureka Client registers with a Eureka Server and sends a heartbeat every 30 seconds to inform the server that the client instance is still alive.

Eureka Server 

The Eureka Client registers with the Eureka Server and, after fetching the registry information, stores it locally and refreshes it every 30 seconds.

  • If the Eureka Server does not receive a heartbeat within a configurable time, the Eureka Client instance is removed from the registry.
  • If the Eureka Server receives fewer heartbeats than expected, it goes into self-preservation mode: it stops expiring already registered instances.

Eureka is a registry that comes with different modes:
  1. Standalone
  2. Peer Aware

  • Eureka achieves high availability through synchronization among Eureka peers.
  • Eureka Servers are aware of other instances: every Eureka Server also acts as a Eureka Client and registers with its peers.
  • Eureka Servers replicate registry information to their peers.

Eureka Client and Availability Zone

  • By default, the Eureka Client uses servers in its own Availability Zone (a concept that matches the Availability Zone and Region of the AWS cloud platform).
  • The Eureka Client manages failover by using service URLs from other zones.


REACTIVE MANIFESTO, REACTIVE PROGRAMMING

REACTIVE MANIFESTO

The four principles of the Reactive Manifesto are:

Responsive

Responsiveness is the capability of a system to respond in a timely manner, which underpins usability and utility. It encourages rapid responses in order to provide a greater quality of service.

Resilient

Resilience is the capability of a system to stay responsive in the case of failure.

Resilience is achieved by isolating components and ensuring that their failure does not compromise the entire system.
Providing resilience means identifying the components that have business-critical functionality and assigning each component a score in your system.
The first-class concern in resilience is to identify and classify the functionalities that, if they do not perform well, make the entire system useless.

Elastic

Elasticity is the capability of a system to stay responsive when its workload changes.

An increase or decrease in workload must be handled so that every component still runs at the same quality level in terms of performance and reliability.

The core concept is to distribute the input of the entire system over many components. This capability is achieved by replicating components and avoiding bottlenecks.

Message Driven

A Message Driven system makes use of asynchronous messages to achieve:

  1. Decoupled components
  2. Defined system boundaries
  3. Enhanced flow control
  4. Components that collaborate without disturbing each other
  5. Location transparency

REACTIVE PROGRAMMING

Reactive programming is a paradigm in which a consumer reacts to data as it comes in. The practice of using asynchronous events to propagate changes paves the way for business operations to be executed in reaction to a notified event.

ReactiveX is a project that provides implementations of this concept for different programming languages.
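
The idea of a consumer reacting to data as it arrives can be sketched with a hand-written observable. This is not the ReactiveX API: the `Observable` class and its `subscribe`/`emit` methods are hypothetical, and delivery is synchronous here to keep the example short.

```python
class Observable:
    """Minimal push-based stream: subscribers react to each emitted item."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, on_next):
        """Register a callback to be invoked for every emitted item."""
        self._subscribers.append(on_next)

    def emit(self, item):
        """Push an item to all subscribers."""
        for on_next in self._subscribers:
            on_next(item)


orders = Observable()
shipped = []
# The business operation runs in reaction to each notified event.
orders.subscribe(lambda order: shipped.append(f"ship {order}"))
orders.emit("order-1")
orders.emit("order-2")
```

The consumer never polls for orders; it only declares what to do when one arrives.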

Thursday, February 8, 2018

GOOD ARTICLE ON JPA



This is a great article on JPA, and a good reference for this Java blog.

https://www.researchgate.net/publication/313263324_Review_on_ORM_based_JPA_implementations

Thanks for the consideration.

Lean Principles

Eliminate waste

Eliminating stress and wasted time is a continuous practice that has to be adopted by the entire development team, but above all it is a principle that has to be understood by the entire organization.
The most important activities are those that create valuable software.

Amplify learning

Software development is a learning process, so every opportunity to learn is a good one. Creating processes, using tools, listening to people and learning from experience are all valid ways to improve how an organization works and builds software.

Decide as late as possible

Making decisions in a project is normal and happens continuously. One of the agile principles is to postpone a decision as much as possible, not to procrastinate out of incompetence, but to collect as much information as possible and take conscious actions. What often happens in information technology is that decisions are the result of constraints and deadlines, and the opportunity to learn from conscious decisions is lost.

Deliver as fast as possible

Delivering quickly is important both for quick feedback and for reducing the work queue. Faster feedback from the outside helps to bring in new perspectives (the perspectives of the stakeholders).

Empower the team

Empowering a team means organizing work so that each team member feels they are doing a great job. A team member who is involved establishes focused and effective work.

Build integrity in

Dividing a work package into tasks is sometimes just a technical way of distributing and assigning work to team members. This can lead to building an unsupervised and incoherent whole.

See the whole

Because of stakeholders' feedback and any other product or project input, the entire picture may be different from what was expected.

Wednesday, February 7, 2018

RabbitMQ Exchange




Exchange 

RabbitMQ has the concept of Exchange: "messages are not published directly to a queue, instead, the producer sends messages to an exchange".
The Exchange is therefore responsible for routing the messages to different queues. It uses the routing key (a message attribute) to decide how to route the message to queues.


Routing Key

  • The Routing Key is a message attribute in the message header sent by the producer.
  • The Routing Key can be a list of words delimited by periods: robotics.us, robotics.eu.model.


Exchange Types


There are four types of exchange:

  1. Direct Exchange: a message goes to a queue if the binding key exactly matches the routing key of the message;
  2. Topic Exchange: a message is queued based on wildcard matches between the routing key and a routing pattern specified by the queue binding. Messages are routed to one or many queues. Each consumer declares which topics it is interested in with a binding that gives a routing pattern to the exchange;
  3. Fanout Exchange: a message is copied and routed to all queues that are bound to the exchange, regardless of routing keys or pattern matching;
  4. Headers Exchange: a message is routed based on arguments contained in the headers and optional values instead of routing keys.
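
The wildcard matching of a topic exchange can be sketched as a small function. It assumes the two wildcards documented by RabbitMQ, which the text above does not spell out: `*` stands for exactly one word and `#` for zero or more words. The function name `topic_matches` is invented for this example and is not part of any RabbitMQ client library.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a routing key matches a topic binding pattern."""

    def match(p, k):
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            # '*' consumes exactly one word; a literal must match exactly.
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))
```

With the routing keys used earlier, the pattern robotics.* matches robotics.us but not robotics.eu.model, while robotics.# matches both.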

References

https://www.rabbitmq.com/documentation.html

Tuesday, February 6, 2018

XII APP FACTORS

1. One codebase tracked in revision control, many deploys

Each app should have only one codebase, used for many potential deployments. Each deployment of an app may differ in environment properties and capabilities.

2. Explicitly declare and isolate dependencies

You should not rely on system tools. Dependencies have to be declared in a manifest so that any developer can install them via a package manager.
Managing an application's dependencies with isolation and explicit declaration means that you can deploy with a repeatable process.

3. Store config in the environment

Configuration, including credentials, has to be kept separate from code. All the code that does not change among environments is to be considered the invariant part of your application.
A good practice is to have configuration supplied by environment properties, or to fetch properties from a config server that provides them depending on the target environment.
Another good practice is to keep your configuration under version control.
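
A minimal sketch of reading configuration from the environment in Python is shown below. The variable names (DATABASE_URL, LOG_LEVEL, PORT) and their default values are assumptions made for the example, not a convention required by the twelve-factor methodology.

```python
import os


def load_config(env=os.environ):
    """Read settings from the environment, with defaults for local development."""
    return {
        "database_url": env.get("DATABASE_URL", "sqlite:///local.db"),
        "log_level": env.get("LOG_LEVEL", "INFO"),
        "port": int(env.get("PORT", "8080")),
    }
```

The same code then runs unchanged in every environment; only the values supplied by the environment differ. Passing a plain dictionary as `env` makes the function easy to test.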

4. Treat backing services as attached resources

A backing service is any service that your application uses to fulfill its functionality (another application service, a datastore, etc.).
Your application has to declare its need for a service and let the cloud bind the declaration to the actual values.

5. Strictly separate build and run stages

A codebase has to be transformed into a production deployment. This is done in three stages:
  1. build: the code repository is converted into a versioned artifact;
  2. release: the output of the build stage is combined with the environment configuration to produce an immutable artifact, which is pushed to the cloud environment;
  3. run: done by the cloud provider; a general pattern is to place your application in a container-based system.

6. Treat logs as event streams

Logs should be treated as a sequence of events emitted by an application in time order. This sequence can be collected and routed to one or more final destinations for viewing or long-term archival.

7. Execute the app as one or more stateless processes

Prefer stateless applications: applications that handle requests without assumptions about memory contents. If long-lived state is necessary, keep it in a backing service. This means that state should not be maintained in your application.

8. Export services via port binding

A cloud-native application is self-contained and does not rely on an external application server or container. In this way, via runtime port binding, it can act as a backing service for another application.

9. Concurrency

Scaling out in cloud-native applications should be done with a process model. Increasing the application size (CPU, RAM) to reach new capacity is an old model.

10. Disposability

Disposability is the capability of your application to start up and shut down quickly. This capability facilitates fast elastic scaling, rapid deployment and all the changes that can happen in a production environment.

11. Development, Staging, and Production as similar as possible

Keep the gap between development and production environment as small as possible.
Developers should write code so that the development team always works on solutions close to the production environment.

12. Run admin/management tasks as one-off processes

One-off admin processes should be run in an identical environment as the regular long-running processes of the app. They run against a release, using the same code and config as any process run against that release. Admin code must ship with application code to avoid synchronization issues.

References 

https://roots.io/twelve-factor-12-admin-processes/
https://12factor.net/admin-processes

Monday, February 5, 2018

MongoDB Shards and Replica Set


Data Model

MongoDB uses a flexible schema: it is a document store.


  • Document: a document is a BSON object that consists of an ordered list of elements; each element has a field name, a type and a value.
  • Collection: a group of documents, roughly equivalent to an RDBMS table.

Data References 

Two types of references:

  1. Links or references from one document to another
  2. Embedded Data

Atomicity

MongoDB supports write atomicity at the document level. With a denormalized structure a single write operation can insert or update a document, whereas a normalized structure requires splitting the operation into multiple write operations (not atomic as a whole).


Sharding


Sharding is a mechanism used by MongoDB to scale horizontally. Data are spread across the cluster based on the shard key.
A client of a sharded cluster uses mongos in order to query the cluster. Client applications do not connect directly to the shards.
In a sharded cluster all mongod instances should be able to connect to all other instances in the cluster.

In a sharded cluster consider the following elements:

  1. Shard instance: contains a subset of the sharded data.
  2. Mongos: a query router that decouples clients from the shard instances
  3. Config Server: a mongod instance that stores metadata and configuration (generally deployed as a replica set)

  • Mongos is able to direct a query to a specific shard if the query includes the shard key; otherwise a broadcast query is sent to every shard;
  • Sharding is a decision made at the collection level; it is possible to have a mix of sharded and unsharded collections: sharded collections are spread across the shards, while unsharded collections are stored on the primary shard (generally the shard holding the least data when the database is created)
  • Collections are divided into chunks based on the shard key and distributed across the shards in the cluster (default chunk size is 64 MB)
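
The difference between a targeted and a broadcast query can be sketched as follows. This is a simplified model of what a query router does, not MongoDB code: the chunk boundaries, the shard names and the `user_id` shard key are all hypothetical, and only equality queries are handled.

```python
import bisect

# Hypothetical chunk layout for a collection sharded on "user_id":
# each chunk covers [lower bound, next lower bound) and lives on one shard.
CHUNK_LOWER_BOUNDS = [0, 1000, 2000]
CHUNK_SHARDS = ["shard-a", "shard-b", "shard-c"]


def target_shards(query: dict, shard_key: str = "user_id") -> list:
    """Return the shards a router would contact for an equality query."""
    if shard_key not in query:
        # No shard key in the query: broadcast to every shard.
        return sorted(set(CHUNK_SHARDS))
    # Find the chunk whose range contains the shard key value.
    index = bisect.bisect_right(CHUNK_LOWER_BOUNDS, query[shard_key]) - 1
    return [CHUNK_SHARDS[max(index, 0)]]
```

A query on `user_id` touches a single shard, while a query on any other field has to be sent to all of them, which is why the choice of shard key matters for read performance.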


 Replica Set


MongoDB supports redundancy and availability via Replica Set: a group of mongod instances with the same data set.

In a group of mongod instances there are two types of node: primary and secondary.

  • A primary node receives all write operations and records the changes to its data set in the oplog.
  • A secondary node reads the primary's oplog and applies the operations to its own data set. If the primary node becomes unavailable, a primary election runs.
  • An extra mongod instance (arbiter) can be used to maintain a quorum in a replica set during elections
Read and write operations are managed by the primary node, but it is possible to specify read preferences in order to query a secondary node directly, giving up consistency.






Thursday, February 1, 2018

KAFKA PARTITIONS

TOPIC

A topic log consists of many partitions, which can be spread over multiple Kafka nodes. Kafka distributes topic partitions across the nodes of a cluster for high performance and horizontal scalability.

Partitions

A topic partition is the unit of parallelism in Kafka.

Producers

  • Producers can write to different partitions fully in parallel. A producer does not have to wait for acknowledgements from Kafka and can send messages as fast as Kafka can handle them.
  • Kafka appends records to the end of a topic log. The producer chooses the partition for a record via round-robin or based on the record's key


Consumers

  • Consumers can use Consumer Groups
  • A Consumer Group consists of many consumer threads; each partition is consumed by a single consumer thread of the group.
  • So the degree of parallelism within a consumer group is bounded by the number of partitions.
  • Therefore, a higher number of partitions increases the achievable throughput.
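
The two partition-selection strategies mentioned for producers can be sketched like this. It is an illustration only: the `Partitioner` class is hypothetical and uses CRC32 as a stand-in stable hash, which is an assumption of the sketch rather than the hash function Kafka's own client uses.

```python
import itertools
import zlib


class Partitioner:
    """Choose a partition by key hash, or round-robin when there is no key."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition_for(self, key=None) -> int:
        if key is None:
            # No key: spread records evenly over the partitions.
            return next(self._round_robin)
        # Stable hash: the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions
```

Keyed records keep their relative order because they always land in the same partition, while unkeyed records are balanced across all partitions.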


References