Clusterpoint’s unique hybrid indexation technology indexes all data types to provide ultra fast answers to the complex questions you always wanted to ask.
- schema-free XML database server that also supports JSON
- transparent MapReduce for cluster-wide database operations
- advanced enterprise search engine software
- full indexation across structured and unstructured data
- unique database ranking facility for relevant Big Data search
- combined Full-Text and SQL-like search through a single API query
- masterless clustering technology with linear scale out ability
- integrated replication and fault tolerance
- load balancing for failover and for distributed demand
- milliseconds-latency as a critical feature for web & mobile apps
Clusterpoint also brings simplicity, cost-efficiency and ease of use:
- Wide software language support eases adoption into the enterprise
- Schema free XML model handles loosely typed variable data with ease
- Delivers multi-tenancy of many databases on the same infrastructure
- Cluster architecture scales out elastically on-premise or in the cloud
- Use of commodity hardware maximizes performance & minimizes TCO
- Web GUI provides cluster-wide centralized management of operations
CLUSTERPOINT DATABASE – The Product Components
Key Clusterpoint features include:
- Loads XML, JSON and delimited data
- More than 1 billion records per day on a commodity server
- Auto-sharding distributes data across the cluster
- Index all data types in real-time
- Manage indexing policies with a single control file
- Specify ranking policies in same control file
- Use rich Google-like search – including full text and geo-spatial
- Ensure optimum relevance in results by using the ranking index
- Integrate search through a rich API (that also supports query)
- Filter using SQL-like predicates and constructs on all fields
- Embed Google-like rich text search to make queries more powerful
- Integrate query through a rich API (that also supports search)
- Perform powerful roll-ups based on selected key fields
- Use rich text search as filters and keys for aggregations
- Alert on critical events and trigger downstream business processes
- Define alert filters that combine search and query
- Flexible options for triggering alerts on load or on timer basis
Clusterpoint is a NoSQL database
It is written in C++ for maximum performance and runs at near metal-frame speed. Unlike traditional relational databases, you store information in documents rather than in rows and columns.
- Simplicity – Some information should not and cannot be broken down into rows and columns or normalized into a bunch of tables. That information is usually text based or managed as an object. Web pages, emails, notes, tweets etc. are good examples. Clusterpoint’s XML based document model is ideally suited to this type of information.
- Agility – Indexing and ranking everything (especially text content) is not what relational databases were designed for. The complexity of the schema and associated indexing strategy quickly gets out of hand. Clusterpoint lets you load information ‘as is’ and indexes everything. This is especially efficient for managing information with poorly defined, poorly followed, changing, or even unknown schemas.
- Results – Sometimes the answers you need come from a combination of structured access (SQL-like query) and unstructured access (Google-like search). Clusterpoint is structure-aware and lets you combine these in a single call through one API. This results in faster answers, fewer calls to the database and simpler applications.
Using Clusterpoint, organizations are delivering powerful data hubs, big data search solutions and mobile apps that leverage our key benefits…
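As a rough illustration of combining Google-like search with a SQL-like predicate in one call, here is a toy in-memory sketch. It is not Clusterpoint's API: the function `search_and_query`, the sample documents and their field names are all hypothetical, and a real deployment would answer this from the index rather than by scanning.

```python
# Toy illustration only: NOT Clusterpoint's API. All names are hypothetical.
documents = [
    {"id": 1, "author": "Ann", "year": 2013,
     "body": "Cluster databases scale out on commodity hardware"},
    {"id": 2, "author": "Bob", "year": 2011,
     "body": "Relational schema design and normalization"},
    {"id": 3, "author": "Ann", "year": 2010,
     "body": "Scale out search for big data"},
]


def search_and_query(docs, text_terms, predicate):
    """Return docs whose body contains every term AND that satisfy the predicate."""
    results = []
    for doc in docs:
        # Unstructured part: simple whole-word, case-insensitive text match.
        words = set(doc["body"].lower().split())
        text_match = all(term.lower() in words for term in text_terms)
        # Structured part: an arbitrary predicate over the document's fields.
        if text_match and predicate(doc):
            results.append(doc)
    return results


# One call combines the text search ("scale out") with a structured filter.
hits = search_and_query(documents, ["scale", "out"], lambda d: d["year"] >= 2012)
print([d["id"] for d in hits])  # prints [1]
```

Document 3 also mentions "scale out" but is excluded by the year filter, which is the point of evaluating both conditions in one request instead of two.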
|Unified Data|Reduce development, integration and maintenance costs with faster deployment of functionally rich but less complex apps. Management of your structured and unstructured data in one database with combined Google-like search and SQL-like query.|
|Ultra-Fast|Deliver sub-second interactive response times from hundreds of terabytes of mixed data with fine-grained relevance control. Hybrid indexation across data types delivers answers at the speed of thought and guarantees relevance in your results.|
|Flexible|Increase agility by delivering apps that deal with variety in data types and volatility in data feeds without breaking sweat. Single schema-free solution for all data types that integrates easily into your infrastructure and application stacks.|
|Scalable|Deploy data intensive applications faster and run them with predictably scalable performance and predictable costs. Shared nothing model with auto-sharding scales elastically on commodity hardware either on-premise or in the cloud.|
Clusterpoint is a shared nothing NoSQL database that stores information in XML document format. Clusterpoint uses hybrid indexation to unify management of structured and unstructured data in a single database. Data is stored and retrieved through a rich API which supports a full range of CRUD operations and provides both Google-like search and SQL-like query.
Clusterpoint is a full function enterprise database platform written in C++ to run at near metal-frame speed. It can be implemented in a three-tier architecture using web servers to serve content to web and mobile devices. It can also be implemented in a two-tier mode where clients communicate directly with the database platform itself.
Clusterpoint’s primary API is through XML which makes it accessible from any platform using any programming language. Additional libraries provide API support for Java, Python, PHP, C, C++ and .NET (C#, VB and ASP).
Clusterpoint Logical Model
A Clusterpoint database comprises one or more stores. Each store contains a repository and its associated index. The repository contains all data objects and is written back to disk. The index contains information for all of the fields that you have asked to be indexed (regardless of data type). The index is written to disk for persistency but is partially cached in memory for performance.
Clusterpoint uses hybrid indexation to build the index. This means that there is only ever one index per repository and that index contains all the information required to index structured and unstructured data i.e. all data types including text.
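The idea of a single index covering both text and structured fields can be sketched with a toy inverted index. This is only an assumption about the general technique, not a description of Clusterpoint's internal index format; `build_index`, `lookup` and the sample fields are hypothetical names.

```python
from collections import defaultdict

# Toy sketch of one index holding entries for every data type.
# This does NOT reflect Clusterpoint's actual index structures.
docs = [
    {"id": 1, "author": "Ann", "year": 2013, "body": "cluster search at scale"},
    {"id": 2, "author": "Bob", "year": 2013, "body": "full text search basics"},
    {"id": 3, "author": "Ann", "year": 2010, "body": "schema design notes"},
]


def build_index(documents, text_fields, value_fields):
    """Build one posting map keyed by (kind, field, term_or_value)."""
    index = defaultdict(set)
    for doc in documents:
        for field in text_fields:
            # Text fields are tokenized; each word gets its own posting list.
            for token in doc.get(field, "").lower().split():
                index[("text", field, token)].add(doc["id"])
        for field in value_fields:
            # Structured fields are indexed by their exact value.
            if field in doc:
                index[("value", field, doc[field])].add(doc["id"])
    return index


def lookup(index, kind, field, key):
    """Return the set of document ids for one index entry (empty if absent)."""
    return index.get((kind, field, key), set())


index = build_index(docs, text_fields=["body"], value_fields=["author", "year"])

# A text condition and a structured condition are answered from the same index.
matches = lookup(index, "text", "body", "search") & lookup(index, "value", "author", "Ann")
print(sorted(matches))  # prints [1]
```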
Clusterpoint Physical Model
Each Clusterpoint store is a collection of XML documents. A store can be sharded, with each shard containing a subset of the documents in that store. Shards are then distributed across nodes, with sharding performed either by key or using a hashing algorithm.
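The two sharding approaches can be sketched as follows. The source does not say which hash function or key scheme Clusterpoint uses, so SHA-256 and the key-range boundaries below are assumptions chosen only for illustration, and both function names are hypothetical.

```python
import bisect
import hashlib


def shard_by_hash(doc_id, num_shards):
    """Hash-based sharding: map a document id to a shard number.

    SHA-256 is an assumed choice; any stable hash would serve the sketch.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_by_key(key, boundaries):
    """Key-based sharding: map a key to a shard using sorted range boundaries.

    With boundaries ["m"], keys sorting before "m" go to shard 0
    and all other keys go to shard 1.
    """
    return bisect.bisect_right(boundaries, key.lower())


# The same id always lands on the same shard.
print(shard_by_hash("article-42", 2) == shard_by_hash("article-42", 2))  # prints True
print(shard_by_key("apple", ["m"]), shard_by_key("zebra", ["m"]))  # prints 0 1
```

Hashing tends to spread documents evenly without any knowledge of the keys, while key ranges keep related keys together on one shard.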
Clusterpoint uses a shared-nothing architecture which scales out on either physical servers or virtual machines. The same Clusterpoint software image is installed on each node so the cluster operates in multi-master mode i.e. all nodes are equal and can handle all operations.
If we consider an example of a news feed database that includes news Articles along with Author profiles, there are two distinct data components: Articles and Authors. This database would probably be implemented as two separate stores: one for Articles and one for Authors. The diagram below shows how that database might roll out onto a three node cluster with each store replicated once.
In the diagram, the Articles store contains a high number of records and has been split into two shards. The primary copies of these shards are hosted on nodes X and Y. On the other hand the Authors store is implemented as a single shard with the primary copy hosted on node X.
Each store can be replicated a number of times, with the replication level set at the store level. In the diagram both the Articles and Authors stores have been replicated once. Accordingly the replica copies of the Articles shards are hosted on nodes Y and Z while the replica copy of the Authors shard is hosted on node Z.
Replica copies are automatically synchronized and maintained so replication is transparent to the application. Replicas are active so searches and queries will automatically exploit the additional copies for performance. In fact the decision to replicate is sometimes taken to boost performance as well as to protect against hardware failure. Replication can also be performed across datacenters, an option typically used either to boost performance for cross-geo applications or for pure business continuity.
Clusterpoint provides a high degree of flexibility, enabling customers to build out the architecture that best fits specific application requirements. Options include:
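The three-node layout described above can be written down as a small data structure. The pairing of each Articles primary with a particular replica node is an assumption consistent with the text (primaries on X and Y, replicas on Y and Z); the shard numbers and helper functions are hypothetical.

```python
# Assumed placement for the example cluster. Each entry lists the node holding
# the primary copy first, followed by the nodes holding replicas.
placement = {
    ("Articles", 1): ["X", "Y"],  # assumed pairing: primary on X, replica on Y
    ("Articles", 2): ["Y", "Z"],  # assumed pairing: primary on Y, replica on Z
    ("Authors", 1): ["X", "Z"],   # primary on X, replica on Z
}


def copies_on(node):
    """List every shard that has a primary or replica copy on the given node."""
    return sorted(shard for shard, nodes in placement.items() if node in nodes)


def survives_failure(failed_node):
    """True if every shard still has at least one copy on another node."""
    return all(
        any(node != failed_node for node in nodes)
        for nodes in placement.values()
    )


for node in ("X", "Y", "Z"):
    print(node, copies_on(node), survives_failure(node))
```

With one replica per store, the loss of any single node still leaves a copy of every shard on the remaining two nodes.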
- one or more databases per cluster
- one or more stores per database
- one or more shards per store
- one or more replicas per store
- any shard on any node
The separation between logical and physical database is key. All applications interact at the logical level and don’t need to be aware of sharding, replication or the subsequent distribution of shards across nodes. Any need to alter sharding, replication or shard distribution across the nodes does not require any application level change.
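A minimal sketch of this separation, assuming a hypothetical `LogicalStore` class that stands in for a store: the application only calls `put` and `get`, while the number of shards can change underneath it. This illustrates the principle only and is not how Clusterpoint implements resharding.

```python
import hashlib


class LogicalStore:
    """Hypothetical stand-in for a store that hides its shard layout."""

    def __init__(self, num_shards):
        self._shards = [dict() for _ in range(num_shards)]

    @staticmethod
    def _shard_index(doc_id, num_shards):
        # Assumed hash-based routing; callers never see this detail.
        digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % num_shards

    def put(self, doc_id, doc):
        shard = self._shards[self._shard_index(doc_id, len(self._shards))]
        shard[doc_id] = doc

    def get(self, doc_id):
        shard = self._shards[self._shard_index(doc_id, len(self._shards))]
        return shard.get(doc_id)

    def reshard(self, num_shards):
        """Redistribute all documents over a new number of shards."""
        new_shards = [dict() for _ in range(num_shards)]
        for shard in self._shards:
            for doc_id, doc in shard.items():
                new_shards[self._shard_index(doc_id, num_shards)][doc_id] = doc
        self._shards = new_shards


# Application code uses only the logical interface.
store = LogicalStore(num_shards=2)
store.put("article-1", {"title": "Hello"})
before = store.get("article-1")

store.reshard(4)  # physical change; the calls above and below are unchanged
after = store.get("article-1")
print(before == after)  # prints True
```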