NoSQL database systems are often highly optimized for retrieval and append operations, and many offer little functionality beyond record storage (e.g., key–value stores). The reduced run-time flexibility compared with full SQL systems is compensated by marked gains in scalability and performance for certain data models.
In short, NoSQL database management systems are useful when working with a huge quantity of data whose nature does not require a relational model. The data can be structured, but NoSQL is used when what really matters is the ability to store and retrieve great quantities of data, not the relationships between the elements.
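As a rough illustration of the key–value model mentioned above, here is a minimal Python sketch. The `KeyValueStore` class is hypothetical and is not any real product's API. It supports only put, append, and get by key, with no schema, query language, or relationships between records, which is the trade-off the quote describes.

```python
from collections import defaultdict
from typing import Any, Optional


class KeyValueStore:
    """Hypothetical minimal key-value store: opaque values looked up by key."""

    def __init__(self) -> None:
        # Each key maps to a list so that appends are cheap (amortized O(1)).
        self._data: dict[str, list[Any]] = defaultdict(list)

    def put(self, key: str, value: Any) -> None:
        # Replace whatever was stored under the key.
        self._data[key] = [value]

    def append(self, key: str, value: Any) -> None:
        # Add a value to the key's record without reading the record first.
        self._data[key].append(value)

    def get(self, key: str) -> Optional[list[Any]]:
        # Direct lookup by key only: no joins, no secondary indexes.
        return list(self._data[key]) if key in self._data else None


store = KeyValueStore()
store.put("user:42", {"name": "Ana"})
store.append("log:2012-01-01", "login user:42")
store.append("log:2012-01-01", "logout user:42")
print(store.get("log:2012-01-01"))  # ['login user:42', 'logout user:42']
print(store.get("missing"))  # None
```

Everything the store "knows" is the key; any relationship between `user:42` and the log entries exists only in the application's interpretation of the strings.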
- Do we always need distributed storage (such as HDFS in the case of Hadoop) underneath a NoSQL database (such as HBase)?
At large scale, it is better to use distributed storage with NoSQL. At small scale, however, an ordinary storage system is enough.
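Part of what distributed storage buys at large scale is that records can be spread over many machines by key. The Python sketch below assumes simple hash partitioning over a fixed list of hypothetical node names; it only illustrates the general idea. Real systems use their own placement schemes (HBase, for instance, splits tables into contiguous row-key ranges called regions).

```python
import hashlib


def node_for_key(key: str, nodes: list[str]) -> str:
    """Pick a storage node for a key by hashing it (hypothetical scheme)."""
    # Use a stable hash (MD5) rather than Python's built-in hash(),
    # which is randomized per process for strings.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def partition(keys: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Group keys by the node that would store them."""
    placement: dict[str, list[str]] = {node: [] for node in nodes}
    for key in keys:
        placement[node_for_key(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c"]  # hypothetical cluster
    user_keys = [f"user:{i}" for i in range(10)]
    for node, assigned in partition(user_keys, cluster).items():
        print(node, assigned)
```

With a single node, every key lands in the same place, which mirrors the point that a small deployment can simply use ordinary storage. Plain modulo placement also moves most keys when the node list changes, which is why production systems tend to prefer range splits or consistent hashing.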
- I find it hard to understand the typical big data architecture, especially for unstructured data.
Unstructured Data (or unstructured information) refers to information that either does not have a pre-defined data model and/or does not fit well into relational tables. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional computer programs as compared to data stored in fielded form in databases or annotated (semantically tagged) in documents.
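The point that unstructured text still contains dates and numbers, just not in fielded form, can be illustrated with a small Python sketch that pulls ISO-style dates and plain numbers out of free text using regular expressions. The patterns and the `extract_fields` helper are hypothetical and deliberately naive; they show why this kind of extraction is fragile compared with reading a typed column.

```python
import re

# Naive, hypothetical patterns: ISO dates (YYYY-MM-DD) and integers/decimals.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
NUMBER_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\b")


def extract_fields(text: str) -> dict[str, list[str]]:
    """Recover a few 'fields' from free text that has no predefined data model."""
    dates = DATE_PATTERN.findall(text)
    # Blank out the dates first so their digits are not also counted as numbers.
    remainder = DATE_PATTERN.sub(" ", text)
    numbers = NUMBER_PATTERN.findall(remainder)
    return {"dates": dates, "numbers": numbers}


note = "Order shipped on 2012-03-05: 3 boxes, total weight 12.5 kg."
print(extract_fields(note))
# {'dates': ['2012-03-05'], 'numbers': ['3', '12.5']}
```

A date written as "03/05/2012" (March 5 or May 3?) is missed by the date pattern and its parts are counted as three separate numbers, which is exactly the kind of ambiguity the quote describes.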
In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. With this difficulty, new platforms of "big data" tools are being developed to handle various aspects of large quantities of data.
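One common way such tools process data that does not fit on one machine is the MapReduce programming model, described by Google and implemented in open source by Hadoop. The following single-process Python sketch only mimics its map, shuffle, and reduce phases over a list of text chunks; it is not Hadoop's API, and the chunk list merely stands in for input splits that would live on a distributed file system such as HDFS.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(chunk: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input split.
    for word in chunk.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values collected for one key.
    return key, sum(values)


def word_count(chunks: Iterable[str]) -> dict[str, int]:
    # On a real cluster, each chunk would be mapped on a different machine.
    intermediate = (pair for chunk in chunks for pair in map_phase(chunk))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["big data is big", "data about data"]))
# {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

Because the map step looks at one chunk at a time and the reduce step looks at one key at a time, both can be spread across many machines; only the shuffle needs to move data between them.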
To understand big data architecture, especially for unstructured data, look at how the giants work with big data.
For example, Google
For example, IBM