@Sarah
2015-12-22T19:51:04.000000Z
NOSQL
Not Only SQL
1 Cassandra
2 Neo4j
3 data locality: at the heart of every data store
Only normalized tables are accepted
High normalization leads to low performance
Limited scalability
Slow
Needs to be consistent
On Facebook, adding a # (hashtag) links to a specific topic: built with NoSQL
Storing movies
social media network
agile development
SaaS [NoSQL] - DynamoDB (by Amazon)
# NoSQL characteristics
- schema free
- no joins
- map-reduce style programming
- generally prefers availability over consistency
c consistency
a availability
p partition tolerance (cannot be skipped)
In theoretical computer science, the CAP theorem (also known as Brewer's theorem) states that a distributed computing system cannot simultaneously provide all three of the following guarantees:
- Consistency: every node sees the same, most recent copy of the data
- Availability: data updates remain highly available
- Partition tolerance: in practice, a partition amounts to a time limit on communication; if the system cannot reach consistency within that limit, a partition has occurred and the current operation must choose between C and A
According to the theorem, a distributed system can satisfy at most two of the three properties. The simplest way to understand CAP is to imagine two nodes on opposite sides of a partition. Allowing at least one node to update its state makes the data inconsistent, losing C. Making the node on one side unavailable to preserve consistency loses A. Only if the two nodes can communicate can both C and A be preserved, which gives up P.
RDBMS: C + P, no A
NoSQL: A + P, no C
B basically available: the data stays available even when it is not consistent
S soft state: the state is not guaranteed to be consistent at all times
E eventual consistency
timestamps
A timestamp is a sequence of characters or encoded information identifying when a certain event occurred, usually giving date and time of day, sometimes accurate to a small fraction of a second. The term derives from rubber stamps used in offices to stamp the current date, and sometimes time, in ink on paper documents, to record when the document was received. Common examples of this type of timestamp are a postmark on a letter or the "in" and "out" times on a time card.
optimistic locking
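A minimal sketch of optimistic locking in the mongo shell, assuming a hypothetical orders collection that carries a version field:
// read the document and remember the version we saw
var doc = db.orders.findOne({_id: 1});
// write back only if nobody bumped the version in the meantime
var res = db.orders.update(
  {_id: 1, version: doc.version},
  {$set: {status: "shipped"}, $inc: {version: 1}}
);
// res.nModified == 0 means another writer won the race, so re-read and retry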
Document - JSON - BSON (JSON in binary format)
Document Database
A row with name = sue and age = 26 stored as a document:
{
  name: "sue",
  age: 26,
  status: "A",
  groups: ["news", "sports"]
}
key-value
JSON examples:
{
  "subjects": ["Maths", "English"]
}
{
  "address": {
    "staddress": "Sec.12A",
    "city": "Dallas",
    "state": "Texas",
    "zipcode": "13409"
  }
}
{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}
--storageEngine = inMemory | mmapv1
MongoDB vs. SQL terms:
- a database contains collections (≈ tables)
- a document ≈ a row
- a field (key-value pair) ≈ a column
show dbs
use test
db.restaurants.find()
use test
db.createCollection("restaurants")
db.restaurants.insert(
  {
    "address": {
      "street": "2 Avenue",
      "zipcode": "10075",
      "building": "1480",
      "coord": [ -73.9557413, 40.7720266 ]
    },
    "borough": "Manhattan",
    "cuisine": "Italian",
    "grades": [
      { "date": ISODate("2014-10-01T00:00:00Z"), "grade": "A", "score": 11 },
      { "date": ISODate("2014-01-16T00:00:00Z"), "grade": "B", "score": 17 }
    ],
    "name": "Vella",
    "restaurant_id": "41704620"
  }
)
https://raw.githubusercontent.com/mongodb/docs-assets/primer-dataset/dataset.json
ayush.json: the dataset file, in JSON format
run the import from the server/3.2/bin folder
--file dataset.json
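A sketch of the import command, assuming the dataset was saved as dataset.json next to the binaries in server/3.2/bin:
mongoimport --db test --collection restaurants --drop --file dataset.json
The --drop flag empties the restaurants collection before loading the file.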
cursors
cursor batches: cursor.next()
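For example, iterating a cursor in the shell (the users query is just an illustration); the shell fetches documents from the server in batches behind the scenes:
var cursor = db.users.find({age: {$gt: 18}});
while (cursor.hasNext()) {
  printjson(cursor.next());   // pulls the next document, fetching a new batch when needed
}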
db.users.find({age:{$gt:18}}).sort({age:1})
// $gt: greater than
collection + query criteria + modifiers (projection, sort)
db.restaurants.findOne({"borough":"Manhattan"},{name:1,_id:0});
db.restaurants.find({"grade.grade":"A"})(name:1,_id)
ObjectId: a unique 12-byte value. Advantage: no separate primary key is needed.
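The first 4 bytes of an ObjectId are a creation timestamp, which is also why _id can be used to sort documents roughly by insertion time:
var id = ObjectId();                     // e.g. ObjectId("567a1f0c...")
id.getTimestamp();                       // returns the embedded creation time as an ISODate
db.restaurants.find().sort({_id: 1});    // oldest documents first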
data modelling: either approach works
- embedded
- reference
db.users.update(
  {age: {$gt: 18}},
  {$set: {status: "A"}},
  {multi: true}
)
db.users.update(
{"age":{$gt:25}},{$set:{status:"C"}},{multi:true})
AWS S3
shift + } : go back to the previous step
ayush.gaur@utdallas.edu
can be used to sort the data
http://blog.csdn.net/youmoo/article/details/8898662
NoSQL database types:
- document: MongoDB
- key-value: Redis
- column: Cassandra, HBase
- graph: Neo4j
SQL limitation: horizontal scaling
MongoDB storage engines:
1 WiredTiger
2 In-Memory
3 Encrypted
4 MMAPv1
_id: the primary key in MongoDB
Limitation of embedding: as data is added, the single document keeps growing, because it is stored as one unit and cannot be split into parts.
Which structure to use depends on the relationship type (see the sketch below):
1:M: use embedded documents
M:M: use references (relational style). Why not embedding? Because each related document would be embedded again and again, duplicating data;
with references you can store something like a foreign key.
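A small sketch of the two styles, with made-up user and group documents:
// embedded (1:M): the addresses live inside the user document
{ _id: "sue", name: "sue",
  addresses: [ {city: "Dallas"}, {city: "Austin"} ] }
// referenced (M:M): each group stores the users' _id values, like foreign keys
{ _id: "sue", name: "sue" }
{ _id: "news", members: ["sue", "john"] }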
db.createCollection()
db.collection_name.find({$or: [{key1: "v1"}, {key2: "v2"}]})
db.restaurants.find({$or: [{"b": "M"}, {"c": "A"}]})
secondary indexes
e.g. an index on an embedded field such as name.FName
range-based query operations
B+ tree: the efficient index used by SQL databases; supports range-based queries
Hash based: faster index lookups
db.restaurants.find({"cusine";"Bakery"},{_id:0,cuisine:1})
db.restaurants.createIndex({"address.zipcode": 1})
not the _id (a secondary index is on a field other than _id)
db.factory.createIndex({ttt:1})
{unique:true}
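For example, a unique index passes that option as the second argument of createIndex (the email field is hypothetical):
db.users.createIndex({email: 1}, {unique: true});   // inserts with a duplicate email are rejected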
db.inventory.find({type: "food", item: /^c/})
Overcoming server failure (replication):
primary: the master node
secondary: the slave nodes
arbiter: helps decide which secondary becomes the primary when the primary fails
when the primary fails, the secondaries vote on which one becomes the new primary
all the write requests are only sent to the primary node
read requests also go to the primary by default, which makes the reads consistent
the data is synchronized across the different servers
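A minimal replica-set sketch in the shell; the set name rs0 and the host names are assumptions:
rs.initiate({
  _id: "rs0",
  members: [
    {_id: 0, host: "mongo1:27017"},                      // primary candidate
    {_id: 1, host: "mongo2:27017"},                      // secondary
    {_id: 2, host: "mongo3:27017", arbiterOnly: true}    // arbiter: votes but holds no data
  ]
});
rs.status();   // shows which member is currently the primary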
sharding: each piece of data is local to a particular machine, which increases data locality
shard key: can be a single field or a compound key, but it must be an indexed key
- range-based partitioning
- hash-based partitioning: the shard-key values are passed through a hash function that returns integers, and documents are placed by those hash values
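A sketch of turning on hash-based sharding for a collection (database and collection names are illustrative):
sh.enableSharding("test");                           // allow collections in the test database to be sharded
sh.shardCollection("test.users", {_id: "hashed"});   // hash the shard key so documents spread evenly across shards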
Hadoop:
- HDFS: storage
- MR (MapReduce): processing
mapper: converts the data
the code flows to the data (move the computation to where the data is)
reducer
input to the map phase, e.g. the record (0, "this is mongo class"): the key is the offset in the file, the value is the line of text
the mapper emits every word in the value as a key with the count 1 (not 0)
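A word-count sketch using MongoDB's built-in mapReduce, assuming a hypothetical lines collection whose documents have a text field:
db.lines.mapReduce(
  function () {                                       // mapper: emit (word, 1) for every word in the line
    this.text.split(" ").forEach(function (w) { emit(w, 1); });
  },
  function (key, values) {                            // reducer: add up the 1s for one word
    return Array.sum(values);
  },
  {out: "word_counts"}
);
db.word_counts.find();   // e.g. { _id: "mongo", value: 1 }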