Understanding NoSQL

1.What is NoSQL

    Agenda

        Common Traits

        Consistency

        Indexing

        Queries

        MapReduce

        Sharding


    NoSQL Common Traits


    Non-relational

    Non-schematized/schema-free

    Eventual Consistency

    Open source

    Distributed

    "Web scale"

    Developed at big internet companies


    Consistency


    CAP Theorem

        Databases may only excel at two of the following three attributes:

        consistency, availability and partition tolerance

    NoSQL does not offer 'ACID' guarantees

        Atomicity, consistency, isolation and durability

    Instead offers 'eventual consistency'

        Similar to DNS propagation
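
As a rough illustration of eventual consistency, the sketch below simulates a write that is acknowledged by one replica and only later copied to the others. The Cluster and Replica classes and the sample hostname are hypothetical names invented for this example; real systems replicate asynchronously over a network, which this toy model only imitates with an explicit propagate() call.

```python
class Replica:
    """One copy of the data; it only knows the writes it has received."""

    def __init__(self):
        self.data = {}


class Cluster:
    """Toy cluster: writes land on replica 0 and spread to the rest later."""

    def __init__(self, replica_count):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # writes not yet copied to the other replicas

    def write(self, key, value):
        # The write is acknowledged once the first replica has it.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # A read may hit any replica, including one that is behind.
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Stand-in for background replication: apply pending writes in order.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


cluster = Cluster(replica_count=3)
cluster.write("www.example.com", "10.0.0.2")
stale = cluster.read("www.example.com", replica_index=2)  # replica 2 is behind
cluster.propagate()
fresh = cluster.read("www.example.com", replica_index=2)  # now caught up
```

Until propagate() runs, different replicas can answer the same read differently, much as DNS resolvers can return an old address for a while after a record changes.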


    Indexing


        Most NoSQL databases are indexed by key

        Some allow so-called 'secondary' indexes

        Often the primary key indexes are clustered

        HBase uses the Hadoop Distributed File System (HDFS), which is append-only

            Writes are logged

            Logged writes are batched

            File is re-created and sorted
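
The three steps above (log the write, batch the logged writes, re-create a sorted file) can be sketched in a few lines. This is a deliberately simplified model, not HBase's actual implementation; the AppendOnlyStore class and its batch_size parameter are assumptions made for illustration.

```python
class AppendOnlyStore:
    """Toy store: writes go to a log, and full batches are merged into a sorted file."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.log = []          # logged writes, in arrival order
        self.sorted_file = []  # (key, value) pairs sorted by key, never edited in place

    def put(self, key, value):
        # Writes are only appended; nothing is updated in place.
        self.log.append((key, value))
        if len(self.log) >= self.batch_size:
            self.flush()

    def flush(self):
        # Merge the batch with the old file and write a brand-new sorted file.
        merged = dict(self.sorted_file)
        merged.update(self.log)  # later writes win over earlier ones
        self.sorted_file = sorted(merged.items())
        self.log = []

    def get(self, key):
        # The newest value may still be in the log, so check it first.
        for logged_key, value in reversed(self.log):
            if logged_key == key:
                return value
        return dict(self.sorted_file).get(key)


store = AppendOnlyStore(batch_size=3)
store.put("c", 1)
store.put("a", 2)
store.put("b", 3)   # third write fills the batch and triggers a flush
store.put("a", 9)   # newer value sits in the log until the next flush
```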


    Queries


        Typically no query language

        Instead,create procedural program

        Sometimes SQL is supported

        Sometimes MapReduce code is used ...


    MapReduce


        Map step:split the query up

        Reduce step:merge the results

        Most typical of Hadoop and used with Wide Column Stores, esp. HBase

        Amazon Web Services' Elastic MapReduce (EMR) can read/write DynamoDB, S3, Relational Database Service (RDS)

        "Hive" offers HiveQL, a SQL-like abstraction over MR

            Use with Hive tables

            Use with HBase
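
A minimal in-process sketch of the map and reduce steps, using the classic word-count task. It runs on one machine and only imitates the shape of a MapReduce job; the function names and the sample chunks are made up for this example, and a real Hadoop job would distribute the chunks across nodes.

```python
from collections import defaultdict


def map_step(chunk):
    """Map: turn one chunk of input into (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the mapped values by key so each key can be reduced on its own."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(key, values):
    """Reduce: merge all values for one key into a single result."""
    return key, sum(values)


# Each chunk stands in for a split of the input handled by a separate worker.
chunks = ["NoSQL scales out", "SQL scales up", "NoSQL and SQL"]
mapped = [pair for chunk in chunks for pair in map_step(chunk)]
counts = dict(reduce_step(key, values) for key, values in shuffle(mapped).items())
```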


    Sharding


        A partition pattern where separate servers store partitions

        Fan-out queries supported

        Partitions may be duplicated, so replication is also provided

            Good for disaster recovery

        Since 'shards' can be geographically distributed, sharding can act like a CDN

        Good for keeping data close to processing

            Reduces network traffic when MapReduce splitting takes place
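
The sketch below shows one common way to shard: hash the key to pick a partition, and answer non-key queries by fanning out to every shard. ShardedStore is a hypothetical class, each "server" is just an in-memory dictionary, and hash-based placement is an assumption (the notes above do not say how keys are assigned to shards).

```python
import hashlib


class ShardedStore:
    """Toy sharded store: each shard stands in for a separate server."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash keeps a given key on the same shard across runs.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, record):
        self._shard_for(key)[key] = record

    def get(self, key):
        # A lookup by key touches exactly one shard.
        return self._shard_for(key).get(key)

    def fan_out(self, predicate):
        # A query that is not by key must ask every shard and merge the answers.
        results = []
        for shard in self.shards:
            results.extend(r for r in shard.values() if predicate(r))
        return results


store = ShardedStore(shard_count=3)
store.put("u1", {"name": "Ada", "country": "UK"})
store.put("u2", {"name": "Grace", "country": "US"})
store.put("u3", {"name": "Linus", "country": "FI"})
store.put("u4", {"name": "Barbara", "country": "US"})
us_users = sorted(r["name"] for r in store.fan_out(lambda r: r["country"] == "US"))
```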


2.NoSQL Technology Breakdown

    Agenda

        Key-Value Stores

        Wide-Column Stores

        Document Stores

            Demo CouchDB

        Graph Databases


    Key-Value "mechanics" present throughout


    Key-Value Stores


        The most common;not necessarily the most popular

        Has rows, each with something like a big dictionary/associative array

            Schema may differ from row to row

        Common on Cloud platforms

            e.g., Amazon SimpleDB, Azure Table Storage

        MemcachedDB, Voldemort

        DynamoDB (AWS), Dynomite, Redis and Riak


    Document Stores


        Have 'databases', which are akin to tables

        Have 'documents',akin to rows

            Documents are typically JSON objects

            Each document has properties and values

            Values can be scalars, arrays, links to documents in other databases or sub-documents (i.e., contained JSON objects, which allow for hierarchical storage)

            Can have attachments as well

        Old versions are retained

            So Doc Stores work well for content management

        Some view doc stores as specialized KV stores

        Most popular with developers,startups,VCs

        The biggies:

            CouchDB

            MongoDB
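
To make the document shape concrete, here is a small sketch of one document holding scalar, array, link and sub-document values, round-tripped through JSON. Every field name and value is invented for illustration, and the "related" field is just a string standing in for a link to another document.

```python
import json

# One hypothetical document; all names and values are illustrative only.
document = {
    "_id": "course-001",
    "title": "Understanding NoSQL",          # scalar
    "tags": ["nosql", "databases"],           # array
    "related": "courses/course-002",          # link to another document
    "author": {                               # sub-document (contained JSON object)
        "name": "Jane Doe",
        "company": "Example Corp",
    },
}

# Documents are stored and exchanged as JSON text.
serialized = json.dumps(document)
restored = json.loads(serialized)

# Hierarchical storage: nested values are reached by walking the properties.
author_name = restored["author"]["name"]
```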


    Document Store Application Orientation

        Documents can each be addressed by URIs

        CouchDB supports full REST interface

        Very geared towards JavaScript and JSON

            Documents are JSON objects

            CouchDB|MongoDB use JavaScript as native language

        In CouchDB, 'view functions' also have unique URIs, and 'show functions' can return HTML

            so you can build applications in the database

        Demo CouchDB

        http://127.0.0.1:5984/pluralsight/_design/example/_view/dotNet

        http://127.0.0.1:5984/pluralsight/_design/example/_view/dataacess

        http://127.0.0.1:5984/pluralsight/_design/example/_show/showfunction/_id


    Wide Column Stores


        Have tables with declared column families

            Each column family has "columns", which are KV pairs that can vary from row to row

        These are the most foundational for large sites

            BigTable (Google)

            HBase (originally part of the Yahoo-dominated Hadoop project)

            Cassandra (Facebook)

                Calls column families "super columns" and tables "super column families"

        They are the most "Big Data"-ready

            Especially HBase + Hadoop
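
A rough model of the wide-column idea: column families are declared up front, while the columns inside each family are free-form key-value pairs that differ per row. WideColumnTable is a hypothetical class built on nested dictionaries and does not mirror any particular product's API.

```python
class WideColumnTable:
    """Toy wide-column table: fixed column families, free-form columns inside them."""

    def __init__(self, column_families):
        self.column_families = set(column_families)  # declared once, up front
        self.rows = {}  # row key -> family -> column -> value

    def put(self, row_key, family, column, value):
        # Families are part of the schema, so undeclared ones are rejected.
        if family not in self.column_families:
            raise KeyError(f"undeclared column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[column] = value

    def get(self, row_key, family, column, default=None):
        return self.rows.get(row_key, {}).get(family, {}).get(column, default)


table = WideColumnTable(column_families=["profile", "activity"])
table.put("row-1", "profile", "name", "Ada")
table.put("row-1", "profile", "email", "ada@example.com")
table.put("row-2", "profile", "name", "Grace")        # row-2 has no email column
table.put("row-2", "activity", "last_login", "2012-05-03")
```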


    Graph Databases


        Great for social network applications and others where relationships are important

        Nodes and edges

            Edges are like joins

            Nodes are like rows in a table

        Nodes can also have properties and values

        Neo4j is a popular graph db
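
The node-and-edge model can be sketched with plain dictionaries. This is not Neo4j's API; the Graph class, the FOLLOWS label and the sample people are all assumptions made for illustration. The two_hops method shows the kind of traversal that would need a self-join in a relational schema.

```python
from collections import defaultdict


class Graph:
    """Toy property graph: nodes carry properties, edges are labeled and directed."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties
        self.edges = defaultdict(set)   # (source id, label) -> target ids

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target):
        self.edges[(source, label)].add(target)

    def neighbors(self, source, label):
        # Following an edge replaces what would be a join in SQL.
        return set(self.edges.get((source, label), set()))

    def two_hops(self, start, label):
        # Nodes reachable in exactly two steps, minus direct neighbors and start.
        direct = self.neighbors(start, label)
        reached = set()
        for node_id in direct:
            reached |= self.neighbors(node_id, label)
        return reached - direct - {start}


g = Graph()
for name in ("alice", "bob", "carol", "dave"):
    g.add_node(name, display_name=name.title())
g.add_edge("alice", "FOLLOWS", "bob")
g.add_edge("alice", "FOLLOWS", "dave")
g.add_edge("bob", "FOLLOWS", "carol")
g.add_edge("bob", "FOLLOWS", "alice")
g.add_edge("dave", "FOLLOWS", "carol")
suggestions = g.two_hops("alice", "FOLLOWS")
```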


3.Where is a NoSQL Killer App

    Agenda

        Content Management

        Product Catalogs

        Social

        Big Data

        Miscellaneous


    Content Management


        Document databases work really well here

        Regular KV pairs can store metadata

        Can also store text-based content

        Attachments can store file-based or binary content

        Versioning and URI addressability help as well


        CouchDB gets called a 'Web database'

            Database for Web apps

            Database that can contain Web apps

            Think Web sites,not Browser-based LOB applications

            Think EverNote


    Product Catalogs

        Products in a catalog tend to have many attributes in common and then various others that are class-specific

        Common

            ProductID

            Name

            Description

            Price

        Class-Specific

            Flavor,Color

            Resolution,Clockspeed

        Key Value Stores and Wide Column Stores work well here

        KV Stores better when schema will change over time

            Since nothing is declared
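
A short sketch of why an undeclared schema suits a catalog: every product shares the common attributes listed above, and each class of product adds its own without any table change. The add_product helper and the sample products are hypothetical.

```python
COMMON_ATTRIBUTES = ("ProductID", "Name", "Description", "Price")

catalog = {}  # product id -> attribute dictionary (one "row" per product)


def add_product(product_id, name, description, price, **class_specific):
    """Store the common attributes plus whatever extras this product class has."""
    catalog[product_id] = {
        "ProductID": product_id,
        "Name": name,
        "Description": description,
        "Price": price,
        **class_specific,
    }


def class_specific_attributes(product):
    """Return only the attributes that are not shared by every product."""
    return {k: v for k, v in product.items() if k not in COMMON_ATTRIBUTES}


add_product("p1", "Ice cream", "One pint", 4.99, Flavor="Vanilla", Color="White")
add_product("p2", "Monitor", "24 inch display", 199.0, Resolution="1920x1080")
add_product("p3", "CPU", "Quad core", 249.0, Clockspeed="3.4 GHz")

monitor_extras = class_specific_attributes(catalog["p2"])
```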


    Social

        Graph databases work best here

        Great for tracking:

            Networks

            Followers

            Group membership

            Threaded interactions(comments,likes/favorites)

        Great for Membership,Ownership

        Avoids the self-joins and many-to-many table necessary in relational DBs


    Big Data


        Wide Column and Key-Value stores work best here

        MapReduce is designed for this scenario

        Hadoop and HBase come up a lot

        Sharding and append-only help here


        Premise of analytics is reading data,not maintaining it

            This is perfect for NoSQL

            Aggregation, correlation and regression do not require a formal schema or sophisticated query capabilities

            Just need to read and perform mathematical operations on data really,really quickly


    Miscellaneous


        Event-driven data (e.g., logs)

        User Profiles,preferences

        Mail,status message streams

        Other Web data

            Automobile directions

            info for sites on maps(category,name,description,lat/long,photo)

            User reviews

            Etc.


4.What Good is Relational

    Agenda

        Transactional

        Formal Schema

        Line of Business Applications

        Declarative Query

        Banded Reporting


    Transactional


        Business systems require atomic transactions

        You can't process an order without decrementing inventory

        You can't register a credit without its corresponding debit

        No exceptions,no excuses
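
The order/inventory rule above can be demonstrated with Python's built-in sqlite3 module, used here only as a convenient relational engine. The table names, the CHECK constraint and the place_order helper are assumptions for this sketch; the point is that the order row and the inventory decrement either both commit or both roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    " product_id TEXT PRIMARY KEY,"
    " quantity INTEGER NOT NULL CHECK (quantity >= 0))"
)
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " product_id TEXT NOT NULL,"
    " quantity INTEGER NOT NULL)"
)
conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
conn.commit()


def place_order(conn, product_id, quantity):
    """Record the order and decrement stock as one atomic unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "INSERT INTO orders (product_id, quantity) VALUES (?, ?)",
                (product_id, quantity),
            )
            conn.execute(
                "UPDATE inventory SET quantity = quantity - ? WHERE product_id = ?",
                (quantity, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected negative stock; the order row is undone too.
        return False


first = place_order(conn, "widget", 3)   # enough stock
second = place_order(conn, "widget", 4)  # would drive stock below zero
stock = conn.execute("SELECT quantity FROM inventory").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```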


    Formal Schema


        Regular processes have regular data

        Stocks,trades

        PO line items

        Personnel records

        Insurance policies


        These need relational databases with declared schema

        These don't need MapReduce,document or graph representation


    Line of Business Applications


        Screen layouts and data binding require consistent schema

        Data Transfer Objects have properties defined in code

            You can't have strong typing without a schema

        Object Relational Mapping

            Object models are mapped to database schema

            If the schema is not consistent then the mapping can't be either


    Declarative Query


        It is silly to write imperative code for each routine query

        Makes ad hoc queries and reporting difficult

        Lose out on engine optimization

        Lose out on versatility

        Imperative query works best when the range of queries is very small

        Relational stored procedures do set precedent for pre-written queries,but they still don't iterate through data sets imperatively
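
To contrast the two styles, the sketch below computes the same per-region total twice: once with a hand-written loop and once with a single SQL statement run through Python's built-in sqlite3 module. The sales table and its rows are invented sample data.

```python
import sqlite3

rows = [
    ("east", "widget", 10.0),
    ("west", "widget", 7.5),
    ("east", "gadget", 4.0),
]


def total_by_region_imperative(rows):
    """Imperative: spell out how to walk the data and accumulate totals."""
    totals = {}
    for region, _product, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals


# Declarative: state what is wanted and let the engine decide how to get it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
declarative = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

imperative = total_by_region_imperative(rows)
```

A new question, such as totals by product, needs a new function in the imperative style but only a changed GROUP BY column in the declarative one.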


    Banded Reporting


        Operational reporting is based on detail and group sections with predictable, consistent layout, based on known schema

        Very hard to design pixel-perfect reports against indeterminate schema

        You can dump all columns/all rows,but that's generic

        Forms are formal,by definition

        This highlights how operational business processes almost always require relational databases


5.NoSQL and Microsoft

    Agenda

        Azure Table Storage

        SQL Server/Azure XML Columns

        SQL Azure Federations

            Demo

        OData

        MongoDB on Azure

        Hadoop on Azure/Windows

            Demo

        SQL Server "Beyond Relational"

        SQL Server Parallel Data Warehouse



    Azure Table Storage

        Cloud-based Key-Value Store

        Supports OData interface(more on that later)

        Key-Value works nicely for general purpose storage and retrieval

        SQL Server Data Services (precursor to SQL Azure) also implemented a Key-Value store


    SQL Azure XML Columns


        XML columns hold structured data that can differ between rows

        Combining scalar and XML columns allows combination of static and dynamic schemas

        XML schemas can still be declared

            But you can have more than one

            And it's not required

        If motivation to use NoSQL is loose schema, then consider XML columns

        To prove the point: Azure Dev Fabric's Table Storage is implemented with SQL Server Express and XML columns


    SQL Azure Federations


        Federations are the SQL Azure version of sharding

        Just for partitioning,not for replication

            Replication is automatic,implicit in SQL Azure

        Federation Root (physical & logical db,defines F.Key)

            F.Member(physical db - contains specific range of F.Key values)

                F.Atomic Unit(AU - container for all data with same F.Key value)

                    F.Table

        F.Members can be addressed by absolute name or relative key value

        Allow online repartitioning

        Offer ACID guarantees within F.Members and adopt Eventual Consistency between them

        Multi-tenancy applications

        Do not support fan-out query
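
The hierarchy above amounts to range partitioning on the federation key. The sketch below is a plain-Python model of that idea and not the SQL Azure API: the Federation class, its boundaries parameter and the tenant-style keys are assumptions. Each member holds a contiguous key range, and all rows sharing a key value (one atomic unit) stay in the same member.

```python
import bisect


class Federation:
    """Toy federation: each member holds a contiguous range of federation keys."""

    def __init__(self, boundaries):
        # Each boundary is the lowest key of the next member, e.g. [100, 200]
        # gives three members: below 100, 100 to 199, and 200 or above.
        self.boundaries = sorted(boundaries)
        self.members = [{} for _ in range(len(self.boundaries) + 1)]

    def member_index(self, fed_key):
        # Addressing a member by relative key value.
        return bisect.bisect_right(self.boundaries, fed_key)

    def insert(self, fed_key, row):
        # Rows with the same key form one atomic unit inside a single member.
        member = self.members[self.member_index(fed_key)]
        member.setdefault(fed_key, []).append(row)

    def rows_for(self, fed_key):
        # A query is routed to exactly one member; there is no fan-out.
        return self.members[self.member_index(fed_key)].get(fed_key, [])


fed = Federation(boundaries=[100, 200])
fed.insert(50, {"tenant": 50, "item": "a"})
fed.insert(150, {"tenant": 150, "item": "b"})
fed.insert(150, {"tenant": 150, "item": "c"})
fed.insert(200, {"tenant": 200, "item": "d"})
```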


    OData


        RESTful API for data access, with rendering in XML or JSON

        Clients for JavaScript,mobile platforms,.NET,Java

        Works for feeds and updates

        The following feature OData interfaces:

            Azure Table Storage

            SQL Server/Azure(via WCF Data Services)

            Azure DataMarket

            SQL Server Reporting Services (in 2008 R2, 2012)

            SharePoint Lists (2010)

            Netflix, eBay catalogs; TwitPic

            IBM WebSphere eXtreme Scale REST data service

            Pluralsight catalog!

        Compare to JavaScript/JSON orientation of Document Stores
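
Because OData is URL-driven, a query is mostly a matter of composing the right address. The sketch below builds such a URL with the standard $filter, $top and $format query options; the odata_query_url helper, the service root and the Products entity set are hypothetical, and no request is actually sent.

```python
from urllib.parse import quote, urlencode


def odata_query_url(service_root, entity_set, filter_expr=None, top=None, fmt=None):
    """Compose an OData query URL from a few common system query options."""
    params = {}
    if filter_expr is not None:
        params["$filter"] = filter_expr
    if top is not None:
        params["$top"] = str(top)
    if fmt is not None:
        params["$format"] = fmt
    url = f"{service_root.rstrip('/')}/{entity_set}"
    if params:
        # Keep "$" readable in option names; percent-encode spaces in values.
        url += "?" + urlencode(params, safe="$", quote_via=quote)
    return url


# Hypothetical service root and entity set, for illustration only.
url = odata_query_url(
    "https://example.com/odata/",
    "Products",
    filter_expr="Price gt 10",
    top=5,
    fmt="json",
)
```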


    Run MongoDB,others on Azure


        Deploy to worker roles

        Put databases in Azure Blob Storage; mount as drives (Azure Drive)

        MongoDB Replica Set Azure wrapper supports this directly

        Use from on-premise or cloud application code

        Similar approach can be used for other NoSQL DBs


    Hadoop on Azure/Windows


        MS + HortonWorks have developed Windows Version of Hadoop

        Currently in Community Technology Preview

        Can use installer to create cluster

            On-premises

            On Azure

        Can also use Hadoop On Azure

            Provision entire cluster from Portal

            Currently has 48-hour lifetime

            Browser-based Hive console

        Hive ODBC Driver

            Use from Excel (with add-in)

            Also use from PowerPivot,Analysis Services(2012 Tabular Mode),Reporting Services


    SQL Server "Beyond Relational" Features


        XML Columns(already discussed)

        HierarchyId

        Sparse columns(SQL Server-only)

        Filestream(SQL Server-only)


        Allow schema flexibility while retaining ACID guarantees


    SQL Server Parallel Data Warehouse Edition (SQL PDWE)


        Makes a cluster of SQL Server instances appear as one logical server

        Uses MPP: Massively Parallel Processing

            Compare to MapReduce

        Supports SQL, so no imperative coding needed

        Supports fan-out queries

        Supported by most SQL Server clients

        Available only as an appliance

            Has finely tuned processor, storage, networking internals


6.NoSQL,Relational or Both?

    Agenda

        Type of App

        Productivity

        Skill Sets and investment

        Recommendations


    Type of App


        Really a question of consistency versus massive scale

        Is this an internal system or a public one?

        Is it an application for the data, or data for a system?

        Below a certain threshold of concurrent usage, NoSQL may be slower than relational


    Productivity


        NoSQL db tooling still immature

        Queries require significant work,and testing

        Programming platforms,frameworks and components may support RDBMSes much more robustly

            Especially enterprise platforms

        If schema subject to frequent change then NoSQL may be more productive


    Skill Sets and investment


        Does your staff have RDBMS skills already?

        Do you have significant investment in relational database hardware/software?

        Lots of apps that use an RDBMS?

        Do you want to retool?

        Do you want to support both?


        Are you a startup?

        Employ developers who possess NoSQL skills and prefer NoSQL?

        Does availability/scalability make RDBMS investment questions moot?


    Recommendations


        Large,public,content-centric properties:NoSQL

        Internal LOB(line of business) supporting business operations:relational

        Investment in RDBMS licenses,infrastructure,skills:

            Relational

            Use both (application-dependent)

            Use Hybrid approaches

        Productivity

            Do cost-benefit analysis

                How much extra dev time/$$?

                What is cost of less scalable system?


        It will be tempting to use one for the other

            And it very well may work,but that doesn't make it right


Reposted from: https://my.oschina.net/8pBwdEmxK2hL/blog/285096
