Posted by 极分享 on 2016-06-02 16:36

Themis from Xiaomi: cross-row/cross-table transactions for HBase, based on Google's Percolator.


 

Source code: https://github.com/XiaoMi/themis

 

Themis provides cross-row/cross-table transactions on HBase, based on Google's Percolator.

Themis guarantees the ACID properties of cross-row transactions through two-phase commit and conflict resolution, building on the single-row transactions of HBase. Themis depends on Chronos to provide globally, strictly increasing timestamps, which define a global order for transactions and allow Themis to read a snapshot of the database as of a given timestamp. Themis is built on the HBase coprocessor framework, so it can be deployed without changing the source code of HBase. We have validated the correctness of Themis over a few months and optimized the algorithm to achieve better performance.

Implementation

Themis contains three components: timestamp server, client library, themis coprocessor.

[Figure: themis_architecture]

Timestamp Server

Themis uses the timestamp of HBase's KeyValue internally, and the timestamp must be globally, strictly increasing. Themis depends on Chronos to provide such a timestamp service.
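The "strictly increasing" requirement can be illustrated with a small sketch. This is a hypothetical, single-process stand-in (the class name StrictlyIncreasingClock is an assumption for illustration), not the Chronos or Themis implementation; Chronos provides the same guarantee across processes.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a timestamp service; not the Chronos or Themis implementation.
final class StrictlyIncreasingClock {
    private final AtomicLong last = new AtomicLong(0L);

    // Returns a value strictly greater than every value returned before,
    // even when the wall clock stalls or moves backwards.
    long next() {
        return last.updateAndGet(prev -> Math.max(prev + 1, System.currentTimeMillis()));
    }
}
```

Taking the maximum of the wall clock and the previous value plus one keeps timestamps close to real time while never repeating or decreasing.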

Client Library

  1. Provide transaction APIs.
  2. Fetch timestamp from Chronos.
  3. Issue requests to the themis coprocessor on the server side.
  4. Resolve conflicts caused by concurrent mutations from other clients.

Themis Coprocessor

  1. Provide RPC methods for two-phase commit and read.
  2. Create auxiliary families and set family attributes for the algorithm automatically.
  3. Periodically clean the data of the aborted and expired transactions.
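To make the interplay of timestamps, locks and two-phase commit more concrete, here is a toy, single-threaded, in-memory model in the spirit of Percolator. Everything in it (ToyStore, ToyTransaction, the integer-valued cells) is a hypothetical simplification: it aborts on conflict instead of resolving it, has no primary lock, and is not the Themis client or coprocessor code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy, single-threaded model of Percolator-style two-phase commit.
// All names are hypothetical; this is not the Themis client or coprocessor code.
final class ToyStore {
    static final class Cell {
        final TreeMap<Long, Integer> data = new TreeMap<>();  // startTs -> value
        final TreeMap<Long, Long> writes = new TreeMap<>();   // commitTs -> startTs
        Long lockTs;                                          // startTs of the lock holder, or null
    }

    private final Map<String, Cell> cells = new HashMap<>();
    private long clock = 0;                                   // stand-in for the timestamp server

    long nextTimestamp() { return ++clock; }
    Cell cell(String row) { return cells.computeIfAbsent(row, r -> new Cell()); }
    ToyTransaction begin() { return new ToyTransaction(this); }
}

final class ToyTransaction {
    private final ToyStore store;
    private final long startTs;
    private final Map<String, Integer> buffered = new LinkedHashMap<>();

    ToyTransaction(ToyStore store) {
        this.store = store;
        this.startTs = store.nextTimestamp();
    }

    // Snapshot read: newest value committed at or before startTs, or null if none.
    Integer get(String row) {
        ToyStore.Cell c = store.cell(row);
        if (c.lockTs != null && c.lockTs <= startTs) {
            throw new IllegalStateException("pending lock on " + row); // a real client would resolve it
        }
        Map.Entry<Long, Long> w = c.writes.floorEntry(startTs);
        return w == null ? null : c.data.get(w.getValue());
    }

    // Mutations are only buffered on the client until commit().
    void put(String row, int value) { buffered.put(row, value); }

    // Returns true if committed, false if aborted because of a conflict.
    boolean commit() {
        List<ToyStore.Cell> locked = new ArrayList<>();
        for (Map.Entry<String, Integer> e : buffered.entrySet()) {     // phase 1: prewrite
            ToyStore.Cell c = store.cell(e.getKey());
            boolean newerWrite = c.writes.ceilingKey(startTs) != null;
            if (c.lockTs != null || newerWrite) {
                for (ToyStore.Cell l : locked) { l.data.remove(startTs); l.lockTs = null; }
                return false;
            }
            c.data.put(startTs, e.getValue());
            c.lockTs = startTs;
            locked.add(c);
        }
        long commitTs = store.nextTimestamp();
        for (ToyStore.Cell c : locked) {                               // phase 2: commit
            c.writes.put(commitTs, startTs);
            c.lockTs = null;
        }
        return true;
    }
}
```

In this model a transaction that started before a competing commit still reads its own snapshot, and its later write to the same row is rejected in the prewrite phase.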

Usage

Build

  1. Get the latest source code of Themis:

     git clone https://github.com/XiaoMi/themis.git 
    
  2. The master branch of Themis depends on HBase 0.94.21 with hadoop.version=2.0.0-alpha. Download the source code of HBase 0.94.21 and install it in the local Maven repository with:

     (in the directory of hbase 0.94.21)
     mvn clean install -DskipTests -Dhadoop.profile=2.0
    
  3. Build Themis and install it in the local repository:

     cd themis
     mvn clean install -DskipTests
    

Load the themis coprocessor in HBase:

  1. Add the themis-coprocessor dependency to the pom of HBase:

     <dependency>
       <groupId>com.xiaomi.infra</groupId>
       <artifactId>themis-coprocessor</artifactId>
       <version>1.0-SNAPSHOT</version>
     </dependency>
    
  2. Add configurations for themis coprocessor in hbase-site.xml:

     <property>
       <name>hbase.coprocessor.user.region.classes</name>
       <value>org.apache.hadoop.hbase.themis.cp.ThemisProtocolImpl,org.apache.hadoop.hbase.themis.cp.ThemisScanObserver,org.apache.hadoop.hbase.regionserver.ThemisRegionObserver</value>
     </property>
     <property>
        <name>hbase.coprocessor.master.classes</name>
        <value>org.apache.hadoop.hbase.master.ThemisMasterObserver</value>
     </property>
    
    
Depend on themis-client:

Add the themis-client dependency to the pom of the project that needs cross-row transactions.

 <dependency>
  <groupId>com.xiaomi.infra</groupId>
  <artifactId>themis-client</artifactId>
  <version>1.0-SNAPSHOT</version>
 </dependency>

Run the example code

  1. Start a standalone HBase cluster (0.94.21 with hadoop.version=2.0.0-alpha) and make sure themis-coprocessor is loaded as described in the steps above.

  2. After building Themis, run example code by:

     cd themis-client
     mvn exec:java -Dexec.mainClass="org.apache.hadoop.hbase.themis.example.Example"
    

The results of the read and write transactions are printed to the screen.

Example of Themis API

The APIs of Themis are defined in TransactionInterface.java, including put/delete/get/getScanner, which are similar to HBase's APIs:

 public void put(byte[] tableName, ThemisPut put) throws IOException;
 public void delete(byte[] tableName, ThemisDelete delete) throws IOException;
 public void commit() throws IOException;
 public Result get(byte[] tableName, ThemisGet get) throws IOException;
 public ThemisScanner getScanner(byte[] tableName, ThemisScan scan) throws IOException;

The following code shows how to use Themis APIs:

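 import java.io.IOException;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.HConnection;
 import org.apache.hadoop.hbase.client.HConnectionManager;
 import org.apache.hadoop.hbase.themis.ThemisGet;
 import org.apache.hadoop.hbase.themis.ThemisPut;
 import org.apache.hadoop.hbase.themis.Transaction;
 import org.apache.hadoop.hbase.util.Bytes;

 // This class shows an example of transferring $3 from Joe to Bob in the cash table, where the rows of Joe
 // and Bob are located in different regions. The example uses the 'put' and 'get' APIs of Themis to run
 // the transaction.
 public class Example {
   private static final byte[] CASHTABLE = Bytes.toBytes("CashTable"); // cash table
   private static final byte[] JOE = Bytes.toBytes("Joe"); // row for Joe
   private static final byte[] BOB = Bytes.toBytes("Bob"); // row for Bob
   private static final byte[] FAMILY = Bytes.toBytes("Account");
   private static final byte[] CASH = Bytes.toBytes("cash");

   public static void main(String[] args) throws IOException {
     Configuration conf = HBaseConfiguration.create();
     HConnection connection = HConnectionManager.createConnection(conf);
     // create the table and set THEMIS_ENABLE in family 'Account'
     // (createTable is omitted here; it is defined in the full example)
     createTable(connection);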

     // transfer $3 from Joe to Bob
     Transaction transaction = new Transaction(conf, connection);
     // firstly, read out the current cash for Joe and Bob
     ThemisGet get = new ThemisGet(JOE).addColumn(FAMILY, CASH);
     int cashOfJoe = Bytes.toInt(transaction.get(CASHTABLE, get).getValue(FAMILY, CASH));
     get = new ThemisGet(BOB).addColumn(FAMILY, CASH);
     int cashOfBob = Bytes.toInt(transaction.get(CASHTABLE, get).getValue(FAMILY, CASH));

     // then, transfer $3 from Joe to Bob, the mutations will be cached in client-side
     int transfer = 3;
     ThemisPut put = new ThemisPut(JOE).add(FAMILY, CASH, Bytes.toBytes(cashOfJoe - transfer));
     transaction.put(CASHTABLE, put);
     put = new ThemisPut(BOB).add(FAMILY, CASH, Bytes.toBytes(cashOfBob + transfer));
     transaction.put(CASHTABLE, put);
     // commit the mutations to server-side
     transaction.commit();

     connection.close();
     Transaction.destroy();
   }
 }

For the full example, see org.apache.hadoop.hbase.themis.example.Example.java

Schema Support

  1. Themis uses the timestamp of KeyValue internally, so the timestamp and version attributes of HBase's KeyValue cannot be used by the application.
  2. For families that need Themis, set THEMIS_ENABLE to 'true' by adding "CONFIG => {'THEMIS_ENABLE', 'true'}" to the family descriptor when creating the table.
  3. For each column, Themis introduces two auxiliary columns: a lock column and a commit column. Themis saves the auxiliary columns in specific families: the lock column in family 'L', and the commit column in family '#p' (or in family '#d' if it is a Delete). The character '#' is reserved by Themis, and applications should not include it in the name of a family that needs Themis. Themis creates the auxiliary families automatically when a table is created if 'THEMIS_ENABLE' is set on some family.
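The naming rules above can be captured in a few lines. The helper below is hypothetical (ThemisSchemaRules and its methods are not part of the Themis API); it only encodes the family names and the '#' restriction stated in the list.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper built only from the rules listed above; not part of the Themis API.
final class ThemisSchemaRules {
    static final String LOCK_FAMILY = "L";
    static final String PUT_COMMIT_FAMILY = "#p";
    static final String DELETE_COMMIT_FAMILY = "#d";

    // A family that enables Themis must not contain the reserved '#' character.
    static boolean isValidUserFamily(String family) {
        return !family.isEmpty() && family.indexOf('#') < 0;
    }

    // Family that holds the commit column for a put or for a delete.
    static String commitFamily(boolean isDelete) {
        return isDelete ? DELETE_COMMIT_FAMILY : PUT_COMMIT_FAMILY;
    }

    // The auxiliary families Themis creates when THEMIS_ENABLE is set.
    static List<String> auxiliaryFamilies() {
        return Arrays.asList(LOCK_FAMILY, PUT_COMMIT_FAMILY, DELETE_COMMIT_FAMILY);
    }
}
```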

Themis Configuration

Client Side

Timestamp server

If users want strong consistency across client processes, 'themis.timestamp.oracle.class' should be set to 'RemoteTimestampOracleProxy'. Themis will then fetch globally increasing timestamps from Chronos. The entry of Chronos is registered in ZooKeeper, and the quorum address and entry node can be configured.

The default value of 'themis.timestamp.oracle.class' is 'LocalTimestampOracle', which provides increasing timestamps locally within one process. If users only need strong consistency within one client process, the default value can be used.

Key | Description | Default Value
themis.timestamp.oracle.class | timestamp server type | LocalTimestampOracle
themis.remote.timestamp.server.zk.quorum | ZK quorum where remote timestamp server registered | 127.0.0.1:2181
themis.remote.timestamp.server.clustername | cluster name of remote timestamp server | default-cluster

Lock clean

The client needs to clean locks when it encounters a conflict. Users can configure the lock ttl on the client side with 'themis.client.lock.clean.ttl'. The default value of this configuration is 0, which means the lock ttl is decided by the server-side configuration.

Users can set 'themis.worker.register.class' to 'ZookeeperWorkerRegister' to help resolve conflicts faster. For details of conflict resolution, please see the Percolator paper.

Key | Description | Default Value
themis.client.lock.clean.ttl | lock ttl configured in client-side | 0
themis.worker.register.class | worker register class | NullWorkerRegister
themis.retry.count | retry count when clean lock | 10
themis.pause | sleep time between retries | 100
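The retry settings describe a bounded retry loop with a pause between attempts. The sketch below shows that general pattern only; LockCleanRetry is a hypothetical name, the pause is assumed to be in milliseconds (the unit is not stated above), and the Themis client's own retry logic may differ.

```java
import java.util.function.BooleanSupplier;

// Hypothetical retry helper; the Themis client's own retry logic may differ.
final class LockCleanRetry {
    // Calls attempt up to retryCount times, sleeping pauseMillis between failed tries.
    // Returns the number of attempts used on success, or -1 if every attempt failed.
    static int run(BooleanSupplier attempt, int retryCount, long pauseMillis) {
        for (int i = 1; i <= retryCount; i++) {
            if (attempt.getAsBoolean()) {
                return i;
            }
            if (i < retryCount) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

With the defaults from the table, a call would look like LockCleanRetry.run(attempt, 10, 100).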

Server Side

Data Clean Options

Neither read nor write transactions should last too long. Users can set 'themis.transaction.ttl.enable' to enable the transaction ttl. If this configuration is enabled, 'themis.read.transaction.ttl' and 'themis.write.transaction.ttl' can be used to configure the ttl for read transactions and write transactions respectively.

If users enable the transaction ttl, old data may expire and can no longer be read by any transaction. Users can enable 'themis.expired.data.clean.enable' to clean the old and expired data from HBase.

Key | Description | Default Value
themis.transaction.ttl.enable | whether the transaction will be expired | true
themis.read.transaction.ttl | ttl for read transaction | 86400
themis.write.transaction.ttl | ttl for write transaction | 60
themis.expired.data.clean.enable | enable cleaning the old and expired data | true
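As a rough illustration of what a transaction ttl means, the sketch below checks whether a transaction has outlived its ttl. It rests on two assumptions not stated above: that the ttl values are in seconds, and that the transaction start time is available as wall-clock milliseconds. TransactionTtl is a hypothetical name, not a Themis class.

```java
// Hypothetical expiry check; assumes ttl values are in seconds and
// start times are wall-clock milliseconds. Not a Themis class.
final class TransactionTtl {
    static final long READ_TTL_SECONDS = 86400;  // default from the table above
    static final long WRITE_TTL_SECONDS = 60;    // default from the table above

    // True if more than ttlSeconds have passed since the transaction started.
    static boolean isExpired(long startMillis, long nowMillis, long ttlSeconds) {
        return nowMillis - startMillis > ttlSeconds * 1000L;
    }
}
```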

Metrics

Themis provides metrics for its major APIs, which can be retrieved from JMX or written to a file. The following configuration in hadoop-metrics.properties writes the metrics to a file periodically:

 themis.class=org.apache.hadoop.hbase.metrics.file.TimeStampingFileContext
 themis.period=10
 themis.fileName=./themis_metrics.out

MapReduce Support

Themis implements the InputFormat and OutputFormat interfaces of the MapReduce framework:

  1. ThemisTableInputFormat reads data from a Themis-enabled table in the Mapper. To read data from multiple tables, use MultiThemisTableInputFormat.

  2. ThemisTableOutputFormat writes data through Themis to a Themis-enabled table in the Reducer. To write data across multiple tables, use MultiThemisTableOutputFormat.

  3. ThemisTableMapReduceUtil provides utility methods to start a MapReduce job.

Global Secondary Index Support

Based on the cross-table data consistency guaranteed by Themis transactions, we are building an experimental sub-project, "themis-index", to support global secondary indexes. This sub-project is still in progress.

Test

Correctness Validation

We designed an AccountTransfer simulation program to validate the correctness of the implementation. This program distributes initial values across different tables, rows and columns in HBase. Each column represents an account. Then a configured number of client threads are started concurrently; each reads a number of account values from different tables and rows with themisGet. After this, the clients randomly transfer values among these accounts while keeping the sum unchanged, which simulates concurrent cross-table/cross-row transactions. To check the correctness of the transactions, a checker thread periodically scans the account values from all columns and makes sure the current total is the same as the initial total. We run this validation program for a period of time before releasing a new version of Themis.
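The invariant being checked can be shown with a toy, in-memory version. This is not the AccountTransfer program itself: the class name and sizes are hypothetical, and a single Java lock stands in for Themis transactions, so it only illustrates the "sum stays constant under concurrent transfers" check.

```java
import java.util.Arrays;
import java.util.Random;

// Toy, in-memory version of the invariant check; a single lock stands in for Themis transactions.
final class AccountTransferSketch {
    private final int[] accounts;
    private final Object txLock = new Object();

    AccountTransferSketch(int accountCount, int initialValue) {
        accounts = new int[accountCount];
        Arrays.fill(accounts, initialValue);
    }

    // Moves a random amount between two random accounts as one atomic step.
    void transfer(Random random) {
        int from = random.nextInt(accounts.length);
        int to = random.nextInt(accounts.length);
        int amount = random.nextInt(10);
        synchronized (txLock) {
            accounts[from] -= amount;
            accounts[to] += amount;
        }
    }

    // Checker: reads a consistent snapshot of all accounts and sums it.
    long total() {
        synchronized (txLock) {
            long sum = 0;
            for (int value : accounts) sum += value;
            return sum;
        }
    }

    // Runs the workers while checking the invariant; returns true if the total never changed.
    static boolean run(int threads, int transfersPerThread) {
        AccountTransferSketch bank = new AccountTransferSketch(16, 100);
        long expected = bank.total();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final long seed = t;
            workers[t] = new Thread(() -> {
                Random random = new Random(seed);
                for (int i = 0; i < transfersPerThread; i++) bank.transfer(random);
            });
            workers[t].start();
        }
        boolean ok = true;
        for (Thread worker : workers) {
            ok &= bank.total() == expected;   // check while transfers may still be running
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return ok && bank.total() == expected;
    }
}
```

If the two updates inside transfer were not applied atomically with respect to the checker, total() could observe a state where money has left one account but not yet arrived in the other, which is the kind of violation the real validation program looks for.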

Performance Test

Percolator Result:

Percolator tests the read/write performance of single-column transactions (which represent the worst case for Percolator) and gives the relative drop compared to BigTable in the following table.

Metric | BigTable | Percolator | Relative
Read/s | 15513 | 14590 | 0.94
Write/s | 31003 | 7232 | 0.23

Themis Result: We evaluate the performance of Themis under test conditions similar to Percolator's and give the relative drop compared to HBase.

Evaluation of get. Load 30g of data into HBase before testing get by reading the loaded rows. We set the heap size of the region server to 10g and hfile.block.cache.size=0.45.

Client Thread | GetCount | Themis AvgLatency(us) | HBase AvgLatency(us) | Relative
5 | 10000000 | 1029.88 | 1191.21 | 0.86
10 | 20000000 | 1230.44 | 1407.93 | 0.87
20 | 20000000 | 1848.05 | 2190.00 | 0.84
50 | 30000000 | 4529.80 | 5382.87 | 0.84

Evaluation of put. Load 3,000,000 rows of data into HBase before testing put. We configure a 256M cache to keep locks in memory for write transactions.

Client Thread | PutCount | Themis AvgLatency(us) | HBase AvgLatency(us) | Relative
1 | 3000000 | 1620.69 | 818.62 | 0.51
5 | 10000000 | 1695.89 | 1074.13 | 0.63
10 | 20000000 | 2057.55 | 1309.12 | 0.64
20 | 20000000 | 2761.66 | 1902.79 | 0.69
50 | 30000000 | 5441.48 | 3702.04 | 0.68
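The Relative column of the put table is consistent with dividing the HBase average latency by the Themis average latency and rounding to two decimals. The small helper below (a hypothetical name, written only to reproduce that arithmetic) recomputes it from the two latency columns.

```java
// Recomputes the Relative column of the put table from the two latency columns.
final class RelativeLatency {
    // Ratio of HBase latency to Themis latency, rounded to two decimals.
    static double relative(double themisMicros, double hbaseMicros) {
        return Math.round(hbaseMicros / themisMicros * 100.0) / 100.0;
    }
}
```

For the first row, 818.62 / 1620.69 is about 0.505, which rounds to the 0.51 shown in the table.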

The above tests were all run on a single region server. From the results, we can see that the performance of get is about 85% of HBase's get and the performance of put is about 60% of HBase's put. For get, the result is about 10% lower than that reported in the Percolator paper. The put performance is much better than that reported in the Percolator paper. We optimize the performance of single-column transactions with the following techniques:

  1. In the prewrite phase, we only write the lock to the MemStore;

  2. In the commit phase, we erase the corresponding lock if it exists, and write the data and commit information at the same time.

With these techniques the prewrite phase does not sync the HLog, which improves write performance considerably. The trade-off is that if the region server restarts after the prewrite phase, the commit phase cannot find the lock, because it was never persisted, and the transaction fails. This does not break the correctness of the algorithm.
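The effect of keeping the lock only in memory can be modelled in a few lines. The class below is a hypothetical toy, not a Themis or HBase class: one map plays the role of the in-memory locks that a restart wipes out, another plays the role of durable data.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of one region server; names are hypothetical, not Themis or HBase classes.
final class ToyRegionServer {
    private final Map<String, Long> memoryLocks = new HashMap<>();    // lost on restart
    private final Map<String, Integer> durableData = new HashMap<>(); // survives restart

    // Prewrite: record the lock in memory only (no log sync in this model).
    boolean prewrite(String row, long startTs) {
        return memoryLocks.putIfAbsent(row, startTs) == null;
    }

    // Commit: succeeds only if this transaction's lock is still present;
    // the lock is erased and the data written in the same step.
    boolean commit(String row, long startTs, int value) {
        if (!memoryLocks.remove(row, startTs)) {
            return false;
        }
        durableData.put(row, value);
        return true;
    }

    // A restart drops everything that was only held in memory.
    void restart() { memoryLocks.clear(); }

    Integer read(String row) { return durableData.get(row); }
}
```

A commit attempted after restart() returns false and leaves the previously committed value untouched, so the transaction fails cleanly rather than producing a partial write.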

Future Works

  1. Optimize the memory usage of RegionServer. Only locks of unfinished transactions should be kept in memory.
  2. Support different isolation levels. Study the tradeoff between isolation levels and efficiency.
  3. Commit secondary rows in the background to improve latency.
  4. Optimize the lock clean process.
  5. Open source the correctness validation program: AccountTransfer.

Contact Us

Any suggestions or discussion about Themis are welcome. Please contact us at cuijianwei@xiaomi.com, or in HBase JIRA HBASE-10999.

