Posted by 极分享 on 2016-06-02 15:11

Minos: a distributed deployment and monitoring system


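Minos is a distributed deployment and monitoring system developed by Xiaomi. It started as a tool Xiaomi built to deploy and manage its Hadoop and ZooKeeper clusters. Minos can easily be extended to support other systems, and it currently supports HDFS, YARN and Impala, among others.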




What is Minos

Minos is a distributed deployment and monitoring system. It was initially developed and used at Xiaomi to deploy and manage the Hadoop, HBase and ZooKeeper clusters used in the company. Minos can be easily extended to support other systems, among which HDFS, YARN and Impala have been supported in the current release.


The Minos system contains the following four components:


Client

This is the command-line client tool used to deploy and manage the processes of various systems. You can use it to perform deployment tasks such as installing, (re)starting and stopping a service. The client currently supports ZooKeeper, HDFS, HBase, YARN and Impala, and it can be extended to support other systems. Refer to the Using Client section below to learn how to use it.


Owl

This is the dashboard system that displays the status of all processes and gives users an overview of all the clusters managed by Minos. It collects data from the servers through their JMX interfaces, and it organizes its pages by cluster, job and task, matching the definitions in the cluster configuration. It also provides some utilities, such as a health alerter, an HDFS quota updater and a quota reporter. Refer to Installing Owl to learn how to install and use it.
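Hadoop daemons publish their JMX beans as JSON over HTTP at the /jmx path of their web server, so a JMX-based collector can be as simple as an HTTP GET followed by a walk over the "beans" list. The sketch below illustrates that idea, written for Python 3. It is not Owl's actual collector. The function names are hypothetical, and the host, port and bean name in the usage note are placeholders.

```python
import json
import urllib.parse
import urllib.request


def fetch_jmx(host, port, qry=None, timeout=5):
    """Fetch the JSON view of a Hadoop daemon's JMX beans from its /jmx servlet."""
    url = "http://%s:%d/jmx" % (host, port)
    if qry:
        # qry narrows the result to beans matching a JMX ObjectName pattern.
        url += "?qry=" + urllib.parse.quote(qry, safe=":=,*")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def bean_metrics(jmx_doc, bean_name, attrs):
    """Return the requested attributes of the bean called bean_name ({} if absent)."""
    for bean in jmx_doc.get("beans", []):
        if bean.get("name") == bean_name:
            return {attr: bean.get(attr) for attr in attrs}
    return {}
```

For a NameNode, a call would look roughly like `bean_metrics(fetch_jmx(nn_host, nn_http_port, name), name, ["CapacityUsed"])` with `name = "Hadoop:service=NameNode,name=FSNamesystem"`. The design splits the HTTP fetch from the parsing so that the parsing can be tested offline.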


Supervisor

This is the process management and monitoring system. Supervisor is an open source project, a client/server system that allows its users to monitor and control a number of processes on a UNIX-like operating system.

Based on the version of supervisor-3.0b1, we extended Supervisor to support Minos. We implemented an RPC interface under the deployment directory, so that our deploy client can invoke the services supplied by supervisord.
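Supervisord exposes an XML-RPC endpoint, by default at /RPC2 on its HTTP port. That endpoint is what lets a remote client such as the Minos deploy tool call into it. The method names of the Minos deployment RPC interface are not documented here, so the sketch below uses only the standard `supervisor` namespace (`getAllProcessInfo`). It is written for Python 3, and the helper names are hypothetical.

```python
import xmlrpc.client
from collections import Counter


def supervisor_proxy(host, port=9001):
    """XML-RPC proxy for a supervisord HTTP server (no connection is made yet)."""
    return xmlrpc.client.ServerProxy("http://%s:%d/RPC2" % (host, port))


def summarize(process_infos):
    """Count processes by state name, e.g. {'RUNNING': 3, 'EXITED': 1}."""
    return Counter(info["statename"] for info in process_infos)


def not_running(process_infos):
    """Sorted names of the processes that are not in the RUNNING state."""
    return sorted(info["name"] for info in process_infos
                  if info["statename"] != "RUNNING")
```

On a live machine you would call `summarize(supervisor_proxy(host).supervisor.getAllProcessInfo())`, where `host` is a placeholder for a machine running supervisord.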

When deploying a Hadoop cluster for the first time, you need to set up supervisord on every production machine. This only needs to be done once. You can refer to Installing Supervisor to learn how to install and use it.


Tank

This is a simple package-management Django app server for our deployment tool. When setting up a cluster for the first time, set up a Tank server first. This also needs to be done only once. Refer to Installing Tank to learn how to install and use it.

Setting Up Minos on CentOS/Ubuntu


Install Python

Make sure Python 2.7 or later is installed (available from http://www.python.org).

Install JDK

Make sure that the Oracle Java Development Kit 6 (not OpenJDK) is installed from http://www.oracle.com/technetwork/java/javase/downloads/index.html, and that JAVA_HOME is set in your environment.

Building Minos

Clone the Minos repository

To use Minos, check out the code on your production machine:

git clone https://github.com/XiaoMi/minos.git

Build the virtual environment

Each Minos component runs in its own virtual environment, so build the virtual environment before using Minos:

cd minos
./build.sh build

Note: If you only use the Client component on the current machine, this step is all you need; refer to Using Client to learn how to deploy and manage a cluster. If you want to use the current machine as a Tank server, refer to Installing Tank. Similarly, to use it as an Owl server or a Supervisor server, refer to Installing Owl or Installing Supervisor, respectively.

Installing Tank

Start Tank

cd minos
./build.sh start tank --tank_ip ${your_local_ip} --tank_port ${port_tank_will_listen}

Note: If you do not specify tank_ip and tank_port, the Tank server starts on port 8000.

Stop Tank

./build.sh stop tank

Installing Supervisor


Make sure you have installed Tank on one of the production machines.

Start Supervisor

cd minos
./build.sh start supervisor --tank_ip ${tank_server_ip} --tank_port ${tank_server_port}

When starting supervisor for the first time, the tank_ip and tank_port must be specified.

After starting supervisor on the destination machine, you can open the supervisord web interface to view the processes it manages. For example, if supervisord listens on port 9001, the URL is http://${serving_machine_ip}:9001/.

Stop Supervisor

./build.sh stop supervisor

Monitor Processes

We use Superlance to monitor processes. Superlance is a package of plug-in utilities for monitoring and controlling processes that run under supervisor.

We integrated superlance-0.7 into our supervisor system and use its crashmail tool to monitor all processes. When a process exits unexpectedly, crashmail sends an alert email to a configurable mailing list.
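Crashmail runs as a supervisor event listener. The loop below is a rough sketch of how such a listener can detect a crash. It is not superlance's code. It follows the event-listener protocol from the supervisor documentation: write READY, read one header line, read `len` bytes of payload, then acknowledge with `RESULT 2\nOK`. A PROCESS_STATE_EXITED event whose payload carries `expected:0` counts as an unexpected exit. Batching on TICK_5 events and sending the email, which crashmailbatch.py does, are left to the hypothetical `on_crash` callback. Written for Python 3.

```python
import sys


def parse_tokens(line):
    """Parse supervisor's 'key:value key:value ...' format into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def is_unexpected_exit(headers, payload):
    """True if the event says a process exited although it was not expected to."""
    if headers.get("eventname") != "PROCESS_STATE_EXITED":
        return False
    return parse_tokens(payload).get("expected") == "0"


def listen(on_crash, stdin=sys.stdin, stdout=sys.stdout):
    """Minimal event-listener loop: READY, read header and payload, acknowledge.

    stdout is the protocol channel to supervisord, so on_crash must not print
    to it; log to stderr or send mail instead.
    """
    while True:
        stdout.write("READY\n")
        stdout.flush()
        line = stdin.readline()
        if not line:
            return  # stdin closed: supervisord is shutting the listener down
        headers = parse_tokens(line)
        payload = stdin.read(int(headers["len"]))
        if is_unexpected_exit(headers, payload):
            on_crash(parse_tokens(payload)["processname"])
        stdout.write("RESULT 2\nOK")
        stdout.flush()
```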

We configure crashmail as an auto-started process, so it starts working automatically when supervisor starts. The following example, taken from minos/build/template/supervisord.conf.tmpl, shows how to configure crashmail:

command=python superlance/crashmailbatch.py \
        --toEmail="alert@example.com" \
        --fromEmail="robot@example.com" \
        --password="123456" \
        --smtpHost="mail.example.com" \
        --tickEvent=TICK_5

Note: Related settings, such as the server port and username, are set in minos/build/template/supervisord.conf.tmpl; change them there if you don't want the default values.

Using Client


Make sure you have installed Tank and Supervisor on your production machines.

A Simple Tutorial

Here we would like to show you how to use the client in a simple tutorial. In this tutorial we will use Minos to deploy an HDFS service, which itself requires the deployment of a ZooKeeper service.

The following are some conventions we will use in this tutorial:

  • Cluster type: we define three types of clusters: tst for testing, prc for offline processing, and srv for online serving.
  • ZooKeeper cluster name: we define the ZooKeeper cluster name using the IDC short name and the cluster type. For example, dptst is used to name a testing cluster at IDC dp.
  • Other service cluster names: we name other service clusters using the corresponding ZooKeeper cluster name and the name of the business the service is intended to serve. For example, dptst-example is the name of a testing cluster used for example tests.
  • Configuration file names: every service has a corresponding configuration file named ${service}-${cluster}.cfg. For example, the dptst ZooKeeper service's configuration file is zookeeper-dptst.cfg, and the dptst example HDFS service's configuration file is hdfs-dptst-example.cfg.
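These naming rules are mechanical, so they can be written as a few helper functions. The helpers below are hypothetical and are not part of the Minos client. They assume that the ZooKeeper part of a cluster name contains no hyphen and that cluster types are the three-letter codes listed above, as in every example here. Written for Python 3.

```python
# The three cluster types from the conventions above.
CLUSTER_TYPES = {"tst": "testing", "prc": "offline processing", "srv": "online serving"}


def config_file_name(service, cluster):
    """Name of a service's configuration file: ${service}-${cluster}.cfg."""
    return "%s-%s.cfg" % (service, cluster)


def zookeeper_cluster(cluster):
    """ZooKeeper cluster a service cluster belongs to (text before the first '-')."""
    return cluster.split("-", 1)[0]


def parse_zookeeper_cluster(zk_cluster):
    """Split a ZooKeeper cluster name such as 'dptst' into ('dp', 'tst')."""
    idc, cluster_type = zk_cluster[:-3], zk_cluster[-3:]
    if not idc or cluster_type not in CLUSTER_TYPES:
        raise ValueError("not an <idc><type> cluster name: %r" % zk_cluster)
    return idc, cluster_type
```

The HDFS part of the tutorial relies on `zookeeper_cluster`: the cluster name dptst-example implies the ZooKeeper cluster dptst.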

Configuring deploy.cfg

There is a configuration file named deploy.cfg under the root directory of minos. You should first edit this file to set up the deployment environment. Make sure that all service packages are prepared and configured in deploy.cfg.

Configuring ZooKeeper

As described in the cluster naming conventions, we will set up a testing ZooKeeper cluster at the dp IDC, and its configuration file will be named zookeeper-dptst.cfg.

You can edit zookeeper-dptst.cfg under the config/conf/zookeeper directory to configure the cluster. The file is well commented and self-explanatory, so we will not explain it further here.

Setting up a ZooKeeper Cluster

To set up a ZooKeeper cluster, just do the following two steps:

  • Install a ZooKeeper package to the tank server:

    cd minos/client
    ./deploy install zookeeper dptst
  • Bootstrap the cluster, this is only needed once when the cluster is setup for the first time:

    ./deploy bootstrap zookeeper dptst

Here are some handy ways to manage the cluster:

  • Show the status of the ZooKeeper service:

    ./deploy show zookeeper dptst
  • Start/Stop/Restart the ZooKeeper cluster:

    ./deploy stop zookeeper dptst
    ./deploy start zookeeper dptst
    ./deploy restart zookeeper dptst
  • Clean up the ZooKeeper cluster:

    ./deploy cleanup zookeeper dptst
  • Rolling update the ZooKeeper cluster:

    ./deploy rolling_update zookeeper dptst

Configuring HDFS

Now it is time to configure the HDFS system. Here we set up a testing HDFS cluster named dptst-example, whose configuration file will be named hdfs-dptst-example.cfg, as explained in the naming conventions.

You can edit hdfs-dptst-example.cfg under the config/conf/hdfs directory to configure the cluster. The file is well commented and self-explanatory, so we will not explain it further here.

Setting Up HDFS Cluster

Setting up and managing an HDFS cluster is similar to setting up and managing a ZooKeeper cluster. The only difference is the cluster name, dptst-example, which implies that the corresponding ZooKeeper cluster is dptst:

./deploy install hdfs dptst-example
./deploy bootstrap hdfs dptst-example
./deploy show hdfs dptst-example
./deploy stop hdfs dptst-example
./deploy start hdfs dptst-example
./deploy restart hdfs dptst-example
./deploy rolling_update hdfs dptst-example --job=datanode
./deploy cleanup hdfs dptst-example
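Bootstrap is a one-time step, and HDFS needs its ZooKeeper cluster to be set up first. For both reasons it can help to script the initial bring-up. The sketch below is a minimal version written for Python 3. It assumes you run it from the directory that contains the minos checkout (hence the hypothetical default `cwd="minos/client"`). `deploy_cmd` and `bring_up` are hypothetical helpers that only invoke the ./deploy commands shown above.

```python
import subprocess

# Steps needed the first time a cluster is set up, in order.
FIRST_TIME_STEPS = ("install", "bootstrap")


def deploy_cmd(action, service, cluster, *extra):
    """argv for one Minos client call, e.g. ./deploy install hdfs dptst-example."""
    return ["./deploy", action, service, cluster] + list(extra)


def bring_up(clusters, dry_run=True, cwd="minos/client"):
    """Install and bootstrap each (service, cluster) pair in the order given.

    List a ZooKeeper cluster before the services that depend on it.
    With dry_run=True nothing is executed; the commands are only returned.
    """
    cmds = [deploy_cmd(step, service, cluster)
            for service, cluster in clusters
            for step in FIRST_TIME_STEPS]
    if not dry_run:
        for cmd in cmds:
            # check=True stops at the first failing step instead of carrying on.
            subprocess.run(cmd, cwd=cwd, check=True)
    return cmds
```

For example, `bring_up([("zookeeper", "dptst"), ("hdfs", "dptst-example")])` returns the install and bootstrap commands for ZooKeeper, followed by those for HDFS. Passing `dry_run=False` runs them in that order.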


The client tool also supports a handy command named shell, which you can use to manage files on HDFS, tables on HBase, jobs on YARN, and so on. Here are some examples of using the shell command to perform HDFS operations:

./deploy shell hdfs dptst-example dfs -ls /
./deploy shell hdfs dptst-example dfs -mkdir /test
./deploy shell hdfs dptst-example dfs -rm -R /test

You can run ./deploy --help to see the detailed help messages.

Installing Owl

Owl must be installed on the same machine where you use the Client component, because both use the same set of cluster configuration files.


Install Gnuplot

Gnuplot is required by OpenTSDB. Install it with the following command:

Centos: sudo yum install gnuplot
Ubuntu: sudo apt-get install gnuplot

Install MySQL

Ubuntu: sudo apt-get install mysql-server mysql-client
CentOS: sudo yum install mysql-server mysql mysql-devel


Configure the clusters you want Owl to monitor in minos/config/owl/collector.cfg. The following example shows how to modify the configuration:

# service name(space separated)
service = hdfs hbase

# cluster name(space separated)
# job name(space separated)
jobs=journalnode namenode datanode
# url for collector, usually JMX url

Note: Some other settings, such as the OpenTSDB port, are set in minos/build/minos_config.py. You can change the default ports there to avoid port conflicts.
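Every value in collector.cfg is a space-separated list. The small parser below illustrates how such key/value lines read back. It is not Owl's loader. It treats the file as flat and simply skips any section headers the real file may contain. The function name is hypothetical. Written for Python 3.

```python
def parse_space_separated(text):
    """Parse 'key = v1 v2 ...' lines into {key: [v1, v2, ...]}, skipping comments."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue  # blank line, comment, or section header
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        result[key.strip()] = value.split()
    return result
```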

Start Owl

cd minos
./build.sh start owl --owl_ip ${your_local_ip} --owl_port ${port_owl_monitor_will_listen}

After starting Owl, you can open its web interface. For example, if Owl listens on port 8088, the URL is http://${owl_machine_ip}:8088/.

Stop Owl

./build.sh stop owl


FAQ

  1. When installing Mysql-python, you may get the error _mysql.c:44:23: error: my_config.h: No such file or directory (CentOS) or EnvironmentError: mysql_config not found (Ubuntu). mysql_config is part of mysql-devel (libmysqlclient-dev on Ubuntu), so install that package and then retry the Mysql-python installation:

    Ubuntu: sudo apt-get install libmysqlclient-dev
    CentOS: sudo yum install mysql-devel
  2. When installing Twisted, you may get the error CompressionError: bz2 module is not available, and the Python build output shows:

    Python build finished, but the necessary bits to build these modules were not found:
    _sqlite3           _tkinter           bsddb185
    bz2                dbm                dl

    In that case, install the bz2 and sqlite3 development libraries, for example:

    sudo apt-get install libbz2-dev
    sudo apt-get install libsqlite3-dev
  3. When setting up a stand-alone HBase on Ubuntu, it may fail to start because of the /etc/hosts file. Refer to http://hbase.apache.org/book/quickstart.html#ftn.d2907e114 to fix the problem.

  4. When using the Minos client to install a service package, if you get the error socket.error: [Errno 101] Network is unreachable, check the Tank server configuration in the deploy.cfg file; it may be missing.
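Items 2 and 4 can be checked before you run the build or the client. The sketch below is a minimal version, written for Python 3; run it with the interpreter you plan to use. `missing_modules` tries to import the bz2 and sqlite3 modules from item 2. `tank_reachable` opens a TCP connection to a Tank host and port, which you pass in by hand; the sketch does not read deploy.cfg. Both function names are hypothetical.

```python
import importlib
import socket


def missing_modules(names=("bz2", "sqlite3")):
    """Return the modules in names that this Python interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def tank_reachable(host, port, timeout=3.0):
    """True if a TCP connection to the Tank server can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers 'Network is unreachable', refused connections, timeouts
        return False
```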

Note: See the Minos Wiki for more advanced features.


