测试Hive INT数据类型

测试HIVE如何处理NUMBER类型数据


  • 以TEST.NUMTEST为例

Environment

Product Version
HDP 2.4.0.0-169
Sqoop 1.4.6.2.4.0.0-169
Hive 1.2.1000.2.4.0.0-169
Oracle 11g.203

TEST.NUMTEST表结构:

表内容:

Purpose

测试出Hive处理NUMBER类型的结果

Procedure

1、不进行map-column-hive,直接导入,sqoop语句如下:

1
sqoop import --connect 'jdbc:oracle:thin:@10.143.90.12:1521:infodb' --username bdc  --password bdc --table TEST.NUMTEST --hive-import --hive-database TEST --hive-table  numtest  --fields-terminated-by "\t" --lines-terminated-by "\n"    --hive-overwrite --delete-target-dir  --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N'  -m 1

HIVE处理结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
hive> desc NUMTEST;
OK
three double
nine double
ten double
fifteen double
nineteen double
twentytwo double
Time taken: 0.429 seconds, Fetched: 6 row(s)
hive> select * from NUMTEST;
OK
0.0 1.23456789E8 1.23456789E9 1.23456789123456E14 1.23456789123456794E18 1.2345678901234568E21
10.0 NULL NULL NULL NULL NULL
Time taken: 0.165 seconds, Fetched: 2 row(s)

2、转化规则如下:
–map-column-hive THREE=tinyint,FIFTEEN=bigint,TEN=int,NINE=int,TWENTYTWO=bigint,NINETEEN=bigint

HIVE处理结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
hive> desc NUMTEST;
OK
three tinyint
nine int
ten int
fifteen bigint
nineteen bigint
twentytwo bigint
Time taken: 0.427 seconds, Fetched: 6 row(s)
hive> select * from NUMTEST;
OK
0 123456789 1234567890 123456789123456 1234567891234567890 NULL
10 NULL NULL NULL NULL NULL
Time taken: 0.089 seconds, Fetched: 2 row(s)

3、单独测试,NUMBER(15)转为INT

HIVE处理结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
hive> desc NUMTEST;
OK
three tinyint
nine int
ten int
fifteen int
nineteen bigint
twentytwo bigint
Time taken: 0.424 seconds, Fetched: 6 row(s)
hive> select * from NUMTEST;
OK
0 123456789 1234567890 NULL 1234567891234567890 NULL
10 NULL NULL NULL NULL NULL
Time taken: 0.135 seconds, Fetched: 2 row(s)

4、单独测试,NUMBER(19)转为INT

1
2
3
4
5
6
7
8
9
10
11
12
13
14
hive> desc NUMTEST;
OK
three tinyint
nine int
ten int
fifteen int
nineteen int
twentytwo bigint
Time taken: 0.437 seconds, Fetched: 6 row(s)
hive> select * from NUMTEST;
OK
0 123456789 1234567890 NULL NULL NULL
10 NULL NULL NULL NULL NULL
Time taken: 0.113 seconds, Fetched: 2 row(s)

5、测试NUMBER(22)转为string:
–map-column-hive THREE=tinyint,FIFTEEN=bigint,TEN=int,NINE=int,TWENTYTWO=string,NINETEEN=bigint

HIVE结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
hive> desc NUMTEST;
OK
three tinyint
nine int
ten int
fifteen bigint
nineteen bigint
twentytwo string
Time taken: 0.426 seconds, Fetched: 6 row(s)
hive> select * from NUMTEST;
OK
0 123456789 1234567890 123456789123456 1234567891234567890 1234567890123456789012
10 NULL NULL NULL NULL NULL
Time taken: 0.132 seconds, Fetched: 2 row(s)

6、测试NUMBER(11)转为INT和BIGINT

INT的HIVE结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
hive> desc NUMTEST;
OK
three tinyint
nine int
ten int
fifteen bigint
nineteen bigint
twentytwo string
eleven int
Time taken: 0.43 seconds, Fetched: 7 row(s)
hive> select * from NUMTEST;
OK
0 123456789 1234567890 123456789123456 1234567891234567890 1234567890123456789012 NULL
10 NULL NULL NULL NULL NULL NULL
Time taken: 0.1 seconds, Fetched: 2 row(s)

BIGINT的HIVE结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
hive> desc NUMTEST;
OK
three tinyint
nine int
ten int
fifteen bigint
nineteen bigint
twentytwo string
eleven bigint
Time taken: 0.426 seconds, Fetched: 7 row(s)
hive> select * from NUMTEST;
OK
0 123456789 1234567890 123456789123456 1234567891234567890 1234567890123456789012 12345678901
10 NULL NULL NULL NULL NULL NULL
Time taken: 0.132 seconds, Fetched: 2 row(s)

7、测试NUMBER无长度,无精度类型,需要转成string

Hive的结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
hive> desc numtest;
OK
three tinyint
nine int
ten int
fifteen bigint
nineteen bigint
twentytwo string
eleven bigint
aa string
bb double
Time taken: 0.425 seconds, Fetched: 9 row(s)
hive> select * from numtest;
OK
0 123456789 1234567890 123456789123456 1234567891234567890 1234567890123456789012 12345678901 123456789012345 1.234567812E7
10 NULL NULL NULL NULL NULL NULL 123.99 NULL
Time taken: 0.176 seconds, Fetched: 2 row(s)

8、双精度double类型,转成string

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
hive> desc numtest;
OK
three tinyint
nine int
ten int
fifteen bigint
nineteen bigint
twentytwo string
eleven bigint
aa string
bb string
Time taken: 0.553 seconds, Fetched: 9 row(s)
hive> select * from numtest;
OK
0 123456789 1234567890 123456789123456 1234567891234567890 1234567890123456789012 12345678901 123456789012345 12345678.12
10 NULL NULL NULL NULL NULL NULL 123.99 NULL
Time taken: 0.162 seconds, Fetched: 2 row(s)

Conclusion

HIVE转化Oracle的NUMBER数据类型结论如下:

Orace数据类型 Hive数据类型
NUMBER(1)~NUMBER(3) TINYINT
NUMBER(4)~NUMBER(10) INT
NUMBER(11)~NUMBER(19) BIGINT
NUMBER(20)以上 STRING

HIVE数据类型转换

所有类型 BOOLEAN TINYINT SMALLINT INT BIGINT FLOAT DOUBLE DECIMAL STRING VARCHAR TIMESTAMP DATE BINARY
BOOLEAN true false false false false false false false false false false false false
TINYINT false true true true true true true true true true false false false
SMALLINT false false true true true true true true true true false false false
INT false false false true true true true true true true false false false
BIGINT false false false false true true true true true true false false false
FLOAT false false false false false true true true true true false false false
DOUBLE false false false false false false true true true true false false false
DECIMAL false false false false false false false true true true false false false
STRING false false false false false false true true true true false false false
VARCHAR false false false false false false true true true true false false false
TIMESTAMP false false false false false false false false true true true false false
DATE false false false false false false false false true true false true false
BINARY false false false false false false false false false false false false true

自定义Ambari Alerts

Update thresholds for the Ambari alert


  • 以Host Disk Usage为例

Environment

Product Version
Ambari 2.2.1.0-169
HDP 2.4.0.0-169

Purpose

修改Host Disk Usage warning value,默认50%

Procedure

1、找到集群名字

1
2
3
4
5
6
7
8
9
10
11
12
13
[root@gmjk-dsj8 ~]# curl --user admin:admin http://localhost:8080/api/v1/clusters/
{
"href" : "http://localhost:8080/api/v1/clusters/",
"items" : [
{
"href" : "http://localhost:8080/api/v1/clusters/test",
"Clusters" : {
"cluster_name" : "test",
"version" : "HDP-2.4"
}
}
]
}

2、输出集群所有配置

1
2
3
4
[root@gmjk-dsj8 ~]# curl --user admin:admin http://localhost:8080/api/v1/clusters/test > /root/0620
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 885k 0 885k 0 0 2475k 0 --:--:-- --:--:-- --:--:-- 2488k

3、找到Host Disk Usage配置,URL应该是

1
http://gmjk-dsj8:8080/api/v1/clusters/test/alert_definitions/3

4、查看Host Disk Usage配置细节

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
[root@gmjk-dsj8 ~]# curl -u admin:admin -X GET http://gmjk-dsj8:8080/api/v1/clusters/test/alert_definitions/3
{
"href" : "http://gmjk-dsj8:8080/api/v1/clusters/test/alert_definitions/3",
"AlertDefinition" : {
"cluster_name" : "test",
"component_name" : "AMBARI_AGENT",
"description" : "This host-level alert is triggered if the amount of disk space used goes above specific thresholds. The default threshold values are 70% for WARNING and 80% for CRITICAL",
"enabled" : true,
"id" : 3,
"ignore_host" : false,
"interval" : 1,
"label" : "Host Disk Usage",
"name" : "ambari_agent_disk_usage",
"scope" : "HOST",
"service_name" : "AMBARI",
"source" : {
"parameters" : [
{
"display_name" : "Minimum Free Space",
"description" : "The overall amount of free disk space left before an alert is triggered.",
"name" : "minimum.free.space",
"value" : "5.0E9",
"type" : "NUMERIC",
"units" : "bytes",
"threshold" : "WARNING"
},
{
"display_name" : "Warning",
"description" : "The percent of disk space consumed before a warning is triggered.",
"name" : "percent.used.space.warning.threshold",
"value" : "0.7",
"type" : "PERCENT",
"units" : "%",
"threshold" : "WARNING"
},
{
"display_name" : "Critical",
"description" : "The percent of disk space consumed before a critical alert is triggered.",
"name" : "percent.free.space.critical.threshold",
"value" : "0.8",
"type" : "PERCENT",
"units" : "%",
"threshold" : "CRITICAL"
}
],
"path" : "alert_disk_space.py",
"type" : "SCRIPT"
}
}
}

5、拷贝以上输出,去掉”href”行,修改需要的value和Description

6、上传修改后的JSON,注意 PUT -d 后的单引号

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
[root@gmjk-dsj8 ~]# curl -u admin:admin -H 'X-Requested-By:admin' -X PUT  -d '{
> "AlertDefinition" : {
> "cluster_name" : "test",
> "component_name" : "AMBARI_AGENT",
> "description" : "This host-level alert is triggered if the amount of disk space used goes above specific thresholds. The default threshold values are 70% for WARNING and 80% for CRITICAL",
> "enabled" : true,
> "id" : 3,
> "ignore_host" : false,
> "interval" : 1,
> "label" : "Host Disk Usage",
> "name" : "ambari_agent_disk_usage",
> "scope" : "HOST",
> "service_name" : "AMBARI",
> "source" : {
> "parameters" : [
> {
> "name" : "minimum.free.space",
> "display_name" : "Minimum Free Space",
> "units" : "bytes",
> "value" : 5.0E9,
> "description" : "The overall amount of free disk space left before an alert is triggered.",
> "type" : "NUMERIC",
> "threshold" : "WARNING"
> },
> {
> "name" : "percent.used.space.warning.threshold",
> "display_name" : "Warning",
> "units" : "%",
> "value" : 0.7,
> "description" : "The percent of disk space consumed before a warning is triggered.",
> "type" : "PERCENT",
> "threshold" : "WARNING"
> },
> {
> "name" : "percent.free.space.critical.threshold",
> "display_name" : "Critical",
> "units" : "%",
> "value" : 0.8,
> "description" : "The percent of disk space consumed before a critical alert is triggered.",
> "type" : "PERCENT",
> "threshold" : "CRITICAL"
> }
> ],
> "path" : "alert_disk_space.py",
> "type" : "SCRIPT"
> }
> }
> }' http://gmjk-dsj8:8080/api/v1/clusters/test/alert_definitions/3

7、检查修改后的监控项

Hadoop重新做HA

1、首先在Active上执行以下命令,手动备份整个name目录:

1
2
3
4
5
6
7
8
#关闭故障自动控制器
hadoop-daemon.sh stop zkfs

#进入安全模式
hdfs dfsadmin -safemode enter

#刷新editslog到fsimage
hdfs dfsadmin -saveNamespace

2、在Standby上备份整个name和journal目录,在执行:

1
2
hadoop-daemon.sh stop zkfc
hdfs namenode -bootstrapStandby

若报以下错:
FATAL ha.BootstrapStandby: Unable to read transaction ids 10-100 from the configured shared edits storage qjournal://1.1.1.1:8485;1.1.1.2:8485/sec-hdfs-cluster. Please copy these logs into the shared edits storage or call saveNamespace on the active node.
Error: Gap in transactions. Expected to be able to read up until at least txid 10 but unable to find any edit logs containing txid 10

需要将Active整个name目录复制到Standby,然后启动namenode:

1
2
scp -r /data/hadoop/name $standby_ip:/data/hadoop
hadoop-daemon.sh start namenode

CentOS安装加固总结

  • 要更新的软件:
    bash,openssl,glibc

配置密码不允许重复次数和复杂度

#编辑/etc/pam.d/system-auth文件,找到以下行:
password    sufficient    pam_unix.so sha512 shadow nullok try_first_pass use_authtok
#后面添加  remember=3  代表不允许使用之前3次用过的密码
#找到以下行
password    requisite     pam_cracklib.so try_first_pass retry=3 type=

修改为:
password    requisite     pam_cracklib.so try_first_pass retry=6 ucredit=-1 lcredit=-1 dcredit=-1 ocredit=-1
#retry=N 重试N次后返回密码错误
#difok=N 新密码必需与旧密码不同的位数
#dcredit=N N>=0表示密码最多N个数字,反之最少N个数字
#lcredit=N 小写字母个数
#ucredit=N 小写字母个数
#ccredit=N 小写字母个数
#minclass=N 密码组成(大/小字母,数字,特殊字符)
#root不受此限制,普通用户登录后改密码才受限制

禁止root登录

#更改/etc/ssh/sshd_config,之后重启sshd
PermitRootLogin yes
改成:PermitRootLogin no

修改密码长度和过期时间:

#改 /etc/login.defs
PASS_MAX_DAYS   90
PASS_MIN_DAYS   1
PASS_MIN_LEN    9
PASS_WARN_AGE   7
解释:
# PASS_MAX_DAYS   密码使用的最长期限
# PASS_MIN_DAYS   密码允许更改的最短期限
# PASS_MIN_LEN    能接受的密码最短长度
# PASS_WARN_AGE   密码过期之前的提醒天数

设置连续失败次数和失败锁定时间

修改(/etc/pam.d/system-auth和/etc/pam.d/sshd)分别在开始位置加入以下内容:

auth    required    pam_tally2.so    deny=6    unlock_time=300    even_deny_root root_unlock_time=120
#最大失败次数6次,超过6次锁定300s,root用户锁定120s
#两个文件中"required pam_tally2.so deny"要删掉

umask降权,设置历史命令数

默认uid>199的umask是022,uid<199的umask是022,可以在/etc/profile里新加一行”umask 027”,修改HISSIZE=100,即最大历史命令是100

修改ssh banner

自定义banner文件:echo “Only Gomeholdings users can login” > /etc/ssh/gomebanner
再修改/etc/ssh/sshd_config 新增”Banner /etc/ssh/gomebanner”

修改ssh访问控制

/etc/hosts.allow文件
如: sshd:10.35.0.0/255.255.0.0
禁止访问的文件 /etc/hosts.deny sshd:all
hosts.allow文件优先

空闲超时退出

改/etc/profile 追加 TMOUT=300

su限制

修改/etc/login.defs 追加”SU_WHEEL_ONLY yes”
修改/etc/pam.d/su
将”auth required pam_wheel.so use_uid”取消注释
将要配置su权限的用户加到wheel组:
usermod -G wheel work

取消CTRL+ALT+DELETE

修改/etc/init/control-alt-delete.conf,注释掉以下内容:
start on control-alt-delete
exec /sbin/shutdown -r now “Control-Alt-Delete pressed”

对于Centos7只需:rm -rf /usr/lib/systemd/ctrl-alt-del.target

修改motd,登陆系统后的提示信息

echo “Illegal users are not allowed to login, Only authorized users can login” > /etc/motd
当用显示器连接linux主机的时候,屏幕会显示系统版本等相关信息,可以将他取消掉,将/etc/issue和/etc/issue.net改名即可

禁用不用的用户和组

改/etc/passwd
#adm:x:3:4:adm:/var/adm:/sbin/nologin
#lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
#sync:x:5:0:sync:/sbin:/bin/sync
#shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
#halt:x:7:0:halt:/sbin:/sbin/halt
#uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
#operator:x:11:0:operator:/root:/sbin/nologin
#games:x:12:100:games:/usr/games:/sbin/nologin
#gopher:x:13:30:gopher:/var/gopher:/sbin/nologin
改/etc/group
#adm:x:4:root,adm,daemon
#lp:x:7:daemon,lp
#uucp:x:14:uucp
#games:x:20:
#dip:x:40:

禁用没用的服务

chkconfig acpid off
chkconfig anacron off
chkconfig auditd off
chkconfig avahi-daemon off
chkconfig autofs off
chkconfig atd off
chkconfig bluetooth off
chkconfig cpuspeed off
chkconfig cups off
chkconfig firstboot off
chkconfig gpm off
chkconfig haldaemon off
chkconfig hidd off
chkconfig ip6tables off
chkconfig iptables off
chkconfig lvm2-monitor off
chkconfig mcstrans off
chkconfig mdmonitor off
chkconfig messagebus off
chkconfig nfslock off
chkconfig pcscd off
chkconfig rawdevices off
chkconfig readahead_early off
chkconfig restorecond off
chkconfig rpcidmapd off
chkconfig sendmail off
chkconfig xfs off
chkconfig yum-updatesd off
chkconfig NetworkManager off
,