2015/09/26

Changing the HDFS block size in Hadoop

Hadoop 2.7 sets the default HDFS block size to 128 MB, which is oversized for a small Hadoop cluster used for teaching, so I changed it to 64 MB.

----
First, check the status of the HDFS datanodes
----
# This can also be viewed in the web UI
http://localhost:50070/dfshealth.html#tab-datanode 

# Or from the command line
[hadoop@hnamenode ~]$ hdfs dfsadmin -report
Configured Capacity: 64280172384256 (58.46 TB)
Present Capacity: 64247015936000 (58.43 TB)
DFS Remaining: 64238110507008 (58.42 TB)
DFS Used: 8905428992 (8.29 GB)
DFS Used%: 0.01%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (17):

Name: 192.168.1.11:50010 (hdatanode11.cm.nsysu.edu.tw)
Hostname: hdatanode11.cm.nsysu.edu.tw
Decommission Status : Normal
Configured Capacity: 3998832504832 (3.64 TB)
DFS Used: 407236608 (388.37 MB)
Non DFS Used: 745283584 (710.76 MB)
DFS Remaining: 3997679984640 (3.64 TB)
DFS Used%: 0.01%
DFS Remaining%: 99.97%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Sep 26 10:37:29 CST 2015

.... skip ...

Name: 192.168.1.100:50010 (hnamenode.cm.nsysu.edu.tw)
Hostname: hnamenode.cm.nsysu.edu.tw
Decommission Status : Normal
Configured Capacity: 298852306944 (278.33 GB)
DFS Used: 2968330240 (2.76 GB)
Non DFS Used: 21231382528 (19.77 GB)
DFS Remaining: 274652594176 (255.79 GB)
DFS Used%: 0.99%
DFS Remaining%: 91.90%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Sep 26 10:37:30 CST 2015
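
The report prints each capacity twice: as a raw byte count and as a rounded figure in parentheses. The sketch below pulls the byte count out of two of the lines above and redoes the rounding, which suggests the TB and GB figures are binary units (powers of 1024). `parse_bytes` is an illustrative helper, not part of any Hadoop tool, and it is only meant for the capacity-style lines.

```python
import re


def parse_bytes(line: str) -> int:
    """Extract the raw byte count from a line such as 'DFS Used: 8905428992 (8.29 GB)'."""
    match = re.search(r":\s*(\d+)\s*\(", line)
    if match is None:
        raise ValueError(f"no byte count in: {line!r}")
    return int(match.group(1))


# Two lines copied from the dfsadmin report above.
capacity = parse_bytes("Configured Capacity: 64280172384256 (58.46 TB)")
used = parse_bytes("DFS Used: 8905428992 (8.29 GB)")

print(round(capacity / 1024**4, 2))  # 58.46
print(round(used / 1024**3, 2))      # 8.29
```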


----
Change the Hadoop HDFS block size (dfs.blocksize)
----
The default dfs.blocksize value, as shown at http://localhost:19888/conf
<property>
<name>dfs.blocksize</name>
<value>134217728</value> <!-- 134217728 (128 MB), to be changed to 67108864 (64 MB) -->
<source>hdfs-default.xml</source>
</property>

# The default comes from hdfs-default.xml; for the full list of default settings see
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

# It can also be checked from the command line
[hadoop@hnamenode ~]$ hdfs getconf -confKey dfs.blocksize
134217728
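
The value printed above is in bytes. As a quick sanity check, the following sketch converts MiB to bytes for the two sizes discussed in this note. The helper name `mib_to_bytes` is made up for illustration and is not something Hadoop provides.

```python
def mib_to_bytes(mib: int) -> int:
    """Convert mebibytes (1 MiB = 1024 * 1024 bytes) to bytes."""
    return mib * 1024 * 1024


# The Hadoop 2.7 default and the target value used in this note.
print(mib_to_bytes(128))  # 134217728
print(mib_to_bytes(64))   # 67108864
```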

# Check the block size of a single file
[hadoop@hnamenode ~]$ hdfs dfs  -stat %o /home/hadoop/test_map.R
134217728

# Use fsck to see how a file's blocks are laid out
[hadoop@hnamenode ~]$ hdfs fsck /home/hadoop/test_map.R -blocks
Connecting to namenode via http://hnamenode:50070/fsck?ugi=hadoop&blocks=1&path=%2Fhome%2Fhadoop%2Ftest_map.R
FSCK started by hadoop (auth:SIMPLE) from /192.168.1.100 for path /home/hadoop/test_map.R at Sat Sep 26 14:59:54 CST 2015
.Status: HEALTHY
 Total size: 81 B
 Total dirs: 0
 Total files: 1
 Total symlinks: 0
 Total blocks (validated): 1 (avg. block size 81 B)
 Minimally replicated blocks: 1 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 3.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 17
 Number of racks: 1
FSCK ended at Sat Sep 26 14:59:54 CST 2015 in 0 milliseconds


The filesystem under path '/home/hadoop/test_map.R' is HEALTHY
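
The fsck result shows the 81-byte file occupying a single block of 81 B rather than 128 MB: dfs.blocksize is an upper bound on block length, and the last (or only) block holds just the bytes that remain. A rough model of that splitting is sketched below. `block_lengths` is an illustrative helper, not an HDFS API, and it ignores replication.

```python
def block_lengths(file_size: int, block_size: int) -> list:
    """Split file_size bytes into consecutive blocks of at most block_size bytes."""
    full_blocks, remainder = divmod(file_size, block_size)
    lengths = [block_size] * full_blocks
    if remainder:
        # The final block only holds the leftover bytes.
        lengths.append(remainder)
    return lengths


# The 81-byte file checked above, with the 128 MB default block size.
print(block_lengths(81, 134217728))  # [81]
```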

# Stop HDFS and YARN first
# stop-all.sh

# Edit hdfs-site.xml and add the following to change dfs.blocksize to 64 MB (the system default is 128 MB).
<!-- by mtchang -->
<property>
<name>dfs.blocksize</name>
<value>67108864</value>
</property>
<!-- change block size to 64MB -->
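
For anyone who would rather script this edit than make it by hand, here is a minimal sketch using Python's standard XML library. It assumes the usual Hadoop layout of a `<configuration>` root containing `<property>` elements; `set_property` is a hypothetical helper, and the one-line sample string merely stands in for a real hdfs-site.xml.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml: str, name: str, value: str) -> str:
    """Return config_xml with the named property set to value, adding it if absent."""
    # Note: the default parser drops XML comments from the input.
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing property with that name: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# A minimal stand-in for hdfs-site.xml; a real file would hold more properties.
sample = "<configuration></configuration>"
updated = set_property(sample, "dfs.blocksize", "67108864")
print(updated)
```

Working on a string rather than a file path keeps the sketch free of side effects; reading and writing the actual file is left to the caller.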

# Restart after making the change
# start-all.sh

# Check the block size after the change
[hadoop@hnamenode hadoop]$ hdfs getconf -confKey dfs.blocksize
67108864

# Upload a large file to HDFS and see what happens (307474871 bytes, about 293 MB, according to the fsck output further down).
[hadoop@hnamenode data]$ hdfs dfs -put big_number_1G.RData /home/hadoop/

# The new file uses a 64 MB block size
[hadoop@hnamenode data]$ hdfs dfs  -stat %o /home/hadoop/big_number_1G.RData 
67108864

# But files that already existed keep their original block size
[hadoop@hnamenode data]$ hdfs dfs  -stat %o /public/data/big_num_400t1t.RData
134217728
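
The block size is recorded per file when the file is written, which is why the older file still reports 134217728. To give an existing file the new size, it has to be written again. The sketch below only assembles such a command and does not run anything against the cluster. It assumes the generic `-D` option is used to set dfs.blocksize for a single upload, and `put_with_block_size` is an illustrative helper.

```python
def put_with_block_size(local_path: str, hdfs_dir: str, block_size: int) -> list:
    """Build (but do not run) an 'hdfs dfs -put' command that sets dfs.blocksize for this upload."""
    return [
        "hdfs", "dfs",
        "-D", f"dfs.blocksize={block_size}",  # generic option, placed before the subcommand
        "-put", local_path, hdfs_dir,
    ]


# Same file and target directory as the upload in this note.
cmd = put_with_block_size("big_number_1G.RData", "/home/hadoop/", 67108864)
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run` on a machine with the Hadoop client installed.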

# Check the state of the file's blocks
[hadoop@hnamenode data]$ hdfs fsck /home/hadoop/big_number_1G.RData -blocks
Connecting to namenode via http://hnamenode:50070/fsck?ugi=hadoop&blocks=1&path=%2Fhome%2Fhadoop%2Fbig_number_1G.RData
FSCK started by hadoop (auth:SIMPLE) from /192.168.1.100 for path /home/hadoop/big_number_1G.RData at Sat Sep 26 15:14:07 CST 2015
.Status: HEALTHY
 Total size: 307474871 B
 Total dirs: 0
 Total files: 1
 Total symlinks: 0
 Total blocks (validated): 5 (avg. block size 61494974 B)
 Minimally replicated blocks: 5 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 3.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 16
 Number of racks: 1
FSCK ended at Sat Sep 26 15:14:07 CST 2015 in 1 milliseconds

The filesystem under path '/home/hadoop/big_number_1G.RData' is HEALTHY
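
The block count and average block size reported by fsck can be reproduced from the total size alone. The sketch below does that arithmetic for the new 64 MB setting and, for comparison, for the old 128 MB default. `block_count` is an illustrative helper, and the figures are per file, before replication.

```python
def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed for file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


total_size = 307474871  # "Total size" from the fsck output above

blocks_64mb = block_count(total_size, 67108864)
blocks_128mb = block_count(total_size, 134217728)

print(blocks_64mb)                # 5
print(total_size // blocks_64mb)  # 61494974
print(blocks_128mb)               # 3
```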

# Check the block size of the newly uploaded file
[hadoop@hnamenode data]$ hdfs dfs  -stat %o /home/hadoop/big_number_1G.RData
67108864

Commands:
http://hadoop.apache.org/docs/r1.2.1/commands_manual.html#Generic+Options
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer

Explanation:
http://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html#Rebalancer
https://www.quora.com/How-do-I-check-HDFS-blocksize-default-custom
