@chenyaofo
2018-08-15T13:57:22.000000Z
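This benchmark uses the HMDB51 dataset. The compressed archive is about 2.0GB and contains 6817 videos.
The videos were split into frames with the opencv library, giving 639417 frames in total. The frame data is 15.4GB (16.6GB as allocated on disk), an average of 25.25KB per frame.
All HMDB51 frames were then converted to the MIO format with the MIO library. The resulting index is 7.4MB and the object data is 16GB.
To see only the results, skip to the summary table at the end of the article.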
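The article does not describe MIO's on-disk layout beyond "an index plus object data". As a rough illustration of that general idea only (not MIO's actual format or API), the sketch below packs all frames into one file and keeps an in-memory index of `(offset, length)` pairs. The names `pack` and `fetch_many` are hypothetical.

```python
import os
import tempfile


def pack(collections, data_path):
    """Write every object back to back into one file.

    Returns an index: one list per collection, holding an
    (offset, length) pair for each object in that collection.
    """
    index = []
    offset = 0
    with open(data_path, "wb") as f:
        for objects in collections:
            entries = []
            for blob in objects:
                f.write(blob)
                entries.append((offset, len(blob)))
                offset += len(blob)
            index.append(entries)
    return index


def fetch_many(data_path, index, collection_id, object_ids):
    """Read selected objects of one collection using seek + read on a single file."""
    out = []
    with open(data_path, "rb") as f:
        for i in object_ids:
            off, length = index[collection_id][i]
            f.seek(off)
            out.append(f.read(length))
    return out


# Tiny demo with fake "frames" (hypothetical data, two collections).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "objects.bin")
    videos = [[b"frame-a0", b"frame-a1", b"frame-a2"], [b"frame-b0", b"frame-b1"]]
    index = pack(videos, path)
    picked = fetch_many(path, index, 0, [0, 2])
    print(picked)
```

With a layout like this, reading a frame needs no per-file directory lookup or `open` call, which is one plausible reason a packed format can beat one-file-per-frame storage on a spinning disk.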
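Before every run, all caches (page cache, dentries and inodes) are dropped with the following command, run as root, so that every test starts from the same cold-cache state:
sync; echo 3 > /proc/sys/vm/drop_caches
Each test script is run 3 times and the results are averaged.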
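The procedure above (drop caches, run, repeat three times, average) can be sketched as a small helper. This is only an illustration of the procedure, not the script the author used. `timed_runs` and `drop_caches` are hypothetical names, and it uses `time.perf_counter` where the listings in this article use `time.time`.

```python
import statistics
import subprocess
import time


def drop_caches():
    """Flush dirty pages, then drop page cache, dentries and inodes (Linux, needs root)."""
    subprocess.run("sync; echo 3 > /proc/sys/vm/drop_caches", shell=True, check=True)


def timed_runs(workload, runs=3, before_each=None):
    """Run `workload` several times; return the per-run costs and their mean, in seconds."""
    costs = []
    for _ in range(runs):
        if before_each is not None:
            before_each()          # e.g. drop_caches, so every run starts cold
        start = time.perf_counter()
        workload()
        costs.append(time.perf_counter() - start)
    return costs, statistics.mean(costs)


if __name__ == "__main__":
    # Dummy workload so the sketch runs without root or a dataset.
    costs, avg = timed_runs(lambda: sum(range(100000)))
    print("runs: {}, avg cost {:.4f}s".format(len(costs), avg))
```

For a real measurement one would pass the read loop as `workload` and `before_each=drop_caches`, running the script as root.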
Item | Specification |
---|---|
OS | Ubuntu 16.04.4 LTS (4.4.0-116-generic) |
CPU | Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz (4C8T) |
Memory | 16GB (no swap) |
Disk | Western Digital Blue (3.5-inch, 7200 rpm, 1TB) |
Both of the following tests read the data with a single thread.
import time
import pathlib

if __name__ == '__main__':
    root = pathlib.Path("/home/chenyaofo/datasets/hmdb51")
    start = time.time()
    total_size = 0
    # Directory layout: root/<category>/<video>/<5-digit frame index>.jpg
    for category in root.iterdir():
        for video in category.iterdir():
            size = len(list(video.iterdir()))
            # Read the frames of each video in index order.
            for i in range(size):
                b = (video / "{:05d}.jpg".format(i)).read_bytes()
                total_size += len(b)
    cost_time = time.time() - start
    total_size /= (1024 * 1024)  # bytes -> MB
    print("Normal sequence read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s."
          .format(total_size, cost_time, total_size / cost_time))
# test 1
Normal sequence read, fetch 15692.37MB in total, cost 230.56s, avg speed is 68.06MB/s.
# test 2
Normal sequence read, fetch 15692.37MB in total, cost 228.25s, avg speed is 68.75MB/s.
# test 3
Normal sequence read, fetch 15692.37MB in total, cost 229.54s, avg speed is 68.36MB/s.
# avg
cost 229.45s, speed is 68.39MB/s
import time
from torchlearning.mio import MIO

if __name__ == '__main__':
    m = MIO("/home/chenyaofo/datasets/hmdb51_mio")
    start = time.time()
    total_size = 0
    # Fetch all objects of each collection in turn.
    for i in range(m.size):
        objects = m.fetchall(i)
        for o in objects:
            total_size += len(o)
    cost_time = time.time() - start
    total_size /= (1024 * 1024)  # bytes -> MB
    print("MIO sequence read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s."
          .format(total_size, cost_time, total_size / cost_time))
# test 1
MIO sequence read, fetch 15692.37MB in total, cost 87.45s, avg speed is 179.44MB/s.
# test 2
MIO sequence read, fetch 15692.37MB in total, cost 89.71s, avg speed is 174.93MB/s.
# test 3
MIO sequence read, fetch 15692.37MB in total, cost 87.50s, avg speed is 179.34MB/s.
# avg
cost 88.22s, speed is 177.90MB/s
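The code below reads 8 randomly chosen frames from each video. The results also cover 1, 2 and 4 frames per video.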
import time
import random
import pathlib

if __name__ == '__main__':
    root = pathlib.Path("/home/chenyaofo/datasets/hmdb51")
    start = time.time()
    total_size = 0
    # Collect every video directory, then visit them in random order.
    all_videos = []
    for category in root.iterdir():
        for video in category.iterdir():
            all_videos.append(video)
    random.shuffle(all_videos)
    for video in all_videos:
        size = len(list(video.iterdir()))
        # Pick 8 distinct frame indices and read them in ascending order.
        to_sample_ids = sorted(random.sample(range(size), k=8))
        for i in to_sample_ids:
            b = (video / "{:05d}.jpg".format(i)).read_bytes()
            total_size += len(b)
    cost_time = time.time() - start
    total_size /= (1024 * 1024)  # bytes -> MB
    print("Normal random read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s."
          .format(total_size, cost_time, total_size / cost_time))
Test results
# for 1 frame
# test 1
Normal random read, fetch 169.14MB in total, cost 91.77s, avg speed is 1.84MB/s.
# test 2
Normal random read, fetch 169.24MB in total, cost 91.60s, avg speed is 1.85MB/s.
# test 3
Normal random read, fetch 169.26MB in total, cost 92.27s, avg speed is 1.83MB/s.
# avg
cost 91.88s, speed is 1.84MB/s
# for 2 frames
# test 1
Normal random read, fetch 338.50MB in total, cost 120.87s, avg speed is 2.80MB/s.
# test 2
Normal random read, fetch 338.68MB in total, cost 120.83s, avg speed is 2.80MB/s.
# test 3
Normal random read, fetch 338.14MB in total, cost 121.33s, avg speed is 2.79MB/s.
# avg
cost 121.01s, speed is 2.80MB/s
# for 4 frames
# test 1
Normal random read, fetch 676.36MB in total, cost 151.41s, avg speed is 4.47MB/s.
# test 2
Normal random read, fetch 676.67MB in total, cost 152.40s, avg speed is 4.44MB/s.
# test 3
Normal random read, fetch 676.75MB in total, cost 151.65s, avg speed is 4.46MB/s.
# avg
cost 151.82s, speed is 4.46MB/s
# for 8 frames
# test 1
Normal random read, fetch 1352.37MB in total, cost 179.60s, avg speed is 7.53MB/s.
# test 2
Normal random read, fetch 1353.67MB in total, cost 179.49s, avg speed is 7.54MB/s.
# test 3
Normal random read, fetch 1352.99MB in total, cost 179.41s, avg speed is 7.54MB/s.
# avg
cost 179.50s, speed is 7.54MB/s
import time
import random
from torchlearning.mio import MIO

if __name__ == '__main__':
    m = MIO("/home/chenyaofo/datasets/hmdb51_mio")
    start = time.time()
    total_size = 0
    # Visit the collections in random order.
    collection_ids = list(range(m.size))
    random.shuffle(collection_ids)
    for i in collection_ids:
        print(i)  # progress output; note that it runs inside the timed loop
        # Pick 8 distinct object indices and fetch them in ascending order.
        to_sample_ids = sorted(random.sample(range(m.get_collection_size(i)), k=8))
        objects = m.fetchmany(i, to_sample_ids)
        for o in objects:
            total_size += len(o)
    cost_time = time.time() - start
    total_size /= (1024 * 1024)  # bytes -> MB
    print("MIO random read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s."
          .format(total_size, cost_time, total_size / cost_time))
# for 1 frame
# test 1
MIO random read, fetch 169.34MB in total, cost 59.15s, avg speed is 2.86MB/s.
# test 2
MIO random read, fetch 169.10MB in total, cost 58.36s, avg speed is 2.90MB/s.
# test 3
MIO random read, fetch 169.40MB in total, cost 58.64s, avg speed is 2.89MB/s.
# avg
cost 58.72s, speed is 2.88MB/s
# for 2 frames
# test 1
MIO random read, fetch 338.30MB in total, cost 81.41s, avg speed is 4.16MB/s.
# test 2
MIO random read, fetch 338.11MB in total, cost 82.52s, avg speed is 4.10MB/s.
# test 3
MIO random read, fetch 337.82MB in total, cost 81.90s, avg speed is 4.12MB/s.
# avg
cost 81.94s, speed is 4.13MB/s
# for 4 frames
# test 1
MIO random read, fetch 676.16MB in total, cost 105.99s, avg speed is 6.38MB/s.
# test 2
MIO random read, fetch 676.40MB in total, cost 104.98s, avg speed is 6.44MB/s.
# test 3
MIO random read, fetch 675.68MB in total, cost 104.38s, avg speed is 6.47MB/s.
# avg
cost 105.12s, speed is 6.43MB/s
# for 8 frames
# test 1
MIO random read, fetch 1352.68MB in total, cost 123.42s, avg speed is 10.96MB/s.
# test 2
MIO random read, fetch 1352.40MB in total, cost 122.81s, avg speed is 11.01MB/s.
# test 3
MIO random read, fetch 1353.02MB in total, cost 123.88s, avg speed is 10.92MB/s.
# avg
cost 123.37s, speed is 10.96MB/s
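The MB/s figures for random reads are small because each read fetches only about 25KB, so it can be easier to look at reads per second. The sketch below derives that from the averaged 8-frame costs above. It assumes every one of the 6817 videos contributed exactly 8 reads, and it ignores that the plain-file loop also lists each video directory, so the per-read figure for plain files includes that listing cost.

```python
VIDEOS = 6817          # number of videos stated at the top of the article
FRAMES_PER_VIDEO = 8   # the 8-frame random-read runs


def reads_per_second(total_seconds, videos=VIDEOS, k=FRAMES_PER_VIDEO):
    """Average number of frame reads completed per second."""
    return videos * k / total_seconds


normal = reads_per_second(179.50)   # averaged cost of the plain-file runs
mio = reads_per_second(123.37)      # averaged cost of the MIO runs

print("normal: {:.0f} reads/s, {:.2f} ms per read".format(normal, 1000 / normal))
print("MIO:    {:.0f} reads/s, {:.2f} ms per read".format(mio, 1000 / mio))
```

Both variants land at a few milliseconds per read, which is the order of magnitude one would expect from seek-bound access on a 7200 rpm disk.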
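The multi-process read tests load the data through PyTorch's DataLoader. The listings show num_workers=8; the results cover 2, 4 and 8 worker processes.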
import time
import pathlib
import random
from torch.utils.data import DataLoader, Dataset


class VideoDataset(Dataset):
    def __init__(self, root):
        self.root = pathlib.Path(root)
        # One dataset item per video directory.
        self.all_videos = []
        for category in self.root.iterdir():
            for video in category.iterdir():
                self.all_videos.append(video)

    def __getitem__(self, index):
        video = self.all_videos[index]
        size = len(list(video.iterdir()))
        # Pick 8 distinct frame indices and read them in ascending order.
        to_sample_ids = sorted(random.sample(range(size), k=8))
        rev = []
        for i in to_sample_ids:
            rev.append((video / "{:05d}.jpg".format(i)).read_bytes())
        return rev, 0  # (raw frame bytes, dummy label)

    def __len__(self):
        return len(self.all_videos)


dataloader = DataLoader(
    dataset=VideoDataset("/home/chenyaofo/datasets/hmdb51"),
    batch_size=8,
    shuffle=True,
    num_workers=8,
    collate_fn=lambda batch: batch,  # keep the raw list of samples
)
start = time.time()
total_size = 0
for data in dataloader:
    for video, label in data:
        for frame in video:
            total_size += len(frame)
cost_time = time.time() - start
total_size /= (1024 * 1024)  # bytes -> MB
print("Normal pytorch dataloader random read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s."
      .format(total_size, cost_time, total_size / cost_time))
# 2 processes
# test 1
Normal pytorch dataloader random read, fetch 1351.81MB in total, cost 253.17s, avg speed is 5.34MB/s.
# test 2
Normal pytorch dataloader random read, fetch 1353.02MB in total, cost 254.13s, avg speed is 5.32MB/s.
# test 3
Normal pytorch dataloader random read, fetch 1351.99MB in total, cost 253.90s, avg speed is 5.32MB/s.
# avg
cost 253.73s, speed is 5.33MB/s
# 4 processes
# test 1
Normal pytorch dataloader random read, fetch 1352.94MB in total, cost 276.83s, avg speed is 4.89MB/s.
# test 2
Normal pytorch dataloader random read, fetch 1353.21MB in total, cost 276.60s, avg speed is 4.89MB/s.
# test 3
Normal pytorch dataloader random read, fetch 1352.38MB in total, cost 276.46s, avg speed is 4.89MB/s.
# avg
cost 276.63s, speed is 4.89MB/s
# 8 processes
# test 1
Normal pytorch dataloader random read, fetch 1352.62MB in total, cost 278.33s, avg speed is 4.86MB/s.
# test 2
Normal pytorch dataloader random read, fetch 1352.78MB in total, cost 275.84s, avg speed is 4.90MB/s.
# test 3
Normal pytorch dataloader random read, fetch 1353.25MB in total, cost 275.81s, avg speed is 4.91MB/s.
# avg
cost 276.66s, speed is 4.89MB/s
import time
import random
from torchlearning.datasets import MioDataset
from torch.utils.data import DataLoader

dataset = MioDataset(
    root="/home/chenyaofo/datasets/hmdb51_mio",
    # Pick 8 distinct object indices per collection, in ascending order.
    sampler=lambda size: sorted(random.sample(list(range(size)), k=8)),
)
dataloader = DataLoader(
    dataset=dataset,
    batch_size=8,
    shuffle=True,
    num_workers=8,
    collate_fn=lambda batch: batch,  # keep the raw list of samples
)
start = time.time()
total_size = 0
for data in dataloader:
    for video, label in data:
        for frame in video:
            total_size += len(frame)
cost_time = time.time() - start
total_size /= (1024 * 1024)  # bytes -> MB
print("MIO pytorch dataloader random read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s."
      .format(total_size, cost_time, total_size / cost_time))
# 2 processes
# test 1
MIO pytorch dataloader random read, fetch 1352.52MB in total, cost 158.24s, avg speed is 8.55MB/s.
# test 2
MIO pytorch dataloader random read, fetch 1352.20MB in total, cost 158.85s, avg speed is 8.51MB/s.
# test 3
MIO pytorch dataloader random read, fetch 1352.41MB in total, cost 160.38s, avg speed is 8.43MB/s.
# avg
cost 159.16s, speed is 8.50MB/s
# 4 processes
# test 1
MIO pytorch dataloader random read, fetch 1352.05MB in total, cost 167.81s, avg speed is 8.06MB/s.
# test 2
MIO pytorch dataloader random read, fetch 1352.29MB in total, cost 169.65s, avg speed is 7.97MB/s.
# test 3
MIO pytorch dataloader random read, fetch 1352.38MB in total, cost 168.46s, avg speed is 8.03MB/s.
# avg
cost 168.64s, speed is 8.02MB/s
# 8 processes
# test 1
MIO pytorch dataloader random read, fetch 1352.85MB in total, cost 168.30s, avg speed is 8.04MB/s.
# test 2
MIO pytorch dataloader random read, fetch 1353.60MB in total, cost 168.01s, avg speed is 8.06MB/s.
# test 3
MIO pytorch dataloader random read, fetch 1352.27MB in total, cost 169.02s, avg speed is 8.00MB/s.
# avg
cost 168.44s, speed is 8.03MB/s
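The table below summarizes all of the results above. The main quantities of interest are the time spent loading the data and the loading speed.
Test | Time (normal) | Time (MIO) | Time change | Speed (normal) | Speed (MIO) | Speed change |
---|---|---|---|---|---|---|
Sequential read, all frames (1 process) | 229.45s | 88.22s | -61.56% | 68.39MB/s | 177.90MB/s | +160.13% |
Random read, 1 frame per video (1 process) | 91.88s | 58.72s | -36.10% | 1.84MB/s | 2.88MB/s | +56.52% |
Random read, 2 frames per video (1 process) | 121.01s | 81.94s | -32.29% | 2.80MB/s | 4.13MB/s | +47.50% |
Random read, 4 frames per video (1 process) | 151.82s | 105.12s | -30.76% | 4.46MB/s | 6.43MB/s | +44.17% |
Random read, 8 frames per video (1 process) | 179.50s | 123.37s | -31.27% | 7.54MB/s | 10.96MB/s | +45.36% |
Random read, 8 frames per video (2 processes) | 253.73s | 159.16s | -37.27% | 5.33MB/s | 8.50MB/s | +59.47% |
Random read, 8 frames per video (4 processes) | 276.63s | 168.64s | -39.04% | 4.89MB/s | 8.02MB/s | +64.01% |
Random read, 8 frames per video (8 processes) | 276.66s | 168.44s | -39.12% | 4.89MB/s | 8.03MB/s | +64.21% |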
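The percentage columns in the table above are relative changes of the MIO figure against the plain-file figure. A minimal sketch of that calculation, using two rows of averaged values from the table (`percent_change` is a hypothetical helper name):

```python
def percent_change(baseline, value):
    """Relative change of `value` against `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# (normal time s, MIO time s, normal speed MB/s, MIO speed MB/s), taken from the table
rows = {
    "random 8 frames, 1 process": (179.50, 123.37, 7.54, 10.96),
    "random 8 frames, 4 processes": (276.63, 168.64, 4.89, 8.02),
}

for name, (t_normal, t_mio, s_normal, s_mio) in rows.items():
    print("{}: time {:+.2f}%, speed {:+.2f}%".format(
        name, percent_change(t_normal, t_mio), percent_change(s_normal, s_mio)))
```

A negative time change and a positive speed change both mean that MIO was faster in that test.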