@chenyaofo
2018-08-15T13:57:22.000000Z
This benchmark uses the HMDB51 dataset. Its archive is about 2.0 GB and contains 6,817 videos.
The videos were split into frames with OpenCV, yielding 639,417 frames with 15.4 GB of data (16.6 GB on disk), or about 25.25 KB per frame on average.
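The plain-file scripts below take the number of files in a video directory as its frame count and read frames as `{:05d}.jpg`, so they assume frames were written as `00000.jpg`, `00001.jpg`, ... with no gaps, under a `root/category/video/` layout. The helper below is a hypothetical sanity check (not part of the original tooling) that verifies this assumed layout and recomputes the frame count, total size and average frame size (1 KB = 1024 bytes) quoted above.

```python
import pathlib
import tempfile


def summarize_frames(root):
    """Check the root/category/video/NNNNN.jpg layout and return
    (frame count, total bytes, average frame size in KB)."""
    root = pathlib.Path(root)
    num_frames = 0
    total_bytes = 0
    for category in sorted(root.iterdir()):
        for video in sorted(category.iterdir()):
            names = sorted(p.name for p in video.iterdir())
            # Frames must be 00000.jpg, 00001.jpg, ... with no gaps, because the
            # benchmark scripts use the file count as the frame count.
            expected = ["{:05d}.jpg".format(i) for i in range(len(names))]
            if names != expected:
                raise ValueError("unexpected frame names in {}".format(video))
            num_frames += len(names)
            total_bytes += sum((video / name).stat().st_size for name in names)
    avg_kb = total_bytes / num_frames / 1024 if num_frames else 0.0
    return num_frames, total_bytes, avg_kb


if __name__ == "__main__":
    # Tiny synthetic dataset: one category with two videos of 3 and 2 frames.
    with tempfile.TemporaryDirectory() as tmp:
        for video, n in [("v1", 3), ("v2", 2)]:
            d = pathlib.Path(tmp) / "cat" / video
            d.mkdir(parents=True)
            for i in range(n):
                (d / "{:05d}.jpg".format(i)).write_bytes(b"x" * 1024)
        print(summarize_frames(tmp))
```

On the real dataset, `summarize_frames("/home/chenyaofo/datasets/hmdb51")` would be expected to report 639,417 frames if the extraction followed this naming.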
All HMDB51 frames were then converted to MIO format with the MIO library, producing a 7.4 MB index and 16 GB of object data.
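MIO's internal layout is not described in this article. As a rough illustration of why a packed format can beat one-file-per-frame storage, the sketch below stores every object of every collection (video) back to back in one data file and keeps a small index of `(offset, length)` pairs, so reading a frame becomes a seek plus a read instead of a separate file open and metadata lookup. The class `PackedStore`, its JSON index and its methods are hypothetical; the method names only loosely mirror the `fetchall`/`fetchmany`/`get_collection_size`/`size` calls used in the MIO scripts below, and this is not MIO's actual on-disk format.

```python
import json
import os
import tempfile


class PackedStore:
    """Hypothetical packed container: one data file plus a JSON index.

    index[c] is a list of [offset, length] pairs for the objects of
    collection c (for example, the frames of one video).
    """

    def __init__(self, data_path, index_path):
        self.data_path = data_path
        with open(index_path) as f:
            self.index = json.load(f)

    @staticmethod
    def build(collections, data_path, index_path):
        # Append all objects back to back and record where each one starts.
        index = []
        with open(data_path, "wb") as data:
            for objects in collections:
                entries = []
                for obj in objects:
                    entries.append([data.tell(), len(obj)])
                    data.write(obj)
                index.append(entries)
        with open(index_path, "w") as f:
            json.dump(index, f)
        return PackedStore(data_path, index_path)

    @property
    def size(self):
        # Number of collections (videos).
        return len(self.index)

    def get_collection_size(self, c):
        # Number of objects (frames) in collection c.
        return len(self.index[c])

    def fetchmany(self, c, ids):
        # One open per call; passing sorted ids keeps the seeks moving forward.
        out = []
        with open(self.data_path, "rb") as data:
            for i in ids:
                offset, length = self.index[c][i]
                data.seek(offset)
                out.append(data.read(length))
        return out

    def fetchall(self, c):
        return self.fetchmany(c, range(self.get_collection_size(c)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = PackedStore.build(
            [[b"frame0", b"frame1"], [b"a", b"bb", b"ccc"]],
            os.path.join(tmp, "data.bin"),
            os.path.join(tmp, "index.json"),
        )
        print(store.size, store.fetchall(0), store.fetchmany(1, [0, 2]))
```

The index stays small relative to the data (here a few bytes per object), which is consistent in spirit with a 7.4 MB index describing 16 GB of frames.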
To skip straight to the results, see the summary at the end of this article.
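Before every test, all caches (the page cache, dentries and inodes) are dropped with the following command, run as root, so that every test starts from a cold cache and the comparison stays fair:

```shell
sync; echo 3 > /proc/sys/vm/drop_caches
```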
Each test script is run 3 times and the average is reported.
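The averages reported after each set of runs can be recomputed from the printed log lines. A small sketch, assuming the `cost ...s, avg speed is ...MB/s` wording that the scripts print; `average_runs` is a hypothetical helper, not part of the original tooling. It takes the mean of the per-run times and of the per-run speeds and rounds to two decimals, the precision the logs use.

```python
import re

# Matches e.g. "cost 87.45s, avg speed is 179.44MB/s."
RUN_PATTERN = re.compile(r"cost ([\d.]+)s, avg speed is ([\d.]+)MB/s")


def average_runs(log_text):
    """Return (mean time in s, mean speed in MB/s) over all runs in log_text."""
    runs = [(float(t), float(v)) for t, v in RUN_PATTERN.findall(log_text)]
    if not runs:
        raise ValueError("no runs found")
    mean_time = sum(t for t, _ in runs) / len(runs)
    mean_speed = sum(v for _, v in runs) / len(runs)
    return round(mean_time, 2), round(mean_speed, 2)


if __name__ == "__main__":
    log = """
    MIO sequence read, fetch 15692.37MB in total, cost 87.45s, avg speed is 179.44MB/s.
    MIO sequence read, fetch 15692.37MB in total, cost 89.71s, avg speed is 174.93MB/s.
    MIO sequence read, fetch 15692.37MB in total, cost 87.50s, avg speed is 179.34MB/s.
    """
    print(average_runs(log))
```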
| Component | Specification |
|---|---|
| OS | Ubuntu 16.04.4 LTS (4.4.0-116-generic) |
| CPU | Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz (4 cores, 8 threads) |
| Memory | 16GB (no swap) |
| Disk | Western Digital Blue (3.5 inch, 7200 rpm, 1TB) |
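The first two tests read every frame sequentially in a single thread, first from the individual JPEG files and then from the MIO file.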
```python
# Plain files: sequential read of every frame, single process
import time
import pathlib

if __name__ == '__main__':
    root = pathlib.Path("/home/chenyaofo/datasets/hmdb51")
    start = time.time()
    total_size = 0
    for category in root.iterdir():
        for video in category.iterdir():
            size = len(list(video.iterdir()))
            for i in range(size):
                b = (video / "{:05d}.jpg".format(i)).read_bytes()
                total_size += len(b)
    cost_time = time.time() - start
    total_size /= (1024 * 1024)
    print("Normal sequence read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s.".format(total_size, cost_time, total_size / cost_time))
```
```text
# test 1
Normal sequence read, fetch 15692.37MB in total, cost 230.56s, avg speed is 68.06MB/s.
# test 2
Normal sequence read, fetch 15692.37MB in total, cost 228.25s, avg speed is 68.75MB/s.
# test 3
Normal sequence read, fetch 15692.37MB in total, cost 229.54s, avg speed is 68.36MB/s.
# avg
cost 229.45s, speed is 68.39MB/s
```
```python
# MIO file: sequential read of every frame, single process
import time
from torchlearning.mio import MIO

if __name__ == '__main__':
    m = MIO("/home/chenyaofo/datasets/hmdb51_mio")
    start = time.time()
    total_size = 0
    for i in range(m.size):
        objects = m.fetchall(i)
        for o in objects:
            total_size += len(o)
    cost_time = time.time() - start
    total_size /= (1024 * 1024)
    print("MIO sequence read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s.".format(total_size, cost_time, total_size / cost_time))
```
```text
# test 1
MIO sequence read, fetch 15692.37MB in total, cost 87.45s, avg speed is 179.44MB/s.
# test 2
MIO sequence read, fetch 15692.37MB in total, cost 89.71s, avg speed is 174.93MB/s.
# test 3
MIO sequence read, fetch 15692.37MB in total, cost 87.50s, avg speed is 179.34MB/s.
# avg
cost 88.22s, speed is 177.90MB/s
```
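Random read: the videos are visited in random order and k frames are sampled at random from each video. Results are reported for k = 1, 2, 4 and 8; the code below uses k = 8.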
```python
# Plain files: k random frames per video (k = 8 here), single process
import time
import random
import pathlib

if __name__ == '__main__':
    root = pathlib.Path("/home/chenyaofo/datasets/hmdb51")
    start = time.time()
    total_size = 0
    all_videos = []
    for category in root.iterdir():
        for video in category.iterdir():
            all_videos.append(video)
    random.shuffle(all_videos)
    for video in all_videos:
        size = len(list(video.iterdir()))
        to_sample_ids = sorted(random.sample(range(size), k=8))
        for i in to_sample_ids:
            b = (video / "{:05d}.jpg".format(i)).read_bytes()
            total_size += len(b)
    cost_time = time.time() - start
    total_size /= (1024 * 1024)
    print("Normal random read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s.".format(total_size, cost_time, total_size / cost_time))
```
Results:
```text
# 1 frame per video
# test 1
Normal random read, fetch 169.14MB in total, cost 91.77s, avg speed is 1.84MB/s.
# test 2
Normal random read, fetch 169.24MB in total, cost 91.60s, avg speed is 1.85MB/s.
# test 3
Normal random read, fetch 169.26MB in total, cost 92.27s, avg speed is 1.83MB/s.
# avg
cost 91.88s, speed is 1.84MB/s

# 2 frames per video
# test 1
Normal random read, fetch 338.50MB in total, cost 120.87s, avg speed is 2.80MB/s.
# test 2
Normal random read, fetch 338.68MB in total, cost 120.83s, avg speed is 2.80MB/s.
# test 3
Normal random read, fetch 338.14MB in total, cost 121.33s, avg speed is 2.79MB/s.
# avg
cost 121.01s, speed is 2.80MB/s

# 4 frames per video
# test 1
Normal random read, fetch 676.36MB in total, cost 151.41s, avg speed is 4.47MB/s.
# test 2
Normal random read, fetch 676.67MB in total, cost 152.40s, avg speed is 4.44MB/s.
# test 3
Normal random read, fetch 676.75MB in total, cost 151.65s, avg speed is 4.46MB/s.
# avg
cost 151.82s, speed is 4.46MB/s

# 8 frames per video
# test 1
Normal random read, fetch 1352.37MB in total, cost 179.60s, avg speed is 7.53MB/s.
# test 2
Normal random read, fetch 1353.67MB in total, cost 179.49s, avg speed is 7.54MB/s.
# test 3
Normal random read, fetch 1352.99MB in total, cost 179.41s, avg speed is 7.54MB/s.
# avg
cost 179.50s, speed is 7.54MB/s
```
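The random-read script hard-codes `k=8`, while results are reported for 1, 2, 4 and 8 frames per video, so presumably `k` was changed between runs. A minimal sketch of the sampling step as a function of `k`; the name `sample_frame_ids` and the optional seeded `random.Random` argument are assumptions added for reproducibility. Like the original, it returns distinct indices in ascending order, so the frames of a video are read front to back.

```python
import random


def sample_frame_ids(num_frames, k, rng=random):
    """Pick k distinct frame indices from range(num_frames), in ascending order.

    rng may be the random module itself or a seeded random.Random instance.
    random.sample raises ValueError if the video has fewer than k frames.
    """
    return sorted(rng.sample(range(num_frames), k=k))


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (1, 2, 4, 8):
        print(k, sample_frame_ids(100, k, rng))
```

Sorting matters mostly for the MIO variant, where ascending ids let `fetchmany` read forward through one collection.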
```python
# MIO file: k random frames per video (k = 8 here), single process
import time
import random
from torchlearning.mio import MIO

if __name__ == '__main__':
    m = MIO("/home/chenyaofo/datasets/hmdb51_mio")
    start = time.time()
    total_size = 0
    collection_ids = list(range(m.size))
    random.shuffle(collection_ids)  # visit the videos in random order
    for i in collection_ids:
        print(i)  # progress output (collection index)
        to_sample_ids = sorted(random.sample(range(m.get_collection_size(i)), k=8))
        objects = m.fetchmany(i, to_sample_ids)
        for o in objects:
            total_size += len(o)
    cost_time = time.time() - start
    total_size /= (1024 * 1024)
    print("MIO random read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s.".format(total_size, cost_time, total_size / cost_time))
```
```text
# 1 frame per video
# test 1
MIO random read, fetch 169.34MB in total, cost 59.15s, avg speed is 2.86MB/s.
# test 2
MIO random read, fetch 169.10MB in total, cost 58.36s, avg speed is 2.90MB/s.
# test 3
MIO random read, fetch 169.40MB in total, cost 58.64s, avg speed is 2.89MB/s.
# avg
cost 58.72s, speed is 2.88MB/s

# 2 frames per video
# test 1
MIO random read, fetch 338.30MB in total, cost 81.41s, avg speed is 4.16MB/s.
# test 2
MIO random read, fetch 338.11MB in total, cost 82.52s, avg speed is 4.10MB/s.
# test 3
MIO random read, fetch 337.82MB in total, cost 81.90s, avg speed is 4.12MB/s.
# avg
cost 81.94s, speed is 4.13MB/s

# 4 frames per video
# test 1
MIO random read, fetch 676.16MB in total, cost 105.99s, avg speed is 6.38MB/s.
# test 2
MIO random read, fetch 676.40MB in total, cost 104.98s, avg speed is 6.44MB/s.
# test 3
MIO random read, fetch 675.68MB in total, cost 104.38s, avg speed is 6.47MB/s.
# avg
cost 105.12s, speed is 6.43MB/s

# 8 frames per video
# test 1
MIO random read, fetch 1352.68MB in total, cost 123.42s, avg speed is 10.96MB/s.
# test 2
MIO random read, fetch 1352.40MB in total, cost 122.81s, avg speed is 11.01MB/s.
# test 3
MIO random read, fetch 1353.02MB in total, cost 123.88s, avg speed is 10.92MB/s.
# avg
cost 123.37s, speed is 10.96MB/s
```
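For the multi-process tests, the data is loaded with PyTorch's `DataLoader`. Each video still yields 8 randomly sampled frames, and the runs use 2, 4 and 8 worker processes (`num_workers`; the code below shows 8).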
```python
# Plain files, PyTorch DataLoader: 8 random frames per video
import time
import pathlib
import random
from torch.utils.data import DataLoader, Dataset


class VideoDataset(Dataset):
    def __init__(self, root):
        self.root = pathlib.Path(root)
        self.all_videos = []
        for category in self.root.iterdir():
            for video in category.iterdir():
                self.all_videos.append(video)

    def __getitem__(self, index):
        video = self.all_videos[index]
        size = len(list(video.iterdir()))
        to_sample_ids = sorted(random.sample(range(size), k=8))
        rev = []
        for i in to_sample_ids:
            rev.append((video / "{:05d}.jpg".format(i)).read_bytes())
        return rev, 0  # raw JPEG bytes and a dummy label

    def __len__(self):
        return len(self.all_videos)


def identity_collate(batch):
    # Return the samples unchanged instead of using PyTorch's default collation.
    # A named function (unlike a lambda) can be pickled if workers use "spawn".
    return batch


if __name__ == '__main__':
    dataloader = DataLoader(
        dataset=VideoDataset("/home/chenyaofo/datasets/hmdb51"),
        batch_size=8,
        shuffle=True,
        num_workers=8,  # 2, 4 or 8 in the runs below
        collate_fn=identity_collate,
    )
    start = time.time()
    total_size = 0
    for data in dataloader:
        for video, label in data:
            for frame in video:
                total_size += len(frame)
    cost_time = time.time() - start
    total_size /= (1024 * 1024)
    print("Normal pytorch dataloader random read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s.".format(total_size, cost_time, total_size / cost_time))
```
```text
# 2 worker processes
# test 1
Normal pytorch dataloader random read, fetch 1351.81MB in total, cost 253.17s, avg speed is 5.34MB/s.
# test 2
Normal pytorch dataloader random read, fetch 1353.02MB in total, cost 254.13s, avg speed is 5.32MB/s.
# test 3
Normal pytorch dataloader random read, fetch 1351.99MB in total, cost 253.90s, avg speed is 5.32MB/s.
# avg
cost 253.73s, speed is 5.33MB/s

# 4 worker processes
# test 1
Normal pytorch dataloader random read, fetch 1352.94MB in total, cost 276.83s, avg speed is 4.89MB/s.
# test 2
Normal pytorch dataloader random read, fetch 1353.21MB in total, cost 276.60s, avg speed is 4.89MB/s.
# test 3
Normal pytorch dataloader random read, fetch 1352.38MB in total, cost 276.46s, avg speed is 4.89MB/s.
# avg
cost 276.63s, speed is 4.89MB/s

# 8 worker processes
# test 1
Normal pytorch dataloader random read, fetch 1352.62MB in total, cost 278.33s, avg speed is 4.86MB/s.
# test 2
Normal pytorch dataloader random read, fetch 1352.78MB in total, cost 275.84s, avg speed is 4.90MB/s.
# test 3
Normal pytorch dataloader random read, fetch 1353.25MB in total, cost 275.81s, avg speed is 4.91MB/s.
# avg
cost 276.66s, speed is 4.89MB/s
```
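```python
# MIO file, PyTorch DataLoader: 8 random frames per video
import time
import random
from torchlearning.datasets import MioDataset
from torch.utils.data import DataLoader


# Named functions (not lambdas) so they can be pickled if workers use "spawn".
def sample_8_frames(size):
    # Choose 8 distinct frame indices of a video, in ascending order.
    return sorted(random.sample(list(range(size)), k=8))


def identity_collate(batch):
    # Return the samples unchanged instead of using PyTorch's default collation.
    return batch


if __name__ == '__main__':
    dataset = MioDataset(
        root="/home/chenyaofo/datasets/hmdb51_mio",
        sampler=sample_8_frames,
    )
    dataloader = DataLoader(
        dataset=dataset,
        batch_size=8,
        shuffle=True,
        num_workers=8,  # 2, 4 or 8 in the runs below
        collate_fn=identity_collate,
    )
    start = time.time()
    total_size = 0
    for data in dataloader:
        for video, label in data:
            for frame in video:
                total_size += len(frame)
    cost_time = time.time() - start
    total_size /= (1024 * 1024)
    print("MIO pytorch dataloader random read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s.".format(total_size, cost_time, total_size / cost_time))
```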
```text
# 2 worker processes
# test 1
MIO pytorch dataloader random read, fetch 1352.52MB in total, cost 158.24s, avg speed is 8.55MB/s.
# test 2
MIO pytorch dataloader random read, fetch 1352.20MB in total, cost 158.85s, avg speed is 8.51MB/s.
# test 3
MIO pytorch dataloader random read, fetch 1352.41MB in total, cost 160.38s, avg speed is 8.43MB/s.
# avg
cost 159.16s, speed is 8.50MB/s

# 4 worker processes
# test 1
MIO pytorch dataloader random read, fetch 1352.05MB in total, cost 167.81s, avg speed is 8.06MB/s.
# test 2
MIO pytorch dataloader random read, fetch 1352.29MB in total, cost 169.65s, avg speed is 7.97MB/s.
# test 3
MIO pytorch dataloader random read, fetch 1352.38MB in total, cost 168.46s, avg speed is 8.03MB/s.
# avg
cost 168.64s, speed is 8.02MB/s

# 8 worker processes
# test 1
MIO pytorch dataloader random read, fetch 1352.85MB in total, cost 168.30s, avg speed is 8.04MB/s.
# test 2
MIO pytorch dataloader random read, fetch 1353.60MB in total, cost 168.01s, avg speed is 8.06MB/s.
# test 3
MIO pytorch dataloader random read, fetch 1352.27MB in total, cost 169.02s, avg speed is 8.00MB/s.
# avg
cost 168.44s, speed is 8.03MB/s
```
The table below summarizes all of the results above, focusing on the time taken to load the data and the resulting loading speed.
| Test | Time, plain files | Time, MIO | Time change | Speed, plain files | Speed, MIO | Speed change |
|---|---|---|---|---|---|---|
| Sequential read, all frames (1 process) | 229.45s | 88.22s | -61.55% | 68.39MB/s | 177.90MB/s | +160.13% |
| Random read, 1 frame per video (1 process) | 91.88s | 58.72s | -36.09% | 1.84MB/s | 2.88MB/s | +56.52% |
| Random read, 2 frames per video (1 process) | 121.01s | 81.94s | -32.29% | 2.80MB/s | 4.13MB/s | +47.50% |
| Random read, 4 frames per video (1 process) | 151.82s | 105.12s | -30.76% | 4.46MB/s | 6.43MB/s | +44.17% |
| Random read, 8 frames per video (1 process) | 179.50s | 123.37s | -31.27% | 7.54MB/s | 10.96MB/s | +45.36% |
| Random read, 8 frames per video (2 worker processes) | 253.73s | 159.16s | -37.27% | 5.33MB/s | 8.50MB/s | +59.47% |
| Random read, 8 frames per video (4 worker processes) | 276.63s | 168.64s | -39.04% | 4.89MB/s | 8.02MB/s | +64.01% |
| Random read, 8 frames per video (8 worker processes) | 276.66s | 168.44s | -39.12% | 4.89MB/s | 8.03MB/s | +64.21% |
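The two change columns are relative differences against plain-file loading, computed from the rounded averages: time change = (t_MIO - t_plain) / t_plain and speed change = (v_MIO - v_plain) / v_plain, each as a percentage. A short sketch of that calculation; `relative_change` is only an illustrative helper name.

```python
def relative_change(plain, mio):
    """Percentage change of the MIO value relative to the plain-file value."""
    return round((mio - plain) / plain * 100, 2)


if __name__ == "__main__":
    # Random read, 8 frames per video, single process (values from the table).
    print(relative_change(179.5, 123.37))  # time change in %: -31.27
    print(relative_change(7.54, 10.96))    # speed change in %: 45.36
```

A negative time change means MIO finished faster; a positive speed change means it delivered more MB/s.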