@chenyaofo 2018-08-15T13:57:22Z

MIO Data Format Loading Speed Test



This test uses the HMDB51 dataset. Its compressed archive is about 2.0G and contains 6817 videos.

The videos were split into frames with the opencv library, yielding 639417 frames in total: 15.4GB of data occupying 16.6G on disk, an average of 25.25KB per frame.

All frames of HMDB51 were then converted to the MIO format with the MIO library. The resulting index is 7.4MB and the object data is 16GB.
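
The on-disk layout of MIO is not described in this post. As a rough illustration of why a small index plus one large object file can be read faster than hundreds of thousands of small files, here is a minimal sketch of a generic packed format. The names `pack_files`, `read_object`, and `demo`, and the JSON index layout, are hypothetical and are not the MIO API.

```python
import json
import pathlib
import tempfile


def pack_files(paths, blob_path, index_path):
    """Concatenate files into one blob and record (offset, length) per file."""
    index = []
    offset = 0
    with open(blob_path, "wb") as blob:
        for p in paths:
            data = pathlib.Path(p).read_bytes()
            blob.write(data)
            index.append({"name": pathlib.Path(p).name,
                          "offset": offset,
                          "length": len(data)})
            offset += len(data)
    pathlib.Path(index_path).write_text(json.dumps(index))
    return index


def read_object(blob_file, entry):
    """Read one object with a single seek and a single read."""
    blob_file.seek(entry["offset"])
    return blob_file.read(entry["length"])


def demo():
    """Pack three fake frames, then read them back through the index."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = pathlib.Path(tmp)
        frames = []
        for i in range(3):
            f = tmp / "{:05d}.jpg".format(i)
            f.write_bytes(bytes([i]) * (i + 1))  # fake frame contents
            frames.append(f)
        index = pack_files(frames, tmp / "objects.bin", tmp / "index.json")
        with open(tmp / "objects.bin", "rb") as blob:
            return [read_object(blob, e) for e in index]
```

With such a layout, reading a frame needs no per-file directory lookup or open call, only a seek within a file that is already open. Whether MIO works this way is an assumption.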

You can skip to the end of the article for a summary of the results.

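Test notes

Before every test, all caches (page cache, dentries, and inodes) are dropped with the following command,

  sync; echo 3 > /proc/sys/vm/drop_caches

to keep the comparison fair.

Each test script is run 3 times and the average is reported.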
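
The procedure above (drop caches, run, repeat 3 times, average) can be sketched as a small harness. This is an illustration only, not the script used for the measurements: `drop_caches`, `benchmark`, and `mean` are hypothetical helpers, and `drop_caches` assumes Linux and root privileges.

```python
import os
import time


def drop_caches():
    """Flush dirty pages, then drop page cache, dentries and inodes.

    Assumes Linux and root privileges; equivalent to
    `sync; echo 3 > /proc/sys/vm/drop_caches`.
    """
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def benchmark(fn, repeats=3, before_each=None):
    """Run fn `repeats` times; return (per-run seconds, mean seconds)."""
    costs = []
    for _ in range(repeats):
        if before_each is not None:
            before_each()  # e.g. drop_caches, kept outside the timed region
        start = time.time()
        fn()
        costs.append(time.time() - start)
    return costs, mean(costs)
```

A call such as `benchmark(read_all_frames, before_each=drop_caches)` would then time a hypothetical `read_all_frames` function three times from a cold cache. As a check of the averaging, `mean([230.56, 228.25, 229.54])` rounds to 229.45.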

Test environment

OS: Ubuntu 16.04.4 LTS (4.4.0-116-generic)
CPU: Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz (4C8T)
Memory: 16GB (no swap)
Disk: Western Digital Blue (3.5 inch, 7200 rpm, 1TB)

Single-process read tests

Both tests below read with a single thread.

Sequential read of all frames

Normal read

  import time
  import pathlib

  if __name__ == '__main__':
      root = pathlib.Path("/home/chenyaofo/datasets/hmdb51")
      start = time.time()
      total_size = 0
      # Directory layout: root/<category>/<video>/<frame index>.jpg
      for category in root.iterdir():
          for video in category.iterdir():
              size = len(list(video.iterdir()))  # number of frames in this video
              for i in range(size):
                  b = (video / "{:05d}.jpg".format(i)).read_bytes()
                  total_size += len(b)
      cost_time = time.time() - start
      total_size /= (1024 * 1024)  # bytes -> MB
      print("Normal sequence read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s."
            .format(total_size, cost_time, total_size / cost_time))

Test results

  # test 1
  Normal sequence read, fetch 15692.37MB in total, cost 230.56s, avg speed is 68.06MB/s.
  # test 2
  Normal sequence read, fetch 15692.37MB in total, cost 228.25s, avg speed is 68.75MB/s.
  # test 3
  Normal sequence read, fetch 15692.37MB in total, cost 229.54s, avg speed is 68.36MB/s.
  # avg
  cost 229.45s, speed is 68.39MB/s

MIO read

  import time
  from torchlearning.mio import MIO

  if __name__ == '__main__':
      m = MIO("/home/chenyaofo/datasets/hmdb51_mio")
      start = time.time()
      total_size = 0
      # Iterate over every collection and fetch all of its objects
      for i in range(m.size):
          objects = m.fetchall(i)
          for o in objects:
              total_size += len(o)
      cost_time = time.time() - start
      total_size /= (1024 * 1024)  # bytes -> MB
      print("MIO sequence read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s."
            .format(total_size, cost_time, total_size / cost_time))

Test results

  # test 1
  MIO sequence read, fetch 15692.37MB in total, cost 87.45s, avg speed is 179.44MB/s.
  # test 2
  MIO sequence read, fetch 15692.37MB in total, cost 89.71s, avg speed is 174.93MB/s.
  # test 3
  MIO sequence read, fetch 15692.37MB in total, cost 87.50s, avg speed is 179.34MB/s.
  # avg
  cost 88.22s, speed is 177.90MB/s

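Random read of a subset of frames

Randomly read 8 frames from each video. The results also cover 1, 2, and 4 frames per video.

Normal read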

  import time
  import random
  import pathlib

  if __name__ == '__main__':
      root = pathlib.Path("/home/chenyaofo/datasets/hmdb51")
      start = time.time()
      total_size = 0
      # Collect every video directory, then visit them in random order
      all_videos = []
      for category in root.iterdir():
          for video in category.iterdir():
              all_videos.append(video)
      random.shuffle(all_videos)
      for video in all_videos:
          size = len(list(video.iterdir()))
          # k = 8 here; the reported results also cover k = 1, 2, 4
          to_sample_ids = sorted(random.sample(range(size), k=8))
          for i in to_sample_ids:
              b = (video / "{:05d}.jpg".format(i)).read_bytes()
              total_size += len(b)
      cost_time = time.time() - start
      total_size /= (1024 * 1024)  # bytes -> MB
      print("Normal random read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s."
            .format(total_size, cost_time, total_size / cost_time))

Test results

  # for 1 frame
  # test 1
  Normal random read, fetch 169.14MB in total, cost 91.77s, avg speed is 1.84MB/s.
  # test 2
  Normal random read, fetch 169.24MB in total, cost 91.60s, avg speed is 1.85MB/s.
  # test 3
  Normal random read, fetch 169.26MB in total, cost 92.27s, avg speed is 1.83MB/s.
  # avg
  cost 91.88s, speed is 1.84MB/s

  # for 2 frames
  # test 1
  Normal random read, fetch 338.50MB in total, cost 120.87s, avg speed is 2.80MB/s.
  # test 2
  Normal random read, fetch 338.68MB in total, cost 120.83s, avg speed is 2.80MB/s.
  # test 3
  Normal random read, fetch 338.14MB in total, cost 121.33s, avg speed is 2.79MB/s.
  # avg
  cost 121.01s, speed is 2.80MB/s

  # for 4 frames
  # test 1
  Normal random read, fetch 676.36MB in total, cost 151.41s, avg speed is 4.47MB/s.
  # test 2
  Normal random read, fetch 676.67MB in total, cost 152.40s, avg speed is 4.44MB/s.
  # test 3
  Normal random read, fetch 676.75MB in total, cost 151.65s, avg speed is 4.46MB/s.
  # avg
  cost 151.82s, speed is 4.46MB/s

  # for 8 frames
  # test 1
  Normal random read, fetch 1352.37MB in total, cost 179.60s, avg speed is 7.53MB/s.
  # test 2
  Normal random read, fetch 1353.67MB in total, cost 179.49s, avg speed is 7.54MB/s.
  # test 3
  Normal random read, fetch 1352.99MB in total, cost 179.41s, avg speed is 7.54MB/s.
  # avg
  cost 179.50s, speed is 7.54MB/s

MIO read

  import time
  import random
  from torchlearning.mio import MIO

  if __name__ == '__main__':
      m = MIO("/home/chenyaofo/datasets/hmdb51_mio")
      start = time.time()
      total_size = 0
      # Visit the collections (videos) in random order
      collection_ids = list(range(m.size))
      random.shuffle(collection_ids)
      for i in collection_ids:
          print(i)  # progress output; note that it runs inside the timed loop
          # k = 8 here; the reported results also cover k = 1, 2, 4
          to_sample_ids = sorted(random.sample(range(m.get_collection_size(i)), k=8))
          objects = m.fetchmany(i, to_sample_ids)
          for o in objects:
              total_size += len(o)
      cost_time = time.time() - start
      total_size /= (1024 * 1024)  # bytes -> MB
      print("MIO random read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s."
            .format(total_size, cost_time, total_size / cost_time))

Test results

  # for 1 frame
  # test 1
  MIO random read, fetch 169.34MB in total, cost 59.15s, avg speed is 2.86MB/s.
  # test 2
  MIO random read, fetch 169.10MB in total, cost 58.36s, avg speed is 2.90MB/s.
  # test 3
  MIO random read, fetch 169.40MB in total, cost 58.64s, avg speed is 2.89MB/s.
  # avg
  cost 58.72s, speed is 2.88MB/s

  # for 2 frames
  # test 1
  MIO random read, fetch 338.30MB in total, cost 81.41s, avg speed is 4.16MB/s.
  # test 2
  MIO random read, fetch 338.11MB in total, cost 82.52s, avg speed is 4.10MB/s.
  # test 3
  MIO random read, fetch 337.82MB in total, cost 81.90s, avg speed is 4.12MB/s.
  # avg
  cost 81.94s, speed is 4.13MB/s

  # for 4 frames
  # test 1
  MIO random read, fetch 676.16MB in total, cost 105.99s, avg speed is 6.38MB/s.
  # test 2
  MIO random read, fetch 676.40MB in total, cost 104.98s, avg speed is 6.44MB/s.
  # test 3
  MIO random read, fetch 675.68MB in total, cost 104.38s, avg speed is 6.47MB/s.
  # avg
  cost 105.12s, speed is 6.43MB/s

  # for 8 frames
  # test 1
  MIO random read, fetch 1352.68MB in total, cost 123.42s, avg speed is 10.96MB/s.
  # test 2
  MIO random read, fetch 1352.40MB in total, cost 122.81s, avg speed is 11.01MB/s.
  # test 3
  MIO random read, fetch 1353.02MB in total, cost 123.88s, avg speed is 10.92MB/s.
  # avg
  cost 123.37s, speed is 10.96MB/s

Multi-process read tests

In the multi-process read tests, data is loaded with the PyTorch DataLoader.

Normal loading

  import time
  import pathlib
  import random
  from torch.utils.data import DataLoader, Dataset

  class VideoDataset(Dataset):
      def __init__(self, root):
          self.root = pathlib.Path(root)
          # Collect every video directory: root/<category>/<video>
          self.all_videos = []
          for category in self.root.iterdir():
              for video in category.iterdir():
                  self.all_videos.append(video)

      def __getitem__(self, index):
          video = self.all_videos[index]
          size = len(list(video.iterdir()))
          to_sample_ids = sorted(random.sample(range(size), k=8))
          rev = []
          for i in to_sample_ids:
              rev.append((video / "{:05d}.jpg".format(i)).read_bytes())
          return rev, 0  # raw bytes of the sampled frames and a dummy label

      def __len__(self):
          return len(self.all_videos)

  dataloader = DataLoader(
      dataset=VideoDataset("/home/chenyaofo/datasets/hmdb51"),
      batch_size=8,
      shuffle=True,
      num_workers=8,  # the reported results cover 2, 4, and 8 worker processes
      collate_fn=lambda batch: batch,  # keep the batch as a plain list of samples
  )
  start = time.time()
  total_size = 0
  for data in dataloader:
      for video, label in data:
          for frame in video:
              total_size += len(frame)
  cost_time = time.time() - start
  total_size /= (1024 * 1024)  # bytes -> MB
  print("Normal pytorch dataloader random read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s."
        .format(total_size, cost_time, total_size / cost_time))

Test results

  # 2 processes
  # test 1
  Normal pytorch dataloader random read, fetch 1351.81MB in total, cost 253.17s, avg speed is 5.34MB/s.
  # test 2
  Normal pytorch dataloader random read, fetch 1353.02MB in total, cost 254.13s, avg speed is 5.32MB/s.
  # test 3
  Normal pytorch dataloader random read, fetch 1351.99MB in total, cost 253.90s, avg speed is 5.32MB/s.
  # avg
  cost 253.73s, speed is 5.33MB/s

  # 4 processes
  # test 1
  Normal pytorch dataloader random read, fetch 1352.94MB in total, cost 276.83s, avg speed is 4.89MB/s.
  # test 2
  Normal pytorch dataloader random read, fetch 1353.21MB in total, cost 276.60s, avg speed is 4.89MB/s.
  # test 3
  Normal pytorch dataloader random read, fetch 1352.38MB in total, cost 276.46s, avg speed is 4.89MB/s.
  # avg
  cost 276.63s, speed is 4.89MB/s

  # 8 processes
  # test 1
  Normal pytorch dataloader random read, fetch 1352.62MB in total, cost 278.33s, avg speed is 4.86MB/s.
  # test 2
  Normal pytorch dataloader random read, fetch 1352.78MB in total, cost 275.84s, avg speed is 4.90MB/s.
  # test 3
  Normal pytorch dataloader random read, fetch 1353.25MB in total, cost 275.81s, avg speed is 4.91MB/s.
  # avg
  cost 276.66s, speed is 4.89MB/s

MIO loading

  import time
  import random
  from torchlearning.datasets import MioDataset
  from torch.utils.data import DataLoader

  dataset = MioDataset(
      root="/home/chenyaofo/datasets/hmdb51_mio",
      # Pick 8 random frame indices per video, in ascending order
      sampler=lambda size: sorted(random.sample(list(range(size)), k=8)),
  )
  dataloader = DataLoader(
      dataset=dataset,
      batch_size=8,
      shuffle=True,
      num_workers=8,  # the reported results cover 2, 4, and 8 worker processes
      collate_fn=lambda batch: batch,  # keep the batch as a plain list of samples
  )
  start = time.time()
  total_size = 0
  for data in dataloader:
      for video, label in data:
          for frame in video:
              total_size += len(frame)
  cost_time = time.time() - start
  total_size /= (1024 * 1024)  # bytes -> MB
  print("MIO pytorch dataloader random read, fetch {:.2f}MB in total, cost {:.2f}s, avg speed is {:.2f}MB/s."
        .format(total_size, cost_time, total_size / cost_time))

Test results

  # 2 processes
  # test 1
  MIO pytorch dataloader random read, fetch 1352.52MB in total, cost 158.24s, avg speed is 8.55MB/s.
  # test 2
  MIO pytorch dataloader random read, fetch 1352.20MB in total, cost 158.85s, avg speed is 8.51MB/s.
  # test 3
  MIO pytorch dataloader random read, fetch 1352.41MB in total, cost 160.38s, avg speed is 8.43MB/s.
  # avg
  cost 159.16s, speed is 8.50MB/s

  # 4 processes
  # test 1
  MIO pytorch dataloader random read, fetch 1352.05MB in total, cost 167.81s, avg speed is 8.06MB/s.
  # test 2
  MIO pytorch dataloader random read, fetch 1352.29MB in total, cost 169.65s, avg speed is 7.97MB/s.
  # test 3
  MIO pytorch dataloader random read, fetch 1352.38MB in total, cost 168.46s, avg speed is 8.03MB/s.
  # avg
  cost 168.64s, speed is 8.02MB/s

  # 8 processes
  # test 1
  MIO pytorch dataloader random read, fetch 1352.85MB in total, cost 168.30s, avg speed is 8.04MB/s.
  # test 2
  MIO pytorch dataloader random read, fetch 1353.60MB in total, cost 168.01s, avg speed is 8.06MB/s.
  # test 3
  MIO pytorch dataloader random read, fetch 1352.27MB in total, cost 169.02s, avg speed is 8.00MB/s.
  # avg
  cost 168.44s, speed is 8.05MB/s

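Summary

The table below collects all of the results above. The two quantities of interest are the time spent loading the data and the loading speed.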

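| Test | Time, normal | Time, MIO | Time change | Speed, normal | Speed, MIO | Speed change |
|---|---|---|---|---|---|---|
| Sequential read of all frames (1 process) | 229.45s | 88.22s | -61.55% | 68.39MB/s | 177.90MB/s | +160.13% |
| Random read, 1 frame (1 process) | 91.88s | 58.72s | -36.09% | 1.84MB/s | 2.88MB/s | +56.52% |
| Random read, 2 frames (1 process) | 121.01s | 81.94s | -32.29% | 2.80MB/s | 4.13MB/s | +47.50% |
| Random read, 4 frames (1 process) | 151.82s | 105.12s | -30.76% | 4.46MB/s | 6.43MB/s | +44.17% |
| Random read, 8 frames (1 process) | 179.50s | 123.37s | -31.27% | 7.54MB/s | 10.96MB/s | +45.36% |
| Random read, 8 frames (2 processes) | 253.73s | 159.16s | -37.27% | 5.33MB/s | 8.50MB/s | +59.47% |
| Random read, 8 frames (4 processes) | 276.63s | 168.64s | -39.04% | 4.89MB/s | 8.02MB/s | +64.01% |
| Random read, 8 frames (8 processes) | 276.66s | 168.44s | -39.12% | 4.89MB/s | 8.05MB/s | +64.62% |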
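
The percentage figures in the summary are relative changes of MIO loading against normal loading. A minimal sketch of that arithmetic follows, using the averages of the 8-frame single-process test. `percent_change` and `format_change` are illustrative helpers, not part of the original benchmark scripts.

```python
def percent_change(baseline, new):
    """Relative change of `new` against `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def format_change(baseline, new):
    """Format the relative change with an explicit sign and two decimals."""
    return "{:+.2f}%".format(percent_change(baseline, new))


# Random read of 8 frames, single process (normal vs. MIO averages)
time_change = format_change(179.5, 123.37)   # time spent, in seconds
speed_change = format_change(7.54, 10.96)    # throughput, in MB/s
print(time_change, speed_change)             # prints: -31.27% +45.36%
```

A negative time change and a positive speed change both mean that MIO loading was faster in that test.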