"Research on Correlation-Aware Organization Model and Key Technology in Hybrid Storage Systems", National Natural Science Foundation of China (NSFC), No. 61173043, 2012-2015.

 

Abstract:

Storage organization models in current large-scale file systems face two main problems: massive data volumes and heterogeneous devices. Together they make scalability and performance optimization challenging. To address these two problems, this project proposes a correlation-aware organization model for hybrid storage. The basic idea is to use locality-sensitive hashing (LSH) to aggregate correlated data quickly and accurately. Once correlated data are aggregated, many file operations (e.g., read, write, and update) can be executed sequentially, and SSD performance can be significantly improved by reducing the number of erase operations. The model also improves the interface between the system and its users, making it easier to translate user requests into system operations. The correlation-aware design lets high-level applications know the data layout and access patterns, which further improves storage efficiency for massive data and the utilization of heterogeneous devices. An aggregation-based multi-version update method improves system reliability. The research builds on the team's previous work and can be implemented and tested in massive storage environments. The source code of the main components of Semi-hierarchical Semantic-aware Storage has been released on GitHub for public use.
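The abstract does not say which LSH family the project uses. The following is a minimal sketch under stated assumptions: each file is described by a numeric feature vector (for example, hypothetical metadata attributes such as size, timestamps, or access frequency), and random-hyperplane LSH (a SimHash-style family for cosine similarity) is swapped in as the hash. Vectors separated by a small angle land in the same bucket with high probability. The function names are hypothetical and are not taken from the project's released code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers illustrating random-hyperplane LSH; not the project's API.

def make_hyperplanes(dim: int, num_bits: int, seed: int = 42) -> List[List[float]]:
    """Draw num_bits random Gaussian hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_signature(vec: Sequence[float], planes: List[List[float]]) -> Tuple[int, ...]:
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def aggregate_correlated(
    files: Dict[str, Sequence[float]], planes: List[List[float]]
) -> Dict[Tuple[int, ...], List[str]]:
    """Group files whose (assumed) feature vectors hash to the same LSH bucket."""
    buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for name, vec in files.items():
        buckets[lsh_signature(vec, planes)].append(name)
    return dict(buckets)

if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, num_bits=8)
    files = {
        "a.log": [1.0, 2.0, 3.0],
        "b.log": [2.0, 4.0, 6.0],     # same direction as a.log, so same bucket
        "c.dat": [-1.0, -2.0, -3.0],  # opposite direction, so a different bucket
    }
    for sig, group in aggregate_correlated(files, planes).items():
        print(sig, group)
```

Files that share a bucket are candidates for aggregation, for instance by laying them out contiguously so that operations on them can run sequentially. More bits per signature yield tighter but smaller buckets. Practical LSH schemes usually combine several such hash tables to trade recall against precision.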

 

Publications:

Journal:

Yu Hua, Hong Jiang, Dan Feng, "Real-time Semantic Search using Approximate Methodology for Large-scale Storage Systems", Accepted and to appear in IEEE Transactions on Parallel and Distributed Systems (TPDS).

Min Fu, Dan Feng, Yu Hua, Xubin He, Zuoning Chen, Jingning Liu, Wen Xia, Fangting Huang, Qing Liu, "Reducing Fragmentation for In-line Deduplication Backup Storage via Exploiting Backup History and Cache Knowledge", Accepted and to appear in IEEE Transactions on Parallel and Distributed Systems (TPDS).

Wen Xia, Hong Jiang, Dan Feng, Lei Tian, "DARE: A Deduplication-Aware Resemblance Detection and Elimination Scheme for Data Reduction with Low Overheads", Accepted and to appear in IEEE Transactions on Computers (TC).

Yu Hua, Bin Xiao, Xue Liu, Dan Feng, "The Design and Implementations of Locality-aware Approximate Queries in Hybrid Storage Systems", IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 26, No.11, November 2015, pages: 3194-3207. WNLO News Report.

Jinjun Liu, Dan Feng, Yu Hua, Bin Peng, Zhenhua Nie, "Using Provenance to Efficiently Improve Metadata Searching Performance in Storage Systems", Future Generation Computer Systems (FGCS), Volume 50, September 2015, Pages 99–110.

Wen Xia, Hong Jiang, Dan Feng, Yu Hua, "Similarity and Locality based Indexing for High Performance Data Deduplication", IEEE Transactions on Computers (TC), Vol.64, No.4, April 2015, pages: 1162-1176.

Wei Mao, Jingning Liu, Wei Tong, Dan Feng, Zheng Li, Wen Zhou, Shuangwu Zhang, "A Review of Storage Technology Research based on Phase Change Memory", Chinese Journal of Computers (计算机学报), Vol.18, No.5, 2015, pages: 944-960. (in Chinese)

Yu Hua, Xue Liu, Wenbo He, Dan Feng, "Design and Implementation of Holistic Scheduling and Efficient Storage for FlexRay", IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol.25, No.10, October 2014, pages: 2529-2539.

Yu Hua, Xue Liu, Hong Jiang, "ANTELOPE: A Semantic-aware Data Cube Scheme for Cloud Data Center Networks", IEEE Transactions on Computers (TC), Vol.63, No.9, September 2014, pages: 2146-2159.

Yu Hua, Hong Jiang, Yifeng Zhu, Dan Feng, Lei Xu, "SANE: Semantic-Aware Namespace in Ultra-large-scale File Systems", IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol.25, No.5, May 2014, pages:1328-1338.

Nan Zhu, Xue Liu, Jie Liu, Yu Hua, "Towards A Cost-Efficient MapReduce: Mitigating Power Peaks for Hadoop Clusters", Tsinghua Science and Technology (TST), Volume 19, Issue 1, Feb. 2014, pages: 24-32. (Selected as the Spotlight Paper and Best Paper Award)

Yu Hua, Xue Liu, Dan Feng, "Data Similarity-aware Computation Infrastructure for the Cloud", IEEE Transactions on Computers (TC), Vol.63, No.1, January 2014, pages: 3-16.

Dan He, Fang Wang, Hong Jiang, Dan Feng, Jingning Liu, Wei Tong, Zheng Zhang, "Improving Hybrid FTL by Fully Exploiting Internal SSD Parallelism with Virtual Blocks", ACM Transactions on Architecture and Code Optimization (TACO), Vol. 11, No. 4, December 2014, pages: 43-62.

Yulai Xie, Dan Feng, Zhipeng Tan, and Junzhe Zhou, "Design and Evaluation of a Provenance-Based Rebuild Framework", IEEE Transactions on Magnetics, Vol. 49, No. 6, June 2013, pages: 2805-2811.

Yu Hua, Xue Liu, "Scheduling Heterogeneous Flows with Delay-aware Deduplication for Avionics Applications", IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 23, No. 9, September 2012, pages: 1790-1802.

Yu Hua, Bin Xiao, Bharadwaj Veeravalli, Dan Feng, "Locality-Sensitive Bloom Filter for Approximate Membership Query", IEEE Transactions on Computers (TC), Vol. 61, No. 6, June 2012, pages: 817-830. (Download Source Codes and Manual)

Yu Hua, Hong Jiang, Yifeng Zhu, Dan Feng, Lei Tian, "Semantic-Aware Metadata Organization Paradigm in Next-Generation File Systems", IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol.23, No. 2, February 2012, pages: 337-344.

 

 

Conference:

Qing Liu, Dan Feng, Hong Jiang, Yuchong Hu, Tianfeng Jiao, "Z codes: General Systematic Erasure Codes with Optimal Repair Bandwidth under Minimum Storage for Distributed Storage Systems", Proceedings of the 34th Symposium on Reliable Distributed Systems (SRDS), 2015.

Yunxiang Wu, Fang Wang, Yu Hua, Dan Feng, Yuchong Hu, Jingning Liu, Wei Tong, "FastFCoE: An Efficient and Scale-up Multi-core Framework for FCoE-based SAN Storage Systems", Proceedings of the 44th International Conference on Parallel Processing (ICPP), 2015, Pages: 330-339. (Acceptance rate: 99/305=32.5%)

Wen Xia, Chunguang Li, Hong Jiang, Dan Feng, Yu Hua, Leihua Qin, Yucheng Zhang, "Edelta: A Word-Enlarging Based Fast Delta Compression Approach", Proceedings of the 7th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage), 2015. (Acceptance rate: 17/55=30.9%)

Yu Hua, "Smart Hashing based Queries in the Cloud", Proceedings of IEEE/ACM International Symposium on Quality of Service (IWQoS), 2015, pages: 1-10. (Acceptance rate: 20/89=22.5% as Full Paper, in conjunction with ACM FCRC 2015)

Zheng Li, Shuangwu Zhang, Jingning Liu, Wei Tong, Yu Hua, Dan Feng, Chenye Yu, "A Software-Defined Fusion Storage System for PCM and NAND Flash", Proceedings of the 4th IEEE Non-Volatile Memory System and Applications Symposium (NVMSA), 2015.

Qing Liu, Dan Feng, Zhan Shi, Min Fu, "General Functional Regenerating Codes with Uncoded Repair for Distributed Storage System", Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2015.

Yuanyuan Sun, Yu Hua, Dan Feng, Ling Yang, Pengfei Zuo, Shunde Cao, "MinCounter: An Efficient Cuckoo Hashing Scheme for Cloud Storage Systems", Proceedings of the 31st International Conference on Massive Storage Systems and Technology (MSST), 2015. (Acceptance rate: 23/100=23%)

Yukun Zhou, Dan Feng, Wen Xia, Min Fu, Fangting Huang, Yucheng Zhang, Chunguang Li, "SecDep: A User-Aware Efficient Fine-Grained Secure Deduplication Scheme with Multi-Level Key Management", Proceedings of the 31st International Conference on Massive Storage Systems and Technology (MSST), June 2015. (Acceptance rate: 23/100=23%)

Min Fu, Dan Feng, Yu Hua, Xubin He, Zuoning Chen, Wen Xia, Yucheng Zhang, Yujuan Tan, "Design Tradeoffs for Data Deduplication Performance in Backup Workloads", Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST), February 2015, pages: 331-344. (Acceptance rate: 28/130=21.5%)

Yu Hua, Wenbo He, Xue Liu, Dan Feng, "SmartEye: Real-time and Efficient Cloud Image Sharing for Disaster Environments", Proceedings of the 34th IEEE International Conference on Computer Communications (INFOCOM), 2015, pages: 1616-1624. (Acceptance rate: 316/1640=19%)

Yucheng Zhang, Hong Jiang, Dan Feng, Wen Xia, Min Fu, Fangting Huang, Yukun Zhou, "AE: An Asymmetric Extremum Content Defined Chunking Algorithm for Fast and Bandwidth-Efficient Data Deduplication", Proceedings of the 34th IEEE International Conference on Computer Communications (INFOCOM), 2015, pages: 1337-1345. (Acceptance rate: 316/1640=19%)

Jinjun Liu, Dan Feng, Yu Hua, Bin Peng, Pengfei Zuo, Yuanyuan Sun, "P-index: An Efficient Searchable Metadata Indexing Scheme based on Data Provenance in Cold Storage", Proceedings of the 15th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP), 2015.

Yu Hua, Hong Jiang, Dan Feng, "FAST: Near Real-time Searchable Data Analytics for the Cloud", Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), November 2014, Pages: 754-765. (Acceptance rate: 82/394=20.8%)

Jinjun Liu, Dan Feng, Yu Hua, Bin Peng, Pengfei Zuo, "Application-aware Video-Sharing Services via Provenance in Cloud Storage", Accepted and to appear in the Proceedings of the 33rd IEEE International Performance Computing and Communications Conference (IPCCC), December 2014.

Yu Hua, Dan Feng, "A Correlation-Aware Partial Materialization Scheme for Near Real-Time Automotive Queries", Proceedings of 2014 International Conference on Smart Computing (SMARTCOMP), November 2014, pages: 237-244.

Yu Hua, Lei Rao, Xue Liu, Dan Feng, "Cooperative and Efficient Real-time Scheduling for Automotive Communications", Proceedings of the 34th International Conference on Distributed Computing Systems (ICDCS), 2014, pages: 134-143. (Acceptance rate: 66/500=13%)

Min Fu, Dan Feng, Yu Hua, Xubin He, Zuoning Chen, Wen Xia, Fangting Huang, Qing Liu, "Accelerating Restore and Garbage Collection in Deduplication-based Backup Systems via Exploiting Historical Information", Proceedings of USENIX Annual Technical Conference (USENIX ATC), June 2014, pages: 181-192. (Acceptance rate: 36/241=14.9%)

Yu Hua, Dan Feng, "Needle in A Haystack: Cost-Effective Data Analytics for Real-Time Cloud Sharing", Proceedings of IEEE/ACM International Symposium on Quality of Service (IWQoS), 2014, pages: 41-49. (Acceptance rate: 23.8%)

Qiuyu Li, Yu Hua, Wenbo He, Dan Feng, Zhenhua Nie, Yuanyuan Sun, "Necklace: An Efficient Cuckoo Hashing Scheme for Cloud Storage Services", Proceedings of IEEE/ACM International Symposium on Quality of Service (IWQoS), 2014, pages: 50-55.

Yu Hua, Xue Liu, Dan Feng, "Neptune: Efficient Remote Communication Services for Cloud Backups", Proceedings of the 33rd IEEE International Conference on Computer Communications (INFOCOM), 2014, pages: 844-852. (Acceptance rate: 19.4%)

Jing Zhang, Xiangke Liao, Shanshan Li, Yu Hua, Xue Liu, Bin Lin, "Aggrecode: Constructing Route Intersection for Data Reconstruction in Erasure Coded Storage Systems", Proceedings of the 33rd IEEE International Conference on Computer Communications (INFOCOM), 2014, pages: 2139-2147. (Acceptance rate: 19.4%)

Rongyu Lai, Yu Hua, Dan Feng, Wen Xia, Min Fu, Yifan Yang, "A Near-exact Defragmentation Scheme to Improve Restore Performance for Cloud Backup Systems", Proceedings of the 14th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP), LNCS 8630, 2014, pages 457-471. (Acceptance rate: 70/285=24.5%)

Zhenhua Nie, Yu Hua, Dan Feng, Qiuyu Li, Yuanyuan Sun, "Efficient Storage Support for Real-time Near-duplicate Video Retrieval", Proceedings of the 14th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP), LNCS 8631, 2014, pages: 312-324. (Acceptance rate: 70/285=24.5%)

Wen Xia, Hong Jiang, Dan Feng, and Lei Tian, "Combining Deduplication and Delta Compression to Achieve Low-Overhead Data Reduction on Backup Datasets", Proceedings of IEEE Data Compression Conference (IEEE DCC), Snowbird, Utah, USA, March 26-28, 2014.

Wen Xia, Hong Jiang, Dan Feng, Lei Tian, Min Fu, and Yukun Zhou, "Ddelta: A Deduplication-Inspired Fast Delta Compression Approach", Proceedings of the 32nd International Symposium on Computer Performance, Modeling, Measurements and Evaluation (IFIP Performance), Italy, October 7-9, 2014.

Yu Hua, Bin Xiao, Xue Liu, "NEST: Locality-aware Approximate Query Service for Cloud Computing", Proceedings of the 32nd IEEE International Conference on Computer Communications (INFOCOM), April 2013, pages: 1327-1335. (Acceptance rate: 17%, Download Source Codes and Manual)

Sai Huang, Qingsong Wei, Jianxi Chen, Cheng Chen, Dan Feng, "Improving Flash-based Disk Cache with Lazy Adaptive Replacement", Proceedings of the IEEE International Conference on Massive Storage Systems and Technology (MSST), 2013.

Yu Hua, Xue Liu, Dan Feng, "MERCURY: A Scalable and Similarity-aware Scheme in Multi-level Cache Hierarchy", Proceedings of the 20th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), August 2012, pages: 371-378.

Yu Hua, Xue Liu, Wenbo He, "HOSA: Holistic Scheduling and Analysis for Scalable Fault-tolerant FlexRay Design", Proceedings of the 31st IEEE International Conference on Computer Communications (INFOCOM), March 2012, pages: 1233-1241. (Acceptance rate: 18%)

Zhichao Yan, Hong Jiang, Dan Feng, Lei Tian and Yujuan Tan, "SUV: A Novel Single-Update Version-Management Scheme for Hardware Transactional Memory Systems", Proceedings of the IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), 2012.

Yulai Xie, Dan Feng, Zhipeng Tan, Lei Chen, Kiran-Kumar Muniswamy-Reddy, Yan Li and Darrell D. E. Long, "A Hybrid Approach for Efficient Provenance Storage", Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), 2012.

Wen Xia, Hong Jiang, Dan Feng, Lei Tian, Min Fu, Zhongtao Wang, "P-Dedupe: Exploiting Parallelism in Data Deduplication System", Proceedings of the IEEE Seventh International Conference on Networking, Architecture, and Storage (NAS), 2012.

Yi Qin, Dan Feng, Jingning Liu, Wei Tong, Yang Hu, Zhiming Zhu, "A Parity Scheme to Enhance Reliability for SSDs", Proceedings of the IEEE Seventh International Conference on Networking, Architecture and Storage (NAS), 2012.

 

Open-source Software

LSBF (Locality-Sensitive Bloom Filter) in GitHub (Download Paper, Source Codes and Manual).

NEST: in GitHub (Download Paper, Source Codes, Manual and Trace Data).

E-STORE: in GitHub. E-STORE (FAST 2015 poster) offers near-deduplication for image sharing based on the energy available on smartphones.

MinCounter: in GitHub. MinCounter is the data structure proposed in the MSST 2015 paper.

 

 

Patent and Software Copyright

Energy-aware real-time image sharing in disaster environments, 2015 (Filed)

Smart In-network Redundancy Identification for Online Multimedia Applications, 2015 (Filed, Software Copyright)

In-network deduplication scheme in software-defined networks, 2015 (Filed)

Provenance-aware video sharing services in cloud storage systems, 2014 (Filed)

A fragmentation elimination scheme based on historical information, 2014 (Filed) 

A data recovery approach in cloud backups, 2014 (Filed)

A provenance based metadata query scheme in storage systems, 2013 (Filed)

Near-duplicate video detection scheme in large-scale storage systems, 2013 (Filed)

A write sequence analysis in internal SSD, 2013 (Filed)

......

 

Correlation-aware multi-dimensional metadata management systems, 2015 (Granted)

Cost-efficient smartphone-based image sharing and near-deduplication services, July 2015 (Granted, Software Copyright)

A data deduplication scheme by exploiting similarity and locality, August 2012 (Granted)

A computer management method, May 2012 (Granted)

Data classification management software, October 2011 (Granted, Software Copyright)

An improved method for parallel streaming media servers, December 2010 (Granted)

Importance evaluation method of files, April 2010 (Granted)

......