Newsgroups: comp.sys.mac.programmer.help
Date: Mon, 20 Apr 2015 20:26:48 -0700 (PDT)
Subject: The core of the core of the big data solutions -- Map
From: Wenwei Peng

Author: pengwenwei
Email: wenwei1971043
Language: C++
Platform: Windows, Linux
Technology: Perfect hash algorithm
Level: Advanced
Description: Map algorithm with high performance
Section: MFC, C++, map, STL
SubSection: C++ algorithm
License: GPLv3

Download demo project - 1070 Kb
Download source - 1070 Kb

Introduction:

In C++ programs, maps are used everywhere, and the bottleneck of program performance is often the performance of the map. This is especially true with large data volumes where the business logic is so tightly coupled that the data cannot be distributed and processed in parallel. Under those conditions, the performance of the map becomes the key technology.

In my work in the telecommunications and information security industries, I dealt with large volumes of low-level data, especially the highly complex data of the information security industry, and none of it could be handled without maps.
For example: IP tables, MAC tables, telephone number lists, domain name resolution tables, ID number lookups, and the Trojan and virus signatures used in cloud-based antivirus scanning.

The STL map uses binary search over a balanced tree (O(log n) per lookup), and it has the worst performance of the three. Google's hash map currently offers the best performance and memory usage, but it is subject to hash collisions. Big data applications rarely accept a map with any probability of collision, especially where billing is involved, because the results cannot be wrong.

I am publishing my algorithms here. There are three kinds of map, the last of which is a hash map, so you can run the comparison yourself. My algorithm has zero probability of collision, yet it still performs better than the hash algorithm, and even its ordinary performance is close to Google's.

My algorithm is a perfect hash algorithm. Its key indexing and compression principles are out of the ordinary; most importantly, it uses a completely different structure, so its key index compression is fundamentally different. The most direct benefit for a program is that a workload that needed ten servers with the original map now needs only one.

Notice: the code may not be used for commercial purposes. For commercial applications, contact me on QQ: 75293192.

Download: https://sourceforge.net/projects/pwwhashmap/files

Applications:

First, modern warfare cannot do without querying masses of information. If a query for enemy target information is slowed by one second, it could delay a fighter and lead to the failure of the entire war. Information retrieval is inseparable from the map; if military products used pwwHashMap instead of the traditional map, you would be the winner.

Second, router performance determines browsing speed. Simply replacing the map in open source router code with pwwHashMap can make it ten times faster.
The router's DHCP protocol involves many tables that must be queried and updated, such as IP and MAC tables, and all of them are implemented with maps. Until now, these maps have all used the STL library, whose performance is very low, and a hash map carries a probability of error, so the only option has been to spread packet processing across multiple routers. With pwwHashMap, you can save at least ten pieces of equipment.

Third, Hadoop is currently recognized as the leading big data solution, and its most fundamental characteristic is heavy use of map instead of SQL and tables. Hadoop assumes data volumes so huge that the data cannot be moved at all, so analysis must be carried out locally. But as long as the map in the open source Hadoop code is replaced with pwwHashMap, performance will increase a hundredfold without any problems.

Background that may be useful, such as an introduction to the basic ideas presented: http://blog.csdn.net/chixinmuzi/article/details/1727195
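The post describes pwwHashMap only as a perfect hash algorithm with an unusual structure and does not publish its details. For readers new to the term, here is a minimal sketch of one generic, well-known way to build a perfect hash for a fixed set of keys, the "hash and displace" approach. It is not the author's algorithm; the names StaticPerfectMap and seeded_hash, the seeded FNV-1a hash, and the search limit are assumptions chosen for illustration. Keys are first grouped into buckets, then each bucket gets a small seed (displacement) that sends all of its keys to distinct free slots, so every key in the set owns exactly one slot and a lookup never probes.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Seeded FNV-1a hash (an assumption for this sketch); seed 0 selects the
// standard FNV offset basis, any other seed is used as the starting state.
static std::uint32_t seeded_hash(std::uint32_t seed, const std::string& key) {
    std::uint32_t h = seed ? seed : 0x811C9DC5u;
    for (unsigned char c : key) {
        h ^= c;
        h *= 16777619u;  // FNV prime
    }
    return h;
}

// Hypothetical static map built with "hash and displace": each of the n
// distinct keys is assigned its own slot in a table of exactly n slots.
class StaticPerfectMap {
public:
    explicit StaticPerfectMap(const std::vector<std::pair<std::string, int>>& items)
        : n_(items.size()), disp_(n_, 0), keys_(n_), values_(n_) {
        if (n_ == 0) return;
        // Step 1: group key indices into n buckets with the seed-0 hash.
        std::vector<std::vector<std::size_t>> buckets(n_);
        for (std::size_t i = 0; i < n_; ++i)
            buckets[seeded_hash(0, items[i].first) % n_].push_back(i);
        // Step 2: visit the largest buckets first, while most slots are still free.
        std::vector<std::size_t> order(n_);
        for (std::size_t b = 0; b < n_; ++b) order[b] = b;
        std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
            return buckets[a].size() > buckets[b].size();
        });
        std::vector<bool> used(n_, false);
        std::size_t pos = 0;
        // Step 3: for buckets with 2+ keys, try seeds d = 1, 2, ... until every
        // key lands on a distinct, still-free slot; remember d for the bucket.
        for (; pos < n_ && buckets[order[pos]].size() > 1; ++pos) {
            const auto& bucket = buckets[order[pos]];
            for (std::uint32_t d = 1;; ++d) {
                if (d > 1000000) throw std::runtime_error("no displacement found (duplicate keys?)");
                std::vector<std::size_t> slots;
                bool ok = true;
                for (std::size_t i : bucket) {
                    std::size_t s = seeded_hash(d, items[i].first) % n_;
                    if (used[s] || std::find(slots.begin(), slots.end(), s) != slots.end()) {
                        ok = false;
                        break;
                    }
                    slots.push_back(s);
                }
                if (!ok) continue;
                disp_[order[pos]] = static_cast<std::int64_t>(d);
                for (std::size_t j = 0; j < bucket.size(); ++j) store(slots[j], items[bucket[j]], used);
                break;
            }
        }
        // Step 4: single-key buckets take the remaining free slots directly;
        // the slot is stored as a negative number so lookup can tell it apart.
        std::size_t free_slot = 0;
        for (; pos < n_ && buckets[order[pos]].size() == 1; ++pos) {
            while (used[free_slot]) ++free_slot;
            disp_[order[pos]] = -static_cast<std::int64_t>(free_slot) - 1;
            store(free_slot, items[buckets[order[pos]][0]], used);
        }
    }

    // Lookup: at most two hash computations and one key comparison, no probing.
    std::optional<int> find(const std::string& key) const {
        if (n_ == 0) return std::nullopt;
        std::int64_t d = disp_[seeded_hash(0, key) % n_];
        std::size_t s = d < 0 ? static_cast<std::size_t>(-d - 1)
                              : seeded_hash(static_cast<std::uint32_t>(d), key) % n_;
        // Keys outside the set also land on some slot, so verify the stored key.
        if (keys_[s] == key) return values_[s];
        return std::nullopt;
    }

private:
    void store(std::size_t s, const std::pair<std::string, int>& kv, std::vector<bool>& used) {
        used[s] = true;
        keys_[s] = kv.first;
        values_[s] = kv.second;
    }

    std::size_t n_;
    std::vector<std::int64_t> disp_;  // per-bucket seed (>0) or encoded slot (<0)
    std::vector<std::string> keys_;   // key stored in each slot
    std::vector<int> values_;         // value stored in each slot
};
```

Usage: build once with `StaticPerfectMap m(items);` and call `m.find(key)`, which returns `std::nullopt` for keys outside the set. This sketch trades flexibility for predictable lookups: the key set must be known in advance and free of duplicates, and adding a key means rebuilding the table. Because a perfect hash still maps an unknown key to some slot, the stored key is compared before a value is returned.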