Newsgroups: comp.os.linux.development.apps
Date: Mon, 20 Apr 2015 20:04:25 -0700 (PDT)
Message-ID: <8a5c8a11-800f-4079-be95-371889c5d7cb@googlegroups.com>
Subject: The core of the core of the big data solutions -- Map
From: Wenwei Peng

Title: The core of the core of the big data solutions -- Map
Author: pengwenwei
Email:
Language: C++
Platform: Windows, Linux
Technology: Perfect hash algorithm
Level: Advanced
Description: Map algorithm with high performance
Section: MFC c++ map stl
SubSection: c++ algorithm
License: GPLv3

Download demo project - 1070 Kb
Download source - 1070 Kb

Introduction:

In C++ programs, map is used everywhere, and the bottleneck of program performance is often the performance of the map. This is especially true with large data sets whose business logic is so tightly coupled that the data cannot be distributed and processed in parallel. In those cases the performance of the map becomes the key technology.

In my work in the telecommunications and information security industries, I have dealt with large volumes of low-level data, especially the most complex information security data, and none of it can do without a map. Examples include IP tables, MAC tables, telephone number lists, domain name resolution tables, ID number lookups, and Trojan and virus signature lookups for cloud-based scanning.

The STL map finds keys by binary search through an ordered tree, and it performs worst of the maps compared here. Google's hash map currently offers the best performance and memory use, but it has a probability of collisions. Big data applications rarely use a map with a collision probability, especially where billing is involved and errors are unacceptable.

I am publishing my algorithms here. There are three kinds of map; after the build, the result is a hash map. You can run the comparison yourself: my algorithm has zero collision probability, yet it also outperforms the hash algorithm, and even in ordinary use its performance differs little from Google's.

My algorithm is a perfect hash algorithm. Its key index and compression principle are unconventional; most importantly, it uses a completely different structure, so the key index compression is fundamentally different. The most direct benefit for a program is that a solution that needed ten servers with the original map now needs only one.

Declaration: the code may not be used for commercial purposes. For commercial applications, contact me on QQ at 75293192.

Download: https://sourceforge.net/projects/pwwhashmap/files

Applications:

First, modern warfare depends on querying massive amounts of information. If a query for enemy target information is delayed by one second, the opportunity to strike may be lost, which could lead to the failure of the entire war. Information retrieval is inseparable from the map; if military products use pwwHashMap instead of the traditional map, you will be the winner.

Second, the performance of a router determines browsing speed. Just replace the map in open source router code with pwwHashMap and its speed can increase tenfold. The router's DHCP protocol has many tables to query and update, such as IP and MAC tables, and all of these are handled by maps. Until now, these maps have all used the STL library, whose performance is very low, while a hash map has an error probability, so the only option has been to spread packets across multiple routers. With pwwHashMap, you can save at least ten sets of equipment.

Third, Hadoop is currently recognized as the big data solution, and its most fundamental feature is its very heavy use of maps instead of SQL and tables. Hadoop assumes the data is so large that it cannot be moved at all, so analysis must be carried out locally. But once the maps in the open source Hadoop code are changed to pwwHashMap, performance will increase a hundredfold without any problems.

Background to this article that may be useful, such as an introduction to the basic ideas presented: http://blog.csdn.net/chixinmuzi/article/details/1727195
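
For readers unfamiliar with the term, here is a minimal sketch of what a perfect hash means in general: for a fixed set of keys, choose a hash function under which no two keys land in the same slot, so that every lookup is a single probe. This is not the pwwHashMap algorithm, whose structure is not described in this post. The class name StaticPerfectMap, the brute-force seed search, and the choice of hash (seeded FNV-1a with a splitmix64-style finalizer) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Illustrative only: a brute-force perfect hash for a small, fixed key set.
// This is NOT the pwwHashMap algorithm; all names here are hypothetical.
class StaticPerfectMap {
public:
    // Keys must be distinct; duplicate keys can never be placed without a clash.
    explicit StaticPerfectMap(const std::vector<std::pair<std::string, int> >& items)
        : seed_(0) {
        if (items.empty()) return;
        // Twice as many slots as keys, so a clash-free seed is reasonably likely.
        const std::size_t size = items.size() * 2;
        for (std::uint64_t seed = 1; seed <= kMaxSeeds; ++seed) {
            std::vector<Slot> trial(size);
            bool clash = false;
            for (std::size_t i = 0; i < items.size() && !clash; ++i) {
                Slot& slot = trial[hash(items[i].first, seed) % size];
                if (slot.used) {
                    clash = true;  // two keys share a slot: reject this seed
                } else {
                    slot.used = true;
                    slot.key = items[i].first;
                    slot.value = items[i].second;
                }
            }
            if (!clash) {
                seed_ = seed;
                slots_.swap(trial);
                return;
            }
        }
        throw std::runtime_error("no collision-free seed found");
    }

    // Exactly one probe: returns a pointer to the value, or nullptr if absent.
    const int* find(const std::string& key) const {
        if (slots_.empty()) return nullptr;
        const Slot& slot = slots_[hash(key, seed_) % slots_.size()];
        return (slot.used && slot.key == key) ? &slot.value : nullptr;
    }

private:
    struct Slot {
        bool used;
        std::string key;
        int value;
        Slot() : used(false), value(0) {}
    };

    static const std::uint64_t kMaxSeeds = 100000;

    // Seeded FNV-1a, then a splitmix64-style finalizer to mix all bits.
    static std::uint64_t hash(const std::string& key, std::uint64_t seed) {
        std::uint64_t h = 14695981039346656037ULL ^ seed;
        for (std::size_t i = 0; i < key.size(); ++i) {
            h ^= static_cast<unsigned char>(key[i]);
            h *= 1099511628211ULL;
        }
        h = (h ^ (h >> 30)) * 0xbf58476d1ce4e5b9ULL;
        h = (h ^ (h >> 27)) * 0x94d049bb133111ebULL;
        return h ^ (h >> 31);
    }

    std::uint64_t seed_;
    std::vector<Slot> slots_;
};
```

Built from a small IP-to-port table, find returns the stored value for a known address and nullptr for an unknown one. The stored key is still compared on lookup, so a key outside the original set is never mistaken for a member. Trying seeds one by one only scales to a handful of keys, because the chance that a random seed is collision-free falls off quickly as the key set grows; practical perfect-hash constructions split the keys into buckets first.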