Newsgroup: comp.lang.c

C vs Haskell for XML parsing

Started by: Malcolm McLean <malcolm.arthur.mclean@gmail.com>
First post: 2023-08-16 00:31 -0700
Last post: 2023-08-17 03:42 -0700
Articles: 20 on this page of 287 (19 participants)

  C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-16 00:31 -0700
    Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-16 11:14 +0100
      Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-17 00:23 +0100
        Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-16 21:38 -0700
          Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-17 12:19 +0100
            Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-17 07:53 -0700
              Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-19 00:15 +0100
                Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-18 16:33 -0700
                  Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-19 21:46 +0100
                Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-19 03:04 -0700
                  Re: C vs Haskell for XML parsing Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2023-08-19 13:19 +0000
                  Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-19 14:48 +0000
                    Re: C vs Haskell for XML parsing Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2023-08-19 15:09 +0000
                      Re: C vs Haskell for XML parsing Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2023-08-19 15:17 +0000
                      Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-19 21:05 +0000
                    Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-19 21:05 +0100
                      Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-19 21:07 +0000
                  Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-19 22:31 +0100
                    Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-19 22:04 -0700
                      Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-20 07:41 -0400
                      Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-20 17:00 +0100
                        Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-20 11:20 -0700
                          Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-20 14:45 -0700
                          Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-21 00:05 +0100
                            Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-20 19:45 -0700
                              Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-21 14:51 +0100
                            Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-21 11:28 +0200
                              Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-21 02:59 -0700
                                Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-21 15:17 +0200
                                  Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-21 23:03 -0700
                                    Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-22 14:09 +0200
                                      Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-22 05:38 -0700
                                        Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-22 15:31 +0200
                                          Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-22 06:51 -0700
                                            Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-22 19:19 +0200
                                              Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-22 21:59 -0700
                                                Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-23 09:57 +0200
                                                  Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-23 07:48 -0700
                                                    Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-23 16:05 +0100
                                                      Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-23 08:21 -0700
                                                        Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-23 19:30 +0200
                                                    Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-23 18:50 +0200
                                                      Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-23 10:49 -0700
                                                        Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-23 18:08 +0000
                                                        Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-23 21:28 +0200
                                                          Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-23 20:53 -0700
                                                            Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-24 15:15 +0200
                                                              Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-24 07:50 -0700
                                                                Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-24 16:48 +0100
                                                                  Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-24 17:35 +0000
                                                                    Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-24 18:09 +0000
                                                                  Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-25 09:59 +0200
                                                                Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-25 09:46 +0200
                                                                  Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-25 01:37 -0700
                                                                    Re: C vs Haskell for XML parsing Spiros Bousbouras <spibou@gmail.com> - 2023-08-25 08:50 +0000
                                                                      Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-25 01:53 -0700
                                                                        Underscores in type names (was : C vs Haskell for XML parsing) Spiros Bousbouras <spibou@gmail.com> - 2023-08-25 09:17 +0000
                                                                          Re: Underscores in type names (was : C vs Haskell for XML parsing) Richard Harnden <richard.nospam@gmail.com> - 2023-08-25 11:35 +0100
                                                                        Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-25 13:42 +0200
                                                                        Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-25 13:59 +0000
                                                                          Re: C vs Haskell for XML parsing candycane@f172.n1.z21.fsxnet (candycane) - 2023-08-26 00:45 +1300
                                                                        Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-25 19:50 +0100
                                                                          Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-26 02:55 -0700
                                                                            Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-26 19:21 +0200
                                                                              Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-27 03:05 -0700
                                                                                Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 18:28 +0200
                                                                                  Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-28 14:01 +0000
                                                                                    Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-28 16:07 -0700
                                                                                      Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-29 09:16 +0200
                                                                                        Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-29 19:22 +0000
                                                                                          Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-29 19:38 +0000
                                                                                            Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-29 20:11 +0000
                                                                                          Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-29 21:59 +0200
                                                                                            Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-30 00:43 -0700
                                                                                              Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-30 12:30 +0200
                                                                                                Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-30 05:04 -0700
                                                                                                  Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-30 17:50 +0200
                                                                                                    Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-30 19:41 +0000
                                                                                                      Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-31 11:18 +0200
                                                                                          Re: C vs Haskell for XML parsing Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2023-08-30 14:40 +0000
                                                                                            Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-30 15:03 +0000
                                                                                            Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-30 12:00 -0700
                                                                                            Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-30 20:50 -0700
                                                                                              Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-31 08:12 +0000
                                                                                                Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-01 11:51 -0700
                                                                            Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-27 00:55 +0100
                                                                              Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-28 16:17 -0700
                                                                    Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 04:31 -0700
                                                                      Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-25 14:06 +0000
                                                                        Re: C vs Haskell for XML parsing Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2023-08-25 15:35 +0000
                                                                          Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 11:45 -0700
                                                                            Re: C vs Haskell for XML parsing Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2023-08-25 20:06 +0000
                                                                      Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-25 19:35 -0700
                                                                        Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 19:55 -0700
                                                                          Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-25 20:26 -0700
                                                                            Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-26 19:24 +0200
                                                                      Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-26 02:52 -0700
                                                                        Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-26 14:10 +0000
                                                                          Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-26 22:54 -0700
                                                                            Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 18:39 +0200
                                                                            Re: C vs Haskell for XML parsing James Kuyper <jameskuyper@alumni.caltech.edu> - 2023-08-27 15:56 -0400
                                                                              Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 00:42 -0700
                                                                                Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 10:39 +0200
                                                                                  Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 02:03 -0700
                                                                                    Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 13:29 +0200
                                                                                      Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-28 16:35 +0000
                                                                                        Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 10:11 -0700
                                                                                        Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 19:40 +0200
                                                                                          Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 12:31 -0700
                                                                                            Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 22:39 +0200
                                                                                              Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 14:22 -0700
                                                                                                Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-28 16:02 -0700
                                                                                Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-28 16:21 +0000
                                                                                  Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 10:05 -0700
                                                                                Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-28 14:50 -0700
                                                                                Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-28 14:50 -0700
                                                                              Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-28 14:13 +0000
                                                                        Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-26 19:31 +0200
                                                                          Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-26 23:08 -0700
                                                                            Re: C vs Haskell for XML parsing "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2023-08-26 23:23 -0700
                                                                            Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 18:41 +0200
                                                                    Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-25 13:38 +0200
                                                                      Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 11:59 -0700
                                                                        Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-25 19:34 -0400
                                                                          Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 17:12 -0700
                                                                            Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-26 01:44 +0100
                                                                            Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-25 22:18 -0400
                                                                              Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 19:58 -0700
                                                                                Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-25 23:07 -0400
                                                                                  Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 21:17 -0700
                                                                                    Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 10:12 -0400
                                                                                      Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-26 15:13 -0700
                                                                                        Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 19:47 -0400
                                                                                          Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-26 19:09 -0700
                                                                                            Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 22:27 -0400
                                                                            Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 18:55 +0200
                                                                          Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 02:16 +0100
                                                                            Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 18:39 -0700
                                                                            Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-25 22:26 -0400
                                                                              Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 11:07 +0100
                                                                                Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 10:33 -0400
                                                                                  Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 16:27 +0100
                                                                                    Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 11:57 -0400
                                                                                      Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 17:11 +0100
                                                                                        Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 12:35 -0400
                                                                                          Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 18:24 +0100
                                                                                            Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 13:35 -0400
                                                                                              Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 20:11 +0100
                                                                                                Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 17:07 -0400
                                                                                                  Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 22:40 +0100
                                                                                                  Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-26 23:32 -0700
                                                                                                    Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-27 03:02 -0700
                                                                                                    Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-27 13:25 +0100
                                                                                                Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-26 14:37 -0700
                                                                                    Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-26 19:49 +0000
                                                                                      Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 22:00 +0100
                                                                                        Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 17:31 -0400
                                                                                        Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-26 15:28 -0700
                                                                                        Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-27 04:24 +0000
                                                                                          Re: C vs Haskell for XML parsing "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2023-08-26 21:59 -0700
                                                                                          Re: C vs Haskell for XML parsing candycane@f172.n1.z21.fsxnet (candycane) - 2023-08-27 02:42 +1300
                                                                                          Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-27 11:23 +0100
                                                                                            Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-27 22:45 +0000
                                                                                              Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-27 19:06 -0400
                                                                                                Re: C vs Haskell for XML parsing James Kuyper <jameskuyper@alumni.caltech.edu> - 2023-08-28 02:18 -0400
                                                                                              Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-27 16:21 -0700
                                                                                                Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-28 00:00 +0000
                                                                                                  Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-27 19:36 -0700
                                                                                                    Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-28 03:00 +0000
                                                                                                Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-28 06:58 -0700
                                                                                                  Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-28 15:22 -0700
                                                                                                    Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-28 16:49 -0700
                                                                                                      Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-28 17:11 -0700
                                                                                                        Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-29 16:06 +0200
                                                                                                        Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-29 08:27 -0700
                                                                                                      Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-29 01:36 +0100
                                                                                                        Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-29 01:22 +0000
                                                                                                          Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-29 10:40 +0100
                                                                                                            Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-29 02:53 -0700
                                                                                                            Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-29 03:00 -0700
                                                                                                            Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-29 16:18 +0200
                                                                                                              Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-29 13:06 -0700
                                                                                                                Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-29 22:14 -0700
                                                                                                                  Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-30 01:32 -0700
                                                                                                                    Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-30 21:09 -0700
                                                                                                                Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-30 12:44 +0200
                                                                                                                  Re: C vs Haskell for XML parsing James Kuyper <jameskuyper@alumni.caltech.edu> - 2023-08-30 12:32 -0400
                                                                                                                    Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-30 11:44 -0700
                                                                                                                      Re: C vs Haskell for XML parsing James Kuyper <jameskuyper@alumni.caltech.edu> - 2023-09-09 01:15 -0400
                                                                                                                    Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-31 04:47 -0700
                                                                                                                  Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-30 11:42 -0700
                                                                                                                    Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-30 23:36 -0700
                                                                                                                      Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-31 08:15 +0000
                                                                                                                        Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-01 11:48 -0700
                                                                                                                Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-09-03 03:55 -0700
                                                                                                                  Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-03 11:44 -0700
                                                                                                                    Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-09-03 16:20 -0700
                                                                                                                      Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-03 16:47 -0700
                                                                                                                        Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-09-03 17:24 -0700
                                                                                                                          Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-10-03 03:16 -0700
                                                                                                                        Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-09-03 17:26 -0700
                                                                                                                          Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-10-03 03:19 -0700
                                                                                                            Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-29 19:43 +0000
                                                                                                              Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-29 13:23 -0700
                                                                                                                Re: C vs Haskell for XML parsing Bobby Moore <bobbymoore018@gmail.com> - 2023-08-29 13:54 -0700
                                                                                                        Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-29 11:41 +0200
                                                                                                        Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-29 08:29 -0700
                                                                                                          Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-29 16:54 +0100
                                                                                                      Re: Named function arguments (Was : C vs Haskell for XML parsing) Spiros Bousbouras <spibou@gmail.com> - 2023-08-30 19:30 +0000
                                                                                                        Re: Named function arguments (Was : C vs Haskell for XML parsing) Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-30 19:53 +0000
                                                                                                          Re: Named function arguments (Was : C vs Haskell for XML parsing) Spiros Bousbouras <spibou@gmail.com> - 2023-08-30 20:07 +0000
                                                                                                            Re: Named function arguments (Was : C vs Haskell for XML parsing) Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-30 20:42 +0000
                                                                                                            Re: Named function arguments (Was : C vs Haskell for XML parsing) Richard Harnden <richard.nospam@gmail.com> - 2023-08-30 23:15 +0100
                                                                                                              Re: Named function arguments (Was : C vs Haskell for XML parsing) Spiros Bousbouras <spibou@gmail.com> - 2023-08-31 18:41 +0000
                                                                                                            Re: Named function arguments (Was : C vs Haskell for XML parsing) David Brown <david.brown@hesbynett.no> - 2023-08-31 12:43 +0200
                                                                                                        Re: Named function arguments (Was : C vs Haskell for XML parsing) Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-30 20:40 -0700
                                                                                                Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-28 14:15 +0000
                                                                                                  Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-28 15:53 +0100
                                                                                                    Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 18:41 +0200
                                                                                                      Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-28 18:01 +0100
                                                                                                        Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 20:01 +0200
                                                                                                          Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-28 20:14 +0100
                                                                                                            Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-28 19:27 +0000
                                                                                                              Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-28 16:09 -0700
                                                                                                            Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 21:53 +0200
                                                                                                            Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-28 20:37 +0000
                                                                                                              Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-28 23:39 +0100
                                                                                                                Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-29 00:23 +0000
                                                                                                                  Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-29 01:01 -0700
                                                                                                                    Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-29 19:28 +0000
                                                                                                                  Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-29 11:08 +0100
                                                                                              Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-28 01:31 +0100
                                                                            Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-25 20:18 -0700
                                                                        Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 18:50 +0200
                                                                          Re: C vs Haskell for XML parsing Richard Harnden <richard.nospam@gmail.com> - 2023-08-27 19:18 +0100
                                                                            Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 21:19 +0200
                                                                            Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-27 20:33 +0100
                                                                            Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-27 14:14 -0700
                                                                          Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-27 13:56 -0700
                                                                            Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 11:00 +0200
                                                                              Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-28 15:12 -0700
                                                                                Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-29 16:32 +0200
                                                                                  Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-29 13:12 -0700
                                                                                    Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-30 12:50 +0200
                                                                      Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-25 23:38 -0700
                                                                        Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-26 14:09 +0000
                                                                        Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-27 00:44 +0100
                                                                          Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-27 00:18 -0700
                                                                            Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-27 17:56 +0100
                                                                          Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 19:20 +0200
                                                                            Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-27 11:18 -0700
                                                                              Re: C vs Haskell for XML parsing kalevi@kolttonen.fi (Kalevi Kolttonen) - 2023-08-27 18:34 +0000
                                                                                Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 00:32 -0700
                                                                                  Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 11:14 +0200
                                                                                    Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-28 14:10 +0000
                                                                                    Re: C vs Haskell for XML parsing kalevi@kolttonen.fi (Kalevi Kolttonen) - 2023-08-29 10:47 +0000
                                                                                      Re: C vs Haskell for XML parsing Michael S <already5chosen@yahoo.com> - 2023-08-29 04:53 -0700
                                                                                        Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-29 06:35 -0700
                                                                                Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-28 16:12 -0700
                                                                              Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 08:24 +0200
                                                                                Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-28 22:17 +0100
                                                                                  Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 14:35 -0700
                                                                              Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-28 14:38 -0700
                                                                            Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-28 01:00 +0100
                                                                              Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 11:24 +0200
                                                                                Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 03:29 -0700
                                                                                  Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 14:01 +0200
                                                                                    Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 08:40 -0700
                                                                        Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 19:11 +0200
                                                                  Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-25 14:49 +0100
                                                                    Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-25 19:59 +0200
                                                                      Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-25 18:31 +0000
                                                                    Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-25 20:03 -0700
                                                    Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-23 14:54 -0700
                                        Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-22 14:57 +0000
                                      Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-22 14:10 +0100
                              Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-21 13:46 +0100
      Re: C vs Haskell for XML parsing fir <profesor.fir@gmail.com> - 2023-08-16 17:32 -0700
        Re: C vs Haskell for XML parsing fir <profesor.fir@gmail.com> - 2023-08-16 17:47 -0700
      Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-17 00:37 +0000
        Re: C vs Haskell for XML parsing fir <profesor.fir@gmail.com> - 2023-08-16 17:40 -0700
        Re: C vs Haskell for XML parsing Michael S <already5chosen@yahoo.com> - 2023-08-17 02:37 -0700
          Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-17 13:50 +0000
    Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-17 00:07 +0100
    Re: C vs Haskell for XML parsing fir <profesor.fir@gmail.com> - 2023-08-16 17:25 -0700
    Re: C vs Haskell for XML parsing fir <profesor.fir@gmail.com> - 2023-08-17 03:32 -0700
      Re: C vs Haskell for XML parsing fir <profesor.fir@gmail.com> - 2023-08-17 03:42 -0700

#172354 — C vs Haskell for XML parsing

From	Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Date	2023-08-16 00:31 -0700
Subject	C vs Haskell for XML parsing
Message-ID	<576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>

Some people here are interested in Haskell.
They might be interested in this:

https://chrisdone.com/posts/fast-haskell-c-parsing-xml/

Of course it's written from a pro-Haskell point of view, and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.

#172359

From	Bart <bc@freeuk.com>
Date	2023-08-16 11:14 +0100
Message-ID	<ubi7hd$38q7d$1@dont-email.me>
In reply to	#172354

On 16/08/2023 08:31, Malcolm McLean wrote:
> Some people here are interested in Haskell.
> They might be interested in this:
> 
> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
> 
> Of course it's written from a pro-Haskell point of view,  and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.
>   

"Portability (i.e. Windows) is a pain in the arse with C."

I wonder what makes them say that?

Reading from a file must be the world's most portable kind of program. 
While issues with filenames and paths will be the same whatever the 
language.

So what is it?

#172423

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2023-08-17 00:23 +0100
Message-ID	<87o7j6fu74.fsf@bsb.me.uk>
In reply to	#172359

Bart <bc@freeuk.com> writes:

> On 16/08/2023 08:31, Malcolm McLean wrote:
>> Some people here are interested in Haskell.
>> They might be interested in this:
>> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
>> Of course it's written from a pro-Haskell point of view,  and writing an
>> improved version when you've got the C in front of you isn't really a
>> fair test. But he does match C for speed.
>>   
>
> "Portability (i.e. Windows) is a pain in the arse with C."
>
> I wonder what makes them say that?

Yes, I wondered that too, since the cut-down XML parsing they are doing
is one of the most potentially portable bits of C one could write (as
you say yourself):

> Reading from a file must be the world's most portable kind of
> program.

But reading more closely, the remark is a general one about dropping out
of a high-level language for some part of a program rather than being
specific to this task.  None the less, I'd have liked a citation or
link.

> While issues with filenames and paths will be the same whatever
> the language.

Not always.  Some languages have standard library functions to handle
such things (e.g. Python and Haskell).  I imagine that's what the author
was thinking about.

-- 
Ben.

#172434

From	Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Date	2023-08-16 21:38 -0700
Message-ID	<37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com>
In reply to	#172423

On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote:
> Bart <b...@freeuk.com> writes: 
> 
> > On 16/08/2023 08:31, Malcolm McLean wrote: 
> >> Some people here are interested in Haskell. 
> >> They might be interested in this: 
> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/ 
> >> Of course it's written from a pro-Haskell point of view, and writing an 
> >> improved version when you've got the C in front of you isn't really a 
> >> fair test. But he does match C for speed. 
> >> 
> > 
> > "Portability (i.e. Windows) is a pain in the arse with C." 
> > 
> > I wonder what makes them say that?
> Yes, I wondered that too, since the cut-down XML parsing they are doing 
> is one of the most potentially portable bits of C one could write (as 
> you say yourself):
>
There are some gotchas with files, but not for the cut down parsing
they implement.
Windows used to accept "rt" for reading a text stream. And there's still
a mess with Unicode. And the XML people say that a parser must accept 
UTF-16.
I implement this by having the lexer call a function pointer to read a
UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8
on the fly. If it's UTF-8, it's just an alias for fgetc().
But how do you know the file format? I have code that does this, but if 
I called it, the XML parser would no longer be a single file module. So
I read the first few characters of the file, then seek back to the start
position.
But this only works on seekable streams. So the high-level parse function
which accepts a stream rather than a file name either has to insist on
a seekable stream, or it has to insist that the stream be in known format,
or the stream access function has to maintain a little buffer. The last
solution is the real one, but it's such a fiddly thing that instead I decided on the
known format (if you call with a FILE * rather than a filename, the data must
be in UTF-8).
But it is a complete pain which would have been avoided with a high-level
language which just loads a text file.
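
A minimal sketch of the reader-behind-a-function-pointer arrangement described above, assuming little-endian UTF-16 and no BOM handling. The names getbyte_fn, Utf16Reader, utf8_getbyte and utf16le_getbyte are hypothetical and are not taken from the parser under discussion; malformed input is simply reported as EOF here.

```c
#include <assert.h>
#include <stdio.h>

/* The lexer only ever asks for "the next UTF-8 byte, or EOF". */
typedef int (*getbyte_fn)(void *ctx);

/* UTF-8 input: a thin wrapper around fgetc(). */
static int utf8_getbyte(void *ctx)
{
    return fgetc((FILE *)ctx);
}

/* UTF-16 input: holds the UTF-8 bytes of the character being delivered. */
typedef struct {
    FILE *fp;
    unsigned char pending[4];
    int npending;   /* number of valid bytes in pending[] */
    int pos;        /* index of the next byte to hand out */
} Utf16Reader;

/* Read one little-endian 16-bit code unit, or EOF. */
static long read_unit_le(FILE *fp)
{
    int lo = fgetc(fp);
    int hi = (lo == EOF) ? EOF : fgetc(fp);
    if (hi == EOF)
        return EOF;
    return (long)lo | ((long)hi << 8);
}

/* Encode a code point (at most 0x10FFFF) as UTF-8; returns the byte count. */
static int encode_utf8(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80UL) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Same signature as utf8_getbyte(), but converts UTF-16LE on the fly. */
static int utf16le_getbyte(void *ctx)
{
    Utf16Reader *r = ctx;
    long u, lo;
    unsigned long cp;

    assert(r != NULL && r->fp != NULL);
    if (r->pos < r->npending)
        return r->pending[r->pos++];    /* rest of the previous character */

    u = read_unit_le(r->fp);
    if (u == EOF)
        return EOF;
    if (u >= 0xDC00 && u <= 0xDFFF)
        return EOF;                     /* stray low surrogate: give up */
    cp = (unsigned long)u;
    if (u >= 0xD800 && u <= 0xDBFF) {   /* high surrogate: need its partner */
        lo = read_unit_le(r->fp);
        if (lo < 0xDC00 || lo > 0xDFFF)
            return EOF;
        cp = 0x10000UL + (((unsigned long)u - 0xD800UL) << 10)
                       + ((unsigned long)lo - 0xDC00UL);
    }
    r->npending = encode_utf8(cp, r->pending);
    r->pos = 1;
    return r->pending[0];
}
```

The lexer would store a getbyte_fn together with its context pointer (a FILE * for UTF-8, a Utf16Reader * for UTF-16) and would not need to know which encoding the file used.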

#172445

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2023-08-17 12:19 +0100
Message-ID	<877cptgbli.fsf@bsb.me.uk>
In reply to	#172434

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote:
>> Bart <b...@freeuk.com> writes: 
>> 
>> > On 16/08/2023 08:31, Malcolm McLean wrote: 
>> >> Some people here are interested in Haskell. 
>> >> They might be interested in this: 
>> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/ 
>> >> Of course it's written from a pro-Haskell point of view, and writing an 
>> >> improved version when you've got the C in front of you isn't really a 
>> >> fair test. But he does match C for speed. 
>> >> 
>> > 
>> > "Portability (i.e. Windows) is a pain in the arse with C." 
>> > 
>> > I wonder what makes them say that?
>> Yes, I wondered that too, since the cut-down XML parsing they are doing 
>> is one of the most potentially portable bits of C one could write (as 
>> you say yourself):
>>
> There are some gotchas with files, but not for the cut down parsing
> they implement.
> Windows used to accept "rt" for reading a text stream. And there's
> still a mess with Unicode.

None of that matters for the case in point.  The C code treats the input
like a stream of 8-bit bytes.  You can do that without regard to line
convention.

> And the XML people say that a parser must
> accept UTF-16.

Again, that's not relevant to the case in the article.  But it's also a
completely different issue.  An XML parser that must handle either UTF-8
or UTF-16 needs a layer below the parser (conceptually) to detect the
encoding and return "characters" (as I think you have done).  There is
no reason to suppose that that can't be written in portable C.

> I implement this by having the lexer call a function pointer to read a
> UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8
> on the fly.

Exactly -- though I think I would not have converted to UTF-8 in a plain
parser.  Maybe your application makes that a good choice.

> If it's UTF-8, it's just an alias for fgetc().
> But how do you know the file format?

The first character must be '<' (and, technically, it must be the '<' that
opens an XML declaration).  The encoding should be clear from the first
two bytes.

> I have code that does this, but if 
> I called it, the XML parser would no longer be a single file module. So
> I read the first few character of the file, the seek back to the start
> position.
> But this only works on seekable streams.

Actually, you don't need to read more than one character to determine if
the file is UTF-8 or UTF-16; all you need to do is an ungetc call, and
that works on non-seekable streams.
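
A minimal sketch of the one-byte peek with ungetc, which standard C guarantees for a single pushed-back character even on a pipe. The names Guess and guess_from_first_byte are hypothetical. It assumes that a stream not starting with a BOM byte or a zero byte is UTF-8, which does not cover every case: UTF-16 little-endian without a BOM also starts with the byte '<'.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { GUESS_UNKNOWN, GUESS_UTF8, GUESS_UTF16 } Guess;

/* Look at the first byte of the stream without consuming it. */
static Guess guess_from_first_byte(FILE *fp)
{
    int c;

    assert(fp != NULL);
    c = fgetc(fp);
    if (c == EOF)
        return GUESS_UNKNOWN;           /* empty stream or read error */
    if (ungetc(c, fp) == EOF)           /* one byte of pushback is guaranteed */
        return GUESS_UNKNOWN;

    /* 0xFE or 0xFF starts a UTF-16 BOM; 0x00 is the high byte of a
       big-endian '<' when there is no BOM. */
    if (c == 0xFE || c == 0xFF || c == 0x00)
        return GUESS_UTF16;

    return GUESS_UTF8;                  /* assumption: everything else is UTF-8 */
}
```

After the call the stream is positioned exactly where it started, so the parser can be handed the same FILE * with nothing lost.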

> So the high-level parse function which accepts a stream rather than a
> file name either has to insist on a seekable stream, or it has to
> insist that the stream be in known format, or the stream access
> function has to maintain a little buffer. The last solution is the
> real one, but it's such a fiddly thing

Given that you convert UTF-16 to UTF-8, I'd have thought it was the
natural choice, even though you can get away with just an ungetc
call.  But then I don't know how your code is organised.  What's fiddly
about it?

> that instead I decided on the
> known format (if you call with a FILE * rather than a filename, the
> data must be in UTF-8).  But it is a complete pain which would have
> been avoided with a high-level language which just loads a text file.

Agreed.  Though I don't think the world is that good at agreeing things.
It's possible that this is not what the author had in mind with the
high/low-level portable code remark, but it's not a clear-cut case.  When
the world has decided that a text file can just be opened and read, such
a facility could be provided by standard C, and even if not standard, it
could probably be written in portable C.

-- 
Ben.

#172461

From	Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Date	2023-08-17 07:53 -0700
Message-ID	<250cc72c-f682-4986-96bd-80011967c8dbn@googlegroups.com>
In reply to	#172445

On Thursday, 17 August 2023 at 12:19:55 UTC+1, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.ar...@gmail.com> writes: 
> 
> > On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote: 
> >> Bart <b...@freeuk.com> writes: 
> >> 
> >> > On 16/08/2023 08:31, Malcolm McLean wrote: 
> >> >> Some people here are interested in Haskell. 
> >> >> They might be interested in this: 
> >> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/ 
> >> >> Of course it's written from a pro-Haskell point of view, and writing an 
> >> >> improved version when you've got the C in front of you isn't really a 
> >> >> fair test. But he does match C for speed. 
> >> >> 
> >> > 
> >> > "Portability (i.e. Windows) is a pain in the arse with C." 
> >> > 
> >> > I wonder what makes them say that? 
> >> Yes, I wondered that too, since the cut-down XML parsing they are doing 
> >> is one of the most potentially portable bits of C one could write (as 
> >> you say yourself): 
> >> 
> > There are some gotchas with files, but not for the cut down parsing 
> > they implement. 
> > Windows used to accept "rt" for reading a text stream. And there's 
> > still a mess with Unicode.
> None of that matters for the case in point. The C code treats the input 
> like a stream of 8-bit bytes. You can do that without regard to line 
> convention.
> > And the XML people say that a parser must 
> > accept UTF-16.
> Again, that's not relevant to the case in the article. But it's also a 
> completely different issue. An XML parser that must handle either UTF-8 
> or UTF-16 needs a layer below the parser (conceptually) to detect the 
> encoding and return "characters" (as I think you have done). There is 
> no reason to suppose that that can't be written in portable C.
> > I implement this by having the lexer call a function pointer to read a 
> > UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8 
> > on the fly.
> Exactly -- though I think I would not have converted to UTF-8 in a plain 
> parser. Maybe your application make that a good choice.
> > If it's UTF-8, it's just an alias for fgetc(). 
> > But how do you know the file format?
> The first character much be '<' (and, technically, it must be the '<' that 
> opens an XML declaration). The encoding should be clear from the first 
> two bytes.
> > I have code that does this, but if 
> > I called it, the XML parser would no longer be a single file module. So 
> > I read the first few character of the file, the seek back to the start 
> > position.
> > But this only works on seekable streams. 
> 
> Actually, you don't need to read more that one character to determine if 
> the file is UTF-8 or UTF-16, all you need to do is an ungetc call and 
> that works on non-seekable streams.
> 
You need two characters, because you might have a UTF-16 little-endian
stream without a BOM. So the first 8-bit byte would still be '<'.
But there's a simple hack, which is to read the first character from the
stream, then set up the lexer with a '<' sitting in its token. So of course
you also have to read the first character when passed a string, which
is a bit of a nuisance (and that's the sort of thing that gives programming 
such a bad reputation). But it should work now when piped a non-seekable 
UTF-16 stream.
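
A minimal sketch of that hack, assuming the document starts with '<' (optionally preceded by a UTF-16 BOM). The names Encoding, Sniff and sniff_encoding are hypothetical. It reads two bytes, pushes back at most one with ungetc, and reports whether the opening '<' has already been consumed so that the lexer can be primed with it. A UTF-8 BOM is not handled here and comes back as unknown.

```c
#include <assert.h>
#include <stdio.h>

#define LT 0x3C   /* '<' in ASCII, UTF-8 and (as one byte of the pair) UTF-16 */

typedef enum { ENC_UNKNOWN, ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE } Encoding;

typedef struct {
    Encoding enc;
    int consumed_lt;   /* nonzero if the document's first '<' was read */
} Sniff;

/* Decide the encoding from the first two bytes, never seeking. */
static Sniff sniff_encoding(FILE *fp)
{
    Sniff s = { ENC_UNKNOWN, 0 };
    int b0, b1;

    assert(fp != NULL);
    b0 = fgetc(fp);
    b1 = (b0 == EOF) ? EOF : fgetc(fp);
    if (b1 == EOF)
        return s;                        /* fewer than two bytes available */

    if (b0 == 0xFF && b1 == 0xFE) {
        s.enc = ENC_UTF16LE;             /* BOM eaten, '<' still unread */
    } else if (b0 == 0xFE && b1 == 0xFF) {
        s.enc = ENC_UTF16BE;             /* BOM eaten, '<' still unread */
    } else if (b0 == LT && b1 == 0x00) {
        s.enc = ENC_UTF16LE;             /* no BOM: the pair was '<' itself */
        s.consumed_lt = 1;
    } else if (b0 == 0x00 && b1 == LT) {
        s.enc = ENC_UTF16BE;             /* no BOM: the pair was '<' itself */
        s.consumed_lt = 1;
    } else if (b0 == LT) {
        ungetc(b1, fp);                  /* only one byte needs pushing back */
        s.enc = ENC_UTF8;
        s.consumed_lt = 1;
    }
    return s;
}
```

When consumed_lt is set, the caller would start the lexer with '<' already sitting in its current token, which is the nuisance mentioned above: the string-input path then has to do the same so both paths hand the lexer an identical state.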

#172511

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2023-08-19 00:15 +0100
Message-ID	<87o7j4vt6r.fsf@bsb.me.uk>
In reply to	#172461

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On Thursday, 17 August 2023 at 12:19:55 UTC+1, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.ar...@gmail.com> writes: 
>> 
>> > On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote: 
>> >> Bart <b...@freeuk.com> writes: 
>> >> 
>> >> > On 16/08/2023 08:31, Malcolm McLean wrote: 
>> >> >> Some people here are interested in Haskell. 
>> >> >> They might be interested in this: 
>> >> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/ 
>> >> >> Of course it's written from a pro-Haskell point of view, and writing an 
>> >> >> improved version when you've got the C in front of you isn't really a 
>> >> >> fair test. But he does match C for speed. 
>> >> >> 
>> >> > 
>> >> > "Portability (i.e. Windows) is a pain in the arse with C." 
>> >> > 
>> >> > I wonder what makes them say that? 
>> >> Yes, I wondered that too, since the cut-down XML parsing they are doing 
>> >> is one of the most potentially portable bits of C one could write (as 
>> >> you say yourself): 
>> >> 
>> > There are some gotchas with files, but not for the cut down parsing 
>> > they implement. 
>> > Windows used to accept "rt" for reading a text stream. And there's 
>> > still a mess with Unicode.
>> None of that matters for the case in point. The C code treats the input 
>> like a stream of 8-bit bytes. You can do that without regard to line 
>> convention.
>> > And the XML people say that a parser must 
>> > accept UTF-16.
>> Again, that's not relevant to the case in the article. But it's also a 
>> completely different issue. An XML parser that must handle either UTF-8 
>> or UTF-16 needs a layer below the parser (conceptually) to detect the 
>> encoding and return "characters" (as I think you have done). There is 
>> no reason to suppose that that can't be written in portable C.
>> > I implement this by having the lexer call a function pointer to read a 
>> > UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8 
>> > on the fly.
>> Exactly -- though I think I would not have converted to UTF-8 in a plain 
>> parser. Maybe your application make that a good choice.
>> > If it's UTF-8, it's just an alias for fgetc(). 
>> > But how do you know the file format?
>> The first character much be '<' (and, technically, it must be the '<' that 
>> opens an XML declaration). The encoding should be clear from the first 
>> two bytes.

I ended up looking at the spec (that's an hour I'll never get back!) and
it's more complicated...

>> > I have code that does this, but if 
>> > I called it, the XML parser would no longer be a single file module. So 
>> > I read the first few character of the file, the seek back to the start 
>> > position.
>> > But this only works on seekable streams. 
>> 
>> Actually, you don't need to read more that one character to determine if 
>> the file is UTF-8 or UTF-16, all you need to do is an ungetc call and 
>> that works on non-seekable streams.
>> 
> You need two characters, because you might have a UTF-16 little-endian
> stream without a BOM. So the first character in 8 bit bytes would be
> '<'.

Yes, I wasn't thinking.  Thanks.  You can't always tell until the second
byte, but you don't have to "unget" anything in that case because you
now know the character.

But as it happens I spoke way too soon...  The full picture is a mess.

> But there's a simple hack, which is to read the first character from the
> stream, then set up the lexer with a "<' sitting in its token. So of course
> you also have to read the first character when passed a string, which
> is a bit of a nuisance (and that's the sort of thing that gives programming 
> such a bad reputation).

What do you mean "when passed a string"?  Do you mean when the parser is
acting on in-memory data?

> But it should work now when piped a non-seekable 
> UTF-16 stream.

It turns out that if you want to be 100% conforming you need to be able
to detect both UCS-4 and (eye roll) EBCDIC.  What's more, you need to
set up just enough of the reading mechanism to be able to read the XML
declaration and then adjust the reading mechanism to handle the named
encoding.  For your application, ISO-8859-1 might be effectively the
same as ISO-8859-15, but UCS-4 is a complication and you might want to
flag certain errors if the encoding is named as ISO-10646-UCS-2 rather
than UTF-16.
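
For illustration, the first detection step can be sketched in C as a signature table over the first four bytes, following the byte patterns listed in the autodetection appendix of the XML 1.0 spec. The names `Family` and `guess_family` are hypothetical, the unusual UCS-4 byte orders are left out, and the result only says how to start reading the declaration; the named encoding still has to be applied afterwards.

```c
#include <stddef.h>

typedef enum {
    FAM_UNKNOWN, FAM_UCS4_BE, FAM_UCS4_LE,
    FAM_UTF16_BE, FAM_UTF16_LE, FAM_UTF8, FAM_EBCDIC
} Family;

static const struct {
    unsigned char sig[4];
    size_t len;
    Family fam;
} table[] = {
    /* With a byte order mark. UCS-4 comes first because
       FF FE 00 00 also begins with the UTF-16LE mark. */
    { { 0x00, 0x00, 0xFE, 0xFF }, 4, FAM_UCS4_BE },
    { { 0xFF, 0xFE, 0x00, 0x00 }, 4, FAM_UCS4_LE },
    { { 0xFE, 0xFF }, 2, FAM_UTF16_BE },
    { { 0xFF, 0xFE }, 2, FAM_UTF16_LE },
    { { 0xEF, 0xBB, 0xBF }, 3, FAM_UTF8 },
    /* Without one: encodings of '<' or of "<?xm". */
    { { 0x00, 0x00, 0x00, 0x3C }, 4, FAM_UCS4_BE },
    { { 0x3C, 0x00, 0x00, 0x00 }, 4, FAM_UCS4_LE },
    { { 0x00, 0x3C, 0x00, 0x3F }, 4, FAM_UTF16_BE },
    { { 0x3C, 0x00, 0x3F, 0x00 }, 4, FAM_UTF16_LE },
    { { 0x3C, 0x3F, 0x78, 0x6D }, 4, FAM_UTF8 },
    { { 0x4C, 0x6F, 0xA7, 0x94 }, 4, FAM_EBCDIC },
};

/* Return the first family whose signature prefixes the input. */
static Family guess_family(const unsigned char *p, size_t n)
{
    size_t i, j;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (n < table[i].len)
            continue;
        for (j = 0; j < table[i].len && p[j] == table[i].sig[j]; j++)
            ;
        if (j == table[i].len)
            return table[i].fam;
    }
    return FAM_UNKNOWN; /* a caller would typically fall back to UTF-8 */
}
```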

While this can obviously be done in C, I would much rather do it in
Haskell.  Haskell's lazy evaluation gives you stream IO for free (so to
speak), and handling the tail of a lazy stream with functions computed
by looking at the start of it comes naturally in Haskell.
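
In C, the nearest counterpart to picking the tail-handling function from the head of the stream is probably a decoder function pointer that gets swapped once the declaration has been read. A rough sketch, with hypothetical names (`Reader`, `next_latin1`, `next_utf8`) and an in-memory buffer standing in for the stream:

```c
#include <stddef.h>

typedef struct Reader Reader;
struct Reader {
    const unsigned char *data;  /* in-memory input stands in for a stream */
    size_t len, pos;
    long (*next)(Reader *);     /* current decoder: code point, -1 at end */
};

/* One byte per character: enough to read an ASCII-compatible declaration. */
static long next_latin1(Reader *r)
{
    return r->pos < r->len ? (long)r->data[r->pos++] : -1;
}

/* Minimal UTF-8 decoder; returns -2 on a malformed sequence.
   Overlong forms and surrogates are not rejected in this sketch. */
static long next_utf8(Reader *r)
{
    long c, cp;
    int extra;

    if ((c = next_latin1(r)) < 0)
        return -1;
    if (c < 0x80)
        return c;
    if ((c & 0xE0) == 0xC0)      { cp = c & 0x1F; extra = 1; }
    else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
    else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
    else
        return -2;
    while (extra-- > 0) {
        c = next_latin1(r);
        if (c < 0 || (c & 0xC0) != 0x80)
            return -2;
        cp = (cp << 6) | (c & 0x3F);
    }
    return cp;
}

/* The lexer only ever calls this; the decoder behind it can change. */
static long reader_get(Reader *r)
{
    return r->next(r);
}
```

The lexer would start with `next_latin1`, parse the declaration, then assign `r.next` to whichever decoder the named encoding calls for. It works, but the switch is explicit state rather than something that falls out of the language.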

-- 
Ben.


#172514

From	Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Date	2023-08-18 16:33 -0700
Message-ID	<323a8074-838d-4dfd-ad44-32eda639760en@googlegroups.com>
In reply to	#172511

On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.ar...@gmail.com> writes: 
> 
> > On Thursday, 17 August 2023 at 12:19:55 UTC+1, Ben Bacarisse wrote: 
> >> Malcolm McLean <malcolm.ar...@gmail.com> writes: 
> >> 
> >> > On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote: 
> >> >> Bart <b...@freeuk.com> writes: 
> >> >> 
> >> >> > On 16/08/2023 08:31, Malcolm McLean wrote: 
> >> >> >> Some people here are interested in Haskell. 
> >> >> >> They might be interested in this: 
> >> >> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/ 
> >> >> >> Of course it's written from a pro-Haskell point of view, and writing an 
> >> >> >> improved version when you've got the C in front of you isn't really a 
> >> >> >> fair test. But he does match C for speed. 
> >> >> >> 
> >> >> > 
> >> >> > "Portability (i.e. Windows) is a pain in the arse with C." 
> >> >> > 
> >> >> > I wonder what makes them say that? 
> >> >> Yes, I wondered that too, since the cut-down XML parsing they are doing 
> >> >> is one of the most potentially portable bits of C one could write (as 
> >> >> you say yourself): 
> >> >> 
> >> > There are some gotchas with files, but not for the cut down parsing 
> >> > they implement. 
> >> > Windows used to accept "rt" for reading a text stream. And there's 
> >> > still a mess with Unicode. 
> >> None of that matters for the case in point. The C code treats the input 
> >> like a stream of 8-bit bytes. You can do that without regard to line 
> >> convention. 
> >> > And the XML people say that a parser must 
> >> > accept UTF-16. 
> >> Again, that's not relevant to the case in the article. But it's also a 
> >> completely different issue. An XML parser that must handle either UTF-8 
> >> or UTF-16 needs a layer below the parser (conceptually) to detect the 
> >> encoding and return "characters" (as I think you have done). There is 
> >> no reason to suppose that that can't be written in portable C. 
> >> > I implement this by having the lexer call a function pointer to read a 
> >> > UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8 
> >> > on the fly. 
> >> Exactly -- though I think I would not have converted to UTF-8 in a plain 
> >> parser. Maybe your application make that a good choice. 
> >> > If it's UTF-8, it's just an alias for fgetc(). 
> >> > But how do you know the file format? 
> >> The first character much be '<' (and, technically, it must be the '<' that 
> >> opens an XML declaration). The encoding should be clear from the first 
> >> two bytes.
> I ended up looking at the spec (that's an hour I'll never get back!) and 
> it's more complicated...
> >> > I have code that does this, but if 
> >> > I called it, the XML parser would no longer be a single file module. So 
> >> > I read the first few character of the file, the seek back to the start 
> >> > position. 
> >> > But this only works on seekable streams. 
> >> 
> >> Actually, you don't need to read more that one character to determine if 
> >> the file is UTF-8 or UTF-16, all you need to do is an ungetc call and 
> >> that works on non-seekable streams. 
> >> 
> > You need two characters, because you might have a UTF-16 little-endian 
> > stream without a BOM. So the first character in 8 bit bytes would be 
> > '<'.
> Yes, I wasn't thinking. Thanks. You can't always tell until the second 
> byte, but you don't have to "unget" anything in that case because you 
> now know the character. 
> 
> But as it happens I spoke way too soon... The full picture is a mess.
>
Yes, it's awful. You have an "encoding" field in XML 1.0. But you can't
depend on it, because not all XML is version 1.0; some of it is bare. Now
I don't have much experience with text, but I reckon that it's entirely
possible that someone would run XML through a program like iconv,
and it won't be clever enough to change the "encoding" field.
>
> > But there's a simple hack, which is to read the first character from the 
> > stream, then set up the lexer with a "<' sitting in its token. So of course 
> > you also have to read the first character when passed a string, which 
> > is a bit of a nuisance (and that's the sort of thing that gives programming 
> > such a bad reputation).
> What do you mean "when passed a string"? Do you mean when the parser is 
> acting on in-memory data?
>
Sorry, I was so close to the program that I forgot that everybody else knows
nothing of the code (it's on GitHub, though in a separate project rather than
in the resource compiler). You can pass it either a file name, an open stream,
or a string. The string has to be UTF-8 because it is a char *. Of course I have
to read the first character of the string to make the string work the same way
as the rest of the code, all to support UTF-16 without a BOM on non-seekable
streams.
> > But it should work now when piped a non-seekable 
> > UTF-16 stream.
> It turns out that if you want to be 100% conforming you need to be able 
> to detect both UCS-4 and (eye roll) EBCDIC. What's more, you need to 
> set up just enough of the reading mechanism to be able to read the XML 
> declaration and then adjust the reading mechanism to handle the named 
> encoding. For your application, ISO-8859-1 might be effectively the 
> same as ISO-8859-15, but UCS-4 is a complication and you might want to 
> flag certain errors if the encoding is named as ISO-10646-UCS-2 rather 
> than UTF-16. 
>
The XML people say that a parser must accept UTF-8 and UTF-16. I have
heard of files which switch encodings, but I think they are largely mythical.
The basic idea of XML was very good, but I'm not impressed with the standard.
>
> While this can obviously be done in C, I would much rather do it in 
> Haskell. Haskell's lazy evaluation gives you stream IO for free (so to 
> speak), and handling the tail of a lazy stream with functions computed 
> by looking at the start of it comes naturally in Haskell. 
> 
The structure of the C function is massively improved by going to a lexer and
having a proper hierarchical, recursive grammar rather than the old ad-hoc
system (which was tempting because basic XML is so simple).
However, it might be possible to do a much better job in Haskell. Unfortunately
I can't do that better job.
I'm confident that it is shaping up as a very good single file C XML parser,
however.


#172556

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2023-08-19 21:46 +0100
Message-ID	<87pm3ivjyd.fsf@bsb.me.uk>
In reply to	#172514

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.ar...@gmail.com> writes: 
>> 
>> > On Thursday, 17 August 2023 at 12:19:55 UTC+1, Ben Bacarisse wrote: 
>> >> Malcolm McLean <malcolm.ar...@gmail.com> writes: 
>> >> 
>> >> > On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote: 
>> >> >> Bart <b...@freeuk.com> writes: 
>> >> >> 
>> >> >> > On 16/08/2023 08:31, Malcolm McLean wrote: 
>> >> >> >> Some people here are interested in Haskell. 
>> >> >> >> They might be interested in this: 
>> >> >> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/ 
>> >> >> >> Of course it's written from a pro-Haskell point of view, and writing an 
>> >> >> >> improved version when you've got the C in front of you isn't really a 
>> >> >> >> fair test. But he does match C for speed. 
>> >> >> >> 
>> >> >> > 
>> >> >> > "Portability (i.e. Windows) is a pain in the arse with C." 
>> >> >> > 
>> >> >> > I wonder what makes them say that? 
>> >> >> Yes, I wondered that too, since the cut-down XML parsing they are doing 
>> >> >> is one of the most potentially portable bits of C one could write (as 
>> >> >> you say yourself): 
>> >> >> 
>> >> > There are some gotchas with files, but not for the cut down parsing 
>> >> > they implement. 
>> >> > Windows used to accept "rt" for reading a text stream. And there's 
>> >> > still a mess with Unicode. 
>> >> None of that matters for the case in point. The C code treats the input 
>> >> like a stream of 8-bit bytes. You can do that without regard to line 
>> >> convention. 
>> >> > And the XML people say that a parser must 
>> >> > accept UTF-16. 
>> >> Again, that's not relevant to the case in the article. But it's also a 
>> >> completely different issue. An XML parser that must handle either UTF-8 
>> >> or UTF-16 needs a layer below the parser (conceptually) to detect the 
>> >> encoding and return "characters" (as I think you have done). There is 
>> >> no reason to suppose that that can't be written in portable C. 
>> >> > I implement this by having the lexer call a function pointer to read a 
>> >> > UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8 
>> >> > on the fly. 
>> >> Exactly -- though I think I would not have converted to UTF-8 in a plain 
>> >> parser. Maybe your application make that a good choice. 
>> >> > If it's UTF-8, it's just an alias for fgetc(). 
>> >> > But how do you know the file format? 
>> >> The first character much be '<' (and, technically, it must be the '<' that 
>> >> opens an XML declaration). The encoding should be clear from the first 
>> >> two bytes.
>> I ended up looking at the spec (that's an hour I'll never get back!) and 
>> it's more complicated...
>> >> > I have code that does this, but if 
>> >> > I called it, the XML parser would no longer be a single file module. So 
>> >> > I read the first few character of the file, the seek back to the start 
>> >> > position. 
>> >> > But this only works on seekable streams. 
>> >> 
>> >> Actually, you don't need to read more that one character to determine if 
>> >> the file is UTF-8 or UTF-16, all you need to do is an ungetc call and 
>> >> that works on non-seekable streams. 
>> >> 
>> > You need two characters, because you might have a UTF-16 little-endian 
>> > stream without a BOM. So the first character in 8 bit bytes would be 
>> > '<'.
>> Yes, I wasn't thinking. Thanks. You can't always tell until the second 
>> byte, but you don't have to "unget" anything in that case because you 
>> now know the character. 
>> 
>> But as it happens I spoke way too soon... The full picture is a mess.
>>
> Yes, it's awful. You have an "encoding" field in XML 1.0. But you can't
> depend on it because not all XML is version 1.0, some of it is bare. Now
> I don't have much experience with text, but I reckon that it's entirely
> possible that someone would run XML through a program like iconv,
> and it won't be clever enough to change the "encoding" field:

Maybe.  But you are providing a tool and you don't have to accept
everything.  A converted document with the wrong XML declaration is an
error and you could just reject it.  You don't have to bend over
backwards for bad input.
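
Rejecting such a document could look roughly like this in C: compare the encoding family detected from the leading bytes with the name given in the declaration. Everything here is an assumption made for illustration (the `Family` enum, the short list of known names, and the choice to give unrecognised names the benefit of the doubt).

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { FAM_ASCII8, FAM_UTF16, FAM_EBCDIC } Family;

/* Case-insensitive comparison using only standard C. */
static int same_name(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns 0 when the declared name clearly contradicts what the
   leading bytes showed, 1 otherwise. */
static int declaration_consistent(Family detected, const char *declared)
{
    static const struct { const char *name; Family fam; } known[] = {
        { "UTF-8", FAM_ASCII8 },
        { "US-ASCII", FAM_ASCII8 },
        { "UTF-16", FAM_UTF16 },
        { "ISO-10646-UCS-2", FAM_UTF16 },
    };
    size_t i;

    for (i = 0; i < sizeof known / sizeof known[0]; i++)
        if (same_name(declared, known[i].name))
            return known[i].fam == detected;
    return 1; /* unrecognised name: no evidence of a mismatch */
}
```

A document that arrives as UTF-16 but still says `encoding="UTF-8"` fails the check, which is the iconv case described above.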

Not being a Windows user, I've not seen a UTF-16 encoded file in the
wild, so I would not even accept that.  In the Linux world, I'd probably
accept only UTF-8 and point my users to xmllint.

xmllint can read any valid XML file and re-write it using UTF-8 (or,
indeed, many other encodings), changing the XML declaration on the fly.
Hence

  xmllint -encode UTF-8 myresources | babyxrc

would be close to a universal XML processor with little extra work on
your part.  But you probably can't assume your users will want to
do that.

>> It turns out that if you want to be 100% conforming you need to be able 
>> to detect both UCS-4 and (eye roll) EBCDIC. What's more, you need to 
>> set up just enough of the reading mechanism to be able to read the XML 
>> declaration and then adjust the reading mechanism to handle the named 
>> encoding. For your application, ISO-8859-1 might be effectively the 
>> same as ISO-8859-15, but UCS-4 is a complication and you might want to 
>> flag certain errors if the encoding is named as ISO-10646-UCS-2 rather 
>> than UTF-16. 
>>
> The XML people say that a parser must accept  UTF-8 and UTF-16.

Don't they go further?  I thought they did.  Maybe the others are
optional and those two are the only must-haves.

> I have
> heard of files which switch encodings, but I think they are largely mythical.
> The basic idea of XML was very good, but I'm not impressed with the
> standard.

There are two kinds of standards: those that incorporate lots of options
because of all the interested parties, and those that make decisive
choices between competing candidates.

-- 
Ben.


#172519

From	Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Date	2023-08-19 03:04 -0700
Message-ID	<cb35076d-f8ec-441c-a963-7077bd5f884cn@googlegroups.com>
In reply to	#172511

On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>
> It turns out that if you want to be 100% conforming you need to be able 
> to detect both UCS-4 and (eye roll) EBCDIC.
>
I had a go at EBCDIC.

If anyone has an EBCDIC XML file they'd like to test, please post a link.

Of course the next challenge is to support EBCDIC as the execution character
set. This means all the if (ch == '<') statements have to come out and be replaced
by if (ch == ASCII_LESSTHEN). And the strings have to be replaced with hex codes.

Here's where the Baby X resource compiler shows its power. Simply set up the input:
<BabyXRC>
<utf8 name="cdata"><CDATA</utf8>
</BabyXRC> 

And so on, and you get all the strings in hex-encoded UTF-8, ready to cut and paste.
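
As a sketch of where that ends up: character literals become named ASCII constants and string literals become byte arrays, so the comparisons no longer depend on the execution character set. `ASCII_LESSTHEN` is the name used above; the other identifiers are hypothetical, and the array spells out the real XML CDATA opener `<![CDATA[`.

```c
#include <stddef.h>
#include <string.h>

/* ASCII code values written as numbers, so they mean the same thing
   whatever the compiler's execution character set is. */
enum {
    ASCII_LESSTHEN = 0x3C,    /* '<' in ASCII */
    ASCII_GREATERTHAN = 0x3E  /* '>' in ASCII */
};

/* "<![CDATA[" as hex-encoded ASCII/UTF-8 bytes. */
static const unsigned char cdata_open[] = {
    0x3C, 0x21, 0x5B, 0x43, 0x44, 0x41, 0x54, 0x41, 0x5B
};

/* Replaces if (ch == '<'). */
static int is_tag_open(int ch)
{
    return ch == ASCII_LESSTHEN;
}

/* Replaces a strncmp() against the literal "<![CDATA[". */
static int at_cdata_open(const unsigned char *p, size_t n)
{
    return n >= sizeof cdata_open
        && memcmp(p, cdata_open, sizeof cdata_open) == 0;
}
```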


#172534

From	Lew Pitcher <lew.pitcher@digitalfreehold.ca>
Date	2023-08-19 13:19 +0000
Message-ID	<ubqfgf$r0tm$1@dont-email.me>
In reply to	#172519

On Sat, 19 Aug 2023 03:04:27 -0700, Malcolm McLean wrote:

> On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>
>> It turns out that if you want to be 100% conforming you need to be able 
>> to detect both UCS-4 and (eye roll) EBCDIC.
>>
> I had a go at ECBDIC. 
>
> If anyone has an EBCDIC XML file they'd like to test, please post a link.

Be careful of what you ask for, Malcolm

You /do/ realize that "EBCDIC" refers to a whole family of charactersets,
(at least 46 individual charactersets, most with /some/ common elements)
and /not/ to a single characterset like Unicode or US-ASCII (although, you
could argue that ASCII embodied multiple charactersets, just with fewer
variants).

FWIW, there are a number of EBCDIC charactersets that you could not reliably use
in XML, as they lack a few of the required characters. You might take a
look at the DKUUG's characterset standards website[1] - they contributed to
the ISO/IEC JTC 1/SC 2 [2] effort to catalogue and standardize charactersets.

[1] http://std.dkuug.dk/i18n/charmaps/
[2] https://en.wikipedia.org/wiki/ISO/IEC_JTC_1/SC_2

[snip]

-- 
Lew Pitcher
"In Skills We Trust"


#172538

From	scott@slp53.sl.home (Scott Lurndal)
Date	2023-08-19 14:48 +0000
Message-ID	<IC4EM.686039$TPw2.185069@fx17.iad>
In reply to	#172519

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>
>> It turns out that if you want to be 100% conforming you need to be able 
>> to detect both UCS-4 and (eye roll) EBCDIC.
>>
>I had a go at ECBDIC. 
>
>If anyone has an EBCDIC XML file they'd like to test, please post a link.

Here's one:


[EBCDIC-encoded XML omitted; the raw bytes are not readable as displayed text]


#172540

From	Lew Pitcher <lew.pitcher@digitalfreehold.ca>
Date	2023-08-19 15:09 +0000
Message-ID	<ubqlvc$shgn$1@dont-email.me>
In reply to	#172538

On Sat, 19 Aug 2023 14:48:08 +0000, Scott Lurndal wrote:

> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>
>>> It turns out that if you want to be 100% conforming you need to be able 
>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>
>>I had a go at ECBDIC. 
>>
>>If anyone has an EBCDIC XML file they'd like to test, please post a link.
> 
> Here's one:
[snip]

And that's an excellent illustration of my point about some EBCDIC
charactersets lacking the necessary characters to properly express XML.

Here are the first four lines of the ASCII equivalent of that message,
as generated by
  dd if=ebcdic.msg of=ascii.msg conv=ascii
where
  conv=ascii
will convert "from EBCDIC to ASCII" (dd(1) manpage)

Note the (translated) format of the DOCTYPE entities
  <?xml version="1.0" encoding="utf-8"?>
  <?xml-stylesheet href="one_register.xsl" type="text/xsl" ?>
  <|DOCTYPE registers SYSTEM "registers.dtd">
  <|-- Copyright (c) 2010-2014 ARM Limited. All rights reserved. -->

Apparently, you used a variant of EBCDIC that includes an exclamation mark
at codepoint 0x4f; dd uses EBCDIC-US which, at codepoint 0x4f, encodes
a "VERTICAL LINE".
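
To make the difference concrete, here is a deliberately tiny C sketch of a translation function with the variant as a parameter. It covers only the handful of bytes needed for "<?xml" and the disputed 0x4f; the `Variant` names are hypothetical labels, not real code page identifiers.

```c
/* Which character a given EBCDIC variant puts at codepoint 0x4F. */
typedef enum { EBCDIC_4F_IS_BAR, EBCDIC_4F_IS_BANG } Variant;

/* Translate one EBCDIC byte to its ASCII code, or -1 if this
   (very partial) table does not cover it. */
static int ebcdic_to_ascii(unsigned char b, Variant v)
{
    switch (b) {
    case 0x4C: return 0x3C; /* '<' */
    case 0x6E: return 0x3E; /* '>' */
    case 0x6F: return 0x3F; /* '?' */
    case 0xA7: return 0x78; /* 'x' */
    case 0x94: return 0x6D; /* 'm' */
    case 0x93: return 0x6C; /* 'l' */
    case 0x4F:              /* the disputed codepoint */
        return v == EBCDIC_4F_IS_BANG ? 0x21  /* '!' */
                                      : 0x7C; /* '|' */
    default:   return -1;
    }
}
```

With the same input bytes 0x4C 0x4F, one variant yields "<!" and the other "<|", which is the mismatch visible in the DOCTYPE line above.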

-- 
Lew Pitcher
"In Skills We Trust"


#172542

From	Lew Pitcher <lew.pitcher@digitalfreehold.ca>
Date	2023-08-19 15:17 +0000
Message-ID	<ubqmdh$shgn$2@dont-email.me>
In reply to	#172540

On Sat, 19 Aug 2023 15:09:32 +0000, Lew Pitcher wrote:

> On Sat, 19 Aug 2023 14:48:08 +0000, Scott Lurndal wrote:
> 
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>>
>>>> It turns out that if you want to be 100% conforming you need to be able 
>>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>>
>>>I had a go at ECBDIC. 
>>>
>>>If anyone has an EBCDIC XML file they'd like to test, please post a link.
>> 
>> Here's one:
> [snip]
> 
> And that's an excellent illustration of my point about some EBCDIC
> charactersets lacking the necessary characters to properly express XML.
> 
> Here are the first four lines of the ASCII equivalent of that message,
> as generated by
>   dd if=ebcdic.msg of=ascii.msg conv=ascii
> where
>   conv=ascii
> will convert "from EBCDIC to ASCII" (dd(1) manpage)
> 
> Note the (translated) format of the DOCTYPE entities
>   <?xml version="1.0" encoding="utf-8"?>

Oh, and by the way, that "encoding" value is incorrect for
the XML document you posted. It should have named the
EBCDIC variant you used, not "utf-8".

I suspect that you just machine or hand encoded an existing
utf-8 XML document, rather than composing a completely new
document in EBCDIC.

FWIW, I spent many years working in an EBCDIC environment,
manipulating XML documents (in EBCDIC) with a tool developed
in-house. I had to write a number of "white papers" on the
subjects of characterset translation (to/from EBCDIC, and
between EBCDIC variants), and on XML handling in an EBCDIC
environment.  :-)

-- 
Lew Pitcher
"In Skills We Trust"


#172559

From	scott@slp53.sl.home (Scott Lurndal)
Date	2023-08-19 21:05 +0000
Message-ID	<t8aEM.562366$SuUf.127420@fx14.iad>
In reply to	#172540

Lew Pitcher <lew.pitcher@digitalfreehold.ca> writes:
>On Sat, 19 Aug 2023 14:48:08 +0000, Scott Lurndal wrote:
>
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>>
>>>> It turns out that if you want to be 100% conforming you need to be able 
>>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>>
>>>I had a go at ECBDIC. 
>>>
>>>If anyone has an EBCDIC XML file they'd like to test, please post a link.
>> 
>> Here's one:
>[snip]
>
>And that's an excellent illustration of my point about some EBCDIC
>charactersets lacking the necessary characters to properly express XML.
>
>Here are the first four lines of the ASCII equivalent of that message,
>as generated by
>  dd if=ebcdic.msg of=ascii.msg conv=ascii
>where
>  conv=ascii
>will convert "from EBCDIC to ASCII" (dd(1) manpage)
>
>Note the (translated) format of the DOCTYPE entities
>  <?xml version="1.0" encoding="utf-8"?>
>  <?xml-stylesheet href="one_register.xsl" type="text/xsl" ?>
>  <|DOCTYPE registers SYSTEM "registers.dtd">
>  <|-- Copyright (c) 2010-2014 ARM Limited. All rights reserved. -->
>
>Apparently, you used a variant of EBCDIC that includes an exclamation mark
>at codepoint 0x4f; dd uses EBCDIC-US which, at codepoint 0x4f encodes
>a "VERTICAL LINE"
>

Actually, I used 'dd' on an old Fedora Core install.
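
To make the 0x4f point concrete, here is a minimal sketch. The enum and
function names are made up for illustration, and only that one code point
is modelled. As far as I know, IBM code page 037 has VERTICAL LINE at 0x4f
while code page 500 has the exclamation mark there, so the byte after '<'
reads as "<|" or "<!" depending on which table the converter assumes.

```c
#include <stddef.h>

/* Two EBCDIC code pages that disagree about byte 0x4F.
   Illustrative only: a real converter needs the full 256-entry tables. */
enum ebcdic_variant { EBCDIC_CP037, EBCDIC_CP500 };

/* Decode the single byte 0x4F under the given variant. */
static char decode_0x4f(enum ebcdic_variant v)
{
    return v == EBCDIC_CP500 ? '!' : '|';
}

/* Decode the two-byte sequence 4C 4F ("<" followed by the disputed byte)
   into a NUL-terminated ASCII string. Returns 0 on success, -1 if the
   input is not that sequence or the output buffer is too small. */
static int decode_markup_open(const unsigned char in[2],
                              enum ebcdic_variant v,
                              char *out, size_t outsize)
{
    if (outsize < 3 || in[0] != 0x4C || in[1] != 0x4F)
        return -1;
    out[0] = '<';               /* 0x4C is '<' in both code pages */
    out[1] = decode_0x4f(v);
    out[2] = '\0';
    return 0;
}
```

Fed the bytes 4C 4F, the CP500 reading gives the "<!" that XML needs and
the CP037 reading gives the "<|" seen in the dd output quoted above.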


#172553

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2023-08-19 21:05 +0100
Message-ID	<871qfyx0fe.fsf@bsb.me.uk>
In reply to	#172538

scott@slp53.sl.home (Scott Lurndal) writes:

> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>
>>> It turns out that if you want to be 100% conforming you need to be able 
>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>
>>I had a go at EBCDIC.
>>
>>If anyone has an EBCDIC XML file they'd like to test, please post a link.
>
> Here's one:
>
> Lo...

<EBCDIC-encoded XML deleted>

Is that legal?  I thought an EBCDIC XML file must give the correct
encoding in the XML declaration.  xmllint rejects it unless I edit the
declaration.
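
For reference, the detection step does not depend on the declaration being
right. The "Lo" the posted file starts with is just how the EBCDIC bytes
0x4C 0x6F ('<' and '?') display on an ASCII terminal. Below is a minimal
sketch of the no-BOM rows of the autodetection table in Appendix F of the
XML 1.0 spec. The enum and function names are invented for illustration,
byte-order marks and the unusual UCS-4 octet orders are left out, and a
real parser still has to read the declaration to learn which EBCDIC code
page is in use.

```c
#include <stddef.h>
#include <string.h>

/* Encoding families distinguishable from the first four bytes of "<?xm". */
enum xml_family {
    XML_FAMILY_UNKNOWN,
    XML_FAMILY_ASCII_COMPATIBLE,  /* UTF-8, ASCII, ISO 8859-x, ... */
    XML_FAMILY_UTF16_BE,
    XML_FAMILY_UTF16_LE,
    XML_FAMILY_UCS4_BE,
    XML_FAMILY_UCS4_LE,
    XML_FAMILY_EBCDIC             /* some flavour; code page still unknown */
};

/* Guess the encoding family of an XML entity that has no byte-order mark. */
enum xml_family xml_guess_family(const unsigned char *buf, size_t len)
{
    static const struct {
        unsigned char sig[4];
        enum xml_family family;
    } table[] = {
        { { 0x3C, 0x3F, 0x78, 0x6D }, XML_FAMILY_ASCII_COMPATIBLE },
        { { 0x00, 0x3C, 0x00, 0x3F }, XML_FAMILY_UTF16_BE },
        { { 0x3C, 0x00, 0x3F, 0x00 }, XML_FAMILY_UTF16_LE },
        { { 0x00, 0x00, 0x00, 0x3C }, XML_FAMILY_UCS4_BE },
        { { 0x3C, 0x00, 0x00, 0x00 }, XML_FAMILY_UCS4_LE },
        { { 0x4C, 0x6F, 0xA7, 0x94 }, XML_FAMILY_EBCDIC }
    };
    size_t i;

    if (buf == NULL || len < 4)
        return XML_FAMILY_UNKNOWN;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (memcmp(buf, table[i].sig, 4) == 0)
            return table[i].family;
    }
    return XML_FAMILY_UNKNOWN;
}
```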

-- 
Ben.


#172560

From	scott@slp53.sl.home (Scott Lurndal)
Date	2023-08-19 21:07 +0000
Message-ID	<baaEM.562367$SuUf.444809@fx14.iad>
In reply to	#172553

Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>scott@slp53.sl.home (Scott Lurndal) writes:
>
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>>
>>>> It turns out that if you want to be 100% conforming you need to be able 
>>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>>
>>>I had a go at EBCDIC.
>>>
>>>If anyone has an EBCDIC XML file they'd like to test, please post a link.
>>
>> Here's one:
>>
>> Lo...
>
><EBCDIC-encoded XML deleted>
>
>Is that legal?  I thought an EBCDIC XML file must give the correct
>encoding in the XML declaration.  xmllint rejects it unless I edit the
>declaration.

As Lew pointed out, it was not properly specified. I had cheated and
encoded (using dd) an existing XML file (from the public ARM AArch64
SysReg XML).


#172563

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2023-08-19 22:31 +0100
Message-ID	<87jztqvhwf.fsf@bsb.me.uk>
In reply to	#172519

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>
>> It turns out that if you want to be 100% conforming you need to be able 
>> to detect both UCS-4 and (eye roll) EBCDIC.
>>
> I had a go at EBCDIC.
>
> If anyone has an EBCDIC XML file they'd like to test, please post a
> link.

You can make your own by (a) setting the encoding="..." attribute in the
declaration (EBCDIC-INT is a good one) and then (b) running iconv.

> Of course the next challenge is to support EBCDIC as the execution
> character set. This means all the if (ch == '<') statements have to
> come out and be replaced by if (ch == ASCII_LESSTHEN). And the strings
> have to be replaced with hex codes.

Do you have a user who wants to compile your program on a system that
does not support ASCII C source?

> Here's where the Baby X resource compiler shows its power. Simply set
> up the input
> <BabyXRC>
> <utf8 name="cdata"><CDATA</utf8>
> </BabyXRC>

You've lost me.  That does not parse.

> And so on, and you get all the strings in hex-encoded UTF-8, ready to
> cut and paste.

What strings?  And why hex -- nothing in the XML suggests hex?  I
usually want UTF-8 strings as UTF-8 strings in the source, but I
understand your user base does not include me.

-- 
Ben.


#172568

From	Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Date	2023-08-19 22:04 -0700
Message-ID	<7f9fbbd6-7f5c-4e12-a73b-c9abe91b7f5bn@googlegroups.com>
In reply to	#172563

On Saturday, 19 August 2023 at 22:31:28 UTC+1, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.ar...@gmail.com> writes: 
> 
> > On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote: 
> >> 
> >> It turns out that if you want to be 100% conforming you need to be able 
> >> to detect both UCS-4 and (eye roll) EBCDIC. 
> >> 
> > I had a go at EBCDIC.
> > 
> > If anyone has an EBCDIC XML file they'd like to test, please post a 
> > link.
> You can make your own by (a) setting the encoding="..." attribute in the 
> declaration (EBCDIC-INT is a good one) and then running iconv.
> > Of course the next challenge is to support EBCDIC as the execution
> > character set. This means all the if (ch == '<') statements have to 
> > come out and be replaced by if (ch == ASCII_LESSTHEN). And the strings 
> > have to be replaced with hex codes.
> Do you have a user who wants to compile your program on a system that 
> does not support ASCII C source?
>
Who knows. The code is publicly available to whoever wants it.
The problem with this model is that, unless the user chooses to get back to
you, you've no idea who he is, or how he is using the code, or if he has
any problems with it. Unlike paying customers who usually leave their
details, and are likely to complain if they don't get what they wanted.

But if the XML parser is to support EBCDIC input, then I'd expect that
an EBCDIC-interested user might also want to compile under a system
where the execution character set is EBCDIC. However he'll get UTF-8
output, which is probably not what he wants.

I'd need an EBCDIC C compiler to test it.
>
> > Here's where the Baby X resource compiler shows its power. Simply set 
> > up the input 
> > <BabyXRC> 
> > <utf8 name="cdata"><CDATA</utf8> 
> > </BabyXRC>
> You've lost me. That does not parse.
> > And so on, and you get all the strings in hex-encoded UTF-8, ready to 
> > cut and paste.
> What strings? And why hex -- nothing in the XML suggests hex? I 
> usually want UTF-8 strings as UTF-8 strings in the source, but I 
> understand your user base does not include me. 
> 
XML documents contain a tag called "CDATA". So the natural thing is
to write
if (!strcmp(tag, "CDATA")) /* check for CDATA and process it. */

This will work on a program which accepts data in the execution character
set and only in the execution character set. However the XML parser 
accepts data in ASCII, UTF-8, UTF-16 (two flavours) and, now, EBCDIC.
It does this by converting to a common format via a conversion function
passed to the lexer, and the common format is UTF-8.

So "tag" will be in UTF-8. If the execution character set is ASCII, then
the comparison will still work, and that is what I have done. But if it is
EBCDIC, it will fail.

Instead we need to write

/* CDATA in UTF-8 */
char cdata[] = {0x43, 0x44, 0x41, 0x54, 0x41, 0x00};
 
if (!strcmp(tag, cdata)) /* check for CDATA and process it */

This is where the Baby X resource compiler comes to our rescue. It will
convert ASCII to that form, with the utf-8 tag.
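
As a rough sketch of the whole idea (convert the input to the common
format, then compare against explicit UTF-8 bytes), something like the
following would do. All the names here are hypothetical and are not the
parser's actual API, and the EBCDIC converter only covers the upper-case
letters, which sit at the same positions in the common EBCDIC code pages.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical converter type: one input byte to a Unicode code point,
   or -1 if the byte is not handled. */
typedef int (*to_unicode_fn)(unsigned char byte);

/* EBCDIC upper-case letters only: A-I, J-R and S-Z are three separate runs. */
static int ebcdic_upper_to_unicode(unsigned char b)
{
    if (b >= 0xC1 && b <= 0xC9) return 0x41 + (b - 0xC1);  /* A-I */
    if (b >= 0xD1 && b <= 0xD9) return 0x4A + (b - 0xD1);  /* J-R */
    if (b >= 0xE2 && b <= 0xE9) return 0x53 + (b - 0xE2);  /* S-Z */
    return -1;
}

/* ASCII input is already in the common format. */
static int ascii_to_unicode(unsigned char b)
{
    return b < 0x80 ? (int)b : -1;
}

/* Convert len bytes to NUL-terminated UTF-8 using conv. Only code points
   below 0x80 are handled, so each one is a single UTF-8 byte.
   Returns 0 on success, -1 on failure. */
static int to_common_format(const unsigned char *in, size_t len,
                            to_unicode_fn conv, char *out, size_t outsize)
{
    size_t i;

    if (outsize < len + 1)
        return -1;
    for (i = 0; i < len; i++) {
        int cp = conv(in[i]);
        if (cp < 0 || cp > 0x7F)
            return -1;
        out[i] = (char)cp;
    }
    out[len] = '\0';
    return 0;
}

/* "CDATA" as explicit UTF-8 bytes, independent of the execution charset. */
static const char cdata_utf8[] = { 0x43, 0x44, 0x41, 0x54, 0x41, 0x00 };

static int is_cdata(const char *utf8_tag)
{
    return strcmp(utf8_tag, cdata_utf8) == 0;
}
```

The EBCDIC bytes C3 C4 C1 E3 C1 go through to_common_format() with the
EBCDIC converter and then match in is_cdata(), whatever character set the
compiler itself uses for string literals.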


#172571

From	Richard Damon <Richard@Damon-Family.org>
Date	2023-08-20 07:41 -0400
Message-ID	<5_mEM.119316$uEkc.63082@fx35.iad>
In reply to	#172568

On 8/20/23 1:04 AM, Malcolm McLean wrote:
> On Saturday, 19 August 2023 at 22:31:28 UTC+1, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>>
>>> On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>>
>>>> It turns out that if you want to be 100% conforming you need to be able
>>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>>
>>> I had a go at EBCDIC.
>>>
>>> If anyone has an EBCDIC XML file they'd like to test, please post a
>>> link.
>> You can make your own by (a) setting the encoding="..." attribute in the
>> declaration (EBCDIC-INT is a good one) and then running iconv.
>>> Of course the next challenge is to support EBCDIC as the execution
>>> character set. This means all the if (ch == '<') statements have to
>>> come out and be replaced by if (ch == ASCII_LESSTHEN). And the strings
>>> have to be replaced with hex codes.
>> Do you have a user who wants to compile your program on a system that
>> does not support ASCII C source?
>>
> Who knows. The code is publicly available to whoever wants it.
> The problem with this model is that, unless the user chooses to get back to
> you, you've no idea who he is, or how he is using the code, or if he has
> any problems with it. Unlike paying customers who usually leave their
> details, and are likely to complain if they don't get what they wanted.
> 
> But if the XML parser is to support EBCDIC input, then I'd expect that
> an EBCDIC-interested user might also want to compile under a system
> where the execution character set is EBCDIC. However he'll get UTF-8
> output, which is probably not what he wants.
> 
> I'd need an EBCDIC C compiler to test it.
>>
>>> Here's where the Baby X resource compiler shows its power. Simply set
>>> up the input
>>> <BabyXRC>
>>> <utf8 name="cdata"><CDATA</utf8>
>>> </BabyXRC>
>> You've lost me. That does not parse.
>>> And so on, and you get all the strings in hex-encoded UTF-8, ready to
>>> cut and paste.
>> What strings? And why hex -- nothing in the XML suggests hex? I
>> usually want UTF-8 strings as UTF-8 strings in the source, but I
>> understand your user base does not include me.
>>
> XML documents contain a tag called "CDATA". So the natural thing is
> to write
> if (!strcmp(tag, "CDATA")) /* check for CDATA and process it. */
> 
> This will work on a program which accepts data in the execution character
> set and only in the execution character set. However the XML parser
> accepts data in ASCII, UTF-8, UTF-16 (two flavours) and, now, EBCDIC.
> It does this by converting to a common format via a conversion function
> passed to the lexer, and the common format is UTF-8.
> 
> So "tag" will be in UTF-8. If the execution character set is ASCII, then
> the comparison will still work, and that is what I have done. But if it is
> EBCDIC, it will fail.
> 
> Instead we need to write
> 
> /* CDATA in UTF-8 */
> char cdata[] = {0x43, 0x44, 0x41, 0x54, 0x41, 0x00};
>   
> if (!strcmp(tag, cdata)) /* check for CDATA and process it */
> 
> This is where the Baby X resource compiler comes to our rescue. It will
> convert ASCII to that form, with the utf-8 tag.

Why not just write u8"CDATA" instead?

u8 strings are always UTF-8 encoded, no matter what the execution 
character set is.
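
A minimal sketch of what that could look like, assuming a C11 or later
compiler (the function name is just for illustration). It uses memcmp
rather than strcmp because in C23 the u8 literal has element type char8_t,
which is an unsigned type, and memcmp does not care about the element
type.

```c
#include <string.h>

/* Compare a NUL-terminated UTF-8 tag against u8"CDATA".
   The literal's bytes are UTF-8 whatever the execution character set is. */
static int is_cdata_u8(const char *utf8_tag)
{
    /* sizeof counts the terminating NUL, so the text length is one less. */
    const size_t n = sizeof u8"CDATA" - 1;

    /* Check the length first so memcmp never reads past a short tag. */
    return strlen(utf8_tag) == n && memcmp(utf8_tag, u8"CDATA", n) == 0;
}
```

On an ASCII system this behaves the same as the plain strcmp against
"CDATA"; the difference would only show up with an EBCDIC execution
character set.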
