Groups > comp.lang.c > #172354 > unrolled thread
| Started by | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| First post | 2023-08-16 00:31 -0700 |
| Last post | 2023-08-17 03:42 -0700 |
| Articles | 20 on this page of 287 — 19 participants |
C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-16 00:31 -0700
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-16 11:14 +0100
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-17 00:23 +0100
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-16 21:38 -0700
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-17 12:19 +0100
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-17 07:53 -0700
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-19 00:15 +0100
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-18 16:33 -0700
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-19 21:46 +0100
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-19 03:04 -0700
Re: C vs Haskell for XML parsing Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2023-08-19 13:19 +0000
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-19 14:48 +0000
Re: C vs Haskell for XML parsing Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2023-08-19 15:09 +0000
Re: C vs Haskell for XML parsing Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2023-08-19 15:17 +0000
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-19 21:05 +0000
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-19 21:05 +0100
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-19 21:07 +0000
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-19 22:31 +0100
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-19 22:04 -0700
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-20 07:41 -0400
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-20 17:00 +0100
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-20 11:20 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-20 14:45 -0700
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-21 00:05 +0100
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-20 19:45 -0700
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-21 14:51 +0100
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-21 11:28 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-21 02:59 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-21 15:17 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-21 23:03 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-22 14:09 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-22 05:38 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-22 15:31 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-22 06:51 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-22 19:19 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-22 21:59 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-23 09:57 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-23 07:48 -0700
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-23 16:05 +0100
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-23 08:21 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-23 19:30 +0200
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-23 18:50 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-23 10:49 -0700
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-23 18:08 +0000
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-23 21:28 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-23 20:53 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-24 15:15 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-24 07:50 -0700
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-24 16:48 +0100
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-24 17:35 +0000
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-24 18:09 +0000
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-25 09:59 +0200
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-25 09:46 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-25 01:37 -0700
Re: C vs Haskell for XML parsing Spiros Bousbouras <spibou@gmail.com> - 2023-08-25 08:50 +0000
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-25 01:53 -0700
Underscores in type names (was : C vs Haskell for XML parsing) Spiros Bousbouras <spibou@gmail.com> - 2023-08-25 09:17 +0000
Re: Underscores in type names (was : C vs Haskell for XML parsing) Richard Harnden <richard.nospam@gmail.com> - 2023-08-25 11:35 +0100
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-25 13:42 +0200
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-25 13:59 +0000
Re: C vs Haskell for XML parsing candycane@f172.n1.z21.fsxnet (candycane) - 2023-08-26 00:45 +1300
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-25 19:50 +0100
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-26 02:55 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-26 19:21 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-27 03:05 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 18:28 +0200
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-28 14:01 +0000
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-28 16:07 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-29 09:16 +0200
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-29 19:22 +0000
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-29 19:38 +0000
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-29 20:11 +0000
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-29 21:59 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-30 00:43 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-30 12:30 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-30 05:04 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-30 17:50 +0200
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-30 19:41 +0000
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-31 11:18 +0200
Re: C vs Haskell for XML parsing Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2023-08-30 14:40 +0000
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-30 15:03 +0000
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-30 12:00 -0700
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-30 20:50 -0700
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-31 08:12 +0000
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-01 11:51 -0700
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-27 00:55 +0100
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-28 16:17 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 04:31 -0700
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-25 14:06 +0000
Re: C vs Haskell for XML parsing Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2023-08-25 15:35 +0000
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 11:45 -0700
Re: C vs Haskell for XML parsing Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2023-08-25 20:06 +0000
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-25 19:35 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 19:55 -0700
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-25 20:26 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-26 19:24 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-26 02:52 -0700
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-26 14:10 +0000
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-26 22:54 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 18:39 +0200
Re: C vs Haskell for XML parsing James Kuyper <jameskuyper@alumni.caltech.edu> - 2023-08-27 15:56 -0400
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 00:42 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 10:39 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 02:03 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 13:29 +0200
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-28 16:35 +0000
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 10:11 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 19:40 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 12:31 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 22:39 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 14:22 -0700
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-28 16:02 -0700
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-28 16:21 +0000
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 10:05 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-28 14:50 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-28 14:50 -0700
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-28 14:13 +0000
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-26 19:31 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-26 23:08 -0700
Re: C vs Haskell for XML parsing "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2023-08-26 23:23 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 18:41 +0200
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-25 13:38 +0200
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 11:59 -0700
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-25 19:34 -0400
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 17:12 -0700
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-26 01:44 +0100
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-25 22:18 -0400
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 19:58 -0700
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-25 23:07 -0400
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 21:17 -0700
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 10:12 -0400
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-26 15:13 -0700
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 19:47 -0400
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-26 19:09 -0700
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 22:27 -0400
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 18:55 +0200
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 02:16 +0100
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-25 18:39 -0700
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-25 22:26 -0400
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 11:07 +0100
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 10:33 -0400
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 16:27 +0100
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 11:57 -0400
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 17:11 +0100
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 12:35 -0400
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 18:24 +0100
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 13:35 -0400
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 20:11 +0100
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 17:07 -0400
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 22:40 +0100
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-26 23:32 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-27 03:02 -0700
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-27 13:25 +0100
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-26 14:37 -0700
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-26 19:49 +0000
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-26 22:00 +0100
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-26 17:31 -0400
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-26 15:28 -0700
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-27 04:24 +0000
Re: C vs Haskell for XML parsing "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2023-08-26 21:59 -0700
Re: C vs Haskell for XML parsing candycane@f172.n1.z21.fsxnet (candycane) - 2023-08-27 02:42 +1300
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-27 11:23 +0100
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-27 22:45 +0000
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-08-27 19:06 -0400
Re: C vs Haskell for XML parsing James Kuyper <jameskuyper@alumni.caltech.edu> - 2023-08-28 02:18 -0400
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-27 16:21 -0700
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-28 00:00 +0000
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-27 19:36 -0700
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-28 03:00 +0000
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-28 06:58 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-28 15:22 -0700
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-28 16:49 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-28 17:11 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-29 16:06 +0200
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-29 08:27 -0700
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-29 01:36 +0100
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-29 01:22 +0000
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-29 10:40 +0100
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-29 02:53 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-29 03:00 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-29 16:18 +0200
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-29 13:06 -0700
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-29 22:14 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-30 01:32 -0700
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-30 21:09 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-30 12:44 +0200
Re: C vs Haskell for XML parsing James Kuyper <jameskuyper@alumni.caltech.edu> - 2023-08-30 12:32 -0400
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-30 11:44 -0700
Re: C vs Haskell for XML parsing James Kuyper <jameskuyper@alumni.caltech.edu> - 2023-09-09 01:15 -0400
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-31 04:47 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-30 11:42 -0700
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-30 23:36 -0700
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-31 08:15 +0000
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-01 11:48 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-09-03 03:55 -0700
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-03 11:44 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-09-03 16:20 -0700
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-03 16:47 -0700
Re: C vs Haskell for XML parsing Richard Damon <Richard@Damon-Family.org> - 2023-09-03 17:24 -0700
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-10-03 03:16 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-09-03 17:26 -0700
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-10-03 03:19 -0700
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-29 19:43 +0000
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-29 13:23 -0700
Re: C vs Haskell for XML parsing Bobby Moore <bobbymoore018@gmail.com> - 2023-08-29 13:54 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-29 11:41 +0200
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-29 08:29 -0700
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-29 16:54 +0100
Re: Named function arguments (Was : C vs Haskell for XML parsing) Spiros Bousbouras <spibou@gmail.com> - 2023-08-30 19:30 +0000
Re: Named function arguments (Was : C vs Haskell for XML parsing) Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-30 19:53 +0000
Re: Named function arguments (Was : C vs Haskell for XML parsing) Spiros Bousbouras <spibou@gmail.com> - 2023-08-30 20:07 +0000
Re: Named function arguments (Was : C vs Haskell for XML parsing) Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-30 20:42 +0000
Re: Named function arguments (Was : C vs Haskell for XML parsing) Richard Harnden <richard.nospam@gmail.com> - 2023-08-30 23:15 +0100
Re: Named function arguments (Was : C vs Haskell for XML parsing) Spiros Bousbouras <spibou@gmail.com> - 2023-08-31 18:41 +0000
Re: Named function arguments (Was : C vs Haskell for XML parsing) David Brown <david.brown@hesbynett.no> - 2023-08-31 12:43 +0200
Re: Named function arguments (Was : C vs Haskell for XML parsing) Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-30 20:40 -0700
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-28 14:15 +0000
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-28 15:53 +0100
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 18:41 +0200
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-28 18:01 +0100
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 20:01 +0200
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-28 20:14 +0100
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-28 19:27 +0000
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-28 16:09 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 21:53 +0200
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-28 20:37 +0000
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-28 23:39 +0100
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-29 00:23 +0000
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-29 01:01 -0700
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-29 19:28 +0000
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-29 11:08 +0100
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-28 01:31 +0100
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-25 20:18 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 18:50 +0200
Re: C vs Haskell for XML parsing Richard Harnden <richard.nospam@gmail.com> - 2023-08-27 19:18 +0100
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 21:19 +0200
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-27 20:33 +0100
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-27 14:14 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-27 13:56 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 11:00 +0200
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-28 15:12 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-29 16:32 +0200
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-29 13:12 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-30 12:50 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-25 23:38 -0700
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-26 14:09 +0000
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-27 00:44 +0100
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-27 00:18 -0700
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-27 17:56 +0100
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 19:20 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-27 11:18 -0700
Re: C vs Haskell for XML parsing kalevi@kolttonen.fi (Kalevi Kolttonen) - 2023-08-27 18:34 +0000
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 00:32 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 11:14 +0200
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-28 14:10 +0000
Re: C vs Haskell for XML parsing kalevi@kolttonen.fi (Kalevi Kolttonen) - 2023-08-29 10:47 +0000
Re: C vs Haskell for XML parsing Michael S <already5chosen@yahoo.com> - 2023-08-29 04:53 -0700
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-29 06:35 -0700
Re: C vs Haskell for XML parsing Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-28 16:12 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 08:24 +0200
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-28 22:17 +0100
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 14:35 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-28 14:38 -0700
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-28 01:00 +0100
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 11:24 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 03:29 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-28 14:01 +0200
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-28 08:40 -0700
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-27 19:11 +0200
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-25 14:49 +0100
Re: C vs Haskell for XML parsing David Brown <david.brown@hesbynett.no> - 2023-08-25 19:59 +0200
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-25 18:31 +0000
Re: C vs Haskell for XML parsing Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-25 20:03 -0700
Re: C vs Haskell for XML parsing Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-23 14:54 -0700
Re: C vs Haskell for XML parsing scott@slp53.sl.home (Scott Lurndal) - 2023-08-22 14:57 +0000
Re: C vs Haskell for XML parsing Bart <bc@freeuk.com> - 2023-08-22 14:10 +0100
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-21 13:46 +0100
Re: C vs Haskell for XML parsing fir <profesor.fir@gmail.com> - 2023-08-16 17:32 -0700
Re: C vs Haskell for XML parsing fir <profesor.fir@gmail.com> - 2023-08-16 17:47 -0700
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-17 00:37 +0000
Re: C vs Haskell for XML parsing fir <profesor.fir@gmail.com> - 2023-08-16 17:40 -0700
Re: C vs Haskell for XML parsing Michael S <already5chosen@yahoo.com> - 2023-08-17 02:37 -0700
Re: C vs Haskell for XML parsing Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-17 13:50 +0000
Re: C vs Haskell for XML parsing Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-17 00:07 +0100
Re: C vs Haskell for XML parsing fir <profesor.fir@gmail.com> - 2023-08-16 17:25 -0700
Re: C vs Haskell for XML parsing fir <profesor.fir@gmail.com> - 2023-08-17 03:32 -0700
Re: C vs Haskell for XML parsing fir <profesor.fir@gmail.com> - 2023-08-17 03:42 -0700
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2023-08-16 00:31 -0700 |
| Subject | C vs Haskell for XML parsing |
| Message-ID | <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com> |
Some people here are interested in Haskell. They might be interested in this:

https://chrisdone.com/posts/fast-haskell-c-parsing-xml/

Of course it's written from a pro-Haskell point of view, and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.
| From | Bart <bc@freeuk.com> |
|---|---|
| Date | 2023-08-16 11:14 +0100 |
| Message-ID | <ubi7hd$38q7d$1@dont-email.me> |
| In reply to | #172354 |
On 16/08/2023 08:31, Malcolm McLean wrote:
> Some people here are interested in Haskell.
> They might be interested in this:
>
> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
>
> Of course it's written from a pro-Haskell point of view, and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.
>

"Portability (i.e. Windows) is a pain in the arse with C."

I wonder what makes them say that?

Reading from a file must be the world's most portable kind of program. While issues with filenames and paths will be the same whatever the language.

So what is it?
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2023-08-17 00:23 +0100 |
| Message-ID | <87o7j6fu74.fsf@bsb.me.uk> |
| In reply to | #172359 |
Bart <bc@freeuk.com> writes:

> On 16/08/2023 08:31, Malcolm McLean wrote:
>> Some people here are interested in Haskell.
>> They might be interested in this:
>> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
>> Of course it's written from a pro-Haskell point of view, and writing an
>> improved version when you've got the C in front of you isn't really a
>> fair test. But he does match C for speed.
>>
>
> "Portability (i.e. Windows) is a pain in the arse with C."
>
> I wonder what makes them say that?

Yes, I wondered that too, since the cut-down XML parsing they are doing is one of the most potentially portable bits of C one could write (as you say yourself):

> Reading from a file must be the world's most portable kind of
> program.

But reading more closely, the remark is a general one about dropping out of a high-level language for some part of a program rather than being specific to this task. None the less, I'd have liked a citation or link.

> While issues with filenames and paths will be the same whatever
> the language.

Not always. Some languages have standard library functions to handle such things (e.g. Python and Haskell). I imagine that's what the author was thinking about.

--
Ben.
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2023-08-16 21:38 -0700 |
| Message-ID | <37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com> |
| In reply to | #172423 |
On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote: > Bart <b...@freeuk.com> writes: > > > On 16/08/2023 08:31, Malcolm McLean wrote: > >> Some people here are interested in Haskell. > >> They might be interested in this: > >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/ > >> Of course it's written from a pro-Haskell point of view, and writing an > >> improved version when you've got the C in front of you isn't really a > >> fair test. But he does match C for speed. > >> > > > > "Portability (i.e. Windows) is a pain in the arse with C." > > > > I wonder what makes them say that? > Yes, I wondered that too, since the cut-down XML parsing they are doing > is one of the most potentially portable bits of C one could write (as > you say yourself): > There are some gotchas with files, but not for the cut down parsing they implement. Windows used to accept "rt" for reading a text stream. And there's still a mess with Unicode. And the XML people say that a parser must accept UTF-16. I implement this by having the lexer call a function pointer to read a UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8 on the fly. If it's UTF-8, it's just an alias for fgetc(). But how do you know the file format? I have code that does this, but if I called it, the XML parser would no longer be a single file module. So I read the first few character of the file, the seek back to the start position. Bu this only works on seekable streams. So the high-level parse function which accepts a stream rather than a file name either has to insist on a seekable stream, or it has to insist that the stream be in known format, or the stream access function has to maintain a little buffer. The last solution is the real one, but it's such a fiddly thing that instead I decided on the known format (if you call with a FILE * rather than a filename, the data must be in UTF-8). 
But it is a complete pain which would have been avoided with a high-level language which just loads a text file.
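A rough sketch of the "little buffer" option mentioned above, for anyone following along. It is an illustration only, not code from the parser being discussed: the names peek_reader, pr_init, pr_fill, pr_getc and mem_source are made up for the example, and the lookahead size of four bytes is an assumption. The idea is to read the first few bytes into a small buffer, let the caller inspect them, and then hand them out again before falling through to the underlying source, so no seek is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PR_LOOKAHEAD 4  /* assumed to be enough to inspect the start of the input */

/* Hypothetical byte source: returns the next byte, or EOF when exhausted. */
typedef int (*byte_source_fn)(void *ctx);

typedef struct {
    byte_source_fn get;               /* underlying source */
    void *ctx;                        /* passed back to get() */
    unsigned char buf[PR_LOOKAHEAD];  /* bytes read ahead but not yet consumed */
    size_t len;                       /* number of bytes stored in buf */
    size_t pos;                       /* next byte of buf to hand out */
} peek_reader;

void pr_init(peek_reader *pr, byte_source_fn get, void *ctx)
{
    pr->get = get;
    pr->ctx = ctx;
    pr->len = 0;
    pr->pos = 0;
}

/* Read ahead until n bytes are buffered (or the source runs dry).
   Meant to be called before any pr_getc(); returns the number of
   bytes available for inspection in pr->buf. */
size_t pr_fill(peek_reader *pr, size_t n)
{
    assert(n <= PR_LOOKAHEAD);
    while (pr->len < n) {
        int ch = pr->get(pr->ctx);
        if (ch == EOF)
            break;
        pr->buf[pr->len++] = (unsigned char)ch;
    }
    return pr->len;
}

/* Next byte: drain the lookahead buffer first, then the source. */
int pr_getc(peek_reader *pr)
{
    if (pr->pos < pr->len)
        return pr->buf[pr->pos++];
    return pr->get(pr->ctx);
}

/* Example in-memory source, so the sketch can be tried without a file. */
typedef struct {
    const unsigned char *data;
    size_t size;
    size_t at;
} mem_source_state;

int mem_source(void *ctx)
{
    mem_source_state *m = (mem_source_state *)ctx;
    return m->at < m->size ? m->data[m->at++] : EOF;
}
```

With a FILE *, a one-line wrapper that calls fgetc() on the context pointer would serve as the source, and that works for pipes as well as for ordinary files.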
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2023-08-17 12:19 +0100 |
| Message-ID | <877cptgbli.fsf@bsb.me.uk> |
| In reply to | #172434 |
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes: > On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote: >> Bart <b...@freeuk.com> writes: >> >> > On 16/08/2023 08:31, Malcolm McLean wrote: >> >> Some people here are interested in Haskell. >> >> They might be interested in this: >> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/ >> >> Of course it's written from a pro-Haskell point of view, and writing an >> >> improved version when you've got the C in front of you isn't really a >> >> fair test. But he does match C for speed. >> >> >> > >> > "Portability (i.e. Windows) is a pain in the arse with C." >> > >> > I wonder what makes them say that? >> Yes, I wondered that too, since the cut-down XML parsing they are doing >> is one of the most potentially portable bits of C one could write (as >> you say yourself): >> > There are some gotchas with files, but not for the cut down parsing > they implement. > Windows used to accept "rt" for reading a text stream. And there's > still a mess with Unicode. None of that matters for the case in point. The C code treats the input like a stream of 8-bit bytes. You can do that without regard to line convention. > And the XML people say that a parser must > accept UTF-16. Again, that's not relevant to the case in the article. But it's also a completely different issue. An XML parser that must handle either UTF-8 or UTF-16 needs a layer below the parser (conceptually) to detect the encoding and return "characters" (as I think you have done). There is no reason to suppose that that can't be written in portable C. > I implement this by having the lexer call a function pointer to read a > UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8 > on the fly. Exactly -- though I think I would not have converted to UTF-8 in a plain parser. Maybe your application make that a good choice. > If it's UTF-8, it's just an alias for fgetc(). > But how do you know the file format? 
The first character must be '<' (and, technically, it must be the '<' that opens an XML declaration). The encoding should be clear from the first two bytes. > I have code that does this, but if > I called it, the XML parser would no longer be a single file module. So > I read the first few character of the file, the seek back to the start > position. > But this only works on seekable streams. Actually, you don't need to read more than one character to determine if the file is UTF-8 or UTF-16; all you need to do is an ungetc call and that works on non-seekable streams. > So the high-level parse function which accepts a stream rather than a > file name either has to insist on a seekable stream, or it has to > insist that the stream be in known format, or the stream access > function has to maintain a little buffer. The last solution is the > real one, but it's such a fiddly thing Given that you convert UTF-16 to UTF-8, I'd have thought it was the natural choice, even though you can get away with just an ungetc call. But then I don't know how your code is organised. What's fiddly about it? > that instead I decided on the > known format (if you call with a FILE * rather than a filename, the > data must be in UTF-8). But it is a complete pain which would have > been avoided with a high-level language which just loads a text file. Agreed. Though I don't think the world is that good at agreeing things. It's possible that this is not what the author had in mind with the high/low-level portable code remark, but it's not a clear-cut case. When the world has decided that a text file can just be opened and read, such a facility could be provided by standard C, and even if not standard, it could probably be written in portable C. -- Ben.
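To make the "convert UTF-16 to UTF-8 on the fly" layer discussed in this post a bit more concrete, here is a minimal sketch of the two pieces of arithmetic such a layer needs. It is not taken from anyone's parser in this thread; the function names utf16_combine and utf8_encode are invented for the example. It assumes the caller has already assembled 16-bit code units from the byte stream in the right byte order.

```c
#include <assert.h>
#include <stddef.h>

/* Combine a UTF-16 surrogate pair into a single code point.
   hi must be a high surrogate (D800..DBFF), lo a low one (DC00..DFFF). */
unsigned long utf16_combine(unsigned hi, unsigned lo)
{
    assert(hi >= 0xD800u && hi <= 0xDBFFu);
    assert(lo >= 0xDC00u && lo <= 0xDFFFu);
    return 0x10000ul + ((unsigned long)(hi - 0xD800u) << 10) + (lo - 0xDC00u);
}

/* Encode one code point as UTF-8. Writes 1 to 4 bytes to out and
   returns the count, or 0 if cp is a lone surrogate or above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80ul) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800ul) {
        out[0] = (unsigned char)(0xC0u | (cp >> 6));
        out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 2;
    }
    if (cp < 0x10000ul) {
        if (cp >= 0xD800ul && cp <= 0xDFFFul)
            return 0;  /* surrogates are not characters on their own */
        out[0] = (unsigned char)(0xE0u | (cp >> 12));
        out[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 3;
    }
    if (cp <= 0x10FFFFul) {
        out[0] = (unsigned char)(0xF0u | (cp >> 18));
        out[1] = (unsigned char)(0x80u | ((cp >> 12) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[3] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 4;
    }
    return 0;
}
```

A reader function built on these would read one code unit, read a second one if the first is a high surrogate, and then hand the resulting UTF-8 bytes to the lexer one at a time.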
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2023-08-17 07:53 -0700 |
| Message-ID | <250cc72c-f682-4986-96bd-80011967c8dbn@googlegroups.com> |
| In reply to | #172445 |
On Thursday, 17 August 2023 at 12:19:55 UTC+1, Ben Bacarisse wrote: > Malcolm McLean <malcolm.ar...@gmail.com> writes: > > > On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote: > >> Bart <b...@freeuk.com> writes: > >> > >> > On 16/08/2023 08:31, Malcolm McLean wrote: > >> >> Some people here are interested in Haskell. > >> >> They might be interested in this: > >> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/ > >> >> Of course it's written from a pro-Haskell point of view, and writing an > >> >> improved version when you've got the C in front of you isn't really a > >> >> fair test. But he does match C for speed. > >> >> > >> > > >> > "Portability (i.e. Windows) is a pain in the arse with C." > >> > > >> > I wonder what makes them say that? > >> Yes, I wondered that too, since the cut-down XML parsing they are doing > >> is one of the most potentially portable bits of C one could write (as > >> you say yourself): > >> > > There are some gotchas with files, but not for the cut down parsing > > they implement. > > Windows used to accept "rt" for reading a text stream. And there's > > still a mess with Unicode. > None of that matters for the case in point. The C code treats the input > like a stream of 8-bit bytes. You can do that without regard to line > convention. > > And the XML people say that a parser must > > accept UTF-16. > Again, that's not relevant to the case in the article. But it's also a > completely different issue. An XML parser that must handle either UTF-8 > or UTF-16 needs a layer below the parser (conceptually) to detect the > encoding and return "characters" (as I think you have done). There is > no reason to suppose that that can't be written in portable C. > > I implement this by having the lexer call a function pointer to read a > > UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8 > > on the fly. > Exactly -- though I think I would not have converted to UTF-8 in a plain > parser. 
Maybe your application makes that a good choice. > > If it's UTF-8, it's just an alias for fgetc(). > > But how do you know the file format? > The first character must be '<' (and, technically, it must be the '<' that > opens an XML declaration). The encoding should be clear from the first > two bytes. > > I have code that does this, but if > > I called it, the XML parser would no longer be a single file module. So > > I read the first few characters of the file, then seek back to the start > > position. > > But this only works on seekable streams. > > Actually, you don't need to read more than one character to determine if > the file is UTF-8 or UTF-16, all you need to do is an ungetc call and > that works on non-seekable streams. > You need two bytes, because you might have a UTF-16 little-endian stream without a BOM. In that case the first 8-bit byte would be '<', just as it is for UTF-8. But there's a simple hack, which is to read the first character from the stream, then set up the lexer with a '<' sitting in its token. So of course you also have to read the first character when passed a string, which is a bit of a nuisance (and that's the sort of thing that gives programming such a bad reputation). But it should work now when piped a non-seekable UTF-16 stream.
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2023-08-19 00:15 +0100 |
| Message-ID | <87o7j4vt6r.fsf@bsb.me.uk> |
| In reply to | #172461 |
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes: > On Thursday, 17 August 2023 at 12:19:55 UTC+1, Ben Bacarisse wrote: >> Malcolm McLean <malcolm.ar...@gmail.com> writes: >> >> > On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote: >> >> Bart <b...@freeuk.com> writes: >> >> >> >> > On 16/08/2023 08:31, Malcolm McLean wrote: >> >> >> Some people here are interested in Haskell. >> >> >> They might be interested in this: >> >> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/ >> >> >> Of course it's written from a pro-Haskell point of view, and writing an >> >> >> improved version when you've got the C in front of you isn't really a >> >> >> fair test. But he does match C for speed. >> >> >> >> >> > >> >> > "Portability (i.e. Windows) is a pain in the arse with C." >> >> > >> >> > I wonder what makes them say that? >> >> Yes, I wondered that too, since the cut-down XML parsing they are doing >> >> is one of the most potentially portable bits of C one could write (as >> >> you say yourself): >> >> >> > There are some gotchas with files, but not for the cut down parsing >> > they implement. >> > Windows used to accept "rt" for reading a text stream. And there's >> > still a mess with Unicode. >> None of that matters for the case in point. The C code treats the input >> like a stream of 8-bit bytes. You can do that without regard to line >> convention. >> > And the XML people say that a parser must >> > accept UTF-16. >> Again, that's not relevant to the case in the article. But it's also a >> completely different issue. An XML parser that must handle either UTF-8 >> or UTF-16 needs a layer below the parser (conceptually) to detect the >> encoding and return "characters" (as I think you have done). There is >> no reason to suppose that that can't be written in portable C. >> > I implement this by having the lexer call a function pointer to read a >> > UTF-8 character from a stream. 
If the file is UTF-16, it converts to UTF-8 >> > on the fly. >> Exactly -- though I think I would not have converted to UTF-8 in a plain >> parser. Maybe your application make that a good choice. >> > If it's UTF-8, it's just an alias for fgetc(). >> > But how do you know the file format? >> The first character much be '<' (and, technically, it must be the '<' that >> opens an XML declaration). The encoding should be clear from the first >> two bytes. I ended up looking at the spec (that's an hour I'll never get back!) and it's more complicated... >> > I have code that does this, but if >> > I called it, the XML parser would no longer be a single file module. So >> > I read the first few character of the file, the seek back to the start >> > position. >> > But this only works on seekable streams. >> >> Actually, you don't need to read more that one character to determine if >> the file is UTF-8 or UTF-16, all you need to do is an ungetc call and >> that works on non-seekable streams. >> > You need two characters, because you might have a UTF-16 little-endian > stream without a BOM. So the first character in 8 bit bytes would be > '<'. Yes, I wasn't thinking. Thanks. You can't always tell until the second byte, but you don't have to "unget" anything in that case because you now know the character. But as it happens I spoke way too soon... The full picture is a mess. > But there's a simple hack, which is to read the first character from the > stream, then set up the lexer with a "<' sitting in its token. So of course > you also have to read the first character when passed a string, which > is a bit of a nuisance (and that's the sort of thing that gives programming > such a bad reputation). What do you mean "when passed a string"? Do you mean when the parser is acting on in-memory data? > But it should work now when piped a non-seekable > UTF-16 stream. It turns out that if you want to be 100% conforming you need to be able to detect both UCS-4 and (eye roll) EBCDIC. 
What's more, you need to set up just enough of the reading mechanism to be able to read the XML declaration and then adjust the reading mechanism to handle the named encoding. For your application, ISO-8859-1 might be effectively the same as ISO-8859-15, but UCS-4 is a complication and you might want to flag certain errors if the encoding is named as ISO-10646-UCS-2 rather than UTF-16. While this can obviously be done in C, I would much rather do it in Haskell. Haskell's lazy evaluation gives you stream IO for free (so to speak), and handling the tail of a lazy stream with functions computed by looking at the start of it comes naturally in Haskell. -- Ben.
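For reference, the first stage described in this post (guess the encoding family from the opening bytes, then read the XML declaration to find the named encoding) might look roughly like the sketch below. The byte patterns follow the non-normative autodetection appendix of the XML 1.0 specification; the enum and the function name guess_encoding_family are invented for the example, and the unusual byte orders the appendix also lists are left out. It only picks a family: the second stage, reading the declaration and switching to the named encoding, is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    ENC_UNKNOWN,   /* no pattern matched; the spec's default is then UTF-8 */
    ENC_UTF8,      /* UTF-8 or another ASCII-compatible encoding */
    ENC_UTF16BE,
    ENC_UTF16LE,
    ENC_UCS4BE,
    ENC_UCS4LE,
    ENC_EBCDIC     /* some member of the EBCDIC family */
} enc_family;

/* True if the first len bytes of b equal pat. */
static int starts(const unsigned char *b, size_t n, const char *pat, size_t len)
{
    return n >= len && memcmp(b, pat, len) == 0;
}

/* Guess the encoding family from up to the first four bytes of the input. */
enc_family guess_encoding_family(const unsigned char *b, size_t n)
{
    assert(b != NULL || n == 0);

    /* Inputs that begin with a byte order mark. The four-byte UCS-4
       patterns are tested first because FF FE is a prefix of FF FE 00 00. */
    if (starts(b, n, "\x00\x00\xFE\xFF", 4)) return ENC_UCS4BE;
    if (starts(b, n, "\xFF\xFE\x00\x00", 4)) return ENC_UCS4LE;
    if (starts(b, n, "\xFE\xFF", 2))         return ENC_UTF16BE;
    if (starts(b, n, "\xFF\xFE", 2))         return ENC_UTF16LE;
    if (starts(b, n, "\xEF\xBB\xBF", 3))     return ENC_UTF8;

    /* No byte order mark: look for "<?" (or just "<") in each encoding. */
    if (starts(b, n, "\x00\x00\x00\x3C", 4)) return ENC_UCS4BE;
    if (starts(b, n, "\x3C\x00\x00\x00", 4)) return ENC_UCS4LE;
    if (starts(b, n, "\x00\x3C\x00\x3F", 4)) return ENC_UTF16BE;
    if (starts(b, n, "\x3C\x00\x3F\x00", 4)) return ENC_UTF16LE;
    if (starts(b, n, "\x3C\x3F\x78\x6D", 4)) return ENC_UTF8;
    if (starts(b, n, "\x4C\x6F\xA7\x94", 4)) return ENC_EBCDIC;

    return ENC_UNKNOWN;
}
```

Four bytes of lookahead are enough for this stage, which is within what a small buffer in the stream access function can provide even on a pipe.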
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2023-08-18 16:33 -0700 |
| Message-ID | <323a8074-838d-4dfd-ad44-32eda639760en@googlegroups.com> |
| In reply to | #172511 |
On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote: > Malcolm McLean <malcolm.ar...@gmail.com> writes: > > > On Thursday, 17 August 2023 at 12:19:55 UTC+1, Ben Bacarisse wrote: > >> Malcolm McLean <malcolm.ar...@gmail.com> writes: > >> > >> > On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote: > >> >> Bart <b...@freeuk.com> writes: > >> >> > >> >> > On 16/08/2023 08:31, Malcolm McLean wrote: > >> >> >> Some people here are interested in Haskell. > >> >> >> They might be interested in this: > >> >> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/ > >> >> >> Of course it's written from a pro-Haskell point of view, and writing an > >> >> >> improved version when you've got the C in front of you isn't really a > >> >> >> fair test. But he does match C for speed. > >> >> >> > >> >> > > >> >> > "Portability (i.e. Windows) is a pain in the arse with C." > >> >> > > >> >> > I wonder what makes them say that? > >> >> Yes, I wondered that too, since the cut-down XML parsing they are doing > >> >> is one of the most potentially portable bits of C one could write (as > >> >> you say yourself): > >> >> > >> > There are some gotchas with files, but not for the cut down parsing > >> > they implement. > >> > Windows used to accept "rt" for reading a text stream. And there's > >> > still a mess with Unicode. > >> None of that matters for the case in point. The C code treats the input > >> like a stream of 8-bit bytes. You can do that without regard to line > >> convention. > >> > And the XML people say that a parser must > >> > accept UTF-16. > >> Again, that's not relevant to the case in the article. But it's also a > >> completely different issue. An XML parser that must handle either UTF-8 > >> or UTF-16 needs a layer below the parser (conceptually) to detect the > >> encoding and return "characters" (as I think you have done). There is > >> no reason to suppose that that can't be written in portable C. 
> >> > I implement this by having the lexer call a function pointer to read a > >> > UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8 > >> > on the fly. > >> Exactly -- though I think I would not have converted to UTF-8 in a plain > >> parser. Maybe your application make that a good choice. > >> > If it's UTF-8, it's just an alias for fgetc(). > >> > But how do you know the file format? > >> The first character much be '<' (and, technically, it must be the '<' that > >> opens an XML declaration). The encoding should be clear from the first > >> two bytes. > I ended up looking at the spec (that's an hour I'll never get back!) and > it's more complicated... > >> > I have code that does this, but if > >> > I called it, the XML parser would no longer be a single file module. So > >> > I read the first few character of the file, the seek back to the start > >> > position. > >> > But this only works on seekable streams. > >> > >> Actually, you don't need to read more that one character to determine if > >> the file is UTF-8 or UTF-16, all you need to do is an ungetc call and > >> that works on non-seekable streams. > >> > > You need two characters, because you might have a UTF-16 little-endian > > stream without a BOM. So the first character in 8 bit bytes would be > > '<'. > Yes, I wasn't thinking. Thanks. You can't always tell until the second > byte, but you don't have to "unget" anything in that case because you > now know the character. > > But as it happens I spoke way too soon... The full picture is a mess. > Yes, it's awful. You have an "encoding" field in XML 1.0. But you can't depend on it because not all XML is version 1.0, some of it is bare. 
Now I don't have much experience with text, but I reckon that it's entirely possible that someone would run XML through a program like iconv, and it won't be clever enough to change the "encoding" field: > > > But there's a simple hack, which is to read the first character from the > > stream, then set up the lexer with a "<' sitting in its token. So of course > > you also have to read the first character when passed a string, which > > is a bit of a nuisance (and that's the sort of thing that gives programming > > such a bad reputation). > What do you mean "when passed a string"? Do you mean when the parser is > acting on in-memory data? > Sorry, I was so close to the program that I forgot that everybody else knows nothing of the code (It's on GitHub but not in the resource compiler, it's in a separate project). You can pass it either a file name, an open stream, or a string. The string has to be UTF-8 because it is a char *. Of course I have to read the first character of the string to make the string work the same way as the rest of the code, all to support UTF-16 without a BOM on non-seekable streams. > > But it should work now when piped a non-seekable > > UTF-16 stream. > It turns out that if you want to be 100% conforming you need to be able > to detect both UCS-4 and (eye roll) EBCDIC. What's more, you need to > set up just enough of the reading mechanism to be able to read the XML > declaration and then adjust the reading mechanism to handle the named > encoding. For your application, ISO-8859-1 might be effectively the > same as ISO-8859-15, but UCS-4 is a complication and you might want to > flag certain errors if the encoding is named as ISO-10646-UCS-2 rather > than UTF-16. > The XML people say that a parser must accept UTF-8 and UTF-16. I have heard of files which switch encodings, but I think they are largely mythical. The basic idea of XML was very good, but I'm not impressed with the standard. 
> > While this can obviously be done in C, I would much rather do it in > Haskell. Haskell's lazy evaluation gives you stream IO for free (so to > speak), and handling the tail of a lazy stream with functions computed > by looking at the start of it comes naturally in Haskell. > The structure of the C function is massively improved by going to a lexer and having a proper hierarchical, recursive grammar rather than the old ad-hoc system. (Which was tempting because basic XML is so simple). However it might be possible to do a much better job in Haskell. Unfortunately I can't do that better job. I'm confident that it is shaping up as a very good single file C XML parser, however.
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2023-08-19 21:46 +0100 |
| Message-ID | <87pm3ivjyd.fsf@bsb.me.uk> |
| In reply to | #172514 |
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes: > On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote: >> Malcolm McLean <malcolm.ar...@gmail.com> writes: >> >> > On Thursday, 17 August 2023 at 12:19:55 UTC+1, Ben Bacarisse wrote: >> >> Malcolm McLean <malcolm.ar...@gmail.com> writes: >> >> >> >> > On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote: >> >> >> Bart <b...@freeuk.com> writes: >> >> >> >> >> >> > On 16/08/2023 08:31, Malcolm McLean wrote: >> >> >> >> Some people here are interested in Haskell. >> >> >> >> They might be interested in this: >> >> >> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/ >> >> >> >> Of course it's written from a pro-Haskell point of view, and writing an >> >> >> >> improved version when you've got the C in front of you isn't really a >> >> >> >> fair test. But he does match C for speed. >> >> >> >> >> >> >> > >> >> >> > "Portability (i.e. Windows) is a pain in the arse with C." >> >> >> > >> >> >> > I wonder what makes them say that? >> >> >> Yes, I wondered that too, since the cut-down XML parsing they are doing >> >> >> is one of the most potentially portable bits of C one could write (as >> >> >> you say yourself): >> >> >> >> >> > There are some gotchas with files, but not for the cut down parsing >> >> > they implement. >> >> > Windows used to accept "rt" for reading a text stream. And there's >> >> > still a mess with Unicode. >> >> None of that matters for the case in point. The C code treats the input >> >> like a stream of 8-bit bytes. You can do that without regard to line >> >> convention. >> >> > And the XML people say that a parser must >> >> > accept UTF-16. >> >> Again, that's not relevant to the case in the article. But it's also a >> >> completely different issue. An XML parser that must handle either UTF-8 >> >> or UTF-16 needs a layer below the parser (conceptually) to detect the >> >> encoding and return "characters" (as I think you have done). 
There is >> >> no reason to suppose that that can't be written in portable C. >> >> > I implement this by having the lexer call a function pointer to read a >> >> > UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8 >> >> > on the fly. >> >> Exactly -- though I think I would not have converted to UTF-8 in a plain >> >> parser. Maybe your application make that a good choice. >> >> > If it's UTF-8, it's just an alias for fgetc(). >> >> > But how do you know the file format? >> >> The first character much be '<' (and, technically, it must be the '<' that >> >> opens an XML declaration). The encoding should be clear from the first >> >> two bytes. >> I ended up looking at the spec (that's an hour I'll never get back!) and >> it's more complicated... >> >> > I have code that does this, but if >> >> > I called it, the XML parser would no longer be a single file module. So >> >> > I read the first few character of the file, the seek back to the start >> >> > position. >> >> > But this only works on seekable streams. >> >> >> >> Actually, you don't need to read more that one character to determine if >> >> the file is UTF-8 or UTF-16, all you need to do is an ungetc call and >> >> that works on non-seekable streams. >> >> >> > You need two characters, because you might have a UTF-16 little-endian >> > stream without a BOM. So the first character in 8 bit bytes would be >> > '<'. >> Yes, I wasn't thinking. Thanks. You can't always tell until the second >> byte, but you don't have to "unget" anything in that case because you >> now know the character. >> >> But as it happens I spoke way too soon... The full picture is a mess. >> > Yes, it's awful. You have an "encoding" field in XML 1.0. But you can't > depend on it because not all XML is version 1.0, some of it is bare. 
Now > I don't have much experience with text, but I reckon that it's entirely > possible that someone would run XML through a program like iconv, > and it won't be clever enough to change the "encoding" field: Maybe. But you are providing a tool and you don't have to accept everything. A converted document with the wrong XML declaration is an error and you could just reject it. You don't have to bend over backwards for bad input. Not being a Windows user, I've not seen a UTF-16 encoded file in the wild, so I would not even accept that. In the Linux world, I'd probably accept only UTF-8 and point my users to xmllint. xmllint can read any valid XML file and re-write it using UTF-8 (or, indeed, many other encodings), changing the XML declaration on the fly. Hence xmllint -encode UTF-8 myresources | babyxrc would be close to a universal XML processor with little extra work on your part. But maybe you probably can't assume your users will want to do that. >> It turns out that if you want to be 100% conforming you need to be able >> to detect both UCS-4 and (eye roll) EBCDIC. What's more, you need to >> set up just enough of the reading mechanism to be able to read the XML >> declaration and then adjust the reading mechanism to handle the named >> encoding. For your application, ISO-8859-1 might be effectively the >> same as ISO-8859-15, but UCS-4 is a complication and you might want to >> flag certain errors if the encoding is named as ISO-10646-UCS-2 rather >> than UTF-16. >> > The XML people say that a parser must accept UTF-8 and UTF-16. Don't they go further? I thought they did. Maybe the others are optional and those two are the only must-haves. > I have > heard of files which switch encodings, but I think they are largely mythical. > The basic idea of XML was very good, but I'm not impressed with the > standard. 
There are two kinds of standards: those that incorporate lots of options because of all the interested parties, and those that make decisive choices between competing candidates. -- Ben.
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2023-08-19 03:04 -0700 |
| Message-ID | <cb35076d-f8ec-441c-a963-7077bd5f884cn@googlegroups.com> |
| In reply to | #172511 |
On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote: > > It turns out that if you want to be 100% conforming you need to be able > to detect both UCS-4 and (eye roll) EBCDIC. > I had a go at EBCDIC. If anyone has an EBCDIC XML file they'd like to test, please post a link. Of course the next challenge is to support EBCDIC as the execution character set. This means all the if (ch == '<') statements have to come out and be replaced by if (ch == ASCII_LESSTHEN). And the strings have to be replaced with hex codes. Here's where the Baby X resource compiler shows its power. Simply set up the input <BabyXRC> <utf8 name="cdata"><CDATA</utf8> </BabyXRC> And so on, and you get all the strings in hex-encoded UTF-8, ready to cut and paste.
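As an illustration of what "replace the character literals and strings with hex codes" amounts to, here is a small sketch. It is not the Baby X code: the names ASCII_LT, CDATA_OPEN, is_tag_open and starts_with_cdata are invented for the example, and the string chosen is the standard CDATA section opener. The point is only that the comparisons are made against fixed ASCII byte values, so they give the same answer whatever the execution character set is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASCII code for '<', written as a number so that it does not depend
   on how the compiler's execution character set encodes '<'. */
#define ASCII_LT 0x3C

/* The bytes of "<![CDATA[" in ASCII/UTF-8, spelled out in hex. */
static const unsigned char CDATA_OPEN[] = {
    0x3C, 0x21, 0x5B, 0x43, 0x44, 0x41, 0x54, 0x41, 0x5B
};

/* Does this input byte open a tag? */
int is_tag_open(int ch)
{
    return ch == ASCII_LT;
}

/* Does the input (already in ASCII/UTF-8) start with "<![CDATA["? */
int starts_with_cdata(const unsigned char *in, size_t n)
{
    assert(in != NULL || n == 0);
    return n >= sizeof CDATA_OPEN
        && memcmp(in, CDATA_OPEN, sizeof CDATA_OPEN) == 0;
}
```

On an EBCDIC host the literal '<' would be 0x4C, so a test written as ch == '<' would fail to match ASCII input, whereas is_tag_open() above still would.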
| From | Lew Pitcher <lew.pitcher@digitalfreehold.ca> |
|---|---|
| Date | 2023-08-19 13:19 +0000 |
| Message-ID | <ubqfgf$r0tm$1@dont-email.me> |
| In reply to | #172519 |
On Sat, 19 Aug 2023 03:04:27 -0700, Malcolm McLean wrote: > On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote: >> >> It turns out that if you want to be 100% conforming you need to be able >> to detect both UCS-4 and (eye roll) EBCDIC. >> > I had a go at EBCDIC. > > If anyone has an EBCDIC XML file they'd like to test, please post a link. Be careful of what you ask for, Malcolm. You /do/ realize that "EBCDIC" refers to a whole family of charactersets (at least 46 individual charactersets, most with /some/ common elements) and /not/ to a single characterset like Unicode or US-ASCII (although you could argue that ASCII embodied multiple charactersets, just with fewer variants). FWIW, there are a number of EBCDIC charactersets that you could not reliably use in XML, as they lack a few of the required characters. You might take a look at the DKUUG's characterset standards website[1] - they contributed to the ISO/IEC JTC 1/SC 2 [2] effort to catalogue and standardize charactersets. [1] http://std.dkuug.dk/i18n/charmaps/ [2] https://en.wikipedia.org/wiki/ISO/IEC_JTC_1/SC_2 [snip] -- Lew Pitcher "In Skills We Trust"
| From | scott@slp53.sl.home (Scott Lurndal) |
|---|---|
| Date | 2023-08-19 14:48 +0000 |
| Message-ID | <IC4EM.686039$TPw2.185069@fx17.iad> |
| In reply to | #172519 |
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes: >On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote: >> >> It turns out that if you want to be 100% conforming you need to be able >> to detect both UCS-4 and (eye roll) EBCDIC. >> >I had a go at ECBDIC. > >If anyone has an EBCDIC XML file they'd like to test, please post a link. Here's one: Lo§”“@¥…™¢‰–•~ñKð@…•ƒ–„‰•‡~¤£†`øon %Lo§”“`¢£¨“…¢ˆ……£@ˆ™…†~–•…m™…‡‰¢£…™K§¢“@£¨—…~£…§£a§¢“@on %LOÄÖÃãè×Å@™…‡‰¢£…™¢@âèâãÅÔ@™…‡‰¢£…™¢K„£„n %LO``@Ö—¨™‰‡ˆ£@Mƒ]@òðñð`òðñô@ÁÙÔ@Ó‰”‰£…„K@Á““@™‰‡ˆ£¢@™…¢…™¥…„K@``n %LO``@㈉¢@„–ƒ¤”…•£@‰¢@Ö•†‰„…•£‰“K@㈉¢@„–ƒ¤”…•£@”¨@–•“¨@‚…@¤¢…„@•„@„‰¢£™‰‚¤£…„@‰•@ƒƒ–™„•ƒ…@¦‰£ˆ@£ˆ…@£…™”¢@–†@£ˆ…@‡™……”…•£@…•£…™…„@‰•£–@‚¨@ÁÙÔ@•„@£ˆ…@—™£¨@£ˆ£@ÁÙÔ@„…“‰¥…™…„@£ˆ‰¢@„–ƒ¤”…•£@£–K@``n %LO``@㈅@„£@ƒ–•£‰•…„@‰•@£ˆ‰¢@„–ƒ¤”…•£@‰¢@—™…“‰”‰•™¨@•„@¢¤‚‘…ƒ£@£–@ƒˆ•‡…@–™@ƒ–™™…ƒ£‰–•@†–““–¦‰•‡@†¤™£ˆ…™@™…¥‰…¦K@``n %L™…‡‰¢£…™m—‡…n %@@L™…‡‰¢£…™¢n %@@@@L™…‡‰¢£…™@‰¢m™…‡‰¢£…™~㙤…@‰¢m‰•£…™•“~Æ“¢…@‰¢m‚•’…„~Æ“¢…@‰¢m–—£‰–•“~Æ“¢…@‰¢m¢£¤‚m…•£™¨~Æ“¢…n %@@@@@@L™…‡m¢ˆ–™£m•”…nÔÉÄÙmÅÓñLa™…‡m¢ˆ–™£m•”…n %@@@@@@L™…‡m“–•‡m•”…nÔ‰•@ÉÄ@Ù…‡‰¢£…™La™…‡m“–•‡m•”…n %@@@@@@ %@@@@@@ %@@@@@@ %@@@@@@L™…‡m„„™…¢¢@…§£…™•“mƒƒ…¢¢~㙤…@”…”m”—mƒƒ…¢¢~㙤…@—–¦…™m„–”‰•~Ä…‚¤‡n %@@@@@@@@L™…‡mƒ–”—–•…•£nÄ…‚¤‡La™…‡mƒ–”—–•…•£n %@@@@@@@@ %@@@@@@@@L™…‡m–††¢…£nLˆ…§•¤”‚…™nð§ÄððLaˆ…§•¤”‚…™nLa™…‡m–††¢…£n %@@@@@@La™…‡m„„™…¢¢n %@@@@@@L™…‡m™…¢…£m¥“¤…nãÂÄLa™…‡m™…¢…£m¥“¤…n %@@@@@@L™…‡mƒƒ…¢¢n %@@@@@@@@L™…‡mƒƒ…¢¢m¢££…n %@@@@@@@@@@L™…‡mƒƒ…¢¢m“…¥…“nÄ…†¤“£La™…‡mƒƒ…¢¢m“…¥…“n %@@@@@@@@@@L™…‡mƒƒ…¢¢m£¨—…nÙÖLa™…‡mƒƒ…¢¢m£¨—…n %@@@@@@@@La™…‡mƒƒ…¢¢m¢££…n@ %@@@@@@La™…‡mƒƒ…¢¢n %@@@@@@L™…‡m”——‰•‡¢n %@@@@@@@@L™…‡m”——‰•‡n %@@@@@@@@@@L”——…„m•”…@†‰“…•”…~ÁÁ™ƒˆöô`”‰„™m…“ñK§”“nÔÉÄÙmÅÓñLa”——…„m•”…n %@@@@@@@@@@L”——…„m£¨—…nÁ™ƒˆ‰£…ƒ£¤™“La”——…„m£¨—…n %@@@@@@@@@@L”——…„m…§…ƒ¤£‰–•m¢££…nÁÁ™ƒˆöôLa”——…„m…§…ƒ¤£‰–•m¢££…n %@@@@@@@@@@ %@@@@@@@@@@ %@@@@@@@@@@ %@@@@@@@@@@ %@@@@@@@@@@ %@@@@@@@@@@ %@@@@@@@@@@ %@@@@@@@@@@ %@@@@@@@@La™…‡m”——‰•‡nL™…‡m”——‰•‡n %@@@@@@@@@@L”——…„m•”…@†‰“…•”…~ÁÁ™ƒˆóò`”‰„™K§”“nÔÉÄÙLa”——…„m•”…n 
%@@@@@@@@@@L”——…„m£¨—…nÁ™ƒˆ‰£…ƒ£¤™“La”——…„m£¨—…n %@@@@@@@@@@L”——…„m…§…ƒ¤£‰–•m¢££…nÁÁ™ƒˆóòLa”——…„m…§…ƒ¤£‰–•m¢££…n %@@@@@@@@@@ %@@@@@@@@@@ %@@@@@@@@@@ %@@@@@@@@@@ %@@@@@@@@@@ %@@@@@@@@@@ %@@@@@@@@@@ %@@@@@@@@@@ %@@@@@@@@La™…‡m”——‰•‡n %@@@@@@La™…‡m”——‰•‡¢n %@@@@@@L™…‡m—¤™—–¢…n %@@@@@@@@L—¤™—–¢…m£…§£nL—™n×™–¥‰„…¢@‰„…•£‰†‰ƒ£‰–•@‰•†–™”£‰–•@†–™@£ˆ…@×Åk@‰•ƒ“¤„‰•‡@•@‰”—“…”…•£…™@ƒ–„…@†–™@£ˆ…@„…¥‰ƒ…@•„@@„…¥‰ƒ…@ÉÄ@•¤”‚…™KLa—™nLa—¤™—–¢…m£…§£n %@@@@@@@@ %@@@@@@La™…‡m—¤™—–¢…n %@@@@@@L™…‡m‡™–¤—¢n %@@@@@@@@L™…‡m‡™–¤—nÉ„…•£‰†‰ƒ£‰–•@™…‡‰¢£…™¢La™…‡m‡™–¤—n %@@@@@@La™…‡m‡™–¤—¢n %@@@@@@L™…‡m¤¢‡…mƒ–•¢£™‰•£¢n %@@@@@@@@ %@@@@@@La™…‡m¤¢‡…mƒ–•¢£™‰•£¢n %@@@@@@L™…‡mƒ–•†‰‡¤™£‰–•n %@@@@@@@@ %@@@@@@La™…‡mƒ–•†‰‡¤™£‰–•n %@@@@@@ %@@@@@@L™…‡m££™‰‚¤£…¢n %@@@@@@@@L££™‰‚¤£…¢m£…§£nL—™nÔÉÄÙmÅÓñ@‰¢@@óò`‚‰£@™…‡‰¢£…™KLa—™nLa££™‰‚¤£…¢m£…§£n %@@@@@@@@ %@@@@@@La™…‡m££™‰‚¤£…¢n %@@@@@@L™…‡m†‰…“„¢…£¢n %@@@@@@ %@@@@@@L†‰…“„¢@“…•‡£ˆ~óòn %@@@@@@ %@@@@@@ %@@@@@@@@ %L†‰…“„@‰¢m¥™‰‚“…m“…•‡£ˆ~Æ“¢…@ˆ¢m—™£‰“m†‰…“„¢…£~Æ“¢…@‰¢m“‰•’…„m£–m—™£‰“m†‰…“„¢…£~Æ“¢…n %@@L†‰…“„m•”…nÉ”—“…”…•£…™La†‰…“„m•”…n %@@ %@@L†‰…“„m”¢‚nóñLa†‰…“„m”¢‚n %@@L†‰…“„m“¢‚nòôLa†‰…“„m“¢‚n %@@ %@@L†‰…“„m„…¢ƒ™‰—£‰–•@–™„…™~‚…†–™…nL—™n㈅@É”—“…”…•£…™@ƒ–„…K@㈉¢@†‰…“„@”¤¢£@ˆ–“„@•@‰”—“…”…•£…™@ƒ–„…@£ˆ£@ˆ¢@‚……•@¢¢‰‡•…„@‚¨@ÁÙÔK@Á¢¢‰‡•…„@ƒ–„…¢@‰•ƒ“¤„…@£ˆ…@†–““–¦‰•‡zLa—™nLa†‰…“„m„…¢ƒ™‰—£‰–•n 
%@@L†‰…“„m„…¢ƒ™‰—£‰–•@–™„…™~‚…†–™…nL£‚“…nL£‡™–¤—@ƒ–“¢~ónL£ˆ…„nL™–¦nL…•£™¨nÈ…§@™…—™…¢…•££‰–•La…•£™¨nL…•£™¨nÁâÃÉÉ@™…—™…¢…•££‰–•La…•£™¨nL…•£™¨nÉ”—“…”…•£…™La…•£™¨nLa™–¦nLa£ˆ…„nL£‚–„¨nL™–¦nL…•£™¨nLˆ…§•¤”‚…™nð§ôñLaˆ…§•¤”‚…™nLa…•£™¨nL…•£™¨nÁLa…•£™¨nL…•£™¨nÁÙÔ@Ó‰”‰£…„La…•£™¨nLa™–¦nL™–¦nL…•£™¨nLˆ…§•¤”‚…™nð§ôòLaˆ…§•¤”‚…™nLa…•£™¨nL…•£™¨nÂLa…•£™¨nL…•£™¨n™–„ƒ–”@Ö™—–™£‰–•La…•£™¨nLa™–¦nL™–¦nL…•£™¨nLˆ…§•¤”‚…™nð§ôóLaˆ…§•¤”‚…™nLa…•£™¨nL…•£™¨nÃLa…•£™¨nL…•£™¨nÃ¥‰¤”@É•ƒKLa…•£™¨nLa™–¦nL™–¦nL…•£™¨nLˆ…§•¤”‚…™nð§ôôLaˆ…§•¤”‚…™nLa…•£™¨nL…•£™¨nÄLa…•£™¨nL…•£™¨nĉ‡‰£“@ؤ‰—”…•£@Ö™—–™£‰–•La…•£™¨nLa™–¦nL™–¦nL…•£™¨nLˆ…§•¤”‚…™nð§ôùLaˆ…§•¤”‚…™nLa…•£™¨nL…•£™¨nÉLa…•£™¨nL…•£™¨nÉ•†‰•…–•@ã…ƒˆ•–“–‡‰…¢@ÁÇLa…•£™¨nLa™–¦nL™–¦nL…•£™¨nLˆ…§•¤”‚…™nð§ôÄLaˆ…§•¤”‚…™nLa…•£™¨nL…•£™¨nÔLa…•£™¨nL…•£™¨nÔ–£–™–“@–™@Æ™……¢ƒ“…@â…”‰ƒ–•„¤ƒ£–™@É•ƒKLa…•£™¨nLa™–¦nL™–¦nL…•£™¨nLˆ…§•¤”‚…™nð§ôÅLaˆ…§•¤”‚…™nLa…•£™¨nL…•£™¨nÕLa…•£™¨nL…•£™¨nÕåÉÄÉÁ@Ö™—–™£‰–•La…•£™¨nLa™–¦nL™–¦nL…•£™¨nLˆ…§•¤”‚…™nð§õðLaˆ…§•¤”‚…™nLa…•£™¨nL…•£™¨n×La…•£™¨nL…•£™¨nÁ——“‰…„@Ô‰ƒ™–@É™ƒ¤‰£¢@Ö™—–™£‰–•La…•£™¨nLa™–¦nL™–¦nL…•£™¨nLˆ…§•¤”‚…™nð§õñLaˆ…§•¤”‚…™nLa…•£™¨nL…•£™¨nØLa…•£™¨nL…•£™¨nؤ“ƒ–””@É•ƒKLa…•£™¨nLa™–¦nL™–¦nL…•£™¨nLˆ…§•¤”‚…™nð§õöLaˆ…§•¤”‚…™nLa…•£™¨nL…•£™¨nåLa…•£™¨nL…•£™¨nÔ™¥…““@É•£…™•£‰–•“@Ó£„KLa…•£™¨nLa™–¦nL™–¦nL…•£™¨nLˆ…§•¤”‚…™nð§öùLaˆ…§•¤”‚…™nLa…•£™¨nL…•£™¨n‰La…•£™¨nL…•£™¨nÉ•£…“@Ö™—–™£‰–•La…•£™¨nLa™–¦nLa£‚–„¨nLa£‡™–¤—nLa£‚“…nLa†‰…“„m„…¢ƒ™‰—£‰–•n %@@L†‰…“„m„…¢ƒ™‰—£‰–•@–™„…™~‚…†–™…nL—™nÁÙÔ@ƒ•@¢¢‰‡•@ƒ–„…¢@£ˆ£@™…@•–£@—¤‚“‰¢ˆ…„@‰•@£ˆ‰¢@”•¤“K@Á““@¥“¤…¢@•–£@¢¢‰‡•…„@‚¨@ÁÙÔ@™…@™…¢…™¥…„@•„@”¤¢£@•–£@‚…@¤¢…„KLa—™nLa†‰…“„m„…¢ƒ™‰—£‰–•n %@@ %@@L†‰…“„m¥“¤…¢n %@@@@ %@@@@ %@@@@ %@@La†‰…“„m¥“¤…¢n %@@ %@@ %@@L†‰…“„m™…¢…£¢n %@@@@ %@@La†‰…“„m™…¢…£¢n %@@ %@@ %@@ %La†‰…“„n % %@@@@@@ %@@@@@@@@ %L†‰…“„@‰¢m¥™‰‚“…m“…•‡£ˆ~Æ“¢…@ˆ¢m—™£‰“m†‰…“„¢…£~Æ“¢…@‰¢m“‰•’…„m£–m—™£‰“m†‰…“„¢…£~Æ“¢…n %@@L†‰…“„m•”…n噉•£La†‰…“„m•”…n %@@ %@@L†‰…“„m”¢‚nòóLa†‰…“„m”¢‚n %@@L†‰…“„m“¢‚nòðLa†‰…“„m“¢‚n %@@ 
%@@L†‰…“„m„…¢ƒ™‰—£‰–•@–™„…™~‚…†–™…nL—™nÁ•@L™”`„…†‰•…„`¦–™„nÉÔ×ÓÅÔÅÕãÁãÉÖÕ@ÄÅÆÉÕÅÄLa™”`„…†‰•…„`¦–™„n@¥™‰•£@•¤”‚…™K@㨗‰ƒ““¨k@£ˆ‰¢@†‰…“„@‰¢@¤¢…„@£–@„‰¢£‰•‡¤‰¢ˆ@‚…£¦……•@„‰††…™…•£@—™–„¤ƒ£@¥™‰•£¢k@–™@”‘–™@™…¥‰¢‰–•¢@–†@@—™–„¤ƒ£KLa—™nLa†‰…“„m„…¢ƒ™‰—£‰–•n %@@ %@@L†‰…“„m¥“¤…¢n %@@@@ %@@@@ %@@@@ %@@La†‰…“„m¥“¤…¢n %@@ %@@ %@@L†‰…“„m™…¢…£¢n %@@@@ %@@La†‰…“„m™…¢…£¢n %@@ %@@ %@@ %La†‰…“„n % %@@@@@@ %@@@@@@@@ %L†‰…“„@‰¢m¥™‰‚“…m“…•‡£ˆ~Æ“¢…@ˆ¢m—™£‰“m†‰…“„¢…£~Æ“¢…@‰¢m“‰•’…„m£–m—™£‰“m†‰…“„¢…£~Æ“¢…n %@@L†‰…“„m•”…nÁ™ƒˆ‰£…ƒ£¤™…La†‰…“„m•”…n %@@ %@@L†‰…“„m”¢‚nñùLa†‰…“„m”¢‚n %@@L†‰…“„m“¢‚nñöLa†‰…“„m“¢‚n %@@ %@@L†‰…“„m„…¢ƒ™‰—£‰–•@–™„…™~‚…†–™…nL—™n㈅@—…™”‰££…„@¥“¤…¢@–†@£ˆ‰¢@†‰…“„@™…zLa—™nLa†‰…“„m„…¢ƒ™‰—£‰–•n %@@ %@@L†‰…“„m¥“¤…¢n %@@@@ %@@@@ %@@@@L†‰…“„m¥“¤…m‰•¢£•ƒ…n %@@@@@@L†‰…“„m¥“¤…nðððñLa†‰…“„m¥“¤…n %@@@@@@ %@@@@@@L†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•nL—™nÁÙÔ¥ôLa—™nLa†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•n %@@@@@@ %@@@@@@ %@@@@La†‰…“„m¥“¤…m‰•¢£•ƒ…nL†‰…“„m¥“¤…m‰•¢£•ƒ…n %@@@@@@L†‰…“„m¥“¤…nððñðLa†‰…“„m¥“¤…n %@@@@@@ %@@@@@@L†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•nL—™nÁÙÔ¥ôãLa—™nLa†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•n %@@@@@@ %@@@@@@ %@@@@La†‰…“„m¥“¤…m‰•¢£•ƒ…nL†‰…“„m¥“¤…m‰•¢£•ƒ…n %@@@@@@L†‰…“„m¥“¤…nððññLa†‰…“„m¥“¤…n %@@@@@@ %@@@@@@L†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•nL—™nÁÙÔ¥õ@M–‚¢–“…£…]La—™nLa†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•n %@@@@@@ %@@@@@@ %@@@@La†‰…“„m¥“¤…m‰•¢£•ƒ…nL†‰…“„m¥“¤…m‰•¢£•ƒ…n %@@@@@@L†‰…“„m¥“¤…nðñððLa†‰…“„m¥“¤…n %@@@@@@ %@@@@@@L†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•nL—™nÁÙÔ¥õãLa—™nLa†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•n %@@@@@@ %@@@@@@ %@@@@La†‰…“„m¥“¤…m‰•¢£•ƒ…nL†‰…“„m¥“¤…m‰•¢£•ƒ…n %@@@@@@L†‰…“„m¥“¤…nðñðñLa†‰…“„m¥“¤…n %@@@@@@ %@@@@@@L†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•nL—™nÁÙÔ¥õãÅLa—™nLa†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•n %@@@@@@ %@@@@@@ %@@@@La†‰…“„m¥“¤…m‰•¢£•ƒ…nL†‰…“„m¥“¤…m‰•¢£•ƒ…n %@@@@@@L†‰…“„m¥“¤…nðññðLa†‰…“„m¥“¤…n %@@@@@@ %@@@@@@L†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•nL—™nÁÙÔ¥õãÅÑLa—™nLa†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•n %@@@@@@ %@@@@@@ %@@@@La†‰…“„m¥“¤…m‰•¢£•ƒ…nL†‰…“„m¥“¤…m‰•¢£•ƒ…n %@@@@@@L†‰…“„m¥“¤…nðñññLa†‰…“„m¥“¤…n %@@@@@@ %@@@@@@L†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•nL—™nÁÙÔ¥öLa—™nLa†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•n %@@@@@@ %@@@@@@ 
%@@@@La†‰…“„m¥“¤…m‰•¢£•ƒ…nL†‰…“„m¥“¤…m‰•¢£•ƒ…n %@@@@@@L†‰…“„m¥“¤…nññññLa†‰…“„m¥“¤…n %@@@@@@ %@@@@@@L†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•nL—™nÄ…†‰•…„@‚¨@Ã×äÉÄ@¢ƒˆ…”…La—™nLa†‰…“„m¥“¤…m„…¢ƒ™‰—£‰–•n %@@@@@@ %@@@@@@ %@@@@La†‰…“„m¥“¤…m‰•¢£•ƒ…n %@@La†‰…“„m¥“¤…¢n %@@L†‰…“„m„…¢ƒ™‰—£‰–•@–™„…™~†£…™nL—™nÁ““@–£ˆ…™@¥“¤…¢@™…@™…¢…™¥…„KLa—™nLa†‰…“„m„…¢ƒ™‰—£‰–•n %@@ %@@ %@@L†‰…“„m™…¢…£¢n %@@@@ %@@La†‰…“„m™…¢…£¢n %@@ %@@ %@@ %La†‰…“„n % %@@@@@@ %@@@@@@@@ %L†‰…“„@‰¢m¥™‰‚“…m“…•‡£ˆ~Æ“¢…@ˆ¢m—™£‰“m†‰…“„¢…£~Æ“¢…@‰¢m“‰•’…„m£–m—™£‰“m†‰…“„¢…£~Æ“¢…n %@@L†‰…“„m•”…n×™£Õ¤”La†‰…“„m•”…n %@@ %@@L†‰…“„m”¢‚nñõLa†‰…“„m”¢‚n %@@L†‰…“„m“¢‚nôLa†‰…“„m“¢‚n %@@ %@@L†‰…“„m„…¢ƒ™‰—£‰–•@–™„…™~‚…†–™…nL—™nÁ•@L™”`„…†‰•…„`¦–™„nÉÔ×ÓÅÔÅÕãÁãÉÖÕ@ÄÅÆÉÕÅÄLa™”`„…†‰•…„`¦–™„n@—™‰”™¨@—™£@•¤”‚…™@†–™@£ˆ…@„…¥‰ƒ…KLa—™nLa†‰…“„m„…¢ƒ™‰—£‰–•n %@@L†‰…“„m„…¢ƒ™‰—£‰–•@–™„…™~‚…†–™…nL—™nÖ•@—™–ƒ…¢¢–™¢@‰”—“…”…•£…„@‚¨@ÁÙÔk@‰†@£ˆ…@£–—@†–¤™@‚‰£¢@–†@£ˆ…@—™‰”™¨@—™£@•¤”‚…™@™…@Lˆ…§•¤”‚…™nð§ðLaˆ…§•¤”‚…™n@–™@Lˆ…§•¤”‚…™nð§÷Laˆ…§•¤”‚…™nk@£ˆ…@¥™‰•£@•„@™ƒˆ‰£…ƒ£¤™…@™…@…•ƒ–„…„@„‰††…™…•£“¨KLa—™nLa†‰…“„m„…¢ƒ™‰—£‰–•n %@@ %@@L†‰…“„m¥“¤…¢n %@@@@ %@@@@ %@@@@ %@@La†‰…“„m¥“¤…¢n %@@ %@@ %@@L†‰…“„m™…¢…£¢n %@@@@ %@@La†‰…“„m™…¢…£¢n %@@ %@@ %@@ %La†‰…“„n % %@@@@@@ %@@@@@@@@ %L†‰…“„@‰¢m¥™‰‚“…m“…•‡£ˆ~Æ“¢…@ˆ¢m—™£‰“m†‰…“„¢…£~Æ“¢…@‰¢m“‰•’…„m£–m—™£‰“m†‰…“„¢…£~Æ“¢…n %@@L†‰…“„m•”…nÙ…¥‰¢‰–•La†‰…“„m•”…n %@@ %@@L†‰…“„m”¢‚nóLa†‰…“„m”¢‚n %@@L†‰…“„m“¢‚nðLa†‰…“„m“¢‚n %@@ %@@L†‰…“„m„…¢ƒ™‰—£‰–•@–™„…™~‚…†–™…nL—™nÁ•@L™”`„…†‰•…„`¦–™„nÉÔ×ÓÅÔÅÕãÁãÉÖÕ@ÄÅÆÉÕÅÄLa™”`„…†‰•…„`¦–™„n@™…¥‰¢‰–•@•¤”‚…™@†–™@£ˆ…@„…¥‰ƒ…KLa—™nLa†‰…“„m„…¢ƒ™‰—£‰–•n %@@ %@@L†‰…“„m¥“¤…¢n %@@@@ %@@@@ %@@@@ %@@La†‰…“„m¥“¤…¢n %@@ %@@ %@@L†‰…“„m™…¢…£¢n %@@@@ %@@La†‰…“„m™…¢…£¢n %@@ %@@ %@@ %La†‰…“„n % %@@@@@@ %@@@@@@@@La†‰…“„¢n %@@@@@@La™…‡m†‰…“„¢…£¢n %@@@@@@ %@@@@La™…‡‰¢£…™n %@@La™…‡‰¢£…™¢n %@@L£‰”…¢£”—nñôaðóaòðñô@ð÷zôöLa£‰”…¢£”—n %La™…‡‰¢£…™m—‡…n
| From | Lew Pitcher <lew.pitcher@digitalfreehold.ca> |
|---|---|
| Date | 2023-08-19 15:09 +0000 |
| Message-ID | <ubqlvc$shgn$1@dont-email.me> |
| In reply to | #172538 |
On Sat, 19 Aug 2023 14:48:08 +0000, Scott Lurndal wrote:

> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>
>>> It turns out that if you want to be 100% conforming you need to be able
>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>
>>I had a go at EBCDIC.
>>
>>If anyone has an EBCDIC XML file they'd like to test, please post a link.
>
> Here's one:
[snip]

And that's an excellent illustration of my point about some EBCDIC
charactersets lacking the necessary characters to properly express XML.

Here are the first four lines of the ASCII equivalent of that message,
as generated by
  dd if=ebcdic.msg of=ascii.msg conv=ascii
where
  conv=ascii
will convert "from EBCDIC to ASCII" (dd(1) manpage)

Note the (translated) format of the DOCTYPE entities
  <?xml version="1.0" encoding="utf-8"?>
  <?xml-stylesheet href="one_register.xsl" type="text/xsl" ?>
  <|DOCTYPE registers SYSTEM "registers.dtd">
  <|-- Copyright (c) 2010-2014 ARM Limited. All rights reserved. -->

Apparently, you used a variant of EBCDIC that includes an exclamation mark
at codepoint 0x4f; dd uses EBCDIC-US which, at codepoint 0x4f, encodes
a "VERTICAL LINE".

-- 
Lew Pitcher
"In Skills We Trust"
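The mismatch described above can be reproduced in a few lines of C. This is only a sketch: the enum and function names are hypothetical, and it models just two byte values (0x4C as '<', and 0x4F as either '|' or '!' depending on the variant, as stated in the post). It is not a full EBCDIC table and says nothing about which table any particular dd build uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names. Only the two byte values discussed in the post are
 * modelled; every other byte decodes to '?'. */
enum ebcdic_variant {
    VARIANT_VERTICAL_LINE_AT_4F, /* the table applied when decoding in the post */
    VARIANT_EXCLAMATION_AT_4F    /* the table the file was apparently encoded with */
};

static char decode_byte(enum ebcdic_variant v, unsigned char b)
{
    switch (b) {
    case 0x4C:
        return '<';
    case 0x4F:
        return v == VARIANT_EXCLAMATION_AT_4F ? '!' : '|';
    default:
        return '?';
    }
}

/* Decode n bytes from in into out; out must have room for n + 1 chars. */
static void decode(enum ebcdic_variant v, const unsigned char *in,
                   size_t n, char *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = decode_byte(v, in[i]);
    out[n] = '\0';
}
```

Feeding the bytes 0x4C 0x4F through both variants yields "<|" for one and "<!" for the other, which is exactly the difference between the broken and the intended DOCTYPE line.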
| From | Lew Pitcher <lew.pitcher@digitalfreehold.ca> |
|---|---|
| Date | 2023-08-19 15:17 +0000 |
| Message-ID | <ubqmdh$shgn$2@dont-email.me> |
| In reply to | #172540 |
On Sat, 19 Aug 2023 15:09:32 +0000, Lew Pitcher wrote:

> On Sat, 19 Aug 2023 14:48:08 +0000, Scott Lurndal wrote:
>
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>>
>>>> It turns out that if you want to be 100% conforming you need to be able
>>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>>
>>>I had a go at EBCDIC.
>>>
>>>If anyone has an EBCDIC XML file they'd like to test, please post a link.
>>
>> Here's one:
> [snip]
>
> And that's an excellent illustration of my point about some EBCDIC
> charactersets lacking the necessary characters to properly express XML.
>
> Here are the first four lines of the ASCII equivalent of that message,
> as generated by
>   dd if=ebcdic.msg of=ascii.msg conv=ascii
> where
>   conv=ascii
> will convert "from EBCDIC to ASCII" (dd(1) manpage)
>
> Note the (translated) format of the DOCTYPE entities
>   <?xml version="1.0" encoding="utf-8"?>

Oh, and by the way, that "encoding" value is incorrect for the XML document
you posted. It should have named the EBCDIC variant you used, not "utf-8".
I suspect that you just machine- or hand-encoded an existing utf-8 XML
document, rather than composing a completely new document in EBCDIC.

FWIW, I spent many years working in an EBCDIC environment, manipulating
XML documents (in EBCDIC) with a tool developed in-house. I had to write
a number of "white papers" on the subjects of characterset translation
(to/from EBCDIC, and between EBCDIC variants), and on XML handling in
an EBCDIC environment. :-)

-- 
Lew Pitcher
"In Skills We Trust"
| From | scott@slp53.sl.home (Scott Lurndal) |
|---|---|
| Date | 2023-08-19 21:05 +0000 |
| Message-ID | <t8aEM.562366$SuUf.127420@fx14.iad> |
| In reply to | #172540 |
Lew Pitcher <lew.pitcher@digitalfreehold.ca> writes:
>On Sat, 19 Aug 2023 14:48:08 +0000, Scott Lurndal wrote:
>
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>>
>>>> It turns out that if you want to be 100% conforming you need to be able
>>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>>
>>>I had a go at EBCDIC.
>>>
>>>If anyone has an EBCDIC XML file they'd like to test, please post a link.
>>
>> Here's one:
>[snip]
>
>And that's an excellent illustration of my point about some EBCDIC
>charactersets lacking the necessary characters to properly express XML.
>
>Here are the first four lines of the ASCII equivalent of that message,
>as generated by
>  dd if=ebcdic.msg of=ascii.msg conv=ascii
>where
>  conv=ascii
>will convert "from EBCDIC to ASCII" (dd(1) manpage)
>
>Note the (translated) format of the DOCTYPE entities
>  <?xml version="1.0" encoding="utf-8"?>
>  <?xml-stylesheet href="one_register.xsl" type="text/xsl" ?>
>  <|DOCTYPE registers SYSTEM "registers.dtd">
>  <|-- Copyright (c) 2010-2014 ARM Limited. All rights reserved. -->
>
>Apparently, you used a variant of EBCDIC that includes an exclamation mark
>at codepoint 0x4f; dd uses EBCDIC-US which, at codepoint 0x4f encodes
>a "VERTICAL LINE"
>

Actually, I used 'dd' on an old Fedora Core install.
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2023-08-19 21:05 +0100 |
| Message-ID | <871qfyx0fe.fsf@bsb.me.uk> |
| In reply to | #172538 |
scott@slp53.sl.home (Scott Lurndal) writes:

> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>
>>> It turns out that if you want to be 100% conforming you need to be able
>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>
>>I had a go at EBCDIC.
>>
>>If anyone has an EBCDIC XML file they'd like to test, please post a link.
>
> Here's one:
>
> Lo...

<EBCDIC-encoded XML deleted>

Is that legal? I thought an EBCDIC XML file must give the correct
encoding in the XML declaration. xmllint rejects it unless I edit the
declaration.

-- 
Ben.
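For context on why the declaration matters: a parser can only guess the encoding *family* from the first bytes, and then has to read the declaration to learn the exact encoding. The quoted "Lo..." is the EBCDIC "<?" (bytes 0x4C 0x6F) as seen by an ASCII reader. A minimal sketch of that first step follows, using the no-BOM byte patterns listed in the non-normative Appendix F of the XML 1.0 specification. The enum and function names are hypothetical, and byte-order marks and the less common byte orders are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the byte patterns are the no-BOM cases for a
 * document that begins with "<?xm". */
enum xml_family {
    XML_UCS4_BE,
    XML_UCS4_LE,
    XML_UTF16_BE,
    XML_UTF16_LE,
    XML_ASCII_COMPATIBLE, /* UTF-8, ASCII, ISO 8859-x, ... */
    XML_EBCDIC,           /* some EBCDIC flavour; declaration says which */
    XML_UNKNOWN
};

static enum xml_family sniff_xml_family(const unsigned char *buf, size_t len)
{
    /* Signatures are written as numbers, not as "<?xm", so the table does
     * not depend on the execution character set of the compiler. */
    static const struct {
        unsigned char sig[4];
        enum xml_family family;
    } table[] = {
        {{0x00, 0x00, 0x00, 0x3C}, XML_UCS4_BE},
        {{0x3C, 0x00, 0x00, 0x00}, XML_UCS4_LE},
        {{0x00, 0x3C, 0x00, 0x3F}, XML_UTF16_BE},
        {{0x3C, 0x00, 0x3F, 0x00}, XML_UTF16_LE},
        {{0x3C, 0x3F, 0x78, 0x6D}, XML_ASCII_COMPATIBLE},
        {{0x4C, 0x6F, 0xA7, 0x94}, XML_EBCDIC},
    };

    assert(buf != NULL);
    if (len < 4)
        return XML_UNKNOWN;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (memcmp(buf, table[i].sig, 4) == 0)
            return table[i].family;
    }
    return XML_UNKNOWN;
}
```

A result of XML_EBCDIC only tells the parser how to read the declaration itself; the encoding="..." value still has to name the actual variant, which is why a converted file that still says "utf-8" gets rejected.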
| From | scott@slp53.sl.home (Scott Lurndal) |
|---|---|
| Date | 2023-08-19 21:07 +0000 |
| Message-ID | <baaEM.562367$SuUf.444809@fx14.iad> |
| In reply to | #172553 |
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>scott@slp53.sl.home (Scott Lurndal) writes:
>
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>>
>>>> It turns out that if you want to be 100% conforming you need to be able
>>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>>
>>>I had a go at EBCDIC.
>>>
>>>If anyone has an EBCDIC XML file they'd like to test, please post a link.
>>
>> Here's one:
>>
>> Lo...
>
><EBCDIC-encoded XML deleted>
>
>Is that legal? I thought an EBCDIC XML file must give the correct
>encoding in the XML declaration. xmllint rejects it unless I edit the
>declaration.

As Lew pointed out, it was not properly specified; I had cheated
and encoded (using dd) an existing XML file (from the public
ARM AArch64 SysReg XML).
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2023-08-19 22:31 +0100 |
| Message-ID | <87jztqvhwf.fsf@bsb.me.uk> |
| In reply to | #172519 |
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>
>> It turns out that if you want to be 100% conforming you need to be able
>> to detect both UCS-4 and (eye roll) EBCDIC.
>>
> I had a go at EBCDIC.
>
> If anyone has an EBCDIC XML file they'd like to test, please post a
> link.

You can make your own by (a) setting the encoding="..." attribute in the
declaration (EBCDIC-INT is a good one) and then running iconv.

> Of course the next challenge is to support EBCDIC as the execution
> character set. This means all the if (ch == '<') statements have to
> come out and be replaced by if (ch == ASCII_LESSTHEN). And the strings
> have to be replaced with hex codes.

Do you have a user who wants to compile your program on a system that
does not support ASCII C source?

> Here's where the Baby X resource compiler shows its power. Simply set
> up the input
> <BabyXRC>
> <utf8 name="cdata"><CDATA</utf8>
> </BabyXRC>

You've lost me. That does not parse.

> And so on, and you get all the strings in hex-encoded UTF-8, ready to
> cut and paste.

What strings? And why hex -- nothing in the XML suggests hex? I
usually want UTF-8 strings as UTF-8 strings in the source, but I
understand your user base does not include me.

-- 
Ben.
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2023-08-19 22:04 -0700 |
| Message-ID | <7f9fbbd6-7f5c-4e12-a73b-c9abe91b7f5bn@googlegroups.com> |
| In reply to | #172563 |
On Saturday, 19 August 2023 at 22:31:28 UTC+1, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>
> > On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
> >>
> >> It turns out that if you want to be 100% conforming you need to be able
> >> to detect both UCS-4 and (eye roll) EBCDIC.
> >>
> > I had a go at EBCDIC.
> >
> > If anyone has an EBCDIC XML file they'd like to test, please post a
> > link.
> You can make your own by (a) setting the encoding="..." attribute in the
> declaration (EBCDIC-INT is a good one) and then running iconv.
> > Of course the next challenge is to support EBCDIC as the execution
> > character set. This means all the if (ch == '<') statements have to
> > come out and be replaced by if (ch == ASCII_LESSTHEN). And the strings
> > have to be replaced with hex codes.
> Do you have a user who wants to compile your program on a system that
> does not support ASCII C source?
>
Who knows? The code is publicly available to whoever wants it.
The problem with this model is that, unless the user chooses to get back to
you, you've no idea who he is, or how he is using the code, or if he has
any problems with it. Paying customers, by contrast, usually leave their
details, and are likely to complain if they don't get what they wanted.

But if the XML parser is to support EBCDIC input, then I'd expect that
an EBCDIC-interested user might also want to compile under a system
where the execution character set is EBCDIC. However, he'll get UTF-8
output, which is probably not what he wants.

I'd need an EBCDIC C compiler to test it.
>
> > Here's where the Baby X resource compiler shows its power. Simply set
> > up the input
> > <BabyXRC>
> > <utf8 name="cdata"><CDATA</utf8>
> > </BabyXRC>
> You've lost me. That does not parse.
> > And so on, and you get all the strings in hex-encoded UTF-8, ready to
> > cut and paste.
> What strings? And why hex -- nothing in the XML suggests hex? I
> usually want UTF-8 strings as UTF-8 strings in the source, but I
> understand your user base does not include me.
>
XML documents contain a tag called "CDATA". So the natural thing is
to write

if (!strcmp(tag, "CDATA")) /* check for CDATA and process it. */

This will work in a program which accepts data in the execution character
set and only in the execution character set. However the XML parser
accepts data in ASCII, UTF-8, UTF-16 (two flavours) and, now, EBCDIC.
It does this by converting to a common format via a conversion function
passed to the lexer, and the common format is UTF-8.

So "tag" will be in UTF-8. If the execution character set is ASCII, then
the comparison will still work, and that is what I have done. But if it is
EBCDIC, it will fail, because the literal "CDATA" is then compiled to
EBCDIC bytes.

Instead we need to write

/* CDATA in UTF-8 */
char cdata[] = {0x43, 0x44, 0x41, 0x54, 0x41, 0x00};

if (!strcmp(tag, cdata)) /* check for CDATA and process it */

This is where the Baby X resource compiler comes to our rescue. It will
convert ASCII to that form, with the <utf8> tag.
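Put together as something that compiles, the idea might look like the sketch below. The function names are hypothetical; ASCII_LESSTHEN is the constant name used earlier in the thread, and the only assumption is that the lexer has already converted its input to UTF-8, as described above.

```c
#include <assert.h>
#include <string.h>

/* Named constant for a single character, spelled as a number so that it
 * does not change with the execution character set. */
enum { ASCII_LESSTHEN = 0x3C };

/* "CDATA" as UTF-8 code units: C D A T A NUL. */
static const char cdata_utf8[] = {0x43, 0x44, 0x41, 0x54, 0x41, 0x00};

/* ch is one byte of the lexer's UTF-8 output. */
static int is_open_angle(int ch)
{
    return ch == ASCII_LESSTHEN;
}

/* tag is a NUL-terminated UTF-8 string produced by the lexer. */
static int is_cdata(const char *tag)
{
    assert(tag != NULL);
    return strcmp(tag, cdata_utf8) == 0;
}
```

On an ASCII host this behaves the same as comparing against '<' and "CDATA"; the difference only shows up when the compiler's execution character set is something else.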
| From | Richard Damon <Richard@Damon-Family.org> |
|---|---|
| Date | 2023-08-20 07:41 -0400 |
| Message-ID | <5_mEM.119316$uEkc.63082@fx35.iad> |
| In reply to | #172568 |
On 8/20/23 1:04 AM, Malcolm McLean wrote:
> On Saturday, 19 August 2023 at 22:31:28 UTC+1, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>>
>>> On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>>
>>>> It turns out that if you want to be 100% conforming you need to be able
>>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>>
>>> I had a go at EBCDIC.
>>>
>>> If anyone has an EBCDIC XML file they'd like to test, please post a
>>> link.
>> You can make your own by (a) setting the encoding="..." attribute in the
>> declaration (EBCDIC-INT is a good one) and then running iconv.
>>> Of course the next challenge is to support EBCDIC as the execution
>>> character set. This means all the if (ch == '<') statements have to
>>> come out and be replaced by if (ch == ASCII_LESSTHEN). And the strings
>>> have to be replaced with hex codes.
>> Do you have a user who wants to compile your program on a system that
>> does not support ASCII C source?
>>
> Who knows. The code is publicly available to whoever wants it.
> The problem with this model is that, unless the user chooses to get back to
> you, you've no idea who he is, or how he is using the code, or if he has
> any problems with it. Unlike paying customers who usually leave their
> details, and are likely to complain if they don't get what they wanted.
>
> But if the XML parser is to support EBCDIC input, then I'd expect that
> an EBCDIC-interested user might also want to compile under a system
> where the execution character set is EBCDIC. However he'll get UTF-8
> output, which is probably not what he wants.
>
> I'd need an EBCDIC C compiler to test it.
>>
>>> Here's where the Baby X resource compiler shows its power. Simply set
>>> up the input
>>> <BabyXRC>
>>> <utf8 name="cdata"><CDATA</utf8>
>>> </BabyXRC>
>> You've lost me. That does not parse.
>>> And so on, and you get all the strings in hex-encoded UTF-8, ready to
>>> cut and paste.
>> What strings? And why hex -- nothing in the XML suggests hex? I
>> usually want UTF-8 strings as UTF-8 strings in the source, but I
>> understand your user base does not include me.
>>
> XML documents contain a tag called "CDATA". So the natural thing is
> to write
> if (!strcmp(tag, "CDATA")) /* check for CDATA and process it. */
>
> This will work on a program which accepts data in the execution character
> set and only in the execution character set. However the XML parser
> accepts data in ASCII, UTF-8, UTF-16 (two flavours) and, now, EBCDIC.
> It does this by converting to a common format via a conversion function
> passed to the lexer, and the common format is UTF-8.
>
> So "tag" will be in UTF-8. If the execution character set is ASCII, then
> the comparison will still work, and that is what I have done. But if it is
> EBCDIC, it will fail.
>
> Instead we need to write
>
> /* CDATA in UTF-8 */
> char cdata[] = {0x43, 0x44, 0x41, 0x54, 0x41, 0x00};
>
> if (!strcmp(tag, cdata)) /* check for CDATA and process it */
>
> This is where the Baby X resource compiler comes to our rescue. It will
> convert ASCII to that form, with the utf-8 tag.
Why not just write u8"CDATA" instead?
u8 strings are always UTF-8 encoded, no matter what the execution
character set is.
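A sketch of that suggestion, assuming a C11 or later compiler (u8 literals were added in C11). The helper names are hypothetical. The comparison takes const void * because the element type of a u8 literal is char in C11 and C17 but char8_t (an unsigned char type) in C23, and this way the same source builds under either.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise equality of two NUL-terminated strings, independent of
 * whether the caller's element type is char or unsigned char. */
static int utf8_equal(const void *a, const void *b)
{
    const unsigned char *p = a;
    const unsigned char *q = b;

    assert(p != NULL && q != NULL);
    while (*p != 0 && *p == *q) {
        p++;
        q++;
    }
    return *p == *q;
}

/* tag is a NUL-terminated UTF-8 string, e.g. from the lexer. */
static int is_cdata_tag(const char *tag)
{
    /* u8"CDATA" is UTF-8 whatever the execution character set is. */
    return utf8_equal(tag, u8"CDATA");
}
```

Compared with a hand-written hex array, the literal stays readable in the source and there is no chance of dropping a byte when transcribing it.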