From steve-t@hpfcso Tue Aug 14 14:47 MDT 1990
Received: from hpfcso by hpfclw.HP.COM; Tue, 14 Aug 90 14:47:48 mdt
Received: by hpfcso.HP.COM; Tue, 14 Aug 90 14:48:05 mdt
Date: Tue, 14 Aug 90 14:48:05 mdt
From: Steve Taylor <steve-t@hpfcso>
Full-Name: Steve Taylor
Message-Id: <9008142048.AA14630@hpfcso.HP.COM>
To: jwh@hpfclw, zac@hpislx
Subject: FYI on 68040 cache
Cc: steve@hpfclw
Status: RO

Jeff and Zac,
	Thought this string from hp-factory.std.unix might be of interest
	to the two of you, since you have to deal with the new 040 cache
	modes.						Regards, Steve Taylor


>From: jsm@hpfcdc.HP.COM (John Marvin) 		Date: Tue, 14 Aug 1990 
Organization: HP Fort Collins, Co.  		Newsgroups: hp-factory.std.unix
	Subject: RFC: New cachectl(3C), Changed ld(1),chatr(1)

    The MC68040 processor has introduced a few object compatibility problems due
to:  1) the larger instruction cache, and 2) the addition of copyback caching.
The only applications that are affected are ones that load machine code into the
data and/or stack segment, and then try to execute it.  Applications that are
known to do this include:  HP Softbench, HP Common Lisp, and SoftPC.  We have
decided to provide two solutions to the problem:

    1) Allow the user, via either chatr(1) or ld(1), to specify writethrough
    caching for the data and/or stack segment.  This solution may fix some
    applications without having to modify the application.  See the detailed
    problem description below.

    2) Add a new C library routine, cachectl(3C), that will allow applications
    to purge/flush the cache appropriately.

The first three responses to this note contain:

    1) A new man page for cachectl(3C).

    2) A DEPENDENCIES section to be added to chatr(1).

    3) Some new text to be added to the Series 300/400 DEPENDENCIES section of
    ld(1).

The cachectl(3C) man page is written as a shared man page (with a Series 300/400
DEPENDENCIES section), even though it is marked as "Series 300/400 Only".  This
is because this interface could be implemented for Series 800 in the future.
Cachectl(3C) is not necessary for Series 800 machines since cache operations are
not privileged on Series 800; however, it would be worthwhile to implement for
portability considerations.

Please respond to this proposal in a timely manner since some of the
functionality needs to be provided in the 7.05 release.

	John Marvin 	jsm@hpfcls.fc.hp.com 	1-229-3482

Note:  The rest of this note contains a more detailed description of the problem
and the proposed solutions.  The description is written so that it can be
inserted into a 7.05 Release Notes, if no major problems are found with this
proposal.

Problem Description
-------------------

    New generation HP 9000 Series 300/400 workstations will be using the
Motorola MC68040 microprocessor.  Almost all current Series 300 applications
should run unmodified on the new generation workstations.  However, there is a
very small class of applications that will break on the new workstations.

    The problem is due to the new copyback cache capability of the MC68040.
With copyback caching enabled, performance in most cases is greatly increased
over the write-through caching case.  Therefore, to be competitive, copyback
caching must be enabled by default.  In most cases copyback caching is
transparent to the application.  The only problem is when machine language is
written at run time into the data and/or stack segment and then an attempt is
made to execute it.  The machine language may still be sitting in the data
cache, not yet written to memory, and the data cache is not searched on an
instruction fetch.
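
The failure can be illustrated with a toy model.  This is only a sketch
for explanation: the structure and function names are made up, it models a
single word of memory and a single cache line, and it does not touch any real
cache.  The values 0x4e71 (NOP) and 0x4e75 (RTS) are 68k opcodes used here
only as recognizable old and new contents.

```c
#include <assert.h>

/* Toy model: one word of main memory and one copyback data cache line. */
struct toy_machine {
    unsigned memory;   /* what main memory holds                   */
    unsigned dcache;   /* what the data cache line holds           */
    int      dirty;    /* nonzero: dcache is newer than memory     */
};

/* Under copyback caching a data store updates only the cache line. */
static void toy_store(struct toy_machine *m, unsigned value)
{
    m->dcache = value;
    m->dirty = 1;
}

/* An instruction fetch does not search the data cache; it sees memory. */
static unsigned toy_ifetch(const struct toy_machine *m)
{
    return m->memory;
}

/* Pushing the line writes the dirty data out to memory. */
static void toy_push(struct toy_machine *m)
{
    if (m->dirty) {
        m->memory = m->dcache;
        m->dirty = 0;
    }
}

/* Walk through the failure and the fix; returns 0 when done. */
int toy_demo(void)
{
    struct toy_machine m = { 0x4e71, 0x4e71, 0 };

    toy_store(&m, 0x4e75);              /* program writes new code      */
    assert(toy_ifetch(&m) == 0x4e71);   /* fetch still sees the old word */
    toy_push(&m);                       /* push the dirty line           */
    assert(toy_ifetch(&m) == 0x4e75);   /* fetch now sees the new code   */
    return 0;
}
```

With writethrough caching the store would update memory immediately, so the
push step would not be needed in this model.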

The Solutions
-------------

    There are two solutions to this problem, one requiring a modification to the
application (and therefore requiring access to the source).  One solution is to
specify the use of writethrough caching for the data and/or stack segment on a per
program basis.  The other solution is to push dirty cache lines out to main
memory after the machine language has been written to the data segment, but
before execution of the machine language is attempted.  The following sections
will explain these two solutions in detail.

Converting to Write-Through Caching
-----------------------------------

    The chatr(1) command has been modified for the 7.05 release to allow an
executable to be flagged so that HP-UX will use writethrough caching for the
data and/or stack segment.  Chatr(1) allows the user to specify writethrough
caching for the data segment, the stack segment, or both.  Chatr(1) can also be
used to check on the current caching specification for the executable.  In the
next major release, ld(1) will also be modified to allow the user to specify
writethrough caching at the time the executable is created.

    In most cases, only the data segment needs to use writethrough caching for
applications that write machine code as data and then execute it.  The stack
segment should not be made writethrough unless necessary, since the application
could still benefit from the performance of copyback caching for the stack.

    This solution is the only one available to users who do not have the source
in order to implement the cache pushing solution.  It is possible that this
solution may not work for some applications if they were not written correctly
for the 68020 or 68030 microprocessors.  There will not be a problem if code
that is written into the data segment for execution is not overwritten later
with new code.  If the code is overwritten, then the old code may be found in
the instruction cache when an attempt is made to execute the new code.  This
problem also existed on the 68020 and 68030 microprocessors; however, due to the
small size of their instruction caches (256 bytes), it is possible that
applications never ran into this problem.  The instruction cache on the 68040 is
4096 bytes, so an application is more likely to run into a problem if it did not
purge the instruction cache.  The only correct way an application could have
purged the instruction cache before the 7.05 release would have been to call the
undocumented m68020_advise() system call (which is now obsoleted by the more
powerful cachectl() call).  Some third party developers were informed about the
m68020_advise() system call, and they may have incorporated it into their
product.

Cache Pushing
-------------

    A new C library routine, cachectl(3C), will be provided in the 7.05 release
to allow the user to push the data cache after the machine language has been
written to the data and/or stack segment, but before the code is executed.  This
method, in most cases, will provide better performance for the application,
since it will still be able to use copyback caching.  Since cache flushing
involves a privileged instruction on the MC68040 processor, cachectl(3C) must
make a call to the kernel in order to do the flush.  This does involve a certain
amount of overhead, although every effort has been made to minimize the overhead
involved in calling cachectl().
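
A minimal sketch of the cache pushing approach follows.  It assumes the
cachectl() interface exactly as drafted in the proposed man page.
HAVE_SYS_CACHE_H is a hypothetical build macro, install_code() and code_area
are made-up names, and the stand-in cachectl() with its CC_IPURGE value exists
only so the sketch compiles on systems without <sys/cache.h>; the numeric
value is a placeholder, not taken from the real header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef HAVE_SYS_CACHE_H          /* hypothetical macro: real header available */
#include <sys/cache.h>
#else
/* Stand-ins for other systems.  The value 4 is a placeholder only. */
#define CC_IPURGE 4
static int stub_last_cmd = -1;   /* records the last command requested */
static int cachectl(int cachecmd, char *address, unsigned length)
{
    (void)address;
    (void)length;
    stub_last_cmd = cachecmd;
    return 0;
}
#endif

/* Example data segment buffer that will hold generated machine code. */
char code_area[16];

/*
 * Copy machine code into a writable buffer, then ask for dirty data cache
 * entries to be flushed and the instruction cache purged before anything
 * jumps to dst.  Returns 0 on success, -1 on failure.
 */
int install_code(char *dst, const char *code, size_t len)
{
    assert(dst != NULL && code != NULL);
    memcpy(dst, code, len);
    /* A NULL address applies the operation to the entire cache. */
    return cachectl(CC_IPURGE, (char *)0, 0);
}
```

CC_IPURGE is used here because the draft man page describes it as the
operation intended for self-modifying code.  An application that knows which
line or page it touched could pass an address and length instead, subject to
the limits in the DEPENDENCIES section of the man page.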

Conclusion
----------

    In order to get the best performance, applications that write machine code
into the data segment and then execute it should be modified to use cachectl(3C)
to do selective cache management.  Some applications, if they did not overwrite
instructions in the data segment, or if they used the undocumented m68020_advise()
system call, will run correctly if the data segment caching mode is changed to
writethrough via the chatr(1) command.  However, these applications would also
benefit if they were modified to use cachectl(3C).

>From: jsm@hpfcdc.HP.COM (John Marvin) 		Date: Tue, 14 Aug 1990 
Organization: HP Fort Collins, Co.  		Newsgroups: hp-factory.std.unix
	Subject: Re: RFC: New cachectl(3C), Changed ld(1),chatr(1)

     cachectl(3C)           Series 300/400 Only           cachectl(3C)

     NAME
          cachectl - flush and/or purge the cache

     SYNOPSIS
          #include <sys/cache.h>
          int cachectl (cachecmd, address, length);
          int cachecmd;
          char *address;
          unsigned length;

     DESCRIPTION
	  cachectl allows program control over the data and/or instruction
	  caches.  The cachecmd parameter specifies what operations to carry out
	  on the cache(s).  The cachecmd parameter should contain one of the
	  following values, which are defined in <sys/cache.h>:

	  CC_PURGE          request a cache controller purge.  Cache lines which
			    are "dirty" (i.e., hold valid data which is not
			    currently in the corresponding memory) may be
			    discarded without being written to memory.

	  CC_FLUSH          request a cache controller flush.  Dirty lines are
			    written out to memory before the cache line is
			    cleared.  This operation is the same as CC_PURGE on
			    models that do not have a copyback cache.

	  CC_IPURGE         flush any dirty data cache entries, then purge any
			    instruction cache which may hold stale contents.
			    This operation is useful for self-modifying code.

	  The following mask, defined in <sys/cache.h>, can be or'ed together
	  with one of the above values in order to purge the external cache (if
	  one exists) at the same time.

          CC_EXTPURGE       purge the external cache (if any).

	  The address parameter specifies the start address of the area to be
	  flushed/purged.  If the specified start address is NULL then the
	  operation will be applied to the entire cache.  Selective
	  flushing/purging may not be supported on all models; see the
	  DEPENDENCIES section for specific details.

	  The length parameter is only used when a start address is specified.
	  It controls the length of the area to be flushed/purged.

     EXAMPLES
	  The following call to cachectl requests that the entire data cache be
	  flushed:

               cachectl (CC_FLUSH, 0, 0);

     RETURN VALUE
	  cachectl returns 0 if the operation succeeds, otherwise -1 will be
	  returned.  The semantics of cachectl, when the address parameter
	  contains a bad address, is subject to change, and may vary from
	  machine to machine.

     DEPENDENCIES
          Series 300/400
	       The MC68020 and MC68030 processors do not have a copyback cache.
	       Selective purging is not supported for the MC68020 and MC68030
	       processors.  Selective purging/flushing is supported on the
	       MC68040 processor, but only under the following conditions.  If
	       the length parameter is 16, then the cache line which includes
	       address will be flushed/purged.  If the length parameter is 4096,
	       then the page which includes address will be flushed/purged.  If
	       the length parameter is not 16 or 4096 then the operation will be
	       applied to the entire cache.

	       On the MC68040 microprocessor, CC_PURGE will instead perform a
	       CC_FLUSH if the length parameter is not 16 or 4096.

     AUTHOR
          cachectl was developed by HP.

     SEE ALSO
          chatr(1), ld(1)
                                   - 2 -   Formatted:  August 13, 1990
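
The MC68040 length rules in the DEPENDENCIES section above can be modeled as
follows.  This is a sketch of the rules as worded in the draft, not kernel
code; the enum and function names are hypothetical.  Whether a NULL address
with CC_PURGE is also promoted to a flush is not stated in the draft, so the
promotion below keys on the length parameter only.

```c
#include <assert.h>
#include <stddef.h>

enum cc_scope { CC_SCOPE_LINE, CC_SCOPE_PAGE, CC_SCOPE_ALL };  /* hypothetical */
enum cc_op    { CC_OP_PURGE, CC_OP_FLUSH };                    /* hypothetical */

/* Which part of the cache a request covers on the MC68040. */
enum cc_scope mc68040_scope(const char *address, unsigned length)
{
    if (address == NULL)
        return CC_SCOPE_ALL;     /* NULL start address: entire cache     */
    if (length == 16)
        return CC_SCOPE_LINE;    /* the cache line that includes address */
    if (length == 4096)
        return CC_SCOPE_PAGE;    /* the page that includes address       */
    return CC_SCOPE_ALL;         /* any other length: entire cache       */
}

/* A purge with a length other than 16 or 4096 is performed as a flush. */
enum cc_op mc68040_effective_op(enum cc_op op, unsigned length)
{
    if (op == CC_OP_PURGE && length != 16 && length != 4096)
        return CC_OP_FLUSH;
    return op;
}

/* Start of the 16-byte line or 4096-byte page that contains address. */
unsigned long containing_unit(unsigned long address, unsigned length)
{
    assert(length == 16 || length == 4096);
    return address & ~((unsigned long)length - 1);
}
```

For example, address 0x1234 with length 16 falls in the line starting at
0x1230, and with length 4096 in the page starting at 0x1000.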

>From: jsm@hpfcdc.HP.COM (John Marvin) 		Date: Tue, 14 Aug 1990 
Organization: HP Fort Collins, Co.  		Newsgroups: hp-factory.std.unix
	Subject: Re: RFC: New cachectl(3C), Changed ld(1),chatr(1)

The following DEPENDENCIES section will be added to the chatr(1) man page:

     DEPENDENCIES
          Series 300/400
	       The following option is supported only on Series 300/400
	       workstations with Motorola MC68040 microprocessors.

               -C<cache mode specification>
		    Specify caching mode for the data and/or stack segment.
		    This option has a mandatory argument that specifies whether
		    a particular segment uses either writethrough caching or
		    copyback caching.  In general, copyback caching should be
		    used since it provides greater performance.  However,
		    applications that write object code into either the data or
		    stack segment, and then execute the code, will not work with
		    copyback caching unless the application is modified to push
		    the cache.

		    The cache mode specification takes one or more of the
		    following arguments, concatenated together.  If no option is
		    specified for a particular segment then the caching mode for
		    that segment will remain the same as before.

		    d    Use writethrough caching for the data segment.

                    D    Use copyback caching for the data segment.

		    s    Use writethrough caching for the stack segment.

                    S    Use copyback caching for the stack segment.
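
One way to read the cache mode specification is sketched below.  This is
not the chatr(1) source; the type and function names are hypothetical.  Two
points are assumptions rather than part of the proposal: an empty or unknown
specification is rejected, and when letters conflict (for example "dD") the
last one wins.

```c
#include <assert.h>
#include <stddef.h>

enum cache_mode { MODE_UNCHANGED, MODE_WRITETHROUGH, MODE_COPYBACK };

struct cache_spec {
    int             ok;      /* nonzero if the specification was accepted */
    enum cache_mode data;    /* requested mode for the data segment       */
    enum cache_mode stack;   /* requested mode for the stack segment      */
};

/* Parse the letters that follow -C, e.g. "d", "dS" or "Ds". */
struct cache_spec parse_cache_spec(const char *spec)
{
    struct cache_spec result = { 0, MODE_UNCHANGED, MODE_UNCHANGED };

    assert(spec != NULL);
    if (*spec == '\0')
        return result;               /* the argument is mandatory */

    for (; *spec != '\0'; spec++) {
        switch (*spec) {
        case 'd': result.data  = MODE_WRITETHROUGH; break;
        case 'D': result.data  = MODE_COPYBACK;     break;
        case 's': result.stack = MODE_WRITETHROUGH; break;
        case 'S': result.stack = MODE_COPYBACK;     break;
        default:  return result;     /* unknown letter: ok stays 0 */
        }
    }
    result.ok = 1;
    return result;
}
```

A segment that is not named stays MODE_UNCHANGED, matching the statement that
its caching mode remains the same as before.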

>From: jsm@hpfcdc.HP.COM (John Marvin) 		Date: Tue, 14 Aug 1990 
Organization: HP Fort Collins, Co.  		Newsgroups: hp-factory.std.unix
	Subject: Re: RFC: New cachectl(3C), Changed ld(1),chatr(1)

The following paragraphs will be added to the Series 300/400 DEPENDENCIES
section of the ld(1) man page.

	       The following option is supported only on Series 300/400
	       workstations with Motorola MC68040 microprocessors.

               -C<cache mode specification>
		    Specify caching mode for the data and/or stack segment.
		    This option has a mandatory argument that specifies whether
		    a particular segment should use writethrough caching rather
		    than the default copyback caching.  In general, copyback
		    caching should be used since it provides greater
		    performance.  However, applications that write object code
		    into either the data or stack segment, and then execute the
		    code, will not work with copyback caching unless the
		    application is modified to push the cache.

		    The cache mode specification takes one or more of the
		    following arguments, concatenated together.

		    d    Use writethrough caching for the data segment.

		    s    Use writethrough caching for the stack segment.

>From: munir@hpfcmgw.HP.COM (Munir Mallal) 	Date: Tue, 14 Aug 1990 
Organization: HP Fort Collins, CO 		Newsgroups: hp-factory.std.unix
	Subject: Re: RFC: New cachectl(3C), Changed ld(1),chatr(1)

> a problem if it did not purge the instruction cache. The only correct way an
> application could have purged the instruction cache before the 7.05 release
> would have been to call the undocumented m68020_advise() system call (which is
> now obsoleted by the more powerful cachectl() call). Some third party
> developers were informed about the m68020_advise() system call, and they may
> have incorporated it into their product.

It seems to me that it would be a good idea not to obsolete m68020_advise()
until the next major release (8.0).  If the call transparently invoked the
kernel routine to do the same thing, our third parties would have a better
chance of having code work without changes.

Munir Mallal

>From: mjs@hpfcso.HP.COM (Marc Sabatella) 	Date: Tue, 14 Aug 1990 
O: Hewlett-Packard, Fort Collins, CO, USA 	Newsgroups: hp-factory.std.unix
	Subject: Re: RFC: New cachectl(3C), Changed ld(1),chatr(1)

> The following paragraphs will be added to the Series 300/400 DEPENDENCIES
> section of the ld(1) man page.
>...
>               -C<cache mode specification>

'-C' is already defined for the series 800 (for parameter checking level).  It
is probably better to choose an option that does not conflict with an existing
option.  Then of course you should change chatr(1) to agree.

We will have to be careful about changing ld(1) for 7.05, since we are also
redelivering ld(1) for 7.40 (compiler release).  We are supporting both 7.0 ->
7.05 -> 7.40 and 7.0 -> 7.40 -> 7.05, and the 7.40 ld(1) is incompatible with
the 7.05 version (different debug formats).  We can add '-C' or its replacement
to 7.40 easily enough, and it can safely replace 7.05, but the converse is not
true.  You will have to deliver "ld" into /tmp (or something similar) and have
the customize script check to see if the 7.40 ld(1) is installed.  If it is, you
should just delete the 7.05 "ld".  If not, then you should move the 7.05 version
into place.

Or, we could simply not modify the 7.05 ld(1), and force programmers to use
chatr(1) or cachectl(3C).

--------------   Marc Sabatella   HP Colorado Language Lab (CoLL)   marc@hpmonk

>From: donn@hpfcdc.HP.COM (Donn Terry) 		Date: Tue, 14 Aug 1990 
Organization: HP Fort Collins, Co.  		Newsgroups: hp-factory.std.unix
	Subject: Re: RFC: New cachectl(3C), Changed ld(1),chatr(1)

Several comments on this, all having more to do with presentation than content.
(Would someone who's more current on memory management than I take a look at the
content!)

1) The update_info section has the characteristic of something that the average
   reader would read the first paragraph and then skip, as he has no idea as to
   whether it applies to him or not.  It should be restated that "there are a
   few applications that might break, here are the ones we know of:  <list>.
   You need to do ____ to make them work.  If you see the following types of
   failures, you may have an additional such program.  <description>.  If so,
   you need to do <the same thing, I presume>.  For the technically minded,
   here's what's really going on...."

2) The phrase "to remain competitive" is fine for internal use, but it's bad
   stuff for external.  "To provide the best possible performance" is much
   better.  This will (of course) lead to a question about the performance
   impact of the various options, which should be given to the extent possible.
   Yes, I know it's hard.  It should explicitly give the fact that the library
   call has the smallest (I presume) impact.

3) The "current default" needs to be stated for chatr.  Yeah, it's obvious, but
   not to naive readers.  (Ditto, but not as strongly, for ld.)

4) I found cachectl() got a little too technical too fast.  State what it's good
   for up front (and observe that only certain classes of programs should even
   bother thinking about it.)  Also, it isn't necessary to say anything about
   privilege requirements.  Just describe what it does.  (Thus, if the 68050
   brings that out to user space, you don't have to change the page.)  It would
   probably be useful to indicate the "cost" of the call in the dependencies
   section.

John:  I'll bring you a marked up paper copy with other comments.	  Donn


