Memory contains three general areas. First, calls to the
    new and delete operators, whether as
    global operators or as member functions.  Second, allocation via
    allocator. And finally, smart pointer and
    intelligent pointer abstractions.
  
 Memory management for Standard Library entities is encapsulated in a
 class template called allocator. The
 allocator abstraction is used throughout the
 library in string, container classes,
 algorithms, and parts of iostreams. This class, and base classes of
 it, are the superset of available free store (“heap”)
 management classes.
The C++ standard only gives a few directives in this area:
When you add elements to a container, and the container must allocate more memory to hold them, the container makes the request via its Allocator template parameter, which is usually aliased to allocator_type. This includes adding chars to the string class, which acts as a regular STL container in this respect.
       The default Allocator argument of every
       container-of-T is allocator<T>.
       
       The interface of the allocator<T> class is
	 extremely simple.  It has about 20 public declarations (nested
	 typedefs, member functions, etc), but the two which concern us most
	 are:
       
	 T*    allocate   (size_type n, const void* hint = 0);
	 void  deallocate (T* p, size_type n);
       
	 The n argument in both
	 functions is a count of the number of
	 T's to allocate space for, not their
	 total size.
	 (This is a simplification; the real signatures use nested typedefs.)
       
	 The storage is obtained by calling ::operator
	 new, but it is unspecified when or how
	 often this function is called.  The use of the
	 hint is unspecified, but intended as an
	 aid to locality if an implementation so
	 desires. [20.4.1.1]/6
       
     Complete details can be found in the C++ standard; look in
     [20.4 Memory].
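
     As a rough illustration of the two member functions above, the
     following sketch uses std::allocator<int> directly. The function
     name sum_via_allocator is made up for this example; the point is
     only that n counts elements, and that the storage returned by
     allocate is uninitialized.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Allocate room for n ints, fill it with 1..n, and return the sum.
// (sum_via_allocator is a hypothetical name used only in this sketch.)
int sum_via_allocator(std::size_t n)
{
    std::allocator<int> alloc;

    // n is an element count, not a byte count.
    int* p = alloc.allocate(n);

    // allocate() returns raw storage, so objects are built with placement new.
    for (std::size_t i = 0; i < n; ++i)
        ::new (static_cast<void*>(p + i)) int(static_cast<int>(i) + 1);

    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += p[i];

    // int is trivially destructible, so no destructor calls are needed here.
    // The same element count is passed back to deallocate().
    alloc.deallocate(p, n);
    return total;
}
```

     Containers perform these steps internally; user code rarely needs
     to call allocate and deallocate by hand.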
   
    The easiest way of fulfilling the requirements is to call
    operator new each time a container needs
    memory, and to call operator delete each time
    the container releases memory. This method may be slower
    than caching the allocations and re-using previously-allocated
    memory, but has the advantage of working correctly across a wide
    variety of hardware and operating systems, including large
    clusters. The __gnu_cxx::new_allocator
    implements the simple operator new and operator delete semantics,
    while __gnu_cxx::malloc_allocator
    implements much the same thing, only with the C language functions
    std::malloc and std::free.
  
    Another approach is to use intelligence within the allocator
    class to cache allocations. This extra machinery can take a variety
    of forms: a bitmap index, an index into exponentially increasing
    power-of-two-sized buckets, or a simpler fixed-size pooling cache.
    The cache is shared among all the containers in the program: when
    your program's std::vector<int> gets
  cut in half and frees a bunch of its storage, that memory can be
  reused by the private
  std::list<WonkyWidget> brought in from
  a KDE library that you linked against.  And operators
  new and delete are not
  always called to pass the memory on, either, which is a speed
  bonus. Examples of allocators that use these techniques are
  __gnu_cxx::bitmap_allocator,
  __gnu_cxx::pool_allocator, and
  __gnu_cxx::__mt_alloc.
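
    To make the caching idea concrete, here is a minimal sketch of a
    fixed-size pooling cache. The class FixedPool and its members are
    hypothetical and are not part of libstdc++; the extension allocators
    named above are considerably more elaborate, and share their cache
    across containers rather than holding it in one object.

```cpp
#include <cstddef>
#include <new>

// Hypothetical fixed-size pool: hands out blocks of one size and keeps
// released blocks on a free list instead of returning them to operator new.
class FixedPool
{
    union Node { Node* next; };   // a free block stores its link in place

    Node*       free_list_;
    std::size_t block_size_;
    std::size_t fresh_;           // number of blocks obtained from operator new

public:
    explicit FixedPool(std::size_t block_size)
    : free_list_(nullptr),
      block_size_(block_size < sizeof(Node) ? sizeof(Node) : block_size),
      fresh_(0)
    { }

    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    ~FixedPool()
    {
        // Only cached blocks are released; blocks still handed out
        // remain the caller's responsibility.
        while (free_list_) {
            Node* n = free_list_;
            free_list_ = n->next;
            ::operator delete(n);
        }
    }

    void* allocate()
    {
        if (free_list_) {             // reuse a cached block if one exists
            Node* n = free_list_;
            free_list_ = n->next;
            return n;
        }
        ++fresh_;                     // otherwise fall back to operator new
        return ::operator new(block_size_);
    }

    void deallocate(void* p)
    {
        Node* n = ::new (p) Node;     // turn the block into a free-list node
        n->next = free_list_;
        free_list_ = n;
    }

    std::size_t fresh_allocations() const { return fresh_; }
};
```

    A block that is released and then requested again comes straight
    from the free list, so operator new is called only once for it.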
  
    Depending on the implementation techniques used, the underlying
    operating system, and compilation environment, scaling caching
    allocators can be tricky. In particular, order-of-destruction and
    order-of-creation for memory pools may be difficult to pin down
    with certainty, which may create problems when used with plugins
    or loading and unloading shared objects in memory. As such, using
    caching allocators on systems that do not support
    abi::__cxa_atexit is not recommended.
  
The only allocator interface that is supported is the standard C++ interface. As such, all STL containers have been adjusted, and all external allocators have been modified to support this change.
     The class allocator just has typedef,
   constructor, and rebind members. It inherits from one of the
   high-speed extension allocators, covered below. Thus, all
   allocation and deallocation depends on the base class.
   
     The base class that allocator is derived from
     may not be user-configurable.
It's difficult to pick an allocation strategy that will provide maximum utility, without excessively penalizing some behavior. In fact, it's difficult just deciding which typical actions to measure for speed.
Three synthetic benchmarks have been created that provide data that is used to compare different C++ allocators. These tests are:
Insertion.
Over multiple iterations, various STL container objects have elements inserted to some maximum amount. A variety of allocators are tested. Test source for sequence and associative containers.
Insertion and erasure in a multi-threaded environment.
This test shows the ability of the allocator to reclaim memory on a per-thread basis, as well as measuring thread contention for memory resources. Test source here.
A threaded producer/consumer model.
Test source for sequence and associative containers.
     The current default choice for
     allocator is
     __gnu_cxx::new_allocator.
   
      In use, allocator may allocate and
      deallocate using implementation-specific strategies and
      heuristics. Because of this, a given call to an allocator object's
      allocate member function may not actually
      call the global operator new and a given call
      to the deallocate member function may not
      call operator delete.
    
This can be confusing.
     In particular, this can make debugging memory errors more
     difficult, especially when using third-party tools like valgrind or
     debug versions of new.
   
     There are various ways to solve this problem. One would be to use
     a custom allocator that just called operators
     new and delete
     directly, for every allocation. (See the default allocator,
     include/ext/new_allocator.h, for instance.)
     However, that option may involve changing source code to use
     a non-default allocator. Another option is to force the
     default allocator to remove caching and pools, and to directly
     allocate with every call of allocate and
     directly deallocate with every call of
     deallocate, regardless of efficiency. As it
     turns out, this last option is also available.
   
     To globally disable memory caching within the library for some of
     the optional non-default allocators, merely set
     GLIBCXX_FORCE_NEW (with any value) in the
     system's environment before running the program. If your program
     crashes with GLIBCXX_FORCE_NEW in the
     environment, it likely means that you linked against objects
     built against the older library (objects which might still be using the
     cached allocations...).
  
     You can specify different memory management schemes on a
     per-container basis, by overriding the default
     Allocator template parameter.  For example, an easy
      (but non-portable) method of specifying that only malloc and free
      should be used instead of the default node allocator is:
   
    std::list <int, __gnu_cxx::malloc_allocator<int> >  malloc_list;
Likewise, a debugging form of whichever allocator is currently in use:
    std::deque <int, __gnu_cxx::debug_allocator<std::allocator<int> > >  debug_deque;
      
    Writing a portable C++ allocator would dictate that the interface
    would look much like the one specified for
    allocator. Additional member functions, but
    not subtractions, would be permissible.
  
     Probably the best place to start would be to copy one of the
   extension allocators: say a simple one like
   new_allocator.
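
     As a starting point, a minimal allocator in the spirit of
     new_allocator might look like the sketch below. It assumes C++11
     or later, where std::allocator_traits supplies the remaining
     members; a pre-C++11 allocator would need the full set of
     typedefs, rebind, construct and destroy. The names
     simple_allocator and fill_with_simple_allocator are hypothetical.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical minimal allocator: forwards every request to
// ::operator new and ::operator delete, with no caching.
template<typename T>
struct simple_allocator
{
    typedef T value_type;

    simple_allocator() noexcept { }

    // Converting constructor, needed so containers can rebind the
    // allocator to their internal node types.
    template<typename U>
    simple_allocator(const simple_allocator<U>&) noexcept { }

    T* allocate(std::size_t n)
    {
        // Guard against overflow in n * sizeof(T).
        if (n > static_cast<std::size_t>(-1) / sizeof(T))
            throw std::bad_alloc();
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) noexcept
    { ::operator delete(p); }
};

// The allocator is stateless, so all instances compare equal.
template<typename T, typename U>
bool operator==(const simple_allocator<T>&, const simple_allocator<U>&) noexcept
{ return true; }

template<typename T, typename U>
bool operator!=(const simple_allocator<T>&, const simple_allocator<U>&) noexcept
{ return false; }

// Usage: fill a vector that allocates through simple_allocator.
inline std::size_t fill_with_simple_allocator(int count)
{
    std::vector<int, simple_allocator<int> > v;
    for (int i = 0; i < count; ++i)
        v.push_back(i);
    return v.size();
}
```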
   
Several other allocators are provided as part of this implementation. The location of the extension allocators and their names have changed, but in all cases, functionality is equivalent. Starting with gcc-3.4, all extension allocators are standard style. Before this point, SGI style was the norm. Because of this, the number of template arguments also changed. Here's a simple chart to track the changes.
More details on each of these extension allocators follow.
       new_allocator
       
	 Simply wraps ::operator new
	 and ::operator delete.
       
       malloc_allocator
       
	 Simply wraps malloc and
	 free. There is also a hook for an
	 out-of-memory handler (for
	 new/delete this is
	 taken care of elsewhere).
       
       array_allocator
       
	 Allows allocations of known and fixed sizes using existing
	 global or external storage allocated via construction of
	 std::tr1::array objects. By using this
	 allocator, fixed size containers (including
	 std::string) can be used without
	 instances calling ::operator new and
	 ::operator delete. This capability
	 allows the use of STL abstractions without runtime
	 complications or overhead, even in situations such as program
	 startup. For usage examples, please consult the testsuite.
       
       debug_allocator
       
	 A wrapper around an arbitrary allocator A.  It passes on
	 slightly increased size requests to A, and uses the extra
	 memory to store size information.  When a pointer is passed
	 to deallocate(), the stored size is
	 checked, and assert() is used to
	 guarantee they match.
       
	throw_allocator
	
Includes memory tracking and marking abilities as well as hooks for throwing exceptions at configurable intervals (including random, all, none).
       __pool_alloc
       
	 A high-performance, single pool allocator.  The reusable
	 memory is shared among identical instantiations of this type.
	 It calls through ::operator new to
	 obtain new memory when its lists run out.  If a client
	 container requests a block larger than a certain threshold
	 size, then the pool is bypassed, and the allocate/deallocate
	 request is passed to ::operator new
	 directly.
       
	 Older versions of this class take a boolean template
	 parameter, called thr, and an integer template
	 parameter, called inst.
       
	 The inst number is used to track additional memory
      pools.  The point of the number is to allow multiple
      instantiations of the classes without changing the semantics at
      all.  All three of
       
    typedef  __pool_alloc<true,0>    normal_pool;
    typedef  __pool_alloc<true,1>    private_pool;       // "private" alone is a keyword
    typedef  __pool_alloc<true,42>   also_private_pool;
   behave exactly the same way. However, the memory pool for each type (and remember that different instantiations result in different types) remains separate.
The library uses 0 in all its instantiations. If you wish to keep separate free lists for a particular purpose, use a different number.
The thr boolean determines whether the
   pool should be manipulated atomically or not.  When
   thr = true, the allocator
   is thread-safe, while thr =
   false, is slightly faster but unsafe for
   multiple threads.
   
For thread-enabled configurations, the pool is locked with a single big lock. In some situations, this implementation detail may result in severe performance degradation.
(Note that the GCC thread abstraction layer allows us to provide safe zero-overhead stubs for the threading routines, if threads were disabled at configuration time.)
       __mt_alloc
       
A high-performance fixed-size allocator with exponentially-increasing allocations. It has its own chapter in the documentation.
       bitmap_allocator
       
A high-performance allocator that uses a bit-map to keep track of the used and unused memory locations. It has its own chapter in the documentation.
Explaining all of the fun and delicious things that can
   happen with misuse of the auto_ptr class
   template (called AP here) would take some
   time. Suffice it to say that the use of AP
   safely in the presence of copying has some subtleties.
   
The AP class is a really nifty idea for a smart pointer, but it is one of the dumbest of all the smart pointers -- and that's fine.
AP is not meant to be a supersmart solution to all resource leaks everywhere. Neither is it meant to be an effective form of garbage collection (although it can help, a little bit). And it cannot be used for arrays!
AP is meant to prevent nasty leaks in the presence of exceptions. That's all. This code is AP-friendly:
    // Not a recommended naming scheme, but good for web-based FAQs.
    typedef std::auto_ptr<MyClass>  APMC;
    extern void function_taking_MyClass_pointer (MyClass*);
    extern void some_throwable_function ();
    void func (int data)
    {
	APMC  ap (new MyClass(data));
	some_throwable_function();   // this may throw an exception
	function_taking_MyClass_pointer (ap.get());
    }
   When an exception gets thrown, the instance of MyClass that's
      been created on the heap will be delete'd as the stack is
      unwound past func().
   
Changing that code as follows is not AP-friendly:
APMC ap (new MyClass[22]);
You will get the same problems as you would without the use of AP:
    char*  array = new char[10];      // array new...
    ...
    delete array;                     // ...but single-object delete
     AP cannot tell whether the pointer you've passed at creation points
      to one or many things.  If it points to many things, you are about
      to die.  AP is trivial to write, however, so you could write your
      own auto_array_ptr for that situation (in fact, this has
      been done many times; check the mailing lists, Usenet, Boost, etc).
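
      A minimal sketch of such an array-owning wrapper follows. The
      class auto_array_ptr shown here is hypothetical, and copying is
      simply disabled rather than given the ownership-transfer
      semantics of AP. The helper type Counted exists only to make
      the cleanup observable. In C++11 and later,
      std::unique_ptr<T[]> covers the same need.

```cpp
#include <cstddef>

// Hypothetical array-owning counterpart to auto_ptr: releases with delete[].
template<typename T>
class auto_array_ptr
{
    T* ptr_;

    // Copying is disabled in this sketch (declared but never defined).
    auto_array_ptr(const auto_array_ptr&);
    auto_array_ptr& operator=(const auto_array_ptr&);

public:
    explicit auto_array_ptr(T* p = 0) : ptr_(p) { }

    // The array form of delete matches the array form of new.
    ~auto_array_ptr() { delete[] ptr_; }

    T* get() const { return ptr_; }
    T& operator[](std::size_t i) const { return ptr_[i]; }

    // Give up ownership without deleting.
    T* release() { T* p = ptr_; ptr_ = 0; return p; }
};

// Helper type that counts live instances, used only to observe cleanup.
struct Counted
{
    static int live;
    Counted()  { ++live; }
    ~Counted() { --live; }
};
int Counted::live = 0;
```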
   
All of the containers described in the standard library require their contained types to have, among other things, a copy constructor like this:
    struct My_Type
    {
	My_Type (My_Type const&);
    };
   
     Note the const keyword; the object being copied shouldn't change.
     The template class auto_ptr (called AP here) does not
     meet this requirement.  Creating a new AP by copying an existing
     one transfers ownership of the pointed-to object, which means that
     the AP being copied must change, which in turn means that the
     copy ctors of AP do not take const objects.
   
The resulting rule is simple: Never ever use a container of auto_ptr objects. The standard says that “undefined” behavior is the result, but it is guaranteed to be messy.
To prevent you from doing this to yourself, the concept checks built in to this implementation will issue an error if you try to compile code like this:
    #include <vector>
    #include <memory>
    void f()
    {
	std::vector< std::auto_ptr<int> >   vec_ap_int;
    }
   Should you try this with the checks enabled, you will see an error.
The shared_ptr class template stores a pointer, usually obtained via new, and implements shared ownership semantics.
The standard deliberately doesn't require a reference-counted implementation, allowing other techniques such as a circular-linked-list.
The shared_ptr code was kindly donated to GCC by the Boost
project and the original authors of the code. The basic design and
algorithms are from Boost; the notes below describe details specific to
the GCC implementation. Names have been uglified in this implementation,
but the design should be recognisable to anyone familiar with the Boost
1.32 shared_ptr.
  
The basic design is an abstract base class, _Sp_counted_base, that
does the reference-counting and calls virtual functions when the count
drops to zero.
Derived classes override those functions to destroy resources in a context
where the correct dynamic type is known. This is an application of the
technique known as type erasure.
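
The effect of this type erasure can be observed through the public
interface alone. In the sketch below, Base, Derived and
destroy_through_base are hypothetical names; the example relies only on
the standard guarantee that shared_ptr deletes the pointer with the type
it was originally given.

```cpp
#include <memory>

struct Base { };                       // note: no virtual destructor

struct Derived : Base
{
    static int destroyed;
    ~Derived() { ++destroyed; }
};
int Derived::destroyed = 0;

// Returns how many Derived destructors ran once the last owner went away.
int destroy_through_base()
{
    Derived::destroyed = 0;
    {
        // The control block is created here, where the pointer type is
        // still Derived*, so it later runs delete on a Derived*.
        std::shared_ptr<Base> p(new Derived);
    }
    return Derived::destroyed;
}
```

With a plain Base* and a manual delete, the same code would have
undefined behavior because Base has no virtual destructor.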
  
A shared_ptr<T> contains a pointer of
type T* and an object of type
__shared_count. The __shared_count contains a
pointer of type _Sp_counted_base* which points to the
object that maintains the reference-counts and destroys the managed
resource.
    
_Sp_counted_base<Lp>
The base of the hierarchy is parameterized on the lock policy (see below). _Sp_counted_base doesn't depend on the type of pointer being managed; it only maintains the reference counts and calls virtual functions when the counts drop to zero. The managed object is destroyed when the last strong reference is dropped, but the _Sp_counted_base itself must exist until the last weak reference is dropped.
_Sp_counted_base_impl<Ptr, Deleter, Lp>
Inherits from _Sp_counted_base and stores a pointer of type Ptr
and a deleter of type Deleter.  _Sp_deleter is
used when the user doesn't supply a custom deleter. Unlike Boost's, this
default deleter is not "checked" because GCC already issues a warning if
delete is used with an incomplete type.
This is the only derived type used by tr1::shared_ptr<Ptr>
and it is never used by std::shared_ptr, which uses one of
the following types, depending on how the shared_ptr is constructed.
    
_Sp_counted_ptr<Ptr, Lp>
Inherits from _Sp_counted_base and stores a pointer of type Ptr,
which is passed to delete when the last reference is dropped.
This is the simplest form and is used when there is no custom deleter or
allocator.
    
_Sp_counted_deleter<Ptr, Deleter, Alloc>
Inherits from _Sp_counted_ptr and adds support for custom deleter and
allocator. Empty Base Optimization is used for the allocator. This class
is used even when the user only provides a custom deleter, in which case
std::allocator is used as the allocator.
    
_Sp_counted_ptr_inplace<Tp, Alloc, Lp>
Used by allocate_shared and make_shared.
Contains aligned storage to hold an object of type Tp,
which is constructed in-place with placement new.
Has a variadic template constructor allowing any number of arguments to
be forwarded to Tp's constructor.
Unlike the other _Sp_counted_* classes, this one is parameterized on the
type of object, not the type of pointer; this is purely a convenience
that simplifies the implementation slightly.
    
C++11-only features are: rvalue-ref/move support, allocator support,
aliasing constructor, make_shared & allocate_shared. Additionally,
the constructors taking auto_ptr parameters are
deprecated in C++11 mode.
    
The Thread Safety section of the Boost shared_ptr documentation says "shared_ptr objects offer the same level of thread safety as built-in types." The implementation must ensure that concurrent updates to separate shared_ptr instances are correct even when those instances share a reference count e.g.
    shared_ptr<A> a(new A);
    shared_ptr<A> b(a);

    // Thread 1     // Thread 2
       a.reset();      b.reset();
The dynamically-allocated object must be destroyed by exactly one of the threads. Weak references make things even more interesting. The shared state used to implement shared_ptr must be transparent to the user and invariants must be preserved at all times. The key pieces of shared state are the strong and weak reference counts. Updates to these need to be atomic and visible to all threads to ensure correct cleanup of the managed resource (which is, after all, shared_ptr's job!) On multi-processor systems memory synchronisation may be needed so that reference-count updates and the destruction of the managed resource are race-free.
The function _Sp_counted_base::_M_add_ref_lock(), called when
obtaining a shared_ptr from a weak_ptr, has to test if the managed
resource still exists and either increment the reference count or throw
bad_weak_ptr.
In a multi-threaded program there is a potential race condition if the last
reference is dropped (and the managed resource destroyed) between testing
the reference count and incrementing it, which could result in a shared_ptr
pointing to invalid memory.
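
The two outcomes of that test are visible through the standard interface,
as the following single-threaded sketch shows. The function name
expired_conversion_throws is hypothetical; weak_ptr::lock, the converting
constructor and std::bad_weak_ptr are standard.

```cpp
#include <memory>

// Returns true if converting an expired weak_ptr throws std::bad_weak_ptr.
bool expired_conversion_throws()
{
    std::weak_ptr<int> w;
    {
        std::shared_ptr<int> s(new int(7));
        w = s;

        // While s is alive, lock() yields a second owner.
        std::shared_ptr<int> t = w.lock();
        if (!t || *t != 7 || s.use_count() != 2)
            return false;
    }   // last strong reference dropped; the int is destroyed here

    // lock() reports failure by returning an empty shared_ptr...
    if (w.lock())
        return false;

    // ...while the converting constructor throws instead.
    try {
        std::shared_ptr<int> u(w);
    } catch (const std::bad_weak_ptr&) {
        return true;
    }
    return false;
}
```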
The Boost shared_ptr (as used in GCC) features a clever lock-free algorithm to avoid the race condition, but this relies on the processor supporting an atomic Compare-And-Swap instruction. For other platforms there are fall-backs using mutex locks. Boost (as of version 1.35) includes several different implementations and the preprocessor selects one based on the compiler, standard library, platform etc. For the version of shared_ptr in libstdc++ the compiler and library are fixed, which makes things much simpler: either we have an atomic CAS or we don't. See Lock Policy below for details.
There is a single _Sp_counted_base class,
which is a template parameterized on the enum
__gnu_cxx::_Lock_policy.  The entire family of classes is
parameterized on the lock policy, right up to
__shared_ptr, __weak_ptr and
__enable_shared_from_this. The actual
std::shared_ptr class inherits from
__shared_ptr with the lock policy parameter
selected automatically based on the thread model and platform that
libstdc++ is configured for, so that the best available template
specialization will be used. This design is necessary because it would
not be conforming for shared_ptr to have an
extra template parameter, even if it had a default value.  The
available policies are:
    
       _S_atomic
       
Selected when GCC supports a builtin atomic compare-and-swap operation on the target processor (see Atomic Builtins.) The reference counts are maintained using a lock-free algorithm and GCC's atomic builtins, which provide the required memory synchronisation.
       _S_mutex
       
The _Sp_counted_base specialization for this policy contains a mutex, which is locked in add_ref_lock(). This policy is used when GCC's atomic builtins aren't available so explicit memory barriers are needed in places.
       _S_single
       
This policy uses a non-reentrant add_ref_lock() with no locking. It is
used when libstdc++ is built without --enable-threads.
       
       For all three policies, reference count increments and
       decrements are done via the functions in
       ext/atomicity.h, which detect if the program
       is multi-threaded.  If only one thread of execution exists in
       the program then less expensive non-atomic operations are used.
     
dynamic_pointer_cast, static_pointer_cast,
const_pointer_cast
As noted in N2351, these functions can be implemented non-intrusively using the aliasing constructor. However, the aliasing constructor is only available in C++11 mode, so in TR1 mode these casts rely on three non-standard constructors in shared_ptr and __shared_ptr. In C++11 mode these constructors and the related tag types are not needed.
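
A non-intrusive cast of this kind can be sketched with the standard
aliasing constructor, as below. The helper my_static_pointer_cast and the
types Shape and Circle are hypothetical; this is an illustration of the
approach, not the code used inside libstdc++.

```cpp
#include <memory>

// Hypothetical helper: a static_pointer_cast written only in terms of the
// public aliasing constructor shared_ptr(const shared_ptr<U>&, T*).
template<typename T, typename U>
std::shared_ptr<T> my_static_pointer_cast(const std::shared_ptr<U>& r)
{
    // Shares ownership with r, but stores the converted pointer.
    return std::shared_ptr<T>(r, static_cast<T*>(r.get()));
}

struct Shape { virtual ~Shape() { } };

struct Circle : Shape
{
    int radius;
    Circle() : radius(3) { }
};

// Returns the radius seen through the cast pointer, or -1 if the two
// pointers unexpectedly do not share one reference count.
int radius_after_cast()
{
    std::shared_ptr<Shape>  s(new Circle);
    std::shared_ptr<Circle> c = my_static_pointer_cast<Circle>(s);
    return (s.use_count() == 2) ? c->radius : -1;
}
```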
enable_shared_from_this
The clever overload to detect a base class of type
enable_shared_from_this comes straight from Boost.
There is an extra overload for __enable_shared_from_this to
work smoothly with __shared_ptr<Tp, Lp> using any lock
policy.
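
From the user's side, the mechanism looks like the following sketch. The
type Node and the function owners_after_self are hypothetical names; only
the standard enable_shared_from_this interface is used.

```cpp
#include <memory>

struct Node : std::enable_shared_from_this<Node>
{
    // Returns a shared_ptr that shares ownership with the existing owners.
    std::shared_ptr<Node> self() { return shared_from_this(); }
};

// Returns the number of owners after obtaining a second pointer via self().
long owners_after_self()
{
    std::shared_ptr<Node> a(new Node);    // the constructor detects the base class
    std::shared_ptr<Node> b = a->self();  // shares a's reference count
    return a.use_count();
}
```

Calling shared_from_this on an object that is not yet owned by a
shared_ptr is an error, so objects of such types are normally created
directly into a shared_ptr.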
    
make_shared, allocate_shared
make_shared simply forwards to allocate_shared
with std::allocator as the allocator.
Although these functions can be implemented non-intrusively using the
aliasing constructor, if they have access to the implementation then it is
possible to save storage and reduce the number of heap allocations. The
newly constructed object and the _Sp_counted_* can be allocated in a single
block and the standard says implementations are "encouraged, but not required,"
to do so. This implementation provides additional non-standard constructors
(selected with the type _Sp_make_shared_tag) which create an
object of type _Sp_counted_ptr_inplace to hold the new object.
The returned shared_ptr<A> needs to know the address of the
new A object embedded in the _Sp_counted_ptr_inplace,
but it has no way to access it.
This implementation uses a "covert channel" to return the address of the
embedded object when get_deleter<_Sp_make_shared_tag>()
is called.  Users should not try to use this.
As well as the extra constructors, this implementation also needs some
members of _Sp_counted_deleter to be protected where they could otherwise
be private.
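
One way to observe the number of heap allocations is to pass a counting
allocator to allocate_shared, as in the sketch below. The names
alloc_counter, counting_allocator and allocations_for_allocate_shared are
hypothetical. With the single-block layout described above the count
would be expected to be one, but since the standard only encourages that
layout, the sketch reports the count rather than assuming it.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical counter shared by every rebound copy of the allocator.
struct alloc_counter { static int calls; };
int alloc_counter::calls = 0;

// Hypothetical allocator that counts calls to allocate().
template<typename T>
struct counting_allocator
{
    typedef T value_type;

    counting_allocator() noexcept { }

    template<typename U>
    counting_allocator(const counting_allocator<U>&) noexcept { }

    T* allocate(std::size_t n)
    {
        ++alloc_counter::calls;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) noexcept
    { ::operator delete(p); }
};

template<typename T, typename U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) noexcept
{ return true; }

template<typename T, typename U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) noexcept
{ return false; }

// Returns the number of allocations made to create one shared_ptr<int>,
// or -1 if the stored value is not the one requested.
int allocations_for_allocate_shared()
{
    alloc_counter::calls = 0;
    std::shared_ptr<int> p =
        std::allocate_shared<int>(counting_allocator<int>(), 42);
    return (*p == 42) ? alloc_counter::calls : -1;
}
```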
    
      Examples of use can be found in the testsuite, under
      testsuite/tr1/2_general_utilities/shared_ptr,
      testsuite/20_util/shared_ptr
      and
      testsuite/20_util/weak_ptr.
    
      The shared_ptr atomic access
      clause in the C++11 standard is not implemented in GCC.
    
      Unlike Boost, this implementation does not use separate classes
      for the pointer+deleter and pointer+deleter+allocator cases in
      C++11 mode, combining both into _Sp_counted_deleter and using
      std::allocator when the user doesn't specify
      an allocator.  If it were found to be beneficial, an additional
      class could easily be added.  With the current implementation,
      the _Sp_counted_deleter and __shared_count constructors taking a
      custom deleter but no allocator are technically redundant and
      could be removed, changing callers to always specify an
      allocator. If a separate pointer+deleter class was added the
      __shared_count constructor would be needed, so it has been kept
      for now.
    
      The hack used to get the address of the managed object from
      _Sp_counted_ptr_inplace::_M_get_deleter()
      is accessible to users. This could be prevented if
      get_deleter<_Sp_make_shared_tag>()
      always returned NULL, since the hack only needs to work at a
      lower level, not in the public API. This wouldn't be difficult,
      but hasn't been done since there is no danger of accidental
      misuse: users already know they are relying on unsupported
      features if they refer to implementation details such as
      _Sp_make_shared_tag.
    
tr1::_Sp_deleter could be a private member of tr1::__shared_count but it would alter the ABI.
Thanks to the original authors of the Boost shared_ptr, which is really nice code to work with, and to Peter Dimov in particular for his help and invaluable advice on thread safety. Thanks also to Phillip Jordan and Paolo Carlini for the lock policy implementation.