Looking for FASTEST putimage/getimage

I'm working on a GUI in 640x480x16 mode for a program.
I made Windows 95-like windows that you can move and all that, but I need
really fast put/getimage procedures (hopefully 32-bit assembler or
whatever) for use with BP7.0
(real mode, actually).

Hope you can help me ;)

sXi