# TODO - if a symlink is under /www/, use Location to redirect to that.
#   Would need to stat all parent dirs for each file.

# ab results:
# ab -c 500 -n 4000 http://sam.test/
# Concurrency Level:      500
# Time taken for tests:   0.581 seconds
# Complete requests:      4000
# Requests per second:    6883.56 [#/sec] (mean)

# TODO for speed - epoll for linux, sendfile
# multi-threaded scheduler (2 or 3 threads).
#   Avoid scheduling 2 processes attached to the same shuttle at the same time,
#   i.e. check safeness first and skip unsafe ones.
#   How to lock a critical section? futex?
#   Allow other exclusions? or locks / semaphores for different resources?
#
# cache file-descriptors? use mmap? sendfile would be faster. use tcp_cork
# Send gzipped data if it's there - read the gzipped file instead of the
#   original, then uncompress it. That way more files can be kept in the
#   page cache. See the tux doc for hints on how to do that.
# use a memory pool allocator?

# max_line_length = 2048
# lines longer than max_line_length are not nul-terminated by breadln automatically
# I think breadln should just return EOF if it gets a line like that; it's an
# error condition.

# attempts to speed it up for large files:
sched_busy = 128
block_size = 800*1024

users = load_passwd()
# If I were to use this with lots of users, it would be better to remember
# just the hashed password, not all the other fields of the user.
# users with shell /bin/false or /usr/sbin/nologin should be excluded?
# TODO option to load alternate user / shadow files
# TODO run_until_blocked() - so I can listen then fork

nodelay(fd)
# My version of http prefers headers to be in a separate packet from
# the request / response body. This is so that the server can read the
# headers, but a CGI script can receive the body (and possibly a similar
# separation on the client). Otherwise the server needs to pipe data to
# the CGI script or something like that.
# FIXME use recv instead of read to support this.

# FIXME server should send 200 HTTP OK for CGI scripts?

# It (currently) uses 836 bytes per proc and 16kb * 2 for buffers; there will
# be a bit more memory used too (and another 16kb buffer while sending a file).
# So that comes to about 50kb per proc.

# cgi:
# I would like to be able to just Dup2() the file descriptors
# to the CGI script and then forget about it. But some data was
# already buffered from the request.
# I could pass the data that was already buffered in an
# argument or env variable, but that is non-standard.
# I'll put the first part of the body which was read already in
# REQUEST_BODY_1. This is dodgy, as it cannot handle a nul in it.
# TODO set standard CGI environment

eif status != 0
	# failed, or killed by signal
	# There might have been some output already,
	# or the input might not be fully read.
# okay
# No keep_alive for CGI at the moment.

# put:
# PUT replaces a file, POST appends to it.
# PUT with the Content-Range header truncates after the data written,
# if cr_byte1 >= size-1 (see the sketch below). POST with Content-Range
# does not truncate.
# FIXME it opens the file as the user, but writes as root;
# I think this means it would not respect any quota that might exist.
# TODO put with content-range to be like splice, so data can be inserted?

if count < reqlen
	# we don't unlink the partial file
	# FIXME it's not really a bad req, probably a broken connection
	reqlen = 0
	bad
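# A minimal sketch of the PUT-with-Content-Range behaviour described above,
# in plain POSIX C rather than this codebase's language. put_range and
# cr_byte0 are illustrative names; the fd is assumed to be already open.

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

static int put_range(int fd, const char *data, size_t len, off_t cr_byte0)
{
	struct stat st;
	if (fstat(fd, &st) < 0)
		return -1;
	size_t done = 0;
	while (done < len) {
		ssize_t n = pwrite(fd, data + done, len - done,
		                   cr_byte0 + (off_t)done);
		if (n < 0)
			return -1;
		done += (size_t)n;
	}
	/* PUT semantics: if the written range reaches or passes the old end
	 * of the file, drop anything after it; POST would skip this truncate. */
	off_t end = cr_byte0 + (off_t)len;
	if (end >= st.st_size && ftruncate(fd, end) < 0)
		return -1;
	return 0;
}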
def discard_req()
	# read request if any and discard it
	# FIXME it should check if there is a POST / PUT method
	# FIXME what if we have POST or PUT but no Content-Length -
	#   read until EOF and no keep_alive?
	# FIXME if there is an unnecessary request body, we should give an error
	#   after discarding the message.
	if reqlen
		verbose(" discarding request of length %lld", (long long int)reqlen)
		# TODO fix this to discard it in blocks, a request can be big!
		# TODO limit request size, line size
		bread(in, reqlen)
		verbose(" got %lld bytes", (long long int)buflen(&in))
		buffer_shift(&in, imin(reqlen, buflen(&in)))

# read_headers
# TODO hash the headers first? or use some better dispatch
# TODO use try if it works?

Seteuidgid(u)
# must reset this to root before switching! a bit ugly

# httpd GET only executes a cgi program if there is a ? in
# the URL. Otherwise, GET retrieves the program.

created
	code = 201 ; msg = "Created"
	# FIXME should return a Location header, and the URL in the body too.
	mesg

# I use bwrite_direct to save copying the data again,
# so I need to free the out buffer first and recreate it after.
buffer_free(&out)

# I am not using nonblock on the files now. Why not?
# nonblock(file_fd)

# TODO set standard CGI environment

# dodgy: apparently I need to indent this because it's in a macro.
	waitchild(child, status)

# put_file
# this copy routine seems overly complex

# in get_dir, the user calls slurpdir, and root stats the nodes
# loaded directory ok, root can stat the nodes
vec *v = slurpdir(fullpath)  # TODO use d_type
cstr entpath = path_cat(fullpath, *i)  # or could chdir

# size_t sendfile_block_size = 800*1024

# Maybe use tcp_cork a bit more widely, especially if not using keep_alive, so
# the close packet will go with the body packet.

# try using listener_try also, should be able to accept more connections
# quickly

# DONE
# - avoid buffer_shift - use circbuf and readv? or read all the headers
#   into a single buf before trying to parse them.

# TODO
# - spread connections evenly between processes in a nice way

# TODO
# - cache formatted mtime dates per file, even for large files?
# - check sysprof again
# - fix so that only one process listens, but it passes sockets to the other
#   process (see the fd-passing sketch after this list). Or only one process
#   at a time polls the listen sockets (else they get woken up for nothing).
# - fix so that with epoll it doesn't listen for write at the start; it must
#   call write() first. Try so that it must call read() first too.
# - epoll users that read or write must call read() or write() to register to
#   receive events before any other blocking operation - otherwise they might
#   miss an event. This is a problem. Try disabling and enabling the
#   notifications, or something else; maybe store 1 where the pointer should
#   go if it's NULL and an event is fired, then when they call read() or
#   write() it returns immediately.
# - format() was another heavy function, reduce dependence on that

# FIXME
# - on socket error, it doesn't free stuff
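# A minimal sketch of passing an accepted connection fd to another process
# over a unix-domain socketpair() with SCM_RIGHTS, in plain POSIX C rather
# than this codebase's language. send_fd / recv_fd are illustrative names;
# chan is assumed to be one end of a socketpair(AF_UNIX, SOCK_STREAM, 0, sv)
# created before forking.

#include <string.h>
#include <sys/socket.h>

static int send_fd(int chan, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } u;
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
	                      .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
	struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
	c->cmsg_level = SOL_SOCKET;
	c->cmsg_type = SCM_RIGHTS;
	c->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(c), &fd, sizeof(int));
	return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

static int recv_fd(int chan)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } u;
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
	                      .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
	if (recvmsg(chan, &msg, 0) <= 0)
		return -1;
	struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
	if (!c || c->cmsg_type != SCM_RIGHTS)
		return -1;
	int fd;
	memcpy(&fd, CMSG_DATA(c), sizeof(int));
	return fd;
}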
# TODO hack this in if there is no sendfile...

## buffer_free(&out)
#
# if method == HTTP_GET   # not HTTP_HEAD
#	init(&fr, reader, file_fd)
#	sh(buffer, &fr, out, This, fio)
#	init(&fio, buffer, block_size)
#	start(&fr)
#
#	repeat
#		pull(fio)
#		if !buflen(&fio)
#			break
#		off_t dsize = buflen(&fio)
#		if dsize > size
#			dsize = size
#			buffer_set_size(&fio, dsize)
#		size -= dsize
#		bwrite_direct(out, buffer_range(&fio))
##		bflush(out)
#		if buflen(&out)   # error
#			keep_alive = 0
#			break
#		buffer_clear(&fio)
#		if !size
#			break
#		push(fio)
#
#	buffer_free(&fio)
#
## init(&out, buffer, block_size)

# functions using CPU, in order:

25.09	total
 5.78	Sprintf (bsayf)    ***
 3.70	httpd_f itself
 0.68	atof               fixed
 0.60	strcasecmp         mostly changed to use strcmp
 0.21	strstr
 0.11	free
 0.06	strcmp
 0.05	memmove
 0.05	memcpy
 0.05	gmtime
 0.04	strlen             improved
 0.04	writev
 0.03	ioctl
 0.03	strftime

 1.54	malloc             ***
 1.52	strftime           ***
 0.9	realloc            ***
 0.52	gmtime             ***
 0.42	free               ***
 0.31	strdup             ***
 0.29	strlen             improved
 0.21	strrchr
 0.18	memcpy
 0.18	memchr
 0.13	strcmp
 0.11	strchr
 0.07	strcasecmp
 0.07	writev
 0.05	gettimeofday
 0.04	strtod

To speed this up:

5.78	Sprintf (bsayf)    ***
	I can cache the headers per object; all that changes is the date.
	Or do like litespeed and use writev to output each header separately...?
	That is probably a good idea.

Maybe one reason my server is faster is that it doesn't do logging yet!
Or lots of other things...

Use threads in the scheduler. How would that work?
A separate scheduler / queue for each CPU?

- add IO handlers and wait handlers to the normal sched queue once they are
  triggered, instead of running them directly
- run the waiter / IO stuff as one or two procs instead of hardcoded in the
  scheduler??? that would make things more pluggable.
- give each proc a "preferred thread"
- the scheduler would have to lock while finding the next proc for a thread,
  or adding a new proc to the queue, etc. (check all scheduler data structures).
  Could do all communication with the scheduler via shuttles? and then the
  scheduler itself becomes a single task..?? a bit too meta. ???
- maybe have two separate queues, one for each thread, so no need for locking
  - how to do migration?
- how to share a single cache hashtable? posix_mutex_lock? looks slow.
  Or should I just forget about the cache and use sendfile even for small files?
  If I use threads, I would have to lock.

- I'll try Russell's suggestion: use mmap to create a page of shared memory
  before forking, and each process can store connection counts there.
  Before calling accept, a process checks whether it has more connections
  than the other. If it has more, it does not call accept; it signals the
  other process to call accept. (see the sketch below this list)
- only server #0 will be listening and accepting. Server #0 will send file
  descriptors to the server with the fewest active connections, using unix
  sockets created with socketpair. A shared mmap'd page will be used to
  store the connection counts.
- the cache should be shared also. how? memcached?
- use open first then fstat, instead of stat then open?
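# A minimal sketch of Russell's shared-count idea above, in plain Linux C
# rather than this codebase's language. NSERVERS, conn_count and the function
# names are illustrative, and the part that signals the other process to
# accept is left out.

#include <stdatomic.h>
#include <sys/mman.h>

#define NSERVERS 2

/* conn_count[i] = open connections in server i; mapped before fork() so
 * both processes share the same page. */
static _Atomic int *conn_count;

static int shared_counts_init(void)
{
	conn_count = mmap(NULL, sizeof(_Atomic int) * NSERVERS,
	                  PROT_READ | PROT_WRITE,
	                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	return conn_count == MAP_FAILED ? -1 : 0;
}

/* In server `me`, before accept(): only accept if we are not busier than
 * the other server, otherwise leave the connection for it. */
static int my_turn_to_accept(int me)
{
	return atomic_load(&conn_count[me]) <= atomic_load(&conn_count[1 - me]);
}

/* Call on accept / close to keep the shared counters current. */
static void conn_opened(int me) { atomic_fetch_add(&conn_count[me], 1); }
static void conn_closed(int me) { atomic_fetch_sub(&conn_count[me], 1); }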