有趣的是,在过去10年里作为一个码农,所有我经历过的网站后台开发用的几乎都是用Ruby on Rails。不要误解,我很喜欢Ruby on Rails并且认为它是一个非常棒的开发环境。往往在一段时间后,你开始以ruby的方式来设计系统。这时你会忘记利用多线程,并行,快速执行(fast executions)和较小的内存开销(small memory overhead),软件的架构会变得简单而高效。很多年来,我一直是C/C++,Delphi和C#的开发者。我开始意识到使用正确的工具,工作会变得简单很多。
func (p *Payload) UploadToS3() error {
// the storageFolder method ensures that there are no name collision in
// case we get same timestamp in the key name
storage_path := fmt.Sprintf("%v/%v", p.storageFolder, time.Now().UnixNano())
bucket := S3Bucket
b := new(bytes.Buffer)
encodeErr := json.NewEncoder(b).Encode(payload)
if encodeErr != nil {
return encodeErr
}
// Everything we post to the S3 bucket should be marked 'private'
var acl = s3.Private
var contentType = "application/octet-stream"
func payloadHandler(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
w.WriteHeader(http.StatusMethodNotAllowed)
return
}
// Read the body into a string for json decoding
var content = &PayloadCollection{}
err := json.NewDecoder(io.LimitReader(r.Body, MaxLength)).Decode(&content)
if err != nil {
w.Header().Set("Content-Type", "application/json; charset=UTF-8")
w.WriteHeader(http.StatusBadRequest)
return
}
// Go through each payload and queue items individually to be posted to S3
for _, payload := range content.Payloads {
go payload.UploadToS3() // <----- DON'T DO THIS
}
我们需要一个不同的解决方案。在一开始,我们就讨论到需要把HTTP请求处理函数写的简洁,然后把处理工作转移到后台。当然,这是你在Ruby on Rails世界里必须做的,否则你会阻塞所有worker的工作(例如puma,unicorn,passenger等等,我们这里就不继续讨论JRuby了)。我们需要用到Resque,Sidekiq,SQS等常用的解决方案。这个列表可以很长,因为有许多方法来完成这项任务。
func payloadHandler(w http.ResponseWriter, r *http.Request) {
...
// Go through each payload and queue items individually to be posted to S3
for _, payload := range content.Payloads {
Queue <- payload
}
...
}
然后我们需要从队列里提取工作任务并进行处理。代码下图所示:
func StartProcessor() {
for {
select {
case job := <-Queue:
job.payload.UploadToS3() // <-- STILL NOT GOOD
}
}
}
坦白的说,我不知道我们当时在想什么。这肯定是熬夜喝红牛的结果。这个方法并没有给我们带来任何帮助。队列仅仅是将问题延后了。我们的处理函数(processor)一次仅上传一个载荷(payload),而接收请求的速率比一个处理函数上传S3的能力大太多了,带缓冲的channel很快就到达了它的极限。从而阻塞了HTTP请求处理函数往队列里添加更多的工作任务。
// Start method starts the run loop for the worker, listening for a quit channel in
// case we need to stop it
func (w Worker) Start() {
go func() {
for {
// register the current worker into the worker queue.
w.WorkerPool <- w.JobChannel
select {
case job := <-w.JobChannel:
// we have received a work request.
if err := job.Payload.UploadToS3(); err != nil {
log.Errorf("Error uploading to S3: %s", err.Error())
}
case <-w.quit:
// we have received a signal to stop
return
}
}
}()
}
// Stop signals the worker to stop listening for work requests.
func (w Worker) Stop() {
go func() {
w.quit <- true
}()
}
我们修改了HTTP请求处理函数来创建一个含有载荷(payload)的Job结构,然后将它送到一个叫JobQueue的channel。worker会对它们进行处理。
func payloadHandler(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
w.WriteHeader(http.StatusMethodNotAllowed)
return
}
// Read the body into a string for json decoding
var content = &PayloadCollection{}
err := json.NewDecoder(io.LimitReader(r.Body, MaxLength)).Decode(&content)
if err != nil {
w.Header().Set("Content-Type", "application/json; charset=UTF-8")
w.WriteHeader(http.StatusBadRequest)
return
}
// Go through each payload and queue items individually to be posted to S3
for _, payload := range content.Payloads {
// let's create a job with the payload
work := Job{Payload: payload}
// Push the work onto the queue.
JobQueue <- work
}
func (d *Dispatcher) Run() {
// starting n number of workers
for i := 0; i < d.maxWorkers; i++ {
worker := NewWorker(d.pool)
worker.Start()
}
go d.dispatch()
}
func (d *Dispatcher) dispatch() {
for {
select {
case job := <-JobQueue:
// a job request has been received
go func(job Job) {
// try to obtain a worker job channel that is available.
// this will block until a worker is idle
jobChannel := <-d.WorkerPool
// dispatch the job to the worker job channel
jobChannel <- job
}(job)
}
}
}
这里我们提供了创建worker的最大数目作为参数,并把这些worker加入到worker池里。因为我们已经在docker化的Go环境里使用了Amazon的Elasticbeanstalk并且严格按照12-factor方法来配置我们的生产环境,这些参数值可以从环境变量里获得。我们可以方便地控制worker数目和任务队列的长度。我们可以快速地调整这些值而不需要重新部署整个集群。
var (
MaxWorker = os.Getenv("MAX_WORKERS")
MaxQueue = os.Getenv("MAX_QUEUE")
)
部署了新版本之后,我们看到系统延迟一下子就降到了可以忽略的量级。同时处理请求的能力也大幅攀升。