编程 | 视频剪辑自动转写字幕

2022/05/14 · 阅读需 15 分钟

语音转写市场概览
网易见外的缺点
1. FCPX 不支持网易见外导出的 srt 字幕文件
2. 网易见外的转写结果需要手动进行长度切割
3. 网易见外只支持后期文本替换，而不支持前期预设词库
4. 网易见外不支持基于鼠标点击的文本位置智能跳转语音并播放
讯飞转写
1. 讯飞服务价格
2. 讯飞语音转写控制台
3. 讯飞语音转写的使用
4. 讯飞语音转写使用分词
5. 讯飞语音转写程序
基于 Automator 实现右键语音文件后台自动转写

语音转写市场概览

目前国内中文语音转文字，做的最好的应该是科大讯飞（可惜要付费）：

控制台-讯飞开放平台

所以其实很多 UP 主用的是网易见外之类的免费转写产品。

其他转写平台还有，比如：

网易见外的缺点

我实际体验下来，网易见外还是不太能满足我的需求的，主要如下：

FCPX 不支持网易见外导出的 srt 字幕文件

这个我之前就了解了一下，发现大家都是用一些第三方软件再将网易见外的 srt 转换成另一种 srt，不禁引起了我的好奇，到底为啥网易见外的 srt 不支持 fcpx 导入。

我对比了讯飞转写和网易见外的 srt 文件，发现两者的唯一区别就在于编序下标不同：网易见外从 0 开始标记，而讯飞转写从 1 开始：

由此大胆猜测：FCPX 对 srt 文件的硬性要求就是从 1 开始编码！

于是我将网易见外的第 0 号编码对应的字幕内容去掉，重新导入 FCPX，结果成功了，此即证实了我的猜想。

了解了原因之后，这个问题就变得不那么恐怖了，随便用什么脚本语言把网易见外的 srt 文件里的序号全部加 1，或者干脆把第一条去掉即可。不是什么难事。
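The renumbering described above can be sketched in a few lines of Python. This is a minimal sketch, not the script I actually used: the function name `renumber_srt` is made up for illustration, and it assumes cues are separated by blank lines with the index on the first line of each cue.

```python
import re


def renumber_srt(text: str, start: int = 1) -> str:
    """Renumber SRT cue indexes sequentially, beginning at `start`."""
    # Drop a UTF-8 BOM if present, then split cues on blank lines.
    text = text.lstrip("\ufeff")
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    out = []
    for i, block in enumerate(blocks, start=start):
        lines = block.splitlines()
        # Only overwrite the first line when it really is a numeric index.
        if lines and lines[0].strip().isdigit():
            lines[0] = str(i)
        out.append("\n".join(lines))
    return "\n\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "0\n00:00:00,000 --> 00:00:01,000\nhello\n\n"
        "1\n00:00:01,000 --> 00:00:02,000\nworld\n"
    )
    print(renumber_srt(sample))
```

Renumbering keeps every cue, whereas deleting cue 0 loses the first subtitle line, so renumbering is the safer of the two options.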

网易见外的转写结果需要手动进行长度切割

网易见外的默认转写字幕长度是偏长的。这其实可以理解，因为长一点有助于基于上下文分析提高转写的准确度。

但问题就在于，想把分好的每句字幕再切小一点，则需要手动的在网页上点击分隔按钮一个个地调整，这个工作量就变得无法接受了。

网易见外只支持后期文本替换，而不支持前期预设词库

对于学过机器学习的程序员或者用过jieba分词之类开源代码库的人都知道，预设词库在文本分析相关领域是非常重要的，显著影响算法识别的准确度和使用体验。

然而网易见外是不支持转写之前预设词库，只可以转完之后在页面点击文本替换，由此可见网易转写的机器学习模型（如果有的话）比较低级，它只有通用的一个，不支持用户自己输入词库进行模型微调。

网易见外不支持基于鼠标点击的文本位置智能跳转语音并播放

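This is a very user-friendly feature of 讯飞转写, and I do not think it is technically hard to implement, but unfortunately 网易见外 has not built this part of the user experience either.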

基于以上原因，我认为网易见外是远未达到我对一个智能转写软件平台的目标要求的。

讯飞转写

讯飞服务价格

如果没有开通过讯飞的服务，可以申请免费使用。不过有一定要求，比如企业用户需要提交工商执照相关信息，而个人用户，看声明貌似要 19 年的新用户……反正我是不符合了（我以前应该用过）。

讯飞语音转写控制台

在控制台，可以查看自己所购买的讯飞服务详情，最主要的就是确认自己购买的是“讯飞转写”的服务，然后至少要支持中英文转写，最后在个性化热词里可以加入自己的特殊词库（用于提高准确率，非常有用）。

讯飞语音转写的使用

You can use the 讯飞 speech transcription API. Demos for the major programming languages (for example Python) can be downloaded from this page: 语音转写 API 文档 | 讯飞开放平台文档中心

建议直接对着 demo 改，否则从 0 到 1 写会很麻烦，因为要很多加密相关参数。

讯飞语音转写使用分词

在 /prepare 接口的参数中如果加上 has_participle = true，则会返回带每个词语识别的结果。

不过深入去使用词级别的解析结果也是一件不太容易的事情，因为需要我们自己去处理断句的问题。比如说要通过'wordsName': '，'去判别是个断句。
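To make the sentence-splitting problem concrete, here is a rough sketch of grouping word-level items into sentences. It assumes each item is a dict carrying the `wordsName` key mentioned above; the function name `split_sentences` and the punctuation set are my own choices for illustration, and the real response may nest these items differently.

```python
# Punctuation marks treated as sentence boundaries (an assumed, non-exhaustive set).
PUNCTUATION = set("，。？！；,.?!;")


def split_sentences(words):
    """Group word-level items into sentence strings, cutting at punctuation."""
    sentences, current = [], []
    for word in words:
        name = word["wordsName"]
        if name in PUNCTUATION:
            # A punctuation item closes the sentence collected so far.
            if current:
                sentences.append("".join(current))
                current = []
        else:
            current.append(name)
    if current:  # trailing words without closing punctuation
        sentences.append("".join(current))
    return sentences


if __name__ == "__main__":
    demo = [{"wordsName": "你好"}, {"wordsName": "，"}, {"wordsName": "世界"}, {"wordsName": "。"}]
    print(split_sentences(demo))
```

A real subtitle splitter would also need the per-word timestamps to rebuild start and end times for each sentence, which this sketch leaves out.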

基于这个角度来看，网易见外不支持按长度的智能分句，似乎也变得不是那么不能接受了。

但尽管如此，讯飞的结果依旧要比网易见外好很多。

讯飞语音转写程序

已同步在个人仓库：https://github.com/MarkShawn2020/mark_keeps_learning/blob/master/mark_scripts/voice2srt/main.py

具体脚本内容如下：

# -*- coding: utf-8 -*-
#
#   author: yanmeng2, xunfei api relative
#   author: markshawn2020, xunfei result to srt-format file, 2022-05-13
#   config: https://console.xfyun.cn/services/lfasr
#   doc: https://www.xfyun.cn/doc/asr/lfasr/API.html
#

import base64
import hashlib
import hmac
import json
import os
import sys
import time

import requests

lfasr_host = 'http://raasr.xfyun.cn/api'

# 请求的接口名
api_prepare = '/prepare'
api_upload = '/upload'
api_merge = '/merge'
api_get_progress = '/getProgress'
api_get_result = '/getResult'
# 文件分片大小10M
file_piece_sice = 10485760

# ——————————————————转写可配置参数————————————————
# 参数可在官网界面（https://doc.xfyun.cn/rest_api/%E8%AF%AD%E9%9F%B3%E8%BD%AC%E5%86%99.html）查看，根据需求可自行在gene_params方法里添加修改
# 转写类型
lfasr_type = 0
# 是否开启分词
has_participle = 'false'
has_seperate = 'true'
# 多候选词个数
max_alternatives = 0
# 子用户标识
suid = ''

N = 10 # 每隔N秒获取一次进度


class SliceIdGenerator:
    """slice id生成器"""

    def __init__(self):
        self.__ch = 'aaaaaaaaa`'

    def getNextSliceId(self):
        ch = self.__ch
        j = len(ch) - 1
        while j >= 0:
            cj = ch[j]
            if cj != 'z':
                ch = ch[:j] + chr(ord(cj) + 1) + ch[j + 1:]
                break
            else:
                ch = ch[:j] + 'a' + ch[j + 1:]
                j = j - 1
        self.__ch = ch
        return self.__ch


# -------------------- 从讯飞结果到srt的转换函数 -----------------

def time2stamp(s):
    """
    将整数时间（毫秒）转化为srt的时间戳，即 HH:MM:SS,mmm
    :param s:
    :return:
    """
    mmm = s % 1000
    s //= 1000
    SS = s % 60
    s //= 60
    MM = s % 60
    s //= 60
    HH = s
    return f"{HH:02}:{MM:02}:{SS:02},{mmm:03}"


def pickle_srt_item(index: int, mm_start: int, mm_end: int, content: str):
    """
    构建srt字幕文件的基本单元
    :param index: 字幕序号，从1开始，手动创建
    :param mm_start: 该字幕单元起始毫秒数
    :param mm_end: 该字幕单元结束毫秒数
    :param content: 该字幕单元的文本内容
    :return:
    """
    return f"{index}\n{time2stamp(mm_start)} --> {time2stamp(mm_end)}\n{content}\n"


def xunfei_json2srt(items):
    """
    将讯飞转写的输出转成srt字幕文本内容
    :param items:
    :return:
    """
    return "\n".join([pickle_srt_item(index + 1, int(item["bg"]), int(item["ed"]), item["onebest"])
                      for (index, item) in enumerate(items)])


class RequestApi(object):
    def __init__(self, appid, secret_key, upload_file_path):
        self._appid = appid
        self._secret_key = secret_key
        self._upload_file_path = upload_file_path
        self.srt_fp = os.path.splitext(self._upload_file_path)[0] + ".srt"
        print("--- initialized ---")

    # 根据不同的apiname生成不同的参数,本示例中未使用全部参数您可在官网(https://doc.xfyun.cn/rest_api/%E8%AF%AD%E9%9F%B3%E8%BD%AC%E5%86%99.html)查看后选择适合业务场景的进行更换
    def gene_params(self, apiname, taskid=None, slice_id=None):
        appid = self._appid
        secret_key = self._secret_key
        upload_file_path = self._upload_file_path
        ts = str(int(time.time()))
        m2 = hashlib.md5()
        m2.update((appid + ts).encode('utf-8'))
        md5 = m2.hexdigest()
        md5 = bytes(md5, encoding='utf-8')
        # 以secret_key为key, 上面的md5为msg， 使用hashlib.sha1加密结果为signa
        signa = hmac.new(secret_key.encode('utf-8'), md5, hashlib.sha1).digest()
        signa = base64.b64encode(signa)
        signa = str(signa, 'utf-8')
        file_len = os.path.getsize(upload_file_path)
        file_name = os.path.basename(upload_file_path)
        param_dict = {}

        if apiname == api_prepare:
            # slice_num是指分片数量，如果您使用的音频都是较短音频也可以不分片，直接将slice_num指定为1即可
            slice_num = int(file_len / file_piece_sice) + (0 if (file_len % file_piece_sice == 0) else 1)
            param_dict['app_id'] = appid
            param_dict['signa'] = signa
            param_dict['ts'] = ts
            param_dict['file_len'] = str(file_len)
            param_dict['file_name'] = file_name
            param_dict['slice_num'] = str(slice_num)
        elif apiname == api_upload:
            param_dict['app_id'] = appid
            param_dict['signa'] = signa
            param_dict['ts'] = ts
            param_dict['task_id'] = taskid
            param_dict['slice_id'] = slice_id
        elif apiname == api_merge:
            param_dict['app_id'] = appid
            param_dict['signa'] = signa
            param_dict['ts'] = ts
            param_dict['task_id'] = taskid
            param_dict['file_name'] = file_name
        elif apiname == api_get_progress or apiname == api_get_result:
            param_dict['app_id'] = appid
            param_dict['signa'] = signa
            param_dict['ts'] = ts
            param_dict['task_id'] = taskid
        return param_dict

    # 请求和结果解析，结果中各个字段的含义可参考：https://doc.xfyun.cn/rest_api/%E8%AF%AD%E9%9F%B3%E8%BD%AC%E5%86%99.html
    def gene_request(self, apiname, data, files=None, headers=None):
        response = requests.post(lfasr_host + apiname, data=data, files=files, headers=headers)
        result = json.loads(response.text)
        if result["ok"] == 0:
            print("{} success:".format(apiname) + str(result))
            return result
        else:
            print("{} error:".format(apiname) + str(result))
            # Exit with a non-zero status so callers (e.g. Automator) can detect the failure
            sys.exit(1)

    # 预处理
    def prepare_request(self):
        return self.gene_request(apiname=api_prepare,
                                 data=self.gene_params(api_prepare))

    # 上传
    def upload_request(self, taskid, upload_file_path):
        file_object = open(upload_file_path, 'rb')
        try:
            index = 1
            sig = SliceIdGenerator()
            while True:
                content = file_object.read(file_piece_sice)
                if not content or len(content) == 0:
                    break
                files = {
                    "filename": self.gene_params(api_upload).get("slice_id"),
                    "content": content
                }
                response = self.gene_request(api_upload,
                                             data=self.gene_params(api_upload, taskid=taskid,
                                                                   slice_id=sig.getNextSliceId()),
                                             files=files)
                if response.get('ok') != 0:
                    # 上传分片失败
                    print('upload slice fail, response: ' + str(response))
                    return False
                print('upload slice ' + str(index) + ' success')
                index += 1
        finally:
            # Report how far into the file we got before closing it
            print('file index: ' + str(file_object.tell()))
            file_object.close()
        return True

    # 合并
    def merge_request(self, taskid):
        return self.gene_request(api_merge, data=self.gene_params(api_merge, taskid=taskid))

    # 获取进度
    def get_progress_request(self, taskid):
        return self.gene_request(api_get_progress, data=self.gene_params(api_get_progress, taskid=taskid))

    # 获取结果
    def get_result_request(self, taskid):
        return self.gene_request(api_get_result, data=self.gene_params(api_get_result, taskid=taskid))

    def all_api_request(self):
        # 1. 预处理
        pre_result = self.prepare_request()
        taskid = pre_result["data"]
        # 2 . 分片上传
        self.upload_request(taskid=taskid, upload_file_path=self._upload_file_path)
        # 3 . 文件合并
        self.merge_request(taskid=taskid)
        # 4 . 获取任务进度
        while True:
            # 每隔N秒获取一次任务进度
            progress = self.get_progress_request(taskid)
            progress_dic = progress
            if progress_dic['err_no'] != 0 and progress_dic['err_no'] != 26605:
                print('task error: ' + progress_dic['failed'])
                return
            else:
                data = progress_dic['data']
                task_status = json.loads(data)
                if task_status['status'] == 9:
                    print('task ' + taskid + ' finished')
                    break
                print('The task ' + taskid + ' is in processing, task status: ' + str(data))

            time.sleep(N)  # 每次获取进度间隔N S

        # 5 . 获取结果
        srt_json = json.loads(self.get_result_request(taskid=taskid)["data"])
        srt_str = xunfei_json2srt(srt_json)
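        # Write as UTF-8 explicitly so Chinese subtitles do not depend on the system default encoding
        with open(self.srt_fp, "w", encoding="utf-8") as f: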
            f.write(srt_str)
        print("has written converted result into path: " + self.srt_fp)


if __name__ == '__main__':
    api = RequestApi(
        appid=os.environ["XUNFEI_APP_ID"],
        secret_key=os.environ["XUNFEI_APP_SK"],
        upload_file_path=sys.argv[1]
    )
    api.all_api_request()

基于 Automator 实现右键语音文件后台自动转写

目前我们写的脚本，输入是一个文件路径，特别的，是指我们待转的音频文件，根据讯飞接口要求，对输入的文件有以下限制：

从音频格式来看，除了aac格式不支持，其他常用的例如mp3, wav, m4a都支持了，所以还是挺广泛的。

但是每次手动调用程序去转写一个音频文件，未免还是感觉有点麻烦，尤其是对于程序员来说。

比如现在我们转写一个音频文件，需要在命令行执行如下命令：

python3 ~/mark_keeps_learning/mark_scripts/voice2srt/main.py 目标音频文件路径

虽然我们也可以使用alias手段，把这一串命令缩短成一个词，以方便我们直接使用voice2srt 目标音频文件路径 的命令完成目标，但还是要基于命令行，多有不便。

主要是，我们也不需要修改其他参数（基本按照默认即可，热词已经在讯飞官网配好了）。

那么这种纯粹基于文件的操作，在 mac 平台上最好的办法是写一个automator脚本：

这样我们就可以直接在 finder 里面右键我们的音频文件，然后在快捷菜单中找到我们的 automator 脚本选项，鼠标一点就自动转成了，非常方便！

十秒钟之后，就自动在当前文件夹内生成一个讯飞转写后的字幕文件了，非常方便：

I also redirected standard output to a log file in the root directory, so that I can check it when the program fails:

Automator 是一个很有用的工具，我也对它越来越感兴趣了，比如说还写了一个网易云音乐 ncm 转 mp3 的快捷操作，这样右键一个 ncm 文件，就能转成 mp3 了，非常方便。

但愿本文对你有帮助~

编程 | 北京租房系统设计、研究与经验 0.1.0

2022/03/09 · 阅读需 66 分钟

通过一天使用程序化手段获取两万多套房源转租信息，继而再通过程序化粗筛成两百多套，紧接着人工细筛出十几套，最后两天内实地考察九套。
在北京快速的、低成本的找房，我是如何做到的，我又有什么经验心得与启示，这是本文所要阐述的。

个人背景概要

大学实习基本都是在上海，租过远到嘉定每天来回四小时的自如、租过虹口差点要不回钱的蛋壳，租过和别人同睡一张床第二天赶紧溜走连押金都不想要的复旦老破小，租过闵大荒开始闹到威胁后续被窗外幼儿园治愈的公寓式开间，也为把项目做好租过浦东近万的酒店。

所以租房，尤其是在一线租房，确实是个头疼的问题。

现在，作为一名北漂狗，是时候认真研究一下了。

租房方案设计

个人需求分析

在每个人讨论租房这个问题之前，第一个就是个人需求的剖析。

就我接触下来，99%的人对价格还是相对敏感的，在此基础之上，有些人更重视通勤时间（比如我），有些人更重视独卫的便利性（比如某互联网男生），有些人更重视卫生条件（某金融业女生）。

而预算方面也会因为因人而异，有预算 2000 的，3000 的，4000 的，甚至 6000 的，等等。

因此从需求、预算角度，每个人所适合的策略就不太一样。

本文主要介绍的是汇总租房帖然后逐步筛选的方案，理论上适合一切找房群体，同时更适合希望较低成本找房的朋友。

公寓 vs 民宅

我过往的经历，应该来说，住公寓与酒店比较多，现在的年轻人应该大多都是这样。

公寓也并非不好，尤其是身边同学或者朋友一起住青年公寓那种，就特有氛围。

但如果是长租，以下问题就不能忽视：

首先是价格方面，同面积情况下，公寓普遍租金更高，另外商水商电大概是民水民电费用的三倍左右，比如夏天可能开着空调，民水民电的民宅花 150，但公寓就可能是 450，当然这个估算可能不准确，也因人而异
公寓由于自建等原因，隔音效果普遍不是很好，如果喜欢安静的一定要注意这一点
公寓往往缺少家的感觉

如果你能容忍或者确认以上几点，那就很适合选住公寓，因为公寓的配套设施一般更完善，比如自习室、健身房、大厅，甚至食堂等等。

总之，考虑到我的个人经历与接下来的规划，我就没有考虑公寓了。

线上 vs 线下

租房大致分纯线上、线上转线下与纯线下三种方式。

纯线下方案

首先讲一下纯线下，如果你是经济非常紧张、对居住环境不是很重视，同时社交能力较强（或长的比较友好），希望以最低成本租房的，根据一些文章指出，你可以线下走街串巷，多和低收入人群（保安、清洁工等等）聊一聊，他们通常能给你非常具有性价比的推荐。

我虽然没有通过这种方式找房，但我确实曾经在拉萨坐人力拉车时和车夫聊过，他就住在大昭寺附近，一个月印象中好像是 300（也有可能是 600）的房子，总之，是很低了，即使我们住青旅，也要三四十一天，一个月下来也是近千的。

其实吧，我觉得纯线下的最主要条件，还是你要能来事，理论上不单单是光和低收入人群聊，你有能耐的话，也可以和包租婆聊，说不定更能找到非常具有吸引力的房源，对不~

但对于北漂狗来说，纯线下方案并不是那么容易操作。

纯线上方案

接着是纯线上方案，这个方案，应该是很多学生党比较喜欢的方式，照片视频一看，或者根据平台品牌，一下就入了。

我便是这么走过来的。我甚至是线上看好后，直接拖着箱子再去看房的，这绝对是大忌！

当然这也不是不行，比如嫌麻烦与不缺钱，就可以无脑 all in 自如（公寓相关的分析接下来章节会说），至少能保证下限。

但如果你想选民宅的话，纯线上的最大挑战，是要有足够的鉴别能力。比如某中介在朋友圈发了某套房的房源信息，一般是位置加文字加图片，你需要第一时间能够识别出信息的真实度如何，另外，同步查询这个小区的相关信息：小区建成于什么年代、附近的设施情况如何等等。

而在你未曾亲自线下跑楼之前，你其实对这些都没什么太大概念，比如什么是筒子楼、什么叫隔断、什么是团结户等等。

不过一旦你对这些概念逐步熟悉之后，看房，尤其是看中介与平台的线上房源，心中就会相对有数了，也就是说，至少不太容易出现预期差太多的情况了。

线上转线下方案

相对来说，先在线上看够足够多的房源，再精挑几个去线下调研，应该是最好的。

这里，有两点，第一点就是尽量以正常通勤工具去看房，你不能因为远或者啥的去打车看房，毕竟你是来租房通勤用的，你必须还原你的真实使用场景去体会这是不是你想要的，所以从我的感受来看，离地铁超过 1 公里的房我就不会要了，走路实在需要太久。

第二点，就是怎么说呢，并不需要刻意排斥中介。

是，中介这个职业，总让人爱恨交加，尤其是碰到就为了多坑点你钱的，我也碰到过，并且不少。但别人也确实给你提供了一定帮助，或者说，insight。

你得明白整个租房市场是怎么回事，在一线城市，绝大多数租房都是相对饱和的，好的、性价比高的房子确实很抢手，比如在国贸上班的金融白领们租房一般就是围绕这一块，相应的就会有中介专门负责这几块小区的生意，一旦有一个房间空出来就会立马挂出来，然后数个中介就能找到数个客户对该房源感兴趣。

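If you shun agents and refuse to deal with them, how will you get this listing information? Not everyone posts sublets on 豆瓣 or 闲鱼, so agents remain an important channel for finding a place. What matters is how you use them.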

我的看法是，不要只 follow 一个中介，你可以多加几个中介，综合他们的 insight，最终逐渐形成你的租房目标。但所谓的多加，也不是随便加，一定是你目标区域内的中介才是有效的，因此如何加到你目标区域内尽可能多的中介，依旧需要我们后续所用到的技术。

再总结一下，不需要排斥中介，但也不要随意加中介，如果碰到目标区域内的中介，不妨加了多和他们聊聊，可以更快地帮你熟悉目标区域。最重要的是，不要拖着箱子，然后坦诚地、大方地线下去走走，实地看看。说不定想去看的某套是不符合期望的，但其他套却很吸引你，这也是可能的。

当然，中介有风险，后面也会讲一些中介的平常套路。

线上平台选择

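There are actually many rental platforms now, such as 微信, 豆瓣小组, 闲鱼, 小红书, 贝壳找房, 链家, 58, 安居客, 自如, 泊寓 and so on.

Listings from individuals are concentrated mainly on 微信, 豆瓣小组, 闲鱼 and 58. I tried 贝壳找房 and found that its listings mostly come from 自如. 58 and 安居客 are said to have a great many fake listings, so I did not pick them. I only learned about 闲鱼 and 小红书 later, so they are not among the tools used in this search.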

本次找房主要使用的是微信和豆瓣小组，其实主要是豆瓣小组，微信的租房群也是通过豆瓣才找到的。

总体设计

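flowchart TB

subgraph day1[Day 1]
    direction TB
    subgraph platforms[Rental platforms]
        douban[豆瓣小组: 北京租房] --> wechat[微信 group chats: 北京租房]
    end
    douban -- crawler --> input[20k+ recent rental posts]
    wechat -- spreadsheet --> input
    input -- cleaning --> washed[100+ posts meeting the basic criteria]
    washed -- label one by one --> marked[10+ candidate listings]
end

subgraph day2[Day 2-3]
    marked -- contact one by one on 微信 --> connected["<10 listings to visit in person"]
    connected -- on-site visit --> favorite[1-3 finalists]
    favorite -- decide --> final[1 final choice]
end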

租房前期准备

了解行政区域规划

访问有没有一种电子地图，可显示详细的行政区域边界？ - 知乎可以很方便地查看行政区域规划地图，例如北京的：

我也是第一次看行政区域地图，一一审查他们的区号发现还挺有规律，核心第一圈东西城区 01 和 02，接着第二圈从朝阳开始顺时针——朝阳（05）、丰台（06）、石景山（07）、海淀（08），接着第三圈从房山开始逆时针——房山（11）、通州（12）、顺义（13）、昌平（14），最后第四圈有点杂乱但整体依旧是逆时针：大兴（15）、怀柔（16）、平谷（17）、密云（18）、延庆（19）。

了解行政区划的目的，主要是分清自己大概会租什么位置的房子，因为现在大多数租房软件都会提供基于行政区划与地铁等多种筛选方式，也许地铁比行政区域更重要一些，但提前了解一下行政区域也会有一个比较重要的区位概念。

至少，我了解下来，断定，我大概会在朝阳与丰台这两个区之间做选择，这是比较重要的信息。也就是说，后续看到转租帖，不是这两个区的基本都可以 pass 了。

了解地铁线路规划

从我的角度，行政区域对自己的实际参考意义并不大，作为北漂狗，最重要的依据还是地铁，谁能把握住地铁的脉络，谁就能把握住性价比的尾巴。

所以首先找一张地铁线路图，这个随便一搜就能找到，比如这是百度地图里的北京地铁图：北京市地铁图 - 百度地图

由于公司在四惠南边，所以我从四惠站出发，沿各个方向延伸七站左右，其中遇到换乘加算一站。（七站大概是半小时）
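The "seven stops, with a transfer counted as one extra stop" rule can be written as a small shortest-path search. The sketch below is only an illustration under my own assumptions: the function name `stations_within`, the edge format `(station_a, station_b, line)` and the toy network with placeholder names are all hypothetical, not real Beijing subway data.

```python
import heapq


def stations_within(edges, start, max_cost):
    """Return {station: cost}; each hop costs 1, each line change costs 1 more."""
    graph = {}
    for a, b, line in edges:
        graph.setdefault(a, []).append((b, line))
        graph.setdefault(b, []).append((a, line))

    best = {}                  # cheapest cost found per station
    seen = set()               # visited (station, arriving line) states
    heap = [(0, start, "")]    # "" means no line has been boarded yet
    while heap:
        cost, station, line = heapq.heappop(heap)
        if (station, line) in seen:
            continue
        seen.add((station, line))
        # Dijkstra pops costs in non-decreasing order, so the first pop is minimal.
        best.setdefault(station, cost)
        for nxt, nxt_line in graph.get(station, []):
            transfer = 1 if line and nxt_line != line else 0
            new_cost = cost + 1 + transfer
            if new_cost <= max_cost:
                heapq.heappush(heap, (new_cost, nxt, nxt_line))
    return best


if __name__ == "__main__":
    # Toy network: placeholder stations A..E on two hypothetical lines.
    toy = [("A", "B", "L1"), ("B", "C", "L1"), ("B", "D", "L7"), ("D", "E", "L7")]
    print(stations_within(toy, "A", 3))
```

With real line data, calling it with `max_cost=7` from the office's station would list every station inside the half-hour radius described above.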

如此来看，其实我的可选范围还是挺多的，整个北京东南方向的一大块都包括了。

但为了控制成本，我特意标记了地铁外环的几个拐角，直觉上这里价格会略低些：

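百子湾: at the corner of Line 7, and also the candidate closest to the office, within walking distance
北工大西门: at the corner of Line 14; its biggest advantage is a direct ride to 望京
分钟寺 and 成寿寺: on Line 10; both were recommended by a friend who also works in Beijing, and they are indeed cheap
青年路: on Line 6, just outside the ring, next to 朝阳公园
通州北苑: located in 通州区, cheaper still, with a 万达广场 nearby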

这几个锚定的区域都需要重点观察。

了解周遭小区分布

我们可以在北京地图找房北京小区地图北京房产地图(北京链家) 简单看到大部分小区的分布情况，例如定位到我司附近的百子湾小区：

再点击一次可以看到具体到每个小区的情况，下面部分有一大块的空白看样子是商区（不确定）。

由于我司位于源创空间大厦，离周围的地铁都比较远，步行距离接近 2 公里（通勤时间可以以步行的 10 分钟/公里计算），因此光是从最近的地铁站步行到公司都要花 20 分钟，所以在租房时要考虑这个问题。

具体地说，如果我的通勤时间要控制在半小时以内，那么大概率远距离的长地铁方案就要 GG 了。（长地铁方案：选择离公司较远、离地铁较近、具有较高性价比租房的方案）比如以我司以及周围的三/四个地铁站为中心，附近有数个小区（高德地图，链家看起来太费劲了，且不全）：

这些小区中，最优先考虑的应该是在四惠东、四惠、大郊亭、百子湾（逆时针）这几个地铁站之间的小区：金海国际、金都杭城、沿海赛洛城和百子湾家园。其中金海国际可能地铁位置最优（仅次于九龙山附近的珠江帝景）但最贵（二手房价格 6w+），沿海赛洛城性价比最高（离地铁次近，二手房价格 4w+），金都杭城与百子湾家园离地铁更远。

其次就是四惠河北部的惠民园、东恒时代、十里堡等。

最后是其他地区，例如大郊亭左边的后现代城、百子湾下面的光华新城、百子湾右边的燕宝保湾家园、四惠东右下角的西店村等。

小区的选择依据

个人认为，无论预算是否充沛，最终是否直接选择离公司最近的小区，适当地过一遍周遭小区还是比较重要的，因为你选择这里的概率会比较大。

但并非必须选项，因为我们往往会考虑地铁沿线更远的区域，尤其是在我们把通勤时间放宽到一小时、公司本身离地铁也很近、最后一公里选择骑车而非步行等前提下。

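But then we would have to look at many more residential compounds around all the extra subway stations marked in the section on subway lines above, and the workload multiplies.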

因此，关于其他小区的通勤距离这部分，我倾向于通过程序计算，找到后再单独看~

豆瓣小组租房条目的数据获取

基于我们在平台选择中的分析，豆瓣小组是本文的重点考察对象（后续可能会考虑再加入闲鱼，据说个人房源信息也不错）。

另外，至少有两种获取豆瓣小组租房信息的方式，一种是基于网络流传的 API，一种是基于我们自己的爬虫。

豆瓣小组租房信息获取的目标设计

支持搜索小组名以获得所有相关的小组
1. 例如：“北京租房”
2. 豆瓣支持小组名字相同，所以其实会有很多个同名小组
3. 为了信息的完整性，我们最好捕捉所有符合要求的小组，然后一一进行处理
支持基于小组名（实际是 id）获得该小组下的所有话题
基于话题的标题，进行地名提取与位置确认，进行转租筛选
支持基于话题再获得话题下的图片与文字
1. 如果有了图片，是很方便可以做一些可视化网站的，势必会比较方便
2. 另外也可以做筛选，你说对方要把房子租出去，结果图都不晒一张，这合适吗，这不合适，这能要吗，这肯定不能要
3. 尽管势必会有大量照骗党，杀猪磨盘，但是也不能因为这个而畏惧呢
其他

豆瓣小组 API 方案之分析与准备

之前就听说国内电影体系都是靠豆瓣的 api，因此首先就是先谷歌一下“豆瓣 api”，结果发现果然有豆瓣 Api V2（测试版） | doubanapi，注意这两个 api_key，我一开始表示怀疑，后来发现竟然真能用……

0df993c66c0c636e29ecbb5344252a4a
0b2bdeda43b5688921839c8ecb20399b

但这个网页其实没有提供豆瓣小组的 api，于是我们继续搜索豆瓣小组是否有 api 的信息豆瓣小组是否提供 API？ - 知乎：

有几个有趣的回答，第一个给出了 api 地址与形式，并且附送了一个链接：

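This so-called "source" link is rather interesting, though. It dates from 2012 and, more importantly, the page only flashes by when opened: not instantly, but after about one second it jumps to a new page. The final URL shows that it has been redirected.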

刚刚是手快，截了一下，但是光截图看不到后面的内容，而想中途拉动又不可能。

这个时候，抓包神器的作用就体现了，所有网页加载都有记录，直接把 html 文件导出，再在本地打开，就发现全出来了：

最有意思的是，这个帖子似乎是有两个“小黑客”在尝试破译豆瓣的接口，好吧，也就是我正在做的事了。

读下来饶有味道：

不过他们所描述的获取 access_token 的形式，我也不清楚豆瓣是怎么想的，可能他们同时有两套接口吧，一套 oauth2，一套 api_key。总之，我们目前应该是不需要 access_token 的，直接用 api_key 就可以，并且是固定的。

比如按照上面的截图，获取豆瓣小组里话题列表的 api 是https://api.douban.com/v2/group/${GROUP_NAME}/topics，例如：https://api.douban.com/v2/group/husttgeek/topics，这个时候如果我们在浏览器中直接打开这个网址，会遇到缺少 api_key 的警告：

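This is where the 豆瓣 API page we found at the start comes in handy: copy one of its api_key values, for example 0df993c66c0c636e29ecbb5344252a4a, append it to the URL, visit it again, and the result appears: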

得来全不费工夫！
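Putting the URL together can be sketched as follows. The path and the `apikey` query parameter follow the URLs shown in this post. The helper name `group_topics_url` is made up, and `start` and `count` as request parameters are my assumption, inferred from the field names in the response.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.douban.com/v2"


def group_topics_url(group_id, apikey, start=0, count=100):
    """Build the topic-list URL for one group (start/count are assumed parameter names)."""
    query = urlencode({"apikey": apikey, "start": start, "count": count})
    return f"{API_ROOT}/group/{quote(group_id)}/topics?{query}"


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a working key.
    print(group_topics_url("beijingzufang", "YOUR_API_KEY"))
```

Building the query with `urlencode` rather than string concatenation keeps the URL valid if a key or group id ever contains reserved characters.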

至此，其实我们的思路已经大致确定了，不过那个知乎帖子下面还有两个 git 项目，其中一个是监控小组的，倒是确有一点意义，可供闲暇参考，分别罗列如下：

基于以上分析，可见，豆瓣的 api 还是比较稳定可用的，否则该失效早失效了。

不过有些东西还差一点。其中，我们目前只知道第二点（罗列话题）与第四点（获取话题内容）应该是有了现成的 api，而第一点（搜索小组）则没有，需要我们自己破解。另外第三点是我们后续的算法部分，与豆瓣 api 无关，故暂且可以先不谈。

豆瓣小组 API 方案之如何获取搜索小组的 API

按照直觉，我们可以通过对比正常访问的网页地址与已经明确了的 api 之间的关系，得到 $f(x) = y$，然后再把这个 $f$ 作用到正常搜索网页的网址上去。

我们先试试。

但是很快我发现搜索接口是比较独立的，是一个集中式的接口，和那些分别对应的不太一样。

紧接着我又意识到，其实这么多小组，取前几个就已经比较够了。

function getGroups() {
    # sed has no non-greedy ".*?", so match "anything but %" up to the next %2F instead
    curl -s "https://www.douban.com/search?q=$1" \
    | grep -E "小组.*<a href" \
    | gsed -E "s/.*group%2F([^%]+)%2F.*/\1/"
}

这样我们就得到了豆瓣小组的前几个小组的 id，例如：

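With that, the first step works too. Step one (get the list of group ids), step two (get the topics in a group) and step four (get a topic's content) no longer pose any technical problem. What remains is to write the program and add the algorithm part of step three.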
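The same extraction can be done in Python once the search page HTML has been downloaded. This is a sketch under the same assumption as the shell version, namely that group links appear URL-encoded as `group%2F<id>%2F` inside the page. The function name and the sample HTML are made up for illustration.

```python
import re
from urllib.parse import unquote

# Matches the group id between "group%2F" and the next "%2F".
GROUP_ID_RE = re.compile(r"group%2F([^%/&]+)%2F")


def extract_group_ids(html):
    """Return group ids in page order, without duplicates."""
    ids = []
    for raw in GROUP_ID_RE.findall(html):
        group_id = unquote(raw)
        if group_id not in ids:
            ids.append(group_id)
    return ids


if __name__ == "__main__":
    # Hypothetical fragment shaped like the redirect links on the search page.
    sample = (
        '<a href="https://www.douban.com/link2/?url=https%3A%2F%2Fwww.douban.com'
        '%2Fgroup%2Fbeijingzufang%2F&query=x">北京租房</a>'
    )
    print(extract_group_ids(sample))
```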

豆瓣小组 API 方案之设计与实现

可以直接访问 https://api.douban.com/v2/group/beijingzufang/topics?apikey=0df993c66c0c636e29ecbb5344252a4a 获取 demo 结果：

基于这个返回结果，可设计数据结构如下：

from typing import List, TypedDict


class Author(TypedDict):
    name: str
    is_suicide: bool
    avatar: str  # url
    uid: str  # int
    alt: str  # url
    type: str  # UserType
    id: str  # int
    large_avatar: str  # url


class Dimension(TypedDict):
    width: int
    height: int


class Photo(TypedDict):
    size: Dimension
    alt: str  # url
    layout: str
    topic_id: str  # int
    seq_id: str  # int
    author_id: str  # int
    title: str
    id: str  # int
    creation_date: str  # datetime


class DoubanApiTopic(TypedDict):
    is_private: bool
    locked: bool
    liked: bool

    like_count: int
    comment_count: int

    id: str  # int
    created: str  # datetime
    updated: str  # datetime

    title: str
    alt: str  # url
    share_url: str  # url
    screenshot_title: str
    screenshot_url: str  # url
    screenshot_type: str
    content: str

    author: Author
    photos: List[Photo]


class DoubanApiTopicResultSuccess(TypedDict):
    """
    请求成功时的结构体
    """
    count: int  # 0 - 100
    start: int  # default: 0
    total: int
    topics: List[DoubanApiTopic]


class DoubanApiTopicResultFailure(TypedDict):
    """
    当请求失败时，就会返回这个结构体
    """
    msg: str  # "access_error"
    code: int  # 403
    request: str
    localized_message: str

有了 apikey 可以直接调用 api，获取 json 结果，然后解析，因此本过程省略，以下主要记录一下核心续爬代码：

start = 0
while True:
    result = self._get_topics_of_group(group, start, limit)

    if result.get("code", 0) == 403:
        finished_reason = f"此请求已触及该apikey<{self._apikey}>限制"
        break

    for item in result["topics"]:
        yield DoubanApiTopic(**item)

    start += limit

    if result["start"] + result["count"] >= result["total"]:
        finished_reason = "不容易，所有数据已全部提取完毕"
        break

print("finished, reason: " + finished_reason)
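The fragment above lives inside a class method. To show the same resume-until-done loop in a self-contained form, here is a sketch where the page fetcher is passed in as a function. The name `iter_topics` and the `fetch_page(start, limit)` signature are hypothetical. The stop conditions mirror the ones above: a 403 response, or `start + count >= total`.

```python
from typing import Callable, Dict, Iterator


def iter_topics(fetch_page: Callable[[int, int], Dict], limit: int = 100) -> Iterator[Dict]:
    """Yield topics page by page until the API refuses or the data runs out."""
    start = 0
    while True:
        result = fetch_page(start, limit)
        if result.get("code", 0) == 403:
            return  # the api key has hit its limit
        topics = result.get("topics", [])
        yield from topics
        # An empty page is treated as the end too, to avoid looping forever.
        if not topics or result["start"] + result["count"] >= result["total"]:
            return
        start += limit


if __name__ == "__main__":
    data = list(range(250))

    def fake_fetch(start, limit):
        # Stand-in for the real HTTP call, so the loop can be tried offline.
        chunk = data[start:start + limit]
        return {"start": start, "count": len(chunk), "total": len(data),
                "topics": [{"id": i} for i in chunk]}

    print(len(list(iter_topics(fake_fetch))))
```

Injecting the fetcher keeps the paging logic testable without touching the network or spending api key quota.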

豆瓣小组爬虫方案之设计与实现

豆瓣的前端页面并不复杂，就是一个传统的 table 结构，按序解析即可。

有一个问题，是关于多个小组的，由于豆瓣小组针对某一话题会有多个不同的小组，甚至名字相同（但 id 唯一），这就导致需要我们对每个小组分别进行数据提取。有一种办法可以避免这样，那就是集中采集我的小组讨论页面。

比如，我所有的小组都是关于租房的，一共有三个小组：北京租房 | 北京无中介租房 | 北京租房房东联盟（中介勿扰），因此这个页面就是这些小组里所有最新的帖子情况。

但这种办法的缺点有二，一个是你要主动过滤非租房相关的小组，另一个就是不支持提供发帖者相关的信息，这在筛选上，缺失了一个比较有用的维度（不过其实我并没有使用这个维度）。

于是，基于特定小组的话题列表的字段设计如下：

class Topic(TypedDict):
    response_latest_time: datetime
    post_title: str
    post_url: str
    author_name: str
    author_url: str
    response_count: int


TopicColumns = ["response_latest_time", "post_title", "post_url", "author_name", "author_url", "response_count"]

直接使用 python 的爬虫基操（requests + beautifulsoup4）即可，核心解析代码如下：

rows = soup.select("#content .article tr")[1:]

for row in rows:
    post_title = row.select_one("td:nth-of-type(1) a")["title"]
    post_url = row.select_one("td:nth-of-type(1) a")["href"]
    author_url = row.select_one("td:nth-of-type(2) a")["href"]
    author_name = row.select_one("td:nth-of-type(2) a").text
    response_count = int(row.select_one("td:nth-of-type(3)").text or 0)

    datetime_str = row.select_one("td:nth-of-type(4)").text
    if not re.match("20", datetime_str):  # test if it's year of 20XX
        datetime_str = f"{datetime.now().year}-{datetime_str}"
    response_latest_time = datetime.strptime(datetime_str, "%Y-%m-%d %H:%M")

    yield Topic(post_title=post_title, post_url=post_url, author_url=author_url,
                author_name=author_name, response_count=response_count,
                response_latest_time=response_latest_time)
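One detail in the parsing above deserves care: when the page omits the year, prefixing the current year gives a date in the future for posts made in late December and crawled in early January. A possible fix, sketched here with a hypothetical helper name and the assumed formats `MM-DD HH:MM` and `YYYY-MM-DD[ HH:MM]`, is to fall back to the previous year whenever the result would be later than now.

```python
from datetime import datetime


def parse_reply_time(text, now):
    """Parse a last-reply time cell, resolving a missing year relative to `now`."""
    text = text.strip()
    if text.startswith("20"):
        # The year is already present; accept it with or without a time part.
        for fmt in ("%Y-%m-%d %H:%M", "%Y-%m-%d"):
            try:
                return datetime.strptime(text, fmt)
            except ValueError:
                continue
        raise ValueError(f"unrecognized datetime: {text!r}")
    # Year missing: try this year first, then last year if that lands in the future.
    for year in (now.year, now.year - 1):
        try:
            parsed = datetime.strptime(f"{year}-{text}", "%Y-%m-%d %H:%M")
        except ValueError:
            continue  # e.g. "02-29" in a non-leap year
        if parsed <= now:
            return parsed
    raise ValueError(f"cannot resolve year for: {text!r}")


if __name__ == "__main__":
    print(parse_reply_time("12-30 08:15", datetime(2022, 1, 5, 12, 0)))
```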

豆瓣小组租房条目的数据分析

筛选条件分析

在获取到豆瓣租房的目标条目之后，第一步要做的就是程序化筛选。

目前使用的可供筛选的组合条件如下：

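(Recommended) Posted within the last N days. Good rooms are usually rented out quickly, so listings that have stayed up for a long time are not worth a look. Setting N to 7 or 10 is a reasonable choice for now.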
（推荐）如果是男生求租，标题中带“女生”、“限女”等字样的一般不予考虑，并且此类房间不在少数
（参考）基于一些简单规则提取标题中的价格信息，例如“3000”或者“3 千”等，如果匹配上，便使用预设的价格区间进行筛选
（参考）对评论数进行筛选，评论数一定程度上代表着房源的质量，或发帖者的积极性（有些发帖者会自己给自己顶帖，说明不是中介），因此筛选评论数大于 0 是非常有意义的，但需要谨慎，因为有很多帖子评论数为 0 但房源确实是有效的
（参考）继续下爬发帖者的个人信息，鉴定其豆瓣的活跃度，以筛选出非中介的发帖者，但这个就涉及到二级页面的爬取，是一个比较大的开销，需要平衡是否有必要
（参考）由于帖子的标题比较个性化，小区名、地铁线路信息、行政区域信息等的曝光度均不同，因此基于正向筛选有可能会误杀一些帖子（例如以标题中存在朝阳二字筛选帖子，就有可能把只写了朝阳内部小区名字的帖子给误过滤掉），与此相比反向筛选比较有效（例如如果标题中出现了大兴二字，一般就肯定不是朝阳区域的房源）
（推荐）本项目最有意思的一点是做了智能测距筛选，具体做法就是将豆瓣标题信息喂给高德的区位编码 api 得到经纬度，再结合目标公司经纬度使用高德的线路规划 api 算出通勤时间，然后再进行筛选，该解决方案应该是目前网络上独一份，实际使用下来，也许有误杀，但很有效。
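Two of the title-based rules above, the price range and the reverse keyword filter, can be sketched as follows. This is an illustration rather than the project's actual code: the function names, the regular expressions and the blocked-word list are my own assumptions, and a bare four-digit number such as a year would be misread as a price.

```python
import re


def extract_price(title):
    """Guess a monthly rent from a title, e.g. "3000", "3 千" or "3.5k"; None if absent."""
    match = re.search(r"(?<!\d)(\d{4,5})(?!\d)", title)
    if match:
        return int(match.group(1))
    match = re.search(r"(?<![\d.])(\d(?:\.\d)?)\s*[千kK]", title)
    if match:
        return int(round(float(match.group(1)) * 1000))
    return None


def keep_title(title, low, high, blocked_words):
    """Drop titles with a blocked word; apply the price range only when a price is found."""
    if any(word in title for word in blocked_words):
        return False
    price = extract_price(title)
    return price is None or low <= price <= high


if __name__ == "__main__":
    blocked = ["大兴", "限女"]  # example reverse-filter keywords
    for title in ["百子湾 次卧 3 千", "大兴 主卧 2500", "四惠 主卧3.5k 独卫"]:
        print(title, extract_price(title), keep_title(title, 2000, 4000, blocked))
```

Titles without a recognizable price are kept on purpose, matching the rule above that the price range is applied only when a price is matched.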

高德 API 之申请 Key 配置

访问 https://console.amap.com/dev/user/permission/authenticate/person 基于支付宝实名认证申请 key：

很快，认证成功：

创建一个 key:

获得 key：33e2719dbfXXXXXXXX1b2e6786052

最后再看一下服务配额：

高德 API 调用之地理编码与逆地理编码

参考地理/逆地理编码-API 文档-开发指南-Web 服务 API | 高德地图 API 可以知道，高德地理编码就是将地址文本转成经纬度。而后续的比如说查询两个点之间的通勤距离都是基于经纬度的，因此地理编码 api 非常重要。

重点参数是 address，即输入的地址。尽管 api 介绍说是一个结构化地址信息，但实际经过我的测试，一个非结构化的但包含地址的句子也是可以的，高德比较智能，会提取出它认为最有可能的地址，这个非常有用！因为允许我们将豆瓣帖子标题直接喂进地理编码而不需要预先提取出地理信息，这让我原先设想的基于 jieba 分词并多次调用高德 api 进行测试的复杂、低效、高耗方案变得不再需要！

第二个重要参数是city，我们只需要指定为北京，这样便免去了全国搜索，速度更快准确度更高。

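The third important parameter is batch, which lets a single API call return the search results for several keywords at once. This could greatly reduce our API usage, since the 高德 API has a quota (though I am not sure whether a batch request is billed as one call or as N calls, and I have not used it yet).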

同样，参考地理/逆地理编码-API 文档-开发指南-Web 服务 API | 高德地图 API 我们可以知道逆地理编码就是从经纬度得到结构化地址。

但值得注意的是，地理编码与逆地理编码之间的输入输出并不完全互逆，因为输入的地理信息也许是非标准化的地址，但输出的逆地理编码信息一定是高德内部标准化的。

这个 api，可以用，也可以不用。

Calling the 高德 API: transit route planning and walking route planning

在同城通勤线路规划中，我们主要用公交地铁线路规划（长距离），或者步行规划（短距离）。

参考路径规划-API 文档-开发指南-Web 服务 API|高德地图 API 可以知道公交路径规划的实际含义是transit/integrated，也就是综合了各类公共（火车、公交、地铁）交通方式的通勤方案。

因此，这个 api 就是我们在手机高德地图里最常用的导航方案之一，基于公交、地铁、步行的灵活规划。其中origin和destination是出发地和目的地的经纬度，这个可以通过我们之前说的地理编码 api 获得，city 依旧和之前的 city 保持一致即可（比如：北京）。值得注意的是strategy参数，我个人倾向于使用3：最少步行模式，也许也会有人倾向于使用1：最经济模式 或者 0：最快捷模式，这个因人而异。

而我们最关心的输出则是里面的duration字段，它是我们通勤方案所需要的秒数，我们将基于这个进行地点筛选。

但仅通过公交线路规划是不够的，因为如果出发地和目的地距离很近（例如不超过一站），则会没有结果，此时就需要使用步行规划接口。

参考路径规划-API 文档-开发指南-Web 服务 API|高德地图 API ，我们只需要输入起始地和目的地的经纬度即可。

结果里依旧只需要提取duration字段，单位还是秒。

高德 API 封装设计与实现

理清我们需要的输入与输出。

我们通过豆瓣的 API 或者爬虫可以获得 N 条租房帖条目，其中包括标题，但是不包括具体地址，我们需要通过标题知道这个帖子是否符合自己的通勤目标。

将标题直接喂入高德地理编码 api，得到经纬度 A1
将经纬度 A1 与提前算好的目的地（公司）的经纬度 A2 同时喂进公交路径规划
1. 如果公交路径规划有结果，则选取第一条；否则再将 A1、A2 喂进步行路径规划
2. 如果步行路径规划有结果，则选第一条；否则报错、丢弃（推荐）或者人工审查
基于算好的通勤时间进行筛选，例如控制在 30 分钟以内或者 60 分钟以内，因人而异

由于每个高德 api 都需要 key，因此可以使用requests.Session直接预先设置好固定的 key 参数。

设计如下：

# 配置
KEY = "XXXXXXXXXXX"
CITY = "北京"    # 地理编码更快
STRATEGY = 3    # 步行距离最短方案

# 配置全局字典，减少高德调用
ADDR_NAME2LOC_DICT = {}  # 地理编码全局字典
DURATION_DICT = {}       # 计算两个经纬度之间的距离，统一用"-"拼接成字符串，方便序列化

# 初始化
import requests

s = requests.Session()
s.params["key"] = KEY

# 调用地理编码api
def get_addr_loc(addr_name):
    """
    使用全局字典减少高德api调用
    """
    if addr_name not in ADDR_NAME2LOC_DICT:
        result = s.get('https://restapi.amap.com/v3/geocode/geo', params={
            "city": CITY,
            "address": addr_name,  # was the undefined name `address`
        }).json()
        count = int(result.get('count', 0))
        if not count:
            # 不报错，但是赋值为空
            ADDR_NAME2LOC_DICT[addr_name] = None
        else:
            ADDR_NAME2LOC_DICT[addr_name] = result["geocodes"][0]["location"]
    return ADDR_NAME2LOC_DICT[addr_name]

# 调用步行路径规划api
def _get_walking_duration(from_loc, to_loc):
    result = s.get('https://restapi.amap.com/v3/direction/walking',
                   params={
                       "origin": from_loc,
                       "destination": to_loc,
                   }).json()
    count = int(result.get("count", 0))
    if count == 0:
        # 用一个不可能的值表示没有找到任何规划
        return -1
    return int(float(result["route"]["paths"][0]["duration"]) / 60) # minutes

# 调用公交路径规划api
def _get_transit_duration(from_loc, to_loc):
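    # Use the transit (integrated) endpoint here; the walking endpoint is only the fallback below
    result = s.get('https://restapi.amap.com/v3/direction/transit/integrated',
        params={
            "origin": from_loc,
            "destination": to_loc,
            "strategy": STRATEGY, # 0: fastest, 1: cheapest, 2: fewest transfers, 3: least walking, 5: no subway
            "city": CITY
        }).json()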
    count = int(result.get("count", 0))
    if count == 0:
        # 如果没有公交路径，就使用步行路径
        return _get_walking_duration(from_loc, to_loc)
    return int(float(result["route"]["transits"][0]["duration"]) / 60) # minutes

def get_duration(from_loc, to_loc):
    """
    使用全局字典减少高德api调用
    """
    key = f"{from_loc}-{to_loc}"
    if key not in DURATION_DICT:
        DURATION_DICT[key] = _get_transit_duration(from_loc, to_loc)
    return DURATION_DICT[key]

if __name__ == "__main__":
    """
    测试demo
    """
    from_addr = "天坛公园"
    to_addr   = "天安门广场"
    from_loc = get_addr_loc(from_addr)
    to_loc   = get_addr_loc(to_addr)
    duration = get_duration(from_loc, to_loc)
    print({
        "from_addr": from_addr,
        "from_loc": from_loc,
        "to_addr": to_addr,
        "to_loc": to_loc,
        "duration": duration
    })

基于 pandas 进行筛选

import os
import pandas as pd

from gaode import get_coords_from_addr, get_transit_duration_between_coords, \
    get_addr_name_from_coords, TARGET_ADDRESS
from globals.utils import dump_dict

DATA_FROM_DOUBAN_DIR = "data_from_douban"
FILENAME = '2022-03-02-zhufang.csv'

filepath = os.path.join(DATA_FROM_DOUBAN_DIR, FILENAME)
print("reading file from: " + filepath)
df = pd.read_csv(filepath)

# convert response_latest_time format into datetime
df.response_latest_time = pd.to_datetime(df.response_latest_time)

# filter datetime
df = df.query("'2022-02-25' < response_latest_time")


# filter personal
df = df[df.post_title.apply(lambda s: "女生" not in s and ("个人" in s or "直租" in s or "转租" in s))]
print("shape: ", df.shape)

# get coords from title
print("getting coords from title")
try:
    df['addr_coords'] = df.post_title.apply(get_coords_from_addr)
finally:
    dump_dict()

# get addr from coords
print("getting addr name from coords")
try:
    df["addr_name"] = df["addr_coords"].apply(get_addr_name_from_coords)
finally:
    dump_dict()

# get duration between coords
print("getting distance from coords")
try:
    df["transit_minutes"] = df["addr_coords"].apply(
        lambda x: get_transit_duration_between_coords(x, get_coords_from_addr(TARGET_ADDRESS)))
finally:
    dump_dict()

# filter duration
print("filter duration")
df = df.query("transit_minutes < 60")

# sort
print('sort')
df = df.sort_values(by=["transit_minutes"], ascending=True)

# dump
print("dump")
df.to_csv(filepath.replace(".csv", "_filter.csv"), encoding="utf-8")

df

结果如下，注意右边的三列addr_coords | addr_name | transit_minutes，就是我们基于程序计算出来的通勤相关数据：

人工筛选环节

表格迭代标注法

我个人初步的预算其实是 2000-3000、通勤在一小时内、最好能有个独卫，后续证明这在大北京确实是一个相当困难的目标。

所以后续为了能有合适的标的，把预算上升到了 4000。

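After my simple programmatic filtering (based on titles and dates that clearly did not meet my requirements, and on the route duration computed from the title via 高德), the two spreadsheets of 10,000+ rows each were both filtered down to fewer than 100 rows.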

紧接着，我将逐一打开这些链接，进一步确定是否尝试联系或者直接 pass，并进行标记：-表示 pass，+表示有意愿。

例如，这是“北京租房”小组的表格经过手动标注后按照 choice 列降序的结果：

在处理“北京无中介租房”小组时，觉得单纯的加与减的级别控制还不太够，所以又使用了+ | ++ | +++三种级别，加号越多表示越想要，从而排序。而这些加号前面的-号，则表示后续在线联系或者线下考察后决定拒绝或者放弃。
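The ordering implied by these marks can be expressed as a sort key. This is only a sketch of one reasonable reading of the scheme, with a made-up function name: rows whose mark starts with `-` sink to the bottom, and among the rest more `+` signs rank higher.

```python
def choice_rank(mark):
    """Sort key for a choice mark such as "+", "+++", "-" or "-++" (higher ranks first)."""
    mark = (mark or "").strip()
    rejected = mark.startswith("-")
    # Tuple order: not-rejected rows beat rejected ones, then more pluses win.
    return (0 if rejected else 1, mark.count("+"))


def sort_by_choice(marks):
    """Return the marks ordered from most wanted to rejected."""
    return sorted(marks, key=choice_rank, reverse=True)


if __name__ == "__main__":
    print(sort_by_choice(["-", "+", "+++", "-++", "++"]))
```

In a spreadsheet or pandas workflow the same key could be stored in a helper column and used to sort the table.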

因此标注、再标注，这是一个循环迭代的过程，最终我们将只会有很少甚至没有符合目标期望的房源。亦或者，我们将选用最后一个被排除的房源。

这有点像相亲，硬性条件不达标我们一定会 pass，可容忍的缺点我们尝试去磨合，宁缺毋滥。

线上联系也有一些困难

因为豆瓣其实不是一个主打即时社交的平台，很多朋友只是在这里发个帖子，然后就可能……人都找不到了，或者好几天才回。

Among my candidate listings there was a tiny single room with a private bathroom, very cheap, but I could never reach the poster: 6 号线青年路次卧转租—无隔断独立卫生间独立阳台

但也许是很快就租出去了吧？反正，豆瓣私信一直联系不上。

所以，错过了见识一下目前认知里最小的独卫房的机会。

其他的基本联系上了，但有些确实比较慢。

通过线上联系直接被 pass 的主要分三种吧，发现是中介、限女生、价格过高等。

也有一些非常热情的朋友（小姐姐居多），聊天、谈房的感觉还是很愉悦的，所以转租确实比冷冰冰的中介体验感要好很多，你更大概率能感觉到真实、安心。

线下跑房环节

时间与路线规划

After an hour or two of manual filtering and contacting posters one by one, the plan for the in-person viewings was basically settled.

基于豆瓣租房的筛选，我两天内一共跑了九套房，其中第一天晚上原定五套实看四套，第二天下午原定四套实看五套。

当时还挺天真，每两套之间计划只用半小时，结果发现严重不够，最后还鸽了几个……

第二天吸取教训，把每套房控制在了一小时的间隔：

以及路线方面：

跑房（1/10）: 望京附近，3000

第一个房是在望京附近，因为我们甲方爸爸在那，我可能较长一段时间都在那。

当时也是第一次使用豆瓣租房找房，于是找了一所，是个小姐姐，某天晚上过去看了一下。

后续觉得还是得以公司为中心比较好，所以之后开始使用程序手段进行豆瓣小组租房信息爬取与筛选。

跑房（2/10）：3-31陆翔佳园_3200

三月三号晚上整体的跑房策略，就是一路向东，第一个房是在百子湾附近，是一个中介，谈的是金海国际，初步价格是在 3200-3500。

我看着房子还算不错，另外也 check 了地图，是离大郊亭地铁站（这个地铁站位置还行，比百子湾更近中心一点，左边一站就是 14 号线上的九龙山）最近的一个小区。

然而，当我到了之后，有两个中介，好像一位是另一位的上级，我跟着另一位坐上了小电驴……路上问我的价位预期和要求……

我千辛万苦筛选的豆瓣租房，还是从中介的一辆小电驴，开始出发了……

然而，我们最后还是没有去金海国际，因为如果想要金海国际的独卫，预算至少估计要 4000 起，我说我最多只愿意开到 3500，于是我们到了一个小区：陆翔佳园。

【视频：3-31陆翔佳园_3200.mp4】

这个小区的开价是 3500，有独卫，空间也不小，朝南，有飘窗，听说是部委房。我特意还去看了一下其他次卧的公共洗手间，非常狭窄，说实话觉得“他们”真地很不容易。

中介一直劝说我不用看客厅走廊别人的卫生间什么的，因为我住在主卧，但走廊什么的都很挤很暗，因此觉得还是难以让我满意。

另外，还有一个把我怵到了的地方，就是独卫里竟然还有个大石墩子。中介说可能是用于坐的，房东的，搬也搬不走。我当时竟然还脑补了 maybe 是一个小浴桶……但后续却始终想不明白……

和这个令人想不明白的石墩墩同处一室的还有一个高级马桶，是所有房中最高级的，有点像外滩外企的马桶，emm，也忘记看是啥牌子了……

跑房（3/10）：3-32珠江绿洲_3800

看完第一个陆翔佳园的房子其实已经 7 点 40 多了印象中，于是原定于 7 点半的慈云寺的房子就肯定早已经被鸽了，事实上，最后即使租完房也没去成，彻彻底底地鸽了……

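So I went straight to the listing at 珠江绿洲 near 传媒大学, booked for 8 o'clock. It looks like a straight run east, but it was not that simple…… I took a big detour: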

珠江绿洲这个房源让我很放心，因为是个人，并且是刚毕业的两个小姐姐，非常热情，早早地在小区门口等待我。交流下来，非常地纯粹、坦诚，而且有趣地是她们竟然也是（准）程序员，一个做前端，一个做数据分析。

珠江绿洲给我的感觉怎么说呢，房间很大，是主卧，并且有独卫，据小姐姐反馈，原先她们是住三个人的……现在是住两个，并且有两张床……显然，对于我来说，一方面有点多余，另一方面则有点奢侈了。

【视频：3-32珠江绿洲_3800.mp4】

我也开始动摇对独卫的执着，房源未必没有，但独卫的主要适用对象真不是我这种单身狗……而且价格实在太贵了，有 3800……是所有我看的房源中最贵的，并且离公司更远，也离市中心更远。

不过这个小区倒是很繁华，一楼的各种店铺一应俱全，游乐场、小公园的什么的都有，说实话，很宜居。

由于两位小姐姐其实已经找到了新的合租房，并且不在这里，所以我看完后就一起下楼了，一直亲送我到天桥附近，这份真诚让我很是感动，我也是真地很想接手她们的房源以免刚毕业的她们承受不菲的违约费用。

跑房（4/10）：3-33周井大院_2900

After leaving 珠江绿洲, I kept heading east to 周井大院. After getting off the subway there was still a walk of about 1.5 km……

这个房是两室一厅一卫，内部空间还是挺大的，小屋也被布置的比较温馨，地面还铺了条毯子。租客是一位在北京工作八年多的小姐姐，做餐饮行业，不知道是原来在大兴现在要去望京还是原来在望京现在要去大兴于是想转租出去。

【视频：3-33周井大院_2900.mp4】

我仔细问了小姐姐平常她的交通工具是什么，毕竟离地铁比较远，她说自己是骑小电驴到地铁的，我便恍然大悟了。

不过有一说一，对于长租的人来说，离地铁远一点，配个小电驴，确实能显著提升生活体验，拥有近乎最高的性价比。

不过这个房除了区位较偏、地铁较远、小区较老（没有电梯）外还有个问题，原先帖子里说的是隔壁是有个很 nice 的小哥哥，现在被告知是一对情侣。除此之外，我觉得还行。

跑房（5/10）：3-34珠江帝景_3500

接下来就是当晚最后一间房了，位于九龙山附近的珠江帝景。

理论上来说，这个房的地理位置应该是最好的，因为九龙山这里连通 14、7 号线，到达望京只需 50 分钟，到达四惠的公司只要 3.5 公里，比较适合骑车，地铁路线里只有大郊亭到百子湾段是重合的。

珠江帝景也是中介推荐房，两个人把我领了上去。

【视频：3-34珠江帝景_3500.mp4】

看了一圈下来，空间肯定没有珠江绿洲的大，但其实有的一拼，不过珠江帝景对于我来说，这个阳台的造型与风景更加的有范，比较中我意，但没有独卫则是它的硬伤，否则 3500 的价格在这个地段、这个小区，那是相当值得。

但让我感到不是很舒服的两点之一，珠江帝景这个房是转租，在豆瓣帖上写的很清楚不要中介费和服务费，但是在问中介费用时却仍在算，说给个 500 什么的，我说帖子上不是这么写的，中介立即反应说是嘛，这块晚点确认一下，然后又聊了几句之后我再次确认费用核算时，中介已经没有提中介费相关问题，你细评。

第二点是关于服务费的，因为正常转租，人比较好的或者急于转的一般都会在帖子里说服务费就让给你了不用你交了，等到期再自己和房东与中介谈，这种就是比较友好的，比如珠江绿洲的两个小姐姐就是这样做的。但这里情况就不一样，是要我继续交接下来几个月服务费的，关键是，租客非常同意这一点，说不要在意那么点钱，他自己交了一年都不讲究。租客是一个 tony，也许工资确实比我高？但我觉得我就是很在意这个问题，中介凭什么能收两次服务费，除非，是转给“你”的吧。

不过说来也巧，过了几天后我刷到了一个动画视频，里面的房型竟然几乎和珠江帝景一样！让我哭笑不得。

跑房（6/10）：3-41劲松七区_3000

第二天，吸取昨天的教训，放弃独卫的要求，从而尝试更多高性价比与高质量的房源。

从中午一点起，第一个房源是位于二环的劲松七区，价格 3000。

当天风超级大，出门时打电话都听不大清的那种。而在劲松七区内还看到一条被拴住的小狗，非常可怜。

【视频：大风中的小狗】

这个小区没有电梯，款式较老，有一个小学，感觉像学区房。

“受命”转租的是一位非常 nice 的小姐姐，待转租的是她的一位已经结婚准备回去生孩子的室友。另一位室友也是女生，做行政工作。

房间在六楼，从地铁走过来一点几公里还要再爬六楼着实还是有一点辛苦。

好在小姐姐很热情，消除了我很多的顾虑，我想的是如果租这个房至少能有一个非常好的室友关系，因为这位室友是做财务的，和我本科专业很有联系。

【视频：3-41劲松七区_3000.mp4】

但最大的问题是，这个房子需要三月底才能搬，这对于我来说，并不太可行，因为这几天为了租房一直在住酒店，我需要尽快搞定租房问题。

跑房（7/10）：3-42惠生园_3300

紧接着就是惠生园，我预留了两小时，正好能赶上。

其实从地图上看，惠生园已经离地铁很近了，但我依旧走了较长时间，所以，一点几公里的步行，可能才是人生常态……

带我看房的是租户的室友，一位刚辞职可能准备创业的很斯文的男生，房间里还有一台大打印机。租户是位女生，房间很明亮。

【视频：3-42惠生园_3300.mp4】

我能够接受房子价格高于 3000，但我不能接受高于 3000 还没有独卫……另一方面，它似乎离地铁与公司有点远。

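In fact, this may just be a bias from the route I took. Before this viewing I had already walked a long way (going to 劲松七区, taking the wrong subway, walking to 惠生园), and afterwards I walked another long stretch (2-4 km) to see another flat near the company. That reinforced the impression that 惠生园 is a long walk from everything, which may not be true. Still, 惠生园 really is not close to the company, and there is no way to take the subway between them.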

又查了下地图，好像只要两公里，可能当时走绕了……

跑房（8/10）：3-43沿海赛洛城_2200

下一站是沿海赛洛城，就在公司南边 1.2 公里，是一个中介介绍，只要 2200 很便宜的房子。

实际上惠生园到公司与沿海赛洛城到公司距离差不多，两者到地铁距离也差不多，惠生园接近 1 号线，沿海赛洛城接近 7 号线（也可以往上去 1 号线），所以惠生园应该来说地理位置更好一些，但沿海赛洛城去公司不需要经过快速路，相对比较安全。

房客是一个看起来很精致、大方的女孩，经问住了两年了，这次是到期搬走而不像其他的转租一样是中途搬走（因此需要重新交中介费与服务费等）。

【视频：3-43沿海赛洛城_2200.mp4】

也许是受这女孩感染，查看了她的房后，觉得这个房间虽小，却也能够容纳很多人的生活与梦想，有种很温馨的感觉。

但让我觉得比较遗憾的是，这个窗户太小，并且是朝东的……

跑房（9/10）：3-44百子湾家园_2500

看完上个房后，中介说还有一个房，更大一些，要 2500，我看现在离下一个房还有点时间，所以也答应去看了。

【视频：3-44百子湾家园_2500.mp4】

这个房确实就比 2200 的更明亮许多了，窗户很大，但是厨房比较脏，洗手间的水龙头竟然还打不开，这让我很不能接受。也许是正在维修什么的，但毕竟我今天是来“查房”，第一印象不好那后续就更麻烦了。

跑房（10/10）：3-45垂杨柳百里_3500

看的最后一套房是一个很奇葩的户型：团结户，我也过来见见世面。

接待我的是一个中介，屋子还坐着一个，好像也是中介，地道的北京人。

【视频：3-45垂杨柳百里_3500.mp4】

这个房是个老房子了，据说墙体十分结实，后来过来一个 89 岁高龄的老爷爷，跟我们说住这已经五十多年了，唐山大地震那会对面 70 年代的“新”楼都裂缝了，这个房一点事都没有，还是十二人十二天盖一层楼搞定的……听的我直愣。

【照片：老爷爷】

老爷爷问了这个房的价格，中介来之前说是 3500，然后今早房东说要加价 100，所以是 3600，老爷爷听完说那还好，一楼的那个房厨房面积比这小一半前两年还要 3800……

由于这是我看的最后一套房了，所以聊了很久，中介也很会看人，见我已经从原来的室内站到门口了，说让我进来再坐会再聊聊，聊我的工作和家乡地，还讲了点南京的段子，还是挺有意思的。北京人就是实诚，有话就说，也能说会道，这个我还挺欣赏的。

以上就是全部跑房相关内容的记录了，更多问题可以私聊我。

投票环节

如果是您，您会选择哪一所房源：

【投票：房源】

info

author: 南川

version: 0.1.0

date: 2022-03-09
content:
1. 完成全部代码封测，并开源，项目地址： mark-applications/data-science_douban-houses
2. 使用该项目为身边某位正为租房而发愁的朋友无偿提供帮助

version: 0.0.4

date: 2022-03-08
content: 增加人工筛选与线下跑房环节

version: 0.0.3

date: 2022-03-07
content: 完成高德 API 部分

version: 0.0.2

date: 2022-03-05
content: 完成豆瓣小组 api 部分

version: 0.0.1

date: 2022-02-28
content: 完成租房方案设计部分

编程 | Linux 笔记

2022/02/02 · 阅读需 59 分钟

很久以前，自打我刚开始学计算机，就有一个认知：精通 linux 的人都是大神。
但，我是不需要学的。
很长一段时间，我都是这么认为的。
直到这两年，我选择了 Mac 作为自己的主力开发工具，两年的摸爬滚打，尤其是始于对iTerm + zsh + oh-my-zsh颜值的惊艳，到现在能够在 mac、ubuntu、centos、windows 等各大操作系统之间自如切换，并逐渐意识到命令行系统对于现代操作系统的重要性，命令行熟练度对于提升工作效率的重要性，我知道，我已经走向了一条与计算机越来越近的不归路。
而这一切中间的桥梁，正是 linux，于我而言，不认识不掌握 linux，不可谓入了程序的真正世界。
因此，本系列第一篇，献给 linux，一方面是致敬我心中曾经那触不可及的信仰，另一方面也是为了能重新认识它，对它说一句它可能真正想听的："hello linux, I'm mark"。
因为每一个 linux 的学习者最后都会发现，linux 并不神秘，而 linux 真正启迪人的，是它的设计哲学，与对你工作流的重塑。
在此，我斗胆把 python 之禅搬于此，因为在我心中，它于 linux，是心有灵犀，同样适用：

首先，linux 是什么

作为南川的核心开发笔记（从实际工作经验中提炼出来的笔记），我无意过多展开一些基本的背景介绍，因此也不会花时间去讲述 linux 和 unix 和 mac 之间的关系，这些读者们都可以很方便地在互联网找到答案。

我只简单地如下描述，目前用于个人使用的电脑系统，主要分为 windows 和 mac。

windows 的最大优点，是价格相对平民，生态丰富，交互习惯最符合人类直觉。

mac 的最大优点，是审美一流，做工一流，触控板和屏幕天花板的存在，然后于我而言最大的优点，是其 unix 系统，可以允许我像 linux 一样方便地使用命令行。

所以，我在使用了五年的 windows 后，毅然投入了 mac 的怀抱，而其中的目的之一，就是为了更好地掌握 linux。

（当然，纯粹使用 linux 我是不会接受的，我还是觉得追求审美也是生活中很重要的一部分，于是乎，~~我现在已经极其极其不想看到任何 windows 的界面~~）

说了这么多，还是没说 linux 是什么。

没错，就应该这样，这篇是核心开发笔记，非核心的，可以出门右转某乎，比如：

PS: 以上链接，我都没有看过，只是随手一搜。😄

其次，如何拥有一台 linux

有很多种方法。

如果你是 windows 10+用户，最方便的方法，是基于内置的wsl系统，不知道是什么？出门右转。

至于内置的wsl会不会有什么限制和性能损耗？这个我也不知道，我不用，如果你知道，欢迎留言，我还挺想知道的，只要你愿意留我就愿意听，并且可能会影响我接下来的认知。

但我不会因为这个出门右转，我已经为了给大家示范如何出门右转，已经出门一次了，寒冬腊月，怪冷的。

这是其一。

其二，装一个双系统。

我一直想装个双系统，并且尝试到了最后一步，因为工作需要。当时我给我的 mac 装 ubuntu 的双系统，结果到了安装界面，鼠标和键盘没有响应，查了查好像我的版本 mac 2020 pro 的鼠标和键盘不走 usb，而是总线？

不清楚，后续我可能会继续尝试装个双系统，目前我用的是虚拟机。

PS：友情提示，小白不要轻易尝试双系统，容易留下不学无术后悔的泪。

其三，装一个虚拟机，比如vmware或者virtual box。两个都尝试下来，个人比较喜欢vmware。

至于怎么装，怎么配置，出门右转，或者等我后续系列，会有虚拟机/双系统专刊的，应该（如果不懒的话）。

我目前的虚拟机（内部运行的 ubuntu，可以同时运行多个操作系统）界面如下：

其四，阿里云/腾讯云/华为云/亚马逊云……租一个 linux 服务器或者操作系统等。这是业界搭建后端必备。我之前一直用的阿里云/腾讯云，因为以前做全栈多一些。

目前市面上，服务端应该用 centos 比较多，客户端应该用 ubuntu 比较多，主要原因可能是 centos 默认没有界面，程序较为稳定；ubuntu 的界面很好看，比较适合个人鼓捣。

我目前使用的 ubuntu 界面如下：

这里值得注意的是，有一些企业可能购置了类似于 windows server 之类的服务端，这种本质上还是 windows 系统，不是 linux 系统，只不过一直跑在公司网络上。

所以当我们谈到服务器的时候，可能并不一定是 linux 系统，这是值得注意的。

我为啥知道这玩意呢，因为之前碰到一个项目是这样的，我本来写好的 linux 的后端，想着直接移到目标公司网上，结果一看，windows server，把我给整不会了。

接着，如何登录 linux

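In theory, logging in should not be a problem, but that only holds for the local computer you use every day.

linux usually lives on the server side and is reached remotely, so the common practice is ssh.

Basic ssh usage is one search away; this post only records how to log in over ssh without a password.

By default ssh asks for the password every time, which is tedious. Worse, in batch scripts the password prompt becomes a script killer: at the very least it makes the script harder to follow and more error-prone.

It is simple. Step one, run ssh USERNAME@SERVER, where USERNAME and SERVER are the user name and the IP address (or domain name, if there is one) of the target server.

On some machines the USERNAME is root and the password is root as well. This is insecure and not recommended, but it is friendly to beginners.

Although my password is even simpler, so simple you would slap the table in amazement.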

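The first time you run ssh, a .ssh folder appears under the current user's home folder on your machine (~ on mac and linux, C:\Users\XXX on windows). The key pair itself is created by running ssh-keygen, which writes .ssh/id_rsa (the private key) and .ssh/id_rsa.pub (the public key).

The content of the public key file involves the RSA algorithm and other cryptography, which I will not expand on. All you need to do is copy the text inside and paste it into ~/.ssh/authorized_keys on the target server (create the file if it does not exist; if it already has content, start a new line and append).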

以下给出一键脚本：

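USERNAME="xxx"
SERVER="xxxxx"
file="id_rsa.pub"

# copy the public key into the remote home folder
scp ~/.ssh/$file $USERNAME@$SERVER:
# run the remaining steps on the server (a bare `ssh` would open an interactive
# session, and the following lines would only run locally after it closes)
ssh $USERNAME@$SERVER "mkdir -p ~/.ssh && cat ~/$file >> ~/.ssh/authorized_keys && rm ~/$file"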

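In this script, scp copies files between the local machine and the server. The trailing colon with nothing after it means the default remote folder, which for $USERNAME@$SERVER is that user's home folder, so the file lands exactly where the next step looks for it.

cat writes the whole file to the output stream, and >> appends that stream to the end of the target file.

rm deletes the file: the uploaded copy is only a relay. Why relay at all? Because copying straight onto authorized_keys would overwrite any keys already in it, while appending keeps them.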
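The server-side half of that script, appending a public key to an authorized_keys file, can be sketched as a small function. `append_key` is a hypothetical helper name, not part of OpenSSH; this is only a sketch, assuming the public key file holds a single line. It skips a key that is already present, so running it twice is harmless.

```shell
# append_key PUBKEY_FILE AUTH_FILE
# Hypothetical helper: append the key in PUBKEY_FILE to AUTH_FILE
# unless that exact line is already there.
append_key() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    # -x: match whole lines, -F: fixed strings, -f: read patterns from a file
    if ! grep -qxFf "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

In day-to-day use, `ssh-copy-id USERNAME@SERVER`, which ships with OpenSSH, does the upload and the append in one step.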

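Finally, to keep ssh from timing out and disconnecting, add the following to the ssh config file on the local (client) side. The client then sends a keep-alive message to the server every 60 seconds, so the server does not drop the idle connection. Trust me, if you do not fix this, it will give you headaches later.

# /etc/ssh/ssh_config
Host *
    ServerAliveInterval 60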

最后，如何掌控 Linux，以下给出一部分经验笔记

-------------------------------------

BEST-PRACTICE: ubuntu initialization

step 0. install

Ubuntu 18.04.6 LTS (Bionic Beaver)

step 1. config apt source

1. change apt source

ref:

【Ubuntu】Ubuntu 18.04 LTS 更换国内源——解决终端下载速度慢的问题 - 知乎

fastest/script way: 直接修改`/etc/apt/sources.list`

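~~The first approach works, but it has two drawbacks. One is that it is invasive, so people usually back up the original file first and then run a replace command.~~

Update 2022-01-24: in fact every approach ends up editing the apt sources.list file, so none is more invasive than another, and all of them back up first.

The GUI version (the softwares & update app) offers a speed test, edits the file automatically, and triggers an update afterwards. All of that can be done with your own script, which also responds faster!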

# backup source file
APT_SOURCES_LIST_FILE=/etc/apt/sources.list
sudo cp $APT_SOURCES_LIST_FILE $APT_SOURCES_LIST_FILE.bak

# change source (-E turns on extended regex; without it `+` is a literal plus sign)
APT_SOURCE="http://mirrors.yun-idc.com/ubuntu/"
sudo sed -i -E "s|deb \S+|deb $APT_SOURCE|g" $APT_SOURCES_LIST_FILE

# update source
sudo apt update
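To try the substitution safely before touching /etc/apt/sources.list, the same idea can be run against a copy and printed to the terminal. This is a sketch: `switch_mirror` is a hypothetical helper name, and it uses `[^ ]+` instead of `\S+` so that it does not rely on a GNU-only escape. Like the command above, it rewrites every active deb line, including the security.ubuntu.com ones.

```shell
# switch_mirror SOURCES_FILE NEW_URL
# Hypothetical helper: print SOURCES_FILE with the URL of every active
# "deb" line replaced by NEW_URL. Lines starting with "#" are left alone.
switch_mirror() {
    sed -E "s|^deb [^ ]+|deb $2|" "$1"
}

# usage: preview the result, nothing is modified
# switch_mirror /etc/apt/sources.list "http://mirrors.yun-idc.com/ubuntu/" | less
```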

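The other drawback is that you can only fill in the source you happen to think of, which may well not be the best one. For example, I always assumed the 阿里云 source was good, but this time it was extremely slow: I confirmed that update was going through cn.xxx, yet it still only reached a dozen or so KB/s.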

robustest/recommend-for-newbie way: 在`softwares & update`里修改 server

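The second approach is to let softwares & update test for the best server automatically and then select it. My test picked yun-idc. I had never used it, so I did not consider it at first; later, once 阿里云 had worn me down, I tried it, and it flew: ten MB/s!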

use others way: 使用别人写好的 git 仓库进行配置

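The third approach: since I had not configured a source yet, I had no git either, and downloading git at that point would be extremely slow, so I ruled it out.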

2. update apt

Tip: if the following commands warn that files are locked, wait a few minutes, or use lsof FILE to check. For more, refer to: 解决 apt-get /var/lib/dpkg/lock-frontend 问题 - 知乎

sudo apt-get update
sudo apt-get upgrade

step 2. config git

sudo apt-get install git
git config --global user.name YOUR_NAME
git config --global user.email YOUR_EMAIL

step 3. config terminal

step 4. config language

resolution 1: config chinese input source via ibus

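The ibus input method and 搜狗输入法 do not get along well. If neither of them can type Chinese, I suggest following Ubuntu 18.04LTS 安装中文输入法以及快捷键设置 - 简书: uninstall 搜狗输入法 first and confirm that ibus works, namely: sudo apt-get remove fcitx*

Confirm the following settings:

In Settings - Region & Language - Input Sources, add the one that includes Intelligent Pinyin:

In Language Support, enable the ibus scheme
Log out and log back in.
If logging out is not enough, try sudo reboot.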

FIXME: resolution 2: config chinese input source via sougou

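Although I followed the official site several times, I never got this setup working, and I do not know why.

Also, the ibus setup only works properly after fcitx has been uninstalled.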

参考官网：

搜狗输入法 for linux

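Update: make sure that:

The input sources contain only one English entry (no ibus Chinese entry, otherwise it interferes)

The input system is set to fcitx

Another update: forget it. In my experience 搜狗 on ubuntu is really buggy: it works one moment and not the next. In particular, after I got it working under the Chinese locale and then switched to the English locale, it stopped working, and no amount of switching back to Chinese would revive it. Very painful.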

how to switch language input source

The best way is to directly install the Chinese version of ubuntu, since the Chinese feature is built-in.

However, what we download directly from the official ubuntu website, may not support chinese choice natively. So it highly depends on what distribution version of ubuntu we download.

ref:

(20 条消息) ubuntu 切换中文输入法_g_y的博客-CSDN 博客_ubuntu 切换中文输入法

how to change language to english

echo export LANGUAGE=en_US.UTF-8 |  sudo tee -a ~/.bashrc

ref:

command line - How do I change the language via a terminal? - Ask Ubuntu

IMPROVE: a script to init ubuntu (may not work)

# version: 0.0.3
# features:
#     1. disable sudo password so running commands is faster
#     2. disable apt password so installing packages is faster
#     3. enable arrow up/down to search backward/forward by command prefix
#     4. auto change deb(apt) source
#     5. auto install zsh, config oh-my-zsh, set it as the default shell; you may need to switch to bash when building android in case of errors, which is as easy as typing `bash` in the terminal
#     6. re-login to make these changes take effect
# (kept as comments: inside a double-quoted string the backticks above would be executed)

# !IMPORTANT: config global variables
PASSWORD=" "

# --- step 1 ---
# write password variable into bash_profile
echo "export PASSWORD=$PASSWORD" >> ~/.bash_profile
source ~/.bash_profile

# disable sudo password, ref: https://askubuntu.com/a/878705
echo "$USER ALL=(ALL:ALL) NOPASSWD: ALL" | sudo tee -a /etc/sudoers.d/$USER

# --- step 1.5 ---
# enable backward/forward prefix commands search
echo '"\e[A": history-search-backward
"\e[B": history-search-forward' >> ~/.inputrc
bind -f ~/.inputrc

# --- step 1.7 ---

# change timezone, so that time display ok
echo "export TZ='Asia/Shanghai'" >> ~/.profile # need relogin

# --- step 2 ---
# update apt and install packages

# change apt source
APT_SOURCE="http://mirrors.yun-idc.com/ubuntu/"
APT_SOURCES_LIST_FILE=/etc/apt/sources.list
sudo sed -i.bak -r  "s|deb \S+|deb $APT_SOURCE|g" $APT_SOURCES_LIST_FILE

echo "Y" | sudo apt update # need confirm but skipped since configured
echo "Y" | sudo apt upgrade

INSTALLED_PACKAGES="vim git htop zsh terminator"
echo $INSTALLED_PACKAGES | xargs sudo apt install -y

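# --- step 3 ---
# modify timezone (need relogin); skipped when step 1.7 has already written it

grep -q "TZ='Asia/Shanghai'" ~/.profile 2>/dev/null || echo "export TZ='Asia/Shanghai'" >> ~/.profile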

# --- step 4 ---
# configure zsh (installed in \$INSTALLED_PACKAGES) / oh-my-zsh

# install oh-my-zsh (built-in backward search)
# ref: https://ohmyz.sh/#install
sh -c "$(curl -fsSL https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

# --- step 5 ---
# diy zsh based on 'oh-my-zsh'

# add dynamical time display
echo 'PROMPT="%{$fg[yellow]%}%D{%Y/%m/%d} %D{%H:%M:%S} %{$fg[default]%}"$PROMPT' >> ~/.zshrc

# set zsh as the default shell of the current user (need relogin!)
sudo chsh -s "$(which zsh)" "$USER" # without "$USER", sudo chsh would change root's shell

# enable zsh changes
exec zsh

# --- step 6 ---
# re-login
sudo pkill -u $USER

BEST-PRACTICE: linux connection (`ssh`)

ssh generate public keys

ref: https://git-scm.com/book/en/v2/Git-on-the-Server-Generating-Your-SSH-Public-Key

mkdir -p ~/.ssh
cd ~/.ssh
ssh-keygen -o

ssh no secret/password

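Tip

Recently I found that login still failed even after adding the key to authorized_keys. After reading many posts, I pieced together a fix.

First, start sshd in debug mode on the server with /usr/sbin/sshd -d -p 2222, then connect from the client with ssh -vv -p 2222 xxx@xxx, and watch what the server reports. In my case it complained that an algorithm was not supported (written as esca in my notes, probably ECDSA), which told me the ssh version was too old and needed upgrading.

However, because openssh-server depends on openssh-client, the upgrade can be awkward. What worked for me: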

sudo apt purge openssh-server
sudo apt-get install openssh-client
sudo apt-get install openssh-server
sudo systemctl restart sshd
sudo systemctl restart ssh

It's easy: generate an id_rsa.pub, append it to ~/.ssh/authorized_keys on the server, and you are done.

Solution 1

username="xxx"
server="xxxxx"
# use $HOME here: a ~ inside quotes is not expanded by the shell
file="$HOME/.ssh/id_rsa.pub"
cat $file | ssh $username@$server "cat - >> ~/.ssh/authorized_keys"

# sample
cat $file | ssh aosp@192.168.1.242 "cat - >> ~/.ssh/authorized_keys"

Solution 2

username="xxx"
server="xxxxx"
file="id_rsa.pub"

scp ~/.ssh/$file $username@$server:
# run the append and the cleanup on the server as one remote command
ssh $username@$server "cat ~/$file >> ~/.ssh/authorized_keys && rm ~/$file"

ref:

How to use the Linux ‘scp’ command without a password to make remote backups | alvinalexander.com

ssh keep connection alive

The simplest way is to make the client keep sending a null packet to the server, so that the server does not close the connection once its idle time limit is exceeded. All you need to do is add 2 lines to your /etc/ssh/ssh_config file.

sudo vim /etc/ssh/ssh_config

change into these:

Host *
    ServerAliveInterval 60

The setting applies to new ssh connections, so reconnect after editing. If you also changed anything on the server side and the server is a mac, you can restart its ssh daemon with:

# restart-ssh, reference: https://gist.github.com/influx6/46c39709a67f09908cc7542ca444fca2
sudo launchctl stop com.openssh.sshd
sudo launchctl start com.openssh.sshd

BEST-PRACTICE: linux env management

how to change apt source

ref:

command line - How do I change mirrors in Ubuntu Server from regional to main? - Ask Ubuntu

resolution 1: manual change from the App of `Softwares & Updates`

resolution 2: modify the configuration manually from the terminal

CONCLUSION

MIRROR_FROM="us.archive.ubuntu.com"
MIRROR_TO="mirrors.tuna.tsinghua.edu.cn"
APT_FILE="/etc/apt/sources.list"
sudo sed -i "s|${MIRROR_FROM}|${MIRROR_TO}|g" ${APT_FILE}

DETAIL

There are a few mirror servers can be used in China:

mirrors.tuna.tsinghua.edu.cn
ftp.sjtu.edu.cn
mirrors.aliyun.com
mirrors.huaweicloud.com
mirrors.yun-idc.com
...

The format of these mirrors may be as http://${MIRROR_URL}/ubuntu/

And the default configuration of ubuntu servers are at /etc/apt/sources.list, with a copy of backup at /etc/apt/sources.list.save.

Here is what the save contents are:

// /etc/apt/sources.list.save
#deb cdrom:[Ubuntu 18.04.6 LTS _Bionic Beaver_ - Release amd64 (20210915)]/ bionic main restricted

# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
# newer versions of the distribution.
deb http://us.archive.ubuntu.com/ubuntu/ bionic main restricted
# deb-src http://us.archive.ubuntu.com/ubuntu/ bionic main restricted

## Major bug fix updates produced after the final release of the
## distribution.
deb http://us.archive.ubuntu.com/ubuntu/ bionic-updates main restricted
# deb-src http://us.archive.ubuntu.com/ubuntu/ bionic-updates main restricted

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team. Also, please note that software in universe WILL NOT receive any
## review or updates from the Ubuntu security team.
deb http://us.archive.ubuntu.com/ubuntu/ bionic universe
# deb-src http://us.archive.ubuntu.com/ubuntu/ bionic universe
deb http://us.archive.ubuntu.com/ubuntu/ bionic-updates universe
# deb-src http://us.archive.ubuntu.com/ubuntu/ bionic-updates universe

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
deb http://us.archive.ubuntu.com/ubuntu/ bionic multiverse
# deb-src http://us.archive.ubuntu.com/ubuntu/ bionic multiverse
deb http://us.archive.ubuntu.com/ubuntu/ bionic-updates multiverse
# deb-src http://us.archive.ubuntu.com/ubuntu/ bionic-updates multiverse

## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
deb http://us.archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
# deb-src http://us.archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse

## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
# deb http://archive.canonical.com/ubuntu bionic partner
# deb-src http://archive.canonical.com/ubuntu bionic partner

deb http://security.ubuntu.com/ubuntu bionic-security main restricted
# deb-src http://security.ubuntu.com/ubuntu bionic-security main restricted
deb http://security.ubuntu.com/ubuntu bionic-security universe
# deb-src http://security.ubuntu.com/ubuntu bionic-security universe
deb http://security.ubuntu.com/ubuntu bionic-security multiverse
# deb-src http://security.ubuntu.com/ubuntu bionic-security multiverse

how to know what's the os platform

# mac: Darwin
uname

# if platform is mac
if [[ $(uname) == Darwin ]];
then XXX;
else YYY;
fi;
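When a script has to branch on more than two platforms, a `case` statement tends to read better than chained `if`s. The sketch below wraps it in a function; `detect_os` and its three return labels are hypothetical names chosen for illustration.

```shell
# detect_os prints a short label for the current platform,
# based on the kernel name reported by `uname -s`.
detect_os() {
    case "$(uname -s)" in
        Darwin) echo mac ;;
        Linux)  echo linux ;;
        *)      echo other ;;
    esac
}

if [ "$(detect_os)" = mac ]; then
    echo "running on mac"
else
    echo "not running on mac"
fi
```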

ref:

bash - How to check if running in Cygwin, Mac or Linux? - Stack Overflow

how to configure python environment

install the python on the server, the version of which would better correspond with the one of the local in case of unexpected error caused by version difference
use virtualenv to create an env based on this python version named venv_py under this working directory
activate this env
use pip to install the requirements.txt
run!

PY_VERSION=python3.9

# install the target python version based on its version number
# if you don't use these two lines, then you would suffer from `wget blablabla...` when you checked what the hell the python repo url is
sudo apt install software-properties-common -y
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install ${PY_VERSION}

# use `virtualenv` to create and activate a new python env named venv_py
sudo apt install virtualenv
virtualenv -p ${PY_VERSION} venv_py
source venv_py/bin/activate

# install all the requirements
# if you need to dump all the requirements of a python project used, you can use `pip freeze > requirements.txt` so that a file named of `requirements.txt` would be generated under the current directory
pip install -r requirements.txt

# run our backend of `fastapi`
python main.py

✅ cannot use `sudo apt-get install xxx` to install packages

cd /var/lib/dpkg/updates
sudo rm -rf ./*

sudo apt-get update

sudo apt-get dist-upgrade # it may take a while

ref:

I changed the suggestion in this article from rm -r to rm -rf, otherwise not successful.

E:dpkg was interrupted, you must manually run'dpkg 配置'to correct the problem. - 码上快乐

This discussion seems wonderful but didn't get my problem solved.

apt - "The package lists or status file could not be parsed or opened" - Ask Ubuntu

BEST-PRACTICE: linux file system management

mkdir if not exist

mkdir -p DIR

ref:

mkdir

ls and delete files

ls | grep STR | xargs rm -f

WARNING! Since the operation pipeline is silent, you are likely to remove files that you did not intend to remove.

Hence, you'd better use ls | grep STR first to check whether all the files to remove meet your expectation.
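A variant that shows what it deletes and copes with spaces in file names is to let find do both the listing and the removal. This is a sketch under assumptions: `remove_matching` is a hypothetical helper, and it matches a glob pattern such as `'*.log'` instead of the substring that `grep STR` matches.

```shell
# remove_matching DIR PATTERN
# Hypothetical helper: print the regular files directly inside DIR whose
# names match the glob PATTERN, then delete exactly those files.
remove_matching() {
    find "$1" -maxdepth 1 -type f -name "$2" -print -exec rm -f {} +
}

# dry run first: the same find without the -exec part
# find DIR -maxdepth 1 -type f -name 'PATTERN' -print
```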

fastest delete file

Don't bother checking if the file exists, just try to remove it.

rm -f PATH

# or
rm PATH 2> /dev/null


ref:

shell script to remove a file if it already exist - Stack Overflow

how to show absolute path of file from relative

I cannot brew install realpath the way they apt install realpath, but realpath is available anyway, so it may be pre-installed.

Later I found that realpath is part of GNU coreutils, so on mac brew install coreutils should provide it.

realpath FILE

ref:

bash - How to retrieve absolute path given relative - Stack Overflow

how to copy file into clipboard

core ref: https://apple.stackexchange.com/a/15327

it's easy to copy a text file

# copy
pbcopy < FILE

# paste to command line
pbpaste

# paste to a new file
pbpaste > FILE2

But attention, the pbpaste would cause corruption when deals with binary file.

but it cannot be done for a binary file

Since the traditional command + c | command + v is just copy the reference of file into clipboard, instead of the content itself, we had no way to use pbcopy to copy a file, and then use command + v to paste at another place.

A solution is to use osascript.

#!/usr/bin/osascript
on run args
  set abs_path to do shell script "/usr/local/bin/greadlink -f -- " & (first item of args)
  set the clipboard to POSIX file abs_path
end

ref:

mac - How to use terminal to copy a file to the clipboard? - Ask Different

how to show file size

# -l show detail
# -h show 'human readable size
ls -lh FILE/DIR

ref:

How to Make ls Command to Show File Sizes in Megabytes in Ubuntu

how to compare between files (`diff` & `vimdiff`)

see: bash - unix diff side-to-side results? - Stack Overflow

There is a few of diff commands for us to choose.

resolution 1: `diff F1 F2`

resolution 2: `diff -y F1 F2` or `sdiff F1 F2`

resolution 3: `vimdiff F1 F2`

It's awesome! Isn't it?

TODO: resolution 4: git diff

FIXME: how to copy/move directory files correctly to soft links under target directory without affecting git?

example:

When I zipped one modified frameworks/native directory to be e.g. RAW, and then reset the frameworks/native to be the init.

Then I move all the files under RAW to frameworks/native with the command:

cp -r RAW/* TARGET/frameworks/native/

The error arose up since there are soft links under frameworks/native, such as libs/ui which is indeed libs/ui -> XXX/ui.

However, in my zipped file of RAW, the links seemingly have turned to be the real files/dirs, which introduced the problem directory --> non-directory.

The wanted effect is copying/moving all the files under conflicted directory to where they should be.

However, the git marked those files as TypeChange...

ref:

BEST-PRACTICE: linux disk management

`ncdu`, disk space tui

`baobab`, disk space gui

ref:

install problem

When installing ncdu, error ocurred: No such file or directory @ rb_sysopen ruby - Stack Overflow

The reason is that some dependency is missing, we can first install it and then install the target.

brew install librsvg
brew install baobab

effects

BEST-PRACTICE: linux shells management

ref:

this article is enough and recommended:

list all the shells

$ cat /etc/shells # list valid login shells
/bin/sh
/bin/bash
/bin/rbash
/bin/dash
/bin/zsh
/usr/bin/zsh

background: `sh` is different with `bash`

When I wrote source in a shell script and ran it with sh xx.sh, it failed.

However, when I used bash xx.sh, everything ran well.

Thus sh is definitely not the same as bash. On Ubuntu, sh points to dash, a smaller POSIX shell, so bash-only features such as source (the POSIX spelling is a single dot) are missing.

If so, why do I still need to use sh? Just because it is shorter?

ref: https://stackoverflow.com/a/48785960/9422455

see what's the current Shell

[1:42:41]:~$ echo $SHELL
/usr/bin/zsh
[1:43:25]:~$ echo $0
/usr/bin/zsh
[1:43:29]:~$ ps -p $$
   PID TTY          TIME CMD
 29657 pts/2    00:00:00 zsh

switch shell

You can change your default shell using the chsh (“change shell” ) command as follows.

The syntax is:

# usage
chsh
chsh -s {shell-name-here}
chsh -s {shell-name-here} {user-name-here}

# samples
chsh -s /bin/bash
chsh -s /bin/bash $USER

chsh -s $(which zsh)

https://askubuntu.com/questions/131823/how-to-make-zsh-the-default-shell

BEST-PRACTICE: linux terminal management

✅ the terminal cannot up down after editing

This is a problem confused me for a long time.

Today, I finally knows what's the hell at: linux - How to scroll up and down in sliced "screen" terminal - Stack Overflow

Anyway, terminal is hard to learn, I just know control + a can help me exit the so-called copy mode.

TODO: bind `option + arrow` to jump word in zsh on ubuntu vmware on MacOS

ref

BEST-PRACTICE: linux commands management

⚠️
be careful with backticks (`) in the terminal / shell, since whatever sits between them is executed as a command:
see: (20 条消息) shell 基础知识-echo 及单引号、反引号和双引号详解_Luckiers 的博客-CSDN 博客_echo 单引号和双引号

how to auto input in command

auto input password for sudo commands

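✨ Use sudo -S to read the password from stdin.

# usage (PASSWORD holds the sudo password)
echo "$PASSWORD" | sudo -S COMMAND

# sample: after this, sudo no longer asks for a password at all
echo "$USER ALL=(ALL:ALL) NOPASSWD: ALL" | sudo tee -a /etc/sudoers.d/$USER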

ref:

shell - sudo with password in one command line? - Super User

auto yes for some command (`yes |` )

# usage
yes | COMMAND

# example
yes | sh ./install.sh # install oh-my-zsh

ref:

linux - How do I script a "yes" response for installing programs? - Stack Overflow

auto yes for `apt` installing packages (`-y`)

Just add a -y in the command.

Example:

sudo apt install -y htop

ref:

apt-get install with --assume-yes is still prompting me to install dependencies - Ask Ubuntu

how to search commands by prefix (`history-search-backward/forward`)

# ~/.inputrc

# Respect default shortcuts.
$include /etc/inputrc

# choice 1: recommended
"\e[A": history-search-backward     # arrow up      --> backward
"\e[B": history-search-forward      # arrow down    --> forward

# choice 2: if prefer to the page up/down
"\e[5~": history-search-backward    # page up       --> backward
"\e[6~": history-search-forward     # page down     --> forward

⚠️ warning: close and re-open all terminals for the new behavior to take effect.

ref:

how to repeat command

#  only show the last result
watch -n X command # X: X seconds; command may need quotes

# show all the result history
while true; do command; sleep X; done; # command may need quotes

ref:

bash - Repeat a command every x interval of time in terminal? - Ask Ubuntu

how to use variable as multi args

# when a variable holds only one arg, it's ok to use it directly, and the following two assignments are equivalent
PACKAGE_TO_INSTALL="vim"
PACKAGE_TO_INSTALL=vim
sudo apt install -y $PACKAGE_TO_INSTALL

# However, if one variable holds multiple args, we need [`echo`](https://stackoverflow.com/a/30061925/9422455) to split them into separate words, if I understand correctly. In the assignment, the quotes can't be omitted unless the spaces are escaped with backslashes.
PACKAGES_TO_INSTALL="vim git htop zsh terminator"
PACKAGES_TO_INSTALL=vim\ git\ htop\ zsh\ terminator
sudo apt install -y $(echo $PACKAGES_TO_INSTALL)

# Since the `echo` is not safe, another way is to use [`xargs`](https://stackoverflow.com/a/51242645/9422455), which seems more professional
PACKAGES_TO_INSTALL="vim git htop zsh terminator"
echo $PACKAGES_TO_INSTALL | xargs sudo apt install -y
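The "hidden quotes" are most likely a zsh behavior: sh and bash split an unquoted variable into words, while zsh by default passes it as a single argument, which is why a multi-package variable arrives at apt as one odd package name. A small sketch to see how many arguments a command actually receives (`count_args` is a hypothetical helper for illustration):

```shell
# count_args prints how many separate arguments it received
count_args() {
    echo "$#"
}

PACKAGES_TO_INSTALL="vim git htop zsh terminator"

# quoted: always a single argument, in every shell
count_args "$PACKAGES_TO_INSTALL"

# unquoted: split into words by sh/bash, kept as one argument by zsh
count_args $PACKAGES_TO_INSTALL

# through xargs: split on whitespace no matter which shell runs this line
echo $PACKAGES_TO_INSTALL | xargs sh -c 'echo "$#"' sh
```

Because xargs does its own splitting, the xargs form behaves the same under bash and zsh, which fits the observation that it is the more dependable of the two workarounds.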

ref:

how to set an alias

resolution 1: in terminal

⚠️ this solution only works upon the next command, which can work immediately when executed in shell script file

# don't add any other characters after alias in order to catch bug
alias sed=gsed

resolution 2: write into `~/.bash_aliases`

# ~/.bash_aliases
alias update='sudo yum update'

⚠️ this solution needs to ensure the .bash_aliases enabled in .bashrc

✨ resolution 3: use `.bash_aliases` with `zsh`

Just add one line in ~/.zshrc:

# ~/.zshrc
source ~/.bash_aliases

ref:

How to create a permanent Bash alias on Linux/Unix - nixCraft

unalias

# sample
unalias logout

ref:

How to Create and Use Alias Command in Linux

how to compare between outputs from two commands

diff <(ls old) <(ls new)

ref:

How do I diff the output of two commands? - Ask Ubuntu

BEST-PRACTICE: linux accounts management

init user with root config

脚本：

USER_=chuanmx
HOME_=/home/$USER_

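# add user with password and home directory
sudo useradd -m $USER_
# set the initial password to the user name (chpasswd reads "user:password" from stdin)
echo "$USER_:$USER_" | sudo chpasswd
# the `dev` group must already exist
sudo usermod -a -G dev $USER_

# enable ssh
sudo mkdir -p $HOME_/.ssh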

# enable zsh
sudo cp -rf ~/.oh-my-zsh $HOME_/
sudo cp  ~/.zshrc $HOME_/
sudo chsh --shell /bin/zsh $USER_

# enable user grant
sudo chown -R $USER_:$USER_ $HOME_

# avoid typing password when using sudo command
echo "$USER_  ALL=(ALL) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/$USER_

how to create user

# create user with a home directory
sudo useradd -m $USER_
ls -la /home/$USER_


# create user [under root]
sudo useradd $USER_

# create passwd [under root]
sudo passwd $USER_

how to log out

resolution 1 (11.10 and above)

gnome-session-quit

resolution 2

sudo pkill -u $USER

ref:

command line - How can you log out via the terminal? - Ask Ubuntu

set a shorter password for ubuntu

sudo passwd <USER>

ref:
- https://askubuntu.com/questions/180402/how-to-set-a-short-password-on-ubuntu

BEST-PRACTICE: linux net management

how to know my public ip address

resolution 1:

# https://apple.stackexchange.com/questions/20547/how-do-i-find-my-ip-address-from-the-command-line
curl ifconfig.me

resolution 2:

# https://www.digitalocean.com/community/tutorials/how-to-configure-remote-access-for-mongodb-on-ubuntu-20-04#:~:text=curl%20%2D4%20icanhazip.com
curl -4 icanhazip.com

how to monitor network traffic

sudo apt install nethogs

sudo nethogs

ref:

networking - Network usage top/htop on Linux - Stack Overflow

FIXME: check proxy

In Ubuntu 18.04.6 LTS (Bionic Beaver), it introduced two methods to see what/which proxies are we using:

## approach 1
echo $http_proxy

## approach 2
env | grep -i proxy

However, after I configured the proxies under Manual Proxy, I was surprised that neither of the commands above showed anything, while ping to google.com did work, so I used ping as the check instead.

ping google.com

Another weird thing: before the system was restarted, env | grep -i proxy even showed duplicated results and changes under Manual Proxy had no effect, which is quite confusing.

Maybe we can do more tests later.

BEST-PRACTICE: linux date/time management

how to format date in terminal

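⚠️
a space inside the format string must be escaped with \ (or the whole format must be quoted)
date is a command, not a variable: variables are written $XX or ${XX}, while the output of a command is captured with $(XX). It is not expanded inside a single-quoted string, which is why the example below puts it after the closing quote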

# directly output date
date +%Y-%m-%d\ %H:%M:%S

# output date into variable
T='the date is '$(date +%Y-%m-%d\ %H:%M:%S)
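Quoting the whole format string avoids the backslash, and `$(...)` does expand inside double quotes, so the concatenation can be written in one string. A minimal sketch (`stamp` is a hypothetical helper name):

```shell
# stamp prints the current local time as YYYY-MM-DD HH:MM:SS
stamp() {
    date '+%Y-%m-%d %H:%M:%S'
}

# command substitution works inside double quotes (but not single quotes)
T="the date is $(stamp)"
echo "$T"
```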

ref:

shell script - How to concatenate a date variable and string variable in unix? - Unix & Linux Stack Exchange

how to change timezone (and time)

resolution 1 (conclusion): directly export

echo "export TZ='Asia/Shanghai'" >> ~/.profile

# log out so that the profile is re-read at the next login
sudo pkill -u $USER

resolution 2 (detail): choose following directions

# check current time, as well as timezone
date -R

# if the ending is `+0800`, then it's ok, otherwise you need to change (e.g. `-0800`)

# change timezone (just choose as directed)
tzselect

Finally you will get a suggested command to write into your profile file, which is exactly resolution 1 above.

BEST-PRACTICE: linux system management

✅ `A stop job is running for Session c2 of user ... (1min 30s)`

resolution

1. restart the system
2. run journalctl -p5
3. search the log for `timed out. Killing`
4. analyze the target process named in a line such as `Killing process 1234 (jack_thru) with signal SIGKILL.`

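⚠️ Note that there are other fixes around, such as installing watchdog or shortening the timeout. They are all too invasive and do not address the root cause, so it is still better to read the log, find the reason, and respond to that. Take the watchdog approach: from a quick look, it roughly checks the state of the system every minute. But then why did the system not have this problem long ago, when no watchdog was installed either? So on this one we must not be lazy!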

result

The log tells me the last timeout was caused by adb: I did have adb open, and it was not responding at the time.

I also checked the previous few timeouts and found that each had a different cause.

So I concluded that the timeout error is temporary, since I am not going to run adb for now.

I restarted again and the system behaved well, which confirmed my guess.

ref

-----------------------------------

BEST-PRACTICE: linux common commands

command:tar

# x: extract, f: file
tar -xf FILE

# v: verbose, logging output, careful when extracting big files, e.g. AOSP
tar -vxf FILE

ref:

How to Extract (Unzip) Tar Gz File | Linuxize

command:perl

how to use perl to replace multi-lines

perl -0pe 's/search/replace/gms' file

-0: without the -0 argument, Perl processes data line-by-line, which causes multiline searches to fail.
-p: loop over all the lines of a file
-e: execute the following arg as a perl script
/s: changes the behavior of the dot metacharacter . so that it matches any character at all. Normally it matches anything except a newline "\n", and so treats the string as a single line even if it contains newlines.
/m: modifies the caret ^ and dollar $ metacharacters so that they match at newlines within the string, treating it as a multi-line string. Normally they will match only at the beginning and end of the string.
/g: global, replace every match instead of only the first one

ref:

explaining -0: Multiline search replace with Perl - Stack Overflow
explaining /m | /s: regex - Understanding Perl regular expression modifers /m and /s - Stack Overflow

special thanks to: Not sure if you know, but sed has a great feature where you do not need to use a / as the separator.

command:find

how to ignore case

find -iname

ref:

Find command: how to ignore case? - Unix & Linux Stack Exchange

how to specify search type

Use -type to specify the search type: f for a regular file, d for a directory (with no -type, find matches every type). Here I used d, ref: find type

When searching for directories, find also descends into all the sub-folders, so I add -d 1 (accepted by the find on my mac; -maxdepth 1 is the more widely supported spelling) to search only the top level of the current directory.

➜  Application Support find . -name '*electron*' -type d -d 1
./electron-react-boilerplate
./electron-react-typescript
➜  Application Support rm -rf electron-react-boilerplate
➜  Application Support rm -rf electron-react-typescript
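The same search can be sketched with `-maxdepth 1`, which both GNU and BSD find accept. The directory names below are invented for the demonstration and live in a temporary folder.

```shell
# Hypothetical directory layout in a temp folder.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/electron-a/nested-electron" "$demo_dir/other"
touch "$demo_dir/electron-file.txt"

# -maxdepth 1 keeps find at the top level; -type d keeps directories only,
# so neither the nested directory nor the regular file is reported.
top_level_dirs="$(cd "$demo_dir" && find . -maxdepth 1 -type d -name '*electron*')"
printf '%s\n' "$top_level_dirs"

rm -rf "$demo_dir"
```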

how to exclude dir

TODO: in fact, I really can't catch why -prune is combined with -o (or)

# 1. use `-not -path`
find . -name "*.js" -not -path "./directory/*"

# 2. use `-path xx -prune`
find . -path ./misc -prune -o -name '*.txt' -print

# 3. use multiple prune (need to add escaped bracket)
find . -type d \( -path ./dir1 -o -path ./dir2 -o -path ./dir3 \) -prune -o -name '*.txt' -print

# 4. use name-pattern prune (-name takes a glob, not a regex)
find . -type d -name node_modules -prune -o -name '*.json' -print

ref:

linux - How to exclude a directory in find . command - Stack Overflow
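On why `-prune` is paired with `-o`: `-prune` evaluates to true and tells find not to descend into the matched directory. With `A -prune -o B -print`, the right-hand side is only evaluated when the left-hand side is false, so pruned paths are never printed and everything else goes through `B -print`. A small sketch with made-up file names:

```shell
# Hypothetical layout: ./misc should be skipped, ./src should be searched.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/misc" "$demo_dir/src"
touch "$demo_dir/misc/skip.txt" "$demo_dir/src/keep.txt"

# Left of -o: matches ./misc and prunes it (true, so the right side is skipped).
# Right of -o: evaluated for every other path; prints the *.txt files.
pruned_result="$(cd "$demo_dir" && find . -path ./misc -prune -o -name '*.txt' -print)"
printf '%s\n' "$pruned_result"

rm -rf "$demo_dir"
```

The explicit `-print` matters: without it, find applies an implicit print to the whole expression and the pruned directory itself would be listed too.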

TODO: how to find file with time used

tip: find efficiency comparison

Searching a specific directory is the best and fastest option;

If that is not possible, limiting -maxdepth to a small enough number is also OK;

After that, consider pruning directories.

Finally, a bare run over the whole tree is the worst.

➜  hjxh_express_match git:(main) time find .imgs  -name readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
.imgs/readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
find .imgs -name   0.00s user 0.00s system 52% cpu 0.005 total

➜  hjxh_express_match git:(main) time find . -maxdepth 3  -name readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
./.imgs/readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
find . -maxdepth 3 -name   0.01s user 0.05s system 70% cpu 0.079 total

---

➜  hjxh_express_match git:(main) time find . -maxdepth 4  -name readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
./.imgs/readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
find . -maxdepth 4 -name   0.06s user 0.69s system 87% cpu 0.854 total

➜  hjxh_express_match git:(main) time find . -maxdepth 5  -name readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
./.imgs/readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
find . -maxdepth 5 -name   0.14s user 1.86s system 93% cpu 2.137 total

➜  hjxh_express_match git:(main) time find . -maxdepth 6  -name readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
./.imgs/readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
find . -maxdepth 6 -name   0.26s user 3.21s system 94% cpu 3.683 total

---

➜  hjxh_express_match git:(main) time find .  -name readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
./.imgs/readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
find . -name   0.44s user 5.85s system 51% cpu 12.172 total

➜  hjxh_express_match git:(main) time find . -path './.imgs/*'  -name readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
./.imgs/readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
find . -path './.imgs/*' -name   0.46s user 5.93s system 51% cpu 12.299 total

➜  hjxh_express_match git:(main) time find . -path './.imgs/*'  ! -path "**/node_modules/*"  -name readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
./.imgs/readme-1641287704584-613d44afa250b17be45e5b366487d1dbd42939da44543700b5e7fbd7f6a8ca9e.png
find . -path './.imgs/*' ! -path "**/node_modules/*" -name   0.46s user 5.91s system 51% cpu 12.268 total

command:grep

how to exclude dir (`--exclude-dir=dir`)

Recent versions of GNU Grep (>= 2.5.2) provide:

--exclude-dir=dir

which excludes directories matching the pattern dir from recursive directory searches.

So you can do:

grep -R --exclude-dir=node_modules 'some pattern' /path/to/search

ref:

linux - How can I exclude directories from grep -R? - Stack Overflow
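A quick way to check the effect of `--exclude-dir`, using a made-up project layout in a temporary folder (`-l` only lists the matching file names):

```shell
# Hypothetical layout: the same text exists inside and outside node_modules.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/node_modules/pkg" "$demo_dir/src"
echo 'some pattern' > "$demo_dir/node_modules/pkg/index.js"
echo 'some pattern' > "$demo_dir/src/app.js"

# Only the file outside node_modules should be reported.
matched_files="$(cd "$demo_dir" && grep -R -l --exclude-dir=node_modules 'some pattern' .)"
printf '%s\n' "$matched_files"

rm -rf "$demo_dir"
```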

how to limit depth (`-maxdepth`)

 find . -maxdepth 4 -type f -exec grep "11.0.0_r1" {}  \;

ref:

How to do max-depth search in ack and grep? - Unix & Linux Stack Exchange

tip: splitting into lines first and grepping with line context (`--context`) is MUCH FASTER than matching a fixed-width character context (`.{255}...`), and the output is nicer too

➜  erb git:(main) ✗ time (cat release/build/mac/皇家小虎快递分析系统.app/Contents/Resources/app/dist/main/main.js | tr ";" "\n" | grep --context=3 'fake-database')
# var n=this&&this.__importDefault||function(e){return e&&e.__esModule?e:{default:e}}
# Object.defineProperty(t,"__esModule",{value:!0}),t.isDbFinished=t.initDbUpdateResult=t.initDbInsertResult=t.DB_UPDATE_DECLINED=t.DB_UPDATED=t.DB_INSERT_DUPLICATED=t.DB_INSERT_SUCCESS=t.DB_UNKNOWN=t.DB_TIMEOUT=t.DB_TABLE_NOT_EXISTED=t.prisma=void 0
# const i=r(72298),a=`file:${n(r(71017)).default.join(i.app.getPath("userData"),"express_match.sqlite.db")}?connection_limit=1`
# process.env.DATABASE_URL=a,console.log({__dirname,rawDBPath:"file:dev.db?connection_limit=1",newDBPath:a}),t.prisma={erp:{create:()=>{console.log("fake-database: creating one")},findMany:()=>{console.log("fake-database: finding many")},upsert:()=>{console.log("fake-database: upserting one")}}},t.DB_TABLE_NOT_EXISTED="DB_TABLE_NOT_EXISTED",t.DB_TIMEOUT="DB_TIMEOUT",t.DB_UNKNOWN="DB_UNKNOWN",t.DB_INSERT_SUCCESS="DB_INSERT_SUCCESS",t.DB_INSERT_DUPLICATED="DB_INSERT_DUPLICATED",t.DB_UPDATED="DB_UPDATED",t.DB_UPDATE_DECLINED="DB_UPDATE_DECLINED"
# t.initDbInsertResult=()=>({nTotal:0,nInserted:0,nDuplicated:0,nTimeout:0,nUnknown:0,nTableNotExist:0})
# t.initDbUpdateResult=()=>({nTotal:0,nUpdated:0,nDropped:0,nTimeout:0,nUnknown:0,nTableNotExist:0})
# t.isDbFinished=e=>{let t=0
( cat  | tr ";" "\n" | grep --color=auto  --context=3 'fake-database'; )  0.20s user 0.01s system 121% cpu 0.169 total

➜  erb git:(main) ✗ time ( grep -iEo '.{255}fake-database.{255}' release/build/mac/皇家小虎快递分析系统.app/Contents/Resources/app/dist/main/main.js | tr ';' '\n' )
# =`file:${n(r(71017)).default.join(i.app.getPath("userData"),"express_match.sqlite.db")}?connection_limit=1`
# process.env.DATABASE_URL=a,console.log({__dirname,rawDBPath:"file:dev.db?connection_limit=1",newDBPath:a}),t.prisma={erp:{create:()=>{console.log("fake-database: creating one")},findMany:()=>{console.log("fake-database: finding many")},upsert:()=>{console.log("fake-database: upserting one")}}},t.DB_TABLE_NOT_EXISTED="DB_TABLE_NOT_EXISTED",t.DB_TIMEOUT="DB_TIMEOUT",t.DB_UNKNOWN="DB_UNKNOWN",t.DB_INSERT_SUCCESS="D
( grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn,.idea,.tox} -iEo  t)  3.27s user 0.01s system 99% cpu 3.279 total

tip: lookaround (e.g. negative lookbehind) needs `grep -P`, which on macOS means GNU grep (`ggrep`)

Examples Given the string foobarbarfoo:

bar(?=bar)     # finds the 1st bar ("bar" which has "bar" after it)
bar(?!bar)     # finds the 2nd bar ("bar" which does not have "bar" after it)
(?<=foo)bar    # finds the 1st bar ("bar" which has "foo" before it)
(?<!foo)bar    # finds the 2nd bar ("bar" which does not have "foo" before it)

You can also combine them:

# finds the 1st bar ("bar" with "foo" before it and "bar" after it)
(?<=foo)bar(?=bar)

ref:

lookaround - Regex lookahead, lookbehind and atomic groups - Stack Overflow

grep -Pio '(?<=heads\/)(.*?)(?=\n)' text.txt # P option instead of E

ref: https://stackoverflow.com/a/45534127/9422455
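The lookaround examples above can be checked against the string foobarbarfoo directly. This sketch assumes a grep built with PCRE support (`-P`); it picks `ggrep` when it is installed and falls back to the system `grep` otherwise.

```shell
# Prefer GNU grep installed as ggrep (e.g. on macOS); fall back to the system grep.
GREP_BIN="$(command -v ggrep || command -v grep)"

sample='foobarbarfoo'

# -o prints only the matched part, -b prefixes it with its byte offset.
first_bar="$(printf '%s\n' "$sample" | "$GREP_BIN" -Pob 'bar(?=bar)')"
second_bar="$(printf '%s\n' "$sample" | "$GREP_BIN" -Pob '(?<!foo)bar')"

printf '%s\n%s\n' "$first_bar" "$second_bar"
```

The byte offsets (3 for the first bar, 6 for the second) make it visible which of the two occurrences each pattern selects.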

command:tree

how to display chinese (`-N`)

tree -N

ref:

linux 下 tree 命令中文字符乱码解决方案_cxrsdn 的博客-CSDN 博客_linux tree 中文

how to exclude dir(`-I`)

# use `|` to split choices

# exclude
tree -I "XXX|YYY"   # -I: ignore entries matching the pattern

ref:

tree command for multiple includes and excludes - Unix & Linux Stack Exchange

command:head

head basic usage

There is not much to say about the head command; its usage is very simple.

# output the first 5 lines (default)
head FILE

# output the first N lines (replace "N")
head -n "N" FILES

how to exclude the last k rows

But today (2022-01-26) I ran into a problem: I needed the first n-1 lines, and none of the solutions I tried from Stack Overflow seemed to work.

head -n -1 FILE

Later I learned that this is because of macOS: I have to use ghead instead...

brew install coreutils
ghead -n -4 main.m

ghead is part of coreutils, a name I recognized at once as something I already had installed, so there was nothing to install.

ref:

shell - head command to skip last few lines of file on MAC OSX - Stack Overflow

discuss: use `head` or `sed`

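The reason I reached for head today is that I wanted to run sed over a stream while skipping the last line.

At first I wanted to use sed's range specifier (address), but I could not get it to work.

Later, after replacing the address with the simplest 1,4s/find/replace/, I realized my approach was wrong.

1,4 does select the range that gets processed, but the result is that, out of N lines, the first four are substituted by sed while the remaining lines, although untouched, are still printed. That is sed's default behavior: it is a stream transformer.

To do the substitution and drop the last line purely with sed, one would combine -n with the p flag. The p flag prints every matched line, but on its own the matched lines are then printed twice (both copies after substitution), so -n is there to suppress the automatic printing, leaving only the processed matched lines.

So this approach is conditional: it is correct only if exactly the first n-1 lines match; otherwise it does not meet the requirement, which is to process the first n-1 lines and print them whether or not they match (even though in practice they all match).

Anyway, with this understood, either a pure sed solution or a head + sed solution works: one step does the substitution, the other does the deletion. The order does not change the result, though putting head first may be slightly more efficient.

So this turned out to be quite interesting.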
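A pure sed variant of "process everything, but drop the last line" can be sketched with the `$d` command, which deletes the last input line and is available in POSIX sed (so no ghead is needed). The sample input and the function name `drop_last_and_replace` are made up for this demonstration.

```shell
# Drop the last line ('$d'), then apply the substitution to every remaining line.
# Lines that do not match are still printed, as required.
drop_last_and_replace() {
  sed -e '$d' -e 's/find/replace/'
}

processed="$(printf 'find 1\nkeep 2\nfind 3\nLAST find\n' | drop_last_and_replace)"
printf '%s\n' "$processed"
```

On its own, `sed '$d' FILE` also works as a portable stand-in for `head -n -1 FILE`.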

command:top

Today (2022-01-27) I finally understood how to use the top command (htop has a better display, but is possibly more costly).

I can switch the display format of memory usage once top is in its interactive interface.

The first key I can use is E, which switches the memory unit among KiB | MiB | GiB | TiB.

The second key I can use is m, which switches the memory display type between pure text, column graph and block graph.

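⚠️
You cannot use top -M directly on the command line (to show memory usage in MB), because the help from top -h makes it clear that only some options are supported. There is a suggestion to use top -M on Stack Exchange (see: linux - How to display meminfo in megabytes in top? - Unix & Linux Stack Exchange), but my version (Ubuntu 18) apparently does not support it. The correct way is to enter the top interactive interface first and then press E, which switches the unit; for example, pressing it twice switches to GB. Independently, you can also press m to switch the memory display style, for example to the vertical-bar style used by htop!
The above was only tested on Ubuntu; I just tried it on macOS and it does not work! So how a command is used really depends on the platform!
For details, use COMMAND -h for the brief help page, or man COMMAND (e.g. Ubuntu Manpage: top - display Linux processes) for the full help page!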

command:cat

how to write raw string into file using `cat`

# example
GOPATH=$HOME/my-go
cat <<EOF >> ~/.profile
export GOPATH=$GOPATH
export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin
EOF

ref:

linux - How does "cat \<\< EOF" work in bash? - Stack Overflow
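One detail worth noting: with an unquoted `EOF`, variables such as `$GOPATH` and `$PATH` are expanded at the moment the text is written, so the result is not strictly a raw string. Quoting the delimiter (`'EOF'`) writes the body literally. A minimal sketch, using a temporary file as a stand-in for ~/.profile:

```shell
profile_file="$(mktemp)"   # stand-in for ~/.profile in this sketch
GOPATH="$HOME/my-go"

# Unquoted delimiter: $GOPATH is expanded when the here-document is written.
cat <<EOF >> "$profile_file"
export GOPATH=$GOPATH
EOF

# Quoted delimiter: the body is written literally, so $PATH stays unexpanded
# and is only resolved later, when the profile is sourced.
cat <<'EOF' >> "$profile_file"
export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin
EOF

expanded_line="$(sed -n 1p "$profile_file")"
literal_line="$(sed -n 2p "$profile_file")"
printf '%s\n%s\n' "$expanded_line" "$literal_line"

rm -f "$profile_file"
```

For PATH in particular the literal form is usually what is wanted, since it avoids freezing the current PATH value into the profile.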

usage: how to know what's the bash string

When using bash or zsh, we sometimes need to bind keys (e.g. with bindkey in zsh or bind in bash).

However, the sequences that keys send are hard to remember, e.g. what does a ctrl combination send?

Luckily, there is a super easy (and interactive) way to find out via cat: just type cat followed by enter in the terminal, and the terminal then displays the characters each key press sends.

E.g. here is the result of combining control | option | command with arrow left and arrow right:

command:unzip

how to unzip to specific directory

unzip file.zip -d TARGET_DIR

ref:

bash - How to unzip into a given directory - Ask Ubuntu

command:kill

 ps aux  |  grep -i  electron |  awk '{print $2}'  |  xargs sudo kill -9

ref:

https://stackoverflow.com/a/30486159/9422455

command:scp

sudo chown -R USER_NAME REMOTE_FOLDER
sudo chmod 777 REMOTE_FOLDER

-R means "recursively": there may be files deep inside the target folder that you are still not allowed to write.

Ownership of a directory does not automatically carry over to what is inside it, so otherwise you would have to claim ownership of each file or directory separately.

Thus, the flexible approach is simply to add the -R flag.

reference: https://unix.stackexchange.com/a/347412

command:lsof

check status of port occupation

lsof -i:8888

command:ufw

# check status
sudo ufw status

# enable
sudo ufw enable

# white list
sudo ufw allow 9000

# reload
sudo ufw reload

ref:

Ubuntu 防火墙的开启，关闭，端口的打开，查看_jiaochiwuzui 的博客-CSDN 博客_ubuntu 查看防火墙开启的端口

command:time

I can directly use time ( COMMAND_1 | COMMAND_2 ) to measure the total time of the whole pipe.

However, timing is a little more subtle than I had thought; for more, refer to: bash - How can I time a pipe? - Unix & Linux Stack Exchange

command:tr

tr is useful for splitting one line into multiple lines.

$ echo "111;222;333" | tr ';' '\n'
111
222
333

# use `nl` to add the line number
cat main.js | tr ';' '\n' | nl -ba | head -6

command:cd

usage: a superb scene using `cd .`

ref:

linux - How do I refresh directory in BASH? - Super User

command:sed

ref

a good start:

Sed Command in Linux/Unix with examples - GeeksforGeeks

how to print only matched lines

-n means "do not automatically print each line"
/p means "print the processed line"

# print only the matched lines
sed -n "s|find|replace|p"

# don't print any line (so useless)
sed -n "s|find|replace|"

# print every line, and print each matched line once more (twice in total, both copies identical)
sed "s|find|replace|p"

# TODO: print the processed, and apply function on it.

ref:

regex - sed: print only matching group - Stack Overflow
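The difference between the variants is easy to see on a three-line sample (the sample text is made up for this demonstration):

```shell
sample_lines='find me
leave me
find me too'

# -n + p: only the lines where the substitution happened are printed.
only_matched="$(printf '%s\n' "$sample_lines" | sed -n 's|find|replace|p')"

# p without -n: every line is printed, and matched lines appear twice.
printed_twice="$(printf '%s\n' "$sample_lines" | sed 's|find|replace|p')"

printf '%s\n---\n%s\n' "$only_matched" "$printed_twice"
```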

✨ how to increment version number

resolution 1: use `echo` based on `//e`

special thanks to: https://stackoverflow.com/a/14348899/9422455

resolution 2: answer

gsed -i -E 's|(.*)"version": "([0-9]+)\.([0-9]+)\.([0-9]+)"|echo "\1\\"version\\": \\"\2.\3.$((\4+1))\\""|e' package.json

test what happened using //pe

➜  erb_sqlite git:(main) head -3 release/app/package.json
{
  "name": "mark.hjxh.express_match",
  "version": "0.2.2",

➜  erb_sqlite git:(main) gsed -E 's|(.*)"version": "([0-9]+)\.([0-9]+)\.([0-9]+)"|echo "\1\\"version\\": \\"\2.\3.$((\4+1))\\""|pe' release/app/package.json
{
  "name": "mark.hjxh.express_match",
echo "  \"version\": \"0.2.$((2+1))\"",
  "version": "0.2.3",

explanation

In fact, the "version": "0.2.2", is changed into echo " \"version\": \"0.2.$((2+1))\"",.

And then the e flag runs the resulting string as a command, so that it finally turns into "version": "0.2.3",

attention

the " needs to be escaped with a \, and for the \ to survive inside the echo command it has to be escaped again, which gives \\"
sed matches the whole line (including the leading space), and e executes the whole line. So if I only replaced the version number part with echo "\\"0.2.3\\"", the whole line would turn into "version": echo "\\"0.2.3\\"", which is certainly not what we want and is worth keeping in mind.

core ref

bash - How to find/replace and increment a matched number with sed/awk? - Stack Overflow

perl | awk alternative

bash - How to increment version number in a shell script? - Stack Overflow

official hack way (but I failed)

#!/usr/bin/sed -f

/[^0-9]/ d

# replace all trailing 9s by _ (any other character except digits, could
# be used)
:d
s/9\(_*\)$/_\1/
td

# incr last digit only.  The first line adds a most-significant
# digit of 1 if we have to add a digit.

s/^\(_*\)$/1\1/; tn
s/8\(_*\)$/9\1/; tn
s/7\(_*\)$/8\1/; tn
s/6\(_*\)$/7\1/; tn
s/5\(_*\)$/6\1/; tn
s/4\(_*\)$/5\1/; tn
s/3\(_*\)$/4\1/; tn
s/2\(_*\)$/3\1/; tn
s/1\(_*\)$/2\1/; tn
s/0\(_*\)$/1\1/; tn

:n
y/_/0/

ref:

sed, a stream editor (https://www.gnu.org/software/sed/manual/sed.html#Increment-a-number)

how to match digits (`[0-9]` or `[[:digit:]]`)

ref:

thanks for the direction to sed official documentation in this post.

regex - Why doesn't `\d` work in regular expressions in sed? - Stack Overflow (https://stackoverflow.com/questions/14671293/why-doesnt-d-work-in-regular-expressions-in-sed)

how to insert text before first line of file

suppose the text is:

@tailwind base;
@tailwind components;
@tailwind utilities;

and the file is public/style.css

first, export this variable for better understanding of commands:

T='@tailwind base;
@tailwind components;
@tailwind utilities;'
F='public/style.css'

and copy file as a backup:

cp $F ${F}_

then the reset command is:

cp ${F}_ $F

resolution 1: use `cat` and `;`

The cat approach meets our intuition, but needs a temp file.

First, we dump the T into temp file, then append F into temp, finally replace F with temp, that is:

echo $T > temp; cat $F >> temp; mv temp $F

Be careful that the second operator is >> (append); with > the temp file would be overwritten, $T would be lost, and $F would end up unchanged.

refer:

What are the shell's control and redirection operators? - Unix & Linux Stack Exchange

resolution 2: use `cat` and `|`

In last solution, we used 2 ';', and there is an easy way to change it to just 1 ';'.

echo $T | cat - $F > temp; mv temp $F

In this solution, $T is echoed into the pipe, cat reads it from stdin via -, joins it with $F and dumps the result into temp, which saves one step.
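The cat approach can be wrapped into a small helper. This is only a sketch: the function name `prepend_text` is hypothetical, a temporary file stands in for public/style.css, and `"$T"` is quoted on purpose because bash (unlike zsh) would otherwise collapse the newlines of an unquoted `$T` into spaces.

```shell
prepend_text() {
  # $1: text to insert (may span several lines), $2: target file
  prepend_tmp="$(mktemp)"
  printf '%s\n' "$1" | cat - "$2" > "$prepend_tmp" && mv "$prepend_tmp" "$2"
}

T='@tailwind base;
@tailwind components;
@tailwind utilities;'
F="$(mktemp)"                      # stand-in for public/style.css
printf 'body { margin: 0; }\n' > "$F"

prepend_text "$T" "$F"
prepended_content="$(cat "$F")"
printf '%s\n' "$prepended_content"

rm -f "$F"
```

Note that `mv` replaces the target with the temp file, so the target ends up with the temp file's permissions.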

refer:

resolution 3: use sed s-command

In the above 2 solutions, we need extra IO in both cases, i.e. first saving into a 'temp' file and then moving it to overwrite the original file, which can be inefficient and inelegant.

There is a way to finish the 'join' action entirely within the pipe and finally modify the target file in place: sed's s-command.

With the s-command, we can apply regex syntax to achieve what we want.

For example, we can insert text in front of a sequence of text by matching '^', which means the beginning of the text.

Since the basic syntax for inserting text before a specific line of an input file in sed is sed -i 'Ni TEXT' $F, the problem becomes how to join '1i' with $T. That is where what we just learned can be put into practice:

echo $T | gsed '1s/^/1i /' | gsed -i -f- $F

As you can see, the commands no longer touch a temp file of our own, and the principle behind the chain is straightforward: prepend 1i to $T, then use the result as the script of sed -i (read from the preceding pipe via -f-).

resolution 4: use sed e-command

I find the e-command quite confusing, but it does a good job.

I ran some tests on the e-command to help myself understand it.

The introduction above says that 'without parameters, the e command executes the command that is found in pattern space and replaces the pattern space with the output'.

Suppose we have a='aaa\necho "bbb" \nccc'. If we run echo $a | gsed '2e', the second line is run as a command and the other lines stay as they are:
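A minimal sketch of this test, assuming GNU sed (the `e` command is a GNU extension): it uses `gsed` when installed and falls back to `sed` otherwise, and feeds the three lines with `printf` so that the result does not depend on how the shell's `echo` treats `\n`.

```shell
# Prefer gsed (GNU sed on macOS); fall back to sed, which is GNU sed on most Linux systems.
SED_BIN="$(command -v gsed || command -v sed)"

# Line 2 is executed as a shell command and replaced by its output;
# lines 1 and 3 pass through unchanged.
e_result="$(printf 'aaa\necho "bbb"\nccc\n' | "$SED_BIN" '2e')"
printf '%s\n' "$e_result"
```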

However, 'if a parameter is specified, instead, the e command interprets it as a command and sends its output to the output stream.'

I made an example that may help to understand the mechanism of gsed 'Ne xxx', in which xxx is the so-called 'parameter'.

As the following shows, since a is a three-line text sent into the pipe as a stream, the first line shows 'aaa' and the second line shows 'echo "bbb"', just as we preset.

The most notable point is that gsed is given a 3e command, which means 'execute the following commands at the 3rd row of the stream'. Thus the following commands xxx\n echo "yyy"... are executed as separate commands, split by lines.

Obviously, neither xxx nor zzz is a valid command, so each of them produces an error. Also, presumably because error output is emitted ahead of normal output, the error of zzz came before yyy, followed by AAA.

Finally, when all the commands had been executed, the next line of the stream, i.e. ccc, came through, and the sequence came to an end.

ref:

sed e-command

Still, we had other topics to talk about.

We know the classic usage of sed is sed SCRIPT INPUTFILE; if the -e option is used to specify the script, all non-option parameters are taken as input files.

So what happens when we combine the -e and an input file?

Back to what we covered above, we can now move a step further.

In this example, we can see that gsed first reads one line, AAA, from the ../temp file, then pauses at the 2e command to execute cat -, which shows the whole input stream from echo $a, and finally continues to read the remaining rows, BBB and CCC.

So what if we specify the -i option, which means editing in place?

It is easy to understand: all the output is sent into ../temp, so ../temp changes to the output, just as the result in this example shows.

Hence, we can conclude that if we use the following command:

echo $T | gsed -i '1e cat -'  $F

then the goal of inserting text before first line can be achieved just on the fly~

How amazing and beautiful it is!

ref:

gsed overview

conclusion

G1. To insert lines at the beginning of one file:

# 1. dump, dump, and move
echo $T > temp; cat $F >> temp; mv temp $F;

# 2. join, dump, and move
echo $T | cat - $F > temp; mv temp $F;

# 3. [sed s-command] concat-string, inplace-insert
echo $T | gsed '1s/^/1i /' | gsed -i -f- $F

# 4. [sed e-command] ... hard to conclude
echo $T | gsed -i '1e cat -' $F

G2. To insert lines at specific line:

# 1. if text is single line, refer: https://stackoverflow.com/a/6537587/9422455
gsed -i 'Ni $T' $F

# 2. if text is multi lines, refer:
echo $T | gsed -i 'Ne cat -' $F

FIXME: (failed) G3. To insert content after matched position:

# 1. [sed r-command]
echo $T | gsed -i '$P/r -' $F #  the '-' is same for '/dev/stdin'

G4. To insert multi lines manually:

# 1, when the text already spans multiple lines, just add `\` after each line, refer: https://askubuntu.com/a/702693

# 2, when lines are in one, using `\n`, refer: https://askubuntu.com/a/702690

G5. To insert lines after matched line with the same leading space format:

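⚠️ Notes on nested sed regexes
The input text must not contain the delimiter; otherwise it has to be escaped. For example, to add a // comment into code this time, / is inconvenient as the delimiter
When nesting regexes, use different delimiters to tell the first level from the second; for example, this update uses | for the first level and _ for the second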

# 1. [sed s-command] leading space align with $P
# failed at 2022-01-25
# echo $T | gsed 's/^.*$/s\/(^.*?)($P.*$)\/\\1\\2\\n\0\//' | gsed -E -f- $F

# updated at 2022-01-25
echo $T | gsed 's|^.*$|s_^(.*?)('$P'.*)$_\\1\\2\\n\\1\0_|' | gsed -E -f- $F | grep --context=3 $T

# FIXME: (failed) 2. [sed s-command with named-pipe] [more straightforward]
cat <(echo -n "s/(^.*?)($P.*?$)/\1\2\\\\n\1" & echo -n $T & echo -n "/") | gsed -E -f- $F

ref:

shell - How do I join two named pipes into single input stream in linux - Server Fault

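Programming | A Twelve-Hour Battle with Webpack

2021/12/24 · 13 min read

December 24, 2021

Content

This post had to be written in Chinese originally. The problem is still not fully solved: I only have an acceptable workaround, and some of what lies behind it I still do not understand.

Conclusions first:

If all dependencies are in commonjs form (using require and module.exports) and no sub-package defines type: "module" in its package.json, then webpack basically never fails however you write the config; just run webpack --config xxx.webpack.config.js
If xxx.config.js uses import xxx from xx, or the sub-packages all use type: "module", then defining type: "module" in the project's package.json also solves the problem
If commonjs and esm are mixed, errors occur: either Cannot use import statement outside a module, or mjs and similar files cannot be imported because of type: module and the like. The rest of this post focuses on the solution for this case

The problem I ran into: I directly cloned two boilerplates from github, electron-react-typescript-webpack-boilerplate and electron-typescript-react. Their configurations differ somewhat, and one uses babel while the other does not, but what they share is that neither uses import syntax in webpack.config.js, and all of webpack's dependencies are commonjs.

To render markdown files on the front end, I could have used packages such as marked or react-markdown directly, but based on earlier experience with frameworks such as nextjs | gatsby, parsing markdown through webpack gives a better experience (after all, both projects already use webpack).

But here is the problem: when I followed the remark-loader | webpack chapter on the webpack website about markdown loaders, I found in practice that remark-html is esm and cannot be used directly; my webpack reported an error.

Then came 12 hours of struggle. I tried almost every solution I could find, from the most basic change of type: "module", to babel-register, and even read the webpack and babel material in detail, and all of it ended in failure. The most influential thread, the one offering the most solutions (all of which failed for me), was this one:

How can I use ES6 in webpack.config.js? - Stack Overflow

The question is stated clearly: how to use ES6 in webpack.config.js.

But none of the answers worked. Finally, after despairing countless times, I happened to find one line in the thread Bug at render-markdown stage of tutorial: Must use import to load ES Module · Discussion #27758 · vercel/next.js: npm install remark@13 remark-html@13. From experience I knew immediately what was going on, because just two days earlier I had fought an equally painful battle: the cursed webstorm did not support the latest tailwindcss, and I eventually had to install tailwindcss@2.0.1-compat!

Sure enough, after installing the specified versions, the project finally ran successfully:

With that, the fight with webpack has come to an end for now. Otherwise, my next and even more arduous plan might have been to read the source of heavyweight frameworks such as nextjs and dissect how they use webpack so elegantly...

So what I can confirm for now is this: as long as every library webpack depends on is commonjs, the project can run without babel. After all, once a front-end project needs babel, it is already a different project. So: one press of DELETE, and the world is quiet again.

In fact, as I said at the beginning, I cloned two projects. One of them uses babel to transform jsx and similar files, and is based on electron-forge. To understand the errors at the time I even went to read that library, and honestly did not understand much of it. The biggest inspiration this boilerplate gave me is a standard interface for communication between main and renderer, namely:

Type definition of the api that main exposes to renderer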

// src/@types/bridge.d.ts
import {api} from 'electron/bridge';

declare global {
  // eslint-disable-next-line
  interface Window {
    Main: typeof api;
  }
}

The send/receive wrapper around the api on the main side

// electron/bridge.ts
import {contextBridge, ipcRenderer} from 'electron';

export const api = {
  /**
   * Here you can expose functions to the renderer process
   * so they can interact with the main (electron) side
   * without security problems.
   *
   * The function below can be accessed using `window.Main.sendMessage`
   */

  sendMessage: (message: string) => {
    ipcRenderer.send('message', message);
  },

  /**
   * Provide an easier way to listen to events
   */
  on: (channel: string, callback: Function) => {
    ipcRenderer.on(channel, (_, data) => callback(data));
  },
};

contextBridge.exposeInMainWorld('Main', api);

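These two files, and the way the main and renderer programs use them, were good guidance for me. The electron website at least has little on typescript patterns, so guidance has to come from boilerplates like these.

Beyond that, this boilerplate did not mean much to me. I also could not really follow the several scripts it defines: they are nothing more than different commands for development, production, running, packaging and testing, but it made them a bit complicated. By comparison the other boilerplate is good, clear and easy to follow. It may be that the second boilerplate is the one I used when learning electron in the first half of the year, but either way, the second one is indeed better.

To confirm this (2 is better than 1), I checked the github star counts again. It turned out otherwise: the first boilerplate has 1.2k stars while the second has only 176. Well... my intuition is not reliable.

This can be explained, though: the first is called electron-typescript-react and the second electron-react-typescript-webpack-boilerplate. On the surface the latter is a more specific version of the former, because its name has more terms.

In fact it is the opposite: electron-typescript-react contains both webpack and babel, and actually involves more than the other one... From this angle, the first boilerplate's higher star count probably comes down to at least three points:

A short name
It adds babel, giving users more room to work with
The bridge interface is well written and more valuable to users
...

Anyway, that is about all my thinking on this problem so far, and what the two boilerplates taught me.

Next comes the real work: now that the tools are in order, it is time to dive into implementing the business logic.

Still, as I said at the beginning, some questions remain unsolved. They are listed below:

Why are all the loader configurations on the webpack website imported with import? Are all their dependencies esm?
Possibly not, because I also saw that some of their loaders can use a configuration like type: "auto", which should mean automatically applying different strategies to js of different types when bundling. But can this apply to the top of config.js? Probably not, since I see them using it inside rules.
There must be many projects that mix commonjs and esm. Without babel, does webpack have a strategy to run normally? Are they all based on babel at present?
If it is based on babel, I think installing all those babel dependencies as I tried should have worked too. Why did it not?

Regarding the third point, I suddenly remembered that I also wrote a script in the first half of the year, and dug it out to take a look:

As you can see, it does use babel_register to register the child program, and I did use import to import packages:

I am getting old and my memory is failing. Researching this took me about a week back then, yet I had no recollection of it at all. Fortunately, the palest ink beats the best memory, and I had organized the code base from that time. Thinking back to the babel_register solution I read at the start, things suddenly became clear:

That said, I probably do not need it for this project for now, haha. If I do, I will yarn add a bunch of packages, add a babel-register script, change the start command, and that should be it. It certainly would not be as much of a headache as today.

Let's leave it there for now; this is the end of the record for this problem.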

Former Record

[Problem] SyntaxError: Cannot use import statement outside a module
This discussion is very close to what happened to me. However, it gave no solution, except for pointing out that some package may be using ES modules in a way that does not suit webpack, which inspired me to check the target package that the official webpack docs suggested I import.
javascript - Why does Error 'SyntaxError: Cannot use import statement outside a module ' when I try to use ES Modules? - Stack Overflow
The target package of remark-html I wanted to import
remark-loader | webpack
[Failed] Solution of babel-node
It seems that babel-node is a possible solution to this problem, since when I installed it, it automatically detected what was going wrong in all my webpack loaders.
[Failed] Solution of @babel/register
How can I use ES6 in webpack.config.js? - Stack Overflow

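The one real takeaway

A while ago a woman working at ByteDance told me she wanted to switch to programming. After talking it through, I felt front-end development was worth a try for her, since among all development roles front-end is relatively the easiest to get started with...

But!

That only holds for the era ten or twenty years ago of plain html + css + jquery or php!

Now front-end technology advances rapidly, new paradigms keep emerging, and it has become more and more complex. Leaving aside the three major frameworks AngularJS | React | Vue, webpack and babel can absolutely wear out even the front-end developers at big companies...

But anyway, setting these aside, writing front-end code day to day can really lift one's mood and give a strong sense of achievement. To quote a friend's attitude to development: "Writing things for fun on your own is great; doing it as a job would be too boring." Indeed so~

Keep learning, keep loving it!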

reference:
