最新消息:20210816 当前crifan.com域名已被污染,为防止失联,请关注(页面右下角的)公众号

【记录】Mac本地Solr中新建qa的collection再运行脚本导入数据

Mac crifan 2177浏览 0评论

折腾:

【无需解决】CentOS中正在运行的Solr如何得到其所有的配置

期间,对于本地solr复制在线环境,好像行不通(至少现在不会,以后有机会再说)

现在先去本地新建qa的collection,然后再去运行脚本导入数据。

而大概步骤是:

  • 新建qa这个collection

    • 等价于mysql中的database

    • 而新建的方式,有两种:

  • 导入数据

    • 运行此处的mysql2solr.py脚本,去从mysql中的xx的qa的表中,导入数据到solr中

对此,其实是后来才知道:

本地需要先去运行Python脚本,去从已有mysql数据库中qa的表中生成需要的数据的:

本来打算,为了防止影响,而去删除本地之前拷贝到solr中的qa的整个目录的

不过听说:

solr会自动覆盖的,就暂时不删除之前solr中qa目录了

先去试试能否跑通,以及之后再去优化data数据的目录-》可以单独设置数据的目录的,而不会影响到solr的使用

不过听说需要手动新建qa,所以为了避免影响,还是删除掉从在线的环境中复制到本地的qa的目录吧:

/usr/local/Cellar/solr/7.2.1/server/solr

中的qa:

(改名,后续再删除)

然后再去想办法新建一个qa:

用命令行试试:

<code>solr create -c qa -s 2 -rf 2
</code>

运行之前,去搜搜,看看参数是什么含义

solr create

Solr in 5 minutes – Solr Tutorial.com

How do I create a Solr core on my server? | Code Enigma

Running Solr | Apache Solr Reference Guide 6.6

<code>➜  solr git:(master) ✗ solr create -help

Usage: solr create [-c name] [-d confdir] [-n configName] [-shards #] [-replicationFactor #] [-p port] [-V]

  Create a core or collection depending on whether Solr is running in standalone (core) or SolrCloud
  mode (collection). In other words, this action detects which mode Solr is running in, and then takes
  the appropriate action (either create_core or create_collection). For detailed usage instructions, do:

    bin/solr create_core -help

       or

    bin/solr create_collection -help

➜  solr git:(master) ✗ solr create_core -help

Usage: solr create_core [-c core] [-d confdir] [-p port] [-V]

  -c &lt;core&gt;     Name of core to create

  -d &lt;confdir&gt;  Configuration directory to copy when creating the new core, built-in options are:

      _default: Minimal configuration, which supports enabling/disabling field-guessing support
      sample_techproducts_configs: Example configuration with many optional features enabled to
         demonstrate the full power of Solr

      If not specified, default is: _default

      Alternatively, you can pass the path to your own configuration directory instead of using
      one of the built-in configurations, such as: bin/solr create_core -c mycore -d /tmp/myconfig

  -p &lt;port&gt;     Port of a local Solr instance where you want to create the new core
                  If not specified, the script will search the local system for a running
                  Solr instance and will use the port of the first server it finds.

  -V            Enable more verbose output.

➜  solr git:(master) ✗ solr create_collection -help

Usage: solr create_collection [-c collection] [-d confdir] [-n configName] [-shards #] [-replicationFactor #] [-p port] [-V]

  -c &lt;collection&gt;         Name of collection to create

  -d &lt;confdir&gt;            Configuration directory to copy when creating the new collection, built-in options are:

      _default: Minimal configuration, which supports enabling/disabling field-guessing support
      sample_techproducts_configs: Example configuration with many optional features enabled to
         demonstrate the full power of Solr

      If not specified, default is: _default

      Alternatively, you can pass the path to your own configuration directory instead of using
      one of the built-in configurations, such as: bin/solr create_collection -c mycoll -d /tmp/myconfig

      By default the script will upload the specified confdir directory into Zookeeper using the same
      name as the collection (-c) option. Alternatively, if you want to reuse an existing directory
      or create a confdir in Zookeeper that can be shared by multiple collections, use the -n option

  -n &lt;configName&gt;         Name the configuration directory in Zookeeper; by default, the configuration
                            will be uploaded to Zookeeper using the collection name (-c), but if you want
                            to use an existing directory or override the name of the configuration in
                            Zookeeper, then use the -c option.

  -shards &lt;#&gt;             Number of shards to split the collection into; default is 1

  -replicationFactor &lt;#&gt;  Number of copies of each document in the collection, default is 1 (no replication)

  -p &lt;port&gt;               Port of a local Solr instance where you want to create the new collection
                            If not specified, the script will search the local system for a running
                            Solr instance and will use the port of the first server it finds.

  -V                      Enable more verbose output.

➜  solr git:(master) ✗
</code>

对于命令:

<code>solr create -c qa -s 2 -rf 2
</code>

->

  • -c qa

    • -c <core>     Name of core to create

    • qa是core的名字

  • -s 2

    • 没找到,感觉像是:

      • -shards <#>             Number of shards to split the collection into; default is 1

    • 表示有2个shards

  • -rf 2

    • 没找到,感觉像是:

      • -replicationFactor <#>  Number of copies of each document in the collection, default is 1 (no replication)

    • 意思是是replicationFactor是2

后来看到:

Solr Tutorial | Apache Solr Reference Guide 7.4

是7.4的版本的教程,中间有提到:

<code>And then create a new collection:
bin/solr create -c &lt;yourCollection&gt; -s 2 -rf 2
</code>

再去确认Mac本地的solr的版本:

【已解决】Mac本地查看已安装的Solr的版本

此处是7.2.1的solr,难道是不支持-s和-rf的参数名缩写?

至少官网没有提到啊,而只有新版7.4的官网文档才提示了。

算了,先去创建试试:

<code>➜  solr git:(master) ✗ solr create -c qa -s 2 -rf 2
WARNING: Using _default configset. Data driven schema functionality is enabled by default, which is
         NOT RECOMMENDED for production use.

         To turn it off:
            curl http://localhost:8983/solr/qa/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
</code>

感觉卡死了?

等了好几十秒才完成:

<code>➜  solr git:(master) ✗ solr create -c qa -s 2 -rf 2
WARNING: Using _default configset. Data driven schema functionality is enabled by default, which is
         NOT RECOMMENDED for production use.

         To turn it off:
            curl http://localhost:8983/solr/qa/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'

Created new core 'qa'
</code>

生成的文件和目录结构是:

<code>➜  solr git:(master) ✗ pwd
/usr/local/Cellar/solr/7.2.1/server/solr
➜  solr git:(master) ✗ ll
total 24
-rw-r--r--  1 crifan  admin   2.9K  1 10  2018 README.txt
drwxr-xr-x  4 crifan  admin   128B  1 10  2018 configsets
drwxr-xr-x  5 crifan  admin   160B  8 21 11:55 qa
drwx------@ 5 crifan  admin   160B  8 16 17:15 qa_toDelete
-rw-r--r--  1 crifan  admin   2.1K  1 10  2018 solr.xml
-rw-r--r--  1 crifan  admin   975B  1 10  2018 zoo.cfg
➜  solr git:(master) ✗ tree qa
qa
├── conf
│   ├── lang
│   │   ├── contractions_ca.txt
│   │   ├── contractions_fr.txt
│   │   ├── contractions_ga.txt
│   │   ├── contractions_it.txt
│   │   ├── hyphenations_ga.txt
│   │   ├── stemdict_nl.txt
│   │   ├── stoptags_ja.txt
...
│   │   ├── stopwords_tr.txt
│   │   └── userdict_ja.txt
│   ├── managed-schema
│   ├── params.json
│   ├── protwords.txt
│   ├── solrconfig.xml
│   ├── stopwords.txt
│   └── synonyms.txt
├── core.properties
└── data
    ├── index
    │   ├── segments_1
    │   └── write.lock
    ├── snapshot_metadata
    └── tlog

6 directories, 47 files
➜  solr git:(master) ✗
</code>

其中,如果想要设置data的目录,参考:

https://lucene.apache.org/solr/guide/6_6/coreadmin-api.html#CoreAdminAPI-Input.1

和之前help的输出,感觉是:

传入-d参数?

<code>  -d &lt;confdir&gt;            Configuration directory to copy when creating the new collection, built-in options are:

      _default: Minimal configuration, which supports enabling/disabling field-guessing support
      sample_techproducts_configs: Example configuration with many optional features enabled to
         demonstrate the full power of Solr

      If not specified, default is: _default

      Alternatively, you can pass the path to your own configuration directory instead of using
      one of the built-in configurations, such as: bin/solr create_collection -c mycoll -d /tmp/myconfig

      By default the script will upload the specified confdir directory into Zookeeper using the same
      name as the collection (-c) option. Alternatively, if you want to reuse an existing directory
      or create a confdir in Zookeeper that can be shared by multiple collections, use the -n option
</code>

然后去运行solr的服务端:

<code>➜  solr git:(master) ✗ solr stop -all
Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 41795 to stop gracefully.
➜  solr git:(master) ✗ solr start
Waiting up to 180 seconds to see Solr running on port 8983 [-]
Started Solr server on port 8983 (pid=43113). Happy searching!

➜  solr git:(master) ✗ solr status

Found 1 Solr nodes:

Solr process 43113 running on port 8983
{
  "solr_home":"/usr/local/Cellar/solr/7.2.1/server/solr",
  "version":"7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:54:21",
  "startTime":"2018-08-21T04:02:56.365Z",
  "uptime":"0 days, 0 hours, 0 minutes, 39 seconds",
  "memory":"27.2 MB (%5.5) of 490.7 MB"}
</code>

然后再去调试本地代码,导入数据

期间出错:

【已解决】python调用torndb出错:import MySQLdb.constants ModuleNotFoundError: No module named MySQLdb

然后去运行调试输出效果的代码,结果半天没反应 

-》貌似初始化需要很长时间?

不清楚

算了,不管这里的display_word2vec,去试试之前的,调试兜底对话的代码吧

另外:

<code>➜  solr git:(master) ✗ solr status

Found 1 Solr nodes:

Solr process 43113 running on port 8983
{
  "solr_home":"/usr/local/Cellar/solr/7.2.1/server/solr",
  "version":"7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:54:21",
  "startTime":"2018-08-21T04:02:56.365Z",
  "uptime":"0 days, 1 hours, 42 minutes, 0 seconds",
  "memory":"265 MB (%54) of 490.7 MB"}
</code>

看到solr的memory用了很多 -》 之前只用了20多M,现在200多MB

-》估计是保存数据的确进去了?

去看看本地文件夹,果然是data目录下多了很多数据:

<code>➜  solr git:(master) ✗ ll
total 24
-rw-r--r--  1 crifan  admin   2.9K  1 10  2018 README.txt
drwxr-xr-x  4 crifan  admin   128B  1 10  2018 configsets
drwxr-xr-x  5 crifan  admin   160B  8 21 11:55 qa
drwx------@ 5 crifan  admin   160B  8 16 17:15 qa_toDelete
-rw-r--r--  1 crifan  admin   2.1K  1 10  2018 solr.xml
-rw-r--r--  1 crifan  admin   975B  1 10  2018 zoo.cfg
➜  solr git:(master) ✗ tree qa
qa
├── conf
│   ├── lang
│   │   ├── contractions_ca.txt
...
│   │   └── userdict_ja.txt
│   ├── managed-schema
│   ├── params.json
│   ├── protwords.txt
│   ├── solrconfig.xml
│   ├── stopwords.txt
│   └── synonyms.txt
├── core.properties
└── data
    ├── index
    │   ├── _4.fdt
    │   ├── _4.fdx
    │   ├── _4.fnm
    │   ├── _4.nvd
    │   ├── _4.nvm
    │   ├── _4.si
    │   ├── _4_Lucene50_0.doc
    │   ├── _4_Lucene50_0.pos
    │   ├── _4_Lucene50_0.tim
    │   ├── _4_Lucene50_0.tip
    │   ├── _4_Lucene70_0.dvd
    │   ├── _4_Lucene70_0.dvm
    │   ├── _b.fdt
    │   ├── _b.fdx
    │   ├── _b.fnm
    │   ├── _b.nvd
    │   ├── _b.nvm
    │   ├── _b.si
    │   ├── _b_Lucene50_0.doc
    │   ├── _b_Lucene50_0.pos
    │   ├── _b_Lucene50_0.tim
    │   ├── _b_Lucene50_0.tip
    │   ├── _b_Lucene70_0.dvd
    │   ├── _b_Lucene70_0.dvm
    │   ├── _c.fdt
    │   ├── _c.fdx
    │   ├── _c.fnm
    │   ├── _c.nvd
    │   ├── _c.nvm
    │   ├── _c.si
    │   ├── _c_Lucene50_0.doc
    │   ├── _c_Lucene50_0.pos
    │   ├── _c_Lucene50_0.tim
    │   ├── _c_Lucene50_0.tip
    │   ├── _c_Lucene70_0.dvd
    │   ├── _c_Lucene70_0.dvm
    │   ├── _d.fdt
    │   ├── _d.fdx
    │   ├── _d.fnm
    │   ├── _d.nvd
    │   ├── _d.nvm
    │   ├── _d.si
    │   ├── _d_Lucene50_0.doc
    │   ├── _d_Lucene50_0.pos
    │   ├── _d_Lucene50_0.tim
    │   ├── _d_Lucene50_0.tip
    │   ├── _d_Lucene70_0.dvd
    │   ├── _d_Lucene70_0.dvm
    │   ├── _e.fdt
    │   ├── _e.fdx
    │   ├── _e.fnm
    │   ├── _e.nvd
    │   ├── _e.nvm
    │   ├── _e.si
    │   ├── _e_Lucene50_0.doc
    │   ├── _e_Lucene50_0.pos
    │   ├── _e_Lucene50_0.tim
    │   ├── _e_Lucene50_0.tip
    │   ├── _e_Lucene70_0.dvd
    │   ├── _e_Lucene70_0.dvm
    │   ├── _f.fdt
    │   ├── _f.fdx
    │   ├── _f.fnm
    │   ├── _f.nvd
    │   ├── _f.nvm
    │   ├── _f.si
    │   ├── _f_Lucene50_0.doc
    │   ├── _f_Lucene50_0.pos
    │   ├── _f_Lucene50_0.tim
    │   ├── _f_Lucene50_0.tip
    │   ├── _f_Lucene70_0.dvd
    │   ├── _f_Lucene70_0.dvm
    │   ├── segments_9
    │   └── write.lock
    ├── snapshot_metadata
    └── tlog
        └── tlog.0000000000000000007

6 directories, 120 files
➜  solr git:(master) ✗ du -sh qa/*
296K    qa/conf
4.0K    qa/core.properties
41M    qa/data
</code>

转载请注明:在路上 » 【记录】Mac本地Solr中新建qa的collection再运行脚本导入数据

发表我的评论
取消评论

表情

Hi,您需要填写昵称和邮箱!

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址
89 queries in 0.381 seconds, using 22.18MB memory